Opposing the cloud -- How to build a truly distributed personal computing system

The time has come to learn how to use more than one computer at once.
What was once the PC is now the network.  The mainstream approach is to
move everything into the cloud and put only somewhat smart terminals in
the hands of consumers.  This, I don't like.  This log reports on an
attempt to understand the problem, and to build some effective
automation.

Entry: The Unit Machine
Date: Sun Aug 14 12:11:58 EDT 2016

- Platform

  Limited to Debian Linux, ideally synchronized to run the same versions
  on all hosts to reduce complexity.  This can be relaxed when interfaces
  allow it and it is needed for other reasons, e.g. to use OpenWRT or a
  uC RTOS, or to mix Debian stable and testing/experimental for security
  and stability reasons.

- Access control / attack surface

  The system is split into an inside and an outside.  The working
  assumption is that if one internal node is compromised, all nodes are
  compromised.  Local users are trusted.

  The external internet attack surface consists of SSH with pubkey
  authentication.  This is unlikely to be a problem.  Takeover is most
  likely to happen through a trojan or a malicious external web service,
  followed by local privilege escalation.  That risk seems acceptable,
  and similar to single-host operation.

  Node interconnect is done over OpenVPN with shared keys.  This makes
  it possible to distinguish between possibly malicious IoT devices and
  the unit machine.  Where access is made to the outside world, care is
  taken to use "dumb" protocols, client-originated requests, and
  high-level language protocol parsers.

- Trusted internal network

  The internal network inside the OpenVPN space has relaxed constraints,
  and uses NFS for file sharing, the Erlang distribution protocol for
  connecting host control software, and pubkey SSH access for remote
  command execution.  Internal protocols still have security mechanisms
  that protect against bugs and against unsophisticated malicious users
  who cannot perform a service takeover followed by a local escalation
  attack.

Basically, I spent a lot of time being paranoid and trying to understand
the trust structure, but the main point is that this seems reasonable to
the point of not needing more attention.

Entry: Data and Computation
Date: Sun Aug 14 12:47:53 EDT 2016

To make a distributed network behave as a normal PC - which is a lot of
what this is about - means figuring out where to put each of these 3
components:

- data access (e.g. storage, peripherals)
- computation
- user interface

The pattern I've found is that often the user interface and the data or
I/O are on different hosts, and the main question is where to run the
computation.  Some "forces":

- high data bandwidth pulls computation
- high I/O bandwidth pulls computation

Basically there is only one pattern: data pulls computation.  The main
problem is a routing problem: how to connect input to computation to
output in a way that pushes the computation towards being CPU-bound
rather than data-bound.

To make this practical, two mechanisms can be used:

- Make everything routable / network-accessible (e.g. NFS), sketched
  below
- Make computation movable

The former is straightforward.  The latter is where the real problem
lies.
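To make the first mechanism concrete, here is a minimal sketch of
"make everything routable" using NFS.  The hostnames (hostA, hostB) and
the 10.8.0.0/24 subnet are hypothetical placeholders for the inner
OpenVPN network, not my actual configuration:

  # On hostB, the node that holds /home/tom: export it to the inner
  # network, then reload the export table.
  echo '/home/tom 10.8.0.0/24(rw,sync,no_subtree_check)' >> /etc/exports
  exportfs -ra

  # On hostA (or any other inner node): mount it at the same path, so
  # every host sees the same namespace.
  mount -t nfs hostB:/home/tom /home/tom

With the same mount point everywhere, a computation can move to another
host without changing the paths it operates on.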
Entry: Make computation movable
Date: Sun Aug 14 12:55:08 EDT 2016

This is in essence about commutation diagrams that relate computations
and data transports.

Example: a user edits and compiles software over a slow network.

- the user's terminal is attached to host A
- the user's data (source tree) is stored on host B
- the connection between A and B is slow

The chain: display terminal - editor - compiler - data

The editor (emacs) and the compiler can each be run on either host:

  Edit  Compile  limiting resource
  ----------------------------------------------------
  A     A        bandwidth over NFS during compilation
  A     B        bandwidth over NFS/SSH during editing
  B     B        latency between terminal and editor

So this is a chain of two "computations" that can both be moved between
hosts, linked by "communications".  The optimal point seems to be A,B:
reduce latency to make edits frustration-free, but move the compilation
step close to the data to make it CPU-bound.

The point I want to make is that it should be possible to arbitrarily
move computations around to optimize resource use.  For this to work
the code needs to be the same on all hosts, which is manageable.  The
problem is to set up the data transports in such a way that all the
possible routes are actually possible.  That turns out to be the sticky
point: data transport is too heterogeneous.

In this case there are a couple of options for the 3 pipes in the chain:

  Edit,Compile  terminal  edit I/O  compile I/O
  ---------------------------------------------
  AA            local     local     NFS
  AB            local     NFS/SSH   local
  BB            SSH       local     local

Optimality is defined differently for these 3 pipes:

  terminal:     latency
  edit I/O:     latency/throughput
  compile I/O:  throughput

Also, to set up that commutation automatically, or even to switch
between configurations as you go, is not a simple task!  I.e. I would
like to be able to switch from AA to BB at a point where it makes sense
to do so, if my edit and compile phases are separated in time.
Basically I already do this in a very ad hoc way, but it is done
manually.  A missing step, e.g., is to sync data using git from one
host to another.

How to automate it?  Baby steps.

- Make sure all hosts run the same software.  Currently managed
  manually, but this could be automated using chef/puppet.
- Extend the command line to be "smart" about where to execute
  disk-intensive commands if the current directory is an NFS store.
- Automate "focus" of caches, where other stores keep track of the main
  store such that splits are possible (e.g. detach the laptop to work
  isolated from the system).

Basically: I want a distributed system, not a "cloud" system.

Entry: Smart remote commands
Date: Sun Aug 14 13:19:58 EDT 2016

See pool/bin/smart-remote: execute disk-intensive make, git, darcs, ...
commands remotely if the current directory is an NFS share.  (A sketch
of the idea is appended at the end of this log.)

Entry: The Swarm
Date: Sat Mar 25 11:54:18 EDT 2017

This all just happened, but let's storify: why did I decide to use so
many computers at once?  To learn about distributed computing.  Looking
at the future, it is inevitable to live in such a world.

Currently I have 12 fairly tightly coupled machines.  They all run
Debian, and share the /home/tom and /etc/net code repositories.  These
make up an "inner" network.  They all have some kind of real-world
constraint that requires them to be a separate hardware node.  This is
really about physical cable lengths, and about how to allow parts to be
powered off.

There are other machines, but they are really just appliances, e.g. all
the openwrt and dd-wrt routers.  These just require security patches,
but are otherwise separated from the "inner" network.

The main lessons learned:

- unified namespaces
- commutation (essentially: how to pick a computation node?)
- fork/merge
- centralization as simplification (e.g. DNS)
- refocus master (= distributed centralization)
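As a footnote to the "Smart remote commands" entry above, here is a
minimal sketch of the dispatch idea: run the command locally when the
current directory is on a local filesystem, and ssh to the NFS server
otherwise.  This is only an illustration, not the actual
pool/bin/smart-remote; it assumes util-linux findmnt is available and
does only naive argument quoting.

  #!/bin/sh
  # Usage: smart-remote make -j8    (or git, darcs, ...)
  # Run a disk-heavy command where the data lives.

  dir="$(pwd)"
  fstype="$(findmnt -n -o FSTYPE --target "$dir")"   # e.g. ext4, nfs, nfs4

  case "$fstype" in
      nfs*)
          # SOURCE is "server:/export"; TARGET is the local mount point.
          src="$(findmnt -n -o SOURCE --target "$dir")"
          mnt="$(findmnt -n -o TARGET --target "$dir")"
          host="${src%%:*}"
          export_path="${src#*:}"
          rel="${dir#$mnt}"          # path of the cwd below the mount point
          # Run the command on the file server, in the matching directory.
          exec ssh "$host" "cd '$export_path$rel' && $*"
          ;;
      *)
          # Local disk: just run the command here.
          exec "$@"
          ;;
  esac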