Sun Aug 14 12:55:08 EDT 2016

Make computation movable

This is in essence about commutation diagrams that relate computations
and data transports.


User edits and compiles software over a slow network.

user terminal is attached to host A
user's data (source tree) is stored on host B

Connection between A and B is slow.

The chain:
display terminal - editor - compiler - data

Editor (emacs) and compiler can be run in these combinations:

Edit Compile  limiting resource
A    A        bandwidth over NFS during compilation
A    B        bandwidth over NFS/SSH during editing
B    B        latency between terminal and editor

So this is a chain of two "computations" that both can be moved
to/from hosts, linked by "communications".

The optimal point seems to be A,B: reduce latency to make edits
frustration-free, but move the compilation step close to the data to
make it CPU-bound.

The point I want to make is that it should be possible to arbitrarily
move computations around to optimize the resource use.  For this to
work the code needs to be the same on all hosts, which is manageable.
The problem is to set up the data transports in such a way that all
the possible routes are actually possible.

That turns out to be the sticky point.  Data transport is too

In this case there are a couple of options for the 3 pipes in the
chain:

   terminal  edit I/O  compile I/O
AA local     local     NFS
AB local     NFS/SSH   local
BB SSH       local     local

Optimality is defined differently for these 3 pipes:

terminal:    latency
edit I/O:    latency/throughput
compile I/O: throughput
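
As a minimal sketch, the two tables above can be written down as data
so a script can reason about them.  The ranking rule (count how many
of the currently active pipes cross the slow link) and the notion of
passing in a set of "active" pipes are assumptions made for
illustration, not something measured.

```python
# Transport used by each pipe for every placement (edit host, compile host),
# copied from the table above.
TRANSPORTS = {
    "AA": {"terminal": "local", "edit I/O": "local", "compile I/O": "NFS"},
    "AB": {"terminal": "local", "edit I/O": "NFS/SSH", "compile I/O": "local"},
    "BB": {"terminal": "SSH", "edit I/O": "local", "compile I/O": "local"},
}

# What each pipe is sensitive to, copied from the list above.
CRITERIA = {
    "terminal": {"latency"},
    "edit I/O": {"latency", "throughput"},
    "compile I/O": {"throughput"},
}


def remote_pipes(placement):
    """Return {pipe: criteria} for the pipes that cross the slow A-B link."""
    return {
        pipe: CRITERIA[pipe]
        for pipe, transport in TRANSPORTS[placement].items()
        if transport != "local"
    }


def rank_placements(active_pipes):
    """Order placements by how many active pipes go over the slow link.

    Hypothetical heuristic: every remote pipe counts the same.  A real
    version would weigh measured latency and throughput instead.
    """
    def cost(placement):
        return len(set(remote_pipes(placement)) & set(active_pipes))

    # sorted() is stable, so ties keep the table order AA, AB, BB.
    return sorted(TRANSPORTS, key=cost)
```

With this, an editing phase (terminal plus edit I/O active) ranks AA
first, while a compile-only phase ranks AB and BB ahead of AA.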

Also, setting up that commutation automatically, or even switching
between the options as you go, is not a simple task!

I.e. I would like to be able to switch from AA to BB at a point where
it makes sense to do so if my edit and compile phases are separated in
time.

Basically I already do this in a very ad hoc way, but it is done
manually.  A missing step, e.g., is to sync data using git from one host
to another.
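
A rough sketch of what that git sync step could look like when
scripted.  It only builds the command lines; nothing is executed.  The
function names, the idea of pulling on the destination host over ssh,
and the remote name passed in are all assumptions: the destination
clone would need a git remote that points back at the source host.

```python
import shlex


def is_clean(porcelain_output):
    """True if `git status --porcelain` printed nothing (no pending changes)."""
    return porcelain_output.strip() == ""


def status_command(repo_path):
    """Command to check for uncommitted changes in the local clone."""
    return ["git", "-C", repo_path, "status", "--porcelain"]


def pull_command(dst_host, repo_path, remote, branch):
    """Command that fast-forwards the clone on dst_host from `remote`.

    ssh hands its arguments to a shell on the remote side, so each piece
    is quoted and the remote command is passed as a single string.
    """
    remote_cmd = "git -C {} pull --ff-only {} {}".format(
        shlex.quote(repo_path), shlex.quote(remote), shlex.quote(branch)
    )
    return ["ssh", dst_host, remote_cmd]
```

The lists can be handed to subprocess.run() as they are.  Using
--ff-only makes the sync refuse to merge if the two hosts have
diverged, which seems the safer default for an automated step.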

How to automate it?

Baby steps.

- Make sure all hosts run the same software.  Currently managed
  manually but could be automated using chef/puppet.

- Extend command line to be "smart" about where to execute
  disk-intensive commands if the current directory is an NFS store.

- Automate "focus" of caches, where other stores keep track of the main
  store such that splits are possible (e.g. detach laptop to work
  isolated from the system).
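
The second step above (a command line that is "smart" about NFS) could
start from something like the following sketch.  It assumes Linux,
where /proc/mounts lists one mount per line as "device mountpoint
fstype ...", and it assumes the NFS export path is also the real path
on the server.  The function names are made up for illustration.

```python
import os


def parse_mounts(text):
    """Parse /proc/mounts-style text into (device, mountpoint, fstype) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            entries.append((fields[0], fields[1], fields[2]))
    return entries


def mount_for(path, entries):
    """Return the entry whose mount point is the longest prefix of path."""
    best = None
    for device, mountpoint, fstype in entries:
        prefix = mountpoint.rstrip("/") + "/"
        if (path + "/").startswith(prefix):
            if best is None or len(mountpoint) > len(best[1]):
                best = (device, mountpoint, fstype)
    return best


def execution_target(cwd, mounts_text):
    """Return (host, remote_dir) if cwd is on NFS, or None to run locally."""
    entry = mount_for(cwd, parse_mounts(mounts_text))
    if entry is None or not entry[2].startswith("nfs"):
        return None
    device, mountpoint, _ = entry
    # NFS devices look like "host:/export/path".
    host, _, export = device.partition(":")
    relative = os.path.relpath(cwd, mountpoint)
    remote_dir = export if relative == "." else os.path.join(export, relative)
    return host, remote_dir
```

A shell wrapper could call execution_target(os.getcwd(),
open("/proc/mounts").read()) and, when it gets a host back, run the
disk-intensive command there over ssh instead of locally.  Mount
points containing spaces (escaped as \040 in /proc/mounts) are not
handled here.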

Basically: I want a distributed system, not a "cloud" system.