Sun Aug 14 12:55:08 EDT 2016
Make computation movable
This is in essence about commutative diagrams that relate computations
and data transports.
User edits and compiles software over a slow network:
- The user's terminal is attached to host A.
- The user's data (source tree) is stored on host B.
- The connection between A and B is slow.
display terminal - editor - compiler - data
Editor (emacs) and compiler can each be run on either host:

Edit  Compile  Limiting resource
A     A        bandwidth over NFS during compilation
A     B        bandwidth over NFS/SSH during editing
B     B        latency between terminal and editor
So this is a chain of two "computations" that both can be moved
to/from hosts, linked by "communications".
The optimal point seems to be A,B: reduce latency to make edits
frustration-free, but move the compilation step close to the data to
make it CPU-bound.
The point I want to make is that it should be possible to arbitrarily
move computations around to optimize the resource use. For this to
work the code needs to be the same on all hosts, which is manageable.
The problem is to set up the data transports in such a way that all
the possible routes are actually possible.
That turns out to be the sticky point. Data transport is too
In this case there are a couple of options for the 3 pipes in the
chain:

      terminal  edit I/O  compile I/O
AA    local     local     NFS
AB    local     NFS/SSH   local
BB    SSH       local     local
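The table can be derived mechanically from the chain terminal - editor - compiler - data. Here is a minimal sketch, assuming a pipe is "local" exactly when both of its ends sit on the same host. The names `pipes` and `placements` are made up for illustration, and the remote transport labels are simply the ones used in the table.

```python
from itertools import product

TERMINAL_HOST = "A"  # the user's terminal is attached here
DATA_HOST = "B"      # the source tree is stored here


def pipes(editor_host, compiler_host):
    """Return the transport for each of the 3 pipes in the chain.

    Assumption: a pipe is local when both ends are on the same host;
    otherwise it uses the remote transport named in the table.
    """
    return {
        "terminal": "local" if editor_host == TERMINAL_HOST else "SSH",
        "edit I/O": "local" if editor_host == compiler_host else "NFS/SSH",
        "compile I/O": "local" if compiler_host == DATA_HOST else "NFS",
    }


def placements():
    """Enumerate every (editor, compiler) placement over hosts A and B."""
    return {e + c: pipes(e, c) for e, c in product("AB", repeat=2)}


if __name__ == "__main__":
    for name, p in placements().items():
        print(name, p["terminal"], p["edit I/O"], p["compile I/O"])
```

Under this model the fourth combination, BA, makes all three pipes remote, which is presumably why it is not worth listing.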
Optimality is defined differently for these 3 pipes:
edit I/O: latency/throughput
compile I/O: throughput
Also, setting up that commutation automatically, or even switching
between the configurations as you go, is not a simple task!
I.e. I would like to be able to switch from AA to BB at a point where
it makes sense to do so if my edit and compile phases are separated in
time.
Basically I already do this in a very ad hoc way, but it is done
manually. A missing step is, e.g., to sync data using git from one host
to the other.
How to automate it?
- Make sure all hosts run the same software. Currently managed
manually but could be automated using chef/puppet.
- Extend command line to be "smart" about where to execute
disk-intensive commands if the current directory is an NFS store.
- Automate "focus" of caches, where other stores keep track of the main
  store such that splits are possible (e.g. detach laptop to work
  isolated from the system).
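The "smart" command line idea from the list above could look roughly like this. It is a sketch only, under several assumptions: the mount table has the /proc/mounts layout (source, mount point, fstype, ...), the NFS source has the form host:/export, the export path maps 1:1 onto the mount point, and the same command exists on the server. The names `find_mount` and `route` are hypothetical.

```python
import posixpath
import shlex


def find_mount(path, mounts_text):
    """Return (source, mountpoint, fstype) for the longest mount point
    that contains path, or None if nothing matches."""
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        source, mountpoint, fstype = fields[:3]
        prefix = mountpoint.rstrip("/") + "/"
        if path == mountpoint or path.startswith(prefix):
            if best is None or len(mountpoint) > len(best[1]):
                best = (source, mountpoint, fstype)
    return best


def route(cwd, argv, mounts_text):
    """Return the argv to run: unchanged for a local directory,
    wrapped in ssh to the file server when cwd is on an NFS mount."""
    mount = find_mount(cwd, mounts_text)
    if mount is None or not mount[2].startswith("nfs") or ":" not in mount[0]:
        return list(argv)
    source, mountpoint, _ = mount
    host, export = source.split(":", 1)
    # Assumption: the directory below the mount point has the same
    # relative path below the export on the server.
    rel = posixpath.relpath(cwd, mountpoint)
    remote_dir = posixpath.normpath(posixpath.join(export, rel))
    remote_cmd = "cd {} && {}".format(shlex.quote(remote_dir), shlex.join(argv))
    return ["ssh", host, remote_cmd]


if __name__ == "__main__":
    # Sample mount table; on Linux the real one can be read from /proc/mounts.
    sample = "/dev/sda1 / ext4 rw 0 0\nB:/export/src /home/user/src nfs4 rw 0 0\n"
    print(route("/home/user/src/proj", ["make", "-j4"], sample))
```

The returned list can be handed to `subprocess.run` as-is. Deciding which commands count as disk-intensive is left out here; a wrapper would need some list or heuristic for that.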
Basically: I want a distributed system, not a "cloud" system.