Opposing the cloud -- How to build a truly distributed personal computing system

The time has come to learn how to use more than one computer at once.
What was once the PC is now the network.  The mainstream approach is to
move everything into the cloud and put only somewhat smart terminals in
the hands of consumers.  This, I don't like.  This log reports on an
attempt to understand the problem, and to build some effective
automation.

Entry: The Unit Machine
Date: Sun Aug 14 12:11:58 EDT 2016

- Platform

  Limited to Debian Linux, ideally synchronized to run the same versions
  on all hosts to reduce complexity.  This can be relaxed when interfaces
  allow it and it is needed for other reasons, e.g. to use OpenWRT or a
  uC RTOS, or to mix Debian stable and testing/experimental for security
  and stability reasons.

- Access control / attack surface

  The system is split into an inside and an outside.  The working
  assumption is that if one internal node is compromised, all nodes are
  compromised.  Local users are trusted.

  The external internet attack surface consists of SSH with pubkey
  authentication.  This is unlikely to be a problem.  Takeover is most
  likely to happen through a trojan or a malicious external web service,
  followed by local privilege escalation.  That risk seems acceptable,
  and similar to single-host operation.

  Node interconnect is done over OpenVPN with shared keys.  This makes
  it possible to distinguish between possibly malicious IoT devices and
  the unit machine.  Where access is made to the outside world, care is
  taken to use "dumb" protocols, client-originated requests, and
  high-level language protocol parsers.

- Trusted internal network

  The internal network inside the OpenVPN space has relaxed constraints,
  and uses NFS for file sharing, the Erlang distribution protocol for
  connecting host control software, and pubkey SSH access for remote
  command execution.  Internal protocols still have security mechanisms
  that protect against bugs and against unsophisticated malicious users
  who cannot perform a service takeover followed by a local escalation
  attack.

Basically, I spent a lot of time being paranoid and trying to understand
the trust structure, but the main point is that this seems reasonable to
the point of not needing more attention.

Entry: Data and Computation
Date: Sun Aug 14 12:47:53 EDT 2016

To make a distributed network behave as a normal PC - which is a lot of
what this is about - means figuring out where to put each of these 3
components:

- data access (e.g. storage, peripherals)
- computation
- user interface

The pattern I've found is that often the user interface and the data or
I/O are on different hosts, and the main question is where to run the
computation.  Some "forces":

- high data bandwidth pulls computation
- high I/O bandwidth pulls computation

Basically there is only one pattern: data pulls computation.  The main
problem is a routing problem: how to connect input to computation to
output in a way that pushes the computation towards being CPU-bound
rather than data-bound.

To make this practical, two mechanisms can be used:

- Make everything routable / network-accessible (e.g. NFS), sketched
  below
- Make computation movable

The former is straightforward.  The latter is where the real problem
lies.
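To make the first mechanism concrete, here is a minimal sketch of
"make everything routable" using NFS.  The hostnames (hostA, hostB) and
the 10.8.0.0/24 subnet are hypothetical placeholders for the inner
OpenVPN network, not my actual configuration:

  # On hostB, the node that holds /home/tom: export it to the inner
  # network, then reload the export table.
  echo '/home/tom 10.8.0.0/24(rw,sync,no_subtree_check)' >> /etc/exports
  exportfs -ra

  # On hostA (or any other inner node): mount it at the same path, so
  # every host sees the same namespace.
  mount -t nfs hostB:/home/tom /home/tom

With the same mount point everywhere, a computation can move to another
host without changing the paths it operates on.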
Entry: Make computation movable
Date: Sun Aug 14 12:55:08 EDT 2016

This is in essence about commutation diagrams that relate computations
and data transports.

Example: a user edits and compiles software over a slow network.

- the user's terminal is attached to host A
- the user's data (source tree) is stored on host B
- the connection between A and B is slow

The chain: display terminal - editor - compiler - data

The editor (emacs) and the compiler can each be run on either host:

  Edit  Compile  limiting resource
  ----------------------------------------------------
  A     A        bandwidth over NFS during compilation
  A     B        bandwidth over NFS/SSH during editing
  B     B        latency between terminal and editor

So this is a chain of two "computations" that can both be moved between
hosts, linked by "communications".  The optimal point seems to be A,B:
reduce latency to make edits frustration-free, but move the compilation
step close to the data to make it CPU-bound.

The point I want to make is that it should be possible to arbitrarily
move computations around to optimize resource use.  For this to work
the code needs to be the same on all hosts, which is manageable.  The
problem is to set up the data transports in such a way that all the
possible routes are actually possible.  That turns out to be the sticky
point: data transport is too heterogeneous.

In this case there are a couple of options for the 3 pipes in the chain:

  Edit,Compile  terminal  edit I/O  compile I/O
  ---------------------------------------------
  AA            local     local     NFS
  AB            local     NFS/SSH   local
  BB            SSH       local     local

Optimality is defined differently for these 3 pipes:

  terminal:     latency
  edit I/O:     latency/throughput
  compile I/O:  throughput

Also, to set up that commutation automatically, or even to switch
between configurations as you go, is not a simple task!  I.e. I would
like to be able to switch from AA to BB at a point where it makes sense
to do so, if my edit and compile phases are separated in time.
Basically I already do this in a very ad hoc way, but it is done
manually.  A missing step, e.g., is to sync data using git from one
host to another.

How to automate it?  Baby steps.

- Make sure all hosts run the same software.  Currently managed
  manually, but this could be automated using chef/puppet.
- Extend the command line to be "smart" about where to execute
  disk-intensive commands if the current directory is an NFS store.
- Automate "focus" of caches, where other stores keep track of the main
  store such that splits are possible (e.g. detach the laptop to work
  isolated from the system).

Basically: I want a distributed system, not a "cloud" system.

Entry: Smart remote commands
Date: Sun Aug 14 13:19:58 EDT 2016

See pool/bin/smart-remote: execute disk-intensive make, git, darcs, ...
commands remotely if the current directory is an NFS share.  (A sketch
of the idea is appended at the end of this log.)

Entry: The Swarm
Date: Sat Mar 25 11:54:18 EDT 2017

This all just happened, but let's storify: why did I decide to use so
many computers at once?  To learn about distributed computing.  Looking
at the future, it is inevitable to live in such a world.

Currently I have 12 fairly tightly coupled machines.  They all run
Debian, and share the /home/tom and /etc/net code repositories.  These
make up an "inner" network.  They all have some kind of real-world
constraint that requires them to be a separate hardware node.  This is
really about physical cable lengths, and about how to allow parts to be
powered off.

There are other machines, but they are really just appliances, e.g. all
the openwrt and dd-wrt routers.  These just require security patches,
but are otherwise separated from the "inner" network.

The main lessons learned:

- unified namespaces
- commutation (essentially: how to pick a computation node?)
- fork/merge
- centralization as simplification (e.g. DNS)
- refocus master (= distributed centralization)
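As a footnote to the "Smart remote commands" entry above, here is a
minimal sketch of the dispatch idea: run the command locally when the
current directory is on a local filesystem, and ssh to the NFS server
otherwise.  This is only an illustration, not the actual
pool/bin/smart-remote; it assumes util-linux findmnt is available and
does only naive argument quoting.

  #!/bin/sh
  # Usage: smart-remote make -j8    (or git, darcs, ...)
  # Run a disk-heavy command where the data lives.

  dir="$(pwd)"
  fstype="$(findmnt -n -o FSTYPE --target "$dir")"   # e.g. ext4, nfs, nfs4

  case "$fstype" in
      nfs*)
          # SOURCE is "server:/export"; TARGET is the local mount point.
          src="$(findmnt -n -o SOURCE --target "$dir")"
          mnt="$(findmnt -n -o TARGET --target "$dir")"
          host="${src%%:*}"
          export_path="${src#*:}"
          rel="${dir#$mnt}"          # path of the cwd below the mount point
          # Run the command on the file server, in the matching directory.
          exec ssh "$host" "cd '$export_path$rel' && $*"
          ;;
      *)
          # Local disk: just run the command here.
          exec "$@"
          ;;
  esac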