cache - compilation, interpretation, representation and cache management

This is a collection of ideas about what I perceive as the main
problem in IT management: the fact that compilation takes a long time,
and the fact that management of compilation results (caches) is a hard
problem in itself.  There has to be a better way to look at this.

As a side goal, is it possible to build a "personal IT system" that
takes these ideas into consideration?

Entry: btrfs + lxc migration
Date: Wed Nov 30 10:03:26 EST 2016

- shut down the container
- snapshot current state as read-only
- (optionally) remove the writable volume
- determine the last common ancestor between source and destination
- push an update (optionally, retry on error)
- make a writable snapshot
- start the container on the remote host

The difficult parts here are:

- finding the common ancestor
- retrying on error

Common ancestor list:

- both source and destination list their ancestors
- merge the lists and sort
- dedupe, removing all entries that are not duplicated

Alternatively: compute the intersection of the two lists as sets, then
sort the result.  That would be easy to do in a real language with
data structures, but can it be done quickly on the command line?

http://www.commandlinefu.com/commands/view/5710/intersection-between-two-files

grep -Fx -f file1 file2

EDIT: see subvol-migrate.sh

EDIT: it seems that moving modified snapshots back doesn't work well,
because btrfs uses different UUIDs for local parents and received
snapshots.  Going to use this only in a "germ line" configuration,
with a master on a USB 3.0 SSD.

Entry: migration: treating machine state as cache
Date: Wed Nov 30 10:05:07 EST 2016

I've been thinking about migration, and it is quite hard to implement
properly.  What seems simpler is to migrate only the meaningful
application state, and to treat all the rest of the state (which is
usually *much* larger) as cache.
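A minimal sketch of this split, with hypothetical paths and `tr`
standing in for an arbitrary derivation step: keep the meaningful
state in a small seed file, and rebuild everything else as a
discardable cache.

```shell
#!/bin/sh
# Minimal sketch: application state is a small seed file; everything
# under CACHE is derived from it and can be discarded at any time.
# The paths and the 'tr' derivation step are hypothetical stand-ins.
set -e
STATE=./state/seed.txt     # small, worth migrating
CACHE=./cache              # large, rebuilt on demand

rebuild_cache() {
    rm -rf "$CACHE"
    mkdir -p "$CACHE"
    # derive cache content from the seed (stand-in transform)
    tr 'a-z' 'A-Z' < "$STATE" > "$CACHE/derived.txt"
}

mkdir -p ./state
echo "hello" > "$STATE"
rebuild_cache
cat "$CACHE/derived.txt"   # prints HELLO
```

Migration then only has to move the seed file; the destination can
always run the rebuild step itself.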
This is what android does[1]: save only the significant portion of the
application state, and allow the OS to simply kill the application
process when it is swapped out.  This idea can be pushed very far.

[1] https://developer.android.com/reference/android/app/Activity.html#ActivityLifecycle

Entry: Abstraction, compilation, caching
Date: Wed Nov 30 10:11:42 EST 2016

To allow high abstraction, one needs:

- compilation (meta-programming)
- proper caching of compilation results

Compilation translates a high level description into a low level
implementation, and can be (and should be?) towered.

High level:

- infinite memory model with garbage collection, closures,
  continuations, tasks, and type systems that eliminate meaningless
  corner cases

Low level:

- state machines, cached code and data (obtained from higher level
  descriptions)

Note that compilation allows (target) code and data to be treated in
similar ways: functionality can be partitioned into interpreter and
data in arbitrary ways that suit the requirements of the
representation (usually trading off execution speed against memory
usage).

Entry: Seed interpreter
Date: Wed Nov 30 10:22:58 EST 2016

The main take-away: do not design software in a way that requires
compilation to be done manually.  Managing the full-stack compilation
cache is hard, but should be automated.  This is the only real problem
of IT management.  Manually managing caches is busywork - try to solve
the real problem instead.

I believe it is essential that the entire stack be built such that it
can be recreated from a single seed:

- source code
- seed interpreter

I wonder if it is possible to start building such a thing, and what
the minimal "seed interpreter" would be.  For most practical use,
currently, my seed interpreter is a PC with an (intensely) customized
Debian installation.  This is too large!

One of the big advantages of letting the cache be built from a
description is to allow rebuilds when there are problems, e.g. OTP's
supervisors.
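A supervisor-style rebuild can be sketched in shell.  The file names
are hypothetical and `tr` again stands in for an actual compilation
step; the point is that a failed health check triggers a rebuild from
the seed rather than a manual repair.

```shell
#!/bin/sh
# Supervisor-style sketch (hypothetical names): when the derived
# artifact is missing or fails its health check, rebuild it from the
# seed instead of repairing it by hand.
set -e
SEED=seed.txt
OUT=build/out.txt

build() {
    mkdir -p build
    tr 'a-z' 'A-Z' < "$SEED" > "$OUT"   # stand-in for compilation
}

healthy() {
    [ -s "$OUT" ]          # non-empty artifact counts as healthy
}

echo "restartable" > "$SEED"
healthy || build           # on failure, rebuild from the seed
cat "$OUT"                 # prints RESTARTABLE
```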
This solves a practical problem (humans write incorrect code) by
relying on a simpler "germ line" of humanly meaningful state
propagation that is easier to get correct.

Entry: why is lxc-create so fast on btrfs?
Date: Wed Nov 30 10:26:41 EST 2016

After doing it once, it seems that creating a copy from a template is
quite fast.

Entry: updating caches
Date: Wed Nov 30 10:31:50 EST 2016

Compilation from scratch is very time consuming, so the main problem
in making a cached architecture workable is to incrementally update
caches.  Examples:

- make
- apt-get update
- btrfs send -p v1 v2 | btrfs receive .

Entry: Representation of state: example: restarting emacs
Date: Wed Nov 30 10:36:28 EST 2016

Currently I rely on keeping emacs running, always.  It is probably
better to move to a more robust configuration where it can be
restarted from scratch.  There is already the desktop file, which
works well for just reloading source files.  However, restarting
processes is still a problem, and likely needs to be done manually, or
hooked into buffer save.

IDEA: This is the same idea as the use of thunks in lazy evaluation.
Writing those thunks, or their idempotent relatives, is sometimes
quite hard.  It's likely also the wrong problem, but indicative of
some lost relation between source and cached state.

Entry: Commutation - migration vs caches
Date: Wed Nov 30 10:50:23 EST 2016

Find some more examples of this commutation, and how to manage it
better:

- modify source, then recompile what is necessary (make)
- automated update of modified binaries (apt-get upgrade)

EDIT: The commutation here is the tradeoff between rebuilding and
transfer:

- transfer the seed, then (possibly incrementally) rebuild
- transfer the cache (possibly incrementally)

The speedup comes from the incremental part, but the diagram is
different.  Ideally, the location of the incremental step should be
automatable, but that is the part that is often not possible, due to
the complexity of the build chain.
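The commutation itself can be demonstrated with a minimal sketch
(hypothetical names; `tr` stands in for a pure build step):
transferring the seed and then building should give the same cache as
building and then transferring the cache.

```shell
#!/bin/sh
# Sketch of the two commuting paths (hypothetical names; 'tr' stands
# in for a pure, deterministic build step).
set -e
build() { tr 'a-z' 'A-Z' < "$1"; }

mkdir -p src dst1 dst2
echo "seed" > src/seed.txt

# Path 1: transfer the seed, then rebuild at the destination.
cp src/seed.txt dst1/seed.txt
build dst1/seed.txt > dst1/cache.txt

# Path 2: build at the source, then transfer the cache.
build src/seed.txt > src/cache.txt
cp src/cache.txt dst2/cache.txt

# Both paths should yield an identical cache.
diff dst1/cache.txt dst2/cache.txt && echo "paths commute"
```

The diagram commutes only when the build step is a pure function of
the seed; any hidden input (clock, environment, network) breaks the
equivalence, which is exactly why automating the incremental step is
hard in practice.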