cache - compilation, interpretation, representation and cache management

This is a collection of ideas about what I perceive as the main
problem in IT management: the fact that compilation takes a long time,
and the fact that management of compilation results (caches) is a hard
problem in itself.  There has to be a better way to look at this.

As a side goal, is it possible to build a "personal IT system" that
takes these ideas into consideration?

Entry: btrfs + lxc migration
Date: Wed Nov 30 10:03:26 EST 2016

- shut down the container
- snapshot current state as read-only
- (optionally) remove the writable volume
- determine the last common ancestor between source and destination
- push an update (optionally, retry on error)
- make a writable snapshot
- start the container on the remote host

The difficult parts here are:

- finding the common ancestor
- retrying on error

Common ancestor list:

- both source and destination list their ancestors
- merge the lists and sort
- dedupe, removing all entries that are not duplicated

Alternatively: compute the intersection of the two lists as sets, then
sort the result.  That would be easy to do in a real language with
data structures, but can it be done quickly on the command line?

http://www.commandlinefu.com/commands/view/5710/intersection-between-two-files

grep -Fx -f file1 file2

EDIT: see subvol-migrate.sh

EDIT: it seems that moving modified snapshots back doesn't work well,
because btrfs uses different UUIDs for local parents and received
snapshots.  Going to use this only in a "germ line" configuration,
with a master on a USB 3.0 SSD.

Entry: migration: treating machine state as cache
Date: Wed Nov 30 10:05:07 EST 2016

I've been thinking about migration, and it is quite hard to implement
properly.  What seems simpler is to migrate only the meaningful
application state, and to treat all the rest of the state (which is
usually *much* larger) as cache.
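A minimal sketch of this split, with hypothetical paths and `tr`
standing in for an arbitrary derivation step: keep the meaningful
state in a small seed file, and rebuild everything else as a
discardable cache.

```shell
#!/bin/sh
# Minimal sketch: application state is a small seed file; everything
# under CACHE is derived from it and can be discarded at any time.
# The paths and the 'tr' derivation step are hypothetical stand-ins.
set -e
STATE=./state/seed.txt     # small, worth migrating
CACHE=./cache              # large, rebuilt on demand

rebuild_cache() {
    rm -rf "$CACHE"
    mkdir -p "$CACHE"
    # derive cache content from the seed (stand-in transform)
    tr 'a-z' 'A-Z' < "$STATE" > "$CACHE/derived.txt"
}

mkdir -p ./state
echo "hello" > "$STATE"
rebuild_cache
cat "$CACHE/derived.txt"   # prints HELLO
```

Migration then only has to move the seed file; the destination can
always run the rebuild step itself.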
This is what android does[1]: save only the significant portion of the
application state, and allow the OS to simply kill the application
process when it is swapped out.  This idea can be pushed very far.

[1] https://developer.android.com/reference/android/app/Activity.html#ActivityLifecycle

Entry: Abstraction, compilation, caching
Date: Wed Nov 30 10:11:42 EST 2016

To allow high abstraction, one needs:

- compilation (meta-programming)
- proper caching of compilation results

Compilation translates a high level description into a low level
implementation, and can be (and should be?) towered.

High level:

- infinite memory model with garbage collection, closures,
  continuations, tasks, and type systems that eliminate meaningless
  corner cases

Low level:

- state machines, cached code and data (obtained from higher level
  descriptions)

Note that compilation allows (target) code and data to be treated in
similar ways: functionality can be partitioned into interpreter and
data in arbitrary ways that suit the requirements of the
representation (usually trading off execution speed against memory
usage).

Entry: Seed interpreter
Date: Wed Nov 30 10:22:58 EST 2016

The main take-away: do not design software in a way that requires
compilation to be done manually.  Managing the full-stack compilation
cache is hard, but should be automated.  This is the only real problem
of IT management.  Manually managing caches is busywork - try to solve
the real problem instead.

I believe it is essential that the entire stack be built such that it
can be recreated from a single seed:

- source code
- seed interpreter

I wonder if it is possible to start building such a thing, and what
the minimal "seed interpreter" would be.  For most practical use,
currently, my seed interpreter is a PC with an (intensely) customized
Debian installation.  This is too large!

One of the big advantages of letting the cache be built from a
description is to allow rebuilds when there are problems, e.g. OTP's
supervisors.
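A supervisor-style rebuild can be sketched in shell.  The file names
are hypothetical and `tr` again stands in for an actual compilation
step; the point is that a failed health check triggers a rebuild from
the seed rather than a manual repair.

```shell
#!/bin/sh
# Supervisor-style sketch (hypothetical names): when the derived
# artifact is missing or fails its health check, rebuild it from the
# seed instead of repairing it by hand.
set -e
SEED=seed.txt
OUT=build/out.txt

build() {
    mkdir -p build
    tr 'a-z' 'A-Z' < "$SEED" > "$OUT"   # stand-in for compilation
}

healthy() {
    [ -s "$OUT" ]          # non-empty artifact counts as healthy
}

echo "restartable" > "$SEED"
healthy || build           # on failure, rebuild from the seed
cat "$OUT"                 # prints RESTARTABLE
```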
This solves a practical problem (humans write incorrect code) by
relying on a simpler "germ line" of humanly meaningful state
propagation that is easier to get correct.

Entry: why is lxc-create so fast on btrfs?
Date: Wed Nov 30 10:26:41 EST 2016

After doing it once, it seems that creating a copy from a template is
quite fast.

Entry: updating caches
Date: Wed Nov 30 10:31:50 EST 2016

Compilation from scratch is very time consuming, so the main problem
in making a cached architecture workable is to incrementally update
caches.  Examples:

- make
- apt-get update
- btrfs send -p v1 v2 | btrfs receive .

Entry: Representation of state: example: restarting emacs
Date: Wed Nov 30 10:36:28 EST 2016

Currently I rely on keeping emacs running, always.  It is probably
better to move to a more robust configuration where it can be
restarted from scratch.  There is already the desktop file, which
works well for just reloading source files.  However, restarting
processes is still a problem, and likely needs to be done manually, or
hooked into buffer save.

IDEA: This is the same idea as the use of thunks in lazy evaluation.
Writing those thunks, or their idempotent relatives, is sometimes
quite hard.  It's likely also the wrong problem, but indicative of
some lost relation between source and cached state.

Entry: Commutation - migration vs caches
Date: Wed Nov 30 10:50:23 EST 2016

Find some more examples of this commutation, and how to manage it
better:

- modify source, then recompile what is necessary (make)
- automated update of modified binaries (apt-get upgrade)

EDIT: The commutation here is the tradeoff between rebuilding and
transfer:

- transfer the seed, then (possibly incrementally) rebuild
- transfer the cache (possibly incrementally)

The speedup comes from the incremental part, but the diagram is
different.  Ideally, the location of the incremental step should be
automatable, but that is the part that is often not possible, due to
the complexity of the build chain.
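The commutation itself can be demonstrated with a minimal sketch
(hypothetical names; `tr` stands in for a pure build step):
transferring the seed and then building should give the same cache as
building and then transferring the cache.

```shell
#!/bin/sh
# Sketch of the two commuting paths (hypothetical names; 'tr' stands
# in for a pure, deterministic build step).
set -e
build() { tr 'a-z' 'A-Z' < "$1"; }

mkdir -p src dst1 dst2
echo "seed" > src/seed.txt

# Path 1: transfer the seed, then rebuild at the destination.
cp src/seed.txt dst1/seed.txt
build dst1/seed.txt > dst1/cache.txt

# Path 2: build at the source, then transfer the cache.
build src/seed.txt > src/cache.txt
cp src/cache.txt dst2/cache.txt

# Both paths should yield an identical cache.
diff dst1/cache.txt dst2/cache.txt && echo "paths commute"
```

The diagram commutes only when the build step is a pure function of
the seed; any hidden input (clock, environment, network) breaks the
equivalence, which is exactly why automating the incremental step is
hard in practice.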