Thu Aug 26 09:56:18 CEST 2010

Hash consing for static data structures: mixing Flash and RAM

As I mentioned before, sometimes an application builds an elaborate
data structure that's constant once it's built.  In embedded apps,
such a structure can go in Flash.

The problem is: ROM is "viral".  By its very definition, you can't
have static data depend on dynamic one, so all code that eventually
produces a static structure needs to be lifted to the next stage.

This is what the Racket[1] macro system is about.

So it is a bootstrapping problem.

Is it possible to somehow find the fixpoint in a simpler way?

I.e. instead of having init code build a RAM structure, have it:
     - build _code_ for initializing the structure in Flash
     - verify the current structure in Flash.

I.e. you compile and run the program at least twice.  The first time
the Flash verification fails (because there is no structure yet).  The
second time the generated Flash code is included and verified.

This reminds me of hash consing[2] for data structure sharing.

The problem however is to work around explicit mutation that's
necessary to build cyclic data structures.  It seems really
straightforward to do this for directed structures though.

What are the prerequisites?

  * Objects need to be constant.  No mutation!

That's really it.  The rest is a representation problem.  Data is
shared when it is there, and not when it is not.  The Flash is then a
pre-constructed collection of data from which we can draw.

If no cicular references are needed this is not a real limitation.

However, when circular references are important, how would we solve
it?  What is really needed is a mechanism to create circular
structures without mutation.  (The Y combinator for data).

Actually, the solution is quite simple.  You just need indirection
through names (just like the Y combinator) and somehow abstract name
resolution.  If each structure has a name (which is not its pointer!),
you can refer to structures by name inside the constructor, but you
construct fully resolved structures.  Then you hash-cons based on
name, and you manually resolve all non-hash-consed RAM structs.

[1] http://en.wikipedia.org/wiki/Racket_%28programming_language%29
[2] http://en.wikipedia.org/wiki/Hash_consing