Tue Jun 2 21:46:51 CEST 2009

smarter bootstrapping

So, now that I have an idea about how to make the primitives work,
maybe it's possible to modify eForth to use the buffered compiler,
then bootstrap using gForth.

Once this works, the same could be done for a set of primitives
written in Scheme.  This could then be extended into a working ANS
Forth that runs in Scheme, that can be used to bootstrap standard
Forths for other architectures on top of the Staapl Forth.


 - eForth + buffered compilation

 - bootstrap self-hosted Forth for PIC18 using eForth86/gForth

 - bootstrap eForth on top of Scheme for bootstrapping the
   microcontroller Forths without eForth86/gForth

It does look like eForth is manually bootstrapped: there is just an
ASM file which contains manually compiled threaded code.

This means it can't be metacompiled easily.  I'd like to make this a
bit more convenient.

With the new parser architecture it might be possible to bootstrap
both the target Forth and the metacompiler directly from the same
source code using lazy circular programming, manually breaking cycles
in .f code if they occur.

That sounds like an interesting challenge.

The control words don't seem to be such a problem.  The parsing words
are.  Closing the loop there is the core of the problem.

However, there is a neat trick in eForth:


This makes it possible to avoid parsing words in the compiler, but
it requires the code to be threaded, not natively compiled.

This makes me think that trying to bootstrap by instantiating a 16-bit
binary image in Scheme might be feasible.  Once it is resolved and
relaxed, it can be directly transferred by mapping the primitives.

The parser can segment the code such that names can be mapped to
tokens.  From this, immediate words need to be identified so they can
be used in the compilation of the code.  This seems to be the
essential circle to break.  Parsing words defined in .f code are then
no longer part of the circle, and are written bottom-up in threaded
code.
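
To make the segmentation step concrete, here is a rough sketch in
Python (not the Staapl parser).  It assumes definitions have the plain
": name ... ;" shape, optionally followed by IMMEDIATE, and it ignores
comments, strings and parsing words entirely.  The names segment and
assign_tokens and the sample source are made up for illustration.

```python
def segment(source):
    """Split Forth-like source into (name, body_words, immediate) records."""
    words = source.split()
    defs = []
    i = 0
    while i < len(words):
        if words[i] == ":":
            name = words[i + 1]
            end = words.index(";", i + 2)      # naive: first ";" closes the definition
            body = words[i + 2:end]
            i = end + 1
            immediate = i < len(words) and words[i].upper() == "IMMEDIATE"
            if immediate:
                i += 1
            defs.append((name, body, immediate))
        else:
            i += 1                             # skip anything outside a colon definition
    return defs


def assign_tokens(defs, first_token=0):
    """Map each defined name to a token number, in definition order."""
    return {name: first_token + n for n, (name, _, _) in enumerate(defs)}


# Hypothetical sample source, only for illustration.
SOURCE = """
: DOUBLE DUP + ;
: QUAD DOUBLE DOUBLE ;
: THEN HERE SWAP ! ; IMMEDIATE
"""

defs = segment(SOURCE)
tokens = assign_tokens(defs)
immediates = {name for name, _, imm in defs if imm}

print(tokens)       # {'DOUBLE': 0, 'QUAD': 1, 'THEN': 2}
print(immediates)   # {'THEN'}
```

The set of immediate names is known after this pass without running
any Forth code, which is what the compilation step needs first.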

The "R>" trick might interfere with bootstrapping though..  Unless the
whole memory is lazy such that "@" effectively compiles the next word
in line..

There is probably going to be a problem where laziness and state
(here) will interfere.  In this respect it seems that a 2-pass
algorithm is simpler:

  1. Construct an interpreted version of the compiler as a code graph
     based on Scheme functions that string together the primitives.
     This means the compiler cannot inspect its threading mechanism
     (it's not there!).

  2. Use this version to compile the source again.
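
The two passes above could look roughly like this.  It is a sketch in
Python rather than Scheme, under heavy assumptions: three made-up
primitives, definitions already segmented into (name, body, immediate)
records, literals stored in the image without a doLIT-style prefix,
and a hypothetical immediate word "ANSWER," that computes a value at
compile time and compiles it.

```python
class Machine:
    """State shared by the primitives: a data stack and the output image."""
    def __init__(self):
        self.stack = []
        self.image = []


def lit(n):
    """Return a step that pushes the literal n."""
    return lambda m: m.stack.append(n)


# Hypothetical primitive set; "," appends the top of stack to the image.
PRIMS = {
    "DUP": lambda m: m.stack.append(m.stack[-1]),
    "+": lambda m: m.stack.append(m.stack.pop() + m.stack.pop()),
    ",": lambda m: m.image.append(m.stack.pop()),
}


def build_graph(defs):
    """Pass 1: every definition becomes a closure over earlier closures."""
    graph = dict(PRIMS)
    for name, body, _immediate in defs:
        steps = [lit(int(w)) if w.isdigit() else graph[w] for w in body]

        def run(m, steps=steps):
            for step in steps:
                step(m)

        graph[name] = run
    return graph


def compile_source(defs, graph):
    """Pass 2: walk the source again; immediates run, the rest is compiled."""
    immediates = {name for name, _, imm in defs if imm}
    m = Machine()
    entry = {}
    for name, body, _ in defs:
        entry[name] = len(m.image)
        for w in body:
            if w in immediates:
                graph[w](m)                    # interpreted version from pass 1
            else:
                m.image.append(int(w) if w.isdigit() else w)
        m.image.append("EXIT")
    return m.image, entry


DEFS = [
    ("DOUBLE", ["DUP", "+"], False),
    ("ANSWER,", ["21", "DOUBLE", ","], True),
    ("MAIN", ["ANSWER,", "DOUBLE"], False),
]

graph = build_graph(DEFS)
image, entry = compile_source(DEFS, graph)
print(entry)      # {'DOUBLE': 0, 'ANSWER,': 3, 'MAIN': 7}
print(image[7:])  # [42, 'DOUBLE', 'EXIT']
```

The closures in the graph have no return stack and no threaded
representation, so a word that plays tricks with "R>" has nothing to
inspect in pass 1.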

Hmm.. Direct execution might not work though.  Maybe simulating the
threading is better..

I do wonder if it's possible to use the second pass to go over the
tokens one by one and instantiate them.  If the immediate words are
runnable, they can generate the correct _number_ of tokens, but might
not yet be able to resolve them.  A lazy approach might work after all.
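
A small Python sketch of "correct number of tokens now, values
later".  Each compiled word occupies exactly one cell holding an
unresolved reference, so the layout is fixed before any address is
known.  Plain forward references stand in here for whatever an
immediate word would emit; the primitive numbering, the BASE address
and the names Ref, layout and resolve are assumptions for
illustration.

```python
PRIMS = {"EXIT": 0, "DUP": 1, "+": 2}   # hypothetical primitive tokens
BASE = 16                               # hypothetical start address of the image


class Ref:
    """One cell whose value (a word's address) is looked up only when forced."""
    def __init__(self, name, table):
        self.name = name
        self.table = table

    def force(self):
        return self.table[self.name]


def layout(defs):
    """Emit one unresolved cell per word and record where each definition starts."""
    table = dict(PRIMS)
    image = []
    for name, body in defs:
        table[name] = BASE + len(image)
        image.extend(Ref(word, table) for word in body)
    return image, table


def resolve(image):
    """Force every cell once all definitions have an address."""
    return [cell.force() for cell in image]


# MAIN refers to QUAD and QUAD to DOUBLE before either is defined.
DEFS = [
    ("MAIN", ["QUAD", "EXIT"]),
    ("QUAD", ["DOUBLE", "DOUBLE", "EXIT"]),
    ("DOUBLE", ["DUP", "+", "EXIT"]),
]

image, table = layout(DEFS)
print(len(image))      # 8
print(resolve(image))  # [18, 0, 21, 21, 0, 1, 2, 0]
```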

Wait... If the Forth could somehow use single assignment, lazy
bootstrapping would work just fine.  The .f file is then a
specification of a string of bytes.

Maybe all this needs is a re-interpretation of "!" and "@" ?
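
One possible reading of that re-interpretation, sketched in Python:
"!" may write each cell only once, and the stored value may be a
thunk; "@" forces the thunk on first use.  The class and method names
are made up for illustration, and cyclic dependencies between cells
are not detected (they would recurse forever).

```python
class SingleAssignmentMemory:
    """Memory where every cell is written at most once, possibly with a thunk."""
    def __init__(self):
        self.cells = {}

    def store(self, addr, value):          # re-interpreted "!"
        if addr in self.cells:
            raise ValueError(f"cell {addr} already assigned")
        self.cells[addr] = value           # plain value or zero-argument callable

    def fetch(self, addr):                 # re-interpreted "@"
        value = self.cells[addr]
        if callable(value):
            value = value()                # force the thunk on first use
            self.cells[addr] = value       # cache the result, not a new assignment
        return value


mem = SingleAssignmentMemory()
mem.store(0, lambda: mem.fetch(1) + 1)     # depends on a cell not yet written
mem.store(1, 41)

result = mem.fetch(0)
image_bytes = bytes(mem.fetch(a) for a in sorted(mem.cells))
print(result)             # 42
print(list(image_bytes))  # [42, 41]
```

Because no cell ever changes once it has a value, the order of the
stores does not matter, which is what lets the source be read as a
specification of the final bytes.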