Sun Aug 16 12:09:57 CEST 2009

New PF interpreter

The interpreter evaluates the following recursive data structure.

( Note that this represents fully resolved byte code: it doesn't contain
  any variable references. )

      SEQ = (SUB : SUB)            ;; `:' is graph CONS
      SUB = PRIM | QUOTE | SEQ
      K   = HALT | (SUB . K)       ;; `.' is linear CONS

SEQ  Code sequencing provides the encoding of what to do _now_ and what
     to do _next_

SUB  A subroutine is a basic unit of work: invoke a primitive function,
     load a data object onto the parameter stack, or perform a
     code sequence.

K    The continuation is a stack constructed at run-time, representing
     the rest of the evaluation, finishing with HALT.

The interpreter uses this data structure as follows (in pseudo code,
an ML implementation can be found here[1]).

      sub = POP(K)
      case sub
         SEQ (now : next)  
           -> K = PUSH(now, PUSH(next, K))
           -> execute PRIM
           -> load data item on parameter stack

SUB is the main code type (the subroutine).  It is what you pass to
`run', and it is what can be found as resume points in the
continuation.  It includes primitive code and data, as well as
composite code sequences (SEQ).

When a SEQ is encoded, the two SUBs will be pushed onto K in the
correct order.  Note that this encoding automatically gives proper
tail call handling[2]: loops can be implemented using recursion
without blowing up the K stack.

                            * * *

The above is the representation of compiled code.

Concatenative (Joy-like) code can be compiled straightforwardly to
SEQ-based code.  This can be illustrated using the standard lisp
CDR-linked pair notation to represent both source code CONS pairs and
SUB pairs.  The the symbolic program

   (a b c d)     = CONS(a, CONS(b, CONS(c, CONS(d, NIL))))

is compiled to an improper list 

   (A B C . D)   = SEQ(A, SEQ(B, SEQ(C, D)))

where each symbol x is mapped to a corresponding X representing a SUB.
Resolution happens using a global vocabulary that maps symbols to
                            * * *

The code representation as a binary graph has several advantages
related to the similarity between binary graphs and binary trees (PF's
linear pair data structure for data storage and representing
the machine continuation).

    - SEQ as a pair of SUBs gives a very direct encoding of "now" vs
      "next".  This makes it really easy to represent continuations as
      a stack of SUBs.

    - The continuation K ``looks like'' a SEQ (a particular kind of
      SUB), in that it is a linear pair of a SUB and a K.

    - The machine can run without nonlinear data allocation:
      continuation push/pops are linear.

    - One-shot full and partial continuations (segments of the K
      stack) are (isomorphic to) linear lists.

    - This isomorphism obviously works also the other way: linear
      lists constructed at run time can be converted to one-shot full
      or partial continuations.  (This can be used to emulate one-shot

    - Linear one-shot closures/continuations can be _copied_ to make
      them multi-shot.  

    - They can also be _compiled_ to code (linear pairs represented by
      nonlinear SEQ pairs) so they can be reused any number of times.
      However, this is not a linear operation.

( Note that the isomorphisms need some work to figure out quoting
issues.  Essentially it will boil down to different kinds of
continuation frames. )

[1] http://zwizwa.be/darcs/libprim/pf/pf.ml
[2] http://repository.readscheme.org/ftp/papers/ai-lab-pubs/AIM-443.pdf