A ``primitive oriented'' scripting language core.

EXPERIMENTAL

The libprim[1] project has two somewhat independent goals, which are
mostly inherited from the PacketForth project[2].

1. Construct a collection of primitive C modules to build scripting
   languages.

2. Collect basic C modules for image processing which are easily
   bolted onto any kind of scripting language.


All core C modules are designed in such a way that they can be used
stand-alone in a C application, but are easily ``lifted'' into a more
expressive data/code model for prototyping and debugging.

The scripting language part of libprim constists of 3 main layers:

  * LEAF: C objects (bytes, symbol, port, parser, tuple, ...) with
    manual malloc()-based memory management.

  * EX: graph memory with automatic asynchronous mark/sweep GC, used
    for representing data expressions.  (i.e. a Scheme memory model)

  * PF/SC: VMs implementing two execution models using the EX data
    structures.  SC is a standard Scheme tree-walking interpreter.  PF
    is a linear stack language (a Scheme/Forth hybrid).

[1] http://zwizwa.be/darcs/libprim
[2] http://zwizwa.be/packetforth


Entry: A bit less cryptic
Date: Tue Aug  4 11:14:36 CEST 2009

Practically, this project is about giving ideas implicit in Packet
Forth[1] a new home.

For day-to-day problem solving in embedded software projects, C is not
going anywhere because next to a programming language, it is also an
interface standard, both for commodity machines and commodity
programmers.

Instead of focussing on building new languages, I think it is
worthwhile to do some engineering on with subsets of C itself, by
providing an infrastructure that makes it easier to bootstrap
``functional, concurrency oriented C code''.

Hopefully this idea will become easier to articulate once the
implementation is done.

The goal is this:

   * Write a re-usable collection of C primitives necessary to support
     a dynamically typed scripting language, but do this in such a way
     that it does _not_ depend on external memory or control models.

   * To test the primitives and the programming method, write a couple
     of dyntyped scripting languages (memory and control models) on
     top of the primitives.  I'm thinking about two: a PF-like
     concatenative linear stack language and a Scheme-like
     CONS/CONT/GC based language.

   * Write a compiler for static memory allocation and scheduling to
     optimize away the scripting layer.

   * Use a data-flow architecture (inspired by Oz).

Static types?  Given that the goal of the project is _prototyping_
which is mainly _evolving specification_ I think it's better to stick
to a simpler dynamically typed approach.

[1] http://zwizwa.be/packetforth


Entry: CEKS machine
Date: Tue Aug  4 11:25:38 CEST 2009

Scheme implemented in terms of a CEKS[1] machine.  Focus on the
evaluator first and ignore GC.  Some CEKS machine for
Javascript[3][4].

It might be easier to start with the CEK machine, and add the store
and GC later.

Anyways..  I started with the GC.  Bottom up seems better.  It already
gives a way to encode types.

[1] http://www.cs.utah.edu/plt/publications/pllc.pdf
[2] md5://3ab517a1167f86ef52caf3bf1a4fdb0c
[3] http://wiki.ecmascript.org/doku.php?id=meetings:dave_herman_presentation
[4] http://www.ccs.neu.edu/home/dherman/javascript/
[5] http://www.cs.utah.edu/~mflatt/past-courses/cs6520/public_html/s00/secd.ps

Entry: Lightweight tasks
Date: Tue Aug  4 12:20:48 CEST 2009

Based on the interface present in GNU PTH[1].

[1] http://www.gnu.org/software/pth/


Entry: Primitives and Composition, an OS?
Date: Tue Aug  4 12:25:20 CEST 2009

Any form of language design is based on building primitives (axioms)
and composition mechanism (inference rules).

Is is a stretch to say that an Operating System is actually a
composition mechanism?  If tasks are primitives, gluing tasks together
is what the OS does.


Entry: Aspect oriented programming
Date: Tue Aug  4 12:29:47 CEST 2009

The idea I'm trying to encode is the addition of join points to a C
program: make all these points available to an observer (scripting
language).

[1] http://en.wikipedia.org/wiki/Aspect_oriented_programming


Entry: Garbage Collector
Date: Tue Aug  4 14:53:52 CEST 2009

I've added a simple copying GC in ceks/gc.c for allocating vectors, as
a memory model for implementing the Scheme interpreter.  It uses
tagged objects: vectors, atoms, integers, and one available user tag.

The vectors themselves can be tagged (i.e. to implement a couple of
primitive structs) when the maximum vector size size is limited.

[1] http://www.brpreiss.com/books/opus5/html/page426.html

Entry: Symbols
Date: Tue Aug  4 17:58:47 CEST 2009

Instead of using a hash table, a preliminary implementation could just
use a stack.  This will make parsing slower, but can be easily changed
later.


Entry: Continuations
Date: Tue Aug  4 20:32:34 CEST 2009

So.. ISWIM uses n-ary primitive application, but unary abstraction and
application.  Why is this?  It seems that it makes the extension of
the environment on application easier: only a single variable has to
be added.

Maybe this isn't necessary for Scheme.  A multivalued application adds
all variables at the same time.  Instead of arg,fun,opd as explained
in PLLC, I needs just a single type that evaluates all variables in an
application from left to right, starting with the function position,
and then performs either an application (extending the environment) or
a primitive evaluation (production of a value).

So how to represent (x x x _ x x x) where "_" is the hole?

In short: what does the CEKS machine do?

   It converts the current expression and its continuation in a
   simplified expression and an updated continuation.  It is
   _application_ (triggered by the availability of the last argument)
   that _pops_ the K stack.

So.. Representing continuations.

(((f a b) E) K) -> ( _ (((f E) (a E) (b E)) K))
                -> ((f E) (((a E) (b E)) K))


Entry: Objects and Lambdas
Date: Wed Aug  5 09:43:14 CEST 2009

What I'm trying to accomplish probably has a lot of parallells with
COLA[1].  However, my concern is mostly interoperability with C.

Anyways.  GC and objects.  It might be wise to require objects to
exhibit a class structure: each object points to a record of methods,
where the first couple are used by the GC.
 

[1] http://piumarta.com/software/cola/


Entry: Confused about primitives
Date: Wed Aug  5 13:14:07 CEST 2009

This is ironic: the goal is to properly define primitives, but I'm
already getting hosed!

Once conclusion to draw: there is a set of C primitives that is best
written directly in the required API (sc ,object, ...) -> object
instead of in unwrapped C, because of the tight coupling with the data
representation.  These _direct primitives_ are written manually.

In short: to keep things simple, the interpreter is written in terms
of primitive functions using all scheme datatypes (ala tinyscheme).

What cannot be avoided however is functions that convert between
scheme and C values (esp. strings<-> and ints).  it's best to limit
the use however, and try to use as much as possible real scheme
primitives to avoid duplication (don't give in to premature opti
here!).

So, let's try it:

          All C functions used in the implementation of the Scheme
          interpreter are Scheme primitives.

This fixes the primitive calling convention to:

object fn(sc*, object, ...)

This will raise other problems since I'm not sure it's possible to
have variable argument invokation (assumed from Pure Data source).


Entry: Representing constants
Date: Wed Aug  5 13:54:45 CEST 2009

I might be useful to make it possible to directly cast non-managed
objects, by assuming the class pointer is in the slot _before_ the
pointer.  It is upto the user then to _never_ place them inside GC
managed vectors, as they won't have a valid class pointer.

Elaborate on this a bit..

I.e. it would be nice for a (const char *) to be a valid object that
can reside in a collected data structure.  I.e. the object itself
should be recognizable as not requiring free().

The problem then is that I'm running out of tags:

00 non-managed pointer
01 vector
10 integer
11 bool

Using more than 2 tag bits places extra contstraints on pointer
alignment.  Booleans are useful for predicate implementation in
Scheme.

It is possible however to use the 01 vector slot, since we know what
that will point to:
     - integer (it's a live vector, and the int = size)
     - vector  (moved object)

This could be a non-managed pointer to represent a class?

Maybe another workaround is simpler: doubly wrapping all predicates
and representing bools as integers.

Let's stick to what we have and thing about this a bit more.. It looks
like bools are more useful than constants.  

Otoh: since bools require only two values, and are actually constants,
the problem is solved:


00 constant = non-managed pointer
01 vector
10 integer
11 managed pointer


Let's implement this first before rewriting everything to primitives.


Entry: Unsafe pointers
Date: Wed Aug  5 15:49:26 CEST 2009

I get it now: by making sure the atom_class pointer is the slot one
_before_ the atom pointer, it is possible to write unsafe functions
without too much hassle.

The free() pointer is stored in a class object.  This class object is
the value's type.  Some values however are only statically typed, and
appear as unsafe pointers if they make it into Scheme land.


Entry: debugging
Date: Wed Aug  5 17:48:59 CEST 2009

Hmm.. it's more difficult than I thought.  The interpreter itself I
can probably get debugged, since its structure is quite
straightforward.  However, the GC is going to be an adventure.
Currently I have no way to properly mark roots in case GC is
triggered inside a C primitive.

During the evaluation of step() we could push all allocated atoms to a
stack, and discard it at the end.  This makes sure that intermediate
data will not be freed.

Alternatively, the heap could be marked such that all objects past the
marker need to be copied first.

But this then wouldn't work if there are 2 or more collections.

Alternatively, the marking point could be translated into a vector
that is then added to a local stack of references.

The real question is: does the interpreter know what to mark?

Let's change a gc->root into a callback function.

Done.

Now it shouldn't be to hard to turn a gc marker into an array.
Basicly something like this:

        sc->marker = gc_marker(sc->gc)
        step()
        free_retain_stack()

If there is a collection we do something like

        push_retain_stack(gc_marker_to_vector(sc->marker))
        mark_all()
        sc->marker = gc_marker(sc->gc)


OR

do it manually.
which is probably too error-prone..

Ok.. what about keeping most of it hidden inside the GC.

When the GC calls mark_roots() it will pass in a vector of references
containing the objects that were saved since the last border mark.

Actually, this fancy trick doesn't work because the C stack still has
the _old_ pointers.

Not simple!

One option would be to abort the primitive whenever a collection is
necessary.  If they are written in a purely functional style this
shouldn't be a problem.


Entry: mixing GC and C
Date: Wed Aug  5 19:44:17 CEST 2009

The problem is this: a gc_alloc() inside a primitive might trigger a
collection.  At that point, all the pointers on the C stack will
become invalid.

To overcome this, all primitives that trigger gc_alloc() should be
purely functional, such that an allocation can restart the interpreter
step() that triggered the gc_alloc().

So.. If the gc itself is also made stateless (and written in CPS) then
this could work pretty well.

Lets see if gc_alloc() can be aborted.

Looks like it.. One problem however is that the current implementation
of gc_grow() will never be reached, so the program will end up in an
infinite loop when there isn't enough memory to account for a single
primitive execution.  Here the mark_wild() function would come in
handy:

If it is detected that a single primitive step cannot continue because
there is not enough heap space, the heap needs to grow.  As long as
gc_collect() doesn't abort the current GC invocation, this can be done
automatically.


Entry: constant strings
Date: Wed Aug  5 21:29:40 CEST 2009

Since these are not aligned, they cannot be objects.  So, no
constant strings.

Some goes for primitives.


Entry: First interpreter run
Date: Thu Aug  6 02:15:41 CEST 2009

# (zero? 123)

tom@zni:~/libprim/ceks$ ./gc-test
#state(#closure((zero? 123) ()) ())
#state(#closure(zero? ()) #frame(() (#closure(123 ())) ()))
#state(#closure(#prim<0x4014f3:1> ()) #frame(() (#closure(123 ())) ()))
#state(#closure(123 ()) #frame((#closure(#prim<0x4014f3:1> ())) () ()))
#state(#closure(#f ()) ())
ERROR: halt: #state(#closure(#f ()) ())
Trace/breakpoint trap


So primitive execution seems to work.  Still to test: abstraction and
application.  After fixing some bugs we're at:

# ((lambda (abc) 123) 456)

tom@zni:~/libprim/ceks$ ./gc-test
#state(#closure(((lambda (abc) 123) 456) ()) ())
#state(#closure((lambda (abc) 123) ()) #frame(() (#closure(456 ())) ()))
#state(#closure(#lambda(#(abc) 123) ()) #frame(() (#closure(456 ())) ()))
#state(#closure(456 ()) #frame((#closure(#lambda(#(abc) 123) ())) () ()))
#state(#closure(123 ((abc . #closure(456 ())))) ())
ERROR: halt: #state(#closure(123 ((abc . #closure(456 ())))) ())
Trace/breakpoint trap


Entry: Pre-emption
Date: Thu Aug  6 10:46:21 CEST 2009

So, let's install the preemption based garbage collector + write the
first `functional C' primitive task for parsing s-expressions = a
token enumerator.


Entry: constants vs. closures
Date: Thu Aug  6 12:09:08 CEST 2009

Yesterday went well.  The basic infrastructure seems to work.  Now I
need to get the small errors in the representation right.

Currently I have constants tagged to an environment.  This won't hurt,
but doesn't make much sense.  Maybe change the rep a bit.

The environment contains closures (open_term + environment) or
constants.

OK: it's way simpler to make everything a closure, but tag contstants
with an empty environment.

So.. Something is not right.  This gives an error:

   (post (cons 111 ((lambda (abc) 123) 456)))

ERROR: undefined: abc

I currently lost oversight so let's re-connect with the textbook
rules.


Entry: CEK primitives
Date: Thu Aug  6 15:01:19 CEST 2009

Looks like I'm confusing two things:

    * primitive values + fully reduced closures  (lambda + env)

    * non-fully reduced closures (application/varref/value + env)

I'd like to make sure that primitives only see primitive values and
fully reduced closures.  Also, primitives should be able to return
closures (even not fully reduced).

Where is the inconsistency?

Looking at PLLC[1] Chapter 6, p 74 at the CEK reduction rule [cek5], it
seems that primitives get passed _only_ primitive values, not
closures.

How can I make this consistent with a primitive that can operate on
closures as values?

Hmm.. Let's provide better names for the data structure.

1. A continuation frame represents the current state of argument
   evaluation from reducable closures to irreducable closures.

2. A fully evaluated continuation frame can be _applied_ which means
   that a new (reducable) closure is created by extending the closure
   in the first position with variables.


From what I could test it seems that application works fine.  So I
guess I need to figure out how to understand the difference between:

      - a machine where _all_ values are closures, but primitives can
        only accept and return primitive values (which have their
        environment dropped before primitive evaluation)

      - a machine where values are either closures or primitive
        values, and primitive functions can operate on both.

So, either the interpreter's notion of "closure" is made abstract, or
it is changed to directly operate on both primitive values and
closures.

What I could do is the following: regard interpreter state and
environment + data structures as different worlds.

STATE: the current reducable closure and continuation contain closure
structs only, but all communucation with the environment (extension
and reference) and primtives will pack/unpack closures to variables.

Ok. Added closure_pack() and closure_unpack().

[1] http://www.cs.utah.edu/plt/publications/pllc.pdf
[2] md5://3ab517a1167f86ef52caf3bf1a4fdb0c


Entry: Lists vs. applications
Date: Thu Aug  6 16:44:38 CEST 2009

Currently the implementation doesn't distinguish between applications
and lists.  The CEK machine doesn't deal with `constructors' which are
essentially non-evaluated functions.

I don't really know how this is done in practice.  Let's just wrap
code in a syntax object for now.

This seems to work.  There are now two kinds of terms: non-reduced
terms are SYNTAX while reduced terms are anything else.

Anyways.. It works, but it's probably a bit unorthodox.  This can't
represent syntax as values.  The real solution would be to represent
special forms, applications, variables references and quotes as
separate entities.

This will have to do for now.


Entry: Testing gc preemption
Date: Thu Aug  6 17:47:06 CEST 2009

Seems to work.  There is a problem though: triggering GC outside of
the mainloop is not well-defined.

This basicly boils down to the fact that the GC managed memory is
owned by the interpreter.  The caller of _sc_run() has no business
using these data structures.

It's useful for testing though.  It's necessary to perform a GC before
doing any allocation to prevent corruption.


Entry: Working?
Date: Thu Aug  6 18:51:02 CEST 2009

Looks like it's working. 

tom@zni:~/libprim/scheme$ wc -l *.[ch]
  167 gc.c
   10 gc_config.h
  133 gc.h
   54 main.c
  559 scheme.c
  201 scheme.h
   38 scheme_prim.h
   41 symbol.c
   28 symbol.h
 1231 total

It has almost nothing though.  Only evaluation + supporting prims.
Missing:
        - special forms (if, set!)
        - macros
        - reader
        - ports

The special forms might be enough so reader can maybe be written in
Scheme and compiled to C spec of the syntax tree?

Both special forms need modification to the continuation.  Assignment
will need to be done in such a way that GC interrupt isn't possible.

I'm adding a couple of structs to accomodate these different
continuation types.

Once realizing the frame type is only one of different kinds of
continuations, it was quite straigtforward to add 'if', 'set!',
'begin' and 'lambda' with multiple expressions.


Entry: Coroutines
Date: Thu Aug  6 19:30:26 CEST 2009

This paper defends the revival of (asymmetric) coroutines as a general
control abstraction[1].

[1] http://lambda-the-ultimate.org/node/2868


Entry: Macros
Date: Fri Aug  7 09:17:45 CEST 2009

This will require a syntax-level environment, and a lookup for every
form.

A macro continuation is simply a tag: the current value (s-expression)
needs to be tagged such that it is wrapped as SYNTAX and passed to the
interpreter.

I'm going to change the wording.  What is now SYNTAX should become
AST: a tag for reducable term.  SYNTAX in Scheme sense seems to be
something different: it is a data structure with binding information.
Even if I'm not going to support the latter, it's best not to confuse
things.

Ok, macros work by:  constructing a k_apply linked to k_macro.  The
k_apply is filled so it will trigger the application of the macro.
The k_macro wraps the result as an AST to trigger reduction.

Entry: C-Scheme
Date: Fri Aug  7 10:23:06 CEST 2009

Working with C-callable Scheme primitives makes the step towards a
PreScheme-like implementation easier.  Using only `let*', `if',
`begin', `set!', and a restriction on function application (only
primitives can be called from primitives) this could be turned into a
simple Scheme->C compiler.

At least, the way the interpreter step is implemented now in C +
Scheme datatypes, could be embedded into equivalent Scheme quite
easily.

        Scheme          C
        --------------------------------------------
        let*            Lexical variables and blocks.
        begin           Blocks.
        if              if statement.
        set!            Variable assignment.
        (fn arg ...)    Function call.
        prim-fn         C-implemented prim (or compiled from C-Scheme)
        
The only thing that won't work is `lambda'.  But, in case it is used
to pass to higher order functions (HOF), the HOF could be replaced
with a macro.

Note that PreScheme[1] is much more than this: it compiles a
significantly larger subset of Scheme.

Use pattern matching.


[1] http://en.wikipedia.org/wiki/PreScheme
[2] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.3.4031


Entry: Really done?
Date: Fri Aug  7 11:38:25 CEST 2009

It looks like I have all of Scheme now.  Next: make some C-code
analysis tools that generate all the boilerplate.

One more thing is missing: variable length argument lists.  This is
necessary to implement `list' without which macros are no fun.


Entry: Two Kinds of Primitives
Date: Fri Aug  7 13:20:51 CEST 2009

Due to restart at GC, the interpreter can support two kinds of
primitives:

     * RESTARTABLE: Pure functions operating on Scheme data structures
       (or impure functions that do not perform allocation _after_
       mutation).

     * ABSTRACT: C code that does not refer to any Scheme data.

The disadvantage of not being able to access Scheme data from impure C
code can be largely removed by providing a suspension mechanism for
primitives.  In this case the C code could behave as a coroutine,
which allows the use of enumerators / iterators instead of
construction of intermediate data.

Impure functions performing allocation before mutation are hard to
write, and are best limited to the implementation of internal
primitives supporting mutation (like `set!' and `define') and adaptor
code that bridge Scheme and C datastructures for abstract functions.

In short: use either purely functional programming or concurrency
oriented programming, and you get a very simple interpreter and
application structure.


Entry: Bootstrapping
Date: Fri Aug  7 19:14:57 CEST 2009

Bootstrapping is fun.  It's the first time I did this for a Scheme
interpreter.  I've added two mzscheme .ss files.  One to convert .scm
files to .c_ files as long as there is no reader, and one to lift the
primitives from the .c file and write boilerplate registration code.

Then bootstrapping the rest of the basic scheme functionality on top
of the bare words is quite interesting.

One thing though: I don't immediately see how to make apply and eval
work.  Eval is probably iterating the step() function.  But apply?
Similarly maybe?  Constructing a continuation frame and iterating the
interpreter?

There is probably a simpler though non-obvious way to do this..
Manipulating the machine state from inside a primitive would probably
do the trick.

The trick is that the function modifies the continuation.  This could
be done using mutation, but that won't play nice with running the
interpreter manually (as part of debugging for instance).  It's
probably best to make it possible for primitives to return a
continuation.  Or better: to allow for explicit "state transforming"
primitives.

Let's thinkg about this first..

Before the step() function can really be a pure function, it has to
isolate all side effects, and should not rely on a main loop's
actions.

Maybe it's best to first try to implement call/cc.

Applying a continuation is really simple.  But how to capture it?
Looks like that's similar to apply.

Then about errors.  Maybe it is better to keep errors local to the
step() function that generated them, and return them as values
containing the continuation.  Except for GC restarts: they need to go
all the way up to make sure data is not corrupted.

Ok. errors are now local to step() allowing for recursive step() but
GC restart goes all the way up the chain.

Then a surprise: eval is (make-ast '(post 'foo-foo))

If a primitive returns a reducable term (represented by the ast
wrapper type), it will be evaluated.

Maybe apply can be made in a similar way?  Well.. I found something
that uses eval and map and quote.  Not pretty..

Looks like manipulating the continuation is best.


Entry: letcc -> apply
Date: Sat Aug  8 10:22:47 CEST 2009

It seems better to stick with the way things work right now, and use
only a single type of primitive.  This makes integration with C (the
eventual goal) a lot easier, and will prevent a lot of if-type style
code.

With `letcc' implemented as a special form, it is now possible to
create primitives that operate on continuations as data, creating new
continuations.  These values can themselves be applied.  This means
`apply' can be implemented as a continuation modification followed by
an application.

Yes, but.  The current k_apply type is a bit awkward, because it
evaluates from left to right, it will wait for the final argument
before coninuing with application.  Maybe it's simplest to support a
right-to-left app frame?

Before continuing, let's make the continuations somewhat abstract:
each with a parent slot in the first position.

Ok.

One way is to have a reverse k_apply: one that traverses the arguments
from right to left.  This way the `apply' continuation can be
constructed as an argument list, with the function value passed to it.
This would either require an extra reverse, or some re-arrangement in
the primitive call trampoline.

Or, a different kind of continuation could be made that ignores its
value, but passes closure to its parent continuation.

Basicly, such a continuation takes a value, ignores it, and passes
another value to its parent continuation.

Actually, this already exists: k_seq.

Indeed:  Using a continuation like this:
         
         _ -> (fn a1 a2 (begin _ a3))

will also ignore the supplied value and reduce to (fn a1 a2 a3).

    
Entry: closures and values
Date: Sat Aug  8 12:33:11 CEST 2009

So.. It's probably best to move closure_pack to the start of step(),
and unpack before adding to the environment.  It's a bit awkward the
way it is now.

Let's make it such that:

      - on step() entry, the closure is unpacked to perform reduction.

      - if there is no reduction, the value is packed.


Entry: magic values
Date: Sat Aug  8 12:47:55 CEST 2009

Because closures and ast's have a special meaning in the interpreter,
they cannot be used as values in Scheme.


Entry: Interesting
Date: Sat Aug  8 15:28:07 CEST 2009

Implementing this Scheme interpreter without relying on cons or
closures has been most revealing.  I never knew I knew so little about
the mechanics.  Arrogance tends to get in the way of learning..

There are some meta-confusion things in the syntax representation that
are a bit fishy still:

  - Syntax is wrapped in a weird way (tagged s-expressions.  a full
    parse into an AST so values and terms can never be confused would
    probably be better).

  - Because of this confusion, primitives can return reducable
    expressions (make-ast) and closures (make-closure) that will not
    behave as literal data.

As I've mentioned before, this makes make-ast behave as eval.

Maybe the solution is to explicitly represent non-reducable values as
VALUE?

Ok: 

 * all primitive return values will be tagged with VALUE
 * REDEX that is not a varref or an application is tagged as VALUE


Ok, I see..

It's probably best to get rid of the word "closure", and have two
kinds of values that contain environments:

      - redex
      - lambda

The trouble is primitive values.  In the lambda calculus, there are
only two kinds of terms: applications and abstractions, where
abstractions are values.

So.. Can I get rid of it in one try?  The code should be a lot clearer
with this..

I think I got the concept right, but there is some stray assignment
that messes up the GC.  Maybe time to isolate all assignments?

Inspecting the run state it gives something like:

... (#k_apply(#k_macro(#k_apply(#k_macro(#k_apply( ...

Ok.. recursive macro application.

After some more small tagging bugs (sc_apply_ktx required some head
scratching) it seems to work now.  The only remaining problem is the
invokation of (gc) from within scheme.

Was a problem with sc->entries.  Added sc->step_entries, which is a
different semaphore.

Seems to work now.  I can call the interpreter step from within scheme
+ all the state objects are available:

(make-value 123)
=> #value(123)

(eval-step (make-state (make-redex 'a '((a . 123))) (mt)))
=> #state(#value(123) #k_mt)

(eval-step (eval-step (make-state (make-redex 'a '((a . 123))) (mt))))
=> #error(halt 123 #state(#value(123) #k_mt))


Entry: Loop without cons
Date: Sun Aug  9 01:09:22 CEST 2009

So..  What would be necessary to make an interpreter that can perform
a loop without performing allocation?  It would need to overwrite
variables.

So..  What is scheme without GC?  This should be split in two: Scheme
with closures, but without continuations (using a stack) vs. Scheme
without closures, or only downward closures.


Entry: Reader task
Date: Sun Aug  9 11:32:40 CEST 2009

I got distracted..  Getting a front row view of how Scheme is
implemented on top of a bare machine is fascinating.

Anyways.  Now that the core of the interpreter seems to work, let's
make the first attempt at re-usable code: a reader = tokenizer +
"compressed ast parser" for s-expressions.

A compressed ast is a binary flat representation of a tree, consisting
of a stream of words.  This is important, because it will be the main
interface between two communicating processes that do not share
memory (or not all memory: they might share constant pointers.).


Entry: Keeping track of value names
Date: Sun Aug  9 12:05:38 CEST 2009

How to tag values with names?  Every time a value is lifted from the
environment, it could be recorded in the VALUE struct for debugging
purposes.  When there is an error, the current value's name could be
used for reporting.

Hmm..  It's not so simple..  It's probably easier to tag the primitive
struct itself with names.

Ok: solution was this: tag the sc struct with the primitive before
executing it: exceptions come from the call chain, or the interpreter.


Entry: Byte code
Date: Sun Aug  9 13:40:28 CEST 2009

More procrastination.  What would Scheme byte code look like?  It is
essentially creating a cached representation of the interpreter
sequencing.

The answer seems to be CPS conversion.


Entry: More library code
Date: Sun Aug  9 18:05:10 CEST 2009

let, letrec, named-let, apply, eval, cond.


Entry: Scheme interpreter next:
Date: Mon Aug 10 09:35:13 CEST 2009

critical path:

* C tasks -> reader (also find a standard protocol for transferring
  tree data using a machine word port interface).


structured procrastination:

* compilation to bytecode (CPS)

* subset compilation to C: i.e. embedding C in s-expr syntax using
  let* for set! begin while.  maybe augmented with a bit of
  optimizations to make the subset larger (PreScheme style).

* garbage collector: the stop & copy collector is good for
  bootstrapping, but it at least needs to be replaced by a cheney
  collector that doesn't use a stack.  maybe some more fancy things
  later.  this could be abstracted out and reused.


Entry: Tasks: one-shot + continuations
Date: Mon Aug 10 11:52:16 CEST 2009

It looks like the simplest approach is to use a `prompt'.  The current
tinyscheme implementation has a transparent mechanism: any C primitive
can be suspended relative to the setjmp right before primitive
execution.  The setjmp is also used for implementing exceptions.

The scheme in this project has a different structure: there is some
state unpacking between the mark and the actual primitive call.

This makes me thing it might be better to implement c continuations
and one-shot tasks using a primitive that marks the context.

OK. Continuations are working: they create a new ck struct on each
suspend.  It's probably to make one-shot continuations if the context
is reused + a context size check is performed.

NEXT: 
  - allocate before running (run on separate stack)
  - one-shot


Entry: GC
Date: Mon Aug 10 15:48:22 CEST 2009

Apparently there's a problem with finalization of atoms: if the
unreachable garbage space contains more than one reference to an atom,
its finalization will be called twice.

Hmm..

It looks like atoms that have a free() method need to be wrapped in a
vector, to make sure they only appear once in the heap.

So, the data needs a different encoding.

Let's start from what we need in the GC:

- vectors
- constants
- integers
- finalizers

An object that needs finalization which is not idempotent needs to be
wrapped in a vector.

OK: solved by removing the 'atom' type, but requiring that every
finalizer found in the heap is followed a pointer to finalize.  This
gives a simple mechanism to build wrapper vectors around arbitrary
(aligned) pointers: the client just needs to ensure that each object
occurs only in a single vector, and use that vector as the opaque
representation of the object.


Entry: Alloc before call
Date: Mon Aug 10 17:03:12 CEST 2009

Instead of allocating before calling a primitive, in the case where
the intermediate operation doesn't alloc, it might be simpler to see
if there is enough space and then postpone the allocation, in order
not to create garbage.


Entry: 0xbad
Date: Mon Aug 10 19:31:34 CEST 2009

A coincidence: usually this indicates problems :)

1: test_ck()
(123 . #aref(#fin<0xba6270:0x40123c> #data<0xbad420>))
2: test_ck()
(124 . #aref(#fin<0xba6270:0x40123c> #data<0xbad5b0>))
2: test_ck()
(124 . #aref(#fin<0xba6270:0x40123c> #data<0xbad740>))
2: test_ck()
(124 . #aref(#fin<0xba6270:0x40123c> #data<0xbad8d0>))
3: test_ck()
125


Entry: Dybvig Danfest 2004
Date: Mon Aug 10 23:19:58 CEST 2009

"The macro writers bill of rights"[1].

Check out KFFD hygienic expansion algo[4].  This lead me to hygienic
macros on top of unhygienic ones[2].

Apparently my letrec is wrong: all evaluations need to happen before
all assignments.

Optimizations: constant folding, copy propagation, inlining, dead code
elimnation.  Inlining (deciding to inline) isn't as easy as copy
propagation: you want your compiler to terminate and prevent code
explosion.

[1] http://video.google.com/videoplay?docid=-6899972066795135270
[2] http://p-cos.net/documents/hygiene.pdf
[3] http://people.csail.mit.edu/jaffer/r4rs_12.html
[4] http://www.cs.ucdavis.edu/~devanbu/teaching/260/kohlbecker.pdf

Entry: Sequential code
Date: Tue Aug 11 09:39:04 CEST 2009

Visually I associate left-aligned code in Scheme with side-effects
(begin).  However, parallel code also looks like this, and is not
necessarily bad from a mutation pov.


Entry: GCC
Date: Tue Aug 11 10:19:25 CEST 2009

Looking at the assembly output of scheme.c and it seems that it
performs the inlining well, except for the allocation, which uses a
vararg function _sc_make_struct that looks rather expensive to call.

Also, gc_alloc won't line.  I guess the `full' check prevents that.
It's probably easier to implement GC boundaries with unmapped pages.


Entry: bootstrap in forth
Date: Tue Aug 11 10:44:02 CEST 2009

Now, in order to get to a smaller code size.  What about writing the
step function in threaded code?  Does this make at all sense?

I don't think so: the features that are used to write the interpreter
are GC's vector construct (library) and C lexical scope and C
structure member scope.  It's rather awkward to write something like
that in a combinator language.


Entry: size
Date: Tue Aug 11 11:21:40 CEST 2009

Looks like the real culprit is the primitive registration code.  This
can be done in a table.

strip scheme
ls -l scheme

# currently
-rwxr-xr-x 1 tom tom 44592 2009-08-11 11:28 scheme

# Ha!
# The prim table actually made it bigger :)
-rwxr-xr-x 1 tom tom 45912 2009-08-11 11:34 scheme

# Changing the prototype of the function to take strings instead of
# symbols made it smaller:
-rwxr-xr-x 1 tom tom 43936 2009-08-11 12:02 scheme

So it's really the code that generates the boot code that bloats the
binary.  Looks like a reader is what's needed to compress further.

Also, part of the data structures could be loaded as constants:
symbols and primitives are not GC-managed, so they could be defined as
constant data.

Another thing that leads to bloat is the tag shifting.  Can this be
simplified to a single AND + compare?

After separating the bootstrap code to lib.o i get these stripped sizes:

# gcc -m32
-rw-r--r-- 1 tom tom  1256 2009-08-11 12:48 gc.o
-rw-r--r-- 1 tom tom 16116 2009-08-11 12:48 lib.o
-rw-r--r-- 1 tom tom  2032 2009-08-11 12:48 main.o
-rw-r--r-- 1 tom tom 12456 2009-08-11 12:48 scheme.o
-rw-r--r-- 1 tom tom   676 2009-08-11 12:48 symbol.o
-rw-r--r-- 1 tom tom  1136 2009-08-11 12:48 task.o

# gcc -m64
-rw-r--r-- 1 tom tom  1904 2009-08-11 12:49 gc.o
-rw-r--r-- 1 tom tom 18040 2009-08-11 12:49 lib.o
-rw-r--r-- 1 tom tom  2640 2009-08-11 12:49 main.o
-rw-r--r-- 1 tom tom 17344 2009-08-11 12:49 scheme.o
-rw-r--r-- 1 tom tom  1176 2009-08-11 12:49 symbol.o
-rw-r--r-- 1 tom tom  1832 2009-08-11 12:49 task.o


Entry: Compressing an s-expression into a serial protocol.
Date: Tue Aug 11 12:53:48 CEST 2009

What about this: create a small stack based VM spec that can can
create a data structure from an atom stack, and send the instructions
over the wire.

Hmm.. confusing..  Really I want a proper one-shot abstraction without
copying before attempting this.

What I really want is expanded AST compression or compilation to CPS
bytecode.


Entry: CPS conversion
Date: Tue Aug 11 13:57:19 CEST 2009

(lambda (x y) (fn (- x 1) (+ y 2)))

(lambda (x y k)
  a1 <- (- x 1)
  a2 <- (+ y 2)
  (fn a1 a2 k))


Entry: Local macros
Date: Tue Aug 11 14:06:20 CEST 2009

Let's support those by also recording the macro environment.  Looks
like it's easier to first support `let' as a special form, and then
reuse that code for `letmacro'.

`let' would use an application frame, where the first value is already
filled in (a lambda created from the body).  The same could be done
with `letmacro'.

What would be a `lambda' where the names are bound to macros instead
of values?  Maybe that's the simpler route?

Time to take a break, this is confusing..


Entry: Compiling to C and pattern matching
Date: Tue Aug 11 16:48:29 CEST 2009

Take the interpreter's main loop and write it as a pattern match in
Scheme syntax.  Actually: a pattern matcher is a language for writing
interpreters.

Basicly, all the C structs, predicates, and if/then/else blocks can be
automated easily, as long as it maps data -> data.  Otoh, the C code
itself isn't so difficult.


Entry: Dynamic languages workshop
Date: Tue Aug 11 17:05:08 CEST 2009

http://www.ai.mit.edu/projects/dynlangs/wizards-panels.html

Panel 1: runtimes

David Moon:

 ``Start with a small working system and expand it incrementally,
   adding functionality and adding performance.  The big bang way only
   works for God.  Everyone else has to use evolution.''

 ``Sometimes an error message means exactly what it says.''


Panel 2: compilation

An interesting discussion about finalizers.  For libprim tasks, I'm
thinking that C++ destructors might be interesting in combination with
stack-based "pure" tasks: going out of scope due to return or
exceptions calls destructors.  Will Clinger mentions weak pointers
[1].  Also David Detlefs talked about reference counting coming back.
Figure out what he really meant.


Panel 3: Language design

Jonathan Rees: Tension between distributed programming and OO: an
  object is essentially "here": all abstractions around that is
  essentially smoke and mirrors.  (I.e.: physical location/locality
  becomes more important as systems grow physicaly larger.)  This is
  about physical things (state) vs. information.  Object != value.

Guy Steele: little known secrets

 - stacks are overrated

 - GC is good, finalizers are bad (finalizers w. good semantics <- GC
   at every pointer alteration)

 - lexical scoping is good.  easy to screw up when writing
   interpreter.

 - type inference can be good but error reporting is hard.

 - monads maybe very good

 - higher order functionals are underrated.  thinking in APL can
   approve your lisp code.

 - keyword arguments are good (PG)

 - pretty printers are vastly underrated

 - rule-based systems. (20 years ago, time to try again). 

[1] http://www.iecc.com/gclist/GC-faq.html

Entry: removing optimizations
Date: Tue Aug 11 17:45:36 CEST 2009

Heeding good advice: removing all optimizations that make code look
more complicated.


Entry: Real-time Garbage Collection
Date: Tue Aug 11 22:20:29 CEST 2009

Supercollider uses [1] while libgarbagecollector[2] is used in Io.
This one[3] is a rework of Metronome.

I'd like to know how this works..

After reading a bit, the conclusion I get to is that this isn't really
a science.  Optimality boils down to figuring out which objects are
long lived, so they can be ignored most of the time.  There are a
bunch of heuristic tricks to do this, but none of them are optimal
for all programs.

[1] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.39.4550
[2] http://www.dekorte.com/projects/opensource/libgarbagecollector/
[3] http://lambda-the-ultimate.org/node/2393

Entry: GC stuff
Date: Tue Aug 11 22:49:43 CEST 2009

Generational collection.

One of the key points seems to be that one mostly collects the
`nursery', a pool of recently allocated objects.  The assumption is
that most of the elements in the nursery will become garbage soon
after allocation (child mortality) AND that there are a lot of objects
that don't.  In order for this to work there needs to be a cheap way
of finding pointers that point from the heap at large into the (much
smaller) nursery.


Finalization: Instead of scanning the from space after copying, it's
also possible to perform the finalization later, when allocating anew:
the data will be moved to the cache anyway because of initialization,
so we might just as well access it then.


Entry: Cheney GC
Date: Wed Aug 12 10:13:23 CEST 2009

[ Minsky-Fenichel-Yochelson-Cheney-Arnborg (MFYCA) method ]

Like a recursive mark and copy collector, but using a breadth-first
search.

  1 Copy all root objects: copy atoms, update moved vectors and copy
    vectors not yet moved.

  2 Copy all references in newly allocated objects.

  3 Repeat until heap stops growing (all reachable objects are moved).

OK: typed in, compiling, not working: it blows up.  Simplified, but I
can't see it.  Guess it's time for a proof.

An interesting challenge this is: I've always stayed away from
implementing graph algorithms in C.  It requires a lot of discipline
to get the invariants right.  The cause is usually due to embedding
and premature optimization, i.e. using the same field to represent two
distinct values in different modes when there are subtle cases where
this is not valid.  The Cheney algo is a simple one, but I couldn't
get it right ``just writing it down''.

Ha.. Indeed: caused by leaky abstractions: the problem was that vector
tags didn't get copied, which probably triggered an allocation loop in
the interpreter due to type errors.

Next: add assertions and error handling for the whole project.


[1] http://en.wikipedia.org/wiki/Cheney%27s_algorithm


Entry: Simpler k_apply
Date: Thu Aug 13 12:17:00 CEST 2009

Next time I write an interpreter, it will be a right-to-left
evaluating one.  This simplifies the form of apply continuations, to
easier support the `apply' primitive.


Entry: Restating goals
Date: Thu Aug 13 12:27:44 CEST 2009

In libprim I'm trying to explore the idea of non-invasive scripting.
Essentially, it explores a concurrency-oriented programming style,
where one of the tasks is a `management core'.

At the same time I'm looking into simple interpreter cores to serve as
the operating system.  Currently there's a bare-bones scheme
interpreter.


Entry: Ports
Date: Thu Aug 13 12:36:27 CEST 2009

Added a ports struct (GCd aref) + a debug port until dynamic scope is
implemented.


Entry: Current scheme implementation wrt. generated code.
Date: Thu Aug 13 13:58:21 CEST 2009

Code is generated for declaring primitive functions and defining boot
code in C (as long as there is no reader).  This could be augmented by
generating all code that has to do with data structures: struct def,
lowlevel casts, and highlevel predicates and constructors.
Additionally, all primitive trampoline code could be generated.

Definitely: when the primitive struct accessors need to be available,
it would be simpler to use a code generator.


Entry: Leaf: small object model
Date: Thu Aug 13 14:37:22 CEST 2009

So..  Basicly, what's next is to make primitive data extension
simpler.  I'd like to get to a functionality of basic Scheme or PF: a
collection of primitive objects written in C, easily interfaced with
scripting code and independent of its data and control models.

For the GC memory model (Scheme), such objects are wrapped in aref
(finalizer + constant).  For the linear model, they would be wrapped
in a refcount cell.

tom@zni:~/libprim/leaf$ ls *.c
bytes.c  port.c  symbol.c  task.c


Entry: Naming conventions
Date: Thu Aug 13 16:25:49 CEST 2009

The trick seems to really be: use decent naming conventions for C
functions and types.  This way a lot of boilerplate is easily
generated at compile time.


Entry: Stack language
Date: Thu Aug 13 16:52:32 CEST 2009

With the scheme language as good as finished, let's start the stack
language.

1. references
2. cell-base
3. queues

Ok.. Separated out the object representation of GC: it can be re-used
in refcount based memory model.

The first thing to do is to create the memory and object model.
Essentially: implement `void' and `drop'.

Done.  Next: print objects.  This can be done by factoring out the
primitive print functions.


Entry: Dual regime
Date: Thu Aug 13 18:46:32 CEST 2009

I started with the basics of pf2.c : stack based alloc and free.
Maybe the important thing to realize is that it's not going to be
possible to be completely linear, due to references to code and global
storage.   However, it might make sense to get to an almost-linear
regime, where GC is active during load and setup, and after that the
GC is switched off and only the linear language runs.

Anyway, let's just be carful to make sure pf data types are compatible
with GC.

Alternatively: the code/variable space could be made inaccessible to
the language.  If it is externally managed and accessed abstractly,
references into this memory _can_ be used.  


Entry: Reusing primitives
Date: Thu Aug 13 23:16:55 CEST 2009

There seems to be a real benefit in doing this in a shared way.  All
Scheme primitives that do not CONS can be directly reused.

Even better: the whole machine might be embeddable inside scheme: this
would give a way to make references work (variables and code), which
would lie outside the reach of stack code, but could be used as jump
targets and data storage.


Entry: Moving the stack language into the Scheme interpreter.
Date: Fri Aug 14 08:48:30 CEST 2009

Basic idea: the PF machine never CONSes from the GC free space for
_data_ objects.  It's CONS is a reusing one.

What about doing this differently: separate out the memory model from
the scheme struct.  This includes GC and all leaf object classes.

The thing to decide is which objects are allowed in the stack machine
as data objects, and which should remain hidden.  There's no use for
exporting the Scheme's internal data structures (yet).  Let's start
with cons cells and atoms.  Maybe the main problem is representing
RC'd atoms in the Scheme core.

It doesn't seem to be necessary to run PF in SC.  As long as the
memory management is compatible, they can be made to work together.
The hairy part is the GC restart, since they would need to share the
GC next to all the mem classes, but it doesn't seem infeasible.

So, given that we're using the Scheme's data model, how should
finalization and reference counting be bridged?


Entry: Packet Forth
Date: Fri Aug 14 11:09:11 CEST 2009

It's becoming more clear to me that this is really a rewrite of PF.
The core idea is to be able to use linear+RC based memory management
for ``regime programs'' : code that performs some setup and then ends
up in an infinite loop with very predictable memory patterns.

The mistake I made in the first implementation is to assume that GC
isn't necessary.  Only after _first_ implementing a Scheme around the
primitive data types did I realize that this is really an essential
part: the synergy between a Scheme-like host and a linear stack
language that can mutate the GC-managed graph is essential to the
idea.  You need the first to provide "global memory" for the second.

Representing data.

The PF primitives should know about (``inside memory'').
    - constants
    - cons cells
    - RC managed leaf objects
    - opaque references to "outside" memory

The outside model, accessible from Scheme, needs to wrap RC managed
cells in an aref object.

An ``outside pointer'' TAG_OPAQUE is a data structure that is
GC-managed, but can be accessed abstractly from the inside.  It is
treated as a constant.

Basicly, in the linear model, everything could be opaque by default.
Essentially, PF doesn't need to care about managing anything else than
CONS cells and RC objects.

Now, whenever an RC value is made accessible to Scheme, it needs to be
wrapped in an aref, where the finalizer decrements the refcount.  Such
values are really meant to be kept local to the linear memory, but of
course it shouldn't be prevented to have them in the global space,
i.e. embedded in a piece of code as a constant.

This is not trivial however.  The refcounted object should be linked
to the Scheme wrapper, to keep the representation unique.  However,
this link must not become stale!

It looks like an rc struct needs to be a GC managed vector.
Conceptually it's not so hard: my difficulty is with the
representation.

        LIN -> GRAPH : rc++
        finalize:      rc--

        wrappers don't need to be unique!

The latter remark is key.

Looks like this works fine.


Entry: Code and Variables
Date: Fri Aug 14 16:26:47 CEST 2009

The VM's data structures should not be visible inside the PF language:
the machine is not reflective.  This is a key difference with the
current PF implementation.  This enables graph data to be used to
represent PF byte code.  (Note that if reflection is necessary, it can
be implemented at a greater expense by _copying_ from graph -> linear
memory and back.)

The remaining problem is then the representation of variables.
Essentially a variable is a box that's referenced by code.

            CODE -> BOX

This BOX can then be loaded on the data stack to enable the PF machine
to modify its contents.

But.  What can be put in a box?  What if a box disappears?  Hmm.. not
done yet.  Basicly it seems that a box needs to be an aref.  What if
an aref contains a list?  Yes.. plenty of pitfalls..

This can be solved by changing the semantics of finalizers: they take
objects instead of constant's pointer.  By tagging constant with 00
nothing really changes though.

So, finalizers take objects: this means that linear structures can be
embedded in an aref: if an aref becomes unreachable, an RC-- will
ripple through the list.


Entry: Scheme Primitives in the Stack Machine
Date: Fri Aug 14 18:00:40 CEST 2009

If a primtive doesn't CONS it can be reused without problems.  Let's
separate all of them out, making reuse easier.


Entry: Queues
Date: Fri Aug 14 18:01:53 CEST 2009

As the CONS of two stacks?  Note that this gives only amortized O(1)
addittions and removals, and might for that reason not be a good
candidate.  Otoh, it does allows representation as a tree.

Because of the linear nature however, mutable vector-based solutions
might work.


Entry: Packet Forth Interpreter
Date: Fri Aug 14 18:08:45 CEST 2009

Next: the interpreter.  Should be quite straightforward, only this
time I'd like to not make the mistake of not having proper tail call
handling.  For some late night zen.

One thing though: the return addresses.  What are they?

They need to be real references to guarantee a safe memory model, but
they need to be distinguishable from ordinary CONS cells.  Maybe just
add a CODE cell?  Also, this gives a very elegant way to make tail
recursion work: simply link the CDR to the head of the last word
called!

Looks like this is ``continuations for free''.

The return stack then becomes a CONS list of CODE pairs.

CODE  = RETURN | (SUB : CODE)    ;; ':' is a GC CONS
SUB   = PRIM | CODE

RS    = MT | (CODE . RS)         ;; '.' is a linear CONS

Here SUB is either a PRIM which can be invoked to modify the machine
state, or a CODE list which can be executed by saving the current CODE
cdr on the return stack and jumping to the SUB CODE.  RETURN pops the
next CODE from RS and jumps to it.

( Note that RETURN is necessary _only_ if a code definition ends on a
primitive.  If not, it's simply linked to the tail procedure. )

Essentially, the return stack itself is in this case also just a code
graph ``constructed at run-time'' : it is a callable function!

Note however that `.' is a _stack_ that's managed linearly, while the
`:' is graph managed.  I.e. the return stack can't contain loops, but
the code can.

This makes the new PF semantics complete.

HA!


Entry: pf_run()
Date: Sat Aug 15 00:04:48 CEST 2009

The basic interpreter seems to be running.  Next: exceptions and
constants (quotation).

For quotations, I'm running into the same problem as for the Scheme
interpreter: it is necessary to strictly separate language and
metalanguage: explicit quote tags are necessary to be able to quote
code.


Entry: Linear pairs
Date: Sat Aug 15 11:47:05 CEST 2009

It's probably better to use a different data type for linear pairs.
That way nonlinear pairs can be used as constants too.

Otoh: reusing the same pair structure allows primitives to be reused.
Even those that do cons. (Cons could be part of a shared object).

It's really not such a hurdle to manage the border between linear and
graph memory, so let's stick to the simple implementation: cons lists
are cons lists, they just have only a single reference in the linear
part.

Entry: More sharing between Scheme and PF
Date: Sat Aug 15 12:02:16 CEST 2009

There is already quite some duplication, mostly in exception handling,
GC, and printing.  This should be handled by "mem". (mom?).

I guess it's a good idea to factor out as much as possible beforehand.

Let's rename "mem" to "base", which implements:

      - GC restart
      - primitive leaf types
      - primtiive exception handling
      - printing

Hmm..  Let's not.  The simplest way is to make sure the primitives
themselves will take from the base class.  This can be factored out
later.. It was only a couple of lines to add the ABORT exceptions in
PF + it's slightly different since the interpreter is not re-entrant.


Entry: Code bootstrap
Date: Sat Aug 15 12:48:20 CEST 2009

Now that default abort works it's time to start adding some primitives
and a bootstrap reader (sexp -> C).

Seems straightforward once list -> code translation works.  (eval).

Before going there, I think it's best to do the shared primitives
anyway.  The PF core can benefit greatly from _expressions_ : in fact,
almost all code in pf.c is doubly done: once as expression, and once
bound to the linear stack.


Entry: EX: expression language
Date: Sat Aug 15 13:55:38 CEST 2009

`expressions' use the following resources:
              - GC
              - exceptions
              - leaf types

It is a dynamically typed functional language using the C-stack for
expression nesting.

The Scheme and PF languages add primitives that operate on an
extension of this state, with most of the primitive expressions
shared.

Now, by binding EX in the respective scripting languages to &pf->m and
&sc->m it is possible to abbreviate all expressions, leaving out the
prefixes.  I.e. generalization of:

#define CONS(a,b)  ex_cons(EX, a, b)

Now that the infrastructure is there, let's move the primitives lazily
depending on need in both languages.

Entry: Static scope.
Date: Sat Aug 15 15:41:17 CEST 2009

What's interesting for the current PF code graph implementation, is
that it is actually statically scoped.  All variables are bound at
compile time, which makes (data / code) hiding through local
definitions possible.

I'm not sure how to reflect this back into the language, but it could
be used to create local dictionaries, one of the things that was
really hard to do in original PF.


Entry: PF / PX primitives
Date: Sat Aug 15 16:05:08 CEST 2009

In light of this sharing it makes sense to define PF in terms of PX
primitives: in-place expressions.

pf_  forth words
pl_  linear (in-place update) expressions

ex_  nonlinear (sharing / consing) expressions (usable in Scheme)


pf_  probably are redundant, and can be generated from px_


Entry: EX printing
Date: Sat Aug 15 16:43:05 CEST 2009

EX has a lowlevel port* reference to implement basic printing.  It
needs a lowlevel interface because in SC and PF ports are managed
respectively as graph and linear atoms.


Entry: Building
Date: Sat Aug 15 20:47:04 CEST 2009

Looks like I've got the build problems ironed out.  Moral: dont _EVER_
generate files outside your local makefile directory.  This can be
subtly done by some rule that depends on a specification outside of
your dir..  I had something like this:

../ex/ex_prims.h_:  ../ex/ex_prims.c

which nicely overwrite the .h_ file that should have been built in
another directory using a locally defined rule for .h_ files

Another thing is to use different extensions for different generators.
Otherwise the problem above is bound to impose itself.


Entry: EX errors
Date: Sat Aug 15 21:18:47 CEST 2009

So.. Time to move primitive errors from SC -> EX.
Done.


Entry: Knot tying
Date: Sat Aug 15 22:50:25 CEST 2009

To define recursive functions, it is necessary to use mutation of the
code graph.  I would like to preserve a semantics of early binding
though.

OK, it's not so difficult.  Adjust the environment so each pair points
to a code cell.  This is the one that will be referred to from within
other code.  When a definition arrives, reuse the cell.

What I really would like is a way to call Scheme code during
definition.  Mutation doesn't go well with the allocations necessary.
So maybe this needs to be written as a bootstrapped composite word?

Otoh, it might be solved linearly, translating the code in-place,
transforming the structure into code directly.

1. compile in-place, reusing the cons cells (code is isomorphic to the
data representation, except for `quote' and variable wrappers.

2. update the global environment.

This would work were it not for the wrapping that needs to perform
allocation.  It really would be simpler to do it from Scheme.

Alternatively, it can be done in two passes: one pass creates the
skeleton but doesn't mutate to make sure it can be restarted, while
the second pass fills in the values.

Also, it's possible to start with a non-recursive version and use that
to boostrap the recursive one as a primitive.

It's really not so difficult: the only need is that everything fits
into free GC space before the modification to the environment is made.


Entry: PF compiler
Date: Sun Aug 16 11:44:20 CEST 2009

"Compiler" is a strong word: it's mostly a linker.  As mentioned in
the previous post, this is a bit awkward due to the graph patching
that needs to be done.  There are many corner cases related to
singleton words (that need ``eta-reduction'') or primitives, that can
be inlined.

The real question is (and I've run into this before) does the
dictionary contain CODE references, or SUB references?

   CODE  = RETURN | (SUB : CODE)    ;; ':' is a GC CONS
   SUB   = PRIM | QUOTE | CODE
 
   RS    = MT | (CODE . RS)         ;; '.' is a linear CONS

It might make more sense to use SUB refs.  Tail calls are tricky!

Let's change the types a bit:

   CPAIR = (SUB : CODE)
   CODE  = RETURN | CPAIR   
   SUB   = PRIM | QUOTE | CPAIR
 
   RS    = NIL | (CPAIR . RS) 

The reason this feels a bit weird is that composite code doesn't need
RETURN.  The only way to ever pop RS is to have a word end in a
primitive, i.e. [ PRIM RETURN ].  Composite code can always just jump.

Looking at CPS versions of Scheme, this is also quite clear: the only
thing that ever _invokes_ the continuation, is a primitive.

So, can the types be made a bit more intuitive?


Entry: New PF interpreter
Date: Sun Aug 16 12:09:57 CEST 2009


The interpreter evaluates the following recursive data structure.

( Note that this represents fully resolved byte code: it doesn't contain
  any variable references. )

      SEQ = (SUB : SUB)            ;; `:' is graph CONS
      SUB = PRIM | QUOTE | SEQ
 
      K   = HALT | (SUB . K)       ;; `.' is linear CONS

SEQ  Code sequencing provides the encoding of what to do _now_ and what
     to do _next_

SUB  A subroutine is a basic unit of work: invoke a primitive function,
     load a data object onto the parameter stack, or perform a
     code sequence.

K    The continuation is a stack constructed at run-time, representing
     the rest of the evaluation, finishing with HALT.


The interpreter uses this data structure as follows (in pseudo code,
an ML implementation can be found here[1]).

      sub = POP(K)
      case sub
         SEQ (now : next)  
           -> K = PUSH(now, PUSH(next, K))
         PRIM
           -> execute PRIM
         QUOTE
           -> load data item on parameter stack

SUB is the main code type (the subroutine).  It is what you pass to
`run', and it is what can be found as resume points in the
continuation.  It includes primitive code and data, as well as
composite code sequences (SEQ).

When a SEQ is encoded, the two SUBs will be pushed onto K in the
correct order.  Note that this encoding automatically gives proper
tail call handling[2]: loops can be implemented using recursion
without blowing up the K stack.


                            * * *

The above is the representation of compiled code.

Concatenative (Joy-like) code can be compiled straightforwardly to
SEQ-based code.  This can be illustrated using the standard lisp
CDR-linked pair notation to represent both source code CONS pairs and
SUB pairs.  The the symbolic program

   (a b c d)     = CONS(a, CONS(b, CONS(c, CONS(d, NIL))))

is compiled to an improper list 

   (A B C . D)   = SEQ(A, SEQ(B, SEQ(C, D)))

where each symbol x is mapped to a corresponding X representing a SUB.
Resolution happens using a global vocabulary that maps symbols to
SUBs.
 
                            * * *

The code representation as a binary graph has several advantages
related to the similarity between binary graphs and binary trees (PF's
linear pair data structure for data storage and representing
the machine continuation).

    - SEQ as a pair of SUBs gives a very direct encoding of "now" vs
      "next".  This makes it really easy to represent continuations as
      a stack of SUBs.

    - The continuation K ``looks like'' a SEQ (a particular kind of
      SUB), in that it is a linear pair of a SUB and a K.

    - The machine can run without nonlinear data allocation:
      continuation push/pops are linear.

    - One-shot full and partial continuations (segments of the K
      stack) are (isomorphic to) linear lists.

    - This isomorphism obviously works also the other way: linear
      lists constructed at run time can be converted to one-shot full
      or partial continuations.  (This can be used to emulate one-shot
      `closures'.)

    - Linear one-shot closures/continuations can be _copied_ to make
      them multi-shot.  

    - They can also be _compiled_ to code (linear pairs represented by
      nonlinear SEQ pairs) so they can be reused any number of times.
      However, this is not a linear operation.


( Note that the isomorphisms need some work to figure out quoting
issues.  Essentially it will boil down to different kinds of
continuation frames. )


[1] http://zwizwa.be/darcs/libprim/pf/pf.ml
[2] http://repository.readscheme.org/ftp/papers/ai-lab-pubs/AIM-443.pdf


Entry: Compilation
Date: Sun Aug 16 12:43:45 CEST 2009

Now.. I think the bytecode does what I need it to do.  But how to
compile it?  This requires multiple passes to properly inline all
references.

I have something like this:

FOR(SRC):
        symbol? YES -> dereference if possible
                NO  -> quote as datum

        next?   YES -> alloc SEQ and continue
                NO  -> store and finish

Then iterate over the data structure until all symbols are
dereferenced.

There are (at least?) two special cases that won't work:

 - empty programs are not handled properly.

 - degenerate loops lead to infinite dereference passes:

   (a b)
   (b a)


Handling those separately should make it work though.  First, filter
out all references to empty programs.  Second: use a counter to
indicate the number of dereferences performed.  If 0, there is no
progress: any remaining references are degenerate loops.

The empty program (NOP) is necessary as a corner case, because empty
quotations need to work.  These can be optimized out of normal code
though.

Hmm.. this approach has a number of pitfalls.  It isn't possible to
determine whether a program is a NOP before completely compiling it.

Let's make it work first using a 1->1 map from source structure, and
perform optimizations in a second pass.

So, the passes are:

  (PURE)
    1. construct environment
    2. compile s-expr -> AST  (``improper'' SUB lists)
    3. resolve all references
    4. check degenerate loops (NI)
    5. snap pointers (NI)
  (IMPURE)
    6. patch global environment (NI)

Most of the passes can be done purely.  They don't create much garbage
so they can be kept as primitives (should make it to the end without
GC).

To test it, first the primitives need to be defined.

OK. Prims load + highlevel code compiles.


Entry: Linear cons cells
Date: Sun Aug 16 18:46:55 CEST 2009

It might be best to switch to an explicit representation for linear
CONS cells.  Especially for the interaction with Scheme, it might be
confusing to have the same datastructure represent two different
things.

The hack is of course that inside the PF primitives, it's possible to
use direct calls to read-only ex_ functions, without the need to make
them polymorphic to both linear and nonlinear lists.

Anyways, this seems something to postpone until it becomes a problem.
It should be straightforward to do: it's mostly the functions
_pf_copy_from_graph and _pf_copy_to_graph.


Entry: QUOTE
Date: Sun Aug 16 19:32:48 CEST 2009

Another premature optimization: currently there's a lot of copying
going on when defining new code from within PF (represented as linear
lists).


Entry: Decompile
Date: Sun Aug 16 20:59:37 CEST 2009

With everything packed as a graph, it's quite difficult to print out
any code.  Let's add some reverse lookup for code printing.

OK.  This seems to work.  With a bit of effort re-quoting also works.
Note that words don't have a definite end, except when they end in a
primitive.  Whenever it encounters a sequence tail that has a name, it
prints that instead.  Using brackets for printing code:

(def . ['456 dup-post abc])
(loop . ['1 [print-state] ['1 '2] loop])
(abc . ['123 dup-post])


Entry: Autosnarf
Date: Sun Aug 16 22:38:53 CEST 2009

From EX / PX -> PF.

Possible?  Probably not, since there are no guarantees about linearity
of the output.  Without proper typing this doesn't seem to work..

So it needs to be done manually.  


Entry: No IP
Date: Mon Aug 17 09:18:48 CEST 2009

With the machine organized as it is now, there is really no need for
an IP register: always use the top of RS.

Well, conceptually that's the case, but let's keep it like it is now.
It's an implementation issue that doesn't have too many consequences.

Another thing though: the return stack can't be used for storing
temporary data.  Or can it?  Looks like it can, at least in the middle
of a SEQ, because an atom placed on RS will only be accessed when
evaluating a primitive or constant in tail position.


Entry: Next?
Date: Mon Aug 17 09:55:03 CEST 2009

- More primitives: combinators + list/stack/queue ops.
- Reader


Entry: Making PF expressions available to scheme.
Date: Mon Aug 17 09:59:05 CEST 2009

The requirement is to decouple them from the pf* state other than ex*.
Very straightforward: rename them px_ -> ex_ and use the
../ex/ex_prims.ss snarfer.


Entry: Does PF need real-time GC?
Date: Mon Aug 17 10:27:47 CEST 2009

Note that I realize that these days it is possible to get real-time
incremental GC.  However, these are still not synchronous memory
managers: they won't reclaim large blocks immediately so always
require more memory than a synchronous GC, and on top of that will
cause unnecessary loss of data locality.

This doesn't mean that the current (very simple stop-and-copy Cheney
GC) can't be replaced by something more elaborate to make the
_compilation_ also respect real-time constraints.

Otoh, since the linear and graph memory are completely separate, and
the GC will never modify the linear data, a GC that doesn't move the
data can be run in a low priority thread without problems.

So, yes: I guess I'm taking position _against_ real-time GC, at least
for scripted DSP applications (for static compilation, you can
probably use static buffer allocation) where large chunks of memory
are allocated and freed at a rapid rate.  It seems that the
multitasking approach makes more sense: a linear core which can use
(read-only) data and code from a nonlinear one.

... Baker 1978[1] -> contains some information about CDR coding.
Useful for making byte code sequential (cache-friendly).  Chapter 8 is
of interest though: talks about RC counting.

The trouble with RC managed data that Baker mentions is that a `drop'
of a huge data structure takes a long time to deallocate.  He suggests
to do te deallocation lazily.  In practice though (especially due to
the `physical' nature of linear objects) this doesn't really happen
much.  Consumption rate is usually one-by-one.  And, lazy dealloc
still doesn't perform fast reclaim for cache locality, what we're
really after..


[1] http://home.pipeline.com/~hbaker1/RealTimeGC.html


Entry: `Harvard' PF in Flash ROM
Date: Mon Aug 17 19:15:18 CEST 2009

Now.. If this works, does it make sense to try to implement the same
algorithm on a PIC18 while using the RAM as linear memory, and keeping
the ROM GC managed?  It really doesn't need all that much space, and
the self-write seems to be compatible.


Entry: PEG/LEG
Date: Tue Aug 18 08:58:30 CEST 2009

Trying out this[1].

The best way to understand is the metacircular example: The grammar
for peg grammars is shown below.

           Grammar         <- Spacing Definition+ EndOfFile

           Definition      <- Identifier LEFTARROW Expression
           Expression      <- Sequence ( SLASH Sequence )*
           Sequence        <- Prefix*
           Prefix          <- AND Action
                            / ( AND | NOT )? Suffix
           Suffix          <- Primary ( QUERY / STAR / PLUS )?
           Primary         <- Identifier !LEFTARROW
                            / OPEN Expression CLOSE
                            / Literal
                            / Class
                            / DOT
                            / Action
                            / BEGIN
                            / END

           Identifier      <- < IdentStart IdentCont* > Spacing
           IdentStart      <- [a-zA-Z_]
           IdentCont       <- IdentStart / [0-9]
           Literal         <- ['] < ( !['] Char  )* > ['] Spacing
                            / ["] < ( !["] Char  )* > ["] Spacing
           Class           <- '[' < ( !']' Range )* > ']' Spacing
           Range           <- Char '-' Char / Char
           Char            <- '\\' [abefnrtv'"\[\]\\]
                            / '\\' [0-3][0-7][0-7]
                            / '\\' [0-7][0-7]?
                            / '\\' '-'
                            / !'\\' .
           LEFTARROW       <- '<-' Spacing
           SLASH           <- '/' Spacing
           AND             <- '&' Spacing
           NOT             <- '!' Spacing
           QUERY           <- '?' Spacing
           STAR            <- '*' Spacing
           PLUS            <- '+' Spacing
           OPEN            <- '(' Spacing
           CLOSE           <- ')' Spacing
           DOT             <- '.' Spacing
           Spacing         <- ( Space / Comment )*
           Comment         <- '#' ( !EndOfLine . )* EndOfLine
           Space           <- ' ' / '\t' / EndOfLine
           EndOfLine       <- '\r\n' / '\n' / '\r'
           EndOfFile       <- !.
           Action          <- '{' < [^}]* > '}' Spacing
           BEGIN           <- '<' Spacing
           END             <- '>' Spacing

Starting with this:

Grammar         <- Spacing Identifier+ EndOfFile         
Spacing         <- ( Space / Comment )*
Space           <- ' ' / '\t' / EndOfLine
Comment         <- ';;' ( !EndOfLine . )* EndOfLine
EndOfLine       <- '\r\n' / '\n' / '\r'
EndOfFile       <- !.


Identifier      <- < IdentStart IdentCont* > Spacing    { printf("ID: %s\n", yytext); }
IdentStart      <- [a-zA-Z_]
IdentCont       <- IdentStart / [0-9]

Which doesn't do anything..  I guess I'm not getting
something... Maybe what I need is leg instead of peg?  Indeed, it
seems to support expressions.

exampeles/calc.leg:


%{
#include <stdio.h>
int vars[26];
%}

Stmt	= - e:Expr EOL			{ printf("%d\n", e); }
	| ( !EOL . )* EOL		{ printf("error\n"); }

Expr	= i:ID ASSIGN s:Sum		{ $$= vars[i]= s; }
	| s:Sum				{ $$= s; }

Sum	= l:Product
		( PLUS  r:Product	{ l += r; }
		| MINUS r:Product	{ l -= r; }
		)*			{ $$= l; }

Product	= l:Value
		( TIMES  r:Value	{ l *= r; }
		| DIVIDE r:Value	{ l /= r; }
		)*			{ $$= l; }

Value	= i:NUMBER			{ $$= atoi(yytext); }
	| i:ID !ASSIGN			{ $$= vars[i]; }
	| OPEN i:Expr CLOSE		{ $$= i; }

NUMBER	= < [0-9]+ >	-		{ $$= atoi(yytext); }
ID	= < [a-z]  >	-		{ $$= yytext[0] - 'a'; }
ASSIGN	= '='		-
PLUS	= '+'		-
MINUS	= '-'		-
TIMES	= '*'		-
DIVIDE	= '/'		-
OPEN	= '('		-
CLOSE	= ')'		-

-	= [ \t]*
EOL	= '\n' | '\r\n' | '\r' | ';'

%%

int main()
{
  while (yyparse());

  return 0;
}


One problem though: it doesn't seem to be possible to pass context
into yyparse that can trickle down to the actions.  Makes me wonder
how one would implement dynamic scope in C in a thread-safe way.  I
guess by using thread-local variables..

Ok.  I think I get the basic idea.  Now the subtleties.  How to
express delimiting?

[1] http://piumarta.com/software/peg/peg.1.html


Entry: Weird error
Date: Tue Aug 18 15:37:32 CEST 2009

(define (read-file)
  (let loop ()
    (let ((expr (read)))
      (if (eof-object? expr)
          (post expr)
          (begin
            (post expr)
            (loop))))))
(read-file)

Passing the string ' 1 2 3 abc "def" foo'  gives:

READ-TEST
;; gc 2340:12378
;; gc 2383:12335
;; gc 2380:12338
1
2
;; gc 2377:12341
3
abc
"def"
foo
;; gc 2298:12420
ERROR: undefined: expr

This `undefined' message should not happen, as the variable should be
in lexical context.  Something goes wrong here..

The environment does have `expr' :

(gdb) p ex_post(sc, _ex_map1_prim(sc, ex_car, env))
(expr loop)

So maybe the two symbols are not equal?

(gdb) p term
$6 = 6561248

(gdb) p ex_caar(sc, env)
$7 = 6561248

No, they are..

Ok, I know what this is:  This should use find_slot istead of find,
because the value can be #f


Entry: First PEG grammar
Date: Tue Aug 18 16:41:23 CEST 2009

Next    = - d:Datum                      { ob = d; }
        | - e:EOF                        { ob = e; }
        | j:Junk                         { ob = j; }

Datum   = List | Number | Symbol | String

LP      = '(' -
RP      = ')' -
Dot     = '.' Space -

List     = LP Tail
Tail     = a:Datum Dot d:Datum RP        { $$= CONS(a,d) }
         | RP                            { $$= NIL }
         | d:Datum t:Tail                { $$= CONS(d,t) }

Junk    = < .* > EOF                     { $$= JUNK(yytext) }

Number	= < Digit+ > !Letter -	         { $$= NUMBER(atoi(yytext)) }
Symbol  = < Letter Char* > -	         { $$= SYMBOL(yytext) }
String  = ["] < ( !["] Char )* > ["] -   { $$= STRING(yytext) }


Letter  = [A-Za-z]
Digit   = [0-9]
Char    = Letter | Digit

Space   = [ \t\n]
-	= ( Space | Comment )*
EOF     = !.                            { $$= EOF_OBJECT }

Comment = ';' ( !EOL . )* EOL
EOL	= '\n' | '\r\n' | '\r'


After adding some special characters, it parses the boot.scm file.
Parse error reporting is going to be spartan it looks like..

So, how to parse from a different input port?
( Simple: read the man page. )

I think I get it now: a PEG is a backtracking program with a
sequential left to right choice operator.  It is not a description of
a CFG.

So, in order to read a minimal amount of characters, the current
grammar needs to be reworked a bit.  Esp. the greedy post-whitespace
parse needs to be replaced by something else.

Start with simply putting all '-' rules in front.  This seems to
almost work.. The .scm file won't parse though.

Entry: REPL working
Date: Tue Aug 18 20:07:50 CEST 2009

# This is the non-greedy version of the grammar:
# http://zwizwa.be/darcs/libprim/ex/sexp.leg


Read     = - n:Next                       { ob = n; }

Next     = Datum | EOF | Junk

Datum    = Quote | Const | List 
         | Number | Symbol | String

List     = LP Tail
Tail     = a:Datum Dot d:Datum RP         { $$= CONS(a,d) }
         | RP                             { $$= NIL }
         | d:Datum t:Tail                 { $$= CONS(d,t) }

Quote    = - ['] - d:Datum                { $$= QUOTE(d) }

Const    = True | False

LP       = - '('
RP       = - ')'
Dot      = Space - '.'
True     = - '#t'                         { $$= TRUE }
False    = - '#f'                         { $$= FALSE }

Number	 = - < Digit+ > !NonDigit         { $$= NUMBER(atoi(yytext)) }
Symbol   = - < Char+ >  	          { $$= SYMBOL(yytext) }
String   = - ["] < ( !["] . )* > ["]      { $$= STRING(yytext) }


Letter   = [A-Za-z]
Digit    = [0-9]
Special  = '!' | '-' | '?' | '*' 
         | '_' | ':' | '<' | '>' 
         | '=' | '/' | '+'
NonDigit = Letter | Special
Char     = Digit | NonDigit

Space    = [ \t\n]
-        = ( Space | Comment )*

Comment  = ';' ( !EOL . )* EOL
EOL	 = '\n' | '\r\n' | '\r'

EOF      = !.                             { $$= EOF_OBJECT }

Junk     = < .* > EOF                     { $$= JUNK(yytext) }


Entry: Next
Date: Tue Aug 18 21:26:25 CEST 2009

Scheme: 
- dynamic parameters
- dynamic wind
- partial continuations

PF: 
- separate linear pair type (OK)
- export non-linear part as EX (OK)
- polymorphy
- packet reuse
- exceptions

EX:
- primitives


Entry: GC and read -> need C suspensions
Date: Wed Aug 19 08:43:40 CEST 2009

Because ports are stateful entities, the read operation might interact
badly with GC restarts.

It might be necessary to construct an intermediate representation.
The real solution however is to make the reader communicate externally
at everything which is now a CONS, so it never sees scheme values.

I guess this can be made to work, so I'm going to stick to the current
version, since that change would be transparent once context switching
is implemented.


Entry: PF compiler: SC < PF ?
Date: Wed Aug 19 09:25:54 CEST 2009

Since it needs to construct the right data types, which ultimately tie
into the memory manager through finalizers, it looks like the linear
memory manager needs to be part of the Scheme's machine state if we
want to create linear code..

So no, this doesn't work..

What might work is to do it the other way around: have PF state be an
extension of SC state, so all Scheme code can run on the PF machine,
_and_ use the PF compiler words, but not vice versa.

Indeed: the `sc' struct only has the global registers and symbol cache
in addition to the `ex' struct.  This can be easily changed.

In order for this to work, the linear data needs to be well
separated.  Let's make a separate LCONS data structure.

Ok, almost done.  ex_bang_reverse was not polymorphic, and it wasn't
completely clear where to do the graph -> lin conversion.

NONLIN -> LIN conversion should happen in pf_ functions, while px_
functions operate on nonlinear structures.  They can depend on linear
structures, as long as they don't link them to anything.  I.e. inplace
update is allowed, and some kinds of folding.


Entry: LIN vs. BOX
Date: Wed Aug 19 12:24:51 CEST 2009

It's probably simpler to place ALL linear data that is supposed to go
into the nonlinear tree in a BOX.  LIN is only useful for quotations,
which can be nonlinear values.

This does break LIN->GRAPH->LIN equivalence, since all data will be
wrapped.

Something is not well specified in the data conversion: how to handle
lists at compilation time?  The interpreter should be dumb: it unwraps
LIN, but will simpy copy anything else.

Let's default QUOTE such that it will place a LIN-wrapped copy of the
quoted datum.


Entry: GC restarts
Date: Wed Aug 19 13:04:24 CEST 2009

When this keeps growing, it's probably going to be a good idea to get
rid of the GC restarts, and scan the C stack for references to update.
Restarting works well for simple functions, but with more complicated
C functions and these functions calling each other, the invariants (no
alloc after mutation) are not so easy to maintain.


Entry: PF read needs to be linear
Date: Wed Aug 19 13:15:28 CEST 2009

This is going to be more difficult to manage.  But I/O really needs to
be free of GC issues.  Ports also need to be lowlevel (fd instead of
FILE).


Entry: PF design choices
Date: Wed Aug 19 13:55:15 CEST 2009

So what problems does it actually solve?  The PF design allows the
combination of:

    * Synchronous finalization.

    * Reflection.

In PF, synchronous finalization is a consequence of the use of
primitive functions that operate on a linear core memory: linear
lists/trees and open/close RC-managed leaf objects).

Reflection -- here limited to compilation: the translation of source
code represented as a data object to a representation of composite
functions that can be executed by the interpreter -- is made practical
by embedding the linear core machine in a non-linear dynamic
language.

This composite code is naturally represented as a non-linear data
structure: it uses sharing (subroutine calls) and can contain cycles
(program loops).

While compilation itself is a nonlinear operations and might produce
garbage that needs to be collected asynchronously, running the
resulting code does not, as it only sequences linear operations.  Note
that nonlinear data can be treated as constant by the linear
primitives and interpreter because it is managed elsewhere, as long as
the asynchronous GC traces all the nonlinear references in the linear
memory.

Dynamic features are helpful for debugging, testing and evolving
design (= writing throwaway code).  This, however, is largely a matter
of taste.  The usefulness of the linear memory manager has some deeper
reasons though.

Asynchronous GC doesn't work well in combination with finalizers:
functions that free external resources (different from core memory)
when their representation gets collected.  

In practice, an ``external resource'' is anything with an
open/use/close access semantics that _you cannot change_ for whatever
reason.  This can be anything.  Some examples:

   - A file in a filesystem.

   - A hardware resource.
   
   - Cache-managed resources (i.e. processor memory cache).

( The latter case might be unclear.  Here the open/close is not
  explicit, but some resource will be evicted whenever a resource
  that's not in the cache is accessed.  A cache doesn't behave as
  memory because there is no `memory-full' notification when an
  eviction happens.  Absence of this signal is actually the basic
  idea: it frees the user from having to ``pool'' the resource.  In
  some cases however (essentially due to broken design, when you can't
  bypass the cache API), you want to regain control over the pooling
  behaviour by manipulating the access patterns from an external
  pooling mechanism.  Note that in general, this is only a performance
  problem: dead references will eventuall get evicted from the cache
  because they are never accessed, but they might trigger eviction of
  _other_ components that are still in use.  I.e. the round-robin
  access pattern typical for an asynchronous GC does exactly this in
  the case of an LRU cache strategy. )

The problem is that you never know when the GC runs, which might lead
you to cases where you're waiting on the availability of a resource
that's unreachable, but not yet collected.  Triggering local GC in
this case isn't a solution: the reference could be held by an
unreachable representation object in a _different_ program that uses
asynchronous GC which doesn't expose an interface to its GC trigger.
The real problem here seems to be that asynchronous GC is essentially
a _global_ operation, and making this work with low-level resource
management requires a specific kind of resource -> GC signalling to
happen between all components involved.

In stark contrast, the open/close semantics composes easily (a
constructor/destructor calls several constructors/destructors), and
shields the lower layers from what happens above.  This means that in
practical situations, open/close resource semantics are the only
available option, and in the case where fast reclaim is necessary, a
linear container memory might be the simplest solution.

A linear _concatenative_ language gives you many of the benefits of a
higher level safe language, with the added benefit of not needing an
asynchronous GC, at the expense of giving up the ``random access''
that come with lexical variables.

Of course, it is also possible to translate this to a language _with_
lexical variables as long as proper constraints are placed on variable
references: every variable needs to be referenced exactly once.  This
can be enforced statically.  This seems to be harder to get right
though (i.e. use-once closures).  However, if such a language can be
defined, it can probably be directly translated into the PF
concatenative execution model.

Note that Real-Time GC (incremental GC with aribrarily low upper
bounds on collection times) isn't a solution to the problem unless it
can manage _all_ resources in a bounded time.  Collection still isn't
synchronous, so any depletion event needs to propagate into the
collector, which might break the upper collection bounds as it
requires a full GC.


Entry: Static linearity checks
Date: Wed Aug 19 14:35:50 CEST 2009

It would be nice to find a way to statically enforce the linearity
constraint.  Currently, it's quite error-prone to write primitives.


Entry: Merging SC and PF
Date: Wed Aug 19 15:16:50 CEST 2009

Maybe it's time to completely unify the two interpreters.  Essentially
this would work as a `scheme-eval-step' in PF.

The idea is that once you have everything inplace to implement PF in
EX, the only thing that's left is the Scheme interpreter's (pure) step
function and some other operations on the interpreter state.

The only problem is Scheme's toplevel environment.  However, this
could be hidden as dynamic scope in the continuation chain inside the
state.


Entry: Multiple dispatch
Date: Wed Aug 19 16:46:09 CEST 2009

How is this best implemented?


Entry: No >r, nor r> : towards call/cc
Date: Wed Aug 19 17:56:53 CEST 2009

These don't work any more.  The problem is that there isn't really a
return stack.  There is only a continuation.

If this is the case then it is better to get rid of IP, and always use
the top of RS as the continuation.

What can be done is to modify the continuation such that an operation
can be inserted after another one.  I.e. to implement `dip'.

OK: removed IP register.  RUN is now simpy a push to RS.

Let's rename RS to K.

Now call/cc is very simple:

void pf_call_with_cc(pf *pf) {
    _ fn = TOP;
    _ k = pf->k;
    pf->k = LINEAR_CONS(fn, HALT);
    _TOP = k;
}

Note these are linear one-shot continuations, but they can be copied
without causing any trouble.  They are also _data structures_ and
cannot be applied like a function.

But, maybe "run" should take linear lists, by pushing them onto K.

Note this is problematic without partial continuations (which should
be the default..)


Entry: Lowhanging fruit
Date: Wed Aug 19 20:03:57 CEST 2009

- dip

- map

- linear lists (partial continuations) should not be recognized by
  `run' -> use `run-list' or so.


Map: This can't be a primitive, but I prefer to make it fast (local),
which could be done by modifying the continuation from the primitive.

It needs 3 storage locations: in out fn

  - move result to OUT
  - replace with next from IN
  - push FN to K

What I really need is a better way to express all these stack
permuations.

Dip: there's a problem here: quote isn't a linear atom, so it should
be handled by undip.


Entry: Delimited Continuations
Date: Thu Aug 20 09:27:08 CEST 2009

In contrast with the previous PF implementation, which was mostly an
indirect threaded Forth, the current VM has a direct representation of
continuations.  Partial continuations are not so difficult to
construct on top of this: they simply splice the current K upto a
marker.

I prefer to use shift/reset, so let's implement that first.

Seems to work:

(shift 1 2 3) reset
ps
<1> [{'1 '2 '3}]
run
ps
<3> 1 2 3

The first paper about delimited continuations seems to be this
one[1].  In the introduction it mentions that the crucial idea is to
represent continuations as functions that can be composed.
In contrast, traditional continuations cannot be composed, as they do
not return a value to the calling context (it is discarded).

On page 7, on changing the CEK machine: ``One solution is changing
(CEK7) so that the invoked continuation is concatenated to the current
one: ...''  Apparently this is problematic for the denotational
semantics (continuations being opaque?) but I see no problems on the
implementation side.

Essentially, in PF this concatenation could be made ``lazy'' (just
like SEQ SUBroutines) by allowing K to be a tree instead of a list.
That way the interpreter can unpack branches one cell at a time.
Otoh, it seems simpler to place the burdon in `run' as it is now.

There are some other problems with the current implementation though:
the `prompt-tag' and `undip' functions are special, but can occur in
lists that are representing a continuation.  Therefore it is probably
best to represent the continuations abstractly to prevent programs
from destructing them.  In any case, it could be left as is now.  Just
keep in mind: a list of code can be converted to a continuation, but
vise versa isn't really sound.

[1] http://www.ccs.neu.edu/scheme/pubs/felleisen-beyond.pdf


Entry: Error prone
Date: Thu Aug 20 10:01:05 CEST 2009

As mentioned before, writing the PF primitives is error prone.
Linearity constraints are difficult to maintain in the presence of
primitive exceptions.  There is only one remedy: never keep linear
variables on the C stack.  But this can be tedious..


Entry: Pattern matching for EX
Date: Thu Aug 20 11:28:14 CEST 2009

This really shouldn't be so difficult: from the data structures, it
would be a generation of a couple of CPP macros that setup a lexical
environment.  It would require some trickery with curly braces though.

MATCH(v) 
   CASE(pair, a, d) { }
   CASE(NIL)        { }
   ELSE             { }
END_MATCH


where MATCH sets up the first '{' and each CASE starts with a '}'.


Entry: Scheme -> EX compilation
Date: Thu Aug 20 15:24:19 CEST 2009

This seems really straightforward.  It would be nice to be able to
statically check at least names and argument arity.

About writing code generators: you _really_ want to write it as a
coroutine (a `generator') that passes strings to an output port
through a `yield' function, in this case traditionally called `emit'.
Using mapping might be more elegant, but in this case due to small
differences in tree structure, decoupling the collector from the
enumerator is best.

In C, there are statements and expressions.  For EX it's simpler to
get rid of the statements early on, so one can focus on the
expressions.  Also, they don't need to be converted to statements
(CPS) since C's expression syntax will do just fine.

Then, it's simpler to use syntax-rules style pattern matching to write
the generator than an using scheme/match at runtime.

A bit of work trying to get the basic abstractions right: statements
emit a line, expressions use string substitution to be built
recursively, function/constructor names get mapped, context arguments
get added.  Everything is a function, except the language forms
themselves, which are implemented as macros (`def', `block' and `return').


(def (bar x y)
  (block ((a 123)
          (b 345))
    (block ((c (plus a b))
            (d (min a)))
       (return (foo c d x y)))))

=>
_ ex_bar(ex *ex, _ x, _ y)
{
    _ a = 123;
    _ b = 345;
    {
        _ c = ex_plus(ex, a, b);
        _ d = ex_min(ex, a);
        return ex_foo(ex, c, d, x, y);
    }
}

Translating from Scheme-like syntax (eliminating 'return') will
require automatic distinction between special forms and application
forms.

There's another thing that's really annoying: nested Scheme/lisp
expressions return values, but C's curly braces represent nested
statements.  Maybe it's best to simply target GCC's statement
expressions[1], and use the '?' condition operator.

This seems more natural.  I'm also using a direct compiler instead of
macros.  To make this look better though, the indentation needs to be
changed so statement expressions can be assigned to variables to
reduce the visual complexity of the nesting.

That's a pretty-printing exercise, so consider the real problem
solved.  The rest is play ;)

So, prettyprinting.  What you want is "newline" to indent to the
current indentation point.

--

Trying something else: rewrote it to make it more _manual_ :
essentially the code makes decisions locally about whether to "press
enter" or whether to change indentation or not.  This seems to work
quite well.

(emit-definition
 '(define (foo a b c)
    (let* ((x (plus b c))
           (y (let* ((d (min a b))
                     (e (max a b)))
                (times d e))))
      (bar x y))))
(pp-enter)

=>
_ ex_foo(ex *ex, _ a, _ b, _ c)
{
  return ({ _ x = ex_plus(ex, b, c)
            _ y = ({ _ d = ex_min(ex, a, b)
                     _ e = ex_max(ex, a, b)
                     ex_times(ex, d, e); })
            ex_bar(ex, x, y); });
}


OK..  I now see that this is not trivial because it requires
backtracking.  Better use something that's seen some thinking[3].

[1] http://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html
[2] http://www.iai.uni-bonn.de/~ralf/hw2001/9.html
[3] http://planet.plt-scheme.org/display.ss?package=pprint.plt&owner=dherman
[4] http://eprints.kfupm.edu.sa/69518/1/69518.pdf

Entry: Terminology
Date: Thu Aug 20 22:59:14 CEST 2009

I shouldn't be calling the code graph an AST (abstract syntax _tree_)
since it is a _linked_ AST, which in general is a graph: variable
references linked to their definition points.

In PF all variables refer to code: primitives, quotations or SUB
cells.


Entry: PF interpreter in Ocaml
Date: Fri Aug 21 08:14:26 CEST 2009

I'd like to encode the interpreter data types in something more rigid,
to see if I missed some corner case.

The first thing to realize is to distinguish types and constructors.

      SEQ = (SUB : SUB)            ;; `:' is graph CONS
      SUB = PRIM | QUOTE | SEQ
 
      K   = HALT | (SUB . K)      ;; `.' is linear CONS

type sub  = 
    Prim
  | Quote
  | Seq   of sub * sub ;;

type k = 
    MT 
  | Frame of sub * k ;;

The second thing is to realize that dynamic typing is a _lot_ less
restrictive than the ML type system.  Especially the wrapping
necessary make unions of other types.  Dave Herman describes it
here[2] in relation to the way it's done in Typed Scheme[3].

type sub  = 
    Prim  of prim
  | Quote of datum
  | Seq   of sub * sub

and k = 
    MT 
  | Frame of sub * k

and datum =
    Number of int
  | Code of sub 
  | Pair of pair

and prim =
    Address of int

and pair =
    Nil
  | Cons of datum * datum

;;

Now I'm trying to express the interpreter.  Ocaml's syntax is weird.
Constructors have different syntax from functions?

Aha.  Ok I understand.  Constructors can take multiple inputs, but
functions always take a signle one.  Multiple inputs are handled using
currying.  A constructor seems to be a function that takes a tuple.

Next: how do you put a function in a datatype?  I.e. the prim type is
actually state -> state.  Looks like I want GADTs[4][5][6].  This
looks like a can of worms I don't want to get into yet.  Let's use
simple enumeration.

Good.  It's running.  See code here [7].

I've added 'Run' which leads to some arbitrariness in encoding the
types.  Is Run a prim or a special case of code?  I'd say the latter
since it operates on the whole machine state instead of just the
parameter stack.

Quite revealing to have to specify things a bit tighter..  From a
first real contact, I think I like OCaml, even if it is more
restrictive.  It is definitely more easy to get things right once they
pass the type checker.  I should have a closer look at Typed
Scheme[3].

Now, getting the 'fac' example running. In factor:

  : fac ( n -- n! ) dup 1 = [ 1 ] [ dup 1 - fac ] if * ;

What is necessary is a way to define a cyclic data structure.
Apparently `let rec' does not allow this:

let rec _fac =
  Seq(_dup,
  Seq(lit 1,
  Seq(_equals,
  Seq(quot (lit 1),  
  Seq(quot ((Seq(_dup,
             Seq(lit 1,
             Seq(_minus, 
                 _fac))))),
  Seq(_if, 
      _multiply))))));;

=> Error: This kind of expression is not allowed as right-hand side of `let rec'

How to solve this?  In Scheme it's easy: just dump a symbol at first,
then walk the data structure and set! the value.


[1] http://lambda-the-ultimate.org/node/983
[2] http://calculist.blogspot.com/2008/09/true-unions-revisited.html
[3] http://www.ccs.neu.edu/home/samth/typed-scheme/
[4] http://www.haskell.org/haskellwiki/Generalised_algebraic_datatype
[5] http://okmij.org/ftp/ML/GADT.ml
[6] http://www.haskell.org/haskellwiki/GADTs_for_dummies
[7] http://zwizwa.be/darcs/libprim/pf/pf.ml
[8] http://caml.inria.fr/pub/ml-archives/caml-list/2008/12/460de486fcbbc2eca13e5c77d014da17.en.html


Entry: PF interpreter in Haskell
Date: Fri Aug 21 17:02:45 CEST 2009

That was fun.  Let's try the Haskell version.  First, how does this
work..  Loading the file pf.hs in ghci:

> :l ~/libprim/misc/pf

To reload:

> :r


Oops.. ran out of steam..  maybe one lanuage in one day is enough..

I wonder though: the only apparent difference with Ocaml (apart from
the pure and lazy things) is that constructors seem to be curried
functions, while on Ocaml they take tuples.


[1] http://learnyouahaskell.com


Entry: Remarks about the PF interpreter
Date: Sat Aug 22 08:15:00 CEST 2009

Refer to entry://20090816-120957

Note it is important to distinguish the byte code graph data structure
from an AST or the untyped s-expression encoding of _symbolic_ code.

The SEQ type can have sharing (shared subroutines) or loops (direct or
mutual recursion), while an AST or an s-expression is really a flat
_tree_.


    - symbolic source code like:  ``1 dup cons''

      here the symbols are translated by a dictionary (vocabulary)
      that maps symbol -> SUB.  symbolic code is a _tree_ represented
      as an _untyped_ s-expression.

    - `byte code graph'.  Each symbolic source code definition can be
      compiled to a single SUB (subroutine) type, which is _directly
      linked_ to other SUB types by data sharing.  Because it's a
      graph, this data structure is not easily printed.

    - AST: because the original syntax is so simple, the intermediate
      form one would traditionally use in an interpreter = a data
      structure with a recursive type that represents the same _tree_
      as a source code s-expression, is _not_ present in the current
      PF implementation.  Compilation goes straight from symbolic code
      as non-annotated s-expressions to the `byte code graph'.


IMPLEMENTATION

      One could _implement_ a SEQ(foo,bar) as the following machine
      code:

                  call <address of foo>
                  jump <address of bar>

      Note that the `first' slot in a SEQ will push the return stack
      when it points to another SEQ.  If foo were a prim, the `call'
      instruction could be replaced by inlined machine code
      implementing the primitive, or a call to a machine code
      subroutine.

      In the case 'bar' is a subroutine that's used only once, it
      could be inlined to eliminate the jump.

      Alternatively, if the machine has a literal push instructiom, an
      even simpler representation would be something like:

                  push <address of bar>
                  jump <address of foo>


Entry: The return stack should contain XTs
Date: Sat Aug 22 10:40:14 CEST 2009

Something that has always bothered me about Forth is that the return
stack contains _pointers_ to XTs (i.e. XT _arrays) while `execute'
takes XTs.

In other word: the return stack _itself_ is not a program.  This is a
consequence of having two forms of code: threaded code and machine
code.  An XT is often implemented as a pointer to machine code (direct
threaded) or a pointer to a pointer to machine code (indirect
threaded).


Entry: PF closure
Date: Sat Aug 22 10:46:47 CEST 2009

Because there is no nested lexical scope, there are no closures as
there are in a functional programming language based on the lamda
calculus.

However, they can be simulated by `consing' data onto code.


Entry: Ocaml and tagged interpreters
Date: Sat Aug 22 12:23:04 CEST 2009

It seems to me that not being able to define data types that can
contain functions is quite a limitation.  Maybe there is something I
didn't get?  Maybe this only can't be done _recursively_ ?

I.e. what I want is:

type sub =
    Prim    of (state -> state)
  | Seq     of sub * sub (* NL *)

instead of having to break it down to tags where each state -> state
function is interpreted.

Actually.  Look at [1] Function types.  

Also [2] Recursive values which are not functions.  About recursive
values: apparently it is only able to handle `simple' expressions.
``Informally, the class of accepted definitions consists of those
definitions where the defined names occur only inside function bodies
or as argument to a data constructor.''[3]

[1] http://caml.inria.fr/pub/docs/oreilly-book/html/book-ora016.html#toc21
[2] http://caml.inria.fr/pub/docs/oreilly-book/html/book-ora016.html#toc23
[3] http://caml.inria.fr/pub/docs/manual-ocaml/manual021.html#s:letrecvalues

Entry: Next
Date: Sat Aug 22 15:22:33 CEST 2009

After the small excursion to Ocaml land (maybe I should forgo the
Haskell version?) it's time to get back on track and get this thing
working before next week.

Goals: 

 - tasks/assymetric coroutines with proper stack switching.

 - automatic wrapping of C APIs.


Entry: Closures / Different continuation types
Date: Sat Aug 22 18:22:21 CEST 2009

The 'cons' trick to make a runnable continuation out of lists
(possibly consed to existing code) doesn't work: code needs to be
quoted.

> '(1 2 3) run
> ERROR: unknown-instruction: 1

Maybe it was really a good idea to go for the Ocaml route: now I can
try to express these things in a type system to eliminate glossing
over those level-shifting tricks that tend to seep in to ``just make
it work''.  This is something to be _really_ careful about.  Something
I've only learned to appreciate recently..  It makes things more
complex, but it is well worth it when you try reasoning about the
code.  Guy Steele mentioned this in [1].  It is also quite central to
the PLT module system[2].

Intuitively: it is about a more general principle of not giving into
the ``turing machine feedback loop''.  Recursion (or iteration) is a
fine tool, but too much mixing makes things hard to understand.  More
often you want to abstract direct recursion into combinators.

For my taste Joy relies too much on `i' which constantly shifts
levels.  However it does illustrate a point: if you want a really
simple language: eval everything.

What would work is to have different 'frame' types to instruct the
interpreter to handle these speical cases.  Also, the stuff you pinch
off of a continuation stack should really be opaque.  Composable, yes,
but not decomposable..  Maybe there could be two kinds: interpreting
and non-interpreting frames?  So it looks like we need different kinds
of continuations after all..  They can all be differently tagged
linear pair types though.


[1] http://www.ai.mit.edu/projects/dynlangs/wizards-panels.html
[2] http://www.cs.utah.edu/plt/publications/macromod.pdf


Entry: Quote of the day
Date: Sat Aug 22 18:50:04 CEST 2009

Attributed to Matthew Flatt:
 ``You need a certain amount complexity to make things look simple.''

[1] http://list.cs.brown.edu/pipermail/plt-scheme/2009-August/035243.html


Entry: Asymmetric Coroutines in Lua
Date: Sat Aug 22 22:39:04 CEST 2009

It makes the point that, like partial continuations (PC), asymetric
coroutines (AC) are easier to use because they are composable just
like routines, while symmetric coroutines are hard to set up.

Coroutines cannot yield when there is a C function call in the stack.

The authors mentioned they were working on investigating the relation
between AC and PC.  From what I tried in Scheme, an AC can be
implemented in terms of a PC.  Also the other way around?  They both
are essentially segments of a control stack.

The update is here[3][4].

A library by Sandro Magi (naasking)[5][6].  Looks like this is the
place to contiue working.  I missed it last time, since I went to the
sigfpe[7] article for inspiration.  Currently, libprim/task.c has
stack copying coroutines.

So, what am I going to do?  Stack copying already works, so it could
be a fall-back.  Implementing one-shot continuations could then be
implemented as an additional feature.

[1] http://lambda-the-ultimate.org/node/438#comment
[2] http://www.inf.puc-rio.br/~roberto/docs/corosblp.pdf
[3] http://lambda-the-ultimate.org/node/2868
[4] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.58.4017
[5] http://code.google.com/p/libconcurrency/
[6] http://higherlogics.blogspot.com/2008/07/coroutines-in-c-redux.html
[7] http://homepage.mac.com/sigfpe/Computing/continuations.html
[8] http://www.reddit.com/comments/6s5tt/coroutines_in_c


Entry: Making partial continuations abstract
Date: Sun Aug 23 09:35:40 CEST 2009

So, it looks like it's clear that PCs are the proper abstraction
mechanism: they are k-stack segments that can be concatenated to the
current continuation.

There are several ways to approach this.  However, considering that
the tagging needs to use _linear_ memory, the appropriate way is to
tag the pairs.  If this is the way to do things, is it possible to
move other kinds of tagging to a ``snapped'' pair-based encoding?

I.e. instead of Seq(Quote(a), b) use SeqQuote(a, b)

It doesn't look like it: there are several disadvantages to not having
Quote(...) and Prim(...) as separate instances of a sub (subroutine)
type.  These are used a lot as a single type and should be bundled, so
they don't complicate matters.

So what about this:


and cont  = 
    Done  
    | Next of sub * cont
    | Late of datum * cont

Where 'Late' has the implied semantics of using `interpret' to convert
datum -> sub.  This would allow certain kinds of cheap, linear
reflective behaviour to be implemented.

( Note: I'm _only_ trying to work around the construction of a Quote
type, which is nonlinear.  I.e. it is a flattening optimization that
allows the use of linear data only. )

What about calling 'Late' something like 'Data' or 'Close'.  It really
doesn't need `interpret' semantics: it can be a pure quote, because
the call to `interpret' can be inserted directly into the
continuation.

This looks alright.  I just have to convince myself that this is
elegant: I.e. that it's really necessary to have 2 quotation
mechanisms: one part of nonlinear code (enabling data to behave as
code + be shared), and one part of linearly constructed code (data as
code, but not shared).  

Let's give it some fermentation..

EDIT: One thing to note is that the continuation doesn't need this
whole tail-recursive constraint: it is always consed to the empty
continuation, which leads to a machine halt state.  So it really _is_
a pair.

All this is because of subtle differences is between `code', `closure'
and `continuation'.  (I need a better name for `closure' though.)


Entry: Tree permutations
Date: Sun Aug 23 10:00:42 CEST 2009

It's suspiciously difficult because of the possible type exceptions
that might mess up temporary storage on the data stack.  So let's
forget about optimality and write well-factored code.

A linear language specification needs a sublanguage for specifying
_tree permutations_.  Essentially, once you've figured out which
pointers permute, the inverse is just the inverse permutation.  This
needs to be split in two parts:

      - make sure the tree has the structure you expect (type check)

      - perform permutation

Essentially, I want th tree transformation to be safe and guaranteed
linear in the first place, and correct in the 2nd place (which is then
easily checked in the terminal).

Probably Knuth has something to say about this.  Also, I ran into a
tree transformation language a while back.  Forgot where..  Was a
Belgian guy.

Also, I did something like this before.  Probably hidden somewhere in
Staapl / Brood code archives..  Poke?

This looks like a nice exercise to write a compiler for.

Can it be done using binary tree rotations?

( Really, I've done all this before..  But where? )

Suppose the stack is this: (L . S)

`cons' will be something like  (L . (d . S)) -> ((d . L) . S)
then the reverse `uncons' is:  ((d . L) . S) -> (L . (d . S))

They are indeed simple tree rotations.

A left rotation can be transformed into a right rotation as
LEFT = FLIP * RIGHT * FLIP

Nope!  They are not tree rotations.  They are rotations followd by a
particular flip:

               LEFT                 FLIP
(L . (d . S))  --->  ((L . d) . S)  --->   ((d . L) . S)

Anyways, it doesn't seem so difficult to compile such a definition
into a program that deconstructs the list, and then re-uses the list
cells to reconstruct it.

This can even be done manually.  I've summarized it in the next post.


Entry: Writing linear tree transformations
Date: Sun Aug 23 10:58:56 CEST 2009

/* Implementing (conservative) tree transformations. 

   A tree transformer can be derived using the following manual
   compilation approach:

       1. Write down the transform in a high level dotted pair form:
          LHS -> RHS
    
       2. Following the structure in LHS, bind the nodes (and
          deconstructed dot pairs) to C lexical variables using
          type-checking casts for the pairs.  This makes sure
          exceptions happen before we mutate anything.  

       3. Rebuild the tree by re-using the pairs to create the dotted
          tree in the RHS.

   To increase readability, index the pairs from left to right as they
   appear in the textual form of LHS and RHS.

*/


/*  D = datum
    L = list
    P = parameter stack
    
      (L . (D . P)) -> ((D . L) . P)
*/
void pf_cons(pf *pf) {
    pair *dot0 = CAST(lpair, pf->p);
    pair *dot1 = CAST(lpair, dot0->cdr);
    _ L = dot0->car;
    _ D = dot1->car;
    _ P = dot1->cdr;
    pf->p     = VEC(dot1);
    dot1->car = VEC(dot0);
    dot0->car = D;
    dot0->cdr = L;
    dot1->cdr = P;
}

/*  D = datum
    L = list
    P = parameter stack

      ((D . L) . P) -> (L . (D . P))
*/
void pf_uncons(pf *pf) {
    pair *dot0 = CAST(lpair, pf->p);
    pair *dot1 = CAST(lpair, dot0->car);
    _ D = dot1->car;
    _ L = dot1->cdr;
    _ P = dot0->cdr;
    pf->p     = VEC(dot0);
    dot0->cdr = VEC(dot1);
    dot0->car = L;
    dot1->car = D;
    dot1->cdr = P;
}


Entry: peg/leg leaky?
Date: Sun Aug 23 11:18:05 CEST 2009

Let's clean up the reader.  There is a problem with the peg/leg
parser: it uses malloc() but no free().  OK for one-shot tools but if
I'm going to keep using it, it's probably best to handle that
problem..

Ok, it apparently reuses its buffers, and grows them whenever
necessary.  I guess that's OK.

So, now I just need to be careful to not upset it.  Probably best to
use malloc()


Entry: Different continuation types
Date: Mon Aug 24 11:03:53 CEST 2009

Continuing the ideas in [1].  Starting with the following context:

  * A partial application is a binding of data context to code.
    Suppose you have an object referenced by `ob' and an operation
    `op' that expects such an object as the first parameter on the
    stack, the following code would be a closure:

          (ob op)

  * In Joy, code == data.  Its VM wouldn't need to distinguish pairs
    for code, partial applications and partial continuations.  Their
    difference is only visible through usage: they are all represented
    as lists.

  * In PF, these 3 need a different representation because code is
    nonlinear, while partial applications and continuations are
    linear.
    

What about this:

and stack = Empty | Next of datum * stack

and cont  = Done  | Next of sub   * cont
                  | Data of datum * cont

Wait.  There is already a mechanism to construct partial applications
as continuations: `dip' followed by `run'.

The question is then: what is best?  Explicit tagging (using different
continuation frame types) or ``prefix parsing''.

It seems to me that tagging is better behaved: in that case each frame
has its meaning isolated, and doesn't depend on the meaning of the
previous frame.  So be it: tagged frames.  This also makes `dip'
simpler.

[1] entry://20090823-093540


Entry: A Joy machine
Date: Mon Aug 24 12:21:55 CEST 2009

In [1] you find an intensional Joy machine with continuations
implemented as intensional quotations constructed at run time as the
machine performs name -> quotation dereference.

This is an example of how a reflective feedback loop (turning data
into behaviour) can make a language interpreter really simple.

The flip side is that reflection breaks static analysis.

[1] http://zwizwa.be/darcs/libprim/pf/joy.ml


Entry: SEQ -- why encode the byte code as a binary tree instead of nested lists?
Date: Mon Aug 24 17:16:15 CEST 2009

(From a discussion with Dominikus.)

Why use a binary tree instead of a list to represent byte code?

We start with the assumption that we agree on representing byte code
as a recursive data type (i.e. as opposed to using low-level arrays
and pointers/indices), and that we agree on wanting proper tail call
handling so loops can be expressed as recursive calls without blowing
up the context stack.

To do this correctly, given a concatenative function specified by the
code sequence "a b c ... z", the _last_ procedure `z' needs to be
handled in a special way.  Before `z' is invoked, no context needs to
be saved because there is nothing to do afterwards.  This is in
contrast with the function calls `a', `b', `c' ... which need the
computation to resume and thus require a context save.

Now, in the case of a compiled language, you have two options.  You
either perform this special treatment of the last element in the list
at run time, or you perform it at compile time and represent the code
in a data structure that doesn't need any special case handling by the
interpreter. 

This is exactly what SEQ does: it provides a data structure which can
be interpreted without special-casing.  The `handling' of tail calls
is done at compile time only, by encoding the program as a binary
tree.  Interpreting the binary tree has only one case: the first
branch needs context save, while the latter one doesn't.


Entry: Rant about reflection
Date: Mon Aug 24 19:04:41 CEST 2009

PF isn't really so much about Forth.  The name is a historical
accident..  PF is really about building a simple, safe, dynamically
typed language with linear memory management.

Let me rant a bit.

Forth is.. well.. Forth :) There isn't much to say about it, except
that it's a very nice machine model with a hacked-together text
interpreter that's really too reflective.

Over the last couple of years I've moved more into the direction of
less reflective code, mostly guided by the PLT Scheme approach to
domain specific languages.  The attempt in Staapl to ``unroll'' the
very reflective nature of standard Forth stands as one of the pilars
of my approach to concatenative languages.

In general the pattern seems to be that using very reflective
languages leads to very simple language cores and a lot of expressive
freedom, but in turn makes it easier to write programs that are harder
to understand.

In a static programming style you put the cleverness in the types and
other static constraints: something a compiler can check and proove
consistent.  In a dynamic language the cleverness goes right into the
core of things: anything can be special-cased.

The main point being that when code bases grow it seems to be
beneficial to be able to bolt things down in a more static way: huge
code bases can't be kept in a human head so you need a machine to help
manage at least part of the consistency.  Adding some static
properties seems to be the only way to do that effectively.

I'm currently right in the middle of these two paradigms: I realize
dynamic languages are essential for debugging and exploration, but
also that static structure is necessary to put more meaning into code
you write for "later" when it's meaning is no longer in your
short-term memory.  The holy grail seems to be to find ways to combine
these two in a productive manner.  Gradual typing seems to be the way
to go.


Entry: JOY: Intensional quotation
Date: Mon Aug 24 20:32:40 CEST 2009


Both intensional and extensional quotations can be _constructucted_ by
building larger objects that contain other quotations as subparts, but
only intensional quotations can be _deconstructed_ i.e. interpreted as
a list.

Because the use of extensional quotation requires two different types
with different operators, Manfred concludes[1] that ``On the whole,
then, it does seem preferable to have quotation as an intensional
constructor.''

PF (and Scat in Staapl) do it differently to be able to compile
quotations to a more efficient form.  MetaOcaml does it differently to
not loose well-typed guarantees.  In general it seems to me that
extensional quotation is the sensible default: loose reflective object
language expressivity to gain meta language expressivity.


[1] http://www.latrobe.edu.au/philosophy/phimvt/joy/j07rrs.html
[2] http://en.wikipedia.org/wiki/Intensional_definition
[3] http://ccl.clozure.com/irc-logs/scheme/2008-10/scheme-2008.10.22.txt


Entry: rewriting -> Factor list
Date: Tue Aug 25 08:00:05 CEST 2009

Hello list,

I guess this is mostly for Daniel, but anyone else of course feel free 
to chime in...

I'm trying to get a better idea about rewriting systems for
concatenative languages, particularly in the area of specification of
peephole optimizations.  It's been on my mind for a while, but I've
always side-stepped any non-deterministic computations.  I.e. in
Staapl[1], the PIC18 code generator[2] is written using an eager
pattern matching approach, essentiall re-interpreting already
generated machine code as stack machine code that can then be
partially evaluated together with the currently compiling primitive
operation.  (Explained here[3] in some detail).

I've always been suspicious that this formulation, while it works well
in practice and has a certain elegance, is a special case of a more
general, non-confluent rewriting system.  

It seems to me that if there are any interesting optimizations to
make, they are not going to be unique and depend on which path through
the reduction rules is used to get to a particular irreducible
expression.

And then it stops.

I have some intuition about reduction machines, as long as they
compute a unique solution (lambda calculus w. strict eval (CEK
machine) and lazy eval using graph reduction, ...) but I'm not sure
how to view the other cases where you have a bunch of rewrite rules
and you ask the compiler to pick a sequence of reduction rule
applications that leads to the ``most optimal'' reduction under some
kind of cost function (for PIC18 this would be the machine code
length).

I hope this makes some sense..
Any ideas on this welcome.

Cheers,
Tom

[1] http://zwizwa.be/staapl
[2] http://zwizwa.be/darcs/staapl/staapl/pic18/pic18-unit.ss
[3] http://zwizwa.be/archive/staapl.html


Entry: Real-time GC
Date: Tue Aug 25 08:34:16 CEST 2009

PF uses linear core memory management and nonlinear code memory
management.

With linear memory management I mean that the machine's run-time uses
_only_ a tree data structure.  SEQ encodes all the rest, and is the
basic data structure of the nonlinear memory.  (Essentially, SEQ is
only constructed at compile time: during run-time the code structure
is constant.)

The reason for this is that (in my understanding), "real" real-time 
tracing garbage collection doesn't exist.  It exists for memory-only GC, 
but in practice you also want to manage much more scarce hardware
resources which in the light of a real-time GC _still_ require full
non-bounded collection steps.

If you replace a tracing GC with refcount management system, then you
can have real-time GC, given that you realize GC cost is proportional
to the _size_ of data structures you destroy.  In practice, this isn't
a problem.  An embedded system usually has a loop with very
predictable data usage.  The case where you destroy a single data
structure containing a large amount of objects is rare.


Entry: Metacircular interpretation: Maxwell's equiations of software
Date: Tue Aug 25 10:27:08 CEST 2009

[1] http://arcfn.com/2008/07/maxwells-equations-of-software-examined.html


Entry: Next
Date: Tue Aug 25 10:37:26 CEST 2009

Implement the 2-type continuation frames in C.  This requires two
kinds of linear cells: data cells (list cells) and contiuation
k-cells.

Got it..  let's see if `dip' works.

OK.

shift/reset now also work again.

The implementation (using linear pairs for data and `next' frames for
sequence continuations) is a bit messy.  Maybe there is something to
say about doing it like this..  Too reflective?  Let's keep it like it
is and see if there are any inconsistencies.

Summary: it's now possible to `cons' a datum onto a partial
continuation, which is implemented as a nested data/next frames, where
the data frames are just the linear list object.

This is still not correct:

Essentially one wants a `bind' and a `compose' operation which
combines a datum or a function with a function.


Entry: What is a continuation?
Date: Tue Aug 25 17:33:50 CEST 2009

That's the down-to-earth bit-twiddler's ``is''..

A continuation is essentially (a representation of) an imperative
program that sequences the remaining primitive machine operations to
perform to complete the current program/evaluation.  The side effect
of this imperative program is the evaluation of the expression it
represents.

Obvious if you look at what CPS-conversion does: it translate a
functional program written in terms of nested expressions, into an
imperative program that incrementally builds and updates a data
structure.

Now, this becomes really obvious in a concatenative (point-free)
language, which already is in CPS form.  The continuation is
essentially a program in the same form as any other program.


Entry: PLT Redex
Date: Tue Aug 25 17:46:05 CEST 2009

Jao calls it syntax-rules on steroids[1].  Looks like the answer to
the question about the peephole optimizer's rewriting behaviour has
been right under my nose for quite a while.

[1] http://programming-musings.org/2009/06/19/a-merry-gang/
[2] http://redex.plt-scheme.org/


Entry: Re: rewriting
Date: Tue Aug 25 21:31:25 CEST 2009

Thread + reply from Daniel:
http://www.mail-archive.com/factor-talk@lists.sourceforge.net/msg03559.html

> If the rewrite rules are monotonic, then there's an easy procedure
> to optimize the program: list out all programs, calculate their
> efficiency, and select one of the best ones

I guess here `monitonic' means that there are no loops?  Something
like `strictly decreasing'.

> With something like an optimizer, I don't understand why things have
> to be confluent. 

This corresponds to my intuition also.

> Maybe the algorithm to find the optimal series of peephole
> optimizations will be a confluent subset of the general rewriting
> system of all valid peephole optimizations; is this what you mean?

What I meant was that the algorithm in Staapl is formulated
differently (each word is a macro: a _fuction_ which takes an object
code stack to an object code stack).  I don't immediately see how this
can be reformulated as a more general rewrite system on code _syntax_,
but I've got this hunch it's not so difficult.

I'm also not sure if they are really 100% equivalent, since I'm
throwing in some extra specification of the order of evaluation.  This
makes it possible to play tricks like using objects that only exist at
compile time without requiring any special notation for it: the eager
algorithm guarantees that such objects are fully reduced before code
is compiled (or raise an error).  What would be neat is to be able to
keep this semantics, but use a more elegant way of formulating it as a
syntactic operation.  (If this doesn't make sense I'd be happy to
elaborate.)

> On register-based architectures (like the PIC18, right?), doing
> everything with a stack is inherently inefficient because of peeks
> and replaces on the top of the stack.

The 8-bit PIC chips are weird.  The 16-bit cores are more or less
standard register/risc but on the 8-bitters everything needs to pass
through a single working register.  This makes it almost cry for a
stack language approach :)

> I guess you do the equivalent of DCN by your peephole
> optimizations. 

DCN = deconcatenation?

> The thing about Factor's compiler is, it does this in a more general
> way. You should look at Slava's past blog posts, or read the code,
> if you're interested in knowing more about it.

Bout time I have a look at it then..  Is the compiler tied in closely
to the rest of the system?  I'm wondering if I it would be difficult
to embed it in PLT Scheme through the concatenative -> Scheme wrapper
macros used in Staapl.

So, your basic idea seems to be that yes, you want something that's
not confluent, and no, there is no general algorithm: general
rewriting is too unstructured, so a special purpose approach is
required.

Thanks so far for these answers!

Cheers,
Tom


Entry: Abstract continuations
Date: Wed Aug 26 11:12:17 CEST 2009

Time to byte the bullet and make the code more abstract.
TODO: make the polymorphy more highlevel on next change.


Entry: Partial Application
Date: Wed Aug 26 11:55:02 CEST 2009

The pf.ml machine viewed as a non-linear machine can have a `sub' for
continuation.  To do this linearly one needs:

      1. a way to wrap up a a linear datum as something runnable

      2. allow for the composition of runnable things.

It turns out that `compose' is a much better base to build on than
partial application, which is a special case of run-time data -> code
conversion and runtime code composition.

In the current form it needs an extra wrapper for nonlinear -> linear
data, but after that, `compose' is just list concatenation where the
CONS cells are continuation frames of `dip' and code sequencing.


Entry: PF: Linear code composition with `compose'
Date: Wed Aug 26 15:47:04 CEST 2009

The `>lcode' word _projects_ linear code and nonlinear code to linear
code.

The `abstract' word _wraps_ any atom as linear code.

The `lcompose' word takes two linear code words and composes them.

The `compose' word takes two elements on the stack, projects them to
linear code using `>lcode', and composes them with `lcompose'.

The `pa' word takes two elements on the stack, converts the first one
using `abstract' and the second one using `>lcode', and composes the
resulting linear code with `lcompose'.

A partial continuation obtained using `shift' and `reset' is
represented as linear code.  A full continuation obtained from
`call/cc' is currently implemented as a partial continuation that
ignores continuation marks inserted by `reset'.  The `run' word and
with it all combinators take both linear and nonlinear code.

They do _not_ take data however.  In that light I'm still not
convinced that `>lcode' should automatically convert data to code.
There is probably need for an `>ldata' word too.


Summary:

        Anything that represents code can be passed to `compose' to
        yield a code quotation that can be passed to `run' and higher
        order combinators.


Entry: Shift / reset
Date: Wed Aug 26 23:19:22 CEST 2009

In Scheme, `shift' binds the partial continuation to a value and
then passes this to a body, shifting it in place of the removed
frames.  Looks like `reset' and `shift' need to be exchanged!

More tomorrow..

(shift 1 2 3) reset
> > ps
<1> <,{'1 '2 '3}>
> run
> ps
<3> 1 2 3
> 

It is `reset' that marks the continuation stack and `shift' that packs
up the continuation.  So it looks like it is correct.  The only
strange thing is that in a CPS language, the ``body of shift'' will
come after `reset' !


Entry: Concatenative languages and Full continuations
Date: Thu Aug 27 14:32:29 CEST 2009

From a discussion with Dominikus I come to conclude that it's
important to distinguish different kinds of continuations:

  * Full state continuations, that capture the entire machine state:
    both parameter and context stack.

  * Context-only full/partial continuations, that capture marked
    segments of the control stack and view the whole data stack as the
    value to be passed.  

Note that marking the data stack is ill-defined, as it may be popped
``beyond the mark'' before it is captured.

Can both be unified?

Rephrasing one of the solutions Dominikus proposed, one could wrap the
stack as a quotation, and concatenate it onto the remainder of the
current program being evaluated, yielding a new program that
encapsulates the full continuation.

Let's implement this in PF and see where it takes us.


OK.  After doing this it seems that

 * You still want partial continuations in this context: in most cases
   you're not interested in the toplevel drivers, just in the control
   that's part of the program.

 * The operation factors out into two steps: ordinary continuation
   operators that ignore the parameter stack, and a separate operation
   that appends the stack as a quotation to the resulting
   continuation.


Thinking a bit further and keeping an eye on how it's implemented in
Scheme, it seems that this generalizes to any kind of threaded state
mechanism (i.e. FP-style matrices, Monads, Arrows, ...?)


Entry: simpler primitives
Date: Thu Aug 27 15:03:09 CEST 2009

As mentioned before: there are too many things to think about
implementing linear primitives (possible GC restart and mutation).
This reduces my confidence that the code is correct in the presence of
primitive error aborts or GC aborts.

So, either I give up on the idea of GC restarts, or I factor the code
such that these distinctions are no longer a problem.


Entry: TODO
Date: Fri Aug 28 11:26:56 CEST 2009


- define partial continuations better

- check out the multipe state threading in dominikus' code

Entry: Partial Continuations in a Concatenative Language
Date: Fri Aug 28 13:19:08 CEST 2009


Look here[1] for links of the theoretical background.

First, the current implementation of `reset' and `shift' are _wrong_
-- actually doubly wrong:

* the _captured_ by `shift' is delimited.
* the _shifted expression_ is placed in a delimited context.

I'm using an interface that doesn't even have a shifted expression.

So let's fix this: the API is now:

   shift / control ( quot -- )

Where the quotation is called with the continuation on top of stack.


[1] entry://../compsci/20090828-144955
[2] http://www.bluishcoder.co.nz/2006/03/factor-partial-continuation-updates.html
[3] http://docs.factorcode.org/content/article-partial-continuations.html


Entry: GC restarts: clear separation between PURE and LINEAR functions.
Date: Fri Aug 28 21:01:50 CEST 2009

To catch GC restarts in the definition of linera primitives, a simple
trick can be used for testing: simply kill the reference to the GC
right before a mutation is made.

So, primitives are either purely functional and can have allocation,
or are linear and cannot perform allocation but are allowed to perform
in-place operations.

Let's say that a primitive starts in pure mode, but whenver it mutates
something, it switches to impure mode.

In order to test all primitives, a freelist = NIL + GC step can be
performed before the execution of each primitive to cover the case
where a primitive needs extra storage.


Practically:

- The VM: replace all PUSH/DROP operations with a FROM_TO. Start with
  making this type-tagged.  There is a real problem in the exception
  handler: it might fail as there is no guarantee there's enough
  memory.

- Find a more highlevel way of doing this.  Preferably static.  Can't
  this be solved as a type issue?  In C?

- Am I solving a real problem, or is this just a consequence of the
  way the GC is implemented?  The real problem is to make a
  distinction between PURE and LINEAR, where the latter cannot perform
  any allocations, and the former cannot perform any mutations.

- Try it first in Scheme.


Ok.  In Scheme it doesn't seem to be such a problem.

Now, in PF, can't we make LINEAR the default?  The problem is then:
how to specify nonlinear words?

Let's say it's allowed to switch to PURE mode at the entry point of a
primitive.

There's something fishy going on with exceptions in Scheme too:
exceptions should NOT perform allocations.  The _sc_step() might have
switched to linear mode, so the sc_make_error might trigger GC.

    switch(exception = setjmp(sc->m.r.step)) {
        case EXCEPT_TRY:
            PURE();
            rv = _sc_step(sc, state);
            break;
        case EXCEPT_ABORT: 
            rv = sc_make_error(sc, sc->m.error_tag, sc->m.error_arg, 
                               state, const_to_object(sc->m.r.prim));
            sc->m.error_arg = NIL;
            sc->m.error_tag = NIL;
            break;
        default:
            break;
    }

(fixed: sc_make_error is now called before _sc_step, to preallocate
such that the error allocation itself can't trigger GC restart).

Maybe primitive exceptions should be linear?  There are a lot of
nuances here..  It needs to be simplified.

EDIT: I'm still using the earlier paradigm of ``pure before linear''.
In the C code this requires respecting some invariants in an
error-prone dentralized way, but the principle itself holds and seems
to have some higher significance: 

   You can use pure funcitional operations as long as you don't change
   the world state.  I.e. it's either PURE or LINEAR.  And you're
   allowed to switch from pure to linear, but not the other way
   around.

Practically, this makes memory management easier in the presence of
brutally low-level C-code.  Also, it's quite simple to enforce
linearity: simply do not pass a reference to the allocator!  It's more
difficult to enforce purity however.  In C this can't be enforced (or
maybe it can.. using some `const' declarations..)


Entry: Primitive exceptions should be linear
Date: Sat Aug 29 09:39:12 CEST 2009

The main reason for this is that you really want the C model to be
independent of the GC.  Exceptions extend into C.

The _real_ problem is that in PF, there are occasions where you want
to perform a linear allocation, but you're not sure this is going to
succeed.  What about calling this a system failure?  I.e. always make
sure there are enough free cells.  In an embedded system it most
definitely is an unrecoverable failure.

The problem is that in current ``prototyping PF'' the linear cells are
allocated lazily.  Maybe a good solution would be to keep an extra
free buffer with a couple of cells N to guarantee at least N
allocations to succeed in the LINEAR regime where the GC is switched
off.  Then when the machine can take a GC restart, replenish this
buffer.

So, let's leave it as it is.  Fix the problem when there are
unexpected allocations in LINEAR mode.  Currently this seems to only
happen when errors are already quite severe.

Then, the only remaining problem is to make sure no _inconsistencies_
are introduced due to primitive errors by keeping linear data linked
to the VM instances at all times.


Entry: Fully linear PF machine
Date: Sat Aug 29 11:11:18 CEST 2009

Hmm... Maybe the trick is really to have a single linear <-> nonlinear
conversion primitive, and never perform any conversions in other
primitives.

Another problem: in _px_run() the `ip' can't be a C variable for the
same reason that other structures can't be present only on the C
stack: primitives might abort in which case the current `ip' is lost.

What about turning this around: 

  1. PF primitives can never fail due to GC restarts as they _always_
     run in LINEAR mode.  They will abort on an empty freelist, but this
     case can be handled separately and be made more gentle in the
     prototyping case (where you want to give up real-time behaviour
     for extra allocation).

  2. Anything that's nonlinear needs to be guarded from GC restarts in
     a different way.


Entry: Linear `compile'
Date: Sat Aug 29 14:15:06 CEST 2009

Now, because `compile' can't introduce any non-linearities, it could
be made to produce a linear code representation.  Only when code
enters the _dictionary_ need it become non-linear: since then it has
become random-access and there are no more guarantees about how it is
referenced.

Let's split this up in `compile-datum' and `map'.


Entry: Combining CDR-coding with vectors
Date: Sat Aug 29 19:24:12 CEST 2009

Instead of using vectors with tags in the name field, it might be
simpler to optimize the representation for the encoding of CDR-linked
lists.

This is to be able to give in to some performance issues:  speed is
not important for PF, but memory size might be.  Currently, CONS cells
are really expensive, requiring 3 pointer slots per cell.

However, since the rest of the implementation relies heavily on type
tags, maybe this isn't going to work well?  It might be better
reserved for a tagless representation..

Something for later.

[1] http://en.wikipedia.org/wiki/CDR_coding


Entry: PF: clear staging
Date: Sun Aug 30 11:09:35 CEST 2009

Everything in the code points to and optimal organization where linear
/ inplace programming and pure functional programming are clearly
separated.  Keeping both in the same VM might be useful for sharing a
memory model, and is _very_ useful as a user interface, but it
complicates matters.

The lessen to learn: this _is_ a staged approach, so it can be
implemented in a way where the object language / machine is completely
independent from all reflection, and where code and other
graph-structured objects are constant.

The good news is that this is a very good fit for Staapl.

What needs to be done is to find a way to express this in a typed
setting.  It is my hunch that the PF VM _itself_ might be better
expressed in a typed language, where linearity constraints can be
managed statically.

Essentially, a linear concatenative language is a _notation_ that
allows one to forgo linearity annotations, which seem to be necessary
in a applicative language with unstructured variable references.


Entry: A fully linear concatenative VM needs interpretation
Date: Sun Aug 30 11:50:31 CEST 2009

What I find really remarkable is that it is possible to make a pure
functional concatenative language fully linear.  You don't need _any_
graph structure, including its virtual machine!

However.  Full linearity of the machine requires some form of
interpretation (laziness).  When code is executed, name references
need to be expanded to their definitions.

This `lookup' cannot be performed in a separate compilation stage
(i.e. to eliminate its associated run-time cost) while maintaining
linearity of the VM.  Such a stage would directly link variable
references with variable definitions and require a non-linear (graph)
data structure:

   * Names can be used multiple times (i.e. procedure re-use) which
     requires a directed acyclic graph.

   * Name-based recursion can introduce loops in the representation.

In this light, the idea behind the PF machine is that you operate in
two regimes: one which remains linear, but keeps code and VM as
opaque, constant structures, and one which is nonlinear to implement
issues related to the VM (in the simplest case just code compilation).

The main idea is that: ``Names are part of the meta-language''[1].

[1] entry://../staapl-blog/20080813-125604


Entry: Using C's `const' to distinguish between PURE and LINEAR
Date: Sun Aug 30 17:13:21 CEST 2009

LINEAR is easy to enforce dynamically: simply don't provide a
reference to the allocator.  Pure is more difficult, but might be done
statically using C's `const' for return values of constructors.

Entry: Factor-like syntax
Date: Sun Aug 30 18:27:22 CEST 2009

Removed gratuituous incompatibilities with Factor: the parser now
accepts '{' and '}' for based list literals, and the pretty-printer
uses '[' and ']' to display decompiled code.

FIXME: this is broken: {{123}} -> '('(123))


Entry: Using Partial Continuations
Date: Mon Aug 31 09:34:24 CEST 2009

Let's try the classical inversion of control problem: convert a
traversal function into a cursor.

First example: convert a list into a stream by inverting `each'.

    (list->stream . ((((cons) shift) each '()) reset))

A `stream' is a pair of a value and a quotation or '().  


> '(1 2 3) list->stream
> > ps
<1> {1 . <,[each-next] '{2 3} '[[cons] shift] ,['()] ,[prompt-tag]>}
> uncons
> ps
<2> 1 <,[each-next] '{2 3} '[[cons] shift] ,['()] ,[prompt-tag]>
> run ps
> <2> 1 {2 . <,[each-next] '{3} '[[cons] shift] ,['()] ,[prompt-tag]>}
> uncons run ps
> > <3> 1 2 {3 . <,[each-next] '() '[[cons] shift] ,['()] ,[prompt-tag]>}
> uncons run ps
> > <4> 1 2 3 ()


This also makes it pretty clear that a partial continuation shouldn't
take the data stack, otherwise `list->stream' wouldn't work
correctly.  I.e. in the following you really want everything
underneath the literal list to be left intact.

> 123 123 123 '(1 2 3) list->stream
> > > > > ps
<4> 123 123 123 {1 . <,[each-next] '{2 3} '[[cons] shift] ,['()] ,[prompt-tag]>}


Entry: Onward
Date: Sun Sep 20 14:59:20 CEST 2009

Looks like the overall design is done.  In the mean time Dave Herman
fixed the show-stopping bug in c.plt, and I've started integrating it
in Staapl.  The plan is now to gradually move the libprim
implementation in C to a higher level description, so it can be
compiled to C or another substrate (i.e. a stack machine).  This would
also enable the linearity constraints to be machine-checked.

EDIT: the goal here should be to keep the current functionality
working at all times, and migrate it to a different implementation.


Entry: Summary
Date: Tue Oct  6 17:18:15 CEST 2009

What do we have now:

  * EX: a very simple GC supports an expression evaluation language
    that is used to implement the Scheme and PF machines.

  * Scheme: a very simple reflective Scheme interpreter.

  * PF: a VM for a linear memory concatenative language.

  * Some scaffolding code to generate wrappers.


What do we want:

  * Wrapper generation for a library of leaf objects.

  * Fully functional PF language.

  * Simpler way to implement PF primitives.


Entry: Images
Date: Tue Oct  6 17:25:41 CEST 2009

Goal: get the video in, XV out working in PF.

Strategy: use the old PF code, after factoring it into plain C OO
code.  Currently, the code in X11 seems already factored so should be
usable as-is.

Next: create a wrapper generator.  For scheme, it's usually simplest
to keep the parameter layout.  For PF however, object references are
usually best placed on top-of-stack, with all the other arguments in
reverse order.  

Something like this:    

          message(object, arg1, arg2) ->  arg1 arg2 object message


Let's try Scheme first, then use that as a base to write the RPN wrappers.

Step one: parse the file with c.plt
Took cplt.ss from staapl (includes simple printer).

Hmm..  This isn't going very well.  Maybe the hint I should take from
this is that my header files aren't abstract enough: they depend on a
lot of typedefs from deep within the system.  The C objects should
really abstract most if not all of this..

So, what's next?  Setting a clean header standard?  I guess this is
doable: allow for public and private versions, with public ones
exposing only functions.  Alternatively, working around the
limitations of c.plt might be simpler.

Filed bug report.

Note: there is a syntax for creating ASTs:
http://planet.plt-scheme.org/package-source/dherman/c.plt/3/2/planet-docs/c/pc.html


Entry: C parsing
Date: Wed Oct  7 09:09:29 CEST 2009

Ok, let's make it work with modified .h files for now.  No, the bug is
too much of a block.  The simplified headers won't parse either.  I'm
letting this go until the block is fixed.


Entry: PLT Scheme
Date: Wed Oct 14 09:28:21 CEST 2009

The current design of a combined Scheme / PF machine seems useful.
However, if there is no constraint on the _size_ of the scripting
system, it might be wiser to bolt a PF-like linear machine onto PLT
Scheme, in a way that is usable with other Scheme implementations.

Making the definition of PF more formal might be done in this way too:
providing a substrate in PLT Scheme which can compile to C and the
minimalistic Scheme/GC in librim.


Entry: Gstreamer
Date: Thu Oct 22 12:46:55 CEST 2009

Recent events have made me think it's probably best to bridge PF and
gstreamer.  However, it might be interesting to do this as a 3-way
coupling between plt-scheme, PF vm and the gstreamer framework.


Entry: Next: usability + test case
Date: Tue Nov  3 10:28:39 CET 2009

Need to evolve the design towards an (undisclosed) application.
Things to fix:

  * automatic wrapper generation

This might be the most interesting thing to start at.

  * unify the Scheme and PF cores into a single VM.


Entry: FFmpeg
Date: Tue Nov  3 10:36:12 CET 2009

Need FFmpeg encoder bridge + image format.

image:   256 level greyscale.
encoder: mpeg4, no audio

How to generate the glue code?  -> manually for now
(probably need both manual and generated glue code)

  Q: Is it worth it to use the C api?  Is that amount of control
     necessary?  Maybe a pipe bridge is easier?

  Q: What documentation to use?  Its difficult to find a good
     introduction for the C encoder api.

This seems to be the right spot[1].  Start with the file
libavcodec/apiexample.c[2].

The example file uses the following objects:

    AVCodec *codec;     // codec object
    AVCodecContext *c;  // configuration data for codec
    AVFrame *picture;

General structure is simple: picture data is converted to raw packet
based on current codec context.

Next: representing encoded data blobs is maybe best done using the
native string objects.

Problem: how to bridge wrapped buffers and C-level buffers?  I.e. the
low-level C objects need to be able to interface with leaf/bytes.h. 

The simplest solution seems to be to never let the C code allocate
anything (except for data embedded in locally defined objects).  

[1] http://ffmpeg.org/developer.html
[2] http://www.irisa.fr/texmex/people/dufouil/ffmpegdoxy/apiexample_8c.html


Entry: building a custom Scheme interpreter
Date: Tue Nov  3 16:16:06 CET 2009

How to build an interpreter with extensions?

Currently leaf objects are initialized in sc/scheme.c : _sc_new()

    /* Atom classes. */
    sc->m.p = malloc(sizeof(*(sc->m.p)));
    TYPES->ck_type = ck_class_new();
    TYPES->symbol_type = symbol_class_new(1000);
    TYPES->prim_type = (void*)0xF001; // dummy class
    TYPES->port_type = port_class_new();

In ex/ex.h : base_types (the classes) are accessed by the core
implementation through the sc->m.p member.

The problem that needs to be solved: a scheme factory method
`make-foo' needs to access the foo class object through the sc* scheme
VM pointer.

Question: do these classes really need to be tied to the VM?  The
problem is this: they need to be stored somewhere.  It's possible to
store them as a global variable : some C libraries do this anyway.

Essentially: 
 - do different VMs _need_ different class instances?
 - do we want to _allow_ this?
 - classes could be shared

Two conflicting requirements: it's better not to use any global
process data, but some form of sharing will be present when libraries
are used that do this.  Let's build the FFmpeg wrapper with global
classes.


Solution: _sc_init() takes a parameter = a struct containing classes
that will be stored in the sc->m.p member.  These can then be used by
procedures to access vm-global resources, where the implementation can
then decide to make them image-global (i.e. FFmpeg's init)


Entry: installed libprim scheme
Date: Tue Nov  3 17:26:21 CET 2009

Todo: 

 - make the header files compatible with system-includes.  I.e. use a
   single base (ex/... sc/...)

 - bundle libraries:

      libprim_ex.a
      libprim_sc.a
      libprim_media.a


Libraries are not a problem.  Header files are, because of
generation..  Separating in private / public will be a mess.

Ok, fixed it


Entry: Automatic wrapper generation
Date: Tue Nov  3 18:44:40 CET 2009

The idea is now the following: from a .h file containing a simple C
wrapper written in terms of known C objects, generate all the glue
code to register the functions with the VM.

For SC and PF, this amounts to packing/unpacking strings, ints,
... depending on the memory model used.

Essentially: the wrapper binds the pure C objects to a PF/SC memory +
control (exception) model.

The task should be simple: examine the .c / .h code and generate
anything that looks like a pattern.


Entry: The big plan
Date: Tue Nov  3 21:33:02 CET 2009

So, what am I going to do with this?

To be able to use it in commercial projects, I'm probably going to
switch the licence to LGPL or BSD.  The idea is that a simple .c
tarball (without dependency on mzscheme) can be used to build the .a
libs which can then be linked into an app.  DONE

The problem is that I'm still not positive about how to tackle the
wrapper generation.  This essentially bridges the C data model to the
EX data model, possibly wrapped differently for SC/PF.  This is still
too complex.  The architecture needs to be formalized.

The C function model should be something like dataflow functions in
Oz.  The idea being that the C code doesn't perform any data
allocation, but that function output is handled by in-place updating
the collection of basic objects.

I.e. SSA, but on an object-level.


Entry: The media lib
Date: Wed Nov  4 06:38:18 CET 2009

Collection of .c/.h files.
media.c : wrappers for scripting languages.

To generate the wrapper code it would be best to make sure that the .h
files are "clean" : no system libs, no delegate library libs.

This can be moved out of the critical path though.  Keeping the
general idea in mind, the ex functions can be coded manually.

The problem with class registration remains though.  Classes are
accessed as part of a struct, so they cannot be dynamic.


Entry: More formal architecture
Date: Wed Nov  4 07:19:19 CET 2009

C - side: objects, constructors, destructors and in-place functions.

The C functions can be bridged to EX function by wrapping C objects in
a GC managed struct.  This should be automated.

How can this automation be made as simple as possible?


Entry: Swig
Date: Wed Nov  4 07:28:54 CET 2009

Let's have a closer look at Swig[1].  It might be wise to make all
interfaces compatible with standard scripting languages through Swig.

The problem with Swig is that it generates large code.  It can really
be done a lot simpler..

[1] http://www.swig.org/Doc1.1/HTML/Contents.html


Entry: Are classes/types global?
Date: Wed Nov  4 07:49:31 CET 2009

I guess the answer is yes.  C objects contain pointers to C classes,
but the classes themselves do not really need to be tied to the VM
instance.

The real question is: what is a VM instance?

Let's reformulate: each VM has its own memory manager (GC).  This
means that managed data cannot be exchanged between VMs.

Really, a VM is kind of a global variable.

Why is there a need to reference the base_types struct?  Let's look at
the macros that use it:

#define DEF_ATOM(name)                                                \
    static inline name *object_to_##name(object ob, ex *m) {          \
        return (name*)object_struct(ob,m->p->name##_type); }

This essentially maps `name' to a pointer to its associated class/type
object for use in the `object_struct' function, which is the main
function that determines if an object matches a type, and returns a
pointer or NULL.

There is no real reason why m->p->name_FOOTYPE shouldn't be
globalvar_FOOTYPE, apart from global variable references being more
expensive.

So..  How to make the scheme interpreter extensible?  Can these core
implementation issues be hidden from user extensions?  The whole point
is to be able to simply wrap some C/C++ code..

Let's stick with:

  - interface: class_new() methods in the C code, to allow for most
    general approach.

  - policy: (static) global variables for class objects in extensions


Entry: Wrappers
Date: Wed Nov  4 13:06:42 CET 2009

Looking at media.c, what occurs here is:

  - CAST
  - ERROR
  - C object calls

CAST + object calls can be automated.  

For PF and SC, object wrapping might be different.  Can this be solved
at link time?  The real interface is object_to_xxx which is currently
an inline function.  If this is transformed into linkable symbols, the
binary code can be reused + plugins become feasible.

How to automate ERROR wrapper generation?  Somewhere the decision
needs to be made about how to handle errors.  Either C code returns
error values, or a setjump-based approach can be used.


Entry: Missing functionality
Date: Wed Nov  4 13:41:43 CET 2009

- current-input-port / current-output-port OK
- system OK
- string parser OK
- parameters
- let*
- macros: check if it's actually correct (evaluation order)
- readline

Can parameters be bypassed?  I'm already using implicit output ports
all over the code.  Currently just using direct (lexical) write
routines: parameters aren't really necessary.


Entry: ffmpeg delayed frames
Date: Wed Nov  4 13:59:19 CET 2009

- first frame might be zero size
- setting frame input to NULL produces delayed frames.


Entry: ffmpeg external codecs
Date: Wed Nov  4 15:05:12 CET 2009

does ffmpeg have an mp3 encoder?
no : use lame

ac3, aac, vorbis are built-in

apparently all configuration data is stored in the AVContext struct.
the ffmpeg code is quite error prone..  a bit of abstraction around it
won't hurt.

next: write encoded audio to buffer. there is no data structure for
audio - just a flat array of short int.


int avcodec_encode_audio(AVCodecContext *avctx,
                         uint8_t *buf,
                         int buf_size,
                         const short *samples);


Entry: Muxing
Date: Wed Nov  4 16:35:54 CET 2009

av_interleaved_write_frame AVPacket -> AVFormatContext

an AVFormatContext has a number of AVStream


Whether to write audio or video is based on the pts (time stamp)


        /* write interleaved audio and video frames */
        if (!video_st || (video_st && audio_st && audio_pts < video_pts)) {
            write_audio_frame(oc, audio_st);
        } else {
            write_video_frame(oc, video_st);
        }


[1] http://cekirdek.pardus.org.tr/~ismail/ffmpeg-docs/structAVPacket.html
[2] http://cekirdek.pardus.org.tr/~ismail/ffmpeg-docs/structAVStream.html
[3] http://cekirdek.pardus.org.tr/~ismail/ffmpeg-docs/structAVFormatContext.html


Entry: load
Date: Thu Nov  5 08:13:29 CET 2009

It's working: implemented in scheme, but the parser won't give eof: it
hangs.  Fixed: it actually worked, just needed to add eof-object? check.


Entry: C90
Date: Thu Nov  5 10:29:01 CET 2009

Making sure it works using C90.  How to ask GCC to check that?

gcc -ansi -pedantic

Problems:
    - task (as expected..) :

task.c:63: warning: ISO C90 forbids variable-size array ‘reserve’
task.c:63: warning: ISO C90 forbids mixed declarations and code

    - va_copy (C99)

Conclusion: apart from some annoying limitations and the task.c
abstraction there seems to be little trouble in making it compliant to
C99.  C90 is more difficult though.  Postponed until it's required.


Entry: finding home
Date: Thu Nov  5 11:13:53 CET 2009

After "make install" the binary library should be able to find its
boot code.  It's probably simplest to use pkg-config to do this.


Entry: libprim classes
Date: Thu Nov  5 13:50:26 CET 2009

Change needed: all classes need the same members, so the scripting
core can treat all objects properly.

0. free
1. write

Essentially, the class struct contains the methods, while the object
struct contains a pointer to the class struct which identifies the
object type.

The problem is that the method implementation need to be independent
of any scripting language core details.

OK works.


Entry: Parser
Date: Thu Nov  5 17:03:46 CET 2009

the leg parser gives problems.  essentially, I don't understand it.
time to switch to a different one.  maybe gut the one from tinyscheme?


Entry: GC restarts and reading
Date: Thu Nov  5 17:34:05 CET 2009

Reading is stateful: it can't trigger a restart.  This is best solved
by calling the collector before read.

Growing the heap:  restart loops need to be detected.

fixed with gc->margin check after collection.  Currently set at 100
cells.  (Heuristic: 11 is not enough for SC and triggers a restart
loop).


Entry: SRFI-1
Date: Fri Nov  6 07:51:05 CET 2009

I took the reference implementation from [1], but had to remove some
initial comments: the parser choked on realloc().

Problems keep turning up.  While it's quite elegant to express it as a
leg grammar, it's probably best to start thinking about writing one
from scratch that I actually understand and that plays nice with the
GC + supports suspension.  Weekend job.

Code parses, but some functionality is missing:

 values, receive, check-arg

[1] http://srfi.schemers.org/srfi-1/srfi-1.html


Entry: dynamic linking
Date: Fri Nov  6 11:16:27 CET 2009

All object_to_xxx and xxx_to_object calls should be accessible by the
linker.  This consists a major interface between the C lowlevel
objects/classes and the wrapped representation in the scripting layers
PF/SC/...


Entry: parsing
Date: Fri Nov  6 16:04:36 CET 2009

Using peg/leg anyway..  The problems to solve:

  - realloc problem

  - GC access since it's a side-effecting operation.

Also look at this[1], a (Haskell-style) parser combinator for C.

[1] http://www.cs.chalmers.se/~koen/ParserComboC/parser-combo-c.html


Entry: manual parsing?
Date: Sat Nov  7 07:26:48 CET 2009

Lisp syntax is LL(1) -> deterministic recursive descent.

Manually?  problems:
  - data buffering (growable arrays)
  - serialization 

First: how to serialize a parse?  Essentially, I need a linear
representation of CONS cell based s-expressions.

(1 2 3)   -> ( 1 . ( 2 . ( 3 . () ) ) )
((1 2) 3) -> ( ( 1 . ( 2 . () ) ) . ( 3 . () ) )

CONS NUMBER 1 CONS NUMBER 2 CONS NUMBER 3 NIL
CONS CONS NUMBER 1 CONS NUMBER 2 NIL CONS NUMBER 3 NIL

here CONS and NUMBER are functions that consume a fixed amount of next
items which are always functions to be executed.

Essentially, the s-expressions need to be translated to a prefix
stream.  This can be standardized.  There is only one primitive data
item: a fixed size byte array with size prefix.  The rest are
ADT constructors.  The constructors themselves could be encoded as
prefixed byte strings. 

So, in order to do this, the only thing that's needed is an encoding
of natural numbers in byte strings to represent size tags.

CONS NUMBER 1 CONS NUMBER 2 CONS NUMBER 3 NIL
-> 1C1N11C1N21C1N31X

where "C" = cons
      "N" = number
      "X" = nil

Also, to make the format completely general, the constructor sizes
need to be specified too.  So we have:

<c-name-length> <name> <c-nb-arguments> ...

So, the only problem is self-delimiting natural numbers.  Constraints:
    - small numbers are fast to decode.
    - mostly ASCII when the names themselves are ASCII


CONS NUMBER 1 CONS NUMBER 2 CONS NUMBER 3 NIL
-> 1C21N11...


Simplest way to have self-delimited byte sequences is to mark the last
digit.  This way small sizes are captured in single bytes.

Using base-16 is inefficient, but it's easiest to decode on small
targets.  This way letters can be used, with capitals denoting the
last symbol.  An advantage is that this introduces redundancy that can
be used to guess the file format.

abcd
efgh
ijkl
mnop

eHello

Better is this: use the range 0x30 "0123456789:;<=>?" for the final
symbols and 0x40 @ABC... for the prefixed.  This way the small strings
have decimal digits.

1C21N1B40x10

Even better: use the range 0x20 for prefixes: this distinguishes them
from names.

1B4fooo


Entry: Architecture summary
Date: Sat Nov  7 09:36:10 CET 2009

LEAF:  primitive C objects
EX:    C-based expression languages using GCd objects (wrapping leaf objects)
PF/SC: execution model, using EX primitive operations + data model.

caveats:

  - SC is slow: heap-allocated tree walking interpreter with
    search-based symbol binding.  The upside is that the structure is
    simple: a textbook CEKS machine.

  - EX requires distinction between two modes: 

    * PURE: construction without mutation = GC enabled -> primitives
            might restart on collection

    * LINEAR: mutation without construction = GC disabled.


Entry: tokenized tree parser -> postfix stream
Date: Sat Nov  7 11:09:10 CET 2009

Got the numbers working.

Now, instead of using a prefix format.  What about using a postfix
one, where bytes are pushed and constructors executed afterwards?  An
extra bit could encode whether the byte string is quoted, or whether
it is an operation.

This way the parser doesn't need procedure recursion, but can use an
explicit (more efficient?) reduction stack.

However, to make the format partially parsable, arity still needs to
be specified.

So, it looks like the tradeof is to use bottom up or top down formats.

The conclusion seems to be that really, parsing isn't the problem.
The hairy one is tokenizing, and representing a tokenized stream so it
can be trivially reconstructed.

Another thing: tokenizing produces a stream of things.  What things?
There needs to be a distinction between code/constructors and
data/primitives.

Ascii rep constraints:
  - no '"' and '\' characters so it fits in C strings
  - 3 modes: prefix numbers: last digit contains quotation bit


Entry: Representing tokenized streams
Date: Sat Nov  7 11:30:02 CET 2009

Let's sum up the design decisions:

  * representing tokenized streams: prefixed or delimited

Prefix tokens are simpler for implementing token streams, as they
allow memory allocation in the _recognizer_ to happen before the
payload data is touched.  However, to do this completely general, some
form of delimited parsing needs to be used to represent the _size_
itself (if one doesn't want to introduce arbitrary size limits by
choosing fixed representations as in Quicktime/MPEG).  However, this
is not a problem: in practice a recogniser can set a limit (i.e. 64
bit sizes since it's limited anyway) while the format is in no way
limited.

  * quotation vs. construction

Since the base type is made of byte chunks, an extra tag is necessary
to indicate whether a list of bytes is a payload or a constructor
name.

  * bottom-up (postfix) or top-down (prefix)

This reflects the structure of the parser.  The same token stream
format can be used for both parsers.  I.e. the list (1 2 3) is
represented as

  RPN:  1 2 3 NIL CONS CONS CONS
  PN:   CONS 1 CONS 2 CONS 3 NIL

Note that for cons lists RPN isn't so great.  One could use XCONS to
build lists in reverse.

  RPN   NIL 3 XCONS 2 XCONS 1 XCONS

There is one important point: RPN isn't terminated, while for PN it's
always clear whether we've read all the data or not.  I.e. RPN needs
EOF.

  * arity-tagged constructors

Arity tagged constructors are important to be able to represent the
stream as a tree without meta-information about the constructor tags.

Is this actually useful?  It doesn't seem so...  Any kind of
less-structured format can always be embedded in cons-cell based
lists, and is maybe better represented as such.

So, for libprim:

  - prefixed tokens
  - 3 tags: 
      0x20 non-trailing-digit
      0x30 trailing-code (numerical digits)
      0x40 trailing-data (bytes)
  - polish (rpn = not delimited)


Entry: s-expression parser
Date: Sat Nov  7 15:25:27 CET 2009

Now, for the s-expression parser on top of the tokenizer.  Is it
possible to convert between the two formats without parsing, or is an
intermediate tree rep necessary?

It looks like this needs a tree rep to keep track of the paren
matching.  Alternatively, it could be tokenized to LP DOT RP and
parsed from that.

List    = LP Tail
Tail    = a:Datum Dot d:Datum RP         { $$= CONS(a,d) }
        | RP                             { $$= NIL }
        | d:Datum t:Tail                 { $$= CONS(d,t) }


Entry: macros
Date: Sun Nov  8 10:27:16 CET 2009

The question is: should macros be expanded when a procedure body is
created, or when it is applied?

1. Does this change semantics?  In a way that's important for the
standard?

Yes: if macro definitions change after creation of body code both
regimes behave differently.

2. Does it affect (memory) performance?

Without optimizations, expanded expressions are probably more
expensive to store than original code.  However, it has probably a
huge effect on run-time performance.

So... It looks like some form of compilation is in order.


Entry: macro expansion
Date: Sun Nov  8 11:22:33 CET 2009

The bootstrap now starts with the definition of a `macro-expand'
function that traverses an expression and recursively expands it.  The
question is now where to plug it in.  The inner interpreter should no
longer recognize macros.

Let's try to see what happens if that functionality is simply removed.


Entry: profiling
Date: Mon Nov  9 10:13:42 CET 2009

Check why it's so slow.  Probably this is a name lookup issue.  It's
especially clear now that boot performs macro expansion.  Figure out a
way to cache that or implement environments differently.  If I recall
from Forth, dict lookup whas the biggest bottleneck.

Check Dybvig's thesis[1].

[1] http://www.cs.indiana.edu/~dyb/pubs/3imp.pdf


Entry: libprim dependency structure
Date: Mon Nov  9 15:12:13 CET 2009


        LEAF   <--  MEDIA  <--.
          ^                   |
          |                   |        (C)
. . . . . | . . . . . . . . . | . . . . . . 
          |                   |        (EX)
   PF --> EX  <-- SC  <-- SC_MEDIA    


Above the dotted line is C code without memory management (except
malloc for DAG-structured leaf objects).  Below is built on the EX/GC
memory model, with PF and SC implementing VM control structure.


Entry: source distro w/o mzcsheme dep
Date: Mon Nov  9 16:33:18 CET 2009

Build the generated files + tar up.

   
Entry: string ports
Date: Tue Nov 10 08:23:21 CET 2009

This morning I've added an indirection to the port class that makes it
possible.  Next: create ports with an embedded bytes string.


Entry: class pointers as global variables
Date: Tue Nov 10 08:24:29 CET 2009

I guess there's nothing wrong with that.  The only reason the VM
structs need the class pointers is to dispatch on type of leaf object.
The fact that these are referenced somewhere else from a global
context (i.e. leaf objets can be created outside of the VMs) doesn't
matter.

This will probably make a lot of code simpler.  Especially writing
extensions.


Entry: IO to string
Date: Tue Nov 10 10:48:14 CET 2009

In order for the normal write routines to work, the prototypes need to
change so formatting can be done in two steps.

> (define p (open-output-string))
> (write 'd p)

Program received signal SIGSEGV, Segmentation fault.
0x0806b5d4 in _IO_str_overflow ()
(gdb) bt
#0  0x0806b5d4 in _IO_str_overflow ()
#1  0x080699d3 in _IO_default_xsputn ()
#2  0x0805d811 in vfprintf ()
#3  0x082209a8 in ?? ()
#4  0x00000001 in ?? ()
#5  0xbfa435e0 in ?? ()
#6  0xbfa455dc in ?? ()
#7  0x08056309 in yy_Tail () at sexp.h_leg:620
#8  0x08056626 in yy_Tail () at sexp.h_leg:624
#9  0x00000000 in ?? ()
(gdb) 


Essentially: all FILE* operations need to be replaced with port*
operations + the routines need to return bytes written, and should be
robust to NULL references, which will just get size estimates.

Is there a way to use sprintf as a one-pass algo?

So, need to fix:
  - output to string OK (from packetforth, using vnsprintf)
  - faster boot  (added expand.scm -> broken) (parser -> OK?)
  - constructors without class argument (class = global context) OK

Entry: boot speed
Date: Tue Nov 10 15:40:34 CET 2009

The problem seems to be the parser.  So, time to get rid of it.

Lexer seems +- ok.


Entry: stack
Date: Wed Nov 11 11:37:50 CET 2009

- bytes: add mime type

I'm thinking to use the 'bytes' type to implement any data type that
behaves as a buffer of bytes.

- all leaf objects: printer routines in terms of 'port'

Ok.. it took a bit longer than expected, but it looks like the
parser's working.  At least, it takes the boot.scm input.

The rest of the day maybe do some more testing.


Entry: object structuring + exceptions
Date: Wed Nov 11 15:44:27 CET 2009

aborting the parser..  representing the parser in an object?
connecting a parser straight to the port struct?

latter is a bit dirty, but should work.

found a simpler way: using setjmp

did some more bugfixing.  todo: make the error reporting a bit less
spartan.


Entry: printing to string ports
Date: Thu Nov 12 06:43:36 CET 2009

Fixed.  Using vnsprintf instead of vsprintf to limit nb of chars
actually written.  This can probably be optimized by attempting to
write and retrying when it fails instead of always calling the
formatter twice.


Entry: parameters
Date: Thu Nov 12 08:13:28 CET 2009

The next useful feature would be dynamic parameters.  I think the
necessary VM machinery is in place (continuation marks) so the rest
would be manipulation of continuation chain.


Entry: String repl
Date: Thu Nov 12 10:41:39 CET 2009

For embedded applications without stdio a string repl is useful.  A
coroutine-style interface between a C node and the VM would make this
possible.

Let's use the following interface.  This will consume Scheme commands
in string form and produce a string with stdio.  The string needs to
be used before the VM is re-entered.

const char *_sc_rep(sc *sc, const char *commands);

Needed: 
  - a repl written in scheme
  - a 'c-yield-string' word that exchanges strings.
  - modifiable output ports

parameter's aren't necessary yet + IO ports only need to be settable
from scheme, so this should simply work.


Entry: sc_eval_step debugging
Date: Thu Nov 12 12:05:20 CET 2009

(gdb) p sc_write_stderr(sc, state)
#state(#redex(repl-oneshot ()) #k_apply(#k_mt () () ()))$3 = 20


Entry: lowlevel errors
Date: Thu Nov 12 17:56:48 CET 2009

Need a mechanism in leaf to propagate errors.


Entry: gprof
Date: Fri Nov 13 07:29:17 CET 2009

Tried to inline some functions that are called a lot.  Most of them
are data access & creation.

The real message is: in order to speed this up: perform less consing &
lookup.

According to Dybvig, it's variable references and procedure
applications (frame extensions).  Global variables could be linked to
boxes instead of having to go through a search operation.  This might
be the best speedup.

ex_find() takes 44.9% of the time.

It's probably best to replace global variables with a direct reference
to the cell that contains the definition.  The same can be done for
free variables in closures.  If they're not assigned to, they can
additionally be unboxed.

#(var (name . val))


Entry: bootstrapping
Date: Fri Nov 13 08:45:19 CET 2009

Also for wrapper generation from C: this should be self-hosted.
Currently using PLT Scheme is ok but eventually it's probably best to
make it standard.


Entry: fdopen and read + write
Date: Fri Nov 13 13:04:53 CET 2009

man fdopen:

   Reads and writes may be intermixed on read/write streams in any
   order.  Note that ANSI C requires that a file positioning function
   intervene between output and input, unless an input operation
   encounters end-of-file.  (If this condition is not met, then a read
   is allowed to return the result of writes other than the most
   recent.)  Therefore it is good practice (and indeed sometimes
   necessary under Linux) to put an fseek() or fgetpos() opera- tion
   between write and read operations on such a stream.  This operation
   may be an apparent no-op (as in fseek(..., 0L, SEEK_CUR) called for
   its synchronizing side effect.

It's probably best to use 2 FILE objects.


Entry: GC restarts: disallowed by default.  
Date: Fri Nov 13 13:24:17 CET 2009

FIXME: bunch of (all?) _sc_make_aref() calls might be restarted!  it's
probably simplest to probe the GC free space inside the interpreter
loop to at least make those calls succeed.

In general: define a global nb of cells that is _guaranteed_ to be
available to all primitives invoked from _sc_step().  This might be
extended to a mechanism that completely prevents GC restarts inside
all primitives, turning it into an error.

Primitives that allocate a lot of cells then can be the exception:
they could allow GC restart manually.


Entry: sockets working
Date: Fri Nov 13 16:24:00 CET 2009

It's now possible to start a console on a tcp socket as client or
server.


Entry: Proposed changes
Date: Sat Nov 14 07:35:45 CET 2009

- eliminate global variable lookup
- treewalker -> CPS VM  (write this in Scheme first, follow Dybvig)
- GC restarts: make sure these only happen in step()  OK


Entry: preventing GC restarts
Date: Sat Nov 14 07:42:33 CET 2009

It might be possible for me to write code under restarts, but in
practice, interfacing with (stateful) C code becomes really
difficult.  It's best to make the mechanism more explicit.

remarks:

1. It is possible to prevent the gross of the restarts by checking the
available memory before executing the next VM step.  This guarantees
no restarts in case the nb of allocated cells remains below a certain
threshold.

2. For primitives that allocate a lot of memory the restart mechanism
might be enabled.


Strategy: remove the 'PURE' mechanism, but prevent restarts in
primitives.

OK.  This seems to work.  The only places where unbounded allocation
occurs are in ex.c -> ex_list_clone() (used by the mapping functions)
and ex_list_reverse().  Time will tell if I overlooked something.


Entry: VM
Date: Sat Nov 14 09:36:00 CET 2009

What are the operations on the continuation?  Is it enough to simply
serialize these?

Contiuation types: if set! begin apply

Currently the interpreter works in two steps:
  1. reduce non-value
  2. apply value to continuation


How do do this:
 - tag s-expressions
 - cps-convert


Entry: Memory usage
Date: Sat Nov 14 10:20:54 CET 2009

The current implementation isn't so efficient for memory usage.
From[1] I gather that a one-size mark/sweep algorithm might be
better.  It would also remove the need for restarts.  However, it
would complicate the use of variable length objects.

Additionally, pairs are represented quite inefficiently: 3 word cells
per pair.  However, in typical applications, the amount of memory used
for representing cells is probably much less that used for
representing leaf objects.  Let' s not worry too much about this..

[1] http://w3.ift.ulaval.ca/~dadub100/files/picbit.pdf


Entry: quasiquote
Date: Sun Nov 15 09:41:45 CET 2009

Adapted from tinyscheme source.


Entry: compact VM
Date: Sun Nov 15 10:49:52 CET 2009

Now, suppose I can later fix the compactness of data representation.
How to make at least code as compact as possible?  Using the base-16
ASCII encoding expaned before, it's possible to create some very
compact code.


Entry: cond -> hygiene
Date: Sun Nov 15 11:12:55 CET 2009

The '=>' syntax in 'cond' is the first place I encounter that needs to
introduce variables.

  (let ((x expr)) (if x (function x) <else>))

However, it seems that x is perfectly shielded from the lexical scope
of function, so there is no danger of unhygienic effects.


Entry: treewalker vs. bytecode
Date: Sun Nov 15 12:45:37 CET 2009

Let's try to build a stackless bytecode VM.  I don't think it's really
worth it to try to build a stack-based VM.  In cases where performance
is necessary, it might be better to use an existing Scheme compiler.

Try to focus on byte-code size.


Entry: Primitive errors
Date: Mon Nov 16 08:15:27 CET 2009

The interpreter fixes can wait: it's only speed.  More important
additions are leaf/ error handling.

The problems:
    1 this is again "context" (as with class instances)
    2 messages should be linked to the _wrapped_ objects
    3 memory management.

For 1, there are two options: pass in an extra argument to the
primitives, or solve it with a thread-local variable.

For 2 it's best to have the caller register the wrapped object that
needs to show up in the error message.  

For 3 it's best to not have the leaf code do any allocations.

Essentially: the VM needs to provide an error handling mechanism.  For
Scheme, the simplest approach is to keep the argument list of the
primitive call around.


Entry: Grids & Folds
Date: Mon Nov 16 10:43:35 CET 2009

Add (unsafe) operations for operating on vectors.  Make it general at
first, then try to add a compile-time binding mechanism.  Is it worth
to try to start from generation directly?  Probably not.  Let's define
the datatypes first.


Added: homogeneous grids - double floats to make interoperation with
GSL easier.


Entry: fixes
Date: Mon Nov 16 16:56:26 CET 2009

remove the EX->TYPES struct: use only global classes + change all prototypes.


Entry: compile "display lists"
Date: Mon Nov 16 17:52:40 CET 2009

Similar to opengl, it's probably possible to observe the calls into
leaf to simply record all primitive calls to compile them to threaded
code.  Adding the proper inspection points (all leaf calls go through
a broker?) will make this possible.

Essentially, the broker is an RPC mechanism: the client (EX/SC/PF)
calls into LEAF through an observable channel:

      EX <-> RPC <-> LEAF

Ignoring side effects, replacing the sequence of calls generated by EX
by a flat script should produce the same result.  However, this
doesn't capture loops efficiently.  Maybe some combinators could be
added to the RPC mechanism?

In general: simply recording shouldn't be so difficult.
Parameterizing the representation however is not so easy.

Also, it would probably work better with statically allocated objects:
given a pool of objects, record all operations that go between.

Maybe starting with a generic prototype would help?  Return int for
errors + only operate on leaf objects.


Entry: global _type() access
Date: Tue Nov 17 07:36:37 CET 2009

Removed "cached" type references from ex struct.


Entry: TODO: fix generic functions for numbers
Date: Tue Nov 17 13:58:11 CET 2009

Mix fixnums and flonums + prepare for full numeric tower.


Entry: Wrapper-independent code
Date: Wed Nov 18 07:35:49 CET 2009

PF and SC wrap the LEAF objects differently.  How can they share
highlevel EX code?  Currently all "virtual" behaviour is encoded in a
table in the EX struct, but since this is not VM-specific
(i.e. multiple SC or PF VMS, all with their own local memory and GC)
it can maybe be expressed using a static or dynamic linker interface.

OK: made leaf object wrapping virtual + moved a lot of code to from
scheme.c to ex.c


Entry: yuv type
Date: Wed Nov 18 13:37:10 CET 2009

Y Y'
U Cb
V Cr

To be pedantic, these are not the same:
  Y  luminance = detected energy
  Y' luma = brightness = gamma-compressed luminance = electric voltage

However, in digital video, the Y is always gamma-compressed, and the
prime is often removed.

YPbPr is the analog equivalent[2].  For more info see YUV(analog) [3]
and YCbCr(digital) [4].

How to make the video frame type as general as possible?  Practically
though, only a couple of different formats are necessary.

These are the degrees of freedom:

  * planar / packed

    It seems that planar formats use only 2x2.
    YV12     Y,Cr,Cb
    I420     Y,Cb,Cr

  * component order

  * subsampling

          4:4:4  none
          4:2:2  vertical
          4:1:1  vertical + horizontal


Practically.  Can the yuv format be made a subclass of the bytes
format?  Not really, as it is not a growable buffer.

Done.


[1] http://www.fourcc.org
[2] http://en.wikipedia.org/wiki/YPbPr
[3] http://en.wikipedia.org/wiki/Yuv
[4] http://en.wikipedia.org/wiki/YCbCr


Entry: Re-introducing packets
Date: Wed Nov 18 14:58:24 CET 2009

A packet is something that can be constructed from a raw bytes buffer
given some meta information.  The other way around, packets can dump
their data.


Entry: External filters
Date: Thu Nov 19 11:30:04 CET 2009

Make an abstraction to call an external data pipe filter as if it were
a scheme function.  This needs some select-based sequencing.

Actually, this only works if there is a clear RPC semantics.  Maybe
coroutines are better?  If unix pipes can be seen as synchronous
channels with well-defined messages this can be greatly simplified.

The original PF sequencer can probably be used.  This needs
non-blocking IO.

Is there an SRFI for this?

Anyway, in order to try this, it is necessary to create tasks and i/o
operations that can be retried.  As opposed to PF, it's probably best
to implement the basic functionality in C.

This needs:
  - pre-emptable parser (a C one-shot continuation).
  - output state machines
  - input state machines

Only the parser is a push-down automaton, not a state machine.  This
will be a significant amount of work.  Let's postpone it a couple of
weeks.

Rewriting the parser to use an explicit stack might be an interesting
approach.  However, it's probably better to use a proper coroutine
implementation.  Maybe this can be taken from lua?

Alternatively, pthread can be used.  It might be a lot simpler to
implement.  As long as the EX memory model can be separated from
read/write threads this should actually be quite straightforward.
However..  This does make them more expensive.

We can have both..  On the longer term, figure out how lua does it.
Switchable stacks is what is necessary.  On the short term: create a
channel abstraction that can produce/consume leaf objects.  Malloc can
still be used to allocate data that's passed between threads, and the
leaf code is completely separate from any graph memory model.

So, a pchannel can be used to synchronize with system threads.


Implementing channels using condition variables (semaphores are not
enough?) There are two blocking conditions.

GET blocks when C = NULL
PUT blocks when C = OBJ

Signalling  consumer -> producer:  write_ok
            producer -> consumer:  read_ok


leaf_object *channel_get(channel *x) {
    leaf_object *ob;
    pthread_mutex_lock(&x->mut);
        while(!x->object) { pthread_cond_wait(&x->get_ok, &x->mut); }
        ob = x->object;
        x->object = NULL;
        pthread_cond_signal(&x->put_ok);  // wake up producer
    pthread_mutex_unlock(&x->mut);
    return ob;
}
int channel_put(channel *x, leaf_object *ob) {
    pthread_mutex_lock(&x->mut);
        while(x->object) { pthread_cond_wait(&x->put_ok, &x->mut); }
        x->object = ob;
        pthread_cond_signal(&x->get_ok); // wake up consumer
    pthread_mutex_unlock(&x->mut);
    return 0;
}

Entry: testing channel
Date: Thu Nov 19 15:07:10 CET 2009

The basic use case is a channel that wraps a file descriptor, which
represents a socket or an other pipe to another process.

OK, seems to work.

Making this many->one would make it possible to serialize external events.

The only problem is resource management.  See the following scenario:

1 consumer, N producers.  Whenever the consumer looses interest, all
the produces should give up.  However, when a producer disappears this
should keep the channel open.

In this case the 'free' method, issued at the consumer side should
signal all producers to give up, and only cleanup the object when
they're gone.


Let's make it work for the many->one case.


Entry: channel teardown
Date: Fri Nov 20 15:11:14 CET 2009

Channels are quite simple.  What's not simple is to represent
end-of-stream and teardown without causing deadlock.

But.. calling an external app with both input and output isn't so
simple.  Something goes wrong in the synchronization: i get commands
that do not output anything.

If I perform the commands manually everything works fine.

Yes..  This is too complicated to get right.  The bric-a-brac resource
management simply doesn't work.


Other possibilities:  use only a feeder thread, and read the output of
the process synchronously.

But, given the trouble this creates and the difficulty of debugging a
multi-threaded app in gdb, I'm thinking about moving to non-blocking
IO.

Let's create a special case: execute an external program, feeding it
binary data on stdin, and collect its stout in a buffer.  Sequence the
reads & writes with select.

Is it possible to move from FILE to fd's?  For output this is just
flush.  How does input work?


Entry: channel problems + external processes
Date: Fri Nov 20 16:34:54 CET 2009

channels work fine, but teardown is problematic.

current problems:
 - resource allocation of attached threads doesn't work
 - what do do with memory management?
 - how to avoid deadlock

how to talk to an external process?


The problem seems to be that teardown isn't defined very well.  What
does it mean?  

     - all threads that block on GET should return with EOF
     - all threads that block on PUT should get an input signal

Teardown and PUT should be properly serialized = what's wrong now.

This needs a different abstraction.  Let's concentrate on asymmetric
solutions that always connect to a thread, and only support
uni-drectional operations.

Frustrating.  There has to be a way to do this properly, but the
resource management is nontrivial.


Entry: to thread or not to thread
Date: Sat Nov 21 14:26:50 CET 2009

+ C-based parsers are simpler

- state machines don't need it
- can't use VM code
- gdb is more difficult
- not all targets support it
- requires special memory management

The main question is however: isn't it better to have an event loop
based mechanism in the interpreter?  Original PF at some point had to
do this too.

In this light, isn't it better to switch to non-buffered IO?  Direct
fds?


Entry: select() based event loop
Date: Sun Nov 22 12:16:30 CET 2009

Essentially, this consists of a list of file descriptors and
associated actions.

Currently the actions are:
  - INPUT:  read when ready + parse
  - OUTPUT: write when ready

The assumption can be made that the tasks are representable as state
machines.  Push-down automata will then need an explicit stack
structure, or somehow wrap the C stack.


Entry: Simpler PF
Date: Sun Nov 22 12:19:02 CET 2009

What about building the core of the PF interpreter directly on top of
LEAF instead of on EX?  The core doesn't need a GC, and its memory can
be built from tuples.


Entry: setbuf fgetc
Date: Sun Nov 22 17:47:00 CET 2009

What I'd like to know is how fgetc() buffers the input.

How I would do it: fix some buffer size and attempt to read as much as
possible without blocking.

2nd question: how to get at the input buffer without triggering a
blocking read?

 -> set the underlying fd to non-blocking.  fread will return a short
    item count and set errno = EAGAIN.

so, there seems to be no problem.

the real problem is: how to code this?  should resuming happen at
scheme level, or is a low-level mechanism better (i.e. present streams
as object channels, and wake-up only after a full object is
read/written).

iow: do we use a low-level state machine sequencer (fast, but
inflexible) or a high-level task switching engine (slow but
flexible).

as mentioned before, one important thing to note is that if a _parser_
is encoded as a blocking task, this whole idea won't work.  currently,
even the scheme syntax tokenizer uses function calls.

it would be _really_ convenient to have a portable lightweight threads
library.  i don't understand why uclibc doesn't implement makecontext:

       SUSv2, POSIX.1-2001.  POSIX.1-2008 removes the specifications
       of makecontext() and swapcontext(), citing portability issues,
       and recommending that applications be rewritten to use POSIX
       threads instead.

alternatively, it might be interesting to construct some C macros for
nested code.  this mechanism isn't needed much: maybe a portable
specialized implementation make sense?

taking the birds-eye view, it seems that writing parts of the code
directly in forth makes sense.  maybe the project needs a ruffle?

pff... it's time to focus, and work with what's there.  writing the
parsers as a state-machine isn't necessary yet.  current I/O
requirements only need simple state machines.  it's probably best to
propagate select up to the highest level, and then see how to
specialize it if necessay.

next: find a simple interface for select()

A list with structures like this, where the busy flag is modified in
place:  (port 0/1/2 !busy fn)


Entry: R4RS compliance
Date: Tue Nov 24 08:56:32 CET 2009

Apparently `char?' and `number?' need to be disjoint.  Since I'm not
really interested in unicode, chars can be implemented as bytes, and
embedded inside the constant space.

Other R4RS issues include an incomplete implementation of the numeric
tower.  This requires quite some effort to get working, and I will
update it ongoingly.  At least the tokenizer should recognize
numerical syntax, and maybe complex numbers could be implemented.
Rationals and bignums are more work and currently not really relevant.


Entry: arm cross build + test with qemu
Date: Tue Nov 24 15:56:55 CET 2009

# build the toolchain with busybox.  standard config: just pick
# generic arm target.  this gives arm-linux-uclibcgnueabi

# the distro includes a script that builds it and installs it as a
# subdir of current dir.
bin/build-arm

# the app is compiled statically, so can run directly in QEMU.  use
# the expanded code to boot faster.
qemu-arm arm-linux-uclibcgnueabi/bin/sc --boot sc/boot-expanded.scm


Entry: Compile to AST
Date: Wed Nov 25 12:07:21 CET 2009

Currently the code remains in s-expression form.  However, `lambda'
should compile to AST with a limited number of cases.  If this is then
put into CPS form we have a VM.

Roadmap: AST from is required for making varref explicit + cachable.

Entry: VM ideas
Date: Wed Nov 25 08:41:21 CET 2009

To make varref work, the compiled lambda expressions need to be
modified.  Since this is a caching operation, it could be done at
runtime, whenever a varref is encountered.

Essentially, a bare-bones lambda expression is a tree containing:
  - constants
  - variable references
  - abstractions
  - special forms

The goal to speedup:
    - eliminate variable lookup
    - eliminate intermediate structures (deforestation)
    

Entry: kawa scheme
Date: Thu Nov 26 22:59:20 CET 2009

cd /usr/local/lib
wget ftp://ftp.gnu.org/pub/gnu/kawa/kawa-1.9.90.jar
export CLASSPATH=/usr/local/lib/kawa-1.9.90.jar
java kawa.repl

On android[1].

[1] http://per.bothner.com/blog/2009/AndroidHelloScheme/


Entry: `freeze'
Date: Thu Nov 26 23:28:40 CET 2009

Currently 50% of the execution time is spent in variable lookup.  What
about adding a `freeze' operation that will replace all global names
with values?

The reason to make this a command is to make the semantics
controllable.  Freezing variables to values needs boxing to preserve
mutability.

As long as define / set! are not used to change the contents of a
global variables, this works fine and should speed up a lot.


Entry: `apply'
Date: Thu Nov 26 23:40:53 CET 2009

I suspect apply is way too slow because of the continuation stack
trick.  Changing the evaluation order of arguments from back to front
should make some things easier.. i.e. argument lists can be
constructed in a straightforward manner.  The 'reverse' operation
necessary to construct the list of redexes can be done once at
compilation time.

But, before I'm going to do something like this, some form of
scheme-level profiler is necessary.


Entry: Scheme Resources
Date: Fri Nov 27 08:03:36 CET 2009


1. libprim/sc vs Standard Scheme.

The SC interpreter is optimized for embedded use.  It has a very small
C core of about 60K.  It is standard R4RS with the most expensive
features removed (i.e. no rational numbers, complex numbers or
infinite precision integers).

Its main goal is to glue together C data and code, and provide a
system interface akin to a bash shell (programs / pipes / files).


2. getting to know Scheme

To get started, take a look at this

http://www.scheme.com/tspl3/


Entry: VM change roadmap
Date: Fri Nov 27 09:33:40 CET 2009

This is a longer change, probably for the weekend.  Change the
representation of code such that (expanded) lambda expressions get
transformed to an intermediate form which is then interpreted by the
VM.  This can still be a tree.  Just make sure that some binding work
that's done at run-time atm is replaced with compile-time work.


Entry: Coroutine interface C <-> VM
Date: Fri Nov 27 11:23:31 CET 2009

Requires: "yield" signal that preserves current continuation.  This is
already +- implememented with the MT continuation.

Currently I don't see how to do this.  Something isn't right.

Yield should behave as a primitive: the re-entry from scheme behaves
as the return value of the sc_yield() primitive.  


Entry: remove support for towers
Date: Fri Nov 27 11:36:23 CET 2009

It makes the interpreter loop too complicated, and it's not used.

This means sc_step is not a function accessible from scheme, which
means it can be implemented more efficiently, and probably simpler.

The fundamental conflict is this:  some primitives need explicit
access to the current interpreter state - this conflicts with
implementing the interpreter in a purely functional way.

Essentially, the concept is broken so let's get rid of it.  The vm is
a state-machine because of a hack that allows modification of internal
state (i.e. the current value & continuation) in an interface towards
primitives that pretends as if there is no suck access.

Is there a reason to not unify primitive exceptions and GC restarts?


Entry: A better approach
Date: Sat Nov 28 10:33:09 CET 2009

1. Write the VM in Scheme
2. Translate this to EX using a bootstrap compiler

This approach allows to get the code working right first, and gives
another escape route to compiled code.

Entry: Dybvig heap machine
Date: Fri Nov 27 16:48:14 CET 2009

A accumulator
X next expression
E environment
R value rib (already accumulated values for next evaluation)
S stack/continuation

How is this different from the cek machine continuation frames?


Entry: Detached console
Date: Mon Nov 30 13:41:47 CET 2009

The idea is to run the VM with its own dispatch loop in a separate
thread, and have the foreground app (anything that links the VM into
its image) present only a character console I/O.


leaf_object *console_rpc(console *d, const char *cmd) {
    console_write_raw(d, cmd, strlen(cmd));
    console_write_raw(d, "\n", 1);
    port_flush(d->out);
    return console_read(d);
}

int main(int argc, char **argv) {
    sc *sc = _sc_new(argc, argv);

    /* This will start the VM in the background with a standard
    unix-socket based dispatch loop.  See boot.scm::init-console */

    console *c = _sc_start_console(sc);


    /* To exchange information with the VM, a character interface is
    used.  To properly syncrhonise on the return stream, the leaf/
    s-expression parser is used. */

    for(;;) {
        leaf_object *o = console_rpc(c, "(+ 1 2)");
        leaf_write(o, p);
        port_printf(p, "\n");
        leaf_free(o);
        sleep(3);
    }
}
    

Entry: wish list
Date: Mon Nov 30 15:02:50 CET 2009

 - parameters + io-redir
 - dynamic-wind
 - exceptions 
 - continuation printing
 - stepping / debug?


Entry: #< ... >
Date: Mon Nov 30 16:08:28 CET 2009

This will be scanned correctly if the brackets '<' and '>' are
balanced.  The default parser will tag these with "hash".  The Scheme
parser raises an error.


Entry: build system fixes
Date: Tue Dec  1 14:25:40 CET 2009

Environment variables:

LDFLAGS


Entry: select and FILE*
Date: Wed Dec  2 10:49:44 CET 2009

Because of input buffering, select will not always give the correct
behaviour.  I.e. for the input console, giving a sequence "1 1 1 1 1"
will give only one answer as it seems the whole string was buffered.

This is ok as long as the user waits for reply before sending another
command, but it's something to be careful about for unidirectional
channels!


Entry: Code reps
Date: Wed Dec  2 13:57:13 CET 2009

- tuples (low-level, tree-structures, C)
- vectors (high-level, graphs, EX, GC)
- cons-lists (in terms of vectors)

For communication between apps, symbol-tagged tuples to represent ADT
constructors seems simplest.  However, the s-expression parser
produces such a description based on cons cells.

Maybe there should be a routine that flattens this, and fails on
improper lists.

tagged cons list -> tagged tuples.


Can the parser be extended to produce flat vectors directly?


Entry: Java comm (converting any rep to java data structures)
Date: Wed Dec  2 14:01:24 CET 2009

Using StringTokenizer[2] a string can be converted into an iterator.
However, a straight connection to JNI might be simpler (see next
post).

[1] http://www.cafeaulait.org/javafaq.html#xtocid75724
[2] http://www.javasoft.com/products/JDK/CurrentRelease/api/java.util.StringTokenizer.html


Entry: Java JNI
Date: Thu Dec  3 11:36:00 CET 2009

/* Recipe for running Scheme in Java with access to Java objects:

    Start Scheme VM as a native method called from a Java thread which
    never returns.  The JNIEnv variable passed in can be used to call
    into Java code from the VM mainloop. */

/* JAVA JNI */

/* This macro defines standard LEAF object behaviour for wrapped Java
   JNI objects, and unwrappers for AREF representation. */
#define DEF_JAVA(name) \
    typedef struct { leaf_class super; } java_##name##_class; \
    typedef struct { java_##name##_class *type; name jni_ref; } java_##name; \
    void java_##name##_free(java_##name *x) { free(x); } \
    int java_##name##_write(java_##name *x, port *p) { \
        return port_printf(p, "#<" #name ":%p>", x->jni_ref);  \
    } \
    LEAF_SIMPLE_TYPE(java_##name) \
    java_##name *java_##name##_new(name jni_ref) { \
        java_##name *x = calloc(1, sizeof(*x)); \
        x->type = java_##name##_type(); \
        x->jni_ref = jni_ref; \
        return x; \
    }\
    DEF_AREF_TYPE(java_##name)

DEF_JAVA(jclass)
DEF_JAVA(jobject)
DEF_JAVA(jmethodID)

/* Dynamic context (VM running in the dynamic extent of a java call) */
typedef struct {
    JNIEnv *env;   // current Java thread's environment
    jobject this;  // the embedding scheme object
} java_ctx;
static inline java_ctx *_sc_java_ctx(sc *sc) {
    java_ctx *ctx = (java_ctx*)(EX->ctx);
    return ctx;
}
#define JAVA_ENV (_sc_java_ctx(sc)->env)

/* Exceptions. 

   JNI methods give a special return value (i.e. NULL) to indicate an
   exception has been raised.  This needs to be handled _before_
   making another JNI call.

*/
_ _sc_java_error(sc *sc, _ ob) {
    if ((*JAVA_ENV)->ExceptionCheck(JAVA_ENV)) {
        (*JAVA_ENV)->ExceptionClear(JAVA_ENV);
    }
    return ERROR("java", ob);
}
_ sc_java_class(sc *sc, _ name) {
    jclass class = (*JAVA_ENV)->FindClass(JAVA_ENV, CAST(cstring, name));
    if (!class) { return _sc_java_error(sc, name); }
    return _ex_leaf_to_object(EX, java_jclass_new(class));
}
_ sc_java_methodID(sc *sc, _ cls, _ name, _ sig) {
    jmethodID method = (*JAVA_ENV)->GetMethodID(JAVA_ENV, CAST(java_jclass, cls)->jni_ref,
                                                CAST(cstring, name), CAST(cstring, sig));
    if (!method) return _sc_java_error(sc, name);
    return _ex_leaf_to_object(EX, java_jmethodID_new(method));
}
_ sc_java_this(sc *sc) {
    return _ex_leaf_to_object(EX, java_jobject_new(_sc_java_ctx(sc)->this));
}
_ sc_java_string(sc *sc, _ ob) {
    return _ex_leaf_to_object(EX, java_jobject_new((*JAVA_ENV)->NewStringUTF(JAVA_ENV, CAST(cstring, ob))));
}


/* Method calls in Java are a bit of a pain due to there not being a
   single root object.  The scalar objects need to be handled
   differently.  It's probably simplest to restrict to object-only
   calls and wrap scalar types somewhere else. 

   Otherwise, the native types can be handled using some macros to
   perform the routing based on input types.

*/
_ sc_java_call(sc *sc, _ ob, _ ID, _ args) {
    vector *v = CAST(vector, args);
    int i, n = vector_size(v);
    jvalue jargs[n];
    for (i=0; i<n; i++) jargs[i].l = CAST(java_jobject, v->slot[i])->jni_ref;
    jobject rv = (*JAVA_ENV)->CallObjectMethodA
        (JAVA_ENV, CAST(java_jobject, ob)->jni_ref, CAST(java_jmethodID, ID)->jni_ref, jargs);
    if (!rv)  { return _sc_java_error(sc, args); }
    return _ex_leaf_to_object(EX, java_jobject_new(rv));
}


/* FIXME: check this

   All objects objtained through the JNI will remain live as long as
   the method does not return.  References need to be explicitly
   released otherwise.

   http://journals.ecs.soton.ac.uk/java/tutorial/native1.1/implementing/refs.html

     "To make sure that Java objects can eventually be freed, the JNI
      by default creates _local references_.  Local references become
      invalid when the execution returns from the native method in
      which the local reference is created."


     "In certain situations, however, the native code may need to call
      the DeleteLocalRef function to explicitly delete a local
      reference."

   This is done using (*env)->DeleteLocalRef(env, object)


*/


Entry: simplifications
Date: Thu Dec  3 13:27:24 CET 2009

- eliminate towers of interpreters
- eliminate GC_INTEGER ??
  -> all objects are either structured leaf objects or constants
     (it can be argued that an integer is a constant though)
- fix the .h_prims generator


Entry: Wrapping
Date: Fri Dec  4 07:42:52 CET 2009

I spend a lot of time in wrapping something -> leaf and then leaf ->
EX/SC.  Make at least the latter one automatic.  Use this to make
generic C code generatior for a subset of procedures.


Entry: JNI interface with type checking
Date: Fri Dec  4 14:17:30 CET 2009

Something that's quite necessary to make this work reliably is a
dynamic type checker.  The information is available (it's necessary
for method lookup) so it shouldn't be too difficult to integrate.

What's the simplest way to run a JNI example on a Linux platform?
From here[1] we have:

gcc  -o libnativelib.so -shared -Wl,-soname,libnative.so  
     -I/export/home/jdk1.2/include 
     -I/export/home/jdk1.2/include/linux nativelib.c  
     -static -lc


[1] http://java.sun.com/developer/onlineTraining/Programming/JDCBook/jniexamp.html


I have some preliminary code outside of the project, which with a bit
of cleanup can probably be made into a LEAF<->JAVA interface.


Entry: EX and LEAF compilers
Date: Fri Dec  4 18:02:22 CET 2009

Start thinking about compiling Scheme/PF code in (readable) EX and
LEAF C code.  For non-looping code this could be done in a
straightforward way recording all LEAF api calls.  For looping code
it's more involved.


Entry: reflection on future
Date: Sun Dec  6 10:22:22 CET 2009

Recent developments have shifted the off-clock effort from making the
Scheme VM faster to making the Scheme layer easier to eliminate,
producing readable and editable C code.

Things that seem to become more important:

  - LEAF data structures with sharing
  - Automatic wrapper generation
  - Migration from Scheme to C/LEAF code
  - Java JNI integration

In short: it should be possible to peel the EX/SC layers off more
easily, leaving just C code and the LEAF object library.

For structures with sharing, but without dependency on an asynchronous
GC, it's necessary to implement some form of reference counting.  This
can either be done inside the base object, (taking a 2nd field), or as
a subclass.

Things that are hinted at:

  - Polymorphy in LEAF
  - Implementing ex/vector in terms of leaf/tuple
  - Implementing PF's linear (data) core in terms of LEAF, with code
    execution in terms of VECTOR.

The more I think about this, the more it seems plausible to take a
closer look at the object side of the COLA architecture.  Some missing
links are it's object model, and bootstrap through the method table
cache.  From another side, CLOS and implementations might also be a
good source of information.


Entry: Unification
Date: Sun Dec  6 10:37:42 CET 2009

To simplify compilation from Scheme -> C using only the LEAF object
model and manual memory management, all objects need to become leaf
objects (i'm looking at integers and characters), so at least there is
a unified model for data and operations at a lower level.

As became quite clear toying with the Java JNI, having to special case
data items that are not objects is quite a pain.

It's important not to forget the goal of the libprim project:  have
low-level code and data models that can be _augmented_ with highlevel
ones (data: EX GDd vectors, code: Scheme and PF interpreters).

This is about the total elimination of _all_ Scheme and PF code in a
final project, to get rid of the management requirement of not _using_
any unknown or non-standard languages, so the resulting C code can be
modified without needing any access to the higher level data and code
models.


Entry: Next
Date: Sun Dec  6 11:02:46 CET 2009

- Unified objects (integers as leaf objects)
- Data sharing and reference counts

RC: if there is a necessity to create circular data structures, it's
probably prudent to contain those locally, i.e. wrap them in a
non-circular structure.

It's probably necessary to equip each object with a reference count
and a `final' flag.  The latter is necessary because there is
otherwise no guarantee that a multi-owned object will remain constant.


Entry: Assumptions
Date: Sun Dec  6 11:38:44 CET 2009

What are the assumptions?

  - Synchronous memory management will be necessary, i.e. the system
    already contains open/close resources (code libraries / hardware
    interfaces / remote connections / ...) that cannot be modified.

  - Use of anything outside of plain C will be a liability and should
    be removed (if possible).

  - During prototyping and testing there is no constraint on the tools
    used.


Entry: Implementing RC
Date: Sun Dec  6 11:44:13 CET 2009

I'm probably going to add it to the base object.  This means that yes,
all objects require an RC which will make them larger.  However, I
intend to keep the interface abstract, so objects with RC=1 might be
implemented simpler.

One problem however: how can you tell whether a C function consumes an
object, like the parser uses a port, or reads/modifies it in place?
That inconvenience is not going away..  Maybe the convention is that
all argument to constructors will use the object as a delegate.  Maybe
this should be done where it's appropriate: at the point where the
reference is _STORED_.  Essentially: users (callers) can't know what a
procedure does with a reference.

Convention: all leaf objects passed as arguments are not stored inside
a data structure that exits the procedure in a different form.  Each
_persistent assignment_ requires a duplication.  Assignments to local
C stack variables are not a problem.


Entry: LYSP
Date: Sun Dec  6 18:41:40 CET 2009

See here[1].  Before I try to do anything else I might better read the
code.  One of the interesting tricks is to use dlsym for looking up
native code.

[1] http://www.piumarta.com/software/lysp/


Entry: Memory and stack management questions
Date: Sun Dec  6 18:44:42 CET 2009

- how efficient is malloc?
- does it make sense to have a separate pooling mechanism?
- how non-portable is makecontext() ?


Entry: JNI & reflection
Date: Mon Dec  7 14:12:37 CET 2009

java.lang.Class
java.lang.reflect.Method
java.lang.reflect.Constructor

Something that bothers me in Method.invoke() is that the arguments are
an Object[] array.  Do they get automatically casted to native types
like int,long ?

I don't see anything of this sort, so it seems that the answer is yes:
this only supports objects.

It would be interesting to figure out if kawa can be used to perform
this reflection task..

Conclusion: java reflection API seems useful, but there are
complications if arguments are not an Object subclass.

Nope[2].  It is possible to represent primitive types as a Class object.


[1] http://java.sun.com/j2se/1.3/docs/api/java/lang/Class.html
[2] http://java.sun.com/j2se/1.4.2/docs/api/java/lang/Class.html#isPrimitive()


Entry: prettyprinting
Date: Tue Dec  8 07:25:06 CET 2009

Don't underestimate the power of pure functional C code,
prettyprinting and gdb.

I'm developing for Android atm, and I'm quite surprised by the lack of
good debugging tools.


Entry: simplifying exceptions : no more multiple entries
Date: Tue Dec  8 07:46:47 CET 2009

This seems to work.  Both PF and SC interpreter loops are simpler now:
jmp-buf is only set once per exception, instead of once per primitive
invocation.


Entry: More reflection
Date: Tue Dec  8 13:28:55 CET 2009

Varargs are expressed as this:

    Object print_var(Object... a) {
        System.out.println(((String)a[0]) + ", " + ((String)a[1]));
        return _void;
    }

The method can then be resolved + invoked as in:

    Method find(String cmd) 
        throws java.lang.NoSuchMethodException
    {
        return this.getClass().getDeclaredMethod(cmd, new Class[] {object_array_class});
    }
    void test() 
        throws java.lang.NoSuchMethodException,
               java.lang.IllegalAccessException,
               java.lang.reflect.InvocationTargetException
    {
        print_2("aaa", "bbb");
        Object[] args = {"ccc", "ddd"};
        find("print_var").invoke(this, new Object[] {args});
    }

Note that the vararg definition is essential.  The following method
does not have the right signature and raises
java.lang.NoSuchMethodException: eval.print_2([Ljava.lang.Object;)

    Object print_2(Object a0, Object a1) {
        System.out.println(((String)a0) + ", " + ((String)a1));
        return _void;
    }


But, vararg arrays are enough to make at least the definition of
dynamic methods a bit more straightforward.  Summarized in
jni/eval.java


Entry: Mapping tuples to method invocations
Date: Tue Dec  8 14:08:09 CET 2009

Main problems are strictness and argument agregation.  I.e. given the tuple:

   t = tuple("doit", "foo");

* Does eval(t) evaluate "foo"?
* Does eval pass multiple arguments 1->... as an argument vector?

It's probably best to deviated from lisp-style semantics here, and
simply use method lookup and explicitly constructed argument lists.


Entry: More java reflection
Date: Wed Dec  9 08:10:19 CET 2009

The missing element is a Scheme object -> java object interpreter.  Or
even simply a string -> java object interpreter.

Essentially, to quickly hack it, all primitive types really could be
represented as strings, while all object types need to be somehow
wrapped, or placed in a local store.

This should make it possible to keep the interface between java and
Scheme completely dumb.

The problem is: an explicit distinction needs to be made between local
variable references and other constructors.

Doing this properly looks like a good weekend project.


Entry: sc in Java on linux
Date: Wed Dec  9 17:49:04 CET 2009

Moved code from the external project side into the libprim side, so
the interface can be made simpler.

Next: how to wrap a C pointer as a java object?  -> use long

Next: hash table for java<->aref wrapper management.

such a hash table needs a simple linked list anyway for collisions, so
maybe start with a linked dictionary first.  Can this be done with
tuples?  Keys then need to be leaf objects.

Actually yes.  The ref list is really just a list.


Entry: JNI object -> class
Date: Thu Dec 10 11:04:24 CET 2009

I changed some methods from object to class context (static methods).
At the same time some problems started occuring: sc_java_class doesn't
resolve any more.

Next:
  - print exceptions
  - maybe re-instate object context?


Entry: Reflection
Date: Thu Dec 10 13:41:20 CET 2009

So, there's a SC/C <-> Java connection.  Link this then to the
eval.java and reflect.java classes.

Different strategy:

Instead of going through a large amount of red tape in C to be able to
call different constructors / class methods / methods, it might be
simpler to move most of the dispatch logic to the Java side and use
the reflection API, but keep a single interface through some static
method interface from C->Java.


On the C side, things need to be simplified such that there are 3
different routines:  constructors / static / methods

FIXME: constructors with arguments dont work.  w.o. ok

FIXME: create an object pool based on the java object pointer, not the
local wrapper.

this is a mess.. solve it properly.


it's not possible to map leaf -> aref due to moving GC, so each leaf
needs to be _dupped.


Entry: Java JNI fixed
Date: Fri Dec 11 10:10:09 CET 2009

Fixed most of the bugs in the JNI <-> LEAF/EX bridge, except for
invoking constructors.


Entry: JNI w/o type signatures
Date: Fri Dec 11 11:33:01 CET 2009

It would be handy to be able to get rid of the type signatures when
looking up methods.  Could be done in two steps:

  - map object -> class -> methods list
  - find method with correct type, or simply try them one by one


Entry: JAVA calls -> running VM loop
Date: Fri Dec 11 11:53:18 CET 2009

This needs some locking.  The simplest place to release the lock is
during the select() wait period.


Entry: global instead of local refs
Date: Fri Dec 11 14:59:46 CET 2009

Need to change the ref rep to global, so they survive across JNI
calls.

         /* Create a global reference */
         stringClass = (*env)->NewGlobalRef(env, localRefCls);
 
         /* The local reference is no longer useful */
         (*env)->DeleteLocalRef(env, localRefCls);


Apparently, this gives a unique object.  Looks like the object mapper
isn't necessary.  But let's leave it in for now in case it is.

No: it still is, but in less cases (i.e. objects passed through
functions?).  Or are these also local refs that get re-wrapped?

Confusing..


Looks like I introduced a problem that wasn't there before.  Revert.


/* The sc struct contains a reference to the currently active Java context. */

/* For some reason the global refs cause crashes..  */
#define GLOBAL_REFS 0

static void jniref_free(jniref *x) {
    // LOGF("DeleteRef(%p) RC=%d\n", x->jni_ref, leaf_rc(&x->base));
    sc *sc = x->sc;
    if (sc) {
        java_pool = tuple_list_remove_object(java_pool, (leaf_object*)x, 1);
#if GLOBAL_REFS        
        (*JAVA_ENV)->DeleteGlobalRef(JAVA_ENV, x->jni_ref);
#else
        (*JAVA_ENV)->DeleteLocalRef(JAVA_ENV, x->jni_ref);
#endif
    }
    free(x);
}
static int jniref_write(jniref *x, port *p) {
    return port_printf(p, "#<jniref:%p>", x->jni_ref);
}
LEAF_SIMPLE_TYPE(jniref)
DEF_AREF_TYPE(jniref)

static int jobject_equal(jniref *x, void *jni_ref) { return x->jni_ref == jni_ref; }


static jniref *new_jniref(sc *sc, void *jni_ref) {
    jniref *x = calloc(1, sizeof(*x));
    leaf_init(&x->base, jniref_type());
    x->sc = sc;
    x->jni_ref = jni_ref;
    java_pool = tuple_stack_push(java_pool, (leaf_object*)x); 
    return x;
}

/* 1st level wrapping makes a 1-1 map between leaf objects and jni references */
static jniref *new_jobject(sc *sc, void *jni_ref) {
#if GLOBAL_REFS
    void *global_jni_ref =  (*JAVA_ENV)->NewGlobalRef(JAVA_ENV, jni_ref);
    (*JAVA_ENV)->DeleteLocalRef(JAVA_ENV, jni_ref);
    return new_jniref(sc, global_jni_ref);
#else
    jniref *x;
    /* If it's already wrapped as jobject, return it with RC++.  */
    if ((x = (jniref*)tuple_list_find(java_pool, (leaf_predicate)jobject_equal, jni_ref))) {
        LOGF("reuse %p\n", jni_ref);
        return LEAF_DUP(x);
    }
    else {
        // LOGF("wrap %p\n", jni_ref);
        return new_jniref(sc, jni_ref);
    }
#endif
}


Entry: Other problem
Date: Fri Dec 11 16:00:56 CET 2009

Probably because the string created in java-string isn't a local ref
and can't be freed as such?


I need to do this differently...  The current design is not very
controllable.  Maybe it's best to make sure nowhere in the C code
there is ever any local reference visible, then see how far that gets.

A mix of java and C code is probably going to be best.  C side only
sees global refs and only objects, then use the reflection API on the
Java side to do all the work.


Entry: ReleaseStringUTF
Date: Sat Dec 12 07:57:01 CET 2009

ReleaseStringUTF is for freeing the resources allocated by
GetStringUTF.  I had it coupled with NewStringUTF8 which creates a
java object, not a C-readable UTF-8 string.


Entry: Simpler interface C->Java
Date: Sat Dec 12 09:02:48 CET 2009

Currently using a single delegate static method which is passed an
array of objects.  Next: make this recursive tuples.

Ok.  Most code seems to be connected properly now.


Entry: Testing: call arbitrary methods + send/return integers
Date: Sat Dec 12 14:03:11 CET 2009

In other words: make method lookup simpler (no fussing with specifying
types: automate as much as possible).

Done, but I didn't make special number convertors.  Structured strings
are enough for now.


Entry: Mapping numbers
Date: Sun Dec 13 12:28:12 CET 2009

Calling the constructors is straightforward.  However, automaticly
converting numbers will not suffice, as Java distinguishes between
different types.  Also, invoking a specific constructor/convertor and
then automatically converting Java->Scheme rep also causes problems.

So, let's _not_ automatically convert back to Scheme rep, but make
this a 2nd operation.

This would also eliminate GC restart problems.


Entry: Re-integration
Date: Mon Dec 14 10:01:44 CET 2009

It worked just fine this weekend in isolation, however now the class
instances don't seem to be the same for the sc_java_unpack() function.
Maybe there is some double wrapping going on?

  -> could be this then needs isSameObject() 

It does work on Android though..  Good.

FIXED: needed  isSameObject()


Entry: getDeclaredField
Date: Mon Dec 14 11:35:09 CET 2009

Any method that accesses a field doesn't work when it's invoked a
second time.  This stuff is quite full of pitfalls..

FIXED? seems to have been a release() error.

Entry: separate build dirs
Date: Tue Dec 15 11:16:18 CET 2009

Necessary for cross-compilation.

Seems to work.  This was mostly adding the $(SRCDIR)/$(MODULE)
prefixes for all source files, and copying the Makefiles into the
build dir.


Entry: --eval or --load options + java pass-through
Date: Wed Dec 16 15:01:47 CET 2009

For running java debugcode this is necessary.

Done: the sc/sc.c code now uses an --eval option that will pass its
string argument to the `eval-string' function after loading the boot
script.  The default value is '(repl)'.

To run a script, use: 

   --eval '(script)' <script> <arg> ...
   
'eval' can use anything defined in the boot script, which can still be
set using the --boot option.


Entry: evaluation last to first
Date: Sun Dec 20 09:11:21 CET 2009

The reason for this is that the result of an evaluation is then a
simple CONS list: no reverse is necessary.  Reversing of the
expressions to evaluate (CPS or ANF) can be performed at compile
time.

However, this requires quite some surgery as the concrete
representation is used in a lot of places.  I attempted it once and
failed, then reverted.


OK, it seems to work after hunting bugs.  Code is simpler now, but
there might still be subtle problems where order of argument
evaluation is used.


Entry: insidious TCP console bug
Date: Wed Dec 23 08:37:04 CET 2009

in sc/scheme.c : 
console *_sc_prepare_console_server(sc *sc, const char *node, int port)

    _ addr = (port == 0) 
        ? STRING(node)  // Unix socket
        : CONS(STRING(node), CONS(integer_to_object(port), NIL)); // TCP socket

    _sc_prepare(sc, CONS(SYMBOL("init-console"), 
                    CONS(QUOTE(io),
                    CONS(QUOTE(addr),   // (*)
                    NIL))));

The (*) marked part wasn't quoted.  This was ok for a unix socket
(strings are self-quoting) but of course not for a list.  The
subsequent error did not get handled.

Why?

It looks like the resume() call needs an error handling mechanism.


Entry: Interpreter replacement
Date: Thu Dec 24 10:49:54 CET 2009

Back to the more abitious project: replace the interpreter by
something a bit faster.  I'd like to write the machine itself in
Scheme, and find a translation to EX.

I'd like this VM and the current treewalker to use the same data
structures.  Maybe it's time to flatten the struct a bit.


Entry: Onwards
Date: Sun Dec 27 19:00:05 CET 2009

As the pressiure is a bit off the kettle from the side of the
commercial application, where can I go next?  Let's try to summarize.

- Need some form of byte-code representation to make at least
  variable reference a bit cheaper, and eliminate some of the
  redundant allocations of intermediate interpretation state.

- Need some kind of scheme -> C compilation (ala pre-scheme) to
  eliminate tedious debugging of C code, and reduce code size.

Both can be done at once.  Compiling some core Scheme forms to EX
(without HOFs and proper tail calls) doesn't seem very difficult.

So, how can this be made practical?  Where to start?

It seems simplest to gradually move parts of the current C
implementation to Scheme, and write a bootstrap compiler for it.
Essentially this is an exercise in bootstrapping: how to organize the
code in such a way that most of it is in Scheme and duplication in the
compiler is minimal.

When using a VM, part of the compiler needs to be implemented in C (or
compiled to C from Scheme).


Entry: LLVM
Date: Sun Dec 27 19:10:53 CET 2009

[1] http://www.ida.liu.se/~tobnu/scheme2llvm/


Entry: VM refactoring again
Date: Thu Dec 31 11:04:51 CET 2009

The first things to eliminate are the continuation transformers.  Eval
and apply could be made to modify the continuation directly.

Benchmark: boot.scm compilation in debug mode on zni.
tom@zni:~/libprim/sc$ rm -f boot-expanded.scm ; time make boot-expanded.scm
./sc --boot /home/tom/libprim/sc/boot.scm --eval '(script)' /home/tom/libprim/sc/expand.scm /home/tom/libprim/sc/boot.scm boot-expanded.scm

real	0m9.082s
user	0m9.077s
sys	0m0.008s

How to do this?  Restructure the main loop: use the state vector only
as a mark point to recover from GC restarts.

So, first objective: eliminate the state object.  (current redex +
continuation).

( This keeps eluding me: the current structure is really too
high-level to allow simple imperative optimizations...)

Plan: 

* flatten the state object into an expression + continuation object in
  the global state vector.
 

The problem this leads to is that _sc_step_value returns a state
object, which is a code + kontinuation (c . k) pair.  Essentially,
this needs to be transformed into a loop, updating a central machine.

It doesn't look like it is useful to start modifying the current
implementation in-place.  It needs to be completely rewritten.  I'm
going to revert the code back.


Entry: Bootstrapping a Scheme VM in MZ
Date: Thu Dec 31 11:34:23 CET 2009

So.. Let's do it a bit more intelligently.  Use MZ to bootstrap a native 
Scheme on top of the EX primitives, completely independent of the sc/ approach.

Once this works, the same can be done for pf/ which is simpler.


Entry: EX code generator working
Date: Thu Dec 31 14:04:12 CET 2009

Using only these forms:
  - (if) conditionals
  - (let*) lexical scope + sequential code blocks
  - first order function application using the ex_ C namespace
  - constants + immediates


Entry: VM primitives
Date: Thu Dec 31 15:10:41 CET 2009

Representation:

- value store with explicit boxes (set! and mutable varrefs are transformed)

- all global variables are boxed.  code can be optionally "frozen"
  when compiled, to give a poor man's module system.

- environment is a list of vectors.  use GC integers to index frame+var

- continuations frames (data structure) are tagged with functions that
  take a single input argument and update the machine state, and with
  a dictionary of continuation marks.

Questions:

 * I wonder if the PF VM and the Schem VM can be unified.  I guess..
   If the current environment is equated to the data stack?  Yes let's
   do that.

 * A value store allows (partial) elimination of the environment at
   run time: variables that have known bindings can be substituted.

 * How to properly distinguish un-evaluated code from values = result
   of evaluation?

 * What is a continuation frame for an application?  Can this directly
   construct the environment?  I.e. always first evaluate the function
   closure, and then simply concatenate.

 * The question is really: what is a closure, thinking of PF?  

   It's something that takes arguments, and works like this:
     - replace the current data stack with the one embedded in the closure
     - evaluate arguments and push them on the data stack
     - jump to the closure's code

   I.e. it's a PF function that _replaces_ the data stack before it is
   executed.

   Translating the above to a machine instruction sequence:
     - make a continuation frame that restores the current environment


Let's try to build this.  This seems to be an interplay between
instructions and continuations, which can both be represented as
primitive (C) functions.

Code is an expression tree (no sequential VM), represented by cons
cells:

(prim . args)

All highlevel function closures are defunctionalized, and passed to an
apply primitive.

The machine primitives are CPS: they do not return, but instead update
the machine state.

Looks like plain and simple CPS is preferrable (see EOPL).  The
non-returning lambda expressions can then be represented as structs.

So how are CPS and DF related?

From Danvy's paper[1]: DF can be used to represent the closures
resulting after performing CPS transform.


The main intuitive stumbling block is 'if'.  Somehow I don't understand
why this needs two primitives: one to start the evaluation of the
condition, and one to pass control to the proper expression based on
the boolean value of the condition.

PLAN: It looks like what I'm about to do is quite close to chapters
8,9,10 in EOPL, so it might be a good idea to simply read those again.


[1] http://www.brics.dk/RS/01/23/


Entry: CPS and `if'
Date: Sat Jan  2 10:44:40 CET 2010

Indeed: it needs two control points: before and after the evaluation
of the condition.

(if a b c) -> 

(if (a (lambda (v) (b k))
       (lambda (v) (c k))))

Looks like CPS by itself isn't the problem.  Representing the
resulting functional continuations as data is.

Here defunctionalization comes in: replace each lamba with a call to
apply.

(lambda (v) (b k))

-> (apply (Lambdaxxx v))


In the Danvy paper[1] representation as closures and
defunctionalization are mentioned as separate approaches to
representing higher order functions:

* Closures: capture code pointers together with an environment
  representing the free variable bindings.

* Defunctionalization: represent a higher order function as a data
  structure containing the free variables.

* Combinators: translate a program into combinators (which eliminates
  variable names) and use graph reduction.


[1] http://www.brics.dk/RS/01/23/


Entry: Experiments - trying to understand defunctionalized continuations
Date: Sat Jan  2 11:08:07 CET 2010

So, let's have a look at this from a more cocrete point of view: using
a trampoline VM loop, how to implement `if'?

Notes

  * In the final implementation, the continuation represented as
    nested frames needs to support continuation marks.  So it might be
    best to stick close to the current representation.  However, the
    tags that identify continuation frame constructors can be
    implemented by primitive functions directly.  However, this
    modification can probably be made later.

  * Represent environments and continuation stacks as efficiently as
    possible (i.e. use vectors).

  * Avoid creating garbage in the VM.  Continuation frames themselves
    are the main point.  Make sure that this is limited to an absolute
    minimum, i.e. the place where the current value is stored (passed
    to the current continuation) should not be a fresh cell.

A place where a lot of garbage gets created in the original VM is
function argument evaluation: instead of simply pushing in-place onto
a stack, each evalated argument replaces the current continuation
frame.  Note that when continuations can be captured, in-place update
is not allowed.

So how to fix this?  VMs i've seen before keep the current value rib
stored in a register, and save the rib whenever a new frame gets
entered.  Actually, this isn't much different except that it saves the
creation of frames for primitive function evaluations.

The question is now: once converted to CPS where the order of
evaluation is clear, does this need special support?  I'm still mixing
up several ways of looking at the problem..

CPS-converted code names all intermediates explicitly, so they are
added to the current environment.  Also, all primitive invocations are
values.  The same goes for ANF.

At the VM level this needs just apply, and no sequencing of
evaluation.

So which one is simplest?  ANF or CPS?  Since continuations need to be
implemented as data structures in the current VM approach (due to
access to the continuation marks), ANF might be simplest.

However, this only side-steps the problem: sequencing still needs to
be encoded somewhere.

I wonder, is it possible to use `let*' (or a single-variable `let1') as
a primitive form, explicitly sequencing the calculations at compile
time?

ANF does require the argument value to be stored in the environment
multiple times: once when evaluated, and once when applied.  I.e.:

(let ((fn ...)
      (arg1 ...)
      (arg2 ...))
  (apply fn arg1 arg2))

The `apply' extend the environment of `fn' with values from the
environment of the body of the let expression.

Can this be optimized?  I.e. something like this: first evaluated fn,
get its environment, and extend that environment directly.

This then gets into trouble with vararg functions.


Alternatively: only one argument functions?  This makes the
continuations simpler, moves sequencing to the compiler, but produces
intermediate garbage.

Is this so?

However, single-argument abstractions seem simpler to implement.

What about this: single-argument abstractions, and primitive
invocations in terms of variable references using De Bruijn indices.


So.. looks like ANF isn't necessary: creating separate continuation
frames for all the instructions seems to be simplest.

The idea behind ANF is then really that the only point where any
continuation frames are created is for the body of a let form.  All
the other forms take values directly.

Maybe this is simpler indeed...


Entry: ANF or not?
Date: Sat Jan  2 15:33:11 CET 2010

As far as I understand now:

* CPS makes sure that the central abstraction form is the closure.
  The continuation is then just a closure.  I don't think I can use
  this as I need an explicit stack form for continuations, to be able
  to attach continuation marks.

* ANF makes sure there are less forms that create continuation framse
  (let, seq).  All the other forms (if, set!, apply, prim-apply) can take
  values from the environment and can be executed immediately.


Reading the intro of [1].  CPS's greatest use is that the full lambda
calulus provides an OS, leading to more transformation rules as
compared to lambda_v with a beta_v strict reduction rule.

The ANF's greatest use is that most of these transfomations can be
carried over to a style with implicit continuations.

[1] http://lambda-the-ultimate.org/node/69


Entry: VM status
Date: Sat Jan  2 21:18:39 CET 2010

- got both opcodes and continuations represented as C code functions
  that operate on the machine state, modifying the C E K registers.

- compiler from Scheme -> VM code

TODO: 

- creating function abstractions + distinguishing primive and
  function abstractions.

- self-hosted compiler


Entry: More VM
Date: Sun Jan  3 09:19:15 CET 2010

Unifying primitives and abstractions.  Maybe this can be done by
making a closure wrapper for each primitive.

So let's make a closure constructor first.

Done: nb argument checking.

Next: distinguish primitives from abstractions.  The problem is that
this can't be done at compile time.

So, something needs to indicate this at run time.  Either the apply
function dispatches on type, or all values in function position are
assumed to be closures.

The latter won't work: since values can come from anywhere, they need
to be checked for type.  OK.

Next: Make vm-compile understand the basic forms in the other VM.
This mainly requires `let'.  Maybe the VM's let1 schould be extended
to support parallel evaluation.  Otoh, if it is possible to write
`let' in terms of `let*' it might be interesting to pursue that route,
as it's useful for the Scheme -> EX compiler.

Alternatively, can the current compile.scm be converted to C?  The
task is finding the sweet spot somewhere.

let* -> let can be achieved by renaming all variables in the let, and
undoing the rename at a reference point.

Problem however is then properly threading the rename environment.
I'm fixing the compiler such that `%let' is no longer recognized as a
primitive form, but both `let*' and `let' compile directly to op-let1
using a shared traversal function.

However, it might be simpler to implement let directly in the VM:
continuation frames are created anyway.

Here's the let -> let* code (with a rename bug for the expressions).

         ((eq? tag 'let*)
          (let ((bindings (car args))
                (body (cdr args)))
            (cond
             ((null? bindings) (comp `(begin ,@body)))
             ((null? (cdr bindings))
              (comp `(%let ,@(car bindings)
                           (begin ,@body))))
             (else
              (comp `(%let ,@(car bindings)
                           (let* ,(cdr bindings) ,@body)))))))
         ((eq? tag 'let)
          (let ((bindings (car args))
                (body (cdr args)))
            ;; Rename variables + keep translation records.
            (let ((names (map car bindings))
                  (exprs (map cadr bindings))
                  (new-names (map gensym bindings)))
              (compile `(let* ,(map list new-names exprs) ,@body)
                       env
                       (append (map cons names new-names)
                               renames)))))


OK: let (made it a VM opcode).

Next: self-hosting compiler.  Re-arrange the code such that it becomes
possible to write the macro expander and the compiler in the VM, then
switch VMs.

Next: Toplevel environment & boxes.

Boxes are not be necessary yet as there is still an environment at run
time.  set! can be implemented by updating this in-place.

OK: assignments


Next: ribbed index.  -> Probably won't work since vectors can't be
updated non-destructively.  However, closures could be packed lazily.
I.e. if the 'ref' function can handle nested vectors like it now does
cons-cells:

#(a b c d e f #(h i j k l '()))

Anyway, that's for later.

Next: toplevel.

Next: match form.

It might be a lot simpler to use the `match' form for writing the
compiler.  The effort needed to implement this might pay off well.

For toplevel, it might be simplest to have the compile operation
extend the toplevel as well.  I.e. it adds slots for undefined
variables, and links those slots into the code.

Or better still, add a dynamic binding opcode that can be
short-cicuited into an unbox opcode at first use?


Entry: Match in match
Date: Sun Jan  3 16:57:52 CET 2010

What about writing match in match, and bootstrap it using mzscheme?

Looking at code I find online and in PLT, it's not a simple macro.
Maybe best to leave it be and use something simpler.


Entry: Self-hosting compiler
Date: Sun Jan  3 18:28:16 CET 2010

Still need to fix toplevel references + some way to bootstrap
compile it.  If I add a #<data:...> reader, or something that can read
a textual representation of byte code it might be simpler to
serialize from a mzscheme hosted compiler and bootstrap that way.

Yes. This seems a lot simpler.  Also, if the bytecode can be made
machine-independent, this can also be used for cross-compilation.


Entry: Values
Date: Sun Jan  3 21:13:12 CET 2010

The benefit of the ANF rep is that the only opcodes that create
continuation frames to support nexted evaluation are `let' and `seq'.

But currently the VM expects variables; constants (that also do not
require evaluation) are still buffered in variables.  Maybe the VM
should support constants directly.

If variables references can be represented in a separate type internal
to the VM, this can be simplified.


Entry: Distractions
Date: Sun Jan  3 22:14:07 CET 2010

Did I reach the goal?  (The goal was to replace the current VM with
something faster and easier to understand).

I do think this approach is simpler.  Although the direct
implementation is quite straightforward also, the particular turns I
took don't all make sense.  One of them is the `ktx' idea.  The reason
is of course the tension between wanting to write the VM as a pure
data structure transformer, while wanting to keep the primitives close
to C function semantics.  This doesn't work well when continuations
are part of the game, and the garbage collector has to be tamed using
state.

The new VM is simpler in that it has only two kinds of continuations:
`let' expressions and `begin' forms (2-expression begin or `seq').
This requires some complication on the compiler side however: the
subset of scheme the VM supports is smaller.

Whether it is faster I haven't checked.  It would really surprise me
though if it is not.  It produces a lot less intermediate state: only
continuation chains and environments.


So what did we learn?  The interesting balance between operations that
save context and setup a continuation, and the subsequent handling of
the continuations: the move from direct style (function returns to the
point after invocation) to CPS style: we tell the machine what to do
next at the point the current evaluation finishes.

This was already a revelation this summer working on the more explicit
interpreter, but the VM and the closer connection to C functions makes
it all a bit more earthy.


Is there any beef in this?  Yes.  I'm not so far removed from being
able to self-host the compiler.  Toplevel defs + some transferrable
bytecode format.  It would be interesting to be able to cross-compile
from any standard Scheme.


Entry: Display data structure (implementing environments)
Date: Sun Jan  3 23:30:59 CET 2010

See 4.4. in ``Dybvig's Three Implementation Models for Scheme''[1]:

  A display is an array of display pointers, each holding the address
  of one frame in the current static chain.  

This avoids frame traversal.

[1] http://www.cs.unm.edu/~williams/cs491/three-imp.pdf


Entry: ANF
Date: Mon Jan  4 00:26:44 CET 2010

To be correct, the compiler does not produce ANF, as it does not
bubble let expressions outward (see the three A-rules in figure 7. in
[1]).  However, the VM can interpret ANF form.

It is not completely clear to me how this then provides room for
particular optimization.  At the end of the paper there is an example.

The current VM is a C_a E K machine, extended with set! and begin.


[1] http://lambda-the-ultimate.org/node/69


Entry: Dalvik-style small footprint
Date: Tue Jan  5 14:49:56 CET 2010

- mmap constant data from disk, so it can be swapped out.


Entry: Bootstrapping the VM
Date: Sat Jan  9 09:51:33 CET 2010

Essence: compile scheme to some lower form that can be serialized and
interpreted by the VM or can be compiled to native machine code.

Ideally, it would be something that can be mmapped.  Two
straightforward optimizations come to mind:

   - replacing code pointers w. opcode tags
   - flattening the code tree

Now, the challenge is, since an ANF interpreter is similar to a
concatenative interpreter, how to make sure both can be hosted?

Ok, let's dig up the tree rep I used before, now renamed to
leaf/bytecode.c

This can compactly encode arbitrary tree structures, and the bytecode
is a tree structure.

But, it's really not necessary.  There is already an s-expression
parser, and it might be simplest to just use ANF transformed scheme as
the serialized "byte code" form.

I can't seem to find a proper point of attack.  To many degrees of
freedom.

Anyways, when using a human-readable s-expression ANF form as a
bytecode representation, the code that compiles this to the currently
established internal representation needs to be written in C.

Done.  This was quite straightforward.


Entry: Compiling the compiler
Date: Sat Jan  9 11:27:18 CET 2010

The next step in compiling the compiler is to clean up the sc/boot.scm
code so we can extract the macros necessary to transform the compiler
code.

Almost there: need to gather some more macros: and or quasiquote cond

Nope.. I messed up the old vm boot file again..

Maybe it's time to throw it away, and run the bytecode on an emulated
CEK machine?  It requires some effort to keep the old VM running.
Let's split the boot code up.

Ok, i rewound the old VM to something that's working as I'm still
using some of that substrate to run the new VM.

New VM can compile itself to byte-code.  To run the byte code it needs
to also compile its dependencies:


Entry: What did I learn about bootstrapping?
Date: Sat Jan  9 20:54:41 CET 2010

- It's a lot of work to get it right: the devil is in the details.

- Incremental bootstrapping is error-prone and non-trivial.  It's fun
  to do once (it's a nice puzzle) but once is really enough.  

- However the system is alive during development: it is possible to
  interactively test all incremental semantics extensions.

- Writing a compiler in a subset of Scheme without any form of pattern
  matching is tedious and error-prone.  


Entry: What's next
Date: Sat Jan  9 21:03:04 CET 2010

Separate out only the very high-level code, but write the macros for
the VM compiler is a reasonably comfortable subset.  There is a clear
advantage to bootstrapping from a higher level.


Entry: Now generalize
Date: Sat Jan  9 21:05:47 CET 2010

So, there is something to learn here, though I'm not sure what.


Entry: A simple pattern matcher
Date: Sat Jan  9 21:07:57 CET 2010

This seems to be the essence of bootstrapping: make sure syntax
translation doesn't become a pain.

What is the simplest form of pattern matcher that's easy to implement?
I guess one that just uses:

  - cons quote '()
  - variables

Essentially quasiquote can be re-used to convert an expression into
nested constructors, which can then be easily matched.

This is the "match in match" idea.

Match seems simplest to implement in CPS: every failure continues with
the next pattern.

(cons a b)
->
(if (pair? expr)
   (let ((a (car expr))
         (b (cdr expr)))
     (next)))

For an implementation check "The implementation of functional
programming languages" by Simon Peyton Jones.

It looks like that algorithm isn't what I'm looking for (it compiles
pattern matches to case expressions, avoiding multiple matches of the
same expression).

Some duplication of computation is probably OK, since most cases
useful atm are all head-tagged.

[1] http://www.cs.tufts.edu/~nr/pubs/match-abstract.html


Entry: Poor man's module system
Date: Sun Jan 10 09:21:14 CET 2010

Problem: sequential bootstrapping is tedious for writing a compiler
and runtime library.  What you want is to write all that code in
itself.  For the library this means the code will not necessarily have
"flat" dependencies, i.e. you can have mutual recursion.

However, since this all takes a lot of time, I have to be a bit
pragmatic: I currently have a sequentially bootstrapped scheme system
I'd like to reuse.

This requires `define-macro' to work in the new VM compiler to
incrementally build the 2 toplevel environments.

Next:
- build initial environment with all the primitives
- gather all macros from boot1-expand*.scm, boot2.scm, compile.scm
- compile the files


Entry: How do deal with improper phase separation?
Date: Sun Jan 10 10:11:57 CET 2010

There is such a huge difference between implicit behaviour
(i.e. stateful sequential bootstrap) and explicit declaration (clear
phase separation).

It shows up clearly when writing a compiler for a system without phase
separation (sequential bootstrap).  Figuring out the dependencies
between code to separate compile time code from run time code objects
is non-trivial.

In case of non-separated phases it seems that the simplest such
algorithm is to EMULATE all macros defined in terms of the target
language on a (sandboxed) VM, as the static semantics of the language
can change in quite arbitrary ways during sequential bootstrap, and
such changes can't be simply transformed to run on the host as part of
the compiler.

More precisely:

  - run the compiler + low level macros that define the kernel
    language on the host.

  - incrementally compile target code

  - emulate any bootstrapping macros encountered in the source
  
So, essentially, looking at this from the target's perspective, the
bootstrap compiler behaves as a magic external component that can be
completely replaced at the end of the incremental bootstrap process,
as it can then be defined in terms of already compiled macros.


Entry: Reflection and self-hosted compilation
Date: Sun Jan 10 11:03:59 CET 2010

Why is this relevant?

Let's abuse terminology and call what was called "incremental
bootstrap" a special case of reflection: the ability of a language to
change its own semantics in some way.

Note that in our case, we change compile-time semantics (macros) in a
bottom-up, linear fashion.  I.e. code, once compiled, does not change
semantics.  This can then be augmented by either early or late binding
of run-time code (redefining procedures).

Some assumptions:

- Interactive console acces is useful for debugging.

- Some form of compilation (to native or VM code) is necessary for
  efficiency.

From this follows that we need an in-image compiler to compile source
code to executable format.

- Using reflection (i.e. giving up phase separation allowing compiler
  extension to change language semantics on the fly) can lead to very
  small run-time systems.

Two examples that illustrate this: Forth and COLA.  The downside of
powerful reflection is the absence of clear compile-time semantics:
there is essentially only behaviour, no static meaning.  I.e. the only
way to get any information about what a program means is to feed it
into the system and see what it does.

It seems to me that the difficulty associated with incremental
bootstrap (the ``puzzle'' factor) translates directly into smaller
code size, probably as a consequence of the very aggressive factoring
required.


Conclusion:

  Reflection in the form of incremental compiler extensions gives up
  static semantics and gets small code size in return.


Entry: Practical bootstrapping with embedded VM
Date: Sun Jan 10 12:11:55 CET 2010

Back to the original plan: to bootstrap the incrementally defined
Scheme library by feeding a VM with code and asking it to compile some
of the macro forms.

This might actually not be so simple: the problem is that this
requires another interface.  Maybe it is more straightforward to
compile the whole library on top of R5RS Scheme.

Some trade-offs:

  - Run the macro transformers on the target VM and provide an
    interface to transfer the data from/to the host compiler.

  - Emulate the VM + primitives.  This is not just the VM (composite
    code) but also the primitives need to be implemented.

  - Don't emulate the VM: compile all macros and their supporting code
    on the host, and use the hosts's language features.

It seems the latter is a more robust approach.  However, I have the
interface for free if I can run the compiler on the old VM.

Maybe it's time to separate the old VM code from the rest of the C
code that's re-used by the old VM.


Entry: Unrolling a sequential bootstrap
Date: Sun Jan 10 12:43:46 CET 2010

This is exactly the same problem as Staapl's standard Forth compiler.
Maybe I should focus on that first and generalize to Scheme?

The essential problem is to build a dependency graph of all the code
that allows each function and macro to be defined independently
relative to some core language.


Entry: A simple match form
Date: Sun Jan 17 08:32:12 CET 2010

What I need is a simple abstraction around lambda applied to list,
with deterministic dispatch on tags.

(switch <value>
  (<tag> <formals> .  <body>)
  ...)

->

(case (car value)
  ((<tag>) (apply (lambda <formals> . <body>) (cdr value)))
  ...)


(define (switch form)
  (let ((data (cadr form))
        (rules (cddr form))
        (_match_ '_match_)
        (_form_  '_form_)) ;; GENSYM!
    
    `(let ((,_form_ ,data))
       (let ((,_match_ (lambda (fn) (apply fn (cdr ,_form_)))))
         (case (car ,data)
           ,@(let ((tx (lambda (rule)
                         `((,(car rule))
                           (,_match_ (lambda ,@(cdr rule)))))))
               (map tx rules)))))))


Entry: Gensym
Date: Sun Jan 17 09:16:53 CET 2010

The simplest implementation is a tagged, zero-size record.  This makes
sure it's GCd.


Entry: Compiler + VM cleanup
Date: Sun Jan 17 11:15:41 CET 2010

- Allow values to appear in applications instead of only variable
  references.

First step is to make all variable refs explicit.  It might pay to
make a binary tagged type for these, also to suppport displays later.

But where to insert it?  Integers, constants, vectors, finalizers.
There's no room, except as part of the integers or constants.
I.e. these could give up one bit of precision to allow for references.

Question: making references explicit, requires also literals and
lambdas to be tagged.  So numbers could still be used to encode
varrefs.

I.e. the following byte code:

  `(op-let ((op-lit 1) (op-lit 2) (op-lit ,add)) (op-app 0 #(1 2)))

Should become something like

  `(op-app (op-lit #<prim>) #((op-lit 1) (op-lit 2)))

I.e. any point in the byte code that expects a value might as well
contain an expression that can be immediately transformed to a value.
But tagging values might not be a good idea.

The problem is there are 2 opaque values that need to be constructed
at run-time - i.e. code needs to run - but the code that runs doesn't
need to create new continuations (no expression evaluation).

   - variables: lookup in the environment
   - lambda expressions: create a closure grabbing the current environment

Other values (concrete values or literals) do not need any extra
processing.

Currently, variables are referenced directly, but lambda forms trigger
the creation and subsequent deletion of a closure (i.e. they are
treated as expressions to be evaluated).

  What did we learn: this is not necessary.  Lambda expressions are
  values, just like variable references and literals.


Entry: Non-recursive make
Date: Tue Jan 19  7:35:11 CET 2010

Main trick:

# Build makfile by including all makefile fragments.
define do_include
MODULE := $(1)
include $(1)/module.mk
PROJECT_SRC := $(PROJECT_SRC) $$(patsubst %, $(1)/%, $$(SRC))
# DUMMY := $$(shell echo $(1) : $$(SRC) >&2)
SRC :=
endef
$(foreach prog,$(MODULES),$(eval $(call do_include,$(prog))))


This works for the .o files.  Now, how to make the .a files
(i.e. aggregation) and how to make the .h_prims files (header
generation which messes up GCC's dependency generation.

The latter is the real problem.  What would solve it is to generate a
dependency file of the .d file itself, and generate the deps in two
steps.

What seems to work is -MM which skips system files.

-MG In conjunction with an option such as -M requesting dependency
    generation, -MG assumes missing header files are generated files
    and adds them to the dependency list without raising an error.
    The dependency filename is taken directly from the "#include"
    directive without prepending any path.  -MG also suppresses
    preprocessed output, as a missing header file renders this
    useless.

That was easy!


Entry: contd..
Date: Tue Jan 19 23:01:44 CET 2010

Using another generated rule for the apps.  $(eval $(call ...)) is
essentially macro expansion, and works quite well.

Tried VPATH for building in a separate directory.  The current rules
work well if the object files already exist, but are out of date.
Otherwise the files are built in the source dir.

VPATH doesn't seem to work well with prefixes...  It really is all
string substitution: I was assuming something like it "unpacking" path
names..  See [1]:

  Although they say ``the commands I list here will create a X from a
  file Y'', that is not what the commands actually do. Instead, they
  create ../obj/X from Y. This is not the same thing at all, and GNU
  make will become very upset with the deception.


[1] http://make.mad-scientist.us/vpath.html


Entry: VM fixes
Date: Fri Jan 22 07:27:43 CET 2010

Time to start incremental refinement to get it to a working state.

Next: handle lambda without creating an evaluation context.  It is
problematic to do this in the `let' evaluation cycle: it needs to be
special-cased.  I.e. the following function needs an extension at (1)
to evaluate lambdas in-place.

static void _kf_let(sc *sc, _ v, kf_let *f) {
    /* Extend environment with value. */
    _ env = CONS(v, f->env);
    if (NIL == f->exprs) {
        /* Done: enter new environment. */
        sc->e = env;
        sc->c = f->body;
    }
    else {
        // (1)

        /* Evaluate next expression. */
        EXTEND_K(kf_let, fnext);
        sc->c        = _CAR(f->exprs);
        fnext->exprs = _CDR(f->exprs);
        fnext->body  = f->body;
        fnext->env   = env;
    }
}

Generally, the basic point is: `let' (next to `seq') is the place
where continuations get created.  For primitive evaluation, variable
references, constants and lambda expressions this is not necessary.

Can the compiler sort this out?

Yes.

So the answer is to wrap the RHS of `let' and `seq' expressions in a
constructor that expresses whether an expression can be reduced to a
value immediately (without creating a continuation) or not.

The problem is: how to handle primitives?  Essentially, the _vm_let()
and _kf_let() opcodes and continuations need to peek into the value of
the operator, which is not known at compile time.

Is it possible to prune complexity and remove the "apply" opcode, and
move this to an implicit operation in let and seq?

(From now on forget about `seq' as the `let' results generalize easy
to `seq'.)

So, the `let' RHS is either a varref, a constant or an application,
which can be:

    - primitive
    - closure
    - continuation

Only in the case of a closure a new continuation needs to be created.
The primitive can be executed directly, and the continuation will void
the current continuation anyway.

How to express this?

The compiler can distinguish between application and non-application.
However, the runtime system needs to then distinguish closures from
primitives and continuations.


Entry: The `let' form: understanding ANF
Date: Sat Jan 23 06:30:56 CET 2010

The point of extension is this function, which is called both from the
machine primitive _vm_let as from the continuation primitive _kf_let.

It needs to decide whether to create a continuation frame or not.
This is only necessary for closure applications.

static void _let(sc *sc, _ exprs, _ body, _ env) {
    if (NIL == exprs) {
        /* Done: enter new environment. */
        sc->e = env;
        sc->c = body;
    }
    else {
        /* Evaluate next expression. */
        EXTEND_K(kf_let, f);
        sc->c    = _CAR(exprs);
        f->exprs = _CDR(exprs);
        f->body  = body;
        f->env   = env;
    }
}

However, nested `let' forms are a special case.  They also need to
create continuation frames before evaluation.

So again, assuming `seq' is implemented as `let' (and later optimized
again to not cons needlessly), the _let() function needs to create a
continuation for:

        - another _vm_let() operation

        - a _vm_app() operation with a closure operator


So, the representation needs to be optimized for dispatching:

new frame:
    (let (let ...))
    (let (app-closure ...))

no frame:
    (let (ref ...))
    (let (const ...))
    (let (app-prim ...))
    (let (app-cont ...))

Can the decision to create a continuation frame be moved from the
caller to the callee?  I.e. something like Forth's ENTER?

Meaning: assume that all primitive machine operations simply produce a
value (i.e. go back to the previous representation), but special-case
closure applications and nested let.


Entry: ANF again
Date: Sat Jan 23 06:38:16 CET 2010

Let's see what I missed last time.  There is a transformation example
on page 10 in [1] that seemed rather strange.

   (add1 (let (x (f 5)) 0))

-> (let (x (f 5)) (add1 0))

This is an application of rule A3 in figure 7 on page 7, which lifts
an inner application out of a primtive or value application.

So, to summarize: ANF's usefulness is mainly for optimization, as it
can expose certain reductions more easily.  Whether a VM creates
continuations during 'let' or during argument evaluation doesn't
matter so much for the internal complexity..


Another thing I missed is that proper ANF is _serial_ (i.e. let*
instead of let): it doesn't have a parallel representation.

[1] http://www.ccs.neu.edu/scheme/pubs/pldi93-fsdf.ps.gz
[2] http://lambda-the-ultimate.org/node/69
[3] http://schemecookbook.org/view/Cookbook/PearlANormalForm


Entry: Encoding `let' non-tail calls as SC primitives
Date: Sat Jan 23 07:15:13 CET 2010

Again, the essence: while walking the list of expressions to be
evaluated in a `let' form, two means of dispatch are necessary:

   - P: the operation is primitive and can compute a value immediately
     without modifying the current continuation.

   - N: the operation nests.

Is it possible to represent the N form as a P form, or does this need
to be encoded explicitly?

The fundamental dissonance is that this then yields 3 kinds of C
functions to implement machine behaviour:

   _vm:  creates context (CALL)
   _kf:  consumes context (RETURN)
   ex:   avoids creation/consumption of context (CALL+RETURN)

The main problem being that ex (functions) cannot represent _vm nor
_kf which are "function yin & yang".

I.e. the constraint to use C function semantics for most of the
structure of libprim requires that at some point a decision is made to
either use the C call semantics (the C stack), or the VM call
semantics (the continuation stack).


Entry: Why box variables?
Date: Sat Jan 23 07:46:41 CET 2010

The reason is that values don't need to be copied if they are never
mutated.

   (let ((a ...)) (let ((b a))) b) 
-> (let ((a ...)) a)

It's also useful for creating display closures (lambda lifting).


Entry: VM cont..
Date: Sat Jan 23 07:53:17 CET 2010

I'm running in circles.  Maybe the machine needs to be designed around
_let() instead?  I.e. to create a frame or to not create a frame?

Maybe it's time to leave it as is, bootstrap the system and see how
fast it runs.  Then try to create a next version.

Let's cleanup the code a bit first. OK.

The idea is this: at the point where we have let's intermediate state
active (list of values + list of expressions) we need to decide
whether this state should be saved to introduce a new frame or not.

It looks like this means breaking out the current expression list and
the list of collected values.

So let's turn it around: the VM's core function is to perform `let':
interpret a linear sequence of opcodes, and collect their results in a
value rib.  Once the rib is complete, add the rib to the environment
and execute a body.

A body is then:
  - a new let
  - an application  (gather arguments + change environment)


Dybvig[1] uses 5 registers:

  - a: accumulator
  - x: next expression
  - e: current environment
  - r: current value rib
  - s: current stack

Using a value rib in the presence of continuations requires some
special care when the continuation is captured during a `let'
expression.

It's really quite complex: there are a lot of possible optimizations.
I wonder if it's possible to formulate the different implementation
trade-offs in a more modular fashion, and use partial evaluation te
generate an implementation.

Looks like guile has display closures too[2].


[1] http://www.cs.unm.edu/~williams/cs491/three-imp.pdf
[2] http://wingolog.org/archives/2009/07/26/the-good-hack


Entry: Build system
Date: Sat Jan 23 08:47:12 CET 2010

Make the module.mk "object oriented".

Instead of defining SRC, define OBJ.  We always produce binary
objects, but the source files might be anything.

Also made application rules locally defined, such that the project.mk
only aggregates.


Entry: Embedded use and debugging
Date: Sat Jan 23 12:20:31 CET 2010

So..  compilation is complex: there are a lot of possible tricks and
trade-offs.  Maybe I should aim for a simpler problem to solve,
instead of picking a route that would end at re-implementing one of
the current scheme interpreters.

For embedded systems it is not so very useful to have a dynamic
language during the application's run time.  It's more useful to be
able to poke around at debug time.  This means that the interpretation
(run-time compilation) can be offloaded to the host.

Maybe it's time to think of the libprim project from that perspective:
easy integration with C and debug-time interactive compilation, more
in the spirit of Staapl.

There is a huge advantage in this approach: if a language does not
need to be self-hosting, a fairly sophisticated host language can be
used to implement the compiler.


Entry: Mangled shared object names
Date: Sat Jan 23 13:20:31 CET 2010

One modification I would like to make is to use shared libraries, and
annotate all functions with the number of input arguments.  This can
be done in a straightforward way using something like this in the
generated header files:

#define ex_write ex_1_write

Implemented.  It seems to work fine.

How to list the symbols in a .so file?  Use
dl_iterate_phdr[1]. (GNU_SOURCE).

[1] http://linux.die.net/man/3/dl_iterate_phdr
[2] http://lists.cairographics.org/archives/cairo-commit/2007-April/007935.html


Entry: More snooping
Date: Sun Jan 24 08:39:57 CET 2010

The bitc[1] and fonc[2] projects have a lot of good stuff to borrow
from.

Piumartha's object model[3] might be a simpler starting place.

[1] http://www.bitc-lang.org/
[2] http://www.vpri.org/fonc_wiki/index.php/Installation_Guide
[3] http://piumarta.com/software/id-objmodel/


Entry: Next?
Date: Sun Jan 24 09:24:22 CET 2010

Bootstrap VM.  It seems simpler to do this from VM2 since it has the
primitive data and code shared with VM2.

VM1 compiling VM2 compiler to VM2: ok.
Next: compile the library.

Pfff... sharing code between the two VMs in this hacked up way is too
error prone.

Maybe it's better to use a properly phased implementation, and rewrite
most of the macros in a higher level function set with some form of
pattern matching.

VM1 was an interesting excercise, and can still be used for
bootstrapping VM2, but it seems better to disentangle both a bit, and
write the library for VM2 in more concise way.  See remarks here[1].

[1] entry://20100109-205441


Entry: VM3?
Date: Sun Jan 24 14:42:36 CET 2010

Maybe I should start over using this[1] ANF compiler.  Or let it go
for a while, and rethink the purpose of the project.

What about focussing on something else instead?  Back to the libprim
roots: how to build a scripting engine on top of a collection of
primitive C objects without much effort?

I.e: given a collection of LEAF objects, automatically wrap them in
Scheme or PF.

Main idea: for systems programming dealing with scarse resources
(i.e. hardware units) asynchronous GC isn't always a nice model, as it
doesn't follow the plentiful memory resource model.


[1] http://schemecookbook.org/view/Cookbook/PearlANormalForm


Entry: More about tethering
Date: Sun Jan 24 14:54:37 CET 2010
  
Let's build a tethered library based only on LEAF.

Given a function from LEAF, how can EX use it?  It needs to know how
many arguments to pass, and how to wrap/unwrap the arguments.

What about adding a _type function that will provide this information?
For each function defined in an .so there is an additional function
_type that will produce meta-info.


   - Incremental bootstrap is messy. 
   - A decent self-hosting compiler is not so trivial

      => A fully bootstrapped, possibly not self-hosted compiler might
         be a better approach.

   - I don't understand it quite well yet.  There are lots of
     decisions to make when writing a VM.  I'm not sure a few simple
     modifications will do much good.  I would like to go back to
     simplicity first, then maybe speed later.


- Did some thinking about tethered development.  This might be the
  most viable way of moving the project forward, and take it closer to
  Staapl.  What I really need however is an application to drive it.


Next?  Hard to say.  Probably the tethered approach and simpler
wrapping.  Those would be the most useful approaches.  Essentially
LEAF+GC could be written in C, but the rest might be generated from
Scheme or some first-order language.


Entry: Modifying VM1
Date: Sun Jan 24 23:38:24 CET 2010

Adding some simple optimizations might already improve it a lot:

- snap global variable pointers (-> cell)
- snap lexial variable indices (-> number)
- eliminate the STATE update allocation
- mark continuations as not captured to re-use frames

The basic idea is that the bootstrapping required for a compiler not
written in C is a pain.  If some bootstrapping is going to be
necessary anyway, it's probably best to use it to make implementing
the system easier, i.e.  raise the abstraction level of all the
components:

     VM:         scheme -> C/EX
     COMP, LIB:  scheme -> VM

Conclusion about dybvig's stack-based implementation: it's probably
not worth the trouble: better to re-use another Scheme implementation.


Entry: Dybvig-style VM
Date: Mon Jan 25 07:49:14 CET 2010

See presentation in [1].

It seems easier to use a full bootstrap from scheme to do development
and testing comfortably in PLT Scheme.

Write a VM + compiler in scheme.
Write a bootstrap / JIT compiler for the VM in scheme -> C/LLVM/ASM.

Principles:

CHEAPER:
   + a single stack for static and dynamic chains makes function calls
     and variable references cheap.

MORE EXPENSIVE:
   - boxes for assignment
   - display closures (flat closures) to eliminate environment 
   - stack copying for continuations


Question: how does ANF relate to this?

At first sight, ANF seems to work well for a stack-oriented approach,
as it gives an elegant way to preserve proper tail calls.

  * Une does not need to worry about proper calls when evaluating
    expressions, as they are never in tail position.

  * The tail call in a let expression can clean up the lexical
    environment after collecting arguments to a function but before
    performing the call, as all closures that use these variables use
    heap-allocated copies.

I think I'm starting to see it.  Primitive forms will be `let1'
(aggregated into `let*', and representing `begin2'), `if', variable
reference (either from the stack or from a display closure), primitive
application and closure application.

[1] http://icfp06.cs.uchicago.edu/dybvig-talk.pdf


Entry: Incrementally fixing VM1
Date: Tue Jan 26 08:59:30 CET 2010

Today I'm hopeful again that VM1 can be fixed incrementally.

- make all the dispatching uniform (if ... return)

- replace VALUE and STATE by using deforestation (connect the
  creation and dispatch by a goto)

- write a trivial compiler in C

- move functionality from interpreter to compiler


LOG:
- _sc_step_value() -> _sc_step_return()

- make STATE() construction uniform: only at tail call (i.e. return
  STATE()).  currently memory is reserved for a primitive call, so we
  don't need to cleanup.

  this might be an important step to return simplicity to the VM!

  something went wrong.. reverting: found it: forgot to wrap VALUE on
  prim return.


- deforestation: replace STATE(VALUE(),) constructors with a direct
  call to _sc_return.
  phase 1: replace STATE(VALUE(),) with STATE_RETURN(,)


- eliminte VALUE structure.  this might be more difficult.  the
  problem is that _sc_yield uses this + the state hacks too.

- its simplest to remove yield for now, and put it back when the
  transformation is done. OK

- VALUE is removed completely.

- eliminate the STATE structure.  all constructors need to be
  converted to jumps to the next_state entry point.  

  wow this seems to work.

- flip STATE and STATE_RETURN / STATE_REDEX: make the latter two fast
  and have STATE unpack a redex and call STATE_REDEX.

  OK


Entry: improvements
Date: Tue Jan 26 18:03:42 CET 2010

Some very slight improvements:


DEBUG:

$ time build/sc_vm1/sc_vm1 --boot sc_vm1/boot.scm --eval '(begin)'

real	0m3.989s
user	0m3.789s
sys	0m0.005s

$ time build/sc_vm3/sc_vm3 --boot sc_vm1/boot.scm --eval '(begin)'

real	0m4.540s
user	0m4.333s
sys	0m0.008s


RELEASE:
$ time build/sc_vm1/sc_vm1 --boot sc_vm1/boot.scm --eval '(begin)'

real	0m0.874s
user	0m0.846s
sys	0m0.003s

$ time build/sc_vm3/sc_vm3 --boot sc_vm1/boot.scm --eval '(begin)'

real	0m1.106s
user	0m1.062s
sys	0m0.001s


Entry: Compilation
Date: Tue Jan 26 20:30:52 CET 2010

Speedup isn't so great.  I'm surprised.  So what's next?  Speedup
varrefs.  This requires compilation to some intermediate structure.

Implemented `apply1' and `eval-core' without the weird ktx trampoline.
Using direct state modification and EXCEPT_RESTART instead.


Entry: Closure conversion
Date: Wed Jan 27 07:38:13 CET 2010

Simple lambda lifting turns first class functions into first order
functions by adding extra parameters at the definition side, and
providing extra parameters at the call site.

  (lambda (x y)
    (let ((f (lambda (a b)
               (+ (* a x) (* b y)))))
      (f 1 2)))
=>

  (define f (lambda (a b x y)
              (+ (* a x) (* b y))))
  (lambda (a b x y)
    (f 1 2 x y))

This works only for downward closures; not for escaping closures.

Closure conversion turns each function into a first order function by
adding an extra argument: a vector of free variables.  Function
objects are then represented as function body + free variable vector.

This is similar to "Display Closures" idea by Dybvig.  However, there
seems to be some difference as the latter needs assignment conversion.
Or maybe [1] does too?  It looks like it.  It is not mentioned how to
create the vector of free variable values in the first place.


[1] http://www.iro.umontreal.ca/~boucherd/mslug/meetings/20041020/90-min-scc/90-min-scc.pdf


Entry: Defunctionalization
Date: Wed Jan 27 16:33:02 CET 2010

Chapter 4: an interpreter is the output of defunctionalization, and
contexts (stacks) correspond to defunctionalized CPS converted code.


[1] http://www.brics.dk/RS/01/23/BRICS-RS-01-23.pdf


Entry: EX
Date: Thu Jan 28 17:09:25 CET 2010

Thinking about it, what is really useful about this project is the
garbage collector: it's simple.  It's also small.

25528 ex.o  -O3 -fomit-frame-pointer ...
14396 ex.o  -Os -fomit-frame-pointer

So, can it be made simpler?  The current GC restart mechanism is not
straightforward for imperative code.


Entry: A stack VM
Date: Fri Jan 29 07:57:27 CET 2010

So, what about writing a stack VM?  The main difficulty in the current
approach seems to be interaction with the garbage collector.


Entry: ANF: let* code
Date: Fri Jan 29 08:16:17 CET 2010

Assuming (begin a b) -> (let* ((_ a)) b)

The VM code is essentially a vector of non-tail calls, and a tail expression.

(let* ((v1 (e1 ...))
       (v2 (e2 ...))
       ...)
  (t ...))

The context needs to be saved whenever an (e_ ...) is a non-primitive.
      

Entry: Cross compilation
Date: Fri Jan 29 14:18:52 CET 2010

Needs some more options to work except for setting gcc.  I.e. AR is
needed, and seemingly also special linker flags.

New option: --cross:

    ../configure --cross=arm-linux-uclibcgnueabi


Entry: LEAF scaffolding
Date: Fri Jan 29 16:15:15 CET 2010

Goal: target has only libleaf.so

Build something on top of that.  I.e. it should be possible to have
only remote references.  Console with scripting core runs on host,
objects live on target.


Entry: Making it run faster
Date: Fri Jan 29 16:49:20 CET 2010

This was a major bottleneck.  2.5x speedup making this operation unsafe.

_ ex_find_slot_safe(ex *ex, _ E, _ var) {
    if (TRUE == ex_is_null(ex, E)) return FALSE;
    _ slot = CAR(E);
    _ name = CAR(slot);
    if (name == var) return slot;
    else return ex_find_slot(ex, CDR(E), var);
}

/* This leads to a 2.5 x speedup, but is unsafe if environment is
   replaced by a different data structure. */
_ ex_find_slot(ex *ex, _ E, _ var) {
    for(;;) {
        if (NIL == E) return FALSE;
        _ slot = _CAR(E);
        _ name = _CAR(slot);
        if (name == var) return slot;
        E = _CDR(E);
    }
}

The result is now:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 24.24      0.08     0.08     1645     0.05     0.05  gc_mark
 15.15      0.13     0.05   414186     0.00     0.00  ex_find_slot
  9.09      0.16     0.03     1554     0.02     0.21  _sc_loop
  6.06      0.18     0.02   399153     0.00     0.00  object_to_redex
  6.06      0.20     0.02   146816     0.00     0.00  sc_close_args
  4.55      0.22     0.02  3156824     0.00     0.00  object_to_pair
  3.03      0.23     0.01  2042892     0.00     0.00  _is_vector_type
  3.03      0.24     0.01  1888451     0.00     0.00  gc_alloc
  3.03      0.25     0.01  1435184     0.00     0.00  ex_cdr
  3.03      0.26     0.01  1198433     0.00     0.00  ex_car
  3.03      0.27     0.01   857956     0.00     0.00  gc_make_tagged
  3.03      0.28     0.01   706757     0.00     0.00  ex_is_symbol
  3.03      0.29     0.01   572557     0.00     0.00  ex_is_pair
  3.03      0.30     0.01   345584     0.00     0.00  sc_is_k_set
  3.03      0.31     0.01      327     0.03     0.03  _gc_finalize
  3.03      0.32     0.01      327     0.03     0.31  gc_collect
  1.52      0.32     0.01  1005671     0.00     0.00  ex_is_null
  1.52      0.33     0.01   140147     0.00     0.00  object_to_vector
  1.52      0.33     0.01                             ex_is_void


Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 21.21      0.07     0.07   414186     0.00     0.00  ex_find_slot
 18.18      0.13     0.06     1645     0.04     0.04  gc_mark
 12.12      0.17     0.04  1888451     0.00     0.00  gc_alloc
 12.12      0.21     0.04     1554     0.03     0.20  _sc_loop
  9.09      0.24     0.03  3156824     0.00     0.00  object_to_pair
  6.06      0.26     0.02   857956     0.00     0.00  gc_make_tagged_v
  3.03      0.27     0.01  2042892     0.00     0.00  _is_vector_type
  3.03      0.28     0.01  1024440     0.00     0.00  ex_cons
  3.03      0.29     0.01   706757     0.00     0.00  ex_is_symbol
  3.03      0.30     0.01   452726     0.00     0.00  sc_make_redex
  3.03      0.31     0.01      327     0.03     0.03  _gc_finalize
  3.03      0.32     0.01                             __i686.get_pc_thunk.bx
  3.03      0.33     0.01                             ex_lcar


So name lookup is quite expensive still, but not overly dominant.  GC
cost reduces significantly when increasing the memory size.  This
doesn't look too bad overall.  It's even usable in the ARM simulator:
4.5 seconds startup time.


Entry: SC VM1 done?
Date: Fri Jan 29 20:24:09 CET 2010

It doesn't look like there is much else to do without changing the VM
code drastically and introducing a compiler.

It's cute, isn't it?


Entry: Guaranteed tail calls
Date: Sat Jan 30 08:05:15 CET 2010

How to guarantee proper tail calls with a stack model?

(let* ((a ...)
       (b ...))
  (t ...))

Essentially, the space alloted to store the arguments of `t' should be
below the space alloted to store `a' and `b'.  This is the `shift'
opcode in Dybvig's dissertation[1].

Essentially, upon entry of a function one should allocate necessary
space to perform argument passing to the next function.


[1] http://www.cs.unm.edu/~williams/cs491/three-imp.pdf


Entry: The future of libprim
Date: Sat Jan 30 09:32:32 CET 2010

It's time for a sabbatical and work on something else.  VM1 is good
enough for now.  Efforts are best spent elsewhere, like automatic
wrapping, and moving some code to a higher level.

I'd like to resume with Staapl, or MetaOCaml based code generation.


Entry: Back to the basic idea
Date: Sat Jan 30 17:29:18 CET 2010

libleaf.so

Automate wrapper generation.  This is mostly about mapping argument
types in both directions:

      object_to_xxx     -> pointer or NULL
      xxx_to_object     -> EX object

Special cases: all scalar C types.  It is exactly those that make this
problem difficult or at least tedious and non-orthogonal.  Is there an
interesting problem to solve?

The main problem: C's data types. (aside: funny that Java simply
inherited that!).


Entry: Automatic wrapping: reifying LEAF type information at run time.
Date: Sun Jan 31 09:01:04 CET 2010

The idea is that a run-time process should be able to collect type
information by either inspecting the names in the .so or by inspecting
the header file.

Really, the next step is a parser that can extract info from the
header file, or a declaration macro that is easily recognized.

The wrapping (constructing a wrapper function from a type description
and a function pointer) and obtaining that type description should be
separated.

The simpler problem to solve is to add a means to define new types in
Scheme.  It all starts with dlsym: OK.


Entry: No restriction on pointers?
Date: Sun Jan 31 09:33:56 CET 2010

Is it possible to not restrict pointers?  If the heap storage is kept
in one piece, GC vectors can be recognized directly by their address.
The point being: it might be too restrictive to require alignment of
all pointers to be able to use tag bits..  This is OK on risc
platforms, but on x86 not all (code) pointers are word-aligned.


Entry: PF
Date: Sun Jan 31 11:12:50 CET 2010

What about building PF on top of LEAF?  Won't work, as it uses some of
the linear primitives in EX.

Maybe the real question is: can some of the linear primitives in EX be
written in LEAF?

Let's think about this a bit.  Is it possible to construct a tree
memory for LEAF objects that is optimized for storage size?

I already have tuples, but what this would do is to not use malloc().

The main reason: i want composition without GC.  For this it's really
not necessary to use a different object model.

Exceptions need to be more primitive.


Entry: Fixed exceptions
Date: Sun Jan 31 16:27:09 CET 2010

Leaf code can now throw exceptions through the leaf_ctx object.


Entry: Tasks
Date: Sun Jan 31 16:27:34 CET 2010

Maybe it's time to revive the old idea of tasks.  The region of the
stack from the top down to the setjmp point can be saved.

Fixed.
Also fixed proper stack growing for resume.


Entry: Automatic wrapping
Date: Sun Jan 31 21:50:31 CET 2010

Problem pops up again with the ck (C continuations).  The idea is to
abstract C code as lazy lists / generators, so C code can be free of
scripting language's memory model which would represent concrete
lists.

I'm currently stuck at defining an interface for such functions.  The
simplest is probably to require generators to be single input / single
output leaf transformers.


Entry: Task communication
Date: Mon Feb  1 07:52:01 CET 2010

EX vectors are not allowed because references are invisible to the
garbage collector.

Leaf references aren't allowed either, unless a way can be found to
add a finalizer to a task: otherwise we'll leak references.

I'm not sure that given these two constraints, this abstraction is at
all useful.


Entry: Wrapping
Date: Tue Feb  2 08:23:05 CET 2010

Look at lysp[1] for inspiration.  The `dlsym' function seems to be
able to create functions that call directly into C without any manual
type conversion.

It passed the pointer from dlsym() to mkPsubr() which stores it in a
struct tagged with Psubr.  In apply() this tag is used to setup an
argument vector and dispatch to primcall().  This then uses
__builtin_apply() and __builtin_return().

I don't understand the arguments to __builtin_appl().  Check here[2]
for inspiration; eval() and funcall() in dlfcn?

Maybe it's better to use libffi[3]?

[1] http://piumarta.com/software/lysp/
[2] http://www.google.be/codesearch/p?hl=nl#5ge3gHPB4K4/gnu/glibc/glibc-2.5.tar.bz2%7CHT3Jwvgod1I/glibc-2.5/dlfcn/eval.c&q=__builtin_apply
[3] http://sourceware.org/libffi/


Entry: gdb in emacs
Date: Wed Feb  3 23:25:07 CET 2010

It seems simpler to load with M-x gdb instead of M-x gud-gdb.  The
latter doesn't seem to interpret the annotate commands.

Also, it's simpler to use debug scripts:

  (gdb "gdb --annotate=3 -x ~/libprim/bin/tom.gdb")


Entry: Status
Date: Sun Jul  4 16:44:44 CEST 2010

To build, run "make" at the toplevel.

The project uses a single Makefile for all subprojects. (Google for
"Recursive make considered harmful").  Builds are always in a separate
directory which is build/ by default (make mrproper = rm -rf build).

The project is layered, see previous posts: basic linear objects <
graph memory < interpreters.  The main idea behind the project is to
enable composition of linear objects in non-linear way through a
scripting language, to provide scaffolding for a linear mostly C
application.

I use "linear" in a sloppy way, meaning: containing only data
structures in the form of directed acyclic graphs such that
asynchronous garbage collection is not required. (see linear logic).

* LEAF: linear C objects (refcounted non-cyclic graph).  Stable.

* EX: garbage-collected tree/graph memory.  Stable.

* SC: simple Scheme interpreter

  cd build
  make sc_vm1_test
  make sc_vm2_test

  VM1 is a treewalking interpreter based on a CEK machine and is
  fairly stable.  The Scheme is incrementally bootstrapped at runtime
  in a very idiosyncratic way on top of a basic C VM in the file
  boot.scm

  The VM2 is still under construction.  It is a byte-code interpreted
  CEK machine that would allow a more standard "big bang" bootstrap
  requiring a foreign Scheme interpreter to compile the boot image.


* PF (Packet Forth or Permutation Forth, maybe even Shufflort), 

  A mostly-safe, linear-data Forth dialect.

  cd build
  make pf_test

  Still in proof-of-concept phase.  The basic idea is that execution
  is linear (operations on stacks and lists), but compilation is not
  (conversion from data to executable code).  Somehow PF can be seen
  as a metaprogrammed language where the core language is simple and
  linear, and the metalanguage is nonlinear, more Scheme-like.
  However, the metaprogramming is just "eval" in a global namespace.

  Problems: the implementation is error-prone (linearity constraint is
  not automatically enforced).


Entry: gdb as an application console
Date: Wed Jul 21 12:43:15 CEST 2010

I'm doing a short 1-2 month project on an Atmel ARM7 AT91SAMX512 using
eCos.  I thought about bringing up a Scheme or PF clone on the
machine.  My work is mostly of low-level nature, fairly tied to the
device (Flash storage management), so an interactive on-target
debugging approach is useful.

However, this would take too long to get going.  Additionally, the
board I'm working on didn't have a console input, just output for
logging.  I decided to use just gdb with Segger's J-Link gdbserver on
an old windows box connected to the target using JTAG.

I did need to tweak some eCos compilation flags to make sure that
functions that are not referenced in the code still end up on the
target, preventing them to be tossed out by the linker.  This required
removal of: (man gcc)

       -ffunction-sections
       -fdata-sections

           Place each function or data item into its own section in
           the output file if the target supports arbitrary sections.

It turns out that test-driven development (use a separate eCos test
application to scaffold the code you're writing) combined with the
possibility to interactively execute code on the machine is more than
enough to get the necessary interactivity.

The approach I used:

   - Define some "registers", i.e. instances of the objects under test

   - Define operations on them that have short names, and don't take
     excessive amount of arguments (i.e. 0 or 1).

This gives you a small working set to use interactively using the gdb
"p" command, which allows you to execute individual functions on the
target as long as it has an execution stack setup.

> p x_dump(123)

The x_dump() function in my case would take an integer argument, fetch
a block from Flash into one of the registers, and display it in
human-readable formatting.

This makes me wonder: how powerful is the gdb scripting language?  The
on-target functions can be strung together in sequences like this:

define dump2
  p dump(0)
  p dump(1)
end

Doesn't seem to be very powerful.  There seems to be a Python
interface though[1].

[1] http://sourceware.org/gdb/current/onlinedocs/gdb/Python.html#Python


Entry: Tree permutation machine
Date: Sat Aug 14 11:47:46 CEST 2010

I'm thinking about implementing PF in a higher level fashion.

    1. Specification as binary tree transformation machine.

    2. Automatic implementation.

Here 1. has as few primitives as possible to guarantee correctness and
2. "caches" some of the composite operations that are used a lot.
Which ops to cache could be defined as a formal search problem.
(I.e. given this much code space, find the best decomposition of ops,
and find a suitable "unrolling" of the tree into registers).

So, what are the primitives of binary tree transformation?

Let's start with the tree transformation group.  A link Antti
suggested [1].  Is this relevant?  I don't see it immediately.

What I'm looking for is generators of the group of permutations of
infinite binary trees.  Each permutation has a finite reach, so can
also be performed on finite trees of sufficient depth.

Are swap and rotate left/right enough?  In [2] I write that a right
rotation can be constructed from a left rotation between swaps, but
it's a bit more complicated:

(A B) C  -R->  A (B C)

A (B C)  -L->  (A B) C

A (B C)  -S->  (B C) A
         -R->  B (C A)
         -S->  (C A) B
         -R->  C (A B)
         -S->  (A B) C

Denoting the group action G on tree t as tG we write G1 followed by G2
as G1G2.  This gives  L = SRSRS.

What are the permutations generated by these primitives?  Can we
generate all binary tree permutations?

An interesting place to look is Wouter van Oortmerssen's [5]
aardappel[6] language from his PhD thesis[7].

( A quick reminder..  Practically, what was the motivation again?
Linear languages are good for embedded systems because they have
immediate resource reclaim, which is important when
resource-allocation becomes a real issues, which is often the case
with components instantiated in hardware.  This is where a-synchronous
mark/sweep collectors to not fare well, as they assume a large pool of
resources that can be reclaimed infrequently. )

I'm playing with the math in [8].  Seems that a "return stack" is a
necessary element to be able to gain locality.  Also, for
non-conservative operations an infinite chain of free nodes is
necessary.


[1] http://www.research.att.com/~njas/sequences/A089840
[2] entry://20090823-100042
[3] http://stackoverflow.com/questions/651565/permutations-of-a-binary-tree
[4] http://en.wikipedia.org/wiki/Random_binary_tree
[5] http://strlen.com/programming-languages
[6] http://strlen.com/aardappel-language
[7] http://strlen.com/files/lang/aardappel/thesis.pdf
[8] entry://../math/20100814-142538


Entry: PF impressions: what's still there after a couple of months?
Date: Sun Aug 15 20:46:39 CEST 2010

It helps to not work on something for a while -- weeds out the fringe
ideas.

Looking at pf.c now, one of the most remarkable points is that partial
continuations become code words constructed at run time, using linear
CONS instead of non-linear CONS.  Continuations are code that is
transformed (binary tree rotations) and _consumed_ while hard-coded
instructions are code that is constant.  This "consumption of code" is
an important one (aka. one-shot continuations).

Code itself is still non-linear, allowing for non-linear
metaprogramming.  Metaprogramming lives completely in the "unreal"
domain, while the linear core operates in the real (hardware) world.

What strikes me is that it is really simple and natural.  The switches
between LINEAR and PURE operations are the only ad-hoc part.


Some of the more important questions are: why is (or isn't) linear
memory a good programming interface?  How come that I have not much
trouble writing linear C code (not using malloc) for embedded systems
work?

[1] entry://../c/20100815-210541
[2] http://lambda-the-ultimate.org/node/558


Entry: Semiphere collector
Date: Mon Aug 23 22:39:26 CEST 2010

Is it possible to use the 2nd half of the cheney/semiphere collector
for something else?  Some scratch buffer space for part of the code
that doesn't need the GC?

I'm thinking for embedded use where RAM is scarce it might be best to
use a more memory-efficient but still simple coloring collector and
get rid of vectors.  PF doesn't need them (uses only tagged CONS cells).


Entry: A metaprogrammed core
Date: Wed Sep  1 21:57:44 CEST 2010

It's difficult to leave behind the swiss army knife: C with malloc().
But, in the presence of some form of metaprogramming, dynamic memory
allocation isn't necessary.  That is, if you don't count linear CONS
cells to build tree structures.

Does that actually mean anything?  How to deramble?

Static memory with bounded stacks and queues.

Linear memory, when re-using, is quite fast.  If dynamic allocation is
needed for CONS cells only, some time is required to "flatten" binary
trees into linked lists:

  * Either the free list is a list, and the drop() operation flattens
    a tree data structure.

  * If the free list is a tree, an allocation might need to locally
    flatten (rotate) a tree until a cell becomes available, or it can
    decend into it, possibly leaving a caching pointer.

When linear memory is combined with refcounted memory management (lazy
copy) or external resource finalizers some more degrees of freedom
arise: execute finalizer on drop() or new().  The answer here is
probably universally best on drop(), otherwise there is garbage
floating around.

Problems arise when moving from single cell sizes to vectors of
arbitrary length (either as pointer-containing cells or as
pointer-free leaf objects): memory fragmentation can arise.

In theory at least it should be possible to use only a single cell
size, also for pointer-free raw data.  However this might complicate
code and reduce data locality.

Is there a way to keep memory semantics equal to binary tree, but
somehow restrict the form it can take, and allocate it accordingly?

I.e. a program has a certain structure which creates a group of
transformations on a binary tree.  Can we somehow encode each of these
states in such a way that the pointer bookkeeping can be "hardcoded"?

Group theory and memory management...

The problem is that it probably gets quite complicated with a couple
of independent generators.


Entry: Port to eCos
Date: Fri Sep 17 14:35:58 CEST 2010

I'm attempting a port to the eCos operating system.  Start from:

# Android.mk :

LEAF = libprim/leaf
EX = libprim/ex
SC = libprim/sc
SC_VM1 = libprim/sc_vm1
JNI = libprim/jni

LOCAL_MODULE     := scheme
LOCAL_SRC_FILES  := \
	sc_android.c \
	$(LEAF)/bytes.c  \
	$(LEAF)/port.c \
	$(LEAF)/parser.c \
	$(LEAF)/scanner.c \
	$(LEAF)/tuple.c \
	$(LEAF)/leaf.c \
	$(LEAF)/symbol.c \
	$(LEAF)/channel.c \
	$(LEAF)/inexact.c \
	$(LEAF)/console.c \
	$(EX)/ex.c  \
	$(EX)/read_intl.c  \
	$(EX)/gc.c \
	$(JNI)/libscheme.c \
	$(SC)/sc.c \
	$(SC_VM1)/vm1.c

I've added the file eCos.mk

What are the difficulties?  As with the Android port the main
challenge is to generate the C files.  Let's ignore that problem now
and see if the generated files compile.

I spent an hour to try to make it build.  Something goes wrong with
generating the .d files:


libprim/ex/ex.d:1: *** multiple target patterns.  Stop.

It's generated by this:

gcc -MM -I"/home/tom/ubidata/build_linux/install/include" -Wall -Wpointer-arith -Wstrict-prototypes -Wundef -Woverloaded-virtual -Wno-write-strings -g -O0 -fno-rtti -fno-exceptions -Ilibprim/build -Ilibprim  -Isrc -Isrc/utils -Isrc/dec libprim/ex/ex.c | sed 's!ex.o[ :]*!libprim/ex/ex.o libprim/ex/ex.d : !' > libprim/ex/ex.d


I see.  It's the dot!

The regular expression "ex.o" matches "ex/object" in the libprim
sources.  It needs to be quoted.  It comes from:

%.d: %.c
	$(XCC) -MM $(XCFLAGS) $< | sed 's!$(*F).o[ :]*!$*.o $@ : !' > $@


Changing it to use quoting works:

%.d: %.c
	$(XCC) -MM $(XCFLAGS) $< | sed 's!$(*F)\.o[ :]*!$*.o $@ : !' > $@


Entry: de-posixification
Date: Sat Sep 18 10:19:17 CEST 2010

While it should be possible to get libprim to run on top of the posix
layer, it might also be a good idea to get a smaller subset to run, or
to make a mini-compat layer.

Currently there is a problem due to inclusion of linux headers in the
synthetic target.

Removing port.c it compiles.  However, there are some undefined
symbols.

I've separated port/port_posix and ex/ex_posix + sprinkled some
"#ifdef POSIX" here and there when things are too hard to separate.
Seems to work out.


Entry: Booting from string.
Date: Sat Sep 18 11:47:47 CEST 2010

The VM needs to boot from a "const char[]".


Entry: leaf init methods
Date: Sat Sep 18 12:58:43 CEST 2010

The idea is to allow for local objects that do not need deallocation.
If an object has an _init() method, it is ok to create one on the
C-stack.  The guarantee is that _init() methods will not perform
dynamic allocation using malloc() or other resource aquisition.


Entry: const wrapped in bytes
Date: Sat Sep 18 13:21:46 CEST 2010

Can we do this without introducing an extra field?  Plan: 

* limit the accesses to b->bufsize to things that actually _allocate_ 
* make that access abstract 
* set bufsize == 0 to mean it's const data, in which case size is the real size

Ok, this seems to work - pending some weird problems associated to the
fact that this is an un-intuitive ugly hack..

I think it also uses a bit too much stack space.

Next problem: longjmp

Nope, that seems to work just fine; just need to make sure you set a
breakpoint at the setjmp.

Same problem as for the JNI port: no decent console output.


Entry: Strings or pre-parsed source?
Date: Sat Sep 18 14:59:46 CEST 2010

There are two possiblities now: embed the boot file as a string, or
use the existing unparse.ss to include it as a constructor.  The
latter might be interesting when it is actually combined with some
form of hash consing so part of the scheme code body doesn't need to
be in RAM.  But it's overkill now; I just need to make it run.

So string.  What's the simplest way to convert a string to a linkable
object?  I'd go through the C compiler but maybe objdump or objcopy
can do this to.

Yes, it's quite straightforward with objdump[1]


arm-eabi-objcopy -I binary -O elf32-little --rename-section .data=.rodata file.bin file.o


It's simple enough, but including it in the build system is a royal pain.


[1] entry://../c/20100918-151333


Entry: Compilation to .a for eCos
Date: Sat Sep 18 20:07:19 CEST 2010

I don't like the eCosPro build system.  Let's make it so that it
compiles with simple eCos tools, and produces a library that can be
linked as a whole into the project.

After setting the -I (and -L) paths, the show stopper is the current
module system, as it is not separated into posix/non-posix.

Should I create leaf_posix and ex_posix modules?

Done that.  This makes config a bit harder though: when to include
ex_posix in sc.c for example?


Entry: separating out posix?
Date: Sat Sep 18 22:22:52 CEST 2010

Maybe it's not a good idea as this gives multiple dimensions:
leaf/ex/sc and portable/posix.

Ok, it's not so hard to manage.  Let's stick with the current
structure.  The modules per directory approach works quite well.


Entry: GC and consts
Date: Sun Sep 19 00:44:37 CEST 2010

Is it possible to integrate const data with the garbage-collected
heap?  The point is to store bytecode in Flash.  It doesn't seem to be
a problem as the heap locations are known.


Entry: Meta system : which one to pick?
Date: Sun Sep 19 16:10:36 CEST 2010

The current c preprocessing is a bit ad-hoc.  Libprim should probably
stay implemented in C, but the meta-system should be converted to a
single language.

Can we do the pre-pricessing with a real C parser?  I know of 3 that
are useful.

  * Racket + c.plt : doesn't seem to be maintained

  * OCaml + Frontc (godi) : seems to work + might work in metaocaml too

  * Haskell + Language.C : not tried yet


Because I'm a bit afraid of using Haskell in a real-world app that
might need some stateful hackery, and because c.plt needs more effort
to get into and fix, it seems that the real alternative is Frontc.  I
believe MetaOCaml also uses parts of this.

As mentioned [4] there is also LLVM Clang.  

[1] http://planet.plt-scheme.org/package-source/dherman/c.plt/3/2/planet-docs/c/index.html
[2] entry://../meta/20090227-102018
[3] http://hackage.haskell.org/package/language-c-0.3.2
[4] entry://../meta/20090719-164343


Entry: .h_prims generation
Date: Sun Sep 19 16:51:53 CEST 2010

The problem is that there is a nasty feedback loop.  The language
isn't yet C when the .h_prims generator runs.

So really, I'm not looking for a C parser!

I'm looking for something that can look beyond the holes in the C
files to be able to fill them in a second pass.

Preprocessing isn't really an option either as this pulls in a huge
amount of junk from the system header files.

Is there a way to incrementally (lazily?) backtrack into header files
to resolve what is necessary?  This is mostly about typedefs in
Frontc.

Also, Frontc doesn't seem to keep track of original source locations
so replacing parts of syntax is not going to work.

Stick with what I have?  Regexp based foo foo.


Entry: Scheme implementation design space
Date: Sun Sep 19 18:48:48 CEST 2010

It's actually a bit bigger that I thought.  Some issues, thinking
about space and time.

  * Memory representation:
     - Garbage collector type
     - Vectors or CONS cells
     - Integers, tagged or not?
     - Opaque objects and finalizers

  * Interpreter:
     - Byte code or tree walker
     - Full continuations support?
     - Stack optimization (complicates tail calls!)
     - Closure implementation
     - Boxed values

  * Compiler:
     - Self hosted compiler? (in case of byte code)
     - Macro expansion
     - Macro hygiene


The main ideas behind the libprim/SC interpreter are implementation
simplicity and memory usage.  The latter isn't so good because of:
     - Semisphere collector
     - 3 word CONS cells
     - No hash consing
     - Code as s-expr instead of byte code


Entry: Hack to improve RAM usage
Date: Sun Sep 19 18:55:37 CEST 2010

Let's say we have plenty of Flash.  And let's say that multiple passes
are allowed.  Can we dump out the image such that it can be stored in
Flash and simply resumed?

This requires serialization of the leaf objects.

That is fixable, but it does not work for non-serializable objects
like OS references (file descriptors).  An image is a tree of memory
objects where some of the leafs are holes that link it to the OS.
BOOTING = LINKING with OS.

But if the Scheme _is_ the OS, then there are really no such objects.

The point is: that is wrong.  In fact, going down to the hardware
there is always going to be hidden state and context-sensitive
behaviour, simply because a machine is never isolated.

Linking with the OS/hardware is linking with _objects_ not just data.

What is the interesting problem hidden here? [1]

An operating system image has a number of communication channels.
Whenever an image gets restored/resumed, the communication channels
need to be restored.

Can we make this explicit?  Have a base object that is a port and a
standard way to re-initialize.  That way image snapshotting is free.

The trick there is dependency between ports, so let's have a port
hierarchy (one port initializer is parameterized by several
initialized ports).


[1] entry://../compsci/20100919-193304


Entry: Now that..
Date: Sun Sep 19 20:19:40 CEST 2010

..I'm runnig loose again.  How to 1-up the libprim design?  It's OK,
but there are some points that are hard to change.

Thinking about things to do with a full machine at disposal.  One of
the cool things about having no external code that deals with pointers
is that memory can be moved around freely.

Many of the ideas that are half-baked in my head are more worked out
in the COLA system.  However, I don't want to give up externally
hosted compilated. (here-you-go code).

This is again more a Staapl idea.  libprim is about C projects and
external code.


Entry: Building eCos / arm libprim
Date: Sun Sep 19 22:07:03 CEST 2010

# Looks like all is well now.  This is how to get it going:


cd ~/libprim
mkdir build_ecos
cd build_ecos
../configure --target=eCos --cmdprefix=arm-eabi- --arch=arm --prefix=<ecos-install>
make VERBOSE=1 ecos_sc_vm1


Entry: libprim on ARM7TMDI
Date: Mon Sep 20 00:12:24 CEST 2010

I'm surprised the .bss is a lot smaller on the arm target than the
linux synthetic target.

tom@one:~/libprim/build_ecos_arm/ecos_sc$ arm-eabi-objdump -h main

main:     file format elf32-littlearm

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  9 .text         00024118  00100040  00100040  00008040  2**2  CONTENTS, ALLOC, LOAD, READONLY, CODE
 10 .rodata       00007664  00124158  00124158  0002c158  2**3  CONTENTS, ALLOC, LOAD, DATA
 13 .data         000012ac  00200180  0012b8d4  00038180  2**3  CONTENTS, ALLOC, LOAD, CODE
 14 .bss          00006d60  0020142c  0012cb80  0003942c  2**4  ALLOC
tom@one:~/libprim/build_ecos_arm/ecos_sc$ 

It would be nice to find a way to put some Scheme code in Flash too.
After macro expansion, code is essentially static.  RAM is only needed
for continuations and closures.


Entry: Memory opti
Date: Mon Sep 20 00:23:54 CEST 2010

1. Allow for Flash storage of leaf/ex objects.

This requires a modification of free() for leaf objects (to ignore
flash pointers), and the GC such that EX vectors stored in flash are
not taken into account in GC and treated as constants.

2. GC: semisphere -> mark/sweep

3. Hash consing?

The last two introduce more tag bits.  The former should be
straightforward.


Entry: Dump image w. closures
Date: Thu Sep 23 18:26:41 CEST 2010

Apart from other serialization problems, how to serialize closures?
This shouldn't be so difficult:

(let ((a ...)
      (b ...))
  (lambda (x y z)
     ...))

The only problem is that environments can be shared, which is not so
easy to fix.


Entry: Know your arena
Date: Sat Sep 25 17:18:16 CEST 2010

See previous post.  

* For recognizing non-managed vectors it's simplest to just compare
  with the current arena pointers.  This will always give an exact
  answer.  It could be optimized with more platform knowledge, but
  that's not really the point.

* Macro expansion needs to happen before compilation to linkable .o
  code with expanded ASTs.

So some earlier unparse.ss code can be modified to support Flash
stored scripts.  The main problem is to work around the ad-hoc
incremental bootstrap in the boot.scm file.


Entry: Boot expand
Date: Sat Sep 25 18:04:31 CEST 2010

I changed the structure of the boot file to:

    (begin
        (begin . <boot-1-exprs>)
        (eval-list (quote <boot-2-exprs>)))

This is a valid core form and makes it simpler to process.  The
`eval-list' form is defined in <boot-1-exprs>.

This runs the expansion script:


~/libprim/build_linux/posix_sc/main \
       --boot ~/libprim/sc_vm1/boot.scm \
       --eval '(begin (load "/home/tom/libprim/sc_vm1/boot_expand.scm")(script))'  \
       ~/libprim/sc_vm1/boot.scm /tmp/boote.scm


Also fixed the makefile + used and changed the standard `script'
function.


Entry: Serialize core
Date: Tue Sep 28 10:05:12 CEST 2010

There seems to be little else than to actually try to serialize a
running core and see where it doesn't work (external refs).

Strategy:
- collect all nodes (vectors)
- independently name them
- generate a list of refs (serve as C declaration)
- generate a list of defs in terms of refs

EDIT: This might work for the bare metal version, but is not quite so
simple for posix.  There is a lot of state in file descriptors, and it
definitely needs some form of abstraction of "reconnect" for each
object type.

The real question is if this is worth it, apart from being just a cool
idea.


Entry: c.plt is fixed
Date: Sat Oct  2 19:36:58 CEST 2010

Already 13 months ago; immediately after my bug report -- seems I
missed that!

EDIT: actually not, some other problems remain.

So, let's see what we can get to work on libprim.

First, some recap:

  * The syntax rep in c.plt uses prefab structs[1], see ast.ss:
    
    (define-struct src (start-offset start-line start-col end-offset end-line end-col path) #:prefab)

    Prefab structs can exist without being defined, and a definition
    as above simply defines standard struct access procedures to
    access a particular prefab struct.

  * Prettyprinting can be done using dherman/pprint[2].

    
[1] http://docs.racket-lang.org/guide/define-struct.html#%28part._prefab-struct%29
[2] http://planet.plt-scheme.org/package-source/dherman/pprint.plt/4/4/planet-docs/pprint/index.html


Entry: Trying to implement pointer reversal mark GC
Date: Fri Oct  8 09:27:10 CEST 2010

The point: build a CELL store + GC that is very memory efficient to be
run on an embedded ARM target with about 64k RAM. It doesn't need to
be fast.

My eye has fallen on mark/sweep with pointer reversal mark (not using
a stack) and possibly a lazy sweep (during alloc).

The cell store can use 14-16 bit pointers so we can fit a cell in one
machine word.

During traversal, a cell can be in one of 3 states:

   - not marked / traversed  (FREE)
   - marked, left reversed   (TODO)
   - marked, right reversed  (DONE)

This leaves one more state to encode full object pointers.


So how to implement the pointer swapping.  What are the operations on
a node?

   - recurse left
   - recurse right
   - return

My confusion seems to be about return.  Recurse left will swap
pointers left, recurse right will swap pointers right but return will
restore pointer + it needs to know where to put it and whether to
recurse again.

So the marking is really about what state the object is in: did we
initiate a left recurse or a right recurse.


Let's write the traversal as an interpreter of the current code.  It
can be in one of 3 states: TAG_FREE, TAG_CAR, TAG_CDR.

What are the invariants?  The current pointer points to a node that is
in one of 3 states indicating where the back pointer is.

   FREE                CAR                CDR

   ^   ^                  ^              ^
   |   |                  |              |
 [ x | x ]          [ x | x ]          [ x | x ]
                      |                      |
                      v                      v   


In the CAR and CDR cases, an extra storage element is necessary to
hold the overwritten pointer to the subtree under investigation.

Let's use a more interpreter-friendly terminology:

  IP = current subtree under investigation
  RP = location of continuation

Let's call them code and continuation: c, k.


Ok, this works pretty well!

Next: either implement struct/vectors on top of binary trees, or build
a different machine alltogether.  The latter might be best since we
could keep only run-time on the embedded target, and repl + compiler
on the host?

Write the whole thing in racket.

Represent code and continuations simply as pairs.


Entry: Interpreter on top of CELL
Date: Fri Oct  8 22:56:09 CEST 2010

Pick up the VM2 idea, but run it on CELL.

Environment: list.
Varrefs: ints (atom interpreted as number).
Closure: (cons code env)

Can we make a toplevel-less VM?  Run only VM, keep the
compilation/interpretation on the host side?

Remember: the goal is to make compact code.  There are only a handful
of primitive operations and continuations to encode, so best to use
some compressed format.  No need to use addresses.


Question: what kind of VM to use?  Is an ANF machine small and simple,
or should I pick something from Dybvigs writings.?

ANF seems fine if extended with some schemeisms:

   EXP ::= (app VAL VAL ...)
        |  (let (VAR EXP) EXP)
        |  (begin EXP EXP)
        |  (set! VAR VAL)
        |  (if VAL EXP EXP)
        |  (quote DATUM)

   VAL ::= (lambda (VAR ...) EXP)
        |  VAR

Maybe it's simpler to use VAR instead of VAL, and have a let-lambda
form that creates a closure from the lambda expression.

Note that encoding the core language in some weird structure makes it
harder to store continuations.  The continuations needed above are for
`let', `begin'.  Note that `let' and `begin' are the same apart from
`begin' not binding (keeping) the value of the expression.  The basic
idea of ANF is that there is only one kind of continuation frame.

Maybe it's best to stick to simple s-expr encoding.  This would work
better with small integers stored inside the cons cells.  This
requires one extra bit sacrificed, giving 14 bit cell address space =
64kBytes.

Another possibility is to take part of the range for small numbers.
Usually lexical context doesn't go too deep, which is what we would
use it for: varrefs.

Say the first or last 256 cell addresses are used for representing
small integers.  That sacrifices only 1/128 of the max available cell
space, and is enough for encoding varrefs, opcodes and continuations.


Entry: Tagging w. fixed cell store.
Date: Sat Oct  9 00:41:43 CEST 2010

The fixed cell size also makes tagging simpler: extra tag bits can be
added in separate tables without having to give up pointer address
bits.  In fact, it easily scales to other cell store sizes simply
adding non-aligned memory.


Entry: lambda as EXP
Date: Sat Oct  9 00:59:37 CEST 2010

The modified ANF which only uses variable references as values then
becomes:

   EXP ::= (app VAR VAR ...)
        |  (let (VAR EXP) EXP)
        |  (begin EXP EXP)
        |  (set! VAR VAR)
        |  (if VAR EXP EXP)
        |  (quote DATUM)
        |  (lambda (VAR ...) EXP)

Of these only `begin' and `let' push continuation frames.


Encoding the vm is fairly straightforward. (just need to take care to
also restore environments!)

It will probably work better with local labels though; encoding as
functions needs too many trampolines.  But before going on it might be
best to abstract the data structures a bit more.  CAR and CDR can get
confusing.


Ok, got computed goto VM written out.  It seems quite simple
actually. Now I only need op_app.


Entry: Application
Date: Sat Oct  9 12:07:51 CEST 2010

The remaining degree of freedom is function application.  Issues:

- What order?

  Let's put them in a way that is simplest to implement in the
  interpreter.  Order of evaluation is handled in the compiler anyway.
  By moving around the args in the opcode we can still play with
  encoding in the compiler.

- Use ridges or flat list?

  Flat lists seem simpler.  The remaining question is then the order
  of arguments.


Entry: Cleanup
Date: Sat Oct  9 13:24:36 CEST 2010

* Moving more code to cell.h

* Put tag bits in separate table?  Morale: interpreter does not need
  tags: bytecode has been type-checked.


Entry: Cells in flash for storing VM code.
Date: Sat Oct  9 17:24:42 CEST 2010

What is necessary?  Something that supports CAR/CDR and NCAR/NCDR for
representing code in Flash memory.

The problem is that CONS needs to work on such references: any object
or part of an object that eventually gets stored in a RAM pair needs
an address that maps into RAM cell space.  That's quite a constraint.

However, just storing pairs like is done in RAM isn't a problem as
long as relative<->absolute translation is done properly.  For an
embedded target with known memory map this can probably be done
efficiently.

To have "more code" in the Flash, not consuming cell address space,
the simplest approach is probably to just compile to machine code or
other internal data/code reps.  Anything goes there.


Entry: C primtives in new VM
Date: Sat Oct  9 17:41:39 CEST 2010

This poses a problem as the primitives need run-time tagging; all the
other code is untyped (pre-checked statically typed).

Instead of thinking of primitives as functions, it might be
interesting to think of them as closures, i.e. with a full lexical
context.


Entry: Arbitrary cell sizes
Date: Sat Oct  9 17:56:43 CEST 2010

It doesn't seem to be so hard to now move to arbitrary cell sizes.
Everything should really be just the same.

Hmm..  Not _that_ trivial.  Let's leave for later when it's necessary.


Entry: PICOBIT
Date: Sun Oct 10 02:13:47 CEST 2010

Interesting read[1].  Also read[2].

In [1] it mentions a C dialect with non-recursive functions combined
with global register allocation.  Apparently MicroChip's Hi-Tech C
compiler uses this technique.

To brutally summarize, PICBIT[2] is mostly about compact data encoding
while PICOBIT[1] focusses on compact byte code representation.

[1] http://www.ccs.neu.edu/home/stamourv/picobit.pdf
[2] http://www.ift.ulaval.ca/~dadub100/files/picbit.pdf


Entry: Byte code
Date: Sun Oct 10 09:04:29 CEST 2010

What I conclude from [1] is that the current direct ANF-in-tree
approach can be simplified a bit.  Is it possible to distill a byte
code from simply observing the pattern of push/pop instructions?

The main pull is the conversion from expressions to flat code.  The
code pointer becomes specialized.

One thing to note though is that the compiler itself really isn't
trivial either!  I'm far from done!!

Roadmap:
  1. add reader to cellvm to make testing easier (OK)
  2. don't use c reg for storing "continuation code"
  3. copy to cellvm2.c and convert the code pointer to byte code

Would it be possible to describe the VM in scheme such that it can be
easily translated to C, so at least the VM can run also in a
scaffolded setup?

[1] entry://20101010-021347


Entry: Cell questions
Date: Sun Oct 10 15:13:23 CEST 2010

- vectors as balanced trees to support EX, PF and SC?

  cells: simple gc, no need for compacting (also: SRAM: no need for
  locality).

  vectors: either compacting but memory-inefficient (semi-sphere
  copying) or run-time inefficient (mark/sweep + compact).

  for low mem cells seem to be quite the winner.  how to properly
  abstract structs on top of cells?

- convert current tree walker interpreter to bytecode interpreter

  basic idea: instead of representing code as cells, use byte code,
  which is +- cdr-coded lists.

  balanced tree representation of n slots + header (size + type tag).
  SC vm needs:  


Does it make sense to run the old VM1 on a limited cell store?  It is
quite inefficient with memory already.  However it might be possible
to reuse the part of EX that doesn't rely on structs and vectors.


Entry: EX on top of CELL?  NO  -> split EX code
Date: Sun Oct 10 16:12:37 CEST 2010

It doesn't seem to make much sense to port EX on top of CELL.  It
makes more sense to centralize pair processing code since it is used
in LEAF, EX, CELL (and PF).

1. It seems that most code in EX can be auto-generated: it's really
   mostly wrappers.  The beef is in LEAF anyway.

2. The PAIR processing code can be abstracted and possibly reused in
   LEAF for its linear / RC cells.


Entry: Answer original question: better GC for SC?  NO.
Date: Sun Oct 10 16:29:30 CEST 2010

It doesn't seem to be straightforward to add a CELL based GC to SC.
It seems better to write a compacting vector GC, which will make it
even slower.  Then it really needs a more compact code representation
in the form of a byte code interpreter.

I think I'm going to give up on the idea.  EX/SC/PF are for
"linux-size" hosts: couple of MB of RAM.  Small embedded hosts need a
different architecture like CELL.

CELL can use LEAF, but probably it's best to refactor LEAF to remove
the dependency on malloc.

So where am I at?  Is it worth writing a compacting vector GC?
Probably not, as I don't think it would bring me to the real target:
run the whole interpreter on an ARM target using about 32kB of RAM.
That really needs a different approach.

It would be interesting to figure out how much memory it uses on a
linux arm target with static libc.

Anyways..  It was a fun detour but the original idea seems to be a
no-no: there is no simple way to make it work on the AT91.  That needs
a different approach.  There are 2 interesting other things that came
out though:

  - cell based GC is quite simple to implement

  - a tree-walking ANF interpreter is also simple

  - the road to a working byte-code version doesn't seem too hard to
    walk, but is paved with a million trade-offs if efficiency is key.

Possible roads:

  - find the relation between the ANF treewalker and a straightforward
    byte code version.

  - see how this relates to the compiler.  can the ANF be an
    intermediate form so we can compile to a treewalker and a byte
    code version without much extra effort?

  - do the same for VM2 on top of EX.

  - once the bytecode machine can work on SC _and_ reduce memory usage
    (and increase speed?) then see about implementing a memory
    efficient compacting vector GC.


Entry: arm uClibc
Date: Sun Oct 10 18:31:40 CEST 2010

Using a buildroot install in /opt/xc/android/buildroot-2009.08/

configure --cmdprefix=/opt/xc/android/buildroot-2009.08/build_arm/staging_dir/usr/bin/arm-linux-
qemu-arm -L /opt/xc/android/buildroot-2009.08/project_build_arm/uclibc/root/ ./cell


Entry: Tree walking vs byte code
Date: Sun Oct 10 18:49:53 CEST 2010

Is ANF byte code just CDR-coded treewalking?  Looks like it.  Code
takes the form of a sequence of LET and BEGIN forms.  You only need
one bit to distinguish between both, so let's focus on LET only.

So byte code would look like this:


       PUSHK(ex1_r)     // start of let
       jump code_ex1    // compute value (this code jumps to k_return)
ex1_r: PUSH(e, v)       // continuation code pops value to env stack
       ...              // execute body of let

/* This code can also just be inlined. */
code_ex1:  ...
           v = some_data
           goto k_return

k_return:  c = POP(k)
           i = NPOP(c)  // address of ex1_r
           e = POP(c)   // restore env


The above is a bit informal but the idea is quite simple: 

Before jumping into a machine code procedure that computes a value v
to be added to the environment as a new binding, push a continuation
frame that saves the environment, and the code address to resume
operation.

The environment is caller-saved because the callee might be an
application of a closure which destroys the environment.

In the current vm.c the continuation needs 3 args instead: syntax (c)
+ interpreter (continuation bytecode such as k_let and k_begin).


Entry: CALL and RETURN
Date: Sun Oct 10 19:24:21 CEST 2010

So effectively, we're implementing a fairly standard CALL and RETURN
with the following additions:

     - CALL environment pointer
     - RETURN restores environment pointer

How does this then lead to proper tail calls?  If after a sequence of
CALLs to build an environment (the RHSs of `let*') an application is
invoked, there is no frame created: only the environment changes as it
comes from the closure, but the continuation stack is "passed on".

WOW.

I finally got it!


Entry: Scheme is Forth with data stack swapping
Date: Sun Oct 10 19:33:33 CEST 2010

Save data stack on the return stack on every call as data stacks are
replaced by each closure application.  Return restores data stack.


Entry: Generating byte code from ANF trees..
Date: Sun Oct 10 20:10:14 CEST 2010

..is then of course very straightforward (it's the whole point of
ANF).  The idea is to flatten the tree and use jumps, return address
pushes or combinations (calls) to implement it, depending on what is
simplest for the architecture.

Generating C code it's probably best to stick to calls, and return
environment on primitive return.  However if continuation capture is
to be implemented, a CELL stack is necessary.


Entry: PICOBIT VM
Date: Sun Oct 10 23:54:44 CEST 2010

The stack-based VM uses instructions like:
- Push constant, stack element, global
- Push closure with entry point x
- Set global to TOS
- Call/Tailcall closure at TOS with x arguments
- Call (toplevel?) procedure
- Jump if TOS is #f
- Jump
- Return


I wonder where the value goes on return.  Checked in code[2]: It goes
on the stack (env) and is placed in a temp reg when the env gets
restored from the continuation frame, then goes back on the stack.


[1] http://www.ccs.neu.edu/home/stamourv/picobit.pdf
[2] http://repo.or.cz/w/picobit.git


Entry: Kelsey Scheme48 papers
Date: Mon Oct 11 01:12:38 CEST 2010

The stack VM described in [3] refers to Clinger's 311 paper.

[1] http://mumble.net/~kelsey/
[2] http://mumble.net/~kelsey/papers/prescheme.ps.gz
[3] http://repository.readscheme.org/ftp/papers/vlisp-lasc/scheme48.pdf


Entry: Out of steam..
Date: Mon Oct 11 17:49:33 CEST 2010

I'm probably going to stop here to think about it a bit.  This morning
I changed the free/used bit to a phased version to avoid a full
used->free update of the entire map before marking.

Or maybe a little more..

About the VM, there are 2 things to decide:
  - The opcodes + general structure
  - The opcode encoding

It should be straightforward to solve these problems separately.
Let's assume that each instruction takes one machine word as argument.
And let's start with the opcodes from PIC(O)BIT.

I don't need a flat decoder for this: cell store is fine.

The only problem in the implementation is this:

  k_return:
    /* Execute top continuation frame, passing it the return value
       (stored in the c register). */
    c = POP(k);  // k frame
    i = NPOP(c); // opcode
    e = POP(c);  // environment
    goto run;

Which is an artefact of using cell store for the k stack, and c being
a cell itself (so it's the improper tail of the list..)


In a bytecode VM it will probably look more like:

    c = TOP_CODE(k)
    e = TOP_ENV(k)
    DROP(k)

Where c already has the i part of vm.c which needs to distinguish
between s-expression form and the "interpreter" code.

Ok, removed the integer intermediate variables.  Only cell* registers
remain.  


Entry: Separate interpreter from decoder
Date: Fri Oct 15 20:38:35 CEST 2010

Currently the VM runs on s-expressions.  This needs to be made
abstract such that the code format doesn't matter.  Start with writing
the decoder interface on top of s-expressions.

There is a small nuance with the current implementation (which creates
"smart" continuations) and a VM that would simply push a return
address.  Can this be eliminated while still working with s-expr?
It's really about push RV or not (let vs. begin).

So it looks like it's an either/or:

 - Use scheme-like syntax trees and have different continuation types.

 - Use a source representation which allows a single continuation type
   (equivalent to that of `begin') and provide explicit pushes.  (Or
   alternatively, `let' with explicit pops.)


Entry: Sweet spot: simple self-bootstrapping VM (without BC compiler).
Date: Sat Oct 16 21:51:19 CEST 2010

What is the sweet spot for a self-bootstrappable VM, one that can read
Scheme forms and interpret them.  Full core forms?  Maybe I can modify
the VM1 to use vectors + code addresses in continuations so no tags
are necessary.  This would make the memory model simpler.

Alternatively, it might not be so difficult to build a simple core
form compiler in C.


Entry: ANF vs. CPS
Date: Sat Oct 16 22:54:49 CEST 2010

One thing I didn't look at.  Isn't it simpler to use a CPS core form?
Then there is no distinction between continuation and closure.

I forgot how ANF can solve the "administrative redex" problem of CPS.


Entry: Cell on ARM
Date: Mon Feb  7 14:24:32 EST 2011

Let's try to get it to work on ecos + AT91SAM7S256.

To setup the build dir, 

mkdir ~/libprim/build_ecos_at91sam7sek
cd ~/libprim/build_ecos_at91sam7sek
. /opt/xc/ecos/ecosenv-3.0.sh 
../configure --target=eCos --cmdprefix=arm-eabi- --arch=arm --prefix=/opt/xc/ecos/build_3.0_at91sam7sek/install
make ecos_sc

The ecos is built using "ecosbuild 3.0 at91sam7sek" [1].

This currently fails due to relyance on M_PI in grid.c which probably
leaks through in the linux synthetic target.


. /opt/xc/ecos/ecosenv-cvs.sh 
../configure --target=eCos --prefix=/opt/xc/ecos/build_cvs_linux/install
make VERBOSE=1 ecos_sc


Nope, that one also fails on the M_PI error.

I removed grid.o in leaf/module.mk to see if that fixes it.  It does
fix the arm build.  Linux synthetic build complains about vectors.o
(part of eCos).


[1] http://zwizwa.be/darcs/ecos/ecos_build


Entry: Running on the AT91
Date: Mon Feb  7 15:17:37 EST 2011

I'm getting OCD errors trying to upload the ELF:

Info : accepting 'gdb' connection from 3333
Warn : acknowledgment received, but no packet pending
Error: status register: 0x30005
Error: Lock Error Bit Detected, Operation Abort
Error: failed erasing sectors 0 to 11
Error: flash_erase returned -902
Info : dropped 'gdb' connection

The gdb console says:
Error erasing flash with vFlashErase packet

EDIT: problem was flash write protection bits

[1] http://ecos.sourceware.org/docs-3.0/user-guide/modifying-the-memory-layout.html


Entry: AT91
Date: Fri Feb 11 17:20:10 EST 2011

As it looks like I'm going to be friends with the AT91 for a while,
I'd like to get a minimal version of a scripted language up and
running.  The new cell-based VM might be an interesting candidate.

Stepping through the eCos boot sequence gives a good idea about what
is necessary to get it running: oscillator, stack, ...


Entry: eCos CVS
Date: Sat Feb 12 09:57:57 EST 2011

../configure --cmdprefix=arm-eabi- --prefix=/opt/xc/ecos/build_cvs_at91sam7sek/install --target=eCos --enable-debug


Entry: cell123 on AT91
Date: Sat Feb 12 10:31:20 EST 2011

I'm trying to run the cel123 test program on the AT91.  It seems to
compile fine, but there is something wrong with the IO.

Q: When using the main() start point in eCos as opposed to
   cyg_user_start(), does the scheduler get started to enable blocking
   IO?

It seems main is a thread[1].  In retrospect it's apparent from the
stack trace as there is no caller of main() in the backtrace.

Anyway, somewhere in this i->o data gets lost:

    while(1) {
        putchar(getchar());
    }

Ok, it is using the debug input port, but it doesn't get any data, so
I suspect something else is also reading from this port.

Ok, changing main -> cyg_user_start gives me a working console.  That
might be the trick.

[1] http://sourceware.org/ml/ecos-discuss/2008-11/msg00034.html


Entry: Keep code out of RAM
Date: Thu Apr 28 20:02:17 EDT 2011

Thinking about it, it's probably not a good idea to keep code in RAM.
Since scheme (without eval) never creates code at runtime (as opposed
to closures or continuations), there should be no problem really.


Entry: Dead end; new start?
Date: Thu Apr 28 23:36:45 EDT 2011

So after leaving the project alone for a while I thought it might be
interesting to write a compiler in racket.  However what I forgot is
that it seems to be a dead end, at least the s-expression VM one.

So, it might be best to move the project elsewhere.  The next time I
would write a Scheme engine, I'd design it together with the VM and
use a generator for the VM.  I can stil reuse the garbage collector
though.


Entry: Hmm.. learning?
Date: Sat Apr 30 23:12:08 EDT 2011

Maybe it's good to finish the interpreter anyway, to have a testable
model that can be evolved into something more abstract.

Maybe I should learn this time: don't ever rewrite something.  Even if
it might take a very long time, incremental rework has the infinite
advantage to be working at all time: it's more appropriate in the real
world.


Entry: Lambda & app.
Date: Sat Apr 30 23:18:35 EDT 2011

Q: shouldn't closures be bound in the environment?

Currently they are expected as a direct argument of app (in code) but
that can't work because closures can only be the result of a lambda
expression evaluation (make-closure).

So I have to fix that in the VM (trivial: add deref) and in the test
suite.

EDIT: fixed


Entry: ANF compiler
Date: Sun May  1 13:11:48 EDT 2011

Seems lika n ANF compiler is much like an ordinary interpreter that
makes a choice about what to do first.  This is really a CPS
conversion.

So what I need is a state-continuation monad.

This is all so familiar..  Maybe there is something to learn about
basics here.


So this is let insertion.  The state is necessary for name generation.
The CPS is necessary to provide sequencing (in a functional way).

Q: Is the necessesity of using CPS a consequence of the necessity of
   threaded state?

In other words, is there a way to implement this that is just
stateful?  It seems that a stateful approach should work too.  Let's
try both.

It seems to be the case that name generation can be kept local:
shadowing is supported in the comp-vm compiler.

It seems the trouble is with trying to express the conversion of a
function application, where a recursive data structure is built (a
collecting of enclosing let forms) which has a deep that needs to be
filled in with data computed in each recursion step.  It's possible to
fake this using iteration though.

( confazzled ramblings ... ) 


The basic idea is that, NO, of course you don't need CPS or
continuation-state passing style in the compiler as long as you're
allowed to use Scheme's state+strictness.


Entry: serial vs parallel `let'
Date: Sun May  1 15:01:04 EDT 2011

I wonder if it's actually a good idea to have no parallel `let' form
at all.  I did this to make the VM's continuations simpler
(essentially only 1 since `begin' is the same as single-clause `let'
and a dummy var).

It seems this needs a separate register:
  - current environment
  - environment under construction

So which is simpler?

Moving towards a bytecode VM it seems that the most important
simplification is to have only a single continuation type, such that
it doesn't need tagging and can be a machine code jump target.

It seems to me that a linear environment is probably simpler to later
unify with a stack language.

Entry: One continuation type
Date: Sun May  1 15:05:24 EDT 2011

It seems te best optimization to make is to remove dependency on
multiple continuation types, which means that `begin' is implemented
in terms of the `let' primitive, but followed by a pop which
explicitly ignores the retval.

DROP + LET : which order? Drop after let + make sure the env is passed
correctly.

So, now that begin is eliminated, the continuation can be simplified.
I don't really need a K_HALT probably.  Next: eliminate K_HALT.  It
can be implemented as a return frame with a halt opcode.

Done.  That wasn't so hard!


Entry: From structured to linear code
Date: Sun May  1 19:26:47 EDT 2011

Now, with only one continuation type we have a CALL / RETURN structure
that saves/restores closures (code pointer + environment).  If the VM
only uses POP to operate on the code, that operation can be made
abstract and the code representation doesn't actually matter, i.e. it
can be CDR-coded.


/* Read a number from the code stream. */
#define NREADC  NPOP(c) 

/* Read a structured data item from the stream: try to avoid this as
   it hinders abstraction.  Replace with other typed reads.  */
#define READC   POP(c)    // read code data structure


It seems that READC is only used to create closures.  What is really
necessary is a way to represent code addresses in code, and abstract
the representation of code addresses in the continuation stack.

Turning the current cell-based VM code into a proper list seems to be
the way to go.  One at a time.


Still, this doesn't seem to lead to very flat code, except for calls.
I'm looking at set!.  Hmm make this more specific...


Entry: Weird bug
Date: Tue May  3 09:28:25 EDT 2011

Changing OP_REF : cons->list triggered a bug in OP_APP: it needs
varrefs directly, not opcodes.


Entry: Towards abstract code
Date: Tue May  3 10:13:10 EDT 2011

I've got all code ops abstracted:

/* Read a number from the code stream. */
#define NEXT_NUM  NPOP(c) 

/* Read a structured data item from the stream: try to avoid this as
   it hinders abstraction.  Replace with other typed reads.  */
#define NEXT_ADDR   POP(c)         // read abstract code address
#define NEXT_DATA   POP(c)         // read constant data
#define NEXT_VOID   (POP(c)->atom) // read full machine address


The next step is to make representation of code addresses abstract,
i.e. in the continuation stack.

Another point of attention is whether the code itself will be GC-able.
For embedded use (when `eval' is not defined on the target) this is
probably not the case.  Turning that upside down: of course it is not,
otherwise we would _have to_ use CELL representation!


Entry: Simplified op_prim
Date: Tue May  3 10:20:09 EDT 2011

Actually, OP_PRIM does not need to call return.  It can just continue
reducing the next code atom, since it has no need of touching the
environment during eval.

The op_prim that's there now is not a primitive in the ordinary sense,
but a VM extension.  To make the above remark usable, a real prim op
needs to perform argument reference too, but can then skip closure
push/pop.


Entry: Primitives
Date: Wed May  4 15:26:04 EDT 2011

Before getting anywhere, there needs to be a way to allow for standard
C functions to be executed.  The following decisions need to be made:

  - Pass cells/atoms as C arguments, or use a list?

  - Handle type checking for the most basic operations: probably best
    to separate type checking from implementation, to be able to move
    some of this responsability to the compiler.


Entry: Next
Date: Wed May  4 23:20:39 EDT 2011

So..  What's the next interesting thing to do?  To make this into a
full-fledged language on top of the leaf_ code is maybe not something
I'm looking for right now.  Unless it can be automated.

The next thing would be to actually implement a different code format.
That would make it useful on the ARM, to write "control-oriented"
code, favouring control structures over memory use.


Hard to predict evolution of not-directly-purposed software, but it
looks like this is for later, or for sporadic tinkering.


Entry: Abstract code addresses
Date: Fri May  6 14:27:14 EDT 2011

Can be abstracted in ATOM() constructors.  Need to change vm.c to
support these for: op_call/k_return and op_close/op_app.

I'm a bit puzzled why the nb arguments for app actually shows up at
the app side, and not at the code side.

Hmm, not so surprising really: it comes directly from the app syntax:
caller does not know arity of callee, so it's specified at the call
site.  Still, something isn't quite right in the way this is encoded.

Next: cleanup the nb_arg representation for op_app/op_close.

Q: why doesn't a closure for the k stack have a nargs field, while the
   op_app/op_close does?


Entry: Memory allocation and tagging rehash
Date: Fri May 20 16:11:35 CEST 2011

I'm using 32-bit cells with 2 tag bits.  Tags are stored in a separate
RAM vector.  A pair needs two slots which leaves a 16-bit cell address
space.  This is 256kbytes.  That seems fine for the target uCs, which
have limited RAM and virtually unlimited Flash in comparison.

The interesting attribute here is that the pointers are clean: no tag
bit pollution, which means the non-managed opaque object management
can be separated completely from the GC.


Entry: Generated interpreter
Date: Fri May 20 18:26:31 CEST 2011

Now it would be interesting to write a generator for the interpreter
to make it possible to create a host of different interpreters that
could be optimized for a specific application.

How to go about that?  The first step is probably to make it as simple
as possible, so let's do that: eliminate tricks that cause
non-uniformity from the current VM implementation.


Entry: eCos
Date: Tue Jun  7 21:25:16 CEST 2011

What a mess..  What I want is a single command to build a project
based on eCos, but there is no simple way to do this.  There's
configs, paths, linker flags, install dirs, source dirs, argh..

What I really want is these 2 steps:

 1.  make-ecos-compiler:      arm.ecc -> just-compile-it-dammit
 2.  just-compile-it-dammit:  *.c     -> app.elf-arm


So, let's get rid of some things.

The darcs/ecos project: it can be merged with libprim, since the only
reason I need eCos outside of work is to play with interpreters..

This means: move the OpenOCD / GDB code to the ecos-meta project, and
let libprim incorporate the build system fixes.


Entry: VM structure
Date: Wed Jun  8 15:55:02 CEST 2011

Something I've been fretting about is whether this ultra-simple VM
structure isn't just ridiculously slow because it doesn't have a fast
reuse stack or a O(1) varref.  My worry is that maybe it can be made
10x faster by adding 10% complexity, and by not thinking about that I
might loose opportunity.

The thing there of course is measurement, and to make it clear what it
should do.  If it's supposed to go into a low-power device, then
efficiency does matter.

So there are many unknowns.  And fear of not doing it right in the
next iteration is definitely there.  I'm already getting a whiff of
making the VM and compiler part of the same coin..

Maybe that's the real direction to take.  Make the whole structure
abstract, so it has a stronger correctness guarantee, but allow for
specialization.

The basic idea: I'm done with hand-crafting brittle, low-level
structure.  I want to express meanig and mapping separately.


Entry: What do we have here?
Date: Thu Dec  1 13:57:05 EST 2011

- Basic collection of C objects with reference counting mechanism.

- 2-halves garbage collector, with Scheme and PF on top of it.

- embedded cell garbage collector, nothing on top.


Entry: Wrapper generators
Date: Mon Jan 23 13:23:57 EST 2012

Last week I integrated Lua into an embedded C project to facilitate
testing.  As part of this effort I used the Haskell library
Language.C[1] to gather binary protocol definitions from a bunch of .h
files, dumping out Lua tables that can be used for parsing binary data
to Lua structures.  This was such an immense step forward that I'm
thinking much of libprim could probably be written in this way.

[1] http://hackage.haskell.org/package/language-c


Entry: Working on PDP
Date: Thu Nov  1 20:42:25 EDT 2012

I'm working on v4l support in PDP and am trying to figure out how
exactly the leaf / media objects worked.

The idea is to extract a basic C file from pdp_v4l and include it
here.

EDIT: leaf_v4l is separated so there's only glue in the way of having
v4l->xv in libprim.


Entry: Next?
Date: Wed Nov  7 20:55:32 EST 2012

I'm not sure this is really going anywhere.  The problem in the leaf
library is still memory allocation: it relies on malloc() / free()

Is there a way to separate allocation and init?  The trouble here is
composite objects: it's hard to write objects in C that can not
reference external resources..  And anyways, there is no way around
file descriptors.

So, what is the basic structure?  Objects with open/close, new/delete,
...  it doesn't look like there is a way around it, and it doesn't
look like a good idea to try to use something else than malloc/free:
thread-local storage can be used to override behavior there if it is
really necessary (i.e. dynamic parameters).

So time to stop fussing.

Besides, I wonder if the problem is already solved in Lua?  It has a
nice memory model: don't share pointers with C ;)

I'm thinking that maybe this "keep stuff on the stack" approach is not
such a bad idea.

Anyway, most things I'm interested in don't actually need much
complicated memory management.  Maybe it's time to restate the goals:

1. Create an environment to write simple C objects for multimedia
   processing, that are easily accessed in a variety of scripting
   languages, but that also work stand-alone.

2. Provide a set of base objects to allow some fooling around with
   language VMs.


Those are two goals really.

Lua changed everything for me: a very simple language that integrates
well with C.  No need to re-invent the wheel.

So, some questions:
- What with the serialization requirement on leaf objects?


Entry: Integrating media stuff in PDP
Date: Fri Nov 23 12:48:50 EST 2012

I'm dubbing it "zl" in PDP, containging: xv, v4l, ...

It doesn't seem to be integrated in libprim at all, so I wonder why is
it here?  The only reference I find is in the 'install' target in
project.config.mk, but there is no media/module.mk

Maybe it got lost somewhere..

I found some old code that has a copy of the project which has a
Makefile included, but not a module.mk

It looks like the idea here was to keep this a separate module
compiled against an installed libprim, probably due to licensing
issues with ffmpeg.


Entry: How this works
Date: Fri Nov 23 13:14:56 EST 2012

Frankly, I forgot.

A version of libprim that was embedded in another project that
actually used this contained media/Makefile.  It seemed that the point
was to create a separate .a file not part of the libprim project to
keep the GPL code separate (due to GSL and FFMPEG).

Not sure what to do with it.  Until there is a working example of SC
or PF running the code in Media, I don't see the point of making this
work.

The other project also had a couple of .scm files:
 sc/media.scm
 sc/ffmpeg.scm

but it doesn't look like there was any XV or V4L stuff.

There is also sc/media.c which has all the wrapper code.

It looks like this was only taken as far as was necssary to solve a
very specific problem, then abandoned, and never integrated with main
libprim.

Another thing here is that the build system is very confusing with all
the autogen gong on.  I'm a bit lost..

I need an entry point and a list of what to ignore..


Entry: Focus
Date: Fri Nov 23 13:39:53 EST 2012

Current goals:
- Make PDP work with base objects compatible with libprim
- Add a wrapper generator for: PF, SC, Lua, Haskell, Racket, Pd, ...

The main problem is the wrapper generation.  It seems best to use an
object approach here, since that is what the C code abstracts: some
opaque state and some methods.

MODEL: semantics + syntax

Semantics: classes or not?
Model: something that is preprocessor-only?

It seems best to not re-invent stuff here.  SWIG should be able to do
what I want.  Probably best to get to now it first.

It's probably best to grep through the Pd externals tree to see which
project already uses SWIG.


Entry: SWIG
Date: Fri Nov 23 13:59:03 EST 2012

The thing to go for is this[1]: "the parse tree can be exported as XML
and Lisp s-expressions."

So, let's do that.  Take the zl lib and export it as s-expressions.

Step 1: generate a Lua wrapper for zl

swig -lua v4l.h 

Step 2: s-expression

swig -sexp v4l.h

Hmm.. that seems all a bit complex.  What does it look like on the Lua
side I wonder?


Hmm... looks like today is not the day.  A better way to go is
probably first to make this work for a toy problem without all the
linker madness.

Alternatively, it might be a lot easier to write "wrappable" code in
the first place.  Many features of swig I really don't need, such as
structure member representation.  I'm happy with just functions and
size of data objects to allow for caller-side allocation.

There has to be a simpler way to get to this..

[1] http://www.swig.org/compare.html
[2] http://www.swig.org/tutorial.html


Entry: Wrappers: The point
Date: Fri Nov 23 23:37:47 EST 2012

- classes

Generic functions are a better abstraction mechanism, but are hard to
integrate into a wide set of languages.  Find out how exactly this is
true/false, and if true, how to standardize on classes
(single-dispatch + factory).

- abstract vs. concrete (memory-only) state

distinguish between concrete, memory-only objects, and references to
external state (e.g. file descriptors).  this is actually a very
important distinction when thinking about lifetime of objects and of
serialization.  how to make it proper?

( i wrote something about re-initializable state before: state that's
essentially just cache vs. state that is essential to the object's
identity. )


Entry: A practical goal
Date: Fri Nov 30 14:32:57 EST 2012

I'd like to have an exploration platform for uC ARM cores.  Something
that is simple.  Something that can be used to quickly test driver code.


Entry: Added AT91SAM7 project
Date: Tue Dec  4 11:42:54 EST 2012

See at91sam7/at91sam7.txt

This subproject aims to do the following:
- Minimalistic processor startup (no libc)
- Evaluation of open source libc (i.e. newlib)
- Running libprim core on top of a libc
- Running lua + zl primitives on top of a libc


Entry: Change of focus
Date: Tue Dec 25 10:22:00 EST 2012

Need to build an engine for an OpenGL GUI.  Thinking about using CELL
and LEAF, but leave some room to build it in EX too.

* Layout engine: translate a heriarchical GUI description to an
  on-screen layout.

* Event dispatcher: focus on the hard problem: editable graphs.


Roadmap:
1. OpenGL window context with mouse/keyboard events.
2. Layout engine for knobs, sliders, LCDs and grouping boxes.


Entry: CELL vs. EX
Date: Tue Dec 25 10:36:21 EST 2012

CELL is probably the better approach because it is simpler.  It is
also memory efficient so might be a good idea for embedded systems.
However, EX is working, and has support for variable-size vectors.

How to abstract substrate?

One thing I do not like about EX is the build system.  It's dependency
on mzscheme is quite annoying.


Entry: Build system.. how does it work?
Date: Tue Dec 25 10:38:27 EST 2012

Seems a bit complicated.  Maybe it's time to clean it up a bit.

It doesn't seem that dynamic library support is going to be necessary.
Targets are deeply embedded systems or isolated so/DLL.  Dynamic
libraries complicate the build/install process significantly.

Would it be easy to switch the build to static, single .a approach?

In first iteration, collect all the .c files.

-> Making a simpler one that's only static.

However, this gives a problem with object_to_* calls, which are
different for SC and PF.

This used to be part of the specs, i.e. for the dynamic library.  How
to work around this?


/* Scheme specific object wrappers.  Note that these functions need to
   show up at link time, as they are part of the binary interface. */
#define DEF_AREF_TYPE(name) \
    name *object_to_##name(object ob) { \
        return (name*)object_aref_struct(ob,name##_type()); }


/* RC types are const types wrapped in a RC struct. */
#define DEF_RC_TYPE(name)                                              \
    name *object_to_##name(object ob) {                                \
        return (name*)object_rc_struct(ob,name##_type()); }


The thing is that these are referred in EX.


Easiest seems to define a object_rep_struct.
This is a nasty API bug that isn't easy to fix.

Fix it?

Added extra EX argument to all object_to_ methods to allow for
different wrapping in PF and SC.  Might not be so optimal, but better
that it works properly and is consistent.


Entry: CELL
Date: Tue Dec 25 20:12:05 EST 2012

Restored cell to working order on 64bit.


Entry: Graph nudging GUI
Date: Tue Dec 25 20:12:24 EST 2012

Took a bit longer than expected, but build system is in place for
building some layout engine for guis.

The main question is still: how to solve the "graph nudging" problem?

To do the gui, it's maybe best to work in two phases:

- Build vector layout engine for knobs, buttons, sliders and LEDs.

- Solve the graph nudging problem.
 

Entry: Imported zl from pdp
Date: Tue Dec 25 22:24:47 EST 2012

Looks like it will be possible to use the same code base with a little
tweaking here and there.


Entry: Removing old makefiles *.mk
Date: Wed Dec 26 12:18:37 EST 2012

I'm moving towards static lib only, and manual configuration.  It
seems this project is not intended as a general-purpose library, more
like a "include this in your source tree" project, like Lua.

Look in the history around this time for old build scripts.


Entry: Makefile bug
Date: Wed Dec 26 13:13:42 EST 2012

Got one:


tom@zoo:~/libprim/build$ make -f ../main.mk -j -d >/tmp/make.1
gcc: error: posix_sc/main.o: No such file or directory
gcc: error: pf/pf.o: No such file or directorygcc: error: cell/test.o: No such file or directory
make: *** [/home/tom/pub/darcs/libprim/build/posix_sc/main.elf] Error 1
make: *** Waiting for unfinished jobs....
make: *** [/home/tom/pub/darcs/libprim/build/pf/pf.elf] Error 1
make: *** [/home/tom/pub/darcs/libprim/build/cell/test.elf] Error 1
gcc: error: zl/test.o: No such file or directory
make: *** [/home/tom/pub/darcs/libprim/build/zl/test.elf] Error 1
gcc: error: cell/test_gc.o: No such file or directory
make: *** [/home/tom/pub/darcs/libprim/build/cell/test_gc.elf] Error 1
tom@zoo:~/libprim/build$ cp /tmp/make.1 ~/make.1
tom@zoo:~/libprim/build$ 

I'm wondering if this is not just a specification issue.  Can pattern
rules have multiple dependencies?

[1] http://lists.gnu.org/archive/html/bug-make/2012-05/msg00031.html


Entry: Xmonad floating windows
Date: Wed Dec 26 20:05:24 EST 2012


WM_NORMAL_HINTS


http://tronche.com/gui/x/xlib/ICC/client-to-window-manager/XSetWMNormalHints.html

Syntax

void XSetWMNormalHints(display, w, hints)
      Display *display;
      Window w;
      XSizeHints *hints;


/* Size hints mask bits */

#define USPosition	(1L << 0)	/* user specified x, y */
#define USSize		(1L << 1)	/* user specified width, height */
#define PPosition	(1L << 2)	/* program specified position */
#define PSize		(1L << 3)	/* program specified size */
#define PMinSize	(1L << 4)	/* program specified minimum size */
#define PMaxSize	(1L << 5)	/* program specified maximum size */
#define PResizeInc	(1L << 6)	/* program specified resize increments */
#define PAspect		(1L << 7)	/* program specified min and max aspect ratios */
#define PBaseSize	(1L << 8)
#define PWinGravity	(1L << 9)
#define PAllHints	(PPosition|PSize|PMinSize|PMaxSize|PResizeInc|PAspect)

/* Values */

typedef struct {
	long flags;		/* marks which fields in this structure are defined */
	int x, y;		/* Obsolete */
	int width, height;	/* Obsolete */
	int min_width, min_height;
	int max_width, max_height;
	int width_inc, height_inc;
	struct {
	       int x;		/* numerator */
	       int y;		/* denominator */
	} min_aspect, max_aspect;
	int base_width, base_height;
	int win_gravity;
	/* this structure may be extended in the future */
} XSizeHints;


Entry: Sync to VBLANK
Date: Wed Dec 26 22:50:38 EST 2012

Why is this so difficult?
Apparently it's possible in DRI2 with a recent kernel (post from 2010).
Or is this only for nvidia?

So... if glxgears can do it, so can I..
Anyway, can look in the Debian Mesa sources later
The package is mesa-utils, or mesa source package.

Some code here[2].

  589 static void
  590 query_vsync(Display *dpy, GLXDrawable drawable)
  591 {
  592    int interval = 0;
  593 
  594 #if defined(GLX_EXT_swap_control)
  595    if (is_glx_extension_supported(dpy, "GLX_EXT_swap_control")) {
  596        unsigned int tmp = -1;
  597        glXQueryDrawable(dpy, drawable, GLX_SWAP_INTERVAL_EXT, &tmp);
  598        interval = tmp;
  599    } else
  600 #endif
  601    if (is_glx_extension_supported(dpy, "GLX_MESA_swap_control")) {
  602       PFNGLXGETSWAPINTERVALMESAPROC pglXGetSwapIntervalMESA =
  603           (PFNGLXGETSWAPINTERVALMESAPROC)
  604           glXGetProcAddressARB((const GLubyte *) "glXGetSwapIntervalMESA");
  605 
  606       interval = (*pglXGetSwapIntervalMESA)();
  607    } else if (is_glx_extension_supported(dpy, "GLX_SGI_swap_control")) {
  608       /* The default swap interval with this extension is 1.  Assume that it
  609        * is set to the default.
  610        *
  611        * Many Mesa-based drivers default to 0, but all of these drivers also
  612        * export GLX_MESA_swap_control.  In that case, this branch will never
  613        * be taken, and the correct result should be reported.
  614        */
  615       interval = 1;
  616    }
  617 
  618 
  619    if (interval > 0) {
  620       printf("Running synchronized to the vertical refresh.  The framerate should be\n");
  621       if (interval == 1) {
  622          printf("approximately the same as the monitor refresh rate.\n");
  623       } else if (interval > 1) {
  624          printf("approximately 1/%d the monitor refresh rate.\n",
  625                 interval);
  626       }
  627    }
  628 }

So I found some code here[3], which I adapted and put into glx.c
However, it seems that it's already running at vsync, at leas by
measurements.  Weird, it does take a lot of CPU.


[1] http://cgit.freedesktop.org/nouveau/xf86-video-nouveau/commit/?id=70f0d2c886ceaa965ca4864788f4dd8e8f757a92
[2] http://fossies.org/dox/MesaDemos-7.8.2/glxgears_8c_source.html
[3] http://forum.openframeworks.cc/index.php?topic=561.0


Entry: c++ build rules
Date: Sun Jan 13 14:26:01 EST 2013

The makefile doesn't support build rules for c++ files used in final
targets.  For library it should work to add some extra rules; let's
try that first.

Hmm.. this requires some makefile restructuring.


Entry: Weird crt0.S problem:
Date: Fri Aug 30 18:18:18 EDT 2013

FIX in crt0.S
-       stmfd   sp!, {r0, rl}
+       stmfd   sp!, {r0, lr}

I wonder, why did it accept rl in the first place?
How did this end up in the code, since it worked before?


Entry: AT91SAM7S256.h not annotated with __ASSEMBLY__
Date: Fri Aug 30 18:28:23 EDT 2013

Looks like I only ran this on AT91SAM7X256.h ?


Entry: simple pdp / forth
Date: Fri Dec 13 14:31:59 EST 2013

So, basic idea is to revive some of the "patching logic" in pdp to
embed in a mobile app.  Basic ideas:

- The "linear" part in Forth makes it easy to use RC-based memory
  management and keep some form of locality, though not enough.

- Core DSP doesn't need memory management.  Can be implemented in any
  language.

- Speed: core problem is still memory localtiy.


So the core problem is not solved: how to compact code such that
locality is preserved.  This is a loop transformation problem.  To
tackle this, an intermediate structure is necessary.  I.e. an
algebra..


Is there a way to have a canoncial representation?

I'm guessing that IRL this is the "use a lot of memory" approach.


Entry: containers
Date: Sat Aug  9 15:02:56 EDT 2014

It seems that replacing containers with iterators is an interesting
way to avoid language bridging.  Running into a problem with libsigrok
using glib containers and having to find a way to translate them to
python containers.

Why not just have an iterator object that produces native C types that
are easily handled using standard wrappers?


Entry: Merging libprim from eyes
Date: Sat Aug 16 11:31:39 EDT 2014

[master] tom@zoo:~/libprim/zl$ git pull ~/git/zl
remote: Counting objects: 67, done.        
remote: Compressing objects: 100% (52/52), done.        
remote: Total 54 (delta 34), reused 0 (delta 0)        
Unpacking objects: 100% (54/54), done.
From /home/tom/git/zl
 * branch            HEAD       -> FETCH_HEAD
Auto-merging zl.mk
CONFLICT (content): Merge conflict in zl.mk
Auto-merging xwindow.h
Auto-merging xwindow.c
CONFLICT (content): Merge conflict in xwindow.c
Auto-merging xwindow.api
CONFLICT (content): Merge conflict in xwindow.api
CONFLICT (modify/delete): test deleted in f9e8d1fec6b9f269d2a1ea79613bff7b312e4b18 and modified in HEAD. Version HEAD of test left in tree.
Auto-merging glx.c
Auto-merging 3Dcontext_glx.c
CONFLICT (add/add): Merge conflict in 3Dcontext_glx.c
Auto-merging 3Dcontext.h
CONFLICT (add/add): Merge conflict in 3Dcontext.h
Automatic merge failed; fix conflicts and then commit the result.


Entry: swig
Date: Sun Aug 17 01:08:26 EDT 2014

tom@tx:~/libprim/build$ make -f ../main.mk /home/tom/pub/darcs/libprim/build/swig/wrap_python.c


Entry: Functional interface
Date: Wed Aug 20 14:49:25 EDT 2014

From the C side, it's easier to let the client do the memory
management, i.e. do things like this:

    zl_v4l_get_format(&v4l, &fourcc, &w, &h);
    zl_v4l_next(&v4l, &video_data);

However, making swig wrappers this does complicate things.

So I guess that's the whole point of libprim, but the impedance match
isn't there yet.


Entry: Future of libprim
Date: Sun Sep  7 09:26:31 CEST 2014

Am I still interested in building scripting languages?
Does this library still solve a problem?

The biggest problem I've encountered in recent practical use is
building it without all dependencies installed.  So fixing
cross-complation is probably the way to go.


Entry: Inspired by UBF
Date: Sat Jul 18 11:09:12 EDT 2015

Making things work together:

- libprim: a new core for PF
- rust: write ad-hoc glue for PF
- rai: write signal processing primitives for PF

Set this all up so it *BUILDS* !

[1] entry://../staapl-blog/20150718-105036


Entry: Linear engine
Date: Sat Oct 31 11:16:39 EDT 2015

Boiling down the basic ideas behind PDP and PF:
- most audio/video processing executes as data flow graphs (DFGs)
- RPN is a good way to encode DFGs, exposing usage patterns and locality
- exposed dup and drop make refcount management simple (lazy copy).

There isn't much else to it..


To integrate libprim, zl, pdp, pf, do the following:

- "new" pdp is a shell to data flow graphs running inside of pf, so it
  can just be called "pf"

- pf is an intermediate language engine to represent data flow graphs
  as linear programs.  it can be used stand-alone, or as a pd
  component.

- libprim implements pf's object system

- zl implements pf's media bindings

- erlang bindings for sensor network


Next step?  Get PF up.
Goal: get video in, do some minimal processing, and send it out over the network.


Entry: PF revisited
Date: Sat Oct 31 12:02:11 EDT 2015

PF uses both nonlinear and linear memory.
Code is stored in nonlinear memory.
Data should reause linear memory.

The missing part is a bridge between:
- MIMO dataflow primitives
- buffer allocation and reuse

Assuming the multiple-dispatch is handled elsewhere, the main problem
is per type buffer allocation and reuse.

In an application, there are ususually not a lot of "packet" types
involved, so it might be good to make a registry.

The magic happens in _px_unlink()

Currently, it supports rc objects.  What is needed, is to extend rc,
or make a similar class that performs malloc-like reuse.

That's it.


Entry: Data buffers
Date: Sat Oct 31 12:58:41 EDT 2015

Incongruency:
_leaf_object has _rc field
rc has rc field

what happened?

seems that pf got out of sync.
let's remove rc.h


Entry: bug
Date: Sat Oct 31 16:22:27 EDT 2015

old:

tom@pt:~/libprim/build$ pf/pf.elf
<,['{{i . {compile . {run}}} . {{compose . {{>lcode} . {dip . {>lcode . {lcompose}}}}} . {{pa . {{abstract} . {dip . {>lcode . {lcompose}}}}} . {{see . {find \
. {post}}} . {{unrun! . {{quote . {()}} . {lunrun-and-compose!}}} . {{prompt . {reset}} . {{compile . {>nl . {nl-compile}}} . {{+ . {add}} . {{list->stream . \
{{{{cons} . {shift}} . {each . {{quote . {()}}}}} . {reset}}}}}}}}}}}} definitions '"libprim/PF" display cr repl]>
<,['{{i . {compile . {run}}} . {{compose . {{>lcode} . {dip . {>lcode . {lcompose}}}}} . {{pa . {{abstract} . {dip . {>lcode . {lcompose}}}}} . {{see . {find \
. {post}}} . {{unrun! . {{quote . {()}} . {lunrun-and-compose!}}} . {{prompt . {reset}} . {{compile . {>nl . {nl-compile}}} . {{+ . {add}} . {{list->stream . \
{{{{cons} . {shift}} . {each . {{quote . {()}}}}} . {reset}}}}}}}}}}}}] . <,[definitions '"libprim/PF" display cr repl]>>
<,[definitions '"libprim/PF" display cr repl]>
<,[>nl nl-definitions] . <,['"libprim/PF" display cr repl]>>
<,[>nl] . <,[nl-definitions] . <,['"libprim/PF" display cr repl]>>>
<,[nl-definitions] . <,['"libprim/PF" display cr repl]>>
<,['"libprim/PF" display cr repl]>
<,['"libprim/PF"] . <,[display cr repl]>>
<,[display cr repl]>
<,[display] . <,[cr repl]>>
libprim/PF<,[cr repl]>
<,[cr] . <,[rep repl]>>

<,[rep repl]>
<,['"> " display read interpret] . <,[rep repl]>>
<,['"> "] . <,[display read interpret] . <,[rep repl]>>>
<,[display read interpret] . <,[rep repl]>>
<,[display] . <,[read interpret] . <,[rep repl]>>>
> <,[read interpret] . <,[rep repl]>>
<,[read] . <,[interpret] . <,[rep repl]>>>


new:

tom@tp:~/libprim/build/pf$ ./pf.elf 
<,[['quote [['i 'compile 'run] ['compose ['>lcode] 'dip '>lcode 'lcompose] ['pa ['abstract] 'dip '>lcode 'lcompose] ['see 'find 'post] ['unrun! ['quote [nop]] 'lunrun-and-compose!] ['prompt 'reset] ['compile '>nl 'nl-compile] ['+ 'add] ['list->stream [[['cons] 'shift] 'each ['quote [nop]]] 'reset]]] 'definitions '"libprim/PF" 'display 'cr 'repl]>
<,[['quote [['i 'compile 'run] ['compose ['>lcode] 'dip '>lcode 'lcompose] ['pa ['abstract] 'dip '>lcode 'lcompose] ['see 'find 'post] ['unrun! ['quote [nop]] 'lunrun-and-compose!] ['prompt 'reset] ['compile '>nl 'nl-compile] ['+ 'add] ['list->stream [[['cons] 'shift] 'each ['quote [nop]]] 'reset]]]] . <,['definitions '"libprim/PF" 'display 'cr 'repl]>>
<,['definitions '"libprim/PF" 'display 'cr 'repl]>
<,['definitions] . <,['"libprim/PF" 'display 'cr 'repl]>>
<,['"libprim/PF" 'display 'cr 'repl]>
<,['"libprim/PF"] . <,['display 'cr 'repl]>>
<,['display 'cr 'repl]>
<,['display] . <,['cr 'repl]>>
<,['cr 'repl]>
<,['cr] . <,['repl]>>
<,['repl]>
()
!lnext
()
!lnext
tom@tp:~/libprim/build/pf$ 


new:
px_compile_program: ((#<lin:#fin quote> . (((#<lin:#fin i> . 

old:
px_compile_program: ((quote . (((i . (compile . (run)))


So there's something wrong with the lin/fin thing...
I don't understand it at this point, so let's see where it breaks.

Maybe these "lin" boxes are no longer necessary? Look into this.

The problem causing the current bug is that symbols are now handled as
RC boxes.


Entry: lin types
Date: Sat Oct 31 17:05:31 EDT 2015

What are lin wrappers for?
I no longer understand the point of this.

It seems that moving between graph and lin memory is very straightforward:
- convert cell type
- adjust ref counts?

Note that the graph memory only needs one RC increment.  Decrement
happens on GC.

Tried to remove, but there are all kinds of assumptions.

Needs a lot of attention to understand the basic point, and find a new
invariant.  With RC counts inside the types it shouldn't be so hard to
copy between GRAPH and LIN memory.


Entry: Prim types: free
Date: Sat Oct 31 17:52:00 EDT 2015

Main problem at this point is to properly encode values that are not
supposed to be managed with reference counting:
- symbols
- primitives

Currently, prim class is set to 0xF001, but the free method is called.

Solved by adding a prim class with a nop for free.


Entry: The point of LIN
Date: Sat Oct 31 19:04:50 EDT 2015

* The linear machine's data tree is visible by the main GC, meaning
  that it can contain references to nonlinear data.  This is useful.

* The linear machine manages its own cons cells in a free list.  These
  are regular GC objects, but are tagged differently to manage the
  _contents_ differently.

* Leaf objects that are moved around in the linear tree use refcounted
  copy.  The consequence of this, is that _if_ the nonlinear tree does
  not have a reference to data that is contained in the linear tree,
  the free operation is instantaneous.


There needs to be a way to distinguish between objects that are
linear, and objects that are nonlinear.  By removing the RC wrapper,
this has become impossible.

Which means that all objects need to be treated as linear, which
completely removes the need for implementing the linear cells as part
of EX.

The latter might be the most important part: pf needs its own cons
cells, separate from vectors.  The proper way to structure this is:

PF data: LEAF + LCONS
   code: EX

And leave it such that PF data cannot reference EX data.

The latter is problematic for code pointers.

So... pf data should be EX.


Entry: A linear core
Date: Sat Oct 31 19:15:19 EDT 2015

The basic idea is to embed a linear core inside a nonlinear one.

- For variables "seen" by the nonlinear core, the linear RC management
  has no effect.

- For priviate variables, the RC management accomplishes early release
  and locality.


The main question is: which types are RC-managed?


RC++ on copy to graph has another problem: an object might already be
in graph memory, so it's possible to multi-increment RCs.


It seems this idea is quite flawed..

It's probably best to completely separate the memory models, and copy
data on the boundaries.


The previous implementation for RCs was correct: one wrapper per copy,
so multiple copies have multiple wrappers, and each will decrement
refcount on GC.


Entry: What did we learn?
Date: Sat Oct 31 20:39:24 EDT 2015

Because code is graph memory, and linear memory can reference code
(dispatch tables), there is the problem that data memory can't be
isolated from graph memory.  I.e. needs to be embedded in it.


Entry: Copying
Date: Sat Oct 31 21:21:40 EDT 2015

LIN -> GRAPH:   wrap in LIN object, sets RC-- on GC
GRAPH -> LIN:   RC++

It's ok to unpack the lin.


Entry: Renamed LIN -> UNIQ
Date: Sat Oct 31 22:11:33 EDT 2015

Since that's its only function now.

Still there are some problems.  GC will not touch leaf objects; they
are treated as constants. The UNIQ wrapper is actually necessary for
data to be freed by GC at all.


Entry: Next
Date: Sun Nov  1 08:57:35 EST 2015

Use it for a bit to see if there are any obvious bugs, then later it
should become clear how to implement the ref/const mangement more
transparantly.


Possible next steps:

  - integrate zl with pf (start with video capture + network comm).
  
  - integrate pf with Erlang

  - integrate cauterize
    https://github.com/cauterize-tools/cauterize


Just from gut feeling, what is really missing is a way to cut through
all the red tape.  One reason I'm thinking about cauterize..

What is cessary to get access to zl object methods in pf?


Entry: Next, be serious
Date: Tue Nov  3 07:22:08 EST 2015

- get v4l to work in pf through automatic binding

- add some image processing code


Entry: API coupling
Date: Tue Nov  3 07:30:24 EST 2015

The *.api files are not a good idea.

It is better to keep those in C, possibly stylized C, and parse them
in racket, just as is done with the other.

Next goal: zl api distiller.

Use a packrat parser in racket:

http://bford.info/packrat/
https://github.com/kazzmir/Pegs
http://www.lshift.net/blog/2005/08/11/extensible-parsing-systems/


But it's probably best to avoid parsing, and specify the api in s-expressions.
Maybe in a way that is compatible with cauterize?
At least, once it is available in s-exp it is easy to translate to something else.


Entry: Binary tree rewriter
Date: Sun Jan 24 14:13:43 EST 2016

The idea is to factor out the implementation of libprim, which has
opaque primitive objects, and a tree memory consisting of:

- data stack
- return stack
- freelist

Operations involve motion between these 3 stacks, or between the
freelist and the tree on top of the data stack.

Internally, there will be aliasing, but at the pf level this can be
completely hidden behind primitive operations.


Entry: linear language
Date: Sun Apr  3 15:29:24 EDT 2016

I've been playing a bit with Erlang lately, trying out the Erlang on
Linux, small state machines on microcontrollers approach.  What I miss
is a way to run something more erlang-like on the uC.  E.g. spawnable
tasks, high level data structures.

libprim/PF might be a good candidate for this.  that is, the linear
machine.


Entry: Erlang marriage
Date: Sat Aug 27 09:30:59 EDT 2016

Marrying this to Erlang would be interesting.  Some thoughts:

- Control / Process separation in Erlang is natural.  A PF instance
  could be run as a port process or as a C node.

- Adding the processing NIFs to Erlang is another straightforward
  extension.  Types would be lost, but could be represented as a
  {atom(), binary()} tuple.


Entry: Erlang cont
Date: Sat Jan  7 14:05:15 EST 2017

Basic design decision is NIF vs external process.  Leaning towards the
latter because of asynchronous events.

That requires a protocol.  Some options:
- raw PF
- BERT
- UBF

I find UBF interesting, so let's make a layer that is a direct
translation between UBF and PF.

To bootstrap the application, it is ok to send PF and e.g. receive
UBF.  PF doesn't support tuples, but tuples could be used to represent
binary leaf objects as tagged tuples.

One thing I do not like about PF/PDP is the representation of types as
symbols that flatten internal structure, such as image pixel format,
pixel layout and dimensions.  Is it worth doing this differently?

But UBF is now tagged unmaintained.  That doesn't look like a good
sign...