Tue Sep 16 13:06:06 CEST 2008

Staapl's metaprogramming approach

Staapl isn't a conventional multi-stage system which concentrate on
``code as data''.  Staapl concentrates on _code transformers_ where
higher order generation is modeled with higher order functions
(functions producing code transformers) instead of explicit staging.


There are only two forms of ``code'' in Staapl:

    1.  Machine code (M), represented a list of primitive machine
        instructions, where the machine primitives are modeled as an
        algebraic data type[1].

    2.  Machine code transformers (M->M), modeled as functions that
        operate on _one end_ of a list of machine primitive
        instructions.  These machine code transformers thus behave as
        instructions of a stack machine.

( Assume for now that the target machine is a concrete 2-stack
  machine.  The partial evaluation mechanism explained below is
  suitable to include compile time mapping to register based machine
  architectures in a very straightforward way. )

Because machine code transformers can be combined using function
composition, they can be used to build a programming language (== a
set of primitives + composition mechanism).  This forms the language
of Concatenative Macros called Coma.  By enforcing M->M transformers
to only operate on one end of generated code, Coma behaves as a stack
language that can support a Forth language dialect.

An important observation is that this language doesn't merely support
macro _expansion_ as one would see in a macro assembler.  The
mechanism necessary to pass arguments to macros is _identical_ to the
one that represents machine code: the algebraic data type mentioned

Macro arguments and target code are placed on the same level by the
introduction of the (pseudo) machine instruction [qw <number>], short
for Quote Word.  This instruction causes a number (a machine data
word) to be loaded on the run-time parameter stack when the generated
program executes.  However, when the qw instruction appears as the
input of a transformer, it can be removed from the instruction stack
as long as it is possible to use the the argument it carries in a
compile time computation that doesn't change the meaning of the code.
This is called constant folding[2].

    An example to illustrate this.  The macro "1" pushes the machine
    code instruction [qw 1] to the current code stack.  (qw is short
    for Quote Word).  The run time effect of this code is to load that
    number onto the run time parameter stack.  The macro "2" does the
    same for [qw 2].  The composition of these two macros, the Coma
    program "1 2" will produce code on top of the code stack looking

           ... [qw 1] [qw 2]

    with the top of the stack on the right.  This code has the
    run-time effect of loading the two numbers on the run-time stack.
    With this code present on the code stack, the code transformer
    associated to the coma code "+",can perform a compile-time
    computation, and replace the top of the code stack with

           ... [qw 3]

    In other cases "+" will merely insert code to perform an addition
    at run-time, using data from the run-time parameter stack.

An essential observation is the possibility to generalize the
arguments of [qw <number>] to any kind of abstract data type that can
be represented in Scheme, but behaves as a ``constant'' in the sense
that it can somehow be eliminated compile time.  I.e. it can be a code
generator, or a data structure with a high level representation of
code to be generated.  The key point is that these objects can be
manipulated _from within Forth_ code as if they were present at run
time, hiding the compile-time specialization behind the scenes.  This
allows the addition of high-level domain-specific language extensions
written in Scheme to a project with low-level logic written in Forth,
keeping the local semantics simple.

Both the high and low level languages are thus extensible, and can be
used as the substrate for DSLs.  Scheme's macro system can be used to
construct "data-entry" front-ends for algorithms that compile a
specification down to a level where it can run on the abstract
machine, and interface with low-level code written in Forth (or any
other language based on the Coma language).  All identifiers that
occur in any part of the code are managed by PLT Scheme's declarative
module system, providing a prooven way to control complexity of large
systems through modularity.


In order to make this mechanism more flexible, the Coma _composition_
mechanism is exposed to Scheme.  Next to deconstruction and
construction of stack machine code, it is also possible to _construct_
concatenative macros from Scheme, and to _evaluate_ them.  This
mechanism is orthogonal to the machine code pattern matching

Because of the absence of lexical structure in the combinator based
Coma language, there are no hygiene issues when the composition
mechanism is extended with _quasiquotation_ mechanism that can be used
to construct code templates.

These operations correspond to the MetaML operations of Bracket (code
composition), Escape (unquote) and Run (macro evaluation) with
MetaML's code object is replaced with Staapl's code representation
based on stack machine code transformers.  MetaML's cross-stage
persistance is mostly handled by Scheme's lexical scoping rules, and
doesn't present a problem because code is opaque and represented by


An important remark is that the hardware abstraction layer for the
concrete machine is not the machine language, but a set of named Coma
transformers that needs to be implemented for each concrete machine.
This has the advantage that the machine model doesn't need a byte code
and a separate JIT.

Implementing a port uses exactly the same mechanism as the one used
for the more highlevel metaprogramming on top of the machine model.  A
port is a set of rewrite rules for a concrete instruction set using
the pattern matching and quasiquotation metaprogramming tools.  A port
can be kept as straightforward as possible, using only part of the
real machine's instruction set.  Optimizations can be added as needed,
and pseudo instructions can be added as way to factor code rewrite
rules.  (I.e. the PIC18 port uses factors the highly specialized
test+branch opcodes into separate test and branch pseudo opcodes.)

[1] http://en.wikipedia.org/wiki/Algebraic_data_types
[2] http://en.wikipedia.org/wiki/Constant_folding