Mon May 26 20:33:30 CEST 2008

Scheme, Coma and Stack Machine Code

This post talks about how the concatenative macro language Coma is
integrated with Scheme code.  It explains the mechanisms of binding
through pattern matching and derefence through quasiquotation.

To set the stage, let's define a deeply embedded program as the
specialization of a general program in such a way that run-time
complexity (and so cost of supporting hardware) is significantly
reduced.  For large reduction in cost/complexity (or a large enough
production volume) significant fixed engineering cost can be spent on
specialization.  Deeply embedded programming is about spending time
and money on program specialization.

When the specialization techniques can be abstracted, manual program
specialization (writing a machine code program by hand) can be
automated using staging: a parametric model of the code is constructed
which, when given some parameters, reduces to a specialized model.

BINDING: stack machine code pattern matching

The binding form can be best introduced in the form it is most used,
to impelement code transformation rules, which are mostly based on
partial evaluation of concatenative code.  For example, the
concatenative code sequence

                             1 2 +

will is reduced to a single literal 3 in the target code.  This
mechanism is implemented by specifying the behaviour of the code
transformer "+" using code substitution patterns.  The specific
substitution pattern for the case above is:

                [qw 1] [qw 2]     ->     [qw 3]

Where "[qw 1]" is the target (virtual) stack machine code generated by
the code generator called "1".  This is an instance of the generic
substitution pattern:

                [qw a] [qw b]     ->     [qw (+ a b)]

Here the Scheme expression "(+ a b)", which adds the two variables "a"
and "b", is used to perform a computation that give the correct
semantics to this code transformation rule.  Any sequence of two
literal values represented by two "[qw <number>]" stack machine
instructions gets replaced by a single literal value containing the
sum of these two numbers.

Staapl contains a scheme macro called "patterns" which is the the
destructuring binding from for target stack machine code[2].  This can
be used to build subsitution rules.  The complete example for "+"
looks like:

(patterns (macro)
   (([qw a] [qw b] +) ([qw (+ a b)]))

This reads as follows: if the "+" code transformer operates on a stack
of two literal "qw" instructions, bind the values these "qw"
instructions contain to the variables "a" and "b" respectively, and
replace the two instructions by a single "qw" instruction that
contains the result of the scheme operation "+" applied to the
variables "a" and "b".

The general idea behind the substution rule is:

    If all the inputs for an operation are available during code
    generation time, then the operation can be performed, and only
    code needs to be generated that makes available this result at run

This can be generalized quite far: using this mechanism, the basic
concatenative macro language can be extended by explicit (virtual
machine code) generators, accessible from concatenative source code,
for any kind of Scheme -> assembly code metaprogramming.

QUASIQUOTATION: name space merging

To generate Coma code from within a Scheme context, i.e. as the right
hand side of a "patterns" rule, a quasiquotation mechanism is
provided.  The "unquote" operator has a similar semantics for the
"macro:" concatenative code constructor as for the "quasiquote" list

Consider these 3 forms:

  (macro: 1 +)          ;; 1.
  (macro: 1 ,operator)  ;; 2.
  (macro: ',operand +)  ;; 3.

1. This constructs a macro which is the composition of the macro "1"
   (which generates code to quote the number 1) and the macro "+"
   which is fetched from the '(macro) namespace.

2. Constructs a macro which is the composition of the macro "1" and a
   macro bound to the Scheme identifier "operator".

3. Constructs a macro which is the composition of a macro that
   generates code to quote the constant bound to the Scheme identifer
   "operand" and the macro "+" from the '(macro) namespace.

In short, the "unquote" operator allows the introduction of code
generators and quoted date from the Scheme namespace into
concatenative code, which uses a namespace different from that of
Scheme.  This allows the construction of code templates for Scheme ->
coma metaprogramming.


These two mechanisms (deconstructive binding and quasiquotation) allow
a simple impedance match between concatenative code an Scheme,
building on top of Scheme's lexical structure.


[1] This can also be combined with the second quasiquoting macro
    mechanism to yield more powerful constructs.

[2] The binding construct simulates pattern matching for algebraic
    data types, but limited to stacks: next to deconstruction it
    provides a single level of construction (the "qw" instructions on
    the right hand side are interpreted as constructors).  On top of
    that, the pattern matching is from the scheme/match PLT module.

[3] Instead of a stack of instructions, the RHS of a "patterns" rule
    can contain arbitrary scheme code if it is a list that starts with
    an identifier.  (It can always be made such by wrapping it in a
    "begin" form).  If the RHS is a list of lists, it is interpreted
    as a machine code sequence constructor.