Wed Jun 25 16:28:39 CEST 2008

History and Semantics of Coma

This text deals mostly with the implementation and how this approach
came into being.  The semantics of Coma are explained more concisely
in a different post:


An interesting idea in


is the re-introduction of stacks in the implementation of rewrite
rules.  Manfred talks about giving Joy a rewrite semantics, without
the use of stacks, but eventually comes back to stacks as a way to
efficiently implement these rewrite rules.  This seems to be due to
the fact that Joy's rewrite rules are local, and stacks support
operations with local effects.

A more formal treatment would definitely be interesting.  In Coma,
however, pragmatism rules, and rewriting is implemented as
deterministic pattern matching on target stack machine code.

  * Primitive semantics in Coma are provided by operations on target
    machine code programs.  Each identifier (word separated by spaces)
    that occurs in a source file represents a pure function that
    transforms a list of machine instructions.  Primitive functions
    are declared as pattern matching rules, which form an
    implementation of a set of rewrite rules.

  * New Coma functions can be obtained by means of ordinary function
    composition, which is syntactically represented by concatenation
    of function names, yielding a concatenative language.

       x == a b c

  * With identifiers primarily representing TRANSFORMERS, the DERIVED
    semantics of an identifier is obtained as the target program that
    results from applying the function represented by the identifier
    to the empty program.  (A sketch of these three points follows
    below.)

It is essential to separate the two levels of semantics: code
transformers, and the code being transformed, which is executed by a
concrete machine.  Obviously, these two are somehow linked!  For
example, on the PIC18 the transformer "+" has the derived semantics
of addition because it generates an ADDWF instruction, but it also
performs partial evaluation when possible.

To make a concatenative language useful, some form of locality of
effect has to be introduced(1).  Locality makes it possible to create
reusable components.  A straightforward way of introducing locality is
to make the space of values on which the functions operate take the
form of a space of stacks of values.

For Coma, the choice is easy to make:

  * Target programs are a concatenation of primitive machine
    instructions.

  * Rewriting happens locally: rewriting functions operate on
    (possibly infinite) target programs by replacing a finite number
    of instructions M by a finite number of instructions N at the
    right end of a target program, leaving the leftmost instructions
    untouched.

This makes the Macro language a purely functional concatenative stack
language: all its primitives operate on a stack of values, each value
representing a machine instruction.

That's nice and clean.  The language makes it possible to implement a
peephole optimizing compiler which, next to code generation (macro
expansion), also performs a form of partial evaluation: target code
rewriting that eliminates run time computation by reducing the number
of instructions needed to perform a certain computation.

To perform partial evaluation correctly, care needs to be taken to
align the semantics of computations performed at compile time with
those of computations that have to be postponed to run time.  For
example, the program [1 1 +] might be rewritten to [2], but [+] by
itself has no applicable rewrite rule, so in order to preserve the
proper derived semantics of '+' it needs to be rewritten as a program
that adds two numbers present on a run time stack and replaces them
with the result(2).
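
As an illustration, here is how such a '+' rule might look in the
Haskell sketch started above (again with an invented instruction set,
not the real PIC18 code generator):

    data Instr = Lit Int | Add  deriving (Show, Eq)
    type Macro = [Instr] -> [Instr]   -- target code, newest first

    plus :: Macro
    plus (Lit b : Lit a : rest) = Lit (a + b) : rest  -- fold: [1 1 +] ~> [2]
    plus rest                   = Add : rest          -- otherwise: run time add

    -- derived semantics of '+' alone:  plus [] == [Add]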

In short, starting from the locality of code transformers, and
consistency requirements coming from partial evaluation operations,
the target machine has to behave as a stack machine.

Now, let's make it a bit more useful. In order to get interesting
properties for parameterized metaprogramming, we relax the semantics a
bit, allowing for the existence of code transformers without
associated derived semantics.  Individual code transformers,
identifiable by names in a source program, might _lack_ derived
semantics because:

  * their domain does not contain the empty program.

     They do not have a purely code-generating form but need non-empty
     program input to rewrite to something else.

  * their codomain is not contained in the set of concatenative
    programs generated by the target machine primitive instructions.

     They generate intermediate instructions: compile time values
     that might be collected by other rewrite rules to influence
     their code generation.

Such macros are called ``intermediate macros''.  The space in which
they are closed is called the space of intermediate programs; it is
generated by the set of primitives consisting of the (real) target
primitives combined with these (virtual) intermediate primitives(3).

This makes it possible to define the notion of 'real' macros: macros
that (are concatenative and) map the empty program to a program in
the target program space.  Real macros can be instantiated on the
target machine.
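
Continuing the Haskell sketch, here is one way to picture intermediate
and real macros.  Everything below is invented for illustration; the
condition values loosely mirror the PIC18 situation described in note
(3), where a virtual condition selects a specialized branch
instruction:

    type Label = String

    data Instr
      = Lit Int                -- real: push a literal
      | BranchZero Label       -- real: specialized conditional branch
      | BranchCarry Label      -- real: another specialized branch
      | CondZ | CondC          -- intermediate: virtual condition
                               -- values, never instantiated
      deriving (Show, Eq)

    type Macro = [Instr] -> [Instr]

    -- An intermediate macro: it only produces a virtual condition, so
    -- it has no derived semantics of its own.
    zero :: Macro
    zero code = CondZ : code

    -- 'brif' collects the virtual condition and turns it into one of
    -- the specialized real branch instructions.
    brif :: Label -> Macro
    brif l (CondZ : rest) = BranchZero  l : rest
    brif l (CondC : rest) = BranchCarry l : rest
    brif l _              = error "brif: no condition value"

    -- A macro is 'real' if applying it to the empty program yields
    -- only real target instructions.
    isReal :: Macro -> Bool
    isReal m = all real (m [])
      where real CondZ = False
            real CondC = False
            real _     = True

In this sketch, zero has no derived semantics because its output
contains a virtual instruction, and brif has none because it needs a
non-empty input program, yet the composition  brif "l1" . zero  is a
real macro: applied to the empty program it yields a single
BranchZero instruction.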

Using an (intermediate) program space that is larger than the target
program space gives a means to use intermediate macros for
constructing components of parameterized target programs: build real
macros by composing intermediate macros.  It provides a parameter
passing mechanism for macros that can host values that cannot be
represented in the target semantics(4).  This makes it possible to factor
code generation patterns into smaller elements that can be recombined
in several ways.


Using rewrite primitives and function composition, the Coma language
allows the conversion of a concatenative program to a value whenever
the concatenative program has a _pure_ associated target semantics
(all operations are local to the stack), meaning the entire program
can be reduced to a stack of values.

However, the target semantics isn't pure, because target programs
depend on input available only at run time.  As a consequence, not
all programs can be completely reduced.  The result of this partial
reduction (partial evaluation) is the target program.
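
In terms of the Haskell sketch (same invented instruction set as
before), the difference between full and partial reduction looks like
this:

    data Instr = Lit Int | Add  deriving (Show, Eq)
    type Macro = [Instr] -> [Instr]

    lit :: Int -> Macro
    lit n code = Lit n : code

    plus :: Macro
    plus (Lit b : Lit a : rest) = Lit (a + b) : rest
    plus rest                   = Add : rest

    -- [1 2 +] has pure target semantics and reduces to a value stack:
    --   (plus . lit 2 . lit 1) []  ==  [Lit 3]
    -- [1 +] needs run time input; the residue is the target program:
    --   (plus . lit 1) []          ==  [Add, Lit 1]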

Combine this with a larger compile time value space and associated
rewrite rules to get a powerful metaprogramming system.

(1) Or modify all of it in a uniform way.  For example, a language
    like FP or any kind of monad language might have primitive
    operations, such as a 'map' operation, that modify the whole
    state.  However, I'm more interested in a model for a complete
    system, not an embedded sublanguage.

(2) For Purrr18, the semantics doesn't always match completely.  For
    example, numbers and associated operations are bignums at compile
    time, but 8 bit integers for run time operations.  This could be
    called 'projected semantics'.  I've found this to be really
    useful, e.g. to compute constants.  This combination of a
    well-defined macro semantics with a link to a (somewhat less
    powerful and approximate) target program semantics gives a very
    powerful metaprogramming system: the time at which computations
    happen can be largely ignored when trying to understand overall
    program semantics, and can be unignored when trying to understand
    its implementation (instantiation and approximation).  Both things
    are equally essential in resource constrained programming, less so
    in general programming, where correctness/consistency usually takes
    precedence over implementation efficiency.

(3) For example, in the PIC18 architecture, there is no generic
    conditional branch instruction, but a whole family of such
    instructions optimized for different conditionals.  In the code
    generation, this is solved by using a virtual intermediate
    condition value type that is combined with a conditional branch
    instruction to yield one of the specialized branch/skip
    instructions.

(4) Somewhat analogous to quarks and gluons in physics.  Quarks can be
    bound to other quarks through gluon exchange, but are never seen
    in isolation.


Now, that's a nice story, but that's not how it happened :) The
specification of rewriting rules came quite naturally as a syntactic
abstraction for some previously manually coded peephole optimizations
in the first iteration of the Forth compiler.  This rewrite system,
however, seemed far more useful than simple optimizations alone, and
by introducing pseudo target assembly instructions it has been
transformed into a different kind of two-level language semantics: a
powerful compile time type system with an analogous run time type
system whose 'projected semantics' is derived from eager macro
rewrite rules.

The downside of this is that in order to write programs, one has to
make decisions about what to instantiate and what not.  It might be
interesting to see how far this can be automated: don't instantiate
what cannot be instantiated, but try to isolate subprograms that have
real semantics.