Wed Aug 6 20:26:42 CEST 2008

Formalizing Coma

Scat is a concatenative language modeled after Joy.  Syntactically, a
program p is a concatenation of programs p_i or a primitive program
word p'

    p = (p_0 p_1 ...) | p'

This is isomorphic with the semantics, where each program word can be
associated with a function, and syntactic concatenation maps to
function composition.

For Scat, the operational semantics (the implementation in terms of a
primitive machine) is given by primitive Scheme functions closed over
a state space represented by a Scheme data type. In the case of Scat
and Coma, this is a stack, in case of Coma+Control, this a pair of

For most practical use, but not necessary for theoretical use, the
state contains at least a stack. It's reason of being is to introduce
locality in the effect of functions. It is useful for creating a
practical programming language and to derive simple local syntactic
rewrite rules that can be derived from local stack operations.

Reduction (evaluation) of Scat expressions is eager, and happens from
left to right, where each primitive function part of a larger
composition is applied to the state, which is threaded through the
computation. This is the same as a sequential machine with global

Note that, because function composition associates with function
application, the order of evaluation is arbitrary

    (S_0 a) b = S_0 (a b)

The function a applied to the initial state S_0 returns a value that
when passed to the function b yields the same result as evaluating the
composition (a b)

Now, this only useful if you can prove that there is some c with
nice properties such that

    (a b) = c

which allows the application to be written as

          S_0 c

The ``nice properties'' can be simplified to mean ``shorter code''.

For Coma, the eventual goal is to generate machine code (syntax) from
a concatenative source program (syntax). So instead of looking at
associativity of composition, we should look at associativity of
concatenation (syntax). More specifically, at rules that allow the
substitution of a concatenation of program words i.e. (x y z) by
another concatenation of words (s t).

     a b x y z c  =  a b (x y z) c  =  a b (s t) c  =  a b s t c

This uses the rule

     x y z = s t

Now, where do these rules come from? In Staapl, they are syntactic
transforms that preserve the associated semantics. These semantics are
operational semantics derived from a stack machine's instructions.

TODO: Implementation.

First: translate to QW, CW language.
Then implement rewrite rules.


  * explain where the assymmetry comes from: why does a rewrite rule
    operate on code from the right only?

  * explain in a simple way that all semantics comes from target
    idealized (un-projected) machine operations.

  * explain how you go from arbitrary substitutions to a greedy
    left->right substitution scheme.

EDIT: This needs some cleanup. It deserves a separate paper.

What I'm trying to explain:

  * Coma's primitives are intermediate language transformers. The
    intermediate language has essentially 2 instructions: execution
    and quotation. This is then extended with termination, jumps and
    conditional jumps.

  * Some of the intermediate language is real target language. Open
    question: good or bad? Should this be separated out? Due to the
    pattern matching, it behaves as opaque black-box code + it allows
    the implementation of simple peephole optimizations. (This is ok:
    it's a natural extension of the opaque CW data vs. QW that can be
    partially evaluated. actually, it's a mix of the 2.)

  * It also allows arbitrary data to be passed from macro to macro,
    which is a vehicle for arbitrary incremental code generation: this
    is the presence of QW in the language: it embeds a dynamic stack

  * This can be viewed as eager partial evaluation. In a concatenative
    language, PE is rather trivial (no variable substitutions). But,
    in most cases it is not complete: not all primitives exist at run
    time: they need to be specialized / combined.

  * It is not PE. Actually, it is, on the lowest level (math + stack



This contains quite some posts about the semantics of Coma. It is work
in progress. Essentially, the semantics is defined from a set of
rewrite rules that implement CONCATENATE operation as a binary
function taking a program in intermediate form, and a program

This operation performs semantics preserving transformations.