Tue May 27 14:55:47 CEST 2008

Forth and Metaprogramming

This article introduces the general architecture of the Forth
dialect's language tower.

Using a highlevel language to describe a problem and compiling it down
to lower level code is such a ubiquitous abstraction mechanism it is
hardly noticed as one.  But what happens when this abstraction
mechanism is made programmable?

Instead of using a fixed compiler for a fixed language, make the
compiler extensible using a 'plug-in' mechanism.  This is one of the
powerful ideas behind languages like Lisp and Forth, where the
extension mechanism is implemented in terms of code transformers, also
called macros. In Lisp, the transformation is performed on structured
data, while in Forth the transformation is similar, but usually
implemented more directly without intermediate data step.

With a powerful extension mechanism, the base language and compiler
can be kept relatively minimal: a lot of choices can be left to the
programmer, who can build on top of the simple language primitives.
The choice of base language then can be to maximise flexibility
(Scheme: the untyped lambda calculus with a garbage collected
persistent memory model) or to minimise implementation complexity
(Forth: stack machines with a concrete array memory model).

In Lisp in particular, there are 2 composition mechanisms:

  (A) procedure: composition by lambda abstraction and application
  (B) notation: composition of macros (code transformers)

Here (A) is the basic composition mechanism and can be used to express
solutions to all problems, but is not always terribly succinct because
of its intrinsic notational limitation.  The (B) mechanism can be used
to solve most notational problems by translating arbitrary notation
into base language syntax(1).  An example of a severe notational
problem is the expression of one language in terms of another one.

The Forth dialect in Staapl is implemented as a set of pure
concatenative functions that generate or transform machine code.  This
functional approach greatly simplifies compiler and optimizer
construction.  The language is layered in the following way:

  * [PRIMITIVE CODE PROCESSING] The macro language primitives are
    written mostly in terms of a pattern matching language that
    operates on target (pseudo) code. This implements code generation,
    peephole optimization, partial evaluation, arbitrary code
    generator plugins and specification of intermediate (pseudo code)
    objects used for communication between code generators for compile
    time parameter passing.

  * [MACRO COMPOSITION] The macro language is a purely functional
    concatenative language (Coma, based on Scat) with extensions to
    implement Forth style structured programming words. It can be used
    to compose the primitive code generators into higher level code
    generator abstractions.

  * [INSTANTIATION] In a Forth file, one has the choice to define
    functionality as macros (2) or as instantiated code.  This is
    implemented using an identical (3) syntax to make it easy to
    switch between the two modes.

  * [PREFIX PARSER] The forth syntax layer provides a pattern
    substitution mechanism for prefix words in order to maintain
    standard Forth syntax (4).

The power of Purrr lies mostly in the combination of the purely
functional concatenative macro language supporting a form of
quasiquotation to build template code, and the pattern matching
binding form used to define language primitives that perform partial
evaluation and more general compile time tranformation of target stack
machine code.


(1) Whether (B) is necessary also depends on the base language. For a
    strict language, composition of code that manipulates evaluation
    order cannot be performed with (A) alone. (for example 'if' and
    'delay' in Scheme). In a lazy language like Haskell, (A) can be
    stretched a long way, however, Haskell is full of syntactic sugar
    and there's a Template Haskell library to perform (B).

(2) Traditional forth implements primitives in machine code and tries
    to keep the use of macros to a minimum, to limit (on-target)
    compiler complexity. This is not a constraint in the
    cross-compiled Purrr language, where all primitives are macros and
    no primitive run-time code is necessary.

(3) Almost identical. The instantiated Forth language differs from the
    Macro language in that it has access to the data structure used
    for run-time function composition: the return stack. In addition,
    Forth words can have multiple entry points, which means code can
    fall through to the next definition in a source file.

(4) Traditional Forth employs a more direct and low-level
    metaprogramming syntax (i.e. "postpone", "compile", "accept",
    ...), and uses reflection to reduce code size. Purrr limits this
    kind of reflection and replaces it by 'unrolled' language
    towers. It is still an open question wether using standard Forth
    syntax for the Purrr system is a good idea. Forth syntax is
    limiting to the way metaprogramming is used in Purrr, and is also
    fairly bound to the reflective structure of a standard Forth
    parser/compiler. In any case, in Purrr, the Forth syntax is merely
    a front-end to a more general s-expression based syntax for the
    Macro language, so it is always possible to fall back onto this
    syntax to express particular abstractions.