Tue May 27 14:55:47 CEST 2008
Forth and Metaprogramming
This article introduces the general architecture of the Forth
dialect's language tower.
Using a highlevel language to describe a problem and compiling it down
to lower level code is such a ubiquitous abstraction mechanism it is
hardly noticed as one. But what happens when this abstraction
mechanism is made programmable?
Instead of using a fixed compiler for a fixed language, make the
compiler extensible using a 'plug-in' mechanism. This is one of the
powerful ideas behind languages like Lisp and Forth, where the
extension mechanism is implemented in terms of code transformers, also
called macros. In Lisp, the transformation is performed on structured
data, while in Forth the transformation is similar, but usually
implemented more directly without intermediate data step.
With a powerful extension mechanism, the base language and compiler
can be kept relatively minimal: a lot of choices can be left to the
programmer, who can build on top of the simple language primitives.
The choice of base language then can be to maximise flexibility
(Scheme: the untyped lambda calculus with a garbage collected
persistent memory model) or to minimise implementation complexity
(Forth: stack machines with a concrete array memory model).
In Lisp in particular, there are 2 composition mechanisms:
(A) procedure: composition by lambda abstraction and application
(B) notation: composition of macros (code transformers)
Here (A) is the basic composition mechanism and can be used to express
solutions to all problems, but is not always terribly succinct because
of its intrinsic notational limitation. The (B) mechanism can be used
to solve most notational problems by translating arbitrary notation
into base language syntax(1). An example of a severe notational
problem is the expression of one language in terms of another one.
The Forth dialect in Staapl is implemented as a set of pure
concatenative functions that generate or transform machine code. This
functional approach greatly simplifies compiler and optimizer
construction. The language is layered in the following way:
* [PRIMITIVE CODE PROCESSING] The macro language primitives are
written mostly in terms of a pattern matching language that
operates on target (pseudo) code. This implements code generation,
peephole optimization, partial evaluation, arbitrary code
generator plugins and specification of intermediate (pseudo code)
objects used for communication between code generators for compile
time parameter passing.
* [MACRO COMPOSITION] The macro language is a purely functional
concatenative language (Coma, based on Scat) with extensions to
implement Forth style structured programming words. It can be used
to compose the primitive code generators into higher level code
* [INSTANTIATION] In a Forth file, one has the choice to define
functionality as macros (2) or as instantiated code. This is
implemented using an identical (3) syntax to make it easy to
switch between the two modes.
* [PREFIX PARSER] The forth syntax layer provides a pattern
substitution mechanism for prefix words in order to maintain
standard Forth syntax (4).
The power of Purrr lies mostly in the combination of the purely
functional concatenative macro language supporting a form of
quasiquotation to build template code, and the pattern matching
binding form used to define language primitives that perform partial
evaluation and more general compile time tranformation of target stack
(1) Whether (B) is necessary also depends on the base language. For a
strict language, composition of code that manipulates evaluation
order cannot be performed with (A) alone. (for example 'if' and
'delay' in Scheme). In a lazy language like Haskell, (A) can be
stretched a long way, however, Haskell is full of syntactic sugar
and there's a Template Haskell library to perform (B).
(2) Traditional forth implements primitives in machine code and tries
to keep the use of macros to a minimum, to limit (on-target)
compiler complexity. This is not a constraint in the
cross-compiled Purrr language, where all primitives are macros and
no primitive run-time code is necessary.
(3) Almost identical. The instantiated Forth language differs from the
Macro language in that it has access to the data structure used
for run-time function composition: the return stack. In addition,
Forth words can have multiple entry points, which means code can
fall through to the next definition in a source file.
(4) Traditional Forth employs a more direct and low-level
metaprogramming syntax (i.e. "postpone", "compile", "accept",
...), and uses reflection to reduce code size. Purrr limits this
kind of reflection and replaces it by 'unrolled' language
towers. It is still an open question wether using standard Forth
syntax for the Purrr system is a good idea. Forth syntax is
limiting to the way metaprogramming is used in Purrr, and is also
fairly bound to the reflective structure of a standard Forth
parser/compiler. In any case, in Purrr, the Forth syntax is merely
a front-end to a more general s-expression based syntax for the
Macro language, so it is always possible to fall back onto this
syntax to express particular abstractions.