Sun Jun 15 12:25:15 CEST 2008

Combining Forms and Higher Order Macros

EDIT: This used to be a blog entry, but is too confused to qualify as
such.  Most of this is about the core Coma language with higher order
macros, which needs to be implemented before any of the array
processing extensions can be added.

Coma is used as the pure functional macro kernel of the Forth language
dialect template called Purrr, which is further specialized to yield a
PIC18 Forth dialect.

Coma is essentially a template programming language: all words that
occur in a source file generate and process intermediate code when
they occur in an instantiating context, i.e. inside a Forth word
definition.

Classical macros expand code.  The essential extra step that makes
program specialization work is a reduction step: after expansion of
several primitives, examine the resulting code and reduce it by
evaluating part of it.  In Coma the expansion and reduction operations
are combined into a single step, specified as primitive words that
pattern-match on intermediate code.
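To make this concrete, here is a rough Python sketch of that combined
step.  All names and the intermediate code representation are invented
for illustration; they are not Coma's actual ones.  Each primitive is
a function that pattern-matches on the tail of the code generated so
far and either reduces it or expands into a new instruction:

```python
# Sketch of Coma-style combined expansion/reduction (hypothetical
# representation): intermediate code is a list of instruction tuples,
# and each primitive word maps a code list to a new code list.

def plus(code):
    # Reduction case: two literals on top of the intermediate code
    # fold at compile time; otherwise expand to a run-time add.
    if len(code) >= 2 and code[-1][0] == "lit" and code[-2][0] == "lit":
        return code[:-2] + [("lit", code[-2][1] + code[-1][1])]
    return code + [("add",)]

def lit(n):
    # A literal always just expands.
    return lambda code: code + [("lit", n)]

def compile_words(words):
    code = []
    for w in words:
        code = w(code)
    return code

# '1 2 +' reduces entirely at compile time:
print(compile_words([lit(1), lit(2), plus]))   # [('lit', 3)]
# but a '+' without known operands expands to a run-time instruction.
```

The point of the sketch is that there is no separate reduction pass:
the pattern match inside each primitive is where specialization
happens.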

In a compositional/concatenative language, the reduction operation
becomes simpler compared to lambda reduction, due to the absence of
variable names.  Other than that, the principles are the same:
anything done in Coma can be extended to expression languages with
formal parameters by adding a few extra substitution rules.

What I am interested in is to see in what way this can be applied to
higher order programming.  Currently Coma is used mainly to support
Forth (a first order language), using a simple extension that allows
the implementation of Forth's structured programming on top of
conditional and unconditional jumps.
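The classic way that extension works, sketched here in Python with an
invented instruction representation (not Purrr's actual one), is to
make 'if', 'else' and 'then' compile-time words that emit jump
instructions and patch their targets through a compile-time control
stack:

```python
# Hypothetical sketch of Forth's structured programming words built on
# conditional ('jz') and unconditional ('jmp') jumps.  Unresolved jump
# targets are tracked on a compile-time stack and patched later.

def compile_if(tokens):
    out, ctrl = [], []          # ctrl: indices of unresolved jumps
    for t in tokens:
        if t == "if":
            ctrl.append(len(out))
            out.append(["jz", None])     # target patched at else/then
        elif t == "else":
            orig = ctrl.pop()
            ctrl.append(len(out))
            out.append(["jmp", None])    # skip the false branch
            out[orig][1] = len(out)      # 'if' jumps to the false branch
        elif t == "then":
            out[ctrl.pop()][1] = len(out)
        else:
            out.append(["call", t])
    return out

print(compile_if(["if", "a", "else", "b", "then"]))
# [['jz', 3], ['call', 'a'], ['jmp', 4], ['call', 'b']]
```

Nothing here requires quotations: the control stack exists only at
compile time, which is why this works in a first order language.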

Alternatively, a more Joy-like language can be constructed on top of
Coma, where all control flow is built on top of recursion and the
'ifte combinator.

This is where I'm a bit in the dark.  It is possible to create
replacements for Forth's structured programming words by using higher
order combinators that perform partial evaluation on code quotations,
inlining them.  A construction like

     (a b c) (d e f) ifte

would essentially compile to Forth's

     if  a b c  else  d e f  then
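A possible sketch of such an inlining 'ifte (again with an invented
instruction list representation, and assuming both quotations are
fully known at compile time) produces exactly the jump structure Forth
would:

```python
# Hypothetical compile-time 'ifte: inline two known quotations as the
# branches of a conditional, mirroring Forth's if/else/then.

def ifte(true_quot, false_quot):
    # Quotations are lists of instructions known at compile time.
    jmp = ["jmp", None]                      # skips the false branch
    out = [["jz", 2 + len(true_quot)]] + true_quot + [jmp] + false_quot
    jmp[1] = len(out)                        # patch to fall-through point
    return out

print(ifte([["call", "a"]], [["call", "b"]]))
# [['jz', 3], ['call', 'a'], ['jmp', 4], ['call', 'b']]
```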

There are two main problems with this approach.  One is whether 'ifte
should have a semantics when its arguments cannot be evaluated at
compile time.  It doesn't seem too difficult to allow code quotations
to _exist_ at run time; they are just code pointers.  However,
allowing them to be _constructed_ at run time (the analogue of
closures) requires some run-time memory management (either full GC or
linear memory).

Another problem is explicit recursion.  The main difficulty about PE
is when to stop.  It's not possible to unroll recursive definitions
completely, since that would produce an infinitely large program.
Wadler talks about this in the original deforestation paper, where such
infinite expansions are caught by introducing run-time recursion of
newly generated functions.  An alternative approach (Bananas and
Lenses) is to solve the PE problem at a higher level: by identifying
(a limited set of) higher order combinators and deriving a set of
rules about how they combine, program manipulation is possible on that
higher level, instead of working with the mostly unstructured
recursion (goto!) in unfolded combinator bodies.  In Backus' FP, all
the combinators are fixed, and there is no composition mechanism for
combinators.  According to Backus (Iverson?) this would force
programmers to learn to use the available operators efficiently.
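A bounded-unfolding sketch of that stopping strategy, in the spirit of
Wadler's trick (hypothetical representation; it ignores conditionals,
which a real partial evaluator would also reduce): unfold a recursive
definition a fixed number of times, then residualize the remaining
recursion as a run-time call.

```python
# Hypothetical sketch: unfold recursive word definitions up to a depth
# bound, then stop by emitting a residual run-time call instead of
# expanding forever.

def unfold(defs, word, depth):
    out = []
    for w in defs[word]:
        if w in defs:                    # a call to a defined word
            if depth > 0:
                out += unfold(defs, w, depth - 1)
            else:
                out.append(("call", w))  # residual run-time recursion
        else:
            out.append(("prim", w))
    return out

# 'countdown' calls itself; two unfoldings leave a residual call.
defs = {"countdown": ["dup", "1-", "countdown"]}
print(unfold(defs, "countdown", 2))      # ends with ('call', 'countdown')
```

The depth bound is of course a blunt instrument; the point of the
higher level combinator approach mentioned above is to avoid needing
one at all.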

What I'd like to figure out is what set of data types + functional
forms would be good for certain classes of problems related to
resource-constrained programming.  How to reduce the space of
programs such that efficiency is guaranteed, but programs can still be
written in a high level declarative form.