Thu Jun 26 10:15:58 CEST 2008

2 stage semantics


It might be helpful to put on a background light: i'm trying to write
a system for parameterized programming of tiny computers (currently
Microchip PIC18 microcontrollers) based on concatenative and
functional languages. I'm interested in limited order semantics mostly
from a perspective of optimal implementations: how simple can the
eventual semantics be made without having to sacrifice space/time
efficiency. Currently I'm leaning towards a full macro system with
first class macros, but I'm interested in this limited order
semantics, and like to see if it can somehow be embedded in my

What i'm trying to figure out is how i can use 'higher order macros' in
my system to allow for limited order semantics as you are suggesting.

The approach i'm taking is:

   * Use 2 stages: concentrate on the first stage which consists of
     a joy-like language that operates on a stack of machine code
     instructions (stage 1) and a stage that executes the resulting
     machine code (stage 2).

   * Start building stage 1 semantics from rewrite rules that operate
     on programs built from a single stage 2 instruction QW (quote
     word), which loads a number onto the run time stack. For example,
     the stage 1 function '+' performs the following program

            ...  [QW 1] [QW 2]   ->  ... [QW 3]

A complete Joy-like semantics can be built from this, if the fact that
QW can only accept numeric arguments is ignored.

At this point, some operations might not be defined for all input
programs. For example '+' applied to the empty program is not
defined. What can be done here is to start building target semantics
based on the program rewrite rules: use a couple of instructions that
manipulate the run time stack to make sure '+' can be defined on all
input programs:

            ... [QW 1] -> ... [ADDLW 1]
            ...        -> ... [ADDWF POSTDEC0 0 0]

Doing this for the whole set of primitives gives a language with 2
stage semantics:

   stage 1: program text represents machine code rewrite functions
   stage 2: rewriting of the empty program results in 'real' programs

The remaining problem is that some values used as arguments to machine
code instructions might not be numbers: the Joy like language is
higher order, so quoted programs are an example of such values. (We
can create another problem by introducing intermediate instructions,
which are stage 1 data objects that do not represent target values nor
instructions. However, this is beneficial for the eventual goal of
paramerized programming.)

As a result, not all (macro) programs that have a stage 1 semantics
can be attributed a stage 2 semantics, because applying them to the
empty program does not yield a program that lies in the target program
space due to the use of non-numeric values, or the use of pseudo
machine instructions.

What I'm already convinced about is that this approach works pretty
well for manual metaprogramming: by requiring the programmer to
instantiate the 'real' programs as parameterized general macros,
programs can be built in a Forth style language. (Think Forth run-time
and immediate words). Allowing for compile-time data types that do not
translate to the (necessarily) limited target machine semantics gives
access to a very powerful way to factor/modularize parameterized
programs (specialized code generators).

What I'm interested in is to figure out how to perfrom automatic
instantiation which gets rid of the 2-mode word/macro Forth-style
semantics, how to turn non-specialized program generation into type
errors where the source can somehow be blamed, and how to embed
limited order operators in a sound way.