Thu Jul 16 13:21:53 CEST 2009

Escaping concrete machine semantics through simulators

One thing which is important in the way Staapl is built up is the
relation to the concrete machine.  Above else, there should be no
abstraction barrier to the lowest level.

What I mean is: currently, Staapl is just a fancy macro assembler.

All semantics is relative to the specification of the instruction
rewrite rules.  This is called "specification by compiler".

This makes it an electronics engineer's dream (fully hackable low
level control with ad-hoc abstraction) and a computer scientist's
nightmare (essentially there is nothing you can do rigorously on the

I would like to bring this nightmare to a closure.  Taking CPU
simulators into the picture, defined relative w.r.t some stable
semantics, i.e. Scheme's.  That way there is at least a single
well-defined semantics for a subset of the base language, transitively
defined in terms of Scheme.

Think of it like this:

             Arch 1
          /         \
   Forth  -- Arch 2 -  Scheme
          \         /
             Arch 3

The left side is a Forth dialect which directly translates to a couple
of machine architectures.  The forking point from Forth to Arch is
where different implementations's concrete semantics diverge.

The right side is a fully-specified bottom line of each of the machine
semantics in a standard representation (could be Scheme or some other
form written in scheme that could be easily translated to a subset of
C to solve the practical issue that simulation needs to be fast.)

The problem is to "lift" the semantics of the simulator in terms of
Scheme code _through_ the implementation of each of the different
architectures to make it decidable _by machine_ which constructs are
portable and well-defined, and which are target-specific hacks.

The basic idea to make this work is that given a specification of a
simulator, Forth functions can be mapped to a pair containing:

   - stateless computation (boolean functions)
   - clobbered machine state

What this might solve is the language specification problem:  it makes
it possible to define a "moving target standard" in terms of a
collection of source code + making this standard machine-checkable.

Alternatively it can assist greatly in the definition of a Forth ->
ASM compiler, since the semantics is directly testable.  Possibly
human intervention is still necessary to come up with good translation
rules, but the correctness could then be either proven or if that's
too difficult, unit-tested.

It seems to me that what needs to be done is to model everything as a
set of equations, and derive everything from that.