This is the Staapl blog, a filtered ramblings file.
For more information see http://zwizwa.be/staapl

Staapl is also known under the working name Brood, which is
unfortunately impossible to google.

For discussions and comments, please join the mailing list at:
http://zwizwa.be/cgi-bin/mailman/listinfo/staapl-list

Entry: The new Staapl (Brood-5)
Date: Wed May 21 16:20:45 CEST 2008

The main motivation for the 5th Staapl iteration was name space
management. The objective was to turn all names that occur in .f files
into (Scheme) compile-time identifiers, and no longer use (Scheme)
run-time hash tables indexed by symbolic names.

The namespace is implemented by prefixing Scat/Macro/Purrr names, so
they can co-exist with and are essentially the same as Scheme
names. For the implementation of the Macro language, there are 2
different namespaces (scat) and (macro) that separate the semantics of
explicit compile time computations and postponed Forth macros in the
Purrr compiler.

As a happy side effect, Scat/Coma/Purrr now has a module system,
piggybacked on that of PLT Scheme. Relying more on Scheme's scoping
mechanism (lambda + modules) instead of ad-hoc hash table based
namespaces improves and simplifies the code quite a bit.

What I realized is that thinking about code graph structure is a
whole lot more important than thinking about names! It's not names,
it's (nested and shadowing) bindings, the invisible lines that connect
declarations and references, and so build the structure of a
program. Like many things, in retrospect this is really obvious,
especially if you start from lambda calculus, but it had to be right
in my face before I came to realize its importance. Hence the slogan
"Scope is everything!" attributed to Daniel Friedman by Olin Shivers.

So, what does Staapl 5 look like? The Coma language is defined as a
collection of macros, which are (pure) Scat functions that operate on
a stack of PIC18 assembly code (Scat's parameter stack) and map it to
a new stack.
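
Conceptually, in plain Scheme, such a macro is just a pure function
from a list of assembly instructions (most recent first) to a new
list. A minimal sketch (the names and the instruction representation
are made up for illustration; the real implementation expresses these
functions in the Scat language and uses richer instruction objects):

  ;; 'dup' as a code transformer: push one more pseudo instruction
  ;; onto the code accumulated so far.
  (define (dup-macro asm)
    (cons '(dup) asm))

  (dup-macro '((movlw 1)))   ; => ((dup) (movlw 1))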

There are essentially 3 phases of compilation:

      (A) Compiler .ss source + Purrr .f source -> Coma functions.

      (B) Coma macro execution -> PIC18 code graph

      (C) Assembly -> binary code + addresses

Phases (B) and (C) can be called trivial. (B) just executes the macro
functions generated in (A) and collects the results, and (C) is a
fairly straightforward assembler that implements address allocation
and relaxation for instruction size optimization. 

At the end of phase (B) there is a target code graph structure, with
each node containing a slice of assembly code. At the end of phase (C)
each node is supplemented with binary code and a code address.

All the magic happens in phase (A). The architecture is a classical
layered bottom-up Scheme application, where (Scheme) macros form the
main abstraction mechanism to construct minilanguages for solving
specific problems. In addition, PLT Scheme's language module features
are used to implement Forth source file syntax on top of basic
concatenative language support.

   * Scat layer. A language layer on top of Scheme that implements a
     concatenative language with threaded state.

   * Coma layer. A collection of Scat functions that implements
     (local) machine code transformations.

   * Forth compiler layer: adds Forth-style control flow words and
     target code instantiation (macro execution) to the Coma language.
     A Forth program consists of declaration of code in terms of the
     macros that generate it.
       
   * On the syntax level, the Coma macro language and instantiated
     Forth words are available through a fairly standard Forth
     syntax+reader, which allows new macro and Forth compositions to
     be defined in a single .f file. This is analogous to Forth's
     dual-mode words. Most of the PIC18 kernel software is written in
     this syntax; a small sketch of such a file follows below. See
     pic18/*.f
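
Roughly, a dual-mode .f file looks like this (a sketch with made-up
word names; see the real kernel code in pic18/*.f for the actual
conventions):

  macro
  : 2x  dup + ;           \ a macro: expanded and partially evaluated
  forth
  : quadruple 2x 2x ;     \ a Forth word: instantiated as target code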


The identifiers in an .f file only live in phase (A), the Scheme macro
expansion phase. This was a fundamental insight:

  * All names in a .f file are macros = Scheme functions bound to
    identifiers in a module namespace. As such they intermingle very
    well with other Scheme functions, which solves the compiler
    extension problem: you can build all the metaprogramming tools you
    want.

  * Each Forth word (target code entry point) is defined in terms of a
    composition of macros. Creating a .f file is about deciding which
    macros (program generators) to instantiate.

The focus is thus on macros. These macros are pure functions that
operate on rich dynamic data types, and thus have nicer properties
than concrete Forth code, which is very low-level. This gives the
following work flow for writing a Purrr application:

   1. Create a set of highly parameterized macros that is reusable for
      a certain application domain.

   2. Instantiate those macros into concrete code by filling in the
      parameters.

Both phases can yield re-usable code: not only can the code
generators be reused, but also the instantiation decisions. Target
code size is mostly determined by which words are inlined and which
words are composed at run time using function calls.


Entry: Bottom up with holes?
Date: Wed May 21 19:39:42 CEST 2008

( This needs some better explanation. The problem is about
parameterizing the WHOLE compiler instead of a single code
generator, e.g. to add target-specific optimizations to a whole bunch
of arithmetic operators. )

The problem with writing low-level code using high-level macros is
parameterization: how to provide the static data that allows the
conversion from general to specific.

I like the bottom-up functional approach where parameterization is
handled by providing arguments to functions, including higher order
functions. This approach works well for the core compiler's processing
logic. It improves locality:

    * To write code, you just need to think about the code at hand and
      the lower layer(s).

    * Functional code tests easily (no elaborate test environment with
      intricate dependencies to set up; just provide the arguments).

However, because of the large number of parameters in target specific
code generation and Purrr kernel code, this controlled flow of data
becomes impractical. What you want is a "configuration file": a
dynamic environment that gives context to the code, modifying its
behaviour in a more global way. This pattern is typical for compilers
(data formatters, 3D scene rendering, ...) that produce output that's
highly tuned to a particular medium.

In PLT Scheme code this problem can be solved by 'parameterize', which
provides a mechanism for controlled dynamic binding. This is used in
the Staapl core where appropriate, for example in the RPN macro
transformer.
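
As a minimal sketch of that mechanism in plain PLT Scheme (the
parameter name is made up, not an actual Staapl parameter):

  (define target-cell-size (make-parameter 8))

  (define (describe)
    (printf "compiling for ~a-bit cells\n" (target-cell-size)))

  ;; Within the dynamic extent of 'parameterize' the configuration
  ;; changes; outside it the default is restored.
  (parameterize ((target-cell-size 16))
    (describe))   ; prints: compiling for 16-bit cells
  (describe)      ; prints: compiling for 8-bit cells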

However, in the Forth language layer this becomes problematic to
express. The problem boils down to redefining the behaviour of a
particular macro. There are 2 cases where this occurs in the current
code base, and I'd like to find out if they are better handled in the
same way, or if they deserve separate abstraction mechanisms:

(A) Compiler specialization. I.e. the core compiler implements the
    partial evaluation for arithmetic operators, but a specialized set
    of macros needs to implement concrete code generation (and
    possibly peephole optimizers), while delegating the abstract
    cases. The point here is to EXTEND behaviour provided by a lower
    code layer, possibly multiple times in different additional
    layers.

(B) Interdependent program modules. The example I ran into today is
    the interactive interpreter which requires 'receive' and
    'transmit' implemented (standard i/o). The essential part is that
    there is a CONNECTION to be made between the module that
    implements the i/o and the interpreter code.

The solution right now is made for (A) but also used for (B), which
might not be a good idea.

Essentially, a compiler is a set of macros. These live in a namespace,
to keep the extension effects local to a (specialized) compiler. By
loading extra functionality, some macros can be extended. The problem
here lies with multiple extensions: they depend on load order, and the
behaviour modification itself is uncontrolled.

In Brood-4, this problem was solved by loading ALL modules into a
single namespace, and allowing mutual references. For bottom-up
modules with separate compilation a different approach is necessary,
or the previous approach needs to be simulated, which is effectively
what I'm doing.

Now, the question: is a separate once-only linking a better idea for
(B), or do I stick with the imperative OO-style 'change behaviour'
approach? And when going for the former, are PLT Scheme units worth
the effort?

Currently I think the real solution is to indeed:
  - use a PLT Scheme units approach for (B)
  - try to limit the use of (A)

What makes this painful is the wind in my back from 2 different
paradigms: 
  
  static, declarative bottom-up linking with explicit resolution of
  circular references.

                              vs.

  just dump everything in an image, and link by a sequential link
  script.

I'd like to stay with the former, but this requires more explicit
linking. The previous Purrr implementations all used the
one-happy-image approach.


Entry: Code should be data
Date: Fri May 23 09:36:28 CEST 2008

Macros should be about DATA ENTRY. If you can stylize macros such that
what you actually type in the end can't be concentrated any further
(an all-syrup squishee), then you basically have a configuration file,
picking from your abstracted domain knowledge in a natural way.

In the PIC18 compiler, a nice example is the 'patterns' macro, which
provides an RPN pattern matching syntax to specify compile time
computation and code generation for primitive macros. This defines
Scat words that operate on target instructions:

(patterns (macro)
   ;; Define '-' to subtract two numbers.
   ;; Both operands known at compile time: fold the subtraction.
   (([qw a] [qw b] -)              ([qw (target: a b -)]))
   ;; A pending add-literal followed by a literal: fold into one addlw.
   (([addlw a] [qw b] -)           ([addlw (target: a b -)]))
   ;; Only the subtrahend is known: add its negation.
   (([qw a] -)                     ([addlw (target: a -1 *)]))
   ;; Fuse with a preceding memory fetch into a direct subtract.
   (([save] [movf a 0 0] -)        ([bpf 0 STATUS 0 0] [subfwb a 0 0]))
   ;; Nothing known at compile time: subtract the two run-time stack values.
   ((-)                            ([subwf POSTDEC0 0 0]))
)

The input patterns are assembly code sequences, and the output is
again an assembly code sequence. The 'qw' pseudo operation Quotes a
data Word: it pushes a number onto the data stack. Here '-' is either
optimized away, meaning the computation is performed fully or
partially at compile time, or it is compiled as a [subwf] instruction.



Entry: Unrolling Forth
Date: Mon May 26 14:20:11 CEST 2008

The most difficult task in Staapl has been finding a bridge between
the internal s-expression based, purely functional concatenative Coma
language and Forth syntax.

The thing to keep in mind is that:

  * The Coma language itself is the core of the metaprogramming
    system. Its syntax is not standardized, but kept as simple as
    possible.

  * Forth syntax and semantics are important as a user interface.
    Trading the pure concatenative + s-expression syntax for a simpler
    linear one makes the syntax less powerful, but a bit easier to use
    for the bulk of low-level code.

So, to be explicit: I really want both: a very flexible s-expression
based Coma syntax, and something as close as possible to standard
Forth syntax.

The challenge is to map Forth -> Coma.

Forth by itself is highly reflective: macros (immediate words) can be
defined in terms of Forth words, and the words have direct access to
the input stream. The Coma language dispenses with this reflectivity in
favour of non-reflective metaprogramming.



Entry: Extract Method
Date: Mon May 26 14:43:19 CEST 2008

Let's say composition and decomposition are the basic tools of any
engineering discipline:

   * build complex, specialized systems out of simple, generic parts
   * refactor a complex design to create a toolbox of generic parts

Taking apart a complex system that's made from simple parts is
easiest. Putting it back together again is harder, but still simpler
than putting together a system without a basic plan. The most
difficult step is taking an evolved pot of spaghetti and untangling it
to CREATE a library of simple parts, then rebuilding the system from
that.

Decomposition creates interfaces: when what was once a monolithic part
is split in two, there is now an interface connecting the pieces. Good
decompositions can be judged by the complexity of the interfaces they
create. You do the right thing if cutting at some point hides the
stuff you want to forget, but keeps open some sane way of interacting
with the rest of the world. Simple interfaces promote reuse.

In nearly all programming languages, the procedure / function /
message is a basic unit of composition: it can be used to create a
black box which reacts to input and produces output. Decomposition or
refactoring in such languages amounts to the "extract method" pattern:
split up a single black box to simplify its internals, pushing details
into special cases of more general abstractions.

Looking at Martin Fowler's explanation in Refactoring: Improving the
Design of Existing Code:

    "You have a code fragment that can be grouped together. Turn the
    fragment into a method whose name explains the purpose of the
    method."

Important consequences of extracting methods:

   * compression and modularity: code becomes smaller because
     duplication is removed: share NOW. This reduces local complexity.
     It also promotes reuse: it creates the possibility that code can
     be shared by new nodes LATER, even outside of the current
     project.

   * documentation: the name annotates the code with a meaningful
     explanation. The better the factorization, the easier it becomes
     to give names. There is a natural granularity of factorization:
     decompose up to the level where a natural language description
     becomes meaningful.

An advantage of concatenative programming languages is that 'extract
method' is a simple cut-and-paste operation: the absence of names for
function parameters removes the hurdle of inventing them when
factoring out code. The stack, which 'binds' variables by position
inside a definition, does the same outside of it. There is no
difference.
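
A minimal sketch in Forth-style Purrr syntax (the word names are made
up for illustration): extracting the scaling step is literally cutting
the fragment out and giving it a name, with no parameter names to
invent.

  : mangle   1 + 2 * ;        \ before

  : scale    2 * ;            \ the extracted fragment
  : mangle   1 + scale ;      \ after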



Entry: What is Partial Evaluation?
Date: Mon May 26 19:00:32 CEST 2008

Some clarification of terminology:

   * Currying
   * Partial Application
   * Partial Evaluation
   * Staging

These are all very related:

* Curry: move a parameter out of a function to provide a higher order
  function. (L (a b) (+ a b)) -> (L (a) (L (b) (+ a b))). In a
  language like ML or Haskell this happens automatically. In Scheme,
  this needs to be done manually.

* Partial Application: evaluate a curried function (to obtain a
  function of fewer arguments).

* Staging: moving computations between computation stages (e.g. from
  run-time to compile-time).

* Partial evaluation: program specialization by combining program and
  static input to an equivalent program that when provided with the
  dynamic input produces the same result as the original program.


Partial application (evaluation of curried functions) differs from
partial evaluation mainly because a curried function does not perform
any computation until all arguments are provided. 

The evaluation of a curried function merely creates a closure object,
recording the values of the partial arguments, but does not perform
any computation until the remaining arguments are passed to the
closure. Partial evaluation performs computations that are possible
given some constant arguments, and produces a specialized program
text.
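
A small sketch of the distinction in Scheme (the function names are
made up):

  ;; Currying is manual in Scheme:
  (define (add a) (lambda (b) (+ a b)))

  ;; Partial application only creates a closure; nothing is computed yet:
  (define add-3 (add 3))
  (add-3 4)   ; => 7, the addition happens only here

  ;; Partial evaluation instead produces specialized program text:
  ;; specializing (lambda (a b) (+ (* a a) b)) to a = 3 yields the
  ;; residual program (lambda (b) (+ 9 b)), where (* 3 3) is already done.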

Sources:

http://lambda-the-ultimate.org/node/2266
http://srfi.schemers.org/srfi-26/mail-archive/msg00000.html
http://www.brics.dk/~hosc/local/HOSC-13-4-pp355-368.pdf
http://www.dina.kvl.dk/~sestoft/pebook/


Entry: Partial Evaluation with Holes
Date: Mon May 26 20:33:30 CEST 2008

For deeply embedded code, generality can usually not be tolerated at
runtime because there is much to gain from specialization to a fixed set
of parameters. Let's stop begging the question and define a deeply
embedded program as the specialization of a general program in such a
way that run-time complexity (and so cost of supporting hardware) is
significantly reduced. For large reduction in cost/complexity (or a
large enough production volume) significant fixed engineering cost can
be spent on specialization.

So. Deeply embedded programming is about spending time and money on
program specialization.

Partial evaluation (PE) is an operation on program text: it takes an
input program and static data, and automatically specializes the
program to the static data. It comes as no surprise that this problem
in general is not a simple one.

What is called PE in the Purrr language is actually a combination of
straightforwardly implemented concatenative PE and an interface for
completely generic but explicit metaprogramming using instantiation of
parameterizable code templates. It could be called Partial Evaluation
with Holes (PEH).

So, how does it work?

       PE from (greedy) deterministic pattern matching
                             =
                 a typed template language

So, by fixing the algorithm used to implement PE, a language emerges
that is useful for other kinds of code generation. Let's spin that out
a bit.

PE in a concatenative language is quite straightforward: function
composition is associative, which makes evaluation order a parameter
to play with. Compositions of pure functions can be performed at
compile time to generate composite functions that are implemented more
efficiently, while other compositions have to be postponed to run time
due to their dependency on dynamic data (1).

This is because concatenative syntax abstracts away the composition
method: a function can always be trivially inlined, instead of being
invoked at run time using the run-time composition mechanism: the
machine's function call (2). When inlining multiple
functions, there can be an opportunity for specialization by moving
some computations to compile time. For example, inlining the functions
[ 1 ], [ 2 ] and [ + ] produces an inlined composition [ 1 2 + ] which
can be replaced by the composition [ 3 ]. This is automatic program
specialization by Partial Evaluation in its purest form.
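
In the pattern language shown earlier, such behaviour for '+' could be
specified roughly as follows (a sketch only; the rule set and
instruction selection in the actual PIC18 compiler may differ):

  (patterns (macro)
     ;; Both operands are literals: fold the addition at compile time.
     (([qw a] [qw b] +)    ([qw (target: a b +)]))
     ;; One literal operand: emit an add-literal instruction.
     (([qw a] +)           ([addlw a]))
     ;; Nothing known: add the two values on the run-time stack.
     ((+)                  ([addwf POSTDEC0 0 0])))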

In Purrr, the Partial Evaluator is not implemented as a separate
component. PE is a consequence of the actions of the machine model,
which is specified in terms of primitive Purrr macros that map
recently generated code (the top of the compilation stack) to new code
to be generated (placed back on the compilation stack). These
primitives are expressed in a language with deterministic pattern
matching. It allows the specification of the following compiler
components:

  * target code generation
  * peephole optimization
  * partial evaluation
  * generic parameterized template instantiation

The first 3 could be counted as components of pure partial evaluation.
The last one however is not: it is an interface that connects the
concatenative macro language to explicit code generation tools. It
allows the use of templates that have no target semantics unless they
are parameterized. 


Why is this useful?

Say you want to implement 'cos' as a function of two arguments like

    cos ( angle scale -- value )

Realizing that a true 'cos' function is never used in the target code
because the scale can be fixed and is available at compile time, it
can be implemented as a template that generates a lookup table and
code to look up the value. If generic cosine routines become
necessary later, this template macro can be extended to compile a call
to library code in case the parameter is not available at compile
time. It is surprising how often this pattern occurs: due to the lack
of target support for specific primitive abstractions, it is often
easier to write something as a template for specialized code. Note
that this is different from programming for non-embedded systems,
where this primitive functionality is usually available.
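
The table-generating side of such a template could be written in plain
Scheme, for instance like this (a sketch; 'cos-table' is a made-up
helper, not part of Staapl):

  (define pi (* 4 (atan 1)))

  ;; One period of a cosine, centered at 128 with amplitude 'scale'
  ;; (scale <= 127 so the values fit in a byte).  A template macro
  ;; could splice these 256 constants into a lookup table plus the
  ;; code that indexes it.
  (define (cos-table scale)
    (build-list 256
      (lambda (i)
        (+ 128 (inexact->exact
                (round (* scale (cos (* 2 pi (/ i 256))))))))))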

The advantage of doing it this way is that the code is easier to read:
code expresses semantics more easily without instantiation annotation
getting in the way. This annotation can be expressed somewhere else in
the form of 'forth' and 'macro' mode indicators. The disadvantage is
that a lot of code will be pushed towards implementation as a
macro. If this is taken too far, possible sharing might be
endangered. For that reason, moving between macro and instantiated
target code is made really straightforward in Purrr, but it remains an
explicit operation under programmer control.

Explicit code generation in Purrr is useful when

  * partial evaluation becomes too hard to do automatically
  * some on-target primitives are not available
  * algorithms are hard to express in concatenative syntax

So as long as it is possible to express a general algorithm in the
purely functional macro sublanguage, the built-in PE can be used to
specialize the code. The advantage here is that the difference between
compile and run time can be silently ignored as an implementation
detail. However, in practice it might sometimes be easier to make the
code generation process a bit more explicit. Purrr makes it very
straightforward to plug in arbitrary Scheme code for parameterized
code generation.


Conclusion

Stack languages are interesting for writing parameterizable low-level
code because the composition mechanism is so simple:

 * They are very straightforward to implement on the target
   architecture with a small run-time footprint of two stacks.

 * Automatic specialization through partial evaluation is very
   straightforward to implement off-target.

 * Implementing the code generator (including PE) using deterministic
   pattern matching exposes an interface that can be reused for
   plugging in arbitrary parameterized code generators.

In Purrr, the code generator language is Scheme. Within Scheme all of
Purrr and the underlying compiler is exposed: you can decide to
generate (pseudo) assembly code, Purrr code, or interface to external
code generators.

--

(1) Of course, the Purrr target semantics is not purely functional. It
    contains language primitives that introduce machine state
    (determined by world state) through a side channel into the
    concatenative language. This is no problem for PE, since it merely
    limits PE to the subset of pure functions.  Procedures that
    depend on other parts of the machine state aside from the
    (threaded) parameter stack simply have to be instantiated, and
    cannot be performed at compile time.

(2) Except when it interferes with the implementation of the run-time
    function composition method, i.e. modifies the return stack. A
    more correct statement would be that the subclass of pure
    functions can always be trivially inlined.


Entry: Purrr and Metaprogramming
Date: Tue May 27 14:55:47 CEST 2008

This article introduces the general architecture of the Purrr language
tower.

Using a high-level language to describe a problem and compiling it down
to lower level code is such a ubiquitous abstraction mechanism it is
hardly noticed as one. But what happens when this abstraction
mechanism is made programmable?

Instead of using a fixed compiler for a fixed language, make the
compiler extensible using a 'plug-in' mechanism.  This is one of the
powerful ideas behind languages like Lisp and Forth, where the
extension mechanism is implemented in terms of code transformers, also
called macros. In Lisp, the transformation is performed on structured
data, while in Forth the transformation is similar, but usually
implemented more directly, without an intermediate data step.

If the extension mechanism is easy to use, the base language and
compiler can be kept relatively minimal: a lot of choices can be left
to the user (programmer), building on top of the simple language
primitives. The choice of base language can then be made to maximise
flexibility (e.g. Scheme) or to minimise implementation complexity
(e.g. Forth).

In Lisp in particular, there are 2 composition mechanisms:

  (A) procedure: composition by lambda abstraction and application
  (B) notation: composition of macros (code transformers)

(A) is the basic composition mechanism and can be used to express
solutions to all problems, but is not always terribly succinct because
of its intrinsic notational limitation. (B) can be used to solve most
notational problems by translating arbitrary notation into base
language syntax(1). An example of a severe notational problem is the
expression of one language in terms of another one.

This brings us to Purrr, a Forth-style language implemented as a set
of concatenative macros that reduce to machine code. This emphasis on
macros is its main difference from the traditional Forth approach
which implements primitives directly in machine code, and employs an
on-target interpreter to compose them. Purrr shifts the implementation
of primitives and the composition mechanism to the compilation
phase. This has the advantage that all language primitives can be
represented as pure functions, which greatly simplifies compiler and
optimizer construction. The language is layered in the following way:

  * [PATTERN] The macro language primitives are written mostly in
    terms of a pattern matching language that operates on target
    (pseudo) code. This implements code generation, peephole
    optimization, partial evaluation, arbitrary code generator plugins
    and specification of intermediate (pseudo code) objects used for
    communication between code generators for compile time parameter
    passing.

  * [MACRO] The macro language is a purely functional concatenative
    language. It can be used to compose the primitive code generators
    into higher level code generator abstractions.

  * [FORTH] The forth semantics layer leaves the choice of code
    instantiation vs. macro composition to the programmer by using a
    composition syntax for macros that is identical (2) to that of
    procedure words (3).

  * [PARSE] The forth syntax layer provides a pattern substitution
    mechanism for prefix words in order to maintain standard Forth
    syntax (4).

The power of Purrr lies mostly in the combination of the purely
functional concatenative macro language, and the pattern matching
language used to define language primitives and partial evaluation in
terms of (pseudo) machine primitives, which also serves as a generic
plug-in point for arbitrary code generators.

--

(1) Whether (B) is necessary also depends on the base language. For a
strict language, composition of code that manipulates evaluation order
cannot be performed with (A) alone. (for example 'if' and 'delay' in
Scheme). In a lazy language like Haskell, (A) can be stretched a long
way, however, Haskell is full of syntactic sugar and there's a
Template Haskell library to perform (B).

(2) The instantiated Forth language differs from the Macro language
in that it has access to the data structure used for run-time function
composition: the return stack. In addition, Forth words can have
multiple entry points, which means code can fall through to the next
definition in a source file.

(3) Traditional forth tries to keep the use of macros to a minimum, to
limit (on-target) compiler complexity. This is not a constraint in the
cross-compiled Purrr language. 

(4) Traditional Forth employs a more direct and low-level
metaprogramming syntax (i.e. "postpone", "compile", "accept", ...),
and uses reflection to reduce code size. Purrr limits this kind of
reflection and replaces it by 'unrolled' language towers. It is still
an open question whether using standard Forth syntax for the Purrr
system is a good idea. Forth syntax is limiting for the way
metaprogramming is used in Purrr, and is also fairly bound to the
reflective structure of a standard Forth parser/compiler. In any case,
in Purrr, the Forth syntax is merely a front-end to a more general
s-expression based syntax for the Macro language, so it is always
possible to fall back onto this syntax to express particular
abstractions.


Entry: load vs. require
Date: Thu Jun  5 11:33:34 CEST 2008

One big shared namespace versus tree structured shielded
namespaces. What's best?

Tree-structured modules are a great way to manage namespaces in large
software projects, but there is something to say for the 'one big
namespace' approach when single instances are the rule and not the
exception, where programming meets hardware design.

For example:

   * generic interpreter code uses the words 'receive' and 'transmit'

   * specific toplevel code links generic code together by simply
     inlining into its namespace the interpreter code and a particular
     definition of these words.

The benefit of this pattern is that it doesn't need much red tape:
there is basically no namespace management. This can become a downside
when code gets big and needs to be more generic. However, in deeply
embedded programming there is usually only one static instance of a
certain object; in the case of the example it's the debugging serial
port. Moreover, the ratio of glue code necessary to implement the
binding to the code that does something useful can be high.

So why not use the simple mechanism until more abstraction is needed?
For example, generic interpreter code can be defined in a file that's
simply included. Whenever multiple instances or late binding is
necessary, this file is wrapped in a module to shield namespaces. This
is how it is solved right now: you're free to use load (include) or
require (proper module namespace management). The monitor kernel is
implemented with load.

On top of this, macros can be redefined, but compilation generates a
notification when it happens. This is basically hierarchical modules +
parameters, but without specifying in advance which behaviour can be
parameterized.


Entry: Simulator
Date: Thu Jun  5 14:05:22 CEST 2008

What I need is a language for simulator design, or more specifically,
a strategy for compiling target code + some semantics spec to host
executable C code for optimal performance.

An advantage is that what needs to be simulated is usually quite
low-level code with fairly specific space and time characteristics. So
basically, a state machine description language is necessary:
something that can be compiled to a massively parallel C program.

If there is one place in Staapl where partial evaluation is going to
make a big difference, it's there.


Entry: The Dataflow Language
Date: Thu Jun  5 18:10:53 CEST 2008

From the KRIkit project it's quite clear that some kind of dataflow
language makes sense for deeply embedded DSP applications. The main
problem I ran into during the project was that the abstraction level
was too low to write the modem code, while making the code more
abstract would cost 10x in execution speed.

So, some declarative dataflow language is a good addition to
Staapl. It's the only format that is easy to understand, and compiles
well to a RISC architecture.

But what syntax should that language have? The main problem that makes
a straightforward dataflow language impractical is expressing the
linking between code modules whenever a module has more than one
output value. Using a Scheme-like expression syntax, some form of
'let-values' is necessary to name output values once nested
expressions are no longer sufficient, due to the move from a
tree-structured data flow graph to a directed acyclic data flow graph.
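
A small Scheme sketch of the problem (the block names are made up): as
soon as a block has two outputs, nesting no longer works and the
wiring has to be named.

  ;; A hypothetical oscillator block with two output values:
  (define (oscillator phase)
    (values (sin phase) (cos phase)))

  ;; Wiring both outputs into the rest of the graph needs 'let-values':
  (let-values (((s c) (oscillator 0.1)))
    (+ (* 0.5 s) (* 0.5 c)))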

So, why not use a concatenative/Forth syntax? Instead of using RPN
notation as a direct specification of control flow, which would fix
the execution order of operations as specified, it could be used only
to build the (parameterized) dataflow graph, which could then be
compiled and re-serialized specifically for the target architecture.

When this is combined with higher order functions (or combining
forms), it gives the original Packet Forth idea: APL with Forth syntax.

I have argued before that writing DSP code in Forth is difficult, but
this problem can be simplified using higher order (list/array)
combinators. What usually deters is the inefficiency of such
composition mechanisms when they are implemented at run time. However,
when these combinators can be eliminated at compile time (a
first-order language with higher order macros), it might be feasible
to have both high-level programming AND efficient code.


  
Entry: Combining Forms and Higher Order Macros
Date: Sun Jun 15 12:25:15 CEST 2008

Coma is a template programming language: all words that occur in a
source file are essentially macros, and are converted to code only if
they occur in an instantiating context, i.e. a Forth word. (Coma is
the pure functional macro kernel of the Forth language dialect called
Purrr).

Classical macros expand code. The essential extra step to make program
specialization through macros useful is a reduction step: after
expansion, evaluate resulting code compositions at compile time.

The essential observation is that a functional language with normal
order reduction semantics makes partial evaluation a trivial
operation. Since evaluation order is unspecified, performing it partly
at compile time and partly at runtime poses no difficulty.

In a compositional/concatenative language, the reduction operation
becomes simpler compared to lambda reduction, due to the absence of
variable names. Other than that, the principles are the same: anything
done in Coma can be extended to expression languages with formal
parameters with a few extra substitution rules.

The real problem in partial evaluation is how to handle higher order
recursive functions. These might lead to infinite expansion. E.g. in
the deforestation paper, Wadler shows how to catch first order
recursion by identifying recurring expansion patterns, and replacing
them with new (local) definitions.

An alternative approach (Bananas and Lenses) is to solve the PE
problem at a higher level: by identifying (a limited set of) higher
order combinators and deriving a set of rules about how they combine,
program manipulation is possible on that higher level, instead of
working with the mostly unstructured recursion (goto!) in unfolded
combinator bodies.

In Backus' FP, all the combinators are fixed, and there is no
composition mechanism for combinators. According to Backus (Iverson?)
this would force programmers to learn to use the available operators
efficiently. My opinion is that limiting composition in any way, while
beneficial for program analysis, is not a good idea for usability. You
need macros, otherwise code is going to have recurrent patterns that
can't be abstracted. On the other hand, (expansion-only) macros are
just notational shortcuts, and can be expanded to yield programs in
their basic algebra form.

What I'd like to figure out is what set of data types + functional
forms would be good for certain classes of problems related to highly
resource constrained programming. How to reduce the space of programs
such that efficiency is guaranteed, but programs can still be written
in a high level declarative form.

references: the OBJ language, deforestation paper, Bananas and Lenses.




Entry: History and Semantics of Coma
Date: Wed Jun 25 16:28:39 CEST 2008

An interesting idea in

http://www.latrobe.edu.au/philosophy/phimvt/joy/j07rrs.html

is the re-introduction of stacks in the implementation of rewrite
rules. Manfred talks about giving Joy a rewrite semantics, without the
use of stacks, but eventually comes back to stacks as a way to
efficiently implement these rewrite rules.

So, it comes as no surprise that I've found rewrite rules to be an
interesting vehicle for defining the concatenative macro language
Coma:

  * Primitive semantics in Coma are provided by endomaps of target
    machine code programs. Each symbol (word separated by spaces) that
    occurs in a source file represents a pure function that transforms
    a list of machine instructions. Primitive functions are declared
    as pattern matching rules, yielding an eager rewrite system.

  * New Coma functions can be obtained by means of ordinary function
    composition:

       x == a b c

    which is syntactically represented by concatenation of function
    names, yielding a concatenative language.

  * The DERIVED semantics of symbols is obtained as the target program
    which results from the application of the function represented by
    the symbol to the empty program.

To make a concatenative language useful, some form of locality of
effect has to be introduced(1). Locality makes it possible to create
reusable components. A straightforward way of introducing locality is
to make the space of values on which the functions operate take the
form of a space of stacks of values.

For Coma, the choice is easy to make: 

  * Target programs are a concatenation of primitive machine
    instructions.

  * Rewriting functions operate on (possibly infinite) target programs
    by replacing a finite number of instructions M by a finite number
    of instructions N at the right end of a target program, leaving
    the leftmost instructions intact.

This makes the Macro language a purely functional concatenative stack
language: all its primitives operate on a stack of values, each value
representing a machine instruction.

That's nice and clean. The language makes it possible to implement a
peephole optimizing compiler which, next to code generation (macro
expansion), also performs some form of partial evaluation (target code
rewriting).

To perform partial evaluation correctly, care needs to be taken to
align the semantics of computations performed during compilation with
those of computations that need to be postponed to run time. For
example, the program [1 1 +] might be rewritten to [2], but [+] by
itself has no rewrite rule, so in order to preserve the proper derived
semantics of '+' it needs to be compiled as a program that adds two
numbers present on the run-time stack and replaces them with the
result(2).

Now, let's make it a bit more useful. In order to get interesting
properties for parameterized metaprogramming, we relax the semantics a
bit, allowing for the existence of code transformers without
associated derived semantics. They might lack derived semantics
because:

  * their domain does not contain the empty program.

     They do not have a purely code-generating form but need non-empty
     program input to rewrite to something else.

  * their codomain is not contained in the set of concatenative
    programs generated by the machine primitives.

     They generate intermediate instructions, which are compile time
     values, that might be collected by other rewrite rules to
     influence their code generation.

Such macros are called intermediate macros. Their closing space is
called the space of intermediate programs, which is generated by the
larger set of (real) target primitives combined with these (virtual)
intermediate primitives(3).

This allows the definition of 'real' macros: macros that (are
concatenative and) map the empty program to a program in the target
program space. Real macros can be instantiated on the target machine.

Using an (intermediate) program space that is larger than the target
program space gives a means to use intermediate macros for
constructing components of parameterized target programs: build real
macros by composing intermediate macros. It provides a parameter
passing mechanism for macros that can host values not possible to
represent in the target semantics(4). This makes it possible to factor
code generation patterns into smaller elements that can be recombined
in several ways.

What makes this dirty is that not all programs with valid primary
semantics produce valid derived semantics. In other words, not all
macros expand to valid programs. This pain is mostly alleviated
because this is always the result of insufficient or incorrect
parameterization. Some macros can be seen as parameterized code
generators that transform their arguments into programs, but they may
occur in program text with invalid arguments applied. This
disadvantage stems from the dynamic nature of the macro language.

Conclusion:

Using rewrite primitives and function composition, the Macro language
allows the conversion of a concatenative program to a value whenever
the concatenative program has a pure associated target semantics,
meaning it reduces to a stack of values.

However, the target semantics isn't pure, as target programs depend on
input available only at run time. As a consequence, not all programs
can be completely reduced. This partial reduction (partial evaluation)
is the target program.

Combine this with a larger compile time value space and associated
rewrite rules to get a powerful metaprogramming system.


(1) Or modify all of it in a uniform way. For example a language like
    FP or any kind of monad language might have primitive operations
    that modify all the state. For example a 'map' operation. However,
    I'm more interested in a model for a complete system, not an
    embedded sublanguage.

(2) For Purrr18, the semantics doesn't always match completely. For
    example, numbers and associated operations are bignums at compile
    time, but 8 bit integers for run time operations. This could be
    called 'projected semantics'. I've found this to be really useful,
    e.g. to compute constants. This combination of a well-defined
    macro semantics with a link to a (somewhat less powerful and
    approximate) target program semantics gives a very powerful
    metaprogramming system: the time at which computations happen can
    be largely ignored when trying to understand overall program
    semantics, and can be unignored when trying to understand its
    implementation (instantiation and approximation). Both are, so to
    speak, equally essential in resource constrained programming, less
    so in general programming where correctness/consistency usually
    takes precedence over implementation efficiency.

(3) For example, in the PIC18 architecture, there is no generic
    conditional branch instruction, but a whole family of such
    instructions optimized for different conditionals. In the code
    generation, this is solved by using a virtual intermediate
    condition value type that is combined with a conditional branch
    instruction to yield one of the specialized branch/skip
    instructions.

(4) Somewhat analogous to quarks and gluons in physics. Quarks can be
    bound to other quarks through gluon exchange, but are never seen
    in isolation.


--

Now, that's a nice story, but that's not how it happened :) The
specification of rewriting rules came naturally as a syntactic
abstraction for some previously manually coded peephole optimizations
in the first iteration of the Forth compiler. This rewrite system
however seemed way more useful than just for performing simple
optimizations, and by introducing pseudo target assembly instructions
it has been transformed into a different kind of two-level language
semantics: powerful compile time type system with an analogous run
time type system with 'projected semantics' derived from eager macro
rewrite rules.

The downside of this is that in order to write programs, one has to
make decisions about what to instantiate, and what not. It might be
interesting to try how far this can be automated: don't instantiate
what cannot be instantiated, but try to isolate subprograms that have
real semantics.


Entry: Staapl goals
Date: Wed Jul  2 22:13:52 CEST 2008

1. Practical: Build a framework for metaprogramming (multi-stage
   programming) in 'lisp-and-forth-style', which is taken to mean
   dynamically typed macros and simple stack-based machine
   models. This framework is supposed to be a practical solution and a
   way to build an intuition about Resource Aware Programming
   (RAP). In Staapl there is a balance between a static and a dynamic
   approach; this balance is borrowed from the PLT Scheme
   tradition: a static hierarchical module system for large scale code
   management, and dynamic interactive compilation, incremental code
   upload and target interaction. The concrete target architectures
   are small microcontrollers and DSP processors. The core could be
   called a dynamically typed, concatenative macro assembler (CMA).

The basic framework is as good as ready. There is a single test
backend for Microchip PIC18, and some bits and pieces for a C code
generator. Most changes in the last year were about implementation and
simplification: better integration with the PLT Scheme macro and
module system, and elimination of ad-hoc non-composable constructs.

2. Experimental: Design DSLs for resource (time/space) constrained
   programming. The idea is to make them fit inside the CMA idea: to
   design useful DSP/control domain specific languages which enable
   efficient compilation, without moving too far away from the
   hands-on bare metal access. 

Currently, there's Purrr, a Forth dialect, and Coma, a cleaner
s-expression based concatenative Macro language used to implement the
purely functional core of Purrr. Future work is mostly about program
transformations (fusing/deforestation/...) in Backus' FP and the
related "Algebra of Programs" framework, OBJ's "theories", and a look
into typed staged systems a la Taha's RAP research.



Entry: Projected Semantics
Date: Sun Jul  6 15:18:49 CEST 2008

 Q: Why is it useful?

It's one of those ill-specified, dynamically typed sharp knives that
give a lot in return when you know what you're doing. Overall, it's
not too different from ordinary macro-assembler expression evaluation,
albeit a bit more general since it works with values and partially
applied functions (e.g. ADDLW <num>).

 Q: Doesn't it lead to a mess?

Yes, if you don't realize that expression evaluation and value /
operation projection don't commute for all operations.  Diagrams
commute for addition and subtraction (e.g. for PIC18, equivalence
modulo 2^8), but multiplication and division (and bit shifts) break
this morphism.
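
For instance, checking the (modulo x 256) projection in Scheme:

  (modulo (+ 300 100) 256)               ; => 144
  (modulo (+ (modulo 300 256) 100) 256)  ; => 144, addition commutes

  (modulo (quotient 300 2) 256)          ; => 150
  (quotient (modulo 300 256) 2)          ; => 22, division does not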

 Q: Can I get some more hand-holding?

I don't know how to appropriately type this.


Entry: Sheep works!
Date: Wed Jul 23 19:23:52 CEST 2008

The Sheep synth, Purrr's test app with an attitude, is working on the
new application core. This means the Brood-5 refactoring has come to
an end, and it's time for a release.


Entry: Staapl 0.5 release notes
Date: Thu Jul 24 11:35:47 CEST 2008

Hello folks,

The Staapl project has come to a point where it is ready for public
scrutiny.

http://zwizwa.be/staapl

Staapl is a collection of abstractions for (meta)programming
microcontrollers from within PLT Scheme. The core of the system is a
programmable code generator structured around a functional
concatenative macro language. On top of this it includes a syntax
frontend for creating Forth-style languages, a backend code generator
for Microchip's PIC18 microcontroller architecture, and interaction
tools for shortening the edit-compile-test cycle.

This is the completion of the phase I goal: to build a practical,
well-factored, easily extensible base system upon which to build phase
II: experiments with domain specific languages for DSP and embedded
control.

From this point I will continue fixing bugs and polishing
documentation. An overview can be found here:

http://zwizwa.be/archive/staapl.html

The PIC18 Forth dialect is documented in the following papers:

http://zwizwa.be/archive/pic18-forth.pdf
http://zwizwa.be/archive/pic18-synth.pdf

Get the tarball at

http://zwizwa.be/archive/staapl-0.5.0.tar.gz


Entry: A simple example
Date: Sun Jul 27 11:53:19 CEST 2008

Staapl is now installed on PLT Planet, which is the central repository
for PLT Scheme user contributions, so it's very easy to install.  Get
the latest PLT Scheme at http://pre.plt-scheme.org You only need the
command line version (MzScheme).

To try it out in batch mode, create a file test.ss with the following
contents:

 #lang scheme/base
 (require (planet zwizwa/staapl/prj/pic18))
 (forth-compile ": inc 1 + ;")
 (save-ihex "test.hex")

Then execute it like this:

> mzscheme test.ss

This will locally cache the Staapl distribution, and create a file
test.hex containing the compiled Forth code ": inc 1 + ;". Check it
with gpdasm:

> gpdasm -p 18f1220 test.hex 
000000:  0f01  addlw	0x1
000002:  0012  return	0

This only uses the barebones compiler. To compile a Forth application
together with the boot monitor, replace the 'require line in the build
script with:

 (require (planet zwizwa/staapl/prj/pic18f1220-serial))

And compile again. The top file of the monitor code can be found here:

 http://zwizwa.be/darcs/staapl/staapl/pic18/monitor-p18f1220.f

These files are cached locally; the cache directory can be found with:

> mzscheme -p zwizwa/staapl/prj/pic18 -e pic18-lib

