This is the Staapl blog, a filtered ramblings file.
For more information see http://zwizwa.be/staapl

Staapl is also known under the working name Brood, which is
unfortunately impossible to google.

For discussions and comments, please join the mailing list at:
http://zwizwa.be/cgi-bin/mailman/listinfo/staapl-list


Entry: The new Staapl (Brood-5)
Date: Wed May 21 16:20:45 CEST 2008

The main motivation for the 5th Staapl iteration was name space
management. The objective was to turn all names that occur in .f files
into (Scheme) compile-time identifiers, and no longer use (Scheme)
run-time hash tables indexed by symbolic names.

The namespace is implemented by prefixing Scat/Macro/Purrr names, so
they can co-exist with and are essentially the same as Scheme
names. For the implementation of the Macro language, there are 2
different namespaces (scat) and (macro) that separate the semantics of
explicit compile time computations and postponed Forth macros in the
Purrr compiler.

As a happy side effect, Scat/Coma/Purrr now has a module system,
piggybacked on that of PLT Scheme. Relying more on Scheme's scoping
mechanism (lambda + modules) instead of ad-hoc hash table based
namespaces improves and simplifies the code quite a bit.

What I realized, is that thinking about code graph structure is a
whole lot more important than thinking about names! It's not names,
it's (nested and shadowing) bindings, the invisible lines that connect
declarations and references, and so build the structure of a
program. Like many things, in retrospect this is really obvious,
especially if you start from lambda calculus, but it had to be right
in my face before I came to realize its importance. Hence the slogan
"Scope is everything!" attributed to Daniel Friedman by Olin Shivers.

So, what does Staapl 5 look like? The Coma language is defined as a
collection of macros, which are (pure) Scat functions that operate on
a stack of PIC18 assembly code (Scat's parameter stack) and map it to
a new stack.
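
A minimal Scheme sketch of this idea (representation and names are
illustrative, not the actual Staapl API): a macro is a pure function
from a code stack to a new code stack, and concatenation of macros is
function composition.

  ;; The code stack is a Scheme list of pseudo assembly instructions,
  ;; most recently generated instruction first.
  (define (lit n)                ; macro compiling the literal n
    (lambda (code) (cons `(qw ,n) code)))

  (define (macro-append m1 m2)   ; concatenation = composition
    (lambda (code) (m2 (m1 code))))

  ((macro-append (lit 1) (lit 2)) '())  ; => ((qw 2) (qw 1))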

There are essentially 3 phases of compilation:

      (A) Compiler .ss source + Purrr .f source -> Coma functions.

      (B) Coma macro execution -> PIC18 code graph

      (C) Assembly -> binary code + addresses

Phases (B) and (C) can be called trivial. (B) just executes the macro
functions generated in (A) and collects the results, and (C) is a
fairly straightforward assembler that implements address allocation
and relaxation for instruction size optimization. 

At the end of phase (B) there is a target code graph structure, with
each node containing a slice of assembly code. At the end of phase (C)
each node is supplemented with binary code and a code address.

All the magic happens in phase (A). The architecture is a classical
layered bottom-up Scheme application, where (Scheme) macros form the
main abstraction mechanism to construct minilanguages for solving
specific problems. In addition, PLT Scheme's language module features
are used to implement Forth source file syntax on top of basic
concatenative language support.

   * Scat layer. A language layer on top of Scheme that implements a
     concatenative language with threaded state.

   * Coma layer. A collection of Scat functions that implements
     (local) machine code transformations.

   * Forth compiler layer: adds Forth-style control flow words and
     target code instantiation (macro execution) to the Coma language.
     A Forth program consists of declaration of code in terms of the
     macros that generate it.
       
   * On the syntax level, the Coma macro language and instantiated
     Forth words are available using a fairly standard Forth syntax+reader
     which allows the definition of new macro and forth compositions
     in a single .f file. This is analogous to Forth's dual mode
     words. Most of the PIC18 kernel software is written in this
     syntax. See pic18/*.f
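
For example, a single .f file can define both kinds in one place,
switching with the 'macro' and 'forth' mode words. A hedged sketch
(word names illustrative):

  macro
  : 2+    1 + 1 + ;   \ expands inline wherever it is used
  forth
  : inc2  2+ ;        \ instantiated as a callable target word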


The identifiers in an .f file only live in phase (A), the Scheme macro
expansion phase. This was a fundamental insight:

  * All names in a .f file are macros = Scheme functions bound to
    identifiers in a module namespace. As such they intermingle very
    well with other Scheme functions, which solves the compiler
    extension problem: you can build all the metaprogramming tools you
    want.

  * Each Forth word (target code entry point) is defined in terms of a
    composition of macros. Creating a .f file is about deciding which
    macros (program generators) to instantiate.

The focus is thus on macros. These macros are pure functions that
operate on rich dynamic data types, and thus have nicer properties
than concrete Forth code, which is very low-level. This gives the
following work flow for writing a Purrr application:

   1. Create a set of highly parameterized macros that is reusable for
      a certain application domain.

   2. Instantiate those macros into concrete code by filling in the
      parameters.

Both phases can yield re-usable code: not only the code generators
can be reused, but also the instantiation decisions. Target code size
is mostly determined by which words are inlined, and which words are
composed at run time using function calls.


Entry: Compiler specialization: bottom up with holes?
Date: Wed May 21 19:39:42 CEST 2008

Bottom-up code structuring (an acyclic graph of modules) improves
locality. To write code, you just need to think about the code at
hand, and the primitives provided by the lower layers. Functional code
written in this way tests easily because there is no elaborate test
environment with intricate dependencies to set up.

However, often the need arises to parameterize in the other
direction. This post explains the why and how of ``backpatching''
macros during compiler specialization.

The Staapl source code is organized in a bottom up way using PLT
Scheme's hierarchical module system. PLT Scheme supports another
organization method: units. These are like modules, but they are
parameterized and have to be explicitly linked. If you want pluggable
behaviour you basically have these alternatives:

  * modules + parameters
  * modules arranged in units
  * modules + units + classes (mixins)

I'm using the first: compiler code that needs to be specialized later
is implemented using dynamic parameters. However, I've run into
another problem which is specialization of individual code generators,
i.e. to add target-specific optimizations to the more generic
behaviour.

The code generators in Staapl are factored to a fine granularity: each
(Coma) language element is essentially a mini code transformer. The
functions implementing the macros are bound to prefixed names using
'define. This can be used in either a hierarchical module namespace or
a flat toplevel namespace.

The problem here is that it is difficult to anticipate exactly which
macros might need replacement during specialization. So, instead of
using units, make-parameter, explicit renaming and overriding, or a
more heavyweight object system, the functions implementing the macros
are stored in boxes. This has the following consequences:


  * It is possible to change the behaviour of ANY macro by replacing
    it with a more specialized version later. Here ``later'' is well
    defined. For (declarative) hierarchical modules it means higher up
    in the hierarchy. For (imperative) incremental load into the flat
    namespace it means the specialization is loaded after the generic
    code.

  * Whenever a specialization happens, it is logged to the console. If
    the need arises, this mechanism can be restricted later, since it
    happens in one place only: 'redefine!-ns-tx in staapl/ns-tx.ss.

  * The replacement has access to the previous definition through the
    name 'super, bound in the same (macro) namespace, so it can
    delegate.

  * To re-introduce some form of locality to prevent clashes between
    DIFFERENT specializations, each specialized compiler has its own
    namespace.
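
A minimal Scheme sketch of this boxing mechanism (names are
illustrative; the real implementation lives in staapl/ns-tx.ss):

  ;; Each macro name is bound to a box holding its implementation,
  ;; here a macro that generates no code at all.
  (define macro:nop (box (lambda (code) code)))

  ;; Specialization replaces the box contents; the replacement is
  ;; handed the previous definition, which plays the role of 'super.
  (define (specialize! b make-new)
    (set-box! b (make-new (unbox b))))

  (specialize! macro:nop
    (lambda (super)
      (lambda (code)
        (printf "specialized nop used\n")  ; extra behaviour...
        (super code))))                    ; ...then delegate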

This is OK because macros are semantically clearly defined entities
that can be replaced by target specific specializations without
changing the conceptual hierarchical structure of the system as a
whole. This specialization is orthogonal (an aspect).


SUMMARIZED: 

  In addition to controlled specialization (make-parameter), which can
  be limited locally (parameterize), and other mechanisms (renamed
  imports, units, classes, ...), Staapl allows the redefinition of an
  entire class of functions: the macros.


Entry: Code should be data
Date: Fri May 23 09:36:28 CEST 2008

Macros should be about DATA ENTRY. If you can stylize macros such that
what you are actually typing in the end can't be concentrated any
further (an all syrup squishee), then you basically have a configuration
file picking from your abstracted domain knowledge in a natural way.

In the PIC18 compiler, a nice example is the 'patterns' macro, which
provides an RPN pattern matching syntax to specify compile time
computation and code generation for primitive macros. This defines
Scat words that operate on target instructions:

(patterns (macro)
   ;; Define '-' to subtract two numbers.
   ;; Both operands known: subtract at compile time.
   (([qw a] [qw b] -)              ([qw (target: a b -)]))
   ;; Fold into a pending literal add: addlw (a - b).
   (([addlw a] [qw b] -)           ([addlw (target: a b -)]))
   ;; One literal operand: subtracting it = adding its negation.
   (([qw a] -)                     ([addlw (target: a -1 *)]))
   ;; Operand just fetched from memory: subtract it in place.
   (([save] [movf a 0 0] -)        ([bpf 0 STATUS 0 0] [subfwb a 0 0]))
   ;; No compile time information: emit a run-time subtract.
   ((-)                            ([subwf POSTDEC0 0 0])))

The input patterns are assembly code sequences, and the output is
again an assembly code sequence. The 'qw' pseudo operation Quotes a
data Word: it pushes a number to the data stack. Here '-' is either
optimized away, meaning the computation is performed fully or
partially at compile time, or it is compiled as a [subwf] instruction.
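
For example (an illustrative trace): compiling the source fragment
"1 2 -" first pushes two quoted words, after which the first rule
fires and the subtraction happens entirely at compile time:

  1    ( [qw 1] )
  2    ( [qw 1] [qw 2] )
  -    ( [qw -1] )          first rule: no run-time code at all

A fragment with no compile time information, say "foo -" where foo
leaves a run-time value, falls through to the last rule and compiles
a [subwf POSTDEC0 0 0] instruction.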



Entry: Unrolling Forth
Date: Mon May 26 14:20:11 CEST 2008

The most difficult task in Staapl has been finding a bridge between
the layered, non-reflective, cross-compiled structure of Coma (lexer,
parser, compiler, interpreter + Scheme code written using PLT's
hierarchical module system) and the more traditional self-hosted
reflective Forth approach.  This ``unrolling'' makes explicit
metaprogramming simpler.

Trying to do this I have come to appreciate more the elegance of the
standard reflective Forth approach.  Forth's compactness comes exactly
from leaving the intermediate representations out, and maximally
exploiting reflectivity.

It is not possible to ``unroll'' the reflectivity in Forth completely
so it fits in the layered Staapl approach.  In order to write an ANS
standard Forth, some of this reflectivity needs to be restored.  The
problem is mostly parsing words.  Traditionally, immediate words
(macros) have access to the input stream, which is not possible in the
current Staapl approach since lexing is a separate step.  After
lexing, only preprocessor macros (prefix parsers) have access to the
word stream, while ordinary macros can only generate (and transform)
machine code.



So, why does Staapl have to be so complicated?  The answer is
straightforward.  To me the simplicity of the purely functional Scat
model is more important than any inconveniences that are a consequence
of Coma being different from standard reflective Forth.  Everything is
built on top of Coma/Scat in layers to make metaprogramming easier,
and to integrate better with PLT's bias towards a static, layered
non-reflective approach.

However, I still find the concept of parsing words useful as a user
interface.  The benefits of having a somewhat standard ``Forth look''
outweigh the disadvantages of having a slightly different semantics
that requires a mechanism different from normal word composition (the
'substitutions construct). 

Nevertheless, I am still looking for an approach that would enable the
inclusion of ANS standard Forth into a Staapl project.  The road that
seems most promising is the simulation of a self-hosted ANS Forth
written in ANS Forth, where the core architecture (dictionary +
threading model) can be easily filled in.



Entry: Extract Method
Date: Mon May 26 14:43:19 CEST 2008

This article is about the benefits of programming in a concatenative
language, viewed from the point of programming as iterative
refactoring of a prototype.

Let's say composition and decomposition are the basic tools of any
engineering discipline:

   * build complex, specialized systems out of simple, generic parts
   * refactor a complex design to create a toolbox of generic parts

Taking apart a complex system that's made from simple parts is the
easiest. Putting it back together again is difficult, but simpler than
putting together a system without a basic plan. The most difficult
step is taking an evolved pot of spaghetti and untangling it to CREATE
a library of simple parts, and rebuilding the system from that.

Decomposition creates interfaces: when a monolithic part is split in
2, there is now a thing connecting them. Good decompositions can be
judged by the complexity of the interfaces they create. You do the
right thing if cutting at some point hides the stuff you want to
forget, but keeps open some sane way of interacting with the rest of
the world. Simple interfaces promote reuse.

In most programming languages, the procedure / function / message is a
basic unit of composition: it can be used to create a black box which
reacts to input and produces output. Decomposition or refactoring in
such languages amounts to the "extract method" pattern: split up a
complex uncontrolled interaction by identifying primitive operations,
and rewrite the complex interaction as a simpler interaction between
these primitives.

Looking at Martin Fowler's explanation in Refactoring: Improving the
Design of Existing Code:

    "You have a code fragment that can be grouped together. Turn the
    fragment into a method whose name explains the purpose of the
    method."

Important consequences of extracting methods:

   * compression and modularity: code will be smaller because
     duplication is removed: share NOW. this reduces local complexity.

   * it promotes reuse: creates a possibility that code can be shared
     by new nodes LATER, even outside of the current project.

   * documentation: the name annotates the code with a meaningful
     explanation. the better the factorization, the easier it becomes
     to give names.

An advantage of concatenative programming languages is that 'extract
method' is a simple cut and paste operation: the absence of names for
function parameters disposes of the hurdle of inventing them when
factoring out code. The stack, which 'binds' variables by position
inside a definition, does the same outside of it. There is no
difference.
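
A small illustration in Forth notation (word names hypothetical):
extracting 'fetch-byte xor' requires no invention of parameter names,
just cut and paste.

  : checksum   fetch-byte xor  fetch-byte xor ;

becomes

  : mix        fetch-byte xor ;
  : checksum   mix mix ;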



Entry: What is Partial Evaluation?
Date: Mon May 26 19:00:32 CEST 2008

Some clarification of terminology:

   * Currying
   * Partial Application
   * Partial Evaluation
   * Staging

These are all very related:

* Curry: move a parameter out of a function's parameter list to create
  a higher order function. I.e. for Scheme's multiple argument lambda:

     (lambda (a b) (+ a b)) -> (lambda (a) (lambda (b) (+ a b))). 

  In a language like ML or Haskell the default representation is
  curried (all functions are unary).

* Partial Application: given a higher order (curried) function, obtain
  a function of lower order by applying one or more arguments.

* Staging: moving computations between computation stages (i.e. from
  run-time to compile-time).

* Partial evaluation: Automatic staging based on input data available
  at compile time.
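
The first two are easy to make concrete in Scheme (staging and partial
evaluation have no such one-line illustration):

  ;; curried addition
  (define (add a) (lambda (b) (+ a b)))

  ;; partial application: fix the first argument
  (define add5 (add 5))

  (add5 3)  ; => 8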


Partial application and staging are somewhat related (they are both
about reducing applications), but differ mainly in the way the result
of the reduction is represented. Staging performs substitution on
expression syntax, while partial application typically creates an
abstract closure object comprised of a representation of the original
un-reduced expression and an environment of name->value bindings.


Sources:

http://lambda-the-ultimate.org/node/2266
http://srfi.schemers.org/srfi-26/mail-archive/msg00000.html
http://www.brics.dk/~hosc/local/HOSC-13-4-pp355-368.pdf
http://www.dina.kvl.dk/~sestoft/pebook/
http://citeseer.ist.psu.edu/taha99metaml.html



Entry: Evaluation order in Coma
Date: Mon May 26 20:33:30 CEST 2008


To set the stage, let's define a deeply embedded program as the
specialization of a general program in such a way that run-time
complexity (and so cost of supporting hardware) is significantly
reduced. For large reduction in cost/complexity (or a large enough
production volume) significant fixed engineering cost can be spent on
specialization. Deeply embedded programming is about spending time and
money on program specialization.

Partial evaluation (PE) is an operation on program text: it takes an
input program and static data, and automatically specializes the
program to the static data. It comes as no surprise that this problem
in general is not a simple one. Often it is easier to perform a part
of this problem manually using program templates.

What is called PE in the Purrr language is actually a combination of
straightforwardly implemented concatenative PE and an interface for
completely generic but explicit metaprogramming using instantiation of
parameterizable code templates. It could be called Partial Evaluation
with Holes (PEH).

So, how does it work?

       PE from (greedy) deterministic pattern matching
                             =
                 a typed template language

So, by fixing the algorithm used to implement PE, a language emerges
that is useful for other kinds of code generation. Let's spin that out
a bit.

PE in a concatenative language is quite straightforward: function
composition is associative, which makes evaluation order a parameter
to play with. Compositions of pure functions can be performed at
compile time to generate composite functions that are implemented more
efficiently, while other compositions have to be postponed to run time
due to their dependence on dynamic data (1).

This is because concatenative syntax abstracts away the composition
method: a function can always be trivially inlined, instead of being
invoked at run time through the run-time composition mechanism: the
machine's function call (2). When inlining multiple functions, there
can be an opportunity for specialization by moving some computations
to compile time. For example, inlining the functions [ 1 ], [ 2 ] and
[ + ] produces an inlined composition [ 1 2 + ] which can be replaced
by the composition [ 3 ]. This is automatic program specialization by
Partial Evaluation in its purest form.
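
A hedged Scheme sketch of such an eager '+' macro (the code stack is
represented as a list of pseudo instructions, top first; instruction
spellings are illustrative, not the actual Staapl implementation):

  ;; If the two most recent instructions are quoted literals, add them
  ;; at compile time; otherwise emit a run-time add against the
  ;; target's parameter stack.
  (define (plus code)
    (if (and (pair? code) (pair? (cdr code))
             (eq? (caar code) 'qw) (eq? (caadr code) 'qw))
        (cons (list 'qw (+ (cadar code) (cadadr code)))
              (cddr code))
        (cons '(addwf POSTDEC0 0 0) code)))

  (plus '((qw 2) (qw 1)))  ; => ((qw 3))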

In Purrr, the Partial Evaluator is not implemented as a separate
component. PE is a consequence of the actions of the machine model,
which is specified in terms of primitive Purrr macros. These implement
a map of recently generated code (the top of the compilation stack) to
new code to be generated (placed back on the compilation stack). The
primitives are expressed in a language with deterministic pattern
matching, which allows the specification of the following compiler
components:

  * target code generation
  * peephole optimization
  * partial evaluation
  * generic parameterized template instantiation

The first 3 could be counted as components of pure partial evaluation.
The last one however is not: it is an interface that connects the
concatenative macro language to explicit code generation tools. It
allows the use of templates that have no target semantics unless they
are parameterized. 


Why is this useful?

Say you want to implement 'cos' as a function of two arguments like

    cos ( angle scale -- value )

Realizing that a true 'cos' function is never used in the target code
because the scale can be fixed and is available at compile time, it
can be implemented as a template that generates a lookup table and
code to look up the value. If generic cosine routines become necessary
later, this template macro can be extended to compile a call to
library code in case the parameter is not available at compile time.
It is surprising how often this pattern occurs: due to the lack of
target support for specific primitive abstractions it is often easier
to write something as a template for specialized code. Note that this
is different from programming for non-embedded systems, where this
primitive functionality is usually available.
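
A minimal sketch of that staging decision in Scheme, where
'gen-cos-table and the call instruction are hypothetical helpers:

  ;; If the scale argument is a compile time literal [qw s], replace
  ;; it by lookup table code; otherwise compile a call to a generic
  ;; library routine.
  (define (cos-macro code)
    (if (and (pair? code) (eq? (caar code) 'qw))
        (append (gen-cos-table (cadar code)) (cdr code))
        (cons '(call _cos) code)))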

The advantage of doing it this way is that the code is easier to read:
code expresses semantics more easily without instantiation annotation
getting in the way. This annotation can be expressed somewhere else in
the form of 'forth' and 'macro' mode indicators. The disadvantage is
that a lot of code will be pushed towards implementation as a
macro. If this is taken too far, possible sharing might be
endangered. For that reason, moving between macro and instantiated
target code is made really straightforward in Purrr, but it remains an
explicit operation under programmer control.

Explicit code generation in Purrr is useful when

  * partial evaluation becomes too hard to do automatically
  * some on-target primitives are not available
  * algorithms are hard to express in concatenative syntax

So as long as it is possible to express a general algorithm in the
purely functional macro sublanguage, the built in PE can be used to
specialize the code. The advantage here is that the difference between
compile and run time can be silently ignored as an implementation
detail. However, in practice it might sometimes be easier to make the
code generation process a bit more explicit. In Purrr it is made very
straightforward to plug in arbitrary Scheme code for parameterized
code generation.


Conclusion

Stack languages are interesting for writing parameterizable low-level
code because the composition mechanism is so simple:

 * They are very straightforward to implement on the target
   architecture with a small run-time footprint of two stacks.

 * Automatic specialization through partial evaluation is very
   straightforward to implement off-target.

 * Implementing the code generator (including PE) using deterministic
   pattern matching exposes an interface that can be reused for
   plugging in arbitrary parameterized code generators.

In Purrr, the code generator language is Scheme. Within Scheme all of
Purrr and the underlying compiler is exposed: you can decide to
generate (pseudo) assembly code, Purrr code, or interface to external
code generators.

--

(1) Of course, the Purrr target semantics is not purely functional. It
    contains language primitives that introduce machine state
    (determined by world state) through a side channel into the
    concatenative language. This is no problem for PE, since it merely
    limits PE to the subset of pure functions.  Procedures that
    depend on other parts of the machine state aside from the
    (threaded) parameter stack simply have to be instantiated, and
    cannot be performed at compile time.

(2) Except when it interferes with the implementation of the run-time
    function composition method, i.e. modifies the return stack. A
    more correct statement would be that the subclass of pure
    functions can always be trivially inlined.


Entry: Forth and Metaprogramming
Date: Tue May 27 14:55:47 CEST 2008

This article introduces the general architecture of the Forth
dialect's language tower.

Using a highlevel language to describe a problem and compiling it down
to lower level code is such a ubiquitous abstraction mechanism it is
hardly noticed as one. But what happens when this abstraction
mechanism is made programmable?

Instead of using a fixed compiler for a fixed language, make the
compiler extensible using a 'plug-in' mechanism.  This is one of the
powerful ideas behind languages like Lisp and Forth, where the
extension mechanism is implemented in terms of code transformers, also
called macros. In Lisp, the transformation is performed on structured
data, while in Forth the transformation is similar, but usually
implemented more directly, without an intermediate data step.

If the extension mechanism is easy to use, the base language and
compiler can be kept relatively minimal: a lot of choices can be left
to the user (programmer), building on top of the simple language
primitives. The choice of base language then can be to maximise
flexibility (i.e. Scheme) or to minimise implementation complexity
(i.e. Forth)

In Lisp in particular, there are 2 composition mechanisms:

  (A) procedure: composition by lambda abstraction and application
  (B) notation: composition of macros (code transformers)

Here (A) is the basic composition mechanism and can be used to express
solutions to all problems, but is not always terribly succinct because
of its intrinsic notational limitation. The (B) mechanism can be used
to solve most notational problems by translating arbitrary notation
into base language syntax(1). An example of a severe notational
problem is the expression of one language in terms of another one.

The Forth dialect in Staapl is implemented as a set of concatenative
macros that reduce to machine code.  This has the advantage that all
language primitives can be represented as pure functions, which
greatly simplifies compiler and optimizer construction.  The language
is layered in the following way:

  * [PRIMITIVE CODE PROCESSING] The macro language primitives are
    written mostly in terms of a pattern matching language that
    operates on target (pseudo) code. This implements code generation,
    peephole optimization, partial evaluation, arbitrary code
    generator plugins and specification of intermediate (pseudo code)
    objects used for communication between code generators for compile
    time parameter passing.

  * [MACRO COMPOSITION] The macro language is a purely functional
    concatenative language (Coma, based on Scat) with extensions to
    implement Forth style structured programming words. It can be used
    to compose the primitive code generators into higher level code
    generator abstractions.

  * [INSTANTIATION] In a Forth file, one has the choice to define
    functionality as macros (2) or as instantiated code.  This is
    implemented using an identical (3) syntax to make it easy to
    switch between the two modes.

  * [PREFIX PARSER] The forth syntax layer provides a pattern
    substitution mechanism for prefix words in order to maintain
    standard Forth syntax (4).

The power of Purrr lies mostly in the combination of the purely
functional concatenative macro language, and the pattern matching
language used to define language primitives and partial evaluation in
terms of (pseudo) machine primitives, which also serves as a generic
plug-in point for arbitrary code generators.



Notes:

(1) Whether (B) is necessary also depends on the base language. For a
    strict language, composition of code that manipulates evaluation
    order cannot be performed with (A) alone. (for example 'if' and
    'delay' in Scheme). In a lazy language like Haskell, (A) can be
    stretched a long way, however, Haskell is full of syntactic sugar
    and there's a Template Haskell library to perform (B).

(2) Traditional Forth implements primitives in machine code and tries
    to keep the use of macros to a minimum, to limit (on-target)
    compiler complexity. This is not a constraint in the
    cross-compiled Purrr language, where all primitives are macros and
    no primitive run-time code is necessary.

(3) Almost identical. The instantiated Forth language differs from the
    Macro language in that it has access to the data structure used
    for run-time function composition: the return stack. In addition,
    Forth words can have multiple entry points, which means code can
    fall through to the next definition in a source file.

(4) Traditional Forth employs a more direct and low-level
    metaprogramming syntax (i.e. "postpone", "compile", "accept",
    ...), and uses reflection to reduce code size. Purrr limits this
    kind of reflection and replaces it by 'unrolled' language
    towers. It is still an open question whether using standard Forth
    syntax for the Purrr system is a good idea. Forth syntax is
    limiting to the way metaprogramming is used in Purrr, and is also
    fairly bound to the reflective structure of a standard Forth
    parser/compiler. In any case, in Purrr, the Forth syntax is merely
    a front-end to a more general s-expression based syntax for the
    Macro language, so it is always possible to fall back onto this
    syntax to express particular abstractions.


Entry: load vs. require
Date: Thu Jun  5 11:33:34 CEST 2008

One big shared namespace versus tree structured shielded
namespaces. What's best?

Tree-structured modules are a great way to manage namespaces in large
software projects, but there is something to say for the 'one big
namespace' approach when single instances are the rule and not the
exception, where programming meets hardware design.

For example:

   * Generic interpreter code is parameterized by the words 'receive'
     and 'transmit'.

   * Specific toplevel code links generic code together by simply
     inlining into its namespace the interpreter code and a particular
     definition of these words.

The benefit of this pattern is that it doesn't need much red tape:
there is basically no namespace management. This can be a downside
when code gets big and needs to be more generic. However, in deeply
embedded programming there is more often than not only one static
instance of a certain object; in the case of the example it's the
debugging serial port. Moreover, the ratio of glue code necessary to
implement the binding to the code that does something useful can be
high.

So why not use the simple mechanism until more abstraction is needed?
For example, generic interpreter code can be defined in a file that's
simply included. Whenever multiple instances or late binding is
necessary, this file is wrapped in a module to shield namespaces. This
is how it is solved right now: you're free to use load (include) or
require (proper module namespace management). The monitor kernel is
implemented with load.

On top of this, macros can be redefined, but compilation generates a
notification of this. This is basically hierarchical modules +
parameters, but without specifying in advance which behaviour can be
parameterized.


Entry: Simulator
Date: Thu Jun  5 14:05:22 CEST 2008

What I need is a language for simulator design, or more specifically,
a strategy for compiling target code + some semantics spec to host
executable C code for optimal performance.

An advantage is that what needs to be simulated is usually quite
lowlevel code with fairly specific space and time characteristics. So
basically, a state machine description language is necessary: something
that can be compiled to a massively parallel C program.

If there is one place in Staapl where partial evaluation is going to
make a big difference, it's there.


Entry: The Dataflow Language
Date: Thu Jun  5 18:10:53 CEST 2008

From the KRIkit project it's quite clear that some kind of dataflow
language makes sense for deeply embedded DSP applications.  The main
problem I ran into during the project is a too low abstraction level
to write the modem code.  Making the code more abstract at the Forth
level would have cost about 10x execution speed.  Solving this at the
macro level probably a better idea.  Some declarative dataflow or
array language might be a good addition to Staapl and a better match
to RISC architectures with more direct 2/3-operand register file
addressing modes than the 1-operand 8-bit PICs.

But what syntax should that language have? The main problem that makes
a straightforward dataflow language impractical is expressing the
linking between code modules whenever a module has more than one
output value.  Using a scheme-like expression syntax, some form of
'let-values' is necessary to name output values when nested
expressions are no longer sufficient due to the move from a
tree-structured data flow graph to a directed acyclic data flow graph.

Why not use a concatenative/Forth syntax?  Instead of using RPN
notation as a direct specification of control flow, which would fix
the execution order of operations as specified, it could be used only
to build the (parameterized) dataflow graph, which could then be
compiled and re-serialized specifically for the target architecture.
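
A toy Scheme sketch of the idea (representation illustrative): each
RPN word pushes a graph node referencing its inputs instead of
executing, so "a b + c *" builds a DAG that a separate pass can
schedule for the target.

  (define (push x s) (cons x s))
  (define (binop name s)   ; pop two nodes, push an operation node
    (push (list name (cadr s) (car s)) (cddr s)))

  (binop '* (push 'c (binop '+ (push 'b (push 'a '())))))
  ;; => ((* (+ a b) c))   the stack holds one DAG node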

When this is combined with higher order functions (or the macro
version: combining forms) this gives the original Packet Forth idea:
APL with Forth syntax.

I have argued before that writing DSP code in Forth is difficult, but
this problem can be simplified using higher order (list/array)
combinators.  What usually deters is the inefficiency of such
composition mechanisms when they are implemented at run time. However,
when these combinators can be eliminated at compile time (a
first-order language with higher order macros) it might be feasible to
have both highlevel programming AND efficient code.

A place to look for inspiration is probably Factor's data flow analysis.

  
Entry: Combining Forms and Higher Order Macros
Date: Sun Jun 15 12:25:15 CEST 2008

Coma is used as the pure functional macro kernel of the Forth language
dialect template called Purrr, which is further specialized to yield a
PIC18 Forth dialect.

Coma is essentially a template programming language: all words that
occur in a source file generate and process intermediate code when
they occur in an instantiating context, i.e. a Forth word definition.

Classical macros expand code.  The essential extra step to make
program specialization work is a reduction step: after expansion of
several primitives, examine the resulting code and reduce it by
evaluating part of it.  In Coma the expansion and reduction operations
are combined into a single step, specified as primitive pattern
matching words on intermediate code.

In a compositional/concatenative language, the reduction operation
becomes simpler compared to lambda reduction, due to the absence of
variable names.  Other than that, the principles are the same:
anything done in Coma can be extended to expression languages with
formal parameters with a few extra substitution rules.



What I am interested in is to see in what way this can be applied to
higher order programming.  Currently Coma is used mainly to support
Forth (a first order language), using a simple extension that allows
the implementation of Forth's structured programming on top of
conditional and unconditional jumps.

Alternatively, a more Joy-like language can be constructed on top of
Coma, where all control flow is built on top of recursion and the
'ifte combinator.

This is where I'm a bit in the dark.  It is possible to create
replacements for Forth's structured programming words by using higher
order combinators that perform partial evaluation on code quotations,
inlining them.  A construction like 

     (a b c) (d e f) ifte

would essentially compile to Forth's

     if  a b c  else  d e f  then

There are two main problems with this approach. One is whether 'ifte
should have a semantics when its arguments cannot be evaluated at
compile time.  It doesn't seem too difficult to allow code quotations
to _exist_ at run time; they are just code pointers.
However, allowing them to be _constructed_ at run time (the analogue
of closures) requires some run-time memory management (either full GC or
linear memory).

Another problem is explicit recursion.  The main difficulty about PE
is when to stop.  It's not possible to unroll recursive definitions
completely since this yields an infinitely large program.  Wadler
talks about this in the original deforestation paper, where such
infinite expansions are caught by introducing run-time recursion of
newly generated functions.  An alternative approach (Bananas and
Lenses) is to solve the PE problem at a higher level: by identifying
(a limited set of) higher order combinators and deriving a set of
rules about how they combine, program manipulation is possible on that
higher level, instead of working with the mostly unstructured
recursion (goto!) in unfolded combinator bodies.  In Backus' FP, all
the combinators are fixed, and there is no composition mechanism for
combinators. According to Backus (Iverson?)  this would force
programmers to learn to use the available operators efficiently.

What I'd like to figure out is what set of data types + functional
forms would be good for certain classes of problems related to
resource constrained programming.  How to reduce the space of
programs such that efficiency is guaranteed, but programs can still be
written in a high level declarative form.

References:

http://www.stanford.edu/class/cs242/readings/backus.pdf
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.36.1030
http://citeseer.ist.psu.edu/wadler90deforestation.html
http://citeseer.ist.psu.edu/meijer91functional.html


Entry: History and Semantics of Coma
Date: Wed Jun 25 16:28:39 CEST 2008

An interesting idea in

http://www.latrobe.edu.au/philosophy/phimvt/joy/j07rrs.html

is the re-introduction of stacks in the implementation of rewrite
rules. Manfred talks about giving Joy a rewrite semantics, without the
use of stacks, but eventually comes back to stacks as a way to
efficiently implement these rewrite rules.

So, it comes as no surprise that I've found rewrite rules to be an
interesting vehicle for defining the concatenative macro language
Coma:

  * Primitive semantics in Coma are provided by endomaps of target
    machine code programs. Each symbol (word separated by spaces) that
    occurs in a source file represents a pure function that transforms
    a list of machine instructions. Primitive functions are declared
    as pattern matching rules, yielding an eager rewrite system.

  * New Coma functions can be obtained by means of ordinary function
    composition:

       x == a b c

    which is syntactically represented by concatenation of function
    names, yielding a concatenative language.

  * The DERIVED semantics of symbols is obtained as the target program
    which results from the application of the function represented by
    the symbol to the empty program.
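
    For example, the derived semantics of "1 2 +" is the program
    obtained by applying the composition to the empty program:

      ()  --1-->  ([qw 1])  --2-->  ([qw 1] [qw 2])  --+-->  ([qw 3])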

To make a concatenative language useful, some form of locality of
effect has to be introduced(1). Locality makes it possible to create
reusable components. A straightforward way of introducing locality is
to make the space of values on which the functions operate take the
form of a space of stacks of values.

For Coma, the choice is easy to make: 

  * Target programs are a concatenation of primitive machine
    instructions.

  * Rewriting functions operate on (possibly infinite) target programs
    by replacing a finite number of instructions M by a finite number
    of instructions N at the right end of a target program, leaving
    the leftmost instructions intact.

This makes the Macro language a purely functional concatenative stack
language: all its primitives operate on a stack of values, each value
representing a machine instruction.

That's nice and clean. The language makes it possible to implement a
peephole optimizing compiler which, next to code generation (macro
expansion), also performs some form of partial evaluation (target code
rewriting).

To perform partial evaluation correctly, care needs to be taken to
align semantics that are a consequence of computations that are
performed during compilation, with computations that need to be
postponed to runtime. For example, the program [1 1 +] might be
rewritten to [2], but [+] has no rewrite rule, so in order to preserve
the proper derived semantics of '+' it needs to be rewritten as a program
that adds two numbers present on a run time stack, and replaces them
with the result(2).

Now, let's make it a bit more useful. In order to get interesting
properties for parameterized metaprogramming, we relax the semantics a
bit, allowing for the existence of code transformers without
associated derived semantics. They might lack derived semantics
because:

  * their domain does not contain the empty program.

     They do not have a purely code-generating form but need non-empty
     program input to rewrite to something else.

  * their codomain is not contained in the set of concatenative
    programs generated by the machine primitives.

     They generate intermediate instructions, which are compile time
     values, that might be collected by other rewrite rules to
     influence their code generation.

Such macros are called intermediate macros. Their closure space is
called the space of intermediate programs, which is generated by the
larger set of (real) target primitives combined with these (virtual)
intermediate primitives(3).

This allows the definition of 'real' macros as macros that (are
concatenative and) map the empty program to a program in the target
program space. Real macros can be instantiated on the target machine.

Using an (intermediate) program space that is larger than the target
program space gives a means to use intermediate macros for
constructing components of parameterized target programs: build real
macros by composing intermediate macros. It provides a parameter
passing mechanism for macros that can host values not possible to
represent in the target semantics(4). This makes it possible to factor
code generation patterns into smaller elements that can be recombined
in several ways.

What makes this dirty is that not all programs with valid primary
semantics produce valid derived semantics. In other words, not all
macros expand to valid programs. This pain is mostly alleviated
because this is always the result of insufficient or incorrect
parameterization. Some macros can be seen as parameterized code
generators that transform their arguments into programs, but can
possibly occur in program text with invalid applied arguments. This
disadvantage stems from the dynamic nature of the macro language.

Conclusion:

Using rewrite primitives and function composition, the Macro language
allows the conversion of a concatenative program to a value whenever
the concatenative program has a pure associated target semantics,
meaning it reduces to a stack of values.

However, the target semantics isn't pure, as target programs depend on
input available only at run time. As a consequence, not all programs
can be completely reduced. This partial reduction (partial evaluation)
is the target program.

Combine this with a larger compile time value space and associated
rewrite rules to get a powerful metaprogramming system.


(1) Or modify all of it in a uniform way. A language like FP or any
    kind of monad language might have primitive operations that modify
    all the state, for example a 'map' operation. However, I'm more
    interested in a model for a complete system, not an embedded
    sublanguage.

(2) For Purrr18, the semantics doesn't always match completely. For
    example, numbers and associated operations are bignums at compile
    time, but 8 bit integers for run time operations. This could be
    called 'projected semantics'. I've found this to be really useful,
    i.e. to compute constants. This combination of a well-defined
    macro semantics with a link to a (somewhat less powerful and
    approximate) target program semantics gives a very powerful
    metaprogramming system: the time at which computations happen can
    be largely ignored when trying to understand overall program
    semantics, and can be unignored when trying to understand its
    implementation (instantiation and approximation). Both are, so to
    speak, equally essential in resource constrained programming, less
    so in general programming where correctness/consistency usually
    takes precedence over implementation efficiency.

(3) For example, in the PIC18 architecture, there is no generic
    conditional branch instruction, but a whole family of such
    instructions optimized for different conditionals. In the code
    generation, this is solved by using a virtual intermediate
    condition value type that is combined with a conditional branch
    instruction to yield one of the specialized branch/skip
    instructions.

(4) Somewhat analogous to quarks and gluons in physics. Quarks can be
    bound to other quarks through gluon exchange, but are never seen
    in isolation.


--

Now, that's a nice story, but that's not how it happened :) The
specification of rewriting rules came naturally as a syntactic
abstraction for some previously manually coded peephole optimizations
in the first iteration of the Forth compiler. This rewrite system
however seemed way more useful than just performing simple
optimizations, and by introducing pseudo target assembly instructions
it has been transformed into a different kind of two-level language
semantics: a powerful compile time type system with an analogous run
time type system with 'projected semantics' derived from eager macro
rewrite rules.

The downside of this is that in order to write programs, one has to
make decisions about what to instantiate, and what not. It might be
interesting to see how far this can be automated: don't instantiate
what cannot be instantiated, but try to isolate subprograms that have
real semantics.


Entry: Staapl goals
Date: Wed Jul  2 22:13:52 CEST 2008

1. Practical: Build a framework for metaprogramming (multi-stage
   programming) in 'lisp-and-forth-style', which is taken to mean
   dynamically typed macros and simple stack-based machine
   models. This framework is supposed to be a practical solution and a
   way to build an intuition about Resource Aware Programming
   (RAP). In Staapl there is a balance between a static and a dynamic
   approach; this according to / borrowed from the PLT Scheme
   tradition: a static hierarchical module system for large scale code
   management, and dynamic interactive compilation, incremental code
   upload and target interaction. The concrete target architectures
   are small microcontrollers and DSP processors. The core could be
   called a dynamically typed, concatenative macro assembler (CMA).

The basic framework is as good as ready. There is a single test
backend for Microchip PIC18, and some bits and pieces for a C code
generator. Most changes in the last year were about implementation and
simplification: better integration with the PLT Scheme macro and
module system, and elimination of ad-hoc non-composable constructs.

2. Experimental: Design DSLs for resource (time/space) constrained
   programming. The idea is to make them fit inside the CMA idea: to
   design useful DSP/control domain specific languages which enable
   efficient compilation, without moving too far away from the
   hands-on bare metal access. 

Currently, there's Purrr, a Forth dialect, and Coma, a cleaner
s-expression based concatenative Macro language used to implement the
purely functional core of Purrr. Future work is mostly about program
transformations (fusing/deforestation/...) in the framework of Backus'
FP and the related "Algebra of Programs", OBJ's "theories", and a look
into typed staged systems a la Taha's RAP research.



Entry: Projected Semantics
Date: Sun Jul  6 15:18:49 CEST 2008

 Q: What?

All arithmetic expressions in Coma are evaluated as soon as possible.
However, the partial evaluation of arithmetic operators does _not_ use
the target's limited word length representation, so not all operations
yield the same result if they are evaluated at compile time. This is
in the same spirit as macro assembler expression languages.

 Q: Why is it useful?

It's a sharp knife that can be a great tool if you know how to handle
it.  Essentially, it allows Forth's staging annotations - the words [
and ] - to be dropped in Coma code.

 Q: Doesn't it lead to a mess?

Yes, if you don't know when an expression is evaluated.  This is well
specified however.  The problem is in realizing that the operations of
expression evaluation and value+operation projection _don't commute_
for all operations.  They do commute for addition and subtraction
(i.e. for PIC18, equivalence modulo 2^8), but involving multiplication
and division (and bit shifts) breaks the morphism.
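
A worked example, writing p(x) = x mod 256 for the 8-bit projection:

   compile time:  300 + 20 = 320    p(320) = 64
   run time:      p(300) + p(20) = 44 + 20 = 64    (commutes)

   compile time:  300 / 2 = 150     p(150) = 150
   run time:      p(300) / 2 = 44 / 2 = 22         (doesn't commute)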

 Q: Can I get some more hand-holding?

I don't know how to appropriately add constraints to this to eliminate
common mistakes.


Entry: Sheep works!
Date: Wed Jul 23 19:23:52 CEST 2008

The Sheep synth, Purrr's test app with an attitude, is working on the
new application core. This means the Brood 5 refactoring has come to
an end, and it's time for a release.


Entry: Staapl 0.5 release notes
Date: Thu Jul 24 11:35:47 CEST 2008

The Staapl project has come to a point where it is ready for public
scrutiny.

http://zwizwa.be/staapl

Staapl is a collection of abstractions for (meta)programming
microcontrollers from within PLT Scheme. The core of the system is a
programmable code generator structured around a functional
concatenative macro language. On top of this it includes a syntax
frontend for creating Forth-style languages, a backend code generator
for Microchip's PIC18 microcontroller architecture, and interaction
tools for shortening the edit-compile-test cycle.

This is the completion of the phase I goal: to build a practical,
well-factored, easily extensible base system upon which to build phase
II: experiments with domain specific languages for DSP and embedded
control.

From this point I will continue fixing bugs and polishing
documentation. An overview can be found here:

http://zwizwa.be/archive/staapl.html

The PIC18 Forth dialect is documented in the following papers:

http://zwizwa.be/archive/pic18-forth.pdf
http://zwizwa.be/archive/pic18-synth.pdf

The interactive aspect is documented here:

http://zwizwa.be/archive/pic18-interaction.html

Staapl is available on PLaneT as zwizwa/staapl. To install and compile
the test application:

  mzscheme -p zwizwa/staapl/examples/build-synth



Entry: A simple example
Date: Sun Jul 27 11:53:19 CEST 2008

Staapl is now installed on PLaneT, which is the central repository
for PLT Scheme user contributions, so it's very easy to install.  Get
the latest PLT Scheme at http://pre.plt-scheme.org (you only need the
command line version, MzScheme).

To try it out in batch mode, create a file test.ss with the following
contents:

 #lang scheme/base
 (require (planet zwizwa/staapl/prj/pic18))
 (forth-compile ": inc 1 + ;")
 (save-ihex "test.hex")

Then execute it like this:

> mzscheme test.ss

This will locally cache the Staapl distribution, and create a file
test.hex containing the compiled Forth code ": inc 1 + ;". Check it
with gpdasm:

> gpdasm -p 18f1220 test.hex 
000000:  0f01  addlw	0x1
000002:  0012  return	0

This only uses the barebones compiler. To compile a Forth application
together with the boot monitor, replace the 'require line in the build
script with:

 (require (planet zwizwa/staapl/prj/pic18f1220-serial))

And compile again. The top file of the monitor code can be found here:

 http://zwizwa.be/darcs/staapl/staapl/pic18/monitor-p18f1220.f

These files are cached locally in the directory printed by:

> mzscheme -p zwizwa/staapl/prj/pic18 -e pic18-lib


Entry: Onwards: concurrency / types / ans
Date: Wed Jul 30 08:51:04 CEST 2008

* This is interesting:
  http://www.transterpreter.org/docs/index.html

  I'm a bit sad I didn't get to the distributed system for Meshy last
  year, being punished for overestimating my practical electronics
  skills. But surely, concurrency is the next step for Staapl, next to
  some more static analysis. Maybe this occam style stuff can be
  combined with the Forth things?

* Reading about types lately, especially TAPL. This article summarizes
  it well: http://www.pphsg.org/cdsmith/types.html Types are "things
  to prove about programs". The thing is that this can really mean
  anything. According to Chris that's the big idea. This is
  significantly different from "types are sets" in the dynamic/lisp
  sense.

* ANS Forth. I need to provide a mechanism to build a standard Forth
  on top of the PIC18 Forth. This can be done in steps, the first one
  being a data-doubler to bring the 8-bit version to a 16-bit
  version.



Entry: Coma semantics
Date: Wed Aug  6 21:20:34 BST 2008

I've tried several times to explain the semantics of Coma, but I don't
get any further than a concrete step by step rewrite semantics.  Maybe
I should keep it at that. This post tries to clarify my ignorance on
the subject.

The goal is to translate a source concatenative term s into a target
concatenative term t.  Concatenative terms are either concatenations
of concatenative terms or primitive terms.

Let  s = s0 s1 ... | s'   source programs
     t = t0 t1 ... | t'   target programs

Here s, t are terms and s', t' are primitive terms. Denote the set of
s and t as S and T, and the set of s' and t' as S' and T'.
     
Basically, pure concatenative translation maps primitives to
primitives, S' -> T'.  The complete compilation expands a source
concatenation into a concatenation of primitives, which can then be
translated one by one into target primitives.

In order to make this a bit more interesting, we allow substitutions
to occur in the language and make the translation more general.
Substitutions replace a subconcatenation with another one.

   Define the translation system {p,r} consisting of a map p : S' -> T
   together with a collection r of rewrite rules on T.

The map p associates primitive source terms with concatenations of
primitive target terms.  The rewrite rules r allow the construction of
a reduction system that translates the concatenation of terms p(s0)
p(s1) ... obtained from the source program s = s0 s1 ... to a term t_n
in normal form.

The alternative approach where substitution rules are defined on S
followed by a direct primitive map S'->T' is equivalent, so we
can concentrate on rewrite rules in T only.

Now, the implementation in Coma is a bit different, and modeled after
standard Forth macros.

   Define the translation system {m} where m : S' -> T -> T maps each
   source primitive to a target code transformer T -> T (1)

In uncurried form, Coma replaces the concatenation operation by a
binary operator T x S' -> T.  Given a source program s = s0' s1' ...
expanded to primitives, this operator can be used to create a whole
program translation by folding the binary operator over s, initialized
with the empty program.
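
In Scheme, this fold is a one-liner (schematic; m maps a source
primitive to a code transformer):

  ;; compile : (listof source-primitive) -> target-program
  (define (compile src m)
    (foldl (lambda (s t) ((m s) t)) '() src))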

The question is, how are {p,r} and {m} related? The rewrite system r
obviously needs an algorithm to implement it and some constraints that
allow the existence of normal forms for all terms t.  I think the
interesting questions are then:

  * What are these constraints and how to derive the representation of
    the reduction algorithm {m} from a higher level declarative
    representation {p,r}?

  * Is it useful to implement it like that, or is a more direct
    pattern matching approach good enough? (2)


Notes:

(1) Stack semantics in Coma comes from an extra constraint that gives
    the code transformers T -> T some locality by having them operate
    only at the end of the primitive code concatenation, essentially
    viewing the target program string as a stack.  The specification
    language for primitive transfomers in Coma enforces this, by
    building on top of Scat.  However the general principle exposed
    here is applicable to more general point-free languages.

(2) The Coma pattern matching approach {m} is not exactly limited.  A
    central idea in Coma is that the (directly) expressed rewrite
    rules encode 2 kinds of equivalence relations.  The first one is
    partitioning of target programs with the same target semantics.
    For PIC18 the resulting reduction is mainly a reduction of program
    size. This is a fairly straightforward OPTIMIZATION technique.
    The second relation collapses concatenation of pseudo instructions
    that have no individual target semantics into constructs that do.
    This is the PARAMETERIZATION part of Coma.




Entry: Staapl in a nutshell
Date: Thu Aug  7 08:24:37 BST 2008

Staapl aims to work out the claim that Forth is a decent machine
language on which to build DSLs using macros in Scheme style.

  * [UNROLLED REFLECTION] It places the basic compositional forms
    of Forth (procedures, macros, parsing words) in a non-reflective
    declarative hierarchical language tower framework.  The main
    advantage is that it gives a declarative, acyclic, ``unrolled''
    layered language structure that is easier for metaprogramming.

  * [FUNCTIONAL PROGRAMMING] Staapl models Forth macros as a pure
    functional stack language, operating on a stack of (virtual)
    machine code instructions.  To this end it provides a pattern
    matching language for writing primitive code transformers in terms
    of instruction templates.

  * [PARTIAL EVALUATION] It uses the purely functional subset of Forth
    (words that operate on the top of the stack only) as a guide to
    derive a set of pattern matching rules that implement partial
    evaluation for this subset.  This is done by eagerly moving
    computations to compile time.

  * [PARAMETERIZED CODE GENERATION] The partial evaluation mechanism
    is generalized to use Scheme's data types, including its number
    system, to facilitate the construction of arbitrary code generator
    primitives.

  * [SCHEME] To avoid any arbitrarily introduced abstraction walls,
    all these mechanisms are available in and built on top of PLT
    Scheme modules.  This allows the construction of s-expression
    based DSLs by expanding code down to a concatenative machine
    language.

  * [INTERACTION] On top of the declarative structure, a toplevel
    interaction environment is provided that allows interactive code
    compilation and execution.  Care is taken to make this convenient,
    i.e. to provide simulation for common language constructs if they
    are not instantiated.

  * [CULTURE] Giving up Forth's traditional imperative self-reflective
    property and its absence of an intermediate representation
    dispenses with some minimalist elegance.  Staapl is very different
    in ``culture'' from traditional Forth.



Entry: Is classic code analysis necessary?
Date: Thu Aug  7 08:33:59 BST 2008

The Coma compiler includes a somewhat traditional, but at the moment
quite limited, intermediate code processor to implement optimizations.
Between compilation and assembly there is a point where the code is
structured as a control flow graph.  This is not used yet, but is
there to perform some optimizations not possible in the rewrite
framework (e.g. jump chaining, loop tricks, ...).
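As an example, jump chaining redirects a jump whose target block is
itself a single unconditional jump.  A minimal sketch in PLT Scheme,
assuming a made-up graph representation where labels map to
instruction lists (cycles are ignored here):

  #lang scheme
  (require scheme/match)

  ;; Follow chains of blocks that consist of a single jump.
  (define (chase labels l)
    (match (hash-ref labels l #f)
      [(list `(jump ,l2)) (chase labels l2)]  ; empty block: skip it
      [_ l]))                                 ; real code: stop here

  ;; Redirect every jump in a code block to its final destination.
  (define (chain-jumps labels code)
    (map (match-lambda
           [`(jump ,l) `(jump ,(chase labels l))]
           [i i])
         code))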

However, I'm still not convinced such a postprocessor is actually
necessary.  For very simple target architectures, I suspect the
generated code will already be quite close to optimal because:

  * Generating good code is the responsibility of the macro writer.

  * The partial evaluation for the functional subset works quite well
    to eliminate obvious parameterizations, and the pattern matching
    allows the specification of a good impedance match to the machine
    instruction set.

As practical evidence for this, the PIC18 rewrite language already
produces close to optimal code in the practical uses that I've
encountered, which is low level embedded programming with some simple
language extensions, mainly to generate lookup tables.

On the other hand, when implementing more elaborate DSLs on top of
this system, it might be interesting to perform proper data flow
analysis and register allocation.  For more complex targets however, I
suspect that Staapl loses its benefits.



Entry: Interaction: single chip vs. distributed applications
Date: Sat Aug  9 10:23:45 BST 2008

I've been thinking about the interaction model for Staapl.  In order
to make things more manageable, I will introduce two different cost
models for interactive development.  The main distinction is between
single chip and distributed applications.

* SINGLE CHIP

  Defined as an application where a single uC performs a well-defined
  task without much high-level communication to other chips, and can be
  developed and tested in isolation.

* DISTRIBUTED

  Defined as an application where a single uC is integrated in a
  communication network.  The communication between uCs is an
  essential part of the solution, and cannot be conveniently developed
  or tested in isolation.

These two application types deserve different solution patterns.  For
single chip applications, communication overhead necessary for
debugging should be minimized, while in distributed systems,
communication is essential to the operation and can be used to support
debugging.


To simplify matters, the rest of the exposition is specific to the
Microchip PIC line, but should be easily generalizable to other
manufacturers' chip lines.

For single chip applications, the aim should be to make the hardware
interface necessary for programming and debugging as simple as
possible, since it is not part of the solution.  If interactive
debugging is required, use a communication protocol that minimizes
on-chip resource usage.  For PIC this means using the native
Microchip programming interface, and using the same pins for the
debugging protocol.  PicKit2 [1][2] could be of use here.

For distributed applications, Staapl will concentrate on PICs that are
self-programmable: the ones that have an internal charge pump to
generate the Flash programming voltage. (Anything > PIC18 and some of
the PIC16).  Each PIC should be equipped once with a (read-only)
monitor that understands the basic network protocol and can update
firmware.  The programming host <-> network interface can then be
implemented using the single chip approach which interfaces to a
network entry node.

The most important lesson from past attempts at building an
easy-to-interface platform is that the need for expensive programming
and interaction hardware has been the biggest drawback.  I managed to
bring it down to 20 Euro per person for the serial interface + chip
with bootloader.  With the availability of the PicKit2 for 25 Euro it
is probably not possible to find a cheaper solution for bootstrapping
blank chips.  It looks like the protocol is open, so direct support in
Staapl should be feasible.
 
[1] http://www.microchip.com/stellent/idcplg?IdcService=SS_GET_PAGE&nodeId=1406&dDocName=en023805
[2] http://www.didel.com/


Entry: Forth's 2 modes: interpret and compile
Date: Sun Aug 10 14:28:24 CEST 2008

Classically, Forth distinguishes two modes:

  (I) Interpret mode
  (C) Compile mode

Staapl's Forth does the same.

For most languages, this difference is not exposed to the programmer.
Compiled languages usually expose only a single entry point to
compiled code (the "main" procedure), unless they are compiled as
libraries, in which case they are linked to other programs or
libraries and not invoked directly.

Forth however uses the observation that code entry points (program
procedures) behave as individual programs and thus can be executed
individually. This is both convenient and confusing.

Convenient: Interactive use of program components is convenient for
            testing.  Exposing only the compiled procedures comes
            essentially for free: a jump to the code address.
            Additionally, this mechanism allows simple implementation
            of reflective behaviour.  E.g. the word ":" is itself an
            accessible subprogram that starts the compilation of new
            code by switching to compile mode.

Confusing:  Not all language elements that are accessible during
            compilation can be accessed in interpret mode.  More
            specifically, macros (immediate words) cannot be used.
            This includes all structured programming words like
            conditionals and loops.

In order to eliminate this confusion, most compiled languages that
support an interactive console will compile commands given at the
console before executing the freshly compiled code.  This makes sure
all language elements are available at the console.

From that perspective, it's probably better to compare Forth's
interactive console to a unix command shell (executing compiled
programs, executing commands to compile new programs) than to an
interactive language shell that exposes all language features
available to code to be compiled.  In comparison, Haskell's ghci also
does not expose the full language in the repl, i.e. you cannot create
definitions.

In Staapl, interpret mode is not really necessary since the reflection
mechanism of standard Forth that is rooted in this interpret/compile
distinction is not present.  It is not that difficult to implement an
extra compile step to make sure console input can use the full
language, not only compiled words, to lessen confusion.  In fact a
previous version of Staapl included a ``scratch buffer'' used for
compiling code directly, and I'm planning a simplified language that
uses this as a primary interaction method.

However, the mechanism has been preserved because of the use of Flash
ROM code memory in contemporary microcontrollers with Harvard
architecture.  Such memory isn't as easily overwritten as RAM, which
makes generation of temporary code for translation of console commands
difficult.  The standard Forth interpret mode allows the execution of
compiled code without touching the Flash memory.  

This is extended in Staapl with a simulation layer which can offload
some functionality to the host if it is not present as compiled code.
The reason is that compared to standard Forth, Staapl's Forth is
heavily macro based, and some basic words might always be inlined,
making them unavailable in interpret mode.



Entry: Coma and names: reflection vs. staging
Date: Wed Aug 13 10:51:53 CEST 2008

Why does Staapl need two abstraction mechanisms: macros and prefix
macros?

The short answer: Coma's code generators are flat anonymous higher
order combinators.  All names are just notational shorthands that
identify combinator expressions.  Because this naming mechanism is
completely orthogonal to the semantics of the Coma language,
abstracting over names requires a separate mechanism, built on top of
the one that associates names with expressions.  See the next post.



1. TRANSFORMATION

Let's define a macro as a function that transforms program code into
program code.

A Scheme macro transforms a list of Scheme expressions into a new
Scheme expression.  The macros live in the same space as the
expressions that are being transformed.

A Coma macro transforms a stack of instructions into a stack of
instructions.  The instructions come from some target concatenative
language.


2. STAGING

One essential difference that is immediately observed is that program
transformation with Scheme macros is reflective.  Macros operate on
Scheme code and produce Scheme code.  The process of expansion is
repeated iteratively until all macros have been invoked and only
primitive forms remain.

In Coma, program transformation is staged.  Every transformer function
is a component of a meta language that manipulates a base language as
data.  These languages are distinct.  A Coma program defines a base
language transformer.

Note that the Coma language is a higher order functional language.  A
lot of parameterization can be solved using anonymous macros.  This
approach gives Coma a really simple structure that allows staging to
be treated in a very straightforward algebraic way, without having to
worry about bindings at all.  Absence of ``special forms'' makes it
possible to maintain a very direct relationship between syntax
(concatenation) and semantics (function composition).
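A sketch of this algebra, using the code stack representation from
the rewrite entries above (illustrative names, not Coma's actual
combinators):

  ;; Concatenation is function composition over the code stack:
  (define ((lit n) t)   (cons `(qw ,n) t))
  (define ((op name) t) (cons `(,name) t))
  (define (concat . ms)
    (lambda (t) (foldl (lambda (m t) (m t)) t ms)))

  ;; An anonymous macro: no names, no bindings, just composition.
  (define square (concat (op 'dup) (op 'mul)))
  ;; ((concat (lit 5) square) '())  =>  ((mul) (dup) (qw 5))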


3. PREPROCESSING

When one would like to parameterize over names in a Coma program, a
different abstraction mechanism is necessary.  Since Coma is embedded
in Scheme, the straightforward approach is to use Scheme's reflective
macro system to handle this.  This mechanism is introduced into
Staapl's Forth preprocessor as ``prefix macros'' or ``parsing words''.

Additionally, since names in Coma are all global, it makes a lot of
sense to build some scoping mechanism on top of this.  Coma uses both
PLT Scheme's hierarchical module namespace mechanism and an additional
subdivision into separate namespaces for compilation stages: (macro)
and (target), plus namespaces to embed alternative concatenative
semantics like (scat).  On top of this, it is possible to mix Coma and
Scheme code in a way that allows Scheme's lexical bindings to be
spliced into Coma code.
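A sketch of how this bottoms out in the Scheme macro system, reusing
concat and op from the sketch above.  The forth: form is a made-up
stand-in for Staapl's actual parsing words:

  ;; A Forth definition becomes a plain Scheme binding, so PLT
  ;; Scheme's module system takes care of scoping:
  (define-syntax-rule (forth: name code ...)
    (define name (concat code ...)))

  ;; ``: square dup *''  could be preprocessed into:
  ;; (forth: square (op 'dup) (op 'mul))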




Entry: The role of names in concatenative languages
Date: Wed Aug 13 12:56:04 CEST 2008

Lambda calculus is based on the two complementary operations of
abstraction and application.  Combinatory logic is a variant of the
lambda calculus expressed in terms of just application and a couple of
primitive lambda terms called combinators[1].  Further simplification is
possible: the combinators can be made ``flat'' by using only left
associative application[2].

In Scheme, a practical programming language based on the lambda
calculus, some forms are allowed to be abbreviated using names.  This
abbreviation mechanism is not essential to computation: it is merely a
notational device.  The core of the language consists of:

  * a set of primitive abstractions and values
  * function application
  * abstraction through LAMBDA
  * abbreviation of values through DEFINE

In Scheme there are 2 mechanisms that introduce names.  DEFINE
introduces abbreviations for expressions on the toplevel, while LAMBDA
uses names to create parameterized code, where the parameters are
later filled in using (explicit) applications.

Eliminating abstraction (LAMBDA) from Scheme can be done in the same
way as the lambda calculus -> flat combinators conversion.
Replacing application with only left associative application creates a
language in which the mechanism to define names (DEFINE) is
completely independent of the primary abstraction mechanism
(concatenation of terms).

This is quite remarkable.  In Scheme it is possible to unify these two
name mechanisms by interpreting DEFINE as the introduction of a lexical
variable by wrapping a Scheme expression in a LAMBDA.  However, in a
concatenative language we are now free to take another route: to take
names _out of the language_ completely, making them part of some meta
language.

This is what happens in Coma.  All the metaprogramming machinery can
be concentrated in a very simple combinatory ``algebra'' instead of
having to deal with scoping of names.
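Schematically, in Scheme (a sketch; the parameter stack
representation is made up for illustration):

  ;; With LAMBDA, naming and abstraction are entangled:
  (define (square x) (* x x))       ; introduces the variable x

  ;; Point-free over an explicit parameter stack, DEFINE is pure
  ;; abbreviation: no variable is bound, pf-square just names a
  ;; composition of combinators.
  (define (dup s) (cons (car s) s))
  (define (mul s) (cons (* (car s) (cadr s)) (cddr s)))
  (define (compose2 f g) (lambda (s) (g (f s))))
  (define pf-square (compose2 dup mul))
  ;; (pf-square '(5))  =>  (25)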

[1] http://en.wikipedia.org/wiki/Combinators
[2] http://www.eecs.usma.edu/webs/people/okasaki/pubs.html#jfp03



Entry: Programmer bootstrap, Staapler and PicKit2
Date: Wed Aug 13 17:11:34 CEST 2008

Byron Jeff gives some good arguments here that a zero-cost bootstrap
is not really possible anymore with contemporary hardware:

http://www.nabble.com/Re%3A-New-open-USB-programmer-p18963084.html

Jean-Daniel Nicoud mentions in the same thread that:

   It is not worth to loose time with serial or parallel programmers.
   I like the Pickit2 USB programmer, cheap and it can be used as an
   UART-USB interface when it has finished its work of programming
   your Hex file. It encouraged me to develop a set of boards that
   makes both learning and development more easy. Have a look to
   http://www.didel.com/ It's brand new, so I put the links on the top
   of the page.

From the PicKit manual, available at:
http://ww1.microchip.com/downloads/en/DeviceDoc/FirmwareV2-32-00.zip

   5.1.2    UART Mode

     UART Mode allows the PICkit 2 to be used as a simple UART.
     ICSPCLK = TX (data transmitted from the PICkit 2)
     ICSPDAT = RX (data received from target UART)

     Signal levels are inverted (ie Start Bit = GND) logic level
     signals between GND and VDD.

It would be interesting to try using this instead of writing
Staapler, or to let Staapler be the interface to networked programming
on top of a PicKit host connection.
