New patches: [unrevert anon**20130602173828 Ignore-this: 7c45f63f1de6765269953912177e15d4 ] hunk ./www/index.html 1 - - -Staapl - - -

Staapl

- - -

TL;DR

- -Staapl is a "macro assembler on steroids" for PIC18 microcontrollers. -It can be used for writing firmware optimized for size. It goes where -C can't go, and where assembler would be too complicated. It is based -on Scheme (code generator) and Forth (target language / machine -model). - -

About

- -

Staapl uses custom code generators (macros, partial evaluation, - metaprogramming, staging, ...) to generate code that is severely - resource-constrained. - -

Staapl is an extension of - the Racket programming - language, a Scheme dialect with strong support for embedded domain - specific languages (EDSL) written in terms of Scheme macros. A - Staapl target language is a collection of target code generator - macros. - -

The language for the - Microchip PIC18 is a - language based on - on Forth. - It has an optimizing peephole compiler, and a tethering mechanism - with interactive console for running-target code updates through the - PicKit2 programmer. - -

Unlike traditional Forth which is highly reflective, Staapl is - built - using macros - with phase separation, which breaks the circular reflection into - a directed language tower, creating a static semantics (the meaning - of the language doesn't change inside one module, in contrast with - original Forth's approach). Most of Staapl is organized around a - convenient abstract syntax based on a - typical - concatenative combinator language - like Joy. - This syntax can then be used as a base for different interpretations - or as a target for concrete syntax frontends. - -

(With the risk of being too vague..) While the PIC18 language is a - stack language, the basic code generator structure could support any - kind of "algebraic" language - (point-free - languages without variable bindings). Note that "point-free" only - refers to the basic compoisition mechanism. The core substrate of - Staapl is still Scheme, so primitive macros can have local variables - if that simplifies programming. This is used effectively in the - Staapl Forth for PIC18. - -

The basic idea is that point-free languages are easily manipulated - on the meta-level which makes them an ideal code generation target. - 1. Such languages have no variables that complicate code generation - due to scope issues (hygiene). 2. Thanks to the associativity of - the operations of concatenation (syntactically) and function - composition (semantically), computations are trivially partitioned - in compile time and run time. 3. Reliance on combinators for - program structure (as opposed to explicit recursion and loops) - allows for the derivation of program transformation laws that can be - used for program derivation or compilation to efficient - implementations. - - -

Status

- -

- The phase I goal - a usable system for PIC18, be it with some rough - edges - is reached, and the corresponding code is put in incremental - improvement mode. - -

This base will serve as the basis for phase II: further - experiments with point-free domain specific languages for - DSP/control, static verification, and more elaborate compilation - techniques to implement them. - -

At this time, Staapl is only practical for the PIC18 architecture. - While writing other ports should be straightforward, it is a lot of - work. I have some code for the 12 and 14 bit PIC cores, and I've - been looking into porting staapl to PIC24/30/33 (dsPIC). The main - hurdle is writing a peephole optimizer for the new architecture. - - -

Documentation

- -Documentation is loosely split into two parts: one that documents the -whole system from a Scheme p.o.v. and one that exposes only the -low-level Forth language view. - -( NOTE: Some of this is a bit outdated. If you're interested and -having trouble getting things running, I'd be glad to help out in -return for some feedback. Send me an email. ) - - - - -

Since this is a developer-oriented project with a lot of effort put - into clean organization, I suggest looking at the source code to - pierce through the top-layer of workflow glue. Staapl is structured - as a library with multiple entry points, not a monolithic - application. I encourage you to build your own workflow on top of - it, and integrate it with other tools. (And share that code!). - - -

Hire Me

- -

Staapl is best fit for producing turnkey burn-and-forget binary -firmware blobs, with no constraints on tools for team-integration. -The basic idea is that I can be cheap as long as you don't need to -modify the code yourself. Not that you can't or that I'm reluctant to -provide source code and build tools, but there's a steep learning -curve involved. ( Powerful tools do take some investment. ) - -

Projects

- -

- The PIC18 Forth language has been used in: -

- -

Download

- -

Staapl is part of PLaneT. To install the command line Forth - compiler frontend, - install PLT Scheme 4.1 - or newer (only MzScheme is required) and type this with - administrator privileges: -

-  mzscheme -p zwizwa/staapl/install
-
-This will install the staaplc wrapper application -alongside mzscheme. All Scheme modules are accessible -using standard PLaneT paths. - -

For the latest changes track -the darcs archive (hashed -darcs-2 format). If you need a snapshot that doesn't depend on darcs, -just ask. The Staapl source code, including that of its subprojects, -is released to the public under the terms of the GPL v2. Staapl is -written by Tom Schouten. - -

-Sun May 20 10:45:32 EDT 2012
-
- - - + +Staapl
1 Introduction
1.1 Forth to Machine Code
1.2 Scheme as Meta-language
1.2.1 Code and Composition
1.2.2 Primitives
1.3 Other RPN Languages
1.4 Representation
1.5 Assembler
1.6 More
2 PIC18 Forth
2.1 Syntax
2.2 Instantiation
2.3 Macro vs. Forth
2.4 Modules
2.5 Byte cells
3 Reference
scat:
macro:
op:
tv:
target-value?
target-word?
macro
patterns
compositions
state:  stack
target-value->number
op-apply
state->stack
state-print-code
5.3.4.7

Staapl

raco pkg install github://github.com/zwizwa/staapl/master

1 Introduction

Staapl is a Scheme to Forth metaprogramming system. To illustrate the +general idea we’re going to use a concrete application: the Forth +compiler for the 8-bit Microchip PIC18 microcontroller architecture. +This section uses REPL interaction with some example code written on +top of the compiler to demonstrate the code generation process.

 (require pic18/demo)

1.1 Forth to Machine Code

The code> form provided by the demonstration module +interprets the forms in its body as PIC18 Forth code, compiles them, +and prints out the resulting intermediate code with one instruction +per line.

> (code> 123)

[qw 123]

The instruction qw tells the target machine to load the +number 123 on the run-time parameter stack. It is short for +“quote word.” Providing a sequence of numbers in the code> +body will generate concatenated machine code, which is executed by the +target machine from top to bottom.

> (code> 1 2 3)

[qw 1]

[qw 2]

[qw 3]

Target code is represented in this intermediate form during the first +code generation pass to facilitate code transformations. It consists +of a mix of pseudo instructions and real PIC18 instructions. The code +generator will eventually clean up all occurances of qw +before attempting translation to binary machine code. The +pic18> form performs this extra step and shows real machine code +output.

> (pic18> 123)

[movwf PREINC0 0]

[movlw 123]

The first instruction stores the contents of the working register in +the 2nd position on the parameter stack, and the second instruction +replaces the contents of the working register with 123. +Again, concatenating compiler input produces concatenated output:

> (pic18> 1 2 3)

[movwf PREINC0 0]

[movlw 1]

[movwf PREINC0 0]

[movlw 2]

[movwf PREINC0 0]

[movlw 3]

The intermediate instruction set which contains the qw +instruction is useful for implementing partial evaluation +rules. When compiling a particular Forth word, the compiler can +inspect the code already compiled to determine if it can combine its +effect with the effect to be compiled.

> (pic18> +)

[addwf POSTDEC0 0 0]

> (pic18> 1 +)

[addlw 1]

> (pic18> 1 2 +)

[movwf PREINC0 0]

[movlw (1 2 +)]

This illustrates 3 different modes of computation. The first program +computes the addition at run-time, taking both input values from the +runtime stack and putting back the result. The second program adds +the literal value 1 to the top of the stack using a different +machine instruction. The third program doesn’t perform any run-time +computation at all and simply loads the result of the addition that +was computed at comple-time because both inputs to the addition were +available.

Note that in this last program the result of the compile-time +computation is not shown. Instead it shows a program (1 2 +) +that gives the result upon evaluation. The compiler doesn’t need to +know the exact value at this point. It only needs to know that the +value can be determined later when necessary. This is essential for +integration with the assembler, since these expressions might contain +symbolic representations of code locations that are not yet associated +to numerical addresses.

Using the intermediate form with the qw pseudo-instructions +to compile the Forth program 1 2 + shows the key idea: the +target code list can be interpreted as a parameter stack, with +the top of the stack at the bottom of the code list.

> (code> 1 2)

[qw 1]

[qw 2]

> (code> 1 2 +)

[qw (1 2 +)]

The code stack can be used as the argument passing mechanism +for a language of macros that is active at compile time. +Machine instructions then become datatypes of this language. The word ++ names a function that operates at compile time. +It inspects the code stack and if it finds one or two qw objects it +can use them as input to the addition operation and compile a simpler +run-time instruction.

1.2 Scheme as Meta-language

While traditional Forth has its own metaprogramming facilities, +combining it with a Scheme-based meta system gives the added advantage +of tying into a powerful lexical scoping mechanism. The +ability to assign local names to objects comes in handy when dealing +with complex data structures, which is sometimes difficult to do in a +combinator language like Forth or Joy. At the same time, the absence +of lexical binding forms in Forth code make it very suitable to be +handled as data, avoiding complications associated to name +capture.

This sections introduces the forms macro:, +compositions, patterns and tv: which +comprise the metaprogramming interface between the Forth and Scheme +languages in Staapl.

1.2.1 Code and Composition

A representation of a Forth program can be composed using the +macro: form.

(define code1 (macro: 123))
(define code2 (macro: 1 +))

 

The objects created by macro: are called concatenative +macros. These are functions that operate on compilation state +objects.

> code1

#state->state

The target code associated to the objects code1 and +code2 can be printed using the function +state-print-code which extracts a code stack from a state object +and prints it, and the function state:stack which produces a +state object with an empty code stack to serve as an initial state.

(define (print-code code-rep)
  (state-print-code (code-rep (state:stack))))

 

> (print-code code1)

[qw 123]

> (print-code code2)

[addlw 1]

It is seldom necessary to apply concatenative macros to state objects +manually since composition using the macro: form will usually +suffice. In practice such application is performed by the framework +when generating target code.

The forms residing in the code body of a macro: form have a +one-to-one correspondence to concatenative macros. Literals +found in the body of a macro: are mapped to compilation state +transformers that append qw instructions to the current code +list. Identifiers are mapped to function values using the +macro form, which fishes them out of the (macro) +dictionary.

> (macro +)

#state->state

Note that the objects produced by (macro +) and (macro: +) are different, although they have the same behaviour. The former +is a variable reference while the latter creates a new abstraction. +This is similar to the distinction between the Scheme expressions ++ and (lambda (a b) (+ a b)).

Composing macros that are not in the (macro) dictionary is +possible using the unquote operation. The macro: +form behaves similar to quasiquote. Unquoted objects come +from the Scheme lexical environment and are interpreted as macros.

(define code3 (macro: ,code1 ,code2))

 

> (print-code code3)

[qw (123 1 +)]

This illustrates the advantage of keeping target code in +delayed form. The effect of both pieces of code has been +combined into a single target operation.

It is possible to unquote Scheme values as literals by wrapping a +quote form around the unquote form.

(define value 42)
(define code4 (macro: ',value))

 

> (print-code code4)

[qw 42]

To define new identifiers in a particular dictionary, we can use the +compositions form.

(compositions (macro) macro:
 (inc  1 +)
 (dec  1 -))

 

> (print-code (macro: 100 inc))

[qw (100 1 +)]

> (print-code (macro: 100 dec))

[qw (100 1 -)]

The first two sub-forms in a compositions form indicate the +target dictionary and body compiler respectively. The rest of the +body consists of a list of lists, where the first element of each list +is an identifier to which a macro will be associated in the +dictionary, and the rest of the list is a code body that’s passed to +the body compiler form.

1.2.2 Primitives

The previous section describes how to compose existing code to create +new code by concatenation, and how to evaluate code into a form that +can be passed to the assembler. This section will describe how to +define primitive macros operating on stacks of target machine +instructions.

Creating an instance of a machine code instruction is done using the +op: form. It is exactly these objects that are produced when +concatenative macros are evaluated.

(define ins1 (op: addwf 42 0 0))
(define ins2 (op: addlw 123))
(define ins3 (op: qw 123))

 

Real opcodes can be passed to the assembler to produce binary output. +Pseudo instructions cannot.

> (op-apply ins1 0)

'(9258)

> (op-apply ins2 0)

'(3963)

> (op-apply ins3 0)

asm-pseudo-op: qw

However, pseudo instructions can be used to hold intermediate data +during the compilation phase. The following will illustrate the use +of a form to define new primitive operations. We’ll create a macro +add that will behave like the macro + encountered +before.

Creating new primtive macros is done with the patterns form. +Its first subform specifies the dictionary to which definitions are +associated. The rest of the forms contains lists of pattern and +template pairs. The following example defines the add macro +as not taking any input from the code stack, but producing an +addwf instruction as output.

(patterns (macro)
  ((add) ([addwf POSTDEC0 0 0])))

 

The templates in a pattern form are lists of forms that are +passed on to the op: form, resulting in lists of instruction +objects. Verifying if it works gives:

> (print-code (macro: add))

[addwf POSTDEC0 0 0]

This is an example of a non-optimizing macro which only performs code +generation. To add different behaviour for different input patterns, +extra clauses can be added.

(patterns (macro)
  (([qw a] add) ([addlw a]))
  ((add)        ([addwf POSTDEC0 0 0])))

 

> (print-code (macro: 123 add))

[addlw 123]

> (print-code (macro: add))

[addwf POSTDEC0 0 0]

When a qw instruction appears in the input, it is +deconstructed and its operand is used as the operand of a +addlw operation. The patterns form is built on the +PLT Scheme match form, deconstructing a stack of +instructions according to input patterns, and constructing lists of +instructions to be added to the compilation state to replace the +matching top of stack.

Upto now our add doesn’t really perform compile-time +computation other than selecting a different instruction based on the +presence of literal data. Using the tv: form we can add +proper computation when there are two literals available. For now, +think of tv: as an RPN calculator behaving as in:

> (target-value->number (tv: 1 2 +))

3

> (let ([a 100]
        [b 200])
    (target-value->number (tv: a b +)))

300

The tv: form will install wrappers to enable computations +that can only be performed after the assembler has assigned numerical +addresses to code labels. The use of this compile-time calculator +then leads to an implementation of the + macro whith 3 +evaluation modes:

(patterns (macro)
  (([qw a] [qw b] add) ([qw (tv: a b +)]))
  (([qw a] add)        ([addlw a]))
  ((add)               ([addwf POSTDEC0 0 0])))

 

> (print-code (macro: 1 2 add))

[qw (1 2 +)]

> (print-code (macro: 123 add))

[addlw 123]

> (print-code (macro: add))

[addwf POSTDEC0 0 0]

1.3 Other RPN Languages

Staapl contains a generic concatenative language parser in +staapl/rpn which is used to implement the languages +macro:, tv:, scat: and target:. +This language syntax can be extended with prefix parsers to +implement a Forth-style prefix syntax for defining words.

For example the scat: form creates unary functions that +operate on a state? object just like the macro: +form. The state object on which they operate contains a single stack +which can be accessed as a list with the state->stack +function. This is the same stack which serves as the code stack for +concatenative macros. The functions in the (scat) dictionary +are thin wrappers around Scheme functions implementing the same +behaviour but operating on the top of the state’s stack.

> (state->stack ((scat: 1 2) (state:stack)))

'(2 1)

> (state->stack ((scat: 1 2 +) (state:stack)))

'(3)

The form scat> is similar to the form code> in that +it creates a concatenative function and applies it to a state with an +empty stack before pretty-printing the result. The contents of the +parameter stack is printed with the topmost element on the right side.

> (scat> 1 2 3 4)

<4> 1 2 3 4

> (scat> 1 2 3 4 + + +)

<1> 10

> (scat> 123 dup)

<2> 123 123

> (scat> '(a b c))

<1> (a b c)

> (scat> '(a b c) car)

<1> a

> (scat> '(a b c) uncons)

<2> a (b c)

The functions generated by scat: and macro: are +compatible. However the macro: form creates functions that +operate on tagged values representing target machine instructions. +
> (scat> 123 ,(macro: 123))

<2> 123 (#<asm> 123)

> (scat> ,(macro: 123) dup)

<2> (#<asm> 123) (#<asm> 123)

1.4 Representation

Let’s open up the form macro: to see how it is implemented +in terms of Scheme expressions. All RPN compilers are built around +two forms: the definition compiler form rpn-lambda which +builds lambda expressions out of tagged tokens and the +dictionary compiler form rpn-parse which transforms token +sequences into tagged token sequences according to the RPN language +syntax.

To illustrate the implementation we use the pretty-expand +function to perform macro expansion and pretty-printing with toplevel +Scheme annotations like #%app removed. The function takes an +optional second argument to indicate the kind of expansion. The +default is full expansion using expand. In addition single +expansion using expand-once is used.

The form (macro: 1 2 +) expands to an rpn-lambda +form, which nests (folds) the subforms it receives.

(define stx1
  #'(rpn-lambda
     (macro-push 1)
     (macro-push 2)
     (scat-apply (macro +))))

 

> (pretty-expand stx1 expand-once)

(lambda (p)

  (macro-push 1 p (macro-push 2 p (scat-apply (macro +) p p))))

> (pretty-expand stx1)

(lambda (p)

  (let-values (((p) ((lit '1) p)))

    (let-values (((p) ((lit '2) p)))

      (let-values (((p) (macro/+ p))) p))))

The macro-push compiles a literal value while +scat-apply compiles a function application. The +macro form performs dictionary lookup by mapping the ++ identifier to macro/+ which is bound to the +concatenative macro that compiles an addition operation. The +lit function transforms a number into a concatenative macro +that will compile the number. Let’s see them in action individually.

> (pretty-expand #'(macro-push 1 state next))

(let-values (((state) ((lit '1) state))) next)

> (pretty-expand #'(scat-apply macro/+ state next))

(let-values (((state) (macro/+ state))) next)

The rpn-parse form can be used to attach semantics to the +core RPN syntax. Let’s illustrate it by creating a new RPN language +represented by the program: form.

(define-syntax-rule (program: code ...)
  (rpn-parse (dictionary
              (resolve-name)
              function
              immediate
              quoted-program
              program:
              (rpn-lambda))
             code ...))
(define-syntax-rule (dictionary d) 'd)
(define stx2 #'(program: 123 add (foo bar) '(a b c)))

 

> (pretty-expand stx2)

'(rpn-lambda

  (immediate 123)

  (function (resolve-name add))

  (quoted-program (program: foo bar))

  (immediate `(a b c)))

The (rpn-lambda) subform in the definition of the +program: form represents the initial content of the first +dictionary element. The rpn-parse form will interpret the +code forms it finds and append tagged forms to the current dictionary +entry.

The expansion illustrates four syntactic structures in the RPN +language. The 123 form represents a literal value compiled by +the immediate form. The identifier add represents a +function reference, with the name reference processed by the +resolve-name form and the result of that compiled by +function form. The (foo bar) form represents a +quoted program compiled by the program: form and quoted by +the quoted-program form. The '(1 2 3) form is a +quote of a data structure which is a literal value handled by the +immediate form.

The first subform of rpn-parse, here defined as +dictionary, is the form to which rpn-parse expands, +passing its dictionary output as subforms. We defined it here as a +quote operation which will stop expansion of the dictionary +form. By defining dictionary as something that will allow +further expansion, and by defining the other forms passed to +rpn-parse in the definition of program: we can +introduce a semantics.

Suppose the RPN language we want to define uses a list as a state +that’s passed from function to function to represent a stack. We can +define a simple stack machine using these forms:

(define-syntax-rule (dictionary entry) (begin entry))
(define-syntax-rule (resolve-name n) n)
(define-syntax-rule (function fn state subform)
  (let ((state (fn state)))
    subform))
(define-syntax-rule (immediate expr state subform)
  (let ((state (cons expr state)))
    subform))
(define-syntax-rule (quoted-program . a) (immediate . a))
(define (add lst)
  (cons (+ (car lst)
           (cadr lst))
        (cddr lst)))

 

> (pretty-expand stx2)

(begin

  (lambda (p)

    (let-values (((p) (cons '123 p)))

      (let-values (((p) (add p)))

        (let-values (((p)

                      (cons

                       (begin

                         (lambda (p)

                           (let-values (((p) (foo p)))

                             (let-values (((p) (bar p))) p))))

                       p)))

          (let-values (((p) (cons '(a b c) p))) p))))))

> ((program: 1 2 3) '())

'(3 2 1)

> ((program: 1 2 add) '())

'(3)

The RPN languages macro:, target:, scat:, +live:, tv: have the same syntactic structure with a +different semantics attached by means of forms that are passed to +rpn-parse in the definition of their program quotation forms. +They all use dictionaries with a single element. Forth compilation +however uses multiple entries. It uses extensions of +rpn-parse to create new dictionary entries (i.e. the +: word), representing named functionality.

1.5 Assembler

The assembler performs two tasks: converting symbolic representation +of machine code to binary and determining code addresses. The former +is relatively trivial. The staapl/asm module contains the +special form instruction-set which implements a convenient +syntax for creating assembler functions from strings representing bit +field layout.

(instruction-set
   (foo (a b c) "0101 aaaa bbbb bbbb" "1111 cccc cccc cccc"))
(define foo-op (op: foo 1 2 3))
(define start-address 0)

 

> foo-op

'(#<asm> 1 2 3)

> (for ((bin-op (in-list (op-apply foo-op start-address))))
    (printf "HEX: ~x\n" bin-op))

HEX: 5102

HEX: f003

This defines the foo assembler instruction as a 2-word +instruction with 5 bitfields: two constant bitfields and 3 variable +ones. Whitespace in the strings are ignored, and can be used for +nibble separation. Instructions defined using this form also define a +corresponding disassembler function.

Target address allocation is non-trivial due to a circular dependency. +Numerical constants can depend on numerical address values, while +address values are determined by the size of instruction words, which +can depend on the value of numerical constants (i.e. shorter +instructions are used for small literal values or short relative +jumps). To resolve this dependency, the assembler performs multiple +passes using a monotonic relaxation algorithm. On every pass it +re-evaluates expressions encoded in a target-value? object +until code addresses stabilize. These expressions can be created by +the compiler using the tv: form.

1.6 More

[To be documented].

At the core of the Forth code generator is an incremental compiler +which constructs a control flow graph as its first output pass. +Subsequent optimization passes operate on this structure.

The distribution contains some example Forth code and library routines +that implement the target side of host to target tethering.

There is a significant body of code to perform run-time target access +and incremental code compilation and upload. This functionality can +be presented to the programmer as a Forth console, which can access +host and target functionality.

Staapl provides the staaplc command line application which +can be used to compile stand-alone Forth programs to binary, for +upload with a microcontroller programmer tool.

2 PIC18 Forth

For the Forth language there exist ANSI and ISO standards. In good +Forth tradition however, the Staapl Forth for the Microchip PIC18 is +non-standard. It shouldn’t be particularly difficult to build a +standard layer on top of it, so we will ignore that issue for now, and +refer to our particular dialect as “the Forth”.

The Forth is based on a 2–stack machine model. One stack is +used to pass data between subroutines while the other is used to keep +track of procedure nesting and other control state. It is essential +that they are independent, as opposed to languages like C which +store parameters and return points on the same stack. For the PIC18, +the model is a thin layer atop the concrete machine architecture: an +8-bit flash memory based microcontroller. The implementation actually +uses three stacks: byte size parameter and retain stacks, and a +21–bit wide hardware return stack.

A Forth typically enables direct low–level machine resource access +while providing a base for constructing high–level abstractions. +Traditionally Forth is implemented in machine language. However, +since the Staapl Forth doesn’t need to be self–hosting and so has no +constraints about compiler complexity, it can be written almost +entirely in macro form, instantiated by a meta–system running on a +workstation instead of a small target system. Staapl’s Forth is +subroutine threaded, with each Forth word corresponding to a piece of +inlined machine code or a procedure call. See the introductory +section for more information about how this works.

It should be noted that the Forth language layer is written on top of +the concatenative macro layer Coma. Essentially it provides two extra +features: Forth-style syntax in the form of prefix parsing +words (like : and variable) that can manipulate +identifiers by modifying the meaning of subsequent words, and target +code instantiation which allows the result of macro expansion +to be invoked at runtime as a machine code procedure.

In the following we use the form forth> from the +demonstration module, which is similar to the forms code> and +pic18> in that it compiles and prints code, but without any +simplifications. It is different in two respects. It takes an input +string which is passed to the Forth lexer, and it prints out a +representation of a control flow graph which takes the form of labeled +assembly code, which each basic block starting with a label.

> (forth> ": foo 1 2 3 ;")

foo: 0022 0044 6EEC [movwf PREINC0 0]

0023 0046 0E01 [movlw 1]

0024 0048 6EEC [movwf PREINC0 0]

0025 004A 0E02 [movlw 2]

0026 004C 6EEC [movwf PREINC0 0]

0027 004E 0C03 [retlw 3]

In addition to the assembly code, it prints two columns. The first +one is the target Flash ROM address of the instruction, and the second +is the instruction’s binary encoding.

2.1 Syntax

The Coma language on which the Forth is based uses an s-expression +syntax as it is implemented using Scheme macros. A parser converts +Forth code into s-expressions. This parser is based on the +rpn-parse Scheme form we encountered before, and behaves +similarly to Forth parsing words (those that use accept). The +rpn-parser parser form and its exptensions translate a +linear token stream into an s-expression representation of a +Forth-style dictionary data structure. The token stream passed to +rpn-parse is produced by a Forth-style word/whitespace lexer +which is currently not extensible.

The dictionary is the same as we encountered in the single definition +compilers like macro:. However, it has a different +representation due to a different header structure. Also in Forth +parsing, the dictionary typically contains more than one entry, with +each one corresponding to a unique name.

Essentially, the dictionary abstracts two actions. Compilation using +rpn-lambda as encountered before, and identifier definition +using Scheme’s define form.

Compared to traditional Forth, the thing to note here is that the +Staapl Forth is not circular in the sense that input parsing isn’t +done by Forth words, but by separate entities called prefix +parsers. These parsing words are extensions of the +rpn-parse form, and implement behaviour associated to words +like : and variabe. The main reason for this +non–cyclic organization is to keep the Coma core as simple as +possible so it can can support languages other than Forth using +different preprocessing steps. A simple, non–reflective core +language is easier to meta–program. However, breaking these +reflective circles looses some flexibility. Extending the parser (or +lexer) requires the use of a different language. The upside is that +the language for rpn-parse extensions is a simple high–level +rewrite language based on Scheme’s rewrite macro system. In practice +defining new prefix parsing words isn’t often necessary: manipulating +target names (generating collections of words as a module) is better +done in Scheme.

To add some visual to this we can have a look at the intermediate +dictionary s-expression rpn-parse creates before it is +expanded further, and see how it is expanded into Scheme code. We did +a similar thing in the program: form defined in the +introduction. The last generated dictionary form is logged for +debugging purposes in the forth-dictionary-log parameter.

> (pretty-print (forth-dictionary-log))

#<procedure:void>

This shows a list of two forms, both are macro calls to +forth-word. The first form is an empty definition with only +a header. In the second form, the forms following the name +foo represent a macro call to rpn-lambda which is +the heart of the RPN to Scheme compilation. The forth-word +form creates the name foo in the proper namespace and will +use the wrapper function wrap-word to postprocess the result +of the rpn-lambda expansion.

Note that the dictionary data structure looks quite like a typical +Forth dictionary, with some header information describing the +semantics of the code body, and the body a sequence of instructions. +This reflects the typical structure of a Forth parser/compiler. The +parser effectively executes source code from left to right. The +prefix parser represented by the word : will take the next +identifier from the token stream and create a new dictionary entry +with the proper header forms that implement its semantics.

2.2 Instantiation

In contrast to standard Forth which supports only direct style +low–level macros based on the literal, postpone and +immediate words, the Staapl Forth uses a high–level +pattern–based syntax for macro definition. As a result, the syntax +for Forth words and macros is essentially the same. More +specifically, there exists a syntax subset for which compositions can +either be implemented as meta–system macros or as target words.

Forth words are defined using the :forth prefix parsing word, +while macros are defined using the :macro word. Both words +behave as the standard Forth : word. Alternatively the +forth and macro words can be used to switch the +meaning of the : word to mean :forth or :macro +respectively.

From here on we’ll use a more direct interface to the PIC18 compiler. +Code represented as

macro

: add + ;

forth

: plusone 1 add ;

will be passed as a single string to the forth-compile form. +It accepts a string of Forth code and updates the code registry. +We’ll access the registry using the code function:

(define (code)
  (code-print)
  (code-clear!))

 

> (code)

plusone:

0028 0050 0F01 [addlw 1]

0029 0052 0012 [return 0]

This is essentially how the compiler works in actual use, except that +code will not be prettyprinted, but exported as a binary file and an +associated symbolic dictionary.

In the Forth code above the word add was defined as a macro, +while plusone was defined as a target word. In the output +assembly code it can be seen that all references to add have +disappeared. Only its effect remains: the compilation of an +addlw instruction.

Using these language constructs, the Forth programmer can decide which +compositions are to be present as run–time callable code on the +target machine, and which are always inlined. Whether one or the +other is better depends on the context. Staapl’s Forth allows one to +make these decisions manually, with a convenient way to switch between +the two forms. Let’s define add as a target word instead.

forth

: add + ;

: plusone 1 add ;

> (code)

add: 002A 0054 24ED [addwf POSTDEC0 0 0]

002B 0056 0012 [return 0]

plusone:

002C 0058 6EEC [movwf PREINC0 0]

002D 005A 0E01 [movlw 1]

002E 005C D7FB [jsr 1 add]

The rule of thumb is that by default one creates Forth words as this +often leads to smaller code and is easier to debug, except when one +wants to use syntactic features not available to Forth words (like +lexical parameters for macros), or when macros yield smaller and/or +faster code because of specialization. The machine’s primitives are +by default implemented as macros that can perform inline code +specialization.

The Forth language in Staapl is inteded as the lowest level system +programming layer. It is essentially an assembler with a powerful +macro system. Other languages with a higher level semantics might be +built on top of Coma, but the Forth provides the manual low–level +access when it is needed.

2.3 Macro vs. Forth

The differences between macros and forth procedures lie in their +composition system. Target procedures use the machine instruction +pointer and allow manipulation of the hardware return stack, while +macros are essentially lexically scoped Scheme procedures.

Target words allow multiple entry points with code fallthrough. This +is specified in source code by not terminating a word definition with +; such that execution continues with the following word. +Target names behave essentially as assembler labels. Sequential code +in Forth source leads to sequential machine code. Macro words support +only single entry points, as they are separate entities without an +order relation. Macro words always need to be terminated by a +; word.

forth

: plustwo 1 +

: plusone 1 + ;

> (code)

plustwo:

002F 005E 0F01 [addlw 1]

plusone:

0030 0060 0F01 [addlw 1]

0031 0062 0012 [return 0]

Target words allow multiple exit points. Macro words simulate this +feature by jumping to the end of the macro expansion for each +occurance of the ; word.

Macro words allow the definition of lexical variables using the +| word. These variables are taken from the compile–time +value stack and behave as constants: occurence of the variable in code +pushes the referred value. Target words do not support this feature.

macro

: 3dup | x | x x x ;

forth

: example 123 3dup ;

> (code)

example:

0032 0064 6EEC [movwf PREINC0 0]

0033 0066 0E7B [movlw 123]

0034 0068 6EEC [movwf PREINC0 0]

0035 006A 0E7B [movlw 123]

0036 006C 6EEC [movwf PREINC0 0]

0037 006E 0C7B [retlw 123]

Note that lexical names can be introduced anywhere in a macro body. +They are valid until the end of the macro definition.

Lexical variables can be used to create macro closures. The words +[ and ] can be used to create an anonymous macro that +can be passed around as a compile time value. The word i +invokes such a quoted macro.

macro

: make-twotimes | x |

    [ x x ] ;         \ create a nameless macro

: invoke-add

    i + ;

forth

: example

    10 make-twotimes  \ create the macro closure

    invoke-add ;      \ pass it to another macro

> (code)

example:

0038 0070 6EEC [movwf PREINC0 0]

0039 0072 0C14 [retlw (10 10 +)]

The target run–time procedure composition mechanism is available to +macros. This is useful for writing macros that abstract non–standard +control flow. The exit word always means procedure exit, +while ; means procedure exit in Forth words, but designates +macro exit for macro words.

Another important distinction has to be made regarding arithmetic and +logic operations and numeric representation. Once instantiated, these +are limited to the target word size. During compile time computations +however, infinite precision Scheme bignums are used.

forth

: one 1000000000000

       999999999999 - ;

> (code)

one: 003A 0074 6EEC [movwf PREINC0 0]

003B 0076 0C01 [retlw (1000000000000 999999999999 -)]

Compare the opcode with a previous one and note that it really returns +the 8-bit value 1.

2.4 Modules

Forth code can be organized in modules. These are implemented as PLT +Scheme modules to make it easier to mix Scheme and Forth code in the +same project. Scheme modules for PIC18 usually contain macros that +expand to a pic18-begin form.

To import a Forth module in a Forth project use the require, +planet or staapl words. This example imports the +route macro and its supporting target code which is used in +the subsequent examples.

require staapl/pic18/route

> (code)

_route/e:

003C 0078 60EF [cpfslt INDF0 0]

003D 007A 6EEF [movwf INDF0 0]

003E 007C 50ED [drop]

_route: 003F 007E 44E8 [rlncf WREG 0 0]

0040 0080 26FD [addwf TOSL 1 0]

0041 0082 0B01 [andlw 1]

0042 0084 22FE [addwfc TOSH 1 0]

0043 0086 50ED [drop]

0044 0088 0012 [return 0]

This is the same as

staapl pic18/route

Modules will only be instantiated once, registering target code in a +central repository.

2.5 Byte cells

Standard Forth needs a cell size of at least 16 bits. The Staapl +Forth uses 8–bit cells for data and retain stack, and uses a separate +wide stack for holding return addresses.

Keeping the cell size equal to the native data word size makes it +simpler to directly represent the machine architecture as a stack +machine through a collection of macros that map to simple +instructions. The small cell size isn’t much of a problem for simple +computing tasks. However, target code addresses do not fit in a +single cell which leads to some non–standard behaviour. Most notably +the execute word now takes two parameters, taking the high +byte of the word address on the top of the parameter stack. Another +notable point is that loops using for next are +limited to 256 iterations.

The fact that a word address does not fit in a variable or a single +stack cell promotes the use of byte tokens for representing +delayed behaviour, which are implemented in terms of jump tables using +the route word.

forth

: one   1 ;

: two   2 ;

: three 3 ;

: four  4 ;

 

: interpret-byte

    route

       one . two . three . four ;

> (code)

one: 0045 008A 6EEC [movwf PREINC0 0]

0046 008C 0C01 [retlw 1]

two: 0047 008E 6EEC [movwf PREINC0 0]

0048 0090 0C02 [retlw 2]

three: 0049 0092 6EEC [movwf PREINC0 0]

004A 0094 0C03 [retlw 3]

four: 004B 0096 6EEC [movwf PREINC0 0]

004C 0098 0C04 [retlw 4]

interpret-byte:

004D 009A DFF1 [jsr 0 _route]

.L5: 004E 009C D7F6 [jsr 1 one]

004F 009E D7F7 [jsr 1 two]

0050 00A0 D7F8 [jsr 1 three]

0051 00A2 D7F9 [jsr 1 four]

The . word is the same as exit except that it does not +mark the subsequent code as unreachable. The reason this abstraction +works is that a procedure call followed by a procedure exit is always +compiled as a jump. This means the Staapl Forth is tail-recursion +safe.

3 Reference

 (require staapl/macro)

syntax

(scat: rpn-form ...)

Most basic RPN form with the following grammar

  rpn-form = id
  | literal
  | (rpn-form ...)
  | (quote rpn-data-form)
  | (unquote scheme-form)
  rpn-data-form = scheme-data-form
  | (unquote scheme-form)

Here literal is any form that is not an identifer or a list, +form is any Scheme form and scheme-form a Scheme +form expanding to a state transformer function.

syntax

(macro: word ...)

syntax

(op: word ...)

syntax

(tv: word ...)

Creates a composite function similar to +scat: except that it creates a target-value? which +when forced, evaluates the composite function on a state? +with an empty stack. If this evaluation produces a one-element stack +this value is returned. Otherwise an exception is raised. Also, +identifiers in the subforms word are interpreted as quoted +target-value? if they are bound in the lexical environment +of the form. Otherwise they come from the scat dictionary.

procedure

(target-value? tv)  boolean?

  tv : any/c
A delayed computation that can be forced with +target-value-eval or target-value->number. It may +depend on the target-word-address of any +target-word? it contains.

procedure

(target-word? tw)  boolean?

  tw : any/c
A symbolic +representation of a target code address. A node in a control flow +graph. Assigned a value by the assembler. Can be used in +target-value? expressions representing a numerical value.

syntax

(macro id)

syntax

(patterns dictionary (pattern template) ...)

syntax

(compositions dictionary composer (id word ...))

procedure

(state:stack)  state:stack?

Creates a state? +object with an empty stack.

procedure

(target-value->number tv)  number?

  tv : target-value?

procedure

(op-apply op org)  (listof number?)

  op : op?
  org : number?

procedure

(state->stack state)  list?

  state : state?

procedure

(state-print-code state)  void?

  state : state:stack?
 
Context: [announce move to github tom@zwizwa.be**20130531153109 Ignore-this: f862f6ca45656f6f764269aadf692ad ] [TAG github-fork tom@zwizwa.be**20130531004809 Ignore-this: 6c1005358dd02b9e3a0095825de83232 ] Patch bundle hash: dbd55fd9ab5231295ac83657af25f9c12af8dc40