Staapl

5.3.4.7

top ← prev up next →

Staapl

raco pkg install github://github.com/zwizwa/staapl/master

1 Introduction

Staapl is a Scheme to Forth metaprogramming system. To illustrate the +general idea we’re going to use a concrete application: the Forth +compiler for the 8-bit Microchip PIC18 microcontroller architecture. +This section uses REPL interaction with some example code written on +top of the compiler to demonstrate the code generation process.

(require pic18/demo)

1.1 Forth to Machine Code

The code> form provided by the demonstration module +interprets the forms in its body as PIC18 Forth code, compiles them, +and prints out the resulting intermediate code with one instruction +per line.

> (code> 123)
[qw 123]

The instruction qw tells the target machine to load the +number 123 on the run-time parameter stack. It is short for +“quote word.” Providing a sequence of numbers in the code> +body will generate concatenated machine code, which is executed by the +target machine from top to bottom.

> (code> 1 2 3)
[qw 1]
[qw 2]
[qw 3]

Target code is represented in this intermediate form during the first +code generation pass to facilitate code transformations. It consists +of a mix of pseudo instructions and real PIC18 instructions. The code +generator will eventually clean up all occurances of qw +before attempting translation to binary machine code. The +pic18> form performs this extra step and shows real machine code +output.

> (pic18> 123)
[movwf PREINC0 0]
[movlw 123]

The first instruction stores the contents of the working register in +the 2nd position on the parameter stack, and the second instruction +replaces the contents of the working register with 123. +Again, concatenating compiler input produces concatenated output:

> (pic18> 1 2 3)
[movwf PREINC0 0]
[movlw 1]
[movwf PREINC0 0]
[movlw 2]
[movwf PREINC0 0]
[movlw 3]

The intermediate instruction set which contains the qw +instruction is useful for implementing partial evaluation +rules. When compiling a particular Forth word, the compiler can +inspect the code already compiled to determine if it can combine its +effect with the effect to be compiled.

> (pic18> +)
[addwf POSTDEC0 0 0]

> (pic18> 1 +)
[addlw 1]

> (pic18> 1 2 +)
[movwf PREINC0 0]
[movlw (1 2 +)]

This illustrates 3 different modes of computation. The first program +computes the addition at run-time, taking both input values from the +runtime stack and putting back the result. The second program adds +the literal value 1 to the top of the stack using a different +machine instruction. The third program doesn’t perform any run-time +computation at all and simply loads the result of the addition that +was computed at comple-time because both inputs to the addition were +available.

Note that in this last program the result of the compile-time +computation is not shown. Instead it shows a program (1 2 +) +that gives the result upon evaluation. The compiler doesn’t need to +know the exact value at this point. It only needs to know that the +value can be determined later when necessary. This is essential for +integration with the assembler, since these expressions might contain +symbolic representations of code locations that are not yet associated +to numerical addresses.

Using the intermediate form with the qw pseudo-instructions +to compile the Forth program 1 2 + shows the key idea: the +target code list can be interpreted as a parameter stack, with +the top of the stack at the bottom of the code list.

> (code> 1 2)
[qw 1]
[qw 2]

> (code> 1 2 +)
[qw (1 2 +)]

The code stack can be used as the argument passing mechanism +for a language of macros that is active at compile time. +Machine instructions then become datatypes of this language. The word ++ names a function that operates at compile time. +It inspects the code stack and if it finds one or two qw objects it +can use them as input to the addition operation and compile a simpler +run-time instruction.

1.2 Scheme as Meta-language

While traditional Forth has its own metaprogramming facilities, +combining it with a Scheme-based meta system gives the added advantage +of tying into a powerful lexical scoping mechanism. The +ability to assign local names to objects comes in handy when dealing +with complex data structures, which is sometimes difficult to do in a +combinator language like Forth or Joy. At the same time, the absence +of lexical binding forms in Forth code make it very suitable to be +handled as data, avoiding complications associated to name +capture.

This sections introduces the forms macro:, +compositions, patterns and tv: which +comprise the metaprogramming interface between the Forth and Scheme +languages in Staapl.

1.2.1 Code and Composition

A representation of a Forth program can be composed using the +macro: form.

(define code1 (macro: 123))
(define code2 (macro: 1 +))

The objects created by macro: are called concatenative +macros. These are functions that operate on compilation state +objects.

> code1
#state->state

The target code associated to the objects code1 and +code2 can be printed using the function +state-print-code which extracts a code stack from a state object +and prints it, and the function state:stack which produces a +state object with an empty code stack to serve as an initial state.

(define (print-code code-rep)
(state-print-code (code-rep (state:stack))))

> (print-code code1)
[qw 123]
> (print-code code2)
[addlw 1]

It is seldom necessary to apply concatenative macros to state objects +manually since composition using the macro: form will usually +suffice. In practice such application is performed by the framework +when generating target code.

The forms residing in the code body of a macro: form have a +one-to-one correspondence to concatenative macros. Literals +found in the body of a macro: are mapped to compilation state +transformers that append qw instructions to the current code +list. Identifiers are mapped to function values using the +macro form, which fishes them out of the (macro) +dictionary.

> (macro +)
#state->state

Note that the objects produced by (macro +) and (macro: +) are different, although they have the same behaviour. The former +is a variable reference while the latter creates a new abstraction. +This is similar to the distinction between the Scheme expressions ++ and (lambda (a b) (+ a b)).

Composing macros that are not in the (macro) dictionary is +possible using the unquote operation. The macro: +form behaves similar to quasiquote. Unquoted objects come +from the Scheme lexical environment and are interpreted as macros.

(define code3 (macro: ,code1 ,code2))

> (print-code code3)
[qw (123 1 +)]

This illustrates the advantage of keeping target code in +delayed form. The effect of both pieces of code has been +combined into a single target operation.

It is possible to unquote Scheme values as literals by wrapping a +quote form around the unquote form.

(define value 42)
(define code4 (macro: ',value))

> (print-code code4)
[qw 42]

To define new identifiers in a particular dictionary, we can use the +compositions form.

(compositions (macro) macro:
(inc 1 +)
(dec 1 -))

> (print-code (macro: 100 inc))
[qw (100 1 +)]
> (print-code (macro: 100 dec))
[qw (100 1 -)]

The first two sub-forms in a compositions form indicate the +target dictionary and body compiler respectively. The rest of the +body consists of a list of lists, where the first element of each list +is an identifier to which a macro will be associated in the +dictionary, and the rest of the list is a code body that’s passed to +the body compiler form.

1.2.2 Primitives

The previous section describes how to compose existing code to create +new code by concatenation, and how to evaluate code into a form that +can be passed to the assembler. This section will describe how to +define primitive macros operating on stacks of target machine +instructions.

Creating an instance of a machine code instruction is done using the +op: form. It is exactly these objects that are produced when +concatenative macros are evaluated.

(define ins1 (op: addwf 42 0 0))
(define ins2 (op: addlw 123))
(define ins3 (op: qw 123))

Real opcodes can be passed to the assembler to produce binary output. +Pseudo instructions cannot.

> (op-apply ins1 0)
'(9258)
> (op-apply ins2 0)
'(3963)
> (op-apply ins3 0)
asm-pseudo-op: qw

However, pseudo instructions can be used to hold intermediate data +during the compilation phase. The following will illustrate the use +of a form to define new primitive operations. We’ll create a macro +add that will behave like the macro + encountered +before.

Creating new primtive macros is done with the patterns form. +Its first subform specifies the dictionary to which definitions are +associated. The rest of the forms contains lists of pattern and +template pairs. The following example defines the add macro +as not taking any input from the code stack, but producing an +addwf instruction as output.

(patterns (macro)
((add) ([addwf POSTDEC0 0 0])))

The templates in a pattern form are lists of forms that are +passed on to the op: form, resulting in lists of instruction +objects. Verifying if it works gives:

> (print-code (macro: add))
[addwf POSTDEC0 0 0]

This is an example of a non-optimizing macro which only performs code +generation. To add different behaviour for different input patterns, +extra clauses can be added.

(patterns (macro)
(([qw a] add) ([addlw a]))
((add) ([addwf POSTDEC0 0 0])))

> (print-code (macro: 123 add))
[addlw 123]
> (print-code (macro: add))
[addwf POSTDEC0 0 0]

When a qw instruction appears in the input, it is +deconstructed and its operand is used as the operand of a +addlw operation. The patterns form is built on the +PLT Scheme match form, deconstructing a stack of +instructions according to input patterns, and constructing lists of +instructions to be added to the compilation state to replace the +matching top of stack.

Upto now our add doesn’t really perform compile-time +computation other than selecting a different instruction based on the +presence of literal data. Using the tv: form we can add +proper computation when there are two literals available. For now, +think of tv: as an RPN calculator behaving as in:

> (target-value->number (tv: 1 2 +))
3
> (let ([a 100]
[b 200])
(target-value->number (tv: a b +)))
300

The tv: form will install wrappers to enable computations +that can only be performed after the assembler has assigned numerical +addresses to code labels. The use of this compile-time calculator +then leads to an implementation of the + macro whith 3 +evaluation modes:

(patterns (macro)
  (([qw a] [qw b] add) ([qw (tv: a b +)]))
  (([qw a] add)        ([addlw a]))
  ((add)               ([addwf POSTDEC0 0 0])))

> (print-code (macro: 1 2 add))
[qw (1 2 +)]
> (print-code (macro: 123 add))
[addlw 123]
> (print-code (macro: add))
[addwf POSTDEC0 0 0]

1.3 Other RPN Languages

Staapl contains a generic concatenative language parser in +staapl/rpn which is used to implement the languages +macro:, tv:, scat: and target:. +This language syntax can be extended with prefix parsers to +implement a Forth-style prefix syntax for defining words.

For example the scat: form creates unary functions that +operate on a state? object just like the macro: +form. The state object on which they operate contains a single stack +which can be accessed as a list with the state->stack +function. This is the same stack which serves as the code stack for +concatenative macros. The functions in the (scat) dictionary +are thin wrappers around Scheme functions implementing the same +behaviour but operating on the top of the state’s stack.

> (state->stack ((scat: 1 2) (state:stack)))
'(2 1)
> (state->stack ((scat: 1 2 +) (state:stack)))
'(3)

The form scat> is similar to the form code> in that +it creates a concatenative function and applies it to a state with an +empty stack before pretty-printing the result. The contents of the +parameter stack is printed with the topmost element on the right side.

> (scat> 1 2 3 4)
<4> 1 2 3 4
> (scat> 1 2 3 4 + + +)
<1> 10
> (scat> 123 dup)
<2> 123 123
> (scat> '(a b c))
<1> (a b c)
> (scat> '(a b c) car)
<1> a
> (scat> '(a b c) uncons)
<2> a (b c)

The functions generated by scat: and macro: are +compatible. However the macro: form creates functions that +operate on tagged values representing target machine instructions. +

> (scat> 123 ,(macro: 123))
<2> 123 (#<asm> 123)
> (scat> ,(macro: 123) dup)
<2> (#<asm> 123) (#<asm> 123)

1.4 Representation

Let’s open up the form macro: to see how it is implemented +in terms of Scheme expressions. All RPN compilers are built around +two forms: the definition compiler form rpn-lambda which +builds lambda expressions out of tagged tokens and the +dictionary compiler form rpn-parse which transforms token +sequences into tagged token sequences according to the RPN language +syntax.

To illustrate the implementation we use the pretty-expand +function to perform macro expansion and pretty-printing with toplevel +Scheme annotations like #%app removed. The function takes an +optional second argument to indicate the kind of expansion. The +default is full expansion using expand. In addition single +expansion using expand-once is used.

The form (macro: 1 2 +) expands to an rpn-lambda +form, which nests (folds) the subforms it receives.

(define stx1
  #'(rpn-lambda
     (macro-push 1)
     (macro-push 2)
     (scat-apply (macro +))))

> (pretty-expand stx1 expand-once)
(lambda (p)
  (macro-push 1 p (macro-push 2 p (scat-apply (macro +) p p))))
> (pretty-expand stx1)
(lambda (p)
  (let-values (((p) ((lit '1) p)))
    (let-values (((p) ((lit '2) p)))
      (let-values (((p) (macro/+ p))) p))))

The macro-push compiles a literal value while +scat-apply compiles a function application. The +macro form performs dictionary lookup by mapping the ++ identifier to macro/+ which is bound to the +concatenative macro that compiles an addition operation. The +lit function transforms a number into a concatenative macro +that will compile the number. Let’s see them in action individually.

> (pretty-expand #'(macro-push 1 state next))
(let-values (((state) ((lit '1) state))) next)
> (pretty-expand #'(scat-apply macro/+ state next))
(let-values (((state) (macro/+ state))) next)

The rpn-parse form can be used to attach semantics to the +core RPN syntax. Let’s illustrate it by creating a new RPN language +represented by the program: form.

(define-syntax-rule (program: code ...)
  (rpn-parse (dictionary
              (resolve-name)
              function
              immediate
              quoted-program
              program:
              (rpn-lambda))
             code ...))
(define-syntax-rule (dictionary d) 'd)
(define stx2 #'(program: 123 add (foo bar) '(a b c)))

> (pretty-expand stx2)
'(rpn-lambda
  (immediate 123)
  (function (resolve-name add))
  (quoted-program (program: foo bar))
  (immediate `(a b c)))

The (rpn-lambda) subform in the definition of the +program: form represents the initial content of the first +dictionary element. The rpn-parse form will interpret the +code forms it finds and append tagged forms to the current dictionary +entry.

The expansion illustrates four syntactic structures in the RPN +language. The 123 form represents a literal value compiled by +the immediate form. The identifier add represents a +function reference, with the name reference processed by the +resolve-name form and the result of that compiled by +function form. The (foo bar) form represents a +quoted program compiled by the program: form and quoted by +the quoted-program form. The '(1 2 3) form is a +quote of a data structure which is a literal value handled by the +immediate form.

The first subform of rpn-parse, here defined as +dictionary, is the form to which rpn-parse expands, +passing its dictionary output as subforms. We defined it here as a +quote operation which will stop expansion of the dictionary +form. By defining dictionary as something that will allow +further expansion, and by defining the other forms passed to +rpn-parse in the definition of program: we can +introduce a semantics.

Suppose the RPN language we want to define uses a list as a state +that’s passed from function to function to represent a stack. We can +define a simple stack machine using these forms:

(define-syntax-rule (dictionary entry) (begin entry))
(define-syntax-rule (resolve-name n) n)
(define-syntax-rule (function fn state subform)
  (let ((state (fn state)))
    subform))
(define-syntax-rule (immediate expr state subform)
  (let ((state (cons expr state)))
    subform))
(define-syntax-rule (quoted-program . a) (immediate . a))
(define (add lst)
  (cons (+ (car lst)
           (cadr lst))
        (cddr lst)))

> (pretty-expand stx2)
(begin
  (lambda (p)
    (let-values (((p) (cons '123 p)))
      (let-values (((p) (add p)))
        (let-values (((p)
                      (cons
                       (begin
                         (lambda (p)
                           (let-values (((p) (foo p)))
                             (let-values (((p) (bar p))) p))))
                       p)))
          (let-values (((p) (cons '(a b c) p))) p))))))
> ((program: 1 2 3) '())
'(3 2 1)
> ((program: 1 2 add) '())
'(3)

The RPN languages macro:, target:, scat:, +live:, tv: have the same syntactic structure with a +different semantics attached by means of forms that are passed to +rpn-parse in the definition of their program quotation forms. +They all use dictionaries with a single element. Forth compilation +however uses multiple entries. It uses extensions of +rpn-parse to create new dictionary entries (i.e. the +: word), representing named functionality.

1.5 Assembler

The assembler performs two tasks: converting symbolic representation +of machine code to binary and determining code addresses. The former +is relatively trivial. The staapl/asm module contains the +special form instruction-set which implements a convenient +syntax for creating assembler functions from strings representing bit +field layout.

(instruction-set
(foo (a b c) "0101 aaaa bbbb bbbb" "1111 cccc cccc cccc"))
(define foo-op (op: foo 1 2 3))
(define start-address 0)

> foo-op
'(#<asm> 1 2 3)
> (for ((bin-op (in-list (op-apply foo-op start-address))))
(printf "HEX: ~x\n" bin-op))
HEX: 5102
HEX: f003

This defines the foo assembler instruction as a 2-word +instruction with 5 bitfields: two constant bitfields and 3 variable +ones. Whitespace in the strings are ignored, and can be used for +nibble separation. Instructions defined using this form also define a +corresponding disassembler function.

Target address allocation is non-trivial due to a circular dependency. +Numerical constants can depend on numerical address values, while +address values are determined by the size of instruction words, which +can depend on the value of numerical constants (i.e. shorter +instructions are used for small literal values or short relative +jumps). To resolve this dependency, the assembler performs multiple +passes using a monotonic relaxation algorithm. On every pass it +re-evaluates expressions encoded in a target-value? object +until code addresses stabilize. These expressions can be created by +the compiler using the tv: form.

1.6 More

[To be documented].

At the core of the Forth code generator is an incremental compiler +which constructs a control flow graph as its first output pass. +Subsequent optimization passes operate on this structure.

The distribution contains some example Forth code and library routines +that implement the target side of host to target tethering.

There is a significant body of code to perform run-time target access +and incremental code compilation and upload. This functionality can +be presented to the programmer as a Forth console, which can access +host and target functionality.

Staapl provides the staaplc command line application which +can be used to compile stand-alone Forth programs to binary, for +upload with a microcontroller programmer tool.

2 PIC18 Forth

For the Forth language there exist ANSI and ISO standards. In good +Forth tradition however, the Staapl Forth for the Microchip PIC18 is +non-standard. It shouldn’t be particularly difficult to build a +standard layer on top of it, so we will ignore that issue for now, and +refer to our particular dialect as “the Forth”.

The Forth is based on a 2–stack machine model. One stack is +used to pass data between subroutines while the other is used to keep +track of procedure nesting and other control state. It is essential +that they are independent, as opposed to languages like C which +store parameters and return points on the same stack. For the PIC18, +the model is a thin layer atop the concrete machine architecture: an +8-bit flash memory based microcontroller. The implementation actually +uses three stacks: byte size parameter and retain stacks, and a +21–bit wide hardware return stack.

A Forth typically enables direct low–level machine resource access +while providing a base for constructing high–level abstractions. +Traditionally Forth is implemented in machine language. However, +since the Staapl Forth doesn’t need to be self–hosting and so has no +constraints about compiler complexity, it can be written almost +entirely in macro form, instantiated by a meta–system running on a +workstation instead of a small target system. Staapl’s Forth is +subroutine threaded, with each Forth word corresponding to a piece of +inlined machine code or a procedure call. See the introductory +section for more information about how this works.

It should be noted that the Forth language layer is written on top of +the concatenative macro layer Coma. Essentially it provides two extra +features: Forth-style syntax in the form of prefix parsing +words (like : and variable) that can manipulate +identifiers by modifying the meaning of subsequent words, and target +code instantiation which allows the result of macro expansion +to be invoked at runtime as a machine code procedure.

In the following we use the form forth> from the +demonstration module, which is similar to the forms code> and +pic18> in that it compiles and prints code, but without any +simplifications. It is different in two respects. It takes an input +string which is passed to the Forth lexer, and it prints out a +representation of a control flow graph which takes the form of labeled +assembly code, which each basic block starting with a label.

> (forth> ": foo 1 2 3 ;")
foo: 0022 0044 6EEC [movwf PREINC0 0]
0023 0046 0E01 [movlw 1]
0024 0048 6EEC [movwf PREINC0 0]
0025 004A 0E02 [movlw 2]
0026 004C 6EEC [movwf PREINC0 0]
0027 004E 0C03 [retlw 3]

In addition to the assembly code, it prints two columns. The first +one is the target Flash ROM address of the instruction, and the second +is the instruction’s binary encoding.

2.1 Syntax

The Coma language on which the Forth is based uses an s-expression +syntax as it is implemented using Scheme macros. A parser converts +Forth code into s-expressions. This parser is based on the +rpn-parse Scheme form we encountered before, and behaves +similarly to Forth parsing words (those that use accept). The +rpn-parser parser form and its exptensions translate a +linear token stream into an s-expression representation of a +Forth-style dictionary data structure. The token stream passed to +rpn-parse is produced by a Forth-style word/whitespace lexer +which is currently not extensible.

The dictionary is the same as we encountered in the single definition +compilers like macro:. However, it has a different +representation due to a different header structure. Also in Forth +parsing, the dictionary typically contains more than one entry, with +each one corresponding to a unique name.

Essentially, the dictionary abstracts two actions. Compilation using +rpn-lambda as encountered before, and identifier definition +using Scheme’s define form.

Compared to traditional Forth, the thing to note here is that the +Staapl Forth is not circular in the sense that input parsing isn’t +done by Forth words, but by separate entities called prefix +parsers. These parsing words are extensions of the +rpn-parse form, and implement behaviour associated to words +like : and variabe. The main reason for this +non–cyclic organization is to keep the Coma core as simple as +possible so it can can support languages other than Forth using +different preprocessing steps. A simple, non–reflective core +language is easier to meta–program. However, breaking these +reflective circles looses some flexibility. Extending the parser (or +lexer) requires the use of a different language. The upside is that +the language for rpn-parse extensions is a simple high–level +rewrite language based on Scheme’s rewrite macro system. In practice +defining new prefix parsing words isn’t often necessary: manipulating +target names (generating collections of words as a module) is better +done in Scheme.

To add some visual to this we can have a look at the intermediate +dictionary s-expression rpn-parse creates before it is +expanded further, and see how it is expanded into Scheme code. We did +a similar thing in the program: form defined in the +introduction. The last generated dictionary form is logged for +debugging purposes in the forth-dictionary-log parameter.

> (pretty-print (forth-dictionary-log))
#<procedure:void>

This shows a list of two forms, both are macro calls to +forth-word. The first form is an empty definition with only +a header. In the second form, the forms following the name +foo represent a macro call to rpn-lambda which is +the heart of the RPN to Scheme compilation. The forth-word +form creates the name foo in the proper namespace and will +use the wrapper function wrap-word to postprocess the result +of the rpn-lambda expansion.

Note that the dictionary data structure looks quite like a typical +Forth dictionary, with some header information describing the +semantics of the code body, and the body a sequence of instructions. +This reflects the typical structure of a Forth parser/compiler. The +parser effectively executes source code from left to right. The +prefix parser represented by the word : will take the next +identifier from the token stream and create a new dictionary entry +with the proper header forms that implement its semantics.

2.2 Instantiation

In contrast to standard Forth which supports only direct style +low–level macros based on the literal, postpone and +immediate words, the Staapl Forth uses a high–level +pattern–based syntax for macro definition. As a result, the syntax +for Forth words and macros is essentially the same. More +specifically, there exists a syntax subset for which compositions can +either be implemented as meta–system macros or as target words.

Forth words are defined using the :forth prefix parsing word, +while macros are defined using the :macro word. Both words +behave as the standard Forth : word. Alternatively the +forth and macro words can be used to switch the +meaning of the : word to mean :forth or :macro +respectively.

From here on we’ll use a more direct interface to the PIC18 compiler. +Code represented as

macro
: add + ;
forth
: plusone 1 add ;

will be passed as a single string to the forth-compile form. +It accepts a string of Forth code and updates the code registry. +We’ll access the registry using the code function:

(define (code)
(code-print)
(code-clear!))

> (code)
plusone:
0028 0050 0F01 [addlw 1]
0029 0052 0012 [return 0]

This is essentially how the compiler works in actual use, except that +code will not be prettyprinted, but exported as a binary file and an +associated symbolic dictionary.

In the Forth code above the word add was defined as a macro, +while plusone was defined as a target word. In the output +assembly code it can be seen that all references to add have +disappeared. Only its effect remains: the compilation of an +addlw instruction.

Using these language constructs, the Forth programmer can decide which +compositions are to be present as run–time callable code on the +target machine, and which are always inlined. Whether one or the +other is better depends on the context. Staapl’s Forth allows one to +make these decisions manually, with a convenient way to switch between +the two forms. Let’s define add as a target word instead.

forth

: add + ;

: plusone 1 add ;

> (code)
add: 002A 0054 24ED [addwf POSTDEC0 0 0]
002B 0056 0012 [return 0]
plusone:
002C 0058 6EEC [movwf PREINC0 0]
002D 005A 0E01 [movlw 1]
002E 005C D7FB [jsr 1 add]

The rule of thumb is that by default one creates Forth words as this +often leads to smaller code and is easier to debug, except when one +wants to use syntactic features not available to Forth words (like +lexical parameters for macros), or when macros yield smaller and/or +faster code because of specialization. The machine’s primitives are +by default implemented as macros that can perform inline code +specialization.

The Forth language in Staapl is inteded as the lowest level system +programming layer. It is essentially an assembler with a powerful +macro system. Other languages with a higher level semantics might be +built on top of Coma, but the Forth provides the manual low–level +access when it is needed.

2.3 Macro vs. Forth

The differences between macros and forth procedures lie in their +composition system. Target procedures use the machine instruction +pointer and allow manipulation of the hardware return stack, while +macros are essentially lexically scoped Scheme procedures.

Target words allow multiple entry points with code fallthrough. This +is specified in source code by not terminating a word definition with +; such that execution continues with the following word. +Target names behave essentially as assembler labels. Sequential code +in Forth source leads to sequential machine code. Macro words support +only single entry points, as they are separate entities without an +order relation. Macro words always need to be terminated by a +; word.

forth

: plustwo 1 +

: plusone 1 + ;

> (code)
plustwo:
002F 005E 0F01 [addlw 1]
plusone:
0030 0060 0F01 [addlw 1]
0031 0062 0012 [return 0]

Target words allow multiple exit points. Macro words simulate this +feature by jumping to the end of the macro expansion for each +occurance of the ; word.

Macro words allow the definition of lexical variables using the +| word. These variables are taken from the compile–time +value stack and behave as constants: occurence of the variable in code +pushes the referred value. Target words do not support this feature.

macro

: 3dup | x | x x x ;

forth

: example 123 3dup ;

> (code)
example:
0032 0064 6EEC [movwf PREINC0 0]
0033 0066 0E7B [movlw 123]
0034 0068 6EEC [movwf PREINC0 0]
0035 006A 0E7B [movlw 123]
0036 006C 6EEC [movwf PREINC0 0]
0037 006E 0C7B [retlw 123]

Note that lexical names can be introduced anywhere in a macro body. +They are valid until the end of the macro definition.

Lexical variables can be used to create macro closures. The words +[ and ] can be used to create an anonymous macro that +can be passed around as a compile time value. The word i +invokes such a quoted macro.

macro

: make-twotimes | x |

[ x x ] ; \ create a nameless macro

: invoke-add

i + ;

forth

: example

10 make-twotimes \ create the macro closure

invoke-add ; \ pass it to another macro

> (code)
example:
0038 0070 6EEC [movwf PREINC0 0]
0039 0072 0C14 [retlw (10 10 +)]

The target run–time procedure composition mechanism is available to +macros. This is useful for writing macros that abstract non–standard +control flow. The exit word always means procedure exit, +while ; means procedure exit in Forth words, but designates +macro exit for macro words.

Another important distinction has to be made regarding arithmetic and +logic operations and numeric representation. Once instantiated, these +are limited to the target word size. During compile time computations +however, infinite precision Scheme bignums are used.

forth

: one 1000000000000

999999999999 - ;

> (code)
one: 003A 0074 6EEC [movwf PREINC0 0]
003B 0076 0C01 [retlw (1000000000000 999999999999 -)]

Compare the opcode with a previous one and note that it really returns +the 8-bit value 1.

2.4 Modules

Forth code can be organized in modules. These are implemented as PLT +Scheme modules to make it easier to mix Scheme and Forth code in the +same project. Scheme modules for PIC18 usually contain macros that +expand to a pic18-begin form.

To import a Forth module in a Forth project use the require, +planet or staapl words. This example imports the +route macro and its supporting target code which is used in +the subsequent examples.

require staapl/pic18/route

> (code)
_route/e:
003C 0078 60EF [cpfslt INDF0 0]
003D 007A 6EEF [movwf INDF0 0]
003E 007C 50ED [drop]
_route: 003F 007E 44E8 [rlncf WREG 0 0]
0040 0080 26FD [addwf TOSL 1 0]
0041 0082 0B01 [andlw 1]
0042 0084 22FE [addwfc TOSH 1 0]
0043 0086 50ED [drop]
0044 0088 0012 [return 0]

This is the same as

staapl pic18/route

Modules will only be instantiated once, registering target code in a +central repository.

2.5 Byte cells

Standard Forth needs a cell size of at least 16 bits. The Staapl +Forth uses 8–bit cells for data and retain stack, and uses a separate +wide stack for holding return addresses.

Keeping the cell size equal to the native data word size makes it +simpler to directly represent the machine architecture as a stack +machine through a collection of macros that map to simple +instructions. The small cell size isn’t much of a problem for simple +computing tasks. However, target code addresses do not fit in a +single cell which leads to some non–standard behaviour. Most notably +the execute word now takes two parameters, taking the high +byte of the word address on the top of the parameter stack. Another +notable point is that loops using for next are +limited to 256 iterations.

The fact that a word address does not fit in a variable or a single +stack cell promotes the use of byte tokens for representing +delayed behaviour, which are implemented in terms of jump tables using +the route word.

forth

: one 1 ;

: two 2 ;

: three 3 ;

: four 4 ;

: interpret-byte

route

one . two . three . four ;

> (code)
one: 0045 008A 6EEC [movwf PREINC0 0]
0046 008C 0C01 [retlw 1]
two: 0047 008E 6EEC [movwf PREINC0 0]
0048 0090 0C02 [retlw 2]
three: 0049 0092 6EEC [movwf PREINC0 0]
004A 0094 0C03 [retlw 3]
four: 004B 0096 6EEC [movwf PREINC0 0]
004C 0098 0C04 [retlw 4]
interpret-byte:
004D 009A DFF1 [jsr 0 _route]
.L5: 004E 009C D7F6 [jsr 1 one]
004F 009E D7F7 [jsr 1 two]
0050 00A0 D7F8 [jsr 1 three]
0051 00A2 D7F9 [jsr 1 four]

The . word is the same as exit except that it does not +mark the subsequent code as unreachable. The reason this abstraction +works is that a procedure call followed by a procedure exit is always +compiled as a jump. This means the Staapl Forth is tail-recursion +safe.

3 Reference

(require staapl/macro)

syntax
(scat: rpn-form ...)

Most basic RPN form with the following grammar

rpn-form	=	id
	\|	literal
	\|	(rpn-form ...)
	\|	(quote rpn-data-form)
	\|	(unquote scheme-form)

rpn-data-form		=		scheme-data-form
		\|		(unquote scheme-form)

Here literal is any form that is not an identifer or a list, +form is any Scheme form and scheme-form a Scheme +form expanding to a state transformer function.

syntax
(macro: word ...)

syntax
(op: word ...)

syntax
(tv: word ...)

Creates a composite function similar to +scat: except that it creates a target-value? which +when forced, evaluates the composite function on a state? +with an empty stack. If this evaluation produces a one-element stack +this value is returned. Otherwise an exception is raised. Also, +identifiers in the subforms word are interpreted as quoted +target-value? if they are bound in the lexical environment +of the form. Otherwise they come from the scat dictionary.

procedure
(target-value? tv) → boolean?
tv : any/c

A delayed computation that can be forced with +target-value-eval or target-value->number. It may +depend on the target-word-address of any +target-word? it contains.

procedure
(target-word? tw) → boolean?
tw : any/c

A symbolic +representation of a target code address. A node in a control flow +graph. Assigned a value by the assembler. Can be used in +target-value? expressions representing a numerical value.

syntax
(macro id)

syntax
(patterns dictionary (pattern template) ...)

syntax
(compositions dictionary composer (id word ...))

procedure
(state:stack) → state:stack?

Creates a state? +object with an empty stack.

procedure
(target-value->number tv) → number?
tv : target-value?

procedure
(op-apply op org) → (listof number?)
op : op?
org : number?

procedure
(state->stack state) → list?
state : state?

procedure
(state-print-code state) → void?
state : state:stack?

top ← prev up next →

Staapl

TL;DR

About

Status

Documentation

Hire Me

Projects

Download

Staapl

1 Introduction

1.1 Forth to Machine Code

1.2 Scheme as Meta-language

1.2.1 Code and Composition

1.2.2 Primitives

1.3 Other RPN Languages

1.4 Representation

1.5 Assembler

1.6 More

2 PIC18 Forth

2.1 Syntax

2.2 Instantiation

2.3 Macro vs. Forth

2.4 Modules

2.5 Byte cells

3 Reference