#lang scribble/doc @(require scribble/manual scribble/eval scheme/sandbox) @(define stdout (current-output-port)) @(define staapl-eval (make-base-eval)) @(define-syntax ex (syntax-rules () ((_ () . args) (interaction #:eval staapl-eval . args)) ((_ . args) (defs+int #:eval staapl-eval . args)))) @;{ Disables document output during require. } @(define-syntax-rule (for-module spec) (let ((dummy (begin (set! staapl-eval (make-base-eval)) @;(printf "entering ~a\n" 'spec) (staapl-eval '(require spec))))) (defmodule spec))) @title{Staapl reference guide} This document describes the different module components of the Staapl system, providing an @emph{inside} view. For interactive use as a Forth system, these Scheme components are hidden and only the Forth language and interactive console are exposed through a simpler interface. We start with Scat, a concatenative and compositional language with threaded state, and move onwards to Coma and its composition and primitive definition mechanisms. Then we work our way up through Forth control words, prefix parsing words and the PIC18 backend to end up at the project management code which enables the use of multiple instantiated languages. @section{Representation Language} @for-module[(planet zwizwa/staapl/scat)] All concatenative code is built on top of a concatenative and compositional language called Scat with extensible syntax and semantics. The term @emph{concatenative} refers to program syntax being build from concatenation of subprogram names. The term @emph{compositional} refers to program semantics where concatenation is interpreted as composition of functions referred to by names. Like the Joy language, the meaning function relating syntax to semantics is a homomorphism from the syntactic monoid onto the semantic monoid. That is, the syntactic relation of concatenation of identifiers maps directly onto the semantic relation of composition of functions. It is a homomorphism instead of an isomorphism because it is onto but not one-to-one, that is, some sequences of identifiers have the same meaning (i.e. @scheme[dup +] and @scheme[2 *]) but no identifier has more than one meaning. A @scheme[scat:] expression composes Scats functions. @specform[(scat: word ...)]{} The @scheme[_word]s refer to Scat functions in the @scheme[(scat)] namespace, implemented as prefixed identifiers. If there are no arguments, the identity function is returned. A @scheme[scat>] expression can be used to test Scat compositions interactively. @specform[(scat> word ...)]{} This constructs a function by passing the @scheme[_word]s to @scheme[scat:], and prints the result of applying this function to an empty parameter stack, top element on the right. @ex[() (scat> 1 2) (scat> 1 2 +) ] These forms support the @scheme[unquote] operation to introduce scheme expressions into Scat code compositions. This can be done in two ways. A non-quoted unquote is interpreted as a Scat function, while a quoted unquote is interpreted as a literal value. Note that by default identifiers in Scat code @emph{always} represent functions. Literal values occuring in code are a notational shorthand for functions that push a value to the parameter stack. @ex[() (let ([fn (scat: 1 +)] [val 123]) (scat> 1 ,fn) (scat> ',val)) ] New functions can be defined using the @scheme[define-ns] form. @specform[(define-ns namespace name value)]{} The @scheme[_namespace] refers to a list of prefixes for Scat identifiers when represented as a Scheme identifier, @scheme[_name] is the base identifier, and @scheme[_value] is the function value. I.e.: @ex[() (define-ns (scat) foo (scat: 1 2 3)) (define-ns (scat) bar (scat: + +)) (scat> foo) (scat> foo bar) ] There is a shorthand @scheme[compositions] for defining multiple words. @specform[(compositions namespace composer (name word ...) ...)] The @scheme[_namespace] parameter defines the @emph{destination} namespace of new definitions. The @emph{source} namespace is encoded in the @scheme[_composer]. I.e. @scheme[scat:] will pick from the @scheme[(scat)] namespace. This allows the definition of different primitive Scat semantics by taking functionality from one namespace to define primitives of another. The equivalent of the previous example is: @ex[() (compositions (scat) scat: (foo 1 2 3) (bar + +)) ] The primitives available in the Scat namespace are currently not documented. They are largely compatible with Joy or directly snarfed from Scheme. These primitives are not so important for general use, since we mostly employ a different semantics for Scat called Coma, a concatenative macro language. The main idea of a functional representation of concatenative code is that the threaded state object never needs to be explicitly manipulated; one only manipulates point-free function compositions. Internally, Scat functions are modeled as Scheme functions mapping a single state object derived from the @scheme[stack] structure to another state object. Direct access to the state representation is only necessary within language primitives, and as such the constructors, field accessors and pattern matching form @scheme[state-lambda] for (derived) state structures are currently not documented. @section{Concatenative Macros} @for-module[(planet zwizwa/staapl/coma)] The Coma language is a Scat-like language residing in the @scheme[(macro)] namespace. It can be accessed using the @scheme[macro:] and @scheme[macro>] forms analogous to the @scheme[scat:] and @scheme[scat>] forms in the previous section. @specform[(macro: word ...)]{} @specform[(macro> word ...)]{} Coma's values are lists of @emph{stack machine instructions}, and its operations transform these lists. The transformations are always local to the end of the list, so the list of instructions behaves as a stack. Coma is thus a stack language serving as @emph{meta language} for another stack language, the machine language of an idealized @emph{stack computer}, which in its most basic form has five instructions. @itemize{ @item{@scheme[[qw literal]] Quote Word loads a literal (number) from the instruction stream onto the parameter stack.} @item{@scheme[[cw address]] Call Word performs a procedure call to a named function.} @item{@scheme[[exit]] aborts the execution of the current procedure.} @item{@scheme[[jw address]] Jump Word performs a jump (tail call).} @item{@scheme[[jw/if address]] Jump Word If performs a conditional jump after popping a boolean off the parameter stack.} } The Coma language performs @emph{partial evaluation} on the @scheme[qw] instructions, but leaves the other instructions alone. Obviously, to be useful, this instruction set needs to be extended with logic, arithmetic and memory primitives provided by the concrete architecture. @ex[() (macro> 1 2) (macro> 1 2 +) (macro> 123 345 10 3 -) ] The Coma composition mechanism is the same as Scat's. The difference of the two languages lies in their primitives, and in the way literals are treated. When writing a compiler for a particular architecture, the task is to define the semantics of the primitive transformers in terms of (real) target machine code transformers. This can be done using the form @scheme[patterns], a pattern matching mechanism for stack machine code. @specsubform/subs[(patterns namespace (lhs rhs) ...) ([lhs ([opcode operand ...] ... name)] [rhs ([opcode operand ...] ...) expr])] Each clause @scheme[(_lhs _rhs)] defines how the Coma macro @scheme[_name] transforms a particular pattern of instructions @scheme[([opcode operand ...] ...)] on the top of the code stack. For each unique name, the corresponding patterns are grouped and tried from top to bottom until there's a match. The @scheme[_opcode]s in @scheme[_lhs] are matched literally as symbols, while the @scheme[operand]s bind variables in the tagged lists representing instructions. The @scheme[_namespace] is most likely @scheme[(macro)]. If @scheme[_rhs] is a list of lists, the macro will replace the matched instructions with a list of template instructions with the @scheme[_opcode]s quoted literally and the @scheme[_operand]s evaluated. If @scheme[_rhs] is not a list of lists, it is a Scheme expression that evaluates to a macro which is subsequentially evaluated on the code stack with the matched instructions removed. For example, Let's redefine the Coma macro @scheme[+] to use the PIC18 instructions @scheme[addlw]. @ex[() (patterns (macro) (([qw a] [qw b] +) ([qw (+ a b)])) (([qw a] +) ([addlw a]))) (macro> 1 2 +) (macro> 123 +) ] On top of the @scheme[+] just defined, we define a macro @scheme[-] using macro delegation. The second pattern doesn't produce a macro but throws an exception indicating that this operation is compile time only. @ex[() (patterns (macro) (([qw a] -) (macro: ',(* -1 a) +)) ((-) (error 'runtime-not-implemented "-"))) (macro> 10 3 -) (macro> -) ] Observe that @scheme[(macro> 1 2 +)] is now different than before. We used direct evaluation in the expression @scheme[(+ a b)]. However, it is best to use the @scheme[tscat:] form for this. @specform[(tscat: word ...)] This is like @scheme[scat:] in that the @scheme[_word]s come from the @scheme[scat] namespace, but it is different in two ways. If @scheme[_word] is present in the lexical context, it is interpolated as a quoted literal value. In the case above, the expression @scheme[(tscat: a b +)] is equivalent to @scheme[(scat: ',a ',b +)]. Additionally, the computation is wrapped to be evaluated at assembly time. This allows code addresses that are only available at assembly time to be used in partial evaluation. The previous example can now be adapted to: @ex[() (patterns (macro) (([qw a] [qw b] +) ([qw (tscat: a b +)])) (([qw a] +) ([addlw a]))) (macro> 1 2 +) (macro> 123 +) ] Beyond the first level of deconstruction that @scheme[patterns] performs, enforcing primitives to process lists of opcode . operands pairs locally as a stack, the pattern matching syntax is PLT Scheme's @scheme[match] from @scheme[scheme/match] and can be used to further deconstructs operands. This allows the processing of arbitrary data structures in intermediate code and is the substrate of compile time computation in Staapl. One creates primitive code generators and processors that can be used as components in low-level programs written in a Forth-like compositional language. @ex[() (define-struct vec (x y)) (patterns (macro) (([qw x] [qw y] vpack) ([qw (make-vec x y)])) (([qw (struct vec (x y))] vunpack) ([qw x] [qw y])) (([qw (struct vec (x y))] vswap) ([qw (make-vec y x)]))) (macro> 1 2 vpack) (macro> 1 2 vpack vswap vunpack) ] Opcodes are just symbolic tags and can be used inside a @scheme[patterns] form without the need for declaration. The only restriction is that stack machine code produced by a composition of macros can be processed by the stack machine assembler. This freedom allows the introduction of macros that serve as primitives for arbitrary code generation @emph{idioms}, by allowing arbitrary data to be passed on the code stack in the form of @emph{compile time data}, represented by the @scheme['qw] tag, or arbitrarily tagged @emph{pseudo instructions}. Note that in the future this might change to a more static approach based on identifiers instead of symbols. Instead of @scheme[[qw (struct vec (x y))]] we could have introduced the opcode @scheme['vec] using something like @scheme[[vec v]] or @scheme[[vec (struct vec (x y))]]. Doing so prevents manipulation of such objects by generic stack manipulation words like @scheme[dup] and @scheme[drop]. The general rule is: use @scheme['qw] for anything that roughly behaves as a target machine value, and use anything else to prevent the partial evaluator for @scheme['qw] tags to touch the instruction. For certain classes of operations one can abstract the possible patterns over the instances of the class. To this end the macro @scheme[patterns-class] can be used. @specform[(patterns-class namespace formals (actuals ...) clause ...)] The @scheme[_formals] parameter is a list containing identifiers that parameterize the @scheme[_clause]s, which are pattern clauses passed to the @scheme[patterns] form. The @scheme[_actuals] parameters are values to be filled in for the formal parameters. The following example defines pattern matching rules for unary operators on the Microchip PIC18 architecture. Note that the first clause implements peephole optimization: it combines a real machine instruction with the effect of a macro. @ex[() (patterns-class (macro) [word opcode] [(1+ incf) (1- decf) (rot<>c rrcf) (rot<< rlncf) (rot>> rrncf) (swap-nibble swapf)] (([movf f 0 0] word) ([opcode f 0 0])) ((word) ([opcode WREG 0 0]))) ] There are two functions that allow the evaluation of macros. @specform[(macro->code macro)] @specform[(macro->data macro)] These functions convert the Coma function @scheme[_macro] to assembly code or data respectively, by applying the function to an empty state object. The former produces a naked list, where the first element corresponds to the top of the machine code stack. The latter only works if the macro produces a single @scheme[qw] instruction. @ex[() (macro->code (macro: 1 2 3)) (macro->data (macro: 123)) ] The object returned by @scheme[macro->data] can be an instance of the @scheme[target-value] structure produced by the @scheme[tscat:] form, which allows the expression of computations over not yet defined target code labels. Such an object can be converted to a number using the @scheme[target-value->number] function defined in the @scheme[target] module. However, when evaluating macros inside the definition of other macros, it is a better idea to leave the @scheme[target-value] structure as-is. @section{Forth: Structured programming, Prefix Macros and Code Instantiation} @for-module[(planet zwizwa/staapl/purrr)] The Forth language built on top of Coma can be seen as a @emph{programmer interface}, providing a simplified view of the Staapl system for low-level code development that doesn't depend on knowledge of the internals of Staapl and the PLT Scheme programming language. From within this language, code generation primitives defined using Scheme/Coma can be used in a straightforward way. Coma is a declarative concatenative language with s-expression syntax embedded in Scheme. It is kept as simple as possible in order to serve as a practical code generation target. Coma deals exclusively with @emph{function objects} that transform target machine code. To obtain executable code an @emph{instantiation} mechanism is necessary that @emph{evaluates} code transformers. In addition, Coma does not allow the manipulation of identifiers from within the language. Coma's name space management is based on Scheme's @scheme[define] and @scheme[let], abstracted as the @scheme[patterns] and @scheme[compositions] forms for definition of primitive and composite code, and a @scheme[let-ns] form for local names. Instantiation and name manipulation are managed by two separate components. Instantiation is managed by the @scheme[comp] module which can evaluate code generators to a control flow graph useful for further optimization. This is done by giving a meaning to target code labels and jump instructions, represented by the primitives @scheme[sym label: exit jump], on which the Forth structured programming words are built. The standard Forth structured programming words @scheme[if else then begin again do while repeat until] defined in the @scheme[control] module are an extension of purely concatenative Coma code. They are implemented the same as in standard Forth, by extending the Coma state with a @emph{control stack} which is essentially a compile-time equivalent of the run-time return stack. (Note that these are essentially @emph{imperative} constructs, and as such are banned from the experimental mostly-functional @emph{extended Coma} language, which is based on a set of higher order macro primitives instead (i.e. @scheme[ifte]). However, the Forth structured programming words are used to implement these higher order macro primitives.) Name and instantiation manipulation is provided to Forth code by the words @scheme[create : forth macro require load path], the primitive @emph{prefix words}. Because prefix parsing lies outside of Coma, a separate composition mechanism is provided to build Forth-like parsing words on top of these primtives. This is done in the form of the @scheme[substitutions] macro described below. This Forth frontend that includes code instantiation and prefix parsing is called @emph{Purrr}. Together with machine specific primitive code transformers it yields a complete Forth system. It is accessible from within scheme using the @scheme[forth-begin] from. @specform[(forth-begin word ...)]{} The @scheme[_word]s refer to Coma transformers or prefix words that allow the creation of words. The @scheme[:] prefix word creates Macros or instantiated Forth words depending on which is the current active mode. The current mode is switched using the @scheme[forth] and @scheme[macro] words. Forth mode is the default. Note that the @scheme[purrr] module still generates @emph{pseudo code} and prints it to the console instead of storing it in a data structure. Using it directly only makes sense for testing portable Forth primitives. See the next section for a specialization of Purrr. @ex[() (forth-begin : abc begin 123 again)] More convenient however is to use the @scheme[forth-compile] form to use the Forth-style lexer on top of @scheme[forth-begin]. @specform[(forth-compile code-string)]{} @ex[() (forth-compile "macro : one 1 ; forth : foo one one ;")] New prefix words can be defined using the @scheme[substitutions] from, present in the @scheme[forth] module. @specform[(substitutions namespace ((name word ...) temp) ...)] This defines a collection of prefix transformers @scheme[_name] which expand to a template @scheme[_temp] in which the @scheme[_word ...] metavariables are substituted. As explained above, its main use is for words that manipulate identifiers or symbols. @ex[() (substitutions (macro) ((symbol s) ('s))) (macro> symbol foo) ] Have a look at the file @litchar{staapl/purrr/parsing-words.ss} to see how this is used to define most of Forth's parsing words on top of the primitive parsing words and quotation. Note that the Coma and Forth languages are interdependent reusing a lot of each other's functionality. They both live in the @scheme[(macro)] prefixed namespace, but have different functionality. More specifically, Coma doesn't have Forth's control words and prefix macros. These two @emph{language views} on the same core functionality is based entirely on identifier management by PLT Scheme's declarative module system. @section{A Forth compiler for PIC18} The Purrr language presented in the previous section only has practical use when it is combined with a code transformer for a specific machine architecture. This section describes the backend for the Microchip PIC18 microcontroller architecture with 8-bit data word size and 16-bit instruction word size, which maps quite straightforwardly to a stack machine, making it an ideal primary architecture for Staapl. The implementation in Staapl uses the WREG register to implement the top of stack register, and INDF0 to point to the rest of the parameter stack in SRAM. Most arithmetic and conditional jump instructions are used to implement basic forth data and control operations in a fairly optimal way. The compiler produces native code, eliminating the need for an on-target interpreter. Forth code can be used in interrupt routines without trouble. The PIC18 language includes a full RPN assembler. The PIC18F stores code in Flash ROM memory, but has the ability to program itself. This is used to provide run-time code upload in the interaction mode. The rest of this section documents the basic functionality used to implement the command line compiler frontend @litchar{staaplc} which generates Intel HEX files and dictionary information for target interaction. @for-module[(planet zwizwa/staapl/prj/pic18)] Note that here we use the @scheme[prj/pic18] module and not the @scheme[pic18] module directly. The former is a dynamic version of the latter, which allows for @emph{incremental} image-based development on top of the static module code, and supports @emph{live interaction}. This is done using a Scheme dynamic namespace object. Requiring @scheme{prj/pic18} module creates a namespace in which the PIC18 Forth compiler lives. @specform[(forth-load/compile path)]{} This loads a Forth file @scheme[_path]. Relative paths are resolved to the current directory first, then to the @scheme[pic18] Forth library path. @specform[(save-ihex path)] @specform[(save-dict path)] After all forth code is loaded and compiled, these functions can be used to write out the current accumulated state in the form of binary code in Intel HEX format, and dictionary metadata as a Scheme script. Use a device programmer to store the HEX file in the microcontroller Flash memory. The device file is a Scheme script that can be @scheme[load]ed into a Scheme toplevel environment. @specform[(print-code)] Prints the current target code state assembly and binary code. @ex[() (forth-load/compile "p18f1220.f") (forth-compile ": main begin 1 + again") (print-code)] @section{Interaction} Upto now we've talked only about the generation of machine code (syntax). Staapl contains some functionality to allow the @emph{execution} of this code on connected target hardware, i.e. for debugging, testing or profiling purposes. This system is documented separately due to dependence of the documentation generation on a live connection with a microcontroller. The interactive functionality is provided in two forms. There is a set of primitives that allows the interaction with a running target based on communication with an on-target @emph{monitor}. This monitor is a variant of Frank Sergeant's @emph{3-instruction Forth} which allows access to target memory and is able to execute code. On top of these communication primitives a @emph{console} interface is implemented, which simulates traditional Forth interaction, without the need for an self-hosted Forth. The console mechanism performs simulation of macros in case a particular word is not instantiated. This allows for extremely small native code kernels. Interaction words are similar to prefix parsers and as such are extensible using the @scheme[substitutions] Scheme form.