Edited Staapl blog. This is the Staapl blog. It contains articles on topics related to the Staapl metaprogramming system. The articles are continuously updated, with the intention of growing them into chapters of the Staapl manual.

This blog does not allow on-line comments, but you are invited to send reactions to me privately or to the mailing list:
http://zwizwa.be/cgi-bin/mailman/listinfo/staapl-list

The disorganized development log for Staapl can be found here:
http://zwizwa.be/ramblings/staapl

Staapl is also known under the working name Brood, which is unfortunately impossible to google. For more information on the Staapl project see http://zwizwa.be/staapl

Entry: The new Staapl (Brood-5)
Date: Wed May 21 16:20:45 CEST 2008

The main motivation for the 5th Staapl refactoring iteration was name scope management. The objective was to turn all names that occur in .f files into (Scheme) compile-time identifiers, and no longer use (Scheme) run-time hash tables indexed by symbolic names. This is quite an important change, since it allows all names to be treated hygienically. Relying more on PLT Scheme's scoping mechanism (lambda + modules) instead of ad-hoc hash table based namespaces makes Staapl behave better as a PLT Scheme extension[1]. Scope is everything!

So, what does Staapl 5 look like? The Coma language is defined as a collection of code transformers, which are (pure) Scat functions that operate on a stack of machine code (Scat's parameter stack). Staapl is implemented as a classical layered bottom-up Scheme application written in functional programming style, where Scheme macros are used extensively to construct binding forms and specification minilanguages.

* Scat layer. A language layer on top of Scheme that implements a concatenative language with threaded state.

* Coma layer. A collection of _primitive_ Scat functions that implement (local) machine code transformations, specified in terms of target machine code pattern matching. Coma inherits Scat's _composition_ model.

* Structured programming layer. Forth-style control flow words implemented using a second compile-time stack.

* Instantiation layer. Creates target code by evaluating generator functions. This produces code arranged in a control flow graph.

* Forth-style syntax. In addition to the s-expression syntax, some front ends are provided: prefix parsers, a Forth-style lexer, and combined macro/word definition in a single .f file.

* Interaction and name management code. Independent of the compiler, this layer provides target machine console interaction for interactive debugging and (automated) testing.

[1] What I realized is that thinking about code _graph structure_ is a whole lot more revealing than thinking about names in a flat space! This is where Scheme differs significantly from Lisp and other dynamically typed languages. For more information about the PLT Scheme declarative module system, have a look at:
http://www.cs.utah.edu/plt/publications/macromod.pdf
This also finally made me see how to unify and represent both dynamic interactive ``toplevel'' development and static hierarchical modules.

Entry: Compiler specialization: bottom up with holes?
Date: Wed May 21 19:39:42 CEST 2008

This is a justification for the violation of Staapl's bottom-up code structure. Code transformers living in the '(macro) namespace can be arbitrarily mutated, to allow cross-cutting specialization without too much red tape.
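As an illustration only (this is not Staapl's actual representation, and all names here are invented), such a mutable transformer can be pictured in plain Scheme as a named box holding a code-transforming function, where swapping the function logs a mutation event:

    ;; A transformer cell: a name plus a mutable box holding a
    ;; function that maps target code to target code.
    (define (make-transformer-cell name fn) (list name (box fn)))
    (define (cell-name cell) (car cell))
    (define (cell-box cell)  (cadr cell))

    ;; Apply the current transformer to a piece of target code.
    (define (transform cell code) ((unbox (cell-box cell)) code))

    ;; Replace the transformer with a specialized version,
    ;; logging the mutation event to the console.
    (define (specialize! cell fn)
      (printf "redefining macro/~a\n" (cell-name cell))
      (set-box! (cell-box cell) fn))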
This mutability allows the core language implementation to be separated from an optimal target implementation, which replaces primitive operations in the core language. The main problem this solves is that it is impossible to know in advance which primitive code generators need to be replaced for a specific target. The main problem this introduces is that it is unrestricted.

Properties of this system:

* All '(macro) transformers are boxed functions. They can be replaced with specialized versions.

* Each compiler specialization is instantiated in a separate namespace, making these mutations less harmful due to limited scope.

This mechanism behaves somewhat like a local 'parameterize operation, but is implemented differently. This could be solved using subclassing, but each ``object'' provides only a single method, meaning that it's better modeled as a parameterized function.

All such mutations generate events. Currently this triggers logging of the mutated macro to the console to keep an eye on this extension process, until I figure out a more static mechanism that can prevent some misuse:

;; \__ target
;; \__ scat
;; \__ coma
;; macro/jump <- the 'control module redefines the 'jump macro
;; \__ control
;; macro/sym
;; macro/label:
;; macro/exit
;; \__ comp
;; \__ asm
;; \__ forth
;; \__ live
;; macro/+
;; macro//
;; macro/*
;; macro/-
;; macro/dup
;; macro/drop
;; macro/swap
;; macro/,
;; macro/or-jump
;; macro/not
;; macro/then
;; \__ purrr
;; \__ pic18

Entry: The use of macros in Scheme
Date: Fri May 23 09:36:28 CEST 2008

In this message Matthias Felleisen talks about 3 cases of Scheme macro use. I'll give some examples of these below.
http://people.csail.mit.edu/gregs/ll1-discuss-archive-html/msg01539.html

1. The use of macros as a _data sublanguage_ is visible in the assembler generator. It will create a representation (as functions) of a machine code specification typically found in a microcontroller data sheet. Here is an excerpt from the PIC18F assembler:

(instruction-set
  (addwf  (f d a) "0010 01da ffff ffff")
  (addwfc (f d a) "0010 00da ffff ffff")
  (andwf  (f d a) "0001 01da ffff ffff")
  (clrf   (f a)   "0110 101a ffff ffff"))

If you can stylize data entry macros such that what you are actually typing in the end can't be condensed any further, then you basically have a configuration file encoding your domain knowledge in a natural way.

2. In the PIC18 compiler, a nice example of the introduction of new _binding constructs_ is the 'patterns macro, which provides an RPN pattern matching syntax to specify machine code transformation. It destructures machine instructions and makes their operands available as lexical Scheme names.

(patterns (macro)
  ;; Define '-' to subtract two numbers
  (([qw a] [qw b] -)       ([qw (target: a b -)]))
  (([addlw a] [qw b] -)    ([addlw (target: a b -)]))
  (([qw a] -)              ([addlw (target: a -1 *)]))
  (([save] [movf a 0 0] -) ([bpf 0 STATUS 0 0] [subfwb a 0 0]))
  ((-)                     ([subwf POSTDEC0 0 0])))

The input patterns on the left are assembly code sequences, and the output is again an assembly code sequence. The 'qw' pseudo operation Quotes a data Word: it pushes a number to the data stack. Here the '-' subtraction operation is either optimized away, meaning the computation is performed fully or partially at compile time, or compiled as a PIC18 [subwf] instruction.

3. Change of evaluation order is present in the 'target: macro above, which constructs assembly time expressions. During compilation, code and data addresses are not assigned yet, meaning that computations depending on them need to be delayed until all the data is available at assembly time. Note that due to multipass relaxation in the assembler, these expressions will end up being evaluated multiple times, possibly with different input addresses.
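As a toy sketch only (invented names, not the real 'target: implementation), such a delayed expression can be modeled as a procedure that the assembler calls on every relaxation pass, each time with the then-current address lookup:

    ;; An assembly time expression is just a delayed computation.
    (define (make-target-expr fn) fn)

    ;; The assembler forces it, possibly several times, once per pass.
    (define (eval-target-expr expr lookup) (expr lookup))

    ;; Roughly what (target: a b -) stands for, with 'lookup standing
    ;; in for whatever resolves names to addresses at assembly time.
    (define example
      (make-target-expr
        (lambda (lookup) (- (lookup 'a) (lookup 'b)))))

    (eval-target-expr example (lambda (name) (if (eq? name 'a) #x200 #x1F0)))
    ;; => 16, and a later pass may yield a different value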
Entry: Unrolling Forth
Date: Mon May 26 14:20:11 CEST 2008

The most difficult task in Staapl has been finding a bridge between the layered, non-reflective, cross-compiled structure of Coma (lexer, parser, compiler, interpreter + Scheme code written using PLT's hierarchical module system) and the more traditional self-hosted reflective Forth approach. This ``unrolling'' makes explicit metaprogramming simpler, since it creates a static code structure, compared to one that is updated on the fly.

Standard Forth is essentially imperative, and is read from top to bottom[1]. It is not possible to ``unroll'' the reflectivity in ANS Forth completely[2]. In order to write an ANS standard Forth, some of this reflectivity needs to be restored. The problem is mostly parsing words. Traditionally, immediate words (macros) have access to the input stream, which is not possible in the current Staapl approach since lexing is a separate step. After lexing, only preprocessor macros (prefix parsers) have access to the word stream, while ordinary macros can only generate (and transform) machine code. As a consequence, Staapl's Forth cannot be standard[3].

This makes Staapl more complicated. Is this complication justified? Yes. The simplicity of the purely functional Scat model and its name hygiene are more important than any inconveniences that are a consequence of Coma being different[4] from standard reflective Forth. Everything is built on top of Coma/Scat in layers to make metaprogramming easier, and to integrate better with PLT's bias towards a static, layered non-reflective approach.

[1] The irony is that, in trying to perform this unrolling step, I have come to appreciate the elegance of the standard reflective Forth approach even more. Forth's compactness comes exactly from leaving the intermediate representations out, and maximally exploiting reflectivity. There seems to be much debate even in the Scheme community about the trade-offs between interactive ``toplevel'' program development and the declarative module approach as used in PLT Scheme. (The DrScheme "run" button, or "require" vs "load".) This debate is also quite central to the design of Staapl.

[2] It is always possible to unroll any reflective sequence by providing one language layer per statement. However, that doesn't really solve much, so the implied condition is to not create too many layers.

[3] There is however a way around this by using only the Staapl machine model, which boils down to writing a standard ANS Forth compiler that compiles to Coma code. This could be extended with a run-time simulator to make this cross-compiled Forth behave more like a standalone one.

[4] I still find the concept of parsing words useful as a _user interface_, so Staapl contains a front end to write code that looks like traditional Forth, by introducing a ``prefix parser'' layer that emulates Forth's parsing words, and a lexer that provides the basic whitespace separated Forth tokenizing.

Entry: Extract Method
Date: Mon May 26 14:43:19 CEST 2008

This article is about the benefits of programming in a concatenative language, viewed from the perspective of programming as iterative refactoring of a prototype.
Let's say composition and decomposition are the basic tools of any engineering discipline:

* build complex, specialized systems out of simple, generic parts

* refactor an evolved complex design to create a toolbox of generic parts

The most difficult step is obviously the latter one. Decomposition creates interfaces: when a monolithic part is split in two, there is now a thing connecting them. Good decompositions can be judged by the complexity of the interfaces they create. You do the right thing if cutting at some point hides the stuff you want to forget, but keeps open some sane way of interacting with the rest of the world. Simple interfaces promote reuse.

In most programming languages, the procedure / function / message is a basic unit of composition: it can be used to create a black box which reacts to input and produces output. Decomposition or refactoring in such languages amounts to the "extract method" pattern: split up a complex uncontrolled interaction by identifying primitive operations, and rewrite the complex interaction as a simpler interaction between these primitives. Looking at Martin Fowler's explanation in Refactoring: Improving the Design of Existing Code: "You have a code fragment that can be grouped together. Turn the fragment into a method whose name explains the purpose of the method."

Important consequences of extracting methods:

* compression and modularity: Code will be smaller because duplication is removed: share NOW. This reduces local complexity.

* promotes reuse: It creates a possibility that code can be shared by new nodes LATER, possibly outside of the current project.

* documentation: The name annotates the code with a meaningful explanation. The better the factorization, the easier it becomes to give names.

Now, an advantage of concatenative programming languages is that 'extract method' is a simple cut and paste operation: the absence of names for function parameters disposes of the hurdle of inventing them when factoring out code. This leads to easier identification of components, which usually results in concatenative code having a very fine factoring granularity.

Entry: What is Partial Evaluation?
Date: Mon May 26 19:00:32 CEST 2008

Some clarification of terminology:

* Currying
* Partial Application
* Partial Evaluation
* Staging

These are all very related:

* Curry: move a parameter out of a function's parameter list to create a higher order function. I.e. for Scheme's multiple argument lambda: (lambda (a b) (+ a b)) -> (lambda (a) (lambda (b) (+ a b))). In a language like ML or Haskell the default representation is curried (all functions are unary).

* Partial Application: given a higher order (curried) function, obtain a function of lower order by applying one or more arguments.

* Staging: moving computations between compilation stages (i.e. from run-time to compile-time). A stage converts source code of a higher level form to a lower level form.

* Partial evaluation: Automatic staging based on input data available at compile time.

Partial application and staging are somewhat related (in a functional setting, they are both about beta-reductions), but differ mainly in the way the result of the reduction is represented. Staging performs substitution on expression _syntax_, while partial application typically creates an abstract closure object comprised of a representation of the original un-reduced expression and an environment of name->value bindings, where final reduction (execution of base-machine code) only happens when _all_ variables are available.
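A small Scheme sketch, purely for illustration, makes that distinction concrete; none of these names come from Staapl:

    ;; Currying: one binary function becomes a nest of unary ones.
    (define add       (lambda (a b) (+ a b)))
    (define add-curry (lambda (a) (lambda (b) (+ a b))))

    ;; Partial application: the result is a closure, an opaque value.
    (define add-5 (add-curry 5))   ;; => a procedure
    (add-5 3)                      ;; => 8, reduced only now

    ;; Staging: the result is syntax for a later stage, not a closure.
    (define (staged-add-5 b-expr) `(+ 5 ,b-expr))
    (staged-add-5 'x)              ;; => (+ 5 x), code rather than a value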
Partial evaluation makes most sense for functional languages, where evaluation order can be easily manipulated. Applying it to imperative languages requires the identification of a functional subset. In a language based on function composition, partial evaluation is a trivial operation based only on associativity of function composition. This is the main idea behind Staapl's code representation.

In the face of unrestricted recursion, the main problem of automatic staging is to decide what NOT to evaluate, to avoid infinite expansion size. Therefore it is useful to restrict the language a bit, the main idea being that full Turing-completeness is not always necessary, and can make automatic specialization simpler.

Sources:
http://lambda-the-ultimate.org/node/2266
http://srfi.schemers.org/srfi-26/mail-archive/msg00000.html
http://www.brics.dk/~hosc/local/HOSC-13-4-pp355-368.pdf
http://www.dina.kvl.dk/~sestoft/pebook/
http://citeseer.ist.psu.edu/taha99metaml.html

Entry: Scheme, Coma and Stack Machine Code
Date: Mon May 26 20:33:30 CEST 2008

This post talks about how the concatenative macro language Coma is integrated with Scheme code. It explains the mechanisms of binding through pattern matching and dereference through quasiquotation.

To set the stage, let's define a deeply embedded program as the specialization of a general program in such a way that run-time complexity (and so cost of supporting hardware) is significantly reduced. For a large reduction in cost/complexity (or a large enough production volume), a significant fixed engineering cost can be spent on specialization. Deeply embedded programming is about spending time and money on program specialization. When the specialization techniques can be abstracted, manual program specialization (writing a machine code program by hand) can be automated using staging: a parametric model of the code is constructed which, when given some parameters, reduces to a specialized model.

BINDING: stack machine code pattern matching

The binding form is best introduced in the form in which it is used most: to implement code transformation rules, which are mostly based on partial evaluation of concatenative code. For example, the concatenative code sequence

1 2 +

is reduced to a single literal 3 in the target code. This mechanism is implemented by specifying the behaviour of the code transformer "+" using code substitution patterns. The specific substitution pattern for the case above is:

[qw 1] [qw 2] -> [qw 3]

Where "[qw 1]" is the target (virtual) stack machine code generated by the code generator called "1". This is an instance of the generic substitution pattern:

[qw a] [qw b] -> [qw (+ a b)]

Here the Scheme expression "(+ a b)", which adds the two variables "a" and "b", is used to perform a computation that gives the correct semantics to this code transformation rule. Any sequence of two literal values represented by two "[qw ]" stack machine instructions gets replaced by a single literal value containing the sum of these two numbers.

Staapl contains a Scheme macro called "patterns", which is the destructuring binding form for target stack machine code[2]. This can be used to build substitution rules.
The complete example for "+" looks like:

(patterns (macro)
  (([qw a] [qw b] +) ([qw (+ a b)])))

This reads as follows: if the "+" code transformer operates on a stack of two literal "qw" instructions, bind the values these "qw" instructions contain to the variables "a" and "b" respectively, and replace the two instructions by a single "qw" instruction that contains the result of the Scheme operation "+" applied to the variables "a" and "b".

The general idea behind the substitution rule is: if all the inputs for an operation are available during code generation time, then the operation can be performed, and only code that makes this result available at run time needs to be generated. This can be generalized quite far: using this mechanism, the basic concatenative macro language can be extended by explicit (virtual machine code) generators, accessible from concatenative source code, for any kind of Scheme -> assembly code metaprogramming.

QUASIQUOTATION: name space merging

To generate Coma code from within a Scheme context, i.e. as the right hand side of a "patterns" rule, a quasiquotation mechanism is provided. The "unquote" operator has similar semantics for the "macro:" concatenative code constructor as it does for the "quasiquote" list constructor. Consider these 3 forms:

(macro: 1 +)          ;; 1.
(macro: 1 ,operator)  ;; 2.
(macro: ',operand +)  ;; 3.

1. This constructs a macro which is the composition of the macro "1" (which generates code to quote the number 1) and the macro "+" which is fetched from the '(macro) namespace.

2. Constructs a macro which is the composition of the macro "1" and a macro bound to the Scheme identifier "operator".

3. Constructs a macro which is the composition of a macro that generates code to quote the constant bound to the Scheme identifier "operand" and the macro "+" from the '(macro) namespace.

In short, the "unquote" operator allows the introduction of code generators and quoted data from the Scheme namespace into concatenative code, which uses a namespace different from that of Scheme. This allows the construction of code templates for Scheme -> Coma metaprogramming.

CONCLUSION

These two mechanisms (deconstructive binding and quasiquotation) allow a simple impedance match between concatenative code and Scheme, building on top of Scheme's lexical structure.

NOTES

[1] This can also be combined with the second quasiquoting macro mechanism to yield more powerful constructs.

[2] The binding construct simulates pattern matching for algebraic data types, but limited to stacks: next to deconstruction it provides a single level of construction (the "qw" instructions on the right hand side are interpreted as constructors). On top of that, the pattern matching is from the scheme/match PLT module.

[3] Instead of a stack of instructions, the RHS of a "patterns" rule can contain arbitrary Scheme code if it is a list that starts with an identifier. (It can always be made such by wrapping it in a "begin" form.) If the RHS is a list of lists, it is interpreted as a machine code sequence constructor.

Entry: Forth and Metaprogramming
Date: Tue May 27 14:55:47 CEST 2008

This article introduces the general architecture of the Forth dialect's language tower.

Using a high-level language to describe a problem and compiling it down to lower level code is such a ubiquitous abstraction mechanism that it is hardly noticed as one. But what happens when this abstraction mechanism is made programmable?
Instead of using a fixed compiler for a fixed language, make the compiler extensible using a 'plug-in' mechanism. This is one of the powerful ideas behind languages like Lisp and Forth, where the extension mechanism is implemented in terms of code transformers, also called macros. In Lisp, the transformation is performed on structured data, while in Forth the transformation is similar, but usually implemented more directly, without an intermediate data step.

With a powerful extension mechanism, the base language and compiler can be kept relatively minimal: a lot of choices can be left to the programmer, who can build on top of the simple language primitives. The base language can then be chosen to maximise flexibility (Scheme: the untyped lambda calculus with a garbage collected persistent memory model) or to minimise implementation complexity (Forth: stack machines with a concrete array memory model).

In Lisp in particular, there are two composition mechanisms:

(A) procedure: composition by lambda abstraction and application
(B) notation: composition of macros (code transformers)

Here (A) is the basic composition mechanism and can be used to express solutions to all problems, but is not always terribly succinct because of its intrinsic notational limitation. The (B) mechanism can be used to solve most notational problems by translating arbitrary notation into base language syntax(1). An example of a severe notational problem is the expression of one language in terms of another one.

The Forth dialect in Staapl is implemented as a set of pure concatenative functions that generate or transform machine code. This functional approach greatly simplifies compiler and optimizer construction. The language is layered in the following way:

* [PRIMITIVE CODE PROCESSING] The macro language primitives are written mostly in terms of a pattern matching language that operates on target (pseudo) code. This implements code generation, peephole optimization, partial evaluation, arbitrary code generator plugins and specification of intermediate (pseudo code) objects used for communication between code generators for compile time parameter passing.

* [MACRO COMPOSITION] The macro language is a purely functional concatenative language (Coma, based on Scat) with extensions to implement Forth style structured programming words. It can be used to compose the primitive code generators into higher level code generator abstractions.

* [INSTANTIATION] In a Forth file, one has the choice to define functionality as macros (2) or as instantiated code. This is implemented using an identical (3) syntax to make it easy to switch between the two modes.

* [PREFIX PARSER] The Forth syntax layer provides a pattern substitution mechanism for prefix words in order to maintain standard Forth syntax (4).

The power of Purrr lies mostly in the combination of the purely functional concatenative macro language supporting a form of quasiquotation to build template code, and the pattern matching binding form used to define language primitives that perform partial evaluation and more general compile time transformation of target stack machine code.

Notes:

(1) Whether (B) is necessary also depends on the base language. For a strict language, composition of code that manipulates evaluation order cannot be performed with (A) alone (for example 'if' and 'delay' in Scheme). In a lazy language like Haskell, (A) can be stretched a long way; however, Haskell is full of syntactic sugar and there's a Template Haskell library to perform (B).
(2) Traditional Forth implements primitives in machine code and tries to keep the use of macros to a minimum, to limit (on-target) compiler complexity. This is not a constraint in the cross-compiled Purrr language, where all primitives are macros and no primitive run-time code is necessary.

(3) Almost identical. The instantiated Forth language differs from the Macro language in that it has access to the data structure used for run-time function composition: the return stack. In addition, Forth words can have multiple entry points, which means code can fall through to the next definition in a source file.

(4) Traditional Forth employs a more direct and low-level metaprogramming syntax (i.e. "postpone", "compile", "accept", ...), and uses reflection to reduce code size. Purrr limits this kind of reflection and replaces it by 'unrolled' language towers. It is still an open question whether using standard Forth syntax for the Purrr system is a good idea. Forth syntax is limiting for the way metaprogramming is used in Purrr, and is also fairly bound to the reflective structure of a standard Forth parser/compiler. In any case, in Purrr, the Forth syntax is merely a front-end to a more general s-expression based syntax for the Macro language, so it is always possible to fall back onto this syntax to express particular abstractions.

Entry: Name spaces: load vs. require
Date: Thu Jun 5 11:33:34 CEST 2008

This post fits into the toplevel namespace vs. modules discussion. Tree-structured modules are a great way to manage namespaces in large software projects, but there is something to say for the 'one big namespace' approach when single instances are the rule and not the exception. For example:

* Generic interpreter code is parameterized by the words 'receive' and 'transmit'.

* Specific toplevel code links generic code together by simply inlining into its namespace the interpreter code and a particular definition of these words.

The benefit of this pattern is that it doesn't need much red tape: there is basically no namespace management, which can be a downside when code gets big and needs to be more generic. However, in deeply embedded programming there is more often than not only one static instance of a certain object; in the case of the example it's the debugging serial port. Moreover, the ratio of glue code needed to implement the binding to code that does something useful can be high. So why not use the simple mechanism until more abstraction is needed? For example, generic interpreter code can be defined in a file that's simply included. Whenever multiple instances or late binding are necessary, this file is wrapped in a module to shield namespaces.

This is how it is solved right now: you're free to use load (include) or require (proper module namespace management). The monitor kernel is implemented with load. On top of this, macros can be redefined, but compilation generates a notification of this. This is basically hierarchical modules + parameters, but without specifying in advance which behaviour can be parameterized.

Entry: History and Semantics of Coma
Date: Wed Jun 25 16:28:39 CEST 2008

This text deals mostly with the implementation and how this approach came into being. The semantics of Coma are explained more concisely in a different post: entry://20080806-212034

An interesting idea in http://www.latrobe.edu.au/philosophy/phimvt/joy/j07rrs.html is the re-introduction of stacks in the implementation of rewrite rules.
Manfred talks about giving Joy a rewrite semantics, without the use of stacks, but eventually comes back to stacks as a way to efficiently implement these rewrite rules. This seems to be due to the fact that Joy's rewrite rules are local, and stacks support operations with local effects. A more formal treatment would definitely be interesting. In Coma however, pragmatism rules and rewriting is implemented as deterministic target stack machine code pattern matching.

* Primitive semantics in Coma are provided by operations on target machine code programs. Each identifier (a word separated by spaces) that occurs in a source file represents a pure function that transforms a list of machine instructions. Primitive functions are declared as pattern matching rules, which form an implementation of a set of rewrite rules.

* New Coma functions can be obtained by means of ordinary function composition, which is syntactically represented by concatenation of function names, yielding a concatenative language.

  x == a b c

* With identifiers primarily representing TRANSFORMERS, the DERIVED semantics of identifiers is obtained as the target program which results from the application of the function represented by the identifier to the empty program.

It is essential to separate the two levels of semantics: code transformers, and the code being transformed, which is executed by a concrete machine. Obviously, these two are somehow linked! I.e. for PIC18 the transformer "+" has the derived semantics of addition because it generates an ADDWF instruction, but it also performs partial evaluation if possible.

To make a concatenative language useful, some form of locality of effect has to be introduced(1). Locality makes it possible to create reusable components. A straightforward way of introducing locality is to make the space of values on which the functions operate take the form of a space of stacks of values. For Coma, the choice is easy to make:

* Target programs are a concatenation of primitive machine instructions.

* Rewriting happens locally: rewriting functions operate on (possibly infinite) target programs by replacing a finite number of instructions M by a finite number of instructions N at the right end of a target program, leaving the leftmost instructions intact.

This makes the Macro language a purely functional concatenative stack language: all its primitives operate on a stack of values, each value representing a machine instruction. That's nice and clean. The language makes it possible to implement a peephole optimizing compiler which, next to code generation (macro expansion), also performs some form of partial evaluation (target code rewriting that eliminates run time computation by reducing the number of instructions necessary to perform a certain computation).

To perform partial evaluation correctly, care needs to be taken to align the semantics of computations that are performed during compilation with those of computations that need to be postponed to runtime. For example, the program [1 1 +] might be rewritten to [2], but for [+] on its own no reduction applies, so in order to preserve the proper derived semantics of '+' it needs to be rewritten as a program that adds two numbers present on a run time stack, and replaces them with the result(2).

In short, starting from the locality of code transformers, and consistency requirements coming from partial evaluation operations, the target machine has to behave as a stack machine.
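To make the locality idea tangible, here is a toy Scheme sketch (not Staapl's code, and the 'add instruction is invented): target code is a list of instructions with the most recent instruction in front, so the list behaves as a stack, and the '+' transformer either folds two literals or postpones the addition to run time:

    ;; An instruction is a tagged list; (qw n) quotes the number n.
    (define (qw? ins) (and (pair? ins) (eq? (car ins) 'qw)))

    (define (plus code)
      (if (and (pair? code) (qw? (car code))
               (pair? (cdr code)) (qw? (cadr code)))
          ;; [qw a] [qw b] + -> [qw (+ a b)]
          (cons (list 'qw (+ (cadr (cadr code)) (cadr (car code))))
                (cddr code))
          ;; otherwise emit a (hypothetical) run time add instruction
          (cons '(add) code)))

    (plus '((qw 2) (qw 1)))  ;; => ((qw 3))
    (plus '((qw 2) (dup)))   ;; => ((add) (qw 2) (dup))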
Now, let's make it a bit more useful. In order to get interesting properties for parameterized metaprogramming, we relax the semantics a bit, allowing for the existence of code transformers without associated derived semantics. Individual code transformers, identifiable by names in a source program, might _lack_ derived semantics because:

* their domain does not contain the empty program. They do not have a purely code-generating form but need non-empty program input to rewrite to something else.

* their codomain is not contained in the set of concatenative programs generated by the target machine primitive instructions. They generate intermediate instructions, which are compile time values, that might be collected by other rewrite rules to influence their code generation.

Such macros are called ``intermediate macros''. Their closure space is called the space of intermediate programs, which is generated by the set of primitives comprised of (real) target primitives combined with these (virtual) intermediate primitives(3). This makes it possible to define the notion of 'real' macros as macros that (are concatenative and) map the empty program to a program in the target program space. Real macros can be instantiated on the target machine.

Using an (intermediate) program space that is larger than the target program space gives a means to use intermediate macros for constructing components of parameterized target programs: build real macros by composing intermediate macros. It provides a parameter passing mechanism for macros that can host values not possible to represent in the target semantics(4). This makes it possible to factor code generation patterns into smaller elements that can be recombined in several ways.

Conclusion: using rewrite primitives and function composition, the Coma language allows the conversion of a concatenative program to a value whenever the concatenative program has a _pure_ associated target semantics (all operations are local to the stack), meaning the entire program can be reduced to a stack of values. However, the target semantics isn't pure, because target programs depend on input available only at run time. As a consequence, not all programs can be completely reduced. This partial reduction (partial evaluation) is the target program. Combine this with a larger compile time value space and associated rewrite rules to get a powerful metaprogramming system.

(1) Or modify all of it in a uniform way. For example, a language like FP or any kind of monad language might have primitive operations that modify all the state, for example a 'map' operation. However, I'm more interested in a model for a complete system, not an embedded sublanguage.

(2) For Purrr18, the semantics doesn't always match completely. For example, numbers and associated operations are bignums at compile time, but 8 bit integers for run time operations. This could be called 'projected semantics'. I've found this to be really useful, i.e. to compute constants. This combination of a well-defined macro semantics with a link to a (somewhat less powerful and approximate) target program semantics gives a very powerful metaprogramming system: the time at which computations happen can be largely ignored when trying to understand overall program semantics, and can be unignored when trying to understand its implementation (instantiation and approximation). Both things are equally essential in resource constrained programming, less so in general programming where correctness/consistency usually takes precedence over implementation efficiency.
(3) For example, in the PIC18 architecture there is no generic conditional branch instruction, but a whole family of such instructions optimized for different conditionals. In code generation this is solved by using a virtual intermediate condition value type that is combined with a conditional branch instruction to yield one of the specialized branch/skip instructions.

(4) Somewhat analogous to quarks and gluons in physics. Quarks can be bound to other quarks through gluon exchange, but are never seen in isolation.

--

Now, that's a nice story, but that's not how it happened :) The specification of rewriting rules came quite naturally as a syntactic abstraction for some previously manually coded peephole optimizations in the first iteration of the Forth compiler. This rewrite system however seemed way more useful than just for performing simple optimizations, and by introducing pseudo target assembly instructions it has been transformed into a different kind of two-level language semantics: a powerful compile time type system with an analogous run time type system with 'projected semantics' derived from eager macro rewrite rules.

The downside of this is that in order to write programs, one has to make decisions about what to instantiate, and what not. It might be interesting to see how far this can be automated: don't instantiate what cannot be instantiated, but try to isolate subprograms that have real semantics.

Entry: Staapl goals
Date: Wed Jul 2 22:13:52 CEST 2008

1. Practical: Build a framework for metaprogramming (multi-stage programming) in 'lisp-and-forth-style'. This is taken to mean the design of a programmable compiler for 2-stack machine models in a clean dynamic language. This goal is reached through the use of a purely functional dynamically typed stack language, implemented in a Scheme extended with a hygienic macro system that works well with modular code development (PLT Scheme's macro system). This framework is supposed to get practical issues out of the way so an intuition can be built about Resource Aware Programming (RAP) centered around metaprogramming of 2-stack machines. Note that there is not much new here: it is merely a combination of existing ideas (Forth, Joy, Scheme) into a coherent system.

In Staapl there is a balance between a static and a dynamic approach, borrowed from the PLT Scheme tradition: a static hierarchical module system for large scale code management, and dynamic interactive compilation, incremental code upload and target interaction. The concrete target architectures are small microcontrollers and DSP processors. The core could be called a dynamically typed, concatenative macro assembler (CMA).

The basic framework is as good as ready. There is a single test backend for Microchip PIC18, and some bits and pieces for a C code generator. Most changes in the last year were about implementation and simplification: better integration with the PLT Scheme macro and module system, and simplification of the low level language model to eliminate ad-hoc non-composable constructs.

2. Experimental: Design DSLs for resource (time/space) constrained programming on top of the concatenative macro assembler idea. Currently there's Purrr, a Forth dialect, and Coma, a cleaner s-expression based concatenative Macro language used to implement the purely functional core of Purrr. Future work is mostly about program transformations (fusing/deforestation/...)
in Backus' FP and the related "Algebra of Programs" framework, OBJ's "theories", and a look into typed staged systems a la Taha's RAP research. I am also interested in parallel programming, trying to figure out a way to marry process calculi with the 2-stack machine model.

Entry: Projected Semantics
Date: Sun Jul 6 15:18:49 CEST 2008

Q: What?

All arithmetic expressions in Coma are evaluated as soon as possible. However, the partial evaluation of arithmetic operators does _not_ use the target's limited word length representation, so not all operations yield the same result if they are evaluated at compile time. This is in the same spirit as macro assembler expression languages.

Q: Why is it useful?

It's a sharp knife that can be a great tool if you know how to handle it. Essentially, it allows Forth's staging annotations - the words [ and ] - to be dropped in Coma code.

Q: Doesn't it lead to a mess?

Yes, if you don't know when an expression is evaluated. This is well specified however. The problem is in realizing that the operations of expression evaluation and value+operation projection _don't commute_ for all operations. They do commute for addition and subtraction (i.e. for PIC18, equivalence modulo 2^8), but multiplication, division and bit shifts break the morphism. For example, 300 / 2 evaluated with bignums at compile time gives 150, while projecting 300 to 8 bits first gives 44, and 44 / 2 = 22.

Q: Can I get some more hand-holding?

I don't know how to appropriately add constraints to this to eliminate common mistakes.

Entry: Sheep works!
Date: Wed Jul 23 19:23:52 CEST 2008

The Sheep synth, Purrr's test app with an attitude, is working on the new application core. This means the Brood 5 refactoring has come to an end, and it's time for a release.

Entry: Staapl 0.5 release notes
Date: Thu Jul 24 11:35:47 CEST 2008

The Staapl project has come to a point where it is ready for public scrutiny.
http://zwizwa.be/staapl

Staapl is a collection of abstractions for (meta)programming microcontrollers from within PLT Scheme. The core of the system is a programmable code generator structured around a functional concatenative macro language. On top of this it includes a syntax frontend for creating Forth-style languages, a backend code generator for Microchip's PIC18 microcontroller architecture, and interaction tools for shortening the edit-compile-test cycle.

This is the completion of the phase I goal: to build a practical, well-factored, easily extensible base system upon which to build phase II: experiments with domain specific languages for DSP and embedded control. From this point I will continue fixing bugs and polishing documentation.

An overview can be found here:
http://zwizwa.be/archive/staapl.html

The PIC18 Forth dialect is documented in the following papers:
http://zwizwa.be/archive/pic18-forth.pdf
http://zwizwa.be/archive/pic18-synth.pdf

The interactive aspect is documented here:
http://zwizwa.be/archive/pic18-interaction.html

Staapl is available on PLaneT as zwizwa/staapl. To install and compile the test application:

mzscheme -p zwizwa/staapl/examples/build-synth

Entry: A simple example
Date: Sun Jul 27 11:53:19 CEST 2008

Staapl is now installed on PLT PLaneT, which is the central repository for PLT Scheme user contributions, so it's very easy to install.
Get the latest PLT Scheme at http://pre.plt-scheme.org
You only need the command line version (MzScheme).

To try it out in batch mode, create a file test.ss with the following contents:

#lang scheme/base
(require (planet zwizwa/staapl/prj/pic18))
(forth-compile ": inc 1 + ;")
(save-ihex "test.hex")

Then execute it like this:

> mzscheme test.ss

This will locally cache the Staapl distribution, and create a file test.hex containing the compiled Forth code ": inc 1 + ;". Check it with gpdasm:

> gpdasm -p 18f1220 test.hex
000000:  0f01  addlw   0x1
000002:  0012  return  0

This only uses the barebones compiler. To compile a Forth application together with the boot monitor, replace the 'require line in the build script with:

(require (planet zwizwa/staapl/prj/pic18f1220-serial))

And compile again. The top file of the monitor code can be found here:
http://zwizwa.be/darcs/staapl/staapl/pic18/monitor-p18f1220.f

These files are cached locally in this directory:

> mzscheme -p zwizwa/staapl/prj/pic18 -e pic18-lib

Entry: Coma semantics
Date: Wed Aug 6 21:20:34 BST 2008

The semantics of Coma is defined in terms of concrete, step by step, deterministic rewrite rules. This post tries to find the relation to declarative rewrite rules and place everything in a more formal setting. For a more concrete step-by-step exposition of how this is implemented, see: entry://20080625-162839

The goal of the compilation process is to translate a source concatenative term s into a target concatenative term t. Concatenative terms are either concatenations of concatenative terms or primitive terms. Let

  s = s0 s1 ... | s'    source programs
  t = t0 t1 ... | t'    target programs

Here s, t are terms and s', t' are primitive terms. Denote the set of s and t as S and T, and the set of s' and t' as S' and T'. Basically, pure concatenative translation maps primitives to primitives, S' -> T'. The complete compilation expands a source concatenation into a concatenation of primitives, which can then be translated one by one to target primitives.

In order to make this a bit more interesting, we allow substitutions to occur in the language and make the translation more general. Substitutions replace a subconcatenation with another one. Define the translation system {p,r} consisting of a map p : S' -> T together with a collection r of rewrite rules on T. The map p associates primitive source terms with concatenations of primitive target terms. The rewrite rules r allow the construction of a reduction system that translates the concatenation of terms p(s0) p(s1) ... obtained from the source program s = s0 s1 ... to a term t_n in normal form. The alternative approach where substitution rules are defined on S followed by a direct primitive map S' -> T' is equivalent, so we can concentrate on rewrite rules in T only.

Now, the implementation in Coma is a bit different, and modeled after standard Forth macros. Define the translation system {m} where

  m : S' -> T -> T

maps each source primitive to a target code transformer T -> T (1). In uncurried form, Coma replaces the concatenation operation by a binary operator T x S' -> T. Given a source program s = s0 s1 ..., this operator can be used to create a whole-program concatenation operator S' x S' x ... -> T by folding the binary operator over s, initialized with the empty program (a toy sketch of this fold is given below).

The question is, how are {p,r} and {m} related? The rewrite system r obviously needs an algorithm to implement it and some constraints that allow the existence of normal forms for all terms t.
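For concreteness, the {m} side can be sketched in a few lines of Scheme (toy code with invented names, not Staapl's implementation): compiling a whole source program is a left fold of the binary operator over the source, starting from the empty target program.

    ;; m maps a source primitive to a target code transformer;
    ;; compiling is folding the (uncurried) operator over the source.
    (define (compile-source m source)
      (foldl (lambda (s-prim target) ((m s-prim) target))
             '()          ;; the empty target program
             source))

    ;; A trivial primitive map, just to show the shape of the fold:
    (define (m prim)
      (lambda (code) (cons (list 'prim prim) code)))

    (compile-source m '(1 2 +))
    ;; => ((prim +) (prim 2) (prim 1))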
I think the interesting questions are then:

* What are these constraints, and how can the representation of the reduction algorithm {m} be derived from a higher level declarative representation {p,r}?

* Is it useful to implement it like that, or is a more direct pattern matching approach good enough? (2)

Notes:

(1) Stack semantics in Coma comes from an extra constraint that gives the code transformers T -> T some locality by having them operate only at the end of the primitive code concatenation, essentially viewing the target program string as a stack. The specification language for primitive transformers in Coma enforces this, by building on top of Scat. However, the general principle exposed here is applicable to more general point-free languages.

(2) The Coma pattern matching approach {m} is not exactly limited. A central idea in Coma is that the (directly) expressed rewrite rules encode two kinds of equivalence relations. The first one partitions target programs into classes with the same target semantics; for PIC18 the resulting reduction is mainly a reduction of program size. This is a fairly straightforward OPTIMIZATION technique. The second relation collapses concatenations of pseudo instructions that have no individual target semantics into constructs that do. This is the PARAMETERIZATION part of Coma.

Entry: Staapl in a nutshell
Date: Thu Aug 7 08:24:37 BST 2008

Staapl aims to work out the claim that Forth is a decent machine language on which to build DSLs using macros in Scheme style.

* [UNROLLED REFLECTION] It places the basic compositional forms of Forth (procedures, macros, parsing words) in a non-reflective declarative hierarchical language tower framework. The main advantage is that it gives a declarative, acyclic, ``unrolled'' layered language structure that is easier for metaprogramming.

* [FUNCTIONAL PROGRAMMING] Staapl models Forth macros as a purely functional stack language, operating on a stack of (virtual) machine code instructions. To this end it provides a pattern matching language for writing primitive code transformers in terms of instruction templates.

* [PARTIAL EVALUATION] It uses the purely functional subset of Forth (words that operate on the top of the stack only) as a guide to derive a set of pattern matching rules that implement partial evaluation for this subset. This is done by eagerly moving computations to compile time.

* [PARAMETERIZED CODE GENERATION] The partial evaluation mechanism is generalized to use Scheme's data types, including its number system, to facilitate the construction of arbitrary code generator primitives.

* [SCHEME] To avoid any arbitrarily introduced abstraction walls, all these mechanisms are available in and built on top of PLT Scheme modules. This allows the construction of s-expression based DSLs by expanding code down to a concatenative machine language.

* [INTERACTION] On top of the declarative structure, a toplevel interaction environment is provided that allows interactive code compilation and execution. Care is taken to make this convenient, i.e. to provide simulation for common language constructs if they are not instantiated.

* [CULTURE] Giving up Forth's traditional imperative self-reflective character and its absence of intermediate representation dispenses with some minimalist elegance. Staapl is very different in ``culture'' from traditional Forth.

Entry: Is classic code analysis necessary?
Date: Thu Aug 7 08:33:59 BST 2008

The Coma compiler includes a somewhat traditional, but at the moment quite limited, intermediate code processor to implement optimizations. Between compilation and assembly there is a point where the code is structured as a control flow graph. This is not used yet, but it is there to perform some optimizations that are not possible in the rewrite framework (i.e. jump chaining, loop tricks, ...).

However, I'm still not convinced such a postprocessor is actually necessary. For very simple target architectures, I suspect the generated code will already be close to optimal because:

* Generating good specialized code is the whole point of staging, and mostly the responsibility of the person writing the code generator.

* The partial evaluation for the functional subset works quite well to eliminate obvious parameterizations, and the pattern matching allows the specification of a good impedance match to the machine instruction set.

As practical evidence for this, the PIC18 rewrite language already produces near-optimal code in the practical uses that I've encountered, which is low level embedded programming with some simple language extensions, mainly to generate data lookup and code dispatch tables. On the other hand, when implementing more elaborate DSLs on top of this system, it might be interesting to perform proper data flow analysis and register allocation. For more complex targets however, I suspect that Staapl loses its benefits.

Entry: Interaction: single chip vs. distributed applications
Date: Sat Aug 9 10:23:45 BST 2008

I've been thinking about the interaction model for Staapl. In order to make things more manageable, I will introduce two different cost models for interactive development. The main distinction is between single-chip and multi-chip applications.

* SINGLE CHIP. Defined as an application where a single uC performs a well-defined task without much high-level communication to other chips, and can be developed and tested in isolation.

* DISTRIBUTED. Defined as an application where a single uC is integrated into a communication network. The communication between uCs is an essential part of the solution, and cannot be conveniently developed or tested in isolation.

These two kinds of applications deserve different solution patterns. For single chip applications, the communication overhead necessary for debugging should be minimized, while in distributed systems communication is essential to the operation and can be used to support debugging. To simplify matters, the rest of the exposition is specific to the Microchip PIC line, but should be easily generalizable to other manufacturers' chip lines.

For single chip applications, the aim should be to make the hardware interface necessary for programming and debugging as simple as possible, since it is not part of the solution. If interactive debugging is required, use a communication protocol that minimizes on-chip resource usage. For PIC this means using the native Microchip programming interface, and using the same pins for the debugging protocol. PicKit2 [1][2] could be of use here.

For distributed applications, Staapl will concentrate on PICs that are self-programmable: the ones that have an internal charge pump to generate the Flash programming voltage. (Anything > PIC18 and some of the PIC16.) Each PIC should be equipped once with a (read-only) monitor that understands the basic network protocol and can update firmware.
The programming host <-> network interface can then be implemented using the single chip approach, interfacing to a network entry node.

Learning from past attempts at building an easy-to-interface platform, the most important drawback has been the need for expensive programming and interaction hardware. I managed to bring it down to 20 Euro per person for the serial interface + chip with bootloader. With the availability of the PicKit2 for 25 Euro, it is probably not possible to find a cheaper solution for bootstrapping blank chips. It looks like the protocol is open, so direct support in Staapl should be feasible.

[1] http://www.microchip.com/stellent/idcplg?IdcService=SS_GET_PAGE&nodeId=1406&dDocName=en023805
[2] http://www.didel.com/

Entry: Forth's 2 modes: interpret and compile
Date: Sun Aug 10 14:28:24 CEST 2008

Classically, Forth distinguishes two modes:

(I) Interpret mode
(C) Compile mode

Staapl's Forth does the same. For most languages, this difference is not exposed to the programmer. Compiled languages usually expose only a single entry point to compiled code (the "main" procedure), unless they are compiled as libraries, in which case they are linked to other programs or libraries and not invoked directly. Forth, however, uses the observation that code entry points (program procedures) behave as individual programs and thus can be executed individually. This is both convenient and confusing.

Convenient: Interactive use of program components is convenient for testing. Exposing the compiled procedures comes essentially for free: a jump to the code address. Additionally, this mechanism allows simple implementation of reflective behaviour. I.e. the accessible subprogram ":" starts the compilation of new code by switching to compile mode.

Confusing: Not all language elements that are accessible during compilation can be accessed in interpret mode. More specifically, macros (immediate words) cannot be used. This includes all structured programming words like conditionals and loops.

In order to eliminate this confusion, most compiled languages that support an interactive console will compile commands given at the console before executing the freshly compiled code. This makes sure all language elements are available at the console. From that perspective, it's probably better to compare Forth's interactive console to a unix command shell (executing compiled programs, executing commands to compile new programs) than to an interactive language shell that exposes all language features available to code to be compiled. In comparison, Haskell's ghci also does not expose the full language in the REPL, i.e. you cannot create definitions.

In Staapl, interpret mode is not really necessary, since the reflection mechanism of standard Forth that is rooted in this interpret/compile distinction is not present. It is not that difficult to implement an extra compile step to make sure console input can use the full language, not only compiled words, to lessen confusion. In fact, a previous version of Staapl included a ``scratch buffer'' used for compiling code directly, and I'm planning a simplified language that uses this as a primary interaction method. However, the mechanism has been preserved because of the use of Flash ROM code memory in contemporary microcontrollers with Harvard architecture. Such memory isn't as easily overwritten as RAM, which makes generation of temporary code for translation of console commands difficult.
The standard Forth interpret mode allows the execution of compiled code without touching the Flash memory. This is extended in Staapl with a simulation layer that can offload some functionality to the host if it is not present as compiled code. The reason is that, compared to standard Forth, Staapl's Forth is heavily macro based, and some basic words might always be inlined, making them unavailable in interpret mode.

Entry: Coma and names: reflection vs. staging
Date: Wed Aug 13 10:51:53 CEST 2008

Why does Staapl need two abstraction mechanisms: macros and prefix macros? The short answer: Coma's code generators are flat anonymous higher order combinators. All names are just notational shorthands that identify combinator expressions. Because the naming mechanism is completely orthogonal to the semantics of the Coma language, abstractions built on top of the mechanism that associates names to expressions need a separate mechanism. See next post.

1. TRANSFORMATION

Let's define a macro as a function that transforms program code into program code. A Scheme macro transforms a list of Scheme expressions into a new Scheme expression. The macros live in the same space as the expressions that are being transformed. A Coma macro transforms a stack of instructions into a stack of instructions. The instructions come from some target concatenative language.

2. STAGING

One essential difference that is immediately observed is that program transformation with Scheme macros is reflective. Macros operate on Scheme code and produce Scheme code. The process of expansion is repeated iteratively until all macros have been invoked and only primitive forms remain. In Coma, program transformation is staged. Every transformer function is a component of a meta language that manipulates a base language as data. These languages are distinct. A Coma program defines a base language transformer.

Note that the Coma language is a higher order functional language. A lot of parameterization can be solved using anonymous macros. This approach gives a really simple structure to Coma that allows staging to be treated in a very straightforward algebraic way without having to worry about bindings at all. Absence of ``special forms'' allows one to maintain a very direct relationship between syntax (concatenation) and semantics (function composition).

3. PREPROCESSING

When one would like to parameterize over names in a Coma program, a different abstraction mechanism is necessary. Since Coma is embedded in Scheme, the straightforward approach is to use Scheme's reflective macro system to handle this. This mechanism is introduced into Staapl's Forth preprocessor as ``prefix macros'' or ``parsing words''.

Additionally, since names in Coma are all global, it makes a lot of sense to build some scoping mechanism on top of this. Coma uses both PLT Scheme's hierarchical module namespace mechanism and an additional subdivision into separate namespaces for compilation stages: (macro) and (target), plus namespaces to embed alternative concatenative semantics like (scat). On top of this, it is possible to mix Coma and Scheme code in a way that allows Scheme's lexical bindings to be spliced into Coma code.

Entry: The role of names in concatenative languages
Date: Wed Aug 13 12:56:04 CEST 2008

Lambda calculus is based on the two complementary operations of abstraction and application. Combinatory logic is a variant of the lambda calculus expressed in terms of just application and a couple of primitive lambda terms called combinators[1].
Further simplification is possible: the combinators can be made ``flat'' by using only left associative application[2].

In Scheme, a practical programming language based on the lambda calculus, some forms are allowed to be abbreviated using names. This abbreviation mechanism is not essential to computation: it is merely a notational device.

  * a set of primitive abstractions and values
  * function application
  * abstraction through LAMBDA
  * abbreviation of values through DEFINE

In Scheme there are 2 mechanisms that introduce names. DEFINE introduces abbreviations for expressions on the toplevel, while LAMBDA uses names to create parameterized code, which is later filled in using (explicit) application. Eliminating abstraction (LAMBDA) from Scheme can be done in a similar way as for the lambda calculus -> flat combinators conversion. Replacing application with only left associative application creates a language which has a mechanism to define names (DEFINE) that is completely independent of its primary abstraction mechanism (concatenation of terms). This is quite remarkable.

In Scheme it is possible to unify these two name mechanisms by interpreting DEFINE as the introduction of a lexical variable by wrapping a Scheme expression in a LAMBDA. However, in a concatenative language we are now free to take another route: to take names _out of the language_ completely, making them part of some meta language. This is what happens in Coma. All the metaprogramming machinery can be concentrated in a very simple combinatory ``algebra'' instead of having to deal with scoping of names.

[1] http://en.wikipedia.org/wiki/Combinators
[2] http://www.eecs.usma.edu/webs/people/okasaki/pubs.html#jfp03

Entry: Programmer bootstrap, Staapler and PicKit2
Date: Wed Aug 13 17:11:34 CEST 2008

Byron Jeff gives some good arguments here that a zero-cost bootstrap is not really possible anymore with contemporary hardware:

  http://www.nabble.com/Re%3A-New-open-USB-programmer-p18963084.html

Jean-Daniel Nicoud mentions in the same thread that:

  It is not worth to loose time with serial or parallel programmers. I
  like the Pickit2 USB programmer, cheap and it can be used as an
  UART-USB interface when it has finished its work of programming your
  Hex file. It encouraged me to develop a set of boards that makes both
  learning and development more easy. Have a look to
  http://www.didel.com/ It's brand new, so I put the links on the top
  of the page.

From the PICkit 2 Interface Guide (PICkit2SourceGuidePCv2-52FWv2-32.pdf), available in http://ww1.microchip.com/downloads/en/DeviceDoc/FirmwareV2-32-00.zip

  5.1.2 UART Mode
  UART Mode allows the PICkit 2 to be used as a simple UART.
  ICSPCLK = TX (data transmitted from the PICkit 2)
  ICSPDAT = RX (data received from target UART)
  Signal levels are inverted (ie Start Bit = GND) logic level signals
  between GND and VDD.

It would be interesting to try to use this instead of writing Staapler, or to let Staapler be the interface to networked programming on top of a PicKit host connection.

Entry: Improving Forth
Date: Mon Aug 18 16:41:36 CEST 2008

Does Forth need improvement? Forth already has metaprogramming capabilities, so why add another layer? The short answer is that Staapl REPLACES the metaprogramming mechanism with a different one. I use the name ``Forth'' in texts about Staapl because the machine model used is close enough to Forth for the resulting language to be called a Forth dialect.
But to understand the idea behind Staapl, the Forth language needs to be split up into two parts: (1) the 2-stack machine model that supports the language (2) the low-level reflective self-hosted language structure My claim is that (1) is an idea powerful enough to exist outside of (2), and that in a cross compiled, non-self-hosted language setup, the standard (2) can be replaced by something more powerful. In a minimalist self-hosted environment, there is little room for improving (2). Apart from small changes found in a myriad of incompatible Forth dialects, the core design seems to be quite optimal. The problem with Forth is that when you DO want to use some standard compiler techniques factored out in different steps, instead of the usual highly reflective ultra-minimalist no-datastructures-necessary Forth approach, using a low level language like Forth to implement the compiler just makes things more difficult. So I have this to say: * Staapl doesn't use Forth, the reflective language, as you know it. It is mostly inspired by the Forth machine model. It stays close to the Forth language legacy, but is significantly different in key points, where it replaces reflection by explicit staging. * Staapl uses the Joy model, a functional language related to Forth, to implement metaprogramming. For practical reasons, the Joy model is slightly adapted to be embedded in Scheme. PLT Scheme is used for its advanced module system that simplifies programming with macros, and its overall coherance as a well-documented Scheme based programming environment. Entry: Two kinds of parameterized code Date: Tue Aug 19 15:27:29 CEST 2008 1. Quoted macros are (compile time) values in Coma, so they can be used to parameterize other macros. 2. It is possible to use Scheme's macro mechanism and the ``parsing words'' extension built on top of it. Which one should you use? These two are somewhat complementary. The former is cleaner, since it works on the code transformation level. The latter is specifically designed to introduce names and is _just_ source syntax manipulation, but can be useful for abbreviations. An example of the first approach: waiting for a startbit, which is a high->low transition on a pin. macro : wait | q-cond? | q-cond? i if begin q-cond? i not until then begin q-cond? i until ; : startbit? PORTC 0 low? ; forth : wait-start ' startbit? wait ; Parameterization here is accomplished by passing the quoted macro "startbit?" to another macro "wait" that will instantiate (interpret) the macro. Instantiation/interpretation uses the short Joy-inspired word "i". Shorter, with anonymous quotations enabled: : wait-start [ PORTC 0 low? ] wait ; Entry: Staapl design choices Date: Fri Aug 22 10:45:43 CEST 2008 These are the 4 mostly orthogonal big choices that are at the core of the structure of Staapl. * [LIBRARY vs. LANGUAGE] Staapl is not a different language. It uses existing ideas and implementations from Scheme and Forth, and is structured as a library extension of PLT Scheme. This avoids all problems with concrete syntax. Concrete syntax can be built on top of the existing concrete syntax system (i.e. the Forth parsing words) but the abstract syntax is kept as clean as possible: Scheme syntax + purely concatenative for the intermediate representation and low level machine model. * [OOP vs. FP] Staapl is based on functional programming instead of object orientation. The main advantage is that partial evaluation is easier (almost trivial) in the absence of state. * [STACK vs. 
REGISTER] 2-stack machines live at the intersection of FP and imperative programming. This simplifies machine models: register (RISC) machines have NON-LOCAL state (registers) at their core, while stack machines are built around a LOCAL state mechanism. This difference cannot be stressed enough! * [UNTYPED vs. TYPED] Staapl is based on Scheme (untyped lambda calculus) and not on ML/Haskell. The reason for this is largely historical. Currently I see no reason why it cannot be implemented in a language with ML-style type system. Such an approach would make some corner cases (ad-hoc union types) more explicit. However, Staapl's _implementation_ heavily uses macros for creating new binding and composition structures, so it might need re-implementation in a different style. It seems a better approach to gradually introduce more static features into Staapl than to rewrite the system in an all-static language. Entry: PICkit2 v2.x support Date: Wed Aug 27 14:24:43 CEST 2008 The v2.x Firmware for Microchip's PICkit2 contains a little script interpreter used to decouple device support from programmer firmware. I took this as an opportunity to experiment a bit with a different flavour of metaprogramming. Since it's not a stack machine, I thought I'd try something special-purpose. The scripting language contains two parts: * Interactive commands. The device responds to a set of bytecode commands transported in 64byte buffers. * Scripts. In addition, it is possible to store scripts in programmer RAM using an extra set of instructions. The first implementation generated a simple assembler from a byte code table with names stored in a hash table, and a wrapper macro that allows composition of commands and scripts. Something in the form of: (begin-script (PRIM1) (PRIM2 arg1 arg2 arg3)) (begin-command (CMD1) (CMD2 arg1 arg2)) Where the first form would compile a script to a list of numbers and the second would compile an interactive command list and execute it, collecting the reply. The second implementation has the interactive and script commands all visible in the pk2 module namespace. Since they are all upper case, they do not collide with scheme code. The big advantage here is that the names are lexical and represent first class functions, and the composition is scheme's function nesting and procedure sequencing, not some special purpose 'begin-xxx macro. This allows things like: (begin (EXECUTE_SCRIPT ;; interactive command (PEEK_SFR #x80) ;; script (PEEK_SFR #x81)) ;; script (UPLOAD_DATA)) ;; interactive => (32 251) Here 'EXECUTE_SCRIPT refers to a function that concatenates all script code it receives as an argument, uploads it to the PICkit2 for execution, and waits for a reply. 'PEEK_SFR refers to a function that checks its arguments and compiles a byte code list. The code to generate the functional/procedural representation is really simple, and deals mostly with special case argument formats and binary data formatting (64 byte buffers). http://zwizwa.be/darcs/staapl/staapl/pickit2/interpreter.ss Then the byte code specification is just a table: http://zwizwa.be/darcs/staapl/staapl/pickit2/pk2script.ss The part- and family-specifict scripts in the PK2DeviceFile.dat database are similarly accessible using a single Scheme identifier. This means they can be passed as arguments to 'EXECUTE_SCRIPT. 
  (ProgEntryScript) => (250 247 249 245 243 0 232 20 246 251 231 127 247 250 251 246 232 19)

The other properties are available as well:

  (Vpp) => 12.0

Entry: Staapl command line compiler
Date: Fri Sep 5 12:14:52 CEST 2008

I've added a standard command line interface to the compiler. It is accessible as:

  mzscheme -p zwizwa/staapl/staaplc -- ...

The default language environment is pic18. It is a simple frontend for a couple of Scheme functions, and best explained by quoting its main routine, after command line argument processing:

  (eval `(begin
           (require ,(base-language))
           (forth-load/compile ,(filename))
           (current-console '(,(device) ,(baud)))
           (save-ihex ,(output-hex))
           (save-dict ,(output-dict)))
        (make-base-namespace))

Given a source file, it produces an Intel HEX file to upload to the target, and a script which instantiates the dictionary (global symbol table) and console information. The latter is only useful when the target contains the monitor code.

I.e. to compile a file "foo.f" containing:

  : foo 1 2 3 ;
  : bar foo foo bar ;

do this:

  mzscheme -p zwizwa/staapl/staaplc -- foo.f

This produces the files foo.hex and foo.ss

It is possible to install a launcher called 'staaplc' by issuing the following command:

  mzscheme -p zwizwa/staapl/install

You probably need to do this using administrator privileges. From then on it's possible to use

  staaplc ...

Entry: Staapl PLaneT package snapshots
Date: Mon Sep 8 21:26:43 CEST 2008

The latest prereleases of the Staapl PLaneT packages can be found here:

  http://zwizwa.be/darcs/staapl/staapl.plt

Other prereleases can be found in the same directory. To install, download the file and use the following command from the PLT distribution:

  planet fileinject zwizwa staapl.plt 1 7

where "1 7" needs to be replaced with the current prerelease version. To remove, use the following command:

  planet remove zwizwa staapl.plt 1 7

Entry: Blink-A-Led
Date: Tue Sep 9 16:55:33 CEST 2008

The example can be found here:

  http://zwizwa.be/darcs/staapl/staapl/examples/blink-a-led.f

Complete recipe:

  * Install PLT Scheme v4.1 or higher from http://plt-scheme.org
  * Run "mzscheme -p zwizwa/staapl/install" to download and compile the
    Staapl code. This creates a wrapper command line application called
    "staaplc".
  * Copy-paste the code below, or download it from the link given above,
    and run "staaplc blink-a-led.f" to compile it for the PIC18F452.
  * Upload the generated blink-a-led.hex

You don't need the full PLT distro to run Staapl. If space is an issue, download just MzScheme from:

  http://pre.plt-scheme.org/installers

The example will probably work on other PIC18 chips, but you might need to tweak the configuration bits.

I just noticed this example doesn't initialize the stack pointers. Add this definition and call it in init.

  : init-stacks
    #x7F 0 lfsr          \ DATA stack 16 bytes at #x80
    #x8F 1 lfsr          \ AUX stack 16 bytes at #x90
    0 movlw STKPTR movwf \ hardware return stack
    ;

---- cut here ----

\ stand-alone BLINK-A-LED for PIC18F452

load p18f452.f

\ config bits (40 MHz XTAL)
#x300000 org
  #x00 , #x26 , #x0F , #x0E , #x00 , #x01 , #x81 , #x00 ,
  #x0F , #xC0 , #x0F , #xA0 , \ B: boot write protect
  #x0F , #x40 ,

\ we just start at the head of flash memory
0 org

: main  init begin blink busyloop again
: init  0 LATB ! 0 TRISB ! ;
: blink LATB 0 toggle ;

\ The inner loop takes 3 cycles to execute. [DECF + BNZ]
: busyloop 100 for 100 for 100 for next next next ;

Entry: Incremental development
Date: Fri Sep 12 11:25:35 CEST 2008

As already hinted before, the Staapl framework contains some tension between ``transparent'' and ``toplevel'' development frameworks.

TOPLEVEL: Also called ``image based'' development. Typically in a dynamically typed language it is possible to create and change definitions during interaction. This has traditionally been the model in Lisp, Forth and especially Smalltalk. This model is heavily based on mutation and the sequence of code (re)definitions.

TRANSPARENT: Here application structure is a property ONLY of the source code, and does not depend on load order. It is essentially declarative. All definitions are permanent, and cannot be redefined. This approach is typically taken in statically typed languages, but is also used in the PLT Scheme module system.

Staapl allows you to use both approaches, mostly coming from PLT Scheme's "load" and "require" forms. It is possible to create applications with a fully static module structure which can then be interactively extended. The workflow looks like this:

  1. Build an application kernel using the module system (this can be a
     single module with a flat namespace).
  2. Compile it and upload the kernel image to the target system.
  3. Connect a console to the target system, and incrementally upload new
     code. When it's stable, add it to the kernel and reflash the whole
     application.

In order for 3. to work, the console application needs to know where to find the kernel code, since the target does not contain a symbol table. This can be done either by not leaving the compiler after compiling and uploading the kernel, keeping all the symbol tables alive, or by reconstructing this state using a dictionary file. Such a file looks like this:

  (require (planet zwizwa/staapl/prj/pic18))
  (init-prj)
  (forth-load/compile "/tmp/foo.f")
  (words! '((asdf code 0)))
  (console! '("/dev/staapl0" #f))
  (pointers! '((code 1) (data 0)))
  (begin)

The first two lines initialize the compiler namespace with the base language. The third line re-loads the top application file, reconstructing all macros and code. The fourth line makes sure the target words are linked to the correct addresses. The remaining lines set console parameters and pointers to free code and data memory respectively.

The catch is in the 'words! and 'pointers! procedures. The source code can get out of sync with the binary code uploaded to the microcontroller. When the application has changed compared to the code present on the microcontroller, these two commands are relevant: they will replace the dictionary compiled from the application source so it reflects the older, out of sync target code, ensuring that interactive compilation still works.

This approach is intentionally not transparent. There are cases where it is more convenient to let the application source code and on-target compiled code get slightly out of sync, to benefit from the shorter edit-compile-upload cycle which doesn't need a target reboot. In those cases, however, it is important to have the new _macros_ available for interactive compilation.

Entry: Staapl vs. MetaML / MetaScheme
Date: Sat Sep 13 09:31:58 CEST 2008

This post is an attempt to list the differences between the kind of metaprogramming Staapl uses, and the approach used in (untyped) MetaML.
Scheme's hygienic macros: Allow manipulation of arbitrary expression trees, but provide both referential transparency (all identifiers take semantics from the definition site) and avoid variable capture (identifiers introduced by a macro never bind variable references that were already present in the code.) Scheme's approach is about IDENTIFIERS, not about abstract syntax trees. The bindings of these identifiers (syntax or values) drives the expansion. Lexical structure comes from extending the expander's local identifier environment whenever new binding forms are encountered. Binding information from the local environment is available through functions like 'bound-identifier=?. Lisp macros and quasiquotation: A simpler templating mechanism that does not interpret names in any way, and thus has problems with hygiene when manipulating or generating code with lexical structure. MetaML's staging: Ignoring the type system details, this becomes MetaScheme[1]. It is based on the operations: Bracket, Escape, Lift, and Run. These operations are performed on a code data structure. Such a system needs a more highlevel representation of code as lambda terms. Care needs to be taken to ensure proper renaming when manipulating binding forms and providing cross-stage-persistence. Staapl's staging: The comparison to MetaML is a bit difficult because Staapl doesn't have a clear concept of ``code object'' in the sense that MetaML does. Staapl's metaprogramming is function-oriented instead of data-oriented. The central object in Staapl is a ``code transformer'' which operates on primitive stack machine code. Metaprogramming in Staapl is established by two mechanisms. The first one is similar to MetaML's 4 operations on code objects, but operates on compositions of code transformers instead of a code data structure, and is no longer a true staging operation in the MetaML sense. The second is the definition of primitive transformers operating on stack machine code (the data-oriented part). Due to the absence of a binding mechanism in Coma code, the equivalents of Bracket and Escape are simply quasiquote and unquote. Bracket corresponds to the concatenative composition form 'macro: while escape is an 'unquote operation which allows interpolation from the Scheme lexical environment. There is an analog for MetaML's cross-stage persistence (Lift), but this is usually better made more explicit in Staapl: Unquote operations are usually preferred over automatic lifting: (let ((square (macro: dup *)) (one 1)) (macro: ,square ,square ',one +)) ;; 4th power + 1 This is for two reasons: Scheme and Coma have different name spaces to support different language semantics side-by-side and unquoting explicitly chooses between them. By default 'macro: takes from the '(macro) namespace, i.e. the '+ 'dup and '* identifiers in the example above. Moreover, Coma code handles values and functions differently, which requires a choice of how Scheme identifiers should be interpreted. This is reflected in the two unquote forms , and ', which insert functions and literals from the Scheme name space in the Coma code respectively. However, when it's clear whether identifiers should represent functions or values, automatic cross-stage lifting is easily added. It is used in the 'tscat: composition operation where Scheme lexical names can be referenced as values: (let ((one 1)) (tscat: one +)) ;; increment Similarly, one could choose to use default function semantics, where Scheme lexical identifiers are dereferenced as Coma functions. 
This is used in a language extension that allows the introduction of local functions. Note that the code generator also closes over identifiers in the currently visible '(macro) namespace. These identifiers behave exactly as Scheme identifiers; they are implemented as prefixed identifiers. There's a corresponding let form:

  (let-ns (macro) ((inc (macro: 1 +)))
    (macro: inc inc))   ;; add 2

The analog for MetaML's Run operation is more difficult to identify, since this requires on-target stack machine code execution. While it is possible in Staapl, compilation is usually kept separate from on-target code execution. However, the different model used in Staapl does present an ``instantiate'' operation which produces stack machine code from a code generator. A generator is a code transformer that accepts the empty program.

Up to now we ignored the other metaprogramming mechanism that uses pattern matching to define machine code transformer primitives. This is an essential part of Staapl, and is in some sense complementary to the Scheme->Coma bracketing operations in that it can be used to represent partial evaluation rules which _deconstruct_ code. They have no equivalent in MetaML, but resemble abstract interpretation techniques that have been used in conjunction with MetaOcaml code construction. In Staapl this is an integral part, and it's the interplay between quasiquotation and deconstruction that gives the system its expressiveness.

[1] http://okmij.org/ftp/Computation/Generative.html#meta-scheme

Entry: Staapl's metaprogramming approach
Date: Tue Sep 16 13:06:06 CEST 2008

Staapl isn't a conventional multi-stage system which concentrates on ``code as data''. Staapl concentrates on _code transformers_, where higher order generation is modeled with higher order functions (functions producing code transformers) instead of explicit staging.

1. MACHINE CODE and MACHINE CODE TRANSFORMERS

There are only two forms of ``code'' in Staapl:

  1. Machine code (M), represented as a list of primitive machine
     instructions, where the machine primitives are modeled as an
     algebraic data type[1].

  2. Machine code transformers (M->M), modeled as functions that operate
     on _one end_ of a list of machine primitive instructions. These
     machine code transformers thus behave as instructions of a stack
     machine.

( Assume for now that the target machine is a concrete 2-stack machine. The partial evaluation mechanism explained below is suitable to include compile time mapping to register based machine architectures in a very straightforward way. )

Because machine code transformers can be combined using function composition, they can be used to build a programming language (== a set of primitives + a composition mechanism). This forms the language of Concatenative Macros called Coma. By requiring M->M transformers to operate only on one end of generated code, Coma behaves as a stack language that can support a Forth language dialect.

An important observation is that this language doesn't merely support macro _expansion_ as one would see in a macro assembler. The mechanism necessary to pass arguments to macros is _identical_ to the one that represents machine code: the algebraic data type mentioned above. Macro arguments and target code are placed on the same level by the introduction of the (pseudo) machine instruction [qw ], short for Quote Word. This instruction causes a number (a machine data word) to be loaded on the run-time parameter stack when the generated program executes.
However, when the qw instruction appears as the input of a transformer, it can be removed from the instruction stack as long as it is possible to use the argument it carries in a compile time computation that doesn't change the meaning of the code. This is called constant folding[2].

An example to illustrate this: the macro "1" pushes the machine code instruction [qw 1] to the current code stack. The run time effect of this code is to load that number onto the run time parameter stack. The macro "2" does the same for [qw 2]. The composition of these two macros, the Coma program "1 2", will produce code on top of the code stack looking like

  ... [qw 1] [qw 2]

with the top of the stack on the right. This code has the run-time effect of loading the two numbers on the run-time stack. With this code present on the code stack, the code transformer associated with the Coma code "+" can perform a compile-time computation, and replace the top of the code stack with

  ... [qw 3]

In other cases "+" will merely insert code to perform an addition at run-time, using data from the run-time parameter stack.

An essential observation is the possibility to generalize the arguments of [qw ] to any kind of abstract data type that can be represented in Scheme, but behaves as a ``constant'' in the sense that it can somehow be eliminated at compile time. I.e. it can be a code generator, or a data structure with a high level representation of code to be generated. The key point is that these objects can be manipulated _from within Forth_ code as if they were present at run time, hiding the compile-time specialization behind the scenes.

This allows the addition of high-level domain-specific language extensions written in Scheme to a project with low-level logic written in Forth, keeping the local semantics simple. Both the high and low level languages are thus extensible, and can be used as the substrate for DSLs. Scheme's macro system can be used to construct "data-entry" front-ends for algorithms that compile a specification down to a level where it can run on the abstract machine, and interface with low-level code written in Forth (or any other language based on the Coma language). All identifiers that occur in any part of the code are managed by PLT Scheme's declarative module system, providing a proven way to control the complexity of large systems through modularity.

2. HIGHER ORDER FUNCTIONS

In order to make this mechanism more flexible, the Coma _composition_ mechanism is exposed to Scheme. Next to deconstruction and construction of stack machine code, it is also possible to _construct_ concatenative macros from Scheme, and to _evaluate_ them. This mechanism is orthogonal to the machine code pattern matching functionality.

Because of the absence of lexical structure in the combinator based Coma language, there are no hygiene issues when the composition mechanism is extended with a _quasiquotation_ mechanism that can be used to construct code templates. These operations correspond to the MetaML operations of Bracket (code composition), Escape (unquote) and Run (macro evaluation), with MetaML's code object replaced by Staapl's code representation based on stack machine code transformers. MetaML's cross-stage persistence is mostly handled by Scheme's lexical scoping rules, and doesn't present a problem because code is opaque and represented by closures.
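To make the higher order aspect concrete, here is a small sketch. It assumes the Staapl/Coma modules are loaded; the helper name is invented, and whether a full Scheme expression can be unquoted like this is an assumption based on the quasiquotation examples from the previous post. The idea is simply a Scheme function that builds a concatenative macro whose size is decided at generation time.

  ;; Compose n copies of the macro m into a new macro.
  (define (n-times n m)
    (if (= n 1)
        m
        (macro: ,m ,(n-times (- n 1) m))))

  (define pow4 (n-times 2 (macro: dup *)))  ;; squares twice: x -> x^4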
3. MACHINE MODEL

An important remark is that the hardware abstraction layer for the concrete machine is not the machine language, but a set of named Coma transformers that needs to be implemented for each concrete machine. This has the advantage that the machine model doesn't need a byte code and a separate JIT. Implementing a port uses exactly the same mechanism as the one used for the more high-level metaprogramming on top of the machine model.

A port is a set of rewrite rules for a concrete instruction set, using the pattern matching and quasiquotation metaprogramming tools. A port can be kept as straightforward as possible, using only part of the real machine's instruction set. Optimizations can be added as needed, and pseudo instructions can be added as a way to factor code rewrite rules. (E.g. the PIC18 port factors the highly specialized test+branch opcodes into separate test and branch pseudo opcodes.)

[1] http://en.wikipedia.org/wiki/Algebraic_data_types
[2] http://en.wikipedia.org/wiki/Constant_folding

Entry: A type system for Coma
Date: Fri Sep 26 11:30:58 CEST 2008

I've been thinking a bit about a type system for Coma, and I've run into two problems where dynamic typing features are used in a way that is not trivially eliminated:

  * Quasiquotation from Scheme expressions.
  * Code generation dispatch on operand values.

My conclusion is that adding the appropriate ``compile time value passing'' is probably going to impose some restrictions that require invasive restructuring. A more interesting approach is probably to try to build the core of the compiler on top of a language with static typing. The functionality of the compiler itself is available in high-level form and can be straightforwardly transferred to Haskell or Ocaml code.

Ignoring the prefix -> postfix trick, the following paper contains a straightforward type system, modeling stacks as nested tuples:

  Techniques for Embedding Postfix Languages in Haskell by Chris Okasaki.
  Haskell Workshop, October 2002, pages 105-113.
  http://www.eecs.usma.edu/webs/people/okasaki/hw02.ps

  add :: ((s, Int), Int) -> (s, Int)

which is simply extended using type classes as:

  add :: Num n => ((s, n), n) -> (s, n)

It looks like this solves the main embedding problem. The patterns macro can then be replaced by Haskell's (or Ocaml's) pattern matching functions.

Such an approach could also be used as a test suite for Staapl. As long as a translation exists from (a subset of) Staapl's transformer core to an embedding in a typed language, the system (subset) is type safe. This approach is rooted in the idea that program correctness can be approached from two sides: dynamic type systems + tests (allow more, but cannot eliminate all run-time type errors) or static type systems (allow less, but eliminate all of them).

Entry: A MetaOcaml offshoring example
Date: Sat Sep 27 10:30:24 CEST 2008

This post contains some examples of metaprogramming based on MetaOcaml's staging operations and Ocaml->C offshoring. By representing a subset of C as a subset of Ocaml, basic Ocaml->Ocaml metaprogramming can be used, avoiding manual generation of C code.
MetaOcaml: http://www.metaocaml.org/
terminology: http://web.cecs.pdx.edu/~sheard/staged.html

See the following paper for an introduction by the authors:

  ``Implicitly Heterogeneous Multi-Stage Programming''
  http://www.cs.rice.edu/~taha/publications/journal/ngc07.pdf

The idea is that heterogeneous metaprogramming can be implemented in a homogeneous staged language by directly mapping a particular subset of a representation language to an as large as possible subset of a target language. This translation is different from ordinary compilation in that:

  1. Translation is DIRECT. In order to give full control over generated
     code, there is no optimization as there is in an ordinary language
     compiler.

  2. The aim is full coverage of the TARGET language, not the source
     language as happens in ordinary language compilation. Support as
     many of C's language features as possible, but do not attempt to
     translate all of Ocaml's features, like higher order functions,
     because they have no equivalent target representation.

The staging operations used are:

  .< e >.              bracket
  .~ e                 escape
  .! e                 run
  .!{Trx.run_gcc} e    offshored run, using the gcc backend

GENERATING PLAIN C CODE
-----------------------

Ocaml code that can be translated to C needs to have the following properties: functions need to be first order, and can use only the C base types char, int, double and 1- or 2-dimensional arrays of those base types as function arguments. Functions can return only base types. C functions that take multiple arguments correspond to first-order Ocaml functions that take a single tuple as input.

  # .!{Trx.run_gcc}(.< fun (a, b) -> a + b >.) ;;
  - : int * int -> int = <fun>

This produces a piece of C code:

  int procedure(int a_1 , int b_2 ) { return a_1 + b_2; }

The other code in the C file is necessary to run within Ocaml, but is guarded by an #ifdef OCAML_COMPILE to make sure the source can be included as-is into different projects.

The following defines a higher order (curried) function, which cannot be translated.

  # .!{Trx.run_gcc}(.< fun a b -> a + b >.) ;;
  Can't translate function return type in int>
  Can't run sth that is not a function or is not of int type; you passed int -> int>
  Fatal error: exception Typecore.Error_Translation_type_check_subset

Replacing + by +. (Ocaml's numeric operations are not polymorphic between ints and floats) generates floating point code:

  # .!{Trx.run_gcc}(.< fun (a, b) -> a +. b >.) ;;
  - : float * float -> float = <fun>

  double procedure(double a_1 , double b_2 ) { return a_1 + b_2; }

Arrays are supported: Ocaml's arrays map to C arrays.

  # .!{Trx.run_gcc}.<fun x -> x.(0) + 123>. ;;
  - : int array -> int = <fun>

  int procedure(int * x_1 ) { return x_1[0] + 123; }

The looping construct supported is Ocaml's "for" loop. Note that functions cannot return void (unit), so this includes a dummy "0" expression after the side-effecting array element assignment loop.

  # .!{Trx.run_gcc}.<fun x -> for i=0 to 99 do x.(i) <- 123 done ; 0>. ;;
  - : int array -> int = <fun>

  int procedure(int * x_1 ) { int i_2 ; for(i_2 = 0; i_2 <= 99; i_2++) { x_1[i_2] = 123; } ; return 0; }

Options can be passed to gcc like this:

  # ( .!{Trx.run_gcc with compiler_flags="-g -O2"; working_dir=".";
         file_name="test"; main_function_name="procedure"}
      .<fun x -> x + x>.) 2;;
  - : int = 4

The following examples use a file called /tmp/test.ml which when loaded generates test.c as a side effect.

  #use "/tmp/test.ml" ;;

This assumes the .ml file contains a line

  let p = .!{Trx.run_gcc}procedure

at the end, where "procedure" is bound to the code object that needs to be offshored.
PARAMETERIZED CODE ------------------ Creating parameterized code can be done using two mechanisms: explicit excape (unquote) and cross-stage persistence. The simplest example I can think of is a specialized addition operation. let make_adder c = .< fun x -> x + .~c >. let procedure = make_adder .< 123 >. The following C code is produced from the code above: int procedure(int x_1 ) { return x_1 + 123; } This is an example of explicit escape: the variable "c" is bound to a piece of code that is spliced into the body of the function. The same example using a mechanism called "cross-stage persistence" (csp) looks like this: let make_adder n = .< fun x -> x + n >. let procedure = make_adder 123 Here one of the operands "n" is taken from the lexical environment outside the quoted code. This variable lives at an earlier stage than the code fragment, and so needs to be carried over to the next stage. Carrying over such values to the next stage is no problem as long as one stays inside Ocaml, where csp values can remain abstract. In order for csp values to work for gcc offshoring, they must be representable as concrete C literals. In the example above this is no problem since "n" refers to a numeric value. ABSTRACT INTERPRETATION ----------------------- Code expansion without reduction (partial evaluation) is quite limited. When generating code, often some kind of specialization is desired which eliminates unnecessary run-time computations. In the following paper the technique of abstract interpretation is employed to generate FFT code: http://www.cs.rice.edu/~taha/publications/conference/emsoft04.pdf Instead of working with abstract code objects directly as produced by the bracket operation, and consumed by the run and escape operations, one works with types that allow the representation of extra information when it is available. The idea is that code represented by such objects can be "processed" without having to look inside any abstract code objects. Deconstructing code objects could violate guarantees provided by the type system and is not possible in MetaOcaml. I.e. an integer could be modeled as One | Other, indicating that we know whether the code value is 1 or otherwise do not have more information and will model it as an abstract integer code object. This enables the definition of an abstract interpreting multiplication operator operating on this type, that can eliminate multiplication by 1 at generation time, but defers the multiplication operation to the next stage by generating multiplication code when there is no information about the value of the code. Entry: A new introduction. Date: Sat Mar 7 13:08:13 CET 2009 After a couple of months of settling time, here's a new basic introduction to Staapl. ---- Staapl is a metaprogramming system based on the Forth and Scheme programming languages. Staapl currently contains the following distinct layers: * A compiled Forth-like language for the Microchip PIC18 microcontroller architecture. * A language for macro definition and instantiation (target code generation) using Forth syntax. Used to implement the PIC18, and written in terms of Coma. * A stack-based macro language called Coma. It can be used for the specification of Forth-like stack-based languages ("specification by compiler"). Coma primitives generate and process sequential imperative machine language (a real stack machine or a RISC machine emulating a stack machine). Coma contains mechanism for creating primitives that support partial evaluation based on pattern matching. 
Coma's composition mechanism (concatenation) can be manipulated from Scheme using hygienic quasiquotation. Coma is written as an extension of Scat.

  * An untyped stack language called Scat. Scat is written as an extension
    of the Scheme programming language (PLT Scheme), and is very similar
    to the Joy language.

These layers comprise the layered organization of the stack languages. Most of the high-level code is Scheme, plus specialized macros written in Scheme.

How does Staapl differ from its sources of inspiration?

  * Staapl's idea of Forth differs from traditional Forth in that it is
    heavily macro-based: all "words" in Staapl's Forth are macros, and
    macro and procedure definitions are (mostly) the same: procedures are
    instantiated macros.

  * Scat is not Joy, because Scat's code is opaque, while Joy allows for
    direct introspection and creation of code objects using list
    operations.

  * Coma is not just a macro assembler. Macros are _programs_ that can
    operate on data (base machine language programs) in arbitrary ways.

  * Staapl is very different from metaprogramming tools based on lexical
    scoping (e.g. MetaML). Staapl uses a lexically scoped metalanguage
    (Scheme) to metaprogram a stack language which lacks local variables,
    avoiding problems with lexical scope in the target language.

Entry: Small computers
Date: Sat Mar 14 20:15:10 CET 2009

Forth & lisp & small computers. http://netzhansa.blogspot.com

Eventual goal: independent "full" but simple computer systems. Getting rid of the current pool of fast time-to-market crap. The basic idea is networks and computers (languages). Find out what's wrong with unix and fix it.

Entry: Stack machines with large register files
Date: Sat Mar 14 22:01:18 CET 2009

Somehow a lot of the actual code I wrote in Staapl's Forth dialect is an exercise in global resource allocation. I never knew data indirection was so expensive until I met the PIC chip :) But the idea is more general: the tension between simple machine/hardware and macros and code generation is very visible with this chip. Programming on a bigger RISC with cheap indirection makes this not so obvious.. So maybe the point is really to get to hardware in the end?

Entry: Stacks and Lambda
Date: Mon Mar 30 14:35:22 CEST 2009

To be fair, I have to admit that my earlier obsession with using Forth and stacks (early Packet Forth and Badnop) for everything was a delusion rooted in ignorance. For general purpose programming the lambda calculus extended with some useful binding forms is a much better model. However, this obsession did put me on quite an interesting path about functional programming and staging, and the use of stack machines as compilation targets with a human-programmable/readable machine language. Forth is slightly more abstract than register/memory machines, which have no locality at all. You need stacks.. So better embrace them.

Entry: Restating Goals
Date: Wed Apr 8 15:52:17 CEST 2009

Before restating the new goals, I'm going to list the things that are no longer part of the Staapl objective:

  - DSP / dataflow code (moved to the IP project). The reason for this is
    quite straightforward: dataflow representation exposes parallelism,
    and implementation is mostly about the interaction between
    memoization and code/block sequencing: how to keep track of
    intermediate results given a certain memory hierarchy. Concatenative
    code is inherently _serial_ in nature, and execution context is
    easily swapped, which makes it more suited for a concurrency oriented
    programming model.
It might make sense to use concatenative code as a _specification_ language for DSP operations by translating it into dataflow representation, but that is not the goal of Staapl as of yet.

So, what are the goals?

  + Understand modules in a language without local names. This has been
    the most important problem in using Staapl's Forth to write
    applications. Almost all generic code will have macro form and
    directives about instantiation.

  + Investigate bottom-up (safe) language design and verification rooted
    in concrete machine semantics and compile-time checks for language
    abstraction (typed macros).

  + Find a meet-in-the-middle (MIM) model for small microcontrollers
    based on concrete semantics.

  + Find the border between a) static semantic relations, relating the
    concrete machine through the MIM model to higher order static
    functions, and b) dynamic behaviour, which is interaction with memory
    and communication with external events/devices/machines.

  + Optimize the MIM model for FPGA implementation.

Some of these terms are a bit vague, but I hope the general idea will become more clear later on.

Entry: Eliminating Reflection
Date: Mon Apr 20 10:20:12 CEST 2009

Funny how I started out being convinced about the possibility of deep reflection as the one true way to build Staapl. Ever since I started to understand hygienic macros and phase separation in PLT Scheme I'm leaning more towards the static side. This journey has given me a much clearer understanding of the reason why you might want to _limit_ flexibility: to be able to reason more about the structure of the program without this being a consequence of the program's dynamics.

At the one end there is mutable everything (Smalltalk style reflection) and late binding (interpretation), possibly cached. At the other end there is declarative structure (once things are created, they never change) and early binding (things don't get re-interpreted). Compilers are quite straightforward. The most important thing to figure out is the order in which things are evaluated. No better way than to stick to non-cyclic structure and make linking of cyclic structure explicit. (As opposed to letting the program bootstrap itself using mutation.)

Entry: Forth is already CPS / ANF / SSA
Date: Sun Apr 26 17:19:47 CEST 2009

( Background: I've been skimming some papers by Oleg Kiselyov about metaprogramming[4]. Some of the recent papers are about avoiding CPS or monads for writing generators, and instead using delimited continuations to hide side-effects so they won't mess up guarantees made by the type system. However, in a concatenative language, there is no need to do anything other than CPS. Maybe there is something interesting to find there? )

I've been re-examining the way RPN code gets compiled to Scheme after having trouble with the lexical variables in the Coma macro language. Basically I was mixing forward (CPS-style) and backward (expression style) nesting.

A concatenative function in CPS has two parameters: 'p' the parameter stack and 'k' the continuation. In Scheme syntax the composition of CPS functions (A B C) is written as:

  (lambda (p k)
    (A p (lambda (p)
      (B p (lambda (p)
        (C p k))))))

Alternatively, concatenative code can be represented quite directly with p -> p functions and implicit continuations as:

  (lambda (p)
    (let ((p (A p)))
      (let ((p (B p)))
        (let ((p (C p)))
          p))))

Let's call this the "nested let" form. It is related[2] to single assignment form[3] and administrative normal form[1]. This is what the previous CPS form gets optimized to in PLT Scheme. Look at the output of "mzc --decompile" to see this. To see that these two are closely related, think of the invocation of a continuation as the assignment to a variable and a jump to the code label / execution of the next primitive.

The nested let form above is equivalent to the following nested expression:

  (lambda (p) (C (B (A p))))

which is _backwards_ from the perspective of the concatenative code. This backward notation together with forward nesting of let expressions led to a lexical variable form which I couldn't get to nest properly. However, in the nested let form local bindings are introduced quite naturally, binding to the right.
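To make the p -> p representation concrete, here is a small self-contained sketch in plain Scheme. It does not use Staapl's actual combinators; the names are invented for illustration.

  ;; Primitives are functions from parameter stack to parameter stack.
  (define (lit x) (lambda (p) (cons x p)))
  (define (add)   (lambda (p) (cons (+ (car p) (cadr p)) (cddr p))))

  ;; Concatenation is function composition, threading the stack left to right.
  (define (compose2 f g) (lambda (p) (g (f p))))

  ;; The program "1 2 +" applied to an empty stack:
  ;; ((compose2 (compose2 (lit 1) (lit 2)) (add)) '())  =>  (3)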
[1] http://en.wikipedia.org/wiki/Administrative_normal_form
[2] http://chak.web.cse.unsw.edu.au/~patrykz/papers/ssa-lambda/
[3] http://en.wikipedia.org/wiki/Static_single_assignment_form
[4] http://okmij.org/ftp/Computation/Generative.html

Entry: Released staapl-0.5.8
Date: Mon May 18 12:03:00 CEST 2009

After a couple of months spent doing Linux/C systems programming, I'm happy to be back to Scheme and Forth. Version 0.5.8 is available on PLaneT[1]. During the last 2 months I've been working on a single big change:

  * Forth projects as PLT Scheme modules. The previous dependency on
    namespaces and dynamic overloading of functionality using mutation or
    Scheme parameters has been removed. The compiler is now completely
    ``hook-less'', with all pluggable behaviour organized as PLT Scheme
    units, and the rest implemented as bottom-up static modules.

Note that while this move to a fully static compiler makes it easier to structure programs, it is still possible to shoot yourself in the foot with REPL-based interactive incremental development for exploratory programming. We don't want to give that up!

For details about how this was done see the ramblings[2] starting mid-March 2009. It was quite a bit of work to disentangle, but apparently the structure was already there in spirit. I believe it is now possible to gently move to a more statically typed system.

[1] http://planet.plt-scheme.org/display.ss?package=staapl.plt&owner=zwizwa
[2] http://zwizwa.be/ramblings/staapl

Entry: Onward
Date: Mon May 18 12:58:04 CEST 2009

So what's next? There are several things to do; one of the most pressing practical ones is to get USB to work on the PIC18F2550. With the new static structure it should be straightforward to write a mini-language for implementing USB drivers.

The second thing that has been on my mind is a proper partial evaluation architecture. I'd like to write a concatenative language that doesn't distinguish between functions and macros, but performs inlining and function specialization automatically. What I've been calling partial evaluation up to now is actually only a form of eager constant folding: it doesn't do automatic specialization but requires you to specify which functions are macros and thus to be inlined.

And then there is of course documentation. I'm thinking about writing a single manual, of which the first section is an introductory guide.

Entry: Live interaction
Date: Mon May 25 13:37:42 CEST 2009

While the compiler language tower is ever moving towards a more static approach, the target interaction language has just become more dynamic. The (target: word ...) form is like any other RPN form in Staapl, where the (word ...) part represents concatenative code with possible prefix parser extensions. However, as opposed to other such forms it has late-bound semantics: identifiers are interpreted at run-time in a toplevel environment.
The static name structure of the debugger (as opposed to that of the program) isn't all that interesting. For debugging, dynamic tweaking power comes in handy. Identifiers in the (target: ...) form are tried in several interpretations successively, in the following order, until one succeeds:

  1. Prefix parsers from the (target) dictionary, like "ul" (upload),
     which modify the semantics of subsequent words. Most of these will
     expand to scat code that operates on filenames or word names.

  2. Compiled words from the (target) dictionary present in the target
     code memory. These are simply executed.

  3. Macros from the (macro) dictionary. These are compiled to
     intermediate code and then interpreted. This doesn't work for all
     macros (most notably control words) but provides a convenient way to
     perform a computation without having to instantiate macros as
     compiled target words.

  4. Words residing in the (scat) dictionary are executed.

  5. Scheme procedures are dynamically converted to scat semantics and
     executed.

Doing it this way gives maximum flexibility (all names can be redefined at run-time in the toplevel namespace) and avoids unnecessary static wrapping. All the target interaction is defined in live/tethered.ss as a bunch of Scheme functions accessible at all levels. The target console is a very simple layer on top of this. One example of how this is quite useful:

  tom@zzz:~/staapl/app$ mzscheme picstamp.dict
  Connected (/dev/ttyUSB0 230400)
  erasing blocks: 22
  memory clear.
  Press ctrl-D to quit.
  OK scheme
  > (define (x) (target> ul /tmp/eq.f))
  > OK
  x include "/tmp/eq.f" ......OK

Entry: Compile mode accessible at console
Date: Thu May 28 17:16:22 CEST 2009

With the recent simplification of the interaction language, it wasn't so difficult to allow a switch to compile mode from the interactive console. The words

  forth macro :

will trigger compile mode. The word and all subsequent words on the line will be compiled as if they occurred in a Forth file, and uploaded to the target. Similarly, the following words perform compilation, but will switch back to interpret mode after the single symbolic argument they take:

  variable 2variable require planet staapl

Note that in standard Forth the ';' word will exit compilation mode, but in Staapl Forth it just indicates procedure exit. A word or macro can have multiple exit points.

This also allowed the creation of the '::' word, which will compile a line as Forth code, upload it and execute it. This makes it easier to test macros. To run the code again use 'last'.

Entry: Writing device drivers
Date: Tue Jun 2 11:08:00 CEST 2009

What I'd really like to figure out is whether there is a proper workflow for writing a device driver. I'm getting a good dose of it with the USB code. In my experience it is always trial and error: even if the datasheet has correct information, it is usually presented in such a way that you have to "sort it" first.. Then, if there is example code, it is usually difficult to understand the underlying idea from a big ball of mud that works. Usually you want to do it slightly differently for whatever reason, so picking out the right information from example code becomes error-prone.

The problem is really state: as I've mentioned before, hardware is very stateful and very sensitive to the order of operations. Its state space usually contains a lot of non-working or even lockup states that have to be avoided. This seems to get only worse for modern equipment: devices get more complicated and much of the "getting it right" is left to the software.
This makes sense: hardware becomes simpler to build and is more flexible that way, while software just needs to be written once. However, this doesn't make the life of the driver writer much fun.. If I'm to tackle any real problem, maybe this is the one?

So, what is a device driver exactly? It's the missing piece of control software that abstracts a collection of loosely coupled state machines as a coherent piece of equipment with simple services and a couple of modes: an object. The number of modes (states) presented to the user should probably be kept to a minimum. The most flexible approach is a simple on/off mode.

Why is writing a device driver so difficult? Individuality of modules gets lost. Ideally, you want to hide state, but instead the controller (driver software) gets to manage all of it. So the first thing to do is to abstract all items that are actually independent. I've found building these abstractions to be the most difficult point, because interdependency (i.e. resource sharing) is very hard to articulate.

Let's find some publications. NDL[1][2] seems to be the state of the art. Summarized, it has the following features:

  - Bitfields are individually addressable.
  - The language understands the concept of "device state".
  - States can be coupled: state transitions might cascade.
  - Accesses can be predicated (only valid in certain states + state
    transitions generated).

Optimizations (metric = # IO ops):

  - idempotence (prune successive writes to state registers)
  - field aggregation (combine stores to bitfields)

These features seem simple yet effective. Maybe some similar approach could be taken with some Staapl Forth extensions.

[1] http://www1.cs.columbia.edu/~sedwards/papers/conway2004ndl.pdf
[2] http://cs.nyu.edu/~cconway/present/ndl-overview-ibm.pdf

Entry: Why use (Scheme) macros?
Date: Tue Jun 2 13:17:13 CEST 2009

Essentially this[1], but put a bit differently. Macros escape the default semantics of plain Scheme:

  - names produce values by variable reference
  - parentheses produce values by passing values to abstract computations
    (application)

The exceptions to this standard behaviour are expressed using the primitive forms LAMBDA and QUOTE. The LAMBDA form is used for two things: to _introduce_ variables in a body of code (create code with holes in it) and to _abstract_ code as a value that can be stored in a variable. This value is a computation that takes the inputs it receives upon application, puts them in the holes, and evaluates the resulting expression, producing a result value. The QUOTE form changes the meaning of parentheses and names entirely, and instantiates them as a data structure.

In essence, all other macros compose these two forms or abstractions built on top of them. One could distinguish different classes based on distinct intentions:

  (A) Abstraction: the only way to build a construct out of macros is to
      build it as a macro.

  (B) Binding: introducing names in code bodies using LAMBDA. New binding
      forms facilitate the introduction of local names for domain-specific
      data structures (e.g. data structure pattern matching).

  (C) Code abstraction: strict evaluation (run-it-now) can only be escaped
      by wrapping code in a LAMBDA form. Sometimes it makes sense to build
      abstractions that modify _when_ code executes, by keeping possible
      future behaviour stored in a variable (e.g. lazy evaluation).

  (D) Data abstraction: analogous to code abstraction, it is possible to
      create data structures using the QUOTE form (e.g. QUASIQUOTE).
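Two tiny sketches in plain Scheme of classes (B) and (C); the example names are made up and not part of Staapl: a binding form that introduces names for the parts of a data structure, and a form that delays evaluation by hiding its body in a LAMBDA.

  ;; (B) a new binding construct: destructure a two-element list.
  (define-syntax with-point
    (syntax-rules ()
      ((_ (x y) pt body ...)
       (let ((x (car pt)) (y (cadr pt))) body ...))))

  ;; (C) code abstraction: wrap the body in a thunk instead of running it now.
  (define-syntax delayed
    (syntax-rules ()
      ((_ expr) (lambda () expr))))

  ;; (with-point (x y) '(1 2) (+ x y))  =>  3
  ;; ((delayed (+ 1 2)))                =>  3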
Note that this classification excludes macros for partial evaluation = compile-time code specialization. However, Scheme is not well equipped for this kind of use: functions abstracted by macros can no longer be used as function values for higher-order programming.

[1] http://people.csail.mit.edu/gregs/ll1-discuss-archive-html/msg01539.html
[2] entry://20080523-093628

Entry: Why Forth works
Date: Tue Jun 2 17:03:19 CEST 2009

(TODO: Reformulate. The ideas are ok, but not well put.)

Forth works because:

  * It has a powerful composition mechanism (words and stacks) which
    provides referential transparency and absence of side effects,
    allowing for a functional programming style.

  * Using early binding and cooperative multitasking, the global memory
    and machine state can be used without losing local reasoning, to
    extend the words+stacks core language.

Let's start with the biggest disadvantage of Forth or the Forth-like approach to designing run-time systems: Forth requires you to think differently about programs and programming. Coming from a main-stream (C/C++) programming background, working without lexical or object scope is weird. Giving up the use of local names (random access to your problem's sub-parts) and exchanging them for programming in terms of _incremental updates_ requires a change in perspective.

One of the most amazing observations I made while _learning_ Forth is how my use of stack shuffling words drastically reduced once I got the knack of factoring. While it is in general more difficult to factor a solution into very small steps, it is a skill that can be learned. Writing well-factored code has the side-effect of bringing deeper understanding. It requires an initial investment (think harder about representation) but often provides good payback in the long run. Once the problem is reduced to the right granularity of sub-problems, the top-level solution is often trivial.

A big advantage of well-factored code is that the individual parts can be tested interactively and in isolation. This is really important for low-level programming, where the first words you write abstract the concrete machine into something more intuitive to understand. Surprisingly often the process of factoring leads to words that operate only on the stacks. However, in some cases spilling to global variables is necessary.

Another thing that amazed me is that global variables and dynamic scope seem to work better in Forth. Single-threaded execution makes shallow binding easy: just push/pop variables when you use them, preferably in an abstract way. The second reason is early binding: code that defines names that already exist won't interfere with the existing code. On top of this, for simple embedded problems, often the global namespace is yours, which makes state management into a design guideline without requiring much language support.

Entry: A balance between lowlevel and highlevel Scheme macros
Date: Mon Jun 8 14:27:33 CEST 2009

Most of Staapl is now in macro form. Most of the macros themselves are high-level rewrite macros that only use syntax-rules. Some are low-level syntax-case macros that use Scheme code to operate on syntax objects directly, usually because they need to do some form of global analysis.

The breaking point between the two forms is usually taken to be the point where you want to write a macro that uses some abstract code to process its syntax input before it generates its syntax output. However, I've found it to lie a little bit further than that. There are cases where the absence of a composition mechanism (macro subroutine calls) does not offset the convenience of a pattern language, mostly the convenience of ellipsis "...". Sometimes it is possible to construct CPS-style macros that are parameterized in the form they will expand to. As long as the actual processing that is happening is linear (a simple pipe, e.g. a lexer macro passing data to a parser macro) this works fairly well.
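Here is a toy sketch of that CPS-macro style in plain syntax-rules (the names are invented): the macro does its processing, accumulating a reversed form, and finally expands into the continuation form it was handed.

  (define-syntax reverse-syntax
    (syntax-rules ()
      ;; done: expand into the continuation, appending the accumulator
      ((_ (k ...) () (acc ...))
       (k ... acc ...))
      ;; step: move one element from the input to the accumulator
      ((_ k (x rest ...) (acc ...))
       (reverse-syntax k (rest ...) (x acc ...)))))

  ;; (reverse-syntax (list) (1 2 3) ())  =>  (list 3 2 1)  =>  (3 2 1)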
There are cases where the convenience of the pattern language (mostly the convenience of ellipsis "...") outweighs the absence of a composition mechanism (macro subroutine calls). Sometimes it is possible to construct CPS-style macros that are parameterized in the form they will expand to. As long as the actual processing that is happening is linear (a simple pipe, i.e. a lexer macro passing data to a parser macro) this works fairly well.

Entry: MetaOCaml
Date: Thu Jun 11 18:50:51 CEST 2009

What is the point of guaranteeing type-correctness of generator output when the first thing you do is to compile it? I find this[1] to be quite an interesting point:

> but Template Haskell does not guarantee before running a generator
> that the generator will only generate well-formed code.

That's right, but since generated code will have to be compiled before execution, only well-typed code will ever be executed. (The scope-extrusion problem discussed in previous mails suggests that MetaOCaml's typing system cannot guarantee to generate only well-formed code either.)

[1] http://www.nabble.com/Questions-about-Equality-of-Code%2C-CSP-and-State-td23866336.html

Entry: Released staapl-0.5.9
Date: Tue Jun 30 11:35:15 CEST 2009

This release fixes some bugs in 0.5.8, most notably the broken "staaplc". Installing the package from planet will still give some error messages; however, these are due to junk files and can be ignored. They will disappear once I've figured out how to properly test a planet package before uploading.

[1] http://planet.plt-scheme.org/display.ss?package=staapl.plt&owner=zwizwa

Entry: Connecting with current engineering practices
Date: Wed Jul 15 14:04:59 CEST 2009

Staapl is a tool chain based on metaprogramming, which attempts to provide the means to find an equilibrium between manual low-level software development and representation of application structure in the form of domain-specific abstractions.

                             Application specific compiler
    High level description  ----------------------------->  Base language

The broader goal is to bring old and new techniques from Lisp / Scheme macro systems and multi-stage programming to practical application in a world that is dominated by the C programming language. The concrete problem which needs to be solved is to make the bridge from Staapl's current incubation form to a programming environment based on C, and to figure out ways to teach the basic principles of partial evaluation and metaprogramming to engineers in the field.

I'm concentrating those efforts in two separate logs/projects. The meta[1] logs are about dynamic/static metaprogramming, mostly aimed at C code generation. The libprim[2] project is about writing a top-level dynamic language core to serve as application core or debugging/testing framework.

[1] entry://../meta
[2] entry://../libprim

Entry: Escaping concrete machine semantics through simulators
Date: Thu Jul 16 13:21:53 CEST 2009

One thing which is important in the way Staapl is built up is the relation to the concrete machine. Above all else, there should be no abstraction barrier to the lowest level. What I mean is: currently, Staapl is just a fancy macro assembler. All semantics is relative to the specification of the instruction rewrite rules. This is called "specification by compiler". This makes it an electronics engineer's dream (fully hackable low level control with ad-hoc abstraction) and a computer scientist's nightmare (essentially there is nothing you can do rigorously on the metalevel).
I would like to bring this nightmare to a closure by taking CPU simulators into the picture, defined w.r.t. some stable semantics, i.e. Scheme's. That way there is at least a single well-defined semantics for a subset of the base language, transitively defined in terms of Scheme. Think of it like this:

               Arch 1
              /      \
      Forth -- Arch 2 - Scheme
              \      /
               Arch 3

The left side is a Forth dialect which directly translates to a couple of machine architectures. The forking point from Forth to Arch is where different implementations' concrete semantics diverge. The right side is a fully-specified bottom line of each of the machine semantics in a standard representation (could be Scheme, or some other form written in Scheme that could be easily translated to a subset of C to solve the practical issue that simulation needs to be fast).

The problem is to "lift" the semantics of the simulator in terms of Scheme code _through_ the implementation of each of the different architectures, to make it decidable _by machine_ which constructs are portable and well-defined, and which are target-specific hacks. The basic idea to make this work is that, given a specification of a simulator, Forth functions can be mapped to a pair containing:

- stateless computation (boolean functions)
- clobbered machine state

What this might solve is the language specification problem: it makes it possible to define a "moving target standard" in terms of a collection of source code, and makes this standard machine-checkable. Alternatively it can assist greatly in the definition of a Forth -> ASM compiler, since the semantics is directly testable. Possibly human intervention is still necessary to come up with good translation rules, but the correctness could then be either proven or, if that's too difficult, unit-tested.

It seems to me that what needs to be done is to model everything as a set of equations, and derive everything from that.

Entry: An approach to bootstrapping static semantics
Date: Tue Jul 21 12:43:20 CEST 2009

Currently it's a bit muddy what exactly I mean by ``static semantics'' but, as I see now, this can be turned into an advantage.

1. It is important that Staapl be, at the bottom level, a macro assembler. One should not lose _any_ ground to abstraction, but improve on the current concept of `assembler'. Concretely, what this needs in order to support higher levels is a way to make verification possible by defining concrete machine languages in terms of a common operational semantics (i.e. a simulator for each supported architecture).

    Bottom Staapl = macro assembler + simulator

This is enough to provide ``dynamically typed specification by compiler'', which, when done in an integrated environment (Scheme macros with proper name management), already improves quite a bit over current standard practice.

2. Provide a means to raise the _static properties_ of the abstractions used, by facilitating the bootstrap of type systems or any other kind of static source code property. Tools that help would take the form of verification (approximate, static) and test (dynamic) based on the low level operational semantics.

    Top Staapl = collection of tools to build higher level semantics

The overall goal of this is to make sure that a programmer can stay within a certain ``safety level'', but has a straightforward way to build abstractions that bridge low and high level by dropping out of the safe zone in a controlled manner. This is quite a general idea, and there are many systems that attempt it.
In order to make it fit in my mind, I make a couple of assumptions.

One assumption is that picking a combinator language as the part that is able to drop to machine level is a good choice. This started from ``gut feeling'' but I've found at least two good reasons to date: 1. it allows all issues that come with improperly closed lexical scope to be avoided, which makes it trivial to meta-program (the machine can use it), and 2. as a language itself it is quite effective for building reasonably complex yet well-factored systems (the human can use it).

Another assumption is that leaving the lowest layer _untyped_ is a good idea: it keeps things simple and allows ``locally contained hacks'' inside the machinery for higher level constructs, without having to leave the system. Note that this might disappear gradually depending on how well the simulation based static analysis works out in the end, but it is my belief that a certain amount of hackery is going to be essential to get any highlevel structure implemented efficiently. Containing these hacks is what I see as the main problem in electronics design and low-level systems programming.

Entry: Backlinks
Date: Thu Jul 23 08:13:23 CEST 2009

http://lambda-the-ultimate.org/node/2934
http://concatenative.org/wiki/view/Staapl
http://planet.plt-scheme.org/display.ss?package=staapl.plt&owner=zwizwa
http://groups.google.com/group/comp.lang.scheme/browse_thread/thread/62255a0e90bc483e
http://playpower.org/blog/2008/08/programming-the-nes-in-basic/#comment-8
http://www.qooiii.com/blog/?p=502
http://delicious.com/doelie/staapl

Entry: A Forth console on PIC 18F1320
Date: Wed Aug 12 19:19:03 CEST 2009

# This demo requires these files[1][2] from the source distribution.
# The PLaneT package only contains the compiler.

# Start with staapl/app/1220-8.fm[2] as the baseline. This will
# provide an interactive interpreter into which code can be loaded.

# For the build process, look at staapl/app/Makefile for more
# information. In short:

tom@zni:~/staapl/app$ make 1220-8.hex
cat 1220-8.fm | grep '\\ #sh#' | cut -b7- >1220-8.program
mzc -vk 1220-8.fm
mzc v4.2.1.1 [3m], Copyright (c) 2004-2009 PLT Scheme Inc.
"1220-8.fm":
making "/home/tom/darcs/brood-5/app/1220-8.fm"
include "/home/tom/staapl/staapl/pic18/p18f1220.f"
include "/home/tom/staapl/staapl/pic18/p18f1220-const.f"
include "/home/tom/staapl/staapl/pic18/monitor-serial.f"
include "/home/tom/staapl/staapl/pic18/monitor-serial-core.f"
include "/home/tom/staapl/staapl/pic18/monitor-serial-warm.f"
[output to "./compiled/1220-8_fm.zo"]
mzscheme -p zwizwa/staapl/staaplc -- -c /dev/ttyUSB0 1220-8.fm

# This will produce a .hex file that can be uploaded to the target and
# a .dict file with a symbol table necessary to interact with the
# stripped machine code on the PIC.

# If you have a PicKit 2 w. pk2cmd you can use the following,
# otherwise simply upload the .hex file using your programmer.

tom@zni:~/staapl/app$ make 1220-8.flash
cat 1220-8.fm | grep '\\ #sh#' | cut -b7- >1220-8.program
sh 1220-8.program 1220-8.hex

PICkit 2 Program Report
12-8-2009, 19:29:22
Device Type: PIC18F1220

Program Succeeded.

Device ID = 07C0
Revision = 0007
Device Name = PIC18F1320 !WARNING! -P device mismatch

Operation Succeeded
rm 1220-8.program

# Now start the console. The .dict file is packaged as an mzscheme
# module. Invoke as:

tom@zni:~/staapl/app$ mzscheme 1220-8.dict
Connected (/dev/ttyUSB0 38400)
Press ctrl-D to quit.
OK

# At this point you have interactive access to the pic chip. The
# 18f1320 has 8kB of flash.
# Flash is erased in 64-byte blocks.
# A map of the available blocks can be printed like this:

8 kb
x x x x x x x x x x . . . . . .
. . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . .
OK

# Currently the interactive monitor uses 640 bytes. The rest is
# yours. The first block is reserved for boot code, and we'll use it
# later to add the startup code.

# Now it is possible to define code. Let's create and upload a word
# `add1'. The dot before the `OK' indicates that one line (8 bytes)
# of flash has been written.

: add1 1 + ;
.OK

# Run the code:

123 add1 p
124 OK

# The `p' word prints the top of the stack. (The `.' word does the
# same, and is the standard way.) Next to `p' there is `px' to print
# hex and `ps' to print unsigned numbers.

123 px
7B OK

# When you exit the interpreter by pressing ctrl-D, the next time you
# start it up, all your interactively loaded code will be erased.

tom@zni:~/staapl/app$ mzscheme 1220-8.dict
Connected (/dev/ttyUSB0 38400)
erasing blocks: 10
memory clear.
Press ctrl-D to quit.
OK

# Permanent changes are best written as modules and made part of the
# kernel .hex file. This code will be left intact. Think of console
# interaction as using `scratch space' next to the kernel code on the
# PIC. Stuff you load on top of this is mainly for debugging and as
# such can be cleared on boot to start with a clean slate.

# To inspect the machine code use `see'. This takes byte addresses or
# symbolic words.

see add1
: 0280 0F01 [addlw 1]
  0282 0012 [return 0]
  0284 FFFF [_nop 4095]
  0286 FFFF [_nop 4095]
  0288 FFFF [_nop 4095]
  028A FFFF [_nop 4095]
  028C FFFF [_nop 4095]
  028E FFFF [_nop 4095]
  0290 FFFF [_nop 4095]
  0292 FFFF [_nop 4095]
  0294 FFFF [_nop 4095]
  0296 FFFF [_nop 4095]
  0298 FFFF [_nop 4095]
  029A FFFF [_nop 4095]
  029C FFFF [_nop 4095]
  029E FFFF [_nop 4095]
OK

see 0
: 0000 D01F [bra block1]
  0002 FFFF [_nop 4095]
  0004 FFFF [_nop 4095]
  0006 FFFF [_nop 4095]
  0008 FFFF [_nop 4095]
  000A FFFF [_nop 4095]
  000C FFFF [_nop 4095]
  000E FFFF [_nop 4095]
  0010 FFFF [_nop 4095]
  0012 FFFF [_nop 4095]
  0014 FFFF [_nop 4095]
  0016 FFFF [_nop 4095]
  0018 FFFF [_nop 4095]
  001A FFFF [_nop 4095]
  001C FFFF [_nop 4095]
  001E FFFF [_nop 4095]
OK

see block1
: 0040 D0F1 [bra warm]
  0042 FFFF [_nop 4095]
  0044 BA9E [btfsp 1 jsr/ack 5 0]
  0046 D001 [bra .L2]
  0048 D7FD [bra .L0]
  004A A2AB [btfsp 0 171 1 0]
  004C D003 [bra .L5]
  004E 98AB [bpf 1 171 4 0]
  0050 88AB [bpf 0 171 4 0]
  0052 D7F8 [bra .L0]
  0054 6EEC [movwf 236 0]
  0056 50AE [movf .L30 0 0]
  0058 A4AB [btfsp 0 171 2 0]
  005A D002 [bra .L9]
  005C 50ED [movf lda 0 0]
  005E D7F2 [bra .L0]
OK

[1] http://zwizwa.be/darcs/staapl/app/Makefile
[2] http://zwizwa.be/darcs/staapl/app/1220-8.fm

Entry: Lazy partial evaluation
Date: Wed Aug 26 10:14:35 CEST 2009

The cool thing about a concatenative language (or any purely functional formulation) is that it allows laziness, and your `compile time' becomes completely arbitrary, because the order of evaluation makes no difference. In a concatenative language this is a lot easier to express than in the lambda calculus: it's just the associativity law of the syntactic/semantic monoids.

Now, in practice, you do want side-effects as `external input'. (In Staapl's low-level Forth-like language this also includes memory operations.) Moral: you can't fully reduce to a value: some of that needs to be done in a 2nd stage, the program's run-time.
This re-introduces compilation, but as a real staging / partial-eval operation. It points the finger at the sore spot too: what I want in Staapl is really this lazy semantics. The current eager approach is an approximation that makes the evaluator easier to get right, but it's probably a dead end in that the neat tricks laziness allows won't be possible.

Some relevant links:

[1] http://thyer.name/phd-thesis/
[2] http://lukepalmer.wordpress.com/2009/05/04/lazy-partial-evaluation/

Entry: Why 2 stacks?
Date: Wed Sep 2 14:08:44 CEST 2009

I've been trying to answer this question in a straightforward way for a while. Usually this translates to something like ``decoupling argument passing from procedure calls''. As Chuck Moore says[1]:

``Forth is highly factored code. I don't know anything else to say except that Forth is definitions. If you have a lot of small definitions you are writing Forth. In order to write a lot of small definitions you have to have a stack. Stacks are not popular. Its strange to me that they are not. There is a just lot of pressure from vested interests that don't like stacks, they like registers. Stacks are not a solve all problems concept but they are very very useful, especially for information hiding and you have to have two of them.''

My attempt at this is to approach what Chuck calls ``small definitions'' from the point of view of locality and structure of primitive and composite rewrite rules. When reductions and expansions are local, two stacks emerge quite naturally from a purely syntactic description.

Let's take as an example the formal variant of the Joy language. Here we don't consider semantics. Semantics is usually expressed directly as functions operating on a stack, or operationally, where function evaluation is also made explicit using 2 stacks: a parameter stack on which functions operate and a control / return / continuation stack used for function nesting. Formally (syntactically), we're only interested in describing the equivalence of two quotations, not in what they actually mean. Equivalence can be proven by supplying a sequence of quotations (a proof) in which each consecutive quotation is related to the previous one by the application of a rewrite rule.

Syntactic rewrite rules can be classified as:

* local reduction: primitive operators are defined in terms of rewrite rules that reduce in combination with values

* local expansion: composite operators can be replaced by the constituents of the quotation that defines them.

Note that the concept of `value' is important. It is exactly this concept, and the form of the primitive rewrite rules which are specified as reductions on values, that gives the language its proper form. I.e. the primitive `swap' is defined as a rule:

    a b swap = b a

where `a' and `b' stand for values. Two details are important: `swap' is the only non-value in the expression, and it appears on the _right_ of a concatenation of values.

Let's examine the behaviour of this formal system, defining numbers as values, and assuming there are rules of the kind above defined for all symbols (primitive words), or that they expand into quotations (composite words). The structure of the rules allows the following notation to be used. Let the symbol `|' denote a separation between values on the left side, and programs on the right. This is only an _annotation_ of reduction proofs (adding visual reminders of which objects are values), not an essential part of the rules! We move the `|' one position to the right whenever we encounter a value.
If it is not a value, we perform a primitive reduction or a composite expansion. An example of an equivalence proof based on two primitive rules:

    | 2 5 + 3 *
    2 | 5 + 3 *
    2 5 | + 3 *
    7 | 3 *
    7 3 | *
    21 |

An equivalence proof including the expansion of a composite program square = [dup *] looks like:

    | 12 square dup
    12 | square dup
    12 | dup * dup
    12 12 | * dup
    144 | dup
    144 144 |

This illustrates that the 2 stacks _emerge_ as a notation device for presenting equivalence proofs: the parameter stack on the left of `|' and the program (continuation) stack on the right, with facing tops.

( Note that I am begging the question. _Of course_ the reduction rules are structured in such a way that stacks emerge. My point is that you don't need stacks to explain concatenative stack languages, just a principle of locality and some concept of reduction order, here enforced by the concept of `value'. Semantically this then neatly corresponds to function composition, where the reduction order directly relates to the fact that function composition in general does not commute. )

[1] http://www.ultratechnology.com/1xforth.htm

Entry: Implementability
Date: Thu Sep 3 10:36:07 CEST 2009

It's starting to become more clear that what I am looking for with Staapl and other metaprogramming based on functional programming techniques is the concept of `implementability': is it possible to eliminate high-level constructs that cannot be implemented at run time due to the different semantics of the target language? Given a generic representation of an algorithm and a staging compiler, is it possible to map the representation onto the target given the current set of constraints? So, this would be something in between staging and partial evaluation.

Entry: Getting rid of CONS
Date: Thu Sep 10 13:01:50 CEST 2009

The point in [1] is that the ordinary recursive first/rest approach is inherently serial, and in order to make it parallelizable the decomposition should be more vector-like: split / process / append. The interesting remark (p.68) is that for any serial program that can be written as a composition of functions, it might be possible to parallelize those because of associativity, as long as some kind of decoupling of intermediate functions can be found. This reminds me of something ;) Anyways, apparently ``parallel prefix function composition'' is something that can be used here[2].

The conclusion seems to be that MapReduce is a big deal, and that there are ways around the need for commutativity. One thing that caught my eye is this[3]:

``If you can construct two sequential versions of a function that is a homomorphism on lists, one that operates left-to-right and one right-to-left, then there is a technique for constructing a parallel version automatically.''

Just derive a weak right inverse function and then apply the Third Homomorphism Theorem[4]:

``A function on lists that can be computed both from left-to-right and from right-to-left is _necessarily_ a list homomorphism--it can be computed according to any parenthesization of the list.''

Some remarks. I'm trying to relate this to the construction of an algebra of recursion operators on algebraic data types, as in [5], applied to vectors.
[1] http://research.sun.com/projects/plrg/Publications/ICFPAugust2009Steele.pdf
[2] http://wwwtcs.inf.tu-dresden.de/~voigt/popl202-voigtlaender.pdf
[3] http://www.ipl.t.u-tokyo.ac.jp/~takeichi/attachments/pldi07.pdf
[4] http://researchspace.auckland.ac.nz/bitstream/2292/3514/1/005thirdht.pdf
[5] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.41.125

Entry: Two Attacks
Date: Fri Sep 11 11:07:48 CEST 2009

1. Concatenative languages and combinators.

I believe that concatenative languages (point-free functional programming) are a good framework for expressing performance-oriented DSLs with well-formed mathematical models. First, concatenative _stack_ languages can serve as the basic _sequential_ scripting substrate of an overall system. Second, (not necessarily as a stack language) point-free style allows a higher abstraction level (data flow instead of sequential control flow), while at the same time allowing _restricted_ semantics to emerge as sublanguages. The more restrictions on the language, the easier mathematical manipulation of _programs_ becomes. I see two main advantages here:

* Automated program manipulation is important for abstracting implementation strategies. Automatic partial evaluation alone cannot solve this, because it assumes there is a single good solution. (The main problem, whether to unroll/inline or not, is in general not decidable.)

* In data-oriented programming, the combinators might have a large design space for mapping to actual hardware. Making these parameters explicit allows one to get a better grip on how they relate to performance and implementability.

In short: separating algorithm+correctness from implementation mapping is undoubtedly a great productivity gain, _if_ the specification of the implementation mapping is done in a _meaningful_ way, i.e. as a point in a search space, possibly in combination with some mathematical laws about the search space, in a form that makes sense.

2. Focus on static semantics and program manipulation.

Practically, in Staapl I would like to shift focus to

* Putting the current eager partial evaluation (staging without annotation) on a better theoretical ground. The guideline here is partially evaluating threaded state. The attack is probably related to monads and arrows.

* Construction of combinator subsets that satisfy mathematical laws (in the FL sense) on top of this substrate of partial evaluation of parallel threaded state.

So, the slogan is something like: ``Avoid low-level control constructs such as recursion and sequencing.'' Any serial code (i.e. accumulation), expressed as function composition, should indicate only _necessary_ data dependency. All control flow should be expressed as higher order combinators which can then be treated as opaque mathematical objects.

As Guy Steele[1], with his mind on Fortress, puts the latter: ``Get rid of CONS!''; maybe one should also add[2]: ``and LAMBDA!'' ;)

[1] http://research.sun.com/projects/plrg/Publications/ICFPAugust2009Steele.pdf
[2] http://news.ycombinator.com/item?id=814632

Entry: From Stack to Point-Free
Date: Sun Sep 20 16:26:37 CEST 2009

I've been thinking a bit more about DSP-style ``parallel pipelined RISC (VLIW/VECTOR)'' programming vs. the serial stack-machine approach. Both have their uses.

The stack machine approach seems to work best for non-numeric sequencing style code: very simple recursive processors performing tasks that can be handled by dedicated state machines. Essentially: stack machines have cheap branching.
On the other end of the spectrum there is the DSP mainstream: VLIW + VECTOR (i.e. TI C6000). Here the data-oriented approach seems to work best. Control structures are very expensive, but for sufficiently streamlined data, the architecture packs a lot of power.

( Granted, Chuck Moore's view on DSP (many simple stack machine cores) does have its appeal, as it combines a bit of both worlds. I'd like to give it a try, but current economic inertia doesn't seem to favour the idea much... )

I'd like to steer Staapl a bit into the direction of the DSP world, mostly because I'm convinced that the concatenative / compositional / point-free style is a good abstraction for building high-level DSP specifications, and Staapl's macro system, together with some to-be-constructed type system, could serve as the vehicle for developing such applications.

Some pointers from John Nowak: Backus[1], De Moor[2], Bird[3], Meijer[4], Patterson[5], Tatsuya[6], Iverson[7], Meertens[8], Gibbons[9].

[1] http://nl.wikipedia.org/wiki/John_Backus
[2] http://www.comlab.ox.ac.uk/people/oege.demoor/
[3] http://en.wikipedia.org/wiki/Richard_Bird_%28computer_scientist%29
[4] http://lambda-the-ultimate.org/user/776
[5] http://www.soi.city.ac.uk/~ross/
[6]
[7] http://en.wikipedia.org/wiki/Kenneth_E._Iverson
[8] http://en.wikipedia.org/wiki/Lambert_Meertens
[9]

Entry: Lexical Scope is Difficult
Date: Thu Oct 8 16:10:52 CEST 2009

It has become more clear to me recently that lexical scope is indeed quite difficult to get right for metaprogramming. The design of MetaOCaml is non-trivial (i.e. side effects and scope extrusion). Picking an algebraic (point-free) formalism seems to make program manipulation a lot simpler.

An interesting observation is that the same system (Staapl) can host both a general-purpose stack-based language for systems glue, as well as special-purpose algebraic/numerical formalisms for algorithm specification.

Entry: Getting started with Staapl
Date: Sun Mar 7 17:16:06 CET 2010

(see also [1])

To try Staapl it's best to use the darcs version. The PLaneT version is not up-to-date and contains only the compiler, not the Makefiles & examples. These are instructions for Linux. If you try on OSX or Cygwin and it doesn't work, drop me a line.

$ darcs get http://zwizwa.be/darcs/staapl
$ cd staapl
$ make

At this point it's possible to compile applications. To compile the blink-a-led example try:

$ cd app
$ make blink-a-led.hex

This produces a HEX file that can be uploaded to an 18f452 chip with a 40 MHz XTAL. To use a different configuration, see the config bits in any of the .fm files.

The simplest interactive app is 1220-8.fm which has the serial console attached. This is for an 18F1220 with internal oscillator. To produce only the dictionary (and hex) do:

$ make 1220-8.dict

The .dict files are PLT Scheme modules that can be executed. They configure a front-end console to the binary code in the HEX file once uploaded to the target. If the target is running, the console can be started as:

$ mzscheme 1220-8.dict
Connected (/dev/ttyUSB0 38400)
Press ctrl-D to quit.
OK

If you use a PicKit2 programmer and a USB serial cable on /dev/ttyUSB0, the following method can be used to compile, upload and connect:

$ make 1220-8.live

See app/Makefile and the .fm files for more information.

Once at the console, you can execute commands and load 8-bit numbers on the stack. See the `words' and `commands' commands to display a list of on-target compiled words and console commands respectively.
The `macros' command displays a list of macros that can be used inside word definitions only.

Using standard Forth syntax like

: foo 123 ;
: bar foo foo ;

you can compile and upload new word definitions into the scratch buffer. These are persistent: they go into Flash memory. The scratch buffer is emptied whenever the console is started, to make sure the state is predictable. The `empty' command does the same, but it does not clean the dictionary.

Definitions can be placed in a file and loaded into the scratch buffer using:

load /tmp/definitions.f

It is possible to develop a small application like this by simply leaving the last uploaded version on the target. However, for larger applications it is better to use the forth module (.fm) approach to build a new kernel that can be compiled to a HEX file.

[1] entry://20090812-191903

Entry: Verifying the ad-hoc PIC18 code generator
Date: Sun Mar 14 19:21:09 CET 2010

PROBLEM: verify the correctness of an ad-hoc collection of reduction rules (pattern matching).

The solution I see is the use of an explicit representation of target machine semantics, i.e. run the generated code in a simulator and hook it up to a test system that tries to provide full coverage.

My main point of confusion is not explicitly recognizing all the language levels involved. Each level's data is the next level's function; between every level lies an interpreter.

         FUNCTION              DATA
  rep    Scheme functions      Scheme syntax objects
  --------------------------------------------------------------------------
  1)     Forth prefix parsers  concatenative syntax
  2)     Macro functions       machine assembly code (-> binary machine code)
  3)     Machine state TX      machine state

Level 1) can largely be ignored: it is a trivial preprocessing step to relate traditional Forth to the functional macro model.

Level 2) is where the beef is. This part contains a lot of information (transformation rules), as it encodes the semantics of the machine in an indirect way, i.e. preserving an approximation of the semantics of machine code.

Level 3) is given: the concrete semantics of the target machine can be used to build a simulator in Scheme.

The point is verification of the error-prone creative hackery of step 2) by making an interpretation arrow from concatenative syntax (the data of 1) to an abstract (function) domain that can somehow be related to the translation that the compiler makes from data in 1) to the function domain of level 3). The trick is then to find a collection of input code that produces programs that fire the optimization rules in different ways.

  COMPILER (w/o prefix parser):

                       asm stx
                          |
      conc stx =====>     | macro
                          | code
                          V
                       asm stx

                       machine state
                          |
      asm stx =====>      | machine state tx
                          V
                       machine state

  REF SEMANTICS:

                       vm state
                          |
      conc stx ===>       | vm state tx
                          V
                       vm state

A diagram that relates the interpretation (=>) and function (->) relations. Interpretation maps syntax (data) to semantics (function).

Entry: Released staapl-0.5.10 (PLaneT 1.10)
Date: Sun Oct 3 09:36:18 CEST 2010

Bugfix release.

Entry: Lambda the Ultimate
Date: Sun Nov 14 09:43:42 EST 2010

Staapl got mentioned on LtU[1]. Here's a comment:

Staapl consists of two parts that complement each other: a module compiler and an interactive toplevel. The compiler marries the Racket macro system with a Forth-like low level 2-stack language that acts as a machine abstraction. The interactive toplevel is an ad-hoc command interface bolted on top of the compiler and a target machine monitor.
The Racket macro system makes language towers easier to understand by pinning down the meaning of identifiers at compile time, eliminating some "moving parts" and so making code easier to reason about and easier to modularize. The "functional nature" of a 2-stack machine as opposed to a register machine adds simple low-level code composition and so further facilitates modularization.

The complementary interactive toplevel emulates a standalone Forth interpreter, allowing the execution of existing on-target code and the definition of new on-target code by invoking the compiler and uploading machine code while the target is running. The interactive toplevel is there to "break the rules" when you need more direct control over the target hardware. While the "static" and "purely functional" approach is nice for building well-structured programs, the reality of embedded programming is still that hardware issues are related to state and ill-defined behaviour and require interactive trial and error to resolve.

[1] http://lambda-the-ultimate.org/node/3999

Entry: Staapl and RISC machines
Date: Sun Nov 14 11:30:46 EST 2010

From an implementation perspective a 2-stack machine model allows for a very simple low-level code generator which is mostly just a peephole optimizer. Note however that while this works very well for a chip like the Microchip PIC18, a many-register machine like ARM or MIPS probably requires a "real" compiler with more aggressive inlining and register allocation if execution speed is an issue.

Staapl side-steps this issue by concentrating on code size as opposed to execution speed. Forth-style programming can lead to very dense code at the expense of having to think a bit harder to encode your problem and factor it appropriately.

Entry: What is a Staapl macro? I :: m -> (t -> t)
Date: Tue Nov 16 08:37:47 EST 2010

Traditionally, a (Lisp/Scheme/...) macro performs only expansion, while the compiler can perform partial evaluation on the resulting expression independently of the behaviour of the macro, working only with the expansion results. In Lisp/Scheme it is possible to inspect sub-forms at macro expansion time to alter their behaviour, originally specified as primitive forms or other macro forms. However, this isn't so well-behaved, as the meaning of forms is no longer defined in a single spot.

In Staapl, macros perform their own evaluation, but they don't modify the structure of program text (with opaque, unknown semantics); instead they operate on a well-defined lower-level representation language.

More formally, suppose t is a concatenative target language (syntax) and m is a concatenative macro language (syntax). Each token in m is mapped (interpreted) to a function t->t, or I :: m -> (t->t). Each non-primitive element in m consists of a concatenation of primitive elements of m, which each map to a t->t function; thus each non-primitive element of m also corresponds to a t->t function, namely the composition of all the t->t functions associated with its tokens. The t-meaning of m is then given by ((I m) t0) where t0 is the empty t program.

It doesn't seem that you can do this with an applicative language, as there is no relation between concatenation and composition and no "natural" target language.

The scheme above can be towered: in fact, in real life the target language t will be interpreted by a machine execution unit E :: t -> (s->s), which maps each t token (machine instruction) to a function transforming the machine state s.
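As a rough Racket sketch of this idea (not the actual Staapl implementation; the toy instruction names and helpers are invented for the example), a macro is just a function from target code to target code, and concatenation is function composition:

    ;; Target code t: a list of instructions, most recent one in front.
    ;; Each primitive macro is a function t -> t.

    (require racket/match)

    (define ((m-lit n) t)             ; quote a literal: push [qw n]
      (cons `(qw ,n) t))

    (define (m-add t)                 ; '+': fold constants if possible
      (match t
        [(list `(qw ,a) `(qw ,b) rest ...)
         (cons `(qw ,(+ a b)) rest)]  ; compile-time evaluation
        [_ (cons '(add) t)]))         ; otherwise emit a real instruction

    ;; Concatenation of macro tokens = composition of t->t functions.
    (define ((compose-macros . ms) t)
      (foldl (lambda (m acc) (m acc)) t ms))

    ;; ((I "1 2 +") t0) with t0 the empty program:
    ;; ((compose-macros (m-lit 1) (m-lit 2) m-add) '())  =>  '((qw 3))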
One additional element is necessary: in order to get to a usable programming language that supports modularity, the functions (t->t) and (s->s) need to be ``local'' or ``small-dimensional'' in some way. Staapl PIC18 currently uses stack machines where both t and s are stacks of, respectively, PIC18 code and 8-bit machine words.

Entry: S-expression based syntax
Date: Thu Dec 30 15:32:11 EST 2010

I started an effort to provide an s-expression based syntax for instantiated target words. The big difference with macros is that target code is inherently flat, i.e. it uses labels instead of well-delineated functions. I'm not sure if that's the right way to go, but the low level control flow access you get in this way is at least useful. However it might turn out to be better to use a higher level function-based abstraction later on, and provide some guarantees in the code generation phase. But really, I'm not going to fuss about that now. Target code is flat, and this flatness is accessible.

words-flat : Define a sequence of (name . code) pairs with fallthrough. This is essentially the stack-based machine language.

words : On top of `words-flat', define a list of (name . code) pairs where code has an implicit `exit'.

variables, variables-flat : Same, but for reserving RAM space.

Having the code available in s-expression form allows PLT units to be used for parameterization. In short, the Forth parser is too convoluted to be able to blend in unit support. This is caused by the flatness of the Forth language, and the fact that names can be introduced along the way.

If possible I'd like to keep the Forth syntax in the long term. However this would require some serious redesign. Currently it is kept as-is. New developments will be implemented in the s-expression syntax.

Entry: Abstract Interpretation (staapl/absint)
Date: Sat Jan 22 16:25:38 EST 2011

I'm working on porting some of the DSP DSL ideas recently explored in Haskell back to Scheme, to see how much I'm going to miss the type system. I already know that I don't miss the continuation monad too much! It seems that pure functional specification combined with abstract domains implemented with well-behaved side effects works better. I.e. it's ok to use dynamic variables to implement dictionary threading, but it's not ok to update the _values_ these dynvars point to, as that will mess up partial continuations.

An advantage of pure specifications is that the specs themselves can later be transferred to Haskell without problems, for analyses that do not require the sharing structure (i.e. black box / semantic analysis). I guess Template Haskell could also be used to translate the specifications to monadic form, to implement the strict evaluation order side-effects the Scheme code relies on, without the ugly notation this would give in Haskell.

All this is logged in the meta[1] postings, somewhere around [2].

[1] entry://../meta/
[2] entry://../meta/20110121-113603

Entry: Embedded development is #ifdef-land
Date: Tue Mar 8 16:39:23 EST 2011

One of the ideas behind Staapl is to step away from the C macro preprocessor to implement compile time conditionals. Essentially, an application's top level module is a script that links together units, specifying implementations for interfaces.

Entry: Staapl's target is flat
Date: Fri Mar 11 09:59:12 EST 2011

Staapl's target code generation is very similar to PostScript: a sequence of drawing operations constructs a 2D paper grid.
While macros (compilation state transformers) are purely functional, one of the layers in the state that is passed along is the target code graph. It is an "evolved decision" to keep this as a graph, and not a tree.

Entry: Right or Fast?
Date: Tue Mar 29 00:51:25 EDT 2011

* The ``right'' way to build an application is to build a monolithic kernel. This taps into the lexically sane hygienic macro magic of Racket. A kernel (really a kernel generator) is a Racket module.

* The ``fast'' way to build an application is to incrementally add definitions on a live target, relative to a fixed kernel.

Both approaches are valid.

The monolithic kernel approach is very close to ordinary Racket programming, and is a good approach for developing large programs. Read the Racket guide and manual for more information. The disadvantage is that the cycle is slower (compile, upload).

The incremental approach is a ColorForth[1] style early binding approach, and not a Lisp style which would allow changing bindings in existing code. You can add new code on top of old code, obscuring old bindings for new code, but you can't modify existing bindings unless you provide your own indirection mechanisms. This seems overall a more bullet-proof default.

The incremental approach has an `empty' and `load' cycle. The `empty' command will reset the code back to the initial kernel state, and `load' can be used to add (non-modular) code on top of this. A workflow I use is to develop code in a single file and use `empty' and `load' to quickly change the code on a running target. When done, move the code to the kernel and recompile.

The incremental approach doesn't require a chip reset, and so works better with applications that need to maintain state or run-time behaviour in between code updates. Overall I've found this to be the case in deeply embedded programming. You spend most of your time setting up ad-hoc tests and measurements to see why things are not working as expected. A fast incremental state-change approach works better here than a recompile-the-world functional approach.

[1] http://en.wikipedia.org/wiki/ColorForth

Entry: Forth Syntax
Date: Tue Mar 29 11:40:48 EDT 2011

Staapl is a descendant of a Forth compiler[1]. It is a point in a long line of iterations to rewrite the original in Racket (PLT Scheme), using and learning about the approach native to Racket: powerful hygienic macros. I've gone to great lengths to try to maintain a Forth syntax frontend despite the core going through several makeovers while absorbing new Racket abstractions and approaches.

Recently I was weighing the idea of dropping Forth syntax completely because of the tension it creates with the Racket macro & module framework. Eventually, I was able to push the workarounds behind an interface, thinking it was better to not break what worked before, but intending to not use it for newer work.

However, some recent work on low-level PIC code has led me to re-appreciate the usefulness of Forth syntax and mindset. Maybe it's just what I'm used to (and the emacs syntax highlighting ;) but for writing straight line code it is a very good user interface. Combined with the interactive incremental setting, which allows quick one-line function definitions to run ad-hoc tests on the PIC, it is quite powerful. So it's probably going to stay.

In any case, all functionality that used to be only available in the Forth syntax is now available in s-expression form without the prefix parser construct.
The latter is very hard to reconcile with s-expression macros in the current approach. I plan to rewrite the old Forth parser on top of the new s-expression format (now it's special-purpose), but we'll see if there's time and necessity.

[1] http://zwizwa.be/pic/badnop

Entry: PICkit 2
Date: Fri Apr 1 00:00:39 EDT 2011

While the PK2 is being phased out by Microchip to be replaced by the PK3, it is still a neat toy. The 2.x firmware (last one is 2.32) is essentially a small scripting interpreter that contains primitives for data communication and voltage control. Its best feature is that it is fully documented. It is currently still available from Microchip, but will probably remain available for a while as a clone[1]. Due to its versatility, the PK2 has been used for many alternative purposes, which might be the reason why Microchip is phasing it out in favor of the PK3, which doesn't have an open architecture.

I started writing a driver for using the PK2 in Staapl a bit after the PK2 came out. I gave up after not getting it to work reliably and moved on to other projects. Recently however, Jaromir Sukuba published a description of the undocumented PIC on-chip debugger features[2], which has revived my interest in getting the PK2 to work with Staapl. The main issue I had was the hassle involved in using both an ICSP port and a serial console on a PIC with low pin count. Moving towards console + programming on only a single port would simplify some of my development setups. With info on the debugger available I finally traded some sleep for getting it to work.

So it works now. Staapl talks to a PIC over the ICSP connector. The current downside is that it's not particularly fast in practice, due to the nature of the handshake protocol I'm using, which ping-pongs across a 1ms USB delay. However this is not intrinsic to the PK2, as it does work well for single-direction burst transfers. Some long tail work TODO there.

[1] http://shop.ebay.com/?_nkw=pickit2
[2] entry://../electronics/20110320-225422

Entry: All over the place
Date: Fri Apr 1 12:33:44 EDT 2011

( This probably deserves a whole article. Consider it as a seed. )

It has been interesting and challenging to map the requirements of Staapl onto the Racket code organization abstractions. Most of the organization in Staapl is guided by Racket's macro & module system. I started out completely uninformed and made all the usual beginner mistakes (including dynamic binding, of course).

- Modules and macros

  This is the heart of Racket. It is what makes flexible language extension possible, and Staapl uses it as its default code organization principle.

- Alternative interpreter/compilers (syntax certificates)

  A more advanced form of low-level syntax manipulation. This is used to implement the RPN code compiler/interpreter. Currently the way this is done in Staapl needs some rework, as I've not 100% figured out the details.

- Units, and the way they mesh with the above

  Units are used to implement the compiler internals in a modular way. Unit signatures are used to implement Staapl "word sets".

- Namespaces & eval, and how they differ from modules

  For interactive code upload a namespace approach is used. This does away with all static guarantees in places where it's useful. More deeply, the tension between static code dependencies and dynamic "patching" is central to Forth, and the tension it creates is fascinating.
- Partial continuations

  While not strictly in the same class as the above, partial continuations are an interesting abstraction for turning code into data and vice versa. Together with a "mostly functional" style of programming they are a powerful tool.

Entry: Monad vs. Applicative / Arrow
Date: Sun Oct 16 15:52:57 EDT 2011

When I started out writing Staapl, I did not know Haskell, so (without actually verifying) I assumed monads are any kind of "behind the scenes data plumbing" mechanism. However, it seems that the way the Staapl compiler is constructed is more akin to an Arrow approach. It would be interesting to formalize this a bit: find the natural way of mapping a stack composition mechanism to/from Arrow.

Entry: Released staapl-0.5.11 (PLaneT 1.11)
Date: Mon Nov 7 12:44:09 EST 2011

As usual, there is a main tarball release which includes example code, and a PLaneT Racket module that contains just documentation and modules. Main changes:

- Microchip PicKit 2 support. Hook up a PK2 to a factory-clean PIC chip and get started. See app/Makefile in the .tar.gz distribution.

- Implemented Racket unit support. It seems that Racket's unit system (pluggable components) is a better abstraction for Forth code modules than Racket's module system. This release contains a major refactoring of the Coma compiler and Forth frontend to support standard wordset signatures.

Entry: Finite-state Machines (shallow coroutines) and extensible compiler state
Date: Wed Nov 9 10:15:18 EST 2011

While it is often useful to work on a "thread" abstraction level where each thread of control has a separate execution stack, on a low-RAM uC this can be expensive because tasks have to be dimensioned for the worst case stack usage. In practice however, it is often possible to write a solution in terms of a Finite-state Machine (FSM). Doing this manually (without language support) can be a bit of a drag, as it involves explicit computation of a state variable on FSM exit, and dispatch on FSM entry.

It is possible to remove this hassle by generating the state storage and state dispatch code as part of a "yield" command, which has the same meaning as a true coroutine yield, with the exception that it can only be executed from the main thread of control, i.e. its execution stack is only 1 deep, hence the name "shallow coroutine" (SCR).

Implementing this in Staapl requires a (compile time) context that can relate control points in the code to allocation of state IDs. In order to make this work, I've added an "ext" field to the compiler state which contains a (functional) hash table. The purpose of this field is to provide local state for macro functionality, i.e. "begin-fsm ... end-fsm", in an extensible, modular way, meaning that state from different modules will not clash.

An interesting observation is that once this mechanism works for a 1-SCR, it might be possible to extend it to support n-SCR, which would be a coroutine with bounded stack depth. If all the control points are known at compile time it is possible to "flatten" the thread state by enumerating it.

Entry: Staapl niche?
Date: Wed Nov 9 10:21:41 EST 2011

Looks like I found the niche: low-memory microcontrollers and programmable hardware. Forth is really good for writing compact procedural code while retaining good local properties that allow reasoning about a program. It gives a lot of bang for the buck concerning RAM usage.
If I look at what I've been doing with Staapl (and what I've been enjoying the most), it is finding a good balance between extremely resource-efficient programs and readable, maintainable programs.

Entry: Getting started
Date: Thu May 17 12:34:23 EDT 2012

It's been about half a year since I worked on Staapl. A good opportunity to write a HOWTO.

- Connect a PIC18F2550 to the PK2. The PK2 will provide power.

- cd staapl/app/ ; make pk2-2550-48.live

This should give you an OK prompt. To see if it works, print a Flash block dump. Each "x" or "." represents a 64-byte Flash block, either (partly) used or empty.

tom@zoo:~/staapl/app$ make pk2-2550-48.live
mzc -vk pk2-2550-48.fm
mzc v5.1 [3m], Copyright (c) 2004-2011 PLT Scheme Inc.
"pk2-2550-48.fm":
[already up-to-date at "./compiled/pk2-2550-48_fm.zo"]
mzscheme -p zwizwa/staapl/staaplc -- -c pickit2 pk2-2550-48.fm
chmod +x pk2-2550-48.dict
cat pk2-2550-48.fm | grep '\\ #sh#' | cut -b7- >pk2-2550-48.program
mzc -vk pk2-2550-48.fm
mzc v5.1 [3m], Copyright (c) 2004-2011 PLT Scheme Inc.
"pk2-2550-48.fm":
[already up-to-date at "./compiled/pk2-2550-48_fm.zo"]
mzscheme -p zwizwa/staapl/staaplc -- -c pickit2 pk2-2550-48.fm
chmod +x pk2-2550-48.hex
sh pk2-2550-48.program pk2-2550-48.hex

PICkit 2 Program Report
17-5-2012, 12:44:19
Device Type: PIC18F2550

Program Succeeded.

Device ID = 1240
Revision = 0002
Device Name = PIC18F2550

Operation Succeeded
mzc -vk ../staapl/live.ss
mzc v5.1 [3m], Copyright (c) 2004-2011 PLT Scheme Inc.
"../staapl/live.ss":
[already up-to-date at "../staapl/compiled/live_ss.zo"]
mzscheme pk2-2550-48.dict
Connecting to PICkit2.
datfile: /usr/local/bin/PK2DeviceFile.dat
iProduct: PICkit 2 Microcontroller Programmer
Press ctrl-D to quit.
OK
4 kb
x x x x x x x x x x x . . . . .
. . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . .
OK

Entry: USB
Date: Sun May 20 09:38:49 EDT 2012

Things have been a bit rough lately. I'm getting demotivated by my lack of success in getting the PIC18F2550 USB to work. Personal motivation issues set aside, the bottom line is that the USB interface is a limiting factor in making Staapl useful for some gadgets I'm thinking of building (and selling). Without USB there is basically no product, and I'm too proud to use the Microchip firmware for this.

This continuing lack of success has also become somewhat of a crusade. I started out Staapl as a tool for debugging. I find it a bit ironic that the problem I'm facing is a debugging problem. So that's where my current focus will be: make the USB problem as visible as possible.

EDIT: USB is moving forward, and the focus on debugging is becoming more clear.

Entry: Smaller PIC chips
Date: Sun Jun 3 11:32:34 EDT 2012

I think that to make this commercially interesting, a move to smaller PIC chips is necessary. This requires a special purpose programmer, as most of the small chips are not self-programmable. ( Possibly also a debug header == a special chip with debug hardware. )

So the roadmap will be something like this:

- get USB to work on 18F
- write alternative firmware for the PK2

Entry: Protocol oriented programming
Date: Wed Jun 6 13:10:47 EDT 2012

There is a nice duality between prefix and postfix that I find hard to articulate, but that always comes up. Stream data is prefix because it needs to "announce" the meaning of the data, but parsers are postfix so they can "gather" context as the data streams in.
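A toy Racket sketch of that duality (not Staapl code; the message format and all names are invented for the example): the stream below announces each item with a prefix tag, while the reader works postfix-style, pushing values on a stack and only giving them meaning when a later token arrives.

    (require racket/match)

    ;; Prefix on the wire: each tag announces what follows.
    ;; Postfix in the parser: gather values, act when context is complete.
    (define (parse msg)
      (let loop ((msg msg) (stack '()))
        (match msg
          [(cons `(byte ,b) rest)          ; announced value: just gather
           (loop rest (cons b stack))]
          [(cons '(add) rest)              ; operator: consume gathered context
           (loop rest (cons (+ (car stack) (cadr stack)) (cddr stack)))]
          [(cons '(end) _)
           (car stack)])))

    ;; (parse '((byte 1) (byte 2) (add) (end)))  =>  3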