BADNOP, a forth compiler for RISC machines with a Harvard architecture. DESIGN ------ The compiler is constructed as a collection of MACROs. Such a macro is a CAT function which operates on assembly object on the top of the stack. This design is fairly standard: Forth is usually its own meta language. The difference here is that the metalanguage is a highlevel functional compositional language, instead of a lowlevel Forth. In order to make the implementation more transparent, the macros are divided into several classes which have limited scope. These classes are then automaticly lifted to the macro prototype. The classes are: - recursive macros - asm state modifiers - pattern matching macros Recursive macros are macros that invoke other macros. Their defining language is thus the macro language itself. All the other macros use the CAT defining language. Asm state modifiers are able to modify the entire assembler output. Pattern matching macros recombine previous output to generate optimized code, they essentially see the assembler output as a typed data stack. CAT --- In order to understand the implementation of BADNOP, a few elements of CAT are required. CAT is a compositional language: one can construct new functions only as a composition of other functions. CAT is modeled after Joy. More concrete, primitives for the CAT virtual machine are represented by Scheme functions of the type "stack -> stack". This type enables a simple notation for function composition: "(a b)" denotes the function which applies b after applying a. Since function composition is associative, one can use multiple functions in sequence in this notation: "(a b c ...)" examples of Scheme primitives: (lambda stack stack) ;; identity (lambda stack (cons 123 stack)) ;; push 123 to stack (lambda stack (cdr stack)) ;; discard (drop) the top element Semantics in CAT are represented by the 'word' abstract data type. A word contains: - source: a symbolic representation (source atom) - compile: function mapping: source atom -> scheme function Parsed CAT code is a cdr linked list of word structures, Parsing is quite literally attaching semantics to code atoms, in the form af a scheme function that compiles symbolic code to a scheme function. This approach makes it fairly easy to have different semantics associated to the same source representation. For example, the list '(1 2 +) could be parsed as cat code using 'parse-cat', which would result in a program which pushes '3' on the stack when invoked. Or, it could be parsed as forth code using 'parse-forth', in which the parsing would attach semantics that will make the program act like a compiler.