Why all these macros? --------------------- To understand the design of PURRR18 as a collection of macros, I'll explain first how a 'computer language' traditionally gets implemented. Then I'll move up to the design of PURRR18, and what macros actually do. A language ---------- A computer only understands one language: its native machine language, which consists of a series of bit patterns stored in memory. These patterns, also called 'instructions' are read out one after the other, and determine switching behaviour inside the computer which in turn causes some computation or inout/output action to happen. For the PIC18 this consists of a sequence of 16bit patterns stored in FLASH memory. In order to use another language to program a computer, you need a way to translate this language to the computer's native machine language. There are basicly two ways of doing this: - using an INTERPRETER This is the simplest way: A program, written in the native machine language, is running on the chip, associating some representation of the high level language (the virtual machine language associated to that language, usually a sequence of tags) into machine code snippets that are already present on the chip, and executes them. This is the traditional way of implementing Forth: each word is associated to a small assembler program that implements its behaviour. - using a COMPILER Instead of storing a program in some intermediate form (using a virtual machine language), the original code is translated in its intirety into an equivalent machine code program, which can run directly on the computer. A compiler ---------- In contrary to popular belief, writing a compiler requires only a slightly bit more work than writing an interpreter. Usually it is very straightforward to convert an interpreter for a language into a compiler, and this process can even be automated. However, a compiler has more information available about the meaning of the code: an interpreter only sees one virtual machine instruction at a time, while a compiler, in theory, can see the whole code at once. Therefore it can use more tricks to generate better machine code, by extracting more meaning or intent from the code and replacing some code fragments with faster or shorter machine code that has exactly the same result. For example, the forth code '1 2 +' can be translated to the sequence: 1 --> push the number 1 onto the stack 2 --> push the number 2 onto the stack + --> pop two numbers off the stack, add them, and push the result This is exactly how an interpreter would perform the following code. It can't do much more, simply because the interpreter is written before any code is actually interpreted! A compiler on the other hand can see the meaning behind the code. It can translate the forth fragment '1 2 +' to the equivalent code '3' and compile that instead: 1 2 + --> push the number 3 onto the stack The result is completely the same, but the machine code is much simpler. This trick is called 'partial evaluation': compute things that can be computed during the compilation phase, instead of postponing them to some later interpretation phase. Moving from interpretation to compilation is essentially the same: the program (interpreter) that interprets the data (virtual machine code) is partially evaluated. The result of this is that some code is computed which will perform the desired behaviour, but this code can be specialized to its particular task. Depending on the smartness of the compiler, this can result in both shorter and faster code. For example, depending on the architecture, a sequence of words '123 @ +' can be compiled into a single machine instruction that fetches the number out of memory location 123 and adds it to the top of the stack. So the basic idea is: if you never look at intermediate results, they don't really need to be there. The thing that can make a compiler complicated is the extent to which it tries to extract meaning. Depending on the source language, this can go very far. Macros ------ Now, what are macros? The traditional way of implementing a Forth compiler is to associate each word individually with some small program that can generate native machine code. This machine code, when executed, performs some desired behaviour. So, there is no single large piece of code that can be called THE compiler, but instead the compiler is a collection of small compilers: each small compiler knows only how to compile one symbolic word. Such a small compiler is called a macro. The result of this approach is that the language's compiler is well factored: it consists of a large quantity of mostly independent subprograms that can be independently tested. This approach makes it easier to implement, easier to get right. Another result is that the compiler is extensible, in much of the same way that Forth itself is extensible. There is no 'one language', but many possible languages. Macros allow for an easy way to implement peephole optimization. This works as follows: instead of translating a whole chunk of code like '1 2 +' into an equivalent piece of code '3', the code is compiled incrementally, but some macros, like '+', check if they can combine the previously compiled machine code with the meaning of the word to some simpler machine code. PURRR18 ------- The forth dialect PURRR18 is entirely implemented using macros. Therefore it does not need an interpreter kernel like a traditional forth does. The reasons for this are essentially: - a compiler (set of macros) enables better mapping to the target machine language, by using peephole optimizing rules. - a full Forth interpreter kernel is not necessary, so the code footprint can be really small - the compiler, running on a powerful host computer, can afford to 'burn some cycles' during compilation, which the target can't do. these cycles can be spent doing some more elaborate optimizations, and to implement the compiler in a high level language. In practice, there is some stripped down interpreter called a 'monitor'. This is implemented using the macros, not vise versa. This monitor is necessary to keep the development process interactive, which is a key feature of a Forth system. The interaction system uses the compiler which runs on the host system, and uses the monitor to upload code to the target, execute it, and inspect the target state. The compiler for PURRR18 only has some very simple optimizations. Most of them are greedy, and thus somewhat suboptimal peephole optimizations described above. Others aim to optimize loops. Once you know the basic words, it is not to hard to write idiomatic forth code that is just as efficient as machine code. In PURRR18, the peephole optimization for literals and constants can be used to perform elaborate runtime computations: it basicly treats the compiled (pseudo) assembler code as a typed data stack. The intermediate language the macros are written in is called CAT. To extend the language in the most flexible way, and have deep access to compiler and assembler constructs, you'll need to write them in CAT. However, some constructs are reflected in the forth-accessible macro definitions. More about this in doc/peval.f Forth in Forth -------------- A small remark about CAT vs FORTH to implement the compiler. I can appreciate the elegance and simplicity of a self-supporting Forth system. But these days, i think this elegance is no more than aestetics, and not really practical. To me, there is really no reason why you should write a Forth compiler in Forth, while you could use Something Better(TM). I still want to USE forth on small systems where it makes a lot of sense to do so, but to solve the problem of mapping a low level forth to some target chip's idiosyncracies in an efficient way, and metaprogramming this low level language using a direct and low level (read: non-symbolic) approach is I think a bit dated. This is my personal preference of course. At this moment, my Something Better(TM) is Scheme and the CAT compositional language. I started to appreciate the value of functional programming and symbolic representations. BADNOP is mainly my way of finding my way to write a system that bridges the high and low levels of programming. That said, one of the projects that popped up in the course of last year is a traditional, simple and hopefully elegant self-hosting standard Forth called PURRR. This Forth is to be used as a 'configuration tool' for a kernel written in PURRR18, and does not need the host BROOD system to compile new code. The code is not there? ---------------------- A big advantage (no forth kernel) is also the main disadvantage of a fully macro-inmplemented Forth. Some code does not have any meaning when used interactively, and can not be independently tested. The simplest example is the '+' word. There is hardly any reason to compile a chunk of code on the PIC18 that implements the behaviour of '+' as a sequence of machine instructions, because it is always faster and sometimes shorter to inline the equivalent machine code. The only reason there would be is to be able to interactively execute only the '+' word, and be able to inspect the machine state before and after the execution. There are two solutions to this. The most straightforward one is to just compile a macro. The code ': + + ;' will create a word '+' that has a physical representation in the chip flash memory. The other solution is to simulate the behaviour on the host system, which is done for most of the arithmetic operations. To make things worse: some words associated to specific chip instructions have no associated meaning as standalone operations, and cannot be instantiated as a compiled macro because they must always be associated with literal arguments. This happens a lot throughout the core of PURRR18: for example in words like 'high' and 'low' need 2 arguments that must be transferrable to numeric address and bit position at compile time. However, these words can be used in composite macros. The main reason these words do not have a run-time equivalent is that they are too inefficient to implement as such. For example, 'high' would need a for loop for the shift count, or a jump table to implement what would otherwise be a simple one instruction operation. Another reason is that they just reflect low level machine parts that have no equivalent in Forth, and serve mainly as a hack to speed things up. An example is the 'z?' word which tests the zero flag. These words are hard to use due to their intrisic dependency of the sequence of instructions and should be hidden in constructs that guarantee their correct usage. INFLUENCES ---------- You are what you read. BADNOP, the forth compiler in the BROOD system, is largely inspired by PicForth written by Samuel Tardieu. This system however is not interactive. BROOD adds an interactive incremental development feature using what is essentially the '3 instruction forth' by Frank Sergeant. In addition to that, the compiler and interaction system are written in Scheme, and a functional compositional language (called CAT) written in Scheme, inspired by the Joy language written by Manfred von Thun. Additional influences are colorForth by Chuck Moore, and the publication 'Moving Forth: a series on writing Forth kernels', by Brad Rodriguez.