Why all these macros?
---------------------

To understand the design of PURRR18 as a collection of macros, I'll
explain first how a 'computer language' traditionally gets
implemented. Then I'll move up to the design of PURRR18, and what
macros actually do.


A language
----------

A computer only understands one language: its native machine language,
which consists of a series of bit patterns stored in memory. These
patterns, also called 'instructions' are read out one after the other,
and determine switching behaviour inside the computer which in turn
causes some computation or inout/output action to happen.

For the PIC18 this consists of a sequence of 16bit patterns stored in
FLASH memory.

In order to use another language to program a computer, you need a way
to translate this language to the computer's native machine language.
There are basicly two ways of doing this:

- using an INTERPRETER

This is the simplest way: A program, written in the native machine
language, is running on the chip, associating some representation of
the high level language (the virtual machine language associated to
that language, usually a sequence of tags) into machine code snippets
that are already present on the chip, and executes them. This is the
traditional way of implementing Forth: each word is associated to a
small assembler program that implements its behaviour.

- using a COMPILER

Instead of storing a program in some intermediate form (using a
virtual machine language), the original code is translated in its
intirety into an equivalent machine code program, which can run
directly on the computer.


A compiler
----------

In contrary to popular belief, writing a compiler requires only a
slightly bit more work than writing an interpreter. Usually it is very
straightforward to convert an interpreter for a language into a
compiler, and this process can even be automated.

However, a compiler has more information available about the meaning
of the code: an interpreter only sees one virtual machine instruction
at a time, while a compiler, in theory, can see the whole code at
once. Therefore it can use more tricks to generate better machine
code, by extracting more meaning or intent from the code and replacing
some code fragments with faster or shorter machine code that has
exactly the same result.

For example, the forth code '1 2 +' can be translated to the sequence:

  1  -->  push the number 1 onto the stack
  2  -->  push the number 2 onto the stack
  +  -->  pop two numbers off the stack, add them, and push the result

This is exactly how an interpreter would perform the following
code. It can't do much more, simply because the interpreter is written
before any code is actually interpreted!

A compiler on the other hand can see the meaning behind the code. It
can translate the forth fragment '1 2 +' to the equivalent code '3'
and compile that instead:

  1 2 +  -->  push the number 3 onto the stack

The result is completely the same, but the machine code is much
simpler. This trick is called 'partial evaluation': compute things
that can be computed during the compilation phase, instead of
postponing them to some later interpretation phase.

Moving from interpretation to compilation is essentially the same: the
program (interpreter) that interprets the data (virtual machine code)
is partially evaluated. The result of this is that some code is
computed which will perform the desired behaviour, but this code can
be specialized to its particular task.

Depending on the smartness of the compiler, this can result in both
shorter and faster code. For example, depending on the architecture, a
sequence of words '123 @ +' can be compiled into a single machine
instruction that fetches the number out of memory location 123 and
adds it to the top of the stack.

So the basic idea is: if you never look at intermediate results, they
don't really need to be there.

The thing that can make a compiler complicated is the extent to which
it tries to extract meaning. Depending on the source language, this
can go very far.


Macros
------

Now, what are macros? The traditional way of implementing a Forth
compiler is to associate each word individually with some small
program that can generate native machine code. This machine code, when
executed, performs some desired behaviour.

So, there is no single large piece of code that can be called THE
compiler, but instead the compiler is a collection of small compilers:
each small compiler knows only how to compile one symbolic word. Such
a small compiler is called a macro.

The result of this approach is that the language's compiler is well
factored: it consists of a large quantity of mostly independent
subprograms that can be independently tested. This approach makes it
easier to implement, easier to get right.

Another result is that the compiler is extensible, in much of the same
way that Forth itself is extensible. There is no 'one language', but
many possible languages.

Macros allow for an easy way to implement peephole optimization. This
works as follows: instead of translating a whole chunk of code like '1
2 +' into an equivalent piece of code '3', the code is compiled
incrementally, but some macros, like '+', check if they can combine
the previously compiled machine code with the meaning of the word to
some simpler machine code. 


PURRR18
-------

The forth dialect PURRR18 is entirely implemented using macros.
Therefore it does not need an interpreter kernel like a traditional
forth does. The reasons for this are essentially:

- a compiler (set of macros) enables better mapping to the target
  machine language, by using peephole optimizing rules.

- a full Forth interpreter kernel is not necessary, so the code
  footprint can be really small

- the compiler, running on a powerful host computer, can afford to
  'burn some cycles' during compilation, which the target can't
  do. these cycles can be spent doing some more elaborate
  optimizations, and to implement the compiler in a high level
  language.


In practice, there is some stripped down interpreter called a
'monitor'. This is implemented using the macros, not vise versa. This
monitor is necessary to keep the development process interactive,
which is a key feature of a Forth system. The interaction system uses
the compiler which runs on the host system, and uses the monitor to
upload code to the target, execute it, and inspect the target state.

The compiler for PURRR18 only has some very simple optimizations. Most
of them are greedy, and thus somewhat suboptimal peephole
optimizations described above. Others aim to optimize loops. Once you
know the basic words, it is not to hard to write idiomatic forth code
that is just as efficient as machine code.

In PURRR18, the peephole optimization for literals and constants can
be used to perform elaborate runtime computations: it basicly treats
the compiled (pseudo) assembler code as a typed data stack.

The intermediate language the macros are written in is called CAT. To
extend the language in the most flexible way, and have deep access to
compiler and assembler constructs, you'll need to write them in
CAT. However, some constructs are reflected in the forth-accessible
macro definitions. More about this in doc/peval.f


Forth in Forth
--------------

A small remark about CAT vs FORTH to implement the compiler.

I can appreciate the elegance and simplicity of a self-supporting
Forth system. But these days, i think this elegance is no more than
aestetics, and not really practical. To me, there is really no reason
why you should write a Forth compiler in Forth, while you could use
Something Better(TM).

I still want to USE forth on small systems where it makes a lot of
sense to do so, but to solve the problem of mapping a low level forth
to some target chip's idiosyncracies in an efficient way, and
metaprogramming this low level language using a direct and low level
(read: non-symbolic) approach is I think a bit dated. 

This is my personal preference of course. At this moment, my Something
Better(TM) is Scheme and the CAT compositional language. I started to
appreciate the value of functional programming and symbolic
representations. BADNOP is mainly my way of finding my way to write a
system that bridges the high and low levels of programming.

That said, one of the projects that popped up in the course of last
year is a traditional, simple and hopefully elegant self-hosting
standard Forth called PURRR. This Forth is to be used as a
'configuration tool' for a kernel written in PURRR18, and does not
need the host BROOD system to compile new code.


The code is not there?
----------------------

A big advantage (no forth kernel) is also the main disadvantage of a
fully macro-inmplemented Forth. Some code does not have any meaning
when used interactively, and can not be independently tested.

The simplest example is the '+' word. There is hardly any reason to
compile a chunk of code on the PIC18 that implements the behaviour of
'+' as a sequence of machine instructions, because it is always faster
and sometimes shorter to inline the equivalent machine code.

The only reason there would be is to be able to interactively execute
only the '+' word, and be able to inspect the machine state before and
after the execution.

There are two solutions to this. The most straightforward one is to
just compile a macro. The code ': + + ;' will create a word '+' that
has a physical representation in the chip flash memory.

The other solution is to simulate the behaviour on the host system,
which is done for most of the arithmetic operations.

To make things worse: some words associated to specific chip
instructions have no associated meaning as standalone operations, and
cannot be instantiated as a compiled macro because they must always be
associated with literal arguments. This happens a lot throughout the
core of PURRR18: for example in words like 'high' and 'low' need 2
arguments that must be transferrable to numeric address and bit
position at compile time. However, these words can be used in
composite macros.

The main reason these words do not have a run-time equivalent is that
they are too inefficient to implement as such. For example, 'high'
would need a for loop for the shift count, or a jump table to
implement what would otherwise be a simple one instruction operation.
Another reason is that they just reflect low level machine parts that
have no equivalent in Forth, and serve mainly as a hack to speed
things up. An example is the 'z?' word which tests the zero flag.
These words are hard to use due to their intrisic dependency of the
sequence of instructions and should be hidden in constructs that
guarantee their correct usage.


INFLUENCES
----------

You are what you read.

BADNOP, the forth compiler in the BROOD system, is largely inspired by
PicForth written by Samuel Tardieu. This system however is not
interactive. BROOD adds an interactive incremental development feature
using what is essentially the '3 instruction forth' by Frank
Sergeant. In addition to that, the compiler and interaction system are
written in Scheme, and a functional compositional language (called
CAT) written in Scheme, inspired by the Joy language written by
Manfred von Thun. Additional influences are colorForth by Chuck Moore,
and the publication 'Moving Forth: a series on writing Forth kernels',
by Brad Rodriguez.