Sun Sep 14 10:03:17 CEST 2008

the Transterpreter

> This is an email from Matt Jadud describing the TVM and occam
> http://www.transterpreter.org/

The TVM is a virtual machine for the instruction set of a Transputer.
In particular, we support the 4xx and 8xx series instruction sets,
which includes floating point operations.

The Transputer had a 3-depth stack (for integer and FP each; I'll
ignore FP from now on, as the FP is really no different/no more
interesting). The A, B, and C registers (as they are called), along
with a workspace pointer, instruction pointer, and a front pointer and
back pointer for the scheduling queue, and a timer queue represent all
of the machine state needed for an occam process. (occam was the
language designed to execute on the Transputer.)

By keeping things to just seven words, you could do very fast context
switches; the Transputer could do nanosecond context switches between
parallel processes. This is, in part, because it had some very fast
RAM on board. It's specialized nature, and expensive construction, are
just two of the reasons it ultimately died; the Intel 286 was a much
cheaper processor.

The occam language is grounded in the CSP algebra for reasoning about
concurrent and parallel processes.


In CSP (and therefore occam) you reason about sequential processes
that execute in parallel. They communicate over channels, and sending
and receiving were designated in the algebra by '!' and '?',
respectively. In occam, these are the operators for sending and
receiving as well.

Producer/consumer looks like:

PROC producer (CHAN BYTE ch!)
    ch ! 42

PROC consumer (CHAN BYTE ch?)
    BYTE c:
    ch ? c

PROC main ()

The channel is represented in the TVM (and on the Transputer) as a
single word in memory. This is a channel word that is in shared
memory; the workspaces of the two processes are private. (This privacy
is, and in fact MUST be, guaranteed by the compiler, or true
parallelism breaks.) When the writer comes in, they tweak the word;
when the reader comes in, they either wait, or complete the

The communications are therefore the synchronization points in a
program, and dictate when we enter the scheduler. Further, they are
synchronous (we block on read and write) and point-to-point. The
extended language, occam-pi, introduces SHARED channels (that make
implementing bus architectures easy), as well as some other nifty
bits, like mobility of data and channel ends. For now, I'm staying in
the core/original language for purposes of this discussion.

How this is implemented does not matter. You could implement that
protected word using transactional memory. You can rip my
producer/consumer apart, and replace that channel with a wireless
link... as long as you preserve the semantics of the channel. It takes
work, but the powerful thing about the channel abstraction is that it
is a clean, well-defined point at which we can ignore what is going on
at the other end, and just assume we're either reading or writing to a
word in memory.

In the TVM, we just use a word and have one or two protected values
that indicate the channel is empty and whatnot (MAX_INT and MIN_INT
play roles here, I think; I can't remember.)

If you're familiar with the Cell, their mailboxes (which provides a
single word for communicating between the CPU<->SPU as well as all of
the SPU<->SPU communications, I think) are effectively the same
primitive. They have implemented what amounts to a CSP channel for
transferring one word at a time between these units directly in
hardware, and a small API for interacting with that word. Of course,
the C compiler doesn't help make sure you're not doing something
braindead, and they do provide for DMA transfers between the SPUs...
but my point is that many hardware devices that are going parallel
have similar operations baked in.

At a higher level, the blocking calls in MPI are equivalent, and I've
mapped occam channels to them in the past. Things "just worked,"
by-and-large... with a few caveats about comms failure, which (sadly)
the original occam model never considered. (It wasn't possible... or,
something catastrophic happened, so you didn't care, since your
computer was on fire.)

So that's a bit of history and a bit of a peek into the way
synchronization happens. The Transterpreter is more interesting than a
Transputer in a lot of ways, since we can programmatically map channel
ends to most anything. For example, in the Blackfin port (and soon the
ARM port, once we get our dev boards), we map the interrupt-driven
UART to a channel end.

0. The VM is sleeping; the CPU is sleeping.
1. A character lands on the UART. The CPU wakes up.
2. The interrupt stuffs the character into an occam channel word. (We
have a C API for this.)
3. The interrupt sets a flag in the VM, wakes the VM, and goes away.
4. The VM runs the occam bytecode, and the "channel" handling comms is
ready. The corresponding read happens.
5. Everyone goes back to sleep if nothing else needs to be done.

The fun thing is that we can do this with an arbitrary number of
interrupts, and the occam program will be semantically sound, even
though we're doing spurrious/arbitrary interrupts on hardware. This is
because channels only guarantee sync, they don't guarantee time.
Therefore, the VM can be "waiting" on any number of "hardware"
channels, but since it thinks they're just normal occam channels, the
VM is happy, and the occam programmer can get on with building a
highly parallel network of communicating processes on their embedded
target without wondering about the safety of their hardware
interrupts. If it works in test (on a single processor with
non-interrupt driven code), then it will work without. (I'm not trying
to waive my hands overly aggressively... things should be cool, but
we're still exploring this space. The semantics, we believe, hold.)

Given that the Transputer was originally used (and still used, in some
form) for codecs, comms, and video processing, it makes sense that
there is a good mapping.