Sat Apr 23 00:42:20 EDT 2011

Protocol-oriented programming

Staapl, PIC18 USB driver: I got it into my head that I want to program
in Forth without using data structures (random access data) but just
using linear time-ordered streams or channels.

Forth allows very dense code construction.  This is a direct
consequence of its implicit data access: simply not having to store
memory locations saves space.  The price you pay is extra effort
necessary to properly factor code.  It becomes very important to have
a procedure "do only one thing".  This is a skill that can be learned,
and in practice it works amazingly well.

The trouble is, that approach doesn't work well with structured data
(trees and graphs).  The obvious reason is that such data
representation requires the re-introduction of names or random access.

Is it possible to do a similar "compression" for data?

The idea is that if it's possible to use local access variables
instead of random access variables in Forth code, it should be
similarly possible to replace all data structures with serial,
uni-directional protocols.

The underlying idea is the operation of deforestation that is possible
in pure functional programming.  Deforestation eliminates
constructor+match pairs and rewrites the code's control flow to turn
such a pair into a function call or let-abstraction.

The big idea here is that data structures are really just postponed
function calls, or "buffers".  Depending on the fan-out of the data
structures (= the number of times a certain structure is used) it is
often possible to not store intermediate data structures in RAM but
push components to consumers directly.  In practice this can often be
done by introducing parallelism.

I would assume this doesn't work for all kinds of code.  I.e. code
that has intrinsic "data storage" features, such as a database.
However it should be possible to employ the principle in cases where
data use is just buffers or temporary storage.  In that case it should
be possible to eliminate it with different control-flow factoring.

One of the drawbacks of this approach seems to be that the protocol
definition should be part of the design process.  Just as function
prototypes are really important to write good Forth code, serial data
protocols should be optimized for ease of parsing.

It seems better to call this approach "parser-oriented" programming:
make sure that protocols between different components are defined in
such a way that local resource requirements are minimized.

In my current problem (USB driver for PIC18) I'm facing the problem
that the input data structures are fixed, and clearly defined with a
random-access approach in mind (think C structs).  So to make this
work in practice it is probably necessary to preprocess the input to
turn it into a usable event stream.

So what is the really big idea?

Make data dependencies explicit.