Tue Sep 29 12:43:22 CEST 2009

Stream combinators continued

So, starting from the primitive delay combinator Z : s -> s, which
takes a stream and maps it to the stream with one operation shifted,
we need algebraic laws such as:

   (map fn s (Z s))  =  (loop_1_Z fn s)
   (map fn (Z s) s)  =  (loop_Z_1 fn s)

Where `map' is an element-wise morphism, or fn : e -> e, and `loop'
has threaded state.

The objective is to find an abstract way to characterize the functions
`loop_1_Z' and `loop_Z_1' so the rewrite rules work in both
directions.  This depends on the representation of streams (which will
ulimately be machine memory arrays).  Let's start with:

   ((map fn) (I s) (Z s))  =  ((loop fn I Z) s)
   ((map fn) (Z s) (I s))  =  ((loop fn Z I) s)

The objective is to start with a LHS representation in terms of
combination of streams, and turn it into a single iteration over a
number of primitive streams.  This is essentially loop merging.

Later we need to give up the functional notation early and use
data-flow variables: each expression contains a number of inputs and
outputs.  For now expressions are simpler.

The rule above needs a more general composite transformation rule that
can merge two loops.  Essentially what one wants is algebraic rules
that relate stream and function operators.

Let's use the following notation:

s : [e]           [.] stream type constructor
f : e^n -> e^m    elementary function
S : s^n -> s^m    stream transformation
F : f -> S        elementary function transformation

   ((F1 fn) (S1 s)) = ((F2 fn) s)
    +-----+ +----+     +-----+
       S       s          S

Laws are needed that have S1 in terms of an F, like:

   ((F1 f1) ((F2 f2) s)) = ((F3 f1 f2) s)

What I can gather from this is that:

  1. types are important to make this managable.  It might be wise to
     put this in a categorical framework to map out the territory.

  2. the rest seems pretty straightforward derivation of theorems that
     proove equalities.

Let's try this with the important players:

  + * : e^2 -> e
  -   : e -> e
  I,Z : s -> s

What's next is a syntax and semantics for `loop' parameterized by the
elementary operators Z^n, n>=0 with I = Z^0.

TODO: build loop on some substrate of streams of elements (i.e. a
circular vector of exact numbers, to make verification of
transformation rules simple.)

Got loop.ss test skeleton.  The first thing I notice is that I need an
abstraction for threading delayed values:  It's not so simple to do
this with unary fold.  Does for/fold allow for multiple values?  Yes.

OK, I have the first TX test working:

 (lambda (s) (map + (I s) (Z s)))
 (lambda (s) (loop_I_Z + s))
 1 10)

Next is to find a way to generate loop_I_Z from a specification (I Z).
This should generate pre/loop/post code (which is the most annoying
part doing manually).

Another thing: if there is no output dependency (i.e. IIR filters), it
is possible to run loops with delays backwards.  This is necessary for
making operations parallel.  Take this as a hint that the best
description probably takes direction simply as a parameter.  The real
deal is this: in the end, there is an inner loop that accesses a mask
over the data.  Find a way to describe this on a higher level.

In order to properly represent loops it might be best to adhere to a
representation that uses arrays and indices directly.  The problem
here is that references are pure, but of course assignments are not.
Let's move to single-assignment vectors (this way coverage can be

In short: separate abstract formula manipulation from implementation
(single-assignment vectors).

Functional vs. dataflow.  An interesting property is that it is
difficult to talk about vector operations without allowing (single)
assignment: iteration loops seem to scream for explicit mention of the
storage/send of outputs.  Wait.. Maybe a read/write formulation would
work better?  I.e. delimited continuations etc..

Q: can these transformations be expressed easier in terms of delimited

Q: Essentially, memory already behaves as a stream (very clear in DSP:
using DMA from DRAM to SRAM to feed a processor..).  Maybe I should
reformulate loops as combinations of IO + coef + state + loop?

Q: What about seeing code as a description of a circuit and then
re-interpreting as loops (operational semantics?)

I.e. a 3-tap fir filter is a description of a network that connects
adders and multipliers from an input stream to an output stream.
Combinators essentially duplicate a basic pattern to a global one.
Looking at it this way (pure functional kernels) it might be simpler
to derive loops and pre/post code by reducing the pure kernels by
combining them with a traversal strategy.

That's really the essence, right?  Declarative dataflow.  What is
declared?  The _structure_ of the computational network, _not_ the map
to a serial machine.

I.e. instead of thinking about delay lines, think of a little more
highlevel concept of spans: transform loop bodies, such that the final
loop construction takes the form of a universal pre/loop/post
combination that takes a description of the span as input.

Some remarks:

   - using a limited set of core operations, issues of associativity
     and commutativity could be left until later (when discovering

   - the remaining problem is the conversion of an input specification
     to a (memoized) data flow program.  in order to have things map
     easily to serial C, the memoized form uses `let*' terminating in
     a `values' form that produces the output.

Let's try this.  The effect of 'z' needs to be pushed through to an
input variable, to be mined later to determine the signature of the

[1] http://citeseerx.ist.psu.edu/viewdoc/download?doi=
[2] http://eprints.kfupm.edu.sa/45250/1/45250.pdf