Thu Oct 8 11:09:27 CEST 2009

Filter language

So, goal is this: given a collection of streams and a number of delay
operators, construct a kernel routine.  Start with 1 (audio), move on
to 2 (image) and finish with 3 (video).

The kernel routine is _declarative_ : it merely states the relation
between neighbouring pixels.  To give an operational semantics, a
serialization needs to be implemented that constructs a nested loop in
terms of a physical data stream.

The problems are thus:

    * specification -> normal form polynomial

    * polynomial -> imperative loop

Parameterizing the choices to be made in the last step will give an
implementation / optimization strategy.

An interesting heuristic for (artificially constructed) school
exercises is to ask: did I use all the axioms/equalities?  It might be
a good idea to first focus on transformation rules for simple
algebraic expressions in s-expression form, i.e. generic

There are two important operations:
   - arbitrary -> right rotated binary tree (rrbt)
   - rrbt <-> flat

This works really well.  A large collection of simple tree / graph
operations will probably be the right approach.

NEXT: loop body construction.  This involves obtaining delay
information for the variables, and constructing delay assignments and
pre-roll code.  This seems quite straightforward.  Requires some core
routines (id hash) so can be done later.

Ok: I have something that generates this:

box> (ttx loop-body (r/z #'(+ q (z a) (z (z b)))))
  (begin (set! b_1 (~ b 1))
         (set! b_2 (~ b 2))
         (set! a_1 (~ a 1)))
  (for ((i (in-range n)))
   (set! b_0 (~ b i))
   (set! a_0 (~ a i))
   (set! q_0 (~ q i))
   (set! (~ result i) (+ q_0 a_1 b_2))
   (set! a_1 a_0)
   (set! b_2 b_1)
   (set! b_1 b_0)))

Now, let's simplify the generator so it becomes more composable.
I.e. generator (postponed generated code) should become an algebraic
object on which certain transformations can be performed.  Can it be
turned into a group with a certain generator/... representation?

I.e. the code above was directly constructed from a dictionary of
variables + their associated maximal delay/offset.  Is it possible to
factor this into separate operations of memoization (dereference) and
matching between adjacent loop iterations?

Maybe it's important to see that LOAD/PROCESS/STORE/PROPAGATE is only
important as a final view.  Memoization can be kept out of the picture
for all transformations.

[1] http://docs.plt-scheme.org/reference/dicts.html#%28def._%28%28lib._racket/dict..ss%29._make-custom-hash%29%29