Thu Oct 8 11:09:27 CEST 2009
So, goal is this: given a collection of streams and a number of delay
operators, construct a kernel routine. Start with 1 (audio), move on
to 2 (image) and finish with 3 (video).
The kernel routine is _declarative_ : it merely states the relation
between neighbouring pixels. To give an operational semantics, a
serialization needs to be implemented that constructs a nested loop in
terms of a physical data stream.
The problems are thus:
* specification -> normal form polynomial
* polynomial -> imperative loop
Parameterizing the choices to be made in the last step will give an
implementation / optimization strategy.
An interesting heuristic for (artificially constructed) school
exercises is to ask: did I use all the axioms/equalities? It might be
a good idea to first focus on transformation rules for simple
algebraic expressions in s-expression form, i.e. generic
There are two important operations:
- arbitrary -> right rotated binary tree (rrbt)
- rrbt <-> flat
This works really well. A large collection of simple tree / graph
operations will probably be the right approach.
NEXT: loop body construction. This involves obtaining delay
information for the variables, and constructing delay assignments and
pre-roll code. This seems quite straightforward. Requires some core
routines (id hash) so can be done later.
Ok: I have something that generates this:
box> (ttx loop-body (r/z #'(+ q (z a) (z (z b)))))
(begin (set! b_1 (~ b 1))
(set! b_2 (~ b 2))
(set! a_1 (~ a 1)))
(for ((i (in-range n)))
(set! b_0 (~ b i))
(set! a_0 (~ a i))
(set! q_0 (~ q i))
(set! (~ result i) (+ q_0 a_1 b_2))
(set! a_1 a_0)
(set! b_2 b_1)
(set! b_1 b_0)))
Now, let's simplify the generator so it becomes more composable.
I.e. generator (postponed generated code) should become an algebraic
object on which certain transformations can be performed. Can it be
turned into a group with a certain generator/... representation?
I.e. the code above was directly constructed from a dictionary of
variables + their associated maximal delay/offset. Is it possible to
factor this into separate operations of memoization (dereference) and
matching between adjacent loop iterations?
Maybe it's important to see that LOAD/PROCESS/STORE/PROPAGATE is only
important as a final view. Memoization can be kept out of the picture
for all transformations.