[<<][staapl][>>][..]Thu Oct 8 11:09:27 CEST 2009

So, goal is this: given a collection of streams and a number of delay operators, construct a kernel routine. Start with 1 (audio), move on to 2 (image) and finish with 3 (video). The kernel routine is _declarative_ : it merely states the relation between neighbouring pixels. To give an operational semantics, a serialization needs to be implemented that constructs a nested loop in terms of a physical data stream. The problems are thus: * specification -> normal form polynomial * polynomial -> imperative loop Parameterizing the choices to be made in the last step will give an implementation / optimization strategy. An interesting heuristic for (artificially constructed) school exercises is to ask: did I use all the axioms/equalities? It might be a good idea to first focus on transformation rules for simple algebraic expressions in s-expression form, i.e. generic associativity. There are two important operations: - arbitrary -> right rotated binary tree (rrbt) - rrbt <-> flat This works really well. A large collection of simple tree / graph operations will probably be the right approach. NEXT: loop body construction. This involves obtaining delay information for the variables, and constructing delay assignments and pre-roll code. This seems quite straightforward. Requires some core routines (id hash) so can be done later. Ok: I have something that generates this: box> (ttx loop-body (r/z #'(+ q (z a) (z (z b))))) (begin (begin (set! b_1 (~ b 1)) (set! b_2 (~ b 2)) (set! a_1 (~ a 1))) (for ((i (in-range n))) (set! b_0 (~ b i)) (set! a_0 (~ a i)) (set! q_0 (~ q i)) (set! (~ result i) (+ q_0 a_1 b_2)) (set! a_1 a_0) (set! b_2 b_1) (set! b_1 b_0))) Now, let's simplify the generator so it becomes more composable. I.e. generator (postponed generated code) should become an algebraic object on which certain transformations can be performed. Can it be turned into a group with a certain generator/... representation? I.e. the code above was directly constructed from a dictionary of variables + their associated maximal delay/offset. Is it possible to factor this into separate operations of memoization (dereference) and matching between adjacent loop iterations? Maybe it's important to see that LOAD/PROCESS/STORE/PROPAGATE is only important as a final view. Memoization can be kept out of the picture for all transformations. [1] http://docs.plt-scheme.org/reference/dicts.html#%28def._%28%28lib._scheme/dict..ss%29._make-custom-hash%29%29

[Reply][About]

[<<][staapl][>>][..]