[<<][staapl][>>][..]Tue Sep 29 12:43:22 CEST 2009

So, starting from the primitive delay combinator Z : s -> s, which takes a stream and maps it to the stream with one operation shifted, we need algebraic laws such as: (map fn s (Z s)) = (loop_1_Z fn s) (map fn (Z s) s) = (loop_Z_1 fn s) Where `map' is an element-wise morphism, or fn : e -> e, and `loop' has threaded state. The objective is to find an abstract way to characterize the functions `loop_1_Z' and `loop_Z_1' so the rewrite rules work in both directions. This depends on the representation of streams (which will ulimately be machine memory arrays). Let's start with: ((map fn) (I s) (Z s)) = ((loop fn I Z) s) ((map fn) (Z s) (I s)) = ((loop fn Z I) s) The objective is to start with a LHS representation in terms of combination of streams, and turn it into a single iteration over a number of primitive streams. This is essentially loop merging. Later we need to give up the functional notation early and use data-flow variables: each expression contains a number of inputs and outputs. For now expressions are simpler. The rule above needs a more general composite transformation rule that can merge two loops. Essentially what one wants is algebraic rules that relate stream and function operators. Let's use the following notation: s : [e] [.] stream type constructor f : e^n -> e^m elementary function S : s^n -> s^m stream transformation F : f -> S elementary function transformation ((F1 fn) (S1 s)) = ((F2 fn) s) +-----+ +----+ +-----+ S s S Laws are needed that have S1 in terms of an F, like: ((F1 f1) ((F2 f2) s)) = ((F3 f1 f2) s) What I can gather from this is that: 1. types are important to make this managable. It might be wise to put this in a categorical framework to map out the territory. 2. the rest seems pretty straightforward derivation of theorems that proove equalities. Let's try this with the important players: + * : e^2 -> e - : e -> e I,Z : s -> s What's next is a syntax and semantics for `loop' parameterized by the elementary operators Z^n, n>=0 with I = Z^0. TODO: build loop on some substrate of streams of elements (i.e. a circular vector of exact numbers, to make verification of transformation rules simple.) Got loop.ss test skeleton. The first thing I notice is that I need an abstraction for threading delayed values: It's not so simple to do this with unary fold. Does for/fold allow for multiple values? Yes. OK, I have the first TX test working: (quickcheck (lambda (s) (map + (I s) (Z s))) (lambda (s) (loop_I_Z + s)) 1 10) Next is to find a way to generate loop_I_Z from a specification (I Z). This should generate pre/loop/post code (which is the most annoying part doing manually). Another thing: if there is no output dependency (i.e. IIR filters), it is possible to run loops with delays backwards. This is necessary for making operations parallel. Take this as a hint that the best description probably takes direction simply as a parameter. The real deal is this: in the end, there is an inner loop that accesses a mask over the data. Find a way to describe this on a higher level. In order to properly represent loops it might be best to adhere to a representation that uses arrays and indices directly. The problem here is that references are pure, but of course assignments are not. Let's move to single-assignment vectors (this way coverage can be tested). In short: separate abstract formula manipulation from implementation (single-assignment vectors). Functional vs. dataflow. An interesting property is that it is difficult to talk about vector operations without allowing (single) assignment: iteration loops seem to scream for explicit mention of the storage/send of outputs. Wait.. Maybe a read/write formulation would work better? I.e. delimited continuations etc.. Q: can these transformations be expressed easier in terms of delimited continuations? Q: Essentially, memory already behaves as a stream (very clear in DSP: using DMA from DRAM to SRAM to feed a processor..). Maybe I should reformulate loops as combinations of IO + coef + state + loop? Q: What about seeing code as a description of a circuit and then re-interpreting as loops (operational semantics?) I.e. a 3-tap fir filter is a description of a network that connects adders and multipliers from an input stream to an output stream. Combinators essentially duplicate a basic pattern to a global one. Looking at it this way (pure functional kernels) it might be simpler to derive loops and pre/post code by reducing the pure kernels by combining them with a traversal strategy. That's really the essence, right? Declarative dataflow. What is declared? The _structure_ of the computational network, _not_ the map to a serial machine. I.e. instead of thinking about delay lines, think of a little more highlevel concept of spans: transform loop bodies, such that the final loop construction takes the form of a universal pre/loop/post combination that takes a description of the span as input. Some remarks: - using a limited set of core operations, issues of associativity and commutativity could be left until later (when discovering ILP). - the remaining problem is the conversion of an input specification to a (memoized) data flow program. in order to have things map easily to serial C, the memoized form uses `let*' terminating in a `values' form that produces the output. Let's try this. The effect of 'z' needs to be pushed through to an input variable, to be mined later to determine the signature of the function. [1] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.68.5409&rep=rep1&type=pdf [2] http://eprints.kfupm.edu.sa/45250/1/45250.pdf

[Reply][About]

[<<][staapl][>>][..]