Tue May 11 19:59:36 EDT 2010
Since lifting over a simple array iteration is trivial, the only real
remaining problem is dealing with the unit delay Z. There are
essentially two forms to handle:
* Output delay, or delay of computed values. These are stored in
local variables in the loop body, and in state vectors in between
loop iterations. A special implementation might be necessary for
long delay lines.
* Input delay. This can be implemented using array indexing, except
in the first iteration, where the delayed input must come from stored
state, and in the last iteration, where the input must be saved as
state.
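A minimal sketch of the output-delay case, assuming a one-pole
recursion y[n] = x[n] + a*y[n-1] as the example filter and plain
Python lists as arrays (the name one_pole_block is hypothetical). The
delayed output lives in a local variable inside the loop and is handed
back as state at the end of the block:

```python
from typing import List, Tuple


def one_pole_block(x: List[float], a: float,
                   y_prev: float) -> Tuple[List[float], float]:
    """Compute y[n] = x[n] + a * y[n-1] over one block of input.

    y_prev is the state carried in from the previous block.
    """
    y = [0.0] * len(x)
    z = y_prev                 # local holding the delayed output Z(y)
    for n in range(len(x)):
        z = x[n] + a * z       # this output is next iteration's Z(y)
        y[n] = z
    return y, z                # last output becomes the saved state


signal = [1.0, 0.0, 0.0, 0.0]
whole, _ = one_pole_block(signal, 0.5, 0.0)
head, state = one_pole_block(signal[:2], 0.5, 0.0)
tail, _ = one_pole_block(signal[2:], 0.5, state)
print(whole)                   # [1.0, 0.5, 0.25, 0.125]
print(head + tail == whole)    # True
```

Splitting the signal into two blocks and threading the state through
gives the same result as processing it in one go.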
Output delays can be implemented using the trick mentioned
previously. Input delays need a specialized code body.
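For the input-delay case, a sketch of one way to build such a
specialized body, assuming a first difference y[n] = x[n] - x[n-1] as
the example (the name diff_block is hypothetical). Here the first
iteration is peeled off the loop so it can read the stored state; the
remaining iterations use plain array indexing, and the last input is
returned as the new state:

```python
from typing import List, Tuple


def diff_block(x: List[float],
               x_prev: float) -> Tuple[List[float], float]:
    """Compute y[n] = x[n] - x[n-1] over one block of input.

    x_prev is the last input of the previous block (the stored state).
    """
    if not x:
        return [], x_prev      # empty block: state passes through
    y = [0.0] * len(x)
    # Specialized first iteration: Z(x) comes from stored state.
    y[0] = x[0] - x_prev
    # Generic body: Z(x) is plain array indexing.
    for n in range(1, len(x)):
        y[n] = x[n] - x[n - 1]
    # The last input is saved as state for the next block.
    return y, x[-1]


signal = [1.0, 3.0, 6.0, 10.0]
whole, _ = diff_block(signal, 0.0)
head, state = diff_block(signal[:2], 0.0)
tail, _ = diff_block(signal[2:], state)
print(whole)                   # [1.0, 2.0, 3.0, 4.0]
print(head + tail == whole)    # True
```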
EDIT: Implementing 'Z' is going to be the main problem: it is the
only structure that admits alternative implementations with
significantly different memory access patterns in the resulting code.
The goal should probably be to find a representation of the degrees
of freedom in the implementation, and to construct a cost function
based on memory access patterns.
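To make one such degree of freedom concrete, a sketch of two
implementations of a length-d delay line Z^d (assuming d >= 1) that
compute the same output but touch memory very differently: a shift
register, which moves every element on each sample, and a circular
buffer, which does one read and one write per sample. The AccessCount
class is a hypothetical stand-in for a cost function; it counts
element reads and writes only and ignores locality and index
arithmetic:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AccessCount:
    """Hypothetical cost model: counts element reads and writes."""
    reads: int = 0
    writes: int = 0


def shift_delay(x: List[float], d: int, cost: AccessCount) -> List[float]:
    """Z^d as a shift register: every sample moves the whole line."""
    line = [0.0] * d
    y = []
    for v in x:
        y.append(line[-1])          # oldest value leaves the line
        cost.reads += 1
        for i in range(d - 1, 0, -1):
            line[i] = line[i - 1]   # shift each element one slot
            cost.reads += 1
            cost.writes += 1
        line[0] = v                 # newest value enters the line
        cost.writes += 1
    return y


def ring_delay(x: List[float], d: int, cost: AccessCount) -> List[float]:
    """Z^d as a circular buffer: one read and one write per sample."""
    buf = [0.0] * d
    pos = 0
    y = []
    for v in x:
        y.append(buf[pos])          # value written d samples ago
        cost.reads += 1
        buf[pos] = v                # overwrite it with the new input
        cost.writes += 1
        pos = (pos + 1) % d
    return y


signal = [float(i) for i in range(10)]
shift_cost, ring_cost = AccessCount(), AccessCount()
assert shift_delay(signal, 4, shift_cost) == ring_delay(signal, 4, ring_cost)
print(shift_cost)  # AccessCount(reads=40, writes=40)
print(ring_cost)   # AccessCount(reads=10, writes=10)
```

For d = 4 over 10 samples, the shift register does 40 reads and 40
writes against 10 and 10 for the circular buffer; the circular buffer
pays instead with modulo index arithmetic, which a more realistic cost
function would also need to weigh.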