Sat May 4 08:09:59 EDT 2019
Two-pass full array + triangle feedback
So the idea is to embrace the array nature and allow for "full
history" in the loops, then optimize:
- eliminate intermediate arrays that are only used element-wise
- translate triangles to accumulators
Do this in two passes: one that generates only the normal form, and
one that can perform the optimizations.
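A toy sketch of what the two passes could produce; this is not the actual compiler, just an illustration of the "full history" normal form next to the optimized result:

```python
def normal_form(xs):
    # Pass 1: normal form, keeping "full history" arrays.
    n = len(xs)
    sq = [0.0] * n    # intermediate array, only ever read at index i
    acc = [0.0] * n   # output read back at index i-1 (a triangle ref)
    for i in range(n):
        sq[i] = xs[i] * xs[i]
        acc[i] = (acc[i - 1] if i > 0 else 0.0) + sq[i]
    return acc

def optimized(xs):
    # Pass 2: sq collapses to a scalar (element-wise use only),
    # acc collapses to an accumulator (back-reference of depth 1).
    out = []
    a = 0.0
    for x in xs:
        s = x * x
        a = a + s
        out.append(a)
    return out
```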
This means the zipfold form has to change, as back-references of
outputs are allowed, and full references of inputs are allowed as
well.
So the language I'm writing is an array language with "triangle
feedback" of the outputs.
This way there needs to be no distinction between accumulators and
outputs, as each output can be used as an accumulator, as long as the
index referencing stays within the triangle.
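For example, with triangle feedback an output can read back more than one step, which a single fixed accumulator cannot express (toy sketch, not actual syntax):

```python
def triangle_loop(n):
    # out[i] may read any out[j] with j < i (the "triangle"), so no
    # separate accumulator concept is needed.  Here out[i] reads two
    # steps back, which would need two accumulators in the old view.
    out = [0] * n
    for i in range(n):
        out[i] = 1 if i < 2 else out[i - 1] + out[i - 2]
    return out
```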
It seems that representation of this at Term level would be
straightforward. It needs only arrays and loops that range over an
index.
The output of a single iteration is a collection of scalars.
The input is a collection of arrays.
- Some of these are original inputs, and are full scale.
- Some are outputs computed in the previous iteration.
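The iteration shape described above can be sketched as a driver that hands the body the full input arrays plus the triangle of outputs computed so far (names are made up for illustration):

```python
def run_loop(body, inputs, n, n_out):
    # Each iteration computes one scalar per output.  The body sees
    # the full-scale input arrays and the outputs computed in
    # previous iterations (the triangle).
    outs = [[None] * n for _ in range(n_out)]
    for i in range(n):
        vals = body(i, inputs, outs)   # returns n_out scalars
        for k, v in enumerate(vals):
            outs[k][i] = v
    return outs
```

e.g. a cumulative sum body only looks at `outs[0][i-1]`, staying inside the triangle:
`body = lambda i, ins, outs: ((outs[0][i-1] if i else 0) + ins[0][i],)`.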
So this needs the representation of an array, probably as a function.
- one loop incrementally computes an array
- another loop can be run to perform a computation on that whole array
So it can't be just "global" variables. Scope really needs to be
local. Storage can be reused though, as long as outputs are no
longer referenced.
This isn't a simple problem.
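A minimal illustration of the two-loop case: the first loop's output is consumed whole by the second loop, so it must persist as an array for exactly that span, after which its storage can be reused (toy sketch):

```python
def two_stage(xs):
    # Loop 1 incrementally computes a full array.  Its storage must
    # stay live until loop 2 finishes, but not beyond: local scope.
    tmp = [0.0] * len(xs)
    for i, x in enumerate(xs):
        tmp[i] = x * x
    # Loop 2 ranges over the whole of tmp, so tmp cannot be
    # collapsed to a scalar accumulator.
    total = 0.0
    for t in tmp:
        total += t
    return total
```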
But good to realize that a more general view (triangles instead of
accumulators) is better.
- at any "step", you can compute one scalar value from a number of
other scalar values (n-op)
- the place where this value is allocated, i.e. the hole, is what we
need to keep track of
- at each "step", we also know exactly the valid range of the input
arrays
- based on context, holes can be re-used in subsequent iterations, but
this is an optimization. in a first iteration, allocate everything
for single-assignment. then later on, "project" the variables,
where each projection eliminates a dimension.
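One way to picture such a projection: when all back-references of an output stay within a fixed depth, the single-assignment array of n cells projects down to a circular window of depth+1 cells (hypothetical sketch):

```python
def project(n, depth, step):
    # Single-assignment form would allocate n cells; after projecting
    # the loop dimension, depth+1 cells suffice because every
    # back-reference is at most `depth` iterations old.
    buf = [0] * (depth + 1)
    last = 0
    for i in range(n):
        # step sees the previous `depth` values via the window
        prev = [buf[(i - 1 - k) % (depth + 1)] for k in range(depth)]
        last = step(i, prev)
        buf[i % (depth + 1)] = last
    return last
```

e.g. a depth-2 recurrence like Fibonacci runs in 3 cells instead of n.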
- start with a setup where all loops have a fixed size, then
generalize to computed fixed sizes (e.g. allocate intermediates on
stack), then generalize to data-based iteration sizes with just an
upper bound on the allocation.
The idea is still the same as RAI, only RAI was too eager with
optimizing out intermediates.
Representing this is going to be a challenge. Term can probably be
used because it has parameterized nodes. What we add is:
- loop variable contexts
- context dependent nodes for value binding
- primitives for dereferencing
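A hypothetical mini-IR, just to make those three additions concrete: a loop variable context, context-dependent value bindings, and a dereferencing primitive. None of these names come from the actual Term implementation.

```python
from dataclasses import dataclass

@dataclass
class Ref:
    array: str   # an input array or an earlier-bound output
    offset: int  # index relative to the loop variable; offsets < 0
                 # keep output references inside the triangle

@dataclass
class Op:
    name: str    # 'add' or 'mul' in this sketch
    args: tuple  # Refs, nested Ops, or constants

@dataclass
class Loop:
    var: str
    size: int
    binds: list  # [(output_name, expr)], evaluated once per iteration

def eval_expr(e, env, i):
    if isinstance(e, Ref):
        j = i + e.offset
        arr = env[e.array]
        return arr[j] if 0 <= j < len(arr) else 0  # out of range -> 0
    if isinstance(e, Op):
        vals = [eval_expr(a, env, i) for a in e.args]
        return sum(vals) if e.name == 'add' else vals[0] * vals[1]
    return e  # constant

def run(loop, inputs):
    env = dict(inputs)
    for name, _ in loop.binds:
        env[name] = [0] * loop.size   # single-assignment output arrays
    for i in range(loop.size):
        for name, expr in loop.binds:
            env[name][i] = eval_expr(expr, env, i)
    return {name: env[name] for name, _ in loop.binds}
```

e.g. a cumulative sum is `Loop('i', 4, [('acc', Op('add', (Ref('x', 0), Ref('acc', -1))))])` applied to `{'x': [1, 2, 3, 4]}`.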
So, where to start?
Structurally, the first thing that needs to happen is to introduce
context nesting. Once that is possible, the rest will become clear.