Sat May 4 08:09:59 EDT 2019

Two-pass full array + triangle feedback

So the idea is to embrace the array nature and allow for "full
history" in the loops, then optimize out:
- intermediate arrays that are only used element-wise
- translate triangles to accumulators

Do this in two passes: one that generates only the normal form, and
one that can perform the optimizations.

This means the zipfold form has to change, as back-references of
outputs are allowed, and full references of inputs are allowed as

So the language I'm writing is an array language with "triangle
feedback" of the outputs.

This way there needs to be no distinction between accumulators and
outputs, as each output can be used as an accumulator, as long as the
index referencing stays within the triangle.

It seems that representation of this at Term level would be
straightforward.  It needs only arrays and loops that range over an

The output of a single iteration is a collection of scalars.
The input is a collection of arrays.
- Some of these are original inputs, and are full scale.
- Some are outputs computed in the previous iteration.

So this needs the representation of an array, probably as a function.

Some cases:
- one loop incrementally computes an array
- an other loop can be run to perform a computation on that whole array

So it can't be just "global" variables.  Scope really needs to be
local.  Storage can be reused though as long as outputs are not

This isn't a simple problem.

But good to realize that a more general view (triangles instead of
accumulators) is better.

So again:

- at any "step", you can compute one scalar value from a number of
  other scalar values (n-op)

- the place where this value is allocated, i.e. the hole, is what we
  are managing.

- at each "step", we also exactly know the valid range of the input

- based on context, holes can be re-used in subsequent iterations, but
  this is an optimization.  in first iteration, allocate everything
  for single-assignment.  then later on, "project" the variables,
  where each projection eliminates a dimension.

- start with a setup where all loops have a fixed size, then
  generalize to computed fixed sizes (e.g. allocate intermediates on
  stack), then generalize to data-based iteration sizes with just an
  upper bound on the allocation.

The idea is still the same as RAI, only RAI was too eager with
optimizing out intermediates.

Representing this is going to be a challenge.  Term can probably be
used because it has parameterized nodes.  What we add is:

- loop variable contexts

- context dependent nodes for value binding

- primitives for dereferencing

So, where to start?

Structurally, the first thing that needs to happen is to introduce
context nesting.  Once that is possible, the rest will becomes clear.