Mon Feb 18 17:04:30 CET 2013

Distributing indices

It might be best to start working with the dim and stride operations
before making this work.

Currently, a scalar node is cast to a vector node, and this is
compensated for in the "stride map".  But that doesn't seem to be

Currently we always bind to the innermost loop indices, but that
might not be correct.  Actually, that is probably what the bug is
about.

Basically, const unifies a node with (? x) where ? is the
"distribution dimension".

Then both dim and stride need to handle this correctly.

The principle is that all variable occurrences inside a loop have the
same rank, but that some dimensions might be "skipped".

I.e.  (4 ? 5) corresponds to a rank 2 grid used in a rank 3 (3-level)
loop, where the second level is ignored in the indexing of the rank 2
So it's clear that:
- const introduces a special rank wrapper '?'
- stride needs to insert a 0
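
A minimal sketch of the "insert a 0" rule, in Python rather than in the
generator's own language.  Assumptions not taken from the notes above:
row-major layout, the string "?" as the skip marker, and the helper
names strides and offset, which are hypothetical.

```python
SKIP = "?"  # assumed marker for the distribution (skipped) dimension

def strides(dims):
    """Row-major strides for dims; a skipped dimension gets stride 0."""
    result = []
    step = 1
    for d in reversed(dims):
        if d == SKIP:
            # The loop index at this level must not move the offset.
            result.append(0)
        else:
            result.append(step)
            step *= d
    result.reverse()
    return result

def offset(index, dims):
    """Flat offset of a full-rank loop index into the lower-rank grid."""
    return sum(i * s for i, s in zip(index, strides(dims)))

# (4 ? 5): a rank 2 grid addressed from a 3-level loop.
example_strides = strides([4, SKIP, 5])
# The middle loop index (7 here) has no effect on the offset.
example_offset = offset((2, 7, 3), [4, SKIP, 5])
```

With this layout the second loop level can take any value without
changing which grid element is addressed.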

What about dim?
Where is it used?

Only in the generation of stride lists.

Instead of modifying the code, it seems simpler to do this in a
pre/post approach.

I.e. define an operation that takes a dims list and returns a list
with infinities removed, and a function that will take a list and
re-insert zeros in the correct places.
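
The pre/post idea could look roughly like this.  It is only a sketch:
dense_strides stands in for the existing, unmodified stride code, "?"
is assumed to be the marker for the entries called infinities above,
and all function names are hypothetical.

```python
SKIP = "?"  # assumed marker for a skipped ("infinite") dimension

def strip_skips(dims):
    """Pre step: drop skipped dims, remember where they were."""
    dense = [d for d in dims if d != SKIP]
    mask = [d == SKIP for d in dims]
    return dense, mask

def dense_strides(dims):
    """Stand-in for the existing stride code (row-major assumed)."""
    out, step = [], 1
    for d in reversed(dims):
        out.append(step)
        step *= d
    out.reverse()
    return out

def reinsert_zeros(values, mask):
    """Post step: put a 0 back at every position that was skipped."""
    it = iter(values)
    return [0 if skipped else next(it) for skipped in mask]

def strides_with_skips(dims):
    dense, mask = strip_skips(dims)
    return reinsert_zeros(dense_strides(dense), mask)

wrapped = strides_with_skips([4, SKIP, 5])
```

The point of the split is that dense_strides never sees a "?", so the
existing code would not need to change.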

Maybe, before doing this, it might be best to simplify the type
representation.  The nesting (A (B t)) is unnecessary and can be
flattened to (A B t) or even (A B) since the last element is not
necessary for the grid typing.

This way, dims are the same as the type reps themselves.
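
To illustrate the flattening, a small sketch that models the nested
type (A (B t)) as nested Python tuples (A, (B, t)).  The tuple encoding
and the name flatten_type are assumptions made for this example only.

```python
def flatten_type(t):
    """Split a nested grid type (A, (B, t)) into dims [A, B] and base t."""
    dims = []
    while isinstance(t, tuple):
        size, t = t        # peel off one grid level
        dims.append(size)
    return dims, t

# (4 (5 float)) flattens to (4 5 float); the dims list is just (4 5).
dims, base = flatten_type((4, (5, "float")))
```

Dropping the base type leaves exactly the dims list, which is why the
flattened type representation and the dims can coincide.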

What about this: replace the type map with a dims map as early as
possible, then only use dims.

However, this makes it impossible to use the "skip dimension" trick..

This all needs to be made a lot simpler.  It's probably also better to
use offset variables instead of relying on common subexpression
elimination.
I'm starting to think that it might actually be easier to perform
explicit indexing, i.e. use plain C arrays.  This will make the
generated code easier to read.  The whole thing can be avoided by
making the state/input vector into a proper struct.