Sun Dec 4 09:02:30 EST 2011

Arrays or ports?

So I wonder still, arrays or some kind of dataflow binding?  Probably
arrays are the better interface.  However, I have to abstract
allocation, so I can't just use MArray[1] as is.

Maybe it is better to first abstract some kind of "store once" binding
mechanism, or an output port.

So I wonder, is there a way to model communicating processes in
Haskell?  I.e. sequential programs with 'read()' and 'write()'?  There
is CHP[2] but this might be overkill.

What I am looking for really is "read process write, and do that
multiple times", which fits nice in a "left-right fold" which I don't
know how to name properly:

       [i] -> s -> (i -> s -> (o, s)) -> [o]

So the point is not really this read/write thing, but to actually
represent the implementation of array access in my target language.

Let's say that again: I'm not looking for some fancy middle
abstraction between functions and real arrays, I'm looking for a way
to represent array read/write and index computation in the EDSL such
that I can both compile to target language, and build a Haskell
function maybe over a mutable array.

The point is to not have to resort to tricks in the compiler that need
to be duplicated: this functionality should be part of the
*interface*, i.e. the syntax of the EDSL, or at least an extension of
it to be able to distinguish pure code from code with array

After writing this down, it becomes very clear that the basic
difficulty is the representation of the array's metadata.  Since we
can't do allocation in the EDSL, the bounds need to somehow be
specified.  I see 2 ways of doing this:

  - all part of declaration: an array declaration binds a tuple (a,f,l,s)

    _array ::  (r (a,f,l,s) -> m (r t)) -> m (r t)

     a : array reference (i.e. pointer)
     f : index of first element
     l : index of last element
     s : array stride  (important to include in low-level rep!)

  - separate array declaration, binding a, and a _bounds operation
    that returns (f,l,s) as values:

    _array  :: (r a -> m (r t)) -> m (r t)
    _bounds :: r a -> m (f, l, s)

The first one seems to be more suited to dumb pointers where all have
the same type: it concentrates bookkeeping in a single place.  The
second one is more modular but requires some machinery to represent
array metadata in the target.

I don't have much to go with, but the more explicit first option seems
to be suited to what I'm currently thinking of (Pd and fairly
non-parameterized code to allow for aggressive constant propagation
and unrolling).  Defining bounds early and exposed as constants seems
to make this easier.

Hmm... toying with this a bit it seems that something like this really
is better, since the array reference itself needs to be passed in from
outside, so I guess this is really the 2nd bullet in a different form..

  _bounds :: (TMLprim v, TMLprim t) => 
            (r (a v)) ->
            (r f,
             r l,
             r s) -> m (r t)
            -> m (r t)

So what about when I have a collection of pointers and they all have
the same array size?  This his can be represented as one array of
records (i1, ..., in, o1,... om).  I.e. it's better to make asserts
implicit (one array with a given length is better than 2 arrays with
the same length.

Ok, let's take this approach.  This requires pointer chasing so the
return type of _get should not be a primitive.

Hmm.. it doesn't look like "begin" is a good idea.  Another problem,
it looks like '_let' really needs a monadic type in its first
argument, i.e. this istead of (r a) as first argument:

  _let :: (TMLprim a, TMLprim t)
          => m (r a)
          -> (r a -> m (r t))
          -> m (r t)

The (r a) version works for things like literals, but not really for
subexpressions.  Otoh do we want subexpressions?

  t1 = _let (lit 123) $ \a -> _ret a

I'm getting confused.  This probably needs to be fixed first.

[1] http://www.haskell.org/ghc/docs/4.08/set/sec-marray.html
[2] http://www.cs.kent.ac.uk/projects/ofa/chp/