Wed Jun 18 15:14:52 CEST 2008

eliminating intermediates in a concatenative language

Suppose [ f map ] applies an operation to a number of data structures
and produces a number of data structures, according to the arity [ f

The transformation that eliminates intermediate storage is:

   [ f map g map ] -> [ [ f g ] map ]

In opti talk this is called loop fusion. For simple video processing,
this is about the single most important optimization: it eliminates
storage of intermediate frames, which take up a large part of cache
memory. This optimization is in practice bounded by:

   - dependency depth for deep pipelines
   - instruction cache size

Naively, to take care of those issues it can be beneficial to limit
loop fusion and allow for a limited sized intermediate
buffer. These 2nd order problems can be ignored for now. The most
important life saver is loop fusion, which if not for speed, can save
a lot of memory.

Translating this optimization to current concatenative macro
architecture, it requires access to all functions in macro form. Final
instantiation of 'map' can be the generation of a for loop and buffer
allocation, but the important step is the fusion. How to use current
code substitution rules to implement this?

For one, it requires a 2-pass algorithm in the current form. The map
macro cannot be instantiated until all fusion has happened. Postponed
partial evaluation is solved in the pattern language using pseudo
assembler instructions.

apparently, in haskell this kind of optimization is pluggable

There are a couple of interesting links in that article.

It's all starting to become a bit more clear: concentrate on
properties of higher order combinators. One of the links seems to be
about automatically deriving these kinds of rules (Wadler's Theorems
For Free).