Sat Jul 3 11:52:25 CEST 2010

The DSP problem

Simplified, implementing DSP algorithms is about optimizing usage
patterns of memory and CPU:

- Find locality of reference (memory hierarchy)

Try to consume non-final intermediate data as soon as possible after
they are generated, keeping them in fast memory.

- Find parallellism (use all functional units)

In current computer architectures, there seem to be 3 levels at which
this can be done:

  * Instruction level (VLIW : usage of functional units on 1 CPU).
  * Shared memory thread level : multiple CPU, 1 memory pool.
  * Network : multiple CPU, only data comm.

The problem structure that sets the stage are the data dependencies
which are algorithm-specific and can be considered fixed.  Only very
high level mathematical understanding can perform algorithmic
transformations that eliminate dependencies.  The irony is that often
the introduction of dependecies can make an algorithm faster (linear