Sun Sep 20 12:08:39 CEST 2009

Software pipelining: An effective scheduling technique for VLIW machines

From the abstract-interpretation based ``flattening'' of matrix
operations (Gauss-Jordan elimination) to data flow graphs it dawned on
me that ``sorting'' this network might be an interesting optimization
for VLIW processors (such as the TI DaVinci / C64x DSP, or NXP
TriMedia) Of course, it turns out I have it all backwards: the VLIW
architecture was introduced to take advantage of this pattern.

From [1] (which I found in a collection[6] of must-read CS papers):
``The key to generating efficient code for the VLIW machine is global
code compaction.  In fact, the VLIW architecture is developped from
the study of the global code compaction technique, trace scheduling''.

On this subject Ellis' PhD[4] seems to be the definitive referece,
however I can't find an electronic copy.  Looks like this book[5] by
Fisher et al. might have some answers too.  This one[2] is on
Trace-Scheduling-2, a non-linear extension.  The basic idea in trace
scheduling is to optimize the most frequently executed traces by
turning them into straight-line code.  This in turn allows for plenty
of opportunities to find parallelism.  So, trace scheduling attempts
to _create_ the conditions I mention in the first paragraph, by
picking traces that are highly probable.

However, it has its problems.  The most important one is exponential
code explosion.  So, [1] talks about software pipelining, which is an
alternative to trace scheduling.  It uses hierarchical reduction to
straighten branches: a conditional branch is split into streams of
conditional code, where the shorter branch is padded to fit the size
of the larger one.  The rationale is that because jumps are
particularly expensive in VLIW (i.e. #FU x pipeline-depth), this
approach (disabling units from the unused side of the branch) is often
better than performing a jump.

[1] http://reference.kfupm.edu.sa/content/s/o/software_pipelining__an_effective_schedu_8310.pdf
[2] http://www.hpl.hp.com/techreports/93/HPL-93-43.pdf
[3] http://courses.ece.illinois.edu/ece512/Papers/trace.pdf
[4] http://mitpress.mit.edu/catalog/item/default.asp?tid=5098&ttype=2
[5] isbn://1558607668
[6] http://www.cs.utexas.edu/users/mckinley/20-years.html