Sun Sep 20 12:08:39 CEST 2009
Software pipelining: An effective scheduling technique for VLIW machines
From the abstract-interpretation based ``flattening'' of matrix
operations (Gauss-Jordan elimination) to data flow graphs it dawned on
me that ``sorting'' this network might be an interesting optimization
for VLIW processors (such as the TI DaVinci / C64x DSP, or NXP
TriMedia) Of course, it turns out I have it all backwards: the VLIW
architecture was introduced to take advantage of this pattern.
From  (which I found in a collection of must-read CS papers):
``The key to generating efficient code for the VLIW machine is global
code compaction. In fact, the VLIW architecture is developped from
the study of the global code compaction technique, trace scheduling''.
On this subject Ellis' PhD seems to be the definitive referece,
however I can't find an electronic copy. Looks like this book by
Fisher et al. might have some answers too. This one is on
Trace-Scheduling-2, a non-linear extension. The basic idea in trace
scheduling is to optimize the most frequently executed traces by
turning them into straight-line code. This in turn allows for plenty
of opportunities to find parallelism. So, trace scheduling attempts
to _create_ the conditions I mention in the first paragraph, by
picking traces that are highly probable.
However, it has its problems. The most important one is exponential
code explosion. So,  talks about software pipelining, which is an
alternative to trace scheduling. It uses hierarchical reduction to
straighten branches: a conditional branch is split into streams of
conditional code, where the shorter branch is padded to fit the size
of the larger one. The rationale is that because jumps are
particularly expensive in VLIW (i.e. #FU x pipeline-depth), this
approach (disabling units from the unused side of the branch) is often
better than performing a jump.