Sun Sep 20 13:17:21 CEST 2009

Selling Lisp in a Bag

From most of my communications lately it looks like selling Lisp[1] is
not going to work.  The way to get anywhere with testing these ideas
against some headwind is to use them where they work best: backstage,
without any need for explanations.

I'd like to focus on two architectures: the TI fixed point C64x as
found in the DaVinci[4] and OMAP SOCs (e.g. the OMAP3530[2] in the
BeagleBoard[3]), and its floating point cousins[5], e.g. the C6701[6].
I have the DM6446[8] (OSD2) which has a C64x[7] DSP.

The C67x[6] is quite similar to the C64x[7] in that it has 8
functional units (FUs).  In floating point mode: 4/2 (add/mul).  In
fixed point mode: 6/2.  The FUs are partitioned into two paths of 4.
In addition, the C64x fixed point core supports vector instructions
(each FU can execute a vector of add/mul + complex mul + Galois field
mul).  A presentation about the architecture can be found here[9].

When settling on an architecture like that, there are 2 overall
strategies that seem obvious:

   1. propagate CPU features upwards into abstractions

   2. compile high-level description to the highest possible
      abstraction level provided by the vendor tools[10] (TI C
      compiler and assembly optimizer)

Getting that right will probably get you more than half way there.
The rest seems to be data memory management.
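
To make strategy 2 a bit more concrete, here is a minimal sketch of a
code transformer that turns a tiny expression DSL into C source, which
would then be handed to the vendor compiler.  The tuple-based DSL, the
function names and the choice of `short` with `restrict` are all
assumptions made for illustration; none of this comes from the TI
tools.

```python
# Hypothetical mini-DSL (an assumption for this sketch), as nested tuples:
#   ("var", name)   reads name[i] from an input array
#   ("const", k)    is an integer literal
#   ("add", a, b) / ("mul", a, b) combine two sub-expressions

def emit_expr(e):
    """Translate one DSL expression into a C expression string."""
    tag = e[0]
    if tag == "var":
        return f"{e[1]}[i]"
    if tag == "const":
        return str(e[1])
    if tag in ("add", "mul"):
        op = "+" if tag == "add" else "*"
        return f"({emit_expr(e[1])} {op} {emit_expr(e[2])})"
    raise ValueError(f"unknown node: {tag}")

def inputs(e):
    """Collect input array names in order of first use."""
    tag = e[0]
    if tag == "var":
        return [e[1]]
    if tag == "const":
        return []
    if tag in ("add", "mul"):
        names = []
        for sub in e[1:]:
            for n in inputs(sub):
                if n not in names:
                    names.append(n)
        return names
    raise ValueError(f"unknown node: {tag}")

def emit_kernel(name, expr):
    """Emit a C function applying expr element-wise over n samples."""
    args = "".join(f"const short *restrict {n}, " for n in inputs(expr))
    return (
        f"void {name}({args}short *restrict out, int n)\n"
        "{\n"
        "    int i;\n"
        "    for (i = 0; i < n; i++)\n"
        f"        out[i] = (short){emit_expr(expr)};\n"
        "}\n"
    )

# Example description: out[i] = a[i] * 3 + b[i]
scale_mix = ("add", ("mul", ("var", "a"), ("const", 3)), ("var", "b"))
c_source = emit_kernel("scale_mix", scale_mix)
print(c_source)
```

The generator only produces a plain counted loop; scheduling and
register allocation are left entirely to the downstream compiler,
which is the point of targeting the highest level the tools accept.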

The tool architecture then looks something like:

META  = special purpose code transformers for static DSL
TOOLS = vendor toolchain
LINK  = binary linking step (no inter-op optimization)

System software then looks like:

* components (final + performance critical):

                  META                      TOOLS
  [ static DSL ] ----->  [ ASM / C / C++ ] -------> [ BIN ]

* toplevel system (debug + proto + not performance critical):
   [ dyn DSL ] ----->  [ BIN ]

One possible design flow is to move components from dynamic
composition to static composition after the exploratory phase.  The
dyn DSL could _interpret_ the static DSL (to provide aspect join
points).
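
A rough sketch of what interpreting the static DSL with join points
could look like: the same kind of description that would otherwise be
compiled is walked directly, and an optional callback is invoked at
every node.  The tuple encoding and the names `interpret`, `advice`
and `log_mul` are hypothetical, chosen only for this example.

```python
# Hypothetical static DSL (an assumption for this sketch), as nested tuples:
#   ("var", name) | ("const", k) | ("add", a, b) | ("mul", a, b)

def interpret(expr, env, advice=None):
    """Evaluate expr directly; call advice(node, value) after each node."""
    tag = expr[0]
    if tag == "var":
        value = env[expr[1]]
    elif tag == "const":
        value = expr[1]
    elif tag == "add":
        value = interpret(expr[1], env, advice) + interpret(expr[2], env, advice)
    elif tag == "mul":
        value = interpret(expr[1], env, advice) * interpret(expr[2], env, advice)
    else:
        raise ValueError(f"unknown node: {tag}")
    if advice is not None:
        advice(expr, value)   # the join point: every node is observable
    return value

# Example advice: record the result of every multiplication.
trace = []

def log_mul(node, value):
    if node[0] == "mul":
        trace.append(value)

expr = ("add", ("mul", ("var", "a"), ("const", 3)), ("var", "b"))
result = interpret(expr, {"a": 5, "b": 2}, advice=log_mul)
print(result, trace)
```

In the statically compiled version such hooks would simply not exist,
so debugging and tracing stay on the dynamic side.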

An important property of the C6000 architecture is loop buffer support
for software pipelining.  
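
As a reminder of what software pipelining does, the sketch below
models a three-stage loop body (load, multiply, store) in which each
"cycle" runs a different stage for three successive iterations.  It
only illustrates the schedule; it makes no claim about how the C6000
loop buffer implements it, and the function names are made up.

```python
def plain(xs, ys):
    """Reference loop: the three stages run back to back per iteration."""
    out = []
    for i in range(len(xs)):
        a, b = xs[i], ys[i]   # stage 1: load
        p = a * b             # stage 2: multiply
        out.append(p)         # stage 3: store
    return out

def pipelined(xs, ys):
    """Same computation with the stages of successive iterations overlapped."""
    n = len(xs)
    out = []
    loaded = None    # value passed from stage 1 to stage 2
    product = None   # value passed from stage 2 to stage 3
    for t in range(n + 2):        # 2 extra cycles drain the pipeline
        if 0 <= t - 2 < n:        # stage 3 handles iteration t-2
            out.append(product)
        if 0 <= t - 1 < n:        # stage 2 handles iteration t-1
            product = loaded[0] * loaded[1]
        if t < n:                 # stage 1 handles iteration t
            loaded = (xs[t], ys[t])
    return out

xs, ys = [1, 2, 3], [4, 5, 6]
print(plain(xs, ys), pipelined(xs, ys))
```

The cycles in which not all three guards are true correspond to the
prologue and epilogue; the cycles in between form the kernel.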

Another interesting point is that the C6000 tools provide ``linear
assembly'', a language level in between C and fully scheduled
assembly code.  The assembly optimizer can perform register
allocation, partitioning and scheduling.  It can perform software
pipelining.  It looks like this is a very welcome target for
compilation: full access to all functionality without having to deal
with the intricacies of low-level resource allocation.
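
A sketch of what targeting linear assembly from a generator might look
like: the loop body is given with symbolic operand names only, and no
registers, functional units or cycles are chosen.  The directive and
mnemonic spellings (`.cproc`, `.reg`, `.trip`, `.return`, `.endproc`,
`LDH`, `MPY`) follow my recollection of the dot product example in the
TI documentation[10] and should be checked against it; the Python
function and its parameters are hypothetical.

```python
def emit_linear_asm(name, args, body, result, trip):
    """Emit a counted loop in (approximate) TI linear assembly.

    body is a list of (mnemonic, operand, ...) tuples using symbolic
    names.  Every name that is not an argument or the result is declared
    as a symbolic register; immediate operands are not handled here.
    """
    regs = []
    for ins in body:
        for operand in ins[1:]:
            sym = operand.strip("*+")   # drop pointer/post-increment marks
            if sym not in args and sym not in regs and sym != result:
                regs.append(sym)
    lines = [
        f"{name}: .cproc {', '.join(args)}",
        f"        .reg {', '.join([result, 'cnt'] + regs)}",
        f"        MVK {trip}, cnt",
        f"        ZERO {result}",
        f"loop:   .trip {trip}",
    ]
    lines += [f"        {ins[0]} {', '.join(ins[1:])}" for ins in body]
    lines += [
        "  [cnt] ADD -1, cnt, cnt",     # predicated loop counter update
        "  [cnt] B loop",
        f"        .return {result}",
        "        .endproc",
    ]
    return "\n".join(lines)

dotp = emit_linear_asm(
    "dotp", ["a", "b"],
    [("LDH", "*a++", "x"), ("LDH", "*b++", "y"),
     ("MPY", "x", "y", "prod"), ("ADD", "prod", "sum", "sum")],
    result="sum", trip=100)
print(dotp)
```

Register allocation, partitioning over the two paths and the actual
schedule would all be left to the assembly optimizer.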

[1] entry://20090823-133201
[2] http://focus.ti.com/general/docs/gencontent.tsp?contentId=36915
[3] http://beagleboard.org/
[4] entry://../davinci
[5] http://focus.ti.com/paramsearch/docs/parametricsearch.tsp?family=dsp&sectionId=2&tabId=1948&familyId=1401
[6] http://focus.ti.com/general/docs/lit/getliterature.tsp?genericPartNumber=sm320c6701&fileType=pdf
[7] http://www.ti.com/lit/gpn/tms320dm6446
[8] http://focus.ti.com/docs/prod/folders/print/tms320dm6446.html
[9] http://www.imec.be/sips/pdf/inv_sp_noraz.pdf
[10] http://www-s.ti.com/sc/techlit/SPRU187