Wed May 7 17:52:08 CEST 2008

machine model / partial evaluation and state management

the idea is to be able to evaluate (simulate) code off-target, as long
as it only depends on MACHINE state.

[movlw 123] can be translated into: read, modify, write with the update
happening off target. [movwf LATA] is a border case: it can be split
into read, modify, write but it also affects external physical state.
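
a minimal sketch of the read, modify, write split, in python for
brevity. everything named here (FakeTarget, run_off_target, the
dict-of-registers state) is made up for illustration and is not part
of brood; the only thing taken from the PIC18 is that [movlw k] loads
a literal into W and [movwf f] copies W into file register f.

```python
from typing import Callable, Dict

State = Dict[str, int]
Update = Callable[[State], State]


def movlw(k: int) -> Update:
    """[movlw k]: put the literal k in the working register W."""
    def update(state: State) -> State:
        new = dict(state)
        new["W"] = k & 0xFF  # W is 8 bits wide
        return new
    return update


def movwf(reg: str) -> Update:
    """[movwf reg]: copy W into the named file register."""
    def update(state: State) -> State:
        new = dict(state)
        new[reg] = state["W"]
        return new
    return update


class FakeTarget:
    """hypothetical stand-in for the tethered chip: just a register file."""

    def __init__(self) -> None:
        self.regs: State = {"W": 0, "LATA": 0}

    def read(self) -> State:
        return dict(self.regs)

    def write(self, state: State) -> None:
        self.regs = dict(state)


def run_off_target(target: FakeTarget, instruction: Update) -> None:
    state = target.read()       # read machine state from the target
    state = instruction(state)  # modify: the update happens on the host
    target.write(state)         # write the new state back


target = FakeTarget()
run_off_target(target, movlw(123))
run_off_target(target, movwf("LATA"))
print(target.regs)
```

note that this sketch treats LATA as plain memory. the pins are not
modeled at all, which is exactly the border case described above.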

what is required is a clear definition of what simulation means: is it
completely isolated from the 'real' world, or does it just simulate
the computation part of the target? does [movwf LATA] alter the output
of pins, or does it modify some internal model?

it would be best to make this behaviour pluggable: the amount of
'realness' should be configurable.

the modes are:
                  |  STATE     COMPUTATION
  (1) stand-alone |  real      real
  (2) tethered    |  real      emulated
  (3) simulator   |  emulated  emulated
  (4) test        |  emulated  real

and, really, you need only the first 3. does the 4th one make sense
during application development? actually not: the CPU is a functional
unit, and can be exactly emulated (in principle, might not always be
necessary: partial emulation can be good enough). this mode DOES make
sense during emulator testing though. (emulating STATE completely
might be impossible since it depends on the external world)
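
the mode table can be written down as data, which makes the "only the
first 3 are needed for development" remark checkable. a sketch only:
the names Kind, Mode, uses_target and for_development are assumptions
made here, not existing brood names.

```python
from enum import Enum


class Kind(Enum):
    REAL = "real"
    EMULATED = "emulated"


class Mode(Enum):
    # value = (STATE, COMPUTATION), same column order as the table
    STAND_ALONE = (Kind.REAL, Kind.REAL)
    TETHERED = (Kind.REAL, Kind.EMULATED)
    SIMULATOR = (Kind.EMULATED, Kind.EMULATED)
    TEST = (Kind.EMULATED, Kind.REAL)

    @property
    def state(self) -> Kind:
        return self.value[0]

    @property
    def computation(self) -> Kind:
        return self.value[1]

    def uses_target(self) -> bool:
        """true when anything at all has to happen on the real chip."""
        return Kind.REAL in self.value

    def for_development(self) -> bool:
        """mode (4) is only of interest when testing the emulator itself."""
        return self is not Mode.TEST


for mode in Mode:
    print(mode.name, mode.state.value, mode.computation.value,
          mode.uses_target(), mode.for_development())
```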

the place to introduce emulated state is in the partial evaluator of
machine code.

so.. what you want is to be able to modify meaning of code depending
on level of simulation. i.e. [movwf LATA] might mean:

   (1) execute the instruction on the target

   (2) simulate the instruction as passive (memory only) machine
       state update on the host + write the state to the target

   (3) simulate the state update as active machine state update, do
       not involve the target. (i.e. writing to the latch might set
       the state on input ports during next instructions.)

   (4) compare the state update simulated on host and executed on
       the target
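
the four meanings can be sketched as one function that dispatches on
the simulation level. this is an illustration under assumptions, not
how brood does it: FakeTarget, execute_movwf and movwf_lata are
hypothetical names, and the "active" model in level 3 (PORTA simply
reads back the latch) is the crudest possible guess at a world model.

```python
from typing import Dict, List, Tuple

State = Dict[str, int]


class FakeTarget:
    """hypothetical stand-in for the chip; records what it was asked to run."""

    def __init__(self) -> None:
        self.regs: State = {"W": 0, "LATA": 0}
        self.executed: List[Tuple[str, str]] = []

    def execute_movwf(self, reg: str) -> None:
        self.executed.append(("movwf", reg))
        self.regs[reg] = self.regs["W"]


def simulate_movwf(state: State, reg: str) -> State:
    """passive (memory only) state update, computed on the host."""
    new = dict(state)
    new[reg] = state["W"]
    return new


def movwf_lata(level: int, host: State, target: FakeTarget) -> State:
    if level == 1:
        # execute the instruction on the target
        target.execute_movwf("LATA")
        return host
    if level == 2:
        # simulate on the host, then write the resulting state to the target
        host = simulate_movwf(host, "LATA")
        target.regs["LATA"] = host["LATA"]
        return host
    if level == 3:
        # active update, target not involved. assumed model: pins follow latch
        host = simulate_movwf(host, "LATA")
        host["PORTA"] = host["LATA"]
        return host
    if level == 4:
        # run both and compare the results
        expected = simulate_movwf(host, "LATA")
        target.regs["W"] = host["W"]  # start both sides from the same W
        target.execute_movwf("LATA")
        if target.regs["LATA"] != expected["LATA"]:
            raise AssertionError("host simulation and target disagree")
        return expected
    raise ValueError("unknown simulation level: %r" % (level,))


target = FakeTarget()
host = movwf_lata(3, {"W": 0x55, "LATA": 0}, target)
print(host, target.executed)
```

at level 3 the target's `executed` list stays empty: nothing left the
host.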

probably i should generalize brood as a framework for pluggable
simulation. this is more general than the previous emphasis on
tethered development, and potentially a _LOT_ more powerful.

it's probably best to focus on memory mapped i/o and synchronous
execution: get it to work for the PIC18 first, then generalize the
architecture. each functional unit can be implemented as a thread.

what you want basically is fine grained control over what exactly is
executed on the target, and what is not. there is an order relation
hidden here: it's impossible to simulate state update when executing
code on the target. this means there's a directed graph of 'realness'
that can be used as a guide to building a code/data structure to
implement this.

given the program source, it can be compiled for:

     (1) running completely on the target

     (2) running partly on the host + target state update. the latter
         could be plain code execution.

     (3) complete simulation

some remarks here.

* time-critical software needs to run on-target, so it is important to
  design programs such that they can be tested by virtualizing the
  stimulus (slowing down time): make everything synchronous, that way
  time is an integer and can be abstracted. simulate non-synchronicity
  on top of this.

* the application domain is massively parallel, so the basic unit of
  simulation is a task. PLT scheme has all the necessary tools to
  build this kind of thing. it would be interesting to equip purrr18
  with some libraries to implement state machines and tasks in a way
  that works well with the simulator.

* program compilation = partial evaluation of simulators. i.e. [movwf
  LATA] can be compiled to machine code and executed on the machine
  only if LATA is real. an application will compile to 2 things:

     1. supporting machine code to run on target (i.e. the monitor)
     2. host side entry point, which might sequence simulation

* not so much related, but can 'incremental dev' be used here? only
  recompile the parts of target support code that need it? this is an
  optimization problem which only needs proper dependency
  management (memoization) and can probably be solved separately.
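
going back to the remark that program compilation = partial
evaluation of simulators: a very crude sketch of the "compiles to 2
things" idea. it only partitions instructions by whether their
destination register is real; a proper partial evaluator would also
have to get W onto the target before a target-side [movwf] runs.
split_program and the tuple instruction format are assumptions made
here, not brood's.

```python
from typing import List, Set, Tuple, Union

Instr = Tuple[str, Union[int, str]]
HostStep = Tuple[str, Union[int, Instr]]


def split_program(
    program: List[Instr], real_registers: Set[str]
) -> Tuple[List[Instr], List[HostStep]]:
    """split a program into target machine code and a host side entry point."""
    target_code: List[Instr] = []
    host_entry: List[HostStep] = []
    for instr in program:
        op, arg = instr
        if op == "movwf" and arg in real_registers:
            # destination is real: keep as machine code, host only sequences it
            host_entry.append(("run_on_target", len(target_code)))
            target_code.append(instr)
        else:
            # depends only on MACHINE state: evaluate on the host
            host_entry.append(("simulate", instr))
    return target_code, host_entry


program: List[Instr] = [("movlw", 123), ("movwf", "LATA"), ("movwf", "TMP")]
target_code, host_entry = split_program(program, {"LATA"})
print(target_code)
print(host_entry)
```

with an empty set of real registers the target code list comes out
empty, which corresponds to complete simulation.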