Staapl dev log. This is a day-to-day development log & soundboard.

You probably want:
  home         http://zwizwa.be/staapl
  edited blog  http://zwizwa.be/ramblings/staapl-blog

Related project logs:
  entry://../meta
  entry://../libprim

---------------------------------------------------------------------

TODO

Trivial to fix:
- list all commands
- snot forth console

PIC18 features:
- I2C bus monitor
- live access to multiple microcontrollers
- usb driver: basic abstractions

PIC18 apps:
- interrupt based CRT display controller
- serial keyboard interface for KBsheep
- PC reset logic connected to WRT54

Possible extensions / Non-trivial things to fix:
- build System09 / rekonstrukt using xilinx tools
- reflective forth bootstrapping on top of staapl (reusing primitives!)
- figure out why it compiles so slowly
- write more documentation
- assembler addressing modes
- dsPIC
- 6809
- assembler text output
- external emulator hooks

Entry: introduction
Date: Sun Jan 28 12:00:00 GMT 2007

[ this used to be the blog header ]

Staapl consists of:

* Scat: a set of macros for PLT Scheme implementing a family of
  dynamically typed concatenative languages usable inside scheme
  code.

* Coma: a COmpositional MAcro language: an extension of the Scat
  language with data types representing target code, and a
  specification syntax for defining target code pattern matching
  primitives.

* Control: an extension of the Coma language with Forth-style control
  primitives based on conditional and unconditional jumps, useful for
  low level programming.

* Comp: a compiler that instantiates Coma+Control macros to produce a
  code graph structure, and performs a-posteriori optimizations.

* Asm: a straightforward multipass relaxation assembler with
  arbitrary expression evaluation in terms of target addresses, and a
  high level opcode definition language.

* Forth: parsing extensions for representing classic Forth syntax,
  plus PLT Scheme language layers.

* Pic18: uses the Purrr template specialized to the Microchip PIC18
  architecture to implement a PLT Scheme #lang.

The idiomatic forth language and the ideas behind the peephole
optimizing compiler have remained fairly constant since the 1.x
version. The Forth dialect is implemented as a collection of macros:
functions which operate on a stack of assembly instructions and a
stack of nested control structures. What changed throughout the
versions is the use of better (higher) abstractions to implement the
basic ideas, and the factoring of the design into simpler components.

Note that this project is much more about functional programming and
PLT Scheme than about Forth. The Forth dialect is used mostly as a
practical macro assembler on top of the clean Coma architecture. The
end result is nice for playing with PIC18 chips though, and it is
used for real-world stuff.

Entry: monads
Date: Sun Jan 28 14:43:30 GMT 2007

EDIT: i clearly didn't get it here.. much of the monad stuff i talk
about is not what you find in Haskell. the thing is: a stack that's
threaded through a computation behaves a bit like a monad. it solves
some of the same practical issues, especially if you start to use
syntax transformations that can convert pure functional code to code
that passes a hidden top element. but it doesn't have the power of a
generic bind operation. i talk about this later though.

i think i finally start to get the whole monad thing. in layman's
terms: it is centered around splicing together (using 'bind')
functions that take a simple object to a container.
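[EDIT: a minimal sketch of the stack-threading analogy, in scheme.
'lift' and 'compose2' are my names here, nothing standard:

(define (lift fn)                ;; (a -> b) -> (stack -> stack)
  (lambda (stack)
    (cons (fn (car stack)) (cdr stack))))

(define (compose2 f g)           ;; compose two stack functions
  (lambda (stack) (g (f stack))))

;; ((compose2 (lift add1) (lift add1)) '(1 2)) => (3 2)

the plumbing (the rest of the stack) is threaded behind the scenes,
which is the practical issue bind also solves; but note there is no
generic bind operation here, just function composition.]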
in what i'm trying to accomplish, this is just compilation: take in
some code and output an updated state. maybe i should give up on the
whole CAT thing after all, and concentrate on using scheme and some
special structures to actually create a proper language and macro
language. i already have a way to write concatenative code in scheme
without too many problems (see macro.ss). the layer is probably just
not necessary.. scheme is more powerful, and everything i now do in
CAT i could try to move over to a virtual forth: write everything
from the perspective of the forth itself.. something like: 'spawn
this process on the host'. another thing that's wrong with CAT is the
lack of decent data structures. it's overall too simple for what i'm
trying to do: proper code generation on top of a simple basic
language. let's go back to my recently restated goals for BROOD:

* basis = forth VM on a microcontroller for simplicity
* write the cross compiler in a language similar to forth
* use a FP approach for the compiler language

the middle one can take on different forms. i still think it is very
beneficial to have an intermediate language to express forthisms. but
this language can just be embedded in scheme stuff. so let's just
start to build the thing, right?

Entry: CAT design
Date: Sun Jan 28 17:28:56 GMT 2007

[EDIT: Sun Oct 7 00:20:32 CEST 2007
- the pattern matching grew to a 'quasi algebraic types' construction
- from forth -> machine code there are now a lot more passes
- the shared forth elimination is made machine specific.]

the design is quite classic forth, but it might be simplified a
bit. CAT consists of the following pipeline:

            (1)                              (2)
  forth --> peephole optimized assembler --> absolute bound machine code

currently (1) is the compiler while (2) is the assembler. it might be
more interesting to actually split it up in two parts: introduce a
peephole optimizer that can separate out the forth compiler to a
higher level and the assembler to a lower, machine specific level,
making it a bit more like a frontend/backend multiplatform
compiler. also, several things could be made declarative. peephole
optimization is basically pattern matching. currently CAT implements
it as a pretty much imperative process: if the last instruction is
dup, then drop undoes dup, etc.. given the target we are using, it is
possible to completely write the assembly language in forth style.

so, summary: split the peephole optimization in 2 parts:

- shared forth elimination (as a result of macro expansion)
- machine specific assembler optimization

Entry: declarative peephole optimizations
Date: Sun Jan 28 17:38:31 GMT 2007

basically, this is a rewriting system. currently i use a tree
structure for this (ifte). this is a list of transformations:

( [(dup drop)  ()]
  [(dup save)  ()]
  [(save drop) ()]
  [(1 +)       (inc)]
  [(1 -)       (dec)]
  [((lit?) +)  ()] )

what to do with things that do not fit this? for example literals.. i
really do need predicates. actually, i should make a list of all
optimizations to make it a bit more clear. currently the code is way
too dense.
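[EDIT: to make that concrete, a minimal sketch of the naive list
rewriter, literal patterns only; the (lit?) predicate and the ifte
tree are left out, and all names are made up:

(define rules
  '(((dup drop) . ())
    ((1 +)      . (inc))
    ((1 -)      . (dec))))

(define (prefix? p l)
  (or (null? p)
      (and (pair? l)
           (equal? (car p) (car l))
           (prefix? (cdr p) (cdr l)))))

(define (rewrite code)
  (let try ((rs rules))
    (cond ((null? rs)
           (if (null? code)
               '()
               (cons (car code) (rewrite (cdr code)))))
          ((prefix? (caar rs) code)
           ;; rule fired: re-examine starting from the rewritten
           ;; head. note an expanding rule could loop here, which is
           ;; exactly the worry in the next entry.
           (rewrite (append (cdar rs)
                            (list-tail code (length (caar rs))))))
          (else (try (cdr rs))))))

;; (rewrite '(1 + dup drop)) => (inc)
]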
Entry: rewriting
Date: Mon Jan 29 00:26:40 GMT 2007

funny.. a google search on rewriting led me to the pragmatic
programmer. maybe i shouldn't read joel on software. especially not
his rant about never scrapping a whole project and starting over.

anyways.. there are some serious things wrong with the way i'm trying
to solve the compilation / optimization problem. i'm using a massive
tool and am still writing in the guerilla hack style most forths are
written in. i have proper data structures now, so why not use them?
why not make some minilanguages that do special tasks. it would be
interesting to start moving functionality over to the lamb core as
soon as possible. most of the code is optimization though, so.. i'm
curious about this rewriting business.. looks like there's something
to learn there. i know it works in a naive way, since that's what i
already have. but i'm curious if this can be taken further. with
faster static code it might be possible to do a whole lot more
(non-deterministic stuff). some problems to face are literals and
such.. one thing that worries me is 'how to prevent loops'. i know,
if things get smaller, there can be no loops, but i can imagine some
more fancy expand/collapse rules that might start looping with a
naive approach.

looking at the optimizations, most are about reducing stack juggling
and moving it to register transfers. this is almost universal on all
machines. i do need to think a bit about a sort of 'baseline forth'
that will be the end point of the optimizations, such that eventual
compilation is straightforward. this seems like an elegant solution.

Entry: purely compositional approach (joy)
Date: Fri Feb 2 12:35:56 GMT 2007

whenever program text is read, it is immediately compiled. each
symbol is replaced by its particular function, and each constant is
replaced by a function that pushes the constant:

  sym  -> (lookup sym)
  data -> (lambda stack (cons data stack))

to get back the value of a data atom (practical issue), you pass any
list, apply the function, and pop off the data. this can be done at
compile time. so, can we do with data structures composed entirely of
functions? probably yes.. probably this isn't even such a bad idea..
it looks like it is not a good idea to map composition to single
lambda expressions, but to have an interpreter for it instead, so we
can implement things like CAR efficiently: it is possible to
implement CAR on an abstract function which represents a list by
'testing' it on a stack, however, this is a lot worse than just
getting the left element of the first pair.. then, why not represent
constants by constants instead of their wrapping functions? back to
square one..

Entry: jit compiler + parser
Date: Fri Feb 2 15:24:47 GMT 2007

if i'm absolutely sure that function names are static, it's possible
to use a jit compiler without sacrificing this semantic property:
leave them as symbols until they are encountered, then compile
them. this would also eliminate the problem of forward/backward
declarations etc. this seems to work very well in the simple first
experiments.. the parser seems to work too: parse code = list of
things. if one of the things is a list, parse it and wrap it in a
lambda. so what about closures?

Entry: rewriting
Date: Sat Feb 3 10:52:54 GMT 2007

i'm working on the rewriting, and it looks like this is ideal to use
a compositional mini language for.. so i've quickly extended the
'run' function to take a 'compiler' argument which will resolve
symbols to functionality, still using the jit compiler. the
'compiler' term could be used to give context (lexical) dependent
information about symbols. however, it should really be tied to the
stored code then.. the idea is to represent pattern matchers as
ordinary composed code, but using a special compiler (macro) to
instantiate them.
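[EDIT: a quick sketch of how the run/compiler/jit combination could
look — symbols stay symbols until first executed, then get bound once
through the resolver. everything here is a stand-in, not the real
core.ss code:

(define (jit resolve code)
  (map (lambda (x)
         (cond
          ((symbol? x)
           (let ((f #f))                  ; late binding, cached
             (lambda (stack)
               (if (not f) (set! f (resolve x)))
               (f stack))))
          ((pair? x)
           (let ((p (jit resolve x)))     ; sublist: quoted program
             (lambda (stack) (cons p stack))))
          (else
           (lambda (stack) (cons x stack))))) ; constant pushes itself
       code))

(define (run prog stack)
  (if (null? prog)
      stack
      (run (cdr prog) ((car prog) stack))))

;; a toy resolver, just for illustration:
(define (resolve sym)
  (case sym
    ((+) (lambda (stack)
           (cons (+ (car stack) (cadr stack)) (cddr stack))))
    (else (error "undefined word" sym))))

;; (run (jit resolve '(1 2 +)) '()) => (3)
]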
so this gives the list of problems for today:

- solve lexical compilation issues
- how to 'execute' from within a primitive

lol. this is again exactly the same thing as i already had: each
forth word is a macro :) so the problem reduces to the lexical thing
(namespaces) and how to compile a generic pattern matcher into a
macro.

Entry: do i really need lambda?
Date: Sat Feb 3 11:24:04 GMT 2007

what i need is local names, just for the sake of code organization
and different sublanguages. i don't need lambda really. i don't need
runtime binding of symbols to names. the whole idea of
combinatory/compositional/concatenative languages is to eliminate
variable names...

Entry: macro semantics
Date: Sat Feb 3 12:37:36 GMT 2007

i have something like this now:

(define (macros name) name)
(define-resolver register-macro macros)

(register-macro 'nop (lambda stack stack))
(register-macro 'dup
  (lambda (asm . stack)
    (pack (pack `(movwf POSTINC) asm) stack)))

which can be executed as:

> (run (parse macros '(dup dup)) '(()))
(((movwf POSTINC) (movwf POSTINC)))
>

so in this, compilation is the execution of one program to produce
another program. let's stay in the forth syntax as long as possible,
and rewrite this to:

(define-syntax forth
  (syntax-rules ()
    ((_ output () (rwords ...))
     (pack rwords ... output))
    ((_ output (word words ...) (rwords ...))
     (forth output (words ...) ('word rwords ...)))
    ((_ output (words ...))
     (forth output (words ...) ()))))

(define (macros name) name)
(define-resolver register-macro macros)

(register-macro 'nop (lambda stack stack))
(register-macro 'dup
  (lambda (out . stack)
    (pack (forth out (POSTINC1 movwf)) stack)))

> (run (parse macros '(dup dup)) '(()))
((movwf POSTINC1 movwf POSTINC1))
>

Entry: dynamic code
Date: Sat Feb 3 12:45:23 GMT 2007

note that once something has run, it is compiled in place and can
never be accessed as data again. it is important to make it
impossible to do things like

  (1 2 3 + +) dup run

and then use the 2nd copy. this is easily solved by first creating a
copy, but that sort of defeats the current way the JIT compiler
works. maybe i can make sure that the 'run' word, which is the
interface to the internals, always makes a copy of a list whenever it
encounters dynamic code? to summarize:

- parsed lists are safe. they are always pure code and can never be
  interpreted as pure data.
- anything that's 'run' at run time is not, so here a copy needs to
  be made.

the solution here is that 'parse' is necessary to run symbolic code:

  (1 2 3 + +) parse run

and 'parse' already makes a copy of the list, since it is functional.

NOTE: (define (list-copy x) (append x '()))

so i defined interpret <- (parse run)

Entry: bind and stuff
Date: Sat Feb 3 13:17:03 GMT 2007

i think it's starting to dawn on me.. the disadvantage of functions
like the above is that state accumulation is explicit: there is this
chunk of accumulated state on the top of the stack while none of the
functions actually need it to be there. enter the 'bind' concept. in
order to get rid of these arguments, you define a macro according to
( -- code ), but automatically lift it to ( code -- code ). ok.. this
seems to work. what i have now is a way to generate simple
substitution macros.
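[EDIT: the lifting in a nutshell, assuming macros produce instruction
lists and the hidden operation is 'append'; names are made up:

(define (lift-producer macro)             ; macro:  ( -- code )
  (lambda (code)                          ; lifted: ( code -- code )
    (append code (macro))))

(define m:dup
  (lift-producer (lambda () '((movwf POSTINC1)))))

;; (m:dup '((movlw 123))) => ((movlw 123) (movwf POSTINC1))

this shape comes back later as the Writer Monad.]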
Entry: rewrite macros
Date: Sat Feb 3 15:54:08 GMT 2007

the next step is rewrite macros. this should be done in two
steps. in order to make a single 'intelligent' macro, different
patterns need to be combined into one function, and one function
needs to have information about different patterns. a sort of
'transpose'.

- make a list of rewrite patterns
- compile it into code

rewrite macros are easier understood as operating on output forth
code. i don't know. ok.. time to be stupid then. state the previous
solution, then abstract it. the previous solution was explicit:

  (dup drop) -> ()

drop is a function that discards anything that comes before it and
produces a value (without other side effects: it's important to write
macros so that the last operation is a mutation), so drop needs to be
intelligent. ok. it's easy enough to implement this in exactly the
same way as in CAT/BADNOP. however, there should be a more highlevel
construct that eliminates the explicit if-then things.

Entry: compositional languages suck
Date: Sat Feb 3 17:07:00 GMT 2007

it's a feast to use them to glue things together, but more
complicated things are easier expressed using lambdas.. i think the
approach of writing the core algorithms in scheme with full fire
power, and keeping the language itself mainly for interaction, is a
valid one. compositional languages are cool because they lift from
you the burden of having to name things, and allow you to think in
terms of structure (more geometrically) vs. random connections in
parametrized things.

Entry: more lifting
Date: Sat Feb 3 17:44:56 GMT 2007

i ran into a new class of functions. i already had

  . -> code

which are just constants. now i have

  code -> code

which are code transformers that need to look at the current
generated code state (never the source code!). i added the default
resolver for macros to be a quote to the forth output stack. i do
need to change the way other types are handled though. it looks like
this is better solved earlier in the process. to keep the JIT
compiler like it is, the parser could be adapted to already compile
constants to quoting procedures. this works nicely.

Entry: now for the meta stuff
Date: Sat Feb 3 21:16:12 GMT 2007

some questions remain. how to generate more boilerplate for some
kinds of peephole optimizations, and how to check if it is actually
possible to optimize towards a 'core forth' that can be straight
compiled to assembly. let's find out by systematically porting some
macros. the main question is: what about arguments?

  123 ldl

means load a literal into the top-of-stack register.

Entry: cat snarf
Date: Sun Feb 4 00:16:56 GMT 2007

porting stuff from old cat to new cat. seems to work really well. not
having state on the stack to deal with makes things a lot easier..
but for badnop this means the database needs to be designed in a
proper way. maybe for the assembler we use some kind of dictionary as
state? the thing is.. i'd like to keep as much of the 'functional OO'
that was present in CAT. this makes it possible to do parallel stuff
and backtracking in a very easy way, especially now that it's kind of
fast.

Entry: intermediate language
Date: Sun Feb 4 10:51:20 GMT 2007

since i am optimizing for a register machine, it might be best to
write the rewriter in terms of the register machine primitives i used
in BADNOP. the main thing to decide is: is it easier to optimize code
like this:

  1 2 +

or this:

  (dup) (lda 1) (dup) (lda 2) (call +)

it's definitely easier to do the former.. maybe i should implement
the assembler now, so i can see this a bit clearer. the problem i'm
trying to solve is: rewrite forth in such a way that assembly becomes
trivial. things which make this problematic are folded constants:
constants that are already bound to a machine operation as a literal.
maybe i should just write them as 'pseudo forth code' but group them,
like this:

  1 2 + -> dup (drop 1) dup (drop 2)

here every grouped instruction is meant to be replaced later by one
machine opcode. the advantage of this approach is that there are no
'self-quoting' things in the code after a first pass. considering the
targets i'm using don't have an instruction to do dup+ldl in one go,
i guess this idiomatic approach is a valid one. it is probably better
to do this in more phases:

1. forth based semantic substitution (rewrite)
2. conversion to idiomatic representation (compile)
3. direct mapping from idiomatic cells to assembly code (assemble)

because 3 can be made invertible, it's possible to easily decompile,
flatten and semantically optimize back!

Entry: pattern matching
Date: Sun Feb 4 13:41:49 GMT 2007

having a look at the plt pattern matching code. i really need this
kind of stuff :) the basic thing is

  (match x (pat expr) ...)

when x matches one of the pat, the corresponding expr is evaluated
with symbols of pat bound to values in x. ok.. seems to work pretty
well. but i still need to find out how to reverse a pattern.

Entry: next
Date: Sun Feb 4 16:06:45 GMT 2007

* find out how to reverse patterns
* lift rewriters above 'rest' => compile patterns
* assembler
* state

Entry: compiling patterns
Date: Sun Feb 4 16:09:43 GMT 2007

i need to make my own pattern language to compile substitutors from a
more highlevel definition like

  ((dup drop) ())
  ((a b +)    (,(+ a b)))
  ((a b xor)  (,(bitwise-xor a b)))
  ((a not)    (,(bitwise-xor a -1)))
  ((a negate) (,(* a -1)))
  ((dup dup (drop a) !) ((,a sta)))
  ((dup (drop a) !)     ((,a sta) drop))

using syntax-case. hence the latter 2 expressions will be merged into
one '!' rewriter macro. as a preparation i can already try to see if
all macros fit in this category. yes, they do. but i need to solve
the problem of type matching first, since the arithmetic above only
works if the numbers are immediate. type matching is part of the
match.ss language, but i didn't figure out yet how to also bind a
matched item to a name.. i think i solved the rewriter problem for
the pattern language above. just need to sort out some macro issues,
probably best to use syntax-case with some explicit rewriting.

Entry: syntax transformation
Date: Sun Feb 4 22:48:32 GMT 2007

1. one pattern i've been trying to solve is this:

(definitions
  (some (special) structure here)
  (same (special) structure there))

it's easy enough to write the first transformation, but how do you do
the next one without having to explicitly recurse using names etc..?
in other words: "now just accept more of the same". this seems to be
the answer:

(define-syntax shift-to
  (syntax-rules ()
    ((shift-to (from0 from ...) (to0 to ...))
     (let ((tmp from0))
       (set! to from) ...
       (set! to0 tmp)))))

one ellipsis in the pattern for every ellipsis on the same level. or
something like that.. need to explain better.

2. what's the real significance of the <literals> argument of
syntax-rules? "Identifiers that appear in <literals> are interpreted
as literal identifiers to be matched against corresponding subforms
of the input."

3. how to get plain and ordinary s-exp macro transformers using
define-syntax? i was thinking about something like this:

(define-syntax nofuss
  (syntax-rules ()
    ((_ (pat ...) expr)
     (lambda (stx)
       (match (syntax-object->datum stx)
         ((_ pat ...)
          (datum->syntax-object stx expr)))))))

(define-syntax snarf-lambda
  (nofuss (args fn)
    `(lambda (,@(reverse args) . stack)
       (cons (,fn ,@args) stack))))
but that doesn't work, since nofuss is not defined at expansion
time.. but it should work. there has to be some way of doing this.

Entry: pattern language
Date: Mon Feb 5 11:30:59 GMT 2007

seems to work. some problems remaining though:

- default clause
- literal/parameter
- fix 'rest'
- type specific match

done

Entry: remaining problems
Date: Mon Feb 5 21:26:33 GMT 2007

two big problems remaining: the assembler and state storage. the
assembler is a bit nasty. lots of tiny rules to obey.. i wonder if i
can make something coherent out of this. the state store is tightly
coupled to the assembler. here i can probably do another trick of
accumulating the dictionaries using some binding functions.

what are the tasks of the assembler?

-> creating instruction codes from symbolic rep
-> resolving all names to addresses (2 pass?)
-> making sure all jumps have the correct size

the result of assembly is a vector, and some updated symbol
tables. the input is optimized and idiomized forth code that can be
straight translated. it would be nice to use nondeterministic
programming for choosing jump sizes, but that's probably overkill:
moving code down is probably easier. some observations:

* if i only shrink jumps instead of expanding them, the effects are
  always local: no bounds are violated, but the solution might be
  sub-optimal. for backward jumps, the correct size is known; only
  forward jumps need to be allocated.

* a proper topological representation which indicates where jumps go
  and where they come from is a good thing to have. a single cell has:
  - list of cells that go from here: max 2
  - list of cells that come here
  - instruction + arguments

* maybe that's overkill, since it can always be generated if analysis
  is necessary.. what about this:
  - assume incremental code: old code is not going to be jumping to
    new code.
  - within a code block under compilation, forward jumps are
    possible, so they need to be allocated: use maximum size.
    however, they should be rare: function definitions could be
    sorted beforehand?
  - recursively work down from the current compilation point, and
    adjust all jumps. backtrack if necessary. this can be done in a
    list.

* yep.. it's probably simplest to just perform the 2 ordinary steps
  of backward then forward address resolution, and add as many passes
  as necessary to resolve the shrinking.

* there are 2 x 2 x 2 types of jumps wrt. a single cell collapse:
  abs/rel x start:before/after x finish:before/after
  -> relative: adjust if they cross the border
  -> absolute: adjust if they end past the border

Entry: tired
Date: Mon Feb 5 21:44:26 GMT 2007

probably all a bit over my head atm. feeling a bit sleepy. maybe do a
bit of cleaning. like writing fold in terms of match instead of if
etc..

(define (fold fn init lst)
  (match lst
    (()       init)
    ((x . xs) (fn x (fold fn init xs)))))

(fold + 0 '(1 2 3))

good coffee :) i'm feeling a bit ambitious.. instead of writing a
flashforth based standard forth, it might be feasible to try to write
a functional programming language for the micro.

Entry: ((dup cons) dup cons)
Date: Tue Feb 6 03:43:13 GMT 2007

just added 'compose', then i found out the quine in the title doesn't
work any more. it does work in joy, so what's up? the problem here of
course is that consing 2 quoted programs does not give a quoted
program: these are abstract data types and not lists.. manfred must
be using some explicit definition of cons on quoted programs
somewhere, or i don't really understand his list semantics.
the problem with mine is that quoted programs are in fact lists, but
they have a header containing a link to the symbol resolver. so, am i
missing something about Joy? is the "quoted program is a list"
property necessary? have to check that. in the mean time, i can get
to quines by defining 'qons'. there is a possibility of embedding lex
information inside the list, so instead of

  (lex a b c) -> ((lex a) (lex b) (lex c))

which might even be better, since it allows for mixing of dicts. this
also makes it possible to use a simpler interpreter, since the lex
state doesn't need to be maintained. hmm.. tried, but too tired.

but something is rather important. having lists and programs on the
same level as in joy is nice, but requires a single semantics. since
i already almost automatically introduced 3 kinds of semantics for
symbolic code (cat, forth rewriter and forth compiler) this is not
really feasible. so lists and programs should probably be separate
entities, where programs are abstract and not dissectable, but
compose and qons are defined to do run-time composition:

  list     program
  ------   -------
  concat   compose (qoncat)
  cons     qons

where qons will take 2 quoted programs, and concatenate the quotation
of the first with the second, or

  (a) (b) -> ((a) b)

however, if i change the interpreter as mentioned above, the two
columns will be identical, and programs can be manipulated just as
lists, without giving up any other functionality. so, good to learn:
CAT is not really Joy because

- i'm using fully quoted lists instead of quoted programs to
  represent data lists: there is a clear separation between code and
  data.
- joy probably uses numbers directly in the lists, which i can't do
  due to different number semantics for cat and purrr. for me,
  numbers have to be encapsulated.
- parsing from symbolic -> code is explicit because of different
  semantics: this allows reuse of the interpreter for different mini
  languages.

Entry: interpreter cleanup
Date: Tue Feb 6 11:33:15 GMT 2007

instead of using set-car!, it might be better to use delayed
evaluation for the instruction type: everything evaluates to a
procedure. that way it stays functional. ok. this seems to work: ()
and nil are the same now. plus there is a structural equivalence:
since compilation from symbol -> procedure/promise is 1-1, the size
of a compiled program (list size) is equal to the original code list
size.

ok.. now ((dup cons) dup cons) still doesn't work!! the reason is
that nested lists in code get executed.. is this still valid?
something smells here.. let's first change the other
parser/compilers.. ok, if i swap around the quoting such that new
definitions are always wrapped in a closure that will call 'run', i
should be safe. this works, but i run into a difficulty: i cannot
unquote a program wrapped in a closure (interpret quoted stack). the
solution to this is to change back the semantics of 'parse' -> it
will return a quoted program instead of an executable one, and put
the unquoting in 'def'. i had to change this:

(define-word run (code . stack)
  (run code stack))

to this:

(define-word i (code . stack)
  (run-quoted code stack))
(define-word execute (code . stack)
  (run-unquoted code stack))

to only take quoted programs, not pure closures. 'execute' is only
there for completeness, since it is rarely needed (it is equivalent
to (nil cons i)). maybe that's again one of the key points? meaning,
to make a very clear distinction between quoted programs =
aggregations of primitives, and primitives. ok.. this looks like it's
working.
this makes the language a bit more introspective. it looks like the
quine works too now. this was quite surprisingly non-trivial! so what
do i learn?

-------------------------------------------------------------------
it is necessary to explicitly distinguish QUOTED PROGRAMS from
PRIMITIVES. the latter is a black box, but the former is a list of
primitives. this structure is NOT recursive!
-------------------------------------------------------------------

having quoted programs obey the list interface adds very flexible
introspection. this probably means that the difference from Joy is
purely syntactic now. tiens.. ((dup cons) dup cons) broke again...
ok. the reason is that after constructing a program from another
program, you need to 'compile' it before it can be run with 'i', so
the quine is relative to 'interpret' and not to 'i'. i'm going to
switch back to my previous notation and use 'run' instead of 'i'. the
conclusion here is:

--------------------------------------------
{ QUOTED programs } is a subset of { LISTS }
--------------------------------------------

i think this is not the case in Joy. whenever you operate on a quoted
program using list construction, you need to 'compile' it to a
program again. this is a projection from the set of lists to the
subset of quoted programs. so 'compile' really needs to be a
projection, meaning (compile compile) and (compile) are
equivalent. it is possible to change this simply by having
run-unquoted cons everything to the stack that's not
executable. however, keeping this explicit allows more wiggle room in
the semantics of different sublanguages. or maybe better: it is
cleaner, since there is no 'default' behaviour (the way the 'cond' is
set up in run-unquoted allows overriding.. that's not so clean). so,
final word. the interpreter implements:

* interpretation of a list of primitives as NOP,TC,RC
* lazy evaluation of primitives (JIT: delayed compilation)

all the rest needs to be implemented in the source transformers.

NOTE: [dup cons] is the Y combinator, sort of..

Entry: assembler
Date: Tue Feb 6 16:48:07 GMT 2007

alright. the assembler. finding instructions is the trivial part. the
hard part is finding the addresses of jumps.

1. resolve backward references (+ find instructions)
2. resolve forward references

if there are multiple-size jump instructions, and there are relative
and absolute jumps, extra passes can be added that resolve efficient
allocation of these. allocating all forward references with maximum
range, and adjusting them one by one, seems to be the best approach:

3-N. shrink forward references if necessary.

things get complicated if forward small-offset relative jumps are
used in compilation, since constructs to work around this are
necessary. i need to find a way to abstract this kind of
behaviour. basically mapping

  A   O-O-O-O-O-O
       \_______/

to

         ___
        /   \
  B   O-O-O-O-O-O
       \_____/

or the other way around. for PIC18 going from B to A reduces the jump
distance by 2, since the long jump is eliminated. it can probably
just be kept at 1 & 2 for now, with all jumps equal size, and fix
that later.

Entry: bookkeeping
Date: Wed Feb 7 09:49:20 GMT 2007

the other problem is the bookkeeping. for the assembler i basically
need a symbol table, which means the dictionary object from cat can
be reused, and some binding operations need to be devised. there are
things to separate:

- labels (write accumulate, read random)
- 'here' (read/write random)
- asm output (write only)

it is probably easier to do most of this in scheme, together with
some syntax. let's see. maybe best to do everything in 3 steps:

1. assembly to polish notation, keeping symbol names
2. forward symbol resolve
3. backward symbol resolve

the last 2 are stateful, the first one is just pattern matching.
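[EDIT: a minimal sketch of just the two label passes, without any
jump-size shrinking. assumptions: instructions are lists, labels are
(label name) markers that occupy no address, and every real
instruction is one word. the naive substitution maps any symbol that
happens to be a label name, so it's a sketch only:

(define (resolve-labels code)
  ;; pass 1: record the address of each (label name) marker
  (define labels
    (let loop ((c code) (addr 0) (d '()))
      (cond ((null? c) d)
            ((eq? 'label (caar c))
             (loop (cdr c) addr (cons (cons (cadar c) addr) d)))
            (else (loop (cdr c) (+ addr 1) d)))))
  ;; pass 2: drop the markers, substitute addresses for names
  (let loop ((c code))
    (cond ((null? c) '())
          ((eq? 'label (caar c)) (loop (cdr c)))
          (else
           (cons (map (lambda (x)
                        (cond ((assq x labels) => cdr)
                              (else x)))
                      (car c))
                 (loop (cdr c)))))))

;; (resolve-labels '((label loop) (decf x) (bra loop)))
;; => ((decf x) (bra 0))
]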
Entry: lifting problem
Date: Wed Feb 7 10:25:31 GMT 2007

how to call a generic prototype function within the body of a
to-be-lifted prototype? this is still one of the bigger problems i
had when writing old cat: cannot 'execute' from within stack macros!

Entry: screen scraping
Date: Wed Feb 7 12:41:41 GMT 2007

ok, that was fun. using emacs macros to convert text pasted from the
pdf datasheet into lisp code :) but it doesn't work very well
though. i think i should just get the data from gpasm, hoping it's a
bit more structured. (in the end i just typed it in).

Entry: great success!
Date: Wed Feb 7 14:51:36 GMT 2007

writing the assembler, and i'm realizing something. scheme is really
cool :) but i'm not sure if scheme is the core of what i'm finding
cool. i think it's pattern matching. since a compiler is mainly an
expression rewriter, this comes as no surprise in hindsight. the
biggest mistake in the previous brood system was to try the problem
without pattern matching constructs. brood's approach (and the
previous badnop) is really too lowlevel. for expression rewriting,
lexical bindings are a must. since the permutations involved are
mostly nontrivial, performing them with combinators instead of random
access parameters is a royal pain, and the resulting code is
completely unreadable.

i think this can be distilled into yet another "why forth?" answer,
but in the negative. if the task you are programming involves the
encoding of a very tangled data structure, then a combinator language
is a bad idea, since you have to factor the permutation manually. so
it's about this: forth is bad at encoding fairly random or ad-hoc
permutation patterns like you would find in a language
compiler/translator. and, don't forget: match & fold are your
friends!

Entry: assembler working
Date: Wed Feb 7 21:14:25 GMT 2007

at least the part that's not doing symbol resolution. now for the
interesting part: the assembler has some state associated with it:

- dictionary
- current assembly point

which has to be dragged along. i was wondering how hard it might be
to solve this with some closure tricks in scheme..

Entry: lambda again
Date: Thu Feb 8 09:47:23 GMT 2007

trying to get my head around this lambda thingy.. there are a couple
of problems, the most important one being the decision of whether
lambda should be a form or a function.

* form: everything is compiled at compile time. this means lambda has
  to be a parser macro, and the only way to do that consistently is
  to have it be a prefix macro. this would break the semantic
  simplicity of the language by introducing syntax.

* function: lambda does runtime compilation, in which case the
  lexical environment has to be bound to the compiled representation
  of the lambda call. it also introduces runtime
  compilation. speed-wise this is no problem, since all dictionary
  lookups are postponed till later anyway, but conceptually it is
  different again.

maybe the latter is the lesser of the 2 evils. 'lambda' still needs
to be a parser exception since it needs to capture the parse
environment. so lambda is really delayed parsing. maybe that makes
more sense. ok, following this:

- the argument needs to be symbolic, not a quoted program. (raw source)
- nested lambdas will work
- the run time part is called 'apply'

now, what does a compiled lambda expression look like?
  '(A B C) '(foo B bar) lambda -> (bind-C bind-B bind-A foo B bar)

that's the easy part. now, where is the storage? clearly, storage is
a runtime thing, so we can change the code to:

  (alloc bind-C bind-B bind-A foo B bar)

now 'B', for which code is generated at compile time, needs to know
where to find this storage. what about just putting it on the top of
the stack, and modifying all code that's not accessing the parameters
to ignore the bindings? some problems here with passing the lexical
state to subprograms.. wait: this is always done by 'parse'. it's ok
to think about lexical scope as dynamic scope of the parser. but...
passing stuff on the data stack is kind of dangerous, since all
subforms which have lambdas will do the same, so how do the inner
forms find the values of the outer variables? the only real solution
is probably to have the interpreter pass around an environment
pointer.. maybe that's a good point to just stop, and leave out
lambda entirely.

Entry: monads
Date: Thu Feb 8 19:18:49 GMT 2007

i guess it's safe to say that 'bind' really is 'lift' as i defined
it: take a function that maps values from outside into the monad, and
turn it into a function that can be composed.

Entry: lambda again
Date: Fri Feb 9 10:37:59 GMT 2007

let's see.. what does lambda do? actually two things:

* functions as values (delayed evaluation)
* locally (lexically) defined names

i already have the first one as quoted programs. so the problem i
should be solving is not the lambda problem comprising both
subproblems, but only the latter subproblem: lexical variable
binding. this is forth's "locals".

some more ideas: write the interpreter in Y-combinator form
(CPS?). this would allow the interception of invocations, basically
allowing any kind of binding of the state that's passed
around. maybe this is the interesting problem for today?

btw. i ordered friedman's "Essentials of Programming Languages". got
the first edition very cheap on amazon. Now reading "The Role of the
Study of Programming Languages in the Education of a
Programmer." Done. Gives me a bit of good faith that i'm on the right
track. I just need to study and experiment more.. and learn to smell
ad-hoc solutions. One of the things the paper mentions is that it is
a good thing to learn to implement your own abstractions / language
extensions / ... and to invest some time into learning the general
abstract ideas behind language patterns, mainly (automatic)
correctness preserving transformations. It looks like the approach
Friedman suggests is kind of radical. I've been doing this from a
Forth and Lisp perspective for quite a while now, but it looks like i
am getting stuck in certain simple paradigms. Rewriting BROOD kicked
me out of that and made me think about better approaches, adopting
pattern matching, a static language and lazy compilation. The idea
with PF as one of the BROOD targets is probably a good one. It's
going to be a hell of a problem to tackle though.

things to try:

- convert the dynamically bound code in BADNOP to something i can run
  on the new core.ss: this approach seems like a nice one and i can't
  really say why.. there's the idea that dynamic binding is bad, but
  it's quite handy from time to time (i use it in PF C code all over
  the place). why is this? and what should be the proper construct?

- see what CPS can bring. for one, it should make control structures
  a lot easier to implement. so THAT is what i was looking
  for. obvious in hindsight. but how to do this practically?
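[EDIT: a minimal sketch of what "practically" could look like: thread
an explicit continuation through the interpreter, so primitives get
to see it. names are stand-ins, not the real core.ss:

(define (run code stack k)
  (if (null? code)
      (k stack)
      ((car code) stack
                  (lambda (s) (run (cdr code) s k)))))

;; every primitive now has the shape (stack k) -> result:
(define (p:dup  stack k) (k (cons (car stack) stack)))
(define (p:drop stack k) (k (cdr stack)))

;; (run (list p:dup p:drop) '(1) (lambda (s) s)) => (1)

a control word can now do something with k other than just calling
it: save it for backtracking, drop it for an exit, etc.]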
Entry: re re re
Date: Fri Feb 9 16:20:40 GMT 2007

so, next actions.

1. is scoping important / feasible / desirable?
2. should i solve the assembler purely monadically?

one great advantage of NOT using static (or dynamic) scoping is the
independence of context. it does make a whole lot of sense to
actually just write the components as simple functions, and combine
them later. what i have already is the core of the assembler: simple
n-argument functions generated from an instruction set table. these
functions return a list of opcodes generated from this
instruction. currently this is executed as:

(define (assemble lst)
  (map
   (match-lambda
     ;; delay assembly
     (('delay . rest) rest)
     ;; assemble
     (((and opcode (= symbol? #t)) . arguments)
      (apply (find-asm opcode) arguments))
     ;; already assembled
     (((and n (= number? #t)) . rest) `(,n ,@rest))
     ;; error
     (other (raise `(invalid-instruction ,other))))
   lst))

instead of writing this as a map which is independent, i should write
it as a for-each (an interpreter which accumulates state changes). ok,
that was easy enough: the interpreter is split into 2 parts: one that
does pure assemblers (independent of state), which are the ones
generated from the instruction set table, and one that does impure
ones. now for the disassembler. it's probably easiest to organize
this as a binary tree decoder. the argument decoding could be done
working on the binary representation string.

Entry: values
Date: Fri Feb 9 20:23:22 GMT 2007

i never understood why 'values' would be useful. well, i think i
understand now.. to compose 2 functions A and B

  A (x y z) -> (x y z)
  B (x y z) -> (x y z)

one would need to write (apply B (A 1 2 3)), with A returning a
list. using values this becomes something like:

;; values
(call-with-values
    (lambda ()
      (call-with-values
          (lambda () (values 1 2 3))
        (lambda (x y z) (values (+ x 1) (+ y 1) (+ z 1)))))
  (lambda (x y z) (values z y x)))

;; lists
(apply (lambda (x y z) (list z y x))
       (apply (lambda (x y z) (list (+ x 1) (+ y 1) (+ z 1)))
              (list 1 2 3)))

i'm not convinced about the values thing.. lists are easier for
debug: they don't require a special call. i think what's easier to
read is a straight composition, where every function passes a list to
the next one, which is then appended to a list of arguments, like
this:

  (chain `(,ins ()) (dasm 1) (dasm 2))

maps to

  (apply dasm (append '(4) (apply dasm (append '(4) `(,257 ())))))

(define-syntax chain
  (syntax-rules ()
    ((_ input (fn args ...))
     (apply fn (append (list args ...) input)))
    ((_ input (fn args ...) more ...)
     (chain (chain input (fn args ...)) more ...))))

(chain `(,257 ()) (dasm 4) (dasm 4))

ok. i got the disassembler body working. now i still need to do the
search.. this binary tree search looks fancy but is it really
necessary? might even be simpler actually. ok. i need some binary
trees for that.. just made some code, but it's kind of clumsy: the
tree is created on the fly if some nodes do not exist. less
efficient, but easier to do, is probably to generate a full tree, and
then just use set to pinch off a subtree somewhere. ok.. dasm seems
to work. some minor issues with parsing multiple word instructions
though.. will have to change the prototype. so the next step is to
move some code to runtime, and to unify the dasm and asm: basically
they do the same thing: convert between bit strings and lists. the
real 'problem' is the permutation of the formal symbolic parameters
into the order they occur in the bit string.
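[EDIT: a sketch of how one declarative field spec could drive both
directions, handling the permutation by naming the fields. the spec
format here is made up; the movwf encoding (0110 111a ffff ffff) is
from the PIC18 datasheet, if i read it right:

(define movwf-spec '((#b0110111 7) (a 1) (f 8)))

(define (pack spec env)                    ;; asm: names -> word
  (let loop ((s spec) (word 0))
    (if (null? s)
        word
        (let* ((field (car s))
               (v (if (number? (car field))
                      (car field)                  ; fixed opcode bits
                      (cdr (assq (car field) env)))))
          (loop (cdr s)
                (+ (* word (expt 2 (cadr field))) v))))))

(define (unpack spec word)                 ;; dasm: word -> names
  (if (null? spec)
      '()
      (let* ((rest (apply + (map cadr (cdr spec))))
             (v (quotient word (expt 2 rest)))
             (w (remainder word (expt 2 rest))))
        (if (symbol? (caar spec))
            (cons (cons (caar spec) v) (unpack (cdr spec) w))
            (unpack (cdr spec) w)))))

;; (pack movwf-spec '((a . 0) (f . #x25))) => #x6E25
;; (unpack movwf-spec #x6E25)              => ((a . 0) (f . #x25))
]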
Entry: asm/dasm cleanup
Date: Sat Feb 10 09:34:48 GMT 2007

fix the multiple instruction problem: it's probably easier and
cleaner to have one symbolic instruction correspond to exactly one
binary word. all the targets i have in mind are
risc-like. multiword instructions are then handled as multiword
opcodes. once this is done, the asm and dasm pack/unpack could be
combined into one single 'interpreter'. ok. maybe it's best to stop
here. it's not 'perfectly clean' but i guess what's left of the
dirtiness can easily be cleaned up when i encounter another
instruction set that's not compatible with this approach.

another thing i need to consider, or at least need a 'reason for
ignorance' for, is: "why am i not generating pic assembly code?". the
reasons are:

1. full control
2. have dasm available in the core for debug
3. easier incremental assembly & linking

adding support for text .asm output is rather trivial.

ok... next: branches. the two passes, fairly simple.

1. backward branches can be immediately resolved.
2. forward branches need to be postponed.

this is a combination of the directives 'relative', 'absolute' and
'delay'.

Entry: PIC18 compiler
Date: Sat Feb 10 12:44:07 GMT 2007

time for the crown jewel :) but first, i need to clean up the core.ss
register code to accept an abstract store with a default. ok,
done. i don't like the way i've got the generic register compiler and
the PIC18 compiler completely separated. it is good to share code,
but in this case the sharing can probably be done better by just
copy/pasting the patterns, or at least inserting them from a common
include. what about keeping the register compiler as a general
purpose example and figuring out how to do proper sharing once i have
different architectures running? yep.. i think it's best to keep that
idiomatic compiler for other experiments, and go straight for a
proper pattern matching peephole optimizer.

Entry: more PIC18 compiler
Date: Sun Feb 11 09:10:13 GMT 2007

i think i made a mistake by writing it as just a pattern compiler..
this thing should be a proper language with recursion, otherwise i
can't implement recursive substitution macros and other language
patterns: one machine that maps forth straight to asm. the only
preprocessing stage should be the reducer, which folds expressions
like '1 2 +'. even better, this reducer should be part of the
compiler too, so that expanded macros benefit directly from it.
summarized: separate reduction and expansion phases might lead to
suboptimal performance: it's probably best to condense all this into
a single phase, and make an extensible pattern matcher. this would be
the same design as before. there are more of these: the little
interpreters for macro mode etc.. it was pretty good already it
seems. just the global variable thing was a mistake. ok.. it probably
pays to make the pattern matcher programmable. add a minilanguage
there too.

NEXT:
* control structures
* extend the pattern language

the latter is not so trivial in the current implementation since a
nice thing to have would be a 1 -> many mapping. i could use a
special 'splice' word for this though. maybe it's best to work around
this though. another thing i'm thinking is: now that i'm no longer
afraid of this pattern matching business, why don't i write my own?
this would make it possible to do some of this at runtime, making it
a bit more flexible for additions etc.. time to take a break.

ok.. what i have is 2 conflicting operations: a pattern replacement
and a reverse. this needs to be sorted out properly: what exactly do
i want the programmable part to do? ok.. it seems to work now. needs
cleanup. i'm really curious about runtime though.. probably these are
all written in terms of the syntax expander, and need to be syntax?
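[EDIT: for reference, the reducer as a separate phase — the version
called suboptimal above — is just a couple of match rules. a minimal
sketch using plt's match.ss, with only the '+' rule:

(require (lib "match.ss"))

(define (reduce code)
  (match code
    (() '())
    (((? number? a) (? number? b) '+ . rest)
     (reduce `(,(+ a b) ,@rest)))      ; fold, then re-examine
    ((ins . rest) (cons ins (reduce rest)))))

;; (reduce '(1 2 + 3 +)) => (6)

folding it into the compiler itself means running rules like this one
interleaved with macro expansion instead of as a preprocessing pass.]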
Entry: merging dictionaries
Date: Sun Feb 11 14:10:36 GMT 2007

i'm trying to port the intelligent macros now. long standing
problem.. should you merge macro and bottom language dictionaries, or
keep them separate? i think the best way to go is to manually import
or link what you need.

about variables and allocation: i think it's easier to just use
variable names for this, and shadow them when they are changed. then
after a compilation is done, the whole dictionary can be
filtered. the other option is to use a functional store like before,
which might be a good idea anyway.

NEXT:
- functional store (it's cleaner, and might come in handy later)
- conditionals + optimization
- for loops + loop massage

Entry: stateful macros
Date: Mon Feb 12 01:25:53 GMT 2007

let's see if i actually learned something.. basically, i have two
options now: write all the macros as explicitly handling the asm
buffer, or have them spit out just a list of instructions. i don't
think there is any code that has to look back at past asm state: all
words that do that are written as pattern matching partial
evaluators. so, let's write all control structs as producers, just
like the other macros. so. i think i sort of disentangled the
problem:

------------------------------------------------------------------
If there is a lot of state that has to be dragged along, split all
operations into classes that operate only on substates, or have a
simple, consistent way of operating on state, like concatenation.
Then, lift all these subclasses to a common interface that can be
composed.
------------------------------------------------------------------

The thing i'm using is really the Writer Monad.

Entry: monads
Date: Mon Feb 12 00:53:53 GMT 2007

about a year ago i made a decision to use a functional dynamic store
to solve the problem of state, because i didn't understand the idea
behind monads. this was a mistake, but i guess a necessary one. i
probably wasn't ready for the ideas at that time. now i think i sort
of get it. monads (haskell style) are about dragging along state
implicitly. the irony is, i implemented that! what i did was to have
an implicit state object being dragged along as a top of stack
element, invisible to some computations. this is the 'State
Monad'. the mistake is: this is too general. it's better to use a
smaller gun to solve the problem at hand on a more local scale,
instead of basically using a state machine model (albeit one without
destructive mutation). the small gun is mostly related to the 'Writer
Monad'. the operation that's made implicit is 'append'. i call this
'lift-stateful'. this, together with some other state dragging (if
the data stack is not used, it can be dragged: some operations, like
the pattern matching peephole optimizer, work on the produced code as
a stack.)

the thing that's really interesting though is this: if you start to
think about forth as a compositional language, then this whole monad
thing is nothing more than a way to 'lift' words so they can be
composed in linear code. basically, if the things you want to compose
are operations A x B -> A x B, but what you have is operations like

  A -> A
  B -> B
  A -> B
  A -> A x B
  B -> A x B
  A x B -> A
  ...
together with a higher order function (hof) that will correctly lift
them to A x B -> A x B, then what you're doing is abstracting away
the trivial parts of such a map in this hof. for the writer monad,
the trivial part is 'append'. replace 'trivial work' with 'hard work'
and you get this:

http://lambda-the-ultimate.org/node/1276#comment-14113

"By using a monad, a simplified interface to the necessary
functionality can be provided, while the hard work of maintaining and
passing the context is handled behind the scenes."

so, what i need to do is to work out some abstractions so i can
perform this kind of magic in straight cat without having to resort
to scheme code.

Entry: backtracking
Date: Mon Feb 12 08:35:51 GMT 2007

in 2.x there are the for .. next macros that perform an optimisation
for which a decision has to be made early on. does it make sense to
use 'amb' for this? probably yes, because explicit undo is going to
be more expensive than just going back to a previous point and
re-running the compilation.. the tricky part is to keep it under
control :) in an interactive interpreter, where state can be
accumulated on the stack, having lingering continuations in the
backtracking stack might be dangerous, since 'fail' effectively
erases all changes made since the last success. i've provided 2
lowlevel words:

  kill-amb!  reset the backtracking engine
  amb        make a nondeterministic choice from a list

the code in amb.ss supports (possibly infinite) lazy lists in case i
ever need them. so. let's make 'amb' binary. this way it's easier to
implement lazy amb by embedding another call to amb in one (or both)
of the alternatives. yep. this looks like a better idea.

haha. keep it under control! i've just been chasing a 'bug' where amb
apparently didn't return properly; however, it was just waiting for
input: the continuation had a 'read' in it, and the fail depended on
a previous read, so it just wanted that read again. so, conclusion:

-------------------------------------------
be careful with amb and non-functional code
-------------------------------------------

i fixed the 'cpa' "compile print assembler" loop to read lines
instead of words, so at least the backtracking is ok on a per-line
basis.

Entry: commutation
Date: Mon Feb 12 16:44:30 GMT 2007

there are a lot of places where just swapping the order of
instructions might be beneficial. i ran into a bug where it is not
possible, although on first sight the operations seem independent:

  ((['movlw f] 1-!) `([decf ,f 0 1] [drop]))

because 'decf' has an effect on the flag that's used in the macro for
'next', this is not always correct! drop, being movf, sets the Z,N
flags. however, decf sets the carry flag, so this could still be
used. for now, i've disabled the optimization..

Entry: next actions
Date: Mon Feb 12 16:52:47 GMT 2007

- conditions
- variables
- constants in assembler

a variable allocation is just a dictionary operation, so it really
should be an assembler step. i need to think about that a
bit. something's wrong...

Entry: bored
Date: Mon Feb 12 23:18:29 GMT 2007

let's play a bit. generators.. a generator is easiest understood as
something which, when activated, returns a generator and a value. in
other words: a generator is a lazy list.

  (((3) 2) 1)

is a finite generator. manfred von thun has an interesting page about
using reproducing programs as generators:

http://www.latrobe.edu.au/philosophy/phimvt/joy/jp-reprod.html

i wonder how to do this in lisp?
suppose fn is a state update function, so (fn init) -> generator:

(define (gen fn init)
  (lambda ()
    (cons init (gen fn (fn init)))))

in cat it's quite simple too:

  (gen (2dup run swap gen) qons qons)  ;; (init fn -- gen)

as mentioned by manfred, this is related to the Y-combinator:

http://www.latrobe.edu.au/philosophy/phimvt/joy/j05cmp.html

basically, a generator or lazy list is a delayed recursion. so in
cat, applying 'run' to a lazy list has the same result as applying
'uncons' to a list.

Entry: misc ramblings
Date: Tue Feb 13 12:06:14 GMT 2007

i'm going to change terminology a bit so it's more Joy like, if only
for the reason that it makes joy code easier to read.

  duck -> dip

http://www.nsl.com/papers/interview.htm

"There is a ... combinator for binary (tree-) recursion that makes
quicksort a one-liner:

  [small] [] [uncons [>] split] [swapd cons concat] binrec"

then for-each: i need to find the more abstract pattern, which is
'fold'. what about a fold over a lazy list?

Entry: lazy lists
Date: Tue Feb 13 12:22:52 GMT 2007

right now i use them in (amb-lazy value thunk), where 'value' is
returned immediately, and thunk will be evaluated later. the question
remaining is that of interfaces. if i say "a lazy list", do i mean
thunk or (val . thunk)? (there is another question about using
'force' and 'delay' instead of explicit thunks. for functional
programs there is no difference, but for imperative programs there
is. maybe stick to thunks because they are more general.) i think
'amb-lazy' should be seen as a 'cons' which contains only forcing,
and leaves the delay operation to the user. i provide 'amb' to
construct a full list from this. unrolled it gives:

(amb-lazy first
  (lambda ()
    (amb-lazy second
      (lambda ()
        (amb-lazy third
          (lambda ()
            (fail)))))))

for generic lazy lists: maybe using 'force' and 'delay' is better,
since it allows 'car' and 'cdr' to trigger the evaluation. this
enables the definition of lazy-car and lazy-cdr without fear of
multiple evaluations that have different results, and it still allows
for non-functional lists. ok.. cleaned it up a bit, and moved most of
it to lazy.ss. lazy operations have a '@' prepended to the name of
the associated strict operations. i have @map, but @fold doesn't make
sense since it has a dependency in the wrong direction. i should also
change ifold to something else.. there has to be a proper lisp name
for it. i renamed it to 'accumulate'. makes more sense.

  (accumulate + 0 '(1 2 3))

the corresponding lazy processor makes sense, but only if it returns
the accumulator on each call. so it's more like 'accumulate-step'.
it's better to just create the @integral constructor, which gives a
new list from an old one.

Entry: had this idea
Date: Tue Feb 13 21:28:02 GMT 2007

can you do something like:

1. resolve label
2. oops, can't do. save 'cons' but continue
3. run all pending conses with the obtained info.

now, this isn't much different than storing all unresolved symbols in
a table and fixing them later, only this stores actions. (don't set a
flag, set the action!) more specifically, suppose there's the input

  x x x y z z z

where y is not resolvable. the way to solve this is to have y run
z z z and then try to resolve y and concatenate the results. basically
just swapping the order of execution.. something that could be done
is to make the assembler essentially 2-pass, where the first pass
performs normal assembly, but on the fly creates its reverse pass
which just resolves the necessary items and works its way
backwards. talking about overengineering :) a simple 2-pass is
probably good enough. but still.. this is more efficient, since the
reversing which would happen in an explicit 2-pass is not necessary,
plus the scanning of things already compiled can be avoided. so:

  x1 x2 x3 lx y1 y2 ly z1 -> (... z1 (ly y2 y1 (lx x3 x2 x1 (...))))
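[EDIT: the "set the action" part in its smallest form, with global
state just to keep the sketch short; all names are made up:

(define pending '())

(define (forward-ref! patch)        ;; can't resolve: store the action
  (set! pending (cons patch pending)))

(define (label-resolved! addr)      ;; label found: run pending actions
  (for-each (lambda (patch) (patch addr)) pending)
  (set! pending '()))

;; example: leave a hole in the output, patch it later
(define code (vector 'nop 'hole 'nop))
(forward-ref! (lambda (addr) (vector-set! code 1 addr)))
(label-resolved! 42)
;; code => #(nop 42 nop)
]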
Entry: backtracking -> an argument against dictionaries as sets
Date: Wed Feb 14 10:50:57 GMT 2007

another thing i didn't think about.. what's the actual cost of the
continuations? i don't think it's much, because the data is mostly
shared: asm is just appended to until it's completely finished, and
the code list is just run sequentially. there's no rampant data
copying going on: the garbage is created only at the compile end. so,
it might actually be better to NOT keep dictionaries stored as sets,
but just as shadowed association lists, to make backtracking memory
efficient (in case i want to create lots of choice points). the
redefining of 'current allocation pointers' tends to re-arrange and
copy things on functional stores..

Entry: bit instructions
Date: Wed Feb 14 15:58:04 GMT 2007

there are a lot of bit instructions that are better handled in a
consistent way. one of the problems with the assembler is that bit
set and bit clear have different opcodes. i think it makes more sense
to handle them as one opcode + argument. all bit instructions are
polar: they take another 'p' argument, so they can be easily flipped
as part of the rewriting process. the extra argument is placed first,
to make composition easier.

  bcf bsf     -> bpf
  btfsc btfss -> btfsp
  bc bnc      -> bpc
  ...

ok, it seems to be solved with a set of pattern matching macros, and
a factored-out label pusher :)

;; two sets of conditionals
((l: f b p bit?) `([bit? ,f ,p ,p]))  ;; generic -> arguments for btfsp
((l: p pz?)      `([flag? bpz ,p]))   ;; flag -> conditional jump opcode
((l: p pc?)      `([flag? bpc ,p]))

;; 'cbra' recombines the pseudo ops from above into jump constructs
((['flag? opc p] cbra) `([r ,opc ,(invert p) ,(make-label)]))
((['bit? f b p] cbra)  `([btfsp ,(invert p) ,f ,b]
                         [r bra ,(make-label)]))

then we have the recursive macro

  (if == cbra label)

and the pure cat macro

  (label == dup car car swap)

a lot more elegant than the previous solution. i like this pattern
matching approach.

NEXT:
* variable and other namespace stuff
* forth lexer
* parsing words
* intel hex format

Entry: forth lexer + parsing words
Date: Wed Feb 14 21:40:13 GMT 2007

which is of course really trivial. see lex.ss. i'm not doing '(' and
')' comments again, just line comments '\'. i think i know why i
always had problems with my ad-hoc parsers and word termination etc..
splitting into lexing and parsing makes sense, because the first one
is purely flat, while the second one can be recursive. it helps when
in the 2nd phase there are no more stupid problems with word
boundaries.. parsing words are, well, extensions of the parser :)
since these will make things move away from straight 1->1 parsing,
the parser needs to be rewritten as a recursive process / fold. ok,
the scaffolding is there: written in terms of
reverse/accumulate. now i need to really think about how to solve the
'variable' problem.

-> how to solve parsing words?
-> where to do the actual allocation?

Entry: ihex
Date: Thu Feb 15 00:26:02 GMT 2007

this used to be written in CAT, but was a mess. it's one of those
simple things that are hard to express in a combinator language
because they drag along so much state if you want to do them in one
pass. again, they are merely about re-arranging data! maybe i should
just try it again, but using a multipass algo, just to see if i
learned something.. on the other hand, this would be nice to have as
scheme code, so i can use it outside of the project. ok.. it seems to
work fine. got some binary operations for free that can be used in
the loader too.
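[EDIT: for reference, one intel hex data record (type 00) really is
just re-arranged data plus a checksum. a minimal sketch; 'hex2' and
'ihex-record' are made-up names, and the digits come out lowercase
here:

(define (hex2 n)                           ;; byte -> 2 hex digits
  (let ((s (number->string n 16)))
    (if (< n 16) (string-append "0" s) s)))

(define (ihex-record addr bytes)
  (let* ((fields `(,(length bytes)
                   ,(quotient addr 256)
                   ,(remainder addr 256)
                   0                        ;; record type 00 = data
                   ,@bytes))
         (sum (apply + fields))
         (check (modulo (- sum) 256)))      ;; two's complement checksum
    (apply string-append
           ":" (map hex2 (append fields (list check))))))

;; (ihex-record 0 '(#x02 #x33)) => ":020000000233c9"

the end-of-file marker is the fixed record ":00000001FF".]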
again, these things are about merely re-arranging data! maybe i
should just try it again, but using a multipass algorithm, just to
see if i learned something.. on the other hand, it would be nice to
have this as scheme code, so i can use it outside of the project.

ok.. it seems to work fine. got some binary operations for free that
can be used in the loader too.

Entry: parsing words
Date: Thu Feb 15 09:55:27 GMT 2007

so, i need:

  : variable 2variable constant 2constant

what's different from the previous implementation is that i have a
separate compile (parse) and execute phase, so parsing words cannot
be compilation macros. on the other hand, parsing words are always
about quoting things, mostly names, so probably a simple table
mapping names to a number of quoted symbols is enough. limiting the
number of symbols to one makes it even easier.

sort of got something going here with variables and constants, but
there's another problem:

Entry: dictionaries
Date: Thu Feb 15 12:01:35 GMT 2007

i'm using a hash table to store 'core' macros: those that are fixed.
however, a forth program can create macros, so these need to be
defined somehow.. maybe make that a bit more strict? the same goes
for constants: i'm using fixed machine constants in a hash table, and
some user defined stuff in other places. this needs some serious
thinking..

constants can be implemented as macros which inline literals. so the
only remaining question is: how to handle macros?

macros are really compiler extensions. they are a property of the
host, not of the target code. it would be really inconvenient to have
to split a project into two parts, so i should aim for macro defs
inside source files. however, a clear distinction needs to be made
between host and target things:

  target properties are related to on-chip storage == addresses
  host properties are related to code generation only

the result is that there are 2 possible actions on a source file:

  - reload macros + constants
  - recompile = realloc code and data

to track the state of a project, the only thing that needs to be
saved is the source code + a dictionary of target addresses. all the
rest (macros) can be obtained directly from the source code.
actually, this is a lot better than the old approach, where macros
are stored in a project state file.

Entry: new badnop control flow
Date: Thu Feb 15 12:28:56 GMT 2007

in  = project source code
out = compiled target code + dictionary

1. PARSE EXTENSIONS

   Read all source files and extend the compiler to include the
   macros and constants defined in the source files. This effectively
   builds a new special purpose compiler for the code in the project.

2. COMPILE CODE

   Convert all code definitions and data allocations to a form that
   is executable by the CAT VM, and run this code. This generates
   optimized symbolic assembly.

3. ASSEMBLE CODE

   In a two-pass algorithm, convert the symbolic assembly to binary
   opcodes, allocating and resolving memory addresses. This process
   uses the current dictionary, reflecting the state of the target,
   and produces a new dictionary and a list of binary code.

Entry: parse extensions (borked)
Date: Thu Feb 15 13:13:17 GMT 2007

and there i go, writing a parser state machine again! amazing what a
not-so-good night's sleep does.. let's do this a bit more
intelligently, using my favourite one-size-fits-all hammer: pattern
matching! seriously, the syntax is really simple, so i shouldn't be
writing a state machine, just a set of match rules.

one thing though: how to extend it?
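for concreteness, a sketch of the kind of extension table i have in
mind -- hypothetical names, not the actual brood code:

  ;; a parse word maps a name to a handler that consumes tokens from
  ;; the input stream and returns cat code + the remaining stream.
  (define parse-words (make-hash-table))

  (define (define-parse-word! name handler)
    (hash-table-put! parse-words name handler))

  ;; 'variable' quotes the next token: the name to allocate.
  (define-parse-word! 'variable
    (lambda (in)
      (values `((quote ,(car in)) allot-variable)
              (cdr in))))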
previous brood needed parse words to be written explicitly. i should
do that now too: just a dictionary of parse words that output a chunk
of cat code and the remainder of the input stream.

Entry: forth parser - different pattern
Date: Thu Feb 15 18:45:52 GMT 2007

ok. got some sleep. the thing is that this is a different pattern
than all the other things i've been doing. the previous pattern
matching code for the assembler is basically a partial evaluator,
which looks backwards instead of forwards. so this needs new code! in
short, i need a different kind of parser, or a preprocessor to map
forth -> composite code.

let's try to arrange the thoughts a bit, since i feel i'm not seeing
something really obvious.. i have an urge to write the parser as a
state machine, or as a pattern matcher. both of them seem to lead to
code with a similar kind of complexity, but with some obvious
redundancy. i can't see the higher level construct.

ok.. what i'm missing here is elements from SRFI-1. it's quite clear
what i want to do: generic list pattern substitution. so basically,
the prototype is

  (in) -> (in+ out)

with (out) being concatenated. let's call this the 'parser' pattern,
and write an iterator for it. ok. it needs a bit of polish, but the
idea is there i think..

Entry: ditching interpret mode
Date: Thu Feb 15 21:33:13 GMT 2007

what about ditching interpret mode and relying fully on partial
evaluation? i can use the following trick: the partial evaluator does
NOT truncate results to 8 bit during evaluation, only after. so in
principle, there is a complete calculator available with the full
numeric tower.

maybe it's good to create some highlevel constructs for the partial
evaluator. literals are still encoded as symbolic assembly, which is
ok, only somehow a bit dirty. this is effectively a second parameter
stack.. to make this more explicit, the macros 'unlit' and '2unlit'
are defined. these will reap literal values from the asm buffer and
move them to the parameter stack. the implementation of these macros
is split into two parts: a pattern matching part, and a generic macro
part '_unlit'.

Entry: more parsing
Date: Fri Feb 16 10:23:00 GMT 2007

so the basic infrastructure is there, now i just need to figure out
how to put the pieces together. this host/target separation needs
some more thought. the problem i'm facing atm is 'constant'. this
should define a constant as soon as it's parsed, but the value comes
from partial evaluation, which happens at macro execution time!

maybe i shouldn't really care about this 2-pass stuff.. i can just
compile code for its side effects, being the definition of macros..

another thing, related to the comment about the asm buffer being a
second parameter stack: why not compile quoting words as literals
instead of loading them on the data stack? this way a simple pattern
matching macro can be used to implement the behaviour of parsing
words.. i have to be careful though, since this arbitrary freedom
must have some hidden constraint somewhere.. the hidden constraint is
of course: literal stack encoding is machine-dependent! it's actual
assembler, dude!

maybe keep it the way it is. however, 'forth-quoted' feels wrong.
also the combination of literals coming from the asm buffer and the
symbol coming from the stack feels awkward. but it does seem to be
the right thing.. anyways.. it seems to work now.

Entry: dictionary
Date: Fri Feb 16 14:07:09 GMT 2007

so the only thing that's remaining is the runtime dictionary stuff:
variables (ram allocation) and associated things.
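the allocation part is just bookkeeping. a minimal sketch, assuming a
single linear ram space -- 'allot-variable' and the globals are
made-up names:

  ;; the ram dictionary maps names to addresses; 'allot' just bumps
  ;; an allocation pointer.
  (define ram-here 0)
  (define ram-dict '())

  (define (allot-variable name size)
    (set! ram-dict (cons (cons name ram-here) ram-dict))
    (set! ram-here (+ ram-here size)))

  ;; (allot-variable 'x 1) (allot-variable 'y 2)
  ;; => x at address 0, y at address 1, ram-here = 3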
mark variable names as literals during parsing. done.

i'm still not sure whether the mutating operations are such a good
idea.. maybe a separate macro parsing stage is better after all.. as
far as i understand, the thing which makes this difficult is the way
'constant' works: it depends on runtime data (the partial evaluator),
so the definition needs to be postponed.. what about using some
delayed evaluation here? or i can use the same trick: reserve the
name so it can be treated as a literal, but fill in the value later?

so, on to the fun stuff.. dictionaries. basic functionality seems to
work using the 'allot' assembler directive.

Entry: parse time macro definition
Date: Fri Feb 16 14:53:23 GMT 2007

what if i can:

  - define all macros
  - reserve all constant/variable names (which are just literal
    macros)

during parsing only, and fill them in whenever the data is there? the
problem is how i'm handling 2constant now.. this can be fixed with a
gensym. ok. this looks doable, but not essential. something for
later.

Entry: forth loading and machine memory buffer
Date: Fri Feb 16 17:53:05 GMT 2007

two things i just did:

  - added a function to load symbolic forth code
  - draft for memory stuff

need to figure out where to do 'load'. load is a quoting parser, then
just executes..

Entry: optimizations - need explicit unlit
Date: Fri Feb 16 23:40:27 GMT 2007

i'm running into several conflicting eager optimizations, which is
normal of course.. i was thinking about making this a bit more
manageable by prefixing operations that have a lot of different
combinations with virtual ops that will just re-arrange things for
the better.. the most frequent mistake is to combine a dup with a
previous instruction so the lit doesn't show any more. i think in 2.x
there is an explicit 'unlit' that puts the drop back..

ok. this pattern matching is definitely an improvement for writing
readable code, but it does pose some problems here and there..

TODO:
  - intelligent then
  - better literal opti (unlit)
  - port the monitor
  - device specific stuff
  - code memory buffer
  - host side interpreter

Entry: optimization choices
Date: Sat Feb 17 10:37:12 GMT 2007

instead of having 'stupid' backtracking, it might be easier to do
'intelligent' backtracking. this means: at some point a choice is
made, but if at a later time it is realised this choice is the wrong
one, then this particular choice needs to be changed. the pattern i
encounter is this:

  1. do eager optimization
  2. realize later this optimization was not optimal
  3. undo previous optimization to perform a better one

every time there is an 'undo', this could be solved by an automated
backtracking step. what about a sort of 'electric save'?

(it would also be interesting to somehow 'cache' the choices that
have been made in the mean time, so when a whole subtree is executed
again, the right choices are made first..)

interesting stuff :) it looks like the search space is not really a
tree, but more like a snake line: 10010011001, where at some point
one of the choices is deemed wrong, for example 10010x11001. the
remaining part 11001 then needs to be re-done, but starting from the
same pattern might be an interesting optimization.

another thing is the storage of choices. backtracking needs a stack
to operate. well, i already have one! the asm buffer serves that
purpose quite adequately. this also solves the problem of the
backtracking using mutable state.
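with a functional asm list, a choice point does not need any copying.
a hedged sketch of the idea (names are made up):

  ;; a choice point pairs the eager result with a thunk that restarts
  ;; from the saved asm list and takes the conservative branch
  ;; instead.  nothing is copied or mutated: the thunk just retains
  ;; the old list.
  (define (choice-point optimize fallback asm)
    (cons (optimize asm)
          (lambda () (fallback asm))))

  ;; undo = force the retry thunk of the offending choice point
  (define (retry cp) ((cdr cp)))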
on the other hand, working purely algebraically does have the
advantage of simplicity, but it requires the explicit construction of
inverse operators.

Entry: literal opti
Date: Sat Feb 17 11:33:49 GMT 2007

instead of making pe operate on DUP MOVLW, let's make it work on
MOVLW only, so the extra SAVE is not necessary.

hmm.. i'm going in loops. the thing is that i'm using the literals in
the asm buffer really as a compile time stack. simply making the
partial evaluation respect 'save' would keep that paradigm working.
otherwise the DUP in front of MOVLW (DUP MOVLW) needs to be handled
explicitly every time, by a recombining DROP operation, which is
really no different from handling SAVE properly... so back to the
original solution.

to keep everything as pattern matching macros, i could also run an
explicit recombination after the literal operations.. quick and
dirty. wait a minute. i can just dump code in the asm buffer, and add
a bit to the pattern macro to check for this and execute it. then the
only problem is: quoted code or primitives? probably primitives are
best, since they are already packed into one item and don't need
'run'. ok. that seems to work just fine :)

Entry: monitor
Date: Sat Feb 17 16:22:13 GMT 2007

ok.. seems i'm almost at the point where i can compile the full
monitor code. some things are missing, like the chip specific
configs, but i can see that the partial evaluator is going to help
quite a lot to keep things simple: more things can be configured in
the toplevel forth source code file instead of a lisp style project
file.

something that needs to change though is support for 'org'. this
probably means that assembly code needs to be tagged somehow. ok. org
is simply solved by embedding (org ) in binary code.

Entry: intelligent then
Date: Sat Feb 17 19:59:12 GMT 2007

since i don't exactly remember what the code does, and i can't read
the old 2.x code just like that, let's decipher it. the problem is
something like this:

  l4: btfsp 1 TXSTA 1 0
      r bra l5
      r bra l4
  l5:

which comes from

  begin tx-ready? until

which expands to

  begin tx-ready? not if _swap_ again then

the important part is the 'then', which should decide to flip the
polarity of the skip and the order of the two jumps IF the first one
corresponds to the symbol on the stack. this works not only for
branches, but for any single instruction following the forward
branch.

ok. implementation. this doesn't fit the pattern matching shoe, since
the label on top of the stack needs to be incorporated in the check.
however, it is possible to just compile the 'then' and perform the
optimization afterwards, which is possible using a pattern matcher.
ok. this works. i don't check the label though.. should do that, or
prove that it can't be anything else..

Entry: reverse accumulate
Date: Sat Feb 17 20:27:38 GMT 2007

now, something that has been getting on my nerves is the reverse tree
stuff.. there is absolutely no reason for it. the original reason was
to split code into chunks of forth idioms, but i sort of lost that...
this whole reverse tree stuff makes things too complicated, so it has
to go. temporarily i will take out the reverse-tree function.

ok. this seems to have worked. a lot of code is a lot simpler now..
no, there's still a bug. fixed.

Entry: tip
Date: Sat Feb 17 23:44:33 GMT 2007

  (require (lib "1.ss" "srfi"))

yep. sometimes it takes a while to figure out the small things..
another thing: srfi 42 is about list comprehensions (loops &
generators). seems worth a look.
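for reference, the kind of thing these give you -- assuming the plt
srfi collection ships 42.ss; the list-ec form is from the srfi-42
spec, i haven't used it here yet:

  (require (lib "1.ss" "srfi"))    ;; list library
  (fold + 0 '(1 2 3))              ;; => 6, like 'accumulate' above

  (require (lib "42.ss" "srfi"))   ;; eager comprehensions
  (list-ec (: i 5) (* i i))        ;; => (0 1 4 9 16)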
Entry: time to upload
Date: Sun Feb 18 08:37:04 GMT 2007

looks like stuff is in place to start dumping out hex files. so i
need to make an effort not to fall into the same trap as before: it
would be nice to have cat completely in the background, and do
everything from the perspective of the target system. the easiest way
to do this is to use the current debug interpreter, and plug in a
proper 'interpret' mode for interaction.

yes, here there is some confusion. what about interpret mode? do i
switch to compile mode explicitly? i kind of like the colorForth
approach where there is only editing and commands, no command line
editing. everything between : ... ; is always compile mode. the
tricky stuff is what's before that, because i completely rely on
conditional compilation for constants etc.. but constants are really
the only exception. if i make an interpret-mode equivalent of
constant, then i could fake that. on the other hand, a proper compile
vs interpret mode might be a better solution. it is definitely
cleaner. so we converge on this?

  -> compile mode     = exactly the same as what's in files
  -> interaction mode = all the rest

implemented as 2 coroutines.

Entry: state
Date: Sun Feb 18 08:59:04 GMT 2007

at this time, it becomes rather difficult to maintain all the state
on the stack, so i probably need to move to a more general state
monad. basically what i had before in 2.x, but without executable
code. first, let's see what state needs to be accumulated:

  - assembler buffer
  - target dictionary
  - forth code log?

data necessary in different modes:

  compile:   asm buffer
  assemble:  asm buffer + target dictionary
  interpret: target dictionary

i can probably avoid explicit monads (i don't know how to really do
that: i'd have to lift a lot of code!), and just use a main driver
loop that runs the applications with the dictionary dipped.

what i have is a proper class based system:

  - classes are cat dictionaries (implemented as hash tables)
  - inheritance is based on chaining these dictionaries
  - objects are association lists

so that's for later. i'm in no need of objects with encapsulated
behaviour. the only thing i need is a local scope, so it's really
just used as a data structure. this means i can start writing the
main loop of the program, which is basically written as a method
bound to state. the thing i need to be careful about though is tail
recursion. this works with 'invoke'.

now that i'm here.. looks like this is an interesting way to
implement the assembler too: write an object that's a list, and use a
'comma' opcode to compile instructions. thinking about this, there
are really 3 major ways of symbol binding:

  - method: aggregate
  - lexically nested
  - dynamically nested

ok. brace for impact. going to do the asm 'object' thing. ok...
unresolved yet. this is too convoluted, precisely because of the
recursive calls. i'm still thinking dynamic binding here.. but
there's something to say for the idea.. trying again..

TODO
  - i need a better way to create a compiler for compositions:
    (register, parse, name)
  - should have a state base class with just: self self@ invoke

ok. done.

Entry: passing state to quotations
Date: Sun Feb 18 15:32:47 GMT 2007

now for code quotations. how to recurse? the problem is that if
quotations are executed using 'run', they will not obtain the state,
so they need to somehow be wrapped such that running them passes
along the state. is that at all possible? yes: using some kind of
continuation passing,
instead of simply wrapping the code in 'dip'. so:

  - quoted programs need to be parsed recursively
  - they need to be modified such that running them results in the
    object being loaded on the stack
  - it is not possible to override every word that performs 'run' to
    incorporate this behaviour
  - this trick is only LEXICAL: no dynamic binding of words, only
    dynamic passing of state

same old same old.. this goes way back :) the problem is of course in
the shielding. as long as every primitive is really shielded from the
state, there is absolutely no way to access it. so (blabla) dip is
not a good approach. it should be hidden but accessible, not
shielded.

let's do this manually for now: when you want to use quoted code in a
method definition, you have to explicitly parse, compile and invoke
it. the default will be globally bound code only, and shielded
execution, for simplicity. the alternative is to compile quoted code
as a method (recursive parse-state). this is kind of strange since
the invocation has to happen manually: no 'ifte' for example. so
unless i find a way to solve the 'ifte' problem and other implicit
'run' calls, there is no way to do this automatically: this is really
a modification to the core of the interpreter.

so i am going to let go of the scary bits, and conclude:

  * only flat composition is done automatically
  * recursive composition is possible using 'invoke'
  * quoting method code is done manually using a special
    parser/compiler

so it all remains pretty much a state monad. some special functions
can be thrown into the composition to act on the state through some
interface, while the rest is 'lifted' automatically.

Entry: fixing amb
Date: Sun Feb 18 16:15:23 GMT 2007

postponing the real work, i can try to fix amb to make it operate
only on the assembler store. what i need to do to make this work is
to return the continuation explicitly. so amb will do:

  amb-choose ( c1 c2 handle -- c1/c2 ) + effect of handle

here handle will store the continuation on a stack somewhere if c1 is
chosen. if this continuation is called, c2 will be chosen without a
handler.

ok. looks like it's working. still need to strip out the
continuations in the assembler though. done.

Entry: the app
Date: Sun Feb 18 18:14:22 GMT 2007

time to write the main loop:

  - based on the store monad containing:
    - asm buffer
    - forth input stream (per line)
    - state memory
  - written from the target perspective
  - compile mode / interpret mode

ok.. seems i'm at least somewhere. now i need to think about the
design a bit more.. the state stuff is encapsulated in a small driver
loop, the rest is still functional.

Entry: byte continuations
Date: Sun Feb 18 19:07:52 GMT 2007

i was thinking about a way to use more highlevel functions in the
8bit forth. obviously, a jump table can be used to encode jump points
as bytes. but why stop there? the return points can be mapped also,
giving the possibility of encoding the return stack in bytes too, as
long as code complexity is small enough. the compiler could do most
of the bookkeeping. this would make sense in a setting where the code
is simple but the number of tasks is big, since that needs a ram
return stack, which is better implemented as a byte stack anyway.

Entry: application
Date: Mon Feb 19 09:30:44 GMT 2007

some remarks. is bin needed? probably not.. just keeping the
assembler around and generating assembly on the fly is probably best.
the basic editing step is:

  - switch to compile mode, enter/load forth code
  - switch to interpret mode -> code is uploaded

cat should only be for debugging. ok, so CPA = forth compile mode.
this is to edit the asm buffer using forth commands. the asm buffer
is stored in the 'asm file. in CPA mode it is possible to test the
assembly by issuing 'pb'. however, this doesn't use the stored
dictionary.. need to fix that.

ok, what i have now are 2 modes, switched using ctrl-D:

  * compile mode = compiled forth semantics ONLY, not even special
    escape codes for printing asm etc.
  * interpret mode = simulated target console. the target is seen as
    what it actually is + some interface to a server. the language
    used is forth lexed, but piggybacks on cat words.

looks like it's working fine this way. let's keep it.

Entry: literals again
Date: Mon Feb 19 12:05:39 GMT 2007

ok, i need to do this properly. back to the unlit strategy.
basically: try to recover literals one by one, instead of using
massive combined patterns. let's try this:

  lits asm>lit asm>lit

ok. seems to work. still needs some explicit code that might be
optimized, i.e. the literal patching. but i can live with it like it
is now..

Entry: inference
Date: Mon Feb 19 14:27:02 GMT 2007

it should be possible to infer more about the state of the stack,
given there are no jumps from arbitrary places, which is a sane
assumption.

Entry: another day over
Date: Mon Feb 19 18:01:19 GMT 2007

and i'm running it in the MPLAB simulator. it generates correct code
at first sight. so, time to hook it up :) still some features
missing: one of them is proper byte/bit allocation. so TODO:

  - host side monitor
  - state save/load

ok, i'm getting bytes back from the monitor running on the chip. time
to start writing the monitor code.

Entry: dynamic code
Date: Tue Feb 20 00:41:11 GMT 2007

cleaning up a bit now. funny, what i need now is dynamic code :)
anyways, it's easy enough now that i have a general purpose store.
all kinds of hooks can be added here, which can be saved later. they
all go in symbolic form. to make them full circle (symbolic words in
symbolic words) i probably need to add some kind of explicit
interpreter..

Entry: parse
Date: Tue Feb 20 09:55:04 GMT 2007

to wake up today, i'm going to change all the 'parse' stuff to
'compile', since that's what it really does: parse+compile. 'bind'
would maybe be better. thesaurus. well, 'compile' is really quite
understandable.. so let's keep that. maybe i better make compile =
(bind + parse), and turn 'bind' into a proper CAT function? this way
the whole semantics and parsing thing can be handled in CAT code.

the other thing to think about is CPS. does it make sense to use
that? i'm still thinking about run vs invoke. maybe it's better to
just keep it explicit until my current approach takes more shape and
patterns fall out..

change 'unquoted' to 'primitive'

  parse: ( source binder -- compiled )
  find:  ( symbol -- delayed/primitive )

i changed names to the following protos: a couple of syntaxes:

  cat-parse state-parse

and a lot of namespaces:

  cat-find -find

ok, i need to clean up this stuff later.. maybe tonight.

TODO:
  - fix the toplevel interpreter stuff + reload
  - on reload, macros should be reloaded from source files also.
    means compile + ignore asm.
  - fix proto of binder (+ parser?)
  - CPS with dynamic variables?

Entry: duality
Date: Tue Feb 20 13:54:52 GMT 2007

something interesting happened here..
'state-parse' is now implemented as a delayed parse operation, which
exposes the semantics:

  parse: list of things -> list of primitives
  find:  thing -> primitive

generalizing find's symbol -> primitive semantics. i could probably
find a better name, but let's stick to this since it's all over the
place. from now on, 'find' means: map a "thing" to primitive
behaviour, and 'parse' means: map a collection of "things" to a LIST
of primitive behaviours, representing the functional composition of
these primitives.

in case of a 1-1 relationship between source syntax and compiled code
in list form, parse is really just (map find source). this is one of
the properties of CAT source code. so there is something very simple
hidden in all this..

---------------------------------------------------------------------

* PARSE: handle the structure or SYNTAX of source code. this will
  translate source code to a very basic COMPOSITE CODE
  representation, which is a list of primitive code elements,
  effectively reducing any form of syntax to a simplified one. in
  doing so, parse can use 'find' recursively to translate primitive
  source objects to primitive machine code.

* FIND: handle the meaning or SEMANTICS of source code. this will
  take a source code atom and translate it to PRIMITIVE CODE,
  possibly using 'parse' recursively to translate atoms comprised of
  structured source code.

this is the source code / compiled code duality.

  parse  code collection        <->  interpret primitive code list
  find   semantics of code atom <->  run primitive machine code

---------------------------------------------------------------------

here 'machine code' is the code representation of the underlying
machine, which in this case is scheme, with primitives represented as
functions operating on a stack of values.

this is just eval/apply in disguise. the difference being that for
lisp, the functionality is represented by the first element in a
list, while here it is a composition.

  eval: (head more ...) == (apply (eval head) (list (eval more) ...))

Entry: next actions
Date: Tue Feb 20 17:13:23 GMT 2007

run time state? or: where to store the file handler? do this
non-functionally, since it's I/O anyway.. why not? that seems to
work. got ping working too. and @p+

the next couple of things should be really straightforward, but i am
missing one very important part: I CAN'T USE QUOTED PROGRAMS!!! so i
need to do something about that.. again, as far as i understand, the
problem is in 'run'. if you hide information by 'dipping' the top of
the stack, there is no way to get it back, unless you can bypass this
mechanism somehow. the thing that has to change is the interpreter.
ok. it should be possible by doing something like

  '(some app code) compile-app (for-each) invoke

making sure that the dict gets properly tucked away. the nasty thing
is that this is dependent on the number of arguments the combinator
takes:

  (invoke-1 swap run)
  (invoke-2 -rot run)

invoke is bad for the same reason... something is terribly wrong with
the way i'm approaching this.. no solution. too many conflicting
ideas:

  1. i need combinators to "just work"
  2. i need to be able to run non-state code properly

possibilities:

  - patch all quoted code -> parsed as state code
  - do not patch combinators

maybe i should just try? this is crazy... i just don't get it.
heeeelp! i don't know how to solve it..
but i can work around it :)

basically, the problem i have is that i can't use higher order
functions in combination with the state abstraction, because the
abstraction effectively uses a different kind of VM. to solve it, i
need to either accept that i have to change the VM, or just make the
data i'm using persistent. there are several options:

  * turn the n@a+ and n@p+ into target interpreter instructions. this
    just makes them static, so i do not have to use references to
    dynamic state in the core routines. might be the sanest practical
    solution.

  * just forget about the functional approach to the dynamic state
    and store it in a global variable. a bit drastic, and i will
    probably regret it later, since it feels like giving up on a good
    idea at the first sight of real difficulty...

i will go for the first one so i can at least finish the interaction
code.. this has the advantage of making the monitor itself a bit more
robust, since it will provide full memory access. one thing i didn't
think about though: making ferase and fprog primitives will make them
a bit less safe (they might end up being fed random data). i should
add a safety measure. ok, that seems to work.

Entry: monitor update
Date: Tue Feb 20 21:52:02 GMT 2007

triggered by some unresolved conflict between hidden dynamic state
and the interpreter, i made most of the functions in the monitor
available as interpreter bytecode. this makes it a bit more robust,
and apparently a whole lot faster also. still to fix is some kind of
safety measure to prevent the erase from being triggered accidentally
by some unlucky combination of input data. a password if you want :)

Entry: monitor progress
Date: Wed Feb 21 11:46:43 GMT 2007

got most of it working this morning. next actions:

  - variable/bit alloc
  - save/restore state
  - sheepsint core compile + macros

i do rely a bit on parsing macros in the original sheepsint 3.x code.
that's not so good. time to think about working out some abstractions
a bit better. for isr:

  flag high? if flag low handler retfies then

now variables/bits. ok, no bits.. do that later, sheepsint doesn't
use them: explicit allocation. next:

  - state loading on startup
  - interrupt handlers

Entry: getting tired
Date: Wed Feb 21 23:55:34 GMT 2007

yes, time to get it done.. overall, i'm quite happy with the result.
it's a lot better than the previous two. i can't really see much
further from here, other than elaborating towards higher abstractions
(a different language), and fixing some simple jump related
optimizations. the bad guy is quoted method code, which has a strange
conflict of concepts. more on that later.

another thing i miss is inline cat code, i.e. for generating tables.
i think i better do this in a different file, and only in scheme: no
more intermediate cat-only files.

  1 1.1 16 table-geom

then, the lack of proper run-time semantics is kind of weird. the
partial evaluator replaces this, but in an implicit manner: not
everything is accessible, and the bit depth is different.

about literal opti: still not completely happy, since the patterns
should do the literal preprocessing automatically.

looking at pic18.ss gives me a warm fuzzy feeling :) most of the
knowledge is encoded in 2 patterns: assembly substitution patterns
and recursive macros. language support is encoded in 2 more: an asm
state monad and a writer monad. the thing which would help a bit is
reducing the redundancy in the rewriter macro specification. the way
it is right now is very readable, but maybe a bit too much clutter.
on the other hand, it might be a bit of overengineering.

Entry: monads again
Date: Thu Feb 22 00:57:22 GMT 2007

http://en.wikipedia.org/wiki/Monads_in_functional_programming

  Alternate formulation

  Although Haskell defines monads in terms of the "return" and
  "bind" functions, it is also possible to define a monad in terms
  of "return" and two other operations, "join" and "map". This
  formulation fits more closely with the definition of monads in
  category theory. The map operation, with type
  (t -> u) -> (M t -> M u), takes a function between two types and
  produces a function that does the "same thing" to values in the
  monad. The join operation, with type M (M t) -> M t, "flattens"
  two layers of monadic information into one.

  The two formulations are related as follows:

  (map f) m  ≡  m >>= (\x -> return (f x))
  join m     ≡  m >>= (\x -> x)
  m >>= f    ≡  join ((map f) m)

-- isn't that what i'm doing? 'map' is my 'lift': it lifts a function
operating on only a stack to one operating on a stack + state
information. 'join' is e.g. the concatenation of lists in the writer
monad i'm using for assembly. 'return' i don't use? yes i do: it's
how i initialize state, e.g. by loading an empty assembly list on the
stack, and how some functions return a packet of assembly code.

http://citeseer.ist.psu.edu/wadler92essence.html

the basic idea in monadic programming is this: a function of type
a -> b is converted to one of type a -> M b (monadic form). i.e. for
assemblers: a function '(movlw 123) is converted to '((movlw 123)).
'bind' is there to compose 2 functions in monadic form. in the
example of assemblers, 'bind' does the concatenation of the assembly.

Entry: higher order pattern matching
Date: Thu Feb 22 09:56:59 GMT 2007

meaning: match pattern generation based on templates. it seems to
work, but involves double quoting, which is a bit hard to wrap your
head around.. there's one thing i've been trying to understand for a
while: how to do this

  `((['dup] ['movf a 0 0] ['lit] ,word)
    (,'quasiquote ([,opcode ,',a 0 0])))

without having to use the "quasiquote" symbol. maybe i should have a
look at paul graham's "on lisp" again... ok, i think i got it:

  ;; ORIGINAL: explicit quoting of the quasiquote symbol
  `((['dup] ['movf a 0 0] ['lit] ,word)
    (,'quasiquote ([,opcode ,',a 0 0])))

  ;; WORKS: using a name binding to avoid double quoting
  `((['dup] ['movf a 0 0] ['lit] ,word)
    (let ((opc ',opcode)) `([,,'opc ,,'a 0 0])))

  ;; MAYBE WORKS: pattern generated is (quasiquote (unquote (quote
  ;; thing))) instead of (quasiquote thing)
  `((['dup] ['movf a 0 0] ['lit] ,word)
    `([,',opcode ,,'a 0 0]))

yep, it works..

----------------------------------------------------------------------

the trick is to generate this:

  (quasiquote (... (unquote (quote thing)) ... ))

instead of attempting to generate:

  (quasiquote (... thing ...))

----------------------------------------------------------------------

to really understand this, it might be interesting to implement
quasiquote.

  http://paste.lisp.org/display/26298

another thing to note: this merging of quoted/unquoted stuff is what
the syntax macros actually do a lot better, automatically..

Entry: interpret mode
Date: Sat Feb 24 10:15:38 GMT 2007

i got the synth core to run. next actions:

  - interpret mode
  - setting interrupt vectors
  - note table
  - figure out line voltage + impedance
  - identify

interpret mode seems to +- work. i'm using overriding: if a word is
not in the target dictionary, it is executed on the host. maybe this
will lead to some obscure problems?
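the overriding rule itself is a one-liner. a hedged sketch, with
'target-execute' and the dictionary format hypothetical:

  ;; look the word up in the target dictionary first; if it is not
  ;; there, fall back to host execution.
  (define (console-word word target-dict host-exec)
    (cond ((assq word target-dict)
           => (lambda (entry) (target-execute (cdr entry))))
          (else (host-exec word))))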
maybe i really need to separate the 2 a bit better, and use an
explicit debug mode.

then state association. i should have an 'identify' command, so a
connected target chip can tell the host which state file to load. but
how to implement this? i could reserve some space in the boot sector
for this, actually.

so, applications.. i was thinking about keeping the monitor
independent. i don't think this is a good idea, since the boot code
is really application dependent. so an application is everything,
including the monitor.

Entry: syntax macros
Date: Sat Feb 24 21:45:46 GMT 2007

been playing a bit with macros.. i don't really understand them fully
though.. especially the use of local syntax in syntax expansion etc..
also, it's really better to move the preprocessor hook out of
pattern.ss -- DONE.

Entry: 16bit code
Date: Sat Feb 24 23:19:13 GMT 2007

looking at the sheepsint controller code.. there is no real reason
not to make it 16-bit. all computation i'm doing is on 16bit numbers,
and the overhead of switching everything to 16bit is probably minor.
todo:

  - 16bit interpret mode
  - a way to map symbols

Entry: toplevel workflow
Date: Sun Feb 25 08:41:42 GMT 2007

mainly about program organisation. a program consists of these parts:

  1. boot block (first 64 bytes)
  2. monitor
  3. fixed application code
  4. variable application code

a project is a directory. 2. should be made as standard as possible,
and i really shouldn't care about the size, since that only matters
for code protection. 3. should, if possible, stay on the target too
(mark). 4. should have a 'scratch' character. empty = erase until the
previous 'mark', but no further than the monitor code:

  -> replace dictionary with saved dictionary
  -> round 'here' up to the next 64 block
  -> erase from there

DONE

also, the reset vector should jump to #x40 -- DONE

and i need to find a way to update the monitor code on the fly:

  -> either copy the monitor as a whole (since it's
     place-independent)
  -> or copy a minimal copy routine.

as good as DONE

reloading the core with minimal effect on state = 'core'

setting interrupt vectors: should save 'here' etc.. -> interesting,
since it really involves a run-time assembler stack. maybe it does
make sense to de-scheme the assembler..

Entry: state monad
Date: Sun Feb 25 10:46:05 GMT 2007

http://www.ccs.neu.edu/home/dherman/research/tutorials/monads-for-schemers.txt

let's see. the problem i had was using 'for-each' in state code.
because of the way the state needs to be passed, all higher order
functions need to be aware of it. i just need a special 'for-each'.
the way around this is to use the stateful functions ONLY to access
the data, and use pure functions to do the manipulation, i.e. 'logic'
vs 'memory'. currently, in the interpreter, this works fine. it can
seem like a drag, but in fact it is a good thing: functions are not
unnecessarily infected by state.

so, monads are about order of execution, and really central to a
compositional language, where there is only order! this in contrast
to lambda languages, where there is an intrinsic parallelism in the
order of evaluation. about interpretation: if you see a concatenative
language as a series of sequential operations, it is 100% serial (the
way it is implemented). however, if you see it as composition of
functions, there is no evaluation order, because there is no
evaluation, only composition.

i need to look into list comprehensions etc...

Entry: call conventions : de-scheming
Date: Sun Feb 25 10:56:48 GMT 2007

instead of doing some real work today, i'm going to have some fun.
make the interpreter more reflective, meaning: convert all important
routines to operate on a stack. actually, that might not be a good
thing.. i'm using non-stack functions for convenience, so primitives
are simpler to code, and i can use the lambda abstraction instead of
combinators.

Entry: time to do some work
Date: Sun Feb 25 13:36:34 GMT 2007

  - interrupt vectors DONE
  - a/d converter for board
  - 16 bit interpreter
  - constants and variables

Entry: alan kay oopsla lecture and stuff
Date: Sun Feb 25 21:36:56 GMT 2007

  - 'core' should not destroy any DYNAMIC state AT ALL, only static
    background.
  - every restart is a failure.
  - need better debugging.

i need to be more observant of things that are annoying, and fix them
immediately instead of chasing some short-term goal. the thing i'm
trying to do is to build a better tool, not to finish some product. i
need to always try to distill the important core idea, instead of
bashing away to 'just make it work'..

Entry: variables
Date: Mon Feb 26 09:52:27 GMT 2007

i've got a problem with variable names: the dictionary does not make
a distinction between flash and ram names, but the interpreter does
need to treat them differently. this is solved properly by using two
dictionaries, or a nested dictionary. probably 2 dictionaries is
better.. requires a little rewrite though.

ok. i started to rewrite the assembler to separate assembly from
dictionary operations, so it's easier to make the dictionary an
abstract object. then i need to change to recursive operations:
'(ram here) instead of 'here etc..

seems to be fixed. the implementation is abstracted, and currently
solved as a simple sub-dictionary. maybe move this to 2 separate
dicts..

Entry: analog -> digital
Date: Tue Feb 27 08:25:45 GMT 2007

i don't know what's wrong, but it doesn't work properly. but first,
documentation. let's copy and paste the previous one.

Entry: reasons
Date: Tue Feb 27 12:12:44 GMT 2007

all i want is lisp, but:

  - cat is terse
  - cat is more editable
  - forth works on small things
  - forth with linear data is predictable

the first two are from the point of view of writing software and
interacting with a system; the last two are practical solutions to
needing a lot of programming power under constraints (small or RT).

something i learned though: it's bad to waste time writing
combinators. in BROOD 3.x i solved this by writing some core things
in scheme. basically they are combinators: interpreters for certain
kinds of code propagating state. in PF and previous forth experiments
i ran into this problem several times: trying to express something
which really needs 'hidden state'. mostly i solve this using global
variables, which is ok as long as there is no pre-emptive
multitasking going on.

  SCRIPT    = toplevel organization: large amount of trivial code
  ALGORITHM = small amount of nontrivial code

the idea is to use a scripting language to glue together algorithms,
while the nontriviality of the algorithms is hidden, and the
connectivity between them is made manageable by the features of the
scripting language.

Entry: linear lists -> PF
Date: Tue Feb 27 12:18:46 GMT 2007

yes, it does make sense to rewrite malloc. malloc is not what i need
if i'm using linear data structures. i don't need free, only free
lists. and yes, it would be cool to have access to the page
translation table too :)

Entry: compilation is caching
Date: Wed Feb 28 01:41:21 GMT 2007

compilation is really caching.. maybe i should find a way to add
dynamic loading of code without a full image reload, by using a
custom made 'promise'.
one that can be un-cached whenever a new word (or group of words) is
defined, so code can be re-bound.

more about the caching.. this means that symbolic code is really the
only representation of code. the compiled representation is an
invisible optimisation, and should be hidden from the programmer. if
i replace all atoms with a struct containing their symbolic version
and a possibly cached behaviour, i can re-interpret on the fly.. this
should give all the benefits of late binding, without the drawback of
having to reload the whole image all the time..

however, cache invalidation probably needs to do this anyway:
invalidate a whole dictionary of code, unless all references can be
found somehow.. probably not. so what's the difference? what would a
proper cache offer? uncompilation for one.. it's probably good to
keep the symbolic data and environment around..

Entry: no more quotation
Date: Fri Mar 2 14:08:38 GMT 2007

quotation sucks.. and it's really not necessary if i install a
default semantics. my previous argument was: no default semantics (no
defaults!) because i need more than one.. however, everything will
run on the VM as primitives, so there is no real good reason to have
no defaults: the symbolic representation might be "the bottom line",
with compilation viewed as optimization/caching.

what needs to be done to fix this? i probably need a better object
representation, a more abstract one. an object has properties, one of
them being its cached rep. so.. what is an object?

  - syntax, form.. this is the 'data' part
  - semantics, in the form of an associated interpreter object

optimizable properties:

  - cached semantics

this is really just OO. need to look at smalltalk.. maybe it's good
to have some ideas propagate. data=object data, interpreter=class.

summary: the idea is to parse into something which retains the
symbolic representation, so semantics can be late bound, and
compilation is still possible, but is done with memoization. clearing
the cache is then possible by scanning the entire memory from the
root and invalidating some bounds. this trick can also be used in PF:
a linear language with late binding but aggressive memoization.

hmm.. i read something in "the essence of functional programming"

  http://citeseer.ist.psu.edu/wadler92essence.html

about values versus processes. to paraphrase:

  - in lambda calculus, names refer to values
  - in compositional languages, names refer to functions

the first one has only values (with functions a special case of
values), while in compositional languages there are only functions,
with values represented as functions. going the intuitive route: a
name is a function, and only a function. an object is only a
function. it has an associated action. data is represented by a
generator.

Entry: a new PF
Date: Fri Mar 2 15:22:12 GMT 2007

summary:

  - object oriented: objects are functions. each object has a
    'syntactic representation S' and an 'associated interpreter I'.
    (the result of applying I to S is X, an executable program which
    acts on a data stack.)
  - the basic composite data structure is a CONS cell.
  - composite data is linear: no shared tails.
  - the interpreter needs to be written in (a subset of) itself, to
    allow easy portability (to C).

problems: all the problems are related to the linearity of the
language. to make things workable, some form of shared structure
needs to be implemented. however, this can lead to dangling
references.
  -> continuations / return stack
  -> mutual recursion

if i clean up the semantics such that dangling pointers are allowed
in some form, like 'undefined word', this should be manageable. to
keep things fast, this needs to be cacheable: it should be possible
to detect whether an object is live etc..

to rephrase: it looks to me like a completely linear language is
really impractical. how do you tuck away non-linearity so behaviour
is still real-time? i keep running into the idea of 'switching off
the garbage collector'.. decompose a program into 2 parts: one that
uses a nonlinear language to build a data/code structure, and a
second one that runs the code. trapped inside the brood idea:
tethered metaprogramming.

  -> a predictive real-time linear core (linear forth VM + alloc)
  -> a low priority nonlinear metaprogrammer (scheme)

together with the smalltalk trick to simulate the real-time linear
core inside the metaprogrammer.

the VM:

  - no if..else..then: only quotation and ifte
  - no return stack access: use quotation + dip

this can be a lot more general than for next gen PF. i can run this
kind of stuff on a microcontroller too, to have a different language:
one with quotation, and no parsing words.. the idea is to make the VM
as simple as possible. i already have a way to implement a native
forth. maybe the catkit project should be just that: CAT is the thing
that runs on the micro? linear CAT?

Entry: linear CAT vm
Date: Fri Mar 2 15:59:41 GMT 2007

  - run: invoke interpreter
  - choose: perform conditional
  - quote: load item from code onto data stack
  - tail recursion: this is really important
  - continuations (return addresses) are runnable

using variable bit depth? code word bit depth is determined by the
number of distinct words. an 8 bit machine is for small programs,
while a 16 bit machine is for larger programs and/or programs that
need to do more math. something in between is also possible; most
practical is 12 bit. but the most important thing is: the data stack
needs to be able to hold a code reference.

for the 18f, i think it's best to go to 16 bit. the forth is for
inconvenient features, while the highlevel language should be just
that: a highlevel language. in order to properly implement tail
recursion, the caller should be responsible for saving the
continuation.

Entry: direct threading
Date: Fri Mar 2 16:33:47 GMT 2007

i'm trying to write an interpreter with these properties:

  - proper tail calls (caller saves continuation)
  - continuations can be invoked by 'RUN'
  - direct threading

in direct threading, threaded code is a list of pointers that point
to executable code, and a continuation is a pointer that points to a
list of such pointers. so yes, these constraints can be satisfied:

  - composite = array of primitives
  - continuation = composite
  - composite code can be wrapped in a primitive using a simple
    header

  TBLPTR -> composite code
  PC     -> primitive code

see direct.f -- summary: the most important change is threaded code +
proper tail calls, by moving the continuation saving to the caller.

Entry: linear languages
Date: Fri Mar 2 19:13:23 GMT 2007

http://home.pipeline.com/~hbaker1/Use1Var.html

  "A 'use-once' variable must be dynamically referenced exactly once
  within its scope. Unreferenced use-once variables must be
  explicitly killed, and multiply-referenced use-once variables must
  be explicitly copied; this duplication and deletion is subject to
  the constraint that some linear datatypes do not support
  duplication and deletion methods.
  Use-once variables are bound only to linear objects, which may
  reference other linear or non-linear objects. Non-linear objects
  can reference other non-linear objects, but can reference a linear
  object only in a way that ensures mutual exclusion."

what he describes a bit further on is an 'unshared' flag: a
refcount = 1 flag, but it looks like this is more in the context of a
mark/sweep GC. an attempt to make some patterns automatic? reverse
list construction followed by reverse! is an example of a pattern
that might be optimizable if the list has a 'linear' type: the
compiler/interpreter could know that 'reverse!' is allowed as a
replacement for 'reverse'.

so as far as i get it, baker describes a 'linear embedded language'.
linear components are allowed to reference non-linear ones, but vice
versa is not allowed without proper synchronisation. so in an RT
setting, this means the only thing that is allowed to run in the RT
thread is the linear part, while the nonlinear part can play its game
outside this realm. so, again:

  - high priority linear RT core (forth)
  - pre-emptable nonlinear metaprogrammer (scheme/cat)

the linear part contains only STACKS + STORE. the nonlinear part can
contain the code for the linear part. the compiler runs in the
nonlinear part. the nonlinear part is not allowed to reference CONS
cells in the linear part. this can be implemented entirely inside of
PLT. on the other hand, having this structure independent of a PLT
image makes it more flexible: the core linear system should be able
to do its deed independent of the metasystem's scaffolding.

baker calls my 'packets' nonlinear types: names with management
information (reference counts): a strict distinction is made. this
allows a nonlinear type to be decoupled from its (possibly linear)
representation object. in PF this means: packets are references to
linear buffers. the result is that the underlying representation can
change, ala smalltalk's 'become'.

conclusion:

  - cons cells are linear
  - packets are nonlinear wrappers for linear storage elements
  - packet access: readers/writers access protocol: mutation is only
    allowed when there are no readers (RC=0). (functional ops)
  - 'accumulation ops' use shared state + synchronized transactions.

Entry: standalone forth
Date: Sat Mar 3 13:25:46 CET 2007

maybe he didn't get it, but writing this compositional language and a
standalone forth are conflicting ideas.. it's not so hard to give up
on parsing words, other than true quoting words: there will be only
one left, let's call it '{'. what's worse is that i need to dumb it
down a bit. i'd rather define a new language, but an ANS forth might
be better for teaching, for the simple reason that i don't need to
write such an extensive manual. maybe it still makes sense to run
both languages on the same VM?

another forthism that's not really necessary: since i'm sharing code
between the lowlevel subroutine threaded forth and the direct
threaded forth, why not make the VM primitives equal to subroutine
threaded forth, instead of having them directly linked to a NEXT
routine? in other words, why not have an explicit trampoline? this
will be slightly slower, but uses less code, since the primitives
don't need a separate binding, which would just call the other code
anyway.

conclusions:

  - interpreter loop allows primitives == native code (STC forth)
  - 'enter' uses short branch -> code needs duplication
  - primitives need no IP saving!!
(the compiler needs to distinguish between primitives and highlevel
words)

the last one is a consequence of doing continuation management on the
caller side: the caller cannot be agnostic! it should be possible to
pass this information to 'enter' somehow, so enter can save/restore
depending on some flag. carry flag? that's ok, as long as this
machine state is guaranteed to be saved.. however, in this case, the
primitive needs to call 'EXIT' when the carry flag is not set! so
still, some compiler magic is necessary, or all words need to
terminate with an EXIT call, independent of whether they end in a
tail call. this is a bit messy... let's try to summarize. the flag is
called the NTC flag: non-tail-call.

  - EXIT = leaves current context
  - WORD -> ENTER conditionally saves the context (carry flag)
  - PRIMITIVE: needs EXIT if the TC flag is set.

again.. there are 4 cases: PRIM/COMP and TC/NTC. what i'd like is to
solve PRIM/COMP completely in 'enter', such that the interpreter can
be agnostic about highlevel words.

  an instruction = primitive + NTC flag

what does the NTC flag mean for the interpreter? nothing. it's just
extra information passed to 'ENTER': it means the rest of the code
thread can be safely ignored. the interpreter completely ignores it,
and just runs forever, assuming the code stream is infinite. all
threading changes are implemented by other primitives.

so, given the current implementation, one solution is to always
compile EXIT, together with a bit that indicates an instruction is a
tail call. this is not very clean.. the exit bit should be universal.
semantics: the bit indicates that the current thread can be discarded
BEFORE passing control to the primitive. then the primitive can
always just save the continuation. (a possible optimization is to
overwrite the continuation, but let's do the former first, since it's
conceptually simpler.) this is different in that the interpreter is
not agnostic about the return stack, but effectively implements
'EXIT'.

Entry: is code composite? run or execute? yin or yang?
Date: Sat Mar 3 16:37:37 CET 2007

in CAT it seems i've converged on using only composite code = list of
primitives as the quoted programs that can be passed to higher order
functions. however, original forth does not take this stance:
threaded code is a list of execution tokens, and execution tokens are
the canonical representation of quoted code when treated as data.

this is wrong. why? reflection becomes more difficult. the stuff on
the return stack is the saved IP. this should be "a real program".
the inner interpreter deals with arrays of primitives, and such
arrays can be wrapped in a primitive by prepending them with ENTER.
however, the data representation of code should really be composite:
no primitive address, but a composite address. primitives ==
internals. it's better to treat primitives as singleton composites
than to treat composites as primitives. in the inner interpreter, the
reverse view is better.

i think this view originates in original forth, and is mainly
historical: primitives came first, and composites were treated as
primitives. i can't think of another reason really..
conclusion:

  ------------------------------------------------------
  programs are composites = array of threaded primitives
  ------------------------------------------------------

i'm going to reflect this in the following change:

  - execute is reserved for primitives
  - run is reserved for composites

also, if you look at native code, the picture is pretty clear:
primitives are machine instructions, and you simply cannot 'execute'
them: they always need to be inside a code body. composite code =
list of instructions, referred to by an address. it's just the
same...

Entry: reflection
Date: Mon Mar 5 02:26:40 CET 2007

have to think about this a bit more. something strange is going on
with this primitive/composite thing. what about having only highlevel
code: composite code just links to more composite code.. there's no
way to plug in primitives here. for purely pragmatic reasons, using
primitives, with highlevel words wrapped in primitives, is workable..

Entry: essentials
Date: Thu Mar 8 16:57:26 EST 2007

  * symbolic -> ast + room for it
  * possibility to 'uncompile' an AST
  * use abstract types (structures) and 'variant-case' in the AST

Entry: delayed list interpretation
Date: Fri Mar 9 10:48:54 EST 2007

thinking about eopl: i need more data abstraction. car and cdr are
nice, but they really are quite lowlevel. there's too much
implementation leaking through. the asm monad is a good example of
abstraction, but code probably needs it too..

about using symbolic code and caching: parsing a list, it can be
either code or data, depending on how it is accessed. maybe it should
really have these 2 identities? if accessed using list processing, it
behaves as a list, but if accessed using 'run', it behaves as code ->
jit compilation cache. the benefit of this is that the semantics can
change. so a list is really an object with different identities. all
list processors should be modified to take an abstract list object.

Entry: platforms
Date: Mon Mar 12 18:54:28 EDT 2007

ai ai ai... i'm spending money again, surfing on ebay.. discovered
this nice ARM7 board on sparkfun, made by olimex. it has a 128x128
color lcd, usb client mode and ethernet. a dream platform for brood,
especially since this is THE standard 32bit chip, getting really
cheap too.. so i want: 8 bit PIC18, 16 bit DSPIC30 and 32 bit ARM7.

Entry: itch
Date: Tue Mar 13 12:47:23 EDT 2007

i want to start changing some abstract data type implementations..
the most important one is probably 'quoted program' or 'composition'.
this has to be distinguished from a chain of cons cells in that it
has more structure. a composition can always be converted to a chain
of cons cells, and a chain of cons cells can be converted to a
composition if

  1. it's a proper list
  2. some interpreter semantics is attached

so a quoted program is the above: a proper list with attached
interpreter semantics. the changes this requires are:

  - all data operations that modify lists need to accept the
    'composition' data type and convert automatically.
  - the parser needs to produce compositions instead of chained cons
    cells.

it's getting old.. but the structure of program (source compile
cached) is probably better written as (primitives), where each
primitive has its own semantics and cache:

  word (atom compile cache)

there are several options for word:

  (source thunk/cache)
  (source thunk cache)
  (source compile cache)
  (source compile env cache)

in general: do we want the environment to be explicitly specified, or
is an abstract representation of the binding operation enough?
one of the requirements is the ability to rebind, so at least cache and binder need to be separate, since the binder uses (in the current implementation) some mutable state. probably the following model is close enough to the current one (basically the same as 'delay' but with the possibility to clear the cache) to not need a lot of changes (or enable incremental ones) and will do what's required:

  prim = (source atom, interpretation thunk, cached compilation)

so:

  - code can be re-interpreted by just clearing the cache
  - all code, independent of semantics, can be specified as lists
  - a 'data' mutator can be defined that strips a list from all executable semantics

so i guess the conclusion is:

  - composite code is a concrete list of abstract primitives
  - primitives contain memoization info

this brings me to restate that in BROOD, the compiler itself is written using OO techniques with mutable state, but the target compilation is completely functional. the reason is this:

  - the host language is mostly about organization -> mutable OO
  - the target compiler is mostly about algorithms -> functional + monads

the main practical reason for using a functional approach in the compiler is the ability to work with continuations for very flexible control structures. the 'constant' part is implemented as an OO system.

Entry: reflection
Date: Tue Mar 13 13:46:25 EDT 2007

another thing i keep running into is the mixed use of 2 calling conventions: scheme N->1 and cat stack->stack. it would be nice to have scheme only provide primitives, and have all other utility code be out in the open. however, given the way some algorithms are implemented now, that is impractical. i can have all the reflection i want, but not necessarily from CAT, since that would make it harder to use scheme to implement the core of things.. maybe it's good to keep this in mind: CAT is just a minilanguage inside scheme, and all the things i need to bring out can easily be brought out if necessary. full reflection is not necessary yet. probably the CPS chapter in EOPL will make this a bit more clear..

ok.. getting rid of the parse/find abstract interface. it complicates things too much.. one thing i didn't think of is that 'find' maps thing -> word. so the compiler for a symbol + find is something that looks up a word AND dereferences the implementation. yep. i noticed it is really a good thing to use closures instead of explicit structures.. of course, this does mean that all the red tape moves to the other side: all things that provide closures need to do the binding.

it's not going as smoothly as expected: all through the code it is assumed a primitive is either a function or a promise. so i guess it's a good idea to change it now. the main problem is the 'find' as expressed above, since i have an extra level of indirection that distinguishes 'find' from 'compile', with explicit delayed compilation (interpretation) instead of implicit. i can probably work around it by providing:

  - compile lifting
  - special primitive registrars

i need to sleep over this.. current problem = pic18-literal: used in a lot of places. produces a primitive but should produce a word. i changed this, need to check. the rest should be straightforward.. also writer-register! is wrong, due to the lift not being wrapped.. maybe better to just lift words instead of prims? all this freedom!!

Entry: spaghetti
Date: Wed Mar 14 08:33:03 EDT 2007

the change above brings up some conceptual confusion.

  - a word is a representation of an atomic piece of code.
    it retains its source representation, and a translator which defines its semantics.
  - lifting is done on the primitives, not on the words. maybe that should change? NO
  - the pattern (register! name (atom->word name compiler)) has 2 occurrences of name. this is ok. the first one is an index, while the second one is there to recompile the word if necessary.

ok, it seems to work now..

Entry: reload
Date: Wed Mar 14 09:48:10 EDT 2007

i run into problems with redefining the structs: data lingering after a reload is not compatible with the new type predicates and accessors. this means code cannot survive a reload. this is a bit ill defined, so i need to make a decision.

  - if i don't redefine 'word' on reload, it can never be redefined, which is a bit of a nuisance.
  - the other solution is to redefine 'word' and change all the data on the stack to reflect this change: the stack is the only thing that survives a reload, so it needs to be properly processed. it looks like i need a temporary structure to solve this.
  - find a better way to implement reload.

the struct thing is really annoying. i need to find a solution to that soon. all the rest works with reload, even using a 'symbolic' continuation passing: after load, the repl loop itself is recompiled. maybe i should separate the files into init and update. that way it is possible to perform incremental updates.

ok. the solution seems to be to install a 'toplevel' continuation that is passed the entire application state (stack). 'load' can then be called with a symbolic code argument == continuation.

Entry: TODO
Date: Wed Mar 14 11:01:25 EDT 2007

got it hooked up. lots of things to fix:

  - pic programmer endianness bug for high word?
  - fix reload / scheme modules (using different files with include) DONE
  - create a monitor/compiler for the 16bit threaded interpreter

a compiler for the threaded code would:

  - map a list of words to their respective addresses
  - perform tail call optimization

i should go straight for cat-like code with code quoting. yep. the most important things to tackle now are modularity and platform independence. aspect oriented programming :) maybe i should leave the module stuff for later, since reloading is not that easy... loading inside a module namespace might be possible though.

Entry: languages
Date: Thu Mar 15 09:40:09 EDT 2007

how to combine 2 different languages in one project? i'm trying to write the purrr language in terms of purrr/18, and i need an easy way to switch between them. i need a methodology.. what is a threaded forth? basically a set of primitives. so what i need is:

  - a list of primitive names
  - a way to compile 'enter'

things to look at:

  * unify all toplevel interpreters, so i can have more
  * separate console.ss into machine specific things.

a toplevel interpreter is

  - a string interpreter
  - an exception handler
  - a continuation

what if i store all the modes in the data store, symbolically? seems to work.

now, about the VM. i think it's best i standardize on the VM mentioned above: call threaded with return (jump / tailcall) bit. this can be written in C too, so that should eliminate most porting problems, with only optimization problems remaining.

ok. back to where i started. should i allow 2 different languages on one attached system? why would i want to do that? debugging of course, but what else? there's a bit too much freedom here. 2 languages, native + ST forth, will complicate things, but will also make things a lot easier to use..
and a threaded forth compiler isn't so incredibly hard to build.. so, i work with one core language purrr/18 and build a threaded forth on top of that using a different mode.

so.. next problem = representation for words. i'm using a simple name prefix = underscore. maybe there's a better way to do this? name prefixes allow use from the lower level language. ok.. rambling on. the way to do it is to just translate the highlevel language into lowlevel forth, and pass it on to the compiler.

  (1 2 +) -> (ENTER ' _lit 1 ,, 2 ,, ' + EXIT)

here EXIT is a special word that installs the return bit in the MSB of the last word.

Entry: TODO
Date: Thu Mar 15 13:17:53 EDT 2007

  - variable abc abc 1 +
  - flash addresses as literals
  - exit bit
  - write a paper about the absence of '[' and ']' and the relationship between literals and (dw xxx)

the first 2 are similar: it would be nice to partially evaluate some code that uses words from the ram and flash dictionaries next to constants.. this introduces another dependency. currently the partial evaluator only resolves constant symbols. it requires a new dependency [ dict -> compiler ] to resolve this problem. there is a possibility to delay the evaluation of the optimization until assembly time, by using closures.. there's a deeper problem here: name resolution needs to be fixed..

let's see.. partial evaluation can't fail in a sense that is recoverable: if the literal optimization fails, it's a true error that fails the entire compilation. this means the evaluation itself can be delayed until the environment is ready, since the control flow does not depend on the result. a delayed evaluation has the form

  \env -> value

while env is: name -> value. the more i think about this, the better i like the idea. ok, so the first 3 can be solved using some form of delayed evaluation, forced in the assembler.

Entry: delayed evaluation forced in assembler
Date: Thu Mar 15 15:56:08 EDT 2007

there is already one kind: symbolic constants. the addition that needs to be made is generic expressions. there are several forms to choose:

  - symbolic lisp expressions
  - symbolic cat expressions
  - scheme closures

the first ones are nice since they are symbolic, so easier debugging. the last one might be simpler to implement. lisp style expressions make more sense here since they have a single value, not a stack. now, this can be combined with the paper on partial evaluation: partial evaluation should then be transformed to compile time meta code evaluation. actually:

  1 2 + -> [ 1 2 + ]L

following colorForth, executed code always results in a literal on the ->green transitions. ok, so this fixes the question above:

  * delayed code is symbolic cat.
  * the assembler does the final evaluation of this code

so what is the context?

  machine constants -> numbers
  variable names    -> data addresses
  forth words       -> code addresses

operations come from some dictionary, probably cat, but need to be escaped somehow. let's say: search meta first, then variables, words, constants. this needs some changes:

  * the assembler needs to be a CAT word, so the stack can be used as context.
  * it's probably better to wrap all symbolic names in a list, so the evaluation is uniform: either numbers or lists.

this seems to work pretty well.

Entry: 16bit threaded forth compiler/interpreter
Date: Thu Mar 15 18:06:23 EDT 2007

let's give them a name: the highlevel forth is PURRR and the lowlevel forth is PURRR/18. here i use the shorthand names threaded and native resp. first problem is the parser, since the forth needs its own parsing words.
that should be the only real problem. since this forth is mainly for higher level stuff, i don't need machine constants: all machine access is solved on a lower layer. actually, the different namespace is a nice excuse for some simplification.

second problem is running code from the brood console. this needs a little trampoline, since the only way to get out of a running interpreter is to call 'bye':

  ' bye -> IP  _run  CONT

Entry: added pattern debug
Date: Fri Mar 16 10:13:07 EDT 2007

the pattern compiler now has a debug method which dumps the source rep of the patterns into the asm buffer for inspection. this is implemented in the form of a match rule that matches ['pattern].

Entry: added robust reloading + logging
Date: Sat Mar 17 09:38:56 EDT 2007

error on reload: the console waits, then reloads. this is to give a chance to correct syntax errors without losing state. also added 'cat.log' output, which enables the use of emacs' compilation-mode. just run 'tail -f cat.log' as the compile command.

Entry: trampoline
Date: Sat Mar 17 09:40:45 EDT 2007

ok, i got something wrong... words stored in the dictionary are primitives. invoking a highlevel word from within a lowlevel native context requires the use of these primitives. remember: highlevel 'run' ONLY takes composite words, while lowlevel entry ONLY takes primitives. IF a primitive contains a highlevel definition AND the primitive points to an ENTER call, THEN the rest is highlevel code. the correct way to build a lowlevel -> threaded trampoline is:

  * set the highlevel continuation, saving the current one, to highlevel code that does (bye ;)
  * call the primitive
  * call the interpreter to invoke the continuation

Entry: delayed eval in assembler
Date: Sat Mar 17 10:18:14 EDT 2007

i should go to an architecture where the number of passes in the assembler is not fixed, but just enough so all expressions can evaluate with correct dependencies. maybe one pass is necessary to at least find out if all the labels are defined.

Entry: literals
Date: Sat Mar 17 12:06:44 EDT 2007

should literals be handled in the interpreter? problems:

  * a _lit instruction cannot drop the current thread, because the value needs to be accessed from the current thread. so the code "123 ;" translates to "LIT 123 NOP|EXIT"
  * moving this to the interpreter by encoding the value in the opcode is possible, but then large values need 2 words. it also requires 2 different lit instructions, implemented as:
    - LOAD + LOADHI
    - LOADLO + LOADHI
    - DUP + LOADSHIFT
    - LOAD + LOADSHIFT

the one with DUP requires only one explicit lit, but it always needs 2 words, even in cases where the data would fit into a single word. i think code density is more important than any other constraint, except conceptual simplicity of the language, which is independent of the VM implementation. so it's probably going to be 2 different lits.

so how to implement that? it's easy to detect if the high byte of an address is zero, since the flags will be set. this could be the clue. the address space this overlays is just the boot monitor, so that's no problem. some bit-fiddling. 'x>IP' uses movff so it doesn't affect flags, which means the zero flag can be tested after the carry flag.

are literals important enough to give them half of the address space?
the answer is probably yes:

  - they occur a lot
  - if they are as cheap as constants, you don't need constants
  - 14 bit signed words will cover most use for numbers (counters/accumulators)
  - other literals are addresses: make sure the memory model respects this optimization

so how do we give them half the address space:

  - effectively: use only 32 kb -> enough for now
  - align words to 4 byte boundaries (and possibly reclaim the storage..)

let's go for the first one: only 32kb address space. the other half could maybe be used for byte access? so, encoding primitives as [ EXIT | LIT | ADDR/NUMBER ] gives

  EXIT -> c
  LIT  -> n after one shift

this looks nice:

  \ inner interpreter loop
  : continue
      prim@/flags               \ fetch next primitive + flags
      exit? if x>IP then        \ c -> perform exit
      literal? if 14bit ; then  \ n -> unpack literal
      execute continue ;        \ execute primitive

  \ interpret doubleword [ 1 | 14 | 0 ] as a signed value.
  : 14bit
      _c>>                      \ [ c | 1 | 14 ]
      1st 5 high? if            \ sign extend
          #xC0 or continue ; then
      #x3F and continue ;

after fixing a bug in 'd=reg' it seems to work.

Entry: parsing
Date: Sat Mar 17 19:00:52 EDT 2007

just fixed the interpreters and direct->forth translators using parser.ss. somehow something doesn't feel right though.. parsing words feel 'dirty'. i'll try to articulate why, since i don't think anything can be done about it.

internally, quoting is no problem: you just build a data type (word) that supports quoted functions/programs/symbols.. in CAT this is done by creating primitives that map stack -> (thing . stack). however, in program source code it is problematic: non-quoted compositional code has a 1->1 correspondence symbols<->semantics, and the semantics of successive words is not related. quoting is about modifying the semantics of symbols. one example where this is done very nicely is colorForth: here the color of a symbol is part of the source code, and represents information about how to interpret a symbol name. in textual form this would be something like

  (red drie) (green 1) (green 2) (green +) (green ;)

here a pair of (color word) represents a single semantic entity. in ordinary forth however, it's not done this way: not all words have a prefix (a color). another way to say it: most words use the default 'color'. so, in a sense, the thing that is 'dirty' is the default semantics. this is not so bad for convenience's sake, but it does require a parser that introduces the semantics. otherwise we would have

  : drie number 1 number 2 word + word ;

which is really what it is parsed to in the end.. the thing i'm being anal about is that CAT has a 1->1 correspondence between syntax and semantics, inspired by Joy. although, this is not entirely true. a syntactic shortcut in the form of (quote thing) is introduced to be able to quote lists and symbols. but this is not entirely necessary:

  '(1 2 3) == (1 2 3) data
  'foo     == (foo) data bar

with these operations being a bit less efficient. that concludes the rant.

Entry: quasiquote
Date: Sun Mar 18 09:00:31 EDT 2007

which leads me to the following. it does make sense to have lists of programs in CAT, where quasiquote would come in handy.

  `(,(+) ,(-))

Entry: program->word
Date: Sun Mar 18 09:09:50 EDT 2007

some nitpicking about constant->word. before, i had quoted programs wrapped using constant->word. this doesn't make sense, since the 'constant' is really a parsed thing, and not a source representation. however, it does enable 'data' to do its work. but why don't i just quote the source of the entire program, and store the parser as semantics?
that would be cleaner, but something doesn't feel right there either.. well, actually, i can just delay parsing completely! that seems like the right thing to do: the source can just be retained in its original form, and initial recursion during parsing is avoided, which directly solves the problem of setting! an atom's semantics.

Entry: lazy eval
Date: Sun Mar 18 10:06:18 EDT 2007

i think i start to see why a lazy language can be so convenient.. i spend quite some time trying to figure out when it's best to evaluate some expression. if this is always "as late as possible" this work should disappear. nevertheless, it's an interesting exercise. for the assembler it might be interesting to write it completely lazily, including the optimizations necessary for jumps, which i still need to implement.

Entry: disassembler
Date: Sun Mar 18 14:40:43 EDT 2007

the disassembler needs to be smarter. i probably need to add some semantics to the fields, and have a platform-specific map translate them: resolver closure + asm code -> [ shared code ] -> disassembled -> prettyprint.

Entry: open files
Date: Sun Mar 18 17:15:50 EDT 2007

something is terribly wrong with the open files.. fixed by manually closing. i think i need to read about how ports get garbage collected, or not.. indeed. they are not, need explicit close or make an abstraction: http://list.cs.brown.edu/pipermail/plt-scheme/2004-November/007247.html

Entry: where to go from here?
Date: Tue Mar 20 01:01:02 EDT 2007

enough muddling about. roadmap:

  - get dtc working with host interpret/compile
  - make it self hosting
  - combine with synth
  - dspic asm + pic18 share/port

Entry: a safe language?
Date: Tue Mar 20 01:22:58 EDT 2007

  [ 1 + ] : inc
  [ 2 + ] : inc2

is it possible to make a safe language without too much trouble? something like PF. without pointers..

  [1 2 3] [1 +] for-each

the interesting thing is that i can use code in ram if i unify the memory model. i think it's time to start to split one confusing idea into 2:

  - a 16/24bit dtc forth for use with sheepsynth dev: control computations
  - a self contained safe linear language for teaching and simple apps

safe means:

  * no raw pointers as data
  * no accessible return stack, so it can contain raw pointers
  * no reason why numbers need to be 16 bit: room for tags
  * types:
    - number  [num | 1]
    - program [addr | 0]

features:

  * symbols refer to programs, special syntax for assignment
  * assigning a number to a symbol turns it into a constant
  * for, for-each, map, ifte, loop

  [ 1 + ] -> inc
  1 -> inc
  [[1 +] for-each] -> addit

now.. lists? the above is enough for structured programming, but map and for-each don't make much sense without the data structures.. so programs should be lists, at least semantically. since flash is write-once, a GC would make more sense than a linear language.. so what about: purrr/18 -> purrr -> conspurrr. maybe it's best to stay out of that mess.. cons needs ram, not some hacked up semiram. what about using arrays? if programs are represented by arrays instead of lists, not too much is lost:

  [1 2 3 4] [PORTA out] for-each  ;; argument readonly = ok
  [1 2 3 4] [1 +] map             ;; argument modified in place (linear)

the latter one needs copy-on-write (see the sketch after this list).

  [[+] [-]] [[1 2] swap dip] for-each

what about

  [1 2 3] [1 +] map -> test

  1. arrays are initially created in ram, as lists?
  2. when assigned to a name, they are copied to flash
  3. assignment is a toplevel operation, effectively (re)defining constants
  4. flash is GCd in jffs style.
  5. words can be deleted.
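a minimal sketch of the intended copy-on-write 'map' semantics, in scheme, with vectors standing in for target arrays. the names are hypothetical host-side pseudocode, not purrr:

  (define (copy-to-ram v)            ; flash is read-only: copy first
    (let ((w (make-vector (vector-length v))))
      (do ((i 0 (+ i 1)))
          ((= i (vector-length v)) w)
        (vector-set! w i (vector-ref v i)))))

  (define (linear-map fn v in-flash?)
    (let ((w (if in-flash? (copy-to-ram v) v)))  ; ram copy is owned..
      (do ((i 0 (+ i 1)))
          ((= i (vector-length w)) w)
        (vector-set! w i (fn (vector-ref w i)))))) ; ..so mutate in place

the point being: the linear (owned) case costs nothing extra, and only the flash case pays for a copy.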
in ram: one cell is 3 bytes: 2 bytes for contents + 1 byte next pointer. this leaves room for 256 cells, or 768 bytes. it might be interesting to make assignment an operation that's valid anywhere: persistent store.. on the other hand, that encourages misuse. so..

  - free lists make no sense in flash
  - they do in ram
  - persistent store rocks

in order to make this work, i need to write a flash filesystem first. problem: does redefining a word redefine all its bindings? it should. so each re-definition needs to be followed by a recompilation. nontrivial. this gets really complicated... can't we represent code as source, and cache it in ram? it looks like variable bindings should really be in ram. but what about the persistent store? damn. dumbing it down ain't easy. i think maintaining the late binding approach is infeasible. maybe it's good enough to clean up the semantics a bit? a 1->1 syntax->semantics mapping (i.e. choose is the only conditional), so code can be used as data using 'map'. maybe that does make sense.. 'map' as 'interleave'. ok, that's enough.

Entry: language in the morning
Date: Tue Mar 20 09:02:11 EDT 2007

after 4 hours of sleep: it's hard to say goodbye to nice ideas when they don't work for practical reasons.. still there's something here. i think i just need to read the PICBIT paper by Marc Feeley and Danny Dube, and base it on that. it looks like i just need to be wasteful: everything is source code, the flash is a filesystem, and the ram contains executable code. the most important of all: it should be towered:

  purrr/18 -> purrr -> cat/18

so to distill again:

  - cons cells in ram
  - a flash file system

it is interesting how the linear/nonlinear language split i'm using and this linear ram / nonlinear flash memory model coincide. the approach in PICBIT seems interesting: using fixed size cells of 24 bits = [2 | 11 | 11], with the types:

  00 PAIR
  01 SYMBOL
  10 PROCEDURE
  11 one of the others

Entry: distributed system
Date: Tue Mar 20 09:53:19 EDT 2007

i was thinking: this tethered approach makes a whole lot of sense in the case of one host controlling a huge amount of identical cores.

Entry: back to dtc
Date: Tue Mar 20 10:42:24 EDT 2007

got compile + interpret working. time for control structures. i'm seriously considering only using code quoting. but how to implement? same as in PF? it's actually not so hard:

  x x x { y y y } z z z
  |
  x x x quot L123 ; y y y ; : L123 z z z

this does require a stack / recursion to associate the labels. another way to deal with it is to solve it in the parser, and use real lists. or the lowlevel forth could be extended to use something like this, which is probably easiest.

Entry: hands on pic hacking
Date: Thu Mar 22 17:10:19 EDT 2007

playing with the synth board. it resets from time to time. found that touching the PGM pin causes this. this pin is floating in my board, so i guess that's where the problem is. the datasheet says:

  CONFIG4L 300006 bit 2: LVP enable=1 / disable=0

indeed, this is on. as long as this is enabled, normal port functions are disabled. moral of the story: disable it, or tie it high, or enable weak pullups.

Entry: sheepsint
Date: Thu Mar 22 17:38:38 EDT 2007

after fixing the PGM bug (LVP disabled now), it still crashes from time to time. i suspect it's some kind of interrupt thing.. let's disable stack reset and see if it still crashes. tss... the watchdog timer was on. stupid.
Entry: modeless interface
Date: Thu Mar 22 18:55:22 EDT 2007

  - modeless interface (unix socket) to send brood commands for emacs
  - normal boot vs interpreter based on activity on reset

i should find a decent protocol to interrupt an app: to attach a console easily, but to have it running most of the time.

Entry: partial evaluator
Date: Fri Mar 23 00:50:37 EDT 2007

i'm probably just getting tired, but isn't it a lot better to do partial evaluation on source instead of assembly code? there is some elegance to the greediness of the algorithm. somehow, this feels ok.. but if i type 1 2 +, it's always going to be equivalent to 3.. if literals can be identified at the time they are compiled, their compilation can also be postponed.. i don't really have a good explanation. what i do know is that this works because it is fairly decentralized.. the price paid is "literal undo", which is not so hard, and also works for pure assembler.

don't know if this is going to make sense.. a symbol's semantics is only defined by what machine code it will be compiled into (concrete semantics). for forth, this is either a function call or some inlined machine code. since the latter is highly machine specific, it doesn't really make much sense to separate that out into partial evaluator + optimizer, since the optimizer is going to add some bit of partial evaluation anyway.. it's better to put some effort into making the code separable: some patterns go for all register machines, some go for all pic chips, ...

as i found so far:

  1. abstractions will arise whenever they are hinted by redundancy or "almost redundancy".
  2. if you build an abstraction you don't use later, you lose. abstractions make code more complicated, and are only justified by frequent use.
  3. don't hesitate to keep towering abstractions until the redundancy is gone. some problems really do need several layers to encode comprehensibly.

what i'm intrigued by:

  4. solve only one thing per layer (one aspect). if the abstractions do not stack, find a way to disentangle them, and weave them back together automatically.

Entry: compiler compiler
Date: Fri Mar 23 10:00:04 EDT 2007

it seems you can't really use macros to write macros without extra effort in mzscheme. it defines level 0 and level 1 environments (normal and compiler), but a level 2 (compiler compiler) cannot be easily used without 'require-for-syntax'. the thing i ran into is this: i want to use a macro to generate a pattern matching expression inside a define-for-syntax function that is used to implement a macro that generates a pattern matching expression. maybe it's best i just switch everything to using modules, and reload the full core when i'm reloading. i'm getting a bit tired of these kinds of problems. questions:

  1. is it possible to reload a module?
  2. how to only recompile what's changed, to reduce load time?

Entry: cat as plt language?
Date: Fri Mar 23 13:54:06 EDT 2007

ok, but what is apply in that case?

  (apply fn args) == (run-composite stack composition)

in other words, exchange single code / multiple data for single data / multiple code. apply then still means: convert data + code into data.

Entry: modularizing cat
Date: Fri Mar 23 14:35:26 EDT 2007

brings up a lot of problems.. some of the macros i'm using, like snarf-lambda, are not very clean wrt names and values.. i also 'communicate using global values', which is not a very good idea.. so it's going to take a bit longer than expected, but the code should be a bit cleaner when it's done. ok, now for the big one.
pic18.ss, generic forth stuff: need to spend some time to separate out the sharables, which is a lot.. i do wonder if i really need both writers and asm state monads.. it is cleaner, but also a bit of a drag.. i need a proper mechanism to do this separation. but first, get this thing to load properly.. got some bugsies here and there. seems the compiler works fine, but the assembler has got some problems. ok. seems to work now. also compilation seems to work.

Entry: macro namespace
Date: Fri Mar 23 19:02:35 EDT 2007

there is really no reason to have multiple macro namespaces. i mean: namespaces are defined using hashes. it's easier to just load the generics, then overlay the specifics, instead of having a lot of special names in the dictionary.. in other words: the pic18* words should be replaced by globally unique things, denoting the fixed functionality:

  * machine constants
  * simple/full forth parser
  * macros
    -> recursive
    -> pattern matchers
    -> writers
    -> asm state modifiers

all specific functionality is added on top by overlaying the code. this used to be done with "load" but is now done using "require". order of execution is preserved in require ???

Entry: double postpone
Date: Fri Mar 23 21:16:39 EDT 2007

i'm running into problems with macro-generating code.. fixed some. cleaned up some in vm.ss. now i have an interesting problem with delayed eval: macro defs (side effects) get delayed till after the macros are used.. ok, i think i got it.. what about tagging names that are supposed to be cat semantics in a certain way?

ok.. this concludes a long run. from the top of my head, things are better now because:

  - badnop is better defined as a forth compiler with the fixed functionality mentioned above
  - the code makes a clear indication if functions are used as cat semantics == code that compiles something into a stack primitive.
  - 'compile' and 'literal' are now CAT macros
  - the state monad uses a more highlevel wrapper

things to do still:

  - constants for disassembler
  - disassembler
  - core restart
  - clean up source file layout, maybe split in more modules + docu

funny... running into an evaluation order problem again.. maybe i should use some kind of module / scheme namespace trick to get rid of this? because load/eval/parse order is kind of arbitrary now.. -> nothing to worry about. it was a stupid typo.

got the meta-patterns macro working too. this is actually an interesting idiom: just wrap a single macro around a body of 'define' statements to alter the way they are used: it allows proper syntax highlighting + individual testing.

Entry: so what is badnop?
Date: Sun Mar 25 16:02:37 EDT 2007

a native forth compiler for register machines, with provisions for harvard architectures, and provisions to build a dtc interpreter on top of a native wordlength forth. the platform specific parts are: the assembler generator, the pattern matching peephole optimizing code generator, and some recursive macros.

Entry: persistent store
Date: Sun Mar 25 18:59:45 EDT 2007

so.. it would be way easier to just have the compiled forms cached on disk. but i guess if that's really necessary i can always write out scheme files and compile them. for the rest: all persistent data should be SYMBOLIC. this means:

  - no compiled CAT code (word)
  - no continuations in asm

this seems really important.. an area where compromise leads to unnecessary complexity. i'm going to leave it open, and implement restart by reload, giving only the parameter. this is turning into a "where to put stuff" quest again.. ok.
keep it like it is, and put the data stack in the state store + perform some checking to see if data is serializable before writing it out.

Entry: debugging tools
Date: Mon Mar 26 15:10:12 EDT 2007

need more debugging tools:

  - some safe way of dealing with the bootblock (mainly isr) OK
  - on-demand console: interrupt app OK
  - proper disassembler
  - 'loket'
  - documentation: how to document the language?

dasm needs some thought. the interrupt app is as simple as polling the rx-ready flag, i.e. "begin params rx-ready? until"

Entry: i need something new
Date: Tue Mar 27 10:22:26 EDT 2007

the dasm might be interesting.. maybe i should do that. but i'd like to do something exciting today :) wrote some badnop docs, changed some names.. maybe i should have user definable semantics accessible in CAT itself? (more reflection?)

Entry: the road to PF
Date: Tue Mar 27 11:13:48 EDT 2007

ok. time to write PF in forth, by gradually bootstrapping into different languages. the lifts are:

  1. vector -> linear lists
  2. non-managed -> refcount managed
  3. untyped -> typed/polymorphic
  4. proper GC
  5. scheme

the first lift is the same as the one i already did, which is lifting native code to a vectored rep. the lower interpreter's composites become the higher interpreter's primitives. however, if data is also being lifted, the change is in no way trivial: primitives won't accept the data until it's moved to a linear stack. so maybe this needs to be separated. the lift to lists is different for data than it is for code. on the other hand, it does look like a nice place to insert some type checking code. need to think a bit more..

Entry: multimethods
Date: Tue Mar 27 11:49:14 EDT 2007

i had this idea of representing types using huffman coding, in a binary tree. this requires a set of fixed types and some information about which ones are used most, but it might be quite optimal. there is a lot of room for optimization here, moving type checks outside of functions etc.. but it will probably require some type specs.

Entry: poke
Date: Wed Mar 28 10:57:34 EDT 2007

let's write poke again, the PF vm. the first thing i need to do is to generate C code from some sort of s-expression. expression conversion seems trivial, i just need to distinguish between the builtin infix operators, and prefix expressions with comma separated argument lists. statements are more problematic. bodies are straightforward, but how to handle special forms like for/while/do?

seems i got most of it running now. main features:

  - an s-expression interpreter with a primitive and a composite level
  - used to implement 2 interpreters, for statements and expressions

now i was wondering whether it would be possible to create some kind of downward lambda. i can't use the gcc extension.. yes, but i do need to allocate ALL functions in structures, meaning explicit activation records, and use lexical addresses. if this is used, it's better to completely forget about any local C variables.

Entry: downward funargs
Date: Thu Mar 29 16:12:22 EDT 2007

so, an attempt to create a 'downward lambda' for poke. allocating on the stack for now, with the later possibility to allocate on the heap. how hard is this to have in some form? simplifications:

  - all cells are the same size
  - values are pointers to 'object'

this needs quite a bit of support:

  - environments
  - closures

the function bodies themselves take:

  - environment
  - arg list (part of environment?)
a function invocation is:

  - create environment extension
  - run function
  - cleanup environment extension

  {
      object_t env[3];       // parent + 2 variables

      // invoke a function 'FUN'
      ({
          // create new environment
          object_t ext[2];
          ext[0] = env;      // link parent
          ext[1] = 123;      // init first and only arg
          FUN(ext);          // invoke fun
      })
  }

this resembles PICO. ok.. going a bit too far here. what about introducing these features when they are really needed? one question though.. if only downward closures are needed, why not use dynamic binding instead? nuff.

Entry: back
Date: Thu Mar 29 17:50:44 EDT 2007

back to the code generator. the reason i wrote this was twofold. one is to have a portable target for brood forth. the main idea there is to rewrite mole into something more graceful, and have a basis for (re)writing PF. and two: i need a language for expressing the signal processing code in PF. this should not be forth, but a multi -> multi dataflow language. maybe just forth + protos?

so. i think the next step should be to transform the current cgen (poke) so it has an extensible name space. maybe it is a good time to look into defining new languages inside PLT, since that's what i'm doing basically, instead of mucking about with explicit environment hashes and interpreters. something to iron out: it's not a new language, it's a cross-compiler: you want to define functionality accessible in one name space using functionality accessible in another name space.

Entry: extending cgen name spaces
Date: Fri Mar 30 10:31:15 EDT 2007

i don't really need to make the hash tables available. it's much easier to just create a new interpreter function which falls back on the basic one defined in cgen.ss. hmm.. i got myself in trouble again. the above doesn't work since statement/expression are mutually recursive. in addition to that, statement uses closures. maybe i do need a hash? ok. i think i got it ironed out a bit. using a hook for both the expression and statement formatters, and calling this hook recursively, does do the trick.

Entry: compiler structure
Date: Fri Mar 30 15:01:41 EDT 2007

so.. basically, a compiler/assembler/whatever has the following 'natural' structure:

  T = target language
  S = source language
  C = compiler language

it's best to separate the S -> T map into:

  primitive macros  S -> T  (small)
  composite macros  S -> S  (big)

you want to write both the S -> T and S -> S maps in C. the reason you want an S -> S map is that it contains higher level code than an S -> T map. one pitfall is to shield functionality in C by not properly mixing in the T name space. the most straightforward way to implement both maps is quasiquoting: quoted S or T, and unquoted C. including the compiler language is more precise:

  primitive: C,S -> T
  composite: C,S -> S

badnop is already organized this way: the primitives are peephole optimizing pattern matchers, where C is scheme. writers and state modifiers are composite, with C being cat. and the recursive macros are a cleaner S -> S map, with C empty.

Entry: lifting
Date: Fri Mar 30 15:22:03 EDT 2007

now for the ambitious part. the thing that got my whole forth/PF thing started is a desire to generate automatic control structure for video DSP building blocks. basically:

  IN:  a highlevel description of how pixels are related through operations
  OUT: a compiled representation processing images / tiles

the core component here is loop folding:

  (loop { a } then loop { b }) -> (loop { a then b })

the win is a memory win: intermediates should not be flushed to main memory.
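to make that concrete, a sketch of the fold as a source-to-source rewrite in scheme. 'loop' and 'then' are symbolic nodes, and the names are hypothetical, not the actual dsp compiler:

  (require (lib "match.ss"))

  (define (fold-loops expr)
    (match expr
      (`(then (loop ,a) (loop ,b))            ; two adjacent loops..
       `(loop ,(fold-loops `(then ,a ,b))))   ; ..become one fused loop
      (_ expr)))

  ;; (fold-loops '(then (loop (a)) (loop (b))))
  ;; => (loop (then (a) (b)))

note the fold is only valid when b consumes a's output pointwise, within the same iteration; any cross-iteration dependency kills the fusion.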
so compilation generates the control structure. compilation 'lifts' the pixel building blocks into something interwoven with the control structures.

Entry: grid processing
Date: Fri Mar 30 16:01:57 EDT 2007

the possible optimizations depend tremendously on the amount of information available on the individual processors, so the idea is to keep the primitive set really simple, and look at their properties:

  * associative (n-ary op consisting of n-1 binary ops)
  * commutative (binary op)
  * linear/linear1

  +    l a c
  *    l a c
  /    l1
  abs

the typical structure to look at is a one dimensional FIR filter, since this can be extended to 2D (space) and 3D (space+time) filters.

  (* gain (+ x (n x 0 -1) (n x 0 +1)))

let's analyze. 1 and 3 are constants, so (/ 1 3) can be evaluated. x is used in an 'n' expression, which we use to denote membership of a grid. let's make all parameters into grids:

  (* (gain) (+ (x 0) (x -1) (x +1)))

so (gain) is a 0D grid, (x 0) is a 1D grid, (x 0 0) is a 2D grid, etc.. composite operations can be specified, for example

  (processor (a b) (+ (a) (b 0) (b 1)))

this means all parameters need to be declared, since we need to know the order. the syntax i'm using here requires ordered parameter lists. i prefer this over keywords, since it is more compact, and we need to fill in all inputs anyway (no explicit defaults).

another interesting operation on an expression is to compute its reverse: an expression represents a dependency graph, which can be inverted. however this is only interesting for multiple inputs, which we won't use yet: apply explicit subexpression elimination and graphic programming. ok, so we need parameter names. another interesting operation is fanin: how many times is a single value used? this is important for memory management (linearization). note that linearization and operation sequencing is almost equivalent to translation to forth.

maybe it's time to go for the first iteration binder. we map a single function to an explicit iterator. i.e.

  (+ (a 0) (a 1))

it has a single 1D grid input, and produces a single grid. ah! something i forgot: what's the output type? a grid of dimensionality equal to the maximum of the input grids. so, an n-dimensional grid is placed on the same notational level as an n-ary procedure. ok. the above can be transformed to the loop body

  (+ (index a (+ 0 i)) (index a (+ 1 i)))

where a runs over the line. the rest is border values:

  (+ left (index a 0))
  (+ (index a w) right)

so the idea is to make the loop body and the 2 borders. implementation (see ip.ss), implicit -> explicit:

  (a 0 1 -1) -> (a ([I 0] 0) ([I 1] 1) ([I 2] -1))

where the [I n] tags indicate the loop depth.

Entry: thinking error
Date: Fri Mar 30 19:53:11 EDT 2007

the error i made previously was to 'precompile' things: bind stuff to tiles, then bind some stuff later in an interpreter. the problem with this is that you're solving the same problem twice. not very good.. a much better idea is to keep everything in a highlevel description, then compile it as composition goes on: one thing i'm dreaming about is to build things in a pd patch, then hit 'compile' for an abstraction, and it will compile an object that performs the operation.

so, the other error was to use low level reps. forth has benefits, but not for writing compilers, which is mainly template stuff: mixing name spaces. you really need quasiquoting and random parameter access. EDIT: this is what's so nice about the scheme macro system: the mixing of compiler and target namespaces works really well.
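the quasiquote idiom in one line: quoted = target namespace, unquoted = compiler namespace. a hypothetical generator for a symbolic FIR loop body, given a compile-time list of taps ('loop', 'mac' and 'x' are made-up target symbols):

  (define (emit-fir taps)
    `(loop ,@(map (lambda (tap) `(mac (x ,tap))) taps)))

  ;; (emit-fir '(-1 0 1))
  ;; => (loop (mac (x -1)) (mac (x 0)) (mac (x 1)))

'map' and 'lambda' live in the compiler, the quoted symbols live in the target. no explicit environment juggling needed.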
Entry: monads and tree accumulation
Date: Sat Mar 31 10:44:58 EDT 2007

writing the source code analysis functions, i run into the following problem: map a tree, but also run an accumulation. now of course it's easiest to just use local side effects here, since they behave functionally from the outside (linear data type construction). but just out of curiosity, what kind of structure is necessary to do this functionally? the basic idea of monads: if you don't save 'extra' data in the environment, save it in the data. this requires 'map' to be polymorphic, so it can act on this type accordingly. i don't think it's worth the trouble here.

Entry: boundaries
Date: Sat Mar 31 12:14:13 EDT 2007

border values: using finite grids, borders need to be handled. basically, invalid indexing operations need to be replaced by valid ones. some strategies:

  constant: (a -1 0 0) -> c
  repeat:   (a -1 0 0) -> (a 0 0 0)
  wrap:     (a -1 0 0) -> (a (wrap -1) 0 0)

how to name border regions? there are several distinct cases, for example a square grid has these:

  (L L) (I L) (H L)
  (L I) (I I) (H I)
  (L H) (I H) (H H)

  L  low boundary
  I  bound to iterator
  H  high boundary

with looping indicated by { ... }, a full 2D loop looks like:

  (L L)     ;; top left
  { (I L) } ;; top
  (H L)     ;; top right
  {
    (L I)     ;; left
    { (I I) } ;; bulk
    (H I)     ;; right
  }
  (L H)     ;; bottom left
  { (I H) } ;; bottom
  (H H)     ;; bottom right

that basically solves the problem. note that it's best to lay out the code in an L I H fashion to keep locality of reference.

on to representation. the loop body is a serialization of an N-dimensional 3-grid (a 2-grid is a hypercube). it's serialized into a ternary tree. how to represent ternary trees? the following representation looks best in standard lisp notation:

  ((L . H) . I)

other variants have the dot in an awkward place. another possible rep is (I H L), which can be written in mzscheme's infix notation as (H . I . L). i'm going for the former, as it allows the use of (B . I) in case L and H are the same, in order to generate the full loop body. EDIT: it's easier to just use s-expressions: (range H I L), and have 'range' be a keyword..

loop borders can be constructed using the data structure provided by 'src->loopbody'. ah! it's possible to separate the operations performing loop order allocation and pre/post expansion, but probably not very desirable.. so let's combine them, so we can get rid of using natural numbers. note: i found out that when i need index lists, i'm doing something wrong: applying a certain order on things... so, in order to generate the tree above, we consume coordinates from left to right. all loop transformations need to be done on the source code before generating loop bodies.

Entry: lexical loop addresses
Date: Sat Mar 31 14:33:53 EDT 2007

i need a notation for addressing loop indices. currently i'm converging on not updating pointers in a loop, but using indexed addressing, since that's something that can be done easily in hardware. an optimization here is to use relative addressing only for the inner loop, so only one index needs to be added, and cache the computation for all other relative accesses. each loop has exactly one index that's being incremented. the depth of the loop determines how many indices are bound. what i'm trying to do is to generate the border conditions that have not all indices bound. how to do that?

  loop a {
    ... data (a) ...
    loop b {
      ... data (b c) ...
      loop c {
        ... data (a b c) ...
      }
    }
  }

the inner loop here needs to be split into 3 parts:

  data (l b c)
  data (a b c)
  data (h b c)

then the 2 unbound parts can be moved out of the loop. so, basically:

  BODY -> (nonfree . free)

as an example, take (+ (a 0) (a 1)), split in

  (+ (a 0) (a 1))          ;; border
  (+ (a (i 0)) (a (i 1)))  ;; body

since code is originally in unbound form, it might be more interesting to perform binding inward. start from the relative description, and split this into a partially bound and partially filled structure:

  border <- relative -> bound

then iterate downward. before this is possible, all code needs to be translated to the full 'virtual grid' form. later on, it can be substituted back to its original form.

Entry: representation
Date: Sun Apr 1 10:22:02 EDT 2007

ok, i think i got the basic idea, so it's time to start using some abstract data structures. on the other hand, if using list structures is possible, debugging is more convenient.. sticking to lists.

Entry: breadth-first
Date: Sun Apr 1 16:52:47 EDT 2007

i think this is the first time i ever encountered a problem that's easier solved using breadth first expansion. hmm.. that's probably plain bullshit.. it's just my particular approach at this moment, using an 'infinite' expansion with an escape continuation:

  (define (expand e)
    (call/cc
     (lambda (done)
       (let ((expand-once
              (lambda (f) ... (done e))))
         (expand (expand-once e))))))

basically, this just iterates expand over and over, and backtracks to the last correct expansion 'e' whenever some termination point is reached in expand-once. ok, abstracted in 'expand/done'.

Entry: separation of concerns and exponential growth
Date: Sun Apr 1 19:15:54 EDT 2007

was thinking.. separation of concerns, hyperfactoring, whatever you call it, is a means to move from linear -> exponential code dev.. once you can separate things into independent parts A x B, increasing functionality in either will increase total functionality by the same multiplication factor. if they are not separated, an increase in complexity doesn't translate to an increase in functionality. this is very badly explained, but i think i sort of hit a spot here. compare the payoff of time invested in building independent/orthogonal building blocks that can be combined, against the payoff of time spent tweaking a small part of a huge system. the added complexity (information, code size) might be the same, but the added expressivity (possible reachable behaviours) is hugely different: multiplication in the first, and addition in the second. it's the difference between adding a bit in a state encoding (exponential), and adding a state (linear).

Entry: the inner loop
Date: Sun Apr 1 19:26:11 EDT 2007

how to encode the innermost loop? for example, start with

  (+ (a (I 0) (I 1)) (a (I 0) (I 0)))

with the inner loop being the last index (arbitrary choice). the main question to answer is: "relative or absolute addressing?" either one uses explicit pointer arithmetic, or one uses index registers. for the outer loops, increments occur infrequently, so it's best to use pointers.

  a -> pa
  (+ (pa (I 1)) (pa (I 0)))

so, the number of registers used for addressing in the inner loop is equal to the number of grids (including the output one), plus one loop index. if addressing modes like BASE+REL+OFFSET are not available, extra pointers or indices are needed. i seem to remember that incrementing pointers using the ALU is bad on intel, and it's better done using the AGU.. i guess there's a lot of room for doing this right or wrong depending on the architecture.
and i swore never to write intel assembly again :) if C is the target language, i guess some experimentation is in order. for simple processors, it seems quite straightforward how to subdivide things so maximum throughput can be attained. i guess the next target is to generate actual code. that should iron out the conceptual problems..

Entry: inner loop cont
Date: Tue Apr 3 09:47:24 EDT 2007

the problem is, the indentation shown by 'print-range' is not the same as the indentation for the C code loop blocks. setup code needs to be moved out of the loops. going from inner -> outer:

  (+ (grid a (I 0) (I 0)) (grid b (I 0) (I 0)))

needs to be translated to

  (update a 0 (I 0))
  (update b 0 (I 0))
  (+ (grid a (I 0) 0) (grid b (I 0) 0))
  (downate ...)

effectively updating the pointers before the loop is entered. i was thinking about just shadowing a single variable 'i'. in that case, what is necessary is to make sure each expression referencing I has only one occurrence (or an occurrence in the same position). instead of constructing an intermediate range representation, it might be more valuable to generate the loop structure directly, following the same approach as before.

  (a (0 1 2)) -> (a (L 0) (1 2))
                 (a (I 0) (1 2))
                 (a (H 0) (1 2))

              -> (let ((a (L a 0))) (a 1 2))
                 (let ((a (I a 0))) (a 1 2))
                 (let ((a (H a 0))) (a 1 2))

so, basically just specializing variable names. this boils down to computing pointers. so, to resume, the downward motion is:

  (expr (+ (a 1) (a 0)))
  ->
  (bind ((a_p1 (S a 1))
         (a_p0 (S a 0)))
    (expr (+ (a_p1) (a_p0))))

... ok, i think i got somewhere:

  > (p '(+ (a 0 0) (+ (a 1 0) (a 1 1))))
  {
    int i;
    for (i = 0; i < (400 * 300); i += 300) {
      float* a_p1 = a + (i + (1 * 300));
      float* a_p0 = a + (i + (0 * 300));
      float* x_p0 = x + (i + (0 * 300));
      {
        int j;
        for (j = 0; j < 300; j += 1) {
          float* a_p1_p1 = a_p1 + (j + 1);
          float* a_p1_p0 = a_p1 + (j + 0);
          float* a_p0_p0 = a_p0 + (j + 0);
          float* x_p0_p0 = x_p0 + (j + 0);
          *(x_p0_p0) = (*(a_p0_p0) + (*(a_p1_p0) + *(a_p1_p1)));
        }
      }
    }
  }

now, there are quite some possible optimizations or simplifications. one is to leave the inner level as indexed pointers. another is to replace stride multiplication with addition.

Entry: scheme syntax
Date: Tue Apr 3 22:35:32 EDT 2007

today i (re)discovered:

  (define ((x) a b) (+ a b))

and was surprised that it also works for

  (define (((x)) a b) (+ a b))

first saw it used in SICM.

Entry: accumulation / values
Date: Tue Apr 3 23:29:08 EDT 2007

i need an abstraction for (linear) accumulation. no need to mess with monads. the pattern i'm finding is:

  * substitute expressions in a tree + accumulate a set

i want a function that returns 2 values: the substituted expression and the accumulated set. note that the use of assignment like this isn't so bad, because it's encapsulated (linear): there are no references to the object until it's ready. also note (again) that using monads requires polymorphic versions of generic list processing operations, and is overkill. the 'lifting' technique used in the compiler does need monads, because the operations are open: each operation modifies a state, and intermediates are accessible, so pure functional programming is a good idea to keep backtracking/undo tractable.
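the 2-value pattern, sketched functionally in scheme (hypothetical names; the real code uses local side effects instead):

  ;; map 'subst' over the leaves of a tree, and accumulate the
  ;; set of symbols encountered, returning both as 2 values.
  (define (map/collect subst tree)
    (cond
      ((symbol? tree)
       (values (subst tree) (list tree)))
      ((pair? tree)
       (let-values (((h hs) (map/collect subst (car tree)))
                    ((t ts) (map/collect subst (cdr tree))))
         (values (cons h t) (append hs ts))))
      (else (values tree '()))))

(the 'set' here is just a bag: duplicates are not removed.) compare with the side-effecting version: one extra value threaded through every recursive call, which is exactly the plumbing a monad would hide.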
Entry: aspect oriented programming
Date: Wed Apr 4 08:56:29 EDT 2007

  1972: Parnas "On Decomposing Systems"
  1976: Dijkstra introduces the term "Separation of Concerns"
  1982: Brian Cantwell Smith introduces "Reflection"
  1991: Metaobject Protocols
  1992: Open Implementations
  1993: Mini Open Compiler
  1997: First paper on AOP
  1997: D
  2001: AspectJ
  2004: JBoss

http://www.cs.indiana.edu/dfried_celebration.html
Anurag Mendhekar: Aspect-oriented programming in the real world

Entry: back to sheepsint
Date: Wed Apr 4 12:49:40 EDT 2007

i need to restart the board design soon, but i do need a fully functional dev env before i can do that. some more things are necessary: a proper stateless message interface (CAT = object) for sending code and performing command completions.

Entry: summary
Date: Wed Apr 4 19:21:27 EDT 2007

THINKING ABOUT PF

been looking into bootstrapping PF from the lowlevel forth core. aspects: polymorphy and types (clos), linear memory management (lazy copy), transition from vector -> list. the latter is interesting since it contains 2 parts: code needs a new interpreter, data needs a lot of new primitives, maybe combined with type checking. i wonder whether it's easier to just start from a cons cell VM directly.

C CODE GENERATION

  * separate statements and expressions
  * plugin expression transformers

POKE

  * using a non-blocked version of C gen

LOOP CODE GENERATION

i think i have the general idea:

  * c code generation working
  * functional specification mapped to assignment
  * nested loops: blocks to bind locally cached index pointers
  * additive index arithmetic
  * inner loop uses a single index

the scheme code looks simple, and well factored. gut feeling says the code is simplified enough for gcc's optimizer to tackle it. i still need to do the border conditions. this will need to be example driven. next month i might try to plug in some code.

Entry: from forth to PF
Date: Wed Apr 4 19:37:52 EDT 2007

1. data

a PF primitive written in forth looks like:

  - (force) collect arguments (list -> vector)
  - method lookup
  - perform primitive forth code
  - (lazy) push arguments (vector -> list)

so the stack is implemented like:

  [ list | vector ]

the vector actually needs to be a circular buffer, because it behaves as a deque: traffic between list and vector is on the bottom end, while primitives operate on the top end, unless the primitives accept their arguments reversed.

2. code

fairly straightforward. because of the difficult impedance match between list and vector machines, i think it makes sense to forget about building one on top of the other, and write only the vm. an interesting question is whether this can be abstracted. and also, can i write the VM in itself? been tinkering a bit with poke.ss and mole.ss, got the basic permutation worked out.

Entry: alan kay name dropping
Date: Wed Apr 4 19:59:37 EDT 2007

from "Proposal to NSF - Granted on August 31st 2006 - Steps Toward The Reinvention of Programming". i'm curious about the albert thing. what i read i don't understand though.. better next time. motivation and inspiration: John McCarthy LISP ... bootstrapping.

Entry: persistence & late binding
Date: Thu Apr 5 16:11:14 EDT 2007

so, elaborating further on that article.. i ran into the problem of saving parsed code, because semantics is stored as a procedure. what about replacing this by a symbol? assuming data will only be read by a system that has the bindings in place, this is a valid approach. then bootstrapping can be solved differently, and all internal representation is just cache. so..
a word = code object

  * a source representation
  * a symbolic semantics (another word?)
  * a cached transformer procedure (concrete semantics)
  * a cached meaning = lambda expression

the cycles in this representation need to be broken somehow. hmm.. this is actually a lot harder than it sounds, since the cache really needs to be a cache. probably needs a from-scratch approach. ok. started the 'symcat' project. for the current project i think i can live with non-savable parse trees, since it's always possible to save source code, and i have a working 'reload core' command for use during compiler development. all in all, the system i'm writing is fairly straightforward. so no more about this really cool idea here. see symcat.

Entry: name spaces
Date: Thu Apr 5 16:59:24 EDT 2007

something that's getting on my nerves a bit is CAT namespaces. small special purpose apps can benefit from the simplicity of a single namespace and short names, but for CAT i'm not so sure any more. also, i'd like to catch undefined names early on.

Entry: standalone
Date: Thu Apr 5 17:06:15 EDT 2007

time for the standalone forth. one of the things i've been wanting to try for a while, but never got to.. i should have a look at flashforth and also retroforth for inspiration. roadmap:

  * 'accept' terminal input into a buffer
  * 'parse' words
  * 'find' a word in the dictionary

compilation is straightforward, but requires some thinking since stuff will need to go to ram first. (it's multipass, i.e. if .. then).

Entry: reflection
Date: Thu Apr 5 19:37:03 EDT 2007

the ideas of reflection and metacircularity probably go hand in hand.. in CAT i'm getting a bit annoyed by having to choose between implementing something as a scheme function, or as a cat function. for example: semantics is implemented as a scheme function, so it's technically not accessible from CAT. let's re-iterate the point of CAT... usually, a forth compiler is written in forth. a cross compiler poses problems in this sense, since the normal 'local feedback loop' doesn't work. the (re)constructed rationale:

  1. forth is extremely modular: a function is a composition of functions
  2. a forth compiler is most naturally expressed in the same way: a forth compiler is a composition of compilers (macros).
  3. most naturally, forth is implemented metacircularly.
  4. i can't do that because the target is too simple -> simulated
  5. the metalanguage best reflects the same structure: compositional
  6. choosing a functional language (CAT) -> monadic composition
  7. CAT is written in scheme to avoid its own bootstrapping problem

the last one actually reads as: CAT is an impedance map from scheme to a compositional language, to make it easier to implement an extensible optimizing forth compiler. if CAT were metacircular, there would be no need for scheme. this approach is not used because:

  - (plt) scheme is packed with features
  - i use a fair amount of scheme to provide primitives. in fact 'primitives' is not really a good word for it..

so it's best to see CAT as scheme in disguise, and as a vehicle for a decentralized compiler/interpreter, bound together by monadic composition. the possibility of writing new CAT words is mainly there for extensibility (writing the compiler), not for the CAT core.

Entry: nested scope
Date: Thu Apr 5 19:55:49 EDT 2007

as i've learned, these features are really necessary to write a compiler:

  * lexical variables
  * quasiquotation
  * pattern matching

however, they mostly serve to adapt to a representation that is inherently imposed, i.e. assembly language syntax.
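all three show up together in something as small as a label resolver over assembly syntax (a hypothetical sketch, not code from the tree):

  (require (lib "match.ss"))

  (define (resolve ins env)                       ; env: symbol -> address
    (match ins
      (`(goto ,(? symbol? l)) `(goto ,(env l)))   ; pattern match + quasiquote
      (_ ins)))                                   ; 'env' and 'l' are lexical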
anything that is non-compositional is better handled with something like scheme. however, if you can design everything from scratch, it's probably quite doable to get by with a couple of combinators and aggressive factoring. but, in the end, some form of lexical scope should be possible, if only for the practical problem of name clashes..

there is only one question: are names functions or values? in lisp, they are values, because functions are explicitly invoked: if a variable is in the head of a list, it's a function. in a compositional language it would involve something like 'i':

  ((a b c) locals ... a i ... b i ...)

treating things as values makes them more natural. an abstraction could be added to do the other (bind as program). then, how to handle the environments?

NOTE: got lexical variables and quasiquotation working in symcat, but only by a more direct cat->scheme translation. i don't think it's really necessary here, since i do most in scheme. also, some name space issues are still not resolved. maybe i can switch for the next rewrite tho :)

Entry: back in the solder lab
Date: Mon Apr 23 17:10:12 CEST 2007

things i need to get working before the end of the week:
  * sheepsint input switches
  * room for xtal on pcb
  * capacitors on pots

random hacking:
  * 3.3V serial interface
  * usb?

Entry: emacs integration
Date: Mon Apr 23 20:03:23 CEST 2007

this screams for a 'once and for all' solution. i'd like to keep brood portable, so using unix sockets for a console, as is done for pf, is not the way to go. since we're running a lisp in a lisp editor, it's probably best to keep the one 'default' interface on stdin/stdout as a lisp channel, and run the console logic in emacs. maybe a bit in the style of slime?

ok.. following slime to ielm.el, modified to connect it to a running scheme process. slime is too big for me to make sense of; i might return later for some features, but i need to get something running first. what i need is multiple languages on the same console, or maybe different buffers? the whole idea is to have most of the parsing in emacs, so emacs can make the editing a bit smarter. maybe i should have a look at:

Entry: erepl
Date: Wed Apr 25 14:18:32 CEST 2007

looks like it's working reasonably well.. things to add:
  * tab completion
  * multiple languages

either parse in emacs, or send out raw lines. the former is better for line editing (it already does that, really); the latter is better because i don't need to rewrite anything, though forth parsers are really simple and i'm not using a tremendous amount of special plt read syntax. i wonder if emacs read syntax is extensible? anyway, what i do need is a way to switch the mode in emacs, and not in the target scheme image.

Entry: fresh install
Date: Thu Apr 26 09:16:42 EDT 2007

i tried a fresh install, but apparently my compile script tries to compile stuff in the plt dist, starting with the deps of "match.ss". "sudo ./go" should work.. so, how to install? should i keep all the source files 'writable'? should i keep it in dev land for a while? maybe best.

Entry: project directory
Date: Sat Apr 28 19:24:10 CEST 2007

i need to solve the following problems:
  - core should be installed system-wide
  - project directory should contain multiple projects

the idea is that 'clicking' on a state file should bring up everything. let's try to make sense of this: the brood system is aimed at developers.
in that sense, it is encouraged to hack the system, which means the scheme files should not be stored system-wide, and they should be writable. this allows the compilation cache to remain as it is. the source dir has a subdir called 'prj' which contains subdirectories, one for each project. these individual subdirectories could be managed using darcs. it's absolutely essential to find a way to have the TARGET determine which project to load. in order to do this, we use the reply of 'ping' as the name of the project. there is one default project for each architecture, which serves as an example.

-> compilation from scheme: right now i invoke mzc; it's probably better to do so from a scheme script.

all this seems to work. next problems:

  * windows / osx : emacs + serial port config
  * using snot : rewrite all language repls to a standard interface : one line (string) at a time; snot requires it to be 1 or more valid s-expressions.

for the last one, i think i found it: just have 'prompt' display the prompt and accept the next line of input. this can be done using a simple coroutine/continuation trick.

Entry: getting to working usb
Date: Sun May 6 13:04:44 CEST 2007

roadmap:
  * constants as forth file
  * platform dependent constants
  * 2550 init
  * get serial monitor working
  * ...

Entry: usb debugging
Date: Mon May 7 13:36:31 CEST 2007

got the kernel messages going etc.. looked at doc/usb/asmusb.asm (johannes adapted this from C code) to find out i need to enable full speed instead of low speed: #0x14 -> UCFG. now i get transactions. time for the highlevel protocol.

Entry: usb device descriptors : usb.ss
Date: Sat May 12 13:25:30 CEST 2007

looks like it's working: i can compile device descriptors from a more reasonable highlevel description. next step is to organize the tables in flash. ignorant of content, the thing it needs to do is to map

  device     -> (n,addr)
  (string,i) -> (n,addr)
  (config,i) -> (n,addr)

the logic then needs to transfer the buffer in chunks, so i need a proper tree structure in flash, preferably one that can handle errors so the device is a bit robust. these things are read-only, so they can be implemented directly as code. for example:

  device/string/config ( id -- string )

which is encoded as

  : device 3 word-table addr0 ,, addr1 ,, addr2 ,,
  : addr0 length , 1 , 2 ,
  : addr1 length , 3 , 4 , 5 ,

here 'word-table' does bounds checking + throws an exception for error handling. it's probably easier to just use 'min' to limit the offset, then install the last redirect as an error handler, so:

  : config 3 min route config0 ; config1 ; config2 ; error ;

Entry: conditionals < and >=
Date: Sat May 12 14:54:37 CEST 2007

in pic18-comp.ss they are implemented as macro predicates, following the standard forth comparison operators: consume 2, leave a condition. ( a b -- ? ). these can be followed by if. i've been looking into a more general way of using the CPFS[EQ|GT|LT] opcodes, by mapping them onto the conditional jump implementation. been avoiding this for a while, because i have unsigned 'max' and 'min'. the thing is 'cbra': it consumes a condition, and compiles a conditional branch. does this really make sense? the other conditionals can be inverted; these cannot: only by swapping jump targets. so:

  - change 'not' to support a new pseudo op
  - change 'cbra' to do this branch-based swapping

looks like it's working. an optimization is possible in case of single opcode instructions, but it's probably better to just code them as macros. needs some thought.
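to make the swapping concrete, here's a minimal scheme sketch. the names ('inverted', 'skip-if-not', 'new-label') are mine, not the ones in pic18-comp.ss: a flag test could be negated directly, but a CPFSxx-style skip cannot, so 'not' wraps the condition in a pseudo op, and 'cbra' compiles that case by swapping jump targets, i.e. skipping over an extra goto.

  ;; 'not': flip by wrapping/unwrapping a pseudo op.
  (define (negate c)
    (if (eq? (car c) 'inverted) (cadr c) (list 'inverted c)))

  (define label-count 0)
  (define (new-label)
    (set! label-count (+ label-count 1))
    (string->symbol (string-append "l" (number->string label-count))))

  ;; 'cbra': compile "branch to TARGET if condition C holds".  the
  ;; inverted case reuses the plain one, but branches to a local
  ;; fall-through label and routes the other case to TARGET.
  (define (cbra c target)
    (if (eq? (car c) 'inverted)
        (let ((fall (new-label)))
          (append (cbra (cadr c) fall)
                  (list (list 'goto target) (list 'label fall))))
        (list (list 'skip-if-not c) (list 'goto target))))

  ;; (cbra '(= a b) 'yes)
  ;;   => ((skip-if-not (= a b)) (goto yes))
  ;; (cbra (negate '(= a b)) 'yes)
  ;;   => ((skip-if-not (= a b)) (goto l1) (goto yes) (label l1))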
Entry: usb descriptors again
Date: Sat May 12 15:53:12 CEST 2007

it's probably best to just keep using 'route' in combination with 'min' and an error handler. let's standardize a 'buffer' or a 'string' to what i already use for the 'ping' command:

  : my-flash-buffer string>f length , 0 , 1 , 2 , 3 , ;

this means that the word 'my-flash-buffer' sets the current flash object (the f register). a string is a flash object which has its length stored in the first byte. so '@f++' on a string object will give the length and leave f pointing to the raw bytes, so successive '@f++' will read out the bytes. the usb descriptors should be stored in exactly the same way: device, configuration and string should just set the current flash object, which is understood to be a purrr string. so, the following output

  ((device (16 1 16 1 0 0 0 8 216 4 1 0 4 3 2 1))
   (strings ((23 3 68 101 102 97 117 108 116 32 67 111 110 102 105 103 117 114 97 116 105 111 110)
             (19 3 68 101 102 97 117 108 116 32 73 110 116 101 114 102 97 99 101)
             (5 3 48 46 48)
             (10 3 85 83 66 32 72 97 99 107)
             (28 3 77 105 99 114 111 99 104 105 112 32 84 101 99 104 110 111 108 111 103 121 44 32 73 110 99 46)))
   (configs ((9 2 25 0 1 0 0 160 50 9 4 1 0 1 3 1 1 1 7 5 128 160 8 0 0))))

can be transformed into:

  : device string>f , ... ,
  : string 5 min route string0 ; string1 ; ... ; string-error ;
  : config 1 min route config0 ; config-error ;
  : string0 string>f , ... ,
  : config0 ...

maybe it's easier to just eliminate the intermediate names, since there is a notion of arbitrariness involved. they are just local labels, as used with if ... then. all in all, just generating a couple of names is probably easiest. ok, done. now loading. the thing to fix next is a global path for any kind of file loading mechanism.

Entry: some weird bug with forth parsing
Date: Sun May 13 13:35:02 CEST 2007

apparently, for parsing macros (color macros) like 'load' and 'path', there is a problem when the macro that implements the behaviour, popping the name from the data stack, is not defined.. i don't know why.. maybe i need to make that macro parsing part a bit more transparent. currently parsing words are a bit of a hack. i need to get to the core of the problem and fix it. again:

  * forth macros are cat words, and as such are 1-1 semantic/syntactic
  * forth parsing transfers parsing words to quoting code: something forth source cannot represent, but parsed cat code can.

maybe i need a symbolic intermediate form, where lists are quoted explicitly? like PF. with a mapping like:

  (load file.f) -> (('file.f load) run)

hmm.. it's probably just a bad day to make decisions. ok. calmed down a bit. load-usb is working now. next: hands-on transfer.

Entry: state machine or task?
Date: Sun May 13 15:55:33 CEST 2007

a task that does usb transfers makes sense. however, since i'm still debugging, i think a more lowlevel approach is better. once i get it running, i can write everything in blocking form.

Entry: jump bits
Date: Sun May 13 15:56:55 CEST 2007

words use relative addressing. this can lead to trouble. what about this:

  * just assemble, but when an address doesn't fit, keep it symbolic.
  * 3rd pass: gather all addresses, and compile stub words containing a goto to the words that were called but not reachable. (see the sketch below.)

this will keep code small, and the assembler simple: no need for variable size goto instructions inside words. the rationale is: this forth is for lowlevel stuff. for highlevel things, use a DTC on top of this: there you don't have a problem.
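a minimal scheme sketch of that 3rd pass, with a made-up representation and range: call sites are (site . target) address pairs, and any call whose target doesn't fit the relative encoding is redirected to a freshly allocated stub word holding only an absolute goto. this assumes the stubs themselves are placed within reach of the callers.

  (define (fits? from to)
    (< (abs (- to from)) 1024))   ; say, +-1k words of relative range

  ;; calls: list of (call-site . target-addr) pairs.
  ;; returns (patched-calls . stubs), allocating stub words from stub-base.
  (define (add-stubs calls stub-base)
    (let loop ((cs calls) (patched '()) (stubs '()) (next stub-base))
      (if (null? cs)
          (cons (reverse patched) (reverse stubs))
          (let ((site (caar cs)) (target (cdar cs)))
            (if (fits? site target)
                (loop (cdr cs) (cons (car cs) patched) stubs next)
                (loop (cdr cs)
                      (cons (cons site next) patched)       ; call the stub
                      (cons (list next 'goto target) stubs) ; stub = long goto
                      (+ next 2)))))))                      ; goto takes 2 words

  ;; (add-stubs '((10 . 20) (10 . 5000)) 100)
  ;;   => (((10 . 20) (10 . 100)) (100 goto 5000))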
Entry: stamp dead
Date: Sun May 13 19:57:41 CEST 2007

serial port driver dead or something? i don't know. it doesn't seem to be a software problem. the chip isn't doing anything. without a scope this is hard to debug... so plan B:

  1. brood + snot (1 evening)
  2. sheepsint buttons + audio out port (1 evening)

-> leuven for scope and other stuff..

Entry: stamp back
Date: Sun May 20 12:01:59 CEST 2007

something going on here.. i tried stamp 2, which refused to work a couple of times, until i got it going. then replaced it with the original 'broken' stamp, and now that one works too. maybe it's just my breadboard.. since i did have to move 2 pins to the left on the breadboard because the 2nd stamp's pin header is too big.

Entry: late binding
Date: Sun May 20 12:47:48 CEST 2007

what i need next is some form of late binding to do incremental debug. the code runs fine up to a point, from which i need to make small changes. reloading there is a drag, so i need a proper construct.

  defer broem
  2variable broem-hook
  : broem broem-hook run-hook ;

some premature optimizations: since these variables don't really need to be accessible, it's maybe better to put them somewhere behind the ram bank, for example shadowed by the FSR registers.. this way a hook can be represented by a 1 byte XT.

Entry: color macros
Date: Sun May 20 18:30:49 CEST 2007

what i mean by color macros is macros that modify the 'color' of subsequent words. currently i have no way to implement new parsing words in forth. this is not a good thing.. something is broken, but i don't know what exactly. probably my understanding...

problem: parsing words use automatic name mapping. this is bad, since it's viral. meaning, once you start doing things like that it's all over the place: there is really no clean way to nest parsing words. so i need a different approach: extend the partial evaluator to include symbols. the deal is this: the PE uses the assembly buffer as a data stack. because some words use the CAT data stack for 'data' items, things get confusing. so, the thing is: i need a single macro that quotes the next atom in the input stream as a literal, and then use that.

Entry: partial evaluation revisited
Date: Sun May 20 19:31:53 CEST 2007

i ran into a pattern: the assembler buffer can be used as a data stack to perform partial evaluation. i don't have a proper way to make this sound, but it seems to eliminate the need for an 'interpret mode' in the sense of classical forth. the interpret mode is replaced by a set of rewrite rules that perform compile-time evaluation. so instead of

  [ 1 2 + ]L

we just have

  1 2 +

with the same result: 3 being compiled. actually, in the latter case purrr will produce [movlw (1 2 +)], so the evaluation can be delayed as long as possible. this can be extended to the following pattern: allow target forth values to be richer than just numbers, but require that they can be combined into lowlevel constructs. since i use this trick a lot, why not make it a feature instead of an optimization? currently the postcondition of compiling a literal is valid assembler code. what about relaxing this to a delayed literal stack, and introducing a 2nd pass to comb out all the remaining, non-optimized literals? once i have this, partial evaluation becomes better defined: quoted symbols can be included and can be used in parsing macros. the CAT data stack can then be used for control operations only. big change. probably requires a temporary fork.
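the "1 2 + compiles to 3" rule as a minimal scheme sketch, treating the head of the assembly buffer as a literal stack. the instruction names ('lit' and the runtime word '_+') are placeholders, not the real pic18 patterns; the buffer is a list with the most recent instruction first.

  (define (compile-add asm)
    (if (and (pair? asm)       (eq? (caar asm)  'lit)
             (pair? (cdr asm)) (eq? (caadr asm) 'lit))
        ;; two pending literals: fold at compile time.
        (cons (list 'lit (+ (cadar asm) (cadadr asm)))
              (cddr asm))
        ;; otherwise fall back to a run-time addition.
        (cons '(call _+) asm)))

  ;; (compile-add '((lit 2) (lit 1)))  => ((lit 3))
  ;; (compile-add '((movf x)))         => ((call _+) (movf x))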
NEXT: 'lit' macro preprocessing step.

is it possible to make 'lit' a pseudo-asm operation? yes, but the disadvantage is that it's not 1->1. is this required? yes: the asm is 1-1 sym<->bin, so this needs to be solved in the compiler. considering the percentage of code that intersects with delaying 'lit', i guess it's best to wait until after the big deadline, and work around the macro stuff now. as a matter of fact, i can still do it the old way, just adding a single 'quote' operator, for example backtick `. that's a good idea, as long as there's a [`] too, meaning macros can have literally quoted symbols in them. with those 2 primitives, all parsing words can be implemented.

Entry: back to debugging -- deferred words
Date: Sun May 20 20:37:11 CEST 2007

if the idea is just to get debugging working, it's easy: execute will do enough.

Entry: back to thinking about the literal stack..
Date: Sun May 20 23:53:26 CEST 2007

there's a juicy fruit on the tree somewhere.. but i can't see it through the thick leaves. a literal stack is an interesting idea, and so is commutation of some constructs with the literal stack.. i noticed that a problem atm is the hardcoding of [lit a b] instructions: the number of arguments is hardcoded. could be fixed with a postproc step, but have to be careful there..

Entry: parsing macros
Date: Mon May 21 11:42:18 CEST 2007

forth parsing words require an input to be attached. my model does not allow that: it requires parsing macros to live in a separate class. hmm.. this is really kind of complicated. what about providing a mechanism to create parsing macros as pure symbolic macros? hmm.. ok, i got symbolic expansion macros now, but that's not the same as recursive parsing macros! i'm having difficulty getting my head around all this.. next step is to write a macro mode which recursively calls the parser. ok, i think i found it now: the trick is to allow composition. the best way to do this is probably to write the parsers as CAT words.

Entry: parsing
Date: Mon May 21 14:16:48 CEST 2007

i think i got it now. i'm just doing parsing wrong: each parser should have an explicit 'read' and 'write' operation. then some glue can be constructed to compose all of them. 'read' reads the next input atom, and 'write' outputs CAT code in parsed or symbolic form. i need to really let this go and get the usb driver working.. rewrite stuff accumulated thus far:

  - explicit literal stack with compile postprocess
  - parser with recursive composition

anyway, the bigger picture becomes visible: 3 different interpreters

  - compiler is kept in compositional mode: every source atom corresponds to a single action in CAT
  - before: parser converts multiword constructs into single word constructs
  - after: assembler uses localized arguments -> not compositional, just a sequence of independent commands

Entry: grounding problems
Date: Mon May 21 16:24:16 CEST 2007

very strange: if i touch the table, the pic resets. some kind of EM interference. i don't really know what's going on, but putting the stamp in a cage worked: just a grounded metal top of a metal box. if i stick the probe in the carpet, i can measure about a 25 V peak-to-peak 50Hz signal. maybe i should just ground my table? ok, i connected the TV cable shield plugged into the cable modem to the case of zzz. without this cable there's 114V ac across. this seems to fix the problem: no more 50Hz on the carpet.

Entry: defer
Date: Mon May 21 17:11:41 CEST 2007

hmm... the only thing i really need is to 'overwrite' a function.
using a separate ram table for deferred words might be a good solution if a lot of them are needed, but it sure does complicate matters. moreover: it requires loading values to ram etc.. what i need is really a cheap hack:

  : someword nopf 1 2 3 ;

the 'nopf' could be overwritten, since it's #xffff. this opcode can then refer to the next definition.

Entry: usb debugging
Date: Tue May 22 13:49:39 CEST 2007

using usbmon, i get this as the first failure after the first request, which is a device request:

  d97cb540 144438646 S Ci:000:00 s 80 06 0100 0000 0040 64 <
  d97cb540 145068664 C Ci:000:00 -84 0

the odd thing is the request length, which is set to 64 and not 8. status code is -84, which means
http://www.mail-archive.com/linux-usb-devel@lists.sourceforge.net/msg25936.html

so why doesn't it respond at all? maybe i need to acknowledge the TRNIF before sending a response?

Entry: the a & f registers
Date: Tue May 22 16:50:37 CEST 2007

i need a proper coding style. let's try the following: the caller is responsible for saving the current object context. this means it's regarded as a low level feature, and bad coding style, to pass arguments in the a and f registers. conclusion: use them only in small lowlevel words, and use functional words or different object representations on higher levels. CURRENT OBJECT = BAD !!

Entry: different macro implementation
Date: Tue May 22 16:57:42 CEST 2007

or better: an extension. currently 'macro' in forth only takes names from the macro dict. what about allowing runtime behaviour here?

Entry: ram copy
Date: Wed May 23 11:17:14 CEST 2007

funny, but i don't have any ram copy facility! the reason is of course that there is only one free indirect addressing register to use. in order to make a faster one, interrupts need to be disabled and one of the two other regs needs to be used. no time to think about that now, so i'm going to avoid mem->mem copy, and save only what i need.. (SETUP request is what i'd like to copy)

Entry: something wrong with XINST
Date: Thu May 24 12:19:49 CEST 2007

this is probably the cause of a lot of my misery: somehow access bank variables don't work right when XINST indirect addressing is enabled. for the workshop i switched back to the old instruction set, with access bank. need to figure out what's going on there later: somehow fetching/storing address 96 doesn't work either.. if i stay low, it works.

Entry: bouncing ball physics
Date: Thu May 24 15:22:02 CEST 2007

a bouncing ball can be made using the natural rollover from 255->0, combined with some coordinate mapping.

  A---B        A---B---A
  |   |   ->   |   |   |
  D---C        D---C---D
               |   |   |
               A---B---A

so, using the high bit to signify whether a coordinate is reversed, the operation simply becomes:

  : bounce clc rot<>c else rot>>c #x7F xor then ;

or even simpler:

  1st 7 high? test if -1 xor then

Entry: johannes config bits
Date: Sat May 26 18:52:37 CEST 2007

  - low voltage program: off
  - HS oscillator
  - power up timer: off

Entry: meta workshop notes
Date: Mon May 28 09:25:16 CEST 2007

all went really well after day 1 of total chaos; very happy with the result in the end. some remarks:

  1. need a proper 'erase-all' in case the chip is messed up
  2. need interaction word composition -> all symbolic
  3. more docs or reference words -> find some automated mechanism
  4. need simpler conditionals
  5. maybe distinguish between @high? and high? -> the btf instructions are odd ducks
  6. investigate extended instruction set troubles
  7. automate 'expose'

Entry: quoting symbols
Date: Mon May 28 09:35:39 CEST 2007

so, why not use syntax for this?
  `hello : 1 2 3 ;

i think i need to preserve parsing words for the simple reason that ':' is a parsing word. changing that behaviour makes things very different from standard forth. however, internally the parsing words should compile to the literal stack. the code above is actually quite clean. it has a symbolic representation as CAT code, in the form of lisp's quote form. this could be translated to forth in a minimal way. i could use this symbolic representation as the output of the parsing stage. an alternative lexer could then be used to make use of the more functional forth described above (one without parsing words, only some symbolic quote mechanism, where macros are purely concatenative). note that since it's not legal to have a literal symbol that is not optimized away, the ':' is redundant: symbols present after compilation are just labels. maybe even better: symbols are always labels. so why not get rid of the space?

  :hello 1 2 3 ;

so, if parsing macros are symbolic transformers, interactive macros could be the same. 'test words' if you want. this could lead to a better simulator. the first version looks better, and has ` compile as a literal.

Entry: literal stack
Date: Mon May 28 10:01:19 CEST 2007

just a quick look at what it would take:

  1. abstract all literal patterns
  2. make a local change in the abstraction

so this boils down to writing a generic pattern generator for literal opti, and a mechanism to execute arbitrary macros as a pattern. this is already there, but a bit of a hack. maybe it should be the default? ok. there is already a 'lit' defined in comp.ss, which can be extended to take multiple arguments.

Entry: cache
Date: Mon May 28 10:09:10 CEST 2007

an annoying thing in the current code is having to reload everything when the implementation of a word changes: the cache never invalidates. or rather, it's not a cache. so i need to change the implementation of 'word' to include a cache mechanism. it would be interesting to plug into the cache mechanism of scheme, but that would require either a lowlevel thing, or something with namespaces.

Entry: bug fixing day
Date: Sun Jun 3 12:32:20 CEST 2007

time to clean up some minor annoyances:

  * serial port settings: use 'system' + platform script when port is opened
  * faster upload (faster baudrate?)
  * snot integration + better emacs integration
  * fix parser -> parse to symbolic code
  * create interpret macros
  * sheepsint: build board tests for proto

Entry: monad stuff
Date: Sun Jun 3 17:15:08 CEST 2007

one problem i have with the way i perform function lifting (monads) is that it's not mixable: i can't just 'tag on' another monad. maybe this should be made a little more explicit. the next thing i need to implement is parsing macros: symbolic preprocessors to map forth to something closer to 1-1 cat code. last time i got lucky: i was able to use code as one of the input streams. now that's not so easy any more: there is an input stream which is not code. i guess the easiest way to tackle this is to just define a prototype CAT function for a parsing word, and work from there.

  in rout -> in+ rout+

with rout a reversely accumulated list of atoms. it's like the assembler proto, but with an extra 'in' state value. the default parser moves an atom from in -> rout. it would be nice to be able to compose parsing macros, so they really should be a special kind of macro: one built on top of ordinary macros, with the input stream on top of stack, and a primitive 'read' which takes an input object. (see the sketch below.)
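the "in rout -> in+ rout+" protocol as a minimal scheme sketch, with hypothetical names: each parsing word maps an input stream and a reversely accumulated output to updated versions of both; the default parser just moves one atom across.

  (define (default-parse in rout)           ; move one atom from in to rout
    (values (cdr in) (cons (car in) rout)))

  (define (parse-tick in rout)              ; a parsing word: quote next atom
    (values (cddr in)
            (cons (list 'quote (cadr in)) rout)))

  ;; drive the parse: look up a parsing word, or fall back to the default.
  (define (parse in rout parsers)
    (if (null? in)
        (reverse rout)
        (let ((p (cond ((assq (car in) parsers) => cdr)
                       (else default-parse))))
          (call-with-values
            (lambda () (p in rout))
            (lambda (in+ rout+) (parse in+ rout+ parsers))))))

  ;; (parse '(1 2 + tick foo) '() (list (cons 'tick parse-tick)))
  ;;   => (1 2 + (quote foo))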
Entry: snotification
Date: Sun Jun 3 20:55:19 CEST 2007

-> entry point = load state + enter main loop
-> main loop = event dispatch

got it mainly working, but i'm experiencing problems with asynchronous messages.. maybe i should get rid of the dots?

Entry: faith, evolution and programming languages
Date: Tue Jun 5 11:21:39 CEST 2007

by Philip Wadler, April 27, 2007, Google Tech Talks
http://video.google.com/url?docid=-4167170843018186532

a bit over my head, but things to look into:

  - a logic corresponding to a programming language
  - contracts
  - haskell type classes for polymorphism

about logic & programming languages:
http://video.google.com/url?docid=-4851250372422374791

Entry: boot config
Date: Wed Jun 6 19:06:17 CEST 2007

i'm looking for a better default for the boot loader, to make sure a project is in one of 2 states:

  virgin: run purrr interpreter on boot, no interrupts
  app: fresh reset vector + isr installed

if i make it so that 'scratch' can safely erase the boot sector, things might get more robust. things that can go wrong:

  - reset not defined, but isr defined
    solution: always define them in the same macro
  - reset or isr defined, but target code is gone
    solution: always erase the boot block on 'scrap'

Entry: forthtv pro
Date: Tue Jun 12 04:18:08 CEST 2007

let's see. what i need is a dual processor 18f1220 system.

VIDEO
  - low bandwidth I2C master (pull: poll at line frequency)
  - video using USART out
  - audio sampled at video line frequency

HUB
  - low bandwidth I2C slave (push)
  - keyboard interface (bitbanged)
  - USART for host serial

this is a nice excuse to prepare brood for multicore projects. note that the 18 pin chips do not have I2C, so i need to go to the 28 pin versions.

Entry: bank select
Date: Tue Jun 19 15:44:50 CEST 2007

keeping bsr at a fixed value, the extra bit in the instructions that access the register file can be used as an address bit. note the 1x20 has only 256 bytes of ram.

Entry: sheepsint core todo
Date: Wed Jun 20 14:52:53 CEST 2007

  - standalone boot + fall into debugger
  - battery operated
  - brood async io
  - 16 bit math for control ops
  - note/exponential lookup tables
  - pot denoise?
  - keep it working (*)

(*) don't know if i can do that yet. what i can do is freeze the software: make a fork of brood. i also can't fix the boot block. but i can fix the app block.. maybe i should go that way.

Entry: text interface
Date: Wed Jun 20 16:07:24 CEST 2007

i probably need to take a deep breath and change the monitor from binary to text. this would make it a bit easier to standardize, and also make it usable without the brood system, for debug purposes..

Entry: application control flow
Date: Wed Jun 20 16:38:38 CEST 2007

  1) boot
  2) mainloop (contains RX check)
  3) on RX, fall into interpreter

then from the interpreter, 2) can be entered. this works like a charm. sheepsint runs fine on 2 AAA batteries too, using an 18LF1220. summary:

  - empty bootblock -> fall into interpreter ('warm')
  - application -> install reset and isr vector (best at the same time)

something which is important though: if there's no serial TX connected to the pic RX pin, something needs to pull the line high. on the CATkit board, the easiest way is to insert a jumper between RX and TX.

Entry: DTC
Date: Wed Jun 20 18:03:14 CEST 2007

time to do the real job: a dtc forth. what i'd like to do is to chop the chip up in 2 pieces. first half is kernel + audio, second half is DTC on top of that. this is not as easy as it looks :) but.. it might be more robust. basically i have the following choice:
  A. go brood/snot and finish that interface (requires emacs + plt)
  B. go binary and use just a terminal emulator

i basically promised B, which, for education and not-too-sophisticated use, is what we need. getting A ready to the point where i can teach it is too much, so i have no choice really. i need a real forth! and i need it before i can do more synth stuff.. or better, while doing it. so what is necessary?

  1. terminal input with XON/XOFF
  2. dictionary
  3. compile link to ram
  4. copy ram->rom

Entry: conditionals
Date: Wed Jun 20 21:16:41 CEST 2007

ok.. using the flag macros can be fast, but it's also really really hard to use if a condition always needs to be a macro. so i need basic '=' etc.. using nfdrop, and a proper if that accepts any kind of byte. not completely tested, but the asm looks ok.

Entry: mini module system
Date: Wed Jun 20 22:19:50 CEST 2007

basically, do something similar to PF: a 'provide' word will skip loading the current file if the word already exists.

Entry: terminal.f
Date: Wed Jun 20 23:10:17 CEST 2007

thinking about this XON/XOFF thing: there is really no way around doing this with interrupts and proper buffers. the problem is really that when we send an XOFF, a byte can already be in progress. in fact, if there's no break, and the host is sending at full speed, it probably is. so a proper interrupt/buffer scheme is necessary. time to dig up those cool 15 byte buffers again :)

Entry: read/write pattern
Date: Thu Jun 21 12:02:31 CEST 2007

something which occurs a lot is an update to memory which i'd like to put in a macro. till now i've always solved this using a macro which expects a memory address. maybe that's the only sane solution? need to think about this.. a bit of a hack, but something that might be interesting: have a 'lastref' macro which compiles a ref to the last referred variable.

Entry: workshop
Date: Thu Jun 21 12:41:26 CEST 2007

this serial terminal thing is not going very fast.. maybe i should focus on finishing the 16 bit words first, then build a tethered DTC on top of that? maybe indeed best not to stress too much. it is working. i just need to add some control to the synth.

Entry: multiplication
Date: Thu Jun 21 21:38:45 CEST 2007

the first thing to do is to create a generic unsigned multiplication, and derive the other muls from that. let's call 'z' an 8 bit shift (256). we need to compute

  (x0 + x1 z) (y0 + y1 z)

where all coefficients are 0 - 255. this gives

  z^0   x0 y0
  z^1   x1 y0 + x0 y1
  z^2   x1 y1

the lowest of the 4 result bytes is unaffected by the 3 other partial products, and the second only by the top one. so, i'd like to do this

  - fast
  - functional, so no temp variables

the variables are presented as x0 x1 y0 y1. every number is used twice. now for the juggling.. done: i gave up on not using ram. it's probably possible to just use the stacks, but it's really inconvenient due to the 'convolutive' nature of multiplication. what i mean is: multiplication has all-to-all data dependencies, and is not easily serialized. if it is serialized, it needs random access (variable names) or at least relative indexing. forth is not good at that.

Entry: refactoring
Date: Fri Jun 22 13:18:53 CEST 2007

some things that need to change in brood to make it easier to understand and modify:

  - words need to be cached, not delayed evaled, so incremental loads are possible
  - parser macros -> purely symbolic, using only a 'quote' word for some 'pure forth'
  - the partial evaluator needs to be properly defined, so more elaborate operations are possible. i.e. explicit literal stack + commutation of operations with literals.
so, in short: CACHE, PUREFORTH intermediate (without parsing words), and explicit PARTIAL EVALUATOR. the PE needs to work together with the PUREFORTH, to be able to have symbols as "ghost values".

Entry: forth vs DSP
Date: Fri Jun 22 13:27:09 CEST 2007

following the remark above about multiplication: most DSP stuff is like that, so i wonder if it makes much sense to write a forth for the dsPIC. anyways, it shouldn't be too hard once i clean up the compiler code a bit.

Entry: sheepsint next
Date: Fri Jun 22 16:34:41 CEST 2007

ok, DTC and multiplier are working. time to get busy :) maybe i do need to think a bit about the memory model though. might be interesting to have full device control.

Entry: memory model
Date: Fri Jun 22 17:40:02 CEST 2007

what about simply:

  * kernel is overlayed with RAM + EEPROM
  * all the rest is flash

note that only the first 32kb can contain VM code, due to the VM using 2 bits. the other 32kb is addressable, but only usable for tables etc.. not important now for the PICs i use. so the ram address space is max #x1000; for data eeprom i'm not sure there is a limit, but we have only #x100 and it's not used. what about we map the flash to the upper 32 kb, and ram from the start? then eeprom could be added later.

Entry: vm macros
Date: Sat Jun 23 22:47:10 CEST 2007

basically, i need control words. so i need a mechanism for vm macros. ok, in place. next is to just write macros, and to add a mechanism for loading. actually, this is kind of interesting, since it requires 'control stack' operations. to re-iterate, i have these kinds of macros:

  - peephole optimizers (asm buffer used as literal stack)
  - control operations (use data stack as control stack)
  - recursive macros
  - simple incremental macros (writer monad)
  - whole-state assembler macros (i.e. global optimization)

if i make the stacks a bit more obvious: literal and control stack need to be independent. the control stack is sort of a literal return stack. so i just need to write accessors that bridge the literal stack (asm buffer) and the control stack (data stack). the more general thing that interests me is to make more functionality available at the forth level, so more powerful macros can be written straight in forth, without having to resort to tricks. in short: i need a meta-forth, not a meta-cat, so cat can be tucked away as an implementation/intermediate language.

Entry: compilation stack and word names
Date: Mon Jun 25 11:40:58 CEST 2007

i find some standard forth words a bit confusing. it's probably easier to start calling the compilation stack 'c' and be explicit about the traffic. there are only 2 label operations: localsym>c (generates a new label) and label>c (compiles a label reference for the assembler). in ordinary forth, labels can be patched, effectively implementing a dual-pass assembly. since we're not using mutation, we just generate a label at the first occurrence (instead of reserving an empty cell and pushing its address) and bind it to opcodes as required later. these symbols will be bound by the multipass assembler later.

Entry: writer macros
Date: Mon Jun 25 11:56:23 CEST 2007

these are confusing. maybe i really shouldn't distinguish between 'writer' macros and 'asm buffer' macros. the writer thing is clumsy and a bit hard to understand, so i'm taking it out.

  + it's simpler: i'm using some I/O style monad '>asm'
  - writer macros can't be isolated any more (the assumption needs to be: modifies the whole state, not just concatenation.)

this doesn't seem to be a big disadvantage.
it's probably better to use some kind of tag system to classify macros according to properties. the only thing i use it for is optimization, where missing a classification means some optimization can't be done, so it won't cause fatal errors.

Entry: make-composer
Date: Mon Jun 25 13:24:19 CEST 2007

another thing i'm running into is my terminology about namespaces. if i have a collection of words, i'd like to specify:

  - source dictionary (semantics)
  - destination dictionary (def)
  - parser (syntax)

currently that's make-composer, but the names used are a bit confusing. this can be done better. maybe i should just rename make-composer to define/parse/find.

Entry: parser words
Date: Mon Jun 25 13:38:01 CEST 2007

this needs a thought about what to do with parsing words, mostly quoted symbols. i guess it's safest to put them on the compilation stack, so i don't need any literal optimizations.

Entry: todo
Date: Mon Jun 25 13:38:54 CEST 2007

  - take out all writer stuff  OK
  - rename asm-buffer-find to find-asm-buffer,
           asm-buffer-register! to register!-asm-buffer,
           state-parse to parse-state
  - fix parser macros: decide on lit/comp stack
  - fix assembler evaluator

i'm not going to change the find/register!/parse names. this is just cosmetics.. about fixing the assembler evaluator: what about requiring all literal arguments to be cat code?

Entry: literal stack + compilation stack
Date: Mon Jun 25 14:37:20 CEST 2007

the important thing about stacks is that you need two of them, i once read. which seems to be the case. currently i'm trying to figure out what should go where by default. the idea of the 'literal stack' is simply to be able to do some computation at compile time. a nice feature here is that a lot of operations become more natural. for example: 1 2 + is really just 3. and this is a mandatory optimization in badnop, something you can rely on as a feature. standard forth would make this explicit:

  [ 1 2 + ]L

the reason i don't use the above is that my meta language is not forth, it's CAT. more importantly, CAT is much more powerful than the simple 8 bit forth is. so, the idea goes:

  - mandatory literal optimization (compile time evaluation)
  - forth extended with 'ghost' types

the ghost types are things that make no sense for the microcontroller, but when they are combined with other ghost types, result in things that do make sense. the most obvious one is assembler labels: ' foo will compile code that loads the (symbolic) address of foo. if this is followed by a macro that consumes it, the whole can be reduced to code that does have a meaning on the microcontroller. i'm not 100% convinced this is a good idea (not being explicit), but it does feel like one. what i'm looking for is to give it a decent meaning, and to find out when to use the literal stack and when to use the compilation stack. another thorn is the way the literal stack is implemented, but that can be fixed later. right now i need to get the semantics right.

i'm not asking the right question.. what's the real problem here? the target chip has a clear separation of ROM and RAM. this is both convenient (code is persistent) and not (they need to be treated differently). what i'd like to do is to make a source file correspond to only ROM. standard forth doesn't do that: loading a file both writes code and initializes data. i guess this is the main reason why things are different for me:

harvard: ram initialization (run-time code) and meta compilation (compile-time code) are strictly separate.
von-neumann: both can be done at the same time (program load time), and blur together.

so what does this have to do with the literal stack?

  - the meta language is not forth
  - i'm trying to disguise this

basically, i'd like to not think about this thing being a cross-compiler, and act as if everything runs on the target. one way of doing that is to require compile time evaluation whenever it is possible. as a result, the simple recursive macro system, which does not refer to the real meta language directly, becomes more powerful: required partial evaluation gives it some run-time power, instead of it merely being passive concatenation of code. so the real question is: how to simplify the target language such that no explicit reference to the meta language is ever necessary, and all macros have a compositional semantics. the way that seems most natural to me is:

  - partial evaluation is the default: act as if everything is done at run time (like "1 2 +"), but write the macros such that they perform compile time evaluation + raise an error when higher level things can't be resolved at compile time.

  - some constructs use the COMPILATION STACK, referred to as 'c'. this is mainly intended for code blocks, and serves a bit the role of the return stack.

this also gives the solution for parsing words: their default semantics is to map something to a literal compiler. a common problem i encountered is a macro which has 2 references to the same name. this is now easily solved using the compilation stack. so the key is really in the words '>c' and 'c>'.

Entry: vm words and literals
Date: Mon Jun 25 15:18:13 CEST 2007

so, looking at the remarks before.. the literal stack is really more than just literals. it could contain words too. words in their normal meaning are calls. so: the assembler buffer is just a stack of symbols, bound to semantics (literal, call, jump). what i really need is 2 new opcodes: lit and word, that will be resolved in the assembler, but that can be used in the optimizer and partial evaluator without too much trouble. so i think i see the roadmap now:

1. fix the assembler to take these opcodes:

     cw   call word  (code)
     jw   jump word
     qw   quote word (data)

   which are really just the primitives used in the VM

2. fix the peephole optimizer to operate on those words

this will give a proper semantics to the literal stack: basically it will then contain words + their meaning, code or data. again a simple pattern: delay low level representation as long as possible. ok. now i need to check first if the monitor code still runs.. it does. time to fix this. it's probably easiest to create an extra assembly step which filters out the pseudo ops. could be interesting to clean up the assembler a bit. i'm writing pic18-compile-post now, and will start using 'values' to do the expansion. at first i thought this values thing was a bit clumsy, but having to wrap things in a list is usually more work: it's better to do this in the consumer using call-with-values than in the producer, when there are a lot more producers than consumers. which is the case here..
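the filtering step as a minimal scheme sketch (the real one is pic18-compile-post, which uses 'values' instead of lists; the concrete expansions here are illustrative, e.g. a quoted word becoming a stack push of a literal). one pseudo op can expand to several real instructions, which is exactly why it can't stay 1-1 in the assembler.

  (define (expand-pseudo ins)
    (case (car ins)
      ((qw) (list '(dup) (list 'movlw (cadr ins)))) ; quote: push literal
      ((cw) (list (list 'call (cadr ins))))         ; call word
      ((jw) (list (list 'goto (cadr ins))))         ; jump word
      (else (list ins))))                           ; real opcode: unchanged

  (define (erase-pseudo-ops code)
    (apply append (map expand-pseudo code)))

  ;; (erase-pseudo-ops '((qw 42) (cw foo) (jw exit)))
  ;;   => ((dup) (movlw 42) (call foo) (goto exit))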
the problem now is that a lot of the literal macros do need their arguments. am i going to try to fix that now? maybe think a bit about how to do that in a smart way.. ok. roadmap:

  - just added c> == '(qw) op>asm
  - replace all lit macros by a qw macro, and remove them from the expose hack

done, now for the calls. TODO:

  - replace branches and calls with pseudo ops
  - fix vm control ops
  - start working on the control part of the synth

Entry: multiple passes: pseudo assembly language
Date: Wed Jun 27 10:41:04 CEST 2007

so the pseudo assembly language is a bit more explicit now. between forth and real assembler there is a representation where the opcodes qw, cw and jw are used. they give a proper typed stack meaning to the assembly buffer. as a consequence, quoting in macros can be eliminated, and pure postfix notation can be used, using the 'word>c' operation, which takes a code word from the assembly buffer and moves the tag to the compilation stack. in short, instead of

  ' word <...>

one can do

  word word>c <...>

where <...> handles the symbol/address. now wait.. if quote is no longer necessary in the compile time semantics, why use it? (it is still necessary in the run time code. back to that later.) the whole idea seems to be: because all behaviour is postponed (compilation), it doesn't have to be stopped before it happens. meaning: if i enter 'broem' at a command prompt, it will execute -> damage done. if i want to not run it, i need to 'quote' it, which means postpone execution. during compilation, everything is postponed, so no need for quotation! wonder if i can make that a bit more formal. this does ring a bell somewhere. been reading about the macro/module system in PLT scheme. something with keeping run time and compile time separated to make dependencies explicit.. anyways.

oops. not completely true: if it's a macro, and you want to refer to it, it needs to be quoted. an 'almost right' thing here.. if there are no macros, it's right. macros are code, the rest is data during compilation. if there's no code, it's true. back to my original point: quoting is postponing execution. maybe i should just try it to see if i get into situations that are awkward, because it does look promising.

Entry: vm compilation: one word to change semantics of parsed code?
Date: Wed Jun 27 11:25:18 CEST 2007

since the compilation buffer already contains the code/data (quote/call) distinction, only a single word is necessary to convert any operation to its vm equivalent. this word should leave macros alone. the problem here is that i need a type (pattern) matching word, so not yet.. simpler: i'd like to remove the quote in 'vm->native/compile' and in the vm-core.f file, so i can easily compose macros. quote is really a preprocessing thing, which is necessary to get from source -> forced data semantics. once parsed to intermediate, no quote is necessary. argh.. so i don't really need to remove the quote there, since it's exactly that: a preprocessing step to generate native forth code. it's ok that this includes a quote operation. so '_literal' and '_compile' take data atoms on the literal stack, which means quoting is necessary for code atoms. this allows VM semantics (the decision for code/data) to be different from the lower level language, which is a good thing. check vm-core.f for some explanation. summary: "purely compositional macros == good thing". it's the basic idea of CAT. see the notes below. question is though: can i make these macros powerful enough to have some kind of lambda construct? postponed macros basically?
the only thing i want to solve now is conditionals, but better to aim for the bigger thing.

Entry: language path
Date: Wed Jun 27 11:41:28 CEST 2007

  FORTH with parsing words = symbolic
    ---> FORTH with only quote = symbolic
    ---> pure FORTH without quote = compositional CAT code
    ---> intermediate assembler = effected macros (real asm) with pseudo asm (qw, cw, jw)
    ---> symbolic machine assembler
    ---> binary

i should give these a name:

  PURRR18/forth (quote + parsing words)
    ---> PURRR18/quote (pure + quoting word)
    ---> PURRR18/pure (purely compositional macro language, as CAT code)
    ---> PURRR18/asm (PIC18/asm augmented with pseudo ops)
    ---> PIC18/asm (my version of the symbolic assembler language)
    ---> PIC18/bin (binary machine code)

Entry: i want it all
Date: Wed Jun 27 12:14:37 CEST 2007

what about postponing macros? i basically want conditional branching at compile time, but full lambda (quoted macros) would be interesting too. now what did i expect? forth is not CAT. this is a game of syntax, in the end.. i'm trying to cram a meta language into the language syntax, without using its quoting mechanism: lists. it looks like i can't make it too powerful without introducing quoting syntax, which is what i'm trying to avoid to keep it simple. the problem which sprouted this line of thought is the VM return operation. the words "_then _;" don't work because ";" expects a word. so i'm going to need an extra primitive to solve this conditional execution. maybe there is only one real solution: make the ' operation a syntactic one, like in lisp.

  - if quote is syntax, an intermediate language is not necessary.
  - if it's not, a parsing stream needs to be available.

the last one is obviously worse, since it makes composition harder. so that's what it will be: quote needs to be syntax, and ' is a special character. so, pure forth in s-expressions is

  <code>   ::= ( {<atom>} )
  <atom>   ::= <word> | <quoted>
  <quoted> ::= <atom> | ( quote <atom> )

to preserve previous syntax, the run-time semantics of "' word" is still "load address of word on parameter stack". so, to summarize again: is quote a lexing operation, or a syntactic operation? the answer seems to be the former. the problem this solves is this: syntactically, code and data are distinct. the full domain is split in 2 parts, but semantically, code is a subset of data. introducing quoting at the lexing level gives:

  - better mapping to CAT (using the same lexical trick)
  - saner semantics: quote is defined independent of an input stream
  - quoting can be used in macros, using forth syntax, keeping the compositional property

in the language path above, the 'pure' and 'pure+quote' will now be the same, so i have

Entry: updated language path
Date: Wed Jun 27 13:31:57 CEST 2007

  PURRR18/forth (quote + parsing words, symbolic form is not CAT)
    ---> PURRR18/pure (purely compositional macro language, has symbolic CAT form)
    ---> PURRR18/asm (PIC18/asm augmented with pseudo ops)
    ---> PIC18/asm (my version of the symbolic assembler language)
    ---> PIC18/bin (binary machine code)

so the entry point is there to preserve original forth syntax, i.e. ": abc". for internal processing, this will be mapped to "'abc make-:" or as s-expression ((quote abc) make-:). the 'make' name i need to think about still.. the reason for having ' as a lexing operation, instead of parsing, is that it eliminates one parsing layer + it maps better to CAT. this is different from forth, but in a way that is probably hardly noticed.

Entry: again?
Date: Wed Jun 27 16:16:34 CEST 2007

so why not just a parsing step?
i need types to do this properly:

  macros: pure+quote -> pure
  forth:  forth -> pure

parsing words are merely frontends for pure. the alternatives are:

  1. lexing produces a stream of symbols and numbers. then there are 2 different parsers that map this to pure forth.
  2. lexing already produces quotes

the first option is really simpler, so let's keep that.

Entry: parsing
Date: Wed Jun 27 16:36:30 CEST 2007

so now i need to redo parsing. currently, it's a bit of a hack. it's not extensible. but do i really want it to be extensible? i need a different 'kind' of word: a parser is not a macro.. they operate on different levels. so let's abstract it out a bit. 2 steps need to be separated:

  forth -> symbolic cat
  symbolic cat -> parsed cat

both are parsing operations structurally, but it's maybe best to give them different names? i got it, except for the quoting stuff.. now, a problem i ran into is that ' abc actually compiles a byte address. i wonder where this will fail if i change that.

Entry: bytes or words
Date: Wed Jun 27 18:25:42 CEST 2007

some conflict here:

  bytes: "' abc org" needs byte addresses
  words: "' abc" can be used as just a symbol

maybe quote is more important. maybe we need to have "execute" take word addresses everywhere? that's also better for the VM. the thing is: data is always byte addressed, while code is always word addressed. a unified address space (bytes) would be nice, but makes things complicated since quoting is not just quoting.. so best seems to me:

  * execute takes word addresses
  * monitor JSR will also take word addresses
  * quoting a symbol name has default semantics to load the word address on the stack

Entry: cosmetics
Date: Wed Jun 27 18:40:35 CEST 2007

TODO:
  - make dtc intermediate code a bit more readable
  - fix prj path as mutable state (arbitrary.. maybe see it as a constant?)

the last one isn't so important.. the first one requires some kind of loopback, and i think it will make things too complicated.. need to think about it.

Entry: dtc control primitives
Date: Wed Jun 27 20:49:18 CEST 2007

i need 'run' and 'jump' prims.. time to get confused about primitives and programs again. if i remember correctly, the lesson is to never let primitive addresses leak into the higher level code: it's not convenient to have to deal with 2 kinds of code words. in cat, i only use programs (lists of primitives), never primitives directly. same here. just like for primitives, i need to choose some kind of basic representation: byte or word addresses for composite code? the only thing i need to take care of is that continuations (return addresses) are compatible with "run". i'm getting confused.. i guess i just need to write if/then/else and we'll see how to continue. it does look like there's no easy way other than:

  LIT L0 BRZ L0:

and

  LIT L0 ROUTE LIT L1 RUN; L0: L1:

ok, so be it. can't win them all.. maybe a good opportunity to use ifte instead of if .. then .. else. so.. primitives. can't 'run' primitives. can run programs. so the idea is that quoting code always quotes programs, so i need something like PF's { and } words. for conditional branching i can use 'route' as a basic word. a cloaked goto or something.

  route \ ? program --

Entry: assembler bug
Date: Thu Jun 28 00:07:21 CEST 2007

performing meta evaluation needs to happen in the 2nd pass, because of the presence of code labels. time to clean up the assembler, and sort out all the different meanings. the bug is simple: just retry if there's an undefined symbol.
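the retry idea as a minimal self-contained scheme sketch (the real assembler also relaxes instruction sizes): a pass resolves gotos against the symbol table collected so far; forward references are undefined on the first pass, so just feed the table back in and run another pass until nothing is undefined.

  ;; one pass: collect labels, resolve gotos against this pass's table
  ;; or, failing that, the previous pass's table0.
  (define (pass code table0)
    (let loop ((c code) (addr 0) (table '()) (out '()) (undef #f))
      (cond
        ((null? c) (list (reverse out) table undef))
        ((eq? (caar c) 'label)
         (loop (cdr c) addr (cons (cons (cadar c) addr) table) out undef))
        ((eq? (caar c) 'goto)
         (let ((t (or (assq (cadar c) table) (assq (cadar c) table0))))
           (loop (cdr c) (+ addr 1) table
                 (cons (list 'goto (and t (cdr t))) out)
                 (or undef (not t)))))
        (else (loop (cdr c) (+ addr 1) table (cons (car c) out) undef)))))

  ;; retry with the accumulated table until all symbols resolve.
  (define (assemble code)
    (let retry ((table '()) (tries 3))
      (let* ((r (pass code table))
             (out (car r)) (table+ (cadr r)) (undef (caddr r)))
        (cond ((not undef) out)
              ((zero? tries) (error "undefined symbol"))
              (else (retry table+ (- tries 1)))))))

  ;; (assemble '((goto end) (movlw 1) (label end)))
  ;;   => ((goto 2) (movlw 1))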
then another problem: literals take 14 bits, but quoted programs are byte addresses. can we resolve this somehow? if i really need the return stack to contain word addresses, that can still be fixed later. now i'm going for 'run' and 'run/b'. ok, it seems to work now.

Entry: vm optimization
Date: Thu Jun 28 09:36:41 CEST 2007

now it's time to reduce code. it's not very fast anyway, so no reason to start spilling bytes. but this is for later. got some stuff to get ready now. i'm happy with how it's looking though. some minor things need fixing, probably the most important one being return stack alignment. something to focus on is to limit the number of macros. i probably only need conditionals; the rest can even be written in forth. macros are only necessary for marking jumps.

Entry: sheepsint 8 bit interface
Date: Thu Jun 28 15:24:27 CEST 2007

so. i need a synth control layer. going to use the ordinary 8 bit forth.

Entry: loading dtc forth
Date: Thu Jun 28 16:03:59 CEST 2007

problem. the mapping from vm -> native forth is not just syntactic. it uses knowledge about target words being macros (as native macros) or dtc target words. this means 'load' will not work properly. so this decision needs to be postponed. easiest is to load both symbols (word and semantics) on the literal stack, and have a macro determine the semantics. ok, seems to work.

Entry: problem with dup and literals
Date: Fri Jun 29 09:49:11 CEST 2007

  123 dup 456

doesn't give 2 literals on the stack.. if i let dup copy the literal, some other things go wrong.. maybe it's best to have dup copy the literal, and solve the other problems in a second pass? i found an optimization that solves it in one pass, by realizing

  1 (2 3 !) -> <...> 1

where <...> stores the value with stack effect = 0. other places where this might go wrong are where an explicit dup is expected.. there are none outside of the '!' i think.

Entry: sheepsint core
Date: Fri Jun 29 10:51:39 CEST 2007

things to fix:
  - noise
  - sample playback

then for control, i need to find ways to map parameters to meaningful ranges. this is where multiplication and exponential table lookup come into the picture, which might be an interesting advanced topic. ok, there's a problem with the buffering: i don't have a fixed sample rate any more, so computed values need to be sent out immediately: i have no idea when the next event will output the previous state! ok, just moved it to the end of the isr.. now there's a bit of jitter, but probably not really noticeable. noise still isn't working. i can't find the problem. probably needs a fresh look. also, notes aren't working..

Entry: unified namespace and rolling back
Date: Sun Jul 1 15:47:32 CEST 2007

for target stuff.. meaning: something defined as a variable should be able to be redefined as just a target word. or not? this is not so easy since all meta objects are compiled into the core, and are not really seen as data.. there is also a conflict between forth's "first find" and my meta language's last redefined. maybe the project file should index macros somehow? so they too can be reverted.. this would be cool for variable names etc..

Entry: VM and TBLPTR
Date: Sun Jul 1 15:57:09 CEST 2007

maybe it's not such a good idea after all.. the deal is this: the VM should be easy to use. anything that needs speed can simply be moved to primitive code, completely eliminating interpretation overhead. i put some effort into making both layers interoperable, so why not use it?
it seems as if each 'useful' feature of the VM makes it a lot slower. why do i care? the whole idea is to make some kind of standard. why not write the VM on top of the memory model, for instance?

Entry: swapf
Date: Sun Jul 1 17:53:10 CEST 2007

something is wrong with the nswap macro: ok, i found it: nothing wrong with the macro. there was an error in the assembler binary opcode.

Entry: control slides
Date: Sun Jul 1 18:17:29 CEST 2007

linear & exponential. in-place updates? probably best to go out-of-place. with wrap-around?

Entry: control timer
Date: Sun Jul 1 18:37:05 CEST 2007

the previous sheepsint had some fixed sample->control rate timer. here i'm using a fixed sample rate for the noise generator (a bit less than 8 khz), which increments a 32bit counter once every tick. this can be used as a general fixed time source. ok, trying to sync to bits of the 32 bit timer, i'm using this code:

  \ control at 244 Hz
  : wait-control
      begin tick0 6 high? until
      cli tick0 6 low sti ;

but the cli/sti isn't necessary: the timer increment is atomic: there's no read-modify-write. one problem though: if the counter is reset, higher bits will never get set! so a better strategy would be to wait for a bit to go low, then wait for it to go high, so the transition is captured.

Entry: fix macro loading
Date: Tue Jul 3 12:07:42 CEST 2007

really annoying to have these not synced to the project.. maybe include them directly in the project file. also need caching: timestamps would work together with mark points. a problem point is the missing variable and function name spaces. once something has been a macro, it will remain a macro. a single dictionary stack is easier to use.

Entry: transient controller
Date: Wed Jul 4 12:43:16 CEST 2007

this is fairly simple if it only needs to save the mixer config (one byte). saving oscillator frequency state requires 6 bytes more. what about making the transient word itself responsible for saving current state, and just using the x stack? if the time base is fixed (32 bit tick timer), control words become fairly simple. remaining question: who is responsible for syncing to the note tick? this is a question of composition: i.e. hihat + kick at the same time requires the hihat word to sync to the note, not the kick. best to keep control syncing independent of note syncing.

Entry: AD conversion
Date: Wed Jul 4 13:53:39 CEST 2007

2 things to determine:

  - acquisition time (sample/hold settling)
  - TAD (per bit sample time)

TAD should be as short as possible, but greater than the minimum TAD, approximately 2uS for the 18F1220. the datasheet says for the F version at 8MHz to use 16TOSC, and for the LF version to use 32TOSC. it was on 16TOSC, 20TAD.. put it to 32TOSC, but can't see a difference. maybe the pots are too noisy. i tried to add a capacitor, 100n and 10u, but no difference..

Entry: noise
Date: Wed Jul 4 15:49:18 CEST 2007

noise is probably more useful as one of the oscillators instead of a fixed 3rd one, just like the sampler. using the 8 bit timer only for the control time base frees up some resources, and decouples noise frequency from control frequency. best seems to be OSC1, keeping in mind the formant mixer. changing the mixers: silence, xmod, formant. and having OSC1 do noise/square/sample.

Entry: bootsector
Date: Wed Jul 4 16:02:52 CEST 2007

maybe it's best to reserve some functionality for chip erase, so i don't need to worry so much about messing up the bootsector. basically, i just need a single piece that never changes, which has the ability to influence the booting process to run the interpreter.
probably an 'ack bang' or something?
- keep boot sector free for fast isr
- reserve 2nd block for reset vector?
seems the core of the problem is that the boot vector and isr vectors are in the same block. what if:
- default reset vector = jump to second block
- add an application vector after this
- second block contains some kind of checking code to determine activation of application or debugger

Entry: metaprogramming
Date: Fri Jul 6 13:03:44 CEST 2007

more things from forth. i've been using the first couple of macros that use the compilation stack explicitly. i could probably move more code to be accessible from the forth macro language. to have a forth like [ and ] section would make sense. the point where i want to stop is s-expressions: once i'm introducing that syntax into forth, there's nothing stopping it from becoming something completely different. one of the aims really is to keep out s-expressions. however, it's not so hard to have some kind of 'begin ... end' construct that maps directly onto cat code.

Entry: noise as osc1
Date: Tue Jul 10 22:22:35 CEST 2007

tested. seems to work.

Entry: macros and cat
Date: Tue Jul 10 22:25:32 CEST 2007

name space mixing in macros. the ultimate goal is to have a forthish CAT that i can just include in PURRR/18 code. currently the 'c' words, combined with the literal stack, work pretty well. i need to think about cleaning up the semantics a bit. there's a lot of nice things hidden here.. one of those is: you need 2 stacks. mapping behaviour in an asymmetric way (i.e. return stack / data stack) is arbitrary "human meaning" to ease understanding of components so they can be composed.

Entry: nand synth
Date: Tue Jul 10 23:45:29 CEST 2007

works like this:
- 4 schmitt-trigger based oscillators, cap select (decade) + pot
- chained: 2nd AND gate input turns oscillator off
- the NOT in the chain prevents subsequent oscillators from being OFF at the same time
so, the first oscillator A produces a square wave. during A's ON period, the second oscillator B produces a square wave; during A's OFF period, the second oscillator gives ON. and so the story continues...

....AAAAAA....AAAAAA....AAAAAA....AAAAAA
BBBB.BB.BBBBBB.BB.BBBBBB.BB.BBBBBB.BB.BB
C.C.CC.CC.C.C.C.CC.C.C.CC.CC.C.C.CC.CC.C

etc. this can give quite a complicated pattern after a couple of steps. one thing is missing though: there is no resync. all capacitors keep state between oscillator ON/OFF switches, so no formant-like tricks.

Entry: noise as sync source
Date: Wed Jul 11 12:12:56 CEST 2007

It's possible to use 'filtered pitched noise' by using the RESO mixer, together with a noise OSC1. however, the opposite, an oscillator resynced by noise, i don't have atm. maybe OSC0 should be able to do noise too? OK, that's a different game

Entry: boost converter hack
Date: Mon Jul 16 19:13:12 CEST 2007

as mentioned before (probably in brood 2 ramblings.txt), it is possible to use a protection diode as rectifier for a signal -> power converter, by connecting a signal with a large enough duty cycle directly to an input pin, and connecting a cap across the power pins. related: it should be possible to convert that scheme into a boost converter, by connecting a power supply to an input pin using an inductor, and using the pin's output stage as a switch to charge the inductor (by connecting the point to ground). when the pin is switched to input, the inductor discharges the energy stored in the magnetic field, and charges the capacitor through the protection diode.
this way, the uC can regulate its own supply voltage. this scheme just needs an initial push to charge the capacitor such that enough energy is stored to boot the program that starts the feedback mechanism.

Entry: filter bank on PIC18
Date: Mon Jul 16 22:13:34 CEST 2007

so, if i want to run a digital filter on the PIC18, for, say, some demodulation, what performance am i looking at? running on 5V and a xtal, i can get to 10 MIPS. for audio rate signals, say up to 5kHz, this gives 2000 instructions per sample. that's not quite nothing. using half of this for the filtering, and the other half for the decoding and the actual application, we're looking at 1000 instructions of DSP to burn. looks to me there's plenty of room. to make it sound good, tones need to be quite stable. at least 1/16th of a second. say 6.4kHz, this is about 400 samples.

Entry: PSK31 and meshing
Date: Mon Jul 16 22:27:06 CEST 2007

i think for the waag, we need to keep the basic objective simple: PSK31, as it is tried and true, and there is decoding/encoding software to actually test it.

Entry: human naming nature
Date: Tue Jul 17 01:21:33 CEST 2007

One of the things that's nice about a compositional language like CAT is that it forces you to aggressively factor. Simply because programs become too hard to understand if you don't. Factoring is really identifying (naming) substeps. In a compositional language, factoring is really totally arbitrary, from a machine point of view at least. Not for the programmer. Since function arguments are not named, names have to be introduced elsewhere. This is that extra bit of 'meaning' in a program which transforms it from the mess a computer just executes, to some meta-executed thing represented in a human mind.. Those are really not the same. Being able to program something and 'knowing' how it works are different things. The 'knowing' is hard to explain sometimes. It's just a force of (human) nature, really.. For a program to be actually readable, a bit more than the connectivity (topology) is necessary: the information encoded in the names themselves seems to help the human brain to understand the connectivity, or at least give it some analogy. Maybe a bit like embedding a topological thing in a geometry to make it more 'real': programming is embedded in the real world of thoughts by associating some natural language with it. The two ways to do this are either the lambda calculus (lexical scope) or combinators.

Entry: get off that lazy ass
Date: Thu Jul 19 13:49:18 CEST 2007

i think i'm not made to idle around. depresses me. people tell me i need to try harder, give it a couple of weeks of idling to find out the true joy of life. i don't have time for that :) so.. next things to tackle are:
* fix the boot loader so the ICD2 can stay safely in the box for really stupid mistakes.
* interaction macros
* SNOT and sending code from emacs
* the slow highlevel forth on virtual memory

Entry: the boot block
Date: Thu Jul 19 13:54:04 CEST 2007

conditions:
* BLOCK 0 = empty OR 0000 and 0006 contain jumps to BLOCK 1 (soft reset). this ensures that an empty boot block is valid + interrupts and application invocation result in a reset when they are not defined.
* during boot, a DEBUG condition is checked. this will force it to run the interpreter to await commands.
* if the DEBUG condition is false, the application (addr 0002) is executed. if there's no application, a soft reset is run. (so eventually the chip responds).
* installing a new application:
  - clear boot block
  - install security jumps
  - install isr code
* possible conditions:
  - a pin
  - a boot wait + serial activity
  - break condition on serial port
-- installing the bootblock can be done in a single interaction macro: compile an init macro, then when this succeeds wipe the bootblock, and upload a new one. the deal is that the boot sequence up to the DEBUG check is NEVER changed! it's not enough to have your application perform such a test. this can go wrong in its boot sequence before the check is executed, or even during the check. get it right once, then keep it like it is. another possibility is to have the serial port operate from interrupt. that way sending a break signal could actually stop the program. however, this is more complicated and reduces freedom for custom isrs. -- thinking about it, why the one at 0006? ok, it prevents problems if there's a reset vector but no application vector installed. better to be safe. ok, default really is an empty boot block: means the app is gone. whenever APP and ISR vectors are installed the 'reset-vector' macro needs to be included.

Entry: new stuff
Date: Mon Jul 23 13:44:31 CEST 2007

done doing goto10 admin stuff. time to make a list of things that need a different approach.
BROOD:
* streams (don't save intermediate state)
* macro namespaces
* interaction macros
* clean up pattern matching macros
* SNOT
* clean up / document / reflect on the forth macro semantics (partial evaluation + parsing words)
PURRR:
* boot block updates
* highlevel forth on virtual memory

Entry: name spaces
Date: Mon Jul 23 13:54:02 CEST 2007

i guess i need proper name space mixing for the macro system. it should all be just scheme functions, not hashtables full of structures. currently i have the following name spaces: cat, state, store, meta, asm-buffer, forth-parse, macro, badnop. so.. let's see if i actually understand the plt scheme namespaces. a namespace is something that maps symbols to storage cells for words like 'eval' and 'load'. so instead of using hash tables and explicit lookup, using namespaces one could use 'eval'. the advantage is that run time 'eval' could be avoided, and macros could be used where possible. so, what do i want really..
* access macros using scheme names in scheme code.
* compile (eval?) a symbolic cat function straight to a scheme fn
* be able to change cat macro name bindings just like scheme
questions i need answered:
* can an entire namespace be hidden in a module?
* is it possible to dynamically add stuff to a module? (i guess so, using module->namespace)
* how to 'merge' namespaces?
* can i abstract the rather awkward symbol prefix merging?
* is prefix merging really awkward?
name spaces in scheme:
* once evaled/compiled, an expression is bound to a certain name space and independent of the current one

Entry: callout
Date: Mon Jul 23 22:30:28 CEST 2007

i need some knowledgeable people to discuss this stuff with. don't know where to find them though. things to try:
* plt list
* comp.lang.forth
* picforth list
* gnupic list

Entry: BROOD 4 takeoff
Date: Tue Jul 24 00:00:00 CEST 2007

EDIT: this is where the ramp up to brood 4 starts, with the move from interpreter -> macros.

Entry: really on top of scheme
Date: Tue Jul 24 19:02:23 CEST 2007

so, i need to get rid of the explicit interpreter. or not? i'm mostly concerned with name spaces here, not implementation.

  (1 2 +) -> (lambda stack (apply cat:+ (cons 2 (cons 1 stack))))

what about preserving original source form? do i actually still use that?
yes, when printing code. for example, doing (1 2 +) creates a quoted CAT program, which when compiled doesn't have a source form. so, how to associate original source form to lambda expressions? i really should define my interface first. i don't need to use raw functions as representations. the 'stuff' that's bound to names can just as well remain a word structure. in the end, i'm doing nothing but replacing hash tables by name spaces. so..
* modules: separate code into logical entities
* namespaces: allow run-time eval/compile
the latter part is not really necessary for the core! so, i should build macros first, make sure i have a direct map from:

  CAT (or any monad language derivative) -> 'raw' cat -> scheme

raw cat is just cat with scheme words. so how to do this?
- all CAT code is compiled: use modules
- how to separate name spaces: (i.e. how to prefix names?)
so.. it's seeping through. names are compile time stuff. macros are compile time stuff. anything that juggles names should be a macro. so (cat +) is a macro, which expands to a lambda expression, or a variable. it's not enough to have it expand to just a lambda expression. storage should be shared, so (cat +) should return a binding in case of a single expression, or a composition (cat 1 +) in case of multiple arguments. so, what about this: any CAT-like language uses the (: ...) syntax, where the macro : (i.e. 'cat:') transforms the code into a function that maps stack -> stack. this way everything is directly accessible from scheme. for example (cat: 1 +) is a lambda expression. neat. even, ':' could signify THE cat. then 'cat-compile' is no more than (eval (cons 'cat: src)). note that i don't really need to ever run any programs. cat is just functions, and in scheme, they can be applied to data. the thing is, i don't need an interpreter. i just need a proper way of associating compiled code with original source form (reflection). this does mean giving up some reflection: the current source/semantics association probably needs to change. it's not a small rewrite..

Entry: the macro way..
Date: Thu Jul 26 11:17:42 CEST 2007

let's start with some basics. apparently structures can be used to implement behaviour of procedures, using struct-type properties. this should be enough to convert completely to macros. i started cat-base.ss. so, here we go.. all the freedom is there again.
* i'm starting with one modification: low level CAT source representation is reversed. this makes writing the macros a bit easier. this makes (a . b) be 'compose a AFTER composition b', so:

  (pn-compose a b c) == (apply a (pn-compose b c))

* 3 phases are separated:
  - compile: atom -> representation of behaviour (apply/cons)
  - compose: list of words -> nested apply/cons
  - abstract: application -> lambda expression
  compile can be recursive due to the presence of quoted programs
* reversal is introduced early on: it's too confusing to have it around after the nested 'apply/cons' is in place. i'm switching from the pn- to the rpn- prefix at the point of abstraction (converting code to a scheme lambda expression).
* snarfs can be stolen from the previous implementation. maybe the code reversal should use a generic reverse macro too. (done)
* now all that's left is to solve the name resolution.

Entry: separating syntax from semantics
Date: Fri Jul 27 13:05:09 CEST 2007

I got the syntax working. Now i'd like to build an abstraction that takes a binder macro, and produces a compiler macro:

  cat-bind -> cat::

Assuming the structure of the language remains the same.
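for reference, the kind of expansion all of this aims at: a minimal sketch only, not the actual brood code. 'cat-sketch:', 'stack-apply' and 'cat:+' are hypothetical names here, and the real compiler resolves free names through the cat namespaces instead of plain scheme bindings.

;; minimal sketch: compile (1 2 +) into a stack -> stack lambda.
;; atoms that evaluate to procedures are applied to the stack,
;; anything else is pushed as a literal.
(define (stack-apply x stack)
  (if (procedure? x) (x stack) (cons x stack)))

(define-syntax cat-sketch:
  (syntax-rules ()
    ((_) (lambda (stack) stack))               ;; empty program = identity
    ((_ a as ...)
     (lambda (stack)
       ((cat-sketch: as ...) (stack-apply a stack))))))

;; a primitive as a stack function:
(define cat:+
  (lambda (stack) (cons (+ (cadr stack) (car stack)) (cddr stack))))

;; ((cat-sketch: 1 2 cat:+) '())  =>  (3)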
The problem is i keep running into compilation phase problems and i don't really know why. It's quite intriguing, this macro programming. Not quite the same as regular lisp hey :) It's a bit like a lazy language with pattern matching. Maybe it is a lazy language? Would be nice to read a bit about this.. Anyways, i do start to see some programming patterns. I have a problem in that i'd like to keep both semantics and syntax abstract. Currently, i pass around 'compile', but it's too general. I'd like to specialize only some compile behaviour, and keep the rest open. So: message passing! That seems to work quite well. Now, on to semantics.

Entry: macro expansion
Date: Fri Jul 27 16:36:51 CEST 2007

One problem i run into is that (cat: ....) seems to be looking for symbols in the toplevel. I guess if i know why, i'm a big step further in understanding this whole module/namespace stuff.. From the manual: 5.3 Modules and Macros "... uses of the macro in other modules expand to references of the identifier defined or imported at the macro-definition site, as opposed to the use site." This looks like the 'no surprises' rule, or the 'dynamic binding is evil' rule to me. The toplevel can still be used for dynamic binding, hence the macro expands to (#%top . xxx::+). So it looks like i have only one choice. Either i make sure the names are available at the point where the macro body is defined, or i put them in the toplevel explicitly. Let's see if the former is doable. Ok, trivial but still feels a bit weird. Maybe i'm too much accustomed to late binding by 'load/include', which is, as far as i get it, exactly what the module system tries to avoid.
* Circular dependencies are allowed within a module
* Not in between modules
* Undefined symbols in a module are not allowed.
* Any late binding is to be done in the toplevel (but feels dirty)
Ok, time to clean up the utility code.

Entry: control structures
Date: Fri Jul 27 18:24:03 CEST 2007

.. become a lot easier to implement:

  (define (xxx.choose no yes condition . stack)
    (cons (if condition yes no) stack))

Entry: where to store the functions?
Date: Fri Jul 27 18:34:08 CEST 2007

This remains a question. I thought it was necessary to have them in a scheme name space. Not true. As long as they can be identified at compile time, and mapped to storage, all is well. Not true, and also not convenient, because i really can't find a good way to do it except for explicitly creating an empty name space and dumping all the references there. Another thing: i don't really need the extra level of indirection a name space cell provides: it is ok to just mutate the word structure that's permanently attached to a certain name. It already behaves as a cell: instead of NAME -> CELL -> WORD we could just have NAME -> WORD since every cell is a word. So why not just dump stuff into hash tables? If (compile function sym expr) returns a word structure, all is well. Since my language doesn't have anything other than words, each name simply IS a word. Make that nested hash tables, so i have a mutable real store to go with the functional store. Maybe i can even unify them?

Entry: macros really are better
Date: Fri Jul 27 18:46:44 CEST 2007

* no VM, no custom control structures that invoke the interpreter. just 'apply'.
* functionality can still be stored in a hash table: each name refers to a fixed cell = word struct.
* hash table needs to be available at compile time

Entry: 2 stores
Date: Fri Jul 27 19:10:32 CEST 2007

Why not store the functions in the functional store?
The main reason is that the functional store is supposed to be dynamic, and the mutable store static, never mutated, except for debug purposes. But debug is always! So is there a better reason?
* It's not serializable.
* It's fully derived from source, and just a cache.
So a better division is:
- everything that's completely derived from source, and doesn't change during a regular, non core-dev session goes into the hash store.
- all the rest, the real state which is the result of computations (like assembler labels), goes into the functional store.

Entry: compile time hash
Date: Tue Jul 31 20:14:25 CEST 2007

let's do this namespace thing: a hash module, used at compile time and later run time to solve all binding problems. something i forgot: a namespace has both runtime and compile time semantics; however, i need to transfer everything explicitly from compile time to run time if i want to use a hash.. now i am really confused. does this even matter? the hash is not accessible at run time, but it is possible to have it around at compile time and just have a macro spit out some values.. the real problem is: modules can be compiled independently, and all state accumulated over such a run needs to be saved if it is to be used somewhere else. so what i'm trying to do will probably not work.

Entry: got snot working async
Date: Wed Aug 1 22:49:16 CEST 2007

so now it's time to do some real work. i still don't want to give up the idea of putting cat names in modules, and using eval to compile code at runtime. it really can't be that hard. would be a good exercise to find out what a namespace needs, besides being empty, to just compile code..

Entry: cat and #%top / lexical variables?
Date: Thu Aug 2 09:00:15 CEST 2007

what about this: redefine #%top in the cat syntax expander to go look in the cat namespace. this should enable the use of lexical scope to do name resolution. i found something easier: using 'identifier-binding', names that are not lexical can be drawn from a namespace object. this gives maximal scheme<->cat interplay, while keeping the namespace mechanism we had before. so:
- compilation to lambda expressions
- top level name resolution
are now separate. at this point it looks like i'm where i was before, only with the word rep changed a bit, and lexical scope.

Entry: namespace again
Date: Thu Aug 2 10:19:01 CEST 2007

so all name resolution is a runtime thing. at runtime, a tree of hash tables is available which contains permanent bindings to word structures. the code expands to forms that get bound to this word structure whenever they are executed, using 'delay' forms. so, with this delay mechanism in place, is there a need for storing semantics in word structures? probably not. .. something is not right:
- can't have (apply (delay expr) body ...)
- can't insert a word structure at compile time either
i wanted to do the latter to avoid a delayed expression. the only solution is to use a different apply. ok, i got it now. just using delay in the macro and force in the applicator.

Entry: lot of work
Date: Thu Aug 2 18:40:57 CEST 2007

got myself into a lot of work because i'm not respecting interfaces.. maybe fix that temporarily? it was necessary for the control structures because they're low-level, but maybe not for the rest of the code? next: the 'compositions' macro, parameterized by:
* source name space
* target name space
* compilation macro
maybe it's best to take a step back, and respect the interfaces..
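going back to the delay/force trick in the namespace entry above, a minimal sketch (hash table calls per old mzscheme; 'registry', 'ns-ref' and 'word-sketch' are hypothetical names): the macro wraps the dictionary lookup in a promise, and the applicator forces it, so words can be compiled before their definitions exist. after the first force the binding is permanent, like the word structures mentioned above.

(define registry (make-hash-table))   ;; name -> stack function
(define (ns-ref name) (hash-table-get registry name))

(define-syntax word-sketch
  (syntax-rules ()
    ((_ name)
     (let ((w (delay (ns-ref 'name))))  ;; lookup delayed until first use
       (lambda (stack) ((force w) stack))))))

;; (hash-table-put! registry 'dup (lambda (s) (cons (car s) s)))
;; ((word-sketch dup) '(1 2))  =>  (1 1 2)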
it looks like this is going to work, so i can just as well make the step and replace the entire vm code.

Entry: weird macro bug
Date: Thu Aug 2 20:52:12 CEST 2007

;; This driver could be generalized into eager evaluation for macros.
(define-for-syntax (process-args op stx stx-args)
  (datum->syntax-object
   stx
   (map (lambda (x)
          (if (and (list? x) (eq? ': (car x)))
              (op (cdr x))
              x))
        (syntax-object->datum stx-args))))

;; This utility macro calls another macro with an argument list
;; reversed if it is tagged with ':'. This is necessary for PN <->
;; RPN conversion.
(define-syntax reverse-args
  (lambda (stx)
    (syntax-case stx ()
      ((_ m . args)
       #`(m . #,(process-args reverse stx #'args))))))

The code above doesn't work.. Something about the syntax gets lost maybe? Expanding the macro seems to do the right thing though..

Entry: base functionality working
Date: Fri Aug 3 10:34:04 CEST 2007

got cat/cat.ss as an absolute minimum: anonymous and named functions. (like lambda and define).

Entry: macro weirdness
Date: Fri Aug 3 10:38:41 CEST 2007

i'm confused again.. syntax-rules macros are like normal order application: in (macro arg1 arg2) the arg1 and arg2 forms are left alone until after the expansion of macro. This is how it should be i guess (the only way to get non-eager evaluation in scheme is by constructing macros). But somehow it's hard to switch between both ways of writing code.. One of the things i miss is being able to parametrize a macro with an 'anonymous macro'. Something that behaves as a transformer, but does not have a name. More specifically:

  (compositions (lambda-macro ...) ....)

Is this possible, or am i just confused about something?? and another one: why is it so difficult to get this working:

(define-syntax lex/cat-compile
  (syntax-ns-compiler cat-ref (cat)))

(define-syntax syntax-ns-compiler
  (syntax-rules ()
    ((_ ref (ns ...))
     (syntax-rules (global)
       ((_ c global s e) (apply-force (delay (ref '(ns ... s))) e))
       ((_ args (... ...)) (cat-compile args (... ...)))))))

i'm importing the module that has 'syntax-ns-compiler' as require-for-syntax, but i get the error:

ERROR: cat/stx.ss:146:10: compile: bad syntax; function application is not allowed, because no #%app syntax transformer is bound in: (cat-compile lex/cat-compile dispatch 3 (pn-compose lex/cat-compile (2 1) s))

but this works:

(define-syntax define-syntax-ns-compiler
  (syntax-rules ()
    ((_ name ref (ns ...))
     (define-syntax name
       (syntax-rules (global)
         ((_ c global s e) (apply-force (delay (ref '(ns ... s))) e))
         ((_ args (... ...)) (cat-compile args (... ...))))))))

i don't get it.. Update: the answer might be that the latter is a pure rewriting macro, and thus doesn't need any phase separation.. The former does, and the problem is just that i don't understand the separation here..

Entry: list operations on code
Date: Fri Aug 3 13:44:12 CEST 2007

since all compiled code should have its source rep still attached, generic list operations are possible. i'm inserting a call to 'source' for most of them. Now, why not have 'run' accept data? This will make the language simpler, and representation just a matter of optimization. So.. a consequence here is that there always is a default or base semantics. Maybe that's better.
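a sketch of what 'run accepting data' could look like ('word' and 'run' are hypothetical names, not the rep.ss code): compiled code is a struct carrying its source, and raw lists get a default semantics of 'run each word, push everything else'.

(define-struct word (source fn))        ;; compiled code + its source rep

(define (run x stack)
  (cond ((word? x) ((word-fn x) stack))
        ((null? x) stack)
        ((pair? x) (run (cdr x) (run-atom (car x) stack)))
        (else (error "cannot run:" x))))

(define (run-atom a stack)
  (if (word? a) ((word-fn a) stack) (cons a stack)))  ;; literals push

;; (define dup (make-word 'dup (lambda (s) (cons (car s) s))))
;; (run (list 1 2 dup) '())  =>  (2 2 1)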
Entry: Conclusion
Date: Fri Aug 3 14:52:34 CEST 2007

Maybe a bit early since i don't have the old stuff ported yet, but the main conclusions seem to be:
* name space storage can be kept abstract: it's ok to do part of the binding at runtime, as long as this behaviour is abstracted (cat/lang.ss)
* defining a new language as syntax instead of explicit interpretation is good, because scheme's scoping stuff carries over: it's possible to only replace the global name space, but keep lexical variable bindings.
And, macros can be simple, if you stick to syntax-rules. The more general syntax-case can become very confusing very fast. The most important thing to remember for syntax-rules is that it is a DIFFERENT language than scheme! It is normal order (breadth-first) instead of applicative order (depth-first). So.. time to look into CPS a bit more. There's this SRFI 53 i might have a look at, but before that, i had a go at rev-k and rev-arg in stx-utils.ss. seems to work..

Entry: and beyond
Date: Sun Aug 5 01:14:43 CEST 2007

So.. Maybe it is time to make a proper module based CAT language. Modules really are a nice way to factor a design.. and i am already running into the simplest of problems: name space clutter. A lot of temp functions i'm using are littering the name space.

Entry: porting
Date: Sun Aug 5 16:08:52 CEST 2007

so, i started porting badnop to the new cat core. the first nontrivial problem i run into is 'state-parse'. Maybe i should keep 'define-symbol-table'. This needs some thought, since the whole namespace thing changed. In effect, it's the same: there are still hash tables with functions. Wait.. the 'make-composer' things need to be macros now.. so, what's needed is sourcedict, compiler, target. currently, 'cat' is sourcedict+compiler, and 'cat!' adds destdict. i need better naming for this, since it's so general..

Entry: mzscheme things to look into
Date: Mon Aug 6 10:35:47 CEST 2007

* what is a 'transparent repl'
* moving more snot functionality to scheme
* snot and syntax coloring

Entry: anonymous macros
Date: Tue Aug 7 00:30:38 CEST 2007

is it at all possible to have anonymous macros? what i need is to parametrize one macro with an implicitly defined other macro. maybe this is not necessary: it is possible to have 'local' macros, meaning macros defined by other macros, with names from syntax templates. those names never clash, so it serves the purpose.

(define-syntax compositions
  (syntax-rules ()
    ((_ (gen-def! . gen-args) . definitions)
     (begin
       (gen-def! CAT! . gen-args)
       (compositions CAT! . definitions)))
    ((_ def! (name body ...) ...)
     (begin (def! name body ...) ...))))

Entry: lifting
Date: Tue Aug 7 09:09:06 CEST 2007

when i want to do lifting, a decision needs to be made based on whether a symbol is present in one namespace or not. this is a run-time decision, since i'm using late binding. that doesn't look too difficult. i think i have it now, overriding the 'global' and 'constant' methods. the rest should just work. but. it's good to have a better look at monad theory and the 'lifting' formulation to clean up my terminology a bit. Let's see:

  map   (a -> b) -> (M a -> M b)
  unit  a -> M a
  join  M (M a) -> M a

Setting a 'stack' as the base type t, the monad type M t will be a stack with added state. map is trivial and already used; however, the other two operations are hidden somewhere else: in the words that implement the monad dictionary. Does it make sense to make them explicit?
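a sketch of making them explicit, with the state represented as a list consed onto the stack. empty-state and merge-state are placeholder choices here, not the actual implementation:

;; M t = (state . stack), state = list of accumulated items
(define (empty-state) '())
(define (merge-state s1 s2) (append s1 s2))

(define (unit stack) (cons (empty-state) stack))      ;; t -> M t
(define (lift fn)                                     ;; map
  (lambda (m) (cons (car m) (fn (cdr m)))))           ;; keep state, map stack
(define (join mm)                                     ;; M (M t) -> M t
  (cons (merge-state (car mm) (cadr mm)) (cddr mm)))

;; ((lift (lambda (s) (cons 1 s))) (unit '()))  =>  (() 1)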
The thing that confuses me is that i am doing the 'lifting' automatically, based on a name space distinction. All the functions inside the monad dictionary actually do the mapping, joining and returning, but in a way that's not factored into those 3 operations.

Entry: state lifting works
Date: Tue Aug 7 11:54:47 CEST 2007

now i need to think about some proper abstraction names, so the 'compositions' declarations look nice and readable. maybe it's best to standardize on the following syntax:

  (compositions (syntax (dst ...) (src ...) ...) def ...)

* 'syntax' refers to the macro used to compile the body of the code. this is actually a compiler which needs source semantics.
* '(src ...) ...' are the namespaces representing the source semantics used by the compiler.
* '(dst ...)' is the namespace used to store the resulting code object.

Entry: program quoting and lifting
Date: Tue Aug 7 13:29:37 CEST 2007

i ran into this before.. in lifted code, how do i quote programs? because of automatic lifting, the only sane way is to default to non-lifted cat semantics. so i need to fix it up a bit.. looks like it's fixed now.

Entry: things that need fixing
Date: Wed Aug 8 12:10:32 CEST 2007

Probably the parser in forth.ss needs to be rewritten.. maybe as macros? The thing that needs to change is that the parser always returns symbolic cat code. No tricks with inserting internal representations. another thing i need to fix is default semantics: what to do if a symbol is not found? maybe using a parameter? done.. so the parser macros. if it's entirely built on top of the ordinary cat macros, i could disentangle them and get them to work first, then rewrite the parser macro preprocessor. so let's start top-down.

Entry: macro.ss and literal + compile
Date: Wed Aug 8 21:09:02 CEST 2007

now i get it: they need to be in (asm-buffer), and c> and c>word in (macro) need to refer to them. that way 'macro-prim:' can be used together with lexical binding. ok, that works. actually, it's quite cute this way. lexical scope to mix scheme and cat code is nice.. this makes me think: if i implement the preprocessing macros also as lexical extensions, that property remains. maybe that's overkill? maybe the current code is ok, as long as i make it fully symbolic?

Entry: hygiene and the rewrite-patterns macro
Date: Thu Aug 9 11:35:50 CEST 2007

It's fairly complicated, but the name bindings introduced are only:

  make-word-compiled
  lift-macro-executable
  lift-transform

what if i factor it into 2 parts:
- a nonhygienic part that creates just the match clauses
- a hygienic part that binds the function and macro names
It looks like this is sort of working. Now what about preserving syntax information in the expression parts of the match clauses?

  (match --- (pattern expression))

so the expression part can refer to lexical variables etc.. let's do that, but first see if this non-hygienic version works. one important question: when peeling off syntax with syntax-e, and using datum->syntax-object to put it back, is the original syntax that wasn't peeled off preserved? it really has to be.. seems to work.. at least the expansion does, but i can't see what can go wrong with the quoting..

Entry: reduce
Date: Thu Aug 9 14:34:52 CEST 2007

transforming

  ((a . 1) (a . 2) (a . 3) (b . 4) (b . 5))

into

  ((a . (1 2 3)) (b . (4 5)))

is called 'reduce', at least that's what i recall... but i think the more general 'fold' is also sometimes called reduce.. so i'm going to call it 'collect' for now.
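a minimal version, assuming equal keys are adjacent as in the example:

(define (collect alist)
  (if (null? alist)
      '()
      (let ((key (caar alist)))
        (let loop ((l alist) (vals '()))
          (if (or (null? l) (not (eq? (caar l) key)))
              (cons (cons key (reverse vals)) (collect l))
              (loop (cdr l) (cons (cdar l) vals)))))))

;; (collect '((a . 1) (a . 2) (a . 3) (b . 4) (b . 5)))
;;   =>  ((a 1 2 3) (b 4 5))    i.e. ((a . (1 2 3)) (b . (4 5)))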
Entry: require-for-syntax
Date: Thu Aug 9 17:55:48 CEST 2007

look at the macro compiler-patterns. find a way to put the utility functions in a module without getting the error:

pattern-core.ss:94:11: compile: bad syntax; function application is not allowed, because no #%app syntax transformer is bound in: (begin (ns-set! (quote (macro +)) (make-word-compiled (quote +) (lift-macro-executable (lift-transform (lambda asm (with-handlers (((lambda (ex) #t) (lambda (ex) (pattern-failed (quote +) asm)))) (match asm ((((quote qw) b) ((quote qw) a) . rest) (appen...

i don't get it. when i make them local to the transformer expression, all is well, but using 'require-for-syntax' doesn't work. i tried the following isolated case:

;; Utilities for syntax object processing.
(module stx-utils mzscheme
  (provide (all-defined))
  ;; Reverse a syntax list.
  (define (reverse-stx stx)
    #`(#,@(reverse (syntax-e stx)))))

(module test mzscheme
  (require-for-syntax (file "~/plt/stx-utils.ss"))
  (define-syntax reverse-quote
    (lambda (stx)
      (syntax-case stx ()
        ((_ list)
         #`(quote #,(reverse-stx #'list)))))))

and this seems to work fine, so i'm doing something else wrong..

Entry: CPS macros are fun
Date: Thu Aug 9 18:29:06 CEST 2007

but not really practical when syntax-case is around. now that i'm understanding it a little better, there isn't any reason to keep the CPS macros for list reversal. the other thing to consider is the 'compile' macro. i'm using something akin to CPS there too, only it's more like message passing: pass the current object (self).

Entry: datum->syntax-object
Date: Thu Aug 9 19:17:06 CEST 2007

thinking a bit more.. i'm still not convinced that #`(#,@(syntax-e #'some-list-stx)) is doing what i think it is doing: the manual says datum->syntax-object is used, but does it see the syntax substructure? reading the manual again, now that i know what i'm looking for: "(datum->syntax-object ctxt-stx v [src-stx-or-list prop-stx cert-stx]) converts the S-expression v to a syntax object, using syntax objects already in v in the result." for (with-syntax ((pattern stx-expr) ...) expr): "If a stx-expr expression does not produce a syntax object, its result is converted using datum->syntax-object and the lexical context of the stx-expr." then for quasiquoting syntax: "If the escaped expression does not generate a syntax object, it is converted to one in the same way as for the right-hand sides of with-syntax." so i guess we're ok!

Entry: symbolic macro names
Date: Thu Aug 9 20:05:26 CEST 2007

something i run into is the (macro x) function from comp.ss: (macro:) won't work because the variables in the patterns are symbolic! this is really confusing.. i'm replacing the symbolic function with 'macro-ref' to make it more clear this is a run time symbolic lookup, not something that can be bound once.

Entry: lexical quoted
Date: Thu Aug 9 20:41:58 CEST 2007

with the new syntax approach, i can use lexical variables like

  (let ((xxx (lambda (a b . stack) (cons (+ a b) stack))))
    (base: 1 2 xxx))

Which is convenient. However, i ran into at least 2 cases where the more convenient thing to do is to insert a constant instead of a function. However, the semantics of a symbol is always a function in CAT. Except.. when it is quoted! So what about this:

  (let ((yyy 123))
    (base: 1 'yyy +))

Meaning (base: 1 '123 +) ??? This is very convenient, but looks a bit weird. The reason is of course that stuff after base: is NOT SCHEME. Quote in the cat syntax only means: "this is data".
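a sketch of the expansion-time check that makes this possible: 'cat-quote' is a hypothetical stand-in for the quote handling inside the rpn compiler (the real thing recurses into quoted programs), and identifier-binding returning 'lexical is per old mzscheme.

(define-syntax (cat-quote stx)
  (syntax-case stx ()
    ((_ datum)
     (if (and (identifier? #'datum)
              (eq? 'lexical (identifier-binding #'datum)))
         #'datum       ;; lexically bound: substitute its value
         #''datum))))  ;; free: stays quoted data

;; (let ((yyy 123)) (cat-quote yyy))  =>  123
;; (cat-quote zzz)                    =>  zzz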
The benefit of this is that it somehow resembles pattern variable binding as in syntax-rules. A better explanation is this: The scheme and cat namespaces are completely separate: scheme has toplevel and module namespaces, while cat has everything in a separate hierarchical namespace. The only way they can interact is through lexical variables: this is the only set of names that is fully controllable. In cat expressions:
* free identifiers come from the associated name space
* identifiers bound in scheme are
  - used as functions when they occur outside of quote
  - used as data when they occur inside of quote
This can be implemented by mapping quote -> quasiquote, and unquoting a symbol whenever it is lexical. It seems to work fine. Quote for macros is now also fixed. Another attempt to justify myself: The quote operator in the cat language is NOT the same as the quote operator in scheme code. More specifically: lexical variables will be substituted whether they are quoted or not. i.e. both (base: abc) and (base: 'abc) will be substituted if the variable abc is bound. The quoting just indicates the atom is not to be interpreted as a function, but to be loaded on top of the data stack. The substitution is there to make metaprogramming easier.

Entry: pattern transformer extensions
Date: Fri Aug 10 10:23:06 CEST 2007

I'm trying to perform the pattern extensions properly. A true test of this phase separation thingy, since i have a couple of phases here:

  0 matcher runtime
  1 execution of pic18-pattern transformer
  2 execution of pic18-pattern transformer generator

an extra problem is that i'm matching transformer names -> syntax transformers. this gets a bit complicated, because the name of the pattern generator, i.e. 'unary->mem', is used both as a macro template and as a function name, so the transformer generator needs to be generated! too many levels of nesting: this has to be simplified somehow.. wait: the thing that needs to be generated is a pattern expander function, which can be used in pic18-comp.ss to create the extended compiler-pattern macro. ok, i'm running into the problem again: if i put pic18-meta-pattern and pic18-process-patterns in a different module, and require-for-syntax it, i get the #%app error again.. so until i figure that out, maybe best to always use local transformer procedures? i guess it has something to do with binding identifiers. here the problem seems to be 'lit', in the binary-2qw pattern.. i ran into the problem again, in pattern-utils.ss / extended-compiler-patterns and it was something like:

  #`(namespace . #,( ---- ))

which needs to be

  #`(namespace #,@( ---- ))

because the #, expands to an s-expression, which is then just inlined too, leading to 'process-patterns' not being quoted. weird stuff can happen.. ALWAYS CHECK EXPANSION when this #%app thing occurs! wait.. that's not it.. damn! i'm going to leave it at not using any syntax for it.. it's not too bad, and understandable.

Entry: for-syntax
Date: Fri Aug 10 14:08:57 CEST 2007

I still get into trouble with higher order stuff where i completely don't understand what's going wrong. Well, i guess it will come with time. I'm glad i got syntax-case to a point where i don't need to use unhygienic constructs any more. And, if i get into trouble with name bindings, it's always possible to put local functions in a transformer expression. I'm done for today though.. head is hurting :)

Entry: preprocessor
Date: Fri Aug 10 14:50:57 CEST 2007

Time to adapt the parser/preprocessor, and change it to something purely symbolic.
Seems to work. It's a lot simpler too, now that the macro representation supports quoting etc..

Entry: cat name space organisation
Date: Fri Aug 10 17:45:16 CEST 2007

Things changed:
* i found a way to easily debug modules and scheme code using snot
* cat code is now fully embeddable in scheme
* things got separated out a bit more
So, what i'd like to do is to separate pure scheme code from stuff that accesses the global 'stuff' name space. This doesn't include private name spaces that are only written in a single module, like 'asm' or 'forth', but it sure does include 'base'. base is full of junk.. maybe that's the real problem? Maybe i should just implement more in scheme, and have this base thing as a scripting language only... Or. I need to add an easy syntax for creating 'local words' using the lexical stuff i have now.

(letrec ((a (base: 1))
         (b (base: 2))
         (c (base: 4)))
  (begin
    (ns-set! '(base broem)  (base: a b + c /))
    (ns-set! '(base lalala) (base: a b c))))

(local ((a 1) (b 2) (c 4))
  ((broem a b + c /)
   (lalala a b c)))

this requires a different syntax, since the anonymous compiler needs to be available always. very straightforward. works like a charm.

Entry: wordlist search path
Date: Sat Aug 11 15:21:18 CEST 2007

i need to change the 'find' macros so they accept multiple paths.. one thing i'm wondering about is how 'force' is implemented.. somehow i suspect that the thunk is not erased... it probably is.. found it:

(define (force p)
  (unless (promise? p)
    (raise-type-error 'force "promise" p))
  (let ((v (promise-p p)))
    (if (procedure? v)
        (let ((v (call-with-values v list)))
          (when (procedure? (promise-p p))
            (set-promise-p! p v))
          (apply values (promise-p p)))
        (apply values v))))

the thunk is erased. the only thing to optimize is to not use the values stuff, but a single return value. probably not worth it. Haha, something i didn't see at first there: the p is either a procedure or a list, so i can't make a single atom of it, because it could be a procedure :) another trick: creating the ':' macro at the spot where the '(language)' dictionary is created and populated with primitives uses the module dependency system to somehow enforce dependencies on namespaces, which are not checked.

Entry: next: namespace
Date: Sun Aug 12 21:10:45 CEST 2007

more specifically, it's time to start using eval on the 'macro:' syntax, and i run into the problem that this happens in a toplevel where it's not defined. can you tie 'eval' to a module context? update: what about making this namespace explicit? i just need a single namespace object which contains all the relevant compilers. hmm... it looks like the easiest way to implement this is to require each 'lang:' macro to be associated with an 'eval-lang' function. argh... can't do that, since those also need eval.. looks like i can't escape this namespace thing..

Entry: lazy data structures
Date: Mon Aug 13 01:33:23 CEST 2007

so, what do i need to make the assembly process lazy? match needs to work on lazy lists.

Entry: disentangling
Date: Mon Aug 13 10:43:14 CEST 2007

i can't just separate out all the code that defines names in the namespace, because the namespace is used for other things. there's some conflict here.. what about i start doing it anyway:
- more namespaces
- populate each name space at the same place where the scheme code that uses it is exported.
the real trick is of course to see the direct specification of namespaces as 'internal'. this should be wrapped by functions.
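a sketch of such a wrapper, in the spirit of the ns-set! used above: nested hash tables behind a small functional interface (hash table calls per old mzscheme; 'ns-find' is a hypothetical name).

(define *ns* (make-hash-table))

(define (ns-set! path val)             ;; path = '(base broem) etc.
  (let loop ((h *ns*) (p path))
    (if (null? (cdr p))
        (hash-table-put! h (car p) val)
        (loop (or (hash-table-get h (car p) (lambda () #f))
                  (let ((sub (make-hash-table)))
                    (hash-table-put! h (car p) sub)
                    sub))
              (cdr p)))))

(define (ns-find path)                 ;; -> word function, or #f
  (let loop ((h *ns*) (p path))
    (and h (if (null? (cdr p))
               (hash-table-get h (car p) (lambda () #f))
               (loop (hash-table-get h (car p) (lambda () #f))
                     (cdr p))))))

;; (ns-set! '(base dup) (lambda (s) (cons (car s) s)))
;; ((ns-find '(base dup)) '(1))  =>  (1 1)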
Entry: name space trick doesn't work
Date: Mon Aug 13 15:56:35 CEST 2007

the problem seems to be that data constructed using the struct from rep.ss is different from data constructed using run time evaluation in the separate namespace.. i need a different approach. first to identify the problem:
- mzscheme has strict phase separation
- "(eval '(macro: ,@src))" works when the runtime name space, the current one when that code is executed, has that macro available.
- somehow, rep.ss gets loaded twice, since i run into incompatible instances.
now, why is it loaded twice? one thing i could eliminate was a for-syntax dependency on rep.ss, so the messages are a bit less confusing now. it looks like the namespace trick creates a new instance. that's where my trouble is. let's simplify it a bit. the only thing that really needs dynamic compilation is 'macro:'. so i'm going to put that dependency in the code itself. but this should be independent of the problem, so i leave it like it is. i found a solution: just make sure the current namespace has the right symbols. there's no other way. currently i just dynamic-require them in, but i don't know if this is better than just requiring them at load time, or namespace creation time.. it does feel a bit dirty though.. why use modules if they start injecting stuff into your namespace? Maybe i should just pass the macros upstream.. whatever..

Entry: quoting parsers
Date: Tue Aug 14 12:59:37 CEST 2007

They seem to work now. Map well onto the literal stack / typed macros approach. The question is how to map them. It doesn't look like a good idea to keep the same symbol, but how to change it then? I'd like both ' load and load to work. Prefixing them with 'def-' seems not right. What about "/load"? A single symbol seems the right thing.. ~#$%& are ugly. "*load" seems a good compromise. I got it sort of working, and factored it out a bit. However, it might need a bit of name cleanup to distinguish the following source representations:
- a filename
- a list in symbolic forth format
- a list in symbolic macro
- the latter compiled into executable code
I switched to the following naming convention:
- files use 'load'
- strings use 'lex'
- list -> compiled code uses the ':' prefix
- all others operate on lists directly

Entry: default semantics
Date: Tue Aug 14 13:55:56 CEST 2007

Also, i start to wonder if it's a good idea for 'run' to take literal lists as an argument. The only real benefit is Joy-like introspection, but since in badnop most source reps are macros, this doesn't make sense: source is not the only thing, semantics needs to be added, so default semantics might not be good.

Entry: long run times
Date: Tue Aug 14 14:46:10 CEST 2007

seems to go into a loop somewhere.. time for a break. it seems it's just really slow! and all the time is spent during compilation. looks like i got some quadratic things going on in the expansion.. so, i suspect this is syntax-rules.. i'm using a lot of rewriting to avoid syntax-case.. maybe i should just come back from that? already eliminated the if-lexical? macros. ok.. let's see. first make the expansion a bit less dramatic: some things can be abstracted in a function. then i replace rpn-compile by a single syntax-case macro. it still calls the compile macro, which can be customized by stuff built on top.. there's no difference in speed, so i guess it's somewhere else.. so it's this:

  (rpn-compile *forth* 'macro:)

if *forth* is about 150 atoms, there's about a second of delay. maybe it's the nesting of the macro?
i wonder if i can write a macro that's faster.. let's try something different. currently, the rpn-abstract macro is using fold. it's still calling the 'compile' macro. looks like that's what i need to replace. so, how to implement modified behaviour? instead of using macros, why not use functions? i do need proper phase separation to do this. let's see if that's possible by moving stuff out to for-syntax-rpn.ss

Entry: running into #%app trouble again
Date: Tue Aug 14 22:15:17 CEST 2007

This is the smallest example i could find that doesn't work as i expect..

(module for-stx mzscheme
  (provide (all-defined))
  (define (break-stx fn args)
    #`(#,fn #,@args)))

(module test-stx mzscheme
  (require-for-syntax "for-stx.ss")
  (define-syntax (bla stx)
    (syntax-case stx ()
      ((_ fn . args)
       (break-stx #'fn #'args)))))

Then, putting the 'break-stx' definition inside the define-syntax def works fine.. Ok, if i change the quoting mechanism above to:

  (define (break-stx fn args)
    (datum->syntax-object fn (cons fn args)))

It does work. Now i'm really confused. I found this on the plt list:

http://groups.google.com/group/plt-scheme/browse_thread/thread/327013d5c6f61017/9a12e93d683a5f94?lnk=gst&rnum=2#9a12e93d683a5f94

(require-for-template mzscheme) in the module that generates syntax seems to solve it. So, on to replacing the old 'compile' macro with a functional approach, which works a lot better. There's really no reason to mess with syntax-rules for anything other than simple patterns.

Entry: disentangling
Date: Wed Aug 15 14:19:05 CEST 2007

  rpn-tx.ss       lowlevel syntax generation, parameterized by 'find'
  rpn-runtime.ss  runtime support for the above
  rpn.ss          bind a 'find' closure generator to lowlevel syntax
  ns-utils.ss     support code for namespace lookup, to be used in find closures
  state-stx.ss    namespace namespace -> state syntax "language:" compiler
  base-stx.ss     namespace -> base syntax "language:" compiler
  composite.ss    create named words from compiler

Entry: mission accomplished
Date: Wed Aug 15 15:15:54 CEST 2007

Looks like i got it back online. The transformer works a whole lot faster now. Let's repeat the conclusions:
- don't use syntax-rules if you need CPS tricks. it's ad-hoc and slow. use syntax-case with real functions instead.
- when using complicated syntax-case macros (compilers for embedded languages), separate out the transformer procedures and the template runtime support into different modules, so they can be tested separately. I did this for pattern.ss -> pattern-tx.ss and pattern-runtime.ss

Entry: better error reporting
Date: Wed Aug 15 16:21:00 CEST 2007

so... it would be great to be able to relate errors to where they occur in the source code. however, to use the builtin syntax readers i need to move both the lexer and the parser so they can operate on / generate syntax objects. pretty clear what's to do next then:
- rewrite the forth parser so it operates on syntax objects + create a proper 'forth:' macro that goes with it.
- make the lexer behave as 'read-syntax'
when this is done, i should be able to compile forth files straight away. The first part was easy: the driver works. The rest should be straightforward. However, moving this to compile time requires some phase magic... I was thinking about doing a proper phase separation in the forth code too. Instead of defining macros as a side-effect, it's probably better to isolate them.

Entry: predicates->parsers
Date: Wed Aug 15 21:57:18 CEST 2007

I don't remember why exactly the map 'vm->native/interactive' is not purely syntactic. it really should be..
refer to previous code to find the previous functionality, but i'm breaking it and taking out the 'dict' dependency, and will replace predicates->parsers with something that doesn't evaluate. the previous 'predicates->parsers' behaviour is too dense. took me a while to understand it. better to separate it out into different mechanisms:
1. syntactic transformation
2. run time symbol lookup

Entry: produced first monitor.hex
Date: Thu Aug 16 00:31:51 CEST 2007

looks like i got it mostly running now. didn't test the code yet since a lot of things are still broken, mostly the interactive part. but it looks ok.

Entry: brood 4
Date: Thu Aug 16 00:36:17 CEST 2007

enough things changed, and i've been in a broken state for a bit now. this means it's time to up the version, and rewind the brood 3 archive to a working state. it's archived as brood-3 on apatheia. this is the last patch included:

  Mon Jul 23 21:29:12 BST 2007 tom.goto10.org
  * namespaces and next projects

at that time i was changing stuff in the boot block.. i'm not sure if that code actually works.. might be better to revert a bit more, back till after the workshop.

Entry: next
Date: Thu Aug 16 01:01:53 CEST 2007

- test the target code, see if the monitor still works
- fix the interaction code
- fix the vm interaction/compile code
- fix snot for interaction/compile mode
- factor some badnop code: use local words

Entry: separate compilation
Date: Thu Aug 16 10:51:12 CEST 2007

got me thinking: can't i do the separate compilation trick for macros? i already ran into the non-transparency problem several times: trying to define some code with some macros not defined.. one of the problems is 'constant': it needs run time compilation so i can't just do this... another is that macros defined immediately start influencing compilation of code after their loading. but.. can the loading of forth files be made free of side effects? or at least somehow separated? let's see what kind of side-effects we have:

  constant-def!  2constant-def!  macro-def!

those are easily isolated into separate dictionaries to separate 'core' from 'project' macros and constants.. as long as project macros are loaded AFTER core macros, they can be safely deleted as a whole. the short version: it's impossible to change it now without real phase separation..

Entry: literal pattern matching
Date: Thu Aug 16 11:20:58 CEST 2007

Patterns like

  ((['qw a] ['qw name] *constant)
   (begin (def-constant name a) '()))

are a bit redundant.. a better notation would be "(a name *constant)".

Entry: assembler cleanup
Date: Thu Aug 16 11:36:25 CEST 2007

Can't i get rid of the 'constants' namespace? Again, why are they different from macros? To postpone symbol -> number conversion until assembly time. So they can't be macros, because at assembly time all macros have run.

Entry: compilation syntax
Date: Thu Aug 16 13:42:21 CEST 2007

i'm thinking about adding some syntax to compile code using different syntax.. (a b c) is still default semantics quoted code, but (lang : a b c) is interpreted as compiled with 'lang'. or maybe (lang: 1 2 +). let's see if i can do this first on the rep.ss level: just store a symbol naming the rep. probably the first thing that needs to change is state-stx: have it take an anonymous compiler as 2nd op.. it's really annoying to be at the border of compile/run the whole time! first, the above is not really possible since the state-stx fallback code is not derived from a named compiler.

Entry: override semantics
Date: Thu Aug 16 15:05:36 CEST 2007

Introduced the (language: ...)
syntax for overriding language semantics while quoting code. It's implemented as follows: the default 'program' compiler checks if the first symbol in a list ends in ':'; if so, the whole expression is passed to the scheme expander, otherwise the default 'represent' method is used to compile the code anonymously. It's a small step from here to a 'lambda:' macro. I also fixed the semantics annotation. However, it is possible to run into code which doesn't have the semantics annotated because it works with an anonymous macro.. This could be cleaned up, but i guess it serves the debugging purpose now: 'ps' displays macros as (macro: ....)

Entry: name mangling
Date: Thu Aug 16 16:01:27 CEST 2007

Maybe i should give the name mangling a go again.. If i recall, the thing i did wrong last time was to get rid of syntax information for names, so they were mapped to toplevel names. This helper seems to do the trick:

(define (prefix pre name)
  (->syntax
   name ;; use original name info
   (string->symbol
    (string-append
     (symbol->string (->datum pre))
     (symbol->string (->datum name))))))

So basically now i have a mechanism to use the mzscheme module system for handling namespace and dependency management. I bet i can use some kind of 'module-local?' predicate on the syntax to find out if a name is local to a module, and if so use that instead. I guess it's a good time to find out if i have the namespace stuff sufficiently abstracted. Something about naming conventions: the 'rpn-' modules do not need or depend on the namespace implementation. I do need a different kind of 'compile' macro, but for the rest it works perfectly. Maybe time to rename some things.. All good and well, but how do i combine the two? Doesn't seem like a good idea to do it directly.. This works better as all or nothing.. So, combining runtime namespace lookup and static modules.. how to? One of the things to change is to not inherit from a namespace, but from a named compiler macro. What about starting from the ground up? Making the base language static, then moving things from dynamic -> static? Starts with snarfing. Instead of snarfing to a dictionary, snarf to a prefix. Start with separating primitive.ss into snarf.ss and ns-snarf.ss. So in principle, it should be really easy now to move the implementation of base.ss to static functions without anybody noticing. That is, if i can somehow make delegation work using just a language: macro instead of namespaces..

Entry: the royal DIP
Date: Thu Aug 16 17:35:27 CEST 2007

i guess the solution is to use 'dip' from base to create state syntax abstractions. and maybe, to add an optimization that (+) does not create an extra lambda, but returns the primitive + right away. i guess the optimization can be left until later.. so.. the idea is to make the delegate compiler abstract. this requires quite some change, but should make the code a lot simpler.. it would also fix the annotation problem mentioned above. so

  (ns-base-stx badnop: (badnop) base:)

instead of

  (ns-base-stx badnop: ((badnop) (base)))

let's call this 'extend-base-stx'. haha. gotcha! of course, the delegate: is a static thing, and the namespace delegation is a dynamic thing, so there's no way to compile this: the information necessary to decide about delegation or not is not available at compile time, when the delegation needs to be frozen.. it needs to work the other way around!!! if a symbol is not defined at compile time, the resolution can be postponed until runtime. so i guess the gentle way to move things to static implementation is to use the 'module-local?'
predicate mentioned before. that way module-local symbols can bind first, and cannot be overridden. the number of methods in the compiler is getting larger. maybe use real objects? prototypes? also, if i use a decent prefix, symbol capture is not a real problem, so can i put it on always? maybe just a dot or a pound sign..

Entry: pff... done coding..
Date: Thu Aug 16 20:49:59 CEST 2007

today was a bit intense. i'm starting to get a bit more of this syntax / lexical / static stuff.. it would be nice to make more things static. there are only a couple of places that have 'plugin' linking. one of them is 'literal' and 'compile' in the macro-prim dictionary, so it looks as if i do need some dynamic binding. however, i wonder if it's not better to solve this using units. more standard tools = better, now that i know what i want at least.. one thing is bugging me though. some paradoxical thought: i'd like to define words that fall back on another dictionary. however, using static linking there is no such thing: a symbol is there, or it is not. and there's no override.. maybe i should stick to dynamic.. it's really different and there's no easy migration to static.

Entry: if i go static
Date: Thu Aug 16 23:29:47 CEST 2007

one name mangled namespace is enough, since i can use ordinary modules to organise code and hide details, just like in scheme. let's stick to 'rpn.' so i built that in: names like 'rpn.xxx' that are visible at compile time get used as functions, and bind variables 'xxx' in the compositional code, just like lexical variables. it looks like delegation from dynamic -> static parts is not possible. since this is quite a deep thing to change, i'm not going to. it's still possible to move highly specialized code into modules to shield it from the main dictionaries. what is possible is to add a static interface to words in 'base'. they could still include code to register to the dynamic space also, but at least this would enable freezing some functionality. so, maybe this: all base words are exported
- as rpn.xxx variables from the rpn-base.ss file
- in a dynamic dictionary from base.ss, which gets the functionality from rpn-base.ss
is a bit confusing.. maybe leave it as is..

Entry: because i can
Date: Fri Aug 17 00:05:30 CEST 2007

there's a lot of 'because i can' code in thethered.ss ... as i found out, some tasks are just easier to code in scheme. if it's anything algorithmic, meaning intricate data dependencies, you're usually better off writing a scheme program. they are easier to understand, probably because they are a bit more verbose, and because 'automatic' permutation and duplication of names avoids mental gymnastics for stack juggling. there is nothing in the way now that i have both 'base:' and 'prj:' in scheme, and 'scheme:' in cat. for what is the cat code useful then? simple patching and scripting, there it clearly wins. as long as not too much data juggling is needed, cat is really easier for patching things together. also, imperative code looks nicer in cat. because cat is just composition, it looks sequential. it happens that all (most) imperative code i use is for communication. in scheme imperative code always seems ugly.. maybe it's because synchronisation is easier to imagine in a linear instruction flow: threads of execution joining together at certain points, breaking the linearity of composition?

Entry: joy
Date: Fri Aug 17 01:28:07 CEST 2007

added a joy interpreter.
Entry: interaction
Date: Fri Aug 17 10:45:30 CEST 2007

got a bit off track again.. time to fix interaction. first thing to do was to put the 'tinterpret' and 'tsim' code in prj.ss together with the supporting code dip/s and ifte/s

so.. why is this so ugly? by default, quoted programs and run + ifte use a functional context to limit surprises. however, sometimes i want to do things like:

  (tsim (prj: dup tfind not) dip/s
        (prj: tinterpret) ifte/s)

the xxx/s words are the analogs of xxx but pass stack + state to the programs, and 'prj:' compiles state words. is there a way to do this automatically? probably not using my current setup, unless i make 'run' understand state words, which means they should be type tagged. since that only takes away the /s notation, i'm not going to do this. so the convention: functionals do NOT pass state to quoted programs, while the corresponding /s words DO

but... if one uses types to do this automatically, the core 'apply' routine should be made aware of state, and rep.ss should implement some kind of tagging for state words.. what would be the real problem?

Entry: Monads
Date: Fri Aug 17 12:10:45 CEST 2007

i don't know much about type theory, but i think i understand how my ad-hoc approach relates to monads, using the unit-map-join formulation.

  X is the state type
  S is the stack type
  ( . ) is cons

  unit :: S -> (X . S)
  map  :: (S -> S) -> ((X . S) -> (X . S))
  join :: (X . (X . S)) -> (X . S)

so 'unit' introduces a new state object on the data stack. 'map' will create a function that does what it did before, but ignores the X part, and 'join' will accumulate one piece of state into another. the first two are trivial, and i use them fairly explicitly. but the last one seems to be hidden a bit deeper, because i never use it explicitly: every state dictionary has a couple of words that bring stuff into the monad, but they have type:

  A is an assembly opcode

  asm :: (X . (A . S)) -> (X . S)

here 'A' is not the same as 'X', but in spirit it does the same flattening operation. looks like i'm missing some of the fun. clearly the 3 law formulation has some benefit due to a higher level of abstraction, but what would it bring me to make this a bit more explicit?

first of all, i need a proper type system. the monad objects should be somehow tagged. that way 'unit' and 'join' can be made polymorphic. 'map' should not be polymorphic, given i implement monads as 'things on the top of the stack'.

  ;; map
  (lift (dip) curry)

the other two are problematic. 'join' is possible to do, since monad types could be tagged, so it _could_ be made polymorphic. but 'return' / 'unit'.. such polymorphism won't work because i can't infer the type! i.e. 'return' is normally plugged into some expression that expects a monad type. i have no way of determining something like that, so 'return' should have explicit annotation, probably best using just a different name. for example, the assembly 'return' for a single opcode would be

  (asm:return '() cons)  ;; wrap the single opcode in a list
  (asm:join append)      ;; concatenate the 2 state lists

what i do is to just combine those 2 operations into one that conses a single opcode onto the assembly list. i think i sort of get the gist of it.. or not?

so, the other formulation uses bind:

  (X . S) -> (S -> (X . S)) -> (X . S)

so bind is like 'join' in that it combines a monad data type with a function that maps from outside the monad to inside, and returns a monad type.
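to make that last formulation concrete for myself, a minimal sketch in scheme (the names and the choice of representing X as a list that joins by append are my assumptions here, not code from rep.ss):

  ;; a monadic value is a pair (X . S): state X consed onto the stack S.
  (define (stack-unit s)             ;; unit :: S -> (X . S)
    (cons '() s))                    ;; push a fresh empty state
  (define (stack-bind Xs fn)         ;; bind :: (X . S) -> (S -> (X . S)) -> (X . S)
    (let* ((X  (car Xs))
           (Ys (fn (cdr Xs))))       ;; run fn on the bare stack
      (cons (append (car Ys) X)      ;; join: accumulate the two states
            (cdr Ys))))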
note: this works because i have only one type (a stack: each function maps stack -> stack). so:

* in the general case, the source and destination monads for the 'bind' operator do not need to be the same, but in my approach they are, since there is only one type that can be "monadified"

* i do not have the concept of a type constructor (types do not have an abstract representation), so i can leave that out.

so, a stupid question maybe: how do you get stuff OUT of a monad? i think there's something i didn't get. the type signature of bind is:

  M t -> (t -> M u) -> M u

so i guess if M u == t, bind can get things out of a monad. in general, it can get t out of M t (multiple times!), apply the function (t -> M u) (multiple times), combine 'stuff' from M t with the (multiple) M u, and return an M u.

conclusion: not having 'real types' makes all this a bit difficult to formulate.. it might be a nice exercise to try to do it anyway. a nice base for some more reading on the subject, maybe "Monadic Programming in Scheme" http://okmij.org/ftp/Scheme/monad-in-Scheme.html which talks about the case where there's a single monad, or where types of different monads do not get mixed.

Entry: source annotation
Date: Fri Aug 17 16:45:25 CEST 2007

really.. does it make sense to NOT have the source annotation be formal, if with a little more effort it can be? It's sort of formal now.. Things that are not uncompilable have #f semantics; the others are created straight from the named macro, so they should be right, or by composition from such, so they should be right because all code is syntactically concatenable.

It sort of strikes me as odd that i can't have 'curry' or 'lift' defined in a generic way, because quotation of data is not standard. I could try to force it. Anyway, for 'lift' i only need base semantics/syntax. Wait, lift is possible if semantics is defined, but it requires that quoted programs are always available. (even in forth macros!)

Entry: and so on..
Date: Sat Aug 18 01:48:03 CEST 2007

time to get back to pic programming... i didn't really anticipate this static change and the move to brood 4, but things really are better this way. once the pic part is back online, it's time to look at interaction macros, or how to create interactive meta functions. so, timeline:

- interaction macros
- the standard 16bit forth (requires interrupt driven serial I/O and an on-chip lexer + dictionary)
- write something about compile/runtime and the different ways to fake the single machine experience.

Entry: state
Date: Sat Aug 18 14:34:18 CEST 2007

got interaction working. i changed it so the commands available in the interaction mode need to be specified explicitly. this has to be done for commands that take arguments anyway, so why make an exception for 0cmd? see interactive.ss

so, i've been doing the snot-run thing, which works quite well. it's a joy that state is stored elsewhere, so my functional core can just be reloaded. however, there are a few spots where i'm still using state.. one is IO. since it's non-functional anyway, storing the name of the serial port couldn't hurt, right? wrong.. on restart, it needs to be reset. i made the 'boot' word which loads all the macros from source. this is slow, i guess because of the constants? so maybe i should just put the constants back as a scheme file..

Entry: KAT and TAK
Date: Sun Aug 19 14:05:15 CEST 2007

I'm looking for a better way to explain the pattern matcher. Usually generalizing helps.
The reason why it seems special is that it is only used with a "macro pattern" and a "quasiquoted scheme template".

Entry: no phase separation
Date: Sun Aug 19 15:16:21 CEST 2007

Now that i finally understand the point of phase separation, i wonder if i can do something similar with the forth? Maybe it's not necessary for small projects, but it does feel a bit weird to first struggle to write scheme code that obeys mzscheme's phase separation rules, to see that it's a good thing, and then to go back to some non-separated way.

I see a roadmap on how to do this: just turn everything into scheme syntax. The result after loading is a single function that generates the program when evaluated. That way i know i'm going to get there. On the other hand, i do not know what to give up then. My whole design needs to change.

Another way is to do it incrementally. First make sure i can separate code into macro definitions and the rest. For just macro definitions this is not so difficult. However, constants are a different story, since they require compile time computation.. I guess that's where the problem is:

Entry: constant
Date: Sun Aug 19 15:27:10 CEST 2007

What are constants? A phase separation violation! In contrast to normal macros, which obey separation because they do not use any values created at compile time, macros generated by 'constant' join 2 phases. The 'constant' word could be termed a "phase fold". The compiler after 'constant' is not the same one as before: it is extended with a macro. This kind of behaviour prevents modularization of code, because it is not clear what the definition of the new macro depends on; the only thing that can be assumed is that it depends on all the previous code, and that all the following code depends on the new macro.

The solution is that this behaviour needs to be unrolled: instead of updating the compiler on the fly, an extension phase (where macros are defined) needs to precede a compilation phase (where macros are executed). There is a general way to unroll 'constant': split the code in 3 parts: the part before, the definition of the new macro, and the code after. This is rather cumbersome and entirely unnecessary.. However, in the case of Purrr18 it is usually possible to transform the code into a macro definition. Instead of writing

  1 1 + constant twee

one could write

  macro
  : twee 1 1 + ;
  forth

This enables the macro definition to be distinguished from the rest of the code, to clarify the dependencies of a file's plain code on the macros defined in that file. The only reason not to do it the second way is that it loses the name 'twee' in the eventual assembly code.

Removing 'constant' could lead to better transparency in the code: compiled macros could then be seen as 'only cache'. Note that i would do this just for more transparency, not to eliminate undefined symbols: macro name binding is still late.

Entry: phase separation
Date: Sun Aug 19 16:34:26 CEST 2007

So, a forth file contains both macros (M) and forth (F) code. The forth code always depends (->) on the macros (M -> F). If a forth file depends on another forth file, the macros of the former depend on the macros of the latter, and the forth code depends on both the macros and the forth code of the latter. Due to transitivity, the arrows from M -> F between files can be omitted, so one gets something like

  Ma -> Fa
  |     |
  v     v
  Mb -> Fb

where the arrow from Ma -> Fb is left out.
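spelled out as mzscheme modules, the picture would be something like this (hypothetical file names, just to show the requires):

  ;; ma.ss : macros of file a
  (module ma mzscheme (provide twee) (define twee 2))
  ;; fa.ss : a's forth code depends on a's macros
  (module fa mzscheme (require "ma.ss") (provide fa-code)
    (define fa-code (list 'lit twee)))
  ;; mb.ss : b's macros depend on a's macros
  (module mb mzscheme (require "ma.ss") (provide drie)
    (define drie (+ twee 1)))
  ;; fb.ss : b's code depends on b's macros and a's code,
  ;; but never needs ma.ss directly: that's the omitted arrow.
  (module fb mzscheme (require "mb.ss" "fa.ss")
    (define fb-code (list drie fa-code)))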
What this would buy me is a solution to the problem of keeping the macros consistent with the state of a target: target state is a consequence of compiling all the Forth code in a project. However, as a side effect, a project defines macros that are used to generate this code in the first place. There needs to be a clean way to 'reload' these macros from the source code, so we can connect to a target with the macros instantiated.

I'm trying to see how to make this more rigorous: how to make incremental compilation work without having to manage dependencies yourself? Basically, how to map the nice module system of mzscheme to incremental Forth development. This is clearly not for now. It requires a lot of change. One of them would be management of storage on the controller: if dependencies of separately compilable modules are fully managed, incremental uploads are still possible, and become 'transparent'. I.e. changing a module but not changing its dependencies makes it still possible to update a system on the fly, but in a transparent way. I'm still quite happy with the ad-hoc hacked up way of incremental development. But knowing this is possible might make the itch a bit stronger.

Entry: dynamic updates and functional programming
Date: Sun Aug 19 16:53:53 CEST 2007

I guess most of this train of thought started after i got to using sandboxes with SNOT. Currently it works +- like this: SNOT (the bootloader)

* manages memory: stores project state in a single toplevel variable
* manages a purely functional sandbox
* implements a REPL

outside of the system, the edit-compile cycle runs: changes are made to the collection of functions that act on the state, and a compiler recompiles those that have changed. then 'restarting' the system is almost instantaneous: the state remains, only the operations on the system change. the requirement for this is of course that all state is stored in a fairly concrete way: representation must not change from one version of the system to the next. if representation changes, a small 'converter' could be made..

What i'd like is something like a smalltalk environment, but for scheme. A lisp environment with incremental loading comes close, but transparency is necessary. Smalltalk solves this by being completely dynamic: compilation is just cache, and code can be edited on the fly. There is no 'off', it's always running. MzScheme solves this by being static, but with well-managed dependencies: separate compilation to make 'restarting' cheap. There is an 'off', but it can be made small. Using the approach above, managing ALL state separately renders a virtually always-on system. The off period can approximate zero, since it's just "swapping a root pointer" once the code is compiled. Compilation can take longer if changes to core modules are made, but there remains a 1-1 correspondence between the system and the source code. I guess it's possible and not even too difficult to delay compilation in the scheme case, making compilation behave more like a cache.

Entry: purification
Date: Sun Aug 19 17:17:13 CEST 2007

So i need to eliminate state. There are 2 cases where i've introduced state because i thought it "wouldn't hurt"..

* the target IO port
* the project search path

The rest really behaves just as cache. So if i'm allowed to be really anal about eradication, these things need to change. The project search path is the easiest. Target I/O is more difficult because it requires moving from a functional to a monadic implementation.
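roughly what that move would look like (a sketch with my own names, not the current tethered.ss code):

  ;; thread the port as part of the state that every tethered word
  ;; receives and returns, instead of reading a global variable.
  ;; word :: (state . stack) -> (state . stack), with state = the port.
  (define (target-send st+stack byte)
    (let ((port (car st+stack)))
      (write-byte byte port)        ;; the effect stays at the edge
      st+stack))                    ;; state is passed along unchanged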
Entry: eliminating global path variable
Date: Sun Aug 19 17:41:15 CEST 2007

to be able to eliminate the path state, i probably need dynamic variables (parameters). is this cheating? not really.. since i'm using with-output-file already, and that doesn't really feel like cheating. this would also solve the problem with IO of course.. still i'm not convinced it's not cheating.. one could say it's not cheating because the value has finite extent?

so why not implement monads as dynamic variables? because dynamic variables are not referentially transparent, which you would want when you 'run' a monad: it should act just on the state provided, not on something else... so why are parameters different then? are they less evil when they are constant? they represent 'context'.

* one thing is sure: they are less evil than global variables due to limited extent.
* if they are constants, they are less evil than when they are not.

To really answer the question is to implement dynamic variables with monads, and see how they are different. The problem i'm facing in my ad-hoc state hiding approach is that i can't combine monads: when i'm running something in the macro monad, i can't access anything else. To have access to the path, the monad should be bigger and include 'compilation context'.

The real solution is of course to make compilation independent of file system access. Source code needs a preprocessing step that expands all 'include' statements. Since it's only one keyword, this can be implemented in the forth-load function. That function already implements 'file system dereference', so why not include path searching? Ok, made it so. 'load' is now a load-time word, so file system access is concentrated in one point. 'path' is removed: this needs to be specified in the state file, because it really is a meta-command.

Entry: cleaning up interaction
Date: Sun Aug 19 19:07:52 CEST 2007

This is the biggest change. Probably best to separate it out into a different monad. The state associated with interaction is:

* I/O port
* target address dictionary
* assembly code

With this data it can start to assemble code and upload it to the target. But.. looking at the contents of the state file, there is not much else!

  (forth)        ;; might come in handy for interaction
  (file)         ;; in case we want to access the file system
  (config-bits)  ;; on the fly reprogramming? some day probably
  (consoles)     ;; this is the only real meta data

not necessary for interaction.. maybe it's not worth it to split interaction off of prj. maybe it's even just a bad idea: you'd want the 'fake console' to have power over the whole project, which is impossible without giving it all the state. let's just clean up tethered.ss and move functions out to badnop.ss

but... i'm using with-output already. so why not just have the i/o commands do the same? done. this immediately solves the problem of having more than one device attached, i.e. a distributed system with all identical devices.

Entry: side-effect free macros
Date: Sun Aug 19 20:00:19 CEST 2007

i was thinking: if macros are side-effect free, constants can be eliminated, because it's always possible to see if a macro is just a constant: execute it, and if the result is '((qw )) it is! the only thing you would need constants for is to 'uncompile'. another thing: what about making the partial evaluator reference macros if it can be guaranteed the macros perform only computations that can be completely reduced to values? i need to disentangle this a bit..
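a quick sketch of that test (hypothetical glue: 'run-macro' stands for however a macro gets applied to an empty assembly state; 'qw' is the literal-pushing pseudo opcode):

  (define (constant-macro? run-macro m)
    (let ((asm (run-macro m '())))   ;; run against an empty assembly state
      (and (pair? asm)
           (null? (cdr asm))         ;; exactly one instruction ...
           (eq? 'qw (caar asm)))))   ;; ... and it is a literal push (qw)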
Entry: no values
Date: Mon Aug 20 15:12:00 CEST 2007

i owe this to Joy: it's really good to have no "function value quoting", i.e. just (foo) instead of something like 'foo. this leaves ' free for quoting literals, and has the benefit of a simple abstraction syntax.

Entry: distributed programming
Date: Mon Aug 20 15:15:37 CEST 2007

The next hardware project is going to be krikit. It's going to be a distributed system of small devices.

Entry: done
Date: Mon Aug 20 16:27:08 CEST 2007

yes, i guess so.. no pressing changes ahead, except for the macro/code separation, side-effect free macros, and maybe dependencies.. which is a biggie. another thing is interaction macros. so the todo looks like:

- move the words "constant, macro/forth, variable" and the 2-variants to a preprocessor stage which can separate code into macros and forth code.
- add interaction macros

Entry: brood.tex
Date: Mon Aug 20 23:03:09 CEST 2007

i'm starting an explanation of macro embedding with a purely functional approach. while i'm on the right way with my notion of compilable, the effectful part is less obvious. the idea is this: [ a 1 2 + b ] can be simplified to [ a 3 b ] if a and b are effects. somehow i'm missing something important. maybe the situation is symmetric? instead of having a language and a metalanguage which both share some evaluation domain, they also have functions that act on their full domain only.

i think i sort of got the duality now: the target depends on run time state which is not representable, meaning only pure functions can be evaluated.

... my explanation is not completely sound.. when i'm talking about the target and host language, i never make the explicit conversion. there's something wrong there. almost right, but not quite. a compilable macro is something which can be 'unpostponed'. meaning, it is a function that all by itself produces a program that can be evaluated on the target.

... another thing is that the macros, in the way i implement them, are not the macros i'm describing in the paper. my macros are EAGER: they are a combination of the partial evaluation strategy AND their original meaning. the macros in the paper (at least the partial evaluation strategy) are monadic. for compilable macros this makes no difference, but for other algorithms, order does matter.

Entry: monoids and stacks
Date: Wed Aug 22 16:09:11 CEST 2007

something which has been tickling me for a while because i don't have it formalized in my head: functional programming with stacks.. how does this work, really? what's the relationship between state and stack?

so, compositional programming languages use compositions like [fg] to express programs. all functions are unary. that's nice for giving some framework about evaluation order (it being arbitrary, if there's a representation of composed functions). so: functional compositional languages make it easy to talk about partial evaluation: it's just the associativity law. whether this is of any practical use depends on whether we can partially evaluate FUNCTIONS to something simpler. so let's start with inserting that thought in the paper..

then the other one is about locality of action. the fact that a language is compositional doesn't really do much about this. you need a way to ensure separation. this is where stacks come in. but this is more about continuations than about being able to perform partial evaluation..
really, the only thing i need to know is

* POSSIBLE: that [1 +] is equivalent to [1] followed by [+]
* ECONOMIC: that the representation is actually simpler

that's the end of the story. the fact that the thing uses stacks is relevant to proving that [1 +] is equivalent.

i need to clean up notation.. i'm using two different notations for application: one rpn, and one pn. let's stick to pn, because i use functions somewhere else, and reserve rpn only for compositions.

... there's another thing that's really wrong in my explanation. something i noticed yesterday already... macros are about the IMPLEMENTATION of partial evaluation. i really have only a single language! that's what it feels like when programming, too. so i think i can plow over my whole text again... frustrating, but i'll get there eventually.

maybe this is why i like programming so much. making sense is only defined from the point of works/notworks. math is too free for me.. i am not strict enough.

ok. the plan: get rid of the notion of 'macro' and introduce it only later. keep everything abstract, just show a way to translate forth into a functional language operating on state + metastate. looks like i'm getting somewhere.. and this is going to turn up some conceptual bugs. looks like i needed to spend this time plowing through misconceptions..

again, this is wrong.. ARGH! the compiler is not a map. it's a syntactic transformation. what i call a compiler now is just the property 'compilable'. so the compiler is something that proves a program is compilable! ok, i've got it sort of explained now. so this composite function thing is about semantics, which leaves more room to talk about the implementation of the proof constructor (compiler).

just added a note about function definitions. creating new names is either something which happens outside of a program, or has to use side effects. currently it's the latter, but i'd like to move to the former.

Entry: real compositional language?
Date: Wed Aug 22 22:37:47 CEST 2007

actually, the step to a real compositional language is not so big any more. just adding the parsing words '[' and ']' for program quotation, and possibly an optimization for ifte -> if else then conversion, should do it. all other constructs can then be translated into higher order functions.

Entry: phase separation
Date: Wed Aug 22 22:44:48 CEST 2007
Name: phase_separation

i guess now that base.tex seems to be about bull-free, the next step is phase separation for forth files. basically this means:

1. collect all names and macros. this includes constants, variables, AND the macros used for compiling function calls.
2. compile the code.

so.. it should in principle be possible to have proper semantic separation of names before a source file is compiled. currently, words have a default semantics (target word). however, i could catch undefined names if i catch all occurrences of ':', and register a macro for each of them that will compile a procedure call. that way i can remove problems with macro/code confusion...

so, point 2: a name always maps to a macro explicitly. otherwise it is not defined. no more default semantics. the macro might choose to compile a call instruction using a symbolic reference.
this means the language becomes a bit less flexible: ':', '(2)variable' and '(2)constant' are no longer accessible from forth, and become preprocessor directives that change the code into a form like:

  (macros
    (a 1 2 3 +)
    (b 5 -))
  (constants
    (c 1)
    (d 2 5))
  (tape
    ((broem) a bla (lalala) bla broem))

where the tape is the layout of code memory with labeled entry points. this structure is there to preserve multiple entry points (fallthrough) and multiple exit points.

if macros are side effect free, constants can be eliminated. they are simply macros that evaluate to a literal sequence, if they evaluate at all. i can even keep the current context for 'constant': suppose a forth file starts with the code

  1 2 + constant broem

then the loose code "1 2 +" can be interpreted as a macro. the consequence is of course that it's not possible to define constants after the first function.

hmm.. i do need constants if i want constants in the assembly. because to get them there, every constant needs to have a macro associated with it that will compile the constant value.. so let's leave them in, but employ the mechanism above to give them macro semantics. maybe a constant is a macro that evaluates to a literal, so the actual macro code can be stored somewhere else? maybe the more important thing is to unify compile-time constant evaluation with macro execution? not really.. ai ai.. time to go to bed..

Entry: set & predicate
Date: Wed Aug 22 23:56:09 CEST 2007

it never occurred to me before, but a set is indistinguishable from a predicate function. operations on sets are then

  (define (union a b)
    (lambda (x) (or (a x) (b x))))
  (define (intersection a b)
    (lambda (x) (and (a x) (b x))))

a thing you can't do here is iterate over the elements.

Entry: a day in bruges
Date: Fri Aug 24 10:32:26 CEST 2007

tourist in my own country.. anyway, i made some notes:

* partial evaluation/optimization: replacing a composition [fg] by a specialized function is always possible in a compositional language. the reason why it doesn't work for me is mainly 'hidden quotation'. for example the sequence "1 THEN +" contains a jump target, which is not purely compositional. solution: only pure quotations; all branching should be optimization. Forth is too dirty, i need a syntax preprocessor. is there a way to have "[ 1 + ] [ 1 - ] ifte" as the base form, and translate it into "if 1 + else 1 - then"? should i move all macros that break compositionality to a different level?

* terminology/concept cleanup: define compilability in terms of the existence of a retraction.

* proper credit:
  MOORE: required tail recursion, multiple entry (fallthrough) and exit points.
  VON THUN: program quotation + combinators, program = function composition, and constants are functions; monadic extensions: top of stack is hidden.
  DIGGINS: typed view + things you can't do (whole stack ops kill stack threading)
  FLATT: separate compilation + phase separation

* semantics of jumps? they get in the way of the FCL formulation. a jump could be a non-terminating evaluation? is there a way to make this sound?

* closures versus quoted programs: note that quoted programs are not closures, since they are not _specialized_. for closures you really need dynamic behaviour: at run time, some values need to be fixed. something that could emulate closures is the consing of an anonymous function with a state atom. this operation is called 'curry' in kat. it could be combined with a monadic state for more elaborate emulation of closures & objects.
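modeled naively (my sketch; the real kat code may differ): since literals in a quoted program just push themselves, consing a data atom onto a program is enough:

  ;; 'curry': fix one value of a quoted program by consing it on,
  ;; so running the result sees the atom as top of stack.
  (define (kat-curry atom program)
    (cons atom program))

  ;; (kat-curry 1 '(+)) => (1 +), a program that behaves like "add 1".

consing a whole state atom this way is what starts to look like a closure or an object.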
Entry: i hate it when this happens
Date: Fri Aug 24 13:31:09 CEST 2007

i have something in my head about the relationship between compositional stack languages, monads, virtual machines for the implementation of functional languages, the lambda calculus and combinators.. but i can't quite express it due to lack of literacy.. argh.. once again i've heard the bell ring but can't tell where the clapper hangs.. so:

1. compositional language -> put partial evaluation and meta programming in a simple framework, independent of the set!
2. elaborate on the set's substructure.

-- so, 1. gives a framework on how to build a compiler. but without stacks, composition isn't really useful. so the stacks are needed as a tool to create general functions that can be applied in several concrete settings. so these functions need to somehow be independent of SOMETHING. that something is the way in which run time data is organized. need to find a better explanation..

something that hit me just now: a computation in a stack language always involves saving some state, and recombining it later.. there are 2 ways this happens:

* most functions leave the bottom of the stack intact
* 'DIP' leaves a part of the top of the stack intact

this is probably related to normal order and applicative order reduction.

-- another problem.. why is it so hard to get this formulated correctly? in my exposition about parsing words, i cannot really use "variable abc" as a good example, because it really is not compositional code; it needs to be disentangled.. the conclusion is right though: in order to disentangle this system, it is necessary to remove some reflection, to 'unroll' the dependencies. and the picture is really about dependencies. functional programming is more about getting your graph free of cycles than anything else.. maybe that's the reason for stressing the Y combinator: how to introduce cycles, but not really. there's another example in dan friedman's book, essentials of programming languages. i can't find it now, but somewhere around implementing an environment there is a need for a circular reference, and he uses a trick to not have to do this.. maybe it's about how to make things static. to keep them from moving so they can be looked at peacefully and quietly :))

-- basically:

- stack = environment (de Bruijn index)

Entry: so.. what's the most important thing now?
Date: Fri Aug 24 16:06:47 CEST 2007

a lot of ideas still need some fermenting. but there's one that's quite clear: names cannot be created dynamically, because that kills the representation as a declarative language. so i need a preprocessing step that takes out all creation of new names. this makes some things problematic. one of them is multiple exit/entry points. multiple entry points can be translated:

  : foo a b c
  : bar d e f ;

->

  : foo a b c bar ;
  : bar d e f ;

then at the point where '(label bar)' is assembled, the jump to bar can be eliminated. multiple exit points need to be translated to an else clause:

  : foo if a b c ; then d e f ;

->

  : foo if a b c else d e f then ;

so it looks like it's not just names, but also 'implicit names' or labels.

Entry: environment and stack
Date: Sat Aug 25 09:14:26 CEST 2007

let's elaborate on this a bit more. the stack can be seen as related to an environment, which is a way to implement substitution in lambda expressions. to simplify, suppose we have only unary lambdas:
  (lambda (a)
    (lambda (b)
      ((+ a) b)))

this can be rewritten using de Bruijn indices (starting from 0) as

  (lambda (lambda ((+ 1) 0)))

where the numbers refer to an index into the environment array. this gives an easy way to represent a closure as a (compiled) lambda expression plus an environment. maybe the missing ingredient in my understanding is the SK calculus?

Entry: paper again..
Date: Sat Aug 25 10:42:23 CEST 2007

in fact, i need to distinguish between syntax and semantics a bit better. a compiler works on syntax (a representation). von thun has some text about this.. again, i'm amazed by how untyped you can be in scheme! i'm just performing operations on lists, without ever having to clarify what things are.. interpretation is a consequence of what functions you apply to the symbols.. so, let's say that "working with symbols" is always untyped. symbols are a universal tool of delayed semantics. maybe that's the idea behind formal logic, right? by just specifying HOW to operate on symbols, you never need to explain what you are actually doing.

Quite an adventure, trying to provide a model for the language and compiler.

* read Flatt's paper about macros again
* logic and lambda calculus.
* monads and their relationship with compositional programs.
* a purrr module system + compositional language

http://zhurnal.net/ww/zw?StokesTheorem

Funny. I have that book on my shelf, and i tried to start reading it on thursday. I guess it has a major truth. Once the necessary structure is in place, the conclusions are often trivial. So all the effort is in the creation of structure. Sounds like programming. Try "Once things are clearly defined, the solution is at most a single line.", "Write the language, and formulate your solution in it.", "Ask the right question."

Entry: fully declarative and compositional
Date: Sat Aug 25 11:45:01 CEST 2007

declarative: all names defined in a source file are to be known before the body of the code is compiled. that way, a program is a collection of definitions.

compositional: make all branching constructs fit the compositional view by using combinators only.

both are largely independent, but should lead to a better representation. advantages:

D - side-effect free macros
  - detection of undefined words
  - possibility of modularization (later)
C - correct optimizations in the light of branching

let's learn a lesson from the past.. i can't afford to break it again. the changes that need to be made can be made without changing the semantics so much that a radical rewrite of forth code is necessary. all constructs used at this moment need to be preserved. is there an incremental path? the following syntactic transformations are necessary:

1. constant -> macro
2. variable -> macro
3. word definition -> macro
4. split a file into macro + code

Entry: monads in Joy
Date: Sat Aug 25 13:56:47 CEST 2007

http://permalink.gmane.org/gmane.comp.lang.concatenative/1506
http://citeseer.ist.psu.edu/wadler92essence.html

so that's what i've got to do today. after reading manfred's comments, i think i need to read more of his work before i attempt to re--invent his ideas. the paper by wadler gives some relation between monads and cps. it might contain what i need to explain the relation between monads and stacks, probably reaching the conclusion that stacks are monads. let's see if i can learn something from this. for each monad, provide bind and unit. one complication is that functions in cat return a stack. let's see if that makes things worse.
  unit: x -- M x
  bind: M fn -- N

bind extracts values from the monad, applies fn to each of them, and constructs a new monad from the output. it's easier to use 'join', since 'map' is so trivial. wait, is this really the case? map is

  (a -> b) -> (M a -> M b)

from http://en.wikipedia.org/wiki/Monads_in_functional_programming

  (map f) m ≡ m >>= (\x -> return (f x))
  join m    ≡ m >>= (\x -> x)
  m >>= f   ≡ join ((map f) m)

this is a little different than what i've been talking about before.. maybe it's best i try to formulate this in scheme first. See brood/mtest.ss

-- the misconceptions:

  M a -> (a -> M b) -> M b

does not mean the monads are different! it merely means: unpack, process, repack. so what is 'map'? map really is map! see the next entry for more interesting stuff about monads in scheme..

Entry: monads in scheme
Date: Sat Aug 25 16:18:59 CEST 2007

  ;; Monads in scheme.
  (module mtest mzscheme

    ;; Monads are characterized by
    ;; - a type constructor M
    ;; - unit :: a -> M a
    ;; - bind :: M a -> (a -> M b) -> M b

    ;; In words: something that creates the type (ad-hoc in scheme),
    ;; something that puts a value into a monad (unit) and something
    ;; that takes values out of a monad, applies them to generate
    ;; several instances of the monad, and combines them into one.

    ;; Let's create some monads in scheme, using ad-hoc typing:
    ;; representation is not abstract, and there is no type check.
    ;; Start with the list monad.

    (define (unit-list a) (list a))
    (define (bind-list Ma a->Mb)
      (apply append (map a->Mb Ma)))

    ;; Using monads, functions need to be put into monadic form. Simply
    ;; wrapping them with 'unit' is usually enough.
    ;; (bind-list '(1 2 3) (lambda (x) (unit-list (+ x 1))))

    ;; So what is 'map' for the list monad? Haha! It's map!
    (define (map-list a->b)
      (lambda (l) (map a->b l)))
    )

so now introduce polymorphism. instead of storing stuff in a hash, it's easier to just store a pointer to the monad structure in the record for a certain monad, i.e. use single dispatch OO. so i've got a polymorphic bind, and a fairly decent interface that abstracts away the polymorphism, so 'unit' and 'bind' for each monad can operate on the representation only.

  (define-monad Mlist
    (lambda (a) (list a))
    (lambda (Ma a->Mb) (apply append (map a->Mb Ma))))

so.. what can i do with this? maybe best to try to translate some examples from wadler's paper into this mechanism, or to define a 'do' macro. the Haskell code for the list monad

  do {x <- [1..n]; return (2*x)}

is a bit too mysterious.. let's try something simpler: the maybe monad. wait, all my functions are unary.. damn. how to take multiple values into a monad? can't really do that.. will need explicit currying. this uses letM* from http://okmij.org/ftp/Scheme/monad-in-Scheme.html

  (define-macro letM
    (lambda (binding expr)
      (apply (lambda (name-val)
               (apply (lambda (name initializer)
                        `(>>= ,initializer (lambda (,name) ,expr)))
                      name-val))
             binding)))

so i transform this to my code.. try this:

  a = do x <- [3..4]
         [1..2]
         return (x, 42)

  a = [3..4] >>= (\x -> [1..2] >>= (\y -> return (x, 42)))

now

  (define-syntax letM*
    (syntax-rules ()
      ((_ () expr) expr)
      ((_ ((n Mv) bindings ...) expr)
       (bind Mv (lambda (n) (letM* (bindings ...) expr))))))

leads to this:

  (letM* ((a (Mlist '(1 2 3)))
          (b (Mlist '(10 20 30))))
    (unit Mlist (+ a b)))

  #(struct:monad-instance
    #(struct:monad Mlist # # #)
    (11 21 31 12 22 32 13 23 33))

wicked.
the macro expansion gives

  (bind (Mlist '(1 2 3))
        (lambda (a)
          (bind (Mlist '(10 20 30))
                (lambda (b)
                  (unit Mlist (+ a b))))))

let's see if the type of 'return' can be inferred in a structure like this. no. the return type of the entire expression is determined by the return in the letM* block. this type is arbitrary and only determined by the context of the expression, to which we have no access in scheme. one possibility to fake this is using a dynamic variable.

so, i guess it makes more sense to switch to map and join as basic operations? no. i ran into a problem with double wrapping of structures that requires the 'join' operation to be aware of the wrapping. so i'm going to revert the changes.

next exercise: the state monad. i never really understood this. a state monad contains a function that will return a value and a new state.

  -- "return" produces the given value without changing the state.
  return x = \s -> (x, s)

  -- "bind" modifies transformer m so that it applies f to its result.
  m >>= f = \r -> let (x, s) = m r in (f x) s

EDIT: see monad.ss

Entry: kat monads
Date: Sat Aug 25 21:35:12 CEST 2007

the problem seems to be that 'return' and 'bind' need to be formulated in a way that properly deals with the stack. somehow it seems to get in the way. let's take a new look at it, modeling things on 'map'.

  fmap   s.a->s.b  Ma -- Mb
  join   MMa -- Ma
  return a -- Ma
  bind   s.a->s.Mb Ma -- Mb    (bind = fmap join)

the thing which bothers me is 'map'. something is smelly about map in joy, because of the stack "doing nothing". it's strange that 'for-each' feels really natural, because it has threaded state. but map somehow feels wrong..

Entry: for-each is left fold
Date: Sat Aug 25 21:45:30 CEST 2007

for-each is foldl, which is sort of 'universal iteration'.

  '() '(1 2 3) (swons) for-each  ==  '(3 2 1)

foldr is more like 'universal recursion', and i don't have a direct analog in kat. maybe i should create one like this:

  '() '(1 2 3) (cons) foldr  ==  '(1 2 3)

Entry: state monad
Date: Sun Aug 26 13:08:41 CEST 2007

a state monad is a nice example of a computation. nothing 'happens' as long as the monad is not executed explicitly by applying the value to some initial state. i think this is a nice starting point to formalize what i'm doing, since it's about the same principle: build a composition that represents the compilation, and execute it on an initial state. so, really, monads are a way to formulate any computation as a function composition. doesn't that sound familiar? the things to find out are:

* how my very specialized way of state passing fits in the general monad picture.
* why does 'map' feel so strange in Joy/KAT ?
* what is a continuation in KAT?

the last one i can answer, i think. it's a function that takes a stack, and represents the rest of the computation. so the continuation of 'b' in [abcd] is just [cd]. i've added call/cc to base.ss

let's re-read von thun's comments

Entry: closures & stacks
Date: Mon Aug 27 14:49:10 CEST 2007

something to think about: a compositional language can have first class functions without having first class closures, and without this leading to any kind of inconsistency. like 'downward closures only'. this brings me back to linear vs. non--linear. a key observation is that linear data structures are allowed to refer to non--linear ones, as long as the non--linear collector can traverse the linear data tree (an acyclic graph in the case we work with reference counts as an optimization). but non--linear structures are NOT allowed to refer to linear structures
(because otherwise they would not be able to be managed by the linear collector). this makes the non--linear collector trivially pre--emptable by linear programs. PROVE THIS!

Entry: linear memory management
Date: Mon Aug 27 17:57:00 CEST 2007

something to think about is how to embed a linear language in scheme as a model. as long as its primitives never CONS, this should work. i'm trying to formulate a machine that can express the memory management part of a linear language, if it is given a set of primitive functions. see linear.ss

this is an attempt to make poke.ss work, but from a higher level of abstraction. something i did wrong on the first attempt was to change the tree structure WHILE still using the old addressing mode. permutation of register contents needs reference addressing, so my macros are wrong. this means i need a different representation of REFERENCES. let's say a reference is:

- a pointer to a cons cell
- #t for CAR and #f for CDR

funny. i'm running into a problem numbering binary trees. the most visually pleasing numbering is breadth first:

  1
  2 3
  4 5 6 7
  8 9 10 11 12 13 14 15

this corresponds to the binary encoding 1abcd... where a is the first choice, b the second, etc... the one i chose intuitively was 1...dcba, which is not so handy, but is more efficient to implement when the labeling doesn't really matter that much.

ok, i got it working. i have a tree permutation 'engine' which is accessed by numerical node addresses. now what does this buy me? a simple way to talk about embedding linear trees. in practice, some of the nodes are constant, and are better put in registers.

Entry: binary trees
Date: Mon Aug 27 21:51:58 CEST 2007

still not 100% correct.. i'm losing nodes. ok. i'm making a mess of it, but i think i can conclude the following:

1. it is possible to use a tree as the data universe
2. normal forth operations can be written as binary and ternary permutations on a tree
3. such a tree is conveniently addressed numerically

what i'm about to do is:

- create an embedding of normal forth operations in a single tree, by:
  * fixing the positions of the stacks
  * associating each operation to a permutation
- find a way to efficiently generate code for these operations, with the possibility of mapping some fixed nodes to registers.

AHA! one pitfall i knew about, and i ran right into it. there's one operation which is not allowed: if R points to a cons cell, it is not allowed to swap the contents of R with the CAR or CDR of that cell, because this creates a circular link, effectively losing the cell. more generally, it is not allowed to exchange R1 and R2 if they are in the same subtree. baker's machine contains no operations that can lead to such permutations: it only talks about exchanging the contents of registers with cons cells. this is different.

i'm trying to write the permutation for '>r', written as (D . R). the following sequence of permutations is legal:

  ((d . D) . R) -> ((d . R) . D) -> (D . (d . R))

which is (5 3) followed by (2 3). can this be written as a single cycle (2 3 5)? one would say yes.. so i guess i had a bug? since it created a circular ref in my previous implementation. now i can get (2 3 5) to work, but (5 3 2) doesn't! i think i don't understand something essential here.. this is getting interesting!

i think i see the problem now. one is that my permutations are inverted, and two is that (2 3 5) is not legal, but (5 3 2) is. how to distinguish legal from illegal permutations?
and the inverse of (5 3 2) is not (2 3 5) but (2 3 7). it looks like this encoding of the nodes is not very useful for tree permutations.

Entry: legal permutations
Date: Mon Aug 27 23:39:23 CEST 2007

it looks like a more interesting approach is to start with operations that are legal and invertible, and find their closure. the difference with baker's machine is that i'm trying to use only one root.

hmm.. there has to be a way to see if a permutation is legal.. why is (2 3 5) not legal? because 5 gets the value of 2, which points to 5. so a condition is that a register x cannot receive the contents of a register y if x is in a subtree of y. in (5 3 2) no such assignment happens:

- 5 is not a subtree of 3
- 3 is not a subtree of 2
- 2 is not a subtree of 5

'subtree of' can be computed by comparing box addresses:

  [1]
  [2|3]
  [4|5] [6|7]

  [1]
  [10|11]
  [100|101] [110|111]

a is a subtree of b if b matches the head of a. this way, no circular refs can be introduced. instead of thinking about cons cells, think of binary trees. it indeed does not make sense to swap nodes if one node is a subtree of another node.

what about enumerating all legal binary permutations on an infinite binary tree?

  ()  identity
  (2 3)
  (2 6) (2 7) (3 4) (3 5) (4 6) (4 7) (5 6) (5 7)
  (2 12) (2 13) (2 14) (2 15) (3 8) (3 9) (3 10) (3 11)
  (4 12) (4 13) (4 14) (4 15) (5 12) (5 13) (5 14) (5 15)
  (6 8) (6 9) (6 10) (6 11) ...

back from tree rotations, which are not general enough... in binary:

  ()
  (10 11)
  (10 110,111) (11 100,101) (100,101 110,111)

back to numbers, per level (bits):

  1: /
  2: (2 3)
  3: (2 6,7) (3 4,5) (4 5,6,7) (5 6,7) (6 7)
  4: (2 12,13,14,15) (3 8,9,10,11) (5 8,9,12,13,14,15) ...

it's quite hard to specify without exclusion statements.. but i guess i got what i was looking for: limited to only binary permutations, the legal ones are easy to characterize.

what about using multiple coordinates, and then embedding them in a numeration? it is always possible to encode an n-tuple of natural numbers as a single one by interleaving the bits. a legal binary permutation of node A and node B (A < B) can be written as the tuple (A - 2, s, d) where s denotes the same level trees and d the depth from it. this is really clumsy and doesn't work..

it looks like what i am looking for is a primitive dup and drop. the reality is, these are not primitive!
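before moving on, the 'subtree of' test from above in scheme (a helper of my own, using the 1abc... numeric addresses):

  ;; node a lies in the subtree rooted at b iff the bits of b are a
  ;; prefix of the bits of a: strip bits off a until it reaches b's level.
  (define (subtree-of? a b)
    (cond ((< a b) #f)               ;; a sits above b's level
          ((= a b) #t)
          (else (subtree-of? (quotient a 2) b))))

  ;; (subtree-of? 5 2) => #t : exactly why (2 3 5) is illegal, since 5
  ;; would receive the contents of its own ancestor 2.
  ;; (subtree-of? 5 3) => #f   (subtree-of? 11 5) => #t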
Entry: tree rotations
Date: Tue Aug 28 00:03:47 CEST 2007

can i work with just tree rotations? yes. moving an element from one stack to another is a tree rotation. the essence of a tree rotation is:

- reversal of P -> Q to Q -> P
- movement of one of Q's subtrees to P

so a rotation is parameterized by 2 adjacent nodes P -> Q, and the left... wait! it's not a rotation, since the subtree that moves is the one in between. it is a rotation if the stacks are encoded as

  ((D . d0) . (r0 . R))

then a rotation is simply

  (D . (d0 . (r0 . R)))

trees which represent associative operations have a value which is invariant under tree rotations. is this helpful at all, or am i moving away from my point? with 2 stacks, a data stack and a free stack, motion can be implemented by rotations. this is not general enough.. i have no need to preserve ordering.

Entry: different primitives
Date: Tue Aug 28 00:55:17 CEST 2007

so with a 2-stack system (D . F), with D rooted at 2 and F at 3, the primitives are:

  D  = 2
  D+ = 5
  D0 = 4
  D1 = 10
  F  = 3

the free list needs to be flattened. this can be done when reserving a new cell or when dropping a data structure. the latter is probably best, since it is

* more predictable: deleting a large structure takes time
* all references to externally collected objects can be removed

so, i do have a need for rotation! if the CAR of the free list is not NULL, rotate the free list, then DROP the newly exposed top and DROP the part we rotated to the stack.

  : >free (D+ F) (D F) ;
  : swap  (D0 D1) ;
  : free> (D F) (D+ F) ;   \ [a.k.a. nil / save]
  : drop  null? if >free ; then rotate drop drop ;

like baker remarks, a lot of operations can be coded so they avoid copying of lists. i have a lot of this code in PF already.. the moral of the story is:

* this linear stuff is quite nice to build a language on top of, but you need a decent layer below it to create a proper set of optimized primitives to make it work efficiently.
* using a single tree works just fine, but is probably not necessary if the basic structure (like where the D, R and F stacks are) doesn't change.
* only use binary permutations of disjunct trees. disjunct trees are easier to spot for binary permutations.
* numbering trees in 1abc... fashion works well, and is easy for drawing diagrams.
* drop needs to deconstruct its argument.

the hash consing thing in this paper i don't get: http://home.pipeline.com/~hbaker1/LinearLisp.html

but but... about ternary permutations: they are easier to understand, because the rotation i'd like to perform has to be factored in a non-intuitive way.. maybe it's just the rotation operation that's difficult to express that way? instead of focusing on the movement of the data stack's first CONS cell, it's easier to focus on the movement of the cell we want to get rid of. so in the picture painted above, the operation 'rotate' is actually 'uncons' and would be (9 5) (4 5). i think that settles most of the questions. the rest is fairly straightforward to fill in.

Entry: next
Date: Tue Aug 28 14:23:06 CEST 2007

after this small detour about trees, NEXT on the list:

* clean up syntax preprocessing & purely functional macros
* investigate HOF syntax for Purrr18
* determine if Purrr is a valid project, or if it's best to aim for Poke.

Entry: ANS Forth - poke - PF
Date: Tue Aug 28 14:26:07 CEST 2007

the last question is quite an important one.. if i'm planning to write a language for education, do i really want ANS Forth? the only reason would be to have something 'standard', but for what reason? better documentation? i never used ANS Forth, and the more i get into this language simplicity thing, the more i start to dislike it. i think i have all the elements for a decent linear VM ala PF. it should fit on a pic18 and.. a cleaner language is easier to teach. moreover, a poke language can be made safe. is it worth it to stop somewhere in the middle and use a slightly more optimal language, instead of one based on CONS cells? this is not something to decide in an instant, but i think life is already complicated enough without filling it with problems created by weirdness in ANS that i don't use.. Forth is dead. long live KAT & PURRR :)

Entry: Haskell
Date: Wed Aug 29 13:15:22 CEST 2007

just watched Simon Peyton-Jones' OSCON 2007 tutorial, which clarified a lot of things. he talked mostly about type constructors, type classes, and the IO monad.

* IO a is world -> (world, a)
* a type class is implemented as a record of functions that 'travels independently' from values, i.e. dispatch based on return type.
* type constructors are also used for destructuring. this generalizes the 'list' constructor, and tuples (which are not constructors i think..)
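the first bullet in scheme terms, as a toy (my own model of the idea, nothing more):

  ;; IO a = world -> (world, a): an action maps a world token to an
  ;; updated world plus a value; bind sequences two actions.
  (define (io-return x)
    (lambda (world) (cons world x)))
  (define (io-bind m f)
    (lambda (world)
      (let ((w+x (m world)))
        ((f (cdr w+x)) (car w+x)))))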
Entry: hash consing
Date: Tue Aug 28 20:23:22 CEST 2007

so what's that all about? see http://home.pipeline.com/~hbaker1/LinearLisp.html "Reconstituting Trees from Fresh Frozen Concentrate". first, that section is not about hash consing, but about something different: "our machine will be as fast as a machine based on hash consing". i don't get it..

Entry: compositional and?
Date: Wed Aug 29 14:30:25 CEST 2007

i was wondering what the deal is with the compositional view. it allows a simple framework for metaprogramming, but that's all.. i made this a bit more clear in the paper.

Entry: curry-howard
Date: Wed Aug 29 16:42:46 CEST 2007

quite remarkable. i'm running into cases where operations in the code that i thought were merely a hack, like the 'snarf' operation, turn out to be quite important for a monadic formulation of a stack language. in other words: i'm extracting some mathematical structure by naming the types of all the transformations that are present in the code. i think i'm just going to do this exhaustively.. in other words, by hacking around semi-blindly, following just an ideal of 'elegance', i end up with a nice description of what i'm doing in a categorical sense.

Entry: arrows
Date: Fri Aug 31 00:29:36 CEST 2007

reading 'programming with arrows' by hughes. this 'dip' business is really arrows.. just rewrote brood.tex to give a categorical relationship between a TUPLE language and a STACK language. what remains is to explain their difference... it's been quite a day.. what did i learn really? given a tuple language, mapping it to a stack language makes explicit the need for run time 'cons' if the tuple language can create closures.

ok, i need to go over this again since i lost direction a bit.. the CTL -> CSL bit is good though, since it reflects a 'real' part of brood, namely the relationship between scheme and kat. I'm still not really satisfied with the explanation. I probably need some more time thinking about closures and dynamic memory:

- how to combine a low level language with just stacks and function compositions, both implemented as vectors, with a linear memory model that supports closures.
- how to add 'constant trees' to a linear memory tree.
- what about trees and reference counts.

Also, i need to read Hughes' paper about arrows. what about this vague rambling:

- data stack = future data
- return stack = future code

Entry: stacks and continuations
Date: Fri Aug 31 18:19:05 CEST 2007

from wikipedia http://en.wikipedia.org/wiki/Continuation

  Christopher Strachey, Christopher F. Wadsworth and John C. Reynolds brought
  the term continuation into prominence in their work in the field of
  denotational semantics that makes extensive use of continuations to allow
  sequential programs to be analysed in terms of functional programming
  semantics.

for the linear memory case, i need to implement:

- closures (== cons)
- continuations (== a stack copy)

to do this efficiently, i need baker's approach to linear data structures, which can be implemented using reference counts because they cannot be circular. something tells me i'm chasing something really obvious.. i guess the next thing to tackle is to describe the linear language, and write a C model for it, i.e. to implement POKE.

Entry: CSL vs CTL
Date: Fri Aug 31 22:03:26 CEST 2007

i talked myself into a pit.. what about "1 2 3 +"? how can this be seen as a CTL? only by making + operate on more than 2--tuples.
this means all arrows T_i -> T_j are also in T_{i+n} -> T_{j+n}

Entry: linear
Date: Fri Aug 31 22:29:04 CEST 2007

the next thing to do is to create closures without garbage collection. this would make PF interesting. so the deal is: tree structured data allows for 1--ref structures which can be optimized using reference counts. i guess this is the hash consing business.

hash consing =
- a table of CONS cells
- on (cons a b) -> check if the cell is in the hash: if so, increment the refcount, else make a new one

so that should be able to speed it up.. it's a bit smelly though.

Entry: poke
Date: Sat Sep 1 12:28:16 CEST 2007

yep.. time to get practical. this linear thingy is the most problematic one.. i guess the things i need to investigate are:

- write a linear memory manager in terms of a low-level set of operations (forth machine)
- write the linear machine's interpreter in itself.

i'd like to take a different approach with this: first write it in a testable highlevel setting, then just map it to lowlevel code. remarks:

* by making the code storage nonlinear, a large problem is already solved: the return stack does not need to copy continuations. the return stack is a program == a primitive program | list of programs.
* CDR coding. all code in flash consists of CDR-linked lists, but encoded such that they can be represented as vectors. this works very well with the remark above. it looks like this solves my earlier problem of vectors vs lists.
* no branches. only combinators.
* types:
  - primitive
  - integer
  - cdr-coded nonlinear cell
  - ram cell
* type encoding: since there are only 4 types, 3 of which are memory addresses, it can be solved with a memory map, and N-2 bit integers.

there's one important part i forgot: VARIABLES. those don't really fit in the picture..

Entry: partial application vs. curry
Date: Sat Sep 1 23:35:14 CEST 2007

  curry: ((a,b)->c) -> (a->(b->c))

then partial application is e.g.

  curry (+) 123

so maybe i should follow christopher in http://lambda-the-ultimate.org/node/2266 and call what i'm calling curry 'papply'. and apparently, partial evaluation != partial application. so how do they differ?

Entry: XY and stack/queue
Date: Sat Sep 1 23:53:44 CEST 2007

the [d r] thing i described about continuations yesterday is made explicit here: http://www.nsl.com/k/xy/xy.htm XY by Stevan Apter

Entry: goals
Date: Sun Sep 2 12:44:01 CEST 2007

the reason brood.tex doesn't work well is that i'm not setting out the goals of the project. i started wandering when talking about categories... so the goals are:

- create a language based on the ideas behind Forth, which
  * is easily mapped to a target (i.e. has very lowlevel elements)
  * is less resistant to static analysis than Forth.
  * requires small resources in base form (i.e. just some stacks)
  * contains some highlevel constructs that can be easily optimized, i.e. quoted programs ala Joy.
  * serves as an implementation language for a CONS based language, either a linear or a nonlinear one.

Entry: references
Date: Sun Sep 2 13:15:59 CEST 2007

time to collect some references.

Entry: language levels
Date: Sun Sep 2 13:46:47 CEST 2007

- macro assembler / virtual forth machine: purely static. macros do not rely on any run time kernel support.
- macros with run--time support: some constructs that cannot be translated to straight assembler require run time support code. for example indirect memory access using '@' and '!'
- dynamic memory: cons

what i'm guessing is that i need to get my dependencies straight.
Entry: poke
Date: Sat Sep 1 12:28:16 CEST 2007

yep.. time to get practical. this linear thingy is the most
problematic one.. i guess the thing i need to investigate is:

- write a linear memory manager in terms of a low-level set of
  operations (forth machine)
- write the linear machine's interpreter in itself.

i'd like to take a different approach with this: first write it in a
testable highlevel setting, then just map it to lowlevel code.

remarks:

* by making the code storage nonlinear, a large problem is already
  solved: the return stack does not need to copy continuations. the
  return stack is a program == a primitive program | list of
  programs.

* CDR coding. all code in flash is stored as CDR-linked lists, but
  encoded such that they can be represented as vectors. this works
  very well with the remark above. it looks like this solves my
  earlier problem of vectors vs lists.

* no branches. only combinators.

* types:
  - primitive
  - integer
  - cdr-coded nonlinear cell
  - ram cell

* type encoding: since there are only 4 types, 3 of which are memory
  addresses, it can be solved with a memory map, and N-2 bit
  integers.

there's one important part i forgot: VARIABLES

those don't really fit in the picture..

Entry: partial application vs. curry
Date: Sat Sep 1 23:35:14 CEST 2007

curry: ((a,b)->c) -> (a->(b->c))

partial application is then e.g. (curry (+)) 123: fixing the first
argument gives a function that adds 123. so maybe i should follow
christopher in:

http://lambda-the-ultimate.org/node/2266

and call what i'm calling curry 'papply'. and apparently, partial
evaluation != partial application. so how do they differ?

Entry: XY and stack/queue
Date: Sat Sep 1 23:53:44 CEST 2007

the [d r] thing i described about continuations yesterday is made
explicit here:

http://www.nsl.com/k/xy/xy.htm
XY by Stevan Apter

Entry: goals
Date: Sun Sep 2 12:44:01 CEST 2007

the reason brood.tex doesn't work well is that i'm not setting goals
for the project. i started wandering when talking about
categories... so the goals are:

- create a language based on the ideas behind Forth, which:
  * is easily mapped to a target (i.e. has very lowlevel elements)
  * is less resistant to static analysis than Forth.
  * requires small resources in base form (i.e. just some stacks)
  * contains some highlevel constructs that can be easily optimized,
    i.e. quoted programs a la Joy.
  * serves as an implementation language for a CONS based language,
    either a linear or nonlinear one.

Entry: references
Date: Sun Sep 2 13:15:59 CEST 2007

time to collect some references.

Entry: language levels
Date: Sun Sep 2 13:46:47 CEST 2007

- macro assembler / virtual forth machine: purely static. macros do
  not rely on any run time kernel support.
- macros with run--time support: some constructs that cannot be
  translated to straight assembler require run time support code. for
  example indirect memory access using '@' and '!'
- dynamic memory: cons

what i'm guessing is that i need to get my dependencies straight.
this means:

- get rid of side--effects in macros (all names are identified in the
  first pass)
- create a purely compositional base language with 'required
  optimization'

so where to start? it's a big job, but it really needs to be done
before i start implementing linear CONS. it looks like the end result
here is going to be quite different from what i have now. i'm
basically moving from a linear to a block structured language.

Entry: block structure
Date: Sun Sep 2 13:54:45 CEST 2007

the real question is: should i implement the block structured
language on top of the linear one, or provide a set of macros to
translate forth into a block structured language, which is then
transformed back into a linear one? it seems reasonable to keep the
forth layer as the lowest one, and translate into it. so basically i
need a lexer with list support. time to factor out the basic
problems:

* stream.ss
* stream-match.ss

Entry: lazy lists
Date: Sun Sep 2 16:35:20 CEST 2007

added stream.ss and a corresponding matcher. funny how reverse
accumulation is no longer needed when you use lazy lists! maybe i
should propagate this to the asm buffer? there is one problem with
the asm buffer though: it is used as a stack. anyways.. i can make
the lexer lazy. DONE. it's simpler now.

Entry: on lazy lists
Date: Wed Sep 5 17:29:30 CEST 2007

let's see if i can say something intelligent about this.. what i
notice is that streams make you avoid the following pattern:

* read list, process, accumulate as push.
* reverse the list

lexing/parsing fits this shoe nicely. so.. are streams processes?
instead of using '@cons', one could just as well write:

- read
- process
- write

so what is the difference? it looks like the lazy list approach is
less general, since it has only one output? multiple outputs need to
be handled using multiple lists, while the process view uses one
process and multiple streams. and yes, these are processes, since the
non-evaluated tails act as continuations. every '@cons' should be
read as write+block.

so, what about the asm? it still needs to be used as a stack,
however, multiple passes can now be done lazily.

Entry: onward
Date: Wed Sep 5 22:27:58 CEST 2007

i keep getting distracted.. i got some work to do! the first one is
elimination of side effects in macros: all side effects in the brood
application are to be cache only. this is an important part that will
open the road for more interesting changes, hopefully leading to a
fully compositional lowlevel language with a module system.

Entry: monads and map
Date: Wed Sep 5 22:40:42 CEST 2007

so.. what about writing a macro for this 'generalized map - not quite
a real monad - collect results in a list' pattern? i guess this is
just unfold.. no it's not.. got this macro + usage:

  (define-syntax for-collect
    (syntax-rules ()
      ((_ state-bindings terminate-expr result-expr state-update-exprs)
       ;; loop over the state bindings, consing result-expr onto an
       ;; accumulator until terminate-expr is true.
       (let next ((l '()) . state-bindings)
         (if terminate-expr
             (reverse! l)
             (next (cons result-expr l) . state-update-exprs))))))

  (define (@unfold-iterative stream)
    (for-collect ((s stream))  ; state: the remaining stream
      (@null? s)               ; until it's empty
      (@car s)                 ; collect the head
      ((@cdr s))))             ; and advance

but it looks just ugly, so i'm going to forget about it.. i guess, if
this pattern shows up in code, it means i'm not using a proper
hof. what about writing it as a hof instead of a macro? i think i'm
getting a bit tired.. just reinvented unfold.. no, it's unfold*

Entry: linear parser
Date: Thu Sep 6 00:25:26 CEST 2007

the parser can definitely be moved to streams. the fact that it
contains syntax streams is not really relevant to the structure of
the algorithms..
for example: i'm using 'match' in forth.ss. it changes a lot: the
prototype of the parsers now is @stx -> @stx, but the code should be
a lot easier. due to the linearity of forth / compositional code,
writing a macro transformer as a stream processor instead of a tree
rewriter makes a lot of sense actually.. the preprocessor will
translate a token stream -> s-expressions. occurrences of syntax-case
can be replaced by @match. which is exactly what i avoided in a
previous attempt.. maybe i should just create a @syntax-case macro
that's similar to the @match macro, taking partially unrolled syntax
streams. hmm.. pure syntax-case is a bit clumsy.. but the 'no rest'
parser macro i'm using does fit pretty well.

something i've been talking about before:

  syntax-case: matcher for compilation: merges 2 namespaces
               (pattern var + template)
  match:       matcher for execution: only a single lexical namespace

i don't know how to make the pattern more explicit, but it boils down
to something like this: if you're using match together with
quasiquote, you're actually COMPILING something, not computing
something. in that case, pattern matching using syntax-case might be
more appropriate, even if you're not using scheme macros, because of
the merging of template and pattern namespaces (which have to be
mixed explicitly when using quasiquoting). actually: syntax-case
matches 3 namespaces:

- pattern
- template
- transformer namespace

Entry: SRFI-40
Date: Thu Sep 6 10:16:17 CEST 2007

it's been fun, but time to move to a standard implementation:

http://srfi.schemers.org/srfi-40/srfi-40.html

it would indeed be strange if this were not somehow standardized..

  (require (lib "40.ss" "srfi"))

but, 40 has problems:

http://groups.google.com/group/plt-scheme/browse_thread/thread/637cc74047a7ada9

anyway, thing to remember: streams can be ODD or EVEN

http://citeseer.ist.psu.edu/102172.html

i'm using EVEN style: (delay (cons a b)) instead of (cons a (delay b))

so what exactly is the problem with
http://srfi.schemers.org/srfi-45/srfi-45.html ? it can be seen in
@filter, as explained in the srfi-45 document: a sequence of

  (delay (force (delay (force ...))))

is not tail recursive. this is because 'force' cannot be tail
recursive: it needs to evaluate, and cache the value before
returning. srfi-45 solves this by introducing 'lazy'. easy to see in:

  (define (loop) (delay (force (loop))))

ok. so i'm sticking with my own lazy stream implementation. most of
it should be fairly easy to replace with some decent standard library
later. i don't think i'm doing anything special..
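for reference, the EVEN style in concrete scheme -- roughly the shape
of the stream primitives used above, though the definitions here are
a sketch, not the actual stream.ss code:

  ;; EVEN style: the whole pair is suspended, so nothing is
  ;; computed until the stream is forced.
  (define-syntax @cons
    (syntax-rules ()
      ((_ a b) (delay (cons a b)))))

  (define @nil (delay '()))
  (define (@null? s) (null? (force s)))
  (define (@car s)   (car (force s)))
  (define (@cdr s)   (cdr (force s)))

  ;; ODD style would be (cons a (delay b)): the head is computed
  ;; eagerly -- one element too soon for something like a lexer.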
Entry: linear parser begin
Date: Thu Sep 6 15:36:48 CEST 2007

- all parsers are @stx -> @stx
- parser-rules: easily adapted (used by predicates->parsers)
- named-parsers

i'm forgetting something.. a parser needs to distinguish between
'done' and 'todo': the driver will stitch the stream back
together. otherwise each parser needs to explicitly invoke the driver
routine as the second argument to '@append'. the reason we use a
driver is to make each individual parser agnostic of its
environment..

concretely: the current implementation can be largely reused, but
list tails need to be replaced by streams. then the remaining
question is: does a primitive parser return 2 streams, or a list and
a stream? again:

- if a parser does 1 expansion, it needs to return 2 streams.
- if it does multiple, it suffices to return only one.

it's best to let the driver decide, so the first one is more
general. making both streams makes the interface simpler.

looks like the only thing this needs is a proper syntax-case style
syntax stream matcher so i'm not juggling too many syntax<->atom
conversions. need to think about that a bit better, to see what the
prototype needs to be.

Entry: parser rewrite
Date: Thu Sep 6 22:46:20 CEST 2007

the end is near.. the code seems to simplify a lot. need to write 2
more generic parsers:

- delimited
- nested

interesting.. this stream business is deeper than i thought. i do run
into a problem though: (values processed rest). what if rest is only
determined when processed is completely evaluated? by moving the
'append' to somewhere else, the forcing order can no longer be
trusted. does this really matter?? i need a break.

ok.. i got it worked out as '@split', which returns 2 values: the
first is a stream before a delimiting value, and the second is the
stream after. the code i have now needs a certain evaluation order. i
can make it independent of that by forcing until the rest-stream
becomes true. that works. also got @chunk-prefixed working: it
separates a prefixed stream into a stream of prefixed streams.

Entry: macro mode
Date: Fri Sep 7 13:24:38 CEST 2007

i found out that ';' can just as well be used in macro mode for 'jump
past end', if macro mode can only contain prefixed definitions. this
will bring multiple exit points to macros. can change this later.

anyways.. all parsers are now token (syntax) stream processors. it
should be really straightforward from here to:

- separate macro and code definitions
- perform separate compilation for forth files (macro definitions)

about the use of ';' in macros: this probably needs some dynamic
variable because of context: a macro representing a forth file != a
normal macro. in a forth file ';' means return to sender, in a macro
it means jump past end.. maybe i should avoid this?

Entry: bored
Date: Wed Sep 12 21:28:51 CEST 2007

i had some days off writing an article for folly, and my mind is
wandering away from the lowlevel forth stuff.. talking to a friend
yesterday i realized i need something different. i'm getting
stuck. let's rehash the problems i'm facing right now:

- i need pure functional macros: no side effects except hidden in
  cache / memoization. this requires a true code dependency
  system. doing this half-assed makes no sense, so i should at least
  have something like mzscheme, possibly piggybacked on top of
  it. that however is not easy, since this will probably mess up my
  namespace stuff. so i'm a bit stuck because i can somehow foresee
  the problems that are coming after i fix up my macros.

- i want to give up on the portable ANS forth idea, and design a safe
  PF-like linear language. the stumbling block there is variables,
  since they're incompatible with the linear idea, at least when done
  using references to cells. maybe i can use some trick here? can
  variables be managed externally so they never need to be deleted?
  can they be seen as data roots like machine registers? something is
  not right in my intuition here..

EDIT: Mon Oct 8 21:06:17 CEST 2007
Pure functional macros work now, and make things a lot better, but
this linear language variable thing i'm still quite puzzled by.

Entry: sticking to forth as basis
Date: Sat Sep 15 05:07:40 CEST 2007

reading http://lambda-the-ultimate.org/node/2452 forth in the news

i'm more and more convinced that forth should be the lowest level,
not some block structured higher level construct, which would require
more elaborate optimizations. it's best to have the pure control
structs (i.e.
for next) as direct macros, and implement the higher code block
quoting constructs in terms of them. forth has this way with return
stack juggling that's very powerful for making new control
structures. this is hard to do efficiently when you tuck it all away
in combinators..

Entry: brood paper
Date: Sun Sep 16 14:47:05 CEST 2007

actually.. it would be interesting to go over my ramblings and make a
list of things i got really wrong, or saw too simplistically. then
see what solution i got or how i came to understand the issues.

- monads are not just hidden top of stack items
- the relationship between closures and CONS
- syntax-rules and composition
- pattern matching and algebraic types
- lazy lists vs. generators: lists remain 'connected'
- 'natural' compiler structure: scoping rules, quasiquoting and
  syntax-case (3 levels)
- more specifically: quasiquote vs syntax-case: when to use macros?
  is it code or data?
- looping and boundary conditions (i.e. image processing)
- cdr coding and lists as arrays
- importance of side-effect free 'loading' + relation to phase
  separation.

Entry: linear structures, variables and cycles
Date: Mon Sep 17 16:06:58 CEST 2007

in a linear structure (tree, or acyclic graph if hash consing is
used) cycles are not possible. so how do you represent datastructures
that have some form of self-reference? the thing we're looking for
here is something akin to the Y combinator: instead of having a
function refer to itself, a different function is used to "tie the
knot". let's start with:

http://scienceblogs.com/goodmath/2006/08/why_oh_why_y.php

i'll try to put it in my own words, see next post. the link above has
an interesting comment on self-application. also, the wikipedia page
has some interesting links:

http://en.wikipedia.org/wiki/Y_combinator

so how do you apply this trick to data structures? my guess would be
to start from data structures in the lambda calculus, and then make
things more concrete.

Entry: Y combinator
Date: Mon Sep 17 18:55:07 CEST 2007

a fixed point p of the expression F satisfies F(p) = p. the Y
combinator expresses p in terms of F as p = Y F. combining the two we
get:

  F (Y F) = (Y F)

simply expanding this gives exactly what we want:

  Y F = F (Y F) = F (F (Y F)) = F (F (F (...)))

where the dots represent an infinite sequence of self
applications. that's all folks. in order to implement useful
recursion, simply write the 'body' F, and Y will take care of the
rest.

let's make this a bit more intuitive. suppose we want to create a
function f which is defined recursively in terms of f. look at F as a
function which produces such a function f, F : x -> f. the recursion
is a consequence of the infinite chain of applications

  f = Y F = F (F (F ...)) = F f

so what are the properties of F? first it needs to map f -> f. and
second, if a finite recursion is desired, it needs to do this in a
way that creates a 'bigger' f from a 'smaller' one, eventually
starting from the 'smallest' f which does not depend on f: this leads
to a finite reduction when normal order reduction is used.

let's solve this problem in scheme, for Y F = factorial. so we know
that:

  factorial = F (F (F (...)))    or    factorial = F factorial

in words, F is a function that returns a factorial function if it is
applied to a factorial function. so the factorial function is a fixed
point of F. the Y combinator finds this fixed point as
factorial = Y F.
the rest is fairly straightforward: a nested lambda expression which
uses the provided 'factorial' function to compute one factorial
reduction step:

  F = (lambda (factorial)
        (lambda (x)
          (if (zero? x)
              1
              (* x (factorial (- x 1))))))

the thing which always tricked me is 'fixed point', because i was
thinking about iterated functions on the reals used in many iterative
numerical algorithms like the newton method. in the lambda calculus,
there are only functions and applications, so a fixed point IS the
infinite nested application, since that fixed point value doesn't
have another representation, while a fixed point of a function on the
reals is just a point in the reals.
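to actually run this in scheme, which is applicative order, the self
application has to be delayed by eta-expansion, otherwise (Y F) loops
forever before F is ever applied. the standard trick, sketched:

  ;; eta-expanded (applicative order) Y: the inner
  ;; (lambda (n) ((x x) n)) delays the self application.
  (define (Y F)
    ((lambda (x) (F (lambda (n) ((x x) n))))
     (lambda (x) (F (lambda (n) ((x x) n))))))

  (define factorial
    (Y (lambda (fac)
         (lambda (x)
           (if (zero? x) 1 (* x (fac (- x 1))))))))

  (factorial 5) ; => 120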
Entry: algebraic data types
Date: Tue Sep 18 13:44:48 CEST 2007

look no further.. plt-match.ss actually has this kind of stuff, at
least the pattern matching associated with algebraic types. and i
think it is extensible.

http://download.plt-scheme.org/doc/371/html/mzlib/mzlib-Z-H-34.html
http://en.wikipedia.org/wiki/Algebraic_data_type

  "In computer programming, an algebraic data type is a datatype each
  of whose values is data from other datatypes wrapped in one of the
  constructors of the datatype. Any wrapped data is an argument to
  the constructor. In contrast to other datatypes, the constructor is
  not executed and the only way to operate on the data is to unwrap
  the constructor using pattern matching."

Entry: pic network
Date: Tue Sep 18 20:40:10 CEST 2007

1. simple: 2 wires
2. robust: working boot loader

Entry: parser-tools lexer
Date: Thu Sep 20 19:20:11 CEST 2007

i'm replacing the lexer with the one from parser-tools. this is a lot
easier than writing your own. what a big surprise; too bad i
postponed it for so long..

Entry: message passing
Date: Thu Sep 20 21:15:05 CEST 2007

hmm.. message passing concurrency seems to be the real solution for
tying a core and metaprogrammer together. i should find out how to
formalize message passing (i.e. Peter Van Roy and Seif Haridi's book
"Concepts, Techniques, and Models of Computer Programming"
http://www.info.ucl.ac.be/~pvr/book.html)

Entry: work to do
Date: Sat Sep 22 19:42:35 CEST 2007

* documentation
* bootloader (+- DONE)
* independent of emacs?

preparing for waag & piksel, the most important problem to solve is
to make the bootloader robust. this is probably best solved as:

  serial cable plugged                -> start console
  unplugged (i.e. with jumper to gnd) -> start app (at 0x200)

all interrupt vectors moved to the 0x200 block. then this block can
be made write-protected, so there's absolutely no way to mess it up
-> can eliminate the ICD2 connector on boards.

Entry: purrr manual questions + necessary fixes
Date: Sun Sep 23 13:30:49 CEST 2007

* can i get at least a 16--bit library running without making it
  stand-alone?
* how difficult is it to unify macros and words from the user
  perspective? -> interaction always compiles a 'scrap' function.
* is it possible to write all control structures in terms of tail
  recursion?

the more philosophical ones:

* exceptions are imperative features.. is this bad? when is this bad?
  it's like using continuations, which is interesting for
  backtracking etc. i'm leaning toward pure functional programming,
  but some imperative features are really OK as long as they are
  shielded. i.e. global mutable variables are clearly not.
  (namespace: single assignment = ok + possible to hack for debug).

Entry: new bootloader fixes
Date: Mon Sep 24 12:37:41 CEST 2007

i got the monitor working, now i need to get the synth back up.

some things that need fixing on the debugging side:

* a correct jump assembler (+- DONE: throws exception)
* a correct disassembler (+- DONE: lfsr broken)
* constants in console (DONE)
* cache macro compilation
* a command to erase a block of code during upload

note about field overflows: for data values, it should be ok: it's
quite convenient to assume they are finite size. for example, banked
addressing. for code it's an error, since you don't have any control
over this while programming.

Entry: error reporting
Date: Mon Sep 24 14:15:54 CEST 2007

yes, i am at fault here. never really gave it much thought, but it's
starting to become a problem. my error reporting sucks. one of the
most dramatic problems is the loss of line numbers to relate errors
to original code. a solution for this is to use syntax objects
everywhere. second is the way errors are handled in the
assembler. currently i have some code that's a bit hard to
understand: i got used to hygienic macros, and symbol capture looks
convoluted to me. maybe i just need to rewrite that first?

hmm.. what about systematically replacing 'raise' with something more
highlevel? one of the things that is necessary is a stack
trace. there was some talk on the plt list about this recently. let's
have a look. there is (lib "trace.ss") which doesn't really do what i
need, since it's active.

what about taking this error reporting seriously, and giving it its
own module? would be good to eventually document all possible errors
etc. what about the following strategy: every dubiously reported
error will be fixed, no matter what it takes.

  >> c> ERROR: #: no clause matching 1 argument: (qw)

this is a stack underflow error. i was thinking about installing an
error translator in rep.ss, but this kills the tail
position. therefore, errors need to be translated at the top entry
point, which in this case is in prj.ss. it's really not such a simple
problem.. need to define what information i'd like to get: errors
need to be reported at 'interface' level, which is either compile/run
of files/words. compile errors are most problematic since they need
to be related to source location..
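as a note to self, the shape of the translation i have in mind at the
top entry point, where losing tail position is harmless -- just a
sketch, 'with-friendly-errors' is a made-up name:

  ;; wrap a top level console command; translate any low level
  ;; exception into a one-line report instead of a scheme backtrace.
  (define (with-friendly-errors thunk)
    (with-handlers ((exn:fail?
                     (lambda (e)
                       (printf "ERROR: ~a\n" (exn-message e)))))
      (thunk)))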
Entry: state mud
Date: Tue Sep 25 14:05:35 CEST 2007

the prj.ss file should do nothing more than fetching/storing state
and passing it to pure functions. i am a bit appalled by the way
things work in prj.ss, because this state binding tends to swallow
everything.. maybe it's not such a good idea after all? i guess it is
still a good idea, but its only function should be to manage
state. let's rehash the state stuff:

* only prj.ss contains permanent state
* I/O uses read-only dynamic scope for the read/write ports
* macros etc.. are supposed to be read-only cache
* all the rest is functional

UPDATE: Thu Sep 27 22:56:03 CEST 2007
- moved some functionality to badnop.ss
- adopted a left/right column notation for state/function

Entry: boot code and incremental upload
Date: Tue Sep 25 15:05:23 CEST 2007

the basic rule for forth is: code is incremental. if you need to
patch backward, you need to do an erase + burn cycle. how to do this
automatically? it's probably not so hard to solve by performing (CRC)
checks on memory.

Entry: core syntax
Date: Tue Sep 25 18:05:36 CEST 2007

just writing the purrr manual and i got back to this language tower
thing... i really need a core s-expression based syntax for code with
multiple entry and exit points, instead of forth.

Entry: or
Date: Tue Sep 25 19:44:24 CEST 2007

Something that's really handy in scheme is a short-circuiting
'or'. i'm in need of something like that to define interactive word
semantics: try executable words first, then try variable names, then
try constants (or later macros). In scheme this is easy because
variables can be referenced multiple times; in CAT this is awkward
due to the explicit copying/restoring of the argument stack.

Some backtracking formulation would be nice, but generic backtracking
is overkill. It also requires explicit handling of the continuation
object. Escaping continuations work fine here, and they can be stored
in a dynamic parameter, so no explicit manipulation of continuation
objects is necessary. With 'check' being a word that aborts the
current branch if the top of the stack is false, using the quasiquote
(see next post) this is simply:

  `(,(foo check do something check more stuff)
    ,(bar check do something else)
    ,(in case everything fails)) attempts

The apology: In a compositional language, escape continuation (EC)
based backtracking might take the role of a conditional expression,
because it's often easier to go ahead and backtrack on failure than
to perform a number of tests/asserts ahead of time which might
CONSUME your arguments, so you need to SAVE them first. An EC can be
used to restore the contents of the stack before taking another
branch. The disadvantage of course is that words that use 'check' are
only legal within an 'attempt' context, and are not referentially
transparent. I guess this is ok.. same as using catch/throw. I do
feel a bit like a cowboy now.. What about distinguishing 'bad'
exceptions from 'good' ones? Using exceptions in CAT has always been
awkward, but the 'attempts' syntax here seems nice.
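a scheme model of the same mechanism, to pin down the semantics --
'fail-k' is a made-up name here, and the real CAT version restores
the saved argument stack instead of just escaping:

  ;; each attempt runs with an escape continuation bound in a
  ;; dynamic parameter; 'check' aborts the current attempt.
  (define fail-k (make-parameter #f))

  (define (check ok?)
    (unless ok? ((fail-k) #f)))

  (define (attempts . thunks)
    (call-with-current-continuation
     (lambda (return)
       (for-each
        (lambda (thunk)
          (call-with-current-continuation
           (lambda (abort)
             (parameterize ((fail-k abort))
               (return (thunk)))))) ; first surviving attempt wins
        thunks)
       (error "attempts: all branches failed"))))

  ;; e.g. (attempts (lambda () (check #f) 'a)
  ;;                (lambda () 'b))  => b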
Entry: quasiquote
Date: Tue Sep 25 22:12:34 CEST 2007

what about postscript style [ ] quotation to create data structures
with functions? i can't use [ ] or { } since mzscheme sees them as
parentheses. only angle brackets are left alone.. so either i create
a syntax extension, i.e. (list: (bla) (foo) (bar)), or i use an angle
bracket structure. since the latter will work, i'm using that: <* *>

what about just using the quasiquote here? i'm not using it anywhere
else and i'm already using quote. it's only legal on programs: and
unquote means: insert program body here.

Entry: assembler optimizations / corrections
Date: Wed Sep 26 02:05:11 CEST 2007

A) jump size optimization

currently i have none. recently i introduced at least error reporting
on overflow. i think the deal is that doing it 'really right' is
difficult; i'm not sure there exists an optimal algorithm. the
simplest approach is:

* convert small -> long jump
* increment/decrement jumps before/after the instruction
* update the dictionary accordingly

it's probably easiest to do this on an already fully resolved buffer
(after the 2nd pass). this algorithm is confusing due to the
forward/backward absolute/relative distinction. also, doing this
without mutation seems troublesome.

B) jump chaining

was really easy in the original badnop due to the use of
side-effects. somehow this problem looks as if there's some weird
control structure that might help solve it in a more direct way.

OK... finding the optimum is apparently NP-complete
http://compilers.iecc.com/comparch/article/07-01-037

> [There was a paper by Tom Szymanski in the CACM in the 1970s that
> explained how to calculate branch sizes. The general problem is
> NP-complete, but as is usually the case with NP-complete problems,
> there is simple algorithm that gets you very close to the optimal
> result. -John]

or not?
http://compilers.iecc.com/comparch/article/07-01-040

  If you only want to optimize relative branch sizes, this problem is
  polynomial: Just start with everything small, then make everything
  larger that does not fit, and reiterate until everything
  fits. Because in this case no size can get smaller by making
  another size larger, you have at worst as many steps as you have
  branches, and the cost of each step is at most proportional to the
  program size.

so, it looks like the simple approach of using short branches and
expanding/adjusting + checking is good enough.

Entry: platforms
Date: Wed Sep 26 05:11:06 CEST 2007

been thinking a bit about platforms. some ideas:

* 32 bit + asm makes no sense. GCC is your friend here, and should
  generate reasonably good code for register machines. split the
  language into 2 parts: POKE for control stuff, and some kind of
  dataflow language for dsp stuff.

* AVR 8 bit doesn't make much sense either. there is GCC, and i
  already spent a lot of time optimizing 8 bit opcodes.. learning the
  asm sounds like a waste of time.

* don't know if PIC30 makes a lot of sense. it is an interesting
  platform (PDIP available), and they are reasonably powerful, if a
  bit weird.

maybe focus on PIC18, and a small attempt to get a basic set of words
running for PIC30?

Entry: capacitance to digital
Date: Wed Sep 26 05:26:25 CEST 2007

http://www.microchip.com/stellent/idcplg?IdcService=SS_GET_PAGE&nodeId=2599&param=en531579

  CAPACITANCE TO DIGITAL CONVERTER

  To convert the sensor's capacitance to a digital value, three
  things have to happen. First, comparators and the flip flop in the
  comparator module must be configured as a relaxation
  oscillator. Second, the desired sensor must be connected to the
  relaxation oscillator. Third, the frequency of the oscillation must
  be measured.

  The configuration of the comparator and the SR latch require
  configuring the comparators, the SR latch, and the appropriate
  analog inputs. Connecting the sensor to the oscillator requires the
  control software to select the appropriate analog input to the
  comparator module's multiplexer. It must also select the
  appropriate input to any external multiplexer between the sensors
  and the analog inputs of the chip.

  To measure the frequency of the oscillation, TMR1's clock input
  must connect to the output of the relaxation oscillator, and a
  fixed sample period will be controlled by TMR0. To start a
  frequency measurement, both TMR0 and TMR1 are cleared. The TMR0
  interrupt is then enabled. When the interrupt fires, TMR1 is
  stopped, and the 16 bit frequency value in TMR1 is retrieved. Both
  TMR0 and TMR1 can then be reset for the next measurement.

  To keep the accuracy of the frequency measurement consistent, the
  interrupt response time for the TMR0 interrupt must be kept as
  constant as possible, so no other interrupt should fire during a
  measurement. If one does, then the measurement must be discarded
  and the frequency measurement must start over.

  Once the 16-bit value is retrieved, the detector/decoder algorithms
  can determine if the shift in frequency is a valid touch by the
  user or not.

  For more information on the interrupt services routine for TMR0,
  and the initialization of the relaxation oscillator, refer to
  application note AN1103 on Software Handling for Capacitive
  Sensing.

Entry: todo list
Date: Wed Sep 26 19:46:52 CEST 2007

URGENT:

* word reference manual

  the primary goal would be to have documentation available at the
  command console or during emacs editing, instead of just in paper
  form. a tutorial can come later.
  now where do i specify it?

* code protect the boot sector (OK)
* interaction macros (needs syntax + minor support in prj.ss)
* readline console
* command line completion
* make it installable (-> solve library deps? : install in collects?)
* check battery / BREAK resistor
* simplify prj.ss into chunks that operate on state explicitly (OK)

NOT URGENT:

* macro cache (maybe explicit files? read about compilation)
* scheme library split (module path handling)
* bootloader: automatic boot (2ND) sector patching
* assembler changes + functional macros

Entry: boot code
Date: Thu Sep 27 15:57:28 CEST 2007

this is actually pretty important. once i start sending out kits,
it's not so easy to change the bootloader. some things to note:

- monitor.state -> (dict ...) format is the only important part
- boot sector is independent of any macros: only words count
- machine model obviously needs to stay stable (hasn't changed in
  years)
- binary api (the monitor commands) needs to stay stable

what about making the bootstrap interpreter simpler? get rid of
anything other than 'receive transmit execute', and leave the rest to
a dictionary? if there's ever a problem for portability or whatever,
this might be the way to go: this interface allows all functionality
to be hidden in the dictionary associated to the boot kernel. right
now it's still quite pic-specific. some things will become less
efficient though.. also, the way the code is organised, sending
commands will become more difficult. the set i have now is complete
enough, and reasonably efficient. let's keep it simple and stick with
the current one.

another thing: fixing the boot block. let's try that: setting 30000B
to A0 does the trick (CONFIG6H : WRTB)

Entry: using the ICD2 pins.
Date: Thu Sep 27 15:53:34 CEST 2007

last couple of days were a bit too much on the dreaming side. i need
something concrete to fix. i was thinking about simplifying the
programming interface. was thinking about using the ICD2 pins to also
do debug serial comm. but why? if my boot kernel is stable, this is
entirely unnecessary, except for reset!

Entry: ramp up to purely functional macros
Date: Thu Sep 27 20:57:29 CEST 2007

the parser. STAGE 1:

- rewrite 'constant' as a macro definition
- separate macros from the body code, which is seen as a single
  function with multiple entry/exit points.

problem still not solved: 'variable'

currently, variable creates a constant containing a symbol, and
'code' that performs the allocation later during the assembly
phase. so in fact, it's not so problematic.

Entry: prj.ss
Date: Thu Sep 27 22:59:52 CEST 2007

simplified it a bit: made state ops more explicit, and moved
functionality to badnop.ss. this looks like a nice approach in
general. i do wonder why i still need 'functional state' at the
prj.ss level: most state updates are intermingled with
microcontroller state updates which are dirty anyway. one thing: it
keeps me honest. on the other hand, i'd like to move to some "image"
representation. cached macros would be cool. maybe i should look at
that now.

Entry: macro cache
Date: Fri Sep 28 00:04:07 CEST 2007

it looks like the bulk of the 'revert' time is spent in needlessly
compiling code. there aren't so many run-time created macros, and
constants are currently not 'eval'ed. maybe i should make that so i
can snarf them out. hmm.. spaghetti. the problem is that constants
are still treated separately. i can't unify them with macros until
macros are purely functional, so they can be evaluated to see if they
produce constant values.
solution dependencies:

  file parsing to distinguish macro/code
  then: purely functional macros
  then: elimination of assembler constants

however, doing the first one requires elimination of assembler
constants! looks like this is the reason why i can't get an overview
of the problem: it's quite a big loop. anyways, i can write the
parsing step and test it leaving the side-effecting macros
intact. then move to side-effect free macros and change the constant
parsing to translate constants to macros.

so. maybe i need an S-expression syntax first, so i can translate
code to it! for macros this is easy: i'm already using one. for
composite code however, it becomes more difficult due to the multiple
entry-exit points. this can be left alone in a first attempt.

Entry: product vision statement
Date: Fri Sep 28 01:05:20 CEST 2007

http://www.codinghorror.com/blog/archives/000962.html

  for (target customer)
  who (statement of need or opportunity)
  the (product name)
  is a (product category)
  that (key benefit, compelling reason to buy)
  unlike (primary competitive alternative)
  our product (statement of primary differentiation)

for embedded software developers who want to program small embedded
systems, the Brood system is a tool chain that supports incremental
bottom up development. unlike C, our product has integrated
metaprogramming through built-in macros.

something like that.. interesting.

Entry: documentation
Date: Fri Sep 28 14:03:33 CEST 2007

write a purrr manual in tex2page by sending queries to the brood
system. this should use an interface similar to snot. brood needs to
be centered around services, of which snot is one. so let's try this:
services with

- direct access to brood for SNOT and RL
- document generation

Does services.ss run inside the sandbox? YES. So all calls from
snot.ss -> services.ss go through a sandboxed eval. Services.ss
itself does not need to take care of this, and can use direct calls.

the deal is this: a CONSOLE needs to separate:

- TOPLEVEL (represented by eval)
- STATE (a data structure stored independent of the toplevel)

Entry: persistence
Date: Fri Sep 28 19:26:54 CEST 2007

i must not forget that the way i use persistence is a SOLUTION, and
not the original problem. the real problem is a conflict between two
paradigms:

* TRANSPARENCY as in MzScheme's module system
* image persistence and run--time self modification

as usual, my problem is rooted in ignorance. i've been jabbing about
the distinction between the two above for a while, but the real
problem is compiler compilation time. i need to have a look at
MzScheme's unit system. it should be possible to reload units after
recompiling them because they are mere interfaces.

Entry: services
Date: Fri Sep 28 23:24:59 CEST 2007

hmmm.. i didn't really get anywhere today. but at least i figured out
what 'services' should be. it's just the stuff that snot has access
to, but without the snot interface. i renamed it 'console.ss' and
took it out of 'snot.ss', which is now just a bit of glue.

Entry: forth preprocessing
Date: Sat Sep 29 15:51:12 CEST 2007

parsing and lexing. it's divided in a somewhat unorthodox way.

LEXING

there are 2 front ends:

  forth-lex          :: string    -> atom stream
  forth-load-in-path :: file,path -> atom stream

the lexing part flattens the load tree. i.e. during lexing, the
source code is made independent of the filesystem.

PARSING

this is where i have to break things, so let's commit first.

1. flat forth stream -> compositional forth stream with macros removed
2. constants -> macros

let's see if i understand: constants are bad.
there is no way around the fact that constant swallows a value: it's
the worst case of reflection. this is not compatible with the current
parser. keeping it would require lookahead. so 'constant' needs to be
replaced entirely by 'macro' in source code. looking at the previous
entry [[phase-separation]], what is required is indeed a parsing step
that can translate

  1 2 + constant x   -->   macro : x 1 2 + ; forth

yes, this is of course possible, but is it really worth it? maybe
it's better to clean up the Purrr language semantics now than to
carry around the code that allows this. ad-hoc syntax is a
nuisance. so, current path: CONSTANTS are being removed. that was
easy :)

now, for variables.

  variable abc

does 2 things: it creates a macro that quotes itself as a literal
address, and it adds code that tells the assembler to reserve a RAM
slot. maybe i should use 'create' and 'allot'? (back to that later)

currently the parsing seems to work, except for the macro/code
separation step. for this i need a stream splitter. in stream.ss i
have '@split', which just splits off the head of a stream, not true
splitting.

status:
- parsing step: ok
- load! step: ok (like the previous load, but with macro defs separated)

next:
- remove all side-effecting macros
- change the assembler to take values from macros

remarks:
* is dasm-resolve still possible? (value -> symbol)

status:
- monitor.f -> monitor.hex gives the same code

Entry: cleanup
Date: Sat Sep 29 21:21:54 CEST 2007

core changes seem to be working. the rest is cleanup.

TODO:
- fix variable (OK)
- fix interaction constants (OK)
- fix sheepsint (OK)
- extract macros from forth file -> compositions + save as cache (OK)
- fix interaction macros that reduce to expressions
- trick macros into generating their symbol during compilation, and
  value during assembly. (restore disassembly constants)
- clean the assembler name resolver

Entry: storing application macros in state file
Date: Sat Sep 29 22:31:46 CEST 2007

why not? this solves a lot of problems.. and they are available in
source form, so there's no problem storing them symbolically.

Entry: profiling
Date: Sun Sep 30 03:15:31 CEST 2007

just from eyeballing it, but still quite remarkable: loading
monitor.f from source to S-expressions takes a lot more time than
either compiling the macros or compiling the code to a macro and
running it. both of the latter are instantaneous. ha! actually,
that's very good news. improving the speed of the lexer seems a lot
easier than improving the speed of the compiler.

looking a bit further, sheepsint.f seemed to be faster. the reason is
thus the constants. maybe i should just put them back to
s-expressions? they don't change much after all.

Entry: upload speed
Date: Sun Sep 30 03:40:57 CEST 2007

It's quite annoying that the upload speed is so slow. I need a way to
change the speed on the fly.

EDIT: baud rate: commit goes a little bit faster when the baud rate
is changed from 9600 to 38400, so the limiting factor is probably the
flash programming.

Entry: parsing and printing
Date: Sun Sep 30 16:17:09 CEST 2007

there are a couple of places in the brood code where (regular)
parsing and printing are done in a relatively ad-hoc way using
'match'. maybe i should have a look at extending match to provide
better pseudo "algebraic types".

EDIT:
http://www.cs.ucla.edu/~awarth/papers/dls07.pdf (*)
http://planet.plt-scheme.org/package-source/dherman/struct.plt/2/3/doc.txt

(*) looks really interesting.
also, i need to have a decent look at COLA:
http://piumarta.com/software/cola/

EDIT: i changed the syntax for the peephole optimizer to something
more akin to algebraic types & matching.. still a bit of a hack, but
there's a better adapted quoting mechanism now.

Entry: deleted from brood.tex
Date: Sun Sep 30 19:25:53 CEST 2007

Some important assumptions I'm making to support the current solution
are that code updates need to be made \emph{while running}, and that
the target is severely \emph{resource constrained} such that all
compilation and linking needs to be done off--target. This excludes
\emph{late binding} of most code. Another assumption I'm making is
that some binary code on the target will never be replaced, and will
drift out of sync with the evolution of the language in which it was
written. An example of this is a \emph{boot loader}. Such code needs
to be viewed as a black box. This approach violates transparency.

To give this section some context, I have to make my \emph{beliefs}
more explicit. I believe that a compiler is best implemented using
pure functional programming, because it is in essence a
\emph{function} mapping a source tree to a binary representation of
it. This idea is easily extended with \emph{bottom up} programming,
where part of the source tree generates a compiler to compile other
parts of the source tree. In order to make this work, I believe you
need \emph{transparency}. By this I mean that all \emph{reflection}
(compiler compilation) is \emph{unrolled} into a directed acyclic
graph representing a code dependency tree.

On the other hand, I believe that a microcontroller is best modeled
as a \emph{persistent} data structure. A microcontroller is a
\emph{physical object}, and should be modeled as such,
\emph{independent} of the compiler that is used to create the code
comprising the object state. This is what makes Forth interesting:
the ability to \emph{incrementally update} without having to
recompile everything. Due to limited hardware support (flash ROM is
not RAM), \emph{late binding} becomes problematic, and also induces a
significant performance penalty. This makes \emph{early binding} a
reasonable alternative: in the end the objective is to at least
provide the possibility to write efficient code at the lowest level
of the target language tower.

This is the heart of the paradigm conflict. Where do I switch from a
transparent language tower to \emph{dangerous} manually guided
incremental updates? Maybe the question to answer would be: why does
one want to have this kind of low--level control anyway? The real
answer is that at this moment, I don't really know how to create a
transparent system. The real reason for that is that I've been locked
in a certain paradigm. Let's explore what would happen if we lean
towards either of the two extremes. If the whole system were
transparent, the controller code would need to be treated as a
filesystem if incremental updates were still to be used. After code
changes, one could simply recompile, relink and upload only the parts
that changed. This is the sanest thing to do.

Entry: misc improvements
Date: Sun Sep 30 21:35:40 CEST 2007

note that 'load', the way it currently works, doesn't
'commit'. actually, that's mostly not how it's used! also, automatic
commit might be nice for compile mode.. on the other hand, compile
mode is kind of an advanced feature also.

Entry: structures for music
Date: Mon Oct 1 05:48:46 CEST 2007

this is more of a tutorial pre.
i saw aymeric was using the stack to store sequences, which is not a
good idea.. i see 2 other ways: flash and ram. i kinda like the x / .
approach for pattern synths. the trick is to do multiple voices, so i
really need some kind of multitasking. say i have 3 patterns:

  : bd o . . . o . . . bd ;
  : sn . . . . o . . . sn ;
  : hh o . o . o . o . hh ;

what do 'o' and '.' do? let's assume that recursion is not allowed in
these patterns. what can we hide in a single invocation? a simple
trick is to use dictionary shadowing: the words could call some fixed
word, which is re-implemented later.

  : instrument do something ;
  : bd o . . o . . bd2 ;
  : bd2 . . o . . o bd ;

we could have:

  : o instrument yield ;
  : . yield ;

hmm.. it's probably better to directly use names instead of this
name-capture thing. if recursion is disallowed, it should be possible
to store each thread in a single byte, so a lot of threads are
possible. in that case, an explicit interpretation and automatic
looping might be better, using routing macros.

Entry: purrr reference documentation
Date: Mon Oct 1 16:13:01 CEST 2007

documentation for each macro. this contains 2 things:

- stack effect (type)
- 1 line human readable doc which possibly points to more information.

so a word's meta info looks like

  (+) ((type . (a a -- a))
       (doc  . "Add two numbers"))

if i can't do types yet, i should at least put the stack effect in a
form that can be used later to do types. it's also probably a good
idea to add the meta-data separately, to not clutter the code.

so, how to infer types? from the lowest level (pattern matching
macros) i can infer a lot. first some cleanups: i'm taking out the
'compiled' field in the word structure, because it's better to just
save the source of macros before they're being compiled, instead of
trying to recover them later.

what about word-semantics? i forgot the reason why sometimes it
cannot be filled. been poking in the rpn.ss internals and i guess
it's best to have the state tx take a compiler for backup. but, this
doesn't work for some other reason i can't remember.. tata:
spaghetti. let's see if i can hack around it now by simply providing
a language name for backup.

Entry: i need closures
Date: Mon Oct 1 20:26:22 CEST 2007

yep.. too much crap going on with trying to call from prj -> base and
having to pass arguments.

EDIT: when i wrote 'compose' i made sure to not allow composition
between words with different semantics. however, i'm not so sure that
was a good idea.. i only want to use closures on functional words,
not on state words. maybe i should let go of this control freakish
behaviour, since the source rep is only for debug: reconstructing
from that source doesn't work reliably for all words..

Entry: dsPIC
Date: Tue Oct 2 03:46:01 CEST 2007

maybe it's time to try it out, and gently grow it into being. some
challenges:

- 3 bytes / instruction
- 16 bit datapath
- addressing modes

the flash block erase size is 96 bytes, but address-wise this counts
as 32 instruction words.

  The dsPIC30F Flash program memory is organized into rows and
  panels. Each row consists of 32 instructions, or 96 bytes. Each
  panel consists of 128 rows, or 4K x 24 instructions. RTSP allows
  the user to erase one row (32 instructions) at a time and to
  program four instructions at one time. RTSP may be used to program
  multiple program memory panels, but the table pointer must be
  changed at each panel boundary.
I don't understand why it says 'four instructions at a time' and then
later on talks about 32 at a time: "The instruction words loaded must
always be from a group of 32 boundary." And the confusion goes on:
"32 TBLWTL and four TBLWTH instructions are required to load the 32
instructions." this looks like a typo.. let's download a new version
of the sheet. got DS70138C now. they're at version E. it's got the
same typo. so assume i need to write per 32 instructions + some magic
every 4K instructions (updating a page pointer?). apart from the
latter it's quite similar to the 18f, just a larger row size.

it looks like this thing is byte addressed, but for each 2 bytes,
there's an extra 'hidden' byte! lol

ok, there is a sane way of looking at it: the architecture is 16-bit
word addressed, but every odd word is only half implemented:
instruction width is 3 bytes.

it looks like it's best to steer the forth away from all the special
purpose DSP tricks like X/Y memory and weird addressing modes. looks
like an interesting target for some functional dataflow dsl
though. there are 2 kinds of instructions: PIC-like instructions that
operate on WREG0 and some memory location, and DSP-like instructions
that use the 16 registers.

roadmap:
- find an 8bit -> 16bit migration guide from microchip
- partially implement the assembler up to PIC18 functionality

Entry: direct threaded forth
Date: Tue Oct 2 07:26:49 CEST 2007

i'm toying a bit with the vm forth. and was thinking: it's not
necessary to go stand-alone. it's much better to test this vm forth
as another target.

Entry: type signatures from pattern matching macros
Date: Tue Oct 2 14:38:47 CEST 2007

It should be possible to mine the 'source' field of pattern matching
macros for types, or at least the stack effect, of functions. the
first matching rule is always the most specific one: if that fits a
certain pattern.

the REAL solution here is to change the pattern matcher to REAL
algebraic types instead of this hodge-podge. moral of the story:
whenever pattern matching occurs on list structure, what you really
are looking for is algebraic types. yes... i'm not going to muck
around in this ad-hoc syntax. i need a real solution: something on
top of the current tx. i need real algebraic types. there is this:

http://planet.plt-scheme.org/package-source/dherman/struct.plt/2/3/doc.txt

but for my purpose it might be better to just stick with the current
concrete list representation for the asm buffer. what about:

  (([qw a] [qw b] +) ([qw (+ a b)]))
  ->
  ((['qw a] ['qw b] +) `([qw ,(+ a b)]))

looks like 'atyped->clause' in pattern-tx.ss is working. it's indeed
really simple to implement on top of the matching clauses. looks like
'asm-transforms' works too.

i ran into one difficulty though. call it polymorphism. in the
original syntax:

  (([op 'POSTDEC0 0 0] ['save] opti-save) `([,op INDF0 1 0]))

cannot be expressed in the new syntax. however, this is
exceptional. it's probably a good idea to make this polymorphism
explicit. EDIT: it is possible to use unquote!! a bit of abuse of
notation, but ...

let's write the pic18 preprocessor on top of asm-transforms instead
of compiler-patterns. ok. done. old one's gone. now it should be a
lot easier to write some documentation or type inference..

i tried to tackle the 'pic18-meta-patterns' but i don't seem to get
anywhere. the current syntax is way too complicated. it really
shouldn't be too hard by taking a more bottom up approach instead of
trying to use 'callbacks' that force the preprocessing of some
macro's arguments.
write a single generator macro for each kind. trying again. this is
the thing i want to generate:

  (define-syntax unary
    (syntax-rules ()
      ((_ namespace (word opcode ...))
       (asm-transforms namespace
         (([movf f 0 0] word) ([opcode f 0 0])) ...
         ((word) ([opcode 'WREG 0 0])) ...))))

from this:

  (asm-meta-pattern
    (unary (word opcode))
    (([movf f 0 0] word) ([opcode f 0 0]))
    ((word) ([opcode 'WREG 0 0])))

the thing which seems problematic to me is the '...', more
specifically:

  (pattern template) ... -> (pattern template) (... ...) ...

that doesn't seem to work. it looks like the 'real' problem here is
due to the fact that i'm expanding to something linear.. i'm
inserting stuff. i wonder if it's possible to modify the asm syntax a
bit so it will flatten expressions.

wooo.. macros like this are difficult. i'm currently doing something
wrong with mixing syntax-rules with calling an expander
directly. best to stick with plain syntax-case and direct expansion:
that's easier to get right. the deal was: sticking with syntax-rules
as the result of a first expansion worked fine, i just needed to put
the higher order macro in a different file for phase separation
reasons.

so.. the remaining step is to collapse the compiler-patterns-stx
phase, and add the current source patterns to the word source field,
which would yield decent docs. ok, done.

  > msee +
  asm-match: ((((qw a) (qw b) +) ((qw `(,@(wrap a) ,@(wrap b) +))))
              (((qw a) +) ((addlw a)))
              (((save) (movf a 0 0) +) ((addwf a 0 0)))
              ((+) ((addwf 'POSTDEC0 0 0))))
  >

that should be easy enough to parse :) CAR + look only at qw. the
'wrap' thing is something that needs to be cleaned up too.. i tried
but started breaking things. enough for today.

this is what i get out for qw -> qw:

  (((qw a) (qw b) --) ((qw `(,@(wrap a) ,@(wrap b) #f))))
  (((qw a) (qw b) >>>) ((qw `(,@(wrap a) ,@(wrap b) >>>))))
  (((qw a) (qw b) <<<) ((qw `(,@(wrap a) ,@(wrap b) <<<))))
  (((qw a) drop) ())
  (((qw thing) |*'|) ((qw thing)))
  (((qw a) (qw b) ++) ((qw `(,@(wrap a) ,@(wrap b) #f))))
  (((qw a) (qw b) swap) ((qw b) (qw a)))
  (((qw a) dup) ((qw a) (qw a)))
  (((qw a) (qw b) or) ((qw `(,@(wrap a) ,@(wrap b) or))))
  (((qw a) (qw b) and) ((qw `(,@(wrap a) ,@(wrap b) and))))
  (((qw a) neg) ((qw `(,@(wrap a) -1 *))))
  (((qw a) (qw b) xor) ((qw `(,@(wrap a) ,@(wrap b) xor))))
  (((qw a) (qw b) /) ((qw `(,@(wrap a) ,@(wrap b) /))))
  (((qw a) (qw b) *) ((qw `(,@(wrap a) ,@(wrap b) *))))
  (((qw a) (qw b) -) ((qw `(,@(wrap a) ,@(wrap b) -))))
  (((qw a) (qw b) +) ((qw `(,@(wrap a) ,@(wrap b) +))))

i also made a 'print-type' function. for '+':

  ((qw qw) => (qw))
  ((qw) => (addlw))
  ((save movf) => (addwf))
  (() => (addwf))

this might be useful.. but what's more useful is the building of a
framework that enables this for all functions. it works for the
assembler primitives only.
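the parsing really is that easy. a sketch of the mining on the clause
lists above ('clause-type' is a made-up name):

  ;; derive an (inputs => outputs) tag signature from one clause:
  ;; the last element of the pattern is the word name, everything
  ;; before it is a matched instruction, tagged by its car.
  (define (clause-type clause)
    (let* ((pattern  (car clause))
           (template (cadr clause))
           (ins (reverse (cdr (reverse pattern))))) ; drop the word name
      (list (map car ins) '=> (map car template))))

  (clause-type '(((qw a) (qw b) +) ((qw (+ a b)))))
  ;; => ((qw qw) => (qw))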
Entry: TODO
Date: Sat Oct 6 20:54:13 CEST 2007

- live command macros
- put live commands in a namespace
- add doc tags: math/control/predicates/...
- write a small tutorial:
  * assembler + PIC18 architecture
  * logic, addition and 8 bit programming (hex + binary)
  * the x and r stacks
  * route (DONE)
  * predicates & conditionals
  * run time computations & ephemeral constructs
- fix macro cache re-init + initial state loading (DONE)
- fix quoting in macros
- fix hardcoded paths (rename brood/brood)
- rename compilation stack

Entry: unsigned demodulator
Date: Sat Oct 6 21:30:01 CEST 2007

the pic18 has a hardware multiplier, which is nice. however,
computing a signed multiplication takes quite a hit compared to
unsigned. i was wondering if i can do an amplitude-only demodulator
using only unsigned multiplications. the entire function is
unsigned -> unsigned.

  signal -> mixer -> I / Q -> I^2 + Q^2 -> LPF

[EDIT: deleted a long erroneous entry. the thinking error was about
the commutation of the LPF and the squaring operation. the above
expression just gives the average signal power.]

the correct formula is:

  X -> (I,Q) -> LPF -> || . ||^2

that's completely symmetric wrt phase. the LPF is straightforward: a
simple 1-pole will probably do if i keep the bitrate low. a 2^n-1
coefficient is easy to implement without multiplication. the I=XC and
Q=XS multiplications can probably be simplified since X = x-h and
C = c-h have no DC components. here h = 2^(bits-1).

  I = X C = (x-h)(c-h)
          = xc - hx - hc + h^2
          = xc - h(x + c - h)
          = xc - h((x-h) + (c-h) + h)
          = xc - h(X + C + h)

averaging, and using that X and C have zero mean:

  avg(XC) = avg(xc) - h^2

which is quite intuitive: take the average of xc, but remove the dc
component.
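a two-line sanity check of that identity in scheme (8 bit samples,
so h = 128):

  (define h 128) ; 2^(bits-1)
  (define (signed u) (- u h))

  ;; the signed product X*C, computed from the unsigned product
  ;; xc plus cheap correction terms:
  (define (identity-holds? x c)
    (= (* (signed x) (signed c))
       (- (* x c)
          (* h (+ (signed x) (signed c) h)))))

  (identity-holds? 37 211) ; => #t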
Entry: frequency decoding
Date: Mon Oct 8 00:26:37 CEST 2007

for krikit, the choice to make is to either decode the whole spectrum
(listen to everything at once) or listen only to a single band. this
is a choice that has to be made early on.. some remarks.

* FFT + listening to all bands is probably overkill. it's not so
  straightforward to implement, so the benefits should be big. FFT
  for point-to-point only makes sense when combating linear
  distortion.

* single frequency detection is really straightforward. the core
  routine is a MIXER followed by a complex LPF. the output is phase
  and amplitude.

* using a sliding window average LPF together with orthogonal
  frequencies allows for good channel separation. this works for
  steady state only, so some synchronization mechanism is necessary.

* sending out a single message on multiple frequencies: easy to do
  with pre-computed tables for 0 and 1. phase randomisation to avoid
  peaks is possible here.

* i'm afraid of linear distortion due to room acoustics.. maybe
  FM/FSK should be used?

* if non-linear distortion is not a problem, DTMF frequencies are not
  necessary.

* using exact arithmetic, it is easy to update/downdate a state
  vector for a rectangular window LPF. this update can be performed
  at the input of the mixer.

* bandwidth limitation for transmission.

http://en.wikipedia.org/wiki/Olivia_MFSK

Entry: network debugging + pic shopping
Date: Mon Oct 8 15:27:25 CEST 2007

- avoid remote reset: use WDT
- central power gives panic switch
- use a standard bus protocol for comm (I2C ...)

The 18f1220 doesn't have I2C, so it might be better to go for a
different component. Lowest pin count is 28. Let's take the one with
the most memory to have some room for tables and delay lines. I'm
thinking about the 18f2620: 64 kbytes flash, 3968 bytes ram (maxed
out). This is also a nice target for a standalone language. These are
all the same, with some things missing:

            EEPROM (b)   FLASH (kb)
  18f2620   1024         64
  18f2610   0            64
  18f2525   1024         48
  18f2515   0            48

Entry: 8 bit unsigned mix -> complex 16/24 bit acc
Date: Mon Oct 8 15:32:30 CEST 2007

I've been toying a bit with a mixer + accumulator building block, and
it seems it can be quite simple. Some remarks:

- Perform the signed offset correction outside of the accumulation
  loop.
- Perform the update/downdate for the rectangular window at the
  input, due to commutation with the mixer.
- As long as the result of accumulation fits in the word length,
  overflow is not a problem.

If a signed number X is represented by an unsigned number x, the
difference is X = x - h, where h = 2^{n-1} is 'half'. Per signed
multiplication there is an offset of h^2 = 2^{2n-2}. What this means
is that once per 4 accumulations, the correction term disappears due
to word overflow if 2 bytes are used. However, the maximal filter
output occurs at full scale input, which will overflow the
accumulator if more than 4 accumulations are used, so maybe it is
better to use a 3 byte state. In any case, if the number of
accumulations is a power of 2, removing the unsigned offset is a
simple bit operation.

Entry: transmission
Date: Mon Oct 8 16:46:26 CEST 2007

Using hardware PWM with 8 bit resolution I can send out at 39kHz,
assuming fosc = 40MHz (40MHz / 4 / 2^8 = 39.06kHz). This is still
well above the maximal signal frequency of about 3kHz, and won't pass
the speaker, so an analog filter is not necessary. Differential drive
(half bridge) could be used.

One thing to note is that only the ECCP (enhanced) module can do
multi-channel PWM. The normal PWM is only single output, and all 28
pin chips have just 2 x CCP. The 18 pin 18f1x20 has a single ECCP,
and the 40/44 pin 18f4xxx also have one. Looks like that's quite a
limitation.. On the other hand, a CMOS inverter could be used
on-board.. Is that worth it? Probably not. A simple coupling
condenser will do the trick.

Entry: self programming 5V
Date: Mon Oct 8 18:01:24 CEST 2007

Something i just noticed in the 2620 datasheet: self-programming
works only at 5V?

Entry: no apology!
Date: Thu Oct 11 01:43:40 CEST 2007

i tried a couple of times this week to explain the "ephemeral" macro
idea, but it's just insane. i need a real solution:

- macro code needs to know whether a certain word is defined or not.
- if a partial evaluation can't be computed, the error should be:
  * none, if the corresponding library code can be found.
  * "partially implemented literal construct" or something..

what i do need to explain is "Why leaky abstractions are not
necessarily bad." This is a core Forth idea quite opposed to the safe
language ideal. I'm using a lot of that stuff, and I guess it's good
to make a list of these.

looking at the code, this 'need-literal' error only happens in 3
places: 'toggle', 'set' and 'bit?'. i just took them out: they refer
to code words now, up to the user to implement.

Entry: Purrr semantics
Date: Fri Oct 12 16:25:44 CEST 2007

As explained in Brood, there is only a single semantics for a Purrr
program: it is a compositional, purely functional language. A Purrr
program consists of a set of (recursive) macro definitions, and a
``body'' which defines a compilable function with reduced semantics.

It would be really cool if i could get rid of the explicit
'compilation' step, and make everything just declarative. What i'd
like to do is to apply this approach to scheme. Maybe that's what
PICBIT is doing?

Entry: train notes about syntax, semantics and metaprogramming
Date: Fri Oct 12 17:41:09 CEST 2007

I can identify 3 distinct uses of macros:

- control flow (begin ... again)
- optimization (1 2 +)
- explicit meta (using the words '>m' and 'm>')

The latter is actually the same as the first. The 'm' stack is like
the 'r' stack: it is used to implement nesting constructs.

conceptual problem: jumps. this can be solved by writing all jumps as
recursion and using higher order functions (combinators). together
with using only a single conditional statement, the solution is to
enable syntax for quoted macros.
this leaves:

* conditional (IF)
* quoting operation (LAMBDA)
* dequoting operation (APPLY)

The core ideas behind the macro language are:

* purely functional (no side effects)
* everything is first class
* purely compositional (no syntax)

Then, the target language should inherit as much as possible from
these properties:

* functional word subset (data stack)
* possibility of HOF (with/without closures) using byte codes?
* mostly pure compositional semantics, with a little syntax sugar

Construct a powerful metaprogramming system by starting with a pure
language, and making the transition (projection) from pure/ephemeral
-> non-pure/concrete explicit. In Purrr this is the decision to use
macros or words to implement functionality.

Is metaprogramming a form of message passing? Sending "reconfig"
messages?

MERGE TODO:

- check how PLT classes solve name space issues + use this for the
  macro namespace.
- fix macro quoting and nesting. maybe write a program as a list of
  macros instead of 1 macro as now? it's isomorphic, but possible to
  manipulate.
- don't solve nesting in the source preprocessor: that is to remain
  regular, and the parser is to be explicit (compilation 'meta'
  stack). maybe this requires a real extensible descent parser?
- check how Factor implements closures
- make interaction words extensible
- check >> and 2/ simulation and partial evaluation words

Entry: notes remarks
Date: Fri Oct 12 17:43:50 CEST 2007

There are only 2 kinds of distinct primitive macros:

- partial evaluation macros (written in the pattern language)
- nested structures (written in CAT)

Composite (recursive) macros can combine both. This seems to be the
way to explain how things are going + a way to clean up the code a
bit and reduce the number of primitive nesting macros. Apparently,
that's already accomplished.. on the other hand, these are entry
points to a type inference system..

edit: just re-implemented >c and word>c as pattern matching words.

I am in trouble: I want to explain why I diverged from explicit
Forth--style metaprogramming to move to compositional macro semantics
with partial evaluation, and why at the same time i'm not going the
full length: instantiation is still limited to a subset of the full
macro semantics. The thing is: having metaprogramming constructs in
the language disguised as 'compatible semantics' is a good idea:
explicit primitive macros can be reduced quite a lot. So what's the
question??

Entry: debug bus
Date: Fri Oct 12 17:57:41 CEST 2007

- identical clients
- ad-hoc 1-wire instead of SPI/I2C/async/...
- host = master
- binary tree-like physical structure
- cables/connectors ?
- multihost: just use shared terminal

EDIT: maybe an ad-hoc network is best avoided at first.. let's get
something simpler working before trying crazy stuff.

Entry: quoting macros
Date: Fri Oct 12 19:53:43 CEST 2007

this looks a bit like the final frontier. currently i can't write
Forth in terms of a compositional language. with the current pattern
matching language, it would be trivial to do so if i had a
representation of anonymous macros. basically i want:

  [ 1 + ] [ 1 - ] ifte

that's easy enough if '[' and ']' are part of a parser
preprocessor. however, anything defined in terms of those, like 'if'
'else' 'then', needs to be implemented as parser macros also! this
complicates things..
i see only 2 solutions:

- implement all nested words as parser words
- figure out a way to unify parsers and macros

what about this: allow the use of the syntax '[' and ']' as a macro
quoter, but write words like 'ifte' in terms of Forth, instead of the
other way around.

again: i'd like to have an explicit compilation/macro stack lying
around, however, quoted macros are nice to have. this is
non-orthogonal, but does it really matter? i don't know what to think
about this..

Entry: Haskell
Date: Sat Oct 13 15:32:18 CEST 2007

I've been looking for an excuse to use Haskell for something
non--trivial. The demodulator (and, unrelated, the iirblep filter)
might be a good problem to tackle. OTOH, the real exercise is
probably to write a prototype in Scheme, test it, and then write a
specific compiler to translate that algorithm into C or Forth. So
maybe best the demodulator in scheme (see filterproto.ss) and iirblep
in Haskell?

Entry: the purrr slogan
Date: Sat Oct 13 18:45:18 CEST 2007

in order to explain what purrr actually is, it is best to set out
these points:

* Purrr is a macro assembler with Forth syntax. It is implemented in
  a purely functional compositional macro language.

* Because of the similarity of the procedural Forth language and its
  metaprogramming language, most metaprogramming can be done by
  partial evaluation, blurring the distinction between the concrete
  procedural language and the ephemeral macro language. In a sense:
  PE is not just an optimization, but an *interface* to the
  metaprogramming language.

* The PE is implemented as greedy pattern matching macros (is this
  important?)

Entry: removed from purrr.tex
Date: Sun Oct 14 16:52:10 CEST 2007

\section{The Big Picture}

Purrr can be used in its own right, but it is good to note that Purrr
is part of the Brood system, which is an experiment to combine ideas
from Forth, (PLT) Scheme and compositional functional languages into
a single coherent language tower. Purrr can be seen as an
\emph{introspective boundary} in this language tower: the core of
Purrr is to be the basis of this language tree, but the scope of
Purrr is limited to a low--level language with Forth syntax and
semantics and some meta--programming facilities disguised as Forth
macros. For example, it is not possible to access the intermediate
functional macro representation directly from within Purrr at this
moment; this still requires extension of the compiler itself using
the Scheme and CAT languages.

This separation between the Purrr language and its implementation
serves to keep the programmer interface to Purrr as simple as
possible, while the details of the language tower are worked out to
eventually lead to a more coherent whole. Purrr by itself is
reasonably coherent, although it is somewhat limited in full
reflective power by this language barrier. Eventually, Purrr should
be just an interface (with Forth syntax) to the low level core of the
compositional language tower in Brood.

Because Purrr is implemented only for the Microchip PIC18
architecture, there is no tested \emph{standard} machine layer: most
functionality is fairly tied to the PIC18. I am confident however,
that refining the split of the current code base into a shared and a
platform specific component is fairly straightforward. Due to the
ease of creating an impedance match in a Forth--like language, I am
refraining from an actual specification of this standard layer until
the next platform is introduced. By consequence, the border between
the machine model and the library might shift a bit.
Purrr's macro system is the seed for a declarative functional
language. Such a language would have no explicit macro/forth
distinction as in Purrr.

Entry: new ideas from doc
Date: Sun Oct 14 16:52:21 CEST 2007

It looks like things are getting cleaner: by taking this partial
evaluation thing seriously, CAT primitives can be largely
eliminated. Just the words >m and m>, together with some stack
juggling words like m-swap, are enough to implement the whole
language. I just need to clean up a bit more so this idea can be
sealed as a property: no primitives except for a stack!

For documentation purposes it might now even be a good idea to write
most code in compiler.ss and pic18-compiler.ss in Purrr syntax,
leaving only the true primitives in s-expr syntax. EDIT: that's a bad
idea until the forth syntax can represent everything the s-expr
syntax can.

The remaining cleanup brings me to the backtracking for/next
implementation. With just quoted macros and a 'compile' that executes
macros, this can be removed from the primitives.

Entry: writing lisp code in emacs
Date: Mon Oct 15 01:38:51 CEST 2007

watching the slime screencast

* insert balanced paren: M-( with prefix arg

Entry: quoting macros
Date: Mon Oct 15 17:10:44 CEST 2007

Apparently, it was already implemented. I rewrote the for/next
backtracking so now it's expressed as recursive macros, except for
the part that tests the data structure constraint. I guess what i
have now is that compositional language forth dialect. The only
problem is that my Forth parser doesn't support it. I just need to
write some macros to transform code that uses literal quoted macros
into other constructs. Start with ifte:

  ;; Higher order macros.
  (([qw a] [qw b] ifte)
   ((insert (list (macro: if 'a compile else 'b compile then)))))

/me got big smile now :)

Entry: practical stuff : starting a new project
Date: Wed Oct 17 14:13:13 CEST 2007

I need to make my old 18F452 proto board work again, so this entry is
a seed for a "getting started" doc: how to get from nothing to a
working project. EDIT: i'm switching to a 18F2620, so doing it over
again.

Assumptions:

* the project is part of (your branch of) the brood distribution
* you're using darcs version control

1) Make a directory in brood/prj, and add it to darcs

     cd brood/prj
     mkdir proto
     darcs add proto

2) Copy the following files from another project, i.e. prj/CATkit,
   and add them to the darcs archive

     cd proto
     cp ../CATkit/init.ss .
     cp ../CATkit/monitor.f .
     darcs add *

3) Edit the init.ss file to reflect your project settings.

skip steps 4-6 if you have a chip with a purrr bootloader

4) Edit monitor.f for your chip

   That file includes the support for the chip in the form of a
   statement:

     load p18f2620.f

   Look in the directory brood/pic18 to see if such a file exists. If
   it does, go to step 5). If not, you need to create one and
   generate a constants file from the header files provided by
   Microchip. I.e.:

     cd brood/pic18
     ../bin/snarf-constants.pl \
        < /usr/share/gputils/header/p18f2620.inc \
        > p18f2620-const.f

   The .INC file can alternatively be found in the MPLAB
   distribution, in the MPASM directory.

   Now you need to create the setup file for the chip. Start from a
   chip that is similar

     cp p18f1220.f p18f2620.f

   And edit the file to reflect changes necessary for chip startup
   and serial port initialization. Don't forget to add the files to
   darcs, and send a patch!
     darcs add p18f2620*.f
     darcs record -m 'added p18f2620 configuration files'
     darcs send --to brood@zwizwa.be http://zwizwa.be/darcs/brood

   In case you can't send email from your host directly, replace the
   "--to brood@zwizwa.be" option with an "--output darcs.bundle"
   option and send the resulting darcs.bundle file.

5) To compile the monitor, in the interactive console type this:

     project prj/proto
     scrap

6) Make a backup copy of the monitor state

     cp prj.ss monitor.ss

   And flash the microcontroller using the monitor.hex file. In case
   you're using the ICD2 together with piklab, the command line would
   be:

     piklab-prog -t usb -p icd2 --debug --firmware-dir <dir> \
        -c program monitor.hex

   Here <dir> is the directory containing the ICD2 firmware, which
   can be found in the microchip MPLAB distribution.

7) Next time you start the console, go back to the project by typing:

     project prj/proto

8) Now you can start uploading forth files using commands like:

     ul file.f

   This will erase the previously uploaded file and replace it with
   the new one. If you want to upload multiple files, use the 'mark'
   word after upload to prevent deletion:

     ul file1.f
     mark
     ul file2.f

   Now the next 'ul' will erase file2.f before uploading a new
   file. To erase files manually, use the 'empty' word.

--- LIVE MODE ONLY ---

  bin/purrr
  project prj/CATkit
  ping

Entry: this is a simultaneous fix/todo log for the previous entry
Date: Wed Oct 17 14:28:38 CEST 2007

- add default entries to dictionary on init
- single baud rate spec? mine it from forth source, or the other way
  around..
- standard naming for the state file?
- for chips that come with a bootloader: need to save the pristine
  file
- fix state file rep so it is a standard s-expression tagged with
  'project'
- fix absolute path
- add 'serial' tag to port
- add a 'chip erase' or a fake one using "mark empty"

the 3 different state files:

- init.ss     "most empty" state
- monitor.ss  state file of bootloader only
- prj.ss      current state

these names are set as default, but can be overridden. ok. done.
'monitor.ss' is never written by the application, so ppl with just a
monitor.ss file can revert to just that file (not implemented yet).

Entry: operations on dictionaries
Date: Wed Oct 17 15:34:26 CEST 2007

I'm trying to factor the dictionary operations a bit. I already ran
into 'collect' which takes a list of tagged pairs, and collects all
occurrences for each unique tag. Doing this stuff purely functionally
becomes difficult if performance is an issue: naive algorithms are
quadratic. Hash tables could accelerate this. It seems overall that
mutation is the thing to choose here..

Trying to write these hierarchical combination things i'm getting
convinced that it's a bit of a mess.. (name . value) pairs are well
defined, but hierarchical structures require polymorphy. To make the
analogy with ordinary functions, basically you're dealing with a
function that maps a value to a value OR another function.. Maybe the
whole abstraction is broken? I need to think about this.. something
profound seems to be hidden here. I'm going to hack around it for
now.

I think I get it.. and it's trivial again. A hierarchical hash table
(HHT) is an implementation of a finite function which maps tag
SEQUENCES to values. All operations on HHTs have the semantics of
operations on finite functions. From this it follows that paths need
to be created if a value is stored. It doesn't make sense to have to
create the directory before storing a value. Otoh, storing a value in
a tag sequence where one of the top nodes is not a hash is an error.
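a minimal sketch of exactly that semantics (modern racket hash table
names, for illustration; not the actual brood code):

  ; hierarchical hash table: a finite function from tag sequences to
  ; values. storing creates intermediate nodes on demand; hitting a
  ; non-hash in the middle of a path is an error.
  (define (hht-set! hht path value)
    (if (null? (cdr path))
        (hash-set! hht (car path) value)
        (let ((node (hash-ref hht (car path) #f)))
          (cond
            ((hash? node) (hht-set! node (cdr path) value))
            ((not node)   ; create the path on demand
             (let ((new (make-hash)))
               (hash-set! hht (car path) new)
               (hht-set! new (cdr path) value)))
            (else (error "not a node:" (car path)))))))

  (define (hht-ref hht path)
    (if (null? (cdr path))
        (hash-ref hht (car path))
        (hht-ref (hash-ref hht (car path)) (cdr path))))

  (define d (make-hash))
  (hht-set! d '(macro pic18 dup) 'some-macro) ; creates (macro pic18)
  (hht-ref  d '(macro pic18 dup))             ; => some-macro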
Entry: PIC write protect
Date: Wed Oct 17 20:20:23 CEST 2007

write protection works well and all, but i can't get it undone! i
think it works without problem in mplab, but using the piklab
programmer, erasing the chip doesn't seem to work... what is needed
is a full chip erase. it doesn't look like piklab is doing this
correctly. on to installing mplab again..

OK i got it: memory that's protected requires a BLOCK ERASE, and such
an operation needs Vdd > 4.5V

Entry: macro nesting
Date: Thu Oct 18 14:15:39 CEST 2007

time for the hairy problem: the syntax-rules -> syntax-case
equivalent for macros. what do i need?

there is only one decent way of doing this: use scheme
metaprogramming. i like forth and all, but for numeric stuff, it's
just easier to have variable names.. let's invent some new construct:

  \ load a scheme file implementing macros
  load-scheme filename.ss

been hacking a bit, but i need a plan..

* s-expression files contain scheme expressions, not forth files with
  s-expression syntax. this effectively needs a scheme parser down
  the line, something that can convert the inline atoms to a proper
  invocation.

what about this: make it possible to load plt modules from
forth. modules are stored as a single s-expression. hmm... again..
some questions:

* how to store a module definition in the state file, so it can be
  instantiated?

all macros in the '(macro) dict get evaluated using def-macro!, which
does:

  (define (def-macro! def)
    (ns-set! `(macro ,(car def))
             (rpn-compile (cdr def) 'macro:)))

rpn-compile evaluates `(macro: ,def) so this won't work to store
modules. it's probably best to represent the macros differently in
the state file, so it's just scheme code, and then create a module:
evaluator. that's the main problem:

* how to store the source of things that generate macros, in this
  case a scheme module, so they can be re-instantiated from the state
  file.
* do this without introducing ANY limit on what can be included in
  the scheme file.
* without introducing yet another special case. in fact it's probably
  better to remove a special case driven by this requirement.

let's go back to how macros are parsed. ok. they are included as a
(def-macro: . ) expression in the atom stream. i guess this needs to
change to include a (def-module: . ) form. why not change the
def-macro: thing to a more general def-scheme: syntax?

  (def-macro: name . body) -> (def-scheme: (def-macro! name body))

or.. have def-macro! support modules. i guess that's the simplest
way. ok.. changed the tag to "extend:" and changed the function that
implements the extension to "extend!"

i'm running into some bad behaviour.. need to formalize

Entry: forth translation
Date: Thu Oct 18 16:58:21 CEST 2007

Time to formalize the forth parsing. Some notes:

- it's actually just a lexer: no nested structures are handled in
  this stage: all is passed to the forth macros, which use the macro
  stack to compile nested structures.
- FILE: the first stage does only file -> stream conversion. this
  includes loading (flattening the file hierarchy)
- PARSE: the second stage does 'lookahead' parsing: all
  non-compositional constructs get translated to compositional
  ones. this also includes macro definitions.

The problem I run into is the FILE stage, which also needs to inline
scheme files, but gets messed up by the forth parser. I just need to
tag them differently.

Entry: error reporting
Date: Thu Oct 18 22:26:25 CEST 2007

using 'error' instead of 'raise' is a good idea since continuation
marks are passed.
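for reference, the mechanism in a nutshell (racket names as they are
today; illustration only):

  ; 'error' raises an exn structure that carries the continuation
  ; marks at the raise point, so a handler can recover something
  ; resembling a trace:
  (with-handlers
      ((exn:fail?
        (lambda (e)
          (continuation-mark-set->list
           (exn-continuation-marks e) 'word))))
    (with-continuation-mark 'word 'outer
      (list (with-continuation-mark 'word 'inner
              (error "time-out")))))
  ; => (inner outer)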
the rep.ss struct marks CAT words, so something resembling a trace
can be printed. the cosmetics can be done later, this is good enough
for now. done. maybe i want to convert some exceptions that are clear
enough back to raise so they don't print a stack trace.
(reserved-word time-out)

Entry: hardware prototyping
Date: Fri Oct 19 11:22:00 CEST 2007

TODO:

- sine wave generation
- debug network
- connect a modulator and a demodulator

the first one seems rather trivial to me, so let's do the network
today. first thing is to give up on an ad-hoc bus: that's ok for
uni-directional stuff, but bidir is a pain. so let's go for something
standard.

i got the samples in yesterday. got them running on the breadboard
with intosc. if we can pull off the project on 8MHz, we can run on
2xAAA cells: the 18LF2620 needs only 2V, but it needs 4.2V @
40MHz. i'm going to stick to intosc for now.

next: I2C

* 2 lines are used: RC3 = clock, RC4 = data. these need to be
  configured properly by the user. on the 18F2620 their only other
  function is digital IO.
* registers:
  - SSPBUF = serial data I/O register
  - SSPADD = device address
  - SSPCON1, SSPCON2, SSPSTAT = control registers
* errors:
  - write collision
* firmware controlled master mode: seems it's just more work, so
  never mind..

Entry: TODO
Date: Sat Oct 20 13:30:29 CEST 2007

- get I2C working between 2 18F2620 chips on the breadboard at
  intosc, as fast as possible.
- fix purrr.el : stupid broken indentation is annoying the hell out
  of me. clean up the file first, then automate the indentation rules
  generation etc..

Entry: message passing interface
Date: Sat Oct 20 14:00:30 CEST 2007

Since I2C is a shared bus architecture, care needs to be taken to
place operations in a sane highlevel framework. The interface i want
is asynchronous message passing. Messages should either be bytes, or
a sequence of bytes (in which case 'message' contains the size, and
the a/f regs contain the message)

  message address i2c-send

Let's suppose for now there is only a single process per machine, and
build multiple process dispatch on top of single process. To do this
bi--directionally, an event loop needs to poll for
messages. Dispatching of highlevel messages (internal addresses) can
be done as a layer on top of single message passing.

So i need a send and a receive task, and i have to make sure they
don't collide:

* it's always possible to RECEIVE, so that should be the background
  task. this simply waits until a message arrives.
* it's only possible to SEND if the bus is free, so a SEND might
  block.

The problem is that a message might come in while waiting to send out
a message. Therefore messages need to be queued. The moral of the
story: a send can never block a process, only a receive can.

So what is a task? It is a function that maps a single input message
to zero or more output messages. The output can be zero in a
meaningful way, because the task has internal state. So basically, a
task is a closure, or an object. The driver routine can be a single
task, since the hardware is half-duplex. See pic18/message.f for the
implementation attempt.

Something to think about: the ISR needs to be completely decoupled
from the tasks that generate output messages. This is the whole point
of buffering: if there is straight line code from RX interrupt ->
computation task, the tasks that might run a long time will not be
pre-empted. So: the RX ISR and the dispatch loop are distinct.

what it looks like (yes i need to pick up hoare's book again..)

\ Message buffering for a shared bus architecture.
\ The topology looks
\ like this:
\
\                            wire
\                             |
\                             | G
\          A             E    v     F
\ wire ----> [ LRX ] ----> [ LTX ] ----> wire
\               |             ^
\ . . . . . . . | B . . . . . | D . . . . . . .
\               v C           |
\             [ HRX ] ----> [ HTX ]
\
\ Code above the dotted line runs with interrupts disabled, and
\ pre--empts the code below the line. Communication between the two
\ priority levels uses single reader - single writer buffers. The 6
\ different events are:
\
\ A) PIC hardware interrupt
\ B) RX buffer full condition
\ C) TX buffer full condition (execute task which writes to buffer)
\ D) wakeup lowlevel TX task from userspace
\ E) wakeup lowlevel TX task from kernelspace
\ F) PIC hardware send
\ G) wakeup lowlevel TX task from bus idle event
\
\ A task is an 'event converter'. The 4 different tasks are:
\
\ LRX) convert interrupt (A) to rx buffer full (B) and tx wakeup (E)
\ HRX) convert rx buffer full (B) to tx buffer full (C)
\ HTX) convert tx buffer full (C) to tx wakeup (D)
\ LTX) convert wakeup (data ready: D,E) to hardware send.
\
\ The pre--emption point is A: this causes no problems for the
\ low--priority task because of the decoupling provided by the
\ receive buffer. The only point that needs special attention is the
\ LTX task, which can be woken up by the different events D, E and G,
\ and care needs to be taken to properly serialize message
\ handling. To do this, both D and E should invoke LTX with
\ interrupts disabled. For E this is trivial: just call the LTX task;
\ for G it is already ok since it's an isr, so D needs to explicitly
\ disable interrupts.
\

Entry: todo today
Date: Sat Oct 20 15:52:49 CEST 2007

- write highlevel buffer code and try it out with the current serial
  before moving to I2C
- write mini 'hierarchical time' tutorial for sheepsint
- check mail just sent to technocore for details of the next couple
  of days.

haha.. did none of them :) i suck at planning. what i did do is to
write a synth tutorial that is an introduction to the hierarchical
time thing + some explanation of a pattern language. what this doc is
leading me to is the need for some kind of dynamic variable binding
for code words: i already have 'hook.f' but something more general
should be used. something which directly deals with variables.

Entry: re-inventing C++
Date: Sat Oct 20 16:11:22 CEST 2007

i'm running into the need for polymorphy: i want to express generic
algorithms in a sane way. because of the philosophy of purrr, this
has to be done in a static way, with dynamic built on top of that
later maybe. oops, this is going to lead to a whole lot of doubts
about namespace management.. Let's concentrate on the practical
issues first. EDIT: i'm going for name mangling.. see below.

Entry: hierarchical time
Date: Sat Oct 20 18:13:38 CEST 2007

One thinking error i made is: if a note word is SYNC followed by
CHANGE, then you can't compose words that start at the same sync. as
a result, SYNC needs to follow CHANGE, and the toplevel invocation
needs to provide proper synchronization.

Entry: the 'i' stack
Date: Sat Oct 20 23:14:45 CEST 2007

what about this: i'm using an extra byte stack, and 'x' is a symbol
that's useful in other contexts.. why not call the stack the 'i'
stack, since it's already used as a loop index in for .. next loops?
hmm.. great idea, but not really feasible without an automated
identifier replace.. it's everywhere.

Entry: dynamic words
Date: Sun Oct 21 00:50:39 CEST 2007

basically, i need to find words to properly handle execution tokens.
there are 3 uses for a symbol related to dynamic code:

* declare
* invoke
* change behaviour

if it's a variable, invocation will be explicit. because i don't want
the thing on the stack, an extra level of indirection should do it:

  2variable BLA
  BLA invoke
  : changeit BLA -> ...... ;

another possibility is to use a parser word, which i'm not so keen on
using. which syntax is better depends on the usage: do invocations
dominate, or do behaviour changes? i used the "->" word in ForthTV to
set the display task: that's a single vector, invoked in only one
place, but mutated in a lot of places. let's go for this
approach. results in vector.f (hook.f is basically the same, left
there for forthtv)

Entry: todo
Date: Sun Oct 21 16:15:32 CEST 2007

- hierarchical time
- highlevel buffer code (requires some polymorphy)
- tonight: fix purrr.el, clean up stuff in doc/

Entry: hierarchical time
Date: Sun Oct 21 16:37:48 CEST 2007

so what's the problem? you want to have a class of words which "snap
to" a timing grid, but you want to be able to call a collection of
fine scale words from coarse scale words, without messing up the
sync. the problem is that if you do:

  : foo 8 sync-tick bar bar ;
  : bar 7 sync-tick .... ;

there are too many waits: "8 sync-tick" followed by "7 sync-tick"
waits for the next 7-scale tick. somehow the sync word needs to know
that the current time is already ok. either:

* assume that the caller does the outer bounds, and have callees do
  only subdivision. this works, but is cumbersome.
* find a way to see that we're running synchronized.

how can a 0->1 transition in bit n be recognized in the bits < n?
they're all 0. but that's not very helpful. damn i need coffee.

the question to ask is: did we recently sync? this can be answered by
copying the whole counter register to some place, and computing the
diff. this also allows triggering on edges.

what about this: use some dynamic scoping for syncing. there is only
one word 'sync' which will synchronize on clocks given the current
time scale. for each time scale one needs a word that can compute the
current phase count. this needs a bit offset and the last sync
point. the bit offset might be easily stored as a bit pattern.

global: the counter, the last sync point

  \ compute time difference from last saved sync point, using mask to
  \ ignore fine scale.
  : sync-diff sync-counter @ sync-last @ - sync-mask @ and ;
  macro
  : sync-inphase? sync-diff nfdrop z? ;
  forth

actually that doesn't solve anything.. it's quite easy to wait until
a condition changes, but it's a lot less easy to determine whether
the condition just happened. really, the only thing i see is to have
patterns like this:

  _|_|_|_

which can be nested in larger scale patterns like

  _______|_______|_______|_______
  _|_|_|_ _|_|_|_ _|_|_|_ _|_|_|_

there the first and last syncs are removed, and only the subdivision
is synced to. it's then the responsibility of the caller to turn
things on and off. it looks to me that this is a real pain to work
with.. maybe i should just write a couple of words and see if it's
actually sane to get something working..

one thing i thought about was to bind the current sync level to the
word "|"

  : hihat [[ noise 10 for | next ]] ;

where the [[ and ]] save and restore the synth config on the x
stack. that's 7 bytes per level, which is a bit too much probably, so
stick with manual saving/restoring.

ok. there really is only one decent solution: escape continuations.
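in scheme terms the idea is just this (call/cc sketch; on the target
this needs real multitasking support):

  ; run a forever-looping 'virtual sample' word, but let the caller
  ; decide how long it lasts: abort via an escape continuation.
  (define (run-for-ticks word ticks)
    (call-with-current-continuation
     (lambda (escape)
       (let loop ((t 0))
         (when (= t ticks) (escape 'done))
         (word t)                ; one synchronized step
         (loop (+ t 1))))))

  (run-for-ticks (lambda (t) (display t) (display " ")) 8)
  ; prints 0 1 2 3 4 5 6 7, then => done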
in order to make proper use of synchronization, the caller needs to
indicate how long a word is allowed to last. now, instead of that,
think of there being only one voice at all times, which simply
accepts events from a separate entity. so the synth looks like:

  [ CONTROL ] -> [ VIRTUAL SAMPLE PLAYER ] -> [ CORE SYNTH ]

each virtual sample is a word that loops forever. this requires
multitasking.

Entry: generic functions
Date: Tue Oct 23 16:09:09 CEST 2007

When trying to implement the buffer algorithm, i ran into the need
for abstract objects: each buffer (queue) is going to have the
following interface:

  read
  write
  read-ready?
  write-ready?  (maybe.. in case a buffer-full condition is used..)

I have enough with a static object system: anything dynamic has to be
handled explicitly on top of that using byte codes (route) or
vectored words. So what is needed is simply a static (compile time)
method dispatch.

Should there be special syntax for messages, or do we just use a
single flat namespace, with some words dedicated as messages? For
example: 'read' could be such a message: always requiring a literal
object. This seems simplest, let's try that first and change it if it
is not appropriate. So:

- WHERE is 'read' defined
- HOW is 'read' defined

Suppose we use a 'method' keyword for creating new methods. This
probably trickles down to making the parser also generic. Let's use
CLOS terminology.

So what am I doing? I am providing a means for static namespace
management so I can write generic algorithms (as macros). As of this
point NO effort is made to implement dynamic generic algorithms: this
should be built on top of the static version. My approach is going to
be very direct: if more abstraction is needed i will fix it
later. Currently multiple dispatch is not yet implemented. The
interface should be:

  class BLA                      \ create a new object (a macro namespace)
  method FOO                     \ declare a new method
  object BLA method: FOO ... ;   \ define a new method FOO of object BLA
  BLA FOO                        \ invoke method FOO for object BLA

So, how to implement.. This was the easy part:

  ;; Dictionary lookup.
  (([qw tag] [qw dict] dict-find) ([qw (dict-find dict tag)]))

Now the thing to do is to store the dictionary somewhere. This has to
mesh with the macro definition part of purrr.. let's see (using
s-expr macro definition syntax on the rhs)

  class BLA   ==  (BLA '())
  method FOO  ==  (FOO 'FOO dict-find compile-message)

Here 'compile-message' depends on what exactly is stored in the
dictionary: macro objects or a mangled symbol. It's tempting to just
go with symbol mangling: that way ordinary syntax can be used, and
the interface to the rest of the language is really straightforward.

Let's go for the simple symbol mangling, which doesn't even need
dictionaries: a class is a collection of methods. classes are
identified by a symbol. a method is a macro which dispatches to
another macro based on the symbol provided.

  class BLA    ==  (BLA 'BLA)
  method FOO   ==  (FOO 'FOO dispatch-method)
  : BLA.FOO ... ;
  FOO BLA      ==  'FOO 'BLA dispatch-method

Entry: problem in macros defined in forth syntax: quote doesn't work properly
Date: Tue Oct 23 17:43:02 CEST 2007

suppose i want this:

  : broem ' broem ;  ==  (broem 'broem)

how to do that? currently this just gives an infinite expansion
because the quote is not recognised. why? because inside the
'definition' parser, the parsing words won't work.. this is probably
a good thing, but quote does need to work.. let's separate parsing
words from quote parsing. the lex stream should be made a bit more
clear.
  FORTH -> [load flattener]
        -> [forth stuff: parsing words + definer environments]
        -> [quoting] -> SEXP

Entry: locals for macros?
Date: Tue Oct 23 20:08:46 CEST 2007

Once more than 50% of a macro's code is stack juggling words,
something needs to be done about it. The macro below is a typical
'multi-access' pattern: an EXPANSION instead of a CONTRACTION.

  \ transfer bytes from one object to another
  macro
  : need not if exit then ;
  : m m-dup m> ;
  : transfer-once \ source dest --
      swap >m >m
      ' ready? m msg need m-swap
      ' ready? m msg need m-swap
      ' read m msg m-swap
      ' write m msg m-swap
      m-drop m-drop ;
  forth

What i really want is a locals syntax for macros that perform a lot
of expansion:

  : transfer-once { src dst }
      ' ready? src msg need
      ' ready? dst msg need
      ' read src msg
      ' write dst msg ;

The macro system already has a syntax for locals, so i just need to
add this to the parser + choose the right semantics (code or data).

EDIT: also, what about just . (dot) for the name binding operation?

Entry: locals
Date: Tue Oct 23 21:53:04 CEST 2007

Actually i did this before. I guess in brood-2 there's a syntax that
takes words like this:

  (a b | a b +)

Resembling Smalltalk's syntax for anonymous functions. i just saw
Factor also uses the vertical bar. What i could do is to combine this
with my special quoting syntax:

  (a | a)   ==  execute
  (a | 'a)  ==  identity

Following the rationale that words are mostly functions, and constant
functions are the exception. This kind of syntax took me a while to
get used to, but it makes a lot of sense: it has led to a lot of
simplified mixing of scheme and cat code. So what about combining
that with destructuring?

  ((a . b) | 'a 'b +)

Hmm.. Let's leave that as an extension. There's no reason not to
however..

I think I need a dose of good old fashioned confidence to go for the
quoted approach. What is more important: to stay true to the fact
that symbols are functions, or to go for the lambda-calculus approach
of using symbols as values + explicit application? Even though it
looks strange, the issue is: do i stick with my previous realization
that this is a good thing despite its strange look?

so the choice is either (classic):

  (a b | a b +)    ==  +
  (a | a execute)  ==  execute

this has the interesting property that permutations are easily
expressed. or do i go with my approach:

  (a b | 'a 'b +)  ==  +
  (a | a)          ==  execute

What I could do is to use 2 forms of binding, and i guess that's what
i did before: have | do the stuff above and || do the normal thing,
or the other way around.

  (a : a)   ==  execute
  (a | a)   ==  id
  (a : 'a)  ==  id

using the ':' has the added benefit of reminding you of a
"definition".

Entry: lambda
Date: Wed Oct 24 23:11:02 CEST 2007

Having had a night to sleep on it, i think it's going to be:

  (a b | a b)  ==  id

* Lambda is simply too important to gratuitously do differently.
* Data parameters are used more than function parameters, which in
  turn are easily quoted.
* It is compatible with the current stack comment notation.

Entry: implementing lambda
Date: Thu Oct 25 13:20:50 CEST 2007

apparently i need to be careful where to introduce local variables in
the syntax expansion. as long as there's a lambda expression
enclosing a (xxx: a b c) macro, all lexical variables are identified
properly, but in this case they are not:

  (define (bar? x) (eq? '\| (->datum x)))

  (define (represent-lambda c source)
    (let-values
        (((formals pure-source)
          (split-at-predicate bar? (syntax->list source))))
      #`(make-word
         '#,((c-language-name c) c)
         (quote #,source)
         (lambda #,(if (null?
                         formals)
                        #'stack
                        #`(#,@(reverse formals) . stack))
           #,(fold (lambda (o e) (dispatch c o e))
                   #'stack pure-source)))))

the 'dispatch' operation doesn't recognize lexical variables yet,
because the enclosing lambda macro hasn't updated the symbols.. so
lambda syntax should be introduced at a higher level. i need a
shortcut, only for macros, and then work up the abstraction if
necessary. the thing to extend is the 'macro:' form itself.

hmm.. i'm making a bit of a mess of it.. the lexical scoping for the
macros is a bit special, and is probably best handled using the
pattern matching transformer stuff: the lexical variables in macros
should be bound to literal arguments in the assembly buffer.

  (a b | a b +) -> (([qw a] [qw b] it)
                    (insert (list (macro: 'a 'b +))))

which is really awkward in the current composition.. it's probably
easiest to make a special purpose matching word as a straight lambda
expression. something like:

  (match stack
    (((('qw b) ('qw a) . rasm) . rstack)
     (let ((a (literal a))
           (b (literal b)))
       (apply (macro: a b +) (cons rasm rstack)))))

Actually.. This is quite universal, except for WHERE to find the
arguments.. Anyways, let's get on with it.

  (make-word 'macro-lex: '(a b \| a b +)
    (match-lambda*
      (((('qw b) ('qw a) . rasm) . rstack)
       (let ((a (macro: 'a))
             (b (macro: 'b)))
         (apply (macro: a b +) (cons rasm rstack))))))

The first macro using lexical variables, in synth-soungen.f:

  macro
  : sync bit | \ --
      begin yield bit tickbit low? until
      begin yield bit tickbit high? until ;
  forth

Subtle, ay :)

Entry: theory
Date: Thu Oct 25 21:07:55 CEST 2007

in order to finish brood.tex, it looks to me that type theory is not
really the most important thing to brush up on: partial evaluation
is. there's a lot of stuff here:

  http://partial-eval.org/techniques.html

i need to give it some proper attention, if only to relate my
intuitions to things people have spent some thought on.

Entry: multiple exit points
Date: Thu Oct 25 21:48:31 CEST 2007

instead of writing macros containing 'exit', which are really a
loaded gun, it might be better to write a proper while abstraction
that uses multiple conditions. unfortunately, an 'and' is not very
easy to optimize..

  macro
  : need not if exit then ;
  : m m-dup m> ;
  : transfer; src dst | \ --
      begin
        ' ready? src msg need
        ' ready? dst msg need
        ' read src msg
        ' write dst msg
      again ;
  forth

why is this complicated: because i don't want to use 'and'. what i
want is a word 'break' which breaks from a loop on a condition. maybe
'transfer;' is good enough: since i already have arbitrary WORD exit
points, i can use this to get any control structure exit point: it
also prevents juggling of the control stack (macro stack).

Entry: move
Date: Thu Oct 25 22:19:09 CEST 2007

for this i need 2 pointer registers. thing is: i'd like to use the x
stack's register to do this a bit efficiently, but then i can't use
for .. next ! implementation detail anyway..

Entry: buffers
Date: Fri Oct 26 13:04:40 CEST 2007

next up are data buffers. i have some code that uses 14 byte buffers
together with some dirty trick of storing read/write pointers in one
byte for easy modulo addressing. i could dig that up again?

what is a buffer?

- 2 pointers: R/W
- base address of memory region (statically known)
- size (statically known)

suppose i represent it as 2 literal values: rw-var offset

see buffer.f for draft (committing now) but.. isn't it wise to write
some code for generic 2^n buffers? where a buffer consists of 2
variables and a mask indicating its size. ok, did that but it leads
to more verbose code.
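for reference, what the generic 2^n buffer boils down to, as a scheme
model of the semantics (illustration only, not the forth code):

  ; 2^n circular buffer: read/write indices free-run and are only
  ; masked on access; the difference (mod index width) gives the fill
  ; count, so no extra cell is needed to tell empty from full.
  (define (make-buf n)  ; size 2^n
    (vector (make-vector (expt 2 n) 0)  ; memory
            (- (expt 2 n) 1)            ; mask
            0 0))                       ; read / write indices
  (define (buf-used b)
    (- (vector-ref b 3) (vector-ref b 2)))
  (define (buf-write! b byte)
    (let ((w (vector-ref b 3)))
      (vector-set! b 3 (+ w 1))
      (vector-set! (vector-ref b 0)
                   (bitwise-and w (vector-ref b 1)) byte)))
  (define (buf-read! b)
    (let ((r (vector-ref b 2)))
      (vector-set! b 2 (+ r 1))
      (vector-ref (vector-ref b 0)
                  (bitwise-and r (vector-ref b 1)))))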
a different strategy could be to store the read pointer or the
difference at the point where W points; this saves a cell that's
normally used to distinguish between empty and full. hack for later..
anyways, i'll stick with the current one: it's probably good
enough. i need to move on. nibble-buffer.f tested.

Entry: 0= hack
Date: Fri Oct 26 13:29:59 CEST 2007

i'd like to figure out a way to efficiently implement the 0= word,
which turns a number into a condition. the problem is that 'drop'
messes up the zero flag, so i used a 2-instruction movff trick
before.. but using drop should be possible when using the carry
flag. hmm.. nfdrop is only 2 slots.. i don't think i can do better
really.

Entry: I2C comm
Date: Fri Oct 26 16:16:20 CEST 2007

how to get this going? the typical 'debug the debugger' problem: I2C
is going to be used for the debugging network, but until that works..

master/slave: to preserve symmetry, it might be wise to use a
dedicated single master node which runs debug code, so all the kriket
nodes can be identical (slaves). ideally, all cricket chips are free
from ICD2 and SERIAL ports, and have only power, ground, and I2C
clock and data.

send/receive: let's stick with the ordinary monitor protocol over
I2C. the thing to do is to make a hub.

Entry: SD-dac
Date: Fri Oct 26 22:07:00 CEST 2007

A Sigma-Delta Modulator (SDM) can be thought of as an
error-accumulation DC generator: given a constant input, it will
generate the correct average DC output, with a quantization error
noise spectrum that is high--pass. A first-order SDM is an extremely
simple circuit: it consists of an accumulator with carry flag output:
at each output instance, the current output value is added to the
accumulator, and the resulting carry bit is taken as the binary
output, and discarded.

I had this idea of running an 'inverse interrupt' machine: instead of
losing time in the ISR, just run an infinite loop, but allow at each
instance one primitive to run, which needs to spend an exact number
of cycles. Probably not worth the hassle, but could be interesting
for a really tight budget. Anyways, this could be an alternative to
PWM for kriket sound generation. It should in theory give better
quality, but probably that also needs a deeper accu. With the fast
interrupt it's only 3 instructions:

  movf  OUTPUT, 0, 0
  addwf ACCU, 1, 0
  rlcf  PORTLAT, 1, 0

assuming it's bit number 0 in the port, and the rest of the bits we
don't care about (i.e. are inputs). the problem here of course is
that it's not just output that counts: the output also needs to be
computed. Looks like it's not really worth it. Best to use a PWM
interrupt with plugin generator code. At 2MHz, getting the carrier
above audible frequencies would put the divider at 64, and the
carrier at 31.25 kHz. (The interesting thing here is that it could
also be used for bit-bang midi output at the same time :)

To get this going: best to add a small modification to sheepsint to
switch it into PWM mode.

Entry: FM sheep
Date: Fri Oct 26 23:16:24 CEST 2007

ok.. let's see what's necessary to make an FM (PM) synth in the style
of the Yamaha oldies. using a proper synchronous fixed time sharing
approach a lot is possible:

  1. 31.2 kHz  1 x SDM output
  2.  7.8 kHz  4 x 8 bit synth voices
  3.  9.7 kHz  32 x envelopes

for all this i have 64 instructions. one envelope per operator is
more than enough. i've been checking out the code for table lookup,
and it can be brought down to 4 instructions

  movf  PHASE
  movwf TBLPTRL
  tblrd
  movf  TABLAT

but i doubt if 8 bit phase resolution will be enough..
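a quick scheme model of that lookup loop, to see what 8 bit phase
buys (sketch; names are made up):

  ; 8 bit phase-accumulator oscillator: phase and increment are
  ; bytes, the table has 256 entries.
  (define two-pi (* 8 (atan 1)))
  (define table
    (build-vector 256
      (lambda (i) (round (* 127 (sin (* two-pi (/ i 256))))))))

  (define (osc inc samples)
    (let loop ((phase 0) (k 0) (out '()))
      (if (= k samples)
          (reverse out)
          (loop (modulo (+ phase inc) 256) (+ k 1)
                (cons (vector-ref table phase) out)))))

the frequency is inc * fs / 256, so at the 7.8 kHz voice rate an 8
bit increment quantizes pitch to steps of about 30 Hz, which is
indeed coarse for a musical scale.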
Entry: hub board
Date: Sat Oct 27 15:07:51 CEST 2007

make a hub board, first for serial, then for I2C. the idea is that a
hub board can be placed in between a normal serial board and a PC
host: its only goal is to provide control over the serial slaves. the
condition is that all slaves have identical code, which means the
host can indeed switch without problems between different slaves:

  [ PC ] --- [ HUB ] === [ S1 ] === [ S2 ] === ...

requirements:

* the interface that implements this should be transparent: there
  should be no need for calling code on the hub directly. (except for
  debugging the hub, where the host has just the hub's dictionary).

to do this, i suggest we use the next slot of 16 interpreter commands
to pass through monitor commands to the hub.

again: if i manage to get things working this way (async serial hub)
i have no need for I2C to do networking.. in fact, in order to get
I2C working i'd better build a proper debug network! and more: if i
get this serial passthrough to work, moving to a synchronous 1-wire
approach should be no problem. ok, i have 50 solutions now.. TODO:

- make it work for serial = standard
- use serial to bootstrap 1-wire
- MAYBE use I2C after that, probably too complicated

Entry: 1-wire revisited
Date: Sat Oct 27 15:47:27 CEST 2007

yes, why not.. it's a cheap hack but might be worth it. and i already
have provisions for it on the CATkit board, so the solution should be
re-usable. (CATkit: COMM is RA4).

let's stick to the ordinary monitor protocol with RPC semantics:
(host asks question, client responds / acknowledges). this is already
half-duplex, so it fits nicely in a shared bus context. a simple
start bit, 8 data bits, stop bit could be used for comm using the
following waveform:

  1 1 X 0
  1 1 X 0

with X the 'shared bus' point, we can have a bidirectional link:

* there's always power in a cycle (at least 50%)
* bus is high when idle
* there's a sync point 0->1 for slave sync
* the send/receive is under software control

the protocol could be something like:

* master just sends (start bit, 8 data bits, stop bit)

for the CATkit board, the sync could replace the fixed TMR2. let's
try the following:

* fix CATkit's no-serial cable detection. (OK)
* drive a CATkit board with a square pulse
* use TMR2 to perform timed read or write

next: config RA4

* open drain output (needs external pullup - master side?)
* does have protection diodes to both sides

so, in theory it should be possible to feed the chip through the
protection diodes.. but as far as i can see, it doesn't boot
properly. after adding a diode RA4 -> VDD it boots on DC. i don't
understand..

so, on to the controller. from the host side, everything is
synchronous, so timing should not be an issue. driving a couple of
busses in parallel poses no extra problems.

hardware: the dallas 1-wire bus apparently drives the targets through
a resistor, instead of a transistor. i was wondering how to prevent
hazards on the bus, and this is probably it: brief inspection shows
that a faulty client can easily bring down a network by shorting
during the charge phase. a resistor also limits the charging
current. so i guess resistors are good. (i wonder if the weak
pull-ups can perform this task.. probably better not.) the pic has
quite a large maximum current sink (25 mA), which would determine the
minimal size of the pullup resistor, i.e. at 5V the minimum is R =
200 ohms.

simplifications WRT dallas 1-wire:

* one slave per wire: no elaborate synchronization protocol
  necessary: all flow control is done in software using the purrr
  protocol.
  (the host initiates a transfer by sending a couple of bytes and
  waits for the reply)

* multiple slaves: they need to be addressed. in that case, some
  protocol is necessary. i.e. addr = 0: broadcast, no
  reply. otherwise: address followed by a couple of data bytes.

* can use a 4-phase regime 10XY, where the receiver samples in
  between X and Y.

* in case no comm is needed: master leaves the line high: no
  unnecessary drain from pulling the resistor low.

using RA4 on the 18F1220. and for sending? can probably use an
18F1220 as a hub too, if it uses just one output. which output to
use? only RA4. maybe one bus is really enough? this way i could use
simple RCA splitter cables to build a network.

ok, i thought i needed an open drain output. apparently not: just
switching between 0 and Z is enough.

Entry: CATkit/krikit debug board
Date: Sat Oct 27 21:39:41 CEST 2007

* in debug mode: one bidirectional power/clock/data line per slave
  (raw byte protocol: no address). this makes it a drop-in for the
  normal async serial io for the monitor. in 'midi' mode the port can
  easily run unidirectional shared. bidirectional shared is a
  software problem that can be solved later.

* using the 18f2620 for the driver. the package is small enough to be
  practical. it can run without xtal at 8MHz and has on board i2c for
  more elaborate networking later on. it has enough pins to add some
  status output. (i.e. RGB led)

* port B is used for communication. RB4:RB7 have interrupt on change,
  so could be used for more elaborate slave comm later.

* running CATkit on a full line through 1k gives a 2V drop = 2mA:
  that sounds about right.

since this is low bw debug comm, it should be possible to just leave
the line idle = high. that means no clock is coming in. so what about
this:

- run CATkit TMR2 at a higher rate, i.e. 31.25 kHz. this would give
  * a decent timebase for SD sample tests
  * a 7.9 kHz bitrate for debug comm
  * the ability to send MIDI data from the CATkit board

I wrote the code for the network debugger. The 4-phase modulator and
the receiver/transmitter framing words are done and tested. The
remaining thing is how to switch between receiver and
transmitter. Probably something like this:

- start with receiver
- receiver gets idle -> check tx buffer -> tx / rx gets data -> start
  rx state machine
- transmitter stop -> check tx buffer -> tx / rx data -> tx state
  machine

these can be taken into one loop, and activated depending on an rx/tx
flag.

Entry: sheepsint urgent todo
Date: Sat Oct 27 23:06:44 CEST 2007

LIST MOVED DOWN

Entry: nasty sub bug?
Date: Sun Oct 28 14:32:32 CET 2007

the following code leads to incorrect asm:

  123 @ 124 @ - dup

  movf  123, 0, 0
  subwf 124, 0, 0

that should be subfw?? the problem is in "123 @ -". took the - and --
words out of the 'binary' meta patterns and fixed it.

Entry: rtx
Date: Sun Oct 28 16:41:44 CET 2007

looks like it's +- working, at least the transmitter. one little
problem still: if a client syncs to a 0->1 transition, what happens
when it picks up in the middle of a data stream? suppose #x55, which
is just a bunch of:

  0100 0111 ...

syncing to the right frame is not a problem: per bit there is only
one 0->1 transition to sync to. so the problem is that each client
should start with an idle line. it's the same problem as async
serial.

so.. receiver for sheep. let's stick with a RX state machine only.
the deal is this:

- interrupt on change: detect 0 -> 1, reset TMR2 + RX state machine

all logic from hub.f can be re-used, except for the top sequencer,
which should be

  route ; ; ; rx-bit ;

Entry: comm on catkit
Date: Sun Oct 28 17:38:26 CET 2007

there are 2 ports left: RA4 RA6. both are not very interesting: no
interrupt on change, or interrupt facility. interrupt pins that can
be reused are:

  RB5 (INT1/TX)
  RB2 (INT2)      not without cutting traces or removing R8
  RB0 (INT0)      not without removing the last pot
  RB5-RB7 (KBI)   multiplexed with switches
  RB4 (KBI0/RX)

can it be done using polling only? i.e. manually synchronize on each
start bit or something. need to think a bit more, but it looks like
manual polling is going to be problematic.

the easiest thing is really RB2/INT2: it's a proper interrupt, and
its functionality is not used atm. maybe i should leave catkit out of
it and try to get it to work on krikit first.. catkit needs an update
anyway, and this could be a nice addition. reminder:

- ditch AUDIO- for INT2
- external rectifier diode
- serial RX 100k pull-down
- fix pot distance
- fix switch distances
- room for LED

Entry: Manchester
Date: Sun Oct 28 19:30:21 CET 2007

i'm wondering whether it's not simpler to use Manchester code. (BPSK
with square waves) symbols are 01 and 10. once synchronized, the
signal can be locked by allowing resync on the fixed transition at
half symbol. syncing can be done on an idle line, all one (10).

catch: for uni-directional with sender = master this works fine, but
bidirectional is problematic.

Entry: eliminating the pullup resistor
Date: Sun Oct 28 19:40:41 CET 2007

In case there's only one slave, the pullup resistor can be eliminated
by using a current-limiting resistor to prevent a short-circuit on
collision.

Entry: slave on krikit
Date: Sun Oct 28 19:59:46 CET 2007

Got one spewing 123, now i need another one listening. The slave uses
RB0 (INT0). Apparently i can't pull the line all the way
down.. Probably on-resistance (i'm pulling down 100 ohm..)

Sequencing is an interplay between INT0 and TMR2:

  INT0 -> reset timer phase + call 'rtx-next'
  TMR2 -> 'rtx-next'

The other one got 123, and some shifted version out of sync. To get
better sync during debugging, bytes could be interleaved with a 10
bit idle preamble. This would guarantee resynchronization after the
first faulty reception.

Entry: strong 1
Date: Mon Oct 29 05:17:06 CET 2007

  --- Vdd
   |
  [Ru]       /--[Rl]-o SLAVE I/O
   |         |
  MASTER o---o------o--|>|--o SLAVE Vdd
                                |
                               === C
                                |
                               --- GND

  0 1 2 3
  0 1 X X

phase 1 is 'strong drive' directly from Vdd, not through a pullup
resistor. this avoids strong sink currents and a large voltage
drop. during phases 0 and 1, MASTER is OUT, also if it's sending in 2
and 3. when receiving, the master is Z, so Ru pulls up the line. a
slave can still mess up by pulling the line high, but the short
circuit is prevented by Rl.

Entry: intermezzo: macro vs. return stack
Date: Mon Oct 29 15:50:14 CET 2007

actually, this is quite simple. if i change the terminology a bit,
compilation of local labels for jumps and run-time control flow using
execute and exit could be unified somehow.

Entry: about named macro arguments
Date: Mon Oct 29 19:46:11 CET 2007

maybe it's better to stick to prefix syntax to not gratuitously move
away from forth syntax. after all,

  : 2@ var | var @ var 1 + @ ;

is not too much different from

  : 2@ | var | var @ var 1 + @ ;

It will also simplify the implementation.

Entry: urgent stuff
Date: Mon Oct 29 20:52:04 CET 2007

time flies. i need to get the debug network running today.
it should not be more than patching the interpreter to the rtx: the
hub should just be a loop that polls the serial port, and possibly
executes some special purpose commands. the slave needs a new
dispatch table connecting rx and tx to the slave rtx. so, todo:

- get this debug patch-through to work: nothing fancy, just repeat
- fix some of the urgent problems

Entry: hub interface
Date: Mon Oct 29 23:25:15 CET 2007

i'd like to do this changing as little as possible. connect to a hub
just like any other project, but there should be a way to execute its
application without needing knowledge about the dictionary of the hub
device. let's change interpreter.f to

  \ token --
  : interpret #x10 min route
      ; receive ; transmit ; jsr ; lda ; ldf ; ack ; reset
      n@a+ ; n@f+ ; n!a+ ; n!f+ ; chkblk ; preply ; ferase ; fprog ;
      e-interpret ;

the last word should lead to a reset if it's not implemented, or to
the interpretation of an extra set of byte codes. in any case, it is
required to be filled in by specific monitor code.

now: if there's no extension implemented, should invalid commands be
ignored or not? there's no proper way to react to invalid commands,
since they can quote the following bytes, leading to a completely
non-interpretable state... just reset is probably good enough.

another problem: if the hub just passes through, how to control it
after switching to passthrough mode? serial break is an option. need
to figure out how to send that in mzscheme though.. there should be a
more elegant solution, but this requires either the traffic to be
quoted, or the new interpreter to actually understand (parse) the
traffic to see what comes through. the latter is not so easy because
of quoted bytes. a better solution is to completely override the boot
interpreter. that way all traffic can be properly redirected.

i guess i'm making it too difficult. the real problem is: this hub
thingy doesn't fit in my debug or run view: the cable can't determine
whether a boot interpreter should be started or not. let's start
there.

next: a name for the protocol.. i'm going for E2: it's the binary
representation of 0 and 1: 0100 0111 = E2 with lsb first.

Entry: the big questions
Date: Tue Oct 30 00:25:54 CET 2007

probably the huge shot of caffeine i got today, but i'm in delusion /
big-idea mode again.. i run into a lot of bootstrap problems. today's
bootstrap problem is debugging the debugger. somehow i think
bootstrapping is really the only significant problem.. it's the
"getting there" that's important practically, not so much the
staying: that should be obvious. i find it a fascinating subject. i
need to read more about it:

* need to play with piumarta's cola stuff: objects and lisp as yin
  and yang (though lisp has its own yin and yang: eval and apply. i
  wonder if this is the case for objects? probably something with
  v-table lookup).
* need to read about 3-lisp and reflective towers
* i'm not so sure if writing a proper language bootstrap is valuable,
  but somehow it looks like yes. brood is a bootstrap exercise
  really. i'd like to end up, not necessarily at scheme, but at a
  dynamic language to run on small machines.. maybe cola is the way
  to proceed?

another thing i need to read up on is partial evaluation and C code
parsing and refactoring, but that's secondary really.. maybe
bootstrap is indeed the only real problem

Entry: parsing again
Date: Tue Oct 30 04:07:36 CET 2007

* added packrat parser code from Tony Garnock-Jones. this should "end
  all _real_ parser woes" when i switch to a different syntax
  frontend.
* for the forth regular parser, i just need to add proper syntax for
  a regular syntax stream pattern matcher: i have no real recursive
  parser need for the forth (really out of principle: to stick to the
  roots and make the language simple to understand. there's something
  to say for a simply parsed language when teaching!)

* the only reason i'm using syntax streams is to be able to recover
  source location information and to use syntax-case. the latter is
  probably not the right abstraction. what i want to say is something
  like:

    (parser-pattern
     (macro forth)
     ((macro forth) ----))

  where the '' is bound to a syntax stream.

from portable-packrat.scm :

  (packrat-parser expr
    (expr ((a <- mulexp '+ b <- mulexp) (+ a b))
          ((a <- mulexp) a))
    (mulexp ((a <- simple '* b <- simple) (* a b))
            ((a <- simple) a))
    (simple ((a <- 'num) a)
            (('oparen a <- expr 'cparen) a)))

i read on the wikipedia page that a packrat parser is necessarily
greedy. i'm not sure in what sense..

Entry: finite fields
Date: Tue Oct 30 07:32:21 CET 2007

http://www.lshift.net/blog/2006/11/29/gf232-5

in 8 bit, the biggest prime is 2^8-5 = 251. i'm not sure what this is
useful for though.. some error checking / correcting stuff? the
article talks about "a" finite field, as if it mostly doesn't matter
which.. ah.. the wikipedia article on coding theory mentions
subspaces of vector spaces over finite fields. a naive way would be
to use i.e. a 3-space in the 4-space over GF(251).

Entry: fixing the assembler
Date: Tue Oct 30 16:35:18 CET 2007

made the dictionary into a parameter to move code from internal ->
external definitions. now i need to abstract away the control flow of
the assembler: eliminate assemble-next.

let's see, the types of control used are:

* comma -> expand to list of instructions
* comma0 -> same, without updating instruction pointer
* register -> dictionary mutation

primitive ones:

* again (retry) -> retry assembly with updated (dictionary?) state

properties:

* assembling an instruction is 1 -> 0, 1 or more

looks like the major difficulty is in assembler operations that
recurse.. currently it's handled by just pushing an opcode in the
input buffer and calling next. i'm going to make this recursion
explicit. how should non-resolved symbols be handled? just returning
the instruction seems best. i do need to fix absolute/relative
addressing. what about leaving restart to the sequencer? the idea is
to provide some expansion to plug-in assemblers (asm-find)

so the point where i need to make some changes is the way 'here' is
used. chicken and egg:

* can't determine 'here' until all previous instructions were
  assembled.
* can't assemble an instruction until it is known how far a forward
  reference is.

what about trying to solve this with backtracking? is that overkill?
maybe backtracking with memoization? maybe assembly itself is cheaper
than memoization :) maybe every instruction should be compiled to a
thunk that takes just the absolute address? hmm.. need some time to
sort it through.. it should be possible to write this in a lazy way..

roadmap:

- get it to work like it did before
- change the implementation of 'here' to a parameter
- create a graph data structure from 'label'
- figure out the control flow for some backtracking like thing
- write some graph opti (i.e. jump chaining)

another remark: having 'labels' as pseudo instructions is bad. they
should really be true graph elements: pointers to instructions.

hmm.. i need a break. lap..
the assembler still goes wrong 'somewhere' :) maybe i should fix the 'here' thing first before trying to get it to run, since it's somehow messed up. i need to start over:

Entry: here kitty
Date: Tue Oct 30 22:54:39 CET 2007

what's with 'here'? now that it's separated out a bit more, it's easy to see it is a bit of a mess: i'm using org-push and org-pop so i can't just eliminate it.. i need to separate these concerns:

  * ORG / ORG-PUSH / ORG-POP = telling where things go. it's easier to cut out the intermediate part and handle it separately. a bit of a crazy way of doing things..
  * 'here' = using self-referencing blabla

i need a proper way of expressing all these dependencies: once the 'org' stuff is dealt with, and the absolute/relative problem is solved, the remaining problem is one of relaxation.

Entry: relaxation problem
Date: Wed Oct 31 00:45:26 CET 2007

some choices need to be made before enough information is present, but instead of completely starting over (backtracking), the form of the problem is such that the intermediate solution can be updated. as long as a complete dependency graph is present, the solution is quite trivial: just recurse over all dependencies. some hints for finding the right data structure:

  1. instructions that do not reference code locations, either as jumps or just as literal words, are irrelevant and can be ignored.
  2. labels point _in between_ instructions
  3. keep the cause of events abstract: any instruction that has a reference can grow.
  4. this is related to functional reactive programming

let's stick to the idea of instruction cells: each cell contains a single symbolic opcode with arbitrary length. thinking of this as cells sending messages to each other, there are 2 kinds of messages:

  - tell the next cell it has moved
  - tell cells that depend on a label they need to update

looks ok at first, but for non-contiguous code that doesn't have a non-decreasing code distance between several nodes, this might not terminate.. if i make sure code never shrinks, this should be ok though.. hmm... i need to read a bit about this. i guess in general it's "linker relaxation". so most important notes:

  * downward updates from a size change can be eliminated
  * to ensure termination, only expand/contract in one direction: that way it will at least stop at the case where all references are expanded.
  * if a size change happens as a consequence of an update

Entry: a more traditional approach
Date: Wed Oct 31 03:43:54 CET 2007

http://compilers.iecc.com/comparch/article/07-01-038

  There is a type of assembler that does exactly the same thing on
  every assembly pass through the sourcecode. Pass 1 outputs to
  dev>nul and is full of phase errors, pass 2 has eliminated most (or
  all) phase errors (output to nowhere) and pass 3 usually does the
  job in 99%+ cases whereupon code is output. On each pass through
  the sourcecode (or p-code in your case) you check for branch out of
  range, then substitute a long branch and add 1 to the program
  counter, causing all following code to be assembled forward+1, then
  make another pass and do the same thing again until no more branch
  out of range and phase errors are found due to mismatched
  branch-target addresses.

That doesn't require my esoteric approach and seems a lot simpler really: just keep it running until the addresses stop changing. So just do as before, but:

  * keep a phase error log
  * use a generic branch instruction which gives short or long branches
  * every pass is completely new
  * split 'old' and 'new' labels, make new labels mutable?
  * put 'here' in a dynamic variable
  * make a quick scan for labels to find out undefined ones

NEXT: prepare assembly code so multiple clean passes are possible:

  - get rid of 'mark' for example.
  - put 'here' in a parameter
  - remove all dictionary manipulations
  - find a way to handle var and eeprom.. maybe a separate pass/filter?

the goal is clear enough.. just some disentangling to do first.. a different approach:

  * use the previous approach, but keep the dictionary after every pass (clean it in between)
  * keep a log of the name registrations to determine phase errors.

Entry: comparators and square waves
Date: Wed Oct 31 05:00:41 CET 2007

before trying anything with sine waves, it makes sense to at least have a go at pure binary signals spanning the entire bandwidth. i'm curious how far i can go in completely eliminating amplification and using only a comparator. i do lose all signal presence detection capability, and amplify noise tremendously. but this does transform everything into a software / filtering problem. i guess with some good codes i can actually get things through..

Entry: shopping for opamps
Date: Wed Oct 31 19:59:42 CET 2007

@ maxim for low voltage rail-to-rail. i can get as low as 2.7V for

  MAX4167  5MHz,   1.3mA   (DUAL)
  MAX494   0.5MHz, 0.15mA  (QUAD)

Entry: name spaces and objects
Date: Wed Oct 31 23:32:41 CET 2007

i'm trying to figure out how to make the name mangling work well enough to create a static metaprogramming interface which supports generic programming at the macro level:

  * write algorithms in macro form
  * instantiate them statically as many times as you need

what i'm really missing is higher level macros. with those, i can build anything i want really.. so why is it impossible to have those? i probably need to give up on forth syntax.. (let me finish my verbose buffer code before i try to answer..)

ok. i don't know really. let's first try to get things like this out of the way:

  : bbf.tx-empty>z  bbf.tx buffer.empty>z ;
  : bbf.rx-empty>z  bbf.rx buffer.empty>z ;
  : bbf.rx-room>z   bbf.rx buffer.room>z ;
  : bbf.tx-room>z   bbf.tx buffer.room>z ;
  : bbf.>tx         bbf.tx buffer.write ;
  : bbf.>rx         bbf.rx buffer.write ;
  : bbf.tx>         bbf.tx buffer.read ;
  : bbf.rx>         bbf.rx buffer.read ;
  : bbf.clear-tx    bbf.tx buffer.clear ;
  : bbf.clear-rx    bbf.rx buffer.clear ;

what i want is just

  ' bbf.tx- compile-buffer
  ' bbf.rx- compile-buffer

i can't even do variables since they are macros..

  * yeah, i need to be able to generate macros
  * and fix name clashes within a compilation unit: both words and macros.

maybe the trick is really to define 'compilation unit' properly? in my current approach, a macro can't pop up during expansion of code. i need to get the philosophy right:

  * a flat namespace is nice for an application: everything is concrete. we're "among friends" and last names are not necessary.
  * it sucks for writing library code

the solution in mzscheme that works for me is functions + modules. a local module namespace can be used for small specialized utility words. i'd like to have something like that in forth. the problem is: i'm taking a really static stance in which macros play a central role, not functions. this works as long as macros are sufficiently powerful, which means higher order macros.

now, let's pull the problems i'm having apart: i wrote some buffer code, which is just macros. to instantiate a buffer one doesn't simply do "bla create-buffer" or something, but it is necessary to specialize a lot of functions manually. that's completely unacceptable.
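as a reference point: here is roughly what the buffer instantiation would look like on the scheme side, where syntax-case can manufacture a whole family of names from one declaration (syntax-rules alone cannot do this). buffer-read, buffer-write and buffer-clear are placeholders of mine, not actual brood words:

  (define-syntax (define-buffer stx)
    (syntax-case stx ()
      ((_ name addr)
       (let* ((base (symbol->string (syntax-object->datum (syntax name))))
              (id (lambda (fmt)
                    (datum->syntax-object
                     (syntax name)
                     (string->symbol (format fmt base))))))
         (with-syntax ((read-word  (id "~a>"))
                       (write-word (id ">~a"))
                       (clear-word (id "clear-~a")))
           (syntax
            (begin
              (define (read-word)    (buffer-read  addr))
              (define (write-word b) (buffer-write addr b))
              (define (clear-word)   (buffer-clear addr)))))))))

so (define-buffer tx #x100) defines tx>, >tx and clear-tx in one go: exactly the kind of name generation the purrr macro layer can't express.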
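and circling back to the multipass assembler from the entries above: the 'run complete passes until the addresses stop changing' idea is tiny when written down. a sketch, where first-pass-labels and assemble-pass are names of my choosing, not the actual brood code:

  ; relaxation as a fixed point: re-run clean passes until the
  ; label table stops moving.
  (define (assemble/relax code)
    (let pass ((labels (first-pass-labels code))   ; quick scan: collect labels
               (n 0))
      (let ((labels* (assemble-pass code labels))) ; full clean pass
        (cond ((equal? labels labels*) labels*)    ; no phase errors left
              ((> n 100) (error 'assemble "phase errors do not converge"))
              (else (pass labels* (+ n 1)))))))

termination hinges on the branch choice being monotonic: if a branch is only ever widened (short -> long) and never shrunk back, addresses are non-decreasing between passes and the loop has to stop.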
Entry: higher order macros
Date: Thu Nov 1 01:06:36 CET 2007

In order to solve some particular template problems, i'd like to have higher order macros. this amounts to, instead of splitting up a source file as

  MACROS -> PROCEDURES

splitting it up as

  ... -> MACROS^2 -> MACROS -> PROCEDURES

of course, there should be no limit to the tower. The real problem is: i have no sane syntax space left! In macros i can do this:

  macro
  : make-a-123 ' a-123 *: 123 exit ;

which is already pretty ugly because of quoting issues. But what am i going to invent to make higher level expansion work?

One thing is sure: taking out the reflection (making macros side-effect free) killed the possibility of generating names at compile time, EXCEPT for function labels. But those are really just data: it's a hack that doesn't really count. So I have a GOOD THING: independent declaration instead of sequential variable mutation for creating new macro names, that causes a BAD THING: limited reflection due to improper phasing. Actually, I already knew that, but i'm starting to feel it now: artificial limits are no good. Even if they serve a higher goal.. Maybe that makes them not artificial?

The limit i created is actually there for a reason: to use partial evaluation to make it possible to perform compile time operations without the need for an explicit meta language: without the need for quotation like `(+ ,a ,b) or its beefed-up syntax-case / syntax-rules variant. (funny how the only 'meta' part of the language is the macro stack: it punches holes in reality somehow ;)

So let me pat myself on the back:

  * the current macro / forth thing is GOOD. it is easy to use, easy to understand, and avoids most quotation issues that arise in practice by relying on partial evaluation. it gets pretty far without the need for an explicit metalanguage.
  * it's NOT GOOD ENOUGH because it's the top level: it can't be metaprogrammed itself!

The metaprogramming operations i'm looking for are those that create new macro NAMES. Creating new macro BODIES should not be so terribly hard: it is in fact what should be used for the quotation based language. So the core of the business should be the question why this works in scheme:

  (define-syntax make-macro
    (syntax-rules ()
      ((_ name body)
       (define-syntax name
         (syntax-rules () ((_) body))))))

  box> (make-macro bla (+ 1 2))
  box> (bla)
  3

Entry: poke
Date: Fri Nov 2 04:27:04 CET 2007

i'm taking a day off.. so technically i'm not allowed to write in this log. however, i got into PF today, and wrote a rant on the PF list about mapping and partial evaluation. maybe it's time to start writing poke, or nailing down the requirements.

the idea behind poke is to have a machine model for DSP like tasks that can be set up (metaprogrammed) by say a scheme system. the idea behind an application is this:

  1. a program is compiled for a VM.
  2. a new VM is instantiated (on a separate core/machine)
  3. the VM now runs in real-time: doing its own scheduling and stack based memory management, being able to communicate with its host system and other VMs

each VM is a linear stack/tree machine. i'd like to do this without writing a single line of C code: have it all generated. that's the only way to be serious about generating *some* code. it should have an s-expression interface with which it talks to a host scheme system. this acts as message passing: no shared state allowed. this syntax should have an easy extension for binary data. it should be 'ready' for multiprocessing.
what i mean with this is: each processing core should be able to run a single machine instance, so instances should be able to talk among each other in a simple way, and there should be a scheduler available on the VMs to handle the message passing. i was thinking about a 'binary s-expression' approach to limit inter-machine communication parsing overhead. data should still be list-structured though, and word-aligned. for a human interface, a simple front-end could be constructed. arrays can be allowed for ease of wrapping binary data. internally, cons based lists are used for all representation. cdr coding is used to be able to represent programs linearly. memory inside the machine consists of stacks only. each machine uses a limited set of data types, making re-use of lists efficient.

aim for the highest possible gcc code generation efficiency: i see no point in targeting anything other than gcc, so all extensions are allowed. i just checked (see doc/gcc/tail.c) for tail call support and it seems to work when putting functions in a single file. it also works with the functions in different files, apparently. that's good news. state passed: 3 stacks: DS/RS/AS.

the target language should be a pointer-free safe language. this is going to be a bit more difficult, probably have to split in safe / unsafe parts. the 'system' language and the 'inner loop' language are different and should be treated as such. i probably should start with the latter and build the control language as a layer on top. the former is a forth-like language extended with linear tree memory and the latter is a multi-in-multi-out language to be combined with combinators.

  1. all C code generated: need a generator.
  2. message passing interface using s-expressions.
  3. run-time memory (stacks/trees) is locally managed
  4. other (code) memory is static/readonly, loaded by host
  5. safe target language (from a certain point up)

so poke seems like a really straightforward extension to forth. getting it compatible with PF will be quite something though.. all this is pretty low priority. the only difficulty is how to deal with pointers for optimizing the linear stack/tree data structures. 'safe poke' :)

Entry: mix
Date: Fri Nov 2 05:31:52 CET 2007

then the thing that could be used immediately in PDP, PF and PD modules: a language to describe inner loops and iterators, to yield C code that can be linked straight into the projects.

Entry: instantiating abstract objects
Date: Fri Nov 2 15:19:40 CET 2007

i'm giving myself one hour to think about how to fix the verbosity of the following code:

  macro
  : tx #x100 tx-r/w #x0F ;  \ put buffers in RAM page 1
  : rx #x110 rx-r/w #x0F ;
  : tx-ready? tx-empty>z z? not ;
  : rx-ready? rx-empty>z z? not ;
  : tx-room?  tx-room>z z? ;
  : rx-room?  rx-room>z z? ;
  forth
  2variable tx-r/w
  2variable rx-r/w
  : tx-empty>z tx buffer.empty>z ;
  : rx-empty>z rx buffer.empty>z ;
  : rx-room>z  rx buffer.room>z ;
  : tx-room>z  tx buffer.room>z ;
  : >tx  tx buffer.write ;
  : >rx  rx buffer.write ;
  : tx>  tx buffer.read ;
  : rx>  rx buffer.read ;
  : clear-tx tx buffer.clear ;
  : clear-rx rx buffer.clear ;

the ONLY difficulty here is that i can't generate macros, including variables. is there another way to solve the problem? is it possible to hide everything in one single macro? yes. if tx-empty>z is never expanded as a function this is actually possible.
then what remains is just:

  macro
  : tx #x100 tx-r/w #x0F ;  \ put buffers in RAM page 1
  : rx #x110 rx-r/w #x0F ;
  forth
  2variable tx-r/w
  2variable rx-r/w

  tx >buf
  tx buf>

maybe i can somehow make an 'un-inline' function work? like memoization? something which gets me halfway there is a blocking read/write operation: only for dispatch loops does this become problematic.

conclusion: i guess it's ok to go for this approach: on the subject of code reuse, there are 2 options. either you write it as procedure words, or as macros. using the procedure word approach will lead to smaller code size but slower speed (since run-time dispatch is probably necessary). using the macro word approach can lead to fast inline code which might be not optimal for code size.

Entry: e2 debugging
Date: Fri Nov 2 17:41:24 CET 2007

current setup: hub (master) connected to krikit (slave) which runs a loopback. there is communication, but somehow a start bit gets lost. there are 4 places where it can get lost:

  1. hub transmit    (OK: clear on scope)
  2. slave receive   (OK: sending #xFF all ones gives reply)
  3. slave transmit  (OK: reply has start bit)
  4. hub receive

i have no trigger scope or logic analyser so i need to construct a steady state error condition i can sync my scope to. i can measure slave transmit if i manage to add some wait code in the hub. such code is probably necessary for other purposes.

so. running a couple of experiments makes it clear that 1-3 are ok. the problem is with the hub receive that doesn't see the start bit. i don't see the problem. as far as i can isolate it, somehow the start bit gets missed by:

  - the rx state machine is in the wrong state
  - the rx/tx switch comes a cycle too late
  - ...

i need something that's easier to test. i suspect the rx/tx switching is the cause, so maybe i can make a better switcher? i did notice a slightly borked waveform for the start bit however.. let's see if i can get a better view and see where that's coming from.. that was wrong. i start over:

  - fixed timer compensation, now at least the signal is stable
  - clear to see that there's a phase problem

i'm wondering if it's not just a speed problem. the timer is running every 64 clocks.. well.. it's easy to test by just running it slower really. YES! it was.. running 4x slower fixes the problem. time to do some profiling then!

Entry: e2 + interpreter
Date: Fri Nov 2 23:21:38 CET 2007

i'd like to make 'transmit' and 'receive' late bound. that way it's easy to switch the interpreter's default I/O. but i need to do it cheaply: using vector.f and execute.f requires too much boot space. wait.. that's the case for catkit. for the 2620 i have a lot more room. maybe i should go that route then, and solve the catkit problem when it poses itself. time to make some decisions:

  * allow both serial + e2 ?
  * build e2 in boot loader?

actually, i do need e2 in the boot loader, working as a safety measure. hmm.. let's get it to do what it needs to do first.

ok. i can ping krikit. fixed the saving/restoring of the a reg so i can access the stack. code upload doesn't work yet. i guess it has to do with a missed 'ack' due to interrupts being disabled. maybe i should build the ack into fprog?

NOTE: about saving the a reg. if there are interrupts, the a reg needs to be saved anyway (or its use protected with cli), so maybe it's best to just always save on clobber? alternatively, always save on clobber in the isr.

i added an ack to fprog and ferase, but apparently that's not enough. one line can be written, then it messes up.
some code is needed to properly resync the transceiver after programming so it picks back up at the next idle INT0. for debugging purposes, i should make a version that uses polling only, so it can be used to set up interrupts. thinking about it, i probably need to modify all opcodes so they give a sync themselves, so no buffering is required. (the uart has 1 byte). hmm.. it's not so simple really. actually, it is: all interpreter tokens have RPC semantics: they return at least one value, except '00' which is a sink, and 'reset' which can't have a return value. the 'ack' opcode can then be eliminated, and possibly replaced with 'interpret'.

  nop, reset      -> no ack
  receive         -> ack
  transmit        -> value
  jsr, lda, ldf   -> ack
  n@a+, n@f+      -> stream of bytes, no ack necessary
  n!a+, n!f+      -> ack
  ferase, fprog   -> ack
  chkblk, preply  -> stream of bytes, no ack necessary

this should get rid of the requirement to have buffered io. remaining timing issues can be handled with appropriate delays. an interesting extension when 'receive' and 'transmit' are made dynamic is to have them read from memory. that way a small program could execute from ram.

Entry: boot protocol changed
Date: Sat Nov 3 02:27:28 CET 2007

  * fprog and ferase now give an 'ack' themselves. this is necessary for receivers that suffer when interrupts are disabled.
  * the #x00000000 password is eliminated: with boot code protection this isn't necessary.

Entry: separate compilation + name spaces
Date: Sat Nov 3 15:14:27 CET 2007

as a consequence of the way compilation works, it is possible to rely on the fact that, per compilation unit, names can be overwritten. what i mean is that it is possible to 'load' the same file twice, but with different words/macros bound in its environment. this comes close enough to the 'dictionary shadows' paradigm i'm used to in PF, and which actually works pretty well: it avoids the need for a name space mechanism fairly effectively. an extension to this could be to allow for exports: provide only those macros and words necessary.

then another extension: why not install the macro source in the target dictionary? there's no real reason not to, and it makes 'mark' work for macros (given that i delete and re-instantiate the macro cache). or.. i could use this as an indicator for using the macro cache or not.

one thing that has been bugging me: if i define a word or a macro, i do want it to override the previous word or macro. i should make a list of the name space trade-offs for writing a forth really.

Entry: roadmap
Date: Sat Nov 3 15:51:02 CET 2007

  - get programming to work over e2 (restart receiver after fprog: add a macro hook to interpreter.f) (done)
  - fix acks in interpreter.f and tethered.ss (done)
  - make it work without interrupts and put it in the boot loader
  - figure out the 'strong power 1' phase, and test with slave.
  - test over longer twisted pair cable.

Entry: no middle road
Date: Sun Nov 4 01:04:57 CET 2007

some thoughts about 'accumulative' code, for lack of a better word. in light of the recent remarks about higher order macros, i have the impression i am mixing 2 paradigms in a not so elegant manner:

  1. functional programming, mzscheme's independent compilation with 'unrolled' metaprogramming dependencies.
  2. the accumulative image model, where a language grows by accumulating more power, which then can immediately be used to define new constructs.

i knowingly took out a part of 2. to get purely functional macros that could be safely evaluated for their value at interaction time.
however, the interactive compilation does work in an incremental way. it looks as if i am forced into some middle road compromise.

Entry: embedded programming in 2007
Date: Sun Nov 4 15:53:35 CET 2007

the question i'd really like to answer, without too much bias (toward the tool i wrote): what is the point of writing static, early bound code in 2007, even if we're talking about microcontrollers?

  * is there really a 'complexity barrier' below which one HAS TO move to quasi manual compilation and allocation?
  * will this barrier remain in existence, or will better tools make a more high-level approach possible?

EDIT: some things i was thinking about yesterday:

  * leaky abstractions are hard to work with. starting from assembly and "thinking up", using purrr to help you write the application, is the right approach. starting from some high-level understanding of the language and having to learn all its limitations doesn't really work. the problem seems to be the manual resource management: time, space, and synchronization between global variables and hardware devices.
  * it seems i lose most of my time in low-level configuration issues which give little feedback on error, and in dealing with situations that are hard to debug due to dependence on external events. low level design really is a debugging problem: setting up experiments to try to isolate errors. hence the loads of specialized (hardware) tools used in professional environments.

Entry: concatenative introduction email
Date: Mon Nov 5 18:44:15 CET 2007

Dear All,

Allow me to introduce myself. My name is Tom Schouten. I live in Leuven, Belgium and I'm 32 now, if that helps paint a picture. I've been interested in concatenative programming for a while and lurking here and there.. To educate myself, I wrote quite a lot of code in the last couple of years, and I'd like to share some of the results, but maybe even more the resulting questions. (warning: long post, story of my life :)

My background is in electrical engineering. My heart lies in music DSP. I've been working up the ladder from electronics, to machine language and C/C++, through Pure Data (a data flow language) to Perl & Python, to finally end up at Scheme and functional programming. I'm flirting a bit with Haskell, but really just reading, because most recent interesting functional programming texts use that language.

http://en.wikipedia.org/wiki/Pure_Data

The problem I'm trying to solve to guide me a bit is "Build tools to write DSP code, mostly for sound and video processing, in a high level language." I ran into limits of expressiveness writing video extensions in C for Pure Data, about 4-5 years ago. Apparently there are no freely available tools that solve this problem, so I take that to be my mission.

About 3-4 years ago I started writing Packet Forth (PF) as an attempt to grow out of my C shoes. It was at the time I discovered colorForth, and I was wondering if I could create some kind of cross breed between Pure Data and Forth. PF now looks a bit like Factor on the outside, though is less powerful. PF uses linear memory management (data is a tree), with some unmanaged pointers for data and code variables. PF's point is to be a scripting language which tosses around some DSP operations written in C. It doesn't aim to be a general purpose language.
The darcs archive is here: http://zwizwa.be/darcs/packetforth/

Some more high-level docs aimed at media artists are here: http://packets.goto10.org

I got a bit frustrated with the internals of PF, mostly because there is still too much verbose C code, and a lot of C preprocessor macro tricks that could best be done with a _real_ C code generator. So I dived a bit deeper into Forth, and early 2005 I started at the bottom again: I wrote an indirect threaded forth for Pure Data (mole), and started BADNOP (now dubbed BROOD 1), an interactive cross compiler for the Forth dialect Purrr, an 8-bit stack machine model for Flash based PIC Microcontrollers.

http://zwizwa.be/darcs/mole
http://zwizwa.be/darcs/brood-1

Mole made me 'get' Forth finally: the first versions of PF were mostly blind hackery to get to know the problems before the solution. For mole, I actually followed tradition a bit more (Brad Rodriguez' "Moving Forth"). This led to a more decent PF interpreter.

The forth I wrote to write the cross-compiler for the PIC Forth was a mess. I was experimenting with some quotation syntax but realized that what I was really looking for was lisp, or a lisp-like concatenative language. At that time, early 2006, I discovered Joy, so I ditched the compiler and rewrote it in CAT (not Christopher's Cat) which was written in scheme (BROOD-2). After some refactoring and rewriting due to beginner mistakes I am now at BROOD-4, with the CAT core written as a set of MzScheme macros. This CAT is a dynamically typed concatenative language with Scheme semantics. I consider it an intermediate language. Currently it is only used to implement the Purrr compiler (a set of macros) and the interaction system.

http://zwizwa.be/darcs/brood

Purrr is as far as I know somewhat novel. All metaprogramming tricks one would perform in Forth using the [ and ] words are done using partial evaluation only. I've tested this in practice and it seems to work surprisingly well. I am still struggling a bit with the high level Purrr semantics though. Concretely, it is a fairly standard macro assembler with peephole optimization. Nothing special there. On the other hand, its macro language is a purely functional concatenative language which is 'projected' onto a real machine architecture after being partially evaluated. I tried to explain these concepts in the following papers:

http://zwizwa.be/darcs/brood/tex/purrr.pdf
http://zwizwa.be/darcs/brood/tex/brood.pdf

(for the latest versions it's always better to use the .tex from the darcs archive)

The latter needs some terminology cleanup, but it contains an explanation of the basic ideas, and an attempt to clarify the macro semantics in a more formal way. I'm interested to learn what I need to read in order to frame these concepts in proper CS speak... It looks like I'm either terribly ignorant of something that already exists (I went through a couple of stages of that already), or I found a clean way of giving cross-compiled Forth a proper semantics.

On a lighter note, I'm using Purrr to build The Sheep, a retro synth inspired by the 1980's approach to sound generation. It runs on CATkit, and has been used successfully many times in beginner "physical computing" workshops, as electronics is called in non-engineering circles :)

http://zwizwa.be/darcs/brood/tex/synth.pdf
http://packets.goto10.org/packets/wiki/CATkit

(the scary guy in the picture is not me :)

Entry: krikit board design decisions
Date: Mon Nov 5 17:52:11 CET 2007

  - 4 x AAA -> need at least 5V.
    alternatively, use a 9V cell and a transistor for speaker output.
  - RGB led onboard
  - debug connector = battery connector (RCA plug)

Entry: TODO
Date: Mon Nov 5 21:40:23 CET 2007

list has moved to TODO file.

Entry: polling E2 interpreter
Date: Mon Nov 5 23:15:31 CET 2007

it's not entirely without trade-offs to choose a polling interpreter for E2 in the boot code.

PRO: independent of interrupt routines, which is useful for debugging application isrs.
CON: completely synchronous and non-buffered. this requires some careful coding in order not to miss any data.

maybe the boot code should contain both versions? this leads to objects really: a vtable is a dynamic route word.

  2variable stdout
  : do-stdout stdout invoke ;
  : e2-stdout stdout -> route rx ; tx ; on ; off ;

hmm.. i messed up slave.f: diff tomorrow..

Entry: macros and procedure dictionary
Date: Tue Nov 6 06:03:54 CET 2007

maybe the trick is to just get rid of the distinction between procedure words and macros: a single namespace, with procedure words being equal to

  : bla 123 compile ;

this, combined with a preprocessing step that identifies all labels in the source text. a single namespace is easier to understand. separate compilation units give shadowing, while inside a single compilation unit circular references are possible. what i want this to move toward is a more and more static declarative structure. maybe i should re-implement namespaces and build them on top of the mzscheme module system. i doubt the solution i can live with eventually will be significantly different from mzscheme's.. maybe a bit more liberal? or is that just because of the current implementation? maybe i should make the compiled macros into a real cache, and store a master version as an s-exp tree..

Entry: redefine
Date: Tue Nov 6 15:59:00 CET 2007

i need to

  * make it illegal to redefine macros: they use a caching mechanism which replaces names with values (procedures).
  * make it illegal to define a label that is already a macro

the real problem is that redefines need a proper semantics in CAT. for the forth, i think shadowing redefines are best: 'empty' is practical, and it should work for macros too. CAT is currently designed so redefines are illegal: this allows the use of values instead of boxed values. some possible routes out of the mess:

  - prohibit redefinitions
  - use shadowing + a proper cache
  - use boxed values (reset the code inside the 'word' struct)

a deeper question is: why not use mzscheme namespaces for all macros? answer: because i rely on late linking. is there a way around this? it probably makes it too complicated, since i need to figure out a way to map it to BOTH modules and units.. let's stick to the current hash table name space, and go for the boxed approach: mutate the words themselves, instead of their hash table entry.

OK. that seems to work. remaining problem: defining words that are macros. a way to solve this is to define each word in the dictionary as a macro, compiling [cw ] let's not.. i've added a warning, which made me realize that i do use this: macros can call words with the same name as a fallback. that mechanism might be worth more than a safety net.. no, a safety net is more important: i can fix the delegation using a symbol prefix. what about doing this automatically? the last matching pattern is always mapped to a runtime call? i do need to fix dangling macros though. let's see if i can run into that case again.. ok, it's clear: a dangling macro can be disastrous. this is a mess..
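for reference, a minimal sketch of what the boxed approach means in mzscheme terms. the struct and dictionary here are stand-ins of mine, not the actual CAT code (old-style mzscheme struct fields are mutable, so set-word-code! comes for free):

  (define-struct word (name code))        ; mutable field -> set-word-code!
  (define dict (make-hash-table 'equal))

  (define (define-word! name proc)
    (let ((w (hash-table-get dict name (lambda () #f))))
      (if w
          (set-word-code! w proc)         ; redefine = mutate the box:
          (hash-table-put!                ; everybody holding the word
           dict name (make-word name proc)))))  ; sees the new code

  (define (flush-word! w)                 ; dangling macro -> loud error
    (set-word-code!
     w (lambda args (error 'dangling-macro "stale word:" (word-name w)))))

the point being that a redefinition never touches the hash table entry itself, so previously linked references stay valid, and flushing can replace the code with a tripwire instead of leaving a silently dangling procedure.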
assume there are 2 classes of macros: CORE and PRJ. PRJ needs to be flushed whenever the project changes. i am not sure whether macros from CORE will actually bind to those defined in PRJ. there is no such plugin behaviour as far as i can tell, but nevertheless it is possible to go wrong, so i should do this:

  flush cache =
    - invalidate all prj macros (make them raise an exception)
    - detach them from the namespace

looks like i got it now: ns-flush-dynamic-words! + support

Entry: asm rewrite
Date: Wed Nov 7 00:58:54 CET 2007

found another asm bug: variables get allocated on each pass now. this doesn't seem to be fatal though, just inefficient. sheepsint works, so it can't be the weird hub.f bug i'm chasing..

[EDIT] several things might change here, but it could be a good idea to keep the current operation until i have time to clean it up a bit. cleanup would be:

  - move 'here' to a separate dynamic variable
  - handle different dictionaries better.

the problem now is that 'allot' gets called multiple times without reset.. it's probably best to filter it out in a preprocessing step.

Entry: sheep transients
Date: Wed Nov 7 04:45:41 CET 2007

'sound' needs to be a stack: a circular one, initialized with valid sounds, or a delimited one so a sound can end in 'done' to fill the rest of a control slice with another sound. the point of this is to create a concatenation at run time. it is of course possible to do this at compile time, but the fun would be in *mixing* sounds.. i think i have the solution there: each pattern tick a 'program' is erased, and filled with instruments that are played after each other, with the last tone = silence.

Entry: low impedance signal source
Date: Wed Nov 7 07:00:27 CET 2007

i'm trying to understand the difference between these 2 statements:

  * for a low-impedance source you best measure current, while for a high-impedance source you best measure voltage.
  * a current source has high impedance, and a voltage source has low impedance.

the deal is that these are 2 different kinds of "measurement" because of the entirely different scale of energy involved: for sensors, you want maximum energy transfer, but to "measure" a current or voltage source, you want minimal energy transfer. looking at a sensor as a voltage or current source, you want to "max it out".

implementation: so, doing this with an opamp is really trivial. bias (+) on Vdd/2, feed back from (out) to (-) using Ra, and connect the current source between the virtual ground (-) and (+).

             R
       /---/\/\/\/\--\
       |      __     |
       |     |  \    |
    /--||--o-| - \___|___o Analog -> uC
   |\ @    _o_| + /
   | ||@    |  |__/
   |/ @     |
    \------o
           |
      o--/\/\/\/--o---/\/\/\/--o
     GND         2.5V         5V

then:

  * connect the speaker to 2 analog inputs, so they can be switched in analog high Z mode: not good to bias digital ins at Vdd/2
  * run the opamp and bias network off of a digital output

let's see if analog Z1/Z2 can be PWM outputs. no such luck.. maybe use a transistor to shield the detached (Z) driver pin from the Vdd/2 bias voltage. or just not use pwm... remaining question is whether the opamp, when powered down, can take a large differential input voltage.

EDIT: the circuit doesn't work without a capacitor: the coil is a short circuit at DC, connecting (-) and (+). due to nonzero offset voltage, this saturates the amp.

EDIT: i understand now why measuring current is not a good idea. the impedance of the device is dependent on frequency: 0 for DC, rising linearly. if you measure current, the signal will have a strong low frequency content.
however, if you measure voltage through a resistor that's say 10x larger than the stated impedance, the response is flattened out since the resistor dominates. so the classic one works better:

   SPK          Rg
    o      /---/\/\/\/\--\
    |      |      __     |
    |  Rs  |     |  \    |
    o--/\/\/\--o-| - \___|___o Analog -> uC
   |\ @      ____| + /
   | ||@    |    |__/
   |/ @     |
    |       |
   === Cs   |
    |       |     Cn
    |   o---o----||--o
    |   |
    o--/\/\/\/--o---/\/\/\/--o
   GND         2.5V         5V

Here Cn reduces noise by lowering the AC impedance wrt the high DC impedance point at (+). Rs = 47R and Rg = 100k give decent results: about 2000x or 66dB. The value of Rs should be as low as possible to reduce noise. I'm comfortable now i understand the trade-off.

EDIT: i switched to using closed loop current measurement again, this time limiting the overall gain to about 100x (using Rg=1K), followed by a second stage with 100x gain. this seems to work better. i suppose my original problem was just due to too high gain, running into opamp limitations.

EDIT: going back to the circuit with Rs: in that one the opamp's input could be decoupled from what goes to the speaker by switching the (-) and the top of the bias network to ground.

Entry: e2 hub
Date: Wed Nov 7 17:37:52 CET 2007

now i need to find a way to program the e2 hub. one problem in the boot protocol is that i have no way to packetize the stream.. what i want is a hub which is mostly in repeater mode (for the commands 0->15) but responds to other commands itself. there are 2 alternatives: either write a 'fake' interpreter which simulates the state machine that parses the debug input stream, or change the protocol so it is delimited. the former is a stupid short term thinking hack.. let's packetize.

hmm... this is quite a change again: thinking about optimizing the problem. oops, bad word :) a way to do this quickly is to just prepend every message with the size. that way the core interpreter can ignore it, but the repeater can transfer without being able to interpret. let's try that first. notes:

  - should put 'ack' at 1, so a stream of ones gives ack messages

hmm.. i chickened out. it's a lot of changes at once. lots of places to go wrong. will cost me some frustration.. let's find another way, go for the stupid hack. if i can make the message length not dependent on context, meaning a previous message length, i can probably derive the lengths manually. the only problems here are the block transfer words; stuff that comes back from the uC can be echoed without problems. (i'm thinking about ping reply..) ok, made the protocol context-free in the host -> target direction. so, next:

  * make hub understand protocol (OK)
  * add hub commands
  * move to polling implementation in bootloader

hub commands: these should be an abstract interface for things one would like to do with a hub. arbitrarily:

  set client (0=hub)
  on client
  off client

hmm.. i don't see it so clearly. what am i trying to accomplish? by default i should be in 'hub application' mode, but it should be possible to switch to hub debug mode too. the latter can be a permanent switch (requiring access to the dictionary to switch back). i got it sort of figured out now. TODO:

  * make hub switch between hub-interpreter and interpreter using an external resistor. do this when the hub is finished.
  * until then, find a way to start the repeater without having the hub dictionary loaded.

Entry: how much amplification?
Date: Thu Nov 8 01:22:04 CET 2007

i have 12 bit at my disposal. amplification is mainly determined by the ratio of distances.
i'm measuring current, which should be proportional to sound pressure, which is 1/r^2. so, suppose i use a gain factor of 64 = 8^2; this gives a range ratio of 8. say 1 - 8 meters: don't put them closer than a meter, nor further than 8.

Entry: poke & precompiled loops
Date: Thu Nov 8 06:14:42 CET 2007

i think i need to separate out the c code generator so i can start generating code for PF and Pure Data, which will not be anything forth-like so doesn't really belong in brood. in fact, the way i think of it now (something akin to functional reactive programming) it will be quite the opposite.

Entry: RS next order
Date: Thu Nov 8 17:42:19 CET 2007

  * linear regulators
  * 9V clips / battery holders
  * transistors?
  * high ohm resistors
  * xtals + caps / resonators
  * blue bell wire
  * schottky diodes
  * small signal diodes

Entry: more modem design decisions
Date: Fri Nov 9 02:44:47 CET 2007

  - modulation or not? some modulation is necessary since i can't transfer DC. but the frequency response looks really too non-flat to go for a wideband approach. i need to experiment.
  - FIR or IIR? what is needed is a decimating filter. i can probably get much further with a crude windowed FIR than an IIR.

Entry: demodulator
Date: Sat Nov 10 15:12:58 CET 2007

i'm not going to waste time on trying out a pure square wave modulation. let's stick with some simple demodulator, and have a look at the numbers. i currently have the debug net running at 8kHz. this should also host the filter tick, which consists of:

  - read adc + update filter state
  - once every x samples, wake up the detector tasklet

what if i start out with FSK, because it requires no synchronization, and use a square window where the 2 frequencies are placed at each other's zeros. so.. a square window does have perfect rejection for the harmonic frequencies. it's only the stuff that lies in between that is problematic. ok, this is obvious. the problem can be entirely moved to synchronization and linear distortion due to transitions. if the receiver listens during a steady state part of the signal, perfect rejection is possible. so the main questions are:

  - how to synchronize?
  - how to limit transitions?

which brings me back to PSK.. maybe it is just simpler to use? as long as the start of a symbol can be detected (threshold) and the phase can be corrected (preamble), the rest seems not so hard really.

again, from a different angle: demodulating PSK is a synchronous mixer followed by a low pass filter. i assume that a rectangular window is going to be good enough as an LPF, which just leaves the problems of signal detection and synchronization. if i leave the non-synchronized receiver on constantly, it outputs a 24 bit complex number. during synchronization this needs to move toward zero. the phasor will rotate once per window length. which direction? if the direction is known, it's possible to detect a crossing. the direction is determined by the rotation direction of the mixer phasor.

i need to up the frequency: 2 MIPS is not enough. maybe i should do that first.. then the output stage, then the receiver, then a decoder. next actions:

  * have a look at PSK31 demod code
  * build the output stage (either PWM or SD)
  * build board + move to 40 MHz (monday: can't find xtals, maybe test on the dude?)

Entry: PSK31
Date: Sat Nov 10 18:56:01 CET 2007

PSK31: Peter G3PLX
http://det.bi.ehu.es/~jtpjatae/pdf/p31g3plx.pdf

some ideas from the paper:

  * this is a protocol for live communication. error correcting codes introduce delays.
  * use relaxed bandwidth for the filter for smaller delays and lower cost.
  * take advantage of the high frequency stability of modern HF radios
  * demodulation by using 1 bit symbol delay and comparison. ??? i don't get this one.
  * synchronize using the amplitude modulation component!
  * viterbi decoder for the convolutional code

Entry: a single port for debugging
Date: Sat Nov 10 20:37:18 CET 2007

wait a minute. if i manage to plug the E2 protocol through to the icd port, i could standardize on a single set of connectives. however, the connection is not standard, but it is 4-wire (can run over telephone cable), is synchronous and has a clock too. what this would solve is the bootstrap upload problem, which is a nasty one..

Entry: transmission bandwidth
Date: Sat Nov 10 20:51:22 CET 2007

something i never really understood: Fig. 4 in the PSK31 paper shows the bandwidth for random data. why is this so wide? why is reversal not the highest bandwidth? other questions: try to explain what this 'bit delay' demod is + how the amplitude demodulation sync works.

Entry: BPSK synchronization
Date: Sat Nov 10 21:25:37 CET 2007

there are 2 kinds of synchronization necessary: carrier synchronization, and bit clock synchronization. the former can use a PLL, the latter can use the 1->0 transition. suppose the following bit encoding: 8N1, with 1 = idle, and 0 = start bit. during idle the phasor needs to be predictable. this is either a fixed value, or an oscillation between 2 signal states. picking the former, this gives

  1 = carrier
  0 = inverse carrier

during idle, the synchronizer works: this is a PLL state machine which turns a single phase increment left or right depending on which quadrant the phasor is in. there are 3 bits determining the quadrant. there needs to be an AGC which reduces the 24 bit phasor to an 8 bit phasor for easier demodulation and synchronization.

Entry: so why not use AM?
Date: Sat Nov 10 21:50:43 CET 2007

somehow both FSK and PSK seem too complicated. maybe i should start with AM, then later (never) continue down the road and try FSK (double AM) and PSK (with synchronization). the most important interference we're going to find is bursts. it should be possible to eliminate these using stop bits: 1 = on, 0 = off, which means a burst will probably lack a stop bit.

algo: a continuous square window filter with signal detector feeds into a simple sampler. if the sampler is not active, every 1->0 transition will wake it up. before starting with AM, i can just use some noise modulated protocol. hell, anything that can get a 1 across.

Entry: roadmap
Date: Sat Nov 10 22:16:08 CET 2007

EDIT:
  * try strong phase and run it off E2 (OK)
  * level detector, use the RGB led.

Entry: E2 next
Date: Sun Nov 11 01:02:47 CET 2007

apparently the E2 signal interferes quite a bit with the amplifier, which is not such a big surprise. so i guess it's time to mature the debug network a bit:

  * switch to idle mode (keep high) when there's no host -> target comm.
  * find out what the initial 'missed ping' is all about.

i'm going to add stop bit checks to at least eliminate 1->0 bus glitches as a source of errors. that wasn't the problem... something is wrong with bus startup. maybe i need to make sure 'off' will actually switch the power state? looks like the error is with the slave init..
i get a predictable reply to a ping after bus-reset:

  > 13 >tx
  > rx-size p
  3
  > rx> px
  2D
  > rx> px
  AD
  > rx> px
  F7
  > rx-size p

i made a little progress here:

  > hub-init
  ERROR: time-out: 1
  > rx-size p
  0
  > 5 >tx 8 >tx 0 >tx
  > rx-size p
  1
  > rx> p
  131
  > 9 >tx 2 >tx
  > rx-size p
  2
  > rx> px
  FF
  > rx> px
  D0
  >

this sends the commands for fetching the 2 bytes at rom address #x0008, which indeed should be #xFF, #xD0. this is reproducible. so i can conclude that the bytes get received properly. something goes wrong in either the slave transmitter or the host receiver.. i give up.. i can't find it. a workaround which seems to be stable is to send an 'ack' which will send back a garbled byte. apparently, unplugging and replugging the E2 connector gives the same behaviour: the first byte coming from the slave is corrupted. so it can't be host side..

Entry: amp notes
Date: Mon Nov 12 15:29:42 CET 2007

i changed the circuit back to 1K input impedance, 100K feedback in the first stage. the second stage has 1K + 100nF, and 100K feedback, and I have no idea why this works: less noise, and it seems to have a good response in the intended range.. maybe because most sounds have a 1/f response? i don't know... it responds well to whistles, which is nice. this is a 10 kHz pole... so it's basically set up as a differentiator? maybe because i have GBW rolloff this works? i'm puzzled. i tried with a LM358N and it gives a lot more noise. i tried a TL072CN and it gives too much bandwidth!

so, i use a compensated integrator with 1K/100n in the source and 220K/4.7n in the feedback section. looks like this is final enough.. maybe beef up the amp just a tiny bit more..

PARTS LIST:

  2 x 220K
  1 x 100K
  2 x 10K
  3 x 1K
  2 x 15pF
  1 x 4.7nF
  2 x 100nF
  2 x 10uF
  1 x 10MHz
  1 x 18F2620
  1 x TL072CN
  1 x LED(red)
  2 x 6 PIN HEADER

                                        C2 4.7nF
                                    /-----||----\
                                    |           |
   SPK       Rg 220K                |  R2 220K  |
    o     /-/\/\/\--\               o--/\/\/\---o
    |     |   __    |               |    __     |
    | Rs 1K  |  \   |      R1 1K    |   |  \    |
    o--/\/\/\--o-| - \___o--||--/\/\/\--o-| - \_____o LINE
   |\ @      ____| + /  C1 100nF       __| + /
   | ||@    |    |__/               |  |  |__/
   |/ @     o---------------------------/
    |       |
   === Cs   |
    | 10uF  |
    |       |     Cn
    |   o---o----||--o
    |   |       10uF
    o--/\/\/\/--o---/\/\/\/--o
   GND         2.5V         5V

First stage gives 220 x amplification. the TL072 (TI version; i'm using the ST version) has a GBW of 3 MHz; with 220 x amplification this gives rolloff at 13 kHz. so for the first stage i'm good. Second stage is a band pass filter with 22 x amplification:

  G . . . . ._________
            /.        .\
           / .        . \
          /  .        .  \
            1/t2     1/t1

  t1 = R1 C1 = 100us  -> 10kHz
  t2 = R2 C2 = 1000us -> 1kHz

because f1 > f2 the gain is not R2/R1 but R2/R1 * f2/f1 (i.e. C1/C2). a bit quirky, but it works.. maybe i should try exchanging the time constants so f2 > f1. looks like these changes keep the transfer function the same, with a = sqrt(10):

  R2 -> 1/a R2
  C2 -> 1/a C2
  R1 -> a R1
  C1 -> a C1

so, there's a reason to do it like i did! the capacitors are smaller. so where's the trade-off? maybe noise due to large resistors? however, when f2 > f1 the gain is independent of the capacitors.

let's make this a bit more intuitive. what happens when C1 is made 10x larger, so f1 = 1kHz, and C2 is made 10x smaller, so f2 = 10kHz? the gain is now 10x more, so the gain can be reduced by making R2 10x smaller, which again requires C2 to be 10x larger. so the net effect is:

  R1 -> R1
  C1 -> 10 C1
  C2 -> C2
  R2 -> 1/10 R2

this gives a 1uF capacitor. so alternatively R1 can be made 10x larger, which requires C2 to be made 10x smaller, giving 10K and 470pF respectively. (EDIT: this is what i did. works fine.) makes more sense now.
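to double-check the quirky gain rule, here is the second stage worked out symbolically (input branch Z1 = R1 + 1/(s C1), feedback Z2 = R2 parallel C2; my derivation, not from the original notes):

  \[ H(s) \;=\; -\frac{Z_2}{Z_1}
          \;=\; -\frac{s R_2 C_1}{(1 + s R_1 C_1)(1 + s R_2 C_2)} \]

between the two poles the magnitude flattens out at

  \[ |H|_{\mathrm{mid}} \;\approx\; \frac{C_1}{C_2}
          \;=\; \frac{R_2}{R_1}\cdot\frac{t_1}{t_2}
          \;=\; 220 \cdot \frac{100\,\mu\mathrm{s}}{1000\,\mu\mathrm{s}}
          \;\approx\; 22 \]

which matches the 22 x figure. it also shows why the a = sqrt(10) substitution is neutral: the numerator R2 C1 and the pole product R1 C1 R2 C2 are both left unchanged, so H(s) is literally the same function, just built from smaller capacitors.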
so is there a circuit that has independent frequency and gain? hmm.. i just tried the LM358N again, and it gives good results also. guess the TL022 was just too low bandwidth? yup. 0.5 MHz. hmm, the LM358N is only 1MHz?

Entry: building the first krikit boards
Date: Mon Nov 12 15:41:18 CET 2007

  - not using E2, use serial + icd2 instead
  - xtal 40 MHz operation

pins to determine:

  * opamp + bias power
  * analog input (maybe first stage also)
  * speaker out

and figure out if the opamp can take 5V when it's powered down, otherwise it needs an extra pin to pull the (+) input to ground also.

Entry: new sheepsint default app
Date: Thu Nov 15 20:00:15 CET 2007

something like this. buttons:

  * noise on/off
  * xmod
  * reso
  * reset = silence

take button state from ram, uninitialized, so it survives reset. xmod control uses 2 x 2Hz - 20kHz log. reso needs robustness for reso freq < main osc freq. 3 x frequency knobs, 2 knobs left.. maybe some modulation? osc 2 frequency + modulation index. (formant / noise frequency)

Entry: fake chaos
Date: Sat Nov 17 02:27:51 CET 2007

following the same line as the formant oscillator, a fake chaotic filter could be made. such oscillators (some / all?) contain unstable oscillations that are 'squelched'. the points where such squelching happens are randomly distributed, but the bursts themselves are quite stable, leading to an approximation as randomly spaced fixed bursts. does this work with the current setup? no.. it uses a random period. that's different. so... using the reso algo, it boils down to randomizing p0 with fixed p1 and p2. randomizing could be fixed + variable. the question is when to update the period. the easiest is a fixed rate.. continuous updates seem to work. now p0 is modulated with a uniformly distributed random value, taking care not to over-modulate so p0 doesn't wrap around. everything is moved to prj/CATkit/demo.f

Entry: amplifier noise
Date: Mon Nov 19 21:25:32 CET 2007

for the next iteration of the krikit board, it might be a good idea to improve the amp a bit. there are 2 things to consider:

  * input stage noise (impedance)
  * filter/amp stage capacitor values vs. noise and power consumption

Entry: shopping
Date: Mon Nov 19 23:57:44 CET 2007

AITEC:
  - perfboard?
VOTI:
  - perfboard
RS:
  - perfboard (RS: 206-8648, manuf: RE 200 HP)
  - oscillator
  - 9V battery holders + linear regulators
  - 8pin sockets
  - small signal diodes
  - high ohm resistors
  - blue, black bell wire

Entry: krikit todo
Date: Tue Nov 20 17:58:26 CET 2007

  - determine pins: analog in, opamp enable,
  - output transistor: speaker out pin.
  - debug net: E2 / serial: minimal slave complexity solution

also for catkit: it might be best to connect the E2 bus to the serial TX pin, which is multiplexed with an INT pin. (also for the 2620? no, but it can be connected externally.)

Entry: ditch E2 ?
Date: Tue Nov 20 18:16:27 CET 2007

simple TTL serial with a bit of careful programming to ensure enough 'on' time (basically, a large enough cap or some extra '1' bits in the data) might be a better approach, since it doesn't require a special decoder in the target chip. i could use a 'standard' here: the stereo minijack used in some A/V equipment. ftdi sells them too apparently:

http://www.ftdichip.com/Images/TTL-232R-AJ%20pinout.jpg

or: leave the choice between E2 and serial open.
given a bit of delay on the client side when sending, and a proper 'listen' phase on the host, a serial protocol can be used using the same hardware as the E2 bus:

          1K
   TX o--/\/\/\/\----\
                     |
   RX o----------o---o---o BUS
                 |
        /--|<|--/
        |
  VDD o--o
        |
       === 10uF
        |
  GND o--o---------------o GND

another thing to do would be to make the E2 protocol compatible with the hardware uart. the problem here is the factor 5...

            (+)                   (-)
  SERIAL    client + hw simple    4 wires
  SER+POW   client simple         3 wires
  E2        client complex        2 wires

hmm... the thing which i find most attractive is the possibility to have a POWER socket that can be used as a comm channel. the rectification diode also acts as a protection diode this way, and the diode drop is not really a problem when powered from 3V-5V. so the main question is: how to make SERIAL run over 2 wires, given the setup above? this is a software problem: how to synchronize. the question is whether this extra synchronization effort will lead to more slave complexity, with the bound being the E2 rx/tx.

  POW: works as long as the cap is large enough
  RX:  always works
  TX:  works as long as the host leaves room on the cable..

so the problem is for the host to create a window. this shouldn't be too big, for POW reasons.. to solve the timing issues here, it looks to me that complexity will be a lot higher. i guess it's best to stick to E2, but keep open the possibility of unidirectional serial comm. conclusions:

  (4) serial + separate power over 4 lead telephone wire
  (3) serial, power from data, using stereo audio cable
  (2) E2: 2 wire, power connector

which one for krikit?

Entry: problem chips
Date: Wed Nov 21 20:40:14 CET 2007

18LF2610-I/SP doesn't seem to want to program.. ok. that was stupid. they're not self-writable.

Entry: krikit pins
Date: Wed Nov 21 22:31:09 CET 2007

input works seemingly without problems. output is going to be a bit more problematic.. i think it's not a good idea to drive the speaker with the pin directly, for 2 reasons: 8 ohms is too high a load, and the drive point needs to be tolerant of analog voltages (a CMOS input is not, and i'd like to use the PWM). with the current setup, a PNP switch is probably best. so, design variables:

  * PNP / NPN (cap to ground or Vdd)
  * suppression diode?
  * feed from battery (9V) or Vdd (5V regulated)

EDIT: 1K with PNP on 13/RC2/CCP1
EDIT: ok. make sure the speaker is not full-on, the transistor gets really hot.
EDIT: running into a problem: i'm using high = off, which apparently the PWM doesn't like: it gives a single spike. so i need to explicitly turn off the PWM.
EDIT: the chip resets unexpectedly. trying now with ICD2 attached: seems to be stable. so something's probably wrong with my reset circuit. could be power supply stuff. some spikes..

Entry: standard 16 bit forth
Date: Thu Nov 22 04:53:21 CET 2007

i keep coming back to a standard forth for sheepsint. purrr18 is there to stay as a low level metaprogrammed machine layer, but teaching it is a real pain.. maybe the time is there.. maybe a safe language is not the way to go. maybe a simple forth is more important? maybe standard is important after all? i have a lot of design choices to make, like building the interpreter on top of a unified memory model or not.. [ mostly triggered by ending up at the taygeta site (from e-forth) ]

more questions. if i want to make a standard forth platform, wouldn't it be better to go for the 18f2620 with a resonator and a linear regulator, and add a keyboard in and video out while i'm at it? why not the dspic then? ( because i didn't port to it yet, tiens! )
so possible projects for january:

  - portable forth on top of purrr18
  - linear safe language on top of purrr18
  - dspic assembler + compiler
  - a home computer based on the 18f2620

strategically, portable forth seems to be the best option, since this solves most of the documentation issues.. dspic and the home computer are more of a lab thing. the linear safe language is something i need to figure out how to do first in a PF context. portable forth could use 'code' words to switch to purrr? maybe it's a good exercise all in itself to try to write a standard forth, and not care too much about optimization etc.. i have my non-standard forth now, so it's good to aim for the average.

Entry: the circuit again
Date: Thu Nov 22 18:59:33 CET 2007

because the input impedance is so low, the 10uF cap is really not negligible! in fact, this gives a 10ms time constant with a 1K resistor. that's 100Hz, but the filter cuts off at 1kHz, so it's ok..

it's not ok for what i wanted to do, which is to use only a single transistor to drive the speaker without a capacitor. this might reverse polarize the cap: it's probably best to keep the switching frequency high enough so this doesn't need to happen.. i wanted to replace it with a ceramic one, but that needs at least 1uF. wait a sec.. maybe it's just not possible to drive the cap to a negative voltage? yup.. the + side of the cap will be at saturation voltage.

        Rg 220K
      /-/\/\/\--\
      |   __    |
  Rs 1K  |  \   |
  /--/\/\/\--o-| - \___o              o Vdd
  |           _| + /   |              |
 === Cs      |  |__/   |            >|
  |  10uF    |         |---/\/\/\---o SPK
  |          o Vbias               /|
  |          |                      |   |\ @
  o--------------------------/      |   | ||@
  |                                 |   |/ @
  0 GND                             |

i wonder if it's possible to make the circuit such that the transistor doesn't blow up if SPK is driven low for too long. check this:

http://www.winbond-usa.com/products/isd_products/chipcorder/applicationbriefs/apbr21a.pdf

somehow the DC path needs to be blocked or at least limited. hmm.. i think there's really no better way than to use switching: that produces the least amount of heat in the transistor. just be careful to not drive it too long, and use minimal DC: start the wave 'touching' ground instead of symmetric around 2.5V. i'm ordering some BC640, which are TO92(A), 1A drop-ins for the A1015. i had a BC516 PNP darlington on my list, but the BC639 is 1A. for the switching loads i care about, i don't need high beta.

Entry: crap.. transistor won't switch off
Date: Sat Nov 24 20:50:04 CET 2007

using a 78L05 regulator for the chip, but wanted to drive the speaker straight from the 9V battery. problem is, i can't switch off the transistor if i'm not using an open drain output and a pull-up resistor.. so i guess just stick with connecting the speaker to the regulator output, and up the regulator a bit... though it's 100mA, maybe it can take a bit of peak current? do i have everything now? i guess.. maybe an on/off switch, but that's easy to do later. also, if possible, add a connector for the led, so it can be brought out to the box.

Entry: got the first carrier on the mic amp
Date: Sat Nov 24 13:03:50 CET 2007

sort of a little milestone.. but no time for celebration yet. it's at 600Hz, which seems low.. putting it higher gives less response. and it's quite distorted. higher frequency, more distortion. ok.. so the resonance frequency of the speaker, measured by moving it over the table, is about 625 Hz. which is of course the reason why i get such a good response at 610Hz :) this will be really hard to get out, so why not use it? use either fm or pm at that frequency, and adapt the filter / amp accordingly.
so what about this: make the amp go from 450Hz to 1kHz, and use the resonance frequency of the speaker as carrier wave. 22K 100nF (450Hz) - 1M 1nF (1kHz) - gain = 45 init-analog pwm-on 1 freq ! 2 att ! 0 nwave half of that seems to work fine too.. 305Hz. what about a golden ratio FSK modulation scheme? that way the harmonics of the lower one won't interfere with the higher one.. EDIT: sticking to one carrier seems best in light of this resonant peak. Entry: direct threaded forth Date: Thu Nov 29 11:02:39 CET 2007 a couple of days of rest doing admin stuff.. going to amsterdam today for the final sprint. some things that crossed my mind: DTC FORTH VM * purely concatenative virtual machine code: implement literal as i.e. 8 bit literal, and 8 bit shift + literal: code never accesses IP. same for jumps. * unified memory model is probably more important than speed: it allows for other memory mapped tricks, since memory has a single access point. the real problem is that instruction fetch is built on top of the memory model. maybe this can be optimized somehow? DEMODULATOR * bitrate ~ bandwidth^(-1) this can easily be seen in the response of a sharp filter: it rings a lot, so can't accomodate much time variation. * AM (data=power) -> PM (signal + data=phase). i'd like to stick to a single carrier. the reason is that the resonant peak of the speaker is something that best can be used instead of fought. i didn't measure it, but it at least looks and sounds quite sharp. comparing waveforms on the scope, i would guess 12dB through both sending and receiving speakers. * sampling rate is only dependent on bit rate, not on carrier frequency: aliasing sampling can be used. this means that in order to accomodate more processing power on the same chip, the bitrate can just be lowered. and the combination of both: since this is going to be quite math-intensive, it's probably best to choose for a bit more highlevel approach: construct a couple of decent abstractions, maybe some easy to use fixed point math routines, instead of perfectly optimal ones.. the chip runs at 10MIPS. if i aim at 100bps, that's 100 instructions per bit, if i aim at 10bps, that's 1000.. the idea is to get it to work first. Entry: math routines Date: Fri Nov 30 12:21:49 CET 2007 time for math routines. some design decisions: * signed/unsigned * bit size * saturated/overflow it would be nice to be able to reuse these later in the DTC standard forth as math routines. i do have special need here, in the sense that the input is only 8 bit. the main problem is the multiplication routine. the standard has a 16x16 -> 32 signed multiplication. 2 approaches for the filter: * simple: 2nd order IIR bandpass * matched FIR filter as in PSK31 i have enough memory to perform FIR filtering. let's focus on trying to understand the PSK31 demodulator. till now i only found code examples, no highlevel pseudocode or diagrams. here Peter G3PLX talks about AFC (automatic frequency correction): http://www.ka7oei.com/fsk_transmitter.html#FSK31_Explained with PSK apparently the frequency correction doesn't need to know anything about the data, since the spectrum is symmetric around the carrier. i'm not sure whether AFC is necessary in my scheme: all recievers and transmitters are stationary, and there's no wind. on the scope however i did see some slight variation in period, but this was probably due motion of the speaker/mic (just sticking up by its pair of connecting wires). "To get in sync. 
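for reference, a minimal sketch of the first option -- a 2 pole IIR
resonator -- in scheme. this is my own illustration, not project
code; fc and bw are fractions of the sample rate, and the pole radius
r ~ 1 - pi*bw is the usual narrow-band approximation:

  (define pi 3.141592653589793)

  (define (make-resonator fc bw)
    (let* ((r  (- 1.0 (* pi bw)))            ; pole radius
           (a1 (* 2.0 r (cos (* 2.0 pi fc))))
           (a2 (- (* r r)))
           (y1 0.0)
           (y2 0.0))
      (lambda (x)                            ; one sample in, one out
        (let ((y (+ x (* a1 y1) (* a2 y2)))) ; y = x + a1 y' + a2 y''
          (set! y2 y1)
          (set! y1 y)
          y))))

  ;; e.g. centered on the speaker resonance, 610Hz at a hypothetical
  ;; 4.88kHz sample rate:
  ;;   (define bp (make-resonator (/ 610.0 4880.0) 0.01))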
let's focus on trying to understand the PSK31 demodulator. till now i
only found code examples, no highlevel pseudocode or diagrams. here
Peter G3PLX talks about AFC (automatic frequency correction):

  http://www.ka7oei.com/fsk_transmitter.html#FSK31_Explained

with PSK apparently the frequency correction doesn't need to know
anything about the data, since the spectrum is symmetric around the
carrier. i'm not sure whether AFC is necessary in my scheme: all
receivers and transmitters are stationary, and there's no wind. on
the scope however i did see some slight variation in period, but this
was probably due to motion of the speaker/mic (just sticking up by
its pair of connecting wires).

"To get in sync, the PSK31 receiver derives it's timing from the 31Hz
amplitude modulation on the signal. The Varicode alphabet has been
specially designed to make sure there's always enough AM to keep the
receiver in sync. Notice that we can extract the AM from the incoming
signal even if it's not quite on tune. In PSK31 therefore, the AFC
and the synchronisation are completely independent of each other."

So it's not completely true that the AFC doesn't need to know
anything about the data: the data needs to be 'rich enough'. But the
trick of getting the AM straight from the signal is interesting. This
means i can probably proceed nicely from AM -> PM.

Some alarming notions here:

  http://www.nonstopsystems.com/radio/frank_radio_psk31.htm

"Like the two-tone and unlike FSK, however, if we pass this through a
transmitter, we get intermodulation products if it is not linear, so
we DO need to be careful not to overdrive the audio. However, even
the worst linears will give third-order products of 25dB at +/-47Hz
(3 times the baudrate wide) and fifth-order products of 35dB at
+/-78Hz (5 times the baudrate wide), a considerable improvement over
the hard-keying case. If we infinitely overdrive the linear, we are
back to the same levels as the hard-keyed system."

What i saw on my scope is a strong 2nd harmonic, probably due to
non-linearity caused by the DC bias in the speaker. Using some kind
of feedforward correction based on a measurement it is probably
possible to correct this when it becomes a problem: the transmitter
is simple enough so all kinds of wave shaping corrections could be
introduced there.

"The PSK31 receiver overcomes this (ED: side lobes due to square
window) by filtering the receive signal, or by what amounts to the
same thing, shaping the envelope of the received bit. The shape is
more complex than the cosine shape used in the transmitter: if we
used a cosine in the receiver we end up with some signal from one
received bit "spreading" into the next bit, an inevitable result of
cascading two filters which are each already "spread" by one bit. The
more complex shape in the receiver overcomes this by shaping 4 bits
at a time and compensating for this intersymbol interference, but the
end result is a passband that is at least 64dB down at +/-31Hz and
beyond, and doesn't introduce any inter-symbol-interference when
receiving a cosine-shaped transmission."

"PSK31 is therefore ideally suited to HF use, and would not be
expected to show any advantage over the hard-keyed integrate-and-dump
method in areas where the only thing we are fighting is white noise
and we don't need to worry about interference."

So maybe it's not necessary yet? Since we're using a single frequency
in the first attempt, a demodulator that rejects nearby signals might
not be required. Anyway. Conclusion: i need to have a look at the
exact algorithm used for matching + synchronization.

Entry: demodulator.f
Date: Fri Nov 30 21:07:31 CET 2007

i had some bottom up code (what can be done efficiently) using
8x8->16 unsigned multiplication and 24bit accumulation. this works
well for rectangular windows, but not so much for non-rectangular.
maybe rectangular is enough since we don't have interfering signals?

anyway, it might be wise to look at how to do a windowed one.. i
guess the idea is like this: make the window obey some kind of
average property that can be removed using maybe a separate
accumulation of the signal. it doesn't look that hard: with ** the
inner product,

  [ s(t) + s_0 ] ** [ w(t) + w_0 ]

gives 3 correction terms:

  s(t) ** w_0   == 0
  w(t) ** s_0   == 0
  w_0  ** s_0

which requires the average signal s_0 as the only variable component,
which needs to be scaled with the window DC component (can be 2^...)
and a fixed offset. so i can basically use the same unsigned core
routine for general complex FIR filters: renamed the macros to
mac-u8xu8.f, and added complex-fir.f
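a quick scheme check of that bookkeeping (my own sketch, assuming
samples and coefficients are both stored as u8 centered at #x80; all
names are made up, not from mac-u8xu8.f):

  (define (dot a b)                   ; plain inner product
    (if (null? a) 0
        (+ (* (car a) (car b)) (dot (cdr a) (cdr b)))))

  (define (signed u) (- u 128))       ; u8 -> signed, offset #x80

  (define (corrected-dot us ws)       ; us, ws: lists of u8
    (let ((n (length us)))
      (+ (dot us ws)                  ; raw unsigned MAC
         (- (* 128 (apply + ws)))     ; s_0 ** w
         (- (* 128 (apply + us)))     ; w_0 ** s
         (* n 128 128))))             ; s_0 ** w_0

  ;; (corrected-dot '(200 100 50) '(128 255 0))           => 6428
  ;; (dot (map signed '(200 100 50)) (map signed '(128 255 0)))
  ;;                                                      => 6428

only (apply + us) varies per input block; the other corrections are
constants of the filter, which is the point made above.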
Entry: drop dup
Date: Fri Nov 30 23:02:59 CET 2007

optimization:

  drop dup -> movf INDF0

Entry: implementing the filter loop : complex-fir.f
Date: Sat Dec 1 10:27:37 CET 2007

i have it down to about 31 instructions for an unsigned multiply
accumulate operation 8x8 -> 24, and an accumulation. both can be
combined after the loop to correct the offset. offset compensation is
implemented now, and all sharable code has been moved to macros in
mac-u8xu8.f. routines are tested and seem to work just fine.

so: filter coefficients are centered at #x80, but the accumulator
will shift one position to the left to compensate, so the filter
coefficients behave as s.7 bit fixed point with inverted sign bit if
the accumulator is seen as s.15.8. this means that:

  11111111 -> +0.1111111
  00000000 -> -1.0000000

Entry: -!
Date: Sat Dec 1 11:49:04 CET 2007

  \ value addr --
  : -! >r negate r> +! ;

this subtracts the number on the stack from the variable, not the
other way around. note that this has the argument order of '!', not
of '-'. the reason for doing it like this is that this occurs the
most: subtract a value from an accumulator variable.

Entry: subsampling
Date: Sat Dec 1 13:43:35 CET 2007

A baud rate that sounds like 16th notes would be nice, which is about
8 Hz. With a carrier of 600 Hz, this gives a ratio of 75. The
sampling rate needs to be > 16Hz, so let's take the one in the
neighbourhood of 32Hz. Care needs to be taken though when using
aliasing: the frequency error will amplify. Let's see.. using 10 Mhz,
the subdivisions become:

  2^20  ->  9.5 Hz     baud rate
  2^18  ->  38.1 Hz    sampling frequency
  2^14  ->  610.4 Hz   carrier frequency
  2^12  ->  2.44 kHz   4 x carrier
  2^7   ->  78.1 kHz   PWM frequency

the carrier/baud ratio here is 2^6 = 64. going from carrier ->
sampling frequency is a subdivision of 16. what's the error of the
oscillators? the CSTLS10M0G53 has 0.5 % precision. times 16 that
becomes 8.0 %, which is quite a lot.. so probably it does need
continuous phase compensation??

another reason to not subsample is to get better noise performance
and better frequency rejection due to longer integration time. using
4 x carrier frequency is still only 2.4 kHz which is at 2^12
subdivision, or 4k instructions per sample, which is absolutely no
problem. this gives 2^8 samples per symbol.

another thing to think about: synchronization. this can be
implemented using time shifts or phasor rotation. the latter is
probably not a good idea due to problems with filter matching. so
actually, the carrier needs to be significantly oversampled, or at
least mixed.. i think i need to make a new table with variables..

Entry: synchronization
Date: Sat Dec 1 14:34:11 CET 2007

if there are enough symbol alternations present this causes
significant AM modulation which makes synchronization easy: sync to
the zero crossings. this means the preamble needs to be 01
transitions.. probably best to use simple async with 1 = idle =
transition and 0 = no transition.

next:

  * AM send
  * AM receive
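a tiny scheme sketch of that convention (mine, not project code):
encode maps bits to carrier signs, decode compares consecutive
symbols, and an idle line of all 1s gives the richest sync signal.

  (define (encode bits)                ; bits -> +1/-1 symbol signs
    (let loop ((bs bits) (phase 1) (acc '()))
      (if (null? bs)
          (reverse acc)
          (let ((p (if (= (car bs) 1) (- phase) phase)))
            (loop (cdr bs) p (cons p acc))))))

  (define (decode syms prev)           ; 1 = reversal, 0 = none
    (if (null? syms)
        '()
        (cons (if (= (car syms) prev) 0 1)
              (decode (cdr syms) (car syms)))))

  ;; (decode (encode '(1 0 1 1 0)) 1)  =>  (1 0 1 1 0)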
Entry: multiplication again
Date: Sat Dec 1 15:16:13 CET 2007

funny how this is starting to be an exercise in multiplication
routines :) since i'm using unsigned multiplication for the filter
for efficiency reasons, i have to implement signed multiplication in
cases where correction can't be moved to outside of a loop, which is
the generic case. the 16bit multiplication performs correction using
conditional subtraction. for 8 bit it's probably easier to use
conditional negation, since that doesn't require extra storage.

this sucks.. -1 * -1 overflows to -1.. maybe it's better to use a
representation that can actually encode 1, even if this means giving
up one bit of precision? s1.6

Entry: userfriendly
Date: Sun Dec 2 10:37:42 CET 2007

i need to weed a bit in the user-friendliness and SIMPLIFY the way
some things are used, because it seems as if some combinations cannot
be made. i wanted to create scheme code that generates forth macros,
but it looks like this is not so easy!

another thing is 'splitting' the host and target, so the host can run
some kind of query program in cat or scheme.. maybe the 'current-io'
parameter should be set back again in prj and scheme modes?

Entry: clicks
Date: Sun Dec 2 13:03:06 CET 2007

need ramp-up and ramp-down to prevent clicks. ramp-up time should be
in the order of 20ms = (50Hz)^(-1). after ramp-up, carrier fade in
should be used. this can use the 'attenuation' variable.

OK. using 25ms ramp to bias. now, how to initialize the phase?

  OOK:  can start envelope with -128 (amp = -1),
        carrier with -64 (amp = 0)
  BPSK: needs carrier fade-in.

looks like BPSK sounds smooth enough without envelope fade-in when
starting the carrier at phase = -PI/2. doing the same now for AM, so
there's no problem with envelope frequency = 0.

Entry: transmitter
Date: Sun Dec 2 15:08:19 CET 2007

time to get the transmitter sorted out, so i can make a standalone
device that sends out a known data sequence:

  * combine the framed rx/tx with sending/receiving
  * figure out OOK and BPSK transition based send modes

return to zero, i don't see the point in that, so transition based
seems good enough. let's say 1 = trans, 0 = notrans. this has the
advantage that an idle line is the richest signal, good for sync
purposes. transition based is easiest to implement using the current
code. in case transition based is not desired (i.e. because it
accumulates error), this can still be pre-coded as long as a
transmission starts with a known oscillator phase.
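going back to the -1 * -1 remark in the multiplication entry above, a
small scheme illustration of the wrap (my sketch; only the shift
counts matter here):

  (define (wrap8 x)                 ; wrap to 8 bit two's complement
    (let ((w (bitwise-and x 255)))
      (if (> w 127) (- w 256) w)))

  (define (q* a b frac)             ; fixed point multiply, frac bits
    (wrap8 (arithmetic-shift (* a b) (- frac))))

  ;; s.7:  -1.0 is -128, and (q* -128 -128 7) => -128, i.e. -1.0:
  ;;       +1.0 does not exist in s.7, so the product wraps.
  ;; s1.6: -1.0 is -64, and (q* -64 -64 6) => 64, i.e. +1.0 exactly,
  ;;       at the cost of one bit of precision.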
Entry: signal rates revisited
Date: Sun Dec 2 18:50:57 CET 2007

4 independent frequencies:

  * PWM TX rate: determines high-frequency + aliasing noise
  * carrier:     only important for the path (i.e. speaker reso)
  * baud rate:   bandwidth -> noise sensitivity
  * RX rate:     selectivity (related to FIR length)

(EDIT: carrier and baud rate are not independent wrt data filter
quality. see below)

important for the receiver are:

  - baud rate, which limits the maximal integration time (dependent
    on symbol length).
  - RX rate: enables longer filter lengths, which gives more
    selectivity and noise immunity.

it doesn't make sense, for constant baud rate, to up the RX frequency
but keep the FIR length constant, so:

  FIR = k . (RX / BAUD)
  RX  = Fosc / OPS

where k is the number of symbols the FIR spreads over, probably 1 or
2, and OPS is the amortized number of operations per sample
(processing and acquisition). the filter is 32 in the current
implementation. this gives about 300kHz at 10MIPS. looks like we have
some headroom.. anything more than 8kHz is probably not going to make
much sense.

( I was thinking about noise and dithering, and that at this high
frequency, because of the absence of noise, there will be no 'extra'
sensitivity due to dithering at levels close to the quantisation
step, but there probably will be extra due to pwm effects. So it
looks like all small bits help.. )

EDIT: another variable i forgot to mention is symbol rate
vs. carrier. using a mixer, it is desirable to have large separation
between the two so a simple data filter can be used.

Entry: matched filter
Date: Sun Dec 2 21:22:33 CET 2007

differential BPSK data stream:

  .   .___.   .   .___.
   \ /     \ /   \ /
    X       X     X
  ./ \.___./ \./ \.___.
    1    0    1  1    0

Using cosine crossfading as implemented in modulator.f is effectively
the same as using symbols 2 baud periods wide with a 1 + cos
envelope. this wavelet is the output filter which maps a binary
+1,-1 PCM signal to the shaped BPSK signal. This output filter needs
to be matched in the receiver.

Now, about matched filters.. A matched filter in the presence of
additive white gaussian noise is just the time-reverse of the
wavelet: one projects the observed signal vector onto the 1D space
spanned by the wavelet's vector in signal space. This gets rid of all
the disturbances orthogonal to the wavelet's subspace. When the noise
is not white, the noise statistics are used to compute an optimal
subspace to project onto, such that most of the noise will still
vanish. I don't have noise statistics, and I'm not going to use any
online estimation, which leaves me with plain and simple convolution
with the time-reversed signal. I do wonder what all this talk is
about 'designing' matched filters for PSK31...
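in list form that is just correlation with the wavelet, i.e.
convolution with its time reverse (a sketch of the general idea,
nothing project specific -- slow, for reference only):

  (define (correlate wavelet signal)
    ;; one output per alignment of the wavelet over the signal
    (let ((n (length wavelet)))
      (let slide ((s signal) (out '()))
        (if (< (length s) n)
            (reverse out)
            (slide (cdr s)
                   (cons (let sum ((w wavelet) (x s) (acc 0))
                           (if (null? w)
                               acc
                               (sum (cdr w) (cdr x)
                                    (+ acc (* (car w) (car x))))))
                         out))))))

the output peaks where the signal lines up with the wavelet; sampling
that once per symbol period is the 'integrate and dump' mentioned in
the quote above, for the special case of a rectangular wavelet.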
Entry: phase synchronization
Date: Sun Dec 2 22:27:44 CET 2007

I'm confusing 2 things:

  * bit symbol synchronization
  * carrier phase synchronization

Using a complex matched filter, phase synchronization can be done
entirely by using an extra phase rotation operation: it doesn't
really matter what comes out, as long as:

  - the matched filter's envelope is synchronized
  - we're using I and Q filters

It's clear that mismatching the symbol clock has a lot less effect
than mismatching the carrier phase. When compensating the carrier
phase, we compensate what's aliased down after subsampling at symbol
rate. This still needs 2 separate synchronization methods: bit symbol
synchronization (which sample point to start filtering) and carrier
frequency/phase compensation.

Entry: recording sound
Date: Mon Dec 3 10:35:52 CET 2007

let's go for this: a symbol is 256 samples. This allows easy buffer
management. This fixes the sample rate at 4.88 kHz.

recording seems to work ok: it's at 8x the reso frequency of the
speaker: scratching it over a newspaper gives a nice saturated wave
with period about 8. maybe it's time to add gplot in the debug loop.

Entry: IIR or FIR
Date: Mon Dec 3 15:39:49 CET 2007

IIR:
  * mixer + lowpass
  * sync mixer to carrier
  * start bit detection = zero crossing
  * no messing with blocks
  * only approximate matching

FIR:
  * possible to construct optimal matching filter
  * no phase distortion
  * synchronization more complicated (filter freq is fixed?)

It looks like IIR + PLL is really simpler to implement (same at every
sample block, no buffers necessary).

NEXT: have a look at how to implement a PLL.

( Actually... It should be possible to mix the asymmetric tail of a
stable IIR filter in the transmitter! Though not simple due to
rounding.. Something like that can probably not be computed exactly,
so this would require a bit more expensive transmitter.. )

Using a PLL it's probably best to first try to synchronize to a clean
carrier. Since a mixer is necessary as part of the processing chain,
that can be used to perform the correction.

  -> [ MIX ] -> [ LPF ] ----> I --> [ AGC/HIST ] -> bits
        ^               --o-> Q
        |                 |
        \------[ OSC ]<---/

The quadrature component can be used as an error feedback. This
always works, since it's not present in the signal.

reading this:

  http://rfdesign.com/mag/radio_practical_costas_loop/

two things are mentioned to perform carrier recovery:

  * squaring + division
  * costas loop

note that's about an analog implementation. so not all roses in the
IIR world.. what about taking the best of both? perform carrier
recovery using a mixer + PLL and use a similar approach for the data
sampling recovery.

a nice place to go back to is this paper:

  http://www.argreenhouse.com/society/TacCom/papers98/21_07i.pdf

where the signal is sampled using a 1-bit dac, and the mixer has
values {-1,0,1}. after integration, an adaptive rotation is
performed.

Entry: simplified
Date: Mon Dec 3 16:56:50 CET 2007

  * AGC: OR the absolute values of a symbol buffer + compute shift
    count
  * INTEGRATE: sum the entire buffer (no sideband rejection)
  * sample at say 8 points per symbol

since there's no filter other than the analog 450-2kHz this should
perform pretty bad. but i guess it's time for a fail-safe.. use noise
modulation first :)

  NM -> AM (async) -> PM (sync)

a genuine problem doing this all experimentally is the program
sequencing.. there's a huge difference between being able to do
something per sample and having to store some for later..

EDIT: it doesn't make sense to write an AM demodulator without
thinking about the BPSK that will follow, so i need to do AM with a
separate mixer + LPF. the mixer seems straightforward. the remaining
problem is the LPF. if i can make that work with simple shifts, we're
as good as there..

Entry: triangular window
Date: Mon Dec 3 17:33:12 CET 2007

however.. it is probably possible to use triangle windows and
'recompose' things later, since a triangle window is self-similar!
given a number of sample points, from this construct 2 numbers: one
weighted with ramp up, one with ramp down. these can be easily
combined, so one could shift the center of the window and recompute
easily.

Entry: interrupts
Date: Mon Dec 3 17:46:08 CET 2007

looks like the real question is whether or not to use interrupts.
doing this as state machines leaves too little room for block-based
FIR techniques. i'm also not very convinced about trying the AM
first, because i'm already trying to optimize the layout for that
algo: i need to go for mixer + IIR LPF, and implement AM in that
framework.

Entry: data filter coefficient
Date: Mon Dec 3 18:31:07 CET 2007

The constraint is: we don't care about the delay, but attenuation
shouldn't be too big. What about this: pick the pole at half the bit
rate, and round up to the next power of 2:

  1/sqrt(2) = (1 - 2^(-p)) ^ t

EDIT: how to pick p? it's easier to use this approach, where we
require the decay time to be such that the response will drop below
the 1/2 threshold in one symbol time:

  (1 - 2^(-p)) ^ t < 1/2

where t is the number of samples in a symbol. this is equivalent,
since the t in the previous formula is related to half the baud rate.
if t is large (in our case it's 64), the linear term is the one that
dominates, so the lhs can be approximated by 1 - t 2^(-p), which
gives an expression:

  p = log_2 (2t)

Entry: AM vs PM
Date: Tue Dec 4 11:38:18 CET 2007

something i missed yesterday: demodulating AM with a non-properly
tuned mixer might give trouble. no, this is not the case as long as
both the I and Q components are computed: it only gives a problem for
PHASE (which will rotate on mismatch), not AMPLITUDE.

Entry: data filter implementation
Date: Tue Dec 4 12:03:34 CET 2007

the easiest way to keep precision is to never lose any bits. the data
filter has the form:

  x <- a x + (1 - a) u

where x is state and u is input, and a = 1 - 2^(-p), i.e.

  x += 2^(-p) (u - x)

the current settings give t = 512 (5kHz sample rate and 9Hz symbol
rate), which means p = log_2 (1024) = 10 as the approximation of the
bound. speeding up the filter by a factor of 2 gives p = 9. it might
be worth relaxing it even further to 8, so shifts are eliminated.

( so, just out of curiosity.. is it possible to use unsigned
multiplication? just doing this without thinking introduces a scaled
copy of the original modulated signal in the output. if the lowpass
filter allows, this might be not a problem: requirements are just 2x
as strict. )

problem with signs: it might be simpler to work completely with
unsigned values since signs make multi-byte arithmetic more
complicated (need sign extension). a simple solution is to run the
multiplication as signed (to get rid of the component at the carrier
frequency) but run the filter accumulation as unsigned. the DC
component in the result is completely predictable and can be
subtracted later.

first experiment i measure something: noise is at around 5 and the
maximal measured signal is around 150. that's a significant
difference. now it's time to map the 24bit range to something more
manageable. now to be careful not to overflow the filter input: it
seems reasonable to ignore the lower byte.

looks like i have a bug in the signed 16bit multiplication routine.
EDIT: yep.. typo: TOSL instead of TOSH
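to see what the p rule means in practice, a scheme sketch of the step
response in plain integers, the shift being the only operation, as on
the PIC (test values are mine):

  (define (settle p u threshold)
    ;; steps until x += (u - x) >> p crosses threshold from 0
    (let loop ((x 0) (n 0))
      (if (>= x threshold)
          n
          (loop (+ x (arithmetic-shift (- u x) (- p)))
                (+ n 1)))))

  ;; (settle 10 4096 2048) counts the samples needed to reach half
  ;; scale.  continuous-time answer: ln(2) * 2^p =~ 710 samples;
  ;; integer truncation adds a little.  compare with t = 512.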
Entry: better debugging tools
Date: Tue Dec 4 15:07:04 CET 2007

i need a way to print reports from ram.. before this can be done in a
straightforward way, the interaction language (which will need to be
cat or scheme) needs to be defined properly + some way of adding code
like this to the project needs to be defined.

what i need now is a way to inspect 24 bit numbers.. what about
adding inspectors to the code? these are forth words that send out
data in the form of a buffer. i could then make inspectors for any
kind of thing.

EDIT: yes.. i really have a good excuse to make proper debugging
tools. just fixed the prj console to be able to connect to the
target. was thinking about properly specifying interactive commands
as an 'escaped' layer over the target interaction.. basically every
possible 'island' in the code needs to be extensible.. most
importantly: macros, interactive words, prj words, scheme code, ...

EDIT: considering the amount of time i'm losing to get this thing
going, it might be wise to standardize on a method.. i.e. all 16bit
signed fixed point or something.

Entry: double notation
Date: Tue Dec 4 16:15:02 CET 2007

there are some things to distinguish:

  1 x 1 -> 1    word    (standard words)
  2 x 2 -> 2    _word   (16bit variants of standard words used in DTC)

  1 x 1 -> 2    2word, 3word etc.. nonstandard,
  1 x 2 -> 2    any combination that makes sense
  1 x 3 -> 3
  1     -> 2
  ...

Entry: costas loop
Date: Tue Dec 4 23:09:21 CET 2007

Have a look at the HSP50210 datasheet. It gives a nice general idea
about how a PSK receiver would work: 3 tracking loops
(AGC,carrier,symbol), user selectable threshold, matched filter (RRC
or I&D), soft decisions.

Entry: saturation
Date: Wed Dec 5 12:27:13 CET 2007

It's important to be able to prevent wrap-around distortion. Some
kind of saturation mechanism might make this easiest: it's easier
than carrying around high precision data. So where to saturate? Most
straightforward is the LPF, but at first glance it's better done at
the point where power is calculated, since the LPF seems to have
enough dynamic range.

The properties of a non-saturated word are:

  * the sign word is #x0000 or #xFFFF
  * both words have the same sign bit

This can be reduced to: sign word + lower sign byte == 0. OK

Entry: weird LPF output
Date: Wed Dec 5 15:42:12 CET 2007

i think i need to focus on building some more debug tools today..
something's going wrong and i can't find the cause. the problem is
amplitude modulation in the LPF power output, going from 100 -> 400,
with a period of about 35 = 140Hz. and a component at 4 x that
frequency, not locked, which is probably the carrier. i measure this
with the modulated signal, and with an unmodulated carrier.

go one by one: it's probably best to try to eliminate the DC offset,
so at least that is not drowning the signal component, which is a lot
smaller.. EDIT: this is already happening: sample is converted to
signed then multiplied.

ok.. questions:

  * why is there a 1/8 Hz component in the power output? i would
    expect the power to be smooth.. not modulated -> this is just
    noise. the level is really low, so it's the accumulation of
    (2^(-8) * u).

  * why is there a 1/64 Hz component in the power output? EDIT: the
    frequency is a mixer mismatch = 1/8 - 1/8', where 8' is the not
    quite =8 measured carrier frequency.

  * what does the filter input look like? did some input signal
    measurement, and the first thing i notice is that the carrier
    frequency is quite off. i get 28/4, so T=7 instead of T=8. which
    would give a beat at 1/56. that might explain a lot...

ok.. i get it. the convolution of these 2 spectra:

  |     .     |      cos(w1 t)
  |      .    |      cos(w2 t)
        0

gives:

  |    |.|    |      cos(wd t) cos(ws t)
        0

with wd = w2-w1 and ws = w2+w1. the sinewave that gets folded near 0
will interfere with the signal data! so this approach just doesn't
work without synchronization! it looks like the only way to do this
is to either have proper synchronization, or use a band pass filter,
not a mixer.

  AM: first order lowpass with complex coefficient, followed by
      output power computation.
  PM: requires AGC or cartesian->polar conversion for properly
      scaled Q -> phase feedback.

the quick and dirty way is to just filter the absolute value of the
input, then add a more selective filter. hmm.. i still need to kick
out DC, so better go for the frequency-selective filter.

Entry: 1/8 or 1/4 frequency filter?
Date: Thu Dec 6 17:12:32 CET 2007

it's probably easier to separate the problem in 2 parts: (1 -1 ; -1 1)
with sqrt(2) amplitude, compensated by a single arbitrary
multiplication to get the gain to 1-2^(-8). this requires at least
16bit. incorporating the scaling factor in the matrix seems to lead
to the same precision problem, but requires 4 multiplications instead
of one.

so what about a 1/4 filter? that's even simpler, and doesn't require
a sqrt(2) scaling factor, so the (1-2^-8) scaling can be done without
a multiplier. so.. the lpf filter i had before can be re-used. the
only thing to add is to cross-add the filter states, and add in the
input signal. rotating the signal can be done using a 4-state state
machine, which will add/subtract the signal to/from one of the
states:

  +x +y -x -y

given this approach it's probably also possible to reduce the LPF
state from 24 to 16 bit. check this. in a stable regime, using 2
bytes, the high byte will have the amplitude of the input at the
frequency, so at least for strong signals it would be stable (gain =
1). looks like it only has an effect on noise and rejection.

Entry: too much carrier drift
Date: Thu Dec 6 19:16:16 CET 2007

so, to get a bit of full-circle understanding: why not mix a signal
to DC and filter its absolute value? looks like the thing i did wrong
was not the mixer, but the place where the smoothing is going on. or:

  * mix + filter: isolates a frequency region
  * full-wave rectify + filter

2 filter operations are essential here, so it's probably easier to do
only one, and instead of full-wave use the amplitude/power of a
filter.

But but but... maybe the filter is actually too sharp? I measured the
carrier at 1/7Hz, expecting it at 1/8Hz.. i'm missing a parameter:
bandwidth and time decay are related, but increasing the sample
rate.. look: this is just a shifted one-pole filter: it's the
equivalent of passing the difference signal 1/7-1/8 to the lowpass
filter. that probably won't survive.. so i'm stuck with the same
problem: the carrier shift is much more than the bandwidth of the
signal! it's about 80Hz at 600Hz, while the signalling frequency is
around 9Hz. this means i have to do something about it... it's either
going to be manual tuning, or adaptive tuning.. synchronous demod is
starting to look like the only solution. or i should just use the 2
filter approach of above:

  * wide filter to eliminate noise: it should be wide enough to
    capture the carrier tolerance.
  * narrow filter to perform demodulation after full-wave rectify.

it's starting to look like synchronous is going to be a lot less
hassle.. again.. what do i need? an AGC to normalize the Q output
such that i can use it as feedback to phase offset. go over this
again.. something's wrong.

Entry: cordic
Date: Fri Dec 7 09:36:31 CET 2007

the most elegant solution seems to be to use a cordic I,Q->A,P
transform, so both the AGC and PLL have proper data to work on. For
use in the demodulator, the constant scaling factor is not a problem.

What I would like to do is to perform sequential updates: use Q to
update I and then use the updated I to update Q. With a = s2^(-n)
this amounts to:

  | 1 a |   | 1 0 |   | 1 + a^2  a |
  |     | * |     | = |            |
  | 0 1 |   | a 1 |   | a        1 |

Which is no longer a scaled rotation. Correcting this looks like more
hassle than just performing the update in parallel. I don't need a
lot of phase resolution. 8 bit is definitely enough. Hmm.. this is
going to be a lot of work..

Entry: simplified PLL
Date: Fri Dec 7 11:21:40 CET 2007

What about using a 2 bit phase detector which just detects the
quadrant and accordingly adjusts the frequency?

   -2 | -1
  ----+----
   +2 | +1

With + meaning counterclockwise. Since we're not using the Q
component, both directions of I should be allowed, so a better
approach is:

   +1 | -1
  ----+----
   -1 | +1

Filtering this signal and using it to increment the frequency gives
the right amount of feedback near the lock. In the phase diagram,
what needs to be done is to slow down the oscillator. The design
parameters here are:

  - smoothing of the phase error
  - gain of the phase error

I'm not too sure about oscillations though.. Maybe linear error
response is an essential element? I guess i'm missing some experience
here. Gut feeling says it should be possible to design a PLL by
filtering a 2 bit phase detector. Gut feeling also says that this
will lead to oscillations.

I'm off track again. These are the choices to make:

  - go for CART->POLAR transform with high resolution (i.e. 8 bit)
  - use AGC and the Q component for feedback.

The latter seems simpler. Maybe i should try that first. Cordic isn't
as straightforward as i thought since it needs a barrel shifter.
Which could be implemented using the multiplier, but then why not use
proper coefficients?

So.. AGC. Stick to the mixer algo, but figure out how to perform
variable gain so the error signal used to drive the phase adjustment
is properly scaled. Estimate the gain using a filtered sum of the
absolute values of the I and Q components.

Entry: PLL analysis
Date: Fri Dec 7 13:32:07 CET 2007

Using linear system theory: around the error=0 point, the system is
linear and behaves like a controlled integrator. We control frequency
(velocity) and out comes phase (position), which is the integral of
frequency. Such a system with a proportional controller is stable
because it is first order with negative feedback. It can be sped up
by increasing the gain. However, faster also means more susceptible
to noise on the control signal (in the PLL case the Q signal).

This is in absence of a disturbance signal. This can be modeled by a
signal d which drives the integrator directly. In the PLL case this
is the frequency mismatch. This will result in some permanent error.
The ratio between the 2 is determined by the error amplification.

Questions:

  * add or subtract from rx-carrier-inc?
    -> depends on whether one wants to sync to +I or -I
  * how to prevent mixer drift?
    -> looks like the DC component of the error should not have any
       influence?

Entry: discrete control systems
Date: Fri Dec 7 14:00:14 CET 2007

Looks like the thing i'm confused about is the difference between
analog control systems and digital ones. An analog 1st order
proportional control system can never overshoot, but a naive
discretization of this can! The problem here is instability of
integration methods.

Entry: the problem with the frequency offset
Date: Fri Dec 7 14:48:07 CET 2007

i think i found it: really stupid.. first i thought it was an
oscillator problem. didn't occur to me to try with 2 different boards
to see if that's actually the case. anyways, after trying, i got
exactly the same result. looking at the code i find this:

  : sample> 16 for wait-sample next 0 ad@ ;

which, if the processing takes longer than 1/16 of the clock period,
is wrong of course! the solution is to solve this using the timer, or
perform the sampling in an isr. let's try the postscaler.

OK. need a break.. what i'm doing wrong is to use the integral of the
error to compute the frequency.. frequency should be just F_0 - e.

after the break.. looks like i'm still making too many mistakes: of
course, if i just restart the tracker at a random point of carrier
phase, chances are that there are going to be some transient
phenomena. i just need to run it longer probably.

OK: sync works to plain carrier.

Entry: synchronization to modulated carrier
Date: Sat Dec 8 11:30:05 CET 2007

i tried the following: use the sign of I to steer the direction in
which the feedback works. works ok for a clean carrier, but in full
reversal this leads to problems. looks like a conceptual problem.
maybe the synchronizer should be slowed down? in a sense that a
symbol transition, which moves through a zero feedback point (in
which the carrier is effectively not controlled), has no noticeable
effect on the setting of the tuner, but when this transition is
complete, full feedback is in effect to pull the oscillator in sync
again.

using just Q feedback, the PLL seems to stabilize around Q = -120,
with an amplitude of about 30. say -128, that's -#x0080:

  #2000 -> #1F80

it's 1/64th of the frequency, which is exactly the difference between
symbol rate and carrier: the PLL locks to another attractor.. a
simple solution seems to be to limit the PLL frequency correction.
anyways, the sign stuff is necessary.

Entry: symbol synchronization
Date: Sat Dec 8 12:07:52 CET 2007

because i'm using locked synthesis and no non-synced downmixing, the
symbol synchronization can be derived from the carrier
synchronization. so maybe i should forget about syncing to the
modulated carrier? pulling the oscillator in sync using a plain
carrier might help a lot actually. let's try a 7/1 test tone.

Entry: first packets: pll and reversals
Date: Sat Dec 8 13:25:20 CET 2007

apart from some problems related to gain (probably too much drive
which kicks the PLL out of sync: moving the things apart gives better
results) it seems to work just fine. looking at an I,Q plot i suspect
the slow rise of the I signal is not due to filtering, but due to
loss of sync: Q gets thrown off, and the PLL needs to re-sync. maybe
it's more important to filter the error feedback..

aha! it seems as if the PLL switches to the negative frequency
attractor. indeed. with wide spaced reversals it is clear that Q
moves from around -13 to +13. the problem is that suddenly moving
from subtract to add changes the frequency of the oscillator from
bias+corr to bias-corr. how to solve this?

aha.. maybe it's not necessary to flip the sign? since a phase
reversal in the I plane doesn't change the Q component? actually, it
does. switching off the sign compensation resynchronizes the
oscillator on transition to I = +A.

it looks like i need a controller with a zero error, which
effectively means a PI instead of a P controller. note that i already
had an I controller, but that's unstable. i'm measuring an error of
about #x10 / #x2000 = 0.2 % -- the spec sheet says 0.5 % max. looks
normal.

thinking about this PI controller: P + lowpass can't work, because
there is no zero-error. so i need an integrator. the problem is the
time constant / gain factor. the error (Q) does seem to go to zero
now. however, there is still a transition at the reversal. now that i
have a zero error, it's maybe best to multiply the I and Q to obtain
the error signal?

for after dinner: i'm stuck with yet another scale problem.. fixed
point without a barrel shifter is madness.. it might have been better
to just implement the tools necessary, even if they are inefficient:
it is definitely doable (which is what i wanted to prove really..)
but it's difficult.

NEXT:

  - I * Q
  - AGC

preferably combined such that I * Q and error feedback become simple.
i just saturated the error output to +-127.. i get nice results for I
amplitude around 100-150. but still: the Q component wiggles when the
phase reverses.

reading the costas-loop paper mentioned above: the 3rd multiplier is
called a phase doubler. its only point is to make +-180deg both
stable lock points. so, i'll write up the problem below.

Entry: more questions
Date: Sat Dec 8 15:04:31 CET 2007

why does the PLL response oscillate? the analysis by linear
approximation i made above showed it was first order.. something's
wrong there.
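a toy model of the loop as it stands, useful for playing with the
overshoot question on the host. this is my own floating point
reconstruction with made up constants, not the target code; it uses
the I*Q doubler as phase detector, so both +I and -I are stable lock
points and a BPSK reversal leaves the error invariant:

  (define two-pi 6.283185307179586)

  (define (make-lpf a)                  ; one-pole lowpass
    (let ((y 0.0))
      (lambda (x) (set! y (+ y (* a (- x y)))) y)))

  (define (make-tracker f0 gain)        ; f0 in cycles/sample
    (let ((phase 0.0)
          (freq  f0)
          (lp-i  (make-lpf 0.05))
          (lp-q  (make-lpf 0.05)))
      (lambda (x)                       ; sample in -> (I Q) out
        (let* ((i (lp-i (* x (cos (* two-pi phase)))))
               (q (lp-q (* x (- (sin (* two-pi phase)))))))
          (set! freq (+ freq (* gain i q)))  ; P feedback on I*Q
          (set! phase (+ phase freq))
          (list i q)))))

being P-only, a static frequency offset leaves a static Q error --
the PI argument above -- and too large a gain makes the discrete
loop overshoot, unlike its analog counterpart.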
Entry: generic lowpass filters
Date: Sat Dec 8 16:24:02 CET 2007

it's no longer manageable to have these special 1 - 2^(-8) filters..
i need a special purpose 16bit lowpass filter, with saturation,
operating on proper 16 bit signed values, with possibly 8 bit
coefficients in a decent range. it looks like there's plenty of room
to do it in a proper object-oriented fashion.

not doing it in proper object-oriented fashion, but a macro operating
on 4-byte state: 3 byte signed filter state, and 1 byte unsigned
filter coefficient: .00AA

Entry: AGC
Date: Sat Dec 8 22:24:17 CET 2007

it's not so straightforward, since it needs a division operation..
currently, with the multiplication doubler (also with the sign
doubler) locking seems to work fine around 100-150 amplitude.

Entry: lock problems on transition
Date: Sat Dec 8 22:40:44 CET 2007

i still get the same problem: on transition, the phase is messed up
again. maybe the oscillator phase should rotate too? i'm confused...
at the point where the I component goes into transition, the Q
component gets kicked off. the integrating controller works well:
error goes to zero eventually. i just need to figure out why the
phase bumps..

something strange tho.. the Q spike only happens on a +1 -> -1
transition. the -1 -> +1 transition is clean. this smells like some
kind of wrap around bug.. sending #x01 bytes instead of #x11 bytes
seems to contradict this: spike on every transition.

Entry: emergency solution: AM
Date: Sun Dec 9 00:02:13 CET 2007

tomorrow it looks like the best thing to start with is gain control,
to find an optimal feedback coefficient for the PLL. once this works
i can try to find a bitrate that works with the phase error still
happening. then i could hand it over and try to fix the
sync/transition error.

normalization:

  * agc (division + filtering)
  * arctangent

previous conclusion about cordic arctangent was that it's hard to do
without a barrel shifter.. i can probably unroll most of this by
using the multiplier and double buffering. good thing is that this
can be used for AM also, without the need for quadratics.

EDIT: actually, i should really just do AM by measuring the power.
the previous error (large carrier mismatch) is solved.

Entry: articles
Date: Sun Dec 9 09:15:49 CET 2007

  R De Buda, "Coherent Demodulation of Frequency-Shift Keying with
  Low Deviation Ratio" -- IEEE Transactions, 1972, COM-20 pp 429-435.

  S Pasupathy, "Minimum Shift Keying: A Spectrally Efficient
  Modulation" -- IEEE Communications Society Magazine, July 1979,
  Vol 17, pp 14-22.

Entry: AM
Date: Sun Dec 9 10:13:57 CET 2007

i got very nice reception it looks like. what about the following
algorithm:

  * set threshold to an estimate of the noise threshold (say 50)
  * wait until something comes in: interpret it as a start bit
  * find the max amplitude during the start bit
  * start sampling 9 bytes, by waiting for half a symbol length, and
    compare to half the dynamic threshold

looking at some sampled data of #x55 + start and stop bits, which is
01010101, with 0 = ON, 1 = OFF, it seems that putting the threshold
at half is not a good idea.. also, the time it takes to get from
going above the noise threshold (50) to the peak of the start bit is
exactly the symbol length. maybe it should be compared with a lowpass
envelope? tried this, but looks like LPF delay is going to be a
problem. however, it should be possible to keep the same filter, but
perform the comparison with delayed versions? another possibility is
to just save the sample points, and perform the filtering at a later
stage. or.. it could just be compared to the previous sample point?
if lower it's the reverse? that will probably work just fine: this
might give a problem for stable 0 or 1..

next algo:

  * start sampling s - s_0 after detecting a start bit. s_0 = rise
    time to threshold level.
  * collect 10 samples.
  * postprocess

what i'm doing: s_0 = 0, and watching the output of the sampling with
a #x55 byte. it looks pretty decent. now trying the number station.

next approach:

  * compare with previous (differentiate)
  * maybe hysteresis?

differentiate is no good. i'm probably fighting something else..
maybe the data rate is just too fast? i had to move from 512 samples
to 256.. so looks like something else is going on.. what about this:
change the special purpose lowpass filter so it takes 16 bit
coefficients, and then reduce the filter pole a bit.

Entry: confused
Date: Sun Dec 9 16:22:43 CET 2007

let's see.. there's something wrong with my symbol rate. i thought it
was 512 samples, but it's 256. corrected for this, i can receive
signals. however, it seems the bandwidth is mismatched. so i have 2
calculations that are probably erroneous:

  * necessary bandwidth -> filter coeff
  * symbol rate at the transmitter

doing some manual experiments, i got the filter pole fixed at #x0100,
which gives very nice waveforms. making it bigger only increases the
noise, but doesn't seem to influence the shape too much. so
everything looks pretty good, but it seems there is too much
inter-symbol interference due to the asymmetry of the receive filter.
i could try to hack around this by doubling each bit, but keeping the
envelope constant. got now:

  * halved symbol rate (transition + stable)
  * 3 x bandwidth (100 -> 300)

now at least the filter makes it roundtrip from 0 to max amp. now try
to subtract the startbit from each bit, then use the sign. this
works! reception seems quite robust. at least when it's not receiving
bogus. so i need a way to eliminate the worst kinds of noise, which
are transients that trigger a start bit. these could then be used as
human input. it's not very robust tho.. probably i need to compute
the maximum, and use half of that (or less..) as a threshold. looks
like this is relatively robust: threshold = 1/4 of maximal power;
translated to amplitude this is 1/2.

Entry: next
Date: Sun Dec 9 18:32:53 CET 2007

i think it's ok to forget about synchronous stuff for a while... also
speeding it up is for later maybe. what i need first is:

  * extra stop bit to eliminate transients
  * cleanup code for blocking send & receive.
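the recipe the last entries converge on, transcribed to scheme so
it's written down in one place. this is my reconstruction; all names
are hypothetical and the numbers come from the experiments above:

  (define (start-bit? power max-power)    ; transient gate
    ;; threshold = 1/4 of maximal power
    (> power (quotient max-power 4)))

  (define (decode-bits powers start-level)
    ;; powers: one mid-symbol LPF power sample per data bit;
    ;; subtract the start bit level and take the sign.
    ;; 0 = ON, 1 = OFF in the convention used above.
    (map (lambda (p) (if (< p start-level) 1 0)) powers))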
Entry: krikit -> reflections on Forth and DSP
Date: Sun Dec 9 22:52:16 CET 2007

looking at the code i write, it is full of global variables
(temporary storage for multiple fanout), and inlined early-bound math
ops, operating directly on memory instead of the stack. also macros
that unfold to criss-cross variable access are much more useful here
than compositional forth.

the problem with DSP is that speed matters, and it's easy to get to
order of magnitude savings by early binding. so macros are important.
algorithms are often not terribly complicated. the stress is more on
mapping things to hardware.

now, i realize i'm stretching it trying to do DSP on a PIC18. it
misses essential elements like a barrel shifter, large accumulator,
and rich addressing modes. these things REALLY make a huge
difference. but, keeping data in memory (registers) makes things
relatively fast on a PIC18. if the specs are clear (if the algorithm
doesn't change) implementation can be straightforward, though a
manual endeavor. but, what i've learned: experimentation REQUIRES
more highlevel constructs. i lost too much time and energy in mapping
to hardware before things actually worked.

which leads me to the following strategy: if experimentation on the
hardware is essential, experiment on hardware that's 10x faster, or
use data rates 10x slower such that high level abstractions can be
used. for purrr on the PIC18 and 16/24 bit DSP operations this means:
USE A DTC FORTH! when it's done, core routines can be done in purrr18
or in machine language. what i need is:

  * confidence that 10x speedup is possible
  * confidence that slowing down ACTUALLY WORKS
  * patience and discipline to get it to work FIRST and THEN speed
    it up

what i missed in this project is the availability of an easier to
use 16bit forth, and a policy for doing fixed point math. the former
would have made the latter easier to use.

and second, it is probably a good idea to start looking for a
dataflow language: one that

  * automates the allocation of temporary buffers (variables).
  * enables abstract boxes (made of networks of abstract boxes)
  * automates iterated boxes (+ possible 'folding')
  * separates registers from functions (all feedback = explicit)
  * frp vs. static sequencing ?

so i'm not so sure anymore if forth is really useful for the dsPIC.
maybe in the sense that it should map to the 16bit arch just like
purrr maps to the 8bit arch, but leave the dsp-ishness alone: provide
only an assembler.

Entry: local names
Date: Sun Dec 9 23:15:27 CET 2007

which brings me to macros and local variables.. i'm using the wrong
tool for the job: i can't bind new names to old ones, like in scheme.
for example:

  : bla state | state 3 + ...

the 'state 3 +' can't be bound to a single new name. this really
screams for a new language syntax and semantics. or at least enable
local macro definitions (there's no real reason why not..)

  : bla state |
    : foo state 1 + ;
    : bar state 3 + ;
    foo @ bar ! ;

but.. that's getting ugly. what i want here is some form of
pre-scheme. downward closures.

  (bla : state |
    (foo : state 1 +)
    (bar : state 3 +)
    foo @ bar !)

another thing: when allowing local variables, it makes more sense to
put them in front of the name, to correspond better to how they are
used.

  (square | dup *)
  (x square | x x *)

and i need to figure out how to solve the anonymous function
problem.. i.e. 'define' vs 'lambda'... these are conflicting.. what
about using

  : in a context that requires a named definition,
    i.e. a global definition or a local let

  | in a context that requires an anonymous definition,
    i.e. the argument to ifte

(| ...) is then equivalent to (...)

  (x square : x x *)   vs   (x | x x *)

a function definition can then be something like

  (a b c superword :
    (e : a 1 +)
    (f : b 1 +)
    a b + e +)

where local definitions are possible at the beginning of a
definition.
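for comparison, the first example in plain scheme, where 'state 3 +'
trivially gets a name. memory is faked with a vector so the sketch
runs; fetch and store! are my stand-ins for @ and ! :

  (define mem (make-vector 64 0))
  (define (fetch addr)      (vector-ref  mem addr))      ; @
  (define (store! addr val) (vector-set! mem addr val))  ; !

  (define (bla state)
    (let ((foo (+ state 1))       ; : foo state 1 + ;
          (bar (+ state 3)))      ; : bar state 3 + ;
      (store! bar (fetch foo))))  ; foo @ bar !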
Entry: dtc forth
Date: Mon Dec 10 14:21:35 CET 2007

a unified memory model is not so hard to implement efficiently. but,
a point that could make a huge difference is to use this memory model
inside the interpreter. a trade-off between speed and flexibility. i
can imagine it being interesting to be able to test code in ram
before flashing it.. at the least, the option should be kept open.

Entry: RGB led
Date: Mon Dec 10 17:57:57 CET 2007

trying to figure out where to put the LED.

  * all connected to analog ports
  * one extra digital connector with 220R resistor
  * all connected to pins, so they can be reverse biased for light
    detection

pinouts: (common anode)

    |  ||
    ||||
    4321

  4        3        2        1
  |   R    |   B    |        |
  o--|<|---o--|>|---o        |
           |                 |
           o--|>|------------o
               G

the region that's free is between 21 and 26. the anode needs to be
connected to a pin that can be switched to analog. on the board the
best option here is 23/AN8. leaving pin RB0/INT0 free for debug net
might be a good idea. AN9-10-11 are then all digital to control the
LED cathodes, AN8 is analog to tolerate the analog voltage. this also
won't conflict with the necessary digital outputs already on the
board. the anode resistor could go to.

the RGB led is connected like this:

  26 RB5      o--[220]---o
                         |
  25 RB4      o-----o    |
              |     |    |
  24 RB3      o--o  G    B
                 |       |
  23 RB2/AN8  o--o--o----o   R
  22 RB1      o--o

Entry: dsp language
Date: Tue Dec 11 09:33:07 CET 2007

what is necessary? i could take the PD sound processing as a model.

  * box = primitive | composite
  * composite = box + interconnect
  * things should be parametrizable in grids (from which an iteration
    structure is defined)
  * can we have lexical scope?
  * don't force serialization
  * don't force naming of intermediates, but don't restrict it
    either. (box combinators)
  * allow scheme (expression trees) to be a subset of the language.
    the extension is no more than a way to abstract 'parallel
    scheme'.

it would be nice not to go too far away from lambda abstractions. the
problem is multiple outputs. these could be multiple functions. so
what about common subexpressions? keep it manual for now.. maybe use
scheme-like syntax based on 'values' but called 'output'. the latter
will be more general than values: it can be re-arranged in time. it's
an essential observation.

not forcing the naming of intermediates can be problematic, since
it's the whole point: dsp code is very graph-like, and naming is more
efficient for this.. it looks like naming IS essential.

brings me to composition: a new box consists of 'node' sections which
name nodes. 'lambda' could be replaced with 'in' since it will name
the external inputs. all other nodes have to be named. 'not forcing
naming' can be implemented by special purpose box combinators. nodes
are different from locally created 'specialized' boxes. names can be
replaced by box expressions if they are tree-like (return a single
value), otherwise they need to be named in a 'node'. similarly 'out'
can be discarded in a definition. this allows the use and mixing of
scheme functions.

  (in (a b c)   ;; 'in' is the parallel equiv of 'lambda'
    (box (mula (x) (* a x))  ;; create local specialized box (like 'define')
    (box (mulb (x) (* b x))
    (nodes      ;; naming intermediates
      ((q r) (div/mod a b))
      (out (+ (mula c) (mulb c))
           (- (mula c) (mulb c)))))))

so, concretely:

  'in'    is like 'lambda' but it has parallel outputs
  'nodes' is like 'let-values'
  'box'   is like a local 'define'
  'out'   is like 'values' but defines parallel outputs

so the principles:

  1. the ONLY point of the language is to extend the many->one lambda
     calculus that can create expression TREES to something that can
     create expression GRAPHS.

  2. it is important that the lambda calculus is a subset which uses
     its original lisp tree notation.

  * 'out' is redundant for single outputs
  * intermediates from single output boxes do not need to be named

i'd like to extend this to grid processing: systolic arrays etc: box
compositions that connect boxes in several dimensions, such that
iterators can be derived from a highlevel description.
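since 'nodes' is let-values and 'box' is a local define, the example
above desugars to plain scheme (my sketch; div/mod is assumed given,
and q,r are unused, as in the original):

  (define (div/mod a b)
    (values (quotient a b) (remainder a b)))

  (define (example a b c)       ; 'in' -> lambda, parallel outputs
    (let ((mula (lambda (x) (* a x)))   ; 'box' -> local binding
          (mulb (lambda (x) (* b x))))
      (let-values (((q r) (div/mod a b)))  ; 'nodes' -> let-values
        (values (+ (mula c) (mulb c))      ; 'out' -> values
                (- (mula c) (mulb c))))))

which shows where the extension actually lies: 'values' fixes the
order of outputs in time, while 'out' would leave it open.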
Entry: driving led
Date: Tue Dec 11 11:17:14 CET 2007

driving the led during reception is going to happen at 5kHz, which
when using PWM is probably going to be too little: say 256 steps
gives about 20Hz. so what about using SD (sigma-delta) modulation? i
wanted to try this for a while, maybe now is the time.

yup. works like a charm. since red is less bright, i give it a double
time slot, which leads to a 4 phase state machine. at receive sample
rate there's some noticeable flicker at low intensity, at about 5Hz.
it's easy to avoid by introducing a minimum of 5 or 6 as color
values.
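the first-order sigma-delta in scheme, as a sketch of the idea (not
the pic code): one accumulator per channel, the output bit is the
carry, and the average duty is level/256 with no PWM frame period.

  (define (make-sd)
    (let ((acc 0))
      (lambda (level)             ; level: 0..255, returns 0 or 1
        (set! acc (+ acc level))
        (if (> acc 255)
            (begin (set! acc (- acc 256)) 1)
            0))))

  ;; e.g. (define red (make-sd)); calling (red 128) once per sample
  ;; yields 1 every other sample: 50% duty at half the sample rate.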
Entry: more state machines
Date: Tue Dec 11 12:27:35 CET 2007

the send and receive functionality should also be implemented as
state machines. or.. stick to a single application thread, and run
the other state machines from the blocking operations? maybe that's
easiest. sending and receiving are mutually exclusive. currently
there's only the LED that works in parallel.

Entry: rx/tx interference
Date: Tue Dec 11 16:50:53 CET 2007

there seems to be interference between driving the led and reception.
i added "red blink" in the demo app whenever there is a bad
reception, however, this seems to completely throw it off.. (edit:
not the led but tx) so i need to add some pauses probably. which
brings me to: there is no generic pause word, so i'm going to use
just a double 0 for loop.

the interference seemed to be due to the absence of 'ramp-off':
before switching to rx-mode the speaker was still being driven. i
added those and some pause, now it seems to work.

Entry: project scheme extensions
Date: Wed Dec 12 09:38:21 CET 2007

i need to move away from loading scheme extensions as individual
macros, and towards associating them with a project. they are
different. the distinction to make is:

  * macros from forth code: incremental, can be redefined
  * brood extensions: fixed per project

this of course leaves in the dark brood extensions as libraries..
it's a hodgepodge. what i could try is to keep the target namespace
management intact: typical forth style shadowing for both words and
macros, and allow it to call scheme code. what about a unified
dictionary:

  * macros stored as symbolic code
  * ram addresses stored as macros
  + macros are allowed to postpone expansion if they reduce to single
    constants?

it looks like the seed of the plan is there: it's simple and i can't
see any problems. the main difficulty lies in the difference between
the way the cat namespace works (declarative: no re-definition, all
names defined at once) and the purrr one (shadowing, incremental).

Entry: TODO list cleanup
Date: Wed Dec 12 09:55:06 CET 2007

DONE:

  * fix the assembler: i'm running into word overflows, code is
    getting too big. maybe use a trick: whenever a word overflows,
    just add some new code after the code chunk, jump to there, and
    have that chunk jump to the original word with a far jump. as a
    quick fix: at least print the names of offending symbols so they
    can be manually patched to long jumps.

  * switch the assembler to a mutating algo so proper jump graph opti
    can be performed easily. i see no point for pure algos there..
    asm is a black box anyway.

IMPOSSIBLE:

  * if 'invoke' is a macro anyway, why not combine it with execute/b?
    ANSWER: it's awkward to set the return stack to the word after
    invoke without using a call. that call might as well be execute/b

  * nibble buffer is not interrupt-safe: the R/W thing is shared..
    probably need separate R/W pointers! (FIXED)

REMARKS:

  * make it possible for a macro to create a variable. more
    specifically: make it possible to create any couple of words and
    variables together. (this means a macro can create a macro..
    probably means re-introducing some reflection). if the macro
    dictionary is merely a cache of a linear dictionary, with the
    linear dictionary containing macros, this kind of reflection
    should be possible to introduce without the disadvantage there
    was before: mutation in the dictionary hash.. there would only be
    shadowing, and 'mark' could handle macros too. syncing the cache
    means (lazily) recompiling the macro cache.

Entry: mzscheme slow text
Date: Wed Dec 12 10:34:26 CET 2007

i just tried:

  (define (all)
    (define stuff '())
    (let next ()
      (let ((c (read-char)))
        (if (eof-object? c)
            (reverse! stuff)
            (begin
              (set! stuff (cons c stuff))
              (next))))))

  (printf "~s" (length (all)))

  tom@del:~$ time bash -c 'cat ~/brood/doc/ramblings.txt | mzscheme -r /tmp/text.ss'
  606700
  real    0m0.332s
  user    0m0.319s
  sys     0m0.012s

so it's at least not read-char.. maybe i need to write a fast
tokenizer for forth using just read-char instead of the yacc clone
from mzscheme? probably the same goes for sweb. the tokenizer has 3
states:

  * whitespace
  * comment
  * word

easy enough to just do manually. it could be implemented as a
'read-syntax' word which adds source location information to the
symbols and comments read. a syntax-reader is essential since they
can be plugged into the module loader system.
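a minimal version of that 3-state machine (my sketch: strings only,
backslash comments to end of line, no source locations -- the real
thing would be a read-syntax):

  (define (tokenize port)
    (let loop ((toks '()))
      (let ((c (read-char port)))
        (cond
          ((eof-object? c) (reverse toks))
          ((char-whitespace? c) (loop toks))       ; state: whitespace
          ((char=? c #\\)                          ; state: comment
           (let skip ()
             (let ((c (read-char port)))
               (unless (or (eof-object? c) (char=? c #\newline))
                 (skip))))
           (loop toks))
          (else                                    ; state: word
           (let word ((cs (list c)))
             (let ((c (peek-char port)))
               (if (or (eof-object? c) (char-whitespace? c))
                   (loop (cons (list->string (reverse cs)) toks))
                   (word (cons (read-char port) cs))))))))))

  ;; (tokenize (open-input-string "1 2 + \\ comment\ndup"))
  ;;   => ("1" "2" "+" "dup")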
two bits are reserved to distinguish between data and code, and to implement the return instruction.

now the criticism: maybe it's best to ditch the return bit, since it limits the addressable memory: with 14 bits only 16k words can be addressed. the trade-off needs some thought. i think it's best to ditch the return bit, since it also prevents easy access to primitives by just reading them from code. i'm not sure where this can bite, but using the LSB as tag bit (0=data, 1=code) and making execute ignore the tag bit allows the use of 15 bit numbers, which can represent addresses.

maybe it's not such a good idea.. i'm a bit uncomfortable with not having 16 bit width. statistics: a return bit only makes sense if the words are expected to be short. padding is an option, but awkward, since every label needs to be prepended by a nop if it's not aligned.

rebuttal: tail recursion. this is the thing that's handled with the return bit.. i forget that a lot of thought already went into this thing. tail recursion justifies the inconvenience of handling the extra bit.

remark: a tagged data system can be built on top of this forth. i'm not comfortable with giving up a 16 bit data/return stack in favour of a 14 or 15 bit tagged system.

Entry: signed/unsigned comparisons
Date: Thu Dec 13 12:45:45 CET 2007

two issues: are they the same or not, and what should the default be? they are not the same:

  pos neg >
  * always true in signed
  * always false in unsigned

unsigned: carry
signed: sign of result (might overflow)

it's a bit silly, but i think it's time i admit i don't fully understand it.. carry in addition is simple. carry in subtraction is also not so difficult, since subtraction is addition with a negative. a carry on addition means overflow: the word's not big enough. simple. but what is a carry on subtraction? let's isolate some cases (8 bit):

                 result  carry  sign  overflow
    10    3 -        7      1     0      0
     3   10 -       -7      0     1      0
   100 -100 -      -56      0     1      1
  -100  100 -       56      1     0      1

http://en.wikipedia.org/wiki/Overflow_flag

The overflow flag is usually computed as the xor of the carry into the sign bit and the carry out of the sign bit. In other words: addition adds one extra bit to the representation. In order to not have overflow, for unsigned addition/subtraction this bit needs to be 0, and for signed addition/subtraction it needs to be the same as the sign bit.

So, for a signed comparison, take the sign bit of the result, and assume there is no overflow. For unsigned, take the carry bit.

Entry: dtc remarks
Date: Thu Dec 13 14:34:51 CET 2007

* size or speed? in the end it should run on CATkit, which has little flash memory, so i should really go for size.
* FOR..NEXT is not standard, so i can just make something up?

can't get for..next going.. debugging return stack stuff is hard. wanted to have a quiet simple puzzle day, but it requires 'real work' :)

about size vs. speed: the primitives need to be fast, so they can be used in STC code with the VM eliminated, but the VM needs to be SIMPLE. the return stack really should contain the same stuff as can be found in straight line code. i'm going to eliminate some macros. hmm.. too much thinking because it's already too optimized.. i find it difficult to throw this kind of stuff away.

what to optimize:
* inner interpreter loop
* maybe math primitives (used elsewhere)

not so important:
* enter/leave + RS (once per highlevel word)

Entry: eForth / tail recursion + concatenative VM
Date: Thu Dec 13 16:05:24 CET 2007

why is not optimizing so difficult? i see factors of ten everywhere..
the vm-core.f i have is nice, but i'm still quite stuck at trying to solve multiple problems at the same time:

* interoperability between STC and DTC: both primitives and brood.
* tail recursion

it needs to be simplified a lot.. in the same way that PF needs to be simplified to get to a proper VM architecture: it's the same problem.

i can do with primitives what i want, but all CONTROL FLOW needs to be based on 2 simple instructions: _run and _?run - the duals _execute and _?execute are only for primitives. so what's the definition of _run, such that it can be turned into a jump..

IMPORTANT: conditional run is not the same as conditional branch.. this points to an inconsistency: things that JUMP are incompatible with the exit bit.

another problem is that 'immediate' won't work: no compile time execution: a simplified forth. can i have a macro mode? before i can implement these i really need to take a look at putting back incremental extension in the language, this time without implementing it using mutation.. (it starts to look like this cutting of the reflective wire was a really bad idea..)

Entry: macro code concatenation
Date: Thu Dec 13 19:20:09 CET 2007

what i'd like is to postpone expansion of constants until assembly. but i can't influence the meta functions from forth code.. this is another one of those arbitrary complications. what about:

- putting macros in the project dictionary
- by default, they are expanded
- when present in data positions, they are evaluated

i can't see a reason why this wouldn't work. the only concern is stability: each invocation needs to reduce. i.e. '+' in the meta dict is special because it's different from the '+' in macros (the latter can expand to symbolic code containing '+').

the problem i'm trying to solve is to get a minimal symbolic representation of things that are constants, by delaying their evaluation, or by somehow recombining. i.e. if there is a macro

  : foo 1 + ;

i want the code "123 foo foo" to expand to the machine code (qw (123 foo foo)) instead of (qw (123 1 + 1 +)). the thing that decides what to do here is '+', but can this decision somehow be transformed to the point where 'foo' executes? if every macro inspects its result, and if the result is ONLY the combination of constants->constants, this combination can be made symbolic, since it can be re-computed at assembly time. i.e.

  (qw a) (qw b) foo -> (qw (c d e f))

can be replaced by (qw (a b foo)), because "c d e f" is probably not going to be very helpful to understand where the constant came from. this would enable the unification of:

* constants
* variables
* macros
* meta words
* host code

does the subset of these macros need to be explicitly defined? probably not. they are just macros, and qualify if they map qw's to qw's.

Entry: partial reduction
Date: Fri Dec 14 10:16:22 CET 2007

maybe macros should be made greedy, such that when completely expanded they reduce. what i mean is that "1 2 +" -> (movlw 3) but "abc 1 +" -> (movlw (abc 1 +)). combined with the mechanism described above, this could be the key to unification. as a result, macros will be the only evaluation mechanism, which just needs to be provided with a symbol lookup. there are 2 phases of macro execution:

- phase 1: compile to literals + instructions, names symbolic
- phase 2: compute literal values using resolved names

it looks like making the effect of 'meta' into a local effect is the way to go. it would be nice to find a way to fix the 'postponing' operation first, so at least generated assembly code looks nice.
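to make this concrete, a minimal sketch of the reduce-or-postpone rule for a binary macro like '+' (qw-reduce is a made-up name, not the actual brood code):

  ;; if both literal operands are numbers, the macro can reduce now;
  ;; otherwise it emits a symbolic rpn expression that can be
  ;; re-computed at assembly time, once all names are resolved.
  (define (qw-reduce op name a b)
    (if (and (number? a) (number? b))
        (op a b)            ;; "1 2 +"   -> 3         => (movlw 3)
        (list a b name)))   ;; "abc 1 +" -> (abc 1 +) => (movlw (abc 1 +))

  ;; (qw-reduce + '+ 1 2)    => 3
  ;; (qw-reduce + '+ 'abc 1) => (abc 1 +)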
Entry: meshy finished?
Date: Fri Dec 14 15:59:31 CET 2007

looks like we're at the end. got 8 devices talking to each other. so time to make a "what have i learned?" section..

* for DSP, use a dsPIC instead of a PIC chip, OR write a highlevel (but slow) set of primitives on PIC. i spent too much time writing "fast" code that eventually didn't get used, or got extensively modified, destroying the optimizations. DSP apps have the property that a lot of the code volume needs to be fast, which screams for a SEPARATE algorithm design and implementation/optimization phase. the problem here is on-target debugging. as long as the app scales time-wise (rate reduction without changing other variables) optimization can be postponed.

* get it to work FAST, and start with the most difficult part, even if it means a dirty hacked up proof of concept, then incrementally improve while keeping it working. don't spend time on things that solve needs that are not immediate if there are other immediate needs.

  - debug network: eventually didn't get used
  - the hardware layer: it delayed everything else

  the mistakes had quite severe consequences in the end. i could have gained 2 weeks by not making the debug network. the causes of the mistakes seem to be:

  - mismatch in skill (no analog electronics hands-on experience, and dusty theoretical understanding), but mostly misplaced confidence in non-tested skill.
  - underestimation of the importance of debugging.

* debugging deserves its own bullet. ironically, i lost a lot of time building a debugging tool. building that tool was a good idea, but i forgot a couple of steps:

  - i underestimated the difficulty of getting the debug net working properly. this actually required an intermediate debugging phase to monitor the behaviour of both send and receive. i didn't anticipate these problems, which was a mistake. the lesson to learn is to never underestimate the problems that can arise, even if the application seems really trivial.

  - doing high-bandwidth work (DSP) requires high-bandwidth debugging tools, or at least a large storage space on chip for traces and logs. a solution here would be to make a separate circuit only for logging, or to use a high-bandwidth host connection. an example could be a circuit that records to a flash card, or a USB connection to the host.

  - i need a better host side software extension system for special-purpose debugging tools. it should be the same as the way the host system is written, so that tools can be moved into the main distro when polished. to make this easier, the number of extension points needs to be limited such that they are better accessible. i.e. the consoles need to be programmable.

so, to summarize:

DESIGN then IMPLEMENT

don't optimize and design at the same time if there is a lot of opportunity for optimization (i.e. a DSP app on PIC18 where an order of magnitude of speed gain is easy to find). as long as time-critical cores are small, this is ok, but when the core is all there is, you need to get it to work first using a highlevel approach, and ONLY THEN make it fast.

ELECTRONICS is DEBUGGING

do not underestimate the difficulty of getting something right in reality, even if the logical model is trivial. programming problems seem to be about managing complexity, while electronics problems are about managing external influences, non-ideal behaviour, and tons of exceptions and hacks. these are entirely different. programming = abstraction, electronics = debugging.
Entry: meshy presentation -- technical
Date: Fri Dec 14 16:54:00 CET 2007

hardware goal = as simple as possible
- 40mm speaker used as mic
- input: 2 opamp mic bandpass amplifier + 8 bit A/D
- output: switching transistor (PWM)
- PIC18 @ 10 MIPS
- prototype uses large chip (64kb - 4kb - 28 pin PDIP)
- possible to downscale a lot (8kb - 256b - 18 pins SMD)
- RGB led (single resistor, S/D alternated pulsed)

lowlevel software
- purrr - Forth dialect
  - simple but powerful - bare metal vs. abstraction mechanisms
  - interactive (debugging!)
  - bottom up programming
  - metaprogramming (scheme)
  - emphasis on debugging
- sound modulation:
  - OOK (on-off keying)
  - BPSK (binary phase shift keying)
  - 10 baud framed bytes: 1 start, 8 data, 2 stop
  - 610Hz carrier (speaker resonance)
  - speaker driven with 7 bit PWM @ 78kHz
- demodulator
  - input sampled at 5kHz
  - downmixer (cross modulator) + lowpass filter
  - OOK: asynchronous, power detect
  - BPSK: synchronous costas loop

Entry: simplex LEDs
Date: Sat Dec 15 11:24:21 CET 2007

the most efficient way (wire-wise) to connect a bunch of LEDs is to place them on the midpoints of simplexes, where you connect the simplex points to +/- drive points: this makes it possible to switch on LEDs 1 hop away, while LEDs 2 or more hops away stay off since they will not reach threshold voltage. this structure is also called a "complete graph".

http://mathworld.wolfram.com/Simplex.html
http://mathworld.wolfram.com/CompleteGraph.html

mapping this to a 2D or 3D structure in a nice symmetric way is not that trivial. however, the most symmetrical planar arrangement is: place the points in a circle. if the number of points N is odd, you get (N-1)/2 concentric circles each containing N points, with a criss-cross network below it. an even N works similarly, only one of the circles has half the elements. this structure can be wrapped around half a sphere. wrapping it around a full sphere gives easy access to the control points, and gives a spherical or cylindrical structure. the coverage grows ~ n^2, so taking more points is relatively more efficient. however, the overall connection might get too complicated.

a different approach is to take some kind of 'primitive circle' which can be unfolded into a line, for example the pentagram with 10 LEDs. transport then could be done using a bus, i.e. a ribbon cable. maybe it's possible to use a ribbon cable with pins?

using a linear solution, it might be possible to make something that is composable, i.e. take an N solution, add a wire and some N primitives and make an N+1 solution. this turns out to be just cyclic permutations. for example, starting with the 2-terminal primitive L2, it can be extended to a 3-terminal primitive L3 by means of the primitive 3-permutation P3, and adding an extra wire to P2, so:

  L3 = L2 P3 L2 P3 L2 P3 = (L2 P3)^3
  L4 = (L3 P4)^4

in general:

  L_N = (L_{N-1} P_N)^N

this is probably a lot easier to do than networking, since it's basically braiding. a linear projection is easy to control, but i'm not sure if it's really a good approach for construction.. if i find an easy way to solve the permutation problem, then yes, it's a good thing. simplification: it's probably ok to leave out the last permutation, and compensate for it in software.

now, permutations and braids: they are not the same. transpositions have no direction, and are self-inverting. a twist on the other hand has a sign, and is not self-inverting. braids can implement permutations while giving structural integrity.
for example the most typical 3-strand braid:

  [ascii drawing: a right crossing followed by a left crossing]

implements a 3-element cyclic permutation as a right crossing followed by a left crossing (nomenclature: rotate the image 90 degrees counterclockwise and progress upward: the direction is the strand that passes over the other one). compare this to a double right crossing:

  [ascii drawing: two right crossings, a simple twist]

this is a simple twist and provides no structural integrity, but implements the same permutation. can this somehow be used as a building block for the other cyclic permutations? sure.. as long as you work with twists from left to right, and make sure the twist pattern gives you structural integrity, the same logic applies: the result is just a cyclic permutation.

Entry: interactive mode
Date: Sun Dec 16 10:13:47 CET 2007

from interactive.ss:

  The end goal of Purrr is to have only 'live' and 'macro'
  interactions: the system should be powerful enough so excursions to
  the underlying prj: code are not necessary. This gives a separation
  between 'tool development' and 'tool usage'.

I've come to believe that this is not a good idea in general. It is OK to be able to access the most basic host code, such as compilation, upload and inspection, but for real work you'd want to automate those and have a 'real' programming language behind it. In other words: access to prj or scheme code is necessary.

* it's ok to have a small collection of host words in interaction mode which are hidden using prefix parsing.
* this set of mappings (parsing words) should be extensible: prefix parsing needs a simpler definition form.
* the functionality behind those words should be extensible

Concretely this requires interactive.ss to be adjusted so it can accommodate parsing code in a different way. Maybe it can be made extensible together with the other parsing words.. The problem right now is that it is a single method, and the way it's defined is difficult to make dynamic (it's a scheme macro). Actually, compile mode forth parsers are already registered in the global namespace tree, so making them extensible can be done incrementally by adding some more name spaces.

Entry: extensible interactive parsers
Date: Sun Dec 16 10:52:30 CET 2007

two conflicting views here:

* currently interactive parsers are isolated functions, which is nice and clean.
* what is required is extensibility and re-use.

the solution seems to be to put the components in a global name space, which is used as the unified extension mechanism, and replace the function with a stateful one that refers to the name space. key elements here are 'with-member-predicates' and 'predicates->parsers'. these form a construct that needs to be attached to the global namespace tree. the former creates a collection of membership predicates. the latter creates a map (finite function) from atom -> parser.

the problem with the current approach is the generality of the parsers: they don't just map names to functions, but also create 'classes' with similar behaviour, so there is a level of indirection that needs to be captured. the live parser map is:

* symbol -> parser (parser primitive)
* symbol -> symbol (parser class)

if they are stored in this way, interpretation is quite straightforward. the approach is:

* provide alternatives for 'with-member-predicates' and 'predicates->parsers' so they postpone their behaviour and store it in the global namespace.
* provide an interpreter.

OK. implemented + tested. Some further cleanup.
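the delegation variant boils down to something like this minimal sketch (names made up, using the old mzscheme hash table api):

  ;; an entry is either a parser procedure (primitive) or a symbol
  ;; naming another entry (class); lookup follows the indirection.
  (define parsers (make-hash-table))

  (define (parser-register! name p)
    (hash-table-put! parsers name p))

  (define (parser-find name)
    (let ((p (hash-table-get parsers name (lambda () #f))))
      (if (symbol? p)
          (parser-find p)  ;; class delegates to another entry
          p)))

  ;; (parser-register! 'load (lambda (stream) '...))
  ;; (parser-register! 'load-ss 'load)  ;; load-ss behaves like load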
Maybe it's best to not store symbols in the dictionary, but parsers: use cloning instead of delegation? This way the dictionary IS the finite function. The real problem is that macros have a delegation method (function composition) but parsers (and assemblers for that matter) do not. so:

  Forth syntax parsers (lookahead) have no composition mechanism.
  Therefore cloning is used to give some form of code reuse. It used
  to be delegation, but this gives dynamic behaviour which contrasts
  with the static, declarative intent of the global name space,
  regardless of its implementation as a hash table.

and about ns:

  The global namespace is used as:
  * declarative symbol table (single assignment, mutual refs)
  * cache (forth macros should eventually be defined in state file)

Maybe forth.ss should be separated into generic forth style parser macros and functions, and the definitions of the parser words.

Entry: static composition and extension
Date: Sun Dec 16 11:19:16 CET 2007

i chose a hierarchical dictionary as the main means of program extension. the way it is used is not dynamic binding, but 1. postponed static binding and 2. cache of a linear dictionary. as a consequence, it can probably be completely replaced by mzscheme's module composition approach, together with some means (units?) to solve circular dependencies and plugin behaviour. however, i see no point in changing this until the dependency on the method that implements this linking part can be abstracted away. currently that seems problematic, because the name store is everywhere: it is the backbone of the system.

i find it very difficult to see what is the right thing to do here.

1. i'm not using the abstraction mechanisms provided by mzscheme to do namespace management, which makes me miss some static/dynamic checks, and is in general just a bad idea.
2. my approach is more lowlevel, so flexible enough to shuffle around and find the right abstraction. the thing is i'm not sure yet if i need this flexibility (over the built in functionality).

the only way to really resolve the ignorance is to implement a toy project which doesn't use the global namespace, and only uses mzscheme units and modules.

Entry: future dev
Date: Sun Dec 16 15:48:03 CET 2007

* fix problems in TODO (mostly peval)
* finish 16bit DTC
* dsPIC forth
* lisp-like dsp functional dataflow language for PDP/PF/dsPIC
* CATkit 2
* sheepsint 8-bit synth engine (envelopes + FM)
* E2 debugging
* CATkit midi
* USB

Entry: inspecting macro output
Date: Mon Dec 17 10:07:27 CET 2007

finding a common tail in 2 lists is quadratic when done naively (aligning by length first makes it linear), but i probably don't need that, since i'm looking only for common subtails in substacks. i'm still looking for a good description of the problem.. the problem of finding the common tail seems to be the one to give insight. what about this:

1. split input and output 'qw' atoms off
2. check if the remaining tail is the same

this is the only behaviour that's valid. once this data is obtained, it could be peeled to isolate the behaviour of a macro, at which point it could be decided to 'unevaluate' it. now, what does unevaluate mean?

  ... (qw 1) (qw 2) + -> ... (qw 3)

this could be replaced by (qw (1 2 +)). this is always possible, since the evaluation can be performed again later. the only information that is extracted at this point is whether the macro does anything else.
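to make 'performed again later' concrete, a hedged sketch of assembly-time evaluation of such a postponed literal (eval-lit and its operator set are made up, not brood code):

  ;; evaluate a postfix literal like (abc 1 +) with a little stack
  ;; machine, once 'resolve' can map names to numbers.
  (define (eval-lit expr resolve)
    (let loop ((stack '()) (e expr))
      (cond
        ((null? e) (car stack))
        ((number? (car e))
         (loop (cons (car e) stack) (cdr e)))
        ((assq (car e) `((+ . ,+) (- . ,-) (* . ,*)))
         => (lambda (op)
              (loop (cons ((cdr op) (cadr stack) (car stack))
                          (cddr stack))
                    (cdr e))))
        (else  ;; a name: resolve it to its address/value
         (loop (cons (resolve (car e)) stack) (cdr e))))))

  ;; (eval-lit '(1 2 +)   (lambda (n) 0))   => 3
  ;; (eval-lit '(abc 1 +) (lambda (n) 122)) => 123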
the change in macro code seems to be here:

  (([qw a] [qw b] word) ([qw (wrap: a b 'metafn)]))

the 'wrap:' form needs to be replaced by something that might return a value if the variables contain numbers. running into a small namespace problem.. trying to use scheme names, but it might be better to leave the meta dict in there to do this kind of stuff, but only call it from the macros. basically, the stuff after wrap: should be symbolic if the parameters are symbols, and computed if both are numeric.

Entry: benchmarking
Date: Tue Dec 18 16:21:16 CET 2007

the current reader is problematic.. it's slow, and i don't understand the reason. i don't think it's the usage of streams, since it was slow before, and it's not read-char, since i tried that.. so...

1. make a test for the current reader
2. replace it with a new reader
3. build 'read-syntax'

first test: the problem seems to be somewhere else..

  (define f (forth-load-in-path
             "monitor.f"
             '("prj/CATkit" "pic18")))

is virtually instantaneous like it should be.. so where did i get the idea that this is slow? indeed:

  '(file monitor) prjfile prj-path forth-load-in-path

is instantaneous also. otoh, 'forth->code/macro' isn't instantaneous at all.. compiling the code with 'code/macro!' is instantaneous also. i think i got it. why is the code/macro splitter so slow? tracking it down to forth.ss: forth->macro.code, which uses @forth->macro/code, which uses @moses. it can't be @moses since that's just a filter.. so it's probably down the stream in the macro processor. need to test that separately. running into some inconsistencies.. probably best to switch everything to syntax objects, including a syntax-reader.

Entry: read-syntax
Date: Tue Dec 18 17:54:50 CET 2007

from http://download.plt-scheme.org/doc/371/html/mzscheme/mzscheme-Z-H-12.html#node_chap_12

  (datum->syntax-object ctxt-stx v [src-stx-or-list prop-stx cert-stx])

  converts the S-expression v to a syntax object, using syntax objects
  already in v in the result. Converted objects in v are given the
  lexical context information of ctxt-stx and the source-location
  information of src-stx-or-list. If v is not already a syntax object,
  then the resulting immediate syntax object is given the properties
  (see section 12.6.2) of prop-stx and the inactive certificates (see
  section 12.6.3) of cert-stx. Any of ctxt-stx, src-stx-or-list,
  prop-stx, or cert-stx can be #f, in which case the resulting syntax
  has no lexical context, source information, new properties, and/or
  certificates.

  If src-stx-or-list is not #f or a syntax object, it must be a list
  of five elements:

    (list source-name-v line-k column-k position-k span-k)

  where source-name-v is an arbitrary value for the source name;
  line-k is a positive, exact integer for the source line, or #f;
  column-k is a non-negative, exact integer for the source column, or
  #f; position-k is a positive, exact integer for the source position,
  or #f; and span-k is a non-negative, exact integer for the source
  span, or #f. The line-k and column-k values must both be numbers or
  both be #f, otherwise the exn:fail exception is raised.

so:

  (datum->syntax-object #f word
                        (list source-name line column position span)
                        #f #f)

EDIT: why do i run into the need to have a port object that can put back a character? scheme needs this too, so maybe the port objects need to support putback? it's the other way around: scheme ports support a peek operation.

looks like it works now, and the code looks clean. next: create syntax objects.
this seems to be rather straightforward by using 'port-count-lines-enabled' and 'port-next-location'. ok. seems to work now.

Entry: syntax cleanups
Date: Sun Dec 23 13:57:33 CET 2007

what about the '|' character for lexical variables? things to be aware of: don't break code, or break it verbosely. again, i want to write a state machine.. i need to think a bit about the abstractions used in forth.ss. 'parser-rules' works well. the rest is hard to read. the problem seems to be parsers that segment data, instead of taking a fixed amount of data from the stream. these need state machines. let's rewrite the def: parser as an example. basically this is forth-lex.ss, but then recursive.

OK. i've got a definition parser working which produces a name, formals list and body. now this needs to be passed upstream somehow. looks like that is the next part to clean up: macros can have formals, and they need a symbolic representation for this, i.e. in the state file. now the question is: should this be the (a b | a b +) syntax, which requires another lexing step, or should it be an s-expression with an explicit formals list? what about this: make lexing steps easier, and just use more lexing steps. forth handles parsing (recursive) at a later stage than lexing.

Entry: regular grammar
Date: Sun Dec 23 22:00:27 CET 2007

the essential property of a regular grammar is that each production rule produces at most one non-terminal. intuitively, this means there is no "recursive" tree structure, only a sequential one: there is no "replication gain". so it looks like i need a way to express some of the state machine parsers as simple regular expressions based on membership functions, instead of the more specialized character classes.

(vaguely related: note that the Y combinator is essentially a copy operation)

Entry: regular expressions
Date: Tue Jan 1 15:48:20 CET 2008

the data is a stream of tokens, so regular expressions can be constructed in terms of membership functions and modifiers like '*' or '+'. symbols can be converted to membership functions. that should be enough? not really. i need some form of abstraction: a pattern can be a composition of patterns. so maybe it is better to stick with the lexer language in mzscheme? since what i am going to re-invent is ultimately going to be a generic regexp tool. EDIT: looks like it's really character-oriented.

maybe it is a good exercise to try to write a lexer generator? can't be that hard.. also, i run into this problem so many times with low-level bit protocols that it might be a good idea to take a closer look: white space is essentially the 'stop bit' in async comm. which brings me to the question: i think i read on wikipedia (i'm offline now) that regular expressions and FSMs are somehow equivalent. how is this?

how about forth-lex.ss: a specification not as production rules of a regular language, but as regular matching patterns? what is the problem i am trying to solve? find a function (or macro) that maps:

  lex-compiler : language-spec -> token-reader

  stream = token stream | EOF
  token  = word | comment | white

at the same time, i am trying to stay true to the forth syntax: simple read-ahead (keyword + fixed number of tokens) or delimited read (keyword + tokens + terminator). note: there seems to be a difference between reading UPTO a token, and reading UPTO AND INCLUDING a token. is standard forth always of the latter form? to answer the question partially: the current forth-lex.ss performs segmentation, and thus is not of that form: it cuts INBETWEEN tokens.
but forth is. can i learn something from this? yes: cutting AT a token makes the automaton simpler, since it doesn't require peek. let's call that 'delimited' until i know the technical term. i think the important lesson is that:

1) forth should be delimited: this simplifies on-target lexing
2) exception: the first stage tokenizer in brood = segmentation

the latter is an extension to make source processing in an editor (like emacs) easier by preserving whitespace and delimiting characters. BUT, it should not introduce structures that the delimited lexer of 1) can't interpret.

it looks to me that before fixing the higher level compiler and macro stuff, the lexer should be fixed such that it can be replaced by a simple, reflective state machine (true parsing words). looking at forth, there are 2 reading modes:

- read upto and excluding a character
- read the next word (= upto and excluding whitespace)

by fixing some of the syntax (comments and strings), editor tools can be made exact: a list of DELIMITED words will read upto and including a delimiter.

Entry: rethinking forth-lex.ss
Date: Tue Jan 1 18:00:25 CET 2008

a proper markup language is necessary: one that will not throw away information, but gives perfect parsing of source code. note that in order to transform source code to markup, a tokenizer is necessary. the tokenizer is a form of 'unrolled' parser: it describes a segmentation that CAN be parsed by a reflective delimited parser. ('reflective' means words have access to the input stream and can thus influence the grammar).

in order to make the right decision, it is necessary to have a look at the standard word ." which quotes a string up to but excluding the " character and prints it: this word interprets the first whitespace as the delimiter, and any subsequent whitespace is part of the string. in order to properly segment code, this behaviour needs to be respected. instead of (pre word post) a different segmentation is necessary which can properly encode eof. what about a word/white distinction?

  (word pos string delimiter)
  (comment pos string delimiter)
  (white pos string)

another question: is EOF an error or not, when it follows a word? i think the answer should be YES: otherwise it violates concatenation of files = file.

got forth-lex.ss simplified now.. it looks really familiar ;) i need to give it the standard names, but this looks like it. NEXT: add delimited parsing to parser-rules. this should capture all parsing needs, since there are no more non-delimited constructs. i.e.

  (parser-rules ()
    ((_ macro : name words ; forth) ---------))

Entry: declaration mode
Date: Wed Jan 2 19:15:54 CET 2008

embedded in standard Forth syntax is a "declaration mode" where all definitions are interpreted as macro definitions instead of instantiations of words. i'd like to express the state machine that implements this mode using an extension of the 'parser-rules' syntax, one that implements (a limited set of) regular expressions. let's start with a summary of current constructs (-> means "depends on"):

  parser-rules -> @syntax-case -> @unroll-stx + syntax-case

where 'parser-rules' creates a function with parser prototype (stream -> stream,stream) and @syntax-case is like 'syntax-case' but applicable to the head of streams. most of the real action is in forth.ss, where i'd like to eliminate a number of constructs. the current way to collect a number of definitions is using 'def-parser', which creates a definition parser parameterized by a type tag. recently i wrote this as a straight state machine.
this i'd like to replace now with some regexp based matching approach. the key elements in a def parser are:

* a definition is of the form : (optional | ... |) ... ;
* a list of definitions is terminated by the word 'forth'

previously i came to the conclusion to only allow delimited constructs, which are clearly marked with a start and stop marker. these constructs require no lookahead, and thus have a simpler automaton implementation. i'd like to use the '...' construct to indicate zero or more, just like the syntax-case macro, but necessarily limited by a fixed marker symbol. a '...' at the end of a match means pattern recursion. optional constructs can be handled by multiple match rules. this makes a def parser look like:

  (parser-rules (: ; | forth)
    ((: name | formal ... | word ... ; ...)
     ((def name (formal ...) (word ...))))
    ((: name word ... ; ...)
     ((def name () (word ...))))
    ((forth)
     (())))

can this form of ellipsis be mapped to the default meaning of multiple occurrences? this looks like an important question: a core difference between tree and sequence matching. question: what is better?

* a special meaning of '...' at the end of a sequence (self-recursion)
* explicit recursion?

the def parser could be constructed as a 2-phase machine: one that dispatches between staying in the mode (calling a single def parser) and exiting the mode, and the def parser itself. '...' could vaguely mean "multiple times", but there's a difference between: multiple times upto XXX, and infinitely many. it looks like explicit recursion is better than looping, so i'm going to drop the special meaning. this brings a single def parser to:

  (parser-rules (: ; |)
    ((: name | formal ... | word ... ;)
     ((def name (formal ...) (word ...))))
    ((: name word ... ;)
     ((def name () (word ...)))))

now, what i can use is this:

  (syntax-case #'(a b c end bla) (end)
    ((stuff ... end r) #'(r stuff ...)))
  => (bla a b c)

yep.. it looks like there's a fundamental difference between the tree matching and sequence matching problems. maybe i need to give it a special symbol. let's take *** to mean: collect upto the following terminator, so ... can still be used for tree matching.

  (parser-rules (: ; |)
    ((: name | formal *** | word *** ;)
     ((def name (formal ***) (word ***))))
    ((: name word *** ;)
     ((def name () (word ***)))))

what about a simpler approach? the only thing that needs to be done is to collect syntax objects between marks into lists. these lists are easy to process with a @syntax-case parser later on. so the thing that's necessary is a way to construct a stream parser that collects up to a certain predicate. sounds familiar?

ok.. this leads to simpler code. i could use the current 'def-parser' as a template for a more general delimited parser expression. i think i can ditch '@split' now: it leads to convoluted code. ok. 'mode-parser' is now written as an explicit recursion. this probably means i can start throwing out some stream processing code. wait.. need to check the macros-with-arguments thing.. OK. fixed. commented out a lot of code from stream.ss that was related to chunking/splitting.

so.. the lesson:

* linear streams: use explicit delimiters for embedded sequences: this simplifies parsing: no lookahead necessary.
* convert delimited sequences to lists + use scheme's tree matchers

Entry: next?
Date: Thu Jan 3 00:48:28 CET 2008

* connect the syntax reader to the parsing/loading code.
* unify all evaluation to execution of macros + manage evaluation time

Entry: moving to stx objects
Date: Thu Jan 31 12:49:59 CET 2008

what needs to be done now is to:

* replace all compile words so they accept syntax objects in addition to lists.
* convert all generators to syntax generators
* add print routines for them

so.. start in badnop.ss: string->code/macro (for compile mode, which i can test now). i'm replacing forth-string->list with forth-string->syntax. got the string->syntax stuff working. now trying the path/file loader. this needs @syntax-case instead of @match. except for the weird problem below, which i worked around, it seems to work now. printing works out of the box (snot).

Entry: weird @syntax-case problem
Date: Thu Jan 31 13:54:43 CET 2008

the 'load' symbol in this doesn't want to work. if i replace it with a different name, it does.. what's that about?

  (@syntax-case stream tail (load-ss load)
    ;; Inline forth file
    ((load name)
     (begin
       (printf "load\n")
       (@append (@flatten (f->atoms (stx->string #'name)))
                (@flatten tail))))
    ....

Entry: possible cleanups
Date: Thu Jan 31 15:56:06 CET 2008

* asm buffer from tagged list -> abstract type? there's a lot of room for improvement in that department. it would allow some kind of instruction annotation that's not possible right now. i think were i to start from scratch, i would build it around this..

* macro unification (from the TODO): unify dictionaries: put macros in the main dict as lists, store ram addresses as variables, and find a way to postpone compilation of macros to their corresponding values if they reduce to values (are constants/variables/labels...)

the former is cosmetics (atm), the latter is a tough problem, but can lead to a gigantic simplification.

Entry: target name space unification
Date: Thu Jan 31 16:01:12 CET 2008

name space unification would mean that the dictionary stored in the .state file contains not only addresses, but also macros (in a form that's specific enough to recompile). this form needs to include lexical variables. so a dictionary item is either a number, or a macro. target words are then just macros:

  ((abc 123)          ;; literal / constant / ram variable / ...
   (go 3235 execute)  ;; code
   (bla abc def))     ;; any macro code

taking into account lexical variables, this can be simplified to a single format:

  ((abc () (123))
   (go  () (3235 execute))
   (bla () (abc def))
   (arg (a b) (a b +)))

where the first parens are the macro's lexical variables. code that has no lexical variables is purely concatenative. this requires quite a deep cut, but should lead to great simplification. fork point is here.

Entry: declarative namespace + cached linear dictionary
Date: Thu Jan 31 16:53:39 CET 2008

make the dictionary abstract? maybe the most important point to ensure is cache consistency. on one end there is a symbolic representation of a dictionary, on the other end there is a compiled version, which resides in the NS (macro) part. how to ensure these are never out of sync?

so the next step is to define what the NS object actually is. it is a collection of namespaces, where each element is STATIC. the IMPLEMENTATION allows mutation, but the use should be restricted to single assignment. otherwise the cache is invalid. the main function the NS object provides is PLUGIN behaviour: late binding of some identifiers to allow the system to be composed of several individual pieces, without needing the strict tree-based structure of mzscheme's module system.
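a minimal sketch of that single-assignment discipline (names are hypothetical, and it assumes stored values are never #f):

  ;; the implementation mutates a hash table, but a name may only be
  ;; assigned once, so anything compiled against it never goes stale.
  (define ns (make-hash-table))

  (define (ns-define! name value)
    (if (hash-table-get ns name (lambda () #f))
        (error 'ns-define! "already defined: ~a" name)
        (hash-table-put! ns name value)))

  (define (ns-ref name)
    (hash-table-get ns name
      (lambda () (error 'ns-ref "undefined: ~a" name))))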
maybe units are the right way out, but right now i'm stuck with this more lowlevel model. what's necessary is to define some proper interfaces to this:

1) NS as graph binding (single assignment)
2) NS as cache object for target macros

i made this remark before. the first access pattern is easily enforced: never overwrite anything. the second one is more difficult. need to google a bit, looks like a popular pattern: cache an association list with a hash table.

Entry: caching an association list
Date: Thu Jan 31 16:54:05 CET 2008

the problem can be solved by making the operations abstract. association list:

* push
* pop
* find

as long as the access pattern contains no pops, the caching mechanism is quite simple. on pop, one could re-generate. this is effectively what i'm already doing, however, it's not guaranteed to be synchronized. so.. the elements: 2 dictionaries:

  (macro)        ;; defined in core, and untouched by prj
  (macro-cache)  ;; cache of prj macros

for this to work, the code in (macro) should NOT depend on the code in (macro-cache). this means the core macros are not allowed to have pluggable code. this is only allowed in the static load part. let's rephrase: macros are subdivided in 2 parts:

1) declarative with cross-resolve (pluggable components)
2) linear dictionary extension on top of this

does this in any way interfere with local name re-definitions? i think i just need to try it out.. re-iterating the model from the forth side: each compilation unit has a name space that can shadow/extend the previous one. all extensions in one unit need to be unique. this model resembles incremental compilation per word (strict early binding), but allows for cross-reference within one unit.

path:

* get rid of constants.
* get rid of the ram dictionary.
* move macros to the target dictionary.

constants were already eliminated. they can still occur in rewrite macros that generate asm code though. the ram dictionary is more problematic. it's probably best now to move to abstract access methods for the dictionary. it does look like that's the way out. pulling those changes through the assembler will shuffle things quite a bit. macros can follow quite easily from there.

maybe it looks like this: in assembler.ss -> 'label 'word 'allot represent the points where the dictionary is augmented. what will happen here is that macros can be defined also, no? there seems to be a conflict between allowing the definition of labels (ram or flash) and allowing those of macros, when they are all unified.. there is a difference however: as long as the thing which creates a new macro definition only dumps it in the assembler buffer, there is no problem.. the entire buffer will be assembled with the current macro definitions.. wait, there's something warped about that.

pushing through some changes, i arrive at the assembler. it might be best to turn the running variables (rom and ram top pointers) into real variables, and use the dictionary as a stack. going to try to do some things at once:

- allot needs to be rewritten in terms of ptr@ ptr!
- adding new dictionaries won't work any more

fading out.. next = (code . 0) (data . 0) etc.. data is missing. ok, cleaned that up a bit.. also made the running pointers mutable.

Entry: macros in dictionary
Date: Fri Feb 1 12:28:01 CET 2008

that's the next step. now i need to think hard about where this can go wrong, with the semi-separation i have. basically, the preprocessing step SORTS all names, to make sure macros are active before the rest of the code is compiled.
this shouldn't give any trouble. the thing to look at next is the path macro definitions travel. probably it's best to parse everything in one go: formals list (empty for concatenative macros). forth.ss is again the place to be. looks like make-def-parser is the function to modify.

that modification seems to work. now adjusting badnop.ss and macro-lambda-tx.ss to build a compiler function that uses the parsed representation to build a macro. the problem here is that it doesn't really fit in the rpn-compile framework. so.. i made it fit. the "body" for macro-lex: compilers consists of 2 elements: a list of formals and a body. this is the standard format used in the state file. md5 sum still checks.

NEXT: move the 'macro dict into the normal dict. ouch.. can't have "123 execute" as a macro.. or can we? maybe that's one that should be delayed.. i need sleep. this smells like the beginning of something new.. a proper way to organize the code.

a question to answer: why did i violate source concatenation by introducing locals? the answer is of course convenience, but is there a real disadvantage? the macros themselves are still compositional.. this is just about source.

Entry: name change
Date: Sat Feb 2 12:12:30 CET 2008

it's time to start thinking about a name change for the cat language.. problem is of course cat-language.com. i have 2 alternatives: KAT and SCAT. the problem with KAT is that it sounds the same as CAT. the problem with SCAT is the same as the problem with SNOT.. do i really care though? programming in scat could then become scatology. i still think that's humor ;)

Entry: reflection
Date: Sun Feb 3 10:43:19 CET 2008

i was thinking yesterday about macro unification, and wondered whether it might be better to go back to the accumulative model for name resolution / redefinition. the main problem before was that compilation of code had side-effects (definition of new macros in the NS hash), which made it impossible to evaluate code for its value only. however, there is probably a way to put this accumulative behaviour back, by taking the assembler into the loop: let the asm 'register' the macros.

the REAL problem i'm trying to solve is still macro generating macros, and the generation of parsing words. both are opposed to a declarative code model, but in the end, the model isn't declarative at all.. it's a bit of a mess in my head now.

GOAL: i need macro generating macros: limiting the reflective tower in any way will always feel artificial. how to do that?

* accumulative (image model) is the simplest, and the original way of dealing with this problem. however, it doesn't give a static language.
* declarative (language layer model) is the cleanest way of doing this, but requires some overhead that might look like overkill.

can we have both? the declarative approach needs s-expr syntax to be manageable. it won't be Forth any more.. let's see..

  image model: simplest, highly reflective forth paradigm.
  declarative: cleanest for metaprogramming purposes.

i guess i need to isolate the exact location of the paradigm conflict. what do i want, really?

GOALS:

* generating new names (macros) should be possible within forth code. currently, the only ways are the words ':' and 'variable'.
* cross reference should be possible. this currently works for macros, because they use a two-pass algorithm (gather macros first, then compile the code), and it works for procedure words, also because of a two-pass algorithm (an ordinary assembler).
* linearity in chunks should be possible, which is the current model.

questions from this:

- is it possible to unify the 2 different ways of employing a 2-pass algorithm for cross-references?
- how to move from a fixed 2-layer architecture (macros + words) to an n-layer architecture. is this doable without a language tower? is it desirable? (is reflection really that bad? does it conflict with automatic cross-reference?)

the more i let this roll around, the more attractive this solution becomes: split the problem into 2 languages. use a reflective forth which 'unrolls' into a layered language description, and a static layered s-expression based language that uses the same macro core. this gives the convenience of forth syntax and the reflective paradigm, and at the same time the flexibility to use the language tower when reflection is too difficult to get right, or the automatic layering doesn't work.. so the current question becomes: can the GOALS be kept by moving back to a completely reflective machine (including the parser!) which unrolls automatically?

remark: it looks as if i really need the equivalent of 'define', which would really be 'let'.. it all seems to boil down to scope (Scope is everything!). a forth file should be transformable into a collection of definitions and macro definitions. it probably makes a lot more sense to see the dictionary as an environment which implements the name . value map of a nested lambda expression. let's see.. the current model (macros are compositional functions) is really good. the remaining problem is scope: when to nest (let*) and when to cross-ref (letrec).

another idea.. instead of looking from the leaf nodes and building a dependency tree, what about starting from the root (kernel) node, and building an inverse dependency tree? the linear model is the intersection between the two.

Entry: future CATkit
Date: Wed Feb 6 13:43:32 CET 2008

some possible roads to travel with CATkit, and associated problems:

* boot loader programmer: instead of going with the USB TTL cable, it might be more interesting to create a complete solution for programming with brood: one that can program any of the target chips straight from the factory. it's pretty clear to me now that freezing the bootloader spec is going to be really problematic: they are project-specific. building a single all-in-one programmer/debugger solution is the way to go. maybe the E2 ideas can be unified with this too?

* to make the programmer doable, it might be wise to start using available Microchip C code: which means being able to link Purrr code to an MPLAB or Piklab project. also for ethernet based pics this might be wise. time to get a bit less radical if i want to get things done..

* a fairly standard 16bit Forth language. i'm far removed from this if i first want to fix the internal representation back to a more reflective approach with automatic unrolling into nested namespaces, and integrated parsing.. (EDIT: not true.. since the Purrr18 language should remain fairly stable, writing the Forth while doing the macro changes might work out just fine.)

* pre-assembled kits for Forth-only workshops. what is necessary there is to work for minimal cost: basically shrink and eliminate through-hole components. however.. the big cost is really not the board if it has pots on it. the deal is: there's no point in competing with arduino.

Entry: overall design changes
Date: Fri Feb 8 11:45:31 CET 2008

assembler

it's been fun, but it might be good to start outsourcing code assembly.
especially regarding the future use of different architectures, and interfacing with object code formats. it fits better with C code generation too.

interaction

this needs some thought, but at this point an abstract interface between the compiler and the target system is necessary. the road towards this consists of writing a double backend: one for PIC18, and one for ARM (philips) or MIPS (microchip 32bit). i'm thinking about moving most of it back to scheme, and phasing out the cat code in prj.ss and badnop.ss.

forth language

i'm a bit in a ditch here.. the current attempt to unify the namespaces into a single nested macro name space brings up questions about maybe unifying the parser too.. however, looking at radical forth changes like colorForth, a move towards a rather fixed parser can be observed. in my approach, the parser takes out a lot of dirty forth-isms while at the same time keeping the syntactic convenience they bring, at the price of not being so extensible..

the core idea is still: the current functional macro approach is good, i just need to figure out how to organize the name space and keep everything as declarative as possible (relationships, not state changes).

Entry: CATkit 2
Date: Fri Feb 8 16:53:48 CET 2008

Keeping the current code in Purrr18 as the implementation language, moving to an on-target interpreter seems like the only sane way to decouple the CATkit community project from the evolution of BROOD. The CATkit/Sheep core could still be done in Purrr18, but the availability of a straight no-hassle Forth would make things a lot simpler. Clear separation of kernel / user also serves as a good psychological barrier.

This has huge implications for the architecture. The 18F1320 won't be enough. Probably a move to the 18F2620 is necessary because of memory requirements. Using the current architecture though, there is a possibility to take the following path:

* create a different debug bus over the ICD2 connector
* use the serial port for the Forth console

Actually, that's not really necessary.. All this can be multiplexed over serial. Another question is: does it make sense to have an intermediate dtc layer like i have now, which essentially uses a double implementation of the compiler (macros): one in brood and one on the target? Really, the only thing to do is to replace machine code with Purrr18 and for the rest build a standard console based Forth machine.

Entry: stand-alone Forth
Date: Fri Feb 8 17:14:43 CET 2008

rationale:

* more standard (documentation)
* no dependency on Brood (decoupled from scheme + emacs)
* no double implementation of the compiler (host + target)

roadmap:

- look at Flashforth and Retro Forth.
- start building: dictionary -> interpret mode -> compile mode
- possible on 18f1320?
- macro/immediate?
- tail recursion?

Entry: goals
Date: Sat Feb 9 10:24:59 CET 2008

to prevent ending up in a random walk, it's time to clearly state some goals on the PIC18 front.

BROOD core + PURRR18: target audience is mostly myself, or people with an assembler/electronics background. the most important features are flexibility (focus on macros and code generation), speed and code size. BROOD is a tool for the "kleine zelfstandige" (the small one-man shop).

stand-alone PURRR: target audience is much broader. less emphasis on absolute control, more on simplicity, language stability and compatibility across platforms. it's the "configuration language". i'm thinking ANS + tail recursion + concatenative VM.

non-PIC18 things are quite open still.
the core needs more modularity (see entry://20080208-114531)

Entry: pragmatics of macro namespaces
Date: Sat Feb 9 15:25:56 CET 2008

what about this:

* design an s-expression syntax that has all the desired properties.
* make the name-value binding explicit and unique: this gives problems with multiple entry and exit points.
* write a translator from forth syntax
* regenerate the macro cache each time the language nesting level changes.

  (language )
  (language ((a () 1 2 3)
             (b () 4 5 6))
    ((help a b)
     (broem b b b)))

nested syntax: at each point the current language sees the enclosing macros. a compilation step compiles code into macros containing the addresses. -> each macro block begins a new language layer.

the time is not right yet. maybe i should do the forth first? no.. i need to start breaking things and building them back up to get more insight in how to disentangle, before changing the current code.

Entry: breaking macro storage
Date: Sat Feb 9 17:40:04 CET 2008

simply replacing '(macro) with '(dict) now.. secondary: prj.ss is really hard to understand. maybe more of the cat code should be moved to scheme? or at least to a more functional approach.. the state management is still difficult to understand.

looks like this just works for the monitor. now why is that? i expected it to break somewhere.. it indeed breaks somewhere: interactive mode. looking up words doesn't work. time to move that to a more abstract implementation in target.ss. the next thing that broke is 'mark'. prj.ss is so dirty because there's a lot of mutation going on, and the naming of words is really inconsistent. this really needs cleanup. another hidden assumption about "org" in bin->chunk: the problem seems to be that the absence of 'org' leads to problematic asm blocks.

what about structured asm? i read something about this in olin shivers' comments about a summer job he did implementing a scheme compiler.. maybe that's where i need to go? anyways.. there's a lot lot lot of work cleaning up data representations. the whole ifte/s and run/s business is a bit ridiculous.. it doesn't feel natural, and requires deep thought each time. i think it's time to ditch the way state access works, and move most code to functional programming with prj.ss doing nothing but state management (no control logic!)

Entry: state management / the point of prj>
Date: Sun Feb 10 14:50:08 CET 2008

something really smelly about it. i think i'm better off with true mutation in the scheme sense, instead of working around it the way it is done in prj.ss. the base line is: this prj> mode should be usable for DRIVING THE TARGET. the whole functional state business is overkill: most code can really be made functional, and possibly more understandably written in scheme. whenever state recovery is necessary, it can be moved to the functional domain (i.e. assemble and compile as they are now..)

the problem i'm trying to solve is discipline: not gratuitously using global state. maybe i should read some haskell tips, since this is the way haskell programs seem to be written: a bulk of pure functions and a central state management module through monads. let's see some important properties:

- the interactive forth layer translates to prj scat code
- the macro code is purely functional code with a threaded asm state
- staying close to scheme keeps things simple

other remarks:

- base and prj are different. this is clumsy.
- there are 2 namespaces: NS and the prj state namespace.
- prj already behaves as true mutable state. is permanence necessary?
- atomic failures

preliminary conclusion: scat code is important as an intermediate layer between scheme and forth, both for interactive and compile time use. the compile time part needs to be functional because it makes computations easier: compilations should really be just functions. the interactive part however is intrinsically stateful: ultimately it manages the state of the target and the current view (debug UI). the only place where the current scat/state approach is useful is atomic state updates. these however can be replaced by purely functional code and a transaction based approach: each command is a state transaction and either fails or succeeds. compositions of transactions should maintain that property.

aha, holy grail identified: COMPOSABLE TRANSACTIONS. maybe i just need to start reading again. this is very related to COLA (combined object lambda architecture) and the recent transactional memory stuff in haskell.

Entry: transactions
Date: Mon Feb 11 09:35:00 CET 2008

the way it works now: every console command that updates the state store in snot.ss is a transaction. if it fails, the previous state is maintained. something like that can be implemented differently. what i'd like to avoid is having to copy NS in the current implementation. a possibility is to transparently replace part of the NS tree with an association list. then parameters can be used to make a copy.

it looks like the let* / letrec problem wants to propagate deep into the structure of the entire program.. why is that? maybe i should start using a persistent object model for the store?

ok.. this is shaking up the roadmap again. TODO:

- fix the problems with macro unification
- implement reverse macro lookup properly
- think about making evaluation time concrete (entry://20071217-100727)
- work towards a cleaner state representation

about haskell and monads: looking at state management, monads somehow solve the bookkeeping of 'current' data. this can take many forms, but two crystallized constructs are global and dynamic environments, which in scheme would solve most problems involving the passing of data outside of function arguments. thanks to the type system in haskell, the red tape can be hidden, and all is implemented using just functions.

EDIT: being able to use state restore on failure at the command line level is really nice. this should not be given up. however, once the target is being modified, errors can't be fully recovered.

Entry: variables
Date: Mon Feb 11 10:17:08 CET 2008

running into trouble with recursive variable expansion. the problem is that a variable is this:

  #`((extension: name () 'name)  ;; macro quotes name
     'name #,n buffer)

which uses:

  (([qw name] [qw size] buffer)
   ([variable name] [allot 'data size]))

and this in the assembler:

  (define (variable symbol value)
    ;; FIXME: no phase error logging?
    (dict-shadow-data (dict) symbol value))

so eventually, the name will get shadowed. the problem now seems to be that there's some recursive lookup that messes things up? let's try a test case.

  variable broem
  broem   \ <- infinite loop

ok.. conceptual error or just a small bug? just a small bug: forgot the parens around 'name in (extension: name () ('name)), which gave (quote name) -> recursive call.

Entry: intermezzo -> snot + interrupt
Date: Mon Feb 11 10:22:48 CET 2008

this is getting on my nerves. it's been fixed a while ago in mzscheme cvs, but maybe i should just go for 3.99 atm? see if it breaks things.. went pretty well. had to replace some reverse!
Entry: variables
Date: Mon Feb 11 10:17:08 CET 2008

running into trouble with recursive variable expansion. the problem is
that a variable is this:

  #`((extension: name () 'name)  ;; macro quotes name
     'name #,n buffer)

which uses:

  (([qw name] [qw size] buffer)
   ([variable name] [allot 'data size]))

and this in the assembler:

  (define (variable symbol value)
    ;; FIXME: no phase error logging?
    (dict-shadow-data (dict) symbol value))

so eventually, the name will get shadowed. the problem now seems to be
that there's some recursive lookup that messes things up? let's try a
test case.

  variable broem
  broem     \ <- infinite loop

ok.. conceptual error or just small bug? just a small bug: forgot
parens around 'name in (extension: name () ('name)) which gave
(quote name) -> recursive call

Entry: intermezzo -> snot + interrupt
Date: Mon Feb 11 10:22:48 CET 2008

this is getting on my nerves. it's been fixed a while ago in mzscheme
cvs, but maybe i should just go for 3.99 atm? see if it breaks
things.. went pretty well. had to replace some reverse! by reverse,
and use mutable pairs in decoder.ss

another thing that changed is manual expansion of user paths (tilde).
this is a bit more problematic. another thing that gets on my nerves
is the absence of stack traces.. what am i supposed to do with this:

  ERROR: car: expects argument of type <pair>; given {#f . #}

ok.. it is pretty deep: the srfi-45 promise uses mutable pairs. fixed
+ fixed the plt sandbox code and sent mail to the plt-scheme list.
fixed break stuff in brood + snot. breaks work now.

Entry: more fixes
Date: Mon Feb 11 16:14:53 CET 2008

the 'empty' needs to be fixed. something wrong there. doing reverse
asm would be an interesting next step + moving some code to hex
printing.

Entry: moving more code to scheme in tethered.ss
Date: Tue Feb 12 13:57:00 CET 2008

* mzscheme with modules is quite a nice namespace management tool to
  write nontrivial programs. the big flat namespace with late-binding
  plugin behaviour in brood is a bit messy. maybe i do need the extra
  bit of mz handholding, and move plugins to parameterized code?

* i really miss closures when writing cat code. names and nested
  scopes are important, and trading in a bit of conciseness for names
  (and absence of stack juggling!) is a good idea. with closures and
  macros, scheme is malleable enough to reduce red tape where
  necessary. my personal preference is moving: cat is not a good
  implementation language compared to scheme.

* the cat intermediate language is interesting to simulate interactive
  forth: translation is really straightforward. gluing scheme and
  forth together, this layer serves well: adding scheme functionality
  to cat is straightforward + translating forth to cat is too.

this leaves me with the following problem to fix: ts-stack is a word
that is used to plug in the target stack bottom + pointer location. do
i keep it like that? it looks like these things are best solved using
parameters: that way the scheme code will work too. maybe i should
make a list:

* connection (lazy-connect.ss)

candidates:

* stack location
* flash program/erase size

Entry: porting to mz v4
Date: Fri Feb 15 10:27:21 CET 2008

yeah, reading docs can bring clarity ;)

  doc/release-notes/mzscheme/MzScheme_4.txt

i got a bit confused about the whole scheme and scheme/base thing
while reading some web server docs. the biggest change seems to be the
use of optional and keyword arguments in lambda expressions.

do i make a full port? probably best to not keep too much legacy in
the brood core.. i need the upgrade for the sandbox.ss fixes, so maybe
it's time to jump to 4 completely. as expressed in the release notes,
the keyword arguments can be problematic for legacy code..

Entry: big changes
Date: Fri Feb 15 10:52:12 CET 2008

OK.. i think i know what i need to do, but it's a big job: i need to
get rid of the NS namespace, and split the code into:

* purely functional
* parameterized

the line between the two isn't clear-cut. parameters are things that
are "mostly constant", e.g. communication ports, file paths, ... to me
it looks like this is the most important line of name space management
in scheme code. (in haskell, the problem of code parameterization as
automatic threading of data is solved using monads)

the problem with parameters is that they break referential
transparency, which is a great property for testing.. i think in most
cases, a transparent function can be wrapped in a parameterized one.
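a tiny sketch of such wrapping (names made up): the core stays
referentially transparent and testable, while a thin shell picks up
the "mostly constant" context from a parameter:

  ;; pure core: all inputs are explicit arguments, easy to test
  (define (greet/pure port-name)
    (format "connected to ~a" port-name))

  ;; the "mostly constant" context lives in a parameter
  (define current-target-port (make-parameter "/dev/ttyUSB0"))

  ;; thin parameterized wrapper around the transparent core
  (define (greet) (greet/pure (current-target-port)))

  (parameterize ((current-target-port "/dev/ttyS0"))
    (greet)) ; => "connected to /dev/ttyS0"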
i just need some moderation here: every use of a parameter deep in the
code (like 'here' in the assembler) makes things more specific, but
might be the right thing to do.

so, basically, code can be dynamically layered: the assembler, for
example, doesn't USE the target dictionary as a parameter, but gets it
passed as an argument by the interaction system (which does have it as
a parameter). in contrast, the assembler, internally, might use the
dictionary as a parameter, but the code outside of the assembler
doesn't need to know that.

getting rid of the NS namespace, and moving to module name management
instead means:

* more code is static (tree dependencies)
* plugin behaviour (graph dependencies) needs to be solved explicitly
* simpler: map everything straight to scheme compilation, with names:
  - lexical
  - module-local (with prefix to separate from scheme)
  - toplevel (might be used for plugin behaviour / units?)

that looks significant enough to call it brood-5

Entry: eliminating the state dialect
Date: Fri Feb 15 11:20:19 CET 2008

anything that can be done on brood-4 before making the jump to
abolishing NS? yes: moving to parameterized project data while keeping
the transaction-like workflow intact + solve the transaction thing
with target memory maps.

ROADMAP:

- move more compiler code to scheme.
- eliminate the prj <- state implementation, but make sure transaction
  behaviour is maintained (association lists or hash tables?)
- move assembler and parser to separate dictionaries (or keep them in
  NS till later?)
- move CAT code to module based namespace.

Entry: plt scheme study
Date: Tue Feb 19 16:44:25 CET 2008

maybe it's best to look a bit closer at the plt scheme language now
that V4 is coming out. some things i'd like to know more about are:

* mixin class system
* delimited control

mentioned on http://en.wikipedia.org/wiki/Plt_scheme

in addition, it would be nice to get more of the drscheme
functionality in snot, such as proper stack traces, module browser,
syntax-level refactoring. i'll take http://zwizwa.be/darcs/sweb as the
case study for this. brood's a bit too hairy atm.

trying to make sense of: http://www.cs.utah.edu/plt/delim-cont/

it looks like understanding this will bring me closer to understanding
the problem in brood with "undo" at the console, and the transaction
based model i'm chasing after. yeah, vague.. reading the paper.
chapter 2: the operators: shift, control, reset. hmm.. i'm missing a
lot of muscle to read that one.. ltu to the rescue:

http://lambda-the-ultimate.org/node/606
http://lambda-the-ultimate.org/node/297

"Good stuff! But keep in mind that, as the cartoon in the slide says,
control operators can make your head hurt..."

no shit.. to summarize vaguely what the 2 points are about:

- delimited control: partial continuations: don't jump outside of
  context.
- mixins: somewhat related to generic functions.

about the delimited continuations, it might be best to read the plt
doc on "prompt" and some related things on continuation marks and
stack traces. for mixins, i'm reading this:

http://www.cs.utah.edu/plt/publications/aplas06-fff.pdf

from a quick skim i don't see how it's related to generic functions
though.. mixins seem interesting, though i don't see the difference
from multiple inheritance. maybe that inheritance is linear instead of
tree-structured?
Entry: expression problem
Date: Thu Feb 21 23:23:46 CET 2008

http://groups.google.com/group/plt-scheme/browse_thread/thread/3aaacdc5169e5889

Mark's reply was pretty clear, and this:

  The PLT folks have used the expression problem as a springboard for
  thinking about big issues like, what does it mean to be a software
  component, and what are appropriate ways for reusing and extending a
  software component.

the answer is then modules/units/classes/mixins.. Swindle might indeed
be a good thing to have a look at next. the whole deal of multiple
dispatch, so central to PF, is in the end something i need to
understand better. about multimethods: cecil is mentioned here:

http://tunes.org/~eihrul/ecoop.pdf
http://citeseer.ist.psu.edu/219067.html

compression of dispatch tables? (about PF: there's probably a way out
using a small number of types or compile time type inference..)

I'm reading ``Modular Object-Oriented Programming with Units and
Mixins'' now. The slogans make a lot of sense:

* UNITS: Separate a module's linking specification from its
  encapsulated definitions.
* MIXINS: Separate a class's superclass definition from its extending
  definitions.

Maybe i should give it a try?

Entry: units
Date: Fri Feb 22 01:06:40 CET 2008

looks like units + modules are going to be enough to organize brood
without the need for a NS hash table. how to exactly chop it up is
still a bit of a mystery. maybe start with the plain CAT code, then
organize the macros in a similar way, then find a way to translate
forth code straight to s-expressions.

what if i start with separating out the assembler as a unit? in the
end i'd like to be able to use externally provided assemblers / C
compilers. in doing so, abstracting the data types that are passed
between assembler and linker might be necessary. these are assembly
opcodes, dictionary and compiled target words + linker data.

Entry: call by need
Date: Fri Feb 22 12:18:56 CET 2008

was trying to quickly hack up a solution in scheme that emulates
makefiles and i realized it's actually call-by-need, which is again
the same as the dataflow serialization problem (pd). which can be
extended to early reuse by transforming it into a linear language
(i.e. forth).
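the make analogy fits in a few lines of scheme: delay/force is exactly
build-on-demand with memoization (the 'targets' here are just
symbols):

  (require scheme/promise) ; delay/force

  (define obj.o (delay (begin (printf "compiling\n") 'obj.o)))
  (define prog  (delay (begin (force obj.o)
                              (printf "linking\n")
                              'prog)))

  (force prog) ; prints: compiling, linking
  (force prog) ; prints nothing: both targets are already built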
(file "/tmp/test.ss")) [loading /tmp/test.ss] [loading /usr/local/mz-3.99.0.13/collects/scheme/base/lang/compiled/reader_ss.zo] [loading /usr/local/mz-3.99.0.13/collects/syntax/compiled/module-reader_ss.zo] > (a) error: bla /tmp/test.ss:8:12: (error (quote bla)) /tmp/test.ss:6:12: (b) === context === /tmp/test.ss:7:0: b /tmp/test.ss:6:0: a /usr/local/mz-3.99.0.13/collects/scheme/private/misc.ss:63:7 the file /tmp/test.ss is: #lang scheme/base (provide a) (define (x) #f) (define (a) (b) (x)) (define (b) (c) (x)) (define (c) (error 'bla)) now, to incorporate it in snot, it looks like there's a combination needed with prompts. indeed.. the error printing works fine when wrapped in 'prompt', and execution continues thereafter. http://pre.plt-scheme.org/docs/html/reference/cont.html#(mod-path~20scheme~2fcontrol) first thing to note: 'prompt' and 'abort' i can add those in sweb instead of the current combination of parameters and call/ec. second: prompt is readily applied in the repl in brood, at run/error in host/base.ss it works for host/purrr.ss by replacing the toplevel error printer by a prompt. probably can do the same in snot. hmm.. it's not in snot that the prompt should be. i did add some marking to the code that prints 'language-rep-error' in case the underlying rep (provided by the program!) doesn't print the error itself. so in brood the error should be printed, and preferably INSIDE the box context. "console.ss" is loaded in the snot context from "snot.ss". the latter file registers the different languages using the 'register-language' snot function present in snot's toplevel. ("snot.ss" is not 'require'd but 'load'ed) what i'm interested in is frames that run up to the sandboxed evaluator, so maybe it should be implemented in snot/box.ss ? see snot ramblings for more.. Entry: continuation marks Date: Thu Feb 28 17:20:17 CET 2008 http://www.cs.utah.edu/plt/publications/icfp07-fyff.pdf currently continuation marks are used to make some kind of scat language trace through the code. basicly, i can put anything there i want. it's reassuring that the basic mechanism is available. (also, this idea is very related to some dynamic variable hack i tried in PF.. don't remember if it's still there..) something strange that i didn't know about exceptions: apparently the handler is executed in the context of the 'raise' call! that explains a lot. no.. this is not the case: (define param (make-parameter 123)) (with-handlers (((lambda (ex) #t) (lambda (ex) (printf "E: param = ~s\n" (param))))) (parameterize ((param 456)) (begin (printf "B: param = ~s\n" (param)) (raise 'boo)))) gives: B: param = 456 E: param = 123 ok: i'm confusing the lowlevel 'handle' with the highlevel 'catch'. the paper mentions how to implement 'catch' on top of 'abort', but also talks about interference of prompts, and the use of tagged prompts to work around that. so the bottom line: exceptions and prompts do not collide, because the prompt tag used to implement exceptions is not accessible. this does mean that an exception can jump past any arbitrary prompt. question: how does this work in sandbox? apparenlty sandbox re-raises exceptions: see the internal function 'user-eval' in 'make-evaluator*' in scheme/sandbox.ss : the value that comes from the channel is raised if it's an exception. something is still don't understand about mixing of prompts and exceptions. 
Entry: roadmap
Date: Thu Feb 28 14:45:25 CET 2008

adjusted roadmap:

* get the base language working without NS + put it in a separate
  module.
* figure out how to use units for plugin behaviour

then follow up with entry://20080215-112019

it looks like the key element is understanding the namespace issue by
first moving the core component to a more native namespace management
system. the rest should then be mere disentanglement.

TODO:
- separate SCAT as a different project
- separate it from NS

Entry: eval vs. require
Date: Sat Mar 1 19:56:18 CET 2008

the key insight (finally) seems to be that the current 'eval' based
approach needs to be replaced by 'require', or an underlying mechanism
that allows module based namespace management. everything that now
goes through the NS hash can be done with module namespaces.

Entry: module namespaces
Date: Mon Mar 3 00:27:20 CET 2008

everything reduces to scheme code in modules, which makes things
easier to extend. (also for parsers?)

  (define increment
    (lambda s (apply base.+ (cons 1 s))))

the idea is that 'increment' can be imported as 'base.increment', or
anything else, using prefix imports. there's no need to specify the
target namespace unless there are clashes between scheme and the
functions defined in the module, which can be avoided by not importing
scheme bindings, and separating definition of base. primitives (which
has scheme available) from definition of composites. composite modules
then only contain definitions which map some namespace ->
(un)prefixed.

to this end, a similar approach can be used as the 'find' plugin in
the rpn syntax currently used for NS linking. the 3 elements: syntax,
source ns and dest ns can be specified like before. (just make a
namespace translator?)

problem solved? probably only units left: plugin behaviour needs to be
handled explicitly.
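a toy version of the prefixed import idea, evaluated at a toplevel
repl (module and names made up):

  (module base scheme/base
    (provide increment)
    ;; a scat word in the style above: the whole stack is a rest arg
    (define (increment . s) (cons (+ 1 (car s)) (cdr s))))

  ;; the client brings the words in under a 'base.' prefix, so they
  ;; can never clash with scheme's own bindings.
  (require (prefix-in base. 'base))
  (base.increment 41) ; => (42)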
Entry: language tower
Date: Mon Mar 3 00:38:37 CET 2008

  scheme
  base          snarfed scheme, functional rpn
  state
  macro         primitives
  macro forth   machine wrappers
  forth

why so many? they all solve a single problem in a very straightforward
way. base snarfs functionality from scheme, state is a lifted base +
threaded state, and macro implements the greedy machine map + peephole
optimizer using a threaded state model.

misc hacks from plane notes:

- auto snarf through contracts
- use #lang scat/base for base->base maps (is a purely declarative
  language possible?)
- decouple module as unit to speed up compilation during incremental
  dev. (fake image based dev)
- get rid of @ stx for streams (scribble) / find standard streams lib
  / use lazy scheme. (brood is pure FP so why not)
- use parameters for compiler object (also for NS stuff?)

Entry: parameterized transformer
Date: Wed Mar 5 14:30:34 CET 2008

instead of using a compilation object, it might be more convenient to
use parameters in the transformer environment to define functionality
for the basic syntax operations. maybe best to write the rpn code from
scratch in scat/rpn/

Entry: scat ready
Date: Thu Mar 20 08:38:42 EDT 2008

looks like the lowest layer of rpn code + namespace management is
done. made a nice extension that allows parsers to be written as
syntax transformers (like it should!). until the representation part
is finished and ready to be ported to brood, the process is documented
in the dev log at http://zwizwa.be/ramblings/scat

Entry: BROOD-5: initial move from BROOD-4
Date: Fri Feb 29 12:39:38 CET 2008

This ramblings file is a merge between BROOD-4 and BROOD-5. The new
version is codenamed SCAT, and is a complete rewrite of the core
representation and name space handling code. The darcs archive has
been flushed as has happened before. The old histories are still
available at:

http://zwizwa.be/darcs/brood-4
http://zwizwa.be/darcs/brood-2
http://zwizwa.be/darcs/brood-1

(brood-3 didn't have a history flush, and is present in brood-4)

Entry: utilities = language ?
Date: Fri Feb 29 13:50:02 CET 2008

Splitting brood into 3 components: brood, scat and zwizwa brings up
the problem of code bundling. there are 2 views of modules:

- what they provide. this is the most important form of organization.
  there's a spectrum with 2 extremes: one object, and everything. the
  latter is a utility module, which is akin to a language. the former
  is a component module: an abstracted collection of code with a very
  limited interface.

- how they are used. using component modules is straightforward: since
  they are often highly specialized, dependencies between components
  can be clean and understandable. using utility modules is not:
  granularity is much finer, and they behave more like "background
  noise": stuff you need to know about, but can assume to just "be
  there".

therefore, when using utilities in a project, like scat, it's maybe
best to take a single file and make sure it exports a non-colliding
set of tools. so the purpose of that single file is to be a decoupling
point, providing a language to the client, and importing small
utilities and components from all kinds of different sources.

so the approach i take is to have one collection of utilities
(zwizwa-plt), and have a single file in each project that uses a base
language with (a subset of) these utilities present.

an organic analogy:

  GRASS = all permeating language (base lang + utilities)
  TREES = specialized program components

Entry: scat without ns
Date: Fri Feb 29 14:26:39 CET 2008

how to proceed? this needs abstraction of definitions in the
'composite' macro and abstraction of 'find' in code bodies. the latter
is already worked out.

TODO: make base.ss independent of ns.ss

might be a good opportunity to start documenting. maybe try out
scribble? scribble is quite nice.

disentangling ns is not going to be simple though. there's a problem
that i didn't think about: BROOD is fraught with occurrences of
defining one language in terms of another one (i.e. primitive macros).
will this still work? i do need different namespaces. should they also
be just prefixed? => this is a core problem and needs a proper
interface!

also.. why not use real objects for the rpn-tx.ss plugin behaviour?
maybe it is overkill.
Entry: for-template and scheme/base
Date: Wed Mar 5 16:44:26 CET 2008

setting: 2 modules

  test.ss    (require (for-syntax "rep.ss"))
  rpn-tx.ss  (require (for-template mzscheme))

when test.ss is #lang mzscheme, or #lang scheme, this works. however,
for #lang scheme/base i get an error:

  /home/tom/scat/scat/rpn/test.ss:8:2: compile: bad syntax; function
  application is not allowed, because no #%app syntax transformer is
  bound in: (lambda (stx) (syntax-case stx () ((_ . code)
  ((rpn-represent) (syntax code)))))

after adding a 'for-syntax mzscheme' or 'for-syntax scheme/base' in
test.ss it works. what works is to add (for-template scheme/base) in
rep-tx.ss and (for-syntax scheme/base) in test.ss

looks like there are different phase 1 bindings for mzscheme and
scheme/base.. or something.. i don't really understand. i'll try to
explain:

  tom@del:~/phase-test$ cat tx.ss
  #lang scheme/base
  (provide gen-code)
  (require (for-template scheme/base))
  (define (gen-code) #'(+ 1 2))

  tom@del:~/phase-test$ cat use.ss
  #lang scheme/base
  (require (for-syntax "tx.ss" scheme/base))
  (define-syntax gen (lambda (stx) (gen-code)))

why are both requires of scheme/base needed?

EDIT: take a look at these expands:

  box> (expand-syntax #'(module broem scheme/base (define foo 123)))
  (module broem scheme/base
    (#%module-begin (define-values (foo) '123) (#%app void)))
  box> (expand-syntax #'(module broem mzscheme (define foo 123)))
  (module broem mzscheme
    (#%plain-module-begin
     (#%require (for-syntax scheme/mzscheme))
     (define-values (foo) '123)))

mzscheme has mzscheme included in phase +1, while scheme/base does
not. (what's the difference between #%plain-module-begin and
#%module-begin ?)

Entry: snarfing
Date: Wed Mar 5 18:27:53 CET 2008

got the syntax part working. next = snarfing from scheme to scat
names. got compositions working too. i do need to take a good look at
repl toplevel vs. module local names. how to distinguish between
undefined names in module context, and use of a toplevel name in repl
context?

EDIT: actually, it's not necessary. it might be easier to just map to
a name when it's not lexical if the distinction between module-local
and other isn't necessary, and let the scheme name resolver take care
of it.

Entry: namespaces
Date: Thu Mar 6 13:22:21 CET 2008

it's a lot simpler now. The dispatch routine interprets two kinds of
identifiers:

  lexical -> used as is
  other   -> prefixed

if finer grained control is necessary, the rpn-global (and possibly
rpn-lexical) parameters can be overridden.

NOTE: it indeed makes a lot of sense to do this with parameters: it
falls into the "printing" design pattern: code that transforms a
description into a "document" according to a set of global parameters.

Entry: language names
Date: Thu Mar 6 16:02:48 CET 2008

is that still necessary? the only reason it's there is to interpret
the source code, but if that interpretation isn't always possible, why
keep it? it's also pretty awkward to fill the parameters everywhere..
with the risk of breaking things, i'm going to take out the language
names. probably best to have the source code field represent an
expression that evals to the object, and let the debug code that uses
the source field interpret the whole expression.

Entry: state syntax
Date: Thu Mar 6 18:25:20 CET 2008

scat/state doesn't do anything else than switch the namespace for
quoted programs. symbols need to be imported explicitly. so the
question is: how to import a bulk at once?

EDIT: forgot immediates! ok.. seems to work now.
maybe this is it:

  (namespace-mapped-symbols
   (module->namespace
    '(file "/home/tom/scat/scat/base/base.ss")))

not so difficult: see module->names in ns.ss

now the question is, how to map this to a 'require' form? this doesn't
seem to work, i get some strange errors that might be due to the fact
that i'm using dynamic-require into the current namespace. what really
needs to be done is just determine the exports of a module: the module
then needs to be compiled, but not instantiated. maybe
'module-compiled-exports' is a better way to get to the exports? from
scribble/search.ss:

  (module-compiled-exports
   (get-module-code (resolved-module-path-name rmp)))])

ok.. from syntax/modcode i can use get-module-code to do this:

  (define (get-exports path)
    (map car
         (car (call-with-values
               (lambda ()
                 (module-compiled-exports
                  (get-module-code
                   (resolved-module-path-name
                    (make-resolved-module-path path)))))
               list))))

which works on:

  (get-exports (string->path "/home/tom/scat/scat/base/ns-tx.ss"))

now i just need a way to make this work on ordinary require specs.
+ solve some problem in ns.ss (%app)

looks like i've got something that works from the toplevel.. now need
to get it going in modules. almost.. the rest is for later.

Entry: modules
Date: Sat Mar 8 17:12:01 EDT 2008

got stuck yesterday at translating require specs -> module file
location. going to leave it, and try to get the symbol snarfing
working.

  box> (define-lifted (state) (base) state-lift "base/base.ss")
  compile: bad syntax; reference to top-level identifier is not
  allowed, because no #%top syntax transformer is bound in: module

i have no idea what's going on here.. what this means is that:

* the compiler maps undefined symbols -> toplevel refs
* there's an undefined symbol mapped to toplevel refs
* there's no toplevel

Entry: start from scratch
Date: Sun Mar 9 14:43:10 EDT 2008

probably need to read a bit about modules, namespaces and compilation.
for example, what does this code actually do?

  (module test mzscheme
    (provide foo boo)
    (define-syntax (boo stx)
      (syntax-case stx ()
        ((_ . args)
         (begin
           (printf "compiling\n")
           #`(+ 1 2)))))
    (define foo (boo)))

when evaluated, it declares a module, compiling its code. before i can
understand things, i need to see the relation between:

- compilation handlers
- load/use-compiled
- get-module-code
- namespace
- namespace's module registry (namespace-attach-module)
- code inspectors

the get-module-code approach works, but somehow is context-dependent
(current namespace / module registry?). it would be interesting to
find a method independent of context.

so... a namespace is something that maps names -> things, to be used
by 'eval'. it's a generalization of the standard scheme toplevel. each
namespace has a module registry. modules declared in a namespace will
attach to this registry, to be referenced by an identifier.

Entry: getting at the names..
Date: Mon Mar 10 23:06:32 EDT 2008

got something that works:

  ;; Get to the exported names by requiring the module into an empty
  ;; namespace which has the base module attached to its registry.
  (define (get-names path)
    (let ((n (make-base-empty-namespace)))
      (parameterize ((current-namespace n))
        (namespace-require/expansion-time path))
      (namespace-mapped-symbols n)))

still stuck at some #%app problem further down the line, but the names
come out. wait.. what a mess! the previous one did work, and the
current one doesn't (needs absolute paths).. get-module-code is ok.
this seems to be problematic:

  (define (define-ns-tx stx)
    (syntax-case stx ()
      ((_ ns name val)
       (let ((mapped (ns-prefixed #'ns #'name)))
         #`(define #,mapped val)))))

the name created here, 'mapped', is not recognized as a module-local
one.

  ----------- broem.ss
  #lang scheme/base
  (require (for-syntax scheme/base "broem-tx.ss"))
  (provide foo)
  (define-syntax (foo stx) (foo-tx stx))

  ----------- broem-tx.ss
  #lang scheme/base
  (provide foo-tx)
  (define (foo-tx stx) #`(+ 1 2))

this gives:

  box> (require "broem.ss")
  box> (foo)
  broem-tx.ss:4:4: compile: bad syntax; function application is not
  allowed, because no #%app syntax transformer is bound in: (+ 1 2)
  === context ===
  /usr/local/plt-3.99.0.12/collects/scheme/sandbox.ss:445:4: loop

the thing that's missing here is the (require (for-template
scheme/base)) in broem-tx.ss

OK.. so it seems that now the remaining problems are because some
names are expanded to toplevel form because they are not visible
somehow?

  ------- lala.ss
  #lang scheme/base
  (require (for-syntax scheme/base))
  (define foo 123)
  (define-syntax (broem stx)
    (printf "compiling broem\n")
    #`(define lala #,'foo))
  (broem)
  (provide foo broem lala)

that doesn't create the lala symbol.. why?

Entry: more confusion
Date: Tue Mar 11 12:27:28 CET 2008

The module-path as passed to require is resolved as:

  ((current-module-name-resolver) '(file "asdf") #f #f #f)

actually, the docs explain it quite well: the name resolver also loads
the module and places the name into the current registry when the 4th
arg is #t.. so we have the connection:

  current-module-name-resolver -> current-load/use-compiled

EDIT: also look at 'expand-import' from scheme/require-transform
12.4.1

Entry: the problem
Date: Wed Mar 12 09:16:25 EDT 2008

so.. what about making a test case for the actual problems?

  module a: exports a couple of names
  module b: reads module a's source, extracts names, and uses them

why can't you export a name created by a macro? i think i did this
before, but the experiment above doesn't work.. what's up?

Entry: again
Date: Thu Mar 13 09:07:52 EDT 2008

why is this so confusing? what i try to do is symbol capture. or not?
2nd question first. let's look at how structures are implemented: they
create new names. maybe 'syntax-introduce' is necessary?

i clearly have to stop tackling this in a half-assed way. what's going
on here seems to be me being confused about the actual inner workings
of the module system, syntactic forms and namespaces. it's at the very
core of the language, nothing to try to random-walk around.. time for
some discipline. revised questions:

* how to export a name generated by a macro?
* is my approach (snarfing names from modules) 'the right one'?

roadmap: read about core syntax and hygiene. 2.3 Expansion (Parsing)
-> very useful ;)

the #%app error usually means that an identifier is not syntax, but a
function application. if phase 1 doesn't have a language instantiated
(which defines #%app) this is an error. the #%top error probably means
that an identifier is not defined. in modules, the #%top identifier is
not defined?
3.3 #%top can be used to override lexical bindings -> "Such references
are disallowed anywhere within a module form"

evaluating undefined abc as (#%top . abc) in the toplevel gives:

  reference to undefined identifier: abc

in module level:

  reference to an identifier before its definition: abc in module:
  "/tmp/test.ss"

Entry: defining names from macro
Date: Thu Mar 13 12:27:21 EDT 2008

  #lang scheme/base
  (require (for-syntax scheme/base))
  (define-syntax (foo stx)
    #`(define bar 123))
  (define-syntax (broem stx)
    (syntax-case stx ()
      ((_ name)
       (printf "expanding (broem)\n")
       #`(define name 123))))
  (broem lalala)
  (foo)
  ;;---

this defines 'lalala' as a symbol, but 'bar' is not accessible. looks
like a hygiene issue where hygiene has to be broken explicitly?

a better approach is maybe to expand to a 'module' form, so all
symbols are introduced in the same place, and no capturing is
necessary? something like:

  (module state-snarfs "snarfer-lang.ss"
    (snarf-from "../base/base.ss" (state) (base) lift-state))

where 'snarf-from' is a macro provided by snarfer-lang.ss which
expands to a #%plain-module-begin form.

rationale: what is a module? it is a finite map (name -> stx/value).
in that sense, a snarf is maybe indeed better exposed as a module
transformer, mapping modules -> module instead of modules ->
expressions.

again: the alternatives are:

* variable capture for define forms (non-hygienic)
* whole module expression generation

Entry: datum->syntax
Date: Thu Mar 13 14:59:33 EDT 2008

this worked:

  (define-syntax (foo stx)
    (datum->syntax stx '(define bar 123)))

it defines 'bar'. replacing stx with #f doesn't work (no #%app error).
the same thing with #' doesn't work.. apparently, the latter creates
some lexical bindings.. so, the proper way to break hygiene is using
'datum->syntax'. the thing to look at is build-struct-names from
syntax/struct

this seems to work for the current state.ss

update: using (->syntax stx ) with stx the stx object that's passed to
define-lifted-tx, it seems to work fine. looks like the problem was
(->syntax #f )

in short: the stx object seems to have knowledge of the module's
namespace, and can tell whether names are module-local?
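a minimal contrast to pin this down (throwaway test module): the only
difference between the two macros is the lexical context attached to
the introduced name:

  #lang scheme/base
  (require (for-syntax scheme/base))

  (define-syntax (def-hygienic stx)
    #'(define bar 123))            ; 'bar' carries the macro's context

  (define-syntax (def-capturing stx)
    (datum->syntax stx '(define bar 123))) ; 'bar' gets the use site's context

  (def-hygienic)   ; introduces a 'bar' this module cannot refer to
  (def-capturing)  ; introduces the 'bar' used below
  bar              ; => 123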
Entry: module path
Date: Thu Mar 13 17:41:54 EDT 2008

more confusion: resolve-module-path could maybe be used in combination
with (syntax-source-module) to resolve relative references?

* have a look at module index :

EDIT:

  ----- test.ss:
  #lang scheme/base
  (require (for-syntax scheme/base))
  (define-syntax (foo stx)
    (printf "~a\n" (module-path-index-resolve
                    (syntax-source-module stx)))
    (datum->syntax stx ''bla))
  (foo)

gives:

  image> (require (file "/tmp/test.ss"))
  'test
  bla

so it returns a symbol instead of a path..

EDIT: incredible.. this also works, but only in the dynamic extent of
syntax transformers (why?) :

  (define (module-mapped-symbols module-path)
    (let-values
        (((base-phase template-phase label-phase)
          (syntax-local-module-exports module-path)))
      base-phase))

Entry: what's in a stx object?
Date: Fri Mar 14 06:50:58 EDT 2008

looking at 2.3 expansion, it seems that some information at least is
context-dependent. the question is: is this context stored in the stx
object, or set by parameters? (looks like that implementation is
internal : accessible by 'syntax-local-context') another: how is the
lexical info stored?

Entry: carefully tuned api for the compiler
Date: Fri Mar 14 07:33:18 EDT 2008

all-in-all, it's quite nice. once the basic design elements are
understood, and all over-simplifying assumptions are cleared out by
actually reading the docs (!). there is a huge emphasis on macros and
modules (static), instead of "building new interpreters" as in SICP.

Entry: on with the real work
Date: Sat Mar 15 07:38:50 EDT 2008

though this made me think a bit.. this snarfing business, is it really
necessary? or will it work at all? currently it uses some form of
delegation to implement functionality: if it's not in (state), take it
from (base). what is necessary is to delay the 'fill up' to the last
stage, after declaration of the (state) functions.

Entry: the design choices
Date: Sat Mar 15 09:09:15 EDT 2008

* use syntax objects for everything that represents code. this gives
  the best match to PLT scheme: it allows most of the underlying
  machinery to be used in a way that integrates well.

  - representation: lambda + redefinable language structure
  - namespaces: modules and units
  - compile/interpret: use the scheme expander instead

  in retrospect, this took a long time for me to understand, but it
  looks like i'm finally getting it. identifier scope management
  (lexical, dynamic, module, unit) is hard. if somebody does it for
  you, then use it!

* syntax streams: are these really necessary? they complicate things
  due to having to deal with both lists and streams. maybe, when
  everything is syntax and usable as scheme macros, streams are no
  longer necessary?

it looks like the more important choice is going to be to have a
complete map from text -> binary code / assembly code that lives in
the syntax domain.

Entry: quoted programs
Date: Sat Mar 15 10:01:51 EDT 2008

in a code quotation, it should be possible to override the language.
if the first identifier in a program is a syntactic form, then use it
to transform the expression, otherwise use the default. not so
straightforward. actually, it is: using

  (define (transformer? id-stx)
    (syntax-local-value id-stx (lambda () #f)))

maybe the possibility to do this recursively would be nice? that way
all kinds of syntax extensions can be implemented, and forth could be
represented as-is. this can probably be handled in dispatch?

almost. it's a partial fold, while currently represent is a full
(left) fold. fold up to encountering a transformer, then pass the
remainder of the expression to the transformer (which might call
represent again).

first: clean up some things. move the symbol mapping to a single
function. second: having syntax in the middle of an expansion isn't
really sound, right? what would (a _b c) do, with _b syntax?

  (represent (a _b c))
  -> (lambda s (represent/step (a _b c) s))
  -> (represent/step (a _b c) s)
  -> (lambda s (represent/step (_b c) (dispatch a s)))

well.. it does make sense. if a is not syntax, the result is an
expression.

ok.. i got something working: string things together with
'rpn-compile' until a transformer is encountered, upon which the rest
of the code is handed off. however: still can't use '(lambda )'
because there's already a lambda wrapped.

i wonder however: why not allow true parsing words? the loop over a
code body could be made more explicit: give words access to the syntax
objects. the traditional 'immediate' words could be used. no.. they
need to be macros: they need to be available at expand time, and
function bindings aren't. so i'm on the right track.

Entry: lexical tricks
Date: Sun Mar 16 14:51:04 EDT 2008

maybe these lexical tricks are more of a nuisance than anything else..
if lexical capture from scheme is required, why not just prefix those
names? what is lost here is an abstract way to access the namespace,
but what's gained is more clarity of mechanism + readability of code.
maybe that's worth more?
in that case, a simpler non-intrusive prefix might be desirable, or a
let-form that abstracts the prefixes.

ok: took out the automatic lexical tricks:

* all identifiers in rpn code are now mapped in the compiler
* unquote works in
  - quoted () code -> unquote value interpreted as a function
  - quoted ' data  -> unquote value placed in the data structure

looks better this way.

Entry: typed writer monad
Date: Sun Mar 16 15:58:10 EDT 2008

the state extension used in brood works well for macros, but is very
hard to use with the dictionary in prj. why is this? the conflict is
about what the meaning of quotations should be: if they're state
functions, all code that takes programs should be redefined such that
they pass on the state. this makes sense for state functions, but is
infeasible unless it can be done automatically. passing state code to
non-state code doesn't work. so what if code were typed? the problem
is that i'm trying to implement some half-assed monad thing without
having a type system to make it convenient..

so... base can't run state code, but state code can be converted to
base code if it doesn't contain any real state actions. probably state
needs a 'run' that can distinguish between the two types. can the
functionality of run be somehow 'replaced' when 'inside' state code?
in order to answer, need to define what 'run' means.

Entry: run
Date: Sun Mar 16 16:38:27 EDT 2008

since 'run' is the ONLY PRIMITIVE word that accepts code, any code can
be made to run by overriding the behaviour of it. is this leading
anywhere? if there are 2 languages (base and state), there should also
be two language types when code in these languages is represented as
data AND two STATE types. once these things are in place, it should be
clear what the behaviour of run should be: dispatch on code and state
type and do the right thing:

                code
  data          STACK        STATE

  STACK         apply        error
  STATE         apply/lift   apply

in order to not have to change everything, the base stack type can be
just a list. anything built on top though, needs to be typed. let's
start with the state type.

OK. seems to work. this looks like a generic 1-arg language (in
scat-state.ss). looks like it's factored in the right way now: didn't
take any changes to core to change the rep completely.

next: word structures. maybe take out source rep first: if it's
syntax, i can probably find source rep from source text instead of
storing it.. DONE.

the fact that SCAT words are procedures, is it a feature? or is a more
abstract interface required? now base and state have different reps,
maybe 'apply' (run) should be made abstract? or.. each function object
should carry a run method, which accepts different
states/stacks/arguments ? i'm being pushed towards a statically typed
language..
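the table as code, with made-up struct names: 'run' dispatches on the
type of both the quoted code and the current state:

  (define-struct state-code (fn))          ; code in the state language
  (define-struct state-data (stack store)) ; stack + threaded store

  (define (run code data)
    (cond
      ;; base code on a plain stack: just apply
      ((and (procedure? code) (list? data)) (code data))
      ;; base code on state data: lift it over the store
      ((and (procedure? code) (state-data? data))
       (make-state-data (code (state-data-stack data))
                        (state-data-store data)))
      ;; state code on state data: apply
      ((and (state-code? code) (state-data? data))
       ((state-code-fn code) data))
      ;; state code on a plain stack: no store to act on
      (else (error 'run "state code needs a state"))))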
Entry: lifting applicators
Date: Mon Mar 17 10:45:46 EDT 2008

so.. splitting the stack/state objects themselves into different types
is straightforward. now, how should a language type be implemented? as
a run method? this is probably the most straightforward way: the
default could be apply, and anything smarter than the base language
simply overrides it.

this is confusing. if a word has an explicit 'apply' method, it's no
longer a procedure (which has an implicit, universal apply method).
dead end? yes.. a word structure is either a procedure, or not. why
should it be an abstract type? ->

* to add annotation
* to distinguish from other procedures

let's reach for the bottle: can i solve this with dynamic binding?
no.. all the words that somehow are to operate on the state need to be
lifted. it's not that the functions deep inside some dynamic extent
need to be updated, it's that the state itself needs to be accessible.
it's probably possible to solve part of it with dynamic binding, but
that will not play nice with closures..

so.. to come back to a comment in brood: the problem is not that the
prj state language is a bad abstraction, it's that the lifting of
operators from base -> state is not so straightforward: anything that
passes or duplicates the whole state needs to be handled. eventually,
this boils down to 'run' and anything built from it. maybe the
compilation should handle this?

the next step to the solution is: 'run' should be isolated: nowhere
should there be 'apply'. i could keep the base representation for
debugging purposes, but everywhere else it should be abstract. maybe
'base-control' should be separated out again, like it was in BROOD-2?
am i going in circles? not really: but w.r.t. lifting, control
operations are different. next: split off control.ss which contains
all the code tainted with 'run'.

so, why is dynamic binding for state so bad? suppose there's a
transition from scat/state -> scat/base, and something needs to access
the state inside the scat/base extent. if this is allowed, the whole
base language is no longer referentially transparent.

let's see.. ifte : (choose run) is it possible to lift 'ifte' if the
source is not available? maybe i just need continuations? there's
something right in my face here i just don't see..

Entry: lifting
Date: Tue Mar 18 08:46:21 EDT 2008

the thing that bugs me is that currently, the only way to do the
lifting is to manually update and duplicate all the code: i found an
operation that doesn't compose easily. this means that the whole of
'control.ss' needs to be parameterized, so it can be included in the
state language. what this control needs is:

* a way to apply a function to the abstract state
* a way to push/pop to the stack in this state

should i give up the concreteness of the representation? currently,
the only place where i use it is in debugging (and that's ok) or in
lexical binding of functions (which can be dealt with using macros).
when what currently is 'apply' becomes abstract, a genuine
compositional language can emerge.

OK: got the stack abstract now. the change wasn't so deep, which means
i'm getting better at choosing the right abstractions.. to summarise:
the base language has the following components:

* RPN TRANSFORMER
* STACK DATA TYPE
* NAMESPACE IDENTIFIER MAPPER

this is extended easily to STATE DATA TYPE

Entry: representation
Date: Tue Mar 18 11:33:20 EDT 2008

so... what about: represent everything with

- procedures (closures)
- structure types
- tagged lists

tagged lists can be used to implement structure types, but closures
can too (the example of 'cons' in SICP). iirc, in Haskell, structure
types are unevaluated functions. basically, these are all
interchangeable, and are merely about implementation. however, in PLT
scheme, using structure types seems to be more efficient.

i see myself moving more from the "wow, everything is a list!" and
"eval is amazing!" attitude towards the static undertone in PLT
scheme, which seems to be more based on the language tower model
(macros are no longer 'accumulative' -> everything is unrolled into a
tree structure) and structure types: more abstract than tagged lists.
everything is more static without losing any power, except for a
little bit of quick-hack power..
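for reference, the SICP trick alluded to above: a pair as nothing but
a closure:

  (define (kons a d) (lambda (pick) (pick a d)))
  (define (kar p) (p (lambda (a d) a)))
  (define (kdr p) (p (lambda (a d) d)))

  (kar (kons 1 2)) ; => 1
  (kdr (kons 1 2)) ; => 2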
Entry: sidestepping the problem
Date: Tue Mar 18 12:17:33 EDT 2008

what if i sidestep the whole issue, and define base to have a void
state? that way, the skeleton can be kept, but arbitrary data can be
threaded through code. all control code just passes on the state,
without being able to modify it. how is this really different from an
'environment' value to be passed along?

if you look at how 'invisible state data' is used, it is quite related
to lexical variables. for example in (lambda (x) ), the 'x' might be
used very deep inside , so for all the nesting in between, this 'x' is
not used. the equivalent of a deeply nested expression in an rpn
language is a long composition.

let's give it a try.. seems to be the right thing to do. next problem
= to abstract control.

Entry: control
Date: Tue Mar 18 13:29:05 EDT 2008

control operations are things that access the data stack, and are able
to apply functions to state, collect state, etc.. without having
access to the state themselves. maybe this is a nice occasion to start
using structure types and a modified match with

  (struct ( ...)) -> ( ...)

Entry: dynamic trick
Date: Tue Mar 18 15:55:40 EDT 2008

instead of making a lot of control words for each dynamic invocation,
there's a mechanism now that takes a function which accepts a thunk,
and evaluates the thunk in the dynamic environment. the thunk
represents the rpn continuation.

Entry: comprehensions
Date: Wed Mar 19 10:00:21 EDT 2008

today's excursion into plt land is about comprehensions. the main
reasons:

* for-each: the concatenative program interpreter
* for: number -> list maps, for low-level ops

hmm.. leave it for later. i fixed it like it was, but turned
interpret-list into a function (was syntax).

looks like the language is ok now. it's simpler, and hidden state is
easier to implement. purity is guaranteed by just not passing in any
state. so can the whole state namespace be ditched then? namespaces
are still necessary for the brood macro language, but not any more for
prj. the important thing is that there's no need for lifting, since
the base and state language syntaxes are the same now. let's call it
all scat. looks like this solves a lot of problems.

Entry: the scat machine model
Date: Wed Mar 19 12:22:24 EDT 2008

all scat functions take a data stack, and a hidden parameter used to
implement any hidden data that bypasses the computation. the reason
for implementing it like this is that the code operating on the data
stack (the SCAT language) is orthogonal to extensions that implement
hidden state. the reason that the pure functions ALSO pass this state
around is to avoid the problem of lifting pure functions to state
passing functions. if purity is desired: simply do not pass any
interpretable state into a computation. to check purity: pass
something that can only be interpreted locally (checks read), and
check if it's the same when it comes out (checks write). a sketch of
this convention follows below.

why is hidden state necessary? anything can be solved with
combinators, but for some problems, bypass can be a tremendous
simplification. relate this to:

* lexical variables (bypass through random access in the environment)
* monads (bypass bookkeeping implemented by 'bind' and 'return')
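a minimal sketch of that calling convention (the representation is
made up): every word maps (stack, hidden) -> (stack, hidden), pure
words are lifted so they can't touch the hidden value, and the purity
check is just threading an opaque token through:

  ;; lift a pure stack function: it simply passes 'hidden' along
  (define ((lift-pure f) stack hidden)
    (values (f stack) hidden))

  (define dup (lift-pure (lambda (s) (cons (car s) s))))

  ;; purity check: the token must come out untouched
  (define (pure? word stack)
    (let ((token (gensym)))
      (let-values (((s h) (word stack token)))
        (eq? h token))))

  (pure? dup '(1 2 3)) ; => #t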
functions can then be classified into 3 groups:

1) do not know about state (define-word).
2) know about state, but merely pass it on (define-control,
   make-state)
3) modifications on the state data (like 2, but with data accessors)

i'd like to clearly separate 2 and 3, but see no way to do that now
other than giving group 2 no means to interpret the data, and
requiring functions in group 2 to never replace the data. the latter
could be guaranteed with an assert, the former is namespace
management. i'm done ;)

Entry: from here
Date: Wed Mar 19 13:47:33 EDT 2008

i guess it's time to start rebuilding brood on top of this structure.
the core changes are:

* create the macro language
* rewrite prj.ss

going to keep macro.ss in scat for now. the default semantics seems to
work fine, but name space mapping needs to make the distinction
between defined macros, and undefined ones which default to calls.
maybe this is a nice point to lift target namespace management to
scheme by requiring all words to be defined as macros?

  box> (define-ns (macro) abc (postponed-word 'abc))

next? core is working fine. the remainder is namespace management and
rewriting parsing words.

Entry: parsing words
Date: Wed Mar 19 16:40:59 EDT 2008

what about this: all scheme transformers are unary functions. nothing
prevents me though from adding binary functions to the transformer
environment. these could then be reserved for calls from within
'next', because they don't fit the scheme transformer type. this is so
because it's illegal to call them directly anyway: they use dynamic
state set by the compiler macros. this seems to work pretty well.

Entry: forth / macro mode
Date: Wed Mar 19 18:41:12 EDT 2008

the next thing to implement is the forth / macro mode. this is a key
issue, since i'd like to change this to a set of LHS = RHS expressions
where names are clearly defined. part of this is delimited parsing.
(EDIT: solved)

to make this transition as smooth as possible, the basic syntactic
form which defines both macros and words needs to be defined. the core
problem is a tough one: allowing identifiers to be overridden requires
some lexical structure. the second problem is to retain the 'inner'
names after a compilation. maybe the enclosing structure can be saved
and incrementally extended? what i'm talking about is this:

  (let-ns (macro) ((foo (macro: 1 2 3))
                   (bar (macro: 456)))
    )

  ->

  (let-ns (macro) ((foo (macro: 1 2 3))
                   (bar (macro: 456)))
    (let-ns (macro) ((word (target-word 123))
                     (burp (target-word 567)))
      ))

basically, the target dictionary is a nested lexical structure like
the above, which has a 'hole' in it. on the next compilation step,
this hole can be extended. this way all name management can be
delegated to the scheme expander. to make this workable, the state
should be stored in a form that's like the above, but flat. see
target-tx.ss for the resulting code. to summarize:

* target state = dictionary = listof listof (name . code)
  each element in the top dictionary list consists of an association
  list for one compilation level. names within a level need to be
  unique and can be used recursively.
* each forth <-> macro transition in forth code introduces a new
  nesting level.

remaining q: how to represent non-macros? what about using 'forth:' to
mean something that generates a macro which compiles an address. what
about treating words as macros, but mark them as 'instantiated?' this
might be interesting, since it allows some flexibility for code
processing (i.e. code inlining).

one thing at a time.. what's next? need to define some interfaces.
most likely the assembler. where does it fit in? what does a forth
file represent?
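a self-contained miniature of this scheme (all names made up, not
brood's actual code): ordinary words compose as unary stack functions,
and an identifier whose transformer binding is a binary procedure acts
as a parsing word that receives the rest of the body:

  #lang scheme/base
  (require (for-syntax scheme/base))

  (define-for-syntax (rpn-body code stack)
    (syntax-case code ()
      (() stack)
      ((w . ws)
       (let ((tx (and (identifier? #'w)
                      (syntax-local-value #'w (lambda () #f)))))
         (if (and (procedure? tx) (procedure-arity-includes? tx 2))
             (tx #'ws stack)                     ; parsing word takes over
             (rpn-body #'ws #`(w #,stack)))))))  ; ordinary word: compose

  (define-syntax (rpn stx)
    (syntax-case stx ()
      ((_ . code) (rpn-body #'code #''()))))

  ;; ordinary words are just stack -> stack functions
  (define (one s)  (cons 1 s))
  (define (plus s) (cons (+ (car s) (cadr s)) (cddr s)))

  ;; a parsing word: a BINARY transformer that quotes the next token
  (define-syntax lit
    (lambda (ws stack)
      (syntax-case ws ()
        ((x . rest) (rpn-body #'rest #`(cons 'x #,stack))))))

  (rpn one one plus lit hello) ; => (hello 2)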
Entry: representation
Date: Thu Mar 20 08:40:18 EDT 2008

with scat ready, the next real problem is representation of forth
code. time for some "what is" exercises. macros are not really the
problem, but what is instantiated forth code?

  (let-ns (macro) ((foo (macro: a bar))
                   (bar (macro: c d e foo)))
    (let-ns (macro) ((baz (forth: foo bar))
                     (shma (forth: wikki wakki)))))

in the end, after compilation, this leads to:

  (let-ns (macro) ((foo (macro: a bar))
                   (bar (macro: c d e foo)))
    (let-ns (macro) ((baz (macro: 123 compile))
                     (shma (macro: 456 compile)))
      ))

so: a dictionary is a nested let-ns expression with a hole in it.
compilation is filling this hole to get a new nested let-ns
expression. it looks like the key extension is to move the assembler
to the syntax level also.

so what does the 'forth:' form do?

* it creates a macro which compiles a reference, and saves code for
  later compilation in that lexical environment (closure) bound to
  that reference.

so the essential part is to separate the creation of new names from
evaluation of the right-hand sides + complete unification of forth
words and macros. (a forth word has a macro associated which compiles
its body).

testing this, but i'm missing an essential part of brood (compilation
stack pattern matching) to get this working. EDIT: worked around it
using the base language. what i got now:

  (define empty-state (make-state '() '()))
  (define (run-macro macro)
    (state-data (macro empty-state)))
  (define-syntax forth:
    (syntax-rules ()
      ((_ . code)
       (let ((word (delay (run-macro (macro: . code)))))
         (base: ',word compile)))))

  (define xxx
    (letrec-ns (macro)
      ((abc   (macro: 1 2 3))
       (def   (macro: 4 5 6))
       (broem (forth: 123))
       (lala  (forth: abc def))
       (shama (forth: lala)))
      (macro: lala)))

looks like a pretty good implementation. it even has a means to only
compile what's necessary by adding an 'export' word: determine which
functions get exported into the namespace. this structure creates a
graph in which the nodes are forth words (instantiated macros).

Entry: code structure: line or graph?
Date: Thu Mar 20 10:43:43 EDT 2008

now.. look into the 'structured assembler' (something from an Olin
Shivers writeup about a project he worked on doing dataflow
optimization for..) if assembly is just a serialized expression graph,
why not keep it a graph longer? there are some features used in purrr
that assume serialized code (fallthrough). is this really necessary?
or should such code be grouped somehow as a single function with
multiple entry points? it's interesting as low-level control, but a
pain to do code transformations.. the rep mentioned above already has
this. now, with this structured asm thing, maybe all local label
issues can be solved that way too?

NEXT:

* unify macros and forth words (meaning of ';')
* defining LHS/RHS and ':'
* variables

Entry: definitions
Date: Thu Mar 20 16:59:55 EDT 2008

a problem popped up: either ':' (up to) is a terminator, or ';' is a
terminator (including). the real problem: if ':' is a macro, it should
be able to introduce new names. how to do that? macros are parsed
inside the body of a toplevel 'represent'. instead of doing it like
that, rpn-next could be called directly.

ok. got it working by abusing the compiler a bit and storing continue
points in a dynamic variable. i tried without side-effects (using
prompts), but can't get that to work. now with compilation mode built
in.

Entry: forth rep
Date: Fri Mar 21 15:33:11 EDT 2008

seems to be fixed now.
see the files forth.ss forth-tx.ss and forth-rep.ss : the result of
evaluating a (definitions . ) form is an assembly code graph. now, how
to save the macro definitions so incremental compilation can be
performed? the 'definitions' macro should be extended with an input
assoc-list of defined words (and macros?).

Entry: incremental compilation
Date: Sat Mar 22 09:12:25 EDT 2008

an intermediate representation is necessary which preserves the macros
in some form so they can be re-instantiated, but also preserves the
forth words in some form. is it possible to somehow grab the source
code (after lexing) of each word? that way original macro source can
be preserved, and forth source can be translated to an abstract rep
(only addresses).

basic idea: can't use source code to save the current language state.
maybe save the environment functions together with the forth structs
that are generated? i.e. if the address is filled in, return that,
otherwise return the word struct. maybe the macro from yesterday needs
to be factored a bit?

OK. incremental updates are implemented as a simple nesting of
letrec-ns forms. so, how to represent the target state? i prefer to
have this in a readable form, preferably one with macro source intact,
and forth words resolved to numbers.

first: separated out some things: there's a 2-level nesting with 'old'
not being collected. need some factoring to reset parameters. or.. i
could make the macros ephemeral, so they have to be included
explicitly in each source file? or.. the whole thing runs at
stx-transform time and expands into a module that defines a number of
forth macros and a binary code chunk?

maybe it's best to collect per level. then later, code that's not
necessary can be ignored. can collection be done statically? the thing
is: macros need to be saved in some syntactic form, not as an object.
from this, it's not necessary to evaluate macros, except for things
that generate code. instantiated live target code is represented by
macros. so, ground rules:

* evaluation of code gives a list of nodes in a code graph
* saving of state = saving of macros as syntax.

problem = how to save macros? each word evaluated needs to be
evaluated in the lexical context.

Entry: what is compilation of forth words?
Date: Sat Mar 22 19:20:59 EDT 2008

essentially, in the current context, it's a source code map which
translates all forth words to macros that compile a call:

  macro : abc 123 ;
  forth : xyz abc abc ;

  ->

  macro : abc 123 ;
  : xyz #x0010 execute ;

so what is the essence? it's source transformation. i probably need to
perform some operation twice: once to construct some syntax, and
another time to evaluate functions. maybe first separate syntax and
semantics better? let's try to catch the code first. it's probably
easier to move from using a 'continue' thunk to just recording the
point of the next definition.

catching code is problematic: there might be a macro that generates
several words or macros.. this cannot be cleanly cut out by cutting
out each definition. the only real thing is what comes out in the end:
that's what needs saving: the fully expanded nested let expression.
looking at this:

  box-macro> (forth-words : abc 1 2 3 def : def 4 5 6 abc)
  (forth-words-incremental (: abc 1 2 3 def : def 4 5 6 abc))
  box-macro> (forth-definitions (lambda (collect) (letrec-ns (macro) ((abc (mode-forth 'abc (lambda (state) (macro/def ((literal 3) ((literal 2) ((literal 1) state))))))) (def (mode-forth 'def (lambda (state) (macro/abc ((literal 6) ((literal 5) ((literal 4) state)))))))) (collect))))

actually gives the solution. instead of expanding to something which includes dynamic binding, just have them passed in as arguments. in this case: ditch the 'forth-definitions' and make 'mode-forth' and 'mode-macro' lambda parameters. this gives a very clean separation of function and structure: the structure is just the expansion of all macros. function can be plugged in later. yep. works like a charm.

  box-macro> (forth-rep (macro : abc 123 forth : def 345))
  (lambda (forth macro collect) (letrec-ns (macro) ((abc (macro 'abc (lambda (x) ((lit 123) x)))) (def (forth 'def (lambda (x) ((lit 345) x))))) (collect)))
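to make the "plugged in later" part concrete, here's a self-contained toy using plain letrec instead of scat's letrec-ns (everything here is hypothetical illustration, not the real code):

  ;; a rep, parameterized by its mode wrappers and a collector.
  (define (rep forth macro collect)
    (letrec ((abc (macro 'abc (lambda (x) (+ x 123))))
             (def (forth 'def (lambda (x) (abc x)))))
      (collect)))

  ;; one possible interpretation: record (name . mode) pairs.
  (define (instantiate rep)
    (let ((dict '()))
      (define ((mode tag) name fn)
        (set! dict (cons (cons name tag) dict))
        fn)
      (rep (mode 'forth) (mode 'macro)
           (lambda () (reverse dict)))))

  ;; (instantiate rep) => ((abc . macro) (def . forth))

a different triple of functions plugged into the same rep could build word structs or compile code instead.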
Entry: syntax: going further
Date: Sun Mar 23 09:38:32 EDT 2008

what i'd like to save is not only this structure, but a means to transform it into something that
  * has addresses bound
  * can be extended

made the 'lambda' part parametric.. is this necessary? so.. maybe move away from storing the delayed computation to something more concrete? like the rest of the syntax stream? kept it, but cleaned up the access a bit. so.. it's essential to have 2 functions:
  * the syntax transformer
  * the asm graph evaluator

now, when the assembly step has finished, the original syntax needs to be updated (words replaced with address refs) and saved such that it can be reused later to build a new syntax expression. so what is what? a dictionary is a collection of frames from 'compilation units'. (a CU is a single level in the final nested letrec expression.) composition of CUs is composition of syntax transformers. ok.. separated out the core form: code->reps. made an extra macro called 'dictionary' which enables a slightly lighter concrete s-expression representation, so it is easier to edit when replacing words with macros compiling addresses. maybe add another called 'word' which transforms the names so expanded names don't have the prefix? this will probably clash with macros.. nope: there's a middle way: i need this only in the representation, which is after macro expansion. EDIT: had to update rpn-tx.ss to expand forms returned by rpn-map-identifier before determining if the resulting identifier is a macro. so.. we get this:

  box-macro> (forth-rep dictionary (: foo 123 : bar 4) (: baz foo bar))
  (lambda (forth macro collect) (dictionary ((foo forth (lambda (x) ((lit 123) x))) (bar forth (lambda (x) ((lit 4) x)))) (dictionary ((baz forth (lambda (x) ((word bar) ((word foo) x))))) (collect))))

where 'dictionary' and 'word' are macros that make this code a bit more readable. updating a dictionary to replace words with references goes like this:

  (baz forth (lambda (x) ((word bar) ((word foo) x))))
  ->
  (baz macro (base: #x0123 compile))

maybe this deserves a little macro of its own to turn it into

  (baz macro (address #x0123))

the next step is to turn this dead representation into a live function that can be composed to generate new syntax transformers. so why a raw lambda, and not some form that has concatenative code? the problem is that this requires uncompiling: the lambda is the result of all syntax that might be defined in forth code, which can contain arbitrary scheme forms: there might be no 'simple' concatenative form. on to extension transformation: rep + name.address -> rep

Entry: not currently transforming
Date: Sun Mar 23 13:52:01 EDT 2008

PROBLEM: not being able to run the transformers because they depend on some expander environment is problematic. i'm using some functions that use context only available in `official' transformers. fix that. EDIT: this is not just 'some expander environment'. it's the lexical environment of the expression being transformed. also see the entry about calling macros directly: entry://20080325-144330

in other words: is it possible to turn a function into a transformer? this requires some level-shifting voodoo i don't see how to perform..

  (define-syntax transformer (lift tx-fun)) ;; ?? too simple to see ??

EDIT: looking at plt/src/mzscheme/src/env.c -> now_transforming (syntax-transforming?), this predicate is derived from the presence of a scheme_current_thread->current_local_env value. in eval.c -> expand, the function _expand is called with a new expander environment. hmm.. too complicated to quickly browse. what i did see is that the environment during an expansion is either dynamic, or attached to the stx objects. the latter makes sense, but i can't see any obvious refs.. the magic is done in resolve_env, which is quite complicated. it looks as if the info is not tied to the objects.. so guesswork: the expander has a dynamic environment which contains the lexical environment of an expression.

Entry: what is dict?
Date: Sun Mar 23 18:36:38 EDT 2008

the incremental compilation has type (DICT,SRC) -> (DICT,BIN). so what type is DICT? It needs to be something that can be serialized, so what about plain s-expressions? Is there information that cannot be captured? One problem is that arbitrary extensions might depend on external code. Representation should probably be a 'module' form which explicitly states its dependencies. Problem solved. Module forms are neatly representable by s-expressions since they have no free variables. Now how to go about this? A dictionary is a module that exports a function 'update' which takes forth source code, and outputs the expression of another module plus binary code. This is interesting: it involves writing a "module quine" ;) EDIT: when you can modify the language in which to write a quine, the problem is trivial. the reason why some quines are interesting is the length it can take to express one in some language / system, or the extent to which detours can be taken.. It almost works, except for some constant redefinition errors. OK. got it working with arbitrary payload and update expression. some interplay between macros and s-expression code generation: on each iteration:
  * apply the update expression to the state
  * pass on the update expression
  * send the output

a very straightforward iterative system once the boilerplate generator is in place. this can be minimised by re-defining %plain-module-begin or something.. updates: can probably standardize the module name, since only one is necessary and it can be loaded into a sandbox. stylized the reflection loop a bit: now using a #%module-begin macro and a minimalistic module spec which also carries over the body code.
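a tiny self-contained model of one turn of this reflection loop, with a dummy tick function (hypothetical names, no real module system involved):

  ;; tick : state x input -> state+ x output
  (define (tick state input)
    (values (cons input state)          ; accumulate into the new state
            (list 'compiled input)))    ; output for this turn

  ;; one turn: run tick, emit the output plus the source of the next
  ;; module, which carries the updated state (and, via its language,
  ;; the tick code) along.
  (define (turn state input)
    (let-values (((state+ output) (tick state input)))
      (values output `(module dict forth-dict (tick ,state+)))))

  ;; (turn '() 'foo)
  ;;   => (compiled foo)
  ;;      (module dict forth-dict (tick (foo)))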
cleaned it up to a single macro:

  (define-syntax (module-begin stx)
    (syntax-case stx ()
      ((_ (tick state) . forms)
       (let ((name (syntax-property stx 'enclosing-module-name)))
         #`(#%plain-module-begin
            (provide update)
            (define (update input)
              (let-values (((state+ output) (tick 'state input)))
                (values output
                        `(module #,name scat/forth-dict (tick ,state+) . forms))))
            . forms)))))

so state update / storage is solved.

Entry: dictionary update
Date: Mon Mar 24 09:21:37 EDT 2008

the incremental compilation has type (DICT,SRC) -> (DICT,BIN). so, what does the module storage part need to implement?
  - name binding (i.e. add special require forms)
  - implement a transformer binding

it will be used as a syntax include.

Entry: lexical syntax annotation
Date: Mon Mar 24 10:33:01 EDT 2008

still i don't fully grasp the notion of lexical information in syntax. the macros 'let' and 'lambda' annotate syntax with lexical properties when they are expanding. so does this work?

  (define (make-expr body) #`(lambda (x) #,body))
  ((eval (make-expr #'x)) 123) => 123

so it looks like indeed, building syntax like that does perform 'capture'. my problem is this:

  (define (wrap stuff) #`(lambda (x) #,(stuff #'x)))
  (define (stuff stx) #`(let ((x 123)) #,stx))
  (eval ((wrap stuff) 1)) => 123

so the inner let captures the #'x. there's nothing special about defining the formal parameter x and the reference x in the same function 'wrap'.

  (define (test-stx stx)
    (syntax-case stx ()
      ((_ (a) b) (bound-identifier=? #'a #'b))))
  (test-stx #'(lambda (x) x)) => #t

so 'bound-identifier=?' does do some lookup: it doesn't need to expand the syntax? EDIT: what i wonder is: is it possible to construct a function 'wrap' as above which can guarantee that the 'x doesn't get captured? this requires some kind of symbol rename. looks like it's a legitimate question:

  (define (wrap stuff)
    (let ((x (car (generate-temporaries #'(x)))))
      #`(lambda (#,x) #,(stuff x))))
  (define (stuff stx) #`(let ((x 123)) #,stx))

these are constructed using interned symbols, not gensyms. let's update the lambda code. so what about syntax marks? (section 2.3.5)

  (define-for-syntax (wrap stuff)
    #`(lambda (x)
        #,(syntax-local-introduce (stuff (syntax-local-introduce #'x)))))
  (define-for-syntax (stuff stx) #`(let ((x 123)) #,stx))
  (define-syntax (test stx) (wrap stuff))
  ((test) 456) => 456

works.. so, if i understand: marking is an on/off operation: the net result is that syntax introduced by the transformer is marked, and thus not the same as syntax introduced elsewhere. EDIT: this does give problems with names introduced by 'letrec-ns' separately: they are no longer caught. so.. in the end, am i better off doing everything non-hygienically? doesn't sound right..

Entry: cleanup
Date: Mon Mar 24 11:58:57 EDT 2008

i'm not in a great design mode today, so going to do some maintenance and simplification. changed some names in forth-tx + took out the mode symbols and hardcoded them to 'forth and 'macro -> lambda will capture them. (see the remarks about bound-identifier=? and the local transformer environment). the question that arises here is: can forth code shadow the 'forth', 'macro' and 'collect' names? yes: but only locally within one forth file. maybe it's also better to flatten the dictionary representation that's stored in the modules? let's postpone this a bit.. TODO: 'with-forth' and the scat macro: collapse syntax and namespace. i tried to add a similar 'word' macro but apparently it doesn't do that.. maybe add syntax-parameters? break.
Entry: dictionary serialization
Date: Mon Mar 24 18:38:34 EDT 2008

but.. with these things going on, the representation is no longer serializable as an s-expression. so maybe i should look into flattening out everything to a simple associative list.. there's a bunch of problems there that need to be ironed out. maybe it's time to take a couple of days break from it? from what i've been reading today, serialization, if not in compiled format, is going to be a problem due to hygiene: converting expansions to s-expressions is not a good idea.. compared to the previous implementation, what do we have here:
  * no ambiguities for names
  * can obtain all macros by loading source code

i guess it's time to start writing down some requirements and work from there..

Entry: basic structure
Date: Tue Mar 25 08:44:07 EDT 2008

maybe i should just stick to using an opaque representation, and try to use this to gain access to the data. there are 2 models for using this:
  * all code is available in source form: the intermediate representation doesn't matter much, since it can be redefined.
  * some code is closed. in that case, the representation does matter, because it becomes an interface.

it's the latter problem i'm trying to solve. if this could be almost human-readable, but mostly independent of mzscheme's binary representation, that would be great. requirements:
  * a dictionary needs to contain binary code + macro code + a reference to the compiler version / library it was made with.
  * a dictionary needs to be opaque, and at the same level as scheme modules.

what about separating the incremental model from the library model? the incremental model is mostly for developing. they are two different namespace models.
  * kernel: uses mzscheme's module name management system.
  * incremental: extends the flat public interface.

maybe the next question is: what is a forth file? how to make "forth file" == "scheme module" and build the incremental compiler on top of that. it's very straightforward for macros. but what about words? every module that's compiled contains forth words the same way as the nested let. yes. it's better this way.

Entry: the ':' macro
Date: Tue Mar 25 13:00:05 EDT 2008

whenever forth->definitions is called, it needs to be done in an environment where level -1 has the same definition macro (':') defined. this can be ensured by requiring it for-template. hmm.. something's wrong here. got some dependencies tangled up.. can it be made automatic? separate things: forth-lang.ss now gives a macro 'forth:' that produces toplevel forms from forth code. this can then be wrapped for module usage. the module works, but it doesn't want to export the name symbols. calling the transformer directly should solve this: then no marks are added. alternatively, we could mark ourselves? aha: the 'provide' statement needs to have the same lexical context as the names, then it works.

  (define-syntax (module-begin stx)
    (syntax-case stx ()
      ((_ code ...)
       (let ((name (syntax-property stx 'enclosing-module-name)))
         #`(#%plain-module-begin
            #,(datum->syntax stx '(provide (all-defined-out)))
            (printf "FORTH:\n")
            (forth: code ...))))))

gathering forth code works too now: everything is dumped in a list which is named according to the module name.

Entry: calling transformers directly
Date: Tue Mar 25 14:43:30 EDT 2008

isn't really good style. why? they all run in the same lexical context as the transformer they are called by. need to have a better look at local-expand and friends to see if there's no better way to handle this. in other words: check where it is (not) necessary to have the same lexical context.
i guess it's ok for the RPN code representation: a piece of RPN code produces a single lambda expression, which is a single lexical environment anyway.. but i don't see if it's always harmless. maybe i should just try to break it? because of the way that early expressions are on the deepest nesting levels, introducing new names into the expansion only influences code BEFORE a certain point, so i see no reason to do so.

Entry: syntax-case implementation
Date: Tue Mar 25 19:59:59 EDT 2008

let's see: http://www.cs.indiana.edu/~dyb/pubs/tr356.pdf

two identifiers are bound-identifier=? only if they have the same name and are present in the original program or are introduced by the same macro. free-identifier=? determines if two identifiers WOULD refer to the same binding. generate-temporaries creates a list of temporary names: not because renaming is necessary (it never is!) but because it might be convenient to insert (a list of) names. macro implementation: for each expansion, a mark is created. input and output syntax is marked, and double marks are dropped. the net result is that identifiers introduced by the macro are marked once.

Entry: onward
Date: Tue Mar 25 20:29:12 EDT 2008

with the basic representation intact, i'm going to leave the incremental dictionary stuff for later and concentrate on bringing over other constructs from brood.
  - reader syntax OK
  - the pattern matcher OK
  - variables and constants OK
  - code chunks / anonymous words OK
  - exit / jump-to-end OK
  - local variables OK

Entry: reader
Date: Wed Mar 26 14:58:39 EDT 2008

stole syntax/module-reader and adapted it for reading forth code.

Entry: splitting scat and forth
Date: Wed Mar 26 15:14:20 EDT 2008

let's call 'scat' the rpn language with functionality and namespace management, and grow this project into brood-5 instead of trying to port it 'into' brood. i see no reason to release scat as a separate project yet, but there are good reasons to separate the scat code into a single collection that's accessible through the scat.ss and scat-tx.ss interface files. maybe time to combine some files.. there are a lot.

Entry: pattern matcher
Date: Wed Mar 26 17:38:14 EDT 2008

looks like it's working: had to correct some hygiene: names lost their lexical context in name-stx->symbol. time for more porting. next: constants and variables. trying to port some pattern macros to scat, and i run into the use of macro-find/false. this is the first occasion of dynamic namespace access, which requires a bit of thought to solve.. the deeper question is: are quoted symbols still allowed? or do they always refer to macros? i guess the answer is no. that's what making things static is all about. it could still be added later, but let's not do so until there's a compelling reason. ok.. it's getting a bit more serious now: true cleanup. no more hiding behind reflection ;)
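a quick REPL check of the two predicates from the syntax-case entry above (behavior as i understand it in (mz)scheme; #'x is unbound here, and make-syntax-introducer stands in for the mark a macro expansion would add):

  (bound-identifier=? #'x #'x)                              ; => #t
  (bound-identifier=? #'x ((make-syntax-introducer) #'x))   ; => #f, marked
  (free-identifier=? #'x ((make-syntax-introducer) #'x))    ; => #t, both
                                                            ;    (un)bound alike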
Entry: name generating words
Date: Wed Mar 26 19:58:11 EDT 2008

it's all about names now.. the next things to tackle are name generating words. let's start with 'constant'. this needs to create a macro eventually. i eliminated this before because of the awkward reflection loop. is that still a problem? yes. it's a phase mix that can't be resolved, because "free-range" code is no longer allowed. so, no constant: use macros instead. then 'variable'. there's little trouble here, except that it requires address resolution, so it qualifies as a word. so.. what does variable expand into? maybe a forth-word? let's see if forth-rep.ss can be extended to represent variables. maybe it should get its own representation, next to forth and macro? a variable is a macro that compiles a reference (as a literal) to a word structure. so in some sense, it is a forth word: the defining property of a forth word being the need for allocation of memory. so what does a variable look like?

  (make-word (scat: ',var-struct literal))

this var-struct should also be a word struct, so it can be registered in the same way as forth words. however, the normal 'forth' wrapper expression isn't really useful here. so it might be necessary to change that a bit.

Entry: delimited words
Date: Wed Mar 26 21:31:10 EDT 2008

if there's a 1-1 relation between names and words, some care needs to be taken to solve conditional jumps etc.. this is exactly the kind of trouble i got into when trying to make a purely concatenative VM for the catkit on-target forth dialect.

Entry: forth-tx.ss and macro-lang.ss
Date: Thu Mar 27 08:27:14 EDT 2008

note that 'macro:' is really for brood-style macros (postponed target words), but the forth-tx in itself is more general. maybe move it to separate directories? in short: those 2 need to be bound at the forth-lang.ss level (which should be purrr), because they are orthogonal up to there. yes, it's a good time to decide how to make behaviour pluggable. for microcontroller targets, most of the forth code can be shared. forth uses 'compile' and 'literal' from the underlying target. so should it be a unit? it's only 2 names, let's first pass them in through function/macro. let's reach for the bottle: dynamic binding. it solves all your problems! ;) but, in this case it might make sense. units are really overkill, and having some default is interesting for testing. i guess if the bindings themselves are isolated, they are easy to change later. so what to separate:
  - macro (representation of postponed code + pattern matcher)
  - forth (forth syntax on top of macro)
  - purrr18 (code specific to PIC18)

so.. got macro separated. now trying to make a Forth layer on top of SCAT. note that here 'imperative' code should be possible! that's something to fix later. until then, only declarative code. OK. scat-forth.ss is working!

Entry: duplicate module instances
Date: Thu Mar 27 11:22:56 EDT 2008

i ran into a problem where the rep.ss module is loaded twice when requiring test/fafa.f and scat.ss (the latter to get at the scat: symbol)

Entry: dependencies between subprojects
Date: Thu Mar 27 12:12:27 EDT 2008

maybe it's best to just have one file for both the main and tx words. did that. looks a lot cleaner now. also made the test cases pull in all code.

Entry: variables
Date: Thu Mar 27 13:17:08 EDT 2008

So, let's represent variables by

  (define (wrap-variable name size)
    (let ((word (make-target-word name #f size)))
      (values (scat: ',word literal) word)))

This probably requires a compiler extension, since it's different from the macro and forth modes. Got it working: the trick was to add a special variable mode that evaluates macros as literals, and a 'buffer' word that behaves like ':' to define that macro. This then leads to the substitution macros:

  (substitutions (macro)
    ((variable name)  (buffer name 1))
    ((2variable name) (buffer name 2)))

see macro/target-rep.ss and forth/forth-tx.ss for implementation.

Entry: control structures / conditional jump
Date: Thu Mar 27 16:17:00 EDT 2008

there's an opportunity now to write the forth-style control words in terms of higher-order abstractions. is this possible? or are the more low-level forth constructs necessary? probably things like for ..
next are going to lead to trouble. basically, i need the equivalent of 'label' and a way to emulate fall-through. the problematic part is the conditional jump. conceptually, it joins 3 parts: the part before, and the 2 branches. let's just do if:

  : bla if do-it then go-on ;
  ->
  : bla ' l0 ift go-on ;
  : l0 do-it ;

ha! forth-tx.ss has a 1-element stack. turn this into an arbitrary-length stack and all branches can be postponed and compiled after the word is done. that's the mechanism: making this work so the current 'inline' branches are still used should be straightforward. maybe it's just a 'swap' on that stack? again: the idea is to make a temporary word at each branch point. let's try the compilation stack thing. OK. implemented. now what does it mean if there is more than 1 continuation waiting? ok. i know what this is! quoted code ;) expression nesting should be part of the parser.. but the stack's not a stack but a queue: quoted defs come AFTER the current def. i've got a bit of a paradigm clash here: the scheme lexer has support for s-expressions, so it is a better candidate for building a syntax for a language that supports code quotations. however, this is not compatible with forth syntax. the question is how to map

  if <1> else <2> then  ->  [ <1> ] [ <2> ] ifte
  begin <1> again       ->  [ <1> ] forever

doesn't look like it's a good idea to write a second recursive expression parser.. really. stick to s-expressions for that. part of the purrr kernel could be written in this different syntax: all the machinery to manage that is available now. rephrase the question: how can we keep the illusion of straight-line code? it's very convenient to have as a low-level tool, but for optimizations it's better to have a graph structure. maybe forth chunks should just be lined up? what about using an assembler instruction for this? fallthrough? problem is that this doesn't collect all the variables.. maybe variables should have them too? or variables represented as an 'allot' opcode? so, what about:
  * forth words have their asm code stored reversed
  * the head of the list (last instruction) is 'fallthrough', which points to the next word.
  * the compilation interface exposes 2 words: register and compile.

Entry: variables again
Date: Thu Mar 27 18:01:01 EDT 2008

if variables are represented by the opcode 'allot', then they can probably be generalized to quoted words: this is basically 'create'. do i need to distinguish between ram/rom/eeprom allot? for now, let's keep it simple.
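a minimal sketch of the branch-splitting idea from the control structures entry above, on a toy list representation (hypothetical names, not the staapl code):

  ;; compiling "if" closes the current chunk with a conditional jump to
  ;; a fresh anonymous word, and emits the postponed branch as a word
  ;; of its own, compiled after the current one.
  (define (compile-if name before branch after)
    (let ((l0 (gensym "l")))
      (list `(word ,name ,@before (ift ,l0) ,@after)
            `(word ,l0 ,@branch))))

  ;; (compile-if 'bla '((cw test?)) '((cw do-it)) '((cw go-on)))
  ;;   => ((word bla (cw test?) (ift l123) (cw go-on))
  ;;       (word l123 (cw do-it)))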
Entry: semicolon
Date: Thu Mar 27 19:22:22 EDT 2008

let's say that 'exit' always means return, even in macros. but ';' means:
  macro: jump to end
  forth: exit

how to implement "jump to end"? this requires labels. so.. what is a label? an entry point. a word. let's see.. there are 2 kinds of fork points:
  * conditional goto (not call!)
  * entry / label

so.. a macro can split a word?

  x x x then y y y

this turns the ys into a different word. the place where this would happen is in target-rep.ss in macro->code: that call might return multiple words, the first one being the original word, but the other ones made of chunks, where each chunk is an entry or fork point. note that this is almost the same as the splitter for parsing forth definitions. why are they not completely the same? the problem is that forth parsing requires a function return because of 'rpn-next', while the code splitter doesn't.. so.. this needs to be brought to the level of words (which have names), not code lists. OK, done that. so.. what needs to be a word? jump targets. the reason is simple: jump sources are clearly visible (ideally, a word would be ONLY jump sources), but targets are not. maybe in a later step also eliminate sources? anyways. there are 2 kinds of jump targets: forward references (if .. then) and backward references (loops). OK: it's basically like it was before, but labels are now references to word structures (created when the label is created) and created by the 'label' macro. these are used by the 'split' word to start filling the word structure with code.

Entry: hindsight
Date: Sat Mar 29 09:39:50 EDT 2008

Making BROOD more static is a way to bring the early exploratory phase into a more fixed structure. It seems the overall design is good enough to be stabilized; from that perspective it makes sense to cast it in stone. What did actually change over the last couple of weeks? Basically, symbols are disappearing. The only place where they are still left is the assembler, but that's easily changed. Symbols are replaced with identifiers, and are a (scheme) compile time object. The Forth compiler is now implemented using Scheme's exposed compiler API (the macro system). What this change did for the structure of the program is to point out places where reflection was used without justification. It's now using PLT Scheme's approach of 'unrolled reflection'. As a result, more things can be checked at compile time + name handling is completely handed off to scheme. This is the natural extension of 'lamb / brood 3'. The step through brood 4 was necessary to get familiar with the language layer approach. Giving up the NS hash table and run time evaluation hacks is the final step in trusting this layered module system: when names are identifiers, scheme can do the management. so to restate one of the goals of brood: to do Forth the PLT Scheme way: built on top of its hierarchical macro/module system. things i'm adding:
  - RPN: rpn syntax for scheme
  - SCAT: an embedded rpn language with scheme semantics
  - PAT: a typed concatenative pattern substitution language
  - MACRO: a language for expressing postponed forth semantics
  - FORTH: building forth syntax on top of scat
  - PURRR: macro language with forth syntax

MACRO has postponed semantics because the output of brood is assembly language. If there were a simulator, it would make sense to bring the static implementation down to that level. Instead, that part is still interpreted. This also allows easier integration with external assemblers. Central to brood is the structure of the MACRO language. It's a concatenative language that operates on 2 stacks: the SCAT data stack, which is used as the MACRO compilation state stack, and the 2nd stack, which contains the assembly code and is interpreted by the PAT language as a typed concatenative language: the one which implements partial evaluation in PURRR. PAT's matching can be translated to compile time if at some point it is decided to give up on the symbolic asm representation and instead use a (typed) abstract one. This probably needs some cleanup in the assembler first.

Entry: exit: jump to end
Date: Mon Mar 31 08:59:52 EDT 2008

now for 'exit'. what i want to support is words/macros like this:

  ... if ; then ...

the ';' is exit for ordinary words, but it's jump-to-end for macros. the 'end' is defined in the environment that executes a single macro. there's no way to do this except by wrapping each macro. ok. so, split it in 2 parts:
  * the meaning of ';': for macros it means 'exit-macro', for words it means 'exit'.
  * implement 'exit' and 'exit-macro'.
note that local macro exits always occur within some conditional construct which ALSO introduces splits. can this be used somehow? or is it better to optimize this away when eliminating empty words? aha. the label is for the code AFTER the exit, not the code it exits TO! they're different! seems to work. note: this allows dead code elimination: anything present in a word's code body AFTER exit or jw can be eliminated. this allows turning jw/nop -> jw, making code re-arrangement possible. dead words are then eliminated by simply not being reachable. the next problem that pops up is interference between 'exit' and how it is used in optimizations. basically, 'exit' needs to be a parameter. seems to work. one more problem: if the last word in a macro is ';', it doesn't need to split: that would create too many spurious words. maybe the ';' thing can be checked at (scheme) compile time? the problem is that this really needs semantics.. so, no.. maybe propagate source location information to at least give a proper error message? ok. wasn't so difficult: srcloc is now passed from compile time to be stored in the target word structure, or used in macro error reporting.

Entry: eliminating the meta language
Date: Mon Mar 31 11:11:48 EDT 2008

maybe it's possible to eliminate the scat: meta layer altogether? it would maybe make things simpler, but could lead to some circular refs. scat is really only necessary to implement 'm>' and '>m'; the rest can be implemented with the pattern language. hmm.. let's keep it in there for a bit.

Entry: local variables
Date: Mon Mar 31 13:54:41 EDT 2008

.. and then i'm done.

  : foo | a b c | c b a ;

the first step is to make an anonymous version of the pattern language. took some time, but got it. needed to clean up the pattern language transformer's intermediate state. the interesting thing now is that we basically get multiple occurrences for free. if 'locals' captures the 'state' argument of (lambda (stack) (expr stack)) this is pretty straightforward:
  * close the expression collected up to that point
  * apply it to the input state
  * collect locals from this state
  * bind locals to variables
  * bind wrapper macros
  * expand the rest of the code in this augmented lexical env.

EDIT: yes, but it took some detours ;) it's working in the most generic version, with as much as possible in the form of runtime support.

Entry: interpolation in ellipsis
Date: Tue Apr 1 08:51:51 EDT 2008

var ... -> (1 2 3 var #,(bla) 4 5) ... doesn't seem to work. trying to work around that with local syntax (let-syntax). this works, but it made me run into an interesting problem: the local environment of the transformer is lost when doing something like

  (let-syntax ((rep: (lambda (stx) ((rpn-represent) (stx-cdr stx))))) ...)

now, i tried to capture that environment, but apparently that doesn't work because the RHS is evaluated in a different phase. basically, the transformer in which the let-syntax expression occurs and the RHS are independent. i don't see a way around this, but it might be interesting to think more about it. i had a small bad feeling about parameters, and this is where it goes wrong..

Entry: time to start porting brood
Date: Tue Apr 1 12:09:15 EDT 2008

looks like all the machinery is in place, except for incremental compilation. time to drag library stuff over, then think about incremental stuff. the next problem might be 'load', which has to be replaced by a module-based interface. the thing to solve here is 'require' in forth. this probably requires the function that moves forms to the toplevel..
maybe 'begin-lifted' from (lib "etc.ss")? i got it working by manually collecting requires in the purrr-lang.ss wrapper, but that's not the best way..

  /usr/local/plt-3.99.0.12/collects/scat/purrr/purrr-syntax.ss:41:10: require: not at module level or top level in: (require "purrr-bla.f")

now.. how to collect code from different requires? looks like i need to get at the require before names are expanded. OK. fixed it by expanding straight to a #%plain-module-begin

Entry: collecting words / incremental compilation.
Date: Tue Apr 1 14:55:15 EDT 2008

suppose we're building a kernel. that kernel is represented by a single module. when instantiating that module, we get access to the exported words. these are linked to structures that might not be provided explicitly. all dependencies are handled, and the required target code can be computed by flattening the call graph given the entry points. this means the problem of getting a linked kernel with limited entry points is separate from building a library of macros accessible in other programs. maybe time to start working on incremental compilation?

Entry: purrr18 / redefine
Date: Wed Apr 2 09:15:33 EDT 2008

maybe leave the incremental compilation bit till later, and try to port the core purrr18 language first, then figure out how to modify the assembler. maybe the latter should really be kept separate so i can target external assemblers. ok.. the next problem is the use of a lot of undefined bindings in the previous pic18 spec. it needs a proper mechanism for target plugin behaviour. one way to solve it is to sweep it under the carpet and move it to the assembler, since that's symbolic. but this probably won't work for everything. i.e. if a DUP is defined in the kernel, it needs to be replaced in all the code that uses it. it's true late binding. there are 2 paths to take, the static one (units + explicit linking) and the dynamic one (just redefine the macro structs). since the core language is macros only, that's ok. there is one problem though: the core language's code should be independent of the target. if some target decides that a core macro needs to change, core shouldn't have to know that. that means it has to provide an exact specification of all words that can be redefined. this looks like too much of a hassle, so let's go for redefine using mutation of word structs + some mechanism to at least keep track of changes. what about this: allow macros to be redefined by checking for the availability of the identifier. if they are defined, bind the old functionality to a name SUPER, so one could do things like:

  (([drop] dup) ([movf 'INDF0 0 0]))
  ((dup)        (macro: SUPER))

this makes sure that words are at least defined, and it also makes the hierarchy between redefines clear (= same as the module hierarchy). what model is this? it's not late binding (at least one binding needs to be present). TODO:
  * fix 'insert' syntax to something simpler. OK
  * add redefines to the macro syntax

think more about why this is a bad idea. redefine can be a postprocess step by having it refer to itself first, then swapping the old and new implementations. why are assignments bad? no sharing is possible. that's where parameters are better in some cases: at least the extent of the side effect is limited. so should i just make each word a parameter? if the name exists, it doesn't get redefined, but the word that's returned by the wrapper is swapped with the one defined:

  (define (letrec ((macro/super )) (swap-word! macro/super))

can this be handled by define-ns? no..
it needs to be at the module language level: that's where the define-ns macro is inserted. nope.. it needs to be deeper than that. got it working with this:

  (define (define/swap!-ns-tx define stx)
    (syntax-case stx ()
      ((swap! ns name val)
       (let ((id (ns-prefixed #'ns #'name)))
         (if (identifier-binding id)
             ;; introduce 'super' as temporary self-ref
             (let ((super (ns-prefixed #'ns (datum->syntax #'val 'super))))
               #`(letrec ((#,super val))
                   ;; swap to undo self-ref
                   (swap! #,super #,id)))
             #`(#,define #,id val))))))

tested with:

  (define/swap!-ns word-swap! (macro) dup (macro: super super))

-> expands to

  (letrec-values (((macro/super) (macro: super super)))
    (word-swap! macro/super macro/dup))

  (macro/dup (make-state '() '((qw 123))))
  -> #(struct:state () ((qw 123) (qw 123) (qw 123)))

the module-level thing didn't work because the 'require' statements were not expanded yet, so the bindings were not there. now the test case with the forth lang doesn't work. FIXED: the problem was the introduction of 'super' from a syntax context != source.

Entry: parameters
Date: Wed Apr 2 20:39:46 EDT 2008

some words about 1) bottom-up VS. late binding, and 2) augmentation (permanent) or default + specialization (temporary redefine). (the interesting thing about writing brood/scat is that it brings up issues that seem quite important at the level of larger-scale program organization.) so, is this specialization of deep components just a dirty hack?

  + bottom-up design with static binding at least solves the 'undefined' errors: the lowest layer's linking is statically checked. having a core that's bottom-up makes it easier to develop and test: there are a lot of macros and transformers in there. the defaults for low-level components could be just for testing partial eval.

  + allowing vectored code (making everything a parameter) avoids having to explicitly _declare_ things as a parameter: some pluggable behaviour is necessary in the current approach.

are parameters necessary in the core language? they make sense for rpn: used in scat/macro/interactive. do the macros need a similar approach? is this all a consequence of the solution, or of the problem? it seems the most important question is: is it OK to let higher level modules modify parameters, instead of setting them with 'parameterize'? what about making re-definition explicit in the 'compositions' macro?

Entry: standardizing interface names
Date: Thu Apr 3 07:25:57 EDT 2008

The goal is to make the macros used to define rpn words handle redefinitions. The previous approach of mutation will be replaced by parameters. Before that, let's give appropriate names to the toplevel macros: 'compositions' and 'patterns'.

Entry: every macro a parameter?
Date: Thu Apr 3 07:39:22 EDT 2008

What about defining extensions as functions that install environment modifications and run a thunk? This allows keeping at least the most basic functionality intact, and allows specific extensions to be represented by an object. To come back to yesterday's remarks: i'm a bit uncomfortable with extension of low-level components without being able to undo those. Carefully building a bottom-up structure and then starting to poke around in its innards without an 'undo' mechanism seems wrong.. So what is the real reason for needing this poking? The macros employ target-specific optimizations. This should solve it:

  (define (make-macro . args)
    (make-parameter (apply make-word args)))

Hmm.. where to introduce it? The first naive replacement does violate something.. a lot of code assumes macros are functions. Maybe in 'make-word'?
Maybe using dynamic-wind is still the best approach. The problem is that my representation type isn't abstract enough: i really would like it to be a procedure mapping state->state. So.. is dynamic-wind thread-local? Maybe that's the big difference. So what about this: keep the word interface like it is, but provide a words-parameterize form: if it can't be solved with straight parameters because they are procedures, solve it on the other side of the interface. EDIT: some auto-upgrade is added so it is not NECESSARY to specify that words are parameters when the auto-upgrade words are available. however, it is possible to PREVENT words from becoming parameters by not exporting the auto-upgrade functions. this looks like it's flexible enough. now to adjust the compositions and patterns macros to use this. got 'with-compositions' working, but it needs a 'super' too. got 'super' working too. so now the mechanism for extending the compiler is in place: every word can be replaced in a dynamic context + a mechanism for at least limiting some replacement can be easily installed. EDIT: got pic18 compiling with redefined core words. now.. get rid of the parameters in target.ss and make those into re-definable words too. then it should be mostly done. todo:
  * remove target-postpone-* parameters in macro-lang.ss and replace them by parameterized words. OK
  * same for split / label? -> NO: special api

Entry: uneasy feeling
Date: Thu Apr 3 16:28:44 EDT 2008

something doesn't feel right with this parameterization. however, it does look simpler.. maybe some mode should be added to automatically extract which parameters are redefined? but then, 'super' might not do anything.. i need to think a bit about this. EDIT: so why is this parametrization necessary? it's a cross-cutting concern: OPTIMIZATION. it's used not (necessarily) to change functionality, but to change implementation. the thing which makes it a bit half-assed is the way in which the responsibilities of the core and the extension (pic18) are distributed. is there a sound abstraction hidden here? OK.. so what about automatically collecting all the extensions at compile time, but leaving them unspecified in the code? what about doing it the other way around: specifying ALL target-specific macros as an extension, and having them define the parameter if it doesn't exist yet?
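for reference, a self-contained toy of the words-as-parameters idea (hypothetical names; the real words-parameterize machinery is more involved):

  ;; a word keeps a stable procedure identity (state -> state), but its
  ;; behavior lives in a parameter, so it can be redefined within a
  ;; dynamic extent and automatically reverts afterwards.
  (define (make-pword fn)
    (let ((p (make-parameter fn)))
      (values (lambda (state) ((p) state))  ; the word itself
              p)))                          ; handle for redefinition

  (define-values (dup dup-behavior)
    (make-pword (lambda (s) (cons (car s) s))))

  ;; (dup '(1 2))  => (1 1 2)
  ;; (parameterize ((dup-behavior (lambda (s) s)))
  ;;   (dup '(1 2)))  => (1 2), and outside the extent dup is unchanged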
Entry: simplifying ns-tx
Date: Fri Apr 4 10:48:55 EDT 2008

a lot of syntax-case macros get simpler by using let-syntax: this allows ellipsis to be used instead of explicit list manipulations. the core 'ns' functionality is really to transform the names in a symbol-introducing macro like let. abstracting this now.

  box> (letrec-ns (macro) ((a 123) (b 345)) #f)
  scat/ns-tx.ss:94:19: compile: access from an uncertified context to unexported variable from module: "/home/tom/scat/scat/ns-tx.ss" in: make-let-ns-prefixer

after exporting the variable it worked..

Entry: what is an extension?
Date: Fri Apr 4 12:09:02 EDT 2008

so.. i'd like to keep the hierarchical module approach, which means that several language modules can be built on top of the same code, without the need for different module instances. this requires at least that the FUNCTIONALITY of the module's data structures (words) is not modified. in that respect the current approach with parameters is OK. so, an extension is a PARAMETERIZATION of the core compiler, such that it generates target-specific code using abstractions provided by the core. currently, each extension needs to know which words are to be extended, using the 'with-patterns' or 'with-compositions' macros. now, does it make sense to have 'with-xxx' automatically define names if they do not exist yet, with 'super' bound to an exception? or should i just forget about all this nonsense and go back to multiple instances of modules, with permanent mutation of word structures? let's read the doc about module instances first.

Entry: module instances
Date: Fri Apr 4 12:27:26 EDT 2008

looks like i'm stuck at a fundamental misunderstanding.. each time 'require' is called, the RHS of define expressions is evaluated. so ALL the expressions are instantiated! this happens AGAIN every time the code is required. however, this process is fast if the code is already COMPILED. how did i work around this with the ns hash before? it looked as if there was only a single instance.. maybe because i was working in a single module environment? in light of the re-definitions and parameters mentioned before, what i'm trying to do doesn't really make sense: suppose i want a PIC18 and a PIC30 forth. if i want both of them in the same module, parameters would be a good approach. but having them in separate modules would work just as well. they would represent different specialized instances of the core compiler, but would have no sharing of data. they could be explicitly put in different namespaces. in such an approach, modifying the word structures in-place is a perfectly legitimate approach. the parameter words weren't for nothing however: they can still be used for local parameterizations, like ";". OK. 'compositions' and 'patterns' will now re-define words, with 'super' bound to the previous implementation. i guess now it's time to see how to compute code instances? i tried this:

  tom@del:/tmp$ cat A.ss
  #lang scheme/base
  (require "B.ss" "C.ss")
  (printf "A\n")
  tom@del:/tmp$ cat B.ss
  #lang scheme/base
  (require "C.ss")
  (printf "B\n")
  tom@del:/tmp$ cat C.ss
  #lang scheme/base
  (printf "C\n")

  box> (require "A.ss")
  C
  B
  A
  box> (define NS (make-namespace))
  box> (parameterize ((current-namespace NS)) (namespace-require "A.ss"))
  C
  B
  A

so, what happens is: each toplevel require pulls in what is necessary, but instantiates the phase 0 expressions only once. the same require in the same namespace will not do anything, but in a different namespace it will instantiate again: values are not shared between namespaces, but compiled expressions are shared (the global module registry). The manual has this to say:

  In a top-level context, require instantiates modules (see Modules and
  Module-Level Variables). In a module context, require visits modules
  (see Module Phases). In both contexts, require introduces bindings
  into a namespace or a module (see Introducing Bindings).

So what's the difference between 'instantiate' and 'visit'? Visit just looks at the phase 0 names, and evaluates any phase 1 expressions, but doesn't evaluate phase 0 expressions. Got it. This sheds new light on collecting code. All modules should require the central code registry, which then defines a *code* variable. Got every module's code annotated by its name too: this gives all the code compiled + the possibility to take only what's necessary.
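the registry itself can be tiny; a toy version (hypothetical names):

  ;; central code registry: every compiled module pushes its
  ;; (name . code) pair into one shared module-level variable.
  (define *code* '())
  (define (register-code! name code)
    (set! *code* (cons (cons name code) *code*)))

  ;; (register-code! 'kernel '((dup) (drop)))
  ;; (assq 'kernel *code*) => (kernel (dup) (drop))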
Entry: next
Date: Fri Apr 4 14:36:09 EDT 2008

so what's next? get the monitor to compile with 'require' instead of 'load' + provision of symbols. this might introduce some trouble with undefined symbols. also, the assembler needs some fixing. that's a bit boring.. what's the most interesting thing to do next?

Entry: beautiful vs. interesting
Date: Fri Apr 4 17:16:35 EDT 2008

i was thinking about beautiful vs. interesting today. if beautiful means simple on a superficial level, then interesting means simple on a deeper, hidden level. in programming, things are usually interesting before they become beautiful. boring then means non-compressed complexity: that you don't understand the problem.. it's either simple or interesting. boring can always be made interesting on a meta-level, right?

Entry: poke
Date: Fri Apr 4 17:22:15 EDT 2008

what about starting poke, as a temporary relief from the pic18 guts? poke should fit nicely on top of macro. it might also be a way to improve on what 'assembler' means. let's port the C code generator first. the cgen has a special-purpose transformer. let's map that to macros, starting from the bottom. hmm.. doesn't really work that well since this prints to strings, not s-expressions. OK.. got indentation working as parameters. now.. go from s-expressions -> syntax objects so 'syntax-case' can be used. before that, it first needs to be defined what an expression transformer is: stx -> string. ok. got it: syntax-case instead of match, but not using the scheme expander.

Entry: compiler's NEXT
Date: Sun Apr 6 11:27:33 EDT 2008

this should be (source, exp) -> (source+, exp+) instead of returning exp only when done: that would eliminate the hoop-jumping in forth-tx.ss. ha: the continuation thunk used doesn't work well with the current continuation-passing approach. -> fixed it by returning a thunk instead of just the syntax: some extra information was needed (a ':' means 2 things: 1. end the previous def, 2. start a new one). makes me wonder if i can fix the splitting in target-rep.ss in a similar way.

Entry: the assembler
Date: Sun Apr 6 14:43:02 EDT 2008

keep the road open to have assembler opcodes as syntax, to carry over source code information. so what does the assembler do?
  * provides chunks of (fully linked) target code representation, given a namespace of primitive assemblers.
  * resolves target code/data addresses (using a multi-pass algo)

what does the assembler not do?
  * symbol resolution: all symbols should be resolved by the compiler, so there is no need to do any namespace management.
  * code order is determined by the compiler: the assembler gets a list of word structures.

should it be called 'assemble!', and be seen as a graph update function? not boring at all: there's a problem that needs to be solved, and quite a deep one: what about symbols? well actually, they might evaluate straight to word structures, which are accessible! Ha! evaluation of expressions in the assembler depends on 2 different contexts: whether they are part of call instructions, or part of literal loads. this seems to solve a big problem, but i can't quite express it yet... TODO:
  * evaluation of symbolic code
  * think about side effects OK/not ?
  * where to store assembler opcodes?
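back to the poke entry above for a second: a minimal sketch of such an stx -> string expression transformer, using syntax-case for matching without invoking the scheme expander (toy code, not the poke cgen itself):

  (define (expr->c stx)
    (syntax-case stx (+ *)
      ((+ a b) (format "(~a + ~a)" (expr->c #'a) (expr->c #'b)))
      ((* a b) (format "(~a * ~a)" (expr->c #'a) (expr->c #'b)))
      (x       (format "~a" (syntax->datum #'x)))))

  ;; (expr->c #'(+ 1 (* x 2))) => "(1 + (x * 2))"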
Entry: quoting and meta-code
Date: Mon Apr 7 07:36:13 EDT 2008

the problem: quoted labels. somewhere down the line, quoted words lose their quoted tag, such that macro evaluation doesn't give what's needed.

  ' abc 1 + jump

what about getting rid of the symbolic part, and starting with clean semantics?

  (([qw a] [qw b] +) ([qw (macro: ',a ',b +)]))

this doesn't work of course, because it recurses.. so what would work? should it be scat code?

  (([qw a] [qw b] +) ([qw (scat: ',a ',b +)]))

what is the semantics of a quoted scat word? can it be just a thunk?

  (([qw a] [qw b] +) ([qw (lambda () (+ a b))]))

thunks are the most flexible, but would it be good to limit the semantics somehow to scat words or macros? let's restate the goals:
  * obtain a value at assembly time.
  * allow easy composition of meta-code at compile time
  * allow meta code inspection
  * simplify definition of meta-ops (snarfing)

  (([qw a] [qw b] +) ([qw (scat: ',a ',b +)]))

maybe i need to give up on inspection, and solve that later. concentrate on semantics first. [qw v] : what is v? it's any VALUE that makes sense at compile time, to be passed around between macros, but eventually it should end up as a number. what is composition of meta code? in the previous approach, this was done syntactically: just concatenate lists. is this still a good approach? isn't a more general abstract approach better?

  (([qw a] [qw b] +) ([qw (meta: a b +)]))

now, what does 'meta:' do?
  * produces a single (delayed) numerical value
  * the '+' comes from the scat namespace
  * the 'a' and 'b' are lexical parameters.

ok. got it in macro/meta.ss : a simple layer on top of scat code which appropriately quotes lexical variables, and wraps results in a meta structure to chain evaluation. seems to work with the pic18 stuff too. now: meta annotation. something that might come in handy is to figure out where assembler literals come from. in the old brood, code was just symbolic. here it needs to be annotated explicitly, because that information is lost. problem: meta code has to be thunks: the value can depend on numerical addresses of words, which might change during the relaxation phase of the assembler.

Entry: tick
Date: Mon Apr 7 10:57:55 EDT 2008

so, with meta quoting out of the way, the real problem can come back now: computing with word labels. this probably boils down to giving TICK the proper semantics.

  ' foo

this will produce a literal with a quoted macro. all symbols in macro/forth code need to be macros, and quoting symbols needs a different tick. what does it mean to quote a name? it produces a literal value that supports an 'unquote' operation. in addition: it MIGHT support POINTER MANIP if it is a macro that wraps a call to a word. so: the previous approach of treating symbol names as word addresses IMPLICITLY dequotes them to yield a numeric address value. anonymous macros might be convenient. anonymous words also. what's the difference? let's see:

  ' foo compile == foo

this is an important issue, and needs some more thought. the difference between "execute" and "compile" should be cleared up also. looks like i really need to be careful with AUTOMATIC changes between macros and words. NOTE: macros can't survive to the assembly phase, so everything that used to be a symbol now needs to be a target-word struct. does this solve it?

  ;; Get the address from the macro that wraps a postponed
  ;; word. Perform the macro->data part immediately (as a type check)
  ;; and postpone the address evaluation.
  (([qw macro] address)
   ([qw (let ((word (macro->data macro 'cw)))
          (meta-delay
           (let ((pointer (target-word-address word)))
             (unless pointer
               (error 'unresolved-word "~s" (target-word-name word)))
             pointer)))]))

looks like it: 'run' and 'address' are now separate. 'run' doesn't need to know if the quoted macro represents a target word. 'address' does need to know that, and fails if it is not.
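a toy model of the meta / meta-delay idea (hypothetical names): a meta value is just a tagged thunk, forced at assembly time when addresses are finally known, and composable before that.

  (define (meta thunk) (list 'meta thunk))
  (define (meta? v) (and (pair? v) (eq? (car v) 'meta)))
  (define (meta-force v) (if (meta? v) ((cadr v)) v))  ; numbers pass through
  (define (meta+ a b)
    (meta (lambda () (+ (meta-force a) (meta-force b)))))

  ;; a literal and a not-yet-resolved address compose without evaluation:
  ;; (meta-force (meta+ 1 (meta (lambda () #x0100)))) => 257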
Entry: optional library code
Date: Mon Apr 7 12:19:22 EDT 2008

Then, the more general problem of requiring runtime words only if necessary. For example:

  (([qw macro] run) macro)
  ((run) ([cw 'runtime-run]))

Since symbols are no longer allowed, this form of late binding needs to be handled differently. The most straightforward solution is to have the default throw an exception, and rely on targets to implement the word.

Entry: bugs
Date: Mon Apr 7 13:03:29 EDT 2008

something went wrong with forth / macro mode: check test/purrr-broem.f -> fixed: current-mode was evaluated at the wrong time.

Entry: assembler
Date: Mon Apr 7 20:08:01 EDT 2008

using a prompt tag to abort from meta-force: this somehow needs to give the smallest or the largest instruction, depending on whether assembly uses a growing or shrinking relaxation.

Entry: more assembler
Date: Tue Apr 8 09:12:35 EDT 2008

About the relaxation algorithm: as far as i understand, it is necessary that individual instruction sizes move only in one direction (grow/shrink) to prevent oscillations in the relaxation phase. 2 questions:
  * is this correct?
  * how to ensure it?

ok.. i'm going to assume it's necessary to limit size changes. how to implement this? for this to work, individual instructions need to be tagged somehow. let's put the responsibility at the machine assembler end, and provide only a mechanism to record the previous result. on the other hand, we could pad with nops. OK. got it.

Entry: relative addressing
Date: Tue Apr 8 13:41:28 EDT 2008

so.. the relative addressing is a bit of a hack. is it possible to move address resolution down to the assembler opcodes? sure. just have them depend on 'pointer-get'. let's port the pic18 assembler, and see if the generation can be improved a bit. porting asmgen and trying to get relative addressing, which already has overflow detection, to use absolute input. Maybe meta-catch-undefined can be eliminated by setting undefined words to 'here', so they compile to a small relative jump instruction. Maybe just leave that out: it's an optimization, not essential. OK. pic18 seems to work too.

Entry: .f -> .bin
Date: Tue Apr 8 17:11:08 EDT 2008

Time to define the purrr18 language, which assembles straight to binary. Maybe target-word structures should have a 'bin' slot? ok.. now for assembling on the spot. or is that not a good idea?

  (assemble! (apply append (map cdr *code*)))

maybe it's time to start doing the "load in namespace" thing?

Entry: workspace
Date: Wed Apr 9 13:37:21 EDT 2008

so.. all the static stuff seems to be in place, now for the dynamic workspace. there are still some issues with multiple instances. so, why would one want to use a namespace? to use 'require', 'eval' and 'compile' in a controlled fashion.

Entry: outstanding issues
Date: Wed Apr 9 13:44:48 EDT 2008

  - dead code elimination OK
  - opti-save / pseudo OK
  - variable allocation OK: words got realms now
  - splitting: OK
  - jump chaining
  - org
  - code serialization

Entry: assembler bugs
Date: Wed Apr 9 13:53:52 EDT 2008

something wrong with error handling on eval..

  (allot data 123)

-> the 'data part is not allowed: only numbers and meta promises that evaluate to numbers

Entry: multiple compiler passes
Date: Wed Apr 9 14:45:03 EDT 2008

something i forgot about: there are the 'pseudo' and 'opti-save' passes that go over the code after the first pass. it might be a good idea to formalize this a bit. the real question: why not postpone all optimisations till later, and have the core language be as simple as possible?
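a toy version of the one-directional relaxation from the 'more assembler' entry above (hypothetical representation: an instruction is a recorded-size box paired with a size-estimating thunk):

  ;; iterate until sizes stabilize. since a recorded size never shrinks
  ;; and encodings have a largest size, the sizes are monotonically
  ;; non-decreasing and bounded, so the loop cannot oscillate.
  (define (relax! instrs)
    (let loop ()
      (let ((changed #f))
        (for-each
         (lambda (i)
           (let* ((b (car i))                        ; box: last size
                  (new (max (unbox b) ((cdr i)))))   ; never shrink
             (unless (= new (unbox b))
               (set-box! b new)
               (set! changed #t))))
         instrs)
        (when changed (loop)))))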
Entry: words vs macros
Date: Wed Apr 9 16:21:56 EDT 2008

now that i've got the target word data structure in my hands, it's easy to see that these are completely separate from the macros that generated them. they can be serialized with evaluated meta expressions, possibly augmented with some annotations as to where the computations came from. to get at macros, simply load all the source code, but discard the *code* variable. so the next question: how to serialize a graph of structs? and while we're at it: what about graphs and functional programming?

Entry: conditional jumps -> more static assembly rep
Date: Wed Apr 9 18:13:49 EDT 2008

something to think about is whether it is possible to find a common primitive for for .. next and other loops. ha.. something else: assembler constants need to be bound: no more symbolic magic. maybe it's also a good time to require assembler opcodes to be bound names + perform arity checks? also added an opcode check: it's probably best to just replace that with module name bindings in the (asm) namespace though, so all checks can be automated. however, that does require either moving the assembler to compile time, or using namespaces + eval. a big hurdle is the implementation of the pattern matcher: it uses symbols. since the assembler namespace is flat, not so big, and quite constant, it doesn't really need to be managed.. let's keep it like it is, but add an arity check. it would be nice to have some things available at compile time though, like arity checks. maybe combine both the symbolic matching AND some static name binding?

Entry: static vs. dynamic
Date: Wed Apr 9 19:57:49 EDT 2008

i don't know whether this is mostly bias, but it seems that using a bottom-up approach instead of an ad-hoc, late-bound approach makes things easier to understand. i.e.: i didn't realize that the "delay evaluation until assembly time" is really ONLY about values of target code and data addresses: everything else can be evaluated earlier. previously this was handled in a sort of ad-hoc way with evaluation of macros and assembler labels. so, what about the assembler? should it be static (which needs some magic in the pattern transformer) or stick to the symbolic approach? maybe some middle road: names handled statically, and the rest done with structure instances.

Entry: matcher
Date: Thu Apr 10 01:14:34 EDT 2008

it's probably best to:
  * represent assembler instructions with structs + explicit type info
  * write a special-purpose matcher that takes bitfield widths into account.

the idea is this: the asmgen goes from the textual rep -> symbolic rep -> procedure rep. what about stopping that compilation somewhere and leaving a little bit of interpretation of the proto?

  (define (proto->assembler . proto)
    (match proto
      ((name formals . operands)
       #`(make-asm
          (lambda args
            (parameterize ((asm-error (proto->asm-error-handler '#,proto args)))
              (apply (lambda #,formals (list #,@(map assembler-body operands)))
                     args)))
          '#,proto))))

a proto looks like this:

  (movlw (k) ((14 . 8) (k . 8)))

moved the asm-error parameterization to assembler.ss resolve/assemble. ok. arity check works.
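to make the proto format concrete: a toy encoder that folds such a bitfield list into an opcode word (hypothetical helper, not the staapl assembler):

  ;; fields: list of (value-or-name . width), msb first; names are
  ;; looked up in an operand environment.
  (define (encode-fields fields env)
    (let loop ((fs fields) (word 0))
      (if (null? fs)
          word
          (let* ((f (car fs))
                 (w (cdr f))
                 (v (if (symbol? (car f))
                        (cdr (assq (car f) env))
                        (car f))))
            (loop (cdr fs)
                  (+ (arithmetic-shift word w)                ; make room
                     (bitwise-and v (- (arithmetic-shift 1 w) 1))))))))

  ;; (encode-fields '((14 . 8) (k . 8)) '((k . #x55))) => #x0E55,
  ;; i.e. the pic18 encoding of movlw 0x55.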
rest) -> rest however, in the alternative syntax this becomes: #`(list-rest #,@'() rest) -> (list-rest rest) which isn't the same: box> (match '(a b c) ((list-rest bla) bla)) match: no matching clause for (a b c) tricky business ok, so now the plt matcher is in place, and it should be possible to start matching struct instances instead of symbols. on the other hand, it's not so essential: got plenty of checking implemented now.. maybe move on to real work? Entry: mature optimizer Date: Thu Apr 10 13:28:53 EDT 2008 in order to keep the optimizer tractable, it has to be factored a bit. lets see what we got now: (1) compilation + optimization of non-jump instructions (2) jump optimizations on intermediate code (3) elimination of intermediate reps + save opti the last step is target specific atm. Entry: meta eval annotation Date: Thu Apr 10 13:32:08 EDT 2008 annotation is easily made by replacing undefined addresses with symbolic references.. ok, not easy, but at least straightforward: target values now have 2 thunks: one that produces real values, and one that gives expressions referring the target word names. maybe.. it's better to just use an s-expression language here? targeting external assemblers' expression languages will probably be easier using a nested format, instead of a linear one.. for the built-in assembler, there's no need, since the evaluation mechanism is abstract. Entry: conditional jumps Date: Thu Apr 10 17:22:33 EDT 2008 these are special.. but how exactly? one of the things i'd like to try is to isolate loop bodies so they can be optimized. the previous 'amb' based approach (for .. next) is a bit of a dirty hack, and doesn't work very well with non-flat code as before.. the for..next opti checks to see which of these produces the smallest loop body. for .. next dup for drop .. save next drop so, what is the pattern here? * execute a couple of simultaneous paths * choose the best one this probably needs a purely functional split loop, so continuations can be used without trouble. let's try that. what does 'split' do? it calls 'next' and then continues. so it needs a true continuation. remarks: * split doesn't need call/cc: it just produces a value, and that value isn't all that interesting. * in the loop body, no assignments can be made: when a split occurs, just cons the word and code list together, and perform mutation AFTER everything is done. OK, i think i got it written down.. not sure whether it will work though: afraid that the continuations in the macro evaluation will somehow interfere with the update loop.. it should really be seen as 2 tasks communicating.. anyhow. more later. i can't get this to work.. probably i'm discarding things i'm collecting when calling the continuation. maybe composable continuations work here? but i don't really understand them yet.. it's like stuff pushed to the return stack.. so why can't this be solved in a monadic way? actually, this is the same problem as the one i'm trying to solve with passing data alongside the normal stack: because there's no room besides 'data' i can't just tuck away more stuff.. looks like this is getting me closer to how to implement the core mechanism for monadic threading... probably going to learn a thing or two here. let's concentrate. - a macro is a map (stack,asm) -> (stack,asm) - i'd like to extend this to a map (words,stack,asm) -> (words,stack,asm) which has a single mixing operator 'split', and all the other operations are lifted. how to do this? 
http://community.schemewiki.org/?composable-continuations-tutorial

from the plt manual:

  (reset val) => val
  (reset E[(shift k expr)]) =>
    (reset ((lambda (k) expr)
            (lambda (v) (reset E[v]))))   ; where E has no reset

similar:

  (prompt val) => val
  (prompt E[(control k expr)]) =>
    (prompt ((lambda (k) expr)
             (lambda (v) E[v])))          ; where E has no prompt

applied to the monadic bladibla: suppose 'D' performs some mixing
with other state y, but all the small caps operate only on x.

  (lambda (x y) (a (b (c (D (e (f x)))))))
  =
  (lambda (x y)
    (let-values (((y+ x+)
                  (reset
                   (a (b (c (shift abc
                              (let ((x+ (e (f x))))
                                (values (merge2 x+ y)
                                        (abc (merge x+ y)))))))))))
      ...))

this way, the extra data 'y+' can be passed sideways, not going
through the 'abc' chain. so.. as long as an expression is wrapped in
a reset, 'shift' can get in between. how to use this for implementing
lifting? one prompt tag per lifted thing?

Entry: composable continuations
Date: Thu Apr 10 21:33:38 EDT 2008

some simpler example is needed. suppose we're doing one prompt tag
per threader.

  (define (one stack) (cons 1 stack))
  (define (drop stack) (cdr stack))
  (define (word: . fns) (apply compose (reverse fns)))

  (define (broem stack extra)
    (define (mix s)
      (shift post
        (values (+ extra 1)
                (post (cons extra s)))))
    (reset ((word: one mix one one) stack)))

getting tired.. what i want to do is basically: create a mechanism to
chop the code up in chunks, in a way that is compatible with
continuations for non-deterministic programming, so optimization can
be implemented using 'amb'.

what i sort of see is that it is possible to use shift and control to
chop up a program into different parts, and recompose them. in the
light of writing an RPN program (a b c) as

  (lambda (state) (c (b (a state))))

this makes sense: shift can capture what happens AFTER a certain
point, up to where the result is needed.

again.. i think i get what reset/shift do, but can't make the
connection to sidestepping threading. maybe i should try to translate
it to RPN first?

Entry: hiding more stuff in 'data'
Date: Thu Apr 10 23:06:10 EDT 2008

1. there is no difference in trying to extend stack -> (stack,data)
   -> more, so it should work with just a number.

2. in a loop which has extra internal state, compute a composition of
   functions where one of the functions is special, in that it can
   refer to the enclosing state.

can't this be hidden in 'make-state?': let that function perform all
the necessary combinations of state. what i wonder is how to relate
this to real monads: the operation of "flattening" 2 monadic layers?

  bind : (M t) -> (t -> M u) -> (M u)
  map  : (t -> u) -> (M t -> M u)
  join : M (M t) -> M t

map seems really trivial, but join?

Entry: lifting with shift
Date: Thu Apr 10 23:19:44 EDT 2008

  (a (b (C (d (e x)))))

  ab : x -> x
  de : x -> x
  C  : x.y -> x.y

now make abCde : x.y -> x.y

let's try again:

  add1 : x -> x
  swap : (x . x) -> (x . x)

  (define (swap x) (cons (cdr x) (car x)))

  (define (lift fn)
    (shift post                    ;; capture postproc
      (lambda (xy)                 ;; create new function
        (let ((xy+ (fn xy)))       ;; apply fn to its input
          (cons (post (car xy+))   ;; apply post to one of the components
                (cdr xy+))))))     ;; .. and join again

  * capture the stuff that postprocesses the x component
  * apply the function, post-process one component, and join again

Entry: struct matcher
Date: Fri Apr 11 10:17:44 EDT 2008

in order to make the monad thing work, i'm going to use structure
types only, and write a special purpose matcher that handles nested
structure types with a simpler syntax.
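something like this is what i'm after, sketched with plt's match on
hypothetical instruction structs (the real matcher will add bitfield
checks and a simpler syntax):

  (define-struct qw (value))    ;; literal push
  (define-struct cw (target))   ;; procedure call

  ;; code list, most recent instruction first:
  ;; fold two literals + a 'plus call into one literal.
  (define (peephole code)
    (match code
      ((list-rest (struct cw ('plus)) (struct qw (b)) (struct qw (a)) rest)
       (cons (make-qw (+ a b)) rest))
      (_ code)))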
Entry: broken functional compiler
Date: Fri Apr 11 15:03:07 EDT 2008

paste it here, so i can get the imperative version back online:

  ;; To the macro layer, code and labels are distinct entities
  ;; represented by abstract target-word data type and reversed assembly
  ;; code lists respectively. After compilation, the code lists are
  ;; permanently attached to the word structs. During compilation, no
  ;; side effects are made, so continuations can be used for
  ;; optimizations.

  (define (compile-word input-word)

    ;; Label generation is stateful, but that's ok since we don't care
    ;; much about the counter values. They are just for readability.
    (define next (make-counter 0))
    (define (label (name (format "_L~a" (next))))
      (make-simple-target-word (string->symbol (format "~a" name))))

    ;; Get the macro code, and create a start thunk and set up the
    ;; grabber parameter.
    (define macro (target-word-code input-word))
    (define name  (target-word-name input-word))
    (define grab-words (make-parameter #f))
    (define (go) ((grab-words) (macro->code macro name)))

    ;; Split needs to be purely functional so continuations can be used
    ;; freely when compiling code, discarding the split word state if
    ;; necessary.
    (define word/code
      (let ((tag (make-continuation-prompt-tag 'compile-word)))
        (set-target-word-code! input-word #f)
        (parameterize-words-ns! (macro)
          ((semicolon (ns (scat) postpone-exit)))
          (parameterize ((target-make-label label)
                         (target-split #f))
            ;; Ensure side-effects are local.
            ;; State updates directed by calls to 'split'.
            (let update ((words '())              ;; listof (word . code)
                         (current-word input-word)
                         (continue go))           ;; continuation thunk
              ;; Split needs to be purely functional so continuations
              ;; can be used freely when compiling code.
              (target-split
               (lambda (state new-word)
                 (let ((code  (state-data state))
                       (stack (state-stack state)))
                   (shift-at tag chunk
                     (update (cons (list current-word chunk) ;; no assignments!
                                   words)
                             new-word
                             ;; ('k' is not bound anywhere here -- one
                             ;; reason this version is broken.)
                             (lambda () (k (make-state stack '())))))
                   tag)))
              ;; After 'macro->code' we end up here to record the last bit
              ;; of code, collect everything and exit from 'go' and thus
              ;; the 'update' loop.
              (grab-words
               (lambda (final-code)
                 (cons (list current-word final-code) words)))
              ;; Continue computation
              (reset-at tag (continue)))))))

    ;; Link up structures, and return a list of words.
    (map* (lambda (word code)
            (set-target-word-code! word code)
            word)
          word/code))

Entry: next
Date: Sat Apr 12 00:30:14 EDT 2008

got a bit confused by the control operators yesterday. might look at
this link, and some more about cursors..

http://blog.plt-scheme.org/2007/07/callcc-and-self-modifying-code.html

Entry: monads
Date: Sat Apr 12 13:03:46 EDT 2008

so.. from the point of view of 'map' and 'join', which i think are
easier to understand:

  map:  take a function f : u -> t, and turn it into Mu -> Mt
  join: take MMt to Mt: undo a 'double wrap'

the key insight is that how many times f is used, and in what order,
is not specified. and for join, it doesn't matter what the wrapping
does, as long as it can be flattened: wrapping can contain multiple
base type instances, in whatever structure.

maybe 'bind' isn't that hard to understand after all: it takes a
monad Mt and a function that produces a monad Mu from a value t,
unwraps Mt, applies t->Mu as many ways as necessary, and combines all
the Mu into a single Mu. in the map/join version, the ORDER of
wrapping is very important.
  ((map f) m) == (bind m (lambda (x) (return (f x))))
  (join m)    == (bind m (lambda (x) x))
  (bind m f)  == (join ((map f) m))

in order to understand this better, i'm trying to implement it
(without looking at other implementations.) see monad.ss

i'd like to make 'map' and 'join' polymorphic, but that's not quite
possible because of the absence of typing information. functions
could be annotated however (do contracts help here?)

(something in the back of my head: in haskell, one can dispatch on
the return type of a function. i'm not sure if that's going to be a
problem here.. EDIT: it's about the unit operation.)

Trying to implement a monad that carries around just an extra scheme
value. This is the simplest thing i can think of.

  (define-struct extra-monad (value extra))

  (define (extra-map t->u)
    (struct-match-lambda
     ((extra-monad value extra)
      (make-extra-monad (t->u value) extra))))

  (define extra-join
    (struct-match-lambda
     ((extra-monad (extra-monad value extra-inner) extra-outer)
      (make-extra-monad value ???))))

The problem seems to be in the join operator. Map is simple: just
pass it on. But what does the combination do? An option is to simply
pick one of the 2.

http://groups.google.com/group/comp.lang.functional/msg/2fde5545c6657c81

  "You can also turn programs in continuation passing style into
  monadic form. In fact, it's a significant result (due to Andrezj
  Filinski) that all imperative programs using call/cc and state can
  be translated into a monadic style, and vice versa. So exceptions,
  nondeterminism, coroutines, and so on all have a monadic
  expression."

maybe time to formulate my question: since call/cc seems to be more
'native' to scheme, why don't i use that instead of monads?

ok.. am i allowed to try again with reset/shift ??

Entry: reset / shift
Date: Sat Apr 12 14:00:25 EDT 2008

trying to do this:

  (abcZdef) -> (ABCZDEF)

need to do this dynamically, without changing the small caps.. wrt
state, the diagram should illustrate it:

  -a-b-c-Z-d-e-f-
  -------+-------

so i try to use 'shift' to collect the remaining computation, and
turn it into a lifted function. what i want is this:

  (lambda (_) (f (e (d (Z (c (b (a _))))))))
  ->
  (lambda (_)
    (cons (cons (lambda (x) (f (e (d x)))) z)
          (c (b (a _)))))

the problem is really termination (the 'null' of the list if you
want). ok.. i get something, but not what i expect.. time to go to a
simpler version.

hmm.. very confusing stuff: i understand what happens if there's one
shift, but every next one gives results i don't understand. probably
best to try to write out some examples using the reduction rules.

Entry: static
Date: Sat Apr 12 15:23:48 EDT 2008

if i can't get it to work dynamically, why not provide the
information statically? the only thing that matters is the binding
for the function 'split' in the compiler loop. isn't there a way to
make the forth macros accept this word? the problem is: there are
words defined on top of split, so i'd have to make all those
dependencies static too..

Entry: state.ss / 2stack
Date: Sat Apr 12 17:13:04 EDT 2008

main problem: the 'data' part in state is the thing that's passed
around by all control words. this cannot be the 2nd stack: data needs
to be a stack of 'wrapped things'.. i'm not sure what that means yet,
but it's 'stuff' that gets threaded through computations. BROOD 6 is
probably going to be about doing this with composable continuations..

i'm going to try to shield access to this data atom. i think i still
don't understand why scat-control.ss needs to have this atom clearly
visible.
trying to define these stack update functions, i find a need to make
the WHOLE state representation explicit again. ok.. it's the right
path, but i'm using the wrong abstraction. i need a mechanism to just
stick something on state to be retrieved later. in other words, the
layers of wrapping need to be made abstract. i.e. in:

  a b c D e f g H i j

if D and H interact with the threaded state in some way, they need to
be able to do that without the lower case functions knowing about the
existence of these things. different types of things need to work
independently. or not? i.e. asm access only makes sense if the type
is actually extended with such information. i'm missing some crucial
insight here..

Entry: automatic lifting
Date: Sat Apr 12 18:05:29 EDT 2008

i'm looking at this the wrong way.. this all makes so much more sense
taking the stance of "automatically lifting" a procedure whenever it
is applied to a certain input type. that's really the only thing
necessary.. so what about turning this around and seeing 'state' as
an object with a method 'apply me'?

imagine a conversation between STATE and FUNC.

  STATE: dude, i want to apply you. what's your game?
  FUNC:  i take A to A
  STATE: hey, i got some A here, i'm going to use you and move on.

so, 'state' should really be a function: STATE : FN -> STATE

so.. it's the responsibility of the state to interpret the functions
applied to it, and the responsibility of functions to identify
themselves.

  (define (make-stack data)
    (lambda (fn)
      (if (stack-proc? fn)
          (make-stack (fn data))
          (error 'type-error))))

this changes the representation from

  (lambda (x) (a (b (c x))))

to

  (lambda (x) (((x c) b) a))

which really looks like RPN code :) the 'dumb' state would be

  (define (dumb data)
    (lambda (fn)
      (dumb (fn data))))

so.. is this an interpreter? looks like it.. note that in order to
optimize things, some could be unrolled:

  (lambda (x) ((x c) (lambda (stack) (a (b stack)))))

this is basically an implementation of the 'map' function: the
function implementing the state object is the monad wrapper M which
contains a type t, and it maps an incoming t->u to Mt->Mu.

summarized:

  * all functions are typed, and do not need to be aware of state.
  * state is completely abstract

maybe this can do all kinds of lifting automatically? i.e. scheme
functions -> stack functions etc.. how hard is it to change this?
where do control words fit in, since state is no longer passed
automatically? maybe control words are just another type that takes a
continuation argument?

  (define (stack lst)
    (lambda (fn)
      (cond ((stack/control? fn)
             (stack (call/cc (lambda (k) (fn k lst)))))
            ((stack/data? fn)
             (stack (fn lst))))))

well.. it's an interpreter for sure. can these conditionals be
eliminated? well yes, if at compile time the type can be determined..
so is that possible? can functions be typed statically? there's one
problem though: composition. what type does this have? state ->
state, where state is a fn. so there's a difference between
'primitives' and 'composites'.

  (lambda (x) (((x a) b) c))

EDIT: something's chicken and egg here tho: primitive types and
extensible types. looks like i slammed into the "expression problem",
since i want to extend both the type and the methods.

Entry: questions
Date: Tue Apr 15 19:37:42 EDT 2008

* extensible types: is the inverted approach of the previous post a
  good idea? (sketch below)
* reset/shift: there has to be a way to 'split' functions at points
  where other data is injected.
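a quick sketch of that inverted approach, with functions identifying
themselves through an explicit tag (all names made up):

  (define-struct typed-fn (type proc))

  (define (state data)
    (lambda (tfn)
      (case (typed-fn-type tfn)
        ((data)    (state ((typed-fn-proc tfn) data)))
        ((control) (state (call/cc (lambda (k) ((typed-fn-proc tfn) k data)))))
        (else      (error 'type-error)))))

still an interpreter: the dispatch only goes away if the types can be
determined at compile time, which is exactly the expression problem
again.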
Entry: shift/reset breakpoint draft
Date: Tue Apr 15 21:10:42 EDT 2008

  (define tag (make-continuation-prompt-tag 'tag))

  (define (make-split [more #t])
    (lambda (inner)
      (shift-at tag rest
        (values (and more rest) inner))))

  (define x add1)
  (define y (make-split))
  (define stop (make-split #f))

  (define (make-composition . fns)
    (apply compose (reverse fns)))

  (define (test fn input)
    (let next ((thunk (lambda () (reset-at tag (fn input)))))
      (let-values (((k v) (thunk)))
        (printf "v = ~s\n" v)
        (if k
            (next (lambda () (k v)))
            v))))

  box> (test (make-composition x x y x x x y x x x x stop) 0)
  v = 2
  v = 5
  v = 9
  9

EDIT: i get it.. nested shifts will always return the deepest
shift-free expression.

Entry: multiple compilation paths + memoization
Date: Wed Apr 16 09:31:56 EDT 2008

the reset/control is about implementing the forth compile loop
without side-effects; currently it uses a stack (push!). once that is
done, there should be a way to use extensions to compile some
sequences multiple times, and pick the best one. one of those is
for/next. however, with nested loops, care should be taken not to
make the algorithm quadratic. i'm not sure whether memoization is
necessary: explicitly using 2-path execution might be more
interesting. in the 'test' loop before, this amounts to running one
compilation multiple times, one with code wrapped around the loop,
and picking the best one.

Entry: breakpoints
Date: Wed Apr 16 09:54:05 EDT 2008

the reset/shift approach has the semantics of breakpoints. let's just
call it that, and make the abstraction complete. the players:

  * (make-breakpoint tag mix [more #t])
  * (with-breakpoint tag fn state0 value0) -> state,value
  * (mix state value) -> state,value

this seems to work well. my only worry is composition: what happens
if there is more than one tag involved? the way to look at this might
be from the outside: a tagged shift only makes sense if it's captured
by a tagged reset, so combinations of tags would be properly
dynamically nested. in that case, i see no problem.

Entry: compiler with breakpoints
Date: Wed Apr 16 14:19:48 EDT 2008

looks like it's working. now: is this really necessary? it would be
nice to understand if it can be done using parameters and
side-effects. the true test here is of course to try something with
continuations, see where it goes. maybe have a go at for .. next?

Entry: postpone-exit
Date: Fri Apr 18 11:05:58 EDT 2008

hmm.. something wrong with -broem -bla tests, they seem to
hang. problem with mexit. they were calling each other:

  (compositions (macro) macro-prim: (exit postpone-exit))
  (define-ns (scat) postpone-exit (ns (macro) exit))

renamed to 'compile-exit': goes better with 'compile'.

Entry: cps forth
Date: Fri Apr 18 11:31:43 EDT 2008

is there any meat in cps forth? or is this just a way of
interpreting? probably.. cps replaces "CALL" and "RETURN" with "GOTO
with parameters". it does need first class functions though.

Entry: parsing C
Date: Fri Apr 18 12:50:15 EDT 2008

http://eli.thegreenplace.net/2007/11/24/the-context-sensitivity-of-cs-grammar/

of things to do.. i need to have a look at piumarta's packrat
parser. that would be a very interesting addition to brood.

Entry: scat progress
Date: Fri Apr 18 14:07:35 EDT 2008

is going really well. i'm as good as done, except for the interactive
part which needs a bit of re-org. the name space management is a lot
better now. making things a bit more static didn't really hurt.

Entry: new name for purrr
Date: Fri Apr 18 14:24:01 EDT 2008

everybody keeps calling it picforth, but that's already used.
what about PRICFOTH? it already sounds obscene in dutch..

Entry: tethering
Date: Fri Apr 18 14:44:10 EDT 2008

  * compile the monitor
  * port interactive code

maybe it's possible to get rid of interpret/compile mode in console
interaction. maybe some 'auto tether' can be made: not running
certain optimizations so macros can be more easily simulated? that's
quite a challenge.. the problem at first hand seems to be the use of
platform-dependent constructs.. translating forth to pseudo code is
trivial, but some of the language is defined ONLY in terms of
assembly code.

the reason to have an interpret mode is to not have to touch the
flash rom. ram-based forths should really just compile and execute,
but for rom-based forths there is room for a separate interpret mode
language. it's also the right spot to introduce tethered commands
from the target's perspective.

Entry: compiling the monitor code
Date: Fri Apr 18 15:06:57 EDT 2008

things that are going to pop up:

  - handling the namespace
  - compilation, assembly + serialization of word struct.
  - org

Entry: for .. next
Date: Fri Apr 18 15:45:13 EDT 2008

maybe i need to test this first: compile 2 branches + save the best
in memoized form such that nested loops are computed inside out in
linear time.

  for body next
  dup for drop body save next drop

so, at the time 'for' executes, it needs to know which of the 2 is
shortest: (body) or (drop body save). let's call the above (for0 body
next) and (for1 body next1), and reserve (for) and (next) as the
macros that set up the evaluation.

this leads to the following control logic: if 'for' can capture
'body', it can try several strategies and pick the best one. can this
be done using composable continuations? it would be the first testing
point to see if they mix well. if so, it can probably be generalized
to a lot more control structures.

i worry about nesting:

  for_o for_i .. next_i next_o

would lead to something like:

  (lambda (state)
    (reset
     (next_o
      (reset
       (next_i
        (body
         (shift i     ;; for_i
           (shift o   ;; for_o
             state))))))))

maybe they need different prompt tags? looks like it: the inner shift
won't see the outer reset. let's give it a try. this needs to go
deeper: since the rest of the code explicitly needs to be called
inside a dynamic extent.. confused now.

  stack: next_o next_i body for_i for_o

it's probably best to bring shift/reset to scat.

  ;; Installs a reset and saves the prompt tag on the stack.
  (define-ns (macro) reset/tag
    (lambda (state)
      (let ((tag (make-continuation-prompt-tag 'reset)))
        (reset tag

Entry: composable continuations
Date: Fri Apr 18 17:16:26 EDT 2008

http://schemekeys.blogspot.com/2006/12/delimited-continuations-in-mzscheme.html

  ... four classes of delimited continuation operators ... are
  referred to as -F-, -F+, +F- and +F+. Dybvig et al. describe them
  as "a classification of control operators in terms of four variants
  of F that differ according to whether the continuation-capture
  operator (a) leaves behind the prompt on the stack after capturing
  the continuation and (b) includes the prompt at the base of the
  captured subcontinuation."

that makes things a lot easier to understand.

Entry: tools + check
Date: Fri Apr 18 20:55:41 EDT 2008

moved code used from zwizwa-plt back to the tools/ directory.
granularity is too fine. if i need it in other projects, maybe best
to copy/paste.. most is too specific. should also clean up sweb to
get rid of the stream stuff, and use something standard.
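back to for .. next: the selection logic itself is simple. compile
both variants and keep the shortest ('compile-macro' and
'code-length' are stand-ins for whatever the real interface becomes):

  (define (compile-for/next body)
    (let ((v0 (compile-macro body))                           ;; for0: plain body
          (v1 (compile-macro (append '(drop) body '(save))))) ;; for1 variant
      (if (<= (code-length v0) (code-length v1)) v0 v1)))

the hard part is capturing 'body' in the first place, and making sure
that compiling it twice has no side effects.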
Entry: serialization
Date: Fri Apr 18 21:44:26 EDT 2008

using scheme/serialize and define-serializable-struct should give
serializable object code if the target-value structs are
evaluated. maybe some annotation should be left instead of the value?

Entry: graphs and FP
Date: Sat Apr 19 10:28:53 EDT 2008

i never quite understood how to deal with graphs in FP. in EOPL
there's a point where circular reference is avoided by delaying
linkage, i think at the point where environments are implemented. at
the time it struck me as odd..

so, what is a graph? it's a map :: node -> (listof node). whether
children are ordered or not, listof can be setof. the problem with
graphs is that nodes refer to one another. let's first try to
represent a graph as a tree. see also the zipper:

http://www.st.cs.uni-sb.de/edu/seminare/2005/advanced-fp/docs/huet-zipper.pdf

using lazy data structures, self-reference is easy, and can be
represented by lambda terms, which eventually boil down to the
Y-combinator.

EDIT: the idea seems to be to represent the graph as a lazy structure
that can generate a 'local tree expansion' or something.. a bit like
manifolds and R^n patches.

EDIT: what i'm looking for is called circular programming.

http://www.csse.monash.edu.au/~lloyd/tildeFP/1989SPE/
http://www.haskell.org/sitewiki/images/1/14/TMR-Issue6.pdf

basically: you need lazy evaluation to build graph structures: a
pointer to a structure can be available while the structure itself is
as of yet unevaluated, and as such can reference itself.

Entry: spread the word
Date: Mon Apr 21 19:48:29 EDT 2008

http://www.forthfreak.net/index.cgi?WikiNode

Purrr is mentioned there. Maybe i should go around and edit some
wikis?

Entry: string -> language
Date: Mon Apr 21 21:26:18 EDT 2008

How to create forth code from a string? i forgot how the logic
works.. pic18/lang/reader.ss:

  (module reader scat/forth/module-reader
    scat/pic18/purrr18-module-language)

the generic forth reader uses #%plain-module-begin from the specified
module. to declare and instantiate a module body:

  (module test "pic18/purrr18-module-language.ss"
    : abc 1 2 3)
  (require 'test)
  (print-all-code)

  abc:
      [dup]
      [movlw 1]
      [dup]
      [movlw 2]
      [dup]
      [movlw 3]

now, from a string: open the reader module with a prefix:

  (require (prefix-in forth- "pic18/lang/reader.ss"))

The answer seems to be: forth code lives in a namespace, so in order
to load a file, create a new namespace.

EDIT: got it to work by using:

  (parameterize ((current-namespace ns))
    (eval form)
    (eval `(require scat/macro/code ',name)))

now i can instantiate multiple namespaces, each with their own
language. one problem though: the word structures are not accessible,
because the instances are different. anyways, this gives a nice
border to create the "badnop interface". now, this takes noticeable
time with all modules compiled:

  (ns-print-code (purrr18->namespace ": abc 1 2 3"))

which means something is running during instantiation of the
modules.. maybe it's the tests? maybe it's possible to keep a
namespace around with an instantiated compiler, and re-evaluate forth
code?

TODO: split instantiation of compiler, and compilation, to make way
for incremental compilation. looks like this is the next step: make
this easy to use.

Entry: repl during compilation
Date: Tue Apr 22 10:24:22 EDT 2008

in order to have the same debug 'compile' mode as in brood-4, some
access to the asm state is necessary. this needs to be implemented as
a breakpoint word, one which prints out the whole state in a
meaningful way.
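a sketch of such a word: dump the assembly buffer and pass the state
through unchanged (accessor names assumed; the code list is stored in
reverse):

  (define (print-asm state)
    (for-each (lambda (ins) (printf "~s\n" ins))
              (reverse (2stack-asm state)))
    state)

hooked up as a breakpoint, this would give the brood-4 style 'compile
mode' view of the state.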
Entry: flashforth
Date: Thu Apr 24 16:08:58 EDT 2008

going through the flashforth tutorial, and it seems mikael has been
busy, with some optimizations here and there. it's nice to have an
example like that. this does bring me to the optimization
vs. simplicity trade-off. it seems difficult to stay at either
extreme.

Entry: state extensions through shift/reset
Date: Fri Apr 25 10:22:36 EDT 2008

with this shift/reset thing working for augmenting the compile state
from one straight code list to an assoc list of such, i think it
might be better to do the same with the 'data' element in the core:
everything that uses the 2stack state should use this dynamic
extension mechanism.

to extend state:

  * define mixer words using 'make-breakpoint', referring to a prompt
    tag
  * wrap such code in 'with-breakpoints'

the place to start is the control words. this needs to be made
independent of the state rep anyhow.

CONTROL INTERFACE:

  - state
  - state-cons
  - state-stack

the only one that's really problematic is 'apply', because it
performs both function application on an isolated stack + state
merging. maybe this should ignore state effects? solve later: might
disappear when state threading is done using composable
continuations.

NOTE: the business of 'merging state' in the monads/JOIN operator
seems to be a generalization of assignment.

ok. so i got all explicit state references removed. time to wrap the
whole thing in 'with-breakpoints', and turn the low level rep of scat
functions to stack only.

so.. made the change. the remaining question is "where to wrap"?
should all macros have the SCAT prototype, or are they converted to a
2stack -> 2stack mapper at some point?

Entry: closed or open?
Date: Fri Apr 25 12:25:27 EDT 2008

the remaining question for the compiler is: how to represent
macros. are they open SCAT functions, or closed 2stack -> 2stack
functions? the problem with the latter is that it can't be composed
in the scat way. so, let's do this: if composition in SCAT is
necessary, prototypes remain stack -> stack + open dynamic refs;
otherwise the expressions are closed using 'scat->2stack'.

now, this has implications for the pattern transformers:
pattern-tx->macro will represent a transformer as an open SCAT
function. ok.. got that sorted out. see k/asm->scat in scat-2stack.ss

now it's time to get in trouble: make-split-update requires access to
the asm state, so it needs to operate on closed macros.

  open macro   = scat function (stack -> stack)
  closed macro = 2stack -> 2stack

so, the solution is to not operate on open macros, but on closed
ones: that way the compile mixer has access to the asm buffer. (i
need some terminology cleanup)

next problem: 'with-exit' requires access to the asm buffer. maybe
time for a break.. can't wrap my head around this: m-exit uses a
parameter, so can i create a mix function that references this
parameter? probably not, because the dynamic context of the mix
function is likely outside the scope of the with-mexit. maybe the
macro exit status should be hidden in the overall compile
state. alternatively: m-exit could close over the asm-buffer? i don't
really understand yet.. summarized: how do parameters and delimited
continuations interact?

the problem with 'with-exit' as it is now is that without turning it
into a 2stack mixer function, there's no way to access the state. so,
basically, i need a mixer that calls with-breakpoints. does that
work? it really should work. what is this? local closure. ok..
i'm going to try first to mix parameters with the breakpoint control
structure, then possibly make an abstraction for this. got too much
in my head again..

macro->postprocessor operates on closed macros. ok. i have the
impression i'm on the right track: this way of composing does need a
better abstracted api. instead of using 'values' in the state update
function, use a structure type: the type has 2 values.

Entry: better abstraction
Date: Fri Apr 25 19:19:23 EDT 2008

1. the low level sequencer api:
   * make-breakpoint
   * with-breakpoints

2. high level lifting api: consists of adaptor functions constructed
   from state wrap/unwrap functions.

i'm REALLY close to figuring out the relation to monads, but can't
wrap my head around it yet. instead of applying the continuation in
'with-breakpoints', that operation should be abstracted.

Entry: weird bug
Date: Fri Apr 25 22:27:35 EDT 2008

ok. almost working, except that i get 'stack' instead of '2stack' in
the mexit-update function, while 2stack-mexit gets a '2stack'. i hope
this is not a conceptual error with the order of prompt tags.. looks
like it is.. i should check if it's possible to mix the open/close
constructs. basically, what i'm doing is this: create a shift with a
reset in it:

  (shift (E (reset ...)))

the normal operation is nested shifts:

  (reset (E1 (shift (E2 (shift ...)))))

it really should not be a problem: shift can only see the inner
reset. the only way a reset can disappear in a reduction is when it
exits (returns a value).

  (reset val) => val
  (reset E[(shift k expr)]) =>
    (reset ((lambda (k) expr)
            (lambda (v) (reset E[v]))))   ;; where E has no reset

or equivalently:

  (reset val) => val
  (reset E[(shift k expr)]) =>
    (reset (define (k v) (reset E[v])) expr)

i can't juggle with it yet.. maybe make some test cases?

EDIT: i really don't see it.. put in some logging tags, and i don't
understand that order either.. looks like the 'nested closing'
doesn't work.

Entry: in words: mexit
Date: Fri Apr 25 22:28:15 EDT 2008

during the execution of a macro, the word ';' will compile a jump to
the end of the execution, except when it occurs at the end. the word
will be split if necessary. => 'except': remove the last jump.

the problem i'm facing is where to store the state necessary to
implement this. it was implemented using parameters + side effects,
but i want it in terms of shift/control to give a purely functional
implementation.

Entry: giving up?
Date: Sat Apr 26 09:15:22 EDT 2008

maybe time to see if this can be installed in the compile state. what
i need there is a stack to trace the dynamic context of macros. it
feels wrong though, to not have this coincide with the dynamic call
stack.. but maybe it is because i'm messing that up that i need it
stored explicitly somewhere else? the problem is: what if macros
don't exit? am i allowed to jump outside of the context? does it need
dynamic-wind? hmm.. if a macro doesn't exit, it probably also doesn't
produce code. on the other hand: having the mdyn stack available
might be interesting.

Entry: next try
Date: Mon Apr 28 14:08:47 EDT 2008

let's make a simpler example first: 2 level nesting of scat->2stack
and 2stack->scat. works just fine..
  (define-ns (macro) test-b
    (lambda (s)
      (printf "test-b ~a\n" s)
      s))

  (define-ns (macro) test-a
    (2stack->scat
     (match-lambda
      ((struct 2stack (asm ctrl))
       (printf "test-a ~a\n" asm)
       (let ((out ((scat->2stack macro/test-b)
                   (make-2stack asm ctrl))))
         out)))))

  box> ,scat (require "macro.ss") (macro:: 1 test-b 2 test-a 3)
  toplevel in /home/tom/scat/
  with-breakpoints:init # # #
  with-breakpoints:next # # #
  test-b #
  with-breakpoints:next # # #
  with-breakpoints:next # # #
  test-a ((qw 2) (qw 1))
  with-breakpoints:init # # #
  test-b #
  with-breakpoints:next # # #
  with-breakpoints:next # # #
  with-breakpoints:next # # #
  (qw 1) (qw 2) (qw 3)
  box>

so it's somewhere else..

EDIT: i ran into a segfault somewhere..
EDIT: that bug is fixed, maybe try to see if this code now works?
EDIT: ok.. my bug is still there, i'm going to let it go.

Entry: destructive assignments
Date: Mon Apr 28 15:38:55 EDT 2008

so.. why am i doing this? suppose one takes a partial continuation
which has state, does it hang on to this?

  (define ((integrate state) in)
    (set! state (+ state in))
    state)

  (define k
    (reset
     (let ((x (integrate 0)))
       (x (shift k k)))))

  box> (k 1)
  1
  box> (k 1)
  2
  box> (k 1)
  3
  box> (k 1)
  4

yes: multiple executions of the partial continuation keep their
state. why would they do otherwise? that's the dragon i'm fighting: i
need an abstraction where the partial continuations are pure
functions.

the important remark here, also related to finding a decent
abstraction instead of the breakpoint one: what i'm doing is
'splitting' a composition. maybe i should go back to that, instead of
the mixer/update abstraction? basically, something like:

  (a b c | d e f | h i j)  with-split ->  (abc def hij)

the funny thing is, trying to just

  split = (lambda (x) (shift k (cons k x)))

doesn't give a list, but a pair with value and continuation, which in
turn produces a pair with value and continuation.

Entry: simplified sequencer
Date: Mon Apr 28 18:16:47 EDT 2008

maybe an explicit sequencer isn't necessary..

  a b c >asm d e f

what should '>asm' do? obtain the continuation (d e f) and obtain the
state from somewhere. so what it should pass to the driver is a
procedure that takes a state and a continuation and produces a state.

  mix : (state, k : type -> state) -> state

this looks like 'bind' : (M a, a -> M b) -> M b

Entry: is it possible to implement mexit as a parameter?
Date: Mon Apr 28 21:58:10 EDT 2008

as long as the parameters are retrieved inside the proper dynamic
context (not inside a mixer!) there should be no problem mixing
_immutable_ parameters with the stitch mechanism. _mutable_
parameters are a problem when multiple executions are desired. for
mexit, this includes the exit label reference count (number of exit
points in a macro). i.e.: suppose some macro executes multiple times,
and on each execution it calls mexit: the reference counts will add
up. but, if this effect can be kept local, there is really no
problem: if macros are wrapped, the state is only visible during
execution of that macro. i can imagine cases where this is violated:

  : bla ... ( ... ; ... ) ... ;

the code between parens might be grabbed to be compiled in multiple
variants as part of an optimization: here ';' really shouldn't have
any side effects except for the result produced by the variant
used. so, in order to keep the design of the compiler simple, the
following requirements for macros are a good idea:

  * side-effect free wrt. code produced.
  * read-only parameterization allowed: not necessary to be
    referentially transparent.

as an exception, side effects ARE allowed if they do not influence
the compilation results (i.e. logging). the reference count tracking
of exit label references violates this.

Entry: practical mexit
Date: Mon Apr 28 22:14:36 EDT 2008

so, can i sidestep the issue and eliminate unnecessary splits at the
end of macros? probably not: that will mess up optimizations. local
exits for macros are an exception. is it possible to somehow
automatically ignore the last one, or to scan the code for
references? no: if a split occurs during the execution of a macro by
any other means, some jumps to exit might not be visible:

  a b c ; d e f ;

so, what are the tasks:

  * maintain an exit label (dynamic parameter)
  * figure out whether to split or not at the end of the macro

the problem is that ALWAYS splitting is bad, because it interferes
with optimization. checking if the label is reachable should not be
too difficult if the start of compilation can be marked somehow.

let's go back to what i'm really trying to do here: to _emulate_ a
return stack. why don't i just have such a thing instantiated
explicitly in the compiler state, so other return stack operations
can be emulated also?

Entry: struct macros
Date: Tue Apr 29 11:41:00 EDT 2008

it would be nice to abstract away the details of update pattern
matching. this requires some access to struct layout. basically,
generate this from struct names:

  (define-sr (compile-update (icurrent iwordlist iasm ictrl)
                             (ocurrent owordlist oasm octrl))
    (match-lambda*
     ((list (struct compile-state (icurrent iwordlist))
            (struct 2stack (iasm ictrl)))
      (values (make-compile-state ocurrent owordlist)
              (make-2stack oasm octrl)))))

actually, it's much more straightforward to do this with structure
type inheritance. this requires a deep change though.. first: switch
the order of fields in 2stack.

Entry: structure types and inheritance
Date: Tue Apr 29 14:35:02 EDT 2008

now i feel stupid: delimited control isn't necessary at all
here.. simple inheritance will do the trick just fine. one thing i
didn't get though: inheritance works nicely for reads, but what's
needed is to construct the right output type, so the update function
needs to be abstracted somewhere..

-> all derived structs now have an 'update' function in the first
field, and a direct constructor, as in:

  (define update-compilation-state
    (case-lambda
      ((state ctrl)
       (update-compilation-state state ctrl
                                 (2stack-asm-list state)))
      ((state ctrl asm)
       (update-compilation-state state ctrl asm
                                 (compilation-state-current state)
                                 (compilation-state-words state)))
      ((state ctrl asm current words)
       (driver-make-compilation-state ctrl asm current words))))

  (define (driver-make-compilation-state ctrl asm current words)
    (make-compilation-state update-compilation-state
                            ctrl asm current words))

ok.. done feeling stupid. works, and is a lot easier to
understand. this can be implemented more efficiently using lists:
less copying, more sharing. not important atm. this abstraction makes
it a bit easier to use:

  ;; state matcher which introduces 'update'
  (define-syntax (state-lambda stx)
    (syntax-case stx ()
      ((_ type (var ...) . expr)
       #`(lambda (state)
           (match state
             ((struct type (update var ...))
              (let ((#,(datum->syntax #'type 'update)
                     (lambda args (apply update state args))))
                . expr)))))))

maybe use syntax parameters instead of introducing a symbol?

Entry: summary
Date: Tue Apr 29 19:19:57 EDT 2008

the last couple of weeks were dark. what came out?
  * using inheritance + abstract factory gives a much simpler
    solution to hidden state threading than composable
    continuations. inheritance solves state read, while the abstract
    factory solves functional state update.

  * read-only parameters are ok for macros, but mutable parameters or
    mutable closed-over variables used for determining code output
    are not: if continuations are to be used to perform certain
    optimizations by trial and error, it's best to stick to pure
    functions. (i don't care so much about referential transparency
    as i care about macro side effects.)

  * i've got a little more intuitive understanding of monads, and am
    now of the opinion that they are too general for what i'm trying
    to do. also: the absence of polymorphism makes them hard to use
    in scheme. and, what i'm trying here might fit better under the
    arrow abstraction, but i'm unfamiliar with that.

Entry: shift / reset and for .. next ?
Date: Tue Apr 29 20:36:56 EDT 2008

what is the reduction rule in rpn?

  shift ... reset => ( ... ) reset

the problem here is twofold:

  - what prompt tag to use?
  - how to pass the continuation?

it's probably better to use 'call-with-continuation-prompt'.

Entry: brood is
Date: Wed Apr 30 18:34:27 EDT 2008

entry://20080329-093950

functionality: RPN SCAT PAT+MACRO FORTH PURRR

types:

  SCAT:  stack
  MACRO: + asm stack
  FORTH: + dictionary / current word / macro rs

implemented using functional data structures.

Entry: next action
Date: Thu May 1 01:49:04 EDT 2008

time is running out, what needs to be done next?

  - fix the for .. next optimization as a pilot for other partial
    continuation based optimizations. (-> delimited control + lazy
    code)
  - port the monitor code.
  - port the interaction code.
  - port catkit / sheepsint.
  - find an easy bootstrap for catkit from within brood.
  - serial port interface (PLANET PACKAGE)
  - simulator interface

Entry: simulator and partial evaluation
Date: Wed May 7 12:48:26 CEST 2008

  * interaction should really be partial evaluation of machine
    instructions.
  * the assembler should be specified as a simulator (functionality)
    plus state dependencies (for data flow analysis).

this should be implemented as a separate macro language. (TODO: look
at the plane notes again)

Entry: wire protocol
Date: Wed May 7 13:02:14 CEST 2008

it's important to have a look at what exactly goes over the wire. the
minimalistic monitor that's there now is nice for minimal complexity,
but something that can be inspected directly has its advantages. i'm
thinking about the prefix notation from pltix.

Entry: namespace stuff
Date: Wed May 7 13:26:58 CEST 2008

it might speed up compilation a bit to separate phase +1 code from
the core routines: right now, the whole scat module gets instantiated
during compilation. this is back to how it was before moving all
phase level 0 and 1 code into one module for convenience. i've tagged
the modules that instantiate stuff (they print their name) so it is
clear that compilation does not spuriously instantiate any code that
only makes sense at run time, and that run-time instantiation happens
only once per namespace.

so, that works pretty well: an instantiated compiler + code
dictionary is represented by a namespace. this is a "synchronous
late-bound object": it takes messages that can alter its state, and
returns values. what is missing is a way to serialize the state as
object code that can be imported again without compilation.
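the message-passing view can be made literal: a namespace plus an
eval wrapper (sketch):

  (define (ns-send ns form)
    (parameterize ((current-namespace ns))
      (eval form)))

  ;; e.g. (ns-send compiler-ns '(require scat/macro/code 'app))

serializing the object then means serializing whatever state those
messages have built up inside the namespace.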
Entry: machine model / partial evaluation and state management
Date: Wed May 7 17:52:08 CEST 2008

the idea is to be able to evaluate (simulate) code off-target, as
long as it only depends on MACHINE state.

  [movlw 123]

can be translated into read,modify,write with the update happening
off target.

  [movwf LATA]

is a border case: it can be split into read,modify,write but it also
affects external physical state. what is required is a clear
definition of what simulation means: is it completely isolated from
the 'real' world, or does it just simulate the computation part of
the target? does [movwf LATA] alter the output of pins, or does it
modify some internal model? it would be best to make this behaviour
pluggable: the amount of 'realness' should be configurable. the modes
are:

                   |  STATE      COMPUTATION
  -----------------|------------------------
  (1) stand-alone  |  real       real
  (2) tethered     |  real       emulated
  (3) simulator    |  emulated   emulated
  (4) test         |  emulated   real

and, really, you need only the first 3. does the 4th one make sense
during application development? actually not: the CPU is a functional
unit, and can be exactly emulated (in principle; it might not always
be necessary: partial emulation can be good enough). this mode DOES
make sense during emulator testing though. (emulating STATE
completely might be impossible since it depends on the external
world)

the place to introduce emulated state is in the partial evaluator of
machine code. so.. what you want is to be able to modify the meaning
of code depending on the level of simulation. i.e. [movwf LATA] might
mean:

  (1) execute the instruction on the target
  (2) simulate the instruction as a passive (memory only) machine
      state update on the host + write the state to the target
  (3) simulate the state update as an active machine state update; do
      not involve the target. (i.e. writing to the latch might set
      the state on input ports during next instructions.)
  (4) compare the state update simulated on the host and executed on
      the target

probably i should generalize brood as a framework for pluggable
simulation. this is more general than the previous emphasis on
tethered development, and potentially a _LOT_ more powerful. it's
probably best to focus on memory mapped i/o and synchronous
execution: get it to work for the PIC18 first, then generalize the
architecture. each functional unit can be implemented as a thread.

what you want basically is fine-grained control over what exactly is
executed on the target, and what is not. there is an order relation
hidden here: it's impossible to simulate a state update when
executing code on the target. this means there's a directed graph of
'realness' that can be used as a guide to building a code/data
structure to implement this.

given the program source, it can be compiled for:

  (1) running completely on the target
  (2) running partly on the host + target state update. the latter
      could be plain code execution.
  (3) complete simulation

some remarks here:

  * time-critical software needs to run on-target, so it is important
    to design programs such that they can be tested by virtualizing
    the stimulus (slowing down time): make everything synchronous;
    that way time is an integer and can be abstracted. simulate
    non-synchronicity on top of this.

  * the application domain is massively parallel, so the basic unit
    of simulation is a task. PLT scheme has all the necessary tools
    to build this kind of thing. it would be interesting to equip
    purrr18 with some libraries to implement state machines and tasks
    in a way that works well with the simulator.
  * program compilation = partial evaluation of simulators. i.e.
    [movwf LATA] can be compiled to machine code and executed on the
    machine only if LATA is real. an application will compile to 2
    things:

      1. supporting machine code to run on target (i.e. the monitor)
      2. a host side entry point, which might sequence simulation

  * not so much related, but can 'incremental dev' be used here? only
    recompile the parts of target support code that are necessary?
    this is an optimization problem which only needs proper
    dependency management (memoization) and can probably be solved
    separately.

Entry: simulator problem definition: generalized interaction mode
Date: Wed May 7 18:43:54 CEST 2008

given:

  1. (assembler) source code
  2. cpu (functional) and memory/port (state) model

generate:

  1. binary support code to upload, possibly incrementally
  2. a toplevel driver function that starts the simulation

work with assembler source code to keep the machine model simple:
source code simulation is never going to be accurate enough to be
generalized: you want the nitty gritty. this also enables the
decoupling of the compiler and the simulator: external compilers can
be used.

so, should the memory model be destructive or not? this boils down to
the question: what is more important, speed or the ability to have
non-straight-line execution? what about going for the cleanest
solution, and having EACH memory location be a port, with memory
being a simple loopback port? then the only state related to the
memory model would be the configuration "patch", which is static for
a certain simulation. all other state could be task-local.

Entry: base machine: real or virtual
Date: Wed May 7 20:20:36 CEST 2008

there's no need to do work twice, so what machine should be used for
the old interaction mode? the 3-instruction forth? the problem is
primitives: currently, the primitives are the machine
instructions. so accurate simulation means simulation of those. on
the other hand, it might be more flexible to allow a higher level
simulator to test algorithms. the thing that gets in the way is
premature optimization: part of the problem to solve is manual
machine mapping: currently, purrr18 is more PIC18 than Forth. maybe
something intermediate can be constructed: a VM that implements a
subset of instructions without the optimizer?

so, problem: find an intermediate language that is easier to simulate
than the target (too complex / target specific) or the language (too
language specific, underspecified).

EDIT: this is a serious problem. 2 stances:

  * forget about the target simulator: concentrate on the programming
    language, and make it clean. that way simulation is easy because
    the primitives can be simple.
  * forget about the language simulator: what you want is a tiny
    layer on top of the real machine, and possibly use multiple
    languages and binary code.

if i have to choose, the 2nd one is really the only practical
solution. the only disadvantage is that it's target specific.. is
there some trick to be able to have both?

Entry: metaprogramming in the real world
Date: Thu May 8 12:35:56 CEST 2008

talking to axel yesterday, and he was saying that he's doing nothing
but writing scripts that write scripts. what does that really
signify? why is metaprogramming so effective? there's a selling point
hidden here.. i'd say: it's so incredibly difficult to build an
interface to extremely parametric code, that it's better to just turn
it into a proper language with its own composition mechanism, such
that it is complete.
the metaprogramming then eliminates the tedious step of making
compositions that can't be composed in the base language. or, it
removes the necessity to extend the base language for one specific
problem.

Entry: simulator generator + specification
Date: Thu May 8 13:16:10 CEST 2008

another thing that popped up in the discussion: how fast is the
simulator? can it be specialized? maybe this is an essential point
also: concentrate on creating a simulator generator. this means the
simulator needs a specification language, so it can be compiled to
fast specialized code later.

Entry: loop bodies and delimited control
Date: Thu May 8 14:34:32 CEST 2008

maybe today is not the day.. tired and stupid, i'm not worth
much. but i run into very strange results when trying
shift/reset. why don't these work?

  (define-word shift stack (shift k (cons k stack)))
  (define-word reset stack (reset stack))

reset actually doesn't install a prompt around a computation, because
'stack' will be evaluated before it is reached. it expands to

  (call-with-continuation-prompt (lambda () stack))

what i need is a macro that does this. that is unfortunate, because
all code that uses it will need to be macros too. is that really
true? yes, looks like it is: otherwise the code gets evaluated before
it's passed to reset.

looks like i need to play with evaluation order a bit: instead of
using strict evaluation, it might be easier to use lazy. what about:
every scat function takes a delayed computation? it shouldn't be too
difficult to change this in one place only: wrapping each strict
function so it becomes a lazy one.

  (define (lazy-apply fn thunk)
    (lambda () (fn (thunk))))

or

  (delay (fn (force thunk)))

Entry: strict vs lazy
Date: Thu May 8 16:57:49 CEST 2008

so, i run into a point where evaluation order does matter: reset
needs to delay its argument.. is it worth it to modify the entire
representation to a lazy one? the thing is: modifying evaluation
order requires macros.. but macros are viral: any composition of a
macro is again a macro. on the other hand, why isn't EVERY function a
macro, with the bulk using strict application? do i really need to
access functions directly, or is (scat: bla) enough? or.. why can't
composition automatically be macrofied, or, why can't i have
unbalanced parentheses?

it looks like there is really no way around this: in order to capture
dynamic compositions, code needs to be delayed so a prompt can be
inserted BEFORE evaluation.

EDIT: so.. lazy eval. does that give problems with sequenced code?
no: since a scat program is already sequenced (a composition of unary
functions) there is no problem here. when this is done with the
datatypes themselves, it should be fairly straightforward: states are
thunks.

Entry: concatenative family (Cat language)
Date: Fri May 9 18:12:11 CEST 2008

it's time to dive into Joy, Factor and Cat again to see where things
are different. especially Cat, since Christopher and i have been
doing similar things for 2 years now with little interaction, and
with a slightly different focus. in SCAT:

  * ties to scheme are important. my goal is not to write a
    stand-alone language. hence the choice of PLT Scheme, which is
    pretty big..
  * SCAT is dynamically typed.
  * SCAT is not linear.
  * MetaCat: i use term rewriting, but in a different part: i see no
    need for SCAT metaprogramming other than introducing
    non-concatenative language elements to support Forth. otoh,
    rewriting is _very_ important in the PIC18 code generator.
    however, the code that is rewritten is symbolic assembly code,
    not SCAT code.
  * SCAT is only used to support MACRO. it's probably not general
    enough as a full programming language. however, things are easily
    snarfed from Scheme. (with Dave's move to PLT Scheme for Fluxus,
    there is an interesting road to travel there though..)

TODO: relation to Factor and Joy.

Entry: peephole optimizer
Date: Fri May 9 18:25:40 CEST 2008

maybe it's better to separate the machine specific optimizer from the
code generation step? that way the peephole optimizer can be reused
with different languages, and probably be tested using a machine
model.

Entry: assembler expression language
Date: Fri May 9 18:35:17 CEST 2008

what currently is 'target:' might better be written in s-expr syntax,
so that it's easily converted to concatenative syntax (the other way
is more difficult). while right now it's kind of cute to have this
concatenative language map to a concatenative assembler expression
language, later, when external assemblers need to be supported, this
might become a nuisance.

also, and probably more important: a distinction needs to be made
about data types and partial evaluation:

  * use scheme's infinite precision types
  * use only target types (i.e. accurately SIMULATE the computation)

Entry: documentation + presentation
Date: Sun May 11 11:42:37 CEST 2008

introduction:

Brood is a metaprogramming environment for deeply embedded
programming, starting with the idea: "How to modernize the tethered
Forth approach?". Forth is appropriate for programming small
computers, but too low-level for a host-side metaprogramming
framework. Scheme is ideal for this. The second objective is to
generalize this to special-purpose problem description languages.

what is metaprogramming?

  - use language A to generate code in language B
  - A = B is possible, but more likely A > B (more high level)
  - partial evaluation of the usual language tower, to limit the
    complexity of on-target support code. (give up some generality)
  - overall idea: use high level constructs where possible, but
    specialize to low-level where necessary.

why are macros important?

  - aren't functions enough?
  - partial evaluation: separate compile and run time. (get extra
    cake at compile time without giving up the possibility to use
    highly specific code.)

why these weird languages?

  - scheme = clean lisp. lisp's strength:
    * metacircular interpretation (language defined in itself)
    * leads to easy metaprogramming (lisp macros)
    * scheme: based on untyped lambda calculus = functional
      programming with imperative extensions (environment model)

  - forth
    * due to the concatenative composition model, the language is
      quite powerful in itself, despite its simplicity. (even without
      dynamic memory management or garbage collection; related to
      innate 'linearity')
    * base language = static, suited for real-time applications
    * efficient: thin machine model for simple sequential chips (less
      efficient for pipelined number crunching processors: a dataflow
      language would be better suited there).
    * simple metaprogramming
    * it has a purely functional + purely concatenative subset

different brood layers:

  - PLT scheme module system + module languages
  - SCAT: purely functional intermediate language implemented as
    Scheme macros.
  - MACRO: purely functional metalanguage on top of SCAT. a MACRO
    program generates (symbolic assembly) code. it includes PAT which
    combines code generation and peephole optimization.
- FORTH: syntax on top of SCAT or MACRO to provide the non-concatenative part of Forth (parsing words like ':').
- ASM:
  * target specific assembler generator
  * target address expression language (= SCAT)
  * standard n-pass branch instruction code relaxation
- LIVE: live target interaction / simulation framework.

why from scratch?
- to gain deeper understanding
- to find a natural modularity without tool-specific idiosyncrasies

can it use external tools?
- yes, but the design is optimized for internal tools. (i.e. the compiler -> assembler interface uses structured data instead of text)

can it use different languages?
- interfacing on the object level: no problem (not implemented yet though)
- since custom languages are the core business, i see little advantage in supporting standard languages (like "C") directly.
- however, purrr is a core component.

why not OO?
- FP is natural: a compiler = a function. it maps source code to object code.
- stateless code generation makes different code generation paths easy to implement (output feedback without environment setup)
- it's easier to do OO in FP than vice versa.

different implementation language?
- Forth/C/C++: been there. too low-level, while the performance payoff is not so important.
- Perl: i tried before, but i prefer structured data to strings
- Java: too clumsy.
- Haskell: i'm tempted, but probably too little wiggle space to evolve a design. the final implementation however might work well in Haskell. Scheme's approach is conceptually closer to metaprogramming, also wrt. ML.
- other dynamic OO languages (Python, Ruby, Smalltalk, ...): i'm not particularly convinced they are better than an FP oriented language.

Entry: inheritance for state threading
Date: Sun May 11 14:14:23 CEST 2008

What is the practical reason for using threaded state instead of real state? to simplify composition of code generators, primarily to allow multiple applications without causing side-effects. It makes the life of the optimization implementer easier.

This is probably a spot where multiple inheritance might be appropriate: if there's no clear hierarchy to state extensions, forcing one might not be a good idea.

Entry: target expression language (TEL)
Date: Sun May 11 14:46:50 CEST 2008

1. what is it?

The target expression language is the vehicle for expressions that depend on target labels (static memory addresses), and are passed to the assembler to be evaluated after static target memory allocation. In the integrated compiler + assembler architecture in Brood, these expressions are computations closed over (initially unresolved) target word structures. For external assemblers, they need to be translated into strings that represent target assembler expressions. Because of the need to support external tools, this language benefits from an intermediate form. (currently, and only for illustration, this is symbolic SCAT code, but it will probably be replaced by s-expression code later).

2. where do these expressions come from?

The expressions are generated by the peephole optimizing code generator, mostly as partially evaluated target code. I.e. the Purrr code

    ' main 1 +

compiles to the expression that adds 1 to the address of the "main" procedure word. At compile time it can be determined that the value can be obtained at assembly/link time, so literal instructions can be generated. However, at compile time only the computation can be stored, due to possible dependency on (as of yet undefined) target label values.
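a sketch of that idea (invented names, not the actual target-value api): a target expression is just a closure over a word structure whose address gets patched in by the assembler, and forcing it before allocation is an error.

    ;; a word struct: name + address, address unresolved (#f) until the
    ;; assembler allocates memory.
    (define (make-word name) (vector name #f))
    (define (word-address w)
      (or (vector-ref w 1)
          (error "unresolved label:" (vector-ref w 0))))
    (define (word-set-address! w a) (vector-set! w 1 a))

    ;; ' main 1 +  would compile to something like this delayed computation:
    (define main (make-word 'main))
    (define expr (lambda () (+ (word-address main) 1)))

    ;; after allocation the assembler can evaluate it:
    (word-set-address! main #x0200)
    (expr)   ;; => 513 = #x0201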
Entry: purrr compile time expressions
Date: Sun May 11 15:01:22 CEST 2008

One of the cool core features of Purrr is the ability to inline compile time computation without extra annotation. (In Forth, traditionally the words '[' and ']L' are used.) This allows for a very flexible macro composition mechanism. However, the computations are performed in infinite precision, and do not give the same results as the same code would give without these compile-time computations. I.e. in Purrr18:

    1000 30 /

makes no sense on the target due to the 8-bit limitation (and possibly the non-availability of the / operator), but makes sense in Purrr18 because the end result is truncated only at the end. What is compiled is

    [DUP] [MOVLW (1000 30 /)]

because the target is only 8-bit, this sort of computation is very useful, and really should be the default (over annotated meta-computations).

Now the question is: i'd like to build a simulator that can execute the generated PIC18 code. is it possible to generate intermediate code that's easier to simulate than PIC18 code, but produces the same results? This looks like a very hairy problem: PURRR18 is a 'dirty low level' language where constructs are defined as-is, and you should be aware of bit-depth limitations and flag effects etc. The stay-out-of-trouble part of me says i should stick to a genuine PIC18 simulator. It can simulate code generated by other means. Following that approach probably also makes it easier to plug in external simulators.

Entry: external tool interface
Date: Sun May 11 15:16:15 CEST 2008

To increase the commercial usefulness of Brood, external tool interfaces are absolutely essential. As a test case, these things should be present:

* gpasm + its meta language
* gpsim

Instead of writing a simulator, it would be a better exercise to skip the interpreted part and create a simulator generator: this would allow testing of the C-code generation facility, for 2 reasons:

* an external, possibly C-based interface will be necessary
* simulators need to be as FAST as possible

Entry: gpasm / mpasm expression syntax
Date: Mon May 12 11:33:28 CEST 2008

(see MPASM user guide, chapter 8 for expression syntax)
http://gputils.sourceforge.net/33014g.pdf

something i didn't know: this language is apparently stateful. there are accumulating expressions like '+='. i'm not sure whether state accumulation is really necessary though: most of what it would be useful for can probably be captured by the compiler, unless it depends on target code addresses. usage will tell..

Entry: electronics engineers should learn scheme
Date: Tue May 13 00:13:36 CEST 2008

I don't think I know anyone who has written code in some language without at some point realizing that the language is not powerful enough to express a certain pattern, and then moving on to writing some script that actually generates code for that particular language from a more high-level description (or simply a set of parameters). The idea is: it's difficult to create a language that will allow one to describe all possible applications. However, it's not so incredibly difficult to create a SIMPLE language that's aimed at being easily EXTENSIBLE using a MACRO language. So, if you know it's going to happen at some point, why not embrace it from the start and call yourself a language designer instead of an application programmer. This happens especially in domains where hand-assembly is still important: deeply embedded software.
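going back to the compile time expression entry above: a toy version of that kind of literal folding, just to illustrate the mechanism (names and representation are made up; the real rules live in the PAT pattern matcher). code is a list of pseudo assembly instructions, most recent first; a binary operator folds when the top two instructions are literals (qw), and falls back to emitting a real opcode otherwise. all arithmetic happens in scheme's infinite precision; truncation to 8 bit happens only when a value is consumed by a real instruction like MOVLW.

    (define (fold-binop op opcode code)
      (if (and (pair? code) (pair? (cdr code))
               (eq? 'qw (car (car code)))        ; top of the code stack
               (eq? 'qw (car (cadr code))))      ; the one below it
          (cons (list 'qw (op (cadr (cadr code))   ; fold: replace both
                              (cadr (car code))))  ; literals by the result
                (cddr code))
          (cons (list opcode) code)))             ; no match: real instruction

    ;; 1000 30 /  -->  a single literal, computed at compile time:
    (fold-binop quotient 'div '((qw 30) (qw 1000)))   ;; => ((qw 33))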
Entry: documentation
Date: Tue May 13 12:56:32 CEST 2008

Time to start documenting. Let's make it a literate program with proper online cross-referencing + a way to reference the ramblings. Let's make this into a tool to structure code for refactoring purposes. Starting with scat.ss

2 kinds of comments:

* paragraph:

    ;; blabla

* column:

    (+ 1 2)  ;; add it

comments are attached to an expression. maybe it's best to avoid column comments entirely?

http://groups.google.com/group/plt-scheme/browse_thread/thread/1e2cae24ec84b70a/b59b55e3990da368?lnk=gst&q=scribble#b59b55e3990da368

That thread has an interesting comment on source code documentation: you need BOTH reference documentation (per function) and a general overview / the meaning of a bunch of functions together.

Entry: load vs. require
Date: Tue May 13 19:06:25 CEST 2008

the next problem is load vs. require.. should i keep load? in the current .f files a lot is done using late binding: require the includer to specify some words.. this goes against the bottom-up module approach. how to solve it?

finding a decent solution for this late binding is quite important: code generation is heavily parameterized: it's inconvenient to have to specify.. i'm already using this trick in the core, so why not in the libs.. the only problem is: it can generate run-time errors.

Entry: org (non-declarative code)
Date: Tue May 13 20:33:58 CEST 2008

boot.f contains non-declarative code that calls 'org'. so.. how to fix org? the result should be a word struct that has an assigned address. the problem however is that the assembler forces addresses. so this needs a fix in both the assembler and the compiler. the problem with org is that a word can start at a certain location, but be split after that. the previous mechanism might not be so bad actually.. switching back to pointers = a collection of shallow binding stacks. ok.

now how to solve this in forth? there needs to be room for assembler directives OUTSIDE of code definitions. nope.. this violates some entry point stuff.. the org shouldn't be an assembler directive, but some command attached to the list of words passed to the assembler. ok, implemented in the compiler: per word, there's one instruction that can be passed to the assembler about where to assemble the code. how to specify this in the language? it's really like ':', but different.. it would benefit from some kind of parameterization.. this is a tough one: it jeopardizes the clean per-word forth defs..

Entry: org
Date: Thu May 15 12:02:03 CEST 2008

so..

    : bla 1 2 3 ;

but what about

    : 123 1 2 3 ;

where '123' is the address? the problem is, code outside of a definition is no longer allowed. the only way parameters can be passed to the assembler is through instructions inside a target-word instance. maybe the same route should be followed as for variables? add some pseudo asm.. let's go to the root problem:

* allow creation of words/macros from within macros
* allow setting the address of these words

fuck i'm doing language design again.. actually, it's not so bad.. the trick is to make 'org' operate on the current label. the code to compile a jump at a certain location then goes:

    macro
    : install-vec
        ` VEC label   \ create new label
        #x200 >org    \ set current label's org
        do-vec exit   \ compile its code
        org> drop     \ restore org
        ;

looks like it's working.. the idea: to allow creation of WORDS within MACROS. note that to create macros within macros a different mechanism is necessary: introduction of names needs to be done on the Scheme macro level, so words created as such are not accessible to the bulk of the code by name.

it's still not optimal.. it gets in the way of straight-line code..
maybe i should add this concept: code that comes from the compiler needs to be assembled in a straight line, but the compiler can ask to dump some code somewhere else too. this should make anonymous code possible too.. argh

maybe this is good enough: the only place where it will get in the way is the re-arranging of code locations by the assembler (or an intermediate step). i.e. the connect-words! function in target-compile.ss won't work. the real problem is: whenever an org-pop happens, compilation can continue at the word where the corresponding org-push happened. this might be a clue about how to implement it. the compiler doesn't need to provide a list of words, but a list of lists of words, where the inner lists have fall-through, and the outer ones are independent.

Entry: org and fallthrough
Date: Thu May 15 14:15:51 CEST 2008

Another example where a collision of two or more seemingly trivial but annoying problems that resist an elegant solution in the current paradigm leads to a better paradigm.

Because of fallthrough, which is a low-level property of assembly code i don't like to give up in the PURRR language, the order of words is important. However, words that ORG at a different address are independent of those that came before, and words that EXIT are independent of those that come after. This can easily be reflected by adding a 2nd level of nesting in the representation of a target word collection:

    (deque-of (stack-of target-word?))

The operations are:

* EXIT/ORG: create a new current-fallthrough list (with a possible associated address)
* QUEUE: move the current fallthrough list to the end

What data structure is this?

* access top element :: (stack-of target-word?)
* add new element to the top
* move top element to bottom

So, it's a combination of a stack and a set, implemented using an asymmetric deque. Stuff popped off the stack is recorded in the set (and loses its order). Actually, it might be implemented as 2 stacks directly. (see the sketch below)

Now, there's a deeper problem: this accumulation needs to span across words, so the point where words are already packaged needs to be modified to allow accumulation.
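the 2-stack version, as a sketch (hypothetical names, mutable for brevity): the current fallthrough chain is one stack, finished chains pile up on a second one. 'split' corresponds to EXIT/ORG, closing off the current chain.

    (define current '())     ; the current fallthrough chain, top = last word
    (define chains  '())     ; finished chains; their mutual order is gone

    (define (add-word! w)    ; compile a word into the current chain
      (set! current (cons w current)))

    (define (split!)         ; EXIT or ORG: close off the current chain
      (unless (null? current)
        (set! chains (cons (reverse current) chains))
        (set! current '())))

    (define (all-chains)     ; hand everything to the assembler
      (split!)
      (reverse chains))

    (add-word! 'default-bla)
    (add-word! 'bla-it)      ; falls through from default-bla
    (split!)
    (add-word! 'main)
    (all-chains)             ;; => ((default-bla bla-it) (main))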
Entry: labels
Date: Thu May 15 18:44:27 CEST 2008

Every name that occurs in a source file corresponds to a "define" in the scheme expansion, and is associated to a macro: a function that generates code for the named construct. For target words, this creates a reference to a word structure. The problem with target words is that they have fallthrough, or multiple entry points.

Entry: the Forth parser
Date: Fri May 16 10:32:27 CEST 2008

Turns out that collecting Forth into separate definitions isn't a good idea, because there is no 1-1 correspondence between names that start with ':' and eventual word structures:

* there are multiple entry / exit points: words can fall through and are thus connected
* it's possible to generate words on the fly: each _label_ should be captured into a corresponding scheme definition for the macro that generates it, but code in between can be accumulated.

Maybe it's easier to just accumulate everything into one giant function? Simply using 'compose' on the current structure is probably enough. So.. instead of

    (define (wrap-macro/postponed-word name loc macro)
      (let ((w (new-target-word #:name name
                                #:realm 'code
                                #:code macro
                                #:srcloc loc)))
        (values (macro-prim: ',w compile)
                (lambda () (compile-word w)))))

we can have

    (define (wrap-macro/postponed-word name loc macro)
      (let ((w (new-target-word #:name name
                                #:realm 'code
                                #:srcloc loc)))
        (values (macro-prim: ',w compile)
                (compose macro (make-target-split w)))))

where everything is dumped into a single macro that generates the code, to be executed later by 'compile-word'. this seems to work on the first try.. let's clean it up a bit.

Entry: labels and multiple entry points
Date: Fri May 16 15:48:58 CEST 2008

code with multiple entry points like this

    : default-bla 123
    : bla-it 1 + ;

is useful to have, but difficult to handle. the problems happen when default-bla is accessed in isolation, i.e. when moving code around. this needs a proper data structure, or at least assembler support..

* split the code into fallthrough chunks = a list of words where only the last one is terminated.
* add assembler support for a 'fallthrough' opcode.

the point is: make the data structure such that optimization and code migration become easier to do. fallthrough code should be treated as a single entity, even if it contains multiple entry points. operations on word w:

(A) does w fall into some w' ?
(B) does some w' fall into w ?

this is an extra level of linking between words. (A) is essential knowledge, but (B) doesn't matter much for the word w itself.. so, each target word has a possible fallthrough word. what about this: pass the assembler a list of words, where each word is the head of a fallthrough word chain. the extra compiler state this requires is a set of independent chains.

Entry: compiler state update
Date: Fri May 16 16:18:48 CEST 2008

state is growing larger, so pattern matching isn't the best way of updating it. maybe best to use local mutation: copy the state on entry + perform imperative updates. hmm.. that looks even uglier due to long names and explicit get/set. this just needs to be factored.

ok, with some factoring (struct dict) it's all a bit more readable: compilation generates a list of lists of (word code) inside a dict struct, which will be collected into a list of head words that have the code fallthrough structure recorded. next: the 'org' operations + fallthrough disconnect on jumps.

Entry: redefining words + compiler build log.
Date: Sat May 17 11:39:19 CEST 2008

I need a proper explanation about why it's good or bad to redefine words. This is about installing 'hooks'. The problem with hooks is that they can get difficult to understand. Let's add some warning to this redefine process.

Entry: implementing 'exit' chunk splitting
Date: Sat May 17 13:23:52 CEST 2008

basically, this needs to:

* split with a new label = dead code
* collect the current chunk

ok.. this, together with factoring out the target-post code and dead code elimination, which is now as good as free, seems to work fine. next: org

small fix: instead of eliminating dead code, it's better not to generate it in the first place: the compiler will drop code that is not associated to a label.

Entry: conditional assembly
Date: Sat May 17 17:15:31 CEST 2008

Something that's not implemented yet: elimination of " if", which reduces to an elimination of an "or-jump". This requires some thought, but shouldn't be too difficult to do.

Entry: Jump chaining
Date: Sat May 17 17:18:16 CEST 2008

This needs to be performed at assembly time due to delayed computations.
It's straightforward though: examine the opcode at the start of the target word, and check if it's an unconditional jump. This optimization might introduce new dead code.. Is it possible to move it somewhere else?
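a sketch of the chasing step (representation invented for illustration): a word whose code starts with an unconditional jump is transparent, so a reference to it can be redirected to the jump's target, transitively.

    ;; a word = (name . instructions); a jump instruction holds the word
    ;; struct it jumps to, since addresses are not resolved yet.
    (define (chase word)
      (let ((code (cdr word)))
        (if (and (pair? code)
                 (eq? 'jmp (car (car code))))
            (chase (cadr (car code)))   ; follow (jmp <word>)
            word)))

    (define c (cons 'c '((movlw 1) (return))))
    (define b (cons 'b (list (list 'jmp c))))
    (define a (cons 'a (list (list 'jmp b))))
    (car (chase a))   ;; => c : both a and b reduce to a jump to c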
Entry: implementing org using new data structures
Date: Sat May 17 17:25:38 CEST 2008

org is a property that can be attached to a word chain. what i'd like is a way to "inline" code that creates different code chains, without affecting any optimizations. this would also be handy for anonymous words.

Entry: org again
Date: Sun May 18 12:23:03 CEST 2008

It's not so simple, because there is no way to guarantee there's only a single chunk going to be compiled: the problem is 'org-pop', which needs to restore the current/code/chunk state. The right way to solve this is to dump the whole structure on the control stack, and restore it on org-pop.

OK: push-chain and pop-chain work. it's now possible to create new word chains while compiling another one.

It's funny how Forth syntax's inherent lack of nested structures makes you appreciate the simplicity of s-expressions. However, it's easily solved by introducing balanced tokens. So, what about this: if a label's name is anything other than a symbol, it will be evaluated and used as the code location.

Some more changes: forth/forth-tx.ss will now save the prelude code under a #f label. scat/ns-tx.ss is changed so #f labels do not have an associated define, but DO evaluate their expressions for side effect (which will define a target #f word). Changed wrap-macro/postponed-word to not create a target word struct if there's no word name.

Entry: delimited control again
Date: Sun May 18 17:11:49 CEST 2008

Now.. hold that thought. Maybe it is better to introduce a proper nesting structure to get at macro code.. observe:

* 'reset' needs to be a macro
* 'shift' can be plain code

what about making reset ']' and putting the logic of shift in the balancing word? i.e.

    for[ 1 2 ]

Maybe it's best to return to shift/reset from the semantics point of view, and not from the particular scat->scheme implementation i'm using.

Entry: Labels and code
Date: Sun May 18 18:48:18 CEST 2008

so.. maybe it's time to forget about splitting the target code into words? is it a false abstraction? not really.. but the current fallthrough mechanism does look a bit clumsy. the problem wrap-macro/postponed-word solves is the creation of wrapper macros, which is quite essential as it allows ALL names to be handled by the PLT module + lexical scoping.. whatever the representation of the code that generates the assembler is, is moot. currently it's this:

    (#f . prelude-macro)
    (word0 . macro0)
    (word1 . macro1)
    ...

the macros are then wrapped with a split (label) and concatenated again. a different, possibly simpler implementation would be to collect all names separately, and define a single big macro that generates the module's code. the problem there is that inline macros need to be handled differently... so let's stick to the current implementation.

* macro definitions are clearly delimited: one name for one macro, no multiple entry points.
* forth definitions have multiple entry points + there's an unnamed prelude at the beginning of the file. (names merely interleave one big macro that generates the code body)

Entry: org again
Date: Sun May 18 19:57:01 CEST 2008

ok.. org-push and org-pop now work: they will compile a single chain of words.. however it's not what it should be!

* it's still impossible to SET org permanently.
* multiple chains will 'org-pop' by themselves..

why is this so difficult to get right? probably because i'm trying to keep the effect local, while org is really a global effect on the state of the assembler. so... can org-pop somehow guarantee there's only a single chunk compiled? no.. we'll get there eventually.. just need to find the right abstraction.

another problem: compiling a jump table will look like a bunch of unreachable code.. (a jump table is a bit of a hack) the jump table is easily solved by using a different word to separate entries, which could enable some extra checking..

so, to look at this from the bright side: requiring a restricted bondage-style structure for the compiler exposes a lot of corner cases that exploit side-effects of low-level constructs. such side effects need to be eliminated: the core needs semantic simplicity, where the semantics is close to machine semantics for data operations, but closer to abstract semantics for control structures.

Entry: jump tables
Date: Sun May 18 20:40:00 CEST 2008

Up till now these have been an abuse of ';' which breaks with the new dead code eliminator. So, how to fix that?

    : dispatch route
        read ; write ; help ; reboot ;

the 3 last jumps will be eliminated. so a different macro is necessary. something like

    : dispatch route
        read , write , help , reboot ;

with a bit of abuse of notation: comma is polymorphic and operates both on CW and on QW. on CW it compiles a jump without an exit.

Entry: quoted macros (the 'address' word)
Date: Sun May 18 22:40:40 CEST 2008

There's a problem with

    ' abc

This really should produce [qw #], where # is the SCAT word representing the macro that postpones the compilation of the word, such that

    (' abc compile) == (abc)

Note: choose 'run' instead of 'compile'. But there's one problem. What is this?

    ' abc ,

On QW values, comma will always produce a [dw], so this should compile the address of a function, and fail if it's a generic macro. Ok, i solved it before, it's "address". So, i wonder.. Can't this be done automatically, as part of a postprocessing step before things are handed to the assembler? or as part of the target evaluation code?

What's the idea here: postpone the conversion from macro -> target-value as long as possible, because the former is more general, but cannot survive the assembler. the problem is the inclusion of such values in assembler expressions: in that case the expression evaluator needs to be aware of them. target-rep can't know about macros (to simplify the design), so either/or:

* catch all macro instances before they go into a (target: ...) expression or end up as a plain macro in the assembly code.
* use an explicit 'address' after using the tick operator.
* give target-rep a means to evaluate macros.

the middle one might be best.. that way representations of words (quoted macros) are different from addresses. essentially, they are..
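the distinction, sketched (constructors made up for illustration, not the target-rep api): a quoted word travels as a macro, which is the general form; 'address' coerces it to a target value, the only form the assembler's expression evaluator accepts, and it fails for macros that have no instantiated word behind them.

    (define (quoted-macro word fn) (list 'macro word fn)) ; word = #f for pure macros
    (define (target-value word)    (list 'tv word))       ; survives the assembler

    (define (address qm)
      (let ((word (cadr qm)))
        (if word
            (target-value word)    ; instantiated: take the word's label
            (error "address: not an instantiated word"))))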
Entry: i need this to be done
Date: Sun May 18 23:27:34 CEST 2008

I'm a bit fed up with mucking about in the low level architecture. Apparently, a sane combination of high level constructs (i.e. code graphs) and low level features such as fallthrough makes things complicated, and leads to some tough choices. Anyway, it does look like I'm at some kind of end point with this. It's still quite elegant and powerful. One point needs some more exercise: the construction of anonymous macros. This probably needs a move to a lazy architecture for macro representation. Maybe instead of concentrating on for .. next and dynamic macro creation, i should really concentrate on static anonymous macro defs first..

The words [ and ] are not used yet. Let's turn them into static anonymous macro creators.

EDIT: see, it's getting big before it's documented. this facility is already there, but using the s-expression syntax:

    box> (macro:: 1 2 (3 4 5) run 6)
    (qw 1)
    (qw 2)
    (qw 3)
    (qw 4)
    (qw 5)
    (qw 6)

ok, that was straightforward: (forth/forth-tx.ss)

    (define (open-paren-tx code exp)
      (let-values (((code+ rep) ((rpn-represent) (stx-cdr code))))
        ((rpn-next) code+ ((rpn-immediate) rep exp))))

    (define (close-paren-tx code exp)
      (values (stx-cdr code) exp))

NOTE: this opens the road for a lot of functions expressed as hofs, i.e. ifte.

Entry: code annotations
Date: Mon May 19 11:54:18 CEST 2008

it's really not working well to derive a symbolic representation from the code: sometimes there just isn't any, due to the effect of macro transformation. it's probably best to just store the source location, since it's only for documentation purposes.

Entry: purrr
Date: Mon May 19 12:08:46 CEST 2008

so what's special about purrr? if i'm to explain what this is about, the purrr language itself is rather central to the idea.

-> partial evaluation
-> extensive use of macros
-> functional metalanguage

Entry: instantiate left-over macros
Date: Mon May 19 12:38:22 CEST 2008

Maybe it's possible to leave quoted macros in the code and instantiate them? this would be a really powerful extension. Can be combined with turning local exit points back into return/jump ops.

Entry: assembler directives
Date: Mon May 19 13:29:15 CEST 2008

The brood assembler has relatively few assembler directives. This is intentional: the assembler performs ONLY linking and relaxation (and in the future possibly related operations that optimize these processes, such as code reordering.) However, in the PURRR language, some control over code location is desired. How to satisfy

- control over address location
- chained code to facilitate re-ordering

Yes, it's 'org' again.. Maybe it's best to let 'org-push' save the chain list too; that way 'org-pop' can ensure there is only one chain, which is what we want.. (maybe this should just push everything, making the internal compiler state accessible to some macros?)

Trying: pop-chain will save the recorded chains as only one chain. This works: it ensures at least compilation at the correct address. So, this fixes the chain bug for org-push/pop, but still doesn't provide an 'org'. Maybe this needs to be specified somewhere else?

Entry: next
Date: Mon May 19 21:09:55 CEST 2008

look at the plane notes, and entry://20080501-014904

two deep problems remaining:

- how to solve 'org' (or: is assembler state access allowed?)
- dynamically decompose macros (loop optimizations, lazy code)

the rest should be straightforward. i don't have the energy to tackle either of them atm.. can they be ignored, and postponed until after porting of the interaction code?

Entry: strict/lazy and macros
Date: Tue May 20 02:31:56 CEST 2008

so.. in a lazy language, fewer macros are necessary because evaluation order isn't much of an issue. in a strict language, the existence of 'if' and 'lambda' as special forms infects certain constructs (they have to be special forms also). now, where would a lazy language need macros? there is template Haskell, so i guess there is some need for metaprogramming..
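to illustrate the first point: with explicit thunks standing in for lazy arguments, 'if' can be an ordinary function instead of a special form, because nothing is evaluated before it is selected. (plain scheme sketch.)

    (define (my-if c then-thunk else-thunk)
      (if c (then-thunk) (else-thunk)))

    (my-if (= 1 1)
           (lambda () 'yes)
           (lambda () (error "never evaluated")))   ;; => yes

in a lazy language every argument behaves like those thunks, so this kind of control construct needs no macro; what's left for metaprogramming is manipulating names and definitions, which is where template Haskell comes in.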
Entry: use of monads in dsl implementation
Date: Tue May 20 11:24:27 CEST 2008

http://www.cs.yale.edu/homes/hudak-paul/hudak-dir/ACM-WS/position.html

Entry: constants and the 'parameter' word
Date: Tue May 20 13:39:41 CEST 2008

Constants are a sort of typed macro: they represent literal values, but are not necessarily completely defined in the core compiler: (re)definition of constants is necessary to obtain a specialized compiler that can generate code. The problem i run into is where to generate the error for undefined constants. Currently, target-value evaluation uses target-value-abort to signal that a value is not available. However, this should really be reserved for labels only.

-> fixed: partial evaluation of target-values NEVER calls the actual evaluation: this means make-constant can pass code straight to the assembler (currently: just the constant name)

fixed another problem: wrap-macro/mexit requires the state to be a compilation-state, while macro->code from 2stack.ss works only on a 2stack. added a mechanism to temporarily wrap the 2stack state in a compilation-state object.

so.. this can be moved to forth. simply add a word 'parameter' which will create a constant that's later to be redefined. a parameter is something more than a mere macro: it has a guarantee to produce only a single value. (i'm thinking about things like 'fosc' and 'baud')

    parameter baud
    parameter fosc

i can't call it 'constant' because of confusion with the way that word works in standard forth. so:

A parameter is a stub macro that produces a single literal value. These serve to parameterize low-level code without resorting to more explicit parameterization. (i.e. 'fosc' might influence a lot of timing related constants.) A parameter that is actually used to generate code needs to be (re)defined as a macro that produces a single value. Otherwise an undefined-parameter exception will be thrown at assembly time. Parameters are thus a somewhat controlled violation of the overall bottom-up structure of Purrr code.

small prob: the 2stack / compile-state code gave problems again. what i'm doing now is to avoid that problem and require parameters to be defined in forth code with mexit support.

EDIT: another problem with parameters: redefining them with another parameter is not a good idea: basically, there's a sequential element here (load).
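a sketch of the behaviour (invented names, no connection to the actual target-value code): a parameter starts out as a stub; referencing it compiles a delayed lookup, and forcing that lookup before the parameter is (re)defined is exactly the undefined-parameter error at assembly time.

    (define (make-parameter* name)
      (let ((value #f) (defined #f))
        (lambda (msg . args)
          (case msg
            ((define!) (set! value (car args)) (set! defined #t))
            ((ref)     (if defined
                           value
                           (error "undefined parameter:" name)))))))

    (define fosc (make-parameter* 'fosc))

    ;; code that uses fosc stores a delayed computation:
    (define baud-div (lambda () (quotient (fosc 'ref) 9600)))

    ;; the project file later defines the parameter; only now may the
    ;; assembler force the expression:
    (fosc 'define! 40000000)
    (baud-div)   ;; => 4166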
Entry: bored
Date: Tue May 20 19:53:21 CEST 2008

i'm getting thoroughly bored with this.. need to find some new tricks or get something going, because i'm losing motivation.

Entry: hooks / late binding and kernel modularity
Date: Wed May 21 15:06:03 CEST 2008

this problem is more serious than i thought. it's not only used for constants, but for generic macros. maybe add something like a 'macro-hook' which is like a parameter, but doesn't guarantee anything about code generation.

i need to think a bit deeper about linking and modularity.. the pure bottom-up approach won't work well. maybe the 'unit' approach is really better? is it possible to import a module called 'link.f' that implements the cyclic name resolution? am i going to be anal about names? i'm already going quite far with early binding.. consistency counts.. the 2 uses are:

* compiler extension by redefining some core macros (i.e. 'dup')
* code parameterization (both constants and generic macros)

wait a minute.. if code generation can be postponed until all the macros have been loaded, then simply adding stub macros that are redefined later would work just fine.

maybe best to take the inelegance and get the damn monitor to run.. in essence, it's a problem with the .f source code. any abstraction necessary to make that code more modular can be added later.

EDIT: it really gets in the way.. 'route' gives problems. but that can be imported? this problem is solvable, but requires some thought..

Entry: another layer?
Date: Thu May 22 11:24:05 CEST 2008

I was thinking about putting target-compile.ss in forth/ because it's mostly about extending the macro/ stuff with features necessary for code instantiation and target label management. Or, it should be placed in compile/

Can macro-lang.ss be made independent of target-compile.ss? Yes, when macro-lang.ss splits off a target-label specific part. This code should move to compiler/ (which is badnop/ now -> get rid of that name)

Entry: units
Date: Thu May 22 11:38:11 CEST 2008

I need separate compilation with clearly defined interfaces for some components. One would be the logger: since it cuts through everything, changing its code requires recompilation of the whole codebase. A nice excuse to try to understand units, and then to move on to using this for .f files too.

Entry: another bug in redefine
Date: Thu May 22 14:22:19 CEST 2008

whenever a word is created, it creates a replacement macro. this macro should have redefine enabled also. (i enable mutation and things start going wrong..)

Entry: no more juice
Date: Thu May 22 21:13:30 CEST 2008

Looks like i need to take some time off of the project, do some other things. Looking at what i did in the last 2 weeks:

* delimited continuations for loop body optimization: strict vs. lazy
* trying to fix org; it's still not fixed (language design issue: i have no way to annotate this in the current structured representation)
* struggling with specialization (redefine + super) and plugin behaviour.
* trying to write documentation for the project
* thinking about simulators, and simulator generators

The good things that happened: cleaned up the compiler data structures + separated the postprocessing optimizations. Those look nice now. The rest was a random walk. However, the EXTEND and LINK problems are quite important, and as far as i can see the only real hurdle.

Entry: more juice
Date: Fri May 23 09:30:49 CEST 2008

got a good night's sleep + some ideas about writing documentation today:

- write more docs: create a reference doc extractor
- separate some code and change names to make the module hierarchy more clear
- write something about forth and closures

Entry: introduction documentation
Date: Fri May 23 09:31:17 CEST 2008

- It is about language:
  * Lisp (more specifically PLT Scheme)
  * Forth (the Purrr dialect)

- It is about Meta-language: Macros
  * S-expressions (Lisp) and concatenative syntax (Forth) are easy to process. It's possible to make an all syrup Squishee.
  * Forth, viewed as a functional language, has an arbitrary evaluation order. This presents an opportunity for generating static, specialized low-level code from high-level templates by employing implicit compile-time evaluation. The Purrr experiment is about making Forth more declarative.

- Design is accessible.
  * Unit of composition = Scheme module.
  * Forth source files are Scheme modules.
  * Design is layered: Scat, Macro, Compiler, Forth syntax, ...

- Goal
  * Small business and enthusiasts first
  * Test in an industry setting (needs a specific problem to solve)

Now, this should be elaborated in a couple of chapters, with lots of examples.
Entry: Community bootstrap
Date: Fri May 23 11:23:18 CEST 2008

A plan to attract developers. What is necessary?

- it should work relatively flawlessly for 1 target
- it should be well-documented
- extensions should have a clear API

The first two are mostly perspiration. The real challenge is to standardize some APIs. I am hesitant though about standardizing too much: the aim of the project remains the construction of a tool for 'from scratch' development.

- Purrr language extensions

  These are at the library level, and can be developed separately from the core project. Purrr standardization is mostly ad-hoc, but due to PLT's module system, glue layers are fairly straightforward to maintain, and standardization can be made the responsibility of the system designer.

- Processor extensions:

  Target-specific extensions can be separated entirely from the Staapl core. The PIC18 architecture specification is an example of this. It boils down to:

  * Creating an assembler. This can be a layer on top of an external assembler, or a specification in the assembler generator language. (see pic18/asm.ss)
  * Creating a set of macros (compiler extension). Map the Purrr language to the machine structure (data stack, return stack) and implement the primitives. (see pic18/macro.ss)

  The interfaces for these operations are now fairly standard.

- Internal compiler extensions:

  Changes to Scat, Macro or the compiler would best be incorporated as core changes. Such changes can be temporarily forked, and later merged into the main distribution. However, I expect this part of the program to be relatively stable.

- Simulator extensions:

  As an addon to the interactive part of Staapl, a simulator (generator) would be nice. The interface for this still needs to be developed. This could be written as a processor extension.

- Extra languages:

  A future goal is to construct a static dataflow language to supplement Forth code for building DSP applications. How to structure this isn't clear yet.

- Application spin-off communities

  I'm thinking about Sheep, CATkit and KRIkit: the applications that have been used to battle-test the Purrr language. This would involve some kind of Purrr library standard. I'm thinking about writing something a bit closer to ANS Forth, but there is insufficient project pull to do it atm.

Entry: redefining names
Date: Fri May 23 12:37:48 CEST 2008

Let's re-iterate the design choices:

CASE A: compiler specialization
CASE B: parameterized kernel code

for CASE A the choices are: explicit renaming vs. implicit redefinition. for CASE B: linking (using the mechanism from CASE A) vs. explicit instantiation.

EDIT: it might be best not to lose too much time trying to fix this now, and 'emulate' the classic load-style redefine behaviour until a better abstraction mechanism pops up.

Entry: namespaces and parameterization
Date: Fri May 23 13:11:54 CEST 2008

Instead of trying to use late binding, it might be more interesting to use explicit parameterization where possible. This is one of the problems with Brood 4 which caused a lot of pain. Let's fix that first. Taking the interpreter code as an example, what i'd like to do is to make instantiation explicit:

    : interpreter ' io make-interpreter compile ;

There's an interesting problem here with the choice of words. Maybe here 'compile' should be used to indicate that there's an instantiation going on. It's not always clear whether something is dynamic or static.
There's a difference between:

    : interpreter int-body ;

and

    ` interpreter create-interpreter

The latter is preferred. It is more general. But it is not possible in the current implementation. Labels can be defined, but they are only for annotation, and visible AFTER compilation. Can this be simplified? What is desired is something like

    ` bla

where at that moment the current context has a macro called bla, which is not yet associated to any code. So how to associate a macro to code? Maybe instead of mapping Forth code directly to 'define' it should be mapped to a 2-step procedure of undefined macro creation + single assignment.

I'm touching some core of reflection here.. Something that doesn't really work well together with the way the Purrr language is laid out. There's no problem with doing this in s-expression syntax, but expressing this in Forth syntax is not so easy.. Let's see.. Quick hacking it:

    \ create the macros
    declare read
    declare write

    \ define them
    ' read ' write make-io!

But this requires make-io! to have side-effects + read and write not to appear in any code before make-io! is executed. That is definitely not desirable: it would be back to the old imperative style. So, what about using this as the primary syntax:

    [ 2 3 4 ] define bla

and writing ':' as a substitution macro?

Continuing the random walk. It is possible to do something like:

    : read io 2nd ;
    : write io 1st ;

But that's exactly the thing that's not flexible enough when a lot of macros need to be created. Is it really a good idea to try to solve this in Forth syntax, since it obviously has some shortcomings.. It looks like the only way to solve this is to use 'parsing word' preprocessors. Maybe the goal here is to figure out a way to create such substitution macros in .f files? It's certainly possible to create scheme files which have the 3 levels:

- parsing words (bound to scheme level 1)
- macros (scheme level 0)
- forth words: consequence of hitting 'compile'

The problem really is how to map the scheme s-expression flexibility to Forth code. Can't be done? This is a dead end.. Solve it with substitutions and be done with it?

    ((io read write)
     (:macro read io drop compile
      :macro write swap drop compile))

Entry: Explicit instantiation and macro assignment.
Date: Fri May 23 14:04:35 CEST 2008

EDIT: comes from the blog, now degraded to a rambling because it's confusing given i'm embracing 3-level code now (word, macro, parser)

There is one pattern that is cumbersome to express at this moment:

* Write logic as macros without specifying concrete names.
* Provide names during instantiation.

If this is about creating a single function or macro, it's straightforward:

    : interpreter ' receive ' transmit compile-interpreter ;

Because the .f syntax at this moment assumes that all macros occur in isolated (macro . code) pairs it is not possible to define a collection of functions/macros. What is desired is something like:

    ` read ` write create-io

which requires some form of mutation, even if it's single assignment. Alternatively, parsing word preprocessors could be used:

    create-io read write

to expand to

    :macro read make-io-read ;
    :macro write make-io-write ;

This requires a special purpose syntax, i.e. something like

    parser create-io read write ==
      :macro read make-io-read ;
      :macro write make-io-write ;
    end

The problem with this is that parser-generating parsers are hard to express. Is it possible to create a single abstraction that does not limit the number of orders?
It looks like programmable parsers are necessary anyway when macros can't deal with identifiers. The good thing about using the previous approach is that it maps very well to Scheme's define and define-syntax.

TODO: place this in the framework of compilation phases and figure out a better syntax for defining parser macros.

Entry: substitutions
Date: Fri May 23 16:11:41 CEST 2008

Why are parsing words necessary? Because modified semantics are allowed for symbol definition (':').

What about a Forth syntax for substitution macros?

    parser variable name == create name 1 allot ;

This would solve virtually all problems, since it gives access to named substitutions, but it adds a level of inelegance to the language. Well, sort of: they are already there, so why not make them available.. These would make sense in interactive commands too. (What about ditching all this crap and creating a flexible alternative s-expression based syntax, i think..)

So, considering that I don't want to lose the good things about Forth syntax (prefix syntax to eliminate parentheses) I guess I have to learn to live with the bad things about Forth syntax (the necessity for an extra composition mechanism due to prefix syntax). It's not all bad, just some trade-offs..

So, prefix substitutions. They are not like parsing words, but can be used to emulate them. They are the 'last resort' composition mechanism, used to capture prefix patterns. If I'm going to embrace them as one of the features of Forth syntax, it might be wise to make ':' not a primitive. More generally, it might be wise to have this as a layer on top of a simpler, single assignment preprocessor.

Entry: Single assignment base language
Date: Fri May 23 19:23:32 CEST 2008

So, I'm going full circle. Back to postfix notation with a (set! name value) component? Is it actually possible to do so? Prefix primitives:

    [ ]        macro composition
    '          macro variable dereference (+ creation?)
    macro!     single assignment definition
    forth!
    variable!

This would require a separate state machine (that resembles a forth) to run during the preprocessing step. This looks like a nice solution, but i can't help thinking: why is this DIFFERENT from the Scat state machine? Can this be written in Scat? And don't i introduce composition problems again? A scat machine with state (stack, in, out) should be able to parse this without problems. Now I'm confused.. caffeine poisoning..

Entry: More standard forth syntax
Date: Sat May 24 11:49:11 CEST 2008

Maybe it's a good idea to have an interpret mode anyway, at least during the parsing phase. This would make it a lot easier to deal with standard forth syntax like 'constant'. Looking at what the parser does now, this is already the case. Only there's just a single mode: compile. Adding an interpret mode, the language that's active could be plain scat.

Ha, i'm using the [ and ] words for code quotation. Looks like that's exactly the opposite of how they would be used in Forth.

Questions:

* is it possible to solve this without interpret mode?
* using substitutions only, does the hygienic system provide enough freedom?

By the latter i mean

    (io read write) ==
    (:macro io 1 2 3 make-io-object ;
     :macro read ` read io ;
     :macro write ` write io ;)

The 'io' name is not visible in code since it's introduced by the macro. What can be seen here is that pattern replacement macros need a special terminator.

    parser io read write ==
      m: io 1 2 3 make-io-object ;
      m: read ` read io ;
      m: write ` write io ;

    parser bla ... == ...
    end

Going that route, why wouldn't you write everything as a parser instead of macros? The problem is: parsers allow you to deal with NAMES, while macros allow you to deal with CODE. The fact that macros are associated to names is a practical matter, but they are not allowed to modify or create names. Parser words are necessary because prefix syntax is the ONLY way to modify the semantics of names, other than to reference the macro that is bound to them in the current environment.

This might be a bit confusing. There are 3 things that can be bound to names:

* parser extensions -> non-concatenative source code preprocessor
* compiler macros -> concatenative (compositional) code generation
* forth words -> macro instantiation

It's probably possible to design a language that doesn't need the first step, but technically even scheme has this: the reader, which has special features. Having to do it this way is a FUNDAMENTAL limitation/feature of Forth. This is DIFFERENT from Forth because it is pure input word stream substitution. Forth's parsing words operate on the input stream directly. This is one of the reflective properties of Forth that is eliminated in Purrr.

EDIT: But, adding this extra level, does it stop there? How to create parser macro creating parser macros? The problem is easily solved with s-expressions, but hard to do with the current implementation: what is necessary is to add semantics AFTER collecting tree-structured code. This is what s-expressions can do: the parse step follows the read step. Is it possible to bring this to Forth?

TODO: write something intelligent about this problem. it's a deep one, related to syntax and reflection: the difficulty of unrolling Forth.
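"pure input word stream substitution" can be made concrete with a toy preprocessor (sketch; names and token representation invented): a substitution rule consumes its arguments from the token stream and splices its expansion back in, without ever touching the reader.

    ;; variable <name>  ==>  create <name> 1 allot
    (define (preprocess tokens)
      (cond
        ((null? tokens) '())
        ((eq? 'variable (car tokens))
         (append (list 'create (cadr tokens) 1 'allot)
                 (preprocess (cddr tokens))))
        (else (cons (car tokens)
                    (preprocess (cdr tokens))))))

    (preprocess '(variable x variable y))
    ;; => (create x 1 allot create y 1 allot)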
Entry: parser cleanup
Date: Tue May 27 11:06:38 CEST 2008

EDIT: this is a mess.. i've reached the conclusion that the 3 layers are necessary, so it might be best to think a bit about the best way to represent the highest layer..

gut feeling says the current problems with the parser (the non-concatenative part of Forth syntax) are rooted in the way it's implemented.

phase 1: what is ':' ? this, in addition to creating a new label, will terminate the previous definition. following the colorForth model, there is no interpret mode, so definitions run up to the next word. those 2 behaviours need to be split up. (simplifying: why not have macros with multiple entry points? this would not be too hard to solve really.)

so, let's have a look at the single-assignment language for Purrr parsing. the problem to solve is: How to implement a Forth-style syntax (without interpret mode) on top of the purely concatenative Macro syntax used by Purrr. To re-iterate why this is a problem:

* I'm convinced purely functional and purely compositional macros are a good idea: they behave well as a data structure for program representation and code generation (compiler structure).

* Forth is only a thin layer on top of this, mostly to solve the definition of names in a source file using a familiar syntax. Forth syntax in itself is a good user interface. However, the design of the Forth language is firmly rooted in reflectivity, employing an image-based incrementally extensible word dictionary, something i'm trying to unroll into a language tower.

* There is already a base syntax using s-expressions which translates directly to Scheme.

The problem now is to find a way to translate (linear) forth into tree-structured Macro expressions. One solution is to

* create an intermediate language syntax that has all the features needed in Forth, but uses s-expressions and single assignment.
* map this language to an explicit list of definitions
* find a mapper from linear forth -> s-expressions

This should work for the 3 levels of code:

- macros
- words (instantiated macros)
- parser (Forth source transform patterns)

First attempt in forth/single.ss

Maybe the real focus should be to embed s-expressions in the Forth language. This gives parse-time lists. Care should be taken not to create yet another level that's hard to metaprogram. This needs to sink in a bit.

Entry: parser idea
Date: Tue May 27 14:21:56 CEST 2008

To further distill the actual idea. There are 3 levels of concatenative code, each with its own interpreter. The real question is: can't they be unified into one?

Macros

  These are Scat words that create target (assembly) code. They are the simplest kind, being built directly on top of Scheme (module) name spaces and an RPN parser. This parser reads from an input stream, and accumulates an output Scheme syntax.

  IN = stream of identifiers
  OUT = accumulation of nested Scheme expressions

Target words

  Obtained as instantiated Macros. In macro/target-compile.ss

  IN = stream of macros
  OUT = accumulation of target code words

Forth preprocessor

  This associates names to Macro or Target code.

  IN = stream of Forth syntax
  OUT = list of (name . value) pairs.

Are they all really necessary? Macros are the core programming construct and serve to describe programs. Target words determine the instantiation level of code. The Forth preprocessor allows naming of macros and target words.

The target words layer is a consequence of manual code instantiation. This is a feature: i want to have this level of control. In principle this could be eliminated when the language is made a bit more high-level (i.e. elimination of return stack access). The preprocessor layer is necessary because of limited reflection: if macros cannot create named code, constructs that abstract this cannot be macros, so they need to be preprocessors.

Whether all 3 layers can be implemented in more of the same way is an interesting but not so urgent question. Whether they have to remain in existence is easy to answer: yes. They are a consequence of two important design choices:

* explicit instantiation (macro vs. forth)
* non-reflectivity to simplify code processing and namespace handling

Entry: macro instantiation is memoization
Date: Tue May 27 14:51:34 CEST 2008

What about looking at the code instantiation problem as a form of memoization? A macro that is inlined twice can be replaced by a single instantiation and an indirection. Doing this automatically could lead to a simpler (beginner) language that does not need a programmer-specified distinction between Macro and Forth modes.

Entry: next
Date: Tue May 27 16:14:53 CEST 2008

- fix the parser = a base language + an extension syntax.
- add a temporary 'load' function to supplement 'require' for old-style Forth: it might be better to keep the mechanism in there, instead of forcing the use of bottom-up modules.

Entry: embedding s-expressions in Forth code
Date: Tue May 27 16:27:19 CEST 2008

In order for the parser code to work properly, new parsers should be able to work from within a file. This creates a problem because not all forms can be identified before the parsers are created. This problem can be bypassed by allowing genuine s-expression syntax for the parsers. That would also allow parsers to be nested.
    { parsers
      { { variable name }  { create name 1 allot } }
      { { 2variable name } { create name 2 allot } } }

Why not make this go a bit further and allow generic scheme code? This would solve basically all other syntax problems with forth files. Maybe this is also the right way to map Forth syntax to scheme..

A quick hack works fine, except for one small problem: it doesn't get evaluated at the right time. Maybe moving everything to toplevel by using this as primitive syntax can help?

Entry: rethinking forth parsing
Date: Wed May 28 12:43:27 CEST 2008

The problem to solve is to identify parser commands before code is parsed. This is only possible when all s-expressions are collected before linear source code is parsed. This means that it's not possible to define parser words that expand to s-expressions. There's a chicken and egg problem there that deserves some attention. The problem is that identifiers can't really be identified as in scheme. -> look into how this expansion stuff works.

This works fine in scheme:

    #lang scheme/base
    (define broem (bla))
    (define-syntax make-bla
      (syntax-rules ()
        ((_ bla) (define-syntax bla
                   (syntax-rules ()
                     ((_) (+ 1 2)))))))
    (make-bla bla)

So, just expand everything to this; it should work fine. Maybe the RPN compiler should be simplified a bit, using syntax parameters instead of compile time parameters.. Makes sense?

Entry: today
Date: Fri May 30 00:42:34 CEST 2008

was a day of writing 6 paragraphs of introduction. i think i sort of got it going: the reasons for brood:

* lisp is cool, especially for metaprogramming
* create a small language to metaprogram from within lisp

the problem is that to really leverage this, some knowledge about languages and implementations is necessary. i'm still not sure about how to sell this to electronics engineers that don't see the point of lisp..

about the code: the parser patterns seem to be important as a final 'highest level of metaprogramming'. i'm still not convinced it is a good idea, but it seems to be a consequence of wanting to keep old forth syntax instead of s-expressions. i need to spend a day writing and thinking about this.

Entry: the parser again
Date: Fri May 30 10:11:18 CEST 2008

the problem is that parser macros defined in an .f file should have immediate effect. can this be local-expanded or something? i ran into this deeper problem: parsing isn't really factored when being inside a single rpn macro: big reliance on dynamic variables. can this be replaced by something else? probably not so easy.. let's re-iterate over the expansion algorithm in PLT Scheme.

done.. basically, it can fish out 'define-syntax' before expanding the value expressions in 'define'. so, maybe it should be a true preprocessing step like it was before: one that converts the forth language to 'compositions' forms. the thing is, inside a 'macro:' you really don't want any forth prefix syntax. this should be s-expression syntax only. the only exception is locals. -> probably best to separate the changes into those that introduce new global names, and those that don't.

EDIT: not necessary. both data and code quotations are expressible in s-expressions and are part of the RPN syntax. local variables can be added by using definitions like ((name param ...) body ...) instead of (name body ...) (locals are really special: they make sense only when parameterizing bigger code chunks, with lots of parameter reuse..)

hmm... some things are not really very well disentangled yet.. maybe i should accept that 1. macro is clean == enough, and 2.
the forth on top of that has some hacks to support standard forth syntax on a system that doesn't have forth's kind of reflection.. damn, this is complicated..

one remark about my method though: forth-tx is too complicated. i find it difficult to make modifications because it is a hack on top of rpn body creation. at this moment i don't really see how to change that without introducing more than one abstraction layer.. maybe it needs a couple of days rest, i'm not coming up with anything exciting here..

Entry: syntax parameters
Date: Fri May 30 11:17:57 CEST 2008

is this true? using syntax parameters, i can get rid of the hack that calls the transformers directly. need to be careful there: every time the expander is called again, names are marked. because i'm using this mechanism to build a single lambda expression, names might make more sense unmarked..

Entry: Forth syntax, philosophical approach
Date: Fri May 30 15:35:18 CEST 2008

meaning, through natural language.. i do this too little with the tough problems that turn out to be huge time sinks..

The problem: It should be possible to _define_ new Forth substitution words, which is implemented by define-syntax, _before_ the expansion of body code. In Scheme, due to the use of s-expressions, this is easy. In Forth however, the names are buried inside a muck of words: expanding all substitution words to expose those words that might yield the definition of _new_ substitution words is (probably) not possible.

Question 1: is it at all possible to fish out these macro definitions? If so, how?

Question 2: if it's not possible, can we formally acknowledge it as a shortcoming of Forth syntax and work around it?

Entry: fix it later?
Date: Sat May 31 10:11:37 CEST 2008

since this is more a pride issue than anything else, can it be fixed later? probably.. it's just about

* syntax for parsing words
* allowing 'load'

'load' can be implemented using include/reader: specifying the reader is essential since it needs to expand to some form that can be included in a file. this means i have to construct

1. a scheme syntax to define forth files
2. a reader that gives this scheme syntax
3. a module-reader in terms of those 2

the point to start is purrr/forth.ss: this file contains all the logic necessary to expand module syntax.

Entry: apologies
Date: Sun Jun 1 00:14:32 CEST 2008

explain: why (forth) macros are actually (scheme) functions, and code needs to be compiled at (scheme) runtime. (i.e. why is there one level (actually 2 if you count the assembler) that has manual compilation?) -> derive from this a proper instantiation syntax for forth code.

EDIT: the explanation is simple: a significant part of the program is a long-lived target code juggler. that part cannot be just syntax. the proper instantiation for forth code is of course 'define-ns' in the (target) namespace. then all words are accessible through reflective operations.

Entry: module system
Date: Sun Jun 1 00:56:05 CEST 2008

http://calculist.blogspot.com/2008/04/dynamic-languages-need-modules.html

    StoneCypher said...
    It is important that you learn a well established language that has
    already successfully grappled with these problems before deciding on
    your own mechanism.

very true..

Entry: rethinking code instantiation
Date: Sun Jun 1 10:54:22 CEST 2008

it's frankly too complicated and ad-hoc atm.. i lost oversight.
these features/choices make it complicated:

* forth words have associated macros
* multiple entry and exit points
* forth parser is a single macro, but uses factored macros

macros are really simple (declarative), but forth syntax +
instantiation makes it a lot more difficult. code instantiation
produces a single macro that runs with compilation-state to produce a
collection of fallthrough words. this is the remainder after lifting
out all macro and parser definitions.

what about making instantiation an operation on macros? i.e. replace
a macro with a wrapper, and collect the body instantiation somewhere.
this doesn't work for multiple entry points though.. so: multiple
exit is easy: it's simple to fake in macros using a 'macro return
stack' in compilation-state. multiple entry points however are quite
difficult. is it possible to do this?

1. bring the representation back to single entry point
2. write multi-entry point code / fallthrough as an optimization

i don't think so.. there is too much code that relies on multiple
entry points. i need a simpler way to represent it.

Entry: again
Date: Sun Jun 1 11:13:26 CEST 2008

try the meta level here: i'm losing oversight because it's not
working: i can't make small changes to see how they propagate through
a working system. the real problem that started this is the inability
to define parsing macros, which led to the realization that these
need to be instantiated before the rest of the code is parsed
(chicken/egg problem), which led me to think that this is impossible
unless some form of partial expansion is used, which is then made
difficult by the way parsers are implemented: by directly calling
them. in short:

problem A: it's difficult in the previous setting to get to a
structure where the 'define-syntax' occurrences can be isolated
before they are used. it's easy to solve this by requiring them to
occur in different files.

problem B: it's currently not possible to include a forth file
because the module expander is not factored properly.

let's tackle B first. what's wrong with the current
forth-module-begin-tx macro? the whole register-code! business is not
so good. a forth: macro should take an extra argument to represent
the instantiated macro. 'register' is used in the macro expansion to
store the word struct produced by the wrap operation. 'compile'
produces the code graph. the problem is: i'd like these to be
composable from different sources, so a chunk of forth code needs to
produce something that can be accumulated later into something else.
the idea is correct, only the implementation is clumsy: instantiation
needs to be solved in a single place, then forth syntax needs to be
built on top of it.

Entry: nobody uses frameworks
Date: Sun Jun 1 12:49:28 CEST 2008

what are the entry points? the ui? brood needs to be api'd as a
library. it needs to be a straightforward set of macros on top of
scheme, no callback nonsense.

Entry: forth syntax / code instantiation
Date: Sun Jun 1 14:12:34 CEST 2008

split the forth macros in 2 parts: those that create new names, and
those that do not. (the latter contains locals, quote and code
quotation). note that quoting doesn't need forth syntax: there is a
corresponding s-exp syntax. if the same is done for locals, then
forth syntax can be used exclusively in .f files or a forth: macro
where it introduces names.
so: the (forth-toplevel form compile (defs ...)) form does:

* expand to a special form (i.e. begin or #%module-begin)
* bind 'compile' to a function that generates words
* preferably have no side effects

the idea is that those forms can be composed (i.e. using 'load') and
that the toplevel module namespace initiates all code compilation.
the problem is not the expansion to definitions (either 'define' for
macros or 'define-syntax' for parsers): it uses the toplevel 'begin'
form. the problem is registration of the forth words / instantiated
macro. binding it to a given name is not necessarily a good thing. i
really need to think a bit about how this is used in toplevel project
namespaces, both for one-shot and incremental code compilation.

maybe it is best to put all word instances in the toplevel namespace
AND allow for a mechanism to collect them (maybe from the namespace
using reflective operations?) this could be an algorithm akin to
garbage collection: only compile the code reachable from the roots =
exported (forth) namespace names. it only needs a way to take care of
the recursive definitions: the word instances are defined in terms of
macro names, and can only be evaluated AFTER all macro bindings are
evaluated. because of the level split (forth macros are scheme
functions) this has to be done manually. it's easiest by just making
all forth code into promises though.

the only problem remaining is fallthrough: how to guarantee the
correct order of compilation? this is one of the main reasons why a
simple mapping from name -> datum isn't really possible: the order is
important. so, the problems:

* forth code is ordered (supports multiple entry points)
* evaluation of forth code requires all macros to be bound, so it has
  to be done after evaluation of macro body expressions. this is
  solved atm once per module, but due to 'load' this operation needs
  to be composable:
  -> compose the definition of macros
  -> compose the forth instantiation macros

it's probably ok to bind the labels to names so they can be accessed
through reflective operations later, but it's essential to also
somehow orchestrate the compilation of the code. the remaining
question: when does the code need to be compiled? also, answer this
in light of incremental compilation (keep the namespace active, just
add in more code/macros). is it ok to assume some context?
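to make the promise idea concrete, a quick stand-alone sketch
(made-up names, not the actual staapl code): wrap each word body in a
promise at definition time, keep definition order for fallthrough,
and force only after all macro bindings are evaluated.

  #lang scheme/base

  ;; made-up registry: (name . promise) pairs, kept in definition
  ;; order because fallthrough makes order significant.
  (define *words* '())
  (define (define-word! name thunk)
    (set! *words*
          (append *words* (list (cons name (delay (thunk)))))))

  ;; a word body can refer to macros (= scheme functions) that are
  ;; only defined further down in the module:
  (define-word! 'foo (lambda () (dup) (drop)))
  (define (dup)  (printf "compiling dup~n"))
  (define (drop) (printf "compiling drop~n"))

  ;; only here are the bodies evaluated, in order:
  (define (compile-all!)
    (for-each (lambda (w) (force (cdr w))) *words*))
  (compile-all!)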
Entry: partial evaluation literature
Date: Sun Jun 1 14:49:19 CEST 2008

enough dabbling, i think i'm ready for reading:
http://www.dina.kvl.dk/~sestoft/pebook/

* pe = an operation on program text

Entry: namespace woes
Date: Mon Jun 2 19:35:21 CEST 2008

file:///plt/doc/guide/reflection.html

  Modules not attached to a new namespace will be loaded and
  instantiated afresh if they are demanded by evaluation. For
  example, scheme/base does not include scheme/class, and loading
  scheme/class again will create a distinct class datatype:

This is important for the global registry used for recording Forth
instantiations, and datatypes that are shared by the host meta system
(i.e. target and macro word structs).

Entry: fixing forth instantiation
Date: Tue Jun 3 11:14:34 CEST 2008

made the first changes to purrr/forth.ss so it generates just syntax.
need to change:

- handling of toplevel require forms using parameter
- compilation

maybe i should just give up the production of forth dictionary
'records' and just use the environment to dump stuff, making
compilation of a .f file a side-effectful operation.

SOLUTION: move everything that's not part of the instantiation macro
to toplevel using the forth-toplevel-forms parameter. this includes
macros (and maybe variable definitions?) and the definition of the
word structs. the remaining running state is just a single big macro
which inlines the word structs in the macro code using the 'label'
forth word. i.e:

  (define-ns (target) bla (make-target-word #:name bla))
  ... ,(ns (target) bla) label ...

this way, forth compilation is what it is: construction of a macro
that after instantiation gives a code graph. all OTHER stuff that
happens in a .f file (definition of macros, variables, imports, ...)
has a straight meaning as scheme module components and can be
recorded in a side channel, implemented by a parameter.

( i can't help but think about writing the parser words as scat
words.. this is yet another threaded state problem.. )

stubbornness: i'm just going to keep things as they were. cleaned up
purrr/forth.ss a bit + finally understood how namespaces can share
code, and it looks like this is enough to build the necessary
abstractions. NEXT: load

Entry: load
Date: Thu Jun 5 10:58:26 CEST 2008

got it working, at least with absolute paths. the trick is to call
the forth syntax reader directly inside parser-tx.ss, and to combine
source location info with the proper lexical information. next: fix
path + convert kernel's 'require' stuff back to 'load' so it can be
modularized later, and so that most hacks around late binding can be
simply replaced by loading stuff into the same namespace.

Entry: Simulator
Date: Thu Jun 5 14:05:22 CEST 2008

http://citeseer.ist.psu.edu/119550.html

What I need is a language for simulator design, or more specifically,
a strategy for compiling target code + some semantics spec to host
executable C code for optimal performance. An advantage is that what
needs to be simulated is usually quite lowlevel code with fairly
specific space and time characteristics. So basically, a state
machine description language is necessary. Something that can be
compiled to a massively parallel C program. If there is one place in
Staapl where partial evaluation is going to make a big difference,
it's there.

Instead of writing a simulator, building a partially evaluated
simulator might be a better idea, since for simulation, speed is very
important.

What is an instruction? It's a state update. state is memory. Memory
is a number of registers, with variable bit size. So an instruction
is something with the following properties:

* an endomap for (a subset of) the machine state
* timing information
* encoding (for instruction interpreter)

Maybe i should take a step back towards pure s-expressions for
instruction set spec, since these are a bit hard to compose (write
macros that expand to them). composition would help to define some
instruction classes.

  (addwf (f d a) "0010 01da ffff ffff")
  ((addwf f d a) ((#b001001 6) (d 1) (a 1) (f 8)))

Actually, the simulator descriptor language is as good as the same as
the dsp dataflow language. maybe i should do the latter first, then
generalize.
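a strawman record for those three properties (made-up names; not the
actual staapl representation):

  #lang scheme/base

  (define-struct instruction
    (name        ; symbol
     semantics   ; endomap over (a subset of) the machine state
     cycles      ; timing information
     encoding))  ; list of (opcode-bits-or-field . width) pairs

  ;; addwf, following the s-expression spec above; the state
  ;; update itself is left as a stub.
  (define addwf
    (make-instruction
     'addwf
     (lambda (state f d a) state) ; stub: add WREG into f here
     1
     '((#b001001 . 6) (d . 1) (a . 1) (f . 8))))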
Entry: The Dataflow Language
Date: Thu Jun 5 18:10:53 CEST 2008

see entry://20071211-093307

From the KRIkit project it's quite clear that some kind of dataflow
language makes sense for deeply embedded DSP applications. The main
problem I ran into during the project is a too low abstraction level
to write the modem code. Making the code more abstract at the Forth
level would have cost about 10x execution speed. Solving this at the
macro level is probably a better idea.

Some declarative dataflow or array language might be a good addition
to Staapl and a better match to RISC architectures with more direct
2/3-operand register file addressing modes than the 1-operand 8-bit
PICs. But what syntax should that language have?

The main problem that makes a straightforward dataflow language
impractical is expressing the linking between code modules whenever a
module has more than one output value. Using a scheme-like expression
syntax, some form of 'let-values' is necessary to name output values
when nested expressions are no longer sufficient due to the move from
a tree-structured data flow graph to a directed acyclic data flow
graph.

Why not use a concatenative/Forth syntax? Instead of using RPN
notation as a direct specification of control flow, which would fix
the execution order of operations as specified, it could be used only
to build the (parameterized) dataflow graph, which could then be
compiled and re-serialized specifically for the target architecture.
When this is combined with higher order functions (or the macro
version: combining forms) this gives the original Packet Forth idea:
APL with Forth syntax.

I have argued before that writing DSP code in Forth is difficult, but
this problem can be simplified using higher order (list/array)
combinators. What usually deters is the inefficiency of such
composition mechanisms when they are implemented at run time.
However, when these combinators can be eliminated at compile time (a
first-order language with higher order macros) it might be feasible
to have both highlevel programming AND efficient code.

A place to look for inspiration is probably Factor's data flow
analysis.

Entry: Array Processing
Date: Fri Jun 6 13:25:05 CEST 2008

Programming in an array processing language can be factored in two
steps:

* construction of primitive, pure many -> many functions
* mapping these over tensors

Entry: next
Date: Fri Jun 6 16:10:22 CEST 2008

paths in load.

* find a file in path.

now, allowing files with undefined symbols might be a convenient
notational device, but it makes it hard to test them individually
because they need context.

Entry: command line
Date: Fri Jun 6 16:56:16 CEST 2008

it's time to start using a forth command line + code store. then,
there is only boring stuff left:

* fix 'org', or think about how such a direct assembler-state control
  statement can be allowed in the language.
* fix the undefined symbol problem introduced by the switch to module
  languages -> maybe add some toplevel undefined symbol handler in
  the badnop namespace management code.

make sure some toplevel equivalent of 'load' works. (why can't load
be used for import actually?)

Entry: namespaces
Date: Sat Jun 7 11:17:15 CEST 2008

trying to figure out exactly where to put things.

1. support system = toplevel application namespace
2. one namespace per compiler / project.

the parser and lexer for the REPL obviously should be in 2. so it
needs an interface. also, it's probably best to make the interface to
the repl a macro. until now, there were only modules. each module
brings its own lexer and parser. the result is Scheme definitions. to
add repls, each repl needs to be attached to a lexer and a parser.

the problem i run into now is that purrr/repl.ss pulls in too many
dependencies, mostly because of purrr/forth.ss. the latter should be
factored a bit more.. fixed.. ok.. think i got it working: purrr.ss
imports the whole purrr base layer with forth syntax (parser words)
AND a repl macro. next: fix a problem with module loading..
infinite loop when requiring an .f file. hmm... looks like i have a
problem. increased the limit to 100mb and now it works.. it runs in
26 mb too..

Entry: redefine
Date: Sat Jun 7 18:40:52 CEST 2008

so.. toplevel stuff is working now. so why can't these toplevel
definitions be used to change the implementation of core macros like
'dup'? the idea is that yes, i'd like to keep the current module
system for managing names, but no, i don't want to prevent
modification of macros. basically, they are used to change aspects.
merely putting them in toplevel to be able to upgrade them would get
rid of other advantages of the module system.

Entry: require + toplevel
Date: Sat Jun 7 19:03:25 CEST 2008

  box> ,purrr
  toplevel in /home/tom/scat/
  ;; scat
  extend: (macro) jump
  ;; macro
  ;; forth
  ;; purrr
  box> (repl "require test/purrr18-test.f")
  ;; scat
  extend: (macro) jump
  ;; macro
  ;; asm
  extend: (macro) +
  extend: (macro) dup
  extend: (macro) drop
  extend: (macro) swap
  extend: (macro) or-jump
  extend: (macro) not
  extend: (macro) then
  ;; forth
  ;; purrr
  ;; pic18
  ;; dead: ((jw #))
  ;; dead: ((exit))
  ;; dead: ((exit))
  ;; dead: ((exit) (qw 6) (qw 5) (qw 4))
  box>

Same for (require "test/purrr18-test.f"). This isn't right: it should
reuse the scat stuff.. How to do that? Does the namespace need to be
a module namespace? Maybe it is.. require cannot depend on context,
so when a module requires another one, that last one needs to be
re-instantiated. I guess this is a chance to finally figure out what
i'm doing with this namespace / compiler instance business ;)

EDIT:
* there is one instance of the compiler for interactive use.
* each module that is required into the toplevel instantiates a
  compiler.

the latter makes sense: loading an application without a toplevel is
possible as long as it has its own compiler associated. one question
though: isn't using a toplevel terribly inefficient then? probably,
for things not used after instantiation, garbage collection kicks in?
The compiler is simply discarded?

ok, i think i got it fleshed out now.. the only remaining things to
figure out are to remove the dependencies of the data structures on
the scat code (make dependency on the badnop side optional) and
figure out where to put the assembler (probably best in the target
namespace)

Entry: compiling purrr to C
Date: Sat Jun 7 19:39:42 CEST 2008

it shouldn't be too difficult to add a C frontend for purrr.
basically, every word instance is a function + the stack pointer is
passed as a parameter + tail calls are forced.

Entry: basic
Date: Sat Jun 7 23:16:41 CEST 2008

i was wondering how difficult it would be to compile one of the BASIC
dialects for PIC or AVR to purrr.

Entry: new names
Date: Sun Jun 8 17:39:24 CEST 2008

It's difficult to pick good names. The current ones: brood, purrr and
scat are a bit difficult to google because they are all common terms.
I was thinking about STAAPL, which is a creative spelling of stapel,
the dutch word for stack. Maybe retrogrammed as STAck and Array
Programming Language. Another one is Staprola, stack programming
language. No google hits on that. Or something completely
meaningless? Wurzon/Kamizi? What about calling the whole system
Staapl, calling the pure language Wurzon and the Forth layer on top
Kamizi? Hmm.. the most important thing is the name of the project.
Let's try staapl for a while.

EDIT: main project is now called staapl.

Entry: about that stack
Date: Sun Jun 8 18:00:38 CEST 2008

so.. are we going to stick with stacks or not?
i'd like to give the concatenative language as specification for a
dataflow language some thought. in that case, the system is as good
as complete.

Entry: factoring
Date: Mon Jun 9 10:50:22 CEST 2008

I'd like to factor macro and target:

  macro:  just the functional macro metalanguage, no instantiation
  target: only instantiation (compilation) and optimization

Is this a good use of time? Probably not.. The macro language is
never useful without instantiation.. it really is just composition of
unary functions, which after all isn't terribly interesting if you
never evaluate them. So ditch this.. What does need to happen is to
trim dependencies of the target representation structure. Currently,
it needs 'scat' for some evaluation stuff. Fixed.

Entry: concatenative dataflow language
Date: Mon Jun 9 11:15:29 CEST 2008

it compiles to a dataflow graph. in order to do this incrementally, i
need to think about state. state, as represented in purrr, is the
current output of the network, so compilation is just adding a node.

  (... [N1] [N2] +)  ->  (... [N3])

where

  [N1] [N2]
    |   |
    (+)
     |
   [N3]

next decision: does this need a functional state, or can mutation be
used directly to build the graph? the tricky point is multiple
outputs:

  (... [N1] [N2] div/mod)  ->  (... [N3] [N4])

  [N1] [N2]
    |   |
  (div/mod)
    |   |
  [N3] [N4]

this could be represented as:

  [N3] = (div/mod [N1] [N2])
  [N4] = (shift (div/mod [N1] [N2]))

with data structure sharing. this completely avoids the problem of
having to name intermediates. a more symmetric rep would be

  [N3] = ((div/mod [N1] [N2]) 0)
  [N4] = ((div/mod [N1] [N2]) 1)

this even has a representation in straight scheme in the form of
memoized procedures. let's try to build one on top of the pattern
matcher. the first notational problem i run into is specification of
primitives with multiple outputs, which is the problem i'm trying to
solve! so.. let's stop going in circles.

;; Some important points:
;;
;; * Dataflow macros have a different representation. They have an
;;   entirely different compilation mechanism: one which involves
;;   register allocation and instruction scheduling. This
;;   representation should be made solid.
;;
;; * Given the dataflow macro rep, writing an automatic convertor to
;;   concatenative syntax is trivial.
;;
;; As a result, the macro/pattern.ss mechanism is only needed as a
;; building block, not as a ui front end.

the composition mechanism should just build the graph, but
represented in such a way that 'executing' it is simplified.. this
boils down to how to do the binding, whether to use 2-way links,
whether to represent subgraph inputs by 2 nodes etc.. it looks like
going for an explicit data structure that is later interpreted or
compiled might be the best approach. it's easiest to understand. (the
other way is to map it directly to scheme code, which is also a DAG).

in hardware, all functions are many to one.. the only place where
many -> many functions come from is abstraction. can this fact be
used to simplify the problem? a subgraph is basically a list of
(named) expressions expressed in terms of (named) inputs.
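the memoized procedure representation is simple enough to sketch
directly (plain scheme, made-up names): a multi-output primitive
becomes one memoized thunk computing all outputs, and each output
node is a projection of it, so sharing comes for free.

  #lang scheme/base

  ;; memoize a thunk: the node computes only once, no matter how
  ;; many outputs project from it.
  (define (memo thunk)
    (let ((computed #f) (value #f))
      (lambda ()
        (unless computed
          (set! value (thunk))
          (set! computed #t))
        value)))

  ;; a two-output node: div/mod of two input nodes (thunks).
  (define (div/mod-node n1 n2)
    (let ((node (memo (lambda ()
                        (list (quotient  (n1) (n2))
                              (remainder (n1) (n2)))))))
      (values (lambda () (car  (node)))     ; [N3]
              (lambda () (cadr (node))))))  ; [N4]

  ;; inputs are thunks too, so the whole DAG is demand-driven:
  (define-values (q r) (div/mod-node (lambda () 17) (lambda () 5)))
  (list (q) (r)) ; => (3 2)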
Entry: monads and computation
Date: Mon Jun 9 12:28:05 CEST 2008

the philosophical idea behind monads starts to dawn on me.. in any
programming language, there are 2 things to consider:

* a composition mechanism, which takes multiple language elements and
  turns them into one (or more) composite elements.
* primitive elements.

this is 'bind' and 'return'.

Entry: strategic overview
Date: Mon Jun 9 15:04:48 CEST 2008

About how i'm going to tackle the simulator generator problem.
Hardware is best modeled starting from a description of its interior,
which is registers + logic. Functional/dataflow descriptions are thus
a good base language. Using the simulator as a pull for implementing
the core dataflow representation seems like a good idea.

Entry: representing DAGs
Date: Mon Jun 9 15:46:14 CEST 2008

It's better to separate the 2 concepts of many->one functions and
grouping, than to work with many->many functions and permute/connect
their outputs. What this does for representing graphs is the ability
to use simple nested (scheme) expressions. In this view, mapping a
concatenative language to an expression based syntax is completely
trivial. Representing one is completely trivial also. So what problem
am I solving?

( Some idea is itching in the back of my head telling me that partial
evaluation for functional dataflow analysis is really trivial as long
as there is only a single type: compilation is nothing more than
evaluating the graph while adding postponed semantics to the code.
What makes it hard is the presence of higher order constructs. I'd
like to get a handle on this.. move it from the philo level to
concrete code.. Is it all the same thing? Is partial evaluation
REALLY better viewed from a compositional pov, as an intermediate
form to get the evaluation right, and then to transform it back to a
graph for optimizing the register allocation / sequencing? That can't
be the case really, since both are easily related to each other.
Probably i'm forgetting about associativity here.. )

Entry: base semantics
Date: Mon Jun 9 16:06:41 CEST 2008

Looks like i need a representation of the base semantics of stack
manipulation operators. This can then be used to generate
substitution rules for the pattern language, and perform dataflow
analysis. I've added a file stackop/stackop.ss, not used yet, to put
this info.

Entry: coming out
Date: Mon Jun 9 16:11:08 CEST 2008

when i start combining data flow analysis with a concatenative
specification syntax, it's time to admit that yes, this is about
syntax! so, rationalization:

* for target implementation, a stack based language is nice
* anything that can be analyzed before it's placed on the target
  might benefit from being transformed into a data flow graph, to get
  rid of the explicit serialization in concatenative code.

Entry: base language for simulator description
Date: Mon Jun 9 17:07:06 CEST 2008

Let's see if this route makes sense: create a scheme language level
for an expression serializer. In = an expression graph, out =
serialized graph. This is a pure scheduling compiler, mainly serving
as a front-end to a C-code generator.

First: what about names. If there are no scoping issues, it's best to
work on symbols instead of identifiers. This seems to be the case.

Entry: enough dabbling
Date: Mon Jun 9 17:27:19 CEST 2008

next:

* fix 'load' to perform source relative include and figure out a way
  to perform temporary code generation with undefined symbols (i.e.
  assuming they are constants or something).
* fix 'org'
* port the target interaction code

Entry: fixing load
Date: Mon Jun 9 17:55:53 CEST 2008

This is not entirely trivial: the environment in which the code is
expanded needs to be modified so the load statements inside the code
know where to get the code. Currently, it's simply inlined so context
can't be tracked. OK.
with the control flow out of the way, it's probably easier to
override current-load than to try to re-implement that part..

Q: is it possible to use require in a loaded file? if yes, is it then
a problem to replace the load handler also for requires?

hmm.. the parser atm is really confusing.. too much juggling with
return values and continuation thunks.. this needs to be solved
without a driver routine.. maybe a single dynamic variable to
accumulate code is better.. there is already one for toplevel defs..

Entry: new parser driver
Date: Mon Jun 9 21:22:40 CEST 2008

It's a mess. There have been several occasions where i tried to
understand it but couldn't. So, how to fix this. There are 2 things
to arrange:

* whenever a definition starts, the name, srcloc, and mode need to be
  recorded. -> implement as thunk.
* whenever a definition ends, the current expression needs to be
  combined with the stored header information and collected as a
  record.

the basic driver seems to work. it's a lot simpler to understand now:

  (define (definer mode)
    (lambda (code expr)
      ;; close off the previous definition first
      ((finalize-current) expr)
      (syntax-case code ()
        ((_ name . code+)
         ;; record the header info: name, mode, source location
         (new-record #'name (mode) (stx-srcloc #'name))
         ;; continue parsing the rest of the input
         (collect-next #'code+)))))

now need to adapt all the other macros to this new way of doing
things.. should be straightforward. nesting can be implemented with
dynamic scope and an exit continuation like for load.

ok, with some minor shuffling in what to return to the continuation
(which is now implemented as a prompt) it seems to work. now locals:
maybe i can get back to using (rpn-represent)? this requires the
function to return.. is this possible? ok: rpn-next can _only_ return
when all input is parsed. this way parsing can still be nested
locally without the driver loop needing to restart parsing.

ok, got some generic nesting working, now do the same for load so all
nestings can compose. something seems to be wrong with 'load' though:
probably a continuation barrier.. nope: the procedure embedded in the
syntax was of course wrapped as a syntax object, and my printer
routine automatically unwraps it..

locals: the problem with not allowing rpn-next to return is of course
that it is now no longer possible to modify the closing expression
(the lambda wrapper): this used to happen by returning. the solution
is to add yet another parameter that represents expression closure.
it's actually already there in the form of 'rpn-lambda', but this
makes it a bit complicated.. the following modification should do it:
allow 'locals-tx' to modify rpn-lambda, and reset rpn-lambda in the
forth parser so every definition can start from a clear wrapper.

ok.. the thing is this: building an expression, one wants to be able
to insert nesting expressions above and below: passing on just the
inner expression is a bad idea. maybe this needs to change? 3-value
parser state? hmm.. alternatively, write the parser in terms of scat
threaded state updates, but that might go too far and lead to
bootstrap problems. the problem is now to make sure that wrappings
are only used once. this needs an interface: pfff... i'm getting
myself into lowlevel mess again because one feature doesn't fit into
the simple abstraction. what about making 'expr' an expression
generator: a list of functions that can be composed and evaluated.
OR: expr = a cursor inbetween: (outer . inner). this should really
solve all parsing needs.

ok. got it working with a bit of juggling with rpn-lambda:

* at every rpn-compile, the current expression wrapper is set from
  rpn-lambda.
* expressions are allowed to override the current expression wrapper
* if entry is not through rpn-compile, you need to initialize the
  wrapper!

pff.. next problem: the locals macro seems to have a problem with
non-2stack states. solved: wasn't fixed after the abstract state
update was changed.

Entry: compiling monitor
Date: Tue Jun 10 17:32:22 CEST 2008

without 'org' and some things disabled here and there. but it does
seem to compile, at least in toplevel namespace with 'load'. it
compiles, but doesn't assemble. some 2stack problem in
target-value->number

Entry: local variables
Date: Tue Jun 10 19:13:29 CEST 2008

A side-effect of the way locals are implemented is that they can
occur anywhere in a macro definition or code word, and will bind
literals.

  box> (repl ": foo 1 2 | a b | a b a | d e f | e ")
  box> (print-all-code)
  foo:
      [dup]
      [movlw 2]
  box>

Entry: next
Date: Tue Jun 10 19:56:35 CEST 2008

org: let's see.. the real problem with org is that it permanently
changes assembly state. currently it's possible to set the org of a
chain of code, which is a local effect only.

asm: monitor doesn't assemble.. problem with 2stack <->
compilation-state confusion in evaluation of target values. the error
happens in

  /home/tom/staapl/macro/instantiate.ss:80:3 wrap-macro/mexit

which means a .f generated macro is evaluated. ok, it's a constant
that's evaluated with macro->data. it should be possible to convert
macro->data so it runs on a dummy compiler state, but the real
problem here seems to be: is it somehow possible to not wrap macros
with local exit if they don't need it? well.. i can always run it
once, then decide on how to encode it. a macro can throw an error,
but it is not allowed to have other side effects.

problem: the m-exit mechanism makes it impossible to evaluate macros
on a 2stack state, which is possible for most 'clean' macros. should
the concept of 2stack macro be discarded? or is the concept of clean
macro important? (don't you just love dynamic typing!) intuition goes
towards: keep 2 classes because the compilation-state class deals
with 'lowlevel' features like multiple entry and exit points. is it
possible to more clearly separate these 2 classes?

Entry: meta
Date: Tue Jun 10 23:44:47 CEST 2008

last two months have been, well, long.. i didn't get so much done
really. mostly reorganizing, fixing bugs and thinking about the new
features.. some topics:

* load vs. require, module expander and redefining words
* org and labels + multiple entry/exit points
* namespace juggling (badnop)
* parser cleanups + new syntax for scheme expr + code quotations
* documentation
* simulator ideas + dataflow language

so i did get things done, they were just more difficult than
anticipated.. all of them involved significant choices and
backtracking on dead ends, not much straightforward coding as i
expected. maybe that's a good thing in the end.. it's just that now
i'm a bit drained on the creative front. once compilation works
(maybe tomorrow?) the road onward should really be straightforward:
port the interaction code, and solve bugs in the compiler that are
exposed. the goal should be to move to a working 'ping' beginning
next week.

Entry: serialization for incremental dev
Date: Wed Jun 11 00:03:33 CEST 2008

it does look like the serialization problem for incremental
development is relatively easy to solve: save the forth words from
the namespace, and rebuild it later by loading macros from source
code, moving them to a new namespace, and augmenting them with
compiler macros for the serialized words.
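a rough sketch of the save side, assuming a word can be reduced to a
(name . address) pair; all-words, word-name and word-address are
hypothetical accessors, and the interesting part (turning the pairs
back into compiler macros) is exactly the part that depends on staapl
internals:

  #lang scheme/base

  ;; save: dump (name . address) pairs, one per line. the accessors
  ;; are passed in because they are hypothetical here.
  (define (save-dictionary file all-words word-name word-address)
    (with-output-to-file file
      (lambda ()
        (for-each (lambda (w)
                    (write (cons (word-name w) (word-address w)))
                    (newline))
                  (all-words)))))

  ;; restore: read the pairs back in.
  (define (load-dictionary file)
    (with-input-from-file file
      (lambda ()
        (let loop ((acc '()))
          (let ((datum (read)))
            (if (eof-object? datum)
                (reverse acc)
                (loop (cons datum acc))))))))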
Entry: m-exit
Date: Wed Jun 11 09:49:43 CEST 2008

ok, another choice to be made. does 'macro->data' use 2stack or
compilation-state or something else? maybe it can be parameterized?
since it's more about a configuration issue than anything else: when
not using mexit and in-line word creation, use 2stack, otherwise use
the extended compilation state. ok, something else to clean up:

  state:2stack : create a state object with update function
  make-2stack : create the raw struct

now, the solution i have is to use a parameter called
macro-eval-init-state, but isn't it better to store this kind of type
information in the macro itself? in fact, this is a universal
property: each scat/rep.ss word has a native type on which it
operates, so let's make that mandatory. the type class can be
represented by a constructor for an empty state. plan changed:

* add a new record to words to indicate type. the type is actually a
  state constructor.
* new-state:2stack -> parameterized constructor
* state:2stack -> type value

ha, this doesn't work for compositions! there it needs to be inferred
at compile time, but that's not possible. anyway, i'm going to keep
it to see where it ends up. maybe an order relation for state types
can be defined, so at least this type analysis can be performed at
run-time. no.. it's too flakey, let's get rid of it.

Entry: error reporting
Date: Wed Jun 11 11:27:43 CEST 2008

  box> (assemble! (all-code))
  asm-overflow: (bpz (p R) ((112 . 7) (p . 1) (R . 8))) (bpz 126 1) (-131 8 -1)

that's nice, but where does it come from? what i want to know here
is:

* where in the assembly code does this happen
* what is the corresponding source location

the latter might be difficult, but at least it should be possible to
find out in which word this is. ok, got better error reporting now:
it tells you which word chain it's in, and dumps out the assembler
code before it re-raises the exception.

  /home/tom/staapl/pic18/interpreter.f:59:2: n!f+
  n!f+:
      [jsr 0 async.rx>]
      [movwf 4068 0]
      [drop]
  _L72:
      [jsr 0 async.rx>]
      [movwf 4085 0]
      [tblwt*+]
      [drop]
  _L75:
      [decf 4071 1 0]
      [bpz _L72 1]
      [movf POSTDEC1 1 0]
      [jsr 1 ack]
  asm-overflow: (bpz (p R) ((112 . 7) (p . 1) (R . 8))) (bpz 129 1) (-134 8 -1)

the problem is quite clear now: relative instructions need to be
initialized differently so they don't overflow in the first pass.
no.. the problem is something else:

  (bpz (p R) "1110 000p RRRR RRRR")

p is the first argument. where did that come from? i think i
remember: all jump instructions were changed such that the target
address is the first argument. however, apparently the assembler
hasn't changed accordingly. this looks like a relic. let's change it
back.. ok, seems to work now.

next problem: all jumps are long. (typo)
next problem: dead code elimination for jump tables (fixed)
next problem: org

Entry: comma
Date: Wed Jun 11 16:42:06 CEST 2008

the problem with comma is that it is one of the reflective words.
(like 'constant' it accesses the run-time stack and produces code). i
use it in purrr to change postponed literal semantics to inlined raw
words / bytes. since this has no standard semantics i took the
freedom to use it as a replacement for ';' for jump-tables, one that
doesn't terminate the code. is there a better way to do this
actually? can't the current code word be marked that it contains a
run-time jump and thus needs to have chain splitting disabled? let's
keep it manual: write a macro on top of the low-level dispatcher
later.

Entry: some slogans..
Date: Wed Jun 11 17:37:45 CEST 2008
.. to later remember why some decisions are taken:

* it's important to have purely functional macros + some abstract way
  of handling the threaded state.
* for the parser i've opted not to, because this purely functional
  infrastructure is not necessary: it's merely a frontend for forth
  code, and has a composition mechanism in the form of substitution
  macros. internally it uses an explicit serial interpreter ('next'
  routine).
* i'm trying to find a good trade-off between low-level control (i.e.
  raw jump tables, where the language is basically an assembler) and
  high-level code analysis and manipulation, which serve as the basis
  for high-level metaprogramming constructs.
* yes, i like to split the code in chunks of about 300 lines

Entry: org
Date: Wed Jun 11 20:38:24 CEST 2008

so, maybe just hack it. whenever a name is a number, it's a permanent
org change. the obvious requirement is that it's an expression that
can be evaluated at compile time. this is already used: if the target
name is a number, the current chain will be assembled inside a code
pointer push/pop. in macro/instantiate.ss the function
'combine-if-org' is used to combine multiple chains if the current
store has an org specified, to make sure it stays bundled. so, what
needs to be done is a simple marker in the assembly code that sets
the code pointer. that's all really.. why is this so difficult?
because it's a dirty operation, and there's no clean way to do it. it
probably needs some management at some point, i.e. to disallow it for
certain code contexts.. to summarize:

ORG FOR CHAINS:
* compiler uses state push/pop to control stack
* on org change: all chains are combined
* assembler recognizes org chains

PERMANENT ORG:
* some 'magic packet' in the chain stream.

permanent org needs to terminate the current chain, but doesn't need
to link up chains. let's just make the names (org ) and (org! )

Entry: compiler state operations
Date: Thu Jun 12 12:19:39 CEST 2008

These need to be factored out a bit. I do need to be careful about
providing primitives that are non-destructive, and leave destruction
to simple destructors. org-pop is: terminate-chain combine-chains
pop-chain. I'm going to leave this as is until I need a different
mechanism that needs to split/merge the compiler state.

AHA: split and merge.

  pop-chain  = terminate-chain
               combine-chains   ;; only for org
               merge-state
  push-chain = split-state

split-state: save current asm, rs and dict on the control stack, and
start with a clean slate. merge-state: merge current asm, rs and dict
with the one on the control stack.

Entry: Compiler Code Hierarchy
Date: Thu Jun 12 13:06:34 CEST 2008

During compilation the assembly code (the result of instantiating
macros) is organized in the following hierarchy:

* A word is a single entry point, represented by a target-word
  structure associated to a chunk, which is a list of consecutive
  assembly code instructions. Code inside a word can only be reached
  through a jump to its label, and is thus not observable to the
  world. Words serve as the unit of code generation (and
  recombination). Any operation on code that doesn't alter semantics
  is legal within a chunk.

* A chain is a list of words (chunks) with implicit fallthrough. Each
  word indicates a single entry point. Chains are terminated by exit
  points. Chains are the unit of target address allocation: each
  chain can be associated to an address independent of other chains.
  Some chains have fixed addresses (org).

* The store is a stack of recently constructed chains.
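the hierarchy translates almost directly into data definitions; a
hypothetical sketch (the real staapl structs carry more information):

  #lang scheme/base

  ;; a word: one entry point (label) + its chunk of instructions.
  (define-struct word (label instructions))

  ;; a chain: words with implicit fallthrough; allocating an address
  ;; for the chain fixes the address of every word in it.
  (define-struct chain (words org)) ; org = #f or a fixed address

  ;; the store: a stack (list) of recently constructed chains.
  (define store '())
  (define (push-chain! c) (set! store (cons c store)))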
Entry: next
Date: Thu Jun 12 14:08:10 CEST 2008

fix some bugs..

* "if ; then" doesn't work -> plug-in library support for macros
* labels are registered when compilation fails. is this ok?
* looks like macro delegation in 'pattern' is not a tail-call, is
  this a problem?
* begin doesn't work (typo)
* comma: byte or word?
* org is sometimes dead code, sometimes not? silly: gets redefined.
  + another problem.. some org labels seemingly get dropped.

Entry: library fallback
Date: Thu Jun 12 15:01:40 CEST 2008

It would be really nice to be able to automatically link in
functionality. It's actually not too difficult to do so, but it needs
access to the filesystem, which currently is only possible in the
forth syntax layer (macros are pure). So, where to put it?

This was solved using macro redefinition, without automatic inclusion
of library functions. The convention is to prefix library fallback
functions with a tilde '~' character.

Entry: comma
Date: Thu Jun 12 17:22:57 CEST 2008

In the case of pic18, should comma compile bytes or words? Byte
tables are useful, but the native code word size is 2 bytes, and all
code words use word addresses. Since comma is mostly for data tables
it's probably best to let it compile data word size instead of code
word size when they are different.

Entry: next
Date: Fri Jun 13 17:51:16 CEST 2008

i think i got the most important bugs nailed down. time to go from
code -> ihex and upload something.

  (save-ihex (all-binary-code) "/tmp/broem.hex")

Entry: words and chains
Date: Fri Jun 13 22:02:10 CEST 2008

I'm already regretting the internal linking of word structures. It
feels unnatural to have to convert things to a list, and have to
remember if something is a sequence or not. On the other hand, it's a
clear sign of fallthrough: a word can never be mistaken to be
standalone.. Maybe i should define iterators / comprehensions? It's
probably better to define an explicit chain type, instead of using
words for that.. A chain is a list of words. The address of a chain
is the address of the first word.

Basic idea: directly encoding fallthrough from a given word instance
is less important than having a proper data structure that
distinguishes entry points from code grouping due to fallthrough.

Entry: Toplevel vs. module namespaces
Date: Fri Jun 13 19:35:37 CEST 2008

When you talk to somebody, you'd like them to remember what you were
talking about before. Conversation needs context. When you read
something, you hope for all context to be explained in the text you
are reading. Exposition needs completeness. Same goes for interacting
with a machine. This is the image based accumulative repl vs. the
transparent repl debate. I find it quite interesting to give it some
thought, as both are valuable for some uses. This is about load vs.
require: interactive incremental development vs. the 'run' button.
(explain)

Entry: machine code org
Date: Sat Jun 14 10:05:19 CEST 2008

* instruction -> list of binary machine code
* word
* chain
* store

In the code base, whenever assembly code, binary code, or word
structures appear in a list, it is REVERSE SORTED. This is easier
from the point of view of compilation. Code ANALYSIS needs the
reverse of that: code is linked in the direction of instruction flow.
This is how words are internally linked: given a target word (entry
point) its inline fallthrough can be easily obtained using
'target-chain->list', which again returns a reversed list.
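a mini version of that linking, just to pin down the convention
(made-up struct, not the real target-word):

  #lang scheme/base

  ;; hypothetical: each word points to the word it falls through
  ;; into.
  (define-struct word (name next))

  ;; analogous to 'target-chain->list': collect the fallthrough
  ;; sequence starting at an entry point; consing makes the result
  ;; REVERSE SORTED again, as described above.
  (define (chain->list w)
    (let loop ((w w) (acc '()))
      (if w
          (loop (word-next w) (cons w acc))
          acc)))

  ;; usage: c falls through into b, b into a (flow order: c b a).
  (define a (make-word 'a #f))
  (define b (make-word 'b a))
  (define c (make-word 'c b))
  (map word-name (chain->list c)) ; => (a b c), i.e. reversed flow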
Entry: addresses
Date: Sat Jun 14 10:26:20 CEST 2008

The Forth uses byte addresses, because it might access data bytes in
flash memory. The Assembler uses word addresses, because this is the
basic unit for flash memory.

Entry: Don't step on composition.
Date: Sat Jun 14 20:45:55 CEST 2008

Whenever you write a program, do not EVER limit the way in which
people can combine your primitives. This is actually quite hard:
limitations tend to creep in whenever a minilanguage arises. I.e. in
Brood-4 there was a problem with interaction macros: they did not
have a composition method that was easily accessible from Forth
files. The catch-all solution in Brood-5 is to provide access to the
underlying scheme core through Forth syntax. (explain)

Entry: The compiler dictionary structure.
Date: Sun Jun 15 09:10:46 CEST 2008

There's a problem right now which loses labels, probably because the
state reaping in the compiler is too complicated:

  asm code collects with current word -> current chain
  chains collect to current store

this state can be pushed/popped. Drops should be made explicit.
Trouble is I see no drops. Problem is probably that 'terminate-chain'
for 'org' also needs to terminate the current word. The bug occurs
after 'comma' of the config values, which are not terminated.. So,
let's make sure that terminate-chain only works for #f labels. No,
that's not it.. weird one.. needs proper machinery to track down.

Entry: dataflow language
Date: Sun Jun 15 09:39:48 CEST 2008

Compiling an expression language to C is fairly trivial, since C has
an expression language built in. GCC also has SSA (static single
assignment) form, so presenting C code that uses single assignment
should be ok. Expression evaluation is straightforward, so trusting
GCC to handle this properly should be no problem. GCC also has a
mechanism for proper tail calls:

http://community.schemewiki.org/?gcc-does-no-flow-analysis

So, as long as there are no first class functions or comprehensions,
compilation is really easy.

Entry: array comprehensions
Date: Sun Jun 15 10:09:14 CEST 2008

The trouble is then, the problem I want to solve is to generate
efficient code for a dataflow graph + array comprehension
combination. The real problems for comprehensions (translation to
nested loops) are performance (P) and correctness (C).

* Inner loop generation (P)
* Cache memory optimization (P)
* Handle border conditions. (C)

Here (P) needs to be fast and (C), if applicable (i.e. in
convolution), might be approximately correct.

There is a discussion on the concatenative list about Backus' FP and
APL not having first class functions, but comprehensions. Is it fair
to say that this is the thing to do for numerical code? First class
functions are overkill, but anything you would want to solve with
them can be solved with comprehensions: you turn higher order
operations into syntax, which converts them to easily inlined loops.
There's a thread on concatenative about this:

http://www.nabble.com/Joy%27s-relationship-to-FP-%2B-a-Joy-variant-with-combining-forms-td17576284.html

My answer, without thinking too much, would be: macros. Languages
based on composition can have partial evaluation. Higher order macros
can expand to combination forms: have higher order macros, but don't
allow such functions at run time. Is this cheating, or completely
beside the point?
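a tiny scheme illustration of that last point (not purrr): a 'higher
order' map written as a macro, so the operation is inlined into a
first order loop and no function value exists at run time.

  #lang scheme/base

  ;; map-inline is second order, but only at expansion time: the
  ;; 'op' expression is spliced into a plain loop, so the expanded
  ;; code is first order.
  (define-syntax map-inline
    (syntax-rules ()
      ((_ (x) op lst)
       (let loop ((in lst) (out '()))
         (if (null? in)
             (reverse out)
             (let ((x (car in)))
               (loop (cdr in) (cons op out))))))))

  ;; usage: no lambda survives to run time.
  (map-inline (x) (* x x) '(1 2 3)) ; => (1 4 9)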
"We owe a great debt to Kenneth Iverson for showing us that there are programs that are neither word-at-a-time nor dependent on lambda expressions, and for introducing us to the use of new functional forms." - John Backus, 'Can Programming Be Liberated from the von Neumann Style?' The basic idea; use concatenative syntax for specification of: * pure functions, which are translated to a dataflow representation. * higher order macros which serve as combining forms (combining forms). Maybe I should try to answer this, and relate it to unrolled reflection and partial evaluation: http://www.nabble.com/Re%3A-Joy%27s-relationship-to-FP-%2B-a-Joy-variant-with-combining-forms-p17802001.html I don't know how though.. Maybe best to try the combinator route first. The idea here is that "combining forms" as found in FP and APL are related to the macros in Purrr that are not compilable. My stress on macros is really about giving up first class functions at runtime, but not at compile time. Entry: data flow + aspects Date: Sun Jun 15 10:09:26 CEST 2008 So, for simple scalar DSP code (i.e. no FIR filters), it should be possible to define such a language fairly easily, and concentrate on 'hints' for compilation. What I mean is this: make the description highlevel, and add hints to influence compilation. These hints would choose number systems, scalings and bit widths for fixed point, etc.. Entry: for .. next Date: Sun Jun 15 11:36:02 CEST 2008 The (hypothetical) higher order macro: [ ... ] for-next and the Forth equivalent for ... next do the same thing, but in current RPN representation, 'for-next' does have access to the macro quotation to perform optimizations. It would be nice to make the first one primitive. In the current implementation however, this is difficult to do because the ... in the 2nd form is evaluated before 'next' is evaluated. This approach would need a change in representation. It does seem that using quotations directly is a far better approach: it doesn't need any forensics to recover quotations from flat Forth syntax. A limited translation of Forth to this form as a syntactic operation might be feasible, but it's not possible to take it all.. This is at the core of the conflict of 2 forces in Purrr: (L) The lowlevel compiler which gives explicit control to the programmer about how to use nesting. (H) The highlevel approach based on explicit quotations and s-expression syntax. Unifying both is difficult, but they can probably be built on common ground: providing a macro language with simple (conditional) jump primitives. This is (virtual) machine design: the Purrr control primitives. Entry: Purrr control primitives Date: Sun Jun 15 12:07:38 CEST 2008 conditional jumps : VM primitives Entry: Combining Forms and Higher Order Macros Date: Sun Jun 15 12:25:15 CEST 2008 EDIT: This used to be a blog entry, but is too confused to qualify as such. Most of this is about the clean coma language with higher order macros, which needs to be implemented first before any of the array processing extensions can be added. Coma is used as the pure functional macro kernel of the Forth language dialect template called Purrr, which is further specialized to yield a PIC18 Forth dialect. Coma is essentially a template programming language: all words that occur in a source file are generate and process intermediate code if they occur in an instantiating context, i.e. a Forth word. Classical macros expand code. 
The essential extra step to make program specialization work is a
reduction step: after expansion of several primitives, examine the
resulting code and reduce it by evaluating part of it. In Coma the
expansion and reduction operations are combined into a single step,
which is specified as primitive intermediate code pattern matching
words. In a compositional/concatenative language, the reduction
operation becomes simpler compared to lambda reduction, due to the
absence of variable names. Other than that, the principles are the
same: anything done in Coma can be extended to expression languages
with formal parameters with a few extra substitution rules.

What I am interested in is to see in what way this can be applied to
higher order programming. Currently Coma is used mainly to support
Forth (a first order language), using a simple extension that allows
the implementation of Forth's structured programming on top of
conditional and unconditional jumps. Alternatively, a more Joy-like
language can be constructed on top of Coma, where all control flow is
built on top of recursion and the 'ifte combinator.

This is where I'm a bit in the dark. It is possible to create
replacements for Forth's structured programming words by using higher
order combinators that perform partial evaluation on code quotations,
inlining them. A construction like

  (a b c) (d e f) ifte

would essentially compile to Forth's

  if a b c else d e f then

There are two main problems with this approach. One is: should 'ifte
have a semantics if it has arguments that cannot be evaluated at
compile time? It doesn't seem too difficult to allow run-time code
quotations to _exist_ at run time; they are just code pointers.
However, allowing them to be _constructed_ at run time (the analogy
of closures) requires some run-time memory management (either full GC
or linear memory).

Another problem is explicit recursion. The main difficulty about PE
is when to stop. It's not possible to unroll recursive definitions
completely since this produces an infinitely large program. Wadler
talks about this in the original deforestation paper, where such
infinite expansions are caught by introducing run-time recursion of
newly generated functions. An alternative approach (Bananas and
Lenses) is to solve the PE problem at a higher level: by identifying
(a limited set of) higher order combinators and deriving a set of
rules about how they combine, program manipulation is possible on
that higher level, instead of working with the mostly unstructured
recursion (goto!) in unfolded combinator bodies.

In Backus' FP, all the combinators are fixed, and there is no
composition mechanism for combinators. According to Backus (Iverson?)
this would force programmers to learn to use the available operators
efficiently. What I'd like to figure out is what set of data types +
functional forms would be good for certain classes of problems
related to resource constrained programming. How to reduce the space
of programs such that efficiency is guaranteed, but programs can
still be written in a high level declarative form.

References:
http://www.stanford.edu/class/cs242/readings/backus.pdf
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.36.1030
http://citeseer.ist.psu.edu/wadler90deforestation.html
http://citeseer.ist.psu.edu/meijer91functional.html

Entry: partial evaluation of higher order functions
Date: Sun Jun 15 23:46:16 CEST 2008

* higher order macros (HOM)
* list comprehensions
* combining forms + 1st order functions (FP)
* forth -> pure macros.
what is this level shift?

* deforestation

EDIT: STA(ck)APL

According to the wikipedia FP entry, limiting a language to 1st order
functions and a limited number of (non-composable) 2nd order
functions creates a simple algebraic structure. Combining forms are
quite simply defined as 2nd order functions. List comprehensions are
similar to limited order combining forms: they avoid the use of
higher order functions to perform iteration/folding. However, they do
not generate first order functions as objects: they are merely
syntax.

I found a definition for HOMs here:
http://foldoc.org/?higher-order+macro

  "A means of expressing certain higher-order functions in a first
  order language."

  P.L. Wadler "Deforestation: Transforming programs to eliminate
  trees" http://www.springerlink.com/content/7217v376n7388582/

original paper:
http://homepages.inf.ed.ac.uk/wadler/papers/deforest/deforest.ps

Where i'd like to end up is to find the relationship between my forth
macro approach and fixed / composable 2nd order functions. I need a
theoretical framework, probably some type system restriction, to get
out of the anything->anything lisp world. What I use this for is not
necessarily to define a clean language specification, but to see if
it can help choose between higher order functions and inlined
expansion. Since I have higher order functions, but would like to
have inlining as optimization.

Somewhat related, i can make things such that each word is linked to
its originating macro. This might lead to functions that are both
instantiated, and available as a macro whenever they are used in a
combining form that cannot accept a function at run time.

Entry: more input
Date: Mon Jun 16 16:18:41 CEST 2008

Shopping around for input:

http://citeseer.ist.psu.edu/440438.html
"Macros as Multi-Stage Computations: Type-Safe, Generative, Binding
Macros in MacroML (2001)"

This is actually an interesting paper which deals with a lot of stuff
that's on my mind now. Might be interesting to take my implementation
directed approach a bit further. Macros in statically typed
languages: difficult? Is syntax tree manipulation in dynamic
languages less ad-hoc? This is an interesting hub-paper.

Entry: preparing for state shuffling
Date: Mon Jun 16 16:29:14 CEST 2008

the current syntax in instantiate.ss and 'state-lambda' is not very
good. i already ran into trouble mixing out variables into something
returning a next state object + data. it's also not abstract enough
to be able to replace structure type based encoding with list (stack
of stacks?) encoding.

-> use syntax parameters

Oh, they are not so scary :) Maybe it's best to make the rpn
transformer like that too, instead of all these compile time
parameters. That's a major overhaul that could be used to write it in
cps actually.
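for reference, the mechanism in a minimal form: a syntax parameter
standing in for a hidden state variable (plain PLT scheme, not the
actual rpn transformer; all names made up).

  #lang scheme/base
  (require scheme/stxparam)

  ;; 'state' is only meaningful inside with-state; outside it is a
  ;; syntax error instead of a captured/marked variable name.
  (define-syntax-parameter state
    (lambda (stx)
      (raise-syntax-error #f "used outside with-state" stx)))

  ;; with-state builds a single lambda expression and re-points
  ;; 'state' at its argument, hygienically.
  (define-syntax-rule (with-state body ...)
    (lambda (s)
      (syntax-parameterize ((state (make-rename-transformer #'s)))
        body ...)))

  ;; usage:
  ((with-state (cons 'tagged state)) '(1 2)) ; => (tagged 1 2)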
Entry: wandering into confusion
Date: Tue Jun 17 19:13:25 CEST 2008

In very general terms, what I want to do is to see if I can do DSP in
a concatenative language with the proper higher order combinators for
list/array processing, and find a way to optimize all combinators
away to produce optimal loop code. That's all really: design a
language that feels high-level and lispy, but is guaranteed to
compile to efficient constructs. My hunch is that this is actually
not so very difficult to do with a concatenative language. What I
don't see is how to perform instantiation automatically. How to move
from high -> low, treating higher order functions as macros. And, how
much am I not just re-inventing APL.

As far as I read, Wadler's deforestation paper deals with this kind
of higher order macros in section 5. Do I need to understand
deforestation before being able to use these macros? I don't really
need arbitrary intermediate tree data structures: i'm really just
using arrays. Looks like deforestation is necessarily expressed in a
first-order language, and higher order macros are a way to get some
of the effects of higher order functions in a first order language.
Thinking about this, it probably needs automatic instantiation: no
manual macro/forth defs.

Now this: http://www.cs.ucsd.edu/~goguen/ps/will.ps.gz looks like it
is pretty close to what i'm doing. I was wondering about how to
relate C++ templates to purrr composable macros, but it looks like
the root of this idea is in the OBJ language.

Entry: apology
Date: Tue Jun 17 23:39:50 CEST 2008

Why am I doing this? It is really about the language, about its
algebraic feel. Maybe I should be honest and keep that as the only
real reason. It's like Legos. It clicks. Then there are explanations
of why I might like it:

* Concatenative languages span a wide spectrum in a useful way. This
  allows me to use similar paradigms from the very low to the very
  high.
* One can get far without closures (which take the form of curried
  quotations created at run-time).
* Partial evaluation is simple for functional concatenative
  languages: scopes don't get in your way.
* An imperative concatenative language can have a large functional
  subset.
* Linear memory management becomes non-intrusive.

Some fragments of correspondence. Question about breaking new ground
with the Staapl approach:

Well, I know for a while now I need some form of compile-time program
specialization that can turn higher order functions into specialized
first order loops. The real question is how to simplify the
programming language such that the problem of writing the compiler
can be solved by me in a limited time. Doing it only with an
interpreter + specialized manually crafted C core routines (like PF,
Pure Data, Supercollider, Matlab, ...) is not powerful enough.
Untyped lambda calculus is too general to solve the problem with a
simple compiler. Typed lambda calculus works better, but such a
language is not so straightforward to implement. So I'm looking at
something first-order with higher order macros, closer in spirit to
APL, Backus' FP and C++ templates than lisp.

The only ground I broke is that I ended up with a non-intrusive way
to combine compile time operations and run time operations in one
language without semantic problems, simply by taking a functional
programming view where evaluation time might be thought of as
unspecified. Concatenative macros are a very natural way to do
template programming, because name bindings don't get in the way.
Concatenative form can also be easily transformed to nested
expression form, so when I need data flow analysis I can do it, but
for some program transformations it's really easier to keep it
concatenative. Code is more 'algebraic' and less 'logic' in that
form, if that makes sense at all.. Lists instead of trees.

What I have now is still manual: there is no automatic loop inlining
happening. I'd like to figure out if this is necessarily a part of
the language (1st order language + some 2nd order functionals) or if
i can automate it, so it becomes a language with higher order
semantics preserved in case the opti doesn't apply.
So, while I find it interesting, I am getting into territory where I should be careful not to be too general, and try to stick to the problem of making a language which is very close to machine language, but has access to higher order constructs. I'm already there: the macro-assembler on steroids idea: for PIC18 the bottom layer concatenative language almost maps 1-1 to assembler. It's the automatic code juggling part on top of it that is giving me headaches.

About writing beginner languages: I did some workshops with forth now, and i find people pick it up pretty fast. The real problem is not language though. Some languages go smoother in the beginning than others, but I found the real problem to be the point where you leave simple scripting (filling in parameter values) and code composition enters the picture: how to divide and conquer. I think a beginner language should stretch the scripting part as long as possible, but i sort of gave up on that idea. It only makes hitting the abstraction wall more painful. What i got a bit discouraged about is that more often than not, no matter what you try, people like to stay in that scripting area. I don't know if there's a way to trick people into crossing that barrier unknowingly. Did you run into something like this with scheme?

Entry: about deforestation
Date: Wed Jun 18 12:39:51 CEST 2008

Maybe Wadler's earlier work on lists is better suited to translate to my problem: folding combinators for array processing. I'm facing a gap between my understanding of theory and a particular practical problem. I should just try to solve one such problem with higher order combinators to get a better view of concrete problems, to get out of the muck of abstract confusion.. The problem is really one of data structures and their iterators. In FP, there are only nested arrays and some map and shift operators.

So.. Deforestation is about eliminating intermediate nested data structures in a first order language with recursive pattern matching for tree deconstruction. Wadler defines the property 'treeless', which is used to construct transformation rules to transform a composition of treeless functions into a treeless function.

Definition: a term is treeless with respect to a set F of function names if:

* it is linear (each variable is used only once; this is to make sure transformations don't introduce redundant computations, and can be relaxed for integers)

* it only contains functions in F (these are the 'exceptions' that will be expanded)

* every argument of a function application and every selector in a case term is a variable (obviously, otherwise there would be an intermediate tree result)

The algorithm that performs deforestation maps a linear term which contains variables and functions with treeless definitions to a treeless term and a possibly empty set of treeless definitions. The core of the algorithm is the standard function 'inlining': replace each function application with an inlined definition.

If i start to think from that point, shouldn't i get to something simple? After all, i have no variable names to worry about, and no destructuring or creation of run-time data structures. That is all quite different. So, question: what is the equivalent of deforestation in a concatenative language with a simply managed array data structure and a 'map' operator?

Let's write down some things that need to 'actually happen':

* loop transformation to eliminate intermediate buffers
* array memory management and reuse (linearity?)
* dereferencing indirect addressing (on PIC18)

Dereferencing indirect addressing is a particular problem i ran into writing highly specialized DSP code for microcontrollers. It's a fairly extreme level of templating which makes sense due to very limited indirect addressing and multiplication on PIC18. There is a difference between translating 'map f l' to (cons (f l1) ...) and making sure the same operation happens in place, or with a fast reuse array.

There's something in Wadler's paper about instantiate/unfold/simplify/fold, found in Burstall and Darlington: "A transformation system for developing recursive programs."

Entry: eliminating intermediates in a concatenative language
Date: Wed Jun 18 15:14:52 CEST 2008

Suppose [ f map ] applies an operation to a number of data structures and produces a number of data structures, according to the arity of [ f ]. The transformation that eliminates intermediate storage is:

  [ f map g map ] -> [ [ f g ] map ]

In opti talk this is called loop fusion. For simple video processing, this is about the single most important optimization: it eliminates storage of intermediate frames, which take up a large part of cache memory. This optimization is in practice bounded by:

  - dependency depth for deep pipelines
  - instruction cache size

Naively, to take care of those issues it can be beneficial to limit loop fusion and allow for a limited size intermediate buffer. These 2nd order problems can be ignored for now. The most important life saver is loop fusion, which, if not for speed, can save a lot of memory.

Translating this optimization to the current concatenative macro architecture, it requires access to all functions in macro form. Final instantiation of 'map' can be the generation of a for loop and buffer allocation, but the important step is the fusion. How to use the current code substitution rules to implement this? For one, it requires a 2-pass algorithm in the current form. The map macro cannot be instantiated until all fusion has happened. Postponed partial evaluation is solved in the pattern language using pseudo assembler instructions.

EDIT:
http://www.randomhacks.net/articles/2007/02/10/map-fusion-and-haskell-performance
apparently, in haskell this kind of optimization is pluggable. There are a couple of interesting links in that article. It's all starting to become a bit more clear: concentrate on properties of higher order combinators. One of the links seems to be about automatically deriving these kinds of rules (Wadler's Theorems For Free).

Entry: code transformations
Date: Wed Jun 18 17:22:49 CEST 2008

The multi-pass optimization algorithm has an ad-hoc form: the first pass instantiates macros, while subsequent passes perform specific substitutions only. I can't express it properly, but shouldn't code originally have the form ([qw ] [qw ] ...) with a 'run' application?

An important question: is there a way to get rid of this 2-pass mechanism? Is it necessary, or just more elegant, to use a demand driven pipeline?

To implement [ f map g map ] -> [ [ f g ] map ] we need to represent 'map' as a pseudo op:

  (([map f] [qw g] map)  ([map (macro: f g)]))
  (([qw f] map)          ([map f]))

And the map can be eliminated in a 2nd pass by instantiating it:

  (([map f] map-pass)  (macro: f do-map))

This is entirely too trivial, so all the beef is hidden in the instantiation of 'map'. Which makes me think: what is 'map'? An implementation of a data structure iterator + a specification of its abstract properties used for source manipulation.
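To make the fusion step concrete outside of Staapl's actual pattern language: a toy Scheme rewriter, a minimal sketch assuming code is just a list of pseudo ops (qw f), (map fs) and the bare symbol map, processed eagerly against an assembly stack. The representation and helper names here are mine, illustrative only.

  #lang scheme/base

  ;; An op is (qw f), (map fs) or a bare symbol like 'map'.
  (define (op? x tag) (and (pair? x) (eq? (car x) tag)))

  ;; One eager rewrite step: push op onto the stack of already
  ;; rewritten code, fusing where the rules above apply.
  (define (step stack op)
    (cond
      ;; [map f] [qw g] map -> [map (f g)] : fuse two loops
      ((and (eq? op 'map)
            (pair? stack) (pair? (cdr stack))
            (op? (car stack) 'qw)
            (op? (cadr stack) 'map))
       (let ((g (cadr (car stack)))
             (f (cadr (cadr stack))))
         (cons (list 'map (append f (list g))) (cddr stack))))
      ;; [qw f] map -> [map f] : introduce the pseudo op
      ((and (eq? op 'map)
            (pair? stack)
            (op? (car stack) 'qw))
       (cons (list 'map (list (cadr (car stack)))) (cdr stack)))
      (else (cons op stack))))

  (define (rewrite program)
    (reverse (foldl (lambda (op stack) (step stack op)) '() program)))

  ;; [ f map g map ] fuses into a single loop:
  ;; > (rewrite '((qw f) map (qw g) map))
  ;; ((map (f g)))

The only interesting bit is that fusion happens purely locally, by inspecting the top of the assembly stack, which is exactly the eager pattern matching style used elsewhere in this log.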
next: Think about 'fold' over arbitrary data structures and related things like loops, iterators, comprehensions, compile time folding, ... There's some rewriting in Cat that does this:

  http://www.cdiggins.com/cat/cat.pdf

Entry: doubts about compilation
Date: Thu Jun 19 09:12:11 CEST 2008

I'm wondering whether it isn't better to keep the macro code representation in list form. The assembler output is again a concatenative language in source form, so why doesn't it have the same form as the input?

Also, maybe the eager algorithm is too simple? The problem is that given a composition [a b c] it could be split as [ab c] or [a bc]. One could be significantly simpler than the other syntactically, while semantically they denote the same function. An eager algorithm is always going to pick the first option. This is the reason why some rewriting operations are postponed to a next pass. More specifically: the multipass algorithm now is a simplification of a general non-deterministic algorithm that optimizes the ideal combination of terms. Maybe multiple passes need to be defined more abstractly?

Entry: back to fixing bugs
Date: Thu Jun 19 11:39:57 CEST 2008

since i'm just getting confused over ideas that need fermentation and more reading, maybe best to start fixing bugs.

the alleged problem with 'org' seems to be a problem with forth/macro mode switching.

OK:

  macro
  \ : config #x10 ;
  forth
  #x20 org
  : bla

WRONG:

  macro
  : config #x10 ;
  forth
  #x20 org
  : bla

probably 'forth' needs to terminate previous macro defs, otherwise non-labeled code will be concatenated to the last macro def.

looks like that was indeed the problem. that was the last known bug in the way of uploading code. time for hands-on!

Entry: interaction code
Date: Thu Jun 19 13:25:59 CEST 2008

might be better to try to get some communication going with the previous monitor, before uploading the freshly compiled one. start with target.ss. the first part is the upgrade to plt 4.0 'for' loops: comprehensions.

Entry: lazy-connect: book vs. conversation
Date: Thu Jun 19 15:06:04 CEST 2008

about the 'current state' issue for interactive development. i guess it's ok to have state. the previous approach of making everything temporary is maybe a bit too brutal. i.e. a current connection is really ok. use custodians to manage that kind of stuff, not parameters. On the other hand, for lowlevel interaction it might be a good idea to flush buffers on every message exchange, since things tend to go wrong.

basic interaction works:

  box> (with-io-device '("/dev/ttyUSB0" 9600)
         (lambda () (scat> ping)))
  CATkit
  <0> box>

ordinary target access seems to work without trouble. the thing that needs to change is interaction with the target dictionary, which is now a scheme namespace + serialization, incremental compilation and code upload.

Entry: dictionary / namespace
Date: Thu Jun 19 20:51:48 CEST 2008

Should the interaction code live in the same namespace as the compiler? It would be nice to be able to specify interaction code in source files, so maybe it's best to do that. I need to be careful here not to fall into the same pit: host side code should be composable + interaction templates too. Prefix syntax is ideal to override default target semantics, so I need to have it, but it should stay composable.

Entry: data types + HOF
Date: Thu Jun 19 22:08:06 CEST 2008

So, along the lines of

  http://www.randomhacks.net/articles/2007/02/10/map-fusion-and-haskell-performance

The basic idea is: whenever you define a data type and a map/fold/...
HOF, you need to somehow obtain transformation rules to simplify compositions by moving operations inside loops. The problem is, mapping this to Purrr, it's quite easy to add these transformation rules (manually), but the problem is really the representation of the data type. How to add the runtime support for, say, a video frame?

It is also pretty clear that typing is essential here: i'd really like '+' to be polymorphic, so i can make every occurrence of 'map' implicit. Functions should be upgraded (coerced?) automatically.

I'm getting confused now.. Polymorphic macros. Is assembly pattern matching a genuine type system?

Just had a look at the wikipedia C++ template page, and it says:

  a feature of the C++ programming language that allow code to be
  written without consideration of the data type with which it will
  eventually be used

It looks like what i'm doing is more general than that (compile time decisions based on literal values), but on the other hand, you can probably hide everything that can be done with values in Purrr in classes in C++. Maybe this is an essential difference really: value based metaprogramming instead of type based? Does this make sense at all? ( Objects instead of classes? Prototype templates? )

Entry: Faust
Date: Fri Jun 20 00:28:30 CEST 2008

http://faust.grame.fr/

basically what i want to do, but i don't like the syntax. makes me think that i'm on the right track with a concatenative specification frontend to solve the 'bussing' problem: connecting multi in/out things.. another thing to solve when doing a dsp language like that is block-based algorithms.

so, am i on the right track with programmable macro semantics? say i can use a pic18 program that imports a module with a different macro semantics that produces a static dataflow network + buffer management? maybe i need to just write another synth to pull this thing through. something more classic dataflow + feedback, with emphasis on compilation to pic18 architecture.

NOTE: translating concatenative code to expressions has one advantage: it makes usage explicit, so allocation might be simpler?

Entry: rewriting
Date: Fri Jun 20 00:51:18 CEST 2008

should i give up eager pattern matching and move to a different rewriting system, or is the current one good enough when it's equipped with an easier to use multipass architecture?

this is interesting:
http://lambda-the-ultimate.org/node/1658#comment-20313

Entry: live parsing words
Date: Fri Jun 20 09:54:32 CEST 2008

problem is that this is a map from (live) -> (scat) while the other substitutions macro defines endomaps. it's straightforward to map to a different namespace:

  (unquote (ns (scat) id))

but doing this one loses nested macros, which i just conveniently used for defining substitution types. (using a primitive language, one needs 2 composition methods)

wait.. the live->live substitutions for 2sim can remain as is. just need to define the primitive properly. it's probably easiest to just use quoted code + run.

next: tfind

Entry: 3 different languages
Date: Fri Jun 20 11:01:48 CEST 2008

  compositions:          (name w1 w2 w3)
  postfix asm patterns:  ((a1 a2 name) (b1 b2 ...))
  prefix substitutions:  ((name a1 a2) (w1 w2 w3 ...))

Compositions are the core of the functional language. Postfix asm patterns are used to implement eager rewrite rules during translation, and prefix substitutions are used for changing the semantics of symbolic names and numbers.
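For concreteness, one made-up instance of each shape. These are illustrative only, not actual Staapl source; the '+' pattern mirrors the qw folding rule that shows up later in this log, and 'quote-next' is a hypothetical parsing word.

  ;; composition: a new word as a concatenation of existing ones
  (square dup *)

  ;; postfix asm pattern: fold two literals into one at compile time
  (([qw a] [qw b] +) ([qw (+ a b)]))

  ;; prefix substitution: give the symbol following a prefix word a
  ;; different semantics, e.g. compile it as a quoted symbol instead
  ;; of executing it
  ((quote-next name) ('name))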
Entry: the meta namespace
Date: Fri Jun 20 12:06:22 CEST 2008

i run into a problem with different instances of the target-word structure. maybe the solution is to make sure badnop runs in a module namespace? looks like the problem is with using the namespace anchor attached to the module namespace.. maybe best to attach it to the repl's toplevel namespace.

There's still some confusion: if a module A is in namespace NS, and module B is required, but B requires A, then A will be re-instantiated, right? Unless NS is a module namespace. Let's see, what's the difference between:

  > (define ns (make-base-namespace))

  > (dynamic-require "target/rep.ss" #f)
  instantiating target/rep
  > (namespace-require "target/rep.ss")

  > (namespace-attach-module (current-namespace) "target/rep.ss" ns)
  > (parameterize ((current-namespace ns))
      (namespace-require "target/rep.ss"))

and doing this from within a module doesn't work... An explanation: when a require form is evaluated inside a module, the module registry of the required module is not the same as that of the namespace in which it is required.

A toy example:

  (module A scheme/base
    (printf "instantiating A\n"))

  (module B scheme/base
    (require 'A)
    (printf "instantiating B\n"))

  box> (require 'A)
  instantiating A
  box> (require 'B)
  instantiating B
  box>

again:

  box> (require 'B)
  instantiating A
  instantiating B
  box>

so... A is not re-instantiated.. what am i doing wrong?

ok.. the namespace.ss code works just fine:

  ;; Create a namespace with shared and private module instances.
  (define (shared/initial-namespace src-ns shared private)
    (let ((dst-ns (make-base-namespace)))
      ;; See PLT 4.0 guide, section 16.3 Reflection and Dynamic
      ;; Evaluation -> Sharing Data and Code Across Namespaces
      (define (load-shared mod)
        (parameterize ((current-namespace src-ns))
          (dynamic-require mod #f)   ;; make sure it's there
          (namespace-require mod))
        (namespace-attach-module src-ns mod dst-ns) ;; get instance from here
        (parameterize ((current-namespace dst-ns))  ;; create bindings
          (namespace-require mod)))
      (define (load-private mod)
        (parameterize ((current-namespace dst-ns))
          (dynamic-require mod #f)
          (namespace-require mod)))
      (for-each load-shared shared)
      (for-each load-private private)
      dst-ns))

The problem seems to be about other modules that are loaded into that namespace. They seem to re-instantiate the target/rep module. Ok, it was really really stupid:

  (namespace-require "pic18.ss") -> (namespace-require 'staapl/pic18)

Just a module name issue.

Entry: prj
Date: Fri Jun 20 13:54:13 CEST 2008

Prj is a small bit of glue to enable loading of different specialized compilers in their own namespace. All namespaces share some code for space efficiency and some data structures so they can communicate through the badnop layer.

Now, what about 'find'? This is a reflective operation: it looks in the current toplevel namespace to map a symbol to a value. Ok, looks like everything is there now, just needs to be patched together.

Entry: live interaction language
Date: Fri Jun 20 20:06:01 CEST 2008

the current interface doesn't seem to support what i want to do.

- to invoke macros properly i need to set the ide map to the (live) prefix
- however, this map leads to undefined words for default semantics, which would take items from the (target) prefix.

however, it is possible to put the macros in the 'target' namespace. there's a simple workaround: make sure the target word is executable, or add an interpretation step.
the latter is probably going to be simplest, since performance is not really an issue here :)

used the interpretation step. interaction seems to work fine now. got both a prj> and a live> language. what is missing is incremental compilation with suspended state + inspection.

Entry: terminology cleanup
Date: Sat Jun 21 10:21:53 CEST 2008

  chunk: list of binary/asm code associated to a word (entry point)
  chain: list of chunks with fallthrough

for manipulating blobs of code for ihex + upload, a different name is necessary. let's call it blob then instead of chunk.

  bin:   (number (listof number)) binary code list that happens to be consecutive
  line:  upload unit (8 bytes on PIC18)
  block: erase unit (64 bytes on PIC18)

to ease uploading, the chunk/chain subdivision is converted to lumps, which are then concatenated into bigger lumps and split into lines.

Entry: binary data objects
Date: Sat Jun 21 10:51:04 CEST 2008

Handling binary data involves a lot of fixed size tables. Instead of using index lists, it might be more elegant to use 'for/list' comprehensions and sequences. This deserves a bit of attention. Let's define the type better.

  - bin      = (listof binchunk)
  - binchunk = (listof number (listof number)) : address + codelist

Operations:

  - splitting/joining bytes/words
  - line splitting
  - alignment
  - binchunk combinations

Entry: avoiding O(N^2)
Date: Sat Jun 21 12:10:04 CEST 2008

The problem of combining binchunks when they are consecutive is interesting: I can't find a way to use my usual collection of higher order functions without running into O(N^2) complexity due to iterated append. The problem: given a list of address/code pairs, create a new list which combines them if they are consecutive.

  ((a c) (a c) ...)

To avoid N^2 and multiple traversals, the easiest way to do this is using a state machine with an accumulator. But is it possible to use HOFs for this, other than fold (which just factors out the explicit recursion/looping), with the choice of using constant space (left fold) or minimal hassle (right fold)? Maybe i need to look into parser combinators, or try to write one figuring out the core routines. On the other hand, the right abstraction for this might be stream processing: convert binchunks to a stream of word/address pairs, and recombine them. Let's try that first.

Looks like i'm looking for 'for/fold' (a sketch follows below, after the enumerators entry).

This is actually an interesting point where first order functions are syntactically more convenient than higher order ones ('for' is a form!). The difference seems to be really syntax: it's probably straightforward to convert between HOF and comprehension form. In the guide: 11.8 Iteration Performance. Looks like they've been thinking about optimization too :) This actually looks like a good candidate for a ``pre-scheme'' style first order language that compiles to straight machine code without runtime support.

I don't understand why there is no 'in-append'. This would be a nice exercise for sequence combinators.

Entry: Parsing combinators.
Date: Sat Jun 21 14:53:40 CEST 2008

A lot of code in Staapl is about converting one data structure into another one. Serializing one is simple, but collecting into another seems more difficult. Up to now I've been using manual stack manipulation to collect data structures. Is there a better way to tackle this? Almost always this is insertion into trees + postprocessing (reversing). Let's see..
stack levels:

  1 -> push          (a b c) -> (x a b c)
  2 -> push + push'  ((a b c) (d e f)) -> ((x a b c) (d e f))
                                       -> ((x) (a b c) (d e f))

Let's start with a simple list-of-list parser with the operations:

  ->0  0->1  1->2

I tried it with vectors:

  ;; ----
  #lang scheme/base
  (require "list.ss")

  (define (llp-push-level! v n x)
    (let ((stack (vector-ref v n)))
      (vector-set! v n (cons x stack))))

  ;; (llp-move! v n x)  push x to stack n
  ;; (llp-move! v n)    push stack n-1 to stack n
  (define (llp-move! v n
                     [x (let* ((n- (- n 1))
                               (x- (vector-ref v n-)))
                          (vector-set! v n- '()) ;; move to x
                          x-)])
    (llp-push-level! v n x))

  (define (llp-push! v x) (llp-move! v 0 x))

  (define (llp-compact! v n)
    (for ((i (in-range n)))
      (llp-move! v (add1 i))))

  (define (make-llp n) (make-vector n '()))

  (define v (make-llp 3))
  ;; ----

But it's probably better to just use lists, since the operations themselves are simple tree operations.

  push:    (a b ...) -> ((x . a) b ...)
  compact: (a b ...) -> (() (a . b) ...)

this becomes:

  ;; ----
  ;; Stack of stacks.
  (define (make-sos n)
    (for/list ((i (in-range n))) '()))

  ;; Convert a collapsed sos to a list of lists, applying an operation
  ;; to each list level.
  (define (sos->lol sos [op reverse])
    (let ((dim-1 (- (length sos) 1)))
      (let down ((l (list-ref sos dim-1))
                 (n dim-1))
        (if (zero? n)
            (op l)
            (op (map (lambda (le) (down le (- n 1))) l))))))

  (define (sos-push sos x)
    (cons (cons x (car sos)) (cdr sos)))

  (define (sos-collapse sos n)
    (if (zero? n)
        sos
        (cons '()
              (sos-collapse
               (sos-push (cdr sos) (car sos))
               (- n 1)))))
  ;; ----

Entry: comprehensions + delimited control
Date: Mon Jun 23 08:14:22 CEST 2008

Looks like a nice alternative to lazy lists. Screams for coroutines, so might want to add some abstraction to it.

http://groups.google.com/group/plt-scheme/browse_thread/thread/d0ff99391f9ac53f/f5de867c296afcbe?lnk=gst&q=yield#f5de867c296afcbe

  (define (in-yielder f)
    (define end (list 1))
    (define i (iter f end))
    (make-do-sequence
     (lambda ()
       (values (lambda (_) (i))
               void void void
               (lambda (e) (not (eq? end e)))
               void))))

  (for/list ([x (in-yielder
                 (lambda (yield)
                   (for-each yield '(1 2 3))))])
    x)

Entry: enumerators vs. cursors
Date: Mon Jun 23 11:08:51 CEST 2008

http://okmij.org/ftp/Computation/Continuations.html#enumerator-stream
http://okmij.org/ftp/papers/LL3-collections-talk.pdf
http://okmij.org/ftp/Scheme/enumerators-callcc.html
http://lambda-the-ultimate.org/node/1882

Oleg makes a case for providing enumerators natively, and deriving cursors from them if necessary, since they are the less useful variant.

  stream:     encapsulated iteration state
  enumerator: collection fold

Comprehensions are similar to enumerators, but they do not iterate over an abstracted data structure, but over a concrete sum/product of (possibly abstract) data structures. They are trivially translated to a map/fold HOF + fold function + a data constructor.

http://srfi.schemers.org/srfi-42/srfi-42.html

According to Sebastian Egner the main reason for this srfi is a simple form for the naturals. I find the middle way quite convenient: using 'in-generator' from tools/seq.ss to convert generators based on delimited control to sequences usable in comprehensions.
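Following up on the 'avoiding O(N^2)' entry: a minimal for/fold sketch of the binchunk recombination, assuming a binchunk is (list address code-list) as in the 'binary data objects' entry. The helper names are mine, not Staapl's.

  #lang scheme/base

  ;; Combine consecutive binchunks in one linear pass. Accumulators:
  ;; the finished chunks (reversed), plus the start address, end
  ;; address and reversed code of the chunk currently being grown.
  (define (join-binchunks chunks)
    (define (emit start rcode acc)
      (if start (cons (list start (reverse rcode)) acc) acc))
    (let-values
        (((acc start end rcode)
          (for/fold ((acc '()) (start #f) (end 0) (rcode '()))
                    ((chunk chunks))
            (let* ((a (car chunk))
                   (code (cadr chunk))
                   (a-end (+ a (length code))))
              (if (and start (= a end))
                  ;; consecutive: grow the current chunk
                  (values acc start a-end
                          (append (reverse code) rcode))
                  ;; gap: emit the current chunk, start a new one
                  (values (emit start rcode acc)
                          a a-end (reverse code)))))))
      (reverse (emit start rcode acc))))

  ;; > (join-binchunks '((0 (1 2)) (2 (3 4)) (10 (5))))
  ;; ((0 (1 2 3 4)) (10 (5)))

Keeping the growing chunk's code reversed makes each step O(length of the incoming chunk), so the whole thing stays linear, which is exactly what the iterated-append version loses.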
Entry: got ihex
Date: Mon Jun 23 11:48:23 CEST 2008

with the comprehension based binary code formatters we're now at the point where we can upload code and generate the monitor code in the proper format:

  :020000040030CA
  :0E00000000C80F000080800003C003A0034072
  :020000040000FA
  :020000001FD00F
  :020000040000FA
  :02000800FFD027
  :020000040000FA
  :02001800FFD017
  :020000040000FA
  :10004000C8D09EBA01D0FDD7ABA203D0AB98AB8885
  :10005000F8D7EC6EAE50ABA402D0ED50F2D7120040
  :10006000ACB201D0FDD7AD6EED5012000500E9DF56
  :10007000FD6EED50E6DFFE6EED501200EC6E000EF0
  :10008000EFD7A68EA69C88D8F9D7A68EA69C87D82F
  :10009000F5D70F0B8DD812000ED0E2D70ED041D07D
  :1000A00047D0ECD7FF0019D00DD02CD020D047D0AE
  :1000B00033D0E7D7EAD7C5DFE1D7D8DFDFD7C1DF55
  :1000C000E8DFFDD7BEDFE46EED50EC6E0900F550C1
  :1000D000C7DFE706FAE1E5521200B3DFE46EED5048
  :1000E000EC6EDE50BDDFE706FBE1E5521200A9DF52
  :1000F000E46EED50A6DFF56E0D00ED50E706FAE177
  :10010000E552BCD79EDFE46EED509BDFDE6EED5016
  :10011000E706FBE1E552B2D75BD8EC6E0800F528A4
  :10012000D2D78FDF8EDFDA6EED50D96EED50A6D7C5
  :1001300088DF87DFF76EED50F66EED509FD7EC6EDF
  :10014000400EE46EFF0EEC6E0900F550ED14E7066C
  :10015000FAE1E55285D7EC6EFD50F66EFE50F76E73
  :10016000ED5006001200D96E800AE834DA50DA5AEF
  :10017000ED501200F8DFEC6EDE501200F4DFDE6EA0
  :10018000ED501200EC6EF29E550EA76EAA0EA76EF1
  :10019000A682F28EED501200A684A688F3D7A6841C
  :1001A000A6980A00EFDF09001200EF60EF6EED5035
  :1001B000E844FD26010BFE22ED50120000EE7FF018
  :1001C00010EE8FF08A68EC6E700ED36EED50120058
  :1001D0001200000EFC6EF2DFC18AC18C93889382FC
  :1001E000EC6E330EAF6E240EAC6E900EAB6EED5017
  :0801F00081A801D064D704D0FE
  :00000001FF

Entry: ssa
Date: Mon Jun 23 13:07:38 CEST 2008

http://www.cs.princeton.edu/~appel/papers/ssafun.ps
http://lambda-the-ultimate.org/node/2860

Entry: DSP language
Date: Mon Jun 23 22:12:09 CEST 2008

let's write a simple synth that runs on PIC18, but uses a nontrivial hardware mapping.

Entry: booting the monitor
Date: Tue Jun 24 10:05:51 CEST 2008

and it's not working :) good. time to get some debugging tools online. that's what we're here for, right?

first: slurp + printing a hexdump from a sequence. got hexdump + easier io stuff working. the problem is not the serial line but something else (debug-transmit and debug-loopback work):

  box> (io> (target> 1 2 3 ts))
  <3> 1 2 3

the problem seems to be just with 'ping'. ok, i remember: there's no 'hello' string defined. fixed. looks like it's running now, but i can't seem to execute words.

Entry: playing with generators
Date: Tue Jun 24 10:30:59 CEST 2008

A problem that i've run into is 'wrapping' a sequence around a loop to build a 2D view. It pops up more than once (i.e. list->table), so let's make an abstraction for it. This is trivial to solve with a generator + comprehension:

  (for ((row (in-naturals)))
    (printf "~a \n" row)
    (for ((columns (in-range 8)))
      (printf "~a " (generate)))
    (printf "\n"))

But the problem here is the termination condition. Can this be turned into a comprehension that abstracts termination? A simple way is to turn the printing loop itself into a generator:

  (define printer
    (in-generator
     (lambda (yield)
       (for ((row (in-naturals)))
         (yield (lambda (x) (printf "~a ~a" (* row 8) x)))
         (for ((i (in-range 6)))
           (yield (lambda (x) (printf " ~a" x))))
         (yield (lambda (x) (printf "~a\n" x)))))))

which can then be easily combined with the sequence to be printed:

  (for ((p printer)
        (i sequence))
    (p i))

The only thing that's awkward here is the missing newline in case the sequence terminates in the middle of a line. This could be solved by using some 'strings with backspace'.
This is the cleaned-up hex printer sequence:

  (define (in-hex-printer [start 0]
                          [data-nibbles 2]
                          [address-nibbles 4])
    (in-generator
     (lambda (yield)
       (define-syntax-rule (lp formals . args)
         (yield (lambda formals (printf . args))))
       (define (addr x) (hex->string address-nibbles x))
       (define (data x) (hex->string data-nibbles x))
       (for ((row (in-naturals start)))
         (lp (x) "~a ~a" (addr (* row 8)) (data x))
         (for ((i (in-range 6)))
           (lp (x) " ~a" (data x)))
         (lp (x) " ~a\n" (data x))))))

Which can be used as in:

  (define (slurp)
    (for ((i (in-thunk in))
          (p (in-hex-printer)))
      (p i)))

Now, turn it into an enumerator + a sequence derivative. Does that make sense? What does a fold look like if the print output is seen as a data structure? It actually makes more sense as an unfold operation. So, really, print is a consumer, but i turned it into a consumer-producer.

Moral of the story?

* comprehensions abstract termination conditions, which makes them easier to use than generators (eof?/read)

* in cases where generators are more convenient than nested/parallel loops (parsing/printing-style data representation conversion problems), the consumer can be turned into a sequence of consumer procedures, which can then be linked to a producer sequence in a simple for loop.

Entry: target mode / simulator
Date: Tue Jun 24 15:43:04 CEST 2008

in target mode, it would be interesting to allow all kinds of macros, and try to simulate them, at first only the 'qw' and 'cw' instructions.

Entry: interactive compilation
Date: Tue Jun 24 16:33:25 CEST 2008

2 things:

* what to do with the global code accumulator?
* no more 'compile' mode for quick & dirty interactive compilation: use files/buffers instead.

let's have a look first at serialization. it's not possible to serialize macros, but it might be possible to serialize constants and aliases.

ok. got a function to read/write the namespace. on write, the macros get recreated.

  (define (target-words [words #f])
    (if words
        (for-each*
         (lambda (name realm address)
           (let ((word (eval `(new-target-word #:name ',name
                                               #:realm ',realm
                                               #:address ,address))))
             (eval `(begin
                      (define-ns (target) ,name ,word)
                      (define-ns (macro) ,name
                        ,(case realm
                           ((code) `(scat: ',word compile))
                           ((data) `(scat: ',word literal))))))))
         words)
        (for/list ((name (ns-mapped-symbols '(target))))
          (let ((word (find-target-word/false name)))
            (list name
                  (target-word-realm word)
                  (target-word-address word))))))

the global accumulator could be replaced by a parameter, so file->words conversion is possible locally. this deserves some cleanup.

Entry: multi-stage programming
Date: Tue Jun 24 19:18:23 CEST 2008

http://www.cs.rice.edu/~taha/teaching/03F/511/

talks about metaprogramming (manual staging) in a type-safe manner.

Entry: questions
Date: Wed Jun 25 02:25:19 CEST 2008

The problem is not answers; it's asking the right questions. An attempt:

Q: Why are multiple passes for the rewriter so essential? Given a satisfactory answer to this, is it better to rewrite first to a simple qw,cw language, or are per-target patterns better?

Q: Is it possible to see non-compilable pseudo assembler results somehow as type errors or contract violations, and associate blame?

Following John Nowak's advice, let's have a look again at the Joy page about rewriting, and Backus' Turing Award lecture.
http://www.latrobe.edu.au/philosophy/phimvt/joy/j07rrs.html
http://www.stanford.edu/class/cs242/readings/backus.pdf

Entry: Joy and rewriting
Date: Wed Jun 25 10:46:45 CEST 2008

The setting: any programming language can be given a rewriting system, but for Joy it's particularly simple. The idea is thus to put it on top: a rewriting system as a metaprogramming system: a source code transformation. ( Actually, no. It seems to be about giving a full semantics to a Joy program using JUST rewrite rules. )

Q: Does it make a difference if the rewriting system works on the source language (as in Joy) or the target language (as in Purrr)?

It becomes interesting at the point where "The role of the stack" is discussed: reduction strategies. There's this 'duality' between programs and stacks that's right in the middle of my representation. Looks like Manfred was there first. This is the key for a semantics without a stack:

  Joy programs denote unary functions taking one program as arguments
  and giving one program as value. The literals denote append
  operations; the program returned as value is like the program given
  as argument, except that it has the literal appended to it. The
  operators denote replacement operations, the last few items in the
  argument program have to be replaced by the result of applying the
  operator. Similarly the combinators also denote (higher order)
  functions from programs to programs, the result program depends on
  the combinator and the last few quotations of the argument program.
  It is clear that such a semantics without a stack is possible and
  that it is merely a rephrasing of the semantics with a stack.
  Purists would probably prefer a system with such a lean ontology in
  which there are essentially just programs operating on other
  programs. But most programmers are so familiar with stacks that it
  seems more helpful to give a semantics with a stack.

This is exactly what the Purrr primitive macros do: they take programs to programs. Essentials:

PRIMITIVES: rewrite rules as endomaps of target machine code. The semantics of a concatenative program expressed in terms of these primitive machine rules is the composition of these rules applied to the empty target program.

COMPOSITION: already hinted above: ordinary composition serves as the main abstraction mechanism to construct new endomaps of target machine code.

PARTIAL EVALUATION: the 'stack' shows up here as the local view of target machine code. If the target language has a notion of a run time parameter stack, there is a possibility for staging: moving computations to compile time while preserving semantics.

In "Quotation revisited" Manfred talks about the "draconian" measure of not equating lists and quoted programs. In Scheme terms, this is about constructing lambda expressions vs. quasiquotation + eval. Using the solution of only allowing the construction of quotations, but not the destruction (intensional definition, defined by its properties), isn't really that bad. (It's how Scat does it: all quotations are opaque, no reflection.)

Q: For Purrr it's possible to talk a whole lot about the semantics of macros without even mentioning target semantics. Does it make sense to see target semantics as an extension of the semantics introduced by the rewriting rules, to capture the cases that the rules don't handle: those that are somewhat general? Or is it better to see the macro semantics as the extension of limited target runtime semantics?
So, to relate Purrr and Joy a bit more: using rewrite primitives and function composition, Purrr will reduce a program to a value whenever it is a pure program. However, the target semantics isn't pure, so not all programs can be completely reduced.

Entry: code registration
Date: Wed Jun 25 12:12:03 CEST 2008

The point is to record all word structs as they appear in code, in the proper load order. This operation should be nestable. The problem I run into is that 'define' needs toplevel/module context, and making code registration nestable seems to conflict with this: trying a parameter gives problems, because the defines will be expanded in an expression context.

If it can't be made nestable, let's make the code storage write-once. Maybe it's better to define a 'compilation unit' (one invocation of 'register-code' per 'forth-begin' = module or load).

The problem to solve is to figure out which code was already uploaded. Maybe it should just be marked? Done. The remaining problem is: how to handle errors during upload? It might be wiser to only mark code as synced AFTER upload was successful. Let's provide an enumerator interface instead.

Entry: Swapping the two stacks : using just rewrite primitives?
Date: Wed Jun 25 18:32:16 CEST 2008

In fact, since the 'assembly stack' is of such paramount importance for giving a semantics to the macro language:

Q: why not use it as the primary stack, and define Forth primitives that manipulate program entry points (conditional jumps) as an extension to that? (sticking with pure rewrite rules at first?)

Q: if so, can a concatenative eager rewriting macro language like Purrr be equated with a purely functional typed concatenative stack language without full reduction?

To answer the first question: if code quotations are allowed without higher order functions, then my gut feeling is that this should work pretty well. This brings the metalanguage VERY close to Scat: simply extending Scat with assembly code data types already does the trick. It looks like this is the way to find a better link between target semantics and macro semantics.

Entry: Automatic instantiation
Date: Wed Jun 25 19:34:45 CEST 2008

from the blog post, which will probably be edited once i find a way to express this properly (and solve the problem maybe..)

Now, that's a nice story, but that's not how it happened :) The specification of rewriting rules came naturally as a syntactic abstraction for some previously manually coded peephole optimizations. This rewrite system however seemed way more useful than just for performing simple optimizations, and by introducing pseudo target assembly instructions it has been transformed into a different kind of two-level language semantics: a powerful compile time type system, with an analogous run time type system whose 'projected semantics' is derived from the eager macro rewrite rules.

The downside of this is that in order to write programs, one has to make decisions about what to instantiate and what not. It might be interesting to try how far this can be automated: don't instantiate what cannot be instantiated, but try to isolate subprograms that have real semantics.

Two things:

Q: macro semantics and target semantics are not the same for some words like '+'. is this good or bad? it's useful for computing constants, but dangerous for overflows. is it better to completely embed the target semantics, and use different symbols for the metaprogramming operations?

Q: is automatic instantiation really that difficult? compile time + might be seen as a different type..
(like + and +. in ocaml). Except for optimality (inlining might sometimes be better), solving the instantiation problem based purely on semantics (inline when a composition is 'real' + doesn't mess with the target interpreter) might not be so difficult.

Entry: and?
Date: Wed Jun 25 20:07:57 CEST 2008

did we learn anything?

- to really know what i'm talking about, i need to concentrate on a simpler concatenative macro language without higher order functions, using a single stack and automatic instantiation of real macros.

- the relation between the semantics introduced by the rewrite rules and the partially/fully postponed operations needs to be clarified a bit.

Entry: them stacks
Date: Wed Jun 25 20:29:13 CEST 2008

In the current compiler there are 4 stacks. Following the previous remarks about concentrating on rewriting first, the order will probably change to this:

1. Assembly stack: contains target assembly code and is used for target code rewriting. (in Forth compilers this is the allot buffer; in non-rewriting compilers it grows monotonically.)

2. The Forth control stack: used for recording jump labels to implement looping and conditional structured programming constructs.

3. The dictionary stack: (actually a set of stacks, supporting multiple entry points / fallthrough and control flow analysis)

4. The macro exit stack: for supporting multiple exit points in forth style macros. (an emulated run time stack).

Only the first one is essential for Pure Purrr (Puurpr, Purrepr, ... Paars?). The other ones are there to support Forth's state in a functional macro system + give some freedom to exchange macros and instantiated words and perform control flow analysis. The problem I mentioned before is that the Forth approach with a separate control stack is a bit of a dead end, since it's not a very structured way of dealing with code. Macro quotations are probably a lot better. Unfortunately, there is no simple way to convert forth syntax to quoted macros without getting rid of the control stack.

5. Probably, in a language based on 1. with automatic instantiation (otherwise there would probably be some kind of code explosion), a stack of instantiated words might need to be added. However, this is just a write-only registry (log).

Entry: 2 stage semantics
Date: Thu Jun 26 10:15:58 CEST 2008

correspondence:

It might be helpful to put on a background light: i'm trying to write a system for parameterized programming of tiny computers (currently Microchip PIC18 microcontrollers) based on concatenative and functional languages. I'm interested in limited order semantics mostly from a perspective of optimal implementations: how simple can the eventual semantics be made without having to sacrifice space/time efficiency? Currently I'm leaning towards a full macro system with first class macros, but I'm interested in this limited order semantics, and would like to see if it can somehow be embedded in my approach.

What i'm trying to figure out is how i can use 'higher order macros' in my system to allow for limited order semantics as you are suggesting. The approach i'm taking is:

* Use 2 stages: concentrate on the first stage, which consists of a joy-like language that operates on a stack of machine code instructions (stage 1), and a stage that executes the resulting machine code (stage 2).

* Start building stage 1 semantics from rewrite rules that operate on programs built from a single stage 2 instruction QW (quote word), which loads a number onto the run time stack.
For example, the stage 1 function '+' performs the following program transformation:

  ... [QW 1] [QW 2]  ->  ... [QW 3]

A complete Joy-like semantics can be built from this, if the fact that QW can only accept numeric arguments is ignored. At this point, some operations might not be defined for all input programs. For example '+' applied to the empty program is not defined. What can be done here is to start building target semantics based on the program rewrite rules: use a couple of instructions that manipulate the run time stack to make sure '+' can be defined on all input programs:

  ... [QW 1]  ->  ... [ADDLW 1]
  ...         ->  ... [ADDWF POSTDEC0 0 0]

Doing this for the whole set of primitives gives a language with 2 stage semantics:

  stage 1: program text represents machine code rewrite functions
  stage 2: rewriting of the empty program results in 'real' programs

The remaining problem is that some values used as arguments to machine code instructions might not be numbers: the Joy-like language is higher order, so quoted programs are an example of such values. (We can create another problem by introducing intermediate instructions, which are stage 1 data objects that represent neither target values nor instructions. However, this is beneficial for the eventual goal of parameterized programming.)

As a result, not all (macro) programs that have a stage 1 semantics can be attributed a stage 2 semantics, because applying them to the empty program does not yield a program that lies in the target program space, due to the use of non-numeric values or the use of pseudo machine instructions.

What I'm already convinced about is that this approach works pretty well for manual metaprogramming: by requiring the programmer to instantiate the 'real' programs as parameterized general macros, programs can be built in a Forth style language. (Think Forth run-time and immediate words). Allowing for compile-time data types that do not translate to the (necessarily) limited target machine semantics gives access to a very powerful way to factor/modularize parameterized programs (specialized code generators).

What I'm interested in is to figure out how to perform automatic instantiation, which gets rid of the 2-mode word/macro Forth-style semantics, how to turn non-specialized program generation into type errors where the source can somehow be blamed, and how to embed limited order operators in a sound way.

Entry: rewrite semantics
Date: Thu Jun 26 12:18:20 CEST 2008

a remaining problem in my reasoning about rewrite semantics is this:

* Manfred talks about giving Joy a semantics through rewrite rules. This REPLACES the stack semantics, but stack semantics is later re-introduced as a STRATEGY for implementing the rewrite rules.

* I talk about rewriting target programs.

The definition of the function + can be written as

  [qw 1] [qw 2] +  ->  [qw 3]

This syntax represents the definition of a function which maps a target program of 2 instructions to one of 1 instruction. (Let's not use parameterized numeric values, for simplicity.) But this is trivially changed into a system of purely syntactic rewrite rules like:

  [qw 1] [qw 2] [qw +]  ->  [qw 3]

What's the difference? They are really the same, no?

Extending the function system with functions that are 'self-compiling', i.e.

  123  ->  [qw 123]

and extending the rewriting system with a preprocessing step that maps all syntax elements X to [qw X], we have two ways of interpreting

  1 2 +

as functions, and inside the rewrite system.
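A toy Scheme version of the function view of '+', following the three rules above: an endomap of target code, with the code list most-recent-first like the assembly stack. The tagged-list encoding is mine; the opcode names are the PIC18 ones quoted above.

  #lang scheme/base

  ;; '+' as a function from target programs to target programs.
  (define (plus code)
    (cond
      ;; ... [QW a] [QW b]  ->  ... [QW (+ a b)]
      ((and (pair? code) (pair? (cdr code))
            (eq? (caar code) 'qw)
            (eq? (car (cadr code)) 'qw))
       (cons (list 'qw (+ (cadr (car code)) (cadr (cadr code))))
             (cddr code)))
      ;; ... [QW a]  ->  ... [ADDLW a]
      ((and (pair? code) (eq? (caar code) 'qw))
       (cons (list 'addlw (cadr (car code))) (cdr code)))
      ;; ...  ->  ... [ADDWF POSTDEC0 0 0]
      (else (cons '(addwf postdec0 0 0) code))))

  ;; full compile time evaluation:
  ;; > (plus '((qw 2) (qw 1)))
  ;; ((qw 3))
  ;; partial evaluation, one literal known:
  ;; > (plus '((qw 1) (addwf postdec0 0 0)))
  ;; ((addlw 1) (addwf postdec0 0 0))
  ;; fully run time:
  ;; > (plus '())
  ;; ((addwf postdec0 0 0))

The purely syntactic rewrite version differs only in representation: the '+' would sit in the program as data ([qw +]) and a driver loop would apply rules, instead of '+' being a Scheme function applied to the code stack.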
Entry: chip bootstrap and monitor protocol
Date: Thu Jun 26 14:58:17 CEST 2008

The problem with CATkit is that it needs a bootstrapped chip that listens for commands on the serial port. The problem with this is that there is a threshold for people to start using Purrr for PIC18 without buying a programmer: they have to build one. It would be a lot more convenient to do all the communication through the ICD port. In theory this isn't so difficult, but it does require some juggling to get going.

The Purrr console is an RPC protocol: the host sends a command to the target and waits for a reply. The Microchip debugger protocol is a master-slave protocol. After bootstrapping using the one-way Microchip programming protocol the PIC can be made to do anything, but requiring the host to wait for asynchronous replies isn't so easy to do with custom hardware. I was thinking about a serial port based interface in which the target -> host protocol is simple RS232, but host -> target can be synchronous (for initial programming) or asynchronous.

Entry: Backus Turing Award Lecture
Date: Thu Jun 26 20:39:54 CEST 2008

Interesting points about FP:

* all functions are unary. (this fits in a concatenative, but not necessarily stack-based, approach)

* primitive combining forms are chosen not only for their computational properties, but also for how they behave in the algebra of programs.

It has crossed my mind to use objects other than stacks to perform the chaining. Looks like that is what FP is: lists of lists.

( PF is FP backwards. very funny. giving FP a postfix syntax wouldn't be completely insane, really.. with an embedded array processing language in a concatenative one, you could get away with the strange semantics of 'map' in a concatenative language: turning a value from a list into a stack, processing it and turning the result back into a value. )

* functional forms and parameterized programming are quite related.

* to apply a defined symbol, replace it by the RHS of its definition.

so, if you change the syntax around such that forms are implemented by postfix macros that expect quoted programs, but do NOT allow quoted programs to survive the compilation, you're done, right? a concatenative functional macro language.

Entry: point-free style and monads
Date: Fri Jun 27 11:24:25 CEST 2008

  Cons: M t
  Unit: t -> M t
  Bind: (M t) -> (t -> M u) -> (M u)

A monad M is a way to organize a collection of t, together with a way to sequence computations of such collections. The 'bind' operation takes values t from M t, produces a collection of monadic values M u and combines those into a single collection M u.

The thing with '>>=' and 'do' is that they introduce new names. Instead of mapping one monad to another one directly, this is unwrapped to a 'do' comprehension that is then 'iterated' by the implementation of '>>='. It's probably better to look at the alternative formulation, replacing Bind by:

  Map:  (t -> u) -> (M t -> M u)
  Join: M (M t) -> M t

which can be used in point-free style, and is closer to the spirit of FP and stack languages.

Entry: lambda: why names?
Date: Sun Jun 29 10:25:11 CEST 2008

http://www.latrobe.edu.au/philosophy/phimvt/joy/j08cnt.html

Lambda names are a user interface: lexical locality works well for human brains. However, manipulating lambda expressions is tedious. (De Bruijn indices fix this problem, but are not so readable.. Maybe it's best to regard names as syntactic sugar?)

What I don't understand in a lot of texts about Joy is the emphasis on composition instead of application.
I understand that the _structure_ of a program is better seen in terms of composition alone, but the eventual use of a program is really application. Let uppercase denote data items and lowercase denote functions:

  S a b

The first space between S and a is an application, while the second one is a composition. Eventually, you're interested in the value. Maybe the nuance is to really get rid of the value altogether and see the semantics of [a b] as the 'output' of a program? Feels wrong..

  syntax:    concatenation
  semantics: composition of functions
  execution: application

The only real down-to-earth use i see is syntactic manipulation leaving semantics invariant: optimizing compilation. Joy in its 'composition only' lore is really about compilation, about 'relative semantics' which stops at the actual application. Maybe my intuition is too much attracted by operational semantics (the 'real' world?). After all, there is something to say for "Application has not a single property. Function composition is associative and has an identity element" -Meertens.

In S a b, dragging the S on the left along is rather pointless: even if the 'real' thing that happens is ((S a) b), semantically all that matters is the composition of a and b, because the application can be associated out: (S (a b))

Anyways.. Onward: Backus' FP: all functions are unary, but functional forms can take multiple parameters. Embedded in Purrr this means that at runtime there is a single 'token' going around, but at compile time there might be a stack of functions and forms combining them. Actually, this seems like an interesting embedding! Purrr as a metalanguage for a non-stack language, based on the observation that both languages are concatenative, but having a different threaded state: stack vs. list of lists.

Then about Category Theory and CAM. Too much for now..

Entry: is interpretation really different?
Date: Sun Jun 29 12:37:10 CEST 2008

This popped up before, but I'm not sure if it's an arbitrary re-arrangement. Consider the expression from the last post:

  S a b

where 'S' is a state, and a and b are functions. Turning the data/code roles around, one could interpret S as a function and a, b as data, where application of S yields a new function: ((S a) b). This has the semantics of an interpreter: 'S' is an interpreter state that takes the input code sequence (a b) to produce a new state. Compare this to the state monad in Haskell.

Somehow it feels as if (S a) or (a S) are really only two sides of the same coin: producing a new state interpreter S from the message a, or computing a new state S to be interpreted by function a. Is this related to different order of evaluation/currying of the same function?

Entry: oleg metaprogramming
Date: Sun Jun 29 23:29:32 CEST 2008

http://okmij.org/ftp/Computation/Generative.html#framework

Entry: gnuplot
Date: Tue Jul 1 14:37:34 CEST 2008

First: there's a small inconvenience in sandbox that shuts down ports created inside a sandbox whenever there are eval-limits. Setting space and time limits to #f fixes this.

For the rest, gnuplot works nicely, but i get zombie processes. FIXME: closing the port doesn't stop the process. add a custom port wrapper.

Entry: custodian + custom port
Date: Tue Jul 1 21:06:13 CEST 2008

Trying to avoid the creation of zombie processes when the custodian shuts down the gnuplot pipe. This works well with 'close-output-port but doesn't work with custodians, presumably because the inner port gets shut down first?
  (define (open-gnuplot)
    (let ((co (current-output-port)))
      (match (process/ports co #f co "gnuplot")
        ((list stdout stdin pid stderr control)
         (make-output-port
          'gnuplot stdin
          (lambda (bytes start endx _ __)
            (write-bytes bytes stdin start endx))
          (lambda ()
            (printf "closing gnuplot\n")
            (close-output-port stdin)
            (control 'wait)))))))

  (define p #f)
  (define c (make-custodian))
  (parameterize ((current-custodian c))
    (set! p (open-gnuplot)))
  (custodian-shutdown-all c)

Q: does the custodian shut down custom ports?

  (define (make-dummy-port)
    (make-output-port #f
                      (current-output-port)
                      void
                      (lambda () (printf "closing\n"))))

  (define p #f)
  (parameterize ((current-custodian c))
    (set! p (make-dummy-port)))

  (custodian-shutdown-all c)   -> nothing happens..
  (close-output-port p)        -> prints "closing\n"

EDIT: solved with double fork using an external utility:

  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>

  int main (int argc, char **argv) {
      int pid;
      if (argc == 1) {
          fprintf(stderr, "usage: %s ...\n", argv[0]);
          return -1;
      }
      pid = fork();
      if (!pid) {
          char *a[argc];
          int i;
          for (i = 0; i < (argc-1); i++) {
              a[i] = argv[i+1];
          }
          a[i] = 0;
          execvp(a[0], a);
          fprintf(stderr, "%s: can't execute %s\n", argv[0], a[0]);
      }
      return 0;
  }

Entry: matlab-like behaviour
Date: Wed Jul 2 00:59:08 CEST 2008

Except for heavy-duty floating point, and a ton of library code, most of what is in the matlab language is ease of working with vectors and matrices on the syntax level. What would it take to have a scheme-like clone? Are there any already, to take some inspiration from?

Entry: chaos
Date: Wed Jul 2 01:01:45 CEST 2008

got lost in chaotic patterns again.. what i've been doing the last couple of days:

- LFSR + Hough
- Reading the FP paper + joy rewriting (algebra of programs)
- misc stuff about point-free languages
- metaprogramming
- simplified core semantics: 2stack -> 1stack + sexp (code quotations)

EDIT: using dynamic-wind

  (define c (make-custodian))
  (parameterize ((current-custodian c))
    (thread
     (lambda ()
       (dynamic-wind
         void
         (lambda () (let l () (sleep 1) (l)))
         (lambda () (printf "shutting down\n"))))))

nope, doesn't work either. Custodians don't manage processes, and it doesn't look like there is a way around that.. Maybe ignore the problem for now? Or try to get something similar working with subprocess? Maybe the "double fork" trick works here?

Entry: Multi Stage Programming: Its Theory and Applications
Date: Wed Jul 2 10:26:40 CEST 2008

PhD Thesis of Walid Taha:
http://www.cs.rice.edu/~taha/publications/thesis/thesis.pdf

About typed metaprogramming (MetaML).

Entry: datatypes and iterators
Date: Wed Jul 2 17:14:37 CEST 2008

Starting from the ideas in FP, how can we build a minimalistic algebra of programs specific to image processing? This means:

* keep the data type simple (tiled images)
* functionals are special cases of map/fold/shift for image ops

starting from this, building a framework for effective loop fusion should be doable. the problem is composition of shift operators: combining two convolution maps gives a bigger convolution map. (Is it possible to work only in terms of the 4 direction unit shifts?) This can be tested in 1D first, i.e. for block based audio processing.

Algebra of programs. Ingredients:

* binary functions +,-,*
* scalars + vectors

Is it possible to make something smaller than FP?

Entry: lab/image-io.ss
Date: Wed Jul 2 17:27:50 CEST 2008

PGM input and YUV4MPEG output seem to work, but it's quite slow.
Working towards the algebra of image processors, it might be interesting to start with the same basic structure in scheme: represent images as 1D vectors, and generate 'fast iterators' for them.

Entry: that DSP language
Date: Wed Jul 2 21:47:25 CEST 2008

Buzzword time. Or: what are the different ideas I'm trying to solve at the same time by being confused for months in a row? Basically, I can't take the time to read all research on this topic, and find it hard to follow such work without proper hands-on experience. So how far can I actually get with common sense alone?

metaprogramming:

* dynamically typed metaprogramming (Purrr)
* concatenative composition based languages + evaluation time

dsp:

* an algebra of programming languages / rewrite rules
* real-time memory allocation + organization: maximise locality
* combining prototyping + implementation (meet-in-the-middle language)
* solve the tiling + shifting problem

Currently I've got 2 projects to finish (snowcrash + staapl PIC18), but after that, I need some free hacking time to tackle the next problem, or some study time. Before I do anything else, I need to do:

* Pierce TAPL
* Muchnick ACDI

Entry: tile problem
Date: Wed Jul 2 21:58:43 CEST 2008

Suppose it is possible to obtain a data-flow graph which maps inputs to outputs. Use this to:

* create a core loop for infinite data
* optimize the core loop to introduce software pipelining and eliminate multiple reads
* solve boundary conditions

Maybe it's time to start reading Muchnick ACDI, and combine it with information from Pierce TAPL and the vague idea of the Algebra of Programs + how to use it to perform loop fusion for DSP stream and image processing.

Entry: image iterators + dont-care regions
Date: Thu Jul 3 20:34:44 CEST 2008

Algorithms simplify a whole lot if dont-care regions can be constructed: no need to handle border conditions, except for the initial tiling step (duplication). The duplication this gives is probably not problematic since it's 2nd order.

The players:

semantics:

* unary operators + 1 image mapper
* binary operators + 3 image mappers:
  - 2 images
  - 1 image with X or Y shift

implementation:

* image accumulators
* coordinate iterators

Maybe I'm just getting tired, but it's really hard to chop this into primitives. One of the things that keeps getting in the way is how to create the correct type of result accumulator from the input type. The problem is: the right model is not (+ a b) but (set! r (+ a b)), which can then be used to create a (+ a b). Maybe build it around this:

  (define (inner-loop! i i->j fn r a b)
    (vector-set! r i
                 (fn (vector-ref a i)
                     (vector-ref b (i->j i)))))

  r    result vector
  i    main index
  i->j main index to secondary index map
  fn   binary function
  a    first input vector
  b    second input vector (can be same as first)

Funny, what i'm re-inventing here are the x and y operators in the generating function / Z transform (xy transform?) representation of 2D sequences, but bound to a function. I.e. lifters: function => sequence operator

  lift   : +  ->  +
  lift-x : +  ->  1 + x
  lift-y : +  ->  1 + y

Maybe the base language should just be ordinary math functions and 2-variate polynomials? This should be more than enough to generate tiling + appropriate iterators. I've come full circle: starting with algebra -> DSP -> implementation of algorithms -> generalization in language -> algebra.

Anyways, for the 2 shifts in a framework of 2^n tiles, it is possible to use modulo addressing, which simplifies code a lot.
Sobel: (1 - y)^2 + (1 - x)^2

Ok, got it to work:

(define (sobel i)
  (define ^2 (U (lambda (x) (* x x))))
  ((B +)
   (^2 ((X -) i))
   (^2 ((Y -) i))))

U  unary lift
B  binary lift
X  binary x-shift lift
Y  binary y-shift lift

So.. These are 2 different views: the operator view (where X and Y denote the shift operators) and the higher order function view, where unary and binary scalar operations are mapped to unary and binary image operations. The latter language seems more general: easier to work with multiple arguments.

Entry: automatic lifting
Date: Sat Jul 5 10:41:02 CEST 2008

Some of the lifting operations can be automated: U and B can be inferred from the arity of the operation. X and Y need to be specified.

Entry: scheme vs. purrr PE macros
Date: Sat Jul 5 12:25:00 CEST 2008

It's about macro arguments. The fundamental idea is that the expansion depends on the input _values_, not just the input structure. In ordinary day-to-day Scheme macros this is seldom the case. What I'd like to find is a way to explain the essential difference between Scheme's macro system and Purrr's macro system, which is a polymorphic concatenative language where values represent postponed operations.

Building such a Scheme partial evaluator by transforming all functions to macros shouldn't be too difficult. This is called "introducing staging". The analogous intelligent scheme macro:

(define-syntax-ns (pesel) +
  (lambda (stx)
    (syntax-case stx ()
      ((_ a b)
       (let ((da (syntax->datum #'a))
             (db (syntax->datum #'b)))
         (let ((na (number? da))
               (nb (number? db)))
           (if (and na nb)
               (datum->syntax stx (+ da db))
               #`(+ a b))))))))

So, is there a real difference between the concatenative (string) rewriter and the tree rewriter? Not really. The only problem is that for a tree rewriter which optimizes applications, the appropriate rules for lambda rewriting need to be implemented. The only difference is thus convenience: this kind of stuff is easier to do in concatenative languages due to the absence of names.

* Concatenative languages: non-primitives can be expanded as a concatenation of primitives, which are simply applied in order.

* Lambda languages: non-primitives need to implement the usual lambda reduction mechanics.

So, partial evaluation of pure lambda expressions is actually not so difficult: if you start from normal order reduction, just reduce things that can be reduced for a certain expression.

So.. Applied to the case of image processors. If they can be written as pure expressions, making evaluation order irrelevant, a program is easily specialized:

1. library of primitives + combinator HOFs
2. specialized expressions -> eliminate all applications of HOFs to yield a single expression

So, really, this seems quite straightforward. Am I missing something? Yes. This does not include deforestation (or, the simplified version for image data structures).

Roadmap:
* write a lambda expression reducer
* obtain rewrite rules for image HOFs
----
* alternatively, formulate it in a concatenative language to avoid the lambda reducer.

Entry: order of parameters
Date: Sat Jul 5 14:07:59 CEST 2008

Q: For highly parameterized code, the order of arguments in a higher order function decomposition is a bit ad-hoc. Is there a way to make this less so?

Entry: split coma/macro
Date: Sat Jul 5 18:02:40 CEST 2008

Merged the split-off staapl-coma project: swaps the order of the two stacks, such that there is a 1-stack metalanguage that doesn't use Forth style control words. The log entries are inlined below.
What this does is give a clear separation between the languages:

* COMA: an s-expression based COmpositional MAcro language of which the values represent atomic target programs. Using pattern matching, program rewrite rules are implemented that perform partial evaluation and program parameterization.

* MACRO: on top of COMA, a Forth macro language with Forth control words, labels, code fallthrough and local exit macros.

-----

_Entry: swap the 2 stacks
_Date: Wed Jul 2 23:38:47 CEST 2008

I'd like to move to a single stack model for a clean Macro language; all the other stacks are for Forth style control words. This is the prototypical "deep change" that's hard to make in a dynamic language. Is there a way to make this easier? Maybe separating out part of the macro language (mos) which will implement the core compiler + pattern matcher. It involves changing all primitives, since they no longer move stuff from the Scat stack to the asm stack, but transform data in-place.

Got pretty far already: got the basic coma macro language to run + a simple macro> command line.

OK, got a bit further. Stuck at:

box> (require "pic18.ss")
...
box> (repl ": asdf 123 23")
;; (macro) asdf
STATE:#
non-null-compilation-stack: ((23) qw (qw 123))

=== context ===
/home/tom/darcs/staapl-coma/macro/postprocess.ss:36:0: empty-ctrl->asm
/home/tom/darcs/staapl-coma/macro/postprocess.ss:44:0: assert-empty-ctrl
/home/tom/darcs/staapl-coma/macro/instantiate.ss:218:0: compile-forth
/home/tom/darcs/staapl-coma/macro/instantiate.ss:384:0: target-compile-1
/home/tom/darcs/staapl-coma/macro.ss:35:0: target-compile
/usr/local/plt-3.99.0.26/collects/scheme/sandbox.ss:459:4: loop

while (macro> 123 23) works fine.. time to go to bed..

_Entry: bug fixes
_Date: Sat Jul 5 17:54:14 CEST 2008

Nothing serious, just some missing dependencies due to file splits, and the expected ctrl/asm confusion here and there. What I did note: pic18/test.f doesn't 'require' but it does 'load'.

Looks like we're done. Time to merge.

Entry: rewrite rules for HOFs
Date: Sat Jul 5 21:25:02 CEST 2008

Q: What is the essence of the 7 deforestation rules in the Wadler paper? (page 8)

(1) variables are left alone
(2) distribute over type constructors
(3) function application: substitute terms in parameterized body, and recurse transformation
(4) distribute over case (variable)
(5) given constructor, pick one branch and substitute terms
(6) case of function application: substitute argument
(7) case of case: push inner case through to the branches

The case statements are there to handle pattern matching for union types. You need those to be able to stop recursion! The rest is really just term substitution and elimination of constructors through rule (5).

Translating this to what I want to build: either I find a way to use this representation together with a final step that optimizes data recursive constructors to use arrays, or I use a special set of data types..

The higher order macros are interesting. (Wadler mentions OBJ btw. I probably got it from there.) 'where' terms are introduced, a kind of 'let' for local function definitions. Macros are then like functions whose variables can reference function names, but cannot be recursive. Lack of recursion guarantees they can be expanded out. First order recursion is still allowed using 'where' clauses.

Q: Make a summary of what the OBJ system is about.

Rewriting + first order equational logic + ordered sorts (types). Quite elegant, but I'm not sure whether I can use any of this in my untyped ad-hoc approach..
Theories are quite surprising: an ability to define properties of operations.

Q: Algebra of programs?

The Backus paper really seems to be seminal for all this work in functional programming about program transformation.. Nobody seems to call it algebra of programs though..

Q: "Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire"

"We develop a calculus for lazy functional programming based on recursion operators associated with data type definitions. For these operators we derive various algebraic laws that are useful in deriving and manipulating programs."

Seems to be about moving from basic, low-level recursion to transformation on a higher level: using combinators.

Entry: optimizing lists to arrays
Date: Sun Jul 6 00:35:12 CEST 2008

This deforestation business seems doable. The remaining problem is to map recursive list processing algorithms to vector algorithms by somehow faking 'cons'. Or, it's really 'cons' with cdr coding. On the other hand, HOFs could be used for this: operations that lift scalar ops to container ops.

Entry: not writing a single line of C code..
Date: Mon Jul 7 18:08:52 CEST 2008

Attempt to generate C code for Sobel + Hough transform, based on a higher order macro specification. Basically the same as hough.ss but with partial evaluation of some functions.

Roadmap:
* start from a purely functional description in HOF combinator form
* prove some transformation laws for the combinators
* use these to transform the algorithm

The real problem is making the X and Y combinators combine into something that can be easily compiled out into n x m rectangular region combinators. Basically, start with the loop you want to end up with, and factor it into separate parameterizable pieces. It seems like 'adding parameters' to an inner loop is what makes this difficult. This can be solved with HOFs and partial application. When going that way (fixed arity), maybe a stack approach can be used immediately?

EDIT: see onward entry://20080710-121719

Entry: about recent changes and insights
Date: Wed Jul 9 02:06:36 CEST 2008

Moving from 2stack->1stack for the core macro language (Coma) seems like a pretty significant step. This will allow most Forthisms (semantics) to be concentrated in a single implementation, next to its syntax.

For the array/dsp language design: moving toward an algebra of combinators seems to be the right approach. The problem is how exactly. I'm trying to align with these ideas in finished research to see where I can add a bit of originality without re-inventing everything.

Entry: That DSP language: fanout
Date: Wed Jul 9 14:02:19 CEST 2008

Thinking about the particular problem of writing the sobel algorithm in combinator form, doing this in stack notation makes me miss a ``parallel'' function mapper. Basically, a 'distribute' operation.

The essential problem is that a lot of DSP algorithms are many -> many maps: they are far from linear (single use of variables), and are quite parallel. To solve this, combinators need to do the duplication and parallel application (anamorphism followed by catamorphism).

It might be interesting to allow for code quotations to be parallel. I.e. a list of functions can be interpreted as a composition, or as a parallel application. It's useful to have a vector of functions/closures.

Entry: porting old ip.ss
Date: Thu Jul 10 12:17:19 CEST 2008

Let's start from the previous approach from entry://20070330-160157 and see if there's something to do with the new insights.
This code is mainly about generating C code from a grid-function specification:

(fn (gain x)
    (* (gain) (+ (x 0) (x -1) (x +1))))

This has the advantage that it translates straight to array accesses: arrays are finite functions mapping coordinates to values.

Porting to the new cgen.ss involves some switch from symbols/lists -> syntax. Got the syntax port working, now need to change the 'grid' and 'loop' macros + check if they worked before.. Looks like there's some code missing.

Hmm.. Should I go back to working with symbols, or get the syntax working? The problem is 'substitute'. This should be replaced by a variant of 'syntax-case' that recurses over expression arguments. Ok, using 'syntax-case/r' this becomes quite simple. Got it ported.

NEXT: this implements 'map'; now find a way to express fold/integral style functions like the Hough transform, on top of this mechanism. No DSP without inner product.. Maybe focus on the Hough-like accumulator style first: that requires random access, which is genuinely different. It seems like a lot of generality is necessary to express such a specific data flow.

Entry: functional forms (FP in Coma)
Date: Thu Jul 10 16:21:51 CEST 2008

So, embedding functional forms in Coma.

Q: What is a functional form?

It is a macro which takes multiple macros as arguments. Elaborating on the first example in Backus' FP paper, the inner product, this becomes

    inner = trans (*) map (+) insert

This is as much about datatypes as it is about functional forms. 'trans takes a 2D vector and transposes it, 'map takes a 2D vector and applies a function to each inner vector, returning a vector of results. 'insert takes a vector and folds it with a binary operator.

The thing which feels a bit strange here is to have unary functions. All stack operations are automatically mapped to vector -> scalar functions. We could use a vector -> vector lift too, for macros with multiple return values.

Embedding FP in a concatenative macro language:

* all functions are unary (arity 1 -> 1)
* pure stack macros can be lifted to FP functions in 2 ways:
    vector -> scalar (one result)
    vector -> vector (one or multiple results)
* functional forms have arity n -> 1 and operate on unary functions.

So, the trick is to somehow hide the lifting of unary stack ops -> unary vector ops. The easiest way is to see the stack as the outer vector wrapper. This however doesn't unpack things by default. I.e., in FP, the '+ function behaves like

    +:<1,2> = 3    and not    +:<1,2> = <3>

Using only the top element seems a bit dirty (for the same reason that 'map feels dirty), but it does seem to be the most convenient approach.

Q: Does everything look like a nail? Is this just a small gimmick, to have an embedded FP macrolanguage for creating (inlined) expression evaluators, or is it genuinely useful to construct (static) DSP structures in control applications where most glue logic is Forth?

I guess if I can re-implement Krikit I have a proof of concept.

Entry: fold
Date: Fri Jul 11 11:26:04 CEST 2008

Given a way to transform a loop body specification (- (x 0) (x 1)) into an expanded loop, how to perform folds? The Hough transform: for each (x,y), accumulate r = x cos t + y sin t into an (r,t) plane.

Maybe it's best not to write this as a fold. The problem is that this is not a fold of a simple symmetric binary operator, but of a piksel and the state (r,t) accumulator. In pseudo:

(fold (lambda (piksel accu x y)
        (+ accu (sino x y)))
      accu0
      image)

How to add this kind of operation?
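One way to make the pseudo code concrete (a sketch with hypothetical names: accu!, threshold and accu0 are not defined anywhere here): a generic image fold that threads the accumulator and passes pixel coordinates along, with the image as a row-major 1D vector.

;; Sketch: fold f over all pixels, threading acc and coordinates.
(define (image-fold f acc img w h)
  (let yloop ((y 0) (acc acc))
    (if (= y h)
        acc
        (yloop (+ y 1)
               (let xloop ((x 0) (acc acc))
                 (if (= x w)
                     acc
                     (xloop (+ x 1)
                            (f (vector-ref img (+ x (* y w)))
                               acc x y))))))))

;; Hough-style use: one vote per edge pixel into a mutable (r,t)
;; plane, so the threaded 'state' is just the plane itself:
;;
;; (image-fold (lambda (piksel accu x y)
;;               (when (> piksel threshold) (accu! accu x y))
;;               accu)
;;             accu0 image w h)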
Maybe formalize the grid version first. Sobel is

(fn (a)
    (let ((dx (- (a 0 0) (a 0 1)))
          (dy (- (a 0 0) (a 1 0))))
      (+ (* dx dx) (* dy dy))))

Aha! This is why sets are necessary for local binding of loop pointers. It eliminates common subexpressions!

The 'let' form seems to work; however, current parsing messes up the syntax by mis-identifying expressions as grids. This needs some kind of parameter substitution. 'let' doesn't seem to work for expression statements. The code below expands fine, but cgen.ss expands the 'let' incorrectly.

(src->code
 '(loop (= (grid result 0 0)
           (let ((dx (- (grid a 0 0) (grid a 0 1)))
                 (dy (- (grid a 0 0) (grid a 1 0))))
             (+ (* dx dx) (* dy dy))))))
=>
(statements
 (block
  (var int i)
  (for-head (= i 0) (< i (* 400 300)) (+= i 300))
  (block
   (vars (float* a_p0 (+ a (+ i (* 0 300))))
         (float* result_p0 (+ result (+ i (* 0 300))))
         (float* a_p1 (+ a (+ i (* 1 300)))))
   (statements
    (block
     (var int j)
     (for-head (= j 0) (< j 300) (+= j 1))
     (block
      (vars (float* a_p0_p1 (+ a_p0 (+ j 1)))
            (float* a_p0_p0 (+ a_p0 (+ j 0)))
            (float* result_p0_p0 (+ result_p0 (+ j 0)))
            (float* a_p1_p0 (+ a_p1 (+ j 0))))
      (loop (= (grid result_p0_p0)
               (let ((dx (- (grid a_p0_p0) (grid a_p0_p1)))
                     (dy (- (grid a_p0_p0) (grid a_p1_p0))))
                 (+ (* dx dx) (* dy dy)))))))))))

Seems like this is called let* -> some small bug still. Ok, fixed. It's working now:

(p '(loop (= (grid result 0 0)
             (let ((float dx (- (grid a 0 0) (grid a 0 1)))
                   (float dy (- (grid a 0 0) (grid a 1 0))))
               (+ (* dx dx) (* dy dy))))))
=>
{
  int i;
  for (i = 0; i < (400 * 300); i += 300) {
    float* a_p0 = a + (i + (0 * 300));
    float* a_p1 = a + (i + (1 * 300));
    float* result_p0 = result + (i + (0 * 300));
    {
      int j;
      for (j = 0; j < 300; j += 1) {
        float* a_p1_p0 = a_p1 + (j + 0);
        float* a_p0_p1 = a_p0 + (j + 1);
        float* a_p0_p0 = a_p0 + (j + 0);
        float* result_p0_p0 = result_p0 + (j + 0);
        *(result_p0_p0) = ({
          float dx = (*(a_p0_p0) - *(a_p0_p1));
          float dy = (*(a_p0_p0) - *(a_p1_p0));
          ((dx * dx) + (dy * dy));
        });
      }
    }
  }
}

Now, how should I see this? It is an image comprehension with built-in shift operators. What about the following specification syntax:

(for/grid (a result)  ;; grids
          ((i 100)    ;; dimensions, possibly inferred
           (j 100))
  (= (result 0 0)
     (let ((dx (- (a 0 0) (a 0 1)))
           (dy (- (a 0 0) (a 1 0))))
       (+ (* dx dx) (* dy dy)))))

where 'let' uses type inference from the values. I'm not sure whether it's a good idea to have a grid iterator. Maybe factoring in smaller for-loop like comprehensions is a better idea? Sufficiently confused again..

The Hough loop + edge detection should look like this:

(for/grid (a)  ;; grids
  (let ((dx (- (a 0 0) (a 0 1)))
        (dy (- (a 0 0) (a 1 0))))
    (let ((sobel (+ (* dx dx) (* dy dy))))
      (if (> sobel 600)
          (accu! x y)))))

It doesn't look like this is going to work. Too experimental still. Let's move to straight C.

Entry: writing C
Date: Sat Jul 12 12:24:01 CEST 2008

Funny, how I've been disgusted by C to then move on to a higher level of abstraction, only to find that I'm actually enjoying writing C quite a bit because I'm getting better at writing properly factored code. Maybe the trick is really to define an s-expression based language that can do anything C can do, so the compilation becomes incremental rewriting? The approach in ip.ss should maybe be a bit more factored + expose lowlevel constructs.

How to make a better C?
-> type inference + polymorphism
-> local functions (macros): maybe like purrr: manual inlining?
-> 2 stacks, which
would make downward-only local functions easier.

My intuition says that it really can't be too hard to do this in a proper, not too ad-hoc way, but any time I dive into it, it seems as if I don't understand the problem fully. It does look like types are the essential part. Maybe it's best to look at C without the polymorphism of the math operators?

Another thing that might help is to simplify the control flow constructs. Maybe only 'if and 'goto should be kept? Or 'if and 'while(1)? The 'for loop is at least better replaced with a 'while loop. With SSA and CPS being equivalent, how to generate C code such that the SSA form the compiler sees is actually the one we intend?

Entry: typed vs. untyped
Date: Mon Jul 14 10:59:55 CEST 2008

Been browsing through Oleg Kiselyov's papers on code generation. Most of it seems to be based on MetaOcaml and a translation to C. It might be interesting to try to summarize the difference between MetaOcaml's approach to staging, and the hygienic 'syntax->datum / 'datum->syntax.

Entry: multi-stack Forth support
Date: Mon Jul 14 11:24:01 CEST 2008

Currently I'm using structure type inheritance together with pattern matching to be able to perform base type operations on derived types: they simply leave the extended state alone. This is essentially the same as operating on a stack: each type extension adds one stack element. Does this have implications on the implementation level? Is it better to represent state as a stack of states right from the beginning? This would make the update method trivial.

Current conclusion: maybe it's best to keep that method abstract.

Entry: rewriting
Date: Mon Jul 14 11:34:12 CEST 2008

What I call 'eager' rewriting probably has a better accepted name in the literature.

Entry: Generating optimal code with confidence
Date: Mon Jul 14 14:55:20 CEST 2008

http://okmij.org/ftp/Computation/Generative.html

"A Methodology for Generating Verified Combinatorial Circuits", joint work with Kedar N. Swadi and Walid Taha. Proc. of EMSOFT'04, the Fourth ACM International Conference on Embedded Software, September 27-29, 2004, Pisa, Italy. ACM Press, pp. 249 - 258.
http://www.cs.rice.edu/~taha/publications/conference/emsoft04.pdf

This paper and related work seems to be a ticket into the field of Resource Aware Programming (RAP), to find a way to place Staapl's dynamic type approach, and see how static type systems can be of benefit. Reference number [3] talks about a linear functional language, which is pretty close to where I'm going.

The roadmap seems to be something like:
* get educated about type systems (TAPL)
* see what there is to learn about Cat's type system
* translate this to a type system for Coma

References:
[3] related to http://www.sac-home.org/
[30] "Generating heap-bounded programs in a functional setting", TAHA Walid, ELLNER Stephan, HONGWEI XI.

Resource aware programming:
* highly expressive untyped substrate
* stage distinction
* static type systems

The latter is about typing code/circuit generators so they can be composed. I don't know what the untyped substrate is about.

Entry: binary operations
Date: Mon Jul 14 15:23:22 CEST 2008

Composition of binary operations has the following structures:

non-struct:  tree
assoc:       list
comm+assoc:  set

Read the OBJ paper yesterday, and I'm thinking whether the 'theories' approach might be usable in Coma: expressing properties of operators, i.e. associativity, commutativity,...
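For concreteness, a minimal sketch (hypothetical code, not Coma) of how such declared properties could drive normalization: associativity flattens nested applications into a list, and commutativity reorders operands so literals group together for constant folding.

;; Minimal sketch, not Coma code.
(define (flatten-assoc op expr)
  ;; associativity: (+ a (+ b c)) -> (+ a b c)
  (if (and (pair? expr) (eq? (car expr) op))
      (cons op
            (apply append
                   (map (lambda (e)
                          (let ((f (flatten-assoc op e)))
                            (if (and (pair? f) (eq? (car f) op))
                                (cdr f)
                                (list f))))
                        (cdr expr))))
      expr))

(define (sort-comm op expr)
  ;; commutativity: move literals to the front for constant folding
  (if (and (pair? expr) (eq? (car expr) op))
      (let ((lits (filter number? (cdr expr)))
            (rest (filter (lambda (x) (not (number? x))) (cdr expr))))
        (cons op (append lits rest)))
      expr))

;; (sort-comm '+ (flatten-assoc '+ '(+ x (+ 1 (+ y 2)))))
;; => (+ 1 2 x y)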
Entry: MetaOCaml / MetaScheme
Date: Tue Jul 15 12:18:50 CEST 2008

( Edit from: Fri Jun 27 13:02:41 CEST 2008 )

http://okmij.org/ftp/Computation/Generative.html#meta-scheme

4 special forms:

  bracket
  escape
  lift (cross-stage-persistence)
  run

"Scheme's quasiquotation, being a general form for constructing arbitrary S-expressions (not necessarily representing any code), is oblivious to the binding structure."

But quote-syntax and unsyntax do this correctly, right? Hmm.. I don't see it without thinking..

EDIT: "... uses a complex macro ALPHA that is aware of the binding structure. ALPHA traverses its argument, presumed code expression, and alpha-converts all manifestly bound variables to be unique symbols."

This I can understand: alpha-renaming to make sure names are unique before splicing in code.

"Since syntax-rules can only produce globally-unique identifiers but not globally-unique symbols, we must use syntax-case or a low-level macro-facility."

OK, if the goal is to create code that has a symbolic representation, this is clear. The syntax-case version uses generate-temporaries for the unique names. But does it need to be like that? If we're generating code that is eventually to be compiled, why not generate a graph structure directly?

"The macro ALPHA is implemented as a CEK machine with the defunctionalized continuation."

Ok, so it's an interpreter basically. CEK is the machine underlying Scheme, as opposed to e.g. SECD for lisp. The CEK is implemented in syntax-rules.

It might be interesting to see how alpha renaming and cross-stage persistence are problematic or avoided in Staapl/Coma. Alpha renaming is avoided by using a point-free target language. CSP is qw. It supports numbers, target-address dependent expressions which can be reduced during the assembly stage, and macros which need to be eliminated during postprocessing.

Ok. Staapl is quite a bit simpler, because I'm doing metaprogramming and code generation in the same spot. It's only because point-free code is linear + all code generators need a finalization step that this trick works.

Entry: cleanup
Date: Tue Jul 15 12:33:47 CEST 2008

Since the roadmap for further work is pretty clear (TAPL, MetaOcaml, typed stack languages), it's time to finalize Coma so it can be released.

* Fix undefined symbol bugs for monitor=module (OK)
* Load monitor as a module (OK)
* Upload monitor from within Staapl (OK)
* Load/save addresses (OK)
* Write documentation (OK)
* Incremental upload.
* Make Snot repl
* Get the synth going

Entry: control.ss and label.ss
Date: Tue Jul 15 13:53:51 CEST 2008

About the space between state:2stack and state:compiler. It is possible to define the control primitives in 2stack using the 'label' pseudo-op as it used to be. Later, replacing 'label' with the intelligent construct in instantiate.ss gives the possibility to build structured code graphs. Maybe it's worth separating the two?

This works well. It leaves the words

  exit or-jump sym label:

as hooks that can be used to plug in the control flow analysis code from instantiate.ss. The code then uses the pseudo ops

  label jw/if

to represent labels and conditional jumps.

Finalized this: added a separate control/ project directory and renamed the remaining macro/ to comp/ to indicate it's purely about compilation (code tree generation) and postprocessing, not about language definition.

Entry: name troubles
Date: Tue Jul 15 16:01:16 CEST 2008

Loading the monitor code into the target namespace works fine. However, requiring it gives trouble.
(undefined macro/f->)

This is probably due to the use of macro/f-> in the parsing words, while that word is later defined in a source file: at the time of expansion, the macro isn't yet defined. So it needs a stub. This can be solved by adding pic18 specific parsing words that are required into the file.

OK, that works. Added a stub of 'f->' in pic18/macro.ss and created pic18/parsing-words.ss to add the word 'fstring:'.

Entry: documentation -> parser itches
Date: Wed Jul 16 13:00:14 CEST 2008

I feel that the most important layer, Scat -> Coma + Control, is ready. It's simple enough now to be documented. However, the Forth + parser part isn't very well written.. Does it make sense to spend some time on cleaning it up? The main questions are:

Q: Is it possible to write the compiler more as a substitution system instead of a CPS parser?
Q: Is that desirable?

Sticking with the CPS approach, the state that's passed might need some simplification. The mistake I made is to pass an expression that will only be added to on the outside, but what is really necessary is to be able to insert into expressions. This requires a cursor into a tree. The problem is that I don't know that data structure well enough to get to this implementation from an intuitive approach.

Q: What is a zipper?

http://okmij.org/ftp/Scheme/misc.html#zipper

There are two views. One is a fairly straightforward 'reversal of pointers' where each node encodes a path through to the top node using the following data structures:

(define-struct path (left path right))
(define-struct cursor (current path))

A path contains a list of left sibling trees, a list of right sibling trees and a path. If the path is #f the current node is the top node.

According to this:

http://okmij.org/ftp/Scheme/zipper-in-scheme.txt

a zipper can be represented as a delimited continuation of a depth first traversal. It seems this works only for updating nodes, not for adding them. (or not?)

Anyways, simple straightforward manipulation might be enough to transform the tree used in the expression accumulator of the Scat parser into a zipper structure, to get rid of the 'wrapping' problem. Zipper as continuation is actually not so hard to understand: the 'reversed pointers' are really nothing more than stack frame links in the recursive descent of the structure.

Entry: incremental dev
Date: Thu Jul 17 10:51:19 CEST 2008

I think all is in place, I just need to flesh out the normal flow of operations.

How to create a project?

- make sure Staapl is installed in plt/collects/staapl (sudo make install)
- (require (lib "pic18/monitor-p18f1220.f" "staapl"))

This file is an example of a self-contained Purrr project for PIC18. Loading it like this will compile the file and import all its macros and target words into the current namespace. In the following we'll take the interactive approach, but remember that it is possible to automate all of this: .f files are really PLT Scheme modules and can be composed as such.

In order to handle the code interactively, it's more convenient to use a prj environment. This is a scheme namespace object into which all support code can be loaded.

- (require staapl/prj/pic18)

Once the environment is loaded, a Forth repl is available using (repl <string>). This will provide the string to the reader present in the prj namespace. The following command will load a file into the current namespace. Note that this is different from require.

- (repl "load staapl/pic18/monitor-p18f1220.f")

This loads, compiles and assembles the code. Use (print-all-code) to view it.
Use (ihex) to view an intel hex dump, or (save-ihex <filename>) to save it. For convenience, it's possible to call (piklab-prog <device>) to program it. Set the port the target is connected to, i.e.:

(current-console '("/dev/ttyUSB0" 9600))

Use (prj> . code) to execute prj Scat code, and (target> . code) to execute possibly simulated interaction code. I.e. (prj> ping).

- re-establishing contact: this requires target word addresses.

(save-target-words <filename>)
(load <filename>)

Entry: Walid Taha RAP video
Date: Thu Jul 17 13:25:40 CEST 2008

Jan 22 2007 @ google:
http://video.google.com/videoplay?docid=915594482273345538&q=type%3Agoogle+engEDU

Research question: What are the high level abstractions that can be used to keep control over resource use?

goal:
- support expressive abstractions
- ensure safety by static analysis
- don't let this get in the way

means:
- multi-stage programming (MSP)
- reactivity (I/O events)
- advanced type systems

( Staapl does MSP in a traditional untyped / partly dynamically typed way, without special tools for reactivity and static type analysis. Contrasting principle: get MSP to work first in a simple paradigm, add static tools later using DSLs. )

Ideas behind MSP are old. The new approach is to combine this with static tools. Reactivity is combined with MSP by creating program generators for reactive programs with static guarantees.

Essence behind typed MSP: extend with types that are
- delayed values
- annotated with what kind of delayed value

( An essential concern that seems to be solved by MetaOcaml is variable capture. The lucky thing about Staapl is that this problem is avoided: there is never any confusion about binding of values: in the pattern definition language standard Scheme lexical binding is used, while in composition, there are no bound names: all is point-free. The big disadvantage of course is that a stack language has no parameters. For multi->multi DSP code this is a problem. However, FP like array processing languages can be embedded in a similar point-free style, replacing the stack with array structures that also can be re-used in every step. The essential insight is that it's not stack computing that's important, but point-free threaded state that gets discarded after each function application. )

What pops out of the FFT example in the talk is the use of mathematical properties in the generation of the code: algebra of programs, where the program in this case is ordinary algebra :)

Reactive programming: E-FRP, a scaled down version of FRP from Haskell which compiles to event-loop C-code.

Linear types: values can be used only once (consumed). Hoffmann (LFPL). The idea is that in pattern matching, one deconstructs a cons cell, which is then passed to the RHS where it can be reused:

  cons(x, xs) at d  ->  cons(1, xs) at d

Indexed types: a bit like polymorphic types, but with parameters that have values, i.e. lists of a certain size. This goes pretty far: you provide proofs that the type checker checks. This can be very specific: basically, the types could be made to perform the whole computation completely in the type system.

EDIT: look at dependent types, Pierce p. 462

Onwards, maybe this is interesting: gradual typing:
http://lambda-the-ultimate.org/node/1707

( About domain specific knowledge and rewrites: basically, these are theorems about the data types and operators. This might be abstracted by generating parameterized theorems, but then those can probably be specified as more complicated rewrite rules. One possible step is to start from equations, and distill directed rewrite rules.
)

Entry: The Expression Lemma
Date: Thu Jul 17 16:20:18 CEST 2008

http://blogs.msdn.com/ralflammel/archive/2008/07/16/the-expression-lemma-explained.aspx

Is this right in the middle of point-free code, where imperative and functional meet? I.e. the application of the composition (f g h) to the state x in

  x @ (f g h)

can also be seen as the interpretation of the sequence of messages f g h by the object x:

  [x f] [x g] [x h]

Entry: Algebraic types
Date: Fri Jul 18 10:11:48 CEST 2008

Roadmap for today: find out exactly how 'pattern is based on algebraic types or not. Explain this in the scribble doc, upload the doc to the server and send an email to Walid Taha.

http://planet.plt-scheme.org/package-source/dherman/struct.plt/2/4/doc.txt
http://en.wikipedia.org/wiki/Algebraic_data_types

"An algebraic data type is a datatype each of whose values is data from other datatypes wrapped in one of the constructors of the datatype. Any wrapped datum is an argument to the constructor. In contrast to other datatypes, the constructor is not executed and the only way to operate on the data is to unwrap the constructor using pattern matching."

Let's stick to duck typing. I don't see any essential differences between the way the instruction mapping works and a re-implementation in terms of algebraic types. The pattern matcher solves the basic organization of the data stack. On top of that (data within instructions) any scheme data type and plt-match syntax can be used.

Entry: Generating typed programs
Date: Fri Jul 18 22:39:03 CEST 2008

Actually, generating a statically typed system in a dynamically typed language isn't such a crazy idea. The type checking of the generated program can happen at generator run time, at which point the generator's dynamic types are also available.

However, what I am doing in Staapl is to generate the target program, but also to compile it immediately. There is not really an intermediate generated program representation other than the data types exchanged between code processors. This data however could really be representing code bodies for embedded languages. In any case, static guarantees about the target program are really dynamic checks of the data passed between code processors / generators. At generator compile time, I can't check anything.

But is this really necessary? What is lost is the ability to typecheck generator components. Only when they are instantiated can they be tested. Adding an explicit test suite for generators solves that problem.

Entry: The API
Date: Fri Jul 18 22:48:13 CEST 2008

I'm worrying about publishing the API to the Staapl system. The important observation however is that the 'patterns, 'compositions, 'scat: and 'macro: forms are really enough to start building abstractions on top of, if necessary. Maybe I should really take the slow approach: document only those functions that are necessary, and take a slow start.

Entry: Zipper for the parser
Date: Sat Jul 19 12:28:27 CEST 2008

The idea is this: currently the parser api uses 2 syntax elements to pass around, and some context. It would be better to dump as much of the context into the passed state, and also encode that state into a single object that acts as a normal syntax transformer.

The standard tree to accumulate is this:

  (lambda (state) x) | state

where the x point is added to. The 'locals' parser takes the current lambda expression and assigns it to a variable in a let expression, and builds a new lambda expression.

How to represent a zipper in syntax objects?
If we're only representing trees where the cursor is at the rightmost slot in a node, the data structure becomes simpler:

  (state ((lambda (state) #f)))

(define (stx-zip-up stx)
  (syntax-case stx ()
    ((node ((siblings ...) parent))
     #`((siblings ... node) parent))))

(define (stx-zip-down stx)
  (syntax-case stx ()
    (((siblings ... node) parent)
     #`(node ((siblings ...) parent)))))

Entry: incremental upload
Date: Sat Jul 19 14:07:20 CEST 2008

The idea is that code goes 'somewhere'. It is associated to a resource. All code compiled inside a compiler namespace is registered to a central code registry. Every time code gets TRANSFERRED somewhere else, the corresponding code is marked as old. Transfer means to either write out a hex file or similar, or to upload it directly to a target.

The simplest interface seems to be indeed the mapper: this can ensure the operation completed before state is changed. Onward: using map/mark-target-code, create a function that uploads the binary code using code defined in tethered.ss.

Ok, I remember: the last thing I did here was to use the comprehensions to build a formatter for upload-bin. That code now needs to be tied to getting the last binary code. Simple match? No: I got annoyed by the absence of a Scheme interface to the interaction code in tethered, one that automatically connects to the console. Let's write that first. Maybe the simpler solution is to add a default somewhere?

I moved some of the tools/io.ss code to live/console.ss since it's quite specific. Can probably merge together. Added the 'with-console function to run arbitrary code in connection with the console. This is still not good enough. Need a real connection. But let's postpone this decision until after the highlevel part of the code is done. Verified that the programming worked using a read 'fbytes>list.

The 'ihex function in pic18.ss uses 'auto-bin to produce binary code. Maybe upload should work similarly?

( This level is really a bit of a mess.. Internally code is organized well, but on the outer levels it's a patchwork.. probably because it's a state machine written around code state, and there are several format conversions going on.. )

The interactive state consists of:

* code compiled up to now, possibly marked as old
* core + project macros (the concatenative language)
* the current upload point for interactive dev

The last one is still missing: assembly doesn't save the memory pointers. Let's place them in target/code.ss.

Entry: functional code graphs
Date: Sat Jul 19 20:02:24 CEST 2008

I'd like to move back to functional data structures for the code graphs. There is simply too much fuss with bookkeeping, so let's move to functional types.

* Graphs: in order to make graphs in a functional language, one needs to see the graph as an infinite tree. Such structures can only be defined in a lazy manner. In scheme, this requires explicit use of delay/force.

* Unroll the updates: it is necessary to write updates as different data structures that refer to their parents. This involves:
  - code compilation + linking (target-word-code target-word-next)
  - assembly (target-word-address + target-word-bin)

Of course, thinking about it now, the reason this all is imperative is that linking is simplified using the code -> word patch after instantiation, and assembly is easy because the address and bin slots can be updated multiple times.

Maybe this is a more sane approach: once code is marked 'old' it is effectively frozen, and can never change (be re-compiled or re-assembled).
It is also completely concrete at that point, and should be serializable to disk.

FIXED: made it a bit simpler, using separate *chain* and *bin* stacks in pic18.ss

NEXT: 'upload-bin' seems to perform a binchunk-split, while this is also done in pic18.ss: where is best?

Entry: Reachable vs. Incremental
Date: Sat Jul 19 20:40:51 CEST 2008

There are 2 models of developing code:

* Standard incremental Forth: assemble everything that's generated, in the order in which it appears in source code. For subsequent code, just append to the already defined code.

* Reachability: define some entry points into the code, and assemble the serialized reachable graph. This allows for more elaborate dead-code elimination.

To make it clear, I'm renaming code.ss to incremental.ss since that part is only necessary for incremental dev.

Also, it seems there was a confusion between 2 parts of state that need to be maintained:

- compilation: symbolic assembly code
- assembly: binary code + allot pointers

Maybe it's best to provide a simplified interface that performs all of this at once? What is necessary is some kind of transaction model. Instead of having 'repl' perform just compilation, it needs to do assembly too. The result of 'repl' is an updated target/incremental.ss state and a list of to-be-uploaded code.

Q: Is separate compilation/assembly necessary?

Probably only when debugging macros, and then it is probably easier to use the 'macro> interface, so the control flow analysis and optimization doesn't get in the way. We'll see later on if a finer granularity in the api is necessary.

Let's replace 'register-code' with a hook, so the behaviour of what exactly happens when a file is loaded/required is pluggable. Hook works fine: this makes things a lot easier. For example, now prj/pic18.ss has full control over what happens when code gets loaded: it defines two modes: one that accumulates binary code and increments code addresses, and one debugging mode that simply prints out symbolic asm.

Entry: hygiene
Date: Sun Jul 20 03:28:30 CEST 2008

Looks like variable capture in macros is quite a bit more complicated than I thought. Reading the MacroML paper:

http://portal.acm.org/citation.cfm?id=507635.507646

Basically, introducing binding forms during a source code transformation requires the assurance that no free variables are captured (= hygiene). A sure way of doing this is to use generated names that come from a namespace exclusively allocated to that particular source code transformation. The other way around (referential transparency), the names introduced should be related to those visible at syntax transformer definition time, not those visible in the lexical expansion context. As far as I understand, in MetaML and MacroML renaming is used also.

Q: what are freshness conditions?

No idea. Maybe this has to do with generated names?

The paper contains an interesting section about recursive macros and 'early' parameters: those necessarily evaluated to make sure expansion is finite. Hmm.. So MetaML is really about evaluation order: making sure some evaluations happen before other ones, independent of the language's default normal/applicative order. However, the point about substitution at the end of section 3 I don't really understand. Why is there never any variable capture?
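A toy illustration of the capture problem (generic Scheme sketch, not MacroML): a code transformer that introduces a binding silently breaks when the user's code already uses the introduced name, unless the name is freshly generated.

;; Naive version: introduces the fixed name 'tmp', which captures
;; the user's variable when a or b is literally 'tmp'.
(define (naive-swap! a b)
  `(let ((tmp ,a))
     (set! ,a ,b)
     (set! ,b tmp)))

;; Renaming version: a generated symbol cannot clash with user code.
(define (renaming-swap! a b)
  (let ((tmp (gensym 'tmp)))
    `(let ((,tmp ,a))
       (set! ,a ,b)
       (set! ,b ,tmp))))

;; (naive-swap! 'tmp 'x)
;; => (let ((tmp tmp)) (set! tmp x) (set! x tmp))  ; x is NOT swapped
;; (renaming-swap! 'tmp 'x) works: the introduced name is unique.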
Entry: Staapl pillars
Date: Sun Jul 20 03:51:45 CEST 2008

STACK/POINTFREE:
- stack machines have efficient VMs / hardware implementation
- maps to clean functional semantics
- imperative code looks functional (stack gives referential transparency)
- easy to express partial evaluation / rewrite rules
- metaprogramming simplified: no hygiene or reftrans problems

DYNAMIC TYPING:
- simple dynamic type system: easy to understand. basis = pattern matching transformation.

INTERACTION:
- incremental development
- target-view console

OPEN QUESTIONS:
- type systems: how to add more static analysis
- embed array processing languages

As a summary, I think over the course of a couple of years I've found the proper factorization of the program, and as good as optimal syntactic constructs for extending it. Disadvantages? Mostly that the base language is a stack language (matter of taste), and that dynamic generation can produce obscure errors and won't catch type errors.

Bottom line: a simple highly extensible metaprogramming system for tiny controllers without arbitrary abstraction walls + a practical interactive framework.

Entry: the target: language
Date: Sun Jul 20 12:59:14 CEST 2008

Somehow '(target> ts) doesn't work any more:

reference to an identifier before its definition: scat/ts in module: "/home/tom/staapl/live/target-lang.ss"

but it is defined in the namespace.

FIXED: didn't include target.ss in parsing-words.ss, so the substitution macros didn't see those words.

NEXT: full target console + synth

Entry: the synth
Date: Sun Jul 20 14:37:39 CEST 2008

I don't have control stack juggling words defined, so I'm using the opportunity to use some macros + locals (very useful for 1 -> many maps like accessing 2-byte variables).

Entry: ,,geo-seq test case
Date: Sun Jul 20 15:06:38 CEST 2008

An opportunity to test table generation and recursion in macros.

geo-seq ( start endx length -- )

This brings up an important issue: availability of target values. In the MacroML paper these are called 'early' parameters. Let's define them in Coma to mean values that do not depend on target word addresses, and as such can be evaluated at compile time.

The generator works fine; had to change some things due to the new tscat: macro. But.. there's something wrong with phases: requiring the code doesn't really seem to work! Identifiers don't get required in time.. This looks like trouble... Let's avoid this for now.

Entry: asm overflow errors
Date: Sun Jul 20 20:17:37 CEST 2008

Forward jumps cause problems due to target addresses being aligned at zero. The easiest way around this is probably to ignore these errors in the first phase? Done. Got it to compile now.

Entry: pattern matching guards
Date: Sun Jul 20 21:23:28 CEST 2008

Next problem:

bang:
  0401 6EEC [dup]
  0402 52EF [movf INDF0 1 0]
  0403 52EF [movf INDF0 1 0]
  0404 52EF [movf INDF0 1 0]
  0405 52EF [movf INDF0 1 0]
  0406 52EF [movf INDF0 1 0]
  0407 52EF [movf INDF0 1 0]
  0408 52EF [movf INDF0 1 0]
  0409 50E9 [movf 4073 0 0]
  040A 6E18 [movwf other-task 0]
  040B 0E10 [movlw 16]
  040C 6EFC [movwf 4092 0]
  040D 0EF0 [movlw 240]
  040E 6EE1 [movwf 4065 0]
  040F 0EE0 [movlw 224]
  0410 6EE9 [movwf 4073 0]
  0411 52EF [movf INDF0 1 0]
  0412 501A [movf (sound 1 +) 0 0]
  0413 D50E [jsr 1 execute/b]

The first part comes from suspend, which properly expands using 'macro>:

box> (macro> suspend)
[save] [movf 4085 0 0]
[save] [movf 4086 0 0]
[save] [movf 4087 0 0]
[save] [movf 4057 0 0]
[save] [movf 4058 0 0]
[save] [movf 4092 0 0]
[save] [movf 4065 0 0]
[save] [movf 4073 0 0]

It's in the binary .hex code too.
Maybe a bug in postprocessing? It's this one:

(([,op POSTDEC0 0 0] [save] opti-save) ([,op INDF0 1 0]))  ;; NEED SYNTAX

Hmm.. how to match against the value of a parameter? Ok, fixed by using a general curried function creator.

Entry: compile/execute vs. run
Date: Mon Jul 21 11:23:30 CEST 2008

Due to multi-stage semantics, the meaning of these 3 words requires a little thought. There are several cases of quoted data to be handled:

  macro
  label
  symbol

Currently, 'compile can handle it all, 'run handles macros and delegates to ~run, while 'execute handles labels and delegates to ~run. What about providing a basic ~run, and wrappers around it? (Note that this is probably a symptom of ill-typed code: macros cannot be target values.. why is that?)

Conclusion: they really do different things. 'run is the clean Coma version (Coma doesn't have labels), 'compile won't delegate to the runtime ~run, and 'execute is a possibly optimized lowlevel execute which delegates to ~run.

Entry: Higher order macros
Date: Mon Jul 21 12:38:23 CEST 2008

It seems pretty clear now that higher order macros should be built on top of the Forth control primitives.

* Forth code is not structured on the syntactic level: all control structures are a consequence of the semantics of control macros. Now, this is a powerful mechanism in itself, but it really is more concrete/lowlevel than quoted code fragments: I don't see a simple way to extract structured data from this.

* Otoh, all functionality to implement higher order macros is defined in the Forth control language.

So, to add control structures to Coma (anything that involves branching), it is better to build those on top of control.ss and shield that namespace using the module system. Because higher order Coma has loop bodies in a clean rep, it can perform more optimizations.

Conclusion:
- Forth Control depends on pure core Coma
- Coma Control depends on Forth Control.

Entry: snot repls
Date: Mon Jul 21 12:46:48 CEST 2008

Roadmap:
- compilation repl OK
- parser + interaction repl OK
- polish commands.ss

It's probably more useful to only keep track of assembly code that's not been uploaded or saved yet, so I'm changing pic18.ss, moving kill-bin! to kill-code!

Upload is working from the console now. Next: load .f files into the namespace using something a little less raw than "load <filename>". This requires moving a piece of code from forth/parser-tx.ss to forth/lexer.ss ... In order to get the relative loading to work properly, forth-load/compile just expands to the 'load' word + filename inlined as string.

One more thing: in order to be able to use 'load' in the interactive console, one needs to have access to reflective operations. So this should work:

(define (forth-load filename)
  (eval `(forth-load/compile ,filename)))

This seems to work. I put it in live/reflection.ss.

Next: mark. (hmm.. a lot of this convenience stuff needs to be re-implemented..)

Entry: mark & empty
Date: Mon Jul 21 17:28:29 CEST 2008

Mark probably won't work like it used to: it needs a stack of current words.. Maybe the run-time state in pic18.ss needs to be implemented as a stack?

Q: Can this be implemented properly instead of hacked together? This means: perfect restoration of a namespace. Can the namespace itself be dupped?
Q: Can we somehow serialize the namespace?
Q: A procedure can be serialized, but a closure can't: is this true?

Let's hack it together first. Hmm.. All this depends much on what I want to accomplish. Simply put, the only operation I'm interested in is to REPLACE some interactively loaded code.
In all the cases I've been working in so far, the application consists of:

(C) a fixed core
(I) incremental replacements

Here (C) is completely source-defined, while (I) is the incremental part. Maybe this is a better model to work with than setting (C) to be only the monitor code. Maybe 'empty' should always go right up to (C), and not use a stack of restore points. It sounds as if it is cleaner, but I've never used it effectively, because it requires some mental tracking while mostly you just want to start from a clean sheet.

So, let's pick the best of both worlds: no mark/empty. If you want empty, recompile and reflash the app. This also reflects a need that occurred in the previous approach: sometimes things go bad, and what you want is to go back to a working point fairly quickly. Eventually, this will require custom programmers. But let's do it with the ICD2 first.

'mark and 'empty are currently implemented in the simplest way possible: just tracking the words. Some extra safety can be built on top of this, but essentially, once you use 'empty the namespace and target are out of sync.

Entry: substitutions
Date: Mon Jul 21 19:44:50 CEST 2008

Something I don't really get is why substitutions don't get name-checked before they are used.. They are macros, maybe that's why? The problem is that some definitions might not work. Is there a way around this? Maybe evaluate the code somewhere? No.. the identifiers are only interpreted when the macro is invoked. Before that, no checks can be made.

Entry: project reload
Date: Mon Jul 21 21:48:52 CEST 2008

Can't install a new namespace from within the namespace, so need to work around this by throwing some exception/abort.

Entry: done?
Date: Mon Jul 21 22:07:27 CEST 2008

Need to check the synth code if it's still working, but as far as I can see I'm done. Some minor toplevel organization things + ICD2 programmer interface.

Entry: problems
Date: Tue Jul 22 00:08:20 CEST 2008

So, what didn't work out? I'm getting a bit hyped up with a nearing release, maybe time to list the things that I've been stressing about:

* catching loop bodies into functional representations
* the simulator: over-the-top staging challenge
* dsp language: probably will become AP language
* C code excursion: need firm ground to work on the grid iterators

Entry: disassembler
Date: Tue Jul 22 00:58:05 CEST 2008

Forgot about that.. Maybe try to get it working first.

Entry: Graph structured lambda calculus, SECD, ...
Date: Tue Jul 22 01:29:08 CEST 2008

I'm tired so this might be nonsense.. Something I never understood is the obsession with keeping lambda representations flat. For source transformations it makes a lot more sense to represent lambda terms as a graph instead of a tree: explicitly connecting reference sites with binding variables.

EDIT: this is actually what de Bruijn indices do: they point upwards in the graph structure, counting abstractions.

Writing this as a graph gives a directed acyclic graph which is (related to?) the dataflow graph of the computation.

Anyways: SECD and Forth:

  S = param stack
  E = allot stack
  C = instruction pointer
  D = return stack

http://www.cs.utah.edu/classes/cs6520-mflatt/s00/secd.ps

SECD is lispey while CEK is schemey.

http://planet.plt-scheme.org/package-source/robby/redex.plt/1/0/doc.txt

Entry: dasm
Date: Tue Jul 22 12:41:44 CEST 2008

The assembler has an ad-hoc type system, where operand names determine the type. This is used for checking overflows of jumps and implementing absolute/relative addressing.
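As an illustration of that kind of operand typing (a sketch with a simplified addressing model, offset counted in words from the next instruction, not the Staapl assembler's actual rules): the operand's declared kind selects both the encoding and the range check.

;; Hypothetical: an 8-bit relative-jump operand, so its encoder also
;; performs the overflow check.
(define (encode-relative here target)
  (let ((offset (- target here 1)))
    (unless (and (>= offset -128) (<= offset 127))
      (error 'asm "relative jump out of range: ~a -> ~a" here target))
    (bitwise-and offset #xFF)))  ;; two's complement, 8 bits

;; (encode-relative #x0240 #x0243) => 2
;; (encode-relative #x0000 #x0200) => error: out of range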
Anyways, I'd like to use the disassembler to build target code chains, so they might be used later to be re-generated. The question is: where should the labels refer to? Maybe solve that problem later, and first get a bin->chain converter working.

Ok, minimal dasm working. Needs some tuning + more configurable behaviour (symbol resolve + word / address size etc).

Entry: the synth
Date: Tue Jul 22 21:05:59 CEST 2008

Almost there. Things to do:

* boot + isr vectors
* whole app build script
* piklab-prog
* project reload (scratch)

Entry: piklab
Date: Wed Jul 23 14:01:16 CEST 2008

Synth doesn't work. Time to get piklab-prog to work without having to re-plug the board: using run etc.. OK

The problem seems to be in the binary code chunking: the first chunk it produces is correct, but the remaining ones are not:

(map car (car *bin*))
(576 0 688 48 50 2142)

The problem is data chunks. How do they end up in the code? The problem is the conversion of words to binary: this should take only code words. The error is in 'target-chain->bin: added a realm filter.

Looks like there's still a problem: there are 3 code chunks remaining now:

box> (map car (car (bin)))
(576 688 2142)

The problem could be that data chunks do not get disconnected. Looks like that was it: added 'terminate-chain after the variable macro.

Ok, booted the synth, but it doesn't work properly. This means I get a chance to test some of the debug features. There's something wrong with the 2nd instruction: 089A E1B3 [bpz _L717 1]. The address is way off. The 3 is correct, but where does the 'B' come from? (It should be E103.)

compile> : boo 0 xor z? if -1 else 0 then ;
command> print-code

boo:
  0898 0A00 [xorlw 0]
  089A E1B3 [bpz _L717 1]
  089C 6EEC [dup]
  089E 0EFF [movlw -1]
  08A0 D002 [jsr 1 _L718]
_L717:
  08A2 6EEC [dup]
  08A4 0E00 [movlw 0]
_L718:
  08A6 0012 [return 0]

The 3 above looks like it's accidental.

compile> : boo z? if 123 then
command> print-code

boo:
  08A8 E1AC [bpz _L719 1]
  08AA 6EEC [dup]
  08AC 0E7B [movlw 123]
_L719:

This should be E102. Let's go back to only the monitor. This is a problem with this:

(bpc  (p R) "1110 001p RRRR RRRR")
(bpn  (p R) "1110 011p RRRR RRRR")
(bpov (p R) "1110 010p RRRR RRRR")
(bpz  (p R) "1110 000p RRRR RRRR")

which I thought was fixed. This line was wrong:

(([flag? opc p] [qw l] or-jump) ([,opc (flip p) l]))

compile> : foo z? if 123 then
command> print-code

foo:
  0240 E102 [bpz 1 _L184]
  0242 6EEC [dup]
  0244 0E7B [movlw 123]
_L184:

It works!

Entry: packaging + prepare release
Date: Thu Jul 24 10:01:05 CEST 2008

http://docs.plt-scheme.org/mzc/plt.html

- clean up darcs project init (local collects?) OK
- build plt package
- clean up forth.pdf

Entry: old web site
Date: Thu Jul 24 12:04:55 CEST 2008

To understand the development approach and the current form of the source code, it might be necessary to see it in the right context. I am an electrical engineer working mostly on embedded control and signal processing projects. I seek to optimize the development process of highly specialized software for embedded systems by small groups of 1 to 3 people. I got fed up with ad-hoc methods of metaprogramming and code generation that I see used in this engineering subculture, and decided to build a clean system on a solid base that can be understood and used by a single electrical engineer with an open mind towards modern programming language technology. I am not a programming language theorist, and if you want to use Staapl, you don't need to be either.

The current emphasis is on work towards Purrr, a stand-alone standard Forth layer for generic microcontroller architectures, and Purrr18, an interactive tethered cross-compiled Forth dialect designed for the 8-bit Microchip PIC18 microcontroller. Future goals include the design of a linear concatenative language as a successor or drop-in replacement for the Packet Forth interpreter, and the design of a declarative Scheme derived data-flow language to implement DSP functionality on a microcontroller or DSP processor. Eventually I want to cover the whole spectrum from tiny 8-bit microcontrollers to 32-bit machines that can run unix, with an integrated language tree based on Forth and Scheme dialects, and an interaction system that can handle live software updates and debugging for distributed embedded applications.

Entry: staapl home
Date: Fri Jul 25 14:19:48 CEST 2008

In the reflection code, there are hard links to the location of the staapl tree. Maybe these should be made soft such that staapl can be installed anywhere: trying to host it on Planet gives some trouble.. Maybe I should provide a 'staapl-install function that will install wrapper modules around the planet modules.

OK, I've got a solution. The preferred module language is

  #lang planet zwizwa/staapl/pic18

This will allow easy install of staapl through planet. I've removed all #lang references from the .f files though: going to use 'load' for most things, and only wrap toplevel code in module languages.

Entry: cleanup dist + docs
Date: Sat Jul 26 09:26:22 CEST 2008

- make sure examples are in the planet dist
- clean docs + add to planet dist

Examples. There are 3:

* compile + burn monitor
* compile only synth
* start only repl

REPL is moved to core and renamed to staapl/prj/pic18-repl, while the examples are now written as modules and accessible through

  mzscheme -p zwizwa/staapl/examples/upload-monitor
  mzscheme -p zwizwa/staapl/examples/build-synth

Next is to clean up the docs. Maybe the Purrr manual should be moved to scribble too? This would allow some testing + documentation of live interaction. For the Forth doc, it would be nice to write a small macro for evaluating chunks of literal forth code.

Entry: offline compilation example
Date: Sun Jul 27 12:07:27 CEST 2008

What is necessary is a script that compiles a PIC18F1220 application from an input forth file, including a monitor and a proper boot sequence. This means:

* read input arguments
* create namespace with instantiated monitor
* instantiate the script
* add a simplified boot mechanism
* dump out .hex and .dict

Added staapl/prj/pic18f1220-serial which loads the 18f1220 defs and the monitor code.

NEXT: need to solve the path issues with 'load'. But where? For convenience I'm going to put it in the lexer module, but it should really be somewhere else.. Yes. This is not trivial, since the path needs to be available at compile time. Currently:

- lexer is free of paths
- relative paths come from the rpn-search-path parameter in parser-tx
- the pic18 path is encoded in parser-tx
- that can move to forth-begin, where the param can be set

Problem: how to make the load path available at compile-time? More specifically, how to set the parameter rpn-search-path? Simply setting it at runtime doesn't work since it's a different instance. Maybe using 'eval' helps? Hmm... this is a can of worms. Maybe a way out is to add a form that sets the load path, just like before.

This works. Remaining question: should this be a permanent state? Also, what about modules?
Requiring a module will already use load-relative I think. YES. So, the remaining problems are purely about interactive 'include'. This means it can probably best be solved there. The remaining question is whether in parser-tx.ss the search path should be reset on each compilation.

Entry: the state file
Date: Tue Jul 29 10:03:44 CEST 2008

So, now that I'm converging to a certain workflow (fixed core application + incremental dev on top of that), it's possible to define a state file which contains:

- a reference to code to be loaded for macros   FIXME
- a dictionary with target words                OK
- the pointers                                  OK
- console                                       OK

FIXME: Make setting up the console part of the responsibility of prj, so it becomes easier to metaprogram from Scheme. In the end, Scheme is the main composition mechanism, not Scat.. Oops. That sounds good but it messes up the unquoting in the macros.

Entry: associativity after instantiation
Date: Wed Jul 30 10:30:07 BST 2008

Working on the staapl/pic18 documentation... There's one thing I'm noticing now: the high-level semantics talks about associativity. This holds for composition of macros, due to the associativity of function composition. However, this property is NOT preserved through instantiation:

  I(C(x,y)) != C(I(x),I(y))

  C = concatenation
  I = instantiation

Actually, this property is essential for some optimizations that expose 'observable' code: jump targets. This is why the chain splitting is so important in the instantiation step.

Entry: Onwards: concurrency / types / ans
Date: Wed Jul 30 08:51:04 CEST 2008

* This is interesting: http://www.transterpreter.org/docs/index.html  I'm a bit sad the allotted time didn't allow me to implement the distributed debugging system for KRIkit last year. But surely, concurrency is the next step for Staapl, next to some more static analysis.

* Reading about types lately, especially TAPL. This article summarizes it well: http://www.pphsg.org/cdsmith/types.html  Types are "things to prove about programs". This can really mean anything. According to Chris that's the big idea. This is significantly different from "types are sets" in the dynamic/lisp sense.

* ANS Forth on top of Staapl Forth? It is not straightforward, but it could take the form of a standard Forth compiled to Staapl primitives with a simulated dictionary. More importantly, is it necessary? Should Staapl contain a mechanism to build a standard reflective Forth on top of the unrolled macro Forth? It looks like this is a ``marketing problem'' more than anything else.

Entry: editing the forth paper
Date: Wed Jul 30 12:01:32 BST 2008

Took this out because it might confuse people:

\footnote{Being aware of patterns is what programming is all about. It is important to see patterns in your problem, so you can compress the problem down into a feasible solution. However, it might be \emph{more} important to close the loop and see the patterns in your \emph{solution}, so you can bring your understanding of the problem to a higher level. For significantly complicated solutions buried in code, the code can really talk back by throwing residual patterns at its creator.}

Entry: time for play?
Date: Thu Jul 31 02:23:33 CEST 2008

I'd like to use the dsPIC (PIC30) to make some sound. Was thinking about targeting the gpasm assembler for it, instead of writing one from scratch, since the architecture is significantly different from the 8-bit ones. It would be an interesting test case for serializing target label computations. Hmm.. Looks like gpasm doesn't support PIC30. Ok, from scratch then.
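For reference, the core of a from-scratch assembler is small. A minimal sketch in plain Scheme (not the Staapl assembler; names are hypothetical) of two-pass label resolution: pass 1 assigns an address to each label, pass 2 substitutes those addresses into the operands.

;; Program: a list where a symbol is a label and a pair is an
;; instruction.
(define (assemble program)
  ;; Pass 1: walk the program, recording label addresses.
  (define labels
    (let loop ((p program) (addr 0) (acc '()))
      (cond ((null? p) acc)
            ((symbol? (car p))  ; a label: record address, emits no code
             (loop (cdr p) addr (cons (cons (car p) addr) acc)))
            (else (loop (cdr p) (+ addr 1) acc)))))
  ;; Pass 2: replace label references by their addresses.
  (define (resolve ins)
    (map (lambda (arg)
           (cond ((assq arg labels) => cdr)
                 (else arg)))
         ins))
  (map resolve (filter pair? program)))

;; (assemble '(loop (goto loop)))  =>  ((goto 0))

The real thing additionally needs relaxation passes for variable-size encodings, but the label bookkeeping stays this simple.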
Will start tagging posts.

Entry: dsPIC30
Date: Thu Jul 31 12:54:50 CEST 2008

I'd like to generate an assembler from an instruction set table for the dsPIC30, however, I can't seem to find one. Just sent a request for information about the dsPIC30/33 to Microchip. Also added the 12 and 14 bit core instruction sets. Maybe try to get to a blink-a-led app for the 14-bit arch? It would be interesting to figure out some PIC code sharing, to test the flexibility of the Staapl design.

EDIT: got an unhelpful reply from Microchip, trying again. In the meantime I found this:

  http://ww1.microchip.com/downloads/en/DeviceDoc/mplabalc30v3_00.tgz

which contains a file src/c30/c30_device.info and some C routines to manipulate it.

Entry: 14 bit arch
Date: Thu Jul 31 16:58:15 CEST 2008

How to add a new architecture?

ASM
- create an assembler description

COMP
- add some pseudo ops for stack manipulation
- write metapatterns for the arithmetic and logic operations

COMBINED
- connect code to purrr like purrr.ss (so 'forth-compile can be used)
- connect the assembler

Works pretty well. Now trying to restructure it a bit.. After fixing a bug in meta-pattern which prevented the use of (macro: ...) and some minor cleanup, the 14-bit core seems to work:

  box> (forth-compile ": foo 123 + ;") (print-code)
  foo:
      0000 307B [movlw 123]
      0001 0780 [addwf INDF 1]
      0002 0008 [return]

I'm taking a different approach for MC14: no intermediate instructions except for the ones used in purrr. See where I get..

Entry: meta-pattern
Date: Thu Jul 31 18:09:24 CEST 2008

This is a classical evaluation order manipulation. I know what I want to do from a high level, but I somehow don't understand the particularities of it. Quasiquotation is really confusing.. Factoring it into simple steps might help:

  meta-pattern (M0) is a macro that generates a macro M1;
  M1 expands to a number of applications of a template defined in M0.

Let's try to construct a toy example first. What I don't understand is the nesting of syntax-case, and the nesting of quasisyntax and unsyntax. The rule: an unsyntax corresponds to the toplevel quasisyntax, just like quasiquote, and nesting of syntax-case just binds new variables, but the toplevel ones are still visible. Nothing special.. Basically, syntax-case is a binding mechanism that allows you to avoid unsyntax. Nested quote/unquote is a mess, so solving this by merging namespaces (variables and metavariables) is more convenient. For higher order macros it's best to stick with syntax-case and syntax, and leave nested ellipsis and quasisyntax/unsyntax alone.

Looks like it's working now. It's quite readable. I've also added a macro 'patterns-class that combines 'meta-pattern and its invocation. This gives a pretty compact representation:

(patterns-class
 (macro)
 ;;---------------------------------
 (op    pe/op opcode w/op  ~op)
 ;;---------------------------------
 ((+    +     addwf  w/+   ~+)
  (-    -     subwf  w/-   ~-)
  (and  and   andwf  w/and ~and)
  (or   or    iorwf  w/or  ~or)
  (xor  xor   xorwf  w/xor ~xor))
 ;;---------------------------------
 ((w/op)             ([opcode INDF 1]))
 (([qw a] [qw b] op) ([qw (tscat: a b pe/op)]))
 (([qw a] op)        (macro: ',a movlw w/op))
 ((op)               (macro: ~op))
 ((~op)              (macro: w<-top drop w/op)))

Entry: multiple targets
Date: Fri Aug 1 00:04:29 CEST 2008

Time to start factoring and parameterizing code. First thing to tackle is the chain/bin state management: everything that goes into mc14.ss should be factored out. Created live/state.ss

Entry: ANS Forth
Date: Fri Aug 1 01:17:06 CEST 2008

So.. It seems like a good idea to pick up the ANS Forth again.
The only freedom in implementing one is what kind of threading model to use, and where to put stuff. Some requirements:

* subroutine threaded
* call/jump/or-jump
* data word doubler

This can probably be implemented as a small layer on top of standard Forth as done before:

  num -> 'num dup upper

Entry: control flow analysis
Date: Fri Aug 1 10:35:10 CEST 2008

Chunks of code in target-word structs are basic blocks in Muchnick's terminology. (Function calls don't count here, because they don't actually change the flow of control arbitrarily: they are equivalent to inlined instructions.) One thing I probably need to change is to separate basic blocks after conditional branches. These are the basic building blocks for control structures:

* unconditional jump
* conditional jump
* conditions

It seems essential to be able to represent the condition generators in an abstract form, so they can be easily inverted. (The code that generates the flag can be inverted, instead of the flag being inverted after it's generated.) On the other hand, it should be possible to build a control flow graph with non-instantiated code. This is the problem I tried to solve with delimited control, but there have to be better ways.. Some form of reflectivity on the macro end might be necessary: representing non-primitive macros as lists?

Entry: tool integration
Date: Sat Aug 2 10:41:35 CEST 2008

Preparing for professional usage, this project needs:

* better integration with MPLAB (linker)
* interface with C-based development.

Entry: Factor
Date: Mon Aug 4 18:43:59 CEST 2008

Let's take another look at Factor. What I'm interested to find out is how the compiler is structured. Let's see if there are any documents on the blog describing it. These seem to be interesting links:

  http://factor-language.blogspot.com/2007/09/two-tier-compilation-comes-to-factor.html
  http://factor-language.blogspot.com/2008/01/compiler-overhaul.html

Hmmm.. They are more about the dynamic vs. static debate. I think I've converged on that: both are nice, but static declarative modules win. Toplevels can be built on top of that. For PIC, everything is static, and redefinitions need to be reloaded, but it does allow for an 'allot-stack' like development which allows separation of kernel and application.

Entry: ANS Forth
Date: Tue Aug 5 09:32:01 CEST 2008

It seems that standardizing is an essential part of getting to some adoption. Basically, nobody cares about nonstandard Forths: people write their own. Makes sense really. So, let's bring the PIC18 Forth to standard. The goal is to do it as conveniently as possible, without losing too much time on optimization. Some ideas I gathered before:

* Data doubler: add a layer that performs just primitive data size doubling.
* Unified address space: map part of RAM into the namespace.
* Interpreter: it seems a good idea to stick with a native Forth, and write a dispatching interpreter on top of this. That way all primitives can be re-used.

It's simplest to first make the data doubler, so words written in the doubled language are usable in the unit language. On top of this, memory access words can be written, which then can support a trivial interpreter loop. Alternatively, I can implement Taygeta's primitives, and optimize them.

Entry: Data doubling
Date: Tue Aug 5 09:52:59 CEST 2008

PRIMITIVES:
* math primitives: coded manually (DONE)
* macro mapping: coded manually

COMPOSITION:
* parser map:
  - num  -> num hilo
  - word -> _word

This should be written as a pure parser.
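A minimal sketch of that parser map as a pure function over the token stream, assuming (hypothetically, per the map above) a 'hilo word that splits a doubled literal and an underscore prefix for doubled words:

;; Expand one source token into its doubled form.
(define (double-token tok)
  (cond ((number? tok) (list tok 'hilo))          ; num  -> num hilo
        ((symbol? tok)                            ; word -> _word
         (list (string->symbol
                (string-append "_" (symbol->string tok)))))
        (else (list tok))))

;; Map over a whole program: pure, no compilation state involved.
(define (double-program tokens)
  (apply append (map double-token tokens)))

;; (double-program '(123 +))  =>  (123 hilo _+)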
I think I'm running into a composition problem here: I can't find a straightforward way to plug the 'derived: word into 'forth-begin. Let's go to the definition of 'forth-compile and work from there.

(define-syntax (forth-compile stx)
  (syntax-case stx ()
    ((_ str)
     #`(forth-begin #,@(string->forth-syntax #'str)))))

This inserts lexed syntax into forth-begin. Maybe 'forth-begin needs a namespace argument? Let's see where '(macro) is hardcoded. That's in 'forth-begin-tx, where the records returned by the parser are assigned to a namespace. I thought there was one more hardcoded reference, in rpn-tx.ss, where the '(macro) namespace is used to check if a particular identifier is a parser, but this is actually parameterized by 'rpn-map-identifier. What about using the latter in 'forth-begin-tx too? Next: should the '(target) namespace be remapped too?

Ok, parameterized a bit, but that leads to other spaghetti being exposed. At the moment the thorn is the fact that instantiate.ss references the '(macro) namespace. Shouldn't this be generic? I just moved these references out of 'forth-begin, but the only thing that macro does is bind instantiate.ss to the forth parser-tx.ss

Trying to fix something else first: parser-tx.ss doesn't need to be aware of the wrapping words, it should just provide an abstract data structure with 'forth 'variable and 'macro tags. OK. The code classes are interpreted in 'forth-begin-tx and not passed to 'forth->records. Maybe the next step is to also generate the toplevel Scheme forms as part of the records structure, to avoid awkward passing of out-of-band data through mutating parameters? Done. Alright, it's a bit cleaner now.

Maybe this is enough to build a front-end that takes macros from a different namespace? Still there's the problem of how to link target functions back to the original target namespace. Maybe this is more of a nested namespace problem actually? Using '(macro derived) and '(target derived) does make symbols accessible as derived/+ derived/- etc... in the core space. This does require a decision: to make namespace mapping standard.

Entry: Derived Forth
Date: Tue Aug 5 13:10:00 CEST 2008

Let's concentrate the ideas from the previous post. To create a derived Forth, create a separate namespace that is a child of the one we build on top of. The language is then defined through a parameterized forth-begin-tx such that:

* 'derived-forth-begin uses only the (macro _) and (target _) namespaces for direct reference and definition.
* the corresponding prefix macros in (macro _) map to (macro) for implementing functionality.
* all (macro _) forms are accessible in the (macro) namespace through their direct mapping xxx -> _/xxx, but the (macro _) namespace is completely isolated from (macro).

Once this works, maybe the current 'live mode can be written as a compile mode? But it's not really necessary, since it probably doesn't require toplevel forms. Unless we allow definitions in the live mode.. Maybe it is a cleaner model..

Entry: return stack
Date: Tue Aug 5 15:19:12 CEST 2008

Thomas Pornin: "ANS doesn't require the return stack to consist of stackable elements... What ANS specifies is that, for each activation context, there is a stack-like storage area in which you may write cell values with >R, and get them back with R>. But these values are accessible only from the word itself, not from the caller and neither from the callees. Moreover, you are supposed to clean that stack before exiting from the word."

Elizabeth D Rather:
"Exactly. A standard system must have a Return Stack whose entries are the same size as cells and data stack items. And it must respond to >R, R@, and R>. What the standard *doesn't* require is that the system must use it for return addresses."

This is interesting. I didn't know that. This means it's probably best to rename my 'x' stack to the 'r' stack. Anyways, I've removed all references to 'r' so it can be added easily. Also, x will be renamed to x@.

Entry: ANS Forth frontend
Date: Tue Aug 5 18:43:04 CEST 2008

The non-reflective words are going to be straightforward, but the reflective ones are problematic. The lexer, prefix processor and macros are DIFFERENT entities in the unrolled structure, while in a reflective Forth, they are all just Forth words. I don't see a solution for this, other than completely replacing the lexer and preprocessor parser with something more akin to an interpret mode simulator.

  http://www.ultratechnology.com/meta.html

Entry: documenting a port
Date: Wed Aug 6 12:24:47 BST 2008

Maybe it's a good idea to publicly port to the 14-bit architecture, so the process can be documented?

Entry: Formalizing Coma
Date: Wed Aug 6 20:26:42 CEST 2008

Scat is a concatenative language modeled after Joy. Syntactically, a program p is a concatenation of programs p_i, or a primitive program word p':

  p = (p_0 p_1 ...) | p'

This is isomorphic with the semantics, where each program word can be associated with a function, and syntactic concatenation maps to function composition. For Scat, the operational semantics (the implementation in terms of a primitive machine) is given by primitive Scheme functions closed over a state space represented by a Scheme data type. In the case of Scat and Coma, this is a stack; in the case of Coma+Control, this is a pair of stacks. For most practical use, though not necessarily for theoretical use, the state contains at least a stack. Its reason for being is to introduce locality in the effect of functions. It is useful for creating a practical programming language, and for deriving simple local syntactic rewrite rules from local stack operations.

Reduction (evaluation) of Scat expressions is eager, and happens from left to right, where each primitive function that is part of a larger composition is applied to the state, which is threaded through the computation. This is the same as a sequential machine with global state. Note that, because function composition associates with function application, the order of evaluation is arbitrary:

  (S_0 a) b = S_0 (a b)

The function a applied to the initial state S_0 returns a value that, when passed to the function b, yields the same result as evaluating the composition (a b). Now, this is only useful if you can prove that there is some c with nice properties such that

  (a b) = c

which allows the application to be written as

  S_0 c

The ``nice properties'' can be simplified to mean ``shorter code''. For Coma, the eventual goal is to generate machine code (syntax) from a concatenative source program (syntax). So instead of looking at associativity of composition, we should look at associativity of concatenation (syntax). More specifically, at rules that allow the substitution of a concatenation of program words, i.e. (x y z), by another concatenation of words (s t):

  a b x y z c = a b (x y z) c = a b (s t) c = a b s t c

This uses the rule

  x y z = s t

Now, where do these rules come from? In Staapl, they are syntactic transforms that preserve the associated semantics.
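To make the mechanics concrete, here is a minimal sketch of such a semantics-preserving transform, applied greedily while code is extended one word at a time. This is plain Scheme, far simpler than Staapl's pattern language; the (qw n) / (cw op) representation is sketched per the QW/CW description below.

(require scheme/match)

;; Code is a list of instructions, most recent first.
;; (qw n) quotes a literal; (cw op) is an opaque code word.
(define (concatenate code word)
  (match (cons word code)
    ;; two quoted literals followed by '+' fold into one literal
    ((list (list 'cw '+) (list 'qw a) (list 'qw b) rest ...)
     (cons (list 'qw (+ b a)) rest))
    ;; a quoted value followed by 'drop' cancels out
    ((list (list 'cw 'drop) (list 'qw _) rest ...)
     rest)
    ;; default: emit the word as opaque target code
    (_ (cons word code))))

;; Thread the code stack through a whole program, left to right.
(define (compile-words words)
  (reverse (foldl (lambda (w code) (concatenate code w)) '() words)))

;; (compile-words '((qw 1) (qw 2) (cw +)))  =>  ((qw 3))

Note how the greedy left-to-right application falls out of the threading: each word only ever inspects already-compiled code to its left.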
These semantics are operational semantics derived from a stack machine's instructions.

TODO: implementation. First: translate to the QW, CW language. Then implement rewrite rules.

Todo:
* explain where the asymmetry comes from: why does a rewrite rule operate on code from the right only?
* explain in a simple way that all semantics comes from target idealized (un-projected) machine operations.
* explain how you go from arbitrary substitutions to a greedy left->right substitution scheme.

EDIT: This needs some cleanup. It deserves a separate paper. What I'm trying to explain:

* Coma's primitives are intermediate language transformers. The intermediate language has essentially 2 instructions: execution and quotation. This is then extended with termination, jumps and conditional jumps.
* Some of the intermediate language is real target language. Open question: good or bad? Should this be separated out? Due to the pattern matching, it behaves as opaque black-box code + it allows the implementation of simple peephole optimizations. (This is ok: it's a natural extension of the opaque CW data vs. QW that can be partially evaluated. Actually, it's a mix of the 2.)
* It also allows arbitrary data to be passed from macro to macro, which is a vehicle for arbitrary incremental code generation: this is the presence of QW in the language: it embeds a dynamic stack language.
* This can be viewed as eager partial evaluation. In a concatenative language, PE is rather trivial (no variable substitutions). But, in most cases it is not complete: not all primitives exist at run time: they need to be specialized / combined.
* It is not PE. Actually, it is, on the lowest level (math + stack shuffling).

--

Essentially: this contains quite some posts about the semantics of Coma. It is work in progress. The semantics is defined by a set of rewrite rules that implement the CONCATENATE operation as a binary function taking a program in intermediate form and a program word. This operation performs semantics-preserving transformations.

Entry: evaluation
Date: Sat Aug 9 09:03:24 BST 2008

It's been a busy time lately. What needs to happen next? One of the priorities is to get back into industry as soon as possible. You there, hire me! Possible next steps:

* reference documentation + API fixing
* CSP
* TAPL
* other Microchip targets
* zero-cost viral platform
* better boot monitor protocol
* array processing language
* a 24-bit virtual dsp for PIC18
* finding collaboration
* standard Forth frontend

Entry: occam
Date: Sat Aug 9 09:18:21 BST 2008

See http://en.wikipedia.org/wiki/Occam_%28programming_language%29

- Communication between processes works through named channels. One process outputs data to a channel via ! while another one inputs data with ?. Input and output will block until the other end is ready to accept or offer data.
- SEQ, PAR and ALT for sequential, parallel and alternative execution.

The difference between Concurrent ML (on which PLT Scheme's concurrency is based) and occam is the way in which channels are treated. In CML they are first class (dynamic), while in occam they are static entities.

Entry: monitor rewrite
Date: Sat Aug 9 10:01:47 BST 2008

This involves 2 main parts: an asynchronous message-passing mechanism over an abstract channel, and the definition of a low-level protocol for different transports. The current problem with the monitor protocol is that it is RPC based. This is fine for 1-1 communication, but won't work well over a many->many network.
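One direction, as a sketch (assuming nothing about the final protocol and using hypothetical helper names): make every message on the wire self-delimiting, so intermediate nodes can route or replay messages without interpreting them. In PLT Scheme, over arbitrary byte ports:

;; Frame = 1 length byte + payload (payload limited to 255 bytes,
;; which is fine for small monitor messages).
(define (send-message out payload)      ; payload: a byte string
  (write-byte (bytes-length payload) out)
  (write-bytes payload out)
  (flush-output out))

(define (receive-message in)
  (let ((len (read-byte in)))
    (read-bytes len in)))

A relay only needs receive-message + send-message; the RPC pairing (or an asynchronous replacement for it) can then be layered on top.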
Entry: Microchip programming protocol
Date: Sat Aug 9 10:07:51 BST 2008

Towards a zero-cost standard platform for the Microchip PIC. All PICs support the Microchip programming protocol, consisting of:

  1 /MCLR Vpp
  2 Vdd
  3 GND
  4 PGD
  5 PGC
  6 PGM

( see http://www.prc68.com/I/ICD2.shtml )

PICs without a charge pump need 13V Vpp and can't program themselves. It is necessary to split the development workflow into two parts:

* Single chip applications: this can use a combined full programmer + debug monitor to also support chips that need Vpp. This is the one useful for teaching, so it makes sense to make the tethering hardware as simple as possible, i.e. to not depend on a Microchip programmer.

* Networked applications: here interconnect needs to be as simple as possible, so the 5-wire programmer protocol is impractical. Cost of tethering hardware isn't critical, so it can have more complexity. It's best to stick with something standard here, e.g. the I2C or CAN bus. The topology can be symmetric: the programmer host interface can be just one of the nodes.

Entry: ICD2 + serial?
Date: Sat Aug 9 13:46:38 BST 2008

Is it possible to combine the programmer port and the serial port into a single connector? If the hardware flow control pins can be used, this could work. I believe the FTDI chip has a bitbang mode too, which could be useful.

  serial   ICD2
  GND      GND
  /CTS     VCC
  VCC
  TXD
  RXD
  /RTS

  DTE = master (Data terminal equipment)
  DCE = slave (Data circuit-terminating equipment)
  RTS = request to send
  CTS = clear to send

Using standard serial, there are only 2 output lines: TXD and RTS. The Microchip protocol needs at least 2, clock and data, assuming the MCLR and PGM pins are set correctly. Maybe the best approach is really to stick with the ICD2 connector, and devise a protocol on top of that.

Entry: Staapler connector
Date: Sat Aug 9 14:31:37 BST 2008

Before the low-level bootstrap can be solved, I need an adapter that converts serial to the ICD2 protocol. Let's combine the Olimex 6-pin header with the FTDI 6-pin header into a 12-pin double header. This allows the use of 2x3 female headers for the Staapler. How to arrange it? Daisy chaining Staaplers should work without trouble (using one Staapler to program another one). The other alignments can be taken to simplify GND and VDD connections.

Staapler connector: target board male 2x6 header with ICD2 and TTL232R serial connector. This is placed at the board edge. The Staapler (outline = dotted lines) fits on top of this, sticking out (downward) over the target board edge (dotted edge). The serial connector is optional.

  . . . . . . . . . . . . . .
  .  +-------------------+  .
  .  |  1  2  3  4  5  6 |  .  ICD2
  .  |  7  8  9 10 11 12 |  .  Serial (optional)
  .  +-------------------+  .
  .                         .
  .    target board edge    .
  -----------------------------
  .                         .
  .                         .

  ICD2
    1 /MCLR  white
    2 VDD    red
    3 GND    black
    4 PGD    blue
    5 PGC    green
    6 PGM    yellow

  SERIAL
    7 GND    black
    8 /CTS
    9 VDD    red
   10 TXD    orange
   11 RXD    yellow
   12 /RTS

Entry: Staapler programmer
Date: Sat Aug 9 16:33:00 BST 2008

I started building an 18F1320 prototype which has this male connector as its programming connector. Next is to find out where to connect the outputs and inputs of the female connector.

INPUTS: A single input is necessary for the bit-banged serial receive. Probably another one, or this one shared, for some client->host signal when using the ICD2 port alone for monitor operation. Either interrupt-on-change or the INT0-INT2 pins can be used.
OUTPUTS:

  TXD         bit banged serial transmit
  /MCLR   O   target reset
  VDD     O   optional target power
  PGD   I/O   data
  PGC     O   clock
  PGM     O   low voltage programming

Later this could be extended with a charge pump for generating the programming voltage. Other constraints:

* RA4 = open drain
* general purpose ports (not using analog): RA0-3
* more (not using oscillator): RA6-7

Probably best to use RA0-3 for the 4 output-only ports /MCLR VDD PGC PGM. Let's use INT0/RB0 for PGD I/O.

So, what should I do first?

* Get the monitor to work with the ICD2 connector only.
* Build a programmer.

The programmer seems rather trivial. The most difficult problem to solve now is to get bidirectional communication going over the ICD2 connector. Some things to think about: the only benefit of this device is to be able to use both from-scratch programming + the staapl console. It is beneficial for smaller targets when there is actually a charge pump available. Maybe it is better to just modify the firmware of an already existing programmer? The ICD2 would be a good target.

Entry: Staapler roadmap
Date: Sat Aug 9 17:54:17 BST 2008

Eventual Staapler goals are:

* standardize on the ICD2 connector for interactive debugging and get rid of the serial connector.
* add both LVP and HVP through the ICD2 connector.
* create a Staapler bootstrap method using a parport programmer.

To bootstrap Staapler itself, this approach can be used:

(1) Standardize all PIC development on the ICD2 connector only. Create a protocol for bi-directional serial communication on top of the master-slave Microchip programmer protocol.
(2) Build two Staapler boards A and B with ICD2+serial input (or a Staapler + one other board with just ICD2+serial input).
(3) Connect A's serial output to B's serial input and devise patch-through code. This process then gives a workflow for working with multiple projects (namespaces) at the same time.
(4) Build the ICD2 comm master on A and the ICD2 slave on B, using the serial->serial patch-through for B development.

Independently, after (3):

(5) Connect A's ICD2 output to B's ICD2 input and write PIC LV programmer code.
(6) Use programmer code to emulate a minimalistic parport programmer.
(7) Write host-side bootstrap code for a PC parport programmer.
(8) Staapler v2: add support for charge pump + USB, write HV programmer code, add support for different busses (for networked debugging).

Entry: Staapler protocol
Date: Sat Aug 9 19:10:37 BST 2008

1. Document the ICD2 master-slave protocol
2. Add to this slave->master messaging (maybe just polling?)
3. Allocate pins on master + slave side

1. From the programming manual for the 18F1220 (DS39592B): Commands and data are transmitted on the rising edge of PGC, latched on the falling edge of PGC, and are sent Least Significant bit (LSb) first. All instructions are 20 bits, consisting of a leading 4-bit command followed by a 16-bit operand. Depending on the 4-bit command, the 16-bit operand represents 16 bits or 8 bits of data.

  COMMANDS FOR PROGRAMMING             4-Bit Command

  Core Instruction                     0000
    (Shift in 16-bit instruction)
  Shift out TABLAT register            0010
  Table Read                           1000
  Table Read, post-increment           1001
  Table Read, post-decrement           1010
  Table Read, pre-increment            1011
  Table Write                          1100
  Table Write, post-increment by 2     1101
  Table Write, post-decrement by 2     1110
  Table Write, start programming       1111

2. The hardware protocol used is really enough to provide bidirectional communication if it is extended with a simple 'data ready' signal from the slave. This could either be an asynchronous slave signal (pull a line low/high) or an answer to a poll.
A direct slave signal is probably easiest to implement. Then it should be the data line, since that is already a multiplexed port at the master, leaving the clock to remain output-only.

3. On the slave side it's easy: use data and clock from the programmer protocol. On the master side, the clock will always be an output, but the data line receives an asynchronous signal. In case of asynchronous signalling, the data line is probably best implemented as a wire-or bus. The slave side data line for the 18F1220 is RB7. It has a weak pullup; maybe this can be used instead on the master side, i.e. thinking about driving with a PC parport, which has open collector outputs? This http://www.beyondlogic.org/spp/parallel.htm suggests using a 4k7 pullup.

  target   host
  ---------------------
  PGC      RA0
  PGD      RB0
  PGM      RA1
  VDD      RA2
  /MCLR    RA3
  GND      GND

Notes. It's probably best to start with reading the device using the ICD2 protocol. That way core routines for write and read can be created. The return protocol is self-delimited: each return message is prepended with the size of the message. Probably the master->slave protocol should do the same so it can be routed. RB7 is the slave data line; sending is a simple shift. Same for RB0 master sending. What about Microchip's in-circuit debugging protocol? Is that specified somewhere?

Entry: Revising boot monitor
Date: Sun Aug 10 09:43:44 BST 2008

To prepare for proper routing of the monitor protocol, all commands should be self-delimiting. Note that the protocol remains RPC: each request receives a reply.

Q: Should the interpreter ignore messages it doesn't understand? If so, what should be the reply? Probably not. This is a debugging protocol where the slave gives full access to the host. Limiting access by checking if some messages are legal or not makes no sense in this setting: it's the host's responsibility to properly drive the target in this mode.

Q: Should the monitor protocol be explicitly specified? No. It is the responsibility of the application developer to use the proper protocol, since it might be extended for specific applications.

Q: What about PING? I'm starting to run out of boot monitor space. Maybe it's best to take this out. It's not essential. And identification data can always be added to block 0, which is essentially unused: the host knows what kind of target it is, and for each target type, a storage area can be assigned.

Next: clean up live/tethered.ss so there is a clear delimited message send/receive part in the protocol, instead of the current "send header then send body" approach.

Q: Should we send "write at address" or "set address pointer" + "write"? Opting for the latter. It seems to be easiest when sending multiple chunks.

So, the rewrite is done. All messages in both directions are now length-prefixed byte strings that do not require interpretation to be repeated or routed. The tethered.ss code is refactored into async send/receive for messages and RPC functions. 'ping is removed and replaced with a simple target-sync ack = OK mechanism, which enabled the monitor code to fit into the 256 words again, after the interpreter changes for delimited messages.

Entry: popularity
Date: Mon Aug 11 08:01:12 BST 2008

To Arduino or not? It would sure help popularity, but I'm afraid it will shift focus too much toward AVR, and leave PIC in the shadow.
I've invested quite some time in getting familiar with Microchip's architectures, so for the tool as a whole, it might be better to stick to that single architecture until most of the high-level workflow and interoperation design reflects my knowledge there. This is nontrivial: it includes the whole monitor.f + tethered.ss chain.

Standard Forth frontend or not? It might help to get more people interested, but would distract from the original idea. If I find a proper way to combine standard Forth with the current approach so they can interoperate, and provide metaprogramming support only for the standard one, it might work though.

Usage statistics. I have no control over the PLaneT version. How to find out usage stats? Maybe the PLaneT version should download updates? Or, I could put the installer in PLaneT only?

Entry: Staapler
Date: Mon Aug 11 11:30:15 CEST 2008

( It looks like Staapler is redundant since the PicKit2 provides all the necessary functionality. It can program .HEX files and act as serial passthrough. )

I started building 2 prototypes for the first iteration of the Staapler based on an 18F1320. Currently limited to programming / debugging of Staapl based projects for PICs that support LVP. It uses the Microchip 6-pin ICP/ICD interface, using the pinout from the Olimex ICD2 clone (RJ jacks are too cumbersome).

  http://www.olimex.com/dev/images/PIC/PIC-ICSP.gif

In addition, the connector has an optional second row of 6 pins with an FTDI serial TTL header, in case an additional serial port is desired. The hardware interface is a male 2x6 header with ICD2 and TTL232R serial connector. This is placed at the target board edge. The Staapler is plugged on top of this (board outline = dotted lines), sticking out (downward) over the target board edge (dotted edge).

  . . . . . . . . . . . . . .
  .  +-------------------+  .
  .  |  1  2  3  4  5  6 |  .  ICD2
  .  |  7  8  9 10 11 12 |  .  TTY Serial (optional)
  .  +-------------------+  .
  .                         .
  .    target board edge    .
  -----------------------------
  .                         .
  .                         .

  ICD2
    1 /MCLR  white
    2 VDD    red
    3 GND    black
    4 PGD    blue
    5 PGC    green
    6 PGM    yellow

  SERIAL
    7 GND    black
    8 /CTS
    9 VDD    red
   10 TXD    orange
   11 RXD    yellow
   12 /RTS

Next to the female connector for programming a target board, the Staapler has a male Staapler-compatible connector. This is used to bootstrap the Staapler boot monitor using an ICD2, and to connect to the host using the serial interface. It contains the following connections for the female header:

  Target   Staapler 18F1320
  ------------------------------
  PGC      RA0
  PGD      RB0
  PGM      RA1
  VDD      RA2
  /MCLR    RA3
  GND      GND

This has PGD wired to RB0 (INT0) so the Microchip protocol can be easily extended with a target -> host ``terminal ready'' signal, enabling the host to wait for replies without the need for polling. The bootstrap plan is documented here:

  http://zwizwa.be/ramblings/staapl/20080809-175417

Entry: ANS : Forth in Forth + ???
Date: Mon Aug 11 13:11:57 CEST 2008

What are the necessary primitives to implement a Forth in Forth? The problem I'm trying to solve is to simulate an on-target Forth, somewhere between full simulation and full stand-alone. The only real primitives are

  @ ! execute

which means: the memory and execution model are abstract. This works for the standard PIC18 boot monitor, which is really nothing more than the 3-instruction Forth [1] together with some implemented primitives for programming, and block transfer with less overhead. So, let's first build a complete abstract Forth machine. What does a completely abstract ANS Forth machine look like?
* two stacks of cells (the parameter stack and R stack: >R R> R@)
* a cell array allocation mechanism (ALLOT)

From the standard document [2]:

  3.3 The Forth dictionary

  Forth words are organized into a structure called the dictionary.
  While the form of this structure is not specified by the Standard,
  it can be described as consisting of three logical parts: a name
  space, a code space, and a data space. The logical separation of
  these parts does not require their physical separation. A program
  shall not fetch from or store into locations outside data space.
  An ambiguous condition exists if a program addresses name space or
  code space.

  3.3.3 Data space

  Data space is the only logical area of the dictionary for which
  standard words are provided to allocate and access regions of
  memory. These regions are: contiguous regions, variables,
  text-literal regions, input buffers, and other transient regions,
  each of which is described in the following sections. A program
  may read from or write into these regions unless otherwise
  specified.

So, '@' and '!' can _only_ access data space.

Q: The important next question is: when defining reflective words (macros), do they have access to data space at all? No. Data space does not exist during compilation, which means that all words that are accessible at run time should also be simulated. This is the only proper way to unroll the behaviour completely, and have simulated reflection that can be TRANSPARENTLY moved to real reflection.

Roadmap:
- write a reflective ANS Forth that can generate simulated programs using some access to the target memory (for I/O)
- from the representation of this, extract a kernel using dependency analysis of words, drawing primitives from a library.

Q: How to represent the reflective Forth? I'm not sure if it's useful to write this on top of Coma. The result of reading a Forth file is a structured code graph that can be processed to generate a Forth kernel in terms of Coma and some primitives.

Q: Where to start? Let's port JONESFORTH [3]. In fact, it might be a good exercise to stick with Richard Jones's literate file, and replace the x86 assembly with Scat code. Actually, it can be ported to plain Scheme code. Let's write it in PLT's r5rs language.

Hmm.. it's got me completely confused again. There are some problems. I'd like to write this on top of STC primitives, which are not compatible with direct threaded code. Also, the dictionary model needs to be worked out a bit. So I need a standard model where primitives can be plugged on top of some execution/dictionary model I can live with. Maybe the best way is to implement one myself after all, or figure out how to modify one of the portable ones. It doesn't look like JONESFORTH is a good starting point. Going to remove it from the darcs archive. I need a different set of primitives.. Maybe eForth [4] is the way to go after all? I found a link on comp.lang.forth [5] about this. This brings me back to Taygeta's MAF [6], which is what I was looking for actually.

EDIT: One night of sleep later, I think the effort is best spent elsewhere. The essential problem is that the dictionary layout and threading model need to be abstracted. If the Forth has to run on a Flash controller, Flash programming needs to be in there too.. This is already a large part of the interpreter.
References:

[1] http://pygmy.utoh.org/3ins4th.html
[2] http://lars.nocrew.org/dpans/dpans.htm
[3] http://www.annexia.org/_file/jonesforth.s.txt
[4] http://www.baymoon.com/~bimu/forth/
[5] http://groups.google.com/group/comp.lang.forth/browse_thread/thread/287c36f0f2995d49/10872cb68edcb526?#10872cb68edcb526
[6] ftp://ftp.taygeta.com/pub/Forth/Applications/ANS/maf1v02.zip

Entry: Minimal bootstrap
Date: Mon Aug 11 15:46:05 CEST 2008

From http://groups.google.com/group/comp.lang.forth/browse_thread/thread/287c36f0f2995d49/10872cb68edcb526?#10872cb68edcb526

--- FORTH Primitives Comparison (use a fixed width font) ---

3 primitives      - Frank Sargent's "3 Instruction Forth"
9 primitives      - Mark Hayes theoretical minimal Forth bootstrap
9,11 primitives   - Mikael Patel's Minimal Forth Machine (9 minimum, 11 full)
13 primitives     - theoretical minimum for a complete FORTH (Brad Rodriguez)
16,29 primitives  - C. Moore's word set for the F21 CPU (16 minimum, 29 full)
20 primitives     - Philip Koopman's "dynamic instruction frequencies"
23 primitives     - Mark Hayes MRForth
25 primitives     - C. Moore's instruction set for MuP21 CPU
36 primitives     - Dr. C.H. Ting's eForth, a highly portable forth
46 primitives     - GNU's GFORTH for 8086
58-255 functions  - FORTH-83 Standard (255 defined, 132 required, 58 nucleus)
60-63 primitives  - considered the essence of FORTH by C. Moore (unknown)
72 primitives     - Brad Rodriguez's 6809 CamelForth
74-236 functions  - FORTH-79 Standard (236 defined, 147 required, 74 nucleus)
94-229 functions  - fig-FORTH Std. (229 defined, 117 required, 94 level zero)
133-? functions   - ANS-FORTH Standard (? defined, 133 required, 133 core)
200 functions     - FORTH 1970, the original Forth by C. Moore
240 functions     - MVP-FORTH (FORTH-79)
~1000 functions   - F83 FORTH
~2500 functions   - F-PC FORTH
FIXME 27 ?        - C. Moore's MachineForth

For comparison:

8 commands        - BrainFuck (small, Turing complete language)
8 primitives      - Stutter LISP
8 primitives      - LISP generic
11 functions      - OS functions Ritchie & Thompson PDP-7 and/or PDP-11 Unix
14 primitives     - LISP McCarthy based
18 functions      - OS functions required by P.J. Plauger's Standard C Library
19 functions      - OS functions required by Redhat's newlib C library
28 opcodes        - LLVA - Low Level Virtual instruction set Architecture
51-56 functions   - CP/M 1.3 (36-41 BDOS, 15 BIOS)
56 functions      - CP/M 2.2 (39 BDOS, 17 BIOS)
40 syscalls       - Linux v0.01 (67 total, 13 unused, 14 minimal, 40 complete)
71 opcodes        - LLVM - Low Level Virtual Machine instructions
92+ functions     - MP/M 2.1 (92 BDOS, ? BIOS)
102 functions     - CP/M 3.0 (69 BDOS, 33 BIOS)
~120 functions    - OpenWATCOM v1.3, calls - DOS, BIOS, DPMI for PM DOS apps.
150 syscalls      - GNU HURD kernel
170 functions     - DJGPP v2.03, calls - DOS, BIOS, DPMI for PM DOS apps.
206 bytecodes     - Java Virtual Machine bytecodes
290 syscalls      - Linux Kernel 2.6.17 (POSIX.1)

eForth primitives (9 optional)
----
doLIT doLIST BYE EXECUTE EXIT next ?branch branch ! @ C! C@ RP@ RP! R> R@ >R
SP@ SP! DROP DUP SWAP OVER 0< AND OR XOR UM+ TX! ?RX !IO $CODE $COLON $USER
D$ $NEXT COLD IO?

9 MRForth bootstrap theoretical
----
@ ! + AND XOR (URSHIFT) (LITERAL) (ABORT) EXECUTE

9 Minimal Forth (3 optional)
----
>r r> 1+ 0= nand @ dup! execute exit drop dup swap

23 MRForth primitives
----
C@ C! @ ! DROP DUP SWAP OVER $>$R R$>$ + AND OR XOR (URSHIFT) 0$<$ 0=
(LITERAL) EXIT (ABORT) (EMIT) (KEY)

20 Koopman high execution, Dynamic Freq.
----
CALL EXIT EXECUTE VARIABLE USER LIT CONSTANT 0BRANCH BRANCH I @ C@ R> >R
SWAP DUP ROT + = AND

46 Gforth
----
:DOCOL :DOCON :DODEFER :DOVAR :DODOES ;S BYE EXECUTE BRANCH ?BRANCH LIT @ !
C@ C! SP@ SP! R> R@ >R RP@ RP! + - OR XOR AND 2/ (EMIT) EMIT? (KEY) (KEY?)
DUP 2DUP DROP 2DROP SWAP OVER ROT -ROT UM* UM/MOD LSHIFT RSHIFT 0= =

36 eForth
----
BYE ?RX TX! !IO doLIT doLIST EXIT EXECUTE next ?branch branch ! @ C! C@ RP@
RP! R> R@ >R SP@ SP! DROP DUP SWAP OVER 0< AND OR XOR UM+ $NEXT D$ $USER
$COLON $CODE

BrainFuck
----
> < + - . , [ ]

Stutter LISP
----
car cdr cons if set equal lambda quote

generic LISP
----
atom car cdr cond cons eq lambda quote

LISP, McCarthy based
----
and atom car cdr cond cons eq eval lambda nil quote or set t

Entry: next
Date: Tue Aug 12 09:11:43 CEST 2008

Maybe it's a good idea to leave the standard Forth idea alone for a while. It is definitely doable and an interesting challenge, but at this moment, there are probably more useful things to focus on. Additionally, having two different paradigms for Forth might be needlessly confusing. So let's move on. To do:

* Staapler
  - just continue the roadmap. Next goal = device ID readout.
* Reference documentation
  - the forms 'patterns 'compositions and 'substitutions.
* Internal language standard
  - control flow primitives: document this when writing the 14-bit core port.
  - standard library: I'm not sure if this is useful yet. Probably best to wait and see until there are more targets. It would be nice to be able to share most of the monitor code though.

Entry: comp.lang.scheme
Date: Tue Aug 12 09:37:14 CEST 2008

Trying a different kind of announce here..

--
Hello,

Announcing the recent release of Staapl, a library for metaprogramming microcontrollers. It is centered around the concept of an ``unrolled'' Forth language tower, impedance-matched to PLT Scheme's declarative module system, and uses a stack-based pattern language to implement primitives for code generation, partial evaluation of the pure functional target language subset and parameterized metaprogramming. The representation language is a thin layer on top of Scheme implementing a concatenative language with threaded state, which can be used independently of Staapl.

The current implementation contains a Forth syntax frontend to the concatenative macro language, a backend code generator for Microchip's PIC18 architecture, a tethered interaction system, and a test application implementing a sound synthesizer.

Download & Documentation at http://zwizwa.be/staapl

Enjoy!
Tom

Entry: debugger protocol
Date: Tue Aug 12 11:55:27 CEST 2008

Apparently the debugger protocol for the 18F is proprietary, but for the 16F877 it's available here:

  http://www.beyondlogic.org/pic/f877-6bk.pdf
  http://ww1.microchip.com/downloads/en/DeviceDoc/51242a.pdf

The main idea behind the debugger is the use of a breakpoint register and external halt. Looks like this is for the ICD and is obsoleted, replaced by ICD2. Anyways, I don't really need it. The use I've found for ICD2 is to debug the debugger.. I might add some support for ICD2 later, but let's focus on a more direct interpreter approach.

Entry: double debugging
Date: Tue Aug 12 12:21:49 CEST 2008

A problem I ran into during development of KRIkit is the double debugger problem. When writing an application involving a client and a server, it is beneficial to be able to access both systems from the same host. I'm thinking about a simple daisy-chained system. The unused bits in the boot monitor interpreter could be used as address bits.
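A sketch of what daisy-chained addressing could look like, assuming (hypothetically) the length-prefixed framing sketched in the monitor rewrite entry above, with one address byte in front of each frame; every node handles frames addressed to 0 and forwards the rest with the address decremented:

;; Host side: wrap a payload for a node 'hops' links downstream,
;; using the send-message helper sketched earlier.
(define (send-to out hops payload)
  (send-message out (bytes-append (bytes hops) payload)))

;; Node side: dispatch or forward. 'handle' and 'forward' are the
;; node's own (hypothetical) I/O procedures.
(define (route frame handle forward)
  (let ((hops (bytes-ref frame 0))
        (payload (subbytes frame 1)))
    (if (zero? hops)
        (handle payload)
        (forward (bytes-append (bytes (- hops 1)) payload)))))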
The next step is serial patch-through.

Entry: parameterized code
Date: Wed Aug 13 09:14:46 CEST 2008

Context: writing a synchronous serial slave for the ICD2 programmer protocol. This involves code parameterized by 'clock' and 'data' pin macros, which provides 'read' and 'write'. It is time to properly tackle the problem I tried to solve with loading different code modules into a namespace. Currently, the only way to introduce new bindings is to write parsing words. These are essentially prefix words that expand into arbitrary (prefixed) Forth code. I'm not entirely happy with this:

* 'load' into a shared namespace only works for 1 instance.
* this pattern is too important to have it specified on top of the Forth prefix parser.

Q: Is it possible to write down a simple solution in Scheme/Coma and translate this to a Forth prefix solution?

Entry: Coma code + instantiation
Date: Wed Aug 13 09:29:23 CEST 2008

Maybe it's time to start moving things from Forth syntax to Coma/sexp syntax. What's currently missing is an instantiation syntax for Coma words. Something like:

;; Define code generators
(compositions (macro) macro:
  (a 1 2 3)
  (b a c)
  (c a b))

;; Declare which of them employ run-time instantiation.
(instantiate (macro) c b)

Problems:

* Recursive macros. During instantiation some recursive expansions might lead to infinite code size. This needs a detection mechanism and possibly automatic instantiation.
* Fallthrough and multiple exit points. This needs special syntactic support. Moreover, the ';' used in Forth is awkward to use in Scheme syntax.
* Somehow it feels wrong to use Forth's structured programming words in the s-expression definitions. Code blocks in the form of higher order macros seem to make more sense there.. Is this just aesthetics?

The fallthrough/local-exit problem could be avoided by not allowing them in a simple version of 'instantiate'. The cost of these features needs to be analyzed more: they are not free, and significantly complicate the code graph instantiator.

Entry: problem with darcs-1 -> darcs-2
Date: Wed Aug 13 09:59:56 CEST 2008

I missed a patch on my laptop.. The one that cleans up instantiate.ss. How to fix? Roll back the darcs-2 repo to the point right before this patch, compute a diff and patch the new tree. Alternatively: inspect the patch itself, see what changed and copy over the files. FIXME: the test doesn't work any more since the monitor changes. OK.

Entry: Staapler change of plan.
Date: Wed Aug 13 18:52:06 CEST 2008

The plan has changed: move to the PicKit2, since there's no way to do it cheaper, and the platform seems open. So what to do with Staapler? Maybe just focus on using the ICD2 connector as a serial port. Interesting: the PicKit2 uses an 18F2550. It might be directly reprogrammable for Staapl use.

Entry: Interaction simulator
Date: Thu Aug 14 13:56:32 BST 2008

Maybe.. Instead of using a Forth-style interaction mode, it is possible to just completely simulate everything: interaction mode is built on top of core Coma without target specialization, and the resulting QW,CW code is interpreted. Compiled definitions are kept in the interpreter so they can be used in interaction. This looks like a much saner model than the current one + it allows working towards some standard Coma semantics.

Entry: next
Date: Thu Aug 14 18:05:20 CEST 2008

* specification of an internal language standard through simulation of non-specialized Coma output.
* create an instantiation syntax usable from Scheme and write the forth-begin form on top of this.
* think a bit about this whole csp/occam-pi thing. Figure out what the core automation problem is in the occam-pi compiler. Maybe a 'manual' version can be included in Staapl?

Entry: books
Date: Thu Aug 14 19:37:02 CEST 2008

This is the collection of books I'd like to finish. I'm being foolish and read books without doing the exercises, trying to incorporate the knowledge into the design and implementation of Staapl. TSPL, EOPL and SICP were real eye-openers.

Done:
* TSPL   http://www.scheme.com/tspl3/
* EOPL   http://www.cs.indiana.edu/eip/eopl.html
* SICP   http://mitpress.mit.edu/sicp/full-text/book/book.html (except logic)

Reading:
* CSP    http://www.usingcsp.com/
* TAOCP  http://www-cs-faculty.stanford.edu/~knuth/taocp.html
* TAPL   http://www.cis.upenn.edu/~bcpierce/tapl/
* TAPOC  http://www.comlab.ox.ac.uk/people/bill.roscoe/publications/68b.pdf

Todo:
* PLAI   http://www.cs.brown.edu/%7Esk/Publications/Books/ProgLangs/
* CTMCP  http://www.info.ucl.ac.be/~pvr/book.html

Entry: dsPIC
Date: Fri Aug 15 08:48:46 BST 2008

Microchip is not really being very helpful providing anything other than the .pdf programmer's reference. So, let's see if there's a way to get hold of the instruction set without typing it in. The difference between the dsPIC and the 8-bit PICs is the addressing modes. This chip has more of a classical RISC ISA.

Data memory hierarchy:

  RAM, first word      (WREG)
  RAM, first 16 words  (Wxx - Working registers)
  RAM, first 4K        (File registers, Near RAM)
  RAM, all 64K

Data addressing modes:

  Basic
    File Register
    Immediate
  Indirect (File Register)
    No Modification
    Pre-Increment
    Pre-Decrement
    Post-Increment
    Post-Decrement
    Literal Offset
    Register Offset
  DSP MAC
    Direct
    Indirect
      No Modification
      Post-Increment (2, 4 and 6)
      Post-Decrement (2, 4 and 6)
      Register Offset

The only difficulty is to somehow encode the addressing modes properly. The generic template is:

  (file  (o b f d)       "oooo oooo obdf ffff ffff ffff")
  (lit10 (o b k d)       "oooo oooo obkk kkkk kkkk dddd")
  (lit5  (o b w k d)     "oooo owww wbqq qddd d11k kkkk")
  (alu3  (o b w q d p s) "oooo owww wbqq qddd dppp ssss")

  lit5 -> lit4 for shifts

I'm not feeling much for typing it all in.. Isn't there a way to snarf the assembler from a file generated by MPLAB? Typing the address modes manually, the opcodes I can probably get that way. So, roadmap:

1. generate an ASM file with all opcodes
2. run mpasm30
3. interpret output (binary?)

Setting up mpasm30.. I have an XP image somewhere.. Wait, there are linux binaries.

Entry: Architectures: where to draw the line?
Date: Fri Aug 15 16:22:27 BST 2008

The dsPIC has a gcc toolchain:

  http://iridia.ulb.ac.be/~e-puck/wiki/tiki-index.php?page=Cross+compiling+for+dsPic
  http://iridia.ulb.ac.be/~e-puck/share/cross-compiler/packages/pic30-binutils_2.01-1_i386.deb
  http://iridia.ulb.ac.be/~e-puck/share/cross-compiler/packages/pic30-gcc_2.01-1_i386.deb
  http://iridia.ulb.ac.be/~e-puck/share/cross-compiler/packages/pic30-support_2.01-1_all.deb

It's a bit silly to try to compete with that. If Staapl should support the dsPIC, it needs to do so on top of C. It might make sense to try to write some dsp-ish language that compiles to assembler, but it doesn't look like there is much to gain in writing an assembler + Forth compiler in the same style as for the PIC18. Once it's a multi-register RISC chip, C is really the way to go. Same for 32-bit ARM/MIPS. Also, when there's a C compiler available, not being able to integrate with it is commercial suicide.
For the small controllers, you're going to be the only tool in the chain. Not so for the bigger ones... there are going to be libraries and C developers. So, where should Staapl live?

- For 8-bit controllers up to the PIC18: Staapl = Forth based macro assembler. Implements a native code generator.
- For 16/32-bit controllers that have a decent C compiler: Staapl provides a Forth based scripting language + DSP-ish array processing languages, built on top of C.
- For 32-bit systems based on PPC/Intel: Staapl's PLT Scheme based meta system.

The unifying idea is concatenative languages: the bare metal macro Forths for the low end, a linear typed Forth for the mid end, and the functional Scat/Scheme for the high end.

Entry: LuT comment
Date: Sun Aug 17 08:32:07 CEST 2008

Forth is very elegant minimalism, and hard to improve if you want a minimally complex self-hosted system. But when you switch to a cross-compiled Forth system, the target side can be simplified a lot by taking out reflection. For interactive applications, Staapl uses Frank Sergeant's 3 instruction Forth. On the host side however, minimalism isn't really necessary. Having "just Forth" on the host side really seems like a limitation. I see no reason why a cross compiler written in Forth can't be replaced by one written in Scheme. To make this easier, Staapl uses an impedance matching language, Scat, which is a concatenative language based on Joy. Staapl's code transformers are modeled after Forth's immediate words, but are represented as pure Scat functions. All reflection is unrolled as acyclic PLT Scheme modules, making metaprogramming more straightforward. When you skew ordinary Forth from words towards macros, an approach like this, where macros are as clean as possible, seems to make sense. Staapl's PIC18 Forth starts out as all macros: there are no kernel words.

I've just updated the scribble docs in the PLaneT package. The main project site is at http://zwizwa.be/staapl.

Entry: PicKit2 arrived
Date: Mon Aug 18 16:58:46 CEST 2008

  > piklab-prog -p pickit2 -d 18f1220 -c connect
  piklab-prog: version 0.15.2 (rev. distribution)
  programmer: pickit2
  device: 18F1220
  Using port from configuration file.
  Connecting PICkit2 Firmware 1.x on USB Port with device 18F1220...
  Error: USB Port: Could not find USB device (vendor=0x04D8 product=0x0033).

  > sudo piklab-prog -p pickit2 -d 18f1220 -c connect
  piklab-prog: version 0.15.2 (rev. distribution)
  programmer: pickit2
  device: 18F1220
  Using port from configuration file.
  Connecting PICkit2 Firmware 1.x on USB Port with device 18F1220...
  Firmware version is 2.1.0
  Warning: The firmware version (2.1.0) is higher than the version
  tested with piklab (1.20.0). You may experience problems.
  set Vdd = 5 V and Vpp = 12 V
  Error: USB Port: Error receiving data (ep=0x81 res=-110)
  (err=could not get bound driver: No data available).

Starting over: from the PICkit 2 Interface Guide, PICkit2SourceGuidePCv2-52FWv2-32.pdf, available in

  http://ww1.microchip.com/downloads/en/DeviceDoc/FirmwareV2-32-00.zip

Looks like that's what I need. The proper way is to integrate this into Staapl, instead of using an external programmer. Mostly because the serial patch-through is what is most useful. What is interesting about the v2 firmware is that it is essentially an interpreter for a script language. This allows quite direct manipulation of the interface, so it might be used for all kinds of things! It gives access to serial port emulation, I2C, SPI and the normal clocked protocol.
Note: instead of adding the directory where the .dat file resides to the run-time path, just copy that .dat file to /usr/local/bin together with pk2cmd. Do I need to start digging into the .dat file? Programming can probably be outsourced to 'pk2cmd'. The only interaction necessary is access to the port directly. The code in pk2cmd can be snarfed if reading the .dat file should be necessary, but it's not a priority it seems.

Next: libusb in PLT Scheme. This looks like the excuse I need to dig into the FFI. The PICkit2 uses HID, which should be quite straightforward. Does the FFI have a reader for C structs? Hmm.. Just using emacs to edit the struct into an s-expression was easy enough. OK, got the reader to work... Why do people create such ad-hoc formats? The rest is for later.. Maybe the dsPIC compiler has a similar way of reading in meta-data?

Entry: libusb and FFI
Date: Tue Aug 19 10:05:35 CEST 2008

Some questions:

* is it possible to automate this?
* what about self-referential structures?
* pointer-pointer?

First contact seems ok, just need to resolve some issues. Also, it might be a good idea to define the usb structs on a higher level, so they can be used in Forth too. Probably need to read this first:

  wget http://repository.readscheme.org/ftp/papers/sw2004/barzilay.pdf

Then start here:

  http://libusb.sourceforge.net/doc/function.usbopen.html

A bit too much reading for now.. Need to switch to output mode again.

Entry: Parameterized programming
Date: Tue Aug 19 15:26:18 CEST 2008

Two forms of metaprogramming: anonymous macros and name templates. I'm writing a blog article about this..

Now, objects. An object is something identified by a single reference (for compile-time objects this would be a macro) that can be sent a message. I'm interested in a _static_ version of this abstraction, which is not much more than a way to link namespaces at compile time, or some way of plugging in behaviour. I.e. a bidirectional communication port accepts two messages: 'read and 'write. I'd like to write code that uses these messages, but is parameterized by the object implementing them. What could be done is to declare 'read and 'write as methods. A method is a macro that sends a message to a (couple of) object(s).

Hunch: I think it's better to pursue that route instead of that of prefix parsers, because they don't compose well. Try to stick to concatenative s-expression syntax, and improve the partial evaluation rules. So... does 'late binding at compile time' make any sense? The idea is to let a method be something that sends a message to an object.

Entry: The s-exp language
Date: Tue Aug 19 16:31:56 CEST 2008

Maybe it's best to force the s-exp language into something really distinct from Forth, to make it less confusing. The idea in the end is that instantiation is arbitrary: the current differences between macros and Forth words, the latter allowing fallthrough and local exit etc., are better captured somewhere else. Quotations that are not instantiated can be instantiated and associated with an execution token. For 8-bit this gives 256 quotations. So, can instantiation be automated? This would give a cleaner language semantics. In that case, manual instantiation is no longer necessary, and can be classified as merely an optimization hint. When not to instantiate?

- doesn't produce compilable code   <- easy to check
- if inlining produces BETTER code  <- needs a measure

Entry: lazy composition : concatenative vs. compositional
Date: Fri Aug 22 10:57:48 CEST 2008

I'm not using the concatenative vs. compositional property anywhere.
compositional property anywhere. This extra inspection level could be useful for optimizations. It boils down to 'lazy composition'.

Bottom line: hiding the primary composition mechanism behind lambda is NOT a good idea, because it throws away information that might be exploited during optimization. The lambda representation IS a good idea for introducing arbitrary primitives however. The good part is that this is easy to change. Also note that a CPS style representation is actually better than a nested lambda expression representation.

Entry: abstracting over names
Date: Sat Aug 23 11:19:47 CEST 2008

#lang scheme/base

;; An attempt to find a standard mechanism for parameterizable
;; modules. This consists of:
;;
;; * Find a way to build instantiated code from Scheme. Currently it
;;   only uses macros.
;; * Create instantiation macros for parameterizable code.
;; * Possibly write the Forth instantiation on top of this Scheme
;;   code. Somehow unify ``data parameterization'' usable from
;;   macros with ``name parameterization'' usable from prefix macros.
;; * Should this use higher order macros or just Forth control words?
;;
;; The example is parameterized code for a synchronous read operation.

(define-syntax-rule (sync-reader clock data read write)
  (begin
    (compositions (macro) macro:
      (wait  clock low? if begin clock high? until then
             begin clock low? until)
      (read  0 (wait data @ 1 and or <<) 8 times)
      (write (wait dup 1 and data ! >>) 8 times drop)
      (instantiate read write))))

;; This creates two instantiated words parameterized by two macros.
;; The 'clock and 'data arguments could possibly be passed as macro
;; arguments, but the 'read and 'write NAMES are identifiers; the
;; macro language doesn't know identifiers.

;; Original Forth syntax:
;;
;; \ Instantiate 'read and 'write procedures for slave mode synchronous
;; \ communication.
;; macro
;; : wait-falling | reg bit |
;;     reg bit low? if begin reg bit high? until then
;;     begin reg bit low? until ;
;; forth
;; : read 8 for clock wait-falling ... next

Entry: planet build problems
Date: Sat Aug 23 11:24:34 CEST 2008

Planet will compile all the Scheme files, so they had better be working. Make sure to run "make planet-test" before uploading!

Win XP: readline not available, so let's take it out.

Entry: second documentation pass
Date: Sun Aug 24 11:10:42 CEST 2008

Let's look at the available documentation. Is it any good? The first thing that comes to mind is: "How clear is the purpose?" Not very, I think.. This should be paragraph #1. Let's fix the introduction first.

* Stack machine model: can keep fine granularity down to very low complexity hardware. This influences code size and processor complexity. RISC has a factorization problem due to the global nature of registers. Abstracting small machines as stack machines is feasible, since they are usually not too ingrained in the register model. ( Is this really so? Get data about popularity. )

* Concatenative language model built on top of this: simple framework for staging and partial evaluation, alternative to lambda calculus.

Entry: machine model
Date: Sun Aug 24 19:25:38 CEST 2008

Now, pragmatically, it would probably be easier to market Staapl as a machine model instead of a programming language. This model should be a stripped down version of ANS Forth, with reflective words removed. Let's have a look at Moore's "core set" and his processor primitives. Staapl needs more: the Harvard architecture can't be abstracted.
Two possible complementary usage scenarios:

* individual embedded developers using the macro Forth or the static combinator language directly to write applications and provide new target chips.

* using the language model as a target language for higher order description compilation in a model-based design approach.

Now, compared to other machine models, the one in Staapl is a 'mid point': you write the standard primitives in terms of appropriate machine primitives in the same system. There is no intermediate "byte code" representation that requires an explicit byte code interpreter with optional JIT: the idea is to use just code transformers.

Let's see. All arguments are data of cell size, or arbitrary compile-time identifiers that can be optimized away. This exposes 3 stacks. One stack is partially evaluated (the D stack), which means it is present at run time and at compile time (containing quoted, opaque machine instructions or hybrid immediate instructions). The R stack is only present at run time, and the M stack only at macro execution time.

Note that the machine model does not specify machine flags: macros for flags should be hidden in the lower layers. When these primitives require runtime support, they should redirect to the same name prefixed with a tilde "~" character. Conditions "?" are probably compile-time constructs to ensure optimal encoding of branch and skip instructions. If they survive to runtime, they are encoded as 0=false and true otherwise.

(primitives
 ;; WORD       D           R         M       
 ;; Data
 (@          (a -- x)    (--)      (--)    "Fetch data")
 (!          (x a --)    (--)      (--)    "Store data")
 (+ - * / and or xor
             (x y -- z)  (--)      (--)    "Binary operator")
 (<< >> 2/   (x -- y)    (--)      (--)    "Unary operator")
 ;; Aux stack
 (>r         (x --)      (-- x)    (--)    "To r stack")
 (r>         (-- x)      (x --)    (--)    "From r stack")
 (r@         (-- x)      (x -- x)  (--)    "Fetch from r stack")
 ;; Control
 (call       (--)        (-- a)    (l --)  "Call label")
 (jump       (--)        (--)      (l --)  "Tail call label")
 (exit       (--)        (a --)    (--)    "Return to caller.")
 (or-jump    (? --)      (--)      (l --)  "Conditional jump to label")
 ;; Indirect memory access
 (@a         (-- x)      (--)      (--)    "Fetch RAM.")
 (@a+        (-- x)      (--)      (--)    "Fetch RAM, increment a")
 (@p         (-- x)      (--)      (--)    "Fetch ROM.")
 (@p+        (-- x)      (--)      (--)    "Fetch ROM, increment p"))

Entry: Broad spectrum
Date: Sun Aug 24 22:45:11 CEST 2008

It is easy to implement a stack machine quasi-optimally on a low-end 8-bit machine, the PIC18 being the canonical example. For RISC architectures, some whole-program analysis is probably in order to optimize register usage. Is it worth it to keep that road open? In other words: should Staapl aim at broad-spectrum complexity, or does a C backend suffice?

There are a couple of things to distinguish (same for 16 vs 32):

* ease of porting 8-bit apps to 16-bit cores.
* ease of introducing a 'data doubler' for 8-bit cores.
* data-flow analysis and register allocation for DSP/RISC cores

What about ease of porting? If it is possible to define chip targets in a way that allows static checking and possibly derivation of optimization rules, a lot could be gained. Is a unified assembler / simulator feasible? For processor cores this doesn't seem so difficult: once, say, 3 random chips are implemented, generalizing them should be straightforward. So, what about solving the vendor lock-in problem? Maybe C already solves this..

Yes, simulators again. That's where the real beef is. If I take the effort to write an assembler, I should perform a little more work and provide an instruction simulator too.
Otherwise it's probably best to try to work with the supplier-provided assembler in textual form. Anyway, I should have a look at the Small Device C Compiler (SDCC), see if there's nothing to snarf there.

http://sdcc.sourceforge.net/

Entry: trade magazines
Date: Sun Aug 24 23:03:44 CEST 2008

http://www.dspdesignline.com/
http://www2.electronicproducts.com
http://www.techonline.com/
http://www.microcontroller.com
http://www.embedded.com/mag.htm
http://www.deepchip.com/
http://www.semiconductor.com
http://www.design-reuse.com/

Entry: interoperation and snarfing
Date: Mon Aug 25 10:28:07 CEST 2008

Yes, finding a niche so the project can survive is difficult. Maybe it is better to start focusing on using 3rd party tools. As mentioned in the last post: the only real reason to write an assembler is to also write a simulator. Doing this without machine readable processor descriptions is asking for trouble.

I had this idea a few weeks ago about writing an instruction extractor that snarfs directly from the vendor-supplied assembler, to extract opcodes when addressing modes are provided. Maybe a similar thing could be done with semantics? Run tethered programs to see how each instruction behaves, which flags it sets etc.. There are ways to make this not a completely manual, unverified "copy from manual" story. However, this is a road that needs focus, so it might be a better idea to avoid it, start using vendor assemblers, and forget about the simulator idea.

About simulation: the difficulty is not necessarily simulating processor cores, but simulating peripherals. Without help from the manufacturer, this is really not doable.

So.. It looks like processing text files is going to be part of the problem. I need to have a look at the OMeta parser language. This might come in handy for integration work: use grammars, not parsers.

Entry: PICkit2 + libusb
Date: Mon Aug 25 10:35:54 CEST 2008

Today I'm going to have a look at the PICkit2 serial patch-through. I think the target-side software problem (the parameterized code problem) that made me drag my feet is solved. The approach I found should help make the control combinator Coma extension more useful for metaprogramming.

Next step: understand the FFI. I've got the Barzilay/Orlovsky paper in front of me. There's a thread about libusb and some code by Jakub Piotr Cłapa here:

http://list.cs.brown.edu/pipermail/plt-scheme/2007-March/016671.html

Entry: USB / model based design?
Date: Tue Aug 26 15:28:15 CEST 2008

I'm writing some support code for the Universal Serial Bus. The problem with USB is its (too) complex setup procedure. It is quite general, and has a lot of overhead when you want to do simple things with little meta-data. It seems that this is the main reason most people stick to the HID subsystem, more so because it does not require device driver development.

I'm looking at this from two perspectives:

* Need to support the USB PICkit2 through libusb/FFI
* Client code for the PIC18

The host code necessary to drive the PICkit2 is interesting to figure out how things work, while the client code for the PIC18 is an interesting exercise in code generation from a higher level description. I've attempted this before without much success.

The eventual state of a USB connection is data packets moving back and forth between virtual pipes called ENDPOINTs. Apparently, the host side isn't that complicated. For PICkit2 I get away with claiming the interface and sending/receiving 64 byte buffers to ENDPOINT1.
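To make that concrete, here is a minimal sketch of the FFI binding, assuming libusb-0.1 and PLT's scheme/foreign layer; device discovery (walking usb_get_busses) is left out, and pk2-send/pk2-receive are hypothetical names, not anything from pk2.ss:

#lang scheme/base
(require scheme/foreign)
(unsafe!)

;; Bind the handful of libusb-0.1 entry points needed. The device
;; handle (usb_dev_handle*) is kept opaque as _pointer.
(define libusb (ffi-lib "libusb"))

(define usb-init
  (get-ffi-obj "usb_init" libusb (_fun -> _void)))
(define usb-claim-interface
  (get-ffi-obj "usb_claim_interface" libusb (_fun _pointer _int -> _int)))
(define usb-interrupt-write
  (get-ffi-obj "usb_interrupt_write" libusb
               (_fun _pointer _int _bytes _int _int -> _int)))
(define usb-interrupt-read
  (get-ffi-obj "usb_interrupt_read" libusb
               (_fun _pointer _int _bytes _int _int -> _int)))

;; PICkit2 traffic is fixed-size 64 byte reports on endpoint 1:
;; writes go to ep 1, reads come from ep #x81 (IN direction bit set,
;; matching the ep=0x81 in the piklab error above).
(define (pk2-send handle buf)       ;; buf: a 64-byte byte string
  (usb-interrupt-write handle 1 buf (bytes-length buf) 1000))
(define (pk2-receive handle)
  (let ((buf (make-bytes 64)))
    (usb-interrupt-read handle #x81 buf 64 1000)
    buf))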
Entry: prescheme
Date: Mon Aug 25 23:32:02 CEST 2008

As mentioned before, for RISC chips some dataflow analysis is necessary to do proper register allocation. I should really have a look at Factor's new intermediate representation. Probably a lot to learn there. Also, I'd like to define an applicative first order language that's as simple as possible, to get an idea of the complexity of a compiler that supports such a language, and to see how far such a thing is from properly data-flow analysed stack code. Maybe have a look at Jal? (Hmm.. according to Wouter the source code is a mess..)

Entry: PK2 replies
Date: Tue Aug 26 19:36:45 CEST 2008

box> ,staapl (enter! "staapl/pickit2/pk2.ss")
box> (open-pk2)
box> (pk2 GETVERSION)
(2 32 0 41 64 181 68 24)

That's firmware 2.32.0 and some other stuff. Got a minimalist interpreter.ss working, so commands and scripts can be created conveniently:

box> (pk2-cmd (EXECUTE_SCRIPT (pk2-script (VDD_OFF) (VDD_GND_ON)))
              (EXECUTE_SCRIPT (pk2-script (VPP_PWM_OFF) (VPP_OFF))))
(166 2 254 253 166 2 248 250)

Replies work too:

box> (pk2-cmd (GETVERSION))
(2 32 0)

OK.. this is complete overkill, but it was fun :)

Something doesn't seem to work:

box> (pk2-cmd (EXECUTE_SCRIPT (pk2-script (READ_BYTE_BUFFER))))
box> (pk2-cmd (UPLOAD_DATA))
()

Changed names to 'script-begin and 'command-begin, cleaned up the code a bit more, and all replies from executing multiple commands in a single 'command-begin are now concatenated.

Next: test the serial output. It's pretty clear that something doesn't work with executing scripts: this doesn't turn on the LED.

box> (pk2-script (BUSY_LED_ON))
()

From the manual: "RUN_SCRIPT and EXECUTE_SCRIPT commands will be ignored when any of StatusHigh bits 7:1 are set". Apparently, status needs to be read before it works.

OK, going to have some more fun. Now I've got the interpreter macros, but they behave a bit awkwardly for composition. I'd really like to model the instructions as functions. Let's organize the namespaces so the commands can be in plain sight. This works pretty well! Got free tab completion and all :)

Entry: demos
Date: Wed Aug 27 12:30:07 CEST 2008

* model-based design demos: the (graphical) DSP language compiled down to the PIC18 Forth machine + registers, and a generic USB driver generator.

* PICkit2 interaction: complete the driver + write a bitbang serial monitor.

Entry: two distinct uses of byte code
Date: Fri Aug 29 16:19:24 CEST 2008

I'm sick, so this might not make sense. There are 2 uses of byte code:

* as decoupling interface
* as serialization protocol

In the first case, bit patterns are associated with specific semantics. All programming uses this byte code explicitly, possibly from a shared table where it's specified only once, used by both sides of the interface.

In the latter case, byte code is a consequence of having to squeeze information over a channel between two components of the SAME system, and its values are essentially arbitrary: if they are never touched by a programmer, they can be generated automatically. This allows for COMPRESSED byte codes: ones that result in dense code.

They are distinguished by the code being published or not.
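As a toy illustration of the second case (a sketch, nothing that exists in Staapl): when the byte code is a private serialization protocol, opcode values can be assigned mechanically from the word list on every build, e.g. ordered by static call count for a denser encoding; a published decoupling interface could never re-assign like this.

;; Hypothetical sketch: assign opcodes 0..n-1 to words in order.
;; Sorting 'words' by usage frequency first would give the
;; "compressed" encoding mentioned above.
(define (assign-opcodes words)
  (map cons words (build-list (length words) values)))

;; (assign-opcodes '(dup drop swap)) => ((dup . 0) (drop . 1) (swap . 2))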
Entry: standard programming cable
Date: Fri Aug 29 17:30:54 CEST 2008

http://www.electro-tech-online.com/micro-controllers/36934-icd2-vs-pickit-2-a.html#post289566

When I designed the Inchworm (ICD2) and more recently the Junebug (PK2) I recall why I chose the 2x5. The RJ12 on the Microchip ICD2 does not have 0.1" pinning (breadboards & protoboards are usually 0.1"). The 1x6 female header on the PICkit 2 IMO simply comes loose too easily, plus it's not really designed for use with a programming cable (the Junebug has a 16 pin extended version available). Some clones use a 1x5 (the prehistoric PGM pin is omitted) polarized male header; you need a crimp tool to build them. There is also no strain relief, so you have to be careful. Other clones use a 2x5 male header and IDC cable. This is the system my kits use.

1. Inexpensive
2. Robust: the IDC cable has strain reliefs
3. Reliable: on my kits I double up the connections
4. Can be assembled easily: a small vice works well
5. Can be used on a solderless breadboard (the doubled connections work well here)

Looks like a plan.

Entry: PICkit2 UART mode
Date: Fri Aug 29 18:45:21 CEST 2008

Apparently, READ_STATUS clears the UART mode.. This gives a nice square wave:

(begin
  (ENTER_UART_MODE (baud 300))
  (DOWNLOAD_DATA (build-list 60 (lambda _ #x55))))

The baud calculation is correct too. Measured 35 ms pulse distance, which is about 30 Hz (one 10-bit frame at 300 baud = 33 ms). Hardware loopback works too.

Bootstrap roadmap:

* Connect the ISP and UART pins on the target together, so the current monitor can be used.
* Set up the host side data connection.
* Implement target side bit-banged serial.
* Support native programming, so we can go from full recompile & upload straight to interaction without external tools.

Maybe it's actually easier to set up programming first. Let's have a look. How to request the device ID? A script? Yes, looks like it.

Entry: Applications
Date: Fri Aug 29 21:38:13 CEST 2008

http://www.automotivedesignline.com/howto/210200925;jsessionid=ZUA0HLFOIMXXKQSNDLPSKHSCJUNN2JVN?pgno=3
http://www.autosar.org/find02_07.php

Have a look at the microcontroller abstraction layer.

Entry: MetaML
Date: Sat Aug 30 10:26:45 CEST 2008

http://en.wikipedia.org/wiki/MacroML

MacroML is an experimental programming language based on the ML programming language family that seeks to reconcile ML's static typing system with the kinds of macro systems more commonly found in dynamically typed languages like Scheme; this reconciliation is difficult, as macro transformations are typically Turing-complete and so can break the type safety guarantees static typing is supposed to provide.

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.10.6987

Read that and figure out the idea behind macros in a staged setting.

Entry: more examples
Date: Tue Sep 2 18:58:27 CEST 2008

After a couple of days of rest it's become pretty clear that what the project needs most is more examples. Plenty of things to do though:

* Finish the PICkit2 interface (module = bitbanged serial port)
* USB drivers (module = generic usb + hid + serial + audio)

Other things that keep coming back:

* parsers (OMeta)
* quasiquotation, MetaML and syntax-case
* difference btwn. metaprogramming Scheme and lambda calculus only. (define global identifiers?)

Entry: quasiquotation and staging
Date: Tue Sep 2 22:23:08 CEST 2008

quasiquotation = merging of namespaces

syntax-case does this by importing identifiers. Look at Oleg's paper about MetaML vs. quasiquote and renaming. Maybe the essential difference is the compile-time identification of bound names?
Scheme's macro facility only makes sure that names come from the right lexical environment, but it does not identify binding forms and variable references. Another thing is cross-stage persistence: quoted code creates closures (variables from generation time are still available).

So, quasiquotation. Apparently I got the nested version wrong:

http://repository.readscheme.org/ftp/papers/pepm99/bawden.pdf

"The innermost comma is associated with the outermost backquote."

I found a definition of quasiquote here:
http://srfi.schemers.org/srfi-46/eiod.scm

(define-syntax quasiquote
  (syntax-rules (unquote unquote-splicing quasiquote)
    (`,x x)
    (`(,@x . y) (append x `y))
    ((_ `x . d) (cons 'quasiquote (quasiquote (x) d)))
    ((_ ,x d) (cons 'unquote (quasiquote (x) . d)))
    ((_ ,@x d) (cons 'unquote-splicing (quasiquote (x) . d)))
    ((_ (x . y) . d) (cons (quasiquote x . d) (quasiquote y . d)))
    ((_ #(x ...) . d) (list->vector (quasiquote (x ...) . d)))
    ((_ x . d) 'x)))

Entry: That safe language
Date: Thu Sep 4 01:20:24 CEST 2008

Maybe what I need is a language that can be expressed completely as rewrite rules instead of stacks? Maybe it makes sense as a didactic tool?

Entry: pk2cmd
Date: Thu Sep 4 13:12:03 CEST 2008

Simplified entry trace in pseudo-C:

main()
  pk2app.PK2_CMD_Entry(argc, argv);
    checkHelp(argc, argv);
    PicFuncs.ReadDeviceFile(tempString);
    Pk2OperationCheck(argc, argv);  // check if conn is necessary
    findPICkit2(pk2UnitIndex);      // find pk2 and init comm
    processArgs(argc, argv);        // execute commands
    PicFuncs.ReadPkStatus();        // only if it did comm

The beef is in processArgs():

  PicFuncs.VddOff();                // make sure VDD is off
  // look for part name first
  PicFuncs.FindDevice(tempString);  // look for the device in the device file
  // from then on, process args according to 4 priorities
  // these are in cmd_app.cpp and take you directly to the action
  priority1Args(argc, argv);
  checkArgsForBlankCheck(argc, argv);
  priority2Args(argc, argv);
  priority3Args(argc, argv);
  priority4Args(argc, argv);
  delayArg(argc, argv);

Example: program a .hex file:

tom@zzz:/tmp$ pk2cmd -pPIC18F1220 -fsynth.hex -m
PICkit 2 Program Report
4-9-2008, 13:32:55
Device Type: PIC18F1220
Program Succeeded.
Operation Succeeded

This contains the options:

-p device

-f hex file selection:

  // in cmd_app.cpp
  bool Ccmd_app::priority1Args(int argc, _TCHAR* argv[])
    case 'f':
      ret = ImportExportFuncs.ImportHexFile(tempString, &PicFuncs);

-m program memory:

  // in cmd_app.cpp
  bool Ccmd_app::priority2Args(int argc, _TCHAR* argv[])
    // -M Program
    if ((checkSwitch(argv[i])) && ((argv[i][1] == 'M') || (argv[i][1] == 'm')) && ret) {
      if (hexLoaded) {
        if (argv[i][2] == 0) {
          // no specified region - erase then program all
          if (PicFuncs.FamilyIsEEPROM()) {}  // ignoring eeprom programming
          else {
            bool rewriteEE = PicFuncs.EraseDevice(true, !preserveEEPROM, &usingLowVoltageErase);
            if (PicFuncs.FamilyIsPIC32()) {}  // ignore PIC32 programming
            else {
              // program all but configs and verify, as configs may contain code protect
              // code here mainly calls PicFuncs.WriteDevice
            }
          }
        }
      }
    }

Finally, in PICkitFunctions.cpp:

  bool CPICkitFunctions::WriteDevice(bool progmem, bool eemem, bool uidmem, bool cfgmem, bool useLowVoltageRowErase)

This function contains the main programming flow:

  // compute configuration information.
  int configLocation = (int)DevFile.PartsList[ActivePart].ConfigAddr / DevFile.Families[ActiveFamily].ProgMemHexBytes;
  int configWords = DevFile.PartsList[ActivePart].ConfigWords;
  int endOfBuffer = DevFile.PartsList[ActivePart].ProgramMem;

  SetMCLR(true);  // assert /MCLR to prevent code execution before programming mode is entered.
  VddOn();

  // ignoring some device specific things (OSCCAL/Keeloq/PIC24)
  RunScript(SCR_PROG_ENTRY, 1);
  if () {
    DownloadAddress3(0);
    RunScript(SCR_PROGMEM_WR_PREP, 1);
  }
  while data {
    DataClrAndDownload(downloadBuffer, DOWNLOAD_BUFFER_SIZE, 0);
    RunScript(SCR_PROGMEM_WR, scriptRunsToUseDownload);
  }

  // The functions called here are very simple wrappers around
  // writeUSB() that build a command packet.
  // Programming EEPROM, DEVID and CONFIG is similar.

Entry: pickit disassembler
Date: Thu Sep 4 14:30:15 CEST 2008

Because it's probably not too hard to do, and because I need a way to see if the PK2 v2 VM can be ported to the AVR (not using the SFRs while programming):

box> (pk2-script 1 dasm)
Standard 12V Vpp. 100ms delay after entering. Drop Vpp & retake high
to try entry on parts with internal MCLR & osc, and ICSP pin(s) high
outputs.
(VPP_OFF MCLR_GND_ON VPP_PWM_ON BUSY_LED_ON SET_ICSP_PINS 47872
 DELAY_LONG 20 MCLR_GND_OFF VPP_ON DELAY_SHORT 127 MCLR_GND_ON
 VPP_OFF VPP_ON MCLR_GND_OFF DELAY_LONG 19)

Entry: microchip forum post
Date: Thu Sep 4 17:02:50 CEST 2008

First, I'd like to point out that it is possible to use Staapl without the metaprogramming. I've tried to be careful to not let high abstractions get in the way when problems are simple and don't call for them. The website contains a couple of tutorials for this, and Staapl's main test application (a sound synthesizer) is of that class. The only thing that's generated there is a lookup table of note frequencies, a geometric series. This is the Scheme code that implements that macro:

http://zwizwa.be/darcs/staapl/staapl/pic18/geo.ss

For the metaprogramming, I'll macro-expand the sales pitch a bit. I haven't found a concise way to introduce the idea to people not familiar with both Forth and Scheme..

> Staapl is a collection of abstractions

A library of Scheme functions and macros

> for metaprogramming microcontrollers

that can be used to construct programs in a lower level language based on Forth, and interactive code for testing program operation.

> from within PLT Scheme.

Staapl integrates with PLT Scheme's macro/module system, a solid base for writing large applications and domain specific languages. The main point is that all name space and name scope management is PLT Scheme's.

> The core of the system is a programmable code generator

Think of it as a macro assembler which uses a stack to pass arguments. This leads to a postfix assembly language, instead of the normal prefix OPERATOR OPERAND ... syntax.

> structured around a functional concatenative macro language

The 'concatenative' part is where the ease of metaprogramming is hidden: composition of programs is just concatenation of strings. ( This is how PostScript is mostly used. ) The macros are not quite Forth, but a side-effect-free higher-level version that makes the Staapl core a lot simpler.

> with partial evaluation.

This is basically 'constant folding' optimization, but it is also used to write template code. A concatenative functional language makes partial evaluation almost trivial.

> On top of this it includes a syntax frontend for creating Forth style languages

The core is not Forth, but I've added a frontend that looks a lot like Forth.
This is historical, since the original idea was to build a Forth compiler, but the core evolved into a different language.

> a backend code generator for the Microchip PIC18 microcontroller architecture

A standard macro set implements a stack machine layer for the PIC18. I'm working on the 12 and 14 bit cores.

> and interaction tools for shortening the edit-compile-run cycle.

One of the great benefits of Forth is its interactivity: with the target chip running, you can create new functions and try them out. However, Staapl is not self-hosted (not stand-alone), so interactivity needs to be simulated. This allows interactive development on tiny chips.

Now, why does this matter? The original goals of this project are basically:

* Interactive Forth development for tiny chips.
* Easy generation of code that has a large amount of red tape, but is difficult to express in a low-level language.

The first one is really just the standard Forth approach. If you've never tried it, give FlashForth a try ( http://flashforth.sourceforge.net ). It stays closer to traditional Forth. Staapl's Forth is more optimized for the integration with Scheme.

The latter one is more ambitious, and is heavily inspired by the Lisp/macro approach of writing domain specific languages. This is sometimes called model-based design/development: use high level descriptions to compile to low level form. Add to this that Forth is very simple to generate, and you have a high+low level system that is impedance-matched.

The problem is that for small projects, this approach is overkill. I'm aiming mostly at large code bases which contain a lot of parameterization. I'm working on a USB driver framework written using such an approach, USB containing enough red tape to make you want to keep specification and implementation separate (the implementation then becomes a special purpose translator from specification to concrete code). If this model-based idea rings a bell, drop me a line. I'm looking for concrete problems to apply this to ('compiling' specifications).

* * *

quote: ORIGINAL: DavidP5

I know that functional languages can be quite helpful in language translation. The purpose of this Staapl system seems to be for translating from one language to another (in this case Forth-style languages are translated, using the functional language Scheme, into PIC18 assembly code?). So it seems that you can create your own Forth-style language and then use this system to make your own compiler for the language. But, I'm sure you would have huge reservations about such a plan for microcontroller code generation. I would be interested in hearing from the OP whether I have this correct.

There are 2 translation phases involved.

1. The Forth macro language translates concatenative code to machine code. To port this to a new target, some key macros need to be defined that generate the appropriate machine code for a simple set of primitives. Optionally, this can add some optimizations. Most of the core compiler is reused. This layer mostly presents a 'generic microcontroller' to the upper layers if desired. It's perfectly feasible to use this Forth language as a programming language by itself.

2. Code for this basic low-level Forth language can then be generated from within Scheme. This is if you want to use the 'generic microcontroller' stack machine as the machine to target your domain-specific description language.
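To make phase 1 concrete, a sketch of what such key macros look like, using the [qw ...] pseudo-instruction notation that shows up later in this log; the exact pattern syntax and PIC18 opcode spellings here are illustrative, not copied from the source:

;; Three rewrite rules for '+'. Greedy matching on the generated code
;; implements partial evaluation: two literals fold at compile time,
;; one literal becomes an add-literal instruction, and the general
;; case pops the second operand from the memory-mapped data stack.
(patterns (macro)
  (([qw a] [qw b] +)  ([qw (+ a b)]))          ;; constant folding
  (([qw a] +)         ([addlw a]))             ;; literal + runtime value
  ((+)                ([addwf POSTDEC0 0 0]))) ;; runtime + runtime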
Entry: more pk2
Date: Thu Sep 4 18:51:32 CEST 2008

Added accessors to the pk2 module namespace that query the current device's properties:

box> (print-script (ProgMemWrWords))
Erase progmem (09) & EE(0B), int timed 10ms
WRITE_BITS_LITERAL #x06 #x00
WRITE_BYTE_LITERAL #x00
WRITE_BYTE_LITERAL #x00
WRITE_BITS_LITERAL #x06 #x09
DELAY_LONG #x02
WRITE_BITS_LITERAL #x06 #x0B
DELAY_LONG #x02

box> (script (ProgMemWrWords))
Erase progmem (09) & EE(0B), int timed 10ms
(238 6 0 242 0 242 0 238 6 9 232 2 238 6 11 232 2)

This required only minor modification:

;; Also create accessor thunks for properties on top of the hash DB.
;; These will get the property of the current part (makes
;; name-checking static).
(define-syntax-rule (define-reader/provide name (type id . args) ...)
  (begin
    (define-reader name (type id . args) ...)
    (begin (define (id) (property 'id)) ...)
    (provide id ...)))

(define part (make-parameter 'PIC18F1220))

(define (property tag [dev (part)])
  (let* ((part (hash-ref *part-index* dev))
         (fam  (vector-ref *family* (hash-ref part 'Family))))
    (hash-ref part tag (lambda () (hash-ref fam tag)))))

Trying to make a device-read script now, but something is missing: the code executes scripts on the device, but I didn't see the part where the script gets uploaded. The low-level routine is:

// PICkitFunctions.cpp
bool CPICkitFunctions::downloadScript(unsigned char scriptBufferLocation, int scriptArrayIndex)

The one that calls this is:

void CPICkitFunctions::downloadPartScripts(int familyIndex)

This is called by:

void CPICkitFunctions::PrepPICkit2(void)

which also sets the VDD and Vpp voltages.

Scripts start from index 1, not 0!!

box> (SCRIPT_BUFFER_CHKSM)
(0 0 0 0)
box> (DOWNLOAD_SCRIPT 1 (script (ProgMemRdScript)))
Reads 64 words of program memory.
()
box> (SCRIPT_BUFFER_CHKSM)
(1 0 238 0)

So uploading seems to work. Execution not, though:

;; CPICkitFunctions::ReadDevice
(define (read-program-memory)
  (READ_STATUS)
  (MCLR_GND_ON)
  (EXECUTE_SCRIPT (VDD_GND_OFF) (VDD_ON))  ;; we power the device
  (EXECUTE_SCRIPT (CLR_DOWNLOAD_BFR) (DOWNLOAD_DATA 0 0 0))
  (EXECUTE_SCRIPT (script (ProgMemAddrSetScript)))
  (CLR_SCRIPT_BFR)
  (DOWNLOAD_SCRIPT 1 (script (ProgMemRdScript)))
  (EXECUTE_SCRIPT
   (CLR_UPLOAD_BFR)
   (RUN_SCRIPT 1  ;; index ???
               1) ;; repetitions?
   (UPLOAD_DATA_NOLEN)
   (UPLOAD_DATA_NOLEN)))

I'd like to eliminate the use of 'script' too. For printing scripts, one should use the 'properties function. This requires some type spec, maybe derived from the name?

Entry: remarks about pk2.ss
Date: Fri Sep 5 01:45:58 CEST 2008

Can I say something intelligent about the pk2.ss implementation? It's about primitives and composition, and hiding as much as possible behind NAMES, exposing only a single behaviour for each... I.e. a selector name in the device data base representing a script has the ONLY behaviour of reproducing the script code, parameterized on the current device. All indirection in the data structure can be hidden behind this one name. 3 namespaces are brought together: commands, script primitives and scripts from the .dat file.

One remaining confusing bit is that scripts can be invoked as if they were commands without error. 2 semantics: commands and scripts. What I fail to see is why these 2 semantics are necessary. Basically, the scripts are composable with something like an ordinary 'call' in the form of RUN_SCRIPT, which has an extra repeat parameter. The commands are a flat meta-language.

Something to fix: make sure it's not possible to confuse scripts and commands, by making their return values distinct.
It is convenient to not have to manually 'execute' commands, but confusion is easy.

Entry: MetaML vs Scheme Macros
Date: Fri Sep 5 02:16:24 CEST 2008

In MetaML's staged programming, the substitution is done on binding expressions: the static analysis knows which are binding constructs and which are variable references. For Scheme macros this is not possible for generated code; however, it is possible to identify bindings using bound-identifier=?

I'm confused...

Entry: blink-a-led
Date: Fri Sep 5 09:48:55 CEST 2008

As requested on the Microchip forum. Actually not a bad idea. Preparing some code on the 452 proto board. Requires a bit of shuffling. Made a command line compiler wrapper script to use in the examples.

Entry: The 18F452
Date: Mon Sep 8 11:59:18 CEST 2008

Trying to share some code for the monitor, making it easier to have a "default" interpreter running (the current code contained too much red tape). It is implemented as macros, so separate instantiation is possible. The word 'init-all/console will perform all initialization necessary to set up the machine model (stacks) and initialize the serial port for the console. In addition, it starts the interpreter if a serial connection is detected (this requires a pulldown resistor on the serial port).

Some things to fix: the serial port baud rate should be derived from a single specification. OK.

What needs to happen still? I'd like to use this 'sc' script installer, but that's a pain..

Entry: Mail from JPC
Date: Mon Sep 8 15:28:27 CEST 2008

* http://www.sics.se/~adam/pt/ -- Isn't that just a state machine?
* Objective-C AutoreleasePools -- Isn't that just tree rewriting?
* ARM THUMB code: use this as a native Staapl target, and use C for the 32-bit mode. Also, check out fmt.ss: http://planet.plt-scheme.org/display.ss?package=fmt.plt&owner=ashinn
* Look at Typed Scheme for the namespace hierarchy stuff..

Entry: make-mzscheme-launcher
Date: Mon Sep 8 17:56:49 CEST 2008

I wanted to send this to the list before I ran into 'make-mzscheme-launcher.

--

Hello,

I'd like to figure out a way to install (platform specific) wrapper scripts that call the main entry point of an application stored on PLaneT. I.e. the Staapl compiler has a command line front-end that can be invoked as:

mzscheme -p zwizwa/staapl --

But for convenience I'd like to wrap this in a script called 'sc':

#!/bin/bash
exec mzscheme -p zwizwa/staapl -- "$@"

Now, I wonder if there's a way to do this so that it "just works" on all platforms supported by PLT Scheme, after doing something like:

sudo mzscheme -p zwizwa/staapl/install

Also, is it possible to generate some progress report during installation of a package? I understand the default of quiet is to be preferred in many cases, but this seems to confuse people.. (Is there really something happening??) Maybe as a flag for the 'planet' binary?

Cheers,
Tom

Entry: old partial evaluation explanation
Date: Tue Sep 9 12:55:40 CEST 2008

( This was removed from the blog and replaced with a post that stresses the BINDING and QUASIQUOTATION mechanisms used to implement SUBSTITUTION RULES that are INSPIRED by PARTIAL EVALUATION. )

http://zwizwa.be/ramblings/staapl-blog/20080526-203330

-------------

So, how does it work?

PE from (greedy) deterministic pattern matching = a typed template language

So, by fixing the algorithm used to implement PE, a language emerges that is useful for other kinds of code generation. Let's spin that out a bit.
PE in a concatenative language is quite straightforward: function composition is associative, which makes evaluation order a parameter to play with. Compositions of pure functions can be performed at compile time to generate composite functions that are more efficiently implemented, while other compositions can be postponed to run time due to dependence on dynamic data (1). This is because concatenative syntax allows the composition method to be abstracted away: a function can always be trivially inlined, instead of being invoked at runtime using the run-time composition mechanism: the machine's function call (2).

When inlining multiple functions, there can be an opportunity for specialization by moving some computations to compile time. For example, inlining the functions [ 1 ], [ 2 ] and [ + ] produces an inlined composition [ 1 2 + ] which can be replaced by the composition [ 3 ]. This is automatic program specialization by Partial Evaluation in its purest form.

In Purrr, the Partial Evaluator is not implemented as a separate component. PE is a consequence of the actions of the machine model, which is specified in terms of primitive Purrr macros, which implement a map from recently generated code (the top of the compilation stack) to new code to be generated (placed on the compilation stack). These primitives are expressed in a language with deterministic pattern matching. It allows the specification of the following compiler components:

* target code generation
* peephole optimization
* partial evaluation
* generic parameterized template instantiation

The first 3 could be counted as components of pure partial evaluation. The last one however is not: it is an interface that connects the concatenative macro language to explicit code generation tools. It allows the use of templates that have no target semantics unless they are parameterized.

Why is this useful? Say you want to implement 'cos' as a function of two arguments, like

cos ( angle scale -- value )

Realizing that a true 'cos' function is never used in the target code, because the scale can be fixed and is available at compile time, it can be implemented as a template that generates a lookup table and code to look up the value. If generic cosine routines are necessary later, this template macro can be extended to compile a call to library code in case the parameter is not available at compile time. (A sketch of this idea in plain Scheme follows below.)

One can be surprised how many times this pattern occurs: due to the lack of target support for specific primitive abstractions, it is often easier to write something as a template for specialized code. Note that this is different from programming for non-embedded systems, where this primitive functionality is usually available.

The advantage of doing it this way is that the code is easier to read: code expresses semantics more easily without instantiation annotation getting in the way. This annotation can be expressed somewhere else in the form of 'forth' and 'macro' mode indicators. The disadvantage is that a lot of code will be pushed towards implementation as a macro. If this is taken too far, possible sharing might be endangered. For that reason, moving between macro and instantiated target code is made really straightforward in Purrr, but it remains an explicit operation under programmer control.
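The promised sketch, in plain Scheme rather than Purrr, just to show the specialization step: 'scale' is fixed at generation time, so the table can be computed once and 'cos' degenerates into an indexed fetch. In Purrr the table would be emitted into program memory; the names here are illustrative:

#lang scheme/base
(require scheme/math)  ;; for pi

;; Specialize 'cos' to a fixed scale: tabulate 256 samples at
;; generation time, and return a function that only does the lookup.
(define (specialize-cos scale)
  (define table
    (build-list 256
      (lambda (i)
        (inexact->exact
         (round (* scale (cos (* 2 pi (/ i 256)))))))))
  (lambda (angle)  ;; angle: 0..255
    (list-ref table angle)))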
Explicit code generation in Purrr is useful when

* partial evaluation becomes too hard to do automatically
* some on-target primitives are not available
* algorithms are hard to express in concatenative syntax

So as long as it is possible to express a general algorithm in the purely functional macro sublanguage, the built in PE can be used to specialize the code. The advantage here is that the difference between compile time and run time can be silently ignored as an implementation detail. However, in practice it might sometimes be easier to make the code generation process a bit more explicit. In Purrr it is made very straightforward to plug in arbitrary Scheme code for parameterized code generation.

Conclusion

Stack languages are interesting for writing parameterizable low-level code because the composition mechanism is so simple:

* They are very straightforward to implement on the target architecture, with a small run-time footprint of two stacks.

* Automatic specialization through partial evaluation is very straightforward to implement off-target.

* Implementing the code generator (including PE) using deterministic pattern matching exposes an interface that can be reused for plugging in arbitrary parameterized code generators.

In Purrr, the code generator language is Scheme. Within Scheme all of Purrr and the underlying compiler is exposed: you can decide to generate (pseudo) assembly code, Purrr code, or interface to external code generators.

--

(1) Of course, the Purrr target semantics is not purely functional. It contains language primitives that introduce machine state (determined by world state) through a side channel into the concatenative language. This is no problem for PE, since it merely limits PE to the subset of pure functions. Procedures that depend on other parts of the machine state aside from the (threaded) parameter stack simply have to be instantiated, and cannot be performed at compile time.

(2) Except when it interferes with the implementation of the run-time function composition method, i.e. modifies the return stack. A more correct statement would be that the subclass of pure functions can always be trivially inlined.

Entry: concatenative email
Date: Tue Sep 9 15:10:14 CEST 2008

Hello folks,

I'm trying to write a paper about the core idea of Staapl from the perspective of concatenative code rewriting, and how a particular _implementation_ of this gives a convenient applicative -> concatenative metaprogramming framework. These two blog posts try to explain the mechanism:

http://zwizwa.be/ramblings/staapl-blog/20080806-212034
http://zwizwa.be/ramblings/staapl-blog/20080625-162839

As side information, a post that deals with the 'impedance match' between Scheme and the concatenative macro language, based on pattern matching and quasiquotation:

http://zwizwa.be/ramblings/staapl-blog/20080526-203330

I'm wondering mostly if this makes sense.. While using the abstractions in real life works beautifully, I have great trouble trying to explain in a few words why this all works so well. Basically, it's the interplay of:

* pattern matching for destruction/construction of stack machine code
* using this for eager partial evaluation implementing rewrite rules
* macro hygiene and lexical closures

Any comment welcome. (Best one gets a free PIC kit ;)

Cheers,
Tom

Entry: blink-a-led on the 18F452
Date: Tue Sep 9 16:03:19 CEST 2008

It's working.. Don't know what went wrong last time; something to do with the FTDI cable giving up.
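For the record, the shape of such a program, as a sketch only: the pin words and the delay loop below are illustrative assumptions, not copied from the actual example code (assume the LED sits on PORTB bit 0 and the pin is already configured as output):

\ Sketch of a blinker in the PIC18 Forth, using the reg/bit style
\ seen in the sync-reader example above. Word names are assumptions.
macro
: led    LATB 0 ;  \ expands to register + bit number
forth
: delay  100 for 255 for next next ;
: blink  begin led high delay
               led low  delay again ;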
What needs to be fixed to make the demo work smoothly in interactive mode?

* State should contain the toplevel macro file, so macros can be regenerated.
* The interpreter should evaluate macros if it doesn't find a target word or substitution macro.

Entry: USBprog
Date: Wed Sep 10 12:03:00 CEST 2008

I'm thinking about contributing to the USBprog PICkit2 v2 firmware clone:

http://www.embedded-projects.net/index.php?page_id=218

Preliminary source code is in the SVN:

svn co http://svn.berlios.de/svnroot/repos/usbprog

The directory with the firmware is here:

svn co http://svn.berlios.de/svnroot/repos/usbprog/trunk/usbprogPIC/firmware2/

Since there is already C code, writing it in Forth is maybe not a good idea. A more interesting approach might be reverse-engineering of the .HEX file + building some tools for that. Writing C for AVR is probably also a good starting point to get more familiar with the architecture + it could allow some code generation experiments.

Conclusion: for me this project is about:

* Working towards a standard programmer for Staapl, next to the native PICkit2.
* Adding some reverse engineering functionality for PIC18 (including some simulator stuff?).
* Getting to know the AVR and its gcc-based toolchain, and getting an idea of how good the compiler actually is, to see if there's much to gain with a stack-based HAL.

Entry: 8051
Date: Wed Sep 10 12:46:39 CEST 2008

Another important target next to AVR and ARM/THUMB might be the 8051. This can serve as inspiration:

http://www.camelforth.com/page.php?4

Also, this might make Gert get off his ass and try my software..

EDIT: just requested samples for the AT89C51RB2 and ATMEGA168.

Entry: Re: DavidP5 (Microchip forum)
Date: Fri Sep 12 08:33:05 CEST 2008

In the following thread: http://forum.microchip.com/tm.aspx?m=356807

DavidP5:
> So, what you made is a variant of the Forth programming language for
> PIC18s?

Essentially, yes. That's what is finished now. But there's a lot more to it than that, mostly moving in two different directions:

Higher abstraction: It is Forth, but with its conventional macro system replaced by a functional programming language based on the Joy language:

http://en.wikipedia.org/wiki/Joy_(programming_language)

Why?

* As you already mentioned, functional programming languages are good for writing compilers: a compiler is a data structure converter == a function. A tool most useful for this is pattern matching:
http://en.wikipedia.org/wiki/Pattern_matching

* Due to their similarity, the Joy/Forth interplay makes partial evaluation easy to express as simple algebraic manipulation. I.e. there is a rule that says that "1 2 +" can always be replaced with "3", or more generally for any two numbers preceding "+". The trick is that this mechanism can be used for more general template programming.

This, together with the integration in the PLT Scheme module system, makes a solid base to start experimenting with different languages and domain specific models on top of the low-level typeless stack machine language.

Portability: The idea is to find a standard set of macros that can abstract the microcontroller architecture to a large degree. This set will be based as much as possible on standard Forth and existing stack computer instruction sets. Compared to C this can go a lot further, because it is based on a macro system instead of a function library API, as is common for other abstraction layers. Staapl contains a (dis)assembler generator that should make porting to new architectures relatively straightforward.
I'm currently working on the 12 and 14 bit cores. The step to integrating this with a processor simulator generator isn't that big, but this would really need closer cooperation with chip vendors, as I'd like to do this with machine-readable chip descriptions.

What I'd like to find out is how people who spend most of their time ``in the trenches'' would use this tool. For me it's in the first place a tool to build things on top of, but I'm quite curious whether using it just as a low-level machine abstraction layer would work, and what would need to change to make that really practical. I wrote some nontrivial applications in it (a sound synthesizer toy and an ASK/PSK modem), so I'd say it can definitely be used as such.

For more information, here's some documentation that talks only of the lower Forth layers:

http://zwizwa.be/archive/pic18-forth.pdf
http://zwizwa.be/archive/pic18-synth.pdf

Cheers,
Tom

Entry: simplify simulation
Date: Fri Sep 12 12:46:53 CEST 2008

Actually, if I only knew the stack effect of a macro for all literal arguments, it would be straightforward to simulate it during live target interaction. This could get rid of all explicitly coded simulation in live/command.ss, making the command frontend less dependent on extensions (all semantics would be encoded in macros). Approaches:

* annotate macros with stack effect (pre-checks)
* let it operate on data directly (either lazily, or copy the whole stack)

I.e. for +:

1. copy the whole stack: (1 2) -> ([qw 1] [qw 2])
2. apply the macro:      -> ([qw 3])
3. success? replace the stack: -> (3)

Copying the whole stack isn't too much communication overhead, and it's easiest to implement. The change needs to be made in live/target-lang.ss in 'target-interpret. This needs to change to full symbolic interpretation relative to the compiler namespace: map-id needs to do nothing. Maybe the undefined symbol hook for the target/xxx namespace reference should be added, to then try the macro/xxx namespace? It's simpler than that: at compile time, check if the name is defined, otherwise compile a macro/xxx reference. Use 'identifier-binding. See next posts.

The trouble is that writing a generic simulator is a lot of work. Probably not so much the infrastructure, but entering all the processor specific data. Looks like I'm better off just keeping the current simplistic interpreter + a macro evaluator.

Entry: simulator simulator
Date: Fri Sep 12 13:22:50 CEST 2008

I don't have a target board at hand atm, but I'd like to test the interaction code.. How to proceed? Let's connect the default IO to a stack machine simulator that simulates the 3-instruction Forth.

Got it +- working, except for data stack access: this is not abstract enough in the current tethered approach. The simplest solution seems to be to add a 'meta' command to the interpreter, which will load a pointer to a meta struct area. This could then contain info about the target, without the need for more interpreter instructions. Alternatively, this could be stored in a per-micro specific location, i.e. the boot block.

OK.. now I'm getting ambitious.. What about turning this into a seed for a proper machine simulator? Instead of writing a version of the interpreter for the machine, compile the current .f code to another machine and run that in the simulator. Might be a good idea to converge on the standard machine model. Let's try to separate things out a bit..

1. Reference semantics = Scheme. The pattern matching rules define the different type signatures.
2. Machine model = stack ops + memory reference. The stack ops can be derived straight from the macros.

So, build a function that translates a bunch of macros into a symbolic interpreter for a stack machine with a certain word length. What should be the goal?

* To be able to compile .f code to a standard virtual machine for bit-accurate simulation + testing of control structures.
* To simplify this kind of simulation + generic platform-specific simulation and porting.

The idea is that if I want to compile to a huge number of architectures, I had better think about a decent simulator/assembler architecture. It's probably best to allow for an incremental assembler construction path: not all opcodes are necessary to create a basic uC port. A lot of the assembler infrastructure for PIC18 is not really used.

Entry: towards standard machine architecture
Date: Fri Sep 12 16:29:41 CEST 2008

There are essentially 2 kinds of semantics involved: the PE semantics, which is Scheme with infinite precision and all kinds of data types, and the concrete machine semantics. All the rest is just intermediate and should be derived, or at least verified. In order to improve test-driven development, these need to be made comparable: some redundancy needs to be inserted to check compiler correctness and machine model correctness.

So, is it possible to create a model for the 'generic RISC' chip emulating a stack machine, and then gradually use this model in compiler generation/verification?

Microcontroller HAL:

* Harvard architecture (2 memory pointers)
* 2 stacks
* arithmetic operations operating on the parameter stack
* function call/jump
* conditional jumps

It looks like this is the same problem as creating a simulator for a specific processor based on a generic simulator writer framework.

Roadmap: build a verifier for the current PIC18 compiler first. This means that all instructions used should have annotations that allow semantics to be added, enabling simulation and thus verification. The processor (logic) specification language can be a simple functional language with named nodes (multi in/out functions). Maybe SSA, essentially parallel dataflow (really?), is actually the most convenient representation?

(define-io (add (a b) (dst w n ov z c dc))
  (! (c W) (+ a b))
  (! z  (zero? W))
  (! n  (ref W 7))
  (! dc (ref W 3))
  (! ov (xor c n)))

Then the base language: if I'm not going to express it down to the gate level, some meaningful intermediate level needs to be used. Here there is really no alternative other than C.

Simulator conclusions:

* Compared to compilation, ultimately simulation needs to be FAST. The only way to make things run fast on arbitrary host systems (workstations) is to generate C, and specialize a simulator to a program.

* This answers the question of primitive semantics: C and its bit manipulations. The composition mechanism can be kept at SSA, which is trivially transferred to/from other dataflow representations.

* Such a primitive language is easily simulated in Scheme when speed isn't important, but it's important for that layer to not use too many higher order tricks that would prevent compilation to C later.

Entry: summary of activity
Date: Sat Sep 13 09:23:20 CEST 2008

* PICkit2: working towards an integrated solution that can program + communicate over the ICD2 connector.

* thinking about a standard machine model + how this can be combined with a concrete machine simulator, and how to write such a thing, or if a virtual simulator is better.
* thinking about more targets and what needs to change in the assembler generator to support addressing modes (the PIC doesn't have any: its registers are memory mapped).

* occam-pi and Staapl: how much does it take to run just the synchronization primitives in Forth?

* documentation and correspondence

Entry: MetaScheme
Date: Sat Sep 13 10:16:27 CEST 2008

* Trying to re-implement this:
http://okmij.org/ftp/Computation/Generative.html#meta-scheme

We implement the very clever suggestion by Chung-chieh Shan and represent a staged expression such as .<(fun x -> x + 1) 3>. by the sexp-generating expression

`(,(let ((x (gensym))) `(lambda (,x) (+ ,x 1))) 3)

which evaluates to the S-expression ((lambda (x_1) (+ x_1 1)) 3). Thus bracket is a complex macro that transforms its body to the sexp-generating expression, keeping track of the levels of brackets and escapes.

The CEK machine: code, environment, continuation. ?

* Trying to find the key points of difference between MetaML and Staapl. This is quite difficult, because there is no clear analogue of a "code object" in Staapl. There are two: low level stack machine code, and anything Scheme (including Coma code), which represents a stack machine code transformer. The presence of partial evaluation makes switching between these two straightforward, but also a bit confusing..

1. Bracket == Quasiquote in Staapl, because template code cannot generate new binding constructs. This makes things simpler.

2. Code objects are opaque (implemented as machine code transformers closed over the lexical Scheme environment) instead of abstract syntax trees. This allows one to never have to leave the semantics of generators, and Scheme -> Coma translation isn't really staging, but namespace mixing.

Entry: FSM-hume
Date: Sat Sep 13 17:49:49 CEST 2008

FSM-Hume: check the primitives, and assimilate.

http://www.macs.hw.ac.uk/~greg/hume/

Hume is a novel programming language, intended for resource bounded domains, designed at Heriot-Watt University and the University of St Andrews. It is based on concurrent finite state automata controlled by pattern matching and recursive functions over rich types. Hume has been designed as a multi-level language, where different levels have different formal properties amenable to different analyses. HW-Hume is a relatively impoverished language of bits and tuples for characterising hardware, with decidable equivalence and termination, and predictable time and space behaviour. FSM-Hume introduces fixed precision abstractions over bit tuples, including integers, reals, strings and vectors, with associated operators and conditional constructs. This level, oriented to wider finite state machine-based designs, has strongly bounded time and space behaviour. HO-Hume augments FSM-Hume with a repertoire of higher-order functions with known cost models, such as map and fold, and user-defined non-recursive functions. PR-Hume extends HO-Hume with user-defined primitive recursive bounded functions, and full Hume is a Turing complete language.

EDIT: Reading the introduction of the Hume report, the thing that struck me is the use of a restricted set of higher order combinators without full recursion in one of the intermediate steps. Looks like a more formal treatment of the higher order macro ideas that leaked into Staapl (e.g. ifte).
http://www.macs.hw.ac.uk/~greg/hume/hume03.pdf

Entry: the Transterpreter
Date: Sun Sep 14 10:03:17 CEST 2008

> This is an email from Matt Jadud describing the TVM and occam
> http://www.transterpreter.org/

The TVM is a virtual machine for the instruction set of a Transputer. In particular, we support the 4xx and 8xx series instruction sets, which includes floating point operations. The Transputer had a 3-deep stack (for integer and FP each; I'll ignore FP from now on, as the FP side is really no different/no more interesting). The A, B, and C registers (as they are called), along with a workspace pointer, instruction pointer, a front pointer and back pointer for the scheduling queue, and a timer queue represent all of the machine state needed for an occam process. (occam was the language designed to execute on the Transputer.)

By keeping things to just seven words, you could do very fast context switches; the Transputer could do nanosecond context switches between parallel processes. This is, in part, because it had some very fast RAM on board. Its specialized nature and expensive construction are just two of the reasons it ultimately died; the Intel 286 was a much cheaper processor.

The occam language is grounded in the CSP algebra for reasoning about concurrent and parallel processes.

http://www.usingcsp.com/

In CSP (and therefore occam) you reason about sequential processes that execute in parallel. They communicate over channels, and sending and receiving were designated in the algebra by '!' and '?', respectively. In occam, these are the operators for sending and receiving as well. Producer/consumer looks like:

PROC producer (CHAN BYTE ch!)
  WHILE TRUE
    ch ! 42
:

PROC consumer (CHAN BYTE ch?)
  WHILE TRUE
    BYTE c:
    ch ? c
:

PROC main ()
  CHAN BYTE ch:
  PAR
    producer(ch!)
    consumer(ch?)
:

The channel is represented in the TVM (and on the Transputer) as a single word in memory. This is a channel word that is in shared memory; the workspaces of the two processes are private. (This privacy is, and in fact MUST be, guaranteed by the compiler, or true parallelism breaks.) When the writer comes in, they tweak the word; when the reader comes in, they either wait, or complete the rendezvous. The communications are therefore the synchronization points in a program, and dictate when we enter the scheduler. Further, they are synchronous (we block on read and write) and point-to-point.

The extended language, occam-pi, introduces SHARED channels (which make implementing bus architectures easy), as well as some other nifty bits, like mobility of data and channel ends. For now, I'm staying in the core/original language for purposes of this discussion.

How this is implemented does not matter. You could implement that protected word using transactional memory. You can rip my producer/consumer apart, and replace that channel with a wireless link... as long as you preserve the semantics of the channel. It takes work, but the powerful thing about the channel abstraction is that it is a clean, well-defined point at which we can ignore what is going on at the other end, and just assume we're either reading or writing to a word in memory. In the TVM, we just use a word and have one or two protected values that indicate the channel is empty and whatnot (MAX_INT and MIN_INT play roles here, I think; I can't remember.)
Entry: the Transterpreter
Date: Sun Sep 14 10:03:17 CEST 2008

> This is an email from Matt Jadud describing the TVM and occam
> http://www.transterpreter.org/

The TVM is a virtual machine for the instruction set of a Transputer.
In particular, we support the 4xx and 8xx series instruction sets,
which includes floating point operations. The Transputer had a 3-deep
stack (for integer and FP each; I'll ignore FP from now on, as the FP
is really no different/no more interesting). The A, B, and C registers
(as they are called), along with a workspace pointer, instruction
pointer, and a front pointer and back pointer for the scheduling
queue, and a timer queue represent all of the machine state needed for
an occam process. (occam was the language designed to execute on the
Transputer.)

By keeping things to just seven words, you could do very fast context
switches; the Transputer could do nanosecond context switches between
parallel processes. This is, in part, because it had some very fast
RAM on board. Its specialized nature, and expensive construction, are
just two of the reasons it ultimately died; the Intel 286 was a much
cheaper processor.

The occam language is grounded in the CSP algebra for reasoning about
concurrent and parallel processes.

http://www.usingcsp.com/

In CSP (and therefore occam) you reason about sequential processes
that execute in parallel. They communicate over channels, and sending
and receiving were designated in the algebra by '!' and '?',
respectively. In occam, these are the operators for sending and
receiving as well. Producer/consumer looks like:

    PROC producer (CHAN BYTE ch!)
      WHILE TRUE
        ch ! 42
    :

    PROC consumer (CHAN BYTE ch?)
      WHILE TRUE
        BYTE c:
        ch ? c
    :

    PROC main ()
      CHAN BYTE ch:
      PAR
        producer(ch!)
        consumer(ch?)
    :

The channel is represented in the TVM (and on the Transputer) as a
single word in memory. This is a channel word that is in shared
memory; the workspaces of the two processes are private. (This privacy
is, and in fact MUST be, guaranteed by the compiler, or true
parallelism breaks.) When the writer comes in, they tweak the word;
when the reader comes in, they either wait, or complete the
rendezvous. The communications are therefore the synchronization
points in a program, and dictate when we enter the scheduler. Further,
they are synchronous (we block on read and write) and point-to-point.

The extended language, occam-pi, introduces SHARED channels (that make
implementing bus architectures easy), as well as some other nifty
bits, like mobility of data and channel ends. For now, I'm staying in
the core/original language for purposes of this discussion.

How this is implemented does not matter. You could implement that
protected word using transactional memory. You can rip my
producer/consumer apart, and replace that channel with a wireless
link... as long as you preserve the semantics of the channel. It takes
work, but the powerful thing about the channel abstraction is that it
is a clean, well-defined point at which we can ignore what is going on
at the other end, and just assume we're either reading or writing to a
word in memory. In the TVM, we just use a word and have one or two
protected values that indicate the channel is empty and whatnot
(MAX_INT and MIN_INT play roles here, I think; I can't remember.)

If you're familiar with the Cell, their mailboxes (which provide a
single word for communicating between the CPU<->SPU as well as all of
the SPU<->SPU communications, I think) are effectively the same
primitive. They have implemented what amounts to a CSP channel for
transferring one word at a time between these units directly in
hardware, and a small API for interacting with that word. Of course,
the C compiler doesn't help make sure you're not doing something
braindead, and they do provide for DMA transfers between the SPUs...
but my point is that many hardware devices that are going parallel
have similar operations baked in. At a higher level, the blocking
calls in MPI are equivalent, and I've mapped occam channels to them in
the past. Things "just worked," by-and-large... with a few caveats
about comms failure, which (sadly) the original occam model never
considered. (It wasn't possible... or, something catastrophic
happened, so you didn't care, since your computer was on fire.)

So that's a bit of history and a bit of a peek into the way
synchronization happens. The Transterpreter is more interesting than a
Transputer in a lot of ways, since we can programmatically map channel
ends to most anything. For example, in the Blackfin port (and soon the
ARM port, once we get our dev boards), we map the interrupt-driven
UART to a channel end.

0. The VM is sleeping; the CPU is sleeping.
1. A character lands on the UART. The CPU wakes up.
2. The interrupt stuffs the character into an occam channel word.
   (We have a C API for this.)
3. The interrupt sets a flag in the VM, wakes the VM, and goes away.
4. The VM runs the occam bytecode, and the "channel" handling comms
   is ready. The corresponding read happens.
5. Everyone goes back to sleep if nothing else needs to be done.

The fun thing is that we can do this with an arbitrary number of
interrupts, and the occam program will be semantically sound, even
though we're doing spurious/arbitrary interrupts on hardware. This is
because channels only guarantee sync, they don't guarantee time.
Therefore, the VM can be "waiting" on any number of "hardware"
channels, but since it thinks they're just normal occam channels, the
VM is happy, and the occam programmer can get on with building a
highly parallel network of communicating processes on their embedded
target without wondering about the safety of their hardware
interrupts. If it works in test (on a single processor with
non-interrupt driven code), then it will work with the
interrupt-driven version as well. (I'm not trying to wave my hands
overly aggressively... things should be cool, but we're still
exploring this space. The semantics, we believe, hold.)

Given that the Transputer was originally used (and still used, in some
form) for codecs, comms, and video processing, it makes sense that
there is a good mapping.

Entry: Trench dwellers
Date: Sun Sep 14 11:53:03 CEST 2008

Got quite a to-the-point reply from DavidP5 on the microchip forum:

    The people who inhabit these forums are probably "in the trenches"
    as much as anyone. It is ironic that you seem to be having some
    difficulty explaining what your tool actually does in a way that
    is comprehensible to the people you are talking to (translating it
    into trenchese?). Clearly you love the concepts that you work
    with, and they sound so exotic (the lambda domain for example),
    but they may be getting in the way of you showing the
    trench-dwellers how you are proposing to make their lives easier.
    I suspect that using it "just as a low-level machine abstraction
    layer" is as much as you can hope for, but you will have to stop
    talking like that if you want to get any traction around here. I
    think you would be better to say that you have made a Forth
    implementation for PIC18s, and ask people to test it and suggest
    improvements.

    I have read your pic18-forth pdf. It is interesting but way too
    abstract for "trenchers". A tutorial about how to use your Forth
    for a variety of useful tasks (how to use a timer, how to send and
    receive data over a USART, responding to an external interrupt,
    writing to flash etc) would be better received. Trench-dwellers
    are primarily practical people so you must show them how to do
    stuff with your Forth if you want them to be interested. It would
    also be worth demonstrating that your compiled programs are at
    least as good as compiled C in terms of efficiency and resource
    utilisation.

This puts the finger on the sore spot. Looks like there is only one
way around this: differentiate the documentation into a Scheme camp
and a machine code camp.

My reply:

    Thanks. Your suggestions are very helpful. It does look indeed
    like a different approach to documentation is needed. I'm getting
    more convinced that the real problem I'm trying to solve is not
    really the technical one of building this system, but to write the
    documentation, bridging two different engineering cultures that
    generally just ignore each other..

A question to the PLT Scheme list: (at least, something I want to ask,
but writing down the question makes me think more.. this needs to
ferment a bit more..)

    Hello folks,

    This list being a lair of educators, I guess it is the right place
    to get some inspiration.

    I'm looking for advice on writing documentation which needs to use
    the concept of lexical scope and closures for people that have
    already spent several years in an Electrical Engineering
    background.

    I have an EE/DSP background myself, and after absorbing Scheme and
    a bit of basic language theory over the last 3 years I think I
    finally get it, in that this ``exotic lambda domain'' has become a
    natural way of working. I recall that learning this was not an
    easy process, and involved partially un-learning the use of real
    machines as my only reference point.

    However, in the meanwhile I seem to have not made any progress at
    all extracting knowledge from my own learning process to better
    explain the basic ideas in simple terms to my former peer group,
    or avoiding unnecessary lore.

    So the question: Does anyone have ideas or experience to ``sell
    the benefits of functional programming'' to an EE biased audience?

---

I need two chapters.

1. The Forth language.
2. Its compiler.

I'm taking the following post off the blog..

Entry: Can you sell a language?
Date: Sun Sep 14 16:43:08 CEST 2008

( Appeared on the blog on Sun Aug 24 14:39:43 CEST 2008 but removed
because of its rambling style. This problem is much deeper than I
assumed, and I need to take another approach. )

This article is an attempt to clear out some ideas about design
choices in Staapl, from the _cultural_ point of view. What is culture
and what has pure technical merit? It turns out this is not so easy to
answer.

I don't remember who said it, but it was at a keynote talk at MWSCAS
2000: ``As engineers, we should be aware about not falling in love
with our approach.'' It's an idea that I very much needed to hear, and
which taps me on the shoulder from time to time. Ask yourself this
question: how much of your approach is just tradition?
A couple of days ago I ran into this post on LtU:

http://lambda-the-ultimate.org/node/687#comment-18074

A nice display of insight. This made me think a bit about the real
merit of Staapl, aside from the cultural aspects tied to the Scheme
and Forth paradigms (it's cleaner). See the previous post that tries
to identify the design choices, illustrating possible other approaches
to the problem Staapl solves. Of the 4 trade-offs illustrated, I would
say the first two are technical and the last two are cultural.

The choice between library or language is what Frank Atanassow
stresses in the LtU comment. I have no data to prove it, but it does
look like implementing a compiler in a functional language is simpler
because the problem of compilation is expressing maps between data
structures. Compiler writing, a hobby for data structure junkies:

http://flint.cs.yale.edu/cs421/case-for-ml.html

Within functional programming the typed vs. untyped choice is one of
much debate. I picked a middle road here, using PLT Scheme, a dynamic
language with static aspirations. This choice is definitely about
culture, but in my opinion not so terribly important. The main players
that set Scheme apart in the land of dynamic languages are hygienic
macros and PLT's module system.

Using a stack machine as a machine model is quite a gamble because of
the incredible pressure of the C programming language, which fueled
the rise of RISC machines. However, going against this culture makes
things simpler. Maybe it's a false hope, but I have the idea that once
parallel programming becomes _culturally accepted_, machine
architectures will become simpler. So, this choice is cultural, or
better _counter-cultural_.

Let's try to answer Frank's 5 questions:

1. What problem does this language solve? How can I make it precise?

   As a representation language it contains an easily retargetable
   machine model that supports a simple metaprogramming framework in
   the form of quasiquotation, combined with a binding form for target
   code pattern matching.

2. How can I show it solves this problem? How can I make it precise?

   A subjective statement: partial evaluation for a concatenative
   language (string rewriting) is simpler to understand than lambda
   expression reduction (tree reduction rules).

3. Is there another solution? Do other languages solve this problem?
   How? What are the advantages of my solution? of their solution?
   What are the disadvantages of my solution? of their solution?

   Yes. Metaprogramming based on (typed) lambda calculus: MetaOcaml.
   The disadvantage of my solution is the lack of a type system (which
   does look like it is straightforward to add) and the forced use of
   a point-free programming style.

4. How can I show that my solution cannot be expressed in some other
   language? That is, what is the unique property of my language which
   is lacking in others which enables a solution?

   It can, therefore it is a Scheme library.

5. What parts of my language are essential to that unique property?

   Scheme -> Coma quasiquotation, the machine code pattern matching
   binding construct, and semantics guided by partial evaluation
   transformation rules.

* * *

Both Forth and Lisp are firmly rooted in a culture of thought that you
can only appreciate once you went through the "aha-moment" yourself.
This statement reflects some of the smugness present in these
programmer cultures. A smugness that deters many people new to the
paradigms.
But from my experience (I come from a C/assembly background) I can
testify that after this aha moment, you can't look at things in the
same way as before. So for those that had the aha for both Forth and
Lisp, I don't think much justification for Staapl is necessary. You
are probably not interested, because you're writing your own
abstraction, right this moment!

The rest of this article is for the other audience I try to reach:
embedded engineers that do not know Forth or Lisp and their relation
to macros and staging.

* * *

The problem with abstraction knowledge is that it is difficult to
bring up the motivation to acquire it when you already have a method
that is ``good enough''. It's certainly not a good thing to make
things too abstract.

http://en.wikipedia.org/wiki/Worse_is_better

But sometimes the smart thing to do is to move beyond the barriers of
the programming system, up the abstraction ladder. The main point is:
for complex code that requires lowlevel control in such a way that
excludes the use of higher level languages directly (can't trust that
compiler!), it might be a good idea to find an intermediate solution
between using a low or a high level language approach exclusively. Use
a highlevel language to _generate_ code for a lowlevel one. Don't
trust a generic compiler to generate good code for you, write a
specific one yourself by automating the high -> low translation of the
abstractions you need.

If you can ``manually compile'' a highlevel solution description by
expressing it in a low level programming language, and perform the
same kind of translation multiple times, you can probably automate
this process. The good news is that writing a special purpose compiler
for a specific abstraction is quite a lot easier than trying to write
a compiler that optimally tries to compile any kind of program.
Additionally, this problem can be simplified by using a set of
primitive compiler components.

So, let's suppose you've decided you need a staging solution. Why
would you choose Staapl?

http://en.wikipedia.org/wiki/Not_Invented_Here

I know I can't sell Lisp. I know I can't sell Forth. But I know that
if I want to build a framework for code generation (a.k.a. model based
design), I can't ignore the knowledge encoded in the design of the
Forth and Lisp languages. To use either of the two directly seems like
a limitation: Forth's over-concrete standard semantics doesn't use a
source representation that is abstract enough, and Lisp's (or better
Scheme's) machine model is too high level. The Joy language seems to
be the grain boundary between these two models. Staapl is really
nothing more than a possible implementation of the bridge between a
simple machine model and code generation viewed in a functional
programming setting.

The Staapl approach, through being built on paradigms from Forth and
Lisp, brings together a large collection of design choices that formed
the Forth and Lisp languages, leading to a fairly optimal encoding
system. The path is quite straightforward: start with Forth, a
language that is already good at efficiently encoding problems
occurring in embedded design, and replace its compiler, usually
written in Forth, by a mechanism that provides an impedance match to
languages based on lambda abstraction. This is done by viewing Forth
macros as functional combinators (the Joy language model). These
macros provide an API point that separates a multi-target backend from
highlevel metaprogramming and code generation on top of an abstract
machine.

http://en.wikipedia.org/wiki/Joy_programming_language

In staapl the impedance match between lambda abstractions and
combinators happens in two places. The Scheme->Scat quasiquoting
mechanism allows the construction of parameterized _concatenative_
code,

    (let ([a 123]
          [b (macro: +)])
      (macro: ',a ,b))    ;; results in (macro: '123 +)

while a pattern matching mechanism for target (stack) machine code
allows the use of lambda abstraction to build _primitive_ code
transformers.

    (patterns (macro)
      (([qw a] [qw b] +)  ([qw (+ a b)])))
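Put together, the two mechanisms compose. A schematic example ([qw _]
is the "quote word" instruction that pushes a literal; the
intermediate code sequence is written informally):

    (macro: '1 '2 +)
    ;; quasiquotation produces the code sequence   [qw 1] [qw 2] +
    ;; the pattern rule above then folds this to   [qw 3]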
In other words: Staapl removes some of the difficulty of having to use
a concatenative language at the higher abstraction layers, while
keeping the simple concatenative model as the basic composition
mechanism for metaprogramming.

The real advantage of Staapl is its structure as a machine model
(primitives + composition mechanism) combined with a programmable code
generator adapted to that model (composition of concatenative macros)
that is accessible from an applicative Scheme language through its
lexical scoping mechanism. The simple 2-stack machine model allows
simple re-targeting to any machine architecture, with a slight bias
towards the low-end ones, where C might be sub-optimal from a code
size perspective. The overall algebraic feel of combinator code is an
asset for writing code generators.

In short, stack languages are good for writing lowlevel primitives.
Staapl embeds this stack model in Scheme, which is based on the more
standard function application model, and as such interfaces more
easily to generic domain-specific models, without requiring a
specification in a stack language representation.

To conclude, the merits of Staapl summarized:

  * minimalistic machine model
  * simple staging framework (code = sequence instead of directed graph)
  * language promotes fine granularity (small functions / macros)
  * functional languages make source transformation easier
  * a bridge to lambda-based abstraction (quasiquoting + patterns)
  * once in scheme, use scheme's function+macro composition mechanisms

Entry: How not to sell a language
Date: Sun Sep 14 21:32:21 CEST 2008

OK. Different strategy. I need to use two different points of view.
There seems to be too much of a divide between CS and EE to try to
explain everything from a single point.

It's important to try to build a community around the practical
aspects of Staapl, ignoring the ``exotic lambda domain'', and to
complement this with metaprogramming internals. The documentation as
it is now is already somewhat partitioned in these two classes. It
just needs to be made more clear:

Abstract:

  * The general idea behind Staapl metaprogramming for PLT Scheme.
  * The reference guide for the Scheme API.
  * Some articles on the blog.
  * The excellent PLT Scheme documentation.

Concrete:

  * An overview of the PIC18 macro Forth dialect.
  * A practical tutorial about interactive tethered development.
  * A tutorial about writing a low-level DSL using bottom up
    programming with procedures and macros.

Entry: Break
Date: Mon Sep 15 11:38:02 CEST 2008

I started rewriting the pic18 forth doc, but ran into a problem with
scribble and forth syntax. Overall I think it's time to take a break
and re-order priorities.. I think I have an idea of where to go next,
but some focus is in order.

Main: get PICkit2 working so examples can be constructed.

Side: work on the reference doc + the concrete docs in terms of
examples

  - explain the differences with standard forth:
      * based on macros
      * no reflection
      * no 'postpone' necessary due to partial evaluation

  - move metaprogramming examples ('patterns and 'macro:) from the
    introduction to the reference doc.
Entry: non-orthogonal part
Date: Tue Sep 16 12:43:06 CEST 2008

There's one thing that always gets in the way while explaining the
quasiquotation and pattern matching facilities central to Staapl's
metaprogramming model: the fake algebraic type constructors perform
the same role as quasiquotation of macros. Can the late detection of
this pattern somehow be bent into an asset? It is an essential part,
because without it, it is impossible to create primitive code
generators.

So, there are 3 important parts:

  code:         (fake) algebraic data types
                  * code deconstruction (pattern matching)
                  * code construction
  composition:  concatenative + quasiquotation
  evaluation:   turn abstract (macro) code into concrete code.

The main problem is that I have no clear idea of stages. Basically,
there are only two ``real'' stages:

  1. The creation and composition of code generators. Mostly based on
     higher order programming.

  2. Execution of these.

Maybe the real idea is: there is no need for multi-stage programming
if your metalanguage has higher order functions. (Then, in a HOPL,
macros can be useful, yielding again a multi-stage language.)

Entry: MetaML and Cross-Stage-Persistence
Date: Tue Sep 16 16:16:51 CEST 2008

What I don't really understand is why CSP is stressed so much.. Why
isn't the whole code represented as a closure that evaluates down to
something with a much simpler lexical structure, i.e. machine code?
Why do you need intermediate representation layers, that are littered
with opaque CSP values, but nevertheless cannot be inspected?

It seems MetaML is really fairly tightly rooted in the manipulation of
ML abstract syntax trees, in such a way that staged code can be
presented to the ML compiler. In staapl, the abstract representation
is quite straightforward because the code IS the compiler: it is a
closure that when applied to the empty program dumps out concrete
machine code.

I guess renaming in MetaML is necessary because the ASTs manipulated
are probably still flat trees, and do not contain direct
variable->binder connections? (NO: it works modulo alpha-equivalence)

So I wonder.. Am I missing something? The MetaML story starts with
explicit manipulation of alpha-equivalent template code (lambda
calculus), I start with manipulation of flat transformer combinators
and primitive combinators (primitive stack machine code transformers).

Entry: algebraic types
Date: Wed Sep 17 08:27:54 CEST 2008

I've been selling the pattern matching as being based on algebraic
types. Maybe it's time to make that a bit more formal? The most
interesting problem to solve is to perform type analysis on the
macros. Is it possible to compute every in-out behaviour?

The reason for asm code to be in source form is mostly confusion about
what to do with it were it to be an abstract data structure.

Maybe the most important changes are:

  * replace the extensional composition mechanism by an intensional
    one so type inference can reach the primitives.

  * add annotation to primitives. (done)

The first one is less straightforward.. This is simple for anything
not built on top of parser extensions.. Looks like the parser needs to
be rewritten for this. That's a lot of work... It's probably better to
try to incrementally improve it.

Make it more abstract.. One thing I don't like is the way parameters
are used to record the 'closing' of expressions. It might be better to
make 'expr' an abstract data type (zipper?) and encode all the
information in it.

First part is quite simple: it's only the immediate/function sites
that need modification for the normal syntax (not forth/parser-tx.ss).
Second part involves the 'close' operation, which can be separately
abstracted. Let's first remove this. It's trivial for the pure
concatenative code, but not for parser-tx.

OK, got a better idea now about intensional representation: all the
parser macros convert ultimately to pure concatenative code through
the quotation mechanism. The only thing parser-tx does is to add
dictionary functionality. It looks like replacing the default lambda
wrapper is all that's necessary. Then, the "current-close" operation
is only used in the locals-tx.ss transformer.

It's not just the default lambda wrapper, it's also the assumption
that the initial state has a syntactic representation. Let's try to
model the expression as a function parameterized in the internal
state.

Got first draft of functional representation working:

    (define (rpn-open-expression)
      (lambda (x) x))

    (define (rpn-on-cursor fn expr)
      (compose fn expr))

    (define (rpn-close-expression expr)
      (let ((s #'*state*))
        #`(lambda (#,s) #,(expr s))))

    (define rpn-expr? procedure?)
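A sketch of how these compose (transformer-level code; the wrapping
transformer here is a made-up example):

    ;; Build an expression that wraps the threaded state in a cons,
    ;; then close it into a lambda over the state variable.
    (define expr
      (rpn-on-cursor
       (lambda (s) #`(cons 1 #,s))   ; example transformer
       (rpn-open-expression)))

    (rpn-close-expression expr)
    ;; => syntax object for (lambda (*state*) (cons 1 *state*))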
Next: fix locals, since it's broken now. This is the one that needs a
cursor, since it needs to wrap the code. This uses a non-hygienic name
capture trick that apparently doesn't work in the current approach.
This needs to be factored out a bit. I don't see a way to do this
without capturing the state and re-binding it.. So I put rpn-state
back in, but it's not a parameter.

Entry: loading library code
Date: Thu Sep 18 12:01:40 CEST 2008

Using "load p18f1220.f" from a file not in the staapl/pic18 directory
doesn't work.. It does work when using staaplc, but not when using the
require form from scheme.

EDIT: this is because the module level code has no concept of load
path: it only uses absolute and relative references. The prj/pic18
version does have this functionality, and is the preferred way of
using the compiler.

Entry: documenting
Date: Fri Sep 19 19:27:56 CEST 2008

I'm getting a bit fed up with trying to explain Staapl to the wrong
audience.. Learned a lot about how _not_ to write documentation, and
hopefully moving towards a better approach. It's improving, but the
way is longer than I thought.

Entry: static assembler
Date: Sat Sep 20 00:39:38 CEST 2008

Let's see what is necessary to get some static analysis going. Before
any compile time analysis can be added, the assembler needs to be
moved to static names. One remark: if the assembler becomes static,
it's straightforward to convert it to concatenative code.

There are 2 uses for assembly instructions:

  * as a data structure supporting pattern matching and construction.
  * as a reducible expression

A pseudo instruction is non-reducible, and serves only as a data
intermediate. Assembly requires the association of the data structure
to a reducer. Let's make all assembler structs carry a pointer to a
reducer, and let them derive from an abstract assembler opcode which
has only a reducer. Make sure the pattern matching form ignores the
reducer when matching, and provide a constructor that creates a proper
reducer thunk.

NEXT: create a macro for this.. requires access to the lowlevel
version of define-struct.
Entry: more about macros and MetaML
Date: Sat Sep 20 15:58:53 CEST 2008

More than I can process, but there was some talk about macros on LtU
recently.

http://lambda-the-ultimate.org/node/2987

Some things that struck me:

Laurence Tratt:

    The Template Haskell school (for want of a better phrase - it
    might better be called the MetaML school) of compile-time
    meta-programming inverts the traditional Lisp notion of macros.
    Put very simply, in Lisp macros are special things that are
    explicitly identified, but macro calls aren't (they're
    normal-looking function calls that happen to do special stuff at
    compile-time); in Template Haskell, macros are normal functions
    (that just so happen to return ASTs) but macro calls are
    explicitly identified. This means that you can do anything you
    would normally do with functions, so 'macros' are first-class in
    such an approach.

Interesting, since this is one of the things that I consider a feature
in Staapl: heavily metaprogrammed source code can be read as if it is
single-stage.

naasking:

    Staging is strongly-typed runtime compilation. So you can replace
    an interpreted parser expression built from closures with a
    compiler using staging, and get a specialized parser that executes
    at full speed with no interpretive overhead.

I'm probably missing a lot of nuances trying to compare Staapl with
MetaML by ignoring the type system..

Frank Atanassow:

    The problem with LISP-style macros is that they force the user to
    solve those problems over again. But, if the interpreter/compiler
    writer has already done that, why should they have to? Why can't
    they reuse that functionality?

    To put it another way, the problem is that LISP doesn't
    sufficiently abstract from the way programs are represented. It
    forces you to work with surface syntax, represented as trees of
    atoms, when in fact a program has a great deal more structure.

Now, hygienic macros do solve quite a lot of problems here, since at
least the identifiers are handled appropriately, but since the "great
deal more structure" in lisp is hidden behind the effect of a macro,
there is no way to check that structure before executing a macro with
the "wrong parameters" and generating a "run-time error at compile
time": you can't statically check macro code, and have to wait until
it is used in a transformation for it to fail.

This leads to the following references:

About macros:
http://lambda-the-ultimate.org/classic/message9532.html

Peter Van Roy:

    Higher-order programming is more expressive. Macros are
    syntactically more concise and give more efficient code. (Note
    that expressiveness and verbosity only have a tenuous connection!)

In Staapl, expressiveness is traded for efficiency. The thing that
glues HOFs and macros together seems to be a good partial evaluator,
i.e. higher order traversal function applications replaced with static
'structured programming' code.

Peter Van Roy:

    My definition of expressive power has nothing to do with Turing
    equivalence. It's something like this: adding a new language
    construct increases the expressive power of a language if it can
    only be expressed in the original language with nonlocal changes.
    For example, adding exceptions to the declarative model of chapter
    2 in CTM increases the expressive power, since the only way to do
    the same thing in the declarative model is to add boolean
    variables and checks to all procedures (checks like: if error then
    return immediately). This was proved in the 1970s!
    (All of CTM is structured in this way, with computation models
    becoming more and more expressive, in the precise sense defined
    above. See Appendix D of CTM for a brief summary.)

There's some more talk about quasiquotation being the really important
idea, not s-expressions. Frank Atanassow talks about macros just being
notational convenience, not really extending expressiveness according
to PVR's definition.

Some threads about MetaML:
http://lambda-the-ultimate.org/classic/message8778.html

Frank Atanassow:

    One of the limitations of MetaML, incidentally, is that its
    representation of programs is too abstract. Although you can
    reflect a program---turn it into a value---the only thing you do
    with it is compose it with other reflected programs, and evaluate
    it. You can't, for example, rewrite parts of a reflected program.
    This is sufficient to turn an interpreter into a compiler, but not
    into an optimizing compiler.

http://lambda-the-ultimate.org/node/2438

Need to read that again.. But it does bring me back to the idea that
the combinator approach in Staapl makes things really a lot simpler..
But what is lost?

http://web.cecs.pdx.edu/~sheard/staged.html

Tim Sheard: A Taxonomy of meta-programming systems. Quasiquotation is
defined as a different way to write syntax trees built from algebraic
data types, instead of writing the trees by manual construction.

Entry: ee vs. cs
Date: Sat Sep 20 18:23:18 CEST 2008

Reading these cs-jargon filled posts makes me think that documentation
for Staapl should really play a bridging role between two worlds, as I
find myself right in the middle of it now, not being part of any :)
Anyways, the message is clear: more examples.

Entry: typed scheme
Date: Sat Sep 20 19:34:40 CEST 2008

http://lambda-the-ultimate.org/node/2622

occurrence typing:

    The key feature of occurrence typing is the ability of the type
    system to assign distinct types to distinct occurrences of a
    variable based on control flow criteria. You can think of it as a
    generalization of the way pattern matching on algebraic datatypes
    works.

Entry: type classes in scheme
Date: Sat Sep 20 19:56:24 CEST 2008

http://groups.google.com/group/comp.lang.scheme/msg/ad9df65985068c16

cool.

Entry: musing
Date: Sat Sep 20 21:43:45 CEST 2008

After a day of reading and not really learning that much, my heart
aches for some down-to-earth engineering.. Looks like I want too much
and need to slow down a bit. I've learned a lot in the last couple of
years, but before I can get further on the big ideas front I need to
read more. Finish TAPL and have a look at CSP/CTM, then look at HUME.

Entry: static asm identifiers
Date: Sun Sep 21 11:13:51 CEST 2008

The problem is to give the static identifier (used in match) the right
kind of prefix. Maybe I should have a look at Dave Herman's algebraic
datatypes instead of structs.

Entry: C code generation
Date: Mon Sep 22 10:25:48 CEST 2008

http://docs.plt-scheme.org/dynext/index.html

Entry: pickit2 debugging
Date: Mon Sep 22 17:27:42 CEST 2008

Problems: apparently (EXECUTE_SCRIPT (ProgEntryScript)) was missing.
Also, the voltages are messed up. It looks like the connection code in
Jeff's pk2-3.00 sets the pgm voltage. Look at pickitGetDevice2() for a
proper initialization.

Entry: concatenative.org
Date: Tue Sep 23 08:54:38 CEST 2008

Staapl (http://zwizwa.be/staapl) is a 2-stage dynamically typed
metaprogramming system for the PLT Scheme family of programming
languages. It consists of a concatenative transformer language Coma
that operates on a stack machine language.
It is based on two main observations.

  * An imperative stack machine model (abstracting a processor as a
    Forth machine) works particularly well for small embedded
    microcontrollers.

  * Staging and partial evaluation are simple to express for
    functional concatenative programming languages due to the absence
    of problems related to identifier hygiene.

Staapl's Scheme bridge consists of the dynamically typed functional
concatenative Scat language, implemented as a hygienic Scheme macro.
In its basic form, Scat is Joy with extensional compositions: code
quotations are not lists, but abstract entities represented by Scheme
closures. Scat compositions support quasiquotation for lexically
scoped template programming from Scheme, and use a hygienic
hierarchical name management system based on PLT Scheme's declarative
module system.

Staapl's code transformers are written in a derivative of Scat called
Coma, a functional concatenative language operating on stacks of stack
machine code instructions. Staapl contains a pattern matching
mechanism for implementing Coma primitives in Scheme, which is used to
implement partial evaluation and writing architecture backends.
Currently there is an optimizing backend for Microchip PIC18.

Staapl contains a simplified Forth frontend that hides most of the
metaprogramming system. This Forth dialect has the standard
Forth-style metaprogramming removed and replaced with a mechanism
based on partial evaluation and template macros. It is possible to use
the Forth language as a stand-alone programming system.

Entry: MetaOcaml
Date: Tue Sep 23 16:00:06 CEST 2008

Before I can make a roadmap for adding static analysis to Staapl, on
top of the already mentioned move to proper algebraic data types and
intensional Coma code representation, it might be a good idea to dig a
bit deeper into MetaOcaml. The ideal application point is generation
of PF's C-code video processing primitives from high level staged
algorithms.

Useful: http://www.mpi-sws.mpg.de/~rossberg/sml-vs-ocaml.html

Let's start at the examples page:
http://www.metaocaml.org/examples/

Looks like this is the most interesting part:
http://www.metaocaml.org/examples/fft.ml

Hello world in MetaOcaml -> C

    M-x tuareg-run-caml metaocaml

    # .!{Trx.run_gcc}(.<123>.);;
    - : int = 123

And generates the following file as a side-effect:

    --------------
    void initializer() ;
    #ifdef OCAML_COMPILE
    #include "/usr/local/lib/ocaml/caml/mlvalues.h"
    #endif
    void initializer() { }
    #ifdef OCAML_COMPILE
    int main() {
      return 123;
    }
    #endif
    --------------

Entry: Texas Instruments TMS320C67
Date: Fri Sep 26 07:10:54 CEST 2008

What would it take to port Staapl to the TI C67? For DSP apps a
stack-based approach without register allocation is not going to go
very far, since most DSP apps heavily rely on using all registers for
pipelining.. However, it might be possible to add compilers for a
small number of building blocks. It could be that registers are mainly
used for pipeline filling instead of random access...

Entry: partial evaluation vs. dynamic closures
Date: Fri Sep 26 09:07:42 CEST 2008

Partial evaluation and dynamic closures are related in that they are
both partial reductions (given an expression with abstractions and
applications, perform some beta-reduction(s) somewhere in the
expression), but are quite different in the way they are represented.
Dynamic closures are best implemented by delaying full application -
the execution of (machine) code - until all the values are present:
the intermediate representation is a chunk of parameterized (machine)
code and an environment of variable bindings.

Partial evaluation performs reduction by generating a (machine) code
representation that does not contain any free variables that need to
be filled in by a separate record of variable bindings at final
application time. The (machine) code produced after partial evaluation
is completely stand-alone.

But I apparently already said that in the blog..
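A minimal Scheme illustration of the two representations (schematic):

    ;; Dynamic closure: parameterized code + an environment; the
    ;; reduction is delayed until the closure is applied.
    (define (make-adder n)
      (lambda (x) (+ x n)))        ; n lives in the environment

    ;; Partial evaluation: residual *code* with no free variables.
    (define (gen-adder n)
      `(lambda (x) (+ x ,n)))      ; n is inlined into the code

    ;; ((make-adder 5) 1)  => 6
    ;; (gen-adder 5)       => (lambda (x) (+ x 5)), stand-alone code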
Entry: more about typed vs. untyped staging
Date: Fri Sep 26 09:51:45 CEST 2008

http://lambda-the-ultimate.org/node/2575

Oleg:

    By typed compilation I mean a transformation from an untyped to
    typed tagless representations.

Entry: MetaML vs. Template Haskell
Date: Fri Sep 26 09:53:21 CEST 2008

What I don't get.. On the TH website:
http://www.haskell.org/th/

    Template Haskell is an extension to Haskell 98 that allows you to
    do type-safe compile-time meta-programming.

Then on Oleg's site:
http://okmij.org/ftp/Computation/Generative.html

    In the process, we develop a simple type system for a subset of TH
    code expressions (TH is, sadly, completely untyped).

So what's this? Type-safe manipulation of untyped code? Maybe the
point is that TH allows _syntactic_ correctness by using a
type-checked AST, but doesn't allow the type checking to run until it
is effectively spliced into code and passes the compiler.

Entry: Matlab / Simulink integration
Date: Fri Sep 26 10:38:59 CEST 2008

Real-Time Workshop:
http://www.mathworks.com/products/rtw/

Based on Embedded Matlab:
http://www.mathworks.com/products/featured/embeddedmatlab/index.html

And the fixed point toolbox:
http://www.mathworks.com/products/fixed/

The language subset is what would be expected: everything works,
except things that need dynamic tricks + annotation is necessary.
Looks like a simple static type extension on top of dynamic Matlab.

Entry: Type system for Coma functions
Date: Fri Sep 26 10:45:23 CEST 2008

Path: Coma identifiers need to be linked to compile time information.
How to implement this? I'm already using this space for Forth parsers,
so it requires a different ns.. Time to look at how Typed Scheme does
this.

Now, as long as one stays within the primitives + composition model,
this seems rather trivial. Base types could be derived from primitive
stack code pattern matching rules, and composition is just figuring
out all the branches that are taken depending on Coma run-time.. If
there is remaining dynamic dispatch based on the value of
instructions, then this won't work: the search space will explode
exponentially. So the move to real algebraic types might be
necessary.. I.e. something like

    [qw n] -> (if (n < 0) [qw -1] [qw 0])

Here 'n can't be determined at compile time, but it still generates a
[qw ...] type in any case. When the two branches do not generate the
same type, this gives a problem, since it requires type unions (to
simplify) or an exhaustive search.

There is also a problem with Scheme -> Coma metaprogramming: how to
introduce typing info there? This requires a binding form that allows
information to travel at compile time.

    (let ((abc 123))
      (let-macro ((foo (macro: 1 +))
                  (bar (macro-value abc)))
        (macro: bar foo)))

Roadmap:

  * Figure out where this kind of dependency is used in an essential
    manner. Go from there to decide on what kind of restrictions to
    impose.

  * Scheme -> Coma template programming then requires some annotation.
    This could be solved by using a special binding form that
    introduces local Coma words, so it is not an essential problem,
    but it does reduce the freedom a bit in how to formulate things.

I do wonder if this is the right way to go.. This is a can of worms.
I'm better off starting with a typed language to build a similar
system to see where the problems are.. Most code in Staapl is
structured enough so it can be straightforwardly translated to Ocaml
or Haskell code.

Conclusion: stay with the current implementation, and gradually allow
static checks to be introduced. Avoid conversion to a full static type
system until the problem is understood (an implementation in a typed
language is working.) Looks like the road to follow is more towards
Typed Scheme instead: this already is written from the viewpoint of
integrating with a dynamic language.

Entry: typed vs. untyped metaprog
Date: Sun Sep 28 07:57:56 CEST 2008

I'm not really convinced about the benefits of typed metaprogramming.
The important issues are hygiene and the use of algebraic datatypes
for code transformations. The fact that type errors get caught at
generator compile time instead of generator execution time however
seems a bit moot. This only makes sense when one writes generators
that are not used and so not tested.

MetaOcaml is a nice system though. It does what it's supposed to do,
and the offshoring to C is particularly interesting: in that setting
typed metaprogramming does make sense. Generating C code from Scheme
requires one to deal with types in some way, which is not entirely
trivial.

However, Staapl has proper hygiene and models generators as closures.
The fact that it can generate ill-typed _generators_ doesn't seem to
be such a problem since this is still detected at generator execution
time.

Note however that Staapl isn't completely untyped. On the ADT level
(the machine instruction opcodes) some static checking is definitely
possible: Coma primitives and the composition mechanism are both
accessible and are _already_ compile-time computations. Typing the
Scheme->Coma unquote operation could be replaced by cross-stage
persistence only: require functions to always use a let-macro form
such that type info can travel at compile time using local syntax
bindings, and always interpolate Scheme lexical values as [qw ...]
generators, or provide a let-constant form.

Entry: Union types
Date: Sun Sep 28 09:31:47 CEST 2008

One could see disjoint union types as a means to control conditional
branching by keeping its effects local. Unadulterated dynamic
"decision making" makes static analysis intractable since different
branches of a "case" statement might have completely different
behaviour. Union types with corresponding exhaustive "case" statements
control this exponential explosion of possible paths through the code
by joining branches back together: at the type level, each branch does
essentially the same thing, allowing the identification of a union ->
union map as a "local conditional branch" construct.

So in essence: one puts in a little bit more effort to design types
(approximate behaviour) in order to be able to prove things about the
behaviour of a program without executing it. An essential tool to make
this practical when conditional branching is present is to abstract
conditionals as union types, avoiding behaviour search space
explosion.
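The [qw n] example from above, written out with explicit variants.
Plain Scheme doesn't check the case analysis statically, so the
"typing" lives in the comments; in ML or Haskell it would be enforced:

    ;; instruction = (qw value) | (cw address)   -- a disjoint union
    (define (sign-fold ins)
      (case (car ins)
        ;; both branches of the conditional produce a qw again, so
        ;; this is an instruction -> instruction map: the branching
        ;; stays local to this function.
        ((qw) (if (< (cadr ins) 0) '(qw -1) '(qw 0)))
        ((cw) ins)
        (else (error "not an instruction:" ins))))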
Entry: modifying MetaOcaml
Date: Sun Sep 28 12:11:36 CEST 2008

Maybe the best way to understand a bit more about the internals of
MetaOcaml is to dive right in and make some modifications. I.e. I'd
like to replace the "double" type with "float", which is more useful
for embedded development.

There are two occurrences of the literal '"double"' in
bytecomp/trx.ml. What happens if these are changed? Nothing.. It looks
like postprocessing the code is going to be simpler. For generating
float code this dirty hack should work:

    #define double float

Now.. The clean way. In cprint.ml there is a function called
"print_type_spec" which does support the Tfloat type. Maybe changing
Tdouble to Tfloat should work then? Test with:

    # (.!{Trx.run_gcc} .<fun x -> x +. 1.0>.) 123. ;;
    - : float = 124.

That seems to work. The generated procedure has correct type
declarations, and the marshalling uses implicit float<->double
conversions based on the types of "Double_val" and "Store_double_val".

    float procedure(float x_1 )
    {
      return x_1 + 1.0;
    }
    #ifdef OCAML_COMPILE
    value procedure_marshall(value arg_list )
    {
      initializer();
      {
        float ret ;
        float x_1 = Double_val(arg_list);
        value store = alloc(Double_wosize, Abstract_tag) ;
        ret = procedure(x_1);
        Store_double_val(store, ret);
        return (store);
      }
    }
    #endif

EDIT: some interesting extensions:

  - double vs. float
  - short ints
  - vectors

Entry: bug in interactive console -> testing
Date: Fri Oct 3 10:56:12 CEST 2008

Right when I needed to demo it.. Something to think about is a decent
regression test. The current .hex based all-in-one test doesn't cut it
for changes in the interactive console. Otoh, that code is as good as
stable..

Entry: next
Date: Fri Oct 3 11:03:47 CEST 2008

Been a bit out of it.. What's next?

  - pk2 + serial interface
  - dorkbot demo
  - start working on C-code generation

I'm a bit in the middle of wanting to try the MetaOcaml->C and
extending Staapl with a C-code target (ARM/dsPIC). To simplify things
it might be best to try to stabilize on the pk2 interface first, so
there's a low-threshold entry to the project.

Entry: MetaOcaml bytecomp/trx.ml
Date: Fri Oct 3 12:39:33 CEST 2008

Downloading the Ocaml distribution to see the difference between the
two projects. MetaOcaml is based on 3.09.1

http://caml.inria.fr/pub/distrib/ocaml-3.09/ocaml-3.09.1.tar.bz2

The cabs.ml file seems to be from the Ocaml project. Nope.. it's not
in there.. It's a separate library: cabs -- abstract syntax for
FrontC. Here's a link in the Caml Dev Kit:

http://pauillac.inria.fr/cdk/newdoc/htmlman/cdk_180.html#SEC206

The one in metaocaml is 2.1 while the one documented above is 3.0.
There seem to be some naming differences. The other file from FrontC
is cprint.ml. The parser itself isn't included.

EDIT: look at these:
http://manju.cs.berkeley.edu/cil/
http://frama-c.cea.fr/what_is.html

Entry: Generating 3 addr SSA (GIMPLE)
Date: Fri Oct 3 16:47:06 CEST 2008

Since all code in GCC goes through GIMPLE, it might be interesting to
generate such code in the first place. It's possible to have GCC dump
out a C-like representation of the intermediate tree forms using flags
like: -fdump-tree-gimple

http://gcc.gnu.org/ml/gcc/2002-08/msg01397.html

In any case, here's a draft of a design of a new SIMPLE
representation. It's a bit sketchy at this point, but I'm very
interested in comments:

------

    function:
        FUNCTION_DECL
          DECL_SAVED_TREE -> block

    block:
        BIND_EXPR
          BIND_EXPR_VARS -> DECL chain
          BIND_EXPR_BLOCK -> BLOCK
          BIND_EXPR_BODY -> compound-stmt

A BIND_EXPR takes the place of the current COMPOUND_STMT, SCOPE_STMT
and DECL_STMT; all of the decls for a block are given RTL at the
beginning of the block.
DECLs with static initializers keep their DECL_INITIAL; other
initializations are implemented with INIT_EXPRs in the codestream. The
Java "BLOCK_EXPR" is very similar.

    compound-stmt:
        COMPOUND_EXPR
          op0 -> non-compound-stmt
          op1 -> stmt

rth has raised some questions about the advisability of using
COMPOUND_EXPR to chain statements; the current scheme uses TREE_CHAIN
of the statements themselves. To me, the benefit is modularity; apart
from the earlier complaints about the STMT/EXPR distinction, using
COMPOUND_EXPR makes it easy to replace a single complex expression
with a sequence of simple ones, simply by plugging in a COMPOUND_EXPR
in its place. The current scheme requires a lot more pointer
management in order to splice the new STMTs in at both ends. It seems
to me that double-chaining could be provided by using the TREE_CHAIN
of the COMPOUND_EXPRs.

    stmt: compound-stmt | non-compound-stmt

    non-compound-stmt:
        block
        | loop-stmt
        | if-stmt
        | switch-stmt
        | labeled-block-stmt
        | jump-stmt
        | label-stmt
        | try-stmt
        | modify-stmt
        | call-stmt

    loop-stmt:
        LOOP_EXPR
          LOOP_EXPR_BODY -> stmt | NULL_TREE
        | DO_LOOP_EXPR (to be defined later)

The Java loop has 1 (or 0) EXIT_EXPR, used to express the loop
condition. This makes it easy to distinguish from 'break's, which are
expressed with EXIT_BLOCK_EXPR. EXIT_EXPR is a bit backwards for this
purpose, as its sense is opposite to that of the loop condition, so we
end up calling invert_truthvalue twice in the process of generating
and expanding it. But that's not a big deal.

From an optimization perspective, are
LABELED_BLOCK_EXPR/EXIT_BLOCK_EXPR easier to deal with than plain
gotos? I assume they're preferable to the current loosely bound
BREAK_STMT, which has no information about what it's exiting.
EXIT_EXPR would have the same problem if it were used to express
'break'.

    if-stmt:
        COND_EXPR
          op0 -> condition
          op1 -> stmt
          op2 -> stmt

    switch-stmt:
        SWITCH_EXPR
          op0 -> val
          op1 -> stmt

The McCAT SIMPLE requires the simplifier to make case labels disjoint
by copying shared code around, allowing a more structured
representation of a switch. I think this is too dubious an
optimization to be performed by default, but might be interesting as
part of a goto-elimination pass; a possible representation would be to
also allow a TREE_LIST for op1.

    labeled-block-stmt:
        LABELED_BLOCK_EXPR
          op0 -> LABEL_DECL
          op1 -> stmt

    jump-stmt:
        EXIT_EXPR
          op0 -> condition
        | GOTO_EXPR
          op0 -> LABEL_DECL
              | '*' ID
        | RETURN_EXPR
          op0 -> modify-stmt
              | NULL_TREE

I had thought about always moving the assignment to the return value
out of the RETURN_EXPR, but it seems like expand_return depends on
getting a MODIFY_EXPR in order to handle some return semantics.

        | EXIT_BLOCK_EXPR
          op0 -> ref to LABELED_BLOCK_EXPR
          op1 -> NULL_TREE
        | THROW_EXPR?

I'm not sure how we want to represent throws for the purpose of
generating an ERT_THROW region? I had thought about using a THROW_EXPR
wrapper, but that wouldn't work in non-simplified code where calls can
have complex args. Perhaps annotation of the CALL_EXPR would work
better.

        | RESX_EXPR

    label-stmt:
        LABEL_EXPR
          op0 -> LABEL_DECL
        | CASE_LABEL_EXPR
          CASE_LOW -> val | NULL_TREE
          CASE_HIGH -> val | NULL_TREE

    try-stmt:
        TRY_CATCH_EXPR

This will need to be extended to handle type-based catch clauses as
well.

        | TRY_FINALLY_EXPR

I think it makes sense to leave this as a separate tree code for
handling cleanups.
    modify-stmt:
        MODIFY_EXPR
        | INIT_EXPR
          op0 -> lhs
          op1 -> rhs

    call-stmt:
        CALL_EXPR
          op0 -> ID
          op1 -> arglist

Assignment and calls are the only expressions with intrinsic
side-effects, so only they can appear at statement context.

The rest of this is basically copied from the McCAT design. I think it
still needs some tweaking, but that can wait until after the
statement-level stuff is worked out.

    varname : compref | ID (rvalue)
    lhs: varname | '*' ID (lvalue)
    pseudo-lval: ID | '*' ID (either)

    compref :
        COMPONENT_REF
          op0 -> compref | pseudo-lval
        | ARRAY_REF
          op0 -> compref | pseudo-lval
          op1 -> val

    condition : val | val relop val
    val : ID | CONST

    rhs : varname | CONST
        | '*' ID
        | '&' varname_or_temp
        | call_expr
        | unop val
        | val binop val
        | '(' cast ')' varname

    unop : '+' | '-' | '!' | '~'
    binop : relop | '-' | '+' | '/' | '*' | '%'
          | '&' | '|' | '<<' | '>>' | '^'
    relop : '<' | '<=' | '>' | '>=' | '==' | '!='

Entry: About code generation and identifiers
Date: Fri Oct 3 17:29:43 CEST 2008

With all this focus on making identifiers lexically scoped, I'm
wondering how the move to an external compiler/assembler is going to
work out..

Entry: gimple
Date: Fri Oct 3 18:43:41 CEST 2008

    void boo(int *a, int *b){
        int i;
        for (i = 0; i<100; i++){
            a[i] += b[i];
        }
    }

the -tree-gimple option gives

    boo (a, b)
    {
      unsigned int i.0;
      unsigned int D.1200;
      int * D.1201;
      unsigned int i.1;
      unsigned int D.1203;
      int * D.1204;
      int D.1205;
      unsigned int i.2;
      unsigned int D.1207;
      int * D.1208;
      int D.1209;
      int D.1210;
      int i;

      i = 0;
      goto ;
      :;
      i.0 = (unsigned int) i;
      D.1200 = i.0 * 4;
      D.1201 = a + D.1200;
      i.1 = (unsigned int) i;
      D.1203 = i.1 * 4;
      D.1204 = a + D.1203;
      D.1205 = *D.1204;
      i.2 = (unsigned int) i;
      D.1207 = i.2 * 4;
      D.1208 = b + D.1207;
      D.1209 = *D.1208;
      D.1210 = D.1205 + D.1209;
      *D.1201 = D.1210;
      i = i + 1;
      :;
      if (i <= 99)
        {
          goto ;
        }
      else
        {
          goto ;
        }
      :;
    }

This is quite lowlevel. It doesn't have any structured elements. Just
"goto", "if" and assignment. Also, all derefs are direct. I wonder how
this is then mapped to addressing modes.

The -tree-original option gives this:

    ;; Function boo (boo)
    ;; enabled by -tree-original

    {
      int i;
      int i;

      i = 0;
      goto ;
      :;
      *(a + (unsigned int) ((unsigned int) i * 4)) =
        *(a + (unsigned int) ((unsigned int) i * 4))
        + *(b + (unsigned int) ((unsigned int) i * 4));
      i++ ;
      :;
      if (i <= 99) goto ; else goto ;
      :;
    }

It also just has if + goto.

Entry: readMode()
Date: Sun Oct 5 10:12:16 CEST 2008

A trace of the 'read' command for pk2 by Jeff Post, simplified to PK2
v2.x and 18F1220 programming.

http://home.pacbell.net/theposts/picmicro/
http://home.pacbell.net/theposts/picmicro/pk2-3.00-alpha12.tar.gz

In pk2-3.00-alpha12/pk2main.c :

    readMode()
      pickitGetDevice()
        readDeviceData1/2()
        pickitGetDevice2()
          findDeviceName2()
      SETVPP
      pickitRead()
        allocateDevice2Buffers();
        pickitReadProgram()

    pickitReadProgram()
      enableTargetPower()
      clearUploadBfr()
      (EXECUTE_SCRIPT (ProgEntryScript))
      setDownloadAdrs()
      while {
        clearUploadBfr()
        (EXECUTE_SCRIPT (ProgMemRdScript))
        readDataBlock()
      }
      pickitReadEeprom()
      pickitReadConfig()
      writeHexFile()

Some problems: UPLOAD_DATA_NOLEN doesn't seem to work in my
implementation, so I'm using UPLOAD_DATA.

Got config write to work too.. However, write protect can't be cleared
with just writing config data. I think this requires a chip-erase. OK,
works now.

For program write, I move to the Microchip pk2cmd sources:

    PICkitFunctions.cpp : WriteDevice

write-program-memory now finishes, but the pk2 hangs on the next
read-program-memory. Wait.. Got something back! After reset it does
seem to work.
Let's try with the uploaded script again. That doesn't work, but
without it works fine. Got the problem: the DOWNLOAD_SCRIPT function
had tag 255, which should be 254, but that doesn't work either.
Something fishy there: it's not tested yet.

Entry: demo / test
Date: Sun Oct 5 16:21:11 CEST 2008

With pickit2 programming working for the essential parts, it's time to
write a demo / test. Add this to staaplc: this leads to 2 operation
modes:

  * OFFLINE: generate .hex and .ss files to manually upload + connect
  * ONLINE: don't save those files, but upload the binary and stay
    connected

The interface to online programming should be the same as hex file
saving. 'upload-monitor is working now. Next: activate target + serial
pass-through. The latter requires a circuit and some thought, so leave
it for later.

Entry: last couple of weeks
Date: Sun Oct 5 21:16:50 CEST 2008

* MetaOcaml: I've been looking closer at MetaOcaml, trying to prepare
  for replacement of some PacketForth code by GCC offshored generated
  code. Also been thinking about Matlab/Simulink's Real-Time Workshop
  and how this might fit together with MetaOcaml.

* PICkit2: programmer is working. Getting closer to a standard 5-wire
  protocol for minimalist PIC programming.

* Documentation: saw the need to stabilize on two fronts: pure Forth
  without the Scheme stuff, and a Scheme side.

* Static analysis: some ideas about adding a type system and inference
  rules, but ran into problems with the dynamic nature of Scheme. I'm
  not sure if this is worth it.. Maybe a "static core" that is
  translated to Ocaml or Haskell as a _test suite_ is a better
  approach. The idea is: the fact that Staapl is dynamically typed is
  a plus as it makes things simpler, but it is important to keep an
  eye on how things would work were it implemented in a static type
  system, to get more inspired compile-time and test-time checks.

* C analysis and synthesis. Plenty of stuff here: MetaOcaml + FrontC /
  CIL, Haskell's language.c and the Ometa parser for PLT Scheme. The
  latter might be interesting to solve the "quasiquotation" problem: a
  template language with the language's concrete syntax is easier to
  use than syntax trees.

* Transterpreter + CSP. I've had little time for reading lately, but
  it would be nice to write the MC-303 ui application in a parallel
  Purrr dialect, one which has return stacks in RAM.

* Simulator simplification. This triggered a lot of thinking about
  static analysis, since the simulator needs to know the number of
  elements consumed by the macro to be able to evaluate it. The
  current mechanism doesn't give any information: it merely inspects
  the input and acts accordingly. Alternatively, it might be easier to
  copy the entire stack, but this requires extra annotation for the
  interpreter.

Entry: closures
Date: Tue Oct 7 09:57:29 CEST 2008

When explaining functional programming to somebody with a hands-on
imperative/oo background (say C, C++, Python), how to proceed? I think
the punch line should be something like: closures (lambda) make the
creation of plug-in behaviour really easy.

A closure is essentially a method bound to a hidden object. Apart from
typing "lambda" in Scheme, or "fun" in Ocaml or "\" in Haskell, there
is essentially no overhead when creating these objects. These objects
are first-class, in that they can be passed to other such objects as
function arguments. Essentially, functional programming is about
"building functionality by passing closures (parameterized behaviour
with context) to other closures".
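The standard two-liner that demonstrates this, probably the kind of
example such an explanation needs:

    ;; Behaviour ("count in steps") packaged with hidden context
    ;; (the variable 'count') -- a method bound to a hidden object.
    (define (make-counter step)
      (let ((count 0))
        (lambda ()
          (set! count (+ count step))
          count)))

    (define tick (make-counter 5))
    ;; (tick) => 5    (tick) => 10   -- the state lives in the closure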
Entry: macros at the console
Date: Tue Oct 7 10:13:43 CEST 2008

  - type system: allow compile-time annotations that know this
  - lazy stack read
  - full stack transfer

The latter seems best really, since it is also useful for debugging.
Maybe best implemented as an extra opcode? Dump stack.. This requires
the monitor to know its data stack location.

OK: full stack transfer works: the target provides just the stack
size, the host performs moves + copy.

NEXT: remove '+' from simulated words, but try to add it as a
simulated macro. OK. Got it to work.. It is a bit slow though, due to
the moving of all data. Removed the other simulation mechanism.
Possible optimization: perform multiple macros in sequence, instead of
moving data up and down.

Fixme: add real interpretation of QW, CW and JW. This does require
some optimizations to be disabled. However, the comp/ postproc seems
to be not used anyway. FIXME: make this all a bit more explicit.

There's another problem: with interactive usage, not all identifiers
are available at compile time. This is not problematic, except for
prefix words: those need to be available at the place where 'target:'
is evaluated. Actually, it is problematic.. Currently invocation of
commands doesn't work any more..

TODO:

  * replace the current simulation with an explicit QW/CW interpreter
  * clearly define the semantics of macro-eval

First one ok: see 'interpret-cw/qw in live/tethered.ss
The other one I need to check properly..
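For reference, the shape of such a QW/CW interpreter -- a minimal
sketch against a host-side list representing the target stack, with
made-up helper names (the actual code is 'interpret-cw/qw in
live/tethered.ss):

    ;; code = list of tagged instructions: (qw <n>) pushes a literal,
    ;;        (cw <word>) calls a target word.
    ;; run-word : word x stack -> stack   (remote execution, assumed)
    (define (interpret-qw/cw code stack run-word)
      (if (null? code)
          stack
          (let ((ins (car code)))
            (interpret-qw/cw
             (cdr code)
             (case (car ins)
               ((qw) (cons (cadr ins) stack))
               ((cw) (run-word (cadr ins) stack))
               (else (error "can't simulate:" ins)))
             run-word))))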
So, what is the model? The console talks to a machine, but the machine is variable ('login' to another project / device), and it includes a mode where commands don't work since there is no target connected. However, we want to keep the illusion that we're on the target chip.

Entry: Staapl selling points
Date: Sun Oct 12 20:07:20 CET 2008

I'm gearing up to try to sell Staapl to trench dwellers. I need an attack plan.. The particular language needs to be adapted to the public, but the real selling points need to be clear.

* The console. A console is something to control parameters, avoiding a recompile. In practice, everybody developing embedded apps provides a console. Usually this is done in an ad-hoc way, adding just the reconfigurability that is necessary. Lisp and Forth remove this ad-hoc feature: the console is a keyboard connected to a full language, giving a short edit-compile-test cycle.

* Pattern matching and closures. Useful for building language transformers.

* Closures, macro hygiene and declarative modules. Tools for namespace management.

EDIT: After one week in the mud, I think I'm going to try to scale down the expectations a bit.. You can't sell functional programming.

Entry: getting things done
Date: Sat Oct 18 14:46:21 CET 2008

Next event is (probably) dorkbot gent 2008/12/19. What needs to work? Pickit2 interface + examples.

SIMPLE PICKIT2 BASED. This is also for the low-end PIC chips.
1. Get the serial pass-through port to work.
2. Bit-banged serial console over ICD.

USB BOOTLOADER BASED. Johannes' USBpicstamp. The problem here is only software.

Entry: higher bandwidth to PC
Date: Sat Oct 18 15:08:14 CET 2008

I.e. for a simple logic analyser..

* USB: Relatively simple hardware, but driver software is complex. This is on the list, but not available yet. Disadvantage is the polled nature (not client driven).

* ETHERNET: Simpler software, but a little more complicated hw. What is necessary to get the ENC28J60 ethernet chips going? Hmm.. I don't have transformers, 1% resistors, RJ45 connectors and 3.3 V regulators. Order? Disadvantage: components.

* PC ISA. Can this be done without address decoder chips (using a single address bit on a dedicated machine)? If so, it's a possible solution. If not, other interfaces are probably better. Probably best with the PSP port on 40-pin devices. What about DMA? Disadvantage is compatibility.

* ATA. PIO-0 might be feasible? Try to figure out if a standard linux driver can be used. Might have some protocol overhead.

* The FTDI chips allow for more elaborate modes than just async serial. I have a bunch of them.. Is there a linux driver that allows more lowlevel access? The linux driver allows the 'warp' rate 460800, which would be 8 channels at 46.08kbit, which is probably enough for now, and should be straightforward to get going. The driver also supports a custom divisor.

* CANbus. I've ordered a bunch of PICs with CANbus support, and some line drivers. The problem here is the interface to the PC: it needs a separate bus.

Entry: PicKit2
Date: Wed Oct 22 08:25:50 CET 2008

Should I finish this, or first build a USB interface? For the serial passthrough I need a decent measuring device. Actually, an active brain would be enough.. EDIT: pk2 is probably more important. Let's get a setup going.

Entry: distracted
Date: Sun Oct 26 11:52:40 CET 2008

Got distracted a bit by ideas of using CANbus or RS485 as a way to connect measuring equipment to a PC. Currently I'm thinking of trying to make an ISA card with DMA. As long as the card is not bus master, this shouldn't be too hard.
However, it does need buffering on the PIC side, so it is limited to slow data rates (not possible to run a single loop). So, the easiest route is the FTDI at 460800 baud with plain 8 bit port readout. For a PIC18 @ 40 MHz that's a baud divisor of 21.7 (fosc / (4 x baud)), and at 48 MHz it's 26.

The reason I'm dragging my feet here is that I don't have a decent workflow setup. This application (getting serial comm to work) can be used as a comb for workflow documentation.

1. Create a standalone application: .f -> .hex. This involves: processor selection + oscillator configuration. Maybe this is a good time to also use standard config bit defines from the asm files.

2. Time base + delay loops. Use this to create a bit-banged serial send, and later receive.

3. Hardware UART config.

The deal is this: get the monitor config working using just busy loops and general purpose IO. This leads to simple examples. It removes the dependency on the hardware uart (i.e. when the uart is used by the app, or the chip doesn't have one). It should be enough to get the PK2 serial port working too.

Entry: reset
Date: Tue Nov 11 09:46:57 CET 2008

Working in the field for 4 weeks gave me a proper reset. Where is Staapl headed? I think I found some possible commercial applications for code manipulation foo, but they are more about code analysis, refactoring and aspect-oriented tricks. And it all has to be about C. So I guess it's time for a break, and continue work on usability and documentation.

From a more practical point, I have a need to build a couple of circuits for measuring and testing, which could pull Staapl development. The most important thing to focus on right now is to reduce the setup time to go from a blank chip to a working app (interactive or not). This should be nothing more than adding an icd2 connector. The second important thing is to be able to use multiple projects at the same time. Currently I want to use a 4550 board with a parallel port as a logic analyser to get the bitbanged serial port to work.

Entry: colours
Date: Tue Nov 11 12:16:38 CET 2008

Going to standardize the colorscheme for ICD/serial, following the thing that's already in my head. On the master side: orange = hot (TX), yellow = cold (RX). Orange is then clock (always out), while yellow is data (sometimes in). This makes sense.

( I never thought I'd lose so much time on connectors.. Or on fitting the stuff in my head.. Is it straight or reversed? RX or TX? I.e. the ICD connector is a bus with a well-defined master, while async serial is a symmetric point-to-point link, so we assign a master to keep colors from crossing, which means the slave has inverted colors. )

ICD2  serial               color
1     /MCLR                white
2     VDD                  red
3     GND                  black
4     PGD  RX (<- target)  yellow
5     PGC  TX (-> target)  orange
6     PGM

Entry: basic app config
Date: Tue Nov 11 16:01:33 CET 2008

It's quite simple now:

* compile chip configuration registers
* define fosc
* define monitor baud rate
* load chip specific macros
* load monitor + boot code

Also, the FTDI cable works up to 230 kBaud.

Entry: busy loop timing
Date: Tue Nov 11 16:13:00 CET 2008

I'd like to create a busy wait macro that's relatively accurate, using nested for loops. Got it:

\ This is optimized for fosc from 8 -> 48. The inner loop compensates
\ for the oscillator period. Using a convenient period of 50us, this
\ gives 33 iterations at 8Mhz and 200 iterations at 48Mhz. The macro
\ is exposed, but it is most accurate for +- 50 us.

macro
: usec  fosc 4000000 / *  \ instructions per us
        3 /               \ instructions per loop
        for next ;
forth
: 50usec 50 usec ;
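Checking the numbers in the comment: a PIC18 executes fosc/4 instructions per second, so fosc 4000000 / is instructions per microsecond: 2 at 8 MHz, 12 at 48 MHz. For "50 usec" that gives 100 resp. 600 instructions, and at 3 instructions per loop iteration, 33 resp. 200 iterations. A longer delay could then be built on top, something like (hypothetical, not in the codebase):

  \ 20 x 50 us = 1 ms
  : 1ms 20 for 50usec next ;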
Entry: logic analyser
Date: Tue Nov 11 17:32:13 CET 2008

Now, let's see what there is to do to pipe in data. Using 4 channels, there are 2 measure points per byte, which gives a rate of about 50kbit. But let's keep it simple and pipe only the bytes. Using 8 channels is easiest to code, but gives only 23k samples/sec.

At 230400 baud, a byte needs to leave every 434 cycles (10M instructions/sec at 40 MHz, 10 bits per byte on the wire). Let's work with a sample period of 54 cycles; this gives enough time to send out the bytes (TXREG !) and amounts to 185k samples/sec. Now, make this into 64 cycles, and a free running timer can be used to synchronize, making everything easier to get right. This still gives 156k samples/sec. A free running timer isn't necessary, we can easily update the timer every cycle.

Entry: dorkbot - a different angle?
Date: Sun Nov 16 17:20:04 CET 2008

programming = debugging

There's already too much said about programming. Writing software for deeply embedded systems isn't really about programming, it's about debugging. And there is only one way to debug: make sure you SEE what's going on, the rest is just getting there.

Entry: tools ready
Date: Thu Nov 20 08:36:38 CET 2008

Everything seems to be in place now after the move. Still don't have internet, but at least the PC in my workspace has USB working. No more excuses for procrastination. Time to do some serious work now.

Next: PK2 serial. First: make the PK2 accessible from the toplevel. Currently I have to use:

  box> (enter! "staapl/pickit2/pk2.ss")
  box> (boot)
  loading PICkit2 device file.
  pk2-open: PICkit 2 Microcontroller Programmer

  (define (test-uart)
    (boot)
    (uart-start)
    (let loop ()
      (display ".")
      (uart-write #x55)
      (sleep 1)
      (loop)))

There are several things wrong. First, VDD is not connected. OK. At 300 baud, hardware loopback works.

Next: make the buffer work. OK: sending is a sync channel, receiving is an async channel (we can't block the target).

Next: bitbanged serial over the ICD2 ports.

Entry: multiple consoles
Date: Sat Nov 22 08:33:10 CET 2008

Something I didn't try yet is to run a snot head to a forth console. It's quite straightforward: just load the .ss file into the toplevel, then activate the console with:

  ,(repls '(box command))

This implies the console is defined (i.e. in .snotrc or in the corresponding .snot file in the path) using the command:

  (register-language 'compile "compile> "
    (not-implemented 'compile-eval)
    (lambda (str) (box-eval `(forth-compile ,str)))
    #f)

The hook here is 'forth-compile. To enable multiple consoles, this needs to be parameterized. The .ss file generated needs to be a bit more general than a standalone console..

Entry: Getting rid of namespace management
Date: Sat Nov 22 09:00:15 CET 2008

This is a feature only useful for more advanced host/target setups. What about making things simpler? For direct console access, only a single namespace is necessary, so there is no need for the extra indirection that shields the compiler namespace. This also gives a more hackable interface. Argument against: you need a namespace to perform evaluations. But this can just as well be the initial mzscheme namespace. So, the .ss file is now something that can be loaded into a fresh namespace, be it a prepared one or an empty one. Note that staaplc also uses reflective operations to compile a .f file.
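In other words, loading the generated file should reduce to something like this sketch ("project.ss" stands in for the generated file):

  ;; Load the generated .ss file into a fresh, unshielded namespace.
  (define ns (make-base-namespace))
  (parameterize ((current-namespace ns))
    (load "project.ss"))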
Entry: host + client
Date: Sat Nov 22 11:01:44 CET 2008

So, what's the workflow for having both a client and a host communicating, instead of an emulated console? The whole deal with 'lazy connection' never really worked properly.. Time to fix it or get rid of it. What about: one namespace per target, so store the connection in the namespace?

Let's simplify.

* The tethered.ss code is parameterized by 'target-in' and 'target-out', so it doesn't care if the port is a channel or a scheme port.
* Lazy connections are not necessary. If there is no console, the target simulator is run instead.

Entry: pk2 console works
Date: Sat Nov 22 15:10:42 CET 2008

That's a milestone. There's a slight problem however: detection of the ICD cable.. It would be nice to keep the same idle line detection as for ordinary serial ports. Another disadvantage is that it doesn't fit in 512 bytes, but for applications that need it the boot block could be compacted. And, with the pk2 attached, a messed up bootloader isn't so problematic, so one could run without boot-protect, making fixed-size loaders unnecessary.

Next: reset pk2 with the ICD2 PGC line high. Probably something like this: send a single command buffer that switches the target on + switches to uart mode.

Maybe it's best to use the following logic: before connecting, the serial line acts as an interrupt input. Initially the line is in BRK, and upon reception of a 0->1 transition (BRK->IDLE) the interpreter is started, waiting for a start bit (0). But.. that uses an ISR, and is thus quite invasive.. Let's see.. If we can start the target with the PGC output high, then it's already ok. Hmm.. can't get it to work atm.

Entry: full circle
Date: Sat Nov 22 17:09:16 CET 2008

Next: do everything: programming + connecting in one go, and seal it up. Basic functionality works:

  staaplc -u -d pk2 rapid.f

This is without verify, and things go wrong when pk2 isn't closed properly. Close fixed too. Looks like it's working now.

Entry: next
Date: Sun Nov 23 08:47:48 CET 2008

The next big hurdle is USB. But it might be interesting to set up some design flow documents showing a test-centered design approach.

* Bootloader + serial cable. Flash once + enable boot protect. All application code is loaded in incremental mode.
* PK2. Flash application (whole program). Incremental code mainly for debugging.

In the previous approach I used relative addressing to modify fields in buffers. Is this the right abstraction? Going back to the radical roots, I think I saw an interview with Chuck Moore about not using C-like structures, but something else.. Is there anything there? What is filling in a struct? It is like calling a procedure and passing parameters. Procedures use the stack in Forth, so is there a correspondence here?

Entry: minilanguage for usb drivers
Date: Sun Nov 23 13:10:21 CET 2008

USB code contains a huge amount of red tape that is not easy to factor into procedures. This could be a good opportunity to document the construction of a minilanguage that abstracts this. Let's first get the picstamp to boot..

Entry: optimization terminology
Date: Sun Nov 23 13:12:43 CET 2008

* constant propagation: replacing variables initialized with constants by the constant itself.
* constant folding (constant expression evaluation): eliminating expression evaluation by replacing expressions with their results.
* inlining: eliminating procedure invocation, which is a run-time binding construct.
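In Staapl terms, all three show up when macros expand. A made-up illustration (not actual compiler output; whether the multiply really folds depends on which rewrite rules are defined):

  macro
  : width 5 ;             \ a constant, propagated wherever width is used
  : area  width width * ; \ inlining: macro bodies expand in place
  forth
  : f area ;              \ with "5 5 *" folded at compile time,
                          \ f reduces to a literal 25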
Entry: Parameterization
Date: Sun Nov 23 13:33:04 CET 2008

It's a bit hard to use.. The things I seem to use the most are:

* Literal macro arguments (for single procedures).
* Global variables that get redefined.
* Prefix parsers.

The first one is OK, but has limited use (it can't define multiple words in a simple way). The second one is not good, but used a lot. The third one doesn't really get used.

So, for redefinition of global variables (macros): what's the deal there?

  macro
  : foo 1 ;
  forth
  : bar foo ;
  macro
  : foo 2 ;
  forth

In this code, foo is:

  0840 6EEC [movwf 236 0]
  0842 0C02 [retlw 2]

In a single compilation unit, macro definition ALWAYS comes before code generation. If a macro gets redefined, the generated code will reflect this. This is a bit raw, but very usable. Note: this is used inside the core compiler too. Most language features are redefined or specialized in this way, so it is kind of basic.

Entry: problems shutting down uart mode
Date: Sun Nov 23 18:26:04 CET 2008

When I use EXIT_UART_MODE, the chip won't properly reconnect on the next start. When I don't do this, the pickit isn't closed down properly. I don't know what this is.. Anyways, the following code works as a hackaround:

  (begin
    (uart-write 1 6)
    (msleep 100)
    (unless (uart-try-read)
      (printf "uart-start hack\n")))

Entry: the problem with writing lowlevel code..
Date: Sat Dec 6 12:55:54 CET 2008

I've been doing a full-time C job the last 2 months. The things I lose most of my time with are not really programming tasks, but getting hardware to work: using the right initializations, the correct order of operations and a load of workarounds. The problem is STATE. Hardware is overall tremendously stateful, doesn't have clear interfaces that prevent illegal use, and has poor error handling/reporting.

It does look like there is little to be done about this: the problem is hardware itself: problems are pushed to the software level since they are easier to handle there. A device driver is then really mostly a design bugfix layer..

One thing which is tremendously useful in packet-based communication is a sniffer with a packet parser. For APIs it's a call trace. So.. maybe for the USB drivers, I try to route it over ethernet (I think that exists) to be able to sniff the traffic?

Entry: longer cables (pk2 over TCP)
Date: Wed Dec 10 22:41:57 CET 2008

I'm running into the problem that the host I can connect to my experimental setup isn't very fast, and the fast one I'd like to do dev on isn't near my soldering table. Basically, I'd like to transport the PK2 protocol over TCP. Since PK2 really isn't more than sending 64 byte buffers up and down, this should be quite simple..

Entry: demos for dorkbot
Date: Thu Dec 11 21:56:19 CET 2008

I'd like to do the synth + the TV. For the TV I have a 13.5 MHz xtal + now there's the bitbanged monitor freeing one of the serial ports for raw binary output. I don't remember which chip can do this though.. The serial port seems to be this one: in the 18f2550 datasheet, section 20.3, synchronous master mode. The 13.5 MHz is the standard video pixel clock. Can probably use one of the USB devices; they have higher clock ratings.

So... A better TV out circuit. What I tried before is to use the 75 ohm impedance as a voltage divider, to not have to use a 75 ohm output impedance. This is a dirty trick, yeah.. So, to do it properly, generating the 4 levels (0V : 0.33V : 0.66V : 1V) requires a buffer amp to isolate the load impedance from the resistor divider. This frees the resistor values a bit, but requires a rail-to-rail amp if fed from a single supply. For demo purposes it's probably best to stick with the two simple resistors.
Asked Bert: To do it properly, provide double levels + add a unity-gain buffer opamp with a 75 ohm resistor in series (this assumes the line is properly terminated), maybe also add an output transistor to deliver the current. Non-terminated: it's possible to get the line levels always correct if the buffer has zero output impedance.

Entry: Preparing for dorkbot
Date: Sun Dec 14 15:40:53 CET 2008

This works on my first 452 A/V proto board:

  staaplc -u -d pk2 452-40.f

But the synth boards don't work. They are running at 8 MHz, maybe it's too fast? Let's lower the baud rate. 2400 doesn't work either.. It's probably something else..

I forgot how to load a forth file into the scheme console, so the asm can be inspected. OK: using

  (require (planet zwizwa/staapl/prj/pic18))

instead of

  (require (planet zwizwa/staapl/pic18))

The other option is to use the latter, then run (init-namespace).

FIXES:
- baud rate was hardcoded
- 40MHz goes up to 19200
- 8MHz goes up to 4800

Look into this better: this is probably due to start bit timing issues.

NEXT: load the whole synth in the app. OK. Now, running the synth app. NEXT: fix the problem with the ICD2 port being shut off. OK, not initializing the digital inputs seems to work.

Now there's a problem when turning the synth engine on: this messes up the busy-loop timing. Maybe the serial busyloop should disable interrupts? Disabling the engine during command interpretation is probably simplest. Another option is to tie the bitbanged serial port to the interrupt routine, and use the main timer for bitbanged receive. The same problem is going to occur for ForthTV, so let's fix it properly. We need to poll for the start bit, disable the engine and then fall into receive. Is it OK to remove polling for the start bit TRANSITION in receive?

Entry: Polling the icd serial port
Date: Mon Dec 15 09:31:52 CET 2008

\ Custom startup
: warm
    init-all init-board-debug
    engine-on
    begin
        icd.rx-ready? if
            engine-off interpret-msg engine-on
        then
    again ;

This seems to work.

Entry: interrupting application
Date: Mon Dec 15 10:38:15 CET 2008

Shut down serial + power cycle?

Entry: list of interactive commands
Date: Mon Dec 15 12:59:58 CET 2008

Some of these are prefix commands: commands that take literal inputs from the command input stream, instead of the data stack.

  see               \ Disassemble
  ul                \ Replace current marked upload with file's program
  abd               \ Dump data block n (64 bytes)
  fbd               \ Dump flash block n (32 words)
  p | ps | px       \ Print byte value as unsigned, signed or hex
  _p | _ps | _px    \ Print word value (2 bytes) as unsigned, signed or hex
  kb                \ Print a memory map for the first n kilobytes.
  words             \ Print on-target words, usable interactively
  macros            \ Print all defined names, usable in program code
  ts | tss | tsx    \ Print data stack bytes in unsigned, signed or hex
  _ts | _tss | _tsx \ Print data stack words (2 bytes) in unsigned, signed or hex

Other more advanced commands:

  load              \ Load and compile a file (doesn't upload)

Entry: ctrl-C
Date: Mon Dec 15 13:35:14 CET 2008

It would be nice if ^C resets the device while the PK2 is being used. OK, did this: There's a parameter "tethered-reset" that provides target reset for a specific target IO mechanism. For serial ports, this is just the "cold" monitor command. For PK2 this is an external reset. This is driven by the console: whenever a user break arrives during the execution of a command, (command "cold") is executed. A user break during read-line is ignored. It isn't very robust though.
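Roughly, the idea in the console loop is this (a sketch, not the actual code; 'command' is the console command interpreter mentioned above):

  ;; Convert a user break (ctrl-C) during command execution
  ;; into a target reset.
  (define (console-step line)
    (with-handlers ((exn:break? (lambda (e) (command "cold"))))
      (command line)))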
For normal use it seems to work, but interrupting commands like "4 kb" doesn't work that well. Also, when there is no connection, there's no way to stop the console. OK.. After a reset, the console checks if "OK" works. If not, it will exit. Hmm.. It's messed up now.. Re-flashing helped.. Probably something got killed. OK. This is good enough.

Entry: forthtv
Date: Mon Dec 15 20:48:15 CET 2008

Next: porting the forthtv code. This requires mostly cleanups of language changes (already did "constant"), and some serial port logic. Getting this to work for the serial port is not going to be simple.. Maybe we should work with some zero bytes to synchronize.

Entry: more forthtv
Date: Tue Dec 16 18:31:27 CET 2008

Got it to compile today, but it doesn't seem to want to run. Something gets redefined. Maybe it's the "hook.f" stuff? I changed this arrow notation in the synth at the end of 2007. (ForthTV's last update is May 2008, for a workshop.) No, it's usec.

Entry: redefining names
Date: Tue Dec 16 18:51:04 CET 2008

It's necessary to find a way to manage redefinition of macro names. This mechanism is terribly convenient for specialization (configuration) but hard to debug.

Entry: dorkbot presentation
Date: Wed Dec 17 20:04:12 CET 2008

It's about 3 things really. What is it?

* a programming language
* a compiler to PIC18 machine code
* human interaction tools (console + inspection)

What is it more?

* a programmable programming language
* a retargetable code generator
* remote procedure calls

=> all features are programmable

The big ideas:

* Forth as an efficient machine model
* Lisp style macros (Scheme's hygienic version)
* Short debug cycle

What makes Forth so nice? It's a CUT & PASTE language (concatenative) -> _VERY_ convenient for test-driven prototyping.

Entry: sync jitter
Date: Wed Dec 17 20:36:05 CET 2008

The TV stuff is working.. It would be nice to also solve the 0-1-2 jitter problem. Originally it was done using interrupts, which gave zero jitter. It can be solved using an instruction that conditionally wastes one cycle: a zero offset conditional branch.

EDIT: It's actually not so difficult: Create an ISR that pops the return address, then add a routine

  : wait wait ;

Entry: dorkbot staapl presentation overview
Date: Fri Dec 19 09:16:38 CET 2008

- general idea: 3 things
- synth board:
  * interaction
    - set config (square, pwm, p0, ...)
    - upload script
  * demo "go"
- TV board:
  * interaction + stop
  * show code

Entry: midi
Date: Thu Dec 25 18:43:23 CET 2008

Been thinking about using some kind of network <-> midi converters where data is carried one-way over +- current loops, which can feed the target.

Entry: Staapl definition
Date: Sun Jan 4 14:42:10 CET 2009

Staapl is a 2-stage concatenative language inspired by Forth, Joy and Scheme. It is built as follows:

- [MACRO LANGUAGE] Semantics are defined in terms of primitive machine code transformers: each WORD is represented by a function mapping a string of machine code instructions to another one. Note this entails both EXPANSION and RECOMBINATION of primitive machine code: Macros perform computations.

- [CONCATENATIVE LANGUAGE] Because the transformers are functions, function composition can be used as the language's composition mechanism. This is represented syntactically by CONCATENATION.

- [STACK LANGUAGE] These transformers operate on ONE END of a machine code string, which makes it easy to attribute the semantics of a STACK language.
- [INSTANTIATION] The MACRO semantics is extended with run-time storage and indirection (2 stacks + the machine's indirect jump instruction) to provide run-time operations that mimic the semantics of the stack-of-code transformers.

The result is a Forth-like language with a well-structured macro system. It is 2-stage because of the existence of 2 very distinct code levels:

- base machine language
- machine language transformers

The transformers themselves can be programmed using multi-level techniques (i.e. Scheme macros) if necessary.

A different semantics model ignores the explicit distinction between target code generators and instantiated target code (MACROs and PROCEDUREs), and defines a program as a partial evaluation of a functional concatenative program. The main idea is to combine the macro semantics with the concrete semantics of the base machine language to get to this simplified semantics. However, in Staapl this is only approximate and acts more like an ideal design guide.

--

I'm happy with this in the sense that it is a story that keeps coming back, and it is fairly close to how it is implemented. However, the instantiation part isn't very clear yet... This needs a proper mathematical model: I guess the instantiation part will need proper constraints.. I.e. it's easy to define for pure functions operating on machine values, vs. transformation of literal load instructions, but this has to be made more explicit.

Entry: genoeg gezaagd.. [Dutch: enough nagging]
Date: Fri Jan 23 18:17:07 CET 2009

Time to get going again. What's on the plate?

- theory:
  * See above: complete the story. It's about macros and how forth/lisp fit into that picture.
  * Bridge to C. For commercial non-sealed applications, that's the only plug point. Targets: AVR 8bit, dsPIC 16bit and ARM/MIPS 32bit.

- practical:
  * High-bandwidth communication. I've been toying with ideas like CANbus, Ethernet and RS422/485, but it's probably best to stick to USB and build a router for serial protocols. Also, USB is usable for other things, definitely for the systems programming I'm doing lately. Also see: http://developer.berlios.de/projects/usb4rt
  * PICkit bugfixes. It's mostly working, but it still crashes. Get rid of the 'uart hack' by looking at the actual output of the chip. Set up a logic analyser.

- education:
  * Get kids involved. I've been aiming too much at trying to convince already fixed-minded embedded engineers. Fuck them. Get some fresh minds in the picture.

Entry: concurrent
Date: Mon Jan 26 13:29:12 CET 2009

For CSP/transputer style tasks, it might be best to create a pure stack machine VM for one of the architectures, one that has simple task switches.

http://occam-pi.org/list-archives/occam-com/msg01148.html

Entry: multi-stage semantics
Date: Tue Jan 27 09:33:45 CET 2009

http://lambda-the-ultimate.org/node/3179

"Ziggurat allows the language extender to optionally define static semantics for her new language, and connect these static semantics amongst language levels."

Isn't this what I'm looking for? The move from compile-time semantics coming from rewrite rules, to run-time semantics. The paper calls this "specification by compiler".

Entry: PIC30 binutils + gcc
Date: Fri Jan 30 14:02:08 CET 2009

http://www.baycom.org/~tom/dspic/

There are also deb packages, but I forget where I found them.
Mirrored here (/etc/apt/sources.list):

  deb http://zwizwa.be/debian unstable main

  // test.c
  int add1(int a) { return 1 + a; }

  tom@del:/tmp$ pic30-elf-gcc test.c
  pic30-elf-ld: cannot find -lpic30-elf
  tom@del:/tmp$ pic30-elf-gcc -c test.c

I guess libpic30-elf is libc?

Entry: hybrid systems
Date: Mon Feb 9 17:24:28 CET 2009

Staapl is about 2-stage systems. Currently this is PC-uC, but it could very well be PC-DSP on SoC designs like the TI DaVinci.

Entry: state machines / USB
Date: Wed Feb 11 15:44:46 CET 2009

Before finishing the USB driver, there are two more general tasks to solve:

* Figure out an approach to create state machine abstractions.
* Refine the compiler which translates a high-level device description into code that answers queries about it.

Keep this in mind: during device setup, code doesn't need to be fast, so it is better to aim for the right language abstraction instead of trying to write optimal code, as I did before.

This is going to make the water muddy for a while, as I can anticipate some problems. One in particular is "require" in combination with the way macros work in a flat language.

Steps:
1. Build the 18F2550 hardware setup. (USBPICstamp)
2. Set up USB debug on the host side.

Let's make this into a story about how to write a reasonably complex application with Staapl's macro approach.

--

Getting it to run:

  tom@zzz:~/staapl/app$ staaplc -u -d pk2 452-40.f
  loading PICkit2 device file.
  pk2-open: PICkit 2 Microcontroller Programmer
  program: 532 bytes
  config: 14 bytes
  (Vdd 4.814453125)
  (Vpp 11.9072265625)
  uart-start hack
  Connected (pk2 19200)
  Press ctrl-D to quit.
  OK

The same one for the 2550 doesn't work. Some 1220 boards also do not work, so I think this is a circuit issue. What I do miss here is a memory verify operation after programming. Let's build that first.

OK. Apparently the 2550 doesn't allow flash write. I think this is a boot-protect problem: need to disable boot protect in the config before writing is possible. Apparently this was the problem:

  #x30000000 org  \ wrong
  #x300000 org    \ right

Will add a warning when there's no config memory. I have to program it with

  pk2cmd -I -p PIC18F2620 -F logan.hex -M

Also, the console will only come up after one ctrl-C.

Entry: fixing bugs w. bitbanged serial
Date: Thu Feb 12 09:05:16 CET 2009

Maybe I should switch to a proper bit-banged serial port implementation first. However, without knowing what exactly is happening, I can't make proper decisions. The hypothesis is that there is some transition on the host->target line that gets mistaken for data. I don't have anything to measure this, but I can build something to measure it.

One possible route is to fix the logic analyser such that it is connected with the ICD2 serial port for the serial console, and the TTL serial port for data transfer. This does look like a lot of work.. Alternatively, we could try to get the PK2 logic analyser to work. It has higher bandwidth. I do have two of them.. The problem is that it has only a window of 1K samples for 3 channels.. So I do need my own, as I don't know yet what I'm looking for. Let's build it.

Next problem: there seems to be something wrong with chip erase.. I can program fine, but it won't reset the bits before programming. OK. There was at least something wrong with the SetAddress script. Now I get into trouble when I try to run (EXECUTE_SCRIPT (ConfigWrPrepScript)):

  car: expects argument of type <pair>; given ()

The other problem: when I (execute ProgMemWrPrepScript), the programmer hangs.
The script that hangs is 18F_PrgMemRd64, but the writing seems to have succeeded. It doesn't behave the same on subsequent runs.. Something fishy going on here.. WTF! adding (READ_STATUS) after each read/write command seems to do the trick. No.. back to square one. It won't verify if I change the source file, both for config mem and program mem. One thing that needs to be on is external reset. It should be possible to do this with turning target off though.. Entry: global variables Date: Thu Feb 12 10:48:34 CET 2009 What is actually a gobal variable? It's something of which there is only one instance. I.e. a machine constant, or an application constant. It's easier to use global variables for driver configurations. Driver code is highly parametric, but doesn't usually need multiple specializations per uC program. I.e. console baud rate. Currently "baud" global is used everywhere in console and serial init, but this needs to change so one can have a console AND a special purpose serial port with a different baudrate. In general I'm going to use this strategy: use both generic macros with arguments AND some default macro which uses a global variable. Entry: pk2cmd trace Date: Thu Feb 12 18:48:18 CET 2009 This is: ~/pickit/pk2cmdv1.20LinuxMacSource/pk2cmd -I -p PIC18F2620 -F staapler.hex -M >staapler.pk2.log -- W: A6 02 FE FD EXECUTE_SCRIPT VDD_OFF VDD_GND_ON W: A0 80 2A D2 SETVDD W: A1 40 DF 9C SETVPP W: AF R: 48 02 25 3C 0E 00 81 81 00 0F C0 0F E0 0F 40 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 SCRIPT_BUFFER_CHKSM W: AB CLR_SCRIPT_BFR W: A4 00 12 FA F7 F9 F5 F3 00 E8 14 F6 FB E7 7F F7 FA FB F6 E8 13 W: A4 01 08 FA F7 F8 F3 03 E8 0A F4 W: A4 02 2D DA 2A 0E DA 15 09 DA 00 00 DA F8 6E DA AA 0E DA 55 0A DA F7 6E DA AA 0E DA 54 09 DA 00 00 DA F6 6E EE 04 09 F2 00 F0 DA FF FF E9 09 01 W: A4 03 09 EE 04 09 F2 00 F0 E9 06 7F W: A4 05 1B EE 04 00 F1 F2 0E DA F6 6E EE 04 00 F1 F2 0E DA F7 6E EE 04 00 F1 F2 0E DA F8 6E W: A4 06 21 EE 04 00 F1 F2 0E DA F6 6E EE 04 00 F1 F2 0E DA F7 6E EE 04 00 F1 F2 0E DA F8 6E DA A6 8E DA A6 9C W: A4 07 23 EE 04 0D F1 F1 E9 05 1E EE 04 0F F1 F1 EE 03 00 F3 04 E7 2F F3 00 E7 05 F2 00 F2 00 EE 04 0D F2 FF F2 FF W: A4 08 1F DA A6 9E DA A6 9C EE 04 00 F1 F2 0E DA A9 6E EE 04 00 F1 F2 0E DA AA 6E DB DA 99 0E DA F5 6E W: A4 09 21 DA A6 80 DA A8 50 DA F5 6E DA 00 00 DA 00 00 EE 04 02 F2 00 F0 DA A9 2A DA D8 B0 DA AA 2A E9 1E 1F W: A4 0A 1C DA F8 6A DA A6 9E DA A6 9C EE 04 00 F1 F2 0E DA A9 6E EE 04 00 F1 F2 0E DA AA 6E DB W: A4 0B 23 EE 04 00 F1 F2 0E DA A8 6E DA A6 84 DA A6 82 DA 00 00 E9 03 03 E8 01 DA 00 00 DA A9 2A DA D8 B0 DA AA 2A W: A4 0D 1B DA 30 0E DA F8 6E DA 00 0E DA F7 6E DA 00 0E DA F6 6E EE 04 09 F2 00 F0 E9 06 0D W: A4 0E 1E DA A6 8E DA A6 8C DA 00 EF DA 00 F8 DA 30 0E DA F8 6E DA 00 0E DA F7 6E DA F6 6E DB DB DB W: A4 0F 33 EE 04 0F F1 F2 00 EE 03 00 F3 04 E7 2F F3 00 E7 05 F2 00 F2 00 DA F6 2A EE 04 0F F2 00 F1 EE 03 00 F3 04 E7 2F F3 00 E7 05 F2 00 F2 00 DA F6 2A E9 30 06 W: A4 11 1B DA 20 0E DA F8 6E DA 00 0E DA F7 6E DA 00 0E DA F6 6E EE 04 09 F2 00 F0 E9 06 07 W: A4 13 2F DA 20 0E DA F8 6E DA 00 0E DA F7 6E DA F6 6E DA A6 8E DA A6 9C EE 04 0D F1 F1 E9 05 02 EE 04 0F F1 F1 EE 03 00 F3 04 E7 2F F3 00 F2 00 F2 00 W: A4 16 32 DA 3C 0E DA F8 6E DA 00 0E DA F7 6E DA 05 0E DA F6 6E EE 04 0C F2 3F F2 3F DA 04 0E DA F6 6E EE 04 0C F2 8F F2 8F DA 00 00 EE 04 00 E8 01 F2 00 F2 00 W: A4 17 32 DA 3C 0E DA F8 6E DA 00 0E DA F7 6E DA 05 0E DA F6 6E EE 04 0C F2 0F 
F2 0F DA 04 0E DA F6 6E EE 04 0C F2 83 F2 83 DA 00 00 EE 04 00 E8 01 F2 00 F2 00 DOWNLOAD_SCRIPT W: AF R: 48 02 25 3C 0E 00 81 81 00 0F C0 0F E0 0F 40 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 SCRIPT_BUFFER_CHKSM W: A6 02 EA 00 EXECUTE_SCRIPT SET_ICSP_SPEED W: A6 01 F7 EXECUTE_SCRIPT MCLR_GND_ON W: A6 02 FC FF EXECUTE_SCRIPT VDD_GND_OFF VDD_ON W: A9 A5 00 01 W: A9 A5 16 01 W: A9 A5 01 01 CLEAR_UPLOAD_BFR RUN_SCRIPT (SCR_PROG_ENTRY) CLEAR_UPLOAD_BFR RUN_SCRIPT (SCR_ERASE_CHIP) CLEAR_UPLOAD_BFR RUN_SCRIPT (SCR_PROG_EXIT) W: A6 02 FE FD EXECUTE_SCRIPT VDD_OFF VDD_GND_ON W: A6 01 F6 EXECUTE_SCRIPT MCLR_GND_OFF W: A6 01 F7 EXECUTE_SCRIPT MCLR_GND_ON W: A6 02 FC FF EXECUTE_SCRIPT GND_OFF VDD_ON W: A9 A5 00 01 CLEAR_UPLOAD_BFR RUN_SCRIPT (SCR_PROG_ENTRY) W: A7 A8 03 00 00 00 CLR_DOWNLOAD_BFR DOWNLOAD_DATA W: A9 A5 06 01 CLR_UPLOAD_BFR RUN_SCRIPT (SCR_PROGMEM_WR_PREP) W: A7 A8 3D 1F D0 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF W: A8 3E FF FF FF E0 D0 9E BA 01 D0 FD D7 AB A2 03 D0 AB 98 AB 88 F8 D7 EC 6E AE 50 AB A4 02 D0 ED 50 F2 D7 12 00 AC B2 01 D0 FD D7 AD 6E ED 50 12 00 05 00 22 D8 FE 6E ED 50 FD 6E ED 50 12 00 EC W: A8 3E 6E 00 0E AF D0 A6 8E A6 9C 7D D8 F9 D7 A6 8E A6 9C 7C D8 F5 D7 0F 0B 82 D8 F2 D7 14 D0 0F D0 14 D0 4B D0 50 D0 EC D7 FF 00 28 D0 1C D0 3B D0 2F D0 4F D0 5A D0 E7 D7 EA D7 80 D8 7F D0 EC W: A8 3E 6E 01 0E 90 D8 8F D0 7A D8 DB D7 D3 DF D9 D7 76 D8 FF 0F ED 50 02 E3 72 D8 DE DF 12 00 F8 DF FE D7 6D D8 EC 6E 7F D0 FC DF E4 6E ED 50 EC 6E 09 00 F5 50 78 D8 E7 06 FA E1 E5 52 12 00 F1 W: A8 09 DF E4 6E ED 50 EC 6E DE 50 W: A9 A5 07 04 W: A7 A8 3D 6E D8 E7 06 FB E1 E5 52 12 00 55 D8 E4 6E ED 50 52 D8 F5 6E 0D 00 ED 50 E7 06 FA E1 E5 52 AD D7 4A D8 E4 6E ED 50 47 D8 DE 6E ED 50 E7 06 FB E1 E5 52 A3 D7 BF DF DA 6E ED 50 D9 6E ED W: A8 3E 50 9D D7 B9 DF F7 6E ED 50 F6 6E ED 50 97 D7 EC 6E 40 0E E4 6E FF 0E EC 6E 09 00 F5 50 ED 14 E7 06 FA E1 E5 52 AA D7 EC 6E E9 50 80 0F A6 D7 EC 6E F2 9E 55 0E A7 6E AA 0E A7 6E A6 82 F2 W: A8 3E 8E ED 50 12 00 A6 84 A6 88 F3 D7 A6 84 A6 98 0A 00 EF DF 09 00 12 00 EF 60 EF 6E ED 50 E8 44 FD 26 01 0B FE 22 ED 50 12 00 00 D8 EC 6E 57 0E E4 6E ED 50 E7 06 FE E1 E5 52 12 00 81 AC 01 W: A8 3E D0 FD D7 F4 DF F2 DF EC 6E 08 0E E4 6E 00 0E 81 AC 02 D0 D8 80 01 D0 D8 90 E8 30 E7 DF E7 06 F7 E1 E5 52 12 00 8A 9E E1 DF EC 6E 08 0E E4 6E ED 50 E8 30 02 E3 8A 8E 01 D0 8A 9E D7 DF E7 W: A8 09 06 F8 E1 E5 52 8A 8E ED 50 W: A9 A5 07 04 W: A7 A8 3D D1 D7 00 0E FC 6E 00 EE 7F F0 10 EE 8F F0 8B 68 EC 6E 70 0E D3 6E 93 8C 93 9E ED 50 5C D7 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF W: A8 3E FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF W: A8 3E FF FF FF FF FF C8 0F 12 00 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF W: A8 3E FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF W: A8 09 FF FF FF FF FF FF FF FF FF W: A9 A5 07 04 CLR_DOWNLOAD_BFR DOWNLOAD DATA ... 
| CLR_UPLOAD_BUFFER RUN_SCRIPT (SCR_PROGMEM_WR) | 3x W: A9 A5 01 01 CLR_UPLOAD_BUFFER RUN_SCRIPT (SCR_PROG_EXIT) W: A9 A5 00 01 CLR_UPLOAD_BUFFER RUN_SCRIPT (SCR_PROG_ENTRY) W: A7 A8 03 00 00 00 CLR_DOWNLOAD_BUFFER DOWNLOAD_DATA W: A9 A5 0A 01 CLR_UPLOAD_BUFFER RUN_SCRIPT (SCR_EE_WR_PREP) W: A7 A8 10 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF CLR_DOWNLOAD_BUFFER DOWNLOAD_DATA W: A9 A5 0B 10 CLR_UPLOAD_BUFFER RUN_SCRIPT (SCR_EE_WR) W: A9 A5 01 01 CLR_UPLOAD_BUFFER RUN_SCRIPT (SCR_PROG_EXIT) W: A9 A5 00 01 CLR_UPLOAD_BUFFER RUN_SCRIPT (SCR_PROG_ENTRY) W: A7 A8 08 FF FF FF FF FF FF FF FF CLR_DOWNLOAD_BUFFER DOWNLOAD_DATA W: A9 A5 13 01 CLR_UPLOAD_BUFFER RUN_SCRIPT (SCR_USERID_WR) W: A9 A5 01 01 CLR_UPLOAD_BUFFER RUN_SCRIPT (SCR_PROG_EXIT) W: A6 02 FE FD EXECUTE_SCRIPT VDD_OFF VDD_GND_ON W: A6 01 F6 EXECUTE_SCRIPT MCLR_GND_OFF W: A6 02 FC FF EXECUTE_SCRIPT VDD_GND_OFF VDD_ON W: A9 A5 00 01 CLR_UPLOAD_BUFFER RUN_SCRIPT (SCR_PROG_ENTRY) W: A7 A8 03 00 00 00 CLR_DOWNLOAD_BUFFER DOWNLOAD_DATA W: A9 A5 05 01 # SCR_PROGMEM_ADDRSET CLR_UPLOAD_BFR RUN_SCRIPT W: A9 A5 03 01 AC R: 1F D0 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 W: AC R: E0 D0 9E BA 01 D0 FD D7 AB A2 03 D0 AB 98 AB 88 F8 D7 EC 6E AE 50 AB A4 02 D0 ED 50 F2 D7 12 00 AC B2 01 D0 FD D7 AD 6E ED 50 12 00 05 00 22 D8 FE 6E ED 50 FD 6E ED 50 12 00 EC 6E 00 0E AF D0 00 W: A9 A5 03 01 AC R: A6 8E A6 9C 7D D8 F9 D7 A6 8E A6 9C 7C D8 F5 D7 0F 0B 82 D8 F2 D7 14 D0 0F D0 14 D0 4B D0 50 D0 EC D7 FF 00 28 D0 1C D0 3B D0 2F D0 4F D0 5A D0 E7 D7 EA D7 80 D8 7F D0 EC 6E 01 0E 90 D8 8F D0 00 W: AC R: 7A D8 DB D7 D3 DF D9 D7 76 D8 FF 0F ED 50 02 E3 72 D8 DE DF 12 00 F8 DF FE D7 6D D8 EC 6E 7F D0 FC DF E4 6E ED 50 EC 6E 09 00 F5 50 78 D8 E7 06 FA E1 E5 52 12 00 F1 DF E4 6E ED 50 EC 6E DE 50 00 W: A9 A5 03 01 AC R: 6E D8 E7 06 FB E1 E5 52 12 00 55 D8 E4 6E ED 50 52 D8 F5 6E 0D 00 ED 50 E7 06 FA E1 E5 52 AD D7 4A D8 E4 6E ED 50 47 D8 DE 6E ED 50 E7 06 FB E1 E5 52 A3 D7 BF DF DA 6E ED 50 D9 6E ED 50 9D D7 00 W: AC R: B9 DF F7 6E ED 50 F6 6E ED 50 97 D7 EC 6E 40 0E E4 6E FF 0E EC 6E 09 00 F5 50 ED 14 E7 06 FA E1 E5 52 AA D7 EC 6E E9 50 80 0F A6 D7 EC 6E F2 9E 55 0E A7 6E AA 0E A7 6E A6 82 F2 8E ED 50 12 00 00 W: A9 A5 03 01 AC R: A6 84 A6 88 F3 D7 A6 84 A6 98 0A 00 EF DF 09 00 12 00 EF 60 EF 6E ED 50 E8 44 FD 26 01 0B FE 22 ED 50 12 00 00 D8 EC 6E 57 0E E4 6E ED 50 E7 06 FE E1 E5 52 12 00 81 AC 01 D0 FD D7 F4 DF F2 DF 00 W: AC R: EC 6E 08 0E E4 6E 00 0E 81 AC 02 D0 D8 80 01 D0 D8 90 E8 30 E7 DF E7 06 F7 E1 E5 52 12 00 8A 9E E1 DF EC 6E 08 0E E4 6E ED 50 E8 30 02 E3 8A 8E 01 D0 8A 9E D7 DF E7 06 F8 E1 E5 52 8A 8E ED 50 00 W: A9 A5 03 01 AC R: D1 D7 00 0E FC 6E 00 EE 7F F0 10 EE 8F F0 8B 68 EC 6E 70 0E D3 6E 93 8C 93 9E ED 50 5C D7 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 W: AC R: FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 W: A9 A5 03 01 AC R: C8 0F 12 00 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 W: AC R: FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 
FF FF FF FF FF FF FF FF FF FF 00 CLR_UPLOAD_BFR RUN_SCRIPT (SCR_PROGMEM_RD) UPLOAD_DATA_NOLEN UPLOAD_DATA_NOLEN ... W: A9 A5 01 01 CLR_UPLOAD_BUFFER RUN_SCRIPT (SCR_PROG_EXIT) W: A9 A5 00 01 CLR_UPLOAD_BUFFER RUN_SCRIPT (SCR_PROG_ENTRY) W: A7 A8 03 00 00 00 CLR_DOWNLOAD_BUFFER DOWNLOAD_DATA W: A9 A5 08 01 CLR_UPLOAD_BFR RUN_SCRIPT (SCR_EE_RD_PREP) W: A9 A5 09 04 AC R: FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 W: AC R: FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 W: A9 A5 09 04 AC R: FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 W: AC R: FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 W: A9 A5 09 04 AC R: FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 W: AC R: FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 W: A9 A5 09 04 AC R: FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 W: AC R: FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 W: A9 A5 09 04 AC R: FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 W: AC R: FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 W: A9 A5 09 04 AC R: FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 W: AC R: FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 W: A9 A5 09 04 AC R: FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 W: AC R: FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 W: A9 A5 09 04 AC R: FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 W: AC R: FF FF FF FF FF FF FF FF FF FF FF FF FF 
FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 CLR_UPLOAD_BFR RUN_SCRIPT (SCR_EE_RD) UPLOAD_DATA_NOLEN UPLOAD_DATA_NOLEN W: A9 A5 01 01 CLR_UPLOAD_BUFFER RUN_SCRIPT (SCR_PROG_EXIT) W: A9 A5 00 01 CLR_UPLOAD_BUFFER RUN_SCRIPT (SCR_PROG_ENTRY) W: A9 A5 11 01 AC CLR_UPLOAD_BUFFER RUN_SCRIPT (SCR_USERID_RD) R: FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 W: A9 A5 01 01 CLR_UPLOAD_BUFFER RUN_SCRIPT (SCR_PROG_EXIT) W: A6 02 FE FD EXECUTE_SCRIPT VDD_OFF VDD_GND_ON W: A6 01 F7 EXECUTE_SCRIPT MCLR_GND_ON W: A6 02 FC FF EXECUTE_SCRIPT GND_OFF VDD_ON W: A9 A5 00 01 CLR_UPLOAD_BUFFER RUN_SCRIPT (SCR_PROG_ENTRY) W: A7 A8 03 00 00 00 CLR_DOWNLOAD_BFR DOWNLOAD_DATA W: A9 A5 0E 01 CLR_UPLOAD_BUFFER RUN_SCRIPT (SCR_CONFIG_WR_PREP) W: A7 A8 10 00 06 0F 0E 00 81 81 00 0F C0 0F E0 0F 40 A5 BF CLR_DOWNLOAD_BUFFER DOWNLOAD_DATA W: A9 A5 0F 01 CLR_UPLOAD_BUFFER RUN_SCRIPT (SCR_CONFIG_WR) W: A9 A5 01 01 CLR_UPLOAD_BUFFER RUN_SCRIPT (SCR_PROG_EXIT) W: A6 02 FC FF EXECUTE_SCRIPT GND_OFF VDD_ON W: A9 A5 00 01 CLR_UPLOAD_BUFFER RUN_SCRIPT (SCR_PROG_ENTRY) W: A9 A5 0D 01 CLR_UPLOAD_BUFFER RUN_SCRIPT (SCR_CONFIG_RD) W: AA R: 0E 00 06 0F 0E 00 81 81 00 0F C0 0F E0 0F 40 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 UPLOAD_DATA W: A9 A5 01 01 CLR_UPLOAD_BUFFER RUN_SCRIPT (SCR_PROG_EXIT) W: A6 02 FE FD EXECUTE_SCRIPT VDD_OFF VDD_GND_ON PICkit 2 Program Report 12-2-2009, 18:44:57 Device Type: PIC18F2620 Program Succeeded. W: A6 01 F7 W: A6 02 FC FF W: A9 A5 00 01 W: A9 A5 02 01 W: AA R: 02 86 0C 0F 0E 00 81 81 00 0F C0 0F E0 0F 40 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 W: A9 A5 01 01 W: A6 02 FE FD W: A6 01 F6 Device ID = 0C80 Revision = 0006 Device Name = PIC18F2620 W: A6 01 F7 W: A6 02 FE FD W: A2 R: 05 00 0C 0F 0E 00 81 81 00 0F C0 0F E0 0F 40 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 Operation Succeeded -- Keeping it in reset before turning on the voltage seems to make things behave better. No more 'uart hack'. But that's it.. I don't see anything wrong except that my scripts are not uploaded... Maybe they contain loops and this won't work? No difference.. It's really strange: the 1220 does erase but no program, while the 2620 doesn't do erase. Jeezes.. There is a spot with delay: I've added a delay in: (define (target-on) (CLR_UPLOAD_BFR) (EXECUTE_SCRIPT (VDD_GND_OFF) (VDD_ON)) (msleep 150) ;; from pk2cmd source (void)) But this gives all zeros.. There's only one thing to do: start from an exact copy of the trace and build abstractions around that. I don't know what I messed up, but it did work before.. http://www.auelectronics.selfip.com/pdfs/CB0703_PICKit2_Schematic.pdf Entry: more pk2 Date: Sat Feb 14 12:03:09 CET 2009 I don't understand... The dumps are almost exactly the same, but it doesn't work.. Somehow there's something wrong with the primitives. I'm loosing to much time with this. Isn't there a better way? What I want is to upload a program and then connect to the serial port, that's all. 
Here's a dump of ~/pickit/pk2cmdv1.20LinuxMacSource/pk2cmd -I -p PIC18F2620 -E >staapler.pk2.erase.log W: A6 02 FE FD W: A0 80 2A D2 W: A1 40 DF 9C W: AF R: 3F 00 F7 25 9E BA 01 D0 FD D7 AB A2 03 D0 AB 98 AB 88 F8 D7 EC 6E AE 50 AB A4 02 D0 ED 50 F2 D7 12 00 AC B2 01 D0 FD D7 AD 6E ED 50 12 00 05 00 22 D8 FE 6E ED 50 FD 6E ED 50 12 00 EC 6E 00 0E 00 W: AB W: A4 00 12 FA F7 F9 F5 F3 00 E8 14 F6 FB E7 7F F7 FA FB F6 E8 13 W: A4 01 08 FA F7 F8 F3 03 E8 0A F4 W: A4 02 2D DA 2A 0E DA 15 09 DA 00 00 DA F8 6E DA AA 0E DA 55 0A DA F7 6E DA AA 0E DA 54 09 DA 00 00 DA F6 6E EE 04 09 F2 00 F0 DA FF FF E9 09 01 W: A4 03 09 EE 04 09 F2 00 F0 E9 06 7F W: A4 05 1B EE 04 00 F1 F2 0E DA F6 6E EE 04 00 F1 F2 0E DA F7 6E EE 04 00 F1 F2 0E DA F8 6E W: A4 06 21 EE 04 00 F1 F2 0E DA F6 6E EE 04 00 F1 F2 0E DA F7 6E EE 04 00 F1 F2 0E DA F8 6E DA A6 8E DA A6 9C W: A4 07 23 EE 04 0D F1 F1 E9 05 1E EE 04 0F F1 F1 EE 03 00 F3 04 E7 2F F3 00 E7 05 F2 00 F2 00 EE 04 0D F2 FF F2 FF W: A4 08 1F DA A6 9E DA A6 9C EE 04 00 F1 F2 0E DA A9 6E EE 04 00 F1 F2 0E DA AA 6E DB DA 99 0E DA F5 6E W: A4 09 21 DA A6 80 DA A8 50 DA F5 6E DA 00 00 DA 00 00 EE 04 02 F2 00 F0 DA A9 2A DA D8 B0 DA AA 2A E9 1E 1F W: A4 0A 1C DA F8 6A DA A6 9E DA A6 9C EE 04 00 F1 F2 0E DA A9 6E EE 04 00 F1 F2 0E DA AA 6E DB W: A4 0B 23 EE 04 00 F1 F2 0E DA A8 6E DA A6 84 DA A6 82 DA 00 00 E9 03 03 E8 01 DA 00 00 DA A9 2A DA D8 B0 DA AA 2A W: A4 0D 1B DA 30 0E DA F8 6E DA 00 0E DA F7 6E DA 00 0E DA F6 6E EE 04 09 F2 00 F0 E9 06 0D W: A4 0E 1E DA A6 8E DA A6 8C DA 00 EF DA 00 F8 DA 30 0E DA F8 6E DA 00 0E DA F7 6E DA F6 6E DB DB DB W: A4 0F 33 EE 04 0F F1 F2 00 EE 03 00 F3 04 E7 2F F3 00 E7 05 F2 00 F2 00 DA F6 2A EE 04 0F F2 00 F1 EE 03 00 F3 04 E7 2F F3 00 E7 05 F2 00 F2 00 DA F6 2A E9 30 06 W: A4 11 1B DA 20 0E DA F8 6E DA 00 0E DA F7 6E DA 00 0E DA F6 6E EE 04 09 F2 00 F0 E9 06 07 W: A4 13 2F DA 20 0E DA F8 6E DA 00 0E DA F7 6E DA F6 6E DA A6 8E DA A6 9C EE 04 0D F1 F1 E9 05 02 EE 04 0F F1 F1 EE 03 00 F3 04 E7 2F F3 00 F2 00 F2 00 W: A4 16 32 DA 3C 0E DA F8 6E DA 00 0E DA F7 6E DA 05 0E DA F6 6E EE 04 0C F2 3F F2 3F DA 04 0E DA F6 6E EE 04 0C F2 8F F2 8F DA 00 00 EE 04 00 E8 01 F2 00 F2 00 W: A4 17 32 DA 3C 0E DA F8 6E DA 00 0E DA F7 6E DA 05 0E DA F6 6E EE 04 0C F2 0F F2 0F DA 04 0E DA F6 6E EE 04 0C F2 83 F2 83 DA 00 00 EE 04 00 E8 01 F2 00 F2 00 W: AF R: 48 02 25 3C 9E BA 01 D0 FD D7 AB A2 03 D0 AB 98 AB 88 F8 D7 EC 6E AE 50 AB A4 02 D0 ED 50 F2 D7 12 00 AC B2 01 D0 FD D7 AD 6E ED 50 12 00 05 00 22 D8 FE 6E ED 50 FD 6E ED 50 12 00 EC 6E 00 0E 00 W: A6 02 EA 00 Erasing Device... W: A6 01 F7 W: A6 02 FC FF W: A9 A5 00 01 W: A9 A5 16 01 W: A9 A5 01 01 W: A6 02 FE FD W: A6 01 F6 W: A6 01 F7 W: A6 02 FC FF W: A9 A5 00 01 W: A9 A5 02 01 W: AA R: 02 86 0C 3C 9E BA 01 D0 FD D7 AB A2 03 D0 AB 98 AB 88 F8 D7 EC 6E AE 50 AB A4 02 D0 ED 50 F2 D7 12 00 AC B2 01 D0 FD D7 AD 6E ED 50 12 00 05 00 22 D8 FE 6E ED 50 FD 6E ED 50 12 00 EC 6E 00 0E 00 W: A9 A5 01 01 W: A6 02 FE FD W: A6 01 F6 Device ID = 0C80 Revision = 0006 Device Name = PIC18F2620 W: A6 01 F7 W: A6 02 FE FD W: A2 R: 05 00 0C 3C 9E BA 01 D0 FD D7 AB A2 03 D0 AB 98 AB 88 F8 D7 EC 6E AE 50 AB A4 02 D0 ED 50 F2 D7 12 00 AC B2 01 D0 FD D7 AD 6E ED 50 12 00 05 00 22 D8 FE 6E ED 50 FD 6E ED 50 12 00 EC 6E 00 0E 00 Operation Succeeded # The only thing that matters here is: W: A6 01 F7 W: A6 02 FC FF W: A9 A5 00 01 # ENTER PROGRAMMING W: A9 A5 16 01 W: A9 A5 01 01 # LEAVE PROGRAMMING W: A6 02 FE FD W: A6 01 F6 W: A6 01 F7 W: A6 02 FC FF # I don't see any differences.. It's probably something outside this layer. 
Maybe USB doesn't get initialized properly? Maybe the PK2 needs an extra reset or so?

Entry: dropping pk2 support
Date: Sat Feb 14 12:31:52 CET 2009

It looks like it's best to drop programming support and focus on serial connectivity only. I have a hard time letting this go, but I'm really stuck and it doesn't look like there's help available. It might also be better to implement the serial + reset interface outside of staapl. So, let's do that.

Entry: pk2serial.c
Date: Sat Feb 14 19:00:57 CET 2009

In tools/pk2serial.c there's a stdio bridge for PK2 serial communication. This is useful as a general-purpose utility and might be included in gputils? Tested with loopback. The bitbanged IO for the PIC however is not really reliable. I might want to fix that first.

Entry: forget about bitbanged uart
Date: Sat Feb 14 19:41:36 CET 2009

Actually, at the expense of 2 pins I can just forget about bitbanged serial, connect the uart+icd pins together and get on with writing some software.. There is no general way to solve this: when the serial port is necessary for something else, the console comm needs to be incorporated into the whole app anyway, depending on the app's needs. Timer-less / interrupt-less bitbanged serial is really asking for trouble apparently..

Maybe it's easier to get synchronous communication working over the ICD2 connector. Now that I do have the basic PK2 protocol figured out and implemented in a C program, this might not be such a big step.. The simplest way to do this is to use the AUX pin as a slave->master interrupt. Upon reception, the master can clock out the reply. When the message is finished, the slave resets the AUX pin. Alternatively, after writing a message the data pin could be used as an interrupt, and the clock's falling edge would serve as an acknowledgement, with the rising edge sampling the first bit. But.. This is still a lot of work to get going. Probably not worth the trouble. One more thing: PGC should then be connected to the target's RX/DT and not the CK input, so synchronous slave mode can only be used if these wires are swapped.. Damn...

ICD2  serial                     color
1     /MCLR                      white
2     VDD                        red
3     GND                        black
4     PGD  RX (<- target TX/CK)  yellow
5     PGC  TX (-> target RX/DT)  orange
6     PGM

Unidirectional transfer with the PIC as slave seems simple enough though.. But that's not very useful. One last hack: After the host has transmitted its last message, the client asserts the data line low. The host starts reading out these low bits. When the client is done, it waits for a clock signal and sends a single high bit, then starts slave transmission of the message. The host reads the length byte, and the subsequent data bytes.

Entry: next
Date: Sat Feb 14 23:02:11 CET 2009

- report identifier errors with source location
- switch examples back to ICD + hardware uart
- fix upload + console with external apps
- start working on 18F2550 USB support

Entry: workflow
Date: Sun Feb 15 11:24:10 CET 2009

Maybe staapl should behave a bit more like an ordinary compiler / telnet. This works very well (staapl/app/Makefile):

  .PHONY: 452-40
  452-40:
  	staaplc -d /dev/ttyUSB0 452-40.f
  	pk2cmd -I -p PIC18F452 -f 452-40.hex -M -R
  	staapl 452-40.dict

Perfected it a bit with rules + programmer commands embedded in the .f files. It works on the 18f2550 too now.. Apparently one of the usbpicstamp boards is defective. Maybe the oscillator?

Entry: wait a minute..
Date: Sun Feb 15 13:14:52 CET 2009

The pk2 code might actually work. Maybe I just configured the wrong device? Exactly: trying to do a 2620 with a 1220 config.
Entry: gdb vs. forth
Date: Sun Feb 15 13:38:09 CET 2009

The Staapl/Forth approach is to see a program as a library with multiple entry points, which can be invoked during debugging. In gdb it's also possible to call functions. To debug a library of functions, simply add a dummy main() and instantiate the program. I.e. test.c:

  #include <stdio.h>
  void boo(void) { printf("boo!\n"); }
  int main(int argc, char **argv) { return 0; }

  tom@zzz:/tmp$ gcc -g test.c
  tom@zzz:/tmp$ gdb ./a.out
  GNU gdb 6.8-debian
  Copyright (C) 2008 Free Software Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later
  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law.
  Type "show copying" and "show warranty" for details.
  This GDB was configured as "i486-linux-gnu"...
  (gdb) break main
  Breakpoint 1 at 0x80483c6: file test.c, line 7.
  (gdb) r
  Starting program: /tmp/a.out
  Breakpoint 1, main () at test.c:7
  7         return 0;
  (gdb) p boo()
  boo!
  $1 = void
  (gdb)

Entry: pk2serial.c
Date: Sun Feb 15 15:30:05 CET 2009

It seems to work fine, but when I connect it to the 2550, after a while its TX goes low.. ??? Let's decouple this problem from getting the USB to work.. Use USB + PK2 + serial connection. It would be nice to have a fully functional pk2 interface, so I'm not going to give up yet. The synchronous + interrupt scheme mentioned yesterday might be an interesting route and should not be too hard to implement.

Entry: old usb driver
Date: Sun Feb 15 18:19:41 CET 2009

It works "a bit".. I got stuck in May 2008 when trying to get it to work on a tight schedule.. The problem then was probably a compiler bug that I hope is gone by now.. The files I have are:

  ~/darcs/brood-4/prj/USBPicStamp/usb.f
  ~/darcs/brood-4/prj/USBPicStamp/cdc.usb
  ~/darcs/brood-4/host/usb.ss

These are now in staapl/pic18. The architecture is as follows:

* Each usb driver starts from a .usb file containing a high level version of the configuration data structure that will be queried by the host. From this a "compiled" version of this data structure is generated.

* Next to this there is a driver library that implements the USB logic and state machine.

There is one big decision to be made up front though: use the extended instruction set or not? In this mode, there is a "frame pointer". The first 0x60 bytes of the first page are interpreted relative to FSR2. The thing is: these might be convenient for C-style local variables or objects, but I'm not sure if this deserves a whole new architecture / ABI. Is it possible to make this just a simple extension? Or can we do without? I did use it extensively before..

At first sight, this does introduce a new problem: namespaces. When objects are used, they behave as namespaces, and the language simply doesn't have any straightforward mechanisms for it.. So.. Let's just keep this as an experiment and solve it using dynamic variables. Together with cooperative multitasking and clear data separation of the preemptive parts (interrupts), dynamic binding should work just fine.

Entry: dynamic binding
Date: Sun Feb 15 19:46:01 CET 2009

Since Forth already has its local mechanisms for function composition, adding dynamic binding (in the form of shallow binding) to implement limited-extent objects might not be such a bad idea. The big advantage here is ease of implementation:

* It doesn't require separate namespaces (local vs. global variables).
* It can use deterministic stacks for storage (the retain/return stack).
* No indirect addressing is necessary (expensive on PIC).

There are plenty of disadvantages though, most of them boiling down to "no closures". But, for a real-time language without parameter names, this is a really good mechanism, since it is 100% deterministic for memory allocation.
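Shallow binding boils down to this kind of pattern (a made-up sketch, not from the codebase; 'speed' and 'run' are hypothetical):

  variable speed
  : with-speed \ run 'run' with speed dynamically bound to a new value
      speed @ >r   \ save the old binding on the return stack
      speed !      \ install the new one
      run          \ anything called from here sees the new value
      r> speed ! ; \ restore the old binding on exit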
There are plenty of disadvantages though, most of them boiling down to "no closures". But for a real-time language without parameter names this is a really good mechanism, since it is 100% deterministic in its memory allocation.

Entry: 18F extended mode
Date: Mon Feb 16 13:22:52 CET 2009

Note however that this IS an interesting target for writing a more traditional Algol-style language without having to resort to register allocation.

Entry: linux usbmon
Date: Tue Feb 17 10:08:52 CET 2009

To enable usbmon in the 2.6.x kernel, follow these steps:

1) Compile the USB_MON module (Device Drivers / USB Support / USB Monitor) or compile it into the kernel
2) Enable debugfs in the kernel, rebuild the kernel and reboot using the new kernel
3) mount -t debugfs none_debugs /sys/kernel/debug
4) modprobe usbmon
5) Find the Bus to which the USB device connects (cat /proc/bus/usb/devices and look for the 'T:' line, which gives the bus number)
6) cat /sys/kernel/debug/usbmon/<Bus>t > /tmp/<Bus>.txt (NOTE: it's <Bus>t, not just <Bus>)
7) Use the USB device
8) Kill the 'cat' command
9) Examine the file /tmp/<Bus>.txt

Read /usr/src/linux/Documentation/usb/usbmon.txt to decode the output :-)

Peter

Entry: Macros and such
Date: Fri Feb 20 15:29:19 CET 2009

Some important ideas that need to be cleared up:

- Lisp vs. other syntax, and quasiquotation. As opposed to what I used to think, Lisp syntax is only _convenient_ for quasiquotation-based metaprogramming, but not at all _necessary_. This is illustrated by MetaOcaml's syntax extensions. One just needs to extend the parser, basically.

- Pre-checks: giving semantics to the macro INPUT instead of "specification by compiler". As explained in [1], untyped macros can be made much more useful when they can be combined with static semantics. This is a very powerful idea: bottom-up _typed_ language design.

- MetaOcaml approach: limited to bind and eval, no other constructs? In dynamic vs. static: a big advantage of Lisp/Scheme style macros is that ALL language constructs can be expanded to. In MetaOcaml only the EVALUATION TIME of nested expressions can be manipulated, but the expressions themselves are really only let, lambda and apply.

[1] http://www.cc.gatech.edu/~dfisher/ziggurat/icfp-ziggurat.tar.gz

Entry: next
Date: Sat Feb 21 11:24:39 CET 2009

I need to get organized.. It's difficult at this time to pick a target, so here's a list. First, the practical things that need to happen:

* USB driver + driver generator.
* Better error reporting.
* A logic analyser.

Then what I would like:

* C parser + prettyprinter in typed scheme.
* Use it to refactor PacketForth code.
* Staapl core in typed scheme.
* Understand the Ziggurat paper.
* Static semantics for Staapl forth.
* Understand Tony Garnock-Jones' packrat parser.
* Definitive state-machine abstraction.
* A pure stack VM for implementing Occam primitives.
* A macro-extensible static subset of (typed) Scheme that maps to C.

Entry: error messages
Date: Sat Feb 21 11:37:57 CET 2009

This is quite straightforward: for each error that's obscure, go fix it!

- Allow EOF terminated identifiers. Hacked by adding \#newline

- reference to undefined identifier: macro/SETUP.wLengthHigh
  This should state the place of reference. The problem is that this is due to namespace bindings.. Compiling as a module should work better? Hmm.. This is a can of worms.. About time it's getting fixed.
  FIXED: problem was in symbol prefixing: source info got lost.
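The mechanism behind that class of bug, as a sketch (prefix-id is a hypothetical name; the real fix, in the next entry, goes through the project's ->syntax helper):

;; Prefixing an identifier: string->symbol produces a bare symbol, so
;; both lexical context and source location have to be re-attached
;; explicitly. Omitting the third datum->syntax argument is exactly
;; how source location gets lost.
(define (prefix-id str id-stx)
  (datum->syntax
   id-stx                                       ;; lexical context
   (string->symbol
    (string-append str (symbol->string (syntax-e id-stx))))
   id-stx))                                     ;; source location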
Entry: require
Date: Sat Feb 21 12:58:05 CET 2009

The problem is that I don't remember how it works.. Let's go over it again. A "#lang planet zwizwa/staapl/pic18" will associate the forth reader and expand the current file to a module form. Let's add some debug print to that. The problem seems to be that files in the pic18/ directory are not in the load path when we invoke the forth language as a module by requiring it.

First problem to fix: a "load" that occurs in a required .f module should resolve relative to the location of that .f module. I'm confused.. Let's try to get the minimal things to work first.

OK. I've added another form in parser-tx.ss / require-tx:

(syntax-case code (planet staapl)
  ((_ planet module . code+)
   (next `(require (planet ,(p #'module))) #'code+))
  ((_ staapl module . code+)
   (next `(require (planet ,(p #'module) ("zwizwa" "staapl.plt"))) #'code+))
  ((_ module . code+)
   (next `(require ,(p #'module)) #'code+)))

This way "require staapl <module>" will work. Let's see if this can be propagated all the way.. The idea is to have some application that loads as a module. This requires all "parameters" to be defined.. (This is really getting a bit messy..)

Ok. so there's an example that can be loaded like this:

mzscheme -p zwizwa/staapl/pic18/mod-example

or:

(require (planet zwizwa/staapl/pic18/mod-example))

(I used an .ss extension to be able to use shorthand syntax.) What does this give you? After also requiring the support code

(require (planet zwizwa/staapl/pic18))

this gives you access to the compiled code: (all-bin)

So.. This enables us to track down undefined symbols. However, compare these two:

tom@zzz:~/staapl/staapl/pic18$ mzc -k test.ss
mzc v4.1.0.3 [3m], Copyright (c) 2004-2008 PLT Scheme Inc.
"test.ss":
making "/data/safe/tom/darcs/brood-5/staapl/pic18/test.ss"
making "/home/tom/staapl/staapl/forth/module-reader.ss"
making "/home/tom/staapl/staapl/pic18/lang/reader.ss"
making "/home/tom/staapl/staapl/pic18/lang.ss"
compile: unbound identifier in module in: macro/device-descriptor

tom@zzz:/tmp$ cat broem.ss
#lang scheme/base
broem
tom@zzz:/tmp$ mzc -k broem.ss
mzc v4.1.0.3 [3m], Copyright (c) 2004-2008 PLT Scheme Inc.
"broem.ss":
making "/tmp/broem.ss"
broem.ss:2:0: compile: unbound identifier in module in: broem

 === context ===
/usr/local/plt-zwizwa/lib/plt/collects/compiler/cm.ss:117:0: compile-zo*
/usr/local/plt-zwizwa/lib/plt/collects/compiler/cm.ss:201:0: compile-zo
/usr/local/plt-zwizwa/lib/plt/collects/compiler/cm.ss:235:2: do-check
/usr/local/plt-zwizwa/lib/plt/collects/compiler/cm.ss:281:4 for-loop

The source location information seems to get lost somewhere.. I have no clue why.. Too many layers. The problem seems to be in load-tx, I guess in file->forth-syntax. Nope.. All is ok there.. Looks like I found it: in dispatch-element in rpn-tx.ss, after mapping the symbol there seems to be no source info. The mapper for purrr is defined in coma/macro-tx.ss. This seems to trace back to ns-prefixed in scat/ns-tx.ss, which traces back to prefix in tools/stx.ss. YES! That's it. Fixed by adding a 3rd argument to ->syntax:
(define (prefix . names)
  (let ((orig-stx (car (reverse names)))) ;; use original name info
    (->syntax
     orig-stx ;; lexical context
     (string->symbol
      (apply string-append
             (map (lambda (x) (format "~a" (->datum x)))
                  names)))
     orig-stx ;; source info
     )))

ain't this pretty:

-*- mode: compilation; default-directory: "~/staapl/staapl/pic18/" -*-
Compilation started at Sat Feb 21 16:01:34

cd ~/staapl/staapl/pic18 && mzc -k test.ss
mzc v4.1.0.3 [3m], Copyright (c) 2004-2008 PLT Scheme Inc.
"test.ss":
making "/data/safe/tom/darcs/brood-5/staapl/pic18/test.ss"
making "/home/tom/staapl/staapl/pic18/lang.ss"
usb.f:207:4: compile: unbound identifier in module in: macro/device-descriptor

 === context ===
/usr/local/plt-zwizwa/lib/plt/collects/compiler/cm.ss:117:0: compile-zo*
/usr/local/plt-zwizwa/lib/plt/collects/compiler/cm.ss:201:0: compile-zo
/usr/local/plt-zwizwa/lib/plt/collects/compiler/cm.ss:235:2: do-check
/usr/local/plt-zwizwa/lib/plt/collects/compiler/cm.ss:281:4 for-loop

Compilation exited abnormally with code 1 at Sat Feb 21 16:01:36

Now, is it possible to somehow make the dynamic scripts in app/ behave as if they were static modules? This means "load" should know the path even if it's required.. Now, if you add this it works:

#lang planet zwizwa/staapl/pic18
path /home/tom/staapl/staapl/pic18

The idea is: somewhere in the expansion of the module the path should be added. Ok.. simplified it to "path staapl pic18". Is it possible (or desirable) to do this automatically?

Next step: do the same in staaplc: instead of loading the file into a namespace, do a toplevel "require". Let's leave this for later. There's some restructuring and simplification necessary to make this the default application development approach.

Entry: usb
Date: Sat Feb 21 17:53:45 CET 2009

The core code for usb seems simple. It needs rewriting with a bit smarter use of "dynamic" variables and run-time constructs instead of being littered with macros. The most problematic part is the code generator, which translates high-level descriptors to something that can be embedded in the code.

Entry: core design documentation
Date: Sat Mar 7 12:03:35 CET 2009

Maybe it's time to start abstracting the core design of the SCAT machine: exactly what is a state, how to extend and lift states, how it can be made imperative, and how to relate it to monads. A good starting point would be: what can you do when you type this in drscheme and press "run":

#lang planet zwizwa/staapl/pic18
: abc 123 ;

One example is (print-code).

Entry: hacking
Date: Sun Mar 8 09:20:34 CET 2009

Like I said before, I don't know where to move next. However, I'd like to concentrate on some practical things.. I still would like to give the synchronous ICD2 comm on PK2 approach a try, since this would give a very straightforward tethered development workflow. I wonder, how does this relate to AVR programming? I think AVR uses (a variant of) SPI. Let's try to re-incorporate the pk2 code back into the project, then see if I can make a transmission work.

ah crap.. let's do something else.. I really need to focus on applications, and learn to live with the incomplete state.

Entry: drscheme
Date: Sun Mar 8 12:44:15 CET 2009

Running the macro expander on forth code gives an error: apparently a source location can be a symbol too, so the instantiation macro needs extra quotation. FIXED.

Entry: Object code: types or other static information?
Date: Sun Mar 8 13:24:19 CET 2009

What information is attached to assembly language opcodes at compile time?
At this moment, they are nothing but tags used to guide pattern matching. Maybe this should contain some more static info? This is something I started working on in asm/static.ss but apparently stopped. This is one of the most important points of change: assembler instructions need to be properly typed.

Entry: evaluation hardware
Date: Sat Mar 21 09:50:36 CET 2009

- something with TI dsp, i.e. the davinci/C6000
- xilinx virtex, 500k gates

Entry: reading C code
Date: Sat Mar 21 09:51:32 CET 2009

Not strictly part of Staapl, but I'd like to comment on it here. Dave Herman recently released version 2.0 of his C parsing library, and suggested using his prettyprinter tools for generating C code. Goal: create a struct reference -> get/set method refactoring tool and let it loose on the PF source code.

http://planet.plt-scheme.org/display.ss?package=c.plt&owner=dherman
http://planet.plt-scheme.org/display.ss?package=pprint.plt&owner=dherman

Entry: FPGA
Date: Sat Mar 21 10:09:48 CET 2009

I'd like to start doing some VHDL experiments. What hardware to use? There are 2 big players: Xilinx and Altera. Both have limited free tools that work on linux. It's a bit arbitrary, but I'm going to stick with Xilinx. Circumstantial bias mostly points in that direction... There are mainly 2 classes of devices:

- high density: Virtex
- low density: Spartan

Dev boards: Spartan 3A - 400k gates. 49,-
http://www.xilinx.com/products/devkits/aes_sp3a_eval400_avnet.htm

Entry: DSP target
Date: Sat Mar 21 11:00:53 CET 2009

C64x+ fixed point DSP as a Staapl target. The most likely candidate is the Davinci chip, and the Neuros open settop box.

http://store.neurostechnology.com/neuros-osd2-platform-p-55.html

Texas Instruments DaVinci DM6446 SoC = ARM9 + C64x+
http://focus.ti.com/docs/prod/folders/print/tms320dm6446.html

For TI C67x+ floating point DSPs, start here:
http://focus.ti.com/paramsearch/docs/parametricsearch.tsp?family=dsp&sectionId=2&tabId=1948&familyId=1401
TI Home > Digital Signal Processing > Processor Platforms > C6000™ Floating-point DSPs

Entry: Forth and data structures
Date: Sun Mar 22 21:51:26 CET 2009

There is a clear link between lazy function application and data structure pattern matching in a functional programming language. The body of a pattern matching clause behaves as the body of a function: the data structure "transports" the parameters of data structure construction to the data structure deconstruction binding environment, much like the invocation of a function does. In Forth this isn't so clear because there are no lexical binding forms. To get a similar behaviour, a data structure would have to be "executable" and load a number of values on the parameter stack; however, this is impractical.

I have the impression that Forth is better at "process" oriented programming than at data structure manipulation: data structures are about postponing interpretation, while embedded processing is mostly about immediate action/reaction: there is more code than data. This is exactly the reason why I moved to Scheme to implement the compiler (compile time data structure interpretation). Can this be made into some kind of mantra?

    process <-> structure
    RUN         COMPILE

Or put differently: when writing highly specialized low-level code, try to avoid data structures, or use data structures at compile time only. Think of what would be the result of deforestation in an FP language, and write it directly, or pass the data structures between macros. Deforestation is essentially jit-compilation: eliminate intermediate (code) representation.
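A toy example of "data structures at compile time only" (hypothetical macro, plain PLT Scheme):

;; The coefficient list exists only during macro expansion; the
;; residual code is straight-line arithmetic with no run-time list
;; left in it -- deforestation by construction.
(define-syntax (dot stx)
  (syntax-case stx ()
    ((_ (c ...) (x ...))
     #'(+ (* c x) ...))))

;; (dot (1 2 3) (a b c)) expands to (+ (* 1 a) (* 2 b) (* 3 c))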
Entry: openfirmware
Date: Sun Mar 22 23:30:56 CET 2009

svn co svn://openfirmware.info/openfirmware
cd openfirmware/cpu/x86/pc/emu/build
make
cp emuofw.rom /usr/share/qemu/
qemu -bios emuofw.rom -hda fat:/tmp

Entry: eForth primitives
Date: Fri Mar 27 20:22:33 CET 2009

I'm implementing a machine to run eForth. Almost done with the primitives, except execution model and dictionary format. I had a very nice introduction to eForth as a pdf somewhere.. This looks OK: http://www.offete.com/files/zeneForth.htm

First: what is the point? I'd like to find a way to implement a stand-alone ANS Forth on top of Staapl primitives. But what I really need is a Forth that will run partly on the host and partly on the target. I'm trying to first implement it as is, and solve the bootstrapping problem. Probably translation to a standard binary image that can by itself be translated to a binary image of a real machine.

OK. Got primitive execution working.. The thing which always confused me is that on an emulated machine, I have this tendency to use 3 levels: threaded code, primitives represented as numeric opcodes, and real primitives (here Scheme thunks). But this isn't necessary. Forth primitives that end in a jump to NEXT can be represented by sequential scheme code ending in a NEXT tailcall.

Next: bootstrap threaded code. With a threaded interpreter, primitives can't be executed directly but need to be wrapped in highlevel words that contain the primitive instructions.

Entry: Threaded code + control primitives working
Date: Sat Mar 28 12:01:04 CET 2009

The only remaining parts are bootstrapping the rest of the Forth code and/or defining the structure of the dictionary. I'm thinking about bootstrapping the code by hand (writing an interpreter in scheme) instead of using an external interpreter. This requires breaking two feedback loops:

- parsing words
- macros

Once the forth is bootstrapped it can be used to generate images for other interpreters by changing the semantics. How to represent the dictionary? The bootstrap parser should work directly on the binary form of the dictionary.

It looks like the best approach is this: massage the Forth source file such that it is simple enough to be parsed by a non-reflective parser, but remains ANS compliant. Removed all CODE words from eforth.f. The remaining forth->machine code reference problem is CALL (and =CALL).

How to bootstrap:

* First make sure everything parses to lists of words.
* Then it's probably best to use the macros from the .f file to do the rest of the compiling. This requires them to be identified right after parsing.
* Build a dependency graph of the code and manually resolve all circular conflicts by replacing words with primitives.

Entry: RTL eforth (RISC portability through macros)
Date: Sat Mar 28 13:03:08 CET 2009

There are a lot of operations that are reused between primitives:

* stack push/pop
* register/memory store/fetch

It might be interesting to write eForth completely in terms of these primitives instead of the forth ones, or devise forth primitive MACROS for these. The main idea is that not WORDS should be the basis, but MACROS that generate primitive operations: write in terms of a generic register machine. With just memory access, optimizations could be made: register instead of ram, indirection, etc.. So the main idea is: how to _automatically_ map a RISC machine (memory + MIMO logic blocks) to a Forth VM.
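A sketch of what such register-machine macros might look like on the host (plain PLT Scheme; ram, SP and the prim-* names are all hypothetical):

;; A tiny register/memory model. Primitives are written against the
;; generic macros, not against other Forth words.
(define ram (make-vector 256 0))
(define SP 128)                        ;; data stack grows down

(define-syntax-rule (mem@ a)   (vector-ref ram a))
(define-syntax-rule (mem! a v) (vector-set! ram a v))
(define-syntax-rule (push! e)
  (let ((v e))                         ;; evaluate before moving SP
    (set! SP (- SP 1))
    (mem! SP v)))
(define-syntax-rule (pop!)
  (let ((v (mem@ SP))) (set! SP (+ SP 1)) v))

;; eForth primitives in terms of the register machine:
(define (prim-dup)  (push! (mem@ SP)))
(define (prim-plus) (push! (+ (pop!) (pop!))))

Mapping to a real RISC would then mean giving mem@/mem!/push!/pop! machine-specific expansions (register instead of ram, addressing modes, ..).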
EDIT: the problem with this is that all code needs to be implemented with registers, without using the "locals" construct that is so convenient. It requires knowledge of addressing modes and operator arity.

Entry: Loops and data structures
Date: Sun Mar 29 12:27:38 CEST 2009

A simplified view on functional programming: FP is about passing data structures around until they are ready to be reduced into pure behaviour. Ideally, you'd want to program subparts in terms of processors of intermediate data structures, and have the compiler use those data structures as a skeleton to produce code sequencing with the data structures eliminated. I can't express it yet -- the idea isn't formed completely, but it's about this: data structures don't make sense without code processing them, relating them to some kind of semantics derived from the real, physical world. In other words, they need to be _compiled_ to some form of physical machine code. Now, the simplified view:

    Code = serialized traversal/update of data structures.

What is the link between this and deforestation, the elimination of intermediate data structures? The classic example is unix pipes: instead of reading/writing from/to intermediate data files, processes can be connected using pipes. It looks like some of Oleg Kiselyov's ideas are starting to make sense to me: next I'd like to understand the relation between staging, deforestation and delimited control.

http://okmij.org/ftp/Computation/Continuations.html#context-OS
http://okmij.org/ftp/papers/context-OS.pdf
http://www.cs.rutgers.edu/~ccshan/zipper/poster.ps

Entry: The purpose of Coma
Date: Sun Mar 29 13:21:07 CEST 2009

Rehash of the ideas behind the to-be Coma layers:

- conditional + tail recursion for control flow
- first order functions (no lambda)
- higher order macros and structured parameterization.

I would like to take the approach of "mandatory deforestation": write a language+metalanguage with compile-time only data structures, and allow compilation of an expression _only_ when complete deforestation (elimination of _all_ data structures at compile time) is possible. This leads to the following slogan:

    Staapl eliminates lists and lambdas.

The target machine should then be simple enough that it does not need garbage collection (i.e. make it linear) and even further: not need stacks or linear trees either. If I'm correct this is somewhat related to the Hume project - a hierarchy of machines:

1. finite state machines
2. FSM + stacks (push down automata / linear structures)
3. cons cells + lambda (needs garbage collection to implement)

Lambda is really the code equivalent of a cons cell: lambda conses static code templates with an environment structure.

Entry: FL
Date: Sun Mar 29 16:26:51 CEST 2009

If I recall correctly, this is very much related to what John Nowak is working on (inspired by Backus' FP). I find it difficult to follow the posts on the concatenative [stack] list. Maybe this paper helps: FL is a kind of meta-FP.

http://www.cs.berkeley.edu/~aiken/ftp/FL.ps

Entry: Forth vs. C
Date: Sun Mar 29 16:29:02 CEST 2009

I'm trying to see why a single (indexable) stack for both continuations AND environment is so different from having the two separated. Also compare Forth's 2 stacks with the E and K stacks of the CEK machine.

http://www.cs.utah.edu/~mflatt/past-courses/cs6520/public_html/s00/secd.ps
http://www.cs.utah.edu/plt/publications/pllc.pdf 6.4 p73
http://lambda-the-ultimate.org/node/2423

Maybe I should implement a CEK machine and see how it behaves compared to Forth? I.e.
identify where data sharing is introduced. Maybe the key is in 'let': it introduces an element in the environment E, but leaves the continuation intact. Which operation leaves the environment intact but changes the continuation?

Entry: Shifting the Stage: Staging with Delimited Control
Date: Mon Mar 30 12:52:39 CEST 2009

This looks like an interesting next step:
http://okmij.org/ftp/Computation/Generative.html#circle-shift

In my own uneducated view, there seems to be a relation between staging and delimited control. The goal of staging is to re-order evaluation so some evaluations can be performed before final code is generated. Intermediate data structures (intermediate computations) are eliminated.

Now.. A zipper produces some (coroutine style) execution sequencing. It mainly turns things inside out, directed by the shape of a data structure: data is passed between processes in a way directed by the data structure, but if the eventual result is not necessary, re-constructing the structure can be avoided. This is essentially deforestation (elimination of intermediate structure, where intermediate computation is a form of deconstructed intermediate structure).

Now, that's a very vague notion. Let's try to make it more precise. Maybe it's the "left fold with abort" that's been mentioned in Oleg's writings. Let's just write a zipper based on a left (non-recursive) fold. This leads to:

(define-struct zipper (element state yield))

(define (collection/traverse->zipper collection init-state
                                     [fold traverse/fold])
  (reset
   (fold (lambda (el state)
           (shift k (make-zipper el state k)))
         init-state
         collection)))

(define (traverse/fold fn init-state lst)
  (let next ((s init-state) (l lst))
    (if (null? l)
        s
        (let ((s+ (or (fn (car l) s) s)))
          (next s+ (cdr l))))))

The contents of the struct:

- element: current focus
- state: state of the output (data structure)
- yield: state of the input (continuation)

Now, does it make sense to put the input and output state on the same ground? In the zipper both input and output state are contained in the continuation. Is it possible to write a zipper which has only an abstract representation of output state (compile-time only) but never produces any real data: all we're interested in is how code is sequenced in the end. That's another vague notion.. This probably relates to the trouble of having to write code in CPS when staging is involved.

Entry: normal numbers / autodiff -> memoized expressions
Date: Mon Mar 30 13:16:47 CEST 2009

Combining data structures reminds me of normal numbers and automatic differentiation: instead of producing a giant expression, autodiff produces a memoized version that has lower complexity than a fully substituted single expression. Differentiation doesn't really change complexity, but it is a "convolution" operation that increases only memory use. Maybe this can be generalized to the composition of any signal/image-processing operations: instead of working with intermediate representations, use deforestation to construct memoized expression iterators.

Entry: memoization
Date: Mon Mar 30 14:32:45 CEST 2009

http://okmij.org/ftp/Computation/staging/circle-shift.pdf contains an example of how to memoize a recursive function using a modified Y-combinator which leaves the (parameterized) function body unmodified. The paper is about memoization (let-insertion) and ... (if-insertion) in a safe way (preventing side-effects from creating unsoundness) using restricted use of delimited control.
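The plain, unstaged version of that memoizing combinator is easy to write down. A sketch (hypothetical names; the paper's version works on code, not values):

;; A fixpoint combinator that wraps memoization around an unmodified,
;; parameterized function body.
(define (memo-fix f)
  (let ((table (make-hash)))
    (define (self x)
      (hash-ref table x
                (lambda ()
                  (let ((v ((f self) x)))
                    (hash-set! table x v)
                    v))))
    self))

;; The body below doesn't know it is being memoized:
(define fib
  (memo-fix (lambda (self)
              (lambda (n)
                (if (< n 2) n (+ (self (- n 1)) (self (- n 2))))))))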
Entry: names in concatenative language
Date: Mon Mar 30 14:40:52 CEST 2009

What I still don't fully get is how to treat names in a concatenative language in a localized way. It has been quite clear to me for a while that names should _only_ reflect functions, not values. It's possible to construct complete static reasoning based on functions alone, with values (i.e. numbers and stacks) being restricted to run time only. But then.. What is a number? In lambda calculus it can be modeled by a function..

This remark is more about binding forms. I.e. with the ``|'' word used in macros, names get bound to values (actually constant functions), though there was doubt for a while whether to support a different binding form that did not perform this form of automatic abstraction.

Entry: Staapl vs. MetaOcaml
Date: Mon Mar 30 15:10:03 CEST 2009

Staapl is simpler due to the absence of binding problems in the generated language and the absence of static type guarantees. But can some of the static typing used in MetaOcaml be used to make some operations in Staapl better defined?

Maybe it's time to start splitting the project in two? One part moves along with Forth, stacks and macros and evolves towards some kind of type system or better characterization of the compile time computations, while the other uses MetaOcaml to target nested C expressions so lambda calculus can be used and low-level machine mapping is left to the C compiler. I'm really not so interested in register allocation and machine-specific data and control flow hacks. Ok for simple processors and Forth, but for RISC it's already been solved many times.

So where do I move from here? Probably typed scheme. The MetaOcaml part will be split off as http://zwizwa.be/darcs/ip

Entry: let and lambda
Date: Mon Mar 30 15:22:58 CEST 2009

In another post I talked about using traversal macros in C instead of traversal functions that take context objects, since macros allow the enclosing lexical environment to be used. The real point is the difference between ``let'' and ``lambda'': the latter ``forks'' the stack while the former does not. This leads to the following question: is it possible to replace lambda with fork completely? Single-stack tasks instead of higher order functions?

Entry: static Staapl : core problem?
Date: Mon Mar 30 16:20:59 CEST 2009

The compile time data types are already typed, but the data type processors are not: there might still be code transformer definitions that don't make sense. Because this is rather entangled, it might be interesting to just make the cut (turn assembly opcodes into structs instead of lists) and see where it starts to bleed. From staapl/pic18/asm.ss it looks like the best place to start is the instruction-set macro. Next: set up some regression tests.

Entry: bugfixes + regression tests
Date: Tue Mar 31 10:33:17 CEST 2009

Before changing anything in the compiler, testing needs to be made a bit more reliable. The simplest thing to do is to record the output of compiling all programs in app/ and require these to stay the same.
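Roughly, per program (a sketch; compile-app and the golden files are hypothetical, with-output-to-string and port->string come from scheme/port):

;; Compare a program's compiler output against a recorded "golden"
;; file; any difference is a regression (or an intentional change
;; that needs re-recording).
(define (golden-test src golden compile-app)
  (let ((out (with-output-to-string (lambda () (compile-app src)))))
    (unless (equal? out (call-with-input-file golden port->string))
      (error 'golden-test "~a: output differs from ~a" src golden))))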
bug: "load synth/synth.f" doesn't work: staaplc -d /dev/ttyUSB0 synth-icd-p18f1220.f current-load-relative-directory: not a complete path: "synth/" === context === /home/tom/staapl/staapl/forth/parser-tx.ss:287:0: load-tx /home/tom/staapl/staapl/forth/parser-tx.ss:287:0: load-tx /home/tom/staapl/staapl/forth/parser-tx.ss:194:5 /home/tom/staapl/staapl/purrr/forth-begin-tx.ss:87:2: code->toplevel-form /home/tom/staapl/staapl/purrr/forth-begin-tx.ss:46:0: forth-begin-tx make: *** [synth-icd-p18f1220.hex] Error 1 OK. got toplevel "make test" now which builds everything from scratch. Entry: instruction-set Date: Tue Mar 31 11:13:36 CEST 2009 The assembler contains two things: - static description of I/O - a 2-way function (equation) relating bit vector to parsed bit vector Let's look at typed assembler languages and see what the basic ideas are. Some form of annotation is required.. box> (instruction-set-tx #f #f #'((addwf (f d a) "0010 01da ffff ffff"))) (begin (begin (#f 'addwf (make-asm (lambda (f d a) (list (ignore-overflow f 8 (ignore-overflow a 1 (ignore-overflow d 1 9))))) '(addwf (f d a) ((9 . 6) (d . 1) (a . 1) (f . 8))))) (#f 9 6 (lambda (opcode) (match (cadr (chain `(,opcode ()) (dasm-step 'f 8) (dasm-step 'a 1) (dasm-step 'd 1))) ((d a f) (list 'addwf f d a))))))) This produces both an assembler and a disassembler. For static analysis however, these functions are better turned into interpretation steps mapping between binary and struct:asm Probably best to start from scratch since the current code is a bit entangled. So, basic idea: the assembler description is where machine and compiler meet. This should eventually be linked to a machine simulator for verification. ;; Eventually the assembler should include a complete model of the ;; machine's execution engine, which combined with a memory model ;; (including memory mapped io devices) can fully simulate code. ;; The primary relation we're interested in now is the equivalence ;; relation BI <-> SI for asm/dasm. ;; BI = binary instruction ;; SI = symbolic instruction ;; I = instruction (modulo equivalence, represented by SI) ;; asm : SI -> BI ;; dasm : BI -> SI ;; sim : SI -> machine -> machine To get to know typed scheme I'm trying to express the representation of the assemblers as ts types. (define-struct: Bitfield ([value : Number] [width : Number])) (define-struct: SI ([asm : Assembler] [dasm : Disassembler])) (define-type-alias Disassembler (Bitfield -> (Listof Bitfield))) (define-type-alias Assembler ((Listof Bitfield) -> Bitfield)) Now, what should addwf be? It needs to be an instance of a type since it will be used as part of pattern based tree-transformation. (define-struct: addwf ...) But each instance of such a struct will be linked to an assembler and disassembler function that map between this struct and Bitfields. So "addwf" is not just a type. (define-struct: addwf ()) (: addwf-asm (addwf -> Bitfield)) (: addwf-dasm (Bitfield -> addwf)) Note: Writing in a typed language it strikes me that in an dynamically typed language you get polymorphy for free: types are predicate functions. (define-struct: addwf ([f : (-> Bitfield)] [d : (-> Bitfield)] [a : (-> Bitfield)])) (: addwf-asm (addwf -> Bitfield)) (: addwf-dasm (Bitfield -> addwf)) The types of addwf's arguments are thunks. The important thing to note here is that they can't be numbers, since they might depend on addresses. This cyclic dependency is resolved later in the relaxation algorithm, where the thunks are re-evaluated. 
So, it might be best to include this in the typing too: a bitfield contains something that depends on a dictionary.

Next: parameterize opcode (for classes of opcodes with the same operands). This is where the simulator needs to be plugged in too. I.e. for (addwf f d a) we need:

- a class (fda f d a)
- behaviour in 2 stages: assembly and simulation

Simulation is necessary for partial evaluation (which then needs 2 separate semantics: bit-true and extended types).

Note: this is starting to make sense.. Requiring types for all this requires me to think harder about dependencies. Looks like figuring out this type system is the next stage for Staapl. I really needed to do this once to know what to write down now..

Next:

* make the current implementation of addwf correct
* add classes + 2-stage behaviour

Entry: defunctionalization
Date: Wed Apr 1 10:34:03 CEST 2009

http://www.kennknowles.com/blog/2008/05/24/what-is-defunctionalization/

Defunctionalization is transforming a program to eliminate higher-order functions. I ran into this before in the Gambit-C presentation by Marc Feeley:
http://www.iro.umontreal.ca/~boucherd/mslug/meetings/20041020/minutes-en.html

Entry: fine grained multitasking
Date: Wed Apr 1 21:06:32 CEST 2009

Here entry://20090322-215126 I wrote something about Forth being a ``control language'': something which is mostly sequencing of instructions and control/branching instead of numbercrunching or data structure processing. Is there a name for this class of programs?

Entry: ideas and confusion
Date: Fri Apr 3 12:54:07 CEST 2009

Things I'm working on that don't make complete sense yet:

TYPES:
- staged assembly language (machine semantics: compile / eval). eval = relative to machine, compile = relative to branch label allocation.
- fixing the macro specification "patterns" language to use typed scheme.

PEVAL:
- "total deforestation": use of intermediate high-level data structures for program specification and simulation + composition that allows total elimination of this intermediate data representation.
- iterators -> tasks using delimited continuations (actor model)

Entry: machine
Date: Fri Apr 3 13:02:23 CEST 2009

Before going anywhere, it needs to be defined what a "machine" is. It is important that the description of the machine (data structures and interpreter) is written in such a way that it can be easily specialized/compiled (to C code).

One important question: why use bottom-up operational semantics? Why not define an intermediate semantics (i.e. Forth)? The answer is: this is about (specializing) compiler correctness. The high level semantics themselves are trivial and could be used to verify the bottom-up semantics, but this requires complete simulation of the true (physical) semantics.

A machine is:

* a collection of registers / memory arrays
* MIMO functions

Note that the registers are _not_ single-assignment like in [LLVM], since we're trying to model exact machine semantics for simulation, and are not trying to map a generic language to a register machine. Should I take a hint from this? Should the machine model be completely functional so code can be transformed into data flow networks? Or can I write a model using assignments only, to avoid overspecifying operations (declaring outputs), and then use dataflow analysis to transform it to a different representation? I probably need to take a better look at Chapter 8 in [ACDI]. A lot of intuition is still missing..
The bottom line however: the machine description needs both:

- assembly (compilation)
- simulation (evaluation / compile time computation / type system).

The description of the semantics should be compilable as fast and standalone code (read: C). I can do this (practically) only for simple machines, or subsets of larger machines.

[ACDI] http://books.google.be/books?id=Pq7pHwG1_OkC
[LLVM] http://llvm.org/

Entry: staapl/comp
Date: Fri Apr 3 20:53:12 CEST 2009

;; Code to build structured assembly code graph from forth code. This
;; uses an extension to scat's 2-stack model to represent
;; concatenative macros with a Forth-style control stack.

I'm wondering if it's possible to completely eliminate this part. Implementing coma control on top of comp's control is maybe not the right approach. The code in staapl/comp needs to be documented properly, or simplified in a bottom up approach. It's ok to have fallthrough and multiple entry and exit points, but the way it is implemented is not quite clear.

Entry: transposed notation
Date: Sat Apr 4 09:50:59 CEST 2009

(define-syntax-rule (update ((formal next) ...))
  (lambda (formal ...)
    (values next ...)))

What about using this macro to write down machine operations? One advantage is that it makes the implementation faster because it doesn't need dynamic memory for the arguments/return values. (Can I test this assumption?) Let's stick to syntactic abstraction first and leave implementation for later.

Entry: machine specification
Date: Sat Apr 4 11:05:53 CEST 2009

* The machine's namespace should be partly accessible. I.e. if the machine has registers A B C, it should be possible to define an update using only a subset of the registers, with the other registers left untouched.

* Implementation of sequencing should be completely abstract (on the macro level). This way it can later be changed from struct -> struct to values -> values to list -> list or whatever suits the needs.

So, what is necessary?

NAMESPACE: some way of specifying the machine's namespace. This could be implemented as a struct, list, vector, threaded values... This needs compile-time identifiers.

UPDATE: parallel assignment defined for the abstract namespace.

Roadmap: first transform the current machines (i.e. in comp/instantiate.ss and forth/parser-tx.ss) to a transposed update form, then replace that with a fixed namespace. To do this, perform automatic transformation of the state-lambda form in staapl/scat/stack.ss:

1. trivial syntax-rules -> syntax-case
2. pretty print old form

Complications:

* 'update is used explicitly in the state-lambda syntax, probably to allow for more flexible exit points.
* pattern matching is used on the registers, so the name-based matching cannot be used, making the two mechanisms (machine namespace / positional binding + deconstruction) incompatible in the current form.

Example:

(define merge-store
  (state-lambda compiler
                ('()  ;; empty asm
                 (list-rest (list asm rs (struct dict (current chain store))) ctrl)
                 (struct dict (#f '() store+))
                 '()) ;; empty rs
                (update asm
                        ctrl
                        (make-dict current chain (append store+ store))
                        rs)))

What about this form, annotated with some redundant visual marker syntax:

;; (field pattern expr)
((asm  : '() -> asm)
 (ctrl : (list-rest (list asm rs (struct dict (current chain store))) ctrl) -> ctrl)
 (dict : (struct dict (#f '() store+)) -> (make-dict current chain (append store+ store)))
 (rs   : '() -> rs))

The advantage here is that it is immediately clear which registers are modified. A clause (r :) would mean (r : r -> r). Now..
instead of using fn -> values, isn't it better to use tail recursion (CPS), and use delimited control to terminate a sequence of operations?

Entry: values vs CPS
Date: Sat Apr 4 13:37:57 CEST 2009

For "functional update" of a state machine, it is simplest to use Scheme's built-in tail recursion. The idea is that heap data is not allocated for building argument lists - instead this is implemented using a stack with reclaimable memory. For mzscheme the place to look seems to be scheme_values() defined in src/fun.c:

if (p->values_buffer && (p->values_buffer_size >= argc))

then values get copied to the current thread's values buffer, otherwise the "slow" procedure is used, which allocates a new values buffer. Then there's stuff in the JIT. Apparently the number of function arguments is limited to 3. Anyways: implementation is for later.

The important question is: do we model individual transition functions (machine state update using external sequencing) or do we include the machine's continuation model in the transition model? I.e.

(define (instr R1 R2)   (values (+ 1 R1) R2))

vs.

(define (instr R1 R2 k) (k (+ 1 R1) R2))

It's probably best to use the latter since it is more general.

Entry: composition and compile time information
Date: Sat Apr 4 14:07:15 CEST 2009

Composing machine operations can be simplified by including compile time information about the registers modified. This allows simpler lifting of machines (i.e. single stack -> dual stack).

Entry: ARM vs. MIPS
Date: Tue Apr 7 08:57:20 CEST 2009

The two 32 bit architectures for which it makes sense to try to write native support. As a general ``getting acquainted'' move, how do these two architectures compare?

http://www.embeddedrelated.com/usenet/embedded/show/74090-1.php

Entry: machines
Date: Wed Apr 8 08:21:23 CEST 2009

The interface consists of:

1. register namespace
2. register content representation (for registers containing stacks)

The first part is necessary to decouple the cardinality and order of the registers in the representation from the specification of functionality. For the second I'm not sure whether data structures should be limited to stacks only, in which case the pattern matching becomes simpler. In all use cases in staapl (*) these are the only two.

(*) registers only: real machine simulation (w. stacks implemented in memory)
    stacks:         all scat based machines + forth parser machine

So, the macros are factored in the following way:

- convert to normal form: this introduces and/or completes clauses
- convert to binding form

For the latter there should be more than one: simple lambda forms and match expressions. So, I have a basic lambda form now:

(machine-lambda #'values            ;; continuation
                '(X Y)              ;; identifier namespace + order
                #'((X -> (car X)))) ;; update function
=>
(lambda (X Y) (values (car X) Y))

Entry: imperative module system
Date: Wed Apr 8 18:55:58 CEST 2009

Explain why the current "imperative" module system is not a good idea and how it should be changed to implement something better. Relate this to Ocaml's functors.

The problem I'm trying to solve is the following conflict: one wants the freedom to change _all_ elementary code generating functions, but one does not want the burden of declaring them as replaceable. This is about _defaults_ and about declaring how we deviate from the defaults. The problem is: with unconstrained mutation of behaviour, locality is lost. We really do want to change global behaviour in some cases (i.e. a different machine), but we'd like some semantics to stay invariant. What is the proper way to change this?
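One possible shape, sketched with hypothetical names: let each generator instance take its overridable parts as keyword arguments with defaults, so a deviation is declared locally at instantiation time instead of by global mutation (a poor man's functor):

(define (default-dup asm)  (cons '(dup) asm))
(define (default-drop asm) (cons '(drop) asm))

;; Instantiate a code generator; only what deviates from the default
;; is mentioned, and the change is local to this instance.
(define (make-codegen #:dup  (dup  default-dup)
                      #:drop (drop default-drop))
  (lambda (word asm)
    (case word
      ((dup)  (dup asm))
      ((drop) (drop asm))
      (else   (cons (list word) asm)))))

;; (make-codegen #:dup pic18-dup) overrides one generator and keeps
;; the rest invariant.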
Entry: assembler incremental changes
Date: Wed Apr 8 21:04:58 CEST 2009

With the machine/vm.ss macros working it should be straightforward to start building the assembler/simulator structure and gradually translate the macro language to a typed one.

EDIT: simply start adding _some_ form of static information to assembler instructions and build other machinery on top of this: start at the asm-register! and dasm-register! macros.

Entry: stacks
Date: Thu Apr 9 08:13:58 CEST 2009

The big problem however is the static implementation of stacks. This requires a model for memory, but I'd like to keep memory and machine separate.

Entry: asm hygiene problems
Date: Thu Apr 9 09:28:40 CEST 2009

Looks like there are a couple of problems with the current assembler implementation: when introducing symbols, some lexical information apparently gets lost. Time to clean up the lot.

Hygiene problems seem to be solved, but now the app/ tests have changed code. I'm going to revert the assembler sources to isolate the change that breaks the tests. See the offending change below. I'm going to introduce just lexical correctness, and see where it goes.

hunk ./staapl/asm/asmgen-tx.ss 28
+  "../tools/stx.ss"
hunk ./staapl/asm/asmgen-tx.ss 39
+(check-set-mode! 'report-failed)
hunk ./staapl/asm/asmgen-tx.ss 55
-;; (bitstring->list "0101 kkkk ffff ffff")
+;; Parse bitstring.
hunk ./staapl/asm/asmgen-tx.ss 61
+(check (bitstring->list "0101 kkkk ffff ffff")
+       => '(0 1 0 1 k k k k f f f f f f f f))
hunk ./staapl/asm/asmgen-tx.ss 65
-;; (bin->number '(1 1 0 0))
hunk ./staapl/asm/asmgen-tx.ss 68
+(check (bin->number '(1 1 0 0)) => 12)
hunk ./staapl/asm/asmgen-tx.ss 72
-;; (combine-bits '((k . 1) (k . 1) (k . 1)))
hunk ./staapl/asm/asmgen-tx.ss 79
-
+(check (combine-bits '((k . 1) (k . 1) (k . 1)
+                       (l . 1) (l . 1))) => '((k . 3) (l . 2)))
hunk ./staapl/asm/asmgen-tx.ss 93
+(check (split-opcode '(1 0 1 0 k k k k l l)) => '((10 . 4) (k . 4) (l . 2)))
hunk ./staapl/asm/asmgen-tx.ss 97
-(define (parse-opcode-proto str)
-  (split-opcode
-   (bitstring->list str)))
+(define (parse-opcode-proto str-stx)
+  ;; After parsing the string, the lexical information needs to be restored.
+  (define restore-lexical-info
+    (match-lambda
+     ((name . bits)
+      (cons
+       (if (symbol? name)
+           (datum->syntax str-stx name)
+           name)
+       bits))))
+  (map restore-lexical-info
+       (split-opcode
+        (bitstring->list
+         (syntax->datum str-stx)))))
hunk ./staapl/asm/asmgen-tx.ss 118
-(define (binary->proto row)
-  (match row
-    ((name proto . binary)
-     (append (list name proto)
-             (map parse-opcode-proto binary)))))
+(define (binary->proto row-stx)
+  (syntax-case row-stx ()
+    ((name proto . binary)
+     (append (list #'name ;; preserve name's lexical info
+                   #'proto)
+             (map parse-opcode-proto (syntax->list #'binary))))))
hunk ./staapl/asm/asmgen-tx.ss 126
-(check (binary->proto '(xorwf (f d a) "0001 10da ffff ffff"))
-       => '(xorwf (f d a) ((6 . 6) (d . 1) (a . 1) (f . 8))))
+(check (->sexp (binary->proto '(xorwf (f d a) "0001 10da ffff ffff")))
+       => `(xorwf (f d a) ((6 . 6) (d . 1) (a . 1) (f . 8))))
hunk ./staapl/asm/asmgen-tx.ss 159
+      (->sexp
hunk ./staapl/asm/asmgen-tx.ss 166
+       )
hunk ./staapl/asm/asmgen-tx.ss 198
-(define (instruction-set-tx asm! dasm! instructions)
-  (let ((protos
-         (map
-          binary->proto
-          (syntax->datum instructions))))
+(define (instruction-set-tx define-assembler dasm! instructions)
+  (let ((protos (map binary->proto (syntax->list instructions))))
hunk ./staapl/asm/asmgen-tx.ss 207
-            (#,asm!
-             '#,name
+            (#,define-assembler
+             #,name
hunk ./staapl/asm/asmgen.ss 36
-    ((_ asm! dasm! instructions ...)
-     (instruction-set-tx #'asm!
+    ((_ define-assembler dasm! instructions ...)
+     (instruction-set-tx #'define-assembler
hunk ./staapl/asm/asmgen.ss 42
-  (iset asm-register!
+  (iset define-assembler
hunk ./staapl/asm/asmgen.ss 50
-(let ((asm #f)
-      (dasm #f))
-  (let ((asm! (lambda (name fn) (set! asm fn)))
-        (dasm! (lambda (opc bits fn) (set! dasm fn))))
-    (iset asm! dasm!
-          (testopc (a b R) "1010 RRRR aaaa bbbb"))
-    (parameterize
-        ((current-pointers #hasheq((code . (-1)))))
-      (check (asm 4 2 -1) => '(#xAF42))
-      (check (dasm #xAF42) => '(testopc (a . 4) (b . 2) (R . -1)))
-      (void))))
+'(let* ((testopc #f)
+        (dasm #f)
+        (dasm! (lambda (opc bits fn) (set! dasm fn))))
+   (iset set! ;; define-assembler
+         dasm!
+         (testopc (a b R) "1010 RRRR aaaa bbbb"))
+   (parameterize
+       ((current-pointers #hasheq((code . (-1)))))
+     (check (testopc 4 2 -1) => '(#xAF42))
+     (check (dasm #xAF42) => '(testopc (a . 4) (b . 2) (R . -1)))
+     (void)))
hunk ./staapl/asm/dictionary.ss 23
-  asm-register! asm-find
+  ;; asm-register! asm-find
+  define-assembler
hunk ./staapl/asm/dictionary.ss 31
-  define-asm
+  define-asm ;; FIXME: get rid of this
hunk ./staapl/asm/dictionary.ss 47
+(define-syntax-rule (define-assembler name fn) (asm-register! 'name fn))
hunk ./staapl/tools/stx.ss 15
+;; FIXME: doesn't handle all cases (i.e. vectors..)
hunk ./staapl/tools/stx.ss 21
+  ((pair? x) (cons (->sexp (car x)) (->sexp (cdr x))))
+  ((null? x) '())

Entry: hygiene bug
Date: Thu Apr 9 10:34:34 CEST 2009

The minimal patch I can find that breaks the test is this:

hunk ./staapl/asm/asmgen-tx.ss 92
-(define (parse-opcode-proto str)
-  (split-opcode
-   (bitstring->list str)))
+(define (parse-opcode-proto str-stx)
+  (map (match-lambda ((param . bits) (cons (datum->syntax str-stx param) bits)))
+       (split-opcode (bitstring->list (syntax->datum str-stx)))))
hunk ./staapl/asm/asmgen-tx.ss 101
-(define (binary->proto row)
-  (match row
-    ((name proto . binary)
-     (append (list name proto)
-             (map parse-opcode-proto binary)))))
+(define (binary->proto row-stx)
+  (syntax-case row-stx ()
+    ((name proto . binary)
+     (append (list #'name #'proto)
+             (map parse-opcode-proto (syntax->list #'binary))))))
hunk ./staapl/asm/asmgen-tx.ss 108
-(check (binary->proto '(xorwf (f d a) "0001 10da ffff ffff"))
+'(check (binary->proto '(xorwf (f d a) "0001 10da ffff ffff"))
hunk ./staapl/asm/asmgen-tx.ss 178
-(define (instruction-set-tx asm! dasm! instructions)
-  (let ((protos
-         (map
-          binary->proto
-          (syntax->datum instructions))))
+(define (instruction-set-tx asm! dasm! ins-stx)
+  (let ((protos (map binary->proto (syntax->list ins-stx))))

Let's have a look at the disassembly. They're all 'bra 'bnz 'rcall instructions. The other code is intact. I suspect this is in assembler composition.
tom@zzz:~/staapl/app$ make 1220-8.diff
< 000000: d020 bra 0x42
< 000040: d101 bra 0x244
---
> 000000: d01f bra 0x40
> 000040: d0e0 bra 0x202
4,5c4,5
< 000044: d024 bra 0x8e
< 000046: d021 bra 0x8a
---
> 000044: d001 bra 0x48
> 000046: d7fd bra 0x1042
7c7
< 00004a: d029 bra 0x9e
---
> 00004a: d003 bra 0x52
10c10
< 000050: d021 bra 0x94
---
> 000050: d7f8 bra 0x1042
14c14
< 000058: d02f bra 0xb8
---
> 000058: d002 bra 0x5e
16c16
< 00005c: d021 bra 0xa0
---

Entry: refactoring
Date: Thu Apr 9 12:54:54 CEST 2009

Factored out asm-lambda.ss / asm-lambda-tx.ss implementing the DSL for assembler/disassembler specification: now separate from symbol binding. Next: fix the disassembler as a lazy list parsing step.

First step done: a generic disassemble word (works also for arbitrary literal fields) and a macro that expands to something like this:

box> (disassembler-body #'diff #'(k s) #'((118 7) (s 1) (k 8) (3 2)))
(lambda (w)
  (let-values (((field_0 s k field_3)
                (disassemble/values '(7 1 8 2) w)))
    (and (= field_3 3)
         (= field_0 118)
         (list 'diff k s))))

Entry: transposes
Date: Thu Apr 9 20:12:44 CEST 2009

Loop transformations.. These are really just about permutations of indices. Anyways.. I'd like to transform this:

(((a 1) (b 2)) ((c 3) (d 4))) => (((a b) (c d)) ((1 2) (3 4)))

which is the (outer) index transpose (i j k) -> (k i j).

Fixed. Transposition is really best handled with syntax-case ellipsis.

box> (disassembler-body #'foo #'(s k) #'(((118 7) (s 1) (3 2)) ((k 12))))
(lambda (temp54 temp55)
  (let-values (((temp56 s temp57) (disassemble/values '(7 1 2) temp54))
               ((k)               (disassemble/values '(12) temp55)))
    (and (= temp57 3) (= temp56 118) (list 'foo s k))))

The macro:

(define-syntax-rule (push! stack x) (set! stack (cons x stack)))
(define-syntax-rule (lambda* formals . body)
  (lambda (a) (apply (lambda formals . body) a)))
(define (generate-temp) (car (generate-temporaries #'(#f))))

(define (disassembler-body opcode formals body-stx)
  (define literals '())
  (define (fix-names! stx)
    (for/list ((n (syntax->list stx))
               (i (in-naturals)))
      (let ((_n (syntax-e n)))
        (if (number? _n)
            (let ((__n (generate-temp)))
              (push! literals (list __n _n))
              __n)
            n))))
  (let ((ws (generate-temporaries body-stx)))
    (syntax-case (list ws body-stx) ()
      (((w ...) (((name bits) ...) ...))
       #`(lambda (w ...)
           (let-values
               ;; Transpose it
               #,(for/list ((stx (syntax->list #'((w (name ...) (bits ...)) ...))))
                   (syntax-case stx ()
                     ((w ns bs)
                      #`(#,(fix-names! #'ns)
                         (disassemble/values 'bs w)))))
             (and #,@(map (lambda* (name value) #`(= #,name #,value))
                          literals)
                  (list '#,opcode #,@formals))))))))

Looks like I'm getting the hang of combining the highlevel ellipsis based pattern matching where possible with lowlevel macros where needed.

Entry: combined asm dasm
Date: Sat Apr 11 13:35:40 CEST 2009

From the same form both asm and dasm can be generated:

box> (asm/dasm-lambda-tx #'(add (a b) "0101 aaaa" "bbbb bbbb"))
(values (lambda (a b) (list (asm+ a 4 5) b))
        (lambda (temp70 temp71)
          (let-values (((temp72 a) (disassemble/values '(4 4) temp70))
                       ((b)        (disassemble/values '(8) temp71)))
            (and (= temp72 5) (list a b)))))

next: cleanup tools.ss dictionary.ss

Now it's important to not mess up the way the disassembler is integrated. 'dasm-find is no longer possible (we can't distinguish based on a single word). This probably means that the following lineage needs to disappear:

disassemble->word (asm/tools.ss)
tsee (live/tethered.ss)

Let's start at dasm-register!
Ok: need to repair dasm-find and dasm-register!
Let's have them introduce toplevel bindings, but also register these in a dynamic namespace (or use eval?). I'm losing track.

Entry: Berkeley CS61C Machine Structures
Date: Sat Apr 11 21:44:41 CEST 2009

http://webcast.berkeley.edu/course_details.php?seriesid=1906978500

3-op instruction format: rator rand,rand,rand. The $zero register enables assignment:

add $r1, $r2, $zero

The lecture series is rather silly and slow, but MIPS is a nice ISA.

Entry: disassembler types
Date: Sun Apr 12 10:34:52 CEST 2009

Should the disassembler know about the signedness of values? Yes, but where do we do that? Upper case are signed values, R is PC relative.. This needs a better spec: type info needs to travel somehow. It's probably best to see the assembler as a 2-way function (equation) and provide a composition and contract mechanism.

Entry: more fixes
Date: Sun Apr 12 11:01:18 CEST 2009

word-ref: expects type as 1st argument, given: #; other arguments were: 1

 === context ===
/data/safe/tom/darcs/brood-5/staapl/asm/dictionary.ss:72:0: proto->asm-error-handler
/data/safe/tom/darcs/brood-5/staapl/asm/assembler.ss:249:0: resolve/assemble
/data/safe/tom/darcs/brood-5/staapl/asm/assembler.ss:276:0: nop core
/data/safe/tom/darcs/brood-5/staapl/live/state.ss:56:0: assemble-chains
/usr/local/plt/collects/scheme/private/map.ss:44:11: for-each

Apparently, asm procedures are wrapped in a struct that contains meta info. Let's keep this for now (see struct:asm in staapl/asm/dictionary.ss).

Entry: magic
Date: Sun Apr 12 11:18:07 CEST 2009

The problem with improperly compiled jumps seems to have disappeared.. Maybe it was a consequence of lingering non-hygienic constructs that simply disappeared with the cleanup? Somehow I don't believe this, so let's try to execute one of the images. Seems to run just fine. (cd app ; make picstamp.live)

Entry: static assembler data
Date: Sun Apr 12 11:23:04 CEST 2009

The next thing to figure out is how to attach "type" info to the assembler. Need to think about:

* assembler primitives
* assembler composites
* asm code processors (operate on structs)

Another thing: all operand processing (signed/unsigned, relative/absolute) should be somehow attached to the disassembler. These are enough to start thinking about representation:

* operands are not bit vectors: they have types
* assembly, disassembly and opcode construction/deconstruction should be handled.

Entry: restating the question
Date: Thu Apr 16 20:24:59 CEST 2009

So the real question is: can we build a typed metaprogramming system from the ground up?

- types + abstract interpretation
- a module system for instantiating generic code
- some formalism to think about names (forth + functors?)

    instantiation
    module --------------> package

    application
    function ------------> value

Entry: sequence tools
Date: Fri Apr 17 09:56:30 CEST 2009

Instead of letting staapl depend on zwizwa/plt I'm going to simply copy the files (symbolic links won't work). Later this could be factored out as a separate planet package, whenever I publish something else.

Entry: asm/dasm cleanup
Date: Fri Apr 17 12:32:10 CEST 2009

So, the idea is that asm and dasm become integral parts of the language tower, to allow compile-time checks (type checking) and partial evaluation based on abstract interpretation. The challenge is going to be to get a better idea of what is static and what is dynamic. The confusion is rooted in the type of application: it is a code transformer, so should all code transforming code be scheme compile-time code or not?
next:

* referential transparency for asm and dasm (replace parameters with explicit environment objects).
* syntax cleanup for signed/unsigned, relative/absolute addressing

It's probably best to focus on dasm for now, since it's simpler. Asm also contains a link to the target dictionary.

next:

* add address input to dasm
* add type convertors to dasm

Type converters: the main decision is run time (interpretation) or compile time conversion. Let's stick to run-time, and move to compile time when necessary. What is needed is the analogue of this:

(define (paramclass->asm name)
  (case name
    ((R) #'asm+/pcr)  ;; used for relative jumps
    (else #'asm+)))   ;; assemble value ignoring overflow

Ok. Using this:

(define (paramclass->dasm name)
  (case name
    ((R) #'dasm/pcr)
    (else #'dasm/unsigned)))

we will do parameter class translation at compile time:

box> (dasm-lambda-tx #'foo #'(a R) #'(((123 10) (a 1) (R 5))))
(lambda (temp12)
  (let-values (((temp13 a R) (disassemble/values '(10 1 5) temp12)))
    (and (= temp13 123)
         (list 'foo (dasm/unsigned a) (dasm/pcr R)))))

Entry: don't mix side-effects and streams..
Date: Sat Apr 18 12:56:55 CEST 2009

The problem is the context-dependency of the disassembler on the current code pointer, which probably doesn't work well with parameterization.. The result of disassembly seems to be correct, but I doubt it really does what I think it does.. Maybe the parameter should be pushed down? No, let's do this properly: all dasm words get passed the current code location as a first argument.

An arbitrary choice: should the PC that's plugged into a disassembler be the location of the instruction, or the value of the PC when the instruction executes (pointing _past_ the instruction)? Let's stick as close as possible to the machine definition: PC should point after the instruction.

next:

* assume that the PCR addressing mechanism is the same for all architectures of interest. If not, this should be parameterized in operand.ss
* make the assemblers referentially transparent.

Problem: when invoking the assembler, the next position of the PC is not known, so we can't use this! The transformation from instruction address -> PC is done in the macro, which knows instruction sizes.

Entry: next problem
Date: Sat Apr 18 15:19:01 CEST 2009

The pattern matcher checks the arity of the assembler functions. There's a problem when moving from (formals) -> (address . formals) for the asm prototype. Maybe this should be temporarily disabled? ok. stupid typo.

Entry: disassembly address?
Date: Sat Apr 18 17:14:14 CEST 2009

Should the disassembler contain the address? Maybe not, since it's already converted to absolute addressing..

Entry: further cleanup
Date: Sat Apr 18 17:37:52 CEST 2009

define-lowlevel-asm: needs "define". Original:

(define-sr (define-lowlevel-asm (name addr . formals) . body)
  (asm-register! 'name
                 (make-asm (lambda (addr . formals) . body)
                           '(name formals))))

There are more problems here: is the '(name formals) info really necessary at run time? -> no. This is supposed to be filled with the prototype from the assembler generator.

Entry: name structure
Date: Sat Apr 18 18:50:26 CEST 2009

I find there are a lot of subtleties due to the hash-table based (mutable) name structure. To solve this properly will require quite an overhaul.. Where to start? Let's first take the files apart, and figure out what is necessary at compile time to perform pattern matching. Hmm.. too much junk still.. Let's eliminate the run-time pattern first. What is it used for?
Easy to see by changing the accessor interface: asm-prototype ->
asm-proto

It is only used in the error handler. Let's take it out completely.

  proto->asm-error-handler : assembler.ss

Ok.. Now, what information is necessary at run time about an
instruction assembler? For testing purposes it might be linked to its
disassembler, but that's about it.. All other info is best checked at
compile time (i.e. in pattern matching).

Entry: moving towards hash-less assembler
Date: Sun Apr 19 11:08:25 CEST 2009

This requires the 'patterns macro to be adjusted to take names from
the module namespace instead of the hash table. Note that this is part
of a more ambitious project of eliminating _all_ mutation of names
(also for macros) and replacing them with directed acyclic name
structures (modules).

The problem here is that in a lot of places, symbols get quoted
directly, so it's a bit spread all over.

Maybe I'm not quite ready yet. I don't completely understand _what_ an
assembler should be. This is a lack of theoretical knowledge and of a
good model of what I'm doing. It is:

  - syntax to be manipulated (object language)
  - syntax to be translated to machine code
  - syntax to be interpreted (simulator)

Can we model these as functions?

  - macro    : [asm] -> [asm]
  - assemble : [asm] -> [bin]
  - simulate : [asm] -> (m -> m)

Where [] = list of
      asm = assembly instructions
      bin = binary instructions (bit vectors)
      m   = machine state object

There are others:

  - primitive-assemble : asm -> [bin]
  - disassemble        : [bin] -> [asm]

Instead of poking around in the current code, it might be wiser to
start building the core from the ground up, to see what exactly can be
done with it.

Entry: what is an assembly instruction?
Date: Sun Apr 19 11:27:37 CEST 2009

SYNTAX: [asm] -> [asm]

An abstract data type to be manipulated by coma macros. This type has
expansion-time information that checks constraints on the syntax: name
and arity (and later possibly type).

SEMANTICS: (asm,here,operands) -> [bin]

Assembly instructions can be _compiled_ to binary machine code,
relative to the location at which they are supposed to be placed. Note
that operands need to be bit vectors. This function is called a
primitive assembler. The (partial) inverse of this is called a
primitive disassembler.

(asm,here,operands) -> (m -> m)

The primitive simulator is merely a different representation of the
semantics, accessible from the syntax manipulation phase. It
translates a fully specified instruction into a machine state update
function, enabling the instruction to be _interpreted_.

GRAPH: ([asm],dict,start) -> [bin]

Instances of asm structures can contain symbolic information which
eventually depends on the size of the [bin] output of primitive
assemblers. This circularity is broken by successive approximation
using a relaxation algorithm.

Anything that translates to [bin] loses information: in this stage,
all symbolic/parameterized information is gone.

The interesting parts are the coma [asm] -> [asm] translation and the
interpretation steps (asm,here,operands) -> (m -> m). If the latter
can be made to perform abstract interpretation, a lot of checks can be
constructed.

It looks like the important conclusions are:

  - flexibility of representation of instruction semantics (m -> m)
    will determine how the information encoded in these functions can
    be fed back to [asm] -> [asm] transformers (i.e. to generate
    peephole optimizations). however, this can be added later on, on
    top of basic static structure.
  - practically: information about the instruction's type needs a
    compile-time binding.

Entry: compile time bindings
Date: Sun Apr 19 12:07:59 CEST 2009

let's change the asmgen macro to add a syntax binding containing the
assembler prototype.

Ok.. it's not what it needs to be, but the basic defining form is
there. Time to give things a name. What should we call the data
structure that, when collected in stacks, is symbolic machine code?
Let's call it ``op''. Then:

  - coma manipulates ops
  - the assembler translates ops into binary code or simulator code.

So.. Now that the stage is set, what am i going to do with it? I have
a little bit more information than just the op's arity: types and
rel/abs addressing are available too. Are these interesting? Not
really... These only matter when the operands are numerical -- not
during the [op] transformation phase. So.. Until there is some form of
abstract interpretation possible, we can't do much more than simply
checking existence and arity.

So.. Is it necessary to re-invent structures? Will I use something a
static struct can't carry? Yes.. It's exactly the "type" information
associated with the op that can lead to automatic transformer
derivations.. Let's just see if the static structure is in place to
actually verify existence first. In pattern-tx.ss

In the pattern expansion, tags are now wrapped in a verify struct
which looks up the tag in the (op) namespace.

Next: unify all definition syntaxes.

  ;; Main definer body for asm/dasm/op namespaces.
  (define-syntax-rule (define-asm/dasm/op static asm-body dasm-body)
    (begin
      (define-syntax-ns (op) name static)
      (define-values-ns (((asm) name)
                         ((dasm) name))
        (let ((asm asm-body)
              (dasm dasm-body))
          ;; backwards compat (later, use reflective operations for this)
          (asm-register! 'name asm)
          (dasm-register! #f #f dasm)
          (values asm dasm)))))

OK.. got arity checking working in pattern-tx.ss :

  (define (check-ins type ins)
    (syntax-case ins ()
      ((rator rand ...)
       (if (not (identifier? #'rator))
           (printf "warning: can't verify parametric instruction template ~a\n"
                   (syntax->datum ins))
           (let* ((id (ns-prefixed #'(op) #'rator))
                  (op (syntax-local-value id (lambda () #f))))
             (if (not op)
                 (printf "warning: unchecked ~a: ~a\n" type (syntax-e id))
                 (if (= (op-arity op)
                        (length (syntax->list #'(rand ...))))
                     (void)
                     ;; (printf "ok: ~a\n" (syntax->datum #'rator))
                     (raise-syntax-error #f "incorrect arity" ins))))))))

Some things can't be checked yet, which produces some warnings. It's
safe to remove the old run-time checking method now.

OK: i've left the symbolic representation as-is, and used the static
info just as a check: it emits warnings when it can't find prototypes.

Now, is it possible to eliminate the current warnings? It should be
possible to separate declaration and implementation of certain
instructions... Somehow circularity needs to be broken. The problem is
that it is possible to define op manipulations (macros) without having
the ops defined, with the implementation following later. Maybe we
should allow single assignment of functionality? This keeps things
declarative, but gets rid of the ordering of particular definitions.

I'm a bit confused.. What's the right question to ask? The problem is
best understood with an example:

  (patterns (macro)
    (([cw a] primitive-exit) ([jw a]))
    ((primitive-exit)        ([exit])))

This is from core.ss  The problem is that core.ss needs to know the
definitions of cw, jw and exit before this makes sense.
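A minimal sketch of one way out, using PLT units (all names here are
hypothetical, not staapl's actual interface): declare the ops once in
a signature, let the generic core import the declaration, and let each
machine export an implementation.

  #lang scheme/base
  (require scheme/unit)

  ;; ops are _declared_ once:
  (define-signature machine^ (cw jw exit))

  ;; generic code imports only the declaration:
  (define-unit core@
    (import machine^)
    (export)
    (printf "tail call: ~a\n" (jw 123)))

  ;; a machine provides the implementation:
  (define-unit pic18@
    (import)
    (export machine^)
    (define (cw a) (list 'cw a))
    (define (jw a) (list 'jw a))
    (define (exit) (list 'exit)))

  ;; link and run:
  (invoke-unit
   (compound-unit (import) (export)
                  (link (((M : machine^)) pic18@)
                        (()               core@ M))))

This only links run-time values though; the pattern matcher also wants
compile-time info, which is where it gets interesting.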
So, instead of using late binding (which is why the assemblers are in
a hash table right now) this could be solved using parameterized
modules: if you want to use core.ss, you need to first provide
meaningful definitions. The fact that i want to export compile-time
values makes this problematic.

Entry: the real problem
Date: Sun Apr 19 19:45:44 CEST 2009

Currently I'm using late binding for some behaviour, both for the
assembler and for macros: they behave as mutable nodes. I'd like to
turn this into a declarative structure.

Maybe i should have a look at PLT Scheme's unit facility to tackle
this one..

  Units organize a program into separately compilable and reusable
  components. The imports and exports of a unit are grouped into a
  signature, which can include "static" information (such as macros)
  in addition to placeholders for run-time values.

Is this different from before? If they do export static information,
it's basically what I'm looking for.

  Compiler template + Machine description -> Compiler

http://download.plt-scheme.org/doc/html/guide/units.html

Could be signatures (plug-in interfaces):

  - machine model (cw,jw,qw + basic macros)
  - compiler core (control flow graph construction vs. "label")

Entry: big change
Date: Mon Apr 20 10:12:20 CEST 2009

question is, should i dive right in? eliminate all mutation and
rebuild? it doesn't seem there is a gradual way.. let's try. start
with the machine model.  (staapl/coma/core.ss)

Entry: ns refactoring
Date: Mon Apr 20 10:28:11 CEST 2009

Taking it out and placing it in a separate submodule. Separate from
scat code and simplify the prefix tricks.

Entry: all pseudo
Date: Mon Apr 20 12:00:18 CEST 2009

What about this: simply require some basic ops to be pseudo
instructions (which cannot be overridden) and require the elimination
of these in the machine definition. That way there is no "inner"
dependency on the machine's assembler: the machine's compiler will
eliminate all virtual code in terms of which the core compiler and
macro language are defined.

So.. Let's define the virtuals once and for all in coma/core.ss

Aha. The problem is that when in different modules, names _can_ be
redefined, since the modules are independent. It is only visible
globally in the asm register.

So, I'm confident I can pull this off and change the pattern matching
and assembler so they don't need the hash any more. Dependencies will
just roll out.

Next: change the instruction representation. Maybe it's best to just
stick with replacing the symbol with an abstract rep. Maybe just the
'asm struct.

Matcher + op transformers compile. Matchers seem to work too:
coma.ss :

  box> (macro> 1)
  [# #x1]
  box> (macro> 1 2)
  [# #x1]
  [# #x2]
  box> (macro> 1 2 +)
  [# (1 2 +)]

moving into assembler.ss now -> getting rid of asm-find

hmm.. the assembler needs to know (ns (asm) nop) -> set to 0000 for
now.

so.. looks like the basic change works, just some rules won't match.

ha.. silly match bug: matches never failed.

ok. now removing asm-register! and asm-find

Entry: now the same for macros?
Date: Mon Apr 20 18:41:48 CEST 2009

That's a different kettle of fish! This needs a bit more thought.
There are 2 forms of mutation:

  - what to do with "extensions": augmenting pure compile-time
    constructs with compilable code. (super)
  - override: complete re-implementation (extension without delegate).

Entry: update: we're still symbolic
Date: Mon Apr 20 18:45:01 CEST 2009

The assembler transformation still uses just lists; however, the tags
are now identifiers.
It's still possible to move around tags though, so these aren't
algebraic types. It would be interesting to see exactly why not.

  * Polymorphism: it's possible to parameterize tags and work with
    instruction prototypes only. This could be properly abstracted
    into instruction class objects.

  * Non-exhaustiveness: operators act on only a very small portion of
    the possible operations, with the default being simple
    concatenation.

Entry: Macro transformers
Date: Tue Apr 21 10:04:36 CEST 2009

Now there are m : [op] -> [op] transformers. Is it possible to deduce
from these [m] -> [m] transformers?

Entry: Mutation
Date: Tue Apr 21 10:06:34 CEST 2009

Now it's time to see how we can get rid of mutation. Only redefine
behaviour by explicit renaming. This will break .f code, so it's best
to do it in two stages: first the core code, then the .f code.

Entry: today..
Date: Tue Apr 21 11:18:25 CEST 2009

I'm going to take it easy.. let this rest a bit. Let's pick some
low-hanging fruit.

next: ns syntax. Is it possible to do all of this with a single ns
syntax?

Entry: expand-to-top-form
Date: Tue Apr 21 15:14:40 CEST 2009

  (define-syntax (ns stx)
    (syntax-case stx ()
      ((_ (ns ...) expr)
       (let ((prefixed (lambda (n) (ns-prefixed #'(ns ...) n))))
         (if (identifier? #'expr)
             (prefixed #'expr)
             (let ((id=? (lambda (stx symb)
                           (eq? (syntax->datum stx) symb)))
                   (prefixed-list (lambda (stx)
                                    (map prefixed (syntax->list stx))))
                   (exp (expand-syntax-to-top-form #'expr)))
               ;; (printf "top: ~a\n" (syntax->datum exp))
               (syntax-case exp ()
                 (((form (((n1) (form1 names e ...))) n2) i ...)
                  (id=? #'form 'letrec-values)
                  #`((form (((n1) (form1 #,(prefixed-list #'names) e ...))) n2)
                     i ...))
                 ((form b . e)
                  (id=? #'form 'let-values)
                  #`(form #,(for/list ((n (syntax->list #'b)))
                              (syntax-case n ()
                                ((names e)
                                 #`(#,(prefixed-list #'names) e))))
                          . e))
                 ((form names e)
                  (or (id=? #'form 'define-values)
                      (id=? #'form 'define-syntaxes))
                  #`(form #,(prefixed-list #'names) e)))))))))

This doesn't work.. I get undefined references to let-values /
define-values ... Maybe it's best to not invoke the transformer?

OK.. got it working with an explicit preprocessing macro:

  (define-syntax (ns stx)
    (syntax-case stx ()
      ((_ (ns ...) expr)
       (let* ((prefixed (lambda (n) (ns-prefixed #'(ns ...) n)))
              (prefixed-list (lambda (stx)
                               (map prefixed (syntax->list stx))))
              (prefixed-binders
               (lambda (p)
                 (lambda (binders)
                   (for/list ((b (syntax->list binders)))
                     (syntax-case b ()
                       ((n e) #`(#,(p #'n) e))))))))
         (if (identifier? #'expr)
             (prefixed #'expr)
             (let ((form? (let ((form (car (syntax->datum #'expr))))
                            ;; (printf "form = ~a\n" form)
                            (lambda (symb) (eq? form symb)))))
               (syntax-case #'expr ()
                 ((form (name . a) e)
                  (or (form? 'define) (form? 'define-syntax))
                  #`(form (#,(prefixed #'name) . a) e))
                 ((form name e)
                  (or (form? 'define) (form? 'define-syntax))
                  #`(form #,(prefixed #'name) e))
                 ((form names e)
                  (or (form? 'define-values) (form? 'define-syntaxes))
                  #`(form #,(prefixed-list #'names) e))
                 ((form binders e)
                  (or (form? 'let) (form? 'letrec) (form? 'shared))
                  #`(form #,((prefixed-binders prefixed) #'binders) e))
                 ((form binders e)
                  (or (form? 'let-values) (form? 'letrec-values))
                  #`(form #,((prefixed-binders prefixed-list) #'binders) e)))))))))

Managed to delete a whole lot of code that's no longer used with this
simpler approach.

Entry: the real deal
Date: Wed Apr 22 12:26:23 CEST 2009

Now for the biggies:

  * how to get rid of mutation of macros?
  * internal state monad?
  ;; \__ target
  ;; \__ scat
  ;; \__ coma
  ;;        macro::jump
  ;; \__ control
  ;;        macro::sym
  ;;        macro::label:
  ;;        macro::exit
  ;; \__ comp
  ;; \__ asm
  ;; \__ forth
  ;; \__ live
  ;;        macro::+
  ;;        macro::/
  ;;        macro::*
  ;;        macro::-
  ;;        macro::dup
  ;;        macro::drop
  ;;        macro::swap
  ;;        macro::,
  ;;        macro::or-jump
  ;;        macro::not
  ;;        macro::then
  ;; \__ purrr
  ;; \__ pic18
  ;;        macro::TRISC
  ;;        macro::STATUS
  ;;        macro::FSR2L
  ;;        macro::FSR2H
  ;;        macro::PLUSW2
  ;;        macro::PREINC2
  ;;        macro::POSTDEC2
  ;;        macro::POSTINC2
  ;;        macro::INDF2
  ;;        macro::FSR1L
  ;;        macro::FSR1H
  ;;        macro::PLUSW1
  ;;        macro::PREINC1
  ;;        macro::POSTDEC1
  ;;        macro::POSTINC1
  ;;        macro::INDF1
  ;;        macro::WREG
  ;;        macro::FSR0L
  ;;        macro::FSR0H
  ;;        macro::PLUSW0
  ;;        macro::PREINC0
  ;;        macro::POSTDEC0
  ;;        macro::POSTINC0
  ;;        macro::INDF0
  ;;        macro::PRODL
  ;;        macro::PRODH
  ;;        macro::TABLAT
  ;;        macro::TBLPTRL
  ;;        macro::TBLPTRH
  ;;        macro::TOSL
  ;;        macro::TOSH
  ;;        macro::TOSU
  ;;        macro::C
  ;;        macro::Z

So... I tried to remove the redefine! syntax, but it seems to depend
on (rpn-map-identifier) for finding already existing bindings. Can
this be moved to static namespace names?

Upgraded ns (ns-tx) so it can handle require/provide forms too. Now
replaced redefine!-ns in (compositions ...) with a proper define, and
i'm trying to use (ns-in (macro) (except-in ...)) to explicitly
re-define things.

Entry: hierarchical language
Date: Wed Apr 22 14:24:04 CEST 2009

So.. Why should coma/language.ss have jump? Maybe coma/language.ss
should just be about partial evaluation: all control flow stuff should
be gone. This means it should not know about CW JW EXIT

But.. Let's fix that later. Now take out things that conflict. It's
probably best to make the languages include each other, and later
separate out disjoint parts:

  core < language < coma < control < comp

Ok.. This compiles, but there are still problems with the toplevel
namespace using staaplc. Maybe i should just continue and take the
mutation out of the forth code as well?

Another thing (now that i've re-discovered the "run" button): how to
test comp.ss?

Seems there are still some problems.. It's probably best to make this
into a linear chain of extensions, to make sure a parent module
doesn't import a non-modified deeper core module. This should include
the assemblers.

Entry: then <-> declarative macros
Date: Wed Apr 22 15:09:41 CEST 2009

"then" is a problem because it uses a plug-in optimization: macros
defined in terms of "then" in the lower language layers will not
change behaviour. this is a point where we have to give up flexibility
due to the absence of hooks.

  hook = hole in module

this needs to be solved later when i do have a way to put holes in
modules. but overall it's probably best to stick to a more static
bottom-up code structure.

in general: hooks in functional programming can usually be solved with
higher order functions (create holes with lambda). i can probably do
the same here too.

EDIT: it's worse than that. "label:" and "sym" have the same problem.
Looks like it's time for a unit.

Entry: static dasm
Date: Wed Apr 22 16:15:28 CEST 2009

There's still "dasm-register!". Maybe disassembly should be turned
into a reflective operation. Something that collects all the names and
seals up the dasm (could probably be done statically..)

Entry: parameterized control compiler
Date: Wed Apr 22 16:49:35 CEST 2009

control.ss is parameterized by the code's control graph
representation. in the current version there is either a flat
assembler list, or a control flow graph structure.
these come from the implementation of "sym" and "label:"

It's time to parameterize these into a unit.

  procedure application: expected procedure, given: #;
  arguments were: # #

Looks like this interferes with the procedure? predicate used in the
rpn parser.. OK. ignoring set!-transformer?

Units seem to work now too. Next: what is purrr.ss ?

hmm... it's confusing.

  - i'd like to be able to use the pic18 compiler without the baggage
    of the CFG, so it needs to be a unit too?

Entry: a lot of backpatching
Date: Wed Apr 22 23:21:42 CEST 2009

just ran into this:

  staaplc -d /dev/ttyUSB0 1220-8.f
  asm-pattern: match failed for: not, asm: [bit? #xf9e #x5 #x1]

this is because "not" is the one from coma/coma.ss

i'm using the wrong abstraction. units are a good start, but this must
be made easier to use. small embedded programs can use a flat
namespace. building a specialized program is really building this
namespace. also, each word should list where its implementation comes
from. let's implement that first, then it will be clear why some
functionality won't work..

Entry: units as basic block
Date: Wed Apr 22 23:42:50 CEST 2009

It's starting to look better: units are probably a better way to
construct applications. At least it's going to be a better way to
organize all the circular dependencies in the core code..

Entry: taking it apart again.
Date: Thu Apr 23 08:30:07 CEST 2009

so, let's start in coma/core.ss

separate out virtual.ss : virtual instructions for use in the macro
evaluator.

next: how to parameterize control with the proper branching mechanism?
the important part is to be able to tell where the exit points are:
exit or or-jump.

Entry: compilation state representation
Date: Thu Apr 23 10:13:45 CEST 2009

Is it possible to get rid of the parameters in coma? And is this
really desirable?

  - macro-state-check
  - macro-eval-init-state

Who determines what the compilation state is? The compiler. The real
question is, why does the macro evaluator need to know the internal
compiler state? Because it compiles of course.. This is quite a deep
feedback loop that's hard to explicitly propagate outward for linking
resolution.

But.. Since the compilation states are all subtypes
(stack < 2stack < comp), it should be possible to automatically
upgrade when necessary. Fix this later. First get test-comp.ss to run.
It's probably more important to get the parameterization in the
compiler to work properly.

Entry: de-parameterize comp/instantiate.ss
Date: Thu Apr 23 10:41:12 CEST 2009

parameters:

  compile-literal
  compile-word
  compile-exit
  compile-mexit
  semi
  target-postprocess

ok.. getting rid of the parameters should not be too difficult.
however, fixing the primitive compilation steps (especially mexit)
might be a challenge. mexit is probably best implemented not with a
parameter, but by placing functions on the rs stack..

Entry: next
Date: Thu Apr 23 17:00:58 CEST 2009

first i need to get the full code gen + tests back online, then it's
time to clean up the CFG compiler.

had to fix 'org and introduced machine^ containing the cell size.

  ;; This is for printing only.. Maybe keep the parameters? Otoh,
  ;; there should be a simple way to do this properly too: fill in
  ;; these params deeper in the code..
  (target-code-unit 2)  ;; a code word is 2 bytes
  (target-code-bits 16)
  (target-address-size 24)

let's see if we can get the test to run

next problem: forth-begin-tx.ss depends on the compiler through the
wrap-macro/... functions. the reason is that forth-begin contains
instantiation. this probably needs an extra interface.

ok..
i got it i think.. next: postprocessing.

OK works.

Entry: final test
Date: Fri Apr 24 08:54:27 CEST 2009

Cleaning up the forth-begin code to make it parameterizable. What i
really want is a sort of intermediate form to display Forth files in a
similar way to the 'compositions macro. Basically, .f parsing should
get rid of all the parsing words, but leave the rest intact. This will
probably expose some hairy bits..

I'm getting rid of the rpn-map-target-identifier parameter: let's
stick to the '(target) namespace in the forth-begin macro. If not,
later simply pass the namespaces to the forth-begin macro.

Ok.. time for the final test.

  compile: unbound identifier in module in: asm/qw

This is defined in op/asm.ss

Need to rename things: jump-cfg doesn't really compile a cfg: just a
list of lists of chunks

anyways, in the meantime i fixed the command line interface to the
compiler. it's again possible in snot to simply type forth code and
have the resulting assembly code displayed.

next prob:

  /home/tom/staapl/staapl/comp/unit-jump-cfg.ss:225:6: opcode ``cw'' won't match opcode with same name.
  /home/tom/staapl/staapl/pic18/unit-pic18.ss:599:14: opcode ``exit'' won't match opcode with same name.
  /home/tom/staapl/staapl/pic18/unit-pic18.ss:600:4: opcode ``exit'' won't match opcode with same name.
  /home/tom/staapl/staapl/pic18/unit-pic18.ss:609:4: opcode ``save'' won't match opcode with same name.

seems like there are different instances of the asm? objects due to
unit instantiation.. let's make sure they are linked by putting them
in a separate unit.

now.. instead of rushing to a maybe-solution, is there a better way to
match? the problem is that (asm: foo) produces different results
depending on whether it is executed inside a unit or outside. i don't
see a way to fix this without telling the instantiation process to
share instances.. let's test this.

Entry: syntaxes and units
Date: Fri Apr 24 16:24:24 CEST 2009

What I don't understand is why syntax transformer definition has to be
part of a unit's signature. Maybe it helps to understand how units are
implemented... So, an identifier in a unit is a rename transformer.
What better than to look at the expansion?

  (define-signature foo^ (foo (define-syntaxes (baz) '(1 2 3))))
  (define-signature bar^ (bar))

  (expand #'(unit (import foo^)
                  (export bar^)
                  (define bar (+ 123))
                  (define-syntax (print-baz stx)
                    (printf "baz: ~a\n" (syntax-local-value #'baz))
                    #'456)
                  (print-baz)))

PRINTS:

  baz: (1 2 3)

VALUE:

  (#%app
   make-unit
   'eval:42:0
   (#%app vector-immutable
          (#%app cons 'foo^
                 (#%app vector-immutable (#%top . signature-tag))))
   (#%app vector-immutable
          (#%app cons 'bar^
                 (#%app vector-immutable (#%top . signature-tag))))
   (#%app list)
   (let-values ()
     (let-values ()
       (lambda ()
         (let-values (((temp51) (#%app box undefined)))
           (#%app
            values
            (lambda (import-table)
              (let-values (((temp50)
                            (#%app vector->values
                                   (#%app hash-table-get
                                          import-table
                                          (#%top . signature-tag))
                                   '0 '1)))
                (let-values ()
                  (let-values ()
                    (letrec-values (((bar) (#%app + '123)))
                      (#%app set-box! temp51 bar)
                      '456)))))
            (#%app make-immutable-hash
                   (#%app list
                          (#%app cons
                                 (#%top . signature-tag)
                                 (#%app vector-immutable
                                        (lambda () (#%app unbox temp51))))))))))))

Judging from this, units are completely compiled once defined. This
consumes the signature info. All the information that's necessary at
compile time needs to be provided _in the signature_. The compile-time
info that patterns-tx depends on thus needs to be embedded in the
signatures also.

Looks like it's best to define the operations themselves as
signatures.
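Roughly what carrying static op info in a signature could look like,
following the foo^/baz pattern above. A minimal sketch: the signature
name, the op, and the '(name arity) info format are invented here, not
staapl's actual code:

  ;; the run-time assembler travels as a variable, its prototype as
  ;; compile-time data (a define-syntaxes in the signature, like baz):
  (define-signature pic18-ops^
    (addwf
     (define-syntaxes (addwf-info) '(addwf 3))))

  ;; an importing unit can inspect the prototype at expansion time:
  (define-unit use-ops@
    (import pic18-ops^)
    (export)
    (define-syntax (addwf-arity stx)
      #`'#,(cadr (syntax-local-value #'addwf-info)))
    (printf "addwf arity: ~a\n" (addwf-arity)))

Same mechanism as print-baz above, just aimed at what patterns-tx
needs: name + arity, later maybe types.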
What is necessary is the separation of interface and implementation
for the assembler opcodes. Opcodes should be _declared_ somewhere;
then, when _defined_, the declaration should be verified (or possibly
created).

Maybe the signature can be stored _inside_ the static info?

  (define-syntax (define-op-signature stx)
    (syntax-case stx ()
      ((_ name^ (name arg ...) ...)
       _ _ _)))

Entry: making qw parametric
Date: Fri Apr 24 19:03:32 CEST 2009

This goes really deep.. In macro-tx.ss the qw opcode is used to define
the immediate semantics of the macro language. Considering that this
is implemented with parameters defined at compile time, i think i'm
stuck.

Maybe the scat languages themselves should also be implemented as
units.. Then multiple instances can be defined similarly.

Well.. I guess the idea was to make everything declarative, so this
probably includes the parameters at the very core of staapl... I'm not
sure if it's possible to do this gradually. Let's first establish some
working point to get back to.

Entry: mark: test-pic18.ss / pic18.ss work with console
Date: Fri Apr 24 19:20:41 CEST 2009

i will now start breaking things to turn scat into a unit that can be
re-instantiated multiple times, instead of using compile-time
parameters to change the behavior of a single compiler core. this will
probably be quite some effort..

now, instead of stupidly starting to break things, isn't it possible
to write a unit interface on top of rpn-tx.ss ??

Entry: rpn compiler
Date: Fri Apr 24 20:09:03 CEST 2009

maybe it's time to rewrite the whole thing... one of the problems i
had was having to work around limitations of the parser.. maybe it
should be made to parse forth in the first place, then limited to be
able to do simpler things too?

the basic problem is a parse from linear code -> dictionary. this is
solved using prefix parsing keywords. so, what is needed is a
representation of a dictionary, somewhat like the chunk-collecting
compiler in comp/compiler.ss

another thing to mention is (delimited) continuations. up to now this
was problematic because of the way code was mapped to lambda
expressions. it might be better to work directly in cps, since that's
a lot closer to the way forth works.

let's see.. the code fragment (a b c)

start in rpn/new.ss

Entry: the new scat
Date: Fri Apr 24 20:44:03 CEST 2009

So.. What is a stack language? It's like a functional language where
the environment is replaced with an explicit stack.

ok, it went fast for a bit.. trying this now:

  (define-syntax-rule (op-2->1 op)
    (lambda (d r)
      ((car r)  ;; call continuation
       (cons (op (car d) (cadr d))
             (cddr d))
       (cdr r))))

  (define (done)
    (list (lambda (d r) d)))

  (define-syntax-rule (run program stack)
    ((ns (rpn) program) stack (done)))

Basically, each fn takes 2 arguments, the data stack and the
continuation stack. It performs a primitive computation and passes the
(d r) to the popped continuation.

Next step is to define a parser, basically an expansion-time
interpreter. This will lead us to something very close to the CEK
machine.

  (lambda (p r c) ...)

  p : parameter stack
  r : continuation stack (return stack)
  c : code stack
  d : dictionary

the p and r can be eliminated during parsing.. probably re-introduced
later. the continuation stack can be abstracted as a procedure. it
never needs to be implemented as a stack. it might also be faster to
run functions from a list than to construct syntax for them.

DAMN.. almost there, but i'm fading out..
i'm confused about the tension of implementing the continuations as an
explicit stack, or as an abstract function. for functions, i need to
figure out how to write "compose":

  (define (a p k) ...)
  (define (b p k) ...)

  (compose (a b)
           (lambda (p k1)
             (a p (lambda (p k2)
                    (b p k1)))))

Now, how does a primitive look?

  (define-syntax-rule (op-2->1 op)
    (lambda (p k)
      (k (cons (op (p-car p) (p-cadr p))
               (p-cddr p)))))

Entry: list interpreter
Date: Fri Apr 24 23:31:28 CEST 2009

Why not use a forth-style list interpreter? Why does the code have to
be abstracted as a nested lambda expression? What about CPS? I'm a bit
confused now.. Let's see.. The goal is to be able to run the code
without excessive consing. This means either nested expressions:

  (lambda (stack) (c (b (a stack))))

List interpretation:

  (lambda (stack) (do-list (list a b c) stack))

Or CPS:

  (lambda (stack k)               ;; L1
    (a stack (lambda (stack1)     ;; L2
      (b stack1 (lambda (stack2)  ;; L3
        (c stack2 k))))))

With a,b,c primitives.

[1] http://en.wikipedia.org/wiki/Continuation-passing_style

Funny, writing the CPS as follows makes it easier to see the Haskell
"do" notation (with arrows reversed).

  (define (add p k)
    (k (cons (+ (car p) (cadr p)) (cddr p))))

  (define (add3 p k)
    (add p (lambda (p1)     ;; add p -> p1
      (add p1 k))))         ;; return p1

  (define (add4 p k)
    (add p (lambda (p1)     ;; add p -> p1
      (add p1 (lambda (p2)  ;; add p1 -> p2
        (add p2 k))))))     ;; return p2

  (define (comp a b)
    (lambda (p k)
      (a p (lambda (p1) (b p1 k)))))

So.. I see no advantage in CPS. Does PLT optimize this to a simpler
form? The closures are really no better than the naive consed return
stack used earlier. It's probably best to stick to the native
representation.

So.. Nothing better. But there's one optimization that should be
possible. Instead of using 1->1 functions, it might be possible to use
n->n functions, which eliminates construction of the state vector.

Entry: n-ary instead of unary ops
Date: Sat Apr 25 00:24:04 CEST 2009

the problem here is that the simple nesting can't be used:

  (lambda (stack) (c (b (a stack))))

instead we have

  (lambda (x y z code)
    (let next ((x x) (y y) (z z) (c code))
      (if (null? c)
          (values x y z)
          (call-with-values
           (lambda () ((car c) x y z))
           (lambda (x y z) (next x y z (cdr c)))))))

it's probably not worth fussing about this, since i don't know what's
going on inside the compiler anyway. maybe i'm just looking for
'compose?

  (define (pp a b)
    (values (cons (car b) a) (cdr b)))

  ((compose pp pp pp) '() '(a b c d e f g h i j k))
  => (c b a) (d e f g h i j k)

Entry: collecting
Date: Sat Apr 25 01:27:52 CEST 2009

This isn't really parsing. It's just collecting elements of a
dictionary. The same pattern happens several times in the code:

  * the forth parser: collects (name type body) code
  * the CFG compiler: collects (target code) chunks
  * the binchunk parser

The binchunk parser uses the stack-of-stacks abstraction. It should be
quite straightforward to do the same for the other dictionaries. The
operations:

  - make-sos
  - sos->list
  - sos-push
  - sos-collapse

Entry: basic parser
Date: Sat Apr 25 02:13:51 CEST 2009

It is working.. Now to see if it can do the fancy stuff like locals.

The problem is: how to parameterize this? It can't be done with just
macros. I.e. we need to get at the local syntax value of the real
identifiers after mapping, to determine whether it's a transformer
that has to be called. Is it enough to have the mapping namespace
parameterized?

Also, the representation of the dictionary which is passed between
macros should be abstract. However, it might be enough to keep this
fixed at a list of 2 stacks.
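For reference, a minimal sketch of what the stack-of-stacks could look
like; the concrete representation (a non-empty list of lists, current
entry first) and the exact meaning of sos-collapse are my guesses, not
the code in the binchunk parser:

  (define (make-sos)         '(()))
  (define (sos-push sos x)   (cons (cons x (car sos)) (cdr sos)))
  (define (sos-collapse sos) (cons '() sos))  ;; seal entry, open next
  (define (sos->list sos)    (reverse (map reverse sos)))

  (sos->list
   (sos-push (sos-collapse (sos-push (make-sos) 'a)) 'b))
  ;; => ((a) (b))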
Actually, the only real problem is to parse the flat code into
something which can easily be parsed with syntax-rules macros. Now...
Maybe the sos is enough.

This is what i get out of the new CPS-style rpn-parse:

  (define-syntax-rule (rpn: code ...)
    (rpn-parse (quote (rpn) p-apply p-cons) code ...))

  box> (rpn: 123 : abc 1 2 + : 100 -)
  (()
   (((name 100) (p-apply rpn/-))
    ((name abc) (p-cons 1) (p-cons 2) (p-apply rpn/+))
    ((p-cons 123))))

This is actually colorForth ;)

Entry: mixing pattern names and variable names
Date: Sat Apr 25 12:10:12 CEST 2009

This doesn't work:

  (define-syntax rpn/:
    (make-rpn-transformer
     (lambda (w d k)
       (let ((name (cadr w))
             (w+   (cddr w)))
         (k w+
            (d-compile #`(name  ;; <- this
                          #,name)
                       (d-close d)))))))

The marked "name" is not an introduced identifier that will later
match a literal "name" pattern, as in:

  (define-syntax rpn-define
    (syntax-rules (name)
      ((_ (name n) (type param) ...)
       (ns (rpn) (define n (rpn-lambda (type param) ...))))
      ((_) (void))  ;; ignore empty code
      ))

Entry: parameterization
Date: Sat Apr 25 12:32:44 CEST 2009

So the question is mostly: does the current lexical parameterization
work for solving all possible syntax combinations? Let's see.

All works pretty well, except for quasiquote. I don't understand how
it works any more.. Ok.. Quasiquote serves to build data structures
containing functions:

  `(1 2 ,+)     -> (list 1 2 xxx/+)
  `(1 2 ,(+ +)) -> (list 1 2 (xxx: + +))

This is a bit ad-hoc, and I don't believe it is used much..

Looks like the basic parser is done.

Entry: macro continuations
Date: Sat Apr 25 16:20:07 CEST 2009

Trying to avoid low-level macros, I do run into occasions where I need
to resort to CPS to be able to properly separate concerns (use macros
to "preprocess" other macros' input). Usually this can be done by
passing a macro continuation. But what if this continuation takes
other arguments? Maybe what is needed is a "macro curry". I've been
playing with this when I tried writing the parser completely in
syntax-rules. What I'm looking for here is a simple mechanism that
catches most of my specialization problems.

This is a bit more insidious than i thought. My current approach
doesn't compose well... Let's just stick with simple one-continuation
macros.... maybe local syntax would help? Something to fake "lambda"
for macros?

The thing is: in CPS, primitives take an extra continuation argument.
However, composing primitives requires the creation of new
continuations (in the form of closures). This doesn't work for macros,
so the best way is to create local syntax.

It works with local syntax. This is a bit tricky due to (... ...)
which i never got to intuitively understand very well (it seems upside
down).

  (define-syntax-rule (rpn-begin code ...)
    (let-syntax
        ((tx-mk
          (syntax-rules (name)
            ((_ (()          ;; empty anonymous list
                 ((name n)   ;; entries need tagged name
                  . tagged-code)
                 (... ...)))
             (begin
               (ns (rpn) (define n (rpn-lambda . tagged-code)))
               (... ...))))))
      (rpn-specialized-parse tx-mk code ...)))

Entry: scat in terms of new rpn syntax
Date: Sat Apr 25 17:53:24 CEST 2009

rpn-scat.ss is quite simple now

let's start taking out the entire old parser and work up from here.

I don't remember what rpn-wrap is about. Taking out shift/reset.

  (ns (scat)
      (define-syntax reset
        (rpn-wrap (expr) #`(reset-at scat-prompt #,expr))))
  (ns (scat)
      (define-syntax shift
        (rpn-wrap (expr)
                  #`(shift-at scat-prompt k
                              #,((rpn-immediate) #'k expr)))))

For the rest it seems to work fine. next is coma.
From macro-syntax.ss i don't know what this is about:

  (define-sr (define-procedure name)
    (ns (macro) (define name (postponed-word 'name))))

now, looks like tscat: isn't so trivial.. let's see. indeed.. What it
does is figure out whether a variable is locally bound (i.e. not a
module import or toplevel) and depending on this it will perform some
action. this is the worst kind of reflection there is.. how can this
be re-incorporated?

Hmm.. it's not so bad. It's about the input of tscat: itself, not some
inside behaviour mod. Can probably just be copied:

  ;; Operate on rpn code body, processing lexical and other variables.
  (define (rpn-lex-mapper fn-lex [fn-no-lex (lambda (x) x)])
    (lambda (stx-lst)
      (map (lambda (stx)
             (if (and (identifier? stx)
                      (lexical-binding? stx))
                 (fn-lex stx)
                 (fn-no-lex stx)))
           stx-lst)))

Entry: next
Date: Sat Apr 25 19:08:40 CEST 2009

Turning coma into a unit in terms of basic ops - or, maybe it's best
to dive into the parser first... Ok. let's try that then. Let's do
locals-tx.ss first.

One of the problems i saw coming was the use of semantics macros in
the parsing words... Simply, there is no way to do this without
passing some parameters into the transformer.. Let's see.. Instead of
doing this in macro code, try it in plain rpn first. The syntax is:

  (| a b | b a)

The difficulty here is in finding the end of the code. Once this is
done it's quite straightforward:

  (define-syntax-rule (rpn/locals: (a b) prog ...)
    (lambda (p)
      (let ((a  (lambda (x) (immediate (p-car p) x)))
            (b  (lambda (x) (immediate (p-cadr p) x)))
            (p+ (p-caddr p)))
        (function (rpn: prog ...) p+))))

The convoluted nature of the old implementation seems to be mostly
about not being able to terminate the code properly. As an
s-expression, it's really trivial.

So.. What about solving this in the postprocessor? Can't do that,
since some identifiers might be interpreted as compile-time actions..
So probably the current way it works is already wrong: compile-time
bindings won't get shadowed! There's only one way: the body needs to
be expanded inside a context that has expanded the surrounding let
form.

Hmm.. One thing: I'm going to change the local syntax stuff to
explicitly named transformers to make the expansions a bit less
verbose.

This is the locals transformer:

  (define-syntax (rpn-parse-locals stx)
    (syntax-case stx ()
      ((_ (namespace function immediate program: pop-values)
          (param ...) code ...)
       (let ((plist (syntax->list #'(param ...))))
         #`(lambda (p)
             (let-values (((p+ param ...)
                           (pop-values p #,(length plist))))
               (ns namespace
                   (let #,(for/list ((p (syntax->list #'(param ...))))
                            #`(#,p (program: ',#,p)))
                     (function (program: code ...) p+)))))))))

Entry: identifier -> source path
Date: Sun Apr 26 00:15:15 CEST 2009

From the PLT list:

  (define (definition-source id)
    (let ([binding (identifier-binding id)])
      (and (list? binding)
           (resolved-module-path-name
            (module-path-index-resolve (car binding))))))

So.. next problem: how to parse this in forth code? It would be great
if all code could be segmented first, then properly parsed one
definition at a time. This is a limitation of Forth syntax, which is a
consequence of its imperative nature.. So let's stick to the
imperative nature and solve it.

Entry: compile time is compile time
Date: Sun Apr 26 01:47:00 CEST 2009

Q: The problems I'm trying to solve at this moment are about run-time
values not matching across unit boundaries. What about avoiding these
run-time values and turning the macros into proper macros (executing
at scheme compile time)? Would that solve anything?
Really, I should find a simple test case first.. Rewriting the RPN
compiler was (is) fun and necessary for clarity, but units aren't
modules when it comes to macros..

The following example shows otherwise: values are properly shared, the
module is instantiated only once.  (eq? it1 it2) => #t

  ;; it.ss
  #lang scheme/base
  (provide it)
  (printf "instantiating it\n")
  (define it "it")

  ;; sig.ss
  #lang scheme/base
  (require scheme/unit)
  (provide sig1^ sig2^)
  (define-signature sig1^ (it1))
  (define-signature sig2^ (it2))

  ;; top.ss
  #lang scheme/base
  (require "unit1.ss" "unit2.ss" "sig.ss" scheme/unit)
  (define-syntax-rule (define/invoke (sig ...) (unit ...))
    (begin
      (define-compound-unit/infer combined@
        (import)
        (export sig ...)
        (link unit ...))
      (define-values/invoke-unit combined@
        (import)
        (export sig ...))))
  (define/invoke (sig1^ sig2^) (unit1@ unit2@))

  ;; unit1.ss
  #lang scheme/unit
  (require "sig.ss" "it.ss")
  (import)
  (export sig1^)
  (define it1 it)

  ;; unit2.ss
  #lang scheme/unit
  (require "sig.ss" "it.ss")
  (import)
  (export sig2^)
  (define it2 it)

So.. what is the problem then? Is it because values are somehow
wrapped? Is this related to the remark that eq? doesn't work across
unit boundaries? Maybe I should accept that the eq? is bad practice.
The assembler instance can't be used for pattern matching. Maybe it
should just match the name then..

Entry: Forth parsing
Date: Sun Apr 26 03:01:08 CEST 2009

The trouble is that instead of a list with a hole in it (the
dictionary's compile point = cursor), it would be nice to be able to
put holes in nested applications. This is exactly what the previous
implementation did. Instead of being able to process the whole locals
expression at once, it incrementally builds an inside-out structure,
where the outside is represented by a (stack of) wrapping procedures.

So, there are 2 options:

  * create a zipper-like dictionary for incremental compilation
  * do multi-pass parsing using delimiters

Some trick needs to be used.. The thing is this:

  : test | ;
  : | ;

What is this supposed to mean? Introducing the local variables kills
all hope of using either ; or : as a delimiter, since they could be
redefined.

Ha, looks like the Factor syntax changed too :)

Let's go look at Forth for inspiration. I found Gforth's syntax:

  { l1 l2 ... -- comment }

The problem is that I'm using non-local exits. I like them, and they
are used for both macros and Forth words. The only real solution is to
keep it like it is:

  * a new definition terminates the old one
  * locals installs a wrapper around the remainder of the code.
  * parsing words cannot be used as argument names,

the last one because they are not recognized as lexically bound during
parsing, only when the parsed source is expanded after passing it to
the dictionary compiler. this is also a good thing, since it would
otherwise allow redefinition of words like : as a local variable,
messing up the code structure.

it's probably best to have a separate 'locals tag that is recognized
by the -> scheme compiler. actually, this doesn't look all that bad:

  : foo | a b | b a 1 - -

  ->  ((name foo)
       (locals (a b))
       (fn a) (fn b) (im 1) (fn -) (fn -))

  1. tag source and execute parsing words
  2. compile tagged source to scheme (including some special forms)

now, can 'locals be implemented as a simple "cons" macro?

Entry: regular expressions
Date: Sun Apr 26 09:41:26 CEST 2009

Maybe i should just move on to regular expressions for all this lexing
activity (essentially "table parsers"). It's ok to write one state
machine, but it gets old quite fast..
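For the simple whitespace-splitting part, PLT's built-in regexps
already go a long way; a minimal sketch (tokenize is a made-up name,
and real lexing would also want source locations):

  (define (tokenize str)
    (filter (lambda (s) (positive? (string-length s)))
            (regexp-split #px"[ \t\r\n]+" str)))

  (tokenize ": foo 1 2 + ;")
  ;; => (":" "foo" "1" "2" "+" ";")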
Entry: sharing data between macros
Date: Sun Apr 26 09:59:19 CEST 2009

Problem: individual rpn-transformers should be able to tag words with
semantics just like the core transformer does. One thing is to always
pass the semantics macro environment around in a hash table or so. But
i wonder if it isn't a lot simpler to re-resort to parameters, in this
case syntax parameters.

Maybe I should learn to take a hint and see the use of a literal 'name
symbol as exactly the problem I'm trying to solve. The parser is
mainly about translating Forth code to simple s-expressions (in the
form of tagged atoms) so it is trivial to parse later _with_
information provided by the parsing words themselves.

  (: abc 123 +)

  -> ((name abc) (lit 123) (apply +))

  -> (name abc ((lit 123) (apply +)))

Hmm.. i lost perspective a bit.. It looks like there's a very simple
and elegant solution hidden somewhere, but I can't find it. The clue
seems to be "fold": there is a data structure parsed up to a certain
point; how does the next token add to this?

A new slogan: Forth parsing = fold.  To be continued..

Entry: ns -> rename transformer
Date: Sun Apr 26 10:05:46 CEST 2009

not using this is probably the reason why syntax checking won't work
for lexical variables in scat quotations.

hmm.. not really what i thought.

Entry: parser idea
Date: Sun Apr 26 10:42:44 CEST 2009

The thing is: what if rpn-parse does _not_ translate identifiers if it
deems they are not transformer bindings? The idea being that parsing
words only serve to translate a flat file into forth syntax,
annotating each atom with a compiler semantics.

There is really no way around using some kind of cursor into the final
data structure = the scheme expression.

An empty dictionary is equivalent to:

  (begin (lambda (p) p))

With the cursor made explicit:

  (begin (lambda (p) (C p)))

compiling 123:

  (begin (lambda (p) (C (push 123 p))))

compiling a reference to a in ns (foo):

  (begin (lambda (p) (C (apply foo/a (push 123 p)))))

compiling the name tag 'bar in namespace (foo):

  (begin (lambda (p) (apply foo/a (push 123 p)))
         (define foo/bar (lambda (p) (C p))))

now, compiling the locals '(a b)'

I find it so incredibly difficult to switch between the nested lambda
representation and a CPS/monadic style representation. But, in words,
what does the locals transform do? It transforms the current lambda
expression:

  - it replaces it with a new lambda expression that applies the
    expression collected up to now to its input stack.
  - it pops values off the stack and binds them lexically to constant
    functions.
  - it expands the rest of the code inside this new context.

  (before ...) (local ...) (after ...)

  (lambda (p)
    (let ((p-in (apply (program: before ...) p)))
      (let-values (((p-popped local ...) (pop-stack p-in)))
        (apply (program: after ...) p-popped))))

How does this translate to a flat representation?

  - grabbing the (before ...) list isn't that difficult: it's already
    in the dictionary.
  - closing it needs access to its closing procedure.

Entry: zipper dictionary
Date: Sun Apr 26 12:39:23 CEST 2009

So I abstracted the pattern into an abstract data structure. The key
seems to be that instead of postponing the "collapse" operation to a
postprocessing step, it should be accessible to users of the library,
so the current entry can be collapsed and tagged.

Let's apply this to locals to see what's necessary:

  - grab the current code and return it as an expression
  - install a new collapse routine, which uses the default collapse as
    an embedded operation.
So the dictionary needs to store the default collapse.

It's probably best to make sure the internal state of the dictionary
is never observable: users of the dictionary can only put stuff in,
never take it out. The semantics function (packer) itself should be
observable, just like the result of its application to the current
entry.

  ;; Locals. Transform object + semantics into new semantics.
  (define (locals-obj+pack->pack obj pack)
    (lambda (instructions)
      `(lambda (p)
         (let ((p+ (apply ,obj p)))
           (let ((top (car p+))
                 (p++ (cdr p+)))
             (apply ,(pack instructions) p++))))))

It looks like this works. Time to use it in the rpn parser.

First, remove code tagging. That's about how the structure is used,
and it can be hidden in the semantics.

Wait: nested locals. It should have a proper semantics.

Hmm... it doesn't do what I expect. Using the default semantics will
not nest the locals properly (earlier ones are no longer visible).
Using the current replaced semantics will apply the first program
twice.

Using the current semantics:

  (lambda (p)
    (let ((p+ (apply
               (lambda (p)
                 (let ((p+ (apply (program: 10000) p)))
                   (let ((*OUTER* (car p+))
                         (p++ (cdr p+)))
                     (apply (program: 20000) p++))))
               p)))
      (let ((*INNER* (car p+))
            (p++ (cdr p+)))
        (apply
         (lambda (p)
           (let ((p+ (apply (program: 10000) p)))
             (let ((*OUTER* (car p+))
                   (p++ (cdr p+)))
               (apply (program: 30000) p++))))
         p++))))

This is not correct, as (program: 10000) gets applied twice. To make
sure, let's alpha-convert:

  (lambda (p0)
    (let ((p1 (apply
               (lambda (p2)
                 (let ((p3 (apply (program: 10000) p2)))
                   (let ((*OUTER* (car p3))
                         (p4 (cdr p3)))
                     (apply (program: 20000) p4))))
               p0)))
      (let ((*INNER* (car p1))
            (p5 (cdr p1)))
        (apply
         (lambda (p6)
           (let ((p7 (apply (program: 10000) p6)))
             (let ((*OUTER* (car p7))
                   (p8 (cdr p7)))
               (apply (program: 30000) p8))))
         p5))))

Damn.. i miss the intuition and i can't make the algebra make sense..
I'm probably confusing two things. The problem is that normal code
nesting reverses the order, but locals nesting is the same order as in
scheme.

So.. Is it possible to somehow change the code so that (a b c)
corresponds to (a (b (c _))) instead of (c (b (a _)))? Yes.. It's CPS.

So, maybe I should use cps.. When I do that, I do want to know how
well things get optimized away though, to make sure the representation
is a bit decent.

Entry: CPS + optimization
Date: Sun Apr 26 16:41:40 CEST 2009

http://docs.plt-scheme.org/mzc/decompile.html

Using:

  mzc -vk comp.ss ; mzc --decompile compiled/comp_ss.zo

comp.ss :

  (define (inc x k) (k (+ 1 x)))
  (define (inc3_ x k)
    (inc x (lambda (x)
      (inc x (lambda (x)
        (inc x k))))))

Inspecting the output gives something like this:

  (define (inc3 p k)
    (let* ((p1 (+ '1 p))
           (p2 (+ '1 p1)))
      (k (+ '1 p2))))

So it does look like this rep might be valuable as an abstraction
mechanism. Let's see if it still works with more complicated
functions.

  (define (add p k)
    (k (cons (+ (car p) (cadr p)) (cddr p))))
  (define (add4 p k)
    (add p (lambda (p1)
      (add p1 (lambda (p2)
        (add p2 k))))))

  =>

  (define (add4 p k)
    (let* ((p1 (cons (+ (car p) (cadr p)) (cddr p)))
           (p2 (cons (+ (car p1) (cadr p1)) (cddr p1))))
      (k (cons (+ (car p2) (cadr p2)) (cddr p2)))))

So the function gets inlined, and applied closures are converted to
let expressions. Now, instead of using cps, it might be easier to put
the compositions in let form directly. The goal eventually is to add
other let bindings to a list!
So, the let*-transformation works like:

  (a b c) ->

  (lambda (p)
    (let* ((p (a p))
           (p (b p))
           (p (c p)))
      p))

Or better, its nested form, which works better with other
let-transformers, which can be spliced right in. This then makes the
locals problem completely trivial.

  (lambda (p)
    (let ((p (a p)))
      (let ((p (b p)))
        (let ((p (c p)))
          p))))

So, let's switch the rpn language to a "nested let" representation
(EDIT: administrative normal form[1]). This is quite trivial:

  ... foldl ... #`(apply atom #,expr)
  ->
  ... foldr ... #`(let ((p (apply atom p))) #,expr)

Tests pass without trouble. Ha! for once i have the feeling i know
what i'm doing!

Entry: zipper-dict
Date: Sun Apr 26 18:45:49 CEST 2009

After the CPS / nested-let story I forgot about the zipper dict. But
is it really necessary? The semantics is no longer needed: the problem
is solved with nested reps -- it was only used for locals.

So, I've closed the abstraction: there is now only:

  d-open     ;; create a new dictionary
  d-compile  ;; compile an instruction
  d-start    ;; start a new definition
  d->forms   ;; close dictionary and output forms in order

[1] http://en.wikipedia.org/wiki/Administrative_normal_form

Entry: the forth parser
Date: Sun Apr 26 21:13:58 CEST 2009

now need to fix the forth parsers one by one. the running state might
be problematic though:

  - current mode
  - toplevel forms

maybe use a syntax-parameter for this?

Entry: more forth parsing
Date: Mon Apr 27 13:15:42 CEST 2009

fixed the 'include' parser this morning. this means i'm done, except
for the parsing state, which will probably need a parameter. maybe
it's best to use the same trick as in the include parser: install a
dynamic context and continue the parser, then at the end collect all
data. they can probably be written in terms of each other..

Entry: dictionary "extra" state
Date: Mon Apr 27 16:20:16 CEST 2009

It's not really elegant to augment the purely functional parser with
parameters and imperative stacks. However, it is quite isolated and
should be fixable by making the dictionary object itself extensible.
The prime objective now is to get it to work again. That last piece of
cosmetics is for later.

I do wonder though how this would be solved using monads: if you have
a piece of threaded state, how do you extend it?

The current implementation confuses me. The dictionary probably needs
type tags: the semantics for compiling an entry: lambda-tx should be
passed in!

Ok. This works + i made testing a bit easier. test-pic18.ss works
again too.

Entry: forth bootstrap with new rpn parser
Date: Mon Apr 27 18:14:50 CEST 2009

with the new parser infrastructure it should be quite straightforward
to bootstrap a standalone forth compiler: map the dictionary to the
machine VM semantics. as long as the Forth code defining the
interpreter doesn't use a feedback loop to define lexing/parsing/macro
words, this should work just fine.

Entry: fixing pic18 forth parsing
Date: Mon Apr 27 18:52:17 CEST 2009

next: need to fix the parser to make embedding toplevel scheme
expressions work, and to allow some state to be maintained during
parsing.

actually.. it's not that incredibly difficult: just add semantics to
the name -> this tells how it should be defined. toplevel forms can
then just be added as anonymous definers. or better yet, simply use a
list of macros, which means the form can be executed immediately once
it is done. i.e.

  (register-rpn name rpn-lambda code ...)

to make require forms work better, they could be expanded in-line.
Let's forget about arbitrary scheme forms, but let's have a good look
at how require can be made to import syntax. The problem until now has
always been that while it was possible to add require forms in the
module body, the forth syntax would already be completely expanded
before these require forms could introduce new bindings..

What is necessary: whenever a 'require form is encountered, that
module should be instantiated such that its syntax bindings are
visible.

Anyways. Let's first make sure that a dictionary can be expanded to
any kind of top-level form. Ok. For 'rpn-begin the dictionary compiler
is just 'begin, so any expression can be inserted. Let's fix this
first for scheme forms.

Entry: prefix macros : term rewriting in Forth code
Date: Mon Apr 27 23:43:05 CEST 2009

Basic prefix macros are working; they behave exactly the same as
Scheme syntax-rules macros.

  (define-syntax-rule (rpn-syntax-rules (literal ...)
                                        ((pattern ...) (template ...))
                                        ...)
    (make-rpn-transformer
     (lambda (w d k)
       (syntax-case w (literal ...)
         ((pattern ... . w+)
          (k (syntax->list #`(template ... . w+)) d))
         ...))))

Entry: next
Date: Tue Apr 28 01:07:38 CEST 2009

TODO:
  - current mode
  - source location

Current mode can copy the mode from the last dictionary item. Source
location requires an updated rpn-parse, and can be added later.

Entry: top level coma/forth parser
Date: Tue Apr 28 09:49:24 CEST 2009

This needs a new name. Actually, it should be part of "forth" since
what it actually does is multiplex 3 semantics in one file: variable,
word and (concatenative) macro.

Entry: cleanup
Date: Tue Apr 28 10:31:33 CEST 2009

salvaged from rpn/cps.ss :

  (define-syntax (cps stx)
    (let ((cps-fns (cdr (syntax->list stx))))
      #`(lambda (k)
          #,(foldr (lambda (fn k)
                     #`(lambda (p) (#,fn p #,k)))
                   #'k cps-fns))))

  (define-syntax (cps-let stx)
    (let ((cps-fns (cdr (syntax->list stx))))
      #`(lambda (p)
          #,(foldr (lambda (fn sub)
                     #`(let ((p (#,fn p))) #,sub))
                   #'p cps-fns))))

  (define-syntax-rule (macro form)
    (syntax->datum (expand-once #'form)))

  (check (macro (cps a b c))
         => '(lambda (k)
               (lambda (p)
                 (a p (lambda (p)
                        (b p (lambda (p)
                               (c p k))))))))

  (check (macro (cps-let a b c))
         => '(lambda (p)
               (let ((p (a p)))
                 (let ((p (b p)))
                   (let ((p (c p)))
                     p)))))

Entry: instantiation
Date: Tue Apr 28 10:43:12 CEST 2009

What is necessary first is an instantiation syntax that's independent
of Forth syntax. Then later translate Forth to it using the new
parser.

ok: basic macros are in place + "mode" works by inspection of the last
dictionary element.

Entry: require + define-syntax
Date: Tue Apr 28 13:44:38 CEST 2009

Now, when 'require and 'define-syntax are encountered in a dictionary,
it is probably best to expand them before parsing the rest of the
code. This can be done by generating a recursive call to the top begin
form.

I'm happy with this new representation: things are much easier to
express and the problem actually looks simple now -- as it should,
since it is already solved. It really _looks_ like a forth compiler
too now, with the only exception that all mutation is replaced with
some functional counterpart.

Entry: what is what..
Date: Tue Apr 28 15:30:21 CEST 2009

now i'd like to be able to parameterize the forth so it doesn't depend
on macro: and (macro) but that doesn't seem to be possible. so what is
it part of? it's more like an emerging thing..

  rpn = forth-style dictionary compiler used to implement pure
        concatenative languages in s-expression syntax and forth
        prefix style parsing.
  comp = macro instantiation + post-proc opti

since instantiation is a big part of what comp is, the forth should be
part of comp.

Entry: target:
Date: Tue Apr 28 16:09:30 CEST 2009

Forgot about that one..  (How come I only now see it not compiling?
something wrong with deps..)

This is part of the live.ss stuff and broken now. Let's focus on
pic18.ss = just the compiler.

Entry: separate parser namespace
Date: Tue Apr 28 16:22:17 CEST 2009

maybe the prefix parsers should live in a separate namespace? the
problem however is that they are bound to primitive semantics macros.
but these could be parameterized.

well, i'm not 100% convinced this will work with units due to the
amount of syntax juggling. let's keep it as it is. it's always
possible to define a non-hygienic parser macro. in fact i'm already in
trouble! now i'm really confused.. time to give it some rest.

Entry: tscat:
Date: Wed Apr 29 10:23:57 CEST 2009

This uses some special trick in the name mapping. Will the simpler
infrastructure be able to handle it? The problem is this:

  (define (map-id id)
    (let ((target (ns-prefixed #'(target) id))
          (macro  (ns-prefixed #'(macro) id)))
      (cond
       ((identifier-binding target) target)
       (else #`(target-simulated #,macro)))))

Name mapping in the new rpn code cannot be made procedural. To get the
same functionality, all macros need to be wrapped in the target space
as well.

So, instead of seeing a namespace as a collection of objects, what
about seeing it as a _language_ first, with the objects being part of
the implementation. So, add a macro to map from (macro) -> (target)
for each target word.

No.. The cleanest solution is to make the namespace syntax functional
by setting

  (define-syntax-rule (macro . form) (ns (macro) . form))

and figuring out how to get to the local identifiers. Let's try in rpn
first.

Entry: namespace mapping
Date: Wed Apr 29 11:42:49 CEST 2009

I'd like to expand namespaces to something more abstract: a generic
identifier mapping mechanism. This is to implement the 'target: macro
which takes elements from a different namespace if they are not
transformer bindings.

The problem is that I can't seem to figure out how "local-expand" can
be used to turn the abstract mechanism back into a concrete identifier
mapping, so rpn-transformer instances can be looked up at compile time
through syntax bindings.

Now, to keep a bit of sanity, I found that staying away from deep
macro system internals is generally a good idea. There's a reason why
these things are hidden: they are quite complicated. So, I'm going to
adopt the following convention:

  * At compile time, (namespace ... id) is directly interpreted as
    (ns (namespace ...) id) and cannot be overridden.
  * For run time identifiers the form (namespace ... id) can have
    arbitrary meaning.

So later, when I figure out how to properly implement this, it could
be changed.

Entry: units and macros
Date: Wed Apr 29 18:00:49 CEST 2009

There is one thing i didn't understand when starting with units: it is
not so straightforward to use transformer bindings. Maybe it isn't the
abstraction I'm looking for after all. It's a bit obvious that you
can't separately compile things and later fill in compile-time
dependencies.

I'm missing some intuition again. Where do I put the parsing words and
the type info? I need another abstraction. Now, the wordlists were a
good idea. Can they be combined with some other macro-based linking
form?..
After reading this[1] again: "Each id in a signature declaration means that a unit implementing the signature must supply a variable definition for the id. That is, id is available for use in units importing the signature, and id must be defined by units exporting the signature. "Each define-syntaxes form in a signature declaration introduces a macro that is available for use in any unit that imports the signature. Free variables in the definition’s expr refer to other identifiers in the signature first, or the context of the define-signature form if the signature does not include the identifier. It seems quite clear that, yes, units only link _variable_ declarations, and inside the unit's body it is possible to use syntax depending on some of the variable bindings, but this syntax needs to be tied to the signature's compile time data. Now, how does this relate to parsing words? When are they necessary? Only in .f code. S-expressions don't need them, as they can use pure concatenative code. So.. * The basic composition part in Staapl is still the module, but it is possible to create modules from units. * For ops the signature contains compile-time data for type checking so this problem should be solved. * Abstracting macros becomes difficult. [1] http://docs.plt-scheme.org/reference/creatingunits.html Entry: Abstracting parsing words. Date: Wed Apr 29 18:29:37 CEST 2009 Let's try to abstract parsing words in units too. Since they are always written in terms of scat words (which are values), units _can_ be used. However, it does seem there is no way to then get at this syntax by requiring a module.. Only by importing the signature. That seems to answer the question: * forth files are units, because they need _syntax_ which depends on functionality provided by other units. * all parsing words are part of signatures if they depend on external functionality. So, I'm confident I can find some way to organize all code, so let's start with where I left off wrt. the operator signatures, and work up from there. Entry: ops and op matching Date: Wed Apr 29 19:07:44 CEST 2009 This is a tough one.. What I'm really trying to do is to: * Check syntax of pattern matching at compile time. This is currently only existence of op + arity. * Implement matching at run-time (using tagging). * Associate the op instance with semantics. (avoid symbolic lookup / interpretation here). I'm currently confusing tags with implementation instances: it is possible to have an instance (a tagged list) without associated semantics, or to have multiple instances of the same type (op) with different semantics. So basically * _matching_ should just check the symbol, not the semantics. since we're taking the instruction apart, we really don't care about what it would do if it were still there. * _construction_ needs associated semantics So matching needs: * compile time data * type predicate Construction needs: * compile time data * constructor This allows the insertion of operations that are not known except for the local scope, but will properly assemble to machine code. That's good. Let's use (op info) and (op predicate). Entry: namespaces are properties Date: Wed Apr 29 19:26:34 CEST 2009 Actually it's much more reasonable to use the namespaces like this: (ns (op addwf info)) ;; compile time information (ns (op addwf ?)) ;; run time predicate (ns (op addwf asm)) ;; run time semantics (ns (op addwf dasm)) ;; inverse of semantics Then they ring like properties of addwf.
There's a complication though: current implementation takes the lexical context from the last element, so this won't work without extra syntax to indicate from what context you want to pick a symbol... So it becomes: (op info addwf) (op ? addwf) (op asm addwf) (op dasm addwf) (Note: row store vs. column store) Entry: macro types Date: Wed Apr 29 19:40:25 CEST 2009 But but! When I want to add types to macros, units no longer suffice! Or.. macro types should then again be part of the signature. Which makes sense: it's ok to use a different _implementation_ but the _specification_ (of which the I->O type is a part) should be static. So again units seem to be the proper abstraction. Entry: from dynamic to static Date: Wed Apr 29 20:47:27 CEST 2009 So basically a certain instance of the asm struct behaves as a parameterized type. The name tag determines the main variant, but the implementation can be different. It's a pain in some way to go through it like this, but it _is_ all starting to make more sense. Static typing is hard.. The help the compiler gives doesn't come free! It will probably take a couple of iterations but it looks like i'm at least going to end up somewhere. Entry: forth signature Date: Wed Apr 29 22:27:06 CEST 2009 The deal is now: I have a file full of macro definitions, all in terms of instantiate^ with as much as possible of the compile-time code factored out in a separate module. This should then be part of the instantiate^ signature, right? Alternatively it can be instantiated outside of signatures. Probably best with (extends instantiate^) Now, before writing a Forth-only form for instantiation, let's first concentrate on an s-expression. (forth-variables (name ...)) (forth-words (name code ...)) (forth-macros (name code ...)) ;; Scheme forms for instantiation. Maybe these can better be ;; avoided.. It's probably simpler to define in terms of the wrap-xxx ;; functions directly. (define-syntax-rule (forth-macros (name code ...) ...) (begin (define-macro wrap-macro name code ...) ...)) (define-syntax-rule (forth-words (name code ...) ...) (begin (define-forth wrap-word name code ...) ...)) (define-syntax-rule (forth-variables (name ...)) (begin (define-forth wrap-variable name 1 allot) ...)) Entry: the signature Date: Thu Apr 30 09:06:19 CEST 2009 This probably needs a macro to not have to use define-syntaxes only. Entry: levels again.. Date: Thu Apr 30 10:24:31 CEST 2009 When I require the module, the rpn-syntax-rules form works, but the prefix-parsers rule doesn't work. (define-syntax-rule (rpn-syntax-rules (literal ...) ((pattern ...) (template ...)) ...) (make-rpn-transformer (lambda (w d k) (syntax-case w (literal ...) ((pattern ... . w+) (k (syntax->list #`(template ... . w+)) d)) ...)))) (define-syntax-rule (prefix-parsers namespace ((name arg ...) template) ...) (ns namespace (define-syntaxes (name ...) (values (rpn-syntax-rules () ((_ arg ...) template)) ...)))) This is because it is used at compile time. So 'rpn-syntax-rules will probably need to be defined in parser-tx.ss. In addition, the file that defines prefix-parsers needs to (require (for-syntax "parse-tx.ss")) Entry: postprocessing macro using expand-to-top-form Date: Thu Apr 30 11:10:51 CEST 2009 Maybe I should give it a try again, to make ns-tx a postprocessing macro. expand-to-top-form The problem is: expansion needs to stop before any bindings are introduced in body code. It looks like expand-syntax-to-top-form is necessary since the lexical environment needs to be left intact.
Maybe do it in this way: use expand-syntax to figure out which identifiers are in binding position so they can be mapped. OK.. I find it strange why I can't get this to work (won't match): (syntax-case top-form (define-values) ((define-values (name ...) expr) #`(#,(datum->syntax stx 'define-values) #,(prefixed-list #'(name ...)) expr)) And have to resort to something like this: (syntax-case top-form () ((form (name ...) expr) (form? 'define-values) #`(#,(datum->syntax stx 'define-values) #,(prefixed-list #'(name ...)) expr)) The datum->syntax takes 'define-values from the caller's context.. Maybe that's not a good idea, and it should be our context. The expanded form's tag itself is not visible in the caller's context, so we can't just re-insert it. Hmmm.. I got it to work, but I don't understand why 'define still shows up.. Thought that was not a primitive form? (-> define-values) Anyways, where was I. define-signature for forth parsers... this won't work with expand-to-top-form in ns-tx.. probably best to do it in a separate macro. Entry: sometimes it doesn't expand Date: Thu Apr 30 14:34:45 CEST 2009 This won't expand ns forms unless there is a (require (for-syntax "ns.ss")) in the definition place of this macro. Expansion happens in the transformer environment, so it needs bindings for the forms. ;; Collects all syntax definitions. (define-syntax (define-signature-begin stx) (define (tx-forms forms) (for/list ((form (in-stx forms))) (if (identifier? form) form (let ((top-form (expand-syntax-to-top-form form))) ;; (pretty-print (syntax->datum top-form)) (syntax-case top-form () ((form names expr) (and (eq? 'define-syntaxes (syntax->datum #'form))) #`(define-syntaxes names expr))))))) (syntax-case stx (extends) ((_ id^ . forms) #`(define-signature id^ #,(tx-forms #'forms))) ((_ id^ extends id-super^ . forms) #`(define-signature id^ extends id-super^ #,(tx-forms #'forms))))) Entry: syntax signatures Date: Thu Apr 30 14:56:51 CEST 2009 now how to get the syntax out of the signature, back into a module namespace? you can't: it only makes sense inside a unit, because it binds unit imports. as a consequence, .f files can never be modules since their most basic syntax requires syntax transformers depending on the compiler instantiation code. as a result, each compiled .f file then needs to be linked with the compiler to be able to produce code. i'm still not convinced.. need to sleep on it.. anyways, apart from the "is it possible" thing here, it's probably enough (and better) to have units only for .f files. so next: create a unit in s-expr syntax. ok.. i don't understand it.. maybe it was a bad idea to put the forth syntax in a unit. let's try to use macro-defining macros. Entry: units vs. modules Date: Thu Apr 30 15:38:14 CEST 2009 * modules serve to properly handle macros: they provide bottom-up language design. * units serve to identify separately compiled components. Entry: just cut & paste Date: Thu Apr 30 17:13:49 CEST 2009 i can't figure out how to abstract it: too many levels at once make my head hurt.. so let's first try to get the bugs out and then try again. ;; TEST (macro-forth-begin : abc 1 + : def 1 +) (define cfg (pic18-compile->cfg (list inline/abc inline/def))) (print-target-word (car cfg)) This creates a CFG. So it looks like the full circle works.. Now clean it up. Entry: code collection Date: Fri May 1 09:31:09 CEST 2009 Is trivial once the registration macro gets passed to the definer words.
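For the record, a toy standalone version of the registration idea (plain Scheme, made-up names; the real definer words are the wrap-xxx forms and the register! macro above): a definer form both binds an identifier and records the definition in a global list which an instantiation step can traverse later.

    (define *code* '())   ;; collected definitions, newest first
    (define (register-word! name val)
      (set! *code* (cons (cons name val) *code*)))

    ;; Definer word: normal binding plus registration as a side effect.
    (define-syntax-rule (define-word name expr)
      (begin
        (define name expr)
        (register-word! 'name name)))

    (define-word abc (lambda (x) (+ x 1)))
    (define-word def (lambda (x) (* x 2)))

    ;; "Instantiation" can now walk the collected code in definition order.
    (for-each (lambda (entry) (printf "word: ~a\n" (car entry)))
              (reverse *code*))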
Entry: expand-to-top-form again Date: Fri May 1 12:27:04 CEST 2009 I suspect I have another problem with expand-to-top-form.. The identifier 'new-lambda' shows up in a place i don't understand. It's not one of mine. In the PLT code i find this: tom@zzz:/plt/collects$ grep -Ire 'new-lambda' * scheme/private/kw.ss: (#%provide new-lambda new-λ scheme/private/kw.ss: (define-syntaxes (new-lambda new-λ) scheme/private/kw.ss: (let ([new-lambda scheme/private/kw.ss: (values new-lambda new-lambda))) scheme/private/kw.ss: (normalize-definition stx #'new-lambda #t #t)]) scheme/private/pre-base.ss: (rename new-lambda lambda) Ok. After restoring the old ns-tx all works fine. Entry: local Date: Fri May 1 13:02:03 CEST 2009 A small tangent. With (require scheme/local) : box> (local [(pic18-begin : abc 123)] foo) coma/macro-forth.ss:50:7: local: not a definition at: (register! inline) in: (local ((pic18-begin : abc 123)) foo) It would be nice to restore that to be able to perform isolated parsing. Probably needs a separate 'let form. Entry: jump^ Date: Fri May 1 17:07:17 CEST 2009 I'd like to override jw/false in PIC18 but it is already exported in the compiler. The reason being that PIC18 performs a lot of optimization on jumps and bit tests (about the only thing the PIC18 is good at). But, there's a conflict here: should this optimization be done as a 2nd pass optimization, or can we do it at once? + is this second pass necessary (or should the first pass do opti?) Entry: things to fix Date: Fri May 1 18:23:42 CEST 2009 - pic18 conditional jump optimization - pic18 org - pic18 full dictionary export - CFG without mutation - compiler cleanup - local exit without parameter Most of these need to be done with test enabled, which means the cjump has priority. Entry: jw/false Date: Sat May 2 11:00:30 CEST 2009 The simplest solution is to take it out of compiler-unit.ss and jump^. Now, delegation to run-time words won't work any more. So how to implement this: ?? (([qw l] jw/false) (macro: ~>z ,(insert `([bpz 0 ,l])))) ;; STUB (([qw l] jw/false) ([decf WREG 0 0] [drop] [bpc 0 l])))) ;; STUB Entry: what is a .fm ? Date: Sun May 3 09:06:05 CEST 2009 In other words: should it already export compiled code or leave that to the loader? I'm in favour of keeping modules at the macro level, and leaving instantiation to the client. But first: * instantiation checks * staaplc Ok. After fixing a bug, instantiation works for the 452-40.f example. Not checked yet if it produces the same code as before. So.. A forth module. How to make it more declarative. The problem with the global state is that multiple modules will use it to attach to. I'd like to get rid of this final bit of state. Problems: - If modules are made to export a symbol, this will clash. - If 2 modules use a 3rd, how will instantiation work? The latter is probably the key: as long as code is not doubly instantiated we're ok. Fixed some bugs, now it's clear to me how it should work: it's ok to instantiate code on require, as long as * the original postponed macro stack is left intact * the cfg's are collected in sequence Instantiator can get at the code by requiring the correct module.
Entry: trouble with locals probably Date: Sun May 3 10:00:59 CEST 2009 Some matching error somewhere. This works: (state-pop (make-state:stack `((,op/asm/qw 123))) 1 op/?/qw) This too: (macro-pop (make-state:stack `((,op/asm/qw 123) (,op/asm/qw 123))) 1) Ah, no: it returns the original stack: 123 #(struct:stack # ((#(struct:asm # qw) 123) (#(struct:asm # qw) 123))) looks like a prototype mismatch: stack is expected in first position but given last. ok fixed. Entry: parser macros again Date: Sun May 3 11:19:45 CEST 2009 Hmm.. because 'expand uses the 'forth-begin linked to the compiler implementation, instantiating the forth words like it's done now won't work unless the dependencies are propagated through: (define-forth ;; defines parser invocation from scheme + Forth parser words ... (forth-begin forth macro : :macro :forth :variable) ;; ... in terms of registration and compiler wrappers. (register! wrap-macro wrap-word wrap-variable)) So I'm going to make this macro non-hygienic to solve the dependency problem. This seems to work: (begin-for-syntax (define $ syntax-local-introduce)) ;; Primitive parsing word transformers coupled to compilation forms. (define-syntax (define-forth stx) (syntax-case stx () ((_ (forth macro : :macro :forth :variable) (reg wrap-macro wrap-word wrap-variable)) #`(begin (define-syntax-rule (#,($ #'forth-begin) . code) (forth-begin/init (forth-word reg wrap-word #f rpn-lambda) . code)) ;; (*) (ns (macro) (define-syntax :macro (with-mode #'macro-word #'reg #'wrap-macro))) (ns (macro) (define-syntax :forth (with-mode #'forth-word #'reg #'wrap-word))) (ns (macro) (define-syntax :variable (with-mode #'forth-word #'reg #'wrap-variable))) (ns (macro) (define-syntax : (last-mode #'reg #'forth-word #'wrap-word #'macro-word #'wrap-macro))) (prefix-parsers (macro) ((forth) (:forth #f)) ((macro) (:macro #f))))))) Now to be honest I'm getting confused by the levels of quoting / unquoting here. That code is derived by successive transformation of correct code, but I've lost the intuitive understanding. Maybe by introducing the identifiers once this can be made more readable. Ok: this abstracts it: ;; Non-hygienically introduce a collection of identifiers. (define (datum->syntax-list stx lst) (map (lambda (x) (datum->syntax stx x)) lst)) (define-syntax-rule (syntax-introduce-identifiers stx lst body) (syntax-case (datum->syntax-list stx 'lst) () (lst body))) Entry: require + recursive expand Date: Sun May 3 12:58:15 CEST 2009 This works now. Care needs to be taken to preserve the proper context for 'require forms though: (define-syntax (require-file-id stx) (syntax-case stx () ((_ id) (datum->syntax #'id ;; Note that the whole form should have the caller's context. `(require (file ,(path->string (stx->path #'id)))))))) Entry: compiler Date: Sun May 3 14:07:42 CEST 2009 So, with modules working, shouldn't compilation become trivial? Yes for module -> binary, but there's also the .dict file which requires reflection. I wonder what 'init-namespace did.. Let's make compilation only work for modules. This way there are never any missing identifiers. (require "app/452-40.fm" (planet zwizwa/staapl/code) ;; code->binary (planet zwizwa/staapl/port/ihex)) ;; write-ihex (write-ihex (code->binary)) This produces almost correct code: diff 452-40.hex /tmp/test.hex 2c2,3 < :0E00000000260F0E000181000FC00FE00F4020 --- > :10000000000026000F000E0000000100810000002B > :0C0010000F00C0000F00E0000F004000D7 The difference is in ',' performing word compile instead of byte compile. fixed.
Entry: recursive expansion Date: Sun May 3 15:41:43 CEST 2009 messes up all parameter context, so "load" won't work properly. the solution is of course to dump out the context in the ..-begin continuation. Entry: minor problems to fix Date: Sun May 3 17:04:39 CEST 2009 - context for recursive expansion (maybe save context in the dictionary in the first place?) - staaplc But first something more inspiring. Abstracting some of the parser words into a simpler syntax. The CPS and explicit CDR on the stream make it hard to read. -> ok, but not many chances to use it.. Entry: staaplc Date: Sun May 3 19:01:02 CEST 2009 Simplified it: - the file needs to be a module, which means it knows its own language and can be a scheme module which exports binary code. the most important change is that a module can now be compiled before it is instantiated: enabling static checks (like identifier bindings) to happen at this stage. this solves the problem of not knowing where undefined identifiers are referenced. - no more connection to the programmer or device. staaplc is offline only. all the live code is in staapl/live.ss Problems: - some reflection issues : solved - simulation : fixed (problem with exporting base language namespace) - disassembler : mostly fixed (just add byte addressing) - macro evaluator start state disassembler needs to be coupled to the _instance_ of the assembler. currently only the forms get built together. maybe the disassembler form should be a curried expression that can produce the disassembler. done Entry: prj Date: Sun May 3 20:10:08 CEST 2009 now i forgot: what was this prj thing about? multiple namespaces i think.. i'm going to leave it in, but it's currently not used because everything is module-based now. the only interesting code that i can remember is sharing code for running multiple instances with the same core compiler instance. Entry: disassembler Date: Sun May 3 20:46:26 CEST 2009 what about turning disassembly into a reflective operation: query the namespace instead of building state. DONE: define-dasm-collection but conceptually what is the difference with collecting global code? Entry: macro exit without parameters Date: Mon May 4 01:09:04 CEST 2009 Seems to be quite straightforward. Removed the parameter and replaced it with this function: ;; The ";" word inspects the macro return stack. If there's context, ;; execute mexit. Otherwise we're in straight line code and can ;; execute primitive-exit. (define (semi state) (if (null? (compiler-rs state)) ((ns (macro) primitive-exit) state) ((ns (macro) mexit) state))) Entry: compiler-unit.ss cleanup Date: Mon May 4 01:26:36 CEST 2009 Now.. using the machine.ss notation i worked on before, clean up the compiler-unit.ss file so it's actually readable. Entry: non-splitting return Date: Mon May 4 01:35:15 CEST 2009 For constructing jump tables I now use the word '.' instead of '.,' while ';' will return and mark the following code non-reachable. Entry: using machine/vm-stx.ss for compiler Date: Mon May 4 08:51:36 CEST 2009 In [1] there is some explanation of the syntax. The guts of the syntax transformer are in the function 'machine-nf which translates a specification syntax to a normal form given a list and order of registers. (syntax->datum (machine-nf '(A B C) #'((A) (B -> (cons A B))))) => ((A A A) (B B (cons A B)) (C C C)) ;; Convert machine definition form to a symbol-indexed dictionary. ;; Use hash table for usage marking and duplicate checks.
(define (form->clauses form) (define hash (make-hash)) (for-each (lambda (clause) (match (syntax->list clause) ((list-rest name expr) (let ((key (syntax->datum name))) (when (hash-ref hash key (lambda () #f)) (raise-syntax-error 'duplicate-name "Form contains duplicate name" clause name)) (hash-set! hash key clause))))) (syntax->list form)) hash) (define (clauses-ref/mark-defined! clauses r) ;; Hygienically introduce default (identifier not reachable from body code). (define (default) (list (datum->syntax #f r))) (let ((clause (hash-ref clauses r default))) ;; Mark it used. (hash-set! clauses r #f) clause)) (define (clauses-check-undefined dict) (hash-map dict (lambda (key notused) (when notused (raise-syntax-error 'undefined-register "Undefined register" notused (datum->syntax notused key) ))))) ;; Convert machine definition clauses to normal form, completing ;; clauses if necessary, and sorting them in the correct order. (define (machine-nf registers stx) (let* ((dict (form->clauses stx)) (nf (datum->syntax stx (for/list ((r registers)) (syntax-case (clauses-ref/mark-defined! dict r) () ;; Annotated syntax. This makes it easier to use the same ;; language for clauses with and without pattern matching. ((reg -> expr) #`(reg reg expr)) ((reg : pat -> expr) #`(reg pat expr)) ;; Non-annotated. ((reg) #`(reg reg reg)) ((reg pat) #`(reg pat reg)) ((reg pat expr) #`(reg pat expr)) ))))) (clauses-check-undefined dict) nf)) So what's next? Make this work on structure fields. How does scheme/match do this? It uses syntax certifiers to access the struct type. Now, it doesn't look like the original field names are preserved, only the accessor and mutator names. This means that the namespace has to be provided externally, possibly by generating both the struct and the update form at the same time. Ok.. basic form is working: (define (machine-update-struct i struct-id registers stx) (let* ((info (extract-struct-info (syntax-local-value struct-id))) (make-struct-id (cadr info))) (printf "constructor: ~a\n" (syntax->datum make-struct-id)) (syntax-case (machine-nf registers stx) () (((reg pat expr) ...) #`(match #,i ((struct #,struct-id (pat ...)) (#,make-struct-id expr ...))))))) Now to find some good names. The function is updated to just copy non-defined fields: (define (machine-update-struct-tx i struct-id registers-stx stx) (let* ((info (extract-struct-info (syntax-local-value struct-id))) (make-struct-id (cadr info)) (size (length (cadddr info))) (registers (syntax->datum registers-stx))) (when (< size (length registers)) (raise-syntax-error #f "Too many fields" registers-stx)) ;; Pad fields if there aren't enough. (let ((registers (append registers (for/list ((n (in-range (- size (length registers))))) (string->uninterned-symbol (format "R~s" n)))))) (syntax-case (machine-nf registers stx) () (((reg pat expr) ...) #`(match #,i ((struct #,struct-id (pat ...)) (#,make-struct-id expr ...)))))))) [1] entry://20090408-082123 Entry: using the machine update macro in compiler-unit.ss Date: Mon May 4 11:39:49 CEST 2009 The only problem is that an update function can't call another update function since the clauses are parallel. I'm going to see if this is a problem. So far it can easily be solved by leaving continuations in the asm stream. Entry: making all examples compile Date: Mon May 4 12:50:27 CEST 2009 Works well up to the synth. Apart from some undefined symbols I run into a problem with 'variable' which needs 'allot'.
geo-seq : this metaprogrammed word could probably be included properly once require/load is working. problem here is name binding though.. solved now with parameterizing the macro: planet zwizwa/staapl/pic18/geo : geo-seq ' ,, compile-geo-seq ; This does mess up the load path so I need to fix that now. Ok. At least fixed for all 'require statements that come _before_ 'load statements. Maybe 'load should be re-implemented to use a decent serializable structure. Otoh: 'load re-enters the parser so maybe not such a good idea. It might be simpler to use explicit stacks. Entry: weird error Date: Mon May 4 14:50:12 CEST 2009 abort-current-continuation: continuation includes no prompt with the given tag: # This was due to not guarding target word evaluation. Fixed. Entry: missing code Date: Mon May 4 15:10:49 CEST 2009 the synth compiles nicely, but there's a whole chunk of code simply missing. 0248-392 is not there. the rest is what it should be. this looks like a problem with org. Composition order is wrong here, leading to dead code. (close-chain ,(make-target-split #f) ,(state-update ((dict -> (dict-terminate dict))))) Ok. test now passes. Wow.. What a marathon. Entry: load vs. units Date: Mon May 4 15:17:01 CEST 2009 now, i'm already really happy with the current single assignment structure. however, units are neater. in the case of forth code however, they can probably be automatically derived: simply take the defined words and the referenced words with library subtracted. so, i'm going to leave that like it is right now. the only benefit is separate unit compilation, which isn't really necessary yet. Entry: next Date: Mon May 4 20:08:27 CEST 2009 the rest is mostly cosmetics. * fix the disassembler so it gives a more user-friendly printout. * change the init-state behaviour for macro-eval. * fix recursive loading interaction with expand then the next step is static information about the macros, and possibly a simulator: move towards more static checks. find redundancies and fix them in rules. eliminate fancy tricks - simplify all semantics (there's still quite a lot of this). i've been thinking about Forth assembler generators but the current code with the 'patterns-class macro is already general enough. Entry: optimizer / semantics Date: Tue May 5 09:16:31 CEST 2009 Now pic18-unit.ss contains a series of ad-hoc rules for peephole optimization. Is there a way to separate this from the core semantics? A good example is this: (patterns-class (macro) ;;-------------------------------- (word l-opcode s-opcode) ;;-------------------------------- ((+ addlw addwf) (and andlw andwf) (or iorlw iorwf) (xor xorlw xorwf)) ;;--------------------------------------------------------------- (([qw a ] [qw b] word) ([qw (tscat: a b word)])) (([l-opcode a] [qw b] word) ([l-opcode (tscat: a b word)])) (([qw a] word) ([l-opcode a])) (([save] [movf a 0 0] word) ([s-opcode a 0 0])) ((word) ([s-opcode POSTDEC0 0 0]))) This contains a couple of concerns interwoven: * compile time evaluation (tscat: ...) * stack manipulation optimization * use of unary (one literal) and binary target ops Entry: Futamura Date: Tue May 5 09:25:25 CEST 2009 Now, the question is: is this a specializer that can take itself as input? Do the works of mister Futamura have any significance here? Can the theory of partial evaluation be used to shed some light here? The following is from [1] page 43.
PE is a partial evaluator, prog is a program, const is its static data and data is its dynamic data:

  PE(prog,const)(data) = prog(const,data)

Here PE(prog,const) = prog' is a specialized program. The 3 Futamura projections are then about prog being an interpreter int, with the static data const = prog. The first projection combines the interpreter and the program into a compiled program.

  PE(int,prog)(data) = int(prog,data)
  +----------+
   compiled program

When this specialization happens a lot with int = constant, the PE itself can be specialized to the interpreter resulting in a compiler.

  PE(PE,int)(prog)(data) = int(prog,data)
  +--------+
   compiler
  +--------------+
   compiled program

Now, if we have to do this for a lot of different interpreters, we can specialize the PE(PE,int) invocation to the static PE giving a compiler generator PE(PE,PE).

  PE(PE,PE)(int)(prog)(data) = int(prog,data)
  +-------+
   compiler generator
  +------------+
   compiler
  +------------------+
   compiled program

Anyways, I'd like to read some more of this [2]. It seems to contain some answers about why partial evaluation isn't trivial. On page 42 in [1] there is some simple answer to the non-triviality: partial evaluators need to be conservative to ensure termination. [1] http://thyer.name/phd-thesis/ [2] http://www.itu.dk/people/sestoft/pebook/ Entry: rules Date: Wed May 6 10:10:06 CEST 2009 So instead of the rules in [1], isn't it better to explain what addlw means in terms of qw and then derive the rule? in other words, instead of defining rules for packing, define them for unpacking too. (([addlw a] unpack) ([qw a] [cw +])) As an intermediate step towards more insight, it might be best to find which rules or clauses within a rule are invertible. Once there are inverses, it should be possible to start optimizing by search. The thing is: once subsets of code have better isolated properties, transformations on them could be done on a higher level. In general, code concatenation is a monoid [2] : - closure - associativity - identity element Identifying invertible elements would maybe make it possible to find a subgroup in the monoid. Then finding commutation relations could construct an abelian group. Actually, the 'unpack macros are disassemblers. I'm thinking that a mechanism on top of the current macros is necessary for this. The problem with rules and general rewriters is that they are algorithmically complex. However, it might be possible to devise a couple of passes of eager macros from a set of more general rules and a bunch of training data in the form of programs. The trick is going to be to link together the semantics of the transformers defined in terms of how they act on code, and the algebraic structure defined by the transformers alone, without this semantics attached. [1] entry://20090505-091631 [2] http://en.wikipedia.org/wiki/Monoid [3] http://en.wikipedia.org/wiki/Transition_monoid Entry: verifying a transformation rule given concrete machine semantics Date: Wed May 6 11:05:43 CEST 2009 Maybe it's best to find a way to verify rules first. Or better, individual clauses. ([addlw a] [qw b] +) -> ([addlw (a b +)]) This particular one is simpler when we're able to unpack addlw: ([qw a] [cw +] [qw b] [cw +]) -> ([qw (a b +)] [cw +]) or in concatenative code compiled to the generic vm with partial eval removed: ( a + b + ) -> ( a b + + ) Entry: proof that [ a + b + ] == [ a b + + ] Date: Wed May 6 11:20:19 CEST 2009 Funny, it's been a while since I did this kind of stuff.
But maybe trying to do this unprepared will reveal some tricks. To prove this, let's try to prove [ x a + b + ] == [ x a b + + ] first. Note that a, b, x need to be values (self-quoting operators), not generic operators (which might be non-invertible). To prove this we relate the semantics of '+ in the concatenative code to the semantics of '+ in a nested expression: ([qw a] [qw b] +) -> ([qw (+ a b)])

  LHS:
  x  [qw x]
  a  [qw x] [qw a]
  +  [qw (+ x a)]
  b  [qw (+ x a)] [qw b]
  +  [qw (+ (+ x a) b)]

  RHS:
  x  [qw x]
  a  [qw x] [qw a]
  b  [qw x] [qw a] [qw b]
  +  [qw x] [qw (+ a b)]
  +  [qw (+ x (+ a b))]

By unification we now need to prove (+ (+ x a) b) == (+ x (+ a b)) Which is easier to see with infix notation: (x + a) + b == x + (a + b) This is true by the associative property of '+. Generalizing this to all 'a we could attempt (*) to derive the property [ + b + ] == [ b + + ] which should be read as [ + (b +) ] == [ (b +) + ] Then chopping off the last '+ gives [ + b ] == [ b + ] In general, this only works in one direction:

  [ + b ] --->  [ b + ]
  [ b + ] -/->  [ + b ]

because the former replaces a strong typing constraint (need two input values) with a weaker one (need one input value). So (*) introducing variables to be able to prove relations makes typing stricter. This is a significant insight. So, the associativity law in concatenative code looks like: [ a b c + + ] == [ a b + c + ] Equivalent and with a 2-value type constraint: [ x + ] == [ + x ] Associativity is commutation of variable quote and apply. Actually this is not so strange: associativity is about changing the order of function application. To write this without variable quote we get, typed with 3-value arguments: [ + + ] == [ >r + r> + ] or with dip notation [ + + ] == [ [ + ] dip + ] The variable quote form is simpler: '+ commutes with single-result functions. Entry: lazy partial evaluation Date: Wed May 6 18:16:37 CEST 2009 Interesting article here [1]. I'm getting a faint hint at what this signifies. It is related to reordering applications as mentioned above. I need to read a bit more about terminology. "Evaluation under lambda": first comment in [2]. Reducing expressions inside abstractions, instead of leftmost outermost. First, let's go back to Felleisen's comment about reduction strategies vs. calculi. It's better to define what a reducible expression is and always reduce the leftmost outermost expression. "Normal order and applicative order are failed attempts to explain the nature of call-by-name programming languages and call-by-value programming languages as models of the lambda calculus. Each describes a so-called _reduction strategy_, which is an algorithm that picks the position of next redex BETA that should be reduced. By 1972, it was clear that instead you want different kind of calculi for different calling conventions and evaluation strategies (to the first outermost lambda, not inside). That is, you always reduce at the leftmost-outermost point in a program but you use either BETA-NAME or BETA-VALUE." [3] What is the equivalent of reduction inside abstractions for a concatenative language? Probably reduction of subprograms. I wonder: is there any difference between reducing from left to right and reducing in arbitrary order? Also, the Staapl macros behave as higher-order abstract syntax: they describe terms instead of being terms (machine code) [5].
[1] http://lukepalmer.wordpress.com/2009/05/04/lazy-partial-evaluation/ [2] http://lambda-the-ultimate.org/node/3217 [3] http://list.cs.brown.edu/pipermail/plt-scheme/2009-February/030354.html [4] http://en.wikipedia.org/wiki/Supercombinator [5] http://en.wikipedia.org/wiki/Higher-order_abstract_syntax Entry: higher order macros Date: Thu May 7 09:14:16 CEST 2009 I believe it's time for implementing control structures as higher-order macros. This will probably stir up the biggest problems with the current partial evaluator: swapping code and data.

  law          swap
  -----------------
  ASSOCIATIVE  code
  COMMUTATIVE  data

Entry: specializer Date: Thu May 7 09:30:31 CEST 2009 Maybe it is better to see the macros as specializers. Take the definition for '+ for example: (([qw a ] [qw b] +) ([qw (tscat: a b +)])) (([addlw a] [qw b] +) ([addlw (tscat: a b +)])) (([qw a] +) ([addlw a])) (([save] [movf a 0 0] +) ([addwf a 0 0])) ((+) ([addwf POSTDEC0 0 0]))) This is defined as a function that combines original syntax 'qw with specialized syntax 'addlw 'movf 'addwf. The advantage is that it can be example-based: if you see some pattern in target code that you didn't expect, it can usually just be added to the rules of some specializer. The problem is that this is quite low-level and difficult to understand because it mixes 2 conceptual levels: real machine code and pseudocode directly representing a trivial compilation of concatenative/Forth code. Another problem is that this is eager: always reduce in the same place: at the top of the code stack. There is no "reduction under lambda". What I wonder is if this actually is a limitation. So. Next task: figure out a way to compile high-level rewrite rules (which operate only on macro syntax) into eager low-level ones. I.e. the commutative law: (swap +) = (+) Something related to Luke Palmer's post: what if it is possible to keep functions that return a single value wrapped as promises? The idea is that concatenative code (a b +) can be replaced with (b a +) as long as 'a and 'b return a single value. Using some kind of type system it is possible to identify "pseudoconstants" that will make moving code around a lot easier. I.e. Suppose there is some occurrence of (a b +) where 'a will not recombine with the code before it, but 'b will, then it is better to swap them and evaluate the whole bunch into a value. Actually, the type system could be _implemented_ as something that wraps into 'qw. I.e. macro results should be left as quotations as long as possible, and when finally evaluated should be memoized. This requires a representation of "pop the runtime stack" as a quoted macro. More specifically: reference into the runtime stack. The stack could then be pushed/popped on every call. Bottom line: more laziness -> better partial evaluation. To organize this it might be best to keep the eager and lazy parts separate. I.e. the current PIC18 macros could implement the final eager specializer (the optimizing compiler) while a lazy specializer is written on top of it. Some examples: @ always should bind directly [m1 a] -> [m1 (macro: a @)] + binds lazily with a check for its two arguments [m2 ab] -> [m1 (best (macro: ab +) (macro: ab swap +))] [m1 a] [m1 b] -> [m1 (best (macro: a b +) (macro: b a +))] Now this 'best thing in there.. That's the reason the compiler needs to be purely functional - so we can easily fork 2 different states and pick the best one to continue. One thing about this scheme: it's not clear when to actually force the expression.
I think this is the same as Luke means with "I am still not totally sure when to introduce an abs" in [1]. It looks like a good time is whenever there are more items packed together in a single macro return than the next macro takes as input. Michael Thyer's lambda animator [2] contains examples of both eager and lazy specialization. [1] http://lukepalmer.wordpress.com/2009/05/04/lazy-partial-evaluation/ [2] http://thyer.name/lambda-animator/ Entry: lazy partial evaluation Date: Fri May 8 14:44:26 CEST 2009 I'm trying to find the proper type of lazy evaluation in Staapl, inspired by [1]. Something like this: (define-virtual-ops (mw a)) (patterns (macro) (([mw a] [mw b] +) ([mw (macro: ,a ,b _+)])) (([qw a] delay) ([mw (macro: ',a)])) (([mw m] force) m) ) Here _+ is the strict one from pic18.ss. To use this in the strict (macro: ...) form do something like: 1 delay 2 delay + force Now, what I wonder is how this can be used to separate all compile time computations (manipulation of qw's) from target compilation (using target machine code only). There is one thing though that I didn't do yet from [1]: I made closures, but not abs's. "Indirections represent a logical reordering of the stack, and are used to model bound variables." What I don't quite understand is why this needs to pop the element referred to. Anyways: a basic idea is that if you use a stack and start to reorder operations on it, you had better keep track of the original positions of the operations. [1] http://lukepalmer.wordpress.com/2009/05/04/lazy-partial-evaluation/ Entry: lazy coma Date: Sat May 9 09:51:51 CEST 2009 So what does the lazy language look like from the point of view of the programmer? First, it does not distinguish between instantiation and macro definition. That is what the specializer handles. Ok.. so there's an rpn language lazy: now, which has a completely separate namespace (lazy) and defines in terms of (macro). I.e. (pic18-begin ,(lazy: 123 123 +) force) The next thing to figure out is 'dup. If a quoted macro ends up being forced more than once, we (probably) want to memoize it into a word. Since it's not possible in general to replace a forced macro with a word later, this needs some form of backtracking. Something like: default behaviour = instantiate. Whenever this turns out to be problematic (recursion or instantiation of large macros) turn the macro into a run-time abstraction. Entry: Michael Thyer's Thesis about Lazy Specialization Date: Sat May 9 11:02:51 CEST 2009 At page 18 of [1] there is a ``knot-tying'' version of the interpreter. This fully lazy technique seems to be the key to lazy partial evaluation. Time to get familiar with it. The evaluator from page 18:

  evalProg prog env = (lookup "main" prog') env
    where
      prog' = map (\(label,stmts) -> (label,evalStmt stmts)) prog
      evalStmt [] env = env
      evalStmt (stmt:stmts) env =
        case stmt of
          SAssign var exp -> evalStmt stmts (update (var, evalExp exp env) env)
          SGoto label     -> (lookup label prog') env
          SIf cond yes no -> if evalExp cond env /= 0
                               then evalStmt yes env
                               else evalStmt no env

Here prog' is the dictionary of evaluated expressions which is built in terms of evalStmt which in turn refers back to prog' in the interpretation of SGoto. If there is some sequence of reductions that will build the data structure, lazy evaluation will find it. See also [2].
[1] http://thyer.name/phd-thesis/ [2] http://calculist.blogspot.com/2005/07/circular-programming-in-haskell.html Entry: peephole and stack Date: Sun May 10 09:04:13 CEST 2009 The peephole optimizer works well for a stack formulation because of the way locality is encoded. This is especially so for the PIC18 which tunnels everything through the WREG. However, for true partial evaluation the program structure needs to be exposed on a more abstract level. On the other hand, the stack gives a very natural way to bind static data early: by placing it on top. So the real question is: how to steer static data towards the top of the stack? How to keep a simple reduction mechanism, but write logic on top of it such that this works best? Examples. With ifte ( cond true false ) arguments, how to specialize this: : when-done [ [ ready-flag high? ] i ] dip [ ] ifte ; This will perform some re-arrangement of the arguments then call ifte. I don't see any problem. This will just yield 3 [mw .] arguments to ifte. Now: ifte needs to know if the condition is available at compile time. It can do this by forcing the condition macro and checking if it reduces to a value. If so, one of the branches can be eliminated. If not, the conditional branch construct needs to be passed to runtime. This doesn't seem so special. So where _is_ the problem? I think it's the coupling between the run time stack and the compile time stack. At some point a quotation needs to be instantiated. At this point one also needs to know the input and output argument structure. Somehow it seems to be really natural to do eager optimizations, but quite a leap of faith to do it lazily. Messing with the run-time stacks is not trivial.. 'dup and specialization. When a quotation that's big enough gets instantiated more than once, there's an opportunity for sharing. This needs extra knowledge to be filled in later: what is "big enough"? But apart from that, it needs its type to be able to use it as a function call! Inlined macros don't need a type: they can just be executed, but once abstracted the type is necessary both at the call site and at the definition site to decouple both. next: implement run time abstraction from a pure lazy macro definition of a language. Entry: macro->data / macro->code Date: Sun May 10 10:29:54 CEST 2009 Removed these and replaced them with state->data / state->code. Currently only the 'unwrap' function in coma/code-unit.ss uses ad-hoc macro evaluation in the core compiler, next to the proper instantiation. The rest is in interaction code only, which is now also attached to (state:compiler). However, some tests are not passing any more. One problem is gpdasm: some bug got fixed which gives different dasm output. Another is probably at a deeper level. Let's first try to rewind gpdasm, or try to update to the new gpdasm with old staapl code. Entry: eta-reduction on rpn-lambda Date: Sun May 10 14:26:08 CEST 2009 Making abstractions with rpn-lambda with a single term at this moment introduces abstraction overhead. I wonder if this can lead to memoization problems due to the fact that (macro: m) != (macro m). To do this purely generatively instead of by postprocessing requires a different approach. Currently the information is lost: what you want is to fish out the function that is applied, not the application. Maybe it isn't really a problem: performance-wise i don't think it would matter since the compiler is probably smart enough to eliminate the abstraction.
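A quick sketch of the eta issue in plain Scheme (compose1 is a made-up stand-in for a one-term rpn-lambda, not the actual implementation): the wrapper is extensionally equal to the wrapped function but not eq? to it, which is exactly what would defeat identity-based memoization.

    ;; Stand-in for a single-term composition.
    (define (compose1 . fns)
      (lambda (state)
        (foldl (lambda (f s) (f s)) state fns)))

    (define m  (lambda (s) (cons 'm s)))
    (define m* (compose1 m))            ;; like (macro: m)

    (equal? (m '()) (m* '()))           ;; => #t, same behaviour
    (eq? m m*)                          ;; => #f, different objects

    ;; Eta reduction at construction time: return a single term as-is
    ;; instead of building a new abstraction around it.
    (define (compose1/eta . fns)
      (if (and (pair? fns) (null? (cdr fns)))
          (car fns)
          (apply compose1 fns)))

    (eq? m (compose1/eta m))            ;; => #t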
Entry: tower of interpreters Date: Mon May 11 09:44:49 CEST 2009 In [1] on page 44 it is mentioned that Futamura projection practically requires a partial evaluator that is both powerful enough to specialize itself and simple enough to be specialized by itself. Now, what is the relation between writing interpreters in Staapl's macro forth to be specialized by its partial evaluator, and writing them directly as towers of macros? I'm confused by terminology here, but I'm thinking practically about protocol parser specification etc.. [1] http://thyer.name/phd-thesis/ Entry: minilisp Date: Mon May 11 18:34:30 CEST 2009 A quick aside. I found this[1] when going through some djvulibre code. It might be interesting to keep in mind whenever I'm going to target scat to C. [1] http://leon.bottou.org/projects/minilisp Entry: telling people things Date: Mon May 11 22:37:54 CEST 2009 You can't. You need examples. [1] http://thefarmers.org/Habitat/2004/04/you_cant_tell_people_anything.html Entry: full laziness Date: Wed May 13 11:06:48 CEST 2009 On page 55 of [1] "full laziness" is defined as giving each term minimal scope. Then it is remarked that: It is possible to achieve full laziness in a lazy language by transforming the syntactic representation of functions so that the scope of every function is minimal. [1] http://thyer.name/phd-thesis/ Entry: next: documentation update Date: Wed May 13 11:51:13 CEST 2009 * website: this needs to be simplified a bit. * scribble docs: they are probably broken. * papers: less important but best moved to scribble The emphasis should be on the following: * it works: - you can compile .f -> .hex / .dict - you can interact with target using .dict * it's simple: bottom-up macro system using PLT's module system. Then there should be examples, examples and examples. Entry: pe reordering Date: Wed May 13 16:34:50 CEST 2009 With eager evaluation (not evaluating under lambda) the trick is to make sure that reducible expressions are exposed in the right order. How to do this in Coma? Entry: interpreting generated code as a stack Date: Wed May 13 20:01:51 CEST 2009 Is this the essential difference compared to expression-based partial evaluation? I.e. the Scheme form (+ a b) could be implemented as a macro + which inspects its arguments, and performs an addition if it finds two numbers. In general (without the stack stuff) Staapl is a compiler for a composition language: point-free style -> imperative code. It works on the analogy that machine code is actually a composition of functions acting on the machine state. This in itself is not so useful when the only state that can be manipulated is the machine state, but it is very useful when the state is somehow limited. Using grids instead of stacks could lead to an APL-like language. I am missing something however. I really don't want to operate on machine code. I want typed lazy values, not evaluated ones. There are two problems I'm solving: [qw a] [qw b] + -> [qw (+ a b)] [addlw a] [qw b] + -> [addlw (+ a b)] The first one is generic, the second one is quite specific to the machine. Is it possible to implement partial evaluation in a more direct style, without having to include specific clauses into each machine-specific transformer? There is one very big problem with my evaluator: it _needs_ to be strict because it is intertwined with the implementation of the run-time parameter stack. This "side-effect" is OK as long as parameter dereference doesn't get postponed and reordered.
Basically, it's possible to delay the popping of the data stack and wrap this action in a quoted macro, but then this popping fixes the order in which these macros have to be evaluated. I have a hunch that if I can describe this problem in a proper way, I can find a solution and move to a lazy evaluator. However, when it is possible to use random parameter access, this mechanism could be used to implement an evaluator for an applicative language, essentially giving Luke's approach. This stuff is over my head. I just went over the rest of Thyer's thesis and I'm quite confused. The problem doesn't seem to be trivial at all. I keep coming back to "lazy pops". Entry: when is eager matching not working? Date: Wed May 13 23:34:51 CEST 2009 It would be interesting to find cases where the eager partial evaluation scheme breaks down. One clear case is the interleaving of orthogonal computations. I.e. 1 dup @ 101 ! 2 + 102 ! This will not evaluate "1 2 +" to "3" because the "1" is not visible at "2 +": it is hidden behind the machine code generated by "dup @ 101 !". The proper way to solve this is to perform the following rewritings:

  1 dup @ 101 ! 2 + 102 !
  1 1 @ 101 ! 2 + 102 !      ; evaluate 'dup'
  -> 1 @ 101 ! 1 2 + 102 !   ; commute '1' and '1 @ 101 !'
  -> 1 @ 101 ! 3 102 !       ; evaluate '1 2 +'

Let's evaluate this using lazy macros

  ()
  1   -> ([mw (1)])
  dup -> ([mw (1)] [mw (1)])
  @   -> ([mw (1)] [mw (1 @)])
  101 -> ([mw (1)] [mw (1 @)] [mw (101)])
  !   -> ([mw (1)] [mw (1 @ 101 !)])

Now, at this point the two macros need to swap, so (2 + 102 !) can combine with [mw (1)]. Basically, (1 @ 101 !) should be annotated as being independent of the run time stack, and so can be tucked away. Wait. This "tucking away" is similar to the self-removing abstractions in Luke's idea. The real problem here is side-effects.. I'm still comparing apples and oranges.. Entry: uniqueness types Date: Thu May 14 01:20:39 CEST 2009 Sounds like an interesting concept. Used in the Clean language. [1] http://en.wikipedia.org/wiki/Uniqueness_type Entry: syntax properties Date: Thu May 14 02:23:53 CEST 2009 Apparently check syntax uses the 'disappeared-use syntax property. Find this in the manual. (Got it from the Typed Scheme paper [1]) [1] http://www.ccs.neu.edu/scheme/pubs/scheme2007-ctf.pdf Entry: only macros? Date: Thu May 14 09:13:51 CEST 2009 Two questions for today: * can QW be replaced entirely by MW? * how to massage a composition of macros before evaluating them. At first sight there doesn't seem to be a problem with making all literal values lazy macros that produce a single literal value. The only restriction is that these macros are not allowed to produce code that alters the run-time parameter stack. Maybe I should fix the implementation of the eager evaluator, since it is easy to understand, and consider a preprocessing step that aims at providing a reordering of the composition? Also, I'm still annoyed by this: [qw a] [qw b] + -> [qw (a b +)] [addlw a] [qw b] + -> [addlw (a b +)] The latter should really be [mw (a +)] [qw b] + -> [mw (a + b +)] The direction to go in seems obvious: I'm not going to get anywhere without processing of macro compositions. This means that each macro should carry with it a description of its I->O behaviour. The problem is that I don't really have a substrate for thought here..
Given a function composition (a b c ...) the goal is to re-order the operations such that the eager peephole compiler can produce better code. Can this maybe be put into mutual feedback? I.e. stick with eager evaluation, but allow backtracking to search through the space of possible reorderings. So let's fix the stage for now: - Peephole optimizer stays what it is. This works well for manual code writing where there is usually "only one thing going on at a given time", but is non-optimal in general when independent operations are interleaved. - Try to find a way to preprocess. Maybe it's simplest to get rid of the current control flow compiler for this. See it as a side-track of the Forth compiler. Entry: more algebra Date: Thu May 14 09:59:38 CEST 2009 Is there some way to turn the monoid into a group? Defining operations as invertible would make things a lot more elegant probably. For arithmetic this isn't too hard: add a 2nd "kill stack" that carries information to perform the undo. Alternatively, add the undo information to the values themselves. I.e. something like: (1 2 +) -> (3), (2 -) There should be some application for constraint-based programming here.. Anyways, let's make the basic structure first. Entry: functional PIC18 language Date: Thu May 14 10:07:13 CEST 2009 To make this work in practice, this needs jumps. The control flow analyser and control stack are not necessary so let's keep it simple. What we need are: - function call / return instructions - conditional jump (ifte) Ok: separated out pic18-control-unit.ss making pic18-unit.ss independent of the control^ macro set. Entry: low-hanging fruit Date: Thu May 14 10:59:03 CEST 2009 This partial eval stuff probably needs a rest. It will involve a lot of thinking, but what I need now is some easy success to boost motivation.. Maybe it's time to start porting to the dsPIC. I have some parsed manual lying around somewhere to get to the instruction set. However, it might be best to do it manually. There is one thing I'd like to change though: the assembler should be more compositional. Right now there is a single table with all opcodes, but it's probably best to separate out all function prototypes. This leads me to think that an opcode is really just an argument to a more general instruction. Anyways... Hmm.. dsPIC isn't that low-hanging due to a different instruction set architecture. It would be cool as a real platform though. Chips are PDIP and small. Maybe I should take this as a hint: find a generic approach to bootstrapping a new architecture. Entry: new documentation Date: Fri May 15 18:56:19 CEST 2009 - website should contain only a 1-sentence explanation, with an immediate link to a scribble doc. - the basic idea is a peephole optimizer for forth, based on the observation that forth code looks like a stack. - staying close to the way Forth is usually implemented, the core idea is kept and turned into a functional framework. - on top of this a lot of uC-specific optimizations can be built Entry: terminology : partial evaluation Date: Sat May 16 21:41:48 CEST 2009 I think I need to distinguish partial evaluation (specialization of _functions_) and peephole optimization. They are very much related (if you look at an instruction like [addlw ..] as a specialized version of [addwf ..]) but I think it's better to use partial evaluation for a language that doesn't distinguish between macros and functions.
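Coming back to the "kill stack" idea from the more algebra entry above, a throwaway sketch (plain Scheme, made-up names, nothing like this exists in the tree): fold (1 2 +) down to (3) while recording enough on a second stack to undo the step.

    ;; The data stack has its top first: '(2 1) is the result of "1 2".
    ;; fold-+ evaluates '+' and pushes undo information on the kill stack.
    (define (fold-+ stack kill)
      (values (cons (+ (car stack) (cadr stack)) (cddr stack))
              (cons (car stack) kill)))

    ;; unfold-+ inverts the step using the kill stack.
    (define (unfold-+ stack kill)
      (let ((b   (car kill))
            (sum (car stack)))
        (values (cons b (cons (- sum b) (cdr stack)))
                (cdr kill))))

    (fold-+ '(2 1) '())    ;; => (values '(3) '(2))
    (unfold-+ '(3) '(2))   ;; => (values '(2 1) '())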
Entry: types are calling conventions Date: Sun May 17 09:07:16 CEST 2009 http://lambda-the-ultimate.org/node/3319 Entry: removed front page stuff again.. Date: Sun May 17 10:04:27 CEST 2009 I keep on doing this.. Why? Anyways, the new introduction says almost the same as this, but with examples.

Basic Elements

Metaprogramming is about manipulating programs as data. A key element here is the representation of programs. Being a compiler, Staapl has two languages to deal with: a high-level input language and a low-level output language. Both languages are concatenative.

  • Output language programs are represented as lists of instructions [op].
  • Input language programs are represented as output language program transformer functions [op]->[op].
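
For instance, with symbolic instructions (a sketch; the concrete op representation is machine specific):

    '((movlw 1) (return))                   ; an [op] program
    (lambda (ops) (cons '(movlw 1) ops))    ; an [op]->[op] transformer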

A meaning can be attached to high-level concatenative source code using the following compilation algorithm:

  1. Read the tokens from the input program source, and associate each of them with an [op]->[op] transformer.
  2. Compose all [op]->[op] transformers using function composition, in the order their tokens appear in the input program source.
  3. Apply this function to the empty program [] to obtain the final output program.

From this one could infer that each [op]->[op] function appends a small machine code fragment to the code accumulator, essentially behaving as an assembler macro. However, the function is free to modify the accumulated code in its entirety, performing any optimization it sees fit.
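
As a sketch in plain Scheme (the semantics lookup is hypothetical; the real token semantics live in Staapl's macro namespaces), the three steps are literally a fold:

    ;; semantics : token -> ([op] -> [op])                      (step 1)
    ;; Composing the transformers in source order and applying the
    ;; result to the empty program is the same as folding application
    ;; over the token list.                                     (steps 2 and 3)
    (define (compile-source tokens semantics)
      (foldl (lambda (token code) ((semantics token) code))
             '()
             tokens))

    ;; Trivial example semantics: every token appends itself, yielding
    ;; the accumulated code with the newest op first.
    (compile-source '(dup +)
                    (lambda (tok) (lambda (code) (cons tok code))))
    ;; => (+ dup)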

Using this representation, the task of building code generators can be split into these parts:

  • High level: Composing generators/transformers is expressed using concatenation. I.e. the Scheme form (macro: 123 +) creates a composite code transformer, built from the code transformers 123 and +. The macro: form behaves as quasiquote, facilitating template programming. I.e. the previous transformer is equivalent to (let ((x 123)) (macro: ',x +)). (A short sketch follows this list.)
  • Low level: Creating language primitives as [op] processors, and possibly defining a machine instruction set op that has the double function of representing target semantics and serving as a representation of compile time data.
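As a hypothetical example of the template style this enables, a Scheme function can manufacture a transformer for adding a given constant by unquoting the compile-time parameter into a macro: form. This is a sketch, assuming a module context where the macro: form and the + macro are available:

  ;; sketch: n is a Scheme value spliced into concatenative code.
  (define (add-constant n)
    (macro: ',n +))

  ;; (add-constant 42) is an [op]->[op] transformer equivalent to
  ;; the composite macro "42 +".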

Macro Forth

By interpreting the code list [op] as a stack and adding an op instruction (QW value) that represents loading of a literal value on the parameter stack at run time, it is possible to implement Forth with eager partial evaluation.

I.e. the function associated with the token + would normally only compile machine code that pops 2 numbers from the run-time stack, adds them and pushes the result back. Instead it could be made to inspect the code it is passed for any instructions that load literal values on the runtime stack, remove those, perform the addition at compile time, and generate code for a constant instead.
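Here is that idea as a sketch in plain Scheme, with a hypothetical (qw n) literal-load instruction and the code list ordered most recent first. Staapl's real primitives are written with a pattern-matching specification form; this only shows the mechanics, with approximate opcode spellings.

  #lang scheme
  (require scheme/match)

  (define (plus code)
    (match code
      ;; both operands known: add at compile time, emit one literal.
      (`((qw ,a) (qw ,b) . ,rest) `((qw ,(+ a b)) . ,rest))
      ;; one operand known: emit an add-literal instruction.
      (`((qw ,a) . ,rest)         `((addlw ,a) . ,rest))
      ;; general case: run-time addition.
      (_                          (cons '(addwf POSTDEC0) code))))

  (plus '((qw 2) (qw 1)))  ; => ((qw 3))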

Note that the basic structure of [op] lists and [op]->[op] transformers is more general than stack languages. In fact, it could be used to implement partially evaluating macro languages for any kind of state machine.

Entry: previous introduction
Date: Sun May 17 10:09:46 CEST 2009

It's sort of ok, but I think the new one is better because it uses examples. Let's keep it around.

Entry: i can't read documentation
Date: Sun May 17 12:31:01 CEST 2009

I just don't have the patience.. Trying to figure out how to build my scribble docs locally to see how they will show up on the planet server, but there are several inconsistencies in my understanding of how setup-plt works. Most importantly: the collects dirs need to be clean! It compiles everything it finds, so junk gets in the way.

Entry: releases
Date: Sun May 17 17:15:21 CEST 2009

Currently "planet-fileinject" is the best way to test the planet package before upload.

  ~/.plt-scheme/planet/300/4.1.5.5/cache/zwizwa/staapl.plt/1/8/planet-docs/

Entry: fruit
Date: Sun May 17 22:35:40 CEST 2009

HA! I know about some low-hanging fruit to compensate for all this seriousness: an automatic scheme -> rpn converter, so rpn can be packaged separately to be integrated with fluxus.

Entry: preparing for release
Date: Mon May 18 10:49:14 CEST 2009

Everything seems to be working, except for some minor issues with the documentation (the red underline). Time to upload it to PLaneT. So, before releasing, do the following:

  * check bin/version
  * make test               # run the test suite
  * make planet-fileinject  # install the package locally

The "release" target will do these.

Entry: unit filters
Date: Tue May 19 12:12:41 CEST 2009

How would one build a unit which filters an API? This would be a very interesting way to override macro behaviour. Maybe what I'm really looking for is objects..

Entry: inline macros vs. partial evaluation
Date: Wed May 20 16:21:50 CEST 2009

The thing is: you want sharing. Currently, when using macros, they are always inlined, even when there are compile time computations going on. It looks like a good first attempt at proper PE would be to decide when _not_ to inline.

Entry: applications
Date: Wed May 20 17:00:37 CEST 2009

I need a metronome. And a clock for rummikub. And maybe something that responds to sound? I found the 2 8x8 led matrix displays. This looks like a nice target for some application. Needs 24 pins to make it work comfortably.

Entry: ltu thread on the future of programming
Date: Wed May 20 20:50:29 CEST 2009

I'm tired so time for wild associations.. But these two posts I found rather striking:

http://lambda-the-ultimate.org/node/1439#comment-16473

  Since I asked the question.... ...you may have guessed I had a few
  answers myself but wanted to see what else I could shake loose from
  you guys first....

  Moore's Law and Feeping Creaturism implies ever larger and more
  complex systems. The solution to which is not ever larger and more
  complex languages, but simpler languages, both syntactically and
  semantically that permit tools to assist more in...

  * Refactoring.
  * Testing.
  * Code Generation.
  * Reengineering, understanding and visualization of very large
    systems.

  Thus

  * The ease of analysis, rewriting and refactoring of the Joy
    language will make the Joy language the "parent/inspiration"
    language for the next generations of languages.
  * Syntax and semantics will become simpler. ie. Towards a simpler
    version of Lisp/Scheme or Forth.
  Some other beliefs I have stemming from Moore's Law and creeping
  featurism driving the exponential growth in software complexity...

  * Linear logic will supplant GC.
  * Lazy evaluation is not an optimization (or hard to implement nice
    to have). It's mathematically the only way to have a consistent,
    analizable and manipulatable system.
  * Single Assignment or Joy / Forthlike no assignment will become
    the norm.
  * Cyclic dependencies at one or more levels (package / file /
    class / ...?) will be explicitly disallowed by the tools.
  * Static typing will lose to Dynamic Typing, and Dynamic Typing
    will lose to compiler inferred Static Duck Typing.
  * Languages will provide a transparent "each object has one thread
    inside it, every method invocation is just another event in that
    object's event queue" view to the programmer.
  * OO hierarchies are too reminiscent of pre-Codd hierarchical
    databases. And SQL ignores Codd on too many points. I expect to
    see a language that...
    ** Is fully relational ala Codd Law's.
    ** The basic Object model is also in at least 4th normal form.

  Can I prove any of this? No. Not now. Not yet. But it's what I keep
  my mind occupied with when I'm not earning my bread.

  John Carter

http://lambda-the-ultimate.org/node/1439#comment-40602

  The solution to the concurrency problem... Will be a two language
  solution:

  Language A: be a sequential language designed from the ground up to
  be easily parallelizable at the meta level. It will not be a
  parallel language with parallel programming constructs, neither
  message passing nor multi-threading nor explicit effects. It will
  have a lot of restrictions that may seem strange to the sequential
  programmer but they will all be there for a good reason which is...

  Language B: which is a declarative meta-language which will allow a
  different programmer to examine and reason about both language A
  and it's potential run-time environment and then re-factor it's
  code to run in parallel on the chosen platform(s). At first most
  programs in language B will be heuristic and specialized for
  particular applications, but as more experience is gained with
  particular algorithms (in language A), and with particular hardware
  configurations, patterns will begin to emerge and language B will
  be able to support automatic parallelization across certain
  categories of algorithms on certain categories of hardware.
  Language B will have all the nice syntactic sugar parallel
  programmers love, whether it be in the language of threading,
  effects, messages or something else.

  David Minor

More gems: http://lambda-the-ultimate.org/node/1439#comment-16479 (canvas vs. http://www.info.ucl.ac.be/people/PVR/flopsPVRarticle.pdf (4 x a 4-layer language)

Entry: interaction
Date: Thu May 21 10:56:59 CEST 2009

With all this module stuff I actually forgot to test if the interaction is still working. Apparently not. Had to fix a thing or two in the macro evaluation code. Now it seems to work again. "empty" is broken though. FIXED.

Entry: simpler live commands
Date: Thu May 21 11:44:41 CEST 2009

Currently the way the target: language is constructed is a bit convoluted. In rpn-target.ss there's an unnecessary double indirection. Ok, removed it. The semantics are now a bit clearer. There are 3 cases for identifiers:

- target words (executed)
- macros (simulated)
- prefix parsers

Entry: target stack access
Date: Thu May 21 12:53:29 CEST 2009

Another thing which has annoyed me for a while is the way target macro simulation works: it slurps the entire stack.
How can we know how many elements should be popped? This really could be done lazily, but would require some modification to the code representation (allow lazy stacks). I already have a mechanism to do this: target-value. Just adding thunks that will force stack pops should be enough.

This works, but picking out instructions to skip turns out to be more difficult than expected (i.e. "dup" and "swap" don't work because they don't wrap the target words). This should be done by comparing input and output.

-- This problem isn't nearly as trivial as it seems. Before evaluation starts, all pops need to be forced. Even this isn't enough: we need to find _references_.

Ok, the trick is this:

- first compare in/out to see which operations do not need to be executed.
- dummy eval the rest to trigger the lazy pops.
- use the size of the input stack to determine how many instructions of out need to be taken.

Entry: lazy bootstrapping
Date: Thu May 21 12:56:27 CEST 2009

If laziness is the ``natural'' way to deal with circular dependencies, isn't there a way to solve a compiler bootstrapping problem using circular programming tricks?

Entry: tethered.ss
Date: Thu May 21 17:08:50 CEST 2009

I have the impression that there is a lot of redundancy in live.ss that can be eliminated by flattening some layers. There are functions that are accessible as scheme functions, scat functions and interaction macros. Is this really necessary? I don't really use scat all that much except as a vehicle for other things..

Entry: usb
Date: Thu May 21 18:09:28 CEST 2009

After three years maybe it's time to finally write the damn driver, no? Funny how this has been problematic for a while. Using picstamp.fm from app/ to get it going.

First: getting old code to compile. Simply invoking "load usb.f" gives:

  include "/data/safe/tom/darcs/brood-5/staapl/pic18/usb.f"
  include "/data/safe/tom/darcs/brood-5/staapl/pic18/shift.f"
  reference to undefined identifier: macro/device-descriptor

This identifier is present in pic18/usb.ss, but it still uses an old symbolic interface. Let's see if that module can be revived on its own. The main function is 'usb-compile-device. I have one description file: pic18/cdc.usb. Let's change the interface such that this file becomes a module which includes some scheme code and uses the 'define-usb-device form.
OK, it generates this:

  (: device-descriptor f-> 18 |,| 18 |,| 1 |,| 16 |,| 1 |,| 0 |,| 0 |,| 0 |,| 64 |,| 216 |,| 4 |,| 1 |,| 0 |,| 0 |,| 0 |,| 4 |,| 3 |,| 2 |,| 1 |,|
   : string0 f-> 23 |,| 23 |,| 3 |,| 68 |,| 101 |,| 102 |,| 97 |,| 117 |,| 108 |,| 116 |,| 32 |,| 67 |,| 111 |,| 110 |,| 102 |,| 105 |,| 103 |,| 117 |,| 114 |,| 97 |,| 116 |,| 105 |,| 111 |,| 110 |,|
   : string1 f-> 19 |,| 19 |,| 3 |,| 68 |,| 101 |,| 102 |,| 97 |,| 117 |,| 108 |,| 116 |,| 32 |,| 73 |,| 110 |,| 116 |,| 101 |,| 114 |,| 102 |,| 97 |,| 99 |,| 101 |,|
   : string2 f-> 5 |,| 5 |,| 3 |,| 48 |,| 46 |,| 48 |,|
   : string3 f-> 10 |,| 10 |,| 3 |,| 85 |,| 83 |,| 66 |,| 32 |,| 72 |,| 97 |,| 99 |,| 107 |,|
   : string4 f-> 28 |,| 28 |,| 3 |,| 77 |,| 105 |,| 99 |,| 114 |,| 111 |,| 99 |,| 104 |,| 105 |,| 112 |,| 32 |,| 84 |,| 101 |,| 99 |,| 104 |,| 110 |,| 111 |,| 108 |,| 111 |,| 103 |,| 121 |,| 44 |,| 32 |,| 73 |,| 110 |,| 99 |,| 46 |,|
   : config0 f-> 25 |,| 9 |,| 2 |,| 25 |,| 0 |,| 1 |,| 0 |,| 0 |,| 160 |,| 50 |,| 9 |,| 4 |,| 1 |,| 0 |,| 1 |,| 3 |,| 1 |,| 1 |,| 1 |,| 7 |,| 5 |,| 128 |,| 160 |,| 8 |,| 0 |,| 0 |,|
   : string-descriptor 5 route/e string0 |;| string1 |;| string2 |;| string3 |;| string4 |;| string-error |;|
   : configuration-descriptor 1 route/e config0 |;| config-error |;|)

OK, now to make the code a bit more modern. I need an abstraction for building tables. Something that can interleave a bunch of atoms with a compile function. I.e.:

  begin-table , 1 2 3 4 5 end-table

Maybe with a more concise syntax? Something like:

  { , 1 2 3 4 5 }

Wait, since "[" and "table[" are independent tokens, square brackets could be used for any kind of structured data, allowing s-expressions to occur in Forth style code. Alternatively, "scheme" and "table" could be prefix parsers that allow arbitrary s-expression transformation.

  scheme: [ define foo [ + 1 2 ] ]  \ scheme expression
  table: , [ 1 2 3 4 5 6 ]          \ constant table
  [ 1 2 + ]                         \ quoted function

Actually, why invent new syntax if s-expressions do just fine.. The reason not to have s-expressions in Forth is that then you don't need to represent them at compile time. Additionally, Forth control words allow themselves to be composed in strange ways.. I don't like to throw that away. But in scheme it's really easy to do this:

  (define-syntax-rule (table: separator (item ...))
    (macro: ,(macro: item separator) ...))

So for defining tables I do think that Forth-style prefix parsers might not be the right way to go. It's probably easier to write Scheme macros and provide some s-expression based syntax to access them from flat Forth. So, what is necessary is a generic way to create prefix parsers that take an s-expression as an argument and pass it to a scheme macro to produce an opaque coma macro (with "{" and "}" reserved for module level expressions).

  (snarf-prefix-macro macro:)

Entry: Factor stack effect checker
Date: Fri May 22 08:30:47 CEST 2009

http://docs.factorcode.org/content/article-inference.html

Entry: usb cont
Date: Fri May 22 08:48:06 CEST 2009

So what should usb-cdc.ss produce? It generates a number of Forth procedures, but it does so on top of the flat Forth syntax. It's probably easier to have it define s-expressions. I need a form that abstracts these: register!, wrap, compile. It probably is in fact easier to build it on top of the Forth syntax.. It has a more direct interface to the control flow graph compilation process (the fallthrough feature).
The usb code should eventually become a unit, but for now I only have the PIC18F2550 to work with, so let's stick with a module that requires pic18.ss. Since this is to be used as an abstraction around generated names, it's probably also best to only provide access to the final table words.

Ok. Separated parsing and code generation. Now I need to think a bit more about how to abstract generation of Forth code to scheme. Most flexible is the generation of flat Forth code, since it has full access to all control flow features.

I think I found a nice hack: generate flat Forth code, but do it as a nested s-expression. This keeps the intended substructure intact but can be easily flattened down. Think of this as using the associativity of function composition to annotate code that belongs together.

Though, for bootstrapping the purely functional concatenative language on top of the imperative Forth core, it might be interesting to provide some means of escape. In the case of the usb code generator it doesn't make sense, since it already uses lower-level language constructs like jump/value tables.

TODO: name hygiene for usb.ss

OK. Looks like it's working now. usb-cdc.ss now defines 3 names.

Entry: prefix parsers vs. concatenative macros
Date: Fri May 22 10:42:02 CEST 2009

* concatenative: deal with compile-time function composition and intermediate code transformation only.
* prefix: these are module-level composition tools: they can abstract over more general syntax structure that manipulates name binding.

Entry: renaming x -> r
Date: Fri May 22 12:54:14 CEST 2009

Entry: dynamic image based vs. static source based development
Date: Fri May 22 13:05:59 CEST 2009

I really like PLT scheme's macro/module system. Sometimes it gets in your face, imposing acyclic dependencies. However, I've found this usually to be a good thing. I've learned to trust the consistency it brings while developing Staapl. However, my debugger/editor emacs _needs_ the dynamic approach: I'm living inside it with lots of state attached whenever I want to change some functionality. How can we combine the easy correctness of a static tool like drscheme with the convenience of an all-dynamic no-restart environment like emacs?

Entry: The f and a pointers -- dynamic scope
Date: Fri May 22 13:20:54 CEST 2009

It is really convenient to be able to use the a and f pointers to RAM and ROM respectively. Should this come at the cost of reducing the abstraction level? I.e. are the registers to be saved before use or during interrupts etc? The real question is more general: find a proper style for using dynamic binding. Since there is no lexical scope, dynamic scope is the only alternative. The other question is: do you expose dynamic scope in an interface, or do you keep things referentially transparent? More specifically: the usb descriptor compiler constructs 3 words that provide pointers to binary records. Should they provide them in the 'f' register (where it will have to end up eventually) or on the top of the stack? The latter is probably better practice.

Entry: composability
Date: Fri May 22 13:32:24 CEST 2009

Designing Staapl is mostly an exercise in not losing composability. Maybe that's what language design is about? Introducing features that don't clash; keeping them orthogonal so they can be composed at will. I find that the simpler I make basic principles, the better this works.
The current hurdle is fighting the macro/instantiate divide: I'm writing abstractions that need library functionality (basically, things that use the hardware stack).

Entry: disappeared-use
Date: Sat May 23 06:48:05 CEST 2009

I'm trying to get check-syntax to work for prefixed identifiers. These don't seem to do what I think they do:

  (syntax-property (ns-tx #`(_ (namespace ...) name)) 'disappeared-use #'name))
  (syntax-property (ns-prefixed #'(namespace ...) n) 'disappeared-binding n)))

Let's ask:

  Hello,

  In the following, is there a way to instruct Check Syntax to
  recognize the #'x identifier in reference and binding position as
  in case 1, or in its relation to the #'_x identifier as in case 2?

  ----
  #lang scheme
  (require (for-syntax scheme))

  (define-for-syntax (underscore stx)
    (datum->syntax stx (string->symbol (format "_~a" (syntax->datum stx)))))

  (define-syntax (u stx)
    (syntax-case stx ()
      ((_ (name val) . body) #`(let ((#,(underscore #'name) val)) . body))
      ((_ id) (underscore #'id))))

  ;; case 1
  (u (x 123) (u x))

  ;; case 2
  (let ((_x 123)) (u x))
  (u (x 123) _x)
  ----

  Cheers, Tom

ANSWER:

  Delivery-date: Sat, 23 May 2009 23:35:03 +0200
  From: Chongkai Zhu
  To: Tom Schouten
  CC: Sam TH, plt-scheme@list.cs.brown.edu
  Subject: Re: [plt-scheme] Check Syntax & mangled identifiers

  Just keep the srcloc and the original prop of the identifier seems
  to be enough for me.

  Chongkai

  #lang scheme
  (require (for-syntax scheme))

  (define-for-syntax (underscore stx)
    (datum->syntax stx (string->symbol (format "_~a" (syntax->datum stx))) stx stx))

  (define-syntax (u stx)
    (syntax-case stx ()
      ((_ (name val) . body) #`(let ((#,(underscore #'name) val)) . body))
      ((_ id) (underscore #'id))))

  ;; case 1
  (u (x 123) (u x))

  ;; case 2
  (let ((_x 123)) (u x))
  (u (x 123) _x)

Entry: fixing "load"
Date: Sat May 23 08:41:37 CEST 2009

Problem: 'load doesn't mix with 'expand. There is a simple fix for this: make 'load behave as a preprocessor only, which means it is a reserved word that cannot occur anywhere else in the source code. Maybe this is a bit restrictive though.. It's probably simpler to flatten the current recursive calls for load.

Now.. Do I really need the current-load-relative-directory parameter? I'm not using 'load anywhere.. It's probably OK to dump info in the forth search path parameter. Hmm.. Let's quick-fix it for now, built on the existing structure: the first item in the forth path will be the current directory. This is simply changed on begin/end of a loaded sequence. Better: abstract the value of forth-path to include both current directory and search path.

OK. It seems to work. There's one thing that's going to bite me later though: the mode (macro/forth) isn't saved.. Maybe this should be implemented as parse-time state also? Just have a single struct that can be easily dumped as an abstract transformer? Anyways.. The road is open to do it properly + it should now be possible to use 'require inside 'load-ed files.

Entry: usb cont.
Date: Sat May 23 13:07:40 CEST 2009

Next problem: get the code I already have to compile and upload.

Entry: load/require interference
Date: Sat May 23 13:27:15 CEST 2009

Problem: target code generated by a module might be emptied. Since a module won't be instantiated again, this will introduce dangling references.. Crap.. It's not easy! The real problem is that a module should not have an instantiation side-effect. Or, we should make it so that module code cannot be erased. Or, 'empty' should clear the namespace. Maybe the latter is the best approach. That way modules will get re-instantiated.
So.. Application development is separated into 2 parts:

- kernel development (as self-contained .fm)
- scripts that can accumulate

Upon reload the target should be cleared from the point that's marked as the start of the script buffer.

Entry: moving stuff to modules
Date: Sat May 23 14:29:02 CEST 2009

Ok, with this approach it is possible to have on-demand loading of code using require instead of load. Let's start porting kernel code to modules. Doing this, at least the constants need to somehow be defined in two steps, instead of at application level. Ok. This is going to re-arrange the test code, so I need to be careful. Moving the test to kernel code, not library code. Serial.f contains some code that can be separated out.

Entry: the monitor
Date: Sat May 23 16:45:52 CEST 2009

I'd like to put the monitor code in a module, but it needs to be parameterized by read and write. However, this code is not critical in any way, so let's turn read/write into dynamic variables. With vectors in RAM this loses a bit of robustness, but nothing a reset can't fix. Also, the kernel size will grow so it won't fit in the 512 bytes any more.. This isn't such a problem either, since I'm giving up on programmer-less operation for small devices. Maybe for the usb sticks later, but they have bigger protected boot blocks.

Ok, I messed up the current code: I thought I was testing it but forgot to upload.. So it was still running the old code. Stupid...

There's a problem: "org" doesn't work properly with the way modules generate code. This can be fixed by turning boot.f into a module too, to make sure it gets instantiated first. So.. This doesn't work.. org = side effect = not compatible with non-sequential load.

Entry: order of instantiation
Date: Sun May 24 09:27:03 CEST 2009

It is important to:

1. keep assembler-like behaviour for low-level things without actually going to the assembler.
2. keep high-level module behaviour.

So, how can this be made more painless? Now wait. There is nowhere in the assembler where "org-pop" is actually executed. Ok, it's in "with-pointer". The problem seems to be here:

  #x0040 org-push : boot-40 warm ; org-pop

The ":" messes up the chain.. Since I can't get this right and can't see the error straight away, there is probably something wrong with the architecture. Ok. Removing the ':' fixes the problem:

  #x0000 org-push #x0020 jw ; org-pop
  #x0040 org-push warm ; org-pop

So, this worked with "org" but not with "org-push". Can't use "org" any more because the order of instantiation is not predictable. The initial compilation point is defined in pic18.ss and can essentially not be changed. Ok, so I'm going to take the "org!" out of the code: only relative access allowed.

OK. Now, why doesn't ':' work inside an org-begin .. org-end? It calls make-target-split. No idea.. I've worked around the problem by generating labels dynamically using ">label" and "label:".

Entry: todo
Date: Sun May 24 11:08:12 CEST 2009

FIXME: jw/cw Forth words need to take byte addresses

Entry: vectorized receive/transmit
Date: Sun May 24 12:29:20 CEST 2009

There's a problem because the interpreter uses the a and f registers, and vectorized access uses them too. No it doesn't. It's a silly bug in tethered (f! instead of a!).

Entry: stack size
Date: Sun May 24 12:35:09 CEST 2009

Instead of having the interpreter tell the stack size, it's probably better to allow inspection of the current stack pointer, so the host can determine stack size by knowing bottom and direction.
This is to decouple the interpreter from the hardcoded "ds-bottom" macro.

Hmm.. I messed something up again.. I get protocol errors. Something with this returning void sometimes:

  (define (ts-copy)
    (let ((it (a>/b (stacksize) #x80)))
      (if (void? it) '() it)))

Looks like (a>/b 0) is not valid. It instead requests a 256 byte string, because "0 for ... next" behaves as "#x100 for ... next".

Entry: where is the first element on the data stack?
Date: Sun May 24 17:26:02 CEST 2009

The problem is that with the top of stack in WREG, the stack can't be empty. There is always at least one element, which will be written to the bottom of the reserved memory when a new word is loaded. This element however will be ignored when the stack is displayed.

  (define (ts-copy)
    (reverse (a>/b (stacksize) (+ 1 (stackbottom)))))

Entry: icd2 serial
Date: Sun May 24 18:53:01 CEST 2009

I'm moving away from the icd2 serial port. It never really worked well. It's not too much of a problem to have both a programmer and a serial cable attached. The programmer flashes a lot faster too.. So, the preferred way of working for PIC18:

* pk2 attached to flash kernel code (which is a .fm module)
* hardware serial port for interaction, high baudrate
* .dict holds kernel code, scripts don't get retained
* script buffer junk erased on terminal connect

This means: no permanent incremental development. Permanent code can only be flashed as an entire image. It is possible however to construct some kind of fusing mechanism based on interaction.

Entry: the synth
Date: Sun May 24 18:56:55 CEST 2009

Time to port the synth to module code. With some minor changes it still seems to work. Playing is for another time. Maybe when fixing the docs?

Entry: words in namespace
Date: Sun May 24 20:54:12 CEST 2009

Moving everything to modules does create the problem that not all words are visible in the toplevel namespace. A simple require will fix that, but still.. There should be a better way..

Entry: interaction macros: not easily composable
Date: Mon May 25 00:26:54 CEST 2009

Make some syntax for this. It's currently not straightforward to work on a project and add some live words. Also, think about the role of scat: in this. It's a bit of an unnecessary middleman, no? Is there a way to piggyback target interaction on simulation? I.e. create an opcode with semantics to perform a host -> target remote procedure call? Maybe it should be more dynamic (do-what-i-mean), since it is a debugging tool after all. I've got all the static tight-assness I want in the module system, so let's let it rip in the interaction.

Entry: removed old vm commands
Date: Mon May 25 09:34:13 CEST 2009

;; quoted here for later reference. this code is probably broken.

;; Entry point for (syntax-only!) live interaction -> prj code
;; transformation.
(define (live->prj code)
  (define default
    (predicates->parsers
     (number? ((n) (n tlit)))
     (symbol? ((w) ('w tinterpret)))))
  (apply-parsers-ns/default '(live) default code))

;; Append a line to a log of lines.
(define (log-line str stack)
  (if (or (null? stack)
          (not (equal? str (car stack))))
      (cons str stack)
      stack))

;; DIRECT
(provide vm->native/compile live/vm->prj)

(define (underscore stx)
  (->syntax stx (string->symbol (string-append "_" (symbol->string (->datum stx))))))

(define (vm->native/compile code)
  (define default
    (predicates->parsers
     (symbol? ((w) (|'| #,(underscore #'w) |'| _compile macro/default)))
     (number?
      ((n) (n _literal)))))
  (apply-parsers-ns/default '(compile-vm) default code))

(named-parsers (compile-vm)
  (0cmd ((w) (w)))
  (|:| ((_ name) (: #,(underscore #'name) enter)))
  (|;| ((_) (_exit))))

(named-parser-clones (compile-vm)
  (0cmd pa clear))

;; FIXME: abstract out ns/default thingy
(define (live/vm->prj code)
  (define default
    (predicates->parsers
     (symbol? ((w) ('#,(underscore #'w) tf _tlit 'dtc tfind texec/w)))
     (number? ((n) (n _tlit)))))
  (apply-parsers-ns/default '(live-vm) default code))

;; FIXME: find a way to extend the other live commands.
;; map these to their '_' counterpart
;; FIXME: commands that take no args can be simply mapped.
;; (define (_command? x) (element-of x '(ts tss tsx cold ping)))
(named-parsers (live-vm)
  (0cmd  ((w) (w)))                    ;; just use same as native
  (_0cmd ((w) (#,(underscore #'w))))   ;; special
  (1cmd  ((w) (_t> #,(underscore #'w)))))

(named-parser-clones (live-vm)
  (0cmd  commit clear pa ppa cold ping)
  (_0cmd ts tss tsx)
  (1cmd  p ps px kb))

Entry: do what i mean
Date: Mon May 25 09:51:01 CEST 2009

Let's change the semantics of the console interaction as follows. The (target) namespace has the semantics:

* target prefix parsers -> expand
* target words -> execute
* concatenative macros -> simulate
* scat: infer type + run
* scheme: infer type + run

The problem with this is the "infer type" part. For scheme -> scat it's not too difficult to do dynamically using rpn-wrap-dynamic. Ok, added the form (scat-dwim id). The general idea is that for code you want static features, but for interaction/debugging you really want maximum flexibility.

Entry: usb.f and indexed addressing
Date: Tue May 26 09:12:13 CEST 2009

Is this really necessary? It is tempting to use, but it will make supporting code for PIC18 difficult. The relative addressing is quite an extensive change. The real problem however is that "struct" addressing needs namespace support in the language. This will be a bigger hurdle. I talked about this before[1]: there is something to say about structs vs. applications in functional programming languages. Can we do the same for Forth? Use the data stack as a constructing/deconstructing device?

[1] entry://20090322-215126

Entry: MetaML and future of Staapl
Date: Tue May 26 09:51:27 CEST 2009

Removing this from the introduction page:

  Related Work

  Compared to MetaML, which seems to be the current reference point
  for staged programming systems, Staapl contains functionality
  related to MetaML combined with abstract interpretation. However,
  Staapl is quite different, as it is a non-homogeneous two-stage
  system based on flat combinators and dynamic typing, while MetaML
  is a homogeneous multi-stage system based on the lambda calculus
  and static ML-style typing.

It's not so clear.. The real difference is that Staapl is a bridge between something that behaves as a Forth macro system and the scheme procedure/macro system. The concatenative macros are special in that they do not deal with names, so they could be compared to the constrained code manipulation that's possible in MetaML. But Staapl also contains a complete scheme-like macro system that can abstract over names (the prefix parsers), bringing it far outside the reach of the static analysis allowed by MetaML. It would be nice to get some discussion going with Walid Taha or one of his students about this. I tried contacting him in an informal way but got no reply.
Anyways, I'd like to be able to understand the real differences between macro systems and staging, but as far as I can see in the literature, the bridge between them is still being built. Dave Herman's work is interesting in this respect, as is the Ziggurat system. Now, my goals are humble. As arrogance and achiever mentality start to fade, I see what I did in perspective. It's not really rocket science. I'm glad I've got the bridge to PLT Scheme worked out, but there are still things that I don't really like that much:

* Inability to integrate true partial evaluation. The basis is there; I just don't see the light yet. What I do understand is that PE is more of an art than a science, since it is mainly about avoiding code explosion due to inlined recursive calls. That problem is related to the halting problem. The field seems to be mostly about "bags of tricks", requiring a lot of study to get a good idea about what people have tried and what works and what doesn't.

* Separate machine peephole optimizations and generic ones. I.e. the behaviour of '+' should be extendable, to be handled by the main compiler if both arguments are available, and by the target compiler for 1 and 0 available compile time arguments.

* A type system. A lot is to be gained by a proper type system that would enable processing of concatenative macro code _before_ handing it over to the eager evaluator. I.e. trying different permutations that are possible due to commutativity of operations. This requires building an algebraic system of combinators that can perform simplifications at compile time. It would probably also help with lifting the language semantics to a bit higher level, and simplifying the peephole optimizer.

* Run-time and compile-time stack interaction. If pops and pushes are made lazy, it is possible to perform data-flow analysis on words with fixed stack effect. However, I currently don't know how to mix the side effects of pushing/popping with re-arranging machine instructions. I'm already using something like this in the live simulator, where it is clear how it should work.

Entry: standard forth
Date: Tue May 26 17:03:38 CEST 2009

What would be the most useful way to incorporate standard Forth into the project? The reason you'd want standard Forth is to use already existing Forth code. There are several problems that prevent standard Forth use at this moment. The most severe ones are:

* 8-bit cell size -> some intermediate layer that implements 16-bit access is necessary.
* non-standard parser.

The latter isn't such a problem. Once it's clear what kind of VM we want for the 16-bit forth, a self-hosted Forth should be bootstrappable with the already existing parser. The question is then: what kind of VM? A subroutine threaded 16-bit VM would work better with the already existing architecture, but an indirect threaded Forth is easier to implement in a machine-independent way. Interoperability is more important than reduced implementation complexity, so let's start by creating a subroutine threaded compiler in Forth.

To write a self-hosting compiler, the form of the dictionary should be made explicit. I need at least this:

  [ link | name | CT | XT ] code ....

Where the dictionary either contains a pointer to the code, or the code inlined. Probably a pointer is better, since then the dictionary could be stored somewhere else (and stripped).
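To pin that record layout down, here is a small host-side model in Scheme; all names are hypothetical, and the struct accessors play the role of the projection words proposed next:

  #lang scheme

  ;; [ link | name | CT | XT ], with the code reached through xt.
  (define-struct rec (link name ct xt))

  ;; walk the link chain for a matching name.
  (define (find dict nm)
    (cond ((not dict) #f)
          ((string=? nm (rec-name dict)) dict)
          (else (find (rec-link dict) nm))))

  (define dict
    (make-rec (make-rec #f "dup" 'ct-dup 'xt-dup)
              "drop" 'ct-drop 'xt-drop))

  (rec-xt (find dict "dup"))  ; => 'xt-dup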
So what about this:

- FIND returns a dictionary record
- REC>XT returns the interpretation semantics
- REC>CT returns the compilation semantics

A dictionary structure is then

  [ link | CT | XT | name ]

where the name is a padded pascal string. Do strings have a special representation? Also, I want a Forth with space-safe tail calls.

Looks like the most difficult part is string comparison. I wouldn't know how to do that without a huge amount of stack shuffling (pointers and sizes). Let's see how this is usually done. I'm starting to think that writing a self-hosted Forth that needs to be bootstrapped isn't such a good approach. It's probably easier to write it top-down and just bootstrap the dictionary.

Entry: comparing two strings -> generators
Date: Tue May 26 20:03:38 CEST 2009

This is one of the examples which is really hard to do in the current PIC18 Forth. It's a memory to memory operation which doesn't fit well in Forth, simply because it uses a couple of arguments (size + address) with some of them double length (address). In scheme, there is always tail recursion for "parallel assignment": updating all the elements in a state vector simultaneously in terms of the previous values. In Forth, due to the absence of random access names, this is more difficult, and has to be serialized into a sequence of operations using stack juggling. It's probably best to use an abstraction for this: generators.

Entry: more libraries
Date: Wed May 27 19:24:28 CEST 2009

Now that I'm going over it again, I see how much I've been writing assembler in Forth, using tricks and escapes to get to small and efficient code. One of the examples is the "=" operator... There isn't one! The reason is that I've been using flag based condition macros everywhere. The PIC18 is good at this. Now that I'm writing a string comparison routine where the _only_ point is a destructive test (not a test followed by another operation), I find that I'm lacking proper operations. Let's build some. Actually, they did exist. I discovered them right before the deadline in the Waag project.

Entry: rc files
Date: Wed May 27 20:21:15 CEST 2009

Extend the staaplc compiler to include an instruction in the .dict file to load the .rc file corresponding to the project. This could then be used to contain interaction scripts.

Entry: standard forth
Date: Thu May 28 07:36:23 CEST 2009

Before I lose them again, some useful links [1][2]. So, I have most of the difficult decisions made:

* Forth is subroutine threaded
* @ and ! are RAM, flash has separate access words
* dictionary and code are in separate flash regions
* dictionary uses Pascal strings
* , (comma) goes to RAM first, into a circular buffer

Still to solve:

* FIND
* terminal input
* large branches

There is one thing I'm not convinced about: the threading model. Maybe it's better to stick to some form of address interpretation. It will make things a lot slower, but won't have double sized call words or chained jumps. Anyways, I already did a lot of thinking about this, the result of which can be found in the pic18/vm-core.f code.

I'm again stuck in this loop of not knowing what to choose: fast or flexible? Putting it like this, it should of course be flexible. Can't I have both? It should be really simple though: for speed, use the 8-bit Forth. It's always going to be faster. For the one on top of that, use anything that has compatible primitives _and_ has small code size and maximum flexibility otherwise. The VM in pic18/vm-core.ss uses threaded code with an exit bit to implement tail recursion.
It also sees the return stack as code.

Some links:

[1] http://astro.pas.rochester.edu/Forth/forth-words.html
[2] http://lars.nocrew.org/dpans/dpans.htm

Entry: partial evaluation is proper commutation
Date: Thu May 28 07:44:28 CEST 2009

It's all about order. I had this dream where it was very clear to me. Of course, as dreams go, they only tell you what you believe, not necessarily what's true or how to get there. Makes sense though. I've seen the argument made multiple times that laziness and good PE are quite related. Laziness gives you some kind of optimal evaluation order.

Entry: debugger
Date: Thu May 28 14:40:31 CEST 2009

- scripts
- require on command line
- arbitrary code compilation on command line (1)
- variables crash

(1) This is best done by interpreting ':' as a switch to compile mode for the rest of the line. It needs a special prefix parser. Problem is: this is not composable (you can't do anything _after_ compiling the code), so it probably needs a continuation hook to be able to build stuff on top. Let's define the "compile" word to take another macro as continuation.

  compile commit . . . .

Ha, I worked myself into a corner here.. How to call Scheme code from a macro? It's probably better to do it in two steps. I made some abstractions:

- slurp the rest of the line: slurp receiver code ... -> receiver (code ...)
- perform side effects wrapped as scat functions for calling (eval '(forth-compile-tokens c))
- switch to compile mode for a limited number of words for variable, 2variable, require, ...
- switch to compile mode for the rest of the line

All compile mode switches immediately upload code.

Entry: happy
Date: Thu May 28 17:42:52 CEST 2009

I'm quite happy with the way everything composes now. There seem to be no arbitrary limits on composition, so expressive power multiplies into beautiful flowers. So there is scat that glues everything together, scat's unquote to be able to turn anything into scat code, scheme for stuff that needs local lexical names, and the rewrite system (based on scheme's syntax pattern matching) for any kind of "exceptions" that might arise. The late-bound interaction language also makes things a lot easier to implement. It looks like it's ready to be finalized.

Entry: issues to solve
Date: Thu May 28 23:25:07 CEST 2009

- command completion
- regain flat namespace (import all modules) (1)
- snot

(1) is now done in code.ss -- it is essentially a reverse name lookup, independent of the forward name resolution that is performed using identifiers only. This dictionary could then be used at the debug console.

I'd like to fix the disassembler too. Maybe it's better to fix the CFG rep first. I've never really liked the way code lists are linked to labels. It feels artificial. So, is there a way to solve this problem? Maybe the CFG should be defined in a more textbook manner. Looking at it from the pov of the assembler:

- labels: point to code, have an address that can be modified
- code: points to labels

So, there is a static part (labels + assembly code) which is a graph. To this graph another data structure is associated which maps labels to addresses and assembly code to binary lists. Maybe this should just be abstracted into some datatypes. Ok, this is probably what is necessary: the default data structure on which the compiler operates, and for which there exist printing routines, is the CHAIN, which is a list of (label code) lists.
In addition, each LABEL (target-word) points to its associated code and its next instruction; however, these are only there for reference and might be implemented separately. Both during compilation and disassembly these might be invalid.

Entry: list split
Date: Fri May 29 12:58:19 CEST 2009

There is one simple parsing step that keeps occurring over and over in dealing with flat things: introduce structure by grouping, based on a predicate over one of the lists. I.e.:

  (x x x x x x x x x)
  (1 0 0 0 0 0 1 0 0)  ->  ((x x x x x x) (x x x))

This has to have some kind of name.. This is basically "regexp split", but then done on multiple sequences. What this needs is two functions:

- match
- combine

Entry: vm stuff
Date: Fri May 29 17:47:55 CEST 2009

OK.. It doesn't work any more. Somewhere something went wrong. I just switched the Forth to use byte addresses for "address". Let's try to debug it. OK, it was storing byte addresses, but expecting word addresses.

Now, how to distinguish words from macros? This requires some restructuring. The problem is trivial when using compilation tokens, but I have an inverted dependency somewhere.. Hmm.. I tried restarting 5 times.. There's a problem here since we're using the same namespace. Try again..

The problem is this: my macros already have compilation semantics. The distinction between word/macro is made in the dictionary compilation process. Overriding this is asking for trouble: it's essentially unhiding something that's already abstracted. There has to be a simpler way to do this. I removed pic18/rpn-macro-double.ss to start over. There is already a mechanism for this, but it's also not composable: the way forth-begin is defined in pic18.ss.

Conclusion: the forth definition code is not re-entrant, because it is tied to the (macro) namespace: can't define Forth on top of Forth. However, it is possible to hide this behind namespaces, but I'm not sure if this is worth the trouble. It would get quite confusing too.. The only way to do this is to use nested namespaces..

Entry: forth on forth
Date: Sat May 30 09:05:05 CEST 2009

The proper way to have _both_ an anonymous compiler and a dictionary compiler requires some juggling. This is an energy sink.. It's clearly not anticipated in the design, so I wonder if I should pursue it. The irony is that a threaded Forth compiler is really trivial compared to one based on composition of macros. What should this be used for?

- A standard Forth: yes
- A standalone Forth: yes
- Use inside native code: maybe

The important observation is that a standard Forth doesn't need the "macro:" layer, because it can be written entirely in the interaction mode. This is how it was implemented before. In other words: the toplevel namespace module is better suited to a standard Forth than the declarative module one. Let's not waste too much time on this.

Conclusion: there is no way around writing a full-fledged Forth parser, which either runs on the target or on the host. It's probably possible to make some minor hacks in target-double: but that's what they'd be: hacks. Anyway, it really only needs ":" and "variable". Then, maybe there really is no reason to first write a tethered standard Forth and then make it stand-alone? Probably best to have only the standalone version.

Entry: Forth mode
Date: Sat May 30 11:10:21 CEST 2009

Traditionally, Forth has 2 modes: interpret / compile. The Staapl Forth is modeless. It instead has two languages: "macro:" is the compiler language, while "target:" is the command line interpreter language.
Both models are similar. The main difference is that in Staapl "macro:" doesn't know anything about "target:", while in Forth words are words..

Entry: byte addresses
Date: Sat May 30 11:20:30 CEST 2009

I'm losing too much time on byte/word addresses. The original idea was to use only word addresses to make the assembler simpler, but that's inconvenient in Forth if there is also to be byte data access. Let's keep the assembler as is, but make the change at the interaction level. The problem seems to be in the function "tfind", which treats code and data the same. This needs to be split into two functions:

  find-code, find-data :: symbol -> byte.ptr

These should then be defined in the toplevel namespace so the interaction code can pick them up there.

Entry: Forth VM next
Date: Sat May 30 14:07:36 CEST 2009

Need to check if the macros work with the vm: language. Currently the hack to have a bootstrapping Forth console for the 16-bit VM seems to be simple and effective. Ok. Macros work.

But.. I simply can't have it that I spent all this time making the Forth layer abstract enough to be able to reuse it for other architectures, but I can't seem to make it work for building Forth on Forth. As mentioned before: I'm probably not going to use that layer much. To make the self-hosted Forth possible, all macros need to be implemented as on-target words. That would be the next step.

Now, instead of always looking at host -> target RPC, what about enabling target -> host calls? This would make bootstrapping a self-hosted interpreter simpler: it would enable gradual offloading. That seems to be the real problem to solve!

Entry: target-directed to-host offloading
Date: Sat May 30 14:51:37 CEST 2009

Figure out a way to make the tethering protocol bi-directional. The target should be able to request a computation to take place on the host. It's better to use the target's code sequencing to debug code than to offload this as simulation to the host. The simplest way to do this is to use a symbolic interface: the target sends a string requesting a command to execute, and waits for an ack.

Ok, so what needs to be done is to find a way to implement the target Forth's immediate words both as macros (using the host compiler) and as target words. This by abstracting the dictionary and the compile stack.

Conclusion: The difficulty in writing a Forth in Forth is to abstract the dictionary, compilation stack and threaded code representation in such a way that a single specification of all immediate words (mostly control words) can be used in different compilers.

So, can a very classical way of implementing the control words be implemented in coma? The problem is that coma uses abstract labels, not target addresses, and a real forth would use forward jump patching. But there are only two forms: forward and backward jumps. Both will save some data on the compilation stack and consume it later; the compilation stack starts out empty.

                  PUSH          POP
  forward jump    jump + hole   here -> hole
  backward jump   here          jump

From [1]:

  Bill Muench designed the original eForth 1.0 for simplicity and
  portability. It had only 30 words written in assembler and used
  only BEGIN_UNTIL BEGIN_WHILE_REPEAT IF_ELSE_THEN and FOR_NEXT in
  its source. The second release reduced the number of code words to
  28 and removed the FOR_NEXT constructs from the source code and
  replaced them with BEGIN constructs. I was pleased to learn this
  when I was designing a meta compiler to generate a version of
  eForth for a new target.
  It meant that there were fewer IMMEDIATE words that were needed in
  the meta compiler. The meta compiler no longer needed to compile
  FOR_NEXT constructs.

[1] http://www.ultratechnology.com/meta.html

Entry: niche
Date: Sat May 30 16:00:25 CEST 2009

Well put:

  Programming philosophy: don't bother with any higher-level language
  -- just write in and extend the operational semantics directly by
  adding new virtual machine instructions. You'll be forced to think
  more clearly about what your programs do. You'll encounter fewer
  "impedence mismatches" where you have to fight against the
  programming language to say what you mean (e.g., tail calls in C).
  You'll probably come up with results that are much, much more
  economical in all but time to market.

There's something interesting about modularity in forth:

  In forth, each word's author is expected to be cognisant of and
  responsible for the whole state of the machine. For the price of
  assuming cooperation and trust between components, you get enormous
  flexibility and power.

[1] http://lambda-the-ultimate.org/node/2319#comment-34864

Entry: This week
Date: Sun May 31 08:51:21 CEST 2009

With the interaction problems fixed, it's time to get the usb driver going. This has been the subject of procrastination for far too long. Let's start with this idea of Forth, explained in the LtU thread I quoted in the previous post: build a state machine that can solve your problem, and violate locality where necessary (imposed co-operation). Basically this is component based design. Electronics. Forth is about writing code as you would write hardware: the finiteness is central to the idea. Singletons with interfaces.

On the theory side, it's important to start acknowledging the difference between concatenative macros and prefix parsers. So, goals for this week:

- main: usb driver
- on the side: target -> host rpc

Entry: the 14-bit VM
Date: Sun May 31 15:22:19 CEST 2009

I'm pasting the source code of the more exotic VM in this post as a backup. Code will be modified in-place to move to a simpler DTC architecture. There are 3 files:

  dtc-control-i.ss   on-target immediate words (untested)
  dtc-control-m.ss   same, but using host macros
  dtc.ss             core interpreter

----------- dtc-control-i.ss -----------

#lang planet zwizwa/staapl/pic18 \ -*- forth -*-
provide-all

\ On-target immediate words implementing the control words.

staapl pic18/double-math
staapl pic18/double-pred
staapl pic18/execute
staapl pic18/dtc

\ This needs "comma" and a way to back-patch words. The idea is to
\ compile to a RAM buffer first, and transfer it to FLASH when it's
\ done.

staapl pic18/double-comma

macro
: _address word-address lohi ;
forth

: _mask    #x3F and ;
: _lmask   _mask #x40 or ;
: _compile _mask _, ;   \ takes word address as 2 bytes
: _literal _lmask _, ;
: _0       0 0 ;

\ These compile unconditional and conditional jump.
: _jump,   ' _run _address exitbit _compile ;
: _0=jump, ' _0=run; _address _compile ;

\ Jumps are proper primitives. They take a single argument which we
\ compile as a literal.
: _hole  _here@ _0 _literal ;
: _lpack _>> _lmask ;                \ pack byte address as literal
: _then  _>r _here@ _lpack _r> _! ;  \ patch hole
: _if    _hole _0=jump, ;
: _else  _>r _hole _jump, _r> _then ;
: _begin _here@ ;
: _again _lpack _, _jump, ;
: _until _lpack _, _0=jump, ;

\ COMPLICATIONS: because of the exit bit, jump targets need to be
\ protected so the previous instruction doesn't get exit-tagged.
\ See -m.ss.

---------- dtc-control-m.ss -----------

#lang planet zwizwa/staapl/pic18 \ -*- forth -*-
provide-all

\ Macros implementing the control words. For a self-hosted
\ interpreter these need to be replaced by immediate words.

staapl pic18/double-math
staapl pic18/double-pred
staapl pic18/execute
staapl pic18/vm-core

macro

\ note: XT need to be word addresses, since i have only 14 bit
\ literals. return stack still contains byte addresses though, so for
\ now it's kept abstract.

\ create a jump label symbol and duplicate it (for to and from)
: 2sym>m  sym >m m-dup ;

\ jumps are implemented as literal + primitive (instead of reading
\ from instruction stream)
: m>jmp    m> literal ' _run compile _exit ;
: m>0=jmp  m> literal ' _0=run; compile ;

: _begin 2sym>m m> label: ;               \ back label
: _again m>jmp ;                          \ back jump
: _until m>0=jmp _space ;                 \ conditional back jump

: _if    2sym>m m>0=jmp ;                 \ c: -- label1
: _else  2sym>m m>jmp m-swap m> label: ;  \ c: label1 -- label2
: _then  m> label: _space ;               \ c: label --

: _space ' _nop compile ;  \ necessary when 'return' needs to be isolated.

\ : _for  _2sym>m m> label ' do-for compile ;   \ c: -- label
\ : _next _m>literal ' do-next compile _space ;

: _for  ' _>r compile _begin ;
: _next ' do-next compile m>0=jmp ' _rdrop compile _space ;

------------ dtc.ss -------------

#lang planet zwizwa/staapl/pic18 \ -*- forth -*-
provide-all

staapl pic18/double-math
staapl pic18/double-pred
staapl pic18/execute

\ ************************************************************************
\ A direct threading composite code interpreter. It has a number of
\ small differences to standard Forth. The idea is this will run a
\ version of forth without parsing control words, but using quoted
\ code instead.

\ *** CONTINUE resumes the execution of the VM, more specifically the
\ program pointed to by IP. A program is an array of primitive
\ instructions. Primitive instructions are primitive code (word)
\ addresses + a continuation discard bit (EXIT bit). IP is
\ implemented by TBLPTR (the f register).

\ *** I want to express iteration using TAIL RECURSION. This means
\ the caller needs to pass the proper continuation to the callee on
\ the RETURN STACK, discarding the current thread if necessary. For
\ this purpose, one 'EXIT' bit will be reserved in the instruction
\ field, and the interpreter loop will pop the stack before calling
\ the next primitive.

\ *** A continuation can be invoked by RUN, so there is no distinction
\ between programs and continuations. A continuation takes a data
\ stack as argument, just like ordinary programs. RUN is the dual of
\ forth's EXECUTE, which is used here to invoke primitives.

\ *** The machine return stack is reserved for the underlying STC
\ forth / machine code. The VM uses the STC retain stack as return
\ stack, to limit interference.

\ *** To treat composite code as a primitive, an array of primitive
\ instructions needs to be prefixed by a machine code element 'CALL
\ enter', which will save the current continuation (IP) and invoke a
\ new one. This 'enter' could be duplicated if a large address space
\ is spanned, so a short branch can be used.

\ *** The interpreter is explicit: this is done so that primitives do
\ not need to end in NEXT, as is done traditionally, enabling the use
\ of native/STC primitives. All 16-bit primitives are prefixed with
\ '_' (underscore) so they are easily mapped and debugged in STC
\ forth.

\ TODO: some modifications.
\ - all data sizes used (literals, primitives, composite) fixed at 14bit
\ - interpreter runs on top of memory model: composite code in ram possible
\ ************************************************************************

\ IP + RS

\ instruction pointer manipulation. only the ones that affect the
\ machine return stack and machine flags need to be macros. the rest
\ can be functions for ease of debugging.

macro
: @IP+ @f+ ;  \ read bytes from the instruction stream
forth

: _IP! _<< fh ! fl ! ;  \ store to IP

: enter  \ asm (rcall ENTER) wraps composite code in prim
    _IP>r
    TOSL fl @!  \ TOS cannot be movff dst, but src is ok
    TOSH fh @!
    pop ;

: _>r    >r >r ;
: _r>    r> r> ;
: _rdrop rdrop rdrop ;

\ These 2 govern the format in which threaded addresses are stored on
\ the return stack. For return stack tricks to work, this is taken to
\ be word addresses.

: _IP>r  \ save current IP to VM RS
    clc
    fh @ rot>>c +r !
    fl @ rot>>c +r ! ;

: _r>IP  \ pop IP from VM RS
    clc r- @ rot<carry and LIT->negative flags.

: exit?    c? ;
: literal? n? ;

: prim@/flags  \ fetch next primitive from composition
    \ clc  \ low bit is ignored by PIC
    @IP+ rot< exitbit ,, ;

: _; _exit ;

\ utility macros
: _c>> rot>>c 2nd rot>>c! ;
: _<IP then         \ c -> perform exit
    literal? if 14bit ; then  \ n -> unpack literal
    execute/b continue ;      \ execute primitive

: 14bit  \ interpret doubleword [ 1 | 14 | x ] as a signed value.
    _c>>                          \ [ x | 1 | 14 ]
    #x3F and                      \ high bits -> 0
    1st 5 high? if #xC0 or then   \ high bits -> 1
    continue ;

: _bye pop  \ quit the inner interpreter
: _nop ;

\ trampoline entry. 'interpret' will run a dtc primitive or primitive
\ wrapped program.
: bye>r enter ' _bye compile _exit

: interpret  \ ( lo hi -- )
    bye>r       \ install continuation into dtc code "bye ;"
    execute/b   \ invoke the primitive (might be enter = wrapped program)
    continue ;  \ invoke threaded continuation

\ CONTROL FLOW WORDS

\ 'run' is the dual of 'interpret'. it takes threaded code addresses.
\ in combination with the exit bit, this can be used to implement
\ conditional jumps.

: _run  \ word-addr --
    _IP>r _IP! ;

\ : _0=run \ flag addr --
\     _run
\     or nz? if _r>IP then
\     drop ;

\ "go" = "run ;"

\ i don't want to use the word 'jump', but conditional jump is not
\ the same as conditional run.

: _0=run;  \ ? program --
    _>r or nz? if _rdrop else _r>IP then drop ;

forth

: do-next  \ -- ?
    _r> _1- _dup _>r _0= ;

Entry: Simpler DTC
Date: Sun May 31 15:39:24 CEST 2009

The 14-bit VM is clever, but it's not simple. It might be used for something else later. The fact that the return stack is executable code might lead somewhere. However, the 14-bit constants are a pain, and the extra effort to make tail recursion work is not worth it. Right now what is important is to get the self-hosted Forth to work and make it reasonably portable. Let's go back to a simple threaded Forth. I'm not sure what the name is for the method I'm using: it's direct threading[1], with primitives that do not use NEXT, but instead use an explicit interpreter loop. (NEXT = procedure return.)

[1] http://en.wikipedia.org/wiki/Threaded_code#Direct_threading

Entry: buffered compile working
Date: Sun May 31 20:12:45 CEST 2009

In the end it turned out to be simple, with the right primitives. However, to get there I had to tone down my enthusiasm..
The only way to write lowlevel Forth is this:

  - write primitives for your problem
  - test the primitives
  - write the high-level code

Even for the simplest problems (like moving a buffer from ram to
flash after extending its bounds) it pays off to leave the muddy
waters of low level machine state as soon as possible.  But
inevitably this step has to be performed.  The testing is what makes
this all work.  If it wasn't so easy to compile a word and test it,
working like this would be quite difficult..  Once the primitives
were right, the stuff on top was really obvious.

Entry: usb and debugging
Date: Mon Jun 1 07:21:38 CEST 2009

Let's get target->host communication working to at least make debug
print statements work.  The idea is this: whenever a command gets
executed, the host waits either for an ACK (zero length message) or a
command to execute.  For now let's just stick to display.  It is
quite trivial.  Apparently "emit" was already defined as

    : ack1 1 transmit transmit ;

The host side then is simple: on every execute, expect printouts
before the ack (empty message).

    (define (tslurp)
      (let ((reply (target-receive/b)))
        (unless (null? reply)
          (display (list->bytes reply))
          (tslurp))))

    (define (texec/b addr)
      (~texec/b addr)
      (tslurp))

Entry: usb
Date: Mon Jun 1 10:44:27 CEST 2009

So, how to tackle USB.  On the PIC this boils down to dealing with
endpoint buffers, so let's write some abstractions to deal with
those.  We can use the simplest scheme: no double buffering, one
endpoint and fixed buffers for IN and OUT.

An endpoint buffer descriptor is a 4-byte structure:

    0  STAT  status register
    1  CNT   buffer elements
    2  ADR   buffer address (2 bytes)

This descriptor resides in the USB RAM, which is a dual ported memory
bank accessible by the MCU (microcontroller unit) and the SIE (serial
interface engine).  Ownership is governed by the UOWN bit in the STAT
register for each buffer.  The buffer descriptor addresses are mapped
to buffer descriptor registers when the UEPn bit is set (endpoint
enable), or to RAM when the endpoint is disabled.  The STAT
register's contents depend on whether the MCU or SIE owns the
endpoint buffer.

              7     6    5     4       3      2       1    0
    SIE mode  UOWN  -    PID3  PID2    PID1   PID0    BC9  BC8
    MCU mode  UOWN  DTS  KEN   INCDIS  DTSEN  BSTALL  BC9  BC8

Jun 1 13:21:15 zzz kernel: [415980.152170] usb 4-1.3: new full speed USB device using ehci_hcd and address 95
Jun 1 13:21:15 zzz kernel: [415980.224150] usb 4-1.3: device descriptor read/64, error -32
Jun 1 13:21:15 zzz kernel: [415980.400121] usb 4-1.3: device descriptor read/64, error -32
Jun 1 13:21:15 zzz kernel: [415980.576090] usb 4-1.3: new full speed USB device using ehci_hcd and address 96
Jun 1 13:21:15 zzz kernel: [415980.648072] usb 4-1.3: device descriptor read/64, error -32
Jun 1 13:21:15 zzz kernel: [415980.824042] usb 4-1.3: device descriptor read/64, error -32
Jun 1 13:21:15 zzz kernel: [415981.000135] usb 4-1.3: new full speed USB device using ehci_hcd and address 97
Jun 1 13:21:16 zzz kernel: [415981.408010] usb 4-1.3: device not accepting address 97, error -32
Jun 1 13:21:16 zzz kernel: [415981.480547] usb 4-1.3: new full speed USB device using ehci_hcd and address 98
Jun 1 13:21:26 zzz kernel: [415991.888009] usb 4-1.3: device not accepting address 98, error -110
Jun 1 13:21:26 zzz kernel: [415991.888255] hub 4-1:1.0: unable to enumerate USB device on port 3

It seems to try 5 times to reset the device.  Apparently we need to
send something back.
In [1], section 8.3.3 (usb enumeration) I find this:

  - host sends USB RESET
  - host sends GET DESCRIPTOR to find out

After URSTIF (6) there's ACTVIF (4).  It looks like we need to send
something back, but what?  Maybe [2] will help.  It contains the USB
stack for PIC18.

[1] http://www.elsevier.com/wps/find/bookdescription.cws_home/714114/description#description
[2] http://www.microchip.com/stellent/idcplg?IdcService=SS_GET_PAGE&nodeId=2680&dDocName=en540668

Entry: usb cont
Date: Mon Jun 1 16:50:48 CEST 2009

Problem was incorrectly initialized buffer descriptors (swapped CNT
and STAT) and not setting UEP0 to #x16 (had #x14) to enable
CONTROL+IN+OUT.

Next: handling transactions.  Get the PID from the buffer ID.  This
can only be

    0001  OUT
    1001  IN
    1101  SETUP

Ok, the next step is to handle all the requests.  What needs to be
done is to make a simple interface that maps the requests to Forth
functions.  How to make this readable?  This makes for some
incredibly boring code...  The SETUP buffer contains the following
values:

Entry: accessing structures
Date: Tue Jun 2 09:51:41 CEST 2009

I'm moving back to the previous approach, which is a pattern I've
seen a lot in Forth code:

  - set "current context"
  - operate on it

This isn't so bad, given that there is no other way to have any form
of data locality.  For this we're not going to use the extended
instruction set: just some uniquely named accessors that operate on
the current object in the a register, without changing the pointer.

OK.  Replied to GET_DESCRIPTOR, now there's a SET_CONFIGURATION
coming in.  The problem now is that I'm probably again not properly
acknowledging this request, since there is a STALL coming in, and the
host gives timeouts.  It would have been so much simpler if they'd
just made it into a single flat namespace and a uniform RPC mechanism
instead of all this if-whatever-then-set-this-else-do-that crap.  I
guess it doesn't get much muddier than this.  Interfacing with
hardware that's designed for minimal _hardware_ complexity sucks.
You get all the shit..

So it looks like I don't understand replies yet..  When to set the
DATA toggle for instance.  Enough for today..  Anyways, it looks like
this problem is general enough to be solved with humility and
acceptance[1].  It's one of those problems that seems not to be
there.  Maybe it's so hard because it actually does something really
important: throwing away the right information.

[1] http://zwizwa.be/ramblings/staapl-blog/20090602-110800

Entry: syntax-directed translation
Date: Tue Jun 2 12:10:26 CEST 2009

In [1] chapter 9 there is a section on syntax directed code generator
generation (a.k.a. Graham-Glanville).  It seems what I'm doing is
related to this, only Staapl uses RPN instead of PN, and pattern
matching is ordered (eager).

[1] http://www.elsevier.com/wps/find/bookdescription.cws_home/677874/description#description

Entry: usb next
Date: Tue Jun 2 15:30:26 CEST 2009

Simple: build better abstractions.  The first thing to do is to
abstract the buffers better.  Each buffer should be an object +
methods:

  - claim / release vs. send / receive
  - buffer chunking
  - data toggle
  - interrupt acknowledge (= transaction request queue)

Second, it might be interesting to write some highlevel interface on
top of bit access.  It's important to have the low-level interface
for when speed counts, but in general initializations are not
speed-critical, and they comprise the bulk of the (tediously
explicit) code.
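Something like the following could serve as that high-level layer: a
minimal host-side Scheme sketch, not actual Staapl API.  The UEP0
field names are from the PIC18 datasheet; everything else here
(make-register, the interface shape) is invented for illustration.

    ;; Describe a register's fields once: (name bit-position width).
    ;; The result is a function that packs named bindings into an
    ;; init value, replacing magic constants like #x16.
    (define (make-register . fields)
      (lambda bindings   ; each binding: (name value)
        (for/fold ((word 0)) ((b bindings))
          (let* ((spec  (assq (car b) fields))
                 (pos   (cadr spec))
                 (width (caddr spec))
                 (mask  (- (arithmetic-shift 1 width) 1)))
            (bitwise-ior word
                         (arithmetic-shift
                          (bitwise-and (cadr b) mask) pos))))))

    ;; UEP0 = #x16 then reads as:
    (define uep0
      (make-register '(EPHSHK 4 1) '(EPCONDIS 3 1)
                     '(EPOUTEN 2 1) '(EPINEN 1 1) '(EPSTALL 0 1)))

    (uep0 '(EPHSHK 1) '(EPOUTEN 1) '(EPINEN 1))  ; => 22 = #x16

The point is that the symbolic form documents itself: "handshake on,
OUT enabled, IN enabled" instead of #x16.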
Entry: documentation
Date: Tue Jun 2 16:30:54 CEST 2009

file:///usr/share/plt/doc/scribble/srcdoc.html

Might be better for writing docs..  Much of the code in Staapl is
quite straightforward and readable.  Especially for macros it might
make more sense to just look at the expansion instead of some
description.

Entry: more standalone forth
Date: Tue Jun 2 19:14:44 CEST 2009

NEXT: the dictionary.  I already settled on using a 2-part dictionary
with metadata and code stored separately.  Things to figure out:

  * Where to put it
  * What to put in it
  * How to store head pointers.

Note that recursion is difficult in standard Forth, because the
word's semantics is only known _after_ the definition is compiled.
One could say that immediate words don't really need recursion, so
storing the address at the invocation of ":" makes recursion work.  I
don't think the standard says anything about this..  Actually, I'm
wrong.  RECURSE is used to make sure that words can be redefined
while keeping the previous behaviour reachable for delegation.

Let's just keep it simple and use the abstract "_," to create the
dictionary entries too.  This leads to reasonably portable code.

[1] http://www.taygeta.com/forth_intro/recurse.htm

Entry: VALUE and TO
Date: Tue Jun 2 21:21:17 CEST 2009

Reading issue 1/4 of Forth Dimensions, which explains the TO concept.
This can be combined with VALUE to create words that dereference by
default but can be escaped for assignment.  Doing this in Staapl
requires a bit of modification to the way variables are wrapped.  I'm
not sure if it's actually possible, since the reflection that TO
performs might not be available.  Names are associated with a single
behaviour.

Entry: smarter bootstrapping
Date: Tue Jun 2 21:46:51 CEST 2009

So, now that I have an idea about how to make the primitives work,
maybe it's possible to modify eForth to use the buffered compiler,
then bootstrap using gForth.  Once this works, the same could be done
for a set of primitives written in Scheme.  This could then be
extended into a working ANS Forth that runs in Scheme, which can be
used to bootstrap standard Forths for other architectures on top of
the Staapl Forth.  Summarized:

  - eForth + buffered compilation
  - bootstrap self-hosted Forth for PIC18 using eForth86/gForth
  - bootstrap eForth on top of Scheme for bootstrapping the
    microcontroller Forths without eForth86/gForth

It does look like eForth is manually bootstrapped: there is just an
ASM file which contains manually compiled threaded code.  This means
it can't be metacompiled easily.  I'd like to make this a bit more
convenient.  With the new parser architecture it might be possible to
bootstrap both the target forth and the metacompiler directly from
the same source code using lazy circular programming, manually
breaking cycles in .f code if they occur.  That sounds like an
interesting challenge.

The control words don't seem to be such a problem.  The parsing words
are.  Closing the loop there is the core of the problem.  However,
there is a neat trick in eForth:

    : COMPILE ( -- ) R> DUP @ , CELL+ >R ; COMPILE-ONLY

This makes it possible to avoid parsing words in the compiler, but it
requires the code to be threaded, not natively compiled.  This makes
me think that trying to bootstrap by instantiating a 16-bit binary
image in Scheme might be feasible.  Once it is resolved and relaxed,
it can be directly transferred by mapping the primitives.  The parser
can segment the code such that names can be mapped to tokens.
From this, immediate words need to be identified so they can be used
in the compilation of the code.  This seems to be the essential
circle to break.  Parsing words defined in .f code are then no longer
part of the circle, and are written bottom-up in threaded code.

The "R>" trick might interfere with bootstrapping though..  Unless
the whole memory is lazy such that "@" effectively compiles the next
word in line..  There is probably going to be a problem where
laziness and state (here) will interfere.  In this respect it seems
that a 2-pass algorithm is simpler:

1. Construct an interpreted version of the compiler as a code graph
   based on Scheme functions that string together the primitives.
   This means the compiler cannot inspect its threading mechanism
   (it's not there!).

2. Use this version to compile the source again.

Hmm..  Direct execution might not work though.  Maybe simulating the
threading is better..  I do wonder if it's possible to use the second
pass to go over the tokens one by one and instantiate them.  If the
immediate words are runnable, they can generate the correct _number_
of tokens, but might not yet be able to resolve them.  The lazy
approach might work after all.

Wait...  If the Forth could somehow use single assignment, lazy
bootstrapping would work just fine.  The .f file is then a
specification of a string of bytes.  Maybe all this needs is a
re-interpretation of "!" and "@" ?

Entry: more bootstrapping
Date: Wed Jun 3 09:48:28 CEST 2009

Ok, I have mcf.ss parsing the code.  Now, how to fire it up?  Can it
be done fully circularly?  I.e. do "literal" and "," need to be
unrolled, or can they be taken from the source?  Let's try to solve
the first problem: to simply interpret some code.  I've implemented
the eForth[1] primitives before.  They simulate a DTC binary machine.
Now the two need to be hooked together.

[1] http://www.baymoon.com/~bimu/forth/

Entry: it's a trap!
Date: Wed Jun 3 10:32:21 CEST 2009

Combining the Staapl language tower approach and the self-hosted
Forth with immediate words is confusing stuff.  I keep seeing ways to
bootstrap more directly using the compiler infrastructure, but this
requires unrolling the immediate words.  Now, in essence, that is not
so difficult.  There are only a couple.  What keeps escaping me
however is how to do this automatically.  Given a single (standard)
Forth file, lift the immediate words (and their dependencies!) to the
macro side.  What this requires is a shift in perspective: Forth
macros are lowlevel macros, so they should really correspond to
_scat_ words instead of _macro_ words.

So to keep things in perspective:

  * circular bootstrapping seems interesting, but might not be
    necessary if the forth code can be unrolled just enough to
    compile itself.  the challenge here is to find the right
    primitives (prefix parsers and compiler words) to do this..

Entry: old comments about bootstrapping
Date: Wed Jun 3 10:42:16 CEST 2009

I'm running in circles..  Good thing I write things down; now I just
need to read it back from time to time!

--

;; I've been looking for a long time to find a solution to writing a
;; frontend that's ANS Forth compatible.  I'm still not sure whether it
;; is really useful at this point, but if it is not too difficult, it
;; might be a nice addition that enables the inclusion of Staapl into
;; a more traditional Forth based project.

;; The problem in itself isn't very difficult:
;;
;; * find a Forth written in Forth + a small set of primitives
;; * implement the primitives.
;; * bootstrap the compiler
;;
;; However, I'd like to do it in a way that enables some more
;; flexibility.

;; After pondering this for a while, I think this might be an
;; interesting approach: Write a simulated Forth, use it to generate a
;; memory image, and translate the compiled threaded code to run on
;; top of Scat's Forth / Coma.

;; Doing this in a way that enables gradual offloading to the target
;; is not that simple.  Wanting more control over dictionary format
;; and execution model (I'd like to use STC primitives) makes things
;; quite challenging.

;; http://lars.nocrew.org/dpans/dpans.htm
;; core wordset:

;; ! # #> #S ' ( * */ */MOD + +! +LOOP , - . ." / /MOD 0< 0= 1+ 1- 2!
;; 2* 2/ 2@ 2DROP 2DUP 2OVER 2SWAP : ; < <# = > >BODY >IN >NUMBER >R ?DUP
;; @ ABORT ABORT" ABS ACCEPT ALIGN ALIGNED ALLOT AND BASE BEGIN BL C! C,
;; C@ CELL+ CELLS CHAR CHAR+ CHARS CONSTANT COUNT CR CREATE DECIMAL DEPTH
;; DO DOES> DROP DUP ELSE EMIT ENVIRONMENT? EVALUATE EXECUTE EXIT FILL
;; FIND FM/MOD HERE HOLD I IF IMMEDIATE INVERT J KEY LEAVE LITERAL LOOP
;; LSHIFT M* MAX MIN MOD MOVE NEGATE OR OVER POSTPONE QUIT R> R@ RECURSE
;; REPEAT ROT RSHIFT S" S>D SIGN SM/REM SOURCE SPACE SPACES STATE SWAP
;; THEN TYPE U. U< UM* UM/MOD UNLOOP UNTIL VARIABLE WHILE WORD XOR [ [']
;; [CHAR] ]

;; The problem with implementing this in a way that it can be
;; simulated on the host and moved to the target lies in 3 parts:
;; * INPUT: ACCEPT
;; * DICTIONARY: FIND WORD VARIABLE CONSTANT CREATE POSTPONE
;; * THREADING:

Entry: lifting immediate words to staapl
Date: Wed Jun 3 10:44:49 CEST 2009

I get these words with the simplistic code up to now:

    compile ! >buf hole r> >r here@ ?jump , ; jump

The "compile" hack probably needs to go.  So..  This needs a compiler
that compiles threaded code.  The wrapper macros will compile
(dw addr).  More later.  It's confusing me again..  This is something
for the morning (16:53).

Entry: TTL video
Date: Wed Jun 3 14:43:47 CEST 2009

Time to get the hands dirty.  Too much programming and thinking
lately..  Here's one of the things I'd like to do.  I have 2 or 3 old
TTL monochrome monitors that I'd like to give a new life.  The
pinouts are in [1].  This could use the approach I used before for
TV, but with syncs on separate lines.  I used the SPI output port for
sending video data.  On a 452 this is pin 24 (RC5/SDO).

The D-SUB Female:

      5 4 3 2 1
       9 8 7 6

    1 GND
    6 INTENSITY
    7 VIDEO
    8 HSYNC positive
    9 VSYNC negative

Let's start with connecting VIDEO to the SDO, and the hsync/vsync to
two other ports.  I used RA4 for Composite.  Ok, sending out stuff on
SDO gives some signal.  Let's figure out syncs.  The horizontal
frequency is supposed to be 18.43kHz.  That's 542 ticks of a 10MHz
clock.  The divisor should thus be 137 * 4.  Ok.  Hsync works with a
pulse about 1.5 us wide.  It would be nice to be able to get at some
timing data to see how the vsync works (duration and number of
lines..)

[1] http://pinouts.ru/Video/mono_ttl_pinout.shtml
[2] http://en.wikipedia.org/wiki/Color_Graphics_Adapter
[3] http://en.wikipedia.org/wiki/IBM_Monochrome_Display_Adapter
[4] http://www.seasip.info/VintagePC/mda.html
[5] http://en.wikipedia.org/wiki/Motorola_6845

Entry: simpler debugging
Date: Wed Jun 3 16:34:38 CEST 2009

Some things that need to be figured out:

- profiling.  Why is loading the (compiled) image so slow?  Is it
  really the scat vm?

- find a proper way to recover from chip lockup.  Currently it will
  exit the app after "cold" fails (relic from pk2 external reset).
  -> fixed: won't exit now.
- give up on dictionary as "top" files.  it hurts composition
  (frameworks don't work for debugging: you want top control to
  script things).  maybe do this: when generating a .dict, also
  generate a .live file which simply loads the .dict file.  the user
  can then use this file to add debug commands, and it could be used
  to add a sandbox etc..

- figure out a simple way to make the disassembler display the right
  addresses.  currently it interprets everything as code addresses.

Entry: structured procrastination
Date: Wed Jun 3 23:14:12 CEST 2009

I think I've been doing this[1] for a long time now.  For me it's
always been network system administration, improving debugging tools,
starting new projects from crazy ideas and generally structured
talking out of my ass to see what it brings..  Actually, apart from
the net admin it's maybe all true procrastination..

Lately I've been wanting to study more PL theory, but I still find it
more interesting to re-invent the wheel following subtasks in Staapl.
The Forth bootstrap has been quite a funny example here..  It's an
interesting problem.  Not only for nostalgic reasons (yeah, those
8-bit thingies), but I like building non-trivial things on simple
systems - to think, where mere brute-force programming pattern
duplication would be much more economical with time..  But that is
accepting defeat.

Anyway..  What am I saying..  This is not a blog..  It's about the
code, no?  The real problem I'm avoiding at this moment is partial
evaluation and the functional concatenative language.  I also
promised myself to write the USB driver first, and to write the
documentation.  But instead I'm bootstrapping a standard Forth on top
of a system that is distinctively non-standard for very good reasons.
Why?  Because it's fancy and shows off the flexibility of the system.

The PE is difficult..  Maybe mostly because the problem is not
well-defined.  It's difficult to define "optimal".  People do seem to
have tried (and maybe succeeded?).  Hence I didn't read too much yet
either..

The USB driver is difficult for a different reason: it's extremely
tedious and error-prone to write in a direct style.  I'm trying to do
it a bit more abstractly so I can learn a thing or two for the next
complicated driver I need to write..  I'm thinking about specializing
in driver writing, and writing some tools for that.  It's a difficult
problem worthy of some attention.  It's also particularly
unglamorous, so it might make me some money in the process.

Writing documentation is difficult.  Writing clearly, full stop, is
difficult.  I'm starting to gain more and more respect for good
teachers.  And for good manuals.  The PLT Scheme manual is a good
example.

So, what for tomorrow?  Maybe finish the bootstrapping.. :)

[1] http://www.structuredprocrastination.com/

Entry: multiple interpretations
Date: Thu Jun 4 00:54:32 CEST 2009

The "unrolling" bootstrap compiler points to a problem in the rpn
semantics interface: it is currently impossible to attach more than
one interpretation.  Wait.  Why not simply use two rpn-parse forms?
Ok, that's silly simple..

Entry: TI chips
Date: Thu Jun 4 02:43:39 CEST 2009

On the wikipedia[1] page it says the C6000 floating point processor
is code-compatible with the C62x.  I'm not sure how that works
though..

C6000 Series

* TMS320 C6000 series, or TMS320C6x: VLIW based DSPs.
  o TMS320C62x fixed point / 2000 MIPS / 1.9 Watts
  o TMS320C64x fixed point - code compatible with TMS320C62x
  o TMS320C67x floating point - code compatible with TMS320C62x

However, it does please me that I didn't get the DM6446 (C64x+) based
system for nothing.  Wow..  This is not a simple chip.  Maybe I
should stick with the TI toolchain and libraries.  I think my time is
best spent elsewhere..

[1] http://en.wikipedia.org/wiki/TMS320#C6000_Series

Entry: hardware want
Date: Thu Jun 4 03:38:50 CEST 2009

After the USB interface is done, it's time to move to different
architectures..  I want them all of course, but what would be best?

- LLVM.  Shouldn't be too hard.  Ideal for the 32-bitters.  ARM THUMB
  and MIPS shouldn't be too hard either.  I don't have much use for
  this though..

- dsPIC.  The architecture is not too complicated - very PIC like,
  with same/similar peripherals.  The DSP part however is more
  RISC-like with lots of registers - might warrant a special
  sublanguage.  Best for personal use as I have lots of samples.  Low
  pin count PDIP packages.

Entry: mcf and syntax parameters
Date: Thu Jun 4 11:35:26 CEST 2009

Expanding with multiple semantics is maybe best solved with syntax
parameters.  So..  The lifting: instead of fishing out dependencies
to see _what_ needs to be defined in the (scat) namespace, it might
be simpler to just define everything and provide stubs.  This might
also help for more general source-level Forth simulation and
analysis.

[1] http://docs.plt-scheme.org/reference/stxparam.html

Entry: USB sucks
Date: Thu Jun 4 14:01:29 CEST 2009

The problem with USB is that it is a bit of an all-or-nothing
protocol.  Incremental development is hindered by not being able to
access the host side's primitives directly.  You just have to set it
up and let it go through a sequence.  It has this in common with
physical real-time systems.  Because you can't stop or slow time,
often you just have to set up a test rig and log the behaviour for
later analysis.

Now, simulation does make this easier..  Is it possible to run the
Linux side of the chain in a step-by-step way using qemu or so?
Seems to be difficult to set up.  Probably logging[1] is the better
approach.

[1] entry://20090217-100852

Entry: dsPIC
Date: Thu Jun 4 14:55:53 CEST 2009

The overall PIC24/PIC30 architecture isn't so different from the
PIC18, so most code should work similarly.  However, the assembler
will be quite different due to the presence of addressing modes that
were previously unavailable.  Solving this will also make porting to
different architectures simpler.  The first problem to solve is how
to express them syntactically.  The Staapl PIC18 assembler uses a
flat syntax which is no longer sufficient.  See table 4A, page 5 of
the migration guide[1].

It's probably best to implement addressing modes as argument
transformer functions that live in the Scheme namespace.  However,
this requires them to fill multiple fields.  Maybe the complexity of
the assembler should be raised a bit to be able to access multiple
fields from a single argument.  I.e. start with the PIC18 GOTO
opcode.  How does LLVM express addressing modes?
[1] http://ww1.microchip.com/downloads/en/DeviceDoc/39764a.pdf

Entry: next actions
Date: Thu Jun 4 17:38:03 CEST 2009

    USB: usbmon trace
    MCF: build primitives, try to compile, make memory model
    DOC: port old pic18 forth doc to new doc
    DSP: addressing modes in assembler
    TTL: find MDA vsync timings
    SNT: metal box + drill stand

Entry: MDA
Date: Thu Jun 4 17:40:56 CEST 2009

MDA timings:

    pixel clock:    16.257 MHz  (882)
    line frequency: 18.432 kHz  (370)
    refresh:        49.81 Hz

    visible 720x350

This timing data comes from some obscure .h file I found googling for
"720x350".  This is probably correct as the pixel clock is standard,
and the total w x h seems to correspond to other documents.  Looking
for 16.257 MHz shows that this is a standard rate.  It is referenced
here[1], which gives 883 pixels/line:

    PC MDA (Mono Display Adaptor)
    B&W character-only display 80x25 with 9x14 font, so 720x350 pixels
    80x25 (=2000, not quite 2k) chars of each 2byte (1 char, 1 attrib) = 4k RAM
    50 full-frames/s of 368 lines (18.43kHz/54.3478us), 350 shown
    uses non-square pixels, clocked 16.257MHz, 883pixels/line, 720 shown
    not compatible with TV technology so can deviate in signalling,
    uses pure TTL 2bit digital brightness, and 2 separate 1bit H and V sync
    3 gray levels (BI: 00=black, 10=normal (light gray), 11=bright (white))
    DB9 connector 1+2 GND, 3+4+5 nc, 6 Intensity, 7 Brightness, 8 HSync, 9 VSync
    HSync positive, VSync negative active

I checked with the monitor that a 1.5 us hsync pulse works.  How long
should the vsync pulse be?  I found timings for VGA text mode
(640x350) which gives hsync 3.77 us, vsync 60 us.

Ok.  I got a stable image.  Problem was a typo which meant D3 didn't
get actuated.  I have it working now with 380 lines + 1 line for the
vblank pulse.

Next: interrupt operation to make it jitter-free.  Turn the code into
a state machine, and call it from the ISR.  This requires some robust
way to set the isr vector.  As used before for TV, the hold time of
the hsync pulse could be used to compute state machine dispatch for
the timer interrupt.  Since I'm interested in making a dedicated
(black box) circuit for driving some TTL/VGA/TV monitors, this
behaviour could be appropriate.

[1] http://neil.franklin.ch/Projects/SoftVGA/Design/Video_Signals

Entry: PIC CRT display controller
Date: Thu Jun 4 19:11:56 CEST 2009

This requires the following elements:

- High-priority ISR for jitter-free screen updates.
- Background task for manipulating the frame buffer + communication.

The problem is that the shift register I'd like to use for
sprite/character drawing is also used for normal UART comm, so it
would be necessary to construct an SPI circuit.  This is an
interesting test case for building a distributed system, where the
debugging console is daisy-chained.

Maybe use an 18F2620 for the video driver?  It has a full 4k RAM and
a lot of ROM for characters and graphics.  It can be an I2C/SPI slave
and drive the universal serial output as a synchronous master port at
the same time.  Maybe using a CAN interface will be better, since I'd
like to build such an interface for my car anyway.  With CAN, the
device would have to be an 18F2680.  It has the same EUSART and
should be able to use CAN in parallel.  Probably best to go for I2C
first as it is quite a bit simpler.

Entry: multiple devices
Date: Thu Jun 4 20:04:47 CEST 2009

Before getting into other busses, I need a way to access 2 separate
chips from the same debugging session.  I tried communication
protocols before, but without a proper way to control _both_ sender
and receiver, testing becomes quite difficult.
Some observations:

* Async serial ports simply work.  There is no substitute for
  interfacing to a PC.

* Staapl monitor communication is host-directed rpc / half duplex.
  Slaves are quiet unless addressed.  This means all slave outputs
  could be wired together.

* Bit-banging serial _output_ data is a lot simpler than input.  The
  debugger could send out a protocol extension which sends the slave
  mask before the message.

* The router could perform the reply OR in software as its inner
  loop.  It doesn't need to understand the protocol, just combine
  bits.

Roadmap:

- build the router.
- use it to bootstrap the monitor protocol over I2C.

Entry: Staapler
Date: Thu Jun 4 20:25:01 CEST 2009

The objective has changed.  The Staapler is no longer a PIC
programmer, but a serial <-> network interface to distribute the
monitor protocol to different chips.  The current architecture has an
18F1320 which should be adequate.  No..  It doesn't have xtals..
Let's take a comfortable 18F2620 @ 40 MHz.

Let's bootstrap this incrementally, just like one would do in an
experimental setup.  The setup is the pk2 connected to the staapler.
Is it possible to use the pk2 to program other chips, with the
staapler acting as a router?

I'm sort of back to square one here..  For convenience I got rid of
bootloaders, at least I got rid of "standard" bootloaders.  It's
simpler to always start from scratch, and the monitor code just
doesn't seem to stabilize..  Plus you have full control over the boot
block without a chance to mess things up.  Because routing the
programmer signal is less trivial, I might have to go back to
bootloaders for multi-PIC experiments.  Is that so?  The problem is
that 12/14 bit non-self-programmable PICs are impossible to use
then..

The first task would be to shield the staapler from the program
signal.  This could be done using a passive switch.  To route the
programming would require active switches.  Hmm..  I'm lost already.
Too many ill-specified conflicting requirements..  KISS.  PIC18 only.
No programmer routing, unless for all-equal code.  What about this:
start with multi-PIC projects where the code is the same so the PICs
can be driven in lock-step.  (SIMP ;)

Entry: dictionary files
Date: Fri Jun 5 09:20:10 CEST 2009

Problem: two conflicting behaviours wanted.

- from a terminal, "mzscheme project.dict" should fire up a readline
  Forth terminal.

- from a test framework, the dictionary should be accessible as data.

Maybe it's time to learn something from Taha and friends: once you
call something "code" it should really be opaque.  Writing out a
Scheme file that's supposed to be executed, and then inspecting it
afterwards, is not a good idea..  If it is data (open to different
interpretations) it should be written as such: tagged + unevaluated.

Slogan: Data is code parameterized in its interpreter.  Or the more
obvious one: Code is data bound to its interpreter.

Entry: metacircular forth unrolling
Date: Fri Jun 5 10:44:08 CEST 2009

Ha..  Again, it's not as simple as I thought: Scat needs to be
extended to make the macros based on "branch" and "?branch" work.  I
believe here lies the "almost a new thing" part of unifying the
prefix parser and the inner interpreter.  I've been trying to make
this explicit for a while.  The idea is this:

* Parsing words read the next word from the input stream before the
  normal interpreter has a chance to interpret it in the default way
  (lookup + execute/compile)

* "doLIT", "branch" and "?branch" implement control flow exceptions
  for the inner interpreter.
Note that in both cases a proper quoting mechanism can reduce the
number of words that do this to one.  This is what the (quote _) form
does in the rpn syntax: it loads an atom from the input stream onto
the stack, upon which normal semantics can be used to manipulate it
further.

So..  There's something simple hidden here.  It's all about stacks.
Prefix parsers in Staapl use the input stream as a stack.  Forth's
parsing words don't do this, and neither does the inner interpreter.
The reason Scat seems to not need a stack is because it implicitly
uses a tree instead (Scheme's closure structure).  What is a
procedure call?  It prefixes the current continuation (a code list)
with a code list.  There is something disturbingly non-circular about
this..  A trap I fell into many times before..  There are really 3
stacks, and they correspond to the 3 registers of the CEK machine:

* Code (input threaded code)
* Environment (data stack)
* Kontinuation (return stack)

But there is this interesting relation between C and K.  Anyway..
I'm getting confused and need to re-establish contact with the
concrete.  Practically - how to unify rpn-parse with rpn-lambda,
writing parsers in terms of scat code that accesses the code stack?
It is important to see that (prefix-parsers ...) is also a stack
language.  The problem however is that it is really arbitrary what
function these stacks have..  It's only the number of stacks that is
important, and whether they are used as stacks or as streams.

This story is really about parsing and rewriting..  Is there some
theory about this?  Is a 2-stack machine fundamentally different from
a 1-stack machine or a 3-stack machine?  One of Chuck Moore's slogans
is: you need stacks, and you need at least two.  So why is one
different from two?  (One answer from automata theory: a single stack
gives you a push-down automaton, while two stacks are enough to
simulate a Turing machine tape, so the jump from one to two is a real
jump in power; beyond two it's only convenience.)  I'm missing a lot
of theoretical knowledge to know where not to look..  I'd say in
general the introduction of an extra stack would help you to save
whatever you _were_ doing with N-1 stacks on the Nth stack, and solve
a subproblem that needs N-1 stacks.

It might be an interesting problem to define all functionality in
scat in terms of an N-stack machine.  It is already very much like
that, just not explicit:

- prefix parsers (rewrite rules) use the input stream as a stack.
- prefix parsers use the dictionary (a stack of stacks)
- the CFG compiler uses a set of stacks of stacks.

the basic structure of Forth is the interpreter:

    D D E x x x D D D D

where "D" are tokens to be interpreted in the default way and "E"
exception tags that change the meaning of subsequent terms.  By
interleaving the "D" with their semantics you get:

    (C D) (C D) (E x x x) (C D) (C D)

where "C" is the default compilation action.  But I'm moving too far
away into the abyss..

Entry: don't lift everything
Date: Fri Jun 5 13:10:31 CEST 2009

Ok, I see what the problem is: I'm trying to implement _all_ of a .f
file describing the compiler as a compile-time entity.  That is the
mistake.  The compile-time functionality usually does _not_ contain
any conditional code, so straight-line execution is enough.

Can this be made more precise?  The phase separation bootstrapper for
metacircular Forth compilers can be made a lot simpler if it makes
the assumption that immediate words, implemented in terms of other
code, _do not_ use any immediate words themselves, directly or
indirectly through their dependencies.  In other words: the .f file
should be expressible in two phases.  The reason for this is that
control words are incompatible with the purely compositional
semantics of the scat language.
It wouldn't be impossible to unroll more phases, but this would
require phase 1 (scat) to support assumptions made about code
threading, which isn't the case.  The primitives "doLIT", "branch"
and "?branch" are reflective: they know about threaded code and
change the nature of the interpreter.

Entry: is rewriting lowlevel or not?
Date: Fri Jun 5 14:24:05 CEST 2009

Funny how the Scheme pattern-based rewriting mechanism is considered
highlevel, but if you try to use it to do anything complicated you
miss composition.  Essentially, you're using a low-level machine
without procedure calls.  Then what are called lowlevel macros in
Scheme (those that get their hands dirty with manipulating syntax as
data) do have composition, so they comprise a highlevel machine.
They are something like different local projections of the space of
composition of programming methods.

The thing is: the human brain (at least mine) has trouble juggling
multiple orthogonal abstractions, so languages usually tend to limit
the number.  However, expressive power is related to the number of
abstractions you can multiply.  This is a basic idea in math, and
maybe the biggest reason why I am not a mathematician: I need to keep
my feet in the mud: I'm not willing to lift my feet off the ground to
be able to fill my head with abstractions.  I want to _see_ what they
do.

Entry: CTM
Date: Sat Jun 6 08:35:42 CEST 2009

Section 3.4.3 p.140 mentions the use of Definite Clause Grammars
(DCG)[1] to hide explicit threading of accumulators.  Immediately
after, it is mentioned that they no longer use this and prefer
explicit state instead.

Section 3.4.4 p.141 mentions difference lists[2].  These can be used
to prevent consing when manipulating the head of some list.  It is
also mentioned that this can be used to append in constant time when
the tail of the difference list is an unbound variable.  Apparently
used a lot in Prolog programming.

[1] http://en.wikipedia.org/wiki/Definite_clause_grammar
[2] http://en.wikipedia.org/wiki/Difference_list

Entry: Multiple PICs
Date: Sat Jun 6 10:03:18 CEST 2009

Plan: I2C support for the monitor.  Probably I2C is best for
debugging since it's a bus, while SPI is a pipe.  Goals:

* Use the eusart shift register on the PIC18 for video generation.
  This requires communication over a different channel.

* Multiple target debugging for testing other network code.

The simplest way to get multiple targets connected is probably
daisy-chaining them.  This would require protocol extensions to
include addressing.  The simplest way is to use decrement addressing,
and send replies to #xFF.  Next: implement this.

Ok, that wasn't too hard.  Now, can I make a cable / bus that
attaches to the standard serial port connector?

Entry: bitbanged serial
Date: Sat Jun 6 11:24:36 CEST 2009

So it looks like daisy-chaining should be feasible.  However, I'm
actually more interested right now in getting bitbanged serial to
work.  Both for debugging and for MIDI apps.  It doesn't seem too
hard to have a single channel going, but how do you do multiple?
Looks like busy-looping + oversampling is the best approach.  This
way each channel can have its own phase counter.

Now, since I'm just playing anyway, maybe the E2 protocol should be
revived?  A low-bandwidth single channel 4-phase protocol:

    1234
    WR10

    3 provides power
    3->4 is the sync edge
    1 write time
    2 read time

I did have some trouble getting this to work though..  What might be
better indeed is a simple modulated serial line.
This would allow standard receiver hardware to work with just a
modification to receive and transmit to (de)modulate the signal.  The
idle signal should be encoded such that frame errors can be detected
when you plug into an active line.  Luckily idle=1, so this should be
no problem.  Simply using the odd / even bits should work.

Entry: CRT terminal board
Date: Sat Jun 6 12:45:30 CEST 2009

It would have to be a 2620 with oscillator.  The 1220 doesn't have an
I2C/SPI device.  So let's build it.

Entry: debugger interface
Date: Sat Jun 6 12:51:14 CEST 2009

Problem: I've got my boards standardized on the 1x6 serial port
header for the FTDI.  I'd like to keep this working because it's damn
convenient.  Now the problem is, 6 pin headers are not really
standard..  You can find 2x5 everywhere.  And I have plenty of
flatcable to go with it..  How to combine?  A 2x5 header could
exhibit these 10 signals:

    POW  VDD
    POW  GND
    SER  TX
    SER  RX
    ICD  MCLR
    ICD  PGD
    ICD  PGC
    ICD  PGM
    I2C  SD
    I2C  SC

SPI is best carried over a separate channel since it has 3+1 lines.
It's more for fast comm anyway..  This bus is for debug.  Can this be
made compatible with the ICD and TTLSERIAL pinouts by plugging them
in some fashion?  It can be made to work for either the SER or the
ICD, but not both.  Ser is probably more important.  I can use an
adaptor for ICD.

    GND   PGD
    MCLR  PGC
    VDD   PGM
    RX    SD
    TX    SC

Can the brown line (MCLR) be driven from the TTLSERIAL?  This would
enable target reset.  It's CTS (see [1]) which is an input, so this
won't work.  OTOH, this means that it won't be asserted, so it's ok
to connect it to a reset pullup.  Doing it like this would enable the
use of a 2x3 header to convert to the ICD connector:

    x x   ICD
    x x
    x x
    . .
    . .

    x .   SERIAL
    . .
    x .
    x .
    x .
    o

This looks quite acceptable.

[1] http://www.ftdichip.com/Images/ttl232rsch1.jpg

Entry: Neil Franklin's OS ideas
Date: Sun Jun 7 19:05:40 CEST 2009

This[1] contains a description of a microcontroller OS.  If it
weren't for the comments about not wanting to use forth postfix
notation, I'd say he's been reading my blog ;)  What Neil is
describing comes quite close to my objectives.  Though Staapl's
stress is on getting the Scheme side right (first).

[1] http://neil.franklin.ch/Projects/Sketches/Microcontroller_Oper_System

Entry: profiling
Date: Sun Jun 7 19:15:33 CEST 2009

Maybe have a look at the mzscheme profiler.  It would be nice to
speed up compilation a bit.

    (require errortrace)  ;; these parameters live in errortrace

    (define (start)
      (instrumenting-enabled #t)
      (profiling-enabled #t)
      (profiling-record-enabled #t))
    (start)

    (define (stop)
      (output-profile-results #t #t))

    (start)
    (require (file ...))  ;; make sure there are no compiled files!
    (stop)

This gives more info:

    (syntax->datum
     (syntax-case (get-profile-results) () (stuff #'stuff)))

Probably best to get the source loc info from that..  Now, how to up
the sample rate?

Entry: I2C
Date: Sun Jun 7 19:51:07 CEST 2009

Is there a problem with running the controller in slave mode all the
time?  I.e. the host doesn't know _when_ a reply will arrive, only
that one _will_ arrive.  Maybe it's best to switch ownership.  Ok,
I2C supports bus arbitration[1], but this is probably not necessary.

[1] http://en.wikipedia.org/wiki/I2C

Entry: more bootstrapping
Date: Mon Jun 8 12:58:39 CEST 2009

This is extremely difficult to pin down!  But it's a good exercise to
understand the dependencies better and make Staapl simpler, preparing
for multiple targets.  Let's see how far we get with simple examples.
Ha, there was something missing in the garbage collector.  Let's
abstract GC first, it might be useful later.  Ok..
This took some time: made it reusable..  Next: build it to figure out
the dictionary prototypes necessary..  This idea needs a break
though.

Entry: Emulators & Data Flow
Date: Mon Jun 8 16:01:52 CEST 2009

What about emulators?  This idea disappeared when I was rewriting the
parser and module structure.  What this needs is a way to specify a
"mother machine" that can emulate anything, and a translation from
assembler to this machine language.  This translation could be
interpreted or compiled.  To start, primitives in Scheme should
suffice.

The hardest part is probably the ALU.  Memory emulation is really
just arrays, and memory-mapped I/O can be implemented using channels.
The real challenge is in combining the PE mechanism with a simulator.
However, this would kill the infinite precision types used.  (Which
means the semantics "projection" should be better defined.)

Specifying the ALU and other hardware should be done in a declarative
single-assignment language like Oz[1], equipped with a parallel +
blocking statement semantics.  This is closer to reality and closer
to concrete HDLs (which are closer to reality)..  Next: build an
interpreter in Scheme that can execute this parallel code.

[1] http://en.wikipedia.org/wiki/Oz_(programming_language)

Entry: DFL
Date: Mon Jun 8 17:53:54 CEST 2009

Started staapl/machine/dfl.ss -- the essential idea is to use
Scheme's scoping mechanism to build a DFG, and then simply execute
that graph.  Compilation can be performed by recording a successful
trace.  Actually, the code can be split up even more:

1. recursively create and connect nodes
2. serialize (try to resolve the dependencies)
3. abstract

The first two steps can be done statically.  The structure this
produces (a bunch of nodes and a sequential program) can be
abstracted into a state update function that can then be optimized to
use registers more effectively.  But: this solves the real problems:

- PRIMITIVES are implemented as primitive scheme n->1 functions.

- COMPOSITION is graph construction (where nodes are lexical
  variables), reducible to a scheme procedure.

Hard to express, but what I find remarkable is that for a DFL,
compilation seems to be natural.  Maybe because composition naturally
decomposes into a BIND and a RUN phase, because of its graph nature.
Graphs expressed linearly (as a list of ops) always require 2 passes
to close the references.

Note that currently, macro-expanding everything isn't necessary: it
is possible to re-use subnets by abstracting them.  The registration
should thus be captured by the "abstraction" construct.  In other
words: how to turn a composite back into a primitive.

Entry: more DFL
Date: Thu Jun 11 10:48:02 CEST 2009

Let's factor it out a bit: computations don't need to be executed
during the resolution phase.

1. BIND (construct the network)
2. RESOLVE (find a workable serialization)
3. EXECUTE (run the serialized program as a function)

More specifically: primitives only need to be defined up to signature
(number of in/out).  Ok.  I've simplified it a bit and come to the
conclusion that the composition mechanism should yield Scheme
functions (thus serialize all computation).  Also, the dependency
resolution is now separated from computation so it can in principle
be done at expansion time.  This poses an interesting problem
however: the syntax uses Scheme's lexical variable binding, but at
expansion time this isn't available yet.  So syntax should be lifted
to a next stage.  I ran into this pattern before.  There is probably
something interesting hidden here.
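To make the BIND / RESOLVE / EXECUTE split concrete, here is a
minimal sketch in plain Scheme.  None of these names are the actual
dfl.ss API: nodes are just boxes, a primitive op records the nodes it
reads and writes, RESOLVE is a naive topological sort over those
declarations, and EXECUTE runs the serialized list.

    (require scheme/list)  ; for partition

    ;; A node is a mutable cell.
    (define (node) (box 'undefined))

    ;; BIND: an op is (function input-nodes output-nodes); a network
    ;; is a list of ops, in arbitrary order.
    (define (op fn ins outs) (list fn ins outs))

    ;; RESOLVE: repeatedly schedule all ops whose inputs are already
    ;; defined (network inputs or outputs of scheduled ops).
    (define (resolve ops inputs)
      (let loop ((pending ops) (defined inputs) (ordered '()))
        (if (null? pending)
            (reverse ordered)
            (let-values
                (((ready rest)
                  (partition
                   (lambda (o)
                     (andmap (lambda (i) (memq i defined)) (cadr o)))
                   pending)))
              (when (null? ready)
                (error 'resolve "cycle or missing input"))
              (loop rest
                    (append (apply append (map caddr ready)) defined)
                    (append (reverse ready) ordered))))))

    ;; EXECUTE: run the serialized program.
    (define (execute ordered)
      (for-each
       (lambda (o)
         (let ((fn (car o)) (ins (cadr o)) (outs (caddr o)))
           (let ((vals (call-with-values
                        (lambda () (apply fn (map unbox ins)))
                        list)))
             (for-each set-box! outs vals))))
       ordered))

    ;; Example: c = a+b, d = c*c, declared in the "wrong" order.
    (define a (node)) (define b (node))
    (define c (node)) (define d (node))
    (define net (list (op * (list c c) (list d))
                      (op + (list a b) (list c))))
    (define prog (resolve net (list a b)))
    (set-box! a 1) (set-box! b 2)
    (execute prog)
    (unbox d)  ; => 9

Note how the two passes over the linear op list show up exactly as
claimed above: one to close the references (resolve), one to run.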
Entry: DAN 60
Date: Thu Jun 11 23:10:18 CEST 2009

http://www.cs.indiana.edu/dfried_celebration.html

Last year I really enjoyed the talk by Anurag Mendhekar[1],
"Aspect-Oriented Programming in the Real World".  I'm watching it
again.  Some points to take home:

* You can't write generalized weavers.  That's a bit of the point of
  aspect-oriented programming: your design (the aspects) determines
  the implementation of the weaver.

* Practical advice: your users will think language = syntax.  Use XML
  to trick them into thinking they already know what you're telling
  them.  Next to a standard syntax, also try to give your language a
  dumbed down semantics.  (This is the rule of least power[2]).

* Be _really_ careful about the abstractions you build.  Know your
  stuff (language design, macros, source tx, compiler design).

* Specific techniques: abstract interpretation, monadic semantics and
  Hindley-Milner type systems (constraint solver).

I don't find any introductory work on Modular Monadic Semantics.
Googling for it does yield some papers though..

About this syntax vs. semantics thing.  Indeed, I've run into this
too.  Semantics seems to be something that is "hidden" when people
not experienced with how computers (interpreters) work think about
programming.  The direct contact is with the language syntax.  They
will absorb the semantics through example.  It seems that this
implicit notion of language semantics is quite central to the way
natural languages are used: semantics is created by your _own_ brain,
by observing how _another brain_ interprets the syntax.  Starting
from an explicit model of semantics is really confusing, as it puts
abstraction before experience[3].

[1] http://video.google.com/videoplay?docid=1875973989673234551&hl=en
[2] http://www.w3.org/2001/tag/doc/leastPower-2006-02-23.html
[3] entry://../quack/20090613-140830

Entry: DFL + generating C code
Date: Wed Jun 24 19:40:05 CEST 2009

Maybe it's time to start using this dfl language together with some
mechanism to generate processors that can be embedded in Pure Data.

Entry: sheep synth in a cherry ps2 keyboard
Date: Wed Jun 24 22:49:03 CEST 2009

I didn't find a proper box, so I thought I'd give it a try to put it
inside a Cherry keyboard.  But what to do with the keyboard itself?
Should I leave the controller in and try to use ps2 commands, or
should I build a new scanner?

    4 pins: CLOCK, DATA, GND, +5V

The clock frequency is 10-16.7 kHz.  11 bit protocol:
start - 8 data - parity - stop.  Data changes on the falling edge and
can be sampled on the rising edge.

[1] http://www.computer-engineering.org/ps2protocol/

Entry: TTL mono
Date: Thu Jun 25 12:23:03 CEST 2009

I have a bunch of 13.5 MHz xtals that I'm not going to use for
anything.  Maybe they can serve for the TTL project?  54MHz should
work for the 48MHz parts.

Entry: IDC debug pinouts
Date: Thu Jun 25 13:12:14 CEST 2009

So, what's the standard way of numbering a 2x5 ribbon cable?  Judging
from the blades, pin 1 is either in the lower left or upper right.
I'm taking the layout I find in standard IDE cables, which puts pin 1
in the lower left with the gap on the bottom.

    ---------------
    |  2 4 6 8 10 |
    |  1 3 5 7 9  |
    ------   ------

The connector that's mounted on top of this maps the pins to this
pattern to connect to the ribbon cable, with pin 1 marked Red and the
other pins Grey.

    2   4   6   8   10
    1 | 3 | 5 | 7 | 9 |
    | | | | | | | | | |
    R G G G G G G G G G

Let's fix it like this: pin 1 of the IDC is mapped to pin 1 (GND) of
the TTL serial.  The 1x6 connector seems to fit in the 2x5 ridged
header plug.
    1  GND           2  PGM
    3  MCLR          4  PGC (green)
    5  VDD           6  PGD (blue)
    7  RX            8  SD
    9  TX            10 SC

Maybe one more modification.  Having the programming lines accessible
through a 2x3 is maybe not as important as having the power + I2C
lines available?

[1] entry://20090606-125114
[2] http://en.wikipedia.org/wiki/Insulation-displacement_connector

Entry: Component Order Futurlec
Date: Thu Jun 25 16:52:57 CEST 2009

    IDC 2x5 connectors + sockets.
    90deg angle 1xn male headers
    1xn female headers (stamp board)

http://www.futurlec.com/ConnIDC.shtml

Entry: logic analyser
Date: Fri Jun 26 13:50:18 CEST 2009

It shouldn't be too difficult to perform the sampling, but the
problem is getting the data to the host.  Currently I have about
200kbit for the serial line..  My previous conclusion was that USB is
the only viable option as it can transfer 12Mbit.  Maybe I should get
that going again..

Entry: why?
Date: Sat Jun 27 10:48:58 CEST 2009

I got hit by a sudden jolt of loneliness in doing all this.  But hey,
it's what I wanted: to have time to try things out for myself.  I ran
into this[1] last night and started to read up on plan9[2],
inferno[3] and oberon[4].  Then I ran into a blog post about
low-level programming[5], which made me see why I really enjoy that
kind of work: simple, possible to get it right, room for cleverness
hidden behind clean interfaces.  The fact that hardware needs some
extra massaging I guess isn't so bad compared to having to deal with
bloated towers of arbitrary brokenness :)

[1] http://www.loper-os.org/?p=8
[2] http://en.wikipedia.org/wiki/Plan9
[3] http://en.wikipedia.org/wiki/Inferno_(operating_system)
[4] http://en.wikipedia.org/wiki/Oberon_operating_system
[5] http://www.yosefk.com/blog/low-level-is-easy.html

Entry: PIC overclocking
Date: Sat Jun 27 14:00:34 CEST 2009

Picstamp is configured with XTPLL, but the 2550 for TTL mono
generation has a 13.5 MHz XTAL attached, so it needs to be HS.
Doesn't work with the 2550.  Could be I'm doing something wrong with
my config, but I guess it's because the PLL is fixed frequency.
Tried with 4x HSPLL on a 2620 and this seems to work, so I'm no
longer trying the 2550.

Entry: further bus bootstrapping
Date: Sat Jun 27 14:46:38 CEST 2009

The connector seems to work.  In order to get the ttlmono board to
work with I2C I need to bootstrap that first using daisy-chained
serial.  Serial daisy-chaining works too.  I've added a debug
connector to the 452-40 board.  A little tweaking on the host side is
still necessary to keep everything lock-step + add some error
recovery.

Now, how to manage different images on the host?  As long as they
have the same macros this isn't such a problem.  Images could then
carry symbolic data so they can be mapped to bus ids.

Entry: fix PLaneT package
Date: Mon Jun 29 23:59:16 CEST 2009

I forgot it's broken.  Just tried an install under Windows, which
works fine except that it doesn't compile .hex files.  Ok..  I get a
strange error building the docs.  Same one that happened when moving
32 -> 64 bit.  Maybe some files need a recompile?

    tom@zni:~/staapl/doc/scribblings$ make
    scribble --pdf ../../staapl/scribblings/staapl.scrbl
    match: no matching clause for #
    make: *** [staapl.pdf] Error 1

I think this is different versions of structs..  Something wrong with
the paths.  Ok, there's only one proper way I can see to fix this:
catch the "match" error and translate it to an error which prints the
current context somehow.
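Something along these lines could do it.  A minimal sketch, assuming
exn:misc:match? (the predicate scheme/match uses for match failures)
is what gets raised here; with-match-context is a made-up name:

    (require scheme/match)

    ;; Wrap a thunk so that a match failure is re-raised with some
    ;; context attached.  'context' is whatever identifies the
    ;; document or struct being processed.
    (define (with-match-context context thunk)
      (with-handlers
          ((exn:misc:match?
            (lambda (e)
              (error 'match "~a\n  while processing: ~s"
                     (exn-message e) context))))
        (thunk)))

    ;; usage:
    ;; (with-match-context doc-path (lambda () (render doc)))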
I've isolated the problem to this:

    @section{PIC18 Forth}

    @forth-ex{ planet zwizwa/staapl/pic18/route }
    @ex[()(code)]

    @forth-ex{ forth
               : one 1 ;
               : two 2 ;
               : three 3 ;
               : four 4 ;
               : interpret-byte route one . two . three . four ; }
    @ex[()(code)]

It seems to confirm the idea: the code required by the forth
statement re-instantiates the compiler struct..  Let's test that idea
again by adding a print statement near the struct def.  Running just
the demo module there is indeed a double instantiation.

    box> module demo.ss (/home/tom/staapl/staapl/pic18/demo.ss)
    define-struct compiler
    box> (forth> "planet zwizwa/staapl/pic18/route")
    define-struct compiler

Entry: more errors and warnings
Date: Tue Jun 30 11:53:43 CEST 2009

It's high time this gets automated..

    setup-plt: WARNING: duplicate tag: (mod-path "(planet zwizwa/staapl/macro)")
    setup-plt:  in: /home/tom/.plt-scheme/planet/300/4.2.0.5/cache/zwizwa/staapl.plt/1/10/scribblings/staapl.scrbl
    setup-plt:  and: /home/tom/.plt-scheme/planet/300/4.2.0.5/cache/zwizwa/staapl.plt/1/9/scribblings/staapl.scrbl
    setup-plt: WARNING: duplicate tag: (mod-path "(planet zwizwa/staapl/pic18/demo)")
    setup-plt:  in: /home/tom/.plt-scheme/planet/300/4.2.0.5/cache/zwizwa/staapl.plt/1/10/scribblings/staapl.scrbl
    setup-plt:  and: /home/tom/.plt-scheme/planet/300/4.2.0.5/cache/zwizwa/staapl.plt/1/9/scribblings/staapl.scrbl
    setup-plt: WARNING: duplicate tag: (index-entry (mod-path "(planet zwizwa/staapl/macro)"))
    setup-plt:  in: /home/tom/.plt-scheme/planet/300/4.2.0.5/cache/zwizwa/staapl.plt/1/10/scribblings/staapl.scrbl
    setup-plt:  and: /home/tom/.plt-scheme/planet/300/4.2.0.5/cache/zwizwa/staapl.plt/1/9/scribblings/staapl.scrbl
    setup-plt: WARNING: duplicate tag: (index-entry (mod-path "(planet zwizwa/staapl/pic18/demo)"))
    setup-plt:  in: /home/tom/.plt-scheme/planet/300/4.2.0.5/cache/zwizwa/staapl.plt/1/10/scribblings/staapl.scrbl
    setup-plt:  and: /home/tom/.plt-scheme/planet/300/4.2.0.5/cache/zwizwa/staapl.plt/1/9/scribblings/staapl.scrbl
    setup-plt: rendering: /zwizwa/staapl.plt/1/10/scribblings/staapl.scrbl
    setup-plt: error: during making for /zwizwa/staapl.plt/1/10/comp
    setup-plt:  define-unit: undefined export macro/address in: (define-unit machine-test@ (import) (export machine^) (compositions (macro) macro: (code-size 1)))
    setup-plt: error: during making for /zwizwa/staapl.plt/1/10/pic18
    setup-plt:  default-load-handler: cannot open input file: "/home/tom/.plt-scheme/planet/300/4.2.0.5/cache/zwizwa/staapl.plt/1/10/pic18/dtc.ss" (No such file or directory; errno=2)

Ok..  added an 'allss' target which compiles all .ss files in the
staapl/ tree.  This should make it easier to create a garbage-free
dir.

Entry: blink-a-led and non-interactive code
Date: Tue Jun 30 12:08:55 CEST 2009

The chip macros depend on "baud".  I've removed these but it will
probably break the rest..  I just tried whether this worked:

    0 org-begin
    : def1 1 2 3 ;
    : def2 123 ;
    org-end

Apparently it does.  I'm surprised.  This is an indicator that I need
to simplify the CFG generator, since I don't think this should
work...  Ok, what I think this does is create a single block that
cannot be reallocated.  This is the correct behaviour.  Omitting an
org statement leaves the flash allocation to the compiler (later;
currently it's just concatenated at 0x44 for PIC18 = boot block +
jump over init code).

Fixed the dependency on "baud" and "fosc".
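The allocation rule above is simple enough to state as code.  A toy
Scheme sketch of the idea, with invented names (this is not the
Staapl allocator): org-pinned blocks keep their address, everything
else is concatenated after the boot block.

    ;; blocks: list of (address-or-#f . size), in compilation order.
    ;; #f addresses are filled in by concatenating after the boot
    ;; block, e.g. #x44 on PIC18.
    (define (allocate blocks (boot-end #x44))
      (let loop ((bs blocks) (here boot-end) (out '()))
        (cond
          ((null? bs) (reverse out))
          ((caar bs)   ; pinned by org-begin: keep its address
           (loop (cdr bs) here (cons (car bs) out)))
          (else        ; relocatable: concatenate
           (loop (cdr bs) (+ here (cdar bs))
                 (cons (cons here (cdar bs)) out))))))

    ;; (allocate '((#f . 10) (#x2000 . 4) (#f . 6)))
    ;; => ((#x44 . 10) (#x2000 . 4) (#x4e . 6))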
Entry: improving build times
Date: Tue Jun 30 14:50:35 CEST 2009

On zni, building all objects takes time:

    make clean all-modules

    real    0m46.524s
    user    0m43.923s
    sys     0m1.708s

I suspect the reason for this is too many "for-syntax" requirements.
Hmm..  maybe not..  Note that I did remove "poke" (say that takes one
second), but the result is now:

    real    0m42.222s
    user    0m40.039s
    sys     0m1.544s

A little, but not so much..

Entry: fixing the disassembler
Date: Tue Jun 30 17:11:40 CEST 2009

The dasm needs info about the type of the operations to perform
proper reverse lookups.  Where can these be inserted?  Damn, it's too
hot to think..

Entry: Against the Toplevel
Date: Wed Jul 1 10:39:18 CEST 2009

I find myself quite in the middle!  I agree wholeheartedly that
programs should be cast in stone, with a clear structure where all
dependencies (all links) are explicit.  But.  For quick-and-dirty
stuff in a world that's not perfect (electronics), a little extra
power does wonders.

[1] http://calculist.blogspot.com/2009/01/fexprs-in-scheme.html

Entry: parsers
Date: Sat Jul 4 23:41:14 CEST 2009

One of the things that needs to change in the next iteration is the
unification of the prefix parser with the code CFG generator.  Both
do essentially the same thing: transforming a linear structure into a
list of things (parsing: definitions, CFG: basic blocks).

Entry: complex hardware
Date: Mon Jul 6 16:35:27 CEST 2009

Why are computers (hardware/software) so complex?

- standards: adhering to standard interfaces makes it easier to use
  component based design, which enables the economy of scale.
  however, there are _lots_ of standards, and some of them go back a
  long time.

- optimization: a good example is the memory hierarchy.  fast memory
  is simply too expensive to have a lot of, and cheap memory is
  complex to read and write.  if you can eliminate those two, you end
  up with really simple things.

Because the big enemy in the long term is complexity, is there a way
to eliminate these two evils?  Especially on the hardware side,
reducing complexity might yield lower production cost for smaller
components.

Entry: small projects
Date: Tue Jul 7 22:25:44 CEST 2009

Get a small Scheme running on the wrt boxes, something with a simple
FFI.  Then metaprogram it from PLT.  Using tinyscheme.  I had to
simply substitute mipsel-linux-gcc from the OpenWRT buildroot for
gcc.  Let's see if the same works for scheme shell.  The problem is
that there are multiple bootstrap stages.

I wonder if qemu can be used to run openwrt binaries.  Currently it
chokes on finding the libraries..  This made me discover scratchbox2,
a QEMU based simulator build system, built around autotools.

Entry: multi-pic
Date: Sat Jul 11 15:11:22 CEST 2009

It looks like the simplest approach is going to be to use the _same_
language for all chips involved, which means that the set of macros
is shared.  However, each chip should have its own _dictionary_.  It
is ok for one or more chips in the chain to use a _different_
language, as long as it is turned off.  Now..  Maybe it's better to
make sure that this isn't necessary: to be able to use one language
per host, and simply share the macro namespaces if necessary.

Entry: C parsing needs the preprocessor
Date: Tue Jul 14 14:46:20 CEST 2009

I'm thinking about solving a problem that needs solving: PLT Scheme
needs to be able to parse C code and CPP code.  This involves:

1. understanding Dave Herman's c.plt package
2.
implementing CPP.

In general I would like to keep on refining Staapl to bring it closer to integration with standard tools (mostly GCC and binutils) and figure out ways to bring gradual typing to a dynamic language. On the other hand I would like to write a system that is not disruptive in a GCC toolchain. Write something that is immediately useful without having to move to Forth and Scheme. Keep the metaprogramming in the C domain, but provide a decent library interface and make sure popular scripting languages can benefit from this.

Entry: Bootstrapping through simulators
Date: Wed Jul 15 17:35:03 CEST 2009

Check out this[1] and this[2]. It might be an interesting way to get going with the Spartan-3A, figure out a good way to make porting Staapl to other architectures simpler, and get Staapl some tailwind. Next steps:

- figure out how to build System09 using the xilinx tools. it looks like there are makefiles provided, so now i just need to get the general idea of how an fpga design is built up from source files.

- get a 6809 simulator[5] to hook into Staapl for testing a code generator. the Forth machine model could be taken from MaisForth[4].

Looks like[6] Hans is looking into porting to the Spartan-3A board! Actually, this is old news. It is ported (as can be read on [1]). A big fat bone to chew on!

[1] http://code.google.com/p/rekonstrukt/
[2] http://members.optushome.com.au/jekent/system09/index.html
[3] http://en.wikipedia.org/wiki/6809
[4] http://home.hccnet.nl/anij/xedni.html
[5] http://koti.mbnet.fi/~atjs/mc6809/
[6] http://netzhansa.blogspot.com/2009/03/rekonstrukt-progress-midi-drum-machine.html

Entry: HCC - AVR Forth (dutch)
Date: Wed Jul 15 17:58:41 CEST 2009

http://www.forth.hccnet.nl/forthvanafdegrond12.html

Entry: Assembler with addressing modes
Date: Wed Jul 15 19:11:34 CEST 2009

PIC asm is quite simple due to its lack of addressing modes. I need to find a way to support addressing modes for supporting the 6809 and dsPIC architectures. First step: get inspiration for syntax: find an s-expression based representation and write it as code modifiers.

Let's think about this a bit. What is an addressing mode? It is a register list semantics modifier. It doesn't necessarily modify a single register, but can also modify groups (i.e. relative addressing). I don't have enough data points, so this needs to be implemented using a relaxation approach: just do it and see where it ends up.. There are two ways to look at it:

- how to represent it textually
- how the modes are encoded

Typically, the encoding is a bit vector that expresses the type of the arguments, i.e. for dsPIC. The reason I find this so difficult is that I currently have assemblers modeled as _functions_, while this doesn't work for addressing modes, unless you see an addressing mode as an opcode modifier instead of an argument modifier. In some sense, I need to undo some of the cleverness that's used in the design of the assembler language. Now, in general there seems to be not so much structure, so it looks like a simple expansion step on top of the current implementation is what is necessary.

Entry: dsPIC instruction encoding
Date: Thu Jul 16 08:25:03 CEST 2009

HA! I didn't see this one before: in the dsPIC programmer's reference, table 6-2 on page 6-10 contains the 4bit x 4bit instruction encoding matrix.
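Coming back to the addressing mode question: a minimal sketch of the `opcode modifier' view (all names and encodings below are made up, not dsPIC or 6809). Each mode selects a different encoder for the same mnemonic, so the assembler remains a collection of functions and modes become a simple expansion step on top:

  ;; hypothetical 16-bit encodings: high nibble = opcode + mode tag
  (define ld-modes
    `((immediate . ,(lambda (n) (bitwise-ior #x1000 n)))    ; ld #n
      (indirect  . ,(lambda (r) (bitwise-ior #x2000 r)))))  ; ld (r)

  (define (asm-ld mode operand)
    (cond ((assq mode ld-modes) => (lambda (m) ((cdr m) operand)))
          (else (error 'asm-ld "unsupported mode" mode))))

  ;; (asm-ld 'immediate 5) => #x1005
  ;; (asm-ld 'indirect 3)  => #x2003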
Entry: 'instruction-set macro
Date: Thu Jul 16 08:37:10 CEST 2009

Updating the 'instruction-set macro to include expansions so this:

  ;; byte-oriented file register operations
  (addwf  (f d a) "0010 01da ffff ffff")
  (addwfc (f d a) "0010 00da ffff ffff")
  (andwf  (f d a) "0001 01da ffff ffff")

can be written as:

  (fda (o f d a) "oooo ooda ffff ffff")
  (addwf (f d a) (fda "0010 01" f d a))
  (andwf (f d a) (fda "0001 01" f d a))

or in curried form:

  (fda (o f d a) "oooo ooda ffff ffff")
  (addwf (f d a) (fda "0010 01"))
  (andwf (f d a) (fda "0001 01"))

Entry: instruction decoder (asm - dasm - sim)
Date: Thu Jul 16 08:46:30 CEST 2009

This assembler business is a serious pain in the ass.. What I really want is:

- assembler (s-expr + forth RPN syntax)
- disassembler + standard dasm pretty-printer
- simulator

All generated from the same piece of code. In some sense, the manufacturer's asm syntax is disposable. As long as it can be pretty-printed there is really no need for it. What counts is the binary machine code syntax. The main reasons to include the simulator in the loop are:

- development of sim+asm/comp itself is made testable
- partial evaluation will no longer need ad-hoc semantics

Bottom line: currently Staapl is completely defined in terms of machine semantics and ad-hoc infinite precision eager evaluation at compile time. It needs some semantics to at least provide a safety net for this cavalier way of dealing with compile time computations. Basically, there should be only one '+' in the whole chain.

A central piece in this is the instruction decoder. If it is possible to write this as a bijective function, a 1-1 map between parsed opcodes and a binary vector, the rest is just connecting up logic elements to registers.

Entry: Bijective functions
Date: Thu Jul 16 09:07:13 CEST 2009

I've seen invertible functions before in PLT: in the web server: stuffer[1].

  (struct stuffer (in out))
    in  : (any/c . -> . any/c)
    out : (any/c . -> . any/c)

A stuffer is essentially an invertible function captured in this structure. The following should hold:

  (out (in x)) = x
  (in (out x)) = x

Then it defines a composition operation for these.

Another thing to think about is constraint based programming from SICP[2], and in the Guy Steele thesis[3][5]. See also the thread on LtU[4]. The CP approach in a nutshell:

- define primitive constraints as a collection of directed functions.
- define constraint composition (network)
- find a way to transform an undirected constraint network into a directed data flow graph

So it looks like the current dataflow graph code in Staapl could be re-used for representing the more general constraint problem. What would be the primitive constraints for a decoder? Essentially, how can it be factored?

[1] http://docs.plt-scheme.org/web-server/stateless.html
[2] http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-22.html#%_sec_3.3.5
[3] http://www.ai.mit.edu/publications/pubsDB/pubs.doit?search=AITR-595
[4] http://lambda-the-ultimate.org/classic/message4990.html
[5] md5://4226c96cd6ac34fd4eea1c38de1ecad4

Entry: DAG sorting
Date: Thu Jul 16 10:37:47 CEST 2009

I wrote down an algorithm for this in the car a couple of weeks ago that uses only vector element transpositions. example:

  1 -> 4 -> 3 -> 2

  1 2 3 4
  2 3 4 . .
  3 2 4 . .
  4 2 3 . .
  3 2
  ---------
  1 4 3 2

Entry: PIC18 decoder
Date: Thu Jul 16 10:47:10 CEST 2009

The way to approach this is to start with the smallest opcode field. For PIC18 this is the top 4 bits of the 16 bit instruction word.
(Which becomes the top 3 in my representation, as bcf/bsf are modeled as a single parameterized instruction to ease the optimization of logic complement.)

  ;; p: 0/1
  (bpf   (p f b a) "100p bbba ffff ffff")  ;; bsf/bcf
  (btfsp (p f b a) "101p bbba ffff ffff")  ;; btfss/btfsc
  (btg   (f b a)   "0111 bbba ffff ffff")

The question is then: how to specify this as an equation (bidirectional function) and how to attach semantics to the fields to build the simulator. Currently the letters used as parameter names have a semantics: this should be defined better as connections to machine state (i.e. R means PC-relative word addressed). These should all be equations.

This really hints at the proper solution though: all this is so intimately related that it should be solved as a whole. It is really not much more than a single morphism between the two spaces:

  ( machine-statevector, binary-instructions, computation-logic )
  ( msv-rep, highlevel-instruction-syntax, simulator-primitives )

The problem is representing all this in a form that is accessible for metacomputation. Maybe it's best to start with a ball-of-mud and then disentangle it piece by piece?

Entry: split bit fields
Date: Thu Jul 16 10:59:41 CEST 2009

First, I'd like to fix this:

  (_call (s l h) "1110 110s llll llll"
                 "1111 hhhh hhhh hhhh")

This could be written as

  (call (s (h l)) "1110 110s llll llll"
                  "1111 hhhh hhhh hhhh")

where the concatenation of the bit fields is automatic. But this notation should not interfere with type specs.

Entry: constraints -> dataflow
Date: Thu Jul 16 11:16:56 CEST 2009

Is it possible to implement a constraint network such that it can mostly be used as a directed dataflow network, but the possibility is kept open to run some functions in reverse? The current compile-time DAG sorter could be generalized to a constraint sorter, but then the primitive functions need to be constraints.. What it does look like though is that this can probably be extended without too much trouble.

Entry: the relative conditional jump
Date: Thu Jul 16 11:21:55 CEST 2009

Start with the most difficult instruction: one that uses machine state and a typed operand (an addressing mode): a relative conditional jump. Branch if carry flag is set:

  bc R    1101 0010 RRRR RRRR

elements:

  C    carry flag
  R    operand type / instruction format
  PC   program counter

  PC <- (if C (+ PC (* n 2)) (+ PC 2))

So, what is "R"? It is some simplified representation of the semantics "PC <- (if C (+ PC (* n 2)) (+ PC 2))". This is a very important question to answer. Distinguish:

- code syntax
- simplified semantics (type)
- concrete semantics (simulator)

Entry: curry-howard correspondence
Date: Thu Jul 16 11:37:15 CEST 2009

I have a chance here to understand some theory better. Type systems are simplified semantics: the morphism between typed programs and proofs is a well-specified version of simplification. In Pierce 9.4 p.109:

  linear logic <-> linear types
  modal logic  <-> partial evaluation and run-time code gen

I did not know this for modal logic[1]. Probably a point to expand on a bit..

[1] http://en.wikipedia.org/wiki/Modal_logic

Entry: roadmap
Date: Thu Jul 16 14:05:17 CEST 2009

I got a pretty big stack now, started with the idea to port to the 6809.. Let's disentangle (goal followed by dependent goal indented).

* 6809 port + rekonstrukt on the Spartan-3A avnet evl
  * asm - dasm - sim specification (would also enable dsPIC port)
    * constraint / dataflow specification of a CPU arch (would enable
      proper Staapl semantics useful in static processing: checks + PE)

So, how to tackle this?
As I already mentioned before, an example of the problem I'm facing concretely is how to unify types (very ad-hoc metadata for asm parameters) with the concrete machine semantics. I should clarify this and grow the disassembler into a simulator by figuring out how to represent the two stages: assembly/binary instruction syntax and run-time machine state operations.

Entry: type checking vs. abstract interpretation
Date: Thu Jul 16 14:35:00 CEST 2009

I would like to understand the connection between abstract evaluation and type systems[1][2][3]. Damn.. LtU is always a good place to get re-humbled..

"Conventional type-checking is big-step evaluation in the abstract: to find a type of an expression, we fully `evaluate' its immediate subterms to their types. Our paper[2] describes a different kind of type checking that is small-step evaluation in the abstract: it unzips an expression into a context and a redex."

[1] http://lambda-the-ultimate.org/node/2208
[2] http://okmij.org/ftp/Computation/#small-step-typechecking
[3] http://okmij.org/ftp/Computation/#teval

Entry: machine specification
Date: Sun Jul 19 09:09:57 CEST 2009

It looks like this diverging/converging semantics problem really needs a good way to specify bottom line operational semantics. Can this bottom line be expressed in terms of a proto Forth machine instead of lisp or dataflow? I am very confused atm...

It seems this is all somewhat straightforward: I don't see any _real_ problem other than encoding the vague description in a collection of data structures.. However, I don't seem to see how to start. This makes me think I'm underestimating a certain element. Let's pick up the data flow language again, and build an ALU simulator to see how exactly the specification would go for the dasm <-> alu connection to make the simulator.

Entry: DFL language implementation : Summary
Date: Sun Jul 19 09:25:47 CEST 2009

See here [1][2][3] for front-line notes about developing the DFL implementation in Staapl. I got distracted by the trick used (using "eval" at expansion time).

The goal of this language is to have a functional / combinatorial description of a machine (in terms of a connection of functions), to ease the propagation of machine semantics to higher levels. The main goal is to use this for code analysis, starting with compiler correctness and possibly guiding the development of application-specific type systems.

Funny.. I got distracted again! Don't worry, I'm employing the method of structured procrastination[4].

[1] entry://20090608-175354
[2] entry://20090611-104802
[3] entry://20090608-160152
[4] http://www.structuredprocrastination.com/

Entry: Abstract Interpretation and Higher Order Macros
Date: Sun Jul 19 11:16:28 CEST 2009

Let's define HOMs as macro generating macros, not macro versions of higher order functions (comprehensions?). The problem with alternative interpretations and macros is that there really should be one evaluation step (syntax -> semantics). Can this be done by making a macro generate both code and a macro?

I can't reach it at the moment. The abstraction level is too high... I wrote to the plt-list about this.

Entry: next
Date: Sun Jul 19 18:48:59 CEST 2009

Try to understand Matthew's reply[1] and see how to apply this to other problems (the DFL compiler, the Forth bootstrapping, the Scheme snarf for Scat, ...)

Entry: using the macro-generating-macro trick for dfl
Date: Mon Jul 20 13:05:01 CEST 2009

I think I understand the point, but I don't quite see how to put it in the DFL macro.
I guess this is for after the break. Going to do some more laid-back stuff to recover from the last couple of days' sprints and insomnia.

Entry: back to basics
Date: Tue Jul 21 09:11:58 CEST 2009

So what happened? Things were going great, and all of a sudden I get overly ambitious because I start explaining (I'm looking for clients atm, and prospects want to know wtf I'm doing), and because I'm looking for direct application.

I think I have to be honest and re-focus on the small things for the next sprint. Forget about C and LLVM. It's better to solve that problem when there is an actual need, as there is plenty of material and people to get me through it. I need to stick to the original idea:

* stick to Forth (small machine) and Scheme (untyped lambda calculus) as a _simple_ substrate for the idea (a beefed up macro assembler)

* incorporate abstract interpretation / static analysis / verification on top of this.

Entry: parsing
Date: Tue Jul 21 09:14:30 CEST 2009

The rewrite in March-April this year solved some problems. The parser is now static and composes better. However, there is still duplication at several points. The idea of dictionary and prefix parsing is so central to forth that it shows up everywhere - especially when you unroll some reflection - so maybe this deserves a better abstraction? Also, the current parser seems too verbose to me: it should be written in Forth rather than CPS Scheme.

Another point is that it might be better to perform control flow analysis in a second step. Doing everything at once complicates matters, and for targets that do it themselves (later: LLVM) it's completely unnecessary. Should I just remove it, and go back to a more straightforward low-level semantics? The way compilation works atm, it needs to solve the following problems:

- implement `org-begin' .. `org-end'
- macro local exit

The former allows you to place blocks of code at fixed addresses. I could try this pretty much in isolation, since the compiler is a unit..

Looking at it, the current implementation is certainly not too complicated. The `state-update' function helps a lot to make it more readable. The trouble however is that the control macros do things behind the scenes. Another problem is that conditional branches are not counted, making the resulting data structure not a proper CFG. Maybe finding a proper naming scheme and a better central data type would help? Maybe it's just too low-level..

Another problem is the later use of assignment to build a "non-standard" graph structure (as opposed to a flat tree). A flat structure can be turned into a tree using names to be bound, i.e. a lambda expression. Can this be used to represent the data structure better?

Entry: semantics and static analysis
Date: Tue Jul 21 11:43:02 CEST 2009

The more I see the structure of what I'm trying to do, the more "normal" it all becomes. This is good. Looks like I'm about to find a bit more connections to existing approaches.

1. a well-defined operational semantics (relative wrt a single abstract point, not the target's concrete machine language) will help for "interpretation" based analysis: either full simulation or simplified abstract interpretation.

2. this can be used to assist (by verification and test) in building locally-specified higher level semantics for sublanguages, consistent with the operational semantics.

The idea is to start from the machine, and simply increase the abstraction level at certain points + provide a set of tools to do this.
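To make point 1. concrete, a minimal sketch (not Staapl code; the sign domain is the textbook example): a single evaluator parameterized by a semantics, run once with concrete numbers and once with abstract signs. The abstract run then covers all concrete runs at once.

  ;; ops = (list inject add mul), concrete or abstract.
  (define (ev ops env e)
    (cond ((symbol? e) (cadr (assq e env)))
          ((number? e) ((car ops) e))
          ((eq? (car e) '+) ((cadr ops)  (ev ops env (cadr e))
                                         (ev ops env (caddr e))))
          ((eq? (car e) '*) ((caddr ops) (ev ops env (cadr e))
                                         (ev ops env (caddr e))))))

  (define concrete (list (lambda (n) n) + *))

  ;; abstract domain: the signs +, 0, - and T (unknown)
  (define (sign n) (cond ((> n 0) '+) ((< n 0) '-) (else 0)))
  (define (s+ a b) (cond ((eqv? a 0) b) ((eqv? b 0) a)
                         ((eqv? a b) a) (else 'T)))
  (define (s* a b) (cond ((or (eqv? a 0) (eqv? b 0)) 0)
                         ((or (eqv? a 'T) (eqv? b 'T)) 'T)
                         ((eqv? a b) '+) (else '-)))
  (define abstract (list sign s+ s*))

  ;; (ev concrete '((x 3)) '(* x (+ x 1))) => 12
  ;; (ev abstract '((x +)) '(* x (+ x 1))) => + : holds for _any_ positive x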
Entry: growing higher level semantics
Date: Tue Jul 21 12:42:23 CEST 2009

The overall problem is that it is quite straightforward to go from high -> low by mapping a high level construct onto a low level one, and proving this map preserves the algebraic/logic structure. Going the other way is not possible in general, because a lot of constructs will lie outside of the range of the mapping. (I.e. the lower-level semantics does not have a structure to hold on to..) So, in general, how do you build higher level semantics given an operational one?

Entry: macro assembler
Date: Tue Jul 21 12:38:10 CEST 2009

Going further with this: if Staapl is to be a beefed up macro assembler speckled with application-specific local static semantics, then it needs to be able to talk to existing assemblers. Maybe I've been focussing on the wrong foe? It's not C I need to target, but machine assemblers. It's probably best to hide the staapl asm/dasm/sim behind an interface that can be easily implemented by external tools.

Entry: RIDL
Date: Tue Jul 21 14:20:12 CEST 2009

Real-Time Functional Reactive Programming

[1] http://www.texbot.org/

Entry: daisy chain buggy?
Date: Tue Jul 21 20:23:32 CEST 2009

i get dropped bytes.. shouldn't happen really.. maybe a bad wire? lowered to 115200: still problems.. hmm.. maybe it's a buffering problem? i changed `chunk-receive' to #x20 buffers, now it works?? weird.

Entry: multi-pic debugging
Date: Tue Jul 21 23:01:56 CEST 2009

instead of switching between target consoles, it might be simpler to start working in a host + client approach and go to client consoles when checking local things. the monitor should have the semantics of an RPC server.

Entry: dfl-compose without eval
Date: Wed Jul 22 12:32:40 CEST 2009

(define-syntax (dfl-compose stx)
  (syntax-case stx ()
    ((_ formals body ...)
     (let-staged ((nodes (dfl-graph formals body ...)))  ;; (*)
       #`'(dfl-sequence formals
           #,(dfl-sort-graph nodes (syntax->list #'(body ...))))))))

;; The construct (*) behaves as:
;;
;;   (let ((nodes (eval #'(dfl-graph formals body ...)))) ___)
;;
;; but doesn't use `eval'. Instead it uses a 2nd macro stage to
;; trigger evaluation of the syntax form. The function
;; `dfl-sort-graph' takes both a syntax and a value `nodes' derived
;; from it and produces a sorted program.

Entry: DFL next
Date: Wed Jul 22 12:41:53 CEST 2009

Implement primitives. Ha! primitives are multivalued functions :)

The basic idea is this: I'm not going to bother with making dataflow networks that can be processed as data structures. For this it's better to use techniques similar to dfl-sort-graph that operate on source code directly. The point is eventually to be able to run the DFL statements, so it looks like it's simplest to make sure the dfl-compose / dfl-sequence operation maps scheme functions to a scheme function, so DFL programs mesh well with other scheme programs. Which is exactly what it already did..

Entry: Scheme's call/cc and C
Date: Thu Jul 23 09:07:11 CEST 2009

[1] http://lambda-the-ultimate.org/node/1422
[2] http://repository.readscheme.org/ftp/papers/sw2000/feeley.pdf

Entry: Extreme Forth
Date: Wed Jul 29 09:26:11 CEST 2009

Reminds me of the dataflow vs. `thread oriented' programming idea mentioned before. In order to make DSP work for Forth, some kind of local naming needs to be used to 1) make the random access work, and 2) introduce concurrency in a serial language. I'd say there is really only one way to properly program DSP code, and that is using a dataflow language (DFL).
The point is that using Forth as a _frontend_ syntax isn't such a bad idea. The syntax allows _programmers_ to perform extreme factoring. DFL code is tedious to write due to explicit naming of outputs. See the following table:

              named-IN   named-OUT
  forth          /          /
  expression     x          /
  dataflow       x          x

So, if you add some kind of primitive to forth that represents (local) DFL nodes as names (one could use `constant' semantics for inputs and the `->' operator as node single assignment):

  A B + -> C

Looking at the C18 described in [1], the local names are actually _streams_, or more concretely, autoincremented memory access. The rest of the article talks about distributing an app over multiple cores.

[1] http://www.ddj.com/hpc-high-performance-computing/210603608

Entry: Forth and `singletons'
Date: Wed Jul 29 10:20:29 CEST 2009

Related to `electrical engineers love macros' and the use of global structures inside Forth abstractions for embedded programming. If I'm allowed to simplify greatly: most embedded programming can be thought of as consisting of two steps:

- building abstractions
- instantiating + connecting

Because the latter part contains a significant amount of _you really only need one_, the abstractions can usually be macros _that make this assumption_. Write something intelligible about this.

Entry: electronics is the world of leaky abstractions
Date: Wed Jul 29 17:18:18 CEST 2009

However much we'd like the world to be perfect, when it gets physical there is no way around hitting the wall from time to time. A lot of embedded programming is essentially about resource management and getting the order of things right. Due to cost issues, contemporary hardware often isn't smart enough to not step on its own toes. In many cases it is the driver code that is responsible for most arbitration. (Simply put: most driver code makes sure you don't turn on this thing before you turn off that other thing.)

Considering the way the human brain deals with such a mess of details, there seems to be little that can be done other than `test driven development'. TODO: expand this idea a bit more.. Basically: I'm convinced any kind of code can be made pretty in some sense by using the right abstraction. Driver code is usually highly stateful and ugly: most useful information is hidden behind the order of operations. Can this be done better, i.e. expressed in terms of the actual dependencies, i.e. a state-flow graph cf. a data flow graph? Maybe I'm just looking for CSP? Concretely: can the PIC USB driver be used as a test case for this?

Entry: NEXT: study
Date: Fri Jul 31 12:37:40 CEST 2009

I'm in a quite confused position right now. Frankly I'm lost. I need to read up on some ideas that are key to the further evolution of Staapl. At this moment the current core is a dynamically typed macro system which has _names_ right: scheme's lexical scope + hygienic macros. This makes Staapl into an assembler with a fancy preprocessor. The next stage is semantics of stuff to build on top of the untyped/dyntyped infrastructure. The focus should be on _examples_ of how to tackle a particular problem using constrained metaprogramming (typed DSLs).

1. bottom up semantics (ziggurat) - typed macros (dherman)

   Figure out what the central idea is and try to see the connection with writing DSLs on top of the macro forth.

2. partial evaluation - abstract evaluation - staging - semantics

   http://okmij.org/ftp/Computation/#teval
   http://redex.plt-scheme.org/

3. incorporate some special purpose static semantics (i.e.
CSP/occam)

Entry: generators
Date: Fri Jul 31 15:40:11 CEST 2009

Some ideas that are coming together:

* Gluing C and Scheme code: it is often data aggregation that gets in the way: creating scheme lists from C makes it difficult to automatically wrap the C and Scheme worlds. (idea: replace lists by generators).

* Universal traversal interface[1].

* Lots of threads and channels[3]: eliminate datastructures (which are really "snapshots") by always directly connecting consumers and producers. There should be a link between this and linear data structures.

* Can data allocation be completely eliminated from C primitive code? I.e. write in single assignment style[2]?

[1] http://okmij.org/ftp/Streams.html#enumerator-stream
[2] http://en.wikipedia.org/wiki/Oz_programming_language
[3] http://en.wikipedia.org/wiki/Occam_programming_language

Entry: linear data structures vs. dataflow communication channels
Date: Fri Jul 31 15:52:01 CEST 2009

A linear data structure can be consumed only once. This seems suspiciously similar to threads communicating over synchronous channels: one read per write. Combine this with Forth (stack languages) as a vehicle for linear data structures and it should be not too far from a very nice morphism between linear forth + communicating processes. Damn, I need to write this down formally.

The hint is: delimited continuations (= tasks) are the link between data structure cursors (data) and traversal routines (code). Essentially, shift/reset can turn the code into a data structure (= a piece of the return stack). So when you start thinking in terms of: the data structure _is_ the traversal, it can be made to disappear. Essentially: datastructures are future computations.

Entry: Delimited Continuations in C
Date: Fri Jul 31 16:21:35 CEST 2009

This means: implement shift() and reset(). I assume that setjmp can be used for this, or at least to implement the part that captures the C-stack. reset() would be a first setjmp(), while shift() is a second setjmp() followed by an operation that saves the whole stack somewhere, followed by a longjmp() that jumps to the reset point.

Entry: Studying the Factor compiler
Date: Wed Aug 26 18:20:53 CEST 2009

Time to give it a read[1][2].

[1] http://docs.factorcode.org/content/article-compiler.html
[2] http://www.mail-archive.com/factor-talk@lists.sourceforge.net/msg03573.html

Entry: Rewriting in a nutshell
Date: Thu Aug 27 09:36:02 CEST 2009

This is how I understand it now, all in the light of language transformations.

- If there is a unique rule to apply in every case, rule application is a _function_ and you end up with a small-step semantics.

- If there is more than one possible applicable rule, you no longer have a function. However, if your set of rules is _confluent_ (like the lambda calculus) then you can _construct_ an algorithm that _is_ a function by simply picking a reducible term (one might be favourable in some sense, or not, but the rule you pick is otherwise not essential to the final result), i.e. the graph reduction algorithm used to implement Haskell.

- If your rule set is non-confluent but monotone (no loops) you essentially no longer have a unique result, but reduction is still possible. This can still make sense if there is a way to project the results onto something that gives them back their uniqueness, i.e. if you have an onto `meaning' function that maps syntax -> semantics. In that case there might be ``more than one way to do it''.
To turn this back into a solvable problem you need an extra measure to sort out the multiple (syntax) results, i.e. pick the one with the shortest length.

- If the rule set is not monotone (results can grow) you can't do much with it from a reduction p.o.v. I.e. such rule sets are more akin to BNF descriptions of grammars and are more useful to use ``in the other direction''.

Summarized, this is:

- small-step operational semantics
- lazy evaluation
- program optimization
- grammar specification

Entry: Constraint Programming
Date: Mon Aug 31 15:50:06 CEST 2009

Exploration
-----------

In good in-house tradition, I'm going to try to re-invent it before I look up the implementation again in SICP[?] or Steele's Thesis[?].

The idea is the following: constraint networks are multi-way functions built up from primitive constraints. You can do the following:

* compose networks
* assert inputs / receive satisfiability errors
* query outputs / receive un-asserted errors

The tricky part is going to be to stage the control flow into a predictable real-time C-program, but first let's write a dynamic version and see where local algorithms fail. Start with 2 primitive constraints (possibly the only ones I will need):

  a + b + c = 0
  a * b * c = 1

To make constraint satisfaction into a local algorithm, a rule needs to be an active element: when receiving an assertion it needs to:

- propagate it if possible
- store it if no propagation is possible
- raise an error if there is a conflict

In general, a rule is an ordered / named list of nodes, and a governor that implements the behaviour above. Nodes in a rule can be asserted or floating. Let's represent asserted nodes by a number and non-asserted ones by #f. A rule -> node link needs to be bidirectional to propagate values through the network. For this a `slot' structure can be used, referring to a rule and an index (all rules have position-encoded nodes).

Control Flow
------------

- a node is asserted, propagating a signal to its associated rules.
- a rule is asserted:
  - underdetermined: stop
  - determined: propagate
  - overdetermined: error

Remarks
-------

Q: Find out why this can't solve sets of equations. (or can it? what's the relation with triangulation?)

A: One answer is that it's possible to have N inputs and M equations, which will give a propagation when there are N-M inputs defined.

OK. Control flow seems to work. Now, what can we actually do with this?

* Adding linear functionals is a trivial extension. Adding general M x N systems requires some more code-gen magic to turn the equations into directed equations, but doesn't add any significant other difficulties.

* In general, adding a particular N nodes, MO DFL program.

So, I wonder.. What value does this add?

- it abstracts control flow (directionalizing equations + sequencing)

What about the following form:

- set of equalities
- set of inequalities + actions

Abstract Interpretation
-----------------------

Let's try to capture the static part using abstract interpretation. Essentially, turn the current implementation into a staged macro implementation. An interesting point: can you stage `amb'?

An interesting property of directional constraints is that you don't need to make choices before you do tests. In discrete constraint satisfaction problems you do need to do that. Maybe I can make a combination of both? Continuous and discrete constraints, and use a staged `amb' to compile it.

So what you do in the abstract evaluation is work with the _availability_ of parameters.
Then you can serialize the control flow and cast it in stone, leaving the _values_ of the parameters unspecified. This means that DFL probably becomes a special case of constraint programming. Let's call it ``staged constraint programming''. Or ``staged prolog''.

So.. What is the abstract version of a constraint? It's really simple: a constraint propagator. So, in the staged/ae version there are really only 2 problems:

- constraint propagation based on availability, resulting in a sequential program.

- constraint -> function conversion based on the sequencing (constraint directionalization)

This pattern is quite neat: starting with an exotic control flow paradigm, fixing control flow through staging, which then allows binding optimizations (storage) and compile-time evaluation.

Entry: DFL/CPL : Control flow staging
Date: Thu Sep 3 10:45:38 CEST 2009

Context
-------

In the application I am looking at, the guiding principle should be the following: the resulting code should be free of unbounded recursion. This means the structure of the C code can be flattened to a finite size combinatory network. Within that framework, how can the specifications be abstracted in useful ways such that they can be reduced to this form using information available at compile time?

  DFL: data flow language
  CPL: (deterministic) constraint propagation language

Before implementing it, let's look at the usefulness first.

1. For DFL it's not disputed: allowing parallel presentation of operations is an advantage over having to specify the order manually. There are some degrees of freedom in how the DFL -> function/procedure translation is implemented, but this is the classical _inline_ vs. _call_ debate, and can be mostly related to static inlining vs. run-time function calls. Note that in this case, the use of recursion is an _implementation issue_ mostly related to memory use and memory locality for both code and data. DFL specs with code sharing lead to directed acyclic graphs.

2. Adding unspecified directionality (equations instead of directed data flow operators) brings us to constraint propagation based networks: at compile time, constraints specified as equations can be translated into data flow operators, in addition to fixing the order of DFL operator evaluation.

One problem with constraint propagation networks however is that they are _sparse_. I.e. they cannot solve parallel constraints (systems of linear/nonlinear equations). It is probably my bias towards these kinds of constraint specifications that makes me think there is a problem here.. So is there?

In order to augment N x 1 constraints over N variables to M parallel constraints, one needs a static `directionalization algorithm' that turns the equation into a function from the known parameters to the dependent ones (which can then trigger further constraint evaluations).

Open issues
-----------

Is local consistency[1] useful, or are we looking at more global constraints like sets of equations?

[1] http://en.wikipedia.org/wiki/Local_consistency

Entry: DFL -> LCL -> ?
Date: Thu Sep 3 14:42:28 CEST 2009

This is work-in-progress. The application that drives this is a small DSL to express safety constraints. Other than that the specifications are unclear. Anything that's higher level than straight C or some syntactic sugar around it is probably fine. The main problem is to not make it _too_ powerful.

Step 1: a data flow language (DFL)

One step up from C by omitting the sequential order of evaluation.
The code transformation performs abstract evaluation on ready/not-ready values and outputs a sorted list of functions satisfying data dependencies. The Scheme code is here (comments and procedure names might contain confusing terminology though, and some of it is about modularity):

  http://zwizwa.be/darcs/staapl/staapl/machine/dfl.ss

Step 2: local constraint language (LCL)

One step up from DFL by omitting directionality of the operations. For this I have no code yet. The idea is that, again using ready/not-ready values, all sequencing can be done statically like the DFL, but in addition ``directionalizing'' (better word?) the constraints can be done at compile time.

Step 3: global constraints

I think the story then breaks down because of data value dependencies. In some cases the global constraints can be turned into local constraints (i.e. linear equations with M unknowns and N < M equations) but these are special cases. However, it might still be useful to add some constraints that need a search-based approach, as long as the inputs for them are also provided (or they could be slow-changing, like once per day switching on/off a particular input). Discrete inputs could then switch between modes, or could trigger re-compilation of the constraint checker.

Entry: Practical: constraint.ss
Date: Fri Sep 4 11:13:10 CEST 2009

I have a first draft of a staged local propagation constraint language. Now the plan is to clean this up by eliminating scheme phase issues (i.e. rule classes are phase 1) and find a way around multiple outputs.

OK. Phase issues are fixed. Let's leave multiple outputs for later. How to make this more interesting? One of the requirements is definitely going to be inequalities. Maybe a `range' type would be good. I.e. the MAX rule:

  m = MAX(a,b)

when it receives values for `a' and `b' it can compute a value for `m'. however, if it receives a value for `m' and `a', things are more complicated:

  a > m  ->  ERROR
  a = m  ->  b \in [-\infty, a]
  a < m  ->  b = m

for floating point values that are part of measurements, the equality doesn't make much sense.. so ranges aren't really necessary.

Entry: More about rules
Date: Sun Sep 6 08:53:59 CEST 2009

Looking closer at the problem, this isn't half as trivial as I first thought.. Mostly: I can't see how to use MAX or other comparisons in more than one direction. Can I work around this by simply making this a compile time error? Max only works in one direction?

There is one vital assumption that is maybe not so valid: is it useful to fix the direction at run time? Maybe it's simpler to use an event-driven / FRP approach? No, let's stick to the current description: the language is going to need a certain complexity to be useful. It's probably not so that this could serve as a simple code example -- only as a design principle.

I found one more PhD thesis[1] linked from the wikipedia article[2].

[1] http://www.ps.uni-sb.de/Papers/abstracts/tackDiss.html
[2] http://en.wikipedia.org/wiki/Local_consistency

Entry: Other abstract values
Date: Sun Sep 6 09:07:07 CEST 2009

It looks like it's necessary to allow general purpose computations at compile time (i.e. full elimination of rules). This would allow the introduction of `amb'. Note that backtracking will probably need a state-threading approach to the code generation problem: the current assignment-based approach is probably a dead end.

Hmm.. this is definitely not going to be a simple example. I guess it's best to separate out the DFL implementation.
The useful directions seem to be:

- eliminate implicit code threading (pure functional generator)
- learn more about constraint propagation

It would probably also help to read Jacques Carette's paper[1] about monadic generators. Once the generator is purely functional, backtracking can be added.

Instead of a synchronous network approach, it might also be interesting to directionalize equations from the p.o.v. of a single variable to its dependencies.

The concept of finding abstractions that allow staged control flow is apparently a bit broader than I first thought. In my first approximation of understanding, constraint propagation seems to exploit the sparseness of systems to yield local inversions which can then be chained.

OK. Pure functional generators. Two kinds of effects need to be eliminated: value assignment and code `emit'. The latter can simply thread the state, while the former can use a node -> value finite function to be passed around. For backtracking purposes, it seems possible to combine parameters with partial continuations as long as the parameter values are saved/restored upon continuation capture/invocation. The _values_ stored in the parameters however need to be sharable: the current in-place update doesn't work, and needs to be replaced with something purely functional. Let's see..

[1] http://www.cas.mcmaster.ca/~carette/publications/scp_metamonads.pdf

Entry: Too much freedom
Date: Mon Sep 7 08:55:44 CEST 2009

OK, I have amb running with parameter swapping, so if non-determinism (search) is needed when generating code it's there. But what to do with it? It looks like I can't go any further without proper specs. So, what is the problem?

- Allow for a number of inequalities to be specified over the data.
- Because they are essentially free, allow for equalities too.
- Define unsatisfiability: what actions are to be taken?

The real question is: why isn't a simple directional data-flow approach good enough? If I have so much trouble trying to understand why one would abstract out directionality in the constraints, it's probably not worth the bother..

The missing link is probably that this really needs an _event_ based approach: the safety monitor needs to

1. be able to tell something isn't right
2. appoint blame to the event that broke the constraints
3. if possible, correct or prevent a setting

What about this: for each collection of inputs that can be given as an event, produce a program that says if this action is valid or not according to a number of constraints. This is actually a lot simpler than propagation: it needs only to associate each collection of inputs to the collection of rules they are connected to, and check each rule. The constraint propagation could then be used for directionalizing the equalities that serve only to express extra relations.

Another point: the wikipedia page talks about propagation as restricting the domains of nodes after eliminating constraints.

Entry: Ranges
Date: Mon Sep 7 10:00:04 CEST 2009

The problem is the domain of the nodes. Currently, they are limited to values, and propagation performs assignment. Let's clean this up first by allowing subsets. First: constraints need to perform propagation themselves. This should not be handled in the driver routine.

EDIT: equations and inequalities: equations are useful for defining intermediates on which the inequalities are specified. When looking from the point of individual get/set operations, the algorithm performs directionalization of the equations + selects the appropriate constraints to check.
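To keep the core mechanism in view while cleaning this up, here is a minimal unstaged sketch of a single propagation rule for the constraint a + b + c = 0 (not the constraint.ss code; nodes are just boxes holding a number or #f), implementing the governor behaviour from before: check when fully determined, solve when exactly one node floats, wait otherwise.

  (define (propagate-sum! . nodes) ;; constraint: sum of node values = 0
    (let ((known    (filter (lambda (n) (number? (unbox n))) nodes))
          (floating (filter (lambda (n) (not (number? (unbox n)))) nodes)))
      (cond ((null? floating) ;; fully determined: consistency check
             (if (zero? (apply + (map unbox nodes)))
                 'ok
                 (error 'propagate-sum! "conflict: sum /= 0")))
            ((null? (cdr floating)) ;; exactly one unknown: solve for it
             (set-box! (car floating) (- (apply + (map unbox known))))
             'propagated)
            (else 'underdetermined))))

  ;; (define a (box 1))  (define b (box 2))  (define c (box #f))
  ;; (propagate-sum! a b c) => propagated, and (unbox c) => -3
  ;; (propagate-sum! a b c) => ok : now fully determined and consistent

The staged version would run the same case analysis at compile time on ready/not-ready tags instead of boxes, emitting the arithmetic as code.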
Entry: A safety monitor
Date: Tue Sep 8 11:08:54 CEST 2009

A language with two constructs is probably good enough:

- inequalities to express safety constraints. when not satisfied, these trigger actions. i.e. "totalpower < maxpower"

- equalities to add extra internal nodes to make the inequalities easier to express

This could then be used to create 2 classes of imperative safety monitor programs:

* system monitor: keeps an eye on a system's output. this is a stream processor: all nodes are updated at the same time, and all constraints are checked (serially, conceptually in parallel).

* operator monitor: for each allowed input event (a tuple of set points) one can construct a check that can accept/reject the setting, or limit it to some extent.

The useful part would indeed be to not just tell an operator `no', but to _adjust_ a request to a valid region. This needs extra knowledge however: how to express an intermediate point (i.e. some set points might _really_ make no sense: this then should not result in somehow erroneous behaviour).

Entry: Prolog and Pattern Matching
Date: Tue Sep 8 09:47:46 CEST 2009

Looks like I missed some important point about Prolog. In my head it's simplified to `amb' + constraints in the form of horn clauses. But then what? As a guide, let's take [1] and [2].

In [2], chapter 22 talks about `amb' (or `choose'). Relatively straightforward. So what is `cut', explained in 22.5? Essentially, eliminating part of a search tree. This is straightforward in depth-first search, and requires marking of the resume stack. Then 22.6 talks about breadth-first search, implemented by replacing the resume stack with a queue.

Chapter 24 talks about prolog as ``rules as a means to construct a tree of implications, which can then be searched using non-determinism.'' Or prolog as a combination of pattern matching, nondeterminism and rules.

A rule has the form `if <body> then <head>'. The basic idea is that if the body matches a fact, its head can be added to the collection of facts. When asked for a fact, we can recursively sift through the facts and heads of rules, essentially picking out rules and finding evidence. The search ends when there is evidence found, or when all rules and facts are exhausted.

So, what about binding? Find all occurrences of variables satisfying a certain property. It looks like what is necessary to implement this in Scheme is to map Scheme's binding mechanism to the rule binding. In the following rule,

  (if (and (tree x) (summer)) then (green x))

which is the binding site, and which is the reference site for `x'? It's not obvious, so maybe the answer is: it depends on whether this is interpreted as a constructor or a destructor, or both. Unification? Indeed. [1][3].

One thing that confuses me is the 2 directions of information flow: to see if a parameterized fact is true, one needs to run the rules backwards, but for every value found, the value propagation runs the other way. So it looks like control flow is going in one direction, while data flow moves the opposite way!

Let's try to organise the control flow first, and see how the data flow can be incorporated in that solution. Given a set of rules, construct functions that are indexed by proposition.

  ;; rule
  (rule (and (tree x) (summer)) (green x))
  (rule #t (summer))
  (rule #t (tree pine))
  (rule #t (green algae))

Suppose `(green x)' nondeterministically binds x. How does this look in expression form? I'm stuck at expressing binding recursively.
Let's try to make it more difficult first:

  (rule (and (tree x) (brown y)) (browntree x y))

...

[1] http://mitpress.mit.edu/sicp/full-text/sicp/book/node92.html
[2] http://www.paulgraham.com/onlisptext.html
[3] http://en.wikipedia.org/wiki/Unification

Entry: Resolution (Logic)
Date: Wed Sep 9 09:05:16 CEST 2009

Reading CTM[1] chapter 9 about relational + logic programming. The basic idea is that you want theorem proving: given a logic sentence, the system needs to find a proof for it. On page 634 it remarks:

- a theorem prover is limited (Goedel's [in]completeness theorems)
- algorithmic complexity: need a predictable operational semantics
- deduction should be constructive

This is ensured by restricting the form of the axioms so that an efficient constructive theorem prover is possible. For Prolog these axioms are Horn clauses, which allow an inference rule called resolution[2]. Additionally, Prolog allows adding hints to map a logic program to a more efficient operational semantics.

[1] http://www.info.ucl.ac.be/~pvr/book.html
[2] http://en.wikipedia.org/wiki/Resolution_%28logic%29

Entry: Relational programming
Date: Wed Sep 9 09:29:28 CEST 2009

I was wondering if it makes sense to allow infinite choice sequences. Probably not, unless they appear at only one place in the tree. For depth-first this should be on top, and for breadth-first at the bottom. Otherwise the infinite branching will prevent some children from ever being reached. Also, allowing `cut' operations helps.

But it does seem useful to at least keep an algorithmic description of choice points without having to resort to explicit lists. Additionally, these choice points could be results from another search problem, generated lazily.. So, what about reformulating choice as taking an enumerator by default, instead of trying to shoe-horn procedural sequences back into explicit lists. Design principles:

- laziness is good
- enumerators (HOFs) are better than lazy lists (data structs)

So, `amb' takes a function. A choice point is then a continuation (a hole) and an enumerator. The continuation can be plugged into the enumerator directly:

  (struct choice (k enum)) -> (enum k)

Now, does this compose? I.e. can the driver just execute the enumerator and nest the backtracking that way?

Hmm.. OK: settled on:

- amb takes an enumerator
- some enumerator <-> list / stream / sequence transformers (enum.ss)
- solutions presented as an enumerator

Next: cut. Marks should be installed at `amb' time, while cut

Entry: unify.ss
Date: Thu Sep 10 16:41:29 CEST 2009

The unification algo seems to be working. Used it in one way (as a pattern matcher) in database.ss.

About prompt tags: it looks like it's necessary to name the tags, because other uses of partial continuations would interfere with the marks used for backtracking. The idea seems to be that for each abstraction built on top of partial continuations, you use a new prompt tag. Then you just need to worry about those tags being properly nested (i.e. nested backtracking).

Entry: Algebra of Programs
Date: Fri Sep 11 12:30:28 CEST 2009

Let's focus on the basic idea: create a collection of combinators that satisfy some mathematical laws. Make it almost trivial: no power in the language, easy manipulation. Play with the manipulation and try to grow some context to understand the current literature about this. The basic one is the interaction between `map' and function composition:

  (f) map (g) map = (f g) map

Now, construct a bunch of meaningful programs that use just this, and then construct all possible rewritings.
Then, try to associate a _cost_ and _feasibility_ to the different versions, i.e. register pressure, cache pressure, ...

[1] entry://../staapl-blog/20090911-110748

Entry: Prolog
Date: Fri Sep 11 12:37:47 CEST 2009

Looks like I really need Prolog, and more generally, some kind of theorem prover / constraint system. It will probably pay to build one and have an idea of the tradeoffs. Let's finish what I have now.

In the previous attack, mapping rule inference to lambda abstractions didn't work, because it requires _unification_, which is a symmetric binding construct, while pattern matching and construction are asymmetric. So, given a rule

  P(X) & Q(X,Y) -> R(Y)

this can be used in a query for R(Y) as follows:

- the query R(Y) leads to a store with one unbound variable Y.
- extend the vocabulary with the variable X
- find a stream of solutions for Q(X,Y)
- filter it with P(X)

Entry: How the query system works (SICP)
Date: Sun Sep 13 14:48:35 CEST 2009

In [1] the stream-of-frames implementation is used to implement the query system.

  In general, the query evaluator uses the following method to apply
  a rule when trying to establish a query pattern in a frame that
  specifies bindings for some of the pattern variables:

  * Unify the query with the conclusion of the rule to form, if
    successful, an extension of the original frame.

  * Relative to the extended frame, evaluate the query formed by the
    body of the rule.

  Notice how similar this is to the method for applying a procedure
  in the eval/apply evaluator for Lisp:

  * Bind the procedure's parameters to its arguments to form a frame
    that extends the original procedure environment.

  * Relative to the extended environment, evaluate the expression
    formed by the body of the procedure.

In my implementation, the stream-of-frames is the result of `solutions', which can then be fed back into `amb' recursively.

It seems that the real problem is `alpha conversion': to map variables in a query to variables in a head/body combo. A simple way to do this is to represent the rules' parameters using functions (higher order abstract syntax, HOAS[2]):

  ``In the domain of logical frameworks, the term higher-order
  abstract syntax is usually used to refer to a specific
  representation that uses the binders of the meta-language to encode
  the binding structure of the object language.''

However, in order to avoid explicit renaming and piggy-back on Scheme's lexical variables, some deconstruction is necessary at compile time. It seems this needs to do the work twice. Use explicit renaming then? (as in SICP). Alternatively, we can give up the direct representation as lists and only use the abstract one where the unification control flow is expanded. Maybe that's a better way: you don't need `eval' or any direct interpreter as long as you have lambda and hygienic macros. In this case it looks like a store needs to be a run-time entity, but the unification match is compile-time.

But let's do renaming first. A working implementation is worth more, it seems.. This is the core of [3]:

(define (unify-rule store pattern rule)
  (bump-rename-count!)
  (let* ((rule (map rename-variables rule))
         (store (add-free-variables store (rule-head rule))))
    (for/fold ((store (unify store pattern (rule-head rule))))
              ((bpat (rule-body rule)))
      (unify-pattern store bpat))))

(define (unify-pattern store pattern)
  (unify-rule store pattern (choice/enum (rules-db))))

(define (solve pattern)
  (solutions
   (query
    (bindings
     (unify-pattern (add-free-variables (empty) pattern)
                    pattern)))))

So indeed, it is quite straightforward. Rules can be partly staged + renames can be avoided, by piggy-backing on a directed pattern matching binding form. Currently I have this implemented as a clause translator: the head match produces either a fail, or a body where the rule's variables are substituted by the corresponding sites in the input pattern. Instead of constructing a list, this could also just construct a recursive query invocation.

[1] http://mitpress.mit.edu/sicp/full-text/sicp/book/node94.html
[2] http://en.wikipedia.org/wiki/Higher-order_abstract_syntax
[3] http://zwizwa.be/darcs/staapl/staapl/machine/prolog.ss

Entry: quotation vs. prefix
Date: Sun Sep 13 17:44:06 CEST 2009

To distinguish variables from symbols, some form of syntax tagging needs to be used. In SICP and On Lisp, the `?' character is used for this. I picked explicit quotation, which would translate to PLT's old match form. This choice is quite arbitrary as it's straightforward to translate, so I'm going to switch back to the SICP/OL style.

EDIT: Funny, apparently in SICP this is also just surface syntax: variables are tagged as (? name) for efficiency.

Entry: Linear equations
Date: Mon Sep 14 09:52:06 CEST 2009

So... This constraint business. The current algorithm uses N x 1 constraints. This needs to be generalized to N x 1 linear constraints, then N x M systems. Let's start with the most general first, i.e. a 3 x 2 system:

  a x + b y + c z = d
  q x + r y + s z = t

In general this can be solved using gaussian elimination[1]. Given a collection of known variables, a square system can be constructed which then needs to be inverted. Since all coefficients are known at compile time, the sequence of computations can be generated in-line to produce a dataflow program, which can then be sequenced. Pivoting[2] can be used to yield optimal conditioning. Full pivoting searches for the element with the largest absolute value in the unprocessed rows.

OK.. since this is all quite straightforwardly computable at run-time, it might be good to do it numerically. Otoh, allowing rational numbers might be interesting too (i.e. for sparse systems). Yes, let's go for the rational numbers. Wait, since there are going to be square roots of 2 and 3 in the constants, maybe also use field extensions? It looks like it might be a good idea to abstract the _coefficient domain_ of the equations using some kind of unit interface (like the functor approach in [3]).

[1] http://en.wikipedia.org/wiki/Gaussian_elimination
[2] http://en.wikipedia.org/wiki/Pivoting
[3] http://www.cas.mcmaster.ca/~carette/publications/scp_metamonads.pdf

Entry: Abstract number domains
Date: Mon Sep 14 12:29:29 CEST 2009

What: build a staged expression evaluator for abstract number systems. I.e. flatten an expression to ANF + allow for complex numbers, normal numbers, other fields/rings, ...

So.. I've separated out the ring-sig.ss interface, and will provide algorithms in terms of mathematical rings/fields. This seems the cleanest separation.
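To pin down what `flatten to ANF' means here, a minimal sketch (the real ring-sig.ss interface differs; all names below are made up): intermediates are emitted as (name expression) bindings, and the pairwise sum already hints at the associativity games of the next entry.

  (define bindings '()) ;; emitted (name expr) pairs, most recent first
  (define counter 0)
  (define (emit! expr)  ;; name an intermediate result
    (set! counter (+ counter 1))
    (let ((v (string->symbol (format "v~a" counter))))
      (set! bindings (cons (list v expr) bindings))
      v))

  ;; pairwise reduction: halves the number of summands at each level
  (define (sum-pairwise xs)
    (if (null? (cdr xs))
        (car xs)
        (sum-pairwise
         (let pair ((xs xs))
           (cond ((null? xs) '())
                 ((null? (cdr xs)) (list (car xs)))
                 (else (cons (emit! (list '+ (car xs) (cadr xs)))
                             (pair (cddr xs)))))))))

  ;; (sum-pairwise '(a b c d)) => v3
  ;; (reverse bindings) => ((v1 (+ a b)) (v2 (+ c d)) (v3 (+ v1 v2)))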
Entry: Using associativity
Date: Wed Sep 16 10:04:05 CEST 2009

I'd like to use the associativity laws to shorten the critical path using binary subdivision. This is only useful for the staged version. It would probably be best to hide this behind the `ring^' interface since it relies on representation. The naive subdivision won't work: it produces this:

  box> (sum (syntax->list #'(a b c d e f g h)))
  ;; (v8 (+ a b))
  ;; (v9 (+ c d))
  ;; (v10 (+ v8 v9))
  ;; (v11 (+ e f))
  ;; (v12 (+ g h))
  ;; (v13 (+ v11 v12))
  ;; (v14 (+ v10 v13))

which is depth-first. Breadth-first is probably better, but it uses more intermediates. Looks like the trade-offs here are implicit. Anyway: it is a different problem. This could be done in two steps: first make sure the _dependencies_ form a binary tree, then in a second iteration schedule the operations to allow for pipeline delays. The latter can probably be handled by the C compiler. So: compilation is

  EQUATIONS -> DATAFLOW COMBINATORS -> DATAFLOW SCALAR -> SEQUENTIAL

Actually, once you start combining different forms, because of the specialized mul/add for 0 and 1, the tree of operations gets unbalanced. Postponing until the computation is finished and then re-balancing (based on algebraic laws) might be an interesting optimization.

Ok. I've added memoization. This doesn't handle commutative ops though. Ok, this is now also handled.

Entry: Echelon form
Date: Thu Sep 17 11:21:52 CEST 2009

Seems to be working. The only problem is the staging of pivoting, since this has an influence on the data flow.

Ok, what I want is not echelon form, but gauss or gauss-jordan elimination. The inner loop does this:

  . | ....        ./0 | ....
  P | ....   ->   P/1 | ....
  . | ....          0 | ....

There are two parameters: whether to scale the pivot (P/1) and whether to eliminate the top rows to zero (gauss-jordan) or not (gauss).

Ok. moved to gauss-jordan elimination.

Entry: Matrix combinators
Date: Thu Sep 17 14:30:11 CEST 2009

Interesting how not trying to optimize prematurely makes the matrix code look quite simple. Because the matrix structure can be completely eliminated at compile time, the only thing that remains is the data-flow dependencies the algorithms introduce (i.e. pivoting makes choices wrt. dependencies, where the goal is to optimize for better numerical stability).

What I wonder is if it's possible to reconstruct some grid operations from the bulk of the data flow network. I.e. if the higher order operations as I've specified them now can be used to map to some intermediate language that _doesn't_ get rid of the grouping relations (i.e. to recover loops from the flat code). From afar this does look like it's not a good idea, as it creates an artificial search problem. But.. Tagging the nodes with this information might make reconstruction easier..

Entry: Constraints
Date: Thu Sep 17 15:32:23 CEST 2009

Ok, so given a set of linear equations that determine intermediate variables, and a set of inequalities, determine:

- a running check for the autonomous state
- an event-check for parameter updates

So.. what are the advantages over doing it numerically? The algebraic approach supports sparse equations.

Entry: AI - Summary
Date: Thu Sep 17 18:03:37 CEST 2009

I have this feeling there is much more hidden below the surface I just scratched. Abstract interpretation seems to be a really neat trick. It makes some things look really easy.

Entry: Equations
Date: Fri Sep 18 14:19:26 CEST 2009

So, how to use this, starting from a bunch of equations. Essentially, what we want to do is 1.
Entry: Equations
Date: Fri Sep 18 14:19:26 CEST 2009

So, how to use this, starting from a bunch of equations. Essentially,
what we want to do is 1. figure out which are the variables, 2. see if
they make up a linear system. For 2. what is needed is that for every
symbolic multiplication, at least one of the variables reduces to a
constant.

Entry: Type Checking and Abstract Interpretation
Date: Fri Sep 18 14:54:58 CEST 2009

When reading stuff in [2], I missed this[1] comment by Greg
Buchholz[3]:

  ``I'm toying with a interpreter/compiler for a Joy-like language,
  and of course the issue of typing came up. But instead of having
  static type inference or latent type checking, I've been interested
  in executing programs with types instead of values.''

Then moving on to the small-step abstract interpretation[4]. When
dealing with (control) effects, small-step operational semantics are
easier to work with than coarser big-step / denotational semantics,
while the latter allow for syntax-directed techniques (structural
induction[5] over expressions).

[1] http://lambda-the-ultimate.org/node/834#comment-7658
[2] entry://20090716-143500
[3] http://kerneltrap.org/blog/6714
[4] http://okmij.org/ftp/papers/delim-control-logic.pdf
[5] http://en.wikipedia.org/wiki/Structural_induction

Entry: Abstract Interpretation: Lattices
Date: Fri Sep 18 19:09:10 CEST 2009

While intuitively, abstract interpretation for staging (symbolic
computation) is quite straightforward to do, the mathematical
definition relies on order-preserving functions on lattices. A
canonical example of a lattice is the set of subsets of a set,
together with union and intersection (a boolean lattice[1]). I.e.
arithmetic can be approximated over the lattice made up of the subsets
of {+,0,-}.

  {+} add {+} -> {+}
  {-} add {+} -> {+,0,-} = T
   x  mul {0} -> {0}
  ...

So.. The missing links and terminology seem to come from denotational
semantics[2]. As far as I get it now, an abstract semantics is related
to a full denotational semantics in some structure-preserving way.

Attempt: In the following diagram L and L' are lattices, f is a
concrete function, f' an abstract function, a : L->L' is the
abstraction function and c : L'->L is the concretization function. The
a and c are ``semi-inverses'' in that they preserve order: they form a
Galois Connection[5]. The abstraction is sound whenever c acts as a
``semi-homomorphism'': f . c < c . f', respecting the order relation
instead of an equivalence, i.e. the following diagram commutes:

           f
    L ---------> L
    ^            ^
    |            |
    | c          | c
    |            |
           f'
    L' ---------> L'

In other words, soundness means that for any concrete operation

  forall x, f x = y,  if x \in c(x') then y \in c(f' x')

meaning that for all x, if x' is an abstraction of x and f' is an
abstraction of f then f' x' is an abstraction of y.

It's convenient to picture the original semantics L as the powerset of
some set |L (i.e. the natural numbers |N): already containing all
approximations. In this case the abstract representation L' is related
to |L by mapping elements l' to subsets of |L. The order relation in L
represents ``level of abstraction''. I.e. the element + in L' could
map to the element {0,1,2,...} in L.

Now, from the first lecture of Cousot[7], we find: Abstract
interpretation is considering an abstract semantics that is a
_superset_ of the concrete semantics of the program, hence it covers
all possible concrete cases. This leads to the requirements: 1. sound
(cover all cases), 2. precise (avoid false alarms) and 3. simple
(avoid combinatorial explosion).
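Back to the {+,0,-} example: a toy executable version (purely
illustrative). Abstract values are sets of signs, and abstract
operations return the union of all concrete outcomes, falling back to
T when nothing more precise can be said.

  ;; Abstract values: subsets of {pos, zero, neg}, as lists.
  (define T '(pos zero neg))

  (define (sign-add a b)           ; one pair of signs -> set
    (cond ((eq? a 'zero) (list b))
          ((eq? b 'zero) (list a))
          ((eq? a b)     (list a))
          (else T)))               ; pos + neg: can't tell

  (define (sign-mul a b)
    (cond ((or (eq? a 'zero) (eq? b 'zero)) '(zero))
          ((eq? a b) '(pos))
          (else      '(neg))))

  ;; Lift a sign op to sets: union over all combinations.
  (define ((lift2 f) as bs)
    (remove-duplicates
     (append* (for*/list ((a as) (b bs)) (f a b)))))

  (define abs-add (lift2 sign-add))
  (define abs-mul (lift2 sign-mul))

  (abs-add '(pos) '(pos))   ;; => (pos)
  (abs-add '(neg) '(pos))   ;; => (pos zero neg) = T
  (abs-mul T '(zero))       ;; => (zero)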
So, conclusions: what I'm doing at the moment is a combination of
several techniques: it produces a trace (abstract evaluation of the
interpretation step function) + approximates values by values U
variables / expressions. It would be interesting to try to formalize
this. Soundness seems ``obvious'' in my case, so the proof shouldn't
be too difficult (probably trivial once there's a formal model). In
general however, it does seem like a good idea to try to keep the
theory in mind next time I need to do some analysis, and use specific
applications as an example.

[1] http://en.wikipedia.org/wiki/Boolean_lattice
[2] http://en.wikipedia.org/wiki/Denotational_semantics
[3] http://santos.cis.ksu.edu/schmidt/Escuela03/WSSA/talk1p.pdf
[4] http://santos.cis.ksu.edu/schmidt/Escuela03/home.html
[5] http://en.wikipedia.org/wiki/Galois_connection
[6] http://santos.cis.ksu.edu/schmidt/Escuela03/WSSA/talk3p.pdf
[7] http://web.mit.edu/afs/athena.mit.edu/course/16/16.399/www/

Entry: Contracts
Date: Sat Sep 19 10:26:15 CEST 2009

Time to tag some contracts.. I've tried to use typed Scheme, but the
absence of inference makes it tedious. Also, I'm not quite used to the
degree of precision required for typed programming.. This is really
something to do gradually: from the point you know what you're doing.
(I wonder, is there something like a Scheme -> ML translator that lets
you use the ML typechecker and inferencer?) Anyways, I'd like to
explore the route: untyped -> contracts -> typed-scheme

Entry: C extensions using standard syntax.
Date: Sat Sep 19 12:04:36 CEST 2009

Continuation of [1] in staapl/staapl/c. The idea is to embed macros in
C source code as a way to sneak metaprogramming techniques into the
embedded C world. Let's try to build a framework for this. What is
needed?

  - c.plt input wrappers (cpp, mfile)  OK
  - (pretty) printing to C
  - conversion from c.plt data structs <-> code processor structs

c.plt still needs the C preprocessor to perform expansions, so that's
wrapped using cpp.ss (which uses ../tools/mfile.ss).

NEXT: get a grip on c.plt data structures. The AST is here[2].

Ok, I have a first draft of a partial naive pretty printer. Looks like
we're there.

[1] entry://../meta/20090919-112256
[2] http://planet.plt-scheme.org/package-source/dherman/c.plt/3/2/ast.ss

Entry: Join libprim and staapl
Date: Sat Sep 19 17:37:51 CEST 2009

Reason 1: the cplt analysis tools can be used in the libprim build
process.

Entry: VLIW + GP
Date: Sun Sep 20 12:33:13 CEST 2009

What if.. a GP processor performs address calculations for a VLIW?
I.e. concentrate on subdividing the problem into kernel loops that
execute on the VLIW, and buffer management code that runs on the GP,
performing address calculation to feed the VLIW with data.

Entry: Loop TX
Date: Sun Sep 20 16:18:43 CEST 2009

In order to get a better idea of loop transformations, it might be
interesting to look at the ``algebra of loop transformations''. I'd be
surprised if this doesn't exist in such an abstract form.

  - interchange (transpose)
  - splitting/peeling (boundary conditions or segments)
  - fusion/fission
  - unrolling
  - invariant motion (move independent statements outside loop)
  - reversal
  - tiling/blocking
  - skewing
  - vectorization
  - software pipelining
  - unswitching (moving conditionals out)
  - inversion (while -> do/while)

Apparently there is an abstraction called the ``Unimodular
Transformation Framework'' which deals with representing these
operations as matrices. Muchnick 20.4.2.
From this it seems reasonable to limit the possible loops in a
language (combinators) to get a better-behaved algebra of loop
transforms.

[1] http://en.wikipedia.org/wiki/Loop_optimization

Entry: MAP is easy, FOLD is not.
Date: Sun Sep 20 17:54:45 CEST 2009

So, when writing data combinators, MAP can be used to fill the gaps,
while FOLD needs special care. Most commonly the operator that is to
be folded is associative. This allows the fold to be broken up into
independent pieces which can be combined in the end.

Comparing this to bananas, lenses, ... [1] the problem is different.
You want to treat MAP specially because of parallelism. The map fusion
(loop fusion) is like a non-local particle going through your program.
It has many degrees of freedom.

Another difference is that complex anamorphisms are rare in DSP. They
mostly take the form of constant lifting (combine a constant with each
element in a loop map/fold) or simple weight generation (lines &
sines). The catamorphism (fold) however is very important, with the
inner product taking the lead, possibly followed by min/max.

Constant lifting corresponds to moving variables out of loop scope. It
seems that the idea of loop scope is going to be an important one.

This idea of loop fusion being a ``particle'' moving around in a space
of constant energy is stuck in my head. The same can then possibly be
said for fold fusion.

  (map $) . (map @) = map ($ . @)

Something not to ignore is that boundary conditions are important, and
significantly complicate manual code compared to truly symmetric map
and fold. As I mentioned before in [1]: for image processing you want
the data types to be a bit more abstract than recursive types. I.e.
picking an implementation is picking a data type + recursion schemes
that handle pre/loop/post cases.

[1] entry://../compsci/20090911-125525

Entry: Combinations of 2D filter masks and mappable scalar ops.
Date: Sun Sep 20 18:51:45 CEST 2009

Main rationale: boundary conditions significantly complicate
expression of folds over images. Write an algebra of fusable
operations, first ignoring boundary conditions (infinite fold), and
second taking that into account. Parameterizing composition of 2D
filter masks will yield a large class of useful programs, and will
probably give some idea about how to move on to more serious folds.

So... How to express data types? Or can this be avoided? Moving to the
simpler problem of 1D filter masks over the stream s, which consists
of an infinite list of elements s = e*, this is about the operator Z
which delays the stream by one time instance.

Q: what does `Z + 1' mean? Here Z is an operator: s -> s. This makes 1
also an operator: s -> s, and + an operator combinator:
(s->s, s->s) -> ((s,s)->s).

It looks like it is actually this shift operator that makes things so
problematic. The reason is that it ``looks inside'' an iteration over
a stream. What I mean is that, while it is easy to lift + to a stream
operation (s,s) -> s and feed it with two streams s and Zs, it is less
trivial to turn the operation into an operation f : s->s, where f is
obtained from +, 1 and Z. More specifically, the function that
provides arguments from the input stream to the scalar + : (e,e) -> e
somehow needs to retain memory. This ``state maintenance'' seems to be
the central problem related to combining maps + folds and stream shift
operators. In other words, there is a difference between the following
lift types, depending on lift over streams, or lift over s, Zs:

  S-lift : ((e,e) -> e) -> ((s,s) -> s)
  Z-lift : ((e,e) -> e) -> (s -> s)
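To make the ``state maintenance'' idea executable, a sketch of Z-lift
over finite lists (the name `z-lift' and the list representation are
made up for illustration): the previous element is threaded explicitly
as hidden state, which is exactly what a plain map cannot express.

  ;; Z-lift : ((e,e) -> e) -> (s -> s).  The scalar op sees the
  ;; current element and the delayed one; `prev' is the hidden
  ;; state.
  (define (z-lift op)
    (lambda (s)
      (let loop ((prev (car s)) (rest (cdr s)))
        (if (null? rest)
            '()
            (cons (op (car rest) prev)
                  (loop (car rest) (cdr rest)))))))

  ((z-lift max) '(3 1 4 1))  ;; => (3 4 4), each element vs. Zs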
Making these functions explicit and providing laws for them is
probably what needs to be done. The lower one (Z-lift) can be
separated as

  ((s -> s), (s -> s), ((e,e) -> e)) -> (s -> s)

i.e. taking two stream transformers and a binary scalar op, and
feeding the transformed streams into the binary element op. Filling in
1, Z, + gives the filter function f mentioned above. So.. the
interaction between

  map2 : ((e,e)->e, s, s) -> ((s, s) -> s)
  map1 : ((e->e), s) -> (s->s)
  Z    : (s -> s)
  streamZ s = (s -> (s,s))

  map2(+, s, Z(s)) = map1(+, streamZ(s)) : s->s

It's important to distinguish between a stream of tuples and a tuple
of streams. It looks like making the types work is going to bring us
half way there (in the light of Wadler's free theorems[1]).

[1] http://homepages.inf.ed.ac.uk/wadler/papers/free/free.ps

Entry: Monads in a concatenative language
Date: Mon Sep 21 12:31:00 CEST 2009

How would you express them? Probably best in terms of unit/map/join,
because the absence of lexical variables makes the do notation of the
bind form impossible. What do monads look like in the map-formulation?

Entry: Constraint Language
Date: Thu Sep 24 11:23:37 CEST 2009

  1. convert to normal form (confluent rewrite system)
  2. collect constraints into propagators (i.e. collect linear
     equations)
  3. perform constraint propagation (find a distribution strategy)

Putting it like this makes it look a lot more structured. Especially
the first one is something I didn't think about. The third one
appeared after reading chapter 12 in CTM. The first one can maybe be
done directly with syntax-rules.

Normalizing the relations is simple: move stuff to the other side
until there is a comparison to 0. The next simplification is
conversion to sum of products. This expands every + nested inside an
*. How can conversion to normal form be formulated as an alternative
representation? I.e.:

  +   concatenate
  -   (map negate) . +
  *   convolve

It's actually quite simple: 1. recursively reduce all multiplications,
2. flatten all additions.

Entry: Stream Combinators : Z
Date: Sun Sep 27 10:42:38 CEST 2009

The problem that needs to be solved for the image processing loop
transforms is a way to move `Z', the single-element delay, inside and
outside a loop.

The general idea: Z (delay) should be a high-level operation. In all
implementations I've found, Z is always explicitly implemented wrt.
the representation of streams / sequences, leading to a complicated
(too much detail) and inflexible (too specific) encoding of knowledge.
DSP code should be expressed in terms of polynomials. The mapping of
this representation to code should be automated, possibly
parameterized.

Essentially, translate a Z on the inputs into a Z of the fold state.
This is probably related to paramorphisms. I'm looking for a _form_
where the operation of moving the effect of Z inside a loop is clear.
Suppose s is an infinite list. In scheme notation I'm looking for the
transformation between:

  (lambda (s)
    (let ((s1 s)
          (s2 (cdr s)))
      (map + s1 s2)))

and

  (lambda (s)
    (let filter ((s1 (car s)) (s (cdr s)))
      (let ((s2 (car s)))
        (cons (+ s1 s2)
              (filter s2 (cdr s))))))

(Note the recursive call threads the current element s2 as the next
delayed value.) The first is a map, while the second is a kind of fold
(it uses threaded state). The right framework to see this is probably
memoization / dynamic programming. The delay operation `cdr' here is
memoized, which in practice means that an expensive memory fetch will
be cached by keeping the variable in register.
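A quick sanity check that the two formulations agree, adapted to
finite lists (the map version truncated to matching lengths, and a
termination case added to the fold; throwaway code):

  (define (f-map s)   ; map over (s, Zs), truncated for finite lists
    (map + (cdr s) (drop-right s 1)))

  (define (f-fold s)  ; same result, with the delay as threaded state
    (let loop ((s1 (car s)) (s (cdr s)))
      (if (null? s)
          '()
          (cons (+ s1 (car s))
                (loop (car s) (cdr s))))))

  (equal? (f-map '(1 2 3 4)) (f-fold '(1 2 3 4)))  ;; => #t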
A paramorphism does something similar: its binary operation takes a
pair: the input and output of the iterated function. From[1]:

A paramorphism for the natural numbers:

  h 0       = b
  h (1 + n) = n @ h n

For lists:

  h Nil            = b
  h (Cons (a, as)) = a @ (as, h as)

Here `@' is the binary function to be iterated. The next step seems to
be to look into Meertens' idea[2] and find a way to express it for
Z-based computations.

However, note that a paramorphism has a truly recursive call tree:
there is a reverse data dependency: the binary operation @ receives
the _result_ of the recursion. What I'm looking for is tail recursion
with all memoized state / accumulators passed in as extra arguments.

Indeed the problem is different: the specification might deal with
more general recursion patterns on recursive data types, but in the
end one needs to construct loop kernels with local state connected to
a data access pattern. Maybe the reason for confusion is the
difference between an abstract definition with low arity functions,
and the ``splatted'' nature of pipelined loop kernels for RISC/VLIW
machines with lots of registers.

Maybe the question can be reformulated as: find a transformation that
memoizes accesses. In [3] elimination of redundant loads is mentioned
as one of the possible optimizations that bring accesses to registers.
If I'm about to move the abstraction level up, this kind of
optimization becomes very important. What are the other patterns?

  - elimination of redundant loads (i.e. offset indexing implemented
    using register shuffling)
  - loop unrolling enables array -> variable translations.
  - data prefetch and proper allocation in the memory hierarchy:
    spatial vs. temporal locality, i.e. FIR input/output vs. FIR
    coefs.

[1] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.41.125
[2] http://www.springerlink.com/content/h1547h551422462u/
[3] isbn://1558607668

Entry: Stream combinators continued
Date: Tue Sep 29 12:43:22 CEST 2009

So, starting from the primitive delay combinator Z : s -> s, which
takes a stream and maps it to the stream shifted by one element, we
need algebraic laws such as:

  (map fn s (Z s)) = (loop_1_Z fn s)
  (map fn (Z s) s) = (loop_Z_1 fn s)

Where `map' is an element-wise morphism, or fn : e -> e, and `loop'
has threaded state. The objective is to find an abstract way to
characterize the functions `loop_1_Z' and `loop_Z_1' so the rewrite
rules work in both directions. This depends on the representation of
streams (which will ultimately be machine memory arrays).

Let's start with:

  ((map fn) (I s) (Z s)) = ((loop fn I Z) s)
  ((map fn) (Z s) (I s)) = ((loop fn Z I) s)

The objective is to start with a LHS representation in terms of
combination of streams, and turn it into a single iteration over a
number of primitive streams. This is essentially loop merging. Later
we need to give up the functional notation early and use data-flow
variables: each expression contains a number of inputs and outputs.
For now expressions are simpler. The rule above needs a more general
composite transformation rule that can merge two loops.

Essentially what one wants is algebraic rules that relate stream and
function operators. Let's use the following notation:

  s : [e]          [.] stream type constructor
  f : e^n -> e^m   elementary function
  S : s^n -> s^m   stream transformation
  F : f -> S       elementary function transformation

  ((F1 fn) (S1 s)) = ((F2 fn) s)
   +-----+ +----+    +-----+
      S      s          S

Laws are needed that have S1 in terms of an F, like:

  ((F1 f1) ((F2 f2) s)) = ((F3 f1 f2) s)
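The simplest concrete instance of such a law is map fusion, where F3
is just composition; easy to check on finite lists (throwaway sketch):

  ;; (map f) . (map g) = map (f . g) : two traversals fuse into one.
  (define (f x) (* 2 x))
  (define (g x) (+ 1 x))

  (equal? (map f (map g '(1 2 3)))
          (map (compose f g) '(1 2 3)))  ;; => #t, both are (4 6 8)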
What I can gather from this is that:

1. types are important to make this manageable. It might be wise to
   put this in a categorical framework to map out the territory.

2. the rest seems a pretty straightforward derivation of theorems that
   prove equalities.

Let's try this with the important players:

  + *  : e^2 -> e
  -    : e -> e
  I,Z  : s -> s

What's next is a syntax and semantics for `loop' parameterized by the
elementary operators Z^n, n>=0 with I = Z^0.

TODO: build loop on some substrate of streams of elements (i.e. a
circular vector of exact numbers, to make verification of
transformation rules simple.)

Got loop.ss test skeleton. The first thing I notice is that I need an
abstraction for threading delayed values: it's not so simple to do
this with unary fold. Does for/fold allow for multiple values? Yes.

OK, I have the first TX test working:

  (quickcheck
   (lambda (s) (map + (I s) (Z s)))
   (lambda (s) (loop_I_Z + s))
   1 10)

Next is to find a way to generate loop_I_Z from a specification (I Z).
This should generate pre/loop/post code (which is the most annoying
part to do manually).

Another thing: if there is no output dependency (i.e. no IIR-style
feedback), it is possible to run loops with delays backwards. This is
necessary for making operations parallel. Take this as a hint that the
best description probably takes direction simply as a parameter.

The real deal is this: in the end, there is an inner loop that
accesses a mask over the data. Find a way to describe this on a higher
level.

In order to properly represent loops it might be best to adhere to a
representation that uses arrays and indices directly. The problem here
is that references are pure, but of course assignments are not. Let's
move to single-assignment vectors (this way coverage can be tested).
In short: separate abstract formula manipulation from implementation
(single-assignment vectors). Functional vs. dataflow.

An interesting property is that it is difficult to talk about vector
operations without allowing (single) assignment: iteration loops seem
to scream for explicit mention of the storage/send of outputs. Wait..
Maybe a read/write formulation would work better? I.e. delimited
continuations etc..

Q: can these transformations be expressed more easily in terms of
   delimited continuations?

Q: Essentially, memory already behaves as a stream (very clear in DSP:
   using DMA from DRAM to SRAM to feed a processor..). Maybe I should
   reformulate loops as combinations of IO + coef + state + loop?

Q: What about seeing code as a description of a circuit and then
   re-interpreting it as loops (operational semantics?) I.e. a 3-tap
   FIR filter is a description of a network that connects adders and
   multipliers from an input stream to an output stream. Combinators
   essentially duplicate a basic pattern to a global one. Looking at
   it this way (pure functional kernels) it might be simpler to derive
   loops and pre/post code by reducing the pure kernels by combining
   them with a traversal strategy.

That's really the essence, right? Declarative dataflow. What is
declared? The _structure_ of the computational network, _not_ the map
to a serial machine. I.e. instead of thinking about delay lines, think
of the slightly more high-level concept of spans: transform loop
bodies, such that the final loop construction takes the form of a
universal pre/loop/post combination that takes a description of the
span as input.
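For reference, one plausible shape for the generated `loop_I_Z' used
in the quickcheck test above (hypothetical: the point of the exercise
is to generate this, pre/loop/post and all, from the (I Z) spec):

  (define (loop_I_Z fn s)
    ;; pre:  prime the delay register with the first element
    ;; loop: combine the current element with the delayed one,
    ;;       then shift the delay register
    ;; post: nothing needed for this span
    (let loop ((z (car s)) (s (cdr s)) (acc '()))
      (if (null? s)
          (reverse acc)
          (loop (car s) (cdr s)
                (cons (fn (car s) z) acc)))))

  (loop_I_Z + '(1 2 3 4))  ;; => (3 5 7) = (map + (I s) (Z s))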
Some remarks:

  - using a limited set of core operations, issues of associativity
    and commutativity could be left until later (when discovering
    ILP).

  - the remaining problem is the conversion of an input specification
    to a (memoized) data flow program. In order to have things map
    easily to serial C, the memoized form uses `let*' terminating in a
    `values' form that produces the output.

Let's try this. The effect of 'z' needs to be pushed through to an
input variable, to be mined later to determine the signature of the
function.

[1] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.68.5409&rep=rep1&type=pdf
[2] http://eprints.kfupm.edu.sa/45250/1/45250.pdf

Entry: Filters cont..
Date: Wed Sep 30 18:18:26 CEST 2009

So, I have a notation that can push a z operator through operations to
end up as a variable operator (all operations work on current values,
and delays are implemented by the driver loop):

  (ttx r/z #'(+ a (z (+ b (z (+ a b))))))
  => (+ a (+ ((z 1) b) (+ ((z 2) a) ((z 2) b))))

Next: do the same for operator specifications (polynomials in z). It's
probably best to separate operators and other things using a type
system. However, an operator is something where z can be an argument
to an operation (the operation is then an operator on an operator).

Entry: Constraint cont.
Date: Mon Oct 5 13:14:46 CEST 2009

Starting from [1]. What I have now is a specific propagation engine
(the idea is to use real numbers / floats, so sets of integers are
currently not yet supported). The normalizing should work too. So what
I need is a specification, and to propagate the problem description to
a solution. Let's start with something trivial first:

  (constraint
   (= (+ (* 3 x) (* 2 y)) 10)  ;; define intermediate nodes
   (< (+ x y) 3)               ;; safety constraints
   )

This should lead to the following behaviour: generate code for the
following functions:

  set_x()
  set_y()

This produces directed equations + relevant checks of constraints.

So, I have a routine now that produces a list of equalities and
inequalities. In the first iteration, the equalities need to be
arranged into a matrix. This means the unknowns need to be discovered.
If the equation is in normal form, we can simply gather them from the
'*' forms. Next was sorting terms according to order.

[1] entry://20090924-112337

Entry: 2-level semantics
Date: Mon Oct 5 15:50:00 CEST 2009

Concrete: whenever an identifier appears as a literal in a syntax-case
expression, it has a compile-time semantics that's _not_ programmable.
There is a subtle difference between this, and providing a syntax
binding for the identifier. The latter allows for lexical scope.
Currently, using syntax-case with identifiers for normalizing
arithmetic expressions is probably good enough.

EDIT: While writing an (algebraic) meta-processing language, some of
the identifiers (operators) have a compile-time semantics (i.e. the
associativity law which allows re-arranging expressions, and has
little meaning at run time, when the program has lost all its
mathematical meaning and is merely a sequence of instructions for a
serial/parallel computer.)

In short, `+' and `*' are _not_ Scheme! They have two identities that
should be distinguished with the utmost care: at compile time they are
_formal_, and are there _only_ to steer formula transformation. At run
time they are functions that operate on values. Once this separation
is clear, it is possible to start being flexible with the idea of
"run time": if some of the run-time reductions/operations can be
performed in a separate transformation stage, further reductions are
possible.
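A tiny executable version of the first normalization step (over raw
s-expressions rather than syntax objects; illustrative only): here `+'
is purely formal, a tag that steers the rewrite and is never applied
as a function.

  ;; Flatten nested additions: (+ a (+ b (+ c d))) -> (+ a b c d).
  (define (sum-terms e)
    (if (and (pair? e) (eq? (car e) '+))
        (append-map sum-terms (cdr e))
        (list e)))

  (define (flatten-sum e)
    (cons '+ (sum-terms e)))

  (flatten-sum '(+ x (+ (* 2 y) (+ z 1))))
  ;; => (+ x (* 2 y) z 1)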
On the other hand, variables might be Scheme (i.e. they should respect
hygiene). However, they are quite type-restricted: the compile time
transformation (formal manipulation) needs to be compatible with the
run-time semantics (model theory).

Entry: Filter language
Date: Thu Oct 8 11:09:27 CEST 2009

So, the goal is this: given a collection of streams and a number of
delay operators, construct a kernel routine. Start with 1 (audio),
move on to 2 (image) and finish with 3 (video). The kernel routine is
_declarative_ : it merely states the relation between neighbouring
pixels. To give an operational semantics, a serialization needs to be
implemented that constructs a nested loop in terms of a physical data
stream.

The problems are thus:

  * specification -> normal form polynomial
  * polynomial -> imperative loop

Parameterizing the choices to be made in the last step will give an
implementation / optimization strategy.

An interesting heuristic for (artificially constructed) school
exercises is to ask: did I use all the axioms/equalities?

It might be a good idea to first focus on transformation rules for
simple algebraic expressions in s-expression form, i.e. generic
associativity. There are two important operations:

  - arbitrary -> right rotated binary tree (rrbt)
  - rrbt <-> flat

This works really well. A large collection of simple tree / graph
operations will probably be the right approach.

NEXT: loop body construction. This involves obtaining delay
information for the variables, and constructing delay assignments and
pre-roll code. This seems quite straightforward. Requires some core
routines (id hash) so it can be done later.

Ok: I have something that generates this:

box> (ttx loop-body (r/z #'(+ q (z a) (z (z b)))))
(begin
  (begin
    (set! b_1 (~ b 1))
    (set! b_2 (~ b 2))
    (set! a_1 (~ a 1)))
  (for ((i (in-range n)))
    (set! b_0 (~ b i))
    (set! a_0 (~ a i))
    (set! q_0 (~ q i))
    (set! (~ result i) (+ q_0 a_1 b_2))
    (set! a_1 a_0)
    (set! b_2 b_1)
    (set! b_1 b_0)))

Now, let's simplify the generator so it becomes more composable. I.e.
the generator (postponed generated code) should become an algebraic
object on which certain transformations can be performed. Can it be
turned into a group with a certain generator/... representation? I.e.
the code above was directly constructed from a dictionary of variables
+ their associated maximal delay/offset. Is it possible to factor this
into separate operations of memoization (dereference) and matching
between adjacent loop iterations?

Maybe it's important to see that LOAD/PROCESS/STORE/PROPAGATE is only
important as a final view. Memoization can be kept out of the picture
for all transformations.

[1] http://docs.plt-scheme.org/reference/dicts.html#%28def._%28%28lib._scheme/dict..ss%29._make-custom-hash%29%29

Entry: Linearization
Date: Thu Oct 8 15:57:18 CEST 2009

It might be interesting to look at linearization / automatic
differentiation in the current implementation of the staapl/algebra
code. I.e. useful for deriving optimization kernels for nonlinear
models. (Also see Carette's paper and Haskell links in [1]).

[1] http://del.icio.us/doelie/autodiff

Entry: Loop TX
Date: Fri Oct 9 17:38:36 CEST 2009

Seems to be centered around the idea of linear transformation of
iteration space, which is an integer lattice. Dependence vectors are
integer lattice elements representing spatial dependencies. The 1D
case is called a ``distance vector''. ``Optimizing Compilers for
Modern Architectures: A Dependence-based Approach''[2] allegedly gives
a good introduction to these techniques.

EDIT: This is an interesting book.
It starts from a lower-level point: iterations with arbitrary
dependencies, and tries to reconstruct parallelism from that based on
dependence. However, from the point of constructing higher order
combinators, the distinction between maps and folds is really
important: transformation laws can be expressed on a higher level.
But, on the implementation level, of course, loops are going to be
ever present.

[1] http://suif.stanford.edu/papers/wolf91b.pdf
[2] isbn://1558602860

Entry: Next
Date: Sat Oct 10 11:26:34 CEST 2009

It's getting complicated. Time for some design decisions.

  * It seems that keeping a loop in reference form is the best
    approach for now: memoization and pre-fill are easy to generate if
    not a bit tedious. Some abstraction will help here.

  * For loop transformations I need to read [1] and reuse the
    representation.

  * With the syntax in place, it should be possible to start making
    and parameterizing source transformation rules using higher level
    constructs. I need a problem to drive this. Simple straight-line
    code isn't so interesting as it seems that problem is solved.

Today seems to be a low-key day, so I'm going to do the bookkeeping
routines: generation of C code from the let* form.

[1] isbn://1558602860

Entry: statements/dataflow vs. expressions
Date: Sat Oct 10 12:41:27 CEST 2009

So, the `let*' SSA translator seems to work. The problem now is to
make the bridge between a syntax for a dataflow language, and C
statements that implement the assignments: simple expressions cannot
do this.

I wonder if it's a good idea to use Oz's dataflow/logic variables
approach as a basic framework, instead of syntactically
differentiating inputs and outputs. The latter also allows
non-directed interpretation (as a constraint network, instead of
directed dataflow). Let's stick to the simpler approach (dataflow) but
postpone the decision until it's necessary. Ultimately this is about
having the specification as directed equations (single assignment /
dataflow) or relations.

Anyways, back to the point: integrate the let* form with a loop body
generator so it generates runnable code. Next: array references. OK.
Looks like everything is in place to make the first simple generators
for 1D streams + Z.

Next: compiler for stream operator expressions -> C array code. What
is needed? 1. simple references, 2. delay memoization. It's best to
start with a simple for(i=0;i<n;i++) style Scheme -> C translation (an
imperative subset containing some statements and C expressions). This
allows testing of generated code directly in Scheme.

Next: make a specification syntax for a (currently sorted) DFL
language with a z operator, and translate it into a Scheme/C
expression.

Entry: spread & cleave
Date: Sat Oct 10 13:42:52 CEST 2009

Or: Factor and dataflow intent instead of stack shuffling[2][3]. It
looks like I've missed a lot of good stuff recently[1][4] (between the
noise, look at Nowak's replies and related posts). Essentially:
`cleave' takes an argument and passes it to a sequence of quotations
(a fan-out), and `spread' will apply a list of functions to elements
on the stack (zipping like an inner product).

Moral of the story: Stack languages are ``too sequential'' (as are
monads), and expressing any kind of parallelism without _state
isolation_ leads to problems. The `cleave' and `spread' combinators
are not parallel because they thread the stack through the iteration.
[1] http://tunes.org/~iepos/joy.html
[2] http://docs.factorcode.org/content/word-cleave,combinators.html
[3] http://docs.factorcode.org/content/word-spread,combinators.html
[4] http://tech.groups.yahoo.com/group/concatenative/message/4283

Entry: Z -> C
Date: Sun Oct 11 09:53:57 CEST 2009

Given a specification of a filter in terms of stream operators, derive
the routine that implements the loop. When done in several steps, this
is quite straightforward:

  1. convert a `z' notation to normal form by pushing the operator
     through the expression / network to end up at stream offsets.
  2. generate loop code in imperative Scheme, converting `~' to `ref'.
  3. convert imperative Scheme to C AST and concrete syntax.

Step 1 produces code like this:

  ((~ v1 0) (+ (~ a 0) (~ a 1)))
  ((~ x 0)  (* (~ v1 0) (~ v1 0)))

representing a dataflow network. (In the case above we could specify
that the stream `v1' won't be observed, meaning it can be implemented
as a scalar.)

Ok, full circle:

box> (ast-emit (stx->c-stmt (dfl/z->scheme #'((v1 (+ a (z a)))
                                              (x (* v1 v1))))))
for ((i = 0); (i < n); (i)++) {
    (v1[(i + 0)] = (a[(i + 0)] + a[(i + 1)]));
    (x[(i + 0)] = (v1[(i + 0)] * v1[(i + 0)]));
}

The memoized + loop shift version also +- works, but it needs more
infrastructure to connect to, to get an idea of the requirements. In
any case, the memo code seems straightforward / tedious so I'm going
to leave that alone for now, and concentrate on the loops themselves.

One thing is quite clear though: building more elaborate compilers as
a sequence of steps is very doable, but once the number of
transformation steps gets larger, it's probably best to switch to a
different representation form (i.e. a typed language), to make the
invariants more explicit. Currently I'm using just Scheme syntax
objects (s-expressions with marked identifiers). I.e. the code
explained here uses the following forms (dfl = dataflow Scheme):

  - dfl with stream variables and `z' operator
  - dfl with `~' operator (stream variable + offset)
  - imperative Scheme expressions with `set!', `for' and `ref'
  - C AST (c.plt)
  - C concrete syntax

So: basic infrastructure is mostly there. Next: this should rest until
after I've done some more reading on dependence-based compilation and
loops.

Entry: Tool decoupling (embedded ML)
Date: Mon Oct 12 14:14:49 CEST 2009

Compiler compilers. Essentially, can Staapl be formulated in such a
way that the compiler itself is generated? If it can be written as an
ML-style data transformer, conversion to a fast and simple
implementation (i.e. OCaml) should be possible.

Rationale: removing PLT Scheme from the dependencies (might be
difficult due to dependence on the module system though) and making
compilation a fast operation.

Entry: PIC24/30/33 addressing modes
Date: Mon Oct 12 16:37:46 CEST 2009

Assembler addressing modes. The problem with the assembler syntax as
specified by Microchip is that it uses a syntax that's not so easily
expressed as s-expressions. However, addressing modes really are just
names mapped to bit representations, so they might be handled as
constants. I'm just not quite sure how this will work with
transformation / optimization.

So, let's just try it: set up the infrastructure (assembler + stub
compiler interface) and fill it in. Next: implement 'drop' as
"MOV [--W14], W0".
The encoding:

  01111 wwww B hhh dddd ggg ssss

  w: offset
  B: byte mode
  h: dst address mode
  d: dst reg
  g: src address mode
  s: src reg

The problem here is that an s-expression syntax would probably need to
provide a bit of sugar, since the instruction format itself is quite
spartan. In the current framework, an instruction is a function with a
number of binary inputs. This needs to change to something more
general, preferably representable as combinators. Also, `MOV' is used
for a lot of different instruction tags.

Addressing modes:

  000  Ws       ;; Register direct
  001  [Ws]     ;; Indirect
  010  [Ws--]   ;; .. post-dec
  011  [Ws++]   ;; .. post-inc
  100  [--Ws]   ;; .. pre-dec
  101  [++Ws]   ;; .. pre-inc
  11x  [Ws+Wb]  ;; Offset / Unused (RESET)

Proposed s-expr syntax:

  Ws
  (* Ws)
  (*-- Ws)
  (*++ Ws)
  (--* Ws)
  (++* Ws)

The troubling element is offset addressing. Instead of representing it
as (+ Wb Ws), it should not refer to the offset register directly, as
this is a global entity.

Entry: Giving up on the assembler?
Date: Mon Oct 12 18:22:30 CEST 2009

Looking at the rest of the assembler syntax for dsPIC, I'm starting to
think that it might be better to:

  * use an external (textual) assembler
  * use some form of textual syntax for pattern matching, i.e.
    something that can still match operands, but uses standard asm
    representation in source code.

Problems:

  * pattern matching then requires a parser and an AST rep for the
    assembly language.

  * ``target values'' : the current system depends on opaque
    expressions that depend on address values. Resolving the latter is
    part of the assembler relaxation process, and ``outsourcing'' them
    requires translation to the assembler expression language. This is
    ok for limited semantics, but requires quite a change wrt. the
    current expressivity. Maybe this isn't really needed?

So, external assemblers are currently not possible in a
straightforward way. Looks like the only simple solution is to use a
matchable s-expression syntax, and extend the idea of assembler
pattern matching. I.e. currently the `patterns' syntax uses Scheme's
`match' syntax for its leaves. This could be extended to allow for
assembler expressions in tree form.

A better integration with vendor syntax might however be a good idea,
if only for verification of the internal assembler against the vendor
implementation... Not too urgent though.

Entry: Staapl in ML?
Date: Mon Oct 12 18:34:24 CEST 2009

The problem is that an untyped approach allows for several ``higher
order'' tricks that are more difficult to do in a strict type system.
I.e. the assembler is implemented as something akin to an algebraic
data type, but it is still possible to also match the opcode
parametrically (which would be a constructor in a straightforward
implementation in ML). I wonder, is this good or bad? It's good that I
can express more, and manually add some constraints at compile time,
but it's bad that some corner cases escape this ad-hoc type system.

Entry: What is an assembly language?
Date: Wed Oct 14 15:50:26 CEST 2009

In Staapl, it has the following function:

  - macro semantics are defined in terms of assembly language
    transformation / generation
  - assembly language is either compiled to binary code, or
    interpreted in other ways (simulator)

It's important to distinguish the two language levels (transformation
and interpretation).

Entry: staapl marketing
Date: Tue Oct 20 19:00:56 CEST 2009

I'd like to capture the nature of Staapl in a single meaningful
phrase, to then try to explain it word by word, and allow for
exceptions.
The `patterns' language is the specification language of a
concatenative(1) code transformation(2) language with local(3)
actions.

(1) Concatenation of `words' (syntax) denotes composition of code
    transformation functions (semantics).

(2) The code being transformed is the machine language of an abstract
    or concrete machine.

(3) Because the code transformations are local, they act as the
    operations of a stack machine. If this locality is _also_
    reflected in the machine language objects the transformers
    manipulate, a 2-stage stack language can be constructed.

The system behaves as a concatenative macro assembler that performs
reductions (computations) in addition to expansions.

Extensions: Giving up locality allows the construction of more general
combinator languages, not necessarily stack-based. This sort of
behaviour can be embedded inside a stack machine.

Entry: PIC18F1220 with direct speaker attachment
Date: Thu Oct 29 11:57:37 CET 2009

It's possible to connect an 8 ohm speaker directly to the PIC output
pins as long as you switch it fast enough. I'm going to use this to
build a bridged circuit for a burglar alarm.

  P1A / B3 - pin 20
  P2B / B2 - pin 19

On the CATkit board these lead to R7 and R8.

The software: I'm copying synth-1220-8.fm to alarm-1220-8.fm. The app
can reuse most of the synth lib. It just requires some different
config data and boot code.

R: I need to document standard practices. Building a library of code
and running a console with access is straightforward. But how to do
the boot process?

So, it's sputtering. Let's make the wiring optional. I've changed it
so that by default it drives the speaker in full-bridge, but switches
it off in the `engine-off' word to avoid DC current.

Entry: Synth: the control layer
Date: Sat Oct 31 09:45:01 CET 2009

A lot of thought went into building the control/virtual-sample 2-task
structure, but apparently not into documenting it, except for
doc/pic18-synth.pdf (from .tex). I'm just using `sync' from the synth
module to provide a time base for top-level control (duration & beep
frequency). The fancy task switching is not used.

FIXME: there is no other task.. How come `yield' still works?

Entry: Haskell on hardware
Date: Fri Jan 15 08:34:32 CET 2010

Read this [1][2]. Is there an alternative? I.e. can the operating
system be eliminated such that Haskell runs straight on hardware, and
the hardware's physical model is somehow represented without resorting
to sequential programming tricks like the IO monad?

It seems there is an opportunity to try this out in Staapl, as the
concepts of application / program / physical interaction all somewhat
blur on a bare metal microcontroller: there is little historical
baggage to carry around (i.e. operating system) except for the actual
design of the machine. Can this last step be eliminated by designing a
machine that's less like a sequential computer, and more like a bunch
of functions and events? What about a graph reducer in hardware? Or
first, a hardware-assisted GC?

[1] http://conal.net/blog/posts/can-functional-programming-be-liberated-from-the-von-neumann-paradigm/
[2] entry://../compsci/20100115-080242

Entry: Roadmap
Date: Mon Feb 8 14:18:57 CET 2010

After a long time off the project, I'm thinking about the following
goals:

  * Make sure the low-level part is standard. Staapl is essentially a
    macro-assembler based on PLT Scheme language towers. However, it
    does not have a proper interface for interacting with external
    assemblers and binary tools (i.e. binutils).
    Currently the PIC18 assembler is an essential part of the system.
    This should be separated. I still think it's a neat idea to take
    the assembler into the workflow, but most of the work should be
    off-loaded to external tools, as writing an assembler in itself
    isn't very productive. For this, the assembler expression language
    in Staapl needs to become simpler (atm. it's Scheme).

  * Offload high-level components to OCaml/Haskell. See the libprim[1]
    and meta[2] projects.

[1] entry://../libprim
[2] entry://../meta

Entry: Offloading high-level components to OCaml/Haskell
Date: Sun Feb 21 12:13:32 CET 2010

I've been playing with Haskell and (Meta)OCaml a bit lately[1].
Current conclusions are that they indeed seem to be better suited for
building program transformers. Types come in handy when things get too
complex. This suggests that for Staapl it might also be useful to
finally move to Typed Scheme. However, this will probably be postponed
a bit until I'm more comfortable with Haskell & OCaml.

One thing though: for commercial applications I really see the typed
languages as more promising, especially when code modifications are
required. Somehow it seems that when the right abstractions are in
place, the typed languages are more friendly to the beginner. Finding
the abstractions in accordance with the available type magic can be
hard though; type waters go deep.

[1] entry://../meta

Entry: PIC24 assembly
Date: Mon Feb 22 21:25:49 CET 2010

Another approach to get PIC24 support is to start from the (textual)
assembler, write an s-expression language for it and incorporate it
into Staapl in a bottom-up way; i.e. let the structure of the asm
language drive the higher abstractions instead of the other way
around.

Entry: Fixing (limiting) the assembler
Date: Wed Mar 3 19:15:16 CET 2010

This needs a fix in the `tv:' form. This is an RPN form which either
takes names from the lexical environment and lifts them as constants,
or delegates to the `scat:' form. It looks like the modification needs
to be made to replace that `scat:' form by something with restricted
functionality such that assembler expressions can be used.

Roadmap:

  - replace the tv: -> scat: delegation by something more explicit,
    i.e. with an empty namespace
  - add functionality to the new form until all uses are covered.

This might turn up some cross-connections at places.. ( Ha! With my
head in the Haskell learnings lately I forgot that I really like
Scheme macros ;)

EDIT: I've been going through the code and it seems the use of tv: is
really quite limited to arithmetic expressions. And, there is already
a `partial-eval' that can pretty-print code: this is used in the repl.
Atm it seems that this can be left un-checked. Checking can be
introduced quite directly in the `tv:' macro definition, using the
`fn-no-lex' argument to `lex-mapper'.

EDIT2: Maybe it's even simpler to move the partial eval and other code
to abstract evaluation, instead of doing it explicitly.

EDIT3: Or the other way around.. Encode syntax explicitly, but tag
semantics in case evaluation is done in Scheme. The unifying principle
is: we don't really have semantics in Scheme; the assembler should be
abstracted, so it needs an AST rep, not a functional one. Here I miss
Haskell type class polymorphism. But maybe other forms of polymorphism
can be used. Swindle?

Entry: Staapl's substrate
Date: Wed Mar 3 19:26:02 CET 2010

I've been pondering a lot about the code/data issue and the
(operational) semantics of Staapl.
One of the recurring themes is:

  * The programmer only ever sees combinators, not machine code.

  * Combinators are defined as manipulation of VM code, which is a mix
    of real machine code and intermediate representations.

The same problem came back in some Haskell code I wrote for performing
algebraic simplifications. The vague idea is this: you really want to
only ever use combinators, but in order to implement them they need to
have a ``data base''. In Staapl this data base is some intermediate
machine code. Combinators generate and manipulate machine code.

The thorn in my side is: why is the pseudo-machine code (QW,CW) not
separate from the real machine code? Or, is there a benefit to
combining both over a 2-phase approach where the combinators only work
on intermediate (QW,CW) code, and the peephole optimization is done
directly on that instead of by the combinators themselves?

From the perspective of [1] only the low-level optimizations are
relevant for Staapl. Chapter 18 talks about pattern matching for
machine idioms, which is what most of the code in the PIC18 code
generator is about. Chapter 6 talks about code generation from a
low-level intermediate representation (LIR) using an SLR(1) parser.

[1] http://books.google.be/books?id=Pq7pHwG1_OkC&printsec=frontcover&dq=advanced+compiler&source=bl&ots=4W91Krb-tS&sig=H7M_d8VN-MQAiInjpUshpqiwq_0&hl=en&ei=CauOS4KcKpn20gTNqvnuDA&sa=X&oi=book_result&ct=result&resnum=3&ved=0CBIQ6AEwAg

Entry: Correctness and Machine Description
Date: Wed Mar 3 19:50:37 CET 2010

One of the major problems in the development of Staapl was the
correctness of its code generator and peephole optimizations. While
the rules by themselves are easy enough to prove, their large number
and the absence of a coverage test make it possible for mistakes to
seep in.

I've written before about a way to assure the correctness of the code
generation by adding redundant information (semantics) to the machine
language (i.e. a simulator). I wonder if this can be used together
with quickcheck. Since the problem is the generation of the test
cases, automating that part might be an interesting approach.

When trying this I got stuck on the description language for writing
down the semantics. Then I got off on a tangent to describe a dataflow
language. Essentially what is necessary is to represent each
instruction as a transition function, and perform automatic lifting
such that sequences of instructions can be translated to composed
transition functions.

Once a sequence of machine instructions has a _functional_
representation (function + dependencies) it is straightforward to make
test cases and relate this concrete low-level operational semantics to
any higher level operational semantics.

***

The real problem: Staapl is not a language. It is a notation for a
macro assembler, and in its current form it is not well specified.
I've seen this problem before[1]: thinking of the evolution of Staapl
on multiple targets as the evolution of a standard interface. I'm not
so sure this is actually workable. The problems I face in simply
attempting to move to a more complex architecture (PIC24) make me
wonder if it's not better to stick to the low level, or at least have
_multiple_ interfaces in between that can be reused, given
constraints. I.e. following Haskell typeclasses and OCaml modules, it
makes sense to reuse number systems (type classes). It might be more
useful to push Staapl in such a direction: organically abstract more
functionality.
[1] entry://../staapl-blog/20090716-132153

Entry: Assembler expression language
Date: Fri Mar 5 08:32:16 CET 2010

The reason this is so difficult is because it mixes a lot of concepts:

  - binding: all expressions are valid in an environment of labels,
    i.e. it is a reader monad.

  - the expressions themselves should be compilable to external
    assembler expression trees, or embedded in Scheme.

  - the syntax of the expression language can be concatenative or
    nested expressions. Maybe it's best to stick to the latter,
    because they have "value" semantics, not stack semantics.

  - this can be unified by allowing a lifting procedure that lifts
    relevant Scheme procedures to procedures over an extended abstract
    domain. I.e. + can mean scat/+ or scat-abstract/+ which delegates
    to scat/+ in case of literals.

So essentially:

  A  an assembly object is a collection of _undefined_ labels (an
     environment) and a collection of expressions in terms of those
     labels.

  M  machine code is a collection of _defined_ labels containing
     binary code objects.

To be able to use internal and external assemblers, the function
A -> M needs to be abstracted.

Entry: Assembler refactoring: practical
Date: Sat Mar 6 08:54:35 CET 2010

Get rid of the `source' component in target-value, and use abstract
interpretation instead. Try once to see the dependencies, then
reconsider if it doesn't work.

Reconsider: I do not want to lose the names of constants. Currently
these go through:

  (define-syntax-rule (constants (name value) ...)
    (begin
      (define name (target-value-delay value 'name)) ...
      (compositions (macro) macro: (name ',name) ...)))

Which binds the name both in the Scheme namespace to a target-value
and in the macro/ namespace as a constant function that refers to the
Scheme-bound name.

It looks like there is no way around it: the current implementation
needs to maintain an explicit source rep as we need to export symbols.
Maybe the functional rep can then be discarded?

I forgot: how do the labels fit into the picture? The key routine is
the following: target-value objects will trigger their interpreter,
while target-word structs return an address if it is defined, or abort
otherwise.

  ;; Undefined words will abort. This is internal: used only to
  ;; recursively evaluate target-value references.
  (define (target-value-eval expr)
    (cond
     ((target-value? expr) ((target-value-thunk expr)))
     ((target-word? expr) (or (target-word-address expr)
                              (target-value-abort)))
     (else expr)))

To make this easier to understand, I want to change names:

  target-value -> target-asmexpr
  target-word  -> target-node

Maybe the latter is not necessary, but the former seems to be: it's
essential to capture the idea that these are expressions - not fully
evaluated. No - it's too interwoven, also the doc. I don't think it's
that complicated once you look from the perspective of the assembly
result, not the specification:

  target-word  = a machine address
  target-value = a literal operand (can be an address or a numeric
                 constant)

Because the target-word objects are not yet instantiated before
assembly relaxation is performed, the target-value objects need to be
thunks, parameterized by the value of the target-word objects. I think
it really needs to stay the same.

Maybe the only thing to do is to abstract the composition mechanism
and representation of the target-value code components. I.e. turn them
into s-expressions instead of concatenative words, as they represent
values, not stacks or stack ops.
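The mechanism in miniature (a sketch: the mutable box is my stand-in
for a target-word, and the error stands in for target-value-abort): a
target value is a delayed expression over label addresses, and
evaluating it only succeeds once relaxation has assigned them.

  ;; A "label": mutable cell, #f until the assembler assigns it.
  (define label-loop (box #f))

  ;; A "target value": an expression over labels, delayed as a thunk.
  (define jump-offset
    (lambda ()
      (let ((a (unbox label-loop)))
        (if a
            (- a #x100)   ; expression in terms of the address
            (error "abort: need another relaxation pass")))))

  ;; (jump-offset)              ;; first pass: aborts
  (set-box! label-loop #x104)   ;; relaxation assigns the address
  (jump-offset)                 ;; => 4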
So.. is this a detail or not?

PRO: it's instructive to be able to see symbolic asm code in RPN form
like this:

  box> (code> 123 TOSL +)
  [qw (123 TOSL +)]

CON:

  - essentially, these chunks represent values, not stacks or stack
    ops
  - an extra translation from RPN -> expression form is necessary

What about this: keep the code form to serve as human-readable only,
then modify the function form such that the functions come out of an
asmexpr namespace. This then allows one to encode the assembler syntax
into the operations. I.e. change 'tv:' to something that is
parameterized by the asmexpr compiler. So I'm at this now:

  ;; The target-value compiler uses scat: to construct assembler
  ;; expressions. (FIXME: this should later be full parameterization
  ;; of which assembler to use).
  (define-syntax-rule (tv: . code)
    (make-target-value-compiler scat: . code))

Is that enough? No, it needs to be parameterized differently.

Is the whole target-value / target-word business necessary when using
an external asm? In some sense yes, as it's necessary to read back the
binary data. (Maybe that alone can be quite a problem?) I.e. the
binary code storage is tightly coupled to the target-word struct,
which matches it up with assembler opcodes. Really, this can't be
handled by external tools.

My conclusion is this:

  - Staapl is a macro assembler. The 'assembler' (symbolic -> binary
    translation and relaxation) part is a deeply integrated component
    of the system. I.e. the Staapl assembler is more powerful than
    external monolithic tools.

  - To support external tools, use tool behaviour snarfing.

Todo: write an assembler snarfer (in Haskell?).

Entry: Recovering static structure of Scheme programs
Date: Sat Mar 6 09:20:38 CET 2010

Aside from the name bindings, there is of course little static
structure in Staapl. I'm having trouble re-figuring out the connection
between target values and target words. Hint: follow name bindings,
and possibly change them temporarily to find out dependencies quickly.

Entry: The PIC24 assembler
Date: Sat Mar 6 11:09:12 CET 2010

What assembler can be used to start from? The Microchip tools can run
under wine. Maybe it's best to use those? Alternatively, use the
dspic30 toolchain [1][2]. Maybe it's best to stick with the windows
releases. Other toolchains will probably have similar windows-only
components.

Entry: MPLAB and wine
Date: Sat Mar 6 11:45:58 CET 2010

The MPLAB version I have apparently doesn't run on 64 bit linux. The
error message:

  winevdm: unable to exec 'D:\tom\.wine\drive_c\MPLAB\MPASM.EXE':
  DOS memory range unavailable

Goes away after this:

  sudo sysctl -w vm.mmap_min_addr=0

Now the message (on 64bit) is:

  wine: Cannot start DOS application
  "D:\\tom\\.wine\\drive_c\\MPLAB\\MPASM.EXE"
  because vm86 mode is not supported on this platform.

According to [3]:

> This happens because your CPU is still in 64-bit mode. Any Intel/AMD
> processor cannot use vm86 once the cpu is in 64-bit mode. The only
> way to run 16-bit applications on a 64-bit OS is emulation software
> such as DOSBox.

Ok, full emulation then.. Hmm.. I also tried the pic30 chain at [4]. I
did find some compiled debs that work on 32bit[5] but the compilation
of [4] requires deps I don't have. Too much hassle.
[1] http://www.baycom.org/~tom/dspic/
[2] http://iridia.ulb.ac.be/~e-puck/wiki/tiki-index.php?page=Cross+compiling+for+dsPic
[3] http://osdir.com/ml/wine-users/2009-07/msg01058.html
[4] http://sourceforge.net/apps/mediawiki/piklab/index.php?title=Compilation_of_pic30_version_3.01
[5] http://iridia.ulb.ac.be/~e-puck/wiki/tiki-index.php?page=Cross+compiling+for+dsPic

Entry: dsPIC -> ARM
Date: Sat Mar 6 12:53:18 CET 2010

I think it's probably better to forget about dsPIC for now, and work
only on ARM (thumb) code generation. There are essentially two
problems:

  - move from flat -> nested syntax for argument modifiers (i.e.
    addressing mode)

Oops. I ran out of steam. Most of these developments seem to be dead
ends or require a huge amount of effort I can't spend atm.

Entry: GNU Binutils
Date: Sat Mar 6 14:57:23 CET 2010

It looks like the real problem is that I'm underestimating the
arbitrariness of assembler syntax. This arbitrariness needs to be
encoded somewhere.. Maybe it would be instructive to see how binutils
is implemented? Some intro here[1]:

  opcodes/ contains the opcodes library. This has information on how
  to assemble and disassemble instructions.

  cpu/ contains source files for a utility called CGEN. This is a tool
  that can be used to automatically generate target-specific source
  files for the opcodes library, as well as for the SIM simulator used
  by GDB.

I'm looking at the Microchip binutils extension from
mplabalc30v3_01_A.tar.gz, in acme/opcodes/pic30-opc.c

Much of the necessary information is in

  const struct pic30_opcode pic30_opcodes

Looks like that can be snarfed just fine.

[1] http://www.linuxforu.com/teach-me/binutils-porting-guide-to-a-new-target-architecture/

Entry: Graham-Glanville method
Date: Sat Mar 6 18:59:51 CET 2010

(See Muchnick: Chapter 6) The basic idea is to relate trees and
instructions in a grammar, and perform bottom-up parsing with certain
disambiguation rules. Then parsing emits instructions while reducing
the tree.

Entry: PIC30 tools
Date: Sun Mar 7 14:53:50 CET 2010

  tar xf pic30-deb-templates-3.01.tar.bz2

  cd pic30-3.01/pic30-binutils-3.01/upstream ; wget http://ww1.microchip.com/downloads/en/DeviceDoc/mplabalc30v3_01_A.tar.gz
  cd pic30-3.01/pic30-binutils-3.01 ; dpkg-buildpackage -b
  cd pic30-3.01 ; dpkg -i pic30-binutils*.deb

  cd pic30-3.01/pic30-gcc-3.01/upstream/ ; wget http://ww1.microchip.com/downloads/en/DeviceDoc/mplabc30v3_01_A.tgz
  cd pic30-3.01/pic30-gcc-3.01/ ; dpkg-buildpackage -b

The support files come from the MPLAB distribution: see [1].

  http://ww1.microchip.com/downloads/en/DeviceDoc/MPLAB_C30_v2_05-Full.exe

Under windows, install with serial: MTI030340303. Copy the directories
/include, /lib, /support, and /bin/c30_device.info to
pic30-3.01/pic30-support-3.01/upstream/ :

  tom@wurzon /Program Files/Microchip/MPLAB C30 $ tar zcf pic30-support-3.01.tgz include lib support bin/c30_device.info

Now copy to linux and untar in pic30-support-3.01/upstream
(mv bin/c30_device.info .)

debian note:
  - gcc-3.3 and sysutils are in etch
  - dos2unix is now fromdos (apt-get install tofrodos)

[1] http://www.opencircuits.com/DsPIC30F_5011_Development_Board
[2] http://sourceforge.net/apps/mediawiki/piklab/index.php?title=Compilation_of_pic30_version_3.01

Entry: Hands-on
Date: Sun Mar 7 15:24:53 CET 2010

Let's forget about the dsPIC assembler until after using the
binutils/gcc toolchain to get something to work. I'd like to re-focus
on actually using Staapl to build things instead of perpetually
redesigning the core.
The problem I got stuck on last time was building a network of PIC devices. Let's make this priority one.

Last time the problem was that I don't have an I2C network yet. This is the ultimate goal. In order to get there, I need to build a hub app. The hub app relays serial PC communication to anything else. The hub needs to run at 40 MHz to be able to get a decent data rate.

It's been a while. In the summer I got side-tracked by the dataflow language ideas. This is where I got last time:

  entry://20090711-151122  better to use a single code image (homogeneous network)
  entry://20090627-144638  serial daisy-chaining works
  entry://20090625-131214  standard ``zwizwa connector''
  entry://20090606-125114  ...

The circuit that's on my desk is a 452@40 connected to a 2620@54 (using a 13.5 MHz XTAL). Daisy chaining worked by connecting the TX of the 452 to the RX of the 2620. In Staapl this worked like this:

  tom@zni:~/staapl/app$ make ttlmono-2620-54.dict
  tom@zni:~/staapl/app$ mzscheme ttlmono-2620-54.dict

That didn't work.

  tom@zni:~/staapl/app$ make 452-40.live

That gave a programming error. After disconnecting the 2620 it did program successfully. Looping the RX/TX chain on the slave connector also worked.

  scan
  Found 1 target(s). OK

If I recall, this has something to do with the reset of the 2nd PIC. Or the baud rate? Hmm.. and then out of the blue it works.

  scan
  Found 2 target(s). OK

Using `target!' it's possible to switch targets. This does get messed up easily. I have no idea how to reset it properly. This seems to help:

  pk2cmd -W1 -R -PPIC18F452

Good. Cleaned up the target addressing so it is a bit more robust:

- target-count now sends a nop token to 255 without checking
- target-receive+id/b checks to see if a message made a round-trip without being answered
- target! performs target-count to make sure the id is valid before setting it.

Entry: Dependency analysis
Date: Tue Mar 9 11:22:10 CET 2010

Modeling a CPU. Let's stick to the 2-phase model:

- static: dependencies (connectivity: input influence & output spill)
- dynamic: boolean functions

The idea is to express composition:

- static: this is type inference: a sequence of instructions has certain static information associated with it that can be composed at compile time as concrete info (i.e. register usage).
- dynamic info can be functionally modeled.

It might be better to do this in a typed language then?

Entry: Composing partial state maps
Date: Sat Mar 13 09:04:52 CET 2010

I need to get this going.. Let's look at the assembler typing/sim stuff, either in staged Scheme or in Haskell. The first point is the datasheet [1]. Let's use the 12F675 since the instruction set is simpler than the 18F series.

What are the high-level problems?

* Functional dependencies. This is easy:

    ADDLW k
    input constraint: 0 <= k <= 255
    function:         (W) <- (W) + k
    status function:  C, DC, Z

  This needs a low-level logic language to express the semantics. I.e. at the bottom this should be logic gates, but certain compositions should be accelerated by "simulator macros". The status functions need to be made explicit, so the function type is k -> W -> (W,C,DC,Z). (A concrete sketch follows below.)

* Composition / dependency analysis. The interesting problem is how to take descriptions that map partial state to partial state, and lift them to complete state maps so they can be composed and then possibly re-embedded in a minimal representation for dependency or "clobber" analysis.

What are the low-level problems?

1. The basic level is logic gates: the bottom line of semantics should be as simple as possible.

2. Solve composition of logic gates by dependency analysis.

3. Solve the mapping (simulation) problem, i.e. implement certain networks by operations present in an implemented target, which could be a high-level language, a real machine, an FPGA circuit (build accelerated compiler verifiers on FPGA!), ...
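To make the ADDLW example concrete, here is a minimal Racket sketch (all names hypothetical, nothing from the Staapl tree): an instruction is a state transformer over a partial state, with its read/write sets carried alongside as static information, so that sequencing composes both the dynamic and the static part.

  #lang racket

  ;; A partial state maps register/flag names to values.
  (define (state-ref st reg)     (cdr (assq reg st)))
  (define (state-set st reg val) (cons (cons reg val) st))

  ;; An instruction: static read/write sets + dynamic state map.
  (struct ins (reads writes fun))

  ;; ADDLW k : reads W, writes W and the status flags (DC elided here).
  (define (addlw k)
    (ins '(W) '(W C DC Z)
         (lambda (st)
           (let* ((sum (+ (state-ref st 'W) k))
                  (w   (bitwise-and sum #xFF))
                  (st  (state-set st 'W w))
                  (st  (state-set st 'C (if (> sum #xFF) 1 0))))
             (state-set st 'Z (if (zero? w) 1 0))))))

  ;; Sequencing: compose the dynamic maps; lift the static info, where
  ;; reads of b that are produced by a become internal nodes.
  (define (seq a b)
    (ins (remove-duplicates
          (append (ins-reads a)
                  (filter (lambda (r) (not (memq r (ins-writes a))))
                          (ins-reads b))))
         (remove-duplicates (append (ins-writes a) (ins-writes b)))
         (compose (ins-fun b) (ins-fun a))))

  ;; ((ins-fun (seq (addlw 1) (addlw #xFF))) '((W . 0)))
  ;;   => state with W = 0, Z = 1, C = 1
  ;; (ins-reads (seq (addlw 1) (addlw 2)))  => (W)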
What is the essential idea? I'm dealing with information on at least 2 levels: network connectivity (meta-lang) and network function (object-lang). I.e. staging is essential: there are going to be computations on the meta-lang (figuring out dependencies, lifting dependencies to provide composition). How this is implemented doesn't really matter much (Scheme, Haskell, MetaOCaml).

* Scheme: highest flexibility (hackability due to more operational nature: just do it) and very simple staging (macros & modules).
* MetaOCaml: typed staging, might be useful for figuring out the static structure of the program itself better.
* Haskell: very flexible type system, but type-level computations are somewhat complex.

Functions and resources.

1. FUNCTIONS: object language: functional dependencies (AND, OR, NOT, compositions) between nodes.
2. RESOURCES: meta language / type system: _physical_ instantiations of functional networks connecting shared nodes.

Tagless interpreters [2] and embedding of staging in non-staged languages [3] are going to be essential components to provide insights.

Hardware languages are about instantiation of macros. Is it possible to use the ideas behind Ziggurat [4] to provide higher-level semantics of compositions? I.e. it's simple to pool together a huge number of gates into an abstraction, and simulate it by simulating the instance directly. What is interesting though is to abstract it on the meta-level (semantics!), not just the human-modular level. What I mean is: composition of modules in hardware description languages is about abstraction for the engineer: the engineer has a (fuzzy) model of how something works and can use this to "prove" correctness of compositions. Can this be made more formal? Can we use the specification (simplification) as a type of a hardware module? I.e. how to relate low-level properties/semantics to high-level ones [3].

The real problem seems to be management of state (resource). I.e. an I2C interface isn't a function, it's a state machine. Anyways.. There's something to learn here.

[1] http://ww1.microchip.com/downloads/en/DeviceDoc/80125H.pdf
[2] http://okmij.org/ftp/tagless-final/APLAS.pdf
[3] http://okmij.org/ftp/Computation/staging/metafx.pdf
[4] http://lambda-the-ultimate.org/node/3179

Entry: Haskell vs. Scheme
Date: Sun Mar 14 21:32:58 CET 2010

Idea: pattern matching is not composable in Haskell/OCaml: patterns are syntactic elements that cannot be abstracted over. In Scheme this is easy.

  (patterns-class (macro)
    ;;----------------------------------------
    (word        opcode)
    ;;----------------------------------------
    ((1+          incf)
     (1-          decf)
     (rot<>c      rrcf)
     (rot<<       rlncf)
     (rot>>       rrncf)
     (swap-nibble swapf))
    ;;----------------------------------------
    (([movf f 0 0] word) ([opcode f 0 0]))
    ((word)              ([opcode WREG 0 0])))

Here the identifiers `1+', `1-', ... are defined by instantiating the two rules at the bottom, filling in `word' and `opcode' respectively.

Entry: Functional representation of stages
Date: Sun Mar 14 23:00:40 CET 2010

I've identified the language levels [1] in Staapl. I was wondering how to represent these as higher-order functions, and whether that is useful. I see 2 ways to do this:
A. Directly relating functions in the 3 levels:

  > type Mem a   = [a]                 -- Machine state, parameterized by representation
  > type Asm a   = (Mem a) -> (Mem a)  -- Machine code, represents state transitions
  > type Macro a = (Asm a) -> (Asm a)  -- Macro language = machine code transformers

The problem is that this does not include pattern matching rules (intensional analysis). Code is an opaque type. Pattern matching seems to be an essential component to encode the actual work.

B. So, what about including the interpretation steps, i.e. have a mix of code and data intermediate? See the figure in [1], which looks like

   data           int
     |      stx ====> fun
     |       :
     | comp  :
     V       :
   data'     V
            stx'

where the morphism called "interpretation" is included explicitly.

  > -- Machine state, parameterized by representation
  > type Mem a = [a]
  >
  > -- Semantics of AsmStx and MacroStx
  > type AsmFun a   = (Mem a)    -> (Mem a)
  > type MacroFun a = (AsmStx a) -> (AsmStx a)
  >
  > -- Machine and Forth syntax: concrete (structured) data.
  > data MacroStx a = ...
  > data AsmStx a = ...
  >
  > iAsm   :: AsmStx a   -> AsmFun a
  > iMacro :: MacroStx a -> MacroFun a

So is interpretation a practical issue (in Haskell, OCaml, Scheme, ... you need the syntax representation to be able to manipulate it) or is it of deeper significance? Ultimately this should be related to mathematical logic, where formal statements and formal rewrite rules are manipulated on the meta-level.

[1] entry://../staapl-blog/20100314-192109

Entry: About compilation
Date: Sun Mar 14 23:07:46 CET 2010

It really is ultimately about proving that a walk towards an optimum given by one measure doesn't change another measure. I have this neat little bag of symbols here that after a complex computation (interpretation) gives a result. What I want to do is to replace that neat little bag with an even neater little bag (according to some measure) such that the result after interpretation isn't influenced. This is a constrained optimization problem.

  CONSTRAINT:   semantics preserving
  OPTIMIZATION: minimize some other property.

Suppose my neat packet is X, suppose my correct semantics is expressed by the equation S(X) = 0, and suppose my neatness (i.e. code size, execution speed, power consumption) maps X into an ordered space, P : X -> (O, <), such that we can compare X1 and X2 based on the order introduced by the property P.

The deal is: in a transformation machine that preserves the semantics S(X), a lot of the internal structure of S is ``ghosted''. I think the place to look is the peephole optimizer of [1].

[1] http://compcert.inria.fr/doc/index.html

Entry: Compilation, Interpretation and Staging
Date: Tue Mar 16 15:13:43 CET 2010

I'm trying to build an intuition for the following diagram

   data           int
     |      stx ====> fun
     |       :
     | comp  :
     V       :
   data'     V
            stx'

which represents the types:

  int  :: stx -> (data -> data')
  comp :: stx -> stx'

An interpreter maps syntax to function, while a compiler maps syntax to syntax. For a state machine representation, data = data'. The difference between staging and compilation is then quite clear:

- staging uses the range (data') of a target function domain as the input of a following interpretation step
- multi-pass compilation is straightforward function composition

The big idea is that compilation is just computation (stx -> stx'). However, the connotation of compilation is usually that the semantics of the syntax is preserved in some way. I.e. every intermediate syntax will be related to some semantics (function domain) in such a way that those function domains can be related.
            int
     stx ====> fun
      :         ^
  comp:         ;
      :         ; proj
      V   int'  ;
     stx' ====> fun'

I.e. for a definitional interpreter int, we know that comp is correct if

  int = proj . int' . comp

where int' is the target interpreter and proj maps the target semantics (i.e. machine simulator) into the original semantics.

Question: what happens if proj points in the other way?

Entry: Compiler Testing
Date: Wed Mar 17 11:24:37 CET 2010

What is needed is a test relative to a reference implementation, and a sufficient argument that the test coverage is broad enough. This requires:

- a _simple_ reference implementation (VM + compiler)
- simulators for target architectures + a test suite.

The idea is that it is easy to write these components (errors can't hide), but it is error-prone to write an optimizing compiler.

Entry: Test Coverage
Date: Wed Mar 17 11:42:39 CET 2010

As an alternative to simulation, it might be possible to create a test suite that runs on the target. Essentially, the only reason to write a simulator is to ``augment'' it with behaviour that makes inspection easier.

This leads to the question of coverage. Can the compiler be instrumented with a log that records the rewrite rules that are being applied?

Entry: Multi-pass : applicative functor?
Date: Wed Mar 17 11:59:44 CET 2010

Staapl has 2 passes, one that applies a list of functions to a list of assembly instructions, and one that applies the same function to the whole list. What about making these operations explicit?

Entry: Proving rules
Date: Wed Mar 17 12:03:01 CET 2010

Is it really so that a correctness proof is too difficult? Some of the rules are really quite trivial, i.e.:

  (((movlw a) (exit) pseudo) ((retlw a)))

The good thing about proof is that all rules can be treated individually. For testing, I'm not so sure about that.

  (([qw a ] [qw b] -)   ([qw (tv: a b -)]))
  (([addlw a] [qw b] -) ([addlw (tv: a b -)]))
  (([qw a] -)           ([addlw (tv: a -1 *)]))

Entry: Adding static semantics to macros
Date: Sat Mar 20 13:46:29 CET 2010

I'm trying to get an idea of Ziggurat [1] from the JFP paper (no public link). Before trying to explain that approach, I'm going to walk around in ignorance for a bit, and see what I can write down.

A macro is a compiler:

  code -> code'

A typed macro is a compiler together with a (static) interpreter:

  code -> fun,  where fun = data -> data'

The idea is that before (or interleaved with) compilation, some interpretation is performed on the code to see whether it has certain properties, without executing it completely. I.e. one performs an abstract interpretation.

( Note that the basic unit of interpretation doesn't need to be the same for code and data. I.e. type checking/inference can span multiple functions in a module, while functions are typically isolated in behaviour. )

Summary: the main idea is that there is _both_ compilation and interpretation going on.

Ok, now the paper.

* The basic idea is delegation of behaviour. In terms of syntax objects this is semantics: how to interpret. A syntax object can provide its own meaning, or delegate to the syntax it expands into.

* The "lazy" part is there to break a possible circularity. Delegation depends on expansion (macro use time) and is not knowable at macro definition time.

[1] http://lambda-the-ultimate.org/node/3179

Entry: Lambda vs. Patterns
Date: Mon Apr 12 10:10:29 EDT 2010

Dominikus' patterns: variables are "freed" after patterns are executed. What makes this different from the LC?
I.e. variables are still used to provide random access to values inside data structures (to encode permutation combinators), but there is no concept of closure or environment.

Apples and oranges. But what is the link? How to make the correspondence: application/abstraction vs. quotation/dequotation? Confusing stuff without a proper substrate.

Entry: removed log-stx in asm-template-tx
Date: Sun May 16 15:26:56 CEST 2010

I don't remember how the template logging works, but I've commented out the reference to log-stx in the asm-template-tx macro in coma/pattern-tx.ss

Entry: Problems with Staapl
Date: Tue May 18 15:01:57 CEST 2010

I'm using it for something practical after 6 months of leaving it alone. What is annoying?

- PIC chip config is brutal (binary only).
- Basic PIC18 library configuration too: there is no mechanism for defaults + there is a bunch of "include" files that are badly organized.
- There is no standard approach for debugging up to the point of getting the serial console working. Make a checklist or something (i.e. measuring "#0xF0 transmit" with a frequency counter).
- There is no automated procedure for porting Microchip include files to .f

Entry: 4550 fatman
Date: Tue May 18 15:29:43 CEST 2010

Got it working after soldering on a 100n decoupling cap and switching to 19200 baud. Now it works up to 230400 baud.

Entry: The Staapl Killer App: static data structures -> code
Date: Thu Aug 19 23:12:41 CEST 2010

Reactive framework.

1. Compile to RAM structure
2. Compile to Flash + RAM structure
3. Combine static data structure + its "interpreter" into in-line code.

Number 3 might be interesting to work out for an I/LE reactive network. It has a fair amount of links and a relatively small run-time state. The data structure links can be replaced with code branches, and the run-time state can be used to make those branches conditional.

Entry: Targets
Date: Tue Aug 24 21:55:11 CEST 2010

There's a bit too much freedom in having so many targets to pick from. One thing I wonder about is whether the PIC was actually a good choice. It is really different from standard RISC. ARM cores are getting really cheap. Programming them in C isn't a big deal, as I've recently learned; it feels like "normal" programming. So why am I doing the whole tool stack myself? Maybe I should keep these questions away from the project and worry about them in the libprim/meta projects.

* Staapl is about the PIC18. I don't really have time to make it work on different architectures (tools + libraries). Maybe one day LLVM, but that's it.

* Staapl is about Scheme and Forth. To make Staapl more interesting community-wise, it might be a good idea to get the standard Forth interpreter going: automatic "unrolling" bootstrapping.

About the dsPIC: this is an interesting architecture from an application side, but probably better for the other metaprogramming project. In any case, I should not attempt to do anything before writing an app in its machine language.. Then the road will become clear pretty fast.

Entry: Staapl heritage: colorForth and Machine Forth
Date: Sat Sep 4 19:54:46 CEST 2010

It might be interesting to track down the Staapl heritage. Chuck Moore's and Jeff Fox's web sites are probably key. Also Brad Rodriguez's Moving Forth series [2].

[1] http://www.complang.tuwien.ac.at/anton/euroforth/ef99/thomas99a.pdf
[2] http://www.bradrodriguez.com/papers/

Entry: Why does Forth lead to small code size?
Date: Sat Sep 4 20:30:24 CEST 2010

1. You're forced to _factor_. This exposes reusable code and "exponential leverage".
2. You're forced to _order_ variable accesses. This reduces addressing overhead (if you can avoid stack shuffling).

Both are non-trivial at first, but learnable, and soon become second nature.

Entry: I need a fun project
Date: Wed Sep 8 01:52:10 CEST 2010

Working, working, serious stuff.. I need some play. What about a bootstrappable (standard?) stand-alone Forth? Last time I worked on this I got stuck at some cross-stage binding issues, which are solvable using let-syntax [1].

[1] entry://20090722-123240

Entry: The good thing about microcontrollers ..
Date: Sun Sep 19 20:29:28 CEST 2010

.. is that RAM is pristine. Apart from having to deal with hardware interfaces, there are very few limitations on how to use memory. ( Forget for a moment the possibility to reserve _part_ of the RAM and Flash for code with standard calling conventions and a non-moving flat memory layout. ) The point is that building a more useful graph memory on top of the RAM/Flash combo is doable when you can design the whole system.

Entry: Things to fix
Date: Sun Oct 3 08:38:46 CEST 2010

* Modules work fine, but what about parser macros? Currently the way these are made modular is a bit of a hack. Can this be done differently, or is it not a real problem? The problem is that units and macros don't mix well. It is possible to define macros in terms of unit identifiers, but then these are treated somewhat specially.

Entry: One chip, one model for verification
Date: Tue Oct 5 21:15:52 CEST 2010

An advantage of using only one target chip (one that's simple!) is to be able to write a verifiable semantics more easily.

Entry: Continuous evaluation
Date: Wed Oct 6 19:46:20 CEST 2010

Tired, but different frame of mind..

- Staapl Forth is low-level but easy to build on: each application is a DSL.
- For verification, some formal semantics is necessary, even if only for black-box testing.

Entry: Picking up again
Date: Mon Nov 15 11:52:19 EST 2010

I'd like to pick it up again, probably restructuring and documenting the code. The goal is to work towards a USB driver for the PIC18, with a middle point that implements a proper debugging interface and emacs bridge. Meaning: the compiler seems to be working fine, now make the interaction system a bit less messy.

First, what's with this module business? A "staapl pic18/serial" line is equivalent to "(require (planet zwizwa/staapl/pic18/serial))". This defines a couple of macros like "macro/async.>tx" and target words like "target/async.>tx". The macros are code generation/transformation functions, while the target words are compiled target code. One of the most important features of modules is to make sure all names are bound at compile time.

I.e. I just converted an old usb.f file to __usb.ss, including the top line:

  #lang planet zwizwa/staapl/pic18 \ -*- forth -*-

This then allows compilation with "mzc __usb.ss", giving an error in my case:

  mzc ~/staapl/staapl/pic18/__usb.ss
  /home/tom/staapl/staapl/pic18/__usb.ss:23:6:
  compile: unbound identifier in module in: macro/UIE

Entry: More flexible name binding
Date: Mon Nov 15 12:21:14 EST 2010

What is getting clearer now is that I need both modules and units as abstraction mechanisms. Example: machine constant names are an interface. They are provided _globally_, so they need to be parameterized somehow. Modules won't work there.

This is important. It needs some serious thought. In deeply embedded software projects, compile-time parameterization is important (i.e. see eCos). This means code has holes.
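For reference, a minimal sketch of what such a hole looks like in Racket's unit system (signature and unit names invented for illustration): the chip-specific constants form a signature, library code imports it, and a chip-specific unit fills the hole at link time.

  #lang racket
  (require racket/unit)

  ;; The hole: an interface of machine constants.
  (define-signature chip^ (fosc baud))

  ;; Library code written against the hole.
  (define-unit serial@
    (import chip^)
    (export)
    ;; Some derived configuration value (formula illustrative only),
    ;; computed once the hole is filled.
    (printf "brg divisor = ~a\n" (quotient fosc (* 16 baud))))

  ;; One concrete chip plugs the hole.
  (define-unit p18f2620@
    (import)
    (export chip^)
    (define fosc 40000000)
    (define baud 230400))

  (define-compound-unit/infer app@
    (import) (export) (link p18f2620@ serial@))

  (invoke-unit app@) ;; => brg divisor = 10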
So, looks like I found a project to constructively procrastinate on the USB driver: work out the module system.

Entry: The module system
Date: Mon Nov 15 18:51:57 EST 2010

- The current "word set" approach is good. Build on that.
- Prefix parsing macros are bad because they hurt composition and interfere with the Racket unit system. Can they be removed or somehow be made harmless? I.e. turn them into surface syntax only?
- Can we attach types to the unit interfaces?

Entry: Partial evaluation
Date: Mon Nov 15 18:55:07 EST 2010

Is the current greedy approach actually smart, or should reduction be defined in a different way? Find a good explanation of the two alternatives.

The good part is that partial evaluation is expressed as simple and straightforward evaluation on "code stacks". This is good, as long as the language comprising the code stacks is simple. Currently it also includes machine asm, so it's screaming for a semantics that can be used to verify the partial evaluation rules.

Entry: Linker scripts: eliminating `load'
Date: Tue Nov 16 08:13:21 EST 2010

That's really the basic idea. You write parameterized modules in terms of `import'. The boring part is going to be finding a way to formulate this. I.e. how to solve the unit/module grouping. Do we allow more than one unit in a module?

Entry: Control flow graph
Date: Tue Nov 16 08:24:22 EST 2010

The current compilation state is a bit ad-hoc. Can we get something more elegant? I.e. like [1].

[1] http://www.cs.tufts.edu/~nr/pubs/zipcfg-abstract.html

Entry: Summary of possible directions
Date: Tue Nov 16 08:27:28 EST 2010

In order of importance:

1. Fix the module system: proper units and eliminate "load". The USB driver can be the pull for this.
2. Keep the "eager macros" partial evaluation strategy, but augment it with a semantics of the low-level machine language used.
3. Build the compiler on top of a more abstract control flow graph. Currently the way compilation state is maintained feels a bit raw..
4. Build a theory for the I :: m -> (t -> t) towering in Haskell.

Entry: The I :: m -> (t -> t) towering
Date: Tue Nov 16 09:59:24 EST 2010

The main idea is that the structure of m ``roughly'' corresponds to the structure of t. Can we call it structural towering? I.e. if m is a concatenation of elements and so is t, then multiple layers of towering ``act as one''. I.e. it is then simple to add processing steps like optimization.

* In Staapl the 't' isn't really target code: it contains pseudo code, a control flow graph and a "macro return stack" hack used to implement local exit in macros. More specifically, the interpretation is I :: m -> (t' -> t') where t' is an extension of t. Then compilable macros are those that eventually project down to (t -> t) without losing structure.

* Generalized: if a simple arrangement of m can be translated to a particular function composition structure, we're still in business.

Entry: Little annoyances
Date: Fri Nov 19 08:13:21 EST 2010

- can't hard-reset chip on console

Entry: Units
Date: Fri Nov 19 08:16:04 EST 2010

Starting with something tangible: ttlmono-2620-54.fm. It has the following load statements that are to be eliminated:

  load p18f2620.f        \ chip macros
  load monitor-serial.f  \ boot block + serial monitor code

The place to start is probably to model monitor-serial.f as a parameterized unit. That file consists of:

  load monitor-serial-core.f
  load monitor-serial-warm.f

The first one doesn't have any unbound names, so it can be replaced by a module directly.
Nope, it needs:

- init-chip
- fosc
- init-serial
- baud

What does this mean? The monitor APPLICATION needs chip-specific code for serial and whole-chip init. That sounds reasonable.

Next: turn monitor-serial-core.ss into a unit. This needs components:

- define new unit signatures
- import / export
- link

So it introduces some red tape. Let's stick to s-expr syntax for the signatures and link files. Add these units:

  pic18-osc-sig.ss
  pic18-serial-sig.ss

Let's just keep these inside pic18/sig.ss for now:

  (define-macro-set pic18-chip^   (fosc init-chip))
  (define-macro-set pic18-serial^ (baud init-serial))

And let's just stick with one interface:

  ;; Chip-specific code.
  (define-macro-set pic18-chip^
    (fosc        ;; oscillator Hz
     init-chip   ;; chip-specific init
     baud        ;; hard-coded monitor baud rate
     init-serial ;; chip-specific serial port init
     ))

Next problem: the .ss files use

  #lang scheme/unit

How to expose these to Staapl? I've added pic18-unit/lang.ss based on scheme/unit instead of scheme/base, but I run into trouble that indicates I don't know what I'm doing.

To investigate: how does the lang/reader.ss mechanism work again? It expands to a "module" form, but the scheme/unit/lang/reader.ss might do something else.

Entry: The Forth -> module parsing is too complicated
Date: Sat Nov 20 08:21:12 EST 2010

Reason: the `expand' re-structuring after a require statement. Maybe it's simpler to expand from Forth (concrete) syntax straight to Scheme module syntax instead of going through forth-begin. Or: is the split between lexing and parsing really necessary? It even complicates standardizing Forth. I think it's important to keep in mind that the basic form should be an s-expression that integrates well with the rest of Racket.

What about this:

* Write a non-extensible flat Forth lexer that compiles to (module ...) form.
* Is it possible to do this in a way that is extensible? I.e. can we have a read-time Forth running?
* Can the same code be used on-target?

So the point really seems to be to write a stand-alone Forth and then stub it out to plug in the compiler/macro part. Can we start out with something like eForth and move on from there?

Entry: Really unrolling Forth
Date: Sat Nov 20 08:56:52 EST 2010

Come on, it can't be that difficult. The point:

- Internally, the Forth is unrolled and defined in terms of macros with phase separation.
- A reflective front-end should generate such a structure.

You really need a reflective Forth to implement this! There is no way around it. It probably needs to be meta-circular too. Then bootstrap it and optimize for a particular target. The problem is the meta-circular interpreter: a semantics for Forth :)

Entry: Just get rid of the damn syntax.
Date: Sat Nov 20 09:39:57 EST 2010

So, this makes me wonder. Why am I so attached to this Forth syntax? Maybe the project should be finished (implement units properly for the low-level library) without touching any Forth syntax? It's probably much more useful to have a proper s-expression based syntax first.

Entry: Summary
Date: Sat Nov 20 16:13:05 EST 2010

- Fix units -> define a proper s-expression based module format for macros and target code. Then units are "automatically" fixed on top of the scheme/unit language. This involves refactoring the Forth parser. Soothing words: the current Forth syntax isn't standard anyway, and is maybe more of a roadblock than a nice feature.

- Write a new parser on top of the s-expression format, one that's more standard-Forth-like.
Entry: Future of Staapl
Date: Tue Dec 28 14:51:17 EST 2010

Time to get started, as I need it for a project. I think I'm going to ditch the Forth syntax and move to something more compositional. Roadmap:

- Find a way to marry code with modules. Currently only macros have modules. Can this be done in a way that meshes better with the Scheme system?

Entry: s-expression only: splitting forth parser and dictionary compiler
Date: Thu Dec 30 11:29:22 EST 2010

coma/macro-forth.ss currently has both the forth parsing and dictionary compilation parts. Luckily those are already separate. Let's just put them in a separate module.

Using `forth-dictionary-log' to see what's actually passed in. I had to fix a bug here: the dynamic parameter needs to be a function.

The question is: why does the expanded dictionary have `forth-parse' calls? This happens after require statements. I have the feeling that this feedback loop is what makes some things behave badly. Basically, we don't compile to a flat dictionary structure, but compile to something that has a recursively defined dictionary structure. It's a bit of a mess...

It's probably best to start somewhere else. The ingredients are:

  code-register-postponed!
  wrap-macro
  wrap-word
  wrap-variable

This is exactly what is passed to `define-forth-parser' in the pic18.ss module. The first one is defined in the code.ss module. The latter 3 are defined in the comp/compiler-unit.ss module. The word "postponed" just means "postponed to run time", i.e. compiled code.

So, how to use that interface directly? I.e. let's define some variables and some words.

  (wrap-word name loc macro)

  box> (wrap-word 'foo #f (macro: 1 +))
  # #state->state #state->state

The values are: label (contains code), label compiler (i.e. for call or address as data), and the code generator. So this doesn't yet compile anything. That's what `code-register-postponed!' is about. I.e. see the `forth-word' macro in the macro-forth.ss module. That one obtains the 3 values as label, wrapper, inline, and will perform registration. This is where we can tap in. Maybe `forth-word' should be renamed, or at least moved to a different location, i.e. into the compiler-unit.ss module. ( Hmm.. Spaghetti code. Or too much parameterized code. )

Why is the `compile' parameter used in forth-word set to `rpn-lambda'? It is even explained in the docs, but I don't get it. Wait, it is `macro:' : see `forth-begin/init'. Indeed, just a parameter.

Next problem: why doesn't this work?

  ;; similar to macro-forth.ss: forth-word
  (define-syntax-rule (instantiate-code name word ...)
    (begin
      (define-values (label wrapper inline)
        (wrap-word 'name
                   #f ;; source location
                   (macro: word ...))) ;; compiler
      (ns (target) (define name label))
      (ns (macro) (define name wrapper))
      ;; (ns (inline) (define name inline)) ;; not necessary
      (code-register-postponed! inline)))

  box> (instantiate-code foo 1 +)
  box> (print-target-word target/foo)
  foo:

It probably just needs to be compiled. I've added some code to `print-target-word' to deal with non-compiled code. Something calls compile somewhere.. It's `compile!', defined in the pic18.ss module. Ok, that seems to work fine.

Observation: the convoluted code seems to come mostly from the Forth syntax, which has parsing state that doesn't mesh too well with the way PLT modules work. The rest seems to be fine.
The state involves:

- forth / macro switching
- recursive "require" expansion

Entry: Instantiation and control flow
Date: Thu Dec 30 13:37:21 EST 2010

The idea in the Forth syntax is to give full control to the programmer regarding control flow. I.e. the default is for words to fall through. This base-level control is an important feature: you want to be able to build abstractions on top of control flow.

How to make this explicit? Probably we just need two code instantiation forms: one that behaves as the macro form (with implicit exit) and one that has fallthrough code. These are currently set as:

  (words      (name . code) ...) ;; individual words
  (words-flat (name . code) ...) ;; fallthrough words

The order is not specified in the individual words. This makes fallthrough explicit and macro <-> word substitution simpler. I don't see a simple way to write this on top of the `compositions' macro, as there are two namespaces involved. So this seems to be it.

The same goes for variables. Question: should variables be flat (in sequence) by default, or should we keep them re-arrangeable? The usual caveat applies: if flat, it's no longer manageable, i.e. no fancy compiler optimizations.

Entry: Next
Date: Thu Dec 30 14:51:13 EST 2010

Write some code! The words/variables approach seems to work fine. This should fix the composition problem, with the only thing missing being the toplevel stubs: compile to .hex and .dict etc..

Roadmap:

- Convert one of the Forth language modules to s-expressions.

Converted serial.ss, which was trivial. The tests seem to pass also. The next thing is to write code, and maybe figure out how to get highlighting for the scheme forms; that would be nice.

Entry: Retargeting at Schemers
Date: Thu Dec 30 20:01:45 EST 2010

A side effect of ditching the Forth syntax is to get rid of the false hope that this would bring in Forth programmers. ( Ha! ) So I'm back to Scheme, or Racket if you want.. One of the reasons I put in the Forth layer was to be able to hide the Scheme side. What was I thinking? While Forth syntax is quite productive for writing low-level apps, trying to hide PLT Scheme features behind an explicitly coded Forth frontend isn't very productive.

Entry: Object-oriented API
Date: Fri Dec 31 23:26:39 EST 2010

Now, instead of a command API, we need a base-line OO API and build the command line API on top of that. Command lines are nice, but APIs are nicer, as they allow easier "metaprogramming".

Entry: Bootloader
Date: Fri Feb 4 13:07:20 EST 2011

* To make the framework easier to use, it seems to be a good idea to switch back to the bootloader approach. When working with multiple PIC chips in very simple circuits, the PIC programmer is too much of a hassle, and the ICD connector adds a significant board overhead.

* Multiple bus support is really needed. I need a bootloader on I2C for sure.

Entry: 3V input on 5V circuit.
Date: Fri Feb 4 13:12:17 EST 2011

I need to sniff a 3V SPI bus. I only have 5V boards. What's the fastest way to interface? For now it's only 3V -> 5V, but the near future might need the reverse too.

* Build a 3V board.
* Some inverter chip tricks?

It seems it's going to be far easier to just stick to 3V3 for the sniffer. PIC 3V3 pins are 5V tolerant. See figure 26-3 in the 18LF2620 data sheet. At 3V3 the max speed is about 20 MHz. It should be straightforward to modify the clock settings on the 2620 to do this.

I tried with a 3V3 cable but it didn't work. The voltage measured 4.3V, so maybe I messed up the voltage regulator in the cable? Check here [2].
The logic levels are 3.3V but the power is 5V. Both 3V3 cables are the same. Why is that? Ok, going from 4.7 -> 5.0 when using a powered hub.

[1] entry://../electronics/20110204-131618
[2] http://www.ftdichip.com/Support/Documents/DataSheets/Cables/DS_TTL-232R_CABLES.pdf

Entry: Back to the Stack
Date: Mon Feb 21 09:17:37 EST 2011

* The "lexical" Staapl spin-off has come full circle [1]: I should switch to a _stack_ architecture for describing directed acyclic graphs.

STACKS & FANOUT: What needs to be done is to make this mesh with the rest of Staapl. The key choice is to separate the internal representation that supports higher-order functions from the "user interface" that allows the use of lexical variables on top of this. The stack representation has explicit "dup". Recovery of sharing information is not possible in abstract-interpretation based Haskell implementations; it requires a CPS-style approach.

The stack interface is already threaded. To make this explicit, the sawtooth algorithm can be used as an application pull.

[1] entry://../meta/20110126-092935

Entry: New PIC arch
Date: Mon Feb 21 09:39:35 EST 2011

Arrived [1], together with some 3V3 regulators.

[1] entry://../electronics/20110205-163902

Entry: Bootloader
Date: Thu Feb 24 20:51:31 EST 2011

Requirements:

- Basic command set = standard.
- Front end (serial / SPI / I2C / USB / ICD) configurable.
- Possible to protect memory, but not mandatory.
- Access to reset.
- Command/OO interface

Entry: Applications
Date: Thu Feb 24 21:04:26 EST 2011

Another thing that might be interesting is to introduce "applications". An application is a binary with an entry point. Even better might be objects. The basic idea is to be able to reload code without a full chip erase. (Again, balancing on the line between whole-program compilation and linkable objects.) I wonder.. Is there a standard object format to use for the pic18?

Entry: PIC18 debug tools
Date: Thu Feb 24 21:06:32 EST 2011

If you want to use the Microchip tools on Windows, it's all good of course. Not if you want to get creative.. There are no specs for PIC18 debug mode, and the only programmable programmer (PicKit2) is probably being discontinued soon. Working console-only is doable, giving up on debug features. However, one annoying point is the lack of access to reset.

A nice project would be to write alternative firmware for the PK2. It has the right connection, and is readily available for $30 as a clone [3]. The only real hurdle seems to be USB, but that's something I need to figure out anyway some time..

Let's summarize:

* Get USB to work, using USBPicStamp or PK2
* Make the console run over the ICD2 port (build a serial -> ICD adapter)
* Get programming specs
* Get debug specs

Some reverse-engineered debug info should arrive here [1] soon. The PK2 schematic is here [2]. The prog spec is available from the Microchip part website, i.e. for the 18F1xK50 [4].

[1] http://jaromir.xf.cz/hdeb/hdeb.html
[2] http://www.modtronix.com/products/prog/pickit2/pickit2%20datasheet.pdf
[3] http://www.sure-electronics.net/mcu,display/DB-DP004_1_b.jpg
[4] http://ww1.microchip.com/downloads/en/DeviceDoc/41342E.pdf

Entry: Programming?
Date: Fri Feb 25 23:52:17 EST 2011

The problem isn't programming, it's debugging. This may sound obvious to people that do electronics for a living, but it's not to someone with a "clean" programmer background. I keep being surprised at how easy it is to lose things to the darkness. To get into a situation where you can't see what's happening.
Entry: PIC debugging: data over serial connector
Date: Tue Mar 1 23:45:19 EST 2011

Currently I have a programmer + serial console per PIC chip. This is too complicated. I want something simpler. How to use the ICD to send console data? Using the PICkit2 it is possible to send and receive data. I was on a roll trying to make this work before, but I got lost somewhere. What happened?

Entry: RTS / CTS as reset?
Date: Tue Mar 1 23:58:53 EST 2011

Can one of these pins (I always forget which is which) be used as a reset pin on a standard serial connection?

[1] http://www.easysw.com/~mike/serial/serial.html#5_1_2

Entry: picstamp.fm -> picstamp.ss : removing `load' and using units
Date: Mon Mar 7 14:11:53 EST 2011

Basic idea: the PIC18 support code is parameterized by fosc and baud. These need to be defined in the application unit. The best approach seems to be to look at the toplevel app module as a linker script that patches together different components, and provides some "configuration modules".

Entry: PIC18F constants
Date: Mon Mar 7 19:11:45 EST 2011

This probably needs a "shared constants" approach, or else all PIC18-specific modules need to be written as units. Alternatively, a pic18 signature can be created that is enough to support the basic library. Then the constant files would implement 2 interfaces:

  pic18-shared
  pic18-device

The snarfer can then distinguish between these 2.

EDIT: but is it really a good idea to distinguish between the 2?

Entry: Tension between modules and units
Date: Mon Mar 7 19:18:27 EST 2011

* Modules are definitely easier to use. The directed nature of dependencies makes it easy to hide stuff.

* Units are more flexible, and necessary when there are different implementations of interfaces.

Now, is it possible for modules to depend on some "machine" module that provides all the identifiers, and have this resolved later in a top-level linking phase?

EDIT: I.e. where A -> B means "A depends on B":

  A -> B -> C -> M

All the arrows could be implemented by `require', if it weren't for the holes in M. I don't think it's possible to automatically translate the module chain to a unit chain when one inserts holes in M. It seems that the only way to do that is to use dynamic parameters, and that's not what I want. I had that before and it's too error-prone. I really want static bindings, no side-effecting behaviour change.

Entry: Roadmap: constant dependencies
Date: Tue Mar 8 16:08:31 EST 2011

1. Remove pic18-const.
2. Make an interface for the minimally needed constants used in the compiler and possibly library code.
3. Make it so that there is no longer a top pic18.ss interface, but a unit that still requires linkage with a chip-specific module.
4. Turn everything into a unit, and splice off the parser. -> Make this work for macros.

wow.. does this open a can of worms or what!

Entry: Compiler not in module?
Date: Tue Mar 8 16:46:18 EST 2011

Maybe I have it all backwards. What if the compiler is something that is fed a configuration (an app link script), and this configuration is a Racket module? The thing is this: the compilation is really just another stage, so why is it not abstracted as such? Currently the target words definition needs some state in the main module. How did this work again?

Ok. I get it. Each project has a list of non-instantiated macros that are then tied together later.

Problem: `words', `variables', ... are macros defined in terms of signatures, so they need to be part of the signatures.
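For orientation, a minimal sketch (all names hypothetical) of the mechanism this relies on: Racket's `define-signature' accepts `define-syntaxes' elements, and such a macro can expand into references to value components of the same signature, so the macro travels with the interface it depends on.

  ;; A signature carrying both a value and a macro that uses it.
  (define-signature label^
    (wrap-word! ;; ordinary value component
     (define-syntaxes (words)
       (syntax-rules ()
         ((_ (name code ...) ...)
          (begin (wrap-word! 'name '(code ...)) ...))))))

  ;; Any unit importing label^ gets `words' along with `wrap-word!':
  (define-unit app@
    (import label^)
    (export)
    (words (foo 1 +)
           (bar 2 *)))

Linking app@ against some unit that exports label^ then proceeds as usual.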
Move label-unit.ss to sig.ss.

How to have a signature depend on another signature, i.e. such that the macros defined in the signature can refer to identifiers from another signature? HMM.. that doesn't work so well. This really needs combining. I need a different approach, as this is just a shot in the dark..

Q: Why do I end up with macros depending on units in the first place? It seems that a "not fully specified compiler" doesn't really agree well with separate compilation. From that perspective it at least makes sense that I'm not able to express what I want. In other words: definition of target words only makes sense once the compiler is fully specified.

Entry: Macros in terms of multiple signatures
Date: Tue Mar 8 18:18:17 EST 2011

I'd like to have a simpler explanation of why macros in signatures can't depend on other signatures, making signatures depend on each other. Why is this "flattening" necessary? Actually, it's not so hard.. Signatures are just collections of names, and don't contain any dependency information. Instead, dependencies are expressed _in terms of_ signatures.

The solution would be to define a new signature that has all the identifiers the macros depend on, and have a dummy unit that translates a bunch of signatures into the one needed by the macros.

Ok, I seem to be on the right way, but at some point one of the macros doesn't expand properly after generating an expression with multiple invocations of another macro:

  (begin (m ...) ...)

Nope... it expands just fine. It's about one of the identifiers: "access from an uncertified context to unexported variable from module: label:exit". Binding it lexically before handing it to the "macro:" expander seems to make that error go away. This lexical binding introduced another error: the `define' was no longer top-level. Ok, WORKS!

Entry: Compiler fixes
Date: Tue Mar 8 20:59:53 EST 2011

Next: code.ss is no more. The compiler needs to instantiate the unit.

EDIT: the whole live section has a dependency on code.ss. What is the real problem? Dependency on machine macros. Actually, there is no import in code-registry-unit.ss, so maybe it should go back to being a module? Done.

Entry: Next
Date: Thu Mar 10 10:47:00 EST 2011

Make pic18.ss the generic linked device, but provide a mechanism to choose a proper chip-specific const module.

Entry: Name change? `macro' is misleading
Date: Fri Mar 11 10:06:13 EST 2011

How hard would it be to change `macro' to `gen' or something similar, to distinguish it from scheme macros? Can it be called `code'? Maybe that's too general. Let's just keep it like it is. The change is cross-cutting and there doesn't seem to be a single right pick.

Entry: Next: org-begin org-end
Date: Fri Mar 11 10:17:59 EST 2011

It seems that this is best pushed somewhere else, i.e. next to the `wrap-code' functions. Looks like there's plenty of room for simplification once the stack-oriented label juggling is pushed deeper. Especially the org code is a bit of a hack. It probably also just works using dummy names. Maybe `wrap-word' and `wrap-variable' should have an optional org argument?

Entry: Introducing names
Date: Sat Mar 12 09:04:30 EST 2011

I find myself mindlessly shuffling things to make the symbol introduction work for a macro based on define-values/invoke-unit. The idea is that I want a single form that links a user-specified unit into the whole of the pic18 compiler, and exports all the signature words.
This _requires_ some non-hygienic functionality, so it is up to the writer of the macro to ensure that it actually makes sense. In my case, it's providing the proper `require' statements. The error I made was to use relative paths.

The trick seems to be to find out where `define-values/invoke-unit' decides to put the identifiers. The manual [1] says they are introduced in the context of the `define-values/invoke-unit' form.

[1] http://download.plt-scheme.org/doc/html/reference/invokingunits.html

Entry: Hmm.. still getting weird unknown signature errors
Date: Sat Mar 12 12:32:46 EST 2011

It's too complex. It looks like there are some things I don't understand about units, because I keep running into compiler errors I do not understand + they don't seem to be stable either. Sometimes code compiles fine, then it does not.. Probably depending on compiled/* code caches..

It looks like there is only one way to do this right, as I don't really know what I'm doing: use units for everything and perform the linking step explicitly in a top-level module. Once that works, try to abstract the linking step into a macro or something.

Basic problem: non-hygienic macros don't compose very well. If at all possible, stick to all-hygienic. The main offender here is the Forth macros. I can't possibly put all of those in a signature (can I?). However, if I can, the composition problem would be solved completely..

Entry: Moving forth parsing words to signatures.
Date: Sat Mar 12 14:12:39 EST 2011

I don't see an immediate starting point to do this incrementally. Maybe that's actually good; maybe the real solution is to throw it all away and start over? Man, this is hard.. Let's start at the beginning.

A significant roadblock is the non-composability of define-signature. Can that be fixed transparently? I think it really needs a rewrite, starting bottom-up on top of label^, and taking into account all the problems that are associated with the flatness. Starting from scratch also allows preprocessing macros to be represented differently. They should not be syntax, because they do not behave as Scheme macros.

The basic idea is that the gizmos that implement the separate behaviour of the syntax-juggling code are themselves implemented as signature-specified _values_, to avoid the problem of having to define them as macros. Can this be done in a straightforward way? I.e. there should be just another stage. The essential element seems to be to "split" the identifiers into two classes: those that are part of the transformer stage and those that are part of the code stage.

Entry: The new `forth-begin'
Date: Sat Mar 12 15:47:24 EST 2011

Forth code is a straight line, and it generates Scheme expressions that look like:

  (variables a b c)
  (words-flat (foo 123)
              (bar foo 1 +))
  (macros-flat (baz foo bar + ))

Essentially, a module s-expression is built one atom at a time. The default behaviour is to simply append at the end of the last expression, i.e. in the above it would be at .

Problems: there is no "macros-flat" form. Once there is, it should be quite straightforward to use this approach. The expression could be kept in inverted representation as long as recursive expansion is not necessary, i.e. a Scheme form that introduces names needs to be properly inserted before the Forth parsing goes on.

The basic misunderstanding in the previous approach was the assumption that it is possible to take a whole file and build a single huge s-expression, which is then expanded at once.
Because of the introduction of names this isn't possible without recursive expansion. Let's build that in from the start.

I forgot: the prefix parser thing is actually quite deep.. The call to `syntax-local-value' is in rpn/parse.ss. So it doesn't look all that bad over there.. It's the middle part that's rotten.

Entry: Focus
Date: Tue Mar 15 13:34:43 EDT 2011

Bottom line: there are many things to fix. Mostly the Forth parser and the compiler are too "stateful" and might need a change, or at least some thought on why things are as they are. So it looks like I need to focus more if I want to work on real-life projects instead of full-time tinkering on Staapl. I'm also not in terrific physical shape, so maybe this is not a good time to do the creative magic required to overhaul the core. One step at a time.

Currently, there are two goals that are somewhat intertwined if they need to be done right: get the bare-bones app to work using s-expressions, and clean up the Forth macro interface on top of units.

TODO:

* Allow `words' and `words-flat' to support raw addresses instead of labels. This might need a change in the word-wrap code.

* Figure out how to extend signature syntax so it's possible to move code between signatures and plain modules.

The latter seems most isolated, so let's start there.

Entry: Extending `define-signature'
Date: Tue Mar 15 13:41:15 EDT 2011

Apparently the identifier is not in a separate namespace, so beware when including this in ordinary modules:

  (define-signature-form (define-syntax-rule stx)
    (syntax-case stx ()
      ((_ (name . pat) expr)
       (list #'(define-syntaxes (name)
                 (syntax-rules ()
                   ((_ . pat) expr)))))))

[1] http://pre.plt-scheme.org/docs/html/reference/define-sig-form.html

Entry: org
Date: Tue Mar 15 14:57:47 EDT 2011

The way org is implemented using org-begin and org-end is a bit weird, as it can span several definitions (i.e. one consecutive chain with several entry points).

Entry: Piggy-back signature macros
Date: Thu Mar 17 00:23:57 EDT 2011

I have a `prefix-parser' macro defined in rpn.ss and I want to lift this to a signature form. How to do that?

1. Import the original forms with an id prefix.

Entry: Prefix parsers part of signatures
Date: Thu Mar 17 00:34:44 EDT 2011

Ha, damn it, it works! After adding some glue and a dummy unit to export the signature, this is the sig def and an expansion test:

  (require "../rpn/rpn-signature-forms.ss")
  (define-signature prefix-test^
    ((prefix-parsers (macro) ((p3) (+ + +)))))

  ;; In pic18.ss context:
  box> (syntax->datum (expand #'(macro: p3)))
  (#%app make-word
         (lambda (p)
           (let-values (((p) (#%app (#%top . macro/+) p)))
             (let-values (((p) (#%app (#%top . macro/+) p)))
               (let-values (((p) (#%app (#%top . macro/+) p)))
                 p)))))

Nope, that's not yet correct. The `+' is not visible in the signature and needs to be made part of the signature. I moved to this:

  (define-signature prefix-test^
    (macro/plus
     (prefix-parsers (macro) ((plus3) (plus plus plus)))))

And a trivial plug in the unit def:

  (import stack^) ;; for macro/+
  (export prefix-test^)
  (define (macro/plus s) (macro/+ s))

This isn't quite right. I mean, it works, but it's clumsy. Names like `plus' leak into the namespace. It would be better if this didn't need to add aliases. Can `import' or `open' be used in the signature? What about this: use two interfaces, one that lists the deps and another that extends this sig with macros. What I really want is a simple way to bundle things, to make unions of interfaces.
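For the record, a small sketch of the two-interface idea using Racket's (single-inheritance) `extends' clause, with invented names. It bundles the dependency with the macro, but still doesn't give general unions:

  #lang racket
  (require racket/unit)

  ;; Base signature: the identifiers the macros need to see.
  (define-signature prefix-deps^ (plus))

  ;; Extension: adds a macro that refers to the inherited `plus'.
  (define-signature prefix-test^ extends prefix-deps^
    ((define-syntaxes (plus3)
       (syntax-rules ()
         ((_ x) (plus (plus (plus x))))))))

  ;; A unit exporting prefix-test^ only has to provide the values.
  (define-unit plus@
    (import)
    (export prefix-test^)
    (define (plus x) (+ x 1)))

  ;; An importing unit gets `plus' and the `plus3' macro together.
  (define-unit use@
    (import prefix-test^)
    (export)
    (printf "~a\n" (plus3 0))) ;; => 3

  (define-compound-unit/infer go@
    (import) (export) (link plus@ use@))
  (invoke-unit go@)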
Entry: Ask racket list
Date: Thu Mar 17 01:45:38 EDT 2011

Is it possible to define macros as part of signatures, but allow the macros to see different signatures? Is it possible to create unions of signatures, a la multiple inheritance?

Entry: The (ns ...) hack
Date: Thu Mar 17 01:54:59 EDT 2011

Trouble is, this doesn't work so well because it inspects forms. Actually, my trouble is that defining identifiers sometimes needs to be abstracted, such as in signatures.

Entry: Assembler broken?
Date: Thu Mar 17 21:04:48 EDT 2011

  mzscheme -p zwizwa/staapl/staaplc -- -c /dev/ttyUPS picstamp.fm
  patterns: Mismatch at: (((list-rest (list (? (ns (op ?) qw)) b)
                                      (list (? (ns (op ?) qw)) a)
                                      rest)
                          (macro/append-reverse
                            (begin (list (op: qw (tv: a b /)))) rest)))
  make: *** [picstamp.dict] Error 1

That makes no sense. What is the question mark about? The error is raised in staapl/coma/pattern-tx.ss. How to add an original source location to the message?

Ok, I was able to trace it to the macro `/' and its definition in pic18-macro-unit.ss:

  (patterns-class (macro)
    ;;------------
    (word)
    ;;------------
    ((pow) (>>>) (<<<) (/) (*))
    ;;---------------------------------------------------------------
    (([qw a ] [qw b] word) ([qw (tv: a b word)])))

So what happens is clear: the `/' macro is invoked without constants on the compilation stack. Why is that? Can we also get at the call site of the macro? Looks like I need to get at the backtrace, or define what a backtrace means for a Staapl compiler.

Entry: Code structure
Date: Thu Mar 17 21:24:21 EDT 2011

So.. looks like I'm being forced into cleaning up some other code layers too. First, a summary of recent fixes:

1. Names and units: this seems to be mostly solved, apart from some cosmetics that have to do with bundling signatures, and pushing this style to all the library code.

2. Allow s-expression-only definition of target code modules. Seems conceptually ok, but because of missing library code from point 1 this doesn't work yet.

3. Salvage the Forth prefix parser. I wasn't going to do this, but it seems it's at least relatively straightforward to mix units and macros, if a bit clumsy.

TODO:

4. Make the run-time target code generation layers more transparent and as much as possible implemented with proper identifier management.

Entry: Backtraces / continuation marks
Date: Fri Mar 18 11:59:29 EDT 2011

From [1]: "The continuation marks included in the exception are effectively a stack trace, and you can convert them into locations."

[1] http://www.mail-archive.com/users@racket-lang.org/msg00132.html

Entry: Weird bug
Date: Sat Mar 19 10:56:51 EDT 2011

Figuring out the backtraces seems like a bit too much work. I used a tracing approach, tagging the macro invocations with a `printf'. Traced it down to:

  UEP0 >block

where

  : >block 16 / ;

My guess is that UEP0 is undefined. No, it's something more insidious. In ramblock.ss, the following code is compiled as if `>block' is a target word, not a macro:

  macro
  : >block 16 / ;
  forth

Changing the `:' to an explicit `:macro' solves the problem. Ok, I moved the code that was in coma/macro-forth-tx.ss back to a `begin-for-syntax' form and the problem goes away. I don't understand.. Something to do with local state maybe? Bottom line: that code has to change. Too obscure.

Entry: A new macro-forth ?
Date: Sat Mar 19 11:57:26 EDT 2011

In coma/macro-forth.ss, `forth-compile-dictionary' needs to be replaced by a more direct mechanism. Recap: why does rpn-parse have this first argument?
Entry: Uncertified syntax
Date: Sat Mar 19 14:23:40 EDT 2011

Simply put, I don't know where to begin to understand why I get these syntax certificate errors [1]. However, my guess is that it's the `ns' form that is doing something wrong. Let's see what happens if I try to remove it.

From [2]: "Certification should work automatically unless you're using `local-expand' and re-arranging the result." And I'm not calling local-expand.

Removing `ns' doesn't seem to be an option. It's too deep.

[1] http://docs.racket-lang.org/guide/stx-certs.html
[2] http://lists.racket-lang.org/users/archive/2007-March/016859.html

Entry: What a mess..
Date: Sat Mar 19 15:01:19 EDT 2011

Actually, it is working. Don't forget that! But frankly, I don't understand why. The code is patched with workarounds. I can't really remove it; a lot of functionality depends on it. What to do?? Maybe it's best to gently move towards rewriting the library code into s-expression / unit style, and _WHEN IT'S WORKING_ rethink the Forth part. I would not be surprised if in the meantime something really simple popped up that allowed me to scrap all that complexity. It really can't be that hard. It's just that I don't have an overview and have dug myself too deep into Racket macro internals I don't need.

Entry: Syntax certificates
Date: Sun Mar 20 10:25:09 EDT 2011

In Staapl, I keep running into errors like these:

  compile: access from an uncertified context to unexported syntax from
  module: "/home/tom/pub/darcs/brood-5/staapl/pic18.ss" at: label:org-begin
  in: label:org-begin.261

I've read a bit in the manual [1] and I think I sort of understand the idea, but I can't figure out what is causing this. So, that's today's task: understand syntax certs. There are 3 candidates:

- The name prefixing used in ns.tx uses `datum->syntax'. According to the manual this does not transfer certificates.
- The `ns' macro itself. However, that really just re-arranges its input.
- The `rpn-parse' macro, which takes apart syntax and puts it back together again.

Another experiment. What I noticed is that if I put the reference to `label:org-begin' in a `macro:' form by itself, i.e.

  (macro: .. ,label:org-begin ..)

there is no problem. So I changed the defining form to:

  (define-syntax-rule (word-defs wrap name raw-macro)
    (begin
      (define-values (label wrapper codegen)
        (wrap 'name
              #f ;; source location
              raw-macro))
      (word-define (target) name label)
      (word-define (macro) name wrapper)
      (label:append! codegen)))

and then used the following:

  (define-syntax-rule (words-org-flat (address rpn-code ...) ...)
    (begin
      (define org-begin label:org-begin)
      (define org-end label:org-end)
      (begin
        (let ((raw-macro (macro: 'address ,label:org-begin
                                 rpn-code ... ,org-end)))
          (word-defs label:wrap-word #f raw-macro))
        ...)))

which works. However, when the `raw-macro' is substituted into the code, it doesn't work.

Ok, so I've re-arranged the code such that the code generator is bound to a variable. That seems to work and has the added benefit of being a bit more readable.

  (define-syntax-rule (words-flat (name . rpn-code) ...)
    (begin
      (begin
        (define codegen (macro: . rpn-code))
        (word-defs label:wrap-word name codegen))
      ...))

  (define-syntax-rule (words (name rpn-code ...) ...)
    (begin
      (begin
        (define codegen (macro: rpn-code ... ,label:exit))
        (word-defs label:wrap-word name codegen))
      ...))

  (define-syntax-rule (words-org-flat (address rpn-code ...) ...)
    (begin
      (begin
        (define codegen
          (macro: 'address
                  ,label:org-begin ;; Switch to a new chain,
                  rpn-code ...
Entry: Is `syntax->list' the culprit?
Date: Sun Mar 20 12:54:43 EDT 2011

From [1]: "Calling syntax->list loses the outermost certificate,
which is the "safest" place to have one.  IOW, syntax->list causes
the error when one of its immediate subexpressions is an introduced,
unexported identifier."  Following up to [2]:

  (define-for-syntax (recertifiable-transform transform stx)
    (let ([new-stx (transform stx)]
          [inspector (current-code-inspector)])
      (define (recertify s)
        (syntax-recertify s new-stx inspector #f))
      (values new-stx recertify)))

The non-osdir thread: [3].  Following up on the idea that the problem
might be `syntax->list', I tried the following to replace the
invocation in `rpn-syntax-rules', which seems to work:

  (define (syntax->rlist stx)
    (let* ((cci (current-code-inspector))
           (recert (lambda (stx-new)
                     (syntax-recertify stx-new stx cci #f))))
      (map recert (syntax->list stx))))

Next to that, there is another problem with `datum->syntax'.  Looks
like the reason I couldn't get anywhere before is that it was wrong
in at least two places.  Another place that caused trouble was
`make-rpn-expand-transformer'.  Here the certificate needs to come
from the result of `begin-stx-thunk'.  Another one (damn!) is
mf:alloc.  Tried for a bit to see what's going on but I'm running out
of steam.  This is horrible.

[1] http://osdir.com/ml/plt-scheme/2010-03/msg00245.html
[2] http://osdir.com/ml/plt-scheme/2010-03/msg00247.html
[3] http://lists.racket-lang.org/users/archive/2010-March/038586.html

Entry: More..
Date: Sun Mar 20 15:10:12 EDT 2011

One step closer: verified that the error comes from using
`syntax-local-value' on the `macro/:macro' identifier.  So it's
really literally so: that particular access is not allowed.  I'm
assuming this is because unit syntax is somewhat special.  Let's
explore the other route again.

Entry: Avoiding `syntax-local-value'
Date: Sun Mar 20 15:28:08 EDT 2011

I'm using a mechanism I don't fully understand
(`syntax-local-value' and its interaction with syntax certificates
and units) in the heart of the Staapl code (the `rpn-parse' macro).
As engineering practices go, that's probably the worst one can do...
It might be wiser to do this in two steps: if any plugin behaviour is
necessary, put it in a different layer.  Another argument:
`rpn-parse' is supposed to generate a single form, not a recursive
expansion.

Entry: Enough
Date: Sun Mar 20 18:39:00 EDT 2011

I'm sick of it.  Let's rip it all out and build it back up.

Entry: Crisis
Date: Sun Mar 20 18:43:51 EDT 2011

Rewrite is not an option.  I probably don't have the energy to get it
back into shape.  If there's a rewrite, it will have to be in
Haskell, in order to be forced to get the types right.
Entry: Debugging certificates.
Date: Sun Mar 20 18:45:34 EDT 2011

Is there a way to debug this to see what's going on?

Entry: Same problem as before, now with working signatures.
Date: Sun Mar 20 19:17:18 EDT 2011

[1] entry://20110319-105651

Entry: picstamp.fm back online
Date: Sun Mar 20 19:54:12 EDT 2011

Saved the syntax!  Now it's time to kill it off..  The implementation
is far too crummy.  Code upload doesn't seem to work though.  It
might be a racket thing too.  Nope, it was `target-byte-address'
undefined.  I'm not correctly handling some error in live.ss:

  error #(struct:exn:fail target-word-not-found: OK #)

Entry: Safely tucked away?
Date: Sun Mar 20 20:49:09 EDT 2011

So it's time to get some real work done.  Next: target instantiation
macro.

Entry: PIC18 debug mode
Date: Mon Mar 21 00:09:19 EDT 2011

See here[1] for a mirror of Jaromir's notes.  From what I gather, the
most useful bit seems to be that debug mode can be entered from a
host signal without any kind of interrupt support on the target.  The
rest is software, and effectively using this already assumes there is
a protocol over RB6 and RB7.  If debug mode is triggered by a 1->0
transition on RB6 (PGC), we can use this to attach a serial port.
This means a standard serial start bit can jump to debug.  Anyway, it
seems best to use an external clock as Jaromir suggests, to be
independent of the clock speed of the PIC.  It seems simplest to use
the current size-prefixed command/reply protocol.  Target can use PGD
to signal ready state, followed by a clock-out by the host.  Otoh,
I2C is 2-wire.  Maybe that's what I should stick to?

[1] entry://../electronics/20110320-225422

Entry: NEXT
Date: Thu Mar 24 20:29:10 EDT 2011

- compound units -> full s-exp only kernel
- USB interface

Entry: Compound units
Date: Thu Mar 24 20:31:11 EDT 2011

I guess what I'm looking for next is a compound unit, or a partially
linked unit.  Does this need the explicit linking?  The unit
interface seems to have mostly two ways of linking: manual or
automatic.  From what I understand, the automatic way works if there
is no duplication of interfaces.  What about compound-unit/infer?
I.e. from the guide[2]:

  > (define-compound-unit/infer toy-store+factory@
      (import)
      (export toy-factory^ toy-store^)
      (link store-specific-factory@ toy-store@))

But that doesn't solve the problem of consolidating interfaces to
make them simpler to reference.

[1] http://docs.racket-lang.org/reference/compoundunits.html?q=bsl
[2] http://docs.racket-lang.org/guide/Linking_Units.html

Entry: Unit unions
Date: Thu Mar 24 20:59:55 EDT 2011

Let's just build an abstraction for it.  I don't think that compound
signatures are possible, except for the limited single inheritance.
So I got something working:

  (define-syntax (define-dictionary stx)
    (syntax-case stx ()
      ((_ name (sig^ ...))
       #`(define-syntax name #'(sig^ ...)))))

  (begin-for-syntax
   (define (re-syntax context stx)
     (datum->syntax context (syntax->datum stx))))

  (define-syntax (define/invoke-dictionary stx)
    (syntax-case stx ()
      ((_ dict^^ (unit@ ...))
       (let ((sigs (re-syntax stx (syntax-local-value #'dict^^))))
         #`(begin
             (define-compound-unit/infer combined@
               (import)
               (export #,@sigs)
               (link unit@ ...))
             (define-values/invoke-unit combined@
               (import)
               (export #,@sigs)))))))

And then:

  (define-dictionary pic18^^
    (stack^ stack-extra^ memory-extra^ .. ))

This works, but only because the symbols are non-hygienic.  I can't
seem to keep the signatures themselves hidden, and export the
definitions.
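For reference, invoking such a dictionary looks something like this;
the unit names are made up for the example, the two forms are the
ones defined above:

  ;; Hypothetical usage of the forms above: link the units that
  ;; implement the signatures collected in pic18^^ and splice their
  ;; definitions into the current module.
  (define/invoke-dictionary pic18^^
    (stack@ stack-extra@ memory-extra@))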
Entry: Why is there no unit collection?
Date: Thu Mar 24 23:30:39 EDT 2011

I think I'm not using the right collection.  Probably, I should be
using modules and parameters.  Units make it harder to work with
macros, and they don't agglomerate as easily as modules do.  My
problem however is that I really do have components with macros in
them.

Entry: Use the source luke
Date: Fri Mar 25 09:39:43 EDT 2011

I really should look more at the racket source code:

  ./launcher/launcher.rkt:7:
  (define-values/invoke-unit/infer launcher@)
  (provide-signature-elements launcher^)

From a quick look it seems this is quite common.  However, there
doesn't seem to be an equivalent of the `provide-all-out' form.

Entry: Include files
Date: Fri Mar 25 15:03:08 EDT 2011

For the device-specific constants it might be best to use inheritance
here: one unit provides the constants necessary for the library code,
another interface provides chip-specific constants.  Trouble here is
of course that the Microchip header parser needs to distinguish
these.

Entry: Protocol changes
Date: Sun Mar 27 11:12:55 EDT 2011

Making the USB polling work made me think of a missing feature in the
current monitor code: it's not possible to read/write the terminal
from the target side.  This isn't so hard to do.  It just requires
some coroutine-style symmetric communication.  Maybe it's best to
implement that first, and only then move on to the polled PK2
channel.

Before doing anything: I have both monitor .ss files and .f files.
What's up here?  I don't think the .ss ones actually work, as I don't
see them included in any projects.

The change can be minimal.  Only the jsr command executes arbitrary
code.  The host side that waits for ack there needs to implement the
coroutine mechanism.

Man, I've been out of it for a while.  This is already working!  The
`console-log' function handles the receive message.  Currently the
only thing it does is print messages to the console, but this could
be extended into a coroutine call.  The problem with the current one
is how to express "continue" to the target.  It seems best to fully
implement continuations, as any other approach is going to be an
ad-hoc hack from if-then-else hell that wants to be a full
continuation implementation.  Let's leave this for later.  The
current implementation at least allows for some form of ping-pong.

Entry: Running the test suite
Date: Sun Mar 27 11:19:50 EDT 2011

The .hex comparison tests seem to pass, except for the picstamp code
which I've been changing.

Entry: Reading the ICSP proto on PIC
Date: Sun Mar 27 12:45:36 EDT 2011

Trouble is that by default the PK2 sends very narrow pulses: 160ns,
which is 2 cycles at 12 MIPS.  This is no problem for hardware, but
for software it's a bit of a stretch.  Can this be slowed down on the
PK2?  Otherwise it might be necessary to use the KBI2 feature, which
is not universal.  RB7 = PGC = KBI2 for the "new core" 18F:

  18F2550 18F1220 18F2620 18F24J10

but i.e. not for the "old core":

  18F252

For the very new core, which has a different data sheet pin diagram,
they seem to be listed as IOC, which I assume is the same:

  18F46K22

RBIF (INTCON 0) changes when a PORTB pin changes.  The trouble is
going to be that we need to detect only clock edges, not a data edge.
We can't check the data itself because we're not going to be fast
enough..  Triggering on both data and clock should be possible if the
delay is set high enough.  The PK2 is quite predictable: data is set
and clock is toggled as fast as possible, so a small delay after any
data or clock change should be sufficient.  However, it might be
necessary to go to SPI or I2C using the extra AUX line, since this
seems to be a bit too much fiddling.  Wait: it is possible to change
the pulse width by changing the speed using SET_ICSP_SPEED.
Entry: Forth syntax
Date: Sun Mar 27 14:06:29 EDT 2011

I'm not going to ditch it.  It's a nice syntax for actual
programming.  A nice UI.

Entry: Clock sync
Date: Tue Mar 29 00:03:56 EDT 2011

"begin cond? until" is no longer compiled to bit test + jump
backwards, but to a jump forwards.  Strange.  EDIT: I found this:

  ;; Conditional skip optimisation for 'then'.
  ;; FIXME: not used since we can't mutate then (defined in control.ss)
  ;; (([btfsp p f b a] [bra l1] ,ins [label l2] swapbra)
  ;;  (if (eq? l1 l2)
  ;;      `([btfsp ,(flip p) ,f ,b ,a] ,ins)
  ;;      (error 'then-opti-error)))
  ;; ((swapbra) ())

I tried to add it but it doesn't seem to trigger.  Maybe something
changed in the way `label' is handled?  Hmmm...  `label:' cuts off a
part of the code so the optimization is not performed.  I don't see a
straightforward way to solve this.

  begin cond? until
  sym dum>m label: cond? until
  sym dum>m label: cond? not while repeat
  sym dum>m label: cond? not sym dup >m jw/false end: m-swap again then

The trouble is the `end:' caused by `if'.  Is it really necessary?  I
suppose this is to prevent optimizations from wiping out the branch
in some conditions; a sane default.  So the wait loop really needs a
different approach:

  begin cond? jw/false end:

Maybe "until" is the natural primitive, not jw/false?  Interesting!
This is what works:

  until = ( m> jw/false end: )

Entry: The infinite todo list
Date: Tue Mar 29 12:07:22 EDT 2011

* Combine ICSP proto with monitor code.
* Try the ICD interrupt.
* Get USB to work.
* Fix disassembler (maybe use original assembly code annotation?)
* Add command line completion for the readline interface.
* Re-integrate with snot, or use Geiser[1].
* Find out why compilation is so slow, i.e. refactor module structure
  to a finer grain.
* Clean up source code layout.

[1] http://www.nongnu.org/geiser/geiser_4.html

Entry: Packet-based monitor code
Date: Tue Mar 29 22:38:45 EDT 2011

I'm not sure if that's going to work directly with the current
implementation.  Maybe best to move the current serial port approach
to a packet API.  Approach: in/b and friends seem to be only called
in `target-recieve+id/b' and `target-count'.  The latter is an
artifact of the daisy-chain serial and needs to move to a different
level.  The former already hints at a message-based approach.
Basically, we can buffer up to the point where we read.  This means
the byte-oriented "printf" style can be maintained, as it works quite
well.  Looks like all commands expect a reply, except for reset.

Entry: Target-side monitor code
Date: Wed Mar 30 21:56:10 EDT 2011

The basic protocol with sync is verified.  Now the monitor needs to
be adapted to work with a packet-oriented approach.  The most
important issue is that the last byte in a transaction is allowed to
detach the receiver for a while.  Maybe reception should drop the
postamble?  The current interpreter code already has some support for
headers and acks.  Roadmap (a framing sketch follows the list):

- All commands need to end in ack.
- Move the serial daisy-chaining code somewhere else.
- Add addressing to the ICSP protocol?
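A minimal sketch of host-side size-prefixed framing on top of the
byte-oriented layer.  `in/b' is mentioned above; `out/b' and the two
packet functions are assumed names, not the actual API:

  ;; Size-prefixed framing over byte-level I/O (assumed primitives
  ;; in/b and out/b).  A packet is a length byte followed by payload.
  (define (send-packet bytes)
    (out/b (length bytes))      ;; size prefix
    (for-each out/b bytes))     ;; payload

  (define (recv-packet)
    (let ((n (in/b)))           ;; size prefix
      (for/list ((i (in-range n)))
        (in/b))))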
Entry: .f files and early binding
Date: Thu Mar 31 00:21:45 EDT 2011

You know, I went through all this work to be able to use units and
not have to rely on context-sensitive "include", but I have to say
that in the light of incremental upload and late binding, that
approach is quite valid.  I mean, it's sane: you can load the same
code multiple times with different bindings without fear of changing
anything in the core.  However.  Big caveat.  You can't redefine
things in Forth code implemented in racket modules, because it uses a
different binding mechanism.  It would be great to be able to unify
those two approaches, i.e. using some form of lexical nesting in the
racket modules to get the same shadowing behaviour.

It looks like it's only n@f+ and n@a+ that need an additional
wait-ack appended.

Entry: pk2 ICSP proto monitor
Date: Thu Mar 31 01:46:51 EDT 2011

Simple receive/transmit works, but the monitor itself doesn't want to
do much.  I assume there is not enough time in between bytes for the
interpreter code to run.  Let's measure.  13us..  That's quite a bit
at 0.1us instruction cycle.  I had put the period at 10us.  Hmm..
can't redefine words as macros?  Ok, that's not the problem.  It was
a missing definition.  So the 1 command (push) works.

Entry: It's working, now make it faster
Date: Thu Mar 31 03:33:08 EDT 2011

It's not particularly fast because of the large delays.  Let's see if
we can up the clock speed back to 3us.  That doesn't do anything.
I've added the syncs in-channel, and it seems to work except for 13
(stackptr), which replies with 0 for a while and then sends:

  icsp-recv: h:#t a:#f b:(0 244 15)
  icsp-recv: h:#t a:#f b:(0 8 0)

Which is this on the line (LSB first):

  00 00101111 11110000

That's indeed a 10 sync followed by a #xFF address and a #x00 size
byte.  Shifting it should be no problem.  The (0 8 0) is:

  00 00010000 00000000

which is a sync bit followed by 2 zeros.  This is because the device
is in receive mode, and it will clock in address + size, and ignore a
0 size message.  Ok, receive sync seems to work.  However, sending
out the command really needs a proper sync, as otherwise the target
gets messed up.  It seems there are significant delays so we assume
the target is always there.  When the send sync is on, the receive
sync doesn't seem to be invoked.  Maybe there's plenty of pause?
Next option: see if it can be solved with a loop in a PK2 script
using IF_EQ_GOTO.

Entry: Looking at "kb"
Date: Thu Mar 31 16:04:33 EDT 2011

The "kb" display is quite slow.  I'm looking at it at the scope, and
while there is still a 3ms delay in the sync due to more than one
packet being used, there is an extended period where the host is
polling but there is no reply, about 50 ms.  That's 500k
instructions.  That doesn't seem right.  Each chkblk loop is only 64
bytes.  Wait..  The loop is 5 instructions.  That's 6 clocks per
iteration, including the 2 clocks for the branch.

  0218 0009 [tblrd*+]
  021A 50F5 [movf 245 0 0]
  021C 14ED [andwf .L111 0 0]
  021E 06E7 [decf 231 1 0]
  0220 E1FA [bpz 1 .L116]

Actually, that's already 38.4 ms for just the loop (64k bytes x 6
cycles x 0.1 us), so that indeed makes sense.  So why is the display
so slow?  On the serial line it's much faster.  (really?)  Typical
delays:

      3ms      2ms               4ms                3ms
  sync <-> send <-> receive_header <-> receive_body <-> sync

The actual byte transfers are almost not noticeable, as they have 3us
clocks and are in the order of 100us total length.  This is far from
ideal!  I don't see a simple way to fix this, so for now this will
have to do.  Maybe lower the clock speed too, as that doesn't seem to
have much influence either.  Yep.  Switched to 100us and can't see
any difference, except that there is a bit less idle bus time.

I'm not sure how to fix this..  Sending larger messages and using
fewer handshakes will help, but it seems to be an inherent problem
with the pk2, as there is no pipelining to hide the 1ms usb bus
clock.  Hmm..  That's not really true.  In one direction it pipelines
just fine.  Sending 2 x 26 byte transfers spaces them 1ms apart with
only 200us wasted space.  So it does burst well..  The problem then
is the ping-pong handshake.  So how to fix?  The only annoying part
is bulk read and write.  These can probably be optimized by using
larger packets.  Ok, so roadmap for faster pk2:

- Use RAM buffering for program upload, send one page at a time.
- Make checkblock work on 1k blocks.
Entry: Console usability issues
Date: Fri Apr 1 20:51:29 EDT 2011

Let's do some cleanup.  I want to be able to issue host-only commands
without target comm, i.e. set voltages, do chip erase etc..  What
this needs is a shorter timeout for ack.  100ms should be enough to
see whether the target is listening or not.  This is the OK word.

Entry: Sync issues
Date: Fri Apr 1 21:38:53 EDT 2011

It looks as if it's not properly syncing on startup.  The lucky thing
is that the NOP is a 16-bit zero string, so that's probably why it
recovers eventually.  Adding 50 retries for pk2-poll is a proper
workaround.  OK and BUSY work properly.
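Roughly what such a retry wrapper looks like; `pk2-poll' is assumed
here to return #f on a missed sync, which is a guess, and the wrapper
name is made up:

  ;; Sketch: retry pk2-poll up to 50 times before giving up.
  (define (pk2-poll/retry (tries 50))
    (let loop ((n tries))
      (cond ((zero? n)  (error 'pk2-poll "no sync after ~a tries" tries))
            ((pk2-poll) => values)   ;; sync ok: pass result through
            (else       (loop (- n 1))))))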
Entry: Stat
Date: Fri Apr 1 21:58:06 EDT 2011

The "stat" word is now available for PK2.

Entry: Scheme prompt
Date: Fri Apr 1 21:59:29 EDT 2011

Might be best to add a prompt for the "scheme" word to make sure we
stay in the scheme interpreter when there is an error.  Apparently
exceptions go straight through the prompts (different tag?), so I'm
using a catch-all exception handler.

Entry: Powering target from PK2
Date: Fri Apr 1 22:30:26 EDT 2011

It's possible, but needs some massaging.  I'll need to recover the
exact sequence, but the basic idea is to:

* Check if there is target power, and ONLY switch on the PK2-provided
  line if there is none!
* Set the reset/program line correctly.

Need to make sure that the device is properly configured, so we can
get the right operating voltage from the database.  Actually, the
default works for a 5V PIC, without target voltage checking.  The
current `target-on' and `target-off' functions work fine from reset.

Entry: Faster program
Date: Sat Apr 2 01:17:14 EDT 2011

The current protocol slows it down because there are several sync
transfers.  All should go in a single 8 byte burst.  However, I do
not want to create an extra instruction for this.  This should be
something like: JSR ...  The change (not necessary for this, but
useful for other extensions) is to allow a jsr without ack, so
arbitrary comm commands can be added.  This could have the 'fast-prog
command in the dictionary.  If it's there it can be used, otherwise
the standard approach can be taken.  Ok, works.

Entry: Can't rewrite Staapl
Date: Sat Apr 2 01:35:53 EDT 2011

Maybe the compiler, but definitely not the interaction system.  There
is too much knowledge and too many workarounds (``AI'') encoded in
it.

Entry: Faster "kb"
Date: Sat Apr 2 03:15:25 EDT 2011

This can probably use the same strategy as fast program: do more work
per packet, and bundle commands that do not cause pauses into a
single command.

Entry: Debugger
Date: Sat Apr 2 03:18:59 EDT 2011

Making the debugger functionality work should now be straightforward.
This should also be kept interactive:

- Halt target
- Set debug bit
- Program debug vector

[1] entry://../electronics/20110320-225422

Entry: Low-level conditionals
Date: Sat Apr 2 09:28:42 EDT 2011

Does `if' actually work with normal ints?  Trouble is that I really
rarely need it, as it's simpler to process condition bits in a
different way.  ( Or: I know it's inefficient and will work around it
using plain `if'. )

Entry: The disassembler
Date: Sat Apr 2 10:06:21 EDT 2011

The next broken part is the disassembler.  I'm thinking it might
actually be better to query the host code instead.  Looks like that
data is no longer available due to code-clear!.  Nope, it is...  I'm
not sure why the kernel words are not available.  It's always more
useful to have the original assembly code available.  In that case,
the disassembler doesn't need to translate labels.  Maybe it's more
useful if it just uses numbers?  I've added word addresses, but
that's really no solution, as "see" uses byte addresses.  Removed.
This needs to be fixed properly.

Entry: Next
Date: Sat Apr 2 14:13:59 EDT 2011

Basically, the PK2 connection is working quite well.  Some unrelated
cosmetic issues I ran into while hacking:

- dasm doesn't really work
- chain cutoff problem

Next: debugger or USB?

Entry: Chain cutoff problem?
Date: Sat Apr 2 14:16:36 EDT 2011

The result of "sea" chains seems strange if there are a couple of
definitions in a row without fallthrough.  No chain cutoff after a
jump?

  : bar food food ;
  : bar2 bar bar ;
  .OK
  sea bar
  bar:
      02C8 DFFB [jsr 0 food]
      02CA D7FA [jsr 1 food]
  bar2:
      02CC DFFD [jsr 0 bar]
      02CE D7FC [jsr 1 bar]
  OK

Entry: USB
Date: Sat Apr 2 16:18:46 EDT 2011

The main problem for USB is handling a lot of struct data, i.e. the
endpoint registers used to control the USB hardware.  Let's assume
the descriptors are just flash constants.  Just trying to load the
usb code gives undefined words for:

  *EP0-OUT* *EP0-IN*

Where do they come from?  What about this one?  Nope, it has some
missing code.

  Tue Jun  2 09:35:33 EDT 2009  tom@zwizwa.be
    * usb driver needs more abstraction

This one does have the EP0 macro definitions:

  Mon Jun  1 02:56:40 EDT 2009  tom@zwizwa.be
    * cleanup macro

  \ *WORD* means the current object context has changed: all literal
  \ addresses below #x60 are relative indexes.

  : EPn-OUT 3 <<< ;
  : EPn-IN  EPn-OUT 4 + ;

  \ ( reladdr -- ) set object to buffer descriptor in bank 4
  : *BD* al ! 4 ah ! ;
  : *EP0-OUT* 0 EPn-OUT *BD* ;
  : *EP0-IN*  0 EPn-IN  *BD* ;
  forth

Basically, each endpoint has a count, a buffer and a control
register.  I was using a-relative addressing.  Should this be
maintained?  The main problem is getting this abstraction right.  In
C it would be trivial, but in Forth we need to be a bit clever.
Brodie's idea is to try to avoid structures: use code/commands
instead.  Essentially what "more" does is to use global variables to
store a current state, and have words operate on the current state.
If combined with save/restore on a stack this is essentially dynamic
binding.  The previous approach did exactly that, but using the index
register that's normally intended to hold the stack frame.  So it
isn't much else than filling buffers, either with fresh data or from
constants, and sending acks, so let's build a proper abstraction for
that.

Entry: Real problem: no structs
Date: Sun Apr 3 23:34:50 EDT 2011

Essentially: no local, late-bound namespaces.  They can be emulated,
but this requires global names.  Is that a problem?
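The "current object" idiom from the USB entry above is really just
dynamic binding, which in Scheme is a parameter.  A sketch of the
host-side analogy, with all names invented for illustration:

  ;; *BD* corresponds to setting the parameter; words like STAT/CNT
  ;; become field reads relative to the current object.
  (define current-bd (make-parameter #f))   ;; plays the role of the a reg
  (define (bd-field n)                      ;; a-relative field address
    (+ (current-bd) n))
  (define (with-bd addr thunk)              ;; save/restore = parameterize
    (parameterize ((current-bd addr))
      (thunk)))

  ;; e.g. (with-bd ep0-out-addr (lambda () (bd-field 1)))  ;; CNT address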
Entry: PIC18 extended instruction set
Date: Wed Apr 6 21:11:32 EDT 2011

It's a pain in the ass because it forces you to choose.  Damn
premature optimization!  For C it's a no-brainer.  For Staapl PIC18
Forth however, I don't know whether it's a good idea, because it
effectively splits the code into two different platforms.  The
"current pointer" is too useful not to use.  So maybe from an
organizational point of view it's best to not use it at all; then all
PIC18 targets can use the same code.  So, to move the USB driver
forward: I need a fast implementation of indirect data access, or a
different approach for accessing the endpoint registers.

Entry: USB endpoint register access
Date: Wed Apr 6 21:25:06 EDT 2011

The simplest approach seems to be to create a word for setting the a
register to the correct endpoint window, and have an ep-command word
that sets the size and flags.  The buffers only need to be set once
at startup.  The good thing is that the status and count are right
next to each other:

  STAT CNT ADRL ADRH

So this is really just a!! followed by !a+ !a+ for the basic count &
status access.  Probably need to write CNT first though, since STAT
can cause an action: setting UOWN transfers the buffer to the USB
hardware, which causes a transmit on an IN endpoint.

Entry: a!! bug
Date: Wed Apr 6 21:40:54 EDT 2011

The a!! macro had lo/hi swapped.  This only happened for literals.

  hunk ./staapl/pic18/pic18-macro-unit.ss 246
  - (([qw lo] [qw hi] a!!) ([_lfsr 2 hi lo]))
  + (([qw lo] [qw hi] a!!) ([_lfsr 2 lo hi]))

Entry: Streams are cool
Date: Wed Apr 6 22:32:53 EDT 2011

In my day-job embedded work I'm moving from data structures to
streams a lot.  The main reason is to avoid intermediate lists.  My
favourite abstraction is for-each with early abort.  In C this needs
to be combined with the dual stream interface (open / rewind / next /
eof), because for-each can't be easily inverted (i.e. using partial
continuations).

Anyways.  Maybe this is also true for Forth, but then one level
deeper: treat structs as streams, because indirect addressing
(structs) is a bit awkward to use: no support for separate
namespaces.  Doing so opens up the door to channel-based
multiprogramming, as a streamed structure read or write can easily be
replaced by a channel connecting producer and consumer.  Essentially
this is the same transformation as going from named to nameless
arguments: use position instead of names to encode meaning.

I think a big idea is hidden here.  Streams and state machines...
Actually it's not such a big deal: when memory is scarce, you
implement functionality as communicating state machines.  Streams and
their associated parsers and printers are state machines, or state
machines embedded in push-down automata (state machine + stack) or
2-stack/tape machines.
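In Scheme the early-abort variant is just an escape continuation
wrapped around for-each; a minimal sketch (the name and the #t/#f
convention are mine):

  ;; for-each that stops as soon as f returns #f; returns #t if the
  ;; whole list was traversed, #f on early abort.
  (define (for-each/abort f lst)
    (call/cc
     (lambda (abort)
       (for-each (lambda (x)
                   (unless (f x) (abort #f)))
                 lst)
       #t)))

  ;; e.g. stop printing at the first odd element:
  ;; (for-each/abort (lambda (x) (and (even? x) (begin (display x) #t)))
  ;;                 '(2 4 5 6))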
Entry: A clean usb.f
Date: Thu Apr 7 00:24:04 EDT 2011

So I copied the initial (layer 0?) stuff from _usb.f.  I'm using the
following syntax for initializing the buffer descriptors.  The first
word sets the a register, and the rest just stores incremental bytes:

  IN0:
      #x08 !+     \ clear UOWN, MCU can write
      64 !+       \ buffer size
      buf-IN0 !+  \ addrl
      buf-page !+ \ addrh

What I miss is "emit".  It doesn't work properly.  I can't do much
without basic debug output, so fixing that is the next task.

Entry: Fixing emit
Date: Thu Apr 7 00:26:25 EDT 2011

Let's use different addresses to distinguish between return and call.
I.e. sending to 0xFF causes the host command to terminate, while
sending to 0xFE causes a command to be executed.  Problem.  With:

  : bar 65 emit 65 emit ;

  bar
  icsp:send h:#t b:(0 3 3 214 2)
  icsp-recv: h:#t a:#f b:(1 255 1)
  icsp-recv: h:#f a:#t b:(65)
  icsp-recv-message: (255 1 65)
  Aicsp:send h:#t b:(0 0)
  icsp-recv: h:#t a:#f b:(1 144 0)
  icsp-recv-message: (144 0)
  icsp:send h:#f b:(0 0)
  icsp-recv: h:#t a:#f b:(2 0 0)
  icsp-recv-message: resync: 10
  icsp-recv-message: resync: 1
  icsp-recv-message: (0 0)
  OK

There seems to be a collision when both sides are writing on the bus.
Saved scope capture as ~/staapl/NewFile1.wfm.  How to view the
waveform?  What I see is:

* host sending 0 3 3 x x
* target sending 255 1 65, host sending 2 pulses, where target
  replies ack 1,0
* host sending out 16 pulses, target replies 0 (receives!)
* host and target both send data

The trouble is at x: the target sees the pulses and performs an ack,
but the host doesn't seem to see the ack?  Every poll then clocks the
device while it's in a 2 byte read.  When that is finished it
replies.  Very strange.  Let's look at the transmit part.  It misses
the sync bit.  Why?  Does it actually ack when it receives a zero
packet?  It doesn't, so I changed that.  This has no effect though.
It doesn't see that pulse...  Damn.  Why?  BUG in PK2?  FOUND IT: in
pk2-in, an icsp-ack was sent while the internal message-oriented code
in icsp.ss already performs the handshake.

Entry: Resync
Date: Thu Apr 7 14:37:19 EDT 2011

Now ts doesn't work any more.  All commands that receive data from
the host stopped working after removing the icsp-ack call from
pk2-in.  Problem was that "emit" and "reply" are not the same thing.
"emit" needs a separate ack to keep the request-response cadence
going.  The last problem seems to be the "kb" word.  This was also
still calling emit.  Fixed.  It's a lot faster now that the sync
issues are resolved.  Fast is good!

Tss... now I get this:

  racket pk2-picstamp.dict
  Connecting to PICkit2.
  datfile: /usr/local/bin/PK2DeviceFile.dat
  iProduct: PICkit 2 Microcontroller Programmer
  command-made-roundtrip: 0 ()

I think I found it: the target has just set the ack bit and we miss
it because we clock out the next..  Something fishy going on with the
handshake.  At times I see a third clock pulse.

Entry: Better debug tools
Date: Thu Apr 7 20:38:40 EDT 2011

1. Point-and-shoot a word: find its definition if it appears in the
   console.
2. Get at backtraces.

Entry: Test-oriented programming
Date: Fri Apr 8 19:33:29 EDT 2011

So, how to make things more testable?  I'm currently hacking away at
the usb driver, lamenting the absence of breakpoints and data
inspection.  Might be time to implement those?  At first, since
updates are so cheap, simply modifying the code is quite doable.
What I need is a word that will abort execution and drop me in the
interpreter.  This isn't so hard: reset the return stack and call the
interpreter.

Entry: ICSP on 8MHz
Date: Sat Apr 9 18:19:35 EDT 2011

Doesn't seem to work.  Is this important?  Yes.  I tried it down to a
200us clock, which is ridiculously slow, so there has to be another
problem.  I can see similar issues at 15us.  Some sync issue.  It
takes the chip 25us to respond to the handshake pulse, so anything
larger than that should be fine.  Let's take period 60us.  Problem
is: both write to the bus, so it looks like they continuously get off
cycle.  The thing is, that's quite a long time.  Is it actually
running at 2 MIPS?  It's just sitting there in a tight loop of 3
cycles.  It takes 5 cycles to get from detection of the positive
clock level to setting the output.  At 0.5 us cycle time (2 MIPS)
that should be 2.5 us, not tenfold that.  Something's not right.
Section 2.4 in the data sheet: at startup, the output of INTOSC is
set at 1MHz.  That makes more sense!
Entry: New proto board
Date: Sun Apr 10 00:49:26 EDT 2011

Wired up the 2550 to ICD and USB.  The latter was quite
straightforward.  No external components required: just hook up GND,
D-, D+.

Entry: TODO PK2
Date: Sun Apr 10 00:50:34 EDT 2011

- Proper reset instead of power cycle
- Check target voltage before switching ON
- Programming
- Debugging

Entry: Debug output
Date: Sun Apr 10 11:53:59 EDT 2011

Sending messages from Flash is trivial.  How to encode it in the
source?  Does backtick still work?  No.  This needs a different
prefix parser.  I do have "fstring:" which is based on f->.  Since
this is just for debugging, it would be nice to have something more
general, something that's part of the parser.  Alternatively,
conditions could be stored on the host too.  What I really need is
tracing info: this way a single word "trace" could be inserted at a
particular point to see where execution is going.  Ok, what works:

- 0xFF : normal console logging and ack for empty message.
- 0xFE : hex dump

What I want: a trace command that allows the host to execute code in
the sync loop, i.e. to query the target.  OK.  The sync is a bit
patched together, but at least it works:

  (define (trace-hook addr)
    (printf "trace: ~x\n" addr)
    (abd 0))

  : foo trace trace trace ;
  .OK
  foo
  trace: 312
  000 F8 12 A4 ED 05 24 53 02
  008 D3 80 5A 20 C5 C2 14 0C
  trace: 314
  000 F8 12 A4 ED 05 24 53 02
  008 D3 80 5A 20 C5 C2 14 0C
  trace: 17e
  000 F8 12 A4 ED 05 24 53 02
  008 D3 80 5A 20 C5 C2 14 0C
  OK

This should enable any kind of program instrumentation at trace
points.
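The host-side dispatch implied by the address encoding above would
look roughly like this; the handler names other than `trace-hook' are
made up, only the byte values come from the entry:

  ;; Dispatch on the address byte of a message coming from the
  ;; target.  #xFF and #xFE are the encodings listed above; anything
  ;; else is treated as a trace point.
  (define (handle-target-message addr payload)
    (case addr
      ((#xFF) (console-log payload))   ;; console logging / empty ack
      ((#xFE) (hex-dump payload))      ;; hex dump
      (else   (trace-hook addr))))     ;; host code in the sync loop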
Entry: It needs to be faster
Date: Tue Apr 12 14:56:59 PDT 2011

It needs to be faster.  It takes almost 3 minutes to compile all the
code dependencies for a single .fm image.  After that it's reasonable
(just compiling forth code).  Why is it so slow, and how to make it
faster?  The thing is that once it is running, once the compiler is
compiled, it shouldn't be changed any more.  The inconvenience is
just an artefact of me constantly changing things that require a full
recompile.  Maybe getting rid of some of the bundling modules would
make it work better: have finer-grained module boundaries.

Entry: Forth is not C
Date: Mon May 23 15:36:07 CEST 2011

C isn't all that bad.  Its main virtue is that it is quite readable
and has a stable status quo.  That stifles innovation in programming
method, but from a business perspective it brings predictability.
Forth (and the idea behind Forth) is definitely more powerful than C.
It's a bit like the story of Lisp: great power comes with great
responsibility.  In practice it seems that toning down programmers by
_limiting_ expressiveness isn't always a bad idea.

( I'm still really a hacker - a problem solver, not a manager.
However, recently I've had the pleasure of working with a very good
project manager, and I'm starting to see some things that were not
part of my world before.  All of them have to do with making money
through preventing loss. )

Entry: The Forth community
Date: Mon May 23 15:52:17 CEST 2011

As I mentioned elsewhere here, I don't like the fundamentalism of the
Forth community[1], AND the embedded community, when it comes to
language innovation.  The thread does mention that Dear Chuck is OK,
with which I agree completely ;)

[1] http://news.ycombinator.com/item?id=2574204

Entry: Forth as an interface
Date: Mon May 23 16:07:11 CEST 2011

The good thing is that it has manual fanin/fanout specs, so the
compiler doesn't need to do that (very hard!) re-arranging part.
Maybe Forth as a frontend really isn't such a bad idea.  This is
essentially Factor[1].

[1] http://factorcode.org/

Entry: Metalanguages are functional
Date: Sat Jul 16 14:35:41 CEST 2011

What I thought to be the big insight in Staapl is that the
metalanguage (macro language) of an imperative Forth language can be
made purely functional.  What I did not understand at that time is
that this is also part of the idea of using monads in Haskell[1]:

  While programs may describe impure effects and actions outside
  Haskell, they can still be combined and processed ("compiled")
  purely, inside Haskell, creating a pure Haskell value - a
  [computation description] that describes an impure calculation.

What is similar between monads and Staapl's associativity, which
allows compile time and run time to be arbitrarily separated, is that
in both cases the awareness that we're dealing with a meta-level
"disappears a bit".  Of course, Staapl's macro language is
dynamically typed, so there are no static type signatures to go by.
Maybe it would be a nice exercise to make all that hidden structure
more explicit.

[1] http://www.haskell.org/haskellwiki/Monad

Entry: Running synth with PK2
Date: Tue Sep 27 20:08:14 EDT 2011

Problem: probably need to disable interrupts during PK2 bitbang.  The
PIC waits in the following routine:

  : icsp-sync \ -- : sync on rising clock edge
      begin icsp-clock low?  until
      begin icsp-clock high? until ;

Is it enough to do this?

  : icsp-sync \ -- : sync on rising clock edge
      sti
      begin icsp-clock low?  until
      begin icsp-clock high? until
      cli ;

Entry: PORTB / ICSP comm?
Date: Thu Oct 6 15:46:08 EDT 2011

Is this the culprit?

  : init-out
      TRISB 2 low
      TRISB 3 low ;

Nope.  ICSP pins are RB7-RB5.  It seems to really be the interrupts.
Whenever I switch on one of the timer interrupts, the comm gets
messed up.  It probably misses pulses.  I seem to recall that the
ICSP hardware's pulse size can't be changed, but I believe it does
keep the data stable after the pulse.  Is there a way to let the
hardware detect the pulse?

  PORTB 6

There is the interrupt-on-change mechanism.  The question is whether
it is worth spending time on this.  Let's just briefly look at the
IOC mechanism.

Entry: PORTB IOC
Date: Thu Oct 6 16:01:14 EDT 2011

A change on RB7-RB4 sets RBIF (= INTCON:0).  To ack we need to read
PORTB to end the mismatch condition, and clear RBIF.

  : rbif-ack PORTB @ drop INTCON RBIF low ;
  : rbif INTCON @ 1 and ;
  : rbif-test rbif-ack rbif ;

Tried this, no success:

  \ This doesn't actually wait for a clock pulse, but for a change on
  \ *ANY* of RB7-RB4.  This is a hack to work around failure to detect
  \ short pulses due to interrupts in the busy loop.  Doesn't seem to work.
  : icsp-sync.hack \ .hack
      begin icsp-clock low? until
      PORTB @ drop INTCON RBIF low    \ ack RBIF
      begin INTCON RBIF high? until   \ wait for change on RB7-RB4
      ;

Entry: Console working with app that uses interrupts.
Date: Thu Oct 6 16:39:40 EDT 2011

Test case: synth.  Seems like there are a couple of routes:

- Just use the console to set up some stuff and call `main'.  This
  works fine, but the console is dead of course.
- Make some kind of CTRL-C command to stop the app.
- Run the console from interrupt, at least the byte-input part.  This
  might be a bit of work.
- Use the ICSP debugger stuff[1].

[1] entry://../electronics/20110320-225422
Entry: Problem: Data Structures
Date: Mon Oct 31 23:50:55 EDT 2011

- data structures vs. "protocol oriented" programming.

I was thinking that maybe Staapl should be about minimizing code
size.  A single-purpose language.  Currently the lack of lexical
variables makes working with data structures quite a challenge,
i.e. the USB driver horror.  I'm not sure yet if this is really just
a namespace issue.  It's funny how this arises only with externally
specified protocols.  When I write my own stuff I get away with
representing things as code and actions.

Entry: Preparing for release
Date: Sat Nov 5 09:49:53 EDT 2011

Need to fix:

- libusb                      DONE
- hex printing                DONE
- proper reset                DONE, 'cold' still works
- reliable ping               +- DONE.. pk2 seems to get stuck sometimes
- reliable pk2 reset          ???
- record definitions to file  DONE
- documentation

Entry: hex printing
Date: Sat Nov 5 13:18:19 EDT 2011

So emit can do strings, but it's probably best to also allow a hex
printing mode, to avoid having to waste much time on the PIC for
trace logging.  Actually, it's already there.  I changed the encoding
so row/plain is just one bit: FC FD.  Maybe this should be changed
however, to use one extra byte and use only the #xFF address for host
calls, instead of encoding host calls in the host address space.

Entry: Reliable ping
Date: Sat Nov 5 13:52:51 EDT 2011

When the target is sending stuff, i.e. in a print loop, it will look
as if it is responding to pings.  This means the pings are too
simple.  It still gets stuck sometimes:

  foo
  01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F 10 11
  12 13 14 15 16 17 C-c C-c
  Command "foo" interrupted.
  Trying cold restart...
  target-off
  target-on
  recv-header: malformed header: (0)

but this seems to be due to a stack underflow on the device, using:

  : foo 1 + dup dump foo ;

With a correct def it doesn't behave like that:

  : foo 0 begin 1 + dup dump again ;

I do get this error still:

  bad-reply-id: 0 ()

That latter one seems to be recoverable by another retry.  This one
however seems to indicate that the pk2 is stuck:

  icsp-recv: pk2 read: expected 3 bytes, got 1:
  icsp-recv: b:2 h:#t a:#f -> (0)

Even the stat command doesn't give good results:

  stat
  (status
   (0 "Vdd GND") (1 "Vdd") (0 "Vpp GND") (0 "Vpp")
   (0 "VddError (Vdd < Vfault)") (0 "VppError (Vpp < Vfault)")
   (0 "Button Pressed") (0 "Reset since READ_STATUS")
   (0 "UART Mode") (0 "ICD transfer timeout/Bus Error")
   (0 "Script abort - upload full") (0 "Script abort - download empty")
   (0 "RUN_SCRIPT on empty script") (0 "Script buffer overflow")
   (0 "Download buffer overflow"))
  (voltages (0.000152587890625 "Vdd") (3.425 "Vpp"))
  subbytes: ending index 65 out of range [1, 64] for byte-string:
  #"@\254\0@\4\0 \376\200\3\1\0\304\37\200 \0\0\361\a$\b\0@\374\1\n\2\0\210?`A\0\0\361\a0\b\0@\374\1\r\2\0\20\177\200\203\0\0\3...

Maybe it would be good to find out how to reliably reset the pk2?

Entry: Record definitions from interactive session
Date: Sun Nov 6 09:07:23 EST 2011

It seems really hard to save the state due to the possibility of
macros.  So what about just logging the input to a file for each
successful compilation / macro def?  This requires a patch to the
interactive parser in live/commands.ss.  So I've added an eval log,
which seems most appropriate as it contains validated syntax.
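Something like the following is all that's needed on the logging
side; the parameter and function names are made up, the actual patch
lives in live/commands.ss:

  ;; Append each successfully evaluated console line to a log file,
  ;; so an interactive session can be replayed later.
  (define eval-log-file (make-parameter "session.log"))
  (define (log-eval! line)
    (with-output-to-file (eval-log-file)
      (lambda () (displayln line))
      #:exists 'append))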
Entry: PK2 slave?
Date: Sun Nov 6 10:56:35 EST 2011

I miss a non-interfering console log.  It would be nice if it were
possible to put the pk2 into slave clock mode, such that the pic
doesn't have any restrictions on how fast it can send the data.
Currently it seems to be problematic to do a bit-banged interface
with a slave clock on the pic when there is an ISR active, i.e. as
with the synth.  What about this:

- pic master clock, only 1-directional send.
- pk2 can interrupt pic by raising a particular line.

Looks like I2C and SPI are only supported with the PK2 set as master.
Without a gigantic hack this is probably not going to work..  What
about making an ICSP interrupt handler?  I looked at this before, but
it doesn't seem to be so simple because of the change-detect pins.
However, the on-chip debugger unit uses this approach.  Maybe the
right approach is just to implement that debugger interrupt[1].  The
real solution is a protocol that has no timing issues, meaning each
event needs to be acknowledged.  The simplest protocol I can think of
is here[3].  Can the PK2 do that?

[1] entry://../electronics/
[2] http://jaromir.xf.cz/hdeb/bdm/bdm.html
[3] entry://../electronics/20111106-131545

Entry: Synth: chaotic oscillator
Date: Sun Nov 6 10:59:39 EST 2011

I thought I had some kind of fake[1] chaotic oscillator running..
How did it work?  Looks like I lost some code somewhere..  Let's see
if I can reconstruct it with the current design.  This needs 3
features:

- main periodic oscillator
- a "resonance" which is the timeout from start -> end
- a random pulse that triggers the reso

This seems to conflict with the reso mixer, as the noise osc is OSC1
and not OSC0:

  OSC0 xor (OSC1 and OSC2)

This fixes OSC1 and OSC2 to be gate and reso, and OSC0 to be chaos.
Maybe the mixer should be changed to support this again?  It would
also be good to dig up the old code, since there have been quite some
archive changes over the years..

[1] entry://20071117-022751

Entry: Problem with synth + pk2 solved.
Date: Sun Nov 6 14:34:07 EST 2011

There were actually 2 problems: interrupts + PORTB config.  Culprit
was this:

  : init-ports-digital
      #x61 TRISB or!      \ analog 4 is RB0 + RB5-RB7 are digital in (RB7 = icd tx)
      INTCON2 RBPU low ;  \ enable weak pullups for port B (switches)

The following dump routine now works:

  : _dump_ cli dup dump sti ;

Entry: Documentation
Date: Mon Nov 7 11:01:04 EST 2011

The `pic18>' macro in the demo is broken, meaning that the
documentation won't generate properly.

Entry: pic18/demo
Date: Mon Nov 7 11:11:00 EST 2011

Needed to fix some code in pic18/demo that's used in the
documentation, after the unit refactoring early this year.

Entry: state machines
Date: Mon Nov 7 16:44:03 EST 2011

I'm thinking about what to do next with the synth.  The main problem
is that I don't have a good way to deal with state machines and
building dataflow networks, or at least event sending..  State
machines can be well represented using python-style coroutines, which
are essentially threads with only one cell, meaning that yield can
only be called at the entry level.  It would be nice to have some
global compiler support for doing jump table allocation for this
mechanism, since RAM is so expensive and most state machines can do
well with 256 states.  The problem then is how to find "islands of
control" that can use disjunct control points?  It might be best to
do this with an extra context, as we also need to enforce that yield
is not used in library routines.  Some notes (a toy model of the
yield mechanism follows the list):

- Islands of control: collect words that are separated by tail calls
  only.  Doing this automatically is possible, but would require it
  to be done on a low level.

- Explicit declaration might be better, something like: begin-sm
  ... end-sm.  This way the context can be stored in a threaded
  compiler state, and we can enforce that all calls inside such a
  section are actually tail calls.

- Python-style generators/coroutines do not allow calling `yield'
  from a nested context, which is exactly the point here, since we
  want to save memory and resource allocation hassle.  However, in
  Staapl, nesting that doesn't use any data state can still be done
  using macros, which would expand into multiple control points.

- It needs an extra compiler state.  It might be better to first make
  compiler states composable, or provide a way to attach some state
  for later extension.
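A toy Scheme model of what the compiled mechanism would amount to:
the whole "thread state" is one small index into a jump table of
control points, which is what makes it cheap in RAM.  All names here
are mine:

  ;; One-cell coroutine: yield stores a resume-point id, resume
  ;; dispatches through the jump table.
  (define state 0)
  (define (yield k) (set! state k))            ;; store resume point
  (define table
    (vector (lambda () (display "on ")  (yield 1))
            (lambda () (display "off ") (yield 0))))
  (define (resume) ((vector-ref table state)))

  ;; (resume) (resume) (resume)  => prints "on off on "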
Entry: The Staapl compiler is an Arrow
Date: Mon Nov 7 18:15:43 EST 2011

In spirit that is, not in interface.  It behaves as one, and it would
be great to be able to implement it as one.  Arrows in Haskell use
binary tuples for plumbing.  What I need is a mechanism that can
identify components by name.  This would kill modularity somewhat
(i.e. you can't arbitrarily nest the same types), but would make
things a lot easier to work with.  The other idea is that instead of
using names, as in what I'm looking for, or arbitrarily nested binary
tuples, as in the Haskell Arrow class, the states could be nested as
stacks also.  This might give a more appropriate "stacks of stacks"
approach.  Anyway, let's first summarize what there is at this time:

- Compilation stack: code compilation and partial evaluation
- Control stack: local control words: if, then, begin, again
- Macro stack: implements ';' in macro nesting.
- Chain stack: collects basic blocks.

The first 2 are called '2stack' and are defined in ...  The other 2
are defined in staapl/coma/state.ss as a struct derived from 2stack
with two extra components.  Let's see if I can add a coroutine stack
in there without too much trouble.  Maybe just adding a hash is
already more than enough to keep the extension in a module.

I've added an 'ext' field to the 'compiler' struct in comp/state.ss
and adjusted the matchers and constructors appropriately.  Seems to
work just fine.  Now how to access?  I see I did my best to go
creative with the `state-update' macro.  This uses the
`mu-lambda-struct' from vm.ss, a shorthand notation for machine
register updates, where assignment and matching are expressed by
`->'.  Maybe I should make a list called "goldmine of weird ideas" ;)

I'm having deja-vu now from the time when I wrote this.  Conclusion:
the state mechanism is already extensible through the struct
derivation method:

  stack -> 2stack -> compiler

These support automatic lifting / polymorphism, i.e. you can pass a
derived struct to a basic op and it will update only the derived
field.  So yes, it should be possible to use the extension mechanism
that's already there.  However, it would be difficult to do this
using the current modular mechanism, in that the state constructor
would need to be extended in a way that seems to interfere with the
modularity constraints..  So let's implement it as a hash table with
local names, i.e. not symbols but something that can be globally
unique.
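The "globally unique local name" part is easy to get with uninterned
symbols.  A sketch of what the tag accessors could look like; the
`state-tag-*' names match the ones used in the SCR code below, but
the implementation here is a guess, and `compiler-ext' /
`compiler-update-ext' are assumed struct accessors:

  ;; Sketch: compiler state extension as an immutable hash keyed by
  ;; uninterned symbols.  An uninterned symbol is eq?-unique, so two
  ;; modules using the same printed name cannot collide.
  (define (state-tag-label name)
    (string->uninterned-symbol (symbol->string name)))

  (define (state-tag-ref state tag)
    (hash-ref (compiler-ext state) tag #f))

  ;; Curried, like the other state transformers: returns state -> state.
  (define (state-tag-set tag val)
    (lambda (state)
      (compiler-update-ext state
        (hash-set (compiler-ext state) tag val))))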
Entry: General cleanup
Date: Tue Nov 8 13:12:58 EST 2011

- Error messages are horrible.  Some examples:
  - duplicate definition -> where's the first one?
  - non-null-compilation-stack: (#) -> say where the context is
    opened, i.e. the location of the non-matched "for"
- .f files are not in the dependencies for compilation of a .fm
  module.
- ! and @ only work for macros.  How to fix that?

Entry: Some sounds
Date: Tue Nov 8 13:47:44 EST 2011

Can play with this:

  : 2execute \ lo hi --
      push TOSH ! TOSL ! ;
  : play
      init-board engine-on 2execute engine-off ;

  \ DEMO
  3 ' z2 play
  30 ' z1 play
  20 ' z0 play
  ' wioew play
  ' rrxmod play
  1 ' nzwioew play
  ' iiuu play
  ' woe play
  3 ' iiuwoe play
  128 ' pattern-sequencer play

Entry: Where's the other code?
Date: Tue Nov 8 14:41:46 EST 2011

I had some other code with synth patches and a sequencer.  Where did
it go?  I found them in the brood-4 archive.  Looks like I just
forgot some files when porting from 4 to 5 last time.  So I copied
some files over.  Looks like demo.f is what was used for the Piksel
performance, and it is more advanced than what is in synth-control.f.
I also found the bassdrum / hihat stuff in sounds.f.  And indeed,
after a couple of minor changes, it still works!

  128 ' pattern-sequencer play

Entry: PK2 sync
Date: Wed Nov 9 00:47:15 EST 2011

I wonder if it would be possible to change the sync part to a
level-triggered interrupt, maybe using the same protocol as is used
in the ICD, which is to cause a 1->0 transition on the RB6 pin
(clock).  This means that we need to keep the bit high in idle, and
lower it to signal an interrupt.  I think this even works with the
current protocol:

  pulse 0:  target writes 1 to ack, otherwise line pulls low.
  pulse 1:  target releases bus
  pulse 2+: host writes packet, then waits for reply

So instead of expecting a reception after the first pulse, the host
will poll until it sees a 0->1 transition from the target, which
acknowledges the interrupt.  From then on, the fast protocol can be
switched on.

Entry: Serializing thread state
Date: Wed Nov 9 10:40:38 EST 2011

I'm thinking about the shallow coroutine (SCR) approach to encoding
code that's formulated using recursion into a compact state-machine
representation.  Currently there are 2 ways to go about this:

- Implement 1-SCR using state ID allocation, jump table generation
- On top of this, implement n-SCR by control flow analysis.

Maybe the flow analysis really isn't necessary.  As long as the
"recursion" is factored out as macros, there is no real issue, since
this can just use linear allocation of control point IDs.  The code
turns out to be quite small and straightforward.  It is amazing how
such a minimal change can have such a big impact.

Entry: Wow
Date: Wed Nov 9 15:08:55 EST 2011

I'm losing it.. going too creative with inventing new syntax...  What
I want to do is:

- make `make-target-label' available to Scheme code, to make it
  easier to construct control flow abstractions in Scheme using
  lexical scope, instead of using m stack juggling.

- make some syntax for local binding of the compiler state variable,
  to make extended compiler state available as lexical bindings.  The
  reason is that the following is too verbose:

    (compositions (macro) macro:
      (word ,(lambda (state) ((macro: ......) state))))

- singleton coroutine or multiple instances?  (where to bind the
  state var?)

  ;; The SCR state is accessible through a compiler state extension tag.
  (define scr-tag (state-tag-label 'scr))

  ;; STATE REP
  (define-struct scr-context (var yield))
  (define-struct scr (ctx labels))

  ;; ACCESSORS
  (define (ref s)     (state-tag-ref s scr-tag))
  (define (ctx s)     (scr-ctx (ref s)))
  (define (yield s)   (scr-context-yield (ctx s)))
  (define (var s)     (scr-context-var (ctx s)))
  (define (labels s)  (scr-labels (ref s)))
  (define (next-id s) (length (labels s)))

  ;; MUTATORS, all curried to produce state transformers.

  ;; Add label to SCR jump table state.
  (define (set v) (state-tag-set scr-tag v))
  (define (update label)
    (lambda (s)
      (match (ref s)
        ((struct scr (ctx labels))
         ((set (make-scr ctx (cons label labels))) s)))))
  (define (init-state var)
    (set (make-scr (make-scr-context var (make-target-label)) '())))

  ;; Convenience: bring state in lexical context of macro, and apply
  ;; macro to state.
  (define-syntax-rule (let/state state macro)
    (lambda (state) (macro state)))

  ;; Bound to macro:  This has a colon as ad-hoc separator to
  ;; indicate that state is a binder.
  (define-syntax macs
    (syntax-rules (:)
      ((_ state : . words)
       (let/state state (macro: . words)))))
Entry: Shallow Coroutine State
Date: Thu Nov 10 08:22:03 EST 2011

So the question is: where to store the address of the state variable?
If it is not a singleton coroutine, it can't be in the code itself,
only in some wrapper word, or passed explicitly to the SCR
entry/resume point.  What about storing it on the r stack?

Entry: Conditionally compiling target words
Date: Thu Nov 10 08:34:53 EST 2011

Macros are free.  Nobody cares if they're there, except for the
namespace pollution.  Words are not, so how to solve the problem of
conditionally compiling some target code that supports a macro
collection?  Looks like I already need this functionality for '!'.

Entry: Python coroutines
Date: Thu Nov 10 23:00:35 EST 2011

A generator suspend is a return, but without a decref of the stack
frame, so it includes arguments and locals.

[1] http://mail.python.org/pipermail/python-dev/1999-July/000467.html

Entry: Fix documentation
Date: Sat Nov 12 12:08:50 EST 2011

Find out how to build it outside of racket.  This generates
file:///home/tom/staapl/staapl/scribblings/staapl.html :

  cd ~/staapl/staapl/scribblings
  scribble staapl.scrbl

Things to fix:

  > (macro +)
  (word # '((((qw a) (qw b) +) ((qw (tv: a b +))))
            (((addlw a) (qw b) +) ((addlw (tv: a b +))))
            (((qw a) +) ((addlw a)))
            (((save) (movf a 0 0) +) ((addwf a 0 0)))
            ((+) ((addwf POSTDEC0 0 0)))))

Not such a big deal, but I don't see why this isn't just
#state->state.

  > (print-code (macro: add))
  reference to undefined identifier: POSTDEC0

POSTDEC0 is not exported by demo.ss.  It is in the pic18/sig.ss
signature module, signature pic18-const-id^.

  > (target-value->number (tv: 1 2 +))
  reference to undefined identifier: tv:

I moved the definition from pic18-macro-unit.ss to target-scat.ss.

  (define stx1 #'(rpn-lambda
                  (macro-push 1)
                  (macro-push 2)
                  (scat-apply (macro +))))

  > (pretty-expand stx1 expand-once)
  (rpn-lambda (macro-push 1) (macro-push 2) (scat-apply (macro +)))

  > (pretty-expand stx1)
  eval:53:0: macro-push: (macro-push 1) did not match pattern
  (macro-push val p sub) in: (macro-push 1)

This was a missing provide of "rpn.ss" ids in "demo.ss".  Looks like
it's done.

Entry: Documenting the new Forth -> Sexpr compiler
Date: Sat Nov 12 12:38:03 EST 2011

Let's first do this based on the macro stepper.  Hmm..  Looks like I
need to look at the source to see how this works.

Entry: Macros don't support quote
Date: Sat Nov 12 13:17:54 EST 2011

This doesn't work:

  (let ((p ....))
    (forth-begin path ,p))

The problem is in macro-forth-sig.ss.  I have a phase problem.  How
to get the string "pic18", represented by a value, injected into the
body of the macro?

  (define-syntax forth-begin
    (lambda (stx)
      (syntax-case stx ()
        ((_ . code)
         #`(begin
             (mf:forth-begin
              path #,(build-path (home) "pic18") ;; library path
              . code)
             (mf:compile!))))))

The word after 'path' is always parsed as a literal.  What I need is
a form like:

  "asdfasdf" path!

Hmm.. then even still.. it's a weird kind of phase mixing!  Find out
what the real problem is here.  It looks like this really needs to be
inserted as a compile-time entity, not a run-time entity.  That's why
it can't be done in the signature definition: macros in a sig def can
only depend on identifiers in the signature.  The underlying reason
for this needing to be a compile-time entity is that the file search
path representation is also a compile-time entity, so the let form at
the top of this post is meaningless: the binding doesn't exist at
expansion time.  I fixed the issue by moving the '(library "pic18")
form to pic18/lang.ss, and added a stub to insert that term in the
live/command.ss interpreter by means of `forth-begin-prefix', which
can insert the same.  Now forth-begin is generic.
Entry: Next?
Date: Sat Nov 12 15:21:21 EST 2011

Done for now:

- broken doc fixed
- removed some dead code after the "forth as unit" refactoring
- removed pic18 reference in the generic forth macro parser

Next?

- pulling in word defs when a macro is used
- ! and @
- SCR

Entry: Pulling in word defs
Date: Sat Nov 12 18:12:25 EST 2011

There is currently no dead code elimination for globally defined
words.  Maybe this should be done differently.  The current ad-hoc
way of building control flow graphs is a bit of a hack..  It also
doesn't include jump tables.  Maybe the (lack of) language semantics
is just a bit too low level.  So, how can I implement this without
getting into muddy waters?  One way is to build an extended
compilation state that records a reference to the word (present /
not), and adds a definition to another word chain when it doesn't
find a definition, just like org-begin and org-end do.

Entry: Property based testing
Date: Sat Nov 12 18:41:36 EST 2011

How to make property based testing work for the peephole optimization
rewriting?  A sketch of the idea is below.
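The obvious property is semantics preservation on random
straight-line programs.  A sketch, where `run' (a stack-machine
evaluator) and `opt' (the peephole pass) are hypothetical stand-ins
for the real entry points; a real version would also have to handle
stack underflow in the generated programs:

  ;; Generate random straight-line stack programs and check that the
  ;; optimizer doesn't change their meaning.
  (define ops '(1 2 3 dup drop +))

  (define (random-program n)
    (for/list ((i (in-range n)))
      (list-ref ops (random (length ops)))))

  (define (check-peephole (size 20))
    (let ((p (random-program size)))
      (unless (equal? (run p) (run (opt p)))   ;; run, opt: hypothetical
        (error 'check-peephole "counterexample: ~s" p))))

  ;; e.g. (for ((i (in-range 1000))) (check-peephole))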
- PIC
- AVR
- MSP430: only MSP430F2013

Looks like MSP430 is not really worth it: only one "sugar me" DIP
package..

Entry: State machines & register allocation
Date: Wed Jan 11 10:59:17 EST 2012

I've been playing with Haskell a bit lately, writing a state machine
compiler (essentially SSA-form code without recursion: the original
form is only letrec-style mutual tail recursion).  This makes me think
about a certain disconnect between the approach used in Staapl
(stacks = local variables) and the state machine approach that's so
useful for embedded systems.

There isn't really a good way in Staapl to make this kind of thing
work well, due to the lack of structure namespaces, i.e. separating
the 'instance' and 'member' concepts.

All the talk of protocol-oriented programming does seem to have
something underlying it (minimise state in data processing by focusing
on streams), but in practice data structures tend to be
buffer-oriented, not stream-oriented, probably due to a bias of the C
programming language.  Stream-oriented work requires protocol design.
If the protocol is fixed, the minimal memory usage is limited by
(ad-hoc) protocol decisions and random data dependencies.

Concretely: I'd like to solve the USB protocol parser problem in a
smart way, but this really requires either buffering and structure
(namespace) support, or a botched stream-oriented approach that tries
to work around the limitations of the protocol.

I was thinking that *if* the intention is to compile down to a bunch
of global variables, then it's possible to just use a modified kind of
macro.  The Staapl macros have lambda, and I bet it isn't so hard to
turn this into global memory references in an instantiation phase:

- function abstraction: name binding only: associate each argument
  with a "var @" macro.
- function calls: pop arguments from the stack and store them into the
  associated variables.  This requires knowledge of arity, but that
  could be recorded at definition time.

This requires a "lazy argument" macro, where each argument is
interpreted as an inline function instead of a literal value.  A macro
like

  : foo-macro a b c | ... ;

would be instantiated to a function foo-code by binding its arguments
to variables:

  : foo-code [ v1 @ ] [ v2 @ ] [ v3 @ ] foo-macro

and each call to 'foo' would be implemented as a macro or function

  : foo v3 ! v2 ! v1 ! foo-code ;

The purpose of this approach is to isolate naming (just lambda
arguments) from storage allocation.  This technique is probably useful
in general to allow "flattening" of stack allocation.

[1] entry://../meta

Entry: Next?
Date: Fri May 4 18:38:18 EDT 2012

Been a while.  It's starting to itch again.  Looking at those USB
connectors in my rack here..  Maybe it's time for another pass.  Get
the USB driver going and make a synth controller.  Why in that order?
The USB would seriously simplify interface issues, and is long
overdue.  I wonder what Staapl could have been were it not for the
failure to have a working USB interface.

However, the reasons for that impasse are quite deep, seeded in doubt
about "the right approach".  I still like the idea of an untyped macro
language, but currently it lacks something that is completely trivial
in C: hierarchical data structure namespaces.

There is still the idea of "protocol oriented programming" or the
"minimal complexity stream parser approach", whatever name it should
bear, but unfortunately that doesn't work so well with existing
protocols like USB, which are based on random access to flat memory
buffers, an approach that is quite biased towards C.

So, to summarize:
- formalize "minimal complexity stream parser protocol design"
- implement hierarchical namespaces
- get that damn USB driver to work

Reading the previous post, it seems that at least the idea of
protocol-oriented programming is a bit stable.  Maybe time to figure
out if it actually makes sense practically.  Theoretically at least I
see a whole bunch of complexity disappear if protocols are designed
better, or even automatically.

And the solution for adding namespaces had crossed my mind too:
there's lambda to introduce local names.  Much more is not needed;
nesting those will do just fine (a sketch of the flattening below).
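A sketch of that flattening, nothing Staapl-specific: a "namespace" is
just a name prefix, and nesting bottoms out in byte sizes and absolute
addresses.  The layout and the base address here are invented for the
example:

  (define (flatten-fields base prefix fields)
    (let loop ((fs fields) (off 0) (acc '()))
      (if (null? fs)
          (values off (reverse acc))
          (match (car fs)
            ((list name (? number? size))      ; leaf: size in bytes
             (loop (cdr fs) (+ off size)
                   (cons (cons (format "~a~a" prefix name) (+ base off))
                         acc)))
            ((list name (? list? sub))         ; nested scope: recurse
             (let-values (((sz entries)
                           (flatten-fields (+ base off)
                                           (format "~a~a." prefix name)
                                           sub)))
               (loop (cdr fs) (+ off sz)
                     (append (reverse entries) acc))))))))

  (define-values (total-size env)
    (flatten-fields #x060 ""
      '((setup ((request-type 1) (request 1)
                (value 2) (index 2) (length 2)))
        (state 1))))

  env
  ;; => (("setup.request-type" . 96) ("setup.request" . 97)
  ;;     ("setup.value" . 98) ("setup.index" . 100)
  ;;     ("setup.length" . 102) ("state" . 104))

In Staapl each pair would then become a "addr @ / addr !" macro pair;
the lambda binder provides the local names, this provides the storage.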
Entry: Forth is compression
Date: Sun May 6 08:47:13 EDT 2012

- compression of code size and temporary data storage (short-lived
  values) through implicit operand access.
- compression takes effort: there's no free lunch.  Writing Forth
  takes more time because it forces "good structure".  Writing "bad
  Forth" is very obvious: code size explodes.
- this idea can be taken far: stream-oriented programming: state
  machines and small tasks.

Entry: Stream-oriented machine
Date: Mon May 14 17:28:48 EDT 2012

I found a use case for it, so let's set up the basic architecture.
Instructions:

  LIT ( -- n )          copy byte from instruction stream to stack
  CPY ( n -- ... )      copy n bytes from instruction stream to stack
  EXC ( .. addr n -- )  execute C ABI
  LOK ( n -- addr n )   lookup addr/nb32bitargs

  LIT 1  LIT 4 CPY 1 2 3 4  LIT 1 LOK

Then, it would be nice to be able to define a shortcut like this:

  DOFUN ....

This way it's possible to get the forth machine through the company
management, making it do something useful first, and then extending it
with all kinds of Forth goodies ;)

Can this use the trick of loading the return stack with opcodes?  If
opcodes are both primitives and calls, this might work.

Entry: Executing a word by pushing its instructions on the return stack
Date: Mon May 14 18:13:28 EDT 2012

Why do I need it?  This way I don't need to implement code threading,
just the loading of literals on a stack.  What does this require?  A
union type that represents both primitives and sequences.  Probably,
sequences need to be abstracted.

No, what this changes is explicit "exit", where "exit" re-loads the
return stack.  Basically I have this idea in my head that the return
stack is really just the continuation, which represents an infinite
list of instructions (that might be non-deterministic, i.e. it
branches).  The return stack is then a "cache" for the head of this
list.

So, instead of using an instruction pointer, what actually happens is
that the last *PRIMITIVE* instruction on the RS will re-populate the
RS: code can only be executed from the RS, but can reside anywhere,
abstracted by the particular code representation.  That's it!
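That fits in a few lines of Scheme.  A toy model, names invented: a
word is a list of instructions, "calling" a word splices its body onto
the front of the RS, and primitives are procedures on the data stack.
There is no instruction pointer and no explicit exit; the end of a
spliced body is simply the rest of the RS:

  (define (run rs ds dict)
    (if (null? rs)
        ds
        (let ((i  (car rs))
              (rs (cdr rs)))
          (cond
            ((procedure? i) (run rs (i ds) dict))          ; primitive
            ((symbol? i)    (run (append (hash-ref dict i) rs)
                                 ds dict))                 ; call = refill RS
            (else           (run rs (cons i ds) dict)))))) ; literal

  (define dict
    (hash 'dup    (list (lambda (ds) (cons (car ds) ds)))
          '*      (list (lambda (ds) (cons (* (car ds) (cadr ds))
                                           (cddr ds))))
          'square '(dup *)))

  (run '(5 square) '() dict)  ; => '(25)

Note that tail calls come out right for free: a call in tail position
just replaces the (empty) remainder of the RS.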
Entry: Input stream vs parameter stack
Date: Mon May 14 19:52:05 EDT 2012

There's always this tension between prefix commands and postfix
commands.  The pain is that moving from prefix (input) to postfix
(parameter stack) reverses the order.  I.e.

  PRE:   call n fn a1 ... an
  POST1: a1 ... an fn n call
  POST2: an ... a1 fn n call

The order of a1 ... an doesn't matter so much in the last call, so it
can be assumed that the whole command is fully reversed.

So what is the problem I'm solving, really?  Why is this always such a
problem in Forth?  Are parsing words really that essential?  I mean,
there is the interplay between RS and DS, but it seems there's a
similar thing going on between the input stream and DS.  Is Forth
really a 3-stack machine, or a 2-stack, 2-stream machine - console
input and threaded code stream, which are also *very* similar?

Entry: Losing >r
Date: Tue May 15 21:38:17 EDT 2012

Loading instructions on the return stack causes words like >r and r>
to no longer work.  So is this a good idea then?

Entry: Notes about starting up
Date: Thu May 17 12:45:14 EDT 2012

Notes:
- Don't connect more than one PK2.  Later: allow multiple.
- Overall, it boots just fine: connect PK2 and run "make xxx.live"

Entry: Datastructures (named offsets / addresses)
Date: Thu May 17 12:50:00 EDT 2012

For getting USB to work, the first problem is handling datastructures.
The idea that came out of previous notes is to solve the namespace
problem by using positional pattern matching: bind macros to fields.
I.e. like in Haskell:

  dataGet1 (DataStructure d1 d2 d3) = d1

Note that accessors in Haskell also don't use hierarchical namespaces.
Everything is solved with modules.

Underlying idea: the power of Staapl is in the macro system, which is
essentially Scheme.  The question is then: how to represent it?  This
might need references to datastructures on the macro level, which are
eventually flattened to raw memory accesses.  Basically, this is a
wrapper around:

  0 hard-coded addresses (static memory)
  1 indirect objects

On the PIC18 the indirect case would be implemented using the "a"
register.  The question is then how to avoid contention with
operations that use "a" directly.

This is a bunch of loose ideas that doesn't quite fit together clearly
yet.  Let's go back to the basic app.

Entry: USB stuff
Date: Thu May 17 13:13:50 EDT 2012

Using M-x staapl-usb from tools.el:

  load usb.f
  include "/home/tom/pub/darcs/brood-5/staapl/pic18/usb.f"
  include "/home/tom/pub/darcs/brood-5/staapl/pic18/debug.f"
  .........................................................................OK

I believe that's the last effort.

  init-usb OK

  $ sudo tail -f /var/log/syslog
  May 17 13:14:49 zoo kernel: [ 9866.365107] usb 4-1.4: new full speed USB device using ehci_hcd and address 7
  May 17 13:14:50 zoo kernel: [ 9866.441358] usb 4-1.4: device descriptor read/64, error -32
  May 17 13:14:50 zoo kernel: [ 9866.616742] usb 4-1.4: device descriptor read/64, error -32
  May 17 13:14:50 zoo kernel: [ 9866.793137] usb 4-1.4: new full speed USB device using ehci_hcd and address 8
  May 17 13:14:50 zoo kernel: [ 9866.869508] usb 4-1.4: device descriptor read/64, error -32
  May 17 13:14:50 zoo kernel: [ 9867.049510] usb 4-1.4: device descriptor read/64, error -32
  May 17 13:14:50 zoo kernel: [ 9867.225507] usb 4-1.4: new full speed USB device using ehci_hcd and address 9
  May 17 13:14:51 zoo kernel: [ 9867.632247] usb 4-1.4: device not accepting address 9, error -32
  May 17 13:14:51 zoo kernel: [ 9867.705418] usb 4-1.4: new full speed USB device using ehci_hcd and address 10
  May 17 13:14:51 zoo ntpd[2365]: adjusting local clock by -6.269105s
  May 17 13:14:51 zoo kernel: [ 9868.112332] usb 4-1.4: device not accepting address 10, error -32
  May 17 13:14:51 zoo kernel: [ 9868.112856] hub 4-1:1.0: unable to enumerate USB device on port 4

I get -71 (EPROTO) on the old Dell machine and -32 (EPIPE) on the new
amd64 host.  I'm starting to worry that it doesn't work on USB 1.1.
Entry: maybe different approach better
Date: Thu May 17 17:52:14 EDT 2012

  broebel:~# mount -t debugfs none_debugs /sys/kernel/debug
  broebel:~# modprobe usbmon
  broebel:~# cat /sys/kernel/debug/usb/usbmon/1u > /tmp/usb1.log

[1] http://www.makestuff.eu/wordpress/?p=2537
[2] http://www.mjmwired.net/kernel/Documentation/usb/usbmon.txt

Entry: Analyzing USB traffic
Date: Fri May 18 11:41:50 EDT 2012

As usual, the effort should go to *effective* debug tools to just see
what's going on.  Then fixes are trivial.  The problem is observation.

Next:
- Event log on PIC, maybe in flash?
- Interpret USB debugging [1]

[1] http://www.mjmwired.net/kernel/Documentation/usb/usbmon.txt

Entry: Buffered console log
Date: Fri May 18 11:58:00 EDT 2012

What about this:
- Make emit buffered
- Before each handshake, dump out the emit buffer first.

Having 3 kinds of replies might be a bit too much: emit, hexdump,
non-formatted hexdump.  Also, it doesn't work everywhere... let's
clean it up a bit.  I've commented it out for now..

Main problems encountered today:
- the protocol is very ad-hoc and probably needs a central place of
  documentation
- I don't have a simple way of using fifos

Entry: USB debugging
Date: Fri May 18 14:12:17 EDT 2012

Taking only a single URB.  From the usbmon docs, the fields are:

  URB tag     e.g. c40c7200

  T           Timestamp

  Event Type  This refers to the format of the event, not the URB
              type.  Available types are: S - submission,
              C - callback, E - submission error.

  ADDR        "Address" word (formerly a "pipe").  It consists of four
              fields, separated by colons: URB type and direction, Bus
              number, Device address, Endpoint number.  Type and
              direction are encoded with two bytes in the following
              manner:
                Ci Co   Control input and output
                Zi Zo   Isochronous input and output
                Ii Io   Interrupt input and output
                Bi Bo   Bulk input and output
              Bus number, Device address, and Endpoint are decimal
              numbers, but they may have leading zeros, for the sake
              of human readers.

  S           Status word.  This is either a letter, or several
              numbers separated by colons: URB status, interval,
              start frame, and error count.

  SETUP       Setup packet, if present, consists of 5 words: one each
              for bmRequestType, bRequest, wValue, wIndex, wLength, as
              specified by the USB Specification 2.0.  These words are
              safe to decode if the Setup Tag was 's'.  Otherwise, the
              setup packet was present but not captured, and the
              fields contain filler.

  URB      TIME       T TD:B:DEV:E S  RT RQ VAL  INDX LEN
  ----------------------------------------------------------------
  c40c7200 2500976461 S Ci:1:001:0 s  a3 00 0000 0001 0004 4 <
  c40c7200 2500976494 C Ci:1:001:0 0  4 = 01010100

First line is the input request.

  RT a3 = 1010 0011   dir=dev->host, type=class, recp=other
  RQ 00 = GET_STATUS

bmRequestType:

  D7     Data Phase Transfer Direction
          0 = Host to Device
          1 = Device to Host
  D6..5  Type
          0 = Standard
          1 = Class
          2 = Vendor
          3 = Reserved
  D4..0  Recipient
          0 = Device
          1 = Interface
          2 = Endpoint
          3 = Other
          4..31 = Reserved

Second line.  Hmm... doesn't correspond to [1]:

  RT   = 1000 0000b
  RQ   = GET_STATUS (0x00)
  VAL  = Zero
  INDX = Zero
  LEN  = Two

I need a working starting point...  I moved to wireshark: it does some
parsing, which makes things more clear.
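A quick console helper for reading the RT column in traces like the
one below; this is just the bmRequestType table above transcribed into
Racket:

  (define (decode-bmRequestType b)
    (list (if (bitwise-bit-set? b 7) 'device->host 'host->device)
          (vector-ref #(standard class vendor reserved)
                      (bitwise-and (arithmetic-shift b -5) 3))
          (case (bitwise-and b #x1f)
            ((0) 'device) ((1) 'interface) ((2) 'endpoint) ((3) 'other)
            (else 'reserved))))

  (decode-bmRequestType #xa3) ; => '(device->host class other)
  (decode-bmRequestType #x80) ; => '(device->host standard device)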
URB TIME T TD:B:DEV:E S RT RQ VAL INDX LEN comment -------------------------------------------------------------------------------- c40c7200 2500976461 S Ci:1:001:0 s a3 00 0000 0001 0004 4 < GET_STATUS c40c7200 2500976494 C Ci:1:001:0 0 4 = 01010100 c40c7200 2500976511 S Co:1:001:0 s 23 01 0010 0001 0000 0 c40c7200 2500976522 C Co:1:001:0 0 0 c40c7200 2500976533 S Ci:1:001:0 s a3 00 0000 0002 0004 4 < c40c7200 2500976543 C Ci:1:001:0 0 4 = 00010000 c40c7200 2501080613 S Ci:1:001:0 s a3 00 0000 0001 0004 4 < c40c7200 2501080638 C Ci:1:001:0 0 4 = 01010000 c40c7200 2501080689 S Co:1:001:0 s 23 03 0004 0001 0000 0 c40c7200 2501080703 C Co:1:001:0 0 0 c40c7200 2501136463 S Ci:1:001:0 s a3 00 0000 0001 0004 4 < c40c7200 2501136503 C Ci:1:001:0 0 4 = 03010000 c40c7200 2501192458 S Co:1:001:0 s 23 01 0014 0001 0000 0 c40c7200 2501192480 C Co:1:001:0 0 0 c40c7200 2501192538 S Ci:1:000:0 s 80 06 0100 0000 0040 64 < c40c7200 2501213484 C Ci:1:000:0 -75 0 c40c7200 2501213573 S Ci:1:000:0 s 80 06 0100 0000 0040 64 < c40c7200 2501217471 C Ci:1:000:0 -71 0 c40c7200 2501217550 S Ci:1:000:0 s 80 06 0100 0000 0040 64 < c40c7200 2501221470 C Ci:1:000:0 -71 0 c40c7200 2501221552 S Co:1:001:0 s 23 03 0004 0001 0000 0 c40c7200 2501221569 C Co:1:001:0 0 0 c40c7200 2501276456 S Ci:1:001:0 s a3 00 0000 0001 0004 4 < c40c7200 2501276498 C Ci:1:001:0 0 4 = 03010000 c40c7200 2501332450 S Co:1:001:0 s 23 01 0014 0001 0000 0 c40c7200 2501332472 C Co:1:001:0 0 0 c40c7200 2501436471 S Ci:1:000:0 s 80 06 0100 0000 0040 64 < c40c7200 2501440425 C Ci:1:000:0 -71 0 c40c7200 2501440517 S Ci:1:000:0 s 80 06 0100 0000 0040 64 < c40c7200 2501443429 C Ci:1:000:0 -71 0 c40c7200 2501443502 S Ci:1:000:0 s 80 06 0100 0000 0040 64 < c40c7200 2501447427 C Ci:1:000:0 -71 0 c40c7200 2501447501 S Co:1:001:0 s 23 03 0004 0001 0000 0 c40c7200 2501447517 C Co:1:001:0 0 0 c40c7200 2501500454 S Ci:1:001:0 s a3 00 0000 0001 0004 4 < c40c7200 2501500496 C Ci:1:001:0 0 4 = 03010000 c40c7200 2501556451 S Co:1:001:0 s 23 01 0014 0001 0000 0 c40c7200 2501556473 C Co:1:001:0 0 0 c40c7200 2501660468 S Co:1:001:0 s 23 01 0001 0001 0000 0 c40c7200 2501660497 C Co:1:001:0 0 0 c40c7200 2501660561 S Co:1:001:0 s 23 03 0004 0001 0000 0 c40c7200 2501660573 C Co:1:001:0 0 0 c40c7200 2501716479 S Ci:1:001:0 s a3 00 0000 0001 0004 4 < c40c7200 2501716528 C Ci:1:001:0 0 4 = 03010000 c40c7200 2501772449 S Co:1:001:0 s 23 01 0014 0001 0000 0 c40c7200 2501772470 C Co:1:001:0 0 0 c40c7200 2501772530 S Ci:1:000:0 s 80 06 0100 0000 0040 64 < c40c7200 2501776436 C Ci:1:000:0 -71 0 c40c7200 2501777147 S Ci:1:000:0 s 80 06 0100 0000 0040 64 < c40c7200 2501779365 C Ci:1:000:0 -71 0 c40c7200 2501779520 S Ci:1:000:0 s 80 06 0100 0000 0040 64 < c40c7200 2501783371 C Ci:1:000:0 -71 0 c40c7200 2501783526 S Co:1:001:0 s 23 03 0004 0001 0000 0 c40c7200 2501783548 C Co:1:001:0 0 0 c40c7200 2501836458 S Ci:1:001:0 s a3 00 0000 0001 0004 4 < c40c7200 2501836500 C Ci:1:001:0 0 4 = 03010000 [1] http://www.beyondlogic.org/usbnutshell/usb6.shtml Entry: USB debugging with wireshark Date: Fri May 18 15:16:34 EDT 2012 First thing I see are messages to 1d6b:0001 Linux Foundation 1.1 root hub What is this? What's that "Linux Foundation" business? Looking into drivers/usb/core/hcd.c it seems that the root hubs are emulated. I don't find any direct explanation but it seems that this is to share code between the host controller and hubs, by exposing a HC as a hub. Note: For a USB device, all traffic on the wire is directed to one device. 
For a USB bus (which is what wireshark sees) there are multiple
devices; the URB distinguishes between them.  I have my device
connected as the only device on the test machine, but there is still
the hub traffic to ignore on the bus.

How to set a wireshark usb[1] filter?

  usb.urb_id == 0xc40c7800

Looks like the first couple of transfers are handled by the PIC
hardware.  I don't recognize them, and it seems neither does
wireshark, as they are not parsed ("Application Data").  The first
properly parsed message is a GET DESCRIPTOR device.  The ones before
that have bmRequestType

  23  h->d, class, other
  a3  d->h, class, other

First non-parsed bytes in those packets: 0,1,0,0,3,0,1.  This should
be the bRequest field.  From the names in [2] these seem to be
"physical requests".  It doesn't seem that they actually make it to
the firmware.  The first bmRequestType:bRequest I see on the device is
80:06, which is the DEVICE request GET DESCRIPTOR.

So it seems:
- physical requests can be ignored
- next: reply to GET DESCRIPTOR
- next: fix logging issues (either RAM buffer or TTL serial port)

[1] http://www.wireshark.org/docs/dfref/u/usb.html
[2] http://www.compsys1.com/support/usb/pic_code/HIDCLASS.ASM

Entry: Old USB code
Date: Fri May 18 16:30:17 EDT 2012

The last reference of what I was doing with USB is [1].  This
triggered a bunch of problems: emit -> PK2 stuff that eventually ended
in me getting bored/disgusted/...

I see, the old code is in _usb.f

AHA.  Now I remember.  The old USB code used FSR2-relative addressing
(the a reg), which is a problem for my Forth library because it's
essentially a different machine to manage.  That's why I switched to a
"stream" approach.

Next:
- send dummy device reply + check on sniffer
- get the "struct compiler" back online

[1] entry://20110407-002404

Entry: Speed vs abstraction: current object?
Date: Fri May 18 16:46:31 EDT 2012

While this stream-oriented approach does seem to work a little bit,
I'm not sure that using the 'a' register for this is a good idea.  Or
'a' should be something like a current-object pointer, which means
this only works for highly coupled code.

Entry: Can the usbmon be trusted?
Date: Fri May 18 19:31:01 EDT 2012

According to the USB dump there is a reply to the device request, but
I'm not sending anything from the uC.  I just get "Malformed Packet:
USB" and 24 bytes.  Maybe it doesn't get past the host controller?

The time difference (- 0.246596 0.238753) is 7.8ms.  Something isn't
right..  Let's go back to the raw capture node.

Entry: Today's trouble
Date: Fri May 18 19:57:50 EDT 2012

- It doesn't seem that usbmon/wireshark can be trusted to say what
  actually goes over the wire.  Either I'm not getting it, or wrong
  data doesn't make it to the PC.
- PK2 is unstable: it can get stuck, requiring a reset.  Maybe I
  should just accept standard debug tools, otherwise I'm never going
  to get anything done.
- Overall it feels too complex.  I'm tempted to start over.

Entry: usbmon
Date: Fri May 18 20:10:06 EDT 2012

Let's just work with the raw usbmon stuff:

  cfbc4500 2426908079 S Ci:1:000:0 s 80 06 0100 0000 0040 64 <
  cfbc4500 2426915987 C Ci:1:000:0 -75 0
  cfbc4500 2426916076 S Ci:1:000:0 s 80 06 0100 0000 0040 64 <
  cfbc4500 2426918960 C Ci:1:000:0 -71 0

So there is no trace of any reply packets.  Seems wireshark's display
was bogus indeed.
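Since it's back to reading raw usbmon lines, a throwaway parser for
the text format documented above helps; only the address word is split
out, the rest stays a string:

  (define (parse-usbmon line)
    (match (regexp-match
            #px"^(\\S+)\\s+(\\d+)\\s+([SCE])\\s+([CZIB])([io]):(\\d+):(\\d+):(\\d+)\\s+(.*)$"
            line)
      ((list _ tag ts ev ty dir bus dev ep rest)
       `((tag   . ,tag)
         (time  . ,(string->number ts))
         (event . ,ev)
         (type  . ,ty)
         (dir   . ,dir)
         (bus   . ,(string->number bus))
         (dev   . ,(string->number dev))
         (ep    . ,(string->number ep))
         (rest  . ,rest)))
      (_ #f)))

  (parse-usbmon "cfbc4500 2426908079 S Ci:1:000:0 s 80 06 0100 0000 0040 64 <")
  ;; => ((tag . "cfbc4500") (time . 2426908079) (event . "S")
  ;;     (type . "C") (dir . "i") (bus . 1) (dev . 0) (ep . 0)
  ;;     (rest . "s 80 06 0100 0000 0040 64 <"))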
  75 == EOVERFLOW
  71 == EPROTO

I found this in ohci.h:

  /* map OHCI TD status codes (CC) to errno values */
  static const int cc_to_error [16] = {
          /* No  Error  */ 0,
          /* CRC Error  */ -EILSEQ,
          /* Bit Stuff  */ -EPROTO,
          /* Data Togg  */ -EILSEQ,
          /* Stall      */ -EPIPE,
          /* DevNotResp */ -ETIME,
          /* PIDCheck   */ -EPROTO,
          /* UnExpPID   */ -EPROTO,
          /* DataOver   */ -EOVERFLOW,
          /* DataUnder  */ -EREMOTEIO,
          /* (for hw)   */ -EIO,
          /* (for hw)   */ -EIO,
          /* BufferOver */ -ECOMM,
          /* BuffUnder  */ -ENOSR,
          /* (for HCD)  */ -EALREADY,
          /* (for HCD)  */ -EALREADY
  };

But the controller is UHCI.  The other plausible mention is in
hub.c:2720.  In uhci-q.c:763 I find:

  static int uhci_map_status(int status, int dir_out)
  {
          if (!status)
                  return 0;
          if (status & TD_CTRL_BITSTUFF)          /* Bitstuff error */
                  return -EPROTO;
          if (status & TD_CTRL_CRCTIMEO) {        /* CRC/Timeout */
                  if (dir_out)
                          return -EPROTO;
                  else
                          return -EILSEQ;
          }
          if (status & TD_CTRL_BABBLE)            /* Babble */
                  return -EOVERFLOW;
          if (status & TD_CTRL_DBUFERR)           /* Buffer error */
                  return -ENOSR;
          if (status & TD_CTRL_STALLED)           /* Stalled */
                  return -EPIPE;
          return 0;
  }

So this is at least starting to make sense a bit:

  1st: EOVERFLOW from TD_CTRL_BABBLE
  2nd: EPROTO from TD_CTRL_CRCTIMEO, probably a timeout.

Also seen: EPIPE from TD_CTRL_STALLED.

Some UHCI docs[1]: "When a device transmits on the USB for a time
greater than its assigned Max Length, it is said to be babbling."

[1] ftp://download.intel.com/technology/usb/uhci11d.pdf

Entry: Simple software?
Date: Fri May 18 21:06:11 EDT 2012

It's a bit of a hopeless situation to try to keep things simple when
connecting with existing software/hardware interfaces.

Entry: Getting things done
Date: Fri May 18 21:08:09 EDT 2012

So it's clear now: staapl is an experimental toy.  I don't think I can
ever climb the mountain of compatibility with C.  Staapl probably only
makes sense on either a PIC architecture, or on a real stack machine,
i.e. in an FPGA.

Trouble is that this is a work of passion, of slightly crazy ideas,
and currently there's a bit too much headwind to make this a fun
project..  So what to do?

Entry: USB on scope
Date: Sat May 19 00:21:15 EDT 2012

Looking on the scope, I see only one of the lines (D-) move between
0-3V; the other (D+) stays at 0.  Here[1] it is mentioned that there
is single-ended signalling, but only for initial conditions.  Doesn't
look normal.  However, if something that low-level is wrong, why does
reception work?  The PIC does get bytes in just fine.  Ordering new
chips so I can see if a fresh chip has the same behaviour.  Could also
be output config.

EDIT: Looks like I was just seeing the single-ended signalling.
Setting a 1->0 trigger on D+ does show some 12Mbps symmetric waveforms
after a while.

[1] http://www.beyondlogic.org/usbnutshell/usb2.shtml#Electrical

Entry: Next
Date: Sat May 19 01:36:55 EDT 2012

Electrical is OK, so let's continue looking in the linux source.  Or
maybe, let's try low speed first.
-> didn't work

  drivers/usb/core/message.c:132  usb_control_msg()
                            :44   usb_start_wait_urb()

Entry: kernel with USB debug messages
Date: Sat May 19 12:11:40 EST 2012

  May 19 11:53:16 broebel kernel: [ 925.937217] usb usb1: usb resume
  May 19 11:53:16 broebel kernel: [ 925.937246] usb usb1: wakeup_rh
  May 19 11:53:17 broebel kernel: [ 925.976143] hub 1-0:1.0: hub_resume
  May 19 11:53:17 broebel kernel: [ 925.976205] uhci_hcd 0000:00:07.2: port 1 portsc 0093,00
  May 19 11:53:17 broebel kernel: [ 925.976240] hub 1-0:1.0: port 1: status 0101 change 0001
  May 19 11:53:17 broebel kernel: [ 926.080205] hub 1-0:1.0: state 7 ports 2 chg 0002 evt 0000
  May 19 11:53:17 broebel kernel: [ 926.080269] hub 1-0:1.0: port 1, status 0101, change 0000, 12 Mb/s
  May 19 11:53:17 broebel kernel: [ 926.192164] usb 1-1: new full speed USB device using uhci_hcd and address 6
  May 19 11:53:17 broebel kernel: [ 926.221152] usb 1-1: uhci_result_common: failed with status 440000

  440000 == (1<<22) | (1<<18) == TD_CTRL_STALLED | TD_CTRL_CRCTIMEO

Conclusion?  I don't think the device sends anything.  Next steps:
- read the datasheet
- read example code

  broebel:/net/kers/home/tom/linux/linux-2.6-2.6.32/drivers/usb/core# rmmod ehci_hcd uhci_hcd usbmon usbcore ; insmod ./usbcore.ko ; modprobe usbmon ; modprobe uhci_hcd

Entry: DATA 0/1
Date: Sat May 19 13:32:47 EDT 2012

I removed the data toggle, so maybe that's a problem? [1]

[1] http://wiki.osdev.org/Universal_Serial_Bus#Data_Toggle_Synchronization

Entry: ASM USB example code
Date: Sat May 19 13:50:53 EDT 2012

It does some things to the BD0O registers I don't really understand..
Let's analyse ProcessSetupToken() / SendDescriptorPacket():

- Copy the 8 byte setup packet to a separate buffer.
- Reset both BD0I and BD0O
- Reset PKTDIS
- .... fill buffer IN0
- Toggle+transfer BD0I

The strange thing is that I get a reset, get_dev_descr, reset,
set_addr.  So one would think that the 80 06 did work..

  R 80 06 00 01 FF
  R 00 05 12 FF
    00 05 12 FF
    00 05 12 FF

Man I'm so confused..  What I need is some documentation that explains
all this as a trace: what exactly happens on the wire for an entire
enumeration process?  What I'd like to see is a successful device
descriptor transaction.  I'm thinking that maybe the data phase is not
correct.  Focus on this: what is the DATAx for a reply to a SETUP
packet with a device request?

[1] http://pe.ece.olin.edu/ece/projects.html

Entry: SET_ADDRESS: why does lab1.c wait for IN?
Date: Sat May 19 16:43:53 EDT 2012

lab1.c replies with a 0-size DATA1 to SET_ADDRESS, and it doesn't
immediately set the address.  That only gets used in ProcessInToken().
Why is there an IN token after SET_ADDRESS?

Answer: there is always an IN transaction after a SETUP transaction.
ProcessInToken() merely acts as a notification that the 0-size IN
transaction sent in response to the SET_ADDRESS is done.  Only after
that is it safe to set the address.  Basically, TRNIF -> IN is just an
acknowledgement of the end of an IN transaction initiated by the PIC.
Like this:

- PIC firmware prepares an IN buffer (an empty one, serving as the
  status phase of a SET_ADDRESS control transfer)
- PIC HW waits for the IN token sent by the host
- PIC HW sends out the DATA token to the host
- PIC HW receives ACK from the host
- This completes the transaction in hardware, so TRNIF is set, and the
  PID of the last transaction is set to "IN" in USTAT.
- Only after the whole SET_ADDRESS transaction is done can the device
  address be changed.
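The gist of that sequence in pseudo-Racket; set-uaddr! and send-in-0
are stand-ins for the real SFR access words, only the 0x05 bRequest
code comes from the spec:

  (define SET_ADDRESS #x05)                ; bRequest code, USB spec ch. 9

  (define (set-uaddr! a) (printf "UADDR <- ~a\n" a))           ; stand-in
  (define (send-in-0)    (printf "queue empty IN (status)\n")) ; stand-in

  (define pending-address #f)

  (define (on-setup bRequest wValue)
    (when (= bRequest SET_ADDRESS)
      (set! pending-address wValue)        ; don't touch UADDR yet
      (send-in-0)))                        ; reply with 0-size DATA1

  (define (on-in-done)                     ; TRNIF fired, USTAT says IN
    (when pending-address
      (set-uaddr! pending-address)         ; transaction is over; safe now
      (set! pending-address #f)))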
The reason this is the only case handled in IN events is that we don't
care about the other cases: there is nothing more to be done after the
transaction is over.

Entry: Status USB
Date: Sat May 19 18:06:10 EDT 2012

I still don't know what's going on.  It seems the first get_device
request doesn't make it over the wire.  I'm making a lot of changes
for small things that seem to be wrong (not writing things in the
right order, etc.), but there doesn't seem to be any improvement in
the log messages.  I'm still flying blind.

( removed the hub messages )

  May 19 18:50:44 broebel kernel: [25973.576247] usb 1-1: new full speed USB device using uhci_hcd and address 46
  May 19 18:50:44 broebel kernel: [25973.576282] drivers/usb/core/message.c: usb_control_msg: cca1e000 6 256 0
  May 19 18:50:44 broebel kernel: [25973.627168] usb 1-1: uhci_result_common: failed with status 440000
  May 19 18:50:44 broebel kernel: [25973.627285] drivers/usb/core/message.c: usb_control_msg: cca1e000 -> -84
  May 19 18:50:44 broebel kernel: [25973.627318] drivers/usb/core/message.c: usb_control_msg: cca1e000 6 256 0
  May 19 18:50:44 broebel kernel: [25973.630139] usb 1-1: uhci_result_common: failed with status 440000
  May 19 18:50:44 broebel kernel: [25973.630215] drivers/usb/core/message.c: usb_control_msg: cca1e000 -> -71
  May 19 18:50:44 broebel kernel: [25973.630246] drivers/usb/core/message.c: usb_control_msg: cca1e000 6 256 0
  May 19 18:50:44 broebel kernel: [25973.634123] usb 1-1: uhci_result_common: failed with status 440000
  May 19 18:50:44 broebel kernel: [25973.634221] drivers/usb/core/message.c: usb_control_msg: cca1e000 -> -71
  May 19 18:50:44 broebel kernel: [25973.744252] usb 1-1: device descriptor read/64, error -71
  May 19 18:50:44 broebel kernel: [25973.848155] drivers/usb/core/message.c: usb_control_msg: cca1e000 6 256 0
  May 19 18:50:44 broebel kernel: [25973.883124] usb 1-1: uhci_result_common: failed with status 440000
  May 19 18:50:44 broebel kernel: [25973.883241] drivers/usb/core/message.c: usb_control_msg: cca1e000 -> -84
  May 19 18:50:44 broebel kernel: [25973.883274] drivers/usb/core/message.c: usb_control_msg: cca1e000 6 256 0
  May 19 18:50:44 broebel kernel: [25973.886095] usb 1-1: uhci_result_common: failed with status 440000
  May 19 18:50:44 broebel kernel: [25973.886168] drivers/usb/core/message.c: usb_control_msg: cca1e000 -> -71
  May 19 18:50:44 broebel kernel: [25973.886199] drivers/usb/core/message.c: usb_control_msg: cca1e000 6 256 0
  May 19 18:50:44 broebel kernel: [25973.890081] usb 1-1: uhci_result_common: failed with status 440000
  May 19 18:50:44 broebel kernel: [25973.890197] drivers/usb/core/message.c: usb_control_msg: cca1e000 -> -71
  May 19 18:50:45 broebel kernel: [25974.000292] usb 1-1: device descriptor read/64, error -71

Entry: I have no clue
Date: Sat May 19 19:06:54 EDT 2012

Looks like I'm stuck.
- I don't see what's on the wire
- Data doesn't seem to get past the UHCI controller

It seems as if something doesn't get sent.  Next: more reading?  The
PIC datasheet and lab1.c.  This is 99% about attitude and stamina ;)
Quite a challenge.

Entry: Increase visibility
Date: Sat May 19 20:34:26 EDT 2012

I'm getting quite sick of it, so let's just try to increase visibility
and interactivity.  The main problem is that I don't see what's going
on, and what I do see I don't trust.  So:

1. Take PK2 out of the loop.  Move back to the serial port console.
   EDIT: This needs some fixing.  Something broke after the PK2 stuff?

2. Patch the linux kernel to fail on the first error.  This way it
   will be more clear what's happening.
Entry: Serial console
Date: Sat May 19 21:18:34 EDT 2012

  (console 'uart "/dev/ttyUSB1" 230400)

Looks like that is broken.

Entry: Frustrated
Date: Sat May 19 21:45:34 EDT 2012

I'm running into too many problems.  All of them are debuggable, but
I'm re-entering the debugging cycle quite deeply now.  Everywhere I
look there's an insurmountable heap of cruft.  Time to start from
scratch?  Something simple.

Entry: Future of Staapl
Date: Sat May 19 23:05:28 EDT 2012

It's becoming quite clear:
- Staapl is (not yet) for practical things.  It needs to grow.
- I'm going to continue on the USB as an exercise in debugging.

I periodically get quite sick of this.  The reason is that I'm
thinking goal-oriented instead of process-oriented.  The main
conclusion is that Staapl is still a hack, not really ready for
building "real" (non-exploratory) software with external constraints.
The USB driver turns out to be a good exercise in debugging skills,
but it's going to take a LONG time to finish, so I'm no longer setting
any "product" goals.  It will take as long as it takes; it's a
process.  I'm decoupling it from any practical tools (effect pedal),
which will be built in C.

Entry: Things to fix
Date: Sun May 20 11:11:15 EDT 2012

- Make the USB driver debuggable (interactive?)
- PK2 "interrupt" isn't stable.  Does this also pose a problem if the
  interrupt isn't used?
- Serial console is broken on 18F2550
- Fix error messages and code navigation.
- Documentation?  A bit of a chicken-and-egg problem: without docs,
  nobody touches it, and until I'm getting some real feedback, docs
  will be bad or outdated.  At this point it doesn't seem like a good
  use of time.

Entry: Solipsism
Date: Sun May 20 11:02:05 EDT 2012

Another point that has been bubbling up in the atmosphere of despair
surrounding the USB driver is its integration into "society".  Up to
this point, apart from some pats on the back, nobody cares.  It all
feels a little solipsistic, which would be fine if it were on the same
level as "solving a crossword puzzle".  However, that's not what I
want.  At least, it should be useful in some way, either as a project
in and of itself, or as a means to an end.  I will continue working on
it as long as it piques my interest, but without any particular
"product" in mind.

EDIT: I really have more fun when I ignore purpose..

Entry: Debuggable USB driver
Date: Sun May 20 11:15:08 EDT 2012

The idea is to make it run just once: make one service attempt, then
stop.  What about adding debug strings?  More generally: send "string
tokens" to the host.

Entry: Printing racket backtrace
Date: Sun May 20 11:51:34 EDT 2012

[1] https://groups.google.com/group/plt-scheme/browse_thread/thread/231bb68fbc8093eb
[2] https://groups.google.com/group/racket-users/tree/browse_frm/month/2010-06/090316a2a81df3a4?rnum=91&_done=%2Fgroup%2Fracket-users%2Fbrowse_frm%2Fmonth%2F2010-06%3F

Entry: Trouble
Date: Sun May 20 19:15:43 EDT 2012

  match: no matching clause for #

Goes away after "make clean".

Entry: Basically a whole day of stuff
Date: Sun May 20 19:20:05 EDT 2012

From today:
- using an interactive approach: run code, don't reset PIC or PK2.
- quit / abort / warm / continue
- patching the linux driver with debug messages and eliminating retries

Entry: Debugging
Date: Tue May 29 13:29:02 EDT 2012

So nothing is happening on the wire.  My guess is that it's software
or configuration, and not electrical.
But to eliminate that, it's probably easy to:
- Try another 18F2550 chip
- Try the PICstamp board
- run some third party code [1]

[1] http://www.sparetimelabs.com/usbcdcacm/index.html

Entry: Resetting PK2
Date: Tue May 29 13:43:47 EDT 2012

PK2 gets into a state that can only be cleared by a hard USB
unplug/replug cycle.

Error in Scheme is:

  Error opening console pickit2: procedure application: expected procedure, given: #(struct:exn:fail "error: no-pickit2-found" #) (no arguments)
  Process staapl-usb exited abnormally with code 1

Error using pk2cmd is:

  No PICkit 2 found.
  make: *** [pk2-2550-48.flash] Error 10

Though the device is still there:

  tom@zoo:~/$ lsusb
  ...
  Bus 001 Device 049: ID 04d8:fc92 Microchip Technology, Inc.

  tom@zoo:~/$ lsusb -v -s 1:49
  Bus 001 Device 049: ID 04d8:fc92 Microchip Technology, Inc.
  Device Descriptor:
    bLength                18
    bDescriptorType         1
    bcdUSB               2.00
    bDeviceClass            2 Communications
    bDeviceSubClass         0
    bDeviceProtocol         0
    bMaxPacketSize0         8
    idVendor           0x04d8 Microchip Technology, Inc.
    idProduct          0xfc92
    bcdDevice            1.00
    iManufacturer           1
    iProduct                2
    iSerial                 0
    bNumConfigurations      1
    Configuration Descriptor:
      bLength                 9
      bDescriptorType         2
      wTotalLength           67
      bNumInterfaces          2
      bConfigurationValue     1
      iConfiguration          0
      bmAttributes         0x80 (Bus Powered)
      MaxPower              200mA
      Interface Descriptor:
        bLength                 9
        bDescriptorType         4
        bInterfaceNumber        0
        bAlternateSetting       0
        bNumEndpoints           1
        bInterfaceClass         2 Communications
        bInterfaceSubClass      2 Abstract (modem)
        bInterfaceProtocol      1 AT-commands (v.25ter)
        iInterface              0
        CDC Header:
          bcdCDC               1.10
        CDC ACM:
          bmCapabilities       0x02 line coding and serial state
        CDC Union:
          bMasterInterface        0
          bSlaveInterface         1
        CDC Call Management:
          bmCapabilities       0x00
          bDataInterface          1
        Endpoint Descriptor:
          bLength                 7
          bDescriptorType         5
          bEndpointAddress     0x82 EP 2 IN
          bmAttributes            3
            Transfer Type            Interrupt
            Synch Type               None
            Usage Type               Data
          wMaxPacketSize     0x0008 1x 8 bytes
          bInterval               2
      Interface Descriptor:
        bLength                 9
        bDescriptorType         4
        bInterfaceNumber        1
        bAlternateSetting       0
        bNumEndpoints           2
        bInterfaceClass        10 CDC Data
        bInterfaceSubClass      0 Unused
        bInterfaceProtocol      0
        iInterface              0
        Endpoint Descriptor:
          bLength                 7
          bDescriptorType         5
          bEndpointAddress     0x03 EP 3 OUT
          bmAttributes            2
            Transfer Type            Bulk
            Synch Type               None
            Usage Type               Data
          wMaxPacketSize     0x0040 1x 64 bytes
          bInterval               0
        Endpoint Descriptor:
          bLength                 7
          bDescriptorType         5
          bEndpointAddress     0x83 EP 3 IN
          bmAttributes            2
            Transfer Type            Bulk
            Synch Type               None
            Usage Type               Data
          wMaxPacketSize     0x0040 1x 64 bytes
          bInterval               0
  can't get device qualifier: Operation not permitted
  can't get debug descriptor: Operation not permitted
  cannot read device status, Operation not permitted (1)

Using this code[1]: usbreset.c

  /* headers for open(), ioctl(), close(), USBDEVFS_RESET */
  #include <stdio.h>
  #include <fcntl.h>
  #include <unistd.h>
  #include <sys/ioctl.h>
  #include <linux/usbdevice_fs.h>

  int main(int argc, char **argv)
  {
      const char *filename = argv[1];
      int fd = open(filename, O_WRONLY);
      ioctl(fd, USBDEVFS_RESET, 0);
      close(fd);
      return 0;
  }

  ./usbreset /dev/bus/usb/001/049

I still get trouble:

  May 29 13:59:08 zoo kernel: [1049324.765287] usb 1-2.1.2.1: reset full speed USB device using ehci_hcd and address 49
  May 29 13:59:08 zoo kernel: [1049324.863631] cdc_acm 1-2.1.2.1:1.0: This device cannot do calls on its own. It is not a modem.
  May 29 13:59:08 zoo kernel: [1049324.863674] cdc_acm 1-2.1.2.1:1.0: ttyACM0: USB ACM device

It shows up as a serial port.
After unplug, replug gives:

  May 29 14:03:54 zoo kernel: [1049610.664984] usb 1-2.1.2.4: new full speed USB device using ehci_hcd and address 50
  May 29 14:03:54 zoo kernel: [1049610.767214] usb 1-2.1.2.4: New USB device found, idVendor=04d8, idProduct=0033
  May 29 14:03:54 zoo kernel: [1049610.767224] usb 1-2.1.2.4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
  May 29 14:03:54 zoo kernel: [1049610.767232] usb 1-2.1.2.4: Product: PICkit 2 Microcontroller Programmer
  May 29 14:03:54 zoo kernel: [1049610.767239] usb 1-2.1.2.4: Manufacturer: Microchip Technology Inc.
  May 29 14:03:54 zoo kernel: [1049610.767244] usb 1-2.1.2.4: SerialNumber:

At which point it works again.  So it seems the firmware gets stuck; a
USB reset does not reset the PK2 firmware.  Looks like this bug is a
dead end.  The only thing I can think of is to rewrite the firmware.

[1] http://www.roman10.net/how-to-reset-usb-device-in-linux/

Entry: What to do with PK2
Date: Tue May 29 14:21:57 EDT 2012

Since there are some issues with the firmware, it doesn't seem to be a
good idea to stick with the original firmware.  I'm not sure if it's
useful to write new firmware for the PK2.  I suppose it is possible,
as copyright is only on the original Microchip code, and the schematic
is not patented.

Entry: Next
Date: Wed May 30 09:03:23 EDT 2012

- add power led to board
- add debug led?  (i.e. on when waiting in interpreter)
- Try another 18F2550 chip
- Try the PICstamp board
- run some other USB code [1]

I tried to run this[1] in a hurry yesterday but nothing happened.

[1] http://www.sparetimelabs.com/usbcdcacm/index.html

Entry: Fixing serial
Date: Wed May 30 09:53:30 EDT 2012

I need to do something else to get me going this morning.  Let's try
to fix the board + serial.  I noticed a stupid soldering error last
week.

EDIT: Looks like this is a bit broken..  That sucks, as this
pre-interpreter stuff is hard to debug.

First problem: TX wasn't connected.  The old 18F452 board has a serial
daisy chain that needs to be jumpered for a single connection.

Second problem: running the code below gives me 9600 baud (9803 as
measured by OLS).  Weird..  I would assume that when it's off, it's
way off, not at 2% of something that looks standard.

  : serial-test
      230400 40 init-serial
      0 begin dup async.>tx 1 + again

Got it.  The problem is that this probably never worked after the
interpreter changed to the new "handshake" interface necessary for
PK2.

Entry: PicStamp
Date: Wed May 30 17:03:36 EDT 2012

On Johannes' board it seems to get a bit further.  This is the device
descriptor it receives on the amd64 host:

  12 01 00 02 00 00 00 40 D8 04 FF 01 01 00 01 02 00 01

which looks OK.  On the 32-bit USB 1.1 host it doesn't go without
errors, but the bytes at least go over the wire in an IN transaction.

Let's try with a different chip.  Same results.  After the SETUP token
I get SYNC errors.  So it's probably electrical.

Entry: Electrical problem?
Date: Wed May 30 17:39:07 EDT 2012

Reading the 18F2550 datasheet, here's something I missed:

  17.2.2.8 Internal Regulator
  The PIC18FX455/X550 devices have a built-in 3.3V regulator to
  provide power to the internal transceiver and provide a source for
  the internal/external pull-ups.  An external 220 nF (±20%)
  capacitor is required for stability.

I found it looking up "VREGEN" from inspecting the configuration bits
in piklab.

EDIT: Adding the 220nF did the trick.

Another one for later: the data in the USB Status register is valid
only when the TRNIF interrupt flag is asserted.

Entry: Next
Date: Sun Jun 3 11:30:25 EDT 2012

So the problem is pierced.  What next?
Probably I should focus on a proper logging facility, i.e.
fixing/simplifying the interpreter.  It's not necessary to store
strings on the device.  Simply defining "error conditions with
arguments" should be enough; the strings can live on the host.  The
goal remains the same: absolute minimum use of resources on the
target.  The point is not to make a stand-alone forth, so offload all
debug if possible.

It seems simplest to do this using a proper RPC mechanism.  Define a
way to call host words from the target, where the empty reply is
"return", as it was before.  Trouble is that once this is introduced
on the target, it creates a coupling between host and app, i.e. the
app can no longer run stand-alone.  So there needs to be a way to turn
this off.

Entry: RPC
Date: Sun Jun 3 11:54:35 EDT 2012

Host RPC calls.  We already have several: emit, hexdump1, hexdump2,
return.  How to make this simpler?

- Protocol?  Same as before: each app has a number of RPC calls
  defined in its host dictionary, encoded just like interpreter
  commands are encoded.
- Detach: at some point it needs to be possible for the target to call
  local code, or to raise an exception when host words are called in
  standalone mode.

The simplest way to "vector" these words is to ignore them.  It is
known how many arguments are being sent, so simply replacing the
current interpreter I/O with some simulation should create a proper
stub.

Entry: Fixing RPC
Date: Sun Jun 3 14:35:34 EDT 2012

If RPC is to be handled correctly, it needs to be handled in all parts
of the code, meaning:

  host sends RPC request
  -> target answers (0xFF)
  -> target sends RPC request (0xFE)

The "return" and "call" cases need to be clearly distinguished.  Maybe
it's time to change this a bit such that the addressing is clear.

EDIT: fixed.  Seems to work ok.

Entry: Disassembler: use "sea" instead of "see"
Date: Sun Jun 3 21:31:28 EDT 2012

I forgot how this worked: it seems indeed better to just store
assembly for viewing instead of trying to reconstruct it from raw
disassembly.  See [1].

[1] entry://20110402-100621

Entry: Symbolic target words
Date: Sun Jun 3 21:49:49 EDT 2012

Instead of using a bunch of codes, it's probably simplest to call host
code using symbolic names, using the same approach as the
(non-parsing) words run from the command line.  What this needs is the
following interface:

  >h h>          move words to host stack
  #xFE [ ... ]   execute target word

Maybe all RPC code should drop into the interpreter?

EDIT: done, looks better.  Next: where is the live: state (stack)
stored?  grep for "state:stack" in staapl/staapl/live:

  ./rpn-target.ss:85:   (void ((target: code ...) (state:stack))))

So it looks like I need to open up this one:

  (define-syntax-rule (target> code ...)
    (void ((target: code ...) (state:stack))))

Is it target: or live: ?  reflection.ss `run' delegates to
`forth-command', defined in rpn-target.ss in terms of `target:', to
interpret commands.  So do we want the target to have access to that,
or only to the `live:' set?

The namespace used by 'live:' is defined by `live-interpret' in
rpn-live.ss; it will take words from the (scat) namespace but delegate
to scheme using `scat-wrap-dynamic'.

EDIT: Actually, this should probably be `target-interpret' instead of
`live-interpret'.  That works, but it creates loops:
`target-interpret' gives preference to target words, so this needs to
be broken.  I don't see how to fix it.  `kb' is a 1cmd: in the
(target) dictionary, which is implemented as a prefix parser:
"kb" -> "t> kb", where 'kb' is then interpreted in (target), which
delegates to the target word.
Maybe this needs a more systematic approach.  Instead of delegating to
host commands from .f code, make it so that all host commands are
accessible on the target.

Entry: Flash literals
Date: Tue Jun 5 08:42:16 EDT 2012

Need to get literal strings back.  Some options:

- store in a separate segment + provide a word which loads the pointer
  as a literal
- inline code: store as a Pascal string + use the return address to
  locate it + skip
  -> 2 versions: skip / drop.  I think I had drop before.

See string.f:

  : f->  \ --
      TOSL @ fl ! TOSH @ fh ! pop ;

  : foo f-> 3 , 65 , 65 , 65 ,
  : .foo foo @f+ for @f+ emit next ;

There seems to be symbolic quotation using '`', but I forgot how to
then compile this to a byte sequence using ','.  It's 'string,', but
something's broken.  Yep: '->byte-list' didn't support symbols.

Entry: Host commands accessible on target
Date: Tue Jun 5 10:32:53 EDT 2012

Maybe it's time to make a little diagram of all the different name
spaces and prefix parser tricks..  It's getting quite complicated.

1. Allow for "` cmd" syntax.
2. Fix the delegation loop.

(live)
------
This is scat running on the host, with some words to interact with the
target machine, i.e. >t t> texex/b.  For convenience, it has late
binding for all symbols, with automatic lifting of Scheme words it
finds in the name space.  Note that scat itself is early bound.

(target)
--------
This is the interpreter for the command-line dialect of the target
language.  It consists of 2 layers:

* prefix macros that translate to live: code (see commands.ss)
* name space delegation to target, then (live) (see rpn-target.ss)

Entry: Prefix commands
Date: Tue Jun 5 11:42:11 EDT 2012

  (prefix-parsers-wrapped (target) 1cmd: (kb))
  (prefix-parsers/meta (target) live: ((1cmd: w) (t> w)))

Somehow (1cmd: kb) resolves 'kb' using 'interpret-target', while what
I expect is for this to be in live.  The trick here is in the /meta
stuff, which defines prefix parsers in terms of some other
interpreter, while the 'kb' parser itself is defined just as a
(target) namespace prefix parser.

  ;; Like 'prefix-parsers', but translate code using a different
  ;; compiler and splice it in.
  (define-syntax-rule (prefix-parsers/meta ns lang: (pat code) ...)
    (begin
      ;; Evaluating the pattern to check that the names are actually
      ;; defined doesn't work, because it includes pattern names as
      ;; well..
      ;; (begin (lang: . code) ...)  ;; test-eval it
      (prefix-parsers ns (pat (,(lang: . code))) ...)))

So, knowing this, how to break the loop?  It has to be solved in the
(target) namespace, because that's where the prefix parsers are, which
means it needs to be solved in `target-interpret'.  The solution seems
simple: call `target-interpret' directly, and add an optional
argument.

Hmm... doesn't seem to work very well.  The interactions are too
complex.  Maybe it's better not to define these words as prefix
macros.  That seems to be the real problem.  These could be just
words..

Almost there.  This works:

  > (target/kb (state:stack))

But when it's typed on the console, the words are not found.  Maybe it
just needs to look in the target/ dictionary first?  Another problem:
why are the target words not defined with the target/ prefix?

Ok, I see.  Looks like we just need a different namespace.  What about
sim?  It's probably better to get rid of (target) for any console code
and use it just for target words.  That's for later cleanup; for now
let's just introduce an extra namespace.  Let's just call it (host).
Entry: Simplification
Date: Tue Jun 5 13:33:05 EDT 2012

With target->host RPC working, there is really no need for a lot of
"push" commands implemented on the target.  I.e. can the 3 dumps +
trace be eliminated?  There's only one detail: those commands support
streaming of data, which might be handy.  Let's just put all 3
versions of dump in one command, or maybe better, dump to a host-stack
byte list?

What about keeping the scat stack that's used in the (target) language
active?  This way the target could do: dump packet + exec command.
Hmmm.. that doesn't work due to recursive evaluation: it would require
the stack to be threaded through the recursion.  I think this is too
big a change, so let's keep the stack local and use an explicit rpc
stack.

EDIT: Flash and RAM dump are not necessary on the target.  They can be
implemented on the host.

Entry: Next
Date: Tue Jun 5 15:43:25 EDT 2012

* Add a single prefix parser macro word to execute a host command,
  e.g.

    h: px

  such that these don't need to be duplicated in the namespace.

* Add some support for byte streams.  This probably needs a 2-byte
  "execute" or a 1-byte indirection, or say a table of file
  descriptors.  Maybe some more general indirection handling is in
  order, also for "pulled in" code.  1-byte indirection is handy,
  since it's the data unit.

Entry: Dynamic binding
Date: Tue Jun 5 18:44:57 EDT 2012

While I don't really like dynamic binding much, in forth it can
sometimes be quite handy, especially for streams it seems.  The
overhead of local (stack) variables is too high.  In Chuck Moore's
stack machines there is also a current pointer register.  For Staapl
on PIC there are already a & f, so it might be good to abstract this a
bit.  However, moving bytes around is about the only thing we are ever
going to do a lot of, so let's just define i> and >o and use dynamic
binding to point them to the current input and current output, with
values saved on the RS.  What about this:

  iopen-   connects something to i> and performs init
  iclose   calls close method + pops previous input

Nope, open/close should be different from `parameterize':

  i-begin  pushes old i> vector to RS and installs a new one
  i-end    pops old i> vector

So it seems best to do this on top of a vector abstraction.  There
really aren't so many tokens to manage, so sticking to bytes seems the
best approach.  The token table can be defined in the monitor, or
generated.  Previous code used 128 tokens (even ones not used), all
stored in the first ram bank.

  : do-arrow 0 a!! TOSL @ !a+ TOSH @ !a+ pop ;

Actually there seems to be a simpler way to do this, which is to place
the indirection at the hardcoded level: reserve flash space for
vectors.  In the config .fm it could be defined where to put it.

I'm wondering if it's possible to do some kind of smart ' (tick)
operator: automatically generate tokens whenever a tick is
encountered.  TRAP.  This is a trap.  Just define a global execute
function with a route macro:

  : e0 0 ; : e1 1 ; : e2 2 ;
  forth
  : execute _e0 . _e1 . _e2 ;

Hmmm..  Trouble really is that global and incremental compilation are
really different.  Unless we use a linked list approach.  TRAP.  It's
a trap!  What I need is a way to make some global "gather"
compilations work.  It should be no big deal in the all-at-once module
compiler, but doing this incrementally is problematic, unless there is
a way to make the gathering operations updateable.  I think the state
machine stuff works here too.  Also, the code instantiation should be
able to use it.  Anyway, this is a deep hole.
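What i-begin / i-end mean, modeled in Scheme with an explicit save
list standing in for the RS; only the names come from the notes above,
the rest is a sketch:

  (define current-i> (lambda () (error "no input vector")))
  (define i>-saved '())                    ; stand-in for the return stack

  (define (i-begin new)                    ; push old vector, install new
    (set! i>-saved (cons current-i> i>-saved))
    (set! current-i> new))

  (define (i-end)                          ; pop previous vector
    (set! current-i> (car i>-saved))
    (set! i>-saved (cdr i>-saved)))

  (define (i>) (current-i>))               ; read one item

  ;; e.g. temporarily read from a counter:
  (i-begin (let ((n 0)) (lambda () (set! n (add1 n)) n)))
  (list (i>) (i>))  ; => '(1 2)
  (i-end)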
Entry: Doing this stuff in GDB
Date: Tue Jun 5 22:33:44 EDT 2012

What is so helpful about this is to start from the idea of "everything
on the target" and then start offloading things to the host.  The big
idea here is that the target can "call host code".  For GDB, this
would mean:

- set a breakpoint in a generic gdb_call() method.  Use a single
  method, to only need a single breakpoint.
- disambiguate this call by using a scratch buffer in RAM that takes
  the parameters of an RPC call.
- when the breakpoint fires, have GDB transfer this RPC param buffer
  to a handler, possibly an external program.
- allow this handler program to call back into the program, and/or
  change the program's state.

Now, to simplify things, can the GDB part be limited to transferring
raw data packets between the target and some host process?  What I
really want is just a bi-directional message pipe.  Anything else can
go on top of that.

So basically, gdb_call() would give a command buffer to GDB, and
receive a reply.  This method would be called whenever there is a
target request or event, and called periodically to receive incoming
calls.  An asymmetric RPC mechanism can carry calls in the other
direction, by embedding it into two calls: poll(), reply().  The main
feature is really that the protocol is synchronous, so both sides are
always in a well-defined state.

So, what about this simplified version:

- breakpoint at gdb_call
- read target RAM buffer, save to req.bin (pipe?)
- read from reply.bin (pipe?) into target RAM buffer
- continue

Once the pingpong channel is organized, all the rest is software
protocol.

  gdb -> ext : dump binary value
  ext -> gdb : source

Hmm, doesn't look like it..  gdb doesn't want to block on sourcing
from a pipe.  This probably needs a shell command to keep it
synchronous:

  dump binary value req.bin gdb_req
  shell do_rpc
  source reply.gdb

Entry: gdb_req
Date: Wed Jun 6 00:00:56 EDT 2012

Simple hack to make an RPC channel from an (embedded ARM) target
program to an external linux application using GDB / JTAG:

  target -> app : req.bin    binary request
  app -> target : reply.gdb  gdb command reply

This is useful to write a test system with minimal modification of the
target program.  The test system can be structured as if it were
running on the resource-limited embedded target, while in effect it is
implemented by host code with arbitrary complexity.  The gdb_call()
below is a hook point to insert remote calls.  If the test system is
detached, gdb_call() is a NOP, and the application can perform its
normal behaviour.

##### gdb_req.c
#include <stdio.h>

unsigned int gdb_req = 0;
void gdb_call() { }

int main(void) {
    for(;;) {
        gdb_req++;
        printf("OUT gdb_req = %d\n", gdb_req);
        gdb_call();
        printf("IN  gdb_req = %d\n", gdb_req);
    }
    return 0;
}

##### gdb_req.sh
# Dummy operation: replace with program that handles request.
hd req.bin
echo 'set gdb_req = 123' >reply.gdb

##### gdb_req.gdb
define service
  dump binary value req.bin gdb_req
  shell ./gdb_req.sh
  source reply.gdb
  continue
end
define start_service
  break gdb_call
  tbreak main
  run
  while 1
    service
  end
end
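The gdb_req.sh stub could just as well be a Racket script.  A sketch
of the host side; the file names follow the gdb script above, and the
reply content is as arbitrary as the shell version's:

  (define (do-rpc)
    (define req (file->bytes "req.bin"))   ; raw dump of gdb_req
    (with-output-to-file "reply.gdb" #:exists 'truncate
      (lambda ()
        ;; reply: bump the low byte of the dumped counter
        (printf "set gdb_req = ~a\n" (add1 (bytes-ref req 0))))))

From here the "handler with arbitrary complexity" is just Scheme code
between file->bytes and the printf.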
Entry: Inlining symbolic constants
Date: Wed Jun 6 08:32:04 EDT 2012

There's a way to do macros:

  : foo [ 1 2 3 ] i

But I'm not sure if functions will work.  I have been playing with
this before, but I don't remember.  There's also this:

  : foo make-label [ 1 ] compile-macro 2 ;

So it should be enough to dup it and compile a call using cw.
ccm = compile and call macro:

  : foo [ 1 ] make-label dup >m swap compile-macro m> cw ;

Or factored out as a macro:

  : ccm make-label dup >m swap compile-macro m> cw ;
  forth
  : foo [ 1 ] ccm ;

I've added it to the compiler as compile-call-macro.  This allows:

  : _kb [ ` kb fbin; ] compile-call-macro fcmd ;

Some refactoring is necessary: compile-macro compiles an exit, which
we don't want here, and it might be simpler to make macro wrapping and
composition explicit.  Works.

Entry: quote/exec
Date: Wed Jun 6 10:10:52 EDT 2012

RPC can be simplified more:

- Only one word for sending a packet, parameterized by i> for input
- The same code sends quote/exec requests

Next: use vector.ss

EDIT: done, works fine.

Entry: Speed vs. code size
Date: Wed Jun 6 12:00:50 EDT 2012

Maybe it's time to make this more explicit: I have a strong irrational
tendency to want to write fast code.  In practice, this means a lot of
macros to eliminate indirect references.  While it's possible to do
this, it's very inconvenient and doesn't usually help much with code
size.  I guess it's always possible to optimize for speed when it's
actually necessary.

Entry: do-arrow (vector.ss)
Date: Wed Jun 6 12:09:29 EDT 2012

The vector abstraction I used was:

  2variable token
  : set-token token -> ;
  : run-token token invoke ;

It seems to work just fine for now.  It uses 2 bytes of RAM to store
the address (which is abstract) but only 1-byte tokens (an index into
a table of addresses), which is convenient for use.  For now this
seems good enough.  Storing the addresses in Flash is possible but
requires more elaborate bookkeeping.  I used it to create the words:

  stdin i>

This solves 99% of all data transfer problems in an abstract way.

Entry: Next
Date: Wed Jun 6 14:54:55 EDT 2012

Probably the USB driver.  Got some new tools now.

Entry: Problem with host commands
Date: Wed Jun 6 15:04:23 EDT 2012

The target shouldn't execute host commands, but live (scat/scheme)
commands that operate on the host stack and access the target stack
explicitly.  Implemented as such, there is no recursion.  The
difference is that host commands take arguments from the target param
stack.  So what should it be?  The question is: implicit or explicit?
It's a bit confusing.  Explicit is better for target->host code, but
then I need to find a good way to make all this explicit in the
documentation.  I.e. the console is "magic"; all the rest (scat host
machine and Forth target machine) is straightforward.

Entry: Pull commands for dumps
Date: Wed Jun 6 16:04:32 EDT 2012

Added _ad and _fd scheme functions that can be called from the target
to perform memory dumps without pushing.  I'm not sure yet how these
will be useful, but at least it's possible to gather more data than
fits in a single packet, since the host is smarter about chunking.
Another option would be to push a data generator that can be called by
host code to continuously stream data.

Entry: GDB stuff
Date: Thu Jun 7 19:19:10 EDT 2012

With a little more effort this can be made to talk to the gdbserver
directly, doing it in-image, avoiding the funky shit.

Entry: memcpy
Date: Sat Jun 9 10:24:50 EDT 2012

This is difficult in the current implementation because one of the
indirect addressing registers is used as the rs.  So either I do it
slowly with memory pointers and the stdio stuff, or I write a special
routine.  For USB it might not be necessary: avoid memcpy altogether
and work in-place.
For Flash there is no problem: initializers can go in Flash and will
probably use less code.

Entry: RPC context save
Date: Sun Jun 10 08:48:25 EDT 2012

RPC calls should preserve the following state:

- 3 stacks: xs, ds, rs
- 2 registers: a, f

However, the interpreter acts as if it owns these registers, so before
doing anything destructive a/f need to be saved. It's easy to do this
on the target, but why not do it on the host? The trouble is that
indirect addressing uses the a register, so this doesn't look possible
without register fetch support. Maybe it's simplest to reuse the
stackptr word to also dump out other state.

Then, what about restoring? Store also needs the a reg, so restoring
the a reg is not possible without target support. Looks like this
needs 2 words: save and restore.

There's one slot left, as I see 2x reply0. I guess slot 6 is not used.
Also, I'm not sure if jsr is still useful, so slot 7 might be reused
too. It seems better to turn the basic >t t> support words into
multi-byte words. OK, done. I'm not sure that was really necessary,
but it's good to have.

Thinking a bit more: it's probably good to keep the interpreter
minimal, and only implement support for other features where they are
actually needed. I.e. when not running a recursive interpreter,
save/restore is not necessary. So let's put it in debug.f

EDIT: reply/1 isn't necessary. Only used for stackptr and checkblk.
stackptr is necessary for ts, so it might go in debug.f also.

EDIT: the interpreter now only does memory transfer + execute (with
and without ack). Decoupled other functionality from the host i/o.

Entry: Correct for .. next
Date: Sun Jun 10 10:14:36 EDT 2012

For .. next isn't implemented correctly when count = 0; it will loop
256 times. How to do that better? What about 1+ followed by a jump to
the "then" clause?

Entry: Next
Date: Sun Jun 10 13:27:40 EDT 2012

- check proper save/restore of a/f registers on host RPC: OK
- continue usb

Entry: USB get descriptor + set address OK
Date: Sun Jun 10 15:53:04 EDT 2012

However, the irony is that the debugging I have now slows it down too
much. So what now? Time for the auto-compressed log message generator!

Anyways.. it should probably work OK as long as there is only a single
PK2 pingpong going on. How many are there now? It's a couple:

- quoted symbol to print
- pb
- ts
- stack@
- a!
- dump

Probably it's best to make a do-it-all log command that dumps out
address, symbol and datastack. Is there a simple way to do
address -> symbol translation? Maybe it's best to just use trace
actually..

EDIT: nope, also too much. I got the USB hardware wire trace so that's
probably enough for now.

Entry: USB next
Date: Sun Jun 10 17:46:42 EDT 2012

Let's just do all transactions one by one. On the wire:

- addr:0 GetDescriptor 80 06 00 01 00 00 40 00
- addr:0 SetAddr x     00 05 48 00 00 00 00 00
- addr:x GetDescriptor 80 06 00 01 00 00 12 00

So the second time it asks only for 12 bytes. I don't see a reply to
that request, so it looks like the address is not OK.

Entry: USB cont
Date: Tue Jun 12 08:15:49 EDT 2012

So on the wire things go wrong after SetAddr. The device does seem to
receive the address:

address @ . 77 OK

But UADDR isn't correct:

UADDR @ . 112 OK

That's actually a bug in tethered.ss, because the correct value 4D is
visible on the dump:

#xF6 abd
F60 00 00 00 00 00 00 00 00
F68 08 00 00 9F 04 20 4D 14

So UADDR is set, but the device doesn't reply. Maybe it should only be
set once? I see the value of address changes.. Data corruption? Might
be the vector table.
Or something more subtle. Do variable addresses clash when defined in
modules? Doesn't seem so..

OK, problem is a missing "UIR TRNIF low". But this probably means
there is no transaction.IN handler for the SETUP packets? I'm
confused..

Next error: for the initial addr==0 GetDescriptor there is
SETUP,IN,OUT, but for the subsequent GetDescriptor to the new address
the OUT phase is missing, and DataCenter (Beagle USB sniffer software)
doesn't see it as a GetDescriptor transaction. Questions: if there is
supposed to be an OUT, why isn't this visible in the sniffer?

Aha, the IN phase in SETUP is DATA0, not DATA1 as in the successful
transaction. Correct should be:

Get Device Descriptor:
  SETUP DATA0
  IN    DATA1
  OUT   DATA1

Set Address:
  SETUP DATA0
  IN    DATA1

Looks like I'm just toggling from the Set Address IN reply, but I
should reset.

EDIT: Refactored a bit, now I get different errors. Stall on OUT phase
of first GetDescriptor call.

0 OUT/DATA0 \ make room for next SETUP request on EP0

Entry: Boolean functions
Date: Tue Jun 12 11:08:44 EDT 2012

1,a -> ~a
01 -> 1
(a xor b) or b

Entry: Things to change
Date: Tue Jun 12 11:50:14 EDT 2012

- cache the PIC18 target code generation in the Racket compilation
  phase; it is very slow. I.e. once the module code is fixed, there is
  no reason to run the macros more than once: the result will be the
  same, so just store the target code in the module and wrap this in a
  Racket compilation phase.

- decouple the PK2 driver from the interpreter image, i.e. provide it
  over TCP. This could even be written in C to allow it to run on a
  larger uC.

Entry: USB linear code
Date: Tue Jun 12 12:04:40 EDT 2012

Instead of writing this as a state machine, I wonder if it's not
easier to use linear code.

Entry: More USB debugging
Date: Tue Jun 12 13:39:20 EDT 2012

The next SETUP txn it receives is:

80 06 00 06 00 00 0A 00

This is a GetDescriptor request for descriptor type 6 (probably the
USB 2.0 device_qualifier), which is not defined here. Weird.. Is this
some kind of standards-compliance test? It doesn't happen on the other
PC... Might be some id-dependent thing in the Linux kernel.. Let's
just continue on the other host. Next request is get config:

80 06 00 02 00 00 09 00

I think it's time to revive the request struct compiler.

Entry: Struct compiler
Date: Tue Jun 12 14:47:04 EDT 2012

This needs a way to create target words in a module. I did this
before, where is it? The syntax is:

(words ...)

Example in serial.ss

Old usb.ss code uses "route/e" "_x>" "xskip" which probably need to be
revived.

Testing: Device descriptor works. String descriptors go over the wire
but are not properly encoded. There's something wrong with config.

Ok, this was max packet size. -> Need to truncate the packet.

Entry: USB cont
Date: Tue Jun 12 18:56:59 EDT 2012

Enum working up to Set Configuration. Next problem is the descriptors
themselves:

[2276531.480629] usb 1-2.1.2.2: new full speed USB device using ehci_hcd and address 37
[2276531.573072] usb 1-2.1.2.2: config 1 interface 0 altsetting 0 has an invalid endpoint with address 0x80, skipping
[2276531.574313] usb 1-2.1.2.2: New USB device found, idVendor=04d8, idProduct=0001
[2276531.574321] usb 1-2.1.2.2: New USB device strings: Mfr=4, Product=3, SerialNumber=2
[2276531.574328] usb 1-2.1.2.2: Product: USB Hack
[2276531.574333] usb 1-2.1.2.2: Manufacturer: Microchip Technology, Inc.
[2276531.574339] usb 1-2.1.2.2: SerialNumber: 0.0
[2276531.575129] usbhid 1-2.1.2.2:1.0: couldn't find an input interrupt endpoint
[2276535.883246] usb 1-2.1.2.2: USB disconnect, address 37

Next is to pick an interface and stick to it.
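About that "invalid endpoint with address 0x80" message above: in an
endpoint descriptor, bit 7 of bEndpointAddress is the direction
(1 = IN) and bits 3:0 are the endpoint number, so 0x80 decodes as "IN
endpoint 0". Endpoint 0 is the control endpoint and can't appear in an
interface, hence the kernel skips it. A little Racket helper to decode
these (my own, not part of Staapl):

(define (decode-ep addr)
  ;; bEndpointAddress: bit 7 = direction (1 = IN), bits 3:0 = number
  (list (if (bitwise-bit-set? addr 7) 'in 'out)
        (bitwise-and addr #x0F)))

(decode-ep #x80) ; => '(in 0), the one the kernel rejects
(decode-ep #x81) ; => '(in 1), a normal bulk IN endpoint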
I'm tempted to do something really simple for Staapl: just wrap the
monitor protocol in 2 vendor-specific requests:

SET_DATA (push)
GET_DATA (pull)

and write a C program that takes data on stdio so this can be combined
with socat. What are the alternatives?

- vendor-specific / really simple
- CDC
- FTDI

Entry: USB rpc stuff
Date: Wed Jun 13 13:54:35 EDT 2012

Caught trying to make a (simple) RPC mechanism for remote USB calls.
One that works for PK2 and can be reused for some ad-hoc Staapl packet
protocol.

I'm running into some weird problem with memset() / memcpy().
Wait... this is silly. 0xad is probably \255 octal?

Entry: Linux USB serial
Date: Wed Jun 13 21:00:39 EDT 2012

What about emulating the simplest driver? Some candidates:

linux-2.6-2.6.32/drivers/usb/serial$ ls -lS *.c |grep -v mod
-rw-r--r-- 1 tom tom 2069 Dec 2 2009 hp4x.c
-rw-r--r-- 1 tom tom 2025 Dec 2 2009 siemens_mpi.c
-rw-r--r-- 1 tom tom 1521 Dec 2 2009 funsoft.c

See also [1].

Reading Documentation/usb/usb-serial.txt in the Linux source gives:

  If your device is not one of the above listed devices, compatible
  with the above models, you can try out the "generic" interface. This
  interface does not provide any type of control messages sent to the
  device, and does not support any kind of device flow control. All
  that is required of your device is that it has at least one bulk in
  endpoint, or one bulk out endpoint.

  To enable the generic driver to recognize your device, build the
  driver as a module and load it by the following invocation:

    insmod usbserial vendor=0x#### product=0x####

  where the #### is replaced with the hex representation of your
  device's vendor id and product id.

[1] http://comments.gmane.org/gmane.linux.usb.general/34211

Entry: USB Bulk
Date: Thu Jun 14 17:00:45 EDT 2012

So the Linux USB generic serial driver seems to be enough to get
going. It uses bulk transfers. I can see OUT and IN transactions on
the wire, but I don't know what I'm supposed to see. First thing to do
is to enable the endpoint buffers.

Entry: IN transaction
Date: Thu Jun 14 19:06:24 EDT 2012

When exactly is TRNIF set for the IN transaction? Figure 17-9
indicates that it is after a transaction is complete. So, to send data
on an IN endpoint, update the BD and transfer it to the USB
transceiver. Whenever it receives an IN token it sends out the data,
and after receiving an ACK from the host it sets TRNIF.

With IN implemented I see stuff on the wire, but the serial port side
doesn't do anything. Probably needs OUT too. So.. now I see a bunch of
data on both endpoints, but still nothing on the serial side.
Overlooking something..

Oops.. going a bit too hard?
[17504.255316] ------------[ cut here ]------------
[17504.255334] WARNING: at drivers/usb/serial/usb-serial.c:410 serial_unthrottle+0x53/0x72 [usbserial]()
[17504.255342] Hardware name: Aspire M3400
[17504.255346] Modules linked in: ftdi_sio usbserial binfmt_misc ppdev lp sco bnep l2cap crc16 bluetooth rfkill vmnet parport_pc parport vmblock vsock vmci vmmon autofs4 powernow_k8 cpufreq_stats cpufreq_powersave cpufreq_conservative cpufreq_userspace cpufreq_ondemand freq_table nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc act_police sch_ingress cls_u32 sch_sfq sch_cbq 8021q ipt_MASQUERADE xt_tcpudp xt_state iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables x_tables bridge stp fuse radeon ttm drm_kms_helper drm i2c_algo_bit tun kvm_amd kvm dm_mirror dm_region_hash dm_log sbp2 ieee1394 loop usbhid hid snd_hda_codec_atihdmi snd_ice1712 snd_ice17xx_ak4xxx snd_ak4xxx_adda snd_cs8427 snd_ac97_codec snd_hda_codec_realtek snd_seq_dummy ac97_bus snd_i2c snd_mpu401_uart snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event snd_hda_intel snd_hda_codec psmouse snd_pcm_oss snd_mixer_oss snd_seq snd_pcm i2c_piix4 shpchp snd_timer snd_seq_device snd i2c_core soundcore serio_raw snd_page_alloc button processor pci_hotplug evdev wmi ext3 jbd mbcache dm_mod sr_mod cdrom sd_mod usb_storage ohci_hcd ahci r8169 libata thermal mii thermal_sys scsi_mod ehci_hcd [last unloaded: usbserial]
[17504.255566] Pid: 7475, comm: cat Tainted: G W 2.6.33.7-rt29 #1
[17504.255571] Call Trace:
[17504.255586] [] ? warn_slowpath_common+0x76/0x8d
[17504.255597] [] ? serial_unthrottle+0x53/0x72 [usbserial]
[17504.255608] [] ? tty_unthrottle+0x39/0x45
[17504.255616] [] ? n_tty_flush_buffer+0xe/0x67
[17504.255625] [] ? tty_ldisc_flush+0x27/0x3c
[17504.255634] [] ? tty_port_close_start+0x13b/0x163
[17504.255643] [] ? tty_port_close+0x11/0x41
[17504.255651] [] ? tty_release+0x23c/0x578
[17504.255661] [] ? handle_mm_fault+0x3cb/0x79a
[17504.255669] [] ? rt_spin_lock+0x29/0x6d
[17504.255678] [] ? __fput+0x10e/0x1e2
[17504.255686] [] ? filp_close+0x5f/0x6a
[17504.255693] [] ? sys_close+0xa2/0xdb
[17504.255701] [] ? system_call_fastpath+0x16/0x1b
[17504.255708] ---[ end trace b863518ac3707459 ]---

EDIT: This is probably because 64 bytes is not a short packet, and the
host keeps polling, so the transfer never stops.

Maybe it's the CRC? Doesn't look like it. From what I see in the PIC
DS this is handled by the transceiver. Maybe it needs a stall packet?
From [1]:

  IN: When the host is ready to receive bulk data it issues an IN
  Token. If the function receives the IN token with an error, it
  ignores the packet. If the token was received correctly, the
  function can either reply with a DATA packet containing the bulk
  data to be sent, a STALL packet indicating the endpoint has had an
  error, or a NAK packet indicating to the host that the endpoint is
  working but temporarily has no data to send.

Searching for NAK in the PIC DS doesn't turn up an explicit mechanism.
Maybe just don't send stuff?

Ok, this goes a little better. OUT now seems to work: when I cat a
file to /dev/ttyUSB it goes over the wire in its entirety. However,
the IN stuff is still weird. In response to IN, the host sends an OUT
transaction. Maybe there's a handshake? Or is the IN data interpreted
in some way? Wait, this could just be TTY stuff in response to the
codes sent by the device. Let's just send characters. It's probably
because echo is on. And the fact I don't see anything is probably
because of line buffering.
Adding CR/LF to the string gives this:

broebel:/home/tom# cat /dev/ttyUSB0
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]

Looks like it's working!

[1] http://www.beyondlogic.org/usbnutshell/usb4.shtml#Bulk

Entry: Serial on windows?
Date: Fri Jun 15 09:38:23 EDT 2012

Now, pick one that also works on windows[1].

[1] http://www.linuxjournal.com/article/6573

Entry: Next
Date: Fri Jun 15 09:58:32 EDT 2012

- Make PK2 more usable: use the RPC stuff / fix the sync bugs.
- Speed up compilation: generate target code only once by moving it to
  the transformer phase.
- USB: MIDI / generic interfaces.

Entry: Cleanup staaplc
Date: Fri Jun 15 11:07:11 EDT 2012

It doesn't look like these lines are really necessary:

;(target-words-check! labels)
;(code-pointers-set! pointers)

As long as we have the correct .fm corresponding to the compiled code,
everything should be there. It's not necessary, and a bit confusing,
to redefine words. If I recall, the reason this is there is to allow
the shell to communicate with "binary only" images, by just saving the
addresses. I no longer think this is necessary, or a good idea.

Entry: Cache target code compilation
Date: Fri Jun 15 11:48:54 EDT 2012

Trouble here is that it needs to have a syntax representation. Is that
possible with arbitrary (circular) data structures? Doesn't look like
it:

#lang scheme
(require (for-syntax scheme))
(define-syntax (foo stx)
  (let ((val (make-vector 1)))
    (vector-set! val 0 val)
    #`'#,val))
(define bar (foo))

datum->syntax: cannot create syntax from cyclic datum: #0=#(#0#)

So... before this can work, there first needs to be a non-circular
representation of target words, where names are used to introduce
circularity. The only place where this happens in code is in an
assignment statement that links a word to its code. Can this be broken
open?

grep -nrI . -e set-target-word
./comp/postprocess.ss:68: (set-target-word-next! w w+)
./comp/postprocess.ss:71: (set-target-word-code! word code)
./comp/postprocess.ss:91: (set-target-word-code! word
./asm/assembler.ss:134: (lambda (w) (set-target-word-address! w #f))
./asm/assembler.ss:167: (set-target-word-address! word addr)
./asm/assembler.ss:231: (set-target-word-bin! word inss)

I'm not sure this is going to be an easy change. It might be better to
live with the situation and first make PK2 more robust.

Entry: PK2 robustness
Date: Fri Jun 15 12:28:33 EDT 2012

It looks like the main problem is the device getting out of sync.
Stuck PK2 firmware is not so much of a problem once some safeguards
are installed. Let's investigate. Run the current usb test app a
couple of times, using CTRL-C to interrupt, and power-cycle the
target. The error message after a while is:

OK test
C-c C-c
Command "test" interrupted.
Trying cold restart... target-off target-on
icsp-recv: pk2 read: expected 3 bytes, got 1:
icsp-recv: b:2 h:#t a:#f -> (0)

So it gets out of sync and after that never recovers. How to resync?
Will try resync on this:

(unless (= expect-size real-size)
  (error 'icsp-recv "pk2 read: expected ~a bytes, got ~a:\n~a\n"
         expect-size real-size (log-msg reply)))

Looking on the logic analyzer, there is some transfer going on, but
it's not clear who is writing. This probably needs a look on the scope
to see if the read/write phase got messed up.. There is a collision
during one bit, so something is out of sync. What I don't understand
is why a power reset doesn't solve it. Hard to see what's going on,
but I think it's safe to assume the problem is on the PK2 side. The
target seems to be fine.
Solution for now:

(define (reconnect)
  (pk2-close)
  (sleep 1)
  (connect))

The PK2 RESET command doesn't work if the PK2 is stuck. It disconnects
the USB device while it is opened with libusb, which seems to cause
trouble..

Entry: Opti stuff
Date: Fri Jun 15 16:08:48 EDT 2012

Might need to revert. The building blocks seem to work but the driver
doesn't respond to the first SETUP packet.

EDIT: fixed. Problem was a different interface for OUT/DATA0.

\ PIC18 specific (since USB is also PIC specific): use rot<<, indirect
\ addressing using a register, and assumption that >r and r> can span
\ across procedures.

\ ep -- bd
: IN rot<< 1 + ;
: OUT rot<< ;

\ Toggle mask for stat+ : bit 7 for and, bit 6 for xor.
: DATA0 #x00 ; \ 0 and 0 xor
: DATA+ #xC0 ; \ 1 and 1 xor
: DATA1 #x40 ; \ 0 and 1 xor

macro
\ Just needed once, so use macros
: bdaddr \ n -- lo hi
    rot<< rot<< 4 ;
\ Update status register
: stat+ \ togglemask ustat -- ustat+
    over rot>> and xor \ Apply togglemask to DATAx bit
    #x40 and           \ Keep only DATAx, rest was filled by USB
    #x88 or ;          \ UOWN set to USB, DTSEN=1, KEN=0, INCDIS=0, BSTALL=0
forth

\ Prepare buffer descriptor to send to USB transceiver with updated
\ DATAx and buffer size. Destroys `a' reg.
: >usb \ n bd x --
    a>r >r
    bdaddr ah ! al !
    r> @a+   \ STAT @
    stat+ >r
    !a-      \ CNT !
    r> !a    \ r> STAT !
    r>a ;

: OUT/DATA0 OUT DATA0 >usb ;
: IN/DATA0  IN  DATA0 >usb ;
: IN/DATA1  IN  DATA1 >usb ;
: IN/DATA+  IN  DATA+ >usb ;

Entry: Next
Date: Fri Jun 15 18:47:40 EDT 2012

Time for reflection, maybe? USB is working. Yeah!

Is it worth writing alternative firmware for PK2? Probably not. At
least not yet. Adding a programmer to the Staapl PK2 driver might be
nice, but is not really necessary either..

There's another window that opens up now: PC USB <-> stuff ;) For the
synth project, the killer app is an exponential D/A converter with
digital (time) feedback. Maybe it's time to start doing that?

Entry: USB midi
Date: Fri Jun 15 19:02:01 EDT 2012

I have a serial port, which is trivial to add to Pd. Could already run
MIDI over it. Ha, I have a sniffer now, I don't need to read docs!
Endpoints send 64 byte packets containing MIDI messages, with a single
byte prefix, and padded with zero. From page 16 in [1] this byte is
cable number + code index. Each message is 32 bit, (I guess) with up
to 16 messages per 64 byte IN/OUT transaction. This is the
configuration info sent by Bus 003 Device 020: ID 09e8:0076 AKAI
Professional M.I. Corp.

00000000 09 02 65 00 02 01 00 a0 32 09 04 00 00 00 01 01 |..e.....2.......|
00000010 00 00 09 24 01 00 01 09 00 01 01 09 04 01 00 02 |...$............|
00000020 01 03 00 00 07 24 01 00 01 41 00 06 24 02 01 01 |.....$...A..$...|
00000030 00 06 24 02 02 02 00 09 24 03 01 03 01 02 01 00 |..$.....$.......|
00000040 09 24 03 02 04 01 01 01 00 09 05 01 02 40 00 00 |.$...........@..|
00000050 00 00 05 25 01 01 01 09 05 81 02 40 00 00 00 00 |...%.......@....|
00000060 05 25 01 01 03 |.%...|
00000065

[1] http://www.usb.org/developers/devclass_docs/midi10.pdf

Entry: USB Serial: raw mode by default?
Date: Fri Jun 15 20:53:44 EDT 2012

Is there a way to disable all tty processing, especially echo, before
any application has a chance to send data to the device?

Entry: Too Concrete
Date: Fri Jun 15 22:43:41 EDT 2012

For memory operations, current Staapl isn't abstract enough. Juggling
the a register and working with bytes / words is a pain. I wonder if
the r register can be freed up a bit. Can we make it so that 'r' is
not guaranteed to be available in interrupts?
This would allow the FSR1 register to be used as a pointer reg for
memcopy. Interrupts could have their own stacks..

: >IN1
    a>r
    13 4 a!! a>        \ get buffer count
    top 6 high? if     \ check if full
       drop IN1-flush IN1-wait 0
    then               \ check if full
    dup >r 128 + 5 a!! \ get next byte address
    >a                 \ store byte
    r> 1 +             \ increment counter
    13 4 a!! >a        \ store it
    ;

Writing code like this is really too much work, isn't it? I'm having
fun with this, but sometimes it's a royal pain to express something,
especially dealing with memory. Mostly because I don't want to make
"slow" abstractions; I'm still programming a register machine... Stuff
like this really can't beat C.. It will just never get done. Maybe I
should make a real Forth to try out some other ideas?

In the code above I'm thinking: it really can't be that hard to write
a byte to a buffer in memory and increment a counter. But I think it
is on PIC...

Some opti: the counter could just contain the al register. Before
sending out, mask out the bits that are not used.

Version two

Entry: PIC and memory buffers (banked access)
Date: Sat Jun 16 07:38:40 EDT 2012

Re: last post.. Trouble is of course that 1. PIC is a pain to use and
2. abstraction solves everything. (and 3. I have to accept these
things ;)

However, thinking in assembly, it seems that what is needed is a way
to use banked addressing. The pointers themselves are in a fixed
location, so it's not necessary to use the 'a' register here.. Let's
define the words @b, !b, b@, b! that perform the banked access and
access to the banking registers. DONE.

Entry: Next
Date: Sun Jun 17 14:42:08 EDT 2012

- >IN1 using banked access for pointer

Make the abstraction such that there is no huge setup penalty for
transferring multiple bytes. I.e. just use >a but add an open/close
abstraction that loads/stores the CNT reg.

Entry: PIC indirect addressing
Date: Sun Jun 17 14:59:42 EDT 2012

In general this is a pain in the rear, except when it's possible to
use the FSR registers to keep loop state. With a bit of tinkering this
doesn't seem to be a big problem. It makes a lot of sense now why the
ColorForth chips have 2 pointer registers: one source, one dest. It
makes loops over memory a lot more efficient than juggling pointers.
It's really just stdin/stdout.

Anyway, here's the new IN1 update using b and a.

\ When filling up the buffer, CNT has AL. Strip off the bits when
\ sending it out. We can just use the >a and a> words to access the
\ buffer.

\ Since the location of the buffer is known, these are implemented as
\ macros to make the other code a bit more readable, and to have a
\ more efficient implementation. Indirect addressing is inefficient
\ since we're already using all 3 pointer registers.

macro
: IN>BD OUT>BD 4 + ;        \ EP -- BD
: OUT>BD 8 * ;              \ EP -- BD
: CNT 1 + ;                 \ BD -- BD.CNT
: IN1/CNT 1 IN>BD CNT ;     \ -- BD.IN1.CNT
: bd@ >m bd-page b! m> @b ; \ addr -- value (fetch in BD page)
: bd! >m bd-page b! m> !b ; \ value addr -- (store in BD page)
forth

\ These serve as "open/close" for the IN1 buffer.
: a/IN1-begin IN1/CNT bd@ al ! buf-page ah ! ;
: a/IN1-end al @ IN1/CNT bd! ;

\ Single byte access, saving a.
: >IN1 a>r a/IN1-begin >a a/IN1-end r>a ; \ byte --

Entry: Scheduling
Date: Tue Jun 19 12:08:55 EDT 2012

Next problem is scheduling, since there are now two tasks; i.e. how to
handle USB consumer/producer control flow? Maybe it's time to start
using interrupts? The alternative is to do the polling in the event
loop. Maybe that's going to be easier.
Entry: Cheapest USB PIC18F
Date: Tue Jun 19 23:43:30 EDT 2012

Currently that's the PIC18F13K50[1] at $1.32 volume price, compared to
$3.44 for my default PIC18F2550. Mouser pricing for the 13K50 is[2]:

1:   $2.39
10:  $1.91
25:  $1.75
100: $1.58

Bummer: reading the Flash programming manual[3] it looks like D+/D-
are multiplexed with PGC/PGD, which means the PK2 approach can't be
used. It probably needs a bootloader to be used with Staapl. Talking
about bootloaders, it might be simplest to go that route now that USB
is working.

[1] http://www.microchip.com/wwwproducts/Devices.aspx?dDocName=en533925
[2] http://www.mouser.com/Search/ProductDetail.aspx?qs=hH%252bOa0VZEiAcEtBytpgHsA%3D%3D
[3] http://www.microchip.com/wwwproducts/Devices.aspx?dDocName=en533925

Entry: GDB stuff
Date: Wed Jun 20 15:30:39 EDT 2012

See previous post[1]. How to connect this to a server application?
Let's make a small C app that handles requests by waiting for a single
write on a named pipe, and writing back a reply on another.

The thing is.. it's a lot simpler to just exec GDB from the C app, and
use the --annotate=3 protocol that's also used in emacs.

EDIT: Tried the exec GDB approach for a closed project. Works well!
The main benefit is that the test system and target system share the
same language + code base while running on different hosts.

[1] entry://20120606-000056

Entry: Busy
Date: Wed Jul 4 13:16:25 EDT 2012

Couple of weeks off the project due to work and holidays. Next thing
to do is to make a strategy for having 2 tasks: USB driver and main
app. Some options:

- state machine polling loop
- blocking tasks
- ISR + main task

The usefulness of these depends on the application. ISR+main seems to
be the simplest approach. Trouble on PIC18 is that tasks need to share
the return stack; other than that, task switching is fast. The USB
driver can easily be written as a state machine, so it can run in a
polling loop or from an ISR. Let's go for slow ISR for USB, then move
from there.

Entry: Cheapest PIC18F
Date: Tue Jul 31 17:23:08 EDT 2012

For an electronics project I want the cheapest possible 18F chip that
can be used with Staapl. I'm thinking to go for a 2-chip solution: one
programmer/hub with USB and one or more slave chips.

At volume pricing, the 18F13K22 is currently the cheapest at $1.16
[1]. PDIP low volume is $2.30. The 18F1220 volume price is $1.96, with
low volume PDIP at $2.44. So for low volume it doesn't make much
difference, though the 13K22 is faster (16MHz intosc, up to 16 MIPS at
64MHz).

[1] http://www.microchip.com/wwwproducts/Devices.aspx?dDocName=en538201

Entry: Synth Club
Date: Tue Jul 31 20:27:25 EDT 2012

1. Working stuff
   - 18F1220 synth: cheap + boards available.
     Problem: this is programming. How to simplify?
   - Mixer feedback
2. Breadboard experiments
   - Inverter feedback

Entry: Starting again
Date: Mon Oct 15 10:39:51 EDT 2012

Question now is: what works and what doesn't work for PK2? I'm quite
annoyed by this, to the point that I want to be done with the whole
bazaar. It's too much effort to work around this all the time..

Connecting to PICkit2.
datfile: /usr/local/bin/PK2DeviceFile.dat
iProduct: PICkit 2 Microcontroller Programmer
Console startup failed:
#(struct:exn:fail bad-reply: id:0 msg:() #)
Continuing with REPL anyway: Press ctrl-D to quit.
OK

After that it works.

Entry: Debugger
Date: Mon Oct 15 11:21:13 EDT 2012

The thing is that the PK2 is only necessary for kernel programming.
After that it's just a digital serial system, so I wonder if it might
be possible to wire-or onto the ICD lines.
Entry: Next
Date: Mon Oct 15 15:15:52 EDT 2012

1. I can't do anything without proper debug tools, so that problem
needs to be solved first. Since the final product is going to have
USB, it seems best to go with a bootloader and debug console built in.
Focusing on a single chip would make things easier too. It's nice to
play around with the smaller ones, but the 18F2550 family will do just
fine for now. With a bootloader there is the PK2 to program it and
debug the bootloader. Remaining debugging could be done over USB, i.e.
"inside the OS".

2. Once a proper USB device is working, this can also be used as a
debugger/programmer.

NEXT: Get the Staapl protocol working on a virtual serial port.

Entry: Standard bootloader
Date: Mon Oct 15 15:21:58 EDT 2012

It would be best to re-use a standard bootloader. Is there one?
Doesn't look like it, so the best bet I have is to boot it up as a
serial port and use the Staapl protocol.

Entry: Staapl protocol on virtual serial port.
Date: Mon Oct 15 15:30:55 EDT 2012

Problem: buffering. It's probably best to stick to the ping-pong
protocol. EP0 has IN and OUT buffers, which are separate, so there
should be no problem simulating single-byte rx/tx words.

Entry: Next: console on USB SERIAL
Date: Sat Oct 20 14:55:14 EDT 2012

The problem is flow control. The monitor is written in terms of
blocking read, so some inversion is necessary. The big question is: do
we want tasks? It's easy enough on PIC18 as long as the hardware stack
is deep enough, but it's quite a pain if it is not. One more reason to
switch to a different arch. The other approach is to just use
interrupts. This requires some thinking, as I do need the
high-priority interrupt for the audio stuff. Let's do this then.
0x18 is the LPIV (low-priority interrupt vector).

Section 17.5 "USB Interrupts". Low priority interrupts:

PIR2 USBIF : USB interrupt flag
IPR2 USBIP : USB interrupt priority
PIE2 USBIE : UIE : Propagate USB interrupts to microcontroller

From Figure 9-1, this is the configuration that enables USBIF to
interrupt the CPU through the low-priority 0018h vector:

USBIF = 1
USBIE = 1
USBIP = 0
GIEL/PEIE = 1 : peripheral interrupt enable
GIE/GIEH = 1  : global interrupt enable
IPEN = 1

Also, not to forget: from UIR -> USBIF there is UIE.

UIE = 7F

Entry: What to save in ISR?
Date: Sun Oct 21 10:16:39 EDT 2012

The main question is: is it safe to use the stacks? I think so,
because none of the ASM instructions leave the stacks in an
inconsistent state. So the recipe is:

- dup is MOVWF, which does not affect status flags, so WREG can be
  saved first.
- STATUS can then be copied into WREG. The above two are just:
  STATUS @.

To restore, care needs to be taken that the drop doesn't mess up the
flags after restoring. So STATUS ! can't be used. There is an "nfdrop"
that uses MOVFF to not affect flags. This should work:

dup STATUS ! nfdrop

- save STATUS before doing any dup, since dup affects flags.

Entry: Simpler, not more complex.
Date: Sun Oct 21 10:36:30 EDT 2012

I'm thinking about how to do this bootloader thing, but really, that
is currently not the issue. Stick to the kernel + interaction
approach, where:

- PK2 or another Microchip programmer is used to upload the kernel.
- Interaction is over the USB serial port.

Later, if necessary, the kernel can be programmed only once, and all
updates can be done over USB interaction (i.e. arduino-style). It's
probably not a good idea to start adding all kinds of hooks to try to
predict usage at this point: interrupts and USB descriptors etc..
Entry: Fundamental linking question
Date: Sun Oct 21 10:39:11 EDT 2012

I had started to refactor things to use Racket modules, but in
practice, just leaving .f files with undefined names seems to be a lot
simpler. Is there a way to modify the compiler to manage dependencies
better when the .f files are changed? Currently that's completely
ignored. Maybe this can use an "include" directive or so.

Entry: Serial port echo
Date: Sun Oct 21 11:24:37 EDT 2012

So, service-usb is run from interrupt. Next is to make an echo app in
"userspace". To avoid double buffering, this approach can be used: if
IN1 is empty and owned by the uC, it's possible to send out a packet
by locking the usb interrupt, filling the buffer, and sending it out.
Otherwise, just wait until it is.

Trying a bit, I can get it to work without interrupts, but once I
start playing with the interrupt enable/disable it goes wrong. Is
there a simpler way to synchronize? The thing is this: if we're not
currently sending, the IN1 buffer is owned by the uC. The ISR will not
touch it until the flags are set, so can that be used?

EDIT: Might actually be that it never leaves the ISR because some flag
is not acknowledged. Maybe something else is triggering the interrupt?

Entry: Status LED
Date: Sun Oct 21 11:55:14 EDT 2012

I'm reminded of a simple fact: if your debugger doesn't work, you need
a status LED to figure out what's going on. And currently, the PK2
stuff is playing up again.

Entry: Tools trouble
Date: Sun Oct 21 13:26:46 EDT 2012

Hmm... This sucks. There are a bunch of things that are not really
working as they should, so maybe a new approach is necessary.
Problems:

- PK2 is not reliable
- Bootloaders are cumbersome to use (what if they get overwritten)
- Serial console doesn't have a reset

Instead of making a radical change and ending up at another problem,
is there a way to do this with minimal changes? Would a proper reset
for the serial console be enough?

Entry: What's working?
Date: Sun Oct 21 15:11:17 EDT 2012

WORKING:
1. "test" on pk2-2550-48.fm + just PK2 connected, after 2nd try after
   full unplug.
2. ctrl-C + reload .live + "test"
3. ...
4. ...
5. "testi", all the same
6. ctrl-C + reload .live + "testi"
7. ...

NOT WORKING: The same, but using the serial console. Disconnecting the
power might help. Nope... something weird is going on. Maybe some
ports are disabled or something? Weird.. Let's cut the power from the
USB-TTL, see what happens then.

Trouble was that "PIR2 USBIF low" was needed.

Entry: Byte read/write
Date: Sun Oct 21 18:47:48 EDT 2012

Byte write:
- Busy loop until UOWN=0 (if UOWN=1 a transaction is in progress)
- Save a byte to the buffer, update the pointer
- If the buffer is full, send to USB

Flush:
- 1 IN/DATA0

Implementation: CNT can be used to store the LSB of the byte pointer.
I did this impl before. Where did it go? Yep: a/IN1-begin a/IN1-end
IN1-flush. A toy model of this buffering logic is sketched below,
after the next two entries.

Entry: Test primitives
Date: Sun Oct 21 20:26:17 EDT 2012

I need a proper "equals" operation that properly drops the top
element. It seems best to use the carry flag for conditions, since it
survives "drop".

: = =>c c? ;

Entry: Reproducibly stable startup for USB code
Date: Mon Oct 22 13:32:21 EDT 2012

I don't know what's going wrong, but the problem seems to be that it
doesn't want to start up properly right after flashing with PK2. This
works:

- make .flash / .live
- exit staapl
- power cycle both PK2 and the USB (1-2 seconds)
- make .live
- testi

Could be a hub issue. Might want to plug it in directly..
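Here's the toy Racket model of the byte writer referenced above. It's
a sketch only: usb-submit! is a hypothetical stand-in for the UOWN
busy-wait and the buffer descriptor update, which the model ignores.

(define buf (make-bytes 64))
(define cnt 0)

;; Stand-in for handing the filled buffer to the USB engine.
(define (usb-submit! bs)
  (printf "IN packet, ~a bytes\n" (bytes-length bs)))

(define (in1-flush)
  (when (> cnt 0)
    (usb-submit! (subbytes buf 0 cnt))
    (set! cnt 0)))

;; Byte write: store, bump the pointer, flush when full.
(define (>in1 b)
  (bytes-set! buf cnt b)
  (set! cnt (+ cnt 1))
  (when (= cnt 64) (in1-flush)))

;; e.g. (for ([b (in-range 70)]) (>in1 b)) emits one 64-byte packet
;; and leaves 6 bytes buffered until the next in1-flush.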
Entry: Loopback working
Date: Mon Oct 22 13:37:15 EDT 2012

Not particularly fast though. Saturates at 62.3kB/s; the uC is running
at 12 MIPS. Since each byte is processed separately by >IN1 and OUT1>,
this gives 192 instructions per byte. Actually, it quite consistently
gives 62.5kB/s, which is 64000 / 1024, or exactly one 64-byte buffer
per millisecond (USB frame tick).

Entry: Indirection for >INx OUTx> ?
Date: Mon Oct 22 13:54:25 EDT 2012

Would be nice, but maybe isn't necessary.

Entry: Automatic banked access
Date: Mon Oct 22 13:55:12 EDT 2012

The bank select instruction gets in the way of optimization. It's
probably best to make bank-select words for all operations that use
direct memory references, and just inject the instruction in a
preprocessing step based on the actual value of the address. This
could use the following map:

000-07F access RAM
080-0FF device registers
100-F7F bank-select accessed
F80-FFF stack ram (from 080-0FF)

For straight-line code the bank select instruction could then also be
omitted. (A Racket sketch of this classification follows below, after
the next entry.)

Entry: Next: board powered from USB
Date: Mon Oct 22 16:46:26 EDT 2012

- Power board from USB and disconnect PK2/FTDI.
- Drive the interpreter

Power from USB works with serial loopback. However, I need to unplug
it for about 5-10 seconds before replug. Maybe it needs brown-out
reset? Hmm... it worked for a bit, now not at all. Ok, it does work
when I wait 5 seconds and plug it into a different hub, facing down.
Maybe a solder issue, or a voltage issue. Maybe I should try with an
LF part?

It's very picky about the port it goes into. Plugging it directly into
the tower doesn't work either.. Only the powered hub. I don't think
it's the voltage. Very unpredictable behavior. Probably some soldering
issue. It all works fine as long as it's powered by the PK2. It seems
really just to crash. Maybe a pause at bootup would work?

I checked on the scope. Don't see much on the outside except for the
3.3V VUSB signal. It stays on for about a second, then discharges to
about 1V over a little over 2 seconds. This cycle repeats
indefinitely. I don't see anything happening to the supply voltage or
the reset. The device descriptor is received OK. Very strange. This
might be related to the thing not working with just a serial console
attached..

EDIT: Trying again using interactive PK2: it works for a bit, then
disconnects again. Maybe it's time to switch back to uart and see what
is actually going on.

EDIT: Trying some more things. Added a startup delay:
0 for 0 for 0 for next next next, which seems to work.

EDIT: When I connect just the ground from PK2, it seems to run without
problems. Disconnecting the ground then keeps it running. Nope, this
is not consistent. I just switch the hub's switch (unplug/replug) and
it starts OK. What can this be? Touching the solder joints makes it
crash. If I don't send any data it lives longer. Running from PK2 with
USB power disconnected works fine. I can only make it crash by
touching the oscillator connections. Can this be caused by a floating
input pin?

More weirdness. PK2 is now supplying 3V. I noticed this when trying to
close the USB 5V jumper while running. After programming, PK2 gives
4.3V. Running with that, after a while it gives up. Stopping PK2, then
starting again runs it at 3V. First time no proper comm (bad-reply);
2nd time it works. Could it be that the regulator is enabled, but the
signalling voltage is not 3.3V?

EDIT: It is. So everything seems to work at 2.7 and 3.6V, but 4.3V
doesn't work. Maybe the power is just too unstable?
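The Racket sketch of the address classification promised above,
assuming the map from the "Automatic banked access" entry (the symbol
names are mine, not Staapl's):

(define (addressing addr)
  (cond ((<= #x000 addr #x07F) 'access-ram)  ; no bank select needed
        ((<= #x080 addr #x0FF) 'device-reg)
        ((<= #xF80 addr #xFFF) 'stack-ram)
        (else 'banked)))                     ; inject a bank select here

(addressing #x020) ; => 'access-ram
(addressing #x234) ; => 'banked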
Entry: 18F2550 voltage
Date: Fri Oct 26 08:58:57 EDT 2012

From the datasheet:

  1.2 Other Special Features
  Like all Microchip PIC18 devices, members of the
  PIC18F2455/2550/4455/4550 family are available as both standard and
  low-voltage devices. Standard devices with Enhanced Flash memory,
  designated with an "F" in the part number (such as PIC18F2550),
  accommodate an operating VDD range of 4.2V to 5.5V. Low-voltage
  parts, designated by "LF" (such as PIC18LF2550), function over an
  extended VDD range of 2.0V to 5.5V.

This makes no sense. It's an F part (not LF) that runs fine on lower
voltage but not on higher voltage.. WTF? Also, later in the DS the
operating voltage is mentioned to be a larger range:

  28.1 DC Characteristics: 2.0-5.5V

Let's replace the chip and see what it does.

EDIT: Maybe this is really an LF device that's marketed as an F
device. I assume they are the same die but get sorted into 2 range
bins. Some discussion here[1]. People indeed think they are the same
chips passing some extra testing, verified by some F parts working at
LF voltages.

[1] http://www.electro-tech-online.com/microcontrollers/130612-difference-between-pic-lf-pic-f.html

Entry: Next
Date: Fri Oct 26 09:21:16 EDT 2012

1. Work on firmware: on 3V it seems to work, so let's just continue.
2. Try different hardware:
   - USBPicStamp
   - Build new board for verification using 18LF4550
   - Swap in new 18(L)F2550 (waiting for sample order)

Entry: FLUNK presentation
Date: Mon May 6 11:06:23 EDT 2013

What is Staapl?

- "assisted" macro assembler
- low-level code modeling tool
- experiment: does it make sense to write "abstract low-level code"?

Some basic ideas:

- FACTOR: uC code is often code-size constrained. A stack language
  (Forth) or stack machine model can help here. Why? Replacing global
  registers (RISC machine) with a 2nd stack can decouple code,
  introducing more opportunity for code reuse.

- GENERATE: uC code is often very specialized and "hand optimized".
  This means there is _implicit_ structure in the code that is no
  longer visible in the low-level assembly (or C) code. It makes sense
  to generate it from a higher level description, to make this
  otherwise hidden structure explicit. I.e. represent a model and a
  specializer as opposed to just the specialized code.

- FUNCTIONAL: at the meta level, a stack language is easily
  represented as a pure functional language:
  syntactic concatenation -> composition of code generators.

How does it work? Take the Forth snippet

1 2 +

which loads two numbers on the parameter stack and performs the "+"
operation. The result of this is a parameter stack loaded with the
number 3. When compiling this to machine code, one would typically see
the instruction sequence:

push 1
push 2
call +

Note there is a direct correspondence between a Forth code sequence
and a machine code sequence. The trick is then to interpret the
recently generated machine code as a _data stack_. I.e. after
compiling "1 2", the compiler sees the following generated code
segment:

push 1
push 2

When it encounters "+", instead of compiling the call to "+", it
removes the two _instructions_ from the compilation buffer, performs
the computation at compile time, and produces the result:

push 3

In general, this is called _partial evaluation_. This particular
structure is also called a _peephole optimizer_, in that it only looks
at the most recent code.
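To make the "1 2 +" example concrete, a minimal Racket sketch of "+"
as a function on the instruction stack. The instruction representation
(a list, most recent instruction first) is hypothetical, not Staapl's
actual data types:

(require racket/match)

;; The compilation buffer doubles as a compile-time data stack.
(define (compile-+ code)
  (match code
    ;; Two literal pushes on top: fold them at compile time.
    [(list `(push ,b) `(push ,a) rest ...)
     (cons `(push ,(+ a b)) rest)]
    ;; Otherwise emit a real call to the runtime word.
    [_ (cons '(call +) code)]))

(compile-+ '((push 2) (push 1))) ; => '((push 3))
(compile-+ '((call f)))          ; => '((call +) (call f))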
Once this mechanism is in place, it is possible to generalize it to
"virtual instructions", meaning to compile code that cannot be
compiled directly to machine code, but will act as parameters passed
to other code generators. This introduces _composition_ of code
generators.

And that's it. Everything else fits in this picture. It allows the
same syntactic representation for compile time and run time code. It's
not that different from the macros vs. functions idea in Scheme, only
that this works on stacks of code and not data flow graphs (= lambda
expressions).

So what's the point? It allows mixing machine mapping (i.e. PIC18
macros) and high-level code generators, all in one representation,
giving access to the _real_ machine. (As opposed to performing the
partial evaluation directly on the input syntax. This is related to
Joy's syntactic vs. semantic isomorphism.)

Example in Staapl:
- conditionals use uC flags instead of the data stack.

Entry: Why work on the stack level as opposed to the syntax level?
Date: Mon May 6 12:16:44 EDT 2013

From [1]:

  In Joy, the meaning function is a homomorphism from the syntactic
  monoid onto the semantic monoid. That is, the syntactic relation of
  concatenation of symbols maps directly onto the semantic relation of
  composition of functions. It is a homomorphism instead of an
  isomorphism because it is onto but not one-to-one, that is, some
  sequences of symbols have the same meaning (e.g. "dup +" and "2 *")
  but no symbol has more than one meaning.

How is this relevant for Staapl? The idea is that it is easier to work
with the semantic representation (functions) than the syntactic
representation. I.e. instead of using term rewriting as the
computation engine, one uses function composition, which in practice
is implemented as directed rewriting, i.e. pattern matching. The
advantage here is being able to encode machine-specific idioms (i.e.
machine instructions) alongside high-level constructs.

[1] http://en.wikipedia.org/wiki/Joy_%28programming_language%29
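The "dup +" / "2 *" example is easy to check in the semantic model,
where a word is a function from stack to stack and concatenation is
composition. A toy model, not Staapl's Scat implementation:

;; Words as functions on a list; top of stack = car.
(define (dup s) (cons (car s) s))
(define (add s) (cons (+ (car s) (cadr s)) (cddr s)))
(define (mul s) (cons (* (car s) (cadr s)) (cddr s)))
(define (lit n) (lambda (s) (cons n s)))

;; Concatenation of symbols = composition of functions.
(define (concat . ws) (apply compose1 (reverse ws)))

((concat dup add)     '(21)) ; => '(42)
((concat (lit 2) mul) '(21)) ; => '(42)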