This is the BROOD dev log. The older the entries, the less accurate they probably are with respect to the current design and implementation. If you have comments, post them to the brood list: http://lists.goto10.org/cgi-bin/mailman/listinfo/brood-list

Entry: compiled / interpreted
Date: Thu Feb 23 17:33:21 EST 2006

After briefly using a direct interpreter I've opted for

: run compile execute ;

Code is now always compiled, where

: compile translate asm ;

with 'asm' being a scheme eval in the forth module. (There's still the word 'scheme' which evals in the current module.) This approach is way simpler than direct interpretation, and it allows words to be represented as scheme procedures. Also, it was a piece of cake to get true tail calls this way, while in the interpreted version it was more difficult. (Fri Apr 14 21:53:31 CEST 2006: it does work now, but compilation is still the preferred method.)

(In case you're wondering, I'm trying to split up packet forth into 2 layers: one closer to scheme, with GC, and one closer to forth, without too many dynamic memory tricks.)

So basically, I'm re-inventing Joy, Factor, ... I found a lot of concatenative languages. See http://del.icio.us/doelie/concatenative

Entry: true forth->c compilation
Date: Thu Feb 23 17:37:41 EST 2006

I'm thinking of display lists from OpenGL. If there's no conditional code and no loops, there's little challenge in compiling code: you can just flatten the call tree to a long list of primitives. Including loops and conditional code, it is still relatively straightforward.

Now, to do this properly, it might be best to keep using a register language (C functions) for primitives. These can easily be compiled to an 'unrolled stack' C function. This makes it easiest for the C compiler to do its data flow analysis for optimization. Maybe it's best to just use a new variable for each assignment. All stack manipulations then just operate on the current stack of variables.

: x 2 add ;
: y 3 mul ;
: z 1 x dup y sub ;

// z
d0 = 1            // 1    (d0)
d1 = 2            // 2    (d0 d1)
d2 = add(d1, d0)  // add  (d2)
                  // dup  (d2 d2)
d3 = 3            // 3    (d2 d2 d3)
d4 = mul(d3, d2)  // mul  (d2 d4)
d5 = sub(d2, d4)  // sub  (d5)

Then the next steps would be to include
* conditional code
* loops
Maybe it's easiest to just not support unrolling for those.

Entry: the chicken was first!
Date: Fri Feb 24 11:54:43 EST 2006

I found this today: http://savannah.nongnu.org/projects/chicken/ So, not only is my idea non-original, the whole chicken-egg meme is already taken! So the project is now called 'brood', with chicken and egg being its components.

Entry: cross compilation
Date: Fri Feb 24 14:08:45 EST 2006

So what's the most important difference between a cross-compiler/monitor/interpreter and a metacompiling one? The language and the meta-language are not the same.

Was adding some features in egg. The interpreter only accepts numbers and symbols, so lists can have a separate meaning; I made them scheme code for now.

(define (compile-other thing) (primitive-eval thing))

This is the same as [ ] words in forth. So, instead of

: x [ 1 2 + 3 + ]L * ;

I can do

: x (compile (+ 1 2 3)) * ;

where scheme is the meta language, or

: x (1 2 3 + + compile) * ;

where chicken is the meta language. Of course, the 'compile' part can be made implicit.

Entry: all eggs
Date: Fri Feb 24 17:12:34 EST 2006

Why not use all eggs? Instead of making the communication between chicken and egg asymmetric, why not let them all connect to each other, with one meta-programming the other?
Again, the point of the whole exercise is twofold:
(A) write a high level (functional) concatenative language on top of scheme
(B) use this to metaprogram small forth eggs

Point (A) is the next step after PacketForth, while (B) is the next step after BadNop. The whole idea is that some things are better done without a garbage collector, and other things are better done with one. (There's this note though on http://home.pipeline.com/~hbaker1/ForthStack.html about linear languages not needing a GC.)

About the first egg: this is an in-image shared library. Dangerous, but very flexible. Eventually it should be an out-of-image tethered forth, to prevent the monitor (chicken) from crashing. All eggs should eventually be on the same footing: microcontrollers and audio/video engines. So let's have a look at
* writing a small egg interpreter in CPS
* having a minimal interface through stdin/stdout

Entry: immediate words
Date: Thu Mar 2 15:58:06 EST 2006

What about this syntax on a microcontroller forth:

(dup) (1) ifte

or are we completely stuck with this:

if dup else 1 then

The problem with allowing these kinds of constructs is that they almost require execute, which is not always cheap and not always possible, say on 8bit machines.

OK, this is just a prelude to badnop. How to port it? Basically, disentangle optimization and core. Then port the core, followed by optimization.

Entry: interpret/compile
Date: Sun Mar 5 14:11:25 CST 2006

During the coding of the parser rule compilation step, I ran into another explicit interpreter. Which means it can probably be mapped to a lambda directly. This is a pattern which seems to occur a lot.

Entry: deep compilation
Date: Mon Mar 6 10:44:15 CST 2006

Yesterday I differentiated between >s-scheme and >r-scheme. The first converts only a top level expression to scheme; the second recurses, and converts all lists containing symbols to lambdas. The reason for this is mainly speed. (I'm trying to find a way to compile to C using the chicken scheme->C compiler. As a side note, guile seems to be reasonably fast: slow startup time but fast run time.) (Actually, I discovered lazy evaluation here :)

Entry: SSA
Date: Mon Mar 13 18:45:11 EST 2006

I'd like to know some more about GCC. And it's time to dive into my optimizing compiler book (Advanced Compiler Design and Implementation, First Edition, by Steven Muchnick). I don't think I'm ready yet to write a frontend, but it might be interesting to use a subset of C which is close to the intermediate language(s) used for optimization.

http://gcc.gnu.org/wiki/GIMPLE
http://gcc.gnu.org/wiki/SSA

Especially this one: almost all tree optimizer passes work on SSA GIMPLE, while some work on normal low level GIMPLE. SSA is static single assignment.

To make forth run really fast, you need a forth chip. C compilers and most target chips are drenched in register transfer lingo, so it seems the only place where forth can have some advantage is on really simple chips, like the PIC architecture, and at some higher level (above the inner loop), with inner loops expanded to fill the risc pipeline.

Entry: Functional programming and state machines (monads)
Date: Tue Mar 14 15:08:46 EST 2006

I'm finally starting to get the functional programming thingy. In order of appearance, these are the things that pulled me over:
1. writing cat in a purely functional way
2. writing a parser state machine in a functional way
3. nondeterministic programming + fp seem to be made for each other
4.
a pure fp oo model: http://okmij.org/ftp/Scheme/pure-oo-system.scm

Now it's time for monads. Basically, I'd like to combine the clean functional semantics of cat with a target forth state machine, for compilation and interpretation. I'd like to see what this monad stuff is all about. From http://en.wikipedia.org/wiki/Monads_in_functional_programming it seems the trick is in "Since the bind operator is non-associative, this constrains the sequence of execution." I don't really get it yet..

Maybe I should concentrate on an intermediate representation for compiled forth code, one which is easy to optimize on. Badnop works nicely because it has an 'undo' function. This works well for linear (one-pass) things, but for multipass it is not really suited. Also, the way it is implemented in Badnop (modify the tail of the allot stack) is easily mapped to a functional approach without mutation. It's not all that spectacular, especially not in the concatenative/stack view: sequential execution is nothing more than function composition, which is associative, but not commutative.

Entry: next gen forth
Date: Wed Mar 15 02:08:17 EST 2006

Not being humble or anything, but I was thinking about the need for parsing words and compile/interpret semantics. For tethered dev, there's no real need for interpret semantics: code needs to be compiled before it is run. This should be there for packet forth too. There's too much confusion about parsing words and compile/interpret: vs. [i] etc..

In colorForth, there is no such thing as interpret semantics. Well.. you can type numbers and execute words, but you can't execute macros. (Well, you probably can, but they will just compile stuff, not run it.) Since the same tactic (compile, then run) is used in rscheme, I don't see why I shouldn't simplify forth and packet forth to the same paradigm, at least for the readline console.

Entry: dynamic binding
Date: Wed Mar 15 02:13:26 EST 2006

I realize this is a pitfall, but dynamic binding can be damn handy. Maybe what I'm looking for is fluid-let, a way to set a 'context' and restore it afterwards. I guess that's why this http://www.gnu.org/software/mit-scheme/documentation/mit-scheme-ref/Dynamic-Binding.html is called 'dynamic binding' :)

Entry: concatenative scheme
Date: Wed Mar 15 11:45:48 EST 2006

Damn, my mind really likes this compositional (dataflow) way of thinking. Probably it's just a remnant of forth induced imperative programming style, or DSP box thinking... I keep wanting to do things like

(define (composition state args)
  (apply (lambda (state args) .. (list state args)) ; receiver
         (sender (state args))))                    ; sender

which actually is 'named forth' or 'braided, sorted dataflow', by which I mean forth with names and without stack shuffling. For example:

(define (dosomething state args)
  (define (fun1 state args) ...)
  (define (fun2 state args) ...)
  (define (fun3 state args) ...)
  (apply fun3 (apply fun2 (apply fun1 (list state args)))))

Maybe I should have a look at GOOPS and CLOS. What about PD in lisp? Something like:

((generator))
((do-left) (do-right))
((join-it))

I really need syntax for the above, because (apply (lambda ...) (preproc ...)) is really unreadable. So I went on to create:

(define-macro (flow initial-args sequence)
  (reverse-accumulate
   (lambda (fn exp) `(apply ,fn ,exp))
   (cons (car sequence) initial-args)
   (cdr sequence)))

(reverse-accumulate is a left fold; see the sketch at the end of this entry.) Now what about this:

(define-box name (inA inB inC) (outA outB outC))
(lambda-box (inA inB inC) (outA outB outC))

This works well, but there's still name duplication.
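As promised, a self-contained sketch of the 'flow' construct above. reverse-accumulate is my own reconstruction (a plain left fold), not brood's actual code, and the stage functions are made-up examples:

;; Reconstruction: a left fold over the remaining stages. Each step
;; wraps the accumulated expression in (apply <fn> <exp>).
(define (reverse-accumulate fn init lst)
  (if (null? lst)
      init
      (reverse-accumulate fn (fn (car lst) init) (cdr lst))))

;; With the flow macro above, (flow (1 2) (add-args dup-state))
;; expands to (apply dup-state (add-args 1 2)): the first stage is
;; called directly on the initial args, and every later stage
;; receives the (state args) list of the previous one via apply.
(define (add-args state args) (list (+ state args) args))
(define (dup-state state args) (list state state))
;; (flow (1 2) (add-args dup-state)) -> (3 3)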
I guess that duplication is the price to pay for flexible 'braiding' of data. So what about fanout? Basically that can be solved with lexical scoping, or by putting 2 processors in one box. As long as ins and outs are the same, we have plain named let / letrec / let* syntax.

Entry: parallel forth
Date: Wed Mar 15 15:48:12 EST 2006

What about looking at a compiler as a parallel forth with 2 data stacks? One is the computation stack, the other the state stack or dictionary. Writing badnop.scm the way it is now really feels like I want to write it concatenatively: I'm stumbling over names.

Entry: closures -- more stacks!
Date: Thu Mar 16 15:31:16 EST 2006

PF deja-vu. So the deal is, you really want to have 'temp stacks' when writing in CAT. I've gone to great pains trying not to use a single mutating operation in badnop.cat, and I realize it is really hard to do. Sometimes you just need more space. So we go the same route as PF, from stack-only words to words operating on stack + VM, which is just a bunch of other stacks (RS and DICT).

In the badnop compiler, it would be terribly handy to be able to use input (source code) and output (dictionary) as 'global objects', so there is no need for so much juggling. Note that this is not absolutely necessary, but it seriously contributes to conciseness by avoiding much stack juggling. The most convenient way to do that is, instead of having a data stack (1 2 3), to have the first element contain other state information, like ((a b c) 1 2 3). Here (a b c) can basically be anything, written in the functional OO style mentioned above. The thing to keep in mind there is performance: every 'muting' operation will cons a lot. Note that even when trying to avoid mutation, this will have the same amount of 'global effect' as using global variables. So the thing I am really looking for is closures in CAT.

http://www.haskell.org/tmrwiki/HaskellSchoolOfExpression

  Any book on Haskell must be appraised for its explanation of monads in
  general and IO specifically. Monads are a purely functional way to
  elegantly carry state across several computations (rather than passing
  state explicitly as a parameter to each function). They are a common
  stumbling block in learning Haskell, though in my opinion, their
  difficulty is over-hyped.

So the case is closed atm.. My if .. then code is starting to look ugly, because I can't use smart object words like comma without having the dictionary object on the stack.

Entry: Environment
Date: Thu Mar 16 19:03:24 EST 2006

I'm going to try to include an environment in the stack machine. This will give an extra cons for each operation, but it seems to be the only workable, and for now understandable, way to tackle the problem. The environment is just another stack, basically.

Entry: The point
Date: Fri Mar 17 12:14:31 EST 2006

Every once in a while I have to restate my objectives, because this thing is so rich in cool new concepts and other distractions that I don't remember what I'm doing. I'm a simple guy. I like arithmetic. I like numbers. And I like to build things with them. The forth approach is good for this, because it is terribly concise to build huge things from small blocks. There is of course merit in other languages/paradigms, but for doing what I like to do (My Dear Aunt Sally), forth is simply the way.

I think there's only one important thing remaining to be solved: how to add objects to a forth in an imperative and a functional way?
As I explained above, sometimes you really need to do a lot at once (hubbing), which means you need a specialized language -- or at least specialized local words and context -- to express what you want to do. This seems to be solved by the monad concept as used in Haskell (which I still need to study a bit more), and by fluid-let (dynamic binding) in scheme.

Entry: Local functions
Date: Sun Mar 19 14:57:20 EST 2006

Was thinking about cat.. (emit-scheme '(dup swap drop)) gives

(lambda stack
  (apply (find (quote drop))
         (apply (find (quote swap))
                (apply (find (quote dup)) stack))))

'find' could be replaced by wrapping a let around the lambda, or any other word. The only problem here is that it needs to be done in a single function. Like:

( in> bla bla bla >out ((>out ....) (in> ....)) with ) :

which would give something like:

(lambda stack
  (let ((find ...))
    (apply (find (quote >out))
           (apply (find (bla)) ...))))

This might be exactly what I'm looking for. The external stack effect ( in out -- in out ) could be locally managed by the functions in> and out>. Or something like:

((word dup swap drop comma) (word comma) with-i/o)

OK, what is this about? Dynamic and static scoping. A construct like the above is definitely easier to do with dynamic scoping. Using a contextual 'in' and 'out' mechanism reminds me of inheritance: macros are subclasses of some class which has these methods defined. There seem to be two ways of solving it:
1. always passing around the current environment, which is kind of a deep change to the interpreter: words map (env . stack) -> (env . stack) instead of stack -> stack.
2. adding support for 'functional objects', with macros implemented as subclasses.

Then there is the practical question: do I really need static scoping? All non-quoted symbols are interpreted as functions in CAT, so static scoping would really only be a namespace trick. What about an interpreter modification trick:
1. if a word is found in the context on top of the stack, execute it as an object mapper: (env . stack) -> (env . stack)
2. if a word is not found in the context, execute it ducked.

Entry: object calls
Date: Sun Mar 19 19:10:26 EST 2006

OK. I got mlit/mcall and >o-scheme to implement method calls, but I'm not really sure if that solves enough of my problem. The interesting thing about scheme is the ability to implement lots of things as closures, instead of things operating on data. The functional OO (classless prototype) system mentioned above is like that. So I wonder whether it's not easier to implement objects as closures.. Anyway, now I got things like:

(x) :symbols                     # x is a symbol
1 2                              # arguments
x ((+) duck) >code push () push  # build dictionary
(x) send                         # send message
-> 3 ((x . #))

The first real problem to solve is to implement a stack this way. A stack would have methods:
- stack  # return the whole stack
- push   # push element
- pop    # pop element
- top    # return top element
- set    # set new stack

So I got two ways of doing things now: compiling a subprogram which searches in a dictionary first, and a closure based object system. Neither of those really solves my basic problem: 'with-input' dynamic scope.

Entry: State data
Date: Mon Mar 20 12:34:43 EST 2006

Currently, the data passed from call to call is just a stack. This is not enough for the level of convenience (more global state) I'd like to have. The first thing that's wrong is the fact that the data is not abstract. Being able to decide to pass more context around boils down to extending the data type.
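A quick scheme sketch of what 'abstract data' means here, assuming the state is a (context . stack) pair hidden behind accessors. The names are mine, for illustration only:

;; Hide the state representation behind accessors, so a
;; (context . stack) pair can later replace a bare stack without
;; touching any word definitions.
(define (make-state context stack) (cons context stack))
(define (state-context state) (car state))
(define (state-stack state) (cdr state))

;; lift a stack -> stack word into a state -> state word
(define (lift-stack-word fn)
  (lambda (state)
    (make-state (state-context state)
                (fn (state-stack state)))))

;; a primitive written purely against the stack...
(define my-dup (lift-stack-word (lambda (stack) (cons (car stack) stack))))

;; ...keeps working when the state grows extra context:
;; (state-stack (my-dup (make-state '((verbose . #t)) '(1 2 3))))
;; -> (1 1 2 3)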
(While still not 100% clear to me, I think this is the idea of monads, where a type can contain extra data about a computation or state.) When the stack is abstract, it can be implemented as just a stack, or as a stack + extra context.

CALL CONVENTION: all words accept a context, and return a context. This should be abstract: it can be just a stack, or a stack together with bindings. We call the context 'state'. It might be wise to keep the stack words as they are, since that gives extra information about what they actually modify, and to solve the calling convention in the defining macros. The only thing that needs to be changed is local change of control flow in primitives.scm

Entry: State data 2
Date: Mon Mar 20 18:15:51 EST 2006

It seems to be easiest to solve everything in the definition macros, pretending all primitives are just stack ops, wrapping context constructors around the stack effect, and lexically overriding call/prim/lit. This means the latter 3 are best taken to be functions. So, how to transform a (lambda stack stack) into (lambda (context . stack) (cons context . stack)), with the possibility of doing calls and taking continuations etc.? Wrapping a stack processor is trivial, but something that does

(call 'word stack) -> (call 'word (cons context . stack))

is not: such a call already returns the new context, so wrapping the whole stack op in a (cons context (exp ...)) is not an option. We need an explicit 'return'. Seriously, the only way this is ever going to work without major hacks is by treating the stack as something abstract. This teaches again another lesson I already knew but ignored anyway: never use cxr/cons to define a data structure ALL OVER THE PLACE. Write localized access methods, or macros, and use those. It's paid off before..

Entry: anaphoric macros
Date: Tue Mar 21 17:02:01 EST 2006

The prototypical example, adapted from Paul Graham's On Lisp, p. 189:

(define-macro (aif condition . rest)
  `(let ((it ,condition))
     (if it ,@rest)))

So you can do things like:

(aif big-long-calculation (foo it))

I think I can use this to get a minilanguage going for defining primitives, without explicitly referring to the stack or vm.

(prim: execute (locals (word) (prim word)))
(prim: duxecute (locals (stuff word) (prim word) (push stuff)))

Entry: badnop OO compiler
Date: Mon Mar 27 08:54:48 CEST 2006

Doing the OO thing: this is not a core feature. It is important to keep the core machine stateless, except for definitions. When objects are necessary, or local context, or local accumulation, it is possible to run another interpreter on top. This does raise the question about code representation. Maybe it's not really necessary to do this compilation to lambda thing. Some remarks:
1. I see no real reason to have dynamic binding for the core words.
2. There is a true need for dynamic binding for the methods and messages where local context is used: context change == object mutation, and to do this functionally, all those things become context maps, so nothing is really fixed.
3. To work in 'pure forth style' I really only need one context object.

Why is this so confusing?

Entry: math of joy
Date: Tue Mar 28 11:53:17 CEST 2006

http://www.latrobe.edu.au/philosophy/phimvt/joy/j02maf.html

Abstract: Joy is a functional programming language which is not based on the application of functions to arguments but on the composition of functions. This paper describes the theoretical basis of the language.
The denotation of Joy programs maps a syntactic monoid of program concatenation to a semantic monoid of function composition. Instead of lambda abstraction Joy uses program quotation, and higher order functions can be simulated by first order functions which dequote quoted programs. Constants are functions.

Is this true for cat? Sort of.

More about mathematical foundations. The semantics of Joy is a morphism between a syntactic monoid (concatenation of symbols) and a semantic monoid (composition of functions). A monoid is like a group, except that the operation is not necessarily invertible: there is a unit element, and the operation is closed and associative. These are the basic traits of a category. Actually, a monoid is essentially a category with one object. http://en.wikipedia.org/wiki/Monoid

Entry: depressed but ready
Date: Fri Mar 31 09:26:02 CEST 2006

Damn. Programming is hard. I start to see the way to go with pf, badnop and the whole collection of brood VMs, but there's so much I need to tackle to actually get there. Let's see what the real problems are:
* I/O in CAT (badnop compiler VS. state <-> late binding / dynamic scope)
* I/O in PF (pre-emptible parser + select based mainloop)

Entry: dynamic scope again
Date: Mon Apr 3 09:49:54 CEST 2006

I am on the verge of just using set! to implement i/o for the compiler. Yes, I need to remind myself why I can't do that: to be able to do backtracking on input and output. That's a very practical reason. So there's not much left. The whole thing requires a separate explicit interpreter. So why not make the whole language like that? More forth like, but without true assignments (all previous states are accessible if you observe them..)

Entry: pith
Date: Fri Apr 14 10:12:55 CEST 2006

The pure functional state machine seems to be working. The dictionary is passed along. There is complete emulation of @ and !, without using assignments. This enables efficient forking of the whole VM. Since token replacement always puts the token on the top, shared constant code will not be copied.

Entry: interpretation
Date: Fri Apr 14 10:16:21 CEST 2006

Since the above pith mechanism really requires dynamic binding, it might be easier to just use plain old interpretation, instead of wrapping everything in lambda expressions. This should in principle be implementable by replacing 'compile' and 'execute'. Actually, it's simpler: just replacing the function 'prim' did the trick. This does change the meaning of prim from 'execute primitive' to 'execute code segment', meaning execute and run have the same semantics here. I wanted to make it load faster, but hat is just as slow as pith. Maybe it's just the guile bytecode compiler? So, in short: hat has an explicit interpreter. This moves the 'interpret' (code/data) distinction and the high level code / primitive distinction (apply/run) to run time. I'm not sure if it's actually so useful, if it doesn't speed things up.

Entry: templates
Date: Fri Apr 14 14:18:52 CEST 2006

Been thinking.. In lisp I use quite a lot of templates. And in the end, (lisp) functions are really templates for transforming arguments into something else.. One pain of forth metaprogramming by explicit list construction is the absence of templates. It might be interesting to add a primitive to do this kind of stuff, to ease metaprogramming CAT. Such a primitive would take a template list together with the values to substitute into it. This works perfectly. A very powerful construct for metaprogramming. Of course, this is just lambda abstraction in a language which doesn't name parameters.
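In scheme terms this is roughly what quasiquote does. A sketch of what such a template primitive could look like, with placeholder symbols standing in for the substituted values -- the syntax here is my own illustration, not the actual CAT construct:

;; Walk the template, replacing placeholder symbols by their values.
(define (instantiate template bindings)
  (cond
   ((pair? template)
    (cons (instantiate (car template) bindings)
          (instantiate (cdr template) bindings)))
   ((assq template bindings) => cdr)
   (else template)))

;; build a piece of forth code from a template:
(instantiate '(dup ?n + ?reg !)
             '((?n . 42) (?reg . counter)))
;; -> (dup 42 + counter !)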
Entry: poke/badnop and linear lisp
Date: Fri Apr 14 18:18:17 EST 2006

The next thing to tackle is to write poke as a target for badnop, and to start working on the skeleton of the lower level packet forth replacement. How to proceed? Badnop is a native code compiler. It requires 3 primitive instructions: jump/call/prim. Other instructions can be incorporated in the form of macros. What is what in badnop/poke? VM instructions are either:
* C code (prim)
* execution tokens

Primitives are macros that compile code addresses. Their midlevel representation could be (prim "c_function").

Entry: literature
Date: Sat Apr 15 01:59:18 CEST 2006

From http://www.latrobe.edu.au/philosophy/phimvt/joy/j00rat.html

  "However, Raoult and Sethi (1983) propose a purely functional version
  for a language of unary stack functions. The stack has an ever-present
  extra element which is conceptually on top of the stack, a state
  consisting of variable-value pairs. Most activity of the stack occurs
  below the state; only the purely functional fetch X and assign Y
  perform a read or write from the state, and they perform a push or a
  pop on the remainder of the stack. The authors also propose uses of
  continuations for such a language."

That's PITH.

J.-C. Raoult and R. Sethi. Properties of a notation for combining functions. J. Assoc. for Computing Machinery, 30:595, 1983.

Entry: badnop and poke
Date: Sat Apr 15 10:50:55 CEST 2006

So.. The thing to get going here is the tethering, and then work from the ground up. We need basic support for loading eggs. This works now. The other thing is to clarify the target system. Basically, there are 2:
(1) C -> C functions
(2) C functions -> forth

The second system is a basic non-optimizing simple forth. It's done, save for the trivial assembler that translates lit, jump, call, prim into binary. The first system is much more interesting, because it will largely consist of macros. The PIC target will be similar: peephole optimization for most direct functions, with a trivial assembler. So maybe it's best to go this route:
- write the assembler for poke, translating badnop's list rep -> binary, finishing (2)
- write a new target for assembling the primitives

Entry: prototype programming
Date: Sun Apr 16 00:55:30 CEST 2006

Since I have dictionaries and dynamic binding and everything, prototype based programming is easily introduced. An object is just a dictionary (association list). Sending a message could have a simple syntax like

(obj selector) @

since I'm not using deref. Defining a method could go like

(obj selector code code code) method

Making clones is then just

obj (obj2 obj3 ...) set

One problem though: there's no self pointer.

Entry: quines and generators
Date: Sun Apr 16 11:24:56 CEST 2006

Been trying to write quines in pf. The same is doable in CAT, which is the same as Joy. The simplest quine (to 'run') is

((dup cons) dup cons)

Now, Manfred has a page on using this kind of structure for lazy lists, or generators. I was trying that today to implement the simplest of such examples: a counter. http://www.latrobe.edu.au/philosophy/phimvt/joy/j05cmp.html

Writing a counter functionally, the code must do two things:
- return the value
- define a new counter bound to the new state

Modifying the quine above to copy and split off a payload on each iteration gives something like this:

((0 swap dup cons) 0 swap dup cons)

Now the 'dup' just needs to be replaced by something which updates the top copy -- the one that will be quoted.
Let's change this to (x swap dup cons):

(x swap dup cons) :
((0 x) 0 x)

To make updates, change this to (update 1 +):

(x over update swap dup cons cons) :
(0 (x) x)

Putting some more code in a word gives:

(y (over) duck run swap dup cons cons) :
(x (update) y) :
(0 (x) x)

So you can create a new generator like this:

(y (over) duck run swap dup cons cons) :
(generator (y) cons dup cons cons) :
0 (1 +) generator

Entry: interpret vs eval
Date: Wed Apr 19 18:39:07 CEST 2006

To come back once more to interpret/compile: on the one hand it is nice to translate everything to scheme, but it does require runtime eval. If I ever want to switch to another implementation (e.g. chicken), it might be better to drop that compilation step and work fully interpreted. Especially since I'm dealing with late binding. In any case, this is purely an implementation matter for now; probably best to postpone solving the problem until it actually poses itself. A practical advantage is more introspection: tricks like the ones above are possible when code is just data and can be changed at runtime. Serialization also becomes simpler.

Entry: old school 1200/2400 FSK
Date: Wed Apr 19 19:05:03 CEST 2006

I'm in need of a protocol to send power & data to chips, for very simple control-only data. The 14bit synth does something like that, but it's not very stable or standard or simple, due to its 3-phase nature. I've been thinking about Frequency Shift Keying, the protocol used for old audio tape data storage, but then the raw MSX version. It's sexy and old school and doesn't sound too bad either :)

http://www.terra.es/personal7/miguel.colom/msx/codif_en.htm

Entry: code transformation
Date: Wed Apr 19 19:47:28 CEST 2006

It might be interesting to add a feature to:
1. read a source file, with comments, into a tree structure
2. modify the source file
3. write it out again

This would facilitate refactoring tremendously, and would probably allow me to rewrite badnop code into cat code, since I do have a forth parser.

Entry: badnop - getting organized
Date: Fri Apr 21 12:36:11 CEST 2006

Best to disentangle all namespaces:
* constant: constants
* target: target code flash addresses
* variable: target data ram addresses
* monitor: target boot monitor
* macro: target language macros
* assembler: target assembler

Entry: compiler/assembler structure
Date: Sat Apr 22 14:02:20 CEST 2006

Badnop is currently structured as:

       P             B
 forth --> assembler <-> binary

The first map is a projection from forth code to assembler code (both represented as lists). The second is a bijection between symbolic assembler and binary machine code. Some remarks about this:

* B makes the assembler/disassembler reversible. This is a good idea, since we can import binary code.
* The assembler can be in polish notation, instead of RPN as in old badnop. This is a good thing, since the symbolic assembly notation can be made the same as (or very close to) the standard assembly language notation.
* P contains peephole optimizations. It is worth finding out whether we can write these optimizations on the forth level, to get to something like:

       P                   B             B
 forth --> optimized forth <-> assembler <-> binary

Probably this is overkill, since the assembler can be translated into some pseudoforth, which will assemble back to the assembler code. So there's no need to really get rid of the assembler. But it should be possible to do a lot more dirty tricks by writing the forth optimizer on the forth level instead of as straight-line peephole optimization.
The functions themselves are implemented as stream processors, since there's no direct 1-1 map from opcodes to words (some words are 2 instructions), and definitely not from forth to assembler.

Entry: compilation & speedup
Date: Mon Apr 24 13:24:41 CEST 2006

CAT is really, really slow. So it might be a good idea to add some form of early binding: for example, compile a single word to a primitive (using the forth->lambda translators), but keep some reserved words dynamic. These can be specified in a list.. So it would be sort of a manual compile operation. And the reader (parser) is terribly slow too. I don't really understand why.

Entry: watching videos
Date: Wed Apr 26 00:49:14 CEST 2006

Ralph William Gosper, Jr, through Guy Steele, Jr: "What is a data structure? A data structure is a very stupid programming language."

Entry: assembler labels
Date: Thu Apr 27 20:06:21 CEST 2006

Right now I have a bijective translation between machine code and assembler code using absolute addresses. I wonder whether I should make the disassembly use relative, absolute or symbolic addresses. For sure, symbolic addresses are the most interesting. Address resolution is a fairly low level thing, and really belongs at the machine code level, not the assembly level. This would enable editing of machine code (grabbed from C code for example), which would be a very good thing. Anyway, not a trivial problem.. Badnop, the assembler on steroids :)

cat> 0xFBAD disassemble>asm
cat> .asm
nopf 0BAD
cat>

So I'm in easter egg mode:

cat> 0xFBAD disassemble>asm
cat> .asm
badnop
cat>

har har har gasp har har har

Entry: better
Date: Thu Apr 27 20:52:53 CEST 2006

Maybe it wasn't such a bad idea after all to port badnop to CAT. I'm starting to like cat, though there are some problems I need to talk about later.. It's harder than I thought, mainly because old badnop has a lot of knowledge encoded in the form of optimized forth hacks, which is hard to read. I'm moving away from that. Anyways, I start to see new patterns that are interesting to explore. Symbolic assembler snippets: dissect an image, chunk it into functions, transform to symbolic representation and re-assemble. In short: better data structures and more 1-1 maps, which make life a lot easier and the code a whole lot more malleable.

I really like CAT too. The way it's possible to just 'set code' makes programming a lot of fun. No red tape, just set behaviour. This way of programming is so different from what I'm used to. I'm curious about how to optimize it though. But that's something for later..

There's one big problem though: I really need name spaces. Currently I can work forth-style, but I already ran into several occasions where I forgot names, and accidentally redefined an already used term. This is not the same as forth, where these kinds of problems only arise for interface words, due to the early binding. I'm already doing a lot of namespace stuff with the bindings for the assembler, disassembler and compiler, but I probably need to make it more explicit. Funny how I walk the path of

stack -> (dict . stack) -> (namespace . (dict . stack))

Next to name spaces (or using name spaces), a mechanism to connect processes should be installed.

Entry: non-programmables
Date: Fri Apr 28 05:02:56 CEST 2006

The question is really: what about the non-self-programmable chips? Should I support them, or should I adapt the WISP to program them? They are fairly easy to program.
I could forget about the plan of writing programmers for the ones with self-programmable eeproms, since a bootloader is really sufficient for those.

Entry: getting it to work
Date: Sun Apr 30 21:15:39 CEST 2006

I got CAT running in cygwin, and the MPLAB ICD2 seems to cooperate. Everything is ready for the final leap: compiling the monitor. A lot of things need to be fixed before I can do that:
* forth file loading
* inline assembler
* macro definition
* assembler address resolution
* symbolic assembler labels
* 2constant
* if/then
* for/next
* begin until
* long jumps

Entry: quoting and syntax
Date: Mon May 1 12:39:53 CEST 2006

COMPILE TIME vs RUN TIME

CAT has no compile mode. It is completely dynamic. This makes life a lot simpler. BADNOP (target forth) has no interpret mode. This doesn't make sense anyway, since it's not its own meta language. This is a very important choice, central to the design of the whole system. Since the two languages are not the same, they can be optimized for what they actually need to do.

[Edited. It turns out that the target _does_ have an interpret mode: it was too easy to add the word 'trun', which takes a list and interprets each atom. Numbers are loaded on the stack, and symbols are resolved to addresses and executed. However, this is not a special mode of the monitor. In addition to ordinary runtime compilation, there is again a 'scratch' command, which takes a list of forth code, _compiles_ it, uploads it and then runs it. These are implemented as just words operating on lists in cat. However, the distinction between macros and nested words is a real one which cannot be ignored by the programmer.]

The compiler doesn't need to be fast, but it needs to be very expressive. CAT is an (almost pure) functional lazy language. The term 'lazy' is not used entirely properly here, since CAT is not based on lambda abstractions and applications (eval/apply). What I mean is that the binding of symbols and lists of symbols to executable code happens as late as possible.

The target language needs to be fast and resource friendly. The relation to machine code should be very transparent. BADNOP's target forths are therefore real forths: they bind as early as possible, and avoid high level constructs which depend on higher level data structures.

The interplay of the two systems is what makes this system exceptional. You are bound to run into problems that are hard to express in the target language because of its static nature. However, in such cases it is usually not so hard to manually divide the solution in two parts: one that can be solved at compile time, and the remainder that needs to be done at run time. The compile time operation usually amounts to code and data structure generation, or higher order processes like code generator generators. In general, this decision cannot be made by a compiler. Rephrased: it is quite hard to make a general purpose compiler that is as good as a human at solving very constrained resource allocation problems. But for some things special purpose compilers can be generated, and that's exactly what CAT is for. Having a flexible dynamic language at hand to express these ideas in is a big plus. Using CAT you can do it either in CAT or Scheme or anything else that can use lisp style data structures.

SYNTAX

The target forth is compile only, which brings it closer to the semantics of an assembler, where ';' just means return and ':' just defines a code label. This is close to the semantics of colorForth.
CAT has no compile mode, so it is a trivial syntax choice to plug CAT into the target forth's [ ... ]

Now, there is one huge point of possible confusion: the macros implementing the target forth are all implemented in CAT. The target forth's syntax is chosen to be as close as possible to traditional forth. The big difference between CAT and forth is the quoting mechanism.

In CAT, *all* symbols encountered in the input stream are interpreted as functions. The only quoting mechanism is the list (a b c). Lists are treated as data, but can be interpreted as code using 'run' and derivatives. This is the basic machinery for function abstraction and delayed evaluation. So, in order to change the default semantics of a symbol, the only option is to shield it from interpretation by placing it in a list, and to explicitly operate on that list.

In BADNOP Forth, *most* symbols are compiled to function calls. Some have a different meaning. There are no lists. The main quoting mechanism to change the default compilation semantics is 'immediate parsing words'. These are functions that run at compile time, and can modify the meaning of the words following in the data stream.

An example to illustrate both quoting mechanisms. Declaring a constant with a value computed at compile time could be done in two ways:

[ compute-constant ]L constant MAGIC
[ compute-constant (MAGIC) constant ]

In the first approach, 'constant' is a Forth parsing word, altering the meaning of 'MAGIC', while in the second approach, 'constant' is a CAT word which uses a list containing the symbol 'MAGIC'. Basically, this is where CAT & BADNOP meet. The basic syntactic choices to be made are all centered around which quoting method to use. When should I respect the standard forth approach as much as possible? When should I deviate from the standard? For some things, CAT can not be hidden. So the choice is really about hiding the dynamics of CAT in the forth source. At one extreme, deviating from the standard too much by bleeding CAT through to the level of the target language's syntax makes things really confusing. Compare these:

(1 high) (0 low) ifte
if 1 high else 1 low then

By clearly defining 'if', 'else' and 'then' as compilation macros, the second line has very clear semantics: at compile time it will be transformed into a set of conditional jumps with the remaining code compiled. In the first line, the code can be dynamic. It might be that the quoted stuff is not available at compile time yet; we really don't know. Whether the quoted code can be bound early depends on what's actually done with it. This requires a smart compiler, which is a bit more than I can do at the moment.

The example above shows why I really need forth's explicit early binding semantics in the target language. Because of the importance of target resource efficiency, binding should be under the control of the programmer, and default binding should be immediate. So the obvious road to follow is to stay as close to forth as possible. Which road to follow in the 'constant' example is not always very clear, but I guess it will become clearer in the future when special cases arise.

Entry: faster CAT
Date: Tue May 2 15:38:54 CEST 2006

* VM: store immediate context (dict, stack, continuation) in a vector
* compiled code can be stored in vectors as well.. the question is whether the symbols referred to ever change binding..

Let's first try manual binding.
Given a word, compile it to a scheme function, resolving all occurrences of symbols down to their respective primitives, except for a given set of late binding words. But until I have a proper profiler there's really no point.. Just running a trace shows that read-file is already pretty slow. Maybe writing CAT in POKE will be a nice exercise after all. Explicit early vs late binding trade-offs.

Entry: asm deque notation
Date: Wed May 3 12:46:51 CEST 2006

A note about notation: I use >x and x> as the input/output operations for both queues and stacks. This is NOT the same as the notation used for deques (see cat/library.cat), where >x and x< access the tail stack. >x and x> are both standard forth notation for stacks, and I like it for general input/output operations, like queues, for its visual qualities.

BUT for the assembler, I make an exception: the buffer is not a queue, but an input-limited deque: it is not possible to write to the reading end (assembler input), but it is possible to read from the tail (disassembler or compiler output). This represents the 'undo compilation' operation, or a way to share the assembly buffer by using it as a stack. The notation becomes:

>asm   write to the end of the asm buffer
asm>   read from the start of the asm buffer
asm<   read from the end of the asm buffer

This is implemented as a reversed deque. The most common operations are >asm and asm>, which are standard notation, plus the exception asm<. Deques are represented as a (HEAD TAIL) pair. <deque and >deque will read and write the head, and deque> and deque< will read and write the tail. There's still one degree of freedom left, which is what you call head or tail..

Entry: machine code decoder
Date: Wed May 3 15:08:39 CEST 2006

Currently the disassembler is not very efficient. Instead of incremental search, I could take advantage of the fact that for risc architectures, the decoder is a tree. Pseudo ops need to be solved somewhere else then. A decoder tree for the pic18f instruction set. The structure of a tree is

tree == (maskbits (result tree) ...) / opcode-list

where an opcode-list enumerates opcodes starting from the last result, counting up by 1.

The printing word is now called 'print'. Dot is reserved for pair notation ( . ). Added a shortcut 'p' for print, like the unix 'dc' command.

Entry: debugging by printf
Date: Sun May 7 18:28:35 CEST 2006

Or really, printing traces. Still the best way :) The easiest way to do so in CAT is to add hooks to certain points in the code. Currently I have hooks for all module inputs: compiler, assembler, disassembler, image buffer. Still need to find some proper way to give feedback about _where_ an error occurs.

Entry: lazy evaluation
Date: Mon May 8 00:31:12 CEST 2006

Watching lecture 1b of http://swiss.csail.mit.edu/classes/6.001/abelson-sussman-lectures/ and I think I'm starting to get it. Lisp macros are really nothing but a way of working around strict evaluation, i.e. to have local non-strictness.

Entry: interactive monitor
Date: Mon May 8 17:01:17 CEST 2006

So, the monitor is working. The resulting code is exactly the same as the original, so I think I did a good job porting. The next question is how to manage state. I'm probably going to extend the monitor with a set of compiled macros for basic interaction. Time for a bit of reorganization, because now I'm at a point where complexity can explode.
The whole application should have convenience maximized for this kind of development:

DIY:
* create a bootloader for a certain architecture (chip + dev interface)
* burn it to the chip using a 3rd party tool

PREFAB:
* buy a pic chip (usbpicstamp) with a built-in badnop bootloader

From that point on, it should be possible to add new code, without having to erase anything, one thing building on top of the other. Also, it should be possible to save the forth image, so one can continue working with all the data and meta data defined. Some questions/remarks:
* Why do I have token-map and labels?
* assembly and disassembly don't need to use the same interface (here)
* assembly buffer is transient?

[Edit. All of those are fixed. Incremental code upload is working now, and saving is implemented, together with version control using darcs.]

Entry: linear vs. functional
Date: Tue May 9 19:12:04 CEST 2006

Did I actually learn something re-implementing badnop?

(a) Flat lists as dictionaries are REALLY slow
(b) Higher level functions are very convenient to work with
(c) Ticketing: save context, try, backtrack

I already knew (a). I recall doing some profiling on old badnop, and I found out more than 99% of the time was spent traversing the symbol table. In order to speed up CAT/Badnop that's the place to start looking, since everything is symbolic.

I am amazed by (b). It's really another world when you don't have to care about binding times, and can juggle code on the fly, taking it apart and assembling it again. This works really, really well.

Although I haven't used (c) to its full extent, since I'm only using the linear (stack based) construct, I can already see it is a blessing. I suppose proper continuations would make it even easier in some spots, but I'm not sure how yet. One place would be to automate 'undo' for literal and other operations, which means that at the point where a literal is to be compiled, it is either compiled, or passed to another function. This I need to think about more. Using (c) requires functional data structures though. So, for the purpose of writing an assembler/compiler:
-> symbol tables should be faster (hashes?)
-> data buffers (stacks/queues) should remain functional

Entry: backtracking
Date: Tue May 9 20:14:38 CEST 2006

There are a lot of beautiful possibilities for simplifying the code using non-deterministic programming. Examples:
* for..next: use a dup / drop .. drop around
* literal compilation: either compile, or pass to another macro

The interesting thing here is that most choice points can be resolved rather soon in the compilation. Instead of using pure continuations, only the data state could be saved, as in the sketch below.

Entry: pseudo state
Date: Tue May 9 20:16:13 CEST 2006

I have 'functional' state, which enables functional 'undo/fork' tricks, but for some constructs state can get in the way. Let's see what I use it for:
* name/value bindings
* having 'current' things, like the current state of the compilation target
* caching results of compilation, assembly, disassembly...

The last ones are 'current' things too, but they serve only the implementation and are not very useful from the user's point of view, so their extent should be kept minimal.

[Edited. I think this is mostly fixed. The only kept state is the state associated with the target machine: dictionary, macros and compilation point. The image could even be removed, since it's present in the target anyway.]
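A sketch of the choice point idea mentioned above: because all state is a pure value, 'saving the context' is free -- just keep the old binding around. The word names here are illustrative, not brood's actual API.

;; A step maps state -> new state, or #f to signal failure. On
;; failure, the untouched old state simply goes to the alternative.
(define (try-else step alternative state)
  (or (step state) (alternative state)))

;; Toy example in the spirit of the literal-compilation choice:
;; fold a pending literal into an add instruction if there is one,
;; otherwise emit a plain call.
(define (fold-literal-add state)      ; state is the asm output list
  (and (pair? state) (number? (car state))
       (cons (list 'addlw (car state)) (cdr state))))
(define (call-add state) (cons '(call add) state))

;; (try-else fold-literal-add call-add '(5 (movlw 1)))
;; -> ((addlw 5) (movlw 1))
;; (try-else fold-literal-add call-add '((movlw 1)))
;; -> ((call add) (movlw 1))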
Entry: pattern matching
Date: Wed May 10 00:23:36 CEST 2006

I was wondering whether writing the optimizer as a pattern matcher isn't a better idea, since the rules are really a lot easier to express that way, and probably easier to compile too. What I mean is this: I have a lot of comments like

# (dup drop)    == ()  |  (x drop) == (x drop)
# (drop save)   == ()  |  (x save) -> (x save)
# (save drop)   == ()
# (save unsave) == ()
# (fn ;)        == (asm (jump bra fn))
# (number ;)    == (asm (retlw number))
# (x ;)         == (x asm (return))
# (0x00 reg !)  == (asm (clrf reg 0))
# (0xff reg !)  == (asm (setf reg 0))
# (x reg !)     == (x asm (sta reg) save)
# (dup x reg !) == (x asm (sta reg))
# (number +)    == (asm (addlw number))

This should really be code, not comments. But it's probably not so easy to do without giving up the serial execution semantics of the compiler, as used by the for..next and begin..until macros.

(a) Can it be done to write only a part of the optimizations in this form?
(b) Can this be generalized to a compiler for a proper concatenative language which uses early binding?

I've been musing before about constructs like '(a b c) if', but decided not to go for it because of late binding problems. However, having a syntax like that and using it for late binding are independent things.. The thing I want to get at is how to write an optimizer based on just rewrite rules. This can probably be dragged quite far when the language is really concatenative. Forth is not, completely. What about compiling forth from right to left, since most optimizations really do 'eat their arguments'? What I really should do is to not compile to assembly language, but to something which looks more like forth, but where each word is one instruction. The nasty thing is, of course, how to handle multiple argument instructions. Food for thought..

Entry: cleanups
Date: Wed May 10 01:08:37 CEST 2006

* getting rid of the labels/tokens distinction?
* reloading the host application no longer overwrites data state
* state saving / image saving

The last one requires some thought. There needs to be a distinction between host and target code, so they can evolve together. The main question is how to make them merge properly.

Entry: random number generators
Date: Fri May 12 16:04:25 CEST 2006

I finally got it :) The easiest way to get a decent random number generator is to use a Galois field GF(2^N), consisting of polynomials with coefficients in GF(2), which is the set of elements {0,1} together with addition (XOR) and multiplication (AND). The polynomials are taken modulo m(x), an irreducible polynomial of degree N. There is more than one such polynomial. The algebraic structures corresponding to each such polynomial are all isomorphic -- there is only a single field GF(2^N). However, they are represented differently. For use as a random number generator, the easiest approach is to take an m(x) such that x will generate all the nonzero polynomials. Such an m(x) is called primitive.

Multiplication by x corresponds to a left shift. Suppose s(x) is the polynomial representing the current state of the generator; then there are 2 cases.

* The result xs(x) is of degree less than N. In this case it does not require a reduction.
* The result xs(x) is of degree N. In this case, it is possible to obtain a polynomial of degree less than N by adding m(x), since both have a monomial x^N which cancels out after addition.

This leads to a very simple algorithm.
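Jumping ahead to the N=8 case worked out in the next paragraph, a sketch in scheme (logxor is guile's bitwise xor; the constant 0x11D is the primitive polynomial derived below):

;; multiply the state by x: shift left, and if a 9th bit appears,
;; reduce modulo m(x) by XORing with 0x11D.
(define (gf8-next s)
  (let ((s2 (* s 2)))
    (if (>= s2 256)
        (logxor s2 #x11D)
        s2)))

;; Since m(x) is primitive, any nonzero seed cycles through all 255
;; nonzero field elements:
;; (gf8-next 1)    -> 2
;; (gf8-next #x80) -> #x1D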
For N=8, a primitive polynomial is m(x) = x^8 + x^4 + x^3 + x^2 + 1, which has the representation 0x11D, obtained by identifying each bit with the corresponding coefficient of m(x). After left shifting the state (multiplying by x), the result is either 0?? or 1??. Only the latter has to be reduced by adding m(x), which means performing an XOR with 0x11D. The high bit always results in zero, so the algorithm becomes:

- shift s left
- carry? -> s = s XOR 0x1D

or, in assembly, "clc rot ...

a process b process c process

Now, this makes sense, but the 'map' one does not. Without high level aggregates these constructs do not make sense. So, if the idea is to run this stuff on a microcontroller, it at least needs lists, or vectors, as built in types, so they can be loaded on the stack. In plain forth it is still possible to do

buffer for-each process

where for-each is the parsing word that compiles iterative calls (and that knows how to stop, so 'buffer' somehow returns a delimited object).

So, I think I have to conclude that, yes, it is possible to do this automatically, like a smart 'ifte' which compiles if possible. But doing this without explicit syntax for binding time might be confusing. In the end, that's why PF's { } is not the same as (). So this requires higher level data structures (lists/vectors). For true low level languages it might be better to have special purpose data structures (delimited arrays) and define high level functions/macros for them. Stick with forth. Which means I have 3 layers now:

* CAT (dtyped, symbolic, late, functional, GC)
* PF = FORTH + LISTS (dtyped, compiled/symbolic, early/late, imperative, reuse lists/trees)
* FORTH (untyped, compiled, early, imperative, vectors)

The middle one can be seen as a sort of transition layer. Both CAT and PF can be written in a lower level forth. I'm not really sure whether it is interesting to write CAT in PF.

Entry: roadmap
Date: Sun May 14 17:45:53 CEST 2006

So, what do I need to do next? Give things a name.

* CAT = functional concatenative language
* BADNOP = compiler
* POKE = in-image forth + linear lisp / packetforth experiment; contains the forth->C compiler (the compiler for the poke VM is trivial)
* PURRR18 = 8bit forth for 18f pic + crapsynth / sheepsint
* PURRR30 = 16bit forth for 30f pic

Brings me to things to do:
- move to the 18F1220 as base processor
- write the synth for the 18F1220
- start working on the 30F architecture
- start working on POKE + C macros

Entry: pic 18f chips
Date: Sun May 14 22:28:03 CEST 2006

discontinued:           252  452
replaced by, new code:  2520 4520
small, orig code:       1220 1320
new code, usb:          2550 4550

       usb  ins+  intosc
1220   -    -     +
2520   -    -     -
2550   +    +     -

Entry: some old notes
Date: Wed May 17 01:22:59 CEST 2006

Going through some old notes, and trying to distill the basic ideas from them. Some questions:

* How to efficiently implement type multimethods vs. typeless systems?
* Can fast-reuse cache optimization and general GC be combined?
* Functional versus imperative.
* GC lists vs. LINEAR lists vs. STATIC ARRAYS
* Static process core + dynamic config language.
* Synchronization for parallel processing.
* How to represent grids: where to move from arrays to lists?
* Grid combinator control flow (transpose symmetry and APL?)

The answer to the GC question is probably no: not explicitly freeing as soon as possible is in conflict with automatic (non refcount based) GC. Unless we use the ideas from linear lisp, functional programming seems to require general GC. There are some ideas about 'observation of data' which are really about fanout and data ownership / linear lisp.
Most of the ideas about static process and dynamic setup/compile are stable in my head now. However, the quest about data structures does not talk about the approach where a program has two phases: the first one is effectively lisp with GC, while the final stage runs on static data structures. This is mostly the idea behind CAT/PURRR, only there is a host/target distinction at the point where the dynamic/static distinction is made. Moving this toward the target might prove beneficial. (I briefly looked at a paper about this but lost the pointer..) Essentially this is again 'the interpreted compiler' pattern, where you have a slow flexible language to express compilation of rigid but fast specialized code.

The next set of questions is about dataflow optimization. This is something I should not attempt before educating myself a bit more on the state of the art in compiler technology.

Entry: monads
Date: Sun May 21 09:59:48 EDT 2006

This is killing me :) I probably need just a couple of hours of concentration to get it, and then hopefully rewrite the whole of CAT to incorporate the ideas.

Entry: dynamic scope
Date: Mon May 22 17:58:30 EDT 2006

So, I really should call dynamic scoping dynamic scoping, meaning reserving some syntax for it, like 'with-dynamic' or just 'with'. In general block based languages, lexical scoping is preferable. However, CAT is not block based, so there is no such thing as lexical enclosing, only run-time enclosing (function nesting). This means dynamic scoping is the only way to have 'local' data, where local here means 'alive for a certain function nesting depth'. Using combinators effectively amounts to the same thing: data stored on the parameter stack (or the return stack if it is accessible) in the form of an implicit 'zeroth' item. Again:
- no lexical scoping is possible, due to the language paradigm
- dynamic scoping is the only alternative left

I do wonder how to modify the language in such a way that lexical scoping does become possible. It seems to me that the OO approach is simpler here than the lambda/closure approach. I will probably some day need a name space mechanism for purely practical purposes. Making it hierarchical makes it an object/module system. Need to think about this a bit more.

Entry: chip state
Date: Wed May 24 17:25:26 EDT 2006

Need to have a way to automatically save the entire chip state for a single project. Now.. do I let this depend on
-- a project id present in the bootloader
-- a unique chip id

What about this:
1. a standard way to create a bootloader, given chip type and osc freq
2. this creates monitor.hex and a state file in a certain directory

So, a state dir, containing subdirs per project ID. The ID is present in the bootloader, so CAT can ping the chip, get the project ID, and load the corresponding state.

Continuing on this subject. Currently there are two 'home directories' for Brood. One is the system directory, which is used by 'brood-resolve' and 'use', and is taken from the environment variable BROOD_HOME, which is set automatically if not present. The other is the user state directory, which is by default $HOME/.brood, and which contains state associated with projects. The purrr projects are named like (purrr sheepsint), where the state file loaded by '(purrr sheepsint) restore' is ~/.brood/purrr/sheepsint/state.cat

So, how to initialize a monitor?

Entry: mode changes
Date: Mon May 29 13:31:51 EDT 2006

Syntax question. I already have 'asm (ins) (ins) ...' which is self-terminating.
Why not use this for other things, like 'vectors'? Maybe it's best to make mode changes a bit more explicit, since '[' and 'macro' are already implemented as such. Is it possible to share code here? Hmm... maybe this is a bit too much nitpicking..

Entry: name spaces
Date: Tue May 30 10:27:40 EDT 2006

This needs to be thought out a bit better. Since the dictionary is actually a set, name collisions are more frequent, so namespaces are a necessity. Look at this piece of code in brood/comp.cat

    # save only the compiler state
    (app-variables
      ( image         # binary code
        labels        # symbol -> address map
        target-words  # list of 'true' target words
        macro-dict    # compiler macros
        constant-map  # constants & variable names (ram addresses)
        forth-log     # log of entered forth code
        project       # project meta data
        prog-word     # code flash compilation point
        byte          # data ram byte allocation
        bit bucket    # data ram bit allocation
        deferred      # data ram deferred word vectors ))

    : (print-app app-variables (() cons print-var) for-each) : (save-app (print-app) with-output-to-file) :

Ugly innit? This really has to change. All this should go into the 'project' namespace. Things to do:

* get/set for hierarchical dictionaries.
* code/data distinction: be clean from the start
* interpreter: dictionary stack

GET  var dict find
SET  (var) set (dict) add

So, simple 'word' semantics doesn't change. However, it might be interesting to use explicit fetch/store. Or just to have both..

    (dup swap drop) (nop) !

which is really

    (dup swap drop) (nop) dict add-recursive

Alright. Everything is packed away in the (project) namespace. Much better.

Entry: todo
Date: Tue May 30 19:01:24 EDT 2006

* erase project (revert to clean monitor)
* boot hook (incremental)

Entry: flash programming
Date: Wed May 31 13:04:49 EDT 2006

One lesson i learned: this chip, though it has its charms, is a bitch to program! Using only flash, without abstracting it as ram, is quite something.. Maybe it is better to abstract it as ram anyway. There's a trick you can use with flash programming. There are two operations, which both work in blocks: erase (x->1) and program (x->0). For the 18f chips, erase goes in blocks of 64 bytes, while program goes in blocks of 8 bytes. It is possible to overwrite an already programmed word, but this will result in the already present word ANDed with the word to be programmed. Since both 0xFFFF and 0x0000 are nop instructions, this can be used to advantage: emulating ram does not require an erase-block operation every time. This seems to work.. Now about the boot monitor. What if I use the reset pin as a trigger to enable the monitor or not: high means run program, low means execute monitor.

Entry: icd2 linux
Date: Thu Jun 1 17:40:52 EDT 2006

Seems to work using icd2prog. I modded it a bit so the config works for the chips i'm using. Might be interesting to snarf some functionality to talk directly with the icd2 from within CAT.

Entry: time for reflection
Date: Fri Jun 2 10:46:11 EDT 2006

Pun intended. It's crazy how fast you start wanting to rebuild the language when starting an application. For the synth, i am in need of deferred words, like doer..make or defer..is Both of these constructs seem a bit hard to use for the simple things i'm doing: choosing different control/synth engines at runtime. The 12f synth in the previous badnop had a relatively simple design: it's an interpreter with a standard tree-structured decoder, where the last phase of decoding is a jump into a table.
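In C terms the decoder could look like the hedged sketch below (the opcodes and the literal/opcode split are made up; the real thing is forth on a 12f): a test or two walks down the decoding tree, and the leaf dispatch is a jump into a table.

    #include <stdint.h>

    typedef void (*op)(void);

    extern void push_literal(uint8_t n);   /* assumed to exist elsewhere */
    static void op_rest(void)  { /* ... */ }
    static void op_note(void)  { /* ... */ }
    static void op_noise(void) { /* ... */ }
    static void op_slide(void) { /* ... */ }

    /* the last phase of decoding: a jump into a table */
    static op const optable[4] = { op_rest, op_note, op_noise, op_slide };

    static void decode(uint8_t b) {
        if (b & 0x80)                  /* first branch of the tree: literal? */
            push_literal(b & 0x7f);
        else
            optable[b & 0x03]();       /* leaf: table jump */
    }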
The only disadvantage here is the fact that i'm using flash memory, so i can't just start overwriting things. And the other thing: one-word branch ops are 11 bit, meaning they reach +- 1<<10 words.

Entry: teaching
Date: Sun Jun 4 10:01:41 EDT 2006

The thing i want to teach in the sheepsint tutorial is: an application is an interpreter and a language. So, what i really need are tools to make this kind of approach a bit simpler.

Entry: flash
Date: Tue Jun 6 10:27:39 EDT 2006

Deferred words with addresses in ram are not a very nice way of doing things. They are clumsy to implement and are not persistent, so they need initialization from constants stored in flash. Why not store everything in flash in the first place? Using byte-based jump tables instead of word-based deferred words makes life easier. So, it's easier to go this way: support jump tables explicitly. They come in sizes of 2^n, and can be muted only from the monitor prompt, one table at a time. The operation that does so will erase an entire block, and then reprogram it. It's better to do something more general: redefine / overwrite a word, since that allows arbitrary hacks that keep bindings.

    (name code ...) overwrite
    -> get address of name, save it + org
    -> compile + assemble code
    -> get end address
    -> determine blocks + upload them

This seems to work, in the form:

    0x20C (a b +) hack

Where the numerical argument is the start address of the patch.

Entry: sweet noises
Date: Wed Jun 7 16:43:54 EDT 2006

So.. 1 bit sound. Time to go algorithmic. An interesting class of sounds are simple synthetic waves, reso filtered and then limited. If there's a way to do this without too much computation, i'd like to find out. It's probably fast harmonic sweeps, but in such a way that phase correlation is maintained? So let's see what we can do with harmonics only. Using an 8 bit counter does not give a lot of resolution. Anyway, time to get the thing done i've been postponing for ages, which is..

Entry: TODO
Date: Fri Jun 9 13:23:38 EDT 2006

* PIC usb
* pd interfacing tutorial
* sheepsynth tutorial
* cat language documentation
* patch the forth mode, so it's possible to upload or overwrite (hack) blocks of forth code

Entry: convenience
Date: Wed Jun 21 11:51:49 EDT 2006

Time to get some convenience things working first:

* forth mode for emacs, with upload keys etc..
* fix parse problem
* automatic darcs archive init

The first two are fixed now. Forth mode with built-in upload is indeed a lot easier to use. The parse problem i hacked around; it needs to be fixed properly some day. So, I'm about to start a new project: the seriousint, a bit more serious approach to squarewave synthesis. Going to use that one as a test for 'how to create a project'.

Entry: namespaces
Date: Thu Jun 22 12:48:14 EDT 2006

So.. time to clean up namespaces. Basically, you can't really do this without considering the binding scheme i'm using. Compared to (what i call -- i don't know the 'proper' name) forth-style linear binding, the kind of global name space approach i'm using breaks too easily. In linear binding, redefined words are only bound to new code in code loaded after the redefinition, which enables the re-use of 'internal' words: just redefine them, since they only serve a purpose at load time, and supposedly are not to be used by other modules. The main thing that needs to be patched is the interpreter. Instead of using a dictionary directly, it needs to use a list of namespaces to search. There could be one global variable indicating this: the search order.
    ((core) (assembler) (compiler) ()) (search) !

This seems to work. Now for task two: eliminate all self-modifying code, or at least isolate the things that are modified in a separate namespace. Done. The variables now live in the (var) namespace, how original! The next thing that could be done is to eliminate variables that are only meant to be temporary. To do this automatically would require a whole different name binding and scoping approach. Maybe leave it for later.

Entry: search cache / compiler
Date: Fri Jun 23 09:58:20 EDT 2006

Once a large part of the dictionary is marked EXPLICITLY as read-only, it can be cached in a faster lookup structure, or maybe even bound permanently, directly to the code/data itself. The first level of compilation is really straightforward: just perform recursive concatenation on each symbol until you reach primitives. A better approach would be to keep the nesting of words so they can be re-used. This would require a data structure that represents a compiled word. We could use just a list as before, but that requires an explicit 'run' every time such a list is executed. The second level is a lot more difficult, since i give no information at all on how to do other kinds of nesting. Even the simplest () () ifte can't really be compiled without proving that the two lists are not generated at runtime. So, while CAT is flexible due to its simplicity, it's also terribly hard to compile properly, because of the lack of meta information on the words. The FACTOR language is a good example of how to aim for compilation. It's also more forth-like, and has declarations to be used by the compiler. All in all, I think speeding up the compiler/assembler can best be done by writing it in some other sub-forth, piece by piece, and keeping CAT as simple as possible.

Entry: usb and indexed addressing
Date: Sun Jun 25 12:11:14 EDT 2006

Started working on the USB driver again. It seems relatively straightforward, but a bit boring. I do think it's a good idea to start using the extended instruction set for its indexed addressing when dealing with USB messages. It will ease coding a lot. This does require moving the stacks around, probably best to the region 0x000-0x05F, and it limits the globally accessible variables to the range 0x060-0x07F, which means 32 in total. In general, the choice between the extended set or not will mostly depend on the programming model of the application. The extended instructions themselves are interesting for more efficient moving of data, but usually a lot of moving can be avoided in simple self-contained apps. Before starting to implement that thing, it might be wise to do the config bits too.

Entry: usb decoder
Date: Mon Jun 26 18:51:00 EDT 2006

Since USB is going to be quite an important part of the PURRR project, it's a good idea to spend some time thinking about how to implement the decoder. Basically, USB is an RPC mechanism:

* host sends request, client responds
* all requests have the form of tables

This is not very exciting. Doing this directly in PURRR given a table of name->value maps is really really boring and error prone. So is there a way to specify the whole USB protocol in a more high-level fashion, and then compile it to forth code?
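The table shape is easy to see in C -- a hedged sketch, not PURRR: the request codes are the standard USB ones, but the handler names are hypothetical. A setup packet is 8 bytes, and dispatch is just a lookup on the first two.

    #include <stdint.h>
    #include <stddef.h>

    typedef struct { uint8_t type, request; void (*handle)(void); } usb_rule;

    /* assumed handlers, elsewhere */
    extern void get_descriptor(void), set_address(void), set_config(void);

    static const usb_rule rules[] = {
        { 0x80, 6, get_descriptor },  /* GET_DESCRIPTOR, device-to-host */
        { 0x00, 5, set_address    },  /* SET_ADDRESS */
        { 0x00, 9, set_config     },  /* SET_CONFIGURATION */
    };

    static void decode_setup(const uint8_t *pkt) {  /* pkt = 8-byte setup packet */
        for (size_t i = 0; i < sizeof rules / sizeof *rules; i++)
            if (rules[i].type == pkt[0] && rules[i].request == pkt[1]) {
                rules[i].handle();
                return;
            }
        /* unknown request -> stall endpoint 0 (not shown) */
    }

a high-level spec would essentially just generate this table plus its handlers.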
see usb.txt

Entry: sheepsint simplification
Date: Wed Jun 28 12:22:10 EDT 2006

- 3 frequency levels: synth, control (/40), note (/25)
- note sequencer uses programs

Entry: caching
Date: Wed Jun 28 13:57:56 EDT 2006

What about writing a caching mechanism for method lookup, instead of the very slow dictionary traversal employed now? CAT is impractical for larger projects. Even the monitor and sheepsint.f take quite a noticeable time to compile. How to solve this? I tried to cache the symbol binding at the level of 'state-find', which gives a significant speedup, in the order of 4-5x. I wonder if it's possible to cache things at the level of 'dict-find' or 'assoc-ref', using the dictionaries themselves as keys that map to hash tables. There are 4 distinct operations on a dictionary:

non-muting:
* FIND

muting:
* ADD
* REMOVE
* REPLACE

The goal is to find those dictionaries that are constant, cache them with hash tables, and ignore all the rest. The thing to avoid is piling up the cache with things that are normally GC'd, where the cache keeps the only reference, leading to a blowup. Doesn't seem to be so easy. Some ad-hoc approaches: every N searches, decrement the TTL of each dictionary; on each reference, set the refcount to some constant value. This will guarantee that dead dictionaries will be deleted, and often used ones will stay alive. There are just two tuning parameters. Complicating requirement: the hash needs to be able to reply 'false', so it should have a pretty clear picture of what's in the dict. Maybe this is even easier to do: a one to one correspondence between dict and hash, updating the hash from dict update operations. for example:

not to cache:
- toplevel & var

definitely to cache:
- core & badnop & purrr
- macros and asm after init

What can be done to avoid registering the main dictionary, or one of its clones? It would be good to start caching something only when it's been referenced a couple of times, and then keep it synced. What about randomly doing this? This is not going to work in an ad-hoc manner.. too many special cases. What i need is a way to detect constant dictionaries, and cache only those. How do i know if a dictionary is constant? If it keeps being referenced. So, what i have is a hash which contains all the dictionaries that are referenced; on each reference their count goes up, while once in a while every count goes down. Ok, some testing with the ad-hoc algo teaches me that it's not going to work if i include the top level in the searches, since it's the one that's always referenced, and it changes any time anything else changes. Plus, it's small. So the toplevel is out. What about moving the cache point? Things that need to be cached because they are largely constant are:

- all executable (core/badnop/purrr) words
- assembler: machine class-map
- compiler: macros (+ update)

What about explicitly locking/unlocking a dictionary? Then locking could turn it into a hashtable. This seems to work, but the speedup is not as big as i thought it would be.

Entry: caching / compiling
Date: Thu Jun 29 10:36:15 EDT 2006

Again, what's the deal? Making it run faster. Period. Turning things into hash tables doesn't seem to work that well: i get a factor 2 speedup for compiling the boot monitor. Maybe this does have to do with the compiler dictionary though. What about compiling after all? See bind.scm. It seems to work reasonably well. Don't even need hashes! The monitor compilation goes from 15 -> 5 seconds. Good, but i thought it would be better..
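The locking idea, in C terms (a hedged sketch -- the real code is bind.scm): give each dictionary a version counter that every muting operation bumps, so a cached binding is valid exactly as long as the version it was filled under still matches.

    #include <stdint.h>
    #include <string.h>

    typedef struct { const char *name; void *value; } binding;
    typedef struct { binding *slots; int n; uint32_t version; } dict;
    /* every muting op (add/remove/replace) must do d->version++ */

    typedef struct { const char *name; void *value; uint32_t version; } memo;

    static void *find(dict *d, const char *name, memo *m) {
        if (m->name == name && m->version == d->version)
            return m->value;                    /* hit: cache still valid */
        for (int i = 0; i < d->n; i++)          /* slow path: linear search */
            if (strcmp(d->slots[i].name, name) == 0) {
                m->name = name;                 /* assumes interned symbols */
                m->value = d->slots[i].value;
                m->version = d->version;
                return m->value;
            }
        return NULL;                            /* the required 'false' reply */
    }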
The compilation step still takes quite some time, so it could be that dictionary lookup for code words is not the only bottleneck. Maybe the macros should be compiled too. There is a clear difference between compiling the monitor code the first time, and doing it again with a 'dirty' project. It looks like a significant amount of time is spent in maintaining the functional data structures. Time to take a closer look at Okasaki's PhD thesis "Purely Functional Data Structures", section 6.2.1, Binary Random-Access Lists.

Entry: backtracking
Date: Fri Jun 30 01:31:32 EDT 2006

Since I went through the trouble of writing CAT's data structures purely functionally, why am I not using true backtracking for things that are not easily undone, like the for..next inner/outer drop? Maybe i'm afraid of efficiency issues? Maybe it still is overkill? Manipulating assembly is probably still cheaper than recompiling the same code on 'fail', but it would look better. I do use this mechanism, but only through contained single-direction 'exception' backtracking. So the real question is: backtracking on functional data structures, or linear logic and explicit coding of inverses? Thinking about it, this particular problem might benefit from treating the assembly code as a tree. If a particular loop's body is kept isolated, inserting it into the parent code is a simple matter.

Entry: profiling
Date: Fri Jun 30 15:08:12 EDT 2006

Eliminating the dictionary lookup for code was a safe guess as an optimization point. However, the rest is probably harder to guess, as I already made some wrong ones. So, time for some proper profiling. How to tackle that one?

Entry: return stack
Date: Fri Jun 30 15:11:12 EDT 2006

I'd like to find out how to gain access to the guile return stack, or how to implement CAT-only continuations. Mainly for debugging purposes: to at least know WHERE an error occurs.

Entry: todo
Date: Sat Jul 1 20:53:38 EDT 2006

* sheepsint note & controller stuff & docs (1 day)
* interrupt driven 3 oscillator hires timer synth (1 day)
* connect purrr to pd
* usable emacs configuration for purrr
* proper forth mode

Entry: disassembler
Date: Sat Jul 1 21:10:30 EDT 2006

About synchronization between the chip and the memory image. The (project image) dictionary should really be in perfect sync at all times. If it's foreign software, there's no monitor running. If it's self-modifying, the machine code should be read before disassembling. So disassemble =

* read code block
* verify
* dasm symbolically using the symbol table

This means the disassembler is connected to only one source, and the dirty ugly hack can be removed. Done. Looks cleaner now. Do need to add verify operations.

Entry: forth & puredata
Date: Sun Jul 2 11:25:06 EDT 2006

Might be best to get both working at the same time. What exactly is necessary to connect pd and purrr? Doing interactive dev in pd is probably not necessary. Getting data out as fast as possible might be. So i have two basic solutions:

* make a PD module understand the dictionary format
* add OSC to guile/CAT

Been working on the forth mode. It stays in forth mode now, even on error. And it behaves as a proper console, with the possibility to upload code through

-> interpretation of lists as CAT objects
-> import of some cat words, including 'r'

In short, '(: + + ;) r' still works in forth mode.

Entry: old badnop
Date: Sun Jul 2 17:00:53 EDT 2006

There are 2 features i didn't port yet:

* branch 'snapping'
* 14 bit core compiler

The latter i'm probably going to skip, since it has no support for the paging madness.
The former might be interesting though, but it is really an image level optimization: dasm branch? follow asm

Entry: sheepsint
Date: Thu Jul 6 15:57:24 EDT 2006

Time to simplify. The idea is this: split the synth design i did for the 14bit forth in the original badnop, which went as far as designing a new language for synth control, into two projects.

seriousynth: a serious attempt to generate high quality sound using the resources available on the 18f, like the 3 high resolution timers and the hardware pwm, adc's and the high frequency oscillator.

sheepsint: a scaled down version of the original 14bit design, not using interrupts, using the internal oscillator, and purely synchronous pure forth code.

Entry: todo go forth
Date: Fri Jul 7 14:05:52 EDT 2006

* emacs mode
* bootloader for all chips
* pd interaction?
* sheepsint: interrupts?

Entry: grids
Date: Fri Jul 7 14:08:58 EDT 2006

Here we go again: loop transformations for grid/matrix operations. I'm getting a bit sad over the code duplication in the matrix and image code in pf. Most of these things could be split up into some higher level constructs. What pf needs is:

* a better type system
* automatic generation of iterators
* better polymorphism

The first one means that executing type checks for each primitive should be minimized; the system should be designed in such a way that these checks can be disabled, so polywords need a different implementation. The second one means i need a way to represent iterators, ideally a subset of C that can be easily generated in the form of S-expressions. The third one is something that needs to be thought out well. Program specialization is important. Another thing i need to accept is that either i write an opengl app, or i write a video app. Both at the same time is too complicated.

Entry: pre-scheme
Date: Fri Jul 21 00:18:16 BST 2006

scheme48's low level stuff is written in pre-scheme. can't i do something similar in CAT? raising the abstraction could do things like:

* turn on late binding
* turn on GC

the main problem with cat is things like

    (bla bla bla) (boh boh boh) ifte

which, to be efficient, should have the lists bound as code, not data. more things can be added:

* moving from linear code to linked lists (different 'enter')
* moving from untyped to dynamically typed code (types as pre-xt's)

Entry: workshop
Date: Sat Jul 22 10:44:05 BST 2006

the first workshop went well. i'm updating doc/sheepsint.txt. the most important thing to change is to make the target mode a bit more mature, so you can at least reload the code: messing up the code happens a lot when you start to do the wrong things. so TODO:

* need a 'reload' word
* need run time 'toggle', 'high', 'low'

Entry: minimal osc listener
Date: Sat Jul 22 14:27:17 BST 2006

    (receive 10001) osc-receiver (osc-console receive unswons interpret osc-console) : (/purrr t) :

Entry: new purrr features and fixes
Date: Thu Jul 27 16:04:47 BST 2006

- stack underflow detection
- better synchronization (random numbers / counter?)
- recording a file from target mode
- host, target = idempotent

Entry: workshop notes
Date: Fri Jul 28 12:34:30 BST 2006

- either programming or electronics prior knowledge
- more tools: soldering, tweezers, screw drivers, ...
- not combined with pd/pf: too much scatter

Entry: mzscheme
Date: Thu Aug 10 19:08:54 CEST 2006

a problem with macros not being able to use functions defined in the same compilation unit, and some more things where macros are different.
but, as dave put it, it might be interesting to switch because of the more active community around plt scheme, compared to the almost dead guile.

Entry: done being lazy
Date: Sun Aug 27 19:10:51 CEST 2006

lots of things to do. most important is the simplification of pf to a lock-free design, which i'm going to accomplish by first making it completely single-threaded, then moving things to more threads using lock-free data transactions. this should lead to jack/pd/desiredata/pnpd interfaces. pf is going to be top-down simplification, while keeping the code running, while in cat/poke i'm going to do the same + base myself on some ideas from retroforth. second thing is the switch to mzscheme for cat. guile is dead. this requires macro cleanup. in summary:

* transactional memory + lockfree stuff
* mzscheme -> clean up macro code
* study retroforth design

Entry: building pf from the ground up
Date: Wed Aug 30 04:52:14 CEST 2006

start with poke:

- 4 instruction vm: PRIM/CALL/JMP/???
- single block of linear core memory

features:

- multimethod dispatch (generic functions)
- true object model
- refcount object management
- hashtable
- symbols
- lists or pairs? + memory management
- binary serialization + comm (all objects either serializable, or reconstructable by design. which means 'write' should write anything to a stream, such that 'read' will either verbatim reconstruct it, or reconstruct it using "trusted constructors", or just arbitrary code)

so.. to summarize, the 2 most important problems are

1) memory management for things that can actually be deleted. maybe i should explicitly avoid this? linear lisp mm + pure packet leaf nodes with reference counts based on their respective types are trivial to implement. the 'drop' method for a packet moves it to the global reuse-stack in a lock-free way. a (suboptimal) 'real' GC should then be possible to write on top of this by re-packing the core memory and discarding the tree.

2) multimethod dispatch. how to make sure this also works on composite types? (lists of packets)

Entry: todo september
Date: Sun Sep 3 20:18:38 CEST 2006

* move to plt scheme
* fix workshop bugs
* usb driver
* make it run on windows (easier with plt?)

Entry: eforth
Date: Tue Sep 5 03:04:29 CEST 2006

reading the eforth book "eForth and Zen, C. H. Ting", which can be found at http://www.eforth.com.tw/academy-n/basic.htm charles mentions it in the vaticForth source, which is modeled after Ting's eForth17. some comments:

* purrr sacrifices a lot of ease of use (16 bit numbers, single address space, ...) + simplicity of implementation (loads of special case macros for target machine code optimization). it is not really suited for teaching.
* i need a serially controlled, universal, programmable data interface i can plug into any old computer and use the terminal to do debugging, without needing a host system to run. this is an ideal place to test this eforth thing. this could be an interface to any app written in a more hard core language like purrr..

so this leads to

* purrr -> the 'visible' part of the forth project. relatively portable language, adhering to more conservative forth standard practice
* cat/badnop -> my playing ground

Entry: flashforth
Date: Wed Sep 6 03:11:30 CEST 2006

something mikael is doing in flashforth, and which purrr might benefit from, is to always have kernel words looked up before user words, to make the system more robust. one thing i was thinking about in addition to that, especially aiming at teaching, is to have a 'safe mode', where dup and drop are actually functions that perform some checking -- this was a common error. another, easier solution is to add some guard bytes, and have the interpreter check the state.. (a minimal sketch below)
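A minimal C sketch of what such a safe mode could check (everything here is made up for illustration):

    #include <stdio.h>
    #include <stdlib.h>

    #define DEPTH 32
    static int stack[DEPTH];
    static int sp = 0;                  /* index of first free slot */

    static void fault(const char *w) { fprintf(stderr, "%s: stack fault\n", w); exit(1); }

    /* checked versions of the usual words */
    static void push(int x) { if (sp >= DEPTH) fault("push"); stack[sp++] = x; }
    static void drop(void)  { if (sp < 1) fault("drop"); sp--; }
    static void dup_(void)  { if (sp < 1 || sp >= DEPTH) fault("dup");
                              stack[sp] = stack[sp - 1]; sp++; }

the guard byte variant would instead place known values just outside the stack area and have the interpreter verify them once in a while.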
Entry: stand alone 16bit forth
Date: Thu Sep 7 20:55:51 CEST 2006

After reading some things i see it like this:

* ease of implementation is most important
* need ans? maybe not. but a compatibility layer could be nice. (eforth?)
* dict in lower caps

some more:

* @ and ! are the only memory access words, even used in NEXT

Entry: todo update
Date: Fri Sep 8 17:34:15 CEST 2006

* usb!
* minimal ANS forth read docs
* sheepsint -> interrupt driven synth core
* sheepsint -> highlevel interface forth.. (note layer)
* small module system

Entry: name change
Date: Fri Sep 8 20:25:53 CEST 2006

so.. 18f, short for badnop/18f, is now the 8bit compiled 18f forth. a nice and clear name for the real deal. the 'sexy' name is reserved for the 16bit interpreted forth front end, which is in the dream-up phase, but will probably be very similar to, or a port of, charles' standalone 18f forth.

Entry: purrr 16bit vm
Date: Sat Sep 9 13:18:29 CEST 2006

got the memory model and execution model figured out. time to write the compiler. i was thinking.. maybe it's easiest to just write it in forth, and write a simulator to run the forth in cat? i managed to get caught again by the indirect threaded code deref story. somehow it's really unintuitive. but it looks like it works now..

Entry: usb
Date: Sun Sep 10 09:30:09 CEST 2006

another try. tomorrow johannes is coming over, so i need to get things in good shape! since i have strings now, i might just add a debug mechanism to send printouts over the serial cable. this might be one of those good ideas, tom! another thing i need is loadable modules.. about the loadable modules, i'm just using files now, which is good enough. loadable modules are only necessary when there are a lot of macros that need shared support code. cotton feature postponed till later. so.. how to go about getting this usb thing to work. i read the specs, and have some kind of idea about how to get things working.. basically, usb is a pull system: the host requests data. so, i got the 'init-usb' part right :) lol now, the usb reset interrupt UIR URSTIF apparently configures all the buffers. a nice opportunity to start looking into that relative addressing. hmm.. i'm not so sure any more.. i had it running through a 4 6 6 6 6 6 6 6 6 3 sequence (activate, reset, ..., suspend), but i get these kinds of errors in linux (-71 is EPROTO):

    usb 3-1: new low speed USB device using uhci_hcd and address 86
    usb 3-1: device not accepting address 86, error -71
    usb 3-1: new low speed USB device using uhci_hcd and address 87
    usb 3-1: device not accepting address 87, error -71
    usb 3-1: new low speed USB device using uhci_hcd and address 88
    usb 3-1: device descriptor read/64, error -71
    usb 3-1: device descriptor read/64, error -71
    usb 3-1: new low speed USB device using uhci_hcd and address 89
    usb 3-1: device descriptor read/64, error -71
    usb 3-1: device descriptor read/64, error -71

this seems to happen after only init-usb ??? sometimes it works and sometimes not. i guess i need to sleep..

Entry: random stuff
Date: Sun Sep 10 16:56:25 CEST 2006

* easier 'empty'?
* robust to ctrl-C during upload? disable?
* fix assembler jumps !!!! (ed. it's an error now. >8KB code is problematic)
* move to byte addressing model..
word addressing is crazy

Entry: standard forth
Date: Thu Sep 14 05:19:25 CEST 2006

do i really want to write one? actually pretty boring stuff.. what i'd like to have is a standard forth for teaching, mainly for documentation and 'api stability', and a simple standalone forth for myself in case i need standalone apps with a console. the real question is: can i live with IMMEDIATE? if so, then this is not a problem, and i should just implement a MAF kernel. (ed) actually, i probably do need this thing for myself. it would be a nice exercise to write it in a couple of layers:

1. inner interpreter + memory model (general vs flash only)
2. standalone interpreter
3. standalone compiler
4. ans layer

Entry: forth console + speed
Date: Thu Sep 14 06:26:47 CEST 2006

i didn't deem this so important at first, but it is. the console needs to be conceptually clean. cat is nice for actually implementing the whole system, and for writing tools that eventually become macros, but interacting with the machine should be as simple as possible, and should basically emulate a true forth. one important part is the uploading of code. blocks are a bit spartan if you don't have to use them, but as an interface -- '123 load' -- they work very well. another thing that's important is speed. cat is really really slow. i should find out how to make it a bit faster. the strange thing is that compiling words makes it only a tiny bit faster. i'm assuming that most of the other time is also spent in looking up macro and word dictionaries, or in rearranging the data tree. but i need a tool to actually measure this. some simple things that could speed things up: speed up the io, maybe add some buffering. terminal io is relatively fast. reading is terribly slow also.

Entry: roadmap
Date: Fri Sep 15 21:43:13 CEST 2006

1. interrupts
2. 3 x hires interrupt synth algos
3. controller in 16bit highlevel forth
4. interpreter
5. compiler

Entry: problem, solved? attempt to get organized.
Date: Sat Sep 16 03:57:34 CEST 2006

trying something new. if i write code that's not clean or has missing features, i usually add FIXME or FIXME: blablabla. what about starting to do this in the dev log? a place for more general, as opposed to one-line, problems.

FIXME: 18f block 0 init
FIXME: overflow in relative jumps
FIXME: no parens in forth mode
FIXME: code upload: paragraph instead of block? + define keys
FIXME: bootblock updater

Entry: boot stuff
Date: Sun Sep 17 01:20:36 CEST 2006

back on line.. so, the boot block. maybe it's best to use some special purpose instructions instead of 'org' to do the compilation.

FIXME: org messes up upload

Entry: byte addresses
Date: Sun Sep 17 19:19:36 CEST 2006

bite the sour apple. if you EVER use byte addresses, and the machine itself is byte addressable, NEVER use word addresses as representation. it makes life more difficult, even if at first glance things seem easier. how to tackle it.. (project labels) -> (project symbols) a name change is, i think, the best way of making the transition. ok.. looks like it's done. some bugs here and there; now the code looks a bit clearer. all addresses are byte addresses now. this would enable a better treatment of config bytes too.
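In C the rule is almost too simple to state, but here it is anyway (a hedged sketch; names made up): keep one canonical representation -- byte addresses -- everywhere, and convert only at the edge where some tool or register insists on words.

    #include <stdint.h>

    typedef uint32_t baddr;  /* byte address: the only representation used internally */

    /* convert at the boundary only; an 18f instruction word is 2 bytes */
    static inline uint32_t baddr_to_word(baddr a)    { return a >> 1; }
    static inline baddr    word_to_baddr(uint32_t w) { return (baddr)w << 1; }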
Entry: driving leds
Date: Mon Sep 18 06:48:44 CEST 2006

fun fun :) rediscovering the L,H,Z matrix trick: using N pins you can drive N(N-1) leds. for example:

       0 1 2 3
    0  . a b c
    1  d . e f
    2  g h . i
    3  j k l .

where for example a = H H Z Z and e = L Z H Z. number of leds = N * (N-1): 2 times (> and <) the number of ways to choose 2 terminals from N nodes.

coding hack: for time multiplexing purposes, it is possible to refresh independent (orthogonal) subspaces simultaneously. for 8 x 7 it is possible to use only 8 pins. for 4 x 7 = 28 leds, the closest is 5 x 6 = 30, so 6 pins is enough. then for the number of resistors.. a naive approach would be to use one resistor per 2 leds. however, it is possible to use one per pin, giving

    --/\/\/\---|X|---/\/\/\---

where the middle symbol is two leds antiparallel. however, whether this works depends on what access you have to the terminals. i'm using CA digits, one anode per digit, all segment cathodes free, so this won't work. the maximum number of shared anodes is 5. this means i need to use 8 pins. looks like that's going to be the D bus. something like:

    D port:  0 1 2 3 4 5 6 7
    digits:  + a b c d e f g
             g + a b c d e f
             f g + a b c d e
             e f g + a b c d

using rotates rather than inserts (a + b c ...) makes it easier to handle in software. a path from 1 -> 0 will light a led, but will not light 2+ leds connected in series between the same terminals. according to don lancaster's tech musings 152, only one led can be driven at a time. i don't think this is correct. you can not have both more than one 1 AND more than one 0, but it is possible to switch on a whole 'row' connected to a common 1 or 0 terminal, with the OFF condition represented by high Z. however, switching on more than one LED at a time does create a problem when using one current limiting resistor for each port. since the common 1 or 0 resistor is shared, it has a larger voltage drop proportional to the number of lit leds, resulting in less overall current: per LED ~ 1/(n+1), with the total current ~ n/(n+1). this can be solved by using a bypass diode for each resistor. this makes the circuit asymmetric, which isn't a problem, since there is already a preferred asymmetry: it's easier to drive all segments for one digit at a time as opposed to one segment for each digit. to get a decent brightness, it might be possible to drive the led with a higher current than its maximum rating, as long as it has a limited duty cycle. this does pose some problems for testing, since you can't have a non-cycling lit segment.

Entry: analog filters
Date: Mon Sep 18 17:40:05 CEST 2006

so.. why are 'musical' analog filters special? because of the varying time constants. the basic building block for a filter is an integrator, which is easily built using an opamp or ota. the reason CA3080 otas are used in old designs is their variable gm, which is ultimately a bipolar current bias. this translates into variable gain, and so a variable time constant for the integrators. so, the basic entity is a multiplication, in the disguise of a transistor bias which influences the voltage -> current gain. analog multipliers should be able to fix this. an analog multiplier can easily be built using switched circuits, in the form of a PWM modulator. this requires analog switches, a comparator, and a triangle or saw generator. i ordered some switched cap filters (MAX265) which have exposed sections, so i can give this a try.

Entry: 30f and 12f/16f
Date: Mon Sep 18 22:35:51 CEST 2006

just put a dspic on the breadboard.
since this story is mostly software, i can do it later.. lots of things to do. wonder how this can best be tackled. write the assembler first, or write the monitor and pre-forth first? anyway, the forth machine model needs to be determined, which requires mostly reading, for which i don't have time now. then, the 14bit cores. they have some drawbacks: banked program memory and non-writable flash. these make interactive dev really hard. i think it's safest to abandon them for now. an option could be a hardware incremental programmer. so, both are closed for the moment. too much work to actually get somewhere interesting in a couple of days. means i'm sort of done, except for the sheepsint portable radio mod, and can start working on sheepsint and the 16bit idtc forth.

Entry: standalone sheepsint
Date: Wed Sep 20 00:29:39 CEST 2006

the only thing i need to build before piksel is a little standalone sheepsint board. i'm not sure about the input methods yet.. maybe this is not necessary at all. maybe just a serial port is more than enough?

Entry: video output
Date: Thu Sep 21 05:30:04 CEST 2006

there's a lot of stuff on the web about making PIC chips and other microcontrollers drive PAL/NTSC/MGA/VGA/... for monochrome this is really not so hard. the only downside is that you can't really do much else while sending out bits, since pixel clocks are so close to the instruction frequency. so i was wondering if it's possible to use the PIC's hardware serial port to do this: that way it should be possible to drive it at byte intervals only. making the interrupt routine as fast as possible should leave open some space to do other stuff below the isr. if the clock freq is something like 2x the pixel clock, this leaves at least some room for other things to do, while giving full resolution output. first, a look at the serial output hardware on the most basic pic18f. the simplest version is the EUSART. the thing i need is synchronous master: it can send out bits at the instruction tick, clock 2x without interleaving. (i don't need the async uart: the standard start/data/stop with sync on the stop/idle->start transition, synchronous slave rx/tx, or synchronous master rx. also, in the bigger 18f chips there is an MSSP module which implements I2C and SPI. briefly looking at the timing diagrams, it should be possible to use these also, maybe together with address registers etc.. but it looks a lot more complicated, so i am going to forget about it.) so, text based output with flash character lookup is something like this:

    MOVF  POSTINC2
    MOVWF TBLPTRL
    TBLRD*+
    MOVF  TABLAT
    MOVWF TXREG

clearing of the interrupt flag is not necessary; it happens automatically when TXREG is written. using an extra layer of caching which gets the current pixels from ram instead of flash gives a simpler interrupt routine:

    MOVF  POSTINC2
    MOVWF TXREG

but this requires another task to actually fill that ram area from the rom characters and the ram character framebuffer. if it is possible to do this fast enough, i think this is the way to go. it would enable scan line doubling to leave room to do other things. the only things that still need to be done in the isr are the H and V sync pulses. the trickiest part is to do the 'end of scanline' check fast enough in the ISR. probably something like this:

    BTFSS INF2 6
    RETFIE 1

this would check the line readout for overflow, assuming the data is placed in memory such that the end of a line will cause a particular bit to change state. that seems pretty near optimal.
per-line code can run after that, and doesn't need to be so fast, since there's plenty of room between lines. if not, it is easily created by using fewer characters. then, what's left is the flash->ram copying. the inner loop of this is just

    TBLRD*+
    MOVF  TABLAT
    MOVWF POSTINC1

where POSTINC1 needs to be saved, so ordinary forth code can run outside this loop. (actually, this is not correct, since there needs to be something that determines the address to read from). the interrupt latency for any internally generated interrupt is 3 cycles. an iret is 2 cycles, and a btfss is only one if the branch is not taken. so, this gives

    3  entry
    2  load next char line + store TXREG
    1  check end of line
    2  return

totalling 8 instruction cycles for the isr. bummer. this means it's not possible to output at instruction cycle frequency and do something else too. it is very possible though to run at half that frequency, freeing half of the cycles for other stuff, but still 5 out of 8 cycles are overhead. it's probably easier to do it without interrupts. what about something like

    MOVF  POSTINC2
    MOVWF TXREG
    x x x x x x
    MOVF  POSTINC2
    MOVWF TXREG
    x x x x x x

with the x's filled up with payload. a line loop like that can be started by syncing on the TX interrupt bit. it is possible to squeeze the flash->ram copying in between this. TBLRD*+ takes two cycles, the other 2 take 2 also. this leaves 33% of a line to fill up with useful instructions. hmm... (error in reading the character set.. won't make it) back to square one. this code takes 6 cycles to run:

    MOVF  POSTINC2
    MOVWF TBLPTRL
    TBLRD*+
    MOVF  TABLAT
    MOVWF TXREG

the following takes the same number of cycles, but does not affect the W register, so W can be used to do useful stuff in the remaining instruction slots:

    MOVFF POSTINC2 TBLPTRL
    TBLRD*+
    MOVFF TABLAT TXREG

this leaves a 2 instruction slot to do useful stuff. only straight-line code can be used to fill the slot. filling this up could be automated using expanded forth code, combined with instruction timing tables. alternatively, the 2 cycle slot could be filled up with a jump, which can be interrupted / disabled by a 15kHz timer interrupt for vertical sync, giving more compact code. timings: dot clocks are only a guideline for analog video. you put on a line what you want. line frequencies are strict.

    proto    pixel (MHz)   line (kHz)
    ---------------------------------
    NTSC     13.5          15.75
    vga640   25.175
    vga720   28.175
    mda      16.257

conclusion: using a multisync vga monitor, it might be possible to use a nonstandard pixel clock. halving the dot clock makes it all very possible. really, the only thing that's important is to get the H/V timing right. lines are just analog. for tv and vga, from http://lipas.uwasa.fi/~f76998/video/modes/

"The harsh reality of analog video is that there is no fixed, defined, horizontal resolution. There is just a squiggly analog waveform which lasts 52 µs. If we want to turn it into pixels, we need to determine a sampling rate, and sample the signal at that rate - over the whole 52 µs active length."

so i guess there is quite a lot of freedom, even with vga. the important thing is that the horizontal sync is within tolerance. doing a 12MHz pixel clock for video still gives 640 pixels width, which is more than enough. probably too much to read on most tv sets. a simple old school 40x25 mode would be quite ok. forgot about this: the 'safe area'. a reasonable percentage of the width (and height) cannot be used due to overscan. so doing text automatically gives some black space which can be used to perform other mcu tasks. another hack:
when reducing the horizontal resolution, it's only natural to also reduce the vertical resolution. using the non-interrupt way of hardcoded interleaving, scan doubling can be used. for video, this is frame based, so there is no real gain (no space to cache a whole frame on a pic), but for vga it is possible to make one line write out the line data to ram (8 instructions filled), and have the other line just cache it, using up just 2 instructions, giving a net gain of 1 instruction per pixel. looks like interlaced/noninterlaced depends on the time you wait from vertical sync to starting your next scanline. for noninterlaced, always start a line at the same time; for interlaced, the 2nd field's first line should be delayed a bit, so the beam moves in between scanlines. see http://www.williamson-labs.com/480_tv.htm

Entry: pic sample player
Date: Thu Sep 21 09:28:09 CEST 2006

there's the PIC18F24J10, which is the cheapest 18f. it has a 31kHz internal oscillator and 16 kbytes flash. seems like the ideal 1 bit sample player. it's a 28 pin package, so plenty of room to add some flashing lights and knobs and switches.

Entry: fast multibyte math
Date: Sun Sep 24 04:14:48 CEST 2006

maybe that's one of the most important things in writing a kernel that will support the synth core audio part: it needs to do higher resolution calculations. immediately it becomes less optimal, since there's no real way around moving stuff out to registers, performing the calculations, and putting the result back on the stack. so it can't really help being slow.. start with the beginning. the 18f has an 8x8 -> 16 unsigned multiplication. to build a 16x16 -> 32 unsigned multiplication we have

    (a1 X + a0) (b1 X + b0) = a1 b1 X^2 + (a1 b0 + a0 b1) X + a0 b0

given the bytes on the stack, it's easiest to compute this low byte first.

Entry: more TV signal generation
Date: Sun Sep 24 14:15:23 CEST 2006

Thinking about using the SPI port, so it is possible to still use the serial port for comm. since it has a 1 byte buffer, it could be polled during line sync. nope.. won't work. SPI uses the same output pin as the USART. so, to do i/o too we'll need something different. the easiest is probably just a master synchronous port, done in software, running at line frequency, or 1/2 thereof. this could poll for commands. this does complicate things. for now, i can probably just multiplex it. it does need some resistors at the serial input though. wait.. i was looking at the wrong data sheets. MSSP/SPI uses SDI/SDO/SCK, which are separate from RX/TX on the 252 and 452. the 1220 does not have such a module, while the 4550 shares pins with RX/TX, but provides a streaming parallel port.

Entry: tasks
Date: Mon Sep 25 12:03:40 CEST 2006

very simple task switching. need to save the 3 stack pointers. the parameter stack and aux (control) stack pointer can be saved on the return stack, so a single byte containing the stack pointer is sufficient. this is a can of worms.. there are lots of options. if you run tasks from the isr, you need to save lots of state. better be safe: a task saves

    FSR0L FSR1L FSR2 STKPTR

need to make a distinction between cooperative tasks, and pre-emptive tasks which are interrupt driven. ok, byte the sour apple. what is needed is:

- a task is a piece of ram memory, reserved to save state
- each task has a 'suspend' and 'resume' word
- tasks invoked from an ISR are pre-emptive tasks: they wait for events (IO), and are woken up as soon as this happens
- tasks invoked from the mainloop are cooperative tasks, and are easiest implemented in a round-robin fashion (a sketch of the idea follows this list)
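A hedged sketch of the cooperative round-robin idea in C (using POSIX ucontext instead of the PIC trick of saving FSR0L/FSR1L/FSR2/STKPTR, but the shape is the same: 'suspend' saves the current state and resumes the next task):

    #include <ucontext.h>
    #include <stdio.h>

    #define NTASKS 2
    static ucontext_t ctx[NTASKS + 1];   /* ctx[NTASKS] = the starter context */
    static int cur = 0;

    static void suspend(void) {          /* save state, resume next task */
        int prev = cur;
        cur = (cur + 1) % NTASKS;
        swapcontext(&ctx[prev], &ctx[cur]);
    }

    /* two toy tasks; like mcu mainloop tasks, they never return */
    static void blink(void) { for (;;) { puts("blink"); suspend(); } }
    static void poll(void)  { for (;;) { puts("poll");  suspend(); } }

    static char stack_a[16384], stack_b[16384];

    int main(void) {
        getcontext(&ctx[0]);
        ctx[0].uc_stack.ss_sp = stack_a;
        ctx[0].uc_stack.ss_size = sizeof stack_a;
        ctx[0].uc_link = &ctx[NTASKS];
        makecontext(&ctx[0], blink, 0);

        getcontext(&ctx[1]);
        ctx[1].uc_stack.ss_sp = stack_b;
        ctx[1].uc_stack.ss_size = sizeof stack_b;
        ctx[1].uc_link = &ctx[NTASKS];
        makecontext(&ctx[1], poll, 0);

        swapcontext(&ctx[NTASKS], &ctx[0]);  /* start the first task */
        return 0;
    }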
implementation: the ram memory is easiest taken to be the data stack, since the return stack is a limited resource. storing saved registers in ram or on the data stack doesn't make any real difference, except that the data stack is faster. ok.. draft done, not tested yet. what i miss is initialization. so, each task needs rp, dp, xp, while the initial a reg can be assumed to be zero. wow. it works! it's really simple :) no more boring state machines!!

Entry: speed
Date: Wed Sep 27 05:58:55 CEST 2006

working on some applications lately, i found out that

* speed is more important than i thought before.
* pure functional design doesn't always work, since the microcontroller itself has a very pressing 'state' associated to it.
* 'mark' / 'empty' are valuable things.
* having CAT lists in forth code is really confusing.
* i need better emacs integration, especially for uploading

Entry: 23 key toy piano
Date: Thu Sep 28 12:07:44 CEST 2006

bought new for a couple of euro. crappy, doesn't work well. runs on 2 1.5V AA cells. has a chip-on-board package connected to 23 lines, one for each switch. all switches have a common connection to VDD. the speaker is connected to an S8550 PNP transistor \EBC/ configured as a switch, with the emitter to VDD and the collector to the speaker, which is connected to ground with the other terminal. now, how to use an 18f1220 for this..

    23 <= 25 = 5x5
    23 <= 24 = 6x4

so, 10 pins is the minimum, with 2 possibilities to connect them. it is necessary to cut some PCB tracks to connect the switches to output ports. software is easier if reading is 4 bits (a nibble), so 4x6, with 6 scanning phases, is the preferred configuration. PORTB has weak pullups. switch resistance is about 150 Ohms. choose pins by elimination.

    RB1 = TX/CK
    RB4 = RX/DT

    76543210
    ...x..x.

hm.. this already messes up the nice bit alignment :) however, 0765 is possible, which gives a nibble after a single rotate instruction, so we choose the following read ports with pullup resistors, giving up the associated functionality:

    RB5 = KBI1       white
    RB6 = P1C/KBI2   yellow
    RB7 = P1D/KBI3   orange
    RB0 = AN4/IN0    red

then, 6 driver ports are needed. RA4 is no good. candidates:

    RA2 = AN2/VREF-  white
    RA3 = AN3/VREF+  yellow
    RA1 = AN1/AN1    orange
    RA6 = CLKO       red
    RA7 = CLKI       purple
    RB2 = P1B/INT2   blue

leaving available 1 pwm output that can be used to drive the speaker:

    RB3 = P1A/CCP1

and 2 analog inputs for maybe 2 controller knobs:

    RA0 = AN0
    RA1 = AN1

Entry: ICD2 compatible monitor
Date: Fri Sep 29 05:38:41 CEST 2006

instead of having the forth monitor run over the serial port, why not use the PGD/PGC pins as a slave or master serial port? this frees up the real serial port for other tasks, and the interface protocol is ad hoc anyway.. it would require a single interface board with a pic that does host serial/usb and slaves to the target.

Entry: connectors!
Date: Fri Sep 29 07:11:08 CEST 2006

before i go into the previous, i probably need an intermediate solution, since the only connectors i can easily make myself are 5x2 ribbon cables. let's see:

    VDD VSS VCC PGD PGC
    PGM TX  RX  xxx xxx

if possible, make it compatible with the FTDI TTL serial cable. anyways, i'm sticking to just 2 connectors. but the moral of the story is: if a host connection is permanent, use the serial port. if it is debug only, use an ICD2 connector and create some code to use the PGD/PGC pins for serial debug comm.
since this involves an intermediate that translates to host serial/usb, i can't do that yet..

Entry: piano keyboard
Date: Sun Oct 1 09:58:13 CEST 2006

the piano keyboard seems to work fine.. the hardware itself is really crappy, and the wires are a bit in the way, but it works. now i just need to finish the synth itself, which brings me to the next:

Entry: interrupts + standalone boot
Date: Sun Oct 1 09:58:54 CEST 2006

i need a decent, safe, standard way of patching the boot record for interrupt routines. in addition to that, i also need a way to make an application boot standalone.

Entry: wanted features and changes
Date: Sun Oct 1 13:49:26 CEST 2006

- project macros = union of system / project macros. just like 'console' words and constants.
- automatic tethered / standalone mode for serial operation
- safe interrupt setting (not messing up the boot loader)
- upload of binary files, like font tables and audio samples, without going through source code.

the first 2 will break compatibility, but i guess the whole project is not really ready for stable config files, if ever :) for the monitor detection, i'm thinking about sensing a break condition or so.

Entry: cleanup
Date: Mon Oct 2 09:41:48 CEST 2006

- bulk of macros now reside in (badnop macros)
- no separate (18f) namespace: just dump in (badnop)
- command completion fixes: wait + forth words
- upload binary data
- isr install

Entry: piksel sheepsint gig
Date: Mon Oct 2 18:17:43 CEST 2006

target:
- piano
- 3 knob 16/ noise box

Entry: old sheepsint
Date: Mon Oct 2 18:52:25 CEST 2006

The clock/power/data thing is actually a lot easier to do using existing codes: DC-less codes, i.e. manchester codes, PSK, FSK, etc... This allows easy transport over audio channels + easy regeneration. Looks like differential manchester code is interesting, especially when reserving the possibility to control over audio channels.

Entry: sheepsint controller
Date: Tue Oct 3 09:40:17 CEST 2006

i'm switching to 3 x 16bit timers for sound generation. it would be a good idea to switch to a 16bit forth to control these thingies. the thing i got stuck on last time was the multiplier. it's not so hard, just need to be a bit carryful.

    (a x + b)(c x + d) = ac x^2 + (ad + bc) x^1 + bd x^0
                            x_32     x_321        x_10

things to realize:

- x^0 doesn't carry (max FE01)
- x^1 carries into the third byte (max 1FC02)
- x^2 doesn't carry

so the easiest thing to do is to compute x^0 and x^2 independently, and then compute the middle x^1 and have it carry over into the most significant 3 bytes. of course, there are a lot of people that have done this before, so i wonder if i can't write it as a serial update equation, instead of a parallel one. can't find much, and i can't see much myself either, so i guess i'll do it as described above. turns out the routine in the 18f datasheet is the same. good enough. now, in the synth i'm in need of multiplication only for some exponential/sinusoidal period modulation, so i don't need to solve the general problem -- that's for purrr some day. this means basically 1D and 2D discrete linear systems. poles are close to 1, so approximation is possible:

    1D:  x <- (1+d) x

    2D:  | x |      | 1-d^2   -d   |  | x |
         | y |  <-  |  d     1-d^2 |  | y |

Entry: equally tempered music
Date: Tue Oct 3 18:02:26 CEST 2006

from don lancaster's "The Saga of the Dripping Stalactite", an account of discrete optimization (i.e. magic sinewaves).
an 8bit optimum for an equally tempered scale is:

    116 123 130 138 146 155 164 174 184 195 207 219 232

Entry: binary time constant
Date: Tue Oct 3 20:49:47 CEST 2006

by Roman Black - December 2001 http://centauri.ezy.net.au/~fastvid/picsound.htm the page was down; i got it from the google cache. this idea is not new of course, it's classic sigma/delta A/D conversion. the philips super audio CD if you want. the interesting thing about this particular approach is the encoder: by using a 1 pole filter in the feedback loop which has a time constant that's easily implemented using shift + add, a "pic friendly" encoder can be built. since the 18f has a multiplier, this is probably not even necessary. however, this circuit could be interesting for using the pic as a digital delay in a digitally controlled analog circuit. from "Frequency Detection on an Ubicom SX Microcontroller" -- Chris Fogelklou: a s/d converter is easy enough to build as

                           R                R
    comparator out 0---/\/\/\/\---X---/\/\/\/\---0 low Z analog in
                                  |
    comparator in  0--------------X
                                  |
                                 === C
                                  |
                                  V

the interesting part here is that the 'comparator in' can be connected to a comparator, or an A/D to get more bits, but also to a digital input! this will behave as a comparator on half the supply voltage. the only problem being that near this threshold, substantial current is drawn.

Entry: synthstuff
Date: Tue Oct 3 19:59:59 CEST 2006

- xmod / reso
- sample playback
- wavetable
- lfsr noise

LFO modulation:
- 16bit sine wave / exponential

optional:
- ordinary 8 bit pwm: sawtooth/sine
- geigerclicks
- one-shot vs. transition: audible difference?

ok.. looks like it's time to get started. the time hierarchy is essentially the same, just a little more complicated due to the presence of the 3 timer state machine. i only have 2 algorithms:

* decoupled xor : 0 1 2 xor xor
* fake reso     : 0 1 2 xor and

the main deal is in the resynchronisation:

* decoupled
* 0 -> 1/2
* 0 -> 1 -> 2

now i know why i've been procrastinating. i didn't make up my mind yet about the ultimate design.. there's too much freedom and too many silly problems due to the 16bit numbers used. it's strange how easily you take atomicity for granted. working with 2x8 bit numbers, a lot more 'locks' are necessary. anyways, it can probably be limited to locking the store to the period and phase variables. sheepsynth used a 'bus system' where the algorithm determined how the outputs are combined. this can be reused.

Entry: timer update
Date: Thu Oct 5 08:50:47 CEST 2006

periods. it's probably easier and more precise if the timers just get added to, instead of reset, in the course of an isr. need to check the sheets on that. for the synth, a little bit of phase jitter/delay is probably not so bad. however, frequency drift is not a good thing, so i'm looking for a way to update a 16 bit timer register. for 8 bit it's simple: just add/subtract a value. for 16bit, an explicit read-modify-write is necessary. the counters need to be in 16bit read mode to give predictable behaviour. however, there's a hack. always a hack :) when a timer overflows from FFFF -> 0000, an interrupt is generated. in this interrupt we respond by resetting the timer and doing some other stuff. the reason for adding to the timer instead of setting it is that the exact point is not known, or is tedious to obtain by counting instructions every time something changes. however, it is a lot easier to guarantee a response before the low byte overflows for the first time. hold on.
it is possible to directly add to a timer, since the only thing you need to know is whether a carry occurred or not. so, adding to the low byte produces a carry, which is then used to add to the high byte. however, this requires a directly writable (read-modify-write) high byte for the timer. it turns out the latching behaviour of the 16bit timers is programmable for TMR1 and TMR3, but not for TMR0. so i guess the previous hack will be better: construct the timer value on the stack, then perform a 16bit write. this is a constant-time operation which can be compensated for. so, eventually it's something like this:

    : reset0
      TMR0L @     ( read low timer value. this also latches the high byte. )
      posc0l @+   ( advance timer lo, write carry bit )
      posc0h @    ( read timer high increment on stack )
      TMR0H ++!   ( update timer high latch )
      12 +        ( account for duration of this routine )
      0 TMR0H ++! ( update timer high latch again )
      TMR0L ! ;   ( write out timer lo, which also writes the latched high byte. )

butt ugly.. wonder if there's no more elegant way. haha! there is. since it's possible to assume TMR0H is zero, this is possible:

    : reset0
      TMR0L @     ( read low timer value. this also latches the high byte. )
      12 +        ( account for duration of this routine )
      TMR0H rot 0x80

i was thinking about using dither to give a bit more resolution: instead of using 0x80 for rounding, a pseudo random number can be used. for example, if aa x zz == 2, then 2 out of 0x100 times it will carry over, resulting in motion. this is a nice occasion to figure out the difference between PWM, dither, "nyquist PWM", S-D modulation and the Bresenham algorithm.

Entry: weird bug
Date: Tue Oct 10 10:52:43 CEST 2006

something which happened in the previous workshop, and now i have it again: when an 18LF1320 boots, it sends a 0x80 on the serial port. i thought this was a bad connection or a bad chip, but it has to be something else.. if it's received by the host system, it is a valid byte. such a byte physically looks like:

    idle    start  data      stop  idle
    1111111 0      00000001  1     1111111111111111

so it is nothing more than a pull to ground of the serial output for a certain period of time, possibly the time between clearing the TRIS bits and enabling the serial port. everything points in this direction, since the chip runs at a low frequency during boot. edit: indeed. setting the port to all ones before clearing TRIS fixes the problem.

Entry: pic dev station
Date: Tue Oct 10 11:10:25 CEST 2006

so. everybody wants to design a programmer. if i were to design a programmer, what would it look like?

- use the PGD/PGC pins for monitor comm: standard ICD2 header
- usb/serial interface to host
- multiple tethered targets
- charge pump, so smaller pics can be programmed too
- possibly ICD2 compatible (pure software problem)

Entry: serial port based app/mon select
Date: Tue Oct 10 12:54:07 CEST 2006

so.. the zero byte is set as nop. the monitor should boot up in case there is no byte within a couple of msec. why is this so hard? because i want to have some kind of failsafe. it's risky to overwrite the reset vector, so i need an approach that will

- launch the monitor when there's input data on the serial port
- launch the monitor when there's no application code (bulk nops)
- launch the application in other cases

the first dash is important (not done in the application) so that in case the application fails to start the monitor, the monitor can still be run. in short, the real reset vector should not be touched: the monitor is considered stable code, while the app might fail.
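The decision itself, as a hedged C sketch (the helper names are hypothetical):

    #include <stdbool.h>

    extern bool serial_byte_within_ms(int ms);  /* poll RX for a short window */
    extern bool app_code_blank(void);           /* application area still bulk nops? */
    extern void run_monitor(void), run_application(void);

    void boot_select(void) {
        if (serial_byte_within_ms(5))    /* failsafe: host is talking -> monitor */
            run_monitor();
        else if (app_code_blank())       /* nothing flashed -> monitor */
            run_monitor();
        else
            run_application();           /* otherwise start the app */
    }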
ok, it seems to work most of the time, but i can't pinpoint the errors. sometimes i get weird results, if for some reason the target starts sending bogus data due to some faulty init. this needs to be tweaked a bit.

Entry: 2^n ratio software interrupts
Date: Tue Oct 10 18:34:18 CEST 2006

if i run it at 8192 kHz and only use power-of-2 ratios, things get simpler. software interrupts can be generated by xoring one of the high bytes of the counter with a previous value.

Entry: piano
Date: Wed Oct 11 22:29:35 CEST 2006

what about using the left 5 keys and right 6 keys as modifiers, and the middle octave as a 2-tone poly synth?

Entry: poke
Date: Sat Oct 14 21:14:20 CEST 2006

- brood symbol lookup accelerator
- mole in pd
- writing cat in poke is insane -> switch to plt

Entry: optimization
Date: Sun Oct 15 12:37:11 CEST 2006

- write parse/unparse in C: the syntax is simple and load/save really needs to be fast. maybe plt scheme solves this problem? guile's strings are really slow.
- functional data structures: dict/dictroot/stack avoid consing. (need to profile, see if it really matters)

Entry: shallow binding
Date: Tue Oct 17 22:41:13 CEST 2006

i just read "Shallow Binding Makes Functional Arrays Fast (1991)" by Henry G. Baker. the only things i really use are

- the convenience of variable (imperative) programming semantics when there are too many variables to effectively use a purely functional concatenative approach.
- the ability to completely undo all side effects of a routine (exceptions + speculative execution / simple nondeterminism)
- map and accumulate (in the form of for-each)

what i really need is a state machine with a fancy undo. i wonder whether checkpoints aren't a lot simpler. somehow the first item is really ugly, but still terribly convenient. in the end, the program (incremental compiler) really deals with current state and mappings that bring it to the next state. the whole compiler application is really a database, but the (var) dir needs to be cleaned up. checkpoints only occur at a 'catch' boundary, meaning an assignment that's not enclosed cannot be undone. the problem is that every command is already enclosed in the interpreter's catch loop.

ACID: see http://en.wikipedia.org/wiki/ACID in databases, ACID stands for Atomicity, Consistency, Isolation, and Durability.

Entry: big jumps
Date: Tue Oct 17 23:44:06 CEST 2006

1. due to the incremental nature, big forward jumps are really scarce: mutual recursion is best kept as local as possible.
2. backward jumps can be resolved in the first pass.
3. big forward jumps are vectors, which can be handled separately.

Entry: cat optimization
Date: Wed Oct 18 00:07:16 CEST 2006

thinking about it, there can only be 2 places where cat is inefficient, apart from the underlying inefficiencies of guile:

1. name lookup
2. functional data structure maintenance

the first one can easily be cached (bound) for constant code. this already gives a significant speedup. the second one requires a better treatment of mutation and the dictionary. implementing the (var) and other often-changing variables as a vector which is part of the machine struct and is checkpointed on 'catch' might solve a lot of problems.

Entry: mzscheme tough nuts
Date: Sun Oct 22 11:40:22 EDT 2006

exceptions: things are different in mzscheme, and i'm really guile-centric. differences:
* raise takes a single object / throw takes a vararg list
* predicates instead of key objects
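a minimal compatibility sketch of what such a unification could look like, assuming all we need from guile's interface is (throw key args ...) and (catch key thunk handler):

(define (throw key . args)
  (raise (cons key args)))            ; pack guile-style varargs into one object

(define (catch key thunk handler)
  (with-handlers
      (((lambda (exn) (and (pair? exn) (eq? (car exn) key)))   ; predicate
        (lambda (exn) (apply handler exn))))                   ; unpack again
    (thunk)))

; (catch 'oops (lambda () (throw 'oops 1 2)) (lambda (k a b) (list k a b)))
; => (oops 1 2)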
Entry: tv sync generation
Date: Sun Nov 5 00:04:19 EST 2006

something i didn't think of: sync pulses can be generated using PWM. probably only makes sense for fast MCUs though.

Entry: conceptual problems
Date: Tue Dec 12 15:12:31 CET 2006

alright.. i've had some rest, working on PF. some reflections: the problem is state. the whole idea is to have incremental updates of state on the microcontroller. the functional/stateless approach of CAT is just to compute the update functions. currently, there is probably too much time wasted in keeping persistent state. the solution here is to clearly separate the functional and incremental parts. one of the core problems here is 'local context', which should really be solved with lexical variables instead of dynamic scoping. so, my questions are:

* is it possible to create a proper lexical hierarchy in a forth-based concatenative model, or do we need a different approach?
* what about viewing the problem as a database problem? have a coarser granularity in terms of transactions that either succeed or fail.

so, the buzzwords:
* transactions that can be rolled back (darcs?)
* core = functional, with lexical scoping: have a true scheme-like environment.

Entry: TV/VGA generation
Date: Tue Dec 12 21:10:02 CET 2006

thinking about how to simplify this video stuff to build a character terminal from a pic chip in a way that leaves some room for other work during drawing. what i didn't try yet is to use a simpler chip, and use its general purpose serial port for blitting out stuff. what i use now is the SPI port, which has a don't-care in the output. the generic serial port should really be able to draw just one line. however, using that one gives up the hardware serial port, so it would be cool to be able to receive serial or other bytes manually.

blitting out a bitmap image from ram is faster than an indirect text character image: 2 cycles versus 6. however, there's not enough ram to actually display an entire screen worth of characters. on the other hand, using two chips, this problem is not too hard to solve: have one chip do the display, and have it poll data from the other chip synchronously during sending.

serial/USB/... <-> [ I/O ] --> [ DISPLAY ] -> TV/VGA

maybe it's overkill.. maybe a simple high-speed shift register would do it better. the real problem here is the overhead. at 6/8 cycles per pixel, the overhead is too large to do anything useful that requires strict timing. as long as it's hardware, it's ok. maybe the comm should be SPI? use the SPI to pull in data from the I/O chip, and feed the VGA with the general purpose EUSART shift register. decreasing the clock by 2 will already give 6/16 instead of 6/8. maybe that's a better approach: just reduce resolution.

so, if i'm using two chips, why not use something that can dump out serial data as the main blitter, like a serial memory, and have the character set in memory? problem is: they are dumb. what i need is on-the-fly generation of the bitstreams, where one byte in gives say 8 to 16 bits out (character width), or, when the scanning is made horizontal, this can be increased. this is an interesting problem by itself. output wavelets..

ok. shift registers. that's stupid. they need to be loaded too, and require lots of leads.. the only reason to have them is to clock them at a higher frequency. however, generating the write strobe in this case is problematic.. so forget about it.
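going back to the overhead numbers above, a trivial scheme sanity check of the fraction of cpu time the blit loop eats, given a blit cost and a cycle budget per pixel:

(define (blit-load cost budget)
  (exact->inexact (/ cost budget)))

; (blit-load 6 8)  => .75    ; full pixel clock: 75% of the cpu is gone
; (blit-load 6 16) => .375   ; pixel clock / 2: breathing room, half the resolution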
Entry: mesh crickets
Date: Tue Dec 12 21:46:53 CET 2006

was thinking about this: what about

- pic chip
- single speaker
- oscillator
- battery

and create a mesh network of those? using simple but robust BPSK this should really work. i'm not sure about the linear distortion though, since there's not enough horsepower to do adaptive filtering.

* reflections could be a problem, have to see. keeping packets very short might solve this problem.
* poles/zeros: probably no problem for higher frequencies. using narrow bandwidth channels reduces the risk of linear distortion. this would at least sound nice. maybe different channels can already solve most of the linear distortion problems.

gert says:
* it already exists: PSK31, but try something simpler first
* crosstalk is going to be a problem
* hidden node effect
* ignore distortion, you're doing low bit rate
* use simple error correction (reed-solomon)
* use some frequency hopping to avoid problems (FHSS)

let's see if i understand. BPSK is suppressed carrier (dual sideband) AM. the widest signal is 01010101.. represented as a minimal bandwidth AM signal this is sin(Wc t) sin(Wb/2 t), where Wb is the bit frequency. due to the two sidebands, the bandwidth is Wb.

PSK31
- uses cosine shaping for transmit
- receiver shaping: sort of a 'matched shaper' wrt the transmit

the idea is: the transmit shaper is kept simple and standard (a shaper instead of a filter) so the receiver can be improved upon later.

Entry: FSK/ASK
Date: Sat Dec 16 23:30:11 CET 2006

PSK might be nice, but requires coherent demodulation (carrier phase recovery). FSK and ASK require only power recovery, and are a lot easier to implement. ASK: the output is the power of the output of a bandpass filter, or the output power of a shifted version of the signal. FSK: 2 such filters to detect silence / 0 / 1 / (noise = both). since i don't have a lot of control over amplitude, FSK is probably best. the limiting part is going to be the mixer + decimating filter, since that's the only component that runs at audio rate. the filter could be a very crude flat average. this gives an inner loop like this (quadrature receiver with product integrator):

x = sample()
x_i = x * I
x_q = x * Q
update(I,Q)    (-> use tables)
y_i += x_i
y_q += x_q

the only thing that's problematic here is the multiplication: 8x8 signed. orthogonality: if all signalling functions are harmonics of 1/T, then the integral over a period of T will cancel out all frequencies except the one we're interested in. hmm.. thinking error: the 2 carriers have to be mixed down to DC separately.

the above is too complicated: talking about it a bit with axel and gert gives the following:
- non-synchronous FSK
- goertzel algo (2nd order filter with 1 multiplication per sample)
- dtmf frequency relations to reduce the effect of nonlinearities
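the goertzel recurrence, sketched in scheme on the host just to check the math; the pic version would be fixed point with the 8x8 signed multiply, and 'pi' is defined locally since it may not be bound:

(define pi (* 4 (atan 1)))

; power of one detection bin at frequency f, sample rate fs.
; inner loop: s = x + c*s1 - s2, i.e. one multiplication per sample.
(define (goertzel-power samples f fs)
  (let ((c (* 2 (cos (* 2 pi (/ f fs))))))
    (let loop ((xs samples) (s1 0.0) (s2 0.0))
      (if (null? xs)
          (+ (* s1 s1) (* s2 s2) (- (* c s1 s2)))
          (loop (cdr xs)
                (+ (car xs) (* c s1) (- s2))   ; new s
                s1)))))                        ; old s1 becomes s2

; fsk bit decision: compare (goertzel-power block f0 fs)
; against (goertzel-power block f1 fs).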
Entry: plt porting
Date: Thu Dec 21 20:13:15 CET 2006

let's see.. if i recall, the problem atm is the exception handlers. what's needed is a way to unify guile's and plt's system error handling, and possibly the language's catch/throw. the things i run into first:
- error: ok, fixed: object->string
- output flushing -> set by default
- need a 'system' function.. OK

Entry: i/o files
Date: Fri Dec 22 16:59:47 CET 2006

looks like plt does not use I/O files. each direction has a separate port. however, it does define 'open-input-output-file', which gives back 2 ports associated to the same device. this function returns multiple values. as a recap, this works like this:

(values 1 2) ; returns multiple values

(call-with-values
  (lambda () (values 1 2))   ; thunk that returns multiple values
  (lambda (x y) (list x y))) ; function that processes the values

so, to deal with it one could pass 'list' or 'cons' as the consumer:

(call-with-values producer list) ; return multiple values as a list
(call-with-values producer cons) ; return 2 values as a pair

next problem: string-match. apparently, i just ignored this:

(define (string-match a b) 'failed)

looks like i need the perl regexps here. OK

next problem: select(). looks like mzscheme is trying very hard to be portable.. and apparently select() doesn't go there. so what i need is something that's more posix-like, the way guile does it. there should be at least some module i can use. it looks like mzscheme uses select only in its thread implementation. so there is a higher level api to do what i would do with select. in brood, select() is used for polling for serial port input. maybe it's best to focus on abstracting this polling behaviour, i.e. 'busy looping'. i don't really need tasks yet.
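a sketch of that abstraction: wrap the 'try, then back off' pattern once, so the rest of the code never mentions select or threads. poll is any thunk returning #f for 'nothing yet':

(define (busy-loop poll)
  (let again ()
    (or (poll)
        (begin (sleep 0.01)   ; plt's sleep takes seconds and yields
               (again)))))

; e.g. reading one byte from the serial port's input side:
; (busy-loop (lambda () (and (char-ready? port) (read-char port))))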
Entry: kerstboom (christmas tree)
Date: Fri Dec 22 18:30:41 CET 2006

so i got my 452 board, a led connected to two naked copper wires, and two balls of aluminum foil. the idea is to make a 4 wire maxcon network, which is topologically a tetrahedron: 4 alu balls and 6 edges, with 2 leds per edge. this is connected to the micro through a couple of resistors. next: port selection. PORTD looks most interesting. soldering an 8 pin DIP socket to PORTD. got 100 ohm resistors connected to the leds now. this gives about 10 milliamps through a led, assuming the led drops between 2.5 and 3 volts, and the power supply is 5V.

Entry: sheepsint
Date: Wed Jan 3 02:59:48 CET 2007

chatting with aymeric about the future of sheepsint. i need to figure out where to put the stress. the pf interpreter cleanup is probably best forgotten. turbulent time thing.. the board is important. USB not so important. maybe for later gadgets, but probably boards and how to design them is more important. from the software side, i need to really fix the mzscheme port, and then do some benchmarking. so 2 sides of the coin:

HARDWARE
- build a board from scratch
- smd soldering (optional, board could be dip + perf)

SOFTWARE
- mzscheme port
- profiling (slow speed is really annoying, it should be possible to get 10x)

!! check aymeric's feature list in the irc logs

Entry: profiling
Date: Fri Jan 5 17:19:50 CET 2007

ok. time to make it a bit faster. i need to know where the thing spends most of its time. i would guess it's in binding and searching, but it could also be data structure updates. something i can't put into real words yet: i want to eliminate global state variables that have nothing to do with the state of the microcontroller. in other words, i'd like the compiler to be purely functional, with state updates only occurring when a compilation is done. in short, all the stuff in (var) should be replaced by some other construct, preferably lexical scoping.

Entry: brood internals
Date: Wed Jan 10 17:22:06 CET 2007

we're almost there. i'm left with one question: how do i eliminate dynamic variables, and how do i implement a forth with syntactic scope? brood is nice and all, but the use of dynamic variables is sloppy. i should check whether it's possible to write a concatenative language as a pure syntactic extension of a (lazy) scheme, so that i can bootstrap better on scheme. how do i get there?

A* what about lambda?
B* understand what the essence is of transactional updates (the microcontroller has state, there's no way around that) in a functional language.
C* understand what exactly is needed to make scheme lazy.
D* to what extent could i switch to haskell?

now, what do i mean by lexical scope.. lambda. i believe retroforth has something with local functions, somewhere.. i have the impression that the danger lies somewhat deeper.. in the end this is simply about explicitly named local variables, and forth/stack languages are combinatorial instead of random-access. but.. what i'm doing is introducing random access anyway. so it's probably best to do it 'right' then, in the form of static scope.. proper lambda, that is.

http://lambda-the-ultimate.org/classic/message11828.html :

"There is nothing preventing a `stack-based language' (this is a meaningless term, BTW, like `interpreted language') from implementing call-by-name, if that's what you mean."

"And I don't think that their approach `is not expressed in the same way' in languages like ml, scheme and haskell', since, in all those languages, one can define combinators. It's just that no one wants to do it because taking advantage of lexical scope makes things much easier."

point taken.. the reason it does work for me is that i think (and see) that there is a 'natural order' in the use of arguments, so that in most cases naming names plays the role of 'annoying overhead' instead of an extra feature that makes programming easier. there is a fairly important remark to make about 'dup': it makes fanout explicit in a fairly elegant way: where does my data go? with named variables that's not always clear.

"Speaking of Forth, it's interesting to note that Curry's combinators B, C, K, and W, which form a basis for the lambda calculus, are essentially the Forth words execute, swap, drop, and dup. B, C, and K are present in the Haskell Prelude (as $, flip, and const)."

more interesting.. need to look into that. anyway.. i'm confused again.

combinatorial:
  pro = composition is easy
  con = 'routing' is problematic
symbolic:
  con = composition can be cumbersome (lots of parentheses)
  pro = 'routing' is trivial

in the end this is really about TASTE. which is better? well. beer or wine? this is a nice starting point for the foam paper though. but i don't think there's much room for discussion about the fact that lexical scope is better than dynamic scope.

i'm struggling a bit with BROOD. to simplify enormously, i want a syntax like

(a b c) (bla1 bla2 bla3) lambda

to represent

(lambda (a b c . stack)
  (apply bla3 (apply bla2 (apply bla1 stack))))

this can probably be plugged in directly.. i just have to add lambda. so much for *A.

interlude. since this whole experiment mainly serves to give me a feel for functional programming, it seems appropriate to go on a bit about types, or at least to come back to them later.

now. transactions. in fact, compiling a chunk of code is nothing more than computing a 'diff' for the microcontroller state. the starting point is incremental compilation, so i'd better make that real. the problem right now is that this isn't explicitly built in anywhere. i have to come back to this. scheme and lazy eval are not really on the agenda for now. and haskell is for later.
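a quick sanity check of that expansion in plain scheme, under the convention (used in the expansion above) that a word takes the stack as its argument list and returns the new stack; the words here are toys, not the real CAT primitives:

(define (add x y . stack) (cons (+ x y) stack))    ; ( y x -- y+x )
(define (dup x . stack)   (cons x (cons x stack))) ; ( x -- x x )

; '(a b) (a b add) lambda' could then expand to:
(define f
  (lambda (a b . stack)
    (apply add (cons b (cons a stack)))))  ; push a, push b, then run add

; (f 2 3) => (5)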
Entry: lambda
Date: Sat Jan 13 15:25:37 CET 2007

the car trip taught me the following: what i'm doing is functional (the way ! and @ work), but it is not referentially transparent. those two are not the same. i can't say my code has 'no side effects'. i can, however, make sure the side effects can be undone. all in all this is a strange situation. now i wondered whether it isn't possible to keep it that way, and create a 'real' lambda on top of it:

* a function creates a new env (dict) by catting its dict with the global dict received from the caller
* when this function calls other functions, only the parent dict is passed on. the local dict is only passed to a new lambda struct.

that's a big difference. maybe not well explained..

Entry: practical lambda
Date: Sat Jan 20 11:48:09 GMT 2007

* reading the paper by Lynas (hmm.. not what i needed. too clumsy)
* the expression problem?

maybe i should first look at how to make the badnop compiler more stateless (for referential transparency). this means: eliminate (var). this goes a bit deeper than i thought:

- ihex -> could possibly be replaced by lambda
- read / write -> 'user interface' state
- asm queue -> dynamically bound everywhere.. (like "print")

starting at the beginning: the ihex variables actually have a fairly local scope. conclusion: distinguish these

- 'real' state variables
  -> target state
  -> user interface state (current inspection address, asm buffer)
- 'dirty' state variables
  -> ihex printing state

if state variables never affect the user interface state, they should be part of lambda expressions (local variables).

Entry: patch / transactions
Date: Sat Jan 20 12:37:37 GMT 2007

is it possible to get to a point where the assembly of a line of forth code translates into a single transaction (patch) object? so

  env, code -> env

becomes

  env, code -> env, patch

in the evaluation of this function, some local dynamic scope might be introduced, but it is cleaned up after the compilation. rationale: the whole idea of BROOD is to have an incremental compiler. the proper way to solve this in a FUNCTIONAL language is to compute the increment of the state instead of the new state. what i'm doing is 'functional database programming'. i should really have a look at the 'darcs algebra of patches'.

reading: "Database States in Lazy Functional Programming Languages: Imperative Update and Lazy Retrieval", by Yoshihiko Ichikawa.
- "state transformer" approach for a referentially transparent interface

the rest is a bit too hard to read atm.. no patience :) i probably need to read a bit more on the 'state-transformer' approach in haskell.
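a toy version of that signature, assuming env is an assoc list that carries the assembly pointer 'here, code has already been reduced to a list of target cells, and a patch is just a replayable list of (address . cell) writes:

(define (compile-patch env cells)
  (let loop ((cs cells)
             (addr (cdr (assq 'here env)))
             (patch '()))
    (if (null? cs)
        (values (cons (cons 'here addr) env)   ; functional update: shadow old 'here
                (reverse patch))
        (loop (cdr cs) (+ addr 1)
              (cons (cons addr (car cs)) patch)))))

; commit = send the patch to the target and keep the new env;
; abort  = drop both, and nothing ever happened.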
Entry: apply
Date: Sat Jan 20 13:54:13 GMT 2007

just introduced the word 'apply': it works like 'run' but ignores all changes to the environment: this is to obtain referential transparency manually. (apply for referential transparency). this effectively introduces dynamic variables with limited scope. the lambda solution would be a lot cleaner, but this gets us on the way to introducing more transparency. looking at ihex now, there's still the ad-hoc symbol prefix namespace i'm using, which would be a lot cleaner if there was a real local environment. maybe i should give changing ihex to a local function approach a try.

ok, this seems to work.. there are a couple of performance issues though: when i'm using '!', which happens implicitly in ':', the whole environment gets searched, which will replace occurrences instead of just hiding them. so i need a different word, i.e. 'let', which will just shield without replacing, to avoid costly substitutions and searches. this might be another gentle ramp-up to proper lambda. (it's still dynamic though, since this env propagates..) it seems to me that 'lambda' requires some core changes to the current CAT vm, since local names should not be propagated to functions called from the body. i think i'm missing something.

Entry: 18f / 30f pinouts
Date: Sat Jan 20 16:35:54 GMT 2007

a bit disappointing, they are completely different for the 18 pin versions.. this means a sheepsint board has to be redesigned for the dspic. looks like i should call them pic18 and pic30 though.

Entry: lexical scope
Date: Sun Jan 21 12:15:19 GMT 2007

after trying to implement a dynamic scoping 'with' and having it clash with my namespace searching mess, i think it's time to add proper static scoping. there is only one huge problem: i don't have the lisp-like syntactical thing in cat: code is data until it's interpreted.. i do wonder how the substitution problem is solved in scheme though. basically this is the problem:

((a 1) (b 2)) (a b +) lambda

this will return a function on the stack which accepts 2 arguments. the code of this function will look something like:

( + )

Entry: scheme piggy back
Date: Sun Jan 21 13:17:01 GMT 2007

it might be interesting to write a concatenative language as a macro extension of scheme, using a module namespace, and re-using the scheme scoping rules. in some ways this is back to basics.

* all functions return a stack

(a b c) -> (apply c (apply b (apply a)))

(define _lambda (body parameters . stack))

thinking about it a bit, this absence of true syntax makes it really hard to actually do something. as an exercise, i'm trying to write a CAT as a pure syntactic extension of scheme:

(define-for-syntax (code? thing)
  (symbol? thing))

(define-syntax compile
  (lambda (stx)
    (syntax-case stx ()
      ((_ thing stack)
       (if (code? (syntax-object->datum (syntax thing)))
           (syntax (thing stack))
           (syntax (cons 'thing stack)))))))

(define-syntax cat
  (syntax-rules ()
    ((_ stack) stack)
    ((_ stack first rest ...) (cat (compile first stack) rest ...))))

still, how do you introduce lambda? i can do this as a prefix form like:

lambda p (p p p)

this works reasonably well.. it does seem that a 'parsing word' is necessary, since this is a syntactic operation which happens at compile time. this should be easily extendable to

lambda (p q) (p q +)

it is, only i've got the order wrong.

Entry: remarks about lexical scope
Date: Mon Jan 22 12:52:21 GMT 2007

introducing lambda necessarily introduces block structure. scoping is about introducing block-dependent names. as long as the interpreter is not aware of 'activation records' or something similar, lexical scoping does not make sense. in CAT there is no such thing. it's easy enough to add, but it requires modification of the interpreter itself. it's probably easiest to change the namespace structure in such a way that it will support proper closures = (code . env), and to make sure each word is bound to its own environment. this is quite a deep change, but the only one that will solve the problem. currently in CAT, the environment is a run-time thing (dynamic). what i need for proper lambda is proper closures, where the environment is a lexical thing: whenever a lambda expression is evaluated, an environment is frozen and tied to a function. in order to have the best of both worlds, i need to keep the current (persistent) dynamic environment (which is a property of the interpreter), and a static environment (which is a property of a function).
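the smallest version of that change i can think of, as a sketch with invented helper names: a word becomes a (code . env) pair, and name resolution tries the frozen static environment before the interpreter's dynamic one:

(define (make-word code static-env) (cons code static-env))
(define (word-code w) (car w))
(define (word-env  w) (cdr w))

(define (resolve name static dynamic)
  (cond ((assq name static)  => cdr)   ; lexical: frozen at lambda time
        ((assq name dynamic) => cdr)   ; fallback: interpreter environment
        (else (error "undefined word:" name))))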
Entry: static AND dynamic
Date: Fri Jan 26 19:27:49 GMT 2007

looks like this:
- static binding: environment tied to a function (lambda)
- dynamic binding: environment tied to the interpreter

the latter is as if it were tied to a continuation. also, this is closely related to the monad thing in haskell, which i understand only very remotely as 'passing along hidden variables without violating the pure functional paradigm.'

now a note on efficiency.
- once a bunch of symbols is associated with a (static) lexical environment, the code can be compiled. since names will never refer to other objects, the indirect references (symbol names) can be removed.
- all symbols that are not in the lexical environment are necessarily in the dynamic environment. they cannot be compiled and have to be resolved at run time.

if i implement proper lexical scope and use it for the most important words, the speedup will be significant. i think i found my main optimization point: both for speed and sanity :) but somehow i don't really understand how scheme does it though.. maybe this is just because define is actually 'set'?

Welcome to MzScheme version 352, Copyright (c) 2004-2006 PLT Scheme Inc.
> (define (AAA x) (+ x 1))
> (AAA 1)
2
> (define (BBB x) (+ (AAA x) 10))
> (BBB 0)
11
> (define (AAA x) (+ x 100))
> (BBB 0)
110
>

so, i guess i can't escape monads now. because it really looks like that's what i'm trying to do, but then more general.

Entry: R.W. Floyd's Turing Award Lecture
Date: Sat Jan 27 12:17:41 GMT 2007

http://www.ias.ac.in/resonance/May2005/pdf/May2005Classics.pdf

I now see clearly and concretely the force of Minsky's 1970 Turing lecture [19], in which he argued that Lisp's uniformity of structure and power of self reference gave the programmer capabilities whose content was well worth the sacrifice of visual form. I would like to arrive at some appropriate synthesis of these approaches.

To sum up, my message to the serious programmer is: spend a part of your working day examining and refining your own methods. Even though programmers are always struggling to meet some future or past dead-line, methodological abstraction is a wise long term investment.

To the teacher of programming, even more, I say: identify the paradigms you use, as fully as you can, then teach them explicitly. They will serve your students when Fortran has replaced Latin and Sanskrit as the archetypal dead language.

To the designer of programming languages, I say: unless you can support the paradigms I use when I program, or at least support my extending your language into one that does support my programming methods, I don't need your shiny new languages; like an old car or house, the old language has limitations I have learned to live with. To persuade me of the merit of your language, you must show me how to construct programs in it. I don't want to discourage the design of new languages; I want to encourage the language designer to become a serious student of the details of the design process.
Entry: haskell and monads
Date: Sat Jan 27 12:18:28 GMT 2007

http://sigfpe.blogspot.com/2006/08/you-could-have-invented-monads-and.html

- monads have something to do with side effects. many types of container object can be viewed as monads; some tutorials find it hard to reconcile the two.
- 3 examples of a -> m b: Debuggable, Multivalued, Randomized
- We're given a function a -> m b, but we need to somehow apply this function to an object of type m a instead of one of type a.
- In each case we do so by defining a function called bind, of type (a -> m b) -> (m a -> m b), and introducing a kind of identity function unit :: a -> m a. The triple of objects (m, unit, bind) is the monad.
- You don't even have to explicitly use the bind function; you can ask Haskell to insert it into your code automatically. In Haskell the names are '>>=' for bind and 'return' for unit, and in order to overload the names >>= and return, we need to make use of type classes. The 3 patterns above are called the Writer, List and State monads in Haskell.

it looks like what i'm trying to do is something like a writer monad, but since the assembler can optimize, it becomes the more general state monad. it looks like 'do' in Haskell is actually that strange thing in concatenative languages where functional and imperative seem to collide: sequential statements as composition of functions. however, in Haskell the state that's being updated is always properly isolated, while in an imperative machine, the state is the entire machine. the pattern i'm after is something like

forthcode -> forthcode, assembly

a sort of 'shit a turd' kind of approach.

http://sigfpe.blogspot.com/2006/01/eleven-reasons-to-use-haskell-as.html

Haskell isn't very good at dealing with arrays. in most programming languages it takes time O(1) to access the ith element of an n element array. in Haskell you can get this kind of performance if you don't mind using the various painful-to-use array operators available, or you can sacrifice performance, keep the elegance, and get O(log n) performance. on the other hand, you'd be surprised just how many useful matrix algorithms don't require random access to the ith element.
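to see whether the writer shape fits, a rough scheme transliteration: a computation is a (value . output) pair, and bind chains computations while concatenating the assembly output behind their back. the names and the instruction syntax are made up:

(define (unit v) (cons v '()))       ; return: no output yet

(define (bind m f)                   ; m : (value . output)
  (let ((m2 (f (car m))))            ; f : value -> (value . output)
    (cons (car m2) (append (cdr m) (cdr m2)))))

(define (emit ins)                   ; a 'word' that only emits assembly
  (lambda (v) (cons v (list ins))))

; (bind (bind (unit 'start) (emit '(movlw 1))) (emit '(movwf x)))
; => (start (movlw 1) (movwf x))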
Entry: static scope
Date: Sun Jan 28 12:24:02 GMT 2007

how to add static scope in CAT?

* add a static environment to each function = what lambda does
* adapt the interpreter to search the static environment

or

* make lambda compile all static refs immediately

the latter is actually pretty elegant, since it doesn't require any changes to the interpreter. it probably does get tricky when deep binding occurs, since there is really no lexical structure (i'm thinking 'ifte' here):

(a b c) ((la la a) (jam jam b c) ifte) lambda

hmm..... something's not right. what about jit? whenever code is run, it is compiled using the current stack of activation records.

first: create explicit closures as (env . code)
second: find out how to add the toplevel environment to this

Entry: restate objectives
Date: Sun Jan 28 14:00:42 GMT 2007

* basis = forth VM on a microcontroller, for simplicity
* write the cross compiler in a language similar to forth
* use an FP approach for the compiler language

Entry: confused
Date: Sun Jan 28 18:09:05 GMT 2007

i'm learning new things, which comes with a lot of confusion and the desire to start over. let's see if i can recall the road again?

-> wrote PF to have explicit symbolic code + processing
-> found out that extreme late binding might be an interesting addition
-> discovered the benefits of functional programming (basically 'no dangling references')
-> came up with the idea of writing a lowlevel language with a very powerful highlevel meta language: cat/purrr, where the combinatorial language is a clone of Joy.
-> found out that having no variables at all is quite a drag, so worked around it with a 'functional objects' model where mutation is allowed (so it feels imperative) but underneath the whole previous state is preserved. i'm not sure what this is called, but it has a lot of advantages.
-> found out that the simulated imperative programming actually has some of the disadvantages of real imperative programming: it is not referentially transparent. basically, i'm using dynamic binding where i shouldn't, and shoot myself in the foot plenty.
-> found out that i really need static binding and a proper lambda. learned some more about how to implement lexical scoping.
-> found out that a fully dynamic implementation is way too slow.
-> thought that it might be interesting to start over. started to experiment with a concatenative language on top of scheme, and some ideas for writing an interpreter which uses static scope. found out that it's not really possible to change the current CAT to static scope without deep changes.
-> came back to an old idea of specifying the optimizer as a declarative language (a formula rewriter), i.e. (dup drop) -> ()
-> learned about haskell monads and started thinking about how i could use this to implement a compiler.
-> started thinking about a pure functional approach where state is really just what's on the microcontroller.
-> started thinking about eliminating cat entirely, and keeping only its spirit: the interface to the compiler should be from the perspective of the target forth, i.e. compilation is a task spawned on the host from the target console.

Entry: incremental improvement day
Date: Wed Jan 31 11:54:26 GMT 2007

home alone today, so time to get some work done. i've been ranting about rewriting CAT. this ranting, full of pathos about rewriting and starting over, seems to be an essential part of my development process. i've learned not to give in to it, but instead to fork off test projects to see if the core ideas are actually valid. it's not good to act on your impulses immediately. i've been doing that too much. there is the practical element of bugfixes. so, to crystallize, what needs to be changed?

* static scope in CAT
* a new optimizer using rewrite rules

before i can do any of these, i need to clean up the current interpreter code. so, what if i change the VM implementation to one which uses 4 registers instead of a list, after the classic SECD design? State Environment Code Dump. to make things a bit more readable, it might be wise to change all macros to define-syntax form. DONE.

ok, let's continue not slashing it. if i change dict in a way that is compatible with the current prototype, i should be able to keep most of the code intact. the main change being the following:

- the dynamic dict is always passed along
- the static dict is
  * re-instantiated on 'call'
  * augmented in 'lambda'

actually, i can just hack it in! if i use a path called 'static' which has precedence in all searching, and a binding structure like run-static which just stores defs in the (static) variable and restores it afterward, i have all i need! is this true? something i don't get... fuzzy. getting tired.

i'm thinking.. the functions themselves don't need access to the static environment, since static effects don't change when code is running, so the VM prototype can remain just like it is.. the change is in the interpreter completely. i'm going to remove the tracer to make things a bit clearer. indeed. i've isolated the appearance of a lexical binding to the point where a symbol is resolved to a (lex . code) closure pair. now i need to update the dictionary format to store code as either compiled scheme functions (primitives) or symbolic closures. ok.. this seems to work. the next part is to implement lambda in a way that makes sense... since environments are one thing, but the semantics for cat and scheme are different..
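a sketch of that 'hack it in' idea: run-static as a dynamic-wind that installs defs in the (static) slot and restores it afterward; get-static and set-static! are stand-ins for whatever the dict accessors turn out to be:

(define (run-static defs thunk)
  (let ((saved '()))
    (dynamic-wind
      (lambda ()
        (set! saved (get-static))             ; hypothetical accessor
        (set-static! (append defs saved)))    ; new defs shadow, nothing is replaced
      thunk
      (lambda ()
        (set-static! saved)))))               ; undone even on a non-local exit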