Entry: introduction to the ramblings
Date: Sun Nov 19 15:47:03 EST 2006

These are some random remarks I couldn't give a place. Only the bottom part probably makes sense for anyone but myself to read. Early posts are no longer accurate or relevant. I started dating entries in January 2006, right after "Make Art 2006". Everything before that is considered 'the dark ages', full of confusion and despair. After the festival things got better. PF started stabilizing, and the code got cleaned up a lot. Evolutionary garbage collected :)

Entry: contexts & dictionaries

a dictionary is a collection of links. a link is all there is to know about a word: a list containing (name, compiler, codefield, body, ..., sentinel). the sentinel points back to the list container and is used by words like xt>ct etc. for high-level definitions, the body contains threaded code: a list of xts. the dictionary behaves as a stack (d> >d), so you can create temporary words and remove them again.

Entry: binding & list-interpret

in packet forth, there is a 'symbolic code' intermediate. code has 3 levels: compiled, symbolic/tokenized and text. ordinary forth does not have this middle representation, or at least it is not clearly visible. the reason for this middle rep is of course flexibility. in addition, it allows for a better interface to languages or systems that have an 'atom/list' concept. the 2 key players here are puredata and scheme/lisp. now, this adds a nice extra 'feedback loop' that is hard to use in ordinary forth: the symbolic tokenized representation can be stored in a list and fed to the interpreter as such. this means there is an extra pre-processor in addition to the usual forth-style immediate/postpone based macros.

now, be careful with these things. they can be very powerful, and sometimes easier to use than immediate/postpone, but they behave differently. the big difference is late binding. ordinary forth code does all binding at compile time: all symbols are converted to xts. the advantage of this is of course speed, but it also ensures a linear structure: definitions can be undone using mark and mark>xxx. in addition, words can be redefined while previous bindings in older code are left intact. note that when you start to work with hierarchical dictionaries, where the client dictionaries can write to the main forth, this linearity is no longer there. also, some plugins use primitive objects (they behave as pointers to external things). the external things will not be deleted by drop.

so what's the point: list-interpret does late binding. every time a word containing a code generator based on list-interpret is executed, the words are rebound. so this trades forth's compile-time binding for run-time binding. this can be necessary in some cases, but if you find you 'have to' use these things a lot, there is usually a problem with the order of definitions: you're writing 'nonlinear' code with forward definitions. try to solve those with either 'defer' or a 'current context' stack. my advice: avoid 'shameless parser macros' when possible, especially when you can use immediate/postpone. they are slow, and they make things non-linear due to the runtime binding involved. and there's this thing: once you start 'metaprogramming' there's no way back. to see the mess this can lead to, have a look at the evolution of the puredata.pf pd support code in CVS. all due to the fact that the word 'pd-self', which returns the current pd object, was defined after most of the pd support code.
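to make the early/late difference concrete, a sketch (all word names here are made up; it assumes a '( ... )' literal inside a definition compiles to a symbolic list, and uses standard forth's \ for comments, both of which are assumptions on pf):

: offset 10 + ;
: apply-early offset ;                    \ bound to an xt at compile time
: apply-late ( offset ) list-interpret ;  \ kept symbolic, looked up on every run

: offset 100 + ;                          \ redefine offset

5 apply-early   \ -> 15  : the old binding is left intact
5 apply-late    \ -> 105 : rebound at run time to the new offset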
Entry: polymorphy & subclassing

the mechanism for subclassing is already present in the polywords, implemented as an 'abort-driven' special-purpose interpreter. this is a runtime search mechanism, so not very efficient. for time-critical code you can always call the handler directly if you know what type to expect: so 'float:+' instead of '+'. a polyword is a chain of 'service providers' for a certain abstract operation associated with a specific abstract stack effect. if the first xt in the chain throws an e_type exception, control is passed to the next word in the list, and so on. this can be used to define 'globally known abstract methods on abstract objects', or generic functions. for objects which are bound to only one method, the standard forth create .. does> combo can be used, where the word already has the data bound. also, there is a lot more possible by using codefield extensions to implement a real object system.
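in practice the trade-off looks like this (a sketch; float:+ is the handler named above, the number syntax is illustrative):

1.5 2.5 +        \ polyword: walks the handler chain until one accepts
1.5 2.5 float:+  \ direct call: skips the runtime search, but throws
                 \ e_type if the operands are not floats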
Entry: list syntax for data handling

lists are very convenient, especially when you combine them with 'map' style words. this gives it a bit of a lisp feel. but using lists to define the syntax of the language is imnsho a bad idea. i do not use it any longer. see the evolution of setup.pf in CVS to see what mess this leads to. forth is not lisp. for those who need an example, and for myself when reading this in a couple of weeks, this is the case:

if ( w1 w2 w3 )    vs.    if w1 w2 w3 then

i could go left here, but that would immediately lead to the nesting case:

if ( w1 w2 if ( w3 w4 ) )

now, first remark: why the hell did i add a list parser? because i want to be able to read s-expressions. (though pf does not use pairs, but linked atoms and a list head that points to head and tail, to implement stacks and queues.)

Entry: re-entering forth

because forth has its own mechanism for keeping track of control flow (data and return stack), there is no need to use the c stack for this. the forth is structured as a virtual machine. there is an object pf_vm_t defined in forth.h that is just this. almost the entire c api for the forth involves a pf_vm_t. because the c stack is not used, you can 'freeze' a program and just continue running it later. the program running in the vm can be considered a task, or a coroutine. this is more flexible than an ordinary caller/callee relationship. (see pf_vm_resume() in libpf/forth.c) if you just want a caller/callee relationship, you need to use the pf_vm_call_* functions, which will synchronize with the C stack. make sure you also install an abort handler using '>abort'.

Entry: packets & objects

packets cannot contain other packets. this rule is necessary to prevent hilarious situations with circular references. packet management is based on reference counts, and these are not sufficient if circular references can occur. (the dictionary, implemented as a list, used to be a packet, and made me realize this.) lists are packet containers, so they cannot be packets. this is the reason why they are always copied on dup or @, and deleted on drop and when overwritten by ! strings are not packet containers, so they are implemented as packets. streams, dictionaries and abstract objects can all contain other packets, so they cannot be managed using reference counts and would need a gc to be managed automatically. but there is no gc. so for objects that are not packets, you are on your own: they need to be explicitly created and freed, and are represented by a pointer. dangling pointers are possible, and will crash the system when used. so, there is no real object system in pf. the reason is that there are a million ways to write one (there's a survey on the web somewhere), and some of them are very simple, so if you need one, just pick one. have a look at dictionary.pf to see what i think it should look like.

to sum up, there are 3 kinds of basic objects in pf. these can all reside in a variable (a container for a single thing + type description). from forth, values have a type and are thus like lisp atoms, though pf_atom_t acts both as container and type description.

1. scalars (like pd's). includes abstract objects (pointers).
2. packets (raw buffers, or things that behave as such).
3. lists of atoms.

these can be combined into any non-circular data structure. you can build circular structures using the atom_pointer scalar type, but then you have to manage the pointers yourself.
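the copy/delete rules for lists show up directly when you park one in a variable (a sketch; it assumes 'variable' creates the one-atom container described above):

variable v
123 v !          \ a scalar goes in as-is
( 1 2 3 ) v !    \ storing a list deletes whatever v held before
v @              \ fetch produces a copy; the list in v stays intact
drop             \ the copy is deleted again; v still holds ( 1 2 3 )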
Entry: real time

forth is called a real-time language. why is this? not because it is fast, but because it has a predictable execution pattern. pf is quite slow for a forth, but has a predictable stationary execution pattern. stationary means here: all memory is re-used after an initial 'fill-up' period, which can be forced to occur in the initialization phase if necessary. this is a trade-off between static and dynamic allocation. note that linux and most unices are not real-time operating systems per se, so pf can't be real-time. this is about forth not breaking (soft) real-time properties in a (soft) real-time system. forth is real-time because it allows you to do a lot of things without using unpredictable subsystems like dynamic memory allocation and garbage collection. forth is very linear. this fits well in simple settings for 'stationary processing'. most things that are about real time are about events and streams. forth fits this picture well.

Entry: patterns

forth is poetic. and yes, i'm just going to give in to that forth meme spreading urge again. you know this thing about patterns. design algorithms. it seems working with forth is like extracting patterns. refactoring is easy: most of the time it's nothing more than cut and paste, because of the lack of names for local variables (just stack positions). once you isolate the small heavily-used 'channels', a nice modular structure emerges. forth's way of telling you that you didn't make a clean cut is to spit a lot of complexity at you at once. this is the thing that needs to be tamed. one good indication is the amount of 'dup swap drop' in your code. when i first started forth, i was amazed that code i saw from other people contained so few stack manips, while mine was full of them. over time i got a feel for organizing things to avoid the juggling.

Entry: meta data

the problem with handling pure data (a sequence of bits) is meta data. whichever way you shake or turn it, it's always in the way. all i want is the damn buffer. now what is meta data, actually? it is the code to decode the bits. so, why not dump some real, proper code in the meta data? ta-da: packet forth. if you save an image in internal format, the file will contain something like this:

#(image/s8/100/100/1 80000)

whenever a '#' is encountered in the input stream, a parser command is executed. currently there is only one, which is '#('. this reads a list containing the type of the packet, and optionally the number of bytes following in the stream. note that data can be aligned by adding a number of spaces before the closing ')'. the default number of bytes is the total data size of the packet. this works only with 'pure packets', i.e. those that really are a sequence of 'dumb' bits. now, if i load this file using

"image.pf" load

i will get the contents of the file on the data stack, being an image packet. the nice trick is that i can just edit the file and add a command after the binary data, i.e.:

#(image/s8/100/100/1 80000) image:double

and do the same again, and i get a double-size image when i load it. so pf data files are 'electric', because they are really code files. now, this has quite some security implications, but that's not what it's about here. it adds so much flexibility if you keep your data files as code. i could just have something like

"otherfile.pf" load

without anyone ever noticing the data comes from another file. anything can be plugged in there, of course. it's just code. so meta data can be very broad. there's a basic type system, and a lot of conversion and processing logic, so you can emulate almost anything. it's really easy to save a data atom to a file and append some symbolic processing code to it. the nice thing about this is that you can use the pf 'file format' in other applications and have this kind of magic happen. again, terribly insecure, but worth having as a tool in the box.
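putting the pieces together, a sketch of such an 'electric' file (the file name is made up, and the payload is elided; image:double and load are the words used above):

\ processed.pf: header, raw payload, then code that runs against
\ the freshly parsed packet when the file is loaded
#(image/s8/100/100/1 80000)
...80000 raw bytes...
image:double

\ at the console: the result is the already-doubled image
"processed.pf" load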
Entry: mapping and drawing -- are packets mutable?

yes and no. that's politics. standard packet operations (maps) perform copy-on-write, so each operation has only a local effect. no packets are mutated. this allows for a clean functional style of programming with minimal copying overhead. for incremental updates (i.e. storing a single pixel) this is crazy. so we introduce a hack in the form of the 'accumulation packet', aka 'bucket'. this captures the 'drawing' concept, next to the already present 'mapping' concept. an accumulation packet is not a separate type: any packet is assumed to be an accumulation packet for functions like store, draw, and any other that fits the shoe of 'partial mutation'. so you have to be careful. most of the time it is clear whether you are dealing with muting or non-muting functions. index! is such a mutator. rendering text to an image is another one. only one thing to remember: if you use muting operations and you must operate in a functional framework, you need to 'reserve' a packet. this will ensure it is local, by performing a copy if necessary. something like

: deadbit reserve dup >r 0 123 r image:index! r> ;

would behave as a functional map. the original, if used elsewhere, will not be modified. in fact, some functional image processing primitives are implemented with reserve + a muting operation. this is of course just the eternal call-by-reference / call-by-value debate. for standard (functional) usage a packet behaves as an rvalue (implemented as a reference to a shared readable object. lisp style.), for accumulation packet usage it behaves as an lvalue (implemented as a reference to a shared, writable object. forth style.)

Entry: exceptions

exceptions in pf are implemented using a 'try .. recover .. endtry' sequence. 'try' installs an exception handler on the return stack, which consists of a marker (DS,RS depth) and a return point. if an error occurs (either in a primitive, or from threaded code using 'throw'), the code after the 'recover' word is executed. before that, the stacks are rewound to the state before 'try', and the exception (either an e_ style primitive error or any object passed to 'throw') is placed on the data stack. note that the stacks are rewound to a certain depth, but that does not mean their contents are specified. this can lead to strange situations when a lot of data is consumed from the data stack or rearranged. things you put on the return stack will still be there, but the data stack contents are unspecified. depending on what you are doing, it might be trivial to know what is on the stack. on normal execution (the part between 'try' and 'recover' did not cause any errors), the error handler is removed after completion and the code after 'endtry' is executed.
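a minimal sketch of the mechanism (the word name is made up, and it assumes '/' throws a primitive e_ style error on division by zero):

\ ( a b -- a/b | 0 )
: safe-div
  try
    /
  recover      \ DS is rewound to its depth before 'try',
    drop       \ with the exception on top: drop it...
    drop drop  \ ...and the two rewound (but unspecified) operands
    0          \ fall back to 0
  endtry ;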
Entry: cross-compiling

[ before reading this: cross compilers are best written using dictionaries and a modified search order. that's the standard way, and it seems to work well. (badnop uses an ad-hoc name mangling approach which is really non-standard, but has its benefits.) the text below advertises pf for writing these compilers, while the author does not follow his own advice. ]

or: how to write a forth compiler in pf for all kinds of target (virtual) machines. first, there are some problems. pf is not standard forth wrt symbol names. in addition, there is special syntax for lists, strings and pure packets. this could be fixed by adding a 'plain forth mode' to the parser, but so far i have found no need for such a thing. it seems to me the only reason you would need this is when you want to use standard forth code verbatim, which is a bad idea anyway. in most cases the platform is 'exotic' and everything will be built from scratch. using pf to define the 'subforth' is very convenient: lots of tools are available: basic forth, symbols, lists, strings, the unix interface, ...

the general setting is this: pf hosts all the symbolic stuff using its own dictionary. the target forth only has threaded or native code in its memory buffer, and all communication between the 2 uses target execution tokens (i.e. code field addresses) and literal values. this keeps the target forth very simple. the host forth needs to change behaviour to make this happen: in target mode, which is implemented as a child dictionary for scaf, compile and interpret work differently, or ':' and ';' are redefined. the execute semantics of a target word (its definition in the host dictionary) is something like

: word target-xt target-execute ;

note that target-execute can do all kinds of magic, like talking to the target over a serial line or midi (distributed forth). the compile semantics ( xt -- ) is like

: word, xt>body follow @ target-compile ;

(follow is there because the literal is prepended with 'lit'.) so plugging into the 'compile' and 'execute' channels effectively enables one forth to be connected to another, with most of the host's functionality present. this, my friend, is a very powerful tool. so, what is forth? a way to connect machines. forth is a communication protocol. most problems with data/stream processing can be solved in a very elegant and usually efficient way with state machines. split up the problem by building a machine that reflects the problem domain. this can be done in software, but also in hardware. for forth, there's really no difference. build as many machines as you want, and program and connect them all in one language. see plugins/scaf/scaf.c and script/scaf.pf for more information.

Entry: forth again. and code and packets and objects.

yeah. it's a cult. i had a serious forth fit yesterday, realizing the true nature of forth. it's really nothing more than the simplest possible way of getting your code to run on anything that behaves as a state machine. because of the single-pass nature of symbolic and threaded forth code, and possibly some bytecode and token intermediates, communication between forths is really straightforward. because forth is so linear, some things are hard to do. it's not a lisp. (yeah, another fetish.) this is its only weakness, i think. so: reasons. explanations. memeplex cleanup.

GOOD:

* forth is easy to write from scratch.
* forth is even easier to write from scratch in forth.
* forth is an interpreter and a compiler at the same time.
* forths can be chained. compilation is one-pass and so chainable. (coroutines, pipes)
* forth is extensible. no artificial border between language and program.
* forth code is very dense. there are no named intermediates.
* forth has no real policy other than stacks and words.
* this makes it behave very well in real-time conditions.
* forth supports both prefix and postfix message protocols natively.
* pf is a 3-layer forth: text -> sym.tokens -> addr.tokens

BAD:

* forth is not scheme.
* forth code is very dense. there are no named intermediates.
* forth avoids non-linear (circular) data structures to avoid gc, which can be a terrible pain sometimes.

[EDIT: funny to read this. serious meme high. i think i got a bit better at naming the concepts though. the good thing about forth is that it's almost completely concatenative (in a functional programming way, with implicit composition). the bad thing about forth is that this is done in an 'optimal' way, where optimal is mainly defined as minimal interpreter complexity: in other words, tricks which are not entirely conceptually clean wrt functional composition are allowed, as long as they are simple to implement. the main example here is macros, which are predominantly used to introduce 'local nonlinearity'. an example is if .. then .. else, which could be solved more cleanly using higher-order functions, but in forth is just a syntactic trick to compile conditional jumps more efficiently. see 'if' below.]
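to make the 'syntactic trick' concrete, here is the textbook way if/then can be written as immediate words in a classic forth (a sketch, not pf's actual implementation; 0branch is the usual conditional-jump primitive in the literature, 'here' the dictionary pointer, ',' compiles one cell):

: if    postpone 0branch here 0 , ; immediate  \ compile jump, leave a hole
: then  here swap ! ; immediate                \ patch the hole to jump here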
now, regarding this packet thing. this is really nice, because it separates state from functionality very clearly:

* code is just code, and has no state other than constants.
* data is just data: raw bits with some interpretation tag.

(ahem.. code/data duality.. this is triggering too many associations.) on the top level (or any level), you chain code and data together. this separation, between stateless functions and data in a form that is raw and white-box, is kept as long as possible in packet forth. this approach is much easier than objects, and has several benefits: both are serialized easily. and that's a key point. saving and loading code is easy, because it's just a sequence of words. saving and loading data is easy, because it's just a sequence of bits with some type description. both can be present in the same stream, making forth an extensible communication protocol. so load/save for all code/data in packetforth is trivial, linear and what not. you can send these things over all kinds of channels, in all kinds of forms (internal symbolic token rep, internal compiled rep, text rep, ...). the thing we need to give up here is abstract objects. sorry. support for abstract object types in packet forth is not very good. it does not fit the packet concept, so i decided not to lose too much time on it. abstract objects cannot be serialized in packet forth, and are just there to abstract things that cannot be made explicit as functions and packets. currently they are used for i/o streams and other i/o adaptors that benefit from an object representation (because they have state invisible to pf), or that are already implemented this way. this is the case for most wrapped libraries.

[EDIT: this got a lot better with the 'serialize' word invented during the huge non-blocking rewrite. raw packets can be serialized like all other data, though not in an endian-independent way.]

Entry: packet forth is for insecure programmers

haha. forthers are weirdos. some of them are really obsessed with eliminating bloat. not that i don't like that, but for packet forth this is not necessary. i don't consider pf bloated, but compared to an ordinary forth it's a lot less 'hard core'. [EDIT: getting more hard-core by the day :)] the problem with real forths is this hard-core-ness. i've fallen for it too, but maybe it is the only way to truly understand forth: to go to the bottom and take out everything that isn't necessary. especially to take out the problems you created yourself by adding 'generic features' that you don't use. forth is really about specialized hardware and software. that's the main reason why the ANS standard is a joke for most purposes. forth only has an edge if speed, responsiveness, flexibility and code size are all required at the same time, and everything needs to be written from scratch.

[EDIT: i'm just parroting here, but there's something to say about hard-core forth: hard-core simplicity is really addictive. it's a puzzle with challenges you don't find in other languages, because people seem to have stopped caring about these issues. i got really into this with the 8-bit pic forths: very non-standard, full of hacks to make code small and fast. PF is indeed a bit less extreme here, since pure speed at the VM level has never really been an issue. i just finished 'meta math' by chaitin. it talks about these ideas a lot, but from a more mathematical and philosophical pov.]
for most of today's problems, bloat is ok. bloat is evolution at work. evolution leads to dirty but working solutions. unix is just that: a bunch of tools kludged together. unix used to have the same 'minimal' philosophy as forth has, but that time is gone. most linux systems are bloated as hell. considering what unix has made possible, i do not consider this a bad thing.

[EDIT: for forth, the evolution goes on in the head of (mostly) a single developer. there is a lot of bloat, but most of it is in the form of volatile thought. once done, the solution is more elegant than larger distributed projects like unix systems. maybe forth is more about the size of human brains than anything else.]

for some people minimalism works out. look at ColorForth. CF is a piece of art. i don't think it has a single word that can be removed. it is a complete self-hosted dev env running on the bare metal. i bow to ColorForth. i bow to Chuck. he figured it all out. his ideas, especially the hardware ones, are guerilla warfare. it is really about bringing power to the people. and that's about the same as what i want to do with packet forth, be it in a little more modest way. i'm not into designing chips yet, but who knows. [EDIT: crackhead. though i still bow to chuck :] ColorForth works for the things it is designed for: writing stuff from scratch to eliminate all possible bloat and to tailor it to the hardware. in fact Chuck has done this to the extreme: half of CF is the hardware (25x and 50x chips). pf is a little less radical. it embraces forth and unix at the same time. think about it: both are really the same. unix is a collection of loosely coupled programs. forth is a collection of loosely coupled functions. unix is an operating system. forth is an operating system. the only real difference is that unix is big and bloated, and evolved into a standard. it is very 'concrete'. forth is more of an idea. a design pattern. because it is not standard, it can stay in motion, which keeps it lean and mean. note that both solve almost the same problem in 2 different ways:

* unix is a grown standard. standards have this effect of introducing bloat because of backward compatibility: there are versions (library hell) that can only be effectively solved by duplication. this is on the binary level (debian). on the code level a lot of bloat can be eliminated though (gentoo), but there are still problems with syncing different people's efforts. you know, 'bazaar' and all. unix works because it is a solution to a real problem: providing a standard base for people to work together on large projects. look at GNU/linux and all other open source efforts: most of these things can work together, which is an achievement in itself.

* forth is a pattern. it can be used to solve a specific problem using design and not evolution. well, evolution is possible of course, but because of the absence of 'standard' issues, bloat can be eliminated the moment it pops up. this is not specific to forth, but still, the analogy to unix is striking. both have the same sort of modular approach (variables <-> files and words <-> programs) and a 'manual' for how to port it to any architecture.

now, what is packetforth? it is the combination of unix and forth. packet forth takes functionality from unix, in the form of system services, code libraries and other programs. it can talk to these. in addition, its virtual machine is more high-level than an ordinary forth: it uses lists instead of raw memory buffers, and has a built-in type system tailored to support media processing. but it is still a forth, and can be made to host or tether other forths. so it's a win-win situation. it is integrated into the bloated unix world, so it can piggyback on that, and it is a forth, so you can do anything you would use a forth for: compiling stuff from scratch for other architectures, and connecting them to the overall system. i use this mainly for hosting other virtual machines. i've discovered that most things related to data processing can be solved in a clean and usually efficient way by using virtual machines to decouple high-level and low-level code: to create an extendable and interactive system that is still fast.

[EDIT: note that this project has moved mostly to the scheme/cat based brood system, since PF is considered finished and therefore i can't break it any more with deep changes.]

this is an approach used in other fields too. most high-level languages these days compile code to bytecode that runs in virtual machines, to solve portability problems. and there is the parrot project that aims to be the generic scripting language vm. so it really is a sign of the times. but combined with forth, a vm enables a lot of cool tricks. thinking about it, i could have written packet forth in forth, which would have made the lowlevel-highlevel distinction a bit less prominent. (in short: there is too much C code that would be easier to implement in forth.) the problem with this is a practical one: when i started, i didn't know how to write a forth in forth. so there you have it.
bloat generated by evolution, and the practical limitation of typing speed, etc. anyway. packet forth works. i like what it has become. it has a reasonably high-level feel which is a bit like lisp. it can do things a bare forth cannot do, and it is fully integrated into the operating system, so i think i'm going to live with the little bloat that's left. can't rewrite forever. [EDIT: funny, since what i did next is a big rewrite :] it's my forth. and i've learned that this does not mean it's necessarily yours. though i think the ideas in here are provocative enough to make you experiment :)

Entry: using the return stack

packet forth does not have local variables. this is a good thing. but there's the return stack. during the execution of a function's body it is not used, so you can use it to store atoms that are in your way. i've found this to work very well. if you look at my forth code, you'll see i don't use 'swap' and 'over' a lot. i use the return stack: '>r r r>'. the main reason is that it is more visually distinct: the fact that you need to balance >r and r> gives code a little more texture. 'r' also behaves as a local variable. usually i store the current object in r, i.e.:

# ( zzz.object -- )
: zzz:magic >r r zzz:magic1 r zzz:magic2 r> zzz:magic3 ;

the fact that you NEED to balance r actually makes it more, ehm, user-friendly :) i.e. if you make an error against this, the code will immediately fail. real forths crash when you do this. pf does not: it just throws a type error if it does not find a valid return address on the return stack. moral: implicit arguments are fun, but don't lose your head with 'swap dup drop over'. if you see a lot of those words interleaving your code, try to find out what you really want to do and chop it in pieces.

some words of caution though. pf does not have an explicit control stack [EDIT: which would be yet another global variable, getting in the way of multitasking and fast context switches], so some words use the return stack for this. examples include:

for .. next
try .. recover .. endtry

(hint: define a word using these and look at the code using .xt) control words that do not use the return stack, and so can be freely interleaved with >r r r> and other words that modify RS:

if .. else .. then
begin .. again / until

you can always create new stacks of course. you can even implement automatic cleanup of these if you install a cleanup handler on the return stack. the only thing to be careful about is that jumping to raw threaded code (what 'if .. then' and 'enter .. leave' do) and executing an xt are not the same thing.
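a sketch of the failure mode (word names made up):

\ balanced: r serves as a read-only local during the body
\ ( thing -- )
: ok      >r r> drop ;

\ unbalanced: on exit, pf finds the stored atom instead of a valid
\ return address on RS and throws a type error; a real forth crashes
: broken  >r ;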
Entry: contexts & stacks

[EDIT: most of the stuff below is obsoleted by the non-blocking rewrite. PF now has an OS-friendly cooperative multitasker which presents a blocking IO interface to forth tasks, and uses non-blocking IO in the implementation, so that blocking in the PF<->OS link is considered an error, except at the 'select()' point.]

forth should be simple: 1 (system) thread, 1 dictionary. 1 forth per processor. no smp. but pf is about integrating into unix, and at this point in time that means threads. threads suck because they are unpredictable. it is a gigantic hack actually, to have processes with shared memory. but there is blocking io in unix, and threads are an elegant way to deal with that. let's leave the politics aside for a while. how to support threads? no. first: why threads are bad for pf. forth operations are not atomic. when you design a forth as an os, on a single-processor machine, you can avoid pre-emptive multitasking and use cooperative multitasking (coroutines). the big advantage of this is that atomicity is under your control. the only problem is interrupt handlers, and depending on the machine, you can usually make a subset of the forth words atomic so you can use those in an interrupt handler. threads are bad for pf because pf's data structures (lists) are not thread-safe. this is largely a design error that is too hard to correct at this moment. in addition, forth words are implemented as C functions, and during the execution of a C function some invariants may be temporarily invalid. this could be solved if the data structures were thread-safe, though. so, the short story: correcting this is not possible without a complete rewrite of about every line of code. moreover, i doubt it would really add a lot of value, because it is not too hard to implement extra synchronization constructs (queues, stacks) that are thread-safe, and that would effectively solve the problem. so the current approach is this: threads are only used for system i/o drivers. the forth itself is single-threaded, and will stay like that. all other things can be solved using processes (private data/code per forth), with shared memory and pipes for communication. [EDIT: i probably still need threads when i want to use libraries that do blocking IO, or use external processes, as is the case for mencoder.]

Entry: reinvent the wheel

yes. it is important. reusing code is a joke. don't go there. for some things it is really necessary, like reading quicktime files or interfacing to graphics hardware, but if you can: do it yourself. use an expressive language. be independent. it is the way to enlightenment.
Entry: symbol vs xt in dynamic tasks

[EDIT: the original task code needs to go and be replaced by the new one. this will be obsoleted soon. task-local words (true objects) might be necessary.]

forth can be convoluted. packet forth maybe more so. the problem i ran into is state, again. in the original opengl.pf demo file, there are several tasks, and state for each task is kept on the return stack. is this a good idea? the problem is not necessarily coroutine state (where are we in the code) but objects. the demo uses smoothers: one-pole lowpass filters with embedded state. the original approach is to create a symbolic word. this is done by feeding symbolic code to the interpreter. see above for more ramblings about this. i'm still not sure how to handle this 'pattern'. the problem with feeding stuff to the interpreter is that once you start, you have to continue this way, because forth is very linear. another problem is that tasks are really dynamic objects. as far as i know, the 'standard' approach in forth is to create static tasks. then it makes sense to define symbolic words for them. with dynamic tasks, the words need to be cleaned up. here an object-oriented style is more appropriate. so, what's the final word? either use a separate dictionary for each task, and choose either symbolic word creators (create .. does> constructs combined with instantiation through list-interpret) or xt creators. a task then really is an object: it needs to be destructed properly. the other option is to use functional programming, with a clear separation between code and data. data can be put on the return stack, and code can be reused. it would be very nice to define links on the return stack that are cleaned up automatically. i still don't know if this is a good idea, but at least it is an option.. [EDIT: this looks like the only way to go. it's not completely safe (pointers), but i think that's ok. no nanny pf.]

so, conclusion: there will, at least in my mind, always be a conflict between the object-oriented (or dictionary-oriented) style and the functional programming style. [EDIT: this has a name.. i forgot. something about making either data (OO) or code (FP) easily extensible.] i need more time to stabilize my opinion, but it sure feels like functional programming with data structures in lists is a lot better. then, following that thought, we end up with lisp again. the more i work with pf, the more i really want a lisp :)) [EDIT: see cat/brood]

Entry: packet forth link structure

[EDIT: i don't really like this.. but it's a deep cut to change it. related to the typedescription == opcode idea, and to letting code be any list instead of only 'blessed' lists.]

all code in pf is stored in links. a link is a list with a certain structure (see above). a dictionary is an abstract object which contains a list of links and a pointer to a parent dictionary. [EDIT: now there is only one global dictionary. no parents. this is faked at compile time only, by splicing off a part of the global dictionary into an isolated object after the code has been bound.] this structure is more complex than ordinary forth, where the dictionary is just a linear array. because of this, allot works differently. the dictionary can be accessed atom-wise (i.e. using , ) or link-wise. this gives a lot of freedom to shuffle code around. as long as links are in the dictionary, they can be accessed symbolically. to come back to the temporary function subject: what this can be used for is to build some kind of code structure on the dictionary stack, and then, when it is finished (compiled or 'connected'), move it somewhere else, for example the return stack. this makes me doubt again whether or not dictionaries should be abstract objects... but let's not go there. there are infinitely many things wrong with pf that could be done a lot simpler.. maybe pf2 will do it that way :))

Entry: tailcalls: return stack juggling and abstraction

of course, it is safest to do this with macros. but it is possible to do it with ordinary words too, which could lead to more compact code. then the only problem is tail calls.. you never know. so i'll leave it at macros until i understand how to do it.. this has implications for everything dealing with the return stack, especially exceptions and for..next loops. anyway, this tailcall business needs to be examined. probably it's much simpler than i think. [EDIT: yes, but the threading needs to change to a simpler tokenized (type tag) call-threaded forth.]

Entry: create vs. link

again, related to temporary code. 'create-xxx' is really a parsing word, and using parsing words in functions is really a pain. so it's best to also have non-parsing 'link-xxx' words that create a named link (name from DS). in short: use 'create-xxx' to build static things, and use 'link-xxx' or 'make-xxx' to build symbolically resp. xt-referenced dynamic things.
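the convention in a sketch, with a hypothetical create-buffer / link-buffer pair (the 'scratch symbol notation is also just illustrative):

create-buffer scratch                    \ static: the name is parsed from the input
: make-scratch 'scratch link-buffer ;    \ dynamic: the name comes from the DS,
                                         \ so this works inside a definition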
Entry: dictionary

i took out the dictionary object today. this means no more C-coded hierarchical namespaces. everything is reworked as an allot stack. it can take links (circular lists), which are used to store code. in addition, it can be used as an ordinary stack. the effect of the dictionary object can still be had by swapping in/out some other list containing links, similar to temp code. [EDIT: temp code is no longer implemented... given a nudge or two to the module loading code, this could be revived using the 'mark' words.]

Entry: dictionaries, objects and binders

hierarchical namespaces are removed, so objects etc. need to be implemented differently. this is done using the word 'mark', which tags the dictionary, and some 'mark>xxx' word, which builds a structure from the marked part of the dictionary. currently there are:

word          behaviour of created word
mark>buffer   variable
mark>dict     find
mark>object   find execute
mark>binder   accept find execute/compile

this can be used in conjunction with tasks and coroutines to have temporary code stored on the return stack of a running routine. see script/scheduler.pf

[EDIT: the first thing that's actually still alive. used in a lot of code where a local namespace is necessary to reduce typing, but where it would interfere too much with the global namespace. an example here is the video players/recorders, which have a very narrow default interface, but a lot of tuning functionality.]
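a sketch of the local-namespace pattern (assuming mark>object parses the name of the word it creates, and that the created word takes a symbol to find and execute; the exact interface is not pinned down here):

mark
: start 1 ;           \ words defined after 'mark'...
: stop  0 ;
mark>object player    \ ...get packed away behind a single word

'start player         \ find 'start' among the marked words and execute it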
Entry: attributes

Now, on to ease of use. the nice thing about forth is: 1 word, 1 effect. This can get rid of a lot of red tape while programming: each thing has one default effect. This is not compatible with objects though, where there are several things an object can do, identified by a name for each action. The dictionary objects and binders above can do that, though in most cases it is easier to just do it the forth way. This is very much like the way files are implemented in unix: each file has some standard behaviour (read and/or write), but in addition other attributes can be modified through IOCTLs. All attributes are stored in a dictionary object, which behaves as a class object. Each representative word stores a dictionary of attributes. What problem does this solve? For instance, setup of i/o modules. It gives a nice trade-off between object-oriented programming and forth-style one-hit. It could be implemented using does words: the first item in the body is instance data, the second item is the class (dictionary find xt):

msg grab format!

Now, could we store this in the link by default? Maybe it's wise to implement a word differently:

meta: link->first (name compiler finder)
body: link->last (cf xt xt xt xt)

this would still enable fast execution: xt = pf_atom_t -> pf_list_t

update: attributes are a bad idea, but the double link structure might actually be a good idea.

type symbols & garbage collection: what about this: type indication should be symbol-only. xt's are just atom pointers pointing to a cf. polymorphy should be handled by this symbol mapping.

Entry: too many stacks
Date: Thu Dec 7 13:52:41 EST 2006

what about dynamic scoping for local context? (common lisp's 'special') in scheme there is no such thing, and in common lisp the default is now lexical scoping, but in forth it might make sense, because there is no lexical scope, and implementing one would require a whole lot of other changes. what i mean is: we have several auxiliary stacks: input, output, abort, (dictionary,) ... which all need some synchronization with the return stack. implementing tasks and coroutines becomes harder because this context needs to be saved too, which does not happen at the moment. so it might be wise to take out these stacks and find a mechanism to put them on the return stack. what about a searchable attribute list on the return stack? this could help to implement a whole lot of local context things, like i/o, local variables, abort handlers, ... the mechanism would be to have an extra type, a_context: a symbol followed by a data atom.

so.. i went out and tried to change this, only to find out it is a big mess (the input/output stacks). so for now i stick with manual synchronization. there seems to be no way to solve it in general.

loading inside generators/coroutines: i ran into this while writing a sequencer for a psychological experiment. this is possible, but it can only perform one script at a time. in short: there can be only one open file at a time, because the input/output stack is global. the console is not a problem, because it is pushed/popped on the stack. should this be task-local? (see above: attributes) probably not. the way it is now (dynamic scope, if you please) is ok for most purposes. if you need multiple script inputs, you can always use stream-read-atom.

Entry: polywords again

to simplify the code and possibly speed things up a bit, it might be wise to chop polywords up into 2 steps:

* first, execute the scalar word. this does the multiplexing based on the numerical type description, and so will be a bit faster. it is built into the kernel and doesn't add bloat to the forth code.
* then, execute the packet polyword the way it is implemented right now.

scalar processing is built in, so scalar processors will be primitives. the exception mechanism is not necessary to execute the polyword xt, which could be stored in the field after the codefield, i.e. directly in the body.

Entry: coroutines/generators/tasks

it seems that, the way i use generators/tasks, it makes sense to distinguish between switcher words (which swap the return stack) and body code, and to explicitly assign code to the switcher using 'start'. it is left to the user to build more convenient starter/restarter words on top of this. when this xt calls other xts, a 'nested control structure' can emerge. as with other control structures, this should balance the data stack over entries and exits, since the data stack is a shared resource and is not saved in the task structure on a yield. to name things: within the control flow of a single coroutine, there can be 2 types of xts:

* normal xts: do not call yield, and do not need a balanced data stack.
* control xts: can call yield/transfer or call another control xt, but the data stack needs to be balanced between entry and exit (which can be 2 different xts), or equal to the desired yield semantics.
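a sketch of a control xt under these rules ('yield' is the word named above; how the body gets bound to a switcher with 'start' is left out, since that interface isn't described here):

\ an infinite generator: each pass leaves one value on the shared
\ data stack and yields. the +1 effect per yield is exactly its
\ 'desired yield semantics'.
: counter 0 begin dup yield 1 + again ;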
Entry: abort vs exceptions

i'm gearing towards taking out abort, since there are too many occasions where it can lead to quite awful bugs by not restoring some context. this would eliminate the abort stack. in the same spirit, it might be interesting to also eliminate the i/o stacks and use only 'current' variables. there's a problem with this though: exceptions are saved on the return stack, so abort serves the purpose of a 'super-exception'. not trivial to take out. should check how this is done in gforth. currently there's a problem with the console and the coroutine/generator implementation, which swaps out the return stack.

Entry: ctypes

what are they? void pointers. do we really need them? no. so i'm starting to eliminate them, starting with x11 and streams. this means there's a new definition of what a packet really is: a generic object, with the constraint that packets cannot cause circular refs (this messes up the refcounts, and with them the general forth tree-like structure). i only encountered this when the dictionaries were still packets (they could contain themselves). so non-pure packets can be wrappers around generic objects.

Entry: streams

these need to be handled differently. what is a stream? a port that supports the following methods:

- getc
- ungetc
- vprintf
- write
- read
- close

current streams are strings and files/pipes. this is changed so that a stream is now a packet. string streams are still a bit of a hack though.

Entry: revision

after a couple of months working on another project, i came back to pf with mixed feelings. on one hand, i really like the forth. some things are still a bit complex, like the abort stack handling vs exceptions, but still, it's nice, and modular. but the C core i really dislike, especially the verbosity of the plugin argument handling. this needs to be cleaned up. the code is full of sequences like:

check packet (PF_STACK_CHECK)
get packet (s->first->next..->w.w_packet)
get header (pf_packet_header)
get subheader (pf_packet_subheader)
check something in the subheader, usually a subtype descriptor
clone the destination packet from header->description or some information in the subheader

this needs to be simplified. i do it locally using macros in most places where it's too absurd to handle, like the image processing words, but there has to be an easier way to make the transition from forth to C functions, avoiding a lot of typecasts.

Entry: parrot

been a while again.. i should paste in the other text i wrote about a supercollider-like design, desiredata, and lisp. it's clear to me now that what i originally wanted to do with packet forth is not possible in its current form: do memory management correctly. currently it's reference count based, which is nice, but it either necessitates manual collection for circular references, or abolishes them. it's good for my ego to know Python started out like that :) what i see now is that optimizing for memory re-use, which is so vital for obtaining speed in video processing, is largely incompatible with mark/sweep garbage collection. however, a system can be built (think supercollider, desiredata) where the central server has a 'fixed' dataflow structure and the controlling language is very flexible (scheme, or some lisp/forth hybrid like the one i'm trying to build). in such a clearly split design, the client can really be anything, and the server can be fairly simple: a plain refcount-based forth. decisions then need to be made as to where exactly storage and functionality are implemented. an interesting problem, and as far as i understand, the only path to completion of my 'vague' idea about pf.. now, since parrot is nearing completion, it might be interesting to build the gui/client language on top of it.

Entry: objects again

an example of a pattern that occurs a lot:

* grab: get a video frame (the camera is abstracted as one word: READ)
* send a control message to the camera

what you would normally do is "camera grab" and "camera control". what i suggest is something like unix ioctls: i introduce a word 'control' which returns grab's underlying object, to which other messages can be sent, instead of the pre-bound 'grab'.

: <'> xt>body @ ;
: [control] postpone literal postpone @ ; immediate
Entry: parrot
Date: Wed Jan 18 17:12:13 CET 2006

Larry Wall has sexy ideas. http://dev.perl.org/perl6/talks/2000/als/larry-als.txt

[Perl as a Low-Level Language] Perl as a low-level language. Polymorphism is your enemy if you're trying to do low-level programming. If you want to get early compile-time binding as soon as possible, you want the compiler to spit out very efficient code, so you write your loop to declare i as an int, then by golly you want your compiler to spit out very efficient C code. ...

[Perl as a High-Level Language] By way of contrast, if Perl's going to become more a high-level language then polymorphism is your friend. You want to delay your binding as long as possible as to what implements what. Perl has always been in the business of allowing, but not requiring, abstraction. We'd like to put in more support for functional programming, for logic programming, and for what are called "little languages".

[Perl as a Metalanguage] The folks at Bell Labs invented this notion, soon after they invented yacc. "Cool, yacc lets you make a grammar for your own language". So all these itty bitty languages sprang up that were for their own purpose. So each time you wrote a new program, you wrote your own language for your program. This was cool except that you had to learn the new language each time, and it was always different. We'd like to explore the notion of using a big language as a little language. If it's okay to program in a Perl subset, and you define a subset that looks like a little language, then how do you get around paying the price of the generality of the big language? We have ways of thinking about that.

-- that's something to think about. if parrot is going to do all these things, then parrot is what i need.