Entry: introduction to the ramblings
Date: Sun Nov 19 15:47:03 EST 2006

These are some random remarks I couldn't give a place. Only the bottom part probably makes sense for anyone but myself to read. Early posts are no longer accurate or relevant. I started dating entries in January 2006, right after "Make Art 2006". Everything before that is considered 'the dark ages', full of confusion and despair. After the festival things got better. PF started stabilizing, and the code got cleaned up a lot. Evolutionary garbage collected :)

Entry: contexts & dictionaries

a dictionary is a collection of links. a link is all there is to know about a word: a list containing (name, compiler, codefield, body, ..., sentinel). the sentinel points back to the list container and is used by words like xt>ct etc. for high-level definitions, the body contains threaded code: a list of xts. the dictionary behaves as a stack (d> >d), so you can create temporary words and remove them again.

Entry: binding & list-interpret

in packet forth, there is a 'symbolic code' intermediate. code has 3 levels: compiled, symbolic/tokenized and text. ordinary forth does not have this middle representation, or at least it is not clearly visible. the reason for this middle rep is of course flexibility. in addition, it allows for a better interface to languages or systems that have an 'atom/list' concept. the 2 key players here are puredata and scheme/lisp. now, this adds a nice extra 'feedback loop' that is hard to use in ordinary forth: the symbolic tokenized representation can be stored in a list and fed to the interpreter as such. this means there is an extra pre-processor in addition to the usual forth-style immediate/postpone based macros.

now, be careful with these things. they can be very powerful, and sometimes easier to use than immediate/postpone, but they behave differently. the big difference is late binding. ordinary forth code does all binding at compile time: all symbols are converted to xts. the advantage of this is of course speed, but it also ensures a linear structure: definitions can be undone using mark and mark>xxx. in addition, words can be redefined while previous bindings in older code are left intact. note that when you start to work with hierarchical dictionaries, where the client dictionaries can write to the main forth, this linearity is no longer there. also, some plugins use primitive objects (they behave as pointers to external things). the external things will not be deleted by drop.

so what's the point: list-interpret does late binding. every time a word containing a code generator based on list-interpret is executed, the words are rebound. so this trades forth's compile-time binding for run-time binding. this can be necessary in some cases, but if you find you 'have to' use these things a lot, there is usually a problem with the order of definitions: you're writing 'nonlinear' code with forward definitions. try to solve those with either 'defer' or a 'current context' stack. my advice: avoid 'shameless parser macros' when possible, especially when you can use immediate/postpone. they are slow, and they make things non-linear due to the runtime binding involved. and there's this thing: once you start 'metaprogramming' there's no way back. to see the mess this can lead to, have a look at the evolution of the puredata.pf pd support code in CVS. all due to the fact that the word 'pd-self', which returns the current pd object, was defined after most of the pd support code.
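to make the early/late difference concrete, a sketch (all word names here are made up; it assumes a '( ... )' literal inside a definition compiles to a symbolic list, and uses standard forth's \ for comments, both of which are assumptions on pf):

: offset 10 + ;
: apply-early offset ;                    \ bound to an xt at compile time
: apply-late ( offset ) list-interpret ;  \ kept symbolic, looked up on every run

: offset 100 + ;                          \ redefine offset

5 apply-early   \ -> 15  : the old binding is left intact
5 apply-late    \ -> 105 : rebound at run time to the new offset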
Entry: polymorphy & subclassing

the mechanism for subclassing is already present in the polywords, implemented as an 'abort-driven' special-purpose interpreter. this is a runtime search mechanism, so not very efficient. for time-critical code you can always call the handler directly if you know what type to expect: so 'float:+' instead of '+'. a polyword is a chain of 'service providers' for a certain abstract operation associated with a specific abstract stack effect. if the first xt in the chain throws an e_type exception, control is passed to the next word in the list, and so on. this can be used to define 'globally known abstract methods on abstract objects', or generic functions. for objects which are bound to only one method, the standard forth create .. does> combo can be used, where the word already has the data bound. also, there is a lot more possible by using codefield extensions to implement a real object system.
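in practice the trade-off looks like this (a sketch; float:+ is the handler named above, the number syntax is illustrative):

1.5 2.5 +        \ polyword: walks the handler chain until one accepts
1.5 2.5 float:+  \ direct call: skips the runtime search, but throws
                 \ e_type if the operands are not floats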
Entry: list syntax for data handling

lists are very convenient, especially when you combine them with 'map' style words. this gives it a bit of a lisp feel. but using lists to define the syntax of the language is imnsho a bad idea. i do not use it any longer. see the evolution of setup.pf in CVS to see what mess this leads to. forth is not lisp. for those who need an example, and for myself when reading this in a couple of weeks, this is the case:

if ( w1 w2 w3 )    vs.    if w1 w2 w3 then

i could go left here, but that would immediately lead to the nesting case:

if ( w1 w2 if ( w3 w4 ) )

now, first remark: why the hell did i add a list parser? because i want to be able to read s-expressions. (though pf does not use pairs, but linked atoms and a list head that points to head and tail, to implement stacks and queues.)

Entry: re-entering forth

because forth has its own mechanism for keeping track of control flow (data and return stack), there is no need to use the c stack for this. the forth is structured as a virtual machine. there is an object pf_vm_t defined in forth.h that is just this. almost the entire c api for the forth involves a pf_vm_t. because the c stack is not used, you can 'freeze' a program and just continue running it later. the program running in the vm can be considered a task, or a coroutine. this is more flexible than an ordinary caller/callee relationship. (see pf_vm_resume() in libpf/forth.c) if you just want a caller/callee relationship, you need to use the pf_vm_call_* functions, which will synchronize with the C stack. make sure you also install an abort handler using '>abort'.

Entry: packets & objects

packets cannot contain other packets. this rule is necessary to prevent hilarious situations with circular references. packet management is based on reference counts, and these are not sufficient if circular references can occur. (the dictionary, implemented as a list, used to be a packet, and made me realize this.) lists are packet containers, so they cannot be packets. this is the reason why they are always copied on dup or @, and deleted on drop and when overwritten by ! strings are not packet containers, so they are implemented as packets. streams, dictionaries and abstract objects can all contain other packets, so they cannot be managed using reference counts and would need a gc to be managed automatically. but there is no gc. so for objects that are not packets, you are on your own: they need to be explicitly created and freed, and are represented by a pointer. dangling pointers are possible, and will crash the system when used. so, there is no real object system in pf. the reason is that there are a million ways to write one (there's a survey on the web somewhere), and some of them are very simple, so if you need one, just pick one. have a look at dictionary.pf to see what i think it should look like.

to sum up, there are 3 kinds of basic objects in pf. these can all reside in a variable (a container for a single thing + type description). from forth, values have a type and are thus like lisp atoms, though pf_atom_t acts both as container and type description.

1. scalars (like pd's). includes abstract objects (pointers).
2. packets (raw buffers, or things that behave as such).
3. lists of atoms.

these can be combined into any non-circular data structure. you can build circular structures using the atom_pointer scalar type, but then you have to manage the pointers yourself.
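the copy/delete rules for lists show up directly when you park one in a variable (a sketch; it assumes 'variable' creates the one-atom container described above):

variable v
123 v !          \ a scalar goes in as-is
( 1 2 3 ) v !    \ storing a list deletes whatever v held before
v @              \ fetch produces a copy; the list in v stays intact
drop             \ the copy is deleted again; v still holds ( 1 2 3 )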
Entry: real time

forth is called a real-time language. why is this? not because it is fast, but because it has a predictable execution pattern. pf is quite slow for a forth, but has a predictable stationary execution pattern. stationary means here: all memory is re-used after an initial 'fill-up' period, which can be forced to occur in the initialization phase if necessary. this is a trade-off between static and dynamic allocation. note that linux and most unices are not real-time operating systems per se, so pf can't be real-time. this is about forth not breaking (soft) real-time properties in a (soft) real-time system. forth is real-time because it allows you to do a lot of things without using unpredictable subsystems like dynamic memory allocation and garbage collection. forth is very linear. this fits well in simple settings for 'stationary processing'. most things that are about real time are about events and streams. forth fits this picture well.

Entry: patterns

forth is poetic. and yes, i'm just going to give in to that forth meme spreading urge again. you know this thing about patterns. design algorithms. it seems working with forth is like extracting patterns. refactoring is easy: most of the time it's nothing more than cut and paste, because of the lack of names for local variables (just stack positions). once you isolate the small heavily-used 'channels', a nice modular structure emerges. forth's way of telling you that you didn't make a clean cut is to spit a lot of complexity at you at once. this is the thing that needs to be tamed. one good indication is the amount of 'dup swap drop' in your code. when i first started forth, i was amazed that code i saw from other people contained so few stack manips, while mine was full of them. over time i got a feel for organizing things to avoid the juggling.

Entry: meta data

the problem with handling pure data (a sequence of bits) is meta data. whichever way you shake or turn it, it's always in the way. all i want is the damn buffer. now what is meta data, actually? it is the code to decode the bits. so, why not dump some real, proper code in the meta data? ta-da: packet forth. if you save an image in internal format, the file will contain something like this:

#(image/s8/100/100/1 80000)

whenever a '#' is encountered in the input stream, a parser command is executed. currently there is only one, which is '#('. this reads a list containing the type of the packet, and optionally the number of bytes following in the stream. note that data can be aligned by adding a number of spaces before the closing ')'. the default number of bytes is the total data size of the packet. this works only with 'pure packets', i.e. those that really are a sequence of 'dumb' bits. now, if i load this file using

"image.pf" load

i will get the contents of the file on the data stack, being an image packet. the nice trick is that i can just edit the file and add a command after the binary data, i.e.:

#(image/s8/100/100/1 80000) image:double

and do the same again, and i get a double-size image when i load it. so pf data files are 'electric', because they are really code files. now, this has quite some security implications, but that's not what it's about here. it adds so much flexibility if you keep your data files as code. i could just have something like

"otherfile.pf" load

without anyone ever noticing the data comes from another file. anything can be plugged in there, of course. it's just code. so meta data can be very broad. there's a basic type system, and a lot of conversion and processing logic, so you can emulate almost anything. it's really easy to save a data atom to a file and append some symbolic processing code to it. the nice thing about this is that you can use the pf 'file format' in other applications and have this kind of magic happen. again, terribly insecure, but worth having as a tool in the box.
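putting the pieces together, a sketch of such an 'electric' file (the file name is made up, and the payload is elided; image:double and load are the words used above):

\ processed.pf: header, raw payload, then code that runs against
\ the freshly parsed packet when the file is loaded
#(image/s8/100/100/1 80000)
...80000 raw bytes...
image:double

\ at the console: the result is the already-doubled image
"processed.pf" load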
Entry: mapping and drawing -- are packets mutable?

yes and no. that's politics. standard packet operations (maps) perform copy-on-write, so each operation has only a local effect. no packets are mutated. this allows for a clean functional style of programming with minimal copying overhead. for incremental updates (i.e. storing a single pixel) this is crazy. so we introduce a hack in the form of the 'accumulation packet', aka 'bucket'. this captures the 'drawing' concept, next to the already present 'mapping' concept. an accumulation packet is not a separate type: any packet is assumed to be an accumulation packet for functions like store, draw, and any other that fits the shoe of 'partial mutation'. so you have to be careful. most of the time it is clear whether you are dealing with muting or non-muting functions. index! is such a mutator. rendering text to an image is another one. only one thing to remember: if you use muting operations and you must operate in a functional framework, you need to 'reserve' a packet. this will ensure it is local, by performing a copy if necessary. something like

: deadbit reserve dup >r 0 123 r image:index! r> ;

would behave as a functional map. the original, if used elsewhere, will not be modified. in fact, some functional image processing primitives are implemented with reserve + a muting operation. this is of course just the eternal call-by-reference / call-by-value debate. for standard (functional) usage a packet behaves as an rvalue (implemented as a reference to a shared readable object. lisp style.), for accumulation packet usage it behaves as an lvalue (implemented as a reference to a shared, writable object. forth style.)

Entry: exceptions

exceptions in pf are implemented using a 'try .. recover .. endtry' sequence. 'try' installs an exception handler on the return stack, which consists of a marker (DS,RS depth) and a return point. if an error occurs (either in a primitive, or from threaded code using 'throw'), the code after the 'recover' word is executed. before that, the stacks are rewound to the state before 'try', and the exception (either an e_ style primitive error or any object passed to 'throw') is placed on the data stack. note that the stacks are rewound to a certain depth, but that does not mean their contents are specified. this can lead to strange situations when a lot of data is consumed from the data stack or rearranged. things you put on the return stack will still be there, but the data stack contents are unspecified. depending on what you are doing, it might be trivial to know what is on the stack. on normal execution (the part between 'try' and 'recover' did not cause any errors), the error handler is removed after completion and the code after 'endtry' is executed.
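a minimal sketch of the mechanism (the word name is made up, and it assumes '/' throws a primitive e_ style error on division by zero):

\ ( a b -- a/b | 0 )
: safe-div
  try
    /
  recover      \ DS is rewound to its depth before 'try',
    drop       \ with the exception on top: drop it...
    drop drop  \ ...and the two rewound (but unspecified) operands
    0          \ fall back to 0
  endtry ;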
Entry: cross-compiling

[ before reading this: cross compilers are best written using dictionaries and a modified search order. that's the standard way, and it seems to work well. (badnop uses an ad-hoc name mangling approach which is really non-standard, but has its benefits.) the text below advertises pf for writing these compilers, while the author does not follow his own advice. ]

or: how to write a forth compiler in pf for all kinds of target (virtual) machines. first, there are some problems. pf is not standard forth wrt symbol names. in addition, there is special syntax for lists, strings and pure packets. this could be fixed by adding a 'plain forth mode' to the parser, but so far i have found no need for such a thing. it seems to me the only reason you would need this is when you want to use standard forth code verbatim, which is a bad idea anyway. in most cases the platform is 'exotic' and everything will be built from scratch. using pf to define the 'subforth' is very convenient: lots of tools are available: basic forth, symbols, lists, strings, the unix interface, ...

the general setting is this: pf hosts all the symbolic stuff using its own dictionary. the target forth only has threaded or native code in its memory buffer, and all communication between the 2 uses target execution tokens (i.e. code field addresses) and literal values. this keeps the target forth very simple. the host forth needs to change behaviour to make this happen: in target mode, which is implemented as a child dictionary for scaf, compile and interpret work differently, or ':' and ';' are redefined. the execute semantics of a target word (its definition in the host dictionary) is something like

: word target-xt target-execute ;

note that target-execute can do all kinds of magic, like talking to the target over a serial line or midi (distributed forth). the compile semantics ( xt -- ) is like

: word, xt>body follow @ target-compile ;

(follow is there because the literal is prepended with 'lit'.) so plugging into the 'compile' and 'execute' channels effectively enables one forth to be connected to another, with most of the host's functionality present. this, my friend, is a very powerful tool. so, what is forth? a way to connect machines. forth is a communication protocol. most problems with data/stream processing can be solved in a very elegant and usually efficient way with state machines. split up the problem by building a machine that reflects the problem domain. this can be done in software, but also in hardware. for forth, there's really no difference. build as many machines as you want, and program and connect them all in one language. see plugins/scaf/scaf.c and script/scaf.pf for more information.

Entry: forth again. and code and packets and objects.

yeah. it's a cult. i had a serious forth fit yesterday, realizing the true nature of forth. it's really nothing more than the simplest possible way of getting your code to run on anything that behaves as a state machine. because of the single-pass nature of symbolic and threaded forth code, and possibly some bytecode and token intermediates, communication between forths is really straightforward. because forth is so linear, some things are hard to do. it's not a lisp. (yeah, another fetish.) this is its only weakness, i think. so: reasons. explanations. memeplex cleanup.

GOOD:

* forth is easy to write from scratch.
* forth is even easier to write from scratch in forth.
* forth is an interpreter and a compiler at the same time.
* forths can be chained. compilation is one-pass and so chainable. (coroutines, pipes)
* forth is extensible. no artificial border between language and program.
* forth code is very dense. there are no named intermediates.
* forth has no real policy other than stacks and words.
* this makes it behave very well in real-time conditions.
* forth supports both prefix and postfix message protocols natively.
* pf is a 3-layer forth: text -> sym.tokens -> addr.tokens

BAD:

* forth is not scheme.
* forth code is very dense. there are no named intermediates.
* forth avoids non-linear (circular) data structures to avoid gc, which can be a terrible pain sometimes.

[EDIT: funny to read this. serious meme high. i think i got a bit better at naming the concepts though. the good thing about forth is that it's almost completely concatenative (in a functional programming way, with implicit composition). the bad thing about forth is that this is done in an 'optimal' way, where optimal is mainly defined as minimal interpreter complexity: in other words, tricks which are not entirely conceptually clean wrt functional composition are allowed, as long as they are simple to implement. the main example here is macros, which are predominantly used to introduce 'local nonlinearity'. an example is if .. then .. else, which could be solved more cleanly using higher-order functions, but in forth is just a syntactic trick to compile conditional jumps more efficiently. see 'if' below.]
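to make the 'syntactic trick' concrete, here is the textbook way if/then can be written as immediate words in a classic forth (a sketch, not pf's actual implementation; 0branch is the usual conditional-jump primitive in the literature, 'here' the dictionary pointer, ',' compiles one cell):

: if    postpone 0branch here 0 , ; immediate  \ compile jump, leave a hole
: then  here swap ! ; immediate                \ patch the hole to jump here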
now, regarding this packet thing. this is really nice, because it separates state from functionality very clearly:

* code is just code, and has no state other than constants.
* data is just data: raw bits with some interpretation tag.

(ahem.. code/data duality.. this is triggering too many associations.) on the top level (or any level), you chain code and data together. this separation, between stateless functions and data in a form that is raw and white-box, is kept as long as possible in packet forth. this approach is much easier than objects, and has several benefits: both are serialized easily. and that's a key point. saving and loading code is easy, because it's just a sequence of words. saving and loading data is easy, because it's just a sequence of bits with some type description. both can be present in the same stream, making forth an extensible communication protocol. so load/save for all code/data in packetforth is trivial, linear and what not. you can send these things over all kinds of channels, in all kinds of forms (internal symbolic token rep, internal compiled rep, text rep, ...). the thing we need to give up here is abstract objects. sorry. support for abstract object types in packet forth is not very good. it does not fit the packet concept, so i decided not to lose too much time on it. abstract objects cannot be serialized in packet forth, and are just there to abstract things that cannot be made explicit as functions and packets. currently they are used for i/o streams and other i/o adaptors that benefit from an object representation (because they have state invisible to pf), or that are already implemented this way. this is the case for most wrapped libraries.

[EDIT: this got a lot better with the 'serialize' word invented during the huge non-blocking rewrite. raw packets can be serialized like all other data, though not in an endian-independent way.]

Entry: packet forth is for insecure programmers

haha. forthers are weirdos. some of them are really obsessed with eliminating bloat. not that i don't like that, but for packet forth this is not necessary. i don't consider pf bloated, but compared to an ordinary forth it's a lot less 'hard core'. [EDIT: getting more hard-core by the day :)] the problem with real forths is this hard-core-ness. i've fallen for it too, but maybe it is the only way to truly understand forth: to go to the bottom and take out everything that isn't necessary. especially to take out the problems you created yourself by adding 'generic features' that you don't use. forth is really about specialized hardware and software. that's the main reason why the ANS standard is a joke for most purposes. forth only has an edge if speed, responsiveness, flexibility and code size are all required at the same time, and everything needs to be written from scratch.

[EDIT: i'm just parroting here, but there's something to say about hard-core forth: hard-core simplicity is really addictive. it's a puzzle with challenges you don't find in other languages, because people seem to have stopped caring about these issues. i got really into this with the 8-bit pic forths: very non-standard, full of hacks to make code small and fast. PF is indeed a bit less extreme here, since pure speed at the VM level has never really been an issue. i just finished 'meta math' by chaitin. it talks about these ideas a lot, but from a more mathematical and philosophical pov.]
for most of today's problems, bloat is ok. bloat is evolution at work. evolution leads to dirty but working solutions. unix is just that: a bunch of tools kludged together. unix used to have the same 'minimal' philosophy as forth has, but that time is gone. most linux systems are bloated as hell. considering what unix has made possible, i do not consider this a bad thing.

[EDIT: for forth, the evolution goes on in the head of (mostly) a single developer. there is a lot of bloat, but most of it is in the form of volatile thought. once done, the solution is more elegant than larger distributed projects like unix systems. maybe forth is more about the size of human brains than anything else.]

for some people minimalism works out. look at ColorForth. CF is a piece of art. i don't think it has a single word that can be removed. it is a complete self-hosted dev env running on the bare metal. i bow to ColorForth. i bow to Chuck. he figured it all out. his ideas, especially the hardware ones, are guerilla warfare. it is really about bringing power to the people. and that's about the same as what i want to do with packet forth, be it in a little more modest way. i'm not into designing chips yet, but who knows. [EDIT: crackhead. though i still bow to chuck :] ColorForth works for the things it is designed for: writing stuff from scratch to eliminate all possible bloat and to tailor it to the hardware. in fact Chuck has done this to the extreme: half of CF is the hardware (25x and 50x chips). pf is a little less radical. it embraces forth and unix at the same time. think about it: both are really the same. unix is a collection of loosely coupled programs. forth is a collection of loosely coupled functions. unix is an operating system. forth is an operating system. the only real difference is that unix is big and bloated, and evolved into a standard. it is very 'concrete'. forth is more of an idea. a design pattern. because it is not standard, it can stay in motion, which keeps it lean and mean. note that both solve almost the same problem in 2 different ways:

* unix is a grown standard. standards have this effect of introducing bloat because of backward compatibility: there are versions (library hell) that can only be effectively solved by duplication. this is on the binary level (debian). on the code level a lot of bloat can be eliminated though (gentoo), but there are still problems with syncing different people's efforts. you know, 'bazaar' and all. unix works because it is a solution to a real problem: providing a standard base for people to work together on large projects. look at GNU/linux and all other open source efforts: most of these things can work together, which is an achievement in itself.

* forth is a pattern. it can be used to solve a specific problem using design and not evolution. well, evolution is possible of course, but because of the absence of 'standard' issues, bloat can be eliminated the moment it pops up. this is not specific to forth, but still, the analogy to unix is striking. both have the same sort of modular approach (variables <-> files and words <-> programs) and a 'manual' for how to port it to any architecture.

now, what is packetforth? it is the combination of unix and forth. packet forth takes functionality from unix, in the form of system services, code libraries and other programs. it can talk to these. in addition, its virtual machine is more high-level than an ordinary forth: it uses lists instead of raw memory buffers, and has a built-in type system tailored to support media processing. but it is still a forth, and can be made to host or tether other forths. so it's a win-win situation. it is integrated into the bloated unix world, so it can piggyback on that, and it is a forth, so you can do anything you would use a forth for: compiling stuff from scratch for other architectures, and connecting them to the overall system. i use this mainly for hosting other virtual machines. i've discovered that most things related to data processing can be solved in a clean and usually efficient way by using virtual machines to decouple high-level and low-level code: to create an extendable and interactive system that is still fast.

[EDIT: note that this project has moved mostly to the scheme/cat based brood system, since PF is considered finished and therefore i can't break it any more with deep changes.]

this is an approach used in other fields too. most high-level languages these days compile code to bytecode that runs in virtual machines, to solve portability problems. and there is the parrot project that aims to be the generic scripting language vm. so it really is a sign of the times. but combined with forth, a vm enables a lot of cool tricks. thinking about it, i could have written packet forth in forth, which would have made the lowlevel-highlevel distinction a bit less prominent. (in short: there is too much C code that would be easier to implement in forth.) the problem with this is a practical one: when i started, i didn't know how to write a forth in forth. so there you have it.
bloat generated by evolution, and the practical limitation of typing speed, etc. anyway. packet forth works. i like what it has become. it has a reasonably high-level feel which is a bit like lisp. it can do things a bare forth cannot do, and it is fully integrated into the operating system, so i think i'm going to live with the little bloat that's left. can't rewrite forever. [EDIT: funny, since what i did next is a big rewrite :] it's my forth. and i've learned that this does not mean it's necessarily yours. though i think the ideas in here are provocative enough to make you experiment :)

Entry: using the return stack

packet forth does not have local variables. this is a good thing. but there's the return stack. during the execution of a function's body it is not used, so you can use it to store atoms that are in your way. i've found this to work very well. if you look at my forth code, you'll see i don't use 'swap' and 'over' a lot. i use the return stack: '>r r r>'. the main reason is that it is more visually distinct: the fact that you need to balance >r and r> gives code a little more texture. 'r' also behaves as a local variable. usually i store the current object in r, i.e.:

# ( zzz.object -- )
: zzz:magic >r r zzz:magic1 r zzz:magic2 r> zzz:magic3 ;

the fact that you NEED to balance r actually makes it more, ehm, user-friendly :) i.e. if you make an error against this, the code will immediately fail. real forths crash when you do this. pf does not: it just throws a type error if it does not find a valid return address on the return stack. moral: implicit arguments are fun, but don't lose your head with 'swap dup drop over'. if you see a lot of those words interleaving your code, try to find out what you really want to do and chop it in pieces.

some words of caution though. pf does not have an explicit control stack [EDIT: which would be yet another global variable, getting in the way of multitasking and fast context switches], so some words use the return stack for this. examples include:

for .. next
try .. recover .. endtry

(hint: define a word using these and look at the code using .xt) control words that do not use the return stack, and so can be freely interleaved with >r r r> and other words that modify RS:

if .. else .. then
begin .. again / until

you can always create new stacks of course. you can even implement automatic cleanup of these if you install a cleanup handler on the return stack. the only thing to be careful about is that jumping to raw threaded code (what 'if .. then' and 'enter .. leave' do) and executing an xt are not the same thing.
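a sketch of the failure mode (word names made up):

\ balanced: r serves as a read-only local during the body
\ ( thing -- )
: ok      >r r> drop ;

\ unbalanced: on exit, pf finds the stored atom instead of a valid
\ return address on RS and throws a type error; a real forth crashes
: broken  >r ;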
Entry: contexts & stacks

[EDIT: most of the stuff below is obsoleted by the non-blocking rewrite. PF now has an OS-friendly cooperative multitasker which presents a blocking IO interface to forth tasks, and uses non-blocking IO in the implementation, so that blocking in the PF<->OS link is considered an error, except at the 'select()' point.]

forth should be simple: 1 (system) thread, 1 dictionary. 1 forth per processor. no smp. but pf is about integrating into unix, and at this point in time that means threads. threads suck because they are unpredictable. it is a gigantic hack actually, to have processes with shared memory. but there is blocking io in unix, and threads are an elegant way to deal with that. let's leave the politics aside for a while. how to support threads? no. first: why threads are bad for pf. forth operations are not atomic. when you design a forth as an os, on a single-processor machine, you can avoid pre-emptive multitasking and use cooperative multitasking (coroutines). the big advantage of this is that atomicity is under your control. the only problem is interrupt handlers, and depending on the machine, you can usually make a subset of the forth words atomic so you can use those in an interrupt handler. threads are bad for pf because pf's data structures (lists) are not thread-safe. this is largely a design error that is too hard to correct at this moment. in addition, forth words are implemented as C functions, and during the execution of a C function some invariants may be temporarily invalid. this could be solved if the data structures were thread-safe, though. so, the short story: correcting this is not possible without a complete rewrite of about every line of code. moreover, i doubt it would really add a lot of value, because it is not too hard to implement extra synchronization constructs (queues, stacks) that are thread-safe, and that would effectively solve the problem. so the current approach is this: threads are only used for system i/o drivers. the forth itself is single-threaded, and will stay like that. all other things can be solved using processes (private data/code per forth), with shared memory and pipes for communication. [EDIT: i probably still need threads when i want to use libraries that do blocking IO, or use external processes, as is the case for mencoder.]

Entry: reinvent the wheel

yes. it is important. reusing code is a joke. don't go there. for some things it is really necessary, like reading quicktime files or interfacing to graphics hardware, but if you can: do it yourself. use an expressive language. be independent. it is the way to enlightenment.
Entry: symbol vs xt in dynamic tasks

[EDIT: the original task code needs to go and be replaced by the new one. this will be obsoleted soon. task-local words (true objects) might be necessary.]

forth can be convoluted. packet forth maybe more so. the problem i ran into is state, again. in the original opengl.pf demo file, there are several tasks, and state for each task is kept on the return stack. is this a good idea? the problem is not necessarily coroutine state (where are we in the code) but objects. the demo uses smoothers: one-pole lowpass filters with embedded state. the original approach is to create a symbolic word. this is done by feeding symbolic code to the interpreter. see above for more ramblings about this. i'm still not sure how to handle this 'pattern'. the problem with feeding stuff to the interpreter is that once you start, you have to continue this way, because forth is very linear. another problem is that tasks are really dynamic objects. as far as i know, the 'standard' approach in forth is to create static tasks. then it makes sense to define symbolic words for them. with dynamic tasks, the words need to be cleaned up. here an object-oriented style is more appropriate. so, what's the final word? either use a separate dictionary for each task, and choose either symbolic word creators (create .. does> constructs combined with instantiation through list-interpret) or xt creators. a task then really is an object: it needs to be destructed properly. the other option is to use functional programming, with a clear separation between code and data. data can be put on the return stack, and code can be reused. it would be very nice to define links on the return stack that are cleaned up automatically. i still don't know if this is a good idea, but at least it is an option.. [EDIT: this looks like the only way to go. it's not completely safe (pointers), but i think that's ok. no nanny pf.]

so, conclusion: there will, at least in my mind, always be a conflict between the object-oriented (or dictionary-oriented) style and the functional programming style. [EDIT: this has a name.. i forgot. something about making either data (OO) or code (FP) easily extensible.] i need more time to stabilize my opinion, but it sure feels like functional programming with data structures in lists is a lot better. then, following that thought, we end up with lisp again. the more i work with pf, the more i really want a lisp :)) [EDIT: see cat/brood]

Entry: packet forth link structure

[EDIT: i don't really like this.. but it's a deep cut to change it. related to the typedescription == opcode idea, and to letting code be any list instead of only 'blessed' lists.]

all code in pf is stored in links. a link is a list with a certain structure (see above). a dictionary is an abstract object which contains a list of links and a pointer to a parent dictionary. [EDIT: now there is only one global dictionary. no parents. this is faked at compile time only, by splicing off a part of the global dictionary into an isolated object after the code has been bound.] this structure is more complex than ordinary forth, where the dictionary is just a linear array. because of this, allot works differently. the dictionary can be accessed atom-wise (i.e. using , ) or link-wise. this gives a lot of freedom to shuffle code around. as long as links are in the dictionary, they can be accessed symbolically. to come back to the temporary function subject: what this can be used for is to build some kind of code structure on the dictionary stack, and then, when it is finished (compiled or 'connected'), move it somewhere else, for example the return stack. this makes me doubt again whether or not dictionaries should be abstract objects... but let's not go there. there are infinitely many things wrong with pf that could be done a lot simpler.. maybe pf2 will do it that way :))

Entry: tailcalls: return stack juggling and abstraction

of course, it is safest to do this with macros. but it is possible to do it with ordinary words too, which could lead to more compact code. then the only problem is tail calls.. you never know. so i'll leave it at macros until i understand how to do it.. this has implications for everything dealing with the return stack, especially exceptions and for..next loops. anyway, this tailcall business needs to be examined. probably it's much simpler than i think. [EDIT: yes, but the threading needs to change to a simpler tokenized (type tag) call-threaded forth.]

Entry: create vs. link

again, related to temporary code. 'create-xxx' is really a parsing word, and using parsing words in functions is really a pain. so it's best to also have non-parsing 'link-xxx' words that create a named link (name from DS). in short: use 'create-xxx' to build static things, and use 'link-xxx' or 'make-xxx' to build symbolically resp. xt-referenced dynamic things.
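the convention in a sketch, with a hypothetical create-buffer / link-buffer pair (the 'scratch symbol notation is also just illustrative):

create-buffer scratch                    \ static: the name is parsed from the input
: make-scratch 'scratch link-buffer ;    \ dynamic: the name comes from the DS,
                                         \ so this works inside a definition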
Entry: dictionary

i took out the dictionary object today. this means no more C-coded hierarchical namespaces. everything is reworked as an allot stack. it can take links (circular lists), which are used to store code. in addition, it can be used as an ordinary stack. the effect of the dictionary object can still be had by swapping in/out some other list containing links, similar to temp code. [EDIT: temp code is no longer implemented... given a nudge or two to the module loading code, this could be revived using the 'mark' words.]

Entry: dictionaries, objects and binders

hierarchical namespaces are removed, so objects etc. need to be implemented differently. this is done using the word 'mark', which tags the dictionary, and some 'mark>xxx' word, which builds a structure from the marked part of the dictionary. currently there are:

word          behaviour of created word
mark>buffer   variable
mark>dict     find
mark>object   find execute
mark>binder   accept find execute/compile

this can be used in conjunction with tasks and coroutines to have temporary code stored on the return stack of a running routine. see script/scheduler.pf

[EDIT: the first thing that's actually still alive. used in a lot of code where a local namespace is necessary to reduce typing, but where it would interfere too much with the global namespace. an example here is the video players/recorders, which have a very narrow default interface, but a lot of tuning functionality.]
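a sketch of the local-namespace pattern (assuming mark>object parses the name of the word it creates, and that the created word takes a symbol to find and execute; the exact interface is not pinned down here):

mark
: start 1 ;           \ words defined after 'mark'...
: stop  0 ;
mark>object player    \ ...get packed away behind a single word

'start player         \ find 'start' among the marked words and execute it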
Entry: attributes

Now, on to ease of use. the nice thing about forth is: 1 word, 1 effect. This can get rid of a lot of red tape while programming: each thing has one default effect. This is not compatible with objects though, where there are several things an object can do, identified by a name for each action. The dictionary objects and binders above can do that, though in most cases it is easier to just do it the forth way. This is very much like the way files are implemented in unix: each file has some standard behaviour (read and/or write), but in addition other attributes can be modified through IOCTLs. All attributes are stored in a dictionary object, which behaves as a class object. Each representative word stores a dictionary of attributes. What problem does this solve? For instance, setup of i/o modules. It gives a nice trade-off between object-oriented programming and forth-style one-hit. It could be implemented using does words: the first item in the body is instance data, the second item is the class (dictionary find xt):

msg grab format!

Now, could we store this in the link by default? Maybe it's wise to implement a word differently:

meta: link->first (name compiler finder)
body: link->last (cf xt xt xt xt)

this would still enable fast execution: xt = pf_atom_t -> pf_list_t

update: attributes are a bad idea, but the double link structure might actually be a good idea.

type symbols & garbage collection: what about this: type indication should be symbol-only. xt's are just atom pointers pointing to a cf. polymorphy should be handled by this symbol mapping.

Entry: too many stacks
Date: Thu Dec 7 13:52:41 EST 2006

what about dynamic scoping for local context? (common lisp's 'special') in scheme there is no such thing, and in common lisp the default is now lexical scoping, but in forth it might make sense, because there is no lexical scope, and implementing one would require a whole lot of other changes. what i mean is: we have several auxiliary stacks: input, output, abort, (dictionary,) ... which all need some synchronization with the return stack. implementing tasks and coroutines becomes harder because this context needs to be saved too, which does not happen at the moment. so it might be wise to take out these stacks and find a mechanism to put them on the return stack. what about a searchable attribute list on the return stack? this could help to implement a whole lot of local context things, like i/o, local variables, abort handlers, ... the mechanism would be to have an extra type, a_context: a symbol followed by a data atom.

so.. i went out and tried to change this, only to find out it is a big mess (the input/output stacks). so for now i stick with manual synchronization. there seems to be no way to solve it in general.

loading inside generators/coroutines: i ran into this while writing a sequencer for a psychological experiment. this is possible, but it can only perform one script at a time. in short: there can be only one open file at a time, because the input/output stack is global. the console is not a problem, because it is pushed/popped on the stack. should this be task-local? (see above: attributes) probably not. the way it is now (dynamic scope, if you please) is ok for most purposes. if you need multiple script inputs, you can always use stream-read-atom.

Entry: polywords again

to simplify the code and possibly speed things up a bit, it might be wise to chop polywords up into 2 steps:

* first, execute the scalar word. this does the multiplexing based on the numerical type description, and so will be a bit faster. it is built into the kernel and doesn't add bloat to the forth code.
* then, execute the packet polyword the way it is implemented right now.

scalar processing is built in, so scalar processors will be primitives. the exception mechanism is not necessary to execute the polyword xt, which could be stored in the field after the codefield, i.e. directly in the body.

Entry: coroutines/generators/tasks

it seems that, the way i use generators/tasks, it makes sense to distinguish between switcher words (which swap the return stack) and body code, and to explicitly assign code to the switcher using 'start'. it is left to the user to build more convenient starter/restarter words on top of this. when this xt calls other xts, a 'nested control structure' can emerge. as with other control structures, this should balance the data stack over entries and exits, since the data stack is a shared resource and is not saved in the task structure on a yield. to name things: within the control flow of a single coroutine, there can be 2 types of xts:

* normal xts: do not call yield, and do not need a balanced data stack.
* control xts: can call yield/transfer or call another control xt, but the data stack needs to be balanced between entry and exit (which can be 2 different xts), or equal to the desired yield semantics.
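a sketch of a control xt under these rules ('yield' is the word named above; how the body gets bound to a switcher with 'start' is left out, since that interface isn't described here):

\ an infinite generator: each pass leaves one value on the shared
\ data stack and yields. the +1 effect per yield is exactly its
\ 'desired yield semantics'.
: counter 0 begin dup yield 1 + again ;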
Entry: abort vs exceptions

i'm gearing towards taking out abort, since there are too many occasions where it can lead to quite awful bugs by not restoring some context. this would eliminate the abort stack. in the same spirit, it might be interesting to also eliminate the i/o stacks and use only 'current' variables. there's a problem with this though: exceptions are saved on the return stack, so abort serves the purpose of a 'super-exception'. not trivial to take out. should check how this is done in gforth. currently there's a problem with the console and the coroutine/generator implementation, which swaps out the return stack.

Entry: ctypes

what are they? void pointers. do we really need them? no. so i'm starting to eliminate them, starting with x11 and streams. this means there's a new definition of what a packet really is: a generic object, with the constraint that packets cannot cause circular refs (this messes up the refcounts, and with them the general forth tree-like structure). i only encountered this when the dictionaries were still packets (they could contain themselves). so non-pure packets can be wrappers around generic objects.

Entry: streams

these need to be handled differently. what is a stream? a port that supports the following methods:

- getc
- ungetc
- vprintf
- write
- read
- close

current streams are strings and files/pipes. this is changed so that a stream is now a packet. string streams are still a bit of a hack though.

Entry: revision

after a couple of months working on another project, i came back to pf with mixed feelings. on one hand, i really like the forth. some things are still a bit complex, like the abort stack handling vs exceptions, but still, it's nice, and modular. but the C core i really dislike, especially the verbosity of the plugin argument handling. this needs to be cleaned up. the code is full of sequences like:

check packet (PF_STACK_CHECK)
get packet (s->first->next..->w.w_packet)
get header (pf_packet_header)
get subheader (pf_packet_subheader)
check something in the subheader, usually a subtype descriptor
clone the destination packet from header->description or some information in the subheader

this needs to be simplified. i do it locally using macros in most places where it's too absurd to handle, like the image processing words, but there has to be an easier way to make the transition from forth to C functions, avoiding a lot of typecasts.

Entry: parrot

been a while again.. i should paste in the other text i wrote about a supercollider-like design, desiredata, and lisp. it's clear to me now that what i originally wanted to do with packet forth is not possible in its current form: do memory management correctly. currently it's reference count based, which is nice, but it either necessitates manual collection for circular references, or abolishes them. it's good for my ego to know Python started out like that :) what i see now is that optimizing for memory re-use, which is so vital for obtaining speed in video processing, is largely incompatible with mark/sweep garbage collection. however, a system can be built (think supercollider, desiredata) where the central server has a 'fixed' dataflow structure and the controlling language is very flexible (scheme, or some lisp/forth hybrid like the one i'm trying to build). in such a clearly split design, the client can really be anything, and the server can be fairly simple: a plain refcount-based forth. decisions then need to be made as to where exactly storage and functionality are implemented. an interesting problem, and as far as i understand, the only path to completion of my 'vague' idea about pf.. now, since parrot is nearing completion, it might be interesting to build the gui/client language on top of it.

Entry: objects again

an example of a pattern that occurs a lot:

* grab: get a video frame (the camera is abstracted as one word: READ)
* send a control message to the camera

what you would normally do is "camera grab" and "camera control". what i suggest is something like unix ioctls: i introduce a word 'control' which returns grab's underlying object, to which other messages can be sent, instead of the pre-bound 'grab'.

: <'> xt>body @ ;
: [control] postpone literal postpone @ ; immediate
Entry: parrot
Date: Wed Jan 18 17:12:13 CET 2006

Larry Wall has sexy ideas. http://dev.perl.org/perl6/talks/2000/als/larry-als.txt

[Perl as a Low-Level Language] Perl as a low-level language. Polymorphism is your enemy if you're trying to do low-level programming. If you want to get early compile-time binding as soon as possible, you want the compiler to spit out very efficient code, so you write your loop to declare i as an int, then by golly you want your compiler to spit out very efficient C code. ...

[Perl as a High-Level Language] By way of contrast, if Perl's going to become more a high-level language then polymorphism is your friend. You want to delay your binding as long as possible as to what implements what. Perl has always been in the business of allowing, but not requiring, abstraction. We'd like to put in more support for functional programming, for logic programming, and for what are called "little languages".

[Perl as a Metalanguage] The folks at Bell Labs invented this notion, soon after they invented yacc. "Cool, yacc lets you make a grammar for your own language". So all these itty bitty languages sprang up that were for their own purpose. So each time you wrote a new program, you wrote your own language for your program. This was cool except that you had to learn the new language each time, and it was always different. We'd like to explore the notion of using a big language as a little language. If it's okay to program in a Perl subset, and you define a subset that looks like a little language, then how do you get around paying the price of the generality of the big language? We have ways of thinking about that.

-- that's something to think about. if parrot is going to do all these things, then parrot is what i need.