A ``primitive oriented'' scripting language core. Libprim is a collection of primitive objects and operations written in C, mostly intended for writing embedded audio/video applications. These objects are designed in such a way that they can be used stand-alone in a C application, but are easily ``lifted'' into a more expressive data/code model for prototyping and debugging. Libprim constists of 3 main layers: * LEAF: C objects (bytes, symbol, port, parser, tuple, ...) with manual memory management. * EX: graph memory with automatic asynchronous mark/sweep GC, i.e. a Scheme memory model. * PF/SC: VMs implementing two execution models using the EX data structures. SC is a standard Scheme tree-walking interpreter. PF is a linear stack language (a Scheme/Forth hybrid). Entry: A bit less cryptic Date: Tue Aug 4 11:14:36 CEST 2009 Practically, this project is about giving ideas implicit in Packet Forth[1] a new home. For day-to-day problem solving in embedded software projects, C is not going anywhere because next to a programming language, it is also an interface standard, both for commodity machines and commodity programmers. Instead of focussing on building new languages, I think it is worthwhile to do some engineering on with subsets of C itself, by providing an infrastructure that makes it easier to bootstrap ``functional, concurrency oriented C code''. Hopefully this idea will become easier to articulate once the implementation is done. The goal is this: * Write a re-usable collection of C primitives necessary to support a dynamically typed scripting language, but do this in such a way that it does _not_ depend on external memory or control models. * To test the primitives and the programming method, write a couple of dyntyped scripting languages (memory and control models) on top of the primitives. I'm thinking about two: a PF-like concatenative linear stack language and a Scheme-like CONS/CONT/GC based language. * Write a compiler for static memory allocation and scheduling to optimize away the scripting layer. * Use a data-flow architecture (inspired by Oz). Static types? Given that the goal of the project is _prototyping_ which is mainly _evolving specification_ I think it's better to stick to a simpler dynamically typed approach. [1] http://zwizwa.be/packetforth Entry: CEKS machine Date: Tue Aug 4 11:25:38 CEST 2009 Scheme implemented in terms of a CEKS[1] machine. Focus on the evaluator first and ignore GC. Some CEKS machine for Javascript[3][4]. It might be easier to start with the CEK machine, and add the store and GC later. Anyways.. I started with the GC. Bottom up seems better. It already gives a way to encode types. [1] http://www.cs.utah.edu/plt/publications/pllc.pdf [2] md5://3ab517a1167f86ef52caf3bf1a4fdb0c [3] http://wiki.ecmascript.org/doku.php?id=meetings:dave_herman_presentation [4] http://www.ccs.neu.edu/home/dherman/javascript/ [5] http://www.cs.utah.edu/~mflatt/past-courses/cs6520/public_html/s00/secd.ps Entry: Lightweight tasks Date: Tue Aug 4 12:20:48 CEST 2009 Based on the interface present in GNU PTH[1]. [1] http://www.gnu.org/software/pth/ Entry: Primitives and Composition, an OS? Date: Tue Aug 4 12:25:20 CEST 2009 Any form of language design is based on building primitives (axioms) and composition mechanism (inference rules). Is is a stretch to say that an Operating System is actually a composition mechanism? If tasks are primitives, gluing tasks together is what the OS does. Entry: Aspect oriented programming Date: Tue Aug 4 12:29:47 CEST 2009 The idea I'm trying to encode is the addition of join points to a C program: make all these points available to an observer (scripting language). [1] http://en.wikipedia.org/wiki/Aspect_oriented_programming Entry: Garbage Collector Date: Tue Aug 4 14:53:52 CEST 2009 I've added a simple copying GC in ceks/gc.c for allocating vectors, as a memory model for implementing the Scheme interpreter. It uses tagged objects: vectors, atoms, integers, and one available user tag. The vectors themselves can be tagged (i.e. to implement a couple of primitive structs) when the maximum vector size size is limited. [1] http://www.brpreiss.com/books/opus5/html/page426.html Entry: Symbols Date: Tue Aug 4 17:58:47 CEST 2009 Instead of using a hash table, a preliminary implementation could just use a stack. This will make parsing slower, but can be easily changed later. Entry: Continuations Date: Tue Aug 4 20:32:34 CEST 2009 So.. ISWIM uses n-ary primitive application, but unary abstraction and application. Why is this? It seems that it makes the extension of the environment on application easier: only a single variable has to be added. Maybe this isn't necessary for Scheme. A multivalued application adds all variables at the same time. Instead of arg,fun,opd as explained in PLLC, I needs just a single type that evaluates all variables in an application from left to right, starting with the function position, and then performs either an application (extending the environment) or a primitive evaluation (production of a value). So how to represent (x x x _ x x x) where "_" is the hole? In short: what does the CEKS machine do? It converts the current expression and its continuation in a simplified expression and an updated continuation. It is _application_ (triggered by the availability of the last argument) that _pops_ the K stack. So.. Representing continuations. (((f a b) E) K) -> ( _ (((f E) (a E) (b E)) K)) -> ((f E) (((a E) (b E)) K)) Entry: Objects and Lambdas Date: Wed Aug 5 09:43:14 CEST 2009 What I'm trying to accomplish probably has a lot of parallells with COLA[1]. However, my concern is mostly interoperability with C. Anyways. GC and objects. It might be wise to require objects to exhibit a class structure: each object points to a record of methods, where the first couple are used by the GC. [1] http://piumarta.com/software/cola/ Entry: Confused about primitives Date: Wed Aug 5 13:14:07 CEST 2009 This is ironic: the goal is to properly define primitives, but I'm already getting hosed! Once conclusion to draw: there is a set of C primitives that is best written directly in the required API (sc ,object, ...) -> object instead of in unwrapped C, because of the tight coupling with the data representation. These _direct primitives_ are written manually. In short: to keep things simple, the interpreter is written in terms of primitive functions using all scheme datatypes (ala tinyscheme). What cannot be avoided however is functions that convert between scheme and C values (esp. strings<-> and ints). it's best to limit the use however, and try to use as much as possible real scheme primitives to avoid duplication (don't give in to premature opti here!). So, let's try it: All C functions used in the implementation of the Scheme interpreter are Scheme primitives. This fixes the primitive calling convention to: object fn(sc*, object, ...) This will raise other problems since I'm not sure it's possible to have variable argument invokation (assumed from Pure Data source). Entry: Representing constants Date: Wed Aug 5 13:54:45 CEST 2009 I might be useful to make it possible to directly cast non-managed objects, by assuming the class pointer is in the slot _before_ the pointer. It is upto the user then to _never_ place them inside GC managed vectors, as they won't have a valid class pointer. Elaborate on this a bit.. I.e. it would be nice for a (const char *) to be a valid object that can reside in a collected data structure. I.e. the object itself should be recognizable as not requiring free(). The problem then is that I'm running out of tags: 00 non-managed pointer 01 vector 10 integer 11 bool Using more than 2 tag bits places extra contstraints on pointer alignment. Booleans are useful for predicate implementation in Scheme. It is possible however to use the 01 vector slot, since we know what that will point to: - integer (it's a live vector, and the int = size) - vector (moved object) This could be a non-managed pointer to represent a class? Maybe another workaround is simpler: doubly wrapping all predicates and representing bools as integers. Let's stick to what we have and thing about this a bit more.. It looks like bools are more useful than constants. Otoh: since bools require only two values, and are actually constants, the problem is solved: 00 constant = non-managed pointer 01 vector 10 integer 11 managed pointer Let's implement this first before rewriting everything to primitives. Entry: Unsafe pointers Date: Wed Aug 5 15:49:26 CEST 2009 I get it now: by making sure the atom_class pointer is the slot one _before_ the atom pointer, it is possible to write unsafe functions without too much hassle. The free() pointer is stored in a class object. This class object is the value's type. Some values however are only statically typed, and appear as unsafe pointers if they make it into Scheme land. Entry: debugging Date: Wed Aug 5 17:48:59 CEST 2009 Hmm.. it's more difficult than I thought. The interpreter itself I can probably get debugged, since its structure is quite straightforward. However, the GC is going to be an adventure. Currently I have no way to properly mark roots in case GC is triggered inside a C primitive. During the evaluation of step() we could push all allocated atoms to a stack, and discard it at the end. This makes sure that intermediate data will not be freed. Alternatively, the heap could be marked such that all objects past the marker need to be copied first. But this then wouldn't work if there are 2 or more collections. Alternatively, the marking point could be translated into a vector that is then added to a local stack of references. The real question is: does the interpreter know what to mark? Let's change a gc->root into a callback function. Done. Now it shouldn't be to hard to turn a gc marker into an array. Basicly something like this: sc->marker = gc_marker(sc->gc) step() free_retain_stack() If there is a collection we do something like push_retain_stack(gc_marker_to_vector(sc->marker)) mark_all() sc->marker = gc_marker(sc->gc) OR do it manually. which is probably too error-prone.. Ok.. what about keeping most of it hidden inside the GC. When the GC calls mark_roots() it will pass in a vector of references containing the objects that were saved since the last border mark. Actually, this fancy trick doesn't work because the C stack still has the _old_ pointers. Not simple! One option would be to abort the primitive whenever a collection is necessary. If they are written in a purely functional style this shouldn't be a problem. Entry: mixing GC and C Date: Wed Aug 5 19:44:17 CEST 2009 The problem is this: a gc_alloc() inside a primitive might trigger a collection. At that point, all the pointers on the C stack will become invalid. To overcome this, all primitives that trigger gc_alloc() should be purely functional, such that an allocation can restart the interpreter step() that triggered the gc_alloc(). So.. If the gc itself is also made stateless (and written in CPS) then this could work pretty well. Lets see if gc_alloc() can be aborted. Looks like it.. One problem however is that the current implementation of gc_grow() will never be reached, so the program will end up in an infinite loop when there isn't enough memory to account for a single primitive execution. Here the mark_wild() function would come in handy: If it is detected that a single primitive step cannot continue because there is not enough heap space, the heap needs to grow. As long as gc_collect() doesn't abort the current GC invocation, this can be done automatically. Entry: constant strings Date: Wed Aug 5 21:29:40 CEST 2009 Since these are not aligned, they cannot be objects. So, no constant strings. Some goes for primitives. Entry: First interpreter run Date: Thu Aug 6 02:15:41 CEST 2009 # (zero? 123) tom@zni:~/libprim/ceks$ ./gc-test #state(#closure((zero? 123) ()) ()) #state(#closure(zero? ()) #frame(() (#closure(123 ())) ())) #state(#closure(#prim<0x4014f3:1> ()) #frame(() (#closure(123 ())) ())) #state(#closure(123 ()) #frame((#closure(#prim<0x4014f3:1> ())) () ())) #state(#closure(#f ()) ()) ERROR: halt: #state(#closure(#f ()) ()) Trace/breakpoint trap So primitive execution seems to work. Still to test: abstraction and application. After fixing some bugs we're at: # ((lambda (abc) 123) 456) tom@zni:~/libprim/ceks$ ./gc-test #state(#closure(((lambda (abc) 123) 456) ()) ()) #state(#closure((lambda (abc) 123) ()) #frame(() (#closure(456 ())) ())) #state(#closure(#lambda(#(abc) 123) ()) #frame(() (#closure(456 ())) ())) #state(#closure(456 ()) #frame((#closure(#lambda(#(abc) 123) ())) () ())) #state(#closure(123 ((abc . #closure(456 ())))) ()) ERROR: halt: #state(#closure(123 ((abc . #closure(456 ())))) ()) Trace/breakpoint trap Entry: Pre-emption Date: Thu Aug 6 10:46:21 CEST 2009 So, let's install the preemption based garbage collector + write the first `functional C' primitive task for parsing s-expressions = a token enumerator. Entry: constants vs. closures Date: Thu Aug 6 12:09:08 CEST 2009 Yesterday went well. The basic infrastructure seems to work. Now I need to get the small errors in the representation right. Currently I have constants tagged to an environment. This won't hurt, but doesn't make much sense. Maybe change the rep a bit. The environment contains closures (open_term + environment) or constants. OK: it's way simpler to make everything a closure, but tag contstants with an empty environment. So.. Something is not right. This gives an error: (post (cons 111 ((lambda (abc) 123) 456))) ERROR: undefined: abc I currently lost oversight so let's re-connect with the textbook rules. Entry: CEK primitives Date: Thu Aug 6 15:01:19 CEST 2009 Looks like I'm confusing two things: * primitive values + fully reduced closures (lambda + env) * non-fully reduced closures (application/varref/value + env) I'd like to make sure that primitives only see primitive values and fully reduced closures. Also, primitives should be able to return closures (even not fully reduced). Where is the inconsistency? Looking at PLLC[1] Chapter 6, p 74 at the CEK reduction rule [cek5], it seems that primitives get passed _only_ primitive values, not closures. How can I make this consistent with a primitive that can operate on closures as values? Hmm.. Let's provide better names for the data structure. 1. A continuation frame represents the current state of argument evaluation from reducable closures to irreducable closures. 2. A fully evaluated continuation frame can be _applied_ which means that a new (reducable) closure is created by extending the closure in the first position with variables. From what I could test it seems that application works fine. So I guess I need to figure out how to understand the difference between: - a machine where _all_ values are closures, but primitives can only accept and return primitive values (which have their environment dropped before primitive evaluation) - a machine where values are either closures or primitive values, and primitive functions can operate on both. So, either the interpreter's notion of "closure" is made abstract, or it is changed to directly operate on both primitive values and closures. What I could do is the following: regard interpreter state and environment + data structures as different worlds. STATE: the current reducable closure and continuation contain closure structs only, but all communucation with the environment (extension and reference) and primtives will pack/unpack closures to variables. Ok. Added closure_pack() and closure_unpack(). [1] http://www.cs.utah.edu/plt/publications/pllc.pdf [2] md5://3ab517a1167f86ef52caf3bf1a4fdb0c Entry: Lists vs. applications Date: Thu Aug 6 16:44:38 CEST 2009 Currently the implementation doesn't distinguish between applications and lists. The CEK machine doesn't deal with `constructors' which are essentially non-evaluated functions. I don't really know how this is done in practice. Let's just wrap code in a syntax object for now. This seems to work. There are now two kinds of terms: non-reduced terms are SYNTAX while reduced terms are anything else. Anyways.. It works, but it's probably a bit unorthodox. This can't represent syntax as values. The real solution would be to represent special forms, applications, variables references and quotes as separate entities. This will have to do for now. Entry: Testing gc preemption Date: Thu Aug 6 17:47:06 CEST 2009 Seems to work. There is a problem though: triggering GC outside of the mainloop is not well-defined. This basicly boils down to the fact that the GC managed memory is owned by the interpreter. The caller of _sc_run() has no business using these data structures. It's useful for testing though. It's necessary to perform a GC before doing any allocation to prevent corruption. Entry: Working? Date: Thu Aug 6 18:51:02 CEST 2009 Looks like it's working. tom@zni:~/libprim/scheme$ wc -l *.[ch] 167 gc.c 10 gc_config.h 133 gc.h 54 main.c 559 scheme.c 201 scheme.h 38 scheme_prim.h 41 symbol.c 28 symbol.h 1231 total It has almost nothing though. Only evaluation + supporting prims. Missing: - special forms (if, set!) - macros - reader - ports The special forms might be enough so reader can maybe be written in Scheme and compiled to C spec of the syntax tree? Both special forms need modification to the continuation. Assignment will need to be done in such a way that GC interrupt isn't possible. I'm adding a couple of structs to accomodate these different continuation types. Once realizing the frame type is only one of different kinds of continuations, it was quite straigtforward to add 'if', 'set!', 'begin' and 'lambda' with multiple expressions. Entry: Coroutines Date: Thu Aug 6 19:30:26 CEST 2009 This paper defends the revival of (asymmetric) coroutines as a general control abstraction[1]. [1] http://lambda-the-ultimate.org/node/2868 Entry: Macros Date: Fri Aug 7 09:17:45 CEST 2009 This will require a syntax-level environment, and a lookup for every form. A macro continuation is simply a tag: the current value (s-expression) needs to be tagged such that it is wrapped as SYNTAX and passed to the interpreter. I'm going to change the wording. What is now SYNTAX should become AST: a tag for reducable term. SYNTAX in Scheme sense seems to be something different: it is a data structure with binding information. Even if I'm not going to support the latter, it's best not to confuse things. Ok, macros work by: constructing a k_apply linked to k_macro. The k_apply is filled so it will trigger the application of the macro. The k_macro wraps the result as an AST to trigger reduction. Entry: C-Scheme Date: Fri Aug 7 10:23:06 CEST 2009 Working with C-callable Scheme primitives makes the step towards a PreScheme-like implementation easier. Using only `let*', `if', `begin', `set!', and a restriction on function application (only primitives can be called from primitives) this could be turned into a simple Scheme->C compiler. At least, the way the interpreter step is implemented now in C + Scheme datatypes, could be embedded into equivalent Scheme quite easily. Scheme C -------------------------------------------- let* Lexical variables and blocks. begin Blocks. if if statement. set! Variable assignment. (fn arg ...) Function call. prim-fn C-implemented prim (or compiled from C-Scheme) The only thing that won't work is `lambda'. But, in case it is used to pass to higher order functions (HOF), the HOF could be replaced with a macro. Note that PreScheme[1] is much more than this: it compiles a significantly larger subset of Scheme. Use pattern matching. [1] http://en.wikipedia.org/wiki/PreScheme [2] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.3.4031 Entry: Really done? Date: Fri Aug 7 11:38:25 CEST 2009 It looks like I have all of Scheme now. Next: make some C-code analysis tools that generate all the boilerplate. One more thing is missing: variable length argument lists. This is necessary to implement `list' without which macros are no fun. Entry: Two Kinds of Primitives Date: Fri Aug 7 13:20:51 CEST 2009 Due to restart at GC, the interpreter can support two kinds of primitives: * RESTARTABLE: Pure functions operating on Scheme data structures (or impure functions that do not perform allocation _after_ mutation). * ABSTRACT: C code that does not refer to any Scheme data. The disadvantage of not being able to access Scheme data from impure C code can be largely removed by providing a suspension mechanism for primitives. In this case the C code could behave as a coroutine, which allows the use of enumerators / iterators instead of construction of intermediate data. Impure functions performing allocation before mutation are hard to write, and are best limited to the implementation of internal primitives supporting mutation (like `set!' and `define') and adaptor code that bridge Scheme and C datastructures for abstract functions. In short: use either purely functional programming or concurrency oriented programming, and you get a very simple interpreter and application structure. Entry: Bootstrapping Date: Fri Aug 7 19:14:57 CEST 2009 Bootstrapping is fun. It's the first time I did this for a Scheme interpreter. I've added two mzscheme .ss files. One to convert .scm files to .c_ files as long as there is no reader, and one to lift the primitives from the .c file and write boilerplate registration code. Then bootstrapping the rest of the basic scheme functionality on top of the bare words is quite interesting. One thing though: I don't immediately see how to make apply and eval work. Eval is probably iterating the step() function. But apply? Similarly maybe? Constructing a continuation frame and iterating the interpreter? There is probably a simpler though non-obvious way to do this.. Manipulating the machine state from inside a primitive would probably do the trick. The trick is that the function modifies the continuation. This could be done using mutation, but that won't play nice with running the interpreter manually (as part of debugging for instance). It's probably best to make it possible for primitives to return a continuation. Or better: to allow for explicit "state transforming" primitives. Let's thinkg about this first.. Before the step() function can really be a pure function, it has to isolate all side effects, and should not rely on a main loop's actions. Maybe it's best to first try to implement call/cc. Applying a continuation is really simple. But how to capture it? Looks like that's similar to apply. Then about errors. Maybe it is better to keep errors local to the step() function that generated them, and return them as values containing the continuation. Except for GC restarts: they need to go all the way up to make sure data is not corrupted. Ok. errors are now local to step() allowing for recursive step() but GC restart goes all the way up the chain. Then a surprise: eval is (make-ast '(post 'foo-foo)) If a primitive returns a reducable term (represented by the ast wrapper type), it will be evaluated. Maybe apply can be made in a similar way? Well.. I found something that uses eval and map and quote. Not pretty.. Looks like manipulating the continuation is best. Entry: letcc -> apply Date: Sat Aug 8 10:22:47 CEST 2009 It seems better to stick with the way things work right now, and use only a single type of primitive. This makes integration with C (the eventual goal) a lot easier, and will prevent a lot of if-type style code. With `letcc' implemented as a special form, it is now possible to create primitives that operate on continuations as data, creating new continuations. These values can themselves be applied. This means `apply' can be implemented as a continuation modification followed by an application. Yes, but. The current k_apply type is a bit awkward, because it evaluates from left to right, it will wait for the final argument before coninuing with application. Maybe it's simplest to support a right-to-left app frame? Before continuing, let's make the continuations somewhat abstract: each with a parent slot in the first position. Ok. One way is to have a reverse k_apply: one that traverses the arguments from right to left. This way the `apply' continuation can be constructed as an argument list, with the function value passed to it. This would either require an extra reverse, or some re-arrangement in the primitive call trampoline. Or, a different kind of continuation could be made that ignores its value, but passes closure to its parent continuation. Basicly, such a continuation takes a value, ignores it, and passes another value to its parent continuation. Actually, this already exists: k_seq. Indeed: Using a continuation like this: _ -> (fn a1 a2 (begin _ a3)) will also ignore the supplied value and reduce to (fn a1 a2 a3). Entry: closures and values Date: Sat Aug 8 12:33:11 CEST 2009 So.. It's probably best to move closure_pack to the start of step(), and unpack before adding to the environment. It's a bit awkward the way it is now. Let's make it such that: - on step() entry, the closure is unpacked to perform reduction. - if there is no reduction, the value is packed. Entry: magic values Date: Sat Aug 8 12:47:55 CEST 2009 Because closures and ast's have a special meaning in the interpreter, they cannot be used as values in Scheme. Entry: Interesting Date: Sat Aug 8 15:28:07 CEST 2009 Implementing this Scheme interpreter without relying on cons or closures has been most revealing. I never knew I knew so little about the mechanics. Arrogance tends to get in the way of learning.. There are some meta-confusion things in the syntax representation that are a bit fishy still: - Syntax is wrapped in a weird way (tagged s-expressions. a full parse into an AST so values and terms can never be confused would probably be better). - Because of this confusion, primitives can return reducable expressions (make-ast) and closures (make-closure) that will not behave as literal data. As I've mentioned before, this makes make-ast behave as eval. Maybe the solution is to explicitly represent non-reducable values as VALUE? Ok: * all primitive return values will be tagged with VALUE * REDEX that is not a varref or an application is tagged as VALUE Ok, I see.. It's probably best to get rid of the word "closure", and have two kinds of values that contain environments: - redex - lambda The trouble is primitive values. In the lambda calculus, there are only two kinds of terms: applications and abstractions, where abstractions are values. So.. Can I get rid of it in one try? The code should be a lot clearer with this.. I think I got the concept right, but there is some stray assignment that messes up the GC. Maybe time to isolate all assignments? Inspecting the run state it gives something like: ... (#k_apply(#k_macro(#k_apply(#k_macro(#k_apply( ... Ok.. recursive macro application. After some more small tagging bugs (sc_apply_ktx required some head scratching) it seems to work now. The only remaining problem is the invokation of (gc) from within scheme. Was a problem with sc->entries. Added sc->step_entries, which is a different semaphore. Seems to work now. I can call the interpreter step from within scheme + all the state objects are available: (make-value 123) => #value(123) (eval-step (make-state (make-redex 'a '((a . 123))) (mt))) => #state(#value(123) #k_mt) (eval-step (eval-step (make-state (make-redex 'a '((a . 123))) (mt)))) => #error(halt 123 #state(#value(123) #k_mt)) Entry: Loop without cons Date: Sun Aug 9 01:09:22 CEST 2009 So.. What would be necessary to make an interpreter that can perform a loop without performing allocation? It would need to overwrite variables. So.. What is scheme without GC? This should be split in two: Scheme with closures, but without continuations (using a stack) vs. Scheme without closures, or only downward closures. Entry: Reader task Date: Sun Aug 9 11:32:40 CEST 2009 I got distracted.. Getting a front row view of how Scheme is implemented on top of a bare machine is fascinating. Anyways. Now that the core of the interpreter seems to work, let's make the first attempt at re-usable code: a reader = tokenizer + "compressed ast parser" for s-expressions. A compressed ast is a binary flat representation of a tree, consisting of a stream of words. This is important, because it will be the main interface between two communicating processes that do not share memory (or not all memory: they might share constant pointers.). Entry: Keeping track of value names Date: Sun Aug 9 12:05:38 CEST 2009 How to tag values with names? Every time a value is lifted from the environment, it could be recorded in the VALUE struct for debugging purposes. When there is an error, the current value's name could be used for reporting. Hmm.. It's not so simple.. It's probably easier to tag the primitive struct itself with names. Ok: solution was this: tag the sc struct with the primitive before executing it: exceptions come from the call chain, or the interpreter. Entry: Byte code Date: Sun Aug 9 13:40:28 CEST 2009 More procrastination. What would Scheme byte code look like? It is essentially creating a cached representation of the interpreter sequencing. The answer seems to be CPS conversion. Entry: More library code Date: Sun Aug 9 18:05:10 CEST 2009 let, letrec, named-let, apply, eval, cond. Entry: Scheme interpreter next: Date: Mon Aug 10 09:35:13 CEST 2009 critical path: * C tasks -> reader (also find a standard protocol for transferring tree data using a machine word port interface). structured procrastination: * compilation to bytecode (CPS) * subset compilation to C: i.e. embedding C in s-expr syntax using let* for set! begin while. maybe augmented with a bit of optimizations to make the subset larger (PreScheme style). * garbage collector: the stop & copy collector is good for bootstrapping, but it at least needs to be replaced by a cheney collector that doesn't use a stack. maybe some more fancy things later. this could be abstracted out and reused. Entry: Tasks: one-shot + continuations Date: Mon Aug 10 11:52:16 CEST 2009 It looks like the simplest approach is to use a `prompt'. The current tinyscheme implementation has a transparent mechanism: any C primitive can be suspended relative to the setjmp right before primitive execution. The setjmp is also used for implementing exceptions. The scheme in this project has a different structure: there is some state unpacking between the mark and the actual primitive call. This makes me thing it might be better to implement c continuations and one-shot tasks using a primitive that marks the context. OK. Continuations are working: they create a new ck struct on each suspend. It's probably to make one-shot continuations if the context is reused + a context size check is performed. NEXT: - allocate before running (run on separate stack) - one-shot Entry: GC Date: Mon Aug 10 15:48:22 CEST 2009 Apparently there's a problem with finalization of atoms: if the unreachable garbage space contains more than one reference to an atom, its finalization will be called twice. Hmm.. It looks like atoms that have a free() method need to be wrapped in a vector, to make sure they only appear once in the heap. So, the data needs a different encoding. Let's start from what we need in the GC: - vectors - constants - integers - finalizers An object that needs finalization which is not idempotent needs to be wrapped in a vector. OK: solved by removing the 'atom' type, but requiring that every finalizer found in the heap is followed a pointer to finalize. This gives a simple mechanism to build wrapper vectors around arbitrary (aligned) pointers: the client just needs to ensure that each object occurs only in a single vector, and use that vector as the opaque representation of the object. Entry: Alloc before call Date: Mon Aug 10 17:03:12 CEST 2009 Instead of allocating before calling a primitive, in the case where the intermediate operation doesn't alloc, it might be simpler to see if there is enough space and then postpone the allocation, in order not to create garbage. Entry: 0xbad Date: Mon Aug 10 19:31:34 CEST 2009 A coincidence: usually this indicates problems :) 1: test_ck() (123 . #aref(#fin<0xba6270:0x40123c> #data<0xbad420>)) 2: test_ck() (124 . #aref(#fin<0xba6270:0x40123c> #data<0xbad5b0>)) 2: test_ck() (124 . #aref(#fin<0xba6270:0x40123c> #data<0xbad740>)) 2: test_ck() (124 . #aref(#fin<0xba6270:0x40123c> #data<0xbad8d0>)) 3: test_ck() 125 Entry: Dybvig Danfest 2004 Date: Mon Aug 10 23:19:58 CEST 2009 "The macro writers bill of rights"[1]. Check out KFFD hygienic expansion algo[4]. This lead me to hygienic macros on top of unhygienic ones[2]. Apparently my letrec is wrong: all evaluations need to happen before all assignments. Optimizations: constant folding, copy propagation, inlining, dead code elimnation. Inlining (deciding to inline) isn't as easy as copy propagation: you want your compiler to terminate and prevent code explosion. [1] http://video.google.com/videoplay?docid=-6899972066795135270 [2] http://p-cos.net/documents/hygiene.pdf [3] http://people.csail.mit.edu/jaffer/r4rs_12.html [4] http://www.cs.ucdavis.edu/~devanbu/teaching/260/kohlbecker.pdf Entry: Sequential code Date: Tue Aug 11 09:39:04 CEST 2009 Visually I associate left-aligned code in Scheme with side-effects (begin). However, parallel code also looks like this, and is not necessarily bad from a mutation pov. Entry: GCC Date: Tue Aug 11 10:19:25 CEST 2009 Looking at the assembly output of scheme.c and it seems that it performs the inlining well, except for the allocation, which uses a vararg function _sc_make_struct that looks rather expensive to call. Also, gc_alloc won't line. I guess the `full' check prevents that. It's probably easier to implement GC boundaries with unmapped pages. Entry: bootstrap in forth Date: Tue Aug 11 10:44:02 CEST 2009 Now, in order to get to a smaller code size. What about writing the step function in threaded code? Does this make at all sense? I don't think so: the features that are used to write the interpreter are GC's vector construct (library) and C lexical scope and C structure member scope. It's rather awkward to write something like that in a combinator language. Entry: size Date: Tue Aug 11 11:21:40 CEST 2009 Looks like the real culprit is the primitive registration code. This can be done in a table. strip scheme ls -l scheme # currently -rwxr-xr-x 1 tom tom 44592 2009-08-11 11:28 scheme # Ha! # The prim table actually made it bigger :) -rwxr-xr-x 1 tom tom 45912 2009-08-11 11:34 scheme # Changing the prototype of the function to take strings instead of # symbols made it smaller: -rwxr-xr-x 1 tom tom 43936 2009-08-11 12:02 scheme So it's really the code that generates the boot code that bloats the binary. Looks like a reader is what's needed to compress further. Also, part of the data structures could be loaded as constants: symbols and primitives are not GC-managed, so they could be defined as constant data. Another thing that leads to bloat is the tag shifting. Can this be simplified to a single AND + compare? After separating the bootstrap code to lib.o i get these stripped sizes: # gcc -m32 -rw-r--r-- 1 tom tom 1256 2009-08-11 12:48 gc.o -rw-r--r-- 1 tom tom 16116 2009-08-11 12:48 lib.o -rw-r--r-- 1 tom tom 2032 2009-08-11 12:48 main.o -rw-r--r-- 1 tom tom 12456 2009-08-11 12:48 scheme.o -rw-r--r-- 1 tom tom 676 2009-08-11 12:48 symbol.o -rw-r--r-- 1 tom tom 1136 2009-08-11 12:48 task.o # gcc -m64 -rw-r--r-- 1 tom tom 1904 2009-08-11 12:49 gc.o -rw-r--r-- 1 tom tom 18040 2009-08-11 12:49 lib.o -rw-r--r-- 1 tom tom 2640 2009-08-11 12:49 main.o -rw-r--r-- 1 tom tom 17344 2009-08-11 12:49 scheme.o -rw-r--r-- 1 tom tom 1176 2009-08-11 12:49 symbol.o -rw-r--r-- 1 tom tom 1832 2009-08-11 12:49 task.o Entry: Compressing an s-expression into a serial protocol. Date: Tue Aug 11 12:53:48 CEST 2009 What about this: create a small stack based VM spec that can can create a data structure from an atom stack, and send the instructions over the wire. Hmm.. confusing.. Really I want a proper one-shot abstraction without copying before attempting this. What I really want is expanded AST compression or compilation to CPS bytecode. Entry: CPS conversion Date: Tue Aug 11 13:57:19 CEST 2009 (lambda (x y) (fn (- x 1) (+ y 2))) (lambda (x y k) a1 <- (- x 1) a2 <- (+ y 2) (fn a1 a2 k)) Entry: Local macros Date: Tue Aug 11 14:06:20 CEST 2009 Let's support those by also recording the macro environment. Looks like it's easier to first support `let' as a special form, and then reuse that code for `letmacro'. `let' would use an application frame, where the first value is already filled in (a lambda created from the body). The same could be done with `letmacro'. What would be a `lambda' where the names are bound to macros instead of values? Maybe that's the simpler route? Time to take a break, this is confusing.. Entry: Compiling to C and pattern matching Date: Tue Aug 11 16:48:29 CEST 2009 Take the interpreter's main loop and write it as a pattern match in Scheme syntax. Actually: a pattern matcher is a language for writing interpreters. Basicly, all the C structs, predicates, and if/then/else blocks can be automated easily, as long as it maps data -> data. Otoh, the C code itself isn't so difficult. Entry: Dynamic languages workshop Date: Tue Aug 11 17:05:08 CEST 2009 http://www.ai.mit.edu/projects/dynlangs/wizards-panels.html Panel 1: runtimes David Moon: ``Start with a small working system and expand it incrementally, adding functionality and adding performance. The big bang way only works for God. Everyone else has to use evolution.'' ``Sometimes an error message means exactly what it says.'' Panel 2: compilation An interesting discussion about finalizers. For libprim tasks, I'm thinking that C++ destructors might be interesting in combination with stack-based "pure" tasks: going out of scope due to return or exceptions calls destructors. Will Clinger mentions weak pointers [1]. Also David Detlefs talked about reference counting coming back. Figure out what he really meant. Panel 3: Language design Jonathan Rees: Tension between distributed programming and OO: an object is essentially "here": all abstractions around that is essentially smoke and mirrors. (I.e.: physical location/locality becomes more important as systems grow physicaly larger.) This is about physical things (state) vs. information. Object != value. Guy Steele: little known secrets - stacks are overrated - GC is good, finalizers are bad (finalizers w. good semantics <- GC at every pointer alteration) - lexical scoping is good. easy to screw up when writing interpreter. - type inference can be good but error reporting is hard. - monads maybe very good - higher order functionals are underrated. thinking in APL can approve your lisp code. - keyword arguments are good (PG) - pretty printers are vastly underrated - rule-based systems. (20 years ago, time to try again). [1] http://www.iecc.com/gclist/GC-faq.html Entry: removing optimizations Date: Tue Aug 11 17:45:36 CEST 2009 Heeding good advice: removing all optimizations that make code look more complicated. Entry: Real-time Garbage Collection Date: Tue Aug 11 22:20:29 CEST 2009 Supercollider uses [1] while libgarbagecollector[2] is used in Io. This one[3] is a rework of Metronome. I'd like to know how this works.. After reading a bit, the conclusion I get to is that this isn't really a science. Optimality boils down to figuring out which objects are long lived, so they can be ignored most of the time. There are a bunch of heuristic tricks to do this, but none of them are optimal for all programs. [1] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.39.4550 [2] http://www.dekorte.com/projects/opensource/libgarbagecollector/ [3] http://lambda-the-ultimate.org/node/2393 Entry: GC stuff Date: Tue Aug 11 22:49:43 CEST 2009 Generational collection. One of the key points seems to be that one mostly collects the `nursery', a pool of recently allocated objects. The assumption is that most of the elements in the nursery will become garbage soon after allocation (child mortality) AND that there are a lot of objects that don't. In order for this to work there needs to be a cheap way of finding pointers that point from the heap at large into the (much smaller) nursery. Finalization: Instead of scanning the from space after copying, it's also possible to perform the finalization later, when allocating anew: the data will be moved to the cache anyway because of initialization, so we might just as well access it then. Entry: Cheney GC Date: Wed Aug 12 10:13:23 CEST 2009 [ Minsky-Fenichel-Yochelson-Cheney-Arnborg (MFYCA) method ] Like a recursive mark and copy collector, but using a breadth-first search. 1 Copy all root objects: copy atoms, update moved vectors and copy vectors not yet moved. 2 Copy all references in newly allocated objects. 3 Repeat until heap stops growing (all reachable objects are moved). OK: typed in, compiling, not working: it blows up. Simplified, but I can't see it. Guess it's time for a proof. An interesting challenge this is: I've always stayed away from implementing graph algorithms in C. It requires a lot of discipline to get the invariants right. The cause is usually due to embedding and premature optimization, i.e. using the same field to represent two distinct values in different modes when there are subtle cases where this is not valid. The Cheney algo is a simple one, but I couldn't get it right ``just writing it down''. Ha.. Indeed: caused by leaky abstractions: the problem was that vector tags didn't get copied, which probably triggered an allocation loop in the interpreter due to type errors. Next: add assertions and error handling for the whole project. [1] http://en.wikipedia.org/wiki/Cheney%27s_algorithm Entry: Simpler k_apply Date: Thu Aug 13 12:17:00 CEST 2009 Next time I write an interpreter, it will be a right-to-left evaluating one. This simplifies the form of apply continuations, to easier support the `apply' primitive. Entry: Restating goals Date: Thu Aug 13 12:27:44 CEST 2009 In libprim I'm trying to explore the idea of non-invasive scripting. Essentially, it explores a concurrency-oriented programming style, where one of the tasks is a `management core'. At the same time I'm looking into simple interpreter cores to serve as the operating system. Currently there's a bare-bones scheme interpreter. Entry: Ports Date: Thu Aug 13 12:36:27 CEST 2009 Added a ports struct (GCd aref) + a debug port until dynamic scope is implemented. Entry: Current scheme implementation wrt. generated code. Date: Thu Aug 13 13:58:21 CEST 2009 Code is generated for declaring primitive functions and defining boot code in C (as long as there is no reader). This could be augmented by generating all code that has to do with data structures: struct def, lowlevel casts, and highlevel predicates and constructors. Additionally, all primitive trampoline code could be generated. Definitely: when the primitive struct accessors need to be available, it would be simpler to use a code generator. Entry: Leaf: small object model Date: Thu Aug 13 14:37:22 CEST 2009 So.. Basicly, what's next is to make primitive data extension simpler. I'd like to get to a functionality of basic Scheme or PF: a collection of primitive objects written in C, easily interfaced with scripting code and independent of its data and control models. For the GC memory model (Scheme), such objects are wrapped in aref (finalizer + constant). For the linear model, they would be wrapped in a refcount cell. tom@zni:~/libprim/leaf$ ls *.c bytes.c port.c symbol.c task.c Entry: Naming conventions Date: Thu Aug 13 16:25:49 CEST 2009 The trick seems to really be: use decent naming conventions for C functions and types. This way a lot of boilerplate is easily generated at compile time. Entry: Stack language Date: Thu Aug 13 16:52:32 CEST 2009 With the scheme language as good as finished, let's start the stack language. 1. references 2. cell-base 3. queues Ok.. Separated out the object representation of GC: it can be re-used in refcount based memory model. The first thing to do is to create the memory and object model. Essentially: implement `void' and `drop'. Done. Next: print objects. This can be done by factoring out the primitive print functions. Entry: Dual regime Date: Thu Aug 13 18:46:32 CEST 2009 I started with the basics of pf2.c : stack based alloc and free. Maybe the important thing to realize is that it's not going to be possible to be completely linear, due to references to code and global storage. However, it might make sense to get to an almost-linear regime, where GC is active during load and setup, and after that the GC is switched off and only the linear language runs. Anyway, let's just be carful to make sure pf data types are compatible with GC. Alternatively: the code/variable space could be made inaccessible to the language. If it is externally managed and accessed abstractly, references into this memory _can_ be used. Entry: Reusing primitives Date: Thu Aug 13 23:16:55 CEST 2009 There seems to be a real benefit in doing this in a shared way. All Scheme primitives that do not CONS can be directly reused. Even better: the whole machine might be embeddable inside scheme: this would give a way to make references work (variables and code), which would lie outside the reach of stack code, but could be used as jump targets and data storage. Entry: Moving the stack language into the Scheme interpreter. Date: Fri Aug 14 08:48:30 CEST 2009 Basic idea: the PF machine never CONSes from the GC free space for _data_ objects. It's CONS is a reusing one. What about doing this differently: separate out the memory model from the scheme struct. This includes GC and all leaf object classes. The thing to decide is which objects are allowed in the stack machine as data objects, and which should remain hidden. There's no use for exporting the Scheme's internal data structures (yet). Let's start with cons cells and atoms. Maybe the main problem is representing RC'd atoms in the Scheme core. It doesn't seem to be necessary to run PF in SC. As long as the memory management is compatible, they can be made to work together. The hairy part is the GC restart, since they would need to share the GC next to all the mem classes, but it doesn't seem infeasible. So, given that we're using the Scheme's data model, how should finalization and reference counting be bridged? Entry: Packet Forth Date: Fri Aug 14 11:09:11 CEST 2009 It's becoming more clear to me that this is really a rewrite of PF. The core idea is to be able to use linear+RC based memory management for ``regime programs'' : code that performs some setup and then ends up in an infinite loop with very predictable memory patterns. The mistake I made in the first implementation is to assume that GC isn't necessary. Only after _first_ implementing a Scheme around the primitive data types did I realize that this is really an essential part: the synergy between a Scheme-like host and a linear stack language that can mutate the GC-managed graph is essential to the idea. You need the first to provide "global memory" for the second. Representing data. The PF primitives should know about (``inside memory''). - constants - cons cells - RC managed leaf objects - opaque references to "outside" memory The outside model, accessible from Scheme, needs to wrap RC managed cells in an aref object. An ``outside pointer'' TAG_OPAQUE is a data structure that is GC-managed, but can be accessed abstractly from the inside. It is treated as a constant. Basicly, in the linear model, everything could be opaque by default. Essentially, PF doesn't need to care about managing anything else than CONS cells and RC objects. Now, whenever an RC value is made accessible to Scheme, it needs to be wrapped in an aref, where the finalizer decrements the refcount. Such values are really meant to be kept local to the linear memory, but of course it shouldn't be prevented to have them in the global space, i.e. embedded in a piece of code as a constant. This is not trivial however. The refcounted object should be linked to the Scheme wrapper, to keep the representation unique. However, this link must not become stale! It looks like an rc struct needs to be a GC managed vector. Conceptually it's not so hard: my difficulty is with the representation. LIN -> GRAPH : rc++ finalize: rc-- wrappers don't need to be unique! The latter remark is key. Looks like this works fine. Entry: Code and Variables Date: Fri Aug 14 16:26:47 CEST 2009 The VM's data structures should not be visible inside the PF language: the machine is not reflective. This is a key difference with the current PF implementation. This enables graph data to be used to represent PF byte code. (Note that if reflection is necessary, it can be implemented at a greater expense by _copying_ from graph -> linear memory and back.) The remaining problem is then the representation of variables. Essentially a variable is a box that's referenced by code. CODE -> BOX This BOX can then be loaded on the data stack to enable the PF machine to modify its contents. But. What can be put in a box? What if a box disappears? Hmm.. not done yet. Basicly it seems that a box needs to be an aref. What if an aref contains a list? Yes.. plenty of pitfalls.. This can be solved by changing the semantics of finalizers: they take objects instead of constant's pointer. By tagging constant with 00 nothing really changes though. So, finalizers take objects: this means that linear structures can be embedded in an aref: if an aref becomes unreachable, an RC-- will ripple through the list. Entry: Scheme Primitives in the Stack Machine Date: Fri Aug 14 18:00:40 CEST 2009 If a primtive doesn't CONS it can be reused without problems. Let's separate all of them out, making reuse easier. Entry: Queues Date: Fri Aug 14 18:01:53 CEST 2009 As the CONS of two stacks? Note that this gives only amortized O(1) addittions and removals, and might for that reason not be a good candidate. Otoh, it does allows representation as a tree. Because of the linear nature however, mutable vector-based solutions might work. Entry: Packet Forth Interpreter Date: Fri Aug 14 18:08:45 CEST 2009 Next: the interpreter. Should be quite straightforward, only this time I'd like to not make the mistake of not having proper tail call handling. For some late night zen. One thing though: the return addresses. What are they? They need to be real references to guarantee a safe memory model, but they need to be distinguishable from ordinary CONS cells. Maybe just add a CODE cell? Also, this gives a very elegant way to make tail recursion work: simply link the CDR to the head of the last word called! Looks like this is ``continuations for free''. The return stack then becomes a CONS list of CODE pairs. CODE = RETURN | (SUB : CODE) ;; ':' is a GC CONS SUB = PRIM | CODE RS = MT | (CODE . RS) ;; '.' is a linear CONS Here SUB is either a PRIM which can be invoked to modify the machine state, or a CODE list which can be executed by saving the current CODE cdr on the return stack and jumping to the SUB CODE. RETURN pops the next CODE from RS and jumps to it. ( Note that RETURN is necessary _only_ if a code definition ends on a primitive. If not, it's simply linked to the tail procedure. ) Essentially, the return stack itself is in this case also just a code graph ``constructed at run-time'' : it is a callable function! Note however that `.' is a _stack_ that's managed linearly, while the `:' is graph managed. I.e. the return stack can't contain loops, but the code can. This makes the new PF semantics complete. HA! Entry: pf_run() Date: Sat Aug 15 00:04:48 CEST 2009 The basic interpreter seems to be running. Next: exceptions and constants (quotation). For quotations, I'm running into the same problem as for the Scheme interpreter: it is necessary to strictly separate language and metalanguage: explicit quote tags are necessary to be able to quote code. Entry: Linear pairs Date: Sat Aug 15 11:47:05 CEST 2009 It's probably better to use a different data type for linear pairs. That way nonlinear pairs can be used as constants too. Otoh: reusing the same pair structure allows primitives to be reused. Even those that do cons. (Cons could be part of a shared object). It's really not such a hurdle to manage the border between linear and graph memory, so let's stick to the simple implementation: cons lists are cons lists, they just have only a single reference in the linear part. Entry: More sharing between Scheme and PF Date: Sat Aug 15 12:02:16 CEST 2009 There is already quite some duplication, mostly in exception handling, GC, and printing. This should be handled by "mem". (mom?). I guess it's a good idea to factor out as much as possible beforehand. Let's rename "mem" to "base", which implements: - GC restart - primitive leaf types - primtiive exception handling - printing Hmm.. Let's not. The simplest way is to make sure the primitives themselves will take from the base class. This can be factored out later.. It was only a couple of lines to add the ABORT exceptions in PF + it's slightly different since the interpreter is not re-entrant. Entry: Code bootstrap Date: Sat Aug 15 12:48:20 CEST 2009 Now that default abort works it's time to start adding some primitives and a bootstrap reader (sexp -> C). Seems straightforward once list -> code translation works. (eval). Before going there, I think it's best to do the shared primitives anyway. The PF core can benefit greatly from _expressions_ : in fact, almost all code in pf.c is doubly done: once as expression, and once bound to the linear stack. Entry: EX: expression language Date: Sat Aug 15 13:55:38 CEST 2009 `expressions' use the following resources: - GC - exceptions - leaf types It is a dynamically typed functional language using the C-stack for expression nesting. The Scheme and PF languages add primitives that operate on an extension of this state, with most of the primitive expressions shared. Now, by binding EX in the respective scripting languages to &pf->m and &sc->m it is possible to abbreviate all expressions, leaving out the prefixes. I.e. generalization of: #define CONS(a,b) ex_cons(EX, a, b) Now that the infrastructure is there, let's move the primitives lazily depending on need in both languages. Entry: Static scope. Date: Sat Aug 15 15:41:17 CEST 2009 What's interesting for the current PF code graph implementation, is that it is actually statically scoped. All variables are bound at compile time, which makes (data / code) hiding through local definitions possible. I'm not sure how to reflect this back into the language, but it could be used to create local dictionaries, one of the things that was really hard to do in original PF. Entry: PF / PX primitives Date: Sat Aug 15 16:05:08 CEST 2009 In light of this sharing it makes sense to define PF in terms of PX primitives: in-place expressions. pf_ forth words pl_ linear (in-place update) expressions ex_ nonlinear (sharing / consing) expressions (usable in Scheme) pf_ probably are redundant, and can be generated from px_ Entry: EX printing Date: Sat Aug 15 16:43:05 CEST 2009 EX has a lowlevel port* reference to implement basic printing. It needs a lowlevel interface because in SC and PF ports are managed respectively as graph and linear atoms. Entry: Building Date: Sat Aug 15 20:47:04 CEST 2009 Looks like I've got the build problems ironed out. Moral: dont _EVER_ generate files outside your local makefile directory. This can be subtly done by some rule that depends on a specification outside of your dir.. I had something like this: ../ex/ex_prims.h_: ../ex/ex_prims.c which nicely overwrite the .h_ file that should have been built in another directory using a locally defined rule for .h_ files Another thing is to use different extensions for different generators. Otherwise the problem above is bound to impose itself. Entry: EX errors Date: Sat Aug 15 21:18:47 CEST 2009 So.. Time to move primitive errors from SC -> EX. Done. Entry: Knot tying Date: Sat Aug 15 22:50:25 CEST 2009 To define recursive functions, it is necessary to use mutation of the code graph. I would like to preserve a semantics of early binding though. OK, it's not so difficult. Adjust the environment so each pair points to a code cell. This is the one that will be referred to from within other code. When a definition arrives, reuse the cell. What I really would like is a way to call Scheme code during definition. Mutation doesn't go well with the allocations necessary. So maybe this needs to be written as a bootstrapped composite word? Otoh, it might be solved linearly, translating the code in-place, transforming the structure into code directly. 1. compile in-place, reusing the cons cells (code is isomorphic to the data representation, except for `quote' and variable wrappers. 2. update the global environment. This would work were it not for the wrapping that needs to perform allocation. It really would be simpler to do it from Scheme. Alternatively, it can be done in two passes: one pass creates the skeleton but doesn't mutate to make sure it can be restarted, while the second pass fills in the values. Also, it's possible to start with a non-recursive version and use that to boostrap the recursive one as a primitive. It's really not so difficult: the only need is that everything fits into free GC space before the modification to the environment is made. Entry: PF compiler Date: Sun Aug 16 11:44:20 CEST 2009 "Compiler" is a strong word: it's mostly a linker. As mentioned in the previous post, this is a bit awkward due to the graph patching that needs to be done. There are many corner cases related to singleton words (that need ``eta-reduction'') or primitives, that can be inlined. The real question is (and I've run into this before) does the dictionary contain CODE references, or SUB references? CODE = RETURN | (SUB : CODE) ;; ':' is a GC CONS SUB = PRIM | QUOTE | CODE RS = MT | (CODE . RS) ;; '.' is a linear CONS It might make more sense to use SUB refs. Tail calls are tricky! Let's change the types a bit: CPAIR = (SUB : CODE) CODE = RETURN | CPAIR SUB = PRIM | QUOTE | CPAIR RS = NIL | (CPAIR . RS) The reason this feels a bit weird is that composite code doesn't need RETURN. The only way to ever pop RS is to have a word end in a primitive, i.e. [ PRIM RETURN ]. Composite code can always just jump. Looking at CPS versions of Scheme, this is also quite clear: the only thing that ever _invokes_ the continuation, is a primitive. So, can the types be made a bit more intuitive? Entry: New PF interpreter Date: Sun Aug 16 12:09:57 CEST 2009 The interpreter evaluates the following recursive data structure. ( Note that this represents fully resolved byte code: it doesn't contain any variable references. ) SEQ = (SUB : SUB) ;; `:' is graph CONS SUB = PRIM | QUOTE | SEQ K = HALT | (SUB . K) ;; `.' is linear CONS SEQ Code sequencing provides the encoding of what to do _now_ and what to do _next_ SUB A subroutine is a basic unit of work: invoke a primitive function, load a data object onto the parameter stack, or perform a code sequence. K The continuation is a stack constructed at run-time, representing the rest of the evaluation, finishing with HALT. The interpreter uses this data structure as follows (in pseudo code, an ML implementation can be found here[1]). sub = POP(K) case sub SEQ (now : next) -> K = PUSH(now, PUSH(next, K)) PRIM -> execute PRIM QUOTE -> load data item on parameter stack SUB is the main code type (the subroutine). It is what you pass to `run', and it is what can be found as resume points in the continuation. It includes primitive code and data, as well as composite code sequences (SEQ). When a SEQ is encoded, the two SUBs will be pushed onto K in the correct order. Note that this encoding automatically gives proper tail call handling[2]: loops can be implemented using recursion without blowing up the K stack. * * * The above is the representation of compiled code. Concatenative (Joy-like) code can be compiled straightforwardly to SEQ-based code. This can be illustrated using the standard lisp CDR-linked pair notation to represent both source code CONS pairs and SUB pairs. The the symbolic program (a b c d) = CONS(a, CONS(b, CONS(c, CONS(d, NIL)))) is compiled to an improper list (A B C . D) = SEQ(A, SEQ(B, SEQ(C, D))) where each symbol x is mapped to a corresponding X representing a SUB. Resolution happens using a global vocabulary that maps symbols to SUBs. * * * The code representation as a binary graph has several advantages related to the similarity between binary graphs and binary trees (PF's linear pair data structure for data storage and representing the machine continuation). - SEQ as a pair of SUBs gives a very direct encoding of "now" vs "next". This makes it really easy to represent continuations as a stack of SUBs. - The continuation K ``looks like'' a SEQ (a particular kind of SUB), in that it is a linear pair of a SUB and a K. - The machine can run without nonlinear data allocation: continuation push/pops are linear. - One-shot full and partial continuations (segments of the K stack) are (isomorphic to) linear lists. - This isomorphism obviously works also the other way: linear lists constructed at run time can be converted to one-shot full or partial continuations. (This can be used to emulate one-shot `closures'.) - Linear one-shot closures/continuations can be _copied_ to make them multi-shot. - They can also be _compiled_ to code (linear pairs represented by nonlinear SEQ pairs) so they can be reused any number of times. However, this is not a linear operation. ( Note that the isomorphisms need some work to figure out quoting issues. Essentially it will boil down to different kinds of continuation frames. ) [1] http://zwizwa.be/darcs/libprim/pf/pf.ml [2] http://repository.readscheme.org/ftp/papers/ai-lab-pubs/AIM-443.pdf Entry: Compilation Date: Sun Aug 16 12:43:45 CEST 2009 Now.. I think the bytecode does what I need it to do. But how to compile it? This requires multiple passes to properly inline all references. I have something like this: FOR(SRC): symbol? YES -> dereference if possible NO -> quote as datum next? YES -> alloc SEQ and continue NO -> store and finish Then iterate over the data structure until all symbols are dereferenced. There are (at least?) two special cases that won't work: - empty programs are not handled properly. - degenerate loops lead to infinite dereference passes: (a b) (b a) Handling those separately should make it work though. First, filter out all references to empty programs. Second: use a counter to indicate the number of dereferences performed. If 0, there is no progress: any remaining references are degenerate loops. The empty program (NOP) is necessary as a corner case, because empty quotations need to work. These can be optimized out of normal code though. Hmm.. this approach has a number of pitfalls. It isn't possible to determine whether a program is a NOP before completely compiling it. Let's make it work first using a 1->1 map from source structure, and perform optimizations in a second pass. So, the passes are: (PURE) 1. construct environment 2. compile s-expr -> AST (``improper'' SUB lists) 3. resolve all references 4. check degenerate loops (NI) 5. snap pointers (NI) (IMPURE) 6. patch global environment (NI) Most of the passes can be done purely. They don't create much garbage so they can be kept as primitives (should make it to the end without GC). To test it, first the primitives need to be defined. OK. Prims load + highlevel code compiles. Entry: Linear cons cells Date: Sun Aug 16 18:46:55 CEST 2009 It might be best to switch to an explicit representation for linear CONS cells. Especially for the interaction with Scheme, it might be confusing to have the same datastructure represent two different things. The hack is of course that inside the PF primitives, it's possible to use direct calls to read-only ex_ functions, without the need to make them polymorphic to both linear and nonlinear lists. Anyways, this seems something to postpone until it becomes a problem. It should be straightforward to do: it's mostly the functions _pf_copy_from_graph and _pf_copy_to_graph. Entry: QUOTE Date: Sun Aug 16 19:32:48 CEST 2009 Another premature optimization: currently there's a lot of copying going on when defining new code from within PF (represented as linear lists). Entry: Decompile Date: Sun Aug 16 20:59:37 CEST 2009 With everything packed as a graph, it's quite difficult to print out any code. Let's add some reverse lookup for code printing. OK. This seems to work. With a bit of effort re-quoting also works. Note that words don't have a definite end, except when they end in a primitive. Whenever it encounters a sequence tail that has a name, it prints that instead. Using brackets for printing code: (def . ['456 dup-post abc]) (loop . ['1 [print-state] ['1 '2] loop]) (abc . ['123 dup-post]) Entry: Autosnarf Date: Sun Aug 16 22:38:53 CEST 2009 From EX / PX -> PF. Possible? Probably not, since there are no guarantees about linearity of the output. Without proper typing this doesn't seem to work.. So it needs to be done manually. Entry: No IP Date: Mon Aug 17 09:18:48 CEST 2009 With the machine organized as it is now, there is really no need for an IP register: always use the top of RS. Well, conceptually that's the case, but let's keep it like it is now. It's an implementation issue that doesn't have too many consequences. Another thing though: the return stack can't be used for storing temporary data. Or can it? Looks like it can, at least in the middle of a SEQ, because an atom placed on RS will only be accessed when evaluating a primitive or constant in tail position. Entry: Next? Date: Mon Aug 17 09:55:03 CEST 2009 - More primitives: combinators + list/stack/queue ops. - Reader Entry: Making PF expressions available to scheme. Date: Mon Aug 17 09:59:05 CEST 2009 The requirement is to decouple them from the pf* state other than ex*. Very straightforward: rename them px_ -> ex_ and use the ../ex/ex_prims.ss snarfer. Entry: Does PF need real-time GC? Date: Mon Aug 17 10:27:47 CEST 2009 Note that I realize that these days it is possible to get real-time incremental GC. However, these are still not synchronous memory managers: they won't reclaim large blocks immediately so always require more memory than a synchronous GC, and on top of that will cause unnecessary loss of data locality. This doesn't mean that the current (very simple stop-and-copy Cheney GC) can't be replaced by something more elaborate to make the _compilation_ also respect real-time constraints. Otoh, since the linear and graph memory are completely separate, and the GC will never modify the linear data, a GC that doesn't move the data can be run in a low priority thread without problems. So, yes: I guess I'm taking position _against_ real-time GC, at least for scripted DSP applications (for static compilation, you can probably use static buffer allocation) where large chunks of memory are allocated and freed at a rapid rate. It seems that the multitasking approach makes more sense: a linear core which can use (read-only) data and code from a nonlinear one. ... Baker 1978[1] -> contains some information about CDR coding. Useful for making byte code sequential (cache-friendly). Chapter 8 is of interest though: talks about RC counting. The trouble with RC managed data that Baker mentions is that a `drop' of a huge data structure takes a long time to deallocate. He suggests to do te deallocation lazily. In practice though (especially due to the `physical' nature of linear objects) this doesn't really happen much. Consumption rate is usually one-by-one. And, lazy dealloc still doesn't perform fast reclaim for cache locality, what we're really after.. [1] http://home.pipeline.com/~hbaker1/RealTimeGC.html Entry: `Harvard' PF in Flash ROM Date: Mon Aug 17 19:15:18 CEST 2009 Now.. If this works, does it make sense to try to implement the same algorithm on a PIC18 while using the RAM as linear memory, and keeping the ROM GC managed? It really doesn't need all that much space, and the self-write seems to be compatible. Entry: PEG/LEG Date: Tue Aug 18 08:58:30 CEST 2009 Trying out this[1]. The best way to understand is the metacircular example: The grammar for peg grammars is shown below. Grammar <- Spacing Definition+ EndOfFile Definition <- Identifier LEFTARROW Expression Expression <- Sequence ( SLASH Sequence )* Sequence <- Prefix* Prefix <- AND Action / ( AND | NOT )? Suffix Suffix <- Primary ( QUERY / STAR / PLUS )? Primary <- Identifier !LEFTARROW / OPEN Expression CLOSE / Literal / Class / DOT / Action / BEGIN / END Identifier <- < IdentStart IdentCont* > Spacing IdentStart <- [a-zA-Z_] IdentCont <- IdentStart / [0-9] Literal <- ['] < ( !['] Char )* > ['] Spacing / ["] < ( !["] Char )* > ["] Spacing Class <- '[' < ( !']' Range )* > ']' Spacing Range <- Char '-' Char / Char Char <- '\\' [abefnrtv'"\[\]\\] / '\\' [0-3][0-7][0-7] / '\\' [0-7][0-7]? / '\\' '-' / !'\\' . LEFTARROW <- '<-' Spacing SLASH <- '/' Spacing AND <- '&' Spacing NOT <- '!' Spacing QUERY <- '?' Spacing STAR <- '*' Spacing PLUS <- '+' Spacing OPEN <- '(' Spacing CLOSE <- ')' Spacing DOT <- '.' Spacing Spacing <- ( Space / Comment )* Comment <- '#' ( !EndOfLine . )* EndOfLine Space <- ' ' / '\t' / EndOfLine EndOfLine <- '\r\n' / '\n' / '\r' EndOfFile <- !. Action <- '{' < [^}]* > '}' Spacing BEGIN <- '<' Spacing END <- '>' Spacing Starting with this: Grammar <- Spacing Identifier+ EndOfFile Spacing <- ( Space / Comment )* Space <- ' ' / '\t' / EndOfLine Comment <- ';;' ( !EndOfLine . )* EndOfLine EndOfLine <- '\r\n' / '\n' / '\r' EndOfFile <- !. Identifier <- < IdentStart IdentCont* > Spacing { printf("ID: %s\n", yytext); } IdentStart <- [a-zA-Z_] IdentCont <- IdentStart / [0-9] Which doesn't do anything.. I guess I'm not getting something... Maybe what I need is leg instead of peg? Indeed, it seems to support expressions. exampeles/calc.leg: %{ #include int vars[26]; %} Stmt = - e:Expr EOL { printf("%d\n", e); } | ( !EOL . )* EOL { printf("error\n"); } Expr = i:ID ASSIGN s:Sum { $$= vars[i]= s; } | s:Sum { $$= s; } Sum = l:Product ( PLUS r:Product { l += r; } | MINUS r:Product { l -= r; } )* { $$= l; } Product = l:Value ( TIMES r:Value { l *= r; } | DIVIDE r:Value { l /= r; } )* { $$= l; } Value = i:NUMBER { $$= atoi(yytext); } | i:ID !ASSIGN { $$= vars[i]; } | OPEN i:Expr CLOSE { $$= i; } NUMBER = < [0-9]+ > - { $$= atoi(yytext); } ID = < [a-z] > - { $$= yytext[0] - 'a'; } ASSIGN = '=' - PLUS = '+' - MINUS = '-' - TIMES = '*' - DIVIDE = '/' - OPEN = '(' - CLOSE = ')' - - = [ \t]* EOL = '\n' | '\r\n' | '\r' | ';' %% int main() { while (yyparse()); return 0; } One problem though: it doesn't seem to be possible to pass context into yyparse that can trickle down to the actions. Makes me wonder how one would implement dynamic scope in C in a thread-safe way. I guess by using thread-local variables.. Ok. I think I get the basic idea. Now the subtleties. How to express delimiting? [1] http://piumarta.com/software/peg/peg.1.html Entry: Weird error Date: Tue Aug 18 15:37:32 CEST 2009 (define (read-file) (let loop () (let ((expr (read))) (if (eof-object? expr) (post expr) (begin (post expr) (loop)))))) (read-file) Passing the string ' 1 2 3 abc "def" foo' gives: READ-TEST ;; gc 2340:12378 ;; gc 2383:12335 ;; gc 2380:12338 1 2 ;; gc 2377:12341 3 abc "def" foo ;; gc 2298:12420 ERROR: undefined: expr This `undefined' message should not happen, as the variable should be in lexical context. Something goes wrong here.. The environment does have `expr' : (gdb) p ex_post(sc, _ex_map1_prim(sc, ex_car, env)) (expr loop) So maybe the two symbols are not equal? (gdb) p term $6 = 6561248 (gdb) p ex_caar(sc, env) $7 = 6561248 No, they are.. Ok, I know what this is: This should use find_slot istead of find, because the value can be #f Entry: First PEG grammar Date: Tue Aug 18 16:41:23 CEST 2009 Next = - d:Datum { ob = d; } | - e:EOF { ob = e; } | j:Junk { ob = j; } Datum = List | Number | Symbol | String LP = '(' - RP = ')' - Dot = '.' Space - List = LP Tail Tail = a:Datum Dot d:Datum RP { $$= CONS(a,d) } | RP { $$= NIL } | d:Datum t:Tail { $$= CONS(d,t) } Junk = < .* > EOF { $$= JUNK(yytext) } Number = < Digit+ > !Letter - { $$= NUMBER(atoi(yytext)) } Symbol = < Letter Char* > - { $$= SYMBOL(yytext) } String = ["] < ( !["] Char )* > ["] - { $$= STRING(yytext) } Letter = [A-Za-z] Digit = [0-9] Char = Letter | Digit Space = [ \t\n] - = ( Space | Comment )* EOF = !. { $$= EOF_OBJECT } Comment = ';' ( !EOL . )* EOL EOL = '\n' | '\r\n' | '\r' After adding some special characters, it parses the boot.scm file. Parse error reporting is going to be spartan it looks like.. So, how to parse from a different input port? ( Simple: read the man page. ) I think I get it now: a PEG is a backtracking program with a sequential left to right choice operator. It is not a description of a CFG. So, in order to read a minimal amount of characters, the current grammar needs to be reworked a bit. Esp. the greedy post-whitespace parse needs to be replaced by something else. Start with simply putting all '-' rules in front. This seems to almost work.. The .scm file won't parse though. Entry: REPL working Date: Tue Aug 18 20:07:50 CEST 2009 # This is the non-greedy version of the grammar: # http://zwizwa.be/darcs/libprim/ex/sexp.leg Read = - n:Next { ob = n; } Next = Datum | EOF | Junk Datum = Quote | Const | List | Number | Symbol | String List = LP Tail Tail = a:Datum Dot d:Datum RP { $$= CONS(a,d) } | RP { $$= NIL } | d:Datum t:Tail { $$= CONS(d,t) } Quote = - ['] - d:Datum { $$= QUOTE(d) } Const = True | False LP = - '(' RP = - ')' Dot = Space - '.' True = - '#t' { $$= TRUE } False = - '#f' { $$= FALSE } Number = - < Digit+ > !NonDigit { $$= NUMBER(atoi(yytext)) } Symbol = - < Char+ > { $$= SYMBOL(yytext) } String = - ["] < ( !["] . )* > ["] { $$= STRING(yytext) } Letter = [A-Za-z] Digit = [0-9] Special = '!' | '-' | '?' | '*' | '_' | ':' | '<' | '>' | '=' | '/' | '+' NonDigit = Letter | Special Char = Digit | NonDigit Space = [ \t\n] - = ( Space | Comment )* Comment = ';' ( !EOL . )* EOL EOL = '\n' | '\r\n' | '\r' EOF = !. { $$= EOF_OBJECT } Junk = < .* > EOF { $$= JUNK(yytext) } Entry: Next Date: Tue Aug 18 21:26:25 CEST 2009 Scheme: - dynamic parameters - dynamic wind - partial continuations PF: - separate linear pair type (OK) - export non-linear part as EX (OK) - polymorphy - packet reuse - exceptions EX: - primitives Entry: GC and read -> need C suspensions Date: Wed Aug 19 08:43:40 CEST 2009 Because ports are stateful entities, the read operation might interact badly with GC restarts. It might be necessary to construct an intermediate representation. The real solution however is to make the reader communicate externally at everything which is now a CONS, so it never sees scheme values. I guess this can be made to work, so I'm going to stick to the current version, since that change would be transparent once context switching is implemented. Entry: PF compiler: SC < PF ? Date: Wed Aug 19 09:25:54 CEST 2009 Since it needs to construct the right data types, which ultimately tie into the memory manager through finalizers, it looks like the linear memory manager needs to be part of the Scheme's machine state if we want to create linear code.. So no, this doesn't work.. What might work is to do it the other way around: have PF state be an extension of SC state, so all Scheme code can run on the PF machine, _and_ use the PF compiler words, but not vice versa. Indeed: the `sc' struct only has the global registers and symbol cache in addition to the `ex' struct. This can be easily changed. In order for this to work, the linear data needs to be well separated. Let's make a separate LCONS data structure. Ok, almost done. ex_bang_reverse was not polymorphic, and it wasn't completely clear where to do the graph -> lin conversion. NONLIN -> LIN conversion should happen in pf_ functions, while px_ functions operate on nonlinear structures. They can depend on linear structures, as long as they don't link them to anything. I.e. inplace update is allowed, and some kinds of folding. Entry: LIN vs. BOX Date: Wed Aug 19 12:24:51 CEST 2009 It's probably simpler to place ALL linear data that is supposed to go into the nonlinear tree in a BOX. LIN is only useful for quotations, which can be nonlinear values. This does break LIN->GRAPH->LIN equivalence, since all data will be wrapped. Something is not well specified in the data conversion: how to handle lists at compilation time? The interpreter should be dumb: it unwraps LIN, but will simpy copy anything else. Let's default QUOTE such that it will place a LIN-wrapped copy of the quoted datum. Entry: GC restarts Date: Wed Aug 19 13:04:24 CEST 2009 When this keeps growing, it's probably going to be a good idea to get rid of the GC restarts, and scan the C stack for references to update. Restarting works well for simple functions, but with more complicated C functions and these functions calling each other, the invariants (no alloc after mutation) are not so easy to maintain. Entry: PF read needs to be linear Date: Wed Aug 19 13:15:28 CEST 2009 This is going to be more difficult to manage. But I/O really needs to be free of GC issues. Ports also need to be lowlevel (fd instead of FILE). Entry: PF design choices Date: Wed Aug 19 13:55:15 CEST 2009 So what problems does it actually solve? The PF design allows the combination of: * Synchronous finalization. * Reflection. In PF, synchronous finalization is a consequence of the use of primitive functions that operate on a linear core memory: linear lists/trees and open/close RC-managed leaf objects). Reflection -- here limited to compilation: the translation of source code represented as a data object to a representation of composite functions that can be executed by the interpreter -- is made practical by embedding the linear core machine in a non-linear dynamic language. This composite code is naturally represented as a non-linear data structure: it uses sharing (subroutine calls) and can contain cycles (program loops). While compilation itself is a nonlinear operations and might produce garbage that needs to be collected asynchronously, running the resulting code does not, as it only sequences linear operations. Note that nonlinear data can be treated as constant by the linear primitives and interpreter because it is managed elsewhere, as long as the asynchronous GC traces all the nonlinear references in the linear memory. Dynamic features are helpful for debugging, testing and evolving design (= writing throwaway code). This, however, is largely a matter of taste. The usefulness of the linear memory manager has some deeper reasons though. Asynchronous GC doesn't work well in combination with finalizers: functions that free external resources (different from core memory) when their representation gets collected. In practice, an ``external resource'' is anything with an open/use/close access semantics that _you cannot change_ for whatever reason. This can be anything. Some examples: - A file in a filesystem. - A hardware resource. - Cache-managed resources (i.e. processor memory cache). ( The latter case might be unclear. Here the open/close is not explicit, but some resource will be evicted whenever a resource that's not in the cache is accessed. A cache doesn't behave as memory because there is no `memory-full' notification when an eviction happens. Absence of this signal is actually the basic idea: it frees the user from having to ``pool'' the resource. In some cases however (essentially due to broken design, when you can't bypass the cache API), you want to regain control over the pooling behaviour by manipulating the access patterns from an external pooling mechanism. Note that in general, this is only a performance problem: dead references will eventuall get evicted from the cache because they are never accessed, but they might trigger eviction of _other_ components that are still in use. I.e. the round-robin access pattern typical for an asynchronous GC does exactly this in the case of an LRU cache strategy. ) The problem is that you never know when the GC runs, which might lead you to cases where you're waiting on the availability of a resource that's unreachable, but not yet collected. Triggering local GC in this case isn't a solution: the reference could be held by an unreachable representation object in a _different_ program that uses asynchronous GC which doesn't expose an interface to its GC trigger. The real problem here seems to be that asynchronous GC is essentially a _global_ operation, and making this work with low-level resource management requires a specific kind of resource -> GC signalling to happen between all components involved. In stark contrast, the open/close semantics composes easily (a constructor/destructor calls several constructors/destructors), and shields the lower layers from what happens above. This means that in practical situations, open/close resource semantics are the only available option, and in the case where fast reclaim is necessary, a linear container memory might be the simplest solution. A linear _concatenative_ language gives you many of the benefits of a higher level safe language, with the added benefit of not needing an asynchronous GC, at the expense of giving up the ``random access'' that come with lexical variables. Of course, it is also possible to translate this to a language _with_ lexical variables as long as proper constraints are placed on variable references: every variable needs to be referenced exactly once. This can be enforced statically. This seems to be harder to get right though (i.e. use-once closures). However, if such a language can be defined, it can probably be directly translated into the PF concatenative execution model. Note that Real-Time GC (incremental GC with aribrarily low upper bounds on collection times) isn't a solution to the problem unless it can manage _all_ resources in a bounded time. Collection still isn't synchronous, so any depletion event needs to propagate into the collector, which might break the upper collection bounds as it requires a full GC. Entry: Static linearity checks Date: Wed Aug 19 14:35:50 CEST 2009 It would be nice to find a way to statically enforce the linearity constraint. Currently, it's quite error-prone to write primitives. Entry: Merging SC and PF Date: Wed Aug 19 15:16:50 CEST 2009 Maybe it's time to completely unify the two interpreters. Essentially this would work as a `scheme-eval-step' in PF. The idea is that once you have everything inplace to implement PF in EX, the only thing that's left is the Scheme interpreter's (pure) step function and some other operations on the interpreter state. The only problem is Scheme's toplevel environment. However, this could be hidden as dynamic scope in the continuation chain inside the state. Entry: Multiple dispatch Date: Wed Aug 19 16:46:09 CEST 2009 How is this best implemented? Entry: No >r, nor r> : towards call/cc Date: Wed Aug 19 17:56:53 CEST 2009 These don't work any more. The problem is that there isn't really a return stack. There is only a continuation. If this is the case then it is better to get rid of IP, and always use the top of RS as the continuation. What can be done is to modify the continuation such that an operation can be inserted after another one. I.e. to implement `dip'. OK: removed IP register. RUN is now simpy a push to RS. Let's rename RS to K. Now call/cc is very simple: void pf_call_with_cc(pf *pf) { _ fn = TOP; _ k = pf->k; pf->k = LINEAR_CONS(fn, HALT); _TOP = k; } Note these are linear one-shot continuations, but they can be copied without causing any trouble. They are also _data structures_ and cannot be applied like a function. But, maybe "run" should take linear lists, by pushing them onto K. Note this is problematic without partial continuations (which should be the default..) Entry: Lowhanging fruit Date: Wed Aug 19 20:03:57 CEST 2009 - dip - map - linear lists (partial continuations) should not be recognized by `run' -> use `run-list' or so. Map: This can't be a primitive, but I prefer to make it fast (local), which could be done by modifying the continuation from the primitive. It needs 3 storage locations: in out fn - move result to OUT - replace with next from IN - push FN to K What I really need is a better way to express all these stack permuations. Dip: there's a problem here: quote isn't a linear atom, so it should be handled by undip. Entry: Delimited Continuations Date: Thu Aug 20 09:27:08 CEST 2009 In contrast with the previous PF implementation, which was mostly an indirect threaded Forth, the current VM has a direct representation of continuations. Partial continuations are not so difficult to construct on top of this: they simply splice the current K upto a marker. I prefer to use shift/reset, so let's implement that first. Seems to work: (shift 1 2 3) reset ps <1> [{'1 '2 '3}] run ps <3> 1 2 3 The first paper about delimited continuations seems to be this one[1]. In the introduction it mentions that the crucial idea is to represent continuations as functions that can be composed. In contrast, traditional continuations cannot be composed, as they do not return a value to the calling context (it is discarded). On page 7, on changing the CEK machine: ``One solution is changing (CEK7) so that the invoked continuation is concatenated to the current one: ...'' Apparently this is problematic for the denotational semantics (continuations being opaque?) but I see no problems on the implementation side. Essentially, in PF this concatenation could be made ``lazy'' (just like SEQ SUBroutines) by allowing K to be a tree instead of a list. That way the interpreter can unpack branches one cell at a time. Otoh, it seems simpler to place the burdon in `run' as it is now. There are some other problems with the current implementation though: the `prompt-tag' and `undip' functions are special, but can occur in lists that are representing a continuation. Therefore it is probably best to represent the continuations abstractly to prevent programs from destructing them. In any case, it could be left as is now. Just keep in mind: a list of code can be converted to a continuation, but vise versa isn't really sound. [1] http://www.ccs.neu.edu/scheme/pubs/felleisen-beyond.pdf Entry: Error prone Date: Thu Aug 20 10:01:05 CEST 2009 As mentioned before, writing the PF primitives is error prone. Linearity constraints are difficult to maintain in the presence of primitive exceptions. There is only one remedy: never keep linear variables on the C stack. But this can be tedious.. Entry: Pattern matching for EX Date: Thu Aug 20 11:28:14 CEST 2009 This really shouldn't be so difficult: from the data structures, it would be a generation of a couple of CPP macros that setup a lexical environment. It would require some trickery with curly braces though. MATCH(v) CASE(pair, a, d) { } CASE(NIL) { } ELSE { } END_MATCH where MATCH sets up the first '{' and each CASE starts with a '}'. Entry: Scheme -> EX compilation Date: Thu Aug 20 15:24:19 CEST 2009 This seems really straightforward. It would be nice to be able to statically check at least names and argument arity. About writing code generators: you _really_ want to write it as a coroutine (a `generator') that passes strings to an output port through a `yield' function, in this case traditionally called `emit'. Using mapping might be more elegant, but in this case due to small differences in tree structure, decoupling the collector from the enumerator is best. In C, there are statements and expressions. For EX it's simpler to get rid of the statements early on, so one can focus on the expressions. Also, they don't need to be converted to statements (CPS) since C's expression syntax will do just fine. Then, it's simpler to use syntax-rules style pattern matching to write the generator than an using scheme/match at runtime. A bit of work trying to get the basic abstractions right: statements emit a line, expressions use string substitution to be built recursively, function/constructor names get mapped, context arguments get added. Everything is a function, except the language forms themselves, which are implemented as macros (`def', `block' and `return'). (def (bar x y) (block ((a 123) (b 345)) (block ((c (plus a b)) (d (min a))) (return (foo c d x y))))) => _ ex_bar(ex *ex, _ x, _ y) { _ a = 123; _ b = 345; { _ c = ex_plus(ex, a, b); _ d = ex_min(ex, a); return ex_foo(ex, c, d, x, y); } } Translating from Scheme-like syntax (eliminating 'return') will require automatic distinction between special forms and application forms. There's another thing that's really annoying: nested Scheme/lisp expressions return values, but C's curly braces represent nested statements. Maybe it's best to simply target GCC's statement expressions[1], and use the '?' condition operator. This seems more natural. I'm also using a direct compiler instead of macros. To make this look better though, the indentation needs to be changed so statement expressions can be assigned to variables to reduce the visual complexity of the nesting. That's a pretty-printing exercise, so consider the real problem solved. The rest is play ;) So, prettyprinting. What you want is "newline" to indent to the current indentation point. -- Trying something else: rewrote it to make it more _manual_ : essentially the code makes decisions locally about whether to "press enter" or whether to change indentation or not. This seems to work quite well. (emit-definition '(define (foo a b c) (let* ((x (plus b c)) (y (let* ((d (min a b)) (e (max a b))) (times d e)))) (bar x y)))) (pp-enter) => _ ex_foo(ex *ex, _ a, _ b, _ c) { return ({ _ x = ex_plus(ex, b, c) _ y = ({ _ d = ex_min(ex, a, b) _ e = ex_max(ex, a, b) ex_times(ex, d, e); }) ex_bar(ex, x, y); }); } OK.. I now see that this is not trivial because it requires backtracking. Better use something that's seen some thinking[3]. [1] http://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html [2] http://www.iai.uni-bonn.de/~ralf/hw2001/9.html [3] http://planet.plt-scheme.org/display.ss?package=pprint.plt&owner=dherman [4] http://eprints.kfupm.edu.sa/69518/1/69518.pdf Entry: Terminology Date: Thu Aug 20 22:59:14 CEST 2009 I shouldn't be calling the code graph an AST (abstract syntax _tree_) since it is a _linked_ AST, which in general is a graph: variable references linked to their definition points. In PF all variables refer to code: primitives, quotations or SUB cells. Entry: PF interpreter in Ocaml Date: Fri Aug 21 08:14:26 CEST 2009 I'd like to encode the interpreter data types in something more rigid, to see if I missed some corner case. The first thing to realize is to distinguish types and constructors. SEQ = (SUB : SUB) ;; `:' is graph CONS SUB = PRIM | QUOTE | SEQ K = HALT | (SUB . K) ;; `.' is linear CONS type sub = Prim | Quote | Seq of sub * sub ;; type k = MT | Frame of sub * k ;; The second thing is to realize that dynamic typing is a _lot_ less restrictive than the ML type system. Especially the wrapping necessary make unions of other types. Dave Herman describes it here[2] in relation to the way it's done in Typed Scheme[3]. type sub = Prim of prim | Quote of datum | Seq of sub * sub and k = MT | Frame of sub * k and datum = Number of int | Code of sub | Pair of pair and prim = Address of int and pair = Nil | Cons of datum * datum ;; Now I'm trying to express the interpreter. Ocaml's syntax is weird. Constructors have different syntax from functions? Aha. Ok I understand. Constructors can take multiple inputs, but functions always take a signle one. Multiple inputs are handled using currying. A constructor seems to be a function that takes a tuple. Next: how do you put a function in a datatype? I.e. the prim type is actually state -> state. Looks like I want GADTs[4][5][6]. This looks like a can of worms I don't want to get into yet. Let's use simple enumeration. Good. It's running. See code here [7]. I've added 'Run' which leads to some arbitrariness in encoding the types. Is Run a prim or a special case of code? I'd say the latter since it operates on the whole machine state instead of just the parameter stack. Quite revealing to have to specify things a bit tighter.. From a first real contact, I think I like OCaml, even if it is more restrictive. It is definitely more easy to get things right once they pass the type checker. I should have a closer look at Typed Scheme[3]. Now, getting the 'fac' example running. In factor: : fac ( n -- n! ) dup 1 = [ 1 ] [ dup 1 - fac ] if * ; What is necessary is a way to define a cyclic data structure. Apparently `let rec' does not allow this: let rec _fac = Seq(_dup, Seq(lit 1, Seq(_equals, Seq(quot (lit 1), Seq(quot ((Seq(_dup, Seq(lit 1, Seq(_minus, _fac))))), Seq(_if, _multiply))))));; => Error: This kind of expression is not allowed as right-hand side of `let rec' How to solve this? In Scheme it's easy: just dump a symbol at first, then walk the data structure and set! the value. [1] http://lambda-the-ultimate.org/node/983 [2] http://calculist.blogspot.com/2008/09/true-unions-revisited.html [3] http://www.ccs.neu.edu/home/samth/typed-scheme/ [4] http://www.haskell.org/haskellwiki/Generalised_algebraic_datatype [5] http://okmij.org/ftp/ML/GADT.ml [6] http://www.haskell.org/haskellwiki/GADTs_for_dummies [7] http://zwizwa.be/darcs/libprim/pf/pf.ml [8] http://caml.inria.fr/pub/ml-archives/caml-list/2008/12/460de486fcbbc2eca13e5c77d014da17.en.html Entry: PF interpreter in Haskell Date: Fri Aug 21 17:02:45 CEST 2009 That was fun. Let's try the Haskell version. First, how does this work.. Loading the file pf.hs in ghci: > :l ~/libprim/misc/pf To reload: > :r Oops.. ran out of steam.. maybe one lanuage in one day is enough.. I wonder though: the only apparent difference with Ocaml (apart from the pure and lazy things) is that constructors seem to be curried functions, while on Ocaml they take tuples. [1] http://learnyouahaskell.com Entry: Remarks about the PF interpreter Date: Sat Aug 22 08:15:00 CEST 2009 Refer to entry://20090816-120957 Note it is important to distinguish the byte code graph data structure from an AST or the untyped s-expression encoding of _symbolic_ code. The SEQ type can have sharing (shared subroutines) or loops (direct or mutual recursion), while an AST or an s-expression is really a flat _tree_. - symbolic source code like: ``1 dup cons'' here the symbols are translated by a dictionary (vocabulary) that maps symbol -> SUB. symbolic code is a _tree_ represented as an _untyped_ s-expression. - `byte code graph'. Each symbolic source code definition can be compiled to a single SUB (subroutine) type, which is _directly linked_ to other SUB types by data sharing. Because it's a graph, this data structure is not easily printed. - AST: because the original syntax is so simple, the intermediate form one would traditionally use in an interpreter = a data structure with a recursive type that represents the same _tree_ as a source code s-expression, is _not_ present in the current PF implementation. Compilation goes straight from symbolic code as non-annotated s-expressions to the `byte code graph'. IMPLEMENTATION One could _implement_ a SEQ(foo,bar) as the following machine code: call
jump
Note that the `first' slot in a SEQ will push the return stack when it points to another SEQ. If foo were a prim, the `call' instruction could be replaced by inlined machine code implementing the primitive, or a call to a machine code subroutine. In the case 'bar' is a subroutine that's used only once, it could be inlined to eliminate the jump. Alternatively, if the machine has a literal push instructiom, an even simpler representation would be something like: push
jump
Entry: The return stack should contain XTs Date: Sat Aug 22 10:40:14 CEST 2009 Something that has always bothered me about Forth is that the return stack contains _pointers_ to XTs (i.e. XT _arrays) while `execute' takes XTs. In other word: the return stack _itself_ is not a program. This is a consequence of having two forms of code: threaded code and machine code. An XT is often implemented as a pointer to machine code (direct threaded) or a pointer to a pointer to machine code (indirect threaded). Entry: PF closure Date: Sat Aug 22 10:46:47 CEST 2009 Because there is no nested lexical scope, there are no closures as there are in a functional programming language based on the lamda calculus. However, they can be simulated by `consing' data onto code. Entry: Ocaml and tagged interpreters Date: Sat Aug 22 12:23:04 CEST 2009 It seems to me that not being able to define data types that can contain functions is quite a limitation. Maybe there is something I didn't get? Maybe this only can't be done _recursively_ ? I.e. what I want is: type sub = Prim of (state -> state) | Seq of sub * sub (* NL *) instead of having to break it down to tags where each state -> state function is interpreted. Actually. Look at [1] Function types. Also [2] Recursive values which are not functions. About recursive values: apparently it is only able to handle `simple' expressions. ``Informally, the class of accepted definitions consists of those definitions where the defined names occur only inside function bodies or as argument to a data constructor.''[3] [1] http://caml.inria.fr/pub/docs/oreilly-book/html/book-ora016.html#toc21 [2] http://caml.inria.fr/pub/docs/oreilly-book/html/book-ora016.html#toc23 [3] http://caml.inria.fr/pub/docs/manual-ocaml/manual021.html#s:letrecvalues Entry: Next Date: Sat Aug 22 15:22:33 CEST 2009 After the small excursion to Ocaml land (maybe I should forgo the Haskell version?) it's time to get back on track and get this thing working before next week. Goals: - tasks/assymetric coroutines with proper stack switching. - automatic wrapping of C APIs. Entry: Closures / Different continuation types Date: Sat Aug 22 18:22:21 CEST 2009 The 'cons' trick to make a runnable continuation out of lists (possibly consed to existing code) doesn't work: code needs to be quoted. > '(1 2 3) run > ERROR: unknown-instruction: 1 Maybe it was really a good idea to go for the Ocaml route: now I can try to express these things in a type system to eliminate glossing over those level-shifting tricks that tend to seep in to ``just make it work''. This is something to be _really_ careful about. Something I've only learned to appreciate recently.. It makes things more complex, but it is well worth it when you try reasoning about the code. Guy Steele mentioned this in [1]. It is also quite central to the PLT module system[2]. Intuitively: it is about a more general principle of not giving into the ``turing machine feedback loop''. Recursion (or iteration) is a fine tool, but too much mixing makes things hard to understand. More often you want to abstract direct recursion into combinators. For my taste Joy relies too much on `i' which constantly shifts levels. However it does illustrate a point: if you want a really simple language: eval everything. What would work is to have different 'frame' types to instruct the interpreter to handle these speical cases. Also, the stuff you pinch off of a continuation stack should really be opaque. Composable, yes, but not decomposable.. Maybe there could be two kinds: interpreting and non-interpreting frames? So it looks like we need different kinds of continuations after all.. They can all be differently tagged linear pair types though. [1] http://www.ai.mit.edu/projects/dynlangs/wizards-panels.html [2] http://www.cs.utah.edu/plt/publications/macromod.pdf Entry: Quote of the day Date: Sat Aug 22 18:50:04 CEST 2009 Attributed to Matthew Flatt: ``You need a certain amount complexity to make things look simple.'' [1] http://list.cs.brown.edu/pipermail/plt-scheme/2009-August/035243.html Entry: Asymmetric Coroutines in Lua Date: Sat Aug 22 22:39:04 CEST 2009 It makes the point that, like partial continuations (PC), asymetric coroutines (AC) are easier to use because they are composable just like routines, while symmetric coroutines are hard to set up. Coroutines cannot yield when there is a C function call in the stack. The authors mentioned they were working on investigating the relation between AC and PC. From what I tried in Scheme, an AC can be implemented in terms of a PC. Also the other way around? They both are essentially segments of a control stack. The update is here[3][4]. A library by Sandro Magi (naasking)[5][6]. Looks like this is the place to contiue working. I missed it last time, since I went to the sigfpe[7] article for inspiration. Currently, libprim/task.c has stack copying coroutines. So, what am I going to do? Stack copying already works, so it could be a fall-back. Implementing one-shot continuations could then be implemented as an additional feature. [1] http://lambda-the-ultimate.org/node/438#comment [2] http://www.inf.puc-rio.br/~roberto/docs/corosblp.pdf [3] http://lambda-the-ultimate.org/node/2868 [4] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.58.4017 [5] http://code.google.com/p/libconcurrency/ [6] http://higherlogics.blogspot.com/2008/07/coroutines-in-c-redux.html [7] http://homepage.mac.com/sigfpe/Computing/continuations.html [8] http://www.reddit.com/comments/6s5tt/coroutines_in_c Entry: Making partial continuations abstract Date: Sun Aug 23 09:35:40 CEST 2009 So, it looks like it's clear that PCs are the proper abstraction mechanism: they are k-stack segments that can be concatenated to the current continuation. There are several ways to approach this. However, considering that the tagging needs to use _linear_ memory, the appropriate way is to tag the pairs. If this is the way to do things, is it possible to move other kinds of tagging to a ``snapped'' pair-based encoding? I.e. instead of Seq(Quote(a), b) use SeqQuote(a, b) It doesn't look like it: there are several disadvantages to not having Quote(...) and Prim(...) as separate instances of a sub (subroutine) type. These are used a lot as a single type and should be bundled, so they don't complicate matters. So what about this: and cont = Done | Next of sub * cont | Late of datum * cont Where 'Late' has the implied semantics of using `interpret' to convert datum -> sub. This would allow certain kinds of cheap, linear reflective behaviour to be implemented. ( Note: I'm _only_ trying to work around the construction of a Quote type, which is nonlinear. I.e. it is a flattening optimization that allows the use of linear data only. ) What about calling 'Late' something like 'Data' or 'Close'. It really doesn't need `interpret' semantics: it can be a pure quote, because the call to `interpret' can be inserted directly into the continuation. This looks alright. I just have to convince myself that this is elegant: I.e. that it's really necessary to have 2 quotation mechanisms: one part of nonlinear code (enabling data to behave as code + be shared), and one part of linearly constructed code (data as code, but not shared). Let's give it some fermentation.. EDIT: One thing to note is that the continuation doesn't need this whole tail-recursive constraint: it is always consed to the empty continuation, which leads to a machine halt state. So it really _is_ a pair. All this is because of subtle differences is between `code', `closure' and `continuation'. (I need a better name for `closure' though.) Entry: Tree permutations Date: Sun Aug 23 10:00:42 CEST 2009 It's suspiciously difficult because of the possible type exceptions that might mess up temporary storage on the data stack. So let's forget about optimality and write well-factored code. A linear language specification needs a sublanguage for specifying _tree permutations_. Essentially, once you've figured out which pointers permute, the inverse is just the inverse permutation. This needs to be split in two parts: - make sure the tree has the structure you expect (type check) - perform permutation Essentially, I want th tree transformation to be safe and guaranteed linear in the first place, and correct in the 2nd place (which is then easily checked in the terminal). Probably Knuth has something to say about this. Also, I ran into a tree transformation language a while back. Forgot where.. Was a Belgian guy. Also, I did something like this before. Probably hidden somewhere in Staapl / Brood code archives.. Poke? This looks like a nice exercise to write a compiler for. Can it be done using binary tree rotations? ( Really, I've done all this before.. But where? ) Suppose the stack is this: (L . S) `cons' will be something like (L . (d . S)) -> ((d . L) . S) then the reverse `uncons' is: ((d . L) . S) -> (L . (d . S)) They are indeed simple tree rotations. A left rotation can be transformed into a right rotation as LEFT = FLIP * RIGHT * FLIP Nope! They are not tree rotations. They are rotations followd by a particular flip: LEFT FLIP (L . (d . S)) ---> ((L . d) . S) ---> ((d . L) . S) Anyways, it doesn't seem so difficult to compile such a definition into a program that deconstructs the list, and then re-uses the list cells to reconstruct it. This can even be done manually. I've summarized it in the next post. Entry: Writing linear tree transformations Date: Sun Aug 23 10:58:56 CEST 2009 /* Implementing (conservative) tree transformations. A tree transformer can be derived using the following manual compilation approach: 1. Write down the transform in a high level dotted pair form: LHS -> RHS 2. Following the structure in LHS, bind the nodes (and deconstructed dot pairs) to C lexical variables using type-checking casts for the pairs. This makes sure exceptions happen before we mutate anything. 3. Rebuild the tree by re-using the pairs to create the dotted tree in the RHS. To increase readability, index the pairs from left to right as they appear in the textual form of LHS and RHS. */ /* D = datum L = list P = parameter stack (L . (D . P)) -> ((D . L) . P) */ void pf_cons(pf *pf) { pair *dot0 = CAST(lpair, pf->p); pair *dot1 = CAST(lpair, dot0->cdr); _ L = dot0->car; _ D = dot1->car; _ P = dot1->cdr; pf->p = VEC(dot1); dot1->car = VEC(dot0); dot0->car = D; dot0->cdr = L; dot1->cdr = P; } /* D = datum L = list P = parameter stack ((D . L) . P) -> (L . (D . P)) */ void pf_uncons(pf *pf) { pair *dot0 = CAST(lpair, pf->p); pair *dot1 = CAST(lpair, dot0->car); _ D = dot1->car; _ L = dot1->cdr; _ P = dot0->cdr; pf->p = VEC(dot0); dot0->cdr = VEC(dot1); dot0->car = L; dot1->car = D; dot1->cdr = P; } Entry: peg/leg leaky? Date: Sun Aug 23 11:18:05 CEST 2009 Let's clean up the reader. There is a problem with the peg/leg parser: it uses malloc() but no free(). OK for one-shot tools but if I'm going to keep using it, it's probably best to handle that problem.. Ok, it apparently reuses its buffers, and grows them whenever necessary. I guess that's OK. So, now I just need to be careful to not upset it. Probably best to use malloc() Entry: Different continuation types Date: Mon Aug 24 11:03:53 CEST 2009 Continuing the ideas in [1]. Starting with the following context: * A partial application is a binding of data context to code. Suppose you have an object referenced by `ob' and an operation `op' that expects such an object as the first parameter on the stack, the following code would be a closure: (ob op) * In Joy, code == data. Its VM wouldn't need to distinguish pairs for code, partial applications and partial continuations. Their difference is only visible through usage: they are all represented as lists. * In PF, these 3 need a different representation because code is nonlinear, while partial applications and continuations are linear. What about this: and stack = Empty | Next of datum * stack and cont = Done | Next of sub * cont | Data of datum * cont Wait. There is already a mechanism to construct partial applications as continuations: `dip' followed by `run'. The question is then: what is best? Explicit tagging (using different continuation frame types) or ``prefix parsing''. It seems to me that tagging is better behaved: in that case each frame has its meaning isolated, and doesn't depend on the meaning of the previous frame. So be it: tagged frames. This also makes `dip' simpler. [1] entry://20090823-093540 Entry: A Joy machine Date: Mon Aug 24 12:21:55 CEST 2009 In [1] you find an intensional Joy machine with continuations implemented as intensional quotations constructed at run time as the machine performs name -> quotation dereference. This is an example of how a reflective feedback loop (turning data into behaviour) can make a language interpreter really simple. The flip side is that reflection breaks static analysis. [1] http://zwizwa.be/darcs/libprim/pf/joy.ml Entry: SEQ -- why encode the byte code as a binary tree instead of nested lists? Date: Mon Aug 24 17:16:15 CEST 2009 (From a discussion with Dominikus.) Why use a binary tree instead of a list to represent byte code? We start with the assumption that we agree on representing byte code as a recursive data type (i.e. as opposed to using low-level arrays and pointers/indices), and that we agree on wanting proper tail call handling so loops can be expressed as recursive calls without blowing up the context stack. To do this correctly, given a concatenative function specified by the code sequence "a b c ... z", the _last_ procedure `z' needs to be handled in a special way. Before `z' is invoked, no context needs to be saved because there is nothing to do afterwards. This is in contrast with the function calls `a', `b', `c' ... which need the computation to resume and thus require a context save. Now, in the case of a compiled language, you have two options. You either perform this special treatment of the last element in the list at run time, or you perform it at compile time and represent the code in a data structure that doesn't need any special case handling by the interpreter. This is exactly what SEQ does: it provides a data structure which can be interpreted without special-casing. The `handling' of tail calls is done at compile time only, by encoding the program as a binary tree. Interpreting the binary tree has only one case: the first branch needs context save, while the latter one doesn't. Entry: Rant about reflection Date: Mon Aug 24 19:04:41 CEST 2009 PF isn't really so much about Forth. The name is a historical accident.. PF is really about building a simple, safe, dynamically typed language with linear memory management. Let me rant a bit. Forth is.. well.. Forth :) There isn't much to say about it, except that it's a very nice machine model with a hacked-together text interpreter that's really too reflective. Over the last couple of years I've moved more into the direction of less reflective code, mostly guided by the PLT Scheme approach to domain specific languages. The attempt in Staapl to ``unroll'' the very reflective nature of standard Forth stands as one of the pilars of my approach to concatenative languages. In general the pattern seems to be that using very reflective languages leads to very simple language cores and a lot of expressive freedom, but in turn makes it easier to write programs that are harder to understand. In a static programming style you put the cleverness in the types and other static constraints: something a compiler can check and proove consistent. In a dynamic language the cleverness goes right into the core of things: anything can be special-cased. The main point being that when code bases grow it seems to be beneficial to be able to bolt things down in a more static way: huge code bases can't be kept in a human head so you need a machine to help manage at least part of the consistency. Adding some static properties seems to be the only way to do that effectively. I'm currently right in the middle of these two paradigms: I realize dynamic languages are essential for debugging and exploration, but also that static structure is necessary to put more meaning into code you write for "later" when it's meaning is no longer in your short-term memory. The holy grail seems to be to find ways to combine these two in a productive manner. Gradual typing seems to be the way to go. Entry: JOY: Intensional quotation Date: Mon Aug 24 20:32:40 CEST 2009 Both intensional and extensional quotations can be _constructucted_ by building larger objects that contain other quotations as subparts, but only intensional quotations can be _deconstructed_ i.e. interpreted as a list. Because the use of extensional quotation requires two different types with different operators, Manfred concludes[1] that ``On the whole, then, it does seem preferable to have quotation as an intensional constructor.'' PF (and Scat in Staapl) do it differently to be able to compile quotations to a more efficient form. MetaOcaml does it differently to not loose well-typed guarantees. In general it seems to me that extensional quotation is the sensible default: loose reflective object language expressivity to gain meta language expressivity. [1] http://www.latrobe.edu.au/philosophy/phimvt/joy/j07rrs.html [2] http://en.wikipedia.org/wiki/Intensional_definition [3] http://ccl.clozure.com/irc-logs/scheme/2008-10/scheme-2008.10.22.txt Entry: rewriting -> Factor list Date: Tue Aug 25 08:00:05 CEST 2009 Hello list, I guess this is mostly for Daniel, but anyone else of course feel free to chime in... I'm trying to get a better idea about rewriting systems for concatenative languages, particularly in the area of specification of peephole optimizations. It's been on my mind for a while, but I've always side-stepped any non-deterministic computations. I.e. in Staapl[1], the PIC18 code generator[2] is written using an eager pattern matching approach, essentiall re-interpreting already generated machine code as stack machine code that can then be partially evaluated together with the currently compiling primitive operation. (Explained here[3] in some detail). I've always been suspicious that this formulation, while it works well in practice and has a certain elegance, is a special case of a more general, non-confluent rewriting system. It seems to me that if there are any interesting optimizations to make, they are not going to be unique and depend on which path through the reduction rules is used to get to a particular irreducible expression. And then it stops. I have some intuition about reduction machines, as long as they compute a unique solution (lambda calculus w. strict eval (CEK machine) and lazy eval using graph reduction, ...) but I'm not sure how to view the other cases where you have a bunch of rewrite rules and you ask the compiler to pick a sequence of reduction rule applications that leads to the ``most optimal'' reduction under some kind of cost function (for PIC18 this would be the machine code length). I hope this makes some sense.. Any ideas on this welcome. Cheers, Tom [1] http://zwizwa.be/staapl [2] http://zwizwa.be/darcs/staapl/staapl/pic18/pic18-unit.ss [3] http://zwizwa.be/archive/staapl.html Entry: Real-time GC Date: Tue Aug 25 08:34:16 CEST 2009 PF uses linear core memory management and nonlinear code memory management. With linear memory management I mean that the machine's run-time uses _only_ a tree data structure. SEQ encodes all the rest, and is the basic data structure of the nonlinear memory. (Essentially, SEQ is only constructed at compile time: during run-time the code structure is constant.) The reason for this is that (in my understanding), "real" real-time tracing garbage collection doesn't exist. It exists for memory-only GC, but in practice you also want to manage much more scarce hardware resources which in the light of a real-time GC _still_ require full non-bounded collection steps. If you replace a tracing GC with refcount management system, then you can have real-time GC, given that you realize GC cost is proportional to the _size_ of data structures you destroy. In practice, this isn't a problem. An embedded system usually has a loop with very predictable data usage. The case where you destroy a single data structure containing a large amount of objects is rare. Entry: Metacircular interpretation: Maxwell's equiations of software Date: Tue Aug 25 10:27:08 CEST 2009 [1] http://arcfn.com/2008/07/maxwells-equations-of-software-examined.html Entry: Next Date: Tue Aug 25 10:37:26 CEST 2009 Implement the 2-type continuation frames in C. This requires two kinds of linear cells: data cells (list cells) and contiuation k-cells. Got it.. let's see if `dip' works. OK. shift/reset now also work again. The implementation (using linear pairs for data and `next' frames for sequence continuations) is a bit messy. Maybe there is something to say about doing it like this.. Too reflective? Let's keep it like it is and see if there are any inconsistencies. Summary: it's now possible to `cons' a datum onto a partial continuation, which is implemented as a nested data/next frames, where the data frames are just the linear list object. This is still not correct: Essentially one wants a `bind' and a `compose' operation which combines a datum or a function with a function. Entry: What is a continuation? Date: Tue Aug 25 17:33:50 CEST 2009 That's the down-to-earth bit-twiddler's ``is''.. A continuation is essentially (a representation of) an imperative program that sequences the remaining primitive machine operations to perform to complete the current program/evaluation. The side effect of this imperative program is the evaluation of the expression it represents. Obvious if you look at what CPS-conversion does: it translate a functional program written in terms of nested expressions, into an imperative program that incrementally builds and updates a data structure. Now, this becomes really obvious in a concatenative (point-free) language, which already is in CPS form. The continuation is essentially a program in the same form as any other program. Entry: PLT Redex Date: Tue Aug 25 17:46:05 CEST 2009 Jao calls it syntax-rules on steroids[1]. Looks like the answer to the question about the peephole optimizer's rewriting behaviour has been right under my nose for quite a while. [1] http://programming-musings.org/2009/06/19/a-merry-gang/ [2] http://redex.plt-scheme.org/ Entry: Re: rewriting Date: Tue Aug 25 21:31:25 CEST 2009 Thread + reply from Daniel: http://www.mail-archive.com/factor-talk@lists.sourceforge.net/msg03559.html > If the rewrite rules are monotonic, then there's an easy procedure > to optimize the program: list out all programs, calculate their > efficiency, and select one of the best ones I guess here `monitonic' means that there are no loops? Something like `strictly decreasing'. > With something like an optimizer, I don't understand why things have > to be confluent. This corresponds to my intuition also. > Maybe the algorithm to find the optimal series of peephole > optimizations will be a confluent subset of the general rewriting > system of all valid peephole optimizations; is this what you mean? What I meant was that the algorithm in Staapl is formulated differently (each word is a macro: a _fuction_ which takes an object code stack to an object code stack). I don't immediately see how this can be reformulated as a more general rewrite system on code _syntax_, but I've got this hunch it's not so difficult. I'm also not sure if they are really 100% equivalent, since I'm throwing in some extra specification of the order of evaluation. This makes it possible to play tricks like using objects that only exist at compile time without requiring any special notation for it: the eager algorithm guarantees that such objects are fully reduced before code is compiled (or raise an error). What would be neat is to be able to keep this semantics, but use a more elegant way of formulating it as a syntactic operation. (If this doesn't make sense I'd be happy to elaborate.) > On register-based architectures (like the PIC18, right?), doing > everything with a stack is inherently inefficient because of peeks > and replaces on the top of the stack. The 8-bit PIC chips are weird. The 16-bit cores are more or less standard register/risc but on the 8-bitters everything needs to pass through a single working register. This makes it almost cry for a stack language approach :) > I guess you do the equivalent of DCN by your peephole > optimizations. DCN = deconcatenation? > The thing about Factor's compiler is, it does this in a more general > way. You should look at Slava's past blog posts, or read the code, > if you're interested in knowing more about it. Bout time I have a look at it then.. Is the compiler tied in closely to the rest of the system? I'm wondering if I it would be difficult to embed it in PLT Scheme through the concatenative -> Scheme wrapper macros used in Staapl. So, your basic idea seems to be that yes, you want something that's not confluent, and no, there is no general algorithm: general rewriting is too unstructured, so a special purpose approach is required. Thanks so far for these answers! Cheers, Tom Entry: Abstract continuations Date: Wed Aug 26 11:12:17 CEST 2009 Time to byte the bullet and make the code more abstract. TODO: make the polymorphy more highlevel on next change. Entry: Partial Application Date: Wed Aug 26 11:55:02 CEST 2009 The pf.ml machine viewed as a non-linear machine can have a `sub' for continuation. To do this linearly one needs: 1. a way to wrap up a a linear datum as something runnable 2. allow for the composition of runnable things. It turns out that `compose' is a much better base to build on than partial application, which is a special case of run-time data -> code conversion and runtime code composition. In the current form it needs an extra wrapper for nonlinear -> linear data, but after that, `compose' is just list concatenation where the CONS cells are continuation frames of `dip' and code sequencing. Entry: PF: Linear code composition with `compose' Date: Wed Aug 26 15:47:04 CEST 2009 The `>lcode' word _projects_ linear code and nonlinear code to linear code. The `abstract' word _wraps_ any atom as linear code. The `lcompose' word takes two linear code words and composes them. The `compose' word takes two elements on the stack, projects them to linear code using `>lcode', and composes them with `lcompose'. The `pa' word takes two elements on the stack, converts the first one using `abstract' and the second one using `>lcode', and composes the resulting linear code with `lcompose'. A partial continuation obtained using `shift' and `reset' is represented as linear code. A full continuation obtained from `call/cc' is currently implemented as a partial continuation that ignores continuation marks inserted by `reset'. The `run' word and with it all combinators take both linear and nonlinear code. They do _not_ take data however. In that light I'm still not convinced that `>lcode' should automatically convert data to code. There is probably need for an `>ldata' word too. Summary: Anything that represents code can be passed to `compose' to yield a code quotation that can be passed to `run' and higher order combinators. Entry: Shift / reset Date: Wed Aug 26 23:19:22 CEST 2009 In Scheme, `shift' binds the partial continuation to a value and then passes this to a body, shifting it in place of the removed frames. Looks like `reset' and `shift' need to be exchanged! More tomorrow.. (shift 1 2 3) reset > > ps <1> <,{'1 '2 '3}> > run > ps <3> 1 2 3 > It is `reset' that marks the continuation stack and `shift' that packs up the continuation. So it looks like it is correct. The only strange thing is that in a CPS language, the ``body of shift'' will come after `reset' ! Entry: Concatenative languages and Full continuations Date: Thu Aug 27 14:32:29 CEST 2009 From a discussion with Dominikus I come to conclude that it's important to distinguish different kinds of continuations: * Full state continuations, that capture the entire machine state: both parameter and context stack. * Context-only full/partial continuations, that capture marked segments of the control stack and view the whole data stack as the value to be passed. Note that marking the data stack is ill-defined, as it may be popped ``beyond the mark'' before it is captured. Can both be unified? Rephrasing one of the solutions Dominikus proposed, one could wrap the stack as a quotation, and concatenate it onto the remainder of the current program being evaluated, yielding a new program that encapsulates the full continuation. Let's implement this in PF and see where it takes us. OK. After doing this it seems that * You still want partial continuations in this context: in most cases you're not interested in the toplevel drivers, just in the control that's part of the program. * The operation factors out into two steps: ordinary continuation operators that ignore the parameter stack, and a separate operation that appends the stack as a quotation to the resulting continuation. Thinking a bit further and keeping an eye on how it's implemented in Scheme, it seems that this generalizes to any kind of threaded state mechanism (i.e. FP-style matrices, Monads, Arrows, ...?) Entry: simpler primitives Date: Thu Aug 27 15:03:09 CEST 2009 As mentioned before: there are too many things to think about implementing linear primitives (possible GC restart and mutation). This reduces my confidence that the code is correct in the presence of primitive error aborts or GC aborts. So, either I give up on the idea of GC restarts, or I factor the code such that these distinctions are no longer a problem. Entry: TODO Date: Fri Aug 28 11:26:56 CEST 2009 - define partial continuations better - check out the multipe state threading in dominikus' code Entry: Partial Continuations in a Concatenative Language Date: Fri Aug 28 13:19:08 CEST 2009 Look here[1] for links of the theoretical background. First, the current implementation of `reset' and `shift' are _wrong_ -- actually doubly wrong: * the _captured_ by `shift' is delimited. * the _shifted expression_ is placed in a delimited context. I'm using an interface that doesn't even have a shifted expression. So let's fix this: the API is now: shift / control ( quot -- ) Where the quotation is called with the continuation on top of stack. [1] entry://../compsci/20090828-144955 [2] http://www.bluishcoder.co.nz/2006/03/factor-partial-continuation-updates.html [3] http://docs.factorcode.org/content/article-partial-continuations.html Entry: GC restarts: clear separation between PURE and LINEAR functions. Date: Fri Aug 28 21:01:50 CEST 2009 To catch GC restarts in the definition of linera primitives, a simple trick can be used for testing: simply kill the reference to the GC right before a mutation is made. So, primitives are either purely functional and can have allocation, or are linear and cannot perform allocation but are allowed to perform in-place operations. Let's say that a primitive starts in pure mode, but whenver it mutates something, it switches to impure mode. In order to test all primitives, a freelist = NIL + GC step can be performed before the execution of each primitive to cover the case where a primitive needs extra storage. Practically: - The VM: replace all PUSH/DROP operations with a FROM_TO. Start with making this type-tagged. There is a real problem in the exception handler: it might fail as there is no guarantee there's enough memory. - Find a more highlevel way of doing this. Preferably static. Can't this be solved as a type issue? In C? - Am I solving a real problem, or is this just a consequence of the way the GC is implemented? The real problem is to make a distinction between PURE and LINEAR, where the latter cannot perform any allocations, and the former cannot perform any mutations. - Try it first in Scheme. Ok. In Scheme it doesn't seem to be such a problem. Now, in PF, can't we make LINEAR the default? The problem is then: how to specify nonlinear words? Let's say it's allowed to switch to PURE mode at the entry point of a primitive. There's something fishy going on with exceptions in Scheme too: exceptions should NOT perform allocations. The _sc_step() might have switched to linear mode, so the sc_make_error might trigger GC. switch(exception = setjmp(sc->m.r.step)) { case EXCEPT_TRY: PURE(); rv = _sc_step(sc, state); break; case EXCEPT_ABORT: rv = sc_make_error(sc, sc->m.error_tag, sc->m.error_arg, state, const_to_object(sc->m.r.prim)); sc->m.error_arg = NIL; sc->m.error_tag = NIL; break; default: break; } (fixed: sc_make_error is now called before _sc_step, to preallocate such that the error allocation itself can't trigger GC restart). Maybe primitive exceptions should be linear? There are a lot of nuances here.. It needs to be simplified. EDIT: I'm still using the earlier paradigm of ``pure before linear''. In the C code this requires respecting some invariants in an error-prone dentralized way, but the principle itself holds and seems to have some higher significance: You can use pure funcitional operations as long as you don't change the world state. I.e. it's either PURE or LINEAR. And you're allowed to switch from pure to linear, but not the other way around. Practically, this makes memory management easier in the presence of brutally low-level C-code. Also, it's quite simple to enforce linearity: simply do not pass a reference to the allocator! It's more difficult to enforce purity however. In C this can't be enforced (or maybe it can.. using some `const' declarations..) Entry: Primitive exceptions should be linear Date: Sat Aug 29 09:39:12 CEST 2009 The main reason for this is that you really want the C model to be independent of the GC. Exceptions extend into C. The _real_ problem is that in PF, there are occasions where you want to perform a linear allocation, but you're not sure this is going to succeed. What about calling this a system failure? I.e. always make sure there are enough free cells. In an embedded system it most definitely is an unrecoverable failure. The problem is that in current ``prototyping PF'' the linear cells are allocated lazily. Maybe a good solution would be to keep an extra free buffer with a couple of cells N to guarantee at least N allocations to succeed in the LINEAR regime where the GC is switched off. Then when the machine can take a GC restart, replenish this buffer. So, let's leave it as it is. Fix the problem when there are unexpected allocations in LINEAR mode. Currently this seems to only happen when errors are already quite severe. Then, the only remaining problem is to make sure no _inconsistencies_ are introduced due to primitive errors by keeping linear data linked to the VM instances at all times. Entry: Fully linear PF machine Date: Sat Aug 29 11:11:18 CEST 2009 Hmm... Maybe the trick is really to have a single linear <-> nonlinear conversion primitive, and never perform any conversions in other primitives. Another problem: in _px_run() the `ip' can't be a C variable for the same reason that other structures can't be present only on the C stack: primitives might abort in which case the current `ip' is lost. What about turning this around: 1. PF primitives can never fail due to GC restarts as they _always_ run in LINEAR mode. They will abort on an empty freelist, but this case can be handled separately and be made more gentle in the prototyping case (where you want to give up real-time behaviour for extra allocation). 2. Anything that's nonlinear needs to be guarded from GC restarts in a different way. Entry: Linear `compile' Date: Sat Aug 29 14:15:06 CEST 2009 Now, because `compile' can't introduce any non-linearities, it could be made to produce a linear code representation. Only when code enters the _dictionary_ need it become non-linear: since then it has become random-access and there are no more guarantees about how it is referenced. Let's split this up in `compile-datum' and `map'. Entry: Combining CDR-coding with vectors Date: Sat Aug 29 19:24:12 CEST 2009 Instead of using vectors with tags in the name field, it might be simpler to optimize the representation for the encoding of CDR-linked lists. This is to be able to give in to some performance issues: speed is not important for PF, but memory size might be. Currently, CONS cells are really expensive, requiring 3 pointer slots per cell. However, since the rest of the implementation relies heavily on type tags, maybe this isn't going to work well? It might be better reserved for a tagless representation.. Something for later. [1] http://en.wikipedia.org/wiki/CDR_coding Entry: PF: clear staging Date: Sun Aug 30 11:09:35 CEST 2009 Everything in the code points to and optimal organization where linear / inplace programming and pure functional programming are clearly separated. Keeping both in the same VM might be useful for sharing a memory model, and is _very_ useful as a user interface, but it complicates matters. The lessen to learn: this _is_ a staged approach, so it can be implemented in a way where the object language / machine is completely independent from all reflection, and where code and other graph-structured objects are constant. The good news is that this is a very good fit for Staapl. What needs to be done is to find a way to express this in a typed setting. It is my hunch that the PF VM _itself_ might be better expressed in a typed language, where linearity constraints can be managed statically. Essentially, a linear concatenative language is a _notation_ that allows one to forgo linearity annotations, which seem to be necessary in a applicative language with unstructured variable references. Entry: A fully linear concatenative VM needs interpretation Date: Sun Aug 30 11:50:31 CEST 2009 What I find really remarkable is that it is possible to make a pure functional concatenative language fully linear. You don't need _any_ graph structure, including its virtual machine! However. Full linearity of the machine requires some form of interpretation (laziness). When code is executed, name references need to be expanded to their definitions. This `lookup' cannot be performed in a separate compilation stage (i.e. to eliminate its associated run-time cost) while maintaining linearity of the VM. Such a stage would directly link variable references with variable definitions and require a non-linear (graph) data structure: * Names can be used multiple times (i.e. procedure re-use) which requires a directed acyclic graph. * Name-based recursion can introduce loops in the representation. In this light, the idea behind the PF machine is that you operate in two regimes: one which remains linear, but keeps code and VM as opaque, constant structures, and one which is nonlinear to implement issues related to the VM (in the simplest case just code compilation). The main idea is that: ``Names are part of the meta-language''[1]. [1] entry://../staapl-blog/20080813-125604 Entry: Using C's `const' to distinguish between PURE and LINEAR Date: Sun Aug 30 17:13:21 CEST 2009 LINEAR is easy to enforce dynamically: simply don't provide a reference to the allocator. Pure is more difficult, but might be done statically using C's `const' for return values of constructors. Entry: Factor-like syntax Date: Sun Aug 30 18:27:22 CEST 2009 Removed gratuituous incompatibilities with Factor: the parser now accepts '{' and '}' for based list literals, and the pretty-printer uses '[' and ']' to display decompiled code. FIXME: this is broken: {{123}} -> '('(123)) Entry: Using Partial Continuations Date: Mon Aug 31 09:34:24 CEST 2009 Let's try the classical inversion of control problem: convert a traversal function into a cursor. First example: convert a list into a stream by inverting `each'. (list->stream . ((((cons) shift) each '()) reset)) A `stream' is a pair of a value and a quotation or '(). > '(1 2 3) list->stream > > ps <1> {1 . <,[each-next] '{2 3} '[[cons] shift] ,['()] ,[prompt-tag]>} > uncons > ps <2> 1 <,[each-next] '{2 3} '[[cons] shift] ,['()] ,[prompt-tag]> > run ps > <2> 1 {2 . <,[each-next] '{3} '[[cons] shift] ,['()] ,[prompt-tag]>} > uncons run ps > > <3> 1 2 {3 . <,[each-next] '() '[[cons] shift] ,['()] ,[prompt-tag]>} > uncons run ps > > <4> 1 2 3 () This also makes it pretty clear that a partial continuation shouldn't take the data stack, otherwise `list->stream' wouldn't work correctly. I.e. in the following you really want everything underneath the literal list to be left intact. > 123 123 123 '(1 2 3) list->stream > > > > > ps <4> 123 123 123 {1 . <,[each-next] '{2 3} '[[cons] shift] ,['()] ,[prompt-tag]>} Entry: Onward Date: Sun Sep 20 14:59:20 CEST 2009 Looks like the overall design is done. In the mean time Dave Herman fixed the show-stopping bug in c.plt, and I've started integrating it in Staapl. The plan is now to gradually move the libprim implementation in C to a higher level description, so it can be compiled to C or another substrate (i.e. a stack machine). This would also enable the linearity constraints to be machine-checked. EDIT: the goal here should be to keep the current functionality working at all times, and migrate it to a different implementation. Entry: Summary Date: Tue Oct 6 17:18:15 CEST 2009 What do we have now: * EX: a very simple GC supports an expression evaluation language that is used to implement the Scheme and PF machines. * Scheme: a very simple reflective Scheme interpreter. * PF: a VM for a linear memory concatenative language. * Some scaffolding code to generate wrappers. What do we want: * Wrapper generation for a library of leaf objects. * Fully functional PF language. * Simpler way to implement PF primitives. Entry: Images Date: Tue Oct 6 17:25:41 CEST 2009 Goal: get the video in, XV out working in PF. Strategy: use the old PF code, after factoring it into plain C OO code. Currently, the code in X11 seems already factored so should be usable as-is. Next: create a wrapper generator. For scheme, it's usually simplest to keep the parameter layout. For PF however, object references are usually best placed on top-of-stack, with all the other arguments in reverse order. Something like this: message(object, arg1, arg2) -> arg1 arg2 object message Let's try Scheme first, then use that as a base to write the RPN wrappers. Step one: parse the file with c.plt Took cplt.ss from staapl (includes simple printer). Hmm.. This isn't going very well. Maybe the hint I should take from this is that my header files aren't abstract enough: they depend on a lot of typedefs from deep within the system. The C objects should really abstract most if not all of this.. So, what's next? Setting a clean header standard? I guess this is doable: allow for public and private versions, with public ones exposing only functions. Alternatively, working around the limitations of c.plt might be simpler. Filed bug report. Note: there is a syntax for creating ASTs: http://planet.plt-scheme.org/package-source/dherman/c.plt/3/2/planet-docs/c/pc.html Entry: C parsing Date: Wed Oct 7 09:09:29 CEST 2009 Ok, let's make it work with modified .h files for now. No, the bug is too much of a block. The simplified headers won't parse either. I'm letting this go until the block is fixed. Entry: PLT Scheme Date: Wed Oct 14 09:28:21 CEST 2009 The current design of a combined Scheme / PF machine seems useful. However, if there is no constraint on the _size_ of the scripting system, it might be wiser to bolt a PF-like linear machine onto PLT Scheme, in a way that is usable with other Scheme implementations. Making the definition of PF more formal might be done in this way too: providing a substrate in PLT Scheme which can compile to C and the minimalistic Scheme/GC in librim. Entry: Gstreamer Date: Thu Oct 22 12:46:55 CEST 2009 Recent events have made me think it's probably best to bridge PF and gstreamer. However, it might be interesting to do this as a 3-way coupling between plt-scheme, PF vm and the gstreamer framework. Entry: Next: usability + test case Date: Tue Nov 3 10:28:39 CET 2009 Need to evolve the design towards an (undisclosed) application. Things to fix: * automatic wrapper generation This might be the most interesting thing to start at. * unify the Scheme and PF cores into a single VM. Entry: FFmpeg Date: Tue Nov 3 10:36:12 CET 2009 Need FFmpeg encoder bridge + image format. image: 256 level greyscale. encoder: mpeg4, no audio How to generate the glue code? -> manually for now (probably need both manual and generated glue code) Q: Is it worth it to use the C api? Is that amount of control necessary? Maybe a pipe bridge is easier? Q: What documentation to use? Its difficult to find a good introduction for the C encoder api. This seems to be the right spot[1]. Start with the file libavcodec/apiexample.c[2]. The example file uses the following objects: AVCodec *codec; // codec object AVCodecContext *c; // configuration data for codec AVFrame *picture; General structure is simple: picture data is converted to raw packet based on current codec context. Next: representing encoded data blobs is maybe best done using the native string objects. Problem: how to bridge wrapped buffers and C-level buffers? I.e. the low-level C objects need to be able to interface with leaf/bytes.h. The simplest solution seems to be to never let the C code allocate anything (except for data embedded in locally defined objects). [1] http://ffmpeg.org/developer.html [2] http://www.irisa.fr/texmex/people/dufouil/ffmpegdoxy/apiexample_8c.html Entry: building a custom Scheme interpreter Date: Tue Nov 3 16:16:06 CET 2009 How to build an interpreter with extensions? Currently leaf objects are initialized in sc/scheme.c : _sc_new() /* Atom classes. */ sc->m.p = malloc(sizeof(*(sc->m.p))); TYPES->ck_type = ck_class_new(); TYPES->symbol_type = symbol_class_new(1000); TYPES->prim_type = (void*)0xF001; // dummy class TYPES->port_type = port_class_new(); In ex/ex.h : base_types (the classes) are accessed by the core implementation through the sc->m.p member. The problem that needs to be solved: a scheme factory method `make-foo' needs to access the foo class object through the sc* scheme VM pointer. Question: do these classes really need to be tied to the VM? The problem is this: they need to be stored somewhere. It's possible to store them as a global variable : some C libraries do this anyway. Essentially: - do different VMs _need_ different class instances? - do we want to _allow_ this? - classes could be shared Two conflicting requirements: it's better not to use any global process data, but some form of sharing will be present when libraries are used that do this. Let's build the FFmpeg wrapper with global classes. Solution: _sc_init() takes a parameter = a struct containing classes that will be stored in the sc->m.p member. These can then be used by procedures to access vm-global resources, where the implementation can then decide to make them image-global (i.e. FFmpeg's init) Entry: installed libprim scheme Date: Tue Nov 3 17:26:21 CET 2009 Todo: - make the header files compatible with system-includes. I.e. use a single base (ex/... sc/...) - bundle libraries: libprim_ex.a libprim_sc.a libprim_media.a Libraries are not a problem. Header files are, because of generation.. Separating in private / public will be a mess. Ok, fixed it Entry: Automatic wrapper generation Date: Tue Nov 3 18:44:40 CET 2009 The idea is now the following: from a .h file containing a simple C wrapper written in terms of known C objects, generate all the glue code to register the functions with the VM. For SC and PF, this amounts to packing/unpacking strings, ints, ... depending on the memory model used. Essentially: the wrapper binds the pure C objects to a PF/SC memory + control (exception) model. The task should be simple: examine the .c / .h code and generate anything that looks like a pattern. Entry: The big plan Date: Tue Nov 3 21:33:02 CET 2009 So, what am I going to do with this? To be able to use it in commercial projects, I'm probably going to switch the licence to LGPL or BSD. The idea is that a simple .c tarball (without dependency on mzscheme) can be used to build the .a libs which can then be linked into an app. DONE The problem is that I'm still not positive about how to tackle the wrapper generation. This essentially bridges the C data model to the EX data model, possibly wrapped differently for SC/PF. This is still too complex. The architecture needs to be formalized. The C function model should be something like dataflow functions in Oz. The idea being that the C code doesn't perform any data allocation, but that function output is handled by in-place updating the collection of basic objects. I.e. SSA, but on an object-level. Entry: The media lib Date: Wed Nov 4 06:38:18 CET 2009 Collection of .c/.h files. media.c : wrappers for scripting languages. To generate the wrapper code it would be best to make sure that the .h files are "clean" : no system libs, no delegate library libs. This can be moved out of the critical path though. Keeping the general idea in mind, the ex functions can be coded manually. The problem with class registration remains though. Classes are accessed as part of a struct, so they cannot be dynamic. Entry: More formal architecture Date: Wed Nov 4 07:19:19 CET 2009 C - side: objects, constructors, destructors and in-place functions. The C functions can be bridged to EX function by wrapping C objects in a GC managed struct. This should be automated. How can this automation be made as simple as possible? Entry: Swig Date: Wed Nov 4 07:28:54 CET 2009 Let's have a closer look at Swig[1]. It might be wise to make all interfaces compatible with standard scripting languages through Swig. The problem with Swig is that it generates large code. It can really be done a lot simpler.. [1] http://www.swig.org/Doc1.1/HTML/Contents.html Entry: Are classes/types global? Date: Wed Nov 4 07:49:31 CET 2009 I guess the answer is yes. C objects contain pointers to C classes, but the classes themselves do not really need to be tied to the VM instance. The real question is: what is a VM instance? Let's reformulate: each VM has its own memory manager (GC). This means that managed data cannot be exchanged between VMs. Really, a VM is kind of a global variable. Why is there a need to reference the base_types struct? Let's look at the macros that use it: #define DEF_ATOM(name) \ static inline name *object_to_##name(object ob, ex *m) { \ return (name*)object_struct(ob,m->p->name##_type); } This essentially maps `name' to a pointer to its associated class/type object for use in the `object_struct' function, which is the main function that determines if an object matches a type, and returns a pointer or NULL. There is no real reason why m->p->name_FOOTYPE shouldn't be globalvar_FOOTYPE, apart from global variable references being more expensive. So.. How to make the scheme interpreter extensible? Can these core implementation issues be hidden from user extensions? The whole point is to be able to simply wrap some C/C++ code.. Let's stick with: - interface: class_new() methods in the C code, to allow for most general approach. - policy: (static) global variables for class objects in extensions Entry: Wrappers Date: Wed Nov 4 13:06:42 CET 2009 Looking at media.c, what occurs here is: - CAST - ERROR - C object calls CAST + object calls can be automated. For PF and SC, object wrapping might be different. Can this be solved at link time? The real interface is object_to_xxx which is currently an inline function. If this is transformed into linkable symbols, the binary code can be reused + plugins become feasible. How to automate ERROR wrapper generation? Somewhere the decision needs to be made about how to handle errors. Either C code returns error values, or a setjump-based approach can be used. Entry: Missing functionality Date: Wed Nov 4 13:41:43 CET 2009 - current-input-port / current-output-port OK - system OK - string parser OK - parameters - let* - macros: check if it's actually correct (evaluation order) - readline Can parameters be bypassed? I'm already using implicit output ports all over the code. Currently just using direct (lexical) write routines: parameters aren't really necessary. Entry: ffmpeg delayed frames Date: Wed Nov 4 13:59:19 CET 2009 - first frame might be zero size - setting frame input to NULL produces delayed frames. Entry: ffmpeg external codecs Date: Wed Nov 4 15:05:12 CET 2009 does ffmpeg have an mp3 encoder? no : use lame ac3, aac, vorbis are built-in apparently all configuration data is stored in the AVContext struct. the ffmpeg code is quite error prone.. a bit of abstraction around it won't hurt. next: write encoded audio to buffer. there is no data structure for audio - just a flat array of short int. int avcodec_encode_audio(AVCodecContext *avctx, uint8_t *buf, int buf_size, const short *samples); Entry: Muxing Date: Wed Nov 4 16:35:54 CET 2009 av_interleaved_write_frame AVPacket -> AVFormatContext an AVFormatContext has a number of AVStream Whether to write audio or video is based on the pts (time stamp) /* write interleaved audio and video frames */ if (!video_st || (video_st && audio_st && audio_pts < video_pts)) { write_audio_frame(oc, audio_st); } else { write_video_frame(oc, video_st); } [1] http://cekirdek.pardus.org.tr/~ismail/ffmpeg-docs/structAVPacket.html [2] http://cekirdek.pardus.org.tr/~ismail/ffmpeg-docs/structAVStream.html [3] http://cekirdek.pardus.org.tr/~ismail/ffmpeg-docs/structAVFormatContext.html Entry: load Date: Thu Nov 5 08:13:29 CET 2009 It's working: implemented in scheme, but the parser won't give eof: it hangs. Fixed: it actually worked, just needed to add eof-object? check. Entry: C90 Date: Thu Nov 5 10:29:01 CET 2009 Making sure it works using C90. How to ask GCC to check that? gcc -ansi -pedantic Problems: - task (as expected..) : task.c:63: warning: ISO C90 forbids variable-size array ‘reserve’ task.c:63: warning: ISO C90 forbids mixed declarations and code - va_copy (C99) Conclusion: apart from some annoying limitations and the task.c abstraction there seems to be little trouble in making it compliant to C99. C90 is more difficult though. Postponed until it's required. Entry: finding home Date: Thu Nov 5 11:13:53 CET 2009 After "make install" the binary library should be able to find its boot code. It's probably simplest to use pkg-config to do this. Entry: libprim classes Date: Thu Nov 5 13:50:26 CET 2009 Change needed: all classes need the same members, so the scripting core can treat all objects properly. 0. free 1. write Essentially, the class struct contains the methods, while the object struct contains a pointer to the class struct which identifies the object type. The problem is that the method implementation need to be independent of any scripting language core details. OK works. Entry: Parser Date: Thu Nov 5 17:03:46 CET 2009 the leg parser gives problems. essentially, I don't understand it. time to switch to a different one. maybe gut the one from tinyscheme? Entry: GC restarts and reading Date: Thu Nov 5 17:34:05 CET 2009 Reading is stateful: it can't trigger a restart. This is best solved by calling the collector before read. Growing the heap: restart loops need to be detected. fixed with gc->margin check after collection. Currently set at 100 cells. (Heuristic: 11 is not enough for SC and triggers a restart loop). Entry: SRFI-1 Date: Fri Nov 6 07:51:05 CET 2009 I took the reference implementation from [1], but had to remove some initial comments: the parser choked on realloc(). Problems keep turning up. While it's quite elegant to express it as a leg grammar, it's probably best to start thinking about writing one from scratch that I actually understand and that plays nice with the GC + supports suspension. Weekend job. Code parses, but some functionality is missing: values, receive, check-arg [1] http://srfi.schemers.org/srfi-1/srfi-1.html Entry: dynamic linking Date: Fri Nov 6 11:16:27 CET 2009 All object_to_xxx and xxx_to_object calls should be accessible by the linker. This consists a major interface between the C lowlevel objects/classes and the wrapped representation in the scripting layers PF/SC/... Entry: parsing Date: Fri Nov 6 16:04:36 CET 2009 Using peg/leg anyway.. The problems to solve: - realloc problem - GC access since it's a side-effecting operation. Also look at this[1], a (Haskell-style) parser combinator for C. [1] http://www.cs.chalmers.se/~koen/ParserComboC/parser-combo-c.html Entry: manual parsing? Date: Sat Nov 7 07:26:48 CET 2009 Lisp syntax is LL(1) -> deterministic recursive descent. Manually? problems: - data buffering (growable arrays) - serialization First: how to serialize a parse? Essentially, I need a linear representation of CONS cell based s-expressions. (1 2 3) -> ( 1 . ( 2 . ( 3 . () ) ) ) ((1 2) 3) -> ( ( 1 . ( 2 . () ) ) . ( 3 . () ) ) CONS NUMBER 1 CONS NUMBER 2 CONS NUMBER 3 NIL CONS CONS NUMBER 1 CONS NUMBER 2 NIL CONS NUMBER 3 NIL here CONS and NUMBER are functions that consume a fixed amount of next items which are always functions to be executed. Essentially, the s-expressions need to be translated to a prefix stream. This can be standardized. There is only one primitive data item: a fixed size byte array with size prefix. The rest are ADT constructors. The constructors themselves could be encoded as prefixed byte strings. So, in order to do this, the only thing that's needed is an encoding of natural numbers in byte strings to represent size tags. CONS NUMBER 1 CONS NUMBER 2 CONS NUMBER 3 NIL -> 1C1N11C1N21C1N31X where "C" = cons "N" = number "X" = nil Also, to make the format completely general, the constructor sizes need to be specified too. So we have: ... So, the only problem is self-delimiting natural numbers. Constraints: - small numbers are fast to decode. - mostly ASCII when the names themselves are ASCII CONS NUMBER 1 CONS NUMBER 2 CONS NUMBER 3 NIL -> 1C21N11... Simplest way to have self-delimited byte sequences is to mark the last digit. This way small sizes are captured in single bytes. Using base-16 is inefficient, but it's easiest to decode on small targets. This way letters can be used, with capitals denoting the last symbol. An advantage is that this introduces redundancy that can be used to guess the file format. abcd efgh ijkl mnop eHello Better is this: use the range 0x30 "0123456789:;<=>?" for the final symbols and 0x40 @ABC... for the prefixed. This way the small strings have decimal digits. 1C21N1B40x10 Even better: use the range 0x20 for prefixes: this distinguishes them from names. 1B4fooo Entry: Architecture summary Date: Sat Nov 7 09:36:10 CET 2009 LEAF: primitive C objects EX: C-based expression languages using GCd objects (wrapping leaf objects) PF/SC: execution model, using EX primitive operations + data model. caveats: - SC is slow: heap-allocated tree walking interpreter with search-based symbol binding. The upside is that the structure is simple: a textbook CEKS machine. - EX requires distinction between two modes: * PURE: construction without mutation = GC enabled -> primitives might restart on collection * LINEAR: mutation without construction = GC disabled. Entry: tokenized tree parser -> postfix stream Date: Sat Nov 7 11:09:10 CET 2009 Got the numbers working. Now, instead of using a prefix format. What about using a postfix one, where bytes are pushed and constructors executed afterwards? An extra bit could encode whether the byte string is quoted, or whether it is an operation. This way the parser doesn't need procedure recursion, but can use an explicit (more efficient?) reduction stack. However, to make the format partially parsable, arity still needs to be specified. So, it looks like the tradeof is to use bottom up or top down formats. The conclusion seems to be that really, parsing isn't the problem. The hairy one is tokenizing, and representing a tokenized stream so it can be trivially reconstructed. Another thing: tokenizing produces a stream of things. What things? There needs to be a distinction between code/constructors and data/primitives. Ascii rep constraints: - no '"' and '\' characters so it fits in C strings - 3 modes: prefix numbers: last digit contains quotation bit Entry: Representing tokenized streams Date: Sat Nov 7 11:30:02 CET 2009 Let's sum up the design decisions: * representing tokenized streams: prefixed or delimited Prefix tokens are simpler for implementing token streams, as they allow memory allocation in the _recognizer_ to happen before the payload data is touched. However, to do this completely general, some form of delimited parsing needs to be used to represent the _size_ itself (if one doesn't want to introduce arbitrary size limits by choosing fixed representations as in Quicktime/MPEG). However, this is not a problem: in practice a recogniser can set a limit (i.e. 64 bit sizes since it's limited anyway) while the format is in no way limited. * quotation vs. construction Since the base type is made of byte chunks, an extra tag is necessary to indicate whether a list of bytes is a payload or a constructor name. * bottom-up (postfix) or top-down (prefix) This reflects the structure of the parser. The same token stream format can be used for both parsers. I.e. the list (1 2 3) is represented as RPN: 1 2 3 NIL CONS CONS CONS PN: CONS 1 CONS 2 CONS 3 NIL Note that for cons lists RPN isn't so great. One could use XCONS to build lists in reverse. RPN NIL 3 XCONS 2 XCONS 1 XCONS There is one important point: RPN isn't terminated, while for PN it's always clear whether we've read all the data or not. I.e. RPN needs EOF. * arity-tagged constructors Arity tagged constructors are important to be able to represent the stream as a tree without meta-information about the constructor tags. Is this actually useful? It doesn't seem so... Any kind of less-structured format can always be embedded in cons-cell based lists, and is maybe better represented as such. So, for libprim: - prefixed tokens - 3 tags: 0x20 non-trailing-digit 0x30 trailing-code (numerical digits) 0x40 trailing-data (bytes) - polish (rpn = not delimited) Entry: s-expression parser Date: Sat Nov 7 15:25:27 CET 2009 Now, for the s-expression parser on top of the tokenizer. Is it possible to convert between the two formats without parsing, or is an intermediate tree rep necessary? It looks like this needs a tree rep to keep track of the paren matching. Alternatively, it could be tokenized to LP DOT RP and parsed from that. List = LP Tail Tail = a:Datum Dot d:Datum RP { $$= CONS(a,d) } | RP { $$= NIL } | d:Datum t:Tail { $$= CONS(d,t) } Entry: macros Date: Sun Nov 8 10:27:16 CET 2009 The question is: should macros be expanded when a procedure body is created, or when it is applied? 1. Does this change semantics? In a way that's important for the standard? Yes: if macro definitions change after creation of body code both regimes behave differently. 2. Does it affect (memory) performance? Without optimizations, expanded expressions are probably more expensive to store than original code. However, it has probably a huge effect on run-time performance. So... It looks like some form of compilation is in order. Entry: macro expansion Date: Sun Nov 8 11:22:33 CET 2009 The bootstrap now starts with the definition of a `macro-expand' function that traverses an expression and recursively expands it. The question is now where to plug it in. The inner interpreter should no longer recognize macros. Let's try to see what happens if that functionality is simply removed. Entry: profiling Date: Mon Nov 9 10:13:42 CET 2009 Check why it's so slow. Probably this is a name lookup issue. It's especially clear now that boot performs macro expansion. Figure out a way to cache that or implement environments differently. If I recall from Forth, dict lookup whas the biggest bottleneck. Check Dybvig's thesis[1]. [1] http://www.cs.indiana.edu/~dyb/pubs/3imp.pdf Entry: libprim dependency structure Date: Mon Nov 9 15:12:13 CET 2009 LEAF <-- MEDIA <--. ^ | | | (C) . . . . . | . . . . . . . . . | . . . . . . | | (EX) PF --> EX <-- SC <-- SC_MEDIA Above the dotted line is C code without memory management (except malloc for DAG-structured leaf objects). Below is built on the EX/GC memory model, with PF and SC implementing VM control structure. Entry: source distro w/o mzcsheme dep Date: Mon Nov 9 16:33:18 CET 2009 Build the generated files + tar up. Entry: string ports Date: Tue Nov 10 08:23:21 CET 2009 This morning I've added an indirection to the port class that makes it possible. Next: create ports with an embedded bytes string. Entry: class pointers as global variables Date: Tue Nov 10 08:24:29 CET 2009 I guess there's nothing wrong with that. The only reason the VM structs need the class pointers is to dispatch on type of leaf object. The fact that these are referenced somewhere else from a global context (i.e. leaf objets can be created outside of the VMs) doesn't matter. This will probably make a lot of code simpler. Especially writing extensions. Entry: IO to string Date: Tue Nov 10 10:48:14 CET 2009 In order for the normal write routines to work, the prototypes need to change so formatting can be done in two steps. > (define p (open-output-string)) > (write 'd p) Program received signal SIGSEGV, Segmentation fault. 0x0806b5d4 in _IO_str_overflow () (gdb) bt #0 0x0806b5d4 in _IO_str_overflow () #1 0x080699d3 in _IO_default_xsputn () #2 0x0805d811 in vfprintf () #3 0x082209a8 in ?? () #4 0x00000001 in ?? () #5 0xbfa435e0 in ?? () #6 0xbfa455dc in ?? () #7 0x08056309 in yy_Tail () at sexp.h_leg:620 #8 0x08056626 in yy_Tail () at sexp.h_leg:624 #9 0x00000000 in ?? () (gdb) Essentially: all FILE* operations need to be replaced with port* operations + the routines need to return bytes written, and should be robust to NULL references, which will just get size estimates. Is there a way to use sprintf as a one-pass algo? So, need to fix: - output to string OK (from packetforth, using vnsprintf) - faster boot (added expand.scm -> broken) (parser -> OK?) - constructors without class argument (class = global context) OK Entry: boot speed Date: Tue Nov 10 15:40:34 CET 2009 The problem seems to be the parser. So, time to get rid of it. Lexer seems +- ok. Entry: stack Date: Wed Nov 11 11:37:50 CET 2009 - bytes: add mime type I'm thinking to use the 'bytes' type to implement any data type that behaves as a buffer of bytes. - all leaf objects: printer routines in terms of 'port' Ok.. it took a bit longer than expected, but it looks like the parser's working. At least, it takes the boot.scm input. The rest of the day maybe do some more testing. Entry: object structuring + exceptions Date: Wed Nov 11 15:44:27 CET 2009 aborting the parser.. representing the parser in an object? connecting a parser straight to the port struct? latter is a bit dirty, but should work. found a simpler way: using setjmp did some more bugfixing. todo: make the error reporting a bit less spartan. Entry: printing to string ports Date: Thu Nov 12 06:43:36 CET 2009 Fixed. Using vnsprintf instead of vsprintf to limit nb of chars actually written. This can probably be optimized by attempting to write and retrying when it fails instead of always calling the formatter twice. Entry: parameters Date: Thu Nov 12 08:13:28 CET 2009 The next useful feature would be dynamic parameters. I think the necessary VM machinery is in place (continuation marks) so the rest would be manipulation of continuation chain. Entry: String repl Date: Thu Nov 12 10:41:39 CET 2009 For embedded applications without stdio a string repl is useful. A coroutine-style interface between a C node and the VM would make this possible. Let's use the following interface. This will consume Scheme commands in string form and produce a string with stdio. The string needs to be used before the VM is re-entered. const char *_sc_rep(sc *sc, const char *commands); Needed: - a repl written in scheme - a 'c-yield-string' word that exchanges strings. - modifiable output ports parameter's aren't necessary yet + IO ports only need to be settable from scheme, so this should simply work. Entry: sc_eval_step debugging Date: Thu Nov 12 12:05:20 CET 2009 (gdb) p sc_write_stderr(sc, state) #state(#redex(repl-oneshot ()) #k_apply(#k_mt () () ()))$3 = 20 Entry: lowlevel errors Date: Thu Nov 12 17:56:48 CET 2009 Need a mechanism in leaf to propagate errors. Entry: gprof Date: Fri Nov 13 07:29:17 CET 2009 Tried to inline some functions that are called a lot. Most of them are data access & creation. The real message is: in order to speed this up: perform less consing & lookup. According to Dybvig, it's variable references and procedure applications (frame extensions). Global variables could be linked to boxes instead of having to go through a search operation. This might be the best speedup. ex_find() takes 44.9% of the time. It's probably best to replace global variables with a direct reference to the cell that contains the definition. The same can be done for free variables in closures. If they're not assigned to, they can additionally be unboxed. #(var (name . val)) Entry: bootstrapping Date: Fri Nov 13 08:45:19 CET 2009 Also for wrapper generation from C: this should be self-hosted. Currently using PLT Scheme is ok but eventually it's probably best to make it standard. Entry: fdopen and read + write Date: Fri Nov 13 13:04:53 CET 2009 man fdopen: Reads and writes may be intermixed on read/write streams in any order. Note that ANSI C requires that a file positioning function intervene between output and input, unless an input operation encounters end-of-file. (If this condition is not met, then a read is allowed to return the result of writes other than the most recent.) Therefore it is good practice (and indeed sometimes necessary under Linux) to put an fseek() or fgetpos() opera- tion between write and read operations on such a stream. This operation may be an apparent no-op (as in fseek(..., 0L, SEEK_CUR) called for its synchronizing side effect. It's probably best to use 2 FILE objects. Entry: GC restarts: disallowed by default. Date: Fri Nov 13 13:24:17 CET 2009 FIXME: bunch of (all?) _sc_make_aref() calls might be restarted! it's probably simplest to probe the GC free space inside the interpreter loop to at least make those calls succeed. In general: define a global nb of cells that is _guaranteed_ to be available to all primitives invoked from _sc_step(). This might be extended to a mechanism that completely prevents GC restarts inside all primitives, turning it into an error. Primitives that allocate a lot of cells then can be the exception: they could allow GC restart manually. Entry: sockets working Date: Fri Nov 13 16:24:00 CET 2009 It's now possible to start a console on a tcp socket as client or server. Entry: Proposed changes Date: Sat Nov 14 07:35:45 CET 2009 - eliminate global variable lookup - treewalker -> CPS VM (write this in Scheme first, follow Dybvig) - GC restarts: make sure these only happen in step() OK Entry: preventing GC restarts Date: Sat Nov 14 07:42:33 CET 2009 It might be possible for me to write code under restarts, but in practice, interfacing with (stateful) C code becomes really difficult. It's best to make the mechanism more explicit. remarks: 1. It is possible to prevent the gross of the restarts by checking the available memory before executing the next VM step. This guarantees no restarts in case the nb of allocated cells remains below a certain threshold. 2. For primitives that allocate a lot of memory the restart mechanism might be enabled. Strategy: remove the 'PURE' mechanism, but prevent restarts in primitives. OK. This seems to work. The only places where unbounded allocation occurs are in ex.c -> ex_list_clone() (used by the mapping functions) and ex_list_reverse(). Time will tell if I overlooked something. Entry: VM Date: Sat Nov 14 09:36:00 CET 2009 What are the operations on the continuation? Is it enough to simply serialize these? Contiuation types: if set! begin apply Currently the interpreter works in two steps: 1. reduce non-value 2. apply value to continuation How do do this: - tag s-expressions - cps-convert Entry: Memory usage Date: Sat Nov 14 10:20:54 CET 2009 The current implementation isn't so efficient for memory usage. From[1] I gather that a one-size mark/sweep algorithm might be better. It would also remove the need for restarts. However, it would complicate the use of variable length objects. Additionally, pairs are represented quite inefficiently: 3 word cells per pair. However, in typical applications, the amount of memory used for representing cells is probably much less that used for representing leaf objects. Let' s not worry too much about this.. [1] http://w3.ift.ulaval.ca/~dadub100/files/picbit.pdf Entry: quasiquote Date: Sun Nov 15 09:41:45 CET 2009 Adapted from tinyscheme source. Entry: compact VM Date: Sun Nov 15 10:49:52 CET 2009 Now, suppose I can later fix the compactness of data representation. How to make at least code as compact as possible? Using the base-16 ASCII encoding expaned before, it's possible to create some very compact code. Entry: cond -> hygiene Date: Sun Nov 15 11:12:55 CET 2009 The '=>' syntax in 'cond' is the first place I encounter that needs to introduce variables. (let ((x expr)) (if x (function x) )) However, it seems that x is perfectly shielded from the lexical scope of function, so there is no danger of unhygienic effects. Entry: treewalker vs. bytecode Date: Sun Nov 15 12:45:37 CET 2009 Let's try to build a stackless bytecode VM. I don't think it's really worth it to try to build a stack-based VM. In cases where performance is necessary, it might be better to use an existing Scheme compiler. Try to focus on byte-code size. Entry: Primitive errors Date: Mon Nov 16 08:15:27 CET 2009 The interpreter fixes can wait: it's only speed. More important additions are leaf/ error handling. The problems: 1 this is again "context" (as with class instances) 2 messages should be linked to the _wrapped_ objects 3 memory management. For 1, there are two options: pass in an extra argument to the primitives, or solve it with a thread-local variable. For 2 it's best to have the caller register the wrapped object that needs to show up in the error message. For 3 it's best to not have the leaf code do any allocations. Essentially: the VM needs to provide an error handling mechanism. For Scheme, the simplest approach is to keep the argument list of the primitive call around. Entry: Grids & Folds Date: Mon Nov 16 10:43:35 CET 2009 Add (unsafe) operations for operating on vectors. Make it general at first, then try to add a compile-time binding mechanism. Is it worth to try to start from generation directly? Probably not. Let's define the datatypes first. Added: homogeneous grids - double floats to make interoperation with GSL easier. Entry: fixes Date: Mon Nov 16 16:56:26 CET 2009 remove the EX->TYPES struct: use only global classes + change all prototypes. Entry: compile "display lists" Date: Mon Nov 16 17:52:40 CET 2009 Similar to opengl, it's probably possible to observe the calls into leaf to simply record all primitive calls to compile them to threaded code. Adding the proper inspection points (all leaf calls go through a broker?) will make this possible. Essentially, the broker is an RPC mechanism: the client (EX/SC/PF) calls into LEAF through an observable channel: EX <-> RPC <-> LEAF Ignoring side effects, replacing the sequence of calls generated by EX by a flat script should produce the same result. However, this doesn't capture loops efficiently. Maybe some combinators could be added to the RPC mechanism? In general: simply recording shouldn't be so difficult. Parameterizing the representation however is not so easy. Also, it would probably work better with statically allocated objects: given a pool of objects, record all operations that go between. Maybe starting with a generic prototype would help? Return int for errors + only operate on leaf objects. Entry: global _type() access Date: Tue Nov 17 07:36:37 CET 2009 Removed "cached" type references from ex struct. Entry: TODO: fix generic functions for numbers Date: Tue Nov 17 13:58:11 CET 2009 Mix fixnums and flonums + prepare for full numeric tower. Entry: Wrapper-independent code Date: Wed Nov 18 07:35:49 CET 2009 PF and SC wrap the LEAF objects differently. How can they share highlevel EX code? Currently all "virtual" behaviour is encoded in a table in the EX struct, but since this is not VM-specific (i.e. multiple SC or PF VMS, all with their own local memory and GC) it can maybe be expressed using a static or dynamic linker interface. OK: made leaf object wrapping virtual + moved a lot of code to from scheme.c to ex.c Entry: yuv type Date: Wed Nov 18 13:37:10 CET 2009 Y Y' U Cb V Cr To be pedantic, these are not the same: Y luminance = detected energy Y' luma = brightness = gamma-compressed luminance = electric voltage However, in digital video, the Y is always gamma-compressed, and the prime is often removed. YPbPr is the analog equivalent[2]. For more info see YUV(analog) [3] and YCbCr(digital) [4]. How to make the video frame type as general as possible? Practically though, only a couple of different formats are necessary. These are the degrees of freedom: * planar / packed It seems that planar formats use only 2x2. YV12 Y,Cr,Cb I420 Y,Cb,Cr * component order * subsampling 4:4:4 none 4:2:2 vertical 4:1:1 vertical + horizontal Practically. Can the yuv format be made a subclass of the bytes format? Not really, as it is not a growable buffer. Done. [1] http://www.fourcc.org [2] http://en.wikipedia.org/wiki/YPbPr [3] http://en.wikipedia.org/wiki/Yuv [4] http://en.wikipedia.org/wiki/YCbCr Entry: Re-introducing packets Date: Wed Nov 18 14:58:24 CET 2009 A packet is something that can be constructed from a raw bytes buffer given some meta information. The other way around, packets can dump their data. Entry: External filters Date: Thu Nov 19 11:30:04 CET 2009 Make an abstraction to call an external data pipe filter as if it were a scheme function. This needs some select-based sequencing. Actually, this only works if there is a clear RPC semantics. Maybe coroutines are better? If unix pipes can be seen as synchronous channels with well-defined messages this can be greatly simplified. The original PF sequencer can probably be used. This needs non-blocking IO. Is there an SRFI for this? Anyway, in order to try this, it is necessary to create tasks and i/o operations that can be retried. As opposed to PF, it's probably best to implement the basic functionality in C. This needs: - pre-emptable parser (a C one-shot continuation). - output state machines - input state machines Only the parser is a push-down automaton, not a state machine. This will be a significant amount of work. Let's postpone it a couple of weeks. Rewriting the parser to use an explicit stack might be an interesting approach. However, it's probably better to use a proper coroutine implementation. Maybe this can be taken from lua? Alternatively, pthread can be used. It might be a lot simpler to implement. As long as the EX memory model can be separated from read/write threads this should actually be quite straightforward. However.. This does make them more expensive. We can have both.. On the longer term, figure out how lua does it. Switchable stacks is what is necessary. On the short term: create a channel abstraction that can produce/consume leaf objects. Malloc can still be used to allocate data that's passed between threads, and the leaf code is completely separate from any graph memory model. So, a pchannel can be used to synchronize with system threads. Implementing channels using condition variables (semaphores are not enough?) There are two blocking conditions. GET blocks when C = NULL PUT blocks when C = OBJ Signalling consumer -> producer: write_ok producer -> consumer: read_ok leaf_object *channel_get(channel *x) { leaf_object *ob; pthread_mutex_lock(&x->mut); while(!x->object) { pthread_cond_wait(&x->get_ok, &x->mut); } ob = x->object; x->object = NULL; pthread_cond_signal(&x->put_ok); // wake up producer pthread_mutex_unlock(&x->mut); return ob; } int channel_put(channel *x, leaf_object *ob) { pthread_mutex_lock(&x->mut); while(x->object) { pthread_cond_wait(&x->put_ok, &x->mut); } x->object = ob; pthread_cond_signal(&x->get_ok); // wake up consumer pthread_mutex_unlock(&x->mut); return 0; } Entry: testing channel Date: Thu Nov 19 15:07:10 CET 2009 The basic use case is a channel that wraps a file descriptor, which represents a socket or an other pipe to another process. OK, seems to work. Making this many->one would make it possible to serialize external events. The only problem is resource management. See the following scenario: 1 consumer, N producers. Whenever the consumer looses interest, all the produces should give up. However, when a producer disappears this should keep the channel open. In this case the 'free' method, issued at the consumer side should signal all producers to give up, and only cleanup the object when they're gone. Let's make it work for the many->one case. Entry: channel teardown Date: Fri Nov 20 15:11:14 CET 2009 Channels are quite simple. What's not simple is to represent end-of-stream and teardown without causing deadlock. But.. calling an external app with both input and output isn't so simple. Something goes wrong in the synchronization: i get commands that do not output anything. If I perform the commands manually everything works fine. Yes.. This is too complicated to get right. The bric-a-brac resource management simply doesn't work. Other possibilities: use only a feeder thread, and read the output of the process synchronously. But, given the trouble this creates and the difficulty of debugging a multi-threaded app in gdb, I'm thinking about moving to non-blocking IO. Let's create a special case: execute an external program, feeding it binary data on stdin, and collect its stout in a buffer. Sequence the reads & writes with select. Is it possible to move from FILE to fd's? For output this is just flush. How does input work? Entry: channel problems + external processes Date: Fri Nov 20 16:34:54 CET 2009 channels work fine, but teardown is problematic. current problems: - resource allocation of attached threads doesn't work - what do do with memory management? - how to avoid deadlock how to talk to an external process? The problem seems to be that teardown isn't defined very well. What does it mean? - all threads that block on GET should return with EOF - all threads that block on PUT should get an input signal Teardown and PUT should be properly serialized = what's wrong now. This needs a different abstraction. Let's concentrate on asymmetric solutions that always connect to a thread, and only support uni-drectional operations. Frustrating. There has to be a way to do this properly, but the resource management is nontrivial. Entry: to thread or not to thread Date: Sat Nov 21 14:26:50 CET 2009 + C-based parsers are simpler - state machines don't need it - can't use VM code - gdb is more difficult - not all targets support it - requires special memory management The main question is however: isn't it better to have an event loop based mechanism in the interpreter? Original PF at some point had to do this too. In this light, isn't it better to switch to non-buffered IO? Direct fds? Entry: select() based event loop Date: Sun Nov 22 12:16:30 CET 2009 Essentially, this consists of a list of file descriptors and associated actions. Currently the actions are: - INPUT: read when ready + parse - OUTPUT: write when ready The assumption can be made that the tasks are representable as state machines. Push-down automata will then need an explicit stack structure, or somehow wrap the C stack. Entry: Simpler PF Date: Sun Nov 22 12:19:02 CET 2009 What about building the core of the PF interpreter directly on top of LEAF instead of on EX? The core doesn't need a GC, and its memory can be built from tuples. Entry: setbuf fgetc Date: Sun Nov 22 17:47:00 CET 2009 What I'd like to know is how fgetc() buffers the input. How I would do it: fix some buffer size and attempt to read as much as possible without blocking. 2nd question: how to get at the input buffer without triggering a blocking read? -> set the underlying fd to non-blocking. fread will return a short item count and set errno = EAGAIN. so, there seems to be no problem. the real problem is: how to code this? should resuming happen at scheme level, or is a low-level mechanism better (i.e. present streams as object channels, and wake-up only after a full object is read/written). iow: do we use a low-level state machine sequencer (fast, but inflexible) or a high-level task switching engine (slow but flexible). as mentioned before, one important thing to note is that if a _parser_ is encoded as a blocking task, this whole idea won't work. currently, even the scheme syntax tokenizer uses function calls. it would be _really_ convenient to have a portable lightweight threads library. i don't understand why uclibc doesn't implement makecontext: SUSv2, POSIX.1-2001. POSIX.1-2008 removes the specifications of makecontext() and swapcontext(), citing portability issues, and recommending that applications be rewritten to use POSIX threads instead. alternatively, it might be interesting to construct some C macros for nested code. this mechanism isn't needed much: maybe a portable specialized implementation make sense? taking the birds-eye view, it seems that writing parts of the code directly in forth makes sense. maybe the project needs a ruffle? pff... it's time to focus, and work with what's there. writing the parsers as a state-machine isn't necessary yet. current I/O requirements only need simple state machines. it's probably best to propagate select up to the highest level, and then see how to specialize it if necessay. next: find a simple interface for select() A list with structures like this, where the busy flag is modified in place: (port 0/1/2 !busy fn) Entry: R4RS compliance Date: Tue Nov 24 08:56:32 CET 2009 Apparently `char?' and `number?' need to be disjoint. Since I'm not really interested in unicode, chars can be implemented as bytes, and embedded inside the constant space. Other R4RS issues include an incomplete implementation of the numeric tower. This requires quite some effort to get working, and I will update it ongoingly. At least the tokenizer should recognize numerical syntax, and maybe complex numbers could be implemented. Rationals and bignums are more work and currently not really relevant. Entry: arm cross build + test with qemu Date: Tue Nov 24 15:56:55 CET 2009 # build the toolchain with busybox. standard config: just pick # generic arm target. this gives arm-linux-uclibcgnueabi # the distro includes a script that builds it and installs it as a # subdir of current dir. bin/build-arm # the app is compiled statically, so can run directly in QEMU. use # the expanded code to boot faster. qemu-arm arm-linux-uclibcgnueabi/bin/sc --boot sc/boot-expanded.scm Entry: Compile to AST Date: Wed Nov 25 12:07:21 CET 2009 Currently the code remains in s-expression form. However, `lambda' should compile to AST with a limited number of cases. If this is then put into CPS form we have a VM. Roadmap: AST from is required for making varref explicit + cachable. Entry: VM ideas Date: Wed Nov 25 08:41:21 CET 2009 To make varref work, the compiled lambda expressions need to be modified. Since this is a caching operation, it could be done at runtime, whenever a varref is encountered. Essentially, a bare-bones lambda expression is a tree containing: - constants - variable references - abstractions - special forms The goal to speedup: - eliminate variable lookup - eliminate intermediate structures (deforestation) Entry: kawa scheme Date: Thu Nov 26 22:59:20 CET 2009 cd /usr/local/lib wget ftp://ftp.gnu.org/pub/gnu/kawa/kawa-1.9.90.jar export CLASSPATH=/usr/local/lib/kawa-1.9.90.jar java kawa.repl On android[1]. [1] http://per.bothner.com/blog/2009/AndroidHelloScheme/ Entry: `freeze' Date: Thu Nov 26 23:28:40 CET 2009 Currently 50% of the execution time is spent in variable lookup. What about adding a `freeze' operation that will replace all global names with values? The reason to make this a command is to make the semantics controllable. Freezing variables to values needs boxing to preserve mutability. As long as define / set! are not used to change the contents of a global variables, this works fine and should speed up a lot. Entry: `apply' Date: Thu Nov 26 23:40:53 CET 2009 I suspect apply is way too slow because of the continuation stack trick. Changing the evaluation order of arguments from back to front should make some things easier.. i.e. argument lists can be constructed in a straightforward manner. The 'reverse' operation necessary to construct the list of redexes can be done once at compilation time. But, before I'm going to do something like this, some form of scheme-level profiler is necessary. Entry: Scheme Resources Date: Fri Nov 27 08:03:36 CET 2009 1. libprim/sc vs Standard Scheme. The SC interpreter is optimized for embedded use. It has a very small C core of about 60K. It is standard R4RS with the most expensive features removed (i.e. no rational numbers, complex numbers or infinite precision integers). Its main goal is to glue together C data and code, and provide a system interface akin to a bash shell (programs / pipes / files). 2. getting to know Scheme To get started, take a look at this http://www.scheme.com/tspl3/ Entry: VM change roadmap Date: Fri Nov 27 09:33:40 CET 2009 This is a longer change, probably for the weekend. Change the representation of code such that (expanded) lambda expressions get transformed to an intermediate form which is then interpreted by the VM. This can still be a tree. Just make sure that some binding work that's done at run-time atm is replaced with compile-time work. Entry: Coroutine interface C <-> VM Date: Fri Nov 27 11:23:31 CET 2009 Requires: "yield" signal that preserves current continuation. This is already +- implememented with the MT continuation. Currently I don't see how to do this. Something isn't right. Yield should behave as a primitive: the re-entry from scheme behaves as the return value of the sc_yield() primitive. Entry: remove support for towers Date: Fri Nov 27 11:36:23 CET 2009 It makes the interpreter loop too complicated, and it's not used. This means sc_step is not a function accessible from scheme, which means it can be implemented more efficiently, and probably simpler. The fundamental conflict is this: some primitives need explicit access to the current interpreter state - this conflicts with implementing the interpreter in a purely functional way. Essentially, the concept is broken so let's get rid of it. The vm is a state-machine because of a hack that allows modification of internal state (i.e. the current value & continuation) in an interface towards primitives that pretends as if there is no suck access. Is there a reason to not unify primitive exceptions and GC restarts? Entry: A better approach Date: Sat Nov 28 10:33:09 CET 2009 1. Write the VM in Scheme 2. Translate this to EX using a bootstrap compiler This approach allows to get the code working right first, and gives another escape route to compiled code. Entry: Dybvig heap machine Date: Fri Nov 27 16:48:14 CET 2009 A accumulator X next expression E environment R value rib (already accumulated values for next evaluation) S stack/continuation How is this different from the cek machine continuation frames? Entry: Detached console Date: Mon Nov 30 13:41:47 CET 2009 The idea is to run the VM with its own dispatch loop in a separate thread, and have the foreground app (anything that links the VM into its image) present only a character console I/O. leaf_object *console_rpc(console *d, const char *cmd) { console_write_raw(d, cmd, strlen(cmd)); console_write_raw(d, "\n", 1); port_flush(d->out); return console_read(d); } int main(int argc, char **argv) { sc *sc = _sc_new(argc, argv); /* This will start the VM in the background with a standard unix-socket based dispatch loop. See boot.scm::init-console */ console *c = _sc_start_console(sc); /* To exchange information with the VM, a character interface is used. To properly syncrhonise on the return stream, the leaf/ s-expression parser is used. */ for(;;) { leaf_object *o = console_rpc(c, "(+ 1 2)"); leaf_write(o, p); port_printf(p, "\n"); leaf_free(o); sleep(3); } } Entry: wish list Date: Mon Nov 30 15:02:50 CET 2009 - parameters + io-redir - dynamic-wind - exceptions - continuation printing - stepping / debug? Entry: #< ... > Date: Mon Nov 30 16:08:28 CET 2009 This will be scanned correctly if the brackets '<' and '>' are balanced. The default parser will tag these with "hash". The Scheme parser raises an error. Entry: build system fixes Date: Tue Dec 1 14:25:40 CET 2009 Environment variables: LDFLAGS Entry: select and FILE* Date: Wed Dec 2 10:49:44 CET 2009 Because of input buffering, select will not always give the correct behaviour. I.e. for the input console, giving a sequence "1 1 1 1 1" will give only one answer as it seems the whole string was buffered. This is ok as long as the user waits for reply before sending another command, but it's something to be careful about for unidirectional channels! Entry: Code reps Date: Wed Dec 2 13:57:13 CET 2009 - tuples (low-level, tree-structures, C) - vectors (high-level, graphs, EX, GC) - cons-lists (in terms of vectors) For communication between apps, symbol-tagged tuples to represent ADT constructors seems simplest. However, the s-expression parser produces such a description based on cons cells. Maybe there should be a routine that flattens this, and fails on improper lists. tagged cons list -> tagged tuples. Can the parser be extended to produce flat vectors directly? Entry: Java comm (converting any rep to java data structures) Date: Wed Dec 2 14:01:24 CET 2009 Using StringTokenizer[2] a string can be converted into an iterator. However, a straight connection to JNI might be simpler (see next post). [1] http://www.cafeaulait.org/javafaq.html#xtocid75724 [2] http://www.javasoft.com/products/JDK/CurrentRelease/api/java.util.StringTokenizer.html Entry: Java JNI Date: Thu Dec 3 11:36:00 CET 2009 /* Recipe for running Scheme in Java with access to Java objects: Start Scheme VM as a native method called from a Java thread which never returns. The JNIEnv variable passed in can be used to call into Java code from the VM mainloop. */ /* JAVA JNI */ /* This macro defines standard LEAF object behaviour for wrapped Java JNI objects, and unwrappers for AREF representation. */ #define DEF_JAVA(name) \ typedef struct { leaf_class super; } java_##name##_class; \ typedef struct { java_##name##_class *type; name jni_ref; } java_##name; \ void java_##name##_free(java_##name *x) { free(x); } \ int java_##name##_write(java_##name *x, port *p) { \ return port_printf(p, "#<" #name ":%p>", x->jni_ref); \ } \ LEAF_SIMPLE_TYPE(java_##name) \ java_##name *java_##name##_new(name jni_ref) { \ java_##name *x = calloc(1, sizeof(*x)); \ x->type = java_##name##_type(); \ x->jni_ref = jni_ref; \ return x; \ }\ DEF_AREF_TYPE(java_##name) DEF_JAVA(jclass) DEF_JAVA(jobject) DEF_JAVA(jmethodID) /* Dynamic context (VM running in the dynamic extent of a java call) */ typedef struct { JNIEnv *env; // current Java thread's environment jobject this; // the embedding scheme object } java_ctx; static inline java_ctx *_sc_java_ctx(sc *sc) { java_ctx *ctx = (java_ctx*)(EX->ctx); return ctx; } #define JAVA_ENV (_sc_java_ctx(sc)->env) /* Exceptions. JNI methods give a special return value (i.e. NULL) to indicate an exception has been raised. This needs to be handled _before_ making another JNI call. */ _ _sc_java_error(sc *sc, _ ob) { if ((*JAVA_ENV)->ExceptionCheck(JAVA_ENV)) { (*JAVA_ENV)->ExceptionClear(JAVA_ENV); } return ERROR("java", ob); } _ sc_java_class(sc *sc, _ name) { jclass class = (*JAVA_ENV)->FindClass(JAVA_ENV, CAST(cstring, name)); if (!class) { return _sc_java_error(sc, name); } return _ex_leaf_to_object(EX, java_jclass_new(class)); } _ sc_java_methodID(sc *sc, _ cls, _ name, _ sig) { jmethodID method = (*JAVA_ENV)->GetMethodID(JAVA_ENV, CAST(java_jclass, cls)->jni_ref, CAST(cstring, name), CAST(cstring, sig)); if (!method) return _sc_java_error(sc, name); return _ex_leaf_to_object(EX, java_jmethodID_new(method)); } _ sc_java_this(sc *sc) { return _ex_leaf_to_object(EX, java_jobject_new(_sc_java_ctx(sc)->this)); } _ sc_java_string(sc *sc, _ ob) { return _ex_leaf_to_object(EX, java_jobject_new((*JAVA_ENV)->NewStringUTF(JAVA_ENV, CAST(cstring, ob)))); } /* Method calls in Java are a bit of a pain due to there not being a single root object. The scalar objects need to be handled differently. It's probably simplest to restrict to object-only calls and wrap scalar types somewhere else. Otherwise, the native types can be handled using some macros to perform the routing based on input types. */ _ sc_java_call(sc *sc, _ ob, _ ID, _ args) { vector *v = CAST(vector, args); int i, n = vector_size(v); jvalue jargs[n]; for (i=0; islot[i])->jni_ref; jobject rv = (*JAVA_ENV)->CallObjectMethodA (JAVA_ENV, CAST(java_jobject, ob)->jni_ref, CAST(java_jmethodID, ID)->jni_ref, jargs); if (!rv) { return _sc_java_error(sc, args); } return _ex_leaf_to_object(EX, java_jobject_new(rv)); } /* FIXME: check this All objects objtained through the JNI will remain live as long as the method does not return. References need to be explicitly released otherwise. http://journals.ecs.soton.ac.uk/java/tutorial/native1.1/implementing/refs.html "To make sure that Java objects can eventually be freed, the JNI by default creates _local references_. Local references become invalid when the execution returns from the native method in which the local reference is created." "In certain situations, however, the native code may need to call the DeleteLocalRef function to explicitly delete a local reference." This is done using (*env)->DeleteLocalRef(env, object) */ Entry: simplifications Date: Thu Dec 3 13:27:24 CET 2009 - eliminate towers of interpreters - eliminate GC_INTEGER ?? -> all objects are either structured leaf objects or constants (it can be argued that an integer is a constant though) - fix the .h_prims generator Entry: Wrapping Date: Fri Dec 4 07:42:52 CET 2009 I spend a lot of time in wrapping something -> leaf and then leaf -> EX/SC. Make at least the latter one automatic. Use this to make generic C code generatior for a subset of procedures. Entry: JNI interface with type checking Date: Fri Dec 4 14:17:30 CET 2009 Something that's quite necessary to make this work reliably is a dynamic type checker. The information is available (it's necessary for method lookup) so it shouldn't be too difficult to integrate. What's the simplest way to run a JNI example on a Linux platform? From here[1] we have: gcc -o libnativelib.so -shared -Wl,-soname,libnative.so -I/export/home/jdk1.2/include -I/export/home/jdk1.2/include/linux nativelib.c -static -lc [1] http://java.sun.com/developer/onlineTraining/Programming/JDCBook/jniexamp.html I have some preliminary code outside of the project, which with a bit of cleanup can probably be made into a LEAF<->JAVA interface. Entry: EX and LEAF compilers Date: Fri Dec 4 18:02:22 CET 2009 Start thinking about compiling Scheme/PF code in (readable) EX and LEAF C code. For non-looping code this could be done in a straightforward way recording all LEAF api calls. For looping code it's more involved. Entry: reflection on future Date: Sun Dec 6 10:22:22 CET 2009 Recent developments have shifted the off-clock effort from making the Scheme VM faster to making the Scheme layer easier to eliminate, producing readable and editable C code. Things that seem to become more important: - LEAF data structures with sharing - Automatic wrapper generation - Migration from Scheme to C/LEAF code - Java JNI integration In short: it should be possible to peel the EX/SC layers off more easily, leaving just C code and the LEAF object library. For structures with sharing, but without dependency on an asynchronous GC, it's necessary to implement some form of reference counting. This can either be done inside the base object, (taking a 2nd field), or as a subclass. Things that are hinted at: - Polymorphy in LEAF - Implementing ex/vector in terms of leaf/tuple - Implementing PF's linear (data) core in terms of LEAF, with code execution in terms of VECTOR. The more I think about this, the more it seems plausible to take a closer look at the object side of the COLA architecture. Some missing links are it's object model, and bootstrap through the method table cache. From another side, CLOS and implementations might also be a good source of information. Entry: Unification Date: Sun Dec 6 10:37:42 CET 2009 To simplify compilation from Scheme -> C using only the LEAF object model and manual memory management, all objects need to become leaf objects (i'm looking at integers and characters), so at least there is a unified model for data and operations at a lower level. As became quite clear toying with the Java JNI, having to special case data items that are not objects is quite a pain. It's important not to forget the goal of the libprim project: have low-level code and data models that can be _augmented_ with highlevel ones (data: EX GDd vectors, code: Scheme and PF interpreters). This is about the total elimination of _all_ Scheme and PF code in a final project, to get rid of the management requirement of not _using_ any unknown or non-standard languages, so the resulting C code can be modified without needing any access to the higher level data and code models. Entry: Next Date: Sun Dec 6 11:02:46 CET 2009 - Unified objects (integers as leaf objects) - Data sharing and reference counts RC: if there is a necessity to create circular data structures, it's probably prudent to contain those locally, i.e. wrap them in a non-circular structure. It's probably necessary to equip each object with a reference count and a `final' flag. The latter is necessary because there is otherwise no guarantee that a multi-owned object will remain constant. Entry: Assumptions Date: Sun Dec 6 11:38:44 CET 2009 What are the assumptions? - Synchronous memory management will be necessary, i.e. the system already contains open/close resources (code libraries / hardware interfaces / remote connections / ...) that cannot be modified. - Use of anything outside of plain C will be a liability and should be removed (if possible). - During prototyping and testing there is no constraint on the tools used. Entry: Implementing RC Date: Sun Dec 6 11:44:13 CET 2009 I'm probably going to add it to the base object. This means that yes, all objects require an RC which will make them larger. However, I intend to keep the interface abstract, so objects with RC=1 might be implemented simpler. One problem however: how can you tell whether a C function consumes an object, like the parser uses a port, or reads/modifies it in place? That inconvenience is not going away.. Maybe the convention is that all argument to constructors will use the object as a delegate. Maybe this should be done where it's appropriate: at the point where the reference is _STORED_. Essentially: users (callers) can't know what a procedure does with a reference. Convention: all leaf objects passed as arguments are not stored inside a data structure that exits the procedure in a different form. Each _persistent assignment_ requires a duplication. Assignments to local C stack variables are not a problem. Entry: LYSP Date: Sun Dec 6 18:41:40 CET 2009 See here[1]. Before I try to do anything else I might better read the code. One of the interesting tricks is to use dlsym for looking up native code. [1] http://www.piumarta.com/software/lysp/ Entry: Memory and stack management questions Date: Sun Dec 6 18:44:42 CET 2009 - how efficient is malloc? - does it make sense to have a separate pooling mechanism? - how non-portable is makecontext() ? Entry: JNI & reflection Date: Mon Dec 7 14:12:37 CET 2009 java.lang.Class java.lang.reflect.Method java.lang.reflect.Constructor Something that bothers me in Method.invoke() is that the arguments are an Object[] array. Do they get automatically casted to native types like int,long ? I don't see anything of this sort, so it seems that the answer is yes: this only supports objects. It would be interesting to figure out if kawa can be used to perform this reflection task.. Conclusion: java reflection API seems useful, but there are complications if arguments are not an Object subclass. Nope[2]. It is possible to represent primitive types as a Class object. [1] http://java.sun.com/j2se/1.3/docs/api/java/lang/Class.html [2] http://java.sun.com/j2se/1.4.2/docs/api/java/lang/Class.html#isPrimitive() Entry: prettyprinting Date: Tue Dec 8 07:25:06 CET 2009 Don't underestimate the power of pure functional C code, prettyprinting and gdb. I'm developing for Android atm, and I'm quite surprised by the lack of good debugging tools. Entry: simplifying exceptions : no more multiple entries Date: Tue Dec 8 07:46:47 CET 2009 This seems to work. Both PF and SC interpreter loops are simpler now: jmp-buf is only set once per exception, instead of once per primitive invocation. Entry: More reflection Date: Tue Dec 8 13:28:55 CET 2009 Varargs are expressed as this: Object print_var(Object... a) { System.out.println(((String)a[0]) + ", " + ((String)a[1])); return _void; } The method can then be resolved + invoked as in: Method find(String cmd) throws java.lang.NoSuchMethodException { return this.getClass().getDeclaredMethod(cmd, new Class[] {object_array_class}); } void test() throws java.lang.NoSuchMethodException, java.lang.IllegalAccessException, java.lang.reflect.InvocationTargetException { print_2("aaa", "bbb"); Object[] args = {"ccc", "ddd"}; find("print_var").invoke(this, new Object[] {args}); } Note that the vararg definition is essential. The following method does not have the right signature and raises java.lang.NoSuchMethodException: eval.print_2([Ljava.lang.Object;) Object print_2(Object a0, Object a1) { System.out.println(((String)a0) + ", " + ((String)a1)); return _void; } But, vararg arrays are enough to make at least the definition of dynamic methods a bit more straightforward. Summarized in jni/eval.java Entry: Mapping tuples to method invocations Date: Tue Dec 8 14:08:09 CET 2009 Main problems are strictness and argument agregation. I.e. given the tuple: t = tuple("doit", "foo"); * Does eval(t) evaluate "foo"? * Does eval pass multiple arguments 1->... as an argument vector? It's probably best to deviated from lisp-style semantics here, and simply use method lookup and explicitly constructed argument lists. Entry: More java reflection Date: Wed Dec 9 08:10:19 CET 2009 The missing element is a Scheme object -> java object interpreter. Or even simply a string -> java object interpreter. Essentially, to quickly hack it, all primitive types really could be represented as strings, while all object types need to be somehow wrapped, or placed in a local store. This should make it possible to keep the interface between java and Scheme completely dumb. The problem is: an explicit distinction needs to be made between local variable references and other constructors. Doing this properly looks like a good weekend project. Entry: sc in Java on linux Date: Wed Dec 9 17:49:04 CET 2009 Moved code from the external project side into the libprim side, so the interface can be made simpler. Next: how to wrap a C pointer as a java object? -> use long Next: hash table for java<->aref wrapper management. such a hash table needs a simple linked list anyway for collisions, so maybe start with a linked dictionary first. Can this be done with tuples? Keys then need to be leaf objects. Actually yes. The ref list is really just a list. Entry: JNI object -> class Date: Thu Dec 10 11:04:24 CET 2009 I changed some methods from object to class context (static methods). At the same time some problems started occuring: sc_java_class doesn't resolve any more. Next: - print exceptions - maybe re-instate object context? Entry: Reflection Date: Thu Dec 10 13:41:20 CET 2009 So, there's a SC/C <-> Java connection. Link this then to the eval.java and reflect.java classes. Different strategy: Instead of going through a large amount of red tape in C to be able to call different constructors / class methods / methods, it might be simpler to move most of the dispatch logic to the Java side and use the reflection API, but keep a single interface through some static method interface from C->Java. On the C side, things need to be simplified such that there are 3 different routines: constructors / static / methods FIXME: constructors with arguments dont work. w.o. ok FIXME: create an object pool based on the java object pointer, not the local wrapper. this is a mess.. solve it properly. it's not possible to map leaf -> aref due to moving GC, so each leaf needs to be _dupped. Entry: Java JNI fixed Date: Fri Dec 11 10:10:09 CET 2009 Fixed most of the bugs in the JNI <-> LEAF/EX bridge, except for invoking constructors. Entry: JNI w/o type signatures Date: Fri Dec 11 11:33:01 CET 2009 It would be handy to be able to get rid of the type signatures when looking up methods. Could be done in two steps: - map object -> class -> methods list - find method with correct type, or simply try them one by one Entry: JAVA calls -> running VM loop Date: Fri Dec 11 11:53:18 CET 2009 This needs some locking. The simplest place to release the lock is during the select() wait period. Entry: global instead of local refs Date: Fri Dec 11 14:59:46 CET 2009 Need to change the ref rep to global, so they survive across JNI calls. /* Create a global reference */ stringClass = (*env)->NewGlobalRef(env, localRefCls); /* The local reference is no longer useful */ (*env)->DeleteLocalRef(env, localRefCls); Apparently, this gives a unique object. Looks like the object mapper isn't necessary. But let's leave it in for now in case it is. No: it still is, but in less cases (i.e. objects passed through functions?). Or are these also local refs that get re-wrapped? Confusing.. Looks like I introduced a problem that wasn't there before. Revert. /* The sc struct contains a reference to the currently active Java context. */ /* For some reason the global refs cause crashes.. */ #define GLOBAL_REFS 0 static void jniref_free(jniref *x) { // LOGF("DeleteRef(%p) RC=%d\n", x->jni_ref, leaf_rc(&x->base)); sc *sc = x->sc; if (sc) { java_pool = tuple_list_remove_object(java_pool, (leaf_object*)x, 1); #if GLOBAL_REFS (*JAVA_ENV)->DeleteGlobalRef(JAVA_ENV, x->jni_ref); #else (*JAVA_ENV)->DeleteLocalRef(JAVA_ENV, x->jni_ref); #endif } free(x); } static int jniref_write(jniref *x, port *p) { return port_printf(p, "#", x->jni_ref); } LEAF_SIMPLE_TYPE(jniref) DEF_AREF_TYPE(jniref) static int jobject_equal(jniref *x, void *jni_ref) { return x->jni_ref == jni_ref; } static jniref *new_jniref(sc *sc, void *jni_ref) { jniref *x = calloc(1, sizeof(*x)); leaf_init(&x->base, jniref_type()); x->sc = sc; x->jni_ref = jni_ref; java_pool = tuple_stack_push(java_pool, (leaf_object*)x); return x; } /* 1st level wrapping makes a 1-1 map between leaf objects and jni references */ static jniref *new_jobject(sc *sc, void *jni_ref) { #if GLOBAL_REFS void *global_jni_ref = (*JAVA_ENV)->NewGlobalRef(JAVA_ENV, jni_ref); (*JAVA_ENV)->DeleteLocalRef(JAVA_ENV, jni_ref); return new_jniref(sc, global_jni_ref); #else jniref *x; /* If it's already wrapped as jobject, return it with RC++. */ if ((x = (jniref*)tuple_list_find(java_pool, (leaf_predicate)jobject_equal, jni_ref))) { LOGF("reuse %p\n", jni_ref); return LEAF_DUP(x); } else { // LOGF("wrap %p\n", jni_ref); return new_jniref(sc, jni_ref); } #endif } Entry: Other problem Date: Fri Dec 11 16:00:56 CET 2009 Probably because the string created in java-string isn't a local ref and can't be freed as such? I need to do this differently... The current design is not very controllable. Maybe it's best to make sure nowhere in the C code there is ever any local reference visible, then see how far that gets. A mix of java and C code is probably going to be best. C side only sees global refs and only objects, then use the reflection API on the Java side to do all the work. Entry: ReleaseStringUTF Date: Sat Dec 12 07:57:01 CET 2009 ReleaseStringUTF is for freeing the resources allocated by GetStringUTF. I had it coupled with NewStringUTF8 which creates a java object, not a C-readable UTF-8 string. Entry: Simpler interface C->Java Date: Sat Dec 12 09:02:48 CET 2009 Currently using a single delegate static method which is passed an array of objects. Next: make this recursive tuples. Ok. Most code seems to be connected properly now. Entry: Testing: call arbitrary methods + send/return integers Date: Sat Dec 12 14:03:11 CET 2009 In other words: make method lookup simpler (no fussing with specifying types: automate as much as possible). Done, but I didn't make special number convertors. Structured strings are enough for now. Entry: Mapping numbers Date: Sun Dec 13 12:28:12 CET 2009 Calling the constructors is straightforward. However, automaticly converting numbers will not suffice, as Java distinguishes between different types. Also, invoking a specific constructor/convertor and then automatically converting Java->Scheme rep also causes problems. So, let's _not_ automatically convert back to Scheme rep, but make this a 2nd operation. This would also eliminate GC restart problems. Entry: Re-integration Date: Mon Dec 14 10:01:44 CET 2009 It worked just fine this weekend in isolation, however now the class instances don't seem to be the same for the sc_java_unpack() function. Maybe there is some double wrapping going on? -> could be this then needs isSameObject() It does work on Android though.. Good. FIXED: needed isSameObject() Entry: getDeclaredField Date: Mon Dec 14 11:35:09 CET 2009 Any method that accesses a field doesn't work when it's invoked a second time. This stuff is quite full of pitfalls.. FIXED? seems to have been a release() error. Entry: separate build dirs Date: Tue Dec 15 11:16:18 CET 2009 Necessary for cross-compilation. Seems to work. This was mostly adding the $(SRCDIR)/$(MODULE) prefixes for all source files, and copying the Makefiles into the build dir. Entry: --eval or --load options + java pass-through Date: Wed Dec 16 15:01:47 CET 2009 For running java debugcode this is necessary. Done: the sc/sc.c code now uses an --eval option that will pass its string argument to the `eval-string' function after loading the boot script. The default value is '(repl)'. To run a script, use: --eval '(script)'