Extensions to tinyscheme 1.39 Entry: Wrapping external pointers Date: Thu Jul 23 15:11:12 CEST 2009 There doesn't seem to be any interface for managing external data. I'm adding a "void" datatype. what this requires: - T_DATA type tag (enough bits?) - is_data() - mk_foreign_data() - typedef foreign_data - _object._data - datavalue() - finalize_cell() - atom2str() not yet: - garbage collector: recursive mark() on scheme data - scheme predicate (general: defs should define their own predicate) The only problem is that there is only a single type of foreign data, so the C side should make sure it can interpret a void pointer, i.e. by adding an extra level of wrapping. This is then solved by using the finalize operation as the type. EDIT: I am officially stupid. The hacking.txt[1] document explains exactly what I just did. Nobody read the docs, doc! Ok it seems to be mostly what I did, except that my extension abstracts pointers and uses the destructor as a type tag. [1] http://tinyscheme.sourceforge.net/hack.txt Entry: Objects are environments Date: Fri Jul 24 16:34:12 CEST 2009 [1] http://tinyscheme.sourceforge.net/oo.txt Entry: Small interface changes Date: Fri Jul 24 18:41:48 CEST 2009 ivalue -> integer_value rvalue -> real_value This makes CPP macros that generate type juggling red-tape simpler. Entry: Errors from primitives Date: Fri Jul 24 23:16:42 CEST 2009 This needs a modification to the interpreter. At first sight, there is no way to implement this as a function: the C stack frame of the primitive would act as a guard. Entry: Vectors are consecutive cells Date: Sun Jul 26 12:48:49 CEST 2009 This makes access O(1) but allocation not especially efficient and possibly leaky in that finding consecutive cells could be more problematic as time advances and the free list gets more fragmented. Maybe not? The freelist is kept sorted, so at least this problem is minimized in some sense, but that doesn't guarantee anything. Entry: Type checking + wrapping Date: Mon Jul 27 14:52:14 CEST 2009 I've added the type checking and wrapping abstractions as data.h and data.c to the tinyscheme tree. These functions provide a concise mechanism for _checking_ the type of scheme arguments entering a C function, _unpacking_ them to C values or pointers and to _wrap_ them in scheme values on function return. The `blob.c' file uses these abstractions. It also illustrates the abstract data extension (T_DATA). Entry: Errors from primitives Date: Mon Jul 27 14:58:19 CEST 2009 The simplest way seems to be returning a magic value whenever a Scheme primitive is executed. An alternative is to use setjump() to provide an error exit point. Let's try the latter. Using USE_SETJUMP Done. It works by calling _Error_1 in the context of the C foreign function, followed by a longjmp() which aborts the current C stack and returns to the function's invokation point. Entry: Delimited Continuations for Primitives Date: Fri Jul 31 17:19:27 CEST 2009 The idea is this: bridging C and Scheme can sometimes be tedious due to the mismatch of data types. As long as the C side doesn't allocate anything, the type matching can be largely automated. However, it is often one wants to return a list of atoms. This requires explicit use of the scheme list structure and its allocation mechanism inside C primitives. This hinders automatic impedance matching. C isn't such a bad language to program in as long as all data can be allocated as local variables on the call stack. Reifying this as a data object makes it possible to avoid containers altogether, by representing them by traversal functions: generators for read-only data, and zippers for read-write data. This allows a single point of allocation: representing the C stack (cursor state) as a data structure. The problems to be solved seem to be: - marking the stack frame in memory - storing it somewhere I just added the exception mechanism. What about turning this into resumable tasks? Ok. The core scheme is now extended with a 2nd way of aborting the computation: returning a value after returning from a longjmp(), where this value could be a representation of a copy of the C stack. The extension library task.c implements that datastructure. So, what are the problems? - C code cannot hold on to Scheme references accross suspension. References made inside the C context are not accessible to the garbage collector, meaning that the referred objects might no longer be alive on resume. This is actually a _good thing_ as it allows decoupling of the memory models. - Code needs to be instantiated in the same place for pointers to make sense. This means it's not possible to use C -> scheme calls. This problem can be solved by allocating resumable stack segments on the heap in the first place, so they don't need to be copied. The advantage is that C code that doesn't use continuations runs fast. - Return values are always a pair (ctx . rv) because we can't store the rv in the ctx since there's no transitive mark(). What does this solve? - Iteration state doesn't need to be manually wrapped in C or Scheme objects: one size fits all. I like this. Too bad it has some problems that can't be managed automatically. The general idea is quite powerful though. I wonder if re-arranging things so that this does work makes sense. One change would be to jump to a C-stack on the heap whenever a reset() is called. How to do that? Another thing to investigate: how does tinyscheme use the C-stack? There's an option that makes it more C-like.. (EDIT: no magic here). An older version of glibc longjmp can be found here[1]. I'm not sure if there is a portable way to jump into a piece of malloc() memory. From[3] it looks like that at least for glibc, the stack pointer is at a fixed offset in the struct. #define JB_SP 4 Is this the same for all archs? Maybe look at uclibc too. Apparently there are standard ways of using malloc() memory as stack space[4]. However the gnu doc says this is deprecated in posix due to portability problems, and that posix threads should be used instead. I don't see why. So, how to use setjmp/longjmp instead? The trick is in assuming that stack and heap are in the same address space: simply allocate a large stack frame that will not be used but brings locals into the heap. (EDIT: this doesn't seem to work so well since positions of local variables on the stack are essentially unspecified. Even the existence of the stack itself is never made explicit in the standard!) The interface atom looks like this: a function that returns a lazy list return either nil or a cons of an element and a continuation. The continuation takes a single argument. I've added some safety checks that make sure a context can only be instantiated when it has the same base address. [1] http://cvs.savannah.gnu.org/viewvc/libc/sysdeps/x86_64/__longjmp.S?revision=1.2.2.4&root=libc&view=markup [2] http://cvs.savannah.gnu.org/viewvc/libc/sysdeps/i386/__longjmp.S?root=libc&view=markup [3] http://cvs.savannah.gnu.org/viewvc/libc/sysdeps/i386/jmpbuf-offsets.h?root=libc&view=markup [4] http://en.wikipedia.org/wiki/Setcontext Entry: TinyScheme's C stack magick Date: Sat Aug 1 11:23:11 CEST 2009 Actually: it's not the C stack that's used, but a different array based stack for the dump (context save of the 4 registers). Entry: Tasks on heap Date: Sat Aug 1 13:41:07 CEST 2009 So.. Instead of copying the C stack, how can a C subroutine's stack be allocated on the heap before it is executed, avoiding unnecessary copying? The following macro could act as a portable way to switch stacks: #define SET_STACK(sp) \ char _reserve_start[0]; \ alloca(((void*)_reserve_start) - (sp)) Maybe it's just more convenient to keep it as it is: no need to estimate stack sizes and no need for extra wrapping for initial entry and final exit. Problem: when I disable optimization, it no longer works. Looks like I assume I'm depending on some parameters being stored in registers when switching/copying stack. After the memcpy() the variable containing a reference to the context gets overwritten. task *ctx; 0x7fffffffd3e8 _sp 0x7fffffffd338 size 472 The `ctx' variable is not allocated on top of the reserved stack space.. the alloca() doesn't do what I think it does. Trying again with #define SET_STACK(sp) \ char _reserve_start[0]; \ char _reserve[((void*)_reserve_start) - (sp)] gives: ctx->base 0x7fffffffd510 &ctx 0x7fffffffd3e8 _reserve_start 0x7fffffffd3a0 _sp 0x7fffffffd338 _reserve 0x7fffffffd2f0 which makes no sense at all.. Conclusion: It looks like C variables can be re-arranged in memory. Apparently nothing is specified[1]. Trying again but using the assumption that a large array will be allocated on the stack. /* Reserve stack space (overestimate) */ volatile void *reserve[ctx->size / sizeof(void *)]; base 0x7fffffffd510 sp 0x7fffffffd338 reserve 0x7fffffffd180 Ok. I fixed it now by: - allocating a large array to make sure function call/return works during/after memcpy. - making sure the ctx variable is not allocated on the stack So, maybe it is possible to predict alloca? Run it once, see where the pointer ends up, then run it again to move the stack permanently. [1] http://stackoverflow.com/questions/1102049/order-of-local-variable-allocation-on-the-stack Entry: Traversal Date: Sat Aug 1 18:36:34 CEST 2009 Enumerators are better than cursors because: - enumeration state is hard to represent - open/close needs to be handled explicitly - end-of-stream needs out-of-band value Ultimately this is about memory management.. You want an enumerator to make sure that a resource can be freed when the traversal is done. Encapsulating it in an cursor object makes this difficult to manage. How to make sure the cursor is finalized? I'm already running into these kinds of problems: * Allow for a free() method by adding an `abort' input atom. This would allow for dynamic data to exist on the stack. (One could require streams to be always run to the end?) * Allow for dynamic data by _moving_ it from scheme atoms. Best is pjrobably to copy. Is it possible to turn this into genuine `fold' operation, and have the stack capture as a special case? Instead of binding the `yield' to the C procedure, it could be replaced by a generic closure object called from the procedure by re-entering the interpreter. Basicly what I'm doing at this moment (always capturing the stack, turning the enumerator into an cursor) is _against_ the argumentation for enumerators. I only do it to make it easy to perform memory allocation in C, but it seems to only avoid the problems in some special cases. A different question: for recursive invokations of the interpreter, can the middle C-part of the call stack be spliced out so the Scheme part can be flattened? Or is it really simpler to keep the current `rebasing' as is, but wrap the cursors in an enumerator interface that can guarantee proper resource finalization? So.. How do these two conflicting stances find a balance? Generators allow the removal of tree structurs from C code, but the resulting objects are difficult to manage if they are not allowed to be finalized. (They work fine for stack-allocated temporary data however.) [1] http://okmij.org/ftp/papers/LL3-collections-enumerators.txt Entry: A different model for tasks Date: Mon Aug 3 11:25:52 CEST 2009 From a practical pov, the current implementation based on stack copying might in practice not be optimal. Let's create a linear version, where a task can only be resumed _in-place_. Ultimately my goal is to use the same C primitives in Scheme and a linear, stack based PF-like language. So, practically, what is the next step? First make sure that in the current DaVinci code this model can be used. Implementation of the task mechanism isn't that important: get the semantics right first. [1] http://annexia.org/freeware/pthrlib Entry: task.c as part of scheme.c ? Date: Mon Aug 3 11:53:57 CEST 2009 Maybe not: i probably am going to need different implementations later.. Maybe stick to the shlib? This however does need a tie into the TinyScheme API. Otoh, the task mechanism does call deep into the private scheme implementation, so probably best to keep it. First let's cleanup the current switch() statement. OK: task.c is now in scheme.c Entry: dumping tables Date: Mon Aug 3 12:25:05 CEST 2009 So, now that any data structure can be serialized, how to serialize tree structures or tables? A simple protocol would be () 1 2 3 () 4 5 6 for (1 2 3) (4 5 6) Entry: Task implementation Date: Mon Aug 3 12:47:44 CEST 2009 Follow this[1] thread: it is mentioned that MLton uses mmap() for stack allocation. Then pthrlib[2] is a good starting point. The file src/pthr_context.c contains implementations based on get/setcontext() and hacks of the jmp_buf struct. [1] http://caml.inria.fr/pub/ml-archives/caml-list/2007/07/d2677c52a6b251df4fa3c529a8066ec1.en.html [2] http://annexia.org/freeware/pthrlib Entry: transitive mark() Date: Mon Aug 3 13:36:46 CEST 2009 I'm going to need transitive (recursive) mark() to allow retention of a task's input arguments as long as it is alive. This requires extension of the scheme.c external object API. Entry: getcontext() Date: Mon Aug 3 19:04:49 CEST 2009 It doesn't seem to work in uClibc for the arm: /home/tom/git/tinyscheme/scheme.c:4821: warning: warning: getcontext is not implemented and will always fail So.. do I want this bad enough to figure out how to fix it? It doesn't look like its really difficult though, with inspiration from pthrlib[1]. And then there is GNU pth[2], which seems to be specifically designed to solve this issue. Let's see what the author has to say[3]. * Multithreading is advantageous mostly from a resource use point of view. (ED: No intermediate data structures to store, and no code that unfolds/folds them.) * Internal communication can leverage the shared address space. Essentially: pointers to long-lived data can be passed which might reduce communication overhead. This looks contradictory at first (avoid data and then be happy you can still pass data) but the key observation is that _intermediate data_ is kept short-lived. Practically this means that it will not leave the CPU cache. The portable trick used is creating a signal stack and jumping into it by generating a signal. Once there, setjmp and longjmp can be used to save and restore context. This all gives me the creeps.. Anyways, I can use the interface. If performance is bad it's always possible to replace it with a simple custom-made solution. Like the paper says: it's only saving a couple of registers. [1] http://www.annexia.org/freeware/pthrlib [2] http://www.gnu.org/software/pth/ [3] http://www.gnu.org/software/pth/rse-pmt.ps