Extensions to tinyscheme 1.39

Entry: Wrapping external pointers
Date: Thu Jul 23 15:11:12 CEST 2009

There doesn't seem to be any interface for managing external data.
I'm adding a "void" datatype.

what this requires:
  - T_DATA type tag (enough bits?)
  - is_data()
  - mk_foreign_data()
  - typedef foreign_data
  - _object._data
  - datavalue()
  - finalize_cell()
  - atom2str()

not yet:
  - garbage collector: recursive mark() on scheme data
  - scheme predicate (general: defs should define their own predicate)


The only problem is that there is only a single type of foreign data,
so the C side should make sure it can interpret a void pointer,
i.e. by adding an extra level of wrapping.

This is then solved by using the finalize operation as the type.

EDIT: I am officially stupid.  The hacking.txt[1] document explains
exactly what I just did.  Nobody read the docs, doc!

Ok it seems to be mostly what I did, except that my extension
abstracts pointers and uses the destructor as a type tag.

[1] http://tinyscheme.sourceforge.net/hack.txt


Entry: Objects are environments
Date: Fri Jul 24 16:34:12 CEST 2009

[1] http://tinyscheme.sourceforge.net/oo.txt


Entry: Small interface changes
Date: Fri Jul 24 18:41:48 CEST 2009

ivalue -> integer_value
rvalue -> real_value

This makes CPP macros that generate type juggling red-tape simpler.


Entry: Errors from primitives
Date: Fri Jul 24 23:16:42 CEST 2009

This needs a modification to the interpreter.  At first sight, there
is no way to implement this as a function: the C stack frame of the
primitive would act as a guard.


Entry: Vectors are consecutive cells
Date: Sun Jul 26 12:48:49 CEST 2009

This makes access O(1) but allocation not especially efficient and
possibly leaky in that finding consecutive cells could be more
problematic as time advances and the free list gets more fragmented.

Maybe not?  The freelist is kept sorted, so at least this problem is
minimized in some sense, but that doesn't guarantee anything.


Entry: Type checking + wrapping
Date: Mon Jul 27 14:52:14 CEST 2009

I've added the type checking and wrapping abstractions as data.h and
data.c to the tinyscheme tree.  These functions provide a concise
mechanism for _checking_ the type of scheme arguments entering a C
function, _unpacking_ them to C values or pointers and to _wrap_ them
in scheme values on function return.

The `blob.c' file uses these abstractions. It also illustrates the
abstract data extension (T_DATA).


Entry: Errors from primitives
Date: Mon Jul 27 14:58:19 CEST 2009

The simplest way seems to be returning a magic value whenever a Scheme
primitive is executed.  An alternative is to use setjump() to provide
an error exit point.

Let's try the latter.

Using USE_SETJUMP

Done.  It works by calling _Error_1 in the context of the C foreign
function, followed by a longjmp() which aborts the current C stack and
returns to the function's invokation point.


Entry: Delimited Continuations for Primitives
Date: Fri Jul 31 17:19:27 CEST 2009

The idea is this: bridging C and Scheme can sometimes be tedious due
to the mismatch of data types.  As long as the C side doesn't allocate
anything, the type matching can be largely automated.  However, it is
often one wants to return a list of atoms.  This requires explicit use
of the scheme list structure and its allocation mechanism inside C
primitives.  This hinders automatic impedance matching.

C isn't such a bad language to program in as long as all data can be
allocated as local variables on the call stack.  Reifying this as a
data object makes it possible to avoid containers altogether, by
representing them by traversal functions: generators for read-only
data, and zippers for read-write data.

This allows a single point of allocation: representing the C stack
(cursor state) as a data structure.

The problems to be solved seem to be:

  - marking the stack frame in memory
  - storing it somewhere

I just added the exception mechanism.  What about turning this into
resumable tasks?  

Ok.  The core scheme is now extended with a 2nd way of aborting the
computation: returning a value after returning from a longjmp(), where
this value could be a representation of a copy of the C stack.  

The extension library task.c implements that datastructure.

So, what are the problems?

  - C code cannot hold on to Scheme references accross suspension.
    References made inside the C context are not accessible to the
    garbage collector, meaning that the referred objects might no
    longer be alive on resume.  This is actually a _good thing_ as it
    allows decoupling of the memory models.

  - Code needs to be instantiated in the same place for pointers to
    make sense.  This means it's not possible to use C -> scheme
    calls.  This problem can be solved by allocating resumable stack
    segments on the heap in the first place, so they don't need to be
    copied.  The advantage is that C code that doesn't use
    continuations runs fast.

  - Return values are always a pair (ctx . rv) because we can't store
    the rv in the ctx since there's no transitive mark().


What does this solve?

  - Iteration state doesn't need to be manually wrapped in C or Scheme
    objects: one size fits all.


I like this.  Too bad it has some problems that can't be managed
automatically.  The general idea is quite powerful though.  I wonder
if re-arranging things so that this does work makes sense.

One change would be to jump to a C-stack on the heap whenever a
reset() is called.  How to do that?

Another thing to investigate: how does tinyscheme use the C-stack?
There's an option that makes it more C-like.. (EDIT: no magic here).

An older version of glibc longjmp can be found here[1].  I'm not sure
if there is a portable way to jump into a piece of malloc() memory.
From[3] it looks like that at least for glibc, the stack pointer is at
a fixed offset in the struct.

#define JB_SP	4

Is this the same for all archs?  Maybe look at uclibc too.  

Apparently there are standard ways of using malloc() memory as stack
space[4].  However the gnu doc says this is deprecated in posix due to
portability problems, and that posix threads should be used instead.
I don't see why.

So, how to use setjmp/longjmp instead?  The trick is in assuming that
stack and heap are in the same address space: simply allocate a large
stack frame that will not be used but brings locals into the heap.
(EDIT: this doesn't seem to work so well since positions of local
variables on the stack are essentially unspecified.  Even the
existence of the stack itself is never made explicit in the standard!)

The interface atom looks like this: a function that returns a lazy
list return either nil or a cons of an element and a continuation.
The continuation takes a single argument.

I've added some safety checks that make sure a context can only be
instantiated when it has the same base address.


[1] http://cvs.savannah.gnu.org/viewvc/libc/sysdeps/x86_64/__longjmp.S?revision=1.2.2.4&root=libc&view=markup
[2] http://cvs.savannah.gnu.org/viewvc/libc/sysdeps/i386/__longjmp.S?root=libc&view=markup
[3] http://cvs.savannah.gnu.org/viewvc/libc/sysdeps/i386/jmpbuf-offsets.h?root=libc&view=markup
[4] http://en.wikipedia.org/wiki/Setcontext


Entry: TinyScheme's C stack magick
Date: Sat Aug  1 11:23:11 CEST 2009

Actually: it's not the C stack that's used, but a different array
based stack for the dump (context save of the 4 registers).


Entry: Tasks on heap
Date: Sat Aug  1 13:41:07 CEST 2009

So.. Instead of copying the C stack, how can a C subroutine's stack be
allocated on the heap before it is executed, avoiding unnecessary
copying?  The following macro could act as a portable way to switch
stacks:

#define SET_STACK(sp) \
    char _reserve_start[0]; \
    alloca(((void*)_reserve_start) - (sp))

Maybe it's just more convenient to keep it as it is: no need to
estimate stack sizes and no need for extra wrapping for initial entry
and final exit.

Problem: when I disable optimization, it no longer works.  Looks like
I assume I'm depending on some parameters being stored in registers
when switching/copying stack.

After the memcpy() the variable containing a reference to the context
gets overwritten.

task *ctx; 0x7fffffffd3e8
_sp        0x7fffffffd338
size       472

The `ctx' variable is not allocated on top of the reserved stack
space..  the alloca() doesn't do what I think it does.


Trying again with

#define SET_STACK(sp) \
    char _reserve_start[0]; \
    char _reserve[((void*)_reserve_start) - (sp)]

gives:

ctx->base      0x7fffffffd510
&ctx           0x7fffffffd3e8
_reserve_start 0x7fffffffd3a0
_sp            0x7fffffffd338
_reserve       0x7fffffffd2f0


which makes no sense at all..  Conclusion: It looks like C variables
can be re-arranged in memory.  Apparently nothing is specified[1].

Trying again but using the assumption that a large array will be
allocated on the stack.

    /* Reserve stack space (overestimate) */
    volatile void *reserve[ctx->size / sizeof(void *)];


base          0x7fffffffd510
sp            0x7fffffffd338
reserve       0x7fffffffd180

Ok. I fixed it now by:

    - allocating a large array to make sure function call/return works
      during/after memcpy.

    - making sure the ctx variable is not allocated on the stack


So, maybe it is possible to predict alloca?  Run it once, see where
the pointer ends up, then run it again to move the stack permanently.


[1] http://stackoverflow.com/questions/1102049/order-of-local-variable-allocation-on-the-stack


Entry: Traversal
Date: Sat Aug  1 18:36:34 CEST 2009

Enumerators are better than cursors because:

   - enumeration state is hard to represent

   - open/close needs to be handled explicitly

   - end-of-stream needs out-of-band value

Ultimately this is about memory management..  You want an enumerator
to make sure that a resource can be freed when the traversal is done.
Encapsulating it in an cursor object makes this difficult to
manage.  How to make sure the cursor is finalized?

I'm already running into these kinds of problems:

* Allow for a free() method by adding an `abort' input atom.  This
  would allow for dynamic data to exist on the stack.  (One could
  require streams to be always run to the end?)

* Allow for dynamic data by _moving_ it from scheme atoms.  Best is
  pjrobably to copy.

Is it possible to turn this into genuine `fold' operation, and have
the stack capture as a special case?

Instead of binding the `yield' to the C procedure, it could be
replaced by a generic closure object called from the procedure by
re-entering the interpreter.  Basicly what I'm doing at this moment
(always capturing the stack, turning the enumerator into an cursor)
is _against_ the argumentation for enumerators.  I only do it to make
it easy to perform memory allocation in C, but it seems to only avoid
the problems in some special cases.

A different question: for recursive invokations of the interpreter,
can the middle C-part of the call stack be spliced out so the Scheme
part can be flattened?  Or is it really simpler to keep the current
`rebasing' as is, but wrap the cursors in an enumerator interface that
can guarantee proper resource finalization?

So.. How do these two conflicting stances find a balance?  Generators
allow the removal of tree structurs from C code, but the resulting
objects are difficult to manage if they are not allowed to be
finalized.  (They work fine for stack-allocated temporary data
however.)


[1] http://okmij.org/ftp/papers/LL3-collections-enumerators.txt


Entry: A different model for tasks
Date: Mon Aug  3 11:25:52 CEST 2009

From a practical pov, the current implementation based on stack
copying might in practice not be optimal.  Let's create a linear
version, where a task can only be resumed _in-place_.

Ultimately my goal is to use the same C primitives in Scheme and a
linear, stack based PF-like language.

So, practically, what is the next step?  First make sure that in the
current DaVinci code this model can be used.  Implementation of the
task mechanism isn't that important: get the semantics right first.

[1] http://annexia.org/freeware/pthrlib


Entry: task.c as part of scheme.c ?
Date: Mon Aug  3 11:53:57 CEST 2009

Maybe not: i probably am going to need different implementations
later.. Maybe stick to the shlib?  This however does need a tie into
the TinyScheme API.  Otoh, the task mechanism does call deep into the
private scheme implementation, so probably best to keep it.

First let's cleanup the current switch() statement.
OK: task.c is now in scheme.c


Entry: dumping tables
Date: Mon Aug  3 12:25:05 CEST 2009

So, now that any data structure can be serialized, how to serialize
tree structures or tables?

A simple protocol would be () 1 2 3 () 4 5 6 for (1 2 3) (4 5 6)


Entry: Task implementation
Date: Mon Aug  3 12:47:44 CEST 2009

Follow this[1] thread: it is mentioned that MLton uses mmap() for
stack allocation.  Then pthrlib[2] is a good starting point.  The file
src/pthr_context.c contains implementations based on get/setcontext()
and hacks of the jmp_buf struct.

[1] http://caml.inria.fr/pub/ml-archives/caml-list/2007/07/d2677c52a6b251df4fa3c529a8066ec1.en.html
[2] http://annexia.org/freeware/pthrlib


Entry: transitive mark()
Date: Mon Aug  3 13:36:46 CEST 2009

I'm going to need transitive (recursive) mark() to allow retention of
a task's input arguments as long as it is alive.  This requires
extension of the scheme.c external object API.


Entry: getcontext()
Date: Mon Aug  3 19:04:49 CEST 2009

It doesn't seem to work in uClibc for the arm:

/home/tom/git/tinyscheme/scheme.c:4821: warning: warning: getcontext is not implemented and will always fail

So.. do I want this bad enough to figure out how to fix it?  It
doesn't look like its really difficult though, with inspiration from
pthrlib[1].  And then there is GNU pth[2], which seems to be
specifically designed to solve this issue.  Let's see what the
author has to say[3].

  * Multithreading is advantageous mostly from a resource use point of
    view.  (ED: No intermediate data structures to store, and no code
    that unfolds/folds them.)

  * Internal communication can leverage the shared address space.
    Essentially: pointers to long-lived data can be passed which might
    reduce communication overhead.

This looks contradictory at first (avoid data and then be happy you
can still pass data) but the key observation is that _intermediate
data_ is kept short-lived.  Practically this means that it will not
leave the CPU cache.

The portable trick used is creating a signal stack and jumping into it
by generating a signal.  Once there, setjmp and longjmp can be used to
save and restore context.

This all gives me the creeps..

Anyways, I can use the interface.  If performance is bad it's always
possible to replace it with a simple custom-made solution.  Like the
paper says: it's only saving a couple of registers.


[1] http://www.annexia.org/freeware/pthrlib
[2] http://www.gnu.org/software/pth/
[3] http://www.gnu.org/software/pth/rse-pmt.ps