Sat Aug 1 21:58:50 CEST 2009
Concurrency oriented programming in C + Scheme
The context of this article is the use of Scheme as an extension
language in a predominantly C-based application. It is a way to make
C programming easier by sticking to a subset that performs only
`hierarchical' data allocation on the C-stack (i.e. doesn't use
malloc() and free(), or any kind of explicit alloc/dealloc). Let's
call this subset stack-C. From my experience(1) I make the assumption
that stack-C is enough to solve many practical subproblems, given the
external scheduling and memory management can be solved.
The main idea is that while mixing ordinary C with (scripting)
languages that have a different memory and execution model can be
problematic (i.e. Scheme with GC'd CONS cells and a continuation-based
control mechanism), the stack-C approach provides a clean separation
between the C side and any time and space resource management provided
by the scripting language runtime.
The reason is that stack-C tasks can be:
- trivially suspended to (contiguous!) memory
- resumed multiple times (trivially forked)
- garbage collected without finalization
- separated completely from memory and control management
- trivially parallelized
This is essentially declarative concurrency. You shift the
object-oriented model (state machines) to tasks.
The idea is related to deforestation, where intermediate data
structures (which require knowledge of the memory model) are
eliminated by (static) scheduling of communicating tasks. Concretely:
the stack-C code needs to only consume and produce atomic data, which
is easily wrapped, and never builds any (recursive) data collections.
When the need arises to build datastructures (intermediate
representations) this can be done in the scripting language, without
complicating the C code.
This allows core algorithms to be written in `functional C' while all
data management and code scheduling can be performed by the scripting
language, be it a GCd Scheme, or a linear concatenative language.
Memory and time management can then be `library provided'. I wonder
if this is the key to a practical linear language. The fact that such
`suspendable functional C' primitives could be used in two completely
different memory models is a powerful hint.
(1) The idea comes from an implementation of an EtherCat master
driver written in C, in a style that does not use malloc() for
data structure allocation, but instead allocates all memory on
the activation stack. This made me wonder if this programming
style can't be generalized. The idea popped up again trying to
extend TinyScheme with `inverted enumerators' aimed at replacing
list allocation with coroutine / generator style parameter
(2) When stack-C is extended to include open/close or malloc/free
style external resource management, the main challenge in this
approach is finalization of tasks. What we do is creating
cursors by enumerator inversion, which doesn't guarantee the
traversal is run to the end. This might however be elegantly
solved by always re-inverting such inverted enumerators back to
enumerators in the scripting language. Once could however argue
that managed external resources are an implementation artifact
and could be designed out.
(3) Alternatively, a linear approach (no cyclic data, refcount based
management) makes this easier to manage. However, it does
require a finalize() method to the task. Linearly managing the
execution contexts themselves (making fork() expensive) might be
not such a bad idea.