Tue Jul 14 10:44:51 CEST 2009

Staapl's ideas in a broader context

I would like to attempt another introduction to the ideas behind
Staapl, from the viewpoint of the current state of affairs in the
embedded C programming world as I perceive it.  This should be useful
in a broader sense, outside of the specific (Forth and Scheme based)
Staapl context.

1. The Status Quo

One thing is sure at this point in time.  C and Unix (Linux) have won.
There is nothing that can reduce cost more than a standard platform.
Moreover, in places where Unix's memory footprint is too large, C is
still surviving.

What saddens me is that this focus on C is stifling constructive
creativity in other ways.  C might be a "good enough" platform for
creating a vast collection of open source tools that can run anywhere.
It is, however, _not_ a nice language to write complex programs in.

2. Doing it Differently

What I am trying to get an idea about in Staapl is to see what
embedded programming is really about.  What are you left with once you
eliminate the struggle with the tools?

Writing C code in the past, I have lost a tremendous amount of time
dealing with its low-level nature.  I've found my way around most of
these problems, but only _after_ careful study of high level
programming systems.  A bit of knowledge about the implementation of
high-level languages makes it possible to get enough of the necessary
infrastructure into a C program to soften the development pain[1].

Working on Staapl, the conclusion I've come to is that an embedded
programming system should have the following components, well
integrated:

  - a simple base language that allows you to get close to the machine

  - reusable libraries that take care of recurrent problems

  - a sane metaprogramming system (code generation from abstract
    descriptions) closely matched to the base language

  - a straightforward way to support debugging and profiling
    (introspection) in the development loop

In practice, "well integrated" means not causing extra problems on
top of the ones you are already going to encounter in an embedded
software setting, such as debugging and profiling.

Staapl tries to solve these problems by combining ideas from Forth and
Scheme to address the base language, metaprogramming and
debugging/profiling problems.  The particularities of Scheme and Forth
make it possible to include base _language_ extensions as libraries.
From a modular programming perspective this is a very strong plus[3].

3.  Current Solutions

The embedded (Linux-based) software industry seems to have filled the
4-point list in the following way:

  - C (a GCC-based cross compiler)

  - a huge amount of available code

  - a lot of C preprocessor macros + ad hoc external scripts and test
    suites that make it all hold together

  - printf (or more sophisticated log / trace based debugging) and
    modifications to the source code that make it possible for GDB to
    be an effective tool using breakpoints and watchpoints

The first item is actually quite well covered by the C programming
language.  C is close enough to the machine to not introduce
many artificial limits when you need low-level control.

The second item is covered in the sense that there is a vast body of
in-principle reusable C code.  However, the problem with combining
this code lies in the interfaces it has to present.  Because C's code
and data APIs are so low-level, a lot of libraries depend on _other_
libraries to solve representation problems.  The generally
disorganized nature of the Open Source Software community turns this
into quite a mess.  A typical Linux based workstation carries a large
amount of "middleware" duplication because of this.  I'm almost
willing to bet that half of all C code written is simply glue code
that could be eliminated were there a better standard interface.

The funny thing is that on a workstation this duplication doesn't
really pose a problem due to an abundance of memory resources.
However, in special purpose embedded applications it leads to a
serious hindrance: sometimes it is practically impossible to
disentangle the web of dependencies and produce lean code, so people
start re-inventing and re-implementing interfaces to add to the
already huge pile of solutions.

Then, looking at metaprogramming, it is not unreasonable to call the C
preprocessor an abomination.  It has caused a lot of people to write
impenetrable macro code and resort to ad-hoc C code generator scripts.

There are examples of quite involved systems that are meant to
generate C code.  However, the fact that there is no standard here is
a genuine problem that wastes a lot of developers' time.
Closed-source solutions (such as the Mathworks' template system) don't
really help either.

The last point, profiling and debugging, has been made simpler in
recent years by the development of open simulators.  The irony is
that this is not done at the point where it makes most sense (a
programming language's machine model) but at a _real_ machine's
interface, a de-facto standard arising from semiconductor market
dynamics.  The point being that such a low-level tie-in is often not
very practical.  But hey, it's better than nothing, and it definitely
has its uses (e.g. Valgrind).

4.  So what is the real practical problem?

The lack of a standard approach for metaprogramming.

If there is one thing we can learn from the past, it is that standards
cannot be imposed.  They emerge as a consequence of many small
short-sighted local decisions, and are bound to be sub-optimal[2], so
let's rephrase this:

  The real problem is the lack of a standard for C metaprogramming.

If you start looking at C and its associated interfaces as a platform
(a machine) then the problem is really simple: you can automate away a
lot of tedious low-level C handling as long as your system can

  * generate C code from some meaningful high-level description, with
    whatever form of error-catching high-level semantics added
    (e.g. a static type system).

  * parse C code, so you can tie into the vast collection of open
    source software and extract what you need, using either its
    public interfaces or, possibly, going deeper into its internal
    structure to pick and choose there.

A tool that can read/write C code full of unnecessary cleverness, and
extract meaningful components in a way that can be used without
unnecessary cleverness on the operator's side, would open up quite a
lot of possibilities.  Even without a background in language design,
one can think of many applications that would make large classes of
boilerplate code a thing of the past, if only the complexity of
"managing the code reader / generator" can somehow be reduced.

Moral of the story: if C is here to stay, it's probably best to treat
it as a "data format" instead of a programming language.

5.  What to do with metaprogramming?

Once you have metaprogramming techniques, which essentially give you
control over the semantics of your languages, what can you do with them?

There is one single idea that keeps coming back to me: make sure your
application can move gradually from a dynamic to a static structure.
This gives you an extra handle on your project for getting to the
right combination of correctness, observability and maintainability.

Dynamic features make development easier in the stage where you don't
really know what you're doing yet.  Dynamic languages allow for ad-hoc
debugging tools.  If there's one place where you need one-shot
cleverness, it's debugging.  It is really helpful to be able to
change the semantics of your base language in specific ways to track
down problems.  What you don't want is non-observable parts hiding in your
system because of rigid static constraints imposed by the programming
system.  Code really is data, and so is machine state.  You should be
able to look at every aspect of a program, dead (static code) or alive
(runtime state).

Once you have an idea correctly implemented, try to move as much as
possible to static code and eliminate all unnecessary scaffolding
cleverness.  If your solution is any good, it probably has some kind
of structure that can be expressed elegantly in a modern type system.

From my experience: early development in a static language is hard
because the language tends to get in the way.  In a dynamic language,
late development and maintenance is hard because dynamic languages
allow too much implementation freedom and complexity and so leave much
space for obscure errors to hide.  Dynamic languages leave the
programmer's internal representation implicit: this is exactly the
stuff you'll forget about when not working on the code for a couple of
months, or the stuff that the other guy looking at your code doesn't
know at all.  If you cast this structure in logic you're better off in
the end.  

Practically, my message seems to boil down to two principles:

  - allow for on-target dynamic structures (embed an interpreter for a
    reflective scripting language so it's there when you need it)

  - make sure you have a good static tool setup (compiler +
    verification) so most (all) unnecessary "moving parts" can be
    eliminated as soon as you have the basic structure figured out.

6.  Links

[1] http://en.wikipedia.org/wiki/Greenspun%27s_Tenth_Rule
[2] http://en.wikipedia.org/wiki/Worse_is_better
[3] http://blog.plt-scheme.org/2007/05/macros-matter.html