C and Unix programming.

Entry: NULL terminated arrays of pointers
Date: Thu Oct 23 14:43:21 CEST 2008

== a cdr coded lisp list.  These work better than arrays + length.

The only problem is that allocating temporary data structures on the
run-time stack requires object size.  But.. maybe alloca works here?


Entry: High level programming in C: Data Structures
Date: Wed Oct 29 10:14:30 CET 2008

[ This should go in a separate CS.txt log. ]

What I miss most in C are closures and directed acyclic datatypes.
This made me think a bit about different classes of data structures in
a C program.  In a dynamic language with GC these all tend to blend.


1.  Linear types (pure trees): every object has a single reference,
    which could be implemented as embedded structs (C struct
    inheritance) or embedded struct pointers (C struct delegation).

2.  Directed acyclic graphs: objects can have multiple references, but
    there are no loops.  Reference counting works here.

3.  General graphs: If there are circular references, reference counts
    won't work: you need to build "intelligent" constructors /
    destructors for the whole graph.


Entry: C
Date: Mon Nov 10 12:11:11 CET 2008

As long as everything is static, C isn't so bad.  But anything that
involves complicated datastructures is a pain..  The first thing to
standardize on is a list implementation.  Looks like this is done the
right way in the Linux kernel.


Entry: RPC and ad-hoc protocols
Date: Mon Nov 10 17:34:00 CET 2008

This is about different ways of looking at procedure calls, mostly
from an object-oriented pov (modify object in-place) vs. functional
programming view (share + copy).

  * In the OBJECT model, the client passes a data object to the
    server, which is modified in place and returned back to the
    client.  (I.e. Unix IOCTL).  The advantage here is that server is
    relieved from memory management.  In the setting of FP, linear
    memory management can be used.

  * In the FUNCTION model, there is no mutation of shared data between
    client and server.  The client passes a copy of the necessary data
    to the server over a message channel, and the server returns a
    reply over a different channel.

The problem I ran into is having to implement an RPC interface over a
unidirectional message protocol.  This happens when what you want is
really a shared library of data objects and function calls, but it
needs to provide control operations behind the scenes (i.e. it has an
event loop that's not triggered by the clients).


An interesting road is to investigate what is necessary to write an
RPC generator: eliminate the red tape by hiding the message protocol
from the user, and expose only procedure calls on both client and
server side, where client calls, and server gets called.

The problem with automating RPC generation is data-sharing: often in
C, a function will allocate or mutate an object, and thus creating a
side-channel.  So it looks like this abstraction is going to be
somewhat leaky, since it needs to be purely functional (copy instead
of mutation).


Entry: Protocol definition
Date: Tue Nov 11 13:26:19 CET 2008

1. Make sure code never depends on the protocol: all encoding should
   be done automatically.

2. Provide automatic documentation for wire debugging.  (Or better
   still, a sniffer that produces human-readable data.)


Entry: packet filters vs. aspect oriented programming
Date: Wed Nov 12 12:29:53 CET 2008

I'm working on an RPC mechanism over unidirectional Xenomai message
queues, and it struck me that a packet filter is actually a pointcut,
a set of join points matching a specification: given a stream of RPC
messages, let some through but modify others.


Entry: CSP style channels on Xenomai
Date: Wed Nov 12 16:13:23 CET 2008

I'm trying to use queues for something they are probably not useful
for.  What I really want is CSP style channels.

There are two problems:
  * A select() is necessary to wait on multiple events.
  * Tasks might be too expensive for fine task granularity.

This describes the CT library from UTwente:

http://www.ce.utwente.nl/rtweb/publications/2007/pdf-files/102CE2007_WMC07_pdf.pdf


 Q: What is the difference between waiting on a set of messages from a
    single "socket", and waiting on a set of ports?  Can such
    descriptions be automatically transcribed?

Anyways, the event flag group services provide the necessary
machinery:  

  An event flag group is a synchronization object represented by a
  long-word structure; every available bit in such word can be used to
  map a user-defined event flag. When a flag is set, the associated
  event is said to have occurred. Xenomai tasks and interrupt handlers
  can use event flags to signal the occurrence of events to other
  tasks; those tasks can either wait for the events to occur in a
  conjunctive manner (all awaited events must have occurred to wake
  up), or in a disjunctive way (at least one of the awaited events
  must have occurred to wake up).


Entry: Daemon vs. Shared library
Date: Tue Nov 18 11:16:06 CET 2008

Since a shared object can be used to share static data, as long as
processes need to communicate on the same machine and need RPC
semantics (clients are single thread, server uses synchronized
access), this is probably enough.

However, this does needs shared memory for the shared data, and
semaphores to protect them.  The API can abstract this.


Entry: Implementing CSP in a single thread.
Date: Tue Nov 18 16:17:20 CET 2008

Apparently, "CSP semantics" for conditions are inefficient to
implement.  One usually chooses for "Mesa semantics", where setting a
condition does not mean a yield, which means that the condition can
change before signalled tasks are effectively woken up.  The remedy is
to always check the condition after wake-up (loop before you leap!)

http://www.cs.duke.edu/courses/spring01/cps110/slides/sem/sld005.htm

But as far as I can see CSP semantics are only inefficient if task
switches are expensive.  The idea is probably that task switches can
be limited if a single thread sets more than one condition in a row.

So, when task switches are cheap (stack machines), this problem goes
away?  I'd like to implement this in Staapl.


Entry: Iterators vs. Fold vs. Comprehensions
Date: Fri Nov 28 12:11:49 CET 2008

http://lambda-the-ultimate.org/node/1224

What you want in C however, is a comprehension instead of a fold: this
allows to use lexical variables without creating explicit iteratee
context objects, or using gcc's downward closures.

A comprehension is really just the macro form of a fold: the iteratee
is inlined.

The problem with comprehension macros is the maintenance of state
across the loop body.  Such a macro looks like:

  type1 var1;
  type2 var2;
  ...
  FOR_COLLECTION(var1, var2, ...) {
      // code that uses var1, var2, ...
  }

Without access to the scope of the FOR and an intermediate scope of
the loop body, this becomes quite difficult to hack..  Something like
this might work better:

  type1 var1;
  type2 var2;
  ...
  FOR_COLLECTION(var1, var2, ...) 
      // code that uses var1, var2, ...
  END_FOR

In short, it's too complicated to do in general, so let's stick to
generic folds.


EDIT: Actually it is not necessary to store iteration state variables
inside the context.  Using a macro like this:

#define DE_COLLECTION_FOR_(_list, _de, _del)            \
    struct list *_del;                                  \
    for(_del = (*_list), _de = _del ? _del->de : NULL;  \
        _de;                                            \
        _del = _del->next, _de = _del ? _del->de : NULL)
#define DE_COLLECTION_FOR(list, de) DE_COLLECTION_FOR_(list, de, __del__)

The iteration state is stored outside the context.  The compiler will
probably optimize the use of the variable, so we need not worry about
memory usage.  The only remaining problem is the "__del__" symbol: it
is generated inside the macro and might cause name clashes.

How to implement gensym in C?[1]

[1] entry://20100825-142132


Entry: C++ local objects + exceptions.
Date: Mon Dec  8 10:46:40 CET 2008

Local objects are destroyed on exceptions.  This is actually quite
useful as a code generation target..  But isn't this the same as
alloca() ?


Entry: GET/SET refactoring
Date: Fri Dec 19 11:05:07 CET 2008

I've found an interesting test case for automatic refactoring:
implementing endianness abstraction by exctracting GET/SET accessors
from direct struct access in a low-level driver that uses little
endian packed data structures.


Entry: datastructures in C
Date: Wed Jan 21 17:14:43 CET 2009

In lisp, a composite datastructure is always a bunch of pointers to
other data structures.  In C, some data might be "inlined".  This
inlining is done behind the scenes in Lisp (i.e. tagged ints), while
in C you have:

   struct->field

or

   struct.field

And it's always _your_ responsability to know which of both it is.

Is it possible to do this automatically, but still retain manual
inlining at the definition site? I.e. to write something like
struct=>field, and have the compiler expand it to struct->field or
struct.field (analogous to Macros in Staapl)?


Entry: memory management + RT
Date: Thu Jan 29 09:27:42 CET 2009

  - RT datastructures (directed acyclic graphs)

This means eliminating malloc().  How far can we get with DAGs?  With
downward closures and read-only structures all data is stored in the
tasks's activation frame.  This means DAG memory management without
reference counting.

  - traversal and partial continuations

In the face of lightweight tasks (i.e. stack machines), partial
contuations (tasks) become a feasible abstraction.  Figure out if
there are "free rides".  I.e. if this can lead to datastructures that
would otherwise require special memory management (refcounts) but can
be built on top of stack-based allocators.

One remark though: this requires one (arbitrarily large) stack per
task, so just moves the problem.  The problem being that _real_ memory
comes from a shared pool - a machine architecture problem.

So.. Can we solve this problem with virtual memory?  Or is VM exactly
the thing that makes context switches expensive?

To summarize the idea:

    ASSUMPTION: Downward closures are 'powerful enough' for building
                data structures in simple RT programming.

     ADVANTAGE: Use fast stack-based memory management.

       PROBLEM: One stack per task.  How to allocate this from a
                single memory pool?

It is a non-issue for non-shared memory, but that might require
inefficient pre-allocation.  Where is the trade-off exactly?


Entry: logic & constraint based programming 
Date: Sat Jan 31 12:35:51 CET 2009

More specifically: reversible dataflow.  From a collection of
equations (or inequalities), construct a C program or interpreter +
program representation that computes a solution (or computes
feasability).


Entry: lists
Date: Thu Feb  5 17:29:21 CET 2009

For a simple static C program with a bit of dynamic data, this is a
simple list structure based only on concatenation, with "next"
pointers embedded in the objects.

This works for pure trees (flat trees).

Conceptually, only these operations are necessary:
     * a PUSH operation of a singleton to a stack.
     * a REVERSE operation to finalize construction preserving order
     * a FOR operation for traversal

If objects have built-in list pointers, there is conceptually no
"element".  All things are lists, and the primitive operations are
"overlay_1" and "split_1".

overlay_1 (abcd... , 1234...) = a1234...
split_1   (abcd...)           = bcd...

I'm not sure if this way of looking at it is so useful, but there _is_
a difference between objects that contain a next pointer, and obects
that are contained in a separate container structure (i.e. cons
cells).  Using embedded next is commonplace, and is easier to use with
manual memory management, so maybe it's better to stick to it.

Another way of looking at it: if datastructure construction is not
time-critical, using quadratic list insertion is not really a
problem..  That way append and iteration might be enough.

The problem with embedded "next" pointers is that if you want to put
the objects in an alternative container, you get two different
containement mechanism (one built-in, one explicit).


Entry: Data structure traversal in C
Date: Fri Feb  6 15:31:43 CET 2009

In a block-scoped language without proper lexical closures,
comprehensions (iteration macros) are easier to use than iteratatee
objects (higher order functions) -- the operation passed to an
iterator.

The reason is that the body of the comprehension has access to the
lexical environment outside of that body.

In this scenario, factor out the data structure's FIRST and NEXT
operations into procedures, and construct a trivial FOR macro using
those based on C's for( _ ; _ ; _) construct.

They are more limited: the comprehension loop block cannot be reused,
while (stateless) iteration objects can be.

EDIT: it is all about the difference between ``let'' and ``lambda''.
The latter ``forks'' the stack while the former does not.


Entry: (*x)[i]
Date: Mon Feb  9 12:00:00 CET 2009

Given int **x, what's the difference between  (*x)[i] and *x[i] ?

This is clear to see when we write (*x)[i] and *(x[i]);

In the first x is a pointer to an array of ints, in the second x is an
array of pointers to int.

Be careful, the compiler can't see the difference!
   

Entry: C macros and name capture
Date: Thu Feb 19 17:41:02 CET 2009

Today I spent an hour tracking down a classic bug: hygiene problems
with non-hygienic macros (C preprocessor).  See the following
definition, call and expansion:

// definition
#define LST_ADD(head_lvalue, tail) {\
        typeof(head_lvalue) *p = &(head_lvalue);        \
        while(*p) { p = &((*p)->next); }                \
        *p = tail; }

// call
LST_ADD(m->param_head, p);

// expansion
{ 
     typeof(m->param_head) *p = &(m->param_head); 
     while(*p) { p = &((*p)->next); } 
     *p = p; 
};        

The introduced symbol "p" shadows the variable "p" in the calling
context.  This can be avoided by always binding all macro input names
to local variable names before introducing macro-local names, but it's
sufficient to bind those that might be shadowed by the introduced
binding, which depends on how they are used.

// new definition
#define LST_ADD(head_lvalue, tail) {\
        typeof(tail) _tail = tail; {\
        typeof(head_lvalue) *p = &(head_lvalue);        \
        while(*p) { p = &((*p)->next); }                \
        *p = _tail; }}


Entry: Automatic binary network protocol definition
Date: Thu Feb 26 15:09:23 CET 2009

The idea: given a collection of (somehow restricted) functions, create
server and client protocol pack/unpack routines.

The reason: I loose too much time doing this by hand.

The restrictions: obviously, the functionality should fit in a simple
RPC mechanism that sends one packet to a server, and expects a return
packet (see above).  The main difficulty is memory management, and
mapping C function calls to structs and back.


Entry: classes vs. prototypes
Date: Sun Mar 29 14:30:19 CEST 2009

Writing plain vanilla C code, one often wants some form of OOP
indirection.  Instead of storing references to classes in objects, it
is often simpler to get rid of this level of indirection and store
methods directly in objects.  This is called ``Prototype based OOP''.

When it is clear that some kind of class structure emerges,
"behaviour" objects can be constructed: objects which contain only a
collection of methods and are referenced by other objects.

http://en.wikipedia.org/wiki/Prototype-based_programming


Entry: Concurrency oriented programming in C + Scheme
Date: Sat Aug  1 21:58:50 CEST 2009

The context of this article is the use of Scheme as an extension
language in a predominantly C-based application.  It is a way to make
C programming easier by sticking to a subset that performs only
`hierarchical' data allocation on the C-stack (i.e. doesn't use
malloc() and free(), or any kind of explicit alloc/dealloc).  Let's
call this subset stack-C.  From my experience(1) I make the assumption
that stack-C is enough to solve many practical subproblems, given the
external scheduling and memory management can be solved.

The main idea is that while mixing ordinary C with (scripting)
languages that have a different memory and execution model can be
problematic (i.e. Scheme with GC'd CONS cells and a continuation-based
control mechanism), the stack-C approach provides a clean separation
between the C side and any time and space resource management provided
by the scripting language runtime.

The reason is that stack-C tasks can be:

   - trivially suspended to (contiguous!) memory
   - resumed multiple times (trivially forked)
   - garbage collected without finalization
   - separated completely from memory and control management
   - trivially parallelized

This is essentially declarative concurrency.  You shift the
object-oriented model (state machines) to tasks.

The idea is related to deforestation, where intermediate data
structures (which require knowledge of the memory model) are
eliminated by (static) scheduling of communicating tasks.  Concretely:
the stack-C code needs to only consume and produce atomic data, which
is easily wrapped, and never builds any (recursive) data collections.
When the need arises to build datastructures (intermediate
representations) this can be done in the scripting language, without
complicating the C code.

This allows core algorithms to be written in `functional C' while all
data management and code scheduling can be performed by the scripting
language, be it a GCd Scheme, or a linear concatenative language.
Memory and time management can then be `library provided'.  I wonder
if this is the key to a practical linear language.  The fact that such
`suspendable functional C' primitives could be used in two completely
different memory models is a powerful hint.

Footnotes.

  (1) The idea comes from an implementation of an EtherCat master
      driver written in C, in a style that does not use malloc() for
      data structure allocation, but instead allocates all memory on
      the activation stack.  This made me wonder if this programming
      style can't be generalized.  The idea popped up again trying to
      extend TinyScheme with `inverted enumerators' aimed at replacing
      list allocation with coroutine / generator style parameter
      linkage.

  (2) When stack-C is extended to include open/close or malloc/free
      style external resource management, the main challenge in this
      approach is finalization of tasks[2].  What we do is creating
      cursors by enumerator inversion, which doesn't guarantee the
      traversal is run to the end.  This might however be elegantly
      solved by always re-inverting such inverted enumerators back to
      enumerators in the scripting language.  Once could however argue
      that managed external resources are an implementation artifact
      and could be designed out.  

  (3) Alternatively, a linear approach (no cyclic data, refcount based
      management) makes this easier to manage.  However, it does
      require a finalize() method to the task.  Linearly managing the
      execution contexts themselves (making fork() expensive) might be
      not such a bad idea.


[1] http://okmij.org/ftp/Haskell/misc.html#fp-arrays-assembly
[2] http://okmij.org/ftp/Scheme/enumerators-callcc.html#Finalization
[3] http://okmij.org/ftp/papers/LL3-collections-enumerators.txt


Entry: Generating C code
Date: Sun Aug  9 12:52:37 CEST 2009

When generating boilerplate C code for a project, it seems better to
generate either full C files, or header files with defines and inline
functions.  Including bits and pieces of C code from files seems
error-prone.


Entry: Unlikely
Date: Sat Aug 15 11:36:41 CEST 2009

I've started to use the `unlikely' annotation.

#define unlikely(x) __builtin_expect((x),0)

if (unlikely(NULL == foo)) { ... }

Apart from catering to that insatiable hunger for fast code it is also
makes code more readable: on the first reading pass you can ignore
these checks to see the default control path.


Entry: CONS using alloca()
Date: Mon Aug 17 19:28:18 CEST 2009

I'd like to write a tokenizer / parser in C without using malloc().
Is it ok to use alloca() for this (i.e. one per character)?  I did
this before actually...

Hmm..  Apparently alloca() is no longer standard?  Something to do
with ANSI C's specification of compile-time knowable stack frame size.


Entry: mmap() tricks
Date: Mon Aug 17 19:34:43 CEST 2009

How to use mmap() for memory allocation tricks?  Say I want to
allocate a single 4K page into my address space, and free it later.
Can this be done using mmap?

It's possible to place a page at a fixed address using MAP_FIXED.


Entry: Dynamic scope and thread-local variables.
Date: Tue Aug 18 12:42:11 CEST 2009

Dynamic scoping is easy to implement using global variables and
save/restore combined with some `dynamic-wind' guard for longjmp() in
case that's used.

However, in the presence of pre-emptive threads, this isn't an
option.  So how to use thread-local storage?

[1] http://en.wikipedia.org/wiki/Thread-local_storage
[2] http://publib.boulder.ibm.com/infocenter/iseries/v6r1m0/index.jsp?topic=/rzahw/rzahwex1.htm


Entry: Will use mmap() instead of malloc() for allocation of major heap chunks.
Date: Mon Aug 24 09:54:03 CEST 2009

That's what shows up in the output of the ./configure of (Meta)OCaml.


Entry: Hiding circular dependencies using "temporary binding".
Date: Mon Jul 26 14:07:25 CEST 2010

If you stick to proper trees as datastructures for C programs, you can
avoid memory allocation problems.  

In such case each datastructure has exactly one owner/parent/... an
can in principle be _embedded_ in that larger structure.

If you need directed, acyclic graphs, you need reference counting.

If you need generic cyclic object reference graphs, you need
asynchronous GC.

However, in many cases it is possible to go back to DAG / tree rep by
providing a "temporary binding" object that associates 2 objects that
need to know about each other.

This requires a rule that methods cannot store pointers to other
objects, but they might be "temporarily bound" during the execution of
one method.  The binding object (the one that actually stores the two
pointers) is then responsible for the management of the two pointers.

This is actually dependency injection[1] combined with a `short-lived
clause': object only live during method call.

In the DAG case, another trick is to impose an order on the data
structure.  The simplest one is the call stack: if a DAG is built only
from local variables, simply returning functions will properly
cleanup.

FIXME: de-ramble

[1] http://en.wikipedia.org/wiki/Dependency_injection

Entry: Translating error codes
Date: Wed Jul 28 08:01:22 CEST 2010

I notices an interesting pattern: because error codes are not real
exceptions, they do not have a clear identity.  I.e. an error code
obtained from a different framework probably needs to be translated
before it is returned up the chain.

In C, it seems to be best to keep a single error namespace for all
errors in a single module to avoid multiple translations.


Entry: Error handling and "smart code"
Date: Thu Jul 29 10:46:03 CEST 2010

For very stateful object (i.e. file systems) that have many possible
local errors that could in principle be solved locally, it might
sometimes be better to separate the code into two layers:

  * A dumb layer that gives up at the first sign of trouble,
    cancelling the transaction (not modifying the state).

  * An error recovery layer at the API entry point that localizes
    retry/resolve to all conditions that are recoverable, restarting
    transactions from the start.

The basic idea is: don't _ever_ leave your object in an inconsistent
state between method calls.  This can sometimes conflict with "use
small method calls" and pushes you towards thinking very hard about
what a decently sized transaction does.  If a state transition
consists of a lot of small increments with intermediate
inconsistencies, make sure it can be aborted at all times.  In C this
can be done by keeping the updated state in local variables, and only
committing when everything went right.


Entry: Error handlers as goto labels
Date: Fri Jul 30 07:23:28 CEST 2010

What you see a lot in C code is a "goto error" sprinkled around code,
where the "error" points to code that performs some error recovery
before function return.  The problem with this is that you don't know
where the error condition came from when you set a debugger breakpoint
on the "error" label.

The solution is to place the handling code in a separate function.
Instead of "goto error" one would use:

        return  handle_error(err);

This way a breakpoint can be set on "handle_error" and from the
backtrace it is clear which branch of the code generated it.


Entry: An argument for NULL-terminated lists
Date: Wed Aug 11 19:15:31 CEST 2010

It's funny how a seemingly arbitrary choice keeps roaring its head.

If you represent an array in C or any other low-level language that
allows pointers, you have a choice between:

       - size + contiguous vector (SV)       i.e. Pascal string

       - contiguous vector + sentinel (V0)   i.e. C string

Up to now I've found that all other things being equal, the SV is
better because 1) you don't need out-of-band values and 2) you know
the size without traversal.

Granted, NULL usually isn't a problem as an out-of-band value, but it
can be problematic in other cases where the elements are not simply
memory addresses.

Knowledge of the total size can come in handy when the vector is being
accessed sequentially (i.e. during serial communication), and you need
to allocate temporary storage.  In the V0 case you need to traverse
twice: once to get the size, and once to copy the data.  This can be
expensive and in some cases even impossible, in which case you need to
start guessing the size and handle cases where you guessed to low.

However, today I've found a case where V0 is actually better: when you
want a compact representation of traversal stacks for tree data
structures(1).  

In the V0 case you just need to store a stack of element pointers, and
pop the stack whenever the sentinel is reached (i.e. the "return"
instruction).  In the SV case you need to store _two_ words per
recursion level: one current pointer, and one counter or pointer to
the end of the data structure(3).


(1) Sometimes you do want explicit traversal stacks: recursion in C
    can be memory-inefficient due to presence of arguments or local
    variables that are "global" to the data structure descent.

(2) Actually implementing a traversal on a SV-style tree it struck me
    that I could still use a single-word stack by _copying_ the
    elements to the stack all at once.  Of course this is not a real
    solution as it less efficient overall wrt. stack space: whole
    lists are stored instead of pointers into lists.

(3) The V0 vector case are CDR-coded linked lists, meaning you can
    easily "pop" such a data structure[1].

[1] entry://20081023-144321

Entry: Programming without malloc()
Date: Sun Aug 15 21:05:41 CEST 2010

( Related to PF and linear memory management. )

How come that I have not much trouble writing linear C code (not
using malloc) for embedded systems work?

Is it that embedded systems code is inherently "simple", or does it
have to do with component-oriented approach (usually embedded software
is really about the underlying hardware, which is necessarily finite).

[1] entry://../libprim/20100815-204639


Entry: Explicit recursion stack
Date: Mon Aug 16 07:45:09 CEST 2010

To eliminate recursive calls in C and replace them with an explicit
stack, use the following approach:

  1. Make the code self-recursive by unrolling mutually recursive
     calls.  This is necessary to provide a single "context" to the
     loop.  The stack will be in that context.

  2. Replace the recursive call with a CALL macro that pushes the
     current data structure pointer to the stack and jumps to the
     entry point.

  3. Replace the entry point to examine the top of the stack.
     

Entry: Stacks and Queues
Date: Tue Aug 17 09:19:20 CEST 2010

Another non-obvious obvious thing (NOOS).  Stacks and queues are
"dual" as they represent two ways of traversing a tree: depth-first
and breadth-first.

Now, is a breadth-first search necessarily less efficient than a
depth-first search?  It seems it needs more bookkeeping information.
Or is this a consequence of how I usually implement?


Entry: Explicit recursion and post-ops.
Date: Tue Aug 17 09:46:52 CEST 2010

To write a tree descent algorithm which performs an operation _after_
recursion in the tree, the recursion stack needs two types of
"continuations".  One for the child traversal, and one for the
post-op.

This is exactly the same problem as the representation of
continuations in a language interpreter.  The solution is to build a
stack which has elements that are unions instead of structs, and write
a dispatcher (pattern matcher) for the different component types.


Entry: Struct assigments
Date: Tue Aug 17 11:08:05 CEST 2010

Apparently memcpy() isn't really necessary for structs.

[1] http://stackoverflow.com/questions/324011/memcpy-vs-assignment-in-c
[2] http://groups.google.com/group/comp.lang.c.moderated/browse_thread/thread/17f9d2a82b1ee88f/31d43c3c27a478af?pli=1


Entry: Static allocation: stacks and queues.
Date: Wed Aug 18 18:34:27 CEST 2010

* In an embedded application, if a static structure is built at
  run-time, built up from many small fragments of memory, it might be
  better to allocate from a stack than to pollute the heap.  If the
  amount of memory needed is predictable, the stack size can even be
  fixed at compile time.  This requires a two-pass algorithm though.

* Instead of writing pointers to a work queue, it is also possible to
  write self-contained objects.  This avoids malloc() for the payload
  data.

Both are really the same pattern: ``catch'' memory allocations and
implement them differently based on knowledge of the memory lifetime.


Entry: Non-cyclic objects
Date: Thu Aug 19 18:47:17 CEST 2010

The free() operation is trivial for "linear" objects that do not
contain any links to dynamic resources: simply discard.

This is the: ``finalizers are evil'' anti-pattern.  Is this actually
useful?


Entry: Constant data in Flash
Date: Thu Aug 19 22:43:47 CEST 2010

How do you make sure constant data goes into Flash memory?  Is it
possible to generate constant data structures with cross references?

From [1]:

const volatile int *data __attribute__((section("FLASH")) = /* whatever */;


[1] http://stackoverflow.com/questions/1284619/gcc-c-arm-and-const-pointer-to-struct-field
[2] http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0474a/BABHIIEF.html

Entry: Constant Recursive Data Structures
Date: Mon Aug 23 18:36:41 CEST 2010

How to define constant, cyclic data structures in C?

Same as for functions!

The syntax for variable definitions and declarations is the same, it
is the order that matters.  Only the last occurance is a definition,
the rest are declarations.


//-------------------------------

#include <stdlib.h>

struct node;

struct node {
    const struct node **parents;
    const struct node **children;
};

const struct node a;
const struct node b;
const struct node c;
const struct node d;
const struct node e;

const struct node *a_p[] = {NULL};
const struct node *a_c[] = {&c, NULL};
const struct node a = {a_p, a_c};

const struct node *b_p[] = {NULL};
const struct node *b_c[] = {&c, NULL};
const struct node b = {b_p, b_c};

const struct node *c_p[] = {&a, &b, NULL};
const struct node *c_c[] = {&d, &e, NULL};
const struct node c = {c_p, c_c};

const struct node *d_p[] = {&c, NULL};
const struct node *d_c[] = {NULL};
const struct node d = {d_p, d_c};

const struct node *e_p[] = {&c, NULL};
const struct node *e_c[] = {NULL};
const struct node e = {e_p, e_c};


int main() { return 0; }


Entry: Book about embedded C
Date: Mon Aug 23 21:05:07 CEST 2010

I wonder if it would make sense to start writing abook about embedded
C.  Maybe it's best to start with cataloging the tricks I already
know.

TOPICS:

  * Memory allocation patterns
     - static memory
     - stacks and queues
     - pure trees and inline structs
     - DAGs and refcounting
     - garbage collection and Scheme
     - hiding circular deps (the "environment" pattern / S combinator)

  * Data structure traversal
     - callbacks with context
     - generic fold w. callback
     - generators
     - for MACROS to avoid context objects

  * Debugging
     - gdb as an application console


Entry: Gensym in C
Date: Wed Aug 25 14:21:32 CEST 2010

To make iteration macros for abstract data structures on top of the
`for' keyword, it helps to be able to define unique symbols to store
the iteration state object.  I.e.


#define X_COLLECTION_FOR_(_list, _x, _xl)             \
    struct x_list *_xl;                               \
    for(_xl = (*_list), _x = _xl ? _xl->x : NULL;     \
        _x;                                           \
        _xl = _xl->next, _x = _xl ? _xl->x : NULL)
#define X_COLLECTION_FOR(list, x) X_COLLECTION_FOR_(list, x, __xl__)

The `__xl__' symbol above is inserted into the macro invocation name
context as a local variable and might clash.  Is there a way to
properly solve this problem?

The following seems to work:

#define __GENSYM2(x,y) x##y
#define __GENSYM1(x,y) __GENSYM2(x,y)
#define GENSYM(x) __GENSYM1(x,__COUNTER__)

The indirection is about prescan[3]:

    "If an argument is stringified or concatenated, the prescan does
    not occur. If you want to expand a macro, then stringify or
    concatenate its expansion, you can do that by causing one macro to
    call another macro that does the stringification or concatenation."


[1] http://stackoverflow.com/questions/1132751/how-can-i-generate-unique-values-in-the-c-preprocessor
[2] http://gcc.gnu.org/onlinedocs/cpp/Common-Predefined-Macros.html#Common-Predefined-Macros
[3] http://gcc.gnu.org/onlinedocs/cpp/Argument-Prescan.html#Argument-Prescan


Entry: Resource Acquisition Is Initialization (RAII)
Date: Thu Aug 26 11:57:21 CEST 2010

Mixing resources and exceptions, or how to model external resources as
short-lived local (automatic) variables.

This tickles the idea again about ``resource-free programming''.  In
general, ``garbage collection is good but finalizers are bad''
[Steele].

[1] http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization
[2] http://calculist.blogspot.com/


Entry: Constant multidimensional arrays
Date: Mon Aug 30 10:48:25 CEST 2010

One of those things that has always puzzled me is how multidimensional
arrays work in C.  I've always avoided this and embedded multidim
arrays inside of 1-dim arrays using explicit coordinate mapping.

int temp[0x10][0x20];

In the definition of `temp`, is there an array of pointers involved?

    K&R: "In C, a two-dimentional array is really a one-dimensional
    array, each of whose elements is an array.".

So my confusion is mostly about the difference between an array and a
pointer.  These are not the same!  See the sizeof() macro.

The confuses arrises because an array is automatically converted to a
pointer whenever it is used in a pointer context.


Entry: Order of evaluation in CPP macros
Date: Tue Aug 31 11:19:00 CEST 2010

Is it possible to play with order of expansion in CPP?  Judging from
the GENSYM[1] macro it is.  How to do this systematically?

Should evaluation order matter?  For a purely functional language,
evaluation order only matters for termination issues.  So it's no
surprise that we run into evaluation order when an impure features is
used (the __COUNTER__ variable).

[1] entry://20100825-142132


Entry: Defining constant linked lists using CPP
Date: Tue Aug 31 13:37:45 CEST 2010

Is it possible to define constant linked lists from isolated macro
expansions?

I'd think it's possible using name concatenation, but it would still
require manual input of symbols, so it's no worse than giving an
explicit previous list item to link..


Entry: git diff
Date: Tue Aug 31 17:12:23 CEST 2010

Not really C related, but I only use git for C projects.

Problem: make a diff that can be applied by `patch' on a subtree of a
git repository.

It's probably simplest to use two braches and diff between the
branches.


Entry: C99 function names: __func__
Date: Wed Sep  1 13:59:19 CEST 2010

See GCC manual[1].  Apparently __func__ is part of the C99 standard,
and behaves as:

static const char __func__[] = "function-name";

__FUNCTION__ is a GCC extension and not portable.

[1] http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Function-Names.html


Entry: Non-invasive C code tracing
Date: Wed Sep  1 14:56:44 CEST 2010

Is it possible to attach code to C function entry and exit points in a
non-invasive way?

One (invasive) way is to wrap each function in an instrumentation
function.


Entry: The meaning of `const'  (.rodata section)
Date: Tue Sep  7 09:10:01 CEST 2010

One often hears that ``const is broken'' which means that it is
possible in C to get an unrestricted pointer to const data.

However, on a more pragmatic side, the `const' qualifier does tell the
compiler that the data is _intended_ to be constant, and as such it
can be put in a separate linker section `.rodata'.

This section can then be put in (p)ROM.  I.e. for ARM uC it can go
into program Flash.


Entry: Linker sections on AT91SAM7X512 (ARM7TDMI, 128k RAM, 512k Flash)
Date: Tue Sep  7 09:25:35 CEST 2010

The output of objdump (reformatted):

tom@one:~/$ objdump -h app.elf 

app.elf:     file format elf32-little

Sections:
Idx Name            Size      VMA       LMA       File off  Algn
  0 .debug_aranges  000021f8  00000000  00000000  00030e30  2**3  CONTENTS, READONLY, DEBUGGING
  1 .debug_pubnames 00006976  00000000  00000000  00033028  2**0  CONTENTS, READONLY, DEBUGGING
  2 .debug_info     0007fbd5  00000000  00000000  0003999e  2**0  CONTENTS, READONLY, DEBUGGING
  3 .debug_abbrev   00014342  00000000  00000000  000b9573  2**0  CONTENTS, READONLY, DEBUGGING
  4 .debug_line     00017cc3  00000000  00000000  000cd8b5  2**0  CONTENTS, READONLY, DEBUGGING
  5 .debug_frame    000062f4  00000000  00000000  000e5578  2**2  CONTENTS, READONLY, DEBUGGING
  6 .debug_str      0000f462  00000000  00000000  000eb86c  2**0  CONTENTS, READONLY, DEBUGGING
  7 .debug_loc      0001b6ee  00000000  00000000  000facce  2**0  CONTENTS, READONLY, DEBUGGING
  8 .rom_vectors    00000040  00100000  00100000  00008000  2**0  CONTENTS, ALLOC, LOAD, READONLY, CODE
  9 .text           0001e790  00100040  00100040  00008040  2**2  CONTENTS, ALLOC, LOAD, READONLY, CODE
 10 .rodata         00002580  0011e7d0  0011e7d0  000267d0  2**3  CONTENTS, ALLOC, LOAD, READONLY, DATA
 11 .got            00000228  00120d50  00120d50  00028d50  2**2  CONTENTS, ALLOC, LOAD, DATA
 12 .fixed_vectors  00000140  00200040  00200040  001163c0  2**5  CONTENTS, READONLY
 13 .data           00000cac  00200180  00120f78  00030180  2**3  CONTENTS, ALLOC, LOAD, CODE
 14 .bss            0001140c  00200e2c  00121c24  00030e2c  2**4  ALLOC
 15 .ARM.attributes 0000002c  00000000  00000000  00116500  2**0  CONTENTS, READONLY
 16 .debug_ranges   000073e0  00000000  00000000  00116530  2**3  CONTENTS, READONLY, DEBUGGING
 17 .comment        00001784  00000000  00000000  0011d910  2**0  CONTENTS, READONLY


VMA = virtual memory area
LMA = ???

00100000 Flash
00200000 RAM

00000000 Remappable Flash/RAM   

Some sections start at 0 but this seems to be a dummy value, i.e. for DEBUGGING.


Entry: Reinventing linked lists
Date: Tue Sep 14 09:35:46 CEST 2010

Yes, I know i deserve to be slapped when implementing (sorted) linked
lists in C, but really, sometimes it's just easier to just write it
down.

Trouble is, this is error prone.  I _think_ I can just write it down
but there's always some bug in the code that's not caught by the type
system.  Double pointers are notorious for this.  See the following:

    struct timeout_list **tol = &a->timeouts;
    while ((*tol) && ((*tol)->time < at)) { (*tol) = (*tol)->next; }

What I really wanted to write is:

    struct timeout_list **tol = &a->timeouts;
    while ((*tol) && ((*tol)->time < at)) { tol = &((*tol)->next); }

The latter iterates, while the former removes elements!


Entry: Emacs and parsing C
Date: Tue Sep 14 11:59:08 CEST 2010

There are some tools available, but they don't make the click to me
(i.e. CEDET).  The one that cought my attention is [1].

The problem is quite simple:

  * build a database from your source code.

  * query the database

  * allow "replace", taking part of a syntax tree and replacing it
    with something else.

  * integrate this with emacs

The most complex part seems to be the C parser.  I.e. to go from code
to (modifiable!) datastructure back to code.


[1] http://mike.struct.cn/blogs/entry/15/


Entry: Embedding arbitrary data in ELF using objcopy
Date: Sat Sep 18 15:13:33 CEST 2010

Good hint from [1]:

objcopy \
        -I binary -O elf32-i386 -B i386 \
        --rename-section .data=.rodata file.bin file.o

Default secion is .data which is here renamed to .rodata for embedding
of const data that goes into program Flash.

This creates 3 symbols:

SYMBOL TABLE:
00000000 l    d  .rodata	00000000 .rodata
00000000 g       .rodata	00000000 _binary_file_bin_start
00000007 g       .rodata	00000000 _binary_file_bin_end
00000007 g       *ABS*	        00000000 _binary_file_bin_size

To use different names, use something like:
    -–redefine-sym _binary_file_bin_start=_my_bin_data

[1] http://www.doof.me.uk/2010/05/07/cute-objcopy-hack/


Entry: Makefile error: multiple target patterns
Date: Fri Sep 17 15:18:20 CEST 2010

Check for multiple colons[1].

[1] http://stackoverflow.com/questions/2100448/multiple-target-patterns-makefile-error


Entry: Const correctness
Date: Thu Sep 23 10:13:38 CEST 2010

What we already know is that global variables declared "const" will go
into the .rodata section.

What about functions?  I'd like to hash the __FUNCTION__ variable into
an integer to get method tags.  Does the compiler inline this if the
hashing function is declared const?


[1] http://en.wikipedia.org/wiki/Const-correctness 
[2] http://stackoverflow.com/questions/212237/constants-and-compiler-optimization-in-c


Entry: Makefile patterns
Date: Fri Feb 25 16:21:30 EST 2011

Pattern targets.  The pitfall is to not make them too general.  How
are ambiguities handled?

What I like to know is how to insert a phony target into a real one.
I.e. some targets do not have a full dependency tree, only a recursive
make...

What I want to do is to build a better build system for ecos.

  - If the config file or any of the cdl files change, regenerate the
    whole tree.

  - For other changes, just perform recursive make.

Maybe it's not really worth it : keep current as is (only rebuild if
config changes) and do manual rebuild when working on eCos source
files.


Entry: Environment or makefile variables?
Date: Sat Mar  5 14:27:01 EST 2011

Basic idea: stick to one; don't use both.


Entry: Datastructures in Flash
Date: Sat Mar  5 23:35:58 EST 2011

Maybe I misunderstood the code, but I saw a segment in eCos where a
datastructure is defined and attributed to a certain linker segment.
The linker then collects all of these and places them next to each
other.

This seems like a nice way to build contiguous memory areas from
dispersed source code locations when using C as a compiler target
language.

[1] entry://20100918-151333
[2] http://../ecos/20110529-123810

Entry: GNU make : target specific variables
Date: Wed Mar  9 12:42:34 EST 2011


[1] http://www.gnu.org/software/make/manual/make.html#Target_002dspecific


Entry: Generators
Date: Thu Mar 17 11:53:12 EDT 2011

When you have list structures moving between layers of code, say
A->B->C, there is a choice to be made about the form of the data that
flows between the functions.  The simplest approach is to put a bunch
of data in a (large) buffer and pass it on to the next function.  An
alternative way is to pass only a single element at a time, and place
the A->B->C components inside a loop.

Taking some ideas from [1], and taking into account the limitations of
C, the optimal seems to be to provide a "fold" for each data
structure.  In C code I prefer to use the word "for".

The main advantage of such an approach is that it avoids intermediate
data structure storage such as large buffers to pass lists of elements
from one function to the next.  Using the "for" loop aproach instead
of the iterator object approach -- i.e. methods that implement first,
next, last -- has the added advantage that locking and other resource
management can be abstracted away from the callback function.


[1] http://okmij.org/ftp/Streams.html#enumerator-stream


Entry: Abstract data types vs. static RAM allocation
Date: Fri Apr  1 14:28:39 EDT 2011

There is a clear conflict of interest between

  1. C structures with static allocation
  2. Data abstraction

It would be handy to be able to know the size of a data structure at
compile time, but hide its layout such that data accessors can be
used.

I'm not so much interested in hiding reads, but writes really do need
special attention, as they can unknowingly introduce the worst kind of
(single-threaded) bug the data structure invariant violation.

I was thinking about a hack that would use the C "const" to protect a
struct from direct write access, but cast the const in the
implementation.  That might not interfere well with optimization
though..  However it might be so that the C compiler doesn't use the
"const" for optimization simply because const pointers can be recast.


I'd like to pose this question on Stack Overflow.  How to formulate?

Q: How to get at the size of a forwared declared struct?

A good practice when writing C code is to always use forward structure
declarations (incomplete types) in a header file, and keep the
implementation in a code file.

// in header
struct abc;

// in code
struct {
       int a;
       int b;
       int c;
}

When writing C code in an embedded development setting with a lot of
memory perssure, it is often a good idea to statically allocate data
structures whenever possible.  Doing so requires complete
types. i.e. a struct definition which has all fields accessible by the
user.

Is there an elegant way to resolve the conflict of these two
constraints?  Is it possible to provide struct size information at
compile time to make it possible to perform static allocation, but at
the same time hide the layout of the struct to prevent arbitrary
access?


Entry: C and packed bitfields
Date: Sun Apr  3 13:49:15 EDT 2011

Is this actually useful?


Entry: Producer/consumer in C in an embedded context
Date: Mon Apr  4 14:11:08 EDT 2011

Main problem in embedded is to limit RAM memory usage.  This makes
algorithms that use intermediate data structures prohibitive.

Usually there is plenty of ROM in comparison to RAM, so it is usually
possible to use use manual "deforestation".

What this means is to turn code into producers/consumers (threads and
streams).  In languages that support partial contiuations this is
straightforward.  How to do it in C?


Entry: "for" functions or "FOR" macros?
Date: Tue Apr  5 21:15:28 EDT 2011

The idea is to always use abstract collection traversal interfaces.
There are essentially two forms:

  - the STREAM: open, close, next, eof

  - the early-abort FOR loop, either as a macro or as a higher order
    function (callback + state).

Most people are familiar with the STREAM option, but the FOR function
has some advantages, such as centralizing resource allocation
(open/close are abstract).

In C, a "FOR" macro is more convenient since it can use its
surrounding lexical context without using tricks, but a "for" function
is more powerful though a bit clumsy to use for its need of specific
"context" structures (closures).


Entry: static inline: (pseudo-) separate interface from inplementation
Date: Thu Apr 28 12:28:44 EDT 2011

I tend to use "static inline" functions a lot for simple accessors to
allow the compiler to perform more aggressive optimization.  The
problem is that this always needs to combine interface and
implementation for two reasons:

  - data structures need to be fully specified.

  - function bodies are visible.

The former you can't do anything about: compiler needs to know the
data layout so it's technically possible for the API user to abuse
this.  The latter makes it hard to read headers.

It seems a good idea to use the following: use forward declarations
for data structures and inline functions and stash the implementation
of both in a _priv.h file that is not supposed to be looked at by
human eyes, but is there for the compiler:


// foo.h
struct foo_data;
static inlien int foo_data_size(struct foo_data *x);
#include "foo_priv.h"

// foo_priv.h
struct foo_data {
       int a, b, c;
};
static inline int foo_data_size(struct foo_data *x) {
       return sizeof(foo_data);
}

Another issue whether one allows struct members as:

struct highlevel {
       struct lowlevel_foo foo;
       struct lowlevel_bar bar;
}

or needs to abstract using pointers and possible constructors such as
lowlevel_foo_new() and lowlevel_bar_new() to instantiate the objects.

struct lowlevel {
       struct lowlevel_foo *foo;
       struct lowlevel_bar *bar;
}


Entry: GNU cflow
Date: Thu May 19 17:31:40 CEST 2011

Print a control flow graph (as an indented text file) from a bunch of C files.

Another tool is egypt[4].

Apparently doxygen can also use Graphviz to generate call graphs[3].

cflow2dot[5] to connect cflow ang Graphviz.

[1] http://www.gnu.org/software/cflow/
[2] http://en.wikipedia.org/wiki/Call_graph
[3] http://en.wikipedia.org/wiki/Doxygen
[4] http://www.gson.org/egypt/
[5] https://code.google.com/p/cflow2dot/
[6] http://sourceforge.net/projects/cflow2vcg/files/cflow2vcg/0.5/cflow2vcg-0.5.tar.gz/download


Entry: Link-Time Optimization (LTO) vs. static inline
Date: Sun May 22 18:27:51 CEST 2011

Does GCC perform LTO?  Seems the short answer is yes[1].  How to
enable it?  From [2]:

    To enable LTO, simply add the flag '-flto' to both compile and
    link commands.

[1] http://gcc.gnu.org/wiki/LinkTimeOptimization
[2] http://gcc.gnu.org/ml/gcc/2009-10/msg00060.html


Entry: Automatic casting and overflow
Date: Mon May 23 16:59:12 CEST 2011

Something I always do explicitly because I can't remember the rules:

char i = 255;
int  j = i + 1;

Does i get cast from char -> int before the addition, or after?  Where
is this made explicit?

Section 2.7 Type Conversions in K&R is quite explicit: 

  When an operator has operands of different types, they are converted
  to a common type according to a small number of rules. In general,
  the only automatic conversions are those that convert a ``narrower''
  operand into a ``wider'' one without losing information [...]


Entry: Compile time assert
Date: Mon Jun  6 11:30:01 CEST 2011

I need a compile-time assert macro that can go into proprietary code,
so can't use the one in Linux.

Good news: I did not look at any code yet, so can I reinvent it?  The
trick alledgedly is to use the macro arguments to construct data of
negative size.

So let's analyze this:  CT_ASSERT(x)

  - If the argument of the assert macro is zero, a compilation error
    is triggered.

  - To trigger a compilation error, construct a data structure with
    negative size.

  - A name is necessary for the data structure, so let's use a name
    generation macro (GENSYM[1]).

It turns out to be quite straightforward: using result values of
boolean expressions (after a comparison of the input to 0), a simple
negation can be used to map the failed assert to a negative array
size.

#define CTASSERT(x) static const char GENSYM(ctassert_)[-((x)==0)];

EDIT: There is even a more elegant way.  Instead of using a code
construct that has a run-time run-time remnant as a symbol part of the
binary if it's not optimized out, it's possible to use just a
completely ephemeral construct: i.e. a type.

#define CTASSERT(x) struct GENSYM(ctassert_) {char a[-((x)==0)]; }

EDIT: It's also possible to avoid size 0 arrays and use size -1 for
false and size +1 for true:

#define CT_ASSERT(x) struct GENSYM(_ctassert_){ char a[1 - 2*((x)==0)]; }


[1] entry://20100825-142132


Entry: Know your linker: how to avoid unused functions to end up in a binary.
Date: Sat Jun 11 18:01:52 CEST 2011

Use the CFLAGS

  -ffunction-sections -fdata-sections

which will put functions in their own sections.  Together with the
linker option

  -Wl,--gc-sections

unused sections (functions or data) will be removed.  The only
downside is that using separate sections requires more space due to
alignment.


Entry: binutils RTFM
Date: Sun Jun 12 15:59:20 CEST 2011

From [1]:

    You can see the symbols in an object file by using the nm program,
    or by using the objdump program with the `-t' option.


[1] http://www.delorie.com/gnu/docs/binutils/ld_8.html


Entry: What are __restore and __restore_rt ?
Date: Sat Jun  4 15:39:51 CEST 2011

I ran into these missing symbols in an eCos synthetic target build[1].
I have no idea where these functions should be defined, and what the
meaning is of the construct in the eCos code:

        .align 16
        .global cyg_hal_sys_restore_rt
cyg_hal_sys_restore_rt:
        movl    $SYS_rt_sigreturn, %eax
        int     $0x80
1:              
        .type __restore_rt,@function
        .size __restore_rt,1b - __restore_rt
                
        .align 8
        .global cyg_hal_sys_restore
cyg_hal_sys_restore:
        popl    %eax
        movl    $SYS_sigreturn, %eax
        int     $0x80
1:              
        .type __restore,@function
        .size __restore,1b - __restore


The build fails like this:

make[1]: Entering directory `/opt/xc/ecos/build/cvs/linux/hal/synth/i386linux/current'
gcc -c  -I/opt/xc/ecos/build/cvs/linux/install/include -I/opt/xc/ecos/src/cvs//packages/hal/synth/i386linux/current -I/opt/xc/ecos/src/cvs//packages/hal/synth/i386linux/current/src -I/opt/xc/ecos/src/cvs//packages/hal/synth/i386linux/current/tests -I. -I/opt/xc/ecos/src/cvs//packages/hal/synth/i386linux/current/src/ -finline-limit=7000 -Wall -Wpointer-arith -Wstrict-prototypes -Wundef  -Wno-write-strings -g -O2 -ffunction-sections -fdata-sections  -fno-exceptions -Wp,-MD,src/syscall-i386-linux-1.0.tmp -o src/hal_synth_i386linux_syscall-i386-linux-1.0.o /opt/xc/ecos/src/cvs//packages/hal/synth/i386linux/current/src/syscall-i386-linux-1.0.S
/tmp/ccIPQc1w.s: Assembler messages:
/tmp/ccIPQc1w.s: Error: .size expression for __restore_rt does not evaluate to a constant
/tmp/ccIPQc1w.s: Error: .size expression for __restore does not evaluate to a constant
make[1]: *** [src/syscall-i386-linux-1.0.o.d] Error 1
make[1]: Leaving directory `/opt/xc/ecos/build/cvs/linux/hal/synth/i386linux/current'
make: *** [build] Error 2
make: Leaving directory `/opt/xc/ecos/build/cvs/linux'

According to something I found in google code search[2], these
functions are trampolines.  It's a part of gdb code[3].  It says:

    ... as of version 2.1.2, the GNU C Library uses signal trampolines
    (named __restore and __restore_rt) that are identical to the ones
    used by the kernel.

Maybe that is something that changed?  I used objdump -T to find these
symbols in any of the libraries in /lib and /usr/lib and I didn't find
anything.  Maybe it's in libgcc or so?

So let's see what that code actually means.  The .type[4] directive
records the symbol table type for the associated symbol.  So it seems
that ".type" in

        .type __restore,@function
        .size __restore,1b - __restore

does not much more thanannotate the symbol "__restore": it does not
define it.  Same for ".size": it records the size associated to the
symbol.

Maybe what is meant here is to just refer to the "cyg_hal_sys_restore"
symbol, and not the "__restore" symbol?  I.e. the code probably has
changed at some point to incorportate a name change, but somebody
forgot to update that directive.  It's not necessary for the code to
work so it was only detected when binutils got stricter.

See fix next post.

[1] http://ecos.sourceware.org/ml/ecos-discuss/2011-06/msg00013.html
[2] http://www.google.com/codesearch?as_q=__restore_rt&btnG=Search+Code&hl=en&as_package=&as_lang=&as_filename=&as_class=&as_function=&as_license=&as_case=
[3] http://www.google.com/codesearch/p?hl=en#pFm0LxzAWvs/darwinsource/tarballs/other/gdb-203.tar.gz%7CYXbJTYT1R-s/gdb-203/src/gdb/i386-linux-tdep.c&q=__restore_rt
[4] http://tigcc.ticalc.org/doc/gnuasm.html#SEC133


Entry: Synthetic target fix
Date: Sat Jun  4 16:41:20 CEST 2011

After checking what ".type" and ".size" actually mean (they annotate a
symbol with type and size), it seems likely that what is meant here is
to just refer to the "cyg_hal_sys_restore" symbol, and not the
"__restore" symbol?  I.e. the code probably has changed at some point
to incorporate a name change, but somebody forgot to update that
directive.  It's not necessary for the code to work so it is only
detected now that binutils got stricter about it's input.

If this is the case, see the fix below

Cheers,
Tom


? synth_fix.patch
Index: syscall-i386-linux-1.0.S
===================================================================
RCS file: /cvs/ecos/ecos/packages/hal/synth/i386linux/current/src/syscall-i386-linux-1.0.S,v
retrieving revision 1.13
diff -u -8 -p -r1.13 syscall-i386-linux-1.0.S
--- syscall-i386-linux-1.0.S	23 Aug 2009 11:34:45 -0000	1.13
+++ syscall-i386-linux-1.0.S	4 Jun 2011 14:40:09 -0000
@@ -439,20 +439,20 @@ SYSCALL5(ipc)
 // via another system call.         
 
         .align 16
         .global cyg_hal_sys_restore_rt
 cyg_hal_sys_restore_rt:
         movl    $SYS_rt_sigreturn, %eax
         int     $0x80
 1:              
-        .type __restore_rt,@function
-        .size __restore_rt,1b - __restore_rt
+        .type cyg_hal_sys_restore_rt,@function
+        .size cyg_hal_sys_restore_rt,1b - cyg_hal_sys_restore_rt
                 
         .align 8
         .global cyg_hal_sys_restore
 cyg_hal_sys_restore:
         popl    %eax
         movl    $SYS_sigreturn, %eax
         int     $0x80
 1:              
-        .type __restore,@function
-        .size __restore,1b - __restore
+        .type cyg_hal_sys_restore,@function
+        .size cyg_hal_sys_restore,1b - cyg_hal_sys_restore


Entry: Are structs bad?
Date: Thu Jun 23 11:07:57 CEST 2011

The problem with C is not that structs are a bad interface, but that
it is hard to enforce structs to be immutable.

I.e. in essence there is not such a big difference between:

    struct foo {
        int a,b,c;
    };

and

    void foo (int a, int b, int c);

If any of the two interfaces changes, the code that uses it needs to
be changed anyway.

Of course, exposing _internal_ data structures that contain
implementation details that are irrelevant to the user is a a bad
idea.  I'm thinking mostly of structs as configuration files.


Entry: Be careful with typeof() for casts in macros
Date: Sat Jun 25 11:48:18 CEST 2011

Excerpt of a simple stack implementation.

  #define STACK_PUSH_SAFE(stack, thing, overflow) {       \
      void *room = stack_allot(stack);                    \
      if (!room) goto overflow;                           \
      *((typeof(thing)*)room) = thing;                    \
      }

The trouble with this is that the input argument of the macro is used
to determine the access.  So if the stack represents short ints (16
bits) and the type of the parameter at the call site is a 32bit int,
there is a problem:

  - data abort on ARM due to alignment problems

  - incorrect access (writes 2 values) on archs that support
    non-algined access

  
Entry: Locking and callbacks
Date: Sat Jun 25 13:19:14 CEST 2011

Is it possible (and sound?) to do the following:

  When a mutex is locked by the current thread, do not lock (causing
  deadlock) but throw an error.


This shows up in a case where I'm using callbacks inside a lock to
avoid manually having to manage resources.  However, the hierarchy of
the locks of the application is such that it isn't completely
impossible to avoid deadlocks: it's possible to call functions that
ultimately depend on a low level lock the callback has already
reserved, in this particular case a disk lock.

Probably this is just a sign that the cause of the inner deadlock
needs to be decoupled, i.e. by using a thread.  In this case, the
inner cause is a logger which might aquire a disk lock, however it is
possible to put the logger in a separate thread and use a buffer to
capture logging data until the disk log is released.

Is there a (practical) way to statically verify lock hierarchy
violations?  I.e. this[1] mentions a mechanism where lock hierarchy is
constructed (probably through control flow graph) and it triggers an
error when this graph contains loops, i.e. joins on the same lock.

It seems[2] that it is possible to do it at run-time.  That drdobbs
article[2] also mentions layer violation through callbacks, which is
exactly my problem.  I'm using callbacks to abstract locking, but this
then has the potential to violate hierarchies.


[1] http://www.osronline.com/ddkx/ddtools/dv_8pkj.htm
[2] http://drdobbs.com/high-performance-computing/204801163?pgno=1


Entry: emacs + cscope
Date: Mon Jul 11 20:43:11 CEST 2011

Minimal setup:

apt-get install cscope cscope-el  
cscope -bR -I<path> ...           # Index directory, set include path.

Emacs:

(require 'xcscope)
M-x cscope-set-initial-directory  # Set it to the file that contains cscope.out
M-x cscope-find-this-text-string  # C-c s s


Entry: GCC not catching undefined variable error
Date: Tue Jul 12 11:08:46 CEST 2011

With -Wall enabled, this undefined variable error is not caught:

  void foo(int arg) {
      int err;
      if (arg < 0) goto exit;
      err = 0;
  exit:
      return err;
  }

Might it be because of -O0, i.e. no data flow analysis performed?
Yep, absence of opti is the problem:

  cc1: warning: -Wuninitialized is not supported without -O


Entry: cflow & entry point
Date: Mon Jul 18 22:03:39 CEST 2011

Just passing a random file to cflow made me think first that all
functions are sorted, meaning that if a function is called by another
one, it is not taken as one of the main entry points.

However, it does seem to miss some things: my code contains static
functions that go into a struct which are completely ignored by cflow.

I tried "--include=s" to include static functions, but that doesn't
seem to work.

Hmm.. Something's wrong.  Removing all the "static" annotations does
make the functions show up.


Entry: Why buffer?
Date: Tue Jul 19 16:43:23 CEST 2011

The 7 whies down the rabbit hole answer: to avoid loosing time in
context swapping.

From this simplified view it seems straightforward to conclude that to
optimize for small memory usage, optimize context swapping.


Entry: CPP macro for counting the number of bits in a word.
Date: Sun Oct 30 15:00:38 EDT 2011

I don't know a way to write recursive macros; don't know how to
terminate the recursion and I don't know if recursion is supported in
the first place.  Though this is a simple manual unrolling:

#define NB_BITS_32(x) (NB_BITS_16(x) + NB_BITS_16(x >> 16)
#define NB_BITS_16(x) (NB_BITS_8(x) +  NB_BITS_8(x >> 8)
#define NB_BITS_8(x)  (NB_BITS_4(x) +  NB_BITS_4(x >> 4)
#define NB_BITS_4(x)  (NB_BITS_2(x) +  NB_BITS_2(x >> 2)
#define NB_BITS_2(x)  (NB_BITS_1(x) +  NB_BITS_1(x >> 1)
#define NB_BUTS_1(x)  (x & 1)
           

Entry: Counting in the preprocessor
Date: Mon Nov 14 11:08:52 EST 2011

It's probably possible to count in the preprocessor by allocating an
array, and using sizeof.  Then somehow get rid of the thing later..


Entry: MMX on AMD64
Date: Tue Nov 29 12:15:22 EST 2011

Does it not work?


Entry: Assigning pointers to zero
Date: Sat Dec 17 14:58:52 EST 2011

I ran into a bug today that's similar to

  void foo(int *a) {
      a = 0;
      ///
      a = 123;
  }

when I meant:

  void foo(int *a) {
      *a = 0;
      ///
      *a = 123;
  }

How to make this into an error without using leaky const arguments?


Entry: Getting over C++ hate (fear?)
Date: Sun Dec 18 09:00:38 EST 2011

In discussion with someone that doesn't have this problem, it's come
to my attention lately that I have an irrational attitude towards C++.
Sure, C is simple and straightforward, but in my style I rely a lot on
the CPP.  Maybe it's time to start collecting a couple of C++ features
that would be interesting to explore, with an eye on embedded
development, to avoid having to resort to text macros too much..

- Resource management through local objects. RAAI[1].  Even without
  exception support (like eCos) it's still useful to be able to return
  and have destructors called.

- Templates: building complicated const data structures at compile
  time to be able to store them in read-only Flash memory.

[1] http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization


Entry: Typed CPP
Date: Sun Dec 18 09:07:42 EST 2011

Is it possible to build some kind of type system on top of CPP?  Limit
use to construct that have some structure?  I.e. build a system of CPP
tools that can be compiled by standard CPP but also compiled by a
preprocessor with a type system, i.e. something that would fit C
syntax so I can feed it into Language.C ?

This decoupling gets you:

  * Compatibility with standard CPP, meaning a source tree can be
    compiled without the static analyzer and its dependencies present
    on the system, which is *really* important.

  * Full freedom in tool dependency.  Since analysis is only necessary
    when you're actually *changing* the code, the static analyzer can
    have a whole host of dependencies and special purpose features.


Is this just C++'s template system?


Entry: Obtaining pointer to current function
Date: Thu Dec 29 11:05:45 EST 2011

Is there a way to obtain a pointer to the current function's address
without knowing its name?  I.e.

  void foo(void) {
      void *me = &this_function;
  }

From [1] it seems that this isn't possible, except for doing it with a
function call like this:

  #include <stdio.h>

  void *get_addr(void) {
      return __builtin_return_address(0);
  }

  int main(void) {
      printf("%p\n", get_addr());
      return 0;
  }

This might be enough though.  The thing is that gdb knows how to map
this to a source location, given the binary compiled with debugging
symbols:

  # gdb <ELF>
  (gdb) list *<ADDRESS>


[1] http://stackoverflow.com/questions/2154852/get-a-pointer-to-the-current-function-in-c-gcc


Entry: Detecting re-entrancy
Date: Thu Dec 29 19:44:49 EST 2011

How to detect re-entrancy in a function, while allowing normal
multi-threaded access?

I don't see a way to do this without thread-local variables.


Entry: C re-appreciation day
Date: Sat Jun 23 15:03:08 EDT 2012


Entry: Overflow is undefined for signed ints
Date: Tue Jul 10 09:18:39 EDT 2012

While Two's complement arithmetic has little secrets for me, I was
surprised to learn that it's not so well-defined in C as I thought.

The reason seems to be that for unsigned ints, the behaviour is easily
expressed in terms of the C '%' modulo operator, as operations modulo
1^n where n is the bit size.

However, for signed ints this doesn't work because the same
interpretation (modulo a positive number) should not change the sign.

While there is definitely a "wrap around" for signed ints, it can not
be expressed (simply) in terms of the C modulo '%' operator, which is
probably the reason it is left unspecified.

When unsigned and signed are mixed, the result is unsigned.  That
makes sense, since it allows an accumulator to be unsigned, and
increments to the accumulator be unsigned or signed.

Conclusion: representing modulo arithmetic in C using overflow needs
to be done using unsigned ints.  It will "probably" work with signed
ints, but that relies on unspecified behavior.


Entry: C eval
Date: Fri Aug 17 16:43:27 EDT 2012

I need a simple way to convert a C string to .bin for data structures.
Something like:

int eval(const char *c_init, void *buf, int bufsize);

Which could evaluate "int a[] = {1,2,3};" to binary:
01 00 00 00 02 00 00 00 03 00 00 00

The script part of it seems relatively easy.  Run make and dump the
.bin to stdout.  How to do the spawn?


Entry: tty
Date: Thu Sep 20 00:02:42 CEST 2012

More a unix thing, but a very interesting article nontheless.  This
caught my attention:

  By default, fork(2) places a newly created child process in the same
  process group as its parent, so that e.g. a ^C from the keyboard
  will affect both parent and child. But the shell, as part of its
  session leader duties, creates a new process group every time it
  launches a pipeline.

I ran into trouble recently trying to start a bunch of processes from
a shell, but being surprised why they did not end up in one process
group.  The explanation makes perfect sense: a process group is best
associated to a single pipeline.  What I did was an "implicit"
pipeline: some processes work together sharing named pipes or files &
polling.

Is there a different way to force this?  I.e. make the shell put
processes in a single group?


[1] http://www.linusakesson.net/programming/tty/


Entry: Simpler gdb tracer
Date: Thu Sep 27 08:17:08 EDT 2012

Macro to:
- Call function.
- Tag -> return address.

How to find return address?


Entry: const * const
Date: Sun Oct 14 09:25:39 EDT 2012

0. const * const: the first "const" refers to the object, the second
   to the poiter variable.

1. You'd want to declare "const *a" in the function PROTOTYPE to show
   the caller that the contents of an object passed by reference is
   not modified.

2. In general, you want variables in a function that are not to be
   changed to be declared "const".

3. You'd want to also use "const * const a" in a function DEFINITION
   to make sure the implementation doesn't modify the value of the
   pointer itself.  However, this doesn't need to show up in the
   prototype!  (Since pointers are passed by value it is irrelevant to
   the caller.)  The prototype can just be "const *a".

( #3 is the new insight )


Entry: cscope
Date: Thu Nov 22 10:46:39 EST 2012

cscope -kbq

This will convert
IN: cscope.files
OUT: cscope.out cscope.in.out cscope.po.out

However, it seems to be missing things.

From manpage[1]

  -q Enable fast symbol lookup via an inverted index. This option
     causes cscope to create 2 more files (default names
     ``cscope.in.out'' and ``cscope.po.out'') in addition to the
     normal database. This allows a faster symbol search algorithm
     that provides noticeably faster lookup performance for large
     projects.

  -k ``Kernel Mode'', turns off the use of the default include dir
     (usually /usr/include) when building the database, since kernel
     source trees generally do not use it.

  -b Build the cross-reference only.

[1] http://cscope.sourceforge.net/cscope_man_page.html


Entry: Tagged list tricks
Date: Sat Dec  8 12:52:33 EST 2012

Use callbacks instead of lists:

typedef void (*zl_zwindow_parsed_event_fn)(void *context, int nb_args, ...);
void zl_xwindow_for_parsed_events(zl_xwindow *xwin, zl_zwindow_event_fn handle, void *context);

Use const char to "simulate" interned symbols (hashed strings) for
tags known at compile time, without actually having to implement
dynamic symbol hashing.

Abstracting lists of tags (i.e. a "type definition") in a macro can
help here:

#define ZL_DISPLAY_EV_LIST(EV)           \
    EV(keypress)                         \
    EV(keyrelease)                       \
    EV(keyrelease)                       \

/* Declaration */
#define ZL_DISPLAY_EV_DECL(name) extern const char *ZL_DISPLAY_EV_#name;
ZL_DISPLAY_EV_LIST(ZL_DISPLAY_EV_DECL)

/* Implementation */
#define ZL_DISPLAY_EV_IMPL(name) const char *zl_display_ev_#name ##name;
ZL_DISPLAY_EV_LIST(ZL_DISPLAY_EV_IMPL)

Some more info:

/*

To bridge 2 programming systems, the real hurdle is in
impedance-matching the data types and memory model.

In ZL this is done by keeping a very simple basic data type: C
variable argument lists where tags are reprented using

typedef const char *zl_tag;


A. List representation
----------------------

There are essentially 2 kinds of list in C that are easy to use inside
the language:

- Static data initializers.  This requires a union (optionally tagged)
  so has some syntactic overhead.  Works for global and local
  variables.

- Variable argument lists.  This is a feature supported by the C
  standard and is very convenient to use as there is no notational
  overhead.

Often converted data is short-lived, allowing data conversion to be
implemented with nested C calls.  Both are able to side-step malloc()
when they are combined with some kind of callback mechanism.


B. List tagging
---------------

Using const char * as a tag type allows 2 things to be married in a
very elegant way:

- Pointer-based comparison in C without memory allocation overhead
  (all string memory allocation is static).

- Easy conversion to run-time interned symbols (hashed strings) for
  dynamic languages.

*/

Entry: C trailing commas in function calls / definitions?
Date: Sun Dec  9 15:55:58 EST 2012

[1] http://stackoverflow.com/questions/2311864/history-of-trailing-comma-in-programming-language-grammars


Entry: GNU Make subdir stem?
Date: Tue Dec 25 13:57:37 EST 2012

I've been using rules like this for a while:
$(BUILD)/%.o: $(SRC)/%.o

which works fine even if % is a subdirectory.

However, the following seems to insist to take only the file basename
as the stem:

%.o: $(SRC)/%.c

for foo/bar.o in SRC=/baz/src this expands to:

foo/bar.o: foo/baz/src/bar.c


Entry: GNU Make rule recursion
Date: Wed Dec 26 12:05:19 EST 2012

I'm having trouble with parallel make.  As far as I can see, all my
deps are proper, but sometimes make exits with gcc linker stage
complaining:

    abc.o: No such file or directory


I don't see much in the debug output except this, which I don't
understand:

    Avoiding implicit rule recursion.

I'm still not sure what this is, but 2 things fixed it, splitting the original rule into to, i.e. from

$(BUILD)/%.elf: $(BUILD)/%.elf $(BUILD)/lib.a

to

$(BUILD)/%.x: $(BUILD)/lib.a
$(BUILD)/%.elf: $(BUILD)/%.elf $(BUILD)/%.x

and to change to static pattern rules[1].  I'm sticking with the latter.

[1] http://www.gnu.org/software/make/manual/html_node/Static-Pattern.html


Entry: Atomic pointer assignments
Date: Wed Jan  9 09:46:29 EST 2013

I need to build some code that run on on i386/amd64 and ARM.  Is it
safe to assume that pointer assigments are atomic?

It's probably best to not assume anything, and hide the access to
atomic structures behind a typedef and a 

[1] http://stackoverflow.com/questions/8919818/is-pointer-assignment-atomic-in-c


Entry: Linking .o -> .elf to make binary files
Date: Thu Jan 24 23:22:27 CET 2013

The idea is to make a .bin file with some data and
location-independent (-fPIC) code.

To make a .bin from an .elf use:

   objcopy -O binary --only-section=.text <in> <out>

this does not work on a regular "-c" object file because it can
contain undefined data references.  Linking a .o to .elf using:

   gcc -nostartfiles <in> <out>

does things like inserting .rodata constant references in code.  This
is is the .o before linking:

  29:	f3 44 0f 10 0d 00 00 	movss  0x0(%rip),%xmm9        # 32 <proc_dispatch+0x32>
  30:	00 00

and this is the fully linked .elf:

  4002a9:	f3 44 0f 10 0d aa 00 	movss  0xaa(%rip),%xmm9        # 40035c <proc_dispatch+0xdc>
  4002b0:	00 00 

To link properly, use something like the script below.  It places
everything in a single .text section, after which the objcopy command
above works as expected.


  MEMORY  {
      /* All addresses are file-relative. */
      file : ORIGIN = 0, LENGTH = 0x10000000
  }

  SECTIONS {
      . = 0;
      .text : {
          /* Header is first, the rest is only code and constants.
             The order doesn't matter as there are offsets stored 
             in the header. */
          *(.header*)
          *(.text*)
          *(.rodata*)
      } >file
  }


This places the .header section first, which allows things like:

  struct rai_header __attribute__((section(".header"))) PROC(info) = {
      .info = {
          .magic    = RAI_MAGIC,
          .entry    = (u64)PROC(loop),
          .nb_state = PROC_NB_STATE,
          .nb_in    = PROC_NB_IN,
          .nb_out   = PROC_NB_OUT,
      },
  };

Here PROC(loop) is a reference to a function, which shows up as a
file-relative offset because we start addressing at 0.  Note that the
integer .entry field needs to be able to contain a pointer: it is not
possible to truncate an address value to a shorter field at link time.


Entry: Universal header
Date: Tue Feb 12 14:24:29 CET 2013

For storing things in Flash over a non-error-free channel, I found
that there are 2 things important to make handling easier:
- protect individual headers with a small CRC
- don't make chunk sizes too large

struct uh {
    u8 crc;
    u8 size;
    u8 type;
    u8 _reserved_;
};

crc is necessary to verify the whole message
size is necessary to buffer the whole message and skip to the next one
type is necessary to know how to process

_reserved_ is an extra byte that can be used by one of the other
fields.


Entry: Python-style coroutines in C
Date: Sun Feb 17 19:44:33 CET 2013

One of the patterns that came up a lot in a project I worked on
recently is state machines that parse "finitely nested" data
structures.

What I mean is that in order to parse data structures with unlimited
nesting, one generally needs a "task", i.e. something with an
execution (nesting) stack, i.e. a recursive decent parser.

However, in practice, many data structures have a finite structure,
i.e. lists of lists of things.  Those do not need a stack, just a
(finite collection of) state variable(s).

( From the human-understanding p.o.v. it surprises me that I've never
  seen this clear enough until recently.  Maybe my bias is too much
  towards arbitrary nesting depths, i.e. programming langauges? )

So what's the problem?

Suppose you're writing a parser for a nested data structure that needs
to be suspended, i.e. the data is not available all at once, but only
in small chunks.

Traditionally, there are 2 ways to do this in C:

  - Use a thread.
  - Use an explicitly coded state machine.

As long as there is a possibility to use a blocking mechanism such as
a pipe, a thread might be the best solution.  However, it is a bit
overkill, since we really do not need a full stack

Granted, there is no such thing as a "full stack" in C : each thread
is always just a state machine because the stacks are always finite.
But the point is about program structure, i.e. what can we do when we
know exactly the nesting dept at compile time?

So, we'd like to use a state machine.

Trouble with state machines is that they are hard to code, because
they often look like "manually inverted, nested for loops".

However, this "inversion" can be done automatically.

This is what most "coroutine" libraries in C are about: clever tricks
to "jump into the body of a function".


So what is the transformation?

- Move all the state variables into an object (i.e. no local
  variables!  either static variables or an explicit context struct)

- Abstract the "blocking points", i.e. the part around the "read"
  functions as a sequence:
  - save state
  - return to "OS"

- The "OS" then performs the read, i.e. stores memory into a buffer,
  or performs an externally blocking operation, and calls the function
  again with a block point.


Something like this:

struct vars {
    /* Continuation */
    void *buf;
    void size;
    void **next;

    /* Program variable context */
    int i, i_max;
    int j, j_max;
};

void parse(struct vars *x) {
    if (x->next) goto *(x->next);
    x->buf  = &x->i_max;
    x->size = sizeof(x->imax);
    x->next = &&read_1;
    return;
  read_1:
    for (x->i = 0; x->i_max; x->i++) {
        for (x->j = 0; x->j_max; x->j++) {
        // ...
    }
}


The lucky thing is that the storage of the next state as a computed
goto is immediately followed by the label.  This means that no special
tricks are necessary to use the limited C preprocessor to generate the
following "call sequence"

          x->buf  = &buf;
          x->size = sizeof(buf);
          x->next = &&resume_point;
          return;
        resume_point:

The __COUNTER__ token can be used to generate the labels.

Ok, so I've implemented it and put it online[1].

First comment I get is: "Tricky, powerful & dangerous." :)

Some people really don't like this.  Preferring "stupid" code to make
maintainability easier.  Fair point.  Still a neat hack though, if
used with the appropriate caution.

[1] http://zwizwa.be/git/shaco


Entry: How to design binary data structures?
Date: Fri Mar  1 13:10:33 CET 2013

The main trouble is serialization of in-memory linked data structures.
However, there is a solution to this problem in wide use already:
application binaries.  The program that prepares those binaries is a
linker, often split in 2 phases: compile time and run time linking.

This can be re-used by letting a linker generate a binary with a "base
address" of 0.  The only measure this requires at load time is the
translation of all pointers to base-relative addressing.

It's probably even possible to include a relocation table in the
binary, so this doesn't need to be done manually.

EDIT: For my current case it seems simplest to perform the relocation
manually at load time.  The data structure traversal for linking could
be re-used to perform additional consistency checks.


Entry: OpenMP
Date: Thu May 23 19:36:25 EDT 2013

#include <stdio.h>
 
int main(void)
{
  #pragma omp parallel
    printf("Hello, world.\n");
  return 0;
}

$ gcc -fopenmp omp.c 
$ ./a.out 

Hello, world.
Hello, world.
Hello, world.
Hello, world.


[1] http://en.wikipedia.org/wiki/OpenMP


Entry: setting up cscope for code review
Date: Tue Jul 16 10:22:34 EDT 2013

- Make a directory ~/cscope

- Symlink all relevant project directors into ~/cscope


Entry: Implementing abstract state machines in software
Date: Mon Jul 22 14:11:22 EDT 2013

One of the practical challenges for implementing state machines in
software is time delays.  In software, FSMs are necessarily
event-based (either poll or interrupt).  I.e. computation is started
when the environment changes.

However, it is often hard to provide a strong time base.  Though this
is often required in practice: many digital protocols require time
delays.

How to abstract this properly?  I.e. how to write a state machine in
an abstract way such that both event routing and timing is done
in a convenient way?

It seems that a combination of two approaches would work:
  - A way for a FSM to request a delay to the host
  - A way for the FSM to inspect time passed (access to timer).

It doesn't seem worth the hassle to try to implement a "thread-like"
FSM api in C.  This task is best left to a generator.


This all fits into a more general idea of not using interrupts and/or
threads in application development to provide a more predictable task
switching semantics.

In this scheme, interrupts could still be used to drive simple
buffering in case DMA is not available, but use should probably only
be generalized to single producer-consumer structures.


Entry: Connecting state machines
Date: Thu Jul 25 17:41:39 EDT 2013

After a couple of years in RTOS land, I find myself writing
non-trivial code for a bare-bones uC.  Non-trivial in the sense that
the problem would normally be solved using a couple of communicating
tasks.  I am resorting to state machines, triggered essentially from
interrupts.

State machines are a pain to write from a C code pov, but they do
allow a more straightforward handling of atomicity.  It might actually
be worth the hassle.

Mutual exclusion would then in essence be handled by setting a single
priority interrupt routine.


Entry: Premature memory optimization
Date: Mon Aug 12 12:35:58 EDT 2013

It seems that in deeply embedded software design, it is possible to
save a lot of memory by replacing buffering with control flow.
Essentially there are two kinds of buffers:

- Limits set by hardware, e.g. USB endpoint buffer sizes

- Limits set by software, e.g. communication between different API
  layers and tasks.

Memory usage inefficiency comes from the second part.  It's almost
always possible to eliminate (deforest?) buffers and handle data
correctly.


Entry: Coroutines in C
Date: Tue Aug 20 18:28:31 EDT 2013

Interesting links from HN[4] article.  For later mining..

[1] http://www.embeddedrelated.com/showarticle/455.php
[2] http://fanf.livejournal.com/105413.html
[3] http://dotat.at/cgi/git/picoro.git
[4] https://news.ycombinator.com/item?id=6243946


Entry: Linux coding style in emacs
Date: Sat Dec  7 10:07:53 EST 2013

- M-x c-set-style linux
- tabs as tabs

[1] https://www.kernel.org/doc/Documentation/CodingStyle
[2] http://stackoverflow.com/questions/5476092/emacs-changing-c-coding-style


Entry: CPP defs
Date: Mon Dec  9 10:30:12 EST 2013

To check which machine-specific macros are defined, e.g. __i386__ or __x86_64__ :

cpp -dM /dev/null


Entry: gdb -i=mi and target-stream-output
Date: Mon Jan  6 12:13:00 EST 2014

I'm guessing a problem with gud in emacs is that the process
stdout/stderr is not tagged with target-stream-output '@'.


Entry: c++ rehash
Date: Fri Feb 28 18:24:17 CET 2014

why

  virtual ~foo() {}

and not

  virtual ~foo() = 0;

?


Entry: C++ allocators
Date: Sun Mar  9 12:21:29 CET 2014

I'd like to wrap a chunk of memory as a shared_ptr<vector<uint8_t> >,
and then manually assert that the refcount is 1 after pushing it
through some network.

template <typename T>
class slice: public std::allocator<T> {
public:
  T* allocate(size_t n, const void *hint=0) {
    LOG("allocate(%d,%p)\n", n, hint);
    return (T*)_data;
  }
  void deallocate(T* p, size_t n) {
    LOG("deallocate(%p,n)\n", p, n);
  }
  slice(U8 *data, U32 data_length) :
    _data(data),
    _data_length(data_length) { }
private:
  U8 *_data;
  U32 _data_length;
};


Entry: John Hughes on debugging imperative code with QuickCheck
Date: Fri Mar 28 22:08:18 EDT 2014

Some conclusions ([1] 18:00)

- The same property can find many bugs.

- Minimal failed tests make debugging easier.


The latter is nice and convenient, but the former is actually a really
interesting argument, especially for testing low-level code:
behavioral models have _less_ properties than full-blown quirky
implementations, so might test different code paths or sub-solutions
for differing inputs.

EDIT: The stateful testing is proprietary.  It would be a nice
challenge to find out how to do this.

[1] http://www.youtube.com/watch?v=zi0rHwfiX1Q&feature=youtu.be


Entry: Property based testing
Date: Sun Mar 30 08:33:06 EDT 2014

- Model the state
- Wrap each API call in a state transition function (post condition)
- Generate random sequences of API calls until failure
- Verify postcondition
- On failure, shrink test

The pre/post conditions seem straightforward.  How to shrink tests?

For an ordered list of N calls, how many ordered subsets are there?
How can they be organized in a tree such that they can be searched
more effectively?

Simplification: the assumption is that some calls are irrelevant.
I.e. we've found a way to trigger a bug and we just want to simplify
that particular way - not find a new way to trigger.  So a search
strategy is a tree of permutations, with pruning for cases that stop
failing.

I.e. when removing a particular call no longer fails the test, we
label it as essential and won't test the removal of any more calls
with that one call removed as well.  Note that it might be that there
are triggerable bugs in that subtree!  But that's not what we're
after.

So how to represent?

- A stateful test is a list of API calls together with their
  arguments.

- The test function is composed from the list of calls and the state
  transition model.


Entry: gdb source debugging in emacs broken
Date: Fri Aug 29 11:43:59 CEST 2014

Things break a lot these days..

So when I write a standalone app on zni or zoo, all works fine.

M-x gdb
gdb -i=mi app.el
break main

This shows the source file with current location set.

However in other apps I have trouble getting this to work.


Entry: Ivory
Date: Sat Sep  6 15:48:57 CEST 2014

Seems a little limited to very simple state machines and loops...

[1] https://www.youtube.com/watch?v=wC2tmo7l5Mc


Entry: Cross compile ARM debian
Date: Sun Sep  7 09:31:09 CEST 2014

Time to set up cross-compiler for beaglebone.


Entry: Boost spirit parser
Date: Mon Sep 29 11:44:38 CEST 2014

spirit[2] Recursive Decent parser generator
Spirit V2 grammars are fully attributed (see Attribute Grammar)[3]
- Qi: parser generator eDSL
- Lex: lexer
- Karma: output generator

phoenix[1] Phoenix enables Functional Programming (FP) in C++
e.b. bind[4]

_val is another Phoenix placeholder representing the rule's synthesized attribute[6]

eps is a multi-purpose parser that returns a zero length match


[1] http://www.boost.org/doc/libs/1_56_0/libs/phoenix/doc/html/index.html
[2] http://www.boost.org/doc/libs/1_56_0/libs/spirit/doc/html/index.html
[3] http://en.wikipedia.org/wiki/Attribute_grammar
[4] http://www.boost.org/doc/libs/1_56_0/libs/bind/bind.html
[5] http://www.boost.org/doc/libs/1_56_0/libs/spirit/doc/html/spirit/qi/reference/parse_api/iterator_api.html
[6] http://www.boost.org/doc/libs/1_56_0/libs/spirit/doc/html/spirit/qi/tutorials/roman_numerals.html


Entry: Keil C
Date: Thu Oct 30 10:55:25 EDT 2014

What is the Listings/ directory?
Seems to be generated at compile time.


Entry: Quickcheck erlang eqc_fsm
Date: Fri Oct 31 20:33:23 EDT 2014

For testing C libraries.


Entry: State machine or buffers?
Date: Thu Nov  6 16:33:40 EST 2014

I'm inclined to say state machine.  Buffering takes away the control
problem of having to define "suspend points", replacing them by
straight line imperative code.  However, it only works when
segmentation is possible.


Entry: IO state machines
Date: Sat Nov  8 14:12:57 EST 2014

These things are not trivial.  Currently writing one for a gdb stub.
Seems that a lot can be generalized however.  Something can be learned
here..

Problem:
- stream-oriented protocols
- memory constraints

In general, such a problem is best solved by a coroutine / task
approach, where each independent element in the stream processing is a
separate task.

The problem is "suspending".  Some approaches ordered in increasing
suspend granularity:

- pre-emptive multitasking: suspend at machine level
- cooperative multitasking: suspend at a source-level coherent point
- state machines: suspend

The first two use stacks, the latter one uses explicit suspend points.
Abstacting away suspend points trades in predictability for ease of
use.

When working with explicit suspend (state machines), there is a
trade-off between control complexity and buffer memory usage.
Depending on the kind of protocol, there is usually an upper limit to
the size of a buffer that contains the "largest" protocol element in
such a way that there is no space needed to store detailed
intermediate processing.


Take home argument:
- Split design abstractly into cooperating coroutines
- Add buffering to reduce suspend problems
- Granularity is a trade-off between worst-case processing time,
  memory usage, code complexity due to fsm-ification.


Entry: read(2) / write(2) on pipes
Date: Wed Jan  7 18:55:18 EST 2015

If I do a write(2) on a pipe, is it guaranteed to be an atomic read(2)?


Entry: Macro name concatenation
Date: Thu Jan 15 23:10:50 EST 2015

There's this weird thing about macro expansion order combined with
symbol concatenation.  Forgot how it went and how to work around it.

[1] http://stackoverflow.com/questions/7045358/how-to-cause-macro-expansion-before-concatenation


Entry: union tagging
Date: Sun Feb  8 15:21:28 EST 2015

/* Note that "static inline" doesn't necessarily inline.  Adding this
   attribute will force GCC to inline the function even if
   optimization is off. */
#define INLINE static inline __attribute__((__always_inline__))

/* Union tagging.  libopencm3 uses untyped uint32_t for almost all
 * peripherals.  This is error prone in the abstractions uses below,
 * so the code here wraps identifiers in a union.  The wrapper is
 * optimized away leaving no run-time overhead. */
#define HW_WRAP_UNION(name, type)         \
    union name { type id; };              \
    INLINE union name name(type id) {     \
        union name u = {.id = id};        \
        return u;                         \
    }
//            wrapper | original libopencm3 type
// -------------------+------------------------
HW_WRAP_UNION(hw_rcc,   enum rcc_periph_clken)
HW_WRAP_UNION(hw_tim,   uint32_t)
HW_WRAP_UNION(hw_gpio,  uint32_t)


Entry: State machines with simple for loops
Date: Sat May  9 02:25:24 CEST 2015

In C without too much ado.

Basic idea is to capture all the local variables in a struct, and have
entry points for all the "wait" statements.

These can be implemented as lables.

wait(COND) then boils down to:


label: if (!(COND)) { x->next = &label; return; }


That's just one macro:

#define WAIT(label,condition) \
  label: if (!(condition)) { state->next = &&label; return 0; }

Where we're assuming local context:
  - variable name "state"
  - field name "next"
  - 0 means task is blocked (other returns can have error conditions).

Use computed goto (addresses) but if that's not available, use a case
statement.

EDIT: going for it.  Also implementing sub-machine call (like
subroutine, but executing an encapsulated state machine until halt).


Entry: Circular counters
Date: Tue May 12 19:34:48 CEST 2015

How to get a circular counter that also has a count for the number of
wraparounds?

Say we have depth d, but we actually want to have w wraps as well,
giving w*d.  Where w is as large as possible.


Entry: poll vs. select
Date: Fri May 15 18:17:29 CEST 2015

[1] http://daniel.haxx.se/docs/poll-vs-select.html


Entry: mem_write_lazy
Date: Sat Jun  6 12:22:21 CEST 2015

struct mem_write_lazy {
    uint32_t addr;
    uint32_t n;
    union {
        uint8_t byte[4];
        uint32_t word;
    } data;
};

static void mem_write_lazy_init(struct mem_write_lazy *m, uint32_t addr) {
    m->addr = addr;
    m->n = 0;
}
static void mem_write_lazy_flush(struct mem_write_lazy *m) {
    if (((m->addr & 3) == 3) && (m->n == 4)) {
        mem_write32(m->addr, m->data->word);
    }
    else {
        for(int i=0; i<m->n; i++) {
            mem_write32(m->addr+i, m->data->buf[i]);
        }
    }
    m->addr += m->n;
    m->n = 0;
}
static void mem_write_lazy_byte(struct mem_write_lazy *m, uint32_t val) {
    ...
}


Entry: trie encoding
Date: Fri Jun 19 19:31:56 EDT 2015

Tried a naive representation of a trie, and it seems to be very
space-inefficient.

How to encode the trie in a single string?

(char,offset)


(nb_entries, (char,offset))

if char == '#', offset indicates offset in payload table
else it is the number of characters to jump down the string

The constraints:
 - space between entries <256
 - nb entries <256

If this becomes a problem, easily solved by going to 16 bit encoding.


Entry: Semaphores on Cortex M
Date: Sat Jul 11 15:08:52 EDT 2015

I'm currenly using an approach of:
- no RTOS
- main event loop, polling state machines
- ISRs interacting with FIFOs to bridge to main loop

It's tricky to get the FIFOs right.

Currently I rely on atomic read/write, but in some cases it would help
to have real semaphores.  How to implement on Cortex M?


Entry: mmap shared
Date: Sat Feb 13 21:40:17 EST 2016

http://stackoverflow.com/questions/16032396/how-to-get-mmaped-memory-to-sync-to-the-file


Entry: Debian multiarch linker flags
Date: Fri Feb 26 17:05:42 EST 2016

I want to build a 32bit intel binary on a 64bit system.
I solved this problem before...

Excerpt, with multiarch:

ifeq ($(shell uname -m), x86_64)
## Build 32-bit binaries (emu_portcon.elf needs 32bit pointers)
GCC_DIR             := /usr/lib/gcc/i586-linux-gnu/4.9/
USR_LIB_DIR         := /usr/lib/i386-linux-gnu/
LIB_DIR             := /lib/i386-linux-gnu/
PLATFORM_CFLAGS     := -m32 -fPIC
PLATFORM_LDFLAGS    := -m32 -L$(GCC_DIR) -L$(LIB_DIR) -L$(USR_LIB_DIR) -nostdlib
PLATFORM_STARTFILES := $(USR_LIB_DIR)/crt1.o $(USR_LIB_DIR)/libc.a -lgcc_s
else
PLATFORM_CFLAGS     := -fPIC
endif


This produces binaries that crash in exit() on tp.
Let's do it the right way...

Ok, likely not the right gcc version.
Nope.  2 hosts do not have this, but tp does.  Let's upgrade.

This upgrade fixed it:

ii  libc6-i686:i386                   2.19-18+deb8u2
ii  libc6-i686:i386                   2.21-9


Entry: small memory software
Date: Sun Mar 13 22:48:28 EDT 2016

http://www.smallmemory.com/book.html


Entry: uC program core
Date: Sun Apr 24 15:49:21 EDT 2016

- "recursive" finite state machines = tasks with finite and static
  activation records / recursion depth.

- hardware drivers = glue code to abstract state machines.

- glue code.

Components aren't hard.  The problem is the glue code, essentially the
abstraction for communication between tasks.  I.e the OS.

Communication (almost?) exclusively takes the form of writes to a
memory location or larger buffer / queue, and synchronization to an
external event in the form of an actual event (interrupt or function
call), or a logical condition change that needs to be polled.

The sequencer then orchestrates resumption of execution, possibly as
simple as a round-robin poll.

True boundless recursion is (almost?) never needed.  Activation
records can be allocated in advance for all tasks in a single nested
struct/union.  Mutually exclusive activation records (successive
function calls) can use a C union.


What about designing a language with those restrictions?  Building
applications then boils down to building state machines (finite tasks)
that cause events and wait for events to happen.  CSP or Actor model.


Entry: Two abstractions
Date: Sun Apr 24 16:01:36 EDT 2016

DE: event propagation with a concept of simultaneity.

SM: finite-depth nested tasks (state machines) as C macros.

Combination might prove useful.


There were some problems, but I forgot what exactly.  Something to do
with the need to schedule an update "at the next time instance",
i.e. queing side-effecting actions based on pure computations.


Entry: Building a malloc-less, static Erlang
Date: Sun Apr 24 16:21:59 EDT 2016

Essentially, hardware.  The idea being that designing real-time
applications is best done as state machines, not as tasks.  Leave the
hardware/software separation to a configuration stage: implement tasks
that handle high event rates in hardware, implement tasks that change
infrequently or don't need fast response times in software.

Features:
Processes are static: call tree is known at compile time.
Processes have infinite life span.
Language is immutable, implemented through mutation.

Is it possible, by static analysis of message passing, to restrict the
size of the message queues?


Entry: Inline functions and instantiation (fast abstract code).
Date: Sun May  8 23:27:35 EDT 2016

The pattern is this:

- generic:
  struct config { int param; }
  static inline int fn(struct config c, int arg) { return c.param + arg; }

- specific instantiation:
  static const struct config c0 = { .param = 123 };
  int fn0(int arg) { fn(c0, arg); }


It seems that most compilers know how to fold the constants even if
it's a big struct.  This makes it useful for deeply embedded code
where you want to avoid the abstraction overhead.  This can be
significant for peripheral access that in most cases simplifies to one
or two inline bus accesses, as opposed to a function that inspects the
struct at run time).

A way to use this is to put all instantiations in a main.c file, and
have library code refer to instantiated versions of the generic inline
code.

Note that pointers work equally well as long as the struct referred to
is const.  If it is not, it is possible to externally modifiy it so
compiler can't fold it.  If struct is also static, no remnant will be
left in the file.

It might be better practice to use pointers.  In case of non-optimized
code this would be not horribily inefficient (call by value).


Entry: Supporting multiple boards with optimized HW functions
Date: Wed May 11 12:15:08 EDT 2016

I've found this hard to do in C.  Situation: a number of boards with
different hardware configurations (e.g pins, peripherals) but in need
of fast HW access.

For fast HW access, the solution is to use inline functions, possibly
parameterized by static const structures.  I.e. make sure the compiler
has access to the config for constant foldig, and optionally doesn't
need to include the config structs themselves in the image.

However, this still creates a problem of instantiation.  The idea is
then to instantiate functions in the main .c file, one for each board.
These main .c files provide an iterface to the library code that is
shared across the boards.  The HAL thus consists of plain C functions
defined in main .c, used by library code.

To further separate this I found it useful to define a set of
inline.*.c files that are not compiled to .o and gathered in in the
libary, but are inlined in the main file through #include preprocessor
directives.  These files expect symbols or macros defined outside of
it.

It's possible to have it both ways:
- create .c files that reference a C object for config
  - either create this object as non-static, in which the config is dynamic 
  - or create it as a const static object and inline the .c in main .c to give static config

Depending on whether the code describes a singleton or a function that
is used multiple times it can reference a global symbol or a function
parameter for the configuration.


Entry: Low level debugging
Date: Fri May 13 17:41:56 EDT 2016

Today another one of those days..  In general I avoid low level
programming - i.e. the kind that has no "feedback", where you have to
be really clear about all the assumptions that need to be checked
without there being anything saying: hey, this contract is broken.

I ended up at a point where I don't even have a printf.


Entry: multilib
Date: Sun Jun  5 13:53:27 EDT 2016

gcc -m32 needs "gcc-multilib" package which removes "libc6-i686:i386".

Some documentation here:

https://wiki.debian.org/Multiarch/LibraryPathOverview

Main question: what is the link between gcc-multilib and multiarch?
- Multilib is a gcc feature to support multiple ABIs on one ISA.
- Multiarch is to support multiple architectures on one debian system.


Entry: rust 1.8 embedded
Date: Tue Jun  7 13:33:47 EDT 2016

https://spin.atomicobject.com/2016/05/25/rust-1-8-embedded-firmware/


Entry: gdb reverse debugging
Date: Fri Jul 15 12:07:35 EDT 2016

https://www.youtube.com/watch?v=PorfLSr3DDI
https://www.gnu.org/software/gdb/news/reversible.html

record
reverse-stepi
break/watch
reverse-continue


Entry: cross compile debian arm
Date: Sat Aug 13 00:52:22 EDT 2016

Easy these days:
dpkg --add-architecture armhf
apt-get update
apt-get install libncurses-dev:armhf gcc-arm-linux-gnueabihf

With qemu-user an binfmt-support installed it can also execute the
resulting binaries.


Entry: low level dev tools
Date: Tue Sep 27 17:47:32 EDT 2016

https://blog.fogcreek.com/dev-life-interview-with-casey-muratori/

  There are no good dev tools out there today. That much is abundantly
  clear, and it is a topic on which I find almost all my systems-level
  programming friends agree, so I think it’s becoming a consensus
  among people who actually care about low-level code.  Nobody is
  making dev tools for us anymore, and it shows.

Also mentions "compression oriented programming":

https://mollyrocket.com/casey/stream_0019.html


Entry: const pointers
Date: Sat May 25 11:00:32 EDT 2019

A pointer to a const pointer.

Yeah I dont remember how this works.

Use case is this: I want to get a const *var_t by having a function
set a pointer.

typedef struct {
   int val;        
} var_t;

int get_var(var_t **ppvar) {
}

const var_t *pvar;
get_var(&pvar);


How to interleave the const stuff?


Entry: C type classes
Date: Tue Apr 28 21:41:45 EDT 2020

// Define the data structure abstractly as macro that invokes its
// argument on every field

#define my_struct_fields(def_field) \
        def_field(name1, type1) \
        def_field(name2, type2)

// We can then plug this into a number of "definers".  E.g. one that
// instantiates the struct:

#define DEF_STRUCT_FIELD(name,type) type name;
#define DEF_STRUCT(name) struct name { name##_fields(DEF_STRUCT_FIELD) };

DEF_STRUCT(my_struct)

// And others variants of the type template, e.g. serializer functions.
#define DEF_WRITER_WRITE(name,type) pbuf_write_##type(p,s->name);
#define DEF_WRITER(name)                                \
    static inline void pbuf_write_##name(               \
        struct pbuf *p,                                 \
        const struct name *s) {                         \
        name##_fields(DEF_WRITER_WRITE)                 \
    }
DEF_WRITER(my_struct)

// Which creates write functions that take a struct defined by the //
// previous macro construct, and create a writer function for that
// struct.  Here the buffer abstraction from uc_tools is used (pbuf).


Entry: GCC asm syntax
Date: Mon May 11 09:11:25 EDT 2020

TL;DR

output section:
=r  write to register
+r  read and write to register

input section:
r   read from register


An example:

struct channel {
    uint32_t setpoint;
    uint32_t accu;
};

INLINE void channel_update(
    struct channel *channel,
    uint32_t *shiftreg) {

    __asm__ (
        "   adds %0, %0, %2  \n"   // update accu, update carry
        "   adc  %1, %1, %1  \n"   // shift carry flag into LSB
        : "+r"(channel->accu),     // %0 read/write
          "+r"(*shiftreg)          // %1 read/write
        :  "r"(channel->setpoint)  // %2 read
        : );
}


Entry: bin to elf
Date: Sat May 16 10:14:24 EDT 2020

arm-none-eabi-objcopy \
    --input-target=binary \
    --output-target=elf32-littlearm \
    17.10.03.bin 17.10.03.elf

tom@panda:~$ arm-none-eabi-objdump -x 17.10.03.elf 

17.10.03.elf:     file format elf32-little
17.10.03.elf
architecture: UNKNOWN!, flags 0x00000010:
HAS_SYMS
start address 0x00000000

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .data         00004cc5  00000000  00000000  00000034  2**0
                  CONTENTS, ALLOC, LOAD, DATA
SYMBOL TABLE:
00000000 l    d  .data	00000000 .data
00000000 g       .data	00000000 _binary_17_10_03_bin_start
00004cc5 g       .data	00000000 _binary_17_10_03_bin_end
00004cc5 g       *ABS*	00000000 _binary_17_10_03_bin_size


Can this still be relocated?