C and Unix programming. Entry: NULL terminated arrays of pointers Date: Thu Oct 23 14:43:21 CEST 2008 == a cdr coded lisp list. These work better than arrays + length. The only problem is that allocating temporary data structures on the run-time stack requires object size. But.. maybe alloca works here? Entry: High level programming in C: Data Structures Date: Wed Oct 29 10:14:30 CET 2008 [ This should go in a separate CS.txt log. ] What I miss most in C are closures and directed acyclic datatypes. This made me think a bit about different classes of data structures in a C program. In a dynamic language with GC these all tend to blend. 1. Linear types (pure trees): every object has a single reference, which could be implemented as embedded structs (C struct inheritance) or embedded struct pointers (C struct delegation). 2. Directed acyclic graphs: objects can have multiple references, but there are no loops. Reference counting works here. 3. General graphs: If there are circular references, reference counts won't work: you need to build "intelligent" constructors / destructors for the whole graph. Entry: C Date: Mon Nov 10 12:11:11 CET 2008 As long as everything is static, C isn't so bad. But anything that involves complicated datastructures is a pain.. The first thing to standardize on is a list implementation. Looks like this is done the right way in the Linux kernel. Entry: RPC and ad-hoc protocols Date: Mon Nov 10 17:34:00 CET 2008 This is about different ways of looking at procedure calls, mostly from an object-oriented pov (modify object in-place) vs. functional programming view (share + copy). * In the OBJECT model, the client passes a data object to the server, which is modified in place and returned back to the client. (I.e. Unix IOCTL). The advantage here is that server is relieved from memory management. In the setting of FP, linear memory management can be used. * In the FUNCTION model, there is no mutation of shared data between client and server. The client passes a copy of the necessary data to the server over a message channel, and the server returns a reply over a different channel. The problem I ran into is having to implement an RPC interface over a unidirectional message protocol. This happens when what you want is really a shared library of data objects and function calls, but it needs to provide control operations behind the scenes (i.e. it has an event loop that's not triggered by the clients). An interesting road is to investigate what is necessary to write an RPC generator: eliminate the red tape by hiding the message protocol from the user, and expose only procedure calls on both client and server side, where client calls, and server gets called. The problem with automating RPC generation is data-sharing: often in C, a function will allocate or mutate an object, and thus creating a side-channel. So it looks like this abstraction is going to be somewhat leaky, since it needs to be purely functional (copy instead of mutation). Entry: Protocol definition Date: Tue Nov 11 13:26:19 CET 2008 1. Make sure code never depends on the protocol: all encoding should be done automatically. 2. Provide automatic documentation for wire debugging. (Or better still, a sniffer that produces human-readable data.) Entry: packet filters vs. aspect oriented programming Date: Wed Nov 12 12:29:53 CET 2008 I'm working on an RPC mechanism over unidirectional Xenomai message queues, and it struck me that a packet filter is actually a pointcut, a set of join points matching a specification: given a stream of RPC messages, let some through but modify others. Entry: CSP style channels on Xenomai Date: Wed Nov 12 16:13:23 CET 2008 I'm trying to use queues for something they are probably not useful for. What I really want is CSP style channels. There are two problems: * A select() is necessary to wait on multiple events. * Tasks might be too expensive for fine task granularity. This describes the CT library from UTwente: http://www.ce.utwente.nl/rtweb/publications/2007/pdf-files/102CE2007_WMC07_pdf.pdf Q: What is the difference between waiting on a set of messages from a single "socket", and waiting on a set of ports? Can such descriptions be automatically transcribed? Anyways, the event flag group services provide the necessary machinery: An event flag group is a synchronization object represented by a long-word structure; every available bit in such word can be used to map a user-defined event flag. When a flag is set, the associated event is said to have occurred. Xenomai tasks and interrupt handlers can use event flags to signal the occurrence of events to other tasks; those tasks can either wait for the events to occur in a conjunctive manner (all awaited events must have occurred to wake up), or in a disjunctive way (at least one of the awaited events must have occurred to wake up). Entry: Daemon vs. Shared library Date: Tue Nov 18 11:16:06 CET 2008 Since a shared object can be used to share static data, as long as processes need to communicate on the same machine and need RPC semantics (clients are single thread, server uses synchronized access), this is probably enough. However, this does needs shared memory for the shared data, and semaphores to protect them. The API can abstract this. Entry: Implementing CSP in a single thread. Date: Tue Nov 18 16:17:20 CET 2008 Apparently, "CSP semantics" for conditions are inefficient to implement. One usually chooses for "Mesa semantics", where setting a condition does not mean a yield, which means that the condition can change before signalled tasks are effectively woken up. The remedy is to always check the condition after wake-up (loop before you leap!) http://www.cs.duke.edu/courses/spring01/cps110/slides/sem/sld005.htm But as far as I can see CSP semantics are only inefficient if task switches are expensive. The idea is probably that task switches can be limited if a single thread sets more than one condition in a row. So, when task switches are cheap (stack machines), this problem goes away? I'd like to implement this in Staapl. Entry: Iterators vs. Fold vs. Comprehensions Date: Fri Nov 28 12:11:49 CET 2008 http://lambda-the-ultimate.org/node/1224 What you want in C however, is a comprehension instead of a fold: this allows to use lexical variables without creating explicit iteratee context objects, or using gcc's downward closures. A comprehension is really just the macro form of a fold: the iteratee is inlined. The problem with comprehension macros is the maintenance of state across the loop body. Such a macro looks like: type1 var1; type2 var2; ... FOR_COLLECTION(var1, var2, ...) { // code that uses var1, var2, ... } Without access to the scope of the FOR and an intermediate scope of the loop body, this becomes quite difficult to hack.. Something like this might work better: type1 var1; type2 var2; ... FOR_COLLECTION(var1, var2, ...) // code that uses var1, var2, ... END_FOR In short, it's too complicated to do in general, so let's stick to generic folds. EDIT: Actually it is not necessary to store iteration state variables inside the context. Using a macro like this: #define DE_COLLECTION_FOR_(_list, _de, _del) \ struct list *_del; \ for(_del = (*_list), _de = _del ? _del->de : NULL; \ _de; \ _del = _del->next, _de = _del ? _del->de : NULL) #define DE_COLLECTION_FOR(list, de) DE_COLLECTION_FOR_(list, de, __del__) The iteration state is stored outside the context. The compiler will probably optimize the use of the variable, so we need not worry about memory usage. The only remaining problem is the "__del__" symbol: it is generated inside the macro and might cause name clashes. How to implement gensym in C?[1] [1] entry://20100825-142132 Entry: C++ local objects + exceptions. Date: Mon Dec 8 10:46:40 CET 2008 Local objects are destroyed on exceptions. This is actually quite useful as a code generation target.. But isn't this the same as alloca() ? Entry: GET/SET refactoring Date: Fri Dec 19 11:05:07 CET 2008 I've found an interesting test case for automatic refactoring: implementing endianness abstraction by exctracting GET/SET accessors from direct struct access in a low-level driver that uses little endian packed data structures. Entry: datastructures in C Date: Wed Jan 21 17:14:43 CET 2009 In lisp, a composite datastructure is always a bunch of pointers to other data structures. In C, some data might be "inlined". This inlining is done behind the scenes in Lisp (i.e. tagged ints), while in C you have: struct->field or struct.field And it's always _your_ responsability to know which of both it is. Is it possible to do this automatically, but still retain manual inlining at the definition site? I.e. to write something like struct=>field, and have the compiler expand it to struct->field or struct.field (analogous to Macros in Staapl)? Entry: memory management + RT Date: Thu Jan 29 09:27:42 CET 2009 - RT datastructures (directed acyclic graphs) This means eliminating malloc(). How far can we get with DAGs? With downward closures and read-only structures all data is stored in the tasks's activation frame. This means DAG memory management without reference counting. - traversal and partial continuations In the face of lightweight tasks (i.e. stack machines), partial contuations (tasks) become a feasible abstraction. Figure out if there are "free rides". I.e. if this can lead to datastructures that would otherwise require special memory management (refcounts) but can be built on top of stack-based allocators. One remark though: this requires one (arbitrarily large) stack per task, so just moves the problem. The problem being that _real_ memory comes from a shared pool - a machine architecture problem. So.. Can we solve this problem with virtual memory? Or is VM exactly the thing that makes context switches expensive? To summarize the idea: ASSUMPTION: Downward closures are 'powerful enough' for building data structures in simple RT programming. ADVANTAGE: Use fast stack-based memory management. PROBLEM: One stack per task. How to allocate this from a single memory pool? It is a non-issue for non-shared memory, but that might require inefficient pre-allocation. Where is the trade-off exactly? Entry: logic & constraint based programming Date: Sat Jan 31 12:35:51 CET 2009 More specifically: reversible dataflow. From a collection of equations (or inequalities), construct a C program or interpreter + program representation that computes a solution (or computes feasability). Entry: lists Date: Thu Feb 5 17:29:21 CET 2009 For a simple static C program with a bit of dynamic data, this is a simple list structure based only on concatenation, with "next" pointers embedded in the objects. This works for pure trees (flat trees). Conceptually, only these operations are necessary: * a PUSH operation of a singleton to a stack. * a REVERSE operation to finalize construction preserving order * a FOR operation for traversal If objects have built-in list pointers, there is conceptually no "element". All things are lists, and the primitive operations are "overlay_1" and "split_1". overlay_1 (abcd... , 1234...) = a1234... split_1 (abcd...) = bcd... I'm not sure if this way of looking at it is so useful, but there _is_ a difference between objects that contain a next pointer, and obects that are contained in a separate container structure (i.e. cons cells). Using embedded next is commonplace, and is easier to use with manual memory management, so maybe it's better to stick to it. Another way of looking at it: if datastructure construction is not time-critical, using quadratic list insertion is not really a problem.. That way append and iteration might be enough. The problem with embedded "next" pointers is that if you want to put the objects in an alternative container, you get two different containement mechanism (one built-in, one explicit). Entry: Data structure traversal in C Date: Fri Feb 6 15:31:43 CET 2009 In a block-scoped language without proper lexical closures, comprehensions (iteration macros) are easier to use than iteratatee objects (higher order functions) -- the operation passed to an iterator. The reason is that the body of the comprehension has access to the lexical environment outside of that body. In this scenario, factor out the data structure's FIRST and NEXT operations into procedures, and construct a trivial FOR macro using those based on C's for( _ ; _ ; _) construct. They are more limited: the comprehension loop block cannot be reused, while (stateless) iteration objects can be. EDIT: it is all about the difference between ``let'' and ``lambda''. The latter ``forks'' the stack while the former does not. Entry: (*x)[i] Date: Mon Feb 9 12:00:00 CET 2009 Given int **x, what's the difference between (*x)[i] and *x[i] ? This is clear to see when we write (*x)[i] and *(x[i]); In the first x is a pointer to an array of ints, in the second x is an array of pointers to int. Be careful, the compiler can't see the difference! Entry: C macros and name capture Date: Thu Feb 19 17:41:02 CET 2009 Today I spent an hour tracking down a classic bug: hygiene problems with non-hygienic macros (C preprocessor). See the following definition, call and expansion: // definition #define LST_ADD(head_lvalue, tail) {\ typeof(head_lvalue) *p = &(head_lvalue); \ while(*p) { p = &((*p)->next); } \ *p = tail; } // call LST_ADD(m->param_head, p); // expansion { typeof(m->param_head) *p = &(m->param_head); while(*p) { p = &((*p)->next); } *p = p; }; The introduced symbol "p" shadows the variable "p" in the calling context. This can be avoided by always binding all macro input names to local variable names before introducing macro-local names, but it's sufficient to bind those that might be shadowed by the introduced binding, which depends on how they are used. // new definition #define LST_ADD(head_lvalue, tail) {\ typeof(tail) _tail = tail; {\ typeof(head_lvalue) *p = &(head_lvalue); \ while(*p) { p = &((*p)->next); } \ *p = _tail; }} Entry: Automatic binary network protocol definition Date: Thu Feb 26 15:09:23 CET 2009 The idea: given a collection of (somehow restricted) functions, create server and client protocol pack/unpack routines. The reason: I loose too much time doing this by hand. The restrictions: obviously, the functionality should fit in a simple RPC mechanism that sends one packet to a server, and expects a return packet (see above). The main difficulty is memory management, and mapping C function calls to structs and back. Entry: classes vs. prototypes Date: Sun Mar 29 14:30:19 CEST 2009 Writing plain vanilla C code, one often wants some form of OOP indirection. Instead of storing references to classes in objects, it is often simpler to get rid of this level of indirection and store methods directly in objects. This is called ``Prototype based OOP''. When it is clear that some kind of class structure emerges, "behaviour" objects can be constructed: objects which contain only a collection of methods and are referenced by other objects. http://en.wikipedia.org/wiki/Prototype-based_programming Entry: Concurrency oriented programming in C + Scheme Date: Sat Aug 1 21:58:50 CEST 2009 The context of this article is the use of Scheme as an extension language in a predominantly C-based application. It is a way to make C programming easier by sticking to a subset that performs only `hierarchical' data allocation on the C-stack (i.e. doesn't use malloc() and free(), or any kind of explicit alloc/dealloc). Let's call this subset stack-C. From my experience(1) I make the assumption that stack-C is enough to solve many practical subproblems, given the external scheduling and memory management can be solved. The main idea is that while mixing ordinary C with (scripting) languages that have a different memory and execution model can be problematic (i.e. Scheme with GC'd CONS cells and a continuation-based control mechanism), the stack-C approach provides a clean separation between the C side and any time and space resource management provided by the scripting language runtime. The reason is that stack-C tasks can be: - trivially suspended to (contiguous!) memory - resumed multiple times (trivially forked) - garbage collected without finalization - separated completely from memory and control management - trivially parallelized This is essentially declarative concurrency. You shift the object-oriented model (state machines) to tasks. The idea is related to deforestation, where intermediate data structures (which require knowledge of the memory model) are eliminated by (static) scheduling of communicating tasks. Concretely: the stack-C code needs to only consume and produce atomic data, which is easily wrapped, and never builds any (recursive) data collections. When the need arises to build datastructures (intermediate representations) this can be done in the scripting language, without complicating the C code. This allows core algorithms to be written in `functional C' while all data management and code scheduling can be performed by the scripting language, be it a GCd Scheme, or a linear concatenative language. Memory and time management can then be `library provided'. I wonder if this is the key to a practical linear language. The fact that such `suspendable functional C' primitives could be used in two completely different memory models is a powerful hint. Footnotes. (1) The idea comes from an implementation of an EtherCat master driver written in C, in a style that does not use malloc() for data structure allocation, but instead allocates all memory on the activation stack. This made me wonder if this programming style can't be generalized. The idea popped up again trying to extend TinyScheme with `inverted enumerators' aimed at replacing list allocation with coroutine / generator style parameter linkage. (2) When stack-C is extended to include open/close or malloc/free style external resource management, the main challenge in this approach is finalization of tasks[2]. What we do is creating cursors by enumerator inversion, which doesn't guarantee the traversal is run to the end. This might however be elegantly solved by always re-inverting such inverted enumerators back to enumerators in the scripting language. Once could however argue that managed external resources are an implementation artifact and could be designed out. (3) Alternatively, a linear approach (no cyclic data, refcount based management) makes this easier to manage. However, it does require a finalize() method to the task. Linearly managing the execution contexts themselves (making fork() expensive) might be not such a bad idea. [1] http://okmij.org/ftp/Haskell/misc.html#fp-arrays-assembly [2] http://okmij.org/ftp/Scheme/enumerators-callcc.html#Finalization [3] http://okmij.org/ftp/papers/LL3-collections-enumerators.txt Entry: Generating C code Date: Sun Aug 9 12:52:37 CEST 2009 When generating boilerplate C code for a project, it seems better to generate either full C files, or header files with defines and inline functions. Including bits and pieces of C code from files seems error-prone. Entry: Unlikely Date: Sat Aug 15 11:36:41 CEST 2009 I've started to use the `unlikely' annotation. #define unlikely(x) __builtin_expect((x),0) if (unlikely(NULL == foo)) { ... } Apart from catering to that insatiable hunger for fast code it is also makes code more readable: on the first reading pass you can ignore these checks to see the default control path. Entry: CONS using alloca() Date: Mon Aug 17 19:28:18 CEST 2009 I'd like to write a tokenizer / parser in C without using malloc(). Is it ok to use alloca() for this (i.e. one per character)? I did this before actually... Hmm.. Apparently alloca() is no longer standard? Something to do with ANSI C's specification of compile-time knowable stack frame size. Entry: mmap() tricks Date: Mon Aug 17 19:34:43 CEST 2009 How to use mmap() for memory allocation tricks? Say I want to allocate a single 4K page into my address space, and free it later. Can this be done using mmap? It's possible to place a page at a fixed address using MAP_FIXED. Entry: Dynamic scope and thread-local variables. Date: Tue Aug 18 12:42:11 CEST 2009 Dynamic scoping is easy to implement using global variables and save/restore combined with some `dynamic-wind' guard for longjmp() in case that's used. However, in the presence of pre-emptive threads, this isn't an option. So how to use thread-local storage? [1] http://en.wikipedia.org/wiki/Thread-local_storage [2] http://publib.boulder.ibm.com/infocenter/iseries/v6r1m0/index.jsp?topic=/rzahw/rzahwex1.htm Entry: Will use mmap() instead of malloc() for allocation of major heap chunks. Date: Mon Aug 24 09:54:03 CEST 2009 That's what shows up in the output of the ./configure of (Meta)OCaml. Entry: Hiding circular dependencies using "temporary binding". Date: Mon Jul 26 14:07:25 CEST 2010 If you stick to proper trees as datastructures for C programs, you can avoid memory allocation problems. In such case each datastructure has exactly one owner/parent/... an can in principle be _embedded_ in that larger structure. If you need directed, acyclic graphs, you need reference counting. If you need generic cyclic object reference graphs, you need asynchronous GC. However, in many cases it is possible to go back to DAG / tree rep by providing a "temporary binding" object that associates 2 objects that need to know about each other. This requires a rule that methods cannot store pointers to other objects, but they might be "temporarily bound" during the execution of one method. The binding object (the one that actually stores the two pointers) is then responsible for the management of the two pointers. This is actually dependency injection[1] combined with a `short-lived clause': object only live during method call. In the DAG case, another trick is to impose an order on the data structure. The simplest one is the call stack: if a DAG is built only from local variables, simply returning functions will properly cleanup. FIXME: de-ramble [1] http://en.wikipedia.org/wiki/Dependency_injection Entry: Translating error codes Date: Wed Jul 28 08:01:22 CEST 2010 I notices an interesting pattern: because error codes are not real exceptions, they do not have a clear identity. I.e. an error code obtained from a different framework probably needs to be translated before it is returned up the chain. In C, it seems to be best to keep a single error namespace for all errors in a single module to avoid multiple translations. Entry: Error handling and "smart code" Date: Thu Jul 29 10:46:03 CEST 2010 For very stateful object (i.e. file systems) that have many possible local errors that could in principle be solved locally, it might sometimes be better to separate the code into two layers: * A dumb layer that gives up at the first sign of trouble, cancelling the transaction (not modifying the state). * An error recovery layer at the API entry point that localizes retry/resolve to all conditions that are recoverable, restarting transactions from the start. The basic idea is: don't _ever_ leave your object in an inconsistent state between method calls. This can sometimes conflict with "use small method calls" and pushes you towards thinking very hard about what a decently sized transaction does. If a state transition consists of a lot of small increments with intermediate inconsistencies, make sure it can be aborted at all times. In C this can be done by keeping the updated state in local variables, and only committing when everything went right. Entry: Error handlers as goto labels Date: Fri Jul 30 07:23:28 CEST 2010 What you see a lot in C code is a "goto error" sprinkled around code, where the "error" points to code that performs some error recovery before function return. The problem with this is that you don't know where the error condition came from when you set a debugger breakpoint on the "error" label. The solution is to place the handling code in a separate function. Instead of "goto error" one would use: return handle_error(err); This way a breakpoint can be set on "handle_error" and from the backtrace it is clear which branch of the code generated it. Entry: An argument for NULL-terminated lists Date: Wed Aug 11 19:15:31 CEST 2010 It's funny how a seemingly arbitrary choice keeps roaring its head. If you represent an array in C or any other low-level language that allows pointers, you have a choice between: - size + contiguous vector (SV) i.e. Pascal string - contiguous vector + sentinel (V0) i.e. C string Up to now I've found that all other things being equal, the SV is better because 1) you don't need out-of-band values and 2) you know the size without traversal. Granted, NULL usually isn't a problem as an out-of-band value, but it can be problematic in other cases where the elements are not simply memory addresses. Knowledge of the total size can come in handy when the vector is being accessed sequentially (i.e. during serial communication), and you need to allocate temporary storage. In the V0 case you need to traverse twice: once to get the size, and once to copy the data. This can be expensive and in some cases even impossible, in which case you need to start guessing the size and handle cases where you guessed to low. However, today I've found a case where V0 is actually better: when you want a compact representation of traversal stacks for tree data structures(1). In the V0 case you just need to store a stack of element pointers, and pop the stack whenever the sentinel is reached (i.e. the "return" instruction). In the SV case you need to store _two_ words per recursion level: one current pointer, and one counter or pointer to the end of the data structure(3). (1) Sometimes you do want explicit traversal stacks: recursion in C can be memory-inefficient due to presence of arguments or local variables that are "global" to the data structure descent. (2) Actually implementing a traversal on a SV-style tree it struck me that I could still use a single-word stack by _copying_ the elements to the stack all at once. Of course this is not a real solution as it less efficient overall wrt. stack space: whole lists are stored instead of pointers into lists. (3) The V0 vector case are CDR-coded linked lists, meaning you can easily "pop" such a data structure[1]. [1] entry://20081023-144321 Entry: Programming without malloc() Date: Sun Aug 15 21:05:41 CEST 2010 ( Related to PF and linear memory management. ) How come that I have not much trouble writing linear C code (not using malloc) for embedded systems work? Is it that embedded systems code is inherently "simple", or does it have to do with component-oriented approach (usually embedded software is really about the underlying hardware, which is necessarily finite). [1] entry://../libprim/20100815-204639 Entry: Explicit recursion stack Date: Mon Aug 16 07:45:09 CEST 2010 To eliminate recursive calls in C and replace them with an explicit stack, use the following approach: 1. Make the code self-recursive by unrolling mutually recursive calls. This is necessary to provide a single "context" to the loop. The stack will be in that context. 2. Replace the recursive call with a CALL macro that pushes the current data structure pointer to the stack and jumps to the entry point. 3. Replace the entry point to examine the top of the stack. Entry: Stacks and Queues Date: Tue Aug 17 09:19:20 CEST 2010 Another non-obvious obvious thing (NOOS). Stacks and queues are "dual" as they represent two ways of traversing a tree: depth-first and breadth-first. Now, is a breadth-first search necessarily less efficient than a depth-first search? It seems it needs more bookkeeping information. Or is this a consequence of how I usually implement? Entry: Explicit recursion and post-ops. Date: Tue Aug 17 09:46:52 CEST 2010 To write a tree descent algorithm which performs an operation _after_ recursion in the tree, the recursion stack needs two types of "continuations". One for the child traversal, and one for the post-op. This is exactly the same problem as the representation of continuations in a language interpreter. The solution is to build a stack which has elements that are unions instead of structs, and write a dispatcher (pattern matcher) for the different component types. Entry: Struct assigments Date: Tue Aug 17 11:08:05 CEST 2010 Apparently memcpy() isn't really necessary for structs. [1] http://stackoverflow.com/questions/324011/memcpy-vs-assignment-in-c [2] http://groups.google.com/group/comp.lang.c.moderated/browse_thread/thread/17f9d2a82b1ee88f/31d43c3c27a478af?pli=1 Entry: Static allocation: stacks and queues. Date: Wed Aug 18 18:34:27 CEST 2010 * In an embedded application, if a static structure is built at run-time, built up from many small fragments of memory, it might be better to allocate from a stack than to pollute the heap. If the amount of memory needed is predictable, the stack size can even be fixed at compile time. This requires a two-pass algorithm though. * Instead of writing pointers to a work queue, it is also possible to write self-contained objects. This avoids malloc() for the payload data. Both are really the same pattern: ``catch'' memory allocations and implement them differently based on knowledge of the memory lifetime. Entry: Non-cyclic objects Date: Thu Aug 19 18:47:17 CEST 2010 The free() operation is trivial for "linear" objects that do not contain any links to dynamic resources: simply discard. This is the: ``finalizers are evil'' anti-pattern. Is this actually useful? Entry: Constant data in Flash Date: Thu Aug 19 22:43:47 CEST 2010 How do you make sure constant data goes into Flash memory? Is it possible to generate constant data structures with cross references? From [1]: const volatile int *data __attribute__((section("FLASH")) = /* whatever */; [1] http://stackoverflow.com/questions/1284619/gcc-c-arm-and-const-pointer-to-struct-field [2] http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0474a/BABHIIEF.html Entry: Constant Recursive Data Structures Date: Mon Aug 23 18:36:41 CEST 2010 How to define constant, cyclic data structures in C? Same as for functions! The syntax for variable definitions and declarations is the same, it is the order that matters. Only the last occurance is a definition, the rest are declarations. //------------------------------- #include struct node; struct node { const struct node **parents; const struct node **children; }; const struct node a; const struct node b; const struct node c; const struct node d; const struct node e; const struct node *a_p[] = {NULL}; const struct node *a_c[] = {&c, NULL}; const struct node a = {a_p, a_c}; const struct node *b_p[] = {NULL}; const struct node *b_c[] = {&c, NULL}; const struct node b = {b_p, b_c}; const struct node *c_p[] = {&a, &b, NULL}; const struct node *c_c[] = {&d, &e, NULL}; const struct node c = {c_p, c_c}; const struct node *d_p[] = {&c, NULL}; const struct node *d_c[] = {NULL}; const struct node d = {d_p, d_c}; const struct node *e_p[] = {&c, NULL}; const struct node *e_c[] = {NULL}; const struct node e = {e_p, e_c}; int main() { return 0; } Entry: Book about embedded C Date: Mon Aug 23 21:05:07 CEST 2010 I wonder if it would make sense to start writing abook about embedded C. Maybe it's best to start with cataloging the tricks I already know. TOPICS: * Memory allocation patterns - static memory - stacks and queues - pure trees and inline structs - DAGs and refcounting - garbage collection and Scheme - hiding circular deps (the "environment" pattern / S combinator) * Data structure traversal - callbacks with context - generic fold w. callback - generators - for MACROS to avoid context objects * Debugging - gdb as an application console Entry: Gensym in C Date: Wed Aug 25 14:21:32 CEST 2010 To make iteration macros for abstract data structures on top of the `for' keyword, it helps to be able to define unique symbols to store the iteration state object. I.e. #define X_COLLECTION_FOR_(_list, _x, _xl) \ struct x_list *_xl; \ for(_xl = (*_list), _x = _xl ? _xl->x : NULL; \ _x; \ _xl = _xl->next, _x = _xl ? _xl->x : NULL) #define X_COLLECTION_FOR(list, x) X_COLLECTION_FOR_(list, x, __xl__) The `__xl__' symbol above is inserted into the macro invocation name context as a local variable and might clash. Is there a way to properly solve this problem? The following seems to work: #define __GENSYM2(x,y) x##y #define __GENSYM1(x,y) __GENSYM2(x,y) #define GENSYM(x) __GENSYM1(x,__COUNTER__) The indirection is about prescan[3]: "If an argument is stringified or concatenated, the prescan does not occur. If you want to expand a macro, then stringify or concatenate its expansion, you can do that by causing one macro to call another macro that does the stringification or concatenation." [1] http://stackoverflow.com/questions/1132751/how-can-i-generate-unique-values-in-the-c-preprocessor [2] http://gcc.gnu.org/onlinedocs/cpp/Common-Predefined-Macros.html#Common-Predefined-Macros [3] http://gcc.gnu.org/onlinedocs/cpp/Argument-Prescan.html#Argument-Prescan Entry: Resource Acquisition Is Initialization (RAII) Date: Thu Aug 26 11:57:21 CEST 2010 Mixing resources and exceptions, or how to model external resources as short-lived local (automatic) variables. This tickles the idea again about ``resource-free programming''. In general, ``garbage collection is good but finalizers are bad'' [Steele]. [1] http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization [2] http://calculist.blogspot.com/ Entry: Constant multidimensional arrays Date: Mon Aug 30 10:48:25 CEST 2010 One of those things that has always puzzled me is how multidimensional arrays work in C. I've always avoided this and embedded multidim arrays inside of 1-dim arrays using explicit coordinate mapping. int temp[0x10][0x20]; In the definition of `temp`, is there an array of pointers involved? K&R: "In C, a two-dimentional array is really a one-dimensional array, each of whose elements is an array.". So my confusion is mostly about the difference between an array and a pointer. These are not the same! See the sizeof() macro. The confuses arrises because an array is automatically converted to a pointer whenever it is used in a pointer context. Entry: Order of evaluation in CPP macros Date: Tue Aug 31 11:19:00 CEST 2010 Is it possible to play with order of expansion in CPP? Judging from the GENSYM[1] macro it is. How to do this systematically? Should evaluation order matter? For a purely functional language, evaluation order only matters for termination issues. So it's no surprise that we run into evaluation order when an impure features is used (the __COUNTER__ variable). [1] entry://20100825-142132 Entry: Defining constant linked lists using CPP Date: Tue Aug 31 13:37:45 CEST 2010 Is it possible to define constant linked lists from isolated macro expansions? I'd think it's possible using name concatenation, but it would still require manual input of symbols, so it's no worse than giving an explicit previous list item to link.. Entry: git diff Date: Tue Aug 31 17:12:23 CEST 2010 Not really C related, but I only use git for C projects. Problem: make a diff that can be applied by `patch' on a subtree of a git repository. It's probably simplest to use two braches and diff between the branches. Entry: C99 function names: __func__ Date: Wed Sep 1 13:59:19 CEST 2010 See GCC manual[1]. Apparently __func__ is part of the C99 standard, and behaves as: static const char __func__[] = "function-name"; __FUNCTION__ is a GCC extension and not portable. [1] http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Function-Names.html Entry: Non-invasive C code tracing Date: Wed Sep 1 14:56:44 CEST 2010 Is it possible to attach code to C function entry and exit points in a non-invasive way? One (invasive) way is to wrap each function in an instrumentation function. Entry: The meaning of `const' (.rodata section) Date: Tue Sep 7 09:10:01 CEST 2010 One often hears that ``const is broken'' which means that it is possible in C to get an unrestricted pointer to const data. However, on a more pragmatic side, the `const' qualifier does tell the compiler that the data is _intended_ to be constant, and as such it can be put in a separate linker section `.rodata'. This section can then be put in (p)ROM. I.e. for ARM uC it can go into program Flash. Entry: Linker sections on AT91SAM7X512 (ARM7TDMI, 128k RAM, 512k Flash) Date: Tue Sep 7 09:25:35 CEST 2010 The output of objdump (reformatted): tom@one:~/$ objdump -h app.elf app.elf: file format elf32-little Sections: Idx Name Size VMA LMA File off Algn 0 .debug_aranges 000021f8 00000000 00000000 00030e30 2**3 CONTENTS, READONLY, DEBUGGING 1 .debug_pubnames 00006976 00000000 00000000 00033028 2**0 CONTENTS, READONLY, DEBUGGING 2 .debug_info 0007fbd5 00000000 00000000 0003999e 2**0 CONTENTS, READONLY, DEBUGGING 3 .debug_abbrev 00014342 00000000 00000000 000b9573 2**0 CONTENTS, READONLY, DEBUGGING 4 .debug_line 00017cc3 00000000 00000000 000cd8b5 2**0 CONTENTS, READONLY, DEBUGGING 5 .debug_frame 000062f4 00000000 00000000 000e5578 2**2 CONTENTS, READONLY, DEBUGGING 6 .debug_str 0000f462 00000000 00000000 000eb86c 2**0 CONTENTS, READONLY, DEBUGGING 7 .debug_loc 0001b6ee 00000000 00000000 000facce 2**0 CONTENTS, READONLY, DEBUGGING 8 .rom_vectors 00000040 00100000 00100000 00008000 2**0 CONTENTS, ALLOC, LOAD, READONLY, CODE 9 .text 0001e790 00100040 00100040 00008040 2**2 CONTENTS, ALLOC, LOAD, READONLY, CODE 10 .rodata 00002580 0011e7d0 0011e7d0 000267d0 2**3 CONTENTS, ALLOC, LOAD, READONLY, DATA 11 .got 00000228 00120d50 00120d50 00028d50 2**2 CONTENTS, ALLOC, LOAD, DATA 12 .fixed_vectors 00000140 00200040 00200040 001163c0 2**5 CONTENTS, READONLY 13 .data 00000cac 00200180 00120f78 00030180 2**3 CONTENTS, ALLOC, LOAD, CODE 14 .bss 0001140c 00200e2c 00121c24 00030e2c 2**4 ALLOC 15 .ARM.attributes 0000002c 00000000 00000000 00116500 2**0 CONTENTS, READONLY 16 .debug_ranges 000073e0 00000000 00000000 00116530 2**3 CONTENTS, READONLY, DEBUGGING 17 .comment 00001784 00000000 00000000 0011d910 2**0 CONTENTS, READONLY VMA = virtual memory area LMA = ??? 00100000 Flash 00200000 RAM 00000000 Remappable Flash/RAM Some sections start at 0 but this seems to be a dummy value, i.e. for DEBUGGING. Entry: Reinventing linked lists Date: Tue Sep 14 09:35:46 CEST 2010 Yes, I know i deserve to be slapped when implementing (sorted) linked lists in C, but really, sometimes it's just easier to just write it down. Trouble is, this is error prone. I _think_ I can just write it down but there's always some bug in the code that's not caught by the type system. Double pointers are notorious for this. See the following: struct timeout_list **tol = &a->timeouts; while ((*tol) && ((*tol)->time < at)) { (*tol) = (*tol)->next; } What I really wanted to write is: struct timeout_list **tol = &a->timeouts; while ((*tol) && ((*tol)->time < at)) { tol = &((*tol)->next); } The latter iterates, while the former removes elements! Entry: Emacs and parsing C Date: Tue Sep 14 11:59:08 CEST 2010 There are some tools available, but they don't make the click to me (i.e. CEDET). The one that cought my attention is [1]. The problem is quite simple: * build a database from your source code. * query the database * allow "replace", taking part of a syntax tree and replacing it with something else. * integrate this with emacs The most complex part seems to be the C parser. I.e. to go from code to (modifiable!) datastructure back to code. [1] http://mike.struct.cn/blogs/entry/15/ Entry: Embedding arbitrary data in ELF using objcopy Date: Sat Sep 18 15:13:33 CEST 2010 Good hint from [1]: objcopy \ -I binary -O elf32-i386 -B i386 \ --rename-section .data=.rodata file.bin file.o Default secion is .data which is here renamed to .rodata for embedding of const data that goes into program Flash. This creates 3 symbols: SYMBOL TABLE: 00000000 l d .rodata 00000000 .rodata 00000000 g .rodata 00000000 _binary_file_bin_start 00000007 g .rodata 00000000 _binary_file_bin_end 00000007 g *ABS* 00000000 _binary_file_bin_size To use different names, use something like: -–redefine-sym _binary_file_bin_start=_my_bin_data [1] http://www.doof.me.uk/2010/05/07/cute-objcopy-hack/ Entry: Makefile error: multiple target patterns Date: Fri Sep 17 15:18:20 CEST 2010 Check for multiple colons[1]. [1] http://stackoverflow.com/questions/2100448/multiple-target-patterns-makefile-error Entry: Const correctness Date: Thu Sep 23 10:13:38 CEST 2010 What we already know is that global variables declared "const" will go into the .rodata section. What about functions? I'd like to hash the __FUNCTION__ variable into an integer to get method tags. Does the compiler inline this if the hashing function is declared const? [1] http://en.wikipedia.org/wiki/Const-correctness [2] http://stackoverflow.com/questions/212237/constants-and-compiler-optimization-in-c Entry: Makefile patterns Date: Fri Feb 25 16:21:30 EST 2011 Pattern targets. The pitfall is to not make them too general. How are ambiguities handled? What I like to know is how to insert a phony target into a real one. I.e. some targets do not have a full dependency tree, only a recursive make... What I want to do is to build a better build system for ecos. - If the config file or any of the cdl files change, regenerate the whole tree. - For other changes, just perform recursive make. Maybe it's not really worth it : keep current as is (only rebuild if config changes) and do manual rebuild when working on eCos source files. Entry: Environment or makefile variables? Date: Sat Mar 5 14:27:01 EST 2011 Basic idea: stick to one; don't use both. Entry: Datastructures in Flash Date: Sat Mar 5 23:35:58 EST 2011 Maybe I misunderstood the code, but I saw a segment in eCos where a datastructure is defined and attributed to a certain linker segment. The linker then collects all of these and places them next to each other. This seems like a nice way to build contiguous memory areas from dispersed source code locations when using C as a compiler target language. [1] entry://20100918-151333 [2] http://../ecos/20110529-123810 Entry: GNU make : target specific variables Date: Wed Mar 9 12:42:34 EST 2011 [1] http://www.gnu.org/software/make/manual/make.html#Target_002dspecific Entry: Generators Date: Thu Mar 17 11:53:12 EDT 2011 When you have list structures moving between layers of code, say A->B->C, there is a choice to be made about the form of the data that flows between the functions. The simplest approach is to put a bunch of data in a (large) buffer and pass it on to the next function. An alternative way is to pass only a single element at a time, and place the A->B->C components inside a loop. Taking some ideas from [1], and taking into account the limitations of C, the optimal seems to be to provide a "fold" for each data structure. In C code I prefer to use the word "for". The main advantage of such an approach is that it avoids intermediate data structure storage such as large buffers to pass lists of elements from one function to the next. Using the "for" loop aproach instead of the iterator object approach -- i.e. methods that implement first, next, last -- has the added advantage that locking and other resource management can be abstracted away from the callback function. [1] http://okmij.org/ftp/Streams.html#enumerator-stream Entry: Abstract data types vs. static RAM allocation Date: Fri Apr 1 14:28:39 EDT 2011 There is a clear conflict of interest between 1. C structures with static allocation 2. Data abstraction It would be handy to be able to know the size of a data structure at compile time, but hide its layout such that data accessors can be used. I'm not so much interested in hiding reads, but writes really do need special attention, as they can unknowingly introduce the worst kind of (single-threaded) bug the data structure invariant violation. I was thinking about a hack that would use the C "const" to protect a struct from direct write access, but cast the const in the implementation. That might not interfere well with optimization though.. However it might be so that the C compiler doesn't use the "const" for optimization simply because const pointers can be recast. I'd like to pose this question on Stack Overflow. How to formulate? Q: How to get at the size of a forwared declared struct? A good practice when writing C code is to always use forward structure declarations (incomplete types) in a header file, and keep the implementation in a code file. // in header struct abc; // in code struct { int a; int b; int c; } When writing C code in an embedded development setting with a lot of memory perssure, it is often a good idea to statically allocate data structures whenever possible. Doing so requires complete types. i.e. a struct definition which has all fields accessible by the user. Is there an elegant way to resolve the conflict of these two constraints? Is it possible to provide struct size information at compile time to make it possible to perform static allocation, but at the same time hide the layout of the struct to prevent arbitrary access? Entry: C and packed bitfields Date: Sun Apr 3 13:49:15 EDT 2011 Is this actually useful? Entry: Producer/consumer in C in an embedded context Date: Mon Apr 4 14:11:08 EDT 2011 Main problem in embedded is to limit RAM memory usage. This makes algorithms that use intermediate data structures prohibitive. Usually there is plenty of ROM in comparison to RAM, so it is usually possible to use use manual "deforestation". What this means is to turn code into producers/consumers (threads and streams). In languages that support partial contiuations this is straightforward. How to do it in C? Entry: "for" functions or "FOR" macros? Date: Tue Apr 5 21:15:28 EDT 2011 The idea is to always use abstract collection traversal interfaces. There are essentially two forms: - the STREAM: open, close, next, eof - the early-abort FOR loop, either as a macro or as a higher order function (callback + state). Most people are familiar with the STREAM option, but the FOR function has some advantages, such as centralizing resource allocation (open/close are abstract). In C, a "FOR" macro is more convenient since it can use its surrounding lexical context without using tricks, but a "for" function is more powerful though a bit clumsy to use for its need of specific "context" structures (closures). Entry: static inline: (pseudo-) separate interface from inplementation Date: Thu Apr 28 12:28:44 EDT 2011 I tend to use "static inline" functions a lot for simple accessors to allow the compiler to perform more aggressive optimization. The problem is that this always needs to combine interface and implementation for two reasons: - data structures need to be fully specified. - function bodies are visible. The former you can't do anything about: compiler needs to know the data layout so it's technically possible for the API user to abuse this. The latter makes it hard to read headers. It seems a good idea to use the following: use forward declarations for data structures and inline functions and stash the implementation of both in a _priv.h file that is not supposed to be looked at by human eyes, but is there for the compiler: // foo.h struct foo_data; static inlien int foo_data_size(struct foo_data *x); #include "foo_priv.h" // foo_priv.h struct foo_data { int a, b, c; }; static inline int foo_data_size(struct foo_data *x) { return sizeof(foo_data); } Another issue whether one allows struct members as: struct highlevel { struct lowlevel_foo foo; struct lowlevel_bar bar; } or needs to abstract using pointers and possible constructors such as lowlevel_foo_new() and lowlevel_bar_new() to instantiate the objects. struct lowlevel { struct lowlevel_foo *foo; struct lowlevel_bar *bar; } Entry: GNU cflow Date: Thu May 19 17:31:40 CEST 2011 Print a control flow graph (as an indented text file) from a bunch of C files. Another tool is egypt[4]. Apparently doxygen can also use Graphviz to generate call graphs[3]. cflow2dot[5] to connect cflow ang Graphviz. [1] http://www.gnu.org/software/cflow/ [2] http://en.wikipedia.org/wiki/Call_graph [3] http://en.wikipedia.org/wiki/Doxygen [4] http://www.gson.org/egypt/ [5] https://code.google.com/p/cflow2dot/ [6] http://sourceforge.net/projects/cflow2vcg/files/cflow2vcg/0.5/cflow2vcg-0.5.tar.gz/download Entry: Link-Time Optimization (LTO) vs. static inline Date: Sun May 22 18:27:51 CEST 2011 Does GCC perform LTO? Seems the short answer is yes[1]. How to enable it? From [2]: To enable LTO, simply add the flag '-flto' to both compile and link commands. [1] http://gcc.gnu.org/wiki/LinkTimeOptimization [2] http://gcc.gnu.org/ml/gcc/2009-10/msg00060.html Entry: Automatic casting and overflow Date: Mon May 23 16:59:12 CEST 2011 Something I always do explicitly because I can't remember the rules: char i = 255; int j = i + 1; Does i get cast from char -> int before the addition, or after? Where is this made explicit? Section 2.7 Type Conversions in K&R is quite explicit: When an operator has operands of different types, they are converted to a common type according to a small number of rules. In general, the only automatic conversions are those that convert a ``narrower'' operand into a ``wider'' one without losing information [...] Entry: Compile time assert Date: Mon Jun 6 11:30:01 CEST 2011 I need a compile-time assert macro that can go into proprietary code, so can't use the one in Linux. Good news: I did not look at any code yet, so can I reinvent it? The trick alledgedly is to use the macro arguments to construct data of negative size. So let's analyze this: CT_ASSERT(x) - If the argument of the assert macro is zero, a compilation error is triggered. - To trigger a compilation error, construct a data structure with negative size. - A name is necessary for the data structure, so let's use a name generation macro (GENSYM[1]). It turns out to be quite straightforward: using result values of boolean expressions (after a comparison of the input to 0), a simple negation can be used to map the failed assert to a negative array size. #define CTASSERT(x) static const char GENSYM(ctassert_)[-((x)==0)]; EDIT: There is even a more elegant way. Instead of using a code construct that has a run-time run-time remnant as a symbol part of the binary if it's not optimized out, it's possible to use just a completely ephemeral construct: i.e. a type. #define CTASSERT(x) struct GENSYM(ctassert_) {char a[-((x)==0)]; } EDIT: It's also possible to avoid size 0 arrays and use size -1 for false and size +1 for true: #define CT_ASSERT(x) struct GENSYM(_ctassert_){ char a[1 - 2*((x)==0)]; } [1] entry://20100825-142132 Entry: Know your linker: how to avoid unused functions to end up in a binary. Date: Sat Jun 11 18:01:52 CEST 2011 Use the CFLAGS -ffunction-sections -fdata-sections which will put functions in their own sections. Together with the linker option -Wl,--gc-sections unused sections (functions or data) will be removed. The only downside is that using separate sections requires more space due to alignment. Entry: binutils RTFM Date: Sun Jun 12 15:59:20 CEST 2011 From [1]: You can see the symbols in an object file by using the nm program, or by using the objdump program with the `-t' option. [1] http://www.delorie.com/gnu/docs/binutils/ld_8.html Entry: What are __restore and __restore_rt ? Date: Sat Jun 4 15:39:51 CEST 2011 I ran into these missing symbols in an eCos synthetic target build[1]. I have no idea where these functions should be defined, and what the meaning is of the construct in the eCos code: .align 16 .global cyg_hal_sys_restore_rt cyg_hal_sys_restore_rt: movl $SYS_rt_sigreturn, %eax int $0x80 1: .type __restore_rt,@function .size __restore_rt,1b - __restore_rt .align 8 .global cyg_hal_sys_restore cyg_hal_sys_restore: popl %eax movl $SYS_sigreturn, %eax int $0x80 1: .type __restore,@function .size __restore,1b - __restore The build fails like this: make[1]: Entering directory `/opt/xc/ecos/build/cvs/linux/hal/synth/i386linux/current' gcc -c -I/opt/xc/ecos/build/cvs/linux/install/include -I/opt/xc/ecos/src/cvs//packages/hal/synth/i386linux/current -I/opt/xc/ecos/src/cvs//packages/hal/synth/i386linux/current/src -I/opt/xc/ecos/src/cvs//packages/hal/synth/i386linux/current/tests -I. -I/opt/xc/ecos/src/cvs//packages/hal/synth/i386linux/current/src/ -finline-limit=7000 -Wall -Wpointer-arith -Wstrict-prototypes -Wundef -Wno-write-strings -g -O2 -ffunction-sections -fdata-sections -fno-exceptions -Wp,-MD,src/syscall-i386-linux-1.0.tmp -o src/hal_synth_i386linux_syscall-i386-linux-1.0.o /opt/xc/ecos/src/cvs//packages/hal/synth/i386linux/current/src/syscall-i386-linux-1.0.S /tmp/ccIPQc1w.s: Assembler messages: /tmp/ccIPQc1w.s: Error: .size expression for __restore_rt does not evaluate to a constant /tmp/ccIPQc1w.s: Error: .size expression for __restore does not evaluate to a constant make[1]: *** [src/syscall-i386-linux-1.0.o.d] Error 1 make[1]: Leaving directory `/opt/xc/ecos/build/cvs/linux/hal/synth/i386linux/current' make: *** [build] Error 2 make: Leaving directory `/opt/xc/ecos/build/cvs/linux' According to something I found in google code search[2], these functions are trampolines. It's a part of gdb code[3]. It says: ... as of version 2.1.2, the GNU C Library uses signal trampolines (named __restore and __restore_rt) that are identical to the ones used by the kernel. Maybe that is something that changed? I used objdump -T to find these symbols in any of the libraries in /lib and /usr/lib and I didn't find anything. Maybe it's in libgcc or so? So let's see what that code actually means. The .type[4] directive records the symbol table type for the associated symbol. So it seems that ".type" in .type __restore,@function .size __restore,1b - __restore does not much more thanannotate the symbol "__restore": it does not define it. Same for ".size": it records the size associated to the symbol. Maybe what is meant here is to just refer to the "cyg_hal_sys_restore" symbol, and not the "__restore" symbol? I.e. the code probably has changed at some point to incorportate a name change, but somebody forgot to update that directive. It's not necessary for the code to work so it was only detected when binutils got stricter. See fix next post. [1] http://ecos.sourceware.org/ml/ecos-discuss/2011-06/msg00013.html [2] http://www.google.com/codesearch?as_q=__restore_rt&btnG=Search+Code&hl=en&as_package=&as_lang=&as_filename=&as_class=&as_function=&as_license=&as_case= [3] http://www.google.com/codesearch/p?hl=en#pFm0LxzAWvs/darwinsource/tarballs/other/gdb-203.tar.gz%7CYXbJTYT1R-s/gdb-203/src/gdb/i386-linux-tdep.c&q=__restore_rt [4] http://tigcc.ticalc.org/doc/gnuasm.html#SEC133 Entry: Synthetic target fix Date: Sat Jun 4 16:41:20 CEST 2011 After checking what ".type" and ".size" actually mean (they annotate a symbol with type and size), it seems likely that what is meant here is to just refer to the "cyg_hal_sys_restore" symbol, and not the "__restore" symbol? I.e. the code probably has changed at some point to incorporate a name change, but somebody forgot to update that directive. It's not necessary for the code to work so it is only detected now that binutils got stricter about it's input. If this is the case, see the fix below Cheers, Tom ? synth_fix.patch Index: syscall-i386-linux-1.0.S =================================================================== RCS file: /cvs/ecos/ecos/packages/hal/synth/i386linux/current/src/syscall-i386-linux-1.0.S,v retrieving revision 1.13 diff -u -8 -p -r1.13 syscall-i386-linux-1.0.S --- syscall-i386-linux-1.0.S 23 Aug 2009 11:34:45 -0000 1.13 +++ syscall-i386-linux-1.0.S 4 Jun 2011 14:40:09 -0000 @@ -439,20 +439,20 @@ SYSCALL5(ipc) // via another system call. .align 16 .global cyg_hal_sys_restore_rt cyg_hal_sys_restore_rt: movl $SYS_rt_sigreturn, %eax int $0x80 1: - .type __restore_rt,@function - .size __restore_rt,1b - __restore_rt + .type cyg_hal_sys_restore_rt,@function + .size cyg_hal_sys_restore_rt,1b - cyg_hal_sys_restore_rt .align 8 .global cyg_hal_sys_restore cyg_hal_sys_restore: popl %eax movl $SYS_sigreturn, %eax int $0x80 1: - .type __restore,@function - .size __restore,1b - __restore + .type cyg_hal_sys_restore,@function + .size cyg_hal_sys_restore,1b - cyg_hal_sys_restore Entry: Are structs bad? Date: Thu Jun 23 11:07:57 CEST 2011 The problem with C is not that structs are a bad interface, but that it is hard to enforce structs to be immutable. I.e. in essence there is not such a big difference between: struct foo { int a,b,c; }; and void foo (int a, int b, int c); If any of the two interfaces changes, the code that uses it needs to be changed anyway. Of course, exposing _internal_ data structures that contain implementation details that are irrelevant to the user is a a bad idea. I'm thinking mostly of structs as configuration files. Entry: Be careful with typeof() for casts in macros Date: Sat Jun 25 11:48:18 CEST 2011 Excerpt of a simple stack implementation. #define STACK_PUSH_SAFE(stack, thing, overflow) { \ void *room = stack_allot(stack); \ if (!room) goto overflow; \ *((typeof(thing)*)room) = thing; \ } The trouble with this is that the input argument of the macro is used to determine the access. So if the stack represents short ints (16 bits) and the type of the parameter at the call site is a 32bit int, there is a problem: - data abort on ARM due to alignment problems - incorrect access (writes 2 values) on archs that support non-algined access Entry: Locking and callbacks Date: Sat Jun 25 13:19:14 CEST 2011 Is it possible (and sound?) to do the following: When a mutex is locked by the current thread, do not lock (causing deadlock) but throw an error. This shows up in a case where I'm using callbacks inside a lock to avoid manually having to manage resources. However, the hierarchy of the locks of the application is such that it isn't completely impossible to avoid deadlocks: it's possible to call functions that ultimately depend on a low level lock the callback has already reserved, in this particular case a disk lock. Probably this is just a sign that the cause of the inner deadlock needs to be decoupled, i.e. by using a thread. In this case, the inner cause is a logger which might aquire a disk lock, however it is possible to put the logger in a separate thread and use a buffer to capture logging data until the disk log is released. Is there a (practical) way to statically verify lock hierarchy violations? I.e. this[1] mentions a mechanism where lock hierarchy is constructed (probably through control flow graph) and it triggers an error when this graph contains loops, i.e. joins on the same lock. It seems[2] that it is possible to do it at run-time. That drdobbs article[2] also mentions layer violation through callbacks, which is exactly my problem. I'm using callbacks to abstract locking, but this then has the potential to violate hierarchies. [1] http://www.osronline.com/ddkx/ddtools/dv_8pkj.htm [2] http://drdobbs.com/high-performance-computing/204801163?pgno=1 Entry: emacs + cscope Date: Mon Jul 11 20:43:11 CEST 2011 Minimal setup: apt-get install cscope cscope-el cscope -bR -I ... # Index directory, set include path. Emacs: (require 'xcscope) M-x cscope-set-initial-directory # Set it to the file that contains cscope.out M-x cscope-find-this-text-string # C-c s s Entry: GCC not catching undefined variable error Date: Tue Jul 12 11:08:46 CEST 2011 With -Wall enabled, this undefined variable error is not caught: void foo(int arg) { int err; if (arg < 0) goto exit; err = 0; exit: return err; } Might it be because of -O0, i.e. no data flow analysis performed? Yep, absence of opti is the problem: cc1: warning: -Wuninitialized is not supported without -O Entry: cflow & entry point Date: Mon Jul 18 22:03:39 CEST 2011 Just passing a random file to cflow made me think first that all functions are sorted, meaning that if a function is called by another one, it is not taken as one of the main entry points. However, it does seem to miss some things: my code contains static functions that go into a struct which are completely ignored by cflow. I tried "--include=s" to include static functions, but that doesn't seem to work. Hmm.. Something's wrong. Removing all the "static" annotations does make the functions show up. Entry: Why buffer? Date: Tue Jul 19 16:43:23 CEST 2011 The 7 whies down the rabbit hole answer: to avoid loosing time in context swapping. From this simplified view it seems straightforward to conclude that to optimize for small memory usage, optimize context swapping. Entry: CPP macro for counting the number of bits in a word. Date: Sun Oct 30 15:00:38 EDT 2011 I don't know a way to write recursive macros; don't know how to terminate the recursion and I don't know if recursion is supported in the first place. Though this is a simple manual unrolling: #define NB_BITS_32(x) (NB_BITS_16(x) + NB_BITS_16(x >> 16) #define NB_BITS_16(x) (NB_BITS_8(x) + NB_BITS_8(x >> 8) #define NB_BITS_8(x) (NB_BITS_4(x) + NB_BITS_4(x >> 4) #define NB_BITS_4(x) (NB_BITS_2(x) + NB_BITS_2(x >> 2) #define NB_BITS_2(x) (NB_BITS_1(x) + NB_BITS_1(x >> 1) #define NB_BUTS_1(x) (x & 1) Entry: Counting in the preprocessor Date: Mon Nov 14 11:08:52 EST 2011 It's probably possible to count in the preprocessor by allocating an array, and using sizeof. Then somehow get rid of the thing later.. Entry: MMX on AMD64 Date: Tue Nov 29 12:15:22 EST 2011 Does it not work? Entry: Assigning pointers to zero Date: Sat Dec 17 14:58:52 EST 2011 I ran into a bug today that's similar to void foo(int *a) { a = 0; /// a = 123; } when I meant: void foo(int *a) { *a = 0; /// *a = 123; } How to make this into an error without using leaky const arguments? Entry: Getting over C++ hate (fear?) Date: Sun Dec 18 09:00:38 EST 2011 In discussion with someone that doesn't have this problem, it's come to my attention lately that I have an irrational attitude towards C++. Sure, C is simple and straightforward, but in my style I rely a lot on the CPP. Maybe it's time to start collecting a couple of C++ features that would be interesting to explore, with an eye on embedded development, to avoid having to resort to text macros too much.. - Resource management through local objects. RAAI[1]. Even without exception support (like eCos) it's still useful to be able to return and have destructors called. - Templates: building complicated const data structures at compile time to be able to store them in read-only Flash memory. [1] http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization Entry: Typed CPP Date: Sun Dec 18 09:07:42 EST 2011 Is it possible to build some kind of type system on top of CPP? Limit use to construct that have some structure? I.e. build a system of CPP tools that can be compiled by standard CPP but also compiled by a preprocessor with a type system, i.e. something that would fit C syntax so I can feed it into Language.C ? This decoupling gets you: * Compatibility with standard CPP, meaning a source tree can be compiled without the static analyzer and its dependencies present on the system, which is *really* important. * Full freedom in tool dependency. Since analysis is only necessary when you're actually *changing* the code, the static analyzer can have a whole host of dependencies and special purpose features. Is this just C++'s template system? Entry: Obtaining pointer to current function Date: Thu Dec 29 11:05:45 EST 2011 Is there a way to obtain a pointer to the current function's address without knowing its name? I.e. void foo(void) { void *me = &this_function; } From [1] it seems that this isn't possible, except for doing it with a function call like this: #include void *get_addr(void) { return __builtin_return_address(0); } int main(void) { printf("%p\n", get_addr()); return 0; } This might be enough though. The thing is that gdb knows how to map this to a source location, given the binary compiled with debugging symbols: # gdb (gdb) list *
[1] http://stackoverflow.com/questions/2154852/get-a-pointer-to-the-current-function-in-c-gcc Entry: Detecting re-entrancy Date: Thu Dec 29 19:44:49 EST 2011 How to detect re-entrancy in a function, while allowing normal multi-threaded access? I don't see a way to do this without thread-local variables. Entry: C re-appreciation day Date: Sat Jun 23 15:03:08 EDT 2012 Entry: Overflow is undefined for signed ints Date: Tue Jul 10 09:18:39 EDT 2012 While Two's complement arithmetic has little secrets for me, I was surprised to learn that it's not so well-defined in C as I thought. The reason seems to be that for unsigned ints, the behaviour is easily expressed in terms of the C '%' modulo operator, as operations modulo 1^n where n is the bit size. However, for signed ints this doesn't work because the same interpretation (modulo a positive number) should not change the sign. While there is definitely a "wrap around" for signed ints, it can not be expressed (simply) in terms of the C modulo '%' operator, which is probably the reason it is left unspecified. When unsigned and signed are mixed, the result is unsigned. That makes sense, since it allows an accumulator to be unsigned, and increments to the accumulator be unsigned or signed. Conclusion: representing modulo arithmetic in C using overflow needs to be done using unsigned ints. It will "probably" work with signed ints, but that relies on unspecified behavior. Entry: C eval Date: Fri Aug 17 16:43:27 EDT 2012 I need a simple way to convert a C string to .bin for data structures. Something like: int eval(const char *c_init, void *buf, int bufsize); Which could evaluate "int a[] = {1,2,3};" to binary: 01 00 00 00 02 00 00 00 03 00 00 00 The script part of it seems relatively easy. Run make and dump the .bin to stdout. How to do the spawn? Entry: tty Date: Thu Sep 20 00:02:42 CEST 2012 More a unix thing, but a very interesting article nontheless. This caught my attention: By default, fork(2) places a newly created child process in the same process group as its parent, so that e.g. a ^C from the keyboard will affect both parent and child. But the shell, as part of its session leader duties, creates a new process group every time it launches a pipeline. I ran into trouble recently trying to start a bunch of processes from a shell, but being surprised why they did not end up in one process group. The explanation makes perfect sense: a process group is best associated to a single pipeline. What I did was an "implicit" pipeline: some processes work together sharing named pipes or files & polling. Is there a different way to force this? I.e. make the shell put processes in a single group? [1] http://www.linusakesson.net/programming/tty/ Entry: Simpler gdb tracer Date: Thu Sep 27 08:17:08 EDT 2012 Macro to: - Call function. - Tag -> return address. How to find return address? Entry: const * const Date: Sun Oct 14 09:25:39 EDT 2012 0. const * const: the first "const" refers to the object, the second to the poiter variable. 1. You'd want to declare "const *a" in the function PROTOTYPE to show the caller that the contents of an object passed by reference is not modified. 2. In general, you want variables in a function that are not to be changed to be declared "const". 3. You'd want to also use "const * const a" in a function DEFINITION to make sure the implementation doesn't modify the value of the pointer itself. However, this doesn't need to show up in the prototype! (Since pointers are passed by value it is irrelevant to the caller.) The prototype can just be "const *a". ( #3 is the new insight ) Entry: cscope Date: Thu Nov 22 10:46:39 EST 2012 cscope -kbq This will convert IN: cscope.files OUT: cscope.out cscope.in.out cscope.po.out However, it seems to be missing things. From manpage[1] -q Enable fast symbol lookup via an inverted index. This option causes cscope to create 2 more files (default names ``cscope.in.out'' and ``cscope.po.out'') in addition to the normal database. This allows a faster symbol search algorithm that provides noticeably faster lookup performance for large projects. -k ``Kernel Mode'', turns off the use of the default include dir (usually /usr/include) when building the database, since kernel source trees generally do not use it. -b Build the cross-reference only. [1] http://cscope.sourceforge.net/cscope_man_page.html Entry: Tagged list tricks Date: Sat Dec 8 12:52:33 EST 2012 Use callbacks instead of lists: typedef void (*zl_zwindow_parsed_event_fn)(void *context, int nb_args, ...); void zl_xwindow_for_parsed_events(zl_xwindow *xwin, zl_zwindow_event_fn handle, void *context); Use const char to "simulate" interned symbols (hashed strings) for tags known at compile time, without actually having to implement dynamic symbol hashing. Abstracting lists of tags (i.e. a "type definition") in a macro can help here: #define ZL_DISPLAY_EV_LIST(EV) \ EV(keypress) \ EV(keyrelease) \ EV(keyrelease) \ /* Declaration */ #define ZL_DISPLAY_EV_DECL(name) extern const char *ZL_DISPLAY_EV_#name; ZL_DISPLAY_EV_LIST(ZL_DISPLAY_EV_DECL) /* Implementation */ #define ZL_DISPLAY_EV_IMPL(name) const char *zl_display_ev_#name ##name; ZL_DISPLAY_EV_LIST(ZL_DISPLAY_EV_IMPL) Some more info: /* To bridge 2 programming systems, the real hurdle is in impedance-matching the data types and memory model. In ZL this is done by keeping a very simple basic data type: C variable argument lists where tags are reprented using typedef const char *zl_tag; A. List representation ---------------------- There are essentially 2 kinds of list in C that are easy to use inside the language: - Static data initializers. This requires a union (optionally tagged) so has some syntactic overhead. Works for global and local variables. - Variable argument lists. This is a feature supported by the C standard and is very convenient to use as there is no notational overhead. Often converted data is short-lived, allowing data conversion to be implemented with nested C calls. Both are able to side-step malloc() when they are combined with some kind of callback mechanism. B. List tagging --------------- Using const char * as a tag type allows 2 things to be married in a very elegant way: - Pointer-based comparison in C without memory allocation overhead (all string memory allocation is static). - Easy conversion to run-time interned symbols (hashed strings) for dynamic languages. */ Entry: C trailing commas in function calls / definitions? Date: Sun Dec 9 15:55:58 EST 2012 [1] http://stackoverflow.com/questions/2311864/history-of-trailing-comma-in-programming-language-grammars Entry: GNU Make subdir stem? Date: Tue Dec 25 13:57:37 EST 2012 I've been using rules like this for a while: $(BUILD)/%.o: $(SRC)/%.o which works fine even if % is a subdirectory. However, the following seems to insist to take only the file basename as the stem: %.o: $(SRC)/%.c for foo/bar.o in SRC=/baz/src this expands to: foo/bar.o: foo/baz/src/bar.c Entry: GNU Make rule recursion Date: Wed Dec 26 12:05:19 EST 2012 I'm having trouble with parallel make. As far as I can see, all my deps are proper, but sometimes make exits with gcc linker stage complaining: abc.o: No such file or directory I don't see much in the debug output except this, which I don't understand: Avoiding implicit rule recursion. I'm still not sure what this is, but 2 things fixed it, splitting the original rule into to, i.e. from $(BUILD)/%.elf: $(BUILD)/%.elf $(BUILD)/lib.a to $(BUILD)/%.x: $(BUILD)/lib.a $(BUILD)/%.elf: $(BUILD)/%.elf $(BUILD)/%.x and to change to static pattern rules[1]. I'm sticking with the latter. [1] http://www.gnu.org/software/make/manual/html_node/Static-Pattern.html Entry: Atomic pointer assignments Date: Wed Jan 9 09:46:29 EST 2013 I need to build some code that run on on i386/amd64 and ARM. Is it safe to assume that pointer assigments are atomic? It's probably best to not assume anything, and hide the access to atomic structures behind a typedef and a [1] http://stackoverflow.com/questions/8919818/is-pointer-assignment-atomic-in-c Entry: Linking .o -> .elf to make binary files Date: Thu Jan 24 23:22:27 CET 2013 The idea is to make a .bin file with some data and location-independent (-fPIC) code. To make a .bin from an .elf use: objcopy -O binary --only-section=.text this does not work on a regular "-c" object file because it can contain undefined data references. Linking a .o to .elf using: gcc -nostartfiles does things like inserting .rodata constant references in code. This is is the .o before linking: 29: f3 44 0f 10 0d 00 00 movss 0x0(%rip),%xmm9 # 32 30: 00 00 and this is the fully linked .elf: 4002a9: f3 44 0f 10 0d aa 00 movss 0xaa(%rip),%xmm9 # 40035c 4002b0: 00 00 To link properly, use something like the script below. It places everything in a single .text section, after which the objcopy command above works as expected. MEMORY { /* All addresses are file-relative. */ file : ORIGIN = 0, LENGTH = 0x10000000 } SECTIONS { . = 0; .text : { /* Header is first, the rest is only code and constants. The order doesn't matter as there are offsets stored in the header. */ *(.header*) *(.text*) *(.rodata*) } >file } This places the .header section first, which allows things like: struct rai_header __attribute__((section(".header"))) PROC(info) = { .info = { .magic = RAI_MAGIC, .entry = (u64)PROC(loop), .nb_state = PROC_NB_STATE, .nb_in = PROC_NB_IN, .nb_out = PROC_NB_OUT, }, }; Here PROC(loop) is a reference to a function, which shows up as a file-relative offset because we start addressing at 0. Note that the integer .entry field needs to be able to contain a pointer: it is not possible to truncate an address value to a shorter field at link time. Entry: Universal header Date: Tue Feb 12 14:24:29 CET 2013 For storing things in Flash over a non-error-free channel, I found that there are 2 things important to make handling easier: - protect individual headers with a small CRC - don't make chunk sizes too large struct uh { u8 crc; u8 size; u8 type; u8 _reserved_; }; crc is necessary to verify the whole message size is necessary to buffer the whole message and skip to the next one type is necessary to know how to process _reserved_ is an extra byte that can be used by one of the other fields. Entry: Python-style coroutines in C Date: Sun Feb 17 19:44:33 CET 2013 One of the patterns that came up a lot in a project I worked on recently is state machines that parse "finitely nested" data structures. What I mean is that in order to parse data structures with unlimited nesting, one generally needs a "task", i.e. something with an execution (nesting) stack, i.e. a recursive decent parser. However, in practice, many data structures have a finite structure, i.e. lists of lists of things. Those do not need a stack, just a (finite collection of) state variable(s). ( From the human-understanding p.o.v. it surprises me that I've never seen this clear enough until recently. Maybe my bias is too much towards arbitrary nesting depths, i.e. programming langauges? ) So what's the problem? Suppose you're writing a parser for a nested data structure that needs to be suspended, i.e. the data is not available all at once, but only in small chunks. Traditionally, there are 2 ways to do this in C: - Use a thread. - Use an explicitly coded state machine. As long as there is a possibility to use a blocking mechanism such as a pipe, a thread might be the best solution. However, it is a bit overkill, since we really do not need a full stack Granted, there is no such thing as a "full stack" in C : each thread is always just a state machine because the stacks are always finite. But the point is about program structure, i.e. what can we do when we know exactly the nesting dept at compile time? So, we'd like to use a state machine. Trouble with state machines is that they are hard to code, because they often look like "manually inverted, nested for loops". However, this "inversion" can be done automatically. This is what most "coroutine" libraries in C are about: clever tricks to "jump into the body of a function". So what is the transformation? - Move all the state variables into an object (i.e. no local variables! either static variables or an explicit context struct) - Abstract the "blocking points", i.e. the part around the "read" functions as a sequence: - save state - return to "OS" - The "OS" then performs the read, i.e. stores memory into a buffer, or performs an externally blocking operation, and calls the function again with a block point. Something like this: struct vars { /* Continuation */ void *buf; void size; void **next; /* Program variable context */ int i, i_max; int j, j_max; }; void parse(struct vars *x) { if (x->next) goto *(x->next); x->buf = &x->i_max; x->size = sizeof(x->imax); x->next = &&read_1; return; read_1: for (x->i = 0; x->i_max; x->i++) { for (x->j = 0; x->j_max; x->j++) { // ... } } The lucky thing is that the storage of the next state as a computed goto is immediately followed by the label. This means that no special tricks are necessary to use the limited C preprocessor to generate the following "call sequence" x->buf = &buf; x->size = sizeof(buf); x->next = &&resume_point; return; resume_point: The __COUNTER__ token can be used to generate the labels. Ok, so I've implemented it and put it online[1]. First comment I get is: "Tricky, powerful & dangerous." :) Some people really don't like this. Preferring "stupid" code to make maintainability easier. Fair point. Still a neat hack though, if used with the appropriate caution. [1] http://zwizwa.be/git/shaco Entry: How to design binary data structures? Date: Fri Mar 1 13:10:33 CET 2013 The main trouble is serialization of in-memory linked data structures. However, there is a solution to this problem in wide use already: application binaries. The program that prepares those binaries is a linker, often split in 2 phases: compile time and run time linking. This can be re-used by letting a linker generate a binary with a "base address" of 0. The only measure this requires at load time is the translation of all pointers to base-relative addressing. It's probably even possible to include a relocation table in the binary, so this doesn't need to be done manually. EDIT: For my current case it seems simplest to perform the relocation manually at load time. The data structure traversal for linking could be re-used to perform additional consistency checks. Entry: OpenMP Date: Thu May 23 19:36:25 EDT 2013 #include int main(void) { #pragma omp parallel printf("Hello, world.\n"); return 0; } $ gcc -fopenmp omp.c $ ./a.out Hello, world. Hello, world. Hello, world. Hello, world. [1] http://en.wikipedia.org/wiki/OpenMP Entry: setting up cscope for code review Date: Tue Jul 16 10:22:34 EDT 2013 - Make a directory ~/cscope - Symlink all relevant project directors into ~/cscope Entry: Implementing abstract state machines in software Date: Mon Jul 22 14:11:22 EDT 2013 One of the practical challenges for implementing state machines in software is time delays. In software, FSMs are necessarily event-based (either poll or interrupt). I.e. computation is started when the environment changes. However, it is often hard to provide a strong time base. Though this is often required in practice: many digital protocols require time delays. How to abstract this properly? I.e. how to write a state machine in an abstract way such that both event routing and timing is done in a convenient way? It seems that a combination of two approaches would work: - A way for a FSM to request a delay to the host - A way for the FSM to inspect time passed (access to timer). It doesn't seem worth the hassle to try to implement a "thread-like" FSM api in C. This task is best left to a generator. This all fits into a more general idea of not using interrupts and/or threads in application development to provide a more predictable task switching semantics. In this scheme, interrupts could still be used to drive simple buffering in case DMA is not available, but use should probably only be generalized to single producer-consumer structures. Entry: Connecting state machines Date: Thu Jul 25 17:41:39 EDT 2013 After a couple of years in RTOS land, I find myself writing non-trivial code for a bare-bones uC. Non-trivial in the sense that the problem would normally be solved using a couple of communicating tasks. I am resorting to state machines, triggered essentially from interrupts. State machines are a pain to write from a C code pov, but they do allow a more straightforward handling of atomicity. It might actually be worth the hassle. Mutual exclusion would then in essence be handled by setting a single priority interrupt routine. Entry: Premature memory optimization Date: Mon Aug 12 12:35:58 EDT 2013 It seems that in deeply embedded software design, it is possible to save a lot of memory by replacing buffering with control flow. Essentially there are two kinds of buffers: - Limits set by hardware, e.g. USB endpoint buffer sizes - Limits set by software, e.g. communication between different API layers and tasks. Memory usage inefficiency comes from the second part. It's almost always possible to eliminate (deforest?) buffers and handle data correctly. Entry: Coroutines in C Date: Tue Aug 20 18:28:31 EDT 2013 Interesting links from HN[4] article. For later mining.. [1] http://www.embeddedrelated.com/showarticle/455.php [2] http://fanf.livejournal.com/105413.html [3] http://dotat.at/cgi/git/picoro.git [4] https://news.ycombinator.com/item?id=6243946 Entry: Linux coding style in emacs Date: Sat Dec 7 10:07:53 EST 2013 - M-x c-set-style linux - tabs as tabs [1] https://www.kernel.org/doc/Documentation/CodingStyle [2] http://stackoverflow.com/questions/5476092/emacs-changing-c-coding-style Entry: CPP defs Date: Mon Dec 9 10:30:12 EST 2013 To check which machine-specific macros are defined, e.g. __i386__ or __x86_64__ : cpp -dM /dev/null Entry: gdb -i=mi and target-stream-output Date: Mon Jan 6 12:13:00 EST 2014 I'm guessing a problem with gud in emacs is that the process stdout/stderr is not tagged with target-stream-output '@'. Entry: c++ rehash Date: Fri Feb 28 18:24:17 CET 2014 why virtual ~foo() {} and not virtual ~foo() = 0; ? Entry: C++ allocators Date: Sun Mar 9 12:21:29 CET 2014 I'd like to wrap a chunk of memory as a shared_ptr >, and then manually assert that the refcount is 1 after pushing it through some network. template class slice: public std::allocator { public: T* allocate(size_t n, const void *hint=0) { LOG("allocate(%d,%p)\n", n, hint); return (T*)_data; } void deallocate(T* p, size_t n) { LOG("deallocate(%p,n)\n", p, n); } slice(U8 *data, U32 data_length) : _data(data), _data_length(data_length) { } private: U8 *_data; U32 _data_length; }; Entry: John Hughes on debugging imperative code with QuickCheck Date: Fri Mar 28 22:08:18 EDT 2014 Some conclusions ([1] 18:00) - The same property can find many bugs. - Minimal failed tests make debugging easier. The latter is nice and convenient, but the former is actually a really interesting argument, especially for testing low-level code: behavioral models have _less_ properties than full-blown quirky implementations, so might test different code paths or sub-solutions for differing inputs. EDIT: The stateful testing is proprietary. It would be a nice challenge to find out how to do this. [1] http://www.youtube.com/watch?v=zi0rHwfiX1Q&feature=youtu.be Entry: Property based testing Date: Sun Mar 30 08:33:06 EDT 2014 - Model the state - Wrap each API call in a state transition function (post condition) - Generate random sequences of API calls until failure - Verify postcondition - On failure, shrink test The pre/post conditions seem straightforward. How to shrink tests? For an ordered list of N calls, how many ordered subsets are there? How can they be organized in a tree such that they can be searched more effectively? Simplification: the assumption is that some calls are irrelevant. I.e. we've found a way to trigger a bug and we just want to simplify that particular way - not find a new way to trigger. So a search strategy is a tree of permutations, with pruning for cases that stop failing. I.e. when removing a particular call no longer fails the test, we label it as essential and won't test the removal of any more calls with that one call removed as well. Note that it might be that there are triggerable bugs in that subtree! But that's not what we're after. So how to represent? - A stateful test is a list of API calls together with their arguments. - The test function is composed from the list of calls and the state transition model. Entry: gdb source debugging in emacs broken Date: Fri Aug 29 11:43:59 CEST 2014 Things break a lot these days.. So when I write a standalone app on zni or zoo, all works fine. M-x gdb gdb -i=mi app.el break main This shows the source file with current location set. However in other apps I have trouble getting this to work. Entry: Ivory Date: Sat Sep 6 15:48:57 CEST 2014 Seems a little limited to very simple state machines and loops... [1] https://www.youtube.com/watch?v=wC2tmo7l5Mc Entry: Cross compile ARM debian Date: Sun Sep 7 09:31:09 CEST 2014 Time to set up cross-compiler for beaglebone. Entry: Boost spirit parser Date: Mon Sep 29 11:44:38 CEST 2014 spirit[2] Recursive Decent parser generator Spirit V2 grammars are fully attributed (see Attribute Grammar)[3] - Qi: parser generator eDSL - Lex: lexer - Karma: output generator phoenix[1] Phoenix enables Functional Programming (FP) in C++ e.b. bind[4] _val is another Phoenix placeholder representing the rule's synthesized attribute[6] eps is a multi-purpose parser that returns a zero length match [1] http://www.boost.org/doc/libs/1_56_0/libs/phoenix/doc/html/index.html [2] http://www.boost.org/doc/libs/1_56_0/libs/spirit/doc/html/index.html [3] http://en.wikipedia.org/wiki/Attribute_grammar [4] http://www.boost.org/doc/libs/1_56_0/libs/bind/bind.html [5] http://www.boost.org/doc/libs/1_56_0/libs/spirit/doc/html/spirit/qi/reference/parse_api/iterator_api.html [6] http://www.boost.org/doc/libs/1_56_0/libs/spirit/doc/html/spirit/qi/tutorials/roman_numerals.html Entry: Keil C Date: Thu Oct 30 10:55:25 EDT 2014 What is the Listings/ directory? Seems to be generated at compile time. Entry: Quickcheck erlang eqc_fsm Date: Fri Oct 31 20:33:23 EDT 2014 For testing C libraries. Entry: State machine or buffers? Date: Thu Nov 6 16:33:40 EST 2014 I'm inclined to say state machine. Buffering takes away the control problem of having to define "suspend points", replacing them by straight line imperative code. However, it only works when segmentation is possible. Entry: IO state machines Date: Sat Nov 8 14:12:57 EST 2014 These things are not trivial. Currently writing one for a gdb stub. Seems that a lot can be generalized however. Something can be learned here.. Problem: - stream-oriented protocols - memory constraints In general, such a problem is best solved by a coroutine / task approach, where each independent element in the stream processing is a separate task. The problem is "suspending". Some approaches ordered in increasing suspend granularity: - pre-emptive multitasking: suspend at machine level - cooperative multitasking: suspend at a source-level coherent point - state machines: suspend The first two use stacks, the latter one uses explicit suspend points. Abstacting away suspend points trades in predictability for ease of use. When working with explicit suspend (state machines), there is a trade-off between control complexity and buffer memory usage. Depending on the kind of protocol, there is usually an upper limit to the size of a buffer that contains the "largest" protocol element in such a way that there is no space needed to store detailed intermediate processing. Take home argument: - Split design abstractly into cooperating coroutines - Add buffering to reduce suspend problems - Granularity is a trade-off between worst-case processing time, memory usage, code complexity due to fsm-ification. Entry: read(2) / write(2) on pipes Date: Wed Jan 7 18:55:18 EST 2015 If I do a write(2) on a pipe, is it guaranteed to be an atomic read(2)? Entry: Macro name concatenation Date: Thu Jan 15 23:10:50 EST 2015 There's this weird thing about macro expansion order combined with symbol concatenation. Forgot how it went and how to work around it. [1] http://stackoverflow.com/questions/7045358/how-to-cause-macro-expansion-before-concatenation Entry: union tagging Date: Sun Feb 8 15:21:28 EST 2015 /* Note that "static inline" doesn't necessarily inline. Adding this attribute will force GCC to inline the function even if optimization is off. */ #define INLINE static inline __attribute__((__always_inline__)) /* Union tagging. libopencm3 uses untyped uint32_t for almost all * peripherals. This is error prone in the abstractions uses below, * so the code here wraps identifiers in a union. The wrapper is * optimized away leaving no run-time overhead. */ #define HW_WRAP_UNION(name, type) \ union name { type id; }; \ INLINE union name name(type id) { \ union name u = {.id = id}; \ return u; \ } // wrapper | original libopencm3 type // -------------------+------------------------ HW_WRAP_UNION(hw_rcc, enum rcc_periph_clken) HW_WRAP_UNION(hw_tim, uint32_t) HW_WRAP_UNION(hw_gpio, uint32_t) Entry: State machines with simple for loops Date: Sat May 9 02:25:24 CEST 2015 In C without too much ado. Basic idea is to capture all the local variables in a struct, and have entry points for all the "wait" statements. These can be implemented as lables. wait(COND) then boils down to: label: if (!(COND)) { x->next = &label; return; } That's just one macro: #define WAIT(label,condition) \ label: if (!(condition)) { state->next = &&label; return 0; } Where we're assuming local context: - variable name "state" - field name "next" - 0 means task is blocked (other returns can have error conditions). Use computed goto (addresses) but if that's not available, use a case statement. EDIT: going for it. Also implementing sub-machine call (like subroutine, but executing an encapsulated state machine until halt). Entry: Circular counters Date: Tue May 12 19:34:48 CEST 2015 How to get a circular counter that also has a count for the number of wraparounds? Say we have depth d, but we actually want to have w wraps as well, giving w*d. Where w is as large as possible. Entry: poll vs. select Date: Fri May 15 18:17:29 CEST 2015 [1] http://daniel.haxx.se/docs/poll-vs-select.html Entry: mem_write_lazy Date: Sat Jun 6 12:22:21 CEST 2015 struct mem_write_lazy { uint32_t addr; uint32_t n; union { uint8_t byte[4]; uint32_t word; } data; }; static void mem_write_lazy_init(struct mem_write_lazy *m, uint32_t addr) { m->addr = addr; m->n = 0; } static void mem_write_lazy_flush(struct mem_write_lazy *m) { if (((m->addr & 3) == 3) && (m->n == 4)) { mem_write32(m->addr, m->data->word); } else { for(int i=0; in; i++) { mem_write32(m->addr+i, m->data->buf[i]); } } m->addr += m->n; m->n = 0; } static void mem_write_lazy_byte(struct mem_write_lazy *m, uint32_t val) { ... } Entry: trie encoding Date: Fri Jun 19 19:31:56 EDT 2015 Tried a naive representation of a trie, and it seems to be very space-inefficient. How to encode the trie in a single string? (char,offset) (nb_entries, (char,offset)) if char == '#', offset indicates offset in payload table else it is the number of characters to jump down the string The constraints: - space between entries <256 - nb entries <256 If this becomes a problem, easily solved by going to 16 bit encoding. Entry: Semaphores on Cortex M Date: Sat Jul 11 15:08:52 EDT 2015 I'm currenly using an approach of: - no RTOS - main event loop, polling state machines - ISRs interacting with FIFOs to bridge to main loop It's tricky to get the FIFOs right. Currently I rely on atomic read/write, but in some cases it would help to have real semaphores. How to implement on Cortex M? Entry: mmap shared Date: Sat Feb 13 21:40:17 EST 2016 http://stackoverflow.com/questions/16032396/how-to-get-mmaped-memory-to-sync-to-the-file Entry: Debian multiarch linker flags Date: Fri Feb 26 17:05:42 EST 2016 I want to build a 32bit intel binary on a 64bit system. I solved this problem before... Excerpt, with multiarch: ifeq ($(shell uname -m), x86_64) ## Build 32-bit binaries (emu_portcon.elf needs 32bit pointers) GCC_DIR := /usr/lib/gcc/i586-linux-gnu/4.9/ USR_LIB_DIR := /usr/lib/i386-linux-gnu/ LIB_DIR := /lib/i386-linux-gnu/ PLATFORM_CFLAGS := -m32 -fPIC PLATFORM_LDFLAGS := -m32 -L$(GCC_DIR) -L$(LIB_DIR) -L$(USR_LIB_DIR) -nostdlib PLATFORM_STARTFILES := $(USR_LIB_DIR)/crt1.o $(USR_LIB_DIR)/libc.a -lgcc_s else PLATFORM_CFLAGS := -fPIC endif This produces binaries that crash in exit() on tp. Let's do it the right way... Ok, likely not the right gcc version. Nope. 2 hosts do not have this, but tp does. Let's upgrade. This upgrade fixed it: ii libc6-i686:i386 2.19-18+deb8u2 ii libc6-i686:i386 2.21-9 Entry: small memory software Date: Sun Mar 13 22:48:28 EDT 2016 http://www.smallmemory.com/book.html Entry: uC program core Date: Sun Apr 24 15:49:21 EDT 2016 - "recursive" finite state machines = tasks with finite and static activation records / recursion depth. - hardware drivers = glue code to abstract state machines. - glue code. Components aren't hard. The problem is the glue code, essentially the abstraction for communication between tasks. I.e the OS. Communication (almost?) exclusively takes the form of writes to a memory location or larger buffer / queue, and synchronization to an external event in the form of an actual event (interrupt or function call), or a logical condition change that needs to be polled. The sequencer then orchestrates resumption of execution, possibly as simple as a round-robin poll. True boundless recursion is (almost?) never needed. Activation records can be allocated in advance for all tasks in a single nested struct/union. Mutually exclusive activation records (successive function calls) can use a C union. What about designing a language with those restrictions? Building applications then boils down to building state machines (finite tasks) that cause events and wait for events to happen. CSP or Actor model. Entry: Two abstractions Date: Sun Apr 24 16:01:36 EDT 2016 DE: event propagation with a concept of simultaneity. SM: finite-depth nested tasks (state machines) as C macros. Combination might prove useful. There were some problems, but I forgot what exactly. Something to do with the need to schedule an update "at the next time instance", i.e. queing side-effecting actions based on pure computations. Entry: Building a malloc-less, static Erlang Date: Sun Apr 24 16:21:59 EDT 2016 Essentially, hardware. The idea being that designing real-time applications is best done as state machines, not as tasks. Leave the hardware/software separation to a configuration stage: implement tasks that handle high event rates in hardware, implement tasks that change infrequently or don't need fast response times in software. Features: Processes are static: call tree is known at compile time. Processes have infinite life span. Language is immutable, implemented through mutation. Is it possible, by static analysis of message passing, to restrict the size of the message queues? Entry: Inline functions and instantiation (fast abstract code). Date: Sun May 8 23:27:35 EDT 2016 The pattern is this: - generic: struct config { int param; } static inline int fn(struct config c, int arg) { return c.param + arg; } - specific instantiation: static const struct config c0 = { .param = 123 }; int fn0(int arg) { fn(c0, arg); } It seems that most compilers know how to fold the constants even if it's a big struct. This makes it useful for deeply embedded code where you want to avoid the abstraction overhead. This can be significant for peripheral access that in most cases simplifies to one or two inline bus accesses, as opposed to a function that inspects the struct at run time). A way to use this is to put all instantiations in a main.c file, and have library code refer to instantiated versions of the generic inline code. Note that pointers work equally well as long as the struct referred to is const. If it is not, it is possible to externally modifiy it so compiler can't fold it. If struct is also static, no remnant will be left in the file. It might be better practice to use pointers. In case of non-optimized code this would be not horribily inefficient (call by value). Entry: Supporting multiple boards with optimized HW functions Date: Wed May 11 12:15:08 EDT 2016 I've found this hard to do in C. Situation: a number of boards with different hardware configurations (e.g pins, peripherals) but in need of fast HW access. For fast HW access, the solution is to use inline functions, possibly parameterized by static const structures. I.e. make sure the compiler has access to the config for constant foldig, and optionally doesn't need to include the config structs themselves in the image. However, this still creates a problem of instantiation. The idea is then to instantiate functions in the main .c file, one for each board. These main .c files provide an iterface to the library code that is shared across the boards. The HAL thus consists of plain C functions defined in main .c, used by library code. To further separate this I found it useful to define a set of inline.*.c files that are not compiled to .o and gathered in in the libary, but are inlined in the main file through #include preprocessor directives. These files expect symbols or macros defined outside of it. It's possible to have it both ways: - create .c files that reference a C object for config - either create this object as non-static, in which the config is dynamic - or create it as a const static object and inline the .c in main .c to give static config Depending on whether the code describes a singleton or a function that is used multiple times it can reference a global symbol or a function parameter for the configuration. Entry: Low level debugging Date: Fri May 13 17:41:56 EDT 2016 Today another one of those days.. In general I avoid low level programming - i.e. the kind that has no "feedback", where you have to be really clear about all the assumptions that need to be checked without there being anything saying: hey, this contract is broken. I ended up at a point where I don't even have a printf. Entry: multilib Date: Sun Jun 5 13:53:27 EDT 2016 gcc -m32 needs "gcc-multilib" package which removes "libc6-i686:i386". Some documentation here: https://wiki.debian.org/Multiarch/LibraryPathOverview Main question: what is the link between gcc-multilib and multiarch? - Multilib is a gcc feature to support multiple ABIs on one ISA. - Multiarch is to support multiple architectures on one debian system. Entry: rust 1.8 embedded Date: Tue Jun 7 13:33:47 EDT 2016 https://spin.atomicobject.com/2016/05/25/rust-1-8-embedded-firmware/ Entry: gdb reverse debugging Date: Fri Jul 15 12:07:35 EDT 2016 https://www.youtube.com/watch?v=PorfLSr3DDI https://www.gnu.org/software/gdb/news/reversible.html record reverse-stepi break/watch reverse-continue Entry: cross compile debian arm Date: Sat Aug 13 00:52:22 EDT 2016 Easy these days: dpkg --add-architecture armhf apt-get update apt-get install libncurses-dev:armhf gcc-arm-linux-gnueabihf With qemu-user an binfmt-support installed it can also execute the resulting binaries. Entry: low level dev tools Date: Tue Sep 27 17:47:32 EDT 2016 https://blog.fogcreek.com/dev-life-interview-with-casey-muratori/ There are no good dev tools out there today. That much is abundantly clear, and it is a topic on which I find almost all my systems-level programming friends agree, so I think it’s becoming a consensus among people who actually care about low-level code. Nobody is making dev tools for us anymore, and it shows. Also mentions "compression oriented programming": https://mollyrocket.com/casey/stream_0019.html Entry: const pointers Date: Sat May 25 11:00:32 EDT 2019 A pointer to a const pointer. Yeah I dont remember how this works. Use case is this: I want to get a const *var_t by having a function set a pointer. typedef struct { int val; } var_t; int get_var(var_t **ppvar) { } const var_t *pvar; get_var(&pvar); How to interleave the const stuff? Entry: C type classes Date: Tue Apr 28 21:41:45 EDT 2020 // Define the data structure abstractly as macro that invokes its // argument on every field #define my_struct_fields(def_field) \ def_field(name1, type1) \ def_field(name2, type2) // We can then plug this into a number of "definers". E.g. one that // instantiates the struct: #define DEF_STRUCT_FIELD(name,type) type name; #define DEF_STRUCT(name) struct name { name##_fields(DEF_STRUCT_FIELD) }; DEF_STRUCT(my_struct) // And others variants of the type template, e.g. serializer functions. #define DEF_WRITER_WRITE(name,type) pbuf_write_##type(p,s->name); #define DEF_WRITER(name) \ static inline void pbuf_write_##name( \ struct pbuf *p, \ const struct name *s) { \ name##_fields(DEF_WRITER_WRITE) \ } DEF_WRITER(my_struct) // Which creates write functions that take a struct defined by the // // previous macro construct, and create a writer function for that // struct. Here the buffer abstraction from uc_tools is used (pbuf). Entry: GCC asm syntax Date: Mon May 11 09:11:25 EDT 2020 TL;DR output section: =r write to register +r read and write to register input section: r read from register An example: struct channel { uint32_t setpoint; uint32_t accu; }; INLINE void channel_update( struct channel *channel, uint32_t *shiftreg) { __asm__ ( " adds %0, %0, %2 \n" // update accu, update carry " adc %1, %1, %1 \n" // shift carry flag into LSB : "+r"(channel->accu), // %0 read/write "+r"(*shiftreg) // %1 read/write : "r"(channel->setpoint) // %2 read : ); } Entry: bin to elf Date: Sat May 16 10:14:24 EDT 2020 arm-none-eabi-objcopy \ --input-target=binary \ --output-target=elf32-littlearm \ 17.10.03.bin 17.10.03.elf tom@panda:~$ arm-none-eabi-objdump -x 17.10.03.elf 17.10.03.elf: file format elf32-little 17.10.03.elf architecture: UNKNOWN!, flags 0x00000010: HAS_SYMS start address 0x00000000 Sections: Idx Name Size VMA LMA File off Algn 0 .data 00004cc5 00000000 00000000 00000034 2**0 CONTENTS, ALLOC, LOAD, DATA SYMBOL TABLE: 00000000 l d .data 00000000 .data 00000000 g .data 00000000 _binary_17_10_03_bin_start 00004cc5 g .data 00000000 _binary_17_10_03_bin_end 00004cc5 g *ABS* 00000000 _binary_17_10_03_bin_size Can this still be relocated?