Wed Aug 11 19:15:31 CEST 2010

An argument for NULL-terminated lists

It's funny how a seemingly arbitrary choice keeps roaring its head.

If you represent an array in C or any other low-level language that
allows pointers, you have a choice between:

       - size + contiguous vector (SV)       i.e. Pascal string

       - contiguous vector + sentinel (V0)   i.e. C string

Up to now I've found that all other things being equal, the SV is
better because 1) you don't need out-of-band values and 2) you know
the size without traversal.

Granted, NULL usually isn't a problem as an out-of-band value, but it
can be problematic in other cases where the elements are not simply
memory addresses.

Knowledge of the total size can come in handy when the vector is being
accessed sequentially (i.e. during serial communication), and you need
to allocate temporary storage.  In the V0 case you need to traverse
twice: once to get the size, and once to copy the data.  This can be
expensive and in some cases even impossible, in which case you need to
start guessing the size and handle cases where you guessed to low.

However, today I've found a case where V0 is actually better: when you
want a compact representation of traversal stacks for tree data

In the V0 case you just need to store a stack of element pointers, and
pop the stack whenever the sentinel is reached (i.e. the "return"
instruction).  In the SV case you need to store _two_ words per
recursion level: one current pointer, and one counter or pointer to
the end of the data structure(3).

(1) Sometimes you do want explicit traversal stacks: recursion in C
    can be memory-inefficient due to presence of arguments or local
    variables that are "global" to the data structure descent.

(2) Actually implementing a traversal on a SV-style tree it struck me
    that I could still use a single-word stack by _copying_ the
    elements to the stack all at once.  Of course this is not a real
    solution as it less efficient overall wrt. stack space: whole
    lists are stored instead of pointers into lists.

(3) The V0 vector case are CDR-coded linked lists, meaning you can
    easily "pop" such a data structure[1].

[1] entry://20081023-144321