Sun Jan 15 11:27:14 EST 2012

Analyzing C

Yesterday I took a stab at parsing a real-world C file.  The goals are:

 1. Generate external representations of binary data structures
    defined as packed structures.  Main goal: generate load/save
    routines in a language other than C.

 2. More general problem: generate function and data wrappers for
    scripting languages.  Aiming at Lua for my current real-world
    problem, but eventually this should fuel libprim[1].
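
To make goal 1 concrete, here is a minimal sketch of the information a
generator would have to extract from a packed structure: the name,
offset and size of each field.  The struct sensor_record, its fields
and the FIELD macro are made up for illustration, and
__attribute__((packed)) assumes GCC or Clang.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packed record; name and fields are invented for this sketch. */
struct __attribute__((packed)) sensor_record {
    uint8_t  id;
    uint32_t timestamp;
    uint16_t value;
};

/* One row of the external description: enough to write a loader
   for the same byte layout in another language. */
struct field_desc {
    const char *name;
    size_t offset;
    size_t size;
};

/* Build a row from a struct type and a member name.
   sizeof does not evaluate its operand, so the null pointer is never read. */
#define FIELD(type, member) \
    { #member, offsetof(type, member), sizeof(((type *)0)->member) }

static const struct field_desc sensor_record_fields[] = {
    FIELD(struct sensor_record, id),
    FIELD(struct sensor_record, timestamp),
    FIELD(struct sensor_record, value),
};
```

Here the table is written by hand.  The point of the exercise is to
derive such a table from the C source itself, so that it does not have
to be maintained next to the struct.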

The problem I ran into is the structure of C syntax.  Getting C into
an AST is not a trivial problem in itself, and even with an AST in hand,
the ad-hoc nature of C syntax makes it hard to work with directly.

It makes a lot of sense to transform a C file into something that's
more manageable, like:

  - type definitions (struct, union, enum, typedef, function).

  - external (no storage) declarations for data / functions.

  - internal (storage) declarations for data / functions.

For my current application the storage declarations of variables and
functions can be ignored.  I'm mostly interested in 1. types and
2. external declarations.
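
As an illustration of the three categories, here is a small C fragment
with one or more examples of each.  All names (point, counter, add and
so on) are hypothetical and chosen only for this sketch.

```c
/* Type definitions: these name a shape but allocate nothing. */
struct point { int x, y; };
union  number { int i; float f; };
enum   color { RED, GREEN, BLUE };
typedef struct point point_t;
typedef int binop_fn(int, int);      /* a function type */

/* External (no storage) declarations: name and type only;
   the object or function body lives elsewhere. */
extern int counter;
extern int add(int a, int b);

/* Internal (storage) declarations: the definitions themselves. */
int counter = 0;
int add(int a, int b) { return a + b; }
```

A wrapper generator for a scripting language would only need the first
two groups: the types tell it how to marshal arguments, and the
external declarations tell it which symbols exist.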

The first approach I took was to write a parameterized recursion over
the raw C AST.  It turns out that this is too complicated.  Before raw
C syntax is usable, it probably needs to be translated into something
simpler, reflecting the 5 categories of objects mentioned above.

So I wonder, did anyone do this before?

Looks like I didn't really do my homework.  There's plenty in the
Language.C package that deals with higher level processing.  Let's
have a look at it.

Summary: I burned my fingers by failing to see the complexity of C
         syntax.  Disappointing in that what I want to do seems more
         complicated than I thought, but one step closer to actually
         getting it done!

[1] entry://../libprim