Sun Jan 15 11:27:14 EST 2012
Yesterday I took a stab at parsing a real-world C file. The goals are:
1. Generate external representations of binary data structures
defined as packed structures. Main goal: generate load/save
routines in a language other than C.
2. More general problem: generate function and data wrappers for
scripting languages. Aiming at Lua for my current real-world
problem, but eventually this should fuel libprim.
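To make goal 1 concrete, here is a minimal sketch of the information a generator would have to extract from a packed structure: the name, byte offset and size of each field. The struct `sensor_record`, its fields and the `field_desc` table are made up for illustration, and `__attribute__((packed))` assumes GCC or Clang.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packed record; not taken from the real file. */
struct __attribute__((packed)) sensor_record {
    uint8_t  kind;
    uint16_t id;
    uint32_t value;
};

/* One row per field: enough to drive a load/save routine
   written in another language. */
struct field_desc {
    const char *name;
    size_t      offset;  /* byte offset inside the packed struct */
    size_t      size;    /* field size in bytes */
};

static const struct field_desc sensor_record_fields[] = {
    { "kind",  offsetof(struct sensor_record, kind),  sizeof(uint8_t)  },
    { "id",    offsetof(struct sensor_record, id),    sizeof(uint16_t) },
    { "value", offsetof(struct sensor_record, value), sizeof(uint32_t) },
};
```

A table like this says nothing about byte order, so a real generator would also need to record the endianness of the target.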
The problem I ran into is the structure of C syntax. Getting C into
an AST is not trivial, and even once it is there, the ad-hoc nature
of C syntax makes the AST hard to work with directly.
It makes a lot of sense to transform a C file into something that's
more manageable, like:
- type definitions (struct, union, enum, typedef, function).
- external (no storage) declarations for data / functions.
- internal (storage) declarations for data / functions.
For my current application the storage declarations of variables and
functions can be ignored. I'm mostly interested in 1. types and
2. external declarations.
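As a rough illustration of that classification, here is a small C fragment with each top-level item labelled by the category it would fall into. All names (`point`, `point_count`, `point_norm1`) are invented for the example.

```c
#include <stdlib.h>

/* Type definitions: struct, typedef, enum, function type. */
struct point { int x, y; };
typedef struct point point_t;
enum color { RED, GREEN, BLUE };
typedef int (*point_fn)(const point_t *);

/* External (no storage) declarations for data / functions. */
extern int point_count;
int point_norm1(const point_t *p);

/* Internal (storage) declarations for data / functions. */
int point_count = 0;
int point_norm1(const point_t *p)
{
    return abs(p->x) + abs(p->y);
}
```

For wrapper generation only the first two groups matter; the last group is what the C compiler itself turns into storage and code.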
The first approach I took was to write a parameterized recursion over
the raw C AST. It turns out that this is too complicated. Before raw C
syntax is usable, it probably needs to be translated into something
simpler, reflecting the 5 categories of objects mentioned above.
So I wonder, did anyone do this before?
Looks like I didn't really do my homework. There's plenty in the
Language.C package that deals with higher level processing. Let's
have a look at it.
Summary: I burned my fingers by underestimating the complexity of C
syntax. It is disappointing that what I want to do is more
complicated than I thought, but I am one step closer to actually
getting it done!