Sat Jan 14 11:20:06 EST 2012


I want to have a standard approach for analyzing C files.  This means
in practice to find a way to factor the recursion over a C AST into
something that is manageable.  I'm taking the approach of writing a
single recursion, parameterized by a type class of analysis methods.

This seems to go quite well, though the first hurdle is recursive
structure declarations.  In Language.C both toplevel declarations and
member declarations are captured in the same CDeclaration type.  We
really want to separate those, because of the "just gather" semantics
wanted for the toplevel declarations, and recursive "compile tree"
semantics of the member declarations.

It seems that "managing the recursion" is 99% of the problem.

Anyways, how to capture this?  It's quite complex, but I think with a
proper structure this can be captured in a good way.  What assumptions
to make?

1.  The result should be a tree structure of a single type.  This
    keeps type signatures simple.

2.  Recursion should be monadic: this allows the user to collect
    statistics instead of just building a tree structure.

3.  Toplevel forms should have m () return type because they don't fit
    the nested structure well.  There should be a way to "ignore"

Yeah this needs a different approach.  Stacking all that functionality
in a single class is not a good idea.

Let's keep CAnalyze but rename it to CAtop: toplevel analysis.  Then,
when gathering lists of function / union / structure definitions /
declarations it's possible to write individual analysis for them.

Not so simple.  Maybe this needs a top down approach: write the wanted
ADT for the analysis result, then 

So, for structs, the problem is that a structure type can be nested,
i.e. for:

T = { T1 m1, T2 m2, ... }

In practice this doesn't happen so much, but it needs to be handled.

The trouble is that in my handwaving I didn't think of this in the
eventual result, which would be a Lua description of a binary data
structure, used for parsing.  There is no real issue though, just that
it needs to be recursive.

It seems to be best to skip on the typeclass and first write a
recursion that produces a known type, then later replace the
constructors with type class functions if that's still needed.

Struct is too complicated.  Let's take an error-driven approach to
implement the subset, starting with:

caData = struct where
  oops ast = error $ show $ strip ast
  struct = oops    

Hm... messed it up again.  It almost worked, but only for structs.
Typedefs didn't show up which made the type info incomplete.  To make
it complete, it needs all typedefs also.

I'm also confusing type name bindings and structure information.

Trouble is that in C syntax this all happily lives together..

What a mess!

Next: use the same error-driven approach, but make sure first that the
semantics of the eventual type description structure is sane.

Note that this needs two parts:

  - declaration of types
  - declaration of variables / definition of functions

It's probably best to keep those to COMPLETELY separate

What about using an SSA-style form for the types, providing a name for
each unnamed subtypes.  This then enables at least a flat

The main idea seems to be to separate type definitions from type

What about this?  It doesn't cover everthing (such as arrays) but it
seems like a good start.

data TypeBase = Prim String  
              | User String    -- typedef
              | Enum String
              | Struct String 
              | Union  String
              deriving (Eq, Show)

data TypeRef = TypeRef TypeBase Int                       
             deriving (Eq, Show)

data TypeDef = DefUser   String TypeRef
             | DefStruct String [TypeRef]
             | DefUnion  String [TypeRef]
             | DefEnum   String [(String, Int)]
             deriving (Eq, Show)

Maybe it's better to type the namespaces directly.

So.. using error-driven approach I get to something, but probably need
to refactor it a bit..  Also some more general parsing/matching will
be necessary to parse annotation lists.