[<<][compsci][>>][..]
Wed Jul 5 10:25:45 EDT 2017

recursive descent parsers

About parsers.  I don't really understand table parsers, and most
parsers I've written are recursive descent parsers, as most languages
I need to parse are very lisp-like, e.g. do not need backtracking.

I would like to get to a point where this "feedforward" nature is
expressed in a more direct way.

Take as example the GDB status language, which in first approximation
is a nested set of key value bindings and is quite representative of a
lot of configuration languages out there:

<atom>    ::= <symbol> | <number>   ; not specified further
<obj>     ::= <atom> | <dict>
<dict>    ::= "{" <entries> "}"
<entries> ::= "" | <entry> | <entry> "," <entries>
<entry>   ::= <atom> | <binding>
<binding> ::= <symbol> "=" <obj>


The property of the parser is to be able to pop a character and
determine what to do with it without putting it back.  This way the
structure is a left fold.


A natural way to represent this is to split the tokenizer and parser.
The tokenizer will just collect non-control characters in a list, and
upon encounter of a control character, will push the atom somewhere.

This "somewhere" is what this is all about.


Representation: the "current expression" is an inside-out term with a
hole in it (a zipper or cursor).  The tokenizer will fill this hole
whenever an atom is parsed, and will create a new expression with a
hole.

The point is then to establish the meaning of the control characters
as "hole transformers".  E.g. each control character is a function
that takes an atom and a hole, and produces a new hole.

The "primitive hole" is then just an object.  When the parser calls
this it will will terminate.



Now one by one, define the meaning of the control characters.

","  if the current hole is an atom, change it to a list hole,
     otherwise, append to current list hole
"="  if the current hole is an atom, change it to a binding

"}"  delete the hole at the end of the list (or fill it with nil?)

This structure needs to know the type of the current hole to be able
to transform it.  That is not the same as a lisp parser.

How to represent this?  Because of continuation transformation it
seems impossible to represent a continuation as a function.



I can't really stabilize thought about this.  There is something about
the idea of attaching concrete meaning to individual characters that
is very appealing, but oth it seems convoluted.

There is no need for backtracking, but the "hole transformation" is
definitely a form of going back and correcting a previous assumption.

Let's forget about "=", but do the list control characters first.


"}"  close current list hole with a nil
"{"  insert a list hole in the current hole
","  push a pair into the current list hole


A hole is not a continuation, it is a continuation transformer:

(Obj,K) -> K


Some context:

- parse starts with a continuation (a hole).

- a control character takes the current token, pushes it into the
  hole, and updates the hole


"}" is tricky as it has two meanings:
  - empty atom, insert []
  - non-empty atom, insert [a]


I lost it... it seems like a good idea but I can't get a hold of it.


Maybe the reason that this is difficult is that I'm not separating out
the tokens.

Thinking a bit, it seems that what makes this difficult is exactly the
postfix nature: "," and "=" change the meaning of the text that comes
before.

Maybe this can be solved in the tokenizer?  E.g. the tokenizer should
turn the input stream into a prefix-only stream, transorming [a = b]
into [= a b].

Note that this is not possible for lists, as those are delimited, but
that might not be such a problem.

In any case this does start to look like a waste of time for the
current task of parsing the GDB message format.




[Reply][About]
[<<][compsci][>>][..]