[<<][compsci][>>][..]Mon Feb 16 11:21:54 CET 2009

Before storming into a dark room writing yet another ad-hoc recursive descent parser, it might be a good idea to turn on the lights and look at languages first. AUTOMATA -------- Automata are classified by the class of formal languages they can recognize. Types of Finite Automata[1] : DFA: deterministic finite: each state has a exactly one transition for every symbol in the alfabet. NFA: nondeterministic finite: .... zero or more transitions ... (symbol with no state: input is _rejected_) It can be shown that these [D|N]FA can accept the same languages. These languages are called _regular_. Extensions to Finite Automata: PDA: push-down: FA + stack. NPDAs accept the _context free_ languages. LBA: linear bounded: a turing machine with a tape length proportional to the input string. LBAs accept the _context sensitive_ languages. TM: turing machine: equivalent to algorithms. Turing machines decide/accept recursive languages and recognize the recursively enumerable languages. FORMAL LANGUAGES ---------------- Let's stick to formal languages defined by a formal grammar[3]. A grammar is a set of rules for transforming strings. These rules are called _productions_. All the strings in the language are generated by applying the grammar rules to a collection of _start symbols_. If there are multiple ways of generating the same string, then the grammar is said to be _ambiguous_. Note that from a practical point (parsing) this is upside down: parsing consists of two steps: - validate a string (is it part of the language?) - as a side effect: find the exact production rule(s) to build a parse tree representation of the input (convert concrete syntax to abstract syntax) Ambiguity is problematic for the latter part: you really want a single parse tree to which to attribute semantics later. Note: The production rules approach is very different from recognition-based PEG - parsing expression grammars, where the language is the set of inputs recognized by the parser expression (a formal representation of a recursive descent parser). A context-free language can be recognized in O(n3) time, but there are a couple of subclasses for which linear algorithms exist: LR Left-to-rigth, Rightmost derivation LALR Lookahead LR LL Left-to-right, Leftmost derivation LL(k) LL with k lookahead, without backtracking LL(k) Can be implemented with recursive decent parsers. Lisp is LL(1) A derivation[4] is a convenient way to express how a particular input string can be produced by fixing a replacement strategy (i.e. always replace Leftmost or Rightmost non-terminal first) and listing the rules applied using that strategy. This is not unique for an ambiguous grammar. [1] http://en.wikipedia.org/wiki/Automata_theory [2] http://en.wikipedia.org/wiki/Context-free_languages [3] http://en.wikipedia.org/wiki/Formal_grammar [4] http://en.wikipedia.org/wiki/Context-free_grammar#Derivations_and_syntax_trees

[Reply][About]

[<<][compsci][>>][..]