Parsing and compiling

Software Packages

* Parse::RecDescent (Perl)
* Lex & Yacc (C and C++; see http://dinosaur.compilertools.net/)
* Bison (successor to yacc; generates LALR parsers from BNF-like grammars; usually paired with Flex)
* Flex (successor to lex; recognizes tokens from regular expressions)
* SableCC (LALR from BNF; sablecc.org), JavaCC (Sun, recursive descent; javacc.dev.java.net), Jaccie (educational, LL,LR,SLR,LALR, interactive GUI) (Java; see http://catalog.compilertools.net/java.html for a longer list)

Vocabulary

* Context Free Grammar
* Types of parsing algorithms
  * "SLR" (Simple LR) < "LALR" (Look-Ahead LR) < "LR(1)" parsers, ordered by the range of grammars each can handle.
    See "LR parser" (Left-to-right input; Rightmost derivation, produced in reverse) on Wikipedia. Also called "bottom-up" parsing.
  * "LL" parser, also called "top-down" parsing
  * "Recursive descent" parsing, another type of top-down parsing.
    From the Wikipedia articles "left recursion" and "recursive descent":
    * "each procedure usually implements one of the production rules"
    * These parsers can't handle "left recursive rules", e.g.

          <Expr> ::= <Expr> + <Term>

      because the associated procedure, e.g.

          function Expr(){ Expr(); match('+'); Term();}

      goes into an infinite loop as it tries to go all the way down the left side of the parse tree.
      This can also happen indirectly, e.g.

          A ::= B "a" | C
          B ::= A "b" | D

      is effectively

          A ::= ( A "b" | D ) "a" | C

      which is left recursive. Usually left recursive rules can be rewritten to avoid this problem, e.g.

          A ::= A "a" | A "b" | B | C

      is changed to

          A  ::= B A' | C A'
          A' ::= "" | "a" A' | "b" A'

      where the new non-terminal A' is intuitively the tail of A.
* BNF (Backus-Naur form) is a standard notation for writing context-free grammars, widely used to define programming languages. There are a number of variations in common use. Some of the basic notations include

      ::=          is defined as
      |            or
      < >          non-terminal (to distinguish from terminals; optional)
      "terminal"   terminal (especially a single char, e.g. ")" )
      [ ]          optional
      { }          repeat zero or more times
      ( )          grouping

  Here's an example that expresses BNF (without <>) in BNF, from http://cui.unige.ch/db-research/Enseignement/analyseinfo/AboutBNF.html.

      syntax        ::= { rule }
      rule          ::= identifier "::=" expression
      expression    ::= term { "|" term }
      term          ::= factor { factor }
      factor        ::= identifier
                      | quoted_symbol
                      | "(" expression ")"
                      | "[" expression "]"
                      | "{" expression "}"
      identifier    ::= letter { letter | digit }
      quoted_symbol ::= """ { any_character } """

  Here's another example that shows when the <> might be used, when words without <> are meant to be terminals.

      := if then [ else ] end if ;

---------------------------------------------------------------------

Lingua::Romana::Perligata on CPAN

* see the paper http://www.csse.monash.edu.au/~damian/papers/HTML/Perligata.html
  (For the purposes of this topic, check out particularly the grammar in Appendix B.)
* The Sieve of Eratosthenes :

      #! /usr/bin/perl -w
      use Lingua::Romana::Perligata;

      adnota Illud Cribrum Eratothenis

      maximum tum val inquementum tum biguttam tum stadium egresso scribe.
      vestibulo perlegementum da meo maximo .
      maximum tum novumversum egresso scribe.
      da II tum maximum conscribementa meis listis.
      dum damentum nexto listis decapitamentum fac sic
          lista sic hoc tum nextum recidementum cis vannementa da listis.
          next tum biguttam tum stadium tum nextum tum novumversum scribe egresso.
      cis

  "If you have to ask 'Why?', then the answer probably won't make any sense to you either."