Formal Languages and the Theory of Computation

Fall 2016

Jim's parsing exercise answers

These are some answers to the exercises posed on the programming language parsers page.

1. ambiguity

Grammar 1 is ambiguous because, on this input (assuming we've used a lexer to turn variables and numbers into tokens),
id * id + id
it allows both this parse tree
                EXPR
                 |
        EXPR     OP     EXPR
         |       |       |
         id      *      EXPR   OP   EXPR
                         |     |     |
                         id    +     id
and this one
                      EXPR
                       |
            EXPR       OP    EXPR
             |         |      |
        EXPR OP EXPR   |      |
         |   |   |     |      |
         id  *   id    +      id
which are different parse trees: the first groups the input as id * (id + id), the second as (id * id) + id.
Grammar 2, on the other hand, allows only one parse tree, namely
                             EXPR
                              |
                EXPR        ADDOP   TERM
                 |            |      |
                TERM          |    FACTOR
                 |            |      |
         TERM  MULOP  FACTOR  |      |
          |      |      |     |      |
        FACTOR   |      |     |      |
          |      |      |     |      |
          id     *      id    +      id
In grammar 2, the layering expr -> term -> factor -> id builds the precedence of operations (multiplication before addition) into the grammar itself: a * can only appear inside a term, so it always ends up below any + in the tree.
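Both claims can be checked mechanically. The Python sketch below (not part of the original exercise) enumerates every parse tree of a token list by trying all ways to split the input among the symbols of each rule. Assumptions: both grammars are cut down to the rules needed for id * id + id (no number, unary minus or parentheses); grammar 2's term rule is taken to be term = factor | term mulop factor, mirroring the expr rule quoted in part 2; and parse_trees, GRAMMAR_1 and GRAMMAR_2 are illustrative names. Each tree is printed with parentheses around every binary operation.

```python
from functools import lru_cache

# Cut-down grammars (an assumption): only the rules needed for "id * id + id".
GRAMMAR_1 = {
    "EXPR": [["id"], ["EXPR", "OP", "EXPR"]],
    "OP": [["+"], ["-"], ["*"], ["/"]],
}
# Grammar 2, assuming the term rule mirrors the quoted expr rule.
GRAMMAR_2 = {
    "EXPR": [["TERM"], ["EXPR", "ADDOP", "TERM"]],
    "TERM": [["FACTOR"], ["TERM", "MULOP", "FACTOR"]],
    "FACTOR": [["id"]],
    "ADDOP": [["+"], ["-"]],
    "MULOP": [["*"], ["/"]],
}


def parse_trees(grammar, start, tokens):
    """Return every parse tree of `start` that covers all of `tokens`.

    Trees are strings with parentheses around each multi-symbol rule.
    Assumes no empty rules and no cycles of single-symbol rules, so every
    recursive call either moves down a unit chain or works on a shorter span.
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def trees(symbol, i, j):
        if symbol not in grammar:  # terminal: must be exactly this one token
            return (symbol,) if j == i + 1 and tokens[i] == symbol else ()
        found = []
        for rhs in grammar[symbol]:
            for parts in split(rhs, i, j):
                found.append(parts[0] if len(parts) == 1 else "(" + " ".join(parts) + ")")
        return tuple(found)

    def split(rhs, i, j):
        # Yield one subtree per symbol of rhs, together covering tokens[i:j].
        if len(rhs) == 1:
            for t in trees(rhs[0], i, j):
                yield [t]
            return
        for k in range(i + 1, j):  # first symbol takes tokens[i:k], the rest tokens[k:j]
            for first in trees(rhs[0], i, k):
                for rest in split(rhs[1:], k, j):
                    yield [first] + rest

    return list(trees(start, 0, len(tokens)))


tokens = "id * id + id".split()
print(parse_trees(GRAMMAR_1, "EXPR", tokens))  # ['(id * (id + id))', '((id * id) + id)']
print(parse_trees(GRAMMAR_2, "EXPR", tokens))  # ['((id * id) + id)']
```

Because every symbol in a multi-symbol rule must cover at least one token, each recursive call on such a rule works on a strictly shorter span, so the left-recursive rules cause no infinite loop in this brute-force enumerator, unlike a naive recursive descent parser.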

2. LL, LR, recursive descent

I will follow the notation and ideas in http://stackoverflow.com/questions/1044600/difference-between-an-ll-and-recursive-descent-parser.
First, grammar 1 from above, once the lexer has turned the input into tokens, can be written as
    EXPR = id
    EXPR = number
    EXPR = - EXPR
    EXPR = ( EXPR )
    EXPR = EXPR OP EXPR
    OP = +
    OP = -
    OP = *
    OP = /
where the tokens (terminals) are id, number, +, -, *, /, ( and ), the nonterminal symbols are EXPR and OP, and every choice has been listed out explicitly as a separate rule.
The string to parse is
id * id + id

LL

The two allowed operations are
    predict   apply one of the grammar rules to the left-most nonterminal in the production
    match     remove the left-most terminal from both the production and the input if they are the same
and we get
    production      input          action
    ---------------------------------------------------------------------------
    EXPR            id * id + id   predict EXPR = EXPR OP EXPR  (from "id *" lookahead)
    EXPR OP EXPR    id * id + id   predict EXPR = id            (from "id *" given EXPR OP)
    id OP EXPR      id * id + id   match
    OP EXPR         * id + id      predict OP = *               (from * lookahead)
    * EXPR          * id + id      match
    EXPR            id + id        predict EXPR = EXPR OP EXPR  (from "id +", no OP)
    EXPR OP EXPR    id + id        predict EXPR = id
    id OP EXPR      id + id        match
    OP EXPR         + id           predict OP = +
    + EXPR          + id           match
    EXPR            id             predict EXPR = id
    id              id             match
with the productions on the left building this tree from the top down:
                EXPR
                 |
        EXPR     OP     EXPR
         |       |       |
         id      *      EXPR   OP   EXPR
                         |     |     |
                         id    +     id
Note that there is no ambiguity in this LL run: it must go left to right, and with these predictions it cannot place the + higher up in the tree than the * for this grammar.
Note also that our "lookahead" choice uses k input tokens (in this case k is 2) and also uses the current production to choose which rule to apply.
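The LL trace above can be replayed with a short Python sketch. Grammar 1 is left recursive and ambiguous, so it has no true LL(k) parse table; the prediction step here simply encodes the hand-made choices from the table (an assumption, not a general LL algorithm): an EXPR whose lookahead is "id OP" and which is not already followed by OP in the production becomes EXPR OP EXPR, any other EXPR becomes id, and OP becomes the operator in the lookahead. The names OPS and ll_trace are hypothetical, and only id and the four operators are handled.

```python
OPS = {"+", "-", "*", "/"}


def ll_trace(tokens):
    """Return (production, input, action) rows for grammar 1 on `tokens`.

    The prediction rule is a hypothetical stand-in for the hand-made choices
    in the table: it peeks at 2 input tokens plus the current production.
    """
    production, tokens, rows = ["EXPR"], list(tokens), []
    while production:
        top = production[0]
        state = (" ".join(production), " ".join(tokens))
        if top == "EXPR":
            followed_by_op = len(production) > 1 and production[1] == "OP"
            if len(tokens) >= 2 and tokens[1] in OPS and not followed_by_op:
                rule = ["EXPR", "OP", "EXPR"]  # e.g. lookahead "id *", no OP yet
            else:
                rule = ["id"]  # lookahead "id" alone, or an OP already follows
            production = rule + production[1:]
            rows.append(state + ("predict EXPR = " + " ".join(rule),))
        elif top == "OP":
            if not tokens or tokens[0] not in OPS:
                raise SyntaxError(f"expected an operator, got {tokens[:1]}")
            production = [tokens[0]] + production[1:]
            rows.append(state + ("predict OP = " + tokens[0],))
        elif tokens and top == tokens[0]:
            production, tokens = production[1:], tokens[1:]  # match: drop both
            rows.append(state + ("match",))
        else:
            raise SyntaxError(f"cannot match {top!r} against {tokens[:1]}")
    if tokens:
        raise SyntaxError(f"input left over: {tokens}")
    return rows


for production, remaining, action in ll_trace("id * id + id".split()):
    print(f"{production:<16}{remaining:<15}{action}")
```

Running it prints the same twelve predict/match rows as the table, ending with the final match of id.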

LR

Now the two allowed operations are
    reduce   reverse a grammar rule: replace the symbols at the end of the workspace that match its right side (possibly several) with the one symbol on its left side
    shift    move the next (left-most) terminal from the input to the end of the workspace
and we get
    workspace        input          action
    ---------------------------------------------------------------------------
                     id * id + id   shift
    id               * id + id      reduce EXPR -> id  (match right side; replace with left)
    EXPR             * id + id      shift
    EXPR *           id + id        reduce OP -> *     (note match is on part of workspace)
    EXPR OP          id + id        shift
    EXPR OP id       + id           reduce EXPR -> id
    EXPR OP EXPR     + id           reduce EXPR -> EXPR OP EXPR
    EXPR             + id           shift
    EXPR +           id             reduce OP -> +
    EXPR OP          id             shift
    EXPR OP id                      reduce EXPR -> id
    EXPR OP EXPR                    reduce EXPR -> EXPR OP EXPR
    EXPR
and we stop successfully there, with no input left to shift and the workspace reduced to the start symbol, EXPR.
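This time the sequence of workspaces shows the parse tree upside down, with the start symbol at the bottom. Because EXPR OP EXPR is reduced before the + is shifted, this run builds the tree (id * id) + id, not the id * (id + id) tree produced by the LL run.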
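Here is a minimal Python sketch of the shift-reduce run above. It is not a real LR parser: there is no state machine or parse table, just a hypothetical greedy policy (reduce whenever the end of the workspace matches the right side of some rule, otherwise shift) that happens to reproduce the steps in the table. RULES and shift_reduce are illustrative names, and only id and the four binary operators of grammar 1 are included.

```python
# Grammar 1 restricted to id and binary operators (an assumption for brevity).
RULES = [
    ("EXPR", ["id"]),
    ("EXPR", ["EXPR", "OP", "EXPR"]),
    ("OP", ["+"]), ("OP", ["-"]), ("OP", ["*"]), ("OP", ["/"]),
]


def shift_reduce(tokens, start="EXPR"):
    """Greedy shift-reduce run; returns (steps, accepted).

    Each step is (workspace, remaining input, action), recorded before the
    action is applied, like the rows of the table.
    """
    workspace, tokens, steps = [], list(tokens), []
    while True:
        for lhs, rhs in RULES:
            if workspace[-len(rhs):] == rhs:  # right side matches the end
                steps.append((" ".join(workspace), " ".join(tokens),
                              f"reduce {lhs} -> {' '.join(rhs)}"))
                workspace[-len(rhs):] = [lhs]  # replace it with the left side
                break
        else:  # no rule matched the end of the workspace
            if not tokens:
                return steps, workspace == [start]
            steps.append((" ".join(workspace), " ".join(tokens), "shift"))
            workspace.append(tokens.pop(0))


steps, accepted = shift_reduce("id * id + id".split())
for workspace, remaining, action in steps:
    print(f"{workspace:<17}{remaining:<15}{action}")
print("accepted:", accepted)
```

The greedy policy always reduces EXPR OP EXPR before shifting the next operator, so it groups every operator to the left with equal precedence (id + id * id comes out as (id + id) * id). A table-driven LR parser for grammar 1 would instead hit a shift/reduce conflict at that point because the grammar is ambiguous; parser generators such as yacc and bison let you resolve such conflicts with precedence declarations like %left.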

recursive descent

As per the discussions at
A recursive descent algorithm
So, in terms of a recursive descent approach,
    terminal          means "must match and consume that token from the input now"
    A (nonterminal)   means "recursively call the function A(), which tries to match an A"
    A B (sequence)    means "call A(), and if that works, call B()"
    A | B (choice)    means "call A(); if that fails, back up and call B() instead"
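The four meanings listed above map directly onto small Python functions. In this sketch (illustrative names, not from the original), a parser takes the token list and a position and returns the position just after what it matched, or None on failure; choice backs up simply by retrying the next alternative from the same starting position. This choice commits to the first alternative that succeeds, which is simpler than full backtracking. The example grammar is a hypothetical right-recursive one, chosen so that no rule starts by calling itself.

```python
def terminal(tok):
    """terminal: must match and consume `tok` right now, or fail."""
    def parse(tokens, pos):
        return pos + 1 if pos < len(tokens) and tokens[pos] == tok else None
    return parse


def seq(*parsers):
    """A B: call A(), and if that works, call B() where A() left off."""
    def parse(tokens, pos):
        for p in parsers:
            pos = p(tokens, pos)
            if pos is None:
                return None
        return pos
    return parse


def choice(*parsers):
    """A | B: call A(); if that fails, back up to the same position and call B()."""
    def parse(tokens, pos):
        for p in parsers:
            result = p(tokens, pos)
            if result is not None:
                return result
        return None
    return parse


# Nonterminals are plain functions, so a rule can refer to itself recursively.
# Hypothetical grammar:  expr = term addop expr | term
#                        term = id mulop term | id
addop = choice(terminal("+"), terminal("-"))
mulop = choice(terminal("*"), terminal("/"))


def term(tokens, pos):
    return choice(seq(terminal("id"), mulop, term), terminal("id"))(tokens, pos)


def expr(tokens, pos):
    return choice(seq(term, addop, expr), term)(tokens, pos)


def parses(tokens):
    """True if expr matches the whole token list."""
    return expr(tokens, 0) == len(tokens)


print(parses("id * id + id".split()))  # True
print(parses("id +".split()))          # False
```

The order of alternatives matters with this kind of choice: if expr were written as choice(term, seq(term, addop, expr)), the first alternative would already succeed on id * id, the parser would stop before the +, and parses would return False for the same input.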
For this to work, the rules must not contain left recursion (direct, or implied through other rules), or the parser will fail by descending forever.
Example of failing by left recursion:
    A -> B | A    ; OK: B() is tried first, and if it succeeds the recursion stops
    A -> A | B    ; _not_ OK: it tries A(A(A(A(...)))) and never gets to back up and try B
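The two example rules can be tried directly in Python. In this sketch (hypothetical names; B is taken to be a single token "b"), each rule becomes a function that tries its alternatives in order, and Python's RecursionError, raised once the default recursion limit is exceeded, stands in for "descending infinitely".

```python
def b(tokens, pos):
    # B: matches one "b" token (a hypothetical terminal for this demo)
    return pos + 1 if pos < len(tokens) and tokens[pos] == "b" else None


def a_ok(tokens, pos):
    # A -> B | A : B() is tried first; if it succeeds, A() is never re-entered
    result = b(tokens, pos)
    return result if result is not None else a_ok(tokens, pos)


def a_bad(tokens, pos):
    # A -> A | B : the first alternative calls A() again without consuming
    # anything, so B() is never reached and there is nothing to back up to
    result = a_bad(tokens, pos)
    return result if result is not None else b(tokens, pos)


def terminates(rule, tokens):
    """True if rule(tokens, 0) returns, False if it recurses without end."""
    try:
        rule(tokens, 0)
        return True
    except RecursionError:
        return False


print(terminates(a_ok, ["b"]))   # True
print(terminates(a_bad, ["b"]))  # False
print(terminates(a_ok, ["c"]))   # False: B() fails, so A() keeps calling itself
```

The last line shows a caveat: A -> B | A is only safe when B() succeeds. If B() fails, A() is called again at the same position and also recurses without end.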
Note that grammar 2 is written in a way that makes recursive descent possible. As written, this rule tries its non-recursive alternative first:
expr = term | expr addop term
As a recursive descent algorithm, this says to try matching a term first, which can stop the recursive descent.
But if the rule had been written
expr = expr addop term | term
which is the same grammar as far as context-free grammars are concerned (the order of the alternatives does not matter there), the recursive descent algorithm would call expr() recursively forever, descending the left-most branch of the possible parse tree without consuming any input and without any way to backtrack.
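A common way around this in hand-written recursive descent parsers (a standard technique, but not the one described in the text) is to replace the left-recursive rule with iteration, expr = term { addop term }, and build the left-leaning tree inside a loop. The Python sketch below does this for grammar 2. The factor rule (id, number, or a parenthesized expr) is an assumption modeled on grammar 1, Parser and parse are illustrative names, and trees are nested tuples (operator, left, right).

```python
class Parser:
    """Recursive descent for grammar 2 with left recursion turned into loops.

    expr   = term   { addop term }     (in place of expr = term | expr addop term)
    term   = factor { mulop factor }
    factor = id | number | ( expr )    (assumed)
    """

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self, tok):
        # terminal: must match and consume this token now
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r} at {self.pos}, got {self.peek()!r}")
        self.pos += 1
        return tok

    def expr(self):
        tree = self.term()  # always consume a term before looping
        while self.peek() in ("+", "-"):
            op = self.eat(self.peek())
            tree = (op, tree, self.term())  # extend the tree to the left
        return tree

    def term(self):
        tree = self.factor()
        while self.peek() in ("*", "/"):
            op = self.eat(self.peek())
            tree = (op, tree, self.factor())
        return tree

    def factor(self):
        if self.peek() == "(":
            self.eat("(")
            tree = self.expr()
            self.eat(")")
            return tree
        if self.peek() in ("id", "number"):
            return self.eat(self.peek())
        raise SyntaxError(f"unexpected {self.peek()!r} at {self.pos}")


def parse(tokens):
    parser = Parser(tokens)
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected {parser.peek()!r} at {parser.pos}")
    return tree


print(parse("id * id + id".split()))  # ('+', ('*', 'id', 'id'), 'id')
print(parse("id - id - id".split()))  # ('-', ('-', 'id', 'id'), 'id')
```

The loop never calls expr() again before consuming a term, so it cannot descend forever, yet it still produces (id * id) + id, the same shape as grammar 2's parse tree in part 1, and groups id - id - id as (id - id) - id.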