Jim's parsing exercise answers
1. ambiguity
Grammar 1 is ambiguous because on this
input (assuming we've used a lexer
to turn variables and numbers into tokens)
  id * id + id
it allows both this parse tree
        EXPR
         |
  EXPR   OP      EXPR
   |     |        |
   id    *   EXPR OP EXPR
              |   |   |
              id  +   id
and this one
                EXPR
                 |
       EXPR      OP  EXPR
        |        |    |
   EXPR OP EXPR  |    |
    |   |   |    |    |
    id  *   id   +    id
which are different parse trees.
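As a quick check of this claim, here is a minimal Python sketch that enumerates every parse of a token list. It assumes only the two rules of grammar 1 that matter for this input (EXPR = id and EXPR = EXPR OP EXPR); the function name `parses` and the nested-tuple tree format are illustrative choices, not part of the exercise.

```python
OPS = {"+", "-", "*", "/"}

def parses(tokens):
    """Return every parse tree of tokens under
    EXPR = id  and  EXPR = EXPR OP EXPR  (a subset of grammar 1).
    A tree is either the string "id" or a tuple (left, op, right)."""
    if len(tokens) == 1:
        return ["id"] if tokens[0] == "id" else []
    trees = []
    # Try every operator position as the top-level OP of the tree.
    for k in range(1, len(tokens) - 1):
        if tokens[k] in OPS:
            for left in parses(tokens[:k]):
                for right in parses(tokens[k + 1:]):
                    trees.append((left, tokens[k], right))
    return trees

for tree in parses("id * id + id".split()):
    print(tree)
```

For id * id + id this yields two trees: one grouping id * (id + id) and one grouping (id * id) + id, which are the two trees drawn above.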
On the other hand, grammar 2 only has one possibility, namely
                        EXPR
                         |
          EXPR         ADDOP   EXPR
           |             |      |
          TERM           |     TERM
           |             |      |
  TERM   MULOP  FACTOR   |    FACTOR
   |       |      |      |      |
 FACTOR    |      |      |      |
   |       |      |      |      |
   id      *      id     +      id
For grammar 2, the chain of rules expr -> term -> factor -> id
builds the precedence of operations
(i.e. multiplication before addition) into the grammar itself.
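To check that only one tree is possible, the sketch below counts parse trees for a layered grammar. The rule set is an assumption: it takes the expr rule quoted in these notes (expr = term | expr addop term) and fills in term and factor rules of the same shape, restricted to id operands. The names `GRAMMAR` and `count_parses` are hypothetical.

```python
from functools import lru_cache

# Assumed shape of grammar 2 (a reconstruction, not the exercise's exact text).
GRAMMAR = {
    "EXPR":   [("TERM",), ("EXPR", "ADDOP", "TERM")],
    "TERM":   [("FACTOR",), ("TERM", "MULOP", "FACTOR")],
    "FACTOR": [("id",)],
    "ADDOP":  [("+",), ("-",)],
    "MULOP":  [("*",), ("/",)],
}

def count_parses(tokens, start="EXPR"):
    """Count the distinct parse trees that derive tokens from start."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def derive(symbol, i, j):
        # Number of ways symbol derives tokens[i:j].
        if symbol not in GRAMMAR:  # terminal: must be exactly one matching token
            return int(j == i + 1 and tokens[i] == symbol)
        return sum(split(rhs, i, j) for rhs in GRAMMAR[symbol])

    @lru_cache(maxsize=None)
    def split(rhs, i, j):
        # Number of ways the symbol sequence rhs derives tokens[i:j].
        if len(rhs) == 1:
            return derive(rhs[0], i, j)
        # The first symbol takes tokens[i:k]; every later symbol needs >= 1 token.
        return sum(derive(rhs[0], i, k) * split(rhs[1:], k, j)
                   for k in range(i + 1, j - len(rhs) + 2))

    return derive(start, 0, len(tokens))

print(count_parses("id * id + id".split()))
```

Under these assumed rules the count for id * id + id is 1, because the * can only appear inside a TERM and the + only at the EXPR level.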
2. LL, LR, recursive descent
First, grammar 1 from above, after the lexer runs to turn things into tokens,
can be written as
EXPR = id
EXPR = number
EXPR = - EXPR
EXPR = ( EXPR )
EXPR = EXPR OP EXPR
OP = +
OP = -
OP = *
OP = /
where the tokens (terminals) are (id, number, +, -, *, /, and the two parentheses),
the nonterminal symbols are (EXPR, OP), and all the different choices
have been listed out explicitly.
The string to parse is
id * id + id
LL
The two allowed operations are
  predict   apply one of the grammar rules on the left-most possible symbol
  match     remove the left-most terminal from both the production and the input string if they match
and we get
production      input          action
--------------------------------------------------------------------------------
EXPR            id * id + id   predict EXPR = EXPR OP EXPR (from "id *" lookahead)
EXPR OP EXPR    id * id + id   predict EXPR = id (from "id *" given EXPR OP)
id OP EXPR      id * id + id   match id
OP EXPR         * id + id      predict OP = * (from * lookahead)
* EXPR          * id + id      match
EXPR            id + id        predict EXPR = EXPR OP EXPR (from "id +", no OP)
EXPR OP EXPR    id + id        predict EXPR = id
id OP EXPR      id + id        match
OP EXPR         + id           predict OP = +
+ EXPR          + id           match
EXPR            id             predict EXPR = id
id              id             match
with the productions on the left providing the top-down tree :
        EXPR
         |
  EXPR   OP      EXPR
   |     |        |
   id    *   EXPR OP EXPR
              |   |   |
              id  +   id
Note that there is no ambiguity in the LL algorithm :
it must go left-to-right and therefore cannot
match the + higher up in the tree than * for this grammar.
Note also that our "lookahead" choice uses k input tokens
(in this case k is 2) and also uses the current production
to choose which rule to apply.
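The predict/match loop can be sketched in Python as follows. This is only an illustration under stated assumptions: it handles the id-and-operator subset of grammar 1 (no numbers, unary minus, or parentheses), and the prediction rule is hard-coded to mimic the choices made in the trace: two tokens of lookahead plus a check of whether an OP is already waiting in the production. The name `ll_trace` is hypothetical.

```python
OPS = {"+", "-", "*", "/"}

def ll_trace(tokens):
    """Return the list of predict/match actions for the given tokens."""
    production = ["EXPR"]   # symbols still to be expanded or matched
    tokens = list(tokens)   # remaining input
    actions = []
    while production:
        top = production[0]
        if top == "EXPR":
            # Lookahead of k = 2 tokens, plus a peek at the current production.
            sees_id_op = (len(tokens) >= 2 and tokens[0] == "id"
                          and tokens[1] in OPS)
            op_pending = production[1:2] == ["OP"]
            if sees_id_op and not op_pending:
                rhs = ["EXPR", "OP", "EXPR"]
            elif tokens[:1] == ["id"]:
                rhs = ["id"]
            else:
                raise SyntaxError(f"cannot predict EXPR at {tokens[:2]}")
            actions.append("predict EXPR = " + " ".join(rhs))
            production[:1] = rhs
        elif top == "OP":
            if not tokens or tokens[0] not in OPS:
                raise SyntaxError(f"expected an operator at {tokens[:1]}")
            actions.append("predict OP = " + tokens[0])
            production[0] = tokens[0]
        else:
            # top is a terminal, so it must equal the next input token.
            if tokens[:1] != [top]:
                raise SyntaxError(f"expected {top} at {tokens[:1]}")
            actions.append("match " + top)
            production.pop(0)
            tokens.pop(0)
    if tokens:
        raise SyntaxError(f"unparsed input {tokens}")
    return actions

for action in ll_trace("id * id + id".split()):
    print(action)
```

On id * id + id this produces twelve actions, starting with predict EXPR = EXPR OP EXPR and ending with match id.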
LR
Now the two allowed operations are
  reduce    reverse a grammar rule, replacing possibly several symbols with one
  shift     move the next (left-most) terminal from the input to the end of the workspace
workspace       input          action
--------------------------------------------------------------------------------
                id * id + id   shift
id              * id + id      reduce EXPR -> id (match right side; replace with left)
EXPR            * id + id      shift
EXPR *          id + id        reduce OP -> * (note match is on part of workspace)
EXPR OP         id + id        shift
EXPR OP id      + id           reduce EXPR -> id
EXPR OP EXPR    + id           reduce EXPR -> EXPR OP EXPR
EXPR            + id           shift
EXPR +          id             reduce OP -> +
EXPR OP         id             shift
EXPR OP id                     reduce EXPR -> id
EXPR OP EXPR                   reduce EXPR -> EXPR OP EXPR
EXPR
and we stop successfully there,
with no input left to shift and our workspace reduced to the start symbol,
namely EXPR.
This time the workspace shows the parse tree upside down,
with start at the bottom.
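The shift/reduce loop can be sketched in Python like this. It is a deliberately naive version, not a table-driven LR parser: it reduces whenever the end of the workspace matches the right side of a rule and shifts otherwise. It assumes the id-and-operator subset of grammar 1, and the names `RULES` and `shift_reduce` are hypothetical.

```python
# (left side, right side) pairs; the longest right side is listed first.
RULES = [
    ("EXPR", ["EXPR", "OP", "EXPR"]),
    ("EXPR", ["id"]),
    ("OP", ["+"]),
    ("OP", ["-"]),
    ("OP", ["*"]),
    ("OP", ["/"]),
]

def shift_reduce(tokens):
    """Return (final workspace, list of actions) for the given tokens."""
    workspace, rest, actions = [], list(tokens), []
    while True:
        for lhs, rhs in RULES:
            # Reduce if the end of the workspace equals a rule's right side.
            if workspace[-len(rhs):] == rhs:
                workspace[-len(rhs):] = [lhs]
                actions.append(f"reduce {lhs} -> {' '.join(rhs)}")
                break
        else:
            # No rule applied: shift the next token, or stop if none is left.
            if not rest:
                break
            workspace.append(rest.pop(0))
            actions.append("shift")
    return workspace, actions

workspace, actions = shift_reduce("id * id + id".split())
print(workspace)
for action in actions:
    print(action)
```

The parse succeeds when the final workspace is just ["EXPR"]. Because this sketch always reduces as early as it can, it groups id * id before it ever sees the +.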
recursive descent
As per the discussions at
A recursive descent algorithm
- handles the same languages as LL(*)
- has worst case O(k**n) for an LL(k) grammar on a string of length n
- is an implementation of a "parsing expression grammar" (PEG)
- ... which looks like a grammar, but (A | B | C) means
- "first try A, and if that fails, then try B" ... namely _branching_
- ... which means that the order matters (unlike a context free grammar)
So in terms of a recursive descent approach,
  terminal
      means "must match and 'consume' that token now from input"
  A (symbol)
      means recursively call the function A which tries to match that
  A B (sequence)
      means "call A(), and if that works call B()"
  A | B (choice)
      means "call A(), if that fails, back up and call B() instead"
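These four building blocks can be sketched as small Python functions. Each parser here takes (tokens, pos) and returns the new position on success or None on failure; that convention, the helper names, and the example rule expr = id op expr | id (right recursive, chosen so the sketch terminates) are all assumptions made for illustration.

```python
def terminal(tok):
    """Must match and consume this token now."""
    def parse(tokens, pos):
        if pos < len(tokens) and tokens[pos] == tok:
            return pos + 1
        return None
    return parse

def sequence(*parsers):
    """A B : call A, and if that works call B from where A stopped."""
    def parse(tokens, pos):
        for p in parsers:
            pos = p(tokens, pos)
            if pos is None:
                return None
        return pos
    return parse

def choice(*parsers):
    """A | B : call A; if that fails, back up to pos and call B instead."""
    def parse(tokens, pos):
        for p in parsers:
            end = p(tokens, pos)
            if end is not None:
                return end
        return None
    return parse

def symbol(name, rules):
    """Look the rule up at call time, so rules can refer to themselves."""
    def parse(tokens, pos):
        return rules[name](tokens, pos)
    return parse

rules = {}
rules["op"] = choice(*(terminal(t) for t in "+-*/"))
rules["expr"] = choice(
    sequence(terminal("id"), symbol("op", rules), symbol("expr", rules)),
    terminal("id"),
)

print(rules["expr"]("id * id + id".split(), 0))
```

A result equal to the number of tokens means the whole input was consumed. Swapping the two alternatives of expr would make it stop after the first id, which is the sense in which the order of choices matters.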
To work, we must not have left recursion (or implied left recursion) in rules,
or this will fail by descending infinitely.
Example of failing by left recursion :
  A -> B | A   ; OK : A(B()) is tried first, and can succeed to stop recursion
  A -> A | B   ; _not_ OK : it tries A(A(A(A(...)))) and never backtracks
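The difference between the two orderings can be seen in a small Python sketch. It assumes B is a rule that matches the single token "b"; the function names are hypothetical. Python reports the runaway descent as a RecursionError once its recursion limit is hit.

```python
def B(tokens, pos):
    """Terminal rule: match the single token 'b'."""
    return pos + 1 if tokens[pos:pos + 1] == ["b"] else None

def A_good(tokens, pos):
    """A -> B | A : try B first, fall back to A only if B fails."""
    end = B(tokens, pos)
    if end is not None:
        return end
    return A_good(tokens, pos)

def A_bad(tokens, pos):
    """A -> A | B : calls itself before consuming any input."""
    end = A_bad(tokens, pos)
    if end is not None:
        return end
    return B(tokens, pos)

def terminates(rule, tokens):
    """True if the rule returns, False if it recurses without end."""
    try:
        rule(tokens, 0)
        return True
    except RecursionError:
        return False

print(terminates(A_good, ["b"]))  # True
print(terminates(A_bad, ["b"]))   # False
```

Note the word "can" in the first case: with this sketch, A_good still recurses without end on input where B fails, such as ["c"], because the fallback to A consumes nothing.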
Note that grammar 2 is written in a way that makes
recursive descent possible. As written, this rule does _not_ put the left recursive choice first
expr = term | expr addop term
As a recursive descent algorithm, this says to try matching
a term first, which can stop the recursive descent.
But if the rule had been written
expr = expr addop term | term
which is the same as far as the context free grammar is concerned,
the recursive descent algorithm would call expr() recursively
forever, and descend down the left-most branch of the
possible parse tree without any way to backtrack.
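One common way to sidestep the problem in hand-written code, which is a different technique from reordering the choices, is to replace the left recursion with a loop: expr = term (addop term)*. The Python sketch below does this for an assumed grammar 2 with id and number operands only; the function names and the nested-tuple tree format are illustrative.

```python
def parse_expr(tokens, pos=0):
    """expr = term (addop term)* ; returns (tree, next position)."""
    left, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        right, pos = parse_term(tokens, pos + 1)
        left = (left, op, right)   # grow the tree to the left
    return left, pos

def parse_term(tokens, pos):
    """term = factor (mulop factor)*"""
    left, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("*", "/"):
        op = tokens[pos]
        right, pos = parse_factor(tokens, pos + 1)
        left = (left, op, right)
    return left, pos

def parse_factor(tokens, pos):
    """factor = id | number (parentheses omitted in this sketch)"""
    if pos < len(tokens) and tokens[pos] in ("id", "number"):
        return tokens[pos], pos + 1
    raise SyntaxError(f"expected id or number at position {pos}")

tree, end = parse_expr("id * id + id".split())
print(tree)  # (('id', '*', 'id'), '+', 'id')
```

Each function consumes at least one token before it loops or calls anything else, so the descent always makes progress, and the tree still groups * more tightly than +.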