Jim's parsing exercise answers

These are some answers to the exercises posed on the programming language parsers page.

1. ambiguity

Grammar 1 is ambiguous because, on this input (assuming we've used a lexer to turn variables and numbers into tokens),

   id * id + id

it allows both this parse tree

   EXPR
    |
    EXPR OP EXPR
    |    |  |
    id   *  EXPR OP EXPR
            |    |  |
            id   +  id

and this one

   EXPR
    |
    EXPR           OP  EXPR
    |              |   |
    EXPR OP EXPR   |   |
    |    |  |      |   |
    id   *  id     +   id

which are different parse trees.

On the other hand, grammar 2 only has one possibility, namely

   EXPR
   |
   EXPR                    ADDOP  EXPR
   |                       |      |
   TERM                    |      TERM
   |                       |      |
   TERM    MULOP  FACTOR   |      FACTOR
   |       |      |        |      |
   FACTOR  |      |        |      |
   |       |      |        |      |
   id      *      id       +      id

For grammar 2, the ordering of expr -> term -> factor -> id has the precedence of operations (i.e. multiplication before addition) built in.
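
As a rough check of the ambiguity claim, the sketch below enumerates every parse tree for a token list. It assumes a cut-down version of grammar 1 with only the rules EXPR = id and EXPR = EXPR OP EXPR; the function name parse_trees and the nested-tuple tree format are made up for this illustration.

```python
def parse_trees(tokens):
    """Return every parse tree for tokens under EXPR = id | EXPR OP EXPR."""
    ops = {"+", "-", "*", "/"}
    if tokens == ["id"]:
        return ["id"]
    trees = []
    # Try each operator in turn as the top-level OP of the tree.
    for i, tok in enumerate(tokens):
        if tok in ops:
            for left in parse_trees(tokens[:i]):
                for right in parse_trees(tokens[i + 1:]):
                    trees.append((left, tok, right))
    return trees


trees = parse_trees("id * id + id".split())
for tree in trees:
    print(tree)
# ('id', '*', ('id', '+', 'id'))
# (('id', '*', 'id'), '+', 'id')
```

Two trees come back, one for each of the grammar 1 pictures above. More than one tree for a single input is exactly what "ambiguous" means.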

2. LL, LR, recursive descent

I will follow the notation and ideas in http://stackoverflow.com/questions/1044600/difference-between-an-ll-and-recursive-descent-parser .

First, grammar 1 from above, after the lexer runs to turn things into tokens, can be written as

  EXPR = id
  EXPR = number
  EXPR = - EXPR
  EXPR = ( EXPR )
  EXPR = EXPR OP EXPR
  OP   = +
  OP   = -
  OP   = *
  OP   = /

where the tokens (terminals) are (id, number, +, -, *, /, and the two parentheses), the nonterminal symbols are (EXPR, OP), and all the different choices have been listed out explicitly.

The string to parse is

  id * id + id

LL

The two allowed operations are

 predict     apply one of the grammar rules on the left-most possible symbol
 match       remove the left terminal from production and string if they match

and we get

 production           input            action
 -----------------------------------------------
 EXPR                 id * id + id     predict   EXPR = EXPR OP EXPR  (from "id *" lookahead)
 EXPR OP EXPR         id * id + id     predict   EXPR = id    (from "id *" given EXPR OP)
 id OP EXPR           id * id + id     match id
 OP EXPR              * id + id        predict   OP = *       (from * lookahead)
 * EXPR               * id + id        match
 EXPR                 id + id          predict   EXPR = EXPR OP EXPR (from "id +", no OP)
 EXPR OP EXPR         id + id          predict   EXPR = id
 id OP EXPR           id + id          match
 OP EXPR              + id             predict   OP = +
 + EXPR               + id             match
 EXPR                 id               predict   EXPR = id
 id                   id               match

with the productions on the left providing the top-down tree :

   EXPR
    |
    EXPR OP EXPR
    |    |  |
    id   *  EXPR OP EXPR
            |    |  |
            id   +  id

Note that there is no ambiguity in the LL algorithm : it must go left-to-right and therefore cannot match the + higher up in the tree than * for this grammar.

Note also that our "lookahead" choice uses k input tokens (in this case k is 2) and also uses the current production to choose which rule to apply.
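
The two operations can be replayed mechanically. The sketch below is only a checker, not an LL parser: the sequence of predict and match choices is copied by hand from the trace above rather than computed from lookahead, and the names replay_ll and ACTIONS are invented for this example.

```python
def replay_ll(start, tokens, actions):
    """Replay predict/match actions; return True if everything is consumed."""
    production, remaining = [start], list(tokens)
    for action in actions:
        if action[0] == "predict":
            _, lhs, rhs = action
            # predict: rewrite the left-most symbol using the rule lhs = rhs
            if not production or production[0] != lhs:
                raise ValueError(f"cannot predict {lhs} in {production}")
            production[0:1] = rhs
        else:
            # match: the left terminal must equal the next input token
            if not production or not remaining or production[0] != remaining[0]:
                raise ValueError(f"cannot match {production} against {remaining}")
            production.pop(0)
            remaining.pop(0)
    return not production and not remaining


E, O = "EXPR", "OP"
ACTIONS = [
    ("predict", E, [E, O, E]), ("predict", E, ["id"]), ("match",),
    ("predict", O, ["*"]), ("match",),
    ("predict", E, [E, O, E]), ("predict", E, ["id"]), ("match",),
    ("predict", O, ["+"]), ("match",),
    ("predict", E, ["id"]), ("match",),
]
print(replay_ll("EXPR", "id * id + id".split(), ACTIONS))  # True
```

The parse succeeds when both the production and the input are empty at the end, which is what the final match in the trace achieves.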

LR

Now the two allowed operations are

 reduce    reverse a grammar rule, replacing possibly several symbols with one
 shift     move the next (left) terminal from the input to the end of the workspace

 workspace          input             action
 ---------------------------------------------
                    id * id + id      shift
 id                 * id + id         reduce   EXPR -> id  (match right side; replace with left)
 EXPR               * id + id         shift
 EXPR *             id + id           reduce   OP -> *   (note match is on part of workspace)
 EXPR OP            id + id           shift
 EXPR OP id         + id              reduce   EXPR -> id
 EXPR OP EXPR       + id              reduce   EXPR -> EXPR OP EXPR
 EXPR               + id              shift
 EXPR +             id                reduce   OP -> +
 EXPR OP            id                shift
 EXPR OP id                           reduce   EXPR -> id
 EXPR OP EXPR                         reduce   EXPR -> EXPR OP EXPR
 EXPR

Each action is applied to the workspace on its own line and produces the next line. We stop successfully at the end, with no input left to shift and the workspace reduced to the start symbol, namely EXPR.

This time the workspace shows the parse tree upside down, with start at the bottom.
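
A minimal sketch of the shift/reduce loop follows. It makes two assumptions that are not part of the discussion above: only the rules needed for this input are listed, and the parser reduces greedily whenever the end of the workspace matches the right side of a rule. A real LR parser decides between shift and reduce from a table and lookahead; greedy reduction just happens to be enough for this input. The names RULES and shift_reduce are hypothetical.

```python
RULES = [
    ("EXPR", ["EXPR", "OP", "EXPR"]),
    ("EXPR", ["id"]),
    ("OP", ["+"]),
    ("OP", ["*"]),
]


def shift_reduce(tokens):
    """Return (final workspace, trace of (workspace, action) pairs)."""
    workspace, remaining, trace = [], list(tokens), []
    while True:
        for lhs, rhs in RULES:
            # reduce: the end of the workspace matches a rule's right side
            if workspace[-len(rhs):] == rhs:
                trace.append((" ".join(workspace), f"reduce {lhs} -> {' '.join(rhs)}"))
                workspace[-len(rhs):] = [lhs]
                break
        else:
            if not remaining:
                break  # nothing to reduce and nothing left to shift
            # shift: move the next input token onto the workspace
            trace.append((" ".join(workspace), "shift"))
            workspace.append(remaining.pop(0))
    return workspace, trace


workspace, trace = shift_reduce("id * id + id".split())
for state, action in trace:
    print(f"{state:<14} {action}")
print(workspace)  # ['EXPR']
```

Because id * id is reduced to EXPR before the + is shifted, this run builds the tree with + at the top, the second of the two grammar 1 trees from part 1.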

recursive descent

As per the discussions at

A recursive descent algorithm

- handles the same languages as LL(*)
- has worst case O(k**n) for an LL(k) grammar on a string of length n
- is an implementation of a "parsing expression grammar" (PEG)
  - ... which looks like a grammar, but (A | B | C) means
    "first try A, and if that fails, then try B", and so on ... namely _branching_
  - ... which means that the order matters (unlike in a context free grammar)

So in terms of a recursive descent approach,

 terminal
    means "must match and 'consume' that token now from input"

 A        (symbol)
    means "recursively call the function A, which tries to match that symbol"

 A B      (sequence)
    means "call A(), and if that works call B()"

 A | B    (choice)
    means "call A(), and if that fails, back up and call B() instead"
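
These four building blocks can be sketched as small Python functions. This is one possible encoding, not the only one: each parser takes the token list and a position and returns the new position, or None on failure, so "backing up" is simply retrying from the old position. The rule item = ( item ) | id and all the function names are hypothetical, chosen only to exercise the four cases.

```python
def terminal(tok):
    """Must match and consume exactly this token."""
    def parse(tokens, pos):
        if pos < len(tokens) and tokens[pos] == tok:
            return pos + 1
        return None
    return parse


def sequence(*parsers):
    """Call each parser in turn; fail as soon as one fails."""
    def parse(tokens, pos):
        for p in parsers:
            pos = p(tokens, pos)
            if pos is None:
                return None
        return pos
    return parse


def choice(*parsers):
    """Try each alternative in order from the same starting position."""
    def parse(tokens, pos):
        for p in parsers:
            new_pos = p(tokens, pos)
            if new_pos is not None:
                return new_pos
        return None
    return parse


def item(tokens, pos):
    """Symbol: item = ( item ) | id, with item calling itself recursively."""
    rule = choice(sequence(terminal("("), item, terminal(")")), terminal("id"))
    return rule(tokens, pos)


print(item("( ( id ) )".split(), 0))  # 5
print(item("( id".split(), 0))        # None
```

The recursive call to item sits after a terminal, so every level of recursion consumes a token first and the descent always terminates.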

To work, we must not have left recursion (or implied left recursion) in rules, or this will fail by descending infinitely.

Example of failing by left recursion :

    A -> B | A       ; OK  A(B())  is tried first, and can succeed to stop recursion
    
    A -> A | B       ; _not_ OK since it tries A(A(A(A(...)))) and never backtracks
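
The difference between the two orderings shows up directly when the rules are written as functions. In this sketch B is assumed to match a single token "b"; the names A_ok, A_bad and terminates are invented for the illustration, and Python's recursion limit stands in for "descending infinitely".

```python
def B(tokens, pos):
    """Match the single token 'b'; return the new position or None."""
    return pos + 1 if tokens[pos:pos + 1] == ["b"] else None


def A_ok(tokens, pos):
    """A -> B | A : B is tried first, so a match stops the recursion."""
    result = B(tokens, pos)
    return result if result is not None else A_ok(tokens, pos)


def A_bad(tokens, pos):
    """A -> A | B : calls itself before consuming anything."""
    result = A_bad(tokens, pos)
    return result if result is not None else B(tokens, pos)


def terminates(parser, tokens):
    """Report whether the parser returns instead of recursing without end."""
    try:
        parser(tokens, 0)
        return True
    except RecursionError:
        return False


print(terminates(A_ok, ["b"]))   # True
print(terminates(A_bad, ["b"]))  # False
```

Note that A_ok only stops here because B matches the input; the first alternative has to be able to succeed for the recursion to end.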

Note that grammar 2 is written in a way that makes recursive descent possible. This rule is _not_ left recursive as written

   expr = term | expr addop term

As a recursive descent algorithm, this says to try matching a term first, which can stop the recursive descent.

But if the rule had been written

   expr = expr addop term | term

which is the same as far as the context free grammar is concerned, the recursive descent algorithm would call expr() recursively forever, and descend down the leftmost branch of the possible parse tree without any way to backtrack.

http://cs.marlboro.edu/ courses/ fall2016/formal_languages/ notes/ jims_parsing_exercise_answers
last modified Thursday September 29 2016 1:22 am EDT

Formal Languages and the Theory of Computation
