EECS665 – Compiler Construction – Drew Davidson $\newcommand{\bigsqcap}{\mathop ⨅}$

    Parsing


    Overview

    Given a CFG and a string (a sequence of tokens), the goal of parsing is to determine whether the string is in the language of the CFG. The answer is yes iff the string can be derived from the CFG's start nonterminal. Equivalently, the answer is yes iff we can build a parse tree for the string, with the CFG's start nonterminal at the root.

    There are a number of algorithms that can answer this question for any CFG, even ambiguous CFGs. Their worst-case running time is O(N3), where N is the length of the input (the number of tokens in the input string). Some algorithms have better running times for non-ambiguous CFGs. For example, there is an algorithm called Earley's algorithm that has worst-case runtime O(N2) for non-ambiguous grammars, but even a quadratic-time algorithm may considered too slow.

    Fortunately, there are classes of grammars for which O(n) parsers can be built (and given a grammar, we can quickly test whether it is in such a class). Two such classes are:

    LL(1) Scan input left-to-right Do a leftmost derivation One token of look-ahead

    and

    LALR(1) do a leftmost derivation One token of look-ahead scan the input right-to-left "Look-ahead". This has nothing to do with the number of tokens the parser can look at before it chooses what to do -- it is a technical term that only means something when you study how LR parsers work

    LALR(1) grammars are:

    • More general than LL(1) grammars (every LL(1) grammar is also LALR(1) but not vice versa).
    • The class of grammars accepted by the parser generators yacc, bison, and Java Cup.
    • Parsed by bottom-up parsers (they construct the derivation tree by going from terminals up to nonterminals, then attaching several nonterminals together to form a new nonterminal, etc, ending with just the start nonterminal).
    • Harder to understand than LL(1) parsers (i.e., it is hard to understand how the LALR(1) parser works, and what makes a grammar LALR(1)).

    Sample Parsers

    In this class, we will learn about three parsing techniques:

    1. A general, bottom-up parsing algorithm called the CYK algorithm.
    2. A technique for parsing LL(1) grammars top-down, called predictive parsing.
    3. A technique for bottom-up parsing that works for LALR(1) grammars, but since that is rather complicated, we will study a version of the algorithm that only works for a subset of the LALR(1) grammars called SLR(1) grammars.