EECS665 – Compiler Construction – Drew Davidson $\newcommand{\bigsqcap}{\mathop ⨅}$

    SLR State Construction


    Contents


    SLR Parsing

    SLR means simple LR; it is the weakest member of the LR family (i.e., every SLR grammar is also LALR and LR, but not vice versa). To understand SLR parsing we'll use a new example grammar (a very simple grammar for parameter lists):

    PList $\longrightarrow$ ( IDList )
    IDList $\longrightarrow$ id
    | IDList id

    Building the Action and Goto Tables for an SLR Parser

    Definition of an SLR item:

      An item (for a given grammar) is a production with a dot somewhere on the right-hand side.
    Example
    image/svg+xml PList( IDList ) PList( IDList ) . PList( IDList ) . PList( IDList ) . PList( IDList ) . The production The possible items
    NOTE: for the production X $\longrightarrow$ $\varepsilon$, there is just one item: X $\longrightarrow$ . $\varepsilon$

    The item "PList $\longrightarrow$ . lparens IDList rparens" can be thought of as meaning "we may be parsing a PList, but so far we haven't seen anything".

    The item "PList $\longrightarrow$ lparens . IDList rparens" means "we may be parsing a PList, and so far we've seen a lparens".

    The item "PList $\longrightarrow$ lparens IDList . rparens" means "we may be parsing a PList, and so far we've seen a lparens and parsed an IDList.

    We need 2 operations on sets of items: Closure and Goto

    Closure

    To compute Closure($I$), where $I$ is a set of items:

    1. put $I$ itself into Closure($I$)
    2. while
      there exists an item in Closure(I of the form
        X $\longrightarrow$ $\alpha$ . B $\beta$
      such that there is a production B $\longrightarrow$ $\gamma$, and B $\longrightarrow$ . $\gamma$ is not in Closure(I)
      do
      add B $\longrightarrow$ . $\gamma$ to Closure($I$)

    The idea is that the item "X $\longrightarrow$ $\alpha$ . B $\beta$" means "we may be trying to parse an X, and so far we've parsed all of $\alpha$, so the next thing we'll parse may be a B". And the item "B $\longrightarrow$ . $\gamma$" also means that the next thing we'll parse may be a B (in particular, a B that derives $\gamma$), but we haven't seen any part of it yet.

    Example 1: Closure({ PList $\longrightarrow$ . lparens IDList rparens })

      We'll begin by putting the initial item into the Closure (Step 1 above). So far, our set is: { PList $\longrightarrow$ . lparens IDList rparens}

      Now, we will do step 2, checking the set we build for productions of the form B $\longrightarrow$ $\gamma$, where the item B $\longrightarrow$ . $\gamma$ is not in the set. There's only one item that we can check, and the symbol to the immediate right of the dot is lparens which is a terminal symbol. Obviously, there are no productions of the form B $\longrightarrow$ $\gamma$ with a terminal symbol on the left-hand side, so there's nothing else to check.

      With Step 2 exhausted, we can return the set we've built up: Closure({ PList $\longrightarrow$ . lparens IDList rparens }) = { PList $\longrightarrow$ . lparens IDList rparens }

    Example 2: Closure({ PList $\longrightarrow$ lparens . IDList rparens })

      As with the previous example, we put the initial item into the Closure. So far, our set is { PList $\longrightarrow$ lparens . IDList rparens }

      For step 2, we begin by selecting the only item in our working set, PList $\longrightarrow$ lparens . IDList rparens. We now look for productions with a left-hand side of IDList, since that's the symbol to the immediate right of the dot. One production of this form is "IDList $\longrightarrow$ id". Since the item IDList $\longrightarrow$ . id is not in the Closure yet, we add it. Our set so far is { PList $\longrightarrow$ lparens . IDList rparens , IDList $\longrightarrow$ . id}

      We know that the item that we just added, IDList $\longrightarrow$ . id will not yield any more items, because the symbol immediately to the right of the dot is a terminal. However, we still haven't captured every production with IDList on the left-hand side, which we need to check because of our initial item. The grammar also has the production IDList $\longrightarrow$ IDList id, so we add the item "IDList $\longrightarrow$ . IDList id" to the closure. At this point, our working set is { PList $\longrightarrow$ lparens . IDList rparens , IDList $\longrightarrow$ . id, IDList $\longrightarrow$ . IDList id}

      The new item that we added has IDList to the right-hand side of the dot. Fortunately, we've already exhausted every production of the grammar with IDList on the left-hand side. Thus, we can pronounce our working set complete:

      Closure({ PList $\longrightarrow$ lparens . IDList rparens }) = { PList $\longrightarrow$ lparens . IDList rparens , IDList $\longrightarrow$ . id, IDList $\longrightarrow$ . IDList id}

    Goto

    Now that we have defined the Closure of a set of items, we can use it to define the Goto operation. The basic idea is that $I$ tells us where we might be in the parse, and Goto($I$, X) tells us where we might be after parsing an X. Here is the definition:

      Let $I$ be a set of items, and X be a grammar symbol (i.e. a single terminal or nonterminal).
      Goto($I$, X) = the Closure of the set of items
        A $\longrightarrow$ $\alpha$ X . $\beta$
      such that
        A $\longrightarrow$ $\alpha$ . X $\beta$
      is in $I$
    Example 1: Goto($I$1, X1)
      where
        $I$1 = { PList $\longrightarrow$ . lparens IDList rparens }
        X1 = lparens

      Let us begin by defining an intermediate set:

        $\mathcal{W}$ = the set of items of the form A $\longrightarrow$ $\alpha$ X . $\beta$ such that an item of the form A $\longrightarrow$ $\alpha$ . X $\beta$ is in $I$.

      We can now build $\mathcal{W}$ by taking each item from $I$ (of which there is only one) and advancing the dot to the right.
      Thus, $\mathcal{W}$ = { PList $\longrightarrow$ lparens . IDList rparens}

      With $\mathcal{W}$ in hand, we are ready to perform the Goto operation by computing Closure($\mathcal{W}$) = Closure( { PList $\longrightarrow$ lparens . IDList rparens} ) . We already computed this closure above as { PList $\longrightarrow$ lparens . IDList rparens , IDList $\longrightarrow$ . id, IDList $\longrightarrow$ . IDList id}, so we are done:

        Goto($I$1, X1) = { PList $\longrightarrow$ lparens . IDList rparens , IDList $\longrightarrow$ . id, IDList $\longrightarrow$ . IDList id}

    Example 2: Goto( $I$2, X2 )
      where

        $I$2 = Goto($I$1, X1 )
        X2 = IDList

      The inner Goto operation is the result of Example 1, so we can substitute that result directly. Expanded, the problem statement is:

      Goto( { PList $\longrightarrow$ lparens . IDList rparens , IDList $\longrightarrow$ . id, IDList $\longrightarrow$ . IDList id} , IDList)
      Again, we start by computing $\mathcal{W}$:
      Item in $I$ of the form A $\longrightarrow$ $\alpha$ . X $\beta$ Item of the form A $\longrightarrow$ $\alpha$ X . $\beta$
      PList $\longrightarrow$ lparens . IDList rparens PList $\longrightarrow$ lparens IDList . rparens
      IDList $\longrightarrow$ . IDList id IDList $\longrightarrow$ IDList . id
      IDList $\longrightarrow$ . id IDList $\longrightarrow$ id .

      Thus, $\mathcal{W}$ = { PList $\longrightarrow$ lparens IDList . rparens, IDList $\longrightarrow$ IDList . id, IDList $\longrightarrow$ id . }

      We can now take the closure of $\mathcal{W}$ to complete the operation. This turns out to be trivial, since no element in $\mathcal{W}$ is followed by a nonterminal, and therefore yields no additional items. Thus,
      Goto($I$2, X2)
      = Closure($\mathcal{W}$)
      = $\mathcal{W}$
      = {PList $\longrightarrow$ lparens IDList . rparens, IDList $\longrightarrow$ IDList . id, IDList $\longrightarrow$ id . }

      Our ultimate goal is to create a To build the FSM:

      1. Augment the grammar
          (a) add new start nonterminal S'
          (b) add new production S' → S (where S is the old start nonterminal)
      2. I0 = closure({ S' → . S })
      3. for each grammar symbol X such that there is an item in I0 containing ".X" do
          add a transition on X from state I0 to state Goto(I0,X)
      4. repeat step (3) for each new state until there are no more new states
      Note: In step (3), when we say "state Goto(...)" we mean the state that contains exactly that set of items; if there is not already such a state, then create a new one.

      Example

        grammar
        S' $\longrightarrow$ PList
        PList $\longrightarrow$ ( IDList )
        IDList $\longrightarrow$ id
        IDList $\longrightarrow$ IDList id

        Corresponding SLR FSM

        image/svg+xml S'. PList PList. ( IDList ) PList( . IDList ) PList . ID PList. IDList ID I2 ( S'PList . ID I4 IDListID . IDListIDList ID . PList'( IDList . ) IDList IDList IDList . ID ID I5 PList( IDList ) . IDList IDList . ID ) PList I0 I1 I3 I6

      Given the FSM, here's how to build Action and Goto tables:

      Action Table:
      • if state i includes item A $\longrightarrow$ $\alpha$ . a $\beta$
        where a is a terminal, and the transition out of state i on a is to state j,
        then set Action[i,a] = shift j
      • if state i includes item A $\longrightarrow$ $\alpha$ .
        where A is not the new start symbol S'
        then for every a in FOLLOW(A), set Action[i,a] = reduce by A $\longrightarrow$ $\alpha$
      • if state i includes item S' $\longrightarrow$ S .
        then set Action[i,$] = accept
      • set all other entries of Action table to error
      Goto Table:
      for every nonterminal X
      if there is a transition from state i to state j on X
      then set Goto[i, X] = j

      Example

        FOLLOW(IDList) = $\{$ ), id $\}$
        FOLLOW(PList) = $\{$$$\}$
        image/svg+xml IDListIDList ID reduce by IDListIDList ID reduce by PList( IDList ) reduce by IDListIDList ID reduce by IDListIDList ID reduce by shift 5 shift 6 shift 4 accept shift 2 PList IDList ( ) ID $ 0 2 1 3 4 5 6 Action Table Goto Table Symbol FSM State 1 3
      Conflicts

      Not every grammar is SLR(1). If a grammar is not SLR(1), there will be a conflict in the SLR Action table. A conflict in the table exists if there is a table entry with more than 1 rule in it. There are two possible kinds of conflicts:

      1. shift/reduce conflicts (a shift and a reduce action in the same table entry)
      2. reduce/reduce conflicts (two reduce actions in the same table entry)

      A shift/reduce conflict means that it is not possible to determine, based only on the top-of-stack state symbol and the current token, whether to shift or to reduce. This kind of conflict arises when one state contains two items of the following form:

      1. X $\longrightarrow$ $\alpha$ . a $\beta$
      2. Y $\longrightarrow$ $\alpha$
      and a is in FOLLOW(Y).

      A reduce/reduce conflict means that it is not possible to determine, based only on the top-of-stack state symbol and the current token, whether to reduce by one grammar rule or by another grammar rule. This kind of conflict arises when one state contains two items of the form

      1. X $\longrightarrow$ $\alpha$ .
      2. Y $\longrightarrow$ $\beta$ .
      and there is some symbol a that is in both FOLLOW(X) and FOLLOW(Y).

      A non-SLR(1) grammar

      This grammar causes a shift/reduce conflict

        grammar

        S' $\longrightarrow$ S
        S $\longrightarrow$ A * B | B
        A $\longrightarrow$ a | + B
        B $\longrightarrow$ A

        relevant part of the FSM

        image/svg+xml S'. S S. A * B SA . * B * A I0 I1 S. B A. +B B. A BA . SA * . B I2 B. A A. a A → . + B This item indicates that Action[1, *]should be shift 2 This item, and the fact that * is inFollow(B) indicates that Action[1, *]should be reduce by B → A
      Another non-SLR(1) grammar

      This grammar causes a reduce/reduce conflict:

      grammar

      S' $\longrightarrow$ S
      S $\longrightarrow$ a A d | b B d | a B e | b A e
      A $\longrightarrow$ c
      B $\longrightarrow$ c

      Follow sets

      Follow(A) = {d,e}
      Follow(B) = {d,e}

      relevant part of the FSM

      image/svg+xml S'. S S. a A d a I0 Since Follow(A) and Follow(B)both contain d, it means that Action[2,d] includes bothReduce by Ac and Reduce by Bc S. b B d S. a B e S. b A e Sb . B d I3 Sb . A e B. c A. c Sa . A d I1 Sa . B e A. c B. c I2 c b c Ac . Bc . Similarly, Follow(A) and Follow(B)both contain e, so consequently Action[2,e] also includes bothReduce by Ac and Reduce by Bc

    This concludes our discussion of bottom-up parsing. Have a nice day.