LALR(1) follow sets

Notes for CPSC411:

Author: Robin Cockett
Date: 9 March 2007


Forming the LALR(1)-item automaton from the LR(0)-item automaton

The LALR(1)-item automaton's states are the same as those of the LR(0)-item automaton except that we need, to help resolve any conflicts, sets of terminals associated with each completed item in each state which contains such. These are the LALR(1) conflict resolution sets. One expensive way to calculate these is to develop the LR(1)-item automaton and map it down onto the LR(0)-item automaton while and using its follow token to determine conflict resolution. Another method is to calculate follow sets on the augmented grammar (which we describe below). However, the fastest (if not the easiest way) is to calculate the conflict resolution token in situ from the LR(0)-item automaton: this avoids developing the LR(1)-item automaton or the augmented grammar (although the in situ calculation is, in fact, just the follow set calculation for the augmented grammar).

It is useful to realize that to tell whether a grammar is LALR(1) we only need to calculate the conflict resolution tokens for the completed items for which there is a shift/reduce or reduce/reduce conflict. This can, in fact, be done by hand and it is the purpose of these notes to outline how and also to relate this calculation to the follow tokens of the LR(1)-item automaton and the follow sets of the augmented grammar.

A basic technique is to mimic the effect of the shift-reduce parser on the automaton by tracing paths backwards to understand when reductions of completed items should be permitted. This can be achieved more formally by calculating augmented follow sets.

Augmented follow sets (in situ)

A starting item in a state of an LR(0)-item automaton is an item of the form x -> .\gamma. If x is not the start non-terminal the only reason that an item of this form is present in a state is because there are items in the state of the form y -> \alpha.x \beta.

When x is the head of a starting item in state S, we define the LALR(1) immediate follow set of x in a state S, IFollow(x,S), to be

if x=start is the start nonterminal of the augmented grammar IFollow(x,S) = { $ }
otherwise IFollow(x,S) = \union { first(\beta) | y -> \alpha.x \beta \in S \}

Note that IFollow(x,S) is just the terminal transitions leaving the state reached by doing an x from S (with the added possibility of a $). So this is really very easy to calculate by eye-balling the machine!

We shall say there is a (backward) propogation step:

(x,S) ~~> (y,T)

in case there is an item y -> \alpha.x \beta in the state S with \beta nullable and a (forward) path \alpha: T -> S in the machine. The LALR(1) follow set of x in S, Follow(x,S), is the union of

the immediate follow set of x in S, IFollow(x,S)
the immediate follow sets y in T, IFollow(y,T), for which there are (one or more) backward propogation steps (x,S) ~~> (y,T).

That is Follow(x,S) = \union { IFollow(y,T) | (S,x) ~*~> (y,T) } ... i.e. the union of the immediate follow sets reachable by zero or more propogation steps.

This can be calculated for (x,S) by first calculating the transitive closure of the backward propogation steps to obtain a set of pairs which can be reached by backward propogation from (x,S) and then forming the union of the immediate follow sets of these pairs. Recall, significantly, that there can be backward propogations (x,S) --> (y,S) if there is a rule of the form y -> .x \beta where \beta is nullable. So one can move from one follow set on a state to another on the same state. Recall also backwards propogation from S along a path \alpha is non-deterministic: there may be many different states T from which following \alpha will get one to S.

The LALR(1) conflict resolving tokens for a completed item

x -> \alpha. in state S

are then exactly the set:

\union \{ Follow(x,T) | \alpha: T -> S \}

The advantage of this calculation is that one can apply it to resolve just the conflicts which occur in the LR(0)-item automaton and if one is lucky the propogation ends quite quickly leading to a relatively cheap way of determining whether the language is LALR(1) or not.

Relation to the LR(1)-item automaton's follow tokens

Here is a sketch of why the way of calulating the LALR(1) follow sets, as above and from the LR(1)-item automaton, are the same. To see that this calculation does calculate the follow sets obtained from the LR(1) automaton it suffices to relate the adding of tokens to the tokens in the LR(1)-item automaton to the augmented follow set calculation. This is achieved by first observing that the tokens collected by the augmented follow sets will all be generated by the LR(1)-items. To see this observe first that the immediate follow sets do not add anything that the LR(1)-items do not already collect. Similarly, if there is a backward propogation (x,S) --> (y,T) then any symbol which follows y in T will also be picked up in the LR(1)-automaton to follow x in S. So the LALR(1) -augmented follow sets are no bigger than the sets picked up by the LR(1)-item automaton. For the converse we need to show that anything picked up by the LR(1)-item automaton will be picked up by the augmented follow sets: again this can be seen by looking at the steps in the process.

The augmented grammar

The following realization is due to M.E. Bermudez and G. Logothetis (Inf. Proc. Letter 31 (1989) 233-238): the above calculation of LALR(1)-augmented follow sets can be seen as the standard follow set calculation on an augmented grammar. The augmented grammar is constructed as follows:

Each rule x -> \alpha in the original grammar occurs as a starting item in a number of LR(0)-item automaton states. For each such state, S, create a new copy of the production with head non-terminal augmented by the state S and each non-terminal in the body augmented by the state it leaves when that starting item is followed through its (uniquely determined) shifts in the LR(0)-item automaton.

(x,S) -> \alpha'

This creates an augmented grammar whose follow sets, Follow(x,S) are exactly the same as the augmented follow sets described above.

This is a VERY convenient packaging of the LALR(1)-augmented follow set calculation! It also leads to the following, perhaps a little surprising, observation. Every language recognizable by an LALR(1) grammar is recognizable by an SLR(1) grammar! Which one? Well of course it is the augmented grammar!

Example (dangling else):

Consider the following grammar this is a cut-down version of the "dangling else" problem which was famously encountered in the development of Algol and led to the whole development of the theory of context free grammars.

Stmt -> assignment
| if expression then Stmt Ecase.
Ecase -> else Stmt
|.

has been cut down to:

S -> s | c S C.
C -> e S | .

the grammar is ambiguous so cannot be LALR(1). Here is the LR(0) automaton. Notice the conflict in state 4. As we know that the grammar is not LALR(1), we should be able to show that e is in the augmented follow set of C in state 4 (we go backward from state 4 by \epsilon). We do not have to do the whole LALR(1)-augmented follow set calculation to see this consider the follow sets and the propogation from (C,4):

Ifollow(C,4) = {}	S -> c S.C: (C,4) -> (S,0) S -> c S.C: (C,4) -> (S,3)
Ifollow(S,0) = {}	start-> .S: (S,0) -> (start,1)
Ifollow((start,1) ={$}
Ifollow(S,3) = {e}

This means by (C,4) -> (S,3) => {e} that e is in the LALR(1)-augmented follow set so that the grammar is not LALR(1).

The augmented grammar is

(S,0) -> s | c (S,3) (C,4).
(S,3) -> s | c (S 3) (C,4).
(S,6) -> s | c (S,3) (C,4).
(C,4) -> e (S,6) | .

Try the follow set calculation here and check you get the same answer!

Example (resolving some conflicts):

Consider the grammar:

A -> B ( A B)
| .
B -> A *( B A )*
| .

Here are its vital statistics. It is not LR(0) or SLR(1) as is shown here.

We try to show that it is LALR(1).

The reduce/reduce conflict in state 0:
We must calculate Follow(A,0) ={ *(, $ }:

Ifollow(A,0) = { *( }	(A,0) -> (start,1)
Ifollow(start,1) = { $ }

and Follow(B,0) = { ( }:

Ifollow(B,0) = { ( }

nothing

So this resolves this reduce/reduce conflict.

The reduce/reduce conflict in state 6:
We must calculate Follow(A,0) ={ *( }:

Ifollow(A,3) = { *( }

nothing

and Follow(B,3) = { ( }:

Ifollow(B,3) = { ( }

nothing

So this resolves this reduce/reduce conflict!

Similarly is sate 5 one can resolve the reduce/reduce conflict.

Similarly the reduce/reduce conflict in state 5 can be resolved. However, the shift/reduce conflict in state 5 cannot be resolved so this grammar is not LARL(1). The immediate follow set IFollow(A,5) = { *( } so we do avoid the shift/reduce conflict .... so the grammar is not LARL(1).

Example: removing all conflicts.

Consider the slight variation of the above grammar

A -> A ( B A)
| .
B -> B *( A B )*
| .

The vital statistics of this grammar are here. The LR(0) automaton is here: note the grammar is neither LR(0) or SLR(1). However, as we now verify this is LALR(1) -- and this should remind us that a slight change to a grammar can change everything.

There are shift/reduce conflict in state 3 and 6. However, in state 3 Ifollow(A,3) = { ( , ) } and there is no propogation. So this is resolved. Symmetrically
Ifollow(B,6) = { *( , *) } and there is no propogation so this is also resolved. Thus, the grammar is LALR(1).

This was very simple! Let us illustrate the technique further by working out the LALR(1) follow set for the completed item A -> A ( B A). in state 7. We are looking for the follow sets of A in all state where the given production starts. Tracing this back we must calculate Follow(A,3) and Follow(A,0). It is not hard to see that IFollow(A,0) = { ( } and there is propogation to (start,1) which will add { $ } so that Follow(A,0) = { ( , $ }. Let us calculate for (A,3) here there is no propogation and IFollow(A) = { ( , ) }.
Thus the LALR(1) follow set on the completed item in state 7 is ( / ) / $.