Notes for CPSC411:
Author: Robin Cockett
Date: 9 March 2007
Forming the LALR(1)-item automaton from the LR(0)-item automaton
The
LALR(1)-item automaton's states are the same as those of the
LR(0)-item
automaton except that we need, to help resolve any
conflicts,
sets of terminals associated with each
completed
item in each state which contains such. These are the LALR(1)
conflict resolution sets. One expensive way to calculate these is
to develop the LR(1)-item automaton and map it down onto the LR(0)-item
automaton while and using its follow token to determine conflict
resolution. Another method is to
calculate follow sets on the augmented grammar (which we describe
below). However, the fastest (if not the easiest way) is to calculate
the conflict resolution token in situ from the LR(0)-item automaton:
this avoids developing the LR(1)-item
automaton
or
the augmented grammar (although the in situ calculation is, in fact,
just the follow set calculation for the augmented grammar).
It is useful to realize that to tell whether a grammar is LALR(1) we only need to calculate the conflict resolution tokens for the completed items for which there is a shift/reduce or reduce/reduce conflict. This can, in fact, be done by hand and it is the purpose of these notes to outline how and also to relate this calculation to the follow tokens of the LR(1)-item automaton and the follow sets of the augmented grammar.
A basic technique is to mimic the effect of the shift-reduce parser on the automaton by tracing paths backwards to understand when reductions of completed items should be permitted. This can be achieved more formally by calculating augmented follow sets.
Augmented
follow
sets (in situ)
A starting item in a
state
of an LR(0)-item automaton is an item of the form x -> .\gamma. If x
is
not the start non-terminal the only reason that an item of this form is
present
in a state is because there are items in the state of the form y ->
\alpha.x
\beta.
When x is the head of a starting item in state S, we define the LALR(1) immediate follow set of x
in
a state S, IFollow(x,S), to be
We shall say there is a (backward) propogation step:
(x,S) ~~> (y,T)
in case there is an item y -> \alpha.x \beta in the state S with \beta nullable and a (forward) path \alpha: T -> S in the machine. The LALR(1) follow set of x in S, Follow(x,S), is the union of
That is Follow(x,S) = \union { IFollow(y,T) | (S,x) ~*~>
(y,T) } ... i.e. the union of the immediate follow sets reachable by
zero or more propogation steps.
This can be
calculated
for (x,S) by first calculating the transitive closure of the backward
propogation
steps to obtain a set of pairs which can be reached by backward
propogation
from (x,S) and then forming the union of the immediate follow sets of
these
pairs.
Recall, significantly, that there can be backward propogations (x,S)
--> (y,S) if
there
is a rule of the form y -> .x \beta where \beta is nullable.
So one
can
move from one follow set on a state to another on the same state.
Recall
also backwards propogation from S along a path \alpha is
non-deterministic: there may
be many different states T from which
following \alpha will get
one to S.
The LALR(1)
conflict
resolving tokens for a completed item
are then exactly the set:
\union \{ Follow(x,T) | \alpha: T -> S \}
The advantage
of
this calculation is that one can apply it to resolve just the conflicts
which
occur in the LR(0)-item automaton and if one is lucky the propogation
ends quite quickly leading to a relatively cheap way of determining
whether
the language is LALR(1) or not.
The augmented grammar
The following realization is due to M.E. Bermudez and G. Logothetis (Inf. Proc. Letter 31 (1989) 233-238): the above calculation of LALR(1)-augmented follow sets can be seen as the standard follow set calculation on an augmented grammar. The augmented grammar is constructed as follows:
Each rule x -> \alpha in the original grammar occurs as a starting item in a number of LR(0)-item automaton states. For each such state, S, create a new copy of the production with head non-terminal augmented by the state S and each non-terminal in the body augmented by the state it leaves when that starting item is followed through its (uniquely determined) shifts in the LR(0)-item automaton.
(x,S) -> \alpha'
This creates an augmented grammar whose follow sets, Follow(x,S) are exactly the same as the augmented follow sets described above.
This is a
VERY
convenient packaging of the LALR(1)-augmented follow set calculation!
It also
leads
to the following, perhaps a little surprising, observation. Every
language
recognizable by an LALR(1) grammar is recognizable by an SLR(1)
grammar!
Which one? Well of course it is the augmented grammar!
Example (dangling else):
Consider the
following
grammar this is a cut-down version of
the
"dangling else" problem which was famously encountered in the
development
of
Algol and led to the whole development of the theory of context free
grammars.
Stmt ->
assignment
| if expression then Stmt Ecase.
Ecase -> else Stmt
|.
has been cut
down
to:
S -> s | c
S
C.
C -> e S | .
the grammar
is
ambiguous so cannot be LALR(1). Here
is the LR(0) automaton. Notice the conflict in state 4. As
we
know that the grammar is LALR(1) so we should be able to show that e is
in
the augmented follow set of C in state 4 (we go backward from state 4
by \epsilon). We do
not have to do the whole
LALR(1)-augmented follow
set calculation to see this consider the follow sets and the
propogation
from (C,4):
| Ifollow(C,4) = {} |
S -> c S.C: (C,4) -> (S,0) S -> c S.C: (C,4) -> (S,3) |
| Ifollow(S,0) = {} |
start-> .S: (S,0) -> (start,1) |
| Ifollow((start,1) ={$} |
|
| Ifollow(S,3) = {e} |
This means by
(C,4)
-> (S,3) => {e} that e is in the LALR(1)-augmented follow set so
that the
grammar
is not LALR(1).
The augmented
grammar
is
(S,0) -> s
|
c (S,3) (C,4).
(S,3) -> s | c (S 3) (C,4).
(S,6) -> s | c (S,3) (C,4).
(C,4) -> e (S,6) | .
| Ifollow(A,0) = { *( } |
(A,0) -> (start,1) |
| Ifollow(start,1) = { $ } |
| Ifollow(B,0) = { ( } |
nothing |
| Ifollow(A,3) = { *( } |
nothing |
| Ifollow(B,3) = { ( } |
nothing |
Similarly the
reduce/reduce conflict in state 5 can be resolved. However, the
shift/reduce conflict in state 5 cannot be resolved
so
this grammar is not LARL(1). The immediate follow set IFollow(A,5)
= {
*(
} so we do avoid the shift/reduce conflict ....
so
the grammar is not LARL(1).
Example: removing all conflicts.
A -> B ( A B)
| .
B -> A *( B A )*
| .
The vital statistics of this grammar are here. The LR(0) automaton is here: note the grammar is neither LR(0) or SLR(1). However, as we now verify this is LALR(1) -- and this should remind us that a slight change to a grammar can change everything.
There are shift/reduce conflict in state 3 and 6.
However, in state 3 Ifollow(A,3)
= { ( , ) } and there is no propogation. So this is
resolved. Symmetrically
Ifollow(B,6) = { *( , *) } and
there is no propogation so this is also resolved. Thus, the
grammar is LALR(1).
This was very simple! Let us illustrate the technique further
by working out the LALR(1) follow set for the completed item A -> A
( B A). in state 7. We are looking for the follow
sets of A in all state where the given production starts. Tracing
this back we must calculate Follow(A,3)
and Follow(A,0).
It is not hard to see that IFollow(A,0)
= { ( } and there is propogation to (start,1) which will add { $ } so
that Follow(A,0) = { ( ,
$ }. Let us calculate for (A,3) here there is no
propogation and IFollow(A)
= { ( , ) }.
Thus the LALR(1) follow set on the completed item in state 7 is ( / ) /
$.