Notes for CPSC411:
Author: Robin Cockett
Date: 9 March 2007
Forming the LALR(1)-item automaton from the LR(0)-item automaton
The
LALR(1)-item automaton's states are the same as those of the
LR(0)-item
automaton except that we need, to help resolve any
conflicts,
sets of terminals associated with each
completed
item in each state which contains such. These are the
LALR(1)
conflict resolution sets. One expensive way to calculate
these is
to develop the LR(1)-item automaton and map it down onto the
LR(0)-item
automaton while and using its follow token to determine conflict
resolution. Another method is to
calculate follow sets on the augmented grammar (which we
describe
below). However, the fastest (if not the easiest
way) is to calculate
the conflict resolution token in situ from the LR(0)-item
automaton:
this avoids developing the LR(1)-item
automaton
or
the augmented grammar (although the in situ calculation is, in
fact,
just the follow set calculation for the augmented grammar).
It is useful to realize that to tell whether a grammar is LALR(1) we only need to calculate the conflict resolution tokens for the completed items for which there is a shift/reduce or reduce/reduce conflict. This can, in fact, be done by hand and it is the purpose of these notes to outline how and also to relate this calculation to the follow tokens of the LR(1)-item automaton and the follow sets of the augmented grammar.
A basic technique is to mimic the effect of the shift-reduce parser on the automaton by tracing paths backwards to understand when reductions of completed items should be permitted. This can be achieved more formally by calculating augmented follow sets.
Augmented
follow
sets
(in situ)
A starting item
in a
state
of an LR(0)-item automaton is an item of the form x ->
.\gamma. If x
is
not the start non-terminal the only reason that an item of this
form is
present
in a state is because there are items in the state of the form y
->
\alpha.x
\beta.
When x is the head of a starting item in state S, we define
the LALR(1) immediate
follow set of x
in
a state S, IFollow(x,S),
to be
We shall say there is a (backward) propogation step:
(x,S) ~~> (y,T)
in case there is an item y -> \alpha.x \beta in the state S with \beta nullable and a (forward) path \alpha: T -> S in the machine. The LALR(1) follow set of x in S, Follow(x,S), is the union of
That is Follow(x,S) = \union { IFollow(y,T) | (S,x)
~*~>
(y,T) } ... i.e. the union of the immediate follow sets
reachable by
zero or more propogation steps.
This can be
calculated
for (x,S) by first calculating the transitive closure of the
backward
propogation
steps to obtain a set of pairs which can be reached by backward
propogation
from (x,S) and then forming the union of the immediate follow
sets of
these
pairs.
Recall, significantly, that there can be backward propogations
(x,S)
--> (y,S) if
there
is a rule of the form y -> .x \beta where \beta is
nullable.
So one
can
move from one follow set on a state to another on the same state.
Recall
also backwards propogation from S along a path \alpha is
non-deterministic:
there may
be many different states T from which
following \alpha will get
one to S.
The
LALR(1)
conflict
resolving tokens for a completed item
are then exactly the set:
\union \{ Follow(x,T) | \alpha: T -> S \}
The
advantage
of
this calculation is that one can apply it to resolve just the
conflicts
which
occur in the LR(0)-item automaton and if one is lucky the
propogation
ends quite quickly leading to a relatively cheap way of
determining
whether
the language is LALR(1) or not.
The augmented grammar
The following realization is due to M.E. Bermudez and G. Logothetis (Inf. Proc. Letter 31 (1989) 233-238): the above calculation of LALR(1)-augmented follow sets can be seen as the standard follow set calculation on an augmented grammar. The augmented grammar is constructed as follows:
Each rule x -> \alpha in the original grammar occurs as a starting item in a number of LR(0)-item automaton states. For each such state, S, create a new copy of the production with head non-terminal augmented by the state S and each non-terminal in the body augmented by the state it leaves when that starting item is followed through its (uniquely determined) shifts in the LR(0)-item automaton.
(x,S) -> \alpha'
This creates an augmented grammar whose follow sets, Follow(x,S) are exactly the same as the augmented follow sets described above.
This is a
VERY
convenient packaging of the LALR(1)-augmented follow set
calculation!
It also
leads
to the following, perhaps a little surprising, observation.
Every
language
recognizable by an LALR(1) grammar is recognizable by an SLR(1)
grammar!
Which one? Well of course it is the augmented grammar!
Example (dangling else):
Consider
the
following
grammar this is a cut-down
version of
the
"dangling else" problem which was famously encountered in the
development
of
Algol and led to the whole development of the theory of context
free
grammars.
Stmt ->
assignment
| if expression then Stmt
Ecase.
Ecase -> else Stmt
|.
has been
cut
down
to:
S -> s
| c
S
C.
C -> e S | .
the
grammar
is
ambiguous so cannot be LALR(1). Here
is the LR(0) automaton. Notice the conflict in state 4.
As
we
know that the grammar is not LALR(1), we should be able to show
that e is
in
the augmented follow set of C in state 4 (we go backward from
state 4
by \epsilon).
We do
not have to do the whole
LALR(1)-augmented follow
set calculation to see this consider the follow sets and the
propogation
from (C,4):
Ifollow(C,4) = {} |
S -> c S.C: (C,4) -> (S,0) S -> c S.C: (C,4) -> (S,3) |
Ifollow(S,0) = {} |
start-> .S: (S,0) -> (start,1) |
Ifollow((start,1) ={$} |
|
Ifollow(S,3) = {e} |
This means
by
(C,4)
-> (S,3) => {e} that e is in the LALR(1)-augmented follow
set so
that the
grammar
is not LALR(1).
The
augmented
grammar
is
(S,0)
-> s
|
c (S,3) (C,4).
(S,3) -> s | c (S 3) (C,4).
(S,6) -> s | c (S,3) (C,4).
(C,4) -> e (S,6) | .
Ifollow(A,0) = { *( } |
(A,0) -> (start,1) |
Ifollow(start,1) = { $ } |
Ifollow(B,0) = { ( } |
nothing |
Ifollow(A,3) = { *( } |
nothing |
Ifollow(B,3) = { ( } |
nothing |
Similarly
the
reduce/reduce conflict in state 5 can be resolved. However, the
shift/reduce conflict in state 5 cannot be resolved
so
this grammar is not LARL(1). The immediate follow set IFollow(A,5)
=
{
*(
} so we do avoid the shift/reduce conflict ....
so
the grammar is not LARL(1).
Example: removing all conflicts.
A -> A ( B A)
| .
B -> B *( A B )*
| .
The vital statistics of this grammar are here. The LR(0) automaton is here: note the grammar is neither LR(0) or SLR(1). However, as we now verify this is LALR(1) -- and this should remind us that a slight change to a grammar can change everything.
There are shift/reduce conflict in state 3 and 6.
However, in state 3 Ifollow(A,3)
=
{ ( , ) } and there is no propogation. So this is
resolved. Symmetrically
Ifollow(B,6) = { *( , *) }
and
there is no propogation so this is also resolved. Thus, the
grammar is LALR(1).
This was very simple! Let us illustrate the technique
further
by working out the LALR(1) follow set for the completed item A
-> A
( B A). in state 7. We are looking for the
follow
sets of A in all state where the given production starts.
Tracing
this back we must calculate Follow(A,3)
and
Follow(A,0).
It is not hard to see that IFollow(A,0)
=
{ ( } and there is propogation to (start,1) which will add { $ }
so
that Follow(A,0) =
{ ( ,
$ }. Let us calculate for (A,3) here there is no
propogation and IFollow(A)
=
{ ( , ) }.
Thus the LALR(1) follow set on the completed item in state 7 is (
/ ) /
$.