
where the dot indicates how we have broken up the righthand side and ++ stands for concatenation. The one special case we should be aware of is the item corresponding to an empty production: there is only one and this is written

a -> .

which stands for a -> \epsilon . \epsilon (the empty righthand side broken into two empty halves).

There are two sorts of transitions between these items:

Shift transitions:

x
[ a -> \a . x \b ] ----------> [ a -> \a x . \b ]

Expansion transitions: (these are silent)

[ a -> \a . x \b ] ~~~~~~~~~~> [ x -> .\d ]

The start state is the item [s' -> .s]. Any string which legally drives this automaton is a viable prefix. The states which indicate that there is a handle on the top of the stack are those which contain a completed item: an item whose "dot" is at the righthand end. The significance of such an item is that when the automaton is in this state we know that a handle (a place where a reverse derivation can be performed) has potentially been encountered. Thus, in the shift reduce parser there is an opportunity for a reduction.

Example: Consider the grammar:

a -> ( a )
| A

we may start by adding a new start symbol a' to get:

a' -> a
a -> ( a )
| A

The items are:

a' -> . a
a' -> a .	completed
a -> . (a)
a -> ( . a)
a -> (a . )
a -> (a) .	completed
a -> .A
a -> A.	completed
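The list of items above can be generated mechanically. The following is a minimal Python sketch, assuming a grammar is represented as a list of (head, body) pairs; the names GRAMMAR, items, is_completed and show are ours and purely illustrative.

```python
# Augmented example grammar: a' -> a, a -> ( a ) | A.
# Lowercase names are nonterminals, everything else is a token.
GRAMMAR = [
    ("a'", ("a",)),
    ("a", ("(", "a", ")")),
    ("a", ("A",)),
]

def items(grammar):
    """Yield every item as (head, symbols before the dot, symbols after it)."""
    for head, body in grammar:
        for dot in range(len(body) + 1):
            yield head, body[:dot], body[dot:]

def is_completed(item):
    """An item is completed when nothing remains after the dot."""
    return not item[2]

def show(item):
    head, before, after = item
    return " ".join([head, "->", *before, ".", *after])

ALL_ITEMS = list(items(GRAMMAR))
for item in ALL_ITEMS:
    print(show(item) + ("   completed" if is_completed(item) else ""))
```

A production with n symbols on its righthand side contributes n+1 items, so an empty production contributes exactly one.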

Part of the item automaton is (shift transitions only marked - you must add the expansion transitions from the starred states).

a
[ a' -> .a]* ----------> [ a' -> a.]

( a )
[ a -> .(a) ] -----> [ a -> (.a) ]* -----> [a -> (a.) ] ----> [a -> (a). ]

A
[ a -> .A ] -----> [ a -> A.]

I claim that this item automaton recognizes precisely the viable prefixes in a grammar which has every non-terminal reachable and realizable. The deterministic automaton which corresponds to this has states:

S0 = a' -> . a
-------
a -> . A
a -> . ( a )

S1* = a' -> a .
------

S2* = a -> A .
------

S3 = a -> ( . a )
--------
a -> . A
a -> . ( a )

S4 = a -> ( a . )
-------

S5* = a -> ( a ) .
-------

with transitions:

a
S0 -----> S1

A
S0 -----> S2

(
S0 -----> S3

(
S3 -----> S3

a
S3 -----> S4

A
S3 -----> S2

)
S4 -----> S5

The states starred have reduce actions associated with them: namely a reduction by the completed item they contain.
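The passage from the item automaton to the deterministic one is the usual subset construction: a state is a set of items closed under the silent expansion transitions. Here is a minimal Python sketch of it for the example grammar. It assumes an item is stored as a pair (production index, dot position); closure, goto and build_automaton are our own names. The states come out in discovery order, so their numbering need not match S0..S5 above.

```python
GRAMMAR = [
    ("a'", ("a",)),
    ("a", ("(", "a", ")")),
    ("a", ("A",)),
]
NONTERMINALS = {head for head, _ in GRAMMAR}

def closure(items):
    """Follow the silent expansion transitions until nothing new appears."""
    result, todo = set(items), list(items)
    while todo:
        prod, dot = todo.pop()
        body = GRAMMAR[prod][1]
        if dot < len(body) and body[dot] in NONTERMINALS:
            for i, (head, _) in enumerate(GRAMMAR):
                if head == body[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    todo.append((i, 0))
    return frozenset(result)

def goto(state, symbol):
    """Follow the shift transition on `symbol` from every item of `state`."""
    moved = {(p, d + 1) for p, d in state
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved) if moved else None

def build_automaton():
    start = closure({(0, 0)})
    states, transitions, todo = [start], {}, [start]
    symbols = sorted({s for _, body in GRAMMAR for s in body})
    while todo:
        state = todo.pop()
        for symbol in symbols:
            target = goto(state, symbol)
            if target is None:
                continue
            if target not in states:
                states.append(target)
                todo.append(target)
            transitions[(states.index(state), symbol)] = states.index(target)
    return states, transitions

states, transitions = build_automaton()
print(len(states), "states,", len(transitions), "transitions")
```

For this grammar the sketch finds six states, in agreement with the list above.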

Now the next question is how we can use the information the automaton is giving to help the shift reduce parsing. To illustrate this we shall not only hold the viable prefix on the parse stack but also the state of this item automaton.
Consider the parse of "((A))":

STACK	INPUT	ACTION
S0 ---[]---> S0	"((A))"	shift
S0 --[(]---> S3	"(A))"	shift
S0 --[((]--> S3	"A))"	shift
S0 --[((A]-> S2	"))"	reduce[a->A]
S0 --[((a]-> S4	"))"	shift
S0 -[((a)]-> S5*	")"	reduce[a->(a)]

At each stage we follow the transitions indicated in the machine (thus we shift when there is an appropriate transition to follow). When we reach a starred state we can initiate a reduce: this means rolling back the stack to where it was before the handle and then going forward by the nonterminal at the head of the rule by which one is reducing.

S0 ---[(a]---> S4	")"	shift
S0 ---[(a)]--> S5*	""	reduce[a->(a)]
S0 ---[a] ---> S1*	""	reduce[a'->a] = accept

Acceptance is reached when one wishes to reduce by the initial rule [a'->a] (which was artificially added) and the input is empty.
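The parse above can be replayed by a small driver. This is a sketch only: the tables are typed in by hand from the automaton for a -> ( a ) | A (with a leading S3 to S4 and A leading S3 to S2, as the trace requires), the stack holds just the automaton states, and the names TRANSITIONS, REDUCTIONS and parse are ours.

```python
# Transitions of the deterministic item automaton; the integer n stands for Sn.
TRANSITIONS = {
    (0, "a"): 1, (0, "A"): 2, (0, "("): 3,
    (3, "("): 3, (3, "A"): 2, (3, "a"): 4,
    (4, ")"): 5,
}
# Reduce states: state -> (head of the rule, length of its righthand side).
REDUCTIONS = {1: ("a'", 1), 2: ("a", 1), 5: ("a", 3)}

def parse(tokens):
    """Return True exactly when `tokens` is accepted by the LR(0) parser."""
    stack, pos = [0], 0
    while True:
        state = stack[-1]
        if state in REDUCTIONS:
            head, length = REDUCTIONS[state]
            if head == "a'":
                # Reducing by the added rule a' -> a: accept only if the input is used up.
                return pos == len(tokens)
            del stack[-length:]                          # roll back over the handle
            stack.append(TRANSITIONS[(stack[-1], head)]) # go forward by the head
        elif pos < len(tokens) and (state, tokens[pos]) in TRANSITIONS:
            stack.append(TRANSITIONS[(state, tokens[pos])])   # shift
            pos += 1
        else:
            return False

print(parse(list("((A))")))
print(parse(list("((A)")))
```

No lookahead is consulted anywhere: the state on top of the stack alone decides between shifting and reducing, which is what being LR(0) amounts to.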

A grammar is LR(0) if and only if the shift reduce parser can be unambiguously guided by the LR(0) item automaton (in the manner described above). It should be noted that most grammars are not LR(0)!

The example above, however, is LR(0). One can tell whether a grammar is LR(0) by inspecting the LR(0) item automaton: a grammar is LR(0) if and only if all completed items are in singleton states. This means we can unambiguously associate that reduce action with that state.

Exercises:

Show that

e -> e ADD t
| t.
t -> f MUL t
| f.
f -> VAR.

is not an LR(0) grammar.

Transform this into an LL(1) grammar (by removing left recursion and factoring) to get:

e -> t e+.
e+ -> ADD t e+
|.
t -> f t+.
t+ -> MUL f t+
|.
f -> VAR.

Show that this grammar is not LR(0). Conclude that LR(0) does not include LL(1).

Is LR(0) included in LL(1)? Give a counter-example!

Is the grammar

S -> ( S )
|.

LR(0)?

SLR(1) grammars

It is clear that LR(0) grammars are not very powerful. However, we have not used the capability to look ahead in the input string at all. Consider the grammar:

s -> ( s )
|.

This is not LR(0) as the states of the item automaton are:

S0 = s' -> . s
-------
s -> . ( s )
s -> .

S1 = s' -> s .
-----

S2 = s -> ( . s )
------
s -> . ( s )
s -> .

S3 = s -> ( s . )
------

S4 = s -> ( s ) .
-------

with the transitions:

s: S0 ---> S1
(: S0 ---> S2
(: S2 ---> S2
s: S2 ---> S3
): S3 ---> S4

It is not LR(0) as the completed item s -> . occurs in non-singleton states. In fact we say that S0 and S2 have shift/reduce conflicts as we do not know whether to shift or reduce in these states. Similarly grammars can have reduce/reduce conflicts when two completed items occur in the same state.

However, notice that to complete a successful parse when we reduce by the rule [s -> ] we must have the next token in the follow set of s (or, if s is endable, we could have reached the end of input). Notice that s is endable and only ) is in its follow set. We have the option of reducing by [s -> ] in states S0 and S2. However, we can only shift out of these states if the next token is (. So, taking this information into account, we can again tell whether we should shift or reduce by inspecting the next symbol.

A grammar is said to be SLR(1) (or simple LR with a one token lookahead) if the shift/reduce (and reduce/reduce) conflicts can be resolved by using the follow sets as above.

More precisely:

Definition:

A grammar is an SLR(1) grammar if and only if for any state S in the LR(0)-item automaton of the grammar the following two conditions are satisfied:

For any item [a -> \alpha . X \beta] in S (where X is a terminal) there is no completed item [b -> \gamma .] in S with X in follow(b).

For any two completed items [b -> \beta.] and [b' -> \beta'.] in S the sets follow(b) and follow(b') are disjoint.
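The two conditions can be checked state by state. Below is a minimal Python sketch, assuming the states of the automaton for s -> ( s ) | (empty) are typed in by hand as (head, righthand side, dot position) triples and that end of input is written as $ inside the follow sets (so "s is endable" becomes "$ is in FOLLOW of s"). All names are ours.

```python
# States of the LR(0) item automaton for s' -> s, s -> ( s ) | (empty).
PAREN = ("(", "s", ")")
STATES = {
    "S0": [("s'", ("s",), 0), ("s", PAREN, 0), ("s", (), 0)],
    "S1": [("s'", ("s",), 1)],
    "S2": [("s", PAREN, 1), ("s", PAREN, 0), ("s", (), 0)],
    "S3": [("s", PAREN, 2)],
    "S4": [("s", PAREN, 3)],
}
TERMINALS = {"(", ")"}
FOLLOW = {"s'": {"$"}, "s": {")", "$"}}   # "$" records that s and s' are endable

def lr0_conflicts(states):
    """States that contain a completed item without being singletons."""
    return [name for name, items in states.items()
            if len(items) > 1 and any(dot == len(rhs) for _, rhs, dot in items)]

def slr1_conflicts(states, follow, terminals):
    """Violations of the two SLR(1) conditions, as (state, kind) pairs."""
    conflicts = []
    for name, items in states.items():
        shifts = {rhs[dot] for _, rhs, dot in items
                  if dot < len(rhs) and rhs[dot] in terminals}
        completed = [head for head, rhs, dot in items if dot == len(rhs)]
        for i, head in enumerate(completed):
            if shifts & follow[head]:                     # first condition
                conflicts.append((name, "shift/reduce"))
            for other in completed[i + 1:]:
                if follow[head] & follow[other]:          # second condition
                    conflicts.append((name, "reduce/reduce"))
    return conflicts

print("LR(0) conflicts: ", lr0_conflicts(STATES))
print("SLR(1) conflicts:", slr1_conflicts(STATES, FOLLOW, TERMINALS))
```

The first function reports S0 and S2, while the second reports nothing, because the only token that can be shifted out of those states, (, is not in the follow set of s.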

The grammar above (s -> ( s ) | .) is an example of an SLR(1) grammar. However, this is still not a very large class of grammars:

Exercise:

Show that

e -> e ADD t
| t.
t -> t MUL f
| f.
f -> VAR.

is an SLR(1) grammar. (See dragon book and class notes)

Show that the following grammar is not SLR(1):

stmt -> call_stmt
| assign_stmt.
call_stmt -> ID.
assign_stmt -> var ASSIGN exp.
var -> var LSPAR exp RSPAR
| ID.
exp -> var | NUM.

Is LL(1) included in SLR(1)? Consider the following grammar:

s -> a A a B
| b B b A.
a -> .
b -> .

LR(1) grammars

An LR(1) item consists of an LR(0) item paired with the lookahead symbol. Thus, the states of the \epsilon-NFA of items for G' consist of:

{ (A -> \alpha . \beta, t) | A -> \alpha ++ \beta is a production of G and t is a token}

where ++ stands for append, as usual.

There are as before two sorts of transitions between these items:

Shift transitions:

[(A -> \a . X \b,t) ] --> [(A -> \a X. \b,t)]

Expansion transitions: (silent transitions)

[ (A -> \a . X \b,t) ] ~~~> [ (X -> .\d,t') ]

where t' in First(\b t). (N.B. here we are allowing the end-of-input symbol $ to be in the follow set.)

The start state is the item [(S' -> .S,$)]. As for the LR(0)-item automaton, the final states of the LR(1)-item automaton are those whose first coordinate is a completed item: with such a state we associate a reduce action which depends on the next token being the second coordinate.

We may inspect the deterministic version of this automaton to determine whether the grammar is LR(1). It is LR(1) if and only if there are no reduce/reduce or shift/reduce conflicts. Here a shift/reduce conflict happens when a completed item in a state has its "follow token" also occurring as a shift from that state. A reduce/reduce conflict occurs when there are two completed items in the state with the same "follow token". As discussed earlier any LL(1) grammar is necessarily LR(1). Many practical programming grammars are LR(1).
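To see the lookahead rule t' in First(\b t) at work, here is a minimal Python sketch of the closure of a set of LR(1) items under the expansion transitions. It reuses the grammar s -> ( s ) | (empty) from the SLR(1) section. An LR(1) item is assumed to be stored as ((production index, dot position), token), the empty string is used inside FIRST sets to mean "can derive the empty string", and all names are ours.

```python
GRAMMAR = [("s'", ("s",)), ("s", ("(", "s", ")")), ("s", ())]
NONTERMINALS = {head for head, _ in GRAMMAR}

def first_of(symbols, first):
    """FIRST of a sequence of symbols; "" in the result means it can be empty."""
    out = set()
    for sym in symbols:
        if sym not in NONTERMINALS:
            out.add(sym)
            return out
        out |= first[sym] - {""}
        if "" not in first[sym]:
            return out
    out.add("")
    return out

def first_sets():
    """FIRST of every nonterminal, computed as a least fixed point."""
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for head, body in GRAMMAR:
            new = first_of(body, first)
            if not new <= first[head]:
                first[head] |= new
                changed = True
    return first

FIRST = first_sets()

def closure(items):
    """Close a set of LR(1) items under the expansion transitions."""
    result, todo = set(items), list(items)
    while todo:
        (prod, dot), token = todo.pop()
        body = GRAMMAR[prod][1]
        if dot < len(body) and body[dot] in NONTERMINALS:
            # Lookaheads of the expanded items: First(\b t).
            lookaheads = first_of(body[dot + 1:] + (token,), FIRST)
            for i, (head, _) in enumerate(GRAMMAR):
                if head == body[dot]:
                    for t in lookaheads:
                        if ((i, 0), t) not in result:
                            result.add(((i, 0), t))
                            todo.append(((i, 0), t))
    return frozenset(result)

print(sorted(closure({((0, 0), "$")})))   # start state: every lookahead is $
print(sorted(closure({((1, 1), "$")})))   # after shifting "(": inner items expect ")"
```

In the start state the empty production carries the lookahead $, while after a ( it carries ), so the two reductions by [s -> ] are told apart by the item itself rather than by the global follow set.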

LALR(1) grammars

The problem with LR(1) grammars is that (potentially) the item automaton can get very large. For this reason there is a tendency to use LALR(1) grammars. They use the LR(0) deterministic item automaton's set of states but propagate the sets of possible follow tokens to the items of these states. Thus the states of the LALR(1) item automaton consist of sets of items paired with sets of follow tokens.

The sets of follow tokens must be generated by calculating what must be in them. This involves a fixed point calculation where one successively adds tokens into these sets until no more additions happen.

A shift requires that all the follow tokens are transmitted to the target item. An expansion requires one to add in all the possible first tokens of the follow sentential forms:

Shift transitions:

[ (A -> \a. X \b,T) ] ----------> [ (A -> \a X. \b,T') ]

T is contained in T' (follow set of tokens of first item must be in target item.)

Expansion transitions:

[ (A -> \a. X \b,T) ] ~~~~~~~> [ (X -> .\c, T') ]

where T' contains { t' | t' in First(\b t) and t in T }.

Now if the automaton has cycles it may be necessary to propagate the follow sets several times round the automaton before they reach their least fixed point. (Rewording prompted by the Red Philosopher!)

Example:

s -> a A
| B a C
| D C
| B D A.
a -> D.

This grammar is LALR(1) but it is not SLR(1) nor is it LL(1). Only s is endable and the first and follow sets are:

Follow(s) = {}
Follow(a) = { A, C}
First(s) = { B, D}
First(a) = { D}

The LR(0) item automaton has states:

S0 = s' -> . s
------
s -> . a A
s -> . B a C
s -> . D C
s -> . B D A
a -> . D

S1* = s' -> s .
------

S2* = a -> D .
s -> D . C
--------

S3 = s -> B . a C
s -> B . D A
---------
a -> . D

S4* = s -> D C .
-------

S5* = s -> B D . A
a -> D .
--------

S6* = s -> B D A .
---------

S7 = s -> B a . C
---------

S8* = s -> B a C .
---------

S9 = s -> a . A
--------

S10* = s -> a A .
-------

The transitions are:

s: S0 ---> S1
a: S0 ---> S9
B: S0 ---> S3
D: S0 ---> S2
C: S2 ---> S4*
a: S3 ---> S7
D: S3 ---> S5
A: S5 ---> S6
C: S7 ---> S8
A: S9 ---> S10

There are a number of shift/reduce conflicts. Consider S2: the follow set of a includes C so we cannot resolve this conflict using the SLR(1) technique. However, let us propagate the follow sets to the items of this automaton. This gives (each item is followed by its set of follow tokens):

S0 = s' -> . s        {$}
------
s -> . a A            {$}
s -> . B a C          {$}
s -> . D C            {$}
s -> . B D A          {$}
a -> . D              {A}

S1* = s' -> s .       {$}
------

S2* = a -> D .        {A}
s -> D . C            {$}
--------

S3 = s -> B . a C     {$}
s -> B . D A          {$}
---------
a -> . D              {C}

S4* = s -> D C .      {$}
-------

S5* = s -> B D . A    {$}
a -> D .              {C}
--------

S6* = s -> B D A .    {$}
---------

S7 = s -> B a . C     {$}
---------

S8* = s -> B a C .    {$}
---------

S9 = s -> a . A       {$}
--------

S10* = s -> a A .     {$}
-------

Notice how the states with the completed item [a -> D.] separate the global follow set of a into two components. This separation is sufficient to disambiguate when a reduction is appropriate.
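The propagation can be carried out mechanically. The following Python sketch builds the LR(0) states for the example grammar and then repeats the shift and expansion rules until no follow set grows. To stay short it assumes the grammar has no empty productions (true here), so First(\b t) is simply First of the first symbol of \b, or {t} when \b is empty. All names are ours.

```python
GRAMMAR = [("s'", ("s",)), ("s", ("a", "A")), ("s", ("B", "a", "C")),
           ("s", ("D", "C")), ("s", ("B", "D", "A")), ("a", ("D",))]
NTS = {head for head, _ in GRAMMAR}

# FIRST sets by fixed point (assumption: no empty productions in GRAMMAR).
FIRST = {n: set() for n in NTS}
changed = True
while changed:
    changed = False
    for head, body in GRAMMAR:
        new = FIRST[body[0]] if body[0] in NTS else {body[0]}
        if not new <= FIRST[head]:
            FIRST[head] |= new
            changed = True

def closure(items):
    result, todo = set(items), list(items)
    while todo:
        p, d = todo.pop()
        body = GRAMMAR[p][1]
        if d < len(body) and body[d] in NTS:
            for i, (head, _) in enumerate(GRAMMAR):
                if head == body[d] and (i, 0) not in result:
                    result.add((i, 0))
                    todo.append((i, 0))
    return frozenset(result)

def build():
    """LR(0) states (item = (production index, dot)) and their transitions."""
    states, transitions, todo = [closure({(0, 0)})], {}, [0]
    while todo:
        n = todo.pop()
        nexts = {GRAMMAR[p][1][d] for p, d in states[n] if d < len(GRAMMAR[p][1])}
        for sym in sorted(nexts):
            target = closure({(p, d + 1) for p, d in states[n]
                              if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == sym})
            if target not in states:
                states.append(target)
                todo.append(len(states) - 1)
            transitions[(n, sym)] = states.index(target)
    return states, transitions

def propagate(states, transitions):
    """Follow-token set of every (state, item), as a least fixed point."""
    la = {(n, item): set() for n, state in enumerate(states) for item in state}
    la[(0, (0, 0))].add("$")
    changed = True
    while changed:
        changed = False
        for n, state in enumerate(states):
            for p, d in state:
                body = GRAMMAR[p][1]
                if d == len(body):
                    continue
                sym, rest, here = body[d], body[d + 1:], la[(n, (p, d))]
                targets = [((transitions[(n, sym)], (p, d + 1)), set(here))]  # shift
                if sym in NTS:                                                # expansion
                    if not rest:
                        new = set(here)
                    else:
                        new = set(FIRST[rest[0]]) if rest[0] in NTS else {rest[0]}
                    targets += [((n, (i, 0)), new)
                                for i, (head, _) in enumerate(GRAMMAR) if head == sym]
                for key, tokens in targets:
                    if not tokens <= la[key]:
                        la[key] |= tokens
                        changed = True
    return la

states, transitions = build()
la = propagate(states, transitions)
for n, state in enumerate(states):
    if (5, 1) in state:                      # the completed item a -> D .
        print("state", n, "reduce a -> D on", sorted(la[(n, (5, 1))]))
```

The state numbers printed depend on discovery order and need not match S2 and S5, but the two states holding [a -> D.] end up with the follow sets {A} and {C}, which is the separation noted above.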

LALR(1) grammars cover many of the grammars that one usually meets in the course of developing a programming language.

Yacc recognizes LALR(1) grammars.

Consider the grammar:

e -> e ADD t
| t.
t -> t MUL f
| f.
f -> VAR.

Start by adding a separate start symbol e':

e' -> e
e -> e ADD t
| t.
t -> t MUL f
| f.
f -> VAR.

The vital statistics are:

NT	First	Follow
f	{VAR}	{MUL,ADD,$}
t	{VAR}	{MUL,ADD,$}
e	{VAR}	{ADD,$}
e'	{VAR}	{$}

(everything is endable)
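The table can be recomputed by the usual fixed point calculation. Here is a minimal Python sketch, assuming the grammar is given as (head, body) pairs with the added start rule first, and that end of input is written $ inside the follow sets as in the table. The function name first_follow is ours.

```python
EOF = "$"
GRAMMAR = [
    ("e'", ("e",)),
    ("e", ("e", "ADD", "t")), ("e", ("t",)),
    ("t", ("t", "MUL", "f")), ("t", ("f",)),
    ("f", ("VAR",)),
]

def first_follow(grammar):
    """Compute the First and Follow sets by iterating to a fixed point."""
    nonterminals = {head for head, _ in grammar}
    nullable = set()
    first = {n: set() for n in nonterminals}
    follow = {n: set() for n in nonterminals}
    follow[grammar[0][0]].add(EOF)

    changed = True
    while changed:
        changed = False
        for head, body in grammar:
            before = (len(first[head]), head in nullable)
            # First: scan the body up to the first symbol that cannot vanish.
            for sym in body:
                first[head] |= first[sym] if sym in nonterminals else {sym}
                if sym not in nullable:
                    break
            else:
                nullable.add(head)      # every symbol can vanish (or body is empty)
            changed |= before != (len(first[head]), head in nullable)

            # Follow: walk the body right to left, tracking what can come next.
            trailer = set(follow[head])
            for sym in reversed(body):
                if sym in nonterminals:
                    if not trailer <= follow[sym]:
                        follow[sym] |= trailer
                        changed = True
                    if sym in nullable:
                        trailer = trailer | first[sym]
                    else:
                        trailer = set(first[sym])
                else:
                    trailer = {sym}
    return first, follow

first, follow = first_follow(GRAMMAR)
for nt in ["f", "t", "e", "e'"]:
    print(nt, sorted(first[nt]), sorted(follow[nt]))
```

The sketch also copes with empty productions (through the nullable set), although this particular grammar has none.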

The deterministic item automaton has the following states (completed items starred!):

s0 = e' -> .e
--------
e -> .e ADD t
e -> .t
t -> .t MUL f
t -> .f
f -> .VAR

s1 = e' -> e. *
e -> e.ADD t
------------

s2 = e -> e ADD.t
------------
t -> .t MUL f
t -> .f
f -> .VAR

s3 = e -> e ADD t. *
t -> t.MUL f
-------------

s4 = t -> t MUL.f
------------
f -> .VAR

s5 = f -> VAR. *
----------

s6 = e -> t. *
t -> t.MUL f
------------

s7 = t -> f. *
--------

s8 = t -> t MUL f. *
-------------

Transitions:

(s0, e, s1)
(s0, t, s6)
(s0, f, s7)
(s0, VAR,s5)
(s1, ADD, s2)
(s2, t, s3)
(s2, f, s7)
(s2, VAR, s5)
(s3, MUL, s4)
(s4,VAR,s5)
(s4, f, s8)
(s6, MUL, s4)

Is the grammar LR(0)?

No: there are shift/reduce conflicts in the following states: { s1, s3, s6 }

Is the grammar SLR(1)?

That is, can the follow sets of the nonterminal to which one is reducing help to decide whether one should shift or reduce? Let us consider each state in turn:

s1: The conflict is between reduce[e' -> e] and shift[ADD]. follow(e') = {$}, so the only time one should reduce is if one has reached the end of input. So simple lookahead resolves this.

s3: The conflict is between reduce[e -> e ADD t] and shift[MUL]. follow(e) = {ADD,$}, so one should reduce only if one is at the end of input or the next symbol is an ADD. So simple lookahead resolves this.

s6: The conflict is between reduce[e -> t] and shift[MUL]. Again follow(e) = {ADD,$} does not contain MUL, so simple lookahead resolves the conflict.

So the grammar is SLR(1)!

Let us do a shift/reduce parse:

STACK	INPUT	ACTION
s0	"VAR ADD VAR MUL VAR ADD VAR"	shift
s0[VAR]s5	"ADD VAR MUL VAR ADD VAR"	reduce[f -> VAR]
s0[f]s7	"ADD VAR MUL VAR ADD VAR"	reduce[t -> f]
s0[t]s6	"ADD VAR MUL VAR ADD VAR"	reduce[e -> t] follow(e) = {ADD,$}
s0[e]s1	"ADD VAR MUL VAR ADD VAR"	shift[ADD] follow(e') = { $ }
s0[e]s1[ADD]s2	"VAR MUL VAR ADD VAR"	shift[VAR]
s0[e]s1[ADD]s2[VAR]s5	"MUL VAR ADD VAR"	reduce[f -> VAR]
s0[e]s1[ADD]s2[f]s7	"MUL VAR ADD VAR"	reduce[t -> f]
s0[e]s1[ADD]s2[t]s3	"MUL VAR ADD VAR"	shift[MUL] follow(e) = {ADD,$}
s0[e]s1[ADD]s2[t]s3[MUL]s4	"VAR ADD VAR"	shift[VAR]
s0[e]s1[ADD]s2[t]s3[MUL]s4[VAR]s5	"ADD VAR"	reduce[f -> VAR]
s0[e]s1[ADD]s2[t]s3[MUL]s4[f]s8	"ADD VAR"	reduce[t -> t MUL f]
s0[e]s1[ADD]s2[t]s3	"ADD VAR"	reduce[e -> e ADD t]
s0[e]s1	"ADD VAR"	shift[ADD] follow(e') = { $ }
s0[e]s1[ADD]s2	"VAR"	shift[VAR]
s0[e]s1[ADD]s2[VAR]s5	""	reduce[f -> VAR]
s0[e]s1[ADD]s2[f]s7	""	reduce[t -> f]
s0[e]s1[ADD]s2[t]s3	""	reduce[e -> e ADD t]
s0[e]s1	""	accept follow(e') = { $ }
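The trace above can be reproduced by a small driver which consults the follow sets before reducing. This is a sketch only: the transitions, the completed items and the follow sets are typed in by hand from the tables above, the integer n stands for state sn, and the names TRANSITIONS, REDUCTIONS, FOLLOW and parse are ours.

```python
TRANSITIONS = {
    (0, "e"): 1, (0, "t"): 6, (0, "f"): 7, (0, "VAR"): 5,
    (1, "ADD"): 2,
    (2, "t"): 3, (2, "f"): 7, (2, "VAR"): 5,
    (3, "MUL"): 4,
    (4, "VAR"): 5, (4, "f"): 8,
    (6, "MUL"): 4,
}
# The completed item of each starred state: state -> (head, righthand side).
REDUCTIONS = {
    1: ("e'", ("e",)), 3: ("e", ("e", "ADD", "t")), 5: ("f", ("VAR",)),
    6: ("e", ("t",)), 7: ("t", ("f",)), 8: ("t", ("t", "MUL", "f")),
}
FOLLOW = {"e'": {"$"}, "e": {"ADD", "$"},
          "t": {"MUL", "ADD", "$"}, "f": {"MUL", "ADD", "$"}}

def parse(tokens):
    """Return the list of actions taken, or None if the input is rejected."""
    tokens = list(tokens) + ["$"]          # "$" marks the end of input
    stack, pos, actions = [0], 0, []
    while True:
        state, look = stack[-1], tokens[pos]
        if (state, look) in TRANSITIONS:
            stack.append(TRANSITIONS[(state, look)])
            pos += 1
            actions.append(f"shift[{look}]")
        elif state in REDUCTIONS and look in FOLLOW[REDUCTIONS[state][0]]:
            head, rhs = REDUCTIONS[state]
            if head == "e'":
                actions.append("accept")
                return actions
            del stack[-len(rhs):]                          # pop the handle
            stack.append(TRANSITIONS[(stack[-1], head)])   # go forward by the head
            actions.append(f"reduce[{head} -> {' '.join(rhs)}]")
        else:
            return None

for action in parse("VAR ADD VAR MUL VAR ADD VAR".split()):
    print(action)
```

In states s1, s3 and s6 the next token alone decides between the shift and the reduction, which is the SLR(1) resolution discussed above.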

LL(1)	<	LR(1)
LL(2)	<	LR(2)
LL(3)	<	LR(3)
	...

Parse Stack	Input	Action
	"()"	shift[(]
(	")"	reduce(r_3)
( s	")"	shift[)]
( s )	""	reduce(r_3)
( s ) s	""	reduce(r_2)
s	""	reduce(r_1)
s'		ACCEPT