The following is signed for the following iPads:
- Prezemek (1 & 2)
- Chris
- Steve
- Thomas
The symbol table is a data structure used to describe the identifiers present in the scope of a program at a given point in the source code. It is especially useful to help keep track of types of identifiers (symbols) during the semantic checking phase of a compiler.
A symbol table implementation must have at least an insert and a delete operation. Furthermore, a mechanism to handle nested scope needs to be present to disambiguate which identifier is being referred to when one declaration occludes another.
From the course notes:
These are the five types of thing we may want to insert
(1) an argument with its type
(2) a (local) variable with its type
(3) a (local) function with its type
(4) the name of a new defined datatype and its constructors
(5) the names of all the constructors with their types (the output type is the name of the datatype).
The last two obviously are only required for the bonus points.
When we lookup an identifier in the symbol table, we expect one of the following to be returned (again from course notes):
(1) the symbol is a variable:
(2) the symbol is a function:
(3) The symbol is an introduced type with the following coonstructors.
(4) The symbol is a constructor with the indicated type and jump offset.
We must also ensure that the correct identifier is looked up in the case that two identifiers with same name are both within scope. In our language, the one which is syntactically nearest to the actual reference of the identifier (in terms of the distance within the AST) is the correct one. This can be implemented easily be using a stack of symbol tables where the stack is pushed within every new block or function definition. Then, when an identifier is looked up, the stack is searched beginning at the top.
var x[2]:int;
fun exp(b:int):int {
var z:int;
if b=0 then z:= 1
else z:= x[1] * exp(b-1);
return z;
};
read x[0];
read x[1];
print exp(x[0])
As we descend through the AST to generate intermediate code, we want to be able to catch errors which were not possible to formulate using a grammar alone. Such errors include trying to add an integer to a float, or perhaps referencing a variable or function that has not been declared. So, if we are trying to check the expression “z:=1″ we would need to know what the variable “z” is. This information is retrieved through a lookup. Since the variable “z” should have been inserted into the symbol table by the time that we reach “z:=1″, we should be able to tell that “z” is not a function call and has been declared as an integer.
First, we will define an immediate left recursive relation between two nonterminals as follows:
Using nonterminals as nodes, we can construct a graph by creating an edge between two nonterminals when one is immediately left recursive on the other.
A grammar is left recursive if and only if the graph of immediate left recursions has a cycle. Conversely, the grammar is not left recursive when this graph is acyclic.
Below is an example grammar, with which we will illustrate the construction of the graph:
These files should be placed in you $HOME/.vim/syntax directory.
To actually use syntax files, the filetype of the buffer you are editing must be set correctly. Generally this is done automatically via file extension like so:
" Use LPFG (instead of LEX) for .l
au BufNewFile,BufRead *.l set filetype=lpfg
Of course, since both CPFG and LPFG use the .l extension, you’ll probably also need to set the filetype manually with this command:
:set filetype=cpfg
The reference for symbol tables in our compiler is at http://pages.cpsc.ucalgary.ca/~robin/class/411/M+/symbol-table-07.txt.
My implementation of the above code currently typechecks OK, but should still have a few bugs in it. It is here SymbolTable.hs (also see MAst.hs in the same directory).
During semantic analysis, some important questions that we want to be able to answer are:
While traversing the AST, we insert information into the symbol table and also do lookups to help us answer the questions above.
M:
var x,y:int;
fun exp(b:int):int
{ var z:int;
if b=0 then z:= 1
else z:= x * exp(b-1);
return z;
};
read x;
read y;
print exp(y);
AST:
M_prog
([M_var ("x",[],M_int),
M_var ("x",[],M_int),
M_fun
( "exp", [("b",0,M_int)],M_int
,[M_var ("z",[],M_int)]
,[M_cond
(M_app (M_eq,[M_id("b",[]),M_ival 0]),M_ass ("z",[],M_ival 1),
M_ass ("z",[]
M_app
(M_mul,
[M_id("x",[]),
M_app
(M_fn "exp",[M_app (M_sub,[M_id("b",[])
,M_ival 1])])]))),
M_return (M_id("z",[])]))],
[M_read("x",[]),M_read("y",[])
,M_print (M_app (M_fn "exp",[M_id("y",[])]))])
IR:
([I_FUN
("fn1"
,[] -- no local function
,1 -- one local variable
,1 -- one argument
,[I_COND(IAPP (IEQ,[I_ID(0,-4,[]),I_IVAL 0])
,I_ASS(0,1,I_IVAL 1)
,I_ASS(0,1,I_APP(I_MUL_I,[I_ID(1,1,[])
,I_APP(I_CALL ("fn1",1),[I_APP(I_SUB,[I_ID(0,-4,[])
,I_IVAL 1])])])))
,I_RETURN (I_ID(0,1 []))
])]
,2 -- two local variables
,[I_READ_I (0,1,[])
,I_READ_I (0,2,[])
,I_PRINT_I(I_APP (I_CALL ("fn1",0),[I_ID(0,2,[])]))
])
As stated in the LALR(1) notes (“The augmented grammar”), you can add non-terminals to the original grammar and use their vital statistics to resolve some of the conflicts which might appear in a LR(0) item set:
Each rule x → α in the original grammar occurs as the head item in a number of LR(0)-item automaton states. For each such state, S_n, create a new copy of the production with head non-terminal augmented by the number n of the state S, and each non-terminal in the body augmented by the number of the state it leaves when that starting item is followed through its (uniquely determined) shifts in the LR(0)-item automaton.
For instance, in the example grammar
S → id B B → V assign E | ε V → ε E → id V | num
whose LR(0) automata is here. If we were to inspect state #6, we would add new rule for B → V assign E that augments B with 2, V with 4, and E with 5. Following this for all the rules yields:
S0 → id B2 B2 → V2 assign E5 | ε V2 → ε E5 → id V7 | num V7 → ε
It is useful to calculate the vital statistics for the non-terminals in this grammar. In this case we see that V2 is not endable. This should tell us something about the reduce/reduce conflict present in state #2… as well as whether the language is LALR(1).
Just to get you started, here is a Haskell parser example.
There is a Makefile included. You’ll need the following installed on you system to get going:
The following handout summarizes some key points to know about these classes.