Clades V0.1 IPA

The following is signed for the following iPads:

  • Prezemek (1 & 2)
  • Chris
  • Steve
  • Thomas

Download Clades V0.1

Symbol Tables

The symbol table is a data structure used to describe the identifiers present in the scope of a program at a given point in the source code. It is especially useful to help keep track of types of identifiers (symbols) during the semantic checking phase of a compiler.

Implementation

A symbol table implementation must have at least an insert and a delete operation. Furthermore, a mechanism to handle nested scope needs to be present to disambiguate which identifier is being referred to when one declaration occludes another.

Insertion

From the course notes:

These are the five types of thing we may want to insert

(1) an argument with its type
(2) a (local) variable with its type
(3) a (local) function with its type
(4) the name of a new defined datatype and its constructors
(5) the names of all the constructors with their types (the output type is the name of the datatype).

The last two obviously are only required for the bonus points.

Lookup

When we lookup an identifier in the symbol table, we expect one of the following to be returned (again from course notes):

(1) the symbol is a variable:

  • VARIABLE(level, offset, type) it was found at level <level> in the symbol table and is at offset <offset> from the frame pointer of that activation recor and the variable is of type <type>.

(2) the symbol is a function:

  • FUNCTION(level,label,arg_type,type) it was found at level <level>, the label to which you must jump is <label>. The types of its arguments are <arg_type> and the type of its returned value is <type>.

(3) The symbol is an introduced type with the following coonstructors.
(4) The symbol is a constructor with the indicated type and jump offset.

We must also ensure that the correct identifier is looked up in the case that two identifiers with same name are both within scope. In our language, the one which is syntactically nearest to the actual reference of the identifier (in terms of the distance within the AST) is the correct one. This can be implemented easily be using a stack of symbol tables where the stack is pushed within every new block or function definition. Then, when an identifier is looked up, the stack is searched beginning at the top.

Using The Symbol Table

Say we have the following code:
var x[2]:int;
fun exp(b:int):int {
  var z:int;
  if b=0 then z:= 1
         else z:= x[1] * exp(b-1);
  return z;
};
read x[0];
read x[1];
print exp(x[0])

As we descend through the AST to generate intermediate code, we want to be able to catch errors which were not possible to formulate using a grammar alone. Such errors include trying to add an integer to a float, or perhaps referencing a variable or function that has not been declared.  So, if we are trying to check the expression “z:=1″ we would need to know what the variable “z” is. This information is retrieved through a lookup. Since the variable “z” should have been inserted into the symbol table by the time that we reach “z:=1″, we should be able to tell that “z” is not a function call and has been declared as an integer.

Testing a Grammar for Left-Recursion

First, we will define an immediate left recursive relation between two nonterminals as follows:

  • A nonterminal y is immediately left recursive on x, written xy, if and only if there is a production production x → αyβ with α nullable

Using nonterminals as nodes, we can construct a graph by creating an edge between two nonterminals when one is immediately left recursive on the other.

A grammar is left recursive if and only if the graph of immediate left recursions has a cycle. Conversely, the grammar is not left recursive when this graph is acyclic.

Below is an example grammar, with which we will illustrate the construction of the graph:

Continue reading →

CPSC203 Quiz 2 Study Guide

Available here

Vim Syntax Files for CPFG & LPFG

These files should be placed in you $HOME/.vim/syntax directory.

cpfg.vim

lpfg.vim


To actually use syntax files, the filetype of the buffer you are editing must be set correctly. Generally this is done automatically via file extension like so:

" Use LPFG (instead of LEX) for .l
au BufNewFile,BufRead *.l set filetype=lpfg

Of course, since both CPFG and LPFG use the .l extension, you’ll probably also need to set the filetype manually with this command:

:set filetype=cpfg

Symbol Table for M

The reference for symbol tables in our compiler is at http://pages.cpsc.ucalgary.ca/~robin/class/411/M+/symbol-table-07.txt.

My implementation of the above code currently typechecks OK, but should still have a few bugs in it. It is here SymbolTable.hs (also see MAst.hs in the same directory).

Overview

During semantic analysis, some important questions that we want to be able to answer are:

  • Has a variable with the same name been declared?
  • Does this expression have a valid type?
  • Is this identifier in scope?

While traversing the AST, we insert information into the symbol table and also do lookups to help us answer the questions above.

Intermediate Representation in M

You should able to:
  • List 5 differences between AST & IR
  • Describe the motivation for an IR
IR is the bridge between the AST and the machine. IR is much less language-specific and much more machine specific. For instace, many languages can compile down to the same IR which can then be used to target multiple machines. This is a primary motivation for IR.
IR also facilitates optimization. It is easier to reason about the actual computations being performed when the representation is closer to actual machine operations.

Some major differences between the IR and the AST (for M):

  • No variable names; only offsets. The offset is always relative to a particular scope and in this respect it uniquely determines the location where a variable is stored.
  • No function names, only labels (“fn1″, “fn2″, “fn3″, … etc) and level indexes.
  • All expressions are now typed ( use symbol tables )
  • All function definitions & variable declarations within each scope are collected together.
  • Basic declarations are removed, type is contained in the tree (e.g. I_READ_{I|B|F|C} )

M -> AST -> IR

M:

var x,y:int;

fun exp(b:int):int
{ var z:int;
  if b=0 then z:= 1
  else z:= x * exp(b-1);
  return z;
};
read x;
read y;
print exp(y);

AST:

M_prog
 ([M_var ("x",[],M_int),
   M_var ("x",[],M_int),
   M_fun
   ( "exp", [("b",0,M_int)],M_int
    ,[M_var ("z",[],M_int)]
    ,[M_cond
      (M_app (M_eq,[M_id("b",[]),M_ival 0]),M_ass ("z",[],M_ival 1),
       M_ass ("z",[]
        M_app
        (M_mul,
         [M_id("x",[]),
           M_app
            (M_fn "exp",[M_app (M_sub,[M_id("b",[])
                               ,M_ival 1])])]))),
          M_return (M_id("z",[])]))],
  [M_read("x",[]),M_read("y",[])
  ,M_print (M_app (M_fn "exp",[M_id("y",[])]))])

IR:

([I_FUN
  ("fn1"
   ,[] -- no local function
   ,1 -- one local variable
   ,1 -- one argument
   ,[I_COND(IAPP (IEQ,[I_ID(0,-4,[]),I_IVAL 0])
   ,I_ASS(0,1,I_IVAL 1)
   ,I_ASS(0,1,I_APP(I_MUL_I,[I_ID(1,1,[])
   ,I_APP(I_CALL ("fn1",1),[I_APP(I_SUB,[I_ID(0,-4,[])
   ,I_IVAL 1])])])))
   ,I_RETURN (I_ID(0,1 []))
   ])]
 ,2 -- two local variables
 ,[I_READ_I (0,1,[])
 ,I_READ_I (0,2,[])
 ,I_PRINT_I(I_APP (I_CALL ("fn1",0),[I_ID(0,2,[])]))
])

Using Augmented Grammar to resolve LR(0) item automata conflicts

As stated in the LALR(1) notes (“The augmented grammar”), you can add non-terminals to the original grammar and use their vital statistics to resolve some of the conflicts which might appear in a LR(0) item set:

Each rule x α in the original grammar occurs as the head item in a number of LR(0)-item automaton states. For each such state, S_n, create a new copy of the production with head non-terminal augmented by the number n of the state S, and each non-terminal in the body augmented by the number of the state it leaves when that starting item is followed through its (uniquely determined) shifts in the LR(0)-item automaton.

For instance, in the example grammar

S → id B
B → V assign E | ε
V → ε
E → id V | num

whose LR(0) automata is here. If we were to inspect state #6, we would add new rule for B → V assign E that augments B with 2, V with 4, and E with 5. Following this for all the rules yields:

S0 → id B2
B2 → V2 assign E5 | ε
V2 → ε
E5 → id V7 | num
V7 → ε

It is useful to calculate the vital statistics for the non-terminals in this grammar. In this case we see that V2 is not endable. This should tell us something about the reduce/reduce conflict present in state #2… as well as whether the language is LALR(1).

Expression Parser For Haskell

Just to get you started, here is a Haskell parser example.

There is a Makefile included. You’ll need the following installed on you system to get going:

  • alex
  • happy
  • ghc

LR(0) and SLR(1)

The following handout summarizes some key points to know about these classes.

LR(0) & SLR(1)