CPSC 411: Compiling M+ to AM ============================================================================ Author: Robin Cockett Date: 6 March 2006 ============================================================================ The next three assignments may be accomplished in a group of at most two. The assignments require you to implement a compiler for M+ into the AM stack machine code (see a separate document). The stages of the assignment are outlined below: Steps: (1) (March 20) The ALEX/HAPPY front end for M+ with translation into the syntax tree (see specification at the end of document). Required: basic error handling, construction of the syntax tree, showing of the syntax tree. Demonstrate your program on example M+ programs. (2) (April 6) Semantic checking for M+. Using a symbol table to check that the syntax tree is legal: check the scope of variables and that all expressions type check. Print the IR tree. Demonstrate your program on example M+ programs. (3) (April 13) Generate AM stack machine code .... demonstrate your compiler on example M+ programs. ============================================================================= M+ is an extension of Minisculus which has local variables, functions, and arrays. The next three assignments require you to implement M+ on the AM stack machine. You will need to write the compiler to stack machine code. You are REQUIRED to use ALEX and HAPPY. The first thing you must do is to sort out the syntax and implement it with a minimum of actions to obtain a syntax tree. The next stage is to check that identifiers and functions are correctly used and declared before their use (scope) and that expressions are correctly typed. This will require that you build a simple symbol table and implement simple type checking. The last stage is to generate the AM code. ============================================================================= PART I: LANGUAGE DESCRIPTION ------------------------------ The syntax of M+ is defined as follows: M+: ============================================================ prog -> block block -> declarations program_body. declarations -> declaration SEMICOLON declarations |. declaration -> var_declaration | fun_declaration. var_declaration -> VAR ID array_dimensions COLON type. type -> INT | REAL | BOOL. array_dimensions -> SLPAR expr SRPAR array_dimensions |. fun_declaration -> FUN ID param_list COLON type CLPAR fun_block CRPAR. fun_block -> declarations fun_body. param_list -> LPAR parameters RPAR. parameters -> basic_declaration more_parameters |. more_parameters -> COMMA basic_declaration more_parameters |. basic_declaration -> ID basic_array_dimensions COLON type. basic_array_dimensions -> SLPAR SRPAR basic_array_dimensions |. program_body -> BEGIN prog_stmts END. fun_body -> BEGIN prog_stmts RETURN expr SEMICOLON END. prog_stmts -> prog_stmt SEMICOLON prog_stmts |. prog_stmt -> IF expr THEN prog_stmt ELSE prog_stmt | WHILE expr DO prog_stmt | READ identifier | identifier ASSIGN expr | PRINT expr | CLPAR block CRPAR. identifier -> ID array_dimensions. expr -> expr OR bint_term | bint_term. bint_term -> bint_term AND bint_factor | bint_factor. bint_factor -> NOT bint_factor | int_expr compare_op int_expr | int_expr. compare_op -> EQUAL | LT | GT | LE |GE. int_expr -> int_expr addop int_term | int_term. addop -> ADD | SUB. int_term -> int_term mulop int_factor | int_factor. mulop -> MUL | DIV. int_factor -> LPAR expr RPAR | SIZE LPAR ID basic_array_dimensions RPAR | FLOAT LPAR expr RPAR | FLOOR LPAR expr RPAR | CEIL LPAR expr RPAR | ID modifier_list | IVAL | RVAL | BVAL | SUB int_factor. modifier_list -> LPAR arguments RPAR | array_dimensions. arguments -> expr more_arguments |. more_arguments -> COMMA expr more_arguments |. ============================================================ Terminals of minisculus+ ------------------------ "+" => ADD "-" => SUB "*" => MUL "/" => DIV "&&" => AND "||" => OR "not" => NOT "=" => EQUAL "<" => LT ">" => GT "=<" => LE ">=" => GE ":=" => ASSIGN "(" => LPAR ")" => RPAR "{" => CLPAR "}" => CRPAR "[" => SLPAR "]" => SRPAR ":" => COLON ";" => SEMICLON "," => COMMA "if" => IF "then" => THEN "while" => WHILE "do" => DO "read" => READ "else" => ELSE "begin" => BEGIN "end" => END "print" => PRINT "int" => INT "bool" => BOOL "real" => REAL "var" => VAR "size" => SIZE "float" => FLOAT "floor" => FLOOR "ceil" => CEIL "fun" => FUN "return" => RETURN {alpha}[_{digit}{alpha}]* => ID (identifier) {digit}+ => IVAL (integer) {digit}*.{digit}+ => RVAL (real) "false" => BVAL (booleans) "true" => BVAL where alpha = [a-zA-Z] digit = [0-9] Program comments: ---------------- M+ has two types of comments: multi-line comments /* comment */ and one line comments % comment The multi-line comments allow nesting of comments ... Commentary on the M+ grammar ------------------------------------- An M+ program is a block that is a list of declarations followed by a program body ============================================================ prog -> block. block -> declarations program_body. ============================================================ The declarations can either be function declarations or variable declarations each declaration is terminated by a semi-colon. ============================================================ declarations -> declaration SEMICOLON declarations |. declaration -> var_declaration | fun_declaration. ============================================================ A variable declaration is preceded by the reserved word "var" and declares an identifier or an array whose type is attached by a colon followed by the basic type. Arrays sizes may be given as expressions in terms of variables in whose scope the declaration lies. This allows one to declare a local array of a size dependent on some input (such as an array which is an argument to the function). M+ only has only three basic types: reals, integers, booleans. A function declaration is preceded by the reserved word "fun" and consists of an identifier followed by an argument list with a type followed by the function block. This consist of a declaration list followed by the function body enclosed in curly parentheses. The argument list consist of a (possibly empty) list of variable declarations separated by commas. Arrays are declared in argument lists without their size indicated but with the number of dimensions indicated. Arrays are passed by reference, thus they are passed as a pointer to the location at which they are stored. A function can call any function which has already been declared or is declared in the same block. Thus, (mutually) recursive functions are permissible. Functions are also allowed to use variables defined in the same block. A variable, array, or function in a minisculus program can only be legally used if it has been declared in an enclosing block or function. Thus M+ supports local function, variable, and array definitions .... ============================================================ var_declaration -> VAR ID array_dimensions COLON type. type -> INT | REAL | BOOL. array_dimensions -> SLPAR expr SRPAR array_dimensions |. fun_declaration -> FUN ID param_list COLON type CLPAR fun_block CRPAR. fun_block -> declarations fun_body. param_list -> LPAR parameters RPAR. parameters -> basic_declaration more_parameters |. more_parameters -> COMMA basic_declaration more_parameters |. basic_declaration -> ID basic_array_dimensions COLON type. basic_array_dimensions -> SLPAR SRPAR basic_array_dimensions |. ============================================================ The difference between a program body and a function body is that the function body MUST end with a return statement. Otherwise both consist of a series of program statements separated by semi-colons. Program statements include conditional ("if ... then ... else ...") statements, while loops, read statements, assignments, print statements, and blocks. Notice that a block permits the declaration of local variables and functions and is delimited by curly braces. ============================================================ program_body -> BEGIN prog_stmts END. fun_body -> BEGIN prog_stmts RETURN expr SEMICOLON END. prog_stmts -> prog_stmt SEMICOLON prog_stmts |. prog_stmt -> IF expr THEN prog_stmt ELSE prog_stmt | WHILE expr DO prog_stmt | READ identifier | identifier ASSIGN expr | PRINT expr | CLPAR block CRPAR. identifier -> ID array_dimensions. ============================================================ There are three kinds of expression in M+: integer, real, and boolean expressions. The syntax cannot distinguish these expressions and thus some type checking is necessary (and some coercions). Boolean expressions are used in conditional and while statements. Boolean expressions include the ability to compare integer and real expressions. There are a number of built-in functions: "floor", "ceil", "float" and "size". The last of these allows one to access the size or bounds of arrays. ============================================================ expr -> expr OR bint_term | bint_term. bint_term -> bint_term AND bint_factor | bint_factor. bint_factor -> NOT bint_factor | int_expr compare_op int_expr | int_expr. compare_op -> EQUAL | LT | GT | LE |GE. int_expr -> int_expr addop int_term | int_term. addop -> ADD | SUB. int_term -> int_term mulop int_factor | int_factor. mulop -> MUL | DIV. int_factor -> LPAR expr RPAR | SIZE LPAR ID basic_array_dimensions RPAR | FLOAT LPAR expr RPAR | FLOOR LPAR expr RPAR | CEIL LPAR expr RPAR | ID modifier_list | IVAL | RVAL | BVAL | SUB int_factor. ============================================================ A modifier list is either the arguments of a function or the address list of an array. Clearly these must be correctly typed. ============================================================ modifier_list -> LPAR arguments RPAR | array_dimensions. arguments -> expr more_arguments |. more_arguments -> COMMA expr more_arguments |. ============================================================ Specification of the syntax tree -------------------------------- Below is the Haskell datatype (a fairly complicated one) into which this syntax must be translated: data M_prog = M_prog ([M_decl],[M_stmt]) data M_decl = M_var (String,[M_expr],M_type) | M_fun (String,[(String,Int,M_type)],M_type,[M_decl],[M_stmt]) data M_stmt = M_ass (String,[M_expr],M_expr) | M_while (M_expr,M_stmt) | M_cond (M_expr,M_stmt,M_stmt) | M_read (String,[M_expr]) | M_print M_expr | M_return M_expr | M_block ([M_decl],[M_stmt]) data M_type = M_int | M_bool | M_real data M_expr = M_ival Integer | M_rval Float | M_bval Bool | M_size (String,Int) | M_id (String,[M_expr]) | M_app (M_operation,[M_expr]) data M_operation = M_fn String | M_add | M_mul | M_sub | M_div | M_neg | M_lt | M_le | M_gt | M_ge | M_eq | M_not | M_and | M_or | M_float | M_floor | M_ceil Miscellaneous language features: ------------------------------- (1) Can two functions which are at the same scope level have the same name but different parameter lists? Or would this be considered an error? ANSWER: This has not been specified! (a) The most restrictive solution is to disallow ALL definition of symbols with the same name ... (b) The most general solution is to allow functions which are polytypic (i.e. if their parameter lists are different then it is OK to have the same name). This is possible to do in M+. Futhermore, (b) is actually already the case for operations such as "+" and "*" whose meaning depends on the types of the arguments! Thus, the preferred/intended approach is to support this sort of ad hoc polymorphism. (2) Which sort of comment has precedent? For example is it possible to close a multi-line comment on a line which is a comment line? ANSWER: This has not been specified!! (a) the simplest (and the preferrred/intended solution) is to view single-line comments as being removed by a preprocessor before the multi-line comments are similarly removed. Thus single-line comments take precedence over multi-line comments!! (b) however, this is not the only solution!! For example one could take the view that the active comment type is that which was opened first! This would mean that mean that inside a multi-line comment one ignores single-line comments ... can you think of some other variants?