{- Title: Recursive descent parsing in Haskell (lectures 3 and 4) Author: Robin Cockett Date: 18 January 2007 Recall that after transforming the grammar we obtain a grammar with no left recursion which is suitable for recursive descent parsing (in fact it is an LL(1) grammar although I have not yet explained this). Here is the VERY basic version of the code which shows how one can directly translate this grammar into code. In general, for an LL(1) grammar some more work may be required ... the purpose of this example is to set you on the road to understanding the general process ... and to show how closely related the grammar and the code can be. (You can run this most easily using the hugs interpreter >hugs ... main> l: parser1_07.hs main> exp [TNum 7, TAdd, TNum 3] ... ) -} import Control.Applicative import Control.Monad (liftM,ap) {- You need the above to help with the monad definition because now Haskell insists monads be applicative ... (very annoying) -} data Tokens = TNum Integer | TAdd | TMul deriving (Show,Eq) {- Grammar Haskell Code ======= ============ exp -> term more_exp exp ts = more_exp (term ts) more_exp -> + term more_exp more_exp (TAdd:ts) = more_exp (term ts) |. more_exp ts = ts term -> factor more_term term ts = more_term (factor ts) more_term -> * factor more_term more_term (TMul:ts) = more_term (factor ts) |. more_term ts = ts factor -> NUM. factor ((TNum n):ts) = ts There are two major problems with this code: (a) ERROR DETECTION: the parse is successful in case the function exp eats the whole list of input tokens. If it is not successful one of two things can happen: it can either return a nonempty list or fail to match a case and so rather ungracefully crash! (b) SYNTAX TREE: It really should return a syntax tree! We shall address the first problem, (a), to start with by returning the remaining token list when a parse failure occurs. "Instrumenting" the code with reasonable error detection/reporting results in a code expansion. It also requires the use of a datatype, SorF [Tokens]. Here is the code modified in a rather inelegant way to accommodate error detection: -} data SorF a = Success a | Fail String deriving (Show,Eq) {- Recently Haskell required that Mondas are "applicative" this means the next two declarations are obligitary .... (very annoying) -} instance Applicative SorF where pure = return (<*>) = ap instance Functor SorF where fmap = liftM {- parser ts = successful (exp ts) where successful (Success []) = True % returns true if recognized successful (Success ts) = error (show ts) % otherwise shows token successful (Fail str) = error str % string not processed exp (Success ts) = rest_exp (term (Success ts)) exp (Fail str) = Fail str rest_exp (Success (TAdd:ts)) = rest_exp (term (Success ts)) rest exp (Success ts) = Success ts rest_exp (Fail str) = Fail str term (Success ts) = rest_term (factor (Success ts)) term (Success ts) = Success ts term (Fail ts) = Fail ts rest_term (Success (Tmul:ts)) = rest_term (factor (Success ts)) rest_term (Success ts) = Success ts rest_term (Fail ts) = Fail ts factor (Success ((Num n): ts)) = Success ts factor (Success ts) = Fail ts factor (Fail ts) = Fail ts Notice in this code we are explicitly passing back through the recursive calls the failure. This makes for alot of repetition in the code. In addition we have to start by distinguishing the Success(ful) case which also makes it much more wordy and heavy. In Haskell whenever one needs to "instrument" the code in this fashion to collect errors or some state information one should always think MONADS!!! They usually will allow a much more elegant expression. Here is exception monad on the SorF type constructor: -} instance Monad SorF where return x = Success x (Success x) >>= f = f x (Fail str) >>= _ = Fail str {- YES YOU CAN DECLARE YOUR OWN MONADS!! This allows you to instrument your code appropriately without too much clutter (an incredibly useful facility!). However be advised: monads are a sophisticated feature of Haskell and so, as you begin to get the feel of them, you may like to keep fairly close to my examples. The above exception monad is actually one of the simplest examples of a monad. Essentially it automatically propagates the failure case for you. Here is the same code using monads: -} exp::[Tokens] -> (SorF [Tokens]) exp ts = do ts' <- term ts rest_exp ts' rest_exp (TAdd:ts) = do ts' <-term ts rest_exp ts' rest_exp ts = return ts term ts = do ts' <- factor ts more_term ts' more_term (TMul:ts) = do ts' <- factor ts more_term ts' more_term ts = return ts factor ((TNum n):ts) = return ts factor ts = Fail (show ts) {- Notice that the code no longer has to handle the failure case so it begins to look much like the original very basic code. A word about the "do" syntax: it is recursively translated out using the following scheme: do {x <- hexp; dos} ===> hexp >>= q where q x = do { dos } do { hexp; dos } ===> hexp >>= (\_ => do { dos }) do { hexp } ===> hexp where hexp == Haskell expression dos == non-empty series of do statements The last step is to produce a syntax tree. First we have to define the form of the syntax tree: -} data Expr = Num Integer | Add Expr Expr | Mul Expr Expr deriving (Show,Eq) {- Next we have to build the syntax tree. Here I am taking the simplest possible route: instead of the parse step just producing the unprocessed tokens we arrange that it produces a pair: an expression together with the remaining tokens. Then when we are collecting an expression (rest_exp, rest_term) we will pass in what has been collected so far. In the code below I have included the types of the functions. This is good programming practise as it is a basic check that the code you have written does what you intended it to do. You should eventually include the types in your programs although when you are developing the program you may find this type information adds unneeded clutter and it is more efficient to rely on the type inference which Haskell provides. Here is the program: -} exp'::[Tokens] -> (SorF (Expr,[Tokens])) exp' ts = do (e,ts') <- term' ts rest_exp' e ts' rest_exp':: Expr -> [Tokens] -> (SorF (Expr,[Tokens])) rest_exp' e (TAdd:ts) = do (e',ts') <- term' ts rest_exp' (Add e e') ts' rest_exp' e ts = return (e,ts) term':: [Tokens] -> (SorF (Expr,[Tokens])) term' ts = do (e,ts') <- factor' ts more_term' e ts' more_term':: Expr -> [Tokens] -> (SorF (Expr,[Tokens])) more_term' e (TMul:ts) = do (e',ts') <- factor' ts more_term' (Mul e e') ts' more_term' e ts = return (e,ts) factor'::[Tokens] -> (SorF (Expr,[Tokens])) factor' ((TNum n):ts) = return (Num n,ts) factor' ts = Fail (show ts)