{- 
     Title: Recursive descent parsing in Haskell
            (lectures 3 and 4)
     Author: Robin Cockett
     Date: 18 January 2007

Recall that after transforming the grammar we obtain a grammar with 
no left recursion which is suitable for recursive descent parsing 
(in fact it is an LL(1) grammar although I have not yet explained 
this).

Here is the VERY basic version of the code which shows how one can 
directly translate this grammar into code. In general, for an LL(1) 
grammar some more work may be required  ... the purpose of this example 
is to set you on the road to understanding the general process ... 
and to show how closely related the grammar and the code can be.

(You can run this most easily using the hugs interpreter
    >hugs
       ...
    main> l: parser1_07.hs
    main> exp [TNum 7, TAdd, TNum 3]
       ...
)
-}

import Control.Applicative
import Control.Monad (liftM,ap)

{- You need the above to help with the monad definition because now 
Haskell insists monads be applicative  ... (very annoying) -}

data Tokens = TNum Integer
            | TAdd
            | TMul
     deriving (Show,Eq)

{-    

Grammar                            Haskell Code
=======                            ============
exp -> term more_exp               exp ts = more_exp (term ts)

more_exp -> + term more_exp        more_exp (TAdd:ts) = more_exp (term ts)
          |.                       more_exp ts = ts

term -> factor more_term           term ts = more_term (factor ts)

more_term -> * factor more_term    more_term (TMul:ts) = more_term (factor ts)
           |.                      more_term ts = ts

factor -> NUM.                     factor ((TNum n):ts) = ts

There are two major problems with this code:
(a) ERROR DETECTION: 
       the parse is successful in case the function exp eats the whole list 
       of input tokens.  If it is not successful one of two things can happen: 
       it can either return a nonempty list or fail to  match a case and so 
       rather ungracefully crash!  
(b) SYNTAX TREE:
       It really should return a syntax tree!

We shall address the first problem, (a), to start with by returning the 
remaining token list when a parse failure occurs. "Instrumenting" the code 
with reasonable error detection/reporting results in a code expansion. It 
also requires the use of a datatype, SorF [Tokens].

Here is the code modified in a rather inelegant way to accommodate error 
detection:
-} 

data SorF a = Success a
            | Fail String
     deriving (Show,Eq)

{-
Recently Haskell required that Mondas are "applicative" this means the next two 
declarations are obligitary .... (very annoying)
-} 

instance Applicative SorF where
    pure = return
    (<*>)  = ap  

instance Functor SorF where
    fmap = liftM

{-
parser ts = successful (exp ts) where
    successful (Success []) = True            % returns true if recognized
    successful (Success ts) = error (show ts) % otherwise shows token 
    successful (Fail str) = error str         % string not processed

exp (Success ts) = rest_exp (term (Success ts))
exp (Fail str) = Fail str

rest_exp (Success (TAdd:ts)) = rest_exp (term (Success ts))
rest exp (Success ts) = Success ts
rest_exp (Fail str) = Fail str

term (Success ts) = rest_term (factor (Success ts))
term (Success ts) = Success ts
term (Fail ts) = Fail ts

rest_term (Success (Tmul:ts)) = rest_term (factor (Success ts))
rest_term (Success ts) = Success ts
rest_term (Fail ts) = Fail ts

factor (Success ((Num n): ts)) = Success ts
factor (Success ts) = Fail ts
factor (Fail ts) = Fail ts

Notice in this code we are explicitly passing back through the recursive calls 
the failure.  This makes for alot of repetition in the code.  In addition 
we have to start by distinguishing the Success(ful) case which also makes it 
much more wordy and heavy.

In Haskell whenever one needs to "instrument" the code in this fashion to 
collect errors or some state information one should always think MONADS!!! 
They usually will allow a much more elegant expression.

Here is exception monad on the SorF type constructor:
-}

instance Monad SorF where
    return x = Success x
    (Success x) >>= f = f x
    (Fail str) >>= _ = Fail str
{-
YES YOU CAN DECLARE YOUR OWN MONADS!!  This allows you to instrument 
your code appropriately without too much clutter (an incredibly useful 
facility!).  However be advised: monads are a sophisticated feature of 
Haskell and so, as you begin to get the feel of them, you may like to keep 
fairly close to my examples.  The above exception monad is actually one of the 
simplest examples of a monad. Essentially it automatically propagates the 
failure case for you.

Here is the same code using monads:
-}

exp::[Tokens] -> (SorF [Tokens])
exp ts = do ts' <- term ts
            rest_exp ts'

rest_exp (TAdd:ts) = do ts' <-term ts
                        rest_exp ts'
rest_exp ts = return ts

term ts = do ts' <- factor ts
             more_term ts'

more_term (TMul:ts) = do ts' <- factor ts
                         more_term ts'
more_term ts = return ts

factor ((TNum n):ts) = return ts
factor ts = Fail (show ts)

{-  Notice that the code no longer has to handle the failure case so it 
begins to look much like the original very basic code.

A word about the "do" syntax: it is recursively translated out using the 
following scheme:

     do {x <- hexp; dos} ===> hexp >>= q where
                                 q x = do { dos }
     do { hexp; dos }    ===> hexp >>= (\_ =>  do { dos })
     do { hexp }         ===> hexp

where hexp == Haskell expression
      dos == non-empty series of do statements


The last step is to produce a syntax tree.  First we have to define the form 
of the syntax tree:
-}

data Expr = Num Integer
          | Add Expr Expr
          | Mul Expr Expr
     deriving (Show,Eq)

{-
Next we have to build the syntax tree.  Here I am taking the simplest 
possible route: instead of the parse step just producing the unprocessed 
tokens we arrange that it produces a pair: an expression together with 
the remaining tokens.  Then when we are collecting an expression (rest_exp, 
rest_term) we will pass in what has been collected so far.

In the code below I have included the types of the functions. This is good 
programming practise as it is a basic check that the code you have written 
does what you intended it to do.  You should eventually include the types in 
your programs although when you are developing the program you may find 
this type information adds unneeded clutter and it is more efficient to 
rely on the type inference which Haskell provides.

Here is the program:
-}

exp'::[Tokens] -> (SorF (Expr,[Tokens]))
exp' ts = do (e,ts') <- term' ts
             rest_exp' e ts'

rest_exp':: Expr -> [Tokens] -> (SorF (Expr,[Tokens]))
rest_exp' e (TAdd:ts) = do (e',ts') <- term' ts
                           rest_exp' (Add e e') ts'
rest_exp' e ts = return (e,ts)

term':: [Tokens] -> (SorF (Expr,[Tokens]))
term' ts = do (e,ts') <- factor' ts
              more_term' e ts'

more_term':: Expr -> [Tokens] -> (SorF (Expr,[Tokens]))
more_term' e (TMul:ts) = do (e',ts') <- factor' ts
                            more_term' (Mul e e') ts'
more_term' e ts = return (e,ts)

factor'::[Tokens] -> (SorF (Expr,[Tokens]))
factor' ((TNum n):ts) = return (Num n,ts)
factor' ts = Fail (show ts)