# CPSC 411 - Lab Notes - 01-20

## Lex

All students need to use lex.py from http://systems.cs.uchicago.edu/ply for their code.

### Format of Lex Input

In other implementations, such as the lex used with C, the input to lex is divided into three sections:

```
...definitions...
%%
...rules...
%%
...subroutines...
```

However, in Python, the lexer is structured quite differently. More on that will be explained in lab 2.

In Python, each token is associated with a regular expression, either by direct assignment, as in the simple case of a plus token:

```python
t_PLUS = r'\+'
```

or by placing it as the documentation string (docstring) of a function that returns the token, as for a number:

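```python
def t_NUMBER(t):
    r'\d+'
    # The docstring above is the regular expression for this token.
    try:
        t.value = int(t.value)  # convert the matched text to an integer
    except ValueError:
        print("Integer value too large: %s" % t.value)
        t.value = 0
    return t
```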
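
To make the two styles of rule concrete, here is a minimal sketch that uses only the standard `re` module. It is not lex.py itself: `FakeToken` and `match_rule` are hypothetical helpers invented for this illustration, and the sketch only assumes that a function rule carries its pattern in its docstring, as described above.

```python
import re

# Rule given by direct assignment: the name is bound to a regular expression.
t_PLUS = r'\+'

# Rule given as a function: the docstring holds the regular expression.
def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

class FakeToken:
    """Hypothetical stand-in for the token object a lexer would pass in."""
    def __init__(self, type, value):
        self.type = type
        self.value = value

def match_rule(name, rule, text):
    """Try one rule at the start of text; return a token or None."""
    # For a function rule, the pattern is read from the docstring.
    pattern = rule.__doc__ if callable(rule) else rule
    m = re.match(pattern, text)
    if m is None:
        return None
    tok = FakeToken(name, m.group(0))
    # A function rule may post-process the token (here: convert to int).
    return rule(tok) if callable(rule) else tok

plus_tok = match_rule('PLUS', t_PLUS, '+ 3')
num_tok = match_rule('NUMBER', t_NUMBER, '42 + 3')
print(plus_tok.type, repr(plus_tok.value))
print(num_tok.type, repr(num_tok.value))
```

The string rule yields the matched text unchanged, while the function rule gets a chance to convert the text before the token is returned.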

This lab will look at regular expressions.

## Regular Expressions

| Expression | Meaning | Example |
| --- | --- | --- |
| `.` | Any character except "\n" | |
| `a,b,...` | Non-special characters match themselves | `ab;c` matches "ab;c" |
| `[]` | Any character in the brackets. `^` negates the set when it is the first character. `-` signifies a range if it is not the first character. | `[abz]` matches a single a or b or z; `[^a-z]` matches anything except lowercase letters |
| `*` | 0 or more of the preceding pattern | `a*` matches nothing, a, aa, aaa, ... |
| `+` | 1 or more of the preceding pattern | |
| `?` | 0 or 1 of the preceding pattern | `[0-9]?` matches an optional digit |
| `{n}` | n of the preceding pattern | |
| `{n,m}` | n to m of the preceding pattern | `[a-z]{3,5}` matches all groups of three, four or five letters |
| `{name}` | Refers to a name defined in the definitions section of lex | |
| `\` | Escape character | `\*` matches an asterisk |
| `()` | Groups patterns | `([ab]1?)?` matches nothing, a, a1, b, b1 |
| `\|` | Either the pattern before or the pattern after | `(if)+\|5` matches multiple if's or a single 5 |
| `"..."` | Literally what is in the quotes | `"\*"` matches a backslash then an asterisk |
| `^` | If the first character, matches the beginning of the line | |
| `<>` | State in lex | |
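
Several of these entries can be checked directly with Python's `re` module. The following is a small sketch, not a complete test of the table: it covers only operators that `re` shares with lex (the `{name}`, `"..."` and `<>` forms are lex-specific and are left out), and the sample strings are chosen here for illustration.

```python
import re

# (pattern, sample text, whether the whole text should match)
examples = [
    (r'ab;c',       'ab;c',   True),   # non-special characters match themselves
    (r'[abz]',      'b',      True),   # one character from the set
    (r'[^a-z]',     'q',      False),  # negated set rejects a lowercase letter
    (r'a*',         '',       True),   # zero repetitions are allowed
    (r'a*',         'aaa',    True),
    (r'[0-9]?',     '7',      True),   # optional digit
    (r'[a-z]{3,5}', 'ab',     False),  # too short for {3,5}
    (r'[a-z]{3,5}', 'abcd',   True),
    (r'\*',         '*',      True),   # escaped asterisk
    (r'([ab]1?)?',  'b1',     True),   # grouping with an optional suffix
    (r'(if)+|5',    'ififif', True),   # left alternative, repeated
    (r'(if)+|5',    '55',     False),  # right alternative is a single 5
    (r'.',          '\n',     False),  # dot does not match a newline
]

results = [(p, s, re.fullmatch(p, s) is not None) for p, s, _ in examples]

for (pattern, text, got), (_, _, expected) in zip(results, examples):
    status = 'ok' if got == expected else 'MISMATCH'
    print('%-12s %-10r %-5s %s' % (pattern, text, got, status))
```

`re.fullmatch` is used so that each pattern has to account for the entire sample string, which makes the accept and reject cases easy to read.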

A more complete reference is available on the Python web site; see, for example, the documentation for the `re` module.

### An example of recognizing numbers.

We went through how to build up an example in class. We will arrive at a regular expression that recognizes strings such as 45, -34.928, 7e9, and +23.348E-6.

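| We want to recognize... | Regular expression |
| --- | --- |
| A digit | `[0-9]` |
| Many digits | `[0-9]+` |
| An optional sign | `[-+]?` |
| A whole number | `[-+]?[0-9]+` |
| A fractional number | `[-+]?[0-9]*\.[0-9]+` |
| Both whole and fractional numbers | `[-+]?([0-9]*\.[0-9]+\|[0-9]+)` |
| An exponent | `[eE][-+]?[0-9]+` |
| A number | `[-+]?([0-9]*\.[0-9]+\|[0-9]+)([eE][-+]?[0-9]+)?` |

Outside the table, the final pattern reads `[-+]?([0-9]*\.[0-9]+|[0-9]+)([eE][-+]?[0-9]+)?`. The parentheses around the alternation matter: `|` has the lowest precedence of all the operators, so without them the sign would attach only to the whole-number form and the exponent only to the fractional form. The fractional form is listed first because Python's `re` module takes the first alternative that matches rather than the longest one.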
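
The build-up can be tried out in Python by composing the pieces as strings. This is a sketch under two assumptions that go beyond the pieces themselves: the alternation between the fractional and whole forms is wrapped in parentheses so that the sign and the exponent apply to both, and the fractional form is tried first because Python's `re` picks the first alternative that matches. The names `SIGN`, `MANTISSA`, `is_number` and so on are chosen here for illustration.

```python
import re

DIGITS   = r'[0-9]+'            # many digits
SIGN     = r'[-+]?'             # an optional sign
FRACTION = r'[0-9]*\.[0-9]+'    # digits with a decimal point
EXPONENT = r'[eE][-+]?[0-9]+'   # an exponent

# Group the alternation so the sign and exponent apply to both forms.
MANTISSA = '(' + FRACTION + '|' + DIGITS + ')'
NUMBER = SIGN + MANTISSA + '(' + EXPONENT + ')?'

number_re = re.compile(NUMBER)

def is_number(text):
    """Return True if the whole string is a number."""
    return number_re.fullmatch(text) is not None

for s in ['45', '-34.928', '7e9', '+23.348E-6', '1.', 'e9', '--3']:
    print('%-12s %s' % (s, is_number(s)))

# A lexer matches a prefix of the input, so the order of alternatives matters.
good_prefix = number_re.match('34.928 rest').group(0)
bad_prefix = re.match(SIGN + '(' + DIGITS + '|' + FRACTION + ')', '34.928 rest').group(0)
print(repr(good_prefix), repr(bad_prefix))
```

The last two lines show why the order is chosen this way: with the whole-number form first, a prefix match stops at `34` and leaves `.928` behind, while the fractional-first pattern consumes `34.928`.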