Up: Course Projects
CPSC 601.73 Project
Evolution with Introns and Exons
Jie Gao
Department of Computer Science
University of Calgary
gaoj@cpsc.ucalgary.ca
Abstract:
This paper is the report of my project for the course
Biological
Computation. Based on the fact that there are large quantities of genes whose
functions are unknown, and enlightened by the paper
[
WBM03], I proposed a new
Genetic
Programming approach with
introns and
exons that allows part of
the genes being disabled. I implement two different program expression
approaches, namely byte sequence and tree structure. The tree structure
expression performs well in evolution. Some experiments are done to compare the
traditional Genetic Programming with this new approach. The result shows my new
approach can really increase the adaptability of Genetic Programming.
Introduction
In computer science, evolutionary computation is an important computational
paradigm that comes from the nature. It tries to use evolution mechanisms
existing in the nature to solve computer problems. Therefore new discoveries
and hypothesis on the natural evolution could give new ideas to evolutionary
computation.
According to the known facts, in creatures' DNAs there are large quantities of
genes that are not ``translated'' by the creatures into the phenotypes, and
they are sometimes called ``garbage genes''. But most people, including me,
think they are not garbages at all. Their functions are simply unknown
currently. So many people use the term introns to refer to such
function-unknown genes instead of ``garbages''. Correspondingly, the known
genes are called exons. Many hypothesis are made on these introns.
People think they could be meta data of the exons, or the records of the whole
evolution histroy[Kam].
In January, a paper in Nature with the title Loss and Recovery of
Wings in Stick Insects[WBM03] reported a new
discovery in evolution -- some species can lose a complex organ and get it
back some time later. This phenomenon may happen for several times during the
evolution. It amazes scientists a lot because it reveals a fact that some
complex functions of a species can be temporarily disabled and latent in the
genes and be triggered by the environment again some generations later to come
back to the species.
This discovery interests me greatly. Associating the introns facts and the
phenomenon revealed by the paper, I come to the hypothesis that these genes are
turned from exons into introns -- thus disabled temporarily and still in the
genes, and later turned from introns back to exons -- enabled again. It seems
that such a mechanism can probably increase the adaptability of creatures,
especially in an ever-changing environment.
A natural thought is to test whether such a mechanism can result in increase of
adaptability in evolutionary computation. Therefore my idea for this project
comes into being -- a Genetic Programming with introns and exons.
In this report I explain the experiments on the Genetic Programming with
introns and exons, and show the results I got: Section 2
presents the problem selected for the Genetic Programming to solve; the
implementation is discussed in Section 3; Section
4 explains the experiments I made, as well as the results I got
from the experiments; finally in Section 5 and
6, the report is concluded and future work is indicated.
Problem to Solve -- Robot in a Maze
The problem for my Genetic Programming with introns and exons to solve is
``Robot in a Maze'': A maze is an array of grids (see Figure 2 on
Page
). The grid types can be path (the robot can go
through) or wall (the robot can not go through). The maze has a start
point and a finish point (marked by S and F in the maze). The
robot's task is to find a path from the start point to the finish point.
Particular programs are used to express different moving plans of the robot.
These programs are evolved by Genetic Programming.
The robot has various capabilities, including sensing the environment, moving
and doing simple logic and flow control operations. These elements of
capabilities consist the operations of the programs that will be evolved in
Genetic Programming.
The detailed capabilities of the robot, or the available operations in the
program, are listed in Table 1. Each of them has a boolean
return value that can be an operand of the logic operations. Nop is a
special operation that does nothing but generate a random boolean value as the
return value. If some of one operation's operands are missing, Nop
will be appended to make the operation valid.
Table 1:
Operations of the robot in a program
| Name |
Operand No. |
Return Value |
Description |
| E |
0 |
Whether the grid to the east is a
path grid |
| N |
0 |
Whether the grid to the north is a
path grid |
| S |
0 |
Whether the grid to the south is a
path grid |
| W |
0 |
Whether the grid to the west is a
path grid |
| NE |
0 |
Whether the grid to the northeast is a
path grid |
| NW |
0 |
Whether the grid to the northwest is a
path grid |
| SE |
0 |
Whether the grid to the southeast is a
path grid |
| SW |
0 |
Whether the grid to the southwest is a
path grid |
| East |
0 |
Succeed? |
Move to the grid to the east |
| North |
0 |
Succeed? |
Move to the grid to the north |
| South |
0 |
Succeed? |
Move to the grid to the south |
| West |
0 |
Succeed? |
Move to the grid to the west |
| NorthEast |
0 |
Succeed? |
Move to the grid to the northeast |
| NorthWest |
0 |
Succeed? |
Move to the grid to the northwest |
| SouthEast |
0 |
Succeed? |
Move to the grid to the southeast |
| SouthWest |
0 |
Succeed? |
Move to the grid to the southwest |
| Not |
1 |
Logic not of the operand |
| And |
2 |
Logic and of the operands |
| Or |
2 |
Logic or of the operands |
| If |
3 |
The first operand |
If the first operand is true, execute the
second operand. Elsewise execute the third operand |
| Nop |
0 |
Random |
No operation is executed |
A program of the robot's moving plan is consisted by the operations listed in
Table 1. Because the operations have only one type of return
values, and they all have return values, all operations are compatible. Thus
all kinds of operations can be an operand of another operation. A program is
organized in a hierarchical structure.
The introns and exons are labelled by a particular flag -- an enable
flag on each of the operations. One operation is an exon if it is enabled.
Otherwise it is an intron. In the hierarchical structure, if one operation is
an intron, all its operands and operands' operands are introns.
Implementation of Genetic Programming with Introns and Exons
I have tried two different approaches to express the programs. One is encoding
the programs by byte sequences, the other is using tree structures to express
programs. I implemented Genetic Programming with introns and exons in both
approaches and they got different performances in evolution. The following
subsections discuss the two approaches and the difference between them.
In this approach each operation is encoded by a binary integer. Since the total
number of different operations is not large, one byte is enough for this binary
integer, together with a bit for enable flag. In my actual implementation, the
operations are encoded in the way described in Table 2. In this
way, not only the operation code itself, but also the related information of
the operation is encoded in the byte. The length of the byte sequence is not
fixed so that the programs can be in various complexity. But in actual
implementation, there is a hard limit -- 1024 bytes so that the size of the
byte sequence is practical in implementation.
Table 2:
Byte encoding of the operations
| 7 |
6 |
5 |
4 |
3 |
2 |
1 |
0 |
| Bit |
Usage |
| 0 - 2 |
Operation code |
| 3 |
Operation type: |
| |
0 -- Detection of grids; |
| |
1 -- Movement. |
| |
Ignored if bits 4-5 are not all zeros. |
| 4 - 5 |
Number of operands encoded in binary integer |
| 6 |
Nop flag |
| 7 |
Enable flag |
The program is first expressed in operator-postfix expression, then converted
into a byte sequence. The postfix expression ensures that the hierarchical
structure of the program is preserved when it is converted into a flat byte
sequence. And the postfix expression also makes the program easier to
interpret, as is discussed below.
The evolutionary operations on the programs include mutation and
crossover. The mutation operation will randomly select one byte in the
byte sequence, and then randomly choose one of the three operations: deleting
this byte, inserting a random byte in front of this byte, or changing this byte
with another random byte. Here the random byte is verified to express an
existing operation in Table 1. Crossover takes place on two
byte sequences (programs)
and
: Randomly select one byte in
and
another byte in
, then concatenate the subsequence form the beginning of
to the selected byte with the subsequence from the selected byte in
to the
end of
.
The evolutionary operations do not check the structure of the new program,
because the execution mechanism ensures all combinations of operations in Table
1 are executable.
A byte sequence is first converted into an execution table in which low-level
commands are recorded for execution -- ``compiling'' the program before
execution. In the execution table each low-level has an address, a operand
address list and a value (see Figure 1). All operations in
Table 1 except If have corresponding low-level
commands. If operation is converted to combination of several
low-level commands including TSTJ (test a value in one address and
jump to another address according to the value tested) and JMP (jump
to another address). The postfix expression makes the conversion easier. A
stack is enough for conversion. While converting one operation, if no enough
operands exist, one or more NOP low-level command are inserted to make
the operation has enough operands.
Figure 1:
Low-level command format
 |
After the conversion, the execution table is executed over a maze to simulate
the movement of the robot. The execution is in iterations until the robot
reaches the finish grid, or the execution step limit is reached.
The tree structure is a ``standard'' expression in Genetic Programming. In this
approach, each operation is expressed by a node in a tree, and its operands are
children nodes. The children of a node are ordered so that they can express the
operands of an operation. A tree directly represents the hierarchical structure
of a program.
In each node, the operation and its return value are stored, together with an
enable flag. The enable flag decides the gene type (introns or exons) of the
whole subtree starting from this node.
In actual implementation, the tree has a node index to list all nodes in this
tree so that the evolutionary operations can randomly choose a node easily.
Like the byte sequence approach, the number of nodes in a tree is not fixed but
limited to 1024 for practical reasons in implementation.
The evolutionary operations in this approach are also mutation and crossover.
In mutation a node in the tree is randomly selected (from the node index), one
of the two operation is randomly executed: Flipping the enable flag of this
node, or replacing the subtree starting from this node with a randomly
generated and validated tree (could be an empty tree). In crossover two parent
trees
and
contribute to the new offspring. A randomly selected subtree
in
is replaced with another subtree randomly selected from
.
The execution of a tree is in an ``interpret'' way other than the ``compile and
execute'' way in the byte sequence approach. The execution starts from the root
node of the tree and recursively follow the whole tree structure. Like the byte
sequence approach, this program tree is executed in iterations until the
solution of the maze is found, or the execution steps exceed the execution
limit.
The two approach both provide the expressions of the programs, evolutionary
operations that achieve Genetic Programming, and simulations of the robot
moving in the maze. But they have different characteristics and got different
results in my experiments.
The byte sequence approach simplifies the evolutionary operations. Mutation and
crossover on the byte sequence do not need to consider more about the inside of
the sequence. Comparing to the tree approach, the usage of pointers is avoided.
Thus in implementation this approach is faster and less error-prone. The
mechanism in evolutionary operations and execution get faster speed than the
tree structure.
The tree structure approach is more clear in the hierarchical structure. This
approach is the ``standard'' way in Genetic Programming and it is tested by
various Genetic Programming implementations. But the evolutionary operations
and execution are more complex than the byte sequence approach.
Both approaches are implemented and tested. But they perform differently in the
evolutionary behaviours. The byte sequence approach get bad results in
evolution. The fitnesses over generations cannot converge. It's more like
random behaviours other than evolution. The tree sequence works well in
evolution. The species can increase their fitnesses over generations towards
the final goal.
The failure of the byte sequence approach is analyzed. The main reason is that
the evolutionary operations in this approach cannot keep a relatively stable
structure of the program. Good parts in one species can not be transferred to
its offsprings via evolutionary operations. Although the execution of the byte
sequence ensures every byte sequence is executable, it causes the simplified
evolutionary operations changing the hierarchical structure greatly, especially
when operations like If is mutated. Thus the evolutionary operations
in the byte sequence approach are not really evolutionary.
Further experiments are done based on the tree structure approach. The
experiments and results are described in the next section.
Experiments and Results
The following experiments and results are all based on the tree structure
implementation of Genetic Programming with introns and exons. To test whether
this Genetic Programming approach with introns and exons can get better results
than the traditional Genetic Programming, a ``standard'' Genetic Programming is
also implemented as the reference. In this traditional GP implementation, the
programs are in the same tree structure as the one described above, but the
enable flags are ignored so that it is a ``standard'' way.
In the subsections below, the term new approach refers to my Genetic
Programming with introns and exons, and old approach refers to the
traditional Genetic Programming approach.
I hypothesize that the introns and exons mechanism can increase the
adaptability of the species in evolution. Therefore the experiments should be
able to test and compare the adaptability of the new and old approaches. For
this purpose, the programs are evolved continuously on three different mazes.
At the beginning of the experiment, generate the first generation randomly and
evolve them on the first maze. If one of the programs can lead the robot to
move from the start grid to the finish grid -- solution found, change current
maze with a second maze, and evolve the existing programs directly in this new
maze. After the solution of the second maze is found, replace this maze with a
third maze and continue the evolution directly in that maze. When the solution
of the third maze is found, stop the evolution and the experiment ends.
Based on this design of the experiment, I have chosen the parameter values as
follows (they apply to both the new and old approaches):
- Population of each generation: 5000
- Elite group: 20% of the population
They are the fittest top 20% of the current generation. And they directly form
20% of the next generation.
- Mutation rate: 10%
Mutation takes place on the randomly selected programs from the elite group.
10% of the next generation are created in this way.
- Crossover rate: 70%
Crossover takes place on two randomly selected programs from the elite group.
70% of the next generation are created in this way.
- Fitness:
where
is the final distance from the robot to the finish grid,
is
the steps used to execute the program,
is the minimal path length
from the start grid to the finish grid, and
is the node number of the
program tree.
This fitness function considers mainly the position the robot can reach, as
well as other factors so that the fittest programs can keep concise and
efficient. And the part
helps the programs keep
their size within the hard limit 1024. The possible value of the fitness is in
. The fitter the program is, the higher fitness it gets.
- Mazes: The three mazed depicted in Figure 2.
The mazes in Figure 2(a), (b) and (c) are called maze 1,
maze 2 and maze 3 respectively.
- Generation limit: 100 in each maze
The programs can evolve for 100 generations at the most in each maze. If the
generation number reaches 100 in a maze, stop the evolution in this maze and
move on to the next maze for evolution. In actual experiments, the programs are
evolved 4 more generations (but the total generations in one maze can not
exceed 100) in current maze after the solution is found so that the average
fitness is better.
Figure 2:
Mazes used in experiments
(c) 10  10 maze
| Black: |
Wall grid |
|
S: |
Start grid |
| White: |
Path grid |
|
F: |
Finish grid |
|
In the experiments, I define one test as:
- Run the new approach, starting on maze 1. After the solution for maze 1
is found and 4 more generations are evolved, or the generation limit is
reached, move the whole population to maze 2. Again when the solution for maze
2 is found and 4 more generations are evolved, or the generation limit is
reached, move the programs to maze 3. Stop after the solution for maze 3 is
found or the generation limit is reached.
- Run the old approach on the three mazes in the same way.
To get convincing results, I ran my experiment for 100 tests. The next
subsection shows the results.
After running 100 tests, the direct results are the generation numbers spent in
each maze to find the solution. The generation numbers of the first solution
programs are shown in Table 3. If a generation number in it is
100, it indicates the evolution fails finding a solution within limited
generations in corresponding maze. From this table, as well as the statistics
in Table 4, we can find the new and old approaches' performances
are almost the same in maze 1 and maze 2. But in maze 3 there is great
difference between the two approach: The majority of the new approach tests can
find a solution in maze 3 while almost none of the old approach tests (except
one) can find a solution in maze 3.
Table 3:
Result table
from 100 tests
| New Approach |
Old Approach |
6 6 |
8 8 |
10 10 |
6 6 |
8 8 |
10 10 |
|
4 |
47 |
49 |
0 |
41 |
100 |
|
4 |
24 |
100 |
0 |
75 |
100 |
|
0 |
27 |
86 |
2 |
35 |
100 |
|
0 |
24 |
57 |
0 |
18 |
100 |
|
0 |
19 |
49 |
0 |
28 |
100 |
|
0 |
20 |
100 |
0 |
24 |
100 |
|
0 |
100 |
67 |
2 |
49 |
100 |
|
0 |
60 |
72 |
0 |
64 |
100 |
|
0 |
33 |
100 |
0 |
28 |
100 |
|
0 |
22 |
100 |
0 |
26 |
100 |
|
0 |
23 |
100 |
0 |
91 |
100 |
|
0 |
12 |
52 |
0 |
42 |
100 |
|
0 |
100 |
39 |
0 |
50 |
100 |
|
0 |
42 |
53 |
0 |
18 |
100 |
|
0 |
39 |
69 |
0 |
48 |
100 |
|
0 |
100 |
42 |
0 |
14 |
100 |
|
0 |
12 |
24 |
2 |
33 |
100 |
|
1 |
100 |
83 |
0 |
38 |
100 |
|
0 |
30 |
100 |
0 |
31 |
100 |
|
2 |
38 |
65 |
0 |
39 |
100 |
|
0 |
44 |
100 |
0 |
27 |
100 |
|
0 |
21 |
76 |
0 |
100 |
100 |
|
0 |
11 |
37 |
0 |
28 |
100 |
|
0 |
31 |
57 |
0 |
35 |
100 |
|
1 |
100 |
81 |
0 |
47 |
100 |
|
0 |
22 |
80 |
0 |
29 |
100 |
|
0 |
39 |
100 |
0 |
42 |
75 |
|
0 |
14 |
68 |
0 |
26 |
100 |
|
3 |
28 |
43 |
2 |
48 |
100 |
|
0 |
24 |
100 |
0 |
57 |
100 |
|
0 |
15 |
78 |
0 |
25 |
100 |
|
0 |
17 |
100 |
0 |
39 |
100 |
|
0 |
32 |
100 |
0 |
30 |
100 |
|
0 |
66 |
47 |
0 |
28 |
100 |
|
3 |
55 |
100 |
0 |
40 |
100 |
|
0 |
27 |
56 |
0 |
24 |
100 |
|
0 |
23 |
81 |
0 |
54 |
100 |
|
1 |
100 |
62 |
0 |
19 |
100 |
|
0 |
31 |
100 |
0 |
25 |
100 |
|
0 |
100 |
54 |
0 |
26 |
100 |
|
0 |
7 |
40 |
0 |
35 |
100 |
|
2 |
17 |
100 |
0 |
43 |
100 |
|
0 |
100 |
100 |
0 |
32 |
100 |
|
4 |
23 |
80 |
0 |
26 |
100 |
|
2 |
58 |
65 |
0 |
26 |
100 |
|
0 |
100 |
100 |
0 |
34 |
100 |
|
0 |
26 |
100 |
0 |
24 |
100 |
|
0 |
49 |
100 |
2 |
13 |
100 |
|
0 |
33 |
64 |
0 |
49 |
100 |
|
0 |
22 |
100 |
0 |
45 |
100 |
|
| New Approach |
Old Approach |
6 6 |
8 8 |
10 10 |
6 6 |
8 8
|
10 10 |
|
0 |
24 |
45 |
0 |
20 |
100 |
|
3 |
48 |
76 |
0 |
34 |
100 |
|
0 |
55 |
89 |
0 |
34 |
100 |
|
1 |
37 |
55 |
0 |
50 |
100 |
|
0 |
25 |
69 |
0 |
26 |
100 |
|
0 |
14 |
47 |
0 |
38 |
100 |
|
0 |
30 |
83 |
0 |
28 |
100 |
|
0 |
20 |
73 |
0 |
22 |
100 |
|
1 |
49 |
46 |
0 |
38 |
100 |
|
0 |
18 |
34 |
0 |
36 |
100 |
|
0 |
23 |
51 |
0 |
23 |
100 |
|
1 |
100 |
71 |
0 |
34 |
100 |
|
0 |
30 |
100 |
0 |
44 |
100 |
|
0 |
100 |
60 |
0 |
43 |
100 |
|
0 |
56 |
47 |
0 |
25 |
100 |
|
3 |
21 |
69 |
0 |
19 |
100 |
|
0 |
26 |
75 |
0 |
34 |
100 |
|
0 |
24 |
78 |
0 |
27 |
100 |
|
1 |
37 |
100 |
0 |
34 |
100 |
|
1 |
31 |
100 |
0 |
100 |
100 |
|
1 |
31 |
74 |
0 |
43 |
100 |
|
0 |
16 |
60 |
1 |
36 |
100 |
|
0 |
11 |
100 |
0 |
100 |
100 |
|
2 |
100 |
72 |
0 |
34 |
100 |
|
0 |
100 |
86 |
0 |
100 |
100 |
|
0 |
53 |
41 |
1 |
69 |
100 |
|
6 |
16 |
100 |
0 |
27 |
100 |
|
0 |
100 |
100 |
0 |
57 |
100 |
|
0 |
35 |
68 |
0 |
26 |
100 |
|
0 |
67 |
100 |
0 |
35 |
100 |
|
0 |
100 |
91 |
0 |
100 |
100 |
|
0 |
18 |
44 |
0 |
32 |
100 |
|
0 |
21 |
52 |
0 |
39 |
100 |
|
0 |
30 |
61 |
4 |
100 |
100 |
|
0 |
32 |
50 |
0 |
28 |
100 |
|
0 |
27 |
69 |
1 |
23 |
100 |
|
1 |
24 |
100 |
0 |
24 |
100 |
|
3 |
14 |
63 |
0 |
20 |
100 |
|
1 |
29 |
45 |
0 |
25 |
100 |
|
3 |
15 |
65 |
0 |
100 |
100 |
|
0 |
33 |
65 |
0 |
20 |
100 |
|
0 |
28 |
64 |
0 |
51 |
100 |
|
0 |
19 |
66 |
0 |
37 |
100 |
|
0 |
24 |
56 |
0 |
100 |
100 |
|
0 |
25 |
100 |
0 |
88 |
100 |
|
0 |
21 |
38 |
0 |
18 |
100 |
|
0 |
13 |
46 |
1 |
22 |
100 |
|
0 |
19 |
71 |
0 |
100 |
100 |
|
0 |
35 |
50 |
0 |
100 |
100 |
|
0 |
36 |
77 |
1 |
20 |
100 |
|
Table 4:
Statistics from result table
| |
New approach |
Old approach |
| Total tests |
100 |
100 |
| Success in maze 1 |
100 |
100 |
| Success in maze 2 |
85 |
90 |
| Success in maze 3 |
72 |
1 |
The averages of the highest fitness in each generation in the 100 tests are
shown in Figure 3, 4 and 5
respectively for maze 1, 2 and 3. They can convince us again what Table
3 and 4 show. In Figure 3 and
4, the two curves are very close to each other, which shows
the performances of the two approaches are almost the same. But in Figure
5, the curve for the new approach is higher than the one for
the old approach. This again clearly shows the new approach get better
performance.
Figure 3:
Average fitness of fittest species in maze 1
|
Figure 4:
Average fitness of fittest species in maze 2
|
Figure 5:
Average fitness of fittest species in maze 3
|
In the experiment, the programs are evolved in three different mazes, which
stand for various environments for the evolution. From the result of the
experiment on the three mazes, we can tell that if the population get fit in
one maze, they can get fit in another maze faster in the new approach than the
old one. Thus the adaptability of the new approach is better than the old one.
The Genetic Programming with introns and exons gets better results than the
``standard'' Genetic Programming. The only difference between the two approach
is the mechanism of introns and exons, which allows a partition of genes being
disabled temporarily and enabled again some generations later. So analysis on
how the introns affect the evolution is necessary.
Due to the large number of tests, I only choose the new approach in one test to
analyze. The one I have chosen is No. 60 test, whose generation numbers in the
three mazes are 0, 18 and 34 respectively. I recorded all random number seeds
in the tests. So I can replay a particular test for many times with exactly the
same results.
Before tracing down generation by generation, I gathered some statistics from
this test. They show totally how many programs are in the elite groups, and how
many times an enable flag is flipped. The numbers are listed in Table
5.
Table 5:
Statistics of new approach in No. 60 test
(a) Test information: Generation numbers
| Maze |
6 6 |
8 8 |
10 10 |
| Gen. No. for finding the solution |
0 |
18 |
34 |
| Actual Gen. No. evolved |
4 |
22 |
34 |
(b) Statistics about evolution
| Program type |
Number |
Percentage |
| Produced by evolution2 |
251937 |
100% |
| With enable flag flipping |
18297 |
7.263% |
(c) Statistics about elite groups
| Program type |
Number |
Percentage |
| In elite groups from all generations |
63063 |
100% |
| Contain introns |
32021 |
50.776% |
| Average percentage of introns in a program |
6.579
% |
From the statistics in Table 5(b) we can find actually
flipping enable flags takes place at a low frequency. But in the elite group
about half of the programs contain introns. This shows that the introns will
not decrease the fitness of the programs. And in the programs, the introns do
not hold a very high percentage comparing to the exons. On average the intron
percentage is only 6.579%. Comparing to the reality, it is a rather small
percentage. The amazing thing is that such a small amount of introns are able
to result in greatly difference in the adaptability.
Recalling my hypothesis in Section 1, the expected phenomenon
that increases the adaptability of the evolution is that some exons are mutated
into introns in one generation, and in later generations part or all of these
introns are mutated into exons again and they contribute much to the increase
of the fitness. Therefore more detailed analysis should be tracing some program
subtree's origins in some generations before to see if such phenomenon happens.
To analyze such phenomenon, a family graph of the programs is needed. However,
the total generation number on the three mazes is 63 and the family graph
should have about
edges, which means the family graph would be
enormous huge if I were able to generate it. I am able to analyze only part of
the family graph. I choose the fittest program in the final generation, and
generate the ancestor tree for 20 generations (see Figure 6). The
ancestor tree for 30 generations is over 2.1 G Bytes (the size when I kill my
ancestor tree generator), and I am not able to analyze it at all.
Figure 6:
Part of the ancestor tree
About 1/650 of the ancestor tree generated from the
fittest program in the final generation. The numbers are the program IDs used
in the experiment. To simplify the tree, if a node has two identical branches
one of them is omitted.
|
This tree is still very large. I am only able to follow the evolution backward
through the ``major contributors'', which contribute most of the program nodes
in its direct children. E.g. if program
and
crossover to produce
program
, and one of
's subtree is replaced by only a single node from
, I will concentrate on the transition from
to
. By this way, it is
possible to trace down from the root node in the tree to a bottom node.
However, through the analysis here, I only see sometimes a single node is
changed from exon to intron. No further complex introns and exons transitions
are observed.
After analyzing this ancestor tree, I think the reason why I fail to observe
the expected phenomena is that the transitions from introns to exons most take
place when the programs are moved from one maze to another. The last 34
generations are all in maze 3 but I only have a 20-generation tree. So I can
not see what happens immediately after changing the mazes.
To see a whole evolution process from the very beginning to the final
generation, I generate the ``Father Line'' and ``Mother Line'' of the fittest
program in the final generation. The Father Line contains only the father of
the program, and father's father, and so on.... Similarly, the Mother Line
contains only mothers and mothers' mothers. Table 6 shows the two
``lines''.
Table 6:
``Father Line'' and ''Mother Line''
| Father Line |
|
Program ID |
Intron size |
|
331009 |
1 |
|
321604 |
1 |
|
312546 |
1 |
|
307127 |
1 |
|
300763 |
0 |
|
294446 |
0 |
|
288409 |
0 |
|
283226 |
0 |
|
276550 |
0 |
|
269501 |
1 |
|
258747 |
1 |
|
252770 |
0 |
|
244804 |
0 |
|
239268 |
0 |
|
232546 |
0 |
|
219722 |
0 |
|
211662 |
0 |
|
202904 |
0 |
|
197453 |
0 |
|
194128 |
0 |
|
189839 |
0 |
|
183012 |
0 |
|
174951 |
0 |
|
164590 |
0 |
|
157700 |
0 |
|
149204 |
0 |
|
142889 |
0 |
|
133150 |
0 |
|
117350 |
0 |
|
105446 |
0 |
|
101136 |
0 |
|
94579 |
0 |
|
90302 |
1 |
|
80823 |
1 |
|
72793 |
1 |
|
66013 |
1 |
|
62446 |
0 |
|
58326 |
0 |
|
52441 |
0 |
|
46953 |
0 |
|
42819 |
0 |
|
37465 |
0 |
|
31235 |
0 |
|
24795 |
0 |
|
21030 |
0 |
|
15931 |
0 |
|
8921 |
0 |
|
4279 |
0 |
|
0 |
0 |
|
| Mother Line |
|
Program ID |
Intron size |
|
331009 |
1 |
|
321859 |
3 |
|
315105 |
0 |
|
307952 |
32 |
|
301333 |
1, 4 |
|
283266 |
1, 4 |
|
274461 |
1, 36 |
|
268873 |
1, 15 |
|
261857 |
1 |
|
257954 |
0 |
|
249303 |
0 |
|
243133 |
1, 8, 35 |
|
233744 |
1, 8, 24 |
|
228057 |
1, 8, 24 |
|
220198 |
1, 24 |
|
216023 |
0 |
|
208013 |
7 |
|
202164 |
0 |
|
197662 |
0 |
|
193781 |
0 |
|
185935 |
0 |
|
177955 |
0 |
|
171362 |
0 |
|
164589 |
0 |
|
157003 |
0 |
|
149162 |
0 |
|
143166 |
0 |
|
135618 |
0 |
|
130161 |
0 |
|
123130 |
1, 1 |
|
88835 |
1, 1 |
|
80865 |
0 |
|
68421 |
0 |
|
62141 |
1 |
|
54857 |
0 |
|
42376 |
0 |
|
33613 |
0 |
|
27108 |
0 |
|
18502 |
0 |
|
7397 |
0 |
|
0 |
0 |
|
In Father Line and Mother Line, especially in Mother Line, there are several
times that some introns are inherited by crossover or produced by mutation and
later disappear. An obvious example is the Program 268873 in Mother Line. Its
15-node intron subtree is created in mutation from Program 261857. But when the
introns disappears, I can not tell whether part of the introns are inherited by
the offsprings, because the partition contributed by the parent appears in both
introns and exons. The real source is unclear.
Conclusion
From the repeated experiment results shown above, the conclusion can be drawn
that the Genetic Programming with introns and exons approach can increase the
adaptability of the programs considerably. One noticeable thing is that the new
approach does not outperform the old approach in maze 2, while it get much
better results in maze 3. From this phenomenon we can go further to say that
the mechanism of introns and exons affects the evolution more in complex
environments.
But the detailed analysis met difficulties. The huge amount of total programs
does not allow a through look of every program. And in the analysis I was able
to perform, no expected phenomenon is discovered. So the possible explanation
is that either the phenomenon takes place too few to see, or the introns affect
the evolution in a totally different way. No matter which is the real reason,
the project shows the amazing characteristic of evolution: only a small change
can result in enormous difference.
Future Work
In this project I chose a toy problem for the Genetic Programming to solve.
Although the population and the test numbers are large enough to get convincing
results, the problem itself lacks complexity. Since my new approach seems to
affect the result more in complex environment, testings on more complex
problems should be performed. This is the direct future work following the
project.
Another pending work is more analysis of the evolution. In this project I am
not able to observe the expected phenomenon on introns because of the large
amount of evolution activities. If further test can prove that this introns and
exons mechanism is really able to improve the result of Genetic Programming,
more concentration should be put on a micro level in the evolution to see
whether some phenomena like the wings of the stick insects in
[WBM03] can be observed.
- Kam
-
Manolis Kamvysselis.
Evolution with a purpose.
Final paper for course 6.868, EECS, Massachusetts Institute of
Technology, http://web.mit.edu/manoli/evolution/www/main.html.
- Nil98
-
Nils J. Nilsson.
Artificial Intelligence : A New Synthesis.
Morgan Kaufmann Publishers, San Francisco, California, 1998.
- WBM03
-
Michael F. Whiting, Sven Bradler, and Taylor Maxwell.
Loss and recovery of wings in stick insects.
Nature, 421:264-267, January 2003.
CPSC 601.73 Project
Evolution with Introns and Exons
This document was generated using the
LaTeX2HTML translator Version 2002-2-1 (1.70)
Copyright © 1993, 1994, 1995, 1996,
Nikos Drakos,
Computer Based Learning Unit, University of Leeds.
Copyright © 1997, 1998, 1999,
Ross Moore,
Mathematics Department, Macquarie University, Sydney.
The command line arguments were:
latex2html '-up_title=Course Projects' -up_url=../ -split 0 -local_icons report
The translation was initiated by Jie Gao on 2003-06-27
Footnotes
- ...
- The numbers shown in Table 3 on Page
are generation numbers, starting from 0. So the first
generation is 0, the second generation is 1, and so on.... In the following
tables, the generation numbers have the same meanings.
- ... evolution2
- In Table 5 on Page
,
the number of programs produced by evolution in all generations should be
252000 in theory. The loss of accuracy in C++ while converting floating point
to integer causes one more program in each generation's elite group. The number
of program in elite groups is greater than 63000 for the same reason.
Up: Course Projects
Jie Gao
2003-06-27