CPSC 333 --- Lecture 3 --- Friday, Jan 12, 1996

Recall the "Ongoing Example: Student Information System"

VERSION ONE of this system was described in Lecture 2. The entity
relationship diagram given for this system is as follows:

                          ___________________
                         |                   |
                         |      Student      |
                         |                   |
                          -------------------

--- that is, it consists of a single rectangle with the label
"Student."

This is a diagram with a single *entity* or *object* called
"Student" and with no "relationships".

This represents or "corresponds to" a single table of data that
must be maintained by the system. This table will have three
columns:

 - a column of ID numbers, with no numbering appearing more than
   once in it

 - a column of names of students (character strings)

 - a column representing the "status" of each student, which is
   either "passed" or "registered"

There will be one row of the table for each student that the system
"currently" needs to know about --- with the ID number, name, and
status of that student listed in that row.

Thus, you can think of each "instance" of the "entity" (or "object")
student as corresponding to a three-tuple: an ID number, name, and
status.

The ID number, name, and status all have "elementary" data types ---
integer, character string, and "passed"_or_"registered" (element of
an "enumerated set") respectively. Since these are "elementary" and
are all the components that make up an instance of "Student", ID
number, name and status are the (three) *attributes* of the entity
"Student".

Thus (again): an entity in an ERD corresponds, more or less, to a
table of data. Each column of the table stores an "elementary" data
item in each row. The number of *columns* is fixed, and there are no
blank or "undefined" spaces in the table. The number of *rows* can
change as the system is used --- as "instances" of the entity are
added or deleted. Each column of the table corresponds to an
*attribute* of the entity, and each row of the table corresponds to an
*instance* of the entity.

Note that the following "things" are NOT shown on the ERD (as I've
defined it):

 - the number of attributes of the entity, or their names or data
   types
 - any processes or functions that the system must support using
   the stored data that this ERD represents

This information will be modeled by *other* models than the ERD
which supplement or "complement" this diagram.

IN GENERAL: An "entity" in an ERD corresponds to something that
the system must remember information about. Frequently an "entity"
corresponds to:

 - an "external entity" --- other systems, devices, or people
   that the system must keep track of

 - "occurrences or events" --- provided that the system is required
   to keep a history of these

 - "roles" (eg, manager, engineer, or salesperson) played by people
   who interact with the system

 - organizational units (eg, division, group, team) that are relevant
   to the application

 - structures that are built up or assembled from other entities
   the system needs to remember information about.

Each entity should have two more *attributes* and each attribute
should have an "elementary" data type --- integer, real, character
string, or element of an "enumerated set". (Thus "array" or "record"
should not be types of attributes) As in the above example, you can
think of an entity as corresponding roughly to a "table of data" with
each column of the table corresponding to one of the entity's
attributes, and with each row corresponding to one of the entity's
instances.

It is almost always necessary for a system to be able to distinguish
between and select from the instances of an entity. Therefore there
should always be at least one SUBSET of the entity's attributes such
no two (or more) instances of the entity can have the same values for
all the attributes in this subset. Each SUBSET of attributes with this
property, that is as small as possible (that is, including no "extra"
or "unnecessary" attributes) is called a KEY of the attribute. ONE key
for an entity is designated as the PRIMARY KEY for the entity --- and
the implemented system will generally include operations that read or
delete an instance of each entity, which accept the values for the
primary key as inputs.

In the above example (since "keys" are required not to include
unneeded attributes) the entity "Student" has only one "key" --- in
this case, a subset consisting of a single attribute, "ID
Number". Therefore (since there's only one key to choose from),
the single element subset containing the attribute "ID Number" is the
primary key for the entity.

Universal Change/Abuse of Notation: In this case we'll gloss
over/ignore the difference between a single-element-subset and the
element contained in that subset, and call the attribute "ID Number"
the "primary key" of the entity.

VERSION TWO: A new version of the system can be used to keep track of
information about students who are registered in several courses.
While it will probably be necessary (*much later* during analysis and
development) to place a bound on the number of courses the system can
know about at the same time, we'll start by treating this as unlimited
(just as we're doing for the number of students).

As before, each student has a unique ID number and name.

Each course has a discipline code (a short character string), a course
number (a positive integer), and a title (a longer character
string). No two courses have the same discipline code and also the
same course number. However, several courses can have the same
discipline code. Several courses can have the same course number, as
well (but, not both at once). As well, titles are not unique.

System functions are extended, but also changed in some cases:

 - It should be possible to add information about students, even
   if though those students haven't registered in any courses (yet)

 - It should also be possible to add or change information about
   courses. Courses can be "created" before any students register
   in them, but of course students can't register in "nonexistent"
   courses.

 - Any student can be registered in or have passed zero or more
   courses. Any course can have zero or more students registered in
   it or passed by it.

 - Now, it is necessary to provide both a student's ID number
   *and* the discipline code and course number of course in order
   to register a student in a course, or indicate that a student
   has passed or withdrawn from a course.

"Course" will become an entity in the ERD for this version of the
system. The primary key for "Course" will consist of a pair of
attributes: Discipline Code, and Course Number.

It's possible to represent the data that this system must keep track
of using four tables:

 - A "Student" table, with two columns --- for ID number and for
   name

 - A "Course" table, with three columns --- the the Discipline Code,
   Course Number, and Course Title

 - A "registrations" table with three columns --- for ID number,
   Discipline Code, and Course Number --- all the attributes in the
   primary keys (only!) --- for the entities "Student" and "Course."
   To show that a given student is registered in a give course, we
   would have the ID number for this student and the discipline
   code and course number for this course all listed in one row of
   this table (and we *wouldn't* have all this information in one
   row otherwise).

 - A "completions" table, with the same three columns as the
   "registrations" table --- used to keep track of the students
   who have passed (rather than are currently registered in)
   courses.

Note that "registrations" and "completions" aren't quite the same
as "Courses" and "Students" --- their data tables list attributes
in the primary keys of other attributes, and don't include any
non-key attributes of their own.

An entity relationship diagram for version of the system might look
like the following. This "corresponds to" the above data tables
(in a sense that should be clearer later in the term, but that won't
be clear now).

   ____________                               ________
  |            |                             |        |
  |            |-----< is registered in >----|        |
  |  Student   |                             | Course |
  |            |---------< has passed >------|        |
  |            |                             |        |
   ------------                               --------

EXCEPT THAT "is registered in" and "has passed" should both be
enclosed in diamonds (which are hard to show in an text file).

This diagram has two "entities": "Student" and "Course". It
also has two *relationships*: "is registered in" and "has passed,"
corresponding to the "data tables" "registrations" and "completions"
in the above set of data tables, respectively.

My own "naming convention" --- which you *aren't* required to follow
--- has relationships in ERDs named by short "verb phrases," and has
data tables listed by nouns. It'd also be acceptable to use nouns as
the names for relationships --- but I personally think this makes the
ERD a bit less readable.

Unlike the entity "Student" in the first diagram, the entity
called "Student" has only *two* attributes: "ID Number" and "name,"
and these attributes have the same properties as they did in
"Version One." Therefore, "ID Number" is (still) the primary key for
this entity.

The entity "Course" has *three* attributes: "Discipline Code,"
"Course Number," and "Title". The primary key for this entity should
include a *pair* of attributes: Discipline Code and Course Number.

Note that each of these entities "corresponds to" a table of data, in
which columns correspond to attributes and rows correspond to
instances of the entity, as before.

The relationships "is registered in" and "has passed" do *NOT* have
attributes of their own --- "plain" relationships never do. These
represent CONNECTIONS between instances of the entities they relate.
Since both of these relationships connect the entity "Student" and the
entity "Course" each of these two relationships corresponds to a
(different) table of data. Each of these two tables will have three
columns --- one for a student's "ID Number" and two more for a
course's "Discipline Code" and "Course Number." That is, there are
columns for all the attributes in the PRIMARY KEYs of the two entities
connected by the relationship. There will *NOT* be a column for any
of the entities' other attributes --- hence, there is no column for a
course's "Course Name."

IN GENERAL: A "relationship" in an ERD connects two or more entities in
the diagram --- or can connect an entity to "itself." A relationship
*usually* connects exactly *two* entities, but three (or even four-way)
relationships can occur.

Each relationship "can" correspond to an additional system data table
(we'll see later that it can also correspond to an "extension" of an
entity's table in some cases). As the above example suggests,
relationships never have attributes of their own. Instead, the data
table for a relationship lists the primary keys of (instances of)
entities connected by the relationship.