CPSC 333 --- Lecture 3 --- Friday, Jan 12, 1996 Recall the "Ongoing Example: Student Information System" VERSION ONE of this system was described in Lecture 2. The entity relationship diagram given for this system is as follows: ___________________ | | | Student | | | ------------------- --- that is, it consists of a single rectangle with the label "Student." This is a diagram with a single *entity* or *object* called "Student" and with no "relationships". This represents or "corresponds to" a single table of data that must be maintained by the system. This table will have three columns: - a column of ID numbers, with no numbering appearing more than once in it - a column of names of students (character strings) - a column representing the "status" of each student, which is either "passed" or "registered" There will be one row of the table for each student that the system "currently" needs to know about --- with the ID number, name, and status of that student listed in that row. Thus, you can think of each "instance" of the "entity" (or "object") student as corresponding to a three-tuple: an ID number, name, and status. The ID number, name, and status all have "elementary" data types --- integer, character string, and "passed"_or_"registered" (element of an "enumerated set") respectively. Since these are "elementary" and are all the components that make up an instance of "Student", ID number, name and status are the (three) *attributes* of the entity "Student". Thus (again): an entity in an ERD corresponds, more or less, to a table of data. Each column of the table stores an "elementary" data item in each row. The number of *columns* is fixed, and there are no blank or "undefined" spaces in the table. The number of *rows* can change as the system is used --- as "instances" of the entity are added or deleted. Each column of the table corresponds to an *attribute* of the entity, and each row of the table corresponds to an *instance* of the entity. Note that the following "things" are NOT shown on the ERD (as I've defined it): - the number of attributes of the entity, or their names or data types - any processes or functions that the system must support using the stored data that this ERD represents This information will be modeled by *other* models than the ERD which supplement or "complement" this diagram. IN GENERAL: An "entity" in an ERD corresponds to something that the system must remember information about. Frequently an "entity" corresponds to: - an "external entity" --- other systems, devices, or people that the system must keep track of - "occurrences or events" --- provided that the system is required to keep a history of these - "roles" (eg, manager, engineer, or salesperson) played by people who interact with the system - organizational units (eg, division, group, team) that are relevant to the application - structures that are built up or assembled from other entities the system needs to remember information about. Each entity should have two more *attributes* and each attribute should have an "elementary" data type --- integer, real, character string, or element of an "enumerated set". (Thus "array" or "record" should not be types of attributes) As in the above example, you can think of an entity as corresponding roughly to a "table of data" with each column of the table corresponding to one of the entity's attributes, and with each row corresponding to one of the entity's instances. It is almost always necessary for a system to be able to distinguish between and select from the instances of an entity. Therefore there should always be at least one SUBSET of the entity's attributes such no two (or more) instances of the entity can have the same values for all the attributes in this subset. Each SUBSET of attributes with this property, that is as small as possible (that is, including no "extra" or "unnecessary" attributes) is called a KEY of the attribute. ONE key for an entity is designated as the PRIMARY KEY for the entity --- and the implemented system will generally include operations that read or delete an instance of each entity, which accept the values for the primary key as inputs. In the above example (since "keys" are required not to include unneeded attributes) the entity "Student" has only one "key" --- in this case, a subset consisting of a single attribute, "ID Number". Therefore (since there's only one key to choose from), the single element subset containing the attribute "ID Number" is the primary key for the entity. Universal Change/Abuse of Notation: In this case we'll gloss over/ignore the difference between a single-element-subset and the element contained in that subset, and call the attribute "ID Number" the "primary key" of the entity. VERSION TWO: A new version of the system can be used to keep track of information about students who are registered in several courses. While it will probably be necessary (*much later* during analysis and development) to place a bound on the number of courses the system can know about at the same time, we'll start by treating this as unlimited (just as we're doing for the number of students). As before, each student has a unique ID number and name. Each course has a discipline code (a short character string), a course number (a positive integer), and a title (a longer character string). No two courses have the same discipline code and also the same course number. However, several courses can have the same discipline code. Several courses can have the same course number, as well (but, not both at once). As well, titles are not unique. System functions are extended, but also changed in some cases: - It should be possible to add information about students, even if though those students haven't registered in any courses (yet) - It should also be possible to add or change information about courses. Courses can be "created" before any students register in them, but of course students can't register in "nonexistent" courses. - Any student can be registered in or have passed zero or more courses. Any course can have zero or more students registered in it or passed by it. - Now, it is necessary to provide both a student's ID number *and* the discipline code and course number of course in order to register a student in a course, or indicate that a student has passed or withdrawn from a course. "Course" will become an entity in the ERD for this version of the system. The primary key for "Course" will consist of a pair of attributes: Discipline Code, and Course Number. It's possible to represent the data that this system must keep track of using four tables: - A "Student" table, with two columns --- for ID number and for name - A "Course" table, with three columns --- the the Discipline Code, Course Number, and Course Title - A "registrations" table with three columns --- for ID number, Discipline Code, and Course Number --- all the attributes in the primary keys (only!) --- for the entities "Student" and "Course." To show that a given student is registered in a give course, we would have the ID number for this student and the discipline code and course number for this course all listed in one row of this table (and we *wouldn't* have all this information in one row otherwise). - A "completions" table, with the same three columns as the "registrations" table --- used to keep track of the students who have passed (rather than are currently registered in) courses. Note that "registrations" and "completions" aren't quite the same as "Courses" and "Students" --- their data tables list attributes in the primary keys of other attributes, and don't include any non-key attributes of their own. An entity relationship diagram for version of the system might look like the following. This "corresponds to" the above data tables (in a sense that should be clearer later in the term, but that won't be clear now). ____________ ________ | | | | | |-----< is registered in >----| | | Student | | Course | | |---------< has passed >------| | | | | | ------------ -------- EXCEPT THAT "is registered in" and "has passed" should both be enclosed in diamonds (which are hard to show in an text file). This diagram has two "entities": "Student" and "Course". It also has two *relationships*: "is registered in" and "has passed," corresponding to the "data tables" "registrations" and "completions" in the above set of data tables, respectively. My own "naming convention" --- which you *aren't* required to follow --- has relationships in ERDs named by short "verb phrases," and has data tables listed by nouns. It'd also be acceptable to use nouns as the names for relationships --- but I personally think this makes the ERD a bit less readable. Unlike the entity "Student" in the first diagram, the entity called "Student" has only *two* attributes: "ID Number" and "name," and these attributes have the same properties as they did in "Version One." Therefore, "ID Number" is (still) the primary key for this entity. The entity "Course" has *three* attributes: "Discipline Code," "Course Number," and "Title". The primary key for this entity should include a *pair* of attributes: Discipline Code and Course Number. Note that each of these entities "corresponds to" a table of data, in which columns correspond to attributes and rows correspond to instances of the entity, as before. The relationships "is registered in" and "has passed" do *NOT* have attributes of their own --- "plain" relationships never do. These represent CONNECTIONS between instances of the entities they relate. Since both of these relationships connect the entity "Student" and the entity "Course" each of these two relationships corresponds to a (different) table of data. Each of these two tables will have three columns --- one for a student's "ID Number" and two more for a course's "Discipline Code" and "Course Number." That is, there are columns for all the attributes in the PRIMARY KEYs of the two entities connected by the relationship. There will *NOT* be a column for any of the entities' other attributes --- hence, there is no column for a course's "Course Name." IN GENERAL: A "relationship" in an ERD connects two or more entities in the diagram --- or can connect an entity to "itself." A relationship *usually* connects exactly *two* entities, but three (or even four-way) relationships can occur. Each relationship "can" correspond to an additional system data table (we'll see later that it can also correspond to an "extension" of an entity's table in some cases). As the above example suggests, relationships never have attributes of their own. Instead, the data table for a relationship lists the primary keys of (instances of) entities connected by the relationship.