CPSC 333 --- Lecture 4 --- Monday, Jan 15, 1996 Recall the "Ongoing Example: Student Information System" VERSION THREE: In order to be able to provide some timetabling information, the system has been extended again to include information about the "sections" of courses that are currently running and in which student may be registered. Each "section" is a section of exactly one course. Zero or more "sections" may be running for a course at any time. In order to provide timetabling information it is necessary to keep track of the term (fall or winter) and year in which the section is running, as well as the location, name of the instructor, weekly meeting time, and duration of the lecture. (We're still dealing with a "simplified" problem because there's only one weekly meeting for each section of a course, and we don't need to keep track of labs or tutorials.) *If* sections have unique "section numbers" then the following diagram might be a reasonable one for this system: ___________ __________________ | | is | | | Student |----< registered >-----| Course Section | | | in | | ----------- ------------------ | | | | | < is a section of> | | | ___________ | | | | | Course | ----------< has passed >--------| | ----------- The attributes of "Student" and "Course" would not change. The attributes of "Course Section" would be "section number" (the primary key), term, year, location, day of week, start time, and duration. *HOWEVER* it may be unreasonable to assume that course section numbers must have unique "section numbers." It's at least as likely that section numbers only distinguish between course numbers that are all for *the same* course and that run in the same term and year --- and this is what we'll now assume. Unfortunately, if this is the case, then no set of attributes of the entity "Course" will form a (useful) key! On the other hand, each course section corresponds to *exactly one* course, and a key for course section *can* be formed from the primary key of *the associated course* together with some additional attributes of the course section. In this case "Course Section" is an example of a *WEAK ENTITY* --- an "entity" what is related to (and "dependent on") another entity, with the property that every instance of the "weak entity" is associated with exactly one instance of the other entity --- so that (as above) a "primary key" for the weak entity can be obtained by including some of the weak entity's "own" attributes to the primary key of the entity it "depends upon." Another example of a weak entity might be an entity called "Child" in a company's payroll system, if (for some reason) it's necessary to store a small amount of information about the children of each employee (and, to simplify life, assuming that both parents of a child are both employees) and each child is to be identified using a first name: In this case, "Child" is likely a weak entity that's associated with an entity "Employee". The child's name, together with the employee who's a parent, is enough to identify the child. It's likely that the attributes of the child *alone* aren't enough to form a key for the child --- certainly, the first names of all children in the system can't be expected to be unique. An entity relationship diagram for this version of the student information system might be as follows. ___________ __________ | | | | | Student |-------< has passed >-------| Course | | | | | ----------- ---------- | || | || | ___________________ | | _________________ | | || || -----< is registered in >----|| Course Section || || || | ----------------- | ------------------- where the "real diagram" (rather than ascii text approximation) includes a double line between the entity "Course" and weak entity "Course Section" and has the weak entity represented by a "double-line" rather than "single-line" box. The attributes for "Course Section" are not changed. However, the *primary key* for "Course Section" will include the attributes "Discipline Code" and "Course Number" of "Course" (its primary key) plus the weak entity's own attributes, "Term," "Year," and "Section Number." The data tables for "Student," "has passed," and "Course" will all be the same as before. The data table for "Course Section" will include columns for the above five attributes of "Course" and "Course Section" (that is, "Discipline Code," "Course Number," "Term," "Year," and "Section Number," as well as a column for each one of the additional attributes of "Course Section." The data table for "is registered in" will have six columns --- one for the primary key "ID Number" of "Student", and another five for the attributes included in the primary key of "Course Section" --- "Discipline Code," "Course Number," "Term," "Year," and "Section Number." IN GENERAL: Use a "weak entity" rather than an entity (and relationship) if the potential "weak entity" is associated to another entity, each instance of the "weak entity" is associated with exactly one instance of the other entity (but, each instance of the other entity could be related to zero, or several, instances of the weak entity), and there is no possible (or "natural") "primary key" for the weak entity that only includes attributes of the weak entity itself, but a "primary key" *can* be obtained by adding some of the weak entity's attributes to the "other" entity's primary key. VERSION FOUR: Of course, courses aren't always "pass/fail," and the system would be more useful if it could represent grades students receive for courses. Let's assume (for now) that it's never necessary to remember more than one grade that a student receives for a course, even though the student may attempt the course several times --- it will be sufficient for the system to remember the *highest* grade any given student has earned for a given course. A grade can be one of A+, A, A-, B+, B, B-, C+, C, C-, D+, D, or F. Unfortunately this grade doesn't really belong as an attribute of "Student," since the student might not have taken any courses, or may have earned grades for several (in fact, for arbitrarily many) courses. It doesn't belong as an attribute of "Course," either, for the same reasons. It *does* seem to "belong" as an attribute of the relation "has passed" --- but, since regular "relations" don't have attributes of their own, it's necessary to replace the relation "has passed"... ___________ __________ | | | | | Student |---------< has passed >----------| Course | | | | | ----------- ---------- ...with something that plays both the roles of a relationship connecting "Student" and "Course", and of an "entity" with attribute(s) of its own --- an *Associative Object* which we'll call "Course Credit": ___________ ___________ | | | | | Student |--------------< >---------------| Course | | | /|\ | | ----------- | ----------- | ___________________ | | | Course Credit | | | ------------------- In both the above diagrams the relationship "is registered in" and weak entity "Course Section" should also be included; they've been left out to save space and to focus attention on the part of the diagram that has changed. In the lower diagram, there should actually be a diamond (with no label) in the line between "Student" and "Course;" there should be an arrow pointing up from "Course Credit" into the bottom of that diamond. (This is hard to include in a diagram that's part of a text file.) "Course Credit" has a single attribute --- a grade. A data table for "Course Credit" would have four columns --- one for the primary key "ID Number" of "Student," two for the attributes "Discipline Code" and "Course Number" in the primary key of "Course," and one more for the additional attribute, "grade." This would replace the data table for the relationship "has passed," (which was the same, except for the omission of a column for a "grade." All the other data tables would be the same as they were for Version 3. IN GENERAL: An *associate object* has properties both of entities and relations --- like relations, it connects two or more other entities. Like (other) entities, it has one or more attributes of its own. VERSION FIVE: Now suppose that the system should be able to represent information about two kinds of courses: pass/fail, and courses for which students can earn letter grades. Suppose as well that the system is required to keep track of the "weight" (a positive integer, corresponding to the number of credits the course is worth) of any course that students earn grades for, but that weights *aren't* defined for pass/fail courses. An entity relationship diagram should include two similar (but slightly different) entities --- "Pass/Fail Course" and "Graded Course." "Pass/Fail Course" will have the same primary key and attributes as "Course" has in the above ERDs. "Graded Course" will have the same primary key and all these attributes, plus one additional attribute --- "weight." As in the diagram for "Version Three," a relationship "has passed" should connect "Student" to "Pass/Fail Course." As in the diagram for "Version Four," an associative object "Course Credit" should connect "Student" to "Graded Course." The fact that these two entities are highly similar can be reflected by introducing a third entity --- "Course" --- which is a *supertype* of both "Pass/Fail Course" and "Graded Course", so that "Pass/Fail Course" and "Graded Course" are both *subtypes* of "Course." This is drawn as follows: __________ | | | Course | | | ---------- | -+- | ----------------- | | | | ______________ __________ | | | | | Pass/Fail | | Graded | | Course | | Course | | | | | -------------- ---------- "Course" will have all the attributes that are *common* to both "Pass/Fail Course" and "Graded Course" and will also have the same primary key. "Graded Course" has one additional attribute, "weight," while "Pass/Fail Course" does not. The weak entity "Course Section" will be connected to "Course," since both Pass/Fail and Graded courses can have sections. The relationship "has passed" will only connect Student to "Pass/Fail Course", while the associative object "Course Credit" will only connect Student to "Graded Course." There will be a data table corresponding to "Pass/Fail Course" and another that corresponds to "Graded Course." Since *every* "instance" of a "Course" is actually either an instance of "Pass/Fail Course" or "Graded Course" (but never both at once), there will *not* be a third data table for the supertype "Course". The other data tables --- for Student, Course Section, Course Credit, and the relationships "has passed" and "is registered in" --- will be the same as they are in (recent) previous systems. IN GENERAL: Supertypes are useful for identifying two (or more) entities that are very similar --- with common primary keys and (possibly) common additional attributes and roles in relationships as well. These other entities are "subtypes" of the supertype. All *common* attributes (that is, attributes shared by *all* of the subtypes) are attributes of the "supertype" as well --- and are automatically "inherited" by every subtype. Relationships are allocated to the "supertype" and inherited by all subtypes in the same way. It is generally the case that supertypes have no instances of their own --- so, as above, each subtype will correspond to a data table (unless it's a "supertype" for *another* set of entities) but the supertype will not have a table of its own. We'll see later (when types of attributes, etc., are modeled) that supertypes/subtypes are useful for elimination of repetition of definitions in a model --- and can therefore make it easier to keep a model up-to-date and consistent. *** Answer to a "Frequently Asked Question" *** In previous years, quite a few students have asked about the difference between "weak entities" and "subtypes," or have confused them. Think of a subtype as being A KIND OF or perhaps A SPECIAL CASE of its supertype. The subtype has *exactly* the same primary key as the supertype. It also has all the (other) attributes that the supertype does. If a relation in an ERD connects the supertype to some entity, then instances of the subtype can be connected to instances of the other entity, by that relationship. In other words, the subtype *inherits* the key, all the attributes, and any relationships of the supertype. The subtype might have additional attributes of its own and might participate in additional relationships (or both) --- but, again, the *primary key* isn't changed. It's usually the case that any supertype will have several subtypes. A data system corresponding to an ERD including these will probably include a data table for each subtype, but *not* a data table for the supertype --- because it's generally true that every "instance" of the supertype is also an instance of one of its subtypes. If you are familiar with object-oriented design and programming: supertypes and subtypes in ERDs correspond to the use of "single inheritance" in an object-oriented design. In contrast, (an instance of) a weak entity is *not* a special case or a "kind of" (an instance of) the entity it's associated with. Instead, it's a "completely different thing." It happens that there's a different kind of "close connection" between the entity and the weak entity: Every instance of the weak entity "corresponds to" exactly one instance of the other entity. (However, "in the other direction": an instance of the "other entity" could have zero, one, or several instances of the weak entity "corresponding to it.") The primary key for the weak entity will include all the attributes in the *other object's* primary key, plus one or more of the weak entity's own attributes. *None* of the "other entity's" non-key attributes will be attributes of the weak entity --- so, the "other entity's" attributes will *not* generally be a subset of the weak entity's attributes, or vice-versa. The weak entity will have some attributes that are "key" attributes, in the sense that they're part of the weak entity's primary key, and the weak entity will generally have one or more non-key attributes, too. A data system will generally include both a data table for the "other" entity *and* a data table for the weak entity --- since both the "other" entity and the weak entity have instances of their own, and since an instance of the "other" entity is *not* also an instance of the weak entity, or vice-versa.