CPSC 333: Entities and Attributes



This material was covered during lectures on January 15-17, 1997.


The Student Information System, Version One

We will begin by considering a system (whose requirements are to be specified) that will be used to keep track of students that either are registered in or have recently passed some single, fixed academic course.

In order to deal with requests for information, the system must keep track of the ID number and name (first name, middle initial, and family name) of each student that it knows about. ID numbers are unique; that is, no two students have the same ID number. Names are not necessarily unique.

This is a pass/fail course. Students who fail the course are automatically registered in the next section of the course; that is, they remain registered. Students may repeat the course as often as they'd like before they pass it, so it isn't necessary for the system to keep track of the number of times a student has attempted (and failed) the course. Students may not repeat the course after they've passed it.

It is necessary for the system to remember any given student that has passed the course (until the system is told to delete this information). In particular, it should be possible for a user to give the system an ID number and for the system then to report whether that ID number belongs to a currently registered student, to a student who has passed the course, or to neither.

If the ID number belongs to a currently registered student or to a student who has (recently) passed the course, then the system should also be able to supply the student's name.

A more complete description of this system is also available. However, these are all the details that we will need at this point.

The System's ERD

An ``entity relationship diagram'' for this system is as follows.

Picture of ERD

A plain text approximation of this picture is also available. As you can see, this consists of a single rectangle with the label "Student."

This is a diagram with a single entity called "Student" and with no ``relationships.'' Sometimes, entities are called ``objects'' in the literature, but we won't use this term, because objects (in object-oriented development) have features that entities (in ERDs) don't.

The Corresponding Table of Data

This entity represents or ``corresponds to'' a single table of data that must be maintained by the system. This table will have five columns:

  1. a column of ID numbers, with no ID number appearing more than once in it;
  2. a column of first names of students;
  3. a column of middle initials of students;
  4. a column of last names of students;
  5. a column representing the ``status'' of each student, which is always either ``passed'' or ``registered'' (but never both at once).

There will be one row of the table for each student that the system ``currently'' needs to know about - with the ID number, first name, middle initial (possibly a blank), last name, and status of that student listed in that row.

Instances and Attributes

Thus, you can think of each instance of the entity called ``Student'' as corresponding to a five-tuple: an ID number, first name, middle initial, last name, and status.

The ID number, first name, middle initial, last name, and status all have ``elementary'' data types: integer, character string, character, character string, and ``passed'' or ``registered'' (an element of an ``enumerated set''), respectively. Since these are ``elementary'' and are all the components that make up an instance of this entity, ID number, first name, middle initial, last name, and status are the (five) attributes of the entity ``Student.''
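As a rough illustration of an instance as a five-tuple of elementary values, here is a minimal Python sketch. The class names, field names, and the sample student are assumptions made for this example; none of them come from the system's requirements.

```python
from dataclasses import astuple, dataclass, fields
from enum import Enum


class Status(Enum):
    """Enumerated set with exactly two elements."""
    REGISTERED = "registered"
    PASSED = "passed"


@dataclass(frozen=True)
class Student:
    """One instance of the entity: five elementary values."""
    id_number: int
    first_name: str
    middle_initial: str  # a single character; " " when there is none
    last_name: str
    status: Status

    def __post_init__(self):
        # A "character" is modeled here as a string of length one.
        if len(self.middle_initial) != 1:
            raise ValueError("middle initial must be exactly one character")


# Hypothetical sample instance, not taken from the course notes.
ada = Student(1001, "Ada", " ", "Lovelace", Status.REGISTERED)

# The five attributes of the entity, in column order.
attribute_names = [f.name for f in fields(Student)]

# The same instance viewed as one row (a five-tuple) of the table.
row = astuple(ada)
```

Using an enumeration for the status means that a student is always exactly one of ``registered'' or ``passed,'' never both and never something else.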

Thus (again): an entity in an ERD corresponds, more or less, to a table of data. Each column of the table stores an ``elementary'' data item in each row. The number of columns is fixed, and there are no completely empty or ``undefined'' spaces in the table. Even the use of a ``blank space'' as a middle initial conveys some useful information - the fact that the student doesn't have (or refuses to disclose) a second given name. The number of rows can change as the system is used - as ``instances'' of the entity are added or deleted. Each column of the table corresponds to an attribute of the entity, and each row of the table corresponds to an instance of the entity.

These Aren't Shown...

Note that the following things are not shown on the ERD (as defined here).

This information will be captured by models other than the ERD, which supplement or ``complement'' this diagram.

You may notice that Pont introduces something called an ``attribute diagram'' in the same chapter in which he introduces ERDs; these are extended ERDs that show the attributes too. We won't use these in CPSC 333.

Characteristics of an Entity

In general, an ``entity'' in an ERD corresponds to something that the system must remember information about. Frequently, an ``entity'' corresponds to one of the following.

Each entity should have two or more attributes, and each attribute should have an elementary data type, such as integer, real, character, character string, or (element of an) ``enumerated set.''

Keys and Primary Keys

It is almost always necessary for a system to be able to distinguish between, and select from, the instances of an entity. Therefore there should always be at least one subset of the entity's attributes such that no two (or more) instances of the entity can have the same values for all the attributes in this subset, at the same time.

Each subset of attributes with this property that is as small as possible (i.e., including no ``extra'' or ``unnecessary'' attributes) is called a key of the entity.

Therefore, a key for an entity is a subset K of the set of the entity's attributes, such that the following two properties are satisfied.

  1. It is guaranteed that no two (or more) instances of the entity ever have the same values for all the attributes included in K at the same time. In other words, if you examine the columns of the data table for the entity that correspond to the attributes included in K, then there will never be two (or more) rows of the table having the same values in all these columns;
  2. K is ``minimal,'' in the sense that if any attribute is removed from K, then the remaining set of attributes no longer satisfies the first condition given above.

Every entity in an ERD must have at least one key. This implies that no two rows in a data table (corresponding to an entity) can ever be identical.

The conditions don't imply that keys are ``unique,'' or even that all the different keys for the same entity must include the same number of attributes. For example, if it were guaranteed that no two students in our system could ever have the same name, then the entity ``Student'' would have two keys: one of size one, containing only the attribute ``ID number,'' and another of size three, containing the attributes ``first name,'' ``middle initial,'' and ``last name.'' Note that this wasn't guaranteed; so the last subset really isn't a key, and our entity ``Student'' really only has one key (consisting of the attribute ``ID number'').
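The two conditions in the definition of a key can be checked mechanically against a snapshot of a table, as in the following Python sketch. One caution: a key is a guarantee about every state the table may ever be in, so a check like this can only show that a subset is not a key; it cannot establish that a subset is one. The function names and the sample rows are assumptions made for this illustration.

```python
from itertools import combinations

ATTRIBUTES = ("id_number", "first_name", "middle_initial", "last_name", "status")


def is_unique(rows, attrs):
    """True if no two rows agree on every attribute in attrs (condition 1)."""
    seen = set()
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values in seen:
            return False
        seen.add(values)
    return True


def snapshot_keys(rows, attributes=ATTRIBUTES):
    """Return every minimal subset that is unique in this snapshot.

    Subsets are tried in order of increasing size, so any subset that
    contains an already-found key is not minimal (condition 2) and is skipped.
    """
    keys = []
    for size in range(1, len(attributes) + 1):
        for subset in combinations(attributes, size):
            if any(set(k) <= set(subset) for k in keys):
                continue
            if is_unique(rows, subset):
                keys.append(subset)
    return keys


def make_row(*values):
    return dict(zip(ATTRIBUTES, values))


# Hypothetical data in which full names happen to be distinct.
rows = [
    make_row(1, "Ann", "B", "Lee", "registered"),
    make_row(2, "Ann", "B", "Kim", "registered"),
    make_row(3, "Ann", "C", "Lee", "registered"),
    make_row(4, "Bob", "B", "Lee", "registered"),
]
keys_with_distinct_names = snapshot_keys(rows)
# -> [("id_number",), ("first_name", "middle_initial", "last_name")]

# A second Ann B. Lee arrives: the name subset is no longer unique.
rows.append(make_row(5, "Ann", "B", "Lee", "registered"))
keys_with_shared_name = snapshot_keys(rows)
# -> [("id_number",)]
```

The first result has two keys of different sizes, as in the hypothetical case described above; the second shows why the three name attributes cannot serve as a key when names may repeat.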

In general, one key for an entity is designated as the primary key for the entity - and the implemented system will generally include operations that read or delete an instance of each entity, which accept the values for the attributes included in the primary key as inputs. It's possible, though, that the system will include other operations that access instances using other keys for the entity, too.
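A minimal Python sketch of such operations follows, assuming the table is held in memory and indexed by the primary key. The class and method names are hypothetical and are not part of the system's specification.

```python
class StudentTable:
    """Rows of the Student table, indexed by the primary key (ID number)."""

    def __init__(self):
        # Maps id_number -> (first_name, middle_initial, last_name, status).
        self._rows = {}

    def add(self, id_number, first, middle, last, status="registered"):
        if id_number in self._rows:
            raise ValueError(f"ID number {id_number} is already in use")
        if status not in ("registered", "passed"):
            raise ValueError(f"unknown status: {status!r}")
        self._rows[id_number] = (first, middle, last, status)

    def read(self, id_number):
        """Return the row for this ID number, or None if there is none."""
        return self._rows.get(id_number)

    def delete(self, id_number):
        """Delete the instance with this ID number; report whether it existed."""
        return self._rows.pop(id_number, None) is not None

    def find_by_name(self, first, middle, last):
        """Access by attributes that do not form a key: may match several rows."""
        return sorted(
            id_number
            for id_number, row in self._rows.items()
            if row[:3] == (first, middle, last)
        )


# Hypothetical usage.
table = StudentTable()
table.add(1001, "Ann", "B", "Lee")
table.add(1002, "Ann", "B", "Lee", status="passed")

first_row = table.read(1001)                     # exactly one row, or None
same_name = table.find_by_name("Ann", "B", "Lee")  # possibly many ID numbers
was_deleted = table.delete(1001)
```

Because the primary key identifies at most one instance, read and delete need no further disambiguation. The lookup by name returns a list, since names are not a key for this entity.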

It's been noted already that, in the above example, the entity ``Student'' has only one ``key'' - in this case, a subset consisting of a single attribute, ``ID number.'' Therefore (since there's only one key to choose from), the single-element subset containing the attribute ``ID number'' is the primary key for the entity.

Here is an abuse of notation that we'll commonly use: In this kind of case (the primary key includes only one attribute) we'll ignore the difference between a single-element subset and the element contained in that subset. Thus, we'll call the attribute ``ID number'' the ``primary key'' of the entity, even though the primary key is really a set that has ``ID number'' as its only element.



Department of Computer Science
University of Calgary

Office: (403) 220-5073
Fax: (403) 284-4707

eberly@cpsc.ucalgary.ca