CPSC 333: Construction of an Entity-Relationship Diagram

Location: [CPSC 333] [Listing by Topic] [Listing by Date] [Previous Topic] [Next Topic] Construction from Problem Statement

This material was covered during lectures on January 23, 1997.

Starting Point
Goals
A Process to Follow
Application of this Process to a Problem
If Your Grammar's a Bit Rusty...
Try a Lab Exercise

Starting Point

The method to be described can be used with a short ``problem statement'' describing a system to be developed - something that is quite brief and incomplete, but that is likely to be available at the very beginning of software development.

It can also use the entity-relationship diagram for a (similar) existing system, if this one is available. This is important, since we can expect that we will need to change an existing specification at least as frequently as we'll need to develop a completely new one.

Later on, this method will be applied to an example. The problem statement for this example is given below.

We'll also see - quite a bit later in the course - that a similar method can be used to produce a class diagram during object-oriented analysis.

Goals

The above information will almost certainly not be sufficient for the development of a ``final'' ERD for our system. However, it will be sufficient for development of a first ERD that can be shown to potential system users for comments. Thus, we can use the above information to produce an ERD that will allow us to begin work.

A Process to Follow

Using the problem statement and similar ERD (if this is available), we will try to choose the entities to be include on the ERD in the first two steps described below.

We will try to choose the relationships to be used as well, in the next two steps after that.

In the step after that, we will try to find attributes and match them to the entities. We may discover at this point that associative objects are needed. We might discover that supertypes and subtypes should be created, as well.

Finally, we will try to choose primary keys for the entities; we may discover at this point that weak entities should be include on the ERD, as well.

First Step: Find Candidate Entities

Make a list of all nouns and noun phrases that appear on the given problem statement. Add the names of all entities, weak entities, supertypes and subtypes that appear on the previous ERD as well, if one is available.

Note that, in the next step, you will ``prune'' this list by discarding things that can't possibly be entities. If you are confident already, at this stage, that some noun shouldn't become an entity or an attribute, then it's certainly OK if you don't include it on the list that you prepare here.

The application of this step to the given example appears below.

Second Step: Prune List of Candidates

Now, ``prune'' this list by eliminating anything that does not have the following necessary characteristics of an entity, that have previously been described.

The noun or noun phrase describes something in the problem domain. Reject anything that appears to be computer-related jargon that refers to an implementation detail for the system to be produced, rather than part of its requirements. Reject anything that seems to be too ``generic'' or ``vague'' to be considered to a useful part of the description of the problem (such as a word like ``data'' or ``information''). Finally, reject anything that seems not to be related to the ``problem to be solved'' at all.
The noun or noun phrase must describe information that must be stored for ``indefinite periods of time'' in order for the system to function correctly. That is, it must describe data that might be created by one application of a function provided by the system, and then could possibly be used (read, modified, or even deleted) by one or more later applications of system functions. Discard anything that does not meet this condition.
It should be possible for the system to need to know about multiple instances of whatever it is that the noun or noun phrase describes. Otherwise you wouldn't need an entire ``table of data'' to store the corresponding information, so the noun or noun phrase doesn't really describe an ``entity.''
The noun phrase should describe something that has multiple attributes. That is, there should be several ``pieces of information'' (namely, more than one) that must be maintained about each ``instance'' of the candidate entity. If this isn't the case, then the noun or noun phrase is more likely to describe an attribute of something else, than it is for it to be an entity. Note, by the way, that you don't need to try to find all of the candidate entity's attributes, at this point, in order to make this assessment.
You also make sure that the same entity hasn't been included more than once, with different names: If two or more nouns or noun phrases appear to be synonyms (and really do describe the same thing in the problem statement), then all but one of them should be eliminated from the list.

There are other characteristics of entities that are important - all their instances should have common attributes, and it should be possible to choose a primary key for the entity. We won't worry about these at this point, but they will be considered during later steps of the process.

Again, the application of this step to the given example appears below.

Third Step: Find Candidate Relationships

Make a list of all the transitive verbs and transitive verb phrases that appear in the problem statement.

Include the names of all the relationships and associative objects that appear on the previous ERD, if one has been supplied.

The application of this step to the given example is shown below.

Fourth Step: Prune List of Candidates

Discard verbs or verb phrases unless their subjects and objects are both the names of entities that were selected above (or, synonyms for them). You should also discard verbs or verb phrases if they don't represent connections between instances of these entities that the system must remember in order to function.

Finally, make sure that you don't include two or more names for the same information. That is, discard synonyms, and also make sure that you don't include both ``active'' and ``passive'' forms for the same verbs or verb phrases (that is, don't include both ``passes'' and ``is passed by,'' etc.).

The application of this step to the given example is shown below.

Fifth Step: Find and Match Attributes

When pruning ``candidate entities'' from the list of nouns and noun phrases you generated in the first step, you will have discarded names of all the attributes of entities in the system. These will correspond to information in the problem domain that needs to be stored, but you will probably have ``rejected them,'' because they didn't have ``multiple attributes,'' or because it was already obvious that they were the names of attributes - parts of entities you'd selected already.

At this point you should confirm that all the ``attributes'' you found have ``elementary data types.'' If they don't, then it's possible that they should be broken up into a (fixed!) number of simpler components, and the names of these components should be included in your list of attributes, instead. As usual, you should also eliminate synonyms (so that information isn't included on your ERD more than once).

Once you've formed and pruned your list of attributes, you should choose the entity that each attribute corresponds to.

You may discover that an attribute actually corresponds to one of the relationships you found in the previous step, instead. In this case, the relationship should actually be an associative object.

Finally, if you notice that several entities share attributes (or roles in relationships) then it may a good idea to create supertypes and subtypes.

An application of this step to the given example is shown below.

Sixth Step: Choose Primary Keys

Finally, select a subset of the set of attributes of each entity that should be that entity's primary key.

If you aren't able to find a primary key for an entity, then it is possible that this should be a weak entity, instead. If you can't choose a primary key for everything after you've tried to turn entities into weak entities, then it's likely that you've made a mistake, somewhere along the line...

Application of this Process to a Problem

Problem Statement

Here is an abridged description of Version Two of the Student Information System that omits some unnecessary details, as well as all references to Version One.

It could either be used by itself to produce an ERD for Version Two of the system, or it could be used together with the entity-relationship diagram for Version One.

The system will keep track of the students that are registered in or that have passed academic courses.

In order to meet requests for information, it is necessary to keep track of the ID number and name (first name, middle initial, and last name) of each student that the system knows about. ID numbers are unique; that is, no two students have the same ID number. Names are not necessarily unique.

Each course has a discipline code, a course number, and a course title. No two courses have the same discipline code and also the same course title at the same time. Course titles aren't necessarily unique.

All courses are pass/fail courses. If a student fails a course then that student is automatically re-registered in the course. However, a student can withdraw from a course if the student hasn't already passed it. As well, if a student withdraws from a course then that student can register in it again. However, a student can't register again in a course that the student has already passed.

Application of First Step

If we applied the above first step to this example, then the following list would be produced - assuming that no attempt is made to ``prune'' the list (by rejecting things that couldn't possibly be entites).

system
track
student
course
order
request
information
ID number
name
first name
middle initial
last name
discipline code
course number
course title
pass/fail course

Note that, in English, some words - such as ``order'' and ``request'' - can be used either as nouns or as verbs. An examination of the above problem statement will confirm that these are both used there as nouns, so it's appropriate to include them in the above list.

If the ERD for Version One was also available (and being used) then we would use it to add ``student'' to the above list, if wasn't already there. Of course, we've already found this noun on the given problem statement, so it has been included already.

Application of Second Step

If we applied the above second step to this example, then we would accept or reject each of the above nouns or noun phrases, for the following reasons.

system: This isn't really in the problem domain, so it should be rejected.
track: This also isn't in the problem domain, so it should be rejected as well.
student: This satisfies all the conditions described above (and it appeared as an entity on the ERD for Version One, as well), so it should be accepted.
course: This also satisfies all the conditions described above, so it should also be accepted.
order: This doesn't seem to describe anything in the problem domain (as it's used in the problem statement), so it should be rejected.
request: This doesn't appear to describe information that needs to be remembered in order for the system to function (if you think it describes anything in the problem domain at all), so it should be rejected.
information: This is not really specific enough to be a good name for a single entity on the ERD for this system, so it should be rejected.
ID number: This certainly does represent something in the problem domain, about which information must be stored. However, it doesn't appear to be something with ``multiple attributes.'' Instead, it seems to be an attribute of an entity we've already accepted (namely, student), so it should be rejected.
name: This is (arguably) ``a tricky one.'' In fact, this is a synonym for a combination of a ``first name,'' ``middle initial,'' and ``last name.'' - that is, for three things that should be attributes of ``student.'' Thus, it describes data that will already be included using the entities that have been selected already. Therefore it would be a mistake to store this information again, and this word will be rejected.
first name: This doesn't have multiple attributes and should be an attribute for ``student,'' so it will be rejected.
middle initial: This should be rejected for the same reason as for ``first name.''
last name: This should be rejected for the same reason as for ``first name.''
discipline code: This should be rejected for essentially the same reason for ``first name'' (it will be an attribute of ``course,'' rather than for ``student'').
course number: This should be rejected for the same reason as for ``discipline code.''
course title: This should be rejected for the same reason as for ``discipline code.''
pass/fail course: As used in the problem description for this system, this is just a synonym for ``course.'' Therefore it will be rejected, because the entity it would be describe has been included already.

Application of Third Step

The transitive verbs and phrases that can be found on the above problem statement, and that would be listed if the third step described above was applied, are as follows.

keep track of
is registered in
meets request for
has
fails
is re-registered in
withdraws from
passes

Note that in a few cases, it isn't immediately clear what the subject of these verbs should be; sometimes it's clear from context that the subject should be ``the system,'' even though this isn't given explicitly in the problem statement.

The ERD for the first version of the system doesn't include any relationships or associative objects at all, so we won't be able to use it to add anything to this list.

Application of Fourth Step

If the fourth step described above is applied to the above list, then the following decisions would be made.

keep track of: The subject of this verb phrase, ``system,'' isn't the name (or a synonym for the name) of an entity, so this verb phrase should be rejected.
is registered in: This has an entity as both a subject and an object, describes connections the system will need to remember, and isn't a synonym we've chosen already, so it will be accepted.
meets request for: This will be rejected for the same reason that ``keep track of'' was.
has: In the problem statement, this never connects an entity to another entity (instead, it connects an entity to one or more of the names of its attributes), so it should also be rejected.
fails: This does connect an entity to an entity, but it doesn't represent connections that the system needs to remember in order to function, so it will be rejected.
is re-registered in: This will be rejected for the same reason as for ``fails.''
withdraws from: This will be rejected for the same reason as form ``fails.''
passes: This will be accepted. It corresponds to the relationship called ``has passed'' on the entity-relationship diagram for this system that has already been shown (and that was created using a slightly different problem statement - which explains the minor change in name).

Application of Fifth Step

When our list of candidate entities was pruned, the following attributes of entities were discovered - and this is the list we obtain at the beginning of this application of fifth step described above.

ID number
first name
middle initial
last name
discipline code
course number
course title

The word ``name'' was also considered, but we noticed that it was really a synonym for a combination of a ``first name,'' ``middle initial,'' and ``last name,'' so it won't be listed again here.

An examination of the problem statement will confirm that ``ID number,'` ``first name,'' ``middle initial,'' and ``last name'' should all be attributes of ``student,'' while ``discipline code,'' ``course number,'' and ``course title'' should all be attributes of ``course.''

It wasn't necessary to assign any attributes to relationships in this example, so no associative objects were created. For an example in which this would occur, see Version Four of the Student Information System.

There was also no reason to create supertypes and subtypes. For an example of a system in which this would be a good idea, see Version Five of the Student Information System.

Application of Sixth Step

It's easy to apply step 6 to this example.

The attribute ``ID number'' can be used as a primary key for ``student,'' and the pair of attributes ``discipline code'' and ``course number'' can be used as a primary key for ``course.''

Thus, it isn't necessary to use any weak entities on the ERD for this version of the system. For a version in which these are needed, see Version Three.

A Related Exercise

Try to apply this method, starting from the description for Version One of the Student Information System, to obtain the ERD that has been given for it.

After that (if you have the time for it), see whether you could start from the ERD we've obtained above, and the description of Version Three of the Student Information System, to obtain the ERD that's been given already for that version of the system.

If Your Grammar's a Bit Rusty...

If you've been having trouble remembering what a ``noun'' or a ``transitive verb'' is... Note that many good dictionaries describe parts of speech, and their uses, either in front matter or in appendices.

Several other (inexpensive and widely available) manuals of style, and descriptions of English grammar, also include this kind of information. Please send me email if there's one you like well enough so that you think it should be recommended here.

Your work in other courses (even computer science ones) will involve some essay or technical writing, so a reasonably good dictionary or manual of style is probably a good investment, in any case.

Try a Lab Exercise

A lab exercise based on this material is also available.

Location: [CPSC 333] [Listing by Topic] [Listing by Date] [Previous Topic] [Next Topic] Construction from Problem Statement

Department of Computer Science
University of Calgary

Office: (403) 220-5073
Fax: (403) 284-4707

eberly@cpsc.ucalgary.ca