CPSC 333: Construction of an Entity-Relationship Diagram

Location: [CPSC 333] [Listing by Topic] [Listing by Date] [Previous Topic] [Next Topic] Construction from Problem Statement


This material was covered during lectures on January 23, 1997.


Starting Point

The method to be described can be used with a short ``problem statement'' describing a system to be developed - something that is quite brief and incomplete, but that is likely to be available at the very beginning of software development.

It can also use the entity-relationship diagram for a (similar) existing system, if this one is available. This is important, since we can expect that we will need to change an existing specification at least as frequently as we'll need to develop a completely new one.

Later on, this method will be applied to an example. The problem statement for this example is given below.

We'll also see - quite a bit later in the course - that a similar method can be used to produce a class diagram during object-oriented analysis.

Goals

The above information will almost certainly not be sufficient for the development of a ``final'' ERD for our system. However, it will be sufficient for development of a first ERD that can be shown to potential system users for comments. Thus, we can use the above information to produce an ERD that will allow us to begin work.

A Process to Follow

Using the problem statement and similar ERD (if this is available), we will try to choose the entities to be include on the ERD in the first two steps described below.

We will try to choose the relationships to be used as well, in the next two steps after that.

In the step after that, we will try to find attributes and match them to the entities. We may discover at this point that associative objects are needed. We might discover that supertypes and subtypes should be created, as well.

Finally, we will try to choose primary keys for the entities; we may discover at this point that weak entities should be include on the ERD, as well.

First Step: Find Candidate Entities

Make a list of all nouns and noun phrases that appear on the given problem statement. Add the names of all entities, weak entities, supertypes and subtypes that appear on the previous ERD as well, if one is available.

Note that, in the next step, you will ``prune'' this list by discarding things that can't possibly be entities. If you are confident already, at this stage, that some noun shouldn't become an entity or an attribute, then it's certainly OK if you don't include it on the list that you prepare here.

The application of this step to the given example appears below.

Second Step: Prune List of Candidates

Now, ``prune'' this list by eliminating anything that does not have the following necessary characteristics of an entity, that have previously been described.

There are other characteristics of entities that are important - all their instances should have common attributes, and it should be possible to choose a primary key for the entity. We won't worry about these at this point, but they will be considered during later steps of the process.

Again, the application of this step to the given example appears below.

Third Step: Find Candidate Relationships

Make a list of all the transitive verbs and transitive verb phrases that appear in the problem statement.

Include the names of all the relationships and associative objects that appear on the previous ERD, if one has been supplied.

The application of this step to the given example is shown below.

Fourth Step: Prune List of Candidates

Discard verbs or verb phrases unless their subjects and objects are both the names of entities that were selected above (or, synonyms for them). You should also discard verbs or verb phrases if they don't represent connections between instances of these entities that the system must remember in order to function.

Finally, make sure that you don't include two or more names for the same information. That is, discard synonyms, and also make sure that you don't include both ``active'' and ``passive'' forms for the same verbs or verb phrases (that is, don't include both ``passes'' and ``is passed by,'' etc.).

The application of this step to the given example is shown below.

Fifth Step: Find and Match Attributes

When pruning ``candidate entities'' from the list of nouns and noun phrases you generated in the first step, you will have discarded names of all the attributes of entities in the system. These will correspond to information in the problem domain that needs to be stored, but you will probably have ``rejected them,'' because they didn't have ``multiple attributes,'' or because it was already obvious that they were the names of attributes - parts of entities you'd selected already.

At this point you should confirm that all the ``attributes'' you found have ``elementary data types.'' If they don't, then it's possible that they should be broken up into a (fixed!) number of simpler components, and the names of these components should be included in your list of attributes, instead. As usual, you should also eliminate synonyms (so that information isn't included on your ERD more than once).

Once you've formed and pruned your list of attributes, you should choose the entity that each attribute corresponds to.

You may discover that an attribute actually corresponds to one of the relationships you found in the previous step, instead. In this case, the relationship should actually be an associative object.

Finally, if you notice that several entities share attributes (or roles in relationships) then it may a good idea to create supertypes and subtypes.

An application of this step to the given example is shown below.

Sixth Step: Choose Primary Keys

Finally, select a subset of the set of attributes of each entity that should be that entity's primary key.

If you aren't able to find a primary key for an entity, then it is possible that this should be a weak entity, instead. If you can't choose a primary key for everything after you've tried to turn entities into weak entities, then it's likely that you've made a mistake, somewhere along the line...

Application of this Process to a Problem

Problem Statement

Here is an abridged description of Version Two of the Student Information System that omits some unnecessary details, as well as all references to Version One.

It could either be used by itself to produce an ERD for Version Two of the system, or it could be used together with the entity-relationship diagram for Version One.

The system will keep track of the students that are registered in or that have passed academic courses.

In order to meet requests for information, it is necessary to keep track of the ID number and name (first name, middle initial, and last name) of each student that the system knows about. ID numbers are unique; that is, no two students have the same ID number. Names are not necessarily unique.

Each course has a discipline code, a course number, and a course title. No two courses have the same discipline code and also the same course title at the same time. Course titles aren't necessarily unique.

All courses are pass/fail courses. If a student fails a course then that student is automatically re-registered in the course. However, a student can withdraw from a course if the student hasn't already passed it. As well, if a student withdraws from a course then that student can register in it again. However, a student can't register again in a course that the student has already passed.

Application of First Step

If we applied the above first step to this example, then the following list would be produced - assuming that no attempt is made to ``prune'' the list (by rejecting things that couldn't possibly be entites).

Note that, in English, some words - such as ``order'' and ``request'' - can be used either as nouns or as verbs. An examination of the above problem statement will confirm that these are both used there as nouns, so it's appropriate to include them in the above list.

If the ERD for Version One was also available (and being used) then we would use it to add ``student'' to the above list, if wasn't already there. Of course, we've already found this noun on the given problem statement, so it has been included already.

Application of Second Step

If we applied the above second step to this example, then we would accept or reject each of the above nouns or noun phrases, for the following reasons.

Application of Third Step

The transitive verbs and phrases that can be found on the above problem statement, and that would be listed if the third step described above was applied, are as follows.

Note that in a few cases, it isn't immediately clear what the subject of these verbs should be; sometimes it's clear from context that the subject should be ``the system,'' even though this isn't given explicitly in the problem statement.

The ERD for the first version of the system doesn't include any relationships or associative objects at all, so we won't be able to use it to add anything to this list.

Application of Fourth Step

If the fourth step described above is applied to the above list, then the following decisions would be made.

Application of Fifth Step

When our list of candidate entities was pruned, the following attributes of entities were discovered - and this is the list we obtain at the beginning of this application of fifth step described above.

The word ``name'' was also considered, but we noticed that it was really a synonym for a combination of a ``first name,'' ``middle initial,'' and ``last name,'' so it won't be listed again here.

An examination of the problem statement will confirm that ``ID number,'` ``first name,'' ``middle initial,'' and ``last name'' should all be attributes of ``student,'' while ``discipline code,'' ``course number,'' and ``course title'' should all be attributes of ``course.''

It wasn't necessary to assign any attributes to relationships in this example, so no associative objects were created. For an example in which this would occur, see Version Four of the Student Information System.

There was also no reason to create supertypes and subtypes. For an example of a system in which this would be a good idea, see Version Five of the Student Information System.

Application of Sixth Step

It's easy to apply step 6 to this example.

The attribute ``ID number'' can be used as a primary key for ``student,'' and the pair of attributes ``discipline code'' and ``course number'' can be used as a primary key for ``course.''

Thus, it isn't necessary to use any weak entities on the ERD for this version of the system. For a version in which these are needed, see Version Three.

A Related Exercise

Try to apply this method, starting from the description for Version One of the Student Information System, to obtain the ERD that has been given for it.

After that (if you have the time for it), see whether you could start from the ERD we've obtained above, and the description of Version Three of the Student Information System, to obtain the ERD that's been given already for that version of the system.

If Your Grammar's a Bit Rusty...

If you've been having trouble remembering what a ``noun'' or a ``transitive verb'' is... Note that many good dictionaries describe parts of speech, and their uses, either in front matter or in appendices.

Several other (inexpensive and widely available) manuals of style, and descriptions of English grammar, also include this kind of information. Please send me email if there's one you like well enough so that you think it should be recommended here.

Your work in other courses (even computer science ones) will involve some essay or technical writing, so a reasonably good dictionary or manual of style is probably a good investment, in any case.

Try a Lab Exercise

A lab exercise based on this material is also available.

Location: [CPSC 333] [Listing by Topic] [Listing by Date] [Previous Topic] [Next Topic] Construction from Problem Statement


Department of Computer Science
University of Calgary

Office: (403) 220-5073
Fax: (403) 284-4707

eberly@cpsc.ucalgary.ca