CPSC 333: Evaluating and Improving Entity-Relationship Diagrams

Location: [CPSC 333] [Listing by Topic] [Listing by Date] [Previous Topic] [Next Topic] Evaluating and Improving ERDs


This material was not covered in lectures in Winter, 1997, due to lack of time. However, it is ``required reading'' for CPSC 333 students.


In this course, an entity-relationship diagram is being used to model requirements involving data that must be stored (and ``remembered'') by a system for a nonnegligible amount of time in order for the system to function correctly.

The following guidelines can be used to improve an existing entity-relationship diagram (rather than to create a new one from a problem statement or other information).

Avoid Redundancy

Guidelines for evaluating the ``structure'' of an entity-relationship diagram are intended to ensure that the diagram, and any ``data tables'' corresponding to it, are as simple as possible. They also serve to eliminate any redundant data from the model. Redundant data includes data that can be computed from other information already in the data base, as well as duplicate copies of data items.

While one might decide to introduce redundant data into a data model in order to improve performance, or to provide ``fault tolerance,'' these decisions shouldn't be made until after performance requirements have been determined. This doesn't happen until well after requirements analysis has been started - and well after an entity-relationship diagram modelling essential requirements has been produced. It is more difficult to correctly modify an ERD (to reflect a change in requirements) if the ERD corresponds to a data model including redundant data, than it is if the ERD doesn't. Furthermore, the existence of redundant data in the model makes it more difficult to ensure that stored data remains consistent as the data is updated. That is, redundancies also complicate ``regular use'' of the data base or files.

Therefore, at this point in the course, an entity-relationship diagram that includes redundant data will be considered to be incorrect.

Normalization Rules

Normalization rules for entity-relationship diagrams are based on (and are virtually identical to) normalization rules for relational data bases. Ideally, an entity-relationship diagram should obey all four of the following ``normalization rules.''

Rule #1

Each instance of each entity (or weak entity, or associative object) has exactly one value for each of the entity's attributes. That is, there is exactly one meaningful value in each row-and-column of the entity's data table.

Rule #2

Attributes can have no (useful) internal structure. That is, each attribute must have an elementary data type such as integer, string, real, boolean, or an element of an enumerated set - and not ``tuple,'' ``list,'' ``array,'' or some other complex or aggregate data type.

Entity-relationship diagrams that obey rules 1 and 2 are in first normal form.

Rule #3

When an entity (or associative object or weak entity) has a ``compound key'' - that is, one consisting of two or more attributes - then each of the entity's attributes that is not part of this key should ``depend on'' the entire key, and should not be derivable from a proper subset of the key.

For example, consider the ERD for Version 3 of the Student Information System, and suppose the entity ``Course'' and its associated weak entity ``Course Section'' were merged together to form a single entity, having all the attributes of ``Course'' and ``Course Section.'' The primary key of ``Course Section,'' {discipline code, course number, year, term, section number}, could serve as a ``compound key'' for this new entity.

The entity-relationship diagram containing this new entity would violate rule #3, because the attribute ``title'' (which was a non-key attribute of the old entity ``Course'') would be an entity that only depends on a proper subset of the new entity's primary key - namely, {discipline code, course number}, the primary key of the old entity ``Course.''

An entity-relationship diagram that obeys rules #1, 2, and 3 is in second normal form.

Rule #4

Each attribute that is not part of a key for an entity (or associative object or weak entity) should not be derivable from some other non-key attribute, unless the other ``non-key attribute'' could be used, by itself, as another key for the entity.

Suppose, for example, that an attribute called ``starting month'' was added to the entity ``Course Section'' in the student information system. Suppose, as well, that the value of ``starting month'' was guaranteed to be ``September'' of the value of ``term'' (for the same instance) was ''Fall,'' that the value of ``starting month'' was guaranteed to be ``January'' if the value of ``term'' was ``Winter,'' the value of ``starting month'' was guaranteed to be ``May'' if ``term'' had value ``Spring,'' and the value of ``starting month'' was guaranteed to be ``July'' if the value of ``term'' was ``Summer.'' Then the value of the attribute would be computable from the non-key attribute ``term,'' and this would be a violation of Rule #4.

Structural Problems and Solutions

Some of the following problems are special cases of violations of one or more of the above normalization rules.

Problem #1

Problem: You discover data elements (attributes) that apply to some instances of an entity, but not to all of them - violating rule #1.

For example, there may be several ways to pay employees in a company, so some instances of ``Employee'' have an ``hourly rate'' (as an attribute), while others have a ``salary.'' These are two different things, and am employee's monthly pay should be computed from one or the other, and is computed from each in a different way.

Solution: Create subtypes of the original entity (which becomes a supertype), and group instances of the original entity using the attributes applicable to each (making each group a set of the instances of one of the new subtypes).

In the example, you would need to introduce at least two subtypes of ``Employee'' - ``Hourly Employee,'' whose instances are all instances of ``Employee'' that have an ``hourly rate,'' and ``Salaried Employee,'' whose instances are all instances of ``Employee'' that have a ``salary.''

Problem #2

Problem: You discover that some data elements are common to (and have the same type for) two or more entities. You know that this is not ``an accident:'' the definitions must be the same for these common entities.

Solution: Create a supertype of these entities, turning the ``common'' attributes into attributes of the new supertype.

Note, though, that an entity in an entity-relationship is not generally allowed to be a (direct) subtype of two or more entities at the same time - entity-relationship diagrams don't support ``multiple inheritance.'' Therefore, you might not be able to use this solution for the problem, if it would create a second ``supertype'' for some entity in the diagram.

You could probably accomplish the same thing by performing a more extensive reorganization of the diagram, replacing existing supertypes by a set of new ones - but this is difficult to describe and would lead to a complicated diagram.

It might be necessary, then, to ``annotate'' your diagram by listing the fact that some attributes should always have the same types (even though the diagram, and the types of these attributes, might be changed later on). You can include this kind of ``annotation'' in the definition for the attributes in the data dictionary for this system.

Problem #3

Problem: A data item appears to be an attribute of a ``combination'' of two or more instances from different entities, rather than an attribute of (an instance of) a single entity.

Solution: This data item should probably be an attribute of an associative object that connects these entities.

Problem #4

Problem: ``Repeating Groups'' - there may be several values of an attribute (or several sets of values for a set of attributes) corresponding to each instance of an entity. This would violate normalization rule #1 (or normalization rule #2, if you ``solved'' the problem by using a list or array as the type for the attribute(s) involved, in order to allow multiple values to be stored).

For example, suppose an entity ``Employee'' has an attribute, ``child's name.'' If an employee has several children then it isn't clear which of the childrens' names should be used to set the value for this attribute.

Solution: Create a new entity and a one-to-many relationship (which might be conditional) between the old entity and the new - or, perhaps, create a weak entity that depends on the original entity. The attributes of the new entity or weak entity will be the ``repeating'' attributes from the original entity, and these will be removed from the set of attributes from this entity.

In the example, one could create a weak entity called ``Child'' with an attribute ``name,'' connect it to the old entity, ``Employee,'' and delete ``child's name'' from the set of attributes for ``Employee.''

Problem #5

Problem: An entity has only one attribute.

This is unusual, and should be avoided if possible, but it might be acceptable if the entity is connected to several other entities by associative objects or relationships, or if this entity is a weak entity.

Solution: If there is a relationship between this and another entity, and each instance of the other entity is connected to exactly one instance of this ``single-attribute'' entity, then it is possible that this shouldn't be an entity at all, but should simply be an attribute of the other entity that it's connected to.

Problem #6

Problem: You discover that some entity can never have more than instance at a time (or, worse yet, that there's only one instance, and that instance never changes!).

Solution: Remove this entity from the model completely. The same solution should be used if you discover that some attribute can only take on one possible value (that is, it behaves more like a ``constant'' than a ``variable.'').

Problem #7

Problem: The value of some attribute can be computed as a function of some other information in the data model - such as the values of other attributes of the same entity.

Solution: Remove this ``redundant'' attribute. If its value can be computed from other information that's already being stored then there's no need to store it, too.

Consistency with Functional Requirements

Once a model of the ``functional requirements'' for the system (describing the operations that the system must support) has been created, one should check that the ``stored data'' requirements and ``functional'' requirements are consistent.

In particular, you should make sure that each system operation really could be carried out using only the inputs provided to the system for that operation, together with the information that is stored by the system (and represented by the system's ERD) - and, possibly, with a small (and ``bounded'') amount of additional data that might be kept in a small number of ``registers,'' as well. If it's discovered that some operation couldn't be performed, then this might mean that additional data should be stored, and that the ERD should be extended.

You should also try to confirm that all the data stored by the system and represented on the ERD really is needed in order for the system's operations to be performed correctly. If you can't find any use for a piece of information (or, if you discover that it could be generated from other stored data) then you should wonder whether that information really does need to be stored, and you should consider removing it from the ERD.

Location: [CPSC 333] [Listing by Topic] [Listing by Date] [Previous Topic] [Next Topic] Evaluating and Improving ERDs


Department of Computer Science
University of Calgary

Office: (403) 220-5073
Fax: (403) 284-4707

eberly@cpsc.ucalgary.ca