CPSC 333 --- Lecture 5 --- Wednesday, January 17, 1996

Creation of Entity Relationship Diagrams

We'll start from a short statement of the problem and use this to
produce a list of possible entities, relationships, and attributes
for an ERD.

(A sample problem description, for a possible "student information
system," was introduced at this point. Each of the following steps
was applied to the example after the step was described.)

The candidate entities, relationships, and attributes are obtained by
performing a *grammatical parse* of the problem statement.

1) List all nouns and noun phrases in the problem statement. These
   will be candidates for entities.

2) "Prune" this list, using the following criteria for ENTITIES:

    a) STORED DATA REQUIREMENT: The noun is a potential entity only if
       information about it *must* be *remembered* by the system in
       order for it to function --- because this information may be
       used to perform multiple system functions that occur at
       different times. Otherwise, storage of information about this
       "entity" is not part of the *essential requirements* for the
       system.

    b) MULTIPLE INSTANCES: It will be necessary for the system to
       keep track of more than one instance of the potential entity
       at a time. The number of instances the system might be expected
       to know about should be "unbounded" (or, at least, be more
       than, say, five or ten) --- so that it would make sense to
       store information about the "entity" in a data table rather
       than by using a small number of "registers".

    c) COMMON ATTRIBUTES: A set of "attributes" can be defined for
       the potential entity. Each "attribute" should have an
       elementary data type --- such as boolean, integer, real,
       character string, or member of an "enumerated set" --- but
       not "list," "array," "tree," etc.

       This set of attributes should be finite (and fixed), and all
       of these attributes should apply to all instances of the
       potential entity. Furthermore, there should be *exactly one*
       value for each attribute, for each entity instance.

    d) KEY: There should be some subset of the attributes for a
       potential entity that forms a *key*: no two instances of the
       entity have the same values for every attribute in the "key",
       so the values of the "key" attributes can be used to identify
       the entity instance.

    e) MULTIPLE ATTRIBUTES: An entity should have *two or more*
       attributes. Ideally these will include at least one "non-key"
       attribute; that is, it should *not* be necessary to use *all*
       the entity's attributes as part of one key.

       Alternatively, one might include a "one attribute" entity,
       or an entity whose key includes all its attributes, if it is
       involved in one or more relationships that the system must
       remember.

    Nouns that fail one or more of these requirements, and so
    should be discarded, include

        - names of things that the system doesn't need to
          *remember* information about
        - names of things that never have more than one
          instance (usually fixed) --- such as the name of
          the "system" itself
        - names of attributes

    As well, care should be taken to detect and eliminate
    *synonyms*: if the same entity might have *two or more
    names*, then only *one* of those names should be included
    as an "entity" in the list after the pruning has taken place.
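The pruning step can be sketched in Python as one check per criterion, (a) through (e), followed by synonym merging. This is a minimal sketch under stated assumptions, not a prescribed procedure. It assumes the grammatical parse of step 1 has already been done by hand, and that everything known about each candidate noun is recorded in a hypothetical `Candidate` record. The following are all illustrative assumptions: the record's fields, the reading of "must be remembered" as "used by two or more system functions", the cut-off `limit=10` for "more than five or ten" instances, and the `student`/`course` example data. Sample instances can show that a proposed key is *not* a key, but they can never prove that it is one.

```python
from dataclasses import dataclass, field, replace
from enum import Enum
from typing import Optional

ELEMENTARY = (bool, int, float, str, Enum)  # elementary types allowed by criterion (c)


@dataclass
class Candidate:
    """Hypothetical record of what is known about one candidate noun."""
    name: str
    used_by: set = field(default_factory=set)        # system functions needing its stored data
    max_instances: Optional[int] = None              # None means "unbounded"
    attributes: dict = field(default_factory=dict)   # attribute name -> Python type
    key: tuple = ()                                  # attribute names proposed as the key
    sample_rows: list = field(default_factory=list)  # example instances, one dict each
    in_relationship: bool = False                    # takes part in a remembered relationship


def stored_data(c: Candidate) -> bool:
    # (a) assumption: "must be remembered" = used by two or more system functions
    return len(c.used_by) >= 2


def multiple_instances(c: Candidate, limit: int = 10) -> bool:
    # (b) unbounded, or more than `limit` instances (assumed cut-off)
    return c.max_instances is None or c.max_instances > limit


def common_attributes(c: Candidate) -> bool:
    # (c) elementary attribute types, and exactly one value of the right
    # type for every attribute of every sample instance
    if not all(isinstance(t, type) and issubclass(t, ELEMENTARY)
               for t in c.attributes.values()):
        return False
    return all(set(row) == set(c.attributes)
               and all(isinstance(row[a], t) for a, t in c.attributes.items())
               for row in c.sample_rows)


def has_key(c: Candidate) -> bool:
    # (d) a non-empty subset of the attributes on which no two sample
    # instances agree (samples can refute a key, never prove one)
    if not c.key or not set(c.key) <= set(c.attributes):
        return False
    values = [tuple(row.get(a) for a in c.key) for row in c.sample_rows]
    return len(values) == len(set(values))


def enough_attributes(c: Candidate) -> bool:
    # (e) two or more attributes with at least one outside the key,
    # unless the entity takes part in a relationship
    rich = len(c.attributes) >= 2 and set(c.key) != set(c.attributes)
    return rich or c.in_relationship


CRITERIA = (stored_data, multiple_instances, common_attributes, has_key, enough_attributes)


def prune(candidates: list, synonyms: dict) -> list:
    """Keep candidates meeting every criterion, each entity listed under one name."""
    kept = []
    for c in candidates:
        name = synonyms.get(c.name, c.name)  # map a synonym onto its chosen name
        if name not in kept and all(check(c) for check in CRITERIA):
            kept.append(name)
    return kept


# Hypothetical candidates for a student information system.
student = Candidate(
    name="student", used_by={"register", "print transcript"},
    attributes={"id": int, "name": str}, key=("id",),
    sample_rows=[{"id": 1, "name": "Ada"}, {"id": 2, "name": "Alan"}],
)
course = Candidate(
    name="course", used_by={"register", "print transcript"},
    attributes={"code": str, "title": str}, key=("code",),
    sample_rows=[{"code": "C1", "title": "T1"}, {"code": "C2", "title": "T2"}],
)
pupil = replace(student, name="pupil")                      # synonym for "student"
system = replace(student, name="system", max_instances=1)   # only one instance
grade = Candidate(name="grade", used_by={"register", "print transcript"})  # really an attribute name

print(prune([student, pupil, system, grade, course], {"pupil": "student"}))
# ['student', 'course']
```

Keeping each criterion in its own function makes it easy to see which rule a discarded noun breaks. In the example, `system` fails (b), `grade` has no attributes and so no key and fails (d), and `pupil` is merged into `student` instead of being listed twice.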

... to be continued in Lecture #6.