CPSC 333 --- Lecture 5 --- Wednesday, January 17, 1996

Creation of Entity Relationship Diagrams

We'll start from a short statement of the problem and use this to produce a list of possible entities, relationships, and attributes for an ERD. These are obtained by performing a *grammatical parse* of the problem statement.

(A sample problem description, for a possible "student information system," was introduced at this point. Each of the following steps was applied to the example after the step was described.)

1) List all nouns and noun phrases in the problem statement. These will be candidates for entities.

2) "Prune" this list, using the following criteria for ENTITIES:

a) STORED DATA REQUIREMENT: The noun is a potential entity only if information about it *must* be *remembered* by the system in order for it to function --- because this information may be used to perform multiple system functions that occur at different times. Otherwise, storage of information about this "entity" is not part of the *essential requirements* for the system.

b) MULTIPLE INSTANCES: It will be necessary for the system to keep track of more than one instance of the potential entity at a time. The number of instances the system might be expected to know about should be "unbounded" (or, at least, be more than, say, five or ten) --- so that it would make sense to store information about the "entity" in a data table rather than by using a small number of "registers".

c) COMMON ATTRIBUTES: A set of "attributes" can be defined for the potential entity. Each "attribute" should have an elementary data type --- such as boolean, integer, real, character string, or member of an "enumerated set" --- but not "list," "array," "tree," etc. This set of attributes should be finite (and fixed), and all of these attributes should apply to all instances of the potential entity. Furthermore, there should be *exactly one* value for each attribute, for each entity instance.
d) KEY: There should be some subset of the attributes for a potential entity that forms a *key* --- so that no two instances of the entity have the same values for every attribute in the "key" (so that the values for the "key" attributes can be used to identify the entity instance).

e) MULTIPLE ATTRIBUTES: An entity should have *two or more* attributes. Ideally this will include at least one "non key" attribute --- that is, it should *not* be necessary to use *all* the entity's attributes as part of one key. Alternatively, one might include a "one attribute" entity, or an entity whose key includes all its attributes, if it is involved in one or more relationships that the system must remember.

Nouns that fail to meet one or more of these requirements, and should be discarded, include

- names of things that the system doesn't need to *remember* information about
- names of things that never have more than one instance (usually fixed) --- such as the name of the "system" itself
- names of attributes

As well, care should be taken to detect and eliminate *synonyms* --- if the same entity might have *two or more names* then only *one* of those names should be included as an "entity" in the list after the pruning has taken place.

... to be continued in Lecture #6.
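
As an illustration only, the pruning criteria (a) to (e) and the synonym rule above can be sketched as a small Python check. Everything named here is an assumption made for the sketch: the `Candidate` record and its fields, the `MIN_INSTANCES` threshold (taken loosely from "more than, say, five or ten"), and the sample nouns, which are invented and are not the lecture's actual "student information system" statement. The "exactly one value per attribute" part of criterion (c) depends on the data itself, so it is not checked here. Criterion (e) is read as: pass if there are two or more attributes with at least one non-key attribute, or if the candidate takes part in a relationship the system must remember.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional

# Elementary types named in criterion (c); the spelling of the labels is an assumption.
ELEMENTARY_TYPES = {"boolean", "integer", "real", "string", "enumerated"}
MIN_INSTANCES = 5  # assumed threshold for criterion (b)


@dataclass
class Candidate:
    """A noun from the problem statement, annotated by the analyst (hypothetical record)."""
    name: str
    must_be_remembered: bool                      # criterion (a)
    expected_instances: float                     # criterion (b); float("inf") if unbounded
    attributes: Dict[str, str] = field(default_factory=dict)  # attribute name -> type label
    key: FrozenSet[str] = frozenset()             # criterion (d)
    in_remembered_relationship: bool = False      # exception allowed by criterion (e)
    synonym_of: Optional[str] = None              # name of the entity this noun duplicates


def failed_criteria(c: Candidate) -> List[str]:
    """Return the letters of the criteria that the candidate fails."""
    failed = []
    attrs = set(c.attributes)
    if not c.must_be_remembered:
        failed.append("a")
    if c.expected_instances <= MIN_INSTANCES:
        failed.append("b")
    if not attrs or any(t not in ELEMENTARY_TYPES for t in c.attributes.values()):
        failed.append("c")
    if not c.key or not c.key <= attrs:
        failed.append("d")
    has_non_key = len(attrs) >= 2 and c.key < attrs  # key is a proper subset
    if not (has_non_key or c.in_remembered_relationship):
        failed.append("e")
    return failed


def prune(candidates: List[Candidate]) -> List[str]:
    """Keep the names of candidates that are not synonyms and pass every criterion."""
    return [c.name for c in candidates
            if c.synonym_of is None and not failed_criteria(c)]


# Invented sample nouns, for illustration only.
unbounded = float("inf")
candidates = [
    Candidate("student", True, unbounded,
              {"student_id": "integer", "name": "string"}, frozenset({"student_id"})),
    Candidate("course", True, unbounded,
              {"course_number": "string", "title": "string"}, frozenset({"course_number"})),
    Candidate("system", False, 1),                       # a single fixed instance
    Candidate("student name", True, unbounded),          # really an attribute
    Candidate("pupil", True, unbounded,
              {"student_id": "integer", "name": "string"}, frozenset({"student_id"}),
              synonym_of="student"),
]

print(prune(candidates))
```

With these invented annotations, only "student" and "course" survive: "system" fails every criterion, "student name" has no attributes of its own, and "pupil" is dropped as a synonym.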