CPSC 333 --- Lecture 2 --- Wednesday, Jan 10, 1996

Requirements Analysis and Specification:

 Goal: A "requirements specification" which describes *what* a
 software system will be expected to do (rather than how it's
 supposed to do it) --- to be used as

  - a starting point for software design
  - a basis for development of tests that will be applied to
    the software as (and after) it is developed and maintained
  - a means to evaluate *test results* after tests are conducted
  - *** a reference during maintenance ***

  - an aid for estimating resources required for development and
    scheduling --- and, when necessary, part of a "contract" between
    developers and "customers"

Analysis Principles:

 1. "The information domain must be represented and understood."

    ... where the "information domain" includes several aspects/views
    of the data maintained by a system, which can be studied and
    modeled independently:

    a) "Information Flow" --- the way data changes as it is
        processed by the system --- including relationships
        among system inputs, outputs, and data read from or
        written to data stores (or files)

    b) "Information Content" --- the internal structure of data
        items, including the "types" of data items and the
        identification of simple data items that are components
        of more complex "aggregate" data items

        Example: Identification of "character string" as the
        type of a name, perhaps along with upper and lower
        bounds on string length, and information about which
        characters may appear

        Another Example: Identify a "book" (as viewed by a system)
        as an aggregate data item including simpler data items
        such as title, author, ISBN, number of pages, ...
        as components

    c) "Information Structure" --- includes information about
        the *relationships* required to hold among data items
        as well as "logical organization" of data items

        Frequently, it makes sense to organize data into a set
        of "data tables" --- so that it could be stored easily
        using a set of files or using a relational data base.
        This isn't always true: For example, information about
        processes maintained by an operating system might better
        be maintained in a "priority queue" rather than a set of
        data tables, since the "highest priority" process might
        frequently need to be accessed (without having to discover
        *which* is the highest priority process separately, ahead
        of time). This type of information about the "logical
        organization" of data could be considered to be part of
        "information structure."

    Note: These three aspects of data are taken from Pressman's
    books. I find Pressman's definitions to be somewhat vague or
    overlapping, so I won't ask you to decide whether some property
    or aspect of data is "information content," "information
    structure," etc.
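
    To make the priority queue example under "Information Structure"
    concrete, here is a minimal sketch using Python's standard heapq
    module. The process names and priority numbers are invented for
    illustration, and "lower number means higher priority" is an
    assumption of this sketch, not something these notes require.

```python
import heapq

# Hypothetical process records: (priority, process name).
# heapq always yields the smallest tuple first, so in this sketch a
# lower number stands for a higher priority.
ready_queue = []
heapq.heappush(ready_queue, (3, "backup"))
heapq.heappush(ready_queue, (1, "keyboard handler"))
heapq.heappush(ready_queue, (2, "print spooler"))

# The highest priority process is removed directly; there is no need
# to search the stored records to discover which one it is.
priority, name = heapq.heappop(ready_queue)
```

    A set of data tables could hold the same records, but finding the
    highest priority process would then need a separate search (or an
    index maintained for that purpose).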

    A *rough* split --- "Information Flow" is what we'll model
    using "data flow diagrams" and "process specifications."
    "Information Content" is what we'll model using a data dictionary.
    "Information Structure" is what we'll model using entity
    relationship diagrams (and "extensions" of entity relationship
    diagrams, if there's time in the course to discuss object
    oriented development).
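
    As a rough illustration of "information content," the two examples
    given earlier (a name as a bounded character string, and a book as
    an aggregate of simpler items) might be sketched in Python as
    follows. The length bounds, the set of permitted characters, and
    all identifiers are assumptions made up for this sketch.

```python
from dataclasses import dataclass

# Assumed limits, for illustration only; the notes give no actual values.
MIN_NAME_LENGTH = 1
MAX_NAME_LENGTH = 40
ALLOWED_PUNCTUATION = " -'."


def is_valid_name(name: str) -> bool:
    """Check the length bounds and the permitted characters of a name."""
    if not (MIN_NAME_LENGTH <= len(name) <= MAX_NAME_LENGTH):
        return False
    return all(ch.isalpha() or ch in ALLOWED_PUNCTUATION for ch in name)


@dataclass
class Book:
    """An aggregate data item built from simpler component items."""
    title: str
    author: str
    isbn: str
    number_of_pages: int
```

    A data dictionary records the same kind of facts in a notation of
    its own; the code is only meant to show what "type," "bounds," and
    "component" refer to.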

    The Important Part: It's important to include *all* of this
    in the requirements specification --- *and* it's possible to
    analyze and specify these different aspects somewhat independently
    using the above three kinds of model.

    Why?: All of it's needed ... and it seems to be easier to
    perform the analysis and specification by performing three
    simpler and smaller (related) tasks instead of a single larger
    and more complicated one

 2. (As foreshadowed above:) All three of these aspects or views
    of the information domain should be *modeled.* The models
    should be developed and/or organized in a way that uncovers
    details in a *layered* or *hierarchical* fashion

    Why? --- This makes it easier for people "reading" the models
    to obtain a "top down" view of requirements, starting with
    a "big picture" and, as desired, navigating through the models
    in order to discover more information about the parts of the
    system they're interested in

    --- It may also help to "localize" (or make it easier to "place")
    those details that are unknown earlier in development, or prone
    to change

 3. The analysis process should progress from "essential information"
    towards "implementation detail"

    "Essential Information" includes the requirements about *what*
     needs to be done (not *how*) that would remain relevant even if
     PERFECT TECHNOLOGY were available:

      - Infinite memory
      - Memory accessible in zero time
      - "Infinitely" fast and powerful processors
      - No *system component* could possibly fail

     ...but still assuming that the *human beings* and *other* systems
     that the software-to-be-developed must communicate with are
     "imperfect"

     "Implementation detail" includes more information about *how*
     things are to be done as well as requirements necessary
     because the software is to be developed using "real" (limited,
     and imperfect) technology

     Examples of "implementation details" or "implementation
     requirements"

      --- duplication of information in data stores --- which
       may exist in order to improve access time, or to provide
       a consistency check, but would *not* be needed if memory
       were accessible in "zero time" and could never fail

      --- requiring that a system perform a computation twice,
       in two different ways --- perhaps because processors
       are unreliable

     On the other hand: Checks on the syntactic and semantic
     correctness of inputs received from people or external systems
     are likely to be "essential requirements" --- because we
     can't assume that "people" or external systems are "perfect"
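
     The contrast can be sketched in Python. Both functions are
     hypothetical: parse_quantity stands for an "essential" check on
     input from an imperfect person or external system, while
     checked_total stands for an "implementation" safeguard (the same
     computation done twice, in two different ways) that perfect
     technology would make unnecessary. The rule that a quantity must
     be positive is an assumption of this sketch.

```python
def parse_quantity(text: str) -> int:
    """Essential check: the sender of this input may have made a mistake."""
    try:
        quantity = int(text)          # syntactic check
    except ValueError:
        raise ValueError(f"not a whole number: {text!r}") from None
    if quantity <= 0:                 # semantic check (assumed rule)
        raise ValueError("quantity must be positive")
    return quantity


def total_by_loop(values):
    """First way of computing the total: explicit accumulation."""
    total = 0
    for value in values:
        total += value
    return total


def checked_total(values):
    """Implementation safeguard: compute twice and compare the results."""
    first = total_by_loop(values)
    second = sum(values)              # second, independent computation
    if first != second:
        raise RuntimeError("redundant computations disagree")
    return first
```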

     Why separate essential and implementation requirements? --- Once
     again, this splits one big job into two smaller and simpler
     ones. As well: If software is to be developed and *used and
     maintained* over a substantial length of time, then
     it's possible (even likely) that the "environment" on which
     the software is to run --- processor, I/O devices, operating
     system --- will all change, and that the software will need to
     be modified accordingly. When this kind of change is made, it's
     possible that virtually all the "essential requirements" will
     stay the same, while many (or even all) "implementation details"
     will need to be modified.

     Finally: Implementation requirements that aren't separated from
     "essential requirements" tend to be "cast in stone" --- and
     included in redesigns and reimplementations, even after the
     circumstances that made those "implementation requirements"
     necessary/desirable have changed.


Modeling Information Structure: Entity Relationship Diagrams

These were originally used to model requirements for relational
data bases. More recently, they've been extended or modified and
then used to model requirements for "object-oriented" systems.

*Lack of* Reference Material

 - Unfortunately these models are *not* included in Pressman's
   "Beginner's Guide."

 - They *are* discussed briefly in Pressman's "Practitioner's
   Guide" in Section 8.3 (pp. 256--263) --- BUT I'll be discussing
   a simpler "version" of ERDs, so I won't cover some of the
   "extended" notation you'll find in Pressman's examples

 - *My* primary reference for this material is:

     Sally Shlaer and Stephen J. Mellor
     Object-Oriented Systems Analysis: Modeling the World in Data
     Prentice-Hall (Yourdon Press Computing Series)
     Englewood Cliffs, NJ, 1988
     QA 76.76 D47 S53 1988

    Another book with some useful information about this topic
    (and how it relates to some other analysis topics to be
    covered later):

     Edward Yourdon
     Modern Structured Analysis
     Prentice-Hall (Yourdon Press Computing Series)
     Englewood Cliffs, NJ, 1989
     QA 76.9 S84 Y685 1989

These diagrams model the information that must be *remembered*
and therefore *stored* by the software system over a nontrivial
period of time, because the information is received as input from
a user or created as a system's response to a user's request at one
time, and must be accessed, modified, or reported in order to deal
with additional, later, requests from users.

We will *not* use these to model

  - additional data that a system might need to read, create,
    or report in order to deal with a single event but that it
    doesn't need to remember

  - anything else (in particular, algorithms or processes that
    the system is required to follow)

"Warning:" Some *extended* ERDs *do* include some of this information
--- this is especially true of the extended ERDs used for "object
oriented analysis" --- but the ERDs discussed in CPSC 333 will *not*
include this. We'll use different tools to model this extra
information instead.

Ongoing Example: Student Information System

VERSION ONE: This system will be used to keep track of the students
that are registered in or that have completed some (SINGLE, FIXED)
academic course. In order to deal with requests for information it
is necessary to keep track of the ID number and name of each student
the system knows about. ID numbers are unique --- no two students
have the same ID number. Names are not unique.

It is not necessary for the system to "be aware" of students who have
never registered in the course.

This is a pass/fail course. Students who fail the course are
automatically registered in the next section of the course --- that
is, they remain registered. Since students may repeat the course
(before they pass it) as often as they'd like, it isn't necessary for
the system to know how many times (if any) a given student has failed
the course.

A student can "withdraw" from the course if (s)he hasn't passed it. It
isn't necessary to keep track of students who have registered in the
course and then withdrawn before passing it.

It *is* necessary for the system to remember whether a student *has*
passed the course. In particular, it should be possible to give the
system an ID number and for the system to report either that

 - the student is not currently registered in the course and has
   never passed it
 - the student is currently registered (but has not passed yet)
 - the student has passed the course

If the ID number belongs to a currently registered student or to a
student who has passed the course, then the system should be able to
report the student's name.

It should also be possible to

 - add new students who have registered in the course
 - report that students have withdrawn without passing (which will
    cause the system to "forget about" them)
 - report that a registered student has passed the course.

Since the people or other systems providing information might
occasionally make mistakes (providing incorrect information) that
will need to be corrected, it should also be possible to

 - correct the ID number or name of a student
 - change the status of a student from "passed" back to "registered"

Finally, there should also be a way to remove students from the system
--- but we won't worry about this now.
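
To check that the requirements for Version One hang together, here is
one possible in-memory sketch in Python. The class name, the method
names, and the wording of the reports are all hypothetical, and the
choice of a dictionary keyed by ID number is an assumption of this
sketch rather than part of the requirements.

```python
from enum import Enum


class Status(Enum):
    REGISTERED = "registered"
    PASSED = "passed"


class CourseRecords:
    """Hypothetical sketch of the Version One requirements."""

    def __init__(self):
        self._students = {}  # ID number -> [name, Status]

    def register(self, student_id, name):
        if student_id in self._students:
            raise ValueError("ID number already known to the system")
        self._students[student_id] = [name, Status.REGISTERED]

    def withdraw(self, student_id):
        if self._students[student_id][1] is Status.PASSED:
            raise ValueError("a student who has passed cannot withdraw")
        del self._students[student_id]  # the system "forgets" the student

    def record_pass(self, student_id):
        self._students[student_id][1] = Status.PASSED

    def revert_to_registered(self, student_id):
        # Correction of a mistaken "passed" report.
        self._students[student_id][1] = Status.REGISTERED

    def correct_name(self, student_id, new_name):
        self._students[student_id][0] = new_name

    def correct_id(self, old_id, new_id):
        if new_id != old_id and new_id in self._students:
            raise ValueError("ID numbers must stay unique")
        self._students[new_id] = self._students.pop(old_id)

    def report(self, student_id):
        """Return (status description, name); name is None if unknown."""
        if student_id not in self._students:
            return ("not registered and never passed", None)
        name, status = self._students[student_id]
        return (status.value, name)
```

Note that nothing here counts failures or remembers withdrawn
students, which matches the requirements above.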

An "entity relationship diagram" for this system is as follows.

                          ___________________
                         |                   |
                         |      Student      |
                         |                   |
                          -------------------

--- that is, it consists of a single rectangle with the label
"Student."

This is a diagram with a single *entity* (or *object*) called
"Student" and with no "relationships".

This represents or "corresponds to" a single table of data that
must be maintained by the system. This table will have three
columns:

 - a column of ID numbers, with no number appearing more than
   once in it

 - a column of names of students (character strings)

 - a column representing the "status" of each student, which is
   either "passed" or "registered"

There will be one row of the table for each student that the system
"currently" needs to know about --- with the ID number, name, and
status of that student listed in that row.

Thus, you can think of each "instance" of the "object" (or "entity")
student as corresponding to a three-tuple: an ID number, name, and
status.
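
The three-tuple view can be sketched in Python with a named tuple. The
field names, the field types, and the sample values are assumptions
made for this sketch.

```python
from enum import Enum
from typing import NamedTuple


class Status(Enum):
    REGISTERED = "registered"
    PASSED = "passed"


class Student(NamedTuple):
    """One instance of the entity "Student": a three-tuple."""
    id_number: int
    name: str
    status: Status


# One row of the table, written as a tuple of three components.
s = Student(id_number=1001, name="Pat Lee", status=Status.PASSED)
```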

The ID number, name, and status all have "elementary" data types ---
integer, character string, and "passed"_or_"registered" (element of
an "enumerated set") respectively. Since these are "elementary" and
are all the components that make up an instance of "Student", ID
number, name and status are the (three) *attributes* of the object
"Student".

Thus (again): an object in an ERD corresponds, more or less, to a
table of data. Each column of the table stores an "elementary" data
item in each row. The number of *columns* is fixed, and there are no
blank or "undefined" spaces in the table. The number of *rows* can
change as the system is used --- as "instances" of the object are
added or deleted. Each column of the table corresponds to an
*attribute* of the object, and each row of the table corresponds to an
*instance* of the object.
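
The correspondence between an object and a table can be tried out with
Python's standard sqlite3 module. This is only a sketch: the table and
column names and the sample rows are invented, and using a relational
data base at all is an implementation choice that the ERD itself does
not make.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Three fixed columns, one per attribute. PRIMARY KEY keeps ID numbers
# unique; NOT NULL and CHECK rule out blank or undefined entries.
conn.execute(
    """
    CREATE TABLE student (
        id_number INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        status    TEXT NOT NULL CHECK (status IN ('registered', 'passed'))
    )
    """
)

# Rows (instances) come and go as the system is used.
conn.execute("INSERT INTO student VALUES (1001, 'Pat Lee', 'registered')")
conn.execute("INSERT INTO student VALUES (1002, 'Pat Lee', 'passed')")
conn.execute("DELETE FROM student WHERE id_number = 1001")

rows = conn.execute("SELECT id_number, name, status FROM student").fetchall()
columns = [info[1] for info in conn.execute("PRAGMA table_info(student)")]
```

The two sample rows share a name but not an ID number, as the
requirements allow.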