CPSC 333 --- Lecture 2 --- Wednesday, Jan 10, 1996

Requirements Analysis and Specification:

Goal: A "requirements specification" which describes *what* a software system will be expected to do (rather than how it's supposed to do it) --- to be used as

  - a starting point for software design
  - a basis for development of tests that will be applied to the software as (and after) it is developed and maintained
  - a means to evaluate *test results* after tests are conducted
  - *** a reference during maintenance ***
  - an aid for estimating the resources required for development, and for scheduling

--- and, when necessary, part of a "contract" between developers and "customers"

Analysis Principles:

1. "The information domain" must be represented and understood.

   ... where the "information domain" includes several aspects/views of the data maintained by a system, which can be studied and modeled independently:

   a) "Information Flow" --- the way data changes as it is processed by the system --- including relationships among system inputs, outputs, and data read from or written to data stores (or files)

   b) "Information Content" --- indicates the internal structure of data items, including the "types" of data items and the identification of simple data items that are components of more complex "aggregate" data items

      Example: Identification of "character string" as the type of a name, perhaps along with upper and lower bounds on string length, and information about which characters may appear

      Another Example: Identify a "book" (as viewed by a system) as an aggregate data item including simpler data items such as title, author, ISBN number, number of pages,... as components

   c) "Information Structure" --- includes information about the *relationships* required to hold among data items, as well as the "logical organization" of data items

      Frequently, it makes sense to organize data into a set of "data tables" --- so that it could be stored easily using a set of files or using a relational data base.
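As an illustration of the "information content" examples in (b) and the table organization in (c), here is a minimal Python sketch. It is not part of any notation used in this course: the length bounds (1 to 40 characters), the allowed characters (letters, space, hyphen, apostrophe), and the names `validate_name`, `Book`, `book_table`, and `add_book` are hypothetical choices made only for this example.

```python
import string
from dataclasses import dataclass

# Hypothetical constraints on a "name" data item (illustrative values only).
NAME_MIN_LEN = 1
NAME_MAX_LEN = 40
NAME_ALLOWED = set(string.ascii_letters + " -'")


def validate_name(name: str) -> bool:
    """True if `name` is a string within the assumed length bounds that
    uses only the assumed allowed characters."""
    return (NAME_MIN_LEN <= len(name) <= NAME_MAX_LEN
            and all(ch in NAME_ALLOWED for ch in name))


@dataclass(frozen=True)
class Book:
    """An aggregate data item whose components are simpler data items."""
    title: str
    author: str
    isbn: str
    pages: int


# A "data table" of books: one row per Book, keyed by ISBN (assumed unique).
book_table: dict[str, Book] = {}


def add_book(book: Book) -> None:
    """Add a row to the table, rejecting an author name that fails the check."""
    if not validate_name(book.author):
        raise ValueError(f"invalid author name: {book.author!r}")
    book_table[book.isbn] = book
```

Each `Book` value plays the role of one row, and the dict keyed by ISBN plays the role of a data table that could be stored in a file or a relational data base.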
      Organizing data as a set of tables isn't always the best choice, though. For example, information about the processes maintained by an operating system might better be kept in a "priority queue" rather than in a set of data tables, since the "highest priority" process might frequently need to be accessed (without having to discover *which* process has the highest priority separately, ahead of time). This type of information about the "logical organization" of data could be considered to be part of "information structure."

   Note: These three aspects of data are taken from Pressman's books. I find Pressman's definitions to be somewhat vague, or overlapping. I won't ask whether some property or aspect of data is "information content" or "information structure" etc.

   A *rough* split:

     - "Information Flow" is what we'll model using "data flow diagrams" and "process specifications."
     - "Information Content" is what we'll model using a data dictionary.
     - "Information Structure" is what we'll model using entity relationship diagrams (and "extensions" of entity relationship diagrams, if there's time in the course to discuss object oriented development).

   The Important Part: It's important to include *all* of this in the requirements specification --- *and* it's possible to analyze and specify these different aspects somewhat independently using the above three kinds of models.

   Why?: All of it is needed ... and it seems to be easier to perform the analysis and specification by performing three simpler and smaller (related) tasks instead of a single larger and more complicated one.

2. (As foreshadowed above:) All three of these aspects or views of the information domain should be *modeled.* The models should be developed and/or organized in a way that uncovers details in a *layered* or *hierarchical* fashion.

   Why?
   --- This makes it easier for people "reading" the models to obtain a "top down" view of requirements, starting with a "big picture" and, as desired, navigating through the models in order to discover more information about the parts of the system they're interested in

   --- It may also help to "localize" (or make it easier to "place") those details that are unknown early in development, or prone to change

3. The analysis process should progress from "essential information" towards "implementation detail".

   "Essential Information" includes the requirements about *what* needs to be done (not *how*) that would remain relevant even if PERFECT TECHNOLOGY were available:

     - Infinite memory
     - Memory accessible in zero time
     - "Infinitely" fast and powerful processors
     - No *system component* could possibly fail

   ...but still assuming that the *human beings* and *other* systems that the software-to-be-developed must communicate with are "imperfect"

   "Implementation detail" includes more information about *how* things are to be done, as well as requirements that are necessary because the software is to be developed using "real" (limited, and imperfect) technology.

   Examples of "implementation details" or "implementation requirements":

     --- duplication of information in data stores --- which may exist in order to improve access time, or to provide a consistency check, but would *not* be needed if memory were accessible in "zero time" and could never fail

     --- requiring that a system perform a computation twice, in two different ways --- perhaps because processors are unreliable

   On the other hand: Checks on the syntactic and semantic correctness of inputs received from people or external systems are likely to be "essential requirements" --- because we can't assume that "people" or external systems are "perfect".

   Why separate essential and implementation requirements?

     --- Once again, this splits one big job into two smaller and simpler ones.
     --- As well: If software is to be developed and *used and maintained* over a substantial length of time, then it's possible (even likely) that the "environment" on which the software is to run --- processor, I/O devices, operating system --- will all change, and that the software will need to be modified accordingly. When this kind of change is made, it's possible that virtually all the "essential requirements" will stay the same, while many (or even all) "implementation details" will need to be modified.

     --- Finally: Implementation requirements that aren't separated from "essential requirements" tend to be "cast in stone" --- and included in redesigns and reimplementations, even after the circumstances that made those "implementation requirements" necessary/desirable have changed.

Modeling Information Structure: Entity Relationship Diagrams

These were originally used to model requirements for relational data bases. More recently, they've been extended or modified and then used to model requirements for "object-oriented" systems.

*Lack of* Reference Material

  - Unfortunately these models are *not* included in Pressman's "Beginner's Guide."
  - They *are* discussed briefly in Pressman's "Practitioner's Guide" in Section 8.3 (pp. 256--263) --- BUT I'll be discussing a simpler "version" of ERDs, so I won't cover some of the "extended" notation you'll find in Pressman's examples
  - *My* primary reference for this material is:

        Sally Shlaer and Stephen J. Mellor
        Object-Oriented Systems Analysis: Modeling the World in Data
        Prentice-Hall (Yourdon Press Computing Series)
        Englewood Cliffs, NJ, 1988
        QA 76.76 D47 S53 1988

    Another book with some useful information about this topic (and how it relates to some other analysis topics to be covered later):

        Edward Yourdon
        Modern Structured Analysis
        Prentice-Hall (Yourdon Press Computing Series)
        Englewood Cliffs, NJ, 1989
        QA 76.9 S84 Y685 1989

These diagrams model the information that must be *remembered*, and therefore *stored*, by the software system over a nontrivial period of time: information that is received as input from a user, or created as the system's response to a user's request, at one time, and that must be accessed, modified, or reported later in order to deal with additional requests from users.

We will *not* use these to model

  - additional data that a system might need to read, create, or report in order to deal with a single event, but that it doesn't need to remember
  - anything else (in particular, algorithms or processes that the system is required to follow)

"Warning:" Some *extended* ERDs *do* include some of this information --- this is especially true of the extended ERDs used for "object oriented analysis" --- but the ERDs discussed in CPSC 333 will *not* include this. We'll use different tools to model this extra information instead.

Ongoing Example: Student Information System

VERSION ONE:

This system will be used to keep track of the students that are registered in, or that have completed, some (SINGLE, FIXED) academic course.

In order to deal with requests for information it is necessary to keep track of the ID number and name of each student the system knows about. ID numbers are unique --- no two students have the same ID number. Names are not unique.

It is not necessary for the system to "be aware" of students who have never registered in the course.

This is a pass/fail course.
Students who fail the course are automatically registered in the next section of the course --- that is, they remain registered. Since students may repeat the course (before they pass it) as often as they'd like, it isn't necessary for the system to know how many times (if any) a given student has failed the course.

A student can "withdraw" from the course if (s)he hasn't passed it. It isn't necessary to keep track of students who have registered in the course and then withdrawn before passing it.

It *is* necessary for the system to remember whether a student *has* passed the course. In particular, it should be possible to give the system an ID number and for the system to report one of the following:

  - the student is not currently registered in the course and has never passed it
  - the student is currently registered (but has not passed yet)
  - the student has passed the course

If the ID number belongs to a currently registered student or to a student who has passed the course, then the system should be able to report the student's name.

It should also be possible to

  - add new students who have registered in the course
  - report that students have withdrawn without passing (which will cause the system to "forget about" them)
  - report that a registered student has passed the course.

Since the people or other systems providing information might occasionally make mistakes (providing incorrect information) that will need to be corrected, it should also be possible to

  - correct the ID number or name of a student
  - change the status of a student from "passed" back to "registered"

Finally, there should also be a way to remove students from the system --- but we won't worry about this now.

An "entity relationship diagram" for this system is as follows.

     ___________________
    |                   |
    |      Student      |
    |                   |
     -------------------

--- that is, it consists of a single rectangle with the label "Student."
This is a diagram with a single *entity* (or *object*) called "Student" and with no "relationships". This represents or "corresponds to" a single table of data that must be maintained by the system. This table will have three columns:

  - a column of ID numbers, with no number appearing more than once in it
  - a column of names of students (character strings)
  - a column representing the "status" of each student, which is either "passed" or "registered"

There will be one row of the table for each student that the system "currently" needs to know about --- with the ID number, name, and status of that student listed in that row. Thus, you can think of each "instance" of the "object" (or "entity") Student as corresponding to a three-tuple: an ID number, name, and status.

The ID number, name, and status all have "elementary" data types --- integer, character string, and "passed"_or_"registered" (an element of an "enumerated set"), respectively. Since these are "elementary" and are all the components that make up an instance of "Student", ID number, name, and status are the (three) *attributes* of the object "Student".

Thus (again): an object in an ERD corresponds, more or less, to a table of data. Each column of the table stores an "elementary" data item in each row. The number of *columns* is fixed, and there are no blank or "undefined" spaces in the table. The number of *rows* can change as the system is used --- as "instances" of the object are added or deleted. Each column of the table corresponds to an *attribute* of the object, and each row of the table corresponds to an *instance* of the object.
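To connect the "Student" table to the Version One requirements, here is a minimal Python sketch under stated assumptions. The operations are *not* part of the ERD (which models only what must be remembered); they're included only to show how the table would be used. The names `Status`, `Student`, `StudentTable`, `query`, `register`, `withdraw`, `record_pass`, `undo_pass`, `correct_name`, and `correct_id` are hypothetical, as are the exact report strings, and the table is assumed to live in memory as a dict keyed by ID number.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    """The enumerated set of values allowed for the "status" attribute."""
    REGISTERED = "registered"
    PASSED = "passed"


@dataclass
class Student:
    """One instance (row) of the object Student: its three attributes."""
    id_number: int
    name: str
    status: Status


class StudentTable:
    """The Student table, keyed by ID number so that no ID appears twice."""

    def __init__(self) -> None:
        self._rows: dict[int, Student] = {}

    def query(self, id_number: int) -> str:
        """Report which of the three required cases applies to an ID number."""
        row = self._rows.get(id_number)
        if row is None:
            return "not registered and never passed"
        if row.status is Status.REGISTERED:
            return f"registered (not passed yet): {row.name}"
        return f"passed: {row.name}"

    def register(self, id_number: int, name: str) -> None:
        """Add a student who has newly registered in the course."""
        if id_number in self._rows:
            raise ValueError(f"ID number {id_number} is already in use")
        self._rows[id_number] = Student(id_number, name, Status.REGISTERED)

    def withdraw(self, id_number: int) -> None:
        """A registered student withdraws; the row is deleted ("forgotten")."""
        self._require(id_number, Status.REGISTERED)
        del self._rows[id_number]

    def record_pass(self, id_number: int) -> None:
        """Record that a registered student has passed the course."""
        self._require(id_number, Status.REGISTERED).status = Status.PASSED

    def undo_pass(self, id_number: int) -> None:
        """Correct a mistake: change a status from "passed" to "registered"."""
        self._require(id_number, Status.PASSED).status = Status.REGISTERED

    def correct_name(self, id_number: int, new_name: str) -> None:
        """Correct the name stored for a student."""
        self._require(id_number).name = new_name

    def correct_id(self, old_id: int, new_id: int) -> None:
        """Correct a student's ID number, keeping ID numbers unique."""
        row = self._require(old_id)
        if new_id != old_id and new_id in self._rows:
            raise ValueError(f"ID number {new_id} is already in use")
        del self._rows[old_id]
        row.id_number = new_id
        self._rows[new_id] = row

    def _require(self, id_number: int,
                 status: Optional[Status] = None) -> Student:
        """Return the row for an ID number, optionally checking its status."""
        row = self._rows.get(id_number)
        if row is None:
            raise KeyError(f"no student with ID number {id_number}")
        if status is not None and row.status is not status:
            raise ValueError(f"student {id_number} is not {status.value}")
        return row
```

Keying the dict by ID number mirrors the rule that the ID column has no repeats: the number of rows changes as students are registered or forgotten, while every row always holds exactly the three attributes. The "remove a student" operation is left out, since the requirements above postpone it.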