CPSC 333 --- Lecture 20 --- Friday, February 23, 1996

Principles of Testing

For the *kind* of system (extremely large; often complex; involving numerous developers) discussed in this course...

1) Testing is the process of examining or executing a program *with the intention of finding errors*

... and not in order to somehow deduce that it doesn't contain any errors.

Explanation: It must be assumed that the program *will* include numerous errors when it's first developed. The cost of *correcting* an error increases, drastically, as development proceeds. Therefore, the *sooner* an error is found, the better!

"Corollaries:"
- A *good* test is one that has a high probability of finding an as-yet-undetected error.
- A test *succeeds* if it *does* find an error!

2) It is impossible to completely test any nontrivial module for any system.

Explanation: In general, a single test gives information about the behaviour of the program on *only one input.* If hardware and software bounds (maximum sizes of integer inputs, maximum array lengths, etc.) are ignored, then the number of inputs --- and the number of tests needed for complete or "exhaustive" testing --- is generally infinite. Even if these hardware and software limits are included, the number of tests needed for "exhaustive" testing of a function with only one (simple) input is large --- and the number of tests needed grows *exponentially* with the number of input parameters.

... You don't need your module to have very many parameters before you're forced to conclude that the *time* needed to complete "exhaustive testing" would exceed the lifetime of the universe, even when assuming that the time needed to complete a single test is shorter than is currently possible ...

3) Testing takes Creativity and Hard Work

You can think of this as a "corollary" of #2: Exhaustive testing is impossible, so the best we can hope for is a small set of tests that somehow "cover" most plausible cases. Designing a set of tests that has this property is nontrivial.

4) Test Results should be Recorded...

... for comparison with results obtained during "retesting," after changes have been made ... in order to look for any unexpected or undesirable "side effects" of changes, as well as to see whether the changes helped. (A small sketch of this kind of recorded, repeatable test follows this list.)

5) Testing is Best Done by Several *Independent* Testers

... and not (entirely) by the developers who designed and coded the system.

Explanation: One *very* common source of errors is a misunderstanding of system requirements. If developers misunderstand the requirements, they are also liable to "test the wrong thing" --- designing tests based on their incorrect understanding of what the system is supposed to do. It's also possible that developers can become too "attached to" their own work and will be reluctant to be critical of it.
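As a purely illustrative sketch of principle #4: recorded inputs and their "known good" outputs can be kept in a table, so that exactly the same tests can be re-run --- and the new results compared with the recorded ones --- after the module has been changed. The days_between() module, its stand-in definition, and the recorded values below are all invented for this sketch; they are not part of any particular system discussed in the course.

    /* Illustrative only: a tiny "recorded" test table for a made-up module.
       The real module would normally be compiled separately; a stand-in
       definition is included here so that the sketch is self-contained.   */

    #include <stdio.h>

    /* Stand-in for the (hypothetical) module under test. */
    static int days_between(int start, int end)
    {
        return end - start;
    }

    struct test_case {
        int start, end;    /* recorded inputs                          */
        int expected;      /* recorded output from an earlier test run */
    };

    static const struct test_case recorded[] = {
        { 1,   1,   0 },
        { 1,   2,   1 },
        { 1, 366, 365 },
    };

    int main(void)
    {
        int i, failures = 0;
        int n = (int)(sizeof recorded / sizeof recorded[0]);

        for (i = 0; i < n; i++) {
            int actual = days_between(recorded[i].start, recorded[i].end);
            if (actual != recorded[i].expected) {
                printf("FAIL: days_between(%d,%d) = %d; recorded result was %d\n",
                       recorded[i].start, recorded[i].end,
                       actual, recorded[i].expected);
                failures++;
            }
        }
        printf("%d of %d recorded tests failed\n", failures, n);
        return failures != 0;
    }

If a later change to days_between() has an unintended "side effect," re-running this table reports it immediately, and the recorded results show exactly which cases changed.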
Levels (or "Stages") of Testing

So far, development has proceeded from "general" (the entire system) to "specific" (individual modules). Since *development* of tests can take place as development proceeds, *test design* can proceed in the same way. However, *execution of tests* is performed in reverse order:

1) Unit Testing: Each module in the system is *individually* tested.

2) System Integration and Integration Testing: *After unit testing,* the modules are combined (or "integrated") together in order to form progressively larger and more complicated subsystems --- and each subsystem is tested before it is combined into an even larger system.

When errors are found, it is generally necessary to "roll the process back:" Changes are made to one or more of the modules in the subsystem. The modules that have been changed must be "unit tested" again --- and integration tests for the subsystems containing these modules must be repeated, in order to try to ensure that the detected problems have been eliminated --- and that no new problems have been caused --- by the changes. Eventually the entire system is combined together and tested.

3) Validation Testing: The *people who will be using the delivered system* begin to use the system, partly under "typical working conditions."

4) System Testing: Software is often part of a much larger system that includes hardware, a data base, people, etc. After the software has been tested, it is necessary to test the system as a whole.

We'll (eventually) consider each of these stages in turn.

Types of Tests

Both "static testing" and "dynamic testing" should be performed.

*Static Testing* is testing done directly on the source code of a program, without executing it. Types of static testing that can be performed manually:

- Desk Checking: reading the source code, and scanning for possible errors in either syntax or logic.
- Hand Execution: "playing computer," by reading successive lines of source code and carrying out the appropriate activities by making notations on paper.

Automated static testing can produce lists of errors, highlight questionable coding practices, or flag departures from coding standards. Static analyzers can also provide information about the structure of code, including symbol tables, call graphs (showing which modules are called by which other modules), logic flow graphs, lists of parameters passed to each module, etc.

*Dynamic Testing* tests the behaviour of a module or program during execution. Two major approaches to dynamic testing:

- Black Box Testing: also called "functional testing:" includes tests based on the functional requirements of programs (as given by requirements and module specifications).
- White Box Testing: also called "glass box testing" or "structural testing:" includes tests based on the internal workings and operations of a module.

It isn't possible to perform either of these types of tests "exhaustively": For black box testing, there are generally too many possible inputs to test them all. For white box testing, there are generally too many control paths through a module for all to be checked in any reasonable amount of time.

Black Box Testing is useful for finding
- incorrect or missing functions
- interface errors
- errors in data structures or external data base interfaces
- performance errors
- initialization and termination errors

This type of testing is conducted to some extent during unit testing, and more extensively during later testing stages.
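To make the "functional" view concrete, here is a small invented example of how black box tests might be chosen: the module leap_year() and its specification are assumptions made up for this sketch. The tests are taken from the *specification* alone --- century years, non-century multiples of four, ordinary years, and the stated boundary years --- without looking at how leap_year() is coded.

    /* Illustrative only: black box tests chosen from a (made-up) specification
       for leap_year(), without examining how leap_year() is implemented.      */

    #include <assert.h>
    #include <stdio.h>

    /* Stand-in for the module under test, included so the sketch compiles. */
    static int leap_year(int y)
    {
        return (y % 4 == 0 && y % 100 != 0) || (y % 400 == 0);
    }

    int main(void)
    {
        /* Cases suggested by the specification, not by the code: */
        assert(leap_year(1996) == 1);   /* ordinary multiple of four            */
        assert(leap_year(1995) == 0);   /* ordinary non-leap year               */
        assert(leap_year(1900) == 0);   /* century year that is not a leap year */
        assert(leap_year(2000) == 1);   /* century year divisible by 400        */
        assert(leap_year(1583) == 0);   /* smallest year the spec allows        */
        assert(leap_year(9999) == 0);   /* largest year the spec allows         */

        printf("all black box tests passed\n");
        return 0;
    }

Notice that none of these cases depends on *how* leap_year() computes its answer --- which is exactly the point of the "functional" view.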
White Box Testing --- typically tries to ensure that

- all "independent" control paths through a module have been checked
- all logical decisions within a module have been exercised on both their "true" and "false" sides (or, all possible "cases" for a *case* statement have been tested)
- all loops have been exercised at their boundaries --- both minimal and maximal numbers of iterations are checked --- as well as for "typical" numbers of iterations
- all internal data structures are checked

This type of testing is conducted at earlier stages --- extensively, during unit testing. (A small sketch of such "structural" tests appears at the end of these notes.)

"Why Perform White Box Testing?"

- Frequency of logical errors and incorrect assumptions appears to be *inversely* proportional to the probability that a program path will "normally" be executed --- and we want to find these errors (in infrequently used parts of the program) *soon*
- We often believe that a logical path is unlikely to be executed when, in fact, it will be executed on a regular basis
- Typographical errors are "random" --- they are as likely to be on "obscure" paths as anywhere else

... and white box testing *does* attempt to check these "obscure" control paths.
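As a small invented illustration of the "structural" view promised above: the search routine find() and the limit MAX_LEN below are made up for this sketch. The tests are chosen by looking at the *code* --- the if statement is driven both ways, and the loop is run zero times, exactly once, a "typical" number of times, and its maximum number of times.

    /* Illustrative only: white box (structural) tests for a made-up module.
       The tests are chosen from the structure of find(), not from its
       specification.                                                       */

    #include <assert.h>
    #include <stdio.h>

    #define MAX_LEN 100   /* assumed implementation limit on array length */

    /* Module under test: return the index of key in a[0..n-1], or -1. */
    static int find(const int a[], int n, int key)
    {
        int i;
        for (i = 0; i < n; i++)     /* loop: exercised at its boundaries              */
            if (a[i] == key)        /* decision: needs both a "true" and "false" test */
                return i;
        return -1;
    }

    int main(void)
    {
        int small[] = { 7, 3, 9, 3 };
        int big[MAX_LEN];
        int i;

        for (i = 0; i < MAX_LEN; i++)
            big[i] = i;

        assert(find(small, 0, 7) == -1);   /* loop executes zero times (n = 0)  */
        assert(find(small, 1, 7) ==  0);   /* loop executes exactly once        */
        assert(find(small, 4, 9) ==  2);   /* decision false, false, then true  */
        assert(find(small, 4, 8) == -1);   /* decision false on every iteration */
        assert(find(big, MAX_LEN, MAX_LEN - 1) == MAX_LEN - 1);
                                           /* loop executes its maximal number of times */

        printf("all white box tests passed\n");
        return 0;
    }

Here assert() is used so that a failing structural test stops the run at the offending case; a real unit test driver would usually also record the results, as discussed under principle #4.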