CPSC 333 --- Lecture 24 --- Wednesday, March 13, 1996

Additional White Box and Black Box Unit Tests

Reference: Sections 18.4 and 18.5 of Pressman's "Practitioner's Guide" --- pages 611--623.

Additional White Box Tests:

- (Slightly) different methods for testing conditions in programs

- Data Flow Testing: a method that "selects test paths of a program according to the locations of definitions and uses of variables in the program."

Black Box Tests:

- Equivalence Partitioning: the "input domain" of a program is divided into classes of data from which test cases can be derived. Guidelines include instructions like the following: if an input is expected to belong to some range of numbers, include tests whose values for that input are out of range --- both below the minimum allowable value and above the maximum --- as well as tests whose values are in the expected range.

  One justification for Equivalence Partitioning is that, if structured programming has been used to implement the module, then it is to be hoped that the program will behave consistently on all inputs in the same "class": in particular, if it fails on one input in the class, it will likely fail on all of them.

- Boundary Value Analysis: this "complements" Equivalence Partitioning, adding tests in which inputs have values at the "boundaries" of their allowed ranges. For example, if an input parameter is allowed to have values between some minimum value "a" and some maximum value "b", then the guidelines for Boundary Value Analysis dictate inclusion of tests in which the parameter has the values a-1, a, a+1, b-1, b, and b+1. (A small sketch of test values chosen according to these guidelines appears below, after the discussion of Error Seeding.)

See Pressman's book for details about these additional testing methods.

Something Pressman Missed?

If you're replacing an existing system, you may have access to "real" problems --- ones that were solved using the system being replaced. These would, presumably, be useful for testing the new system on "typical" inputs. This certainly wouldn't hurt, given that the methods above do seem to emphasize "extreme" or "atypical" cases!

Testing the Tests

These ideas can be applied to unit testing, or to integration testing (which will follow). Since "exhaustive testing is impossible," we can only hope to use a relatively small number of tests that will detect as many of the errors in the system as possible. Under these circumstances it's useful to have a way to estimate the likelihood that a set of tests *will* catch most of the errors in code --- or, to estimate the *proportion* of the errors in a program that will be detected by a given set of tests.

The following two, similar, techniques can be used to assess (or "test") a given set of tests in this way. In both cases, a set of programming errors that *could* plausibly have been made in the code to be tested is selected.

Error Seeding: For this method, all of the chosen errors are added into (or "seeded" into) the code; that is, a new version of the program is created by introducing these errors. The tests are then executed on this "seeded" program, and the number of seeded errors caught by the tests is counted. Clearly, the more of these errors that are caught, the better (ideally, you'd like *all* of the seeded errors to have been detected). One *might* also use the ratio of "seeded errors detected" to "seeded errors" as an estimate of the proportion of the *unseeded* (that is, "real") errors in the program that the tests will catch.
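To make the Equivalence Partitioning and Boundary Value Analysis guidelines above concrete, here is a small sketch in Python. The function classify_score and the valid range 0..100 are invented for illustration only; the point is how the test values are chosen --- one representative value from each equivalence class, plus the values just below, at, and just above each boundary (a-1, a, a+1, b-1, b, b+1).

    # Hypothetical unit under test, invented for this sketch: it accepts an
    # integer score in the (assumed) valid range 0..100 and classifies it,
    # rejecting anything outside that range.
    def classify_score(score):
        if score < 0 or score > 100:
            raise ValueError("score out of range")
        return "pass" if score >= 50 else "fail"

    # Equivalence Partitioning: one representative value from each class ---
    # below the range, inside the range, above the range.
    equivalence_cases = [
        (-20, ValueError),    # invalid class: below the minimum
        (75, "pass"),         # valid class: a typical in-range value
        (250, ValueError),    # invalid class: above the maximum
    ]

    # Boundary Value Analysis: values just below, at, and just above each
    # boundary (here a = 0 and b = 100): a-1, a, a+1, b-1, b, b+1.
    boundary_cases = [
        (-1, ValueError), (0, "fail"), (1, "fail"),
        (99, "pass"), (100, "pass"), (101, ValueError),
    ]

    def run_cases(cases):
        for value, expected in cases:
            try:
                actual = classify_score(value)
            except ValueError:
                actual = ValueError
            status = "ok" if actual == expected else "FAILED"
            print(f"score={value:4}: expected {expected}, got {actual} [{status}]")

    run_cases(equivalence_cases + boundary_cases)

Note that the two guidelines overlap: in practice the boundary values simply extend the representative values already chosen for each class.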
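As a small numeric illustration of the Error Seeding estimate (the counts below are made up for the sketch):

    # Hypothetical Error Seeding bookkeeping; the counts are invented
    # purely for illustration.
    seeded_errors   = 20    # errors deliberately introduced into the program
    seeded_detected = 16    # how many of those the test set exposed

    # The proportion of seeded errors that the tests caught ...
    detection_rate = seeded_detected / seeded_errors    # 16/20 = 0.80

    # ... might be taken as an estimate of the proportion of the *unseeded*
    # ("real") errors in the program that the same tests will catch.
    print(f"estimated proportion of real errors caught: {detection_rate:.0%}")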
Mutation Testing: In this case, instead of creating one new version of the program, *many* new versions --- "mutants" --- are created. Each is produced by seeding *exactly one* of the chosen errors into the program, so there are as many mutants as there are chosen errors, and each mutant is extremely similar to the original program. The entire set of tests is then run on the original program and on each of the mutants. A mutant "dies" if at least one of the tests produces different (incorrect!) output when run on the mutant than it does when run on the original, and a mutant "survives" otherwise. Ideally, all of the mutants "die," and the proportion of mutants that die could be taken as an estimate of the proportion of "real" errors in the code that these tests will catch.
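Finally, a minimal sketch of the Mutation Testing bookkeeping, again in Python. The function discount and its three hand-written mutants are invented for illustration; a real mutation-testing tool would generate the mutants automatically, but the counting of "killed" mutants is the same.

    # Original (assumed-correct) unit under test, invented for this sketch:
    # a 10% discount applies to orders of 100 or more.
    def discount(total):
        return 0.10 * total if total >= 100 else 0.0

    # Mutants: each is the original program with exactly one seeded error.
    def mutant_1(total):
        return 0.10 * total if total > 100 else 0.0     # ">=" changed to ">"

    def mutant_2(total):
        return 0.15 * total if total >= 100 else 0.0    # wrong discount rate

    def mutant_3(total):
        return 0.10 * total if total >= 100 else total  # wrong result below 100

    mutants = [mutant_1, mutant_2, mutant_3]

    # The set of tests being assessed --- here just a list of inputs; the
    # "expected" output for each input is whatever the original program produces.
    test_inputs = [50, 150]

    def dies(mutant):
        # A mutant "dies" if at least one test produces output that differs
        # from the original program's output; otherwise it "survives".
        return any(mutant(x) != discount(x) for x in test_inputs)

    killed = sum(1 for m in mutants if dies(m))
    print(f"{killed} of {len(mutants)} mutants killed")
    print(f"estimated proportion of real errors caught: {killed / len(mutants):.0%}")

In this sketch mutant_1 survives because neither test input exercises the boundary at exactly 100; a surviving mutant like this one points directly at a missing (boundary value) test.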