CPSC 333 --- Lecture 24 --- Wednesday, March 13, 1996

Additional White Box and Black Box Unit Tests

Reference: Sections 18.4 and 18.5 of Pressman's "Practitioner's Guide" --- pages 611--623.

Additional White Box Tests:

- (Slightly) different methods for testing conditions in programs

- Data Flow Testing: a method that "selects test paths of a program according to the locations of definitions and uses of variables in the program."

Black Box Tests:

- Equivalence Partitioning: the "input domain" of a program is divided into classes of data from which test cases can be derived. Guidelines include instructions like the following: if an input is expected to belong to some range of numbers, include tests whose values for that input are out of range --- both below the minimum allowable value and above the maximum --- as well as tests whose values are in the expected range.

  One justification for Equivalence Partitioning is that, if structured programming has been used to implement the module, then it is to be hoped that the program will behave consistently on all inputs in the same "class": in particular, if it fails on one input in the class, it will likely fail on all of them.

- Boundary Value Analysis: this "complements" Equivalence Partitioning, adding tests in which inputs have values at the "boundaries" of their allowed ranges. For example, if an input parameter is allowed to have values between some minimum value "a" and some maximum value "b", then the guidelines for Boundary Value Analysis dictate inclusion of tests in which the parameter has the values a-1, a, a+1, b-1, b, and b+1. (A small sketch of test values chosen according to these guidelines appears below, after the discussion of Error Seeding.)

See Pressman's book for details about these additional testing methods.

Something Pressman Missed?

If you're replacing an existing system, you may have access to "real" problems --- ones that were solved using the system being replaced. These would, presumably, be useful for testing the new system on "typical" inputs. This certainly wouldn't hurt, given that the methods above do seem to emphasize "extreme" or "atypical" cases!

Testing the Tests

These ideas can be applied to unit testing, or to integration testing (which will follow). Since "exhaustive testing is impossible," we can only hope to use a relatively small number of tests that will detect as many of the errors in the system as possible. Under these circumstances it's useful to have a way to estimate the likelihood that a set of tests *will* catch most of the errors in code --- or, to estimate the *proportion* of the errors in a program that will be detected by a given set of tests.

The following two, similar, techniques can be used to assess (or "test") a given set of tests in this way. In both cases, a set of programming errors that *could* plausibly have been made in the code to be tested is selected.

Error Seeding: For this method, all of the chosen errors are added into (or "seeded" into) the code; that is, a new version of the program is created by introducing these errors. The tests are then executed on this "seeded" program, and the number of seeded errors caught by the tests is counted. Clearly, the more of these errors that are caught, the better (ideally, you'd like *all* of the seeded errors to have been detected). One *might* also use the ratio of "seeded errors detected" to "seeded errors" as an estimate of the proportion of the *unseeded* (that is, "real") errors in the program that the tests will catch.
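To make the Equivalence Partitioning and Boundary Value Analysis guidelines above concrete, here is a small sketch in Python. The function classify_score and the valid range 0..100 are invented for illustration only; the point is how the test values are chosen --- one representative value from each equivalence class, plus the values just below, at, and just above each boundary (a-1, a, a+1, b-1, b, b+1).

    # Hypothetical unit under test, invented for this sketch: it accepts an
    # integer score in the (assumed) valid range 0..100 and classifies it,
    # rejecting anything outside that range.
    def classify_score(score):
        if score < 0 or score > 100:
            raise ValueError("score out of range")
        return "pass" if score >= 50 else "fail"

    # Equivalence Partitioning: one representative value from each class ---
    # below the range, inside the range, above the range.
    equivalence_cases = [
        (-20, ValueError),    # invalid class: below the minimum
        (75, "pass"),         # valid class: a typical in-range value
        (250, ValueError),    # invalid class: above the maximum
    ]

    # Boundary Value Analysis: values just below, at, and just above each
    # boundary (here a = 0 and b = 100): a-1, a, a+1, b-1, b, b+1.
    boundary_cases = [
        (-1, ValueError), (0, "fail"), (1, "fail"),
        (99, "pass"), (100, "pass"), (101, ValueError),
    ]

    def run_cases(cases):
        for value, expected in cases:
            try:
                actual = classify_score(value)
            except ValueError:
                actual = ValueError
            status = "ok" if actual == expected else "FAILED"
            print(f"score={value:4}: expected {expected}, got {actual} [{status}]")

    run_cases(equivalence_cases + boundary_cases)

Note that the two guidelines overlap: in practice the boundary values simply extend the representative values already chosen for each class.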
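As a small numeric illustration of the Error Seeding estimate (the counts below are made up for the sketch):

    # Hypothetical Error Seeding bookkeeping; the counts are invented
    # purely for illustration.
    seeded_errors   = 20    # errors deliberately introduced into the program
    seeded_detected = 16    # how many of those the test set exposed

    # The proportion of seeded errors that the tests caught ...
    detection_rate = seeded_detected / seeded_errors    # 16/20 = 0.80

    # ... might be taken as an estimate of the proportion of the *unseeded*
    # ("real") errors in the program that the same tests will catch.
    print(f"estimated proportion of real errors caught: {detection_rate:.0%}")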
Mutation Testing: In this case, instead of creating one new version of the program, *many* new versions --- "mutants" --- are created. Each is produced by seeding *exactly one* of the chosen errors into the program, so there are as many mutants as there are chosen errors, and each mutant is extremely similar to the original program. The entire set of tests is then run on the original program and on each of the mutants. A mutant "dies" if at least one of the tests produces different (incorrect!) output when run on the mutant than it does when run on the original, and a mutant "survives" otherwise. Ideally, all of the mutants "die," and the proportion of mutants that die could be taken as an estimate of the proportion of "real" errors in the code that these tests will catch.
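Finally, a minimal sketch of the Mutation Testing bookkeeping, again in Python. The function discount and its three hand-written mutants are invented for illustration; a real mutation-testing tool would generate the mutants automatically, but the counting of "killed" mutants is the same.

    # Original (assumed-correct) unit under test, invented for this sketch:
    # a 10% discount applies to orders of 100 or more.
    def discount(total):
        return 0.10 * total if total >= 100 else 0.0

    # Mutants: each is the original program with exactly one seeded error.
    def mutant_1(total):
        return 0.10 * total if total > 100 else 0.0     # ">=" changed to ">"

    def mutant_2(total):
        return 0.15 * total if total >= 100 else 0.0    # wrong discount rate

    def mutant_3(total):
        return 0.10 * total if total >= 100 else total  # wrong result below 100

    mutants = [mutant_1, mutant_2, mutant_3]

    # The set of tests being assessed --- here just a list of inputs; the
    # "expected" output for each input is whatever the original program produces.
    test_inputs = [50, 150]

    def dies(mutant):
        # A mutant "dies" if at least one test produces output that differs
        # from the original program's output; otherwise it "survives".
        return any(mutant(x) != discount(x) for x in test_inputs)

    killed = sum(1 for m in mutants if dies(m))
    print(f"{killed} of {len(mutants)} mutants killed")
    print(f"estimated proportion of real errors caught: {killed / len(mutants):.0%}")

In this sketch mutant_1 survives because neither test input exercises the boundary at exactly 100; a surviving mutant like this one points directly at a missing (boundary value) test.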