Department of Computer Science
 

Bertrand Meyer, Ilinca Ciupa, Andreas Leitner, Manuel Oriol: When programs test themselves


The dirty secret of software development

In practical software development, testing remains the most important part of quality assurance. A survey by the US National Institute of Standards and Technology [2] assesses the cost of inadequate software testing for 2002 at $59.5 billion, or 0.6% of the US GNP. Even without such statistics, programmers know that testing can be a tedious and even painful activity. One of the principal obstacles is the large amount of manual work, dominated by two time-consuming tasks: preparing test inputs (the set of values chosen to exercise the program) and test oracles (the criteria for the success of a test run). Recent years have seen the advent of some automation with testing frameworks such as JUnit [1], but what they automate is only test management and execution, leaving the two key issues unsolved.

We have developed a testing framework, AutoTest (freely available in source and binary form from http://se.inf.ethz.ch/download/auto_test), which takes programs exactly as they are and tests them without the need for writing any test cases or test oracles. The only parameters AutoTest requires are:

When you come back after the specified time, AutoTest will have produced a test report, often including significant bugs that it found entirely automatically.

Perhaps this sounds like magic, but the key is a simple idea: programs contain enough information about their own correctness properties to enable testing without any supplementary information. The details appear below, but let us first see a typical interaction with the framework.

A session with AutoTest

Provided with the given information, AutoTest will, in the prescribed time, test the given classes and their features, using various heuristics to maximize testing effectiveness. The purpose of testing is to trigger failures, which reflect errors, more commonly known as bugs; AutoTest logs all encountered failures. At the end of the testing time, AutoTest displays the results in a Web browser (see Figure 1).

The report displays the names of the offending features (operations) and their classes. Clicking the name of such a feature displays, on the right, the details of the applicable failures. For each failed test, this includes a witness: a test scenario, automatically generated by AutoTest, which triggered the failure.

Failure witnesses appear in a minimized form. What this means is that although AutoTest may have uncovered a particular bug through a long (by human standards) and tortuous route, that route may have included many instructions that have no effect on the result, and the minimization process will remove them; the resulting witness is generally of the minimal length necessary to reproduce the bug. Minimization is important for two reasons:

The technique used for witness minimization [5] is a variant of the program analysis technique known as slicing.
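As a rough illustration of what minimization achieves, the sketch below shrinks a failing call sequence by repeatedly dropping single steps that are not needed to reproduce the failure. This greedy one-at-a-time removal is a simplified stand-in, not the slicing-based technique mentioned above; the names `minimize` and `remove_after_two_adds_fails`, and the toy container scenario, are hypothetical.

```python
from typing import Callable, List, Sequence


def minimize(steps: Sequence[str], still_fails: Callable[[List[str]], bool]) -> List[str]:
    """Greedily drop steps that are not needed to reproduce the failure."""
    witness = list(steps)
    changed = True
    while changed:
        changed = False
        for i in range(len(witness)):
            candidate = witness[:i] + witness[i + 1:]
            if still_fails(candidate):
                witness = candidate
                changed = True
                break  # restart the scan on the shorter witness
    return witness


def remove_after_two_adds_fails(steps: List[str]) -> bool:
    """Toy failure (hypothetical bug): 'remove' fails once two items are stored."""
    stored = 0
    for step in steps:
        if step == "add":
            stored += 1
        elif step == "remove":
            if stored >= 2:
                return True
            stored = max(stored - 1, 0)
    return False


# A long witness containing queries that have no effect on the failure.
long_witness = ["add", "count", "add", "is_empty", "count", "remove", "count"]
short_witness = minimize(long_witness, remove_after_two_adds_fails)
print(short_witness)  # ['add', 'add', 'remove']
```

The shortened witness still triggers the failure, and removing any one of its remaining steps makes the failure disappear.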

To exercise classes and their features, AutoTest uses various strategies for generating the needed values; contrary to intuition, carefully devised strategies based in part on random argument generation often beat approaches that at first may appear smarter. AutoTest uses a mix of strategies and the research effort continues to explore new ones.
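To make the idea of random argument generation concrete, here is a minimal sketch of one possible strategy: values are either reused from a pool of earlier values or freshly generated, with boundary values tried more often than pure chance would allow. The pool, the probabilities, the list of special values and all function names are assumptions made for illustration; the article does not describe AutoTest's actual heuristics at this level of detail.

```python
import random
from typing import List

# Boundary values worth trying often (an assumed heuristic, not AutoTest's actual list).
SPECIAL_INTEGERS = [0, 1, -1, 2**31 - 1, -(2**31)]


def next_integer_argument(rng: random.Random, pool: List[int]) -> int:
    """Return an argument, either reused from the pool or freshly generated."""
    if pool and rng.random() < 0.5:
        return rng.choice(pool)  # reuse a value produced earlier
    if rng.random() < 0.25:
        value = rng.choice(SPECIAL_INTEGERS)
    else:
        value = rng.randint(-1000, 1000)
    pool.append(value)
    return value


def generate_arguments(seed: int, count: int) -> List[int]:
    """Generate `count` arguments reproducibly from a seed."""
    rng = random.Random(seed)
    pool: List[int] = []
    return [next_integer_argument(rng, pool) for _ in range(count)]


first_run = generate_arguments(seed=42, count=10)
second_run = generate_arguments(seed=42, count=10)
print(first_run == second_run)  # True: same seed, same sequence
```

Seeding the generator explicitly keeps a random test session reproducible, which matters when a failure has to be replayed.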

Along with fully automatic test generation, AutoTest is set up to integrate manual tests if available. The two approaches are complementary, one providing breadth, the other depth: automatic tests are good at exercising components much more extensively than a human tester would ever accomplish; manual tests can take advantage of domain knowledge to test specific scenarios that an automatic mechanism would not have the time to reach. Manual tests are seamlessly integrated into the testing process: AutoTest automatically detects the presence of manual tests applying to a class that it is instructed to test and runs these tests before generating any others for that class.

Contracts as oracles

How can AutoTest decide whether a test succeeded? In other words, can programs really test themselves without a human (in practice, many humans) intervening to provide test oracles and examine the outcome of every test?

The answer lies in the contracts already present in Eiffel code. Contracts state what conditions the software must meet at certain points of the execution and they can be evaluated at runtime.  They include:

The Design by Contract approach [6] [7] does not require a fully formal specification; contracts can be partial.

Because contracts use valid expressions of the programming language, they can be evaluated during execution, allowing AutoTest to use them as oracles. The general observation is that a run-time contract violation always signals a bug. The bug can be either in the implementation or in the contract itself; in the latter case the contract is incompatible with the programmer’s intuitive understanding of the software’s purpose, so something is wrong anyway, since it has not been possible to express that purpose properly. For the usual case of an implementation bug, the rule is more specific:

So if we consider the execution of AutoTest as a game aimed at finding as many bugs as possible:

  1. A postcondition or invariant violation is a win for AutoTest: it has uncovered a possible bug in the routine being called.
  2. A precondition violation for a routine called directly by AutoTest is a loss: the object and argument generation strategy has failed to produce a legitimate call. The call will be aborted; AutoTest has wasted time.
  3. If, however, a routine r legitimately called by AutoTest, directly or indirectly, attempts to call another routine with its precondition violated, this is evidence of a problem in r, not in AutoTest: we are back to a win as in case 1.
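The three cases above can be sketched as a small classification function. This is an illustrative model written in Python rather than Eiffel, and the type names, the `classify` function and the `call_depth` parameter are hypothetical; it only encodes the decision rule, not how AutoTest detects violations.

```python
from enum import Enum


class Violation(Enum):
    PRECONDITION = "precondition"
    POSTCONDITION = "postcondition"
    INVARIANT = "invariant"


class Verdict(Enum):
    BUG_FOUND = "bug found"        # a win for the testing tool
    INVALID_TEST = "invalid test"  # a loss: the generated call was not legitimate


def classify(violation: Violation, call_depth: int) -> Verdict:
    """Classify a contract violation.

    call_depth is 0 when the routine whose contract was violated was called
    directly by the testing tool, and greater than 0 when it was called
    from within a routine under test.
    """
    if violation is Violation.PRECONDITION and call_depth == 0:
        return Verdict.INVALID_TEST  # case 2: the tool produced an illegitimate call
    # Case 1 (postcondition or invariant) and case 3 (nested precondition).
    return Verdict.BUG_FOUND


print(classify(Violation.POSTCONDITION, 0).value)  # bug found
print(classify(Violation.PRECONDITION, 0).value)   # invalid test
print(classify(Violation.PRECONDITION, 1).value)   # bug found
```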

In addition to contract violations, AutoTest records unhandled exceptions and other forms of abnormal program termination.

Most testing tools and frameworks require programmers to instrument their software before it can be tested. Thanks to Eiffel programmers’ practice of including contracts in their software, AutoTest is able to test software as it is. 

Ever since its first experimental versions, AutoTest has regularly uncovered bugs, some serious and some minor, in released software, including production libraries and systems. Some of these bugs had been present for a long time even though the software had undergone extensive testing of the traditional kind.

Some results

The automatic nature of AutoTest makes it possible to perform large-scale experiments, for example to evaluate testing strategies, with minimal human intervention. We have embarked on a series of experiments, taking advantage of all the computing power we can muster, to explore some of the many unknown issues of software testing. Here are some initial results [3].

We have found that as an AutoTest automatic testing session proceeds, the number of bugs f(t) found per time unit (at time t) decreases. Our experiments consistently suggest an inverse linear formula

f(t) = c + b/t

for constants b and c which vary with the strategy and the software under test, while the general law appears to remain the same. Figure 2 shows an example for a particular class tested with the random strategy, with numbers averaged over several seeds of the pseudo-random number generator.

The values in Figure 2 are representative; generally, when testing a class, AutoTest finds most bugs in the first few minutes of CPU time.
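Given measured detection rates, the constants b and c of the law f(t) = c + b/t can be estimated by ordinary least squares, since the law is linear in 1/t. The sketch below shows one way to do this; the function name and the data points are made up for illustration (the data are generated from b = 4 and c = 0.5, not taken from the experiments).

```python
from typing import List, Tuple


def fit_inverse_law(times: List[float], rates: List[float]) -> Tuple[float, float]:
    """Least-squares fit of rates ~ c + b / t; returns (b, c)."""
    xs = [1.0 / t for t in times]  # the law is linear in x = 1/t
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(rates) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, rates))
    b = sxy / sxx
    c = mean_y - b * mean_x
    return b, c


# Synthetic observations (NOT experimental data): t in minutes, rate in bugs per minute.
times = [1.0, 2.0, 4.0, 5.0, 10.0]
rates = [0.5 + 4.0 / t for t in times]

b, c = fit_inverse_law(times, rates)
print(round(b, 6), round(c, 6))  # 4.0 0.5
```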

Uncovering and confirming such laws has great potential significance, since a key practical question in the industrial practice of software development is when to stop testing. If the estimated time to the next bug detection can be reliably predicted, a practical answer is: when the time to the next bug is higher than a preset threshold.
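A minimal sketch of such a stopping rule follows. It assumes that f(t) = c + b/t can be read as an instantaneous detection rate with b > 0, so that the estimated time to the next bug is roughly 1/f(t); this reading, the function name and the constants are assumptions made for illustration.

```python
from typing import Optional


def stopping_time(b: float, c: float, threshold: float) -> Optional[float]:
    """Earliest time t at which 1 / f(t) reaches the threshold, for f(t) = c + b / t.

    Returns None when the rate never drops low enough, that is, when the
    constant term c alone already yields a bug more often than the threshold.
    Assumes b > 0 and threshold > 0.
    """
    target_rate = 1.0 / threshold
    if c >= target_rate:
        return None
    # Solve c + b / t = target_rate for t.
    return b / (target_rate - c)


# Illustrative constants: b = 4 bugs, c = 0.5 bugs per minute.
print(stopping_time(4.0, 0.5, 1.0))  # 8.0: after 8 minutes, over 1 minute per bug
print(stopping_time(4.0, 0.5, 4.0))  # None: the rate never falls below 0.25
```

When the function returns None, the fitted law predicts that bugs keep arriving faster than the threshold allows, so the threshold alone gives no stopping point.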

We have also found that on average contract-based random testing finds more bugs through contract violations than through other causes of exceptions. For some classes, this was true for all timeouts tried (from 1 to 30 minutes per class). For others, non-contract exceptions dominate in the first 2 minutes, then contract violations take over.

Since object and value generation relies on random strategies, it is important to evaluate the effectiveness of these strategies. We have found that it is more effective, given a set testing period, to run more tests with different random generator seeds than to use fewer seeds and run the tests longer. Figure 3 illustrates this phenomenon from a number of example library classes; the vertical axis indicates the percentage of bugs found against all known bugs.

Conclusions

AutoTest takes advantage of the self-documenting nature of contract-equipped programs to automate the most time-consuming parts of the testing process and hence increase the number of possibly damaging faults detected before a program is delivered to its users. Among the already visible consequences of the AutoTest project (on which you can find more information at http://se.ethz.ch/research/autotest, which also provides references to other publications on AutoTest) we may note:

References

  1. JUnit pages at www.junit.org/index.htm.
  2. NIST (National Institute of Standards and Technology): The Economic Impacts of Inadequate Infrastructure for Software Testing, Report 7007.011, available at www.nist.gov/director/prog-ofc/report02-3.pdf.
  3. Ciupa, I., Leitner, A., Oriol, M., Meyer, B.: Experimental Assessment of Random Testing for Object-Oriented Software, in Proceedings of ISSTA'07: International Symposium on Software Testing and Analysis 2007, London, UK, July 2007.
  4. Ciupa, I., Pretschner, A., Leitner, A., Oriol, M., Meyer, B.: On the Predictability of Random Tests for Object-Oriented Software, submitted for publication, 2008.
  5. Leitner, A., Oriol, M., Ciupa, I., Zeller, A., Meyer, B.: Efficient Unit Test Case Minimization, to appear in ASE'07: 22nd IEEE/ACM International Conference on Automated Software Engineering, Atlanta (Georgia), November 2007.
  6. Meyer, B.: Applying “Design by Contract”, in Computer (IEEE), 25, 10, October 1992, pages 40-51.
  7. Meyer, B.: Object-Oriented Software Construction, 2nd Edition, Prentice Hall, 1997.
  8. Meyer, B., Ciupa, I., Leitner, A., Liu, L.: Automatic testing of object-oriented software, Technical Report 538, ETH Zürich, Chair of Software Engineering, November 2006.
 


© 2012 ETH Zürich | Impressum | 15.11.2007