Fast and Resilient Integration Testing - Part 1: Problem Statement and Analysis
In early 2012, running the test suite for the Madek Project took between 1½ and 2 hours. The whole test suite rarely passed because one or more of the many tests failed with false negatives. Today, the now larger test suite runs in about 3 minutes[1] and very rarely fails, even though false negatives still can and do occur.
The Causes for Randomly Failing Tests
The reasons for failing tests in our case include:
- randomly generated test data clashing with prerequisites,
- badly written integration tests, and
- brittle or badly written code in the project itself,
- incompatible changes in the testing library and test programs (e.g. browsers),
- bugs in libraries used for testing,
- resource problems or crashes on the executing machine, and
- communication problems between the services that run the tests.
We can divide the items into two categories. The development team is accountable for the first three. Yet, there are conceivable reasons why they would not be fixed in a timely manner. The remaining problems are generally harder to control, and taking care of them would be costly.
The costs can be so high that investigating alternative ways to deal with the problems is warranted. I will argue that there is an inherent and non-negligible probability of false negatives when testing the integration of a multi-service architecture.
A Simple Probabilistic Model
Let there be $n$ independent tests. Let us simplify and assume that every test has the same probability of resulting in a false negative, and let us denote it by $p_f$. Consequently, the probability of getting the correct successful outcome is $p_s = 1 - p_f$ for each test. If we execute each of those tests once, the probability of getting the correct and successful outcome for all tests is
$$P_s = p_s^n$$
The problem is apparent. $P_s$ is dominated by the exponent, i.e. the number of tests. Even a modest increase in the number of tests quickly cancels out any effort to improve the reliability of individual tests.
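To make the effect of the exponent concrete, here is a minimal Python sketch of the model. The function name and the sample values (100 or 200 tests, a false-negative rate of 0.01 or 0.005) are illustrative assumptions, not measurements from the Madek test suite.

```python
def suite_success_probability(n: int, p_f: float) -> float:
    """Probability that all n independent tests pass: P_s = (1 - p_f)^n."""
    return (1.0 - p_f) ** n


if __name__ == "__main__":
    # Assumed sample values: halve the per-test failure rate, then double the suite.
    for n, p_f in [(100, 0.01), (100, 0.005), (200, 0.005)]:
        p_s = suite_success_probability(n, p_f)
        print(f"n={n:3d}  p_f={p_f:.3f}  P_s={p_s:.3f}")
```

Under these assumed values, halving the per-test failure rate raises the suite's success probability from roughly 0.37 to roughly 0.61, and doubling the number of tests brings it back down to roughly 0.37.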
Mitigating the Problem
One can try to compensate by writing more unit tests in place of integration tests. Although still popular, this approach is of limited value. Forgoing integration tests will become less of an option in the future as microservice architectures grow more popular[2].
I suggest recognizing false negatives as an inherent property of integration testing. Hence, the strategies for running and evaluating tests should account for them. I suggest running every test independently and repeating it up to $k$ times to compensate for false negatives. Then, let $p_s' = 1 - p_f^k$ be the compensated probability of a successful outcome for a single test, and $P_s'$ the corresponding probability for the whole suite:
$$P_s' = (1 - p_f^k)^n$$
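The retry strategy itself can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `run_with_retries` is a hypothetical helper, a test is modeled as a callable returning `True` on success, and false negatives are assumed to be independent between attempts, as in the model above.

```python
from typing import Callable


def run_with_retries(test: Callable[[], bool], k: int) -> bool:
    """Run `test` up to k times and report success on the first passing attempt."""
    if k < 1:
        raise ValueError("k must be at least 1")
    for _ in range(k):
        if test():
            return True  # one pass is enough; stop retrying
    return False  # all k attempts failed


if __name__ == "__main__":
    calls = []

    def flaky_once() -> bool:
        # Hypothetical test that fails on its first attempt only.
        calls.append(1)
        return len(calls) >= 2

    print(run_with_retries(flaky_once, k=3))     # passes on the second attempt
    print(run_with_retries(lambda: False, k=3))  # a genuine failure still fails
```

A test that fails for a genuine, deterministic reason fails on every attempt, so it is still reported as a failure. Only intermittent failures are absorbed.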
Let me illustrate how effective this naive approach is. Let us assume that the probability of failure $p_f$ is 0.01. The following diagram shows the total probability of success for up to 500 tests and $k = 1$ (no retry) in blue, $k = 2$ (one retry) in red and $k = 3$ (two retries) in yellow.
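The end points of the three curves can be recomputed directly from the formula. The following Python sketch evaluates $P_s'$ for $p_f = 0.01$ and $k = 1, 2, 3$ at a few suite sizes; the function name and the chosen suite sizes are assumptions made for illustration.

```python
def compensated_suite_probability(n: int, p_f: float, k: int) -> float:
    """P_s' = (1 - p_f^k)^n: every one of n tests passes within at most k attempts."""
    return (1.0 - p_f ** k) ** n


if __name__ == "__main__":
    p_f = 0.01
    for n in (100, 250, 500):
        row = "  ".join(
            f"k={k}: {compensated_suite_probability(n, p_f, k):.4f}"
            for k in (1, 2, 3)
        )
        print(f"n={n:3d}  {row}")
```

For 500 tests the suite almost never passes without retries (well below 1%), passes about 95% of the time with one retry, and more than 99.9% of the time with two.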
It suffices to choose a $k$ on the order of $\log n$, because $\lim_{n\to\infty}(1 - p_f^{\log n})^n = 1$ when $p_f$ is a small constant (for the natural logarithm, any $p_f < 1/e$). For practical purposes, $k$ is a very small integer such as 3. This is the main result of our discourse.
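To see how small $k$ stays in practice, the following Python sketch searches for the smallest number of attempts that reaches a desired suite-level success probability. The function name, the target values and the `max_k` guard are illustrative assumptions.

```python
def min_attempts(n: int, p_f: float, target: float, max_k: int = 64) -> int:
    """Smallest k with (1 - p_f^k)^n >= target, found by linear search."""
    for k in range(1, max_k + 1):
        if (1.0 - p_f ** k) ** n >= target:
            return k
    raise ValueError("target not reachable within max_k attempts")


if __name__ == "__main__":
    # Assumed example: p_f = 0.01 and a 99% chance that the whole suite passes.
    for n in (500, 1_000_000):
        print(f"n={n}: k={min_attempts(n, 0.01, 0.99)}")
```

With these assumed values, 500 tests need $k = 3$, and even a suite of one million tests needs only $k = 4$, which matches the logarithmic growth stated above.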
The next part of this series will discuss an implementation for jenkins-ci.