Published: 2014-06-22

Tagged: cider-ci, continuous integration, microservices, software architecture, testing

This article discusses problems we face when performing full-stack integration testing. I will focus on the reasons for false negatives and how to avoid them. The proposed approach integrates very well with the strategy of splitting a test suite into several tasks to achieve a faster execution time. This is part one of a three-part series.

Fast and Resilient Integration Testing - Part 1: Problem Statement and Analysis

In early 2012, running the test suite for the Madek Project took between 1½ and 2 hours. The whole test suite rarely passed because one or more of the many tests failed with false negatives. Today, the now larger test suite runs in about 3 minutes¹ and very rarely fails, even though false negatives still can and do occur.

The Causes for Randomly Failing Tests

The reasons for failing tests in our case include:

1. randomly generated test data clashing with prerequisites,
2. badly written integration tests,
3. brittle or badly written code in the project itself,
4. incompatible changes in the testing library and test programs (i.e., browsers),
5. bugs in libraries used for testing,
6. resource problems or crashes on the executing machine, and
7. communication problems between the services that run the tests.

We can divide these items into two categories. The development team is accountable for items 1 to 3, yet there are conceivable reasons why they might not be fixed in a timely manner. The remaining problems are generally harder to control, and taking care of them would be costly.

These costs can be so high that investigating alternative ways of dealing with the problems is warranted. I will argue that there is an inherent, non-negligible probability of false negatives when testing the integration of a multi-service architecture.

A Simple Probabilistic Model

Let there be n independent tests. For simplicity, assume that every test has the same probability of producing a false negative, and denote it by p_f. Consequently, the probability of the correct, successful outcome is p_s = 1 − p_f for each test. If we execute each of those tests once, the probability of the correct, successful outcome for all tests is

P_s = p_sⁿ

The problem is apparent: P_s decays exponentially in the exponent n, i.e. the number of tests. Even a modest increase in the number of tests will quickly annihilate any effort to improve the reliability of the individual tests.
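To make the decay concrete, the following Python sketch evaluates P_s = p_sⁿ = (1 − p_f)ⁿ for a few suite sizes. The function name `suite_success_probability` and the chosen values of p_f and n are illustrative assumptions, not measurements from the Madek suite.

```python
def suite_success_probability(p_f: float, n: int) -> float:
    """P_s = (1 - p_f)**n: probability that n independent tests,
    each with false-negative probability p_f, all pass in one run."""
    return (1.0 - p_f) ** n


if __name__ == "__main__":
    for n in (50, 100, 500):
        # With p_f = 0.01 this prints roughly 0.605, 0.366 and 0.0066.
        print(n, round(suite_success_probability(0.01, n), 4))
    # Halving p_f while doubling n leaves P_s essentially unchanged (about 0.0067).
    print(round(suite_success_probability(0.005, 1000), 4))
```

With p_f = 0.01, a suite of 500 tests passes in fewer than 1 % of runs, and the gain from halving p_f is cancelled as soon as the suite doubles in size.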

Mitigating the Problem

One can try to compensate by favoring unit tests over integration tests. Although still popular, this approach obviously has limited value. Forgoing integration tests will be even less of an option in the future, since microservice architectures are becoming more popular².

I suggest recognizing false negatives as an inherent property of integration testing. Hence, false negatives should be incorporated into the strategies used to run and evaluate tests. I suggest running every test independently, up to k times, stopping at the first success, to compensate for false negatives. Since the attempts are independent, a test is then reported as failing only if all k attempts fail, which happens with probability p_f^k. Let p_sʹ = 1 − p_f^k be the compensated probability of the successful outcome of a single test, and P_sʹ the one for the whole suite accordingly:

P_sʹ = (1 − p_f^k)ⁿ
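A minimal sketch of this retry policy, assuming a test is a Python callable that returns `True` on success and that repeated attempts are independent. `run_with_retries` and `make_scripted_test` are hypothetical helpers for illustration, not part of Cider-CI, Jenkins, or any test framework.

```python
from typing import Callable, Iterable, Tuple


def run_with_retries(test: Callable[[], bool], k: int) -> Tuple[bool, int]:
    """Run `test` at most k times, stopping at the first success.

    Returns (passed, attempts). The test is reported as failing only if
    all k attempts fail; with independent attempts and false-negative
    probability p_f, that happens with probability p_f**k.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    for attempt in range(1, k + 1):
        if test():
            return True, attempt
    return False, k


def make_scripted_test(outcomes: Iterable[bool]) -> Callable[[], bool]:
    """Hypothetical test double that replays a fixed sequence of outcomes,
    so the example is deterministic."""
    it = iter(outcomes)
    return lambda: next(it)


if __name__ == "__main__":
    # A flaky test: fails once, then passes on the retry.
    print(run_with_retries(make_scripted_test([False, True]), k=3))
    # A genuinely broken test: fails all three attempts.
    print(run_with_retries(make_scripted_test([False, False, False]), k=3))
```

A genuine failure still surfaces, at the price of k runs of that one test; recording the attempt count also keeps flaky tests visible instead of silently hiding them.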

Let me illustrate how effective this naive approach is. Assume that the probability of failure p_f is 0.01. The following diagram shows the total probability of success for up to 500 tests, with k = 1 (no retry) in blue, k = 2 (one retry) in red, and k = 3 (two retries) in yellow.

[Figure: Compensated Probability of Failure]
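The endpoints of the three curves at n = 500 can be reproduced directly from P_sʹ = (1 − p_f^k)ⁿ. The sketch below is an illustrative Python evaluation; `compensated_suite_success` is a hypothetical name, and the result is computed through `math.log1p` so that very small values of p_f^k are not lost to rounding.

```python
import math


def compensated_suite_success(p_f: float, n: int, k: int) -> float:
    """P_s' = (1 - p_f**k)**n for n tests, each run at most k times."""
    # exp(n * log(1 - x)) with log1p avoids cancellation when x = p_f**k is tiny.
    return math.exp(n * math.log1p(-(p_f ** k)))


if __name__ == "__main__":
    for k in (1, 2, 3):
        # p_f = 0.01, n = 500: roughly 0.0066, 0.9512 and 0.9995
        print(k, round(compensated_suite_success(0.01, 500, k), 4))
```

A single retry thus lifts the chance of a green run for 500 tests from under 1 % to about 95 %, and a second retry to about 99.95 %.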

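It suffices to choose k on the order of log n, because $\lim_{n\to\infty}\left(1 - p_f^{\log n}\right)^n = 1$ when p_f is a small constant. To see why, note that $p_f^{\log n} = n^{\log p_f}$, and by Bernoulli's inequality $\left(1 - n^{\log p_f}\right)^n \ge 1 - n^{1 + \log p_f}$, which tends to 1 whenever $\log p_f < -1$. For practical purposes, k is a very small integer, such as 3. This is the main result of our discourse.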
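The logarithmic growth of k can be checked numerically. The following sketch searches for the smallest k with P_sʹ ≥ a target probability; `min_retries`, the target of 0.99, and the iteration cap `k_max` are assumptions made for illustration.

```python
import math


def min_retries(p_f: float, n: int, target: float, k_max: int = 100) -> int:
    """Smallest k with (1 - p_f**k)**n >= target.

    Hypothetical helper; assumes 0 < p_f < 1, n >= 1 and 0 < target < 1.
    """
    if not (0.0 < p_f < 1.0 and n >= 1 and 0.0 < target < 1.0):
        raise ValueError("need 0 < p_f < 1, n >= 1 and 0 < target < 1")
    log_target = math.log(target)
    for k in range(1, k_max + 1):
        # Compare in log space: log P_s' = n * log(1 - p_f**k).
        if n * math.log1p(-(p_f ** k)) >= log_target:
            return k
    raise ValueError("no k <= k_max reaches the target")


if __name__ == "__main__":
    for n in (100, 500, 10_000, 1_000_000):
        print(n, min_retries(0.01, n, 0.99))
```

With p_f = 0.01 and a target of 0.99, the required k is 2 for 100 tests, 3 for 500 or 10 000 tests, and 4 for a million tests: it grows by one roughly each time n grows by a factor of 1/p_f = 100, which is the k = O(log n) behavior described above.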

The next part of this series will discuss an implementation for jenkins-ci.

¹ The total execution time across all tasks is still between 1½ and 2 hours.
² See the very recent article "Microservices" by James Lewis and Martin Fowler and the references given therein.