This week I was asked whether there are any statistics out there showing

- the number of defects (on average) produced during a software development project or

- the number of defects produced versus the number of lines of code per programming language or technological stack (C/C++, Java, .NET or PHP for example)?

I answered that I haven't seen anything publicly available. Publishing such statistics could affect reputation. However, internally, any company should collect these statistics for risk management purposes.

Still, we can make some estimates. For example, let's assume the following oversimplified model:

**1.** A project consists of one or a few iterations/sprints.

**2.** Ideally, code from each iteration/sprint is deployed with 0 defects. As a result, we count only the defects that were fixed during the active iteration. We also assume that what was deployed in the previous iterations brings no defects into the current one.

**3.** Most of the trivial defects are spotted during the build phase (long before the test team gets engaged). As a result, we count the defects spotted during unit test execution. For now we ignore the defects spotted by the test team, as this complicates the model :)

**4.** The unit tests are free of defects :)

Going further:

**5.** The $X$-th iteration delivers $N$ new units.

**6.** Each unit must have at least $2$ Unit Tests, for Expected Pass and Expected Failure cases.

**7.** As a result the $X$-th iteration has $2\cdot N$ Unit Tests.

**8.** The probability that a single Unit Test fails is $\frac{1}{2}$ (Unexpected Pass or Unexpected Failure).

**9.** The iteration can have from $0$ to $2\cdot N$ defects as a result. The probability that the number of defects is $m$ ($0\leq m\leq 2\cdot N$) is $$P(m)=\binom{2\cdot N}{m}\frac{1}{2^{2\cdot N}}$$

**10.** The mean value, i.e. the average number of defects, is $2\cdot N \cdot \frac{1}{2}=N$.

So, $N$ units with $N$ defects or roughly $1$ defect per unit.
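To make steps 5 to 10 concrete, here is a minimal Python sketch that builds the distribution $P(m)$ and computes its mean directly from the definition. The function names `defect_pmf` and `mean_defects` and the choice of 10 units are illustrative assumptions, not part of the model itself.

```python
from math import comb


def defect_pmf(n_units: int, p_fail: float = 0.5) -> list[float]:
    """Return P(m) for m = 0..2*n_units failing unit tests (binomial model)."""
    n_tests = 2 * n_units  # two unit tests per unit, as in step 7
    return [
        comb(n_tests, m) * p_fail**m * (1 - p_fail) ** (n_tests - m)
        for m in range(n_tests + 1)
    ]


def mean_defects(pmf: list[float]) -> float:
    """Average number of defects: the sum of m * P(m) over all m."""
    return sum(m * prob for m, prob in enumerate(pmf))


pmf = defect_pmf(n_units=10)  # 10 units is an arbitrary example value
print(sum(pmf))               # the probabilities add up to 1
print(mean_defects(pmf))      # equals n_units when p_fail is 1/2
```

With `p_fail` left at its default of $\frac{1}{2}$, the computed mean matches the number of units, which is the "roughly 1 defect per unit" result above.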

A few words about the maths used. It is the binomial distribution with $p=\frac{1}{2}$, and the mean value is $E(X)=\sum_{m=0}^{n} m\cdot P(m)=n\cdot p$, where $n=2\cdot N$ (here $X$ denotes the number of defects, not the iteration index).

This formula also tells us that if we reduce the probability for a Unit Test to fail ($p<\frac{1}{2}$), then we will also reduce the number of defects. Sounds logical, doesn't it?

I will also provide a quick proof for the mean value because it is indeed a very elegant piece of mathematics:
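The effect of lowering $p$ can also be checked empirically. The following is a rough Monte Carlo sketch under the same assumptions as the model (independent unit tests, each failing with the same probability); the function name `simulate_mean_defects`, the trial count and the seed are arbitrary choices made for illustration.

```python
import random


def simulate_mean_defects(
    n_units: int, p_fail: float, trials: int = 20_000, seed: int = 0
) -> float:
    """Estimate the average number of failing unit tests per iteration."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    n_tests = 2 * n_units      # two unit tests per unit
    total = 0
    for _ in range(trials):
        # Each unit test fails independently with probability p_fail.
        total += sum(rng.random() < p_fail for _ in range(n_tests))
    return total / trials


for p in (0.5, 0.25, 0.1):
    estimate = simulate_mean_defects(n_units=10, p_fail=p)
    print(f"p={p}: simulated mean {estimate:.2f}, formula {2 * 10 * p:.2f}")
```

The simulated averages should land close to $2\cdot N\cdot p$, and they shrink as $p$ shrinks.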

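$$\begin{aligned}
E(X) &= \sum_{m=0}^{n} m\cdot P(m) = P(1)+2\cdot P(2)+\dots+n\cdot P(n) \\
&= \binom{n}{1}\cdot p\cdot(1-p)^{n-1}+2\cdot\binom{n}{2}\cdot p^{2}\cdot(1-p)^{n-2}+\dots+n\cdot\binom{n}{n}\cdot p^{n} \\
&= p\cdot\left[n\cdot(1-p)^{n-1}+2\cdot\binom{n}{2}\cdot p\cdot(1-p)^{n-2}+\dots+n\cdot p^{n-1}\right] \\
&= p\cdot n\cdot\left[(1-p)^{n-1}+\binom{n-1}{1}\cdot p\cdot(1-p)^{n-2}+\dots+p^{n-1}\right] \\
&= p\cdot n\cdot(1-p+p)^{n-1} = n\cdot p
\end{aligned}$$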
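The step that pulls $n$ out of the bracket relies on the identity $m\cdot\binom{n}{m}=n\cdot\binom{n-1}{m-1}$, after which the bracket is the binomial expansion of $(1-p+p)^{n-1}$. As a sanity check rather than a proof, this small Python sketch verifies the identity and the final result numerically; the helper name `binomial_mean_by_sum` and the tested values of $n$ and $p$ are arbitrary.

```python
from math import comb, isclose


def binomial_mean_by_sum(n: int, p: float) -> float:
    """Compute E(X) term by term, exactly as in the first line of the proof."""
    return sum(m * comb(n, m) * p**m * (1 - p) ** (n - m) for m in range(n + 1))


n = 20  # corresponds to N = 10 units with two unit tests each

# The identity used to factor n out of the bracket.
assert all(m * comb(n, m) == n * comb(n - 1, m - 1) for m in range(1, n + 1))

# The explicit sum agrees with n * p for several failure probabilities.
for p in (0.1, 0.25, 0.5):
    assert isclose(binomial_mean_by_sum(n, p), n * p)

print("identity and mean check passed")
```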