Sunday, October 30, 2011

Boys vs. Girls

Here is another interesting problem I was trying to address a while ago:

In a country in which people only want boys, every family continues to have children until they have a boy. If they have a girl, they have another child. If they have a boy, they stop. What is the expected proportion of boys to girls in the country?

Apparently, this problem was originally posted by Google, as this post suggests. Here is a link to another stream, trying to tackle the problem. I will try to provide a solution that (in my opinion, of course) looks easier to comprehend.

At any given moment of time, the total number of couples (with children) can be divided into two categories CTotal = C1boy + Cno boys, where C1boy - number of couples with 1 boy (this is actually the limit, as it is stated in the problem) and Cno boys - number of couples having only girls (and thus, still trying to have a boy). We don't count the couples with no children as they don't add any value to the calculations below.

The number of boys in this case (or at any given moment of time) is Nb= C1boy.

The number of girls is Ng=N1⋅C1boy + N2⋅Cno boys, where N1 - average number of girls in a family with 1 boy and N2 - average number of girls in a family with no boys. Let's find these averages.

The probability for a family with one boy to have 1 girl is P(1 girl & 1 boy) = (1 ⁄ 2)⋅(1 ⁄ 2)
The probability for a family with one boy to have 2 girls is P(2 girls & 1 boy) = (1 ⁄ 2)⋅(1 ⁄ 2)⋅(1 ⁄ 2)
...
The probability for a family with one boy to have n girls is P(n girls & 1 boy) = 1 ⁄ 2n+1

So, the average number of girls in a family with 1 boy is (find the formula for this series here) N1=E(X) = ∑m⋅P(m) = ∑m ⁄ 2m+1 = (1 ⁄ 2)⋅∑m ⁄ 2m= (1 ⁄ 2) ⋅ (1 ⁄ 2) ⁄ (1 - 1 ⁄ 2)2 = 1.

Now...

The probability for a family with no boys to have 1 girl is P(1 girl) = 1 ⁄ 2
The probability for a family with no boys to have 2 girls is P(2 girls) = (1 ⁄ 2)⋅(1 ⁄ 2)
...
The probability for a family with no boys to have n girls is P(n girls) = 1 ⁄ 2n

The average number of girls in a family with no boys is N2=E(Y) = ∑k⋅P(k) = ∑k ⁄ 2k = 2.

As a result Nb ⁄ Ng= C1boy ⁄ (C1boy + 2⋅Cno boys). This expression is equal to 1 only when Cno boys=0, i.e. when all the families reach the target. However, if Cno boys= C1boy then Nb ⁄ Ng=1 ⁄ 3.

Sunday, October 23, 2011

Not so many irreducible fractions

I accidently found few papers in my parents' house, dating since I was 15 (I have no idea how those papers survived almost 19 years, but I am glad I found them). Those days, I was involved in Mathematical Olympiads, at school and national level. As you can imagine, those papers contain problems :)

This post is dedicated to one particular problem, that is:

Prove that any arbitrary segment of size 1⁄n (n >1, n∈N) on ℜ contains no more than (n+1)⁄2 irreducible fractions of type p⁄q (p,q∈Z), where 1≤q≤n.

What I remember, so far, is that this problem was part of the Liouville numbers lesson. Unfortunately, I can't reproduce the lesson (I can't remember it :(), but ... below is an alternative proof, containing the sort of analysis I like to define as "Don't be lazy to unfold the details".

Let's pick up an arbitrary number α∈ℜ so that it defines the [α, α+1⁄n] segment. For any rational number (irreducible fraction) p⁄q (p,q∈Z, 1≤q≤n) on this segment we have:
α≤p⁄q≤α+1⁄n or
(*)

Now:

1. Let's consider q=1, then from (*) we have α≤p1≤α+1⁄n. There can be maximum one such integer number p1. Why? Let's suppose there exists another one t1≠p1 so that
α≤t1≤α+1⁄n
But this means 1≤|t1-p1|≤1⁄n<1, which is impossible (absolute difference between 2 different integers is always ≥1).

2. Let's consider q=2, then from (*) we have 2⋅α≤p2≤2⋅α+2⁄n. There can be maximum one such integer number p2 (proof is identical to the case above except 1≤|t2-p2|≤2⁄n).

...

N-1. Let's consider q=n-1, then from (*) we have (n-1)⋅α≤pn-1≤(n-1)⋅α+(n-1)⁄n. There can be maximum one such integer number pn-1 (proof is identical to the case(s) above except 1≤|tn-1-pn-1|≤(n-1)⁄n).

N. Let's consider q=n, then from (*) we have n⋅α≤p≤n⋅α+1. There can be maximum two (!!!) such integer numbers pn and pn+1. Why? Imagine that n⋅α is an integer, then n⋅α+1 is another different integer. If n⋅α isn't an integer, then there will be maximum one integer satisfying n⋅α≤pn≤n⋅α+1 (e.g. if we suppose there will be 2 integers n⋅α≤pn<t≤n⋅α+1, then 1≤t–pn≤1t=pn+1. Also n⋅α+1 ≤pn+1=t≤n⋅α+1pn+1= n⋅α+1pn= n⋅α so n⋅α is an integer).

At this step we know that there could be maximum (n+1) integers pi, satisfying (*) where 1≤q≤n.

Next, let's consider all the even q between 1 and n-1, e.g. q=2⋅s, then we have from (*)

2⋅s⋅α≤p2⋅s≤2⋅s⋅α+2⋅s⁄n and
s⋅α≤ps≤s⋅α+s⁄n which is ⇔ (multiplying by 2)
2⋅s⋅α≤2⋅ps≤2⋅s⋅α+2⋅s⁄n
from which we can see that:
|p2⋅s-2⋅ps|≤2⋅s⁄n=q⁄n<1p2⋅s=2⋅ps or
pq⁄q=p2⋅s⁄(2⋅s)=ps⁄s or
pq⁄q is reducible.

We can state now that for any even q, pq⁄q is reducible, but there are (n-1)⁄2 even numbers between 1 and n-1. As a result we have maximum (n-1)⁄2 irreducible fractions satisfying:
α≤p⁄q≤α+1⁄n, where 1≤q≤n-1

What about the q=n case?

As we stated in the case N above, there can be maximum two integers (we noted then pn and pn+1) satisfying
n⋅α≤p≤n⋅α+1
if, and only if n⋅α is an integer.

If we consider pn<pn+1 then
n⋅α=pn
n⋅α+1=pn+1=pn+1

From the case 1 above n⋅α≤n⋅p1≤n⋅α+1 or pn≤n⋅p1≤pn+1 which means
pn=n⋅p1 or
pn+1=pn+1=n⋅p1

As a result, either pn⁄n or pn+1⁄n is further reducible, considering p1 exists. So we can have maximum one irreducible fraction. This means total maximum is 1+(n-1)⁄2=(n+1)⁄2 now.

If p1 doesn't exists (we stated there could be maximum 1), then both pn⁄n and pn+1⁄n can be irreducible, but the q=1 case (case 1) will have to be removed from the previously analysed options so 2-1+(n-1)⁄2=(n+1)⁄2.

One can observe that from the case 1, we can apply this technique (of multiplying) and deduce that pi=i⋅p1, 1≤i≤n-1. Yes, this is true, if we admit there exists p1 at all (this part is very subtle). But even in this particular case, the total is 2 irreducible fractions that is less than (n+1)⁄2 anyway.

Sunday, October 16, 2011

How many defects are there?

This week I was asked if there are any statistics there showing
- the number of defects (on average) produced during a software elaboration project or
- the number of defects produced versus the number of lines of code per programming language or technological stack (C/C++, Java, .NET or PHP for example)?

I answered that I haven't seen anything publically available. Any company, publishing such statistics, could damage its reputation. However, internally, any company should collect these statistics for the risk management purpose.

Still, we can apply maths to do some estimation, can't we? For example, let's assume the following:

1. A project consists of one or few iterations.

2. Ideally, code from each iteration is deployed with 0 defects. As a result, we consider what was fixed during the iteration(s). We also assume that what was deployed with the previous iterations is free of defects in the current one.

3. Most of the trivial defects are spotted during the compilation or build process (far before the test team gets engaged) as a result, we count the defects spotted during the unit tests execution. For now we ignore the defects spotted by the test team as this complicates the model :)

4. The unit tests are free of defects :)

Going further:

5. The X-th iteration delivers N new units.

6. Each unit must have at least 2 Unit Tests, for Expected Pass and Expected Failure cases.

7. As a result the X-th iteration has 2⋅N Unit Tests.

8. The probability of a single Unit Test to fail is 1⁄2 (Unexpected Pass or Unexpected Failure).

9. The iteration can have from 0 to 2⋅N defects as a result. The probability that the number of defects is m (0≤m≤2⋅N) is

10. The mean value or the average number of the defects is 2⋅N⋅1⁄2 = N.

So, N units with N defects or roughly 1 defect per unit.

Few words about the maths used. It is the binomial distribution where p=1⁄2 and the mean value is E(X) =∑m⋅P(m) = n⋅p, where n=2⋅N.

This formula also tells us that if we reduce the probability for a Unit Test to fail (p<1⁄2), then we will also reduce the number of defects. Sounds logic, doesn't?

I will also provide a quick proof for the mean value because it is indeed a very elegant piece of mathematics, so

E(X) =∑m⋅P(m) = P(1)+2⋅P(2)+...+n⋅P(n)=
=C1n⋅p⋅(1-p)n-1+2⋅C2n⋅p2⋅(1-p)n-2+...+n⋅Cnn⋅pn=
=p⋅[n⋅(1-p)n-1+2⋅C2n⋅p⋅(1-p)n-2+...+n⋅pn-1]=
=p⋅n⋅[(1-p)n-1+C1n-1⋅p⋅(1-p)n-2+...+pn-1]=
=p⋅n⋅(1-p+p)n-1=n⋅p