rtybase: October 2011

Sunday, October 30, 2011

Boys vs. Girls

Here is another interesting problem I was trying to address a while ago:

In a country in which people only want boys, every family continues to have children until they have a boy. If they have a girl, they have another child. If they have a boy, they stop. What is the expected proportion of boys to girls in the country?

Apparently, this problem was originally posted by Google, as this post suggests. Here is a link to another stream, trying to tackle the problem. I will try to provide a solution that (in my opinion, of course) looks easier to comprehend.

At any given moment of time, the couples with children fall into two categories: C_Total = C_1boy + C_{no boys}, where C_1boy is the number of couples with 1 boy (this is the limit stated in the problem) and C_{no boys} is the number of couples having only girls (and thus still trying to have a boy). We don't count the couples with no children, as they add nothing to the calculations below.

The number of boys in this case (or at any given moment of time) is N_b= C_1boy.

The number of girls is N_g=N₁⋅C_1boy + N₂⋅C_{no boys}, where N₁ is the average number of girls in a family with 1 boy and N₂ is the average number of girls in a family with no boys. Let's find these averages.

The probability for a family with one boy to have 1 girl is P(1 girl & 1 boy) = (1 ⁄ 2)⋅(1 ⁄ 2)
The probability for a family with one boy to have 2 girls is P(2 girls & 1 boy) = (1 ⁄ 2)⋅(1 ⁄ 2)⋅(1 ⁄ 2)
...
The probability for a family with one boy to have n girls is P(n girls & 1 boy) = 1 ⁄ 2ⁿ⁺¹

So, the average number of girls in a family with 1 boy is (find the formula for this series here; it is ∑m⋅xᵐ = x ⁄ (1 - x)² for |x| < 1, taken at x = 1 ⁄ 2) N₁ = E(X) = ∑m⋅P(m) = ∑m ⁄ 2ᵐ⁺¹ = (1 ⁄ 2)⋅∑m ⁄ 2ᵐ = (1 ⁄ 2)⋅(1 ⁄ 2) ⁄ (1 - 1 ⁄ 2)² = 1.

Now...

The probability for a family with no boys to have 1 girl is P(1 girl) = 1 ⁄ 2
The probability for a family with no boys to have 2 girls is P(2 girls) = (1 ⁄ 2)⋅(1 ⁄ 2)
...
The probability for a family with no boys to have n girls is P(n girls) = 1 ⁄ 2ⁿ

The average number of girls in a family with no boys is N₂ = E(Y) = ∑k⋅P(k) = ∑k ⁄ 2ᵏ = 2.
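As a quick sanity check of the two averages, here is a minimal Python sketch that sums both series exactly, truncated after a finite number of terms. The function names and the cutoff `terms` are illustrative assumptions, not part of the original argument.

```python
from fractions import Fraction

def avg_girls_one_boy(terms=60):
    # Partial sum of m / 2^(m+1): m girls followed by one boy.
    return sum(Fraction(m, 2 ** (m + 1)) for m in range(terms + 1))

def avg_girls_no_boys(terms=60):
    # Partial sum of k / 2^k: the weights used above for families with only girls.
    return sum(Fraction(k, 2 ** k) for k in range(1, terms + 1))

n1 = avg_girls_one_boy()
n2 = avg_girls_no_boys()
print(float(n1), float(n2))
```

The partial sums approach 1 and 2 respectively as `terms` grows, which agrees with the closed-form values.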

As a result N_b ⁄ N_g= C_1boy ⁄ (C_1boy + 2⋅C_{no boys}). This expression is equal to 1 only when C_{no boys}=0, i.e. when all the families reach the target. However, if C_{no boys}= C_1boy then N_b ⁄ N_g=1 ⁄ 3.
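The final ratio can be sketched as a small Python function. The name `boys_to_girls` and the sample couple counts are assumptions chosen only for illustration; the defaults N₁ = 1 and N₂ = 2 are the averages derived above.

```python
from fractions import Fraction

def boys_to_girls(c_one_boy, c_no_boys, n1=1, n2=2):
    # N_b = C_1boy and N_g = N1*C_1boy + N2*C_{no boys}
    boys = c_one_boy
    girls = n1 * c_one_boy + n2 * c_no_boys
    return Fraction(boys, girls)

print(boys_to_girls(100, 0))    # 1
print(boys_to_girls(100, 100))  # 1/3
```

The two calls reproduce the two cases discussed above: every family has reached the target, and the two categories are of equal size.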

Sunday, October 23, 2011

Not so many irreducible fractions

I accidentally found a few papers in my parents' house, dating from when I was 15 (I have no idea how those papers survived almost 19 years, but I am glad I found them). In those days, I was involved in Mathematical Olympiads, at school and national level. As you can imagine, those papers contain problems :)

This post is dedicated to one particular problem, that is:

Prove that any arbitrary segment of size $\frac{1}{n}$ ($n \gt 1$, $n \in \mathbb{N}$) on $\mathbb{R}$ contains no more than $\frac{n+1}{2}$ irreducible fractions of type $\frac{p}{q}$ ($p,q \in \mathbb{Z}$), where $1 \le q \le n$.

What I remember, so far, is that this problem was part of the Liouville numbers lesson. Unfortunately, I can't reproduce the lesson (I can't remember it :(), but ... below is an alternative proof, containing the sort of analysis I like to define as "Don't be lazy to unfold the details".

Let's pick an arbitrary number $\alpha \in \mathbb{R}$ so that it defines the segment $\left[\alpha, \alpha+\frac{1}{n}\right]$. For any rational number (irreducible fraction) $\frac{p}{q}$ ($p,q \in \mathbb{Z}$, $1 \le q \le n$) on this segment we have:
$$\alpha \le \frac{p}{q} \le \alpha + \frac{1}{n}$$ or
$$q \cdot \alpha \le p \le q \cdot \alpha + \frac{q}{n} \tag{*}$$

Now:

1. Let's consider $q=1$; then from (*) we have $\alpha \le p_1 \le \alpha + \frac{1}{n}$. There can be at most one such integer $p_1$. Why? Suppose there exists another one, $t_1 \ne p_1$, so that
$$\alpha \le t_1 \le \alpha + \frac{1}{n}$$
But this means $1 \le |t_1 - p_1| \le \frac{1}{n} < 1$, which is impossible (the absolute difference between two different integers is always $\ge 1$).

2. Let's consider $q=2$; then from (*) we have $2\alpha \le p_2 \le 2\alpha + \frac{2}{n}$. There can be at most one such integer $p_2$ (the proof is identical to the case above, except that $1 \le |t_2 - p_2| \le \frac{2}{n}$).

...

N-1. Let's consider $q=n-1$; then from (*) we have $(n-1)\alpha \le p_{n-1} \le (n-1)\alpha + \frac{n-1}{n}$. There can be at most one such integer $p_{n-1}$ (the proof is identical to the cases above, except that $1 \le |t_{n-1} - p_{n-1}| \le \frac{n-1}{n}$).

N. Let's consider $q=n$; then from (*) we have $n\alpha \le p \le n\alpha + 1$. There can be at most two (!!!) such integers, $p_n$ and $p_{n+1}$. Why? If $n\alpha$ is an integer, then $n\alpha+1$ is another, different integer. If $n\alpha$ isn't an integer, then at most one integer satisfies $n\alpha \le p_n \le n\alpha+1$ (indeed, if there were two integers $n\alpha \le p_n < t \le n\alpha+1$, then $1 \le t - p_n \le 1 \Rightarrow t = p_n + 1$. Also $n\alpha + 1 \le p_n + 1 = t \le n\alpha + 1 \Rightarrow p_n + 1 = n\alpha + 1 \Rightarrow p_n = n\alpha$, so $n\alpha$ would be an integer).

At this step we know that there can be at most $(n+1)$ integers $p_i$ satisfying (*), where $1 \le q \le n$.
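The case analysis above can be replayed numerically. The sketch below is only an illustration: the function name `numerators` and the sample value of $\alpha$ are assumptions, and $\alpha$ is taken as an exact `Fraction` to avoid rounding problems at the endpoints.

```python
from fractions import Fraction
from math import ceil, floor

def numerators(alpha, n):
    """For each q in 1..n, list the integers p with q*alpha <= p <= q*alpha + q/n."""
    result = {}
    for q in range(1, n + 1):
        lo = q * alpha
        hi = lo + Fraction(q, n)
        # Integers in the closed interval [lo, hi]
        result[q] = list(range(ceil(lo), floor(hi) + 1))
    return result

alpha = Fraction(1, 3)
found = numerators(alpha, 3)
print(found)  # {1: [], 2: [1], 3: [1, 2]}
```

With $\alpha = \frac{1}{3}$ and $n = 3$ the product $n \cdot \alpha$ is an integer, so $q = n$ yields two numerators, while every $q < n$ yields at most one.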

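Next, let's consider all the even $q$ between $1$ and $n-1$, i.e. $q=2s$; then from (*) we have

$$2s\alpha \le p_{2s} \le 2s\alpha + \frac{2s}{n}$$ and
$$s\alpha \le p_s \le s\alpha + \frac{s}{n}$$ which is equivalent to (multiplying by $2$)
$$2s\alpha \le 2p_s \le 2s\alpha + \frac{2s}{n}$$
from which we can see that
$$|p_{2s} - 2p_s| \le \frac{2s}{n} = \frac{q}{n} < 1 \Rightarrow p_{2s} = 2p_s$$ or
$$\frac{p_q}{q} = \frac{p_{2s}}{2s} = \frac{p_s}{s}$$ so
$\frac{p_q}{q}$ is reducible.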
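The doubling step can be checked on concrete values with a short Python sketch. The helper names `numerator` and `doubling_holds` are hypothetical, $\alpha$ is assumed to be an exact `Fraction`, and the check only covers those even $q = 2s < n$ for which both $p_s$ and $p_{2s}$ actually exist.

```python
from fractions import Fraction
from math import ceil, floor

def numerator(alpha, n, q):
    """Return the integer p with q*alpha <= p <= q*alpha + q/n (for q < n), or None."""
    lo = q * alpha
    hi = lo + Fraction(q, n)
    p = ceil(lo)
    return p if p <= floor(hi) else None

def doubling_holds(alpha, n):
    # Verify p_{2s} == 2*p_s for every even q = 2s < n where both numerators exist.
    for s in range(1, (n - 1) // 2 + 1):
        p_s = numerator(alpha, n, s)
        p_2s = numerator(alpha, n, 2 * s)
        if p_s is not None and p_2s is not None and p_2s != 2 * p_s:
            return False
    return True

print(doubling_holds(Fraction(9, 10), 7))  # True
```

For $\alpha = \frac{9}{10}$ and $n = 7$ the numerators for $q = 1, 2, 3$ are $1, 2, 3$ and those for $q = 2, 4, 6$ are $2, 4, 6$, so each even-denominator fraction reduces to the one with half the denominator.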

We can state now that for any even $q$, $\frac{p_q}{q}$ is reducible, but there are $\frac{n-1}{2}$ even numbers between $1$ and $n-1$. As a result we have at most $\frac{n-1}{2}$ irreducible fractions satisfying
$$\alpha \le \frac{p}{q} \le \alpha + \frac{1}{n}, \quad 1 \le q \le n-1$$

What about the q=n case?

As we stated in case N above, there can be two integers (which we denoted $p_n$ and $p_{n+1}$) satisfying
$$n\alpha \le p \le n\alpha + 1$$
if, and only if, $n\alpha$ is an integer.

If we take $p_n < p_{n+1}$, then
$$n\alpha = p_n$$
$$n\alpha + 1 = p_{n+1} = p_n + 1$$

From case 1 above, $n\alpha \le n \cdot p_1 \le n\alpha + 1$ or $p_n \le n \cdot p_1 \le p_{n+1}$, which means
$$p_n = n \cdot p_1$$ or
$$p_{n+1} = p_n + 1 = n \cdot p_1$$

As a result, either $\frac{p_n}{n}$ or $\frac{p_{n+1}}{n}$ is further reducible, provided $p_1$ exists. So we can have at most one irreducible fraction here. This brings the total maximum to $1 + \frac{n-1}{2} = \frac{n+1}{2}$.

If $p_1$ doesn't exist (we stated there can be at most one), then both $\frac{p_n}{n}$ and $\frac{p_{n+1}}{n}$ can be irreducible, but the $q=1$ case (case 1) has to be removed from the previously analysed options, so $2 - 1 + \frac{n-1}{2} = \frac{n+1}{2}$.

One can observe that from case 1 we can apply this technique (of multiplying) and deduce that $p_i = i \cdot p_1$, $1 \le i \le n-1$. Yes, this is true, if we admit that $p_1$ exists at all (this part is very subtle). But even in this particular case, the total is $2$ irreducible fractions, which is less than $\frac{n+1}{2}$ anyway.

Sunday, October 16, 2011

How many defects are there?

This week I was asked if there are any statistics out there showing
- the number of defects (on average) produced during a software elaboration project or
- the number of defects produced versus the number of lines of code per programming language or technology stack (C/C++, Java, .NET or PHP for example)?

I answered that I haven't seen anything publicly available. Publishing such statistics could affect reputation. However, internally, any company should collect these statistics for risk management purposes.

Still, we can do some estimations. For example, let's assume the following oversimplified model:

1. A project consists of one or a few iterations/sprints.

2. Ideally, code from each iteration/sprint is deployed with 0 defects. As a result, we consider what was fixed during the active iteration. We also assume that what was deployed in the previous iterations brings no defects in the current one.

3. Most of the trivial defects are spotted during the build phase (far before the test team gets engaged). As a result, we count the defects spotted during the unit tests execution. For now we ignore the defects spotted by the test team as this complicates the model :)

4. The unit tests are free of defects :)

Going further:

5. The $X$-th iteration delivers $N$ new units.

6. Each unit must have at least $2$ Unit Tests, for Expected Pass and Expected Failure cases.

7. As a result the $X$-th iteration has $2\cdot N$ Unit Tests.

8. The probability that a single Unit Test fails is $\frac{1}{2}$ (Unexpected Pass or Unexpected Failure).

9. The iteration can have from $0$ to $2\cdot N$ defects as a result. The probability that the number of defects is m ($0\leq m\leq 2\cdot N$) is $$P(m)=\binom{2\cdot N}{m}\frac{1}{2^{2\cdot N}}$$

10. The mean value or the average number of the defects is $2\cdot N \cdot \frac{1}{2}=N$.

So, $N$ units with $N$ defects or roughly $1$ defect per unit.

A few words about the maths used. It is the binomial distribution with $p=\frac{1}{2}$, whose mean value is $E(X) = \sum_{m} m \cdot P(m) = n \cdot p$, where $n = 2\cdot N$.

This formula also tells us that if we reduce the probability for a Unit Test to fail ($p<\frac{1}{2}$), then we will also reduce the number of defects. Sounds logical, doesn't it?
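To see how the expected number of defects moves with $p$, here is a minimal Python sketch that evaluates the binomial mean by direct summation. The function name `expected_defects` and the sample values of $N$ and $p$ are assumptions for illustration; exact `Fraction` arithmetic is used so the results are not affected by rounding.

```python
from fractions import Fraction
from math import comb

def expected_defects(n_units, p):
    # 2 unit tests per unit, each failing independently with probability p
    n = 2 * n_units
    return sum(m * comb(n, m) * p ** m * (1 - p) ** (n - m) for m in range(n + 1))

print(expected_defects(10, Fraction(1, 2)))  # 10
print(expected_defects(10, Fraction(1, 4)))  # 5
```

Halving the failure probability halves the expected number of defects, in line with $E(X) = n \cdot p$.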

I will also provide a quick proof for the mean value because it is indeed a very elegant piece of mathematics, so

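$$\begin{aligned}
E(X) &= \sum_{m=0}^{n} m \cdot P(m) = P(1) + 2 \cdot P(2) + \dots + n \cdot P(n) \\
&= \binom{n}{1} p (1-p)^{n-1} + 2 \binom{n}{2} p^2 (1-p)^{n-2} + \dots + n \binom{n}{n} p^n \\
&= p \cdot \left[ n (1-p)^{n-1} + 2 \binom{n}{2} p (1-p)^{n-2} + \dots + n \cdot p^{n-1} \right] \\
&= p \cdot n \cdot \left[ (1-p)^{n-1} + \binom{n-1}{1} p (1-p)^{n-2} + \dots + p^{n-1} \right] \\
&= p \cdot n \cdot (1-p+p)^{n-1} = n \cdot p
\end{aligned}$$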
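The step that pulls $n$ out of the bracket relies on a standard identity for binomial coefficients, which follows directly from the factorial definition (here $\binom{n}{m}$ denotes the binomial coefficient):

$$m \binom{n}{m} = m \cdot \frac{n!}{m! \, (n-m)!} = n \cdot \frac{(n-1)!}{(m-1)! \, (n-m)!} = n \binom{n-1}{m-1}, \qquad 1 \le m \le n$$

For example, $2 \binom{n}{2} = n \cdot (n-1) = n \binom{n-1}{1}$, which is the second term inside the bracket.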