Wednesday, December 14, 2011

Jail Break Quiz

Everything was quiet and peaceful today, until Google Reader brought to my attention this puzzle from

So, I said, why not writing a little Java program to resolve this? Here is the code:

public class Puzzle1 {
 static public final int MAX = 100;

 static public void printDoors(boolean[] doorIsOpen) {
  int count = 0;
  for (int i = 0; i < doorIsOpen.length; i++) {
   if (doorIsOpen[i]) {
   else System.out.print("c");
  System.out.println(" =" + count + " open doors");

 static public void main(String[] args) {
  boolean[] doorIsOpen = new boolean[MAX];
  for (int i = 1; i <= MAX; i++) {
   for (int j = i; j <= MAX; j+=i) {
    doorIsOpen[j-1] = !doorIsOpen[j-1];

The result was "10 prisoners have escaped", as per the last line printed by the program:

occoccccoccccccoccccccccoccccccccccoccccccccccccoccccccccccccccoccccccccccccccccocccccccccccccccccco =10 open doors

The most interesting part of the puzzle, though, is the "why" one. Have a look at the sequence; 1st position (cell in this case) contains "o" (from "open"), 4 - "o", 9 - "o", 16 - "o", 25 - "o", 36 - "o", 49 - "o", 64 - "o", 81 - "o", 100 - "o". Basically, every number of type q2≤100 corresponds to an "o". Now, this is weird ... but not for a long time.

Let's stick with an easier example (10 rather than 100):

  • 1st cell (column from the left) is visited one time and remains opened and unvisited after that. Also number 1 has one divisor {1}.
  • 2nd cell is visited two times (one "o" and one "c") and remains closed and unvisited after that. Also number 2 has two divisors {1, 2}.
  • 3rd cell is visited two times (one "o" and one "c") and remains closed and unvisited after that. Also number 3 has two divisors {1, 3}.
  • 4th cell is visited three times (one "o", one "c" and one "o") and remains opened and unvisited after that. Also number 4 has three divisors {1, 2, 4}.
  • ...

It turns out that the number of visits is equal to the number of divisors the number, corresponding to the cell (column from the left), has. Here is the formula and this statement is easy to prove (e.g. using induction).

  • Now, if the number of divisors is even (we have a whole number of pairs {"o", "c"}), because the sequence always starts with "o" then it must end with "c".
  • However, if the number of divisors is odd (we have a whole number of pairs {"o", "c"} plus one extra {"o"}), the sequence ends with "o".

So far so good, but when the number of divisors is an odd number? According to the formula mentioned above the number
is odd when each of these numbers e1, e2, e3,...,ek is even, but this means that n is of type q2. Q.E.D.

Monday, December 12, 2011

K-Means Clustering Algorithm

This post is dedicated to K-Means Clustering Algorithm, used for unsupervised learning. Here is a short introduction into the unsupervised learning subject. This video shows the algorithm at work.

So, how does it work?

1. First of all we have the number K (the number of clusters) and the data set X1,...,Xm as inputs, K < m and Xj ∈ ℜn for ∀j=1..m. Considering that clustering is used as a task in data mining (e.g. including text), it is a nice exercise indeed to formalise a problem in a way where data is represented by vectors :)

2. At this step we randomly initialise K cluster centroids μ1,...,μK, μj ∈ ℜn for ∀j=1..K. The recommended way is to randomly initialise the centroids from the data set X1,...,Xm and make sure that μi≠μj for ∀ i≠j.

3. At this step we execute the loop:

Repeat {
  //cluster assignment step
  For i=1..m {
    Find the closest centroid for Xi, i.e.
    min{||Xij||}, ∀j=1..K, e.g. it is μt;
    Assign ci=t;
    //in this case ci is the index of 
    //the closest centroid for Xi

  //move centroids step, if a particular cluster 
  //centroid has no points assigned to it, 
  //then eliminate it
  For j=1..K {
    old_μj = μj;
    μj = average (mean) of all Xi where ci=j, ∀i=1..m;
This loop ends when all the centroids stop "moving", i.e. ||old_μj - μj|| < ε, ∀j=1..K, where ε is an error we are happy with (e.g. 0.01 or 0.001).

This is pretty much the algorithm. However, in this form, there is a risk to get stuck in a local minima. By local minima I mean the local minima of the cost function:
J(X1,...,Xm,c1,...cm1,...,μK) = (1 ⁄ m)⋅∑||Xi – μci||2, sum is for i=1..m

In order to address this, the algorithm (steps 2, 3) are executed several times (e.g. 100). Each time the cost function (J(...)) is computed and the clustering with the lowest J(...) is picked up. E.g.

For p=1..100 {
  Perform step 2;
  Perform step 3;
  Calculate cost function J(...);
Pick up clustering with the lowest cost J(...);

In order to make it more interactive, I and my son spent couple of hours implementing a JavaScript version of this algorithm using <canvas> tag from HTML5 (my son studies HTML at school, so it was an interesting exercise). Here is the link to the page (excuse me and my son for our non OOP approach to JavaScript :)). If you want to have a look at the sources, feel free to download that page (we didn't provide comments, but hopefully this article explains everything). We tested it with Google Chrome and Internet Explorer 9 (if you have problems with IE9, please consult the following link).

Note: While I was writing this article, a friend of mine suggested this nice JavaScript library for plotting

Tuesday, November 22, 2011

A* Search

Algorithms form an important part of the problem solving process in Artificial Intelligence. One useful algorithm is A* Search, which is an extension of another useful algorithm called Dijkstra's algorithm. This video is a short introduction to the subject.

So, the key part of this algorithm is the evaluation function f(node)=g(node)+h(node) (note that in the video "state" is treated as a "node"), where
- g(node) is the cost from the "Start" node to the current node and
- h(node) is the heuristic (estimated) cost from the current node to the "Goal".
The algorithm works by always expanding the path which has the minimum value of f(node) first (cheapest first). The search terminates when the "Goal" node is chosen for expansion or there are no more nodes to expand.

For the heuristic function to be admissible (or optimistic, which is the same) it is important that for any node h(node) ≤ actual cost from the node to the "Goal". If this is true, then "A* Search" will always find the lowest cost path from the "Start" to the "Goal", this video explains this principle with more details.

As you could mention, the last video operates with another term called "search frontier". It is a "characteristic" of the algorithm or the way it progresses to/with expanding the nodes. For example:
- this article shows how the search frontier looks for the Breadth-First Search algorithm and
- this article shows how the search frontier looks for the Depth-First Search algorithm.

An implementation of this algorithm would use a priority queue, where priority is dictated by the function f.

Now, let's consolidate this material with an exercise:

For heuristic function h and cost of action 10 (per step), find the order (1, 2, 3, ...) of the nodes expansion (the node is expanded when it is removed from queue). Start with "1" at the "Start" state at the top. Indicate the nodes that will never be expanded. Is the heuristic function admissible?

First of all the heuristic function is not admissible because h(B5)=20 > 10, where 10 is the actual cost from B5 to the "Goal".

Now, let's start with the expansion. The frontier is at the "Start".

1. f(A1)=10+11=21, f(A2)=18, f(A3)=17, f(A4)=16, f(A5)=20. The frontier moves to A1, A2, A3, A4, A5 (this is the content of the queue).

2. A4 is the node (in the queue) with the minimum f (=16), so it's the second node to be expanded (removed from the queue). f(B4)=10+10+5=25, f(B5)=40. The frontier moves to A1, A2, A3, B4, B5, A5.

3. A3 is the next node with the minimum f (=17), so it's the third node to be expanded. But there is nothing to add to the queue. The frontier now is A1, A2, B4, B5, A5.

4. A2 is the next node with the minimum f (=18), so it's the forth node to be expanded. f(B2)=10+10+3=23, f(B3)=29. The frontier moves to A1, B2, B3, B4, B5, A5.

5. A5 is the next node (again, in the queue) with the minimum f (=20), so it's the fifth node to be expanded. f("Goal")=10+10+0=20. The frontier moves to A1, B2, B3, B4, B5, "Goal".

6. The "Goal" is the next node (from the queue) with the minimum f (=20), so it's the sixth node to be expanded and, because it is the "Goal", it is also the last one to be expanded (i.e. end of the search).

Here is the expansion order: "Start"-1, A4-2, A3-3, A2-4, A5-5, "Goal"-6.

The nodes which were never expanded: A1, B1, B2, B3, B4 and B5.

Thursday, November 17, 2011

Dimensionality Reduction

Another interesting Machine Learning algorithm (unsupervised learning this time) is Dimensionality Reduction.

Here is a short video explaining the theory. In the example presented in this video the mean (μ vector) is the simple arithmetic average per column of the matrix X. Σ (or covariance) matrix is simply (1 ⁄ M)⋅(X-μ)T⋅(X-μ) (where X-μ is per column operation, i.e. mean is extracted from each element of the relevant column, T – is transpose operator and M is the number of rows in the matrix X). Eigen values and vectors are of the Σ matrix, i.e. satisfying Σ⋅v=λ⋅v, where λ is a scalar value.

And here is another video explaining the algorithm's applicability.

Monday, November 14, 2011

Linear Regression

This post will be another short Machine Learning lesson (or a set of materials to be more precise). Particularly, it will be about Linear Regression, which is a method of supervised machine learning (when there is a training set).

First of all, please watch these videos explaining the theoretical material: video 1, video 2 and video 3

Basically, it is a way to "predict" (or diagnose) a result from an input, after training the model with the training set.

The mathematics behind this is explained here and it is also known as the best fitting curve.

In order to consolidate this material (at least for myself :)), here is an easy exercise. Given the training set:

Find the hypothesis function ƒ(x)=w1⋅x+w0.

Using the formulas from the video material:

and the fact that M=5, we have:

w1=[5⋅(0⋅3+1⋅6+2⋅7+3⋅8+4⋅11)–(0+1+2+3+4)⋅(3+6+7+8+11)][5⋅(0+1+4+9+16)-( 0+1+2+3+4)2]=[5⋅88-350] ⁄ [5⋅30-100]=9 ⁄ 5

w0=(1 ⁄ 5)⋅(3+6+7+8+11)–(9 ⁄ 5)⋅(1 ⁄ 5)⋅(0+1+2+3+4) = 7 – 18 ⁄ 5 = 17 ⁄ 5

So ƒ(x)=1.8⋅x+3.4

Thursday, November 3, 2011

Naive Bayes and Laplace Smoothing

Don't panic if you don't understand my writings below. I encourage you to watch this video for more details. Alternatively subscribe to the following online course. So, we are going to do a bit of Machine Learning.

Let's have the following inputs (2 categories or training set)


Vocabulary of this training set is:


Vocabulary size is 11 words (also counted as categories, more details below).

Laplace smoothing is K=1.

P(SONG) = P(MOVIE) = (3+1) ⁄ (6+1⋅2) = 1 ⁄ 2
3 sentences in category, 6 - sentences altogether, 1 - is K, 2 - number of categories (SONG and MOVIES).

P("PERFECT"|MOVIE) = (2+1) ⁄ (8+1⋅11) = 3 ⁄ 19
2 occurrences in the MOVIE category, 1 - is K, 8 words in the MOVIE category, 11 - vocabulary size.

P("PERFECT"|SONG) = (1+1) ⁄ (8+1⋅11) = 2 ⁄ 19
1 occurrence in the SONG category, 1 - is K, 8 words in the SONG category, 11 - vocabulary size.

P("STORM"|MOVIE) = (0+1) ⁄ (8+1⋅11) = 1 ⁄ 19
0 occurrences in the MOVIE category, 1 - is K, 8 words in the MOVIE category, 11 - vocabulary size.

P("STORM"|SONG) = (1+1) ⁄ (8+1⋅11) = 2 ⋅ 19
1 occurrence in the SONG category, 1 - is K, 8 words in the SONG category, 11 - vocabulary size.

Applying Bayes' Rules we can calculate P(MOVIE|M), where M = {"PERFECT STORM"} (or probability that "PERFECT STORM" is MOVIE).

P(MOVIE|M) = P(M|MOVIE)⋅P(MOVIE) ⁄ [P(M|MOVIE)⋅P(MOVIE)+P(M|SONG)⋅P(SONG)] = (3⁄19 ⋅ 1⁄19 ⋅ 1⁄2)/[3⁄19 ⋅ 1⁄19 ⋅ 1⁄2 + 2⁄19 ⋅ 2⁄19 ⋅ 1⁄2] = 3⁄(3+4) = 3⁄7

Sunday, October 30, 2011

Boys vs. Girls

Here is another interesting problem I was trying to address a while ago:

In a country in which people only want boys, every family continues to have children until they have a boy. If they have a girl, they have another child. If they have a boy, they stop. What is the expected proportion of boys to girls in the country?

Apparently, this problem was originally posted by Google, as this post suggests. Here is a link to another stream, trying to tackle the problem. I will try to provide a solution that (in my opinion, of course) looks easier to comprehend.

At any given moment of time, the total number of couples (with children) can be divided into two categories CTotal = C1boy + Cno boys, where C1boy - number of couples with 1 boy (this is actually the limit, as it is stated in the problem) and Cno boys - number of couples having only girls (and thus, still trying to have a boy). We don't count the couples with no children as they don't add any value to the calculations below.

The number of boys in this case (or at any given moment of time) is Nb= C1boy.

The number of girls is Ng=N1⋅C1boy + N2⋅Cno boys, where N1 - average number of girls in a family with 1 boy and N2 - average number of girls in a family with no boys. Let's find these averages.

The probability for a family with one boy to have 1 girl is P(1 girl & 1 boy) = (1 ⁄ 2)⋅(1 ⁄ 2)
The probability for a family with one boy to have 2 girls is P(2 girls & 1 boy) = (1 ⁄ 2)⋅(1 ⁄ 2)⋅(1 ⁄ 2)
The probability for a family with one boy to have n girls is P(n girls & 1 boy) = 1 ⁄ 2n+1

So, the average number of girls in a family with 1 boy is (find the formula for this series here) N1=E(X) = ∑m⋅P(m) = ∑m ⁄ 2m+1 = (1 ⁄ 2)⋅∑m ⁄ 2m= (1 ⁄ 2) ⋅ (1 ⁄ 2) ⁄ (1 - 1 ⁄ 2)2 = 1.


The probability for a family with no boys to have 1 girl is P(1 girl) = 1 ⁄ 2
The probability for a family with no boys to have 2 girls is P(2 girls) = (1 ⁄ 2)⋅(1 ⁄ 2)
The probability for a family with no boys to have n girls is P(n girls) = 1 ⁄ 2n

The average number of girls in a family with no boys is N2=E(Y) = ∑k⋅P(k) = ∑k ⁄ 2k = 2.

As a result Nb ⁄ Ng= C1boy ⁄ (C1boy + 2⋅Cno boys). This expression is equal to 1 only when Cno boys=0, i.e. when all the families reach the target. However, if Cno boys= C1boy then Nb ⁄ Ng=1 ⁄ 3.

Sunday, October 23, 2011

Not so many irreducible fractions

I accidently found few papers in my parents' house, dating since I was 15 (I have no idea how those papers survived almost 19 years, but I am glad I found them). Those days, I was involved in Mathematical Olympiads, at school and national level. As you can imagine, those papers contain problems :)

This post is dedicated to one particular problem, that is:

Prove that any arbitrary segment of size 1⁄n (n >1, n∈N) on ℜ contains no more than (n+1)⁄2 irreducible fractions of type p⁄q (p,q∈Z), where 1≤q≤n.

What I remember, so far, is that this problem was part of the Liouville numbers lesson. Unfortunately, I can't reproduce the lesson (I can't remember it :(), but ... below is an alternative proof, containing the sort of analysis I like to define as "Don't be lazy to unfold the details".

Let's pick up an arbitrary number α∈ℜ so that it defines the [α, α+1⁄n] segment. For any rational number (irreducible fraction) p⁄q (p,q∈Z, 1≤q≤n) on this segment we have:
α≤p⁄q≤α+1⁄n or


1. Let's consider q=1, then from (*) we have α≤p1≤α+1⁄n. There can be maximum one such integer number p1. Why? Let's suppose there exists another one t1≠p1 so that
But this means 1≤|t1-p1|≤1⁄n<1, which is impossible (absolute difference between 2 different integers is always ≥1).

2. Let's consider q=2, then from (*) we have 2⋅α≤p2≤2⋅α+2⁄n. There can be maximum one such integer number p2 (proof is identical to the case above except 1≤|t2-p2|≤2⁄n).


N-1. Let's consider q=n-1, then from (*) we have (n-1)⋅α≤pn-1≤(n-1)⋅α+(n-1)⁄n. There can be maximum one such integer number pn-1 (proof is identical to the case(s) above except 1≤|tn-1-pn-1|≤(n-1)⁄n).

N. Let's consider q=n, then from (*) we have n⋅α≤p≤n⋅α+1. There can be maximum two (!!!) such integer numbers pn and pn+1. Why? Imagine that n⋅α is an integer, then n⋅α+1 is another different integer. If n⋅α isn't an integer, then there will be maximum one integer satisfying n⋅α≤pn≤n⋅α+1 (e.g. if we suppose there will be 2 integers n⋅α≤pn<t≤n⋅α+1, then 1≤t–pn≤1t=pn+1. Also n⋅α+1 ≤pn+1=t≤n⋅α+1pn+1= n⋅α+1pn= n⋅α so n⋅α is an integer).

At this step we know that there could be maximum (n+1) integers pi, satisfying (*) where 1≤q≤n.

Next, let's consider all the even q between 1 and n-1, e.g. q=2⋅s, then we have from (*)

2⋅s⋅α≤p2⋅s≤2⋅s⋅α+2⋅s⁄n and
s⋅α≤ps≤s⋅α+s⁄n which is ⇔ (multiplying by 2)
from which we can see that:
|p2⋅s-2⋅ps|≤2⋅s⁄n=q⁄n<1p2⋅s=2⋅ps or
pq⁄q=p2⋅s⁄(2⋅s)=ps⁄s or
pq⁄q is reducible.

We can state now that for any even q, pq⁄q is reducible, but there are (n-1)⁄2 even numbers between 1 and n-1. As a result we have maximum (n-1)⁄2 irreducible fractions satisfying:
α≤p⁄q≤α+1⁄n, where 1≤q≤n-1

What about the q=n case?

As we stated in the case N above, there can be maximum two integers (we noted then pn and pn+1) satisfying
if, and only if n⋅α is an integer.

If we consider pn<pn+1 then

From the case 1 above n⋅α≤n⋅p1≤n⋅α+1 or pn≤n⋅p1≤pn+1 which means
pn=n⋅p1 or

As a result, either pn⁄n or pn+1⁄n is further reducible, considering p1 exists. So we can have maximum one irreducible fraction. This means total maximum is 1+(n-1)⁄2=(n+1)⁄2 now.

If p1 doesn't exists (we stated there could be maximum 1), then both pn⁄n and pn+1⁄n can be irreducible, but the q=1 case (case 1) will have to be removed from the previously analysed options so 2-1+(n-1)⁄2=(n+1)⁄2.

One can observe that from the case 1, we can apply this technique (of multiplying) and deduce that pi=i⋅p1, 1≤i≤n-1. Yes, this is true, if we admit there exists p1 at all (this part is very subtle). But even in this particular case, the total is 2 irreducible fractions that is less than (n+1)⁄2 anyway.

Sunday, October 16, 2011

How many defects are there?

This week I was asked if there are any statistics there showing
- the number of defects (on average) produced during a software elaboration project or
- the number of defects produced versus the number of lines of code per programming language or technological stack (C/C++, Java, .NET or PHP for example)?

I answered that I haven't seen anything publically available. Any company, publishing such statistics, could damage its reputation. However, internally, any company should collect these statistics for the risk management purpose.

Still, we can apply maths to do some estimation, can't we? For example, let's assume the following:

1. A project consists of one or few iterations.

2. Ideally, code from each iteration is deployed with 0 defects. As a result, we consider what was fixed during the iteration(s). We also assume that what was deployed with the previous iterations is free of defects in the current one.

3. Most of the trivial defects are spotted during the compilation or build process (far before the test team gets engaged) as a result, we count the defects spotted during the unit tests execution. For now we ignore the defects spotted by the test team as this complicates the model :)

4. The unit tests are free of defects :)

Going further:

5. The X-th iteration delivers N new units.

6. Each unit must have at least 2 Unit Tests, for Expected Pass and Expected Failure cases.

7. As a result the X-th iteration has 2⋅N Unit Tests.

8. The probability of a single Unit Test to fail is 1⁄2 (Unexpected Pass or Unexpected Failure).

9. The iteration can have from 0 to 2⋅N defects as a result. The probability that the number of defects is m (0≤m≤2⋅N) is

10. The mean value or the average number of the defects is 2⋅N⋅1⁄2 = N.

So, N units with N defects or roughly 1 defect per unit.

Few words about the maths used. It is the binomial distribution where p=1⁄2 and the mean value is E(X) =∑m⋅P(m) = n⋅p, where n=2⋅N.

This formula also tells us that if we reduce the probability for a Unit Test to fail (p<1⁄2), then we will also reduce the number of defects. Sounds logic, doesn't?

I will also provide a quick proof for the mean value because it is indeed a very elegant piece of mathematics, so

E(X) =∑m⋅P(m) = P(1)+2⋅P(2)+...+n⋅P(n)=

Tuesday, September 13, 2011

Weird functions!

Here is another problem from IMO 2011, number 5 to be more precise:

Let ƒ be a function from the set of integers to the set of positive integers (Z+). Suppose that, for any two integers m and n, the diference ƒ(m)−ƒ(n) is divisible by ƒ(m−n), i.e. ƒ(m−n) | ƒ(m)−ƒ(n). Prove that, for all integers m and n with ƒ(m) ≤ ƒ(n), the number ƒ(n) is divisible by ƒ(m).

Weird indeed, but let's sort it out. We will consider that ƒ(m) < ƒ(n), because ƒ(m) = ƒ(n) case is straightforward. Let's prove some useful (and weird) properties of these functions first.

1. ƒ(m)=ƒ(m-0) | ƒ(m)-ƒ(0) ⇒ ƒ(m) | ƒ(0), for ∀m.

2. ƒ(-m)=ƒ(0-m) | ƒ(0)-ƒ(m). From (1) we know that ƒ(-m) | ƒ(0) which means:
- ƒ(-m) | ƒ(m) (*)
Similarly ƒ(m)=ƒ(0-(-m)) | ƒ(0)-ƒ(-m), which means:
- ƒ(m) | ƒ(-m) (**)
From (*), (**) and the fact that all the values returned by ƒ are positive ⇒ ƒ(-m) | ƒ(m) | ƒ(-m) ⇒ ƒ(-m)=ƒ(m), for ∀m.

3. From ƒ(n)>ƒ(m) (both are positive numbers) ⇒ ƒ(n)>ƒ(n)-ƒ(m)>ƒ(m)-ƒ(m)=0. But ƒ(n-m) | ƒ(n)-ƒ(m), which means, there ∃ an integer q≥1 so that q⋅ƒ(n-m)=ƒ(n)-ƒ(m) ⇒ ƒ(n)>ƒ(n)-ƒ(m)=q⋅ƒ(n-m)≥ƒ(n-m)>0.
From 0<ƒ(n-m)<ƒ(n) and 0<ƒ(m)<ƒ(n) results (from a≤b≤c≤d ⇒ |c-b|≤|d-a|):
- if ƒ(n-m)≤ƒ(m) ⇒ 0≤ƒ(m)-ƒ(n-m)<ƒ(n) or
- if ƒ(n-m)≥ƒ(m) ⇒ 0≤ƒ(n-m)-ƒ(m)<ƒ(n)
But both ƒ(m)-ƒ(n-m) and ƒ(n-m)-ƒ(m) are divisible by ƒ(n):
- ƒ(n)=ƒ(m-(m-n)) | ƒ(m)-ƒ(m-n)=(from 2)=ƒ(m)-ƒ(-(m-n))=ƒ(m)-ƒ(n-m)
- ƒ(n)=(from 2)=ƒ(-n)=ƒ((m-n)-m) | ƒ(m-n)-ƒ(m)=(from 2)=ƒ(-(m-n))-ƒ(m)=ƒ(n-m)-ƒ(m)
Or, saying it otherwise, ƒ(n) divides a positive number that is smaller than itself, which is possible only if the number is zero. In our case ƒ(n-m)-ƒ(m)=0 or ƒ(m)-ƒ(n-m)=0, which means ƒ(m)=ƒ(n-m).
As a result, if ƒ(n)>ƒ(m) ⇒ ƒ(m)=ƒ(n-m)

In the point 3 we stated that there ∃ an integer q≥1 so that q⋅ƒ(n-m)=ƒ(n)-ƒ(m), which is ⇔ q⋅ƒ(m)=ƒ(n)-ƒ(m) when ƒ(n)>ƒ(m) ⇔ ƒ(n)=(q+1)⋅ƒ(m) ⇔ ƒ(m) | ƒ(n)

Saturday, September 10, 2011

Alternative definition for the prime numbers

I was watching this video recently and mentioned the presenter pointing to the definition for the prime numbers we all know from the school, i.e.

Classical Definition: A natural number p > 1 is prime if 1 and p are its only divisors.

However, there is an alternative way to define the prime numbers (the one I learned when I was doing my university degree).

Alternative Definition: A natural number p > 1 is prime if for ∀ a, b - integers satisfying p | a⋅bp | a or p | b

(Note: "p | a" means "p divides a" or "a is divisible by p").

Sounds complicated? Not really. For example, according to this definition 4 isn't a prime because it divides 2⋅2, but doesn't divide 2.

With this definition, the classical one is, in fact, a theorem:

Theorem: A natural number p > 1 is prime ⇔ 1 and p are its only divisors.

Let's prove it ...

Step 1. Let's assume that p is prime and show that 1 and p are its only divisors.

We will start by supposing the contrary, i.e. there ∃ s,tN, both ∉ {1,p}, so that p=s⋅t. This means p | s⋅t and because p is prime ⇒ p | s or p | t. Let's assume p | ss=q⋅p, q - positive integer ⇒ p=s⋅t=q⋅p⋅t ⇒ 1=q⋅t which has sense when t=1 and q=1 ⇒ s=p - contradiction. So 1 and p are the only divisors for p - prime.

Step 2. Let's assume that the only divisors of p are {1, p} and show that p is prime.

Again we will suppose the contrary, i.e. p isn't prime. This means there ∃ a,b - integers so that p | a⋅b and p doesn't divide a and p doesn't divide b. We can write a⋅b=p⋅q, q - positive integer. We should note that (a,p) = 1 (greatest common divisor), otherwise if (a,p)=d>1 ⇒ d | p which means d = p and p | a - contradiction, so (a,p) = 1. From Bezout Theorem this means that there ∃ k,z - integers so that k⋅p+z⋅a=1 ⇔ k⋅p⋅b+z⋅a⋅b=bk⋅p⋅b+z⋅p⋅q=bp⋅(k⋅b+z⋅q)=b or p | b - contradiction. So p is prime.

It is important to note that if we consider the classical definition for the prime numbers, then the alternative definition becomes what is known as Euclid Lemma.

In other words, it doesn't really matter which definition we adopt (for integer numbers, for more details see this link), because one is provable from another.

Sunday, August 14, 2011

The Code, Part 3

Here is another interesting puzzle from the book (as I stated previously you need the passwords from the pre-selections).

Basil, a friend of mine, gave me this interesting array of numbers. He told me that if I grab any five numbers from it such that none of the numbers shares a row or column with another, then my five chosen numbers will always add up to the same sum. I think he made an error, however, because it does not always seem to verify properly. Exactly where is the glitch and what quantity should go in its place to improve the matrix's reliability?


Let's start with some analysis. The claim is that a1,j1+a2,j2+a3,j3+a4,j4+a5,j5=const, where jk≠jm, ∀k,m∈{1..5}

This implies that, e.g. a1,j1+a2,j2=a1,j2+a2,j1 or by simplifying it a1,i+a2,j=a1,j+a2,i. We make this expression even more generic (easy to prove)

Indeed, this is what is happening:
Example 1

Or 274+86=63+297

Example 2
Or 86+303=165+224

A quick check for all of the 2x2 sub-matrices (stepwise, can be done by a computer program) shows the problem:


The answer is, 142 should be replaced by 238.

Friday, August 12, 2011

The Code, Part 2

For the last two weeks I am addicted to The Code. Those, who successfully passed pre-selection, were offered to download a PDF file (you need the passwords from the pre-selection to open it) with even more puzzles.

Here is another one I liked:

A Mathematician's Apology
ABC = A3 + B3 + C3
CDE = C3 + D3 + E3
CDA = C3 + D3 + A3
FED = F3 + E3 + D3
A⋅100 + A⋅10 + D = ?

Now, if you Google for "A Mathematician's Apology", you will come to the following link, which is a book with the same title by Hardy. Section 15 reveals all the solutions for the equation:
x⋅100 + y⋅10 + z = x3 + y3 + z3
where x∈{1..9}, y,z∈{0..9}. They are (1,5,3), (3,7,0), (3,7,1) and (4,0,7).

The pattern is easy to spot now:
(A,B,C) = (1,5,3)
(C,D,E) = (3,7,0)
(C,D,A) = (3,7,1)
(F,E,D) = (4,0,7)

and A⋅100 + A⋅10 + D = 117

However, if (like me) "Google search" isn't (always) an obvious option, the following Java code:

public class Code {
  public static int func(int a, int b, int c) {
    return a*100 + b*10 + c - a*a*a - b*b*b - c*c*c;		

  public static void main(String[] args) {
    for (int a = 1; a < 10; a++) {
      for (int b = 0; b < 10; b++) {
        for (int c = 0; c < 10; c++) {
          if (func(a,b,c) == 0) {

will also return (1,5,3), (3,7,0), (3,7,1) and (4,0,7).

Monday, August 8, 2011

More maths from IMO 2011

Last week I was looking for something on Google. The reason I can't remember what I was looking for is the IMO website (International Mathematical Olympiad) that "accidently" was returned in the search results. So I stuck on resolving one of the exercises for ... 3 days!!!

The problem that attracted my attention was number 3 from IMO 2011:

For any function ƒ:R→R that satisfies:

ƒ(x+y)≤y⋅ƒ(x)+ƒ(ƒ(x)), for ∀x,y∈R

prove that ƒ(x)=0, for all x≤0.

Here is the solution. Few inequalities that will be used later first:

- y=0 ⇒ ƒ(x)≤ƒ(ƒ(x)) (1)

- x=0 ⇒ ƒ(y)≤y⋅ƒ(0)+ƒ(ƒ(0)) ≡ ƒ(x)≤x⋅ƒ(0)+(ƒ(0)) (2)

- ƒ(0)=ƒ(x+(-x))≤-x⋅ƒ(x)+ƒ(ƒ(x)) ≡ ƒ(0)+x⋅ƒ(x)≤ƒ(ƒ(x)) (3)

- ƒ(ƒ(x))=ƒ(x+ƒ(x)-x)≤[ƒ(x)-x]⋅ƒ(x)+ƒ(ƒ(x)) ≡ [ƒ(x) - x]⋅ƒ(x)≥0 (4)

Let's note ƒ(0)=c and rewrite/expand the above expressions:

- from (1) ⇒ c≤ƒ(c) (1A)

- from (2) ⇒ ƒ(x)≤x⋅c+ƒ(c) (2A)

- from (3) ⇒ c+x⋅ƒ(x)≤ƒ(ƒ(x)) (3A)

- from (1) and (2A) ⇒ ƒ(x)≤ƒ(ƒ(x))≤ƒ(x)⋅c+ƒ(c) or ƒ(x)≤ƒ(x)⋅c+ƒ(c) (5)

- from (5), if x=c ⇒ ƒ(c)⋅c≥0 (6)

All these are true for ∀x∈R. Let's prove that c=0.

1. First of all, let's suppose c<0.

From (6) ⇒ ƒ(c)≤0.

From (5) ⇒ ƒ(x)⋅[1-c]≤ƒ(c)≤0. 1–c>1, from which we have ƒ(x)⋅[1-c]≤0 ≡ ƒ(x)≤0, for ∀x∈R.

From (4) and because ƒ(x)≤0 ⇒ ƒ(x)–x≤0 ≡ ƒ(x)≤x, for ∀x∈R. Considering this and (1A) ⇒ c≤ƒ(c)≤c ⇒ ƒ(c)=c.

From (3A) and x=c ⇒ c+c⋅ƒ(c)<ƒ(ƒ(c)) ≡ c+c2≤c ≡ c2≤0 ⇒ c=0 - contradiction.

2. Let's suppose c>0.

From (1A) ⇒ 0<c≤ƒ(c).

From (2A) we have ƒ(x)≤x⋅c+ƒ(c), where x⋅c+ƒ(c) is a line with positive "c" as a coefficient (i.e. increasing line). This means that for ∀x≤-ƒ(c)⁄c ⇒ ƒ(x)≤x⋅c+ƒ(c)≤0 (*).

From (4) and (*) ⇒ ƒ(x)–x≤0 for ∀x≤-ƒ(c)⁄c or ƒ(x)≤x ≤-ƒ(c)⁄c (again, for ∀x≤-ƒ(c)⁄c). But because ƒ(x)≤-ƒ(c)⁄c ⇒ ƒ(ƒ(x))≤-ƒ(c)⁄c (**).

From (3A) and (**) ⇒ c+x⋅ƒ(x)≤ƒ(ƒ(x))≤-ƒ(c)⁄c, for ∀x≤-ƒ(c)⁄c. Both "c" and ƒ(c) are positive ⇒ x⋅ƒ(x)≤-[ƒ(c)⁄c]–c≤0 or x⋅ƒ(x)≤0 for ∀x≤-ƒ(c)⁄c. Because x is negative and (from (*)) ƒ(x) is negative ⇒ x⋅ƒ(x)≥0. So, 0≤x⋅ƒ(x)≤0, for ∀x≤-ƒ(c)⁄c. This is possible only if ƒ(x)=0, for ∀x≤-ƒ(c)⁄c (***).

From (*) and (***) ⇒ 0=ƒ(x)≤x⋅c+ƒ(c)≤0, for ∀x≤-ƒ(c)⁄c, which has sense only if c=0 - contradiction.

We proved that c=0 and ƒ(0)=0. As a result:

- from (2A) ⇒ ƒ(x)≤x⋅c+ƒ(c)=0 ≡ ƒ(x)≤0, for ∀x∈ R (2B).

- from (3A) ⇒ x⋅ƒ(x)≤ƒ(ƒ(x)), for ∀x∈R (3B).

From (2B) ⇒ for ∀x≤0, x⋅ƒ(x)≥0.

From (3B) ⇒ 0≤x⋅ƒ(x)≤ƒ(ƒ(x)), for ∀x≤0.

But ƒ(x)≤0 always (2B). So must be ƒ(ƒ(x))≤0 ⇒ 0≤x⋅ƒ(x)≤ƒ(ƒ(x))≤0, for ∀x≤0. From this ⇒ ƒ(x) = 0, for ∀x≤0.

Love this stuff :)

Saturday, August 6, 2011

HTML5 <video> tag, IE9 and Sophos antivirus

I was playing with HTML5 <video> tag recently. Everything was working fine with Safari and Chrome; however IE9 constantly stopped playing the video at ~11-20 seconds. I thought the problem was in compatibility, but, after spending half of the day (and night), I figured out the problem.

The problem was the "Sophos Web Content Scanner" add-on (Sophos v9.5). Once disabled (from the menu, Tools -> Manage add-ons), <video> tag works as it should with IE9.

If you have Sophos installed, you can test this by using, for example, this link (or HTML5 version of YouTube). IE9 needs to be restarted each time the add-on is enabled/disabled.

Thursday, August 4, 2011

The Code

Just in case you didn't know, BBC started recently a new TV series with the name "The Code". Apart from being a TV documentary, it is also a quest for the treasure hunt. Some of the questions, in this quest, are quite interesting. It's all about observing the patterns, nothing complicated.

For example, have a look at this puzzle. After about 20 minutes of looking at the pictures I came with the following solution:

For the "six-sided dice" the repeating numbers are 1,2,3. For the "eight-sided dice" the repeating numbers are 1,2,3,4. It looks like these numbers generate the others.

1. For the first "six-sided dice" - 1,2,3 generate 2,3,4
2 = 1 + 11
3 = 1 + 21
4 = 1 + 31

2. For the second "six-sided dice" - 1,2,3 generate 2,5,10
2 = 1 + 12
5 = 1 + 22
10 = 1 + 32

3. For the third "six-sided dice" - 1,2,3 generate 2,9,28
2 = 1 + 13
9 = 1 + 23
28 = 1 + 33

4. For the first "eight-sided dice" - 1,2,3,4 generate 2,3,4,5
2 = 1 + 11
3 = 1 + 21
4 = 1 + 31
5 = 1 + 41

5. For the second "eight-sided dice" - 1,2,3,4 generate 2,5,10,17
2 = 1 + 12
5 = 1 + 22
10 = 1 + 32
17 = 1 + 42

Now, following this pattern...

6. For the third "eight-sided dice" - 1,2,3,4 generate
1 + 13 = 2
1 + 23 = 9
1 + 33 = 28
1 + 43 = 65

And the answer is: 65

Friday, July 22, 2011

JavaScript to become a new technology stack?

Last month I mentioned about the Microsoft's effort towards HTML5 and JavaScript adoption. This month I have more interesting news :) ... JavaScript is about to become a new technology stack.

For example:

1. Do you need a cloud IDE for JavaScript coding? No problems, see Code and host your code.

2. Do you need server side JavaScript? No problems use node.js. This is a "tool" pretty much like Java (i.e. compile and execute - JIT). It integrates perfectly with WebSockets and WebWorkers added as part of HTML5.

3. Do you need anything more complex (e.g. enterprise oriented)? Use services like Wakanda, it even has a built in data store. Alternatively, use MongoDB (JSON storage) with node.js.

To conclude, it is possible to build quite complex web, mobile, desktop (in terms of HTML5 offline apps) applications using just JavaScript.

Sunday, July 10, 2011

Sudoku game or humans vs. machines (computers in this case)

Few days ago, a friend of mine posted the following link on his wall on Facebook. The main question from that article was "how long will it take to crack ...the world's hardest Sudoku?". I felt like being challenged.

I am not a Sudoku fan, however my wife and my son are. Just to annoy them (not because I am a bad person :), but because I was looking for a way to sort the Sudoku puzzles in a quicker way, when being asked to assist them ... so, "I am lazy" is a more precise definition) and because I am a software engineer, I wrote a long time ago (of course I was inspired, it's not entirely my contribution) a little piece of Java code to resolve such puzzles. As a result, I decided to test it.

Here is the code:

import java.util.*;

public class ResolveSudoku {
 private static final int[] sudokuMatrix = {0,0,0,0,0,0,0,0,0,

 // load the content from file
 public static void load(String file) throws Exception {
  String str = null;
  int line = 0;
  BufferedReader br = new BufferedReader(new FileReader(file));

  while ((str = br.readLine()) != null) {
   if (str.trim().length() == 0 || str.startsWith("#")) 

   if (line >= 9) {
    line = 10;

   String[] vls = str.split(",");
   if (vls.length != 9) {
    throw new Exception("Line " + (line + 1) + 
     " has illegal number of elements!");

   for (int i = 0; i < 9; ++i) {
    int value = Integer.parseInt(vls[i].trim());
    if (value < 0 || value > 9) {
     throw new Exception("Line " + (line + 1) + 
      " position " + (i + 1) + " has illegal value!");
    sudokuMatrix[line*9 + i] = value;


  if (line != 9) throw new Exception("File " + file + 
   " has illegal number of lines!");

 // print the content of the sudoku matrix
 public static void printMatrix() {
  for (int i = 0; i < 81; ++i) {
   if (i != 0 && i % 3  == 0) System.out.print(" ");
   if (i != 0 && i % 9  == 0) System.out.println();
   if (i != 0 && i % 27 == 0) System.out.println();

   if (sudokuMatrix[i] == 0) System.out.print(".");
   else System.out.print(sudokuMatrix[i]);

 private static boolean check(int row, int col, int value) {
  // check if row is ok
  int tmp = row * 9;
  for (int i = 0; i < 9; ++i) {
   if (sudokuMatrix[tmp + i] == value) 
    return false;

  // check if column is ok
  for (int i = 0; i < 9; ++i) {
   if (sudokuMatrix[i*9 + col] == value) 
    return false;

  // check if box is ok
  tmp = (row / 3)*27 + (col / 3)*3;
  if (sudokuMatrix[tmp] == value) return false;
  if (sudokuMatrix[++tmp] == value) return false;
  if (sudokuMatrix[++tmp] == value) return false;

  tmp += 7;
  if (sudokuMatrix[tmp] == value) return false;
  if (sudokuMatrix[++tmp] == value) return false;
  if (sudokuMatrix[++tmp] == value) return false;

  tmp += 7;
  if (sudokuMatrix[tmp] == value) return false;
  if (sudokuMatrix[++tmp] == value) return false;
  if (sudokuMatrix[++tmp] == value) return false;

  return true;

 public static boolean findSolution(int row, int col) {
  if (row == 9) {
   row = 0;
   if (++col == 9) return true;

  int pos = row * 9 + col;
  if (sudokuMatrix[pos] != 0) return findSolution(row + 1, col);

  for (int i = 1; i <= 9; ++i) {
   if (check(row, col, i)) {
    sudokuMatrix[pos] = i;
    if (findSolution(row + 1, col)) return true;

  // reset the value back to zero 
  sudokuMatrix[pos] = 0; 
  return false;

 public static void main(String[] args) throws Exception {

  long start = System.currentTimeMillis();
  boolean res = findSolution(0, 0);
  long ex_time = System.currentTimeMillis() - start;

  if (res) printMatrix();
  else System.out.println("No solution!");

  System.out.println("Execution time: " + ex_time + "ms.");

It's, basically, a routine that reads the puzzle's matrix from a file and another one that performs a brute force (i.e. analysing all the possible cases) in form of (I would say) an elegant recursive method call. Having the following input:

The problem was solved in ~30 milliseconds (after the byte code was JIT-ed on my "2.13Ghz, Core 2 Duo with SSD on board" laptop). And the answer was:

145 327 698
839 654 127
672 918 543

496 185 372
218 473 956
753 296 481

367 542 819
984 761 235
521 839 764

Thursday, June 2, 2011

Regarding the square root of n

Have you ever caught yourself (assuming you like maths) proving that numbers like √ 2  or √ 5  etc. are irrationals. Here is, more or less, a general statement with the proof.

For ∀n ∈ N (natural numbers), either √ n  ∈ N or √ n  ∈ I (irrational numbers, i.e. real numbers minus rational numbers).

There are two possible cases:

1. If n = p2, p ∈ N ⇒ √ n  = p ∈ N

2. If n ≠ p2 or there is no such a p ∈ N that would satisfy n = p2, then √ n  is irrational.

Let's suppose the contrary, i.e. there ∃p,q ∈ N and (p,q) = 1 (greatest common divisor) so that
This means that n·q2 = p2. From (p,q) = 1 ⇒ (p2,q2) = 1.

From the B├ęzout’s theorem, there ∃z,t ∈ Z (integers) so that
z·p2 + t·q2 = 1 ⇔ z·n·q2 + t·q2 = 1 ⇔
q2·(z·n + t) = 1
which means q2 divides 1, but this is possible only if q = 1. As a result n = p2 - contradiction with our initial assumption.

This proves the statement.

Monday, March 14, 2011

System Analyst vs. Business Analyst

People tend to confuse these two terms, the purpose of this article is to clarify them. Business Analyst is a person that is part of the business (which isn't necessarily IT related), knows the business and can articulate what business wants in form of the business requirements. System Analyst is a person with IT background, who knows how IT systems work, how to translate business requirements into technical requirements/artefacts so that they are understandable by the technical team; system designers, developers and testers. I didn't include architects to the list, because system analysis, as a practise, is part of the system architecture.

So, what is typically (this isn't a firm rule) expected from a System Analyst to deliver?
- Functional Requirements
- Non-functional Requirements
- Use Cases
- Wireframes with descriptions of each UI element
- Entity diagrams.

Traceability is another important word for the System Analyst (and not only). It is a way of showing how various artefacts are implementing/realising each other (e.g. use cases realising requirements etc.). In order to produce all these artefacts, it is recommended to use one of the existing, industry recognized UML designing tools. This can be Rational Rose, Sparx Enterprise Architect or anything suitable.

A typical process looks like:

1. A new model is created using the design tool.

2. Business requirements are imported/added to the model.

3. Business requirements, if not detailed enough, are extended with more detailed functional and non-functional requirements, gathered during various sessions with SME (Subject Matter Experts).

4. Functional/Non-functional and business requirements are linked (typically using "association" connector) for traceability reasons.

5. Use cases are elaborated and added to the model. As per the industry standard (check with Google), each use case must have one Basic Path with up to ten sequences/events and zero or more Alternative Paths.

6. For traceability reasons, use cases are linked (typically using "realize" connector) with the requirements they are realising.

7. Wireframes are elaborated and added to the model (I must add, if the tool supports this, e.g. Sparx Enterprise Architect). Wireframes represent the draft versions of the User Interfaces required from the System. The detailed mock-ups are typically elaborated by the graphical designers from the wireframes.

8. Wireframes are linked (typically using "realize" connector) with the use cases they are realising.

9. High level entity diagrams are elaborated and added to the model.

10. Wireframes' UI form elements are mapped (typically using "realize" or "association" connectors) with the relevant attributes of the relevant entities. This is a great way to highlight the re-usable elements to the system designers.

11. All the acronyms, business terms and abbreviations are added to the Model Glossary, this is mandatory regardless of the fact that the tool supports this (e.g. Sparx Enterprise Architect) or not (in which case it can be a separate document).

The following picture is just an example of the expectation.

And don't forget, all these artefacts must be reviewed by SME!

Few remarks:

1. Mock-ups, elaborated by the graphical designers, must follow the User Interface functional/non-functional requirements. For instance DDA (Disability Discrimination Act) requirements/standards, if it is the case.

2. Entity diagrams, provided by the system analyst, aren't expected to be detailed class or database diagrams, this is an exercise for the system designers. They must be part of, what I call, the "requirements language".

For instance, if the business requirement says that "the super user must approve the document before it is published", then there will be an entity Document that will, probably, have a "status" attribute mapped (aggregation or composition) to an enumeration with the elements "DRAFT, READY, APPROVED". The relevant use case (or use cases) will operate with these terms.