Sunday, November 11, 2012

Tackling Andrica's conjecture. Part 2

With the previous post, I stopped at the argument that $f_{4}(x)=\pi (x) - \sqrt{x}$ seems to be ascending for prime arguments. Apparently, this is a necessary and sufficient condition, i.e.:
Lemma 1. Andrica's conjecture is true iff function $f_{4}(x)=\pi (x) - \sqrt{x}$ is strictly ascending ($x < y \Rightarrow f(x) < f(y)$) for prime arguments.
The proof is given in the following discussion with the math.stackexchange.com community.
It is worth mentioning that $f_{4}(x)$ is descending in between prime numbers. For example for $\forall x\in \left ( p_{n}, p_{n+1} \right )$, $f_{4}(x)=n - \sqrt{x}$ and ${f_{4}}'(x)=-\frac{1}{2\cdot \sqrt{x}} < 0$.
In fact the purpose of the discussion was to prove that $$\sqrt{p_{n}} < n$$ which turns to be true for $\forall n \geq 2$, using Rosser's theorem.
Interesting result, out of this inequality, is that $$\sqrt{p_{n}} - \sqrt{p_{1}} < n-1$$ because $\sqrt{p_{1}}=\sqrt{2} > 1$. However, $$\sum_{k=2}^{n}\left ( \sqrt{p_{k}} - \sqrt{p_{k-1}} \right )=\sqrt{p_{n}} - \sqrt{p_{1}} < n-1$$ and as a result $$\frac{1}{n-1}\cdot \sum_{k=2}^{n}\left ( \sqrt{p_{k}} - \sqrt{p_{k-1}} \right ) < 1$$ or, in other words, on average Andrica's conjecture seems to be true.
But, leaving the statistical argument aside, how do we prove that $f_{4}(x)$ is ascending for prime arguments?
I don't have the answer yet, but here is where I stuck ...
Let's have a look at the following function $$f_{5}(x)=1 - \sqrt{p_{n+1} + x} + \sqrt{p_{n} + x}$$ This function is ascending $\forall x \geq -p_{n}$ $${f_{5}(x)}'= -\frac{1}{2\cdot \sqrt{p_{n+1} + x}} + \frac{1}{2\cdot \sqrt{p_{n} + x}} > 0$$ because $\sqrt{p_{n+1} + x} > \sqrt{p_{n} + x}$, for $\forall x \geq -p_{n}$.
This function also has a zero, $1=\sqrt{p_{n+1} + x} - \sqrt{p_{n} + x} \Leftrightarrow $ $$1 = p_{n+1}+p_{n} +2\cdot x -2\cdot \sqrt{(p_{n+1} + x)\cdot (p_{n} + x)} \Leftrightarrow $$ $$2\cdot \sqrt{(p_{n+1} + x)\cdot (p_{n} + x)}=p_{n+1}+p_{n} +2\cdot x - 1 \Leftrightarrow $$ $$4\cdot (p_{n+1} + x)\cdot (p_{n} + x)=p_{n+1}^{2}+p_{n}^{2}+4\cdot x^{2}+1+2\cdot p_{n+1}\cdot p_{n}+$$ $$+4\cdot p_{n+1}\cdot x-2\cdot p_{n+1}+4\cdot p_{n}\cdot x-2\cdot p_{n}-4\cdot x \Leftrightarrow $$ jumping ahead $$2\cdot p_{n+1}\cdot p_{n}=p_{n+1}^{2}+p_{n}^{2}+1-2\cdot p_{n+1}-2\cdot p_{n}-4\cdot x \Leftrightarrow $$ $$4\cdot x=(p_{n+1}-p_{n})^{2}-2\cdot (p_{n+1}+p_{n})+1=(p_{n+1}-p_{n}-1)^{2}-4\cdot p_{n}$$ as a result $$x_{0}=\left( \frac{p_{n+1}-p_{n}-1}{2} \right)^{2}-p_{n}$$ Obviously, $x_{0} > -p_{n}$.
And now:
Lemma 2. $f_{4}(p_{n+1}) > f_{4}(p_{n})$ iff $x_{0} < 0$, where $x_{0}$ is zero of $f_{5}(x)$.
First of all $$x_{0} < 0 \Leftrightarrow p_{n+1}-p_{n} < 2\cdot \sqrt{p_{n}}+1$$
As a result, if $f_{4}(p_{n+1}) > f_{4}(p_{n})$ this is equivalent (from Lemma 1) with $\sqrt{p_{n+1}}-\sqrt{p_{n}} < 1$ (Andrica's inequality for $p_{n}, p_{n+1}$). But then $$p_{n+1} - p_{n}=(\sqrt{p_{n+1}}-\sqrt{p_{n}})\cdot (\sqrt{p_{n+1}}+\sqrt{p_{n}}) < \sqrt{p_{n+1}}+\sqrt{p_{n}}$$ and applying Andrica's inequality again $$p_{n+1} - p_{n} < \sqrt{p_{n}}+1+\sqrt{p_{n}}=2\cdot \sqrt{p_{n}}+1$$ so $x_{0} < 0$.
Now, if $x_{0} < 0$, where we know that $x_{0} > -p_{n}$, and the fact that $f_{5}(x)$ is ascending then we have $$x_{0} < 0 \Rightarrow 0=f_{5}(x_{0}) < f_{5}(0)=1 - \sqrt{p_{n+1}} + \sqrt{p_{n}}$$ or $$\sqrt{p_{n+1}}-\sqrt{p_{n}} < 1$$ and then (Lemma 1) $f_{4}(p_{n+1}) > f_{4}(p_{n})$.
Do we know if $p_{n+1}-p_{n} < 2\cdot \sqrt{p_{n}}+1$ for $\forall n$?
Not yet, but we are close. One result, proved by Martin Huxley, states that $p_{n+1}-p_{n} < p_{n}^{\theta }$ for $\theta > \frac{7}{12}$ and sufficiently large $n$. In our case, $\theta =0.5$
Another result by Baker, Harman and Pintz states that $\theta$ may be taken to be 0.525.

Sunday, September 23, 2012

Tackling Andrica's conjecture. Part 1

Andrica's conjecture is one of those mathematical statements which are extremely easy to formulate, but complicated to prove. In fact it hasn't been proved yet, to be more precise. Here we go, for all consecutive prime numbers $p_{n}$ and $p_{n+1}$, it happens that: $$\sqrt{p_{n+1}} - \sqrt{p_{n}}< 1$$

1. The first attempt

Let's start with an analogy, here is an inequality $p_{n+1} < 2\cdot p_{n}$ which is a direct result from Bertrand's postulate, proved by Chebyshev. As a result of this inequality, we have: $$0 < \ln(p_{n+1}) - \ln(p_{n}) < \ln(2) < 1$$ But what is the relationship between natural logarithm and square root functions?

Well, let's have a look at the following function: $f_{1}(x)=\sqrt{x} - \ln(x)$. First derivative of this function is ${f_{1}}'(x) = \frac{1}{2\cdot \sqrt{x}} - \frac{1}{x}=\frac{1}{x}\cdot \left ( \frac{\sqrt{x}}{2} -1\right )\geq 0$, for $\forall x\geq 4$. This means that $f_{1}(x)$ is ascending for $\forall x\geq 4$. As a result, for $\forall p_{n}\leq p_{n+1}$, $n\geqslant 3$ (because $p_{3}=5$) $\Rightarrow f_{1}(p_{n}) \leq f_{1}(p_{n+1})$ or $$\sqrt{p_{n}} - \ln(p_{n}) \leq \sqrt{p_{n+1}} - \ln(p_{n+1})$$ Finally: $$0< \ln(p_{n+1}) - \ln(p_{n}) \leq \sqrt{p_{n+1}} - \sqrt{p_{n}}, \forall n\geq 3$$ As we can see, the analogy isn't of any use really, though we have a nice inequality.

2. The second attempt

Let's have a look at another function: $f_{2}(x)=\frac{x}{\ln(x)} - \sqrt{x}$. Its first derivative is $${f_{2}}'(x)=\frac{\ln(x)-1}{(\ln(x))^{2}}-\frac{1}{2\cdot \sqrt{x}}=\\ \frac{1}{2\cdot \sqrt{x}\cdot (\ln(x))^{2}}\cdot \left ( 2\cdot \sqrt{x}\cdot \ln(x)-2\cdot \sqrt{x}-(\ln(x))^{2} \right )$$

For the first term we have: $\frac{1}{2\cdot \sqrt{x}\cdot (\ln(x))^{2}}> 0$, for $\forall x> 0$.

Let's check the second term: $\left ( 2\cdot \sqrt{x}\cdot \ln(x)-2\cdot \sqrt{x}-(\ln(x))^{2} \right )=$ $\left (-x+2\cdot \sqrt{x}\cdot \ln(x)-(\ln(x))^{2}-2\cdot \sqrt{x}+x \right )=$ $\left [ -\left ( x-2\cdot \sqrt{x}\cdot \ln(x)+(\ln(x))^{2} \right ) +\left ( 1- 2\cdot \sqrt{x}+x\right )-1\right ]=$ $\left [ \left ( \sqrt{x}-1\right )^{2}-\left ( \sqrt{x}-\ln(x) \right )^{2} -1\right ]$ and we are not done yet. Using $a^{2}-b^{2}=(a-b)\cdot (a+b)$, we have: $\left ( 2\cdot \sqrt{x}\cdot \ln(x)-2\cdot \sqrt{x}-(\ln(x))^{2} \right )=$ $\left [ \left ( \ln(x)-1 \right )\cdot \left ( \sqrt{x}-1+\sqrt{x}-\ln(x) \right )-1 \right ]$.

What we have so far:

  • $\ln(x)-1 \geq 1$, for $\forall x \geq e^{2}$
  • $\sqrt{x}-1 \geq 1$, for $\forall x \geq 4$
  • $\sqrt{x}-\ln(x)>0$, for $\forall x \geq 4$, this is, by the way, $f_{1}(x)$, so $f_{1}(x)\geq f_{1}(4)>0$
Altogether: $\left ( 2\cdot \sqrt{x}\cdot \ln(x)-2\cdot \sqrt{x}-(\ln(x))^{2} \right )\geq 0$, for $\forall x\geq e^{2}$.

This means ${f_{2}}'(x)\geq 0$ and $f_{2}(x)$ is ascending, for $\forall x\geq e^{2}$. As a result, for $\forall p_{n}\leq p_{n+1}$, $n\geqslant 3$ (the cases for $p_{3}=5$, $p_{4}=7$ and $p_{5}=11$ can be verified with a calculator) $\Rightarrow f_{2}(p_{n}) \leq f_{2}(p_{n+1})$ or $$\frac{p_{n}}{\ln(p_{n})} - \sqrt{p_{n}}\leq \frac{p_{n+1}}{\ln(p_{n+1})} - \sqrt{p_{n+1}}$$ Final result is (let's note it as (2)): $$ \sqrt{p_{n+1}}- \sqrt{p_{n}}\leq \frac{p_{n+1}}{\ln(p_{n+1})} - \frac{p_{n}}{\ln(p_{n})}, \forall n\geq 3$$

2.1 A negative result of the second attempt

It is worth mentioning that function $\frac{x}{\ln(x)}$ has a special role in the number theory. According to the Prime Number Theorem: $\displaystyle\smash{\lim_{x \to \infty }}\tfrac{\pi (x)}{x/\ln(x)}=1$, where $\pi (x)$ is the prime counting function.

Obviously, $\displaystyle\smash{\lim_{n \to \infty }}\tfrac{\pi (p_{n})}{p_{n}/\ln(p_{n})}=1$ and $\displaystyle\smash{\lim_{n \to \infty }}\tfrac{\pi (p_{n+1})}{p_{n+1}/\ln(p_{n+1})}=1$ (as sub-sequences).

Also $\pi (p_{n})=n$ and $\pi (p_{n+1})-\pi (p_{n})=1$.

If we put all these together, does it mean that $\displaystyle\smash{\lim_{n \to \infty }} \left ( \frac{p_{n+1}}{\ln(p_{n+1})} - \frac{p_{n}}{\ln(p_{n})} \right )=1$?

Nope, it doesn't, check this link for more details.

2.2 A positive result of the second attempt

According to the following article, in 2005 it was proved that $\displaystyle\smash{\lim_{n \to \infty }}\inf \frac{g_{n}}{\ln(p_{n})}=0$, where $g_{n}=p_{n+1}-p_{n}$ or the n-th gap.

It is also obvious that $\frac{p_{n+1}}{\ln(p_{n+1})} < \frac{p_{n+1}}{\ln(p_{n})}$ (because $p_{n} < p_{n+1}$ and $\ln(p_{n}) < \ln(p_{n+1})$). As a result: $$ \sqrt{p_{n+1}}- \sqrt{p_{n}}\leq \frac{p_{n+1}}{\ln(p_{n+1})} - \frac{p_{n}}{\ln(p_{n})}\leq \frac{g_{n}}{\ln(p_{n})}, \forall n\geq 3$$ Or $$\displaystyle\smash{\lim_{n \to \infty }}inf\left ( \sqrt{p_{n+1}}- \sqrt{p_{n}} \right )=0$$ This means that Andrica's conjecture is true for an infinity pairs of $p_{n}$ and $p_{n+1}$. It doesn't prove the conjecture though, which states "it is true for all $p_{n}$ and $p_{n+1}$"

3. The third attempt

How about the following function $f_{3}(x)=\pi (x) - \frac{x}{\ln(x)}$? Well, for this case I used a little Python program, the code is here. First of all, $f_{3}(23)=1.66463325521 > f_{3}(29)=1.38774807317$. So, $f_{3}(x)$ is not ascending. The general tendency of $f_{3}(x)$ seems to be ascending though, according to this graphic (click to enlarge):

At least, for the areas where the function is ascending for consecutive primes (update the program to plot just plt.plot(x1, y1), i.e. primes and values of the function for the prime arguments), Andrica's conjecture is true, because if $$\pi (p_{n+1}) - \frac{p_{n+1}}{\ln(p_{n+1})} \geq \pi (p_{n}) - \frac{p_{n}}{\ln(p_{n})}$$ then (also consider (2)) $$1=\pi (p_{n+1}) - \pi (p_{n}) \geq \frac{p_{n+1}}{\ln(p_{n+1})} - \frac{p_{n}}{\ln(p_{n})}$$ Still, this is not a proof.

4. What's next?

Well, probably the next logical thing is to analyse the function $f_{4}(x)=\pi (x) - \sqrt{x}$. At least it seems to be ascending for prime arguments.

But the work is still in progress...

Tuesday, May 22, 2012

Factoring big numbers and RSA

Last week I came across this article. It's an interesting report indeed and with this post I will explain what is the implication. But first of all, let's quickly recall what RSA encryption is.

There is a message m which needs to be encrypted. There is a public key (e, n) and a private key (d, n). Message is encrypted in the following way c = me (mod n). Cipher text is decrypted in the following way m = cd (mod n). Puting all these together me⋅d ≡ m (mod n) and if m and n are co-prime then me⋅d-1 ≡ 1 (mod n). n is selected to be a product of two distinct prime numbers n = p⋅q and, according to the Euler's theorem, mφ(n) ≡ 1 mod n. Now e and d (exponents of the public/private keys) are selected so that e⋅d - 1 ≡ 0 (mod φ(n)) or e⋅d - 1 = k⋅φ(n), k - integer.

Because n is a product of really big prime numbers (thus n is a big number itself), then factoring it is a really complicated problem, considering "limited" computational power of the modern computers and the fact that factoring is an exponential (in terms of complexity) problem. As a result, computing φ(n), which in this particular case is φ(n) = (p - 1)⋅(q - 1), is also a complicated problem.

However, computing the greatest common divisor (GCD) isn't a complicated problem. As a result, if two public keys (e1, n1) and (e2, n2) are spotted, so that GCD(n1, n2) > 1, then decoding the relevant messages m1 and m2 becomes and easy task and this is the topic of the article I have mentioned at the top of this post.

Now, let's do a quick test. Imagine the following two public keys and associated cipher texts:

e1 = 65537
n1 = 59271838373237712241010698426785545947980750376894660532845611609385295
     49357464245996603984250810283460055082118943354872215289998388427726673
     74160929852573051680099378617005092405110706474184136037559125038434888
     56979904991517729100725512850421664634705274281314737938901139871448406
     073842088742598680079967
cipher1 = "J\u00c1R\u0090\u00e1\u00f4\u008b5My\u00f8\u00a1\u00f4>\u00a2\u00c3\u0010
           \u00bd\u00eb\u00cc&\u007fb\u001aC$\u001d\u00c5\u00b7\u00cdz\u00b7\u0017
           \u008a#9\u0012\u0089\u00feao\u0019\u009c\u00eb\u00b0>\u0086\u009b
           \u001d3~b0-u\u00fc\u0004!\rc\\\u00cb$\u0091\u009e\u00a1N\u009d2\u00ff
           \u0019\u009a9vH.\u00d5\u00e7m\u00a9m\u00ea^\u00d3T$\u00d7\u00d7\u0011
           \u0081\u00e4B\u009b~\u008c$\u00a6K\u008a\u00dc`\u00b4\u009cu\u00fb\u00c2
           \u0006\u00d1\u00bb\u00b9\u00a0\u008f\u00d2\u00bc\u0002\u00f6#\u001f\u001dM
           \u00bb\u0098\u00f2\u00a0\u009fO\u0080"

e2 = 65537
n2 = 72216220929585874029800250341144628594049328751082632793975238142970345
     80195859400832155769761460789049220801438471143407662437503457520665980
     33488377571129629910281750410842883648532072455460838627134172456428247
     65387577331828704441227356582723560291425753389466410790421096831823015
     438162111864463275922441
cipher2 = ".\u00fd9\u008dc\u00da\u00f9o\u00f5Vl\u00fb\u0087\u00ed\u00d5 \u00ee\u00cf
           \u0097~\u00d8T\u00f9.\u0018\u00b1\u00d5n^\u00a0\rA\u00e0\u001d\u00d5
           \u00c8:D\u00c9\u0014o\u00de\u00dbo\u00f9>)bc'a\u00a2\u008e\u00c1|\u00dd
           \r[q1\u00ac\u000f^\u0082b/A\u0010\u0087\u00ff\u00e4k=\u00c8\u00d6\u001c\u007f
           \u00fb\u00db\u00da&\u00d9\u00c5\u00c4\u008a#\u00a0u\u0003J&\n\u0083
           \u00a0\u00e1.\u00ba\u00fd\u008a0s?\u00deg\u00d50\u0015\u00eb\u0091
           \u00b3E\u00c7\u0015O\u00f3r\u00e3`~8\u00b4\u00b5=\u0089U\u007f\u00fa\u0019"

Check this link for the easiest implementation of the GCD algorithm. I won't implement it, because Java has it out of the box, as part of the BigInteger class. Here is my code:

import java.math.BigInteger;
import java.security.MessageDigest;
import java.util.Arrays;

public class RSATest {
 private static final long MAX_ITER = 100000;
 private static final int MAX_TRY = 5;
 private static final int BLOCK = 128;

 // hash function used by OAEP deoceder
 private static byte[] hash(byte[] b) throws Exception {
  assert b.length == BLOCK/2;

  MessageDigest algo = MessageDigest.getInstance("SHA-512");
  algo.reset();
  algo.update(b);
  return algo.digest();
 }

 // converts a String to a BigIntteger
 // convertion is performed as if the String were 
 // in base 256
 public static BigInteger stringToInt(String str) {
  int sz = str.length();

  BigInteger ret = new BigInteger("0");
  BigInteger v256 = new BigInteger("256"); // the base
  BigInteger mul = new BigInteger("1");

  for (int i = sz - 1; i >= 0; --i) {
   int v = str.charAt(i);
   ret = ret.add((new BigInteger("" + v)).multiply(mul));
   mul = mul.multiply(v256);
  }

  return ret;
 }


 // a simple method for xor-ing to byte arrays
 private static byte[] xor(byte[] b, byte[] a) {
  assert b.length == BLOCK/2;
  assert b.length == a.length;

  byte[] ret = new byte[BLOCK/2];
  for (int i = 0; i < BLOCK/2; ++i) ret[i] = (byte) (a[i] ^ b[i]);
  return ret;
 }

 // decodes an OAEP array, google for OAEP padding
 private static byte[] oaepDecode(byte[] b) throws Exception { 
  assert b.length == BLOCK;

  byte[] first  = Arrays.copyOfRange(b, 0, BLOCK/2);
  byte[] second = Arrays.copyOfRange(b, BLOCK/2, BLOCK);
 
  byte[] r = xor(second, hash(first));
  byte[] m = xor(first,  hash(r));

  return m;
 }

 // converts a BigInteger into a String
 // if withOAEP is set to true, then also applies
 // OAEP decoding
 public static String intToString(BigInteger n, boolean withOAEP) 
     throws Exception {
  byte[] b_r_n = n.toByteArray();
  int b_sz = b_r_n.length;

  // apply OAEP decoding
  if (withOAEP) {
   if (b_sz > BLOCK) {
    // make sure we pass the right size, i.e. of one BLOCK
    b_r_n = Arrays.copyOfRange(b_r_n, b_sz - BLOCK, b_sz);
   }
   b_r_n = oaepDecode(b_r_n);
  }
  return new String(b_r_n);
 }

 // a simple decoder with the private key (d, n)
 public static void decode(BigInteger d, BigInteger n, String cipherText) 
     throws Exception {
  BigInteger cipherInt  = stringToInt(cipherText);
  BigInteger messageInt = cipherInt.modPow(d, n);

  // a simple note that if messageInt is a solution for the
  // above modulo congruence eq, then messageInt + k*n is also 
  // a solution

  for (int i = 0; i < MAX_TRY; ++i) {
   System.out.println("Try: " + i);
   System.out.println("Integer message: " + messageInt);
   System.out.println("String message (flat) = " + 
    intToString(messageInt, false));
   System.out.println("String message (oaep) = " + 
    intToString(messageInt, true));
   System.out.println();
   messageInt = messageInt.add(n);
  }
 }

 // this method attempts to decode the cipher by pretending 
 // it knows "p" which is a composition of "n", i.e. n = p*q
 public static void decode_with_p(long e, BigInteger n, String cipher, 
     BigInteger p) throws Exception {
  BigInteger q = n.divide(p);

  BigInteger phi = q.subtract(BigInteger.ONE).multiply(
     p.subtract(BigInteger.ONE));

  BigInteger E = new BigInteger("" + e);
  for (long k = 0; k < MAX_ITER; k++) {
   BigInteger r = phi.multiply(
    new BigInteger("" + k)).add(BigInteger.ONE);

   BigInteger[] qr = r.divideAndRemainder(E);
   if (qr[1].equals(BigInteger.ZERO)) {
    // remainder is ZERO
    System.out.println("Cycle: " + k);
    System.out.println("Try private key: " + qr[0]);
    decode(qr[0], n, cipher);
    System.out.println();
   }
  }
 }

 public static void main(String[] args) throws Exception {
  long e1 = 65537;
  BigInteger n1 = new BigInteger( 
   "592718383732377122410106984267855459479807503768946" +
   "605328456116093852954935746424599660398425081028346" +
   "005508211894335487221528999838842772667374160929852" +
   "573051680099378617005092405110706474184136037559125" +
   "038434888569799049915177291007255128504216646347052" +
   "74281314737938901139871448406073842088742598680079967");

  String cipher1 = 
   "J\u00c1R\u0090\u00e1\u00f4\u008b5My\u00f8\u00a1\u00f4>" +
   "\u00a2\u00c3\u0010\u00bd\u00eb\u00cc&\u007fb\u001aC$" +
   "\u001d\u00c5\u00b7\u00cdz\u00b7\u0017\u008a#9\u0012" +
   "\u0089\u00feao\u0019\u009c\u00eb\u00b0>\u0086\u009b" +
   "\u001d3~b0-u\u00fc\u0004!\rc\\\u00cb$\u0091\u009e" +
   "\u00a1N\u009d2\u00ff\u0019\u009a9vH.\u00d5\u00e7m" +
   "\u00a9m\u00ea^\u00d3T$\u00d7\u00d7\u0011\u0081\u00e4B" +
   "\u009b~\u008c $\u00a6K\u008a\u00dc`\u00b4\u009cu" +
   "\u00fb\u00c2\u0006\u00d1\u00bb\u00b9\u00a0\u008f\u00d2" +
   "\u00bc\u0002\u00f6#\u001f\u001dM\u00bb\u0098\u00f2" +
   "\u00a0\u009fO\u0080";

  long e2 = 65537;
  BigInteger n2 = new BigInteger( 
   "722162209295858740298002503411446285940493287510826" +
   "327939752381429703458019585940083215576976146078904" +
   "922080143847114340766243750345752066598033488377571" +
   "129629910281750410842883648532072455460838627134172" +
   "456428247653875773318287044412273565827235602914257" +
   "53389466410790421096831823015438162111864463275922441");

  String cipher2 = 
   ".\u00fd9\u008dc\u00da\u00f9o\u00f5Vl\u00fb\u0087\u00ed" +
   "\u00d5 \u00ee\u00cf\u0097~\u00d8T\u00f9.\u0018\u00b1" +
   "\u00d5n^\u00a0\rA\u00e0\u001d\u00d5\u00c8:D\u00c9" +
   "\u0014o\u00de\u00dbo\u00f9>)bc'a\u00a2\u008e\u00c1|" +
   "\u00dd\r[q1\u00ac\u000f^\u0082b/A\u0010\u0087\u00ff" +
   "\u00e4k=\u00c8\u00d6\u001c\u007f\u00fb\u00db\u00da&" + 
   "\u00d9\u00c5\u00c4\u008a#\u00a0u\u0003J&\n\u0083" +
   "\u00a0\u00e1.\u00ba\u00fd\u008a0s?\u00deg\u00d50\u0015" +
   "\u00eb\u0091\u00b3E\u00c7\u0015O\u00f3r\u00e3`~8\u00b4" + 
   "\u00b5=\u0089U\u007f\u00fa\u0019";

  // this is really quick
  BigInteger p = n1.gcd(n2);
  System.out.println("p = " + p);
  System.out.println();

  decode_with_p(e1, n1, cipher1, p);
  decode_with_p(e2, n2, cipher2, p);
 }
}

Execute it with the following command line:

java -ea RSATest > results.txt

Inside the garbage, within the results.txt file, you should notice the following:

String message (oaep) = What could go wrong: http://pastebin.com/hNz9gZbe
...
String message (oaep) = A real example: http://digitaloffense.net/tools/debian-openssl

To conclude, take care when you generate the keys for RSA encryption (and yes, go through this link)!

If you'd like to go through more such quests, try this course (a really nice one).

Monday, April 30, 2012

A few words about ConcurrentHashMap

There is a popular belief that, because ConcurrentHashMap is part of the java.util.concurrent package then it is designed to deal with multi-threading only. However, ConcurrentHashMap also has quite a useful property that I will exploit in this post, without mentioning multi-threading at all.

Let's start with the following code:

import java.util.Map;
import java.util.HashMap;

public class HashMapTest {
 // creates a map with the keys from an array and
 // initializes the values with zero 
 public static Map<String, Integer> prepareMap(String[] arr) {
  Map<String, Integer> map = new HashMap<String, Integer>();
  for (String str : arr) {
   map.put(str, new Integer(0));
  }
  return map;
 }

 // updates the map, i.e. increases the values by one
 public static void updateMap(Map<String, Integer> map) {
  for (String str : map.keySet()) {
   Integer i = map.get(str);
   map.put(str, new Integer(i.intValue() + 1));
  }
 }

 // prints the map content
 public static void printMap(Map<String, Integer> map) {
  for (String str : map.keySet()) {
   System.out.println(str + " -- " + map.get(str));
  }
 }

 static public void main(String[] args) {
  String[] arr = {"A", "B", "C", "D", "E", "F", "G", "H"};

  Map<String, Integer> map = prepareMap(arr);
  printMap(map);
  updateMap(map);
  printMap(map);
 }
}

Nothing complicated, we generate a map and process it within the updateMap method. Everything works fine at this point. However, let's imagine that we need a slightly more complicated processing of the map, e.g. if we encounter the key "C", then we need to add a new entry to the map, something like this:

 // updates the map, i.e. increases the values by one
 public static void updateMap(Map<String, Integer> map) {
  for (String str : map.keySet()) {
   Integer i = map.get(str);
   map.put(str, new Integer(i.intValue() + 1));
   if (str.equals("C")) {
    // we have a special case, let's treat it
    map.put("C1", new Integer(0));
   }
  }
 }

Now, when we execute the updated code, it fails (run time) with the following exception:

Exception in thread "main" java.util.ConcurrentModificationException
        at java.util.HashMap$HashIterator.nextEntry(Unknown Source)
        at java.util.HashMap$KeyIterator.next(Unknown Source)
        at HashMapTest.updateMap(HashMapTest.java:19)
        at HashMapTest.main(HashMapTest.java:55)

And this is exactly what we are told by the specification:

"Returns a Set view of the keys contained in this map. The set is backed by the map, so changes to the map are reflected in the set, and vice-versa. If the map is modified while an iteration over the set is in progress (except through the iterator's own remove operation), the results of the iteration are undefined."

In other words, HashMap (in fact, this isn't specific to HashMap only) doesn't allow for "updates while iterating", more specifically adding new entries with new keys.

One way to fix the problem is to create a copy of the key set, something like this:

import java.util.Map;
import java.util.HashMap;
import java.util.HashSet;

public class HashMapTest {
 // creates a map with the keys from an array and
 // initializes the values with zero 
 public static Map<String, Integer> prepareMap(String[] arr) {
  Map<String, Integer> map = new HashMap<String, Integer>();
  for (String str : arr) {
   map.put(str, new Integer(0));
  }
  return map;
 }

 // updates the map, i.e. increases the values by one
 public static void updateMap(Map<String, Integer> map) {
  HashSet<String> keys = new HashSet<String>(map.keySet());
  for (String str : keys) {
   Integer i = map.get(str);
   map.put(str, new Integer(i.intValue() + 1));
   if (str.equals("C")) {
    // we have a special case, let's treat it
    map.put("C1", new Integer(0));
   }
  }
 }

 // prints the map content
 public static void printMap(Map<String, Integer> map) {
  for (String str : map.keySet()) {
   System.out.println(str + " -- " + map.get(str));
  }
 }

 static public void main(String[] args) {
  String[] arr = {"A", "B", "C", "D", "E", "F", "G", "H"};

  Map<String, Integer> map = prepareMap(arr);
  printMap(map);
  updateMap(map);
  printMap(map);
 }
}

But, if we look at the output, we will mention that "C1" node is unprocessed:

    ...
    A -- 1
    B -- 1
    C -- 1
    C1 -- 0
    H – 1
    ...

As a result, we will need a new routine to process the unprocessed data. There is a different solution though, to use ConcurrentHashMap, which according to the specification:

"Retrievals reflect the results of the most recently completed update operations holding upon their onset."

And here is the final code:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class HashMapTest {
 // creates a map with the keys from an array and
 // initializes the values with zero 
 public static Map<String, Integer> prepareMap(String[] arr) {
  Map<String, Integer> map = new ConcurrentHashMap<String, Integer>();
  for (String str : arr) {
   map.put(str, new Integer(0));
  }
  return map;
 }

 // updates the map, i.e. increases the values by one
 public static void updateMap(Map<String, Integer> map) {
  for (String str : map.keySet()) {
   Integer i = map.get(str);
   map.put(str, new Integer(i.intValue() + 1));
   if (str.equals("C")) {
    // we have a special case, let's treat it
    map.put("C1", new Integer(0));
   }
  }
 }

 // prints the map content
 public static void printMap(Map<String, Integer> map) {
  for (String str : map.keySet()) {
   System.out.println(str + " -- " + map.get(str));
  }
 }

 static public void main(String[] args) {
  String[] arr = {"A", "B", "C", "D", "E", "F", "G", "H"};

  Map<String, Integer> map = prepareMap(arr);
  printMap(map);
  updateMap(map);
  printMap(map);
 }
}

All the entries are processed accordingly.

Now, let's check the efficiency, by executing the following code:

 static public void main(String[] args) {
  String[] arr = {"A", "B", "C", "D", "E", "F", "G", "H"};

  long total = 0;
  int max = 1000000;
  for (int i = 0; i < max; i++) {
   long start = System.currentTimeMillis();
   Map<String, Integer> map = prepareMap(arr);
   updateMap(map);
   long end = System.currentTimeMillis();
   total += end - start;
  }
  System.out.println("Average time " + ((1.0*total)/max) + "ms");
 }

Average for the ConcurrentHashMap version:

Average time 0.0022ms

Average for the HashMap and HashSet version:

Average time 0.0010ms

Indeed, ConcurrentHashMap brings some (worth to consider) performance penalties. But it isn't like it is twice slower than the HashMap and HashSet version, because in this example we are operating with a small number of entries:

String[] arr = {"A", "B", "C", "D", "E", "F", "G", "H"};

If we increase the number of entries to e.g. 259 (from "A0", "A1", ... to "Z9", excluding "C1"), then we have the following figures:

For the ConcurrentHashMap version:

Average time 0.048ms

For the HashMap and HashSet version:

Average time 0.033ms

Wednesday, February 29, 2012

The Game Of Life

Did I manage to drag your attention with the title? Good. Let's proceed to the lyrical part of this post then. A few years ago I watched this film and found the statistics starting at 1:16:56 being quite alarming. Not that I didn't know about most of those details, but it is totally different when you see them altogether. I encourage you to watch it as well. However, for the sake of this post, let's retain the following: "20% of the world's population consumes 80% of its resources".

Now, let's proceed to the technical part. My son started to learn Pascal at school and from time to time I give him "advanced" assignments, to polish his programming (and thinking) technique. After a few complaints from him, of the sort of "I am tired of doing nonsenses like calculate π, sorting etc. I want to do something ... real!" (well, he is just 12 years old), I decided to give him something "more real", a very basic Monte-Carlo simulation that I called "The Game Of Life". The reaction was quite unexpectedly expected, he got dragged by the title :)

So, what is the game about? Basically, let's imagine a world with limited resources (I don't think it is difficult to imagine this), which in the language of the game is called simply "wealth". We have a population of players with the total number of PLAYERS=100. Each player has two attributes, "wealth" and "power". The game starts with a equal distribution of wealth between the players and the "power=0" for each player. With each iteration, we can call it "a year", each player plays against a randomly selected player (players can't play with themselves). The rules for winning a turn are quite simple:

  • The player with greater "power" wins.
  • If the players have the same power, the richest one wins.
  • If the players have the same power and wealth, the winner is selected randomly.
  • The winner gets a reward, which means their "power" increases by 1 and if the loser has sufficient "wealth", the winner's "wealth" increases by 1, the loser's wealth decreases by 1 respectively (so we keep the resources constant).
The game continues for 10000 iterations.

Here is the code (slightly adjusted with my help):


program montecarlosimulation;
uses wincrt;
const
        PLAYERS = 100;
        YEARS = 10000;

var
        i,j:integer;
        x,y:integer;
        wealth:array[1..PLAYERS] of longint;
        power :array[1..PLAYERS] of longint;
        tmp:longint;
        poor:integer;

// p1 is the winner!!!
procedure reward(p1, p2:integer);
begin
        if wealth[p2] > 0 then begin
                wealth[p1] := wealth[p1] + 1;
                wealth[p2] := wealth[p2] - 1;
        end;
        power[p1] := power[p1] + 1;
end;

// decide who is the winner and apply the reward
procedure decision(a, b:integer):integer;
var
        d:integer;
begin
        // same power
        if power[b] = power[a] then begin
               // same wealth
                if wealth[a] = wealth[b] then begin
                        // pick the winner randomly
                        d := random(2);
                        if d = 0 then reward(a,b)
                        else reward(b,a);
                end
                else begin
                        if wealth[b] > wealth[a] then reward(b,a)
                        else reward(a,b);
                end;
        end
        else begin
                if power[b] > power[a] then reward(b,a)
                else reward(a,b);
        end;
end;

begin
        // initialise the players' values
        // total wealth is PLAYERS*1000
        for i := 1 to PLAYERS do begin
                wealth[i] := 1000;
                power[i] := 0;
        end;

        // the actual game
        randomize;
        for i := 1 to YEARS do begin
                for x := 1 to PLAYERS do begin
                        y := random(PLAYERS) + 1;
                        // the actual play
                        if x <> y then decision(y, x);
                end;
        end;

        // sorting the results, wealthiest on top
        for i := 1 to PLAYERS do begin
                for j:=i+1 to PLAYERS do begin
                        if wealth[i] < wealth[j] then begin
                                tmp := wealth[i];
                                wealth[i] := wealth[j];
                                wealth[j] := tmp;

                                tmp := power[i];
                                power[i] := power[j];
                                power[j] := tmp;
                        end;
                end;
        end;

        // printing the results
        poor := 0;
        for i := 1 to PLAYERS do begin
           if (wealth[i] = 0) then poor := poor + 1;
           writeln('person[', i,'] has ', wealth[i],' wealth and ',power[i],' power');
        end;

        // printing statistics
        writeln(poor, ' of ', PLAYERS, ' are poor.');
end.
This code was compiled using Free Pascal IDE.

Now, let's look at the results. After executing this code several times, the number of wealthy players (those with "wealth" > 0) is in between 20-23, from 100. This is ~20%. Setting YEARS = 15000 will decrease the result to 16%. But let's get back to the 20% figure, is this a coincidence with the 20% from the lyrical part on top of the post? The decision to set YEARS = 10000 was somewhat random, but later we (I and my son) figured out that ... check this out "Human prehistory begins in the Paleolithic Era, or "Early Stone Age". Later, during the Neolithic Era (New Stone Age), came the Agricultural Revolution(between 8000 and 5000 BCE) in the Fertile Crescent, where humans first began the systematic husbandry of plants and animals".

Couple of interesting facts as you may see. Obviously, the code is a way too simple to pretend to model a real game of life, e.g. the number of players is constant, there is no way for the poor players to ever win (which can happen in real life), lots of rich people are also philanthropists, "brain" (and "faith") attribute isn't considered at all, two or more poor players with total "power" greater than the "power" of a rich player can compete against that player, an "insufficiently" rich player can bribe a richer and more powerful player to win the play with another player etc. This, of course, will make the game more real and complicated (for my son, who still learns Pascal, while his father "plays" the game).

Anyway, if I managed to drag your attention to this subject, feel free to write a better version of "The Game Of Life" and drop me a note with your results. I am really curious.

Friday, January 27, 2012

Scope of variables in Java, nulling and why this might be important

While playing with some performance improvements this week, I came across this article. Weird you'd say, but let me show my version of the test scenario.

First of all, I am using Java 1.6.0_24 HotSpot 64-bit and executing the code with the following options (i.e. limiting the heap size to 64Mb):


java -Xms64m -Xmx64m MemTest

Here is the first version of the code:


public class MemTest {
 public static int avg(int[] ns) {
  long sum = 0;
  for (int i = 0; i < ns.length; i++) sum += ns[i];
  return (int) (sum/ns.length);
 }

 public static void main(String[] args) {
  int[] nums = null;

  for (int j = 0; j < 1000000; j++) {
   System.out.print("Iteration: ");
   System.out.println(j);

   int sz = 2<<(j % 30);
   if (sz > 11162611) sz = 11162611;

   System.out.print("Dynamic size: ");
   System.out.println(sz);

   nums = new int[sz];
   for (int i = 0; i < nums.length; i++) nums[i] = i;

   int avr = avg(nums);
   System.out.print("Average: ");
   System.out.println(avr);
  }
 }
}

There is nothing complicated in this code, we allocate an array (int[] nums) outside the main "for" loop (this is the key point of this post by the way) and calculate the average. Everything works fine so far. Now let's remove the "for" loop responsible for assignments:


public class MemTest {
 public static int avg(int[] ns) {
  long sum = 0;
  for (int i = 0; i < ns.length; i++) sum += ns[i];
  return (int) (sum/ns.length);
 }

 public static void main(String[] args) {
  int[] nums = null;

  for (int j = 0; j < 1000000; j++) {
   System.out.print("Iteration: ");
   System.out.println(j);

   int sz = 2<<(j % 30);
   if (sz > 11162611) sz = 11162611;

   System.out.print("Dynamic size: ");
   System.out.println(sz);

   nums = new int[sz];
   int avr = avg(nums);
   System.out.print("Average: ");
   System.out.println(avr);
  }
 }
}

Now the execution fails with the OutOfMemoryError. Odd, isn't it? What do we typically do in such cases? Increase the heap size? Well, but the previous version was working fine with the same heap size. Ok, now let's do something which isn't considered as a good practise by adding the line "nums = null;":


public class MemTest {
 public static int avg(int[] ns) {
  long sum = 0;
  for (int i = 0; i < ns.length; i++) sum += ns[i];
  return (int) (sum/ns.length);
 }

 public static void main(String[] args) {
  int[] nums = null;

  for (int j = 0; j < 1000000; j++) {
   System.out.print("Iteration: ");
   System.out.println(j);

   int sz = 2<<(j % 30);
   if (sz > 11162611) sz = 11162611;

   System.out.print("Dynamic size: ");
   System.out.println(sz);

   nums = new int[sz];
   int avr = avg(nums);
   nums = null; // <<-- HERE IS THE NEW LINE

   System.out.print("Average: ");
   System.out.println(avr);
  }
 }
}

Hey, fantastic, it works again! And now let's do it properly by changing the scope of the "int[] nums":


public class MemTest {
 public static int avg(int[] ns) {
  long sum = 0;
  for (int i = 0; i < ns.length; i++) sum += ns[i];
  return (int) (sum/ns.length);
 }

 public static void main(String[] args) {
  for (int j = 0; j < 1000000; j++) {
   System.out.print("Iteration: ");
   System.out.println(j);

   int sz = 2<<(j % 30);
   if (sz > 11162611) sz = 11162611;

   System.out.print("Dynamic size: ");
   System.out.println(sz);

   int[] nums = new int[sz]; // <<-- LOCAL SCOPE
   int avr = avg(nums);

   System.out.print("Average: ");
   System.out.println(avr);
  }
 }
}

And it works again.

So, why is this happening? I have no idea. It could be a particularity of the JVM implementation, as the author of the article, I mentioned at the top, suggests. But, obviously we can draw a conclusion, variable scope is something worth considering (and probably coding standards aren't always just "cosmetic" features).

Wednesday, January 25, 2012

Getting ready for Oracle Certified Professional Java SE Programmer?

Few years ago I passed SCJP (Sun Certified Java Programmer) certification and, during the preparation for the test, compiled a short, 18 pages, document containing some key facts required for passing the test successfully. I "accidently" found it in my archives and decided to share it, here is the link.

The document is a collection of various facts, gathered from various books/articles as well as code examples written by me while experimenting. It doesn't always clarify each topic with great details, but this wasn't the purpose of the document. So, if you struggle with a topic and want to know more, I always recommend to consult JLS (Java Language Specification).

You are more than welcome to use it and I will be happy to know if it helped you to pass the test (or at least you find it useful) ;)