
Saturday, April 27, 2024

A few thoughts on faster sorting

In this article I present a sorting schema, heavily inspired by the Bucket sort and Proxmap sort algorithms. I call it a "schema" rather than an "algorithm" simply because it adds a (or another, depending on the sorting algorithm) "divide and conquer" element to existing sorting algorithms, aiming (imho) to make them faster.

Let's also recall that most comparison-based sorting algorithms have the time complexity $${\displaystyle O(N \log{N})} \tag{1}$$

1. The description of the schema

Suppose we have a list $L$ with $N$ objects to sort. Now imagine we have $M$ buckets which are pre-sorted (the buckets, not their content) according to the same criteria used to sort the objects of the list $L$. Let's use strings as an example: if $$L=\{\text{BBBB}, \text{BBAA}, \text{AAAA},\text{...}\}$$ then we can bucket by the first letter of the string, giving us a total of 26 buckets for the letters of the English alphabet (and yes, let's ignore case sensitivity, digits and everything else to keep things simple). We keep these buckets sorted as $$B_1=\{A\}, B_2=\{B\},...,B_{26}=\{Z\}$$ Then we partition/distribute the content of $L$ across the buckets, sort the buckets individually and (finally) merge the content of the buckets into the final sorted list (a simple traversal of the buckets, since they are already sorted).

Here is what we get in terms of time complexity:

  • List $L$ traversal: ${\displaystyle O(N)}$
  • Distribution of objects into buckets must be ${\displaystyle O(1)}$ per object, e.g. using hash maps
  • Sorting each bucket (from $(1)$) is ${\displaystyle O\left(\frac{N}{M} \cdot\log{\frac{N}{M}}\right)}$. The assumption is that the content of the list $L$ is "uniformly" distributed (!!!) across the buckets.
  • Merging into the final list is ${\displaystyle O(M)}$, since the buckets are pre-sorted.
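To make the steps above concrete, here is a minimal sketch of the schema for upper-case ASCII strings, bucketing by the first letter ($M=26$); the class and method names are illustrative only and not from the PoC project:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BucketSchemaSketch {
    // Sorts a list of upper-case ASCII strings using 26 pre-sorted
    // buckets (A..Z), one per first letter.
    static List<String> sort(List<String> input) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < 26; i++) {
            buckets.add(new ArrayList<>());
        }
        // O(N): distribute, O(1) per object
        for (String s : input) {
            buckets.get(s.charAt(0) - 'A').add(s);
        }
        List<String> result = new ArrayList<>(input.size());
        // the buckets are pre-sorted, so a simple traversal merges them
        for (List<String> bucket : buckets) {
            Collections.sort(bucket); // O((N/M) log (N/M)) on average
            result.addAll(bucket);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sort(List.of("BBBB", "BBAA", "AAAA")));
    }
}
```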

2. The mathematical argument

Time complexity in the schema described above is $$ {\displaystyle O\left(N + M \cdot \frac{N}{M}\cdot \log{\left(\frac{N}{M}\right)}+M\right)} $$ which is $${\displaystyle O\left(N + N\cdot\log{\left(\frac{N}{M}\right)}+M\right)} \tag{2}$$ If we compare $(1)$ and $(2)$ $$\frac{N + N\cdot\log{\left(\frac{N}{M}\right)}+M}{N \cdot\log{N}}= 1-\frac{\log{M} - 1}{\log{N}}+\frac{M}{N\cdot\log{N}} \tag{3}$$ Assuming large enough $M < N$ we have $$\frac{N + N\cdot\log{\left(\frac{N}{M}\right)}+M}{N \cdot\log{N}}< 1-\frac{\log{M} - 2}{\log{N}}< 1 \tag{4}$$ In other words, for large $N$ and $M$ we should expect this schema to make most of the sorting algorithms "a little bit" faster. By how much? Let's see ...

3. Specific example

Let's consider lists of strings as an example. Here is the source code of a little PoC project which generates lists of randomly generated strings and sorts them using

  1. Java's Collections::sort and
  2. an implementation of the schema above, which internally also uses Collections::sort to sort the content of the buckets
and compares the results. No parallelisation is used, although sorting the buckets in parallel could significantly improve the speed in the second case, but no cheating!

Most of the aspects in the code are easy to tune, but we will consider only

  • lists of size $N=1000000$ and $N=3000000$ and
  • bucketing by the first 2 letters, giving the total number of buckets $M=26^2$

Plugging these numbers into the formula $(3)$ we should expect

  • $M=26^2$ and $N=1000000$: $0.60$
  • $M=26^2$ and $N=3000000$: $0.63$
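These expected ratios can be reproduced by evaluating formula $(3)$ directly (a small illustrative sketch; natural logarithms assumed, as in the calculations above):

```java
public class RatioCheck {
    // evaluates the ratio from formula (3), natural logarithms assumed
    static double ratio(double n, double m) {
        return (n + n * Math.log(n / m) + m) / (n * Math.log(n));
    }

    public static void main(String[] args) {
        System.out.printf("%.2f%n", ratio(1_000_000, 26 * 26)); // 0.60
        System.out.printf("%.2f%n", ratio(3_000_000, 26 * 26)); // 0.63
    }
}
```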

An improvement of nearly $40$%!? No way ...

4. Actual results

One more test parameter I should mention (which is not part of formula $(3)$) is $X$, the length of each string added to the list $L$; this accounts for Java string comparison, where string length plays a role. We will look at $X=10$, $X=50$ and $X=100$. Here are some results from running the PoC code on my fairly old computer with an Intel i5-3330 3.00GHz 4-core CPU, using Java 17

  • $X=10$, $M=26^2$ and $N=1000000$: Fast Sort avg 422.33ms, Std Sort avg 677.96ms, ratio $\frac{422.33}{677.96}\approx 0.6229$
  • $X=10$, $M=26^2$ and $N=3000000$: Fast Sort avg 1484.24ms, Std Sort avg 2395.39ms, ratio $\frac{1484.24}{2395.39}\approx 0.6196$

A few more results

  • $X=50$, $M=26^2$ and $N=3000000$: Fast Sort avg 1545.82ms, Std Sort avg 2773.65ms, ratio $\frac{1545.82}{2773.65}\approx 0.5573$
  • $X=100$, $M=26^2$ and $N=3000000$: Fast Sort avg 1561.84ms, Std Sort avg 2883.08ms, ratio $\frac{1561.84}{2883.08}\approx 0.5417$

5. Conclusions

Well, the test results are not too far off from the calculated results. So, the schema does improve the sorting ... However, I should mention the following:

  • It works well when the list $L$ is "uniformly distributed" across the buckets (already mentioned in section 1). Imagine that all the objects fall into one bucket, then we are back to ${\displaystyle O(N \log{N})}$ or slightly worse.
  • The test times don't include the time required to "build" the buckets. Buckets are meant to be "built" once and re-used.

And finally, the schema allows for

  • parallelisation: the content of the buckets can be sorted in parallel and
  • streaming scenarios: if one bucket is updated, we don't have to re-sort the entire list $L$
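As a sketch of the parallelisation point (hypothetical code, not part of the PoC): the buckets are independent, so they can be sorted concurrently, e.g. with a parallel stream:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ParallelBuckets {
    // sorts each bucket in parallel, then merges; the buckets themselves
    // are assumed to already be in the right order relative to each other
    static List<String> sortAll(List<List<String>> buckets) {
        buckets.parallelStream().forEach(Collections::sort);
        List<String> result = new ArrayList<>();
        buckets.forEach(result::addAll);
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> buckets = new ArrayList<>();
        buckets.add(new ArrayList<>(List.of("AB", "AA"))); // bucket for 'A'
        buckets.add(new ArrayList<>(List.of("BB", "BA"))); // bucket for 'B'
        System.out.println(sortAll(buckets)); // [AA, AB, BA, BB]
    }
}
```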

Saturday, July 25, 2020

A few words about ConcurrentHashMap, compute() method

There is one ambiguous point in the Java documentation of the compute() method from the ConcurrentHashMap class:

Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple, and must not attempt to update any other mappings of this Map.

It's not clear what is being blocked: is it access to the entire map, or just to the key? Let's find out by conducting a few experiments with the following code:

package tests.concurrency;

import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

public class TestConcurrentHashMapCompute implements Runnable {
	private static final Map<Key, String> MAP = new ConcurrentHashMap<>();
	private static final int MAX_RUNS = 100;
	private static final long SLEEP_TIME = 5000;

	public static void main(String[] args) {
		Thread th1 = new Thread(new TestConcurrentHashMapCompute());
		Thread th2 = new Thread(new TestConcurrentHashMapCompute());

		th1.start();
		th2.start();
	}

	@Override
	public void run() {
		final String name = Thread.currentThread().getName();
		for (int i = 0; i < MAX_RUNS; i++) {
			final String value = MAP.compute(new Key(name), (k, v) -> {
				System.out.println(name + ": enters");
				sleep();
				System.out.println(name + ": exits");
				return name;
			});
			System.out.println(name + ": " + value);
		}
	}

	private static void sleep() {
		try {
			Thread.sleep(SLEEP_TIME);
		} catch (Exception ex) {}
	}

	private static class Key {
		private final String value;

		private Key(String value) {
			this.value = value;
		}

		@Override
		public int hashCode() {
			return 1;
			// return Objects.hash(value);
		}

		@Override
		public boolean equals(Object o) {
			if (this == o) {
				return true;
			}
			if (o == null || getClass() != o.getClass()) {
				return false;
			}
			Key other = (Key) o;
			return Objects.equals(value, other.value);
		}
	}
}

Nothing complicated: two threads call the compute() method of a shared ConcurrentHashMap instance, using Key class instances as the map keys. However, pay attention to the hashCode() method inside the Key class:

		@Override
		public int hashCode() {
			return 1;
			// return Objects.hash(value);
		}

Depending on what is returned (comment/uncomment the relevant lines), you will see different behaviour. And, by the way, return 1 is a valid hashCode() implementation according to the Java documentation:

- If two objects are equal according to the equals() method, then calling the hashCode method on each of the two objects must produce the same integer result.
- It is not required that if two objects are unequal according to the equals() method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.

The result is:
- With return Objects.hash(value), i.e. a decent hashCode() implementation, the threads run in parallel, as if the lock were on the Key.
- With return 1, the threads run sequentially, as if the lock were on the entire Map.

In fact, the lock is on the bucket (bin) indexed by the key's hash. That's what the hashCode() method is for: quickly locating the relevant bucket. HashMap, for example, uses a singly linked list for each bucket's storage.
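Since the bin lock is held for the whole computation, one way to keep compute() short (a sketch of one possible pattern, with illustrative names; the trade-off is that two threads may compute the same value redundantly) is to do the heavy work outside the map and only publish the result atomically:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ShortComputeSketch {
    private static final Map<String, String> MAP = new ConcurrentHashMap<>();

    // placeholder for a long-running computation (illustrative)
    static String expensive(String key) {
        return key.toLowerCase();
    }

    public static void main(String[] args) {
        String key = "KEY";
        // the expensive work happens without holding any map lock ...
        String value = expensive(key);
        // ... and only the quick publication takes the bin lock
        String prev = MAP.putIfAbsent(key, value);
        System.out.println(prev == null ? value : prev);
    }
}
```

If two threads race, both compute, but only one result wins via putIfAbsent(); whether that duplication is acceptable depends on the use case.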

Saturday, September 22, 2018

Generating Subsets

Every once in a while I am being challenged with problems involving optimal subsets from a given set of objects. It happened again a few weeks ago. The problem was of a knapsack style (finding the most profitable subset of items of bounded total size) and ... of course I forgot how to tackle it using dynamic programming. But, many years ago I learned a very simple (to memorise) algorithm which saved the day and I will describe it in this article.

The Algorithm Design Manual, page $453$, defines it as binary counting. The idea behind it is the one-to-one mapping between the numbers $0 \rightarrow 2^{n}-1$ (a total of $2^n$ numbers) and binary strings of length $n$ (a total of $2^n$ strings), where bit $1$ at index $i$ means "include the object at index $i$ of the original set in the given subset". We also know that the total number of subsets of a set of size $n$ is $2^n$. We have $2^n$ everywhere, awesome!

Let's see it in action by looking at a simple example, a small set $\left\{A, B, C\right\}$ with $n=3$. We are not interested in the empty set, thus we will ignore the $0$-th iteration, leaving us with a total of $2^3-1=7$ subsets:

Iteration | Binary string | Indexes of the objects to consider | Subset
1 | 001 | 1 | $\left\{A\right\}$
2 | 010 | 2 | $\left\{B\right\}$
3 | 011 | 1 and 2 | $\left\{A, B\right\}$
4 | 100 | 3 | $\left\{C\right\}$
5 | 101 | 1 and 3 | $\left\{A, C\right\}$
6 | 110 | 2 and 3 | $\left\{B, C\right\}$
7 | 111 | 1, 2 and 3 | $\left\{A, B, C\right\}$

I need to stress the fact that it's an exponential (or $O\left(2^n\right)$) complexity algorithm. It is very inefficient for big sets. Here is a code example:

import java.util.LinkedHashSet;
import java.util.Set;

public class AllSubSets {
 public static void main(String[] args) {
  String[] setValues = new String[] { "A", "B", "C", "D", "E",
    "F", "G", "H", "I", "J" };

  long limit = 1L << setValues.length; // this is 2^length

  for (long l = 1; l < limit; l++) {
   Set<String> subSet = new LinkedHashSet<>();
   for (int i = 0; i < setValues.length; i++) {
    // bit i of l set -> include the object at index i
    if ((l & (1L << i)) != 0) {
     subSet.add(setValues[i]);
    }
   }
   subSet.forEach(System.out::print);
   System.out.println();
  }
 }
}

and the result:


A
B
AB
C
AC
BC
ABC
D
... (trimmed due to the size) ...
CDEFGHIJ
ACDEFGHIJ
BCDEFGHIJ
ABCDEFGHIJ

A careful reader will almost immediately object that this version is limited to sets of at most 64 objects (a Java long has 64 bits and, since Java 8, can be treated as unsigned). BigInteger, which supports bitwise operations, addresses this limitation. And yes, the inner subset construction loop makes it an $O(n\cdot2^n)$ algorithm; it's still exponential though.
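A BigInteger-based variant could look like this (a sketch; the class and method names are mine):

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class AllSubSetsBig {
    // enumerates all non-empty subsets using BigInteger as the counter,
    // lifting the 64-object limit of the long-based version
    static List<String> subsets(String[] values) {
        List<String> all = new ArrayList<>();
        BigInteger limit = BigInteger.ONE.shiftLeft(values.length); // 2^length
        for (BigInteger l = BigInteger.ONE; l.compareTo(limit) < 0;
                l = l.add(BigInteger.ONE)) {
            Set<String> subSet = new LinkedHashSet<>();
            for (int i = 0; i < values.length; i++) {
                if (l.testBit(i)) { // bit i set -> include values[i]
                    subSet.add(values[i]);
                }
            }
            all.add(String.join("", subSet));
        }
        return all;
    }

    public static void main(String[] args) {
        subsets(new String[] { "A", "B", "C" }).forEach(System.out::println);
    }
}
```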

However, as mentioned earlier, this is not an efficient algorithm. Anything bigger than 64 objects will produce more than $2^{64}$ subsets, and this is a big number. Even $2^{32}$ is already substantial, so don't use this algorithm for big sets.

Monday, August 13, 2018

To Optional or not to Optional

I recently had an interesting conversation with my colleagues about the usage of Optional's in Java. It all revolves around this article.

So, I quickly sketched the following code to highlight the idea:

import java.util.Optional;

public class OptionalEvaluationTest {
 public static void main(String[] args) {
  test1(); // non lazy execution of orElse().
  test2();
  test3();
  test4();
 }

 private static void test1() {
  System.out.println("Test 1");
  Optional<Integer> oi = Optional.ofNullable(3);
  oi.map(OptionalEvaluationTest::intToString)
   .orElse(defaultString());
  System.out.println("----");
 }

 private static void test2() {
  System.out.println("Test 2");
  Optional<Integer> oi = Optional.ofNullable(null);
  oi.map(OptionalEvaluationTest::intToString)
   .orElse(defaultString());
  System.out.println("----");
 }

 private static void test3() {
  System.out.println("Test 3");
  Optional<Integer> oi = Optional.ofNullable(3);
  oi.map(OptionalEvaluationTest::intToString)
   .orElseGet(OptionalEvaluationTest::defaultString);
  System.out.println("----");
 }

 private static void test4() {
  System.out.println("Test 4");
  Optional<Integer> oi = Optional.ofNullable(null);
  oi.map(OptionalEvaluationTest::intToString)
   .orElseGet(OptionalEvaluationTest::defaultString);
  System.out.println("----");
 }

 private static String intToString(int i) {
  System.out.println("converter called!");
  return Integer.toString(i);
 }

 private static String defaultString() {
  System.out.println("default called!");
  return "default";
 }
}

which yields the following result:


Test 1
converter called!
default called!
----
Test 2
default called!
----
Test 3
converter called!
----
Test 4
default called!
----

What we see is that the argument of .orElse() is always evaluated, i.e. it is not evaluated lazily. This may become a performance issue if used unwisely. Prefer .orElseGet() instead.

Sunday, May 26, 2013

Tackling Andrica's conjecture. Part 3

Here is an interesting, but slightly detached from the previous two articles, result. Let's look at the following two sequences: $$a_{n}=\sqrt{p_{n+1}} - \sqrt{p_{n}}$$ $$b_{n}=\ln\left ( \frac{1+\sqrt{p_{n+1}}}{1+\sqrt{p_{n}}} \right )^{1+\sqrt{p_{n+1}}}$$

Here is a short Python code to visualise the sequences.

And here is how both sequences look ($a_{n}$ first and $b_{n}$ second). Quite asymptotic, aren't they? Indeed they are ...

Lemma 3. $$\sqrt{p_{n+1}} - \sqrt{p_{n}} \leq \ln\left ( \frac{1+\sqrt{p_{n+1}}}{1+\sqrt{p_{n}}} \right )^{1+\sqrt{p_{n+1}}} \leq \left ( \frac{1+\sqrt{p_{n+1}}}{1+\sqrt{p_{n}}} \right )\cdot \left ( \sqrt{p_{n+1}} - \sqrt{p_{n}} \right )$$

Let's look at this function $f_{6}(x)=\frac{\sqrt{p_{n+1}}}{1+x\cdot \sqrt{p_{n+1}}}$. Obviously, ${\ln\left ( 1+x\cdot \sqrt{p_{n+1}} \right )}'=f_{6}(x)$. As a result $$\int\limits_{\sqrt{\frac{p_{n}}{p_{n+1}}}}^{1} f_{6}\left ( x \right )dx = \ln\left ( 1+x\cdot \sqrt{p_{n+1}} \right )|_{\sqrt{\frac{p_{n}}{p_{n+1}}}}^{1}=\ln\left ( \frac{1+\sqrt{p_{n+1}}}{1+\sqrt{p_{n}}} \right )$$

According to Mean Value Theorem, $\exists \mu \in \left (\sqrt{\frac{p_{n}}{p_{n+1}}} ,1 \right )$ such that: $$\int\limits_{\sqrt{\frac{p_{n}}{p_{n+1}}}}^{1} f_{6}\left ( x \right )dx = f_{6}\left ( \mu \right )\cdot \left ( 1- \sqrt{\frac{p_{n}}{p_{n+1}}} \right )$$

Putting it all together: $$\ln\left ( \frac{1+\sqrt{p_{n+1}}}{1+\sqrt{p_{n}}} \right ) = \frac{\sqrt{p_{n+1}}-\sqrt{p_{n}}}{1+\mu \cdot \sqrt{p_{n+1}}}$$

Because $$\sqrt{\frac{p_{n}}{p_{n+1}}}< \mu < 1 \Rightarrow 1+\sqrt{p_{n}}< 1+\mu \cdot \sqrt{p_{n+1}} < 1+\sqrt{p_{n+1}} $$

And we get $$\frac{\sqrt{p_{n+1}}-\sqrt{p_{n}}}{1+\sqrt{p_{n+1}}}\leq \ln\left ( \frac{1+\sqrt{p_{n+1}}}{1+\sqrt{p_{n}}} \right )\leq \frac{\sqrt{p_{n+1}}-\sqrt{p_{n}}}{1+\sqrt{p_{n}}}$$ Multiplying through by $1+\sqrt{p_{n+1}}$ proves the lemma.

Denoting $\Delta_{n}=\sqrt{p_{n+1}}-\sqrt{p_{n}}$, this becomes: $$\frac{\Delta_{n}}{1+\sqrt{p_{n+1}}}\leq \ln\left ( 1 + \frac{\Delta_{n}}{1+\sqrt{p_{n}}} \right )\leq \frac{\Delta_{n}}{1+\sqrt{p_{n}}}$$ or $$\frac{1+\sqrt{p_{n}}}{1+\sqrt{p_{n+1}}}\leq \ln\left ( 1 + \frac{\Delta_{n}}{1+\sqrt{p_{n}}} \right )^{\frac{1+\sqrt{p_{n}}}{\Delta_{n}}}\leq 1$$
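As a quick numerical sanity check of Lemma 3, here is a small sketch with the sample pair $p_{n}=7$, $p_{n+1}=11$:

```java
public class Lemma3Check {
    public static void main(String[] args) {
        double pn = 7, pn1 = 11; // a sample pair of consecutive primes
        double lower = Math.sqrt(pn1) - Math.sqrt(pn);
        double middle = (1 + Math.sqrt(pn1))
                * Math.log((1 + Math.sqrt(pn1)) / (1 + Math.sqrt(pn)));
        double upper = ((1 + Math.sqrt(pn1)) / (1 + Math.sqrt(pn))) * lower;
        // prints approximately 0.6709 <= 0.7292 <= 0.7943
        System.out.printf("%.4f <= %.4f <= %.4f%n", lower, middle, upper);
    }
}
```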

Is this result of any use? I don't know yet, but it looks like: $$\left ( 1 + \frac{\Delta_{n}}{1+\sqrt{p_{n}}} \right )^{\frac{1+\sqrt{p_{n}}}{\Delta_{n}}} \rightarrow e, n \to \infty$$ using this result.

Tuesday, May 7, 2013

A few cloud based APIs

If you have come to the point of being ready to start a new project, here are a few links to some cloud-based APIs to consider:

1. StormPath - quite good at user management, and you get free perks as well.

2. Mandrill - if you need a good notification (email based) system. Again, quite a decent free of charge limit.

3. MongoLab - well, that's MongoDB. And if you don't know what it is by now, then you shouldn't worry about it at all (caught your attention? Google for it then). Free 0.5Gb instance is a good starting point.

4. Google Chart API - Not the best one, but it's free and quite ok(ish) for building dashboards.

5. GeckoBoard - a better dashboard API, with HTML5 support (WebSockets and all that stuff), not for free though ...

And of course, you can test all that with Amazon for free for one year.

This is it for now, I will do my best to keep this list updated.

Sunday, September 23, 2012

Tackling Andrica's conjecture. Part 1

Andrica's conjecture is one of those mathematical statements which are extremely easy to formulate but complicated to prove; in fact, it hasn't been proved yet. Here we go: for all consecutive prime numbers $p_{n}$ and $p_{n+1}$, $$\sqrt{p_{n+1}} - \sqrt{p_{n}}< 1$$

1. The first attempt

Let's start with an analogy. Here is an inequality, $p_{n+1} < 2\cdot p_{n}$, which is a direct consequence of Bertrand's postulate, proved by Chebyshev. As a result of this inequality we have: $$0 < \ln(p_{n+1}) - \ln(p_{n}) < \ln(2) < 1$$ But what is the relationship between the natural logarithm and the square root functions?

Well, let's have a look at the following function: $f_{1}(x)=\sqrt{x} - \ln(x)$. First derivative of this function is ${f_{1}}'(x) = \frac{1}{2\cdot \sqrt{x}} - \frac{1}{x}=\frac{1}{x}\cdot \left ( \frac{\sqrt{x}}{2} -1\right )\geq 0$, for $\forall x\geq 4$. This means that $f_{1}(x)$ is ascending for $\forall x\geq 4$. As a result, for $\forall p_{n}\leq p_{n+1}$, $n\geqslant 3$ (because $p_{3}=5$) $\Rightarrow f_{1}(p_{n}) \leq f_{1}(p_{n+1})$ or $$\sqrt{p_{n}} - \ln(p_{n}) \leq \sqrt{p_{n+1}} - \ln(p_{n+1})$$ Finally: $$0< \ln(p_{n+1}) - \ln(p_{n}) \leq \sqrt{p_{n+1}} - \sqrt{p_{n}}, \forall n\geq 3$$ As we can see, the analogy isn't of any use really, though we have a nice inequality.

2. The second attempt

Let's have a look at another function: $f_{2}(x)=\frac{x}{\ln(x)} - \sqrt{x}$. Its first derivative is $${f_{2}}'(x)=\frac{\ln(x)-1}{(\ln(x))^{2}}-\frac{1}{2\cdot \sqrt{x}}=\\ \frac{1}{2\cdot \sqrt{x}\cdot (\ln(x))^{2}}\cdot \left ( 2\cdot \sqrt{x}\cdot \ln(x)-2\cdot \sqrt{x}-(\ln(x))^{2} \right )$$

For the first term we have: $\frac{1}{2\cdot \sqrt{x}\cdot (\ln(x))^{2}}> 0$, for $\forall x> 0$.

Let's check the second term: $\left ( 2\cdot \sqrt{x}\cdot \ln(x)-2\cdot \sqrt{x}-(\ln(x))^{2} \right )=$ $\left (-x+2\cdot \sqrt{x}\cdot \ln(x)-(\ln(x))^{2}-2\cdot \sqrt{x}+x \right )=$ $\left [ -\left ( x-2\cdot \sqrt{x}\cdot \ln(x)+(\ln(x))^{2} \right ) +\left ( 1- 2\cdot \sqrt{x}+x\right )-1\right ]=$ $\left [ \left ( \sqrt{x}-1\right )^{2}-\left ( \sqrt{x}-\ln(x) \right )^{2} -1\right ]$ and we are not done yet. Using $a^{2}-b^{2}=(a-b)\cdot (a+b)$, we have: $\left ( 2\cdot \sqrt{x}\cdot \ln(x)-2\cdot \sqrt{x}-(\ln(x))^{2} \right )=$ $\left [ \left ( \ln(x)-1 \right )\cdot \left ( \sqrt{x}-1+\sqrt{x}-\ln(x) \right )-1 \right ]$.

What we have so far:

  • $\ln(x)-1 \geq 1$, for $\forall x \geq e^{2}$
  • $\sqrt{x}-1 \geq 1$, for $\forall x \geq 4$
  • $\sqrt{x}-\ln(x)>0$, for $\forall x \geq 4$, this is, by the way, $f_{1}(x)$, so $f_{1}(x)\geq f_{1}(4)>0$
Altogether: $\left ( 2\cdot \sqrt{x}\cdot \ln(x)-2\cdot \sqrt{x}-(\ln(x))^{2} \right )\geq 0$, for $\forall x\geq e^{2}$.

This means ${f_{2}}'(x)\geq 0$ and $f_{2}(x)$ is ascending, for $\forall x\geq e^{2}$. As a result, for $\forall p_{n}\leq p_{n+1}$, $n\geqslant 3$ (the cases for $p_{3}=5$, $p_{4}=7$ and $p_{5}=11$ can be verified with a calculator) $\Rightarrow f_{2}(p_{n}) \leq f_{2}(p_{n+1})$ or $$\frac{p_{n}}{\ln(p_{n})} - \sqrt{p_{n}}\leq \frac{p_{n+1}}{\ln(p_{n+1})} - \sqrt{p_{n+1}}$$ Final result is (let's note it as (2)): $$ \sqrt{p_{n+1}}- \sqrt{p_{n}}\leq \frac{p_{n+1}}{\ln(p_{n+1})} - \frac{p_{n}}{\ln(p_{n})}, \forall n\geq 3$$

2.1 A negative result of the second attempt

It is worth mentioning that the function $\frac{x}{\ln(x)}$ has a special role in number theory. According to the Prime Number Theorem: $\displaystyle\smash{\lim_{x \to \infty }}\tfrac{\pi (x)}{x/\ln(x)}=1$, where $\pi (x)$ is the prime counting function.

Obviously, $\displaystyle\smash{\lim_{n \to \infty }}\tfrac{\pi (p_{n})}{p_{n}/\ln(p_{n})}=1$ and $\displaystyle\smash{\lim_{n \to \infty }}\tfrac{\pi (p_{n+1})}{p_{n+1}/\ln(p_{n+1})}=1$ (as sub-sequences).

Also $\pi (p_{n})=n$ and $\pi (p_{n+1})-\pi (p_{n})=1$.

If we put all these together, does it mean that $\displaystyle\smash{\lim_{n \to \infty }} \left ( \frac{p_{n+1}}{\ln(p_{n+1})} - \frac{p_{n}}{\ln(p_{n})} \right )=1$?

Nope, it doesn't, check this link for more details.

2.2 A positive result of the second attempt

According to the following article, in 2005 it was proved that $\displaystyle\liminf_{n \to \infty }\frac{g_{n}}{\ln(p_{n})}=0$, where $g_{n}=p_{n+1}-p_{n}$ is the $n$-th prime gap.

It is also obvious that $\frac{p_{n+1}}{\ln(p_{n+1})} < \frac{p_{n+1}}{\ln(p_{n})}$ (because $p_{n} < p_{n+1}$ and $\ln(p_{n}) < \ln(p_{n+1})$). As a result: $$ \sqrt{p_{n+1}}- \sqrt{p_{n}}\leq \frac{p_{n+1}}{\ln(p_{n+1})} - \frac{p_{n}}{\ln(p_{n})}\leq \frac{g_{n}}{\ln(p_{n})}, \forall n\geq 3$$ Or $$\liminf_{n \to \infty }\left ( \sqrt{p_{n+1}}- \sqrt{p_{n}} \right )=0$$ This means Andrica's conjecture is true for infinitely many pairs $p_{n}$ and $p_{n+1}$. It doesn't prove the conjecture though, which states that it is true for all $p_{n}$ and $p_{n+1}$.

3. The third attempt

How about the following function: $f_{3}(x)=\pi (x) - \frac{x}{\ln(x)}$? For this case I used a little Python program, the code is here. First of all, $f_{3}(23)=1.66463325521 > f_{3}(29)=1.38774807317$. So, $f_{3}(x)$ is not ascending. The general tendency of $f_{3}(x)$ seems to be ascending though, according to this graphic:

At least, for the areas where the function is ascending for consecutive primes (update the program to plot just plt.plot(x1, y1), i.e. primes and values of the function for the prime arguments), Andrica's conjecture is true, because if $$\pi (p_{n+1}) - \frac{p_{n+1}}{\ln(p_{n+1})} \geq \pi (p_{n}) - \frac{p_{n}}{\ln(p_{n})}$$ then (also consider (2)) $$1=\pi (p_{n+1}) - \pi (p_{n}) \geq \frac{p_{n+1}}{\ln(p_{n+1})} - \frac{p_{n}}{\ln(p_{n})}$$ Still, this is not a proof.

4. What's next?

Well, probably the next logical thing is to analyse the function $f_{4}(x)=\pi (x) - \sqrt{x}$. At least it seems to be ascending for prime arguments.

But the work is still in progress...

Tuesday, May 22, 2012

Factoring big numbers and RSA

Last week I came across this article. It's an interesting report indeed, and in this post I will explain its implications. But first, let's quickly recall what RSA encryption is.

There is a message $m$ which needs to be encrypted. There is a public key $(e, n)$ and a private key $(d, n)$. The message is encrypted as $c = m^{e} \pmod n$ and the cipher text is decrypted as $m = c^{d} \pmod n$. Putting these together, $m^{e\cdot d} \equiv m \pmod n$ and, if $m$ and $n$ are co-prime, then $m^{e\cdot d - 1} \equiv 1 \pmod n$. $n$ is selected to be a product of two distinct prime numbers, $n = p\cdot q$, and, according to Euler's theorem, $m^{\varphi(n)} \equiv 1 \pmod n$. Now $e$ and $d$ (the exponents of the public/private keys) are selected so that $e\cdot d - 1 \equiv 0 \pmod{\varphi(n)}$, or $e\cdot d - 1 = k\cdot \varphi(n)$ for some integer $k$.
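The arithmetic above can be verified with a toy example using textbook-sized primes (for illustration only, of course; real keys use primes of hundreds of digits):

```java
import java.math.BigInteger;

public class RSAToy {
    public static void main(String[] args) {
        BigInteger p = BigInteger.valueOf(61);
        BigInteger q = BigInteger.valueOf(53);
        BigInteger n = p.multiply(q);                  // n = 3233
        BigInteger phi = p.subtract(BigInteger.ONE)
                .multiply(q.subtract(BigInteger.ONE)); // phi(n) = 3120
        BigInteger e = BigInteger.valueOf(17);         // public exponent
        BigInteger d = e.modInverse(phi);              // private exponent, 2753

        BigInteger m = BigInteger.valueOf(65);         // the message
        BigInteger c = m.modPow(e, n);                 // encrypt: c = m^e mod n
        System.out.println(c.modPow(d, n));            // decrypt: prints 65
    }
}
```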

Because $n$ is a product of really big prime numbers (thus $n$ is a big number itself), factoring it is a really hard problem, considering the "limited" computational power of modern computers and the fact that no polynomial-time factoring algorithm is known. As a result, computing $\varphi(n)$, which in this particular case is $\varphi(n) = (p - 1)\cdot(q - 1)$, is also a hard problem.

However, computing the greatest common divisor (GCD) isn't hard. As a result, if two public keys $(e_1, n_1)$ and $(e_2, n_2)$ are spotted such that $GCD(n_1, n_2) > 1$, then decoding the corresponding messages $m_1$ and $m_2$ becomes an easy task, and this is the topic of the article I mentioned at the top of this post.

Now, let's do a quick test. Imagine the following two public keys and associated cipher texts:

e1 = 65537
n1 = 59271838373237712241010698426785545947980750376894660532845611609385295
     49357464245996603984250810283460055082118943354872215289998388427726673
     74160929852573051680099378617005092405110706474184136037559125038434888
     56979904991517729100725512850421664634705274281314737938901139871448406
     073842088742598680079967
cipher1 = "J\u00c1R\u0090\u00e1\u00f4\u008b5My\u00f8\u00a1\u00f4>\u00a2\u00c3\u0010
           \u00bd\u00eb\u00cc&\u007fb\u001aC$\u001d\u00c5\u00b7\u00cdz\u00b7\u0017
           \u008a#9\u0012\u0089\u00feao\u0019\u009c\u00eb\u00b0>\u0086\u009b
           \u001d3~b0-u\u00fc\u0004!\rc\\\u00cb$\u0091\u009e\u00a1N\u009d2\u00ff
           \u0019\u009a9vH.\u00d5\u00e7m\u00a9m\u00ea^\u00d3T$\u00d7\u00d7\u0011
           \u0081\u00e4B\u009b~\u008c$\u00a6K\u008a\u00dc`\u00b4\u009cu\u00fb\u00c2
           \u0006\u00d1\u00bb\u00b9\u00a0\u008f\u00d2\u00bc\u0002\u00f6#\u001f\u001dM
           \u00bb\u0098\u00f2\u00a0\u009fO\u0080"

e2 = 65537
n2 = 72216220929585874029800250341144628594049328751082632793975238142970345
     80195859400832155769761460789049220801438471143407662437503457520665980
     33488377571129629910281750410842883648532072455460838627134172456428247
     65387577331828704441227356582723560291425753389466410790421096831823015
     438162111864463275922441
cipher2 = ".\u00fd9\u008dc\u00da\u00f9o\u00f5Vl\u00fb\u0087\u00ed\u00d5 \u00ee\u00cf
           \u0097~\u00d8T\u00f9.\u0018\u00b1\u00d5n^\u00a0\rA\u00e0\u001d\u00d5
           \u00c8:D\u00c9\u0014o\u00de\u00dbo\u00f9>)bc'a\u00a2\u008e\u00c1|\u00dd
           \r[q1\u00ac\u000f^\u0082b/A\u0010\u0087\u00ff\u00e4k=\u00c8\u00d6\u001c\u007f
           \u00fb\u00db\u00da&\u00d9\u00c5\u00c4\u008a#\u00a0u\u0003J&\n\u0083
           \u00a0\u00e1.\u00ba\u00fd\u008a0s?\u00deg\u00d50\u0015\u00eb\u0091
           \u00b3E\u00c7\u0015O\u00f3r\u00e3`~8\u00b4\u00b5=\u0089U\u007f\u00fa\u0019"

Check this link for the easiest implementation of the GCD algorithm. I won't implement it, because Java has it out of the box, as part of the BigInteger class. Here is my code:

import java.math.BigInteger;
import java.security.MessageDigest;
import java.util.Arrays;

public class RSATest {
 private static final long MAX_ITER = 100000;
 private static final int MAX_TRY = 5;
 private static final int BLOCK = 128;

 // hash function used by the OAEP decoder
 private static byte[] hash(byte[] b) throws Exception {
  assert b.length == BLOCK/2;

  MessageDigest algo = MessageDigest.getInstance("SHA-512");
  algo.reset();
  algo.update(b);
  return algo.digest();
 }

 // converts a String to a BigInteger
 // conversion is performed as if the String were
 // in base 256
 public static BigInteger stringToInt(String str) {
  int sz = str.length();

  BigInteger ret = new BigInteger("0");
  BigInteger v256 = new BigInteger("256"); // the base
  BigInteger mul = new BigInteger("1");

  for (int i = sz - 1; i >= 0; --i) {
   int v = str.charAt(i);
   ret = ret.add((new BigInteger("" + v)).multiply(mul));
   mul = mul.multiply(v256);
  }

  return ret;
 }


 // a simple method for xor-ing two byte arrays
 private static byte[] xor(byte[] b, byte[] a) {
  assert b.length == BLOCK/2;
  assert b.length == a.length;

  byte[] ret = new byte[BLOCK/2];
  for (int i = 0; i < BLOCK/2; ++i) ret[i] = (byte) (a[i] ^ b[i]);
  return ret;
 }

 // decodes an OAEP array, google for OAEP padding
 private static byte[] oaepDecode(byte[] b) throws Exception { 
  assert b.length == BLOCK;

  byte[] first  = Arrays.copyOfRange(b, 0, BLOCK/2);
  byte[] second = Arrays.copyOfRange(b, BLOCK/2, BLOCK);
 
  byte[] r = xor(second, hash(first));
  byte[] m = xor(first,  hash(r));

  return m;
 }

 // converts a BigInteger into a String
 // if withOAEP is set to true, then also applies
 // OAEP decoding
 public static String intToString(BigInteger n, boolean withOAEP) 
     throws Exception {
  byte[] b_r_n = n.toByteArray();
  int b_sz = b_r_n.length;

  // apply OAEP decoding
  if (withOAEP) {
   if (b_sz > BLOCK) {
    // make sure we pass the right size, i.e. of one BLOCK
    b_r_n = Arrays.copyOfRange(b_r_n, b_sz - BLOCK, b_sz);
   }
   b_r_n = oaepDecode(b_r_n);
  }
  return new String(b_r_n);
 }

 // a simple decoder with the private key (d, n)
 public static void decode(BigInteger d, BigInteger n, String cipherText) 
     throws Exception {
  BigInteger cipherInt  = stringToInt(cipherText);
  BigInteger messageInt = cipherInt.modPow(d, n);

  // a simple note that if messageInt is a solution for the
  // above modulo congruence eq, then messageInt + k*n is also 
  // a solution

  for (int i = 0; i < MAX_TRY; ++i) {
   System.out.println("Try: " + i);
   System.out.println("Integer message: " + messageInt);
   System.out.println("String message (flat) = " + 
    intToString(messageInt, false));
   System.out.println("String message (oaep) = " + 
    intToString(messageInt, true));
   System.out.println();
   messageInt = messageInt.add(n);
  }
 }

 // this method attempts to decode the cipher by pretending 
 // it knows "p" which is a composition of "n", i.e. n = p*q
 public static void decode_with_p(long e, BigInteger n, String cipher, 
     BigInteger p) throws Exception {
  BigInteger q = n.divide(p);

  BigInteger phi = q.subtract(BigInteger.ONE).multiply(
     p.subtract(BigInteger.ONE));

  BigInteger E = new BigInteger("" + e);
  for (long k = 0; k < MAX_ITER; k++) {
   BigInteger r = phi.multiply(
    new BigInteger("" + k)).add(BigInteger.ONE);

   BigInteger[] qr = r.divideAndRemainder(E);
   if (qr[1].equals(BigInteger.ZERO)) {
    // remainder is ZERO
    System.out.println("Cycle: " + k);
    System.out.println("Try private key: " + qr[0]);
    decode(qr[0], n, cipher);
    System.out.println();
   }
  }
 }

 public static void main(String[] args) throws Exception {
  long e1 = 65537;
  BigInteger n1 = new BigInteger( 
   "592718383732377122410106984267855459479807503768946" +
   "605328456116093852954935746424599660398425081028346" +
   "005508211894335487221528999838842772667374160929852" +
   "573051680099378617005092405110706474184136037559125" +
   "038434888569799049915177291007255128504216646347052" +
   "74281314737938901139871448406073842088742598680079967");

  String cipher1 = 
   "J\u00c1R\u0090\u00e1\u00f4\u008b5My\u00f8\u00a1\u00f4>" +
   "\u00a2\u00c3\u0010\u00bd\u00eb\u00cc&\u007fb\u001aC$" +
   "\u001d\u00c5\u00b7\u00cdz\u00b7\u0017\u008a#9\u0012" +
   "\u0089\u00feao\u0019\u009c\u00eb\u00b0>\u0086\u009b" +
   "\u001d3~b0-u\u00fc\u0004!\rc\\\u00cb$\u0091\u009e" +
   "\u00a1N\u009d2\u00ff\u0019\u009a9vH.\u00d5\u00e7m" +
   "\u00a9m\u00ea^\u00d3T$\u00d7\u00d7\u0011\u0081\u00e4B" +
   "\u009b~\u008c $\u00a6K\u008a\u00dc`\u00b4\u009cu" +
   "\u00fb\u00c2\u0006\u00d1\u00bb\u00b9\u00a0\u008f\u00d2" +
   "\u00bc\u0002\u00f6#\u001f\u001dM\u00bb\u0098\u00f2" +
   "\u00a0\u009fO\u0080";

  long e2 = 65537;
  BigInteger n2 = new BigInteger( 
   "722162209295858740298002503411446285940493287510826" +
   "327939752381429703458019585940083215576976146078904" +
   "922080143847114340766243750345752066598033488377571" +
   "129629910281750410842883648532072455460838627134172" +
   "456428247653875773318287044412273565827235602914257" +
   "53389466410790421096831823015438162111864463275922441");

  String cipher2 = 
   ".\u00fd9\u008dc\u00da\u00f9o\u00f5Vl\u00fb\u0087\u00ed" +
   "\u00d5 \u00ee\u00cf\u0097~\u00d8T\u00f9.\u0018\u00b1" +
   "\u00d5n^\u00a0\rA\u00e0\u001d\u00d5\u00c8:D\u00c9" +
   "\u0014o\u00de\u00dbo\u00f9>)bc'a\u00a2\u008e\u00c1|" +
   "\u00dd\r[q1\u00ac\u000f^\u0082b/A\u0010\u0087\u00ff" +
   "\u00e4k=\u00c8\u00d6\u001c\u007f\u00fb\u00db\u00da&" + 
   "\u00d9\u00c5\u00c4\u008a#\u00a0u\u0003J&\n\u0083" +
   "\u00a0\u00e1.\u00ba\u00fd\u008a0s?\u00deg\u00d50\u0015" +
   "\u00eb\u0091\u00b3E\u00c7\u0015O\u00f3r\u00e3`~8\u00b4" + 
   "\u00b5=\u0089U\u007f\u00fa\u0019";

  // this is really quick
  BigInteger p = n1.gcd(n2);
  System.out.println("p = " + p);
  System.out.println();

  decode_with_p(e1, n1, cipher1, p);
  decode_with_p(e2, n2, cipher2, p);
 }
}

Execute it with the following command line:

java -ea RSATest > results.txt

Amid the garbage in the results.txt file, you should notice the following lines:

String message (oaep) = What could go wrong: http://pastebin.com/hNz9gZbe
...
String message (oaep) = A real example: http://digitaloffense.net/tools/debian-openssl

To conclude: take care when generating keys for RSA encryption (and yes, do go through this link)!
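The attack generalises to any pair of moduli that share a prime factor. Here is a minimal, self-contained sketch with toy numbers (p=61, q=53 and q=71 with e=17 are illustrative values, not the keys above); instead of the brute-force loop over k from decode_with_p, it derives d directly with BigInteger.modInverse:

```java
import java.math.BigInteger;

public class SharedPrimeDemo {
    // the shared prime factor of two carelessly generated moduli
    public static BigInteger sharedFactor(BigInteger n1, BigInteger n2) {
        return n1.gcd(n2);
    }

    // given e, n and a known factor p, derive the private exponent d
    public static BigInteger privateExponent(BigInteger e, BigInteger n, BigInteger p) {
        BigInteger q = n.divide(p);
        BigInteger phi = p.subtract(BigInteger.ONE)
                          .multiply(q.subtract(BigInteger.ONE));
        return e.modInverse(phi); // d such that e*d = 1 (mod phi)
    }

    public static void main(String[] args) {
        BigInteger p = BigInteger.valueOf(61);
        BigInteger n1 = p.multiply(BigInteger.valueOf(53)); // 3233
        BigInteger n2 = p.multiply(BigInteger.valueOf(71)); // 4331
        BigInteger e = BigInteger.valueOf(17);

        BigInteger shared = sharedFactor(n1, n2);           // recovers p = 61
        BigInteger d = privateExponent(e, n1, shared);

        BigInteger m = BigInteger.valueOf(65);
        BigInteger c = m.modPow(e, n1);                     // "encrypt"
        System.out.println("recovered m = " + c.modPow(d, n1)); // prints 65
    }
}
```

Both approaches solve the same congruence e·d ≡ 1 (mod φ(n)); modInverse just does it in one step instead of searching over k.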

If you'd like to go through more such quests, try this course (a really nice one).

Monday, April 30, 2012

A few words about ConcurrentHashMap

There is a popular belief that, because ConcurrentHashMap is part of the java.util.concurrent package, it is only meant for multi-threaded code. However, ConcurrentHashMap has another quite useful property that I will exploit in this post, without mentioning multi-threading at all.

Let's start with the following code:

import java.util.Map;
import java.util.HashMap;

public class HashMapTest {
 // creates a map with the keys from an array and
 // initializes the values with zero 
 public static Map<String, Integer> prepareMap(String[] arr) {
  Map<String, Integer> map = new HashMap<String, Integer>();
  for (String str : arr) {
   map.put(str, new Integer(0));
  }
  return map;
 }

 // updates the map, i.e. increases the values by one
 public static void updateMap(Map<String, Integer> map) {
  for (String str : map.keySet()) {
   Integer i = map.get(str);
   map.put(str, new Integer(i.intValue() + 1));
  }
 }

 // prints the map content
 public static void printMap(Map<String, Integer> map) {
  for (String str : map.keySet()) {
   System.out.println(str + " -- " + map.get(str));
  }
 }

 static public void main(String[] args) {
  String[] arr = {"A", "B", "C", "D", "E", "F", "G", "H"};

  Map<String, Integer> map = prepareMap(arr);
  printMap(map);
  updateMap(map);
  printMap(map);
 }
}

Nothing complicated, we generate a map and process it within the updateMap method. Everything works fine at this point. However, let's imagine that we need a slightly more complicated processing of the map, e.g. if we encounter the key "C", then we need to add a new entry to the map, something like this:

 // updates the map, i.e. increases the values by one
 public static void updateMap(Map<String, Integer> map) {
  for (String str : map.keySet()) {
   Integer i = map.get(str);
   map.put(str, new Integer(i.intValue() + 1));
   if (str.equals("C")) {
    // we have a special case, let's treat it
    map.put("C1", new Integer(0));
   }
  }
 }

Now, when we execute the updated code, it fails (run time) with the following exception:

Exception in thread "main" java.util.ConcurrentModificationException
        at java.util.HashMap$HashIterator.nextEntry(Unknown Source)
        at java.util.HashMap$KeyIterator.next(Unknown Source)
        at HashMapTest.updateMap(HashMapTest.java:19)
        at HashMapTest.main(HashMapTest.java:55)

And this is exactly what we are told by the specification:

"Returns a Set view of the keys contained in this map. The set is backed by the map, so changes to the map are reflected in the set, and vice-versa. If the map is modified while an iteration over the set is in progress (except through the iterator's own remove operation), the results of the iteration are undefined."

In other words, HashMap (in fact, this isn't specific to HashMap alone) doesn't allow structural modification while iterating, more specifically adding entries with new keys.
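The fail-fast behaviour is easy to demonstrate in isolation. The sketch below (a hypothetical helper, not part of the code above) adds a key mid-iteration and reports whether the loop survived; note the specification only promises the exception on a best-effort basis:

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class FailFastDemo {
    // adds a new key while iterating; reports whether the iteration survived
    public static boolean addDuringIteration(Map<String, Integer> map) {
        try {
            for (String key : map.keySet()) {
                map.put("X", 0); // structural modification during iteration
            }
            return true;  // weakly consistent iterators (ConcurrentHashMap) get here
        } catch (ConcurrentModificationException e) {
            return false; // fail-fast iterators (HashMap) end up here
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> plain = new HashMap<>();
        plain.put("A", 0);
        plain.put("B", 0);
        System.out.println("HashMap survives: " + addDuringIteration(plain));

        Map<String, Integer> concurrent = new ConcurrentHashMap<>();
        concurrent.put("A", 0);
        concurrent.put("B", 0);
        System.out.println("ConcurrentHashMap survives: " + addDuringIteration(concurrent));
    }
}
```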

One way to fix the problem is to create a copy of the key set, something like this:

import java.util.Map;
import java.util.HashMap;
import java.util.HashSet;

public class HashMapTest {
 // creates a map with the keys from an array and
 // initializes the values with zero 
 public static Map<String, Integer> prepareMap(String[] arr) {
  Map<String, Integer> map = new HashMap<String, Integer>();
  for (String str : arr) {
   map.put(str, new Integer(0));
  }
  return map;
 }

 // updates the map, i.e. increases the values by one
 public static void updateMap(Map<String, Integer> map) {
  HashSet<String> keys = new HashSet<String>(map.keySet());
  for (String str : keys) {
   Integer i = map.get(str);
   map.put(str, new Integer(i.intValue() + 1));
   if (str.equals("C")) {
    // we have a special case, let's treat it
    map.put("C1", new Integer(0));
   }
  }
 }

 // prints the map content
 public static void printMap(Map<String, Integer> map) {
  for (String str : map.keySet()) {
   System.out.println(str + " -- " + map.get(str));
  }
 }

 static public void main(String[] args) {
  String[] arr = {"A", "B", "C", "D", "E", "F", "G", "H"};

  Map<String, Integer> map = prepareMap(arr);
  printMap(map);
  updateMap(map);
  printMap(map);
 }
}

But, if we look at the output, we will notice that the "C1" entry is left unprocessed:

    ...
    A -- 1
    B -- 1
    C -- 1
    C1 -- 0
    H -- 1
    ...

As a result, we would need a second pass to process the unprocessed data. There is a different solution though: use ConcurrentHashMap, which, according to the specification:

"Retrievals reflect the results of the most recently completed update operations holding upon their onset."

And here is the final code:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class HashMapTest {
 // creates a map with the keys from an array and
 // initializes the values with zero 
 public static Map<String, Integer> prepareMap(String[] arr) {
  Map<String, Integer> map = new ConcurrentHashMap<String, Integer>();
  for (String str : arr) {
   map.put(str, new Integer(0));
  }
  return map;
 }

 // updates the map, i.e. increases the values by one
 public static void updateMap(Map<String, Integer> map) {
  for (String str : map.keySet()) {
   Integer i = map.get(str);
   map.put(str, new Integer(i.intValue() + 1));
   if (str.equals("C")) {
    // we have a special case, let's treat it
    map.put("C1", new Integer(0));
   }
  }
 }

 // prints the map content
 public static void printMap(Map<String, Integer> map) {
  for (String str : map.keySet()) {
   System.out.println(str + " -- " + map.get(str));
  }
 }

 static public void main(String[] args) {
  String[] arr = {"A", "B", "C", "D", "E", "F", "G", "H"};

  Map<String, Integer> map = prepareMap(arr);
  printMap(map);
  updateMap(map);
  printMap(map);
 }
}

All the entries are processed accordingly.

Now, let's check the efficiency, by executing the following code:

 static public void main(String[] args) {
  String[] arr = {"A", "B", "C", "D", "E", "F", "G", "H"};

  long total = 0;
  int max = 1000000;
  for (int i = 0; i < max; i++) {
   long start = System.currentTimeMillis();
   Map<String, Integer> map = prepareMap(arr);
   updateMap(map);
   long end = System.currentTimeMillis();
   total += end - start;
  }
  System.out.println("Average time " + ((1.0*total)/max) + "ms");
 }

Average for the ConcurrentHashMap version:

Average time 0.0022ms

Average for the HashMap and HashSet version:

Average time 0.0010ms

Indeed, ConcurrentHashMap brings a performance penalty worth considering. However, the roughly 2x gap above is misleading, because in this example we are operating with a small number of entries:

String[] arr = {"A", "B", "C", "D", "E", "F", "G", "H"};

If we increase the number of entries to e.g. 259 (from "A0", "A1", ... to "Z9", excluding "C1"), then we have the following figures:

For the ConcurrentHashMap version:

Average time 0.048ms

For the HashMap and HashSet version:

Average time 0.033ms
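A caveat on the measurement itself: System.currentTimeMillis has coarse resolution, so timing each sub-millisecond iteration individually is noisy. A sketch of a slightly more robust harness (hypothetical names, illustrative task) reads the clock once around the whole loop using System.nanoTime:

```java
import java.util.HashMap;
import java.util.Map;

public class MapBench {
    // time 'runs' repetitions of a task and return the average in milliseconds
    public static double averageMillis(Runnable task, int runs) {
        long start = System.nanoTime(); // one clock read around the whole loop
        for (int i = 0; i < runs; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / (runs * 1_000_000.0);
    }

    public static void main(String[] args) {
        double avg = averageMillis(() -> {
            Map<String, Integer> m = new HashMap<>();
            for (int i = 0; i < 8; i++) {
                m.put("K" + i, i);
            }
        }, 100_000);
        System.out.println("Average time " + avg + "ms");
    }
}
```

The same swap of map implementations can then be benchmarked without the per-iteration timer error accumulating.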

Wednesday, February 29, 2012

The Game Of Life

Did I manage to grab your attention with the title? Good. Let's proceed to the lyrical part of this post then. A few years ago I watched this film and found the statistics starting at 1:16:56 quite alarming. Not that I didn't know most of those details, but it is totally different when you see them all together. I encourage you to watch it as well. However, for the sake of this post, let's retain the following: "20% of the world's population consumes 80% of its resources".

Now, let's proceed to the technical part. My son started learning Pascal at school, and from time to time I give him "advanced" assignments to polish his programming (and thinking) technique. After a few complaints from him, of the sort "I am tired of doing nonsense like calculating π, sorting etc. I want to do something ... real!" (well, he is just 12 years old), I decided to give him something "more real": a very basic Monte Carlo simulation that I called "The Game Of Life". The reaction was exactly as hoped, he got hooked by the title :)

So, what is the game about? Let's imagine a world with limited resources (I don't think that is difficult to imagine), which in the language of the game are called simply "wealth". We have a population of PLAYERS=100 players. Each player has two attributes, "wealth" and "power". The game starts with an equal distribution of wealth between the players and "power=0" for each player. In each iteration, which we can call "a year", each player plays against a randomly selected player (players can't play against themselves). The rules for winning a turn are quite simple:

  • The player with greater "power" wins.
  • If the players have the same power, the richest one wins.
  • If the players have the same power and wealth, the winner is selected randomly.
  • The winner gets a reward, which means their "power" increases by 1 and if the loser has sufficient "wealth", the winner's "wealth" increases by 1, the loser's wealth decreases by 1 respectively (so we keep the resources constant).
The game continues for 10000 iterations.

Here is the code (slightly adjusted with my help):


program montecarlosimulation;
uses wincrt;
const
        PLAYERS = 100;
        YEARS = 10000;

var
        i,j:integer;
        x,y:integer;
        wealth:array[1..PLAYERS] of longint;
        power :array[1..PLAYERS] of longint;
        tmp:longint;
        poor:integer;

// p1 is the winner!!!
procedure reward(p1, p2:integer);
begin
        if wealth[p2] > 0 then begin
                wealth[p1] := wealth[p1] + 1;
                wealth[p2] := wealth[p2] - 1;
        end;
        power[p1] := power[p1] + 1;
end;

// decide who is the winner and apply the reward
procedure decision(a, b:integer);
var
        d:integer;
begin
        // same power
        if power[b] = power[a] then begin
               // same wealth
                if wealth[a] = wealth[b] then begin
                        // pick the winner randomly
                        d := random(2);
                        if d = 0 then reward(a,b)
                        else reward(b,a);
                end
                else begin
                        if wealth[b] > wealth[a] then reward(b,a)
                        else reward(a,b);
                end;
        end
        else begin
                if power[b] > power[a] then reward(b,a)
                else reward(a,b);
        end;
end;

begin
        // initialise the players' values
        // total wealth is PLAYERS*1000
        for i := 1 to PLAYERS do begin
                wealth[i] := 1000;
                power[i] := 0;
        end;

        // the actual game
        randomize;
        for i := 1 to YEARS do begin
                for x := 1 to PLAYERS do begin
                        y := random(PLAYERS) + 1;
                        // the actual play
                        if x <> y then decision(y, x);
                end;
        end;

        // sorting the results, wealthiest on top
        for i := 1 to PLAYERS do begin
                for j:=i+1 to PLAYERS do begin
                        if wealth[i] < wealth[j] then begin
                                tmp := wealth[i];
                                wealth[i] := wealth[j];
                                wealth[j] := tmp;

                                tmp := power[i];
                                power[i] := power[j];
                                power[j] := tmp;
                        end;
                end;
        end;

        // printing the results
        poor := 0;
        for i := 1 to PLAYERS do begin
           if (wealth[i] = 0) then poor := poor + 1;
           writeln('person[', i,'] has ', wealth[i],' wealth and ',power[i],' power');
        end;

        // printing statistics
        writeln(poor, ' of ', PLAYERS, ' are poor.');
end.

This code was compiled with the Free Pascal IDE.
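For readers without a Pascal toolchain, here is a compact Java sketch of the same rules (a hypothetical port with the same constants; the class and method names are mine):

```java
import java.util.Arrays;
import java.util.Random;

public class GameOfLife {
    static final int PLAYERS = 100;
    static final int YEARS = 10000;

    final int[] wealth = new int[PLAYERS];
    final int[] power = new int[PLAYERS];
    final Random random = new Random();

    // winner w takes 1 wealth from loser l (if the loser has any) and gains 1 power
    void reward(int w, int l) {
        if (wealth[l] > 0) {
            wealth[w]++;
            wealth[l]--;
        }
        power[w]++;
    }

    // higher power wins; ties broken by wealth, then randomly
    void play(int a, int b) {
        if (power[a] != power[b]) {
            if (power[a] > power[b]) reward(a, b); else reward(b, a);
        } else if (wealth[a] != wealth[b]) {
            if (wealth[a] > wealth[b]) reward(a, b); else reward(b, a);
        } else if (random.nextBoolean()) {
            reward(a, b);
        } else {
            reward(b, a);
        }
    }

    // runs the simulation and returns the number of players left with zero wealth
    int run() {
        Arrays.fill(wealth, 1000); // total wealth is PLAYERS*1000
        Arrays.fill(power, 0);
        for (int year = 0; year < YEARS; year++) {
            for (int x = 0; x < PLAYERS; x++) {
                int y = random.nextInt(PLAYERS);
                if (x != y) play(x, y);
            }
        }
        int poor = 0;
        for (int w : wealth) if (w == 0) poor++;
        return poor;
    }

    int totalWealth() {
        int total = 0;
        for (int w : wealth) total += w;
        return total;
    }

    public static void main(String[] args) {
        GameOfLife game = new GameOfLife();
        System.out.println(game.run() + " of " + PLAYERS + " are poor.");
    }
}
```

A nice sanity check on either version is that the total wealth stays constant at PLAYERS*1000 throughout the game.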

Now, let's look at the results. After executing this code several times, the number of wealthy players (those with "wealth" > 0) settles between 20 and 23 out of 100, i.e. ~20%. Setting YEARS = 15000 decreases the result to 16%. But let's get back to the 20% figure: is it a coincidence that it matches the 20% from the lyrical part at the top of the post? The decision to set YEARS = 10000 was somewhat arbitrary, but later we (my son and I) figured out that ... check this out: "Human prehistory begins in the Paleolithic Era, or "Early Stone Age". Later, during the Neolithic Era (New Stone Age), came the Agricultural Revolution (between 8000 and 5000 BCE) in the Fertile Crescent, where humans first began the systematic husbandry of plants and animals".

A couple of interesting facts, as you may see. Obviously, the code is way too simple to pretend to model a real game of life: the number of players is constant; there is no way for a poor player to ever win (which can happen in real life); lots of rich people are also philanthropists; a "brain" (or "faith") attribute isn't considered at all; two or more poor players with total "power" greater than the "power" of a rich player could team up against that player; an "insufficiently" rich player could bribe a richer and more powerful player to win a play against another player; etc. All this, of course, would make the game more realistic and more complicated (for my son, who is still learning Pascal, while his father "plays" the game).

Anyway, if I managed to draw your attention to this subject, feel free to write a better version of "The Game Of Life" and drop me a note with your results. I am really curious.

Friday, January 27, 2012

Scope of variables in Java, nulling and why this might be important

While playing with some performance improvements this week, I came across this article. Weird, you'd say, but let me show my version of the test scenario.

First of all, I am using Java 1.6.0_24 HotSpot 64-bit and executing the code with the following options (i.e. limiting the heap size to 64Mb):


java -Xms64m -Xmx64m MemTest

Here is the first version of the code:


public class MemTest {
 public static int avg(int[] ns) {
  long sum = 0;
  for (int i = 0; i < ns.length; i++) sum += ns[i];
  return (int) (sum/ns.length);
 }

 public static void main(String[] args) {
  int[] nums = null;

  for (int j = 0; j < 1000000; j++) {
   System.out.print("Iteration: ");
   System.out.println(j);

   int sz = 2<<(j % 30);
   if (sz > 11162611) sz = 11162611;

   System.out.print("Dynamic size: ");
   System.out.println(sz);

   nums = new int[sz];
   for (int i = 0; i < nums.length; i++) nums[i] = i;

   int avr = avg(nums);
   System.out.print("Average: ");
   System.out.println(avr);
  }
 }
}

There is nothing complicated in this code, we allocate an array (int[] nums) outside the main "for" loop (this is the key point of this post by the way) and calculate the average. Everything works fine so far. Now let's remove the "for" loop responsible for assignments:


public class MemTest {
 public static int avg(int[] ns) {
  long sum = 0;
  for (int i = 0; i < ns.length; i++) sum += ns[i];
  return (int) (sum/ns.length);
 }

 public static void main(String[] args) {
  int[] nums = null;

  for (int j = 0; j < 1000000; j++) {
   System.out.print("Iteration: ");
   System.out.println(j);

   int sz = 2<<(j % 30);
   if (sz > 11162611) sz = 11162611;

   System.out.print("Dynamic size: ");
   System.out.println(sz);

   nums = new int[sz];
   int avr = avg(nums);
   System.out.print("Average: ");
   System.out.println(avr);
  }
 }
}

Now the execution fails at run time with an OutOfMemoryError. Odd, isn't it? What do we typically do in such cases? Increase the heap size? But the previous version was working fine with the same heap size. OK, now let's do something which isn't considered good practice, by adding the line "nums = null;":


public class MemTest {
 public static int avg(int[] ns) {
  long sum = 0;
  for (int i = 0; i < ns.length; i++) sum += ns[i];
  return (int) (sum/ns.length);
 }

 public static void main(String[] args) {
  int[] nums = null;

  for (int j = 0; j < 1000000; j++) {
   System.out.print("Iteration: ");
   System.out.println(j);

   int sz = 2<<(j % 30);
   if (sz > 11162611) sz = 11162611;

   System.out.print("Dynamic size: ");
   System.out.println(sz);

   nums = new int[sz];
   int avr = avg(nums);
   nums = null; // <<-- HERE IS THE NEW LINE

   System.out.print("Average: ");
   System.out.println(avr);
  }
 }
}

Hey, fantastic, it works again! And now let's do it properly by changing the scope of the "int[] nums":


public class MemTest {
 public static int avg(int[] ns) {
  long sum = 0;
  for (int i = 0; i < ns.length; i++) sum += ns[i];
  return (int) (sum/ns.length);
 }

 public static void main(String[] args) {
  for (int j = 0; j < 1000000; j++) {
   System.out.print("Iteration: ");
   System.out.println(j);

   int sz = 2<<(j % 30);
   if (sz > 11162611) sz = 11162611;

   System.out.print("Dynamic size: ");
   System.out.println(sz);

   int[] nums = new int[sz]; // <<-- LOCAL SCOPE
   int avr = avg(nums);

   System.out.print("Average: ");
   System.out.println(avr);
  }
 }
}

And it works again.

So, why is this happening? Honestly, I have no definitive answer. One plausible piece of the puzzle is that, while "nums" remains in scope and potentially live, the previous array stays reachable at the moment the next one is allocated, so two large arrays must fit in the heap at once; nulling the reference (or narrowing the scope) makes the old array collectible. It could also be a particularity of the JVM implementation, as the author of the article I mentioned at the top suggests. Either way, we can draw a conclusion: variable scope is something worth considering (and coding standards aren't always just "cosmetic" features).
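One way to poke at reachability directly is with a WeakReference, which lets us observe when the garbage collector actually reclaims the array. A small sketch (hypothetical class name; remember that System.gc() is only a hint, hence the retry loop):

```java
import java.lang.ref.WeakReference;

public class ReachabilityDemo {
    // returns true once the array has actually been garbage collected
    public static boolean collectedAfterNulling() {
        int[] nums = new int[1_000_000];
        WeakReference<int[]> ref = new WeakReference<>(nums);
        nums = null; // drop the only strong reference
        // System.gc() is only a request, so retry a few times
        for (int i = 0; i < 10 && ref.get() != null; i++) {
            System.gc();
        }
        return ref.get() == null;
    }

    public static void main(String[] args) {
        System.out.println("collected after nulling: " + collectedAfterNulling());
    }
}
```

Removing the "nums = null;" line (while keeping "nums" in scope) should keep the array strongly reachable and the weak reference intact, mirroring the OutOfMemoryError scenario above.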

Wednesday, January 25, 2012

Getting ready for Oracle Certified Professional Java SE Programmer?

A few years ago I passed the SCJP (Sun Certified Java Programmer) certification and, while preparing for the test, compiled a short, 18-page document containing some key facts required to pass it. I "accidentally" found it in my archives and decided to share it; here is the link.

The document is a collection of facts gathered from various books and articles, as well as code examples written by me while experimenting. It doesn't always clarify each topic in great detail, but that wasn't its purpose. So, if you struggle with a topic and want to know more, I always recommend consulting the JLS (Java Language Specification).

You are more than welcome to use it, and I will be happy to know that it helped you pass the test (or at least that you found it useful) ;)

Wednesday, December 14, 2011

Jail Break Quiz

Everything was quiet and peaceful today, until Google Reader brought to my attention this puzzle from plus.maths.org.

So, I said, why not write a little Java program to solve this? Here is the code:


public class Puzzle1 {
 static public final int MAX = 100;

 static public void printDoors(boolean[] doorIsOpen) {
  int count = 0;
  for (int i = 0; i < doorIsOpen.length; i++) {
   if (doorIsOpen[i]) {
    System.out.print("o");
    count++;
   }
   else System.out.print("c");
  }
  System.out.println(" =" + count + " open doors");
 }

 static public void main(String[] args) {
  boolean[] doorIsOpen = new boolean[MAX];
  
  for (int i = 1; i <= MAX; i++) {
   for (int j = i; j <= MAX; j+=i) {
    doorIsOpen[j-1] = !doorIsOpen[j-1];
   }
   printDoors(doorIsOpen);
  }
 }
}

The result was "10 prisoners have escaped", as per the last line printed by the program:


occoccccoccccccoccccccccocccccccccco
ccccccccccccoccccccccccccccocccccccc
ccccccccocccccccccccccccccco 
=10 open doors

The most interesting part of the puzzle, though, is the "why". Have a look at the sequence: the 1st position (cell in this case) contains "o" (from "open"), 4 - "o", 9 - "o", 16 - "o", 25 - "o", 36 - "o", 49 - "o", 64 - "o", 81 - "o", 100 - "o". Basically, every perfect square $q^2 \le 100$ corresponds to an "o". Now, this is weird ... but not for long.

Let's stick with an easier example (10 doors rather than 100):

oooooooooo
ocococococ
occcoooccc
occooooocc
occocoooco
occoccooco
occocccoco
occoccccco
occoccccoo
occoccccoc
  • The 1st cell (column from the left) is visited once, then remains open and unvisited. Also, the number 1 has one divisor: {1}.
  • The 2nd cell is visited twice (one "o" and one "c"), then remains closed and unvisited. Also, the number 2 has two divisors: {1, 2}.
  • The 3rd cell is visited twice (one "o" and one "c"), then remains closed and unvisited. Also, the number 3 has two divisors: {1, 3}.
  • The 4th cell is visited three times (one "o", one "c" and one "o"), then remains open and unvisited. Also, the number 4 has three divisors: {1, 2, 4}.
  • ...

It turns out that the number of visits is equal to the number of divisors of the number corresponding to the cell (column from the left). Here is the formula, and this statement is easy to prove (e.g. using induction).

  • Now, if the number of divisors is even (we have a whole number of pairs {"o", "c"}), because the sequence always starts with "o" then it must end with "c".
  • However, if the number of divisors is odd (we have a whole number of pairs {"o", "c"} plus one extra {"o"}), the sequence ends with "o".

So far so good, but when is the number of divisors odd? According to the formula mentioned above, writing $n = p_1^{e_1} \cdot p_2^{e_2} \cdots p_k^{e_k}$, the number of divisors $$\tau(n)=(e_1+1)\cdot(e_2+1)\cdot(e_3+1)\cdots(e_k+1)$$ is odd exactly when each of the exponents $e_1, e_2, e_3, \dots, e_k$ is even, which means that $n$ is a perfect square $q^2$. Q.E.D.
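The divisor-counting argument can be cross-checked without simulating the toggling at all (a small sketch; the class name is mine):

```java
public class OpenDoors {
    // a door ends open iff its number has an odd number of divisors
    public static boolean endsOpen(int n) {
        int divisors = 0;
        for (int d = 1; d <= n; d++) {
            if (n % d == 0) divisors++;
        }
        return divisors % 2 == 1; // odd number of toggles leaves it open
    }

    public static void main(String[] args) {
        int open = 0;
        for (int n = 1; n <= 100; n++) {
            if (endsOpen(n)) open++;
        }
        System.out.println(open + " open doors"); // prints 10, matching the simulation
    }
}
```

Counting divisors instead of toggling states gives the same 10 escaped prisoners, and checking that the open doors are exactly the perfect squares confirms the proof.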

Monday, December 12, 2011

K-Means Clustering Algorithm

This post is dedicated to K-Means Clustering Algorithm, used for unsupervised learning. Here is a short introduction into the unsupervised learning subject. This video shows the algorithm at work.

So, how does it work?

1. First of all, we have the number K (the number of clusters) and the data set $X_1,\dots,X_m$ as inputs, with $K < m$ and $X_j \in \mathbb{R}^n$ for $\forall j=1..m$. Considering that clustering is used as a task in data mining (including text mining), it is a nice exercise indeed to formalise a problem so that the data is represented by vectors :)

2. At this step we randomly initialise K cluster centroids $\mu_1,\dots,\mu_K$, with $\mu_j \in \mathbb{R}^n$ for $\forall j=1..K$. The recommended way is to initialise the centroids from randomly chosen points of the data set $X_1,\dots,X_m$, making sure that $\mu_i \ne \mu_j$ for $\forall i \ne j$.

3. At this step we execute the loop:


Repeat {
  //cluster assignment step
  For i=1..m {
    Find the closest centroid for Xi, i.e.
    min{||Xi - μj||}, ∀j=1..K, e.g. it is μt;
    Assign ci=t;
    //in this case ci is the index of 
    //the closest centroid for Xi
  }

  //move centroids step, if a particular cluster 
  //centroid has no points assigned to it, 
  //then eliminate it
  For j=1..K {
    old_μj = μj;
    μj = average (mean) of all Xi where ci=j, ∀i=1..m;
  }
}
This loop ends when all the centroids stop "moving", i.e. ||old_μj - μj|| < ε, ∀j=1..K, where ε is an error we are happy with (e.g. 0.01 or 0.001).

This is pretty much the algorithm. However, in this form, there is a risk of getting stuck in a local minimum of the cost function:
$$J(X_1,\dots,X_m,c_1,\dots,c_m,\mu_1,\dots,\mu_K) = \frac{1}{m}\sum_{i=1}^{m}\lVert X_i - \mu_{c_i}\rVert^2$$

To address this, steps 2 and 3 are executed several times (e.g. 100). Each time, the cost function J(...) is computed, and the clustering with the lowest J(...) is picked. E.g.


For p=1..100 {
  Perform step 2;
  Perform step 3;
  Calculate cost function J(...);
}
Pick up clustering with the lowest cost J(...);

To make it more interactive, my son and I spent a couple of hours implementing a JavaScript version of this algorithm using the <canvas> tag from HTML5 (my son studies HTML at school, so it was an interesting exercise). Here is the link to the page (excuse us for our non-OOP approach to JavaScript :)). If you want to have a look at the sources, feel free to download that page (we didn't provide comments, but hopefully this article explains everything). We tested it with Google Chrome and Internet Explorer 9 (if you have problems with IE9, please consult the following link).
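To make steps 2-3 concrete, here is a minimal one-dimensional Java sketch of the loop (a single run with fixed initial centroids, so no restarts; all names are illustrative):

```java
import java.util.Arrays;

public class KMeans {
    // Lloyd's algorithm on 1-D data; mutates and returns the centroid array
    public static double[] cluster(double[] xs, double[] centroids, double eps) {
        int k = centroids.length;
        while (true) {
            double[] sum = new double[k];
            int[] count = new int[k];
            // cluster assignment step: each point goes to its closest centroid
            for (double x : xs) {
                int best = 0;
                for (int j = 1; j < k; j++) {
                    if (Math.abs(x - centroids[j]) < Math.abs(x - centroids[best])) best = j;
                }
                sum[best] += x;
                count[best]++;
            }
            // move centroids step: each centroid becomes the mean of its points
            double moved = 0;
            for (int j = 0; j < k; j++) {
                if (count[j] == 0) continue; // empty cluster: leave the centroid in place
                double mean = sum[j] / count[j];
                moved = Math.max(moved, Math.abs(mean - centroids[j]));
                centroids[j] = mean;
            }
            if (moved < eps) return centroids; // all centroids stopped "moving"
        }
    }

    public static void main(String[] args) {
        double[] xs = {1, 1.2, 0.8, 10, 10.3, 9.7};
        double[] c = cluster(xs, new double[]{0, 5}, 1e-6);
        System.out.println(Arrays.toString(c)); // two centroids near 1 and 10
    }
}
```

Wrapping the call in an outer loop over random initialisations and keeping the result with the lowest J(...) gives the multi-restart variant described above.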

Tuesday, November 22, 2011

A* Search

Algorithms form an important part of the problem solving process in Artificial Intelligence. One useful algorithm is A* Search, which is an extension of another useful algorithm called Dijkstra's algorithm. This video is a short introduction to the subject.

So, the key part of this algorithm is the evaluation function $$f(\text{node})=g(\text{node})+h(\text{node})$$ (note that in the video "state" is treated as a "node"), where
- $g(\text{node})$ is the cost from the "Start" node to the current node and
- $h(\text{node})$ is the heuristic (estimated) cost from the current node to the "Goal".
The algorithm works by always expanding the path which has the minimum value of $f(\text{node})$ first (cheapest first). The search terminates when the "Goal" node is chosen for expansion or there are no more nodes to expand.

For the heuristic function to be admissible (or optimistic, which is the same) it is important that for any node $h(\text{node}) \le$ actual cost from the node to the "Goal". If this is true, then "A* Search" will always find the lowest cost path from the "Start" to the "Goal", this video explains this principle with more details.

As you may have noticed, the last video uses another term, the "search frontier". It is a characteristic of the algorithm, or of the way it progresses while expanding nodes. For example:
- this article shows how the search frontier looks for the Breadth-First Search algorithm and
- this article shows how the search frontier looks for the Depth-First Search algorithm.

An implementation of this algorithm would use a priority queue, where the priority is dictated by the function $f$.
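Here is a minimal sketch of that idea on a hypothetical graph (the node names and heuristic values are invented; a fixed step cost of 10, as in the exercise that follows):

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

public class AStar {
    // returns the path from start to goal, or an empty list if none exists
    public static List<String> search(String start, String goal,
            Map<String, List<String>> edges, Map<String, Integer> h) {
        Map<String, Integer> g = new HashMap<>();      // cost from start so far
        Map<String, String> parent = new HashMap<>();
        PriorityQueue<String> open = new PriorityQueue<>(
            Comparator.comparingInt((String n) -> g.get(n) + h.get(n)));
        Set<String> expanded = new HashSet<>();
        g.put(start, 0);
        open.add(start);
        while (!open.isEmpty()) {
            String node = open.poll();                 // cheapest f = g + h first
            if (!expanded.add(node)) continue;         // skip stale queue entries
            if (node.equals(goal)) {                   // goal chosen for expansion
                LinkedList<String> path = new LinkedList<>();
                for (String n = goal; n != null; n = parent.get(n)) path.addFirst(n);
                return path;
            }
            for (String next : edges.getOrDefault(node, List.of())) {
                int cost = g.get(node) + 10;           // fixed step cost of 10
                if (cost < g.getOrDefault(next, Integer.MAX_VALUE)) {
                    g.put(next, cost);
                    parent.put(next, node);
                    open.add(next);
                }
            }
        }
        return List.of();
    }

    public static void main(String[] args) {
        Map<String, List<String>> edges = Map.of(
            "Start", List.of("A", "B"),
            "A", List.of("Goal"),
            "B", List.of("Goal"));
        Map<String, Integer> h = Map.of("Start", 15, "A", 9, "B", 12, "Goal", 0);
        System.out.println(search("Start", "Goal", edges, h));
    }
}
```

Note that the search only terminates when the "Goal" is removed from the queue, not when it is first added, which is exactly what guarantees optimality with an admissible heuristic.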

Now, let's consolidate this material with an exercise:

For heuristic function h and cost of action 10 (per step), find the order (1, 2, 3, ...) of the nodes expansion (the node is expanded when it is removed from queue). Start with "1" at the "Start" state at the top. Indicate the nodes that will never be expanded. Is the heuristic function admissible?

First of all the heuristic function is not admissible because h(B5)=20 > 10, where 10 is the actual cost from B5 to the "Goal".

Now, let's start with the expansion. The frontier is at the "Start".

1. f(A1)=10+11=21, f(A2)=18, f(A3)=17, f(A4)=16, f(A5)=20. The frontier moves to A1, A2, A3, A4, A5 (this is the content of the queue).

2. A4 is the node (in the queue) with the minimum f (=16), so it's the second node to be expanded (removed from the queue). f(B4)=10+10+5=25, f(B5)=40. The frontier moves to A1, A2, A3, B4, B5, A5.

3. A3 is the next node with the minimum f (=17), so it's the third node to be expanded. But there is nothing to add to the queue. The frontier now is A1, A2, B4, B5, A5.

4. A2 is the next node with the minimum f (=18), so it's the fourth node to be expanded. f(B2)=10+10+3=23, f(B3)=29. The frontier moves to A1, B2, B3, B4, B5, A5.

5. A5 is the next node (again, in the queue) with the minimum f (=20), so it's the fifth node to be expanded. f("Goal")=10+10+0=20. The frontier moves to A1, B2, B3, B4, B5, "Goal".

6. The "Goal" is the next node (from the queue) with the minimum f (=20), so it's the sixth node to be expanded and, because it is the "Goal", it is also the last one to be expanded (i.e. end of the search).

Here is the expansion order: "Start"-1, A4-2, A3-3, A2-4, A5-5, "Goal"-6.

The nodes which were never expanded: A1, B1, B2, B3, B4 and B5.

Friday, August 12, 2011

The Code, Part 2

For the last two weeks I have been addicted to The Code. Those who successfully passed pre-selection were offered a PDF file to download (you need the passwords from the pre-selection to open it) with even more puzzles.

Here is another one I liked:

A Mathematician's Apology
$$\overline{ABC} = A^3 + B^3 + C^3$$
$$\overline{CDE} = C^3 + D^3 + E^3$$
$$\overline{CDA} = C^3 + D^3 + A^3$$
$$\overline{FED} = F^3 + E^3 + D^3$$
$$A \cdot 100 + A \cdot 10 + D = \,?$$
(here $\overline{ABC}$ denotes the three-digit number with digits A, B, C)

Now, if you Google for "A Mathematician's Apology", you will come to the following link, which is a book with the same title by Hardy. Section 15 reveals all the solutions for the equation:
$$x \cdot 100 + y \cdot 10 + z = x^3 + y^3 + z^3$$
where $x \in \{1..9\}$ and $y,z \in \{0..9\}$. The solutions are (1,5,3), (3,7,0), (3,7,1) and (4,0,7).

The pattern is easy to spot now:
(A,B,C) = (1,5,3)
(C,D,E) = (3,7,0)
(C,D,A) = (3,7,1)
(F,E,D) = (4,0,7)

and $A \cdot 100 + A \cdot 10 + D = 117$

However, if (like me) you don't consider a Google search an obvious option, the following Java code:

public class Code {
  public static int func(int a, int b, int c) {
    return a*100 + b*10 + c - a*a*a - b*b*b - c*c*c;		
  }

  public static void main(String[] args) {
    for (int a = 1; a < 10; a++) {
      for (int b = 0; b < 10; b++) {
        for (int c = 0; c < 10; c++) {
          if (func(a,b,c) == 0) {
            System.out.println("("+a+","+b+","+c+")");
          }
        }
      }
    }
  }
}                                       

will also return (1,5,3), (3,7,0), (3,7,1) and (4,0,7).

Sunday, July 10, 2011

Sudoku game or humans vs. machines (computers in this case)

A few days ago, a friend of mine posted the following link on his Facebook wall. The main question from that article was "how long will it take to crack ... the world's hardest Sudoku?". I felt challenged.

I am not a Sudoku fan, but my wife and my son are. A long time ago I wrote a little piece of Java code to solve such puzzles (of course I was inspired, it's not entirely my contribution). Partly to annoy them, not because I am a bad person :), but because I was looking for a quicker way to sort out the puzzles when asked to assist ("I am lazy" is a more precise definition), and because I am a software engineer. So, I decided to test it.

Here is the code:

import java.util.*;
import java.io.*;

public class ResolveSudoku {
 private static final int[] sudokuMatrix = {0,0,0,0,0,0,0,0,0,
         0,0,0,0,0,0,0,0,0,
         0,0,0,0,0,0,0,0,0,
         0,0,0,0,0,0,0,0,0,
         0,0,0,0,0,0,0,0,0,
         0,0,0,0,0,0,0,0,0,
         0,0,0,0,0,0,0,0,0,
         0,0,0,0,0,0,0,0,0,
         0,0,0,0,0,0,0,0,0};

 // load the content from file
 public static void load(String file) throws Exception {
  String str = null;
  int line = 0;
  BufferedReader br = new BufferedReader(new FileReader(file));

  while ((str = br.readLine()) != null) {
   if (str.trim().length() == 0 || str.startsWith("#")) 
    continue;

   if (line >= 9) {
    line = 10;
    break;
   }

   String[] vls = str.split(",");
   if (vls.length != 9) {
    br.close();
    throw new Exception("Line " + (line + 1) + 
     " has illegal number of elements!");
   }

   for (int i = 0; i < 9; ++i) {
    int value = Integer.parseInt(vls[i].trim());
    if (value < 0 || value > 9) {
     br.close();
     throw new Exception("Line " + (line + 1) + 
      " position " + (i + 1) + " has illegal value!");
    }
    sudokuMatrix[line*9 + i] = value;
   }

   ++line;
  }

  br.close();
  if (line != 9) throw new Exception("File " + file + 
   " has illegal number of lines!");
 }

 // print the content of the sudoku matrix
 public static void printMatrix() {
  for (int i = 0; i < 81; ++i) {
   if (i != 0 && i % 3  == 0) System.out.print(" ");
   if (i != 0 && i % 9  == 0) System.out.println();
   if (i != 0 && i % 27 == 0) System.out.println();

   if (sudokuMatrix[i] == 0) System.out.print(".");
   else System.out.print(sudokuMatrix[i]);
  }
  System.out.println("\n");
 }

 private static boolean check(int row, int col, int value) {
  // check if row is ok
  int tmp = row * 9;
  for (int i = 0; i < 9; ++i) {
   if (sudokuMatrix[tmp + i] == value) 
    return false;
  }

  // check if column is ok
  for (int i = 0; i < 9; ++i) {
   if (sudokuMatrix[i*9 + col] == value) 
    return false;
  }

  // check if box is ok
  tmp = (row / 3)*27 + (col / 3)*3;
  if (sudokuMatrix[tmp] == value) return false;
  if (sudokuMatrix[++tmp] == value) return false;
  if (sudokuMatrix[++tmp] == value) return false;

  tmp += 7;
  if (sudokuMatrix[tmp] == value) return false;
  if (sudokuMatrix[++tmp] == value) return false;
  if (sudokuMatrix[++tmp] == value) return false;

  tmp += 7;
  if (sudokuMatrix[tmp] == value) return false;
  if (sudokuMatrix[++tmp] == value) return false;
  if (sudokuMatrix[++tmp] == value) return false;

  return true;
 }

 public static boolean findSolution(int row, int col) {
  if (row == 9) {
   row = 0;
   if (++col == 9) return true;
  }

  int pos = row * 9 + col;
  if (sudokuMatrix[pos] != 0) return findSolution(row + 1, col);

  for (int i = 1; i <= 9; ++i) {
   if (check(row, col, i)) {
    sudokuMatrix[pos] = i;
    if (findSolution(row + 1, col)) return true;
   }
  }

  // reset the value back to zero 
  sudokuMatrix[pos] = 0; 
  return false;
 }

 public static void main(String[] args) throws Exception {
  load(args[0]);
  printMatrix();

  long start = System.currentTimeMillis();
  boolean res = findSolution(0, 0);
  long ex_time = System.currentTimeMillis() - start;

  if (res) printMatrix();
  else System.out.println("No solution!");

  System.out.println("Execution time: " + ex_time + "ms.");
 }
}

It is, basically, one routine that reads the puzzle's matrix from a file and another that performs a brute-force search (i.e. analysing all the possible cases) in the form of (I would say) an elegant recursive backtracking method. Having the following input:

0,0,5,3,0,0,0,0,0
8,0,0,0,0,0,0,2,0
0,7,0,0,1,0,5,0,0
4,0,0,0,0,5,3,0,0
0,1,0,0,7,0,0,0,6
0,0,3,2,0,0,0,8,0
0,6,0,5,0,0,0,0,9
0,0,4,0,0,0,0,3,0
0,0,0,0,0,9,7,0,0
The problem was solved in ~30 milliseconds (after the byte code was JIT-ed on my "2.13GHz Core 2 Duo with SSD on board" laptop). And the answer was:

145 327 698
839 654 127
672 918 543

496 185 372
218 473 956
753 296 481

367 542 819
984 761 235
521 839 764
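For completeness, here is a small standalone validator (a sketch of my own, not part of the original solver) that checks a completed 81-cell grid against the same row, column and box constraints that the check method above enforces:

```java
public class SudokuValidator {
  // true if the 81-cell grid g satisfies all row, column and box constraints
  public static boolean isValid(int[] g) {
    for (int u = 0; u < 9; u++) {
      boolean[] row = new boolean[10];
      boolean[] col = new boolean[10];
      boolean[] box = new boolean[10];
      for (int i = 0; i < 9; i++) {
        int r = g[u*9 + i];                              // i-th cell of row u
        int c = g[i*9 + u];                              // i-th cell of column u
        int b = g[(u/3)*27 + (u%3)*3 + (i/3)*9 + i%3];   // i-th cell of box u
        if (r < 1 || r > 9 || row[r] || col[c] || box[b]) return false;
        row[r] = true; col[c] = true; box[b] = true;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // the solution printed above, flattened row by row
    int[] solved = {
      1,4,5,3,2,7,6,9,8,
      8,3,9,6,5,4,1,2,7,
      6,7,2,9,1,8,5,4,3,
      4,9,6,1,8,5,3,7,2,
      2,1,8,4,7,3,9,5,6,
      7,5,3,2,9,6,4,8,1,
      3,6,7,5,4,2,8,1,9,
      9,8,4,7,6,1,2,3,5,
      5,2,1,8,3,9,7,6,4
    };
    System.out.println(isValid(solved)); // prints true
  }
}
```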

Monday, December 27, 2010

Re Java final keyword, immutability and reflection

A few days ago, in a conversation (in the "Sun Certified Java Programmer" group at http://www.linkedin.com/) about the Immutability pattern and Java Reflection, somebody came up with the following code example (slightly adjusted by me in order to highlight the order in which class attributes are initialized), which I found quite interesting and decided to post.

So, execute the following code:
import java.lang.reflect.Field; 

class Test {
 public Test() {
  doIt();
 }
 public void doIt() {}
}

public class TestReflection extends Test { 
 private final String name = "Immutable";
 private String n = "n";
 public TestReflection() { } 

 public void doIt() {
  System.out.println("1 = " + name); 
  System.out.println(n); 
  System.out.println("-----");
 }

 public static void main(String[] args) throws Exception { 
  TestReflection abc = new TestReflection();
  Class&lt;?&gt; c1 = Class.forName("TestReflection");
  Field field2 = c1.getDeclaredField("name"); 
  field2.setAccessible(true); 

  System.out.println("2 = " + abc.name); 
  field2.set(abc, "Mutable"); 
  System.out.println("3 = " + abc.name); 
 } 
}

Output: 
1 = Immutable
null
-----
2 = Immutable
3 = Immutable

And the following version of the code:
import java.lang.reflect.Field;

class Test {
 public Test() {
  doIt();
 }
 public void doIt() {}
}
public class TestReflection extends Test { 
 private final String name; 
 private String n = "n";

 public TestReflection() { 
  name = "Immutable"; 
 } 

 public void doIt() {
  System.out.println("1 = " + name); 
  System.out.println(n); 
  System.out.println("-----");
 }

 public static void main(String[] args) throws Exception { 
  TestReflection abc = new TestReflection();
  Class&lt;?&gt; c1 = Class.forName("TestReflection"); 
  Field field2 = c1.getDeclaredField("name"); 

  field2.setAccessible(true); 
  System.out.println("2 = " + abc.name); 
  field2.set(abc, "Mutable"); 
  System.out.println("3 = " + abc.name); 
 } 
}

Output: 
1 = null
null
-----
2 = Immutable
3 = Mutable

The explanation for this behaviour can be found in the JLS (http://java.sun.com/docs/books/jls/third_edition/html/j3TOC.html), section 17.5.3:

"Even then, there are a number of complications. If a final field is initialized to a compile-time constant in the field declaration, changes to the final field may not be observed, since uses of that final field are replaced at compile time with the compile-time constant. "

Wednesday, December 22, 2010

Java 32bits vs. 64bits

I was playing recently with a recursive method for building all the possible routes via a set of nodes (with some predefined rules to link each pair of nodes) in Java. Rather than persisting the calculated routes into a file, I keep them in a HashSet.

Having about 138 nodes, this recursive method generates more than 5 million routes. This is quite an intensive calculation that consumes (and this is the point)
- about 1GB of RAM with the 32-bit JVM and
- about 2GB with the 64-bit JVM.

The execution time for building the routes is about 17 seconds with the 32-bit JVM and 21 seconds with the 64-bit JVM.

The obvious question is "What the heck?". Apparently, here are the explanations:
1. 64-bit systems use (correctly) 64-bit pointers (please allow me to use this C/C++ term) for accessing the process address space, thus allowing the process to use/access more (virtual) memory (i.e. 4GB with 32 bits vs. 16TB with 64 bits). But every pointer now occupies twice as much memory, and a reference-heavy structure such as a HashSet of objects is mostly pointers, so its footprint grows accordingly. So, 1GB vs. 2GB of RAM isn't really a JVM problem.

Conclusion: if at some point you decide to use 64 bits JVM, don't forget to adjust the heap volume accordingly, in order to avoid unpleasant surprises.

2. The problem is also related to the way memory chips (DDR SDRAM) work. Without diving into details (more details here: http://lwn.net/Articles/250967/), accessing physical memory is a very slow operation, and the CPU tends to keep frequently accessed data in its cache (L1, L2 and/or L3, which are built from far faster, though more expensive, memory chips). Modern compilers also tend to help developers in this respect, provided they follow some well-known rules (e.g. use arrays, the stack is "cache-able", don't use "volatile" unless you really know what it does, etc.). But larger pointers mean fewer of them fit in a cache line, so 64-bit pointers aren't as cache-efficient as 32-bit pointers (e.g. when traversing arrays of references). As a result, the extra cache misses (i.e. the extra trips to physical RAM) are the likely cause of the 4-second difference.

To conclude: think carefully when deciding between 32-bit and 64-bit Java.
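To see the per-reference overhead empirically, here is a rough sketch of my own (the exact numbers vary by JVM version and flags; note also that modern 64-bit JVMs enable -XX:+UseCompressedOops by default for heaps under about 32GB, which shrinks references back to 32 bits and considerably narrows the gap described above):

```java
import java.util.HashSet;
import java.util.Set;

public class FootprintDemo {
  public static void main(String[] args) {
    Runtime rt = Runtime.getRuntime();
    System.gc(); // best-effort baseline; System.gc() is only a hint to the JVM
    long before = rt.totalMemory() - rt.freeMemory();

    // a large set of small strings, similar in spirit to the route set
    Set<String> routes = new HashSet<>();
    for (int i = 0; i < 1_000_000; i++) {
      routes.add("route-" + i);
    }

    long after = rt.totalMemory() - rt.freeMemory();
    System.out.println("Entries: " + routes.size());
    System.out.println("Approx. MB used: " + (after - before) / (1024 * 1024));
  }
}
```

Running the same class under a 32-bit JVM, a 64-bit JVM, and a 64-bit JVM with -XX:-UseCompressedOops makes the difference visible.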

And a few online resources:
http://benjchristensen.com/2007/02/16/32-bit-versus-64-bit-jdk-memory-usage/
http://portal.acm.org/citation.cfm?id=1107407