
Saturday, April 27, 2024

A few thoughts on faster sorting

In this article, I present a sorting schema, hugely inspired by the Bucket sort and Proxmap sort algorithms. I call it a "schema" rather than an "algorithm" simply because it introduces a (or another, depending on the sorting algorithm) "divide and conquer" element on top of existing sorting algorithms, aiming (imho) to make them faster.

Also, let's recall that most comparison-based sorting algorithms come with the time complexity $${\displaystyle O(N \log{N})} \tag{1}$$

1. The description of the schema

Let's suppose we have a list $L$ with $N$ objects to sort ($N$ being the list size, thus a number). Now, let's imagine that we have $M$ (a number) buckets which are pre-sorted (the buckets, not their content) according to the same criteria used to sort the objects of the list $L$. Let's use strings as an example: if $$L=\{\text{BBBB}, \text{BBAA}, \text{AAAA},\text{...}\}$$ then we can "bucket" by the first letter of the string, giving us a total of 26 buckets, one per letter of the English alphabet (and yes, let's ignore case sensitivity, digits and ... everything else, to keep things easy). We keep these buckets sorted as $$B_1=\{A\}, B_2=\{B\},...,B_{26}=\{Z\}$$ Then we partition/distribute the content of $L$ across the buckets, sort the buckets individually and (finally) merge the content of the buckets into the final sorted list (a simple traversal of the buckets, since they are already ordered). A minimal code sketch is given after the complexity list below.

In terms of time complexity, this gives us:

  • List $L$ traversal: ${\displaystyle O(N)}$
  • It's important that the distribution of objects to buckets is ${\displaystyle O(1)}$ per object, e.g. using hash maps
  • Sorting per bucket (from $(1)$) is ${\displaystyle O\left(\frac{N}{M} \cdot\log{\frac{N}{M}}\right)}$. The assumption is that the content of the list $L$ is "uniformly" distributed (!!!) across the buckets.
  • Merging into the final list is ${\displaystyle O(M)}$. Again, buckets are pre-sorted.
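
As promised, a minimal, non-optimised Java sketch of the schema for the string example above, bucketing by the first letter only. The class name and the bucketFor helper are illustrative (this is not the PoC code), and upper-case A-Z input is assumed:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BucketSortSchema {
	private static final int M = 26; // one pre-sorted bucket per letter

	public static List<String> sort(List<String> input) {
		// The buckets are pre-sorted by construction: index 0 = 'A', ..., 25 = 'Z'.
		List<List<String>> buckets = new ArrayList<>(M);
		for (int i = 0; i < M; i++) {
			buckets.add(new ArrayList<>());
		}
		// 1. Distribute the content of L across the buckets: O(N), O(1) per object.
		for (String s : input) {
			buckets.get(bucketFor(s)).add(s);
		}
		// 2. Sort each bucket individually: O((N/M) log(N/M)) per bucket,
		// assuming a "uniform" distribution of objects across the buckets.
		for (List<String> bucket : buckets) {
			Collections.sort(bucket);
		}
		// 3. Merge: a simple traversal of the already ordered buckets.
		List<String> result = new ArrayList<>(input.size());
		for (List<String> bucket : buckets) {
			result.addAll(bucket);
		}
		return result;
	}

	// Hypothetical O(1) distribution function, upper-case A..Z assumed.
	private static int bucketFor(String s) {
		return s.charAt(0) - 'A';
	}
}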

2. The mathematical argument

Time complexity in the schema described above is $$ {\displaystyle O\left(N + M \cdot \frac{N}{M}\cdot \log{\left(\frac{N}{M}\right)}+M\right)} $$ which is $${\displaystyle O\left(N + N\cdot\log{\left(\frac{N}{M}\right)}+M\right)} \tag{2}$$ If we compare $(1)$ and $(2)$ $$\frac{N + N\cdot\log{\left(\frac{N}{M}\right)}+M}{N \cdot\log{N}}= 1-\frac{\log{M} - 1}{\log{N}}+\frac{M}{N\cdot\log{N}} \tag{3}$$ Assuming $M$ large enough (so that $\log{M} > 2$) and $M < N$ (so that $\frac{M}{N\cdot\log{N}} < \frac{1}{\log{N}}$) we have $$\frac{N + N\cdot\log{\left(\frac{N}{M}\right)}+M}{N \cdot\log{N}}< 1-\frac{\log{M} - 2}{\log{N}}< 1 \tag{4}$$ In other words, for large $N$ and $M$ we should expect this schema to make most of the sorting algorithms "a little bit" faster. By how much? Let's see ...

3. Specific example

Let's consider lists of strings as an example. Here is the source code of a little PoC project which generates lists of randomly generated strings and sorts them using

  1. Java's Collections::sort and
  2. an implementation of the schema above, which internally uses Collections::sort as well to sort the content of the buckets
and compares the results. No parallelisation is used; although sorting the buckets in parallel could significantly improve the speed in the second case, that would be cheating!

Most of the aspects in the code are easy to tune, but we will consider only

  • lists of size $N=1000000$ and $N=3000000$ and
  • bucketing by the first 2 letters, giving the total number of buckets $M=26^2$
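
For reference, a minimal sketch of what a two-letter bucket index could look like (a hypothetical helper, not the PoC code; upper-case A-Z input is assumed as before):

public class TwoLetterBuckets {
	// Hypothetical two-letter bucket index for M = 26^2 buckets: 0 .. 675.
	static int bucketFor(String s) {
		return (s.charAt(0) - 'A') * 26 + (s.charAt(1) - 'A');
	}
}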

Plugging these numbers into the formula $(3)$ we should expect

  • $M=26^2$ and $N=1000000$: $0.60$
  • $M=26^2$ and $N=3000000$: $0.63$
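
As a worked example (reading the logarithms in $(3)$ as natural logarithms, an assumption that matches the expected values above), for $M=26^2=676$ and $N=1000000$: $$1-\frac{\ln{676}-1}{\ln{10^6}}+\frac{676}{10^6\cdot\ln{10^6}}\approx 1-\frac{6.52-1}{13.82}+0.00005\approx 0.60$$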

An improvement of nearly $40$%!? No way ...

4. Actual results

One more test parameter I should mention (and which is not part of the formula $(3)$) is $X$, the length of each string added to the list $L$; this is to account for Java string comparison, where the string length plays a role. We will look at $X=10$, $X=50$ and $X=100$. Here are some results from running the PoC code on my fairly old computer with an Intel i5-3330 3.00GHz 4-core CPU, using Java 17

$X=10$, $M=26^2$ and $N=1000000$:
  Fast Sort avg: 422.33ms
  Std Sort avg: 677.96ms
  Ratio: $\frac{422.33}{677.96}\approx 0.6229$

$X=10$, $M=26^2$ and $N=3000000$:
  Fast Sort avg: 1484.24ms
  Std Sort avg: 2395.39ms
  Ratio: $\frac{1484.24}{2395.39}\approx 0.6196$

A few more results

$X=50$, $M=26^2$ and $N=3000000$:
  Fast Sort avg: 1545.82ms
  Std Sort avg: 2773.65ms
  Ratio: $\frac{1545.82}{2773.65}\approx 0.5573$

$X=100$, $M=26^2$ and $N=3000000$:
  Fast Sort avg: 1561.84ms
  Std Sort avg: 2883.08ms
  Ratio: $\frac{1561.84}{2883.08}\approx 0.5417$

5. Conclusions

Well, the test results are not too far off from the calculated results. So, the schema does improve the sorting ... However, I should mention the following:

  • It works well when the list $L$ is "uniformly distributed" across the buckets (already mentioned in section 1). Imagine that all the objects fall into one bucket, then we are back to ${\displaystyle O(N \log{N})}$ or slightly worse.
  • The test times don't include the time required to "build" the buckets. Buckets are meant to be "built" once and re-used.

And finally, the schema allows for

  • parallelisation, the content of the buckets can be sorted in parallel (see the sketch below) and
  • incremental re-sorting, imagine a data streaming scenario: if one bucket is updated, we don't have to re-sort the entire list $L$
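
A minimal sketch of the parallelisation point, reusing the bucket layout from the sketch in section 1 (the class name is illustrative; this relies on the buckets being independent of each other):

import java.util.Collections;
import java.util.List;

public class ParallelBuckets {
	// Each bucket can be sorted independently of the others,
	// so a parallel stream can sort them concurrently.
	public static void sortBuckets(List<List<String>> buckets) {
		buckets.parallelStream().forEach(Collections::sort);
	}
}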

Saturday, July 25, 2020

A few words about ConcurrentHashMap, compute() method

There is one ambiguous moment in the Java documentation of the compute() method from the ConcurrentHashMap class:

Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple, and must not attempt to update any other mappings of this Map.

It's not clear what is being blocked: is it the access to the entire map, or just to the key? Let's find out by conducting a few experiments with the following code:

package tests.concurrency;

import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

public class TestConcurrentHashMapCompute implements Runnable {
	private static final Map<Key, String> MAP = new ConcurrentHashMap<>();
	private static final int MAX_RUNS = 100;
	private static final long SLEEP_TIME = 5000;

	public static void main(String[] args) {
		Thread th1 = new Thread(new TestConcurrentHashMapCompute());
		Thread th2 = new Thread(new TestConcurrentHashMapCompute());

		th1.start();
		th2.start();
	}

	@Override
	public void run() {
		final String name = Thread.currentThread().getName();
		for (int i = 0; i < MAX_RUNS; i++) {
			final String value = MAP.compute(new Key(name), (k, v) -> {
				System.out.println(name + ": enters");
				sleep();
				System.out.println(name + ": exits");
				return name;
			});
			System.out.println(name + ": " + value);
		}
	}

	private static void sleep() {
		try {
			Thread.sleep(SLEEP_TIME);
		} catch (Exception ex) {}
	}

	private static class Key {
		private final String value;

		private Key(String value) {
			this.value = value;
		}

		@Override
		public int hashCode() {
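			// Constant hash: every Key falls into the same bin of the map.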
			return 1;
			// return Objects.hash(value);
		}

		@Override
		public boolean equals(Object o) {
			if (this == o) {
				return true;
			}
			if (o == null || getClass() != o.getClass()) {
				return false;
			}
			Key other = (Key) o;
			return Objects.equals(value, other.value);
		}
	}
}

It's nothing complicated: two threads call the compute() method of a shared ConcurrentHashMap instance, using Key class instances as the map keys. However, pay attention to the hashCode() method inside the Key class:

		@Override
		public int hashCode() {
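			// Constant hash: every Key falls into the same bin of the map.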
			return 1;
			// return Objects.hash(value);
		}

Depending on what is being returned (comment/uncomment the relevant lines), you will see different behaviour. And, by the way, return 1 is a valid hashCode() implementation, according to the Java documentation:

- If two objects are equal according to the equals() method, then calling the hashCode method on each of the two objects must produce the same integer result.
- It is not required that if two objects are unequal according to the equals() method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.

The result is:
- With return Objects.hash(value), i.e. a decent hashCode() implementation derived from the string value, the threads run in parallel, as if the lock was on the Key.
- With return 1, the threads run sequentially, as if the lock was on the entire Map.

In fact, the lock is on the bin (bucket) indexed by the Key's hash code. That's what the hashCode() method is about: quickly locating the relevant bin. HashMap, for example, uses a singly linked list as the storage for each bin.

Saturday, September 22, 2018

Generating Subsets

Every once in a while I am challenged with problems involving optimal subsets of a given set of objects. It happened again a few weeks ago. The problem was of a knapsack style (finding the most profitable subset of items of bounded total size) and ... of course I forgot how to tackle it using dynamic programming. But, many years ago I learned a very simple (to memorise) algorithm which saved the day, and I will describe it in this article.

The Algorithm Design Manual, page $453$, defines it as binary counting. The idea behind it is the one-to-one mapping between the numbers $0 \rightarrow 2^{n}-1$ (the total count of numbers is $2^n$) and the binary strings of length $n$ (the total count of strings is $2^n$), where bit $1$ at index $i$ means "consider the object at index $i$ of the original set for the given subset". We also know that the total number of subsets of a set of size $n$ is $2^n$. We have $2^n$ everywhere, awesome!

Let's see it in action by looking at a simple example, a small set $\left\{A, B, C\right\}$ with $n=3$. We are not interested in the empty set, thus we will ignore the $0$-th iteration, leaving us with a total of $2^3-1=7$ subsets:

Iteration | Binary string | Indexes of the objects to consider | Subset
1 | 001 | 1 | $\left\{A\right\}$
2 | 010 | 2 | $\left\{B\right\}$
3 | 011 | 1 and 2 | $\left\{A, B\right\}$
4 | 100 | 3 | $\left\{C\right\}$
5 | 101 | 1 and 3 | $\left\{A, C\right\}$
6 | 110 | 2 and 3 | $\left\{B, C\right\}$
7 | 111 | 1, 2 and 3 | $\left\{A, B, C\right\}$

I need to stress the fact that it's an exponential (or $O\left(2^n\right)$) complexity algorithm. It is very inefficient for big sets. Here is a code example:

import java.util.LinkedHashSet;
import java.util.Set;

public class AllSubSets {
	public static void main(String[] args) {
		String[] setValues = new String[] { "A", "B", "C", "D", "E",
				"F", "G", "H", "I", "J" };

		long limit = 1L << setValues.length; // this is 2^length

		for (long l = 1; l < limit; l++) {
			Set<String> subSet = new LinkedHashSet<>();
			for (int i = 0; i < setValues.length; i++) {
				if ((l & (1L << i)) != 0) { // is bit i of l set?
					subSet.add(setValues[i]);
				}
			}
			subSet.forEach(System.out::print);
			System.out.println();
		}
	}
}

and the result:


A
B
AB
C
AC
BC
ABC
D
... (trimmed due to the size) ...
CDEFGHIJ
ACDEFGHIJ
BCDEFGHIJ
ABCDEFGHIJ

A careful reader will almost immediately reply that this version is limited to sets with a maximum of 64 objects (a Java long is 64 bits wide; since Java 8 it can also be treated as unsigned). BigInteger, which supports bitwise operations, addresses this limitation. And yes, the inner subset-construction loop makes it an $O(n\cdot2^n)$ algorithm; it's still exponential though.
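
A minimal sketch of the BigInteger variant (the class name is illustrative; the iteration count is, of course, still exponential):

import java.math.BigInteger;
import java.util.LinkedHashSet;
import java.util.Set;

public class AllSubSetsBig {
	public static void main(String[] args) {
		String[] setValues = new String[] { "A", "B", "C" };

		BigInteger limit = BigInteger.ONE.shiftLeft(setValues.length); // 2^length

		for (BigInteger l = BigInteger.ONE; l.compareTo(limit) < 0; l = l.add(BigInteger.ONE)) {
			Set<String> subSet = new LinkedHashSet<>();
			for (int i = 0; i < setValues.length; i++) {
				if (l.testBit(i)) { // bit i set means: include the i-th object
					subSet.add(setValues[i]);
				}
			}
			subSet.forEach(System.out::print);
			System.out.println();
		}
	}
}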

However, as mentioned earlier, this is not an efficient algorithm. Anything bigger than 64 objects will produce more than $2^{64}$ subsets, and that is a big number. Even $2^{32}$ is pretty decent in size, so don't use this algorithm for big sets.

Monday, August 13, 2018

To Optional or not to Optional

I recently had an interesting conversation with my colleagues about the usage of Optionals in Java. It all revolves around this article.

So, I quickly sketched the following code to highlight the idea:

import java.util.Optional;

public class OptionalEvaluationTest {
 public static void main(String[] args) {
  test1(); // present value + orElse(): the default is still evaluated
  test2(); // empty value + orElse(): the default is evaluated and used
  test3(); // present value + orElseGet(): the default is not evaluated
  test4(); // empty value + orElseGet(): the default is evaluated only when needed
 }

 private static void test1() {
  System.out.println("Test 1");
  Optional<Integer> oi = Optional.ofNullable(3);
  oi.map(OptionalEvaluationTest::intToString)
   .orElse(defaultString());
  System.out.println("----");
 }

 private static void test2() {
  System.out.println("Test 2");
  Optional<Integer> oi = Optional.ofNullable(null);
  oi.map(OptionalEvaluationTest::intToString)
   .orElse(defaultString());
  System.out.println("----");
 }

 private static void test3() {
  System.out.println("Test 3");
  Optional<Integer> oi = Optional.ofNullable(3);
  oi.map(OptionalEvaluationTest::intToString)
   .orElseGet(OptionalEvaluationTest::defaultString);
  System.out.println("----");
 }

 private static void test4() {
  System.out.println("Test 4");
  Optional<Integer> oi = Optional.ofNullable(null);
  oi.map(OptionalEvaluationTest::intToString)
   .orElseGet(OptionalEvaluationTest::defaultString);
  System.out.println("----");
 }

 private static String intToString(int i) {
  System.out.println("converter called!");
  return Integer.toString(i);
 }

 private static String defaultString() {
  System.out.println("default called!");
  return "default";
 }
}

which yields the following result:


Test 1
converter called!
default called!
----
Test 2
default called!
----
Test 3
converter called!
----
Test 4
default called!
----

What we see is that the argument of .orElse() is always evaluated, i.e. it is not executed lazily, even when the Optional holds a value. This may become a performance issue if the default value is expensive to compute. Tend to prefer .orElseGet() instead.