Saturday, April 27, 2024

A few thoughts on faster sorting

In this article, I am presenting a sorting schema, hugely inspired by the Bucket and Proxmap sorting algorithms. I call it "schema", rather than "algorithm", simply because it introduces a (or another, depending on the sorting algorithm) "devide and conquer" element to the existing sorting algorithms, aiming (imho) to make them faster.

Also, let's recall that most of the sorting algorithms come with the time complexity $${\displaystyle O(N \log{N})} \tag{1}$$ With more details ...

1. The description of the schema

Let's suppose we have a list $L$ with $N$ (list size, thus a number) objects to sort. Now, let's imagine that we have $M$ (a number) buckets which are pre-sorted (buckets, not the content) according to the same criteria applied to sort the objects from the list $L$. Let's use strings as an example, if $$L=\{\text{BBBB}, \text{BBAA}, \text{AAAA},\text{...}\}$$ then we can "bucket" by the 1st letter of the string, giving us a total of 26 letters in the English alphabet (and yes let's ignore the case sensitivity and numbers and ... everything to keep things easy). We keep these buckets sorted as $$B_1=\{A\}, B_2=\{B\},...,B_{26}=\{Z\}$$ Then we partition/distribute the content of $L$ accross the buckets, sort the buckets individually and (finally) merge the content of the buckets to the final sorted list (simple traversal of the buckets, since they are already sorted). Something like this

What we have (in terms of time complexity) as a result

  • List $L$ traversal: ${\displaystyle O(N)}$
  • It's important that the distribution of objects to buckets is ${\displaystyle O(1)}$ per object, e.g. using hash maps
  • Sorting per bucket (from $(1)$) is ${\displaystyle O\left(\frac{N}{M} \cdot\log{\frac{N}{M}}\right)}$. Assumptions is that the content of the list $L$ is "uniformly" distributed (!!!) across the buckets.
  • Merging into the final list is ${\displaystyle O(M)}$. Again, buckets are pre-sorted.

2. The mathematical argument

Time complexity in the schema described above is $$ {\displaystyle O\left(N + M \cdot \frac{N}{M}\cdot \log{\left(\frac{N}{M}\right)}+M\right)} $$ which is $${\displaystyle O\left(N + N\cdot\log{\left(\frac{N}{M}\right)}+M\right)} \tag{2}$$ If we compare $(1)$ and $(2)$ $$\frac{N + N\cdot\log{\left(\frac{N}{M}\right)}+M}{N \cdot\log{N}}= 1-\frac{\log{M} - 1}{\log{N}}+\frac{M}{N\cdot\log{N}} \tag{3}$$ Assuming large enough $M < N$ we have $$\frac{N + N\cdot\log{\left(\frac{N}{M}\right)}+M}{N \cdot\log{N}}< 1-\frac{\log{M} - 2}{\log{N}}< 1 \tag{4}$$ In other words, for large $N$ and $M$ we should expect this schema to make most of the sorting algorithms "a little bit" faster. By how much? Let's see ...

3. Specific example

Let's consider lists of strings as an example. Here is the source code of a little PoC project which generates lists of randomly generated strings and sorts them using

  1. the Java's Collections::sort and
  2. an implementation of the schema above, which internally uses Collections::sort as well to sort the content of the buckets
and compares the results. No parallelisation is used, although sorting buckets in parallel could significantly improve the speed in the second (B) case, but no cheating!

Most of the aspects in the code are easy to tune, but we will consider only

  • lists of size $N=1000000$ and $N=3000000$ and
  • bucketing by the first 2 letters, giving the total number of buckets $M=26^2$

Plugging these numbers into the formula $(3)$ we should expect

$M=26^2$ and $N=1000000$ $M=26^2$ and $N=3000000$
0.60 0.63

An improvement of nearly $40$%!? No way ...

4. Actual results

One more test parameter I should mention (and which is not part of the formula $(3)$) is $X$ the length of each string added to the list $L$, this is to count for Java string comparison where string size plays a role. We will look for $X=10$, $X=50$ and $X=100$. Here are some results from running the PoC code on my fairly old computer with an Intel i5-3330 3.00GHz 4 Cores CPU on board, using Java 17

$X=10$, $M=26^2$ and $N=1000000$ $X=10$, $M=26^2$ and $N=3000000$
Reported result:
  Fast Sort avg: 422.33ms
  Std Sort avg: 677.96ms
    
Reported result:
  Fast Sort avg: 1484.24ms
  Std Sort avg: 2395.39ms
    
$\frac{422.33}{677.96}\approx 0.6229$ $\frac{1484.24}{2395.39}\approx 0.6196$

A few more results

$X=50$, $M=26^2$ and $N=3000000$ $X=100$, $M=26^2$ and $N=3000000$
Reported result:
  Fast Sort avg: 1545.82ms
  Std Sort avg: 2773.65ms
    
Reported result:
  Fast Sort avg: 1561.84ms
  Std Sort avg: 2883.08ms
    
$\frac{1545.82}{2773.65}\approx 0.5573$ $\frac{1561.84}{2883.08}\approx 0.5417$

5. Conclusions

Well, the test results are not too far off from the calculated results. So, the schema does improve the sorting ... However, I should mention the following:

  • It works well when the list $L$ is "uniformly distributed" across the buckets (already mentioned in section 1). Imagine that all the objects fall into one bucket, then we are back to ${\displaystyle O(N \log{N})}$ or slightly worse.
  • The test times don't include the time required to "build" the buckets. Buckets are meant to be "built" once and re-used.

And finally, the schema allows for

  • parallelisation, content of the buckets can be sorted in parallel and
  • imagine a data streaming scenario, if one bucket is updated, we don't have to re-sort the entire list $L$

Sunday, April 7, 2024

A property of the Prime Counting function

While addressing this question on MSE the following inequality was revealed, concerning the Prime Counting function ...

For $\forall x,y$ such that $60184\le x< y$ we have: $$\frac{x}{\pi(x)} < \frac{y}{\pi(y)} + 1$$

Friday, April 5, 2024

De fapt ...

"De la istoria cu pomul,
S-a mai cumințit oare omul?"
Intreabă odată Domnul,
Pe slugă-Sa, Satan.

Acela, surprins de întrebare,
Cu fața plină de mirare,
Răspunde într-o răsuflare:
"De fapt ..."