Saturday, September 22, 2018

Generating Subsets

Every once in a while I am being challenged with problems involving optimal subsets from a given set of objects. It happened again a few weeks ago. The problem was of a knapsack style (finding the most profitable subset of items of bounded total size) and ... of course I forgot how to tackle it using dynamic programming. But, many years ago I learned a very simple (to memorise) algorithm which saved the day and I will describe it in this article.

The Algorithm Design Manual, page $453$, defines it as binary counting. The idea behind it is the one to one mapping between the numbers $0 \rightarrow 2^{n-1}$ (total number of numbers is $2^n$) and binary strings of length $n$ (total number of strings is $2^n$) where bit $1$ at the index $i$ means "consider the object at the index $i$ in the original set for the given subset". We also know that the total number of subsets of a set of size $n$ is $2^n$. We have $2^n$ everywhere, awesome!

Let's see it in action by looking at a simple example, a small set $\left\{A, B, C\right\}$ with $n=3$. We are not interested in the empty set, thus we will ignore the $0$-th iteration, leaving us with a total of $2^3-1=7$ subsets:

IterationBinary string Indexes of the objects to considerSubset
1001 1$\left\{A\right\}$
2010 2$\left\{B\right\}$
3011 1 and 2$\left\{A, B\right\}$
4100 3$\left\{C\right\}$
5101 1 and 3$\left\{A, C\right\}$
6110 2 and 3$\left\{B, C\right\}$
7111 1, 2 and 3$\left\{A, B, C\right\}$

I need to stress the fact that it's an exponential (or $O\left(2^n\right)$) complexity algorithm. It is very inefficient for big sets. Here is a code example:

import java.util.LinkedHashSet;
import java.util.Set;

public class AllSubSets {
 public static void main(String args[]) {
  String[] setValues = new String[] { "A", "B", "C", "D", "E",
    "F", "G", "H", "I", "J" };

  long limit = 1 << setValues.length; // this is 2^length

  for (long l = 1; l < limit; l++) {
   Set subSet = new LinkedHashSet<>();
   for (int i = 0; i < setValues.length; i++) {
    if ((l & (1 << i)) > 0) {
     subSet.add(setValues[i]);
    }
   }
   subSet.forEach(System.out::print);
   System.out.println();
  }
 }
}

and the result:


A
B
AB
C
AC
BC
ABC
D
... (trimmed due to the size) ...
CDEFGHIJ
ACDEFGHIJ
BCDEFGHIJ
ABCDEFGHIJ

A careful reader will almost immediately reply that this version is limited to sets with maximum 64 objects (with Java 8, long is limited to 64 bits and can be treated as unsigned). BigInteger, which supports bitwise operations, addresses this limitation. And yes, the extra subset construction loop makes it an $O(n\cdot2^n)$ algorithm, it's still exponential though.

However, as it was mentioned earlier, this is not an efficient algorithm. Anything bigger than 64 objects will produce more than $2^{64}$ subsets and this is a big number. Even $2^{32}$ is pretty decent in size, so don't use this algorithm for big sets.

Monday, August 13, 2018

To Optional or not to Optional

I recently had an interesting conversation with my colleagues about the usage of Optional's in Java. It all revolves around this article.

So, I quickly sketched the following code to highlight the idea:

import java.util.Optional;

public class OptionalEvaluationTest {
 public static void main(String[] args) {
  test1(); // non lazy execution of orElse().
  test2();
  test3();
  test4();
 }

 private static void test1() {
  System.out.println("Test 1");
  Optional<Integer> oi = Optional.ofNullable(3);
  oi.map(OptionalEvaluationTest::intToString)
   .orElse(defaultString());
  System.out.println("----");
 }

 private static void test2() {
  System.out.println("Test 2");
  Optional<Integer> oi = Optional.ofNullable(null);
  oi.map(OptionalEvaluationTest::intToString)
   .orElse(defaultString());
  System.out.println("----");
 }

 private static void test3() {
  System.out.println("Test 3");
  Optional<Integer> oi = Optional.ofNullable(3);
  oi.map(OptionalEvaluationTest::intToString)
   .orElseGet(OptionalEvaluationTest::defaultString);
  System.out.println("----");
 }

 private static void test4() {
  System.out.println("Test 4");
  Optional<Integer> oi = Optional.ofNullable(null);
  oi.map(OptionalEvaluationTest::intToString)
   .orElseGet(OptionalEvaluationTest::defaultString);
  System.out.println("----");
 }

 private static String intToString(int i) {
  System.out.println("converter called!");
  return Integer.toString(i);
 }

 private static String defaultString() {
  System.out.println("default called!");
  return "default";
 }
}

which yields the following result:


Test 1
converter called!
default called!
----
Test 2
default called!
----
Test 3
converter called!
----
Test 4
default called!
----

What we see is that .orElse() is executed always, aka is not lazy executed. This may become a performance issue if used unwisely. Tend to prefer .orElseGet() instead.