Saturday, September 22, 2018

Generating Subsets

Every once in a while I am being challenged with problems involving optimal subsets from a given set of objects. It happened again a few weeks ago. The problem was of a knapsack style (finding the most profitable subset of items of bounded total size) and ... of course I forgot how to tackle it using dynamic programming. But, many years ago I learned a very simple (to memorise) algorithm which saved the day and I will describe it in this article.

The Algorithm Design Manual, page $453$, defines it as binary counting. The idea behind it is the one to one mapping between the numbers $0 \rightarrow 2^{n-1}$ (total number of numbers is $2^n$) and binary strings of length $n$ (total number of strings is $2^n$) where bit $1$ at the index $i$ means "consider the object at the index $i$ in the original set for the given subset". We also know that the total number of subsets of a set of size $n$ is $2^n$. We have $2^n$ everywhere, awesome!

Let's see it in action by looking at a simple example, a small set $\left\{A, B, C\right\}$ with $n=3$. We are not interested in the empty set, thus we will ignore the $0$-th iteration, leaving us with a total of $2^3-1=7$ subsets:

IterationBinary string Indexes of the objects to considerSubset
1001 1$\left\{A\right\}$
2010 2$\left\{B\right\}$
3011 1 and 2$\left\{A, B\right\}$
4100 3$\left\{C\right\}$
5101 1 and 3$\left\{A, C\right\}$
6110 2 and 3$\left\{B, C\right\}$
7111 1, 2 and 3$\left\{A, B, C\right\}$

I need to stress the fact that it's an exponential (or $O\left(2^n\right)$) complexity algorithm. It is very inefficient for big sets. Here is a code example:

import java.util.LinkedHashSet;
import java.util.Set;

public class AllSubSets {
 public static void main(String args[]) {
  String[] setValues = new String[] { "A", "B", "C", "D", "E",
    "F", "G", "H", "I", "J" };

  long limit = 1 << setValues.length; // this is 2^length

  for (long l = 1; l < limit; l++) {
   Set subSet = new LinkedHashSet<>();
   for (int i = 0; i < setValues.length; i++) {
    if ((l & (1 << i)) > 0) {
     subSet.add(setValues[i]);
    }
   }
   subSet.forEach(System.out::print);
   System.out.println();
  }
 }
}

and the result:


A
B
AB
C
AC
BC
ABC
D
... (trimmed due to the size) ...
CDEFGHIJ
ACDEFGHIJ
BCDEFGHIJ
ABCDEFGHIJ

A careful reader will almost immediately reply that this version is limited to sets with maximum 64 objects (with Java 8, long is limited to 64 bits and can be treated as unsigned). BigInteger, which supports bitwise operations, addresses this limitation. And yes, the extra subset construction loop makes it an $O(n\cdot2^n)$ algorithm, it's still exponential though.

However, as it was mentioned earlier, this is not an efficient algorithm. Anything bigger than 64 objects will produce more than $2^{64}$ subsets and this is a big number. Even $2^{32}$ is pretty decent in size, so don't use this algorithm for big sets.

No comments:

Post a Comment