PrevNext
Not Frequent
 0/11

Sum over Subsets DP

Authors: Rameez Parwez, Siyong Huang, Aakash Gokhale

Taking bitmask DP to the next level.

Edit This Page

Prerequisites

Resources
CF

Good explanation + problem list

GFG

Goes over brute force solutions

CF

Characterizing SOS DP as multidimensional prefix sums

Sum Over Subsets (SOS) is technique used to efficiently calculate the sum of values for all subsets of a given set or bitmask. Instead of manually generating every subset, which can be slow, SOS uses bitwise operations to quickly go through only the necessary subsets.

The main trick is that for any bitmask xx, we can directly loop through all its subsets ii using:

i=(i1)&xi = (i - 1)\&x

This ensures we efficiently handle only the subsets of xx, skipping unnecessary combinations.

Implementation

Consider an array AA of 2n2^n elements. For a bitmask xx, we define F(x)F(x) as the sum of A[i]A[i] over all subset mask ixi \subseteq x. That is:

F(x)=ixA[i]F(x) = \sum_{i \subseteq x} A[i]

The goal is to compute F(x)F(x) for all x=0,1,2,....,2n1x = 0, 1, 2, ...., 2^n - 1 as efficiently as possible.

Solution

Brute Force

The naive solution is to iterate over all pair of masks, summing A[i]A[i] only when one of them is a subset of the other (i.e.,i&x=ii \& x = i).

C++

vector<int> sos(1 << n);
for (int x = 0; x < (1 << n); x++) {
// iterate over all other sets and checks whether they're a subset of x
for (int i = 0; i < (1 << n); i++) {
if ((i & x) == i) { sos[x] += a[i]; }
}
}

Python

sos = [0] * (1 << n)
for x in range(1 << n):
# iterate over all other sets and checks whether they're a subset of x
for i in range(1 << n):
if (x & i) == i:
sos[x] += a[i]

This solution has a time complexity of O(2n2n)=O(4n)\mathcal{O}(2^n \cdot 2^n) = O(4^n), which is too slow for large values of nn.

Optimized Solution : Iterating Over Submasks

Instead of iterating over all 2n2^n bitmasks for ii, we can optimize by iterating only over the subset mask of xx.

C++

vector<int> sos(1 << n);
for (int x = 0; x < (1 << n); x++) {
sos[x] = a[0];
// iterate over all subsets of x directly
for (int i = x; i > 0; i = (i - 1) & x) { sos[x] += a[i]; }
}

Python

sos = [0] * (1 << n)
for x in range(1 << n):
sos[x] = a[0]
i = x
# iterate over all subsets of x directly
while i > 0:
sos[x] += a[i]
i = (i - 1) & x

How does this works?

Time Complexity: O(3n) \mathcal{O}(3^n)

Proof

Faster Solution using Dynamic Programming

While the previous method is better, it still has some redundancy. For example, if a bitmask ii has kk unset bits, then A[i]A[i] is summed 2k2^k times. If we precompute and reuse these sums, we can eliminate repeated additions.

Subset Mask Partitioning with S(x,i)S(x, i)

We define the subset S(x,i)S(x, i) as follows:

S(x,i)={yxyx<2i+1}S(x, i) = \{y \subseteq x \mid y \oplus x < 2^{i + 1}\}

In simpler terms, S(x,i)S(x, i) contains all subset masks of xx whose bits to the left of the ii-th bit match those of xx.

For example:

S(11010110,3)={11010000,11010010,11010100,11010110}S(11010110, 3) = \{11010000,11010010,11010100,11010110\}

We can break S(x,i)S(x, i) as follows:

  1. If the ii-th bit of xx is 00, then S(x,i)=S(x,i1)S(x, i) = S(x, i - 1).
  2. If the ii-th bit of xx is 11:
    • Subsets with the ii-th bit 1:S(x,i1)1: S(x, i - 1).
    • Subsets with the ii-th bit 0:S(x2i,i1)0 : S(x \oplus 2^i, i - 1).

Thus:

S(x,i)={S(x,i1)if the i-th bit is 0S(x,i1)S(x2i,i1)if the i-th bit is 1S(x, i) = \begin{cases} S(x, i - 1) & \text{if the i-th bit is 0} \\ S(x, i - 1) \cup S(x \oplus 2^i, i - 1) & \text{if the i-th bit is 1} \end{cases}

Using the above partitioning, we define a DP table dp[x][i]dp[x][i] where:

dp[x][i]=yS(x,i)A[y]dp[x][i] = \sum_{y \in S(x, i)} A[y]

Time Complexity: O(N2N)\mathcal{O}(N \cdot 2^N)

C++

vector<int> sos(1 << n);
vector<vector<int>> dp(1 << n, vector<int>(n));
for (int x = 0; x < (1 << n); x++) {
dp[x][0] = a[x];
if (x & 1) { dp[x][0] += a[x ^ 1]; }
for (int i = 1; i < n; i++) {
dp[x][i] = dp[x][i - 1];
if (x & (1 << i)) { dp[x][i] += dp[x ^ (1 << i)][i - 1]; }
}
sos[x] = dp[x][n - 1];
}

Python

sos = [0] * (1 << n)
dp = [[0] * n for _ in range(1 << n)]
for x in range(1 << n):
dp[x][0] = a[x]
if x & 1:
dp[x][0] += a[x ^ 1]
for i in range(1, n):
dp[x][i] = dp[x][i - 1]
if x & (1 << i):
dp[x][i] += dp[x ^ (1 << i)][i - 1]
sos[x] = dp[x][n - 1]

Optimized Memory Usage

Since dp[x][i]dp[x][i] only depends on dp[x][i1]dp[x][i - 1], we can reuse the DP array.

C++

for (int i = 0; i < (1 << n); i++) { sos[i] = a[i]; }
for (int i = 0; i < n; i++) {
for (int x = 0; x < (1 << n); x++) {
if (x & (1 << i)) { sos[x] += sos[x ^ (1 << i)]; }
}
}

Python

sos = [0] * (1 << n)
for i in range(1 << n):
sos[i] = a[i]
for i in range(n):
for x in range(1 << n):
if x & (1 << i):
sos[x] += sos[x ^ (1 << i)]

SOS DP as N-Dimensional Prefix Sum

To understand this, let's revisit 2D2D prefix sums. Given n×mn \times m array AA, the prefix sum array SS is defined as:

S[i][j]=aibjA[a][b]S[i][j] = \sum_{a \leq i} \sum_{b \leq j} A[a][b]

The standard approach uses inclusion-exlcusion:

S[i][j]=S[i][j1]+S[i1][j]S[i1][j1]+A[i][j]S[i][j] = S[i][j - 1] + S[i - 1][j] - S[i - 1][j - 1] + A[i][j]

However, this approach has a drawback: as the number of dimensions grows, the number of terms we need to add or subtract grows exponentially. A simple and more scalable approach is to sweep through each dimensions one at a time and compute the prefix sum step by step:

C++

// Initialize
for (int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) { S[i][j] = A[i][j]; }
}
// Sweep along x-axis
for (int i = 1; i < n; i++) {
for (int j = 0; j < m; j++) { S[i][j] += S[i - 1][j]; }
}
// Sweep along y-axis
for (int i = 0; i < n; i++) {
for (int j = 1; j < m; j++) { S[i][j] += S[i][j - 1]; }
}

Python

# Initialize
for i in range(n):
for j in range(m):
S[i][j] = A[i][j]
# Sweep along the x-axis
for i in range(1, n):
for j in range(m):
S[i][j] += S[i - 1][j]
# Sweep along the y-axis
for i in range(n):
for j in range(1, m):
S[i][j] += S[i][j - 1]

This approach generalizes to higher dimensions. For example, in 4D4D:

  • After sweeping along the x-axis, S[i][j][k][l]S[i][j][k][l] contains the sum of A[a][b][c][d]A[a][b][c][d] where ai,b=j,c=k,a \leq i, b = j, c = k, and d=ld = l.
  • After sweeping along the y-axis, S[i][j][k][l]S[i][j][k][l] contains the sum of A[a][b][c][d]A[a][b][c][d] where ai,bj,c=k,a \leq i, b \leq j, c = k, and d=ld = l.
  • After sweeping along the z-axis, S[i][j][k][l]S[i][j][k][l] contains the sum of A[a][b][c][d]A[a][b][c][d] where ai,bj,ck,a \leq i, b \leq j, c \leq k, and d=ld = l.
  • After sweeping along the w-axis, S[i][j][k][l]S[i][j][k][l] contains the sum of A[a][b][c][d]A[a][b][c][d] where ai,bj,ck,a \leq i, b \leq j, c \leq k, and dld \leq l.

If we extend this idea to nn dimensions, here's what happens: after sweeping along the ii-th axis, S[x]S[x] contains the sum of AA values where the first ii coordinates are less than or equal to xx, and the remaining coordinates match xx. Sound familiar? That's exactly what F(x)F(x) represents in the SOS problem!

If we think of each bit in a bitmask as its own axis, then a bitmask xx with nn bits can be viewed as an nn-dimensional coordinate where each component is either 00 or 11. A submask of xx corresponds to an nn-dimensional coordinate where each component is less than or equal to xx. Therefore, when we interpret the bitmask as an nn-dimensional coordinate, F(x)F(x) alings with the definition of an nn-dimensional prefix sum!

By applying the sweeping algorithm along each axis, we get the memory-optimized SOS DP solution mentioned earlier, demonstrating that SOS DP is indeed an n-dimensional prefix sum.

C++

for (int i = 0; i < (1 << n); i++) { F[i] = A[i]; }
for (int i = 0; i < n; i++) { // Sweep along the i-th axis
for (int x = 0; x < (1 << n); x++) {
if (x & (1 << i)) // If the i-th bit is set, accumulate
F[x] += F[x ^ (1 << i)];
}
}

Python

for i in range(1 << n):
F[i] = A[i]
for i in range(n): # Sweep along the i-th axis
for x in range(1 << n):
if x & (1 << i): # If the i-th bit is set, accumulate
F[x] += F[x ^ (1 << i)]

Focus Problem – try your best to solve this problem before continuing!

Explanation

First, think of each word as a combination of letters represented by a bitmask. For example, bcd = 0b1110, and ada = 0b1001, where each bit stands for a letter. We'll also keep track of how often each bitmask appears in the dictionary.

Next, we use SOS DP to compute the number of words disjoint from the mask(i.e., words containing none of the vowels in the mask).

Once this is calculated, the number of valid words for a subset is simply nsos[mask]n - sos[mask] because a word is valid if it contains at least one vowel from the subset, meaning it is not disjoint from mask. Finally, we square the count of valid words for every subset, XOR all those squared values together, and that gives us the answer.

Implementation

Time Complexity: O(M2M)\mathcal{O}(M \cdot 2^M)

C++

#include <iostream>
#include <string>
const int M = 24;
int sos[1 << M];
int main() {
int n;
std::cin >> n;
for (int i = 0; i < n; i++) {

Python

M = 24
sos = [0] * (1 << M)
n = int(input())
for i in range(n):
st = input()
mask = (
(1 << (ord(st[0]) - 97)) | (1 << (ord(st[1]) - 97)) | (1 << (ord(st[2]) - 97))
)
sos[mask] += 1

General Problems

StatusSourceProblem NameDifficultyTags
CSESEasy
Show TagsSOS DP
PlatinumEasy
Show TagsCombo, DP
CFNormal
Show TagsBitmasks, SOS DP
PlatinumNormal
Show TagsNT, Prefix Sums
CFNormal
Show TagsBitmasks, SOS DP
CFNormal
Show TagsBitmasks, SOS DP
kilonovaHard
Show TagsBitmasks, NT, SOS DP
CFHard
Show TagsBitmasks, DP
JOIHard
Show TagsSOS DP
CFInsane
Show TagsBitmasks, DP, SOS DP

Module Progress:

Join the USACO Forum!

Stuck on a problem, or don't understand a module? Join the USACO Forum and get help from other competitive programmers!

PrevNext