Implementation Solution Brute Force Optimized Solution : Iterating Over Submasks Faster Solution using Dynamic Programming Subset Mask Partitioning with

S(x, i)

Optimized Memory Usage SOS DP as N-Dimensional Prefix Sum Explanation Implementation General Problems

Not Frequent

0/11

Sum over Subsets DP

Authors: Rameez Parwez, Siyong Huang, Aakash Gokhale

Taking bitmask DP to the next level.

Edit This Page

Prerequisites

Gold - Bitmask DP

Implementation Solution Brute Force Optimized Solution : Iterating Over Submasks Faster Solution using Dynamic Programming Subset Mask Partitioning with

S(x, i)

Optimized Memory Usage SOS DP as N-Dimensional Prefix Sum Explanation Implementation General Problems


CF	SOS Dynamic Programming [Tutorial]	Good explanation + problem list
GFG	Sum over Subsets \| Dynamic Programming	Goes over brute force solutions
CF	Some SOS DP Insights	Characterizing SOS DP as multidimensional prefix sums

Sum Over Subsets (SOS) is technique used to efficiently calculate the sum of values for all subsets of a given set or bitmask. Instead of manually generating every subset, which can be slow, SOS uses bitwise operations to quickly go through only the necessary subsets.

The main trick is that for any bitmask $x$ , we can directly loop through all its subsets $i$ using:

i = (i - 1)\&x

This ensures we efficiently handle only the subsets of $x$ , skipping unnecessary combinations.

Implementation

Consider an array $A$ of $2^n$ elements. For a bitmask $x$ , we define $F(x)$ as the sum of $A[i]$ over all subset mask $i \subseteq x$ . That is:

F(x) = \sum_{i \subseteq x} A[i]

The goal is to compute $F(x)$ for all $x = 0, 1, 2, ...., 2^n - 1$ as efficiently as possible.

Solution

Brute Force

The naive solution is to iterate over all pair of masks, summing $A[i]$ only when one of them is a subset of the other (i.e., $i \& x = i$ ).

C++

vector<int> sos(1 << n);

for (int x = 0; x < (1 << n); x++) {
	// iterate over all other sets and checks whether they're a subset of x
	for (int i = 0; i < (1 << n); i++) {
		if ((i & x) == i) { sos[x] += a[i]; }
	}
}

Python

sos = [0] * (1 << n)
for x in range(1 << n):
	# iterate over all other sets and checks whether they're a subset of x
	for i in range(1 << n):
		if (x & i) == i:
			sos[x] += a[i]

This solution has a time complexity of $\mathcal{O}(2^n \cdot 2^n) = O(4^n)$ , which is too slow for large values of $n$ .

Optimized Solution : Iterating Over Submasks

Instead of iterating over all $2^n$ bitmasks for $i$ , we can optimize by iterating only over the subset mask of $x$ .

C++

vector<int> sos(1 << n);

for (int x = 0; x < (1 << n); x++) {
	sos[x] = a[0];
	// iterate over all subsets of x directly
	for (int i = x; i > 0; i = (i - 1) & x) { sos[x] += a[i]; }
}

Python

sos = [0] * (1 << n)
for x in range(1 << n):
	sos[x] = a[0]
	i = x
	# iterate over all subsets of x directly
	while i > 0:
		sos[x] += a[i]
		i = (i - 1) & x

How does this works?

Time Complexity: $\mathcal{O}(3^n)$

Proof

Faster Solution using Dynamic Programming

While the previous method is better, it still has some redundancy. For example, if a bitmask $i$ has $k$ unset bits, then $A[i]$ is summed $2^k$ times. If we precompute and reuse these sums, we can eliminate repeated additions.

Subset Mask Partitioning with $S(x, i)$

We define the subset $S(x, i)$ as follows:

S(x, i) = \{y \subseteq x \mid y \oplus x < 2^{i + 1}\}

In simpler terms, $S(x, i)$ contains all subset masks of $x$ whose bits to the left of the $i$ -th bit match those of $x$ .

For example:

S(11010110, 3) = \{11010000,11010010,11010100,11010110\}

We can break $S(x, i)$ as follows:

If the $i$ -th bit of $x$ is $0$ , then $S(x, i) = S(x, i - 1)$ .
If the $i$ -th bit of $x$ is $1$ :
- Subsets with the $i$ -th bit $1: S(x, i - 1)$ .
- Subsets with the $i$ -th bit $0 : S(x \oplus 2^i, i - 1)$ .

Thus:

S(x, i) = \begin{cases} S(x, i - 1) & \text{if the i-th bit is 0} \\ S(x, i - 1) \cup S(x \oplus 2^i, i - 1) & \text{if the i-th bit is 1} \end{cases}

Using the above partitioning, we define a DP table $dp[x][i]$ where:

dp[x][i] = \sum_{y \in S(x, i)} A[y]

Time Complexity: $\mathcal{O}(N \cdot 2^N)$

C++

vector<int> sos(1 << n);
vector<vector<int>> dp(1 << n, vector<int>(n));

for (int x = 0; x < (1 << n); x++) {
	dp[x][0] = a[x];
	if (x & 1) { dp[x][0] += a[x ^ 1]; }
	for (int i = 1; i < n; i++) {
		dp[x][i] = dp[x][i - 1];
		if (x & (1 << i)) { dp[x][i] += dp[x ^ (1 << i)][i - 1]; }
	}
	sos[x] = dp[x][n - 1];
}

Python

sos = [0] * (1 << n)
dp = [[0] * n for _ in range(1 << n)]

for x in range(1 << n):
	dp[x][0] = a[x]
	if x & 1:
		dp[x][0] += a[x ^ 1]

	for i in range(1, n):
		dp[x][i] = dp[x][i - 1]
		if x & (1 << i):
			dp[x][i] += dp[x ^ (1 << i)][i - 1]

	sos[x] = dp[x][n - 1]

Optimized Memory Usage

Since $dp[x][i]$ only depends on $dp[x][i - 1]$ , we can reuse the DP array.

C++

for (int i = 0; i < (1 << n); i++) { sos[i] = a[i]; }

for (int i = 0; i < n; i++) {
	for (int x = 0; x < (1 << n); x++) {
		if (x & (1 << i)) { sos[x] += sos[x ^ (1 << i)]; }
	}
}

Python

sos = [0] * (1 << n)
for i in range(1 << n):
	sos[i] = a[i]

for i in range(n):
	for x in range(1 << n):
		if x & (1 << i):
			sos[x] += sos[x ^ (1 << i)]

SOS DP as N-Dimensional Prefix Sum

To understand this, let's revisit $2D$ prefix sums. Given $n \times m$ array $A$ , the prefix sum array $S$ is defined as:

S[i][j] = \sum_{a \leq i} \sum_{b \leq j} A[a][b]

The standard approach uses inclusion-exlcusion:

S[i][j] = S[i][j - 1] + S[i - 1][j] - S[i - 1][j - 1] + A[i][j]

However, this approach has a drawback: as the number of dimensions grows, the number of terms we need to add or subtract grows exponentially. A simple and more scalable approach is to sweep through each dimensions one at a time and compute the prefix sum step by step:

C++

// Initialize
for (int i = 0; i < n; i++) {
	for (int j = 0; j < m; j++) { S[i][j] = A[i][j]; }
}

// Sweep along x-axis
for (int i = 1; i < n; i++) {
	for (int j = 0; j < m; j++) { S[i][j] += S[i - 1][j]; }
}

// Sweep along y-axis
for (int i = 0; i < n; i++) {
	for (int j = 1; j < m; j++) { S[i][j] += S[i][j - 1]; }
}

Python

# Initialize
for i in range(n):
	for j in range(m):
		S[i][j] = A[i][j]

# Sweep along the x-axis
for i in range(1, n):
	for j in range(m):
		S[i][j] += S[i - 1][j]

# Sweep along the y-axis
for i in range(n):
	for j in range(1, m):
		S[i][j] += S[i][j - 1]

This approach generalizes to higher dimensions. For example, in $4D$ :

After sweeping along the x-axis, $S[i][j][k][l]$ contains the sum of $A[a][b][c][d]$ where $a \leq i, b = j, c = k,$ and $d = l$ .
After sweeping along the y-axis, $S[i][j][k][l]$ contains the sum of $A[a][b][c][d]$ where $a \leq i, b \leq j, c = k,$ and $d = l$ .
After sweeping along the z-axis, $S[i][j][k][l]$ contains the sum of $A[a][b][c][d]$ where $a \leq i, b \leq j, c \leq k,$ and $d = l$ .
After sweeping along the w-axis, $S[i][j][k][l]$ contains the sum of $A[a][b][c][d]$ where $a \leq i, b \leq j, c \leq k,$ and $d \leq l$ .

If we extend this idea to $n$ dimensions, here's what happens: after sweeping along the $i$ -th axis, $S[x]$ contains the sum of $A$ values where the first $i$ coordinates are less than or equal to $x$ , and the remaining coordinates match $x$ . Sound familiar? That's exactly what $F(x)$ represents in the SOS problem!

If we think of each bit in a bitmask as its own axis, then a bitmask $x$ with $n$ bits can be viewed as an $n$ -dimensional coordinate where each component is either $0$ or $1$ . A submask of $x$ corresponds to an $n$ -dimensional coordinate where each component is less than or equal to $x$ . Therefore, when we interpret the bitmask as an $n$ -dimensional coordinate, $F(x)$ alings with the definition of an $n$ -dimensional prefix sum!

By applying the sweeping algorithm along each axis, we get the memory-optimized SOS DP solution mentioned earlier, demonstrating that SOS DP is indeed an n-dimensional prefix sum.

C++

for (int i = 0; i < (1 << n); i++) { F[i] = A[i]; }

for (int i = 0; i < n; i++) {  // Sweep along the i-th axis
	for (int x = 0; x < (1 << n); x++) {
		if (x & (1 << i))  // If the i-th bit is set, accumulate
			F[x] += F[x ^ (1 << i)];
	}
}

Python

for i in range(1 << n):
	F[i] = A[i]

for i in range(n):  # Sweep along the i-th axis
	for x in range(1 << n):
		if x & (1 << i):  # If the i-th bit is set, accumulate
			F[x] += F[x ^ (1 << i)]

Vowels

CF - Easy

Focus Problem – try your best to solve this problem before continuing!

Explanation

First, think of each word as a combination of letters represented by a bitmask. For example, bcd = 0b1110, and ada = 0b1001, where each bit stands for a letter. We'll also keep track of how often each bitmask appears in the dictionary.

Next, we use SOS DP to compute the number of words disjoint from the mask(i.e., words containing none of the vowels in the mask).

Once this is calculated, the number of valid words for a subset is simply $n - sos[mask]$ because a word is valid if it contains at least one vowel from the subset, meaning it is not disjoint from mask. Finally, we square the count of valid words for every subset, XOR all those squared values together, and that gives us the answer.

Implementation

Time Complexity: $\mathcal{O}(M \cdot 2^M)$

C++

#include <iostream>
#include <string>

const int M = 24;
int sos[1 << M];

int main() {
	int n;
	std::cin >> n;
	for (int i = 0; i < n; i++) {

Python

M = 24
sos = [0] * (1 << M)

n = int(input())
for i in range(n):
	st = input()
	mask = (
		(1 << (ord(st[0]) - 97)) | (1 << (ord(st[1]) - 97)) | (1 << (ord(st[2]) - 97))
	)
	sos[mask] += 1

General Problems

Source	Problem Name	Difficulty	Tags
CSES	Bit Problem	Easy	Show Tags SOS DP
Platinum	Problem Setting	Easy	Show Tags Combo, DP
CF	Bits and Pieces	Normal	Show Tags Bitmasks, SOS DP
Platinum	Sleeping in Class	Normal	Show Tags NT, Prefix Sums
CF	Compatible Numbers	Normal	Show Tags Bitmasks, SOS DP
CF	Jzzhu and Numbers	Normal	Show Tags Bitmasks, SOS DP
kilonova	XorTransform	Hard	Show Tags Bitmasks, NT, SOS DP
CF	Varying Kibibits	Hard	Show Tags Bitmasks, DP
JOI	2018 - Snake Escaping	Hard	Show Tags SOS DP
CF	Wise Men	Insane	Show Tags Bitmasks, DP, SOS DP

Module Progress:

Join the USACO Forum!

Stuck on a problem, or don't understand a module? Join the USACO Forum and get help from other competitive programmers!

Join Forum

Table of Contents

Sum over Subsets DP

Prerequisites

Table of Contents

Implementation

Solution

Brute Force

Optimized Solution : Iterating Over Submasks

Faster Solution using Dynamic Programming

Subset Mask Partitioning with $S(x, i)$

Optimized Memory Usage

SOS DP as N-Dimensional Prefix Sum

Explanation

Implementation

General Problems

Module Progress:

Join the USACO Forum!

Table of Contents

Sum over Subsets DP

Prerequisites

Table of Contents

Implementation

Solution

Brute Force

Optimized Solution : Iterating Over Submasks

Faster Solution using Dynamic Programming

Subset Mask Partitioning with S(x,i)S(x, i)S(x,i)

Optimized Memory Usage

SOS DP as N-Dimensional Prefix Sum

Explanation

Implementation

General Problems

Module Progress:Not Started

Join the USACO Forum!

Subset Mask Partitioning with $S(x, i)$

Module Progress: