Last Updated: April 2, 2026
We need to find the shortest path (fewest edges) that visits every node in an undirected graph. There are a few things that make this different from typical shortest path problems.
First, we can start at any node. There's no fixed source. Second, we can revisit nodes and reuse edges. This is critical because the graph might not have a Hamiltonian path (a path that visits every node exactly once), so backtracking through already-visited nodes might be necessary. Third, "shortest" means fewest edges traversed, not fewest unique nodes.
The key challenge is tracking which nodes we've visited along the way. A simple BFS won't work because the "state" isn't just our current position. Two paths that arrive at the same node but have visited different subsets of nodes are fundamentally different states. That observation, that our state includes both where we are and what we've visited, is the core insight that drives every approach to this problem.
1 <= n <= 12 → With at most 12 nodes, this is a strong signal toward exponential approaches. Bitmask techniques are very feasible since 2^12 = 4096 is tiny.

The most straightforward idea is to try every possible order of visiting nodes. Since we need to visit all n nodes, one way to think about it is: what's the shortest walk through the graph that covers every node?
A brute force approach would try starting from each node and use backtracking to explore all possible next moves. At each step, we pick an unvisited neighbor (or walk through visited nodes to reach an unvisited one). We track the minimum number of edges needed to visit all nodes.
This is essentially enumerating paths, and it's wildly expensive. But it establishes what we're trying to optimize.
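That brute force might be sketched like this in Python (the function name, the adjacency-list input format `graph[i]` = neighbors of node i, and the pruning of repeated (node, mask) states on the current path are my own choices; repeating such a state on one path can never help, and pruning it keeps the search finite):

```python
def brute_force(graph):
    """graph[i] is the list of neighbors of node i (0-indexed)."""
    n = len(graph)
    full = (1 << n) - 1
    best = float("inf")

    def dfs(node, mask, steps, on_path):
        nonlocal best
        if steps >= best:            # prune: already no better than best found
            return
        if mask == full:             # every node visited
            best = steps
            return
        for nxt in graph[node]:      # walk to any neighbor, visited or not
            state = (nxt, mask | (1 << nxt))
            if state in on_path:     # repeating a (node, mask) state never helps
                continue
            on_path.add(state)
            dfs(nxt, mask | (1 << nxt), steps + 1, on_path)
            on_path.remove(state)

    for start in range(n):           # any node may be the starting point
        dfs(start, 1 << start, 0, {(start, 1 << start)})
    return best
```

Even with the pruning, this explores an enormous number of partial walks; it only serves to define the quantity we're minimizing.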
Example: consider a star graph where node 0 is connected to each of nodes 1, 2, and 3. We try starting from each node. Starting from node 1: visit 1, go to 0 (visit 0), go to 2 (visit 2), back to 0, go to 3 (visit 3). Total: 4 edges.
The backtracking approach explores a massive search tree and doesn't remember whether it's already found the shortest path for a given (currentNode, visitedSet) combination. What if we could guarantee each unique state is explored at most once?
Here's the key insight: we can redefine what "state" means. In normal BFS on a graph, the state is just the current node. But in this problem, two paths arriving at the same node with different sets of visited nodes are completely different situations. The path that's visited {0, 1, 2} has different future possibilities than the one that's visited {0, 1, 3}.
So our state becomes a pair: (current node, set of visited nodes). Since n ≤ 12, we can represent the visited set as a bitmask: bit i is set if node i has been visited. This gives us at most n · 2^n total states, which for n = 12 is 12 × 4096 = 49,152. That's tiny.
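As a quick illustration of the bitmask bookkeeping (a standalone Python snippet, with variable names of my own choosing):

```python
n = 12
full_mask = (1 << n) - 1   # all n bits set: every node visited
states = n * (1 << n)      # one state per (node, mask) pair

print(states)              # 49152

mask = 1 << 0              # start at node 0: only bit 0 is set
mask |= 1 << 3             # visiting node 3 is a single bitwise OR
print(bin(mask))           # 0b1001
print(bool(mask & (1 << 3)))   # True: node 3 is in the visited set
print(mask == full_mask)       # False: not all nodes visited yet
```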
Now we just run BFS over this state space. BFS guarantees that the first time we reach a state where all bits are set (meaning all nodes are visited), that's the shortest path.
Since we can start at any node, we seed the BFS queue with all n starting states: (node i, bitmask with only bit i set) for each node i. The target is any state where the bitmask equals (1 << n) - 1, meaning all n bits are on.
BFS explores states in order of increasing distance. Since every edge in the original graph has weight 1, traversing one edge increases the path length by exactly 1. By defining our state as (node, visitedMask), we ensure that we never revisit the same situation. Two paths arriving at node 3 with the same set of visited nodes are equivalent, so we only need to keep the one that arrived first (which BFS guarantees is the shortest).
The bitmask is what makes this tractable. Without it, we'd need to store sets of visited nodes, which would be expensive to hash and compare. With a bitmask, checking and updating the visited set is just a bitwise OR, and the entire visited structure is a simple 2D boolean array.
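Putting the pieces together, the BFS over (node, mask) states might look like this (a sketch; the function name and the adjacency-list input format `graph[i]` = neighbors of node i are my own assumptions):

```python
from collections import deque

def shortest_path_all_nodes(graph):
    """graph[i] is the list of neighbors of node i (0-indexed)."""
    n = len(graph)
    full = (1 << n) - 1
    # Seed the queue with every node as a possible starting point.
    queue = deque((i, 1 << i, 0) for i in range(n))
    seen = [[False] * (1 << n) for _ in range(n)]   # seen[node][mask]
    for i in range(n):
        seen[i][1 << i] = True
    while queue:
        node, mask, steps = queue.popleft()
        if mask == full:          # first full mask reached is optimal (BFS)
            return steps
        for nxt in graph[node]:
            nmask = mask | (1 << nxt)
            if not seen[nxt][nmask]:
                seen[nxt][nmask] = True
                queue.append((nxt, nmask, steps + 1))
    return -1                     # unreachable for a connected graph
```

On the star-graph example above (node 0 joined to 1, 2, and 3), this returns 4.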
Let n be the number of nodes and fullMask = (1 << n) - 1 (all n bits set).

The BFS approach is already efficient, but we can also solve this with dynamic programming. The DP approach decomposes the problem differently, separating the shortest-path computation from the visiting-order optimization.
The idea is similar: our state is still (current node, visited mask). But instead of exploring layer by layer, we build up solutions from smaller visited sets to larger ones.
Define dp[mask][i] as the minimum number of edges to reach node i with the set of visited nodes represented by mask. We want the minimum of dp[fullMask][i] across all nodes i.
The base case is straightforward: dp[1 << i][i] = 0 for all nodes i. Starting at node i with only node i visited costs 0 edges.
For the transitions, we precompute the shortest distance between every pair of nodes. If we know dist[i][j] for all pairs, the DP transition becomes simple: from state (mask, i), to visit a new node j, the cost is dp[mask][i] + dist[i][j]. The dist[i][j] value already accounts for any intermediate steps through other nodes.
This decomposition is elegant because it separates two concerns. The BFS precomputation answers "how do I get from A to B?" while the bitmask DP answers "in what order should I visit all nodes?"
The DP works because it decomposes the problem into two independent subproblems. First, what's the shortest distance between any two nodes? (Solved by BFS from each node.) Second, in what order should we visit all nodes to minimize total travel? (Solved by the bitmask DP.)
The bitmask DP iterates through all subsets of nodes in increasing order. For each subset, it knows the cheapest way to have visited exactly those nodes and ended at each node in the subset. When extending to a new node j, the cost is the current cost plus the pre-computed shortest path from the current node to j. This accounts for all the edge traversals needed, including traversals through already-visited nodes.
To summarize the recurrence: the base case is dp[1 << i][i] = 0 for each node i; the transition is dp[mask | (1 << j)][j] = min(dp[mask | (1 << j)][j], dp[mask][i] + dist[i][j]); and the answer is the minimum of dp[fullMask][i] over all i.
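The whole DP pipeline might be sketched like this (same adjacency-list input assumption as the brute force; the function name is my own):

```python
from collections import deque

def shortest_path_all_nodes_dp(graph):
    """graph[i] is the list of neighbors of node i (0-indexed)."""
    n = len(graph)
    full = (1 << n) - 1
    INF = float("inf")

    # Step 1: pairwise shortest distances via a BFS from each node.
    dist = [[INF] * n for _ in range(n)]
    for src in range(n):
        dist[src][src] = 0
        q = deque([src])
        while q:
            u = q.popleft()
            for v in graph[u]:
                if dist[src][v] == INF:
                    dist[src][v] = dist[src][u] + 1
                    q.append(v)

    # Step 2: bitmask DP over visiting orders.
    # dp[mask][i] = fewest edges to have visited exactly `mask`, ending at i.
    dp = [[INF] * n for _ in range(1 << n)]
    for i in range(n):
        dp[1 << i][i] = 0                       # base case: start anywhere
    for mask in range(1 << n):                  # subsets in increasing order
        for i in range(n):
            if dp[mask][i] == INF or not (mask & (1 << i)):
                continue
            for j in range(n):                  # extend to an unvisited node j
                if mask & (1 << j):
                    continue
                cand = dp[mask][i] + dist[i][j]
                if cand < dp[mask | (1 << j)][j]:
                    dp[mask | (1 << j)][j] = cand
    return min(dp[full])
```

Note how `dist[i][j]` silently covers the "walk back through visited nodes" case: the DP never models intermediate stops, because the precomputed distance already includes them.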