Last Updated: April 3, 2026
We are given a binary tree, and we need to find all subtrees that appear more than once. A subtree here means a node along with all of its descendants, preserving structure. Two subtrees are considered duplicates only if they have the exact same shape and the same values at every position.
The challenge is figuring out how to efficiently compare subtrees. A naive approach would compare every pair of subtrees directly, but that is wasteful. The key observation is that we can represent each subtree as a unique string (a serialization), and then two subtrees are duplicates if and only if they produce the same string. This transforms the problem from "compare tree structures" to "find duplicate strings," which is something a hash map handles naturally.
1 <= number of nodes <= 5000 -- With n up to 5000, even an O(n^2) approach is feasible (25 million operations), but we should aim for better. An O(n) or O(n * average_string_length) solution is ideal.-200 <= Node.val <= 200 -- Node values are bounded integers. This is useful because it means serialized representations will not have extremely long individual node strings.At least 1 node -- No need to handle an empty tree.The most straightforward idea: for every pair of nodes in the tree, check if the subtrees rooted at those two nodes are identical. If they are, add one of them to the result (making sure we do not add duplicates).
To check if two subtrees are identical, we do a recursive comparison: same value at the root, and both left subtrees are identical, and both right subtrees are identical.
This is conceptually simple but does a lot of redundant work. For each of the O(n^2) pairs, the comparison itself can take O(n) time in the worst case, making this an O(n^3) approach.
Input:
We compare every pair of subtrees. Node 4 appears as a leaf in three places (indices 3, 9, and 6). When we compare node at index 3 with node at index 9, isSame returns true, so we add [4] to the result. Similarly, the subtree [2, 4] appears at indices 1 and 5, so we add [2, 4] to the result.
Output:
isSame comparison can visit up to O(n) nodes in the worst case.isSame goes up to O(h) where h is the tree height.The bottleneck is comparing every pair of subtrees from scratch. What if we could represent each subtree with a compact fingerprint so that comparison becomes an O(1) hash map lookup?
Instead of comparing subtrees structurally, we can serialize each subtree into a string. Two subtrees are identical if and only if their serializations match. This transforms a tree comparison problem into a string matching problem.
We use postorder traversal (left, right, node) to build the serialization bottom-up. For each node, we recursively get the serialization of its left subtree and right subtree, then combine them with the current node's value. A null child is represented by a special marker like "#".
We store each serialization in a hash map counting occurrences. When the count for a serialization hits exactly 2, we know we have found a duplicate, and we add the current node to our result.
The serialization captures everything about a subtree: its structure (where the nulls are) and its values. Because we build it postorder (bottom-up), by the time we process a parent node, both children have already been serialized. So the parent's serialization naturally includes the full representation of its entire subtree.
Using the count == 2 check is a clean trick. When we see a serialization for the first time (count = 1), we just note it. When we see it a second time (count = 2), we know it is a duplicate, so we add it to the result. If we see it a third, fourth, or fifth time, we skip it because we have already recorded this pattern.
leftSerial + "," + rightSerial + "," + node.val.The bottleneck is the string serialization itself. For a skewed tree, the total string data across all nodes can reach O(n^2). What if we could represent each subtree with a short, fixed-size identifier instead of a long string?
The string serialization approach works, but it suffers from long strings in the worst case. Here is the optimization: instead of encoding a subtree as a full string, we assign each unique subtree structure a numeric ID. Then, to represent a subtree, we only need three things: the left child's ID, the current node's value, and the right child's ID. That is a fixed-size tuple, no matter how deep the subtree is.
Think of it like this. In the string approach, the parent's serialization includes the entire serialization of both children, which includes the entire serialization of the grandchildren, and so on. It is like copying a document into another document into another document. With IDs, we just reference the document by its page number instead of copying it. Each node's representation is always just three numbers.