AlgoMaster Logo

Find Duplicate Subtrees

Last Updated: April 3, 2026

medium
5 min read

Understanding the Problem

We are given a binary tree, and we need to find all subtrees that appear more than once. A subtree here means a node along with all of its descendants, preserving structure. Two subtrees are considered duplicates only if they have the exact same shape and the same values at every position.

The challenge is figuring out how to efficiently compare subtrees. A naive approach would compare every pair of subtrees directly, but that is wasteful. The key observation is that we can represent each subtree as a unique string (a serialization), and then two subtrees are duplicates if and only if they produce the same string. This transforms the problem from "compare tree structures" to "find duplicate strings," which is something a hash map handles naturally.

Key Constraints:

  • 1 <= number of nodes <= 5000 -- With n up to 5000, even an O(n^2) approach is feasible (25 million operations), but we should aim for better. An O(n) or O(n * average_string_length) solution is ideal.
  • -200 <= Node.val <= 200 -- Node values are bounded integers. This is useful because it means serialized representations will not have extremely long individual node strings.
  • At least 1 node -- No need to handle an empty tree.

Approach 1: Brute Force (Compare All Pairs)

Intuition

The most straightforward idea: for every pair of nodes in the tree, check if the subtrees rooted at those two nodes are identical. If they are, add one of them to the result (making sure we do not add duplicates).

To check if two subtrees are identical, we do a recursive comparison: same value at the root, and both left subtrees are identical, and both right subtrees are identical.

This is conceptually simple but does a lot of redundant work. For each of the O(n^2) pairs, the comparison itself can take O(n) time in the worst case, making this an O(n^3) approach.

Algorithm

  1. Collect all nodes in the tree into a list using any traversal.
  2. For every pair of distinct nodes (i, j), check if the subtree rooted at node i is identical to the subtree rooted at node j.
  3. To check identity, recursively compare the two subtrees: same value, same left subtrees, same right subtrees.
  4. If a duplicate pair is found, add the root of one subtree to the result set (use a set to avoid adding the same structure multiple times).
  5. Return the result list.

Example Walkthrough

Input:

1243244
root

We compare every pair of subtrees. Node 4 appears as a leaf in three places (indices 3, 9, and 6). When we compare node at index 3 with node at index 9, isSame returns true, so we add [4] to the result. Similarly, the subtree [2, 4] appears at indices 1 and 5, so we add [2, 4] to the result.

Output:

[2,4]
[4]
result

Code

The bottleneck is comparing every pair of subtrees from scratch. What if we could represent each subtree with a compact fingerprint so that comparison becomes an O(1) hash map lookup?

Approach 2: Postorder Serialization with Hash Map

Intuition

Instead of comparing subtrees structurally, we can serialize each subtree into a string. Two subtrees are identical if and only if their serializations match. This transforms a tree comparison problem into a string matching problem.

We use postorder traversal (left, right, node) to build the serialization bottom-up. For each node, we recursively get the serialization of its left subtree and right subtree, then combine them with the current node's value. A null child is represented by a special marker like "#".

We store each serialization in a hash map counting occurrences. When the count for a serialization hits exactly 2, we know we have found a duplicate, and we add the current node to our result.

Algorithm

  1. Initialize a hash map to count serialization occurrences and a result list.
  2. Perform a postorder DFS on the tree.
  3. At each node, build the serialization: leftSerial + "," + rightSerial + "," + node.val.
  4. For null nodes, return "#" as the serialization.
  5. Increment the count for this serialization in the hash map.
  6. If the count becomes exactly 2, add the current node to the result list.
  7. Return the serialization string so the parent can use it.
  8. After the DFS completes, return the result list.

Example Walkthrough

1Start postorder DFS: visit leftmost leaf first (node 4 at index 3)
124visit3244
1/7
1Hash map empty, begin postorder traversal
1/7

Code

The bottleneck is the string serialization itself. For a skewed tree, the total string data across all nodes can reach O(n^2). What if we could represent each subtree with a short, fixed-size identifier instead of a long string?

Approach 3: ID-Based Serialization (Optimal)

Intuition

The string serialization approach works, but it suffers from long strings in the worst case. Here is the optimization: instead of encoding a subtree as a full string, we assign each unique subtree structure a numeric ID. Then, to represent a subtree, we only need three things: the left child's ID, the current node's value, and the right child's ID. That is a fixed-size tuple, no matter how deep the subtree is.

Think of it like this. In the string approach, the parent's serialization includes the entire serialization of both children, which includes the entire serialization of the grandchildren, and so on. It is like copying a document into another document into another document. With IDs, we just reference the document by its page number instead of copying it. Each node's representation is always just three numbers.

Algorithm

  1. Initialize a map from tuple (leftID, nodeVal, rightID) to a unique integer ID, starting IDs from 1. Reserve ID 0 for null nodes.
  2. Initialize a count map from ID to occurrence count, and a result list.
  3. Perform a postorder DFS.
  4. For null nodes, return ID 0.
  5. At each node, recursively get the left child's ID and right child's ID.
  6. Form the tuple (leftID, node.val, rightID).
  7. If the tuple is not in the map, assign it a new ID.
  8. Look up the ID for this tuple and increment its count.
  9. If the count equals 2, add the current node to the result.
  10. Return the ID for this subtree.

Example Walkthrough

1Start postorder DFS, null nodes get ID=0
124visit3244
1/7
1Maps empty, null=ID 0, nextId=1
1/7

Code