AlgoMaster Logo

Longest Duplicate Substring

Last Updated: April 2, 2026

hard
5 min read

Understanding the Problem

We need to find the longest substring that appears at least twice in the given string. The two occurrences are allowed to overlap, which is important. For instance, in "banana", the substring "ana" starts at index 1 and again at index 3, and those two occurrences share the character 'a' at index 3.

A naive way to think about this: generate every possible substring, check if it appears more than once, and track the longest one. But the number of substrings is O(n^2), and comparing them takes O(n) each, so that would be O(n^3). For n up to 30,000, that is far too slow.

The key observation is that substring length has a monotonic property. If a duplicate substring of length k exists, then a duplicate substring of length k-1 also exists (just chop off the last character). This means we can binary search on the length of the answer. For each candidate length, we just need to check whether any duplicate substring of that exact length exists. That check is where rolling hash (Rabin-Karp) comes in, letting us compare substrings in O(1) amortized time.

Key Constraints:

  • 2 <= s.length <= 3 * 10^4 → With n up to 30,000, O(n^2) is borderline (900 million operations). We need something closer to O(n log n) to be safe.
  • s consists of lowercase English letters → Only 26 possible characters. This is useful for hash function design since we can map characters to integers 0-25.

Approach 1: Brute Force

Intuition

The most straightforward approach: try every possible substring length from longest to shortest, and for each length, check every pair of substrings to see if any two match. The moment we find a match, that is our answer since we started from the longest length.

We can simplify this a bit. For each starting index i, extract the substring starting there and check if it appears again later in the string. We do this for every possible substring, and track the longest duplicate we find.

Algorithm

  1. For each possible length L from n-1 down to 1:
  2. Extract every substring of length L.
  3. Store each substring in a set.
  4. If we find a substring already in the set, return it.
  5. If no duplicate found at any length, return "".

Example Walkthrough

1Try length 5: "banan", "anana" — no duplicates
0
b
1
a
2
n
3
a
4
n
len=5
5
a
1/5

Code

For each candidate length, we create O(n) substring objects and hash them from scratch. We also check every length from n-1 down to 1 even though most have no duplicates. Can we skip straight to the right length using the monotonic property, and hash substrings in O(1) using a rolling hash?

Approach 2: Binary Search + Rabin-Karp Rolling Hash

Intuition

Two separate insights combine here to give us an efficient solution.

First, the monotonicity observation: if a duplicate substring of length k exists, then a duplicate of length k-1 must also exist (just trim the last character from each occurrence). This means the set of "valid" lengths forms a contiguous range from 0 up to some maximum. That is perfect for binary search. Instead of trying every length from n-1 down to 1, we binary search for the largest length that has a duplicate.

Second, the rolling hash idea (Rabin-Karp): to check if any duplicate of length L exists, we slide a window of size L across the string and compute a hash for each window. If two windows have the same hash, they might be the same substring. We use a polynomial rolling hash so each new hash can be computed from the previous one in O(1), making the entire check O(n) for a given length.

Putting them together: binary search tries O(log n) lengths, and each check is O(n), giving us O(n log n) overall.

Algorithm

  1. Binary search on the length L, with low = 1 and high = n - 1.
  2. For each candidate length mid = (low + high) / 2, check if a duplicate substring of length mid exists.
  3. The check uses Rabin-Karp: compute the rolling hash of each substring of length mid. Store hashes in a map. If a hash collision occurs, verify by comparing actual characters.
  4. If a duplicate of length mid exists, record the result and search for longer (low = mid + 1).
  5. If no duplicate of length mid exists, search for shorter (high = mid - 1).
  6. Return the longest duplicate found, or "" if none exists.

Example Walkthrough

1Binary search: low=1, high=5. Try mid=3. Slide window of length 3.
0
b
1
a
2
n
window len=3
3
a
4
n
5
a
1/7

Code

While O(n log n) average case is great, the worst case depends on hash collision behavior. The suffix array approach avoids hashing entirely and achieves deterministic O(n log^2 n) time.

Approach 3: Suffix Array + LCP Array

Intuition

A suffix array is the sorted order of all suffixes of the string. Once we sort them, adjacent suffixes in sorted order share the longest common prefixes. The Longest Common Prefix (LCP) array stores the length of the longest common prefix between each pair of adjacent sorted suffixes. The maximum value in the LCP array is exactly the length of the longest duplicate substring.

Why does this work? Any duplicate substring must be a prefix of at least two different suffixes. When we sort all suffixes, those two suffixes end up adjacent (or near-adjacent) in the sorted order, and their common prefix shows up in the LCP array.

For example, with "banana": suffixes sorted are "a", "ana", "anana", "banana", "na", "nana". The LCP between "ana" and "anana" is 3 ("ana"), which is our answer.

Building the suffix array takes O(n log n) using standard algorithms, and computing the LCP array from it takes O(n) using Kasai's algorithm.

Algorithm

  1. Build the suffix array: sort all suffixes of s lexicographically. Store their starting indices in sorted order.
  2. Compute the LCP array using Kasai's algorithm: for each pair of adjacent suffixes in sorted order, compute the length of their common prefix.
  3. Find the maximum value in the LCP array. The corresponding suffix gives us the starting position of the longest duplicate substring.
  4. Extract and return the substring.

Example Walkthrough

1List all suffixes: banana, anana, nana, ana, na, a
0
b
banana
1
a
anana
2
n
nana
3
a
ana
4
n
na
5
a
a
1/5

Code