Last Updated: April 2, 2026
We need to find the longest substring that appears at least twice in the given string. The two occurrences are allowed to overlap, which is important. For instance, in "banana", the substring "ana" starts at index 1 and again at index 3, and those two occurrences share the character 'a' at index 3.
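The overlap is easy to see with Python's `str.find`, which accepts a start offset (a throwaway snippet just to illustrate, not part of the solution):

```python
s = "banana"

# find the first occurrence of "ana", then search again
# starting just past its start index to allow overlap
first = s.find("ana")               # 1
second = s.find("ana", first + 1)   # 3 (shares s[3] with the first)

print(first, second)  # 1 3
```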
A naive way to think about this: generate every possible substring, check if it appears more than once, and track the longest one. But the number of substrings is O(n^2), and comparing them takes O(n) each, so that would be O(n^3). For n up to 30,000, that is far too slow.
The key observation is that substring length has a monotonic property. If a duplicate substring of length k exists, then a duplicate substring of length k-1 also exists (just chop off the last character). This means we can binary search on the length of the answer. For each candidate length, we just need to check whether any duplicate substring of that exact length exists. That check is where rolling hash (Rabin-Karp) comes in, letting us compare substrings in O(1) amortized time.
2 <= s.length <= 3 * 10^4 → With n up to 30,000, O(n^2) is borderline (900 million operations). We need something closer to O(n log n) to be safe.

s consists of lowercase English letters → Only 26 possible characters. This is useful for hash function design, since we can map characters to integers 0-25.

The most straightforward approach: try every possible substring length from longest to shortest, and for each length, check every pair of substrings to see if any two match. The moment we find a match, that is our answer, since we started from the longest length.
We can simplify this a bit. For each starting index i, extract the substring starting there and check if it appears again later in the string. We do this for every possible substring, and track the longest duplicate we find.
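That simplified brute force can be sketched as follows (the function name is ours; as a small optimization, each start index only tries lengths longer than the best duplicate found so far):

```python
def longest_dup_brute_force(s):
    """Brute force: for each start index, grow the candidate substring
    while it still reappears later in the string. Roughly O(n^3) in the
    worst case -- fine for small inputs, too slow for n = 30,000."""
    best = ""
    n = len(s)
    for i in range(n):
        # only lengths longer than the current best can improve the answer
        length = len(best) + 1
        while i + length <= n:
            candidate = s[i:i + length]
            # str.find with a start offset permits overlapping occurrences
            if s.find(candidate, i + 1) != -1:
                best = candidate
                length += 1
            else:
                break
    return best

print(longest_dup_brute_force("banana"))  # ana
```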
For each candidate length, we create O(n) substring objects and hash them from scratch. We also check every length from n-1 down to 1 even though most have no duplicates. Can we skip straight to the right length using the monotonic property, and hash substrings in O(1) using a rolling hash?
Two separate insights combine here to give us an efficient solution.
First, the monotonicity observation: if a duplicate substring of length k exists, then a duplicate of length k-1 must also exist (just trim the last character from each occurrence). This means the set of "valid" lengths forms a contiguous range from 0 up to some maximum. That is perfect for binary search. Instead of trying every length from n-1 down to 1, we binary search for the largest length that has a duplicate.
Second, the rolling hash idea (Rabin-Karp): to check if any duplicate of length L exists, we slide a window of size L across the string and compute a hash for each window. If two windows have the same hash, they might be the same substring. We use a polynomial rolling hash so each new hash can be computed from the previous one in O(1), making the entire check O(n) for a given length.
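The fixed-length check might look like this (the function name and the choice of base/modulus are ours; a large prime modulus keeps collisions rare, though a strict implementation would verify matching hashes character-by-character):

```python
def has_duplicate_of_length(s, L, base=26, mod=(1 << 61) - 1):
    """Return True if some substring of length L (L >= 1) appears at
    least twice in s, using a polynomial rolling hash (Rabin-Karp).
    Equal hashes are treated as a match; with a 61-bit prime modulus,
    false positives are extremely unlikely but not impossible."""
    n = len(s)
    nums = [ord(c) - ord('a') for c in s]

    # hash of the first window s[0:L]
    h = 0
    for i in range(L):
        h = (h * base + nums[i]) % mod
    seen = {h}

    power = pow(base, L - 1, mod)  # weight of the outgoing character
    for i in range(1, n - L + 1):
        # slide the window: drop s[i-1], append s[i+L-1]
        h = ((h - nums[i - 1] * power) * base + nums[i + L - 1]) % mod
        if h in seen:
            return True
        seen.add(h)
    return False

print(has_duplicate_of_length("banana", 3))  # True  ("ana")
print(has_duplicate_of_length("banana", 4))  # False
```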
Putting them together: binary search tries O(log n) lengths, and each check is O(n), giving us O(n log n) overall.
The binary search works because of the monotonicity property: the answer to "does a duplicate of length L exist?" changes from YES to NO at exactly one threshold. Below that threshold, every length has a duplicate. Above it, none does. So binary search finds the exact boundary in O(log n) iterations.
The Rabin-Karp rolling hash works because instead of comparing substrings character by character, we compare their hash values. A polynomial hash can be updated in O(1) when we slide the window one position right: remove the contribution of the leftmost character and add the new rightmost character. This makes the full sweep O(n) instead of O(n * L).
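Combining the two ideas, a sketch of the full approach might look like this (names and the base/modulus choice are ours; for strict correctness a production version would confirm each hash match by comparing the actual substrings):

```python
def longest_dup_substring(s, base=26, mod=(1 << 61) - 1):
    """Binary search on length + Rabin-Karp check: O(n log n) expected.
    Hash collisions are assumed not to occur (near-certain with a
    61-bit prime modulus, but not guaranteed)."""
    n = len(s)
    nums = [ord(c) - ord('a') for c in s]

    def search(L):
        """Start index of some duplicated window of length L, or -1."""
        h = 0
        for i in range(L):
            h = (h * base + nums[i]) % mod
        seen = {h}
        power = pow(base, L - 1, mod)
        for i in range(1, n - L + 1):
            h = ((h - nums[i - 1] * power) * base + nums[i + L - 1]) % mod
            if h in seen:
                return i
            seen.add(h)
        return -1

    lo, hi = 1, n - 1
    start, best_len = -1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        pos = search(mid)
        if pos != -1:
            start, best_len = pos, mid  # duplicate exists: try longer
            lo = mid + 1
        else:
            hi = mid - 1                # no duplicate: try shorter
    return s[start:start + best_len] if start != -1 else ""

print(longest_dup_substring("banana"))  # ana
```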
While O(n log n) average case is great, the worst case depends on hash collision behavior. The suffix array approach avoids hashing entirely and achieves deterministic O(n log^2 n) time.
A suffix array is the sorted order of all suffixes of the string. Once we sort them, adjacent suffixes in sorted order share the longest common prefixes. The Longest Common Prefix (LCP) array stores the length of the longest common prefix between each pair of adjacent sorted suffixes. The maximum value in the LCP array is exactly the length of the longest duplicate substring.
Why does this work? Any duplicate substring must be a prefix of at least two different suffixes. When we sort all suffixes, those two suffixes end up adjacent (or near-adjacent) in the sorted order, and their common prefix shows up in the LCP array.
For example, with "banana": suffixes sorted are "a", "ana", "anana", "banana", "na", "nana". The LCP between "ana" and "anana" is 3 ("ana"), which is our answer.
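The example can be reproduced with a naive demo (sorting suffixes by direct comparison is O(n^2 log n), fine for illustration but too slow for large inputs):

```python
s = "banana"

# sort suffix start indices by the suffixes they denote
suffixes = sorted(range(len(s)), key=lambda i: s[i:])
# -> "a", "ana", "anana", "banana", "na", "nana"

# LCP of each adjacent pair in sorted order, computed the slow way
lcp = []
for a, b in zip(suffixes, suffixes[1:]):
    sa, sb = s[a:], s[b:]
    k = 0
    while k < min(len(sa), len(sb)) and sa[k] == sb[k]:
        k += 1
    lcp.append(k)

print(lcp)       # [1, 3, 0, 0, 2]
print(max(lcp))  # 3, the length of "ana"
```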
Building the suffix array takes O(n log^2 n) with the standard prefix-doubling construction (O(n log n) if the doubling rounds use radix sort), and computing the LCP array from it takes O(n) using Kasai's algorithm.
The suffix array approach is grounded in a simple but powerful observation: any substring of s is a prefix of some suffix of s. So finding the longest duplicate substring is equivalent to finding the longest common prefix shared by any two distinct suffixes.
When we sort all suffixes, suffixes that share long common prefixes end up next to each other. Kasai's algorithm exploits this by computing LCP values in original string order, using the fact that if suffix i and its neighbor share h characters, then suffix i+1 and its neighbor share at least h-1. This gives the algorithm its O(n) runtime.
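A sketch of Kasai's algorithm, paired with a naive suffix sort for brevity (function names are ours; a real solution would use a prefix-doubling suffix array construction):

```python
def kasai_lcp(s, sa):
    """Kasai's algorithm: given s and its suffix array sa, return lcp
    where lcp[r] is the length of the common prefix of suffixes sa[r-1]
    and sa[r] (lcp[0] = 0). Runs in O(n) because h drops by at most 1
    when moving from suffix i to suffix i+1."""
    n = len(s)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r

    lcp = [0] * n
    h = 0
    for i in range(n):              # suffixes in original string order
        if rank[i] > 0:
            j = sa[rank[i] - 1]     # suffix preceding suffix i when sorted
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1              # reuse h - 1 matched chars for i + 1
        else:
            h = 0
    return lcp

def longest_dup_via_sa(s):
    """Longest duplicate substring via suffix array + LCP
    (naive O(n^2 log n) suffix sort, for demonstration only)."""
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    lcp = kasai_lcp(s, sa)
    best = max(range(len(s)), key=lambda r: lcp[r])
    return s[sa[best]:sa[best] + lcp[best]]

print(longest_dup_via_sa("banana"))  # ana
```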