Last Updated: March 29, 2026
This is the classic substring search problem. We have a text string (haystack) and a pattern string (needle), and we need to find where the pattern first appears inside the text. If it never appears, we return -1.
What makes this interesting isn't the "what" but the "how." The brute force approach is obvious: check every possible starting position. But substring search has been studied extensively, and there are elegant algorithms that avoid redundant comparisons by using information from previous mismatches. The journey from O(n * m) brute force to O(n + m) pattern matching is one of the most instructive algorithm design stories in computer science.
The key observation is this: when a mismatch happens during brute force, we throw away everything we learned and start fresh. Smarter algorithms remember what they've already matched and use that information to skip ahead.
1 <= haystack.length, needle.length <= 10^4 → Both strings can be up to 10,000 characters. An O(n * m) brute force means up to 100 million character comparisons in the worst case, which is borderline. O(n + m) is clearly better.haystack and needle consist of only lowercase English characters → No special characters, no unicode edge cases. The alphabet size is 26, which is relevant for certain hashing approaches.needle.length >= 1 → We don't need to handle an empty needle.The most natural approach: try every possible starting position in the haystack and check if the needle matches there. For each position i, compare characters one by one. If all characters match, we found our answer. If any character doesn't match, move to the next starting position.
There's one small optimization built into the brute force that's easy to miss. We only need to check starting positions from 0 to haystack.length - needle.length. Any starting position beyond that doesn't leave enough room for the full needle to fit.
n be the length of haystack and m be the length of needle.i from 0 to n - m:haystack[i..i+m-1] with needle[0..m-1] character by character.m characters match, return i.-1.When a mismatch happens, brute force throws away everything it learned and restarts from scratch. What if we could precompute how far to shift when a mismatch occurs, based on what we've already matched?
The KMP algorithm is built on a simple but powerful idea: when a mismatch happens, we already know what the last few characters of the haystack look like (they matched part of the needle). So instead of starting the comparison from scratch, we can "fall back" to the longest prefix of the needle that matches a suffix of what we've matched so far.
To make this work, KMP preprocesses the needle into a "Longest Proper Prefix which is also Suffix" (LPS) array. For each position j in the needle, lps[j] tells us: if a mismatch happens after matching needle[0..j], what's the longest prefix of the needle we can reuse without rechecking characters in the haystack?
The LPS array captures the internal structure of the needle. When we've matched needle[0..j-1] against the haystack and then hit a mismatch at needle[j], the LPS tells us the longest prefix of needle[0..j-1] that is also a suffix. Since that suffix just matched the haystack, the prefix must also match. So we can jump the needle pointer to lps[j-1] and continue without backtracking in the haystack.
This is what makes KMP linear. The haystack pointer i never moves backward. It advances at least once per iteration (either on a match or when j == 0). The needle pointer j can only fall back a limited number of times in total because each fallback undoes a previous advance, and advances are bounded by n.
lps[0] = 0 (a single character has no proper prefix that is also a suffix).length tracks the length of the current longest prefix-suffix, and i scans through the needle.needle[i] == needle[length], increment both and set lps[i] = length.length > 0, fall back to lps[length - 1] (reuse previously computed information).length == 0, set lps[i] = 0 and move to the next character.i for the haystack, j for the needle.j reaches m, we found a match at index i - m.j > 0, set j = lps[j - 1] (fall back without moving i).j == 0, just advance i.