This is the classic substring search problem. We have a text string (haystack) and a pattern string (needle), and we need to find where the pattern first appears inside the text. If it never appears, we return -1.
The brute force approach checks every possible starting position and re-compares from scratch each time. The more interesting question is how to avoid that redundant work. When a mismatch happens during brute force, it discards everything matched so far and restarts. Better algorithms reuse what has already been matched, which moves the cost from O(n * m) down to O(n + m).
1 <= haystack.length, needle.length <= 10^4 → Both strings can be up to 10,000 characters. An O(n * m) brute force reaches up to 100 million character comparisons in the worst case (a pattern like "aaa...ab" against text "aaa...a"), which is on the edge of acceptable. An O(n + m) algorithm removes the concern.haystack and needle consist of only lowercase English characters → No unicode edge cases. The alphabet size is 26, which matters for the rolling-hash approach below.needle.length >= 1 → No empty-needle case to handle.Try every possible starting position in the haystack and check whether the needle matches there. For each position i, compare characters one by one. If all m characters match, that position is the answer. On any mismatch, move to the next starting position.
We only need to check starting positions from 0 to haystack.length - needle.length. Any position beyond that leaves fewer than m characters of haystack remaining, so the full needle cannot fit.
n be the length of haystack and m be the length of needle.i from 0 to n - m:haystack[i..i+m-1] with needle[0..m-1] character by character.m characters match, return i.-1.Brute force spends most of its time re-comparing characters it has already seen. The next approach attacks that from a different angle: instead of comparing the needle against every window character by character, it compares a single hash value per window and only falls back to a full character comparison when the hashes agree.
Comparing two strings of length m costs O(m). If we could compare a numeric fingerprint of each window in O(1) instead, most windows would be rejected immediately. Rabin-Karp assigns each length-m string a hash value, computes the needle's hash once, then slides a window across the haystack and compares window hashes against the needle's hash.
The reason this is fast is the rolling hash. Recomputing a window's hash from scratch would cost O(m) per window and save nothing. Instead, treat the window as a base-26 number. When the window slides one position to the right, we remove the contribution of the leftmost character and add the new rightmost character, both in O(1). A hash match is only a strong hint, not proof (two different strings can share a hash, a collision), so on every hash match we verify the characters directly before returning.
Equal strings always produce equal hashes, so the algorithm never skips a real match: every true occurrence passes the hash check and then the character check. The only error a hash can introduce is a false positive (a collision), and the explicit character comparison after each hash match rejects those. Working modulo a large prime keeps the hash inside machine-integer range and makes collisions rare, but the character check is what guarantees a correct answer regardless of collisions.
n be the length of haystack and m be the length of needle. If m > n, return -1.BASE (26, the alphabet size) and a large prime modulus MOD to keep hashes bounded.power = BASE^(m-1) mod MOD. This is the weight of the leftmost character in a window.needleHash over the needle and windowHash over the first m characters of the haystack.0 to n - m:windowHash == needleHash, compare the window with the needle character by character. If they are equal, return the start index.windowHash - haystack[i] * power), multiply by BASE, and add the next character. Take the result mod MOD at each step.KMP removes the worst case that Rabin-Karp still carries: it guarantees O(n + m) regardless of input by precomputing how far to shift the needle on a mismatch, instead of relying on hash values.
When a mismatch happens, we already know the last few characters of the haystack: they matched a prefix of the needle. KMP uses that information to fall back to the longest prefix of the needle that is also a suffix of the part already matched, instead of restarting the comparison from scratch.
To make this work, KMP preprocesses the needle into a "Longest Proper Prefix which is also Suffix" (LPS) array. For each position j in the needle, lps[j] tells us: if a mismatch happens after matching needle[0..j], what's the longest prefix of the needle we can reuse without rechecking characters in the haystack?
The LPS array captures the internal structure of the needle. When we have matched needle[0..j-1] against the haystack and then hit a mismatch at needle[j], the LPS gives the longest prefix of needle[0..j-1] that is also a suffix. Since that suffix just matched the haystack, the prefix matches the same haystack characters, so we can jump the needle pointer to lps[j-1] and continue without backtracking in the haystack.
This is what makes KMP linear. The haystack pointer i never moves backward. It advances at least once per iteration (either on a match or when j == 0). The needle pointer j can only fall back a limited number of times in total because each fallback undoes a previous advance, and advances are bounded by n.
lps[0] = 0 (a single character has no proper prefix that is also a suffix).length tracks the length of the current longest prefix-suffix, and i scans through the needle.needle[i] == needle[length], increment both and set lps[i] = length.length > 0, fall back to lps[length - 1] (reuse previously computed information).length == 0, set lps[i] = 0 and move to the next character.i for the haystack, j for the needle.j reaches m, we found a match at index i - m.j > 0, set j = lps[j - 1] (fall back without moving i).j == 0, just advance i.