Understanding the Problem

This is the classic substring search problem. We have a text string (haystack) and a pattern string (needle), and we need to find where the pattern first appears inside the text. If it never appears, we return -1.

The brute force approach checks every possible starting position and re-compares from scratch each time. The more interesting question is how to avoid that redundant work. When a mismatch happens during brute force, it discards everything matched so far and restarts. Better algorithms reuse what has already been matched, which moves the cost from O(n * m) down to O(n + m).

Key Constraints:

1 <= haystack.length, needle.length <= 10^4 → Both strings can be up to 10,000 characters. An O(n * m) brute force reaches up to 100 million character comparisons in the worst case (a pattern like "aaa...ab" against text "aaa...a"), which is on the edge of acceptable. An O(n + m) algorithm removes the concern.
haystack and needle consist of only lowercase English characters → No unicode edge cases. The alphabet size is 26, which matters for the rolling-hash approach below.
needle.length >= 1 → No empty-needle case to handle.

Approach 1: Brute Force

Intuition

Try every possible starting position in the haystack and check whether the needle matches there. For each position i, compare characters one by one. If all m characters match, that position is the answer. On any mismatch, move to the next starting position.

We only need to check starting positions from 0 to haystack.length - needle.length. Any position beyond that leaves fewer than m characters of haystack remaining, so the full needle cannot fit.

Algorithm

Let n be the length of haystack and m be the length of needle.
For each starting position i from 0 to n - m:

Compare haystack[i..i+m-1] with needle[0..m-1] character by character.
If all m characters match, return i.

If no match is found after trying all positions, return -1.

Example Walkthrough

1i=0, j=0: Compare haystack[0]='s' with needle[0]='s'

matching

1/4

Code

Complexity Analysis

Time Complexity: O(n * m). In the worst case, we check all n - m + 1 starting positions, and each check compares up to m characters.
Space Complexity: O(1). Only a few integer variables. No additional data structures.

Brute force spends most of its time re-comparing characters it has already seen. The next approach attacks that from a different angle: instead of comparing the needle against every window character by character, it compares a single hash value per window and only falls back to a full character comparison when the hashes agree.

Approach 2: Rabin-Karp (Rolling Hash)

Intuition

Comparing two strings of length m costs O(m). If we could compare a numeric fingerprint of each window in O(1) instead, most windows would be rejected immediately. Rabin-Karp assigns each length-m string a hash value, computes the needle's hash once, then slides a window across the haystack and compares window hashes against the needle's hash.

The reason this is fast is the rolling hash. Recomputing a window's hash from scratch would cost O(m) per window and save nothing. Instead, treat the window as a base-26 number. When the window slides one position to the right, we remove the contribution of the leftmost character and add the new rightmost character, both in O(1). A hash match is only a strong hint, not proof (two different strings can share a hash, a collision), so on every hash match we verify the characters directly before returning.

Correctness

Equal strings always produce equal hashes, so the algorithm never skips a real match: every true occurrence passes the hash check and then the character check. The only error a hash can introduce is a false positive (a collision), and the explicit character comparison after each hash match rejects those. Working modulo a large prime keeps the hash inside machine-integer range and makes collisions rare, but the character check is what guarantees a correct answer regardless of collisions.

Algorithm

Let n be the length of haystack and m be the length of needle. If m > n, return -1.
Pick a base BASE (26, the alphabet size) and a large prime modulus MOD to keep hashes bounded.
Precompute power = BASE^(m-1) mod MOD. This is the weight of the leftmost character in a window.
Compute needleHash over the needle and windowHash over the first m characters of the haystack.
Slide the window from position 0 to n - m:

If windowHash == needleHash, compare the window with the needle character by character. If they are equal, return the start index.
To advance the window, remove the leftmost character's contribution (windowHash - haystack[i] * power), multiply by BASE, and add the next character. Take the result mod MOD at each step.

If no match is found, return -1.

Example Walkthrough

1needle='ll'. needleHash computed once. Window [0,1]='he', hash != needleHash

1/4

Code

Complexity Analysis

Time Complexity: O(n + m) on average. Each window roll is O(1), and verification on a hash match is rare with a good hash. The worst case is O(n * m) if many windows collide (for example, an adversarial input crafted to hit the chosen modulus), since each collision triggers a full O(m) comparison.
Space Complexity: O(1). Only a few integer variables for the hashes and the rolling power.

KMP removes the worst case that Rabin-Karp still carries: it guarantees O(n + m) regardless of input by precomputing how far to shift the needle on a mismatch, instead of relying on hash values.

Approach 3: KMP (Knuth-Morris-Pratt)

Intuition

When a mismatch happens, we already know the last few characters of the haystack: they matched a prefix of the needle. KMP uses that information to fall back to the longest prefix of the needle that is also a suffix of the part already matched, instead of restarting the comparison from scratch.

To make this work, KMP preprocesses the needle into a "Longest Proper Prefix which is also Suffix" (LPS) array. For each position j in the needle, lps[j] tells us: if a mismatch happens after matching needle[0..j], what's the longest prefix of the needle we can reuse without rechecking characters in the haystack?

Correctness

The LPS array captures the internal structure of the needle. When we have matched needle[0..j-1] against the haystack and then hit a mismatch at needle[j], the LPS gives the longest prefix of needle[0..j-1] that is also a suffix. Since that suffix just matched the haystack, the prefix matches the same haystack characters, so we can jump the needle pointer to lps[j-1] and continue without backtracking in the haystack.

This is what makes KMP linear. The haystack pointer i never moves backward. It advances at least once per iteration (either on a match or when j == 0). The needle pointer j can only fall back a limited number of times in total because each fallback undoes a previous advance, and advances are bounded by n.

Algorithm

Build the LPS (Longest Proper Prefix Suffix) array for the needle:

Initialize lps[0] = 0 (a single character has no proper prefix that is also a suffix).
Use two pointers: length tracks the length of the current longest prefix-suffix, and i scans through the needle.
If needle[i] == needle[length], increment both and set lps[i] = length.
If they don't match and length > 0, fall back to lps[length - 1] (reuse previously computed information).
If they don't match and length == 0, set lps[i] = 0 and move to the next character.

Search the haystack using the LPS array:

Use two pointers: i for the haystack, j for the needle.
If characters match, advance both pointers.
If j reaches m, we found a match at index i - m.
If characters don't match and j > 0, set j = lps[j - 1] (fall back without moving i).
If characters don't match and j == 0, just advance i.

Example Walkthrough

1i=0, j=0: haystack[0]='a' == needle[0]='a', match

1/6

Code

Complexity Analysis

Time Complexity: O(n + m). Building the LPS array takes O(m) time. The search phase takes O(n) time because the haystack pointer never moves backward.
Space Complexity: O(m). We store the LPS array of size m. No other additional space is needed.

Find the Index of the First Occurrence in a String

Understanding the Problem

Key Constraints:

Approach 1: Brute Force

Intuition

Algorithm

Example Walkthrough

Code

Approach 2: Rabin-Karp (Rolling Hash)

Intuition

Algorithm

Example Walkthrough

Code

Approach 3: KMP (Knuth-Morris-Pratt)

Intuition

Algorithm

Example Walkthrough

Code

Get Premium