AlgoMaster Logo

Find the Index of the First Occurrence in a String

easyFrequency8 min readUpdated June 23, 2026

Understanding the Problem

This is the classic substring search problem. We have a text string (haystack) and a pattern string (needle), and we need to find where the pattern first appears inside the text. If it never appears, we return -1.

The brute force approach checks every possible starting position and re-compares from scratch each time. The more interesting question is how to avoid that redundant work. When a mismatch happens during brute force, it discards everything matched so far and restarts. Better algorithms reuse what has already been matched, which moves the cost from O(n * m) down to O(n + m).

Key Constraints:

  • 1 <= haystack.length, needle.length <= 10^4 → Both strings can be up to 10,000 characters. An O(n * m) brute force reaches up to 100 million character comparisons in the worst case (a pattern like "aaa...ab" against text "aaa...a"), which is on the edge of acceptable. An O(n + m) algorithm removes the concern.
  • haystack and needle consist of only lowercase English characters → No unicode edge cases. The alphabet size is 26, which matters for the rolling-hash approach below.
  • needle.length >= 1 → No empty-needle case to handle.

Approach 1: Brute Force

Intuition

Try every possible starting position in the haystack and check whether the needle matches there. For each position i, compare characters one by one. If all m characters match, that position is the answer. On any mismatch, move to the next starting position.

We only need to check starting positions from 0 to haystack.length - needle.length. Any position beyond that leaves fewer than m characters of haystack remaining, so the full needle cannot fit.

Algorithm

  1. Let n be the length of haystack and m be the length of needle.
  2. For each starting position i from 0 to n - m:
    • Compare haystack[i..i+m-1] with needle[0..m-1] character by character.
    • If all m characters match, return i.
  3. If no match is found after trying all positions, return -1.

Example Walkthrough

1i=0, j=0: Compare haystack[0]='s' with needle[0]='s'
0
s
i
matching
1
a
2
d
3
b
4
u
5
t
6
s
7
a
8
d
1/4

Code

Brute force spends most of its time re-comparing characters it has already seen. The next approach attacks that from a different angle: instead of comparing the needle against every window character by character, it compares a single hash value per window and only falls back to a full character comparison when the hashes agree.

Approach 2: Rabin-Karp (Rolling Hash)

Intuition

Comparing two strings of length m costs O(m). If we could compare a numeric fingerprint of each window in O(1) instead, most windows would be rejected immediately. Rabin-Karp assigns each length-m string a hash value, computes the needle's hash once, then slides a window across the haystack and compares window hashes against the needle's hash.

The reason this is fast is the rolling hash. Recomputing a window's hash from scratch would cost O(m) per window and save nothing. Instead, treat the window as a base-26 number. When the window slides one position to the right, we remove the contribution of the leftmost character and add the new rightmost character, both in O(1). A hash match is only a strong hint, not proof (two different strings can share a hash, a collision), so on every hash match we verify the characters directly before returning.

Algorithm

  1. Let n be the length of haystack and m be the length of needle. If m > n, return -1.
  2. Pick a base BASE (26, the alphabet size) and a large prime modulus MOD to keep hashes bounded.
  3. Precompute power = BASE^(m-1) mod MOD. This is the weight of the leftmost character in a window.
  4. Compute needleHash over the needle and windowHash over the first m characters of the haystack.
  5. Slide the window from position 0 to n - m:
    • If windowHash == needleHash, compare the window with the needle character by character. If they are equal, return the start index.
    • To advance the window, remove the leftmost character's contribution (windowHash - haystack[i] * power), multiply by BASE, and add the next character. Take the result mod MOD at each step.
  6. If no match is found, return -1.

Example Walkthrough

1needle='ll'. needleHash computed once. Window [0,1]='he', hash != needleHash
0
h
1
e
he
2
l
3
l
4
o
1/4

Code

KMP removes the worst case that Rabin-Karp still carries: it guarantees O(n + m) regardless of input by precomputing how far to shift the needle on a mismatch, instead of relying on hash values.

Approach 3: KMP (Knuth-Morris-Pratt)

Intuition

When a mismatch happens, we already know the last few characters of the haystack: they matched a prefix of the needle. KMP uses that information to fall back to the longest prefix of the needle that is also a suffix of the part already matched, instead of restarting the comparison from scratch.

To make this work, KMP preprocesses the needle into a "Longest Proper Prefix which is also Suffix" (LPS) array. For each position j in the needle, lps[j] tells us: if a mismatch happens after matching needle[0..j], what's the longest prefix of the needle we can reuse without rechecking characters in the haystack?

Algorithm

  1. Build the LPS (Longest Proper Prefix Suffix) array for the needle:
    • Initialize lps[0] = 0 (a single character has no proper prefix that is also a suffix).
    • Use two pointers: length tracks the length of the current longest prefix-suffix, and i scans through the needle.
    • If needle[i] == needle[length], increment both and set lps[i] = length.
    • If they don't match and length > 0, fall back to lps[length - 1] (reuse previously computed information).
    • If they don't match and length == 0, set lps[i] = 0 and move to the next character.
  2. Search the haystack using the LPS array:
    • Use two pointers: i for the haystack, j for the needle.
    • If characters match, advance both pointers.
    • If j reaches m, we found a match at index i - m.
    • If characters don't match and j > 0, set j = lps[j - 1] (fall back without moving i).
    • If characters don't match and j == 0, just advance i.

Example Walkthrough

1i=0, j=0: haystack[0]='a' == needle[0]='a', match
0
a
i
1
a
2
a
3
b
1/6

Code