AlgoMaster Logo

Find the Index of the First Occurrence in a String

Last Updated: March 29, 2026

easy

Understanding the Problem

This is the classic substring search problem. We have a text string (haystack) and a pattern string (needle), and we need to find where the pattern first appears inside the text. If it never appears, we return -1.

What makes this interesting isn't the "what" but the "how." The brute force approach is obvious: check every possible starting position. But substring search has been studied extensively, and there are elegant algorithms that avoid redundant comparisons by using information from previous mismatches. The journey from O(n * m) brute force to O(n + m) pattern matching is one of the most instructive algorithm design stories in computer science.

The key observation is this: when a mismatch happens during brute force, we throw away everything we learned and start fresh. Smarter algorithms remember what they've already matched and use that information to skip ahead.

Key Constraints:

  • 1 <= haystack.length, needle.length <= 10^4 → Both strings can be up to 10,000 characters. An O(n * m) brute force means up to 100 million character comparisons in the worst case, which is borderline. O(n + m) is clearly better.
  • haystack and needle consist of only lowercase English characters → No special characters, no unicode edge cases. The alphabet size is 26, which is relevant for certain hashing approaches.
  • needle.length >= 1 → We don't need to handle an empty needle.

Approach 1: Brute Force

Intuition

The most natural approach: try every possible starting position in the haystack and check if the needle matches there. For each position i, compare characters one by one. If all characters match, we found our answer. If any character doesn't match, move to the next starting position.

There's one small optimization built into the brute force that's easy to miss. We only need to check starting positions from 0 to haystack.length - needle.length. Any starting position beyond that doesn't leave enough room for the full needle to fit.

Algorithm

  1. Let n be the length of haystack and m be the length of needle.
  2. For each starting position i from 0 to n - m:
    • Compare haystack[i..i+m-1] with needle[0..m-1] character by character.
    • If all m characters match, return i.
  3. If no match is found after trying all positions, return -1.

Example Walkthrough

1i=0, j=0: Compare haystack[0]='s' with needle[0]='s'
0
s
i
matching
1
a
2
d
3
b
4
u
5
t
6
s
7
a
8
d
1/4

Code

When a mismatch happens, brute force throws away everything it learned and restarts from scratch. What if we could precompute how far to shift when a mismatch occurs, based on what we've already matched?

Approach 2: KMP (Knuth-Morris-Pratt)

Intuition

The KMP algorithm is built on a simple but powerful idea: when a mismatch happens, we already know what the last few characters of the haystack look like (they matched part of the needle). So instead of starting the comparison from scratch, we can "fall back" to the longest prefix of the needle that matches a suffix of what we've matched so far.

To make this work, KMP preprocesses the needle into a "Longest Proper Prefix which is also Suffix" (LPS) array. For each position j in the needle, lps[j] tells us: if a mismatch happens after matching needle[0..j], what's the longest prefix of the needle we can reuse without rechecking characters in the haystack?

Algorithm

  1. Build the LPS (Longest Proper Prefix Suffix) array for the needle:
    • Initialize lps[0] = 0 (a single character has no proper prefix that is also a suffix).
    • Use two pointers: length tracks the length of the current longest prefix-suffix, and i scans through the needle.
    • If needle[i] == needle[length], increment both and set lps[i] = length.
    • If they don't match and length > 0, fall back to lps[length - 1] (reuse previously computed information).
    • If they don't match and length == 0, set lps[i] = 0 and move to the next character.
  2. Search the haystack using the LPS array:
    • Use two pointers: i for the haystack, j for the needle.
    • If characters match, advance both pointers.
    • If j reaches m, we found a match at index i - m.
    • If characters don't match and j > 0, set j = lps[j - 1] (fall back without moving i).
    • If characters don't match and j == 0, just advance i.

Example Walkthrough

1i=0, j=0: haystack[0]='a' == needle[0]='a', match
0
a
i
1
a
2
a
3
b
1/6

Code