AlgoMaster Logo

Number of Matching Subsequences

Last Updated: March 21, 2026

medium
5 min read

Understanding the Problem

We need to count how many words from the array are subsequences of the string s. A subsequence means we can find the word's characters in s in order, but they don't need to be contiguous. For instance, "acd" is a subsequence of "abcde" because we can pick 'a' at index 0, 'c' at index 2, and 'd' at index 3.

The brute force idea is obvious: for each word, scan through s with two pointers to check if it's a subsequence. But notice the constraints. We could have up to 5000 words, and s can be up to 50,000 characters long. That means we might scan through s up to 5000 times. Can we do better?

The key observation is that many words might share the same prefix or need the same characters. Instead of scanning s repeatedly, we can either scan s once while processing all words in parallel (bucket approach), or precompute where each character appears in s so that checking any word becomes a series of fast lookups (binary search approach).

Key Constraints:

  • s.length <= 5 * 10^4 → We can afford O(m log m) or even O(m * 26) preprocessing on s, but we want to avoid scanning s once per word if possible.
  • words.length <= 5000 → Up to 5000 words to check. Combined with s.length, a brute force of O(n m) = 2.5 10^8 is borderline. We should look for something faster.
  • words[i].length <= 50 → Each word is short. This means per-word work of O(L * log m) with binary search is cheap, since L is at most 50.

Approach 1: Brute Force (Check Each Word)

Intuition

The most natural approach: for each word, check whether it's a subsequence of s using the classic two-pointer technique. Start a pointer at the beginning of s and a pointer at the beginning of the word. Walk through s, and whenever the current character matches the word's current character, advance the word pointer. If the word pointer reaches the end, the word is a subsequence.

This is clean and easy to implement. Every developer would reach for this first.

Algorithm

  1. Initialize a counter to 0
  2. For each word in the words array:
  3. Use two pointers (one on s, one on the word) to check if the word is a subsequence of s
  4. If the word pointer reaches the end of the word, increment the counter
  5. Return the counter

Example Walkthrough

1Start: s = "abcde", words = ["a","bb","acd","ace"], count = 0
0
a
1
b
2
c
3
d
4
e
1/6

Code

The bottleneck is that we scan the entire string s independently for every single word. If we have 5000 words, we walk through s 5000 times. What if we could scan s just once and advance all words simultaneously?

Approach 2: Character Buckets (Next Letter Pointers)

Intuition

Instead of taking each word and checking it against s, flip the perspective. Scan through s one character at a time, and for each character, advance all the words that are currently waiting for that character.

Think of it like 26 waiting rooms, one for each letter. Each word sits in the waiting room corresponding to its next needed character. When we encounter a character in s, we go to that waiting room, pull out everyone waiting there, advance their pointer by one, and either count them as done (if they've matched all their characters) or send them to their new waiting room.

This way, we scan s exactly once, and each character in each word is processed exactly once.

Algorithm

  1. Create 26 buckets (one per lowercase letter), each holding a list of (word, index) pairs
  2. For each word, place it in the bucket corresponding to its first character, with index 0
  3. Scan through each character c in s:
    • Take all entries from bucket[c]
    • For each entry, advance its index by 1
    • If the index equals the word's length, increment the match count
    • Otherwise, move the entry to the bucket for its next needed character
  4. Return the match count

Example Walkthrough

1Initial buckets: 'a': [(a,0),(acd,0),(ace,0)], 'b': [(bb,0)], count=0
0
a
1
b
2
c
3
d
4
e
1/7

Code

The bucket approach is already excellent at O(m + T). But what if instead of scanning s during the query phase, we precomputed an index that lets us answer "where does character X appear after position P?" instantly? That's what binary search on precomputed positions gives us.

Approach 3: Binary Search on Precomputed Positions

Intuition

Here's a different angle: before looking at any word, preprocess s by recording where each character appears. For example, if s = "abcde", then 'a' appears at position [0], 'b' at [1], 'c' at [2], and so on. For a string like s = "abacd", 'a' appears at positions [0, 2].

Now, to check if a word is a subsequence, we don't need to scan s at all. For the first character of the word, find any position in s where it occurs. For the second character, find the first position that comes after the position we just used. For the third character, find the first position after that. Each of these "find the next position after X" queries is a binary search on the sorted list of positions.

If at any point we can't find a valid position (there's no occurrence of the needed character after our current position), the word is not a subsequence.

Algorithm

  1. Build a position index: for each character 'a' through 'z', store a sorted list of all positions where it appears in s
  2. For each word in the array:
    • Start with prevPos = -1 (meaning we haven't matched anything yet)
    • For each character in the word:
      • Look up the position list for that character
      • Binary search for the smallest position greater than prevPos
      • If no such position exists, the word is not a subsequence (break)
      • Otherwise, update prevPos to this position
    • If all characters were matched, increment the count
  3. Return the count

Example Walkthrough

1Build position index: 'a':[2,7], 'd':[0], 'f':[9], 'h':[3], 'j':[4,6], 'p':[5], 's':[1], 'u':[8]
0
d
1
s
2
a
3
h
4
j
5
p
6
j
7
a
8
u
9
f
1/8

Code