Last Updated: March 21, 2026
We need to count how many words from the array are subsequences of the string s. A subsequence means we can find the word's characters in s in order, but they don't need to be contiguous. For instance, "acd" is a subsequence of "abcde" because we can pick 'a' at index 0, 'c' at index 2, and 'd' at index 3.
The brute force idea is obvious: for each word, scan through s with two pointers to check if it's a subsequence. But notice the constraints. We could have up to 5000 words, and s can be up to 50,000 characters long. That means we might scan through s up to 5000 times. Can we do better?
The key observation is that many words might share the same prefix or need the same characters. Instead of scanning s repeatedly, we can either scan s once while processing all words in parallel (bucket approach), or precompute where each character appears in s so that checking any word becomes a series of fast lookups (binary search approach).
s.length <= 5 * 10^4 → We can afford O(m log m) or even O(m * 26) preprocessing on s, but we want to avoid scanning s once per word if possible.words.length <= 5000 → Up to 5000 words to check. Combined with s.length, a brute force of O(n m) = 2.5 10^8 is borderline. We should look for something faster.words[i].length <= 50 → Each word is short. This means per-word work of O(L * log m) with binary search is cheap, since L is at most 50.The most natural approach: for each word, check whether it's a subsequence of s using the classic two-pointer technique. Start a pointer at the beginning of s and a pointer at the beginning of the word. Walk through s, and whenever the current character matches the word's current character, advance the word pointer. If the word pointer reaches the end, the word is a subsequence.
This is clean and easy to implement. Every developer would reach for this first.
s, one on the word) to check if the word is a subsequence of ss which has length m. Each subsequence check is O(m) in the worst case. Total is up to 5000 50,000 = 2.5 * 10^8, which is borderline.The bottleneck is that we scan the entire string s independently for every single word. If we have 5000 words, we walk through s 5000 times. What if we could scan s just once and advance all words simultaneously?
Instead of taking each word and checking it against s, flip the perspective. Scan through s one character at a time, and for each character, advance all the words that are currently waiting for that character.
Think of it like 26 waiting rooms, one for each letter. Each word sits in the waiting room corresponding to its next needed character. When we encounter a character in s, we go to that waiting room, pull out everyone waiting there, advance their pointer by one, and either count them as done (if they've matched all their characters) or send them to their new waiting room.
This way, we scan s exactly once, and each character in each word is processed exactly once.
The bucket approach works because it respects the order of characters in s. By scanning s left to right, we guarantee that when a word advances its pointer, the matched character comes after all previously matched characters. Each word entry moves through buckets exactly once per character in the word, and we never revisit characters of s.
The brilliance is that we've turned the problem inside out. Instead of "for each word, find its characters in s," we're doing "for each character in s, advance all words that need it." This eliminates the repeated scanning.
c in s:s and T is the total number of characters across all words. We scan s once (O(m)), and each character in each word is processed exactly once when moved between buckets (O(T)). Since T <= 5000 * 50 = 250,000, this is very fast.The bucket approach is already excellent at O(m + T). But what if instead of scanning s during the query phase, we precomputed an index that lets us answer "where does character X appear after position P?" instantly? That's what binary search on precomputed positions gives us.
Here's a different angle: before looking at any word, preprocess s by recording where each character appears. For example, if s = "abcde", then 'a' appears at position [0], 'b' at [1], 'c' at [2], and so on. For a string like s = "abacd", 'a' appears at positions [0, 2].
Now, to check if a word is a subsequence, we don't need to scan s at all. For the first character of the word, find any position in s where it occurs. For the second character, find the first position that comes after the position we just used. For the third character, find the first position after that. Each of these "find the next position after X" queries is a binary search on the sorted list of positions.
If at any point we can't find a valid position (there's no occurrence of the needed character after our current position), the word is not a subsequence.
sprevPos = -1 (meaning we haven't matched anything yet)prevPosprevPos to this positions exactly once, distributed across 26 lists. Total storage is O(m).