Substring with Concatenation of All Words

Last Updated: March 21, 2026

hard
6 min read

Understanding the Problem

We need to find every starting index in s where a substring of length words.length * wordLength is exactly a concatenation of all words in some order. The order doesn't matter, so we're really looking for positions where the substring can be split into word-sized chunks that form the same multiset as words.

The key observation is that checking permutations is a red herring. Since we don't care about order, we just need to verify that the frequency of each word-chunk matches the frequency of words in the array. Two multisets are equal if and only if every element appears the same number of times in both.

Also notice that words can contain duplicates (like ["word","good","best","word"] in Example 2), so we can't use a simple set. We need a frequency map.
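In Python, for instance, `collections.Counter` gives exactly this multiset comparison (the duplicate-handling below reuses the words from Example 2; this is only an illustration, not the solution itself):

```python
from collections import Counter

words = ["word", "good", "best", "word"]
target = Counter(words)  # {'word': 2, 'good': 1, 'best': 1}

# A set would collapse the duplicate "word", so it couldn't tell
# ["word", "good", "best"] apart from ["word", "good", "best", "word"].
chunks = ["good", "best", "word", "word"]
print(Counter(chunks) == target)  # True: same multiset, different order
print(Counter(["word", "good", "best"]) == target)  # False: missing a "word"
```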

Key Constraints:

  • s.length <= 10^4 → The string is relatively short, so even O(n * m) approaches are feasible.
  • words.length <= 5000 and words[i].length <= 30 → On paper the total window size could be up to 150,000, but any valid window must fit inside s, so effectively m * w <= 10^4. Even so, O(n m w) brute force is on the order of 10^8 character comparisons in the worst case, which is too slow in practice.
  • All words have the same length → This is the crucial constraint that enables word-chunk-based sliding window.

Approach 1: Brute Force

Intuition

The most straightforward approach: for every possible starting position in s, extract a substring of the right total length, split it into word-sized chunks, and check whether those chunks form the same multiset as words.

We build a frequency map of words once upfront. Then for each starting index i, we extract m consecutive chunks of length w from position i, build a frequency map of those chunks, and compare the two maps. If they match, i is a valid starting index.

Algorithm

  1. Compute w (word length), m (number of words), and totalLen = m * w
  2. Build a frequency map target from the words array
  3. For each starting index i from 0 to s.length - totalLen:
    1. Extract m substrings of length w starting at positions i, i + w, i + 2w, ...
    2. Build a frequency map seen from these substrings
    3. If seen equals target, add i to the result
  4. Return the result list

Code
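A minimal Python sketch of the steps above, including the early exit discussed in the next approach (the function name find_substring is chosen here for illustration):

```python
from collections import Counter
from typing import List

def find_substring(s: str, words: List[str]) -> List[int]:
    # Brute force: check every starting index against the target multiset.
    w = len(words[0])            # every word has the same length
    m = len(words)
    total_len = m * w
    if total_len > len(s):
        return []
    target = Counter(words)      # frequency map built once upfront
    result = []
    for i in range(len(s) - total_len + 1):
        seen = Counter()
        for j in range(m):
            chunk = s[i + j * w : i + (j + 1) * w]
            seen[chunk] += 1
            # Early exit: this chunk appears more often than words allows.
            # (Also covers chunks not in words, since target[chunk] == 0.)
            if seen[chunk] > target[chunk]:
                break
        else:
            result.append(i)     # all m chunks matched the target multiset
    return result
```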

The bottleneck is that we rebuild the seen frequency map from scratch at every starting position. When we move from position i to position i + 1, most of the window overlaps, but we throw away all that work and start over. Can we at least exit early when a chunk isn't in the target at all?

Approach 2: Hash Map with Early Exit

Intuition

Before jumping to the full sliding window optimization, there's a useful intermediate step. The brute force already includes an early exit (break as soon as a chunk exceeds its expected count), but we can make the exit even more aggressive.

The idea: for each starting position, instead of blindly extracting all m chunks and then comparing, we check each chunk immediately against the target map. If the chunk isn't in target at all, we can bail out instantly. This doesn't change the worst-case complexity, but it dramatically improves the average case, especially when the string contains many characters that don't appear in any word.

This is essentially the same code as Approach 1 (we already added early exit there), but the conceptual emphasis is on how aggressively we can terminate early before moving to the optimal solution.

Algorithm

  1. Build a frequency map target from words
  2. For each starting index i from 0 to s.length - totalLen:
    1. Create an empty frequency map seen
    2. For each word position j from 0 to m - 1:
      • Extract the chunk at position i + j * w
      • If the chunk is not in target, break immediately (no point continuing)
      • Increment seen[chunk]. If it exceeds target[chunk], break
    3. If we successfully checked all m words without breaking, add i to the result
  3. Return the result

Code

The code is identical to Approach 1 since we already included early exit. The key difference is conceptual: we're emphasizing that the early termination on "word not in target" is the most impactful optimization before we get to sliding window.
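To make the conceptual point concrete, here is a sketch of just the per-position check with the two exits written out separately, so the cheapest one ("chunk not in target at all") fires first (check_window is a helper name chosen here, not part of the original solution):

```python
from collections import Counter

def check_window(s: str, i: int, target: Counter, m: int, w: int) -> bool:
    # Returns True if the m chunks starting at i form the target multiset.
    seen = Counter()
    for j in range(m):
        chunk = s[i + j * w : i + (j + 1) * w]
        if chunk not in target:          # cheapest exit: chunk can never match
            return False
        seen[chunk] += 1
        if seen[chunk] > target[chunk]:  # over-counted a valid word
            return False
    return True
```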

Even with early exit, we still start fresh at every position. Consecutive windows at positions i and i + w share m - 1 of the same word chunks. If we could slide the window by one word at a time, we'd only need to add one chunk and remove one chunk per step. That's exactly what the sliding window approach does.

Approach 3: Sliding Window (Optimal)

Intuition

Here's the key insight that makes this problem elegant: since every word has the same length w, the string s can be viewed as a sequence of word-sized chunks. But the chunking depends on where you start. If you start at index 0, the chunks are s[0..w-1], s[w..2w-1], s[2w..3w-1], etc. If you start at index 1, the chunks shift by one character. There are exactly w distinct ways to chunk the string, corresponding to starting offsets 0, 1, 2, ..., w-1.

For each offset, we have a clean sequence of non-overlapping word-sized chunks. Now we can apply the classic sliding window technique: maintain a window of exactly m consecutive chunks, and slide it one chunk at a time. When we add a new chunk on the right, we update our frequency map. When the window exceeds m chunks, we remove the leftmost chunk. If at any point our frequency map matches the target, we've found a valid concatenation.

The clever part is maintaining a matchCount variable that tracks how many distinct words are fully matched (i.e., seen[word] == target[word]). When matchCount equals the number of distinct words in target, we have a valid window. This avoids comparing the entire maps at every step.

Algorithm

  1. Build a frequency map target from words. Let distinctWords be the number of distinct keys in target
  2. For each starting offset k from 0 to w - 1:
    • Initialize an empty frequency map seen and matchCount = 0
    • Initialize left = k (left boundary of the window)
    • For each position right starting at k, stepping by w, while right + w <= s.length:
      • Extract the word rightWord = s[right..right+w-1]
      • If rightWord is in target:
        • Increment seen[rightWord]
        • If seen[rightWord] now equals target[rightWord], increment matchCount
        • Otherwise, if seen[rightWord] is one more than target[rightWord], decrement matchCount (we went from matched to over-counted)
      • If the window now holds more than m words (i.e., right - left >= totalLen):
        • Extract the word leftWord = s[left..left+w-1]
        • If leftWord is in target:
          • If seen[leftWord] equals target[leftWord], decrement matchCount (about to lose a match)
          • Decrement seen[leftWord]
          • If seen[leftWord] now equals target[leftWord], increment matchCount (removing the excess restored the match)
        • Move left forward by w
      • If matchCount == distinctWords, add left to the result
  3. Return the result

Code
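A self-contained Python sketch of the sliding-window algorithm above (the function name find_substring is chosen here for illustration; on LeetCode the entry point is findSubstring):

```python
from collections import Counter
from typing import List

def find_substring(s: str, words: List[str]) -> List[int]:
    # Sliding window over word-sized chunks, one pass per chunking offset.
    w, m = len(words[0]), len(words)
    total_len = m * w
    if total_len > len(s):
        return []
    target = Counter(words)
    distinct_words = len(target)
    result = []
    for k in range(w):                        # offsets 0, 1, ..., w-1
        seen = Counter()
        match_count = 0
        left = k
        for right in range(k, len(s) - w + 1, w):
            right_word = s[right:right + w]
            if right_word in target:
                seen[right_word] += 1
                if seen[right_word] == target[right_word]:
                    match_count += 1          # word just became fully matched
                elif seen[right_word] == target[right_word] + 1:
                    match_count -= 1          # went from matched to over-counted
            if right - left >= total_len:     # window holds more than m chunks
                left_word = s[left:left + w]
                if left_word in target:
                    if seen[left_word] == target[left_word]:
                        match_count -= 1      # about to lose a full match
                    seen[left_word] -= 1
                    if seen[left_word] == target[left_word]:
                        match_count += 1      # removing the excess restored it
                left += w
            if match_count == distinct_words:
                result.append(left)
    return result
```

Each of the w offsets scans s once, adding and removing one chunk per step, so the total work is O(n * w) substring operations instead of rebuilding a frequency map at every position.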