Substring with Concatenation of All Words
Problem
Given a string s and an array of strings words, return all starting indices of substring(s) in s that is a concatenation of each word in words exactly once and without any intervening characters.
- 1 ≤ s.length ≤ 10⁴
- 1 ≤ words.length ≤ 5000
- 1 ≤ words[i].length ≤ 30
- s and words[i] consist of lowercase English letters
Example
s = "barfoothefoobarman", words = ["foo","bar"][0,9]The substrings starting at indices 0 and 9 are "barfoo" and "foobar" respectively, both concatenations of the words "foo" and "bar" without any extra characters. A brute-force approach would check every substring of length equal to total concatenation length, verifying if it contains all words exactly once, which is inefficient. The sliding window approach partitions the string into word-length segments and slides over these segments, maintaining counts of seen words and adjusting the window to ensure no word is overused. This method efficiently finds all valid starting indices in O(n) time relative to the string length.
Approach
Straightforward Solution
A brute-force solution would check every substring of length total_len and verify if it contains all words exactly once by counting frequencies. This approach is O(n * m * k) where n is string length, m is number of words, and k is word length, which is too slow for large inputs.
Core Observation
The problem reduces to finding all substrings of length equal to the total length of all words concatenated, where the substring is composed of the words in any order, each exactly once. Since all words have the same length, the string can be segmented into fixed-size chunks, enabling a sliding window approach that moves in increments of word length.
Path to Optimal
PreviewThe key insight is to slide over the string in word_len increments, maintaining a window of words and their counts…
Full step-by-step walkthrough on Pro →
Optimal Approach
PreviewIterate over possible offsets from 0 to word_len - 1 to cover all possible alignments. For each offset, use two pointers (left and right) to define a sliding window of words…
Full step-by-step walkthrough on Pro →
Want the full reasoning chain?
Unlock the complete walkthrough, line-by-line analysis, and recall drill.
Unlock ProTime
O(n * word_len)
The algorithm iterates over word_len offsets, and for each offset, it scans the string in increments of word_len, processing each word once. Each word is added and removed from the window at most once, resulting in O(n) operations per offset and O(n * word_len) overall.
Space
O(m)
The auxiliary space is dominated by the hash maps storing the target word counts and the current window counts, both proportional to the number of words m.
Pattern Spotlight
Sliding Window (Fixed-Size Chunk Partitioning)
When searching for concatenations of fixed-length words in a string, partition the string into word-length segments and slide a window over these segments, adjusting boundaries to maintain valid word counts and efficiently find all matches.
Solution
| 1 | class Solution: |
| 2 | def findSubstring(self, s: str, words: List[str]) -> List[int]: |
| 3 | if not s or not words: |
| 4 | return [] |
| 5 | |
| 6 | word_len = len(words[0]) |
| 7 | num_words = len(words) |
| 8 | total_len = word_len * num_words |
| 9 | |
| 10 | target = {} |
| 11 | for w in words: |
| 12 | target[w] = target.get(w, 0) + 1 |
| 13 | |
| 14 | result = [] |
| 15 | |
| 16 | for offset in range(word_len): |
| 17 | left = offset |
| 18 | window = {} |
| 19 | words_used = 0 |
| 20 | |
| 21 | for right in range(offset, len(s) - word_len + 1, word_len): |
| 22 | word = s[right:right + word_len] |
| 23 | |
| 24 | if word in target: |
| 25 | window[word] = window.get(word, 0) + 1 |
| 26 | words_used += 1 |
| 27 | |
| 28 | while window[word] > target[word]: |
| 29 | left_word = s[left:left + word_len] |
| 30 | window[left_word] -= 1 |
| 31 | left += word_len |
| 32 | words_used -= 1 |
| 33 | |
| 34 | if words_used == num_words: |
| 35 | result.append(left) |
| 36 | |
| 37 | else: |
| 38 | window.clear() |
| 39 | words_used = 0 |
| 40 | left = right + word_len |
| 41 | |
| 42 | return result |
Step-by-Step Solution
Validate Inputs and Derive Fixed Word Parameters
| 3 | if not s or not words: |
| 4 | return [] |
| 6 | word_len = len(words[0]) |
| 7 | num_words = len(words) |
| 8 | total_len = word_len * num_words |
| 14 | result = [] |
Objective
Ensure the inputs are valid and compute the structural parameters needed for the sliding window algorithm.
Key Insight
Because all words share the same length, the string can be processed in fixed-size word segments. Determining word length and word count early enables the sliding window to move in consistent increments.
Interview Quick-Check
Core Logic
Early validation prevents unnecessary computation and avoids accessing words[0] when the list is empty.
State & Boundaries
The fixed word length is the structural property that makes the sliding window approach possible.
Build Frequency Map of Target Words
To create a hash map that counts how many times each word appears in the input list, serving as the target frequency for valid substrings.
Slide Window Over String in Word-Length Increments for Each Offset
To iterate over the string starting at each possible offset and use a sliding window to find valid concatenations of words.
Return All Valid Starting Indices
To output the list of all starting indices where valid concatenations of all words occur.
3 more steps with full analysis available on Pro.
Line Analysis
This solution has 7 Critical lines interviewers watch for.
while window[word] > target[word]:
While the current word's count exceeds its target frequency, shrink the window from the left.
This loop is the core sliding window adjustment that ensures the window remains valid by removing words from the left until the frequency constraints are satisfied, enabling efficient detection of valid concatenations.
if not s or not words:
Check if input string or words list is empty.
An empty string or empty words list means no valid concatenations can exist, so returning an empty list immediately avoids unnecessary computation.
for offset in range(word_len):
Iterate over all possible offsets from 0 to word_len - 1.
Considering all offsets ensures that all possible alignments of word-length segments are checked, preventing missed matches due to misalignment.
Full line-by-line criticality + rationale for all 30 lines available on Pro.
Test Your Understanding
Why does the algorithm iterate over multiple offsets from 0 to word_len - 1 instead of just once from the start?
See the answer with Pro.
Related Problems
Sliding Window pattern
Don't just read it. Drill it.
Reconstruct Substring with Concatenation of All Words from memory until it sticks. AlgoDrill blanks out key lines and makes you fill them back in, step by step.
Unlock the Substring with Concatenation of All Words drill