Encode and Decode Strings
Problem
Given a list of strings, encode them into a single string such that it can be decoded back to the original list of strings exactly.
- 0 ≤ strs.length ≤ 10⁴
- 0 ≤ strs[i].length ≤ 10⁴
- strs[i] may contain any ASCII characters
Example
strs = ["hello", "world"]"5#hello5#world"The encoding concatenates each string preceded by its length and a delimiter '#'. For example, "hello" has length 5, so it is encoded as "5#hello". The decoder reads the length prefix, then extracts the substring of that length after the '#'. This process repeats until the entire encoded string is decoded back into the original list ["hello", "world"]. The critical moment is correctly parsing the length prefix and using it to slice the exact substring, ensuring no ambiguity even if strings contain digits or '#'.
Approach
Straightforward Solution
A naive approach might join strings with a delimiter like a comma or space, but this fails if strings contain that delimiter. Another naive approach is to escape delimiters, but escaping can be complex and error-prone.
Core Observation
The fundamental challenge is to encode a list of strings into a single string without losing information or introducing ambiguity, especially since strings can contain any characters including digits and the delimiter itself.
Path to Optimal
The key insight is to prefix each string with its length and a unique delimiter (here '#'). This length prefix acts as a precise marker, allowing the decoder to know exactly how many characters to read for each string, regardless of content. This approach avoids ambiguity and does not require escaping characters inside the strings.
Optimal Approach
Encode by concatenating each string as '<length>#<string>'. Decode by reading the length prefix until the delimiter '#', then extract the substring of that length. Repeat until the entire encoded string is processed. This guarantees lossless and unambiguous encoding and decoding.
Time
O(n)
Encoding and decoding each character exactly once results in linear time complexity proportional to the total length of all strings combined.
Space
O(n)
The encoded string and the decoded list require space proportional to the total input size. Auxiliary space is linear due to building the output string and result list.
Pattern Spotlight
String Manipulation (Length-Prefixed Encoding)
When encoding multiple strings into one string, prefix each string with its length and a delimiter to enable exact slicing during decoding, avoiding ambiguity caused by delimiter characters inside the strings.
Solution
| 1 | class Solution:
|
| 2 | def encode(self, strs: List[str]) -> str:
|
| 3 | res = ""
|
| 4 | for s in strs:
|
| 5 | res += str(len(s)) + "#" + s
|
| 6 | return res
|
| 7 |
|
| 8 | def decode(self, s: str) -> List[str]:
|
| 9 | res = []
|
| 10 | i = 0
|
| 11 | while i < len(s):
|
| 12 | j = i
|
| 13 | while s[j] != '#':
|
| 14 | j += 1
|
| 15 | length = int(s[i:j])
|
| 16 |
|
| 17 | start_of_str = j + 1
|
| 18 | end_of_str = start_of_str + length
|
| 19 | res.append(s[start_of_str:end_of_str])
|
| 20 |
|
| 21 | i = end_of_str
|
| 22 | return res |
Step-by-Step Solution
Build Encoded String by Prefixing Length and Delimiter
| 3 | res = ""
|
| 4 | for s in strs:
|
| 5 | res += str(len(s)) + "#" + s
|
| 6 | return res
|
Objective
To transform the list of strings into a single encoded string by concatenating each string with its length and a delimiter.
Key Insight
Prefixing each string with its length and a delimiter '#' creates a self-describing encoding. This allows the decoder to parse the length first, then extract the exact substring, regardless of the string's content. This approach avoids the complexity and pitfalls of escaping delimiters inside strings.
Interview Quick-Check
Core Logic
Concatenate each string as '<length>#<string>' to enable exact substring extraction during decoding.
Common Pitfalls & Bugs
Forgetting to convert the length to string or omitting the delimiter '#' leads to ambiguous decoding.
Decode Encoded String by Parsing Length Prefix and Extracting Substrings
| 9 | res = []
|
| 10 | i = 0
|
| 11 | while i < len(s):
|
| 12 | j = i
|
| 13 | while s[j] != '#':
|
| 14 | j += 1
|
| 15 | length = int(s[i:j])
|
| 17 | start_of_str = j + 1
|
| 18 | end_of_str = start_of_str + length
|
| 19 | res.append(s[start_of_str:end_of_str])
|
| 21 | i = end_of_str
|
| 22 | return res |
Objective
To reconstruct the original list of strings by reading length prefixes and slicing substrings accordingly.
Key Insight
The decoder scans the encoded string, first reading characters until it encounters the delimiter '#', which marks the end of the length prefix. Converting this prefix to an integer gives the exact length of the next string. Using this length, the decoder extracts the substring immediately following the delimiter. This process repeats until the entire encoded string is consumed, ensuring accurate and lossless reconstruction.
Interview Quick-Check
Core Logic
Parse the length prefix by scanning until '#', then slice the substring of that length to decode each original string.
State & Boundaries
Use a pointer to track the current position in the encoded string, advancing it by the length of the parsed substring plus the length prefix and delimiter.
Common Pitfalls & Bugs
Failing to update the pointer correctly after extracting each substring can cause infinite loops or incorrect decoding.
Line Analysis
This solution has 2 Critical lines interviewers watch for.
res += str(len(s)) + "#" + s
Append the length of the string, a delimiter '#', and the string itself to the result.
This line implements the core encoding strategy: prefixing each string with its length and a delimiter to enable unambiguous decoding.
length = int(s[i:j])
Convert the substring between pointers i and j to an integer length.
Parsing the length prefix as an integer determines how many characters to extract for the next string, enabling precise slicing.
res.append(s[start_of_str:end_of_str])
Extract the substring of the specified length and append it to the result list.
This step reconstructs one original string from the encoded data, preserving order and content.
while s[j] != '#':
Advance the secondary pointer until the delimiter '#' is found.
Locating the delimiter is essential to extract the length prefix accurately.
i = end_of_str
Advance the main pointer to the end of the extracted substring for the next iteration.
Updating the pointer ensures the decoder progresses through the encoded string without overlap or repetition.
while i < len(s):
Start a loop to process the encoded string until fully parsed.
This loop ensures that decoding continues until all encoded substrings have been extracted.
j = i
Set a secondary pointer to find the delimiter '#' marking the end of the length prefix.
This pointer scans forward to locate the delimiter, which separates the length prefix from the string content.
j += 1
Increment the secondary pointer to continue searching for the delimiter.
This step iteratively moves the pointer forward, ensuring the delimiter is found even if the length prefix has multiple digits.
for s in strs:
Iterate over each string in the input list.
Processing each string individually allows encoding them with their length prefixes in sequence.
i = 0
Initialize a pointer to track the current position in the encoded string.
The pointer allows sequential parsing of the encoded string, ensuring all substrings are decoded in order.
start_of_str = j + 1
Calculate the start index of the encoded string content after the delimiter.
This index marks where the actual string begins in the encoded string, immediately after the length prefix and delimiter.
end_of_str = start_of_str + length
Calculate the end index of the string content using the parsed length.
This index defines the boundary for slicing the exact substring representing the original string.
return res
Return the fully encoded string after processing all input strings.
Returning the concatenated encoded string completes the encoding process, providing the input for decoding.
return res
Return the list of decoded strings after fully processing the encoded input.
Returning the reconstructed list completes the decoding process, providing the original input list.
res = ""
Initialize an empty string to accumulate the encoded result.
This string will hold the concatenation of all encoded strings, forming the final encoded output.
res = []
Initialize an empty list to store decoded strings.
This list accumulates the decoded strings as they are extracted from the encoded input.
Test Your Understanding
Why is prefixing each string with its length and a delimiter necessary instead of just joining with a delimiter?
Because strings can contain any characters including the delimiter, joining alone can cause ambiguity. The length prefix precisely indicates how many characters to read for each string, ensuring unambiguous and lossless decoding.
Related Problems
Simulation pattern
Don't just read it. Drill it.
Reconstruct Encode and Decode Strings from memory until it sticks. AlgoDrill blanks out key lines and makes you fill them back in, step by step.
Start drilling Encode and Decode Strings for free