Sale: Get 60% Off on all Pro Plans. Buy Now

Encode and Decode Strings

Medium Simulation

Problem

Given a list of strings, encode them into a single string such that it can be decoded back to the original list of strings exactly.

  • 0 ≤ strs.length ≤ 10⁴
  • 0 ≤ strs[i].length ≤ 10⁴
  • strs[i] may contain any ASCII characters

Example

Input: strs = ["hello", "world"]
Output: "5#hello5#world"

The encoding concatenates each string preceded by its length and a delimiter '#'. For example, "hello" has length 5, so it is encoded as "5#hello". The decoder reads the length prefix, then extracts the substring of that length after the '#'. This process repeats until the entire encoded string is decoded back into the original list ["hello", "world"]. The critical moment is correctly parsing the length prefix and using it to slice the exact substring, ensuring no ambiguity even if strings contain digits or '#'.

Approach

Straightforward Solution

A naive approach might join strings with a delimiter like a comma or space, but this fails if strings contain that delimiter. Another naive approach is to escape delimiters, but escaping can be complex and error-prone.

Core Observation

The fundamental challenge is to encode a list of strings into a single string without losing information or introducing ambiguity, especially since strings can contain any characters including digits and the delimiter itself.

Path to Optimal

The key insight is to prefix each string with its length and a unique delimiter (here '#'). This length prefix acts as a precise marker, allowing the decoder to know exactly how many characters to read for each string, regardless of content. This approach avoids ambiguity and does not require escaping characters inside the strings.

Optimal Approach

Encode by concatenating each string as '<length>#<string>'. Decode by reading the length prefix until the delimiter '#', then extract the substring of that length. Repeat until the entire encoded string is processed. This guarantees lossless and unambiguous encoding and decoding.

Time

O(n)

Encoding and decoding each character exactly once results in linear time complexity proportional to the total length of all strings combined.

Space

O(n)

The encoded string and the decoded list require space proportional to the total input size. Auxiliary space is linear due to building the output string and result list.

Pattern Spotlight

String Manipulation (Length-Prefixed Encoding)

When encoding multiple strings into one string, prefix each string with its length and a delimiter to enable exact slicing during decoding, avoiding ambiguity caused by delimiter characters inside the strings.

Solution

Python
1class Solution:
2 def encode(self, strs: List[str]) -> str:
3 res = ""
4 for s in strs:
5 res += str(len(s)) + "#" + s
6 return res
7
8 def decode(self, s: str) -> List[str]:
9 res = []
10 i = 0
11 while i < len(s):
12 j = i
13 while s[j] != '#':
14 j += 1
15 length = int(s[i:j])
16
17 start_of_str = j + 1
18 end_of_str = start_of_str + length
19 res.append(s[start_of_str:end_of_str])
20
21 i = end_of_str
22 return res

Step-by-Step Solution

1

Build Encoded String by Prefixing Length and Delimiter

3res = ""
4for s in strs:
5 res += str(len(s)) + "#" + s
6return res

Objective

To transform the list of strings into a single encoded string by concatenating each string with its length and a delimiter.

Key Insight

Prefixing each string with its length and a delimiter '#' creates a self-describing encoding. This allows the decoder to parse the length first, then extract the exact substring, regardless of the string's content. This approach avoids the complexity and pitfalls of escaping delimiters inside strings.

Interview Quick-Check

Core Logic

Concatenate each string as '<length>#<string>' to enable exact substring extraction during decoding.

Common Pitfalls & Bugs

Forgetting to convert the length to string or omitting the delimiter '#' leads to ambiguous decoding.

2

Decode Encoded String by Parsing Length Prefix and Extracting Substrings

9res = []
10i = 0
11while i < len(s):
12 j = i
13 while s[j] != '#':
14 j += 1
15 length = int(s[i:j])
17 start_of_str = j + 1
18 end_of_str = start_of_str + length
19 res.append(s[start_of_str:end_of_str])
21 i = end_of_str
22return res

Objective

To reconstruct the original list of strings by reading length prefixes and slicing substrings accordingly.

Key Insight

The decoder scans the encoded string, first reading characters until it encounters the delimiter '#', which marks the end of the length prefix. Converting this prefix to an integer gives the exact length of the next string. Using this length, the decoder extracts the substring immediately following the delimiter. This process repeats until the entire encoded string is consumed, ensuring accurate and lossless reconstruction.

Interview Quick-Check

Core Logic

Parse the length prefix by scanning until '#', then slice the substring of that length to decode each original string.

State & Boundaries

Use a pointer to track the current position in the encoded string, advancing it by the length of the parsed substring plus the length prefix and delimiter.

Common Pitfalls & Bugs

Failing to update the pointer correctly after extracting each substring can cause infinite loops or incorrect decoding.

Line Analysis

This solution has 2 Critical lines interviewers watch for.

Line 5 Critical
res += str(len(s)) + "#" + s

Append the length of the string, a delimiter '#', and the string itself to the result.

This line implements the core encoding strategy: prefixing each string with its length and a delimiter to enable unambiguous decoding.

Line 15 Critical
length = int(s[i:j])

Convert the substring between pointers i and j to an integer length.

Parsing the length prefix as an integer determines how many characters to extract for the next string, enabling precise slicing.

Line 19 Core
res.append(s[start_of_str:end_of_str])

Extract the substring of the specified length and append it to the result list.

This step reconstructs one original string from the encoded data, preserving order and content.

Line 13 Core
while s[j] != '#':

Advance the secondary pointer until the delimiter '#' is found.

Locating the delimiter is essential to extract the length prefix accurately.

Line 21 Core
i = end_of_str

Advance the main pointer to the end of the extracted substring for the next iteration.

Updating the pointer ensures the decoder progresses through the encoded string without overlap or repetition.

Line 11 Standard
while i < len(s):

Start a loop to process the encoded string until fully parsed.

This loop ensures that decoding continues until all encoded substrings have been extracted.

Line 12 Standard
j = i

Set a secondary pointer to find the delimiter '#' marking the end of the length prefix.

This pointer scans forward to locate the delimiter, which separates the length prefix from the string content.

Line 14 Standard
j += 1

Increment the secondary pointer to continue searching for the delimiter.

This step iteratively moves the pointer forward, ensuring the delimiter is found even if the length prefix has multiple digits.

Line 4 Standard
for s in strs:

Iterate over each string in the input list.

Processing each string individually allows encoding them with their length prefixes in sequence.

Line 10 Standard
i = 0

Initialize a pointer to track the current position in the encoded string.

The pointer allows sequential parsing of the encoded string, ensuring all substrings are decoded in order.

Line 17 Standard
start_of_str = j + 1

Calculate the start index of the encoded string content after the delimiter.

This index marks where the actual string begins in the encoded string, immediately after the length prefix and delimiter.

Line 18 Standard
end_of_str = start_of_str + length

Calculate the end index of the string content using the parsed length.

This index defines the boundary for slicing the exact substring representing the original string.

Line 6 Standard
return res

Return the fully encoded string after processing all input strings.

Returning the concatenated encoded string completes the encoding process, providing the input for decoding.

Line 22 Standard
return res

Return the list of decoded strings after fully processing the encoded input.

Returning the reconstructed list completes the decoding process, providing the original input list.

Line 3 Basic
res = ""

Initialize an empty string to accumulate the encoded result.

This string will hold the concatenation of all encoded strings, forming the final encoded output.

Line 9 Basic
res = []

Initialize an empty list to store decoded strings.

This list accumulates the decoded strings as they are extracted from the encoded input.

Test Your Understanding

Why is prefixing each string with its length and a delimiter necessary instead of just joining with a delimiter?

Because strings can contain any characters including the delimiter, joining alone can cause ambiguity. The length prefix precisely indicates how many characters to read for each string, ensuring unambiguous and lossless decoding.

Related Problems

Simulation pattern

Don't just read it. Drill it.

Reconstruct Encode and Decode Strings from memory until it sticks. AlgoDrill blanks out key lines and makes you fill them back in, step by step.

Start drilling Encode and Decode Strings for free