Distinct Subsequences
Problem
Given two strings s and t, return the number of distinct subsequences of s which equals t.
- 0 ≤ s.length, t.length ≤ 1000
- s and t consist of English letters.
Example
s = "rabbbit", t = "rabbit"3The string "rabbbit" has 3 distinct subsequences that equal "rabbit". The algorithm explores the string s starting from index 0 and t from index 0. At each step, if characters match, it considers two paths: matching these characters and moving both indices forward, or skipping the current character in s and moving only s's index forward. If characters don't match, it skips the current character in s. Memoization caches results for each (i, j) pair to avoid redundant computations. The critical insight is that the problem breaks down into counting subsequences by exploring these two choices recursively, and memoization ensures polynomial time complexity.
Approach
Straightforward Solution
A brute-force recursive approach tries all subsequences of s and checks if they equal t, resulting in exponential time due to repeated exploration of the same states.
Core Observation
The count of subsequences matching t from position j in s starting at position i depends on whether s[i] matches t[j]. If they match, the total count is the sum of counts from advancing both indices and counts from skipping s[i]. If they don't match, only skipping s[i] is valid. This recursive structure exhibits overlapping subproblems and optimal substructure.
Path to Optimal
PreviewRecognizing the overlapping subproblems, memoization is introduced to cache results for each (i, j) state, preventing redundant computations. This transforms the exponential recursion into a polynomial time dynamic programming solution…
Full step-by-step walkthrough on Pro →
Optimal Approach
PreviewImplement a recursive function dp(i, j) that returns the number of subsequences of s[i:] matching t[j:]. Use memoization to cache results…
Full step-by-step walkthrough on Pro →
Want the full reasoning chain?
Unlock the complete walkthrough, line-by-line analysis, and recall drill.
Unlock ProTime
O(n * m)
Each state (i, j) is computed once and cached. There are at most n * m states, where n = len(s) and m = len(t). Each state computation takes O(1) time.
Space
O(n * m)
The memoization dictionary stores results for each pair of indices (i, j), requiring O(n * m) auxiliary space. The recursion stack depth is at most O(n + m), which is bounded by O(n * m).
Pattern Spotlight
Dynamic Programming (Top-Down Memoization for Counting Subsequences)
When counting subsequences or combinations with overlapping subproblems, model the problem as a state machine over indices and cache results to avoid exponential recomputation, transforming recursion into efficient DP.
Solution
| 1 | class Solution:
|
| 2 | def numDistinct(self, s: str, t: str) -> int:
|
| 3 | memo = {}
|
| 4 |
|
| 5 | def dp(i, j):
|
| 6 | if j == len(t):
|
| 7 | return 1
|
| 8 | if i == len(s):
|
| 9 | return 0
|
| 10 | if (i, j) in memo:
|
| 11 | return memo[(i, j)]
|
| 12 |
|
| 13 | if s[i] == t[j]:
|
| 14 | memo[(i, j)] = dp(i + 1, j + 1) + dp(i + 1, j)
|
| 15 | else:
|
| 16 | memo[(i, j)] = dp(i + 1, j)
|
| 17 |
|
| 18 | return memo[(i, j)]
|
| 19 |
|
| 20 | return dp(0, 0) |
Step-by-Step Solution
Cache Recursive Results to Avoid Redundant Computations
| 3 | memo = {}
|
| 10 | if (i, j) in memo:
|
| 11 | return memo[(i, j)]
|
Objective
To store the results of subproblems defined by indices (i, j) to prevent exponential recomputation.
Key Insight
Memoization transforms the naive recursive solution into an efficient dynamic programming approach by caching the number of subsequences for each pair of indices. This avoids recomputing the same subproblems multiple times, which is the main cause of exponential time complexity in naive recursion.
Interview Quick-Check
Core Logic
Memoization caches results keyed by (i, j), ensuring each subproblem is solved once.
Common Pitfalls & Bugs
Forgetting to cache results leads to exponential time due to repeated exploration of identical states.
Return 1 When Target Fully Matched and 0 When Source Exhausted
To define base cases that terminate recursion when the target string is fully matched or the source string is exhausted without a match.
Sum Counts of Matching and Skipping Characters to Count Subsequences
To compute the number of subsequences by considering both matching the current characters and skipping the current character in s.
Return Cached Result to Complete Recursive Computation
To return the computed or cached number of subsequences for the current indices.
Initiate Recursive Counting from Start of Both Strings
To start the recursive counting process from the beginning of s and t.
4 more steps with full analysis available on Pro.
Line Analysis
This solution has 8 Critical lines interviewers watch for.
if j == len(t):
Check if the target string has been fully matched.
When j equals the length of t, it means a valid subsequence has been found, so return 1 to count this successful path.
if i == len(s):
Check if the source string has been exhausted before matching the target.
If i equals the length of s but j is not at the end of t, no further subsequences can match, so return 0 to indicate failure.
memo[(i, j)] = dp(i + 1, j + 1) + dp(i + 1, j)
Calculate and cache the sum of counts from matching and skipping the current character.
This line embodies the core DP recurrence: when characters match, the total count is the sum of subsequences that include this match and those that skip it, ensuring all distinct subsequences are counted.
Full line-by-line criticality + rationale for all 13 lines available on Pro.
Test Your Understanding
Why does the algorithm add the results of dp(i + 1, j + 1) and dp(i + 1, j) when s[i] == t[j]?
See the answer with Pro.
Related Problems
Dynamic Programming pattern
Don't just read it. Drill it.
Reconstruct Distinct Subsequences from memory until it sticks. AlgoDrill blanks out key lines and makes you fill them back in, step by step.
Unlock the Distinct Subsequences drill