Last Updated: April 2, 2026
We need to find the minimum number of single-character edits (insertions, deletions, or replacements) to transform one string into another. This metric is also known as the Levenshtein distance, and it shows up everywhere, from spell checkers to DNA sequence alignment.
The tricky part is that at each position, we have three choices, and the best choice at one position depends on what we do at other positions. You can't just greedily pick the locally best edit. For example, replacing a character might look good now but could force extra edits later. This kind of overlapping decision-making is a strong signal for dynamic programming.
The key observation: if the last characters of both strings match, we don't need an edit for them, and the problem reduces to the prefixes without those characters. If they don't match, we try all three operations and take the minimum.
Constraints:

- 0 <= word1.length, word2.length <= 500 --> With both strings up to 500 characters, an O(m n) solution means at most 250,000 operations, which is very comfortable. Even O(m n) space (a 500x500 table) is fine.
- word1 and word2 consist of lowercase English letters --> No special characters or Unicode to worry about. Simple character comparison works.

The most natural way to think about this problem is recursively. Compare the strings from the end. If the last characters match, no edit is needed for those characters, so recurse on the remaining prefixes. If they don't match, try all three operations and take the minimum:
The base cases are simple: if word1 is empty, we need j insertions. If word2 is empty, we need i deletions.
- Define solve(i, j) to return the edit distance between word1[0..i-1] and word2[0..j-1].
- If i == 0, return j. If j == 0, return i.
- If word1[i-1] == word2[j-1], the characters match; return solve(i-1, j-1).
- Otherwise, return 1 + min(solve(i-1, j), solve(i, j-1), solve(i-1, j-1)), corresponding to delete, insert, and replace.

The recursion recomputes the same subproblems many times. There are only (m+1) * (n+1) unique states, so what if we cached each result?
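The steps above can be sketched directly as a recursive function (a minimal illustration; the name `edit_distance_recursive` is ours, and this version runs in exponential time):

```python
def edit_distance_recursive(word1: str, word2: str) -> int:
    """Edit distance via plain recursion (exponential time, for illustration)."""
    def solve(i: int, j: int) -> int:
        # Base cases: one prefix is empty.
        if i == 0:
            return j  # insert the remaining j characters
        if j == 0:
            return i  # delete the remaining i characters
        # Matching last characters need no edit.
        if word1[i - 1] == word2[j - 1]:
            return solve(i - 1, j - 1)
        # Try all three operations and take the cheapest.
        return 1 + min(
            solve(i - 1, j),      # delete word1[i-1]
            solve(i, j - 1),      # insert word2[j-1]
            solve(i - 1, j - 1),  # replace word1[i-1] with word2[j-1]
        )

    return solve(len(word1), len(word2))
```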
The recursive solution has a massive amount of redundant work. The subproblem solve(i, j) only depends on i and j. There are at most (m+1) * (n+1) unique pairs, so if we store each result after computing it, we eliminate all redundant calls.
This is the classic memoization pattern: keep the same recursive structure but add a cache. The first time we compute solve(i, j), we store the result. Every subsequent call with the same arguments just looks up the answer.
- Define solve(i, j) the same way as Approach 1.
- Before computing, check whether memo[i][j] is already set. If so, return it immediately.
- Otherwise, compute the result and store it in memo[i][j] before returning.

The memoization eliminates redundant work, but we still pay the overhead of recursive function calls. What if we computed all subproblems bottom-up in a simple 2D array?
Instead of letting recursion decide which subproblems to solve, we can fill in a 2D table systematically from smaller subproblems to larger ones. Define dp[i][j] as the edit distance between word1[0..i-1] and word2[0..j-1]. The recurrence is the same as our recursive solution, but now we iterate through all (i, j) pairs in order.
The base cases fill the first row and first column: dp[0][j] = j (inserting j characters into an empty string) and dp[i][0] = i (deleting i characters to reach an empty string). Then each cell depends on its left neighbor, top neighbor, and top-left diagonal neighbor, so we fill the table row by row, left to right.
The bottom-up approach works because of optimal substructure: the edit distance for the full strings is built from edit distances of their prefixes. Each cell in the table represents a smaller version of the same problem, and the three neighbors capture the three possible last operations:
- Create a table dp of size (m+1) x (n+1).
- Set dp[i][0] = i for all i, and dp[0][j] = j for all j.
- For each cell dp[i][j] (1-indexed): if word1[i-1] == word2[j-1], set dp[i][j] = dp[i-1][j-1] (no edit needed); otherwise, dp[i][j] = 1 + min(dp[i-1][j], dp[i][j-1], dp[i-1][j-1]).
- Return dp[m][n].

We're storing the entire 2D table, but each row only depends on the current and previous row. Can we reduce the space to a single row?
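The bottom-up steps above translate into a straightforward double loop (a sketch; the name `edit_distance_tabulation` is ours):

```python
def edit_distance_tabulation(word1: str, word2: str) -> int:
    """Edit distance via bottom-up DP: O(m*n) time, O(m*n) space."""
    m, n = len(word1), len(word2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    # Base cases: transforming to/from an empty string.
    for i in range(m + 1):
        dp[i][0] = i  # delete i characters
    for j in range(n + 1):
        dp[0][j] = j  # insert j characters

    # Fill row by row, left to right, so all three neighbors are ready.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if word1[i - 1] == word2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]  # no edit needed
            else:
                dp[i][j] = 1 + min(
                    dp[i - 1][j],      # delete
                    dp[i][j - 1],      # insert
                    dp[i - 1][j - 1],  # replace
                )

    return dp[m][n]
```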
Look at the dependency pattern: dp[i][j] depends on dp[i-1][j-1], dp[i-1][j], and dp[i][j-1]. That means we only need the current row and the previous row. We can use a single 1D array if we save the diagonal value in a temporary variable before it gets overwritten.
As we fill left to right, dp[j-1] is the current row's left neighbor (already updated), and dp[j] is the previous row's value at column j (not yet updated). The only value we'd lose is dp[i-1][j-1] (the diagonal), so we save it in prev before moving on.
- Create an array dp of size (n+1), initialized with dp[j] = j for all j (base case for row 0).
- For each row i from 1 to m: save dp[0] as prev (the diagonal for column 1), then set dp[0] = i (base case for column 0).
- For each column j from 1 to n: save dp[j] as temp (it will become the diagonal for the next column). If word1[i-1] == word2[j-1], set dp[j] = prev; otherwise dp[j] = 1 + min(prev, dp[j], dp[j-1]). Then set prev = temp.
- Return dp[n].
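The rolling-row bookkeeping above can be sketched as follows (the name `edit_distance_1d` is ours), reducing space from O(m*n) to O(n):

```python
def edit_distance_1d(word1: str, word2: str) -> int:
    """Edit distance with a single rolling row: O(m*n) time, O(n) space."""
    m, n = len(word1), len(word2)
    dp = list(range(n + 1))  # row 0: dp[j] = j

    for i in range(1, m + 1):
        prev = dp[0]  # dp[i-1][0], the diagonal for column 1
        dp[0] = i     # base case for column 0
        for j in range(1, n + 1):
            temp = dp[j]  # dp[i-1][j], the diagonal for the next column
            if word1[i - 1] == word2[j - 1]:
                dp[j] = prev  # no edit needed
            else:
                # prev = diagonal (replace), dp[j] = above (delete),
                # dp[j-1] = left (insert)
                dp[j] = 1 + min(prev, dp[j], dp[j - 1])
            prev = temp

    return dp[n]
```

A small refinement worth noting: iterating over the shorter string in the inner loop would shrink the array to O(min(m, n)).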