AlgoMaster Logo

Edit Distance

Last Updated: April 2, 2026

hard
5 min read

Understanding the Problem

We need to find the minimum number of single-character edits (insertions, deletions, or replacements) to transform one string into another. This metric is also known as the Levenshtein distance, and it shows up everywhere, from spell checkers to DNA sequence alignment.

The tricky part is that at each position, we have three choices, and the best choice at one position depends on what we do at other positions. You can't just greedily pick the locally best edit. For example, replacing a character might look good now but could force extra edits later. This kind of overlapping decision-making is a strong signal for dynamic programming.

The key observation: if the last characters of both strings match, we don't need an edit for them, and the problem reduces to the prefixes without those characters. If they don't match, we try all three operations and take the minimum.

Key Constraints:

  • 0 <= word1.length, word2.length <= 500 --> With both strings up to 500 characters, an O(m n) solution means at most 250,000 operations, which is very comfortable. Even O(m n) space (a 500x500 table) is fine.
  • word1 and word2 consist of lowercase English letters --> No special characters or Unicode to worry about. Simple character comparison works.

Approach 1: Recursion (Brute Force)

Intuition

The most natural way to think about this problem is recursively. Compare the strings from the end. If the last characters match, no edit is needed for those characters, so recurse on the remaining prefixes. If they don't match, try all three operations and take the minimum:

  • Insert a character at the end of word1 to match word2's last character. This "uses up" the last character of word2, so recurse on word1[0..i] and word2[0..j-1].
  • Delete the last character of word1. Recurse on word1[0..i-1] and word2[0..j].
  • Replace the last character of word1 with word2's last character. Both last characters are now handled, so recurse on word1[0..i-1] and word2[0..j-1].

The base cases are simple: if word1 is empty, we need j insertions. If word2 is empty, we need i deletions.

Algorithm

  1. Define a recursive function solve(i, j) that returns the edit distance between word1[0..i-1] and word2[0..j-1].
  2. Base cases: if i == 0, return j. If j == 0, return i.
  3. If word1[i-1] == word2[j-1], the characters match, return solve(i-1, j-1).
  4. Otherwise, return 1 + min(solve(i-1, j), solve(i, j-1), solve(i-1, j-1)) corresponding to delete, insert, and replace.

Example Walkthrough

1solve(5,3): word1[4]='e' != word2[2]='s', try all 3 ops
0
h
1
o
2
r
3
s
4
e
i=5
1/4

Code

The recursion recomputes the same subproblems many times. There are only (m+1) * (n+1) unique states, so what if we cached each result?

Approach 2: Top-Down DP (Memoization)

Intuition

The recursive solution has a massive amount of redundant work. The subproblem solve(i, j) only depends on i and j. There are at most (m+1) * (n+1) unique pairs, so if we store each result after computing it, we eliminate all redundant calls.

This is the classic memoization pattern: keep the same recursive structure but add a cache. The first time we compute solve(i, j), we store the result. Every subsequent call with the same arguments just looks up the answer.

Algorithm

  1. Create a 2D memo table of size (m+1) x (n+1), initialized to -1.
  2. Define solve(i, j) the same way as Approach 1.
  3. Before computing, check if memo[i][j] is already set. If so, return it immediately.
  4. After computing the result, store it in memo[i][j] before returning.

Example Walkthrough

1solve(5,3): check memo[-1], compute: 'e'!='s', try 3 ops
0
h
1
o
2
r
3
s
4
e
i=5
1/4

Code

The memoization eliminates redundant work, but we still pay the overhead of recursive function calls. What if we computed all subproblems bottom-up in a simple 2D array?

Approach 3: Bottom-Up DP (Tabulation)

Intuition

Instead of letting recursion decide which subproblems to solve, we can fill in a 2D table systematically from smaller subproblems to larger ones. Define dp[i][j] as the edit distance between word1[0..i-1] and word2[0..j-1]. The recurrence is the same as our recursive solution, but now we iterate through all (i, j) pairs in order.

The base cases fill the first row and first column: dp[0][j] = j (inserting j characters into an empty string) and dp[i][0] = i (deleting i characters to reach an empty string). Then each cell depends on its left neighbor, top neighbor, and top-left diagonal neighbor, so we fill the table row by row, left to right.

Algorithm

  1. Create a 2D table dp of size (m+1) x (n+1).
  2. Initialize base cases: dp[i][0] = i for all i, and dp[0][j] = j for all j.
  3. For each cell dp[i][j] (1-indexed):
    • If word1[i-1] == word2[j-1], set dp[i][j] = dp[i-1][j-1] (no edit needed).
    • Otherwise, set dp[i][j] = 1 + min(dp[i-1][j], dp[i][j-1], dp[i-1][j-1]).
  4. Return dp[m][n].

Example Walkthrough

1Base cases: dp[i][0]=i (delete i chars), dp[0][j]=j (insert j chars)
0
1
2
3
0
0
1
2
3
1
1
0
0
0
2
2
0
0
0
3
3
0
0
0
4
4
0
0
0
5
5
0
0
0
1/7

Code

We're storing the entire 2D table, but each row only depends on the current and previous row. Can we reduce the space to a single row?

Approach 4: Space-Optimized DP

Intuition

Look at the dependency pattern: dp[i][j] depends on dp[i-1][j-1], dp[i-1][j], and dp[i][j-1]. That means we only need the current row and the previous row. We can use a single 1D array if we save the diagonal value in a temporary variable before it gets overwritten.

As we fill left to right, dp[j-1] is the current row's left neighbor (already updated), and dp[j] is the previous row's value at column j (not yet updated). The only value we'd lose is dp[i-1][j-1] (the diagonal), so we save it in prev before moving on.

Algorithm

  1. Create a 1D array dp of size (n+1), initialized with dp[j] = j for all j (base case for row 0).
  2. For each row i from 1 to m:
    • Save dp[0] as prev (the diagonal for column 1). Set dp[0] = i (base case for column 0).
    • For each column j from 1 to n:
      • Save the current dp[j] as temp (it will become the diagonal for the next column).
      • If characters match, dp[j] = prev.
      • Otherwise, dp[j] = 1 + min(prev, dp[j], dp[j-1]).
      • Update prev = temp.
  3. Return dp[n].

Example Walkthrough

1Initial dp (base case row 0): dp[j] = j
0
0
1
1
2
2
3
3
1/7

Code