Last Updated: March 28, 2026
We are given a 2D binary matrix of integers (0s and 1s) and we need to count every square submatrix that consists entirely of 1s. A 1x1 cell with value 1 counts as a square. A 2x2 block of all 1s counts as one more square. A 3x3 block of all 1s adds yet another, and so on.
The critical thing to notice is that we are counting all squares at every size, not just finding the largest one. If a cell can be the bottom-right corner of a 3x3 all-ones square, it is also the bottom-right corner of a 2x2 and a 1x1 all-ones square. So a cell that supports a square of side length k contributes exactly k squares to the total count. This observation is what makes the DP approach so elegant.
1 <= m, n <= 300 → The matrix has at most 90,000 cells. An O(m n) solution runs in under a millisecond. Even O(m n min(m,n)) is feasible at around 27 million operations. But O(m^2 n^2) at roughly 8 billion is too slow.0 <= arr[i][j] <= 1 → The values are integers 0 and 1 (not characters like in Maximal Square). Binary values with neighbor dependencies scream DP.The most straightforward idea: for every possible top-left corner (i, j) and every possible side length k, check whether all k*k cells inside that square are 1. If they are, count it. This directly implements the definition without any cleverness.
For each cell where matrix[i][j] == 1, we try side lengths 1, 2, 3, ... as long as the square fits within the matrix. For each side length, we scan every cell inside the square. The moment we find a 0, that side length fails, and all larger side lengths from this corner will also fail, so we move on.
count = 0. a. Determine the maximum possible side length: maxK = min(m - i, n - j).
b. For each side length k from 1 to maxK:
count.count.Input:
For each cell with value 1, we try expanding squares. Cell (0,1): 1x1 works (count 1), 2x2 fails at (0,0)=0. Cell (1,1): 1x1 works, 2x2 fails at (0,0)=0. Cell (1,2): 1x1 works, 2x2 works (all 1s in rows 0-1, cols 1-2), 3x3 fails. We repeat for every cell. The final count is 15.
The bottleneck is re-scanning the entire square for every cell and side length, doing massive redundant work. What if we could determine the largest all-ones square ending at each cell using only its immediate neighbors in O(1)?
This is a close sibling of the Maximal Square problem (LeetCode #221), using the exact same DP recurrence. The twist is in what we do with the dp values.
Define dp[i][j] as the side length of the largest all-ones square whose bottom-right corner is at position (i, j). If matrix[i][j] is 0, then dp[i][j] = 0 since no all-ones square can end at a 0 cell.
For a cell where matrix[i][j] == 1, three things must hold for it to be the bottom-right corner of a k x k square: the cell above, the cell to the left, and the cell diagonally above-left must each support at least a (k-1) x (k-1) square. The weakest neighbor limits how far we can extend, giving us the recurrence:
dp[i][j] = min(dp[i-1][j], dp[i][j-1], dp[i-1][j-1]) + 1
Now here is the key insight for counting. If dp[i][j] = 3, that cell is the bottom-right corner of a 3x3, a 2x2, and a 1x1 all-ones square. So it contributes exactly 3 squares. The answer is simply the sum of all dp values.
The dp recurrence captures a structural property of squares. A square of side k ending at (i, j) requires three overlapping squares of side k-1: one ending at the cell above, one at the cell to the left, and one at the diagonal. The minimum of these three determines the bottleneck. If the cell to the left only supports a 2x2 square, you cannot form a 4x4 here, no matter how large the other two neighbors are.
The counting trick follows directly. If dp[i][j] = k, there exist all-ones squares of side 1, 2, ..., k with their bottom-right corner at (i, j). That is exactly k squares. Summing dp values counts every square exactly once because each square is counted at exactly one bottom-right corner.
dp of size m x n, initialized to 0.count = 0.matrix[i][j] == 1: a. If i == 0 or j == 0 (first row or first column), set dp[i][j] = 1.
b. Otherwise, dp[i][j] = min(dp[i-1][j], dp[i][j-1], dp[i-1][j-1]) + 1.
c. Add dp[i][j] to count.
count.The time is already optimal, but the space matches the input size. Since the recurrence only depends on the current and previous row, we can compress the dp table to a single row.
When we compute dp[i][j], we only look at three values: dp[i-1][j] (same column, previous row), dp[i][j-1] (previous column, current row), and dp[i-1][j-1] (diagonal, previous row). So we only need the current row and the previous row at any time. We can go further and use just a single 1D array, processing the matrix row by row.
The trick is managing the diagonal value. As we update dp[j] left to right, dp[j] currently holds the value from the previous row (that is dp[i-1][j]). dp[j-1] was already updated for the current row (that is dp[i][j-1]). But we also need dp[i-1][j-1], which was the old value of dp[j-1] before we overwrote it. We save it in a variable called prev before each update.
dp of size n, initialized to 0.count = 0 and prev = 0. a. Reset prev = 0 at the start of each row.
b. For each column j from 0 to n-1:
temp = dp[j] (this will become prev for the next column).matrix[i][j] == 1: if i == 0 or j == 0, set dp[j] = 1; otherwise dp[j] = min(dp[j], dp[j-1], prev) + 1. Add dp[j] to count.dp[j] = 0.prev = temp.count.