AlgoMaster Logo

Set Mismatch

easyFrequency9 min readUpdated June 23, 2026

Understanding the Problem

We have an array that should contain every integer from 1 to n exactly once, but something went wrong. One number appears twice, and because of that, another number is missing entirely. Our job is to find both: the duplicate and the missing number.

The constraint 1 <= nums[i] <= nums.length carries most of the weight. Every value sits in the range [1, n], so each value can double as an index into the array. That mapping is what makes the sub-linear-space solutions possible.

Counting occurrences of every number solves the problem directly. Beyond that, the tight value-to-index relationship opens up sorting, sum and sum-of-squares math, and in-place index marking.

Key Constraints:

  • 2 <= nums.length <= 10^4: With n up to 10,000, an O(n) or O(n log n) solution runs comfortably, so there is no need to settle for the O(n) extra space of a hash map.
  • 1 <= nums[i] <= nums.length: Every value maps to a valid index in [1, n], which is what enables the index-marking and mathematical approaches below.

Approach 1: Sorting

Intuition

Sorting puts every value next to the value one larger than it. After sorting, the duplicate appears as two equal adjacent elements, and the missing number appears as a gap where two adjacent elements differ by more than one. A single scan over the sorted array finds both.

Algorithm

  1. Sort the array in ascending order.
  2. Check if the first element is 1. If not, the missing number is 1.
  3. Iterate through the sorted array from index 1 to n-1.
  4. If nums[i] == nums[i-1], we found the duplicate.
  5. If nums[i] > nums[i-1] + 1, the missing number is nums[i-1] + 1.
  6. After the loop, if missing is still not found, the missing number must be n.
  7. Return [duplicate, missing].

Example Walkthrough

Input:

0
1
1
2
2
2
3
4
nums

The array is already sorted, so [1, 2, 2, 4]. Since nums[0] = 1, the missing number is not 1, and missing stays at -1 for now. We scan adjacent pairs from index 1:

  • i=1: nums[1]=2, nums[0]=1. Not equal and the gap is exactly 1, so no duplicate and no missing here.
  • i=2: nums[2]=2, nums[1]=2. Equal, so duplicate = 2.
  • i=3: nums[3]=4, nums[2]=2. Since 4 > 2 + 1, there is a gap, so missing = 2 + 1 = 3.

After the loop, missing is no longer -1, so the final-fallback check (missing equals n) does not fire. We return [2, 3].

Result:

0
2
1
3
result

Code

Sorting does more work than the problem requires, since full order is unnecessary when we only need per-value counts. The next approach drops to linear time by counting occurrences directly.

Approach 2: Hash Map Counting

Intuition

In a correct array, every number from 1 to n appears exactly once. The error changes exactly two of those counts: the duplicate has count 2, and the missing number has count 0. Building a frequency map in one pass, then scanning the numbers 1 through n, recovers both. The number whose count is 2 is the duplicate; the number whose count is 0 is the missing one.

Algorithm

  1. Create a hash map and count occurrences of each number in the array.
  2. Iterate through numbers 1 to n.
  3. If a number has count 2, it's the duplicate.
  4. If a number has count 0 (not in the map), it's the missing number.
  5. Return [duplicate, missing].

Example Walkthrough

nums
1Start: build frequency map from nums
0
1
i
1
2
2
2
3
4
countMap
1Empty map
1/6

Code

The hash map uses O(n) extra space. The next approach removes that by reducing the problem to two equations, using no extra storage beyond a few accumulators.

Approach 3: Sum and Sum of Squares

Intuition

Let dup be the duplicate and miss the missing number. Compare the given array against a correct array containing 1 through n exactly once. Two aggregates capture the difference. The sum of the array exceeds the expected sum 1 + 2 + ... + n by exactly dup - miss, because the duplicate added one extra copy of dup and the missing number removed one copy of miss. The sum of squares exceeds the expected sum of squares by dup^2 - miss^2.

Those are two equations in two unknowns:

Since dup^2 - miss^2 = (dup - miss)(dup + miss), dividing the second equation by d gives dup + miss. With both dup - miss and dup + miss known, dup = ((dup-miss) + (dup+miss)) / 2 and miss = dup - d.

This runs in O(n) time and O(1) extra space, and unlike the next approach it never modifies the input array.

Algorithm

  1. Compute sumExpected = n(n+1)/2 and sqExpected = n(n+1)(2n+1)/6.
  2. In one pass, compute sumActual (sum of all elements) and sqActual (sum of squares).
  3. Let d = sumActual - sumExpected, which equals dup - miss.
  4. Let s = (sqActual - sqExpected) / d, which equals dup + miss.
  5. Then dup = (d + s) / 2 and miss = dup - d.
  6. Return [dup, miss].

Example Walkthrough

1n=4. Expected sum=10, expected sq=30.
0
1
i=0
1
2
2
2
3
4
1/8

The sum of squares grows quickly: for n up to 10,000 it reaches about 3.3 x 10^11, well past the range of a 32-bit integer. The accumulators must use 64-bit integers (long in Java and C#, long long in C++, int64/i64 in Go and Rust). JavaScript and TypeScript numbers are double-precision floats, which represent integers exactly up to 2^53, so no overflow occurs there.

Code

The math approach is O(1) space but relies on 64-bit arithmetic and a division that assumes d is nonzero. The next approach reaches the same O(1) space using only the sign bit of the existing array entries, with no large intermediate values.

Approach 4: Negative Marking (Optimal)

Intuition

Every value in the array is between 1 and n, so each value can index into the array itself. Visiting a number negates the value at the index it points to, marking that index as seen. Reaching an index that is already negative means the number pointing there has been seen before, which identifies the duplicate. After the marking pass, the one index that still holds a positive value corresponds to the missing number.

This uses the input array as a boolean visited array, with the sign bit as the flag. It runs in O(n) time and O(1) extra space.

Algorithm

  1. Iterate through the array. For each element, compute the target index as abs(nums[i]) - 1.
  2. Check the value at nums[targetIndex]. If it's already negative, then abs(nums[i]) is the duplicate.
  3. Otherwise, negate nums[targetIndex] to mark it as visited.
  4. After the first pass, iterate through the array again. The index where nums[index] > 0 corresponds to the missing number (which is index + 1).
  5. Return [duplicate, missing].

Example Walkthrough

1Initial: nums = [1, 2, 2, 4]
0
1
i=0
1
2
2
2
3
4
1/7

Code