We have a binary tree and we need to compute the average of all node values at each depth level. Level 0 is the root, level 1 is the root's children, level 2 is the grandchildren, and so on.
The phrase "each level" means we are not computing a single average over the whole tree. We need to group nodes by their depth, compute a separate average for each group, and return those averages in order from top to bottom.
Any traversal that can group nodes by depth solves the problem. BFS visits the levels one at a time, so the grouping is direct. DFS can do the same job by recording each node's depth as it descends. Both approaches follow.
1 <= number of nodes <= 10^4 -> Small enough that any O(n) traversal runs well within limits. Time is not the constraint that matters here.-2^31 <= Node.val <= 2^31 - 1 -> A single level can hold thousands of values near the 32-bit limits, so a 32-bit running sum can overflow. Use a 64-bit accumulator. The worst-case sum is about 10^4 x 2^31, roughly 2.1 x 10^13, which a 64-bit integer holds easily and a double-precision float represents exactly (it is below 2^53).BFS processes the tree level by level, which matches the shape of the output. A queue holds the frontier. At the start of each round, the queue contains exactly the nodes of one level, so its size tells us how many nodes to dequeue. We dequeue that many, sum their values, record sum / count as the level's average, and enqueue their children.
Capturing the queue size before dequeuing is what separates the levels. Every node enqueued during a round is a child of a current-level node, so it belongs to the next level, and the loop bound guarantees none of those children are processed until the following round. No markers or sentinel values are needed.
A 64-bit accumulator is the simplest answer to overflow, but a running mean avoids large intermediates entirely: for the k-th value v dequeued at a level, update avg += (v - avg) / k. The intermediate values stay within the range of the node values, at the cost of slightly different floating-point rounding along the way.
BFS visits each node once, so its time is optimal. Its memory cost is the queue, which peaks at the widest level. The same per-level sums and counts can be collected during a depth-first traversal, where the memory cost is the recursion stack instead of a queue.
DFS does not process nodes level by level, but every recursive call knows its depth. Passing the current depth as a parameter lets us accumulate the sum and count for each level in two parallel lists indexed by depth. After the traversal finishes, sums[i] / counts[i] is the average for level i.
BFS groups nodes by level through the queue. DFS reaches the same grouping by using depth as an index into the tracking lists. Only the visit order differs.
The memory trade-off depends on the tree's shape. For a balanced tree, the recursion stack holds O(log n) frames while the BFS queue peaks at O(n) nodes on the widest level, so DFS uses far less memory. For a skewed tree the comparison flips: the stack grows to O(n) while the BFS queue never holds more than a node or two.
The result does not depend on visit order. Each node contributes its value once to sums[depth] and once to counts[depth], and addition is commutative, so any traversal that visits every node exactly once produces the same two lists. Whether node 9 is visited before or after node 20 is irrelevant; both contribute to level 1.
The depth == sums.size() check detects the first visit to each depth. A node at depth d is reached only through its parent at depth d - 1, which was visited first, so depth never exceeds sums.size() at the moment a node is visited. The lists grow one level at a time, with no gaps.
sums to store the running sum at each level, and counts to store the number of nodes at each level.sums, this is the first time we have reached this depth. Append the node's value to sums and 1 to counts.sums[depth] and increment counts[depth].sums[i] by counts[i] for each level.The recursion can also be replaced with an explicit stack of (node, depth) pairs. The bookkeeping is identical, and it removes the risk of stack overflow on a deep, skewed tree.