Imagine you need to analyze 50GB of server logs to find the most frequent error messages. The sequential word counter takes 20 minutes. You have 16 cores available. Can you make it 16x faster?

The answer is: nearly. Unlike sorting, word frequency counting is embarrassingly parallel in its map phase. Each chunk of text can be processed independently without any coordination. The pattern appears everywhere: search engine indexing, log analysis, text analytics, and natural language processing pipelines.

In this chapter, we'll build a multi-threaded word frequency counter from scratch. We'll explore three approaches: shared concurrent maps, thread-local maps with merge, and the fork-join MapReduce pattern. Along the way, we'll tackle the subtle problem of words split across chunk boundaries and learn when each approach shines.

Problem Statement

Premium Content

This content is for premium members only.

Multi-threaded Word Frequency Counter

Problem Statement

Premium Content

Get Premium