Goroutines and the Go Scheduler

Last Updated: February 1, 2026

A goroutine is Go's unit of concurrent execution. It's lighter than an OS thread, cheaper to create, and managed entirely by the Go runtime rather than the operating system.

The go keyword spawns a new goroutine that runs concurrently with the calling code. The function executes in the background while main() continues.
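
For example, a minimal sketch (the time.Sleep is only there to keep main alive long enough for the goroutine to run; real code would synchronize with sync.WaitGroup or a channel):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Spawn a goroutine; it runs concurrently with main.
	go fmt.Println("hello from a goroutine")

	fmt.Println("main continues immediately")

	// Crude wait for the example only: give the goroutine a chance to
	// run before main returns and the process exits.
	time.Sleep(10 * time.Millisecond)
}
```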

Goroutines vs Threads

| Aspect | Goroutine | OS Thread |
|---|---|---|
| Initial stack size | ~2KB | ~1-8MB |
| Creation time | ~0.3 microseconds | ~10+ microseconds |
| Context switch | ~100-200ns (user space) | ~1-10 microseconds (kernel) |
| Maximum count | Millions | Thousands |
| Managed by | Go runtime | Operating system |
| Stack | Growable | Fixed |

The small initial stack is a key advantage. An OS thread typically reserves 1-8MB of stack space upfront (even if unused), limiting you to thousands of threads. Goroutines start with ~2KB stacks that grow and shrink as needed, allowing millions of concurrent goroutines on a single machine.

The Growable Stack

Goroutine stacks start small and grow dynamically.

When a goroutine needs more stack space:

  1. The runtime allocates a new, larger stack (typically 2x the current size)
  2. Copies the old stack contents to the new stack
  3. Updates all pointers within the stack
  4. Continues execution

This is transparent to your code. Stacks can also shrink during garbage collection if they're using much less than allocated.
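
A small demonstration, assuming nothing beyond the standard runtime: deep recursion with large frames would overflow a fixed 2KB stack, but runs fine because the stack grows on demand:

```go
package main

import "fmt"

// deep allocates ~2KB of locals per frame, so even a few frames outgrow
// the initial stack and force the runtime to grow it.
func deep(n int) int {
	var pad [256]int // 256 * 8 bytes = 2KB per frame
	pad[0] = n
	if n == 0 {
		return 0
	}
	return deep(n-1) + pad[0]
}

func main() {
	// ~20MB of stack across 10,000 frames, grown (and later shrunk)
	// transparently by the runtime.
	fmt.Println(deep(10000)) // prints 50005000
}
```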

The GMP Model

Go's scheduler uses the GMP model, named after its three core components:

  • G (Goroutine): The unit of work, representing a function to execute
  • M (Machine): An OS thread that executes goroutines
  • P (Processor): A logical processor that mediates between G and M

G (Goroutine)

Each goroutine (G) contains:

  • Stack pointer and program counter
  • Stack bounds (for growth detection)
  • Status (runnable, running, waiting, dead)
  • Reference to the current M (if running)
  • Scheduling-related fields (preemption flag, etc.)

M (Machine)

An M is an OS thread. It executes goroutines and interacts with the operating system:

  • Can be parked (sleeping) when no work is available
  • Can be created as needed (only GOMAXPROCS of them execute Go code at once; more can exist blocked in syscalls)
  • System calls cause M to detach from P temporarily
  • Each M has a "g0" goroutine for scheduling work

P (Processor)

A P is a logical processor context. It acts as a token for executing Go code:

  • There are exactly GOMAXPROCS P's
  • Each P has a local run queue of goroutines
  • An M must acquire a P to execute Go code
  • P's enable work stealing between threads

Why P Exists

You might wonder why we need P when we have M. The answer becomes clear when you consider what would happen without P. Let's walk through three scenarios.

Scenario 1: Without P, syscalls would waste CPUs

Imagine a goroutine making a blocking syscall (reading a file). Without the P abstraction, the OS thread blocks inside the kernel and takes its scheduling context with it: the goroutines queued behind it cannot run, and a CPU core sits idle until the syscall returns.

With P, the runtime decouples the goroutine from the thread: the moment the syscall blocks, the P detaches from that M and is handed to another thread, which keeps executing the remaining goroutines.

Scenario 2: Without P, work stealing would need global locks

Without local run queues attached to P's, all goroutines would sit in one global queue, and every thread would contend on a single lock each time it needed work.

With P's local run queues, each P pops work from its own queue without locking, touching the global queue or stealing from other P's only when its local queue runs dry.

Scenario 3: Without P, controlling parallelism would be awkward

How would you limit concurrency to 4 cores on an 8-core machine? Without P, you'd need to limit thread creation, but threads are also needed for syscalls.

With P, you simply set GOMAXPROCS=4: exactly four P's exist, so at most four goroutines execute Go code at once, while the runtime can still create extra M's for blocking syscalls.

In essence, P is a "CPU token" that separates scheduling (G on P) from execution (M runs P). This decoupling is what makes Go's scheduler efficient.

GOMAXPROCS

GOMAXPROCS controls the number of P's, which determines the maximum number of goroutines executing simultaneously:
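
In code, the runtime.GOMAXPROCS function both reads and sets the value:

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	// An argument < 1 reads the current value without changing it.
	fmt.Println("current:", runtime.GOMAXPROCS(0))

	// Setting it returns the previous value.
	prev := runtime.GOMAXPROCS(4)
	fmt.Println("was:", prev)
}
```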

Or set it via the GOMAXPROCS environment variable before launching the program (for example, GOMAXPROCS=4 ./myapp).

Default behavior: Since Go 1.5, GOMAXPROCS defaults to the number of available CPUs. Before that, it defaulted to 1.

When to change it:

  • Running in containers with CPU limits: Set it to match the container's CPU quota
  • CPU-bound workloads: Usually leave at default
  • I/O-bound workloads: Might benefit from higher values (more M's waiting on I/O)
  • Debugging: Set to 1 to serialize execution

GOMAXPROCS doesn't limit the number of goroutines or OS threads; it limits how many goroutines run simultaneously. You can have millions of goroutines with GOMAXPROCS=4; they just take turns running on those 4 P's.

How the Scheduler Works

Creating a Goroutine

When you write go f():

  1. Runtime creates a new G with an initial 2KB stack
  2. G is added to the current P's local run queue (or global queue if local is full)
  3. The creating goroutine continues executing
  4. Eventually, the scheduler runs the new G

The Scheduling Loop

Each M runs a scheduling loop: find a runnable G, execute it until it blocks, yields, or finishes, then pick the next one.

Finding a runnable G follows a priority order:

  1. Check runnext (single slot for the next G to run, cache-friendly)
  2. Check local run queue
  3. Check global run queue
  4. Network poller (for goroutines waiting on I/O)
  5. Steal from other P's

Work Stealing

When a P's local queue is empty, it steals work from other P's. A steal takes half the victim's run queue, balancing load across processors.

Preemption

Go uses preemption to prevent a single goroutine from monopolizing a P.

Cooperative preemption (before Go 1.14):

  • Preemption only at safe points: function calls, channel operations, etc.
  • A tight loop with no function calls could block other goroutines indefinitely

Asynchronous preemption (Go 1.14+):

  • Runtime uses OS signals (SIGURG on Unix) to preempt goroutines
  • Even tight loops can be preempted
  • The scheduler sends a signal, and the goroutine stops at the next safe point
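
A sketch that shows the difference, forcing a single P so a tight loop and main must share it:

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	runtime.GOMAXPROCS(1) // one P: main and the loop compete for it

	go func() {
		for {
			// Tight loop with no function calls: no cooperative safe points.
		}
	}()

	runtime.Gosched() // hand the P to the tight loop

	// Before Go 1.14 this line might never run, because the loop never
	// reached a safe point. With asynchronous preemption, the runtime
	// signals the loop's thread and main gets the P back.
	fmt.Println("tight loop was preempted")
}
```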

Goroutine States

A goroutine transitions through several states during its lifetime. Understanding these states helps with debugging and performance analysis.

State Details

| State | Triggers | Duration | Debugging Visibility |
|---|---|---|---|
| Runnable | go statement, unblocked from wait | Typically microseconds to milliseconds | Visible in pprof as "runnable", in GODEBUG as local/global queue |
| Running | Scheduled by P | Until blocked, preempted, or finished | Current goroutine in stack trace |
| Waiting | Channel op, mutex, I/O, sleep, select | Varies: nanoseconds to forever (leak!) | Visible in pprof with blocking reason |
| Dead | Return, panic, runtime.Goexit | Instant (becomes garbage) | Not visible; memory reclaimed |

Runnable state: The goroutine is ready to execute but waiting for a P. This happens when:

  • A new goroutine is created (go f())
  • A goroutine is unblocked (channel receive completes, mutex acquired)
  • A goroutine is preempted (ran too long, yielded for GC)

High runnable counts mean goroutines are competing for limited P's. Consider whether you're spawning too many goroutines or if GOMAXPROCS is too low.

Running state: The goroutine is actively executing on an M+P pair. Only GOMAXPROCS goroutines can be in this state simultaneously.

Waiting state: The goroutine is blocked, not consuming CPU. Common reasons:

  • chan receive: Waiting for data on a channel
  • chan send: Waiting for receiver on a full/unbuffered channel
  • select: Waiting for any case to be ready
  • sync.Mutex.Lock: Waiting for mutex
  • sync.Cond.Wait: Waiting for condition signal
  • time.Sleep: Waiting for timer
  • IO wait: Network I/O via netpoller
  • syscall: Blocking syscall (M also blocked)

Debugging tip: In pprof goroutine profiles, the waiting reason shows why a goroutine is blocked. Look for goroutines stuck in chan receive with no matching sender: that's likely a leak.

What Causes Blocking?

When a goroutine blocks, it releases its M (and P) so others can run:

| Operation | Effect |
|---|---|
| Channel send (full buffer/no receiver) | G moves to channel's wait queue |
| Channel receive (empty buffer/no sender) | G moves to channel's wait queue |
| Mutex Lock (already locked) | G moves to mutex's wait queue |
| time.Sleep() | G moves to timer heap |
| I/O operation | M enters syscall, P handed off |
| runtime.Gosched() | G yields, moves to run queue |

System Calls and the Scheduler

System calls (file I/O, network I/O, etc.) require special handling because they block the OS thread:

Blocking System Calls

When a goroutine makes a blocking syscall:

  1. The M enters the syscall, releasing its P
  2. The P is handed to another M (or a new M is created)
  3. When the syscall returns, the M tries to reacquire a P
  4. If no P is available, the G goes to the global queue, and the M parks

Network I/O: The Netpoller

Network I/O is different from file I/O. While file reads truly block in the kernel, network operations can be made non-blocking with the right OS facilities. The netpoller is Go's integration with these facilities (epoll on Linux, kqueue on BSD/macOS, IOCP on Windows).

What the netpoller enables:

  1. No thread blocked: When waiting for network data, the goroutine is parked (sleeping) but the M is free to run other goroutines. A server can have 100,000 goroutines waiting for network I/O with only a handful of threads.
  2. Efficient multiplexing: Instead of one thread per connection (the thread-per-request model), Go uses a small number of threads to manage many connections via OS-level event notification.
  3. Seamless integration: Your code looks like blocking I/O (conn.Read(buf)), but under the hood it's non-blocking and event-driven.

How it works:

  1. When a goroutine does network I/O that would block:
    • The runtime sets the socket to non-blocking mode
    • The socket is registered with the netpoller
    • The goroutine is parked (moved to waiting state)
    • The M continues running other goroutines
  2. When the socket is ready:
    • The netpoller (running on a background thread) detects readiness
    • The goroutine is marked as runnable
    • The goroutine is added back to a run queue
    • An M picks it up and resumes execution

This is why Go handles thousands of concurrent network connections efficiently. The M's don't block on I/O.
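
A minimal echo server sketch showing this model: each connection gets its own goroutine, and conn.Read parks in the netpoller rather than blocking a thread:

```go
package main

import (
	"log"
	"net"
)

func main() {
	ln, err := net.Listen("tcp", ":8080")
	if err != nil {
		log.Fatal(err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			log.Print(err)
			continue
		}
		// Goroutine per connection: thousands of idle connections cost
		// parked goroutines (~KBs each), not blocked OS threads.
		go func(c net.Conn) {
			defer c.Close()
			buf := make([]byte, 4096)
			for {
				n, err := c.Read(buf) // looks blocking; parks via netpoller
				if err != nil {
					return
				}
				if _, err := c.Write(buf[:n]); err != nil {
					return
				}
			}
		}(conn)
	}
}
```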

Goroutine Leaks

A goroutine leak occurs when goroutines are created but never terminate. They consume memory and may hold resources.

Common Causes

1. Blocked channel operations:
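
A sketch of the classic pattern (firstResult and the worker count are illustrative names, not a fixed API):

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// firstResult returns whichever worker answers first. The losing
// workers block forever on the unbuffered send: a goroutine leak.
func firstResult() string {
	ch := make(chan string) // unbuffered: a send blocks until received
	for i := 0; i < 3; i++ {
		go func(id int) {
			ch <- fmt.Sprintf("result %d", id) // only one send completes
		}(i)
	}
	return <-ch // the two losers are stuck on their sends forever
}

func main() {
	fmt.Println(firstResult())
	time.Sleep(100 * time.Millisecond)
	fmt.Println("goroutines:", runtime.NumGoroutine()) // 3 = main + 2 leaked
}
```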

Fix: Use buffered channel or select with context.

The fix addresses the root cause: the goroutine blocks because its send has no receiver. We have two options:

Option 1: Buffered channel - The send can complete even without a receiver, because the buffer absorbs the value. The goroutine can then exit, and the buffered value is garbage collected later.

Option 2: Select with context - The goroutine watches for cancellation and exits cleanly when the context is done.
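
Both options applied to the hypothetical firstResult above (a fragment; the context and fmt imports are assumed):

```go
// Option 1 + 2 combined: the buffer lets losing sends complete, and the
// select lets long-running senders exit on cancellation.
func firstResultFixed(ctx context.Context) string {
	ch := make(chan string, 3) // buffer sized to the number of senders
	for i := 0; i < 3; i++ {
		go func(id int) {
			select {
			case ch <- fmt.Sprintf("result %d", id): // always completes
			case <-ctx.Done(): // or exits cleanly if the caller cancels
			}
		}(i)
	}
	return <-ch
}
```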

2. Infinite loops without exit:
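
A sketch of the pattern, using the startWorker the next paragraph refers to (its body here is an assumed illustration; the time import is assumed):

```go
func startWorker() {
	go func() {
		for {
			// ...poll, tick, process...
			time.Sleep(time.Second)
			// No exit condition: this goroutine runs for the life of
			// the process, even after its spawner returns.
		}
	}()
}
```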

Why this leaks: The goroutine has no exit condition. Even if the function that called startWorker() returns, the goroutine runs forever, consuming memory and potentially CPU.

Fix: Use context for cancellation. Context is the idiomatic way to signal "please stop" to goroutines. The goroutine checks ctx.Done() regularly and exits when cancelled.
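
A cancellable variant of the same sketch (context and time imports assumed):

```go
func startWorker(ctx context.Context) {
	go func() {
		ticker := time.NewTicker(time.Second)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return // clean exit when the caller cancels
			case <-ticker.C:
				// do one unit of work
			}
		}
	}()
}
```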

3. Missing case in select:
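
A sketch of the pattern (awaitReply is an illustrative name; the fmt import is assumed):

```go
func awaitReply(ch <-chan string) {
	go func() {
		reply := <-ch // bare receive: no timeout, no cancellation
		fmt.Println(reply)
	}()
}
```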

Why this leaks: A bare receive (<-ch) blocks until a value arrives. If the sender crashes, the channel is abandoned, or the send never happens, this goroutine waits forever.

Fix: Add timeout or context. Every blocking operation should have an escape hatch. Use time.After for simple timeouts or ctx.Done() for cancellation that propagates through your call stack.
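
The same sketch with both escape hatches (context, fmt, and time imports assumed):

```go
func awaitReply(ctx context.Context, ch <-chan string) {
	go func() {
		select {
		case reply := <-ch:
			fmt.Println(reply)
		case <-time.After(5 * time.Second): // simple timeout
			fmt.Println("timed out waiting for reply")
		case <-ctx.Done(): // cancellation propagated from the caller
		}
	}()
}
```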

Important: before Go 1.23, time.After created a timer that wasn't garbage collected until it fired. On those versions, in hot paths, use time.NewTimer and call timer.Stop() to avoid memory buildup from accumulated timers.

Detecting Goroutine Leaks

Using runtime.NumGoroutine():
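
A rough check in a test; doWork stands in for the code under test (hypothetical):

```go
func TestNoGoroutineLeak(t *testing.T) {
	before := runtime.NumGoroutine()
	doWork()                           // the code under test (hypothetical)
	time.Sleep(100 * time.Millisecond) // let spawned goroutines finish
	if after := runtime.NumGoroutine(); after > before {
		t.Errorf("leaked %d goroutines", after-before)
	}
}
```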

Using goleak (Uber's library):
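
goleak fails a test if unexpected goroutines are still running when it ends; a sketch of a _test.go file:

```go
import (
	"testing"

	"go.uber.org/goleak"
)

// Per-test check:
func TestSomething(t *testing.T) {
	defer goleak.VerifyNone(t)
	// ... test body ...
}

// Or once for the whole package:
func TestMain(m *testing.M) {
	goleak.VerifyTestMain(m)
}
```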

Using pprof:
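
Importing net/http/pprof for its side effects registers the debug handlers; a sketch:

```go
import (
	"log"
	"net/http"
	_ "net/http/pprof" // side effect: registers /debug/pprof/* handlers
)

func main() {
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()
	// ... rest of the program ...
}
```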

Then visit http://localhost:6060/debug/pprof/goroutine?debug=1 to see all goroutines and their stack traces.

Debugging the Scheduler

GODEBUG Environment Variable
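
The schedtrace flag makes the runtime print a scheduler summary at a fixed interval. For example, running GODEBUG=schedtrace=1000 ./yourapp (yourapp being a placeholder binary) emits one line every 1000 ms.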

Sample output (representative; exact values and fields vary by Go version):
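
```
SCHED 2009ms: gomaxprocs=4 idleprocs=1 threads=9 spinningthreads=1 idlethreads=3 runqueue=2 [1 0 3 0]
```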

Fields:

  • gomaxprocs: Number of P's
  • idleprocs: P's with no work
  • threads: Total M's
  • spinningthreads: M's looking for work
  • runqueue: Global run queue size
  • [...]: Local run queue sizes for each P

Stack Traces
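
Sending SIGQUIT (Ctrl+\ in a terminal) to a running Go program dumps all goroutine stacks and exits. You can also capture them programmatically with runtime.Stack; a sketch:

```go
package main

import (
	"os"
	"runtime"
	"time"
)

func dumpStacks() {
	buf := make([]byte, 1<<20)
	n := runtime.Stack(buf, true) // true = include all goroutines
	os.Stderr.Write(buf[:n])
}

func main() {
	go func() { select {} }()         // a permanently parked goroutine
	time.Sleep(10 * time.Millisecond) // let it start so it shows up
	dumpStacks()
}
```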

Execution Tracer
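
A sketch of producing a trace with runtime/trace:

```go
package main

import (
	"log"
	"os"
	"runtime/trace"
)

func main() {
	f, err := os.Create("trace.out")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	if err := trace.Start(f); err != nil {
		log.Fatal(err)
	}
	defer trace.Stop()

	// ... workload to trace ...
}
```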

View with: go tool trace trace.out

The trace shows:

  • Goroutine creation and blocking
  • GC events
  • Syscalls
  • Network I/O
  • Scheduler decisions

Performance Characteristics

Understanding the costs of goroutine operations helps you make informed design decisions.

Creation and Memory

| Metric | Value | Notes |
|---|---|---|
| Initial stack size | ~2 KB | Grows/shrinks dynamically |
| Maximum stack size | 1 GB (64-bit) | Runtime limit, configurable |
| Creation time | ~0.3 microseconds | Much faster than OS thread (~10 μs) |
| Goroutine struct | ~400 bytes | Runtime overhead per goroutine |

Practical implication: Creating a million goroutines uses about 2-3 GB of memory (stack + struct overhead). This is feasible for connection-per-goroutine servers, but watch memory usage under load.

Context Switch Costs

| Operation | Cost | Notes |
|---|---|---|
| Goroutine switch | 100-200 ns | User space, minimal state |
| OS thread switch | 1-10 μs | Kernel mode, full context |
| Syscall (fast path) | ~100 ns | No actual kernel entry |
| Syscall (slow path) | ~1 μs | Enters kernel |

Why goroutine switches are fast:

  • Only save/restore stack pointer + program counter
  • No kernel transition
  • No privilege level change
  • No TLB flush

Scheduler Overhead

| Operation | Cost | Notes |
|---|---|---|
| Work stealing | ~200 ns | Per steal attempt |
| Global queue access | ~50 ns | Lock contention possible |
| Local queue push/pop | ~10 ns | Lock-free for owner |
| Netpoller check | ~100 ns | Amortized across many goroutines |

Benchmarking Goroutine Creation
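
A minimal benchmark sketch (place it in a _test.go file and run go test -bench=.):

```go
package main

import (
	"sync"
	"testing"
)

// Measures spawning an empty goroutine and waiting for it to finish.
func BenchmarkGoroutineCreation(b *testing.B) {
	var wg sync.WaitGroup
	for i := 0; i < b.N; i++ {
		wg.Add(1)
		go func() {
			wg.Done()
		}()
	}
	wg.Wait()
}
```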

Practical Guidelines

When to use more goroutines:

  • I/O-bound work (network, disk): goroutines are cheap, I/O is slow
  • Independent tasks that don't share state
  • Connection handling in servers

When to limit goroutines:

  • CPU-bound work: more goroutines than cores just adds overhead
  • Heavy memory usage per goroutine: watch total memory
  • Shared state with high contention: more goroutines = more contention

Rule of thumb for worker pools:

  • CPU-bound: runtime.NumCPU() workers
  • I/O-bound: 10-100x CPU count, tuned by benchmarking
  • Mixed: start with runtime.NumCPU() * 2, adjust based on profiling
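
A minimal bounded worker pool following the CPU-bound rule of thumb (the squaring is a stand-in for real work):

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

func main() {
	jobs := make(chan int)
	var wg sync.WaitGroup

	// CPU-bound: cap workers at runtime.NumCPU().
	for w := 0; w < runtime.NumCPU(); w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range jobs { // exits when jobs is closed
				_ = j * j // stand-in for real CPU-bound work
			}
		}()
	}

	for j := 0; j < 1000; j++ {
		jobs <- j
	}
	close(jobs)
	wg.Wait()
	fmt.Println("all jobs done")
}
```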