Last Updated: January 31, 2026
For over 30 years, Python developers have lived with the GIL. CPU-bound threading does not scale. Multiprocessing works but adds complexity. The GIL seemed permanent, an unchangeable fact of Python life.
Then Python 3.13 arrived with an experimental feature that changes everything: free-threading. This is a Python interpreter that can run multiple threads in parallel, executing Python bytecode simultaneously on different cores. What once seemed impossible is now reality.
This chapter covers Python's free-threading mode: what it is, how to enable it, how it achieves thread safety without the GIL, what code needs to change, and when you should consider using it. This is cutting-edge Python, still experimental, but understanding it prepares you for where the language is heading.
Free-threading is an experimental build of CPython that removes the Global Interpreter Lock. Instead of one global lock serializing all Python bytecode execution, the interpreter uses fine-grained locking on individual objects. Multiple threads can execute Python code truly in parallel.
The following diagram contrasts the two approaches.
The left side shows traditional CPython: one global lock, threads take turns. Solid line means holding the lock, dotted means waiting. The right side shows free-threading: no global lock, threads run on separate CPU cores simultaneously. This is true parallelism for Python.
Enable Python threads to execute in parallel without breaking existing Python code and without making single-threaded programs dramatically slower.
This is a massive engineering challenge. The GIL exists because Python's internals assume single-threaded access. Removing it requires making every internal data structure thread-safe.
But why go through all this trouble? To understand the motivation, we need to see exactly what problem the GIL creates.
The GIL has been Python's Achilles heel for parallel computing. While Python excels at many tasks, CPU-bound parallelism has always required awkward workarounds.
Let's see this limitation in action. The following code runs the same CPU-bound work four times, first using four threads and then sequentially in a single thread.
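A minimal version of that experiment might look like the following sketch (the iteration count is illustrative; absolute timings depend on your machine):

```python
import threading
import time

N = 2_000_000  # iterations per task; adjust for your machine

def count_down(n: int) -> None:
    # Pure-Python CPU-bound loop: no I/O, no C extensions
    while n > 0:
        n -= 1

# Run the task four times using four threads
threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(4)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"4 threads:  {time.perf_counter() - start:.2f}s")

# Run the same total work sequentially in the main thread
start = time.perf_counter()
for _ in range(4):
    count_down(N)
print(f"sequential: {time.perf_counter() - start:.2f}s")
```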
Output (with GIL):
The results are striking. Four threads take the same time as running everything sequentially. In fact, the threaded version is slightly slower due to thread management overhead. The four threads are not running in parallel. They are taking turns, each one doing a bit of work before yielding to the next.
With free-threading, the same code achieves near-linear speedup. Four threads finish in roughly the same time as one thread because they truly execute in parallel on different CPU cores.
Several factors aligned:
With the motivation clear, let's see how to actually use free-threading in practice.
Free-threading is not enabled by default. It requires a specially compiled Python interpreter built with the --disable-gil configuration flag. You cannot simply flip a switch on your existing Python installation.
Option 1: pyenv (recommended)
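With a recent pyenv, free-threaded variants appear in the install list with a trailing "t". A sketch (the exact version number will differ; list what your pyenv offers first):

```shell
# List available free-threaded builds (the trailing "t" marks them)
pyenv install --list | grep -E '3\.13.*t$'

# Install and select a free-threaded variant
pyenv install 3.13.2t
pyenv local 3.13.2t
```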
Option 2: Build from source
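Building from source uses the configure flag mentioned above. A sketch, assuming a Unix-like system with a C toolchain (the install prefix is illustrative; free-threaded builds install the binary as python3.13t):

```shell
git clone --branch 3.13 --depth 1 https://github.com/python/cpython.git
cd cpython
./configure --disable-gil --prefix="$HOME/python-freethreaded"
make -j"$(nproc)"
make install

# The free-threaded binary carries a "t" suffix
"$HOME/python-freethreaded/bin/python3.13t" --version
```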
Option 3: Pre-built binaries
Some package managers and Python distributions offer pre-built free-threaded binaries. Check python.org downloads for "free-threaded" or "no-GIL" variants.
Once you have installed a free-threaded Python build, verify that it is working correctly. The following script checks whether the GIL is disabled.
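A small check script along these lines works on any Python version (`sys._is_gil_enabled()` exists from Python 3.13; the `getattr` fallback covers older builds, which always have the GIL):

```python
import sys
import sysconfig

def gil_status() -> tuple[bool, bool]:
    """Return (is_free_threaded_build, is_gil_enabled)."""
    # Py_GIL_DISABLED is 1 when Python was configured with --disable-gil
    free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() was added in 3.13; older builds always have the GIL
    gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)()
    return free_threaded, gil_enabled

if __name__ == "__main__":
    build, gil = gil_status()
    print(f"Python {sys.version.split()[0]}")
    print(f"Free-threaded build: {'yes' if build else 'no'}")
    print(f"GIL is {'DISABLED' if not gil else 'ENABLED'}")
```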
When running on a free-threaded build with the GIL disabled, you should see output like this:
Output (free-threaded build):
If you see "GIL is ENABLED," you either have a standard Python build or the GIL has been re-enabled via environment variable or configuration.
Even in a free-threaded build, you can re-enable the GIL:
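Both an environment variable and an interpreter option control this on free-threaded builds (script name is illustrative):

```shell
# Environment variable: applies to every Python process that inherits it
PYTHON_GIL=1 python3.13t my_script.py

# Interpreter option: per-invocation
python3.13t -X gil=1 my_script.py
```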
You can also check the GIL status programmatically within your code:
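Inside a program, the same check can gate behavior. A sketch (`sys._is_gil_enabled()` is Python 3.13+, so the `getattr` fallback keeps it portable):

```python
import os
import sys

def threads_run_in_parallel() -> bool:
    # On builds without sys._is_gil_enabled(), the GIL is always on
    return not getattr(sys, "_is_gil_enabled", lambda: True)()

# Example: pick a worker count that matches what threading can deliver
worker_count = (os.cpu_count() or 1) if threads_run_in_parallel() else 1
print(f"Using {worker_count} worker thread(s)")
```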
This is useful for code that needs to behave differently depending on whether free-threading is active.
Now for the interesting part. Without the GIL, Python needs alternative mechanisms for thread safety. The interpreter cannot simply remove the GIL and hope for the best. Every internal data structure that was protected by the GIL must now protect itself.
The GIL is simple but coarse. It prevents all parallelism to avoid any possibility of data races. Free-threading takes a finer-grained approach: instead of one global lock, each object has its own lock.
The diagram below illustrates how per-object locking enables parallelism.
In this scenario, Thread 1 holds the lock on the List object and is waiting for the Int object's lock. Thread 2 holds locks on both the Dict and Int objects. The key insight is that Thread 1 and Thread 2 can work on different objects simultaneously. Only when both threads need the same object (the Int) does one have to wait.
This fine-grained locking is why free-threading achieves parallelism. Most operations work on different objects, so most operations can proceed in parallel. Contention only occurs when threads need the same object at the same time.
Every Python object tracks how many references point to it. When references drop to zero, the object gets deallocated. In a multi-threaded world without the GIL, multiple threads might increment or decrement the same object's reference count simultaneously. The obvious solution is atomic operations, but atomics are expensive. On modern CPUs, an atomic increment can be 10-20x slower than a regular increment due to cache coherency protocols.
Free-threading solves this with "biased" reference counting, a clever optimization that exploits a key observation: most objects are accessed primarily by a single thread. Think about it. A local variable in a function, a temporary list created during computation, an object stored in a thread-local cache. These are all "owned" by the thread that created them.
Here is how biased reference counting works:
The following conceptual model illustrates the idea.
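The conceptual model below is a toy Python class, not CPython's actual C implementation; the lock stands in for an atomic operation on the shared count:

```python
import threading

class BiasedRefCount:
    """Toy model of biased reference counting (illustration only)."""

    def __init__(self) -> None:
        self.owner = threading.get_ident()    # thread that created the object
        self.local_refcount = 1               # touched only by the owner: no atomics
        self.shared_refcount = 0              # touched by other threads: atomics needed
        self._shared_lock = threading.Lock()  # stand-in for an atomic operation

    def incref(self) -> None:
        if threading.get_ident() == self.owner:
            self.local_refcount += 1          # fast path: plain increment
        else:
            with self._shared_lock:           # slow path: "atomic" update
                self.shared_refcount += 1

    def total(self) -> int:
        return self.local_refcount + self.shared_refcount

rc = BiasedRefCount()
rc.incref()        # owner thread: fast path
print(rc.total())  # → 2
```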
When the owning thread increments the reference count, it simply adds 1 to local_refcount. No locks, no atomics, no memory barriers. But when a different thread needs to increment the count, it uses an atomic operation on shared_refcount. The total reference count is local_refcount + shared_refcount.
Why does this help? In typical Python programs, 80-90% of reference count operations happen on the owning thread. By making the common case fast and paying the atomic penalty only for cross-thread references, biased reference counting achieves near-GIL performance for single-threaded access patterns while still being correct for multi-threaded code.
Even with biased reference counting, some operations would still require too many reference count updates. Consider accessing a global variable or a module attribute. Every access creates a new reference, and every time the access ends, the reference goes away. In tight loops, this could mean millions of incref/decref pairs per second.
Deferred reference counting addresses this by postponing reference count updates. Instead of immediately incrementing and decrementing, the interpreter tracks these "borrowed" references in thread-local queues. Periodically, it processes these queues in batches, coalescing multiple increments and decrements into single operations.
This batching dramatically reduces the frequency of reference count modifications. A loop that accesses a global variable a million times might only trigger a handful of actual refcount updates. The trade-off is slightly delayed garbage collection, but the performance gain is substantial for common patterns like global lookups and module attribute access.
Some objects in Python are used constantly. None. True. False. Small integers like 0 and 1. Common strings. These objects exist from interpreter startup until shutdown, and they are accessed from every thread in every piece of code.
In free-threaded Python, these objects are marked as "immortal." An immortal object has a special reference count value that means "never deallocate." When the interpreter sees this special value, it skips the incref and decref operations entirely. No local count, no shared count, no atomics. Just a simple check and move on.
This optimization matters because these objects are accessed billions of times during a program's lifetime. Eliminating reference counting overhead for None alone has measurable performance impact. The immortal flag is checked at the very start of the incref/decref path, making the common case (accessing immortal objects) as fast as possible.
Together, biased reference counting, deferred reference counting, and immortal objects form a layered optimization strategy. Each technique handles a different access pattern, and together they make GIL-free reference counting practical.
Now that we understand how free-threading achieves thread safety, let's examine what this means for real-world performance. The mechanisms we discussed come with costs, but they also unlock benefits that were impossible with the GIL.
Where the overhead costs pay off is in parallel workloads. When your code can distribute CPU-intensive work across multiple threads, free-threading delivers near-linear speedup. The following benchmark demonstrates this dramatically.
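The scaling benchmark can be sketched like this: each thread gets the same fixed amount of CPU work, so with the GIL total time grows roughly linearly with thread count, while a free-threaded build stays roughly flat (timings are machine-dependent):

```python
import threading
import time

def cpu_work(n: int = 1_000_000) -> int:
    # Fixed amount of pure-Python CPU work per thread
    total = 0
    for i in range(n):
        total += i * i
    return total

def run_threads(count: int) -> float:
    """Time `count` threads each doing the same fixed CPU workload."""
    threads = [threading.Thread(target=cpu_work) for _ in range(count)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

for count in (1, 2, 4, 8):
    print(f"{count} thread(s): {run_threads(count):.2f}s")
```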
Output (with GIL):
With the GIL, notice how the time scales linearly with thread count. Two threads take twice as long as one, four threads take four times as long. This is because threads execute sequentially, and each additional thread just adds more work to the queue.
Output (free-threaded):
With free-threading, the story changes completely. One thread takes 0.55 seconds (slightly slower due to free-threading overhead). But adding more threads barely changes the total time. Two threads do twice the work in almost the same time. Four threads, four times the work. Eight threads, eight times the work.
This is true parallelism. The threads run simultaneously on different CPU cores, so doubling the threads does not double the time. The slight increase (0.55s to 0.62s for 8 threads) comes from thread creation overhead and occasional lock contention, not from serialization.
Here is where things get interesting, and potentially dangerous. Removing the GIL exposes race conditions that were previously hidden. Code that "worked" with the GIL may suddenly produce wrong results or crash without it.
This does not mean your code was correct before. It means the GIL was masking bugs. Understanding these patterns helps you write code that works regardless of whether the GIL is present.
Compound operations are not atomic:
The most common issue is assuming that simple operations like counter += 1 are atomic. They are not.
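A classic demonstration: under the GIL the lost updates are intermittent; on a free-threaded build they are near-certain:

```python
import threading

counter = 0

def increment(n: int) -> None:
    global counter
    for _ in range(n):
        counter += 1  # three steps: read, add, write back

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Expected 400000 -- but any interleaving of read/add/write loses updates
print(f"counter = {counter} (expected 400000)")
```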
The statement counter += 1 compiles to bytecode that reads counter into a register, adds 1, and writes back. With the GIL, threads switch between bytecode instructions, so this sequence usually completes without interruption. Without the GIL, two threads might read the same value, both increment to the same result, and both write, losing one increment.
List operations still need care:

Even operations that look atomic, like list.append(), deserve scrutiny once the GIL is gone.

With the GIL, list.append() happens to be atomic because the underlying C call runs to completion while the GIL is held. A naive GIL removal would let one thread reallocate the list's internal buffer while another thread writes to it, producing corrupted data, lost items, or a segmentation fault. This is precisely why CPython's free-threaded build wraps individual operations on builtin containers in per-object locks. Single operations remain safe, but compound sequences, such as checking a list's length and then appending, can still interleave between threads, so do not rely on container internals to uphold your application's invariants.
The good news is that writing thread-safe Python code follows the same principles that apply in any language. The patterns below will keep your code correct regardless of whether the GIL is present.
Use threading primitives for shared counters and flags:
When multiple threads need to modify the same variable, protect it with a lock. The with statement ensures the lock is always released, even if an exception occurs.
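The unsafe counter from earlier becomes deterministic with a single threading.Lock:

```python
import threading

counter = 0
counter_lock = threading.Lock()

def safe_increment(n: int) -> None:
    global counter
    for _ in range(n):
        with counter_lock:  # released automatically, even on exceptions
            counter += 1

threads = [threading.Thread(target=safe_increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # → 400000
```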
The lock ensures that only one thread can read and write counter at a time. Without the lock, two threads might both read counter = 5, both compute 5 + 1 = 6, and both write 6. You would lose one increment. With the lock, this cannot happen.
Use thread-safe collections for producer-consumer patterns:
Python's queue.Queue is explicitly designed for multi-threaded use. It handles all the locking internally, so you can focus on your application logic.
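A minimal producer-consumer sketch using queue.Queue and a sentinel object to shut the consumers down:

```python
import queue
import threading

tasks: queue.Queue = queue.Queue()
results: queue.Queue = queue.Queue()
STOP = object()  # sentinel telling a consumer to exit

def consumer() -> None:
    while True:
        item = tasks.get()
        if item is STOP:
            break
        results.put(item * item)  # "process" the item

workers = [threading.Thread(target=consumer) for _ in range(3)]
for w in workers:
    w.start()

for item in range(10):   # producer side
    tasks.put(item)
for _ in workers:        # one sentinel per consumer
    tasks.put(STOP)
for w in workers:
    w.join()

collected = sorted(results.get() for _ in range(results.qsize()))
print(collected)  # → [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```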
The Queue handles all synchronization internally. Multiple producers can call put() simultaneously, and multiple consumers can call get() simultaneously. The queue ensures no items are lost or duplicated.
Avoid shared mutable state when possible:
The safest pattern is to avoid sharing mutable state altogether. Instead of having threads modify a shared collection, have each thread return its result. The main thread then collects all results.
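A scatter-gather sketch with concurrent.futures: each worker touches only its own chunk, and the main thread combines the returned partial results:

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk: list[int]) -> int:
    # Each call touches only its own chunk: no shared mutable state
    return sum(x * x for x in chunk)

data = list(range(1000))
chunks = [data[i::4] for i in range(4)]  # scatter: four disjoint slices

with ThreadPoolExecutor(max_workers=4) as pool:
    partial_sums = list(pool.map(process_chunk, chunks))  # gather

total = sum(partial_sums)
print(total)  # → 332833500, same as the sequential sum of squares
```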
This pattern is called "map-reduce" or "scatter-gather." You scatter work to threads, each thread processes independently, and you gather results at the end. Since threads do not share mutable state, there is no possibility of race conditions.
Use thread-local storage for per-thread state:
Sometimes each thread needs its own copy of some data. Thread-local storage gives each thread a private namespace.
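A sketch of the pattern; here a plain dict stands in for an expensive per-thread resource such as a database connection:

```python
import threading

local = threading.local()

def get_session() -> dict:
    # Lazily create one "session" per thread
    if not hasattr(local, "session"):
        local.session = {"owner": threading.current_thread().name}
    return local.session

def worker() -> None:
    session = get_session()
    # Each thread sees only its own session object
    assert session["owner"] == threading.current_thread().name

threads = [threading.Thread(target=worker, name=f"worker-{i}") for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(get_session()["owner"])  # the main thread has its own session too
```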
Thread-local storage is perfect for caching expensive resources (database connections, HTTP sessions) on a per-thread basis without any synchronization overhead.
If you use any libraries with C extensions (and most Python projects do), this section is critical. C extensions face the most significant changes in free-threaded Python, and understanding why helps you plan your migration.
For decades, C extension authors have relied on a simple assumption: the GIL protects them. When your C function is running, no other Python code is running. This means global variables in C code are safe to access, static buffers are safe to use, and you do not need to think about thread safety at all.
Here is what typical C extension code looks like:
Without the GIL, this code has a race condition. Two threads calling my_function simultaneously might both read global_counter = 5, both increment to 6, and both write 6. One increment is lost. The fix requires explicit synchronization, which means auditing and modifying C extension code.
If you maintain C extensions, there is some good news. Extensions built against Python's Limited API (also called the Stable ABI) are more likely to work with free-threading. The Limited API restricts which Python internals you can access, which means fewer assumptions about the GIL.
The Limited API does not automatically make your code thread-safe, but it does mean you are not relying on undocumented GIL behavior. Extensions using the full CPython API often access internal structures that change between Python versions and may assume GIL protection in subtle ways.
Free-threaded Python does not blindly assume all extensions are safe. Extensions must explicitly declare that they support free-threading. Until they do, Python may fall back to GIL-like serialization when calling into that extension.
The declaration lives in the extension module itself rather than in pyproject.toml or setup.py: a module slot tells the interpreter that the extension supports running without the GIL, and build backends then tag wheels for the free-threaded ABI:
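A minimal sketch of that declaration, using the Py_mod_gil slot added in Python 3.13 (the slot-table name is illustrative; this fragment belongs inside a full PyModuleDef):

```c
static PyModuleDef_Slot mymodule_slots[] = {
    /* Declare that this module is safe to run with the GIL disabled */
    {Py_mod_gil, Py_MOD_GIL_NOT_USED},
    {0, NULL},
};
```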
Before adding this flag, you must audit your extension for thread safety. Look for global variables, static buffers, lazy initialization without locks, and any assumption that only one thread executes your code at a time.
Before enabling free-threading, you should inventory the C extensions in your project. The following function helps identify which modules are C extensions versus pure Python.
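A heuristic helper along these lines classifies a module by the file suffix of its import spec (namespace packages and frozen modules are lumped under "builtin" for simplicity):

```python
import importlib.machinery
import importlib.util

def classify_module(name: str) -> str:
    """Return 'c-extension', 'pure-python', 'builtin', or 'not-found' (heuristic)."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return "not-found"
    if spec.origin in (None, "built-in", "frozen"):
        return "builtin"
    if spec.origin.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES)):
        return "c-extension"
    return "pure-python"

for mod in ("json", "math", "_ssl"):
    print(mod, classify_module(mod))
```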
Pure Python modules are generally safe with free-threading (assuming your code using them is thread-safe). C extensions require checking with the library maintainers or reviewing their issue tracker for free-threading status.
The Python ecosystem is actively working on free-threading support. Here is the current status of major libraries:
| Library | Free-Threading Support | Notes |
|---|---|---|
| NumPy | In progress | Critical for scientific Python. Active development. |
| Pandas | In progress | Depends on NumPy. Waiting for NumPy completion. |
| requests | Likely safe | Pure Python HTTP library. |
| aiohttp | Testing | Mostly pure Python with some C acceleration. |
| SQLAlchemy | Unknown | Complex C extensions for performance. |
| Pillow | Unknown | Heavy C code for image processing. |
This table will become outdated quickly. Always check the library's GitHub repository, issue tracker, and release notes for current free-threading status. Many libraries are adding explicit support as Python 3.13 matures.
If you maintain C extensions, here is a practical path forward:
- Audit global state: every static variable in your C code is a potential race condition.
- Add explicit locking where needed, using PyThread_acquire_lock() and related functions.

With both thread safety changes and C extension compatibility understood, let's look at how to approach migrating an existing codebase to free-threading.
Moving to free-threading is not a simple upgrade. It requires planning, testing, and possibly code changes. The complexity depends on how much threading you use and how many C extensions your project depends on.
A solid testing strategy is essential for migration. You want to run your test suite in both modes, with and without the GIL, and compare results. The following script automates this comparison.
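A sketch of such a script, assuming your suite runs under pytest and you are on a free-threaded build (on a standard build, PYTHON_GIL=0 causes the child interpreter to exit with an error, which simply reports as FAIL):

```python
import os
import subprocess
import sys

def run_tests(gil_enabled: bool) -> bool:
    """Run the test suite with the GIL on or off; True if all tests pass."""
    # PYTHON_GIL only has an effect on free-threaded (3.13t) builds
    env = dict(os.environ, PYTHON_GIL="1" if gil_enabled else "0")
    result = subprocess.run([sys.executable, "-m", "pytest", "-q"], env=env)
    return result.returncode == 0

if __name__ == "__main__":
    for enabled in (True, False):
        status = "PASS" if run_tests(enabled) else "FAIL"
        print(f"GIL {'enabled' if enabled else 'disabled'}: {status}")
```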
If tests pass with the GIL enabled but fail without it, you have thread-safety issues to fix. The failures will point you to code that was relying on GIL serialization for correctness. Run your tests multiple times without the GIL, since race conditions may only appear intermittently.
The safest approach is to migrate in stages rather than flipping a switch. The following diagram shows a recommended migration path that minimizes risk.
Start with the free-threaded build but keep the GIL enabled (PYTHON_GIL=1). This validates that your code works with the new interpreter without introducing parallelism. Once tests pass, disable the GIL and run tests again. Fix any race conditions that emerge. Finally, benchmark to ensure you are seeing the expected performance gains before deploying to production.
Based on the current state of free-threading, here are practical recommendations for different situations:
- Run on the free-threaded build with PYTHON_GIL=1 initially to validate the new runtime without introducing parallelism.

Before you decide to adopt free-threading, understand its current constraints. Free-threading is experimental, and that label exists for good reasons.
Free-threading is marked experimental in Python 3.13. This means:
This does not mean free-threading is broken. It means it has not been battle-tested at scale across diverse workloads. Early adopters will help identify edge cases and improve the implementation.
The biggest practical limitation is ecosystem readiness:
Before adopting free-threading, inventory your dependencies and check their status. A single incompatible dependency can block your entire migration.
Free-threading is not universally faster. Performance depends on your specific workload:
Profile your actual application before and after. Do not assume free-threading will help without measurement.
Race conditions are notoriously difficult to debug. They depend on timing, which varies with system load, hardware, and seemingly random factors. A bug might appear once in a thousand runs, making reproduction nearly impossible.
Traditional debugging tools like print statements and breakpoints can change timing enough to hide or trigger bugs. Instead, use tools designed for concurrent code.
Thread sanitizers instrument your code to detect data races at runtime. They slow execution significantly (10-50x) but can find bugs that would otherwise take weeks to track down. Consider running sanitized builds as part of your CI pipeline.
Free-threading is not just a feature. It is part of a larger vision for Python's evolution. Understanding where this is heading helps you make better architectural decisions today.
Python 3.13 introduced free-threading as experimental. The Python core team is gathering feedback, measuring performance, and identifying issues. Based on this experience, Python 3.14 and later versions will bring:
The experimental flag gives the core team freedom to make breaking changes if needed. By Python 3.15 or 3.16, free-threading should stabilize enough for broader adoption.
The long-term vision is ambitious: make free-threading the default and eventually deprecate the GIL entirely. But this is years away, probably a decade or more. The transition requires:
Until then, the GIL remains available for code that needs it. You can always run a free-threaded build with PYTHON_GIL=1 for full backward compatibility.
An important insight: free-threading does not replace asyncio or multiprocessing. Each approach solves different problems, and the best Python programs will combine them based on workload characteristics.
Here is how they compare:
| Approach | Best For | Overhead | Sharing |
|---|---|---|---|
| asyncio | I/O-bound, thousands of connections | Lowest | Single thread, no sharing issues |
| threading + free-threading | CPU-bound parallelism | Medium | Shared memory, needs synchronization |
| multiprocessing | Fault isolation, GIL-dependent code | Highest | Separate processes, IPC required |
The following example shows how you might combine asyncio and free-threading in a single application. Imagine a web service that needs to handle many concurrent HTTP requests (I/O-bound) while also performing CPU-intensive data processing.
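A sketch of that hybrid: the event loop handles request concurrency while CPU-heavy work is offloaded to a thread pool, which runs in true parallel on a free-threaded build (payload sizes are illustrative):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

def cpu_heavy(data: list[int]) -> int:
    # CPU-bound work; parallel across threads on a free-threaded build
    return sum(x * x for x in data)

async def handle_request(pool: ThreadPoolExecutor, payload: list[int]) -> int:
    loop = asyncio.get_running_loop()
    # Offload CPU work so the event loop stays responsive to other requests
    return await loop.run_in_executor(pool, cpu_heavy, payload)

async def main() -> None:
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Simulate several concurrent requests of different sizes
        results = await asyncio.gather(
            *(handle_request(pool, list(range(n))) for n in (10, 100, 1000))
        )
        print(results)  # → [285, 328350, 332833500]

asyncio.run(main())
```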
This hybrid approach gives you the best of both worlds. asyncio handles I/O efficiently with its cooperative multitasking model. Free-threading handles CPU work with true parallelism. Neither approach alone would be optimal for this workload.
The key insight is that concurrency tools are not mutually exclusive. Understanding when to use each one, and how to combine them, is what separates good Python programmers from great ones.
Free-threading is increasingly relevant in technical interviews, especially for positions involving performance-critical Python applications. Here are the key points interviewers often focus on.
Interview Insight: Free-threading is a hot topic in Python interviews. Be ready to explain what the GIL is, why removing it is hard, and what trade-offs free-threading makes. Even if you have not used it, understanding the concepts demonstrates deep Python knowledge.
Interview Insight: Know the difference between "no GIL" and "thread-safe." Removing the GIL does not magically make your code thread-safe. You still need proper synchronization for shared mutable state.
Interview Insight: Understand why free-threading uses biased reference counting. The naive approach (atomic operations on every refcount change) would be too slow. Biased counting optimizes for the common case (single-thread access).
Q1: What is free-threading in Python and why was it introduced?
Free-threading is an experimental CPython build (Python 3.13+) that removes the Global Interpreter Lock (GIL). The GIL serialized all Python bytecode execution, preventing true parallel execution of threads.
Free-threading was introduced to:
It uses per-object locking and biased reference counting to maintain thread safety without a global lock.
Q2: How does biased reference counting work?
Biased reference counting is an optimization for thread-safe reference counting without excessive atomic operations.
Each object has an "owner" thread. When the owner modifies the reference count, it uses fast non-atomic operations. When other threads modify it, they use slower atomic operations.
This works because most objects are accessed primarily by one thread. The "hot path" (owner thread) is fast. The "cold path" (other threads) is slower but rare.
When an object's refcount reaches zero, both local and shared counts are checked before deallocation.
Q3: Does removing the GIL make Python code automatically thread-safe?
No. Removing the GIL removes one form of serialization but does not provide thread safety for your code.
Code that was "accidentally safe" due to GIL serialization may now have race conditions:
You still need explicit synchronization (locks, thread-safe collections, atomic operations) for shared mutable state. The GIL only protected Python's internal data structures, not your application's invariants.
Q4: What are the performance trade-offs of free-threading?
Costs:
Benefits:
Free-threading benefits workloads that:
It may hurt workloads that:
Q5: How should you migrate existing code to free-threading?
Migration should be gradual and tested:
- Run with PYTHON_GIL=1 to use the new runtime without GIL removal
- Switch to PYTHON_GIL=0 and retest

Do not rush. Many libraries need updates, and hidden bugs may only appear in specific timing scenarios.
Q6: Will free-threading replace asyncio and multiprocessing?
No. Each approach has its place:
asyncio: Best for I/O-bound concurrency with thousands of connections. Single-threaded, cooperative multitasking. Lower overhead than threads for I/O.
multiprocessing: Best for fault isolation and working with GIL-dependent code. Each process is independent. Useful when you need crash isolation.
Free-threading: Best for CPU-bound parallelism with shared memory. Simpler than multiprocessing when you need to share data. Overhead of threads is higher than coroutines.
In practice, you might combine them:
We have covered a lot of ground in this chapter. Let's consolidate the key points you need to remember about free-threading.
- Free-threading requires a special CPython build configured with --disable-gil

| Aspect | With GIL | Free-Threading |
|---|---|---|
| Thread parallelism | No (serialized) | Yes (true parallel) |
| Single-thread perf | Baseline | ~5-10% slower |
| Thread safety | GIL provides some | Explicit locks needed |
| C extension compat | All work | Must be updated |
| Status | Production | Experimental |
This chapter concludes our Python concurrency deep dive. Looking back, you now have a comprehensive understanding of:
The Python concurrency landscape is evolving rapidly. Free-threading represents a fundamental shift in how Python handles parallelism. While it remains experimental today, understanding these concepts prepares you for where the language is heading. Keep an eye on Python releases and your key dependencies as the ecosystem matures.