Translator's note: The most widely criticized aspect of Python is probably its GIL. Because of the GIL, Python threads cannot run in true parallel, so many people see it as Python’s biggest weakness.

After PEP-554 was proposed (September 2017), everyone seemed to see a glimmer of hope. But can the GIL really be killed off? If so, how will it be done, why has it still not happened after more than a year, and how much longer will we have to wait?


Original (English): Has the Python GIL been slain? [1]

Author | Anthony Shaw

Translated by | Pea flowers under the cat

Disclaimer: This article is translated with the authorization of the original author. If you reproduce it, please keep the source of the original text, and do not use it for commercial or illegal purposes.

In early 2003, Intel launched the new Pentium 4 “HT” processor, clocked at 3 GHz and featuring “Hyper-Threading” technology.

Over the next few years, Intel and AMD competed fiercely for the best desktop performance by increasing bus speeds and L2 cache sizes, and by shrinking die sizes to minimize latency. In 2004 the 3 GHz HT was superseded by the “Prescott” model 580, clocked at up to 4 GHz.

It seemed that the way forward for performance was ever-higher clock speeds, but CPUs were plagued by high power consumption and heat output that was warming the planet.

Do you have a 4 GHz CPU in your computer? Unlikely, because performance moved forward through higher bus speeds and more cores instead. The Intel Core 2 replaced the Pentium 4 in 2006, at much lower clock speeds.

Besides the arrival of consumer multi-core CPUs, something else happened in 2006: Python 2.5 was released! Python 2.5 shipped a beta version of the beloved with statement.

But Python 2.5 had one major limitation when it came to using Intel’s Core 2 or AMD’s Athlon X2: the GIL.

The GIL, or Global Interpreter Lock, is a Boolean value in the Python interpreter that is protected by a mutex. CPython’s core bytecode evaluation loop uses this lock to decide which thread is currently executing statements.

CPython supports multiple threads within a single interpreter, but a thread must acquire the GIL before it can execute opcodes (low-level operations). The upside is that Python developers writing asynchronous or multithreaded code never have to worry about acquiring locks on variables, or about processes crashing from deadlocks.

The GIL makes multithreaded programming in Python simple.

The GIL also means that while CPython can be multithreaded, only one thread can execute at any given time. So even on a quad-core CPU, only one core at a time is running your Python bytecode.
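To see the effect for yourself, here is a minimal sketch that times a pure-Python, CPU-bound loop run four times in a row, and then the same four runs in four threads. The function names and loop sizes are arbitrary choices for illustration. On a standard CPython build with the GIL, expect the threaded run to take roughly as long as the sequential one, or longer; exact numbers depend on your machine.

```python
import threading
import time

def count_down(n):
    # Pure-Python, CPU-bound loop: the thread must hold the GIL to run it
    while n > 0:
        n -= 1
    return n

def run_sequential(n, workers):
    # Run the loop `workers` times, one after another
    start = time.perf_counter()
    for _ in range(workers):
        count_down(n)
    return time.perf_counter() - start

def run_threaded(n, workers):
    # Run the same loops in separate threads; they still take turns on the GIL
    start = time.perf_counter()
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    N, WORKERS = 5_000_000, 4
    print(f"sequential: {run_sequential(N, WORKERS):.2f} s")
    print(f"4 threads:  {run_threaded(N, WORKERS):.2f} s")
```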

The current version of the GIL was written in 2009 [2] to support asynchronous functionality, and it has survived almost unchanged despite many attempts to remove it or to reduce the need for it.

Any proposal to remove the GIL has had to meet one requirement: it must not degrade the performance of single-threaded code. Anyone who enabled Hyper-Threading back in 2003 will understand why that matters [3].

If you want truly concurrent code in CPython, you have to use multiple processes.

CPython 2.6 added the multiprocessing module to the standard library. multiprocessing is a wrapper around spawning CPython processes, each with its own GIL.

Processes can be spawned from the main process, sent commands in the form of compiled Python modules or functions, and then joined back into the main process.

The multiprocessing module also supports sharing variables through queues or pipes. It provides a Lock object, used to lock objects in the main process so that other processes can write to them.
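A minimal sketch of those pieces working together, assuming a hypothetical CPU-bound task (a sum of squares): each child process computes its part, a Lock guards a shared counter, and a Queue carries results back to the main process. The function names and input sizes are made up for this example.

```python
import multiprocessing as mp

def sum_of_squares(n, lock, counter, results):
    # Runs in a child process, which has its own interpreter and its own GIL
    total = sum(i * i for i in range(n))
    with lock:  # only one process at a time may update the shared counter
        counter.value += 1
    results.put((n, total))  # send the result back through the queue

def run_workers(sizes):
    lock = mp.Lock()
    counter = mp.Value("i", 0, lock=False)  # raw shared int, guarded by `lock`
    results = mp.Queue()
    procs = [
        mp.Process(target=sum_of_squares, args=(n, lock, counter, results))
        for n in sizes
    ]
    for p in procs:
        p.start()
    # Drain the queue before joining, so no child is left blocked on
    # unflushed queue data when join() is called
    collected = dict(results.get() for _ in procs)
    for p in procs:
        p.join()
    return collected, counter.value

if __name__ == "__main__":
    # The guard matters: with the "spawn" start method, children re-import
    # this module and must not start processes of their own
    print(run_workers([10_000, 20_000, 30_000]))
```

Draining the queue before calling join() follows the multiprocessing documentation’s advice: joining a process that still has items buffered for a queue can deadlock.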

Multiprocessing has one major drawback: significant overhead in both time and memory. CPython’s startup time, even with the site module disabled (-S), is 100 to 200 ms (see this link [4]).
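One rough way to compare the two costs on your own machine is sketched below. It times launching a fresh interpreter process (with -S, matching the startup number quoted above) against starting and joining an empty thread. The helper names are made up for this example, and the numbers will vary with hardware and OS.

```python
import subprocess
import sys
import threading
import time

def time_process_start():
    # Launch a new CPython process that does nothing; -S skips importing `site`
    start = time.perf_counter()
    subprocess.run([sys.executable, "-S", "-c", "pass"], check=True)
    return time.perf_counter() - start

def time_thread_start():
    # Start and join a thread that does nothing, inside the current interpreter
    start = time.perf_counter()
    t = threading.Thread(target=lambda: None)
    t.start()
    t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"new interpreter process: {time_process_start() * 1000:.1f} ms")
    print(f"new thread:              {time_thread_start() * 1000:.3f} ms")
```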

So you can write concurrent code in CPython, but you have to plan carefully, favoring long-running processes that share few objects with each other.

Another alternative is to use a third-party library such as Twisted.

In short, multithreading in CPython is easy but not truly concurrent, while multiprocessing is concurrent but carries significant overhead.

Is there a better solution?

The clue to bypassing the GIL is in its name: the global interpreter lock is part of the global interpreter state. A CPython process can have multiple interpreters, and therefore multiple locks, but this feature is rarely used because it is only exposed through the C-API.

One of the changes proposed for CPython 3.8 is PEP-554, which implements subinterpreters and adds an API for them through a new interpreters module in the standard library.

This makes it possible to create multiple interpreters in a single process from within Python. Another change planned for Python 3.8 is that each interpreter will have its own GIL.

Because the interpreter state contains the memory allocation arena, that is, the collection of all pointers to Python objects (local and global), a subinterpreter under PEP-554 cannot access the global variables of other interpreters.

As with multiprocessing, the way to share objects between interpreters is to serialize them and send them over some form of IPC (network, disk, or shared memory). Python has many ways to serialize objects, such as the marshal and pickle modules, and more standardized formats like json and simplexml. Each has its fans and critics, but all of them add overhead.
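As a rough illustration of that overhead, the sketch below round-trips the same payload through marshal, pickle, and json and reports the encoded size and elapsed time. The payload is made up and sticks to built-in types that all three modules can handle; only standard-library serializers are used.

```python
import json
import marshal
import pickle
import time

# Hypothetical payload: plain built-in types that all three modules support
payload = {"ids": list(range(10_000)), "name": "sensor-1", "ok": True}

def round_trip(dumps, loads, obj, repeat=20):
    """Serialize and deserialize `obj` `repeat` times; return (size, seconds, restored)."""
    start = time.perf_counter()
    for _ in range(repeat):
        blob = dumps(obj)
        restored = loads(blob)
    elapsed = time.perf_counter() - start
    return len(blob), elapsed, restored

serializers = {
    "marshal": (marshal.dumps, marshal.loads),
    "pickle": (lambda o: pickle.dumps(o, protocol=pickle.HIGHEST_PROTOCOL), pickle.loads),
    "json": (lambda o: json.dumps(o).encode(), json.loads),
}

if __name__ == "__main__":
    for name, (dumps, loads) in serializers.items():
        size, secs, restored = round_trip(dumps, loads, payload)
        assert restored == payload  # each format reproduces this payload exactly
        print(f"{name:8} {size:8} bytes  {secs * 1000:7.1f} ms")
```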

Ideally, interpreters would share objects through a shared, mutable memory space controlled by the main process. Objects could then be sent from the main interpreter and received by the other interpreters. This would be a memory-managed space of PyObject pointers that every interpreter can access, with the main process controlling the locks.

Such an API is still being developed, but it might look like this:

This example uses numpy and sends a numpy array over a channel, serialized with the marshal module. The subinterpreter then processes it (on a separate GIL), which makes this the kind of computationally intensive (CPU-bound) concurrency problem that subinterpreters suit.

The marshal module is fairly fast, but still not as fast as sharing objects directly from memory.
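The subinterpreter API discussed here was experimental and has changed since, so the sketch below shows only the marshal side of the pattern. A thread and two queue.Queue objects stand in for the subinterpreter and its channel; that substitution is an assumption for illustration. Because a thread shares the main GIL, this version does not actually run in parallel. It only shows that plain bytes cross the boundary in both directions.

```python
import marshal
import queue
import threading

def worker(inbox, outbox):
    # Stand-in for the subinterpreter: receive raw bytes, decode, compute, reply
    raw = inbox.get()
    numbers = marshal.loads(raw)
    result = [n * n for n in numbers]  # placeholder for real CPU-bound work
    outbox.put(marshal.dumps(result))

def send_and_receive(numbers):
    # Two queues play the role of a hypothetical channel in each direction
    to_worker, from_worker = queue.Queue(), queue.Queue()
    t = threading.Thread(target=worker, args=(to_worker, from_worker))
    t.start()
    to_worker.put(marshal.dumps(numbers))  # only bytes cross the boundary
    reply = marshal.loads(from_worker.get())
    t.join()
    return reply

if __name__ == "__main__":
    print(send_and_receive(list(range(5))))  # [0, 1, 4, 9, 16]
```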

PEP-574 proposes a new pickle [5] protocol (v5) that lets memory buffers be handled separately from the rest of the pickle stream. For large data objects, serializing everything in one go and then deserializing it in the subinterpreter would add a lot of overhead.
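Protocol 5 did land in Python 3.8. The sketch below is adapted from the example in the pickle module documentation. A bytearray subclass hands its memory to pickle as a PickleBuffer, so the pickle stream itself stays tiny and the data travels out-of-band through buffer_callback. Within a single process, the unpickled object is the original one, so no copy is made. Between interpreters or processes, you would instead have to move the buffers through shared memory or some other transport of your choice.

```python
import pickle
from pickle import PickleBuffer

class ZeroCopyByteArray(bytearray):
    """A bytearray that pickles its contents out-of-band (adapted from the pickle docs)."""

    def __reduce_ex__(self, protocol):
        if protocol >= 5:
            # Hand the raw buffer to pickle instead of copying it into the stream
            return type(self)._reconstruct, (PickleBuffer(self),), None
        # PickleBuffer is not allowed with protocols <= 4, so fall back to a copy
        return type(self)._reconstruct, (bytearray(self),)

    @classmethod
    def _reconstruct(cls, obj):
        with memoryview(obj) as m:
            obj = m.obj  # the object that actually owns the memory
            if type(obj) is cls:
                return obj  # the original buffer came back: no copy made
            return cls(obj)

if __name__ == "__main__":
    big = ZeroCopyByteArray(b"x" * 10_000_000)
    buffers = []
    stream = pickle.dumps(big, protocol=5, buffer_callback=buffers.append)
    print(len(stream), "bytes in the pickle stream,", len(buffers), "out-of-band buffer")
    restored = pickle.loads(stream, buffers=buffers)
    print(restored is big)  # True: the same object, not a 10 MB copy
```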

The new API could (hypothetically, as it is not merged yet) provide an interface like this:

Admittedly, this example uses the low-level subinterpreter API. If you have used the multiprocessing library, you will recognize some of the problems. It is not as simple as threading: you cannot (yet) simply say “run this function with this list of inputs in separate interpreters”.

Once this PEP is merged, I expect some of the other APIs on PyPI to adopt it as well.

Short answer: a subinterpreter costs more than a thread but less than a process.

Longer answer: an interpreter has its own state, so while PEP-554 makes it easy to create a subinterpreter, each one still needs to clone and initialize the following:

Modules in the __main__ namespace and importlib

The contents of the sys dictionary

Built-in methods (print, assert, etc.)


Core configuration

The core configuration can easily be cloned in memory, but imported modules are not so simple. Importing modules in Python is slow, so if creating a subinterpreter means importing modules into another namespace every time, the benefits shrink.

The asyncio event loop in the current standard library creates frames to be evaluated, but shares state within the main interpreter (and therefore shares the GIL).
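Here is a small sketch of the single-thread point. Several coroutines are scheduled with asyncio.gather, and each one records the ID of the OS thread it runs on, both before and after yielding to the event loop. The task names are arbitrary.

```python
import asyncio
import threading

async def task(name, seen):
    # Record which OS thread runs this coroutine, before and after yielding
    seen.add(threading.get_ident())
    await asyncio.sleep(0)
    seen.add(threading.get_ident())
    return name

async def main():
    seen = set()
    names = await asyncio.gather(*(task(f"t{i}", seen) for i in range(5)))
    return names, seen

if __name__ == "__main__":
    names, seen = asyncio.run(main())
    print(names)      # ['t0', 't1', 't2', 't3', 't4']
    print(len(seen))  # 1: every coroutine ran on the same thread
```

Because every coroutine shares one thread (and one GIL), a CPU-bound coroutine blocks all the others. Today the usual workaround is to hand such work to loop.run_in_executor() with a concurrent.futures.ProcessPoolExecutor.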

After PEP-554 is merged, most likely in Python 3.9, an alternative event loop implementation could run async methods inside subinterpreters, and therefore concurrently (although nobody has built one yet).

Is it ready to use, then? No, not yet.

Because CPython has run with a single interpreter for so long, many parts of the codebase use “Runtime State” instead of “Interpreter State”, so merging PEP-554 in its current form would still cause many problems.

For example, the garbage collector’s state (in 3.7 and earlier) belongs to the runtime.

During the PyCon sprints (translator’s note: this refers to the official PyCon held in the United States, May 1 to 9, 2019. A sprint is a 1 to 4 day event where developers volunteer to join a project for a burst of “sprint” development; agile teams use the term a lot, with a slightly different meaning and form), work has already started [6] on moving the garbage collector’s state to the interpreter, so that each subinterpreter will have its own GC (as it should).

Another problem is that some “global” variables still linger in the CPython codebase and in many C extensions. So when people suddenly start writing properly concurrent code, we may run into issues.

Another problem is that file handles belong to the process, so if one interpreter has a file open for writing, a subinterpreter will not be able to access that file (without further changes to CPython).

In short, there are many other things that need to be addressed.

The GIL will still be around for single-threaded applications. So even once PEP-554 is merged, single-threaded code will not suddenly become concurrent.

If you want concurrent code in Python 3.8 and you have CPU-bound concurrency problems, this could be just the ticket!

Pickle v5 and shared memory for multiprocessing will likely land in Python 3.8 (October 2019), and subinterpreters could arrive somewhere between 3.8 and 3.9.
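Shared memory for multiprocessing did ship in Python 3.8 as multiprocessing.shared_memory. In the minimal sketch below, one handle creates a named block and writes into it, and a second handle attaches by name and reads the same bytes. In real code the second handle would live in another process. Here both are in one process only to keep the example self-contained, and the helper name is made up.

```python
from multiprocessing import shared_memory

def demo_shared_memory(data: bytes) -> bytes:
    # Create a named block of shared memory big enough for `data` (must be non-empty)
    shm = shared_memory.SharedMemory(create=True, size=len(data))
    try:
        shm.buf[:len(data)] = data  # write directly into the shared block
        # Another process would attach by name like this; we do it in the same
        # process only to keep the sketch self-contained
        other = shared_memory.SharedMemory(name=shm.name)
        try:
            return bytes(other.buf[:len(data)])  # read through the second handle
        finally:
            other.close()
    finally:
        shm.close()
        shm.unlink()  # free the block once no process needs it

if __name__ == "__main__":
    print(demo_shared_memory(b"hello from shared memory"))
```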

If you want to try my example now, I have built a branch with all the necessary code [7].

[1] Has the Python GIL been slain?
[2] Was written in 2009
[3] This is important
[4] This link
[5] PEP-574 proposes a new pickle
[6] The change has begun
[7] Necessary code
