It's somewhat similar to the GIL removal effort in Ruby [1]
They are isolating the GIL into Guilds there, which are containers for language threads sharing the same GIL. They provide two primitives for communication between threads in different guilds: send, for immutable data (zero-copy), and move, for mutable data (copy). They remove the need for boilerplate marshalling and unmarshalling code. However, I bet there will be some library to hide that code in Python too.
[1] http://www.atdot.net/%7Eko1/activities/2018_RubyElixirConfTa...
> This, in turn, means that Python developers can utilize async code, multi-threaded code and never have to worry about acquiring locks on any variables or having processes crash from deadlocks.
Dangerous advice. Whether this is true depends on many things, such as how many and which operations you're doing on those variables.
Sure, CPython might do lots of simple operations atomically, but that is not enough to avoid the need for all locks. Threads can still interleave their execution in many ways.
See also: https://blog.qqrs.us/blog/2016/05/01/which-python-operations...
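To make this concrete, here's a minimal sketch (hypothetical worker code, not from the article): `counter += 1` compiles to a separate read, add, and write, and the GIL can switch threads between those steps, so a lock is still needed.

```python
import threading

counter = 0
lock = threading.Lock()

def worker(n: int) -> None:
    global counter
    for _ in range(n):
        # Without the lock, "counter += 1" is a read-modify-write
        # that another thread can interleave with, even under the GIL.
        with lock:
            counter += 1

threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000 with the lock; without it, updates can be lost
```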
The current state of threading and parallel processing in Python is a joke. While they are still clinging to the GIL and single core performance, the rest of the world is moving to 32 core (consumer) CPUs.
Python's performance, in general, is crappy[1] and is beaten even by PHP these days. The people who suggest relying on multiprocessing probably haven't done anything CPU- and memory-intensive, because if you have code that operates on a "world state", each new process has to copy that state from the parent. If the state takes ~10 GB, each new process multiplies that.
Others keep suggesting Cython. Well, guess what? If I am required to use another programming language to use threads, I might as well go with Go/Rust/Java instead and save the trouble of dabbling with two languages.
So where does that leave (pure-)Python? It can only be used in I/O bound applications where the performance of the VM itself doesn't matter. So it's basically only used by web/desktop applications that CRUD the databases.
It's really amazing that the machine learning community has managed to hack around this with C-based libraries like SciPy and NumPy. Still, my suggestion would be to drop the GIL and copy whatever model has been working for Go/Java/C#. If you can't drop the GIL because some esoteric features depend on it, then drop those features as well.
[1] https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
This is essentially the same concurrency model as Workers in JS engines - on the one hand it’s a fairly limiting crutch[1], on the other hand it is harder to create a bunch of different classes of concurrency bugs.
[1] vs fully shared state of C-like, .NET, JVM, etc, etc. Rust's no-shared-mutable-state model allows it to do some fun stuff, but Python (and JS) don't really have a strong concept of mutable vs immutable, let alone ownership, so I don't think it would be applicable?
This is just a way to do the same thing as "multiprocessing", but with less memory usage. You still have multiple Python instances that send messages back and forth.
I wonder if they ever fixed the cPickle bug that broke it if you were using cPickle from multiple threads.
No, Mr. Click-baity-title, it's not. They're still there; you can just use many interpreters now, like one would with the multiprocessing module. I do like the idea of Go-like queues for message passing, though.
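For what it's worth, the Go-channel feel is easy to approximate today with `queue.Queue` between threads; presumably the cross-interpreter channels would look similar. A sketch (not the proposed API):

```python
import queue
import threading

def producer(ch: queue.Queue) -> None:
    for i in range(5):
        ch.put(i)      # like `ch <- i` in Go
    ch.put(None)       # sentinel standing in for closing the channel

def consumer(ch: queue.Queue) -> list:
    out = []
    while (item := ch.get()) is not None:  # like receiving until closed
        out.append(item)
    return out

ch = queue.Queue()
t = threading.Thread(target=producer, args=(ch,))
t.start()
results = consumer(ch)
t.join()
print(results)  # [0, 1, 2, 3, 4]
```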
From my limited understanding, Eric Snow's push to use subinterpreters moves the orchestration layer for multiple Python processes from the service layer into the language itself. It may also modularize the scope of Python's C API. And it may be one of the cheapest ways to provide true CPU-bound concurrency in Python, which matters given Python's limited resources.
Wow, just like Perl threads since Perl 5.8 (1). When in doubt, look at the granddaddy of scripting languages; all your trials and tribulations in scripting land have been considered in the past. Let's all sing 'Living in the Past' by Jethro Tull (2). This one is also good (3).
(1) https://perldoc.perl.org/threads.html
Tcl has had threads that were subinterpreters for a decade or more. I find it quite ironic that Python, it would seem, is reinventing this, only less elegantly.
This sounds like an application (or variation) of the apartment threading model[0]. Given the problem and its description/characteristics (Global Interpreter Lock), this sounds like an elegant approach.
[0] https://docs.microsoft.com/en-us/windows/desktop/com/process...
Racket's "places" work a similar way, though do a bit extra to get down to one memory copy, rather than two: https://www.cs.utah.edu/plt/publications/dls11-tsffd.pdf
There's nothing wrong with the GIL as long as you know it's there. It makes writing concurrent code in Python semi-magical, and that's a huge benefit. Concurrent != parallel, though, so if there's really a need to scale up to multiple cores, there's always the option of forking with multiprocessing or "sub-interpreters."
I can think of maybe having network code run in its own process and the UI in another. That way there's no risk of bottlenecks slowing down the UI, and transfers are likewise protected. If you look at bottle.py, it seems this approach could add A LOT of performance for managing downloads/uploads if it's done right.
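For the download case specifically, even plain threads go a long way, since blocking socket calls release the GIL. A rough sketch, where `download` is a stand-in for real network I/O:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def download(url: str) -> str:
    # Stand-in for a blocking network call; real socket I/O
    # releases the GIL while waiting, so threads overlap.
    time.sleep(0.05)
    return f"fetched:{url}"

def fetch_all(urls: list) -> list:
    # Overlap the blocking waits so the main (UI) thread isn't stalled.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(download, urls))

print(fetch_all(["a.example", "b.example"]))
```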
> Another issue is that file handles belong to the process, so if you have a file open for writing in one interpreter, the sub interpreter won’t be able to access the file (without further changes to CPython).
Wouldn't just using CLONE_FILES when forking off interpreters solve this problem?
> The GIL also means that whilst CPython can be multi-threaded, only 1 thread can be executing at any given time.
How does this make sense? What's the point of having multiple threads then?
"How much overhead does a sub-interpreter have? Short answer: More than a thread, less than a process."
So ... No.
Are there any overall benchmarks for Python 3.8 yet? I know there are a bunch of performance improvements for calling functions and creating objects, but I have no idea how that translates to real software.
Huh. This sounds a lot like Ruby Guilds. It looks like it will land sooner, though likely in less complete form, since even the prototype Guild implementation has inter-guild communication.
Some earlier coverage: https://lwn.net/Articles/754162/
Oof, that code-as-strings API guarantees I will never use it.
Same story for everyone: a small team bootstraps with Python. With success, they move to another language; these days, mostly Go.
Am I misreading, or does this say that I have to serialize and deserialize data within the same process?
> If you want truly concurrent code in CPython, you have to use multiple processes.
Uh what?
Wouldn't it be good to have a Python 4.x next with all these workarounds cleaned up and only one right, Pythonic way to do parallel processing? Surely it would be worth sacrificing a bit of backward compatibility, as with 2 vs. 3.
Larry Hastings' gilectomy project is an interesting approach.
https://lwn.net/Articles/754577/
TL;DR: simply replacing object reference counters with atomic versions grinds the interpreter to a halt.
Hmm... there's no mention of gevent. Does gevent share GIL state as well?
The ghost of Perl5 lives on...
Let it rest in peace please. All non-python devs know it's taking its last breaths. Give it some space. Python is dead, long live python.
Hm, this solution seems very cumbersome, inelegant, and not at all like Python's "batteries included" approach. It means Python will have native threads that behave as expected minus true parallel execution, so you shouldn't use those, even though the interface is fairly simple. Instead, you should learn to use this weird contraption that is neither multiprocessing nor intuitive multithreading and comes with a cumbersome interface.
I get that the GIL is a very hard problem to solve, but this solution is so inelegant in my eyes that Python would be better off without it. I'd feel better if this were a hidden implementation detail that could be improved transparently. Just my two cents.