An easy way to concurrency and parallelism with Python stdlib

  • I have recently been doing what should be straightforward subprocess work in Python, and the experience is infuriatingly bad. There are so many options for launching subprocesses and communicating with them, and each one has different caveats and undocumented limitations, especially around edge cases: processes crashing, timing out, needing to be killed, getting stuck in native code outside of the VM, and so on.

    For example, some high-level options include Popen, multiprocessing.Process, multiprocessing.Pool, futures.ProcessPoolExecutor, and huge frameworks like Ray.

    multiprocessing.Process includes some pickling magic, and you can pick between multiprocessing.Pipe and multiprocessing.Queue, but you also need to use either multiprocessing.connection.wait() or select.select() to watch the process sentinel at the same time in case the process crashes. Which one? Well, connection.wait() will not be interrupted by an OS signal. It's unclear why I would ever use connection.wait() then; is there some tradeoff I don't know about?

    For my use cases, process reuse would have been nice, to be able to reuse network connections and such (useful even for a single process). Then you're looking at either multiprocessing.Pool or futures.ProcessPoolExecutor. They're very similar, except some bug fixes have gone into futures.ProcessPoolExecutor but not multiprocessing.Pool because...??? For example, if your subprocess exits uncleanly, multiprocessing.Pool will just hang, whereas futures.ProcessPoolExecutor will raise a BrokenProcessPool and the pool will refuse to do any more work (both of these are unreasonable behaviors IMO). Timing out and forcibly killing the subprocess is its own adventure with each of these, too. After some time period passes I no longer care about the result, and the worker may be stuck in C code, so I just want to whack the process and move on, but that is not at all trivial with either. (A rough sketch of handling this by hand with a bare multiprocessing.Process is at the end of this comment.)

    What a nightmarish mess! So much for "There should be one--and preferably only one--obvious way to do it"...my God.

    (I probably got some details wrong in the above rant, because there are so many to keep track of...)

    My takeaway: there is no "easy way to [process] parallelism" in Python. There are many different ways to do it, and you need to know the nuances of each, and how they address your requirements, to know whether you can reuse an existing high-level implementation or need to write your own low-level one.
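
    For what it's worth, the low-level pattern hinted at above looks roughly like this. It is only a sketch: the worker function, the Pipe-based protocol, and the TASK_DEADLINE value are all made up for illustration.

      import multiprocessing as mp
      from multiprocessing.connection import wait

      TASK_DEADLINE = 5.0  # hypothetical per-task deadline, in seconds

      def worker(conn):
          # Stand-in for real work; imagine it could crash or hang in C code.
          conn.send(sum(i * i for i in range(10_000_000)))
          conn.close()

      if __name__ == "__main__":
          recv_end, send_end = mp.Pipe(duplex=False)
          proc = mp.Process(target=worker, args=(send_end,))
          proc.start()
          send_end.close()  # the parent keeps only the receiving end

          # Wait on the pipe AND the process sentinel, so a crashed child
          # or a blown deadline is noticed instead of blocking forever.
          ready = wait([recv_end, proc.sentinel], timeout=TASK_DEADLINE)

          if not ready:                 # deadline passed: whack it and move on
              proc.terminate()
              proc.join()
              print("timed out, worker killed")
          elif recv_end in ready:       # normal result arrived
              print("result:", recv_end.recv())
              proc.join()
          else:                         # sentinel fired first: worker crashed
              proc.join()
              print("worker died with exitcode", proc.exitcode)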

  • I know this article is all about the stdlib, but having built multiple multiprocess applications with Python, I eventually built a library, QuasiQueue, to simplify the process. I've written a few applications with it already.

    https://github.com/tedivm/quasiqueue

  • Thank you for the article.

    I use multiprocessing and I am looking forward to the GIL removal.

    I would really like library writers and parallelism experts to think about modelling computation in such a way that arbitrary programs, written in this notation, can be sped up without async, parallelism, or low-level synchronization primitives spreading throughout the codebase and increasing its cognitive load for everybody.

    If you're doing business programming and you're using Python threads or processes directly, I think we're operating at the wrong level of abstraction, because our tools are not sufficiently abstract. (It's not your error; it's just not ideal where our industry is at.)

    I am not an expert, but parallelism, coroutines, and async are a hobby of mine that I journal about all the time. I think a good approach to parallelism is to split your program into a tree dataflow and never synchronize. Shard everything.

    If I have a single integer value whose update throughput I want to scale by the number of hardware threads on my multicore, SMT CPU, I can split the integer into that many shards and apply updates in parallel. (If you have £1000 in a bank account and 8 hardware threads, you split the account into 8 accounts holding £125 each; then you can serve 8 transactions simultaneously.) Periodically, each of those threads posts its value to a ring buffer, and a thread servicing that ring buffer sums them all for a global view. This provides an eventually consistent view of an integer without slowing down throughput. (A rough sketch is at the end of this comment.)

    Unfortunately multithreading becomes a distributed system and then you need consensus.

    I am working on barriers inspired by bulk synchronous parallelism, where you have parallel phases and synchronization phases, and on an async pipeline syntax (see my previous HN comments for notes on this async syntax).

    My goal would be that business logic can be parallelised without you needing to worry about synchronization.
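
    A rough sketch of the sharded-integer idea from the bank-account example above, with illustrative names (NUM_SHARDS, OPS_PER_THREAD). Note that under the current GIL, pure-Python threads won't actually run these updates in parallel; the sketch only shows the structure, and the throughput argument assumes real parallelism (free-threaded builds, native code, or per-process shards).

      import threading

      NUM_SHARDS = 8            # e.g. one shard per hardware thread
      OPS_PER_THREAD = 100_000

      # £1000 split into 8 shards of £125; each shard is touched only by
      # its own thread, so the hot path needs no lock at all.
      shards = [125] * NUM_SHARDS

      def worker(shard_id):
          for _ in range(OPS_PER_THREAD):
              shards[shard_id] += 1   # uncontended update to this thread's shard

      threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_SHARDS)]
      for t in threads:
          t.start()
      for t in threads:
          t.join()

      # Eventually consistent global view: fold the shard values together.
      print("total:", sum(shards))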

  • If I need concurrency these days, I just write it in Golang. My primary use for Python was one-off scripts for cloud management / automation tasks. Today I write maybe 70% Golang and 30% Python.

  • Does not seem exactly like an easy way to me. Not super hard, surely, but not "easy". More like "moderately easy to do and a bit annoying to implement".

    Something very similar could probably have been written in Golang with 20% of the effort shown in this post, and it would have taken less time, too. Because the way I see it, this is trying to emulate futures / promises (and it looks like it's succeeding, at least on the surface). That can spiral out of comfortable, maintainable code territory pretty quickly.

    But especially for something as trivial as a crawler, I don't see the appeal of Python. You've got a good deal of languages with lower friction for doing parallel stuff nowadays (Golang, Elixir, Rust if you want to cry a bit; hell, even Lua has some parallel libraries nowadays; Zig, Nim...).

  • This is a really nice little guide. Much thanks to the author. Sometimes you just need to hit a bunch of APIs independently and don't want to switch your entire architecture around to do so.

  • Awesome article; I use this approach a lot in a Python project at work, and it's quite nice how simple it is. I'm trying to replicate the Python code in Rust and it is slightly slower, though that's more than likely my fault, as I'm new to Rust.

  • Is there a way to add tasks with independent timeouts using only the Python stdlib? I was reading a piece of code yesterday that had `pebble` as a dependency, and it looked like it was only needed for `pool.schedule(..., timeout=1)`.

  • The article shows how to use ThreadPoolExecutor, but that's not fully parallel. For that, you need multiprocessing.Pool, which is slightly easier to use anyway, unless your data happens to be non-pickle-able.
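
    For example, a sketch of the difference with a made-up CPU-bound workload: the same function run through both pool types shows threads serialized by the GIL, while processes use multiple cores at the cost of pickling arguments and results. multiprocessing.Pool.map would behave much like the process executor here.

      import time
      from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

      def cpu_bound(n):
          # Pure-Python busy work: holds the GIL for its whole duration.
          return sum(i * i for i in range(n))

      def timed(executor_cls, label):
          start = time.perf_counter()
          with executor_cls(max_workers=4) as pool:
              list(pool.map(cpu_bound, [2_000_000] * 4))
          print(f"{label}: {time.perf_counter() - start:.2f}s")

      if __name__ == "__main__":
          timed(ThreadPoolExecutor, "threads (GIL-bound)")
          timed(ProcessPoolExecutor, "processes (parallel, args must pickle)")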

  • When dinking around in IPython, you need to use a fork of the "multiprocessing" library called "multiprocess".

    Parallelism in a Notebook isn't for everyone, but how would these changes affect it?

  • > For those, Python actually comes with pretty decent tools: the pool executors.

    Delusion level: max.

    You have to be in a very, very bad place for this marginal improvement over the absolute horror show that bare Process offers to seem "pretty decent".

    Python doesn't have good tools for parallelism / concurrency. It doesn't have average tools. It doesn't even have bad tools. It has the worst. Though, unfortunately, it's not the only language in this category :(

  • Maybe I missed it, but how do the threads circumvent the GIL?

    > When a request is waiting on the network, another thread is executing.

    I'm guessing this is the meat of it, but what controls that? What other operations release the GIL so another thread can run?

  • So what is the consensus view on how to do parallelism in Python if you just have something that is embarrassingly parallel, with no communication between processes necessary?

  • Don't see MPI. Can skip this article.

  • The easiest and most modern way is simply to use asyncio...
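
    For the I/O-bound case the article covers, that looks roughly like the sketch below (URLS is a placeholder, and asyncio.to_thread is used so the blocking stdlib urllib call doesn't stall the event loop). Note that asyncio gives concurrency for I/O, not CPU parallelism.

      import asyncio
      import urllib.request

      URLS = ["https://example.com"] * 5   # placeholder URLs

      def fetch(url):
          # Blocking stdlib call, pushed onto a worker thread below.
          with urllib.request.urlopen(url, timeout=10) as resp:
              return url, resp.status

      async def main():
          # Run the blocking fetches concurrently on the default thread pool.
          results = await asyncio.gather(*(asyncio.to_thread(fetch, u) for u in URLS))
          for url, status in results:
              print(status, url)

      asyncio.run(main())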