Missing piece: worker process pools, run_in_worker_process #5
An interesting question is how this interacts with task-local storage (#2). One possibility is that when creating task-local objects, they could be tagged as multiprocess-friendly or not.
Should I work on something like trio-multiprocessing?
@auvipy Well I mean... it's up to you :-). I think it would be great to have multiprocessing-like tools in Trio. There are lots of things that would be great though, so it really depends on whether it's something you want to work on :-).

Since I haven't looked at this issue in like, 2 years, let me update with some more recent thoughts:

I wouldn't try to use the `multiprocessing` module. I think I'd probably do it as a third-party library, instead of baking it into trio itself, because it's actually a pretty complex and self-contained project.

My first goal would be a really seamless `run_in_worker_process`. Next goal would probably be some kind of process pool support, to cache processes between invocations, because process startup is very expensive, much more so than threads.

Then if I wanted to get really fancy, I'd think about ways to seamlessly pass channel objects between processes, so that you can set up producers and consumers in different processes and they can talk to each other.
I understand, thanks for explaining. Celery needs a better billiard :)
Ah, I see! Well, then I guess you understand why I am hesitant to suggest using `multiprocessing`.
@njsmith I'd be curious to get your feedback on this proof-of-concept: https://github.com/ethereum/trinity/pull/1079/files

Specifically, I'm interested in what you think about how I leveraged the trio subprocess APIs to achieve the process isolation. This is probably something that I'll keep internal to our codebase for a bit while I polish it, but if this approach is something you think is acceptable for larger-scale use then I'd like to get it packaged up for others to use as well.
@pipermerriam I guess there are two important decisions that affect how you implement a `run_in_worker_process`-style helper: whether each invocation spawns a brand-new process or re-uses persistent worker processes, and whether the parent and child communicate over stdio or over some other channel.
It looks like in your prototype, you're going for "new process each time" and "use stdio". In that case, you can simplify the code a lot, to something like:

```python
import sys

import cloudpickle
import trio

async def run_in_worker_process(...):
    encoded_job = cloudpickle.dumps((async_fn, args, ...))
    # capture_stdout=True so that p.stdout contains the child's pickled return value
    p = await trio.run_process(
        [sys.executable, "-m", __name__], stdin=encoded_job, capture_stdout=True
    )
    return cloudpickle.loads(p.stdout)

if __name__ == "__main__":
    job = cloudpickle.load(sys.stdin.detach())
    retval = trio.run(...)
    cloudpickle.dump(retval, sys.stdout.detach())
```

If you want to re-use the processes but are happy with using stdio, then you need to add something like the framing protocol that you use in your prototype. But you don't need to create any pipes manually or anything – you can use the `stdin`/`stdout` streams that trio's `Process` object already gives you.

If you don't want to use stdio... well, I guess if you only care about one-shot processes then the simplest solution is to pass data through temp files :-). But let's say you choose the fanciest option, where you need persistent processes that you can send and receive messages to, and we use some other channel for communication. In this case you need to do something like:
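Whichever channel the parent and its workers end up talking over – a long-lived child's stdin/stdout or a separate socket – a message-framing layer is the common building block. As a rough illustration only (my own sketch, not part of the original suggestion, written against trio's public stream ABCs), a minimal length-prefixed framing layer could look like this:

```python
import struct

import trio

async def send_frame(stream: trio.abc.SendStream, payload: bytes) -> None:
    # Prefix each message with a 4-byte big-endian length so the receiver
    # knows where one pickled job or result ends and the next begins.
    await stream.send_all(struct.pack(">I", len(payload)) + payload)

async def _receive_exactly(stream: trio.abc.ReceiveStream, count: int) -> bytes:
    buf = bytearray()
    while len(buf) < count:
        chunk = await stream.receive_some(count - len(buf))
        if not chunk:
            raise EOFError("stream closed in the middle of a frame")
        buf += chunk
    return bytes(buf)

async def receive_frame(stream: trio.abc.ReceiveStream) -> bytes:
    (length,) = struct.unpack(">I", await _receive_exactly(stream, 4))
    return await _receive_exactly(stream, length)
```

The same two helpers work unchanged whether the streams are the `stdin`/`stdout` pair of a persistent child process or a `trio.SocketStream` over some other channel.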
@njsmith great feedback and lots of good pointers. I spent another few hours tinkering and have things in a state that I'm starting to be happier with. I'm not necessarily convinced that any of my current design decisions are exactly right, but behavior-wise it's quickly approaching the general-purpose API that I'm shooting for.

The code doesn't currently re-use child processes, but I've had that functionality in mind and I'm reasonably confident that it can be added without too much effort. For my specific use case this isn't a high-value feature, but I know there are plenty of use cases where eliminating this overhead makes offloading CPU-bound work worthwhile, so it's definitely on the roadmap. Again, curious to hear any new thoughts you might have, but I don't want to take up too much of your time.

Update: The previously linked PR is still a fine way to view the code, but I've moved it to https://github.com/ethereum/trio-run-in-process, which has the code in a more organized fashion.
Celery could greatly benefit from this.
It'd be nice to have a clean way to dispatch CPU-bound work. Possibly based on `multiprocessing`.
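Until something along these lines exists in trio itself, one loosely related workaround is to combine the standard-library `ProcessPoolExecutor` with `trio.to_thread.run_sync`. This is a sketch of my own, not a trio API: it assumes a picklable, module-level function and does no cancellation handling, and `run_in_worker_process` is just a hypothetical helper name here.

```python
import concurrent.futures

import trio

def fib(n: int) -> int:
    # Stand-in for real CPU-bound work; must be picklable (module-level).
    return n if n < 2 else fib(n - 1) + fib(n - 2)

async def run_in_worker_process(executor, sync_fn, *args):
    # Submit the job to the process pool, then wait for the blocking
    # future.result() call in a trio worker thread so the event loop
    # stays responsive while the child process does the work.
    future = executor.submit(sync_fn, *args)
    return await trio.to_thread.run_sync(future.result)

async def main():
    with concurrent.futures.ProcessPoolExecutor() as executor:
        print(await run_in_worker_process(executor, fib, 30))

if __name__ == "__main__":
    trio.run(main)
```

Cancellation does not propagate to the child process here, which is exactly the kind of seam a real `run_in_worker_process` would need to handle.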