Subprocess support: run_process() #872
Conversation
Since we're now deviating from stdlib subprocess in a few different ways, also update the subprocess documentation and docstrings to move away from the "just like stdlib subprocess, only async" framing.

Closes #822.
Codecov Report
```diff
@@            Coverage Diff             @@
##           master     #872      +/-   ##
==========================================
- Coverage   99.54%   99.52%   -0.02%
==========================================
  Files         102      102
  Lines       12392    12300      -92
  Branches      935      919      -16
==========================================
- Hits        12335    12242      -93
  Misses         36       36
- Partials       21       22       +1
```
Thanks for moving this forward! Here are some comments focused on the high-level design and docs. (I haven't done a close-reading of the code yet, though nothing jumped out at me either.)
|
||
.. automethod:: terminate | ||
|
||
.. automethod:: send_signal |
Does `.. autoclass::` with `:members:` not pick these up for some reason...?
I wanted to be able to control the order, and I couldn't get them to go in non-alphabetical order using `:members:`. (This despite our docs/source/conf.py having `autodoc_member_order = "bysource"`, and also trying an explicit `:member-order: bysource`.)
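For reference, the two settings in question look like this (a sketch; the directive form lives in the `.rst` sources, so it's shown as a comment here, and `trio.Process` is just the class under discussion):

```python
# docs/source/conf.py -- the global autodoc ordering setting mentioned above:
autodoc_member_order = "bysource"

# The per-directive equivalent that was also tried (rst, not Python):
#
#   .. autoclass:: trio.Process
#      :members:
#      :member-order: bysource
```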
Ah, yeah, that's python-trio/sphinxcontrib-trio#13
This was already fixed back in August, we just never released it... whoops. I just released that, so hopefully bysource works now.
I think you can drop these now; `"bysource"` should work now. (And if it doesn't I want to know ;-))
trio/_subprocess.py
Outdated
```
more specific error check if you're planning on sending any.

**options: :func:`run_process` also accepts any :ref:`general subprocess
  options <subprocess-options>` and passes them on to the
  :class:`~trio.Process` constructor.
```
I'm having trouble thinking of cases where getting access to the `Process` object is the natural way to accomplish something... if someone's using `run_process` to pause a task to run a process, then it feels like the natural way to cancel that is to, y'know, cancel it :-). And TOOWTDI?

There's a long tradition in other libraries of having lots of kinds of "thing you run" and giving each its own ad hoc cancellation API, which I've tried to avoid succumbing to...
I could imagine wanting to send SIGSTOP/SIGCONT, or to send some graceful shutdown signal (since we dropped the explicit support for that), or to have access to the PID for various reasons. I guess it's true that many of these wouldn't overlap with output capturing, and with no capturing and `check=False` run_process is basically just `async with Process(...) as proc: pass`, so maybe it's more of an attractive nuisance than it is a useful tool? Happy to remove the `start()` support if you think it's not worth it after reading this explanation.
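For concreteness, a sketch of what that `start()` support would let a caller do (the `run_process` import and its `task_status` behavior are the PR's proposal, not a released API; `functools.partial` is needed because `nursery.start` only takes positional arguments):

```python
import functools
import signal
import trio
from trio import run_process  # as proposed in this PR

async def main():
    async with trio.open_nursery() as nursery:
        # start() hands back the Process object while run_process keeps
        # managing the child in the background; check=False so that
        # terminating the child below doesn't raise CalledProcessError.
        proc = await nursery.start(
            functools.partial(run_process, ["sleep", "60"], check=False)
        )
        print("child pid:", proc.pid)
        proc.send_signal(signal.SIGSTOP)   # pause the child
        await trio.sleep(1)
        proc.send_signal(signal.SIGCONT)   # resume it
        proc.terminate()                   # let the nursery block exit

trio.run(main)
```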
I think let's drop the `task_status` for now, and we can always add it back later when we have more experience.
docs/source/reference-io.rst
Outdated
```
so a Ctrl+C will be delivered to both the child process and the parent Trio
process simultaneously. If that's not what you want, you'll need to delve
into platform-specific APIs like (on UNIX) :func:`os.setpgrp` and
:func:`os.tcsetpgrp`.
```
- Why is this up here, instead of in the `run_process` docstring? Output capturing semantics seems like a good thing to describe in the docstring :-).
- I'm surprised at the switch to capturing by default. It doesn't seem obviously wrong to me, but it doesn't seem obviously right, either, and it's a pretty major change from `subprocess.run` so it needs some kind of rationale.
- The mention of `setpgrp` and `tcsetpgrp` seems to come out of nowhere here? I'm not sure if we should mention them at all, and if we do then probably it shouldn't be tacked onto the end of an unrelated paragraph :-)
My rationale for output capturing by default is:

- If you're writing a concurrent app, you might want to run multiple processes at once. That's maybe even more likely than that you'd be careful not to run multiple processes at once. If you run multiple processes at once, it's not really safe to not isolate their I/O from each other.
- If we have a flag for capturing output, people might use it but forget to pass `input=b""` to a process that they don't expect will read any input. If that process then does try to read input, it looks like a hang, because whatever prompt it printed before reading from stdin was captured and thus invisible to the user (see the sketch just below). So, capturing by default seems less error-prone than providing different flags for capturing output and for providing input.
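A minimal sketch of that failure mode; the tool name and the `capture_output` flag spelling are illustrative, since the exact option names were still in flux in this PR:

```python
import trio

async def main():
    # Hypothetical: output is captured, but stdin is left inherited. If
    # `some-tool` prompts for input, the prompt lands in the captured
    # stdout where the user can't see it -- the call just looks hung
    # while the child waits forever on the inherited stdin.
    result = await trio.run_process(["some-tool"], capture_output=True)
    print(result.stdout)

trio.run(main)
```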
Originally I had `run_process` and `delegate_to_process` as two separate functions, where `run_process` used isolated I/O and `delegate_to_process` did no redirections. I removed that because it seemed like too much API surface area given your other concerns. I could bring it back, or could add a `passthrough` option to `run_process`, where `passthrough=True` means don't do redirections and `passthrough=False` means send input and capture output. Or whatever name (and default) you think would be clearest? I guess a separate `delegate_to_process` function has the advantage that it gives us a place to hang additional cleverness that might be developed in the future for dealing with interactive subprocesses "properly" (there's a lot of complication in the TTY job control handling). But maybe that's premature and would be better handled in an outside library.
I implemented the `passthrough` option; let me know how you think it reads?
The `passthrough` option is exactly the same as the stdlib's `capture_output`, but backwards, right? If we do do this, then making the default `capture_output=True` seems strictly better than making it `passthrough=False` :-).

I think I might still prefer `capture_stdout=True, capture_stderr=False`, but before we dive into that can you clarify the handling of `input`? In the stdlib `run`, AFAICT `input=None` (the default) means we let the child inherit the parent's stdin, and you have to pass `input=b""` if you want to give it an empty pipe. Are we matching that, or are you thinking that `passthrough` would affect this, or...?
`passthrough=True` gives the stdlib defaults. `passthrough=False` is equivalent to `input=b""` if no `input` is specified, plus `capture_output=True`. I explained above in this thread why I think it's error-prone to have different isolation defaults for input vs output.

My preference would be to have one flag that controls "are the standard streams for this process basically isolated or are they basically inherited from the parent Trio process", and then people can modify the treatment of individual streams with the stdin/out/err options if they don't want all three streams treated the same way. If the standard streams are isolated and you didn't provide any input, you probably want the subprocess's input to come from /dev/null. It seems confusing to me to distinguish `input=None` from `input=b""` for that purpose.
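A sketch of the equivalences being proposed; `expand_passthrough` is a hypothetical helper, and `passthrough` itself was only a name under discussion in this PR:

```python
import subprocess

def expand_passthrough(passthrough, options):
    # Hypothetical: translate the single passthrough flag into explicit
    # per-stream settings, following the semantics described above.
    if passthrough:
        # stdlib defaults: the child inherits stdin/stdout/stderr.
        for stream in ("stdin", "stdout", "stderr"):
            options.setdefault(stream, None)
    else:
        # Isolated: empty stdin unless input= was given, both outputs captured.
        if "input" not in options:
            options["input"] = b""
        options.setdefault("stdout", subprocess.PIPE)
        options.setdefault("stderr", subprocess.PIPE)
    return options
```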
Thinking about this some more... I think actually that the mental distinction I'm having here is between pre-allocating consumers and post-allocating consumers, given the `run_process` high-level API.

I think at an intuitive level, I expect `async def run_process` to be atomic -- it's responsible for starting, managing, and closing the process; when it returns, the process is definitely finished, and before you call it, the process definitely hasn't started (plus all of the other usual trio semantics).

So then if you want to combine that high-level API with eager/incremental processing of output/input, then you're (by definition) forced to preallocate your stdin producers and stdout/err consumers using a nursery, which you then need to coordinate with the `run_process` API somehow so that they get hooked up correctly. That, ultimately, is the API problem my wrapper is trying to solve.
Also partly relevant to the discussion in #959, I think my choice for a channel over a stream, even though the stream is quite literally the thing returned by the trio `Process`, is the result of two things (listed in order of importance):

- at least according to the docs, there's nothing suggesting I can pass an existing stream object to the `trio.Process` constructor, which means I'd have no way of preallocating the stream
- it seems much, much more natural (and easier!) to use channels for this, because of the `send_channel, receive_channel = trio.open_memory_channel()` function, which immediately allocates everything you need to do this coordination (and close it down easily)

Hopefully that explanation of my train of thought makes sense!
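A minimal sketch of the channel-based shape being described; `run_process_streaming` is a hypothetical wrapper name, and it uses the `trio.Process` constructor as it appears in this PR:

```python
import subprocess
import trio

async def run_process_streaming(command, stdout_channel):
    # Hypothetical wrapper: run the process to completion while forwarding
    # its stdout into a pre-allocated channel, then close the channel.
    async with stdout_channel:
        async with trio.Process(command, stdout=subprocess.PIPE) as proc:
            while True:
                chunk = await proc.stdout.receive_some(4096)
                if not chunk:
                    break
                await stdout_channel.send(chunk)

async def main():
    send_channel, receive_channel = trio.open_memory_channel(0)
    async with trio.open_nursery() as nursery:
        nursery.start_soon(run_process_streaming, ["ls", "-l"], send_channel)
        async for chunk in receive_channel:  # consumer runs concurrently
            print("got", len(chunk), "bytes")

trio.run(main)
```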
> at least according to the docs, there's nothing suggesting I can pass an existing stream object to the `trio.Process` constructor, which means I'd have no way of preallocating the stream
That's correct. It's possible in some exotic cases, but not in general.
> Thinking about this some more... I think actually that the mental distinction I'm having here is between pre-allocating consumers and post-allocating consumers, given the `run_process` high-level API.
Why do you prefer pre-allocating consumers, though? It seems like this is just as easy?
```python
async with trio.Process(...) as p:
    async with trio.open_nursery() as nursery:
        ...
```
Or if not, why not? (I guess two possibilities that come to mind: that doesn't do `check` and it doesn't kill the process on error. I sort of wonder if `Process.__aexit__` should kill the process if there's an exception, by analogy with nurseries cancelling siblings on error.)
@njsmith I don't necessarily prefer preallocating consumers. The goal with my wrapper was to provide a high-level API that does `check`, handles `await process.wait()`, kills the process on error/cancellation/etc, all within a single `await run_process` call. As far as I'm aware, the only possible way to combine those semantics with incremental processing (of either input or output) is to preallocate, so that's what the API ends up exposing. Does that make sense?
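For concreteness, a minimal sketch of those semantics packed into a single awaitable; `my_run_process` is hypothetical, error handling is simplified, and it assumes the direct `Process` constructor as used in this PR:

```python
import subprocess
import trio

async def my_run_process(command, **options):
    # One await that starts the child, waits for it, kills it on
    # cancellation, and applies check=True behavior on a nonzero exit.
    proc = trio.Process(command, **options)
    try:
        returncode = await proc.wait()
    except trio.Cancelled:
        proc.kill()
        with trio.CancelScope(shield=True):
            await proc.wait()  # reap the child before propagating
        raise
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, command)
    return returncode
```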
That's true, if the requirement is that it be a single `await my_run_process(...)`, then you have to preallocate. But what I'm still not sure of is: why is it important that it be a single `await my_run_process(...)`, as compared to a single `async with my_run_process(...)`?
I mean, at the end of the day it's just my own personal API preference, so it deserves a heaping mound of salt. But I think my preference for an atomic `my_run_process(...)` over an async context manager is that the context manager introduces a second concurrency mechanism to trio -- it's behaving both like a subprocess spawner and a nursery, and my personal preference would be to keep those two separate. It was also my experience that the context manager resulted in more boilerplate, but I think that's the weaker point so I'm not sure it's worth exploring very far.

Like I said though, I'm just one person, and I always reserve the right to be wrong haha. Just thought I'd add some thoughts into the mix!
@njsmith - I think I've addressed all the outstanding comments here; could you take another look?
Some line comments here; the big question is about `passthrough`, and github's UI doesn't seem to let me respond to that thread in this review interface, so I'll post separately.
For some reason github isn't letting me reply to or resolve your last two line comments inline
I just upgraded my sphinx and sphinxcontrib-trio (the latter to 1.0.2, which supposedly includes the fix?) and it still seems to enforce alphabetical ordering.
Ok!
@njsmith: heads up that this PR is currently awaiting a response to the proposed redirection semantics in #872 (comment). No rush if you're busy, just wanted to poke since I find it easy to accidentally lose comments in github threads and I figured there was a chance that that had happened here.
Sorry for the delay in responding - how does this iteration look?
I'm still on the fence about `capture`. Do we really need it?

And if we do need it, I think `capture_both` is a much better name... I worry that people will write or read `capture` and accidentally swallow stderr again.
-1 on `capture`.
Arguably the best way to do that is `stderr=subprocess.STDOUT`.
All right, I'll concede that my bias towards output capturing seems to be in the minority around here. :-) Have removed `capture`.
For me this is critical and ~always what I do because it's the easiest way to debug when things go wrong. I've always found the stderr on its own with no surrounding context to be pretty useless for debugging purposes. Is there a way to do this with the current design? From a quick look I can't see how?
Well, I would – if both stdout and stderr were either line buffered or unbuffered. Unfortunately, that's rarely the case; typically stderr is either line buffered or unbuffered, no matter where it's going, while stdout's buffer is some-power-of-two when it's not going to a terminal. |
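For reference, a sketch of how the merge could look, assuming `stdout`/`stderr` options pass through to `Process` and a `capture_stdout`-style flag as discussed above:

```python
import subprocess
import trio

async def main():
    # Merge stderr into the stdout pipe so both streams are captured
    # interleaved in a single buffer, with surrounding context intact.
    result = await trio.run_process(
        ["make", "test"],
        capture_stdout=True,
        stderr=subprocess.STDOUT,
        check=False,
    )
    print(result.stdout.decode())

trio.run(main)
```

The buffering caveat above still applies: the interleaving you get is whatever order the child actually flushed in, not necessarily the order it wrote in.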
Off-topic but sometimes useful hack: If your subprocess is python, then you can pass `-u` (or set `PYTHONUNBUFFERED=1` in its environment) to disable output buffering. The more general solution would be to allocate a tty instead of a pipe. That has some extra complications and I'm not sure if convenience functions for it are in-scope for trio or not, but if you want to open an issue to discuss then feel free.