End-of-file does not always mean that the file can no longer be read from #103436
Comments
1. In the doc it says, once EOF is received, no more These are clearly documented, widely used, and also implemented in other event-loop implementations. Thus, it is impractical to change. I am also intuitively against changing that.

2. In addition, I think the correct solution here is to call

    import sys
    import asyncio
    import uvloop

    async def read_stdin():
        loop = asyncio.get_running_loop()

        # First pass: read lines from stdin until EOF.
        reader = asyncio.StreamReader()
        protocol = asyncio.StreamReaderProtocol(reader)
        await loop.connect_read_pipe(lambda: protocol, sys.stdin)
        async for line in reader:
            print("read:", line.decode())
        print("EOF")

        # Second pass: connect a fresh reader to the same stdin.
        reader = asyncio.StreamReader()
        protocol = asyncio.StreamReaderProtocol(reader)
        await loop.connect_read_pipe(lambda: protocol, sys.stdin)
        async for line in reader:
            print("read:", line.decode())

    uvloop.install()
    asyncio.run(read_stdin())

This is exactly what you have expected, since

3. I think the main problem is that, Lines 487 to 498 in aa87432
I do not know why it is designed this way. (But removing all |
I am overwhelmed by the amount of detail in this bug report. Is there a specific use case you actively want? I am not interested in changing the code, possibly breaking existing uses, for some theoretical correctness reason. |
Apologies, I may have overdone it a little with the detail. Yes, the use case in question is:
Thank you for pointing me to this. I can better see now that this is an intentional design choice, however it is still quite limiting, as I hope I've illustrated. Datagram sockets can't be closed from the other side and can't encounter EOF, FIFOs can't be closed from the other side but can encounter EOF. One can be accommodated cleanly with this design, the other cannot.
That's the first workaround I attempted, which is when I figured out that the transport actually closes the file. I don't really find it intuitive -- the file I'm reading from is the same, EOF is just in-band data for a terminal, why should I keep manually re-adding it to the selection every time it happens? I suppose this is a matter of abstraction incompatibility -- asyncio protocols and transports look like they want to map neatly onto Unix concepts (streams, datagrams, pipes, etc.) but they don't.
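Because the transport closes the file object it is handed, one way to make the reconnect approach work on the default event loop is to give each transport a duplicate descriptor made with `os.dup`. The following is only a minimal sketch under stated assumptions: it uses a FIFO in a temporary directory instead of a terminal, it is Unix-only, and the helper names `read_until_eof`, `write_once`, and `dup_workaround_demo` are hypothetical.

```python
import asyncio
import os
import tempfile


async def read_until_eof(fd):
    """Read from a duplicate of fd until EOF, leaving fd itself open."""
    loop = asyncio.get_running_loop()
    reader = asyncio.StreamReader()
    protocol = asyncio.StreamReaderProtocol(reader)
    # The transport closes the file object it receives, so hand it a
    # duplicate descriptor rather than the original.
    pipe = os.fdopen(os.dup(fd), "rb", buffering=0)
    await loop.connect_read_pipe(lambda: protocol, pipe)
    return await reader.read()


def write_once(path, payload):
    """Open the FIFO for writing, send one payload, and close again."""
    fd = os.open(path, os.O_WRONLY)
    try:
        os.write(fd, payload)
    finally:
        os.close(fd)


async def dup_workaround_demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.fifo")  # hypothetical file name
        os.mkfifo(path)
        # O_NONBLOCK lets the reader open the FIFO before any writer exists.
        fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
        try:
            write_once(path, b"first\n")
            first = await read_until_eof(fd)
            write_once(path, b"second\n")  # a new writer appears after EOF
            second = await read_until_eof(fd)
            return first, second
        finally:
            os.close(fd)
```

Calling `asyncio.run(dup_workaround_demo())` reads both payloads through two separate transports. The cost is a new descriptor, reader, and protocol after every EOF.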
This is indeed the main cause of pain in this issue and the core thing that should be addressed. The rest, I suppose, could be written off as conceptual differences between asyncio and Unix, which I still feel are unfortunate, but maybe that's just me. It's interesting that uvloop doesn't have this issue but |
I would say that the terminal convention around ^D is the odd one out. Nothing else can receive data once EOF has been signaled -- with a socket or pipe the EOF is final. The StreamReader and pipe handling are designed around that (IMO more sensible) model. Maybe you can write your own transport for handling your use case? |
Not if it's a named pipe (aka FIFO), the behavior is different again for those. Much of my initial report is specifically about detailing those differences.
I definitely could write my own transport for it, but I find it weird that I am supposed to go to those lengths in order to read from a terminal. I encountered this issue by debugging the |
The Transports + Protocols architecture was copied from Twisted. I'd like to know what Twisted does in this case. I'd also like to receive less of an attitude of entitlement. There are exactly two remaining asyncio maintainers left, with very limited time and many other responsibilities; we have to prioritize. |
I am sorry that I came across as entitled, this is very much not what I want to be. Thank you for pointing this out to me, I will try to check my tone better going forwards. I don't call any shots here and I neither influence, nor really know what the priorities of the asyncio maintainers are. I have found a feature in asyncio that is easy to misuse, is misused in some relatively big projects, and requires a workaround to use correctly, and I wanted to report it so that asyncio could handle it in a way that better accommodates this use case. I understand that I will have to write a workaround in my code for the current behavior either way, because even if asyncio was changed to accommodate this use case tomorrow, users can't be expected to immediately update to the latest version of Python. What I ultimately want is to be helpful, and I apologize if I failed at that.
Similarly to asyncio, Twisted reports I also checked how it's done in Trio, and they don't special-case EOF when reading from a pipe, and the docs make it clear that this API is intended to be used with TTYs. |
Sure, thanks for understanding.
The function you link to just reads from the fd and calls the callback if any data was read. It looks like this convention is actually at the heart of the problem -- when no data is read, the data callback is not called, and instead some other protocol callback is called that indicates EOF. I guess there's a grammar or state diagram for which protocol methods can be called in which order, and it's something like "dataReceived* connectionLost". What you are looking for would correspond to "(dataReceived | connectionLost)*", but that leaves it unclear to the protocol when it is really done. Calling the callback with an empty argument would break all the (Twisted) code in the world so also wouldn't be a good idea.
But Trio doesn't use the Transport/Protocol abstractions, does it? Its Anyway, I'm not sure how we could fix this in asyncio without breaking existing, working code. When reading from an actual pipe (the intended use case for As with Twisted, I don't see how we could make it do what you want -- we can't call In your application, perhaps you could work around it by using Unless you have a brilliant fix in mind that you were waiting to pull out of your sleeve when all other options are exhausted? :-) |
FWIW there is an interesting feature on Linux: you can reopen a pipe from procfs, for example:

    >>> import os
    >>> r, w = os.pipe()
    >>> os.close(w)
    >>> os.read(r, 16)
    b''
    >>> w = os.open(f"/proc/self/fd/{r}", os.O_WRONLY)
    >>> os.write(w, b"hello")
    5
    >>> os.read(r, 16)
    b'hello'

BTW, you can open a pipe even with |
Interesting, I didn't know that. Thank you!
Yes, that is exactly what happens -- the reactor sees the truthy return value and removes the file descriptor from selection, calling
This is very true. While I originally thought
I don't know how crucial it is to the design that the transport owns the file descriptor (uvloop doesn't seem to do that, as sunmy2019 mentioned earlier in the thread), but honestly even if it didn't the API is very much suited for reading specifically from an unnamed pipe and not a terminal.
I'll have to think how best to do that, it's one of the options I've been considering. I thought that writing my own Stream-like class that uses the API for watching file descriptors directly could be cleaner, though it can't exactly conform to the asyncio StreamReader and StreamWriter API, as it also assumes that an EOF is final.
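A rough sketch of that idea, watching the descriptor directly with `loop.add_reader` and reporting EOF as an ordinary empty read, might look like the following. The name `read_chunk` and the FIFO-based demo are assumptions made for illustration. The descriptor is assumed to be non-blocking already, and the code is Unix-only.

```python
import asyncio
import os
import tempfile


async def read_chunk(fd, size=4096):
    """Wait until fd is readable and return one chunk; b"" means EOF right now."""
    loop = asyncio.get_running_loop()
    while True:
        ready = loop.create_future()
        # Register the descriptor only while a read is pending, so a FIFO
        # that stays at EOF does not wake the loop continuously.
        loop.add_reader(fd, lambda: ready.done() or ready.set_result(None))
        try:
            await ready
        finally:
            loop.remove_reader(fd)
        try:
            return os.read(fd, size)
        except BlockingIOError:
            continue  # spurious wakeup, wait again


async def add_reader_demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.fifo")  # hypothetical file name
        os.mkfifo(path)
        fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
        try:
            writer = os.open(path, os.O_WRONLY)
            os.write(writer, b"first")
            os.close(writer)
            seen = [await read_chunk(fd), await read_chunk(fd)]
            # The descriptor is still open and usable after EOF.
            writer = os.open(path, os.O_WRONLY)
            os.write(writer, b"second")
            os.close(writer)
            seen.append(await read_chunk(fd))
            return seen
        finally:
            os.close(fd)
```

Here the caller decides what an empty read means. That is the part the `StreamReader` API cannot express, since it treats EOF as final.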
I suppose the only fix would be to add a new API that is specifically designed for reading from and writing to terminals, though I understand that this is a lot of design and implementation work 😅 If you are not inclined to do that, which I would very much understand, that kind of leaves asyncio with an API that can be used for terminals, but is misleadingly not quite complete. Perhaps a note could be added to the documentation for |
But it's totally fine unless you want to keep reading after receiving EOF, right? At best we should warn in the docs about that. And indeed I don't want asyncio to grow another API for this corner case. It really belongs in a 3rd party package. IMO we already provide too many APIs (fortunately, at least HTTP support is 3rd party, but unfortunately TLS support is baked in). |
That is correct. Is that documentation addition something that you would like me to do as a PR? |
@micha030201 Sure, send a PR and I'll review it. |
@micha030201 Please do send in a docs PR. That seems the best way to move forward with this issue. Thanks! |
Background
An "end-of-file condition" or EOF in POSIX is what happens when a process makes a read() call on a file descriptor and gets a return value of 0. Files can sometimes continue after an end-of-file condition has been encountered.
asyncio's connect_read_pipe on Unix can take files of three types -- FIFO (or unnamed pipe [1]), socket, and character device. EOF means different things in each of these file types, so let's go through them one by one.
Socket
In docs to read():
In docs to recv():
What that means, however, varies based on the socket type. There are 4 of them: SOCK_DGRAM, SOCK_RAW, SOCK_SEQPACKET, and SOCK_STREAM [4]. Let's go through them.
SOCK_SEQPACKET and SOCK_STREAM
These are connection-mode, so when whoever we've been talking to shuts the socket down, the connection is closed and no more data will be sent.
Well, almost -- from the Linux man pages:
But provided that we don't call read() with length 0, EOF means that the connection is closed.
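For the connection-mode case, a small sketch with `socket.socketpair()` (which defaults to a connected pair of stream sockets) shows EOF being final once the peer is gone. The function name `stream_eof_demo` is only illustrative.

```python
import socket


def stream_eof_demo():
    # socketpair() returns two already-connected stream sockets.
    left, right = socket.socketpair()
    try:
        left.sendall(b"hello")
        left.close()             # the peer closes its end of the connection
        first = right.recv(16)   # data sent before the close is still delivered
        second = right.recv(16)  # then an empty result: EOF
        third = right.recv(16)   # and EOF again on every later call
        return first, second, third
    finally:
        right.close()
```

Every read after the first empty result is also empty, which is the model that asyncio's stream handling is built around.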
SOCK_DGRAM and SOCK_RAW
These, however, are connectionless-mode, so they can't be shut down from the other side, and thus we would never encounter EOF.
With datagram sockets there is a small caveat. In the Linux manual page for recv:
However, on Linux that is not a problem, because it breaks POSIX-compatibility in this instance:
So on Linux, we're still fine. On a POSIX-compatible system, we would confuse receiving a zero-length datagram with a connection being closed.
Raw socket datagrams always include the IP header, so their length can never be 0 [6].
FIFO/pipe
In docs to read():
What this says is that when a reader encounters EOF on a pipe, it means that all writers have closed their FIFO or pipe file descriptors. It will keep encountering EOF with every call to read() while there are no writers.
If it's an unnamed pipe, it means that its write end is lost forever, because it can't be referred to in any other way than its file descriptor. If it's a FIFO, however, a new process can open it for writing and our reader will stop encountering EOF and will be able to proceed as it had before.
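The FIFO case can be reproduced with plain `os` calls. This is a minimal sketch assuming a Unix system where `os.mkfifo` is available. The function name `fifo_eof_demo` and the file name are made up for the example.

```python
import os
import tempfile


def fifo_eof_demo():
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.fifo")  # hypothetical file name
        os.mkfifo(path)
        # O_NONBLOCK lets the reader open the FIFO before any writer exists.
        reader = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
        try:
            writer = os.open(path, os.O_WRONLY)
            os.write(writer, b"first")
            results.append(os.read(reader, 16))  # data from the first writer
            os.close(writer)
            results.append(os.read(reader, 16))  # no writers left: EOF
            # A new writer opens the same FIFO and the reader carries on.
            writer = os.open(path, os.O_WRONLY)
            os.write(writer, b"second")
            results.append(os.read(reader, 16))
            os.close(writer)
        finally:
            os.close(reader)
    return results
```

The same reader descriptor sees data, then EOF, then data again, without being reopened.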
Character device
In docs to read():
That means that Linux or BSD or whoever can figure out whatever each of them wants to do when a reader encounters EOF on a character device. I have not tested any other systems, but Linux seems to generate an EOF exactly once, and then, if the device is still operational, all subsequent reads will proceed as normal.
Application
One of the consequences of that is that programs running in a terminal can keep reading from stdin and generally doing anything they want after the user presses Ctrl+D.
Some programs take advantage of that; one example is GDB -- if you press Ctrl+D while debugging a remote inferior, it will ask for confirmation and warn you that the inferior would be detached. If you change your mind and choose no, it will continue working as it was before.
It can do that without any issue because sending an EOF on a terminal is just a way to communicate with the process reading from it, it doesn't close the file or prevent further communication.
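To illustrate, here is a hedged sketch that uses a pseudo-terminal from the `pty` module as a stand-in for a real terminal. It assumes a Linux-like tty line discipline in canonical mode with Ctrl+D (`\x04`) as the EOF character, and the function name `terminal_eof_demo` is hypothetical. Other systems may behave differently.

```python
import os
import pty
import termios


def terminal_eof_demo():
    master, slave = pty.openpty()
    try:
        # Make sure the terminal is in canonical mode, with echo off and
        # Ctrl+D configured as the EOF character.
        attrs = termios.tcgetattr(slave)
        attrs[3] |= termios.ICANON
        attrs[3] &= ~termios.ECHO
        attrs[6][termios.VEOF] = b"\x04"
        termios.tcsetattr(slave, termios.TCSANOW, attrs)

        os.write(master, b"\x04")          # "press" Ctrl+D on an empty line
        at_eof = os.read(slave, 64)        # the reader sees EOF
        os.write(master, b"still here\n")  # the user keeps typing
        after_eof = os.read(slave, 64)     # and the reader keeps reading
        return at_eof, after_eof
    finally:
        os.close(master)
        os.close(slave)
```

The slave descriptor is never closed or reopened between the two reads. The EOF is just one more event delivered through it.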
asyncio behavior
When a _UnixReadPipeTransport in asyncio encounters EOF, the following happens:
cpython/Lib/asyncio/unix_events.py
Lines 527 to 532 in aa87432
It stops reading from that file (unregisters it from selection), and calls connection_lost on the protocol.
It also destroys itself and closes the file:
cpython/Lib/asyncio/unix_events.py
Lines 587 to 594 in aa87432
This behavior is correct when we're dealing with stream (or technically datagram, but in that case it would just never execute) sockets or unnamed pipes, but it's incorrect if we're reading from anything else -- a FIFO or a character device.
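The closing behavior can be observed from user code. The sketch below is an assumption-laden illustration and not taken from the CPython sources: it feeds a FIFO to `connect_read_pipe`, waits for EOF, and then checks whether the file object was closed. The name `eof_closes_pipe` and the file name are hypothetical, and the code is Unix-only.

```python
import asyncio
import os
import tempfile


async def eof_closes_pipe():
    loop = asyncio.get_running_loop()
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.fifo")  # hypothetical file name
        os.mkfifo(path)
        read_fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
        pipe = os.fdopen(read_fd, "rb", buffering=0)

        # A writer sends one line and goes away; the FIFO itself remains
        # on disk and could be opened by another writer later.
        write_fd = os.open(path, os.O_WRONLY)
        os.write(write_fd, b"first\n")
        os.close(write_fd)

        reader = asyncio.StreamReader()
        protocol = asyncio.StreamReaderProtocol(reader)
        await loop.connect_read_pipe(lambda: protocol, pipe)
        data = await reader.read()  # returns once EOF has been seen
        await asyncio.sleep(0)      # let any pending transport callbacks run
        return data, pipe.closed
```

After the first EOF the file object reports itself as closed, so a later writer on the same FIFO can no longer be heard through it.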
Consequences
It means that the kind of thing GDB does is not possible with asyncio's connect_read_pipe. Here's an example of what doesn't work:
Additionally, stdin is now closed, so calling connect_read_pipe again will not work either.
The only ways to work around that on asyncio would be to either monkeypatch _UnixReadPipeTransport, or use os.dup. I suppose one could also wrap the pipe file object fed to connect_read_pipe to return a sentinel instead of an empty bytes object on EOF, and also modify the protocol to interpret that sentinel in the correct way. None of these options are particularly pleasant.
The same thing works without issue with the standard Python file API:
Possible solution
I don't know how Python developers feel about changing the asyncio API in ways that could subtly break programs, but I hope that the argument for correctness and the fact that some fairly important use cases are currently impossible are strong enough to warrant such a change.
Ideally, _UnixReadPipeTransport would be changed not to call connection_lost on EOF, instead only calling it on exception. The task of interpreting EOF then would be placed upon protocols. The list of protocols already includes specific protocols for datagrams, streams, and unnamed pipes, so it's not a huge stretch to also have specific protocols for FIFOs and character devices. Their behavior is different from all the protocols that are already there, and I would wager that reading from a terminal is a pretty common use case.
Alternatively, connect_read_pipe could return different transports for different file types, but that would require it to distinguish between unnamed pipes and FIFOs, which I'm not sure is possible.
Additionally, StreamReader does not work correctly when reading from a named pipe or a character device, as it assumes that EOF is a permanent state -- which is pretty reasonable, it is a stream reader and it works for stream sockets, the issue is that a character device or pipe is not one. It could be modified to accommodate those types of files, should the responsibility for indicating that the connection is closed when a stream socket receives an EOF be put on Protocol. That would make stream in this instance an abstraction specific to asyncio and not one that refers to a POSIX stream socket. Otherwise, other types of readers (and probably writers, I didn't even touch on those) should be created. I admit that CharacterDeviceReader does not exactly roll off the tongue.
I can't really think of less drastic ways to fix this, but perhaps there are some that I'm missing. Anyway, thank you for reading this! This is the first issue I'm submitting in the Python repository and I've tried to make it as detailed as possible.
Footnotes
1. IEEE Std 1003.1-2017, Standard for Information Technology -- Portable Operating System Interface (POSIX), The Open Group Base Specifications Issue 7, 2018 Edition, ISBN 978-1-5044-4542-9 -- line 13390, page 393
2. IEEE Std 1003.1-2017, Standard for Information Technology -- Portable Operating System Interface (POSIX), The Open Group Base Specifications Issue 7, 2018 Edition, ISBN 978-1-5044-4542-9 -- line 57251, page 1772
3. IEEE Std 1003.1-2017, Standard for Information Technology -- Portable Operating System Interface (POSIX), The Open Group Base Specifications Issue 7, 2018 Edition, ISBN 978-1-5044-4542-9 -- lines 58076-58078, page 1793
4. IEEE Std 1003.1-2017, Standard for Information Technology -- Portable Operating System Interface (POSIX), The Open Group Base Specifications Issue 7, 2018 Edition, ISBN 978-1-5044-4542-9 -- section 2.10.6, lines 18376-18377, page 524
5. recv(2), Linux man-pages 6.04
6. raw(7), Linux man-pages 6.04
7. IEEE Std 1003.1-2017, Standard for Information Technology -- Portable Operating System Interface (POSIX), The Open Group Base Specifications Issue 7, 2018 Edition, ISBN 978-1-5044-4542-9 -- lines 57221-57227, page 1771
8. IEEE Std 1003.1-2017, Standard for Information Technology -- Portable Operating System Interface (POSIX), The Open Group Base Specifications Issue 7, 2018 Edition, ISBN 978-1-5044-4542-9 -- line 57217, page 1771