Tracking issue for Seek::{stream_len, stream_position} (feature seek_convenience) #59359

Open
2 tasks
LukasKalbertodt opened this issue Mar 22, 2019 · 24 comments
Labels
A-io Area: `std::io`, `std::fs`, `std::net` and `std::path` B-unstable Blocker: Implemented in the nightly compiler and unstable. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC Libs-Small Libs issues that are considered "small" or self-contained Libs-Tracked Libs issues that are tracked on the team's project board. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@LukasKalbertodt
Member

This is a tracking issue for Seek::{stream_len, stream_position}. Proposed and implemented in #58422.

Unresolved questions:

  • Override stream_len for File? (is metadata syncing a problem? comment a, comment b)
  • Final names:
    • Rename to len and position? (but that's a strange signature for len, and position clashes with Cursor)
    • ...
@jonas-schievink jonas-schievink added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. B-unstable Blocker: Implemented in the nightly compiler and unstable. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC labels Mar 22, 2019
lo48576 added a commit to lo48576/fbxcel that referenced this issue Apr 9, 2019
`reader.stream_pos()` is equivalent to
`reader.seek(SeekFrom::Current(0))`.

See <rust-lang/rust#59359>.
lo48576 added a commit to lo48576/fbxcel that referenced this issue Apr 21, 2019
`reader.stream_position()` is equivalent to
`reader.seek(SeekFrom::Current(0))`.

See <rust-lang/rust#59359>.
lo48576 added a commit to lo48576/fbxcel that referenced this issue Aug 9, 2019
`reader.stream_position()` is equivalent to
`reader.seek(SeekFrom::Current(0))`.

See <rust-lang/rust#59359>.
@meithecatte
Contributor

As the issue calls for a bikeshed... tell seems to be an established name for what is called stream_position here. I don't see an analogous alternative name for stream_len, though, and the two methods having similar names might be advantageous. Besides, stream_position is more descriptive.

@marmistrz
Contributor

From the API user perspective, stream_position should probably borrow immutably.

@LukasKalbertodt
Member Author

I opened a stabilization PR here: #70904


From the API user perspective, stream_position should probably borrow immutably.

@marmistrz Yes, that would be nice. But it's not possible as the method is just a small helper around seek. And seek requires &mut self.
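
To make the point above concrete, here is a minimal sketch of a free function that does what the helper is described as doing. The name `position_via_seek` is hypothetical and not part of std; it only illustrates why `&mut` is needed when the only primitive available is `Seek::seek`.

```rust
use std::io::{self, Seek, SeekFrom};

/// Hypothetical stand-in for the helper discussed above: a zero-byte
/// relative seek does not move the stream and returns the current offset.
/// The `&mut S` is required only because `Seek::seek` takes `&mut self`.
fn position_via_seek<S: Seek>(stream: &mut S) -> io::Result<u64> {
    stream.seek(SeekFrom::Current(0))
}
```

For a `Cursor` positioned at offset 4, this returns `Ok(4)` and leaves the offset unchanged.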

@marmistrz
Contributor

marmistrz commented Apr 9, 2020

Yes, therefore this should be implemented separately, on the low level, just as it's done in yanix:
https://github.com/bytecodealliance/wasmtime/blob/c4e90f729ca5e17a29a778ace7971aeda7c22391/crates/wasi-common/yanix/src/file.rs#L251-L258

@LukasKalbertodt
Member Author

@marmistrz Unfortunately, I don't quite understand what you are suggesting. And in particular: are you opposing stabilization of the features as they currently are?

@marmistrz
Contributor

In my opinion, instead of the cross-platform implementation present in
https://github.com/rust-lang/rust/pull/58422/files#diff-668f8f358d4a93474b396dcb3727399eR1407-R1410, platform-specific implementations should be used (lseek on Unix/something else on Windows/etc), so that a mutable borrow is not required.

I think that the API should not be stabilized, because it's semantically incorrect to require mutable borrows for immutable queries. If the API user only has an immutable reference, they'll still have to use platform-specific unsafe code. (casting & to &mut is UB)

@LukasKalbertodt
Member Author

LukasKalbertodt commented Apr 9, 2020

A few things:

  • You are mainly talking about files, but Seek is implemented for many more things which potentially do not have a way to get the length otherwise. So "it's semantically incorrect to require mutable borrows for immutable queries." is only true for files and the like where mutability is indeed not required. But for some of Seek's implementors, a mutable borrow is indeed correct.

  • For files, there is already an API that safely gets you the length with only an immutable borrow (file.metadata().len()).

  • Even if we want to add another API for tell-like behavior (e.g. because metadata().len() is not the same), this would not conflict with Seek::stream_len. The only reason I see for not stabilizing this API is if you think that it is "a bad influence on users". I.e. "users should only very rarely use three seeks to get the length and instead prefer other ways to obtain the length, so having a method that does this easily accessible is harmful". Is that what you were implying? However, again remember that Seek is not just about files.
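
For readers unfamiliar with the "three seeks" mentioned above, the following is a rough sketch of that approach written as a free function. `len_via_three_seeks` is a hypothetical name, and the body is an illustration of the technique rather than a copy of the std source.

```rust
use std::io::{self, Seek, SeekFrom};

/// Hypothetical helper showing the "three seeks" approach to finding a
/// stream's length using nothing but `Seek::seek`.
fn len_via_three_seeks<S: Seek>(stream: &mut S) -> io::Result<u64> {
    // Seek 1: remember where we are.
    let old_pos = stream.seek(SeekFrom::Current(0))?;
    // Seek 2: jump to the end; the returned offset is the length.
    let len = stream.seek(SeekFrom::End(0))?;
    // Seek 3: go back, unless we were already at the end.
    if old_pos != len {
        stream.seek(SeekFrom::Start(old_pos))?;
    }
    Ok(len)
}
```

Note that the stream is moved in between, which is why the approach is not atomic and needs a mutable borrow.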

@fogti
Contributor

fogti commented Apr 9, 2020

@LukasKalbertodt "Seek is not just about files" I'm curious, on which other "I/O objects" is this intended to be implemented (just to make clear what we're talking about)? Are there existing I/O objects on which stream_len can only be implemented with a mutable borrow?

@marmistrz
Contributor

marmistrz commented Apr 9, 2020

  • For files, there is already an API that safely gets you the length with only an immutable borrow (file.metadata().len()).

file.metadata().len() returns the total file length, not the current offset in the file and doesn't change after seeking. It's not a replacement for ftell() or stream_position.
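
The difference can be illustrated with a small sketch. The function name and the scratch-file `path` parameter are assumptions made for this example only.

```rust
use std::fs::{self, File};
use std::io::{self, Seek, SeekFrom, Write};

/// Writes ten bytes to a scratch file at `path` (hypothetical location),
/// seeks to `offset`, and returns (length from metadata, current offset).
fn len_and_position(path: &std::path::Path, offset: u64) -> io::Result<(u64, u64)> {
    let mut file = File::options()
        .read(true)
        .write(true)
        .create(true)
        .truncate(true)
        .open(path)?;
    file.write_all(b"0123456789")?;
    file.seek(SeekFrom::Start(offset))?;
    let len = file.metadata()?.len(); // total size, unaffected by seeking
    let pos = file.stream_position()?; // current offset, follows the seek
    drop(file);
    fs::remove_file(path)?; // clean up the scratch file
    Ok((len, pos))
}
```

With `offset = 4`, the length stays at 10 while the position is 4.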

@t-rapp
Contributor

t-rapp commented Jul 24, 2020

I think for files there could be a specialized File::position() method that takes an immutable &self, similar to the existing Cursor::position(). As the Seek trait is more general (and due to API stability) the Seek::stream_len() and Seek::stream_position() convenience / optimization functions unfortunately require a mutable borrow. They are still useful, in my opinion.

@KodrAus KodrAus added A-io Area: `std::io`, `std::fs`, `std::net` and `std::path` Libs-Small Libs issues that are considered "small" or self-contained Libs-Tracked Libs issues that are tracked on the team's project board. labels Jul 29, 2020
@HeroicKatora
Contributor

@marmistrz As with most io traits, there is an impl<'_> Seek for &'_ File, and the &mut &File signature of its stream_position is, for all practical purposes, equivalent to taking a &File as self. This wouldn't require any changes to the current design. It does mean, however, that any specialization for File should also be done for &File, as the implementation of the latter does not dispatch to the former.
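
A short sketch of what this looks like in practice, assuming only the `impl Seek for &File` mentioned above. The function name `position_of_shared` is hypothetical.

```rust
use std::fs::File;
use std::io::{self, Seek};

/// Queries the offset while holding only a shared reference to the file.
/// `Seek` is implemented for `&File`, so a mutable *binding* that holds a
/// `&File` is enough; the `File` itself is never borrowed mutably.
fn position_of_shared(file: &File) -> io::Result<u64> {
    let mut handle: &File = file;
    handle.stream_position()
}
```

The method call resolves to the `&File` implementation because `&mut handle` has type `&mut &File`.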

JohnTitor added a commit to JohnTitor/rust that referenced this issue Jan 28, 2021
…enience, r=m-ou-se

Stabilize `Seek::stream_position` (feature `seek_convenience`)

Tracking issue: rust-lang#59359

Unresolved questions from tracking issue:
- "Override `stream_len` for `File`?" → we can do that in the future, this does not block stabilization.
- "Rename to `len` and `position`?" → as noted in the tracking issue, both of these shorter names have problems (`len` is usually a cheap getter, `position` clashes with `Cursor`). I do think the current names are perfectly fine.
- "Rename `stream_position` to `tell`?" → as mentioned in [the comment bringing this up](rust-lang#59359 (comment)), `stream_position` is more descriptive. I don't think `tell` would be a good name.

What remains to decide, is whether or not adding these methods is worth it.
lo48576 added a commit to lo48576/fbxcel that referenced this issue Mar 26, 2021
`reader.stream_position()` is equivalent to
`reader.seek(SeekFrom::Current(0))`.

See <rust-lang/rust#59359>.
@bluss
Member

bluss commented Apr 2, 2021

For BufWriter, it might not be clear to users that this method will write out the buffer, even though that shouldn't be a big problem (and is possible to deduce). The same goes for stream_position.
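
The flushing behaviour can be observed with a small experiment. This sketch calls `seek(SeekFrom::Current(0))` directly, on the assumption that a position query on `BufWriter` is forwarded to `seek`; the function name is hypothetical.

```rust
use std::io::{self, BufWriter, Cursor, Seek, SeekFrom, Write};

/// Returns (bytes in the sink before the query, reported position,
/// bytes in the sink after the query).
fn observe_flush_on_seek() -> io::Result<(usize, u64, usize)> {
    let mut writer = BufWriter::with_capacity(1024, Cursor::new(Vec::<u8>::new()));
    writer.write_all(b"hello")?; // small write, stays in BufWriter's buffer
    let before = writer.get_ref().get_ref().len();
    // A zero-byte seek: BufWriter writes out its buffer before seeking.
    let pos = writer.seek(SeekFrom::Current(0))?;
    let after = writer.get_ref().get_ref().len();
    Ok((before, pos, after))
}
```

The sink is empty before the query and holds all five bytes afterwards, so a seemingly read-only question has a visible side effect.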

@t-rapp
Contributor

t-rapp commented Apr 6, 2021

I assume an optimized implementation of stream_position could be done, similar to how it was done to BufReader in #74366.
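
For comparison, this sketch shows the BufReader side: the logical position differs from the inner stream's offset because of read-ahead. It assumes a toolchain where BufReader answers stream_position without seeking the inner stream; the function name is hypothetical.

```rust
use std::io::{self, BufReader, Cursor, Read, Seek};

/// Returns (logical position reported by the BufReader, offset of the
/// inner cursor) after reading two bytes through an 8-byte buffer.
fn logical_vs_inner_position() -> io::Result<(u64, u64)> {
    let mut reader = BufReader::with_capacity(8, Cursor::new(vec![0u8; 16]));
    let mut two = [0u8; 2];
    reader.read_exact(&mut two)?; // fills the 8-byte buffer, hands out 2 bytes
    let logical = reader.stream_position()?; // inner offset minus unread buffered bytes
    let inner = reader.get_ref().position(); // inner cursor is ahead due to read-ahead
    Ok((logical, inner))
}
```

The buffered data is kept, so subsequent reads do not have to hit the inner stream again.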

@hellow554
Contributor

hellow554 commented Feb 4, 2022

This tracking issue is almost 3 years old.

@LukasKalbertodt any progress on stabilization?

@LukasKalbertodt
Member Author

@hellow554 Nope, I'm not really involved in this anymore. I am certainly not working on any stabilization attempts. Sorry!

@hellow554
Contributor

I would like to stabilize this feature.

@m-ou-se can you please start an FCP merge request?

I would open a PR to stabilize the feature once a decision has been made.

@est31
Member

est31 commented Feb 20, 2023

@hellow554 there were a bunch of concerns raised about stream_len over in the PR that stabilized parts of this feature:

but I'm not really a huge fan of stream_position [stream_len] being implemented by 3 seeks. It's unfortunately non-atomic and (almost?) every seekable construct of finite length should have a more "proper" method to return its length.

@rdrpenguin04

Is there any sign of progress on this?

@est31
Member

est31 commented Sep 3, 2023

The next step would probably be to move the functions into a separate trait, as suggested by: #70904 (comment) . I'm not sure if one trait would be better, or it should be two traits. IDK.

@rdrpenguin04

It seems a bit silly to require a separate trait for that, in my opinion. As the length of a stream can always be found by three seek operations, that can always be a sane default, and the default behavior can be overridden by e.g. File. Creating a separate trait requires significantly more logic (and possibly negative trait bounds and/or default trait blanket impls?), but keeping it in the same trait seems to be perfectly valid.

@est31
Member

est31 commented Sep 7, 2023

I agree that there is more overhead, but read the comment right above the one I linked (this one #70904 (comment)). Having a separate trait allows us to bring the API to more types, including ones that don't impl Seek, such as slices. I suppose they consider that advantage more important than the additional overhead, shrug.
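
To give a rough idea of what such a separate trait could look like, here is a sketch. The trait `KnownLen` and its method `byte_len` are invented for illustration and do not correspond to any concrete proposal in the linked PR.

```rust
use std::fs::File;
use std::io::{self, Cursor};

/// Hypothetical trait (not part of std): a length query that needs only
/// `&self` and is independent of `Seek`.
trait KnownLen {
    fn byte_len(&self) -> io::Result<u64>;
}

// Slices do not implement `Seek`, but their length is trivially known.
impl KnownLen for [u8] {
    fn byte_len(&self) -> io::Result<u64> {
        Ok(self.len() as u64)
    }
}

// Files can ask the file system instead of seeking three times.
impl KnownLen for File {
    fn byte_len(&self) -> io::Result<u64> {
        self.metadata().map(|m| m.len())
    }
}

// In-memory cursors report the length of the wrapped buffer.
impl<T: AsRef<[u8]>> KnownLen for Cursor<T> {
    fn byte_len(&self) -> io::Result<u64> {
        Ok(self.get_ref().as_ref().len() as u64)
    }
}
```

The cost hinted at above is visible here: every type needs its own impl, since there is no generic fallback through `Seek`.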

@fedy-cz

fedy-cz commented Oct 15, 2023

Not sure this is the right place to ask this but it regards the Seek trait:

Currently the Seek trait's stream_position signature is:

fn stream_position(&mut self) -> Result<u64>

That means it needs a mutable reference and it is fallible.
While I understand why that is (it is auto-implemented on top of seek()), it has some unfortunate consequences:
A mutable reference to the stream is often not available inside a (consuming) API chain (if we are able to get a reference at all), the call causes a flush on BufWriter, ...

I was looking around std and it seems that even for a simple std::fs::File there is no safe alternative to the POSIX ftello(3) call (there is a tell() call, but it's experimental and part of wasi_ext?).

For example: When streaming from a simple file (think std::fs::File -> std::io::BufReader -> flate2 -> ...), the common way to monitor progress on the underlying file seems to be mutably cloning the initial File instance and then calling the stream_position() on the clone?
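
The clone-based workaround described above might look roughly like this. The function name and the `path` parameter are hypothetical, and the file is assumed to exist and be larger than the 16-byte buffer used here.

```rust
use std::fs::File;
use std::io::{self, BufReader, Read, Seek};

/// Opens `path` (hypothetical file, assumed larger than 16 bytes), reads
/// four bytes through a 16-byte BufReader, and reports the offset seen by
/// a cloned handle.
fn offset_seen_by_clone(path: &std::path::Path) -> io::Result<u64> {
    let file = File::open(path)?;
    // The clone shares the same underlying handle, including its offset.
    let mut monitor = file.try_clone()?;
    let mut reader = BufReader::with_capacity(16, file);
    let mut head = [0u8; 4];
    reader.read_exact(&mut head)?;
    // Reports how far the *file* has been read, including read-ahead.
    monitor.stream_position()
}
```

Note that the reported offset reflects the bytes fetched from the file, not the bytes consumed by the caller, which is usually what a progress indicator wants anyway.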

The std::io::Cursor API is currently being stabilized and I was wondering if that's not an opportunity to make things a little nicer. Cursor has:

pub fn position(&self) -> u64
  • infallible, without the need for a mutable reference, seems ideal for the ftello call.

I was wondering if it wouldn't be best to move it into the Seek trait and implement it for std::fs::File etc. using the ftello call or similar system APIs.

Alternatively: it could be implemented for the std::os::fd::BorrowedFd

@vincent-sparks

vincent-sparks commented Jun 20, 2024

@LukasKalbertodt "Seek is not just about files" I'm curious, on which other "I/O objects" is this intended to be implemented (just to make clear what we're talking about)? Are there existing I/O objects on which stream_len can only be implemented with a mutable borrow?

Imagine a file-like object permitting seekable access to a streaming network resource that may or may not know its length in advance, by downloading into a buffer (either in-memory or on-disk) and, whenever the calling code tries to read past the end of the buffer, blocks until more data is available from the network. In the case the network server doesn't send a length, the only way for this object to compute its length (or indeed to seek relative to EOF at all) is to wait for the entire network resource to be downloaded into its buffer.

If every object that implemented Seek had to implement stream_len(&self) -> u64 somehow, this object would have to either return some random guess, i.e. 0, or implement interior mutability, which complicates the design of the struct significantly and results in potentially surprising behavior for API consumers.

@fedy-cz I'll also point out that ftello(3) is not necessarily infallible -- it may fail with ESPIPE if the File is opened to a file that is not seekable (e.g. a FIFO).

@lel-amri

lel-amri commented Jun 21, 2024

@vincent-sparks

Imagine a file-like object permitting seekable access to a streaming network resource that may or may not know its length in advance, by downloading into a buffer [...] and, whenever the calling code tries to read past the end of the buffer, blocks until more data is available from the network.

The semantics of stream_len are left undefined, but its return value is equivalent to seek(SeekFrom::End(0)). The current state of the art (at least AFAIK for Linux, Darwin, Windows and probably the three well-known BSDs) regarding seekable objects is that "seek to end" does not care about the "expected final size" of the stream, but rather about the size of what's currently available. To me it is imperative that all Seek conveniences rely on the properties of the basic seek() (at the Rust level; I'm not saying they have to be implemented with lseek, for example).
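
As a small illustration of "the size of what's currently available", this sketch uses an in-memory `Cursor` as a stand-in for a growing stream. The function name is hypothetical, and a `Cursor` is of course only an approximation of OS-level behaviour.

```rust
use std::io::{self, Cursor, Seek, SeekFrom, Write};

/// Returns the offset reported by "seek to end" before and after the
/// stream grows by three bytes.
fn end_before_and_after_append() -> io::Result<(u64, u64)> {
    let mut stream = Cursor::new(b"abc".to_vec());
    let before = stream.seek(SeekFrom::End(0))?; // end of what exists now
    stream.write_all(b"def")?; // appended at the current (end) position
    let after = stream.seek(SeekFrom::End(0))?; // end has moved with the data
    Ok((before, after))
}
```

The answer changes as data arrives; there is no notion of an expected final size.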

If every object that implemented Seek had to implement stream_len(&self) -> u64 somehow, this object would have to [...] return some random guess

I think Seek is not the right kind of interface for such an object.


About the tracking:

I also think that renaming stream_len and stream_position to len and position, respectively, would leave a strange signature for len. As for Cursor's position, it seems quite redundant with Seek's position. Should it be decided to rename stream_position to position, I would rather remove position from Cursor or rename Cursor's position to something else.
