Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a Read::initializer method #42002

Merged
merged 1 commit into from
Jun 21, 2017
Merged

Add a Read::initializer method #42002

merged 1 commit into from
Jun 21, 2017

Conversation

sfackler
Copy link
Member

@sfackler sfackler commented May 15, 2017

This is an API that allows types to indicate that they can be passed
buffers of uninitialized memory which can improve performance.

cc @SimonSapin

r? @alexcrichton

@sfackler sfackler added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label May 15, 2017
@@ -182,6 +206,10 @@ impl<R: Read> Read for BufReader<R> {
}
}

// we cant impl unconditionally because of the large buffer case in read.
#[unstable(feature = "trusted_len", issue = "0")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this supposed to be "trusted_read"? "trusted_len" also appears in a few other places.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, will fix

impl<R> MakeBuffer for R
where R: TrustedRead
{
fn make_buffer(capacity: usize) -> Vec<u8> {
Copy link
Contributor

@danburkert danburkert May 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be simplified by removing the MakeBuffer trait and just having make_buffer as a free function?

edit: disregard. The trait is necessary to allow for specialization.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's probably a cleaner way of doing this but I don't really know what I'm doing with specialization beyond the basics :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I don't think this can be a free function, I believe specialization only applies to trait impls right now

@comex
Copy link
Contributor

comex commented May 15, 2017

👎 As I said in the other thread, I'd prefer an 'uninitialized buffer' (or 'appendable') trait, with a corresponding method on Read. It'd be more composable and require less unsafe code, although the trait would overlap a bit with Write.

@est31
Copy link
Member

est31 commented May 15, 2017

Agree with @comex . I'd prefer to encode that you must write some array or slice inside the language over this. We don't need super sweet sugar for it, a trait would be enough.

@carols10cents carols10cents added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label May 15, 2017
@alexcrichton
Copy link
Member

@comex I don't think I quite understood your proposal or what it would look like. Not requiring all Read implementations to deal with this sounds nice though if possible! Can you expand on your thinking to explain how it'd work here?

@sfackler I think TrustedRead for File is missing?

FWIW this pattern isn't the first of its kind, @Kixunil do you have thoughts on the marker trait pattern?

@alexcrichton
Copy link
Member

@sfackler presumably this could also change the default implementation of read_to_end, right? Performing specialization based on whether Self: TrustedRead.

Also, I think the various ChildStdin style types in std::process can also implement TrustedRead?

@sfackler
Copy link
Member Author

sfackler commented May 16, 2017

@comex what do you mean by "forcing every Read impl to take care of it (and use unsafe code)"? Read impls don't change at all in this PR. Could you expand on your composability concerns?

@est31 I don't really understand what that means. Any given call to Read::read may fill anywhere between 0 and the entirety of the buffer passed in.

@sfackler
Copy link
Member Author

Updated

@sfackler sfackler force-pushed the trusted-read branch 2 times, most recently from 0014e80 to e2653f7 Compare May 16, 2017 04:30
@comex
Copy link
Contributor

comex commented May 16, 2017

To be more explicit about my proposal:

The real problem is that the type signature of Read::read sucks.

fn read(&mut self, buf: &mut [u8]) -> Result<usize>;

If implementations aren't supposed to read from buf, why is it declared with a type that allows reading? In some sense it would be better if it was declared like this:

fn read(&mut self, buf: &mut Vec<u8>) -> Result<usize>;

Then, read impls that forward to the OS/FFI would never need to zero memory: they would just ask the OS to read into the uninitialized space at the end of the Vec, and call set_len afterwards to mark it as initialized.

    let ptr = buf.as_mut_ptr().offset(buf.len() as isize);
    let size = buf.capacity() - buf.len();
    let actual = os_read(ptr, size);
    buf.set_len(buf.len() + actual);

Of course, baking Vec into the trait would be problematic since Vec doesn't cover all use cases (especially without the ability to customize the allocator).

But we could abstract over the required functionality, such as with a trait:

unsafe trait UninitializedBuffer<T> {
    fn get(&mut self) -> *mut [T];
    unsafe fn did_fill(&mut self, size: usize);
}

This encapsulates exactly the pattern from before: unsafe code asks for an uninitialized buffer, fills it out, then reports back how much was filled.

Then Read could have a method like

fn read_to_uninitialized(&mut self, buf: impl UninitializedBuffer<u8>) -> Result<usize>;

There are a few concerns here:

  • Backwards compatibility, obviously. Ideally a type would only have to implement read_to_uninitialized and have read automatically implemented. Maybe it could be a separate trait combined with a blanket impl of the old Read?
  • The signature as specified would make Read non-object-safe. Maybe buf should be a trait object? Or perhaps there should be a concrete struct instead of a trait, but that makes APIs a bit messier.
  • Should the amount read still be returned? It's redundant with the call to did_fill, but in practice it might be more convenient to return it as well.

So what are the actual benefits of this scheme over @sfackler's (seemingly simpler) proposal? Well...

  • With just the minimal addition @sfackler proposed, there is no safe way to read into a Vec<u8>; you have to replicate the whole specialization rigmarole yourself (and use unsafe code). That's silly.
  • But okay, it's minimal. In the future, perhaps it would be augmented with a general-purpose safe function to read into a Vec<u8>. But then, how does that get extended to other types of buffers that can be read into?
    • At the least, every type would have to duplicate the specialization logic, which is ugly. Much neater to just implement UninitializedBuffer. Also, UninitializedBuffer doesn't have to be limited to u8.
    • Also, would this be left to similar free methods/functions for each type, or would there be a trait? If a trait, that's just as much complexity as my proposal.
  • What about the reader end? Arguably it's simpler to tack on TrustedRead everywhere than to rewrite using UninitializedBuffer, but...
    • It's unnecessarily unsafe. Impls of Read that use FFI fundamentally need to be unsafe, but most others don't. Some readers in the standard library just wrap other readers, like Chain and Take; others write known byte streams, like Empty, Repeat, Cursor, even BufReader.
    • By itself, UninitializedBuffer would require unsafe code to interact with, but it could easily be supplemented by safe helper methods. Most needs would be covered by a method that takes a &[u8] and memcpies it into the UninitializedBuffer; in fact, this could just be a Write impl. (As I said, there would be some overlap between UninitializedBuffer and Write in that both represent objects that can be written to. But UninitializedBuffer allows writing directly into the target's buffer [edit: whereas Write requires the written data to be prepared in a separate buffer first], and can't fail.)
  • The use of specialization leaves things implicit: it would be easy to accidentally miss a TrustedRead impl and end up unnecessarily zeroing things. My proposal is more explicit.
  • Anyway, is it even allowed, undefined behavior-wise, to create an &mut [u8] pointing to garbage? I guess it probably is, but it's gross.

@sfackler
Copy link
Member Author

sfackler commented May 16, 2017

If implementations aren't supposed to read from buf, why is it declared with a type that allows reading? In some sense it would be better if it was declared like this:

fn read(&mut self, buf: &mut Vec) -> Result;

How does changing from a &mut [u8] to a &mut Vec<u8> solve the reading problem?

It's unnecessarily unsafe.

It is no more unsafe than use of UninitializedBuffer.

Most needs would be covered by a method that takes a &[u8] and memcpies it into the UninitializedBuffer

Where does this &[u8] come from? If this proposal hinges on the fact that it's "less unsafe" than TrustedRead, then it seems pretty important for the safe API to be described in some detail!

it would be easy to accidentally miss a TrustedRead impl and end up unnecessarily zeroing things. My proposal is more explicit.

Wouldn't it be equally easy to forget to implement read_to_uninitialized and fall back to normal Read with a zeroed buffer?

Anyway, is it even allowed, undefined behavior-wise, to create an &mut [u8] pointing to garbage?

Yes it's totally fine. Uninitialized memory is used extensively throughout the standard library.

@comex
Copy link
Contributor

comex commented May 16, 2017

How does changing from a &mut [u8] to a &mut Vec<u8> solve the reading problem?

Because Vec knows what parts of it are initialized and doesn't allow safe code to read the uninitialized parts.

It is no more unsafe than use of UninitializedBuffer.

It is more unsafe because my suggestion allows non-FFI readers to be written using only safe code.

Where does this &[u8] come from? If this proposal hinges on the fact that it's "less unsafe" than TrustedRead, then it seems pretty important for the safe API to be described in some detail!

Among the non-IO standard library Read impls I mentioned: BufReader and Cursor both already have a &[u8] to copy from. Repeat could repeatedly write a single-byte [u8]. (I'm starting to think UninitializedBuffer would be better as a type rather than a trait; among other things, the repeated write could then be inlined and do something reasonable, even when using Read as a trait object.)

Wouldn't it be equally easy to forget to implement read_to_uninitialized and fall back to normal Read with a zeroed buffer?

Not equally easy, because it's one thing to do right rather than wrong (implement the new one rather than the deprecated old one), not an extra thing to remember in addition to the main thing (Read plus TrustedRead). But I guess it would be possible to forget.

@alexcrichton
Copy link
Member

@sfackler as one minor technical piece, mind deleting all the read_to_end specializations in libstd? I think there's a bunch inside of src/libstd/sys/* and they should all be obviated now I think.

@carllerche
Copy link
Member

Just as an FYI, this is partially why bytes has a BufMut trait. It passes a buffer + a cursor to a read fn (read_buf in tokio-io).

@Kixunil
Copy link
Contributor

Kixunil commented May 16, 2017

@alexcrichton Thank you very much for mentioning me and being interested in my opinion, I highly appreciate it!

As documentation for io::Read explains, the implementors shouldn't read from the provided buffer. The thing is, since it isn't marked as unsafe, there's no guarantee that they won't. But what if I have code that is correct? It'd be inefficient to zero the buffer. I wanted a way to express that the implementation is correct. To me unsafe marker trait was the most natural way of doing it.

One advantage I see in marker trait is that if you know the implementation is correct, you can express it without any change to implementation.

Also, I wasn't targeting specialization. I intended to write a helper which turns any T: Read into Wrapper<T>: Read + ReadOverwrite by zeroing before reading. That way, potential performance hit would be immediately visible. I didn't have time to write the helper.

Regarding UninitializedBuffer I consider having support for such thing a very good idea. One obvious use case is having a safe stack-allocated uninitialized buffer. I was also thinking about it previously. It's reflected in genio::ExtendFromReader trait.

I'm not particularly convince which approach is "the right one". Currently my decision process would be based on whether something is impossible with either one which is possible with the other. In that case I'd go with the more capable one. If someone proves them equivalent, then I'd look at ergonomics.

@comex
Copy link
Contributor

comex commented May 16, 2017

@carllerche Yeah, BufMut seems essentially identical to how I was thinking UninitializedBuffer would work. Actually, I was probably subconsciously recalling it from the one time I tried tokio :)

use memchr;

trait MakeBuffer {
fn make_buffer(capacity: usize) -> Vec<u8>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be unsafe fn given that it potentially creates an uninitialized buffer?

@sfackler
Copy link
Member Author

I am still having a hard time visualizing what UninitializedBuffer's specific API is going to look like, how it interacts with the existing Read trait, and how the use of it compares to that of TrustedRead. Is read_to_uninitialized a new method on Read? Is it a separate trait intended to be used via specialization?

Could you maybe put together an equivalent implementation to this PR for comparison?

@sfackler
Copy link
Member Author

Removed all of the old read_to_end impls.

@bluetech
Copy link

My 2c, just rephrasing a bit:

The problem is: implementer of Read is allowed to read from the buffer, even though it never makes sense(?). But for performance reasons, we want to allow it to be uninitialized.

TrustedLen solution: The (human) implementer of Read asserts "trust me, I don't do this.".

Second approach is (disregarding backward-compat): the compiler statically ensures the buffer is only written to, not read from.

I think that in principle, the second approach is favorable, if it can be made to work. I think that's what @comex is trying to achieve, before giving up and going the TrustedLen route.

But fundamentally, it doesn't seem workable: in order to write, you must have a &mut [u8] pointing to uninitialized memory, and that can be read from, so requires unsafe, so does not gain much over TrustedLen. But maybe there's some clever way to "statically ensure the buffer is only written to, not read from" safely?

@SimonSapin
Copy link
Contributor

Second approach is (disregarding backward-compat): the compiler statically ensures […]

This would have been preferable, and indeed was discussed 2.5 years ago shortly before Rust 1.0:

But 1.0 is done, std::io::Read is #[stable], we’re not breaking compatibility now.

@alexcrichton
Copy link
Member

Ok from the technical side of things, looks great to me!

So thinking about this I personally find the biggest downside to be that this won't work through trait objects. For example if you have &mut Read it doesn't propagate TrustedRead. You could for sure take &mut TrustedRead instead but that may be too restrictive.

Another possibility for implementing this functionality would be taking a tokio-io style approach:

pub trait Read {
    unsafe fn is_trusted(&self) -> bool { false }
}

That's still backwards compatible, works through trait objects, and shouldn't come at a perf cost because except in the trait object case it should always be inlined away. The major downside I see to this approach is that unsafe is in the wrong place. It's not at all unsafe to call, it's moreso unsafe to return true. Here unsafe is purely just a "heads up please read the documentation" rather than a "this is a correct indicator of where the unsafety is.

I'm curious, what do others think of that solution? Are there more downsides I'm not thinking of?

@carllerche
Copy link
Member

Ready::is_trusted would require a conditional. I would that if no trait objects are used, the compiler would "do the right thing" assuming the implementation of is_trusted simply returns true or false.

However, what would w/ branch prediction if is_trusted can't be inlined vs. something like what tokio-io does.

@@ -211,6 +211,10 @@ impl Read for ChildStdout {
fn read_to_end(&mut self, buf: &mut Vec<u8>) -> io::Result<usize> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be removed, right? I think there's one below on ChildStderr too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, fixed

@alexcrichton
Copy link
Member

👍 looks great!

Sort of unfortunate the Ininitializer::nop unsafety is hidden, but I think we've covered our bases.

/// `Read`er - the method only takes `&self` so that it can be used through
/// trait objects.
///
/// # Safety
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we typically title sections like this "Unsafety"

/// This method is unsafe because a `Read`er could otherwise return a
/// non-zeroing `Initializer` from another `Read` type without an `unsafe`
/// block.
#[unstable(feature = "maybe_initialized", issue = "0")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a reminder to file a tracking issue to fill in before landing

@@ -0,0 +1,7 @@
# `should_initialize`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think tidy will also request that this file is renamed

@sfackler
Copy link
Member Author

Updated with a tracking issue - should be ready to go I think.

This is an API that allows types to indicate that they can be passed
buffers of uninitialized memory which can improve performance.
@sfackler sfackler changed the title Add a TrustedRead trait Add an API for Readers to work with uninitialized buffers Jun 21, 2017
@sfackler sfackler changed the title Add an API for Readers to work with uninitialized buffers Add Read::initializer Jun 21, 2017
@alexcrichton alexcrichton changed the title Add Read::initializer Add a Read::initializer method Jun 21, 2017
@alexcrichton
Copy link
Member

@bors: r+

@bors
Copy link
Contributor

bors commented Jun 21, 2017

📌 Commit ecbb896 has been approved by alexcrichton

@bors
Copy link
Contributor

bors commented Jun 21, 2017

⌛ Testing commit ecbb896 with merge 6ccfe68...

bors added a commit that referenced this pull request Jun 21, 2017
Add a Read::initializer method

This is an API that allows types to indicate that they can be passed
buffers of uninitialized memory which can improve performance.

cc @SimonSapin

r? @alexcrichton
@SimonSapin
Copy link
Contributor

So… this is landing hours after a new designed being proposed (as far as people not in the triage discussion are concerned), but this new design uses unsafe fn “backwards”. So far, as far as I can tell, unsafe fn always means “unsafe to call”, not “unsafe to implement” (which unsafe impl means). In this case it’s the other way around.

By adding a precedent in the standard library, this is effectively changing the meaning of a language feature. Is this OK? Was this discussed? @alexcrichton, you called this a “major downside” in #42002 (comment).

@bors
Copy link
Contributor

bors commented Jun 21, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing 6ccfe68 to master...

@bors bors merged commit ecbb896 into rust-lang:master Jun 21, 2017
@eddyb
Copy link
Member

eddyb commented Jun 21, 2017

@SimonSapin It's not backwards. Certain implementations of the method always return Initializer::nop() which is an unsafe constructor, and if the method wasn't also unsafe, that would completely negate the unsafe on Initializer::nop(), making the whole scheme unsound.

I'd agree with you if the method was only returning a bool, except it's "unsafe to override".

@arielb1
Copy link
Contributor

arielb1 commented Jun 21, 2017

This looks like an evil hack. The reason safe code can't return an Initializer::nop() is because it can't get it hands on any Initializer::nop().

@alexcrichton
Copy link
Member

The unsafety here is in two locations:

  • Initializer::nop because it's unsafe to say that you yourself are safe.
  • Read::initializer because someone else could say that they're safe but you yourself are not safe.

We realized during the discussion that both are necessary. The documentation should indicate this. If it doesn't then it should be noted on the tracking issue that the unsafety is unclear.

@sfackler sfackler deleted the trusted-read branch June 21, 2017 15:41
@sfackler
Copy link
Member Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.