Add a Read::initializer method #42002

sfackler · 2017-05-15T01:34:10Z

This is an API that allows types to indicate that they can be passed
buffers of uninitialized memory which can improve performance.

cc @SimonSapin

r? @alexcrichton

danburkert · 2017-05-15T03:19:20Z

src/libstd/io/buffered.rs

@@ -182,6 +206,10 @@ impl<R: Read> Read for BufReader<R> {
    }
 }

+// we cant impl unconditionally because of the large buffer case in read.
+#[unstable(feature = "trusted_len", issue = "0")]


Is this supposed to be "trusted_read"? "trusted_len" also appears in a few other places.

Oops, will fix

danburkert · 2017-05-15T03:25:17Z

src/libstd/io/buffered.rs

+impl<R> MakeBuffer for R
+    where R: TrustedRead
+{
+    fn make_buffer(capacity: usize) -> Vec<u8> {


Can this be simplified by removing the MakeBuffer trait and just having make_buffer as a free function?

edit: disregard. The trait is necessary to allow for specialization.

There's probably a cleaner way of doing this but I don't really know what I'm doing with specialization beyond the basics :)

FWIW I don't think this can be a free function, I believe specialization only applies to trait impls right now

comex · 2017-05-15T05:58:26Z

👎 As I said in the other thread, I'd prefer an 'uninitialized buffer' (or 'appendable') trait, with a corresponding method on Read. It'd be more composable and require less unsafe code, although the trait would overlap a bit with Write.

est31 · 2017-05-15T06:15:08Z

Agree with @comex . I'd prefer to encode that you must write some array or slice inside the language over this. We don't need super sweet sugar for it, a trait would be enough.

alexcrichton · 2017-05-15T18:20:07Z

@comex I don't think I quite understood your proposal or what it would look like. Not requiring all Read implementations to deal with this sounds nice though if possible! Can you expand on your thinking to explain how it'd work here?

@sfackler I think TrustedRead for File is missing?

FWIW this pattern isn't the first of its kind, @Kixunil do you have thoughts on the marker trait pattern?

alexcrichton · 2017-05-15T18:21:11Z

@sfackler presumably this could also change the default implementation of read_to_end, right? Performing specialization based on whether Self: TrustedRead.

Also, I think the various ChildStdin style types in std::process can also implement TrustedRead?

sfackler · 2017-05-16T02:54:04Z

@comex what do you mean by "forcing every Read impl to take care of it (and use unsafe code)"? Read impls don't change at all in this PR. Could you expand on your composability concerns?

@est31 I don't really understand what that means. Any given call to Read::read may fill anywhere between 0 and the entirety of the buffer passed in.

sfackler · 2017-05-16T03:57:17Z

Updated

comex · 2017-05-16T05:59:16Z

To be more explicit about my proposal:

The real problem is that the type signature of Read::read sucks.

fn read(&mut self, buf: &mut [u8]) -> Result<usize>;

If implementations aren't supposed to read from buf, why is it declared with a type that allows reading? In some sense it would be better if it was declared like this:

fn read(&mut self, buf: &mut Vec<u8>) -> Result<usize>;

Then, read impls that forward to the OS/FFI would never need to zero memory: they would just ask the OS to read into the uninitialized space at the end of the Vec, and call set_len afterwards to mark it as initialized.

    let ptr = buf.as_mut_ptr().offset(buf.len() as isize);
    let size = buf.capacity() - buf.len();
    let actual = os_read(ptr, size);
    buf.set_len(buf.len() + actual);

Of course, baking Vec into the trait would be problematic since Vec doesn't cover all use cases (especially without the ability to customize the allocator).

But we could abstract over the required functionality, such as with a trait:

unsafe trait UninitializedBuffer<T> {
    fn get(&mut self) -> *mut [T];
    unsafe fn did_fill(&mut self, size: usize);
}

This encapsulates exactly the pattern from before: unsafe code asks for an uninitialized buffer, fills it out, then reports back how much was filled.

Then Read could have a method like

fn read_to_uninitialized(&mut self, buf: impl UninitializedBuffer<u8>) -> Result<usize>;

There are a few concerns here:

Backwards compatibility, obviously. Ideally a type would only have to implement read_to_uninitialized and have read automatically implemented. Maybe it could be a separate trait combined with a blanket impl of the old Read?
The signature as specified would make Read non-object-safe. Maybe buf should be a trait object? Or perhaps there should be a concrete struct instead of a trait, but that makes APIs a bit messier.
Should the amount read still be returned? It's redundant with the call to did_fill, but in practice it might be more convenient to return it as well.

So what are the actual benefits of this scheme over @sfackler's (seemingly simpler) proposal? Well...

With just the minimal addition @sfackler proposed, there is no safe way to read into a Vec<u8>; you have to replicate the whole specialization rigmarole yourself (and use unsafe code). That's silly.
But okay, it's minimal. In the future, perhaps it would be augmented with a general-purpose safe function to read into a Vec<u8>. But then, how does that get extended to other types of buffers that can be read into?
- At the least, every type would have to duplicate the specialization logic, which is ugly. Much neater to just implement UninitializedBuffer. Also, UninitializedBuffer doesn't have to be limited to u8.
- Also, would this be left to similar free methods/functions for each type, or would there be a trait? If a trait, that's just as much complexity as my proposal.
What about the reader end? Arguably it's simpler to tack on TrustedRead everywhere than to rewrite using UninitializedBuffer, but...
- It's unnecessarily unsafe. Impls of Read that use FFI fundamentally need to be unsafe, but most others don't. Some readers in the standard library just wrap other readers, like Chain and Take; others write known byte streams, like Empty, Repeat, Cursor, even BufReader.
- By itself, UninitializedBuffer would require unsafe code to interact with, but it could easily be supplemented by safe helper methods. Most needs would be covered by a method that takes a &[u8] and memcpies it into the UninitializedBuffer; in fact, this could just be a Write impl. (As I said, there would be some overlap between UninitializedBuffer and Write in that both represent objects that can be written to. But UninitializedBuffer allows writing directly into the target's buffer [edit: whereas Write requires the written data to be prepared in a separate buffer first], and can't fail.)
The use of specialization leaves things implicit: it would be easy to accidentally miss a TrustedRead impl and end up unnecessarily zeroing things. My proposal is more explicit.
Anyway, is it even allowed, undefined behavior-wise, to create an &mut [u8] pointing to garbage? I guess it probably is, but it's gross.

sfackler · 2017-05-16T06:16:18Z

If implementations aren't supposed to read from buf, why is it declared with a type that allows reading? In some sense it would be better if it was declared like this:

fn read(&mut self, buf: &mut Vec) -> Result;

How does changing from a &mut [u8] to a &mut Vec<u8> solve the reading problem?

It's unnecessarily unsafe.

It is no more unsafe than use of UninitializedBuffer.

Most needs would be covered by a method that takes a &[u8] and memcpies it into the UninitializedBuffer

Where does this &[u8] come from? If this proposal hinges on the fact that it's "less unsafe" than TrustedRead, then it seems pretty important for the safe API to be described in some detail!

it would be easy to accidentally miss a TrustedRead impl and end up unnecessarily zeroing things. My proposal is more explicit.

Wouldn't it be equally easy to forget to implement read_to_uninitialized and fall back to normal Read with a zeroed buffer?

Anyway, is it even allowed, undefined behavior-wise, to create an &mut [u8] pointing to garbage?

Yes it's totally fine. Uninitialized memory is used extensively throughout the standard library.

comex · 2017-05-16T06:46:37Z

How does changing from a &mut [u8] to a &mut Vec<u8> solve the reading problem?

Because Vec knows what parts of it are initialized and doesn't allow safe code to read the uninitialized parts.

It is no more unsafe than use of UninitializedBuffer.

It is more unsafe because my suggestion allows non-FFI readers to be written using only safe code.

Where does this &[u8] come from? If this proposal hinges on the fact that it's "less unsafe" than TrustedRead, then it seems pretty important for the safe API to be described in some detail!

Among the non-IO standard library Read impls I mentioned: BufReader and Cursor both already have a &[u8] to copy from. Repeat could repeatedly write a single-byte [u8]. (I'm starting to think UninitializedBuffer would be better as a type rather than a trait; among other things, the repeated write could then be inlined and do something reasonable, even when using Read as a trait object.)

Wouldn't it be equally easy to forget to implement read_to_uninitialized and fall back to normal Read with a zeroed buffer?

Not equally easy, because it's one thing to do right rather than wrong (implement the new one rather than the deprecated old one), not an extra thing to remember in addition to the main thing (Read plus TrustedRead). But I guess it would be possible to forget.

alexcrichton · 2017-05-16T14:17:34Z

@sfackler as one minor technical piece, mind deleting all the read_to_end specializations in libstd? I think there's a bunch inside of src/libstd/sys/* and they should all be obviated now I think.

carllerche · 2017-05-16T16:09:21Z

Just as an FYI, this is partially why bytes has a BufMut trait. It passes a buffer + a cursor to a read fn (read_buf in tokio-io).

Kixunil · 2017-05-16T16:26:28Z

@alexcrichton Thank you very much for mentioning me and being interested in my opinion, I highly appreciate it!

As documentation for io::Read explains, the implementors shouldn't read from the provided buffer. The thing is, since it isn't marked as unsafe, there's no guarantee that they won't. But what if I have code that is correct? It'd be inefficient to zero the buffer. I wanted a way to express that the implementation is correct. To me unsafe marker trait was the most natural way of doing it.

One advantage I see in marker trait is that if you know the implementation is correct, you can express it without any change to implementation.

Also, I wasn't targeting specialization. I intended to write a helper which turns any T: Read into Wrapper<T>: Read + ReadOverwrite by zeroing before reading. That way, potential performance hit would be immediately visible. I didn't have time to write the helper.

Regarding UninitializedBuffer I consider having support for such thing a very good idea. One obvious use case is having a safe stack-allocated uninitialized buffer. I was also thinking about it previously. It's reflected in genio::ExtendFromReader trait.

I'm not particularly convince which approach is "the right one". Currently my decision process would be based on whether something is impossible with either one which is possible with the other. In that case I'd go with the more capable one. If someone proves them equivalent, then I'd look at ergonomics.

comex · 2017-05-16T20:47:26Z

@carllerche Yeah, BufMut seems essentially identical to how I was thinking UninitializedBuffer would work. Actually, I was probably subconsciously recalling it from the one time I tried tokio :)

arielb1 · 2017-05-16T21:33:04Z

src/libstd/io/buffered.rs

 use memchr;

+trait MakeBuffer {
+    fn make_buffer(capacity: usize) -> Vec<u8>;


Shouldn't this be unsafe fn given that it potentially creates an uninitialized buffer?

sfackler · 2017-05-17T03:22:54Z

I am still having a hard time visualizing what UninitializedBuffer's specific API is going to look like, how it interacts with the existing Read trait, and how the use of it compares to that of TrustedRead. Is read_to_uninitialized a new method on Read? Is it a separate trait intended to be used via specialization?

Could you maybe put together an equivalent implementation to this PR for comparison?

sfackler · 2017-05-17T04:01:38Z

Removed all of the old read_to_end impls.

bluetech · 2017-05-17T09:25:40Z

My 2c, just rephrasing a bit:

The problem is: implementer of Read is allowed to read from the buffer, even though it never makes sense(?). But for performance reasons, we want to allow it to be uninitialized.

TrustedLen solution: The (human) implementer of Read asserts "trust me, I don't do this.".

Second approach is (disregarding backward-compat): the compiler statically ensures the buffer is only written to, not read from.

I think that in principle, the second approach is favorable, if it can be made to work. I think that's what @comex is trying to achieve, before giving up and going the TrustedLen route.

But fundamentally, it doesn't seem workable: in order to write, you must have a &mut [u8] pointing to uninitialized memory, and that can be read from, so requires unsafe, so does not gain much over TrustedLen. But maybe there's some clever way to "statically ensure the buffer is only written to, not read from" safely?

SimonSapin · 2017-05-17T09:37:06Z

Second approach is (disregarding backward-compat): the compiler statically ensures […]

This would have been preferable, and indeed was discussed 2.5 years ago shortly before Rust 1.0:

But 1.0 is done, std::io::Read is #[stable], we’re not breaking compatibility now.

alexcrichton · 2017-05-17T17:13:27Z

Ok from the technical side of things, looks great to me!

So thinking about this I personally find the biggest downside to be that this won't work through trait objects. For example if you have &mut Read it doesn't propagate TrustedRead. You could for sure take &mut TrustedRead instead but that may be too restrictive.

Another possibility for implementing this functionality would be taking a tokio-io style approach:

pub trait Read {
    unsafe fn is_trusted(&self) -> bool { false }
}

That's still backwards compatible, works through trait objects, and shouldn't come at a perf cost because except in the trait object case it should always be inlined away. The major downside I see to this approach is that unsafe is in the wrong place. It's not at all unsafe to call, it's moreso unsafe to return true. Here unsafe is purely just a "heads up please read the documentation" rather than a "this is a correct indicator of where the unsafety is.

I'm curious, what do others think of that solution? Are there more downsides I'm not thinking of?

carllerche · 2017-05-17T17:23:32Z

Ready::is_trusted would require a conditional. I would that if no trait objects are used, the compiler would "do the right thing" assuming the implementation of is_trusted simply returns true or false.

However, what would w/ branch prediction if is_trusted can't be inlined vs. something like what tokio-io does.

alexcrichton · 2017-06-21T03:12:36Z

src/libstd/process.rs

@@ -211,6 +211,10 @@ impl Read for ChildStdout {
    fn read_to_end(&mut self, buf: &mut Vec<u8>) -> io::Result<usize> {


This can be removed, right? I think there's one below on ChildStderr too.

alexcrichton · 2017-06-21T03:14:00Z

👍 looks great!

Sort of unfortunate the Ininitializer::nop unsafety is hidden, but I think we've covered our bases.

alexcrichton · 2017-06-21T03:14:46Z

src/libstd/io/mod.rs

+    /// `Read`er - the method only takes `&self` so that it can be used through
+    /// trait objects.
+    ///
+    /// # Safety


I think we typically title sections like this "Unsafety"

alexcrichton · 2017-06-21T03:14:57Z

src/libstd/io/mod.rs

+    /// This method is unsafe because a `Read`er could otherwise return a
+    /// non-zeroing `Initializer` from another `Read` type without an `unsafe`
+    /// block.
+    #[unstable(feature = "maybe_initialized", issue = "0")]


Just a reminder to file a tracking issue to fill in before landing

alexcrichton · 2017-06-21T03:15:34Z

src/doc/unstable-book/src/library-features/should-initialize.md

@@ -0,0 +1,7 @@
+# `should_initialize`


I think tidy will also request that this file is renamed

sfackler · 2017-06-21T03:25:14Z

Updated with a tracking issue - should be ready to go I think.

This is an API that allows types to indicate that they can be passed buffers of uninitialized memory which can improve performance.

alexcrichton · 2017-06-21T03:31:22Z

@bors: r+

bors · 2017-06-21T03:31:23Z

📌 Commit ecbb896 has been approved by alexcrichton

bors · 2017-06-21T04:14:52Z

⌛ Testing commit ecbb896 with merge 6ccfe68...

@SimonSapin

Add a Read::initializer method This is an API that allows types to indicate that they can be passed buffers of uninitialized memory which can improve performance. cc @SimonSapin r? @alexcrichton

SimonSapin · 2017-06-21T06:20:29Z

So… this is landing hours after a new designed being proposed (as far as people not in the triage discussion are concerned), but this new design uses unsafe fn “backwards”. So far, as far as I can tell, unsafe fn always means “unsafe to call”, not “unsafe to implement” (which unsafe impl means). In this case it’s the other way around.

By adding a precedent in the standard library, this is effectively changing the meaning of a language feature. Is this OK? Was this discussed? @alexcrichton, you called this a “major downside” in #42002 (comment).

bors · 2017-06-21T06:27:31Z

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing 6ccfe68 to master...

eddyb · 2017-06-21T09:03:55Z

@SimonSapin It's not backwards. Certain implementations of the method always return Initializer::nop() which is an unsafe constructor, and if the method wasn't also unsafe, that would completely negate the unsafe on Initializer::nop(), making the whole scheme unsound.

I'd agree with you if the method was only returning a bool, except it's "unsafe to override".

arielb1 · 2017-06-21T10:43:24Z

This looks like an evil hack. The reason safe code can't return an Initializer::nop() is because it can't get it hands on any Initializer::nop().

alexcrichton · 2017-06-21T14:06:33Z

The unsafety here is in two locations:

Initializer::nop because it's unsafe to say that you yourself are safe.
Read::initializer because someone else could say that they're safe but you yourself are not safe.

We realized during the discussion that both are necessary. The documentation should indicate this. If it doesn't then it should be noted on the tracking issue that the unsafety is unclear.

sfackler · 2017-06-21T15:42:26Z

Is the documentation here not clear? https://github.com/rust-lang/rust/pull/42002/files#diff-668f8f358d4a93474b396dcb3727399eR521

sfackler added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label May 15, 2017

rust-highfive assigned alexcrichton May 15, 2017

danburkert reviewed May 15, 2017

View reviewed changes

sfackler force-pushed the trusted-read branch from db6dc4d to a023ba2 Compare May 15, 2017 03:57

carols10cents added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label May 15, 2017

sfackler force-pushed the trusted-read branch from a023ba2 to d8241dd Compare May 16, 2017 03:57

sfackler force-pushed the trusted-read branch 2 times, most recently from 0014e80 to e2653f7 Compare May 16, 2017 04:30

sfackler force-pushed the trusted-read branch from e2653f7 to 6244609 Compare May 16, 2017 06:19

arielb1 reviewed May 16, 2017

View reviewed changes

alexcrichton reviewed Jun 21, 2017

View reviewed changes

sfackler force-pushed the trusted-read branch from 27aa1bf to 90f7f21 Compare June 21, 2017 03:14

alexcrichton reviewed Jun 21, 2017

View reviewed changes

sfackler mentioned this pull request Jun 21, 2017

Tracking issue for Read::initializer #42788

Closed

sfackler force-pushed the trusted-read branch from 90f7f21 to 17a7a22 Compare June 21, 2017 03:24

Add Read::initializer.

ecbb896

This is an API that allows types to indicate that they can be passed buffers of uninitialized memory which can improve performance.

sfackler force-pushed the trusted-read branch from 17a7a22 to ecbb896 Compare June 21, 2017 03:26

sfackler changed the title ~~Add a TrustedRead trait~~ Add an API for Readers to work with uninitialized buffers Jun 21, 2017

sfackler changed the title ~~Add an API for Readers to work with uninitialized buffers~~ Add Read::initializer Jun 21, 2017

alexcrichton changed the title ~~Add Read::initializer~~ Add a Read::initializer method Jun 21, 2017

bors merged commit ecbb896 into rust-lang:master Jun 21, 2017

sfackler deleted the trusted-read branch June 21, 2017 15:41

Kixunil mentioned this pull request Jul 11, 2017

Writing to uninitialized buffer BurntSushi/byteorder#90

Closed

carllerche mentioned this pull request Mar 31, 2018

cpu load by prepare_uninitialized_buffer in async_read of tokio-io tokio-rs/tokio#269

Closed

alex mentioned this pull request Dec 9, 2020

Add bindings for ClientSession, plus demo program rustls/rustls-ffi#1

Merged

jsha mentioned this pull request Dec 22, 2020

Initial implementation of rustls backend in curl. curl/curl#6350

Closed

		@@ -211,6 +211,10 @@ impl Read for ChildStdout {
		fn read_to_end(&mut self, buf: &mut Vec<u8>) -> io::Result<usize> {

Add a Read::initializer method #42002

Add a Read::initializer method #42002

Conversation

sfackler commented May 15, 2017 • edited by alexcrichton Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

danburkert May 15, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

comex commented May 15, 2017

est31 commented May 15, 2017 • edited Loading

alexcrichton commented May 15, 2017

alexcrichton commented May 15, 2017

sfackler commented May 16, 2017 • edited Loading

sfackler commented May 16, 2017

comex commented May 16, 2017 • edited Loading

sfackler commented May 16, 2017 • edited Loading

comex commented May 16, 2017 • edited Loading

alexcrichton commented May 16, 2017

carllerche commented May 16, 2017

Kixunil commented May 16, 2017

comex commented May 16, 2017

Choose a reason for hiding this comment

sfackler commented May 17, 2017

sfackler commented May 17, 2017

bluetech commented May 17, 2017

SimonSapin commented May 17, 2017

alexcrichton commented May 17, 2017

carllerche commented May 17, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alexcrichton commented Jun 21, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sfackler commented Jun 21, 2017

alexcrichton commented Jun 21, 2017

bors commented Jun 21, 2017

bors commented Jun 21, 2017

SimonSapin commented Jun 21, 2017

bors commented Jun 21, 2017

eddyb commented Jun 21, 2017

arielb1 commented Jun 21, 2017

alexcrichton commented Jun 21, 2017

sfackler commented Jun 21, 2017

sfackler commented May 15, 2017 •

edited by alexcrichton

Loading

danburkert May 15, 2017 •

edited

Loading

est31 commented May 15, 2017 •

edited

Loading

sfackler commented May 16, 2017 •

edited

Loading

comex commented May 16, 2017 •

edited

Loading

sfackler commented May 16, 2017 •

edited

Loading

comex commented May 16, 2017 •

edited

Loading