Implement CString::from_reader #59314
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @shepmaster (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information.
Two more open questions (see #59229):
Changes stable to unstable and removed the hashtag. Co-Authored-By: DevQps <[email protected]>
Randomly reassigning to t-libs; r? @Amanieu
@Amanieu Just a bump! I hope you can review this for me!
Reading a single byte at a time has a massive amount of overhead - this is why |
I thought this at the start as well! But let's take a Another problem with This method was proposed to allow crate authors to safely construct a I hope you follow my rationale! If you have a better idea or just don't agree, please let me know :) |
I think @dtolnay would have a thought on this PR (based on |
All of those objects also implement |
@sfackler I just saw that `&[u8]` also implements `BufRead`. What do you suggest? Should I replace this with a function `CString::from_bufread`, or do you have something else in mind?
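For illustration, here is a minimal sketch of what a `BufRead`-based variant could look like. The name `cstring_from_bufread` is hypothetical and not part of std. It leans on `BufRead::read_until` so the underlying reader can hand over bytes in chunks rather than one at a time, and it assumes that reaching the end of input without a nul terminator should be an error, which the thread does not settle.

```rust
use std::ffi::CString;
use std::io::{self, BufRead};

// Hypothetical helper, not a std API: reads up to and including the next
// nul byte and returns the preceding bytes as a CString.
fn cstring_from_bufread<R: BufRead>(reader: &mut R) -> io::Result<CString> {
    let mut buf = Vec::new();
    // read_until appends everything up to and including the delimiter,
    // or up to end of input if the delimiter never shows up.
    reader.read_until(0, &mut buf)?;
    // Assumption: a missing terminator is treated as an error.
    if buf.pop() != Some(0) {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "missing nul terminator",
        ));
    }
    // Safe constructor: it scans the bytes again for interior nuls,
    // which is the redundant check the PR is trying to avoid.
    CString::new(buf).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

Since `&[u8]` implements `BufRead`, a byte slice can be passed as `&mut slice` and is advanced past each string that is read.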
As far as I am able to tell from the PR description and the linked issue, the motivation for this is focused on the intersection of high-performance code and safe code.
If we knew downstream code didn't care about performance, we would be fine with them implementing this using CString::new rather than adding a constructor. If we knew downstream code didn't mind writing some unsafe code, such as in libflate, similarly we would be fine with them implementing this using CString::from_vec_unchecked.
In particular, the motivation does not appear to claim that this piece of logic is so commonly seen in Rust code that it makes sense for std to provide a helper for it. The change is justified really only because this is not possible to implement efficiently in safe code. If it were possible to implement this efficiently in safe code, we would not see the need to bring it into std.
Let me know if this does not accurately describe the position of the PR author. I see these options:
|
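To make the two downstream approaches described in this comment concrete, here is a rough sketch. All three function names are hypothetical. The byte-at-a-time loop is only for brevity and assumes the caller passes a buffered reader. The first constructor stays safe but pays for a second scan in `CString::new`; the second skips that scan at the cost of an `unsafe` block that an auditor has to verify.

```rust
use std::ffi::CString;
use std::io::{self, Read};

// Hypothetical helper: collects bytes up to (not including) the first nul.
fn read_until_nul<R: Read>(reader: R) -> io::Result<Vec<u8>> {
    let mut bytes = Vec::new();
    for byte in reader.bytes() {
        match byte? {
            0 => return Ok(bytes),
            b => bytes.push(b),
        }
    }
    // Assumption: input that ends before a terminator is an error.
    Err(io::Error::new(
        io::ErrorKind::UnexpectedEof,
        "missing nul terminator",
    ))
}

// Safe, but CString::new re-checks every byte for an interior nul.
fn cstring_checked<R: Read>(reader: R) -> io::Result<CString> {
    let bytes = read_until_nul(reader)?;
    CString::new(bytes).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

// No second scan, but correctness rests on the SAFETY argument below.
fn cstring_unchecked<R: Read>(reader: R) -> io::Result<CString> {
    let bytes = read_until_nul(reader)?;
    // SAFETY: read_until_nul returns at the first 0 byte and never pushes it,
    // so `bytes` contains no nul bytes.
    Ok(unsafe { CString::from_vec_unchecked(bytes) })
}
```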
@dtolnay Thanks for your detailed response! What you said about my assumptions is indeed right. Currently, it is impossible to safely construct a CString without performance overhead. If it were possible in another elegant way, then this PR would indeed not be necessary!
I must admit that it doesn't feel like the most beautiful solution. It feels like there must be some better way than adding methods to structs such that they can be read. In my opinion, it always felt weird that Rust does not support reading "primitive" types in I have been diving a bit in the usage of unsafe for only a couple of weeks so I don't know many other scenarios.
However, I think there are quite a few binary file formats that use CStrings, right? There are network protocols such as DNS which use this as well. Therefore, I do believe it needs a safe and performant way. I admit that most locations inside binary formats that use CString are in headers, which are not that performance critical, but for some network protocols this is not the case, I guess. So in short points:
I wonder what your thoughts are on this though! I will probably close this PR afterwards. |
Oh! I completely forgot to ask, but I hope you could explain the following to me!: Let's say we have a function called
So I was wondering: Why is the serde's EDIT: Or does this have to do with Hope you don't mind all these questions :) |
I share this opinion.
But not this one. I think there is room for a library based on a well-designed trait (as opposed to one-off associated functions) that makes it easy to write the type of code you have in mind involving reading primitives and cstrings. Regarding from_reader performance: the from_reader codepath inherently involves more work than from_bytes. https://github.com/serde-rs/json/blob/v1.0.39/src/read.rs has the details, but some examples:
|
@dtolnay Thanks for your response! I didn't have time this weekend so sorry for the small delay! I will close this issue now! I secretly have a few more questions (hope you don't mind answering them otherwise you can ignore them :))
Re-reading your comments from earlier provided me some useful insights! Maybe if I have some spare time, I can look into some interesting solutions that could solve this problem. Thanks in advance! |
…_from_vec_of_nonzerou8, r=KodrAus

Added From<Vec<NonZeroU8>> for CString

Added a `From<Vec<NonZeroU8>>` `impl` for `CString`

# Rationale

- `CString::from_vec_unchecked` is a subtle function that makes `unsafe` code harder to audit when the generated `Vec`'s creation is non-trivial. This `impl` makes it possible to write safer `unsafe` code thanks to the very explicit semantics of the `Vec<NonZeroU8>` type.
- One such situation is when trying to `.read()` a `CString`, see issue rust-lang#59229.
  - this led to a PR: rust-lang#59314, that was closed for being too specific / narrow (it only targeted being able to `.read()` a `CString`, when this pattern could have been generalized).
  - the issue suggested another route, based on `From<Vec<NonZeroU8>>`, which is indeed a less general and more concise code pattern.
- quoting @Shnatsel:
  - > For me the main thing about making this safe is simplifying auditing - people have spent like an hour looking at just this one unsafe block in libflate because it's not clear what exactly is unchecked, so you have to look it up when auditing anyway. This has distracted us from much more serious memory safety issues the library had. Having this trivial impl in stdlib would turn this into safe code with compiler more or less guaranteeing that it's fine, and save anyone auditing the code a whole lot of time.
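A small sketch of how this `impl` could be used for the reading case. The function name `read_cstring` is hypothetical, and rejecting input that ends before a terminator is an assumption. Each byte is converted with `NonZeroU8::new`, so the type of the vector itself records that no interior nul is present and `CString::from` needs neither a second scan nor an `unsafe` block.

```rust
use std::ffi::CString;
use std::io::{self, Read};
use std::num::NonZeroU8;

// Hypothetical helper: reads bytes up to the first nul and builds a CString
// through the From<Vec<NonZeroU8>> impl, without any unsafe code.
fn read_cstring<R: Read>(reader: R) -> io::Result<CString> {
    let mut bytes: Vec<NonZeroU8> = Vec::new();
    for byte in reader.bytes() {
        match NonZeroU8::new(byte?) {
            // Non-zero byte: part of the string contents.
            Some(b) => bytes.push(b),
            // Zero byte: the terminator, so the string is complete.
            None => return Ok(CString::from(bytes)),
        }
    }
    // Assumption: input that ends before a terminator is an error.
    Err(io::Error::new(
        io::ErrorKind::UnexpectedEof,
        "missing nul terminator",
    ))
}
```

Reading byte by byte through `Read::bytes` keeps the sketch short; in practice the caller would likely want to pass a buffered reader.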
Description
This PR implements the method `CString::from_reader(impl Read) -> Result<CString, io::Error>`.
Issue #59229 explains why this method should be included in the Standard Library, but it boils down to:
I still have one open point. Currently I construct a `CString` by pushing the null character to the buffer and then using:
I thought that would be better than calling `CString::from_vec_unchecked`, which results in:
In all cases where the vector still has room for another null character, not using `CString::from_vec_unchecked` would be faster because `v.reserve_exact(1);` does not have to be called. I believe this occurs far more often than a `Vec` having no space left when reading from an object that implements `Read`.
I do however wonder whether the compiler is able to eliminate the check that occurs in `v.push` when `v.reserve_exact` is called just prior to it. If it is able to do that, I will change the code to use `CString::from_vec_unchecked`, because in case of a memory allocation it will try to provide a more exact block without the performance overhead of having to do an extra check. If anyone is able to verify this (assembly is not my strongest point), please let me know :)
What do you guys think about this?
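As a rough illustration of the two ways of appending the terminator discussed here, the sketch below isolates them on a plain `Vec<u8>`. Both helper names are hypothetical, and this is not the std-internal code. It only shows the documented `Vec` behaviour: when spare capacity exists, neither variant reallocates, and when the vector is full, `reserve_exact(1)` asks for a tight allocation whereas a bare `push` lets `Vec` pick its own growth.

```rust
// Hypothetical helper: append the terminator and let Vec grow as it sees fit.
fn terminate_with_push(mut v: Vec<u8>) -> Vec<u8> {
    v.push(0);
    v
}

// Hypothetical helper: request room for exactly one more byte first, then push.
// reserve_exact does nothing when the capacity is already sufficient.
fn terminate_with_reserve_exact(mut v: Vec<u8>) -> Vec<u8> {
    v.reserve_exact(1);
    v.push(0);
    v
}
```

Whether the optimizer removes the capacity check inside `push` after `reserve_exact` is not something this sketch can show; that would need inspecting the generated assembly.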
closes #59229
EDIT: Oh, and this is my first pull request on Rust! So if I did anything wrong, please let me know! :)