Hazard in the specification of std::slice::from_raw_parts, since other languages can represent empty slices with a null pointer #120243
Comments
If an FFI function expects to receive nullable pointers, it should be using
Sure, you can write the slice as `slice::from_raw_parts(start.unwrap_or_else(|| ptr::NonNull::<T>::dangling()).as_ptr(), len)`, which is slightly more elegant, I guess. But it doesn't really address the point that you need to know this is a problem in the first place in order to see why doing that could be useful. Unless you explicitly think about the fact that the C++ convention is to represent empty slices using a null pointer, you're very likely to miss it. I am not saying that you can't write correct code with the API as it is; clearly you can.
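A minimal sketch of the approach suggested above, assuming the FFI layer already models the start pointer as `Option<NonNull<T>>`. The helper name `slice_from_nullable` is hypothetical and not part of `std`. It substitutes `NonNull::dangling()`, which is non-null and well-aligned (all a zero-length slice needs), when the pointer is absent.

```rust
use std::ptr::NonNull;
use std::slice;

/// Hypothetical helper (not a std API): builds a slice from a start pointer
/// that the FFI signature already models as nullable.
///
/// # Safety
/// If `start` is `Some`, it must satisfy every `slice::from_raw_parts`
/// requirement for `len` elements. If `start` is `None`, `len` must be 0.
unsafe fn slice_from_nullable<'a, T>(start: Option<NonNull<T>>, len: usize) -> &'a [T] {
    // Only checked in debug builds; `None` with `len > 0` is still UB in release.
    debug_assert!(start.is_some() || len == 0, "null start with non-zero length");
    // A dangling pointer is non-null and aligned, so it is valid for `len == 0`.
    let data = start.unwrap_or_else(NonNull::dangling).as_ptr();
    // SAFETY: upheld by the caller as documented above.
    unsafe { slice::from_raw_parts(data, len) }
}
```

Note that this only moves the null handling into the type: the caller still has to know that the foreign side may pass null in the first place.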
I don't understand the scenario under which this can happen. And the function does have debug assertions, so if one runs such incorrect code under `cargo careful`, it would be detected.
I am working on a compiler change that will detect this problem via a special kind of debug assertion, without any need to use an external tool like cargo-careful. Since such a change is possible, I think it would be a grave mistake to change the API of this function so that it accepts more calls by doing a runtime check in all build configurations.
The blog post does suggest improving the documentation (how? It's not clear to me how the current docs are ambiguous or misleading) and adding separate FFI-specific conversion helpers.
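To make the idea of an FFI-specific conversion helper concrete, here is a hypothetical sketch (not an API that `std` provides) for the C++ `(start, end)` convention. It treats `(null, null)` as the empty slice and panics on a null `start` paired with a non-null `end`. It assumes `T` is not zero-sized, because `pointer::offset_from` panics for zero-sized types.

```rust
use std::slice;

/// Hypothetical FFI helper for a `(start, end)` pointer pair, where an empty
/// range may be represented as `(nullptr, nullptr)`.
///
/// # Safety
/// If `start` is non-null, `start..end` must be a valid, initialized range of
/// `T` within a single allocation, with `start <= end`. `T` must not be a ZST.
unsafe fn slice_from_start_end<'a, T>(start: *const T, end: *const T) -> &'a [T] {
    if start.is_null() {
        // The only meaningful null range is the empty one.
        assert!(end.is_null(), "null start with non-null end");
        return &[];
    }
    // SAFETY: the caller guarantees both pointers are in the same allocation.
    let len = unsafe { end.offset_from(start) };
    assert!(len >= 0, "end precedes start");
    // SAFETY: `start` is non-null and the range is valid per the contract above.
    unsafe { slice::from_raw_parts(start, len as usize) }
}
```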
It's more than a niche optimization. We're telling LLVM that the pointers are non-null and dereferenceable. Passing a null pointer is instant UB, not merely UB on access.
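The niche itself is easy to observe: because a slice reference's data pointer is never null, `Option<&[T]>` can encode `None` as a null pointer and needs no separate tag. This sketch only compares sizes; `std` documents the null-pointer optimization for `Option` of references, and the check below assumes it also holds for slice references, as it does on current compilers. The second function checks that `NonNull::dangling()` is a non-null, aligned pointer, i.e. a valid data pointer for an empty slice. Both function names are hypothetical.

```rust
use std::mem::{align_of, size_of};
use std::ptr::NonNull;

/// True if `Option<&[T]>` is no larger than `&[T]`, meaning the null data
/// pointer is being used as the niche for `None`.
fn slice_option_uses_niche<T>() -> bool {
    size_of::<Option<&[T]>>() == size_of::<&[T]>()
}

/// True if `NonNull::<T>::dangling()` is non-null and aligned for `T`.
fn dangling_is_valid_for_empty<T>() -> bool {
    let p = NonNull::<T>::dangling().as_ptr();
    !p.is_null() && (p as usize) % align_of::<T>() == 0
}
```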
That's quite subjective. In std, those methods are mostly used in collections and iterators to convert between raw-pointer representations and safe slices, which always follow Rust semantics.
Why would we assume this transformation would bite people less often than our destruction of the following equivalence?

    let slice = core::slice::from_raw_parts(ptr, len);
    let (ptr, len) = (slice.as_ptr(), slice.len());
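The equivalence can be written as a small check: with the current contract, `as_ptr()` on the constructed slice returns exactly the pointer that was passed in, including a dangling pointer with length 0. Under the proposal, a `(null, 0)` input would come back as a non-null dangling pointer, so the round trip would no longer be the identity for that input. `round_trip` is a hypothetical name used only for illustration.

```rust
use std::slice;

/// Builds a slice from raw parts and reads the parts back.
///
/// # Safety
/// `ptr` and `len` must satisfy the `slice::from_raw_parts` requirements.
unsafe fn round_trip<T>(ptr: *const T, len: usize) -> (*const T, usize) {
    // SAFETY: upheld by the caller.
    let s: &[T] = unsafe { slice::from_raw_parts(ptr, len) };
    // A slice reference *is* its (pointer, length) pair, so these match the inputs.
    (s.as_ptr(), s.len())
}
```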
I do not find David Ben's post convincing, because I do work with FFI in a context where getting the semantics of slices right is quite important... and the most common representation of a slice-like type there does not match his preferred formats (it does not even use nullptr to indicate validity or otherwise!). The second uses nullptr for a 0-len version because it used to be a LinkedList. Only the third and onwards do. C programmers can be relentless in their eagerness to bitpack things, and I feel that arguing from "but there's a standard slice representation!" requires C2y to add
The documentation is clear that `std::slice::from_raw_parts(std::ptr::null(), 0)` is UB. This creates a significant hazard for FFI code interfacing with C++ (and very likely other languages). Although C++ has no built-in slice type, and `std::span` was only added to its standard library in C++20, the most common way of representing slices is as a `start` pointer and a length, or as `start` and `end` pointers. As explained in this post, `start` is conventionally allowed to be null (i.e. the representation of an empty slice can be `(nullptr, 0)` or `(nullptr, nullptr)`), and this is consistent with the behaviour of C++ standard library APIs such as `std::span`.

The stated rationale for this call being UB is:
This rationale is unconvincing to me. There is no need for a detail of the Rust ABI's internal representation of slices to expose this hazard. The full signature is

Null slices are a niche-value optimization; they're not part of the domain of `&'a [T]`.

When Rust programmers write a call to this function, they are most likely writing FFI code. We don't know where `data` and `len` come from; they could be using either the Rust or the C++ convention for an empty slice. But there is no ambiguity: neither of these representations can mean anything other than an empty slice. So I argue that the implementation of `slice::from_raw_parts` should accept either, converting a null pointer into an aligned "dangling" pointer as necessary.

The fact that this involves a representation change, because of differences between the C++ and Rust slice conventions, shouldn't matter: returning a valid Rust representation of an empty slice is the only thing that could be correct. Note that it is fine for `std::slice::from_raw_parts(ptr::null(), len)` still to be UB when `len > 0`.

To correctly convert a `data`/`len` pair that could be using the C++ convention with the current definition of `slice::from_raw_parts`, you would need to do something like:

which is verbose, and the need for it is easy to miss.
What about the cost of the null-pointer check? Well, in many cases the compiler will be able to statically determine that `data` is non-null. In those cases, when it inlines `slice::from_raw_parts`, it will optimize out the null check, and the cost will be zero. This includes both invocations in the code above. It even includes complicated cases such as this one in `bitvec`:

Here `self.bitspan.address().to_const()` returns the result of `as_ptr()` on a `NonNull` value (via this code in `wyz`, which is `#[inline(always)]`), which allows the null check to be optimized out. Many of the other examples I looked at are like this.

The only case where correct code can incur an additional cost for the null check is when the programmer knows, correctly, that `data` cannot be null, but the compiler does not. And we only care about that where it causes a performance regression. I would suggest that this is probably very rare, and that it is likely to be much more common for programmers to take a `(start, len)` or `(start, end)` slice according to the C++ convention and simply not think of the corner case where `start` is null.

What about the possibility of code written for the new specification being run on an older Rust version that requires `data` to be non-null? That's okay as long as we document when the requirement was changed. Then:

Aside: `String::from_raw_parts` and `Vec::from_raw_parts_in` are similar, but the case for changing those is much weaker, since they impose strong constraints on the allocated region that are unlikely to be met by a random slice coming from C++ or another language.