Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initialization from zeroed bytes is potentially unsound. #1451

Closed
jswrenn opened this issue Jul 31, 2019 · 8 comments
Closed

Initialization from zeroed bytes is potentially unsound. #1451

jswrenn opened this issue Jul 31, 2019 · 8 comments

Comments

@jswrenn
Copy link
Member

jswrenn commented Jul 31, 2019

MaybeUninit::zeroed and mem::zeroed are the recommended way to initialize many of the datatypes in libc (see: #41 (comment), #55 (comment), #58 (comment), #475 (comment), #1135 (comment), #1395 (comment)).

Per rust-lang/unsafe-code-guidelines#174, initialization from zeroed bytes is potentially unsound for structures with padding. The value of padding bytes is expressly undefined. Rust is therefore free to assume that padding bytes have a particular value. If Rust assumes that the padding bytes of a type T have, for instance, the particular value of 42, mem::zeroed() will not produce a valid instance of T. This is instant UB.

Many (but not all) struct definitions in libc encode padding bytes explicitly as private fields. For such types, initialization via mem::zeroed() is not UB. For the types defined using #[repr(align(N))] to introduce padding bytes, initialization via mem::zeroed() flirts with UB. This issue is relevant to #1324, for expanding the use of align(N).

@gnzlbg
Copy link
Contributor

gnzlbg commented Jul 31, 2019

The value of padding bytes is expressly undefined.

I think this is incorrect: the value of padding bytes is currently unspecified.

Rust is therefore free to assume that padding bytes have a particular value

I think this is also incorrect: Rust assumes that any value is a valid value for a padding byte. It does not matter whether you write zero or some other value to them, or whether you write to them at all (e.g. leaving them uninitialized is also ok).

If Rust assumes that the padding bytes of a type T have, for instance, the particular value of 42, mem::zeroed() will not produce a valid instance of T. This is instant UB.

Since padding bytes can take any value, writing zero to them is ok, and the claim that doing so is instant UB is also incorrect.

or the types defined using #[repr(align(N))] to introduce padding bytes, initialization via mem::zeroed() flirts with UB.

Since padding bytes can take any value, and zero is a value, the claim that initializing these padding bytes with mem::zeroed() "flirts" with UB is also incorrect.


In fact, the current documentation for mem::zeroed actually guarantees that all bytes (without any exception) are initialized to zero. This might be a documentation bug, since mem::zeroed shouldn't initialize padding bytes at all (they are allowed to have any value, so why bother).


Many (but not all) struct definitions in libc encode padding bytes explicitly as private fields. For such types, initialization via mem::zeroed() is not UB.

While you are correct here in that initializing these via mem::zeroed is not UB. You are incorrect in that this is actually ok. When obtaining these structs from FFI, these "fields denoting padding" are often uninitialized, and it is currently unspecified whether "uninitialized" is a valid bit-pattern for u8 (and similar types) used to encode padding bits. The type used for padding should probably be MaybeUninit<u8> instead of just u8.

@RalfJung
Copy link
Member

Since padding bytes can take any value, writing zero to them is ok, and the claim that doing so is instant UB is also incorrect.

The argument in rust-lang/unsafe-code-guidelines#174 is that there is actually nothing stating that these bytes can take any value. All we got is current rustc behavior, but that's clearly not enough.

@gnzlbg
Copy link
Contributor

gnzlbg commented Jul 31, 2019

The argument in rust-lang/unsafe-code-guidelines#174 is that there is actually nothing stating that these bytes can take any value.

Which is why the first sentence in my comment says that the current behavior is unspecified.

All we got is current rustc behavior, but that's clearly not enough.

This is not all we got. We do have official documentation everywhere explaining that the following is guaranteed to work on stable Rust:

let value: repr_c_type_with_padding = c_ffi_fn();

Since C is allowed to write anything to padding bytes, the padding bytes of value can be initialized with anything, including undef (e.g. after cross-lang-lto). So even though we don't have explicit wording anywhere in the spec for this, I don't know how you could restrict padding bytes from being able to store any value without breaking stable Rust FFI.

@RalfJung
Copy link
Member

RalfJung commented Jul 31, 2019

Saying that padding bytes cannot be initialized with anything would break Rust FFI stable features.

So your argument is rust-lang/unsafe-code-guidelines#174 does not need an RFC?

I totally agree with you that zero-initialization of padding must be allowed, and initializing padding with any bytes (including uninitialized) should be allowed. But is this just an edit to the reference or the nomicon, or is it a new RFC?

@CryZe
Copy link

CryZe commented Jul 31, 2019

I'd say for at least repr(C) there shouldn't be a need for an RFC, as the guarantees are not for Rust to decide.

@gnzlbg
Copy link
Contributor

gnzlbg commented Jul 31, 2019

But is this just an edit to the reference or the nomicon, or is it a new RFC?

I think this would just be an edit. We do guarantee that:

let mut value: repr_c_type_with_padding = c_ffi_fn();
c_ffi_fn2(&mut value);

works, we do guarantee that one can pass the reference to C via c_ffi_fn2 and that C can dereference that pointer and write through it. That will write anything to the padding bytes, including undef, so that this can happen, and that this happen is ok, is already guaranteed.

The UCGs should, at some point, define what this "anything" is. There is a PR open saying that valid values for a byte are all bit-patterns in range 0..256 and a 0xUU bit-pattern, but is this everything that C can write to a byte? On first look, for all platforms that Rust support that don't have trap representations, I think that's enough, but because of FFI we are a bit tied to C here, so we need to make sure that whatever we define that "anything" to be, that it does not break the FFI use cases.

@gnzlbg
Copy link
Contributor

gnzlbg commented Aug 2, 2019

I'm closing this as mostly invalid and non-actionable.

This issue claims that initialization from zeroed bytes is unsound for integers - which is what we use for padding byte fields - but this claim is false.

This issue also claims that initialization from zeroed bytes is unsound for padding bytes, but that claims is false, and the claim that mem::zeroed() and MaybeUninit::zeroed().assume_init() actually initialize padding bytes with anything is false as well (they do not touch padding bytes).

This issue claims that Initialization from zeroed bytes is potentially unsound. That claim is true, it is potentially unsound for some types, like &T or fn, but this issue does not point out any actual libc type for which zero-initialization is both necessary and unsound, and is therefore not actionable. If you find such a type, please open an issue, so that we can fix it.

@gnzlbg gnzlbg closed this as completed Aug 2, 2019
@gnzlbg
Copy link
Contributor

gnzlbg commented Aug 2, 2019

I've opened #1453 to track that explicit padding fields might invoke undefined behavior when these are returned from C. That issue is barely actionable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants