-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Document current justification for not requiring recursive reference validity (in particular, &mut uninit
not being immediate UB)
#346
Comments
This writeup is also relevant, as is the tower of weakenings: it could be a useful position to say that memory validity is checked when retagging (insert a read on |
So I'm going to contest the claim that there's no known optimization benefit to this. The code below gives an example. It uses enum E {
A(UnsafeCell<u8>),
B(u8),
}
let b = E::B(0);
opaque_func(&b);
assert_eq!(b, E::B(0)); The goal is to optimize out the assert. However, it is unclear how to justify this optimization. Specifically, we pass a This necessarily means that retagging (which happens basically every time a reference is copied) must assert validity conditions in at least some cases. We have some control over when this happens though. For example, there's no need to assert validity when retagging a There are alternatives to this model as well - for example, we could try and adjust the aliasing model to make this unnecessary, by having some rule like ("retagged references get the same permissions as their parents" or such). It's unclear how precisely to do this though - a couple naive ideas I've thought of don't work - and so this would probably need quite some investigation. cc @joshtriplett who suggested that we should commit to allowing invalid memory behind references. Given the complexity here, I don't think this is such a good idea right now. |
Maybe I'm just being too picky, but I think this wording is way too strong. Stacked Borrowed with Untagged and Stacked Borrows with Wildcard both miss UB (or at least I'm sure the latter does, the former is just so strange that maybe it just prescribes wacky semantics). So with the best checker we have now you can choose a model farther up the tower, thus with false positives for UB, or the "actual" model with false negatives. I'm not sure there is a path forward to a perfect sanitizing checker like you describe.
Is it really decided if it is still being contended? I don't want to become the guy who argues about |
Note that (Also, I use contentious maybe a little loosely in this case; roughly everyone now supports wf UTF-8 being not a compiler-required property, and the contention is more along the lines of primitives having a language-prescribed safety invariant at all.) I added a clarification to the OP to that point. |
The above clarification has led me to an interesting realization: Roughly speaking, validity invariants are enforced on MIR trying to get typed copy without retagRust #![feature(intrinsics, raw_ref_op)]
// NB: std::intrinsics doesn't expose the raw intrinsic; link it here
extern "rust-intrinsic" {
pub fn copy_nonoverlapping<T>(src: *const T, dst: *mut T, count: usize);
}
pub fn example_retag(_1: &i32) {
let mut _2 = _1;
let _3 = unsafe {
copy_nonoverlapping(
&raw const _1 /* _4 */,
&raw mut _2 /* _5 */,
1,
)
};
} MIR fn example_retag(_1: &i32) -> () {
debug _1 => _1; // in scope 0 at /app/example.rs:8:22: 8:24
let mut _0: (); // return place in scope 0 at /app/example.rs:8:32: 8:32
let mut _2: &i32; // in scope 0 at /app/example.rs:9:9: 9:15
let mut _4: *const &i32; // in scope 0 at /app/example.rs:12:13: 12:26
let mut _5: *mut &i32; // in scope 0 at /app/example.rs:13:13: 13:24
scope 1 {
debug _2 => _2; // in scope 1 at /app/example.rs:9:9: 9:15
let _3: (); // in scope 1 at /app/example.rs:10:9: 10:11
scope 2 {
debug _3 => _3; // in scope 2 at /app/example.rs:10:9: 10:11
}
scope 3 {
}
}
bb0: {
Retag([fn entry] _1); // scope 0 at /app/example.rs:8:1: 17:2
StorageLive(_2); // scope 0 at /app/example.rs:9:9: 9:15
_2 = _1; // scope 0 at /app/example.rs:9:18: 9:20
Retag(_2); // scope 0 at /app/example.rs:9:18: 9:20
StorageLive(_3); // scope 1 at /app/example.rs:10:9: 10:11
StorageLive(_4); // scope 3 at /app/example.rs:12:13: 12:26
_4 = &raw const _1; // scope 3 at /app/example.rs:12:13: 12:26
Retag([raw] _4); // scope 3 at /app/example.rs:12:13: 12:26
StorageLive(_5); // scope 3 at /app/example.rs:13:13: 13:24
_5 = &raw mut _2; // scope 3 at /app/example.rs:13:13: 13:24
Retag([raw] _5); // scope 3 at /app/example.rs:13:13: 13:24
copy_nonoverlapping(src=move _4, dst=move _5, count=const 1_usize); // scope 3 at /app/example.rs:11:9: 15:10
StorageDead(_5); // scope 3 at /app/example.rs:15:9: 15:10
StorageDead(_4); // scope 3 at /app/example.rs:15:9: 15:10
_0 = const (); // scope 0 at /app/example.rs:8:32: 17:2
StorageDead(_3); // scope 1 at /app/example.rs:17:1: 17:2
StorageDead(_2); // scope 0 at /app/example.rs:17:1: 17:2
return; // scope 0 at /app/example.rs:17:2: 17:2
}
} |
The trick there is that Edit: I have been informed by Saethlin that this is still off by default and so you should be able to recreate it |
|
@CAD97
We never special case "(un)initialized" in the operational semantics, so I don't think we should do this here. The fact that uninit memory is UB at almost every type simply falls out of the general framework of validity invariants (which I say in turn falls out of "value representations").
Interesting. Miri does not run this conceptual call, and I had not thought of it as even existing -- I was thinking drop only gets called for types that actually need it, as part of Rust → MIR / MiniRust lowering. But as you say this leads to inconsistencies around generics. We should keep this in mind for when we come to specifying that lowering. Also when you said "Writing into an uninitialized place becomes UB" I thought you meant
So are you suggesting that we should have So far I only saw it as our task to figure out the actual authoritative spec of Rust, not to also figure out a separate, more restrictive set of rules that we present as "the model to teach to unsafe Rust authors by default". I am not opposed to having both of them, and for them to be different (as long as we never state any falsehoods), but I think we should clearly separate those discussions.
Indeed this is an optimization we lose with Miri's current approach to #236. I was aware of that when changing this in Miri (Miri used to consider that program UB at some point) and considered it acceptable, and I still hold that position. YMMV. :) Another example of this is rust-lang/rust#97161: #[rustc_layout_scalar_valid_range_start(1)]
pub struct NonNull {
pointer: *const (),
}
#[no_mangle]
pub fn use_nonnull(p: &NonNull) {
std::hint::black_box(p.pointer);
} Without requiring validity behind However this is a rather special case that arises due to directly accessing a field of a
Yes, such is the tradeoff of permissive provenance. But that is a very special case, and one that we have realistic hopes of restricting to a small set of (hopefully mostly legacy) code through the promotion of Strict Provenance APIs. It is something very different to have practically uncheckable UB on an operation that is used all the time by everyone. That said, I am not sure how bad the impact of checking validity behind references would be. If we only go one level down, it might be feasible to check this. But just conceptually I don't think it is the right thing to do. Validity is not something that is artifically added by "checking" something, it falls out entirely naturally because we round-trip some bytes through a higher-level representation and back.
The full invariant of So, there is just no chance that for reference types, their safety invariant and validity invariant will be the same. Sorry.
Indeed that is also how I think about it. (And And yeah I have not entirely made up my mind whether I think each typed copy should be a retag. But my current inclination is no, that just seems like a too big operation. I am curious what others think about this. :) |
Absolutely. Despite potential the false positives and false negatives, beginners and experts alike use miri on a regular basis even when just showing examples to each other. |
There's been very recent discussion on Zulip that shows that I'm over simplifying here, that I just became aware of. Specifically, it actually is the case that In safe code, this is easy to show: let complex_type = ( Some, Nontrivial, Data );
let moved_out = complex_type.0;
// drop glue for complex_type allows partially deinitializing it and only drops the other stuff but there's a more subtle difference, specifically w.r.t. generics: unsafe fn drop_uninit<T>() {
let mut x = std::mem::zeroed::<T>();
let p = addr_of_mut!(x) as *mut MaybeUninit<u8>;
p.write(MaybeUninit::uninit());
// Drop(x) <- current MIR
// drop_in_place(&mut x) <- proposal
}
fn main() {
unsafe { drop_uninit::<bool>(); }
} Miri currently accepts this code, as MIR This is interestingly related to invalid not-used-again values (#84). Drop glue is I think at the moment perhaps underspecified; I personally think it would be cleaner to say that drop elaboration always calls An alternative formulation would be to say that the MIR
Yes; for the same reason having Miri able to check the AM opsem is desirable, it is also beneficial to have a machine-checkable mid-level subset. This is effectively what safe rust is; the statically enforceable model. I think it would be beneficial to have a mid-level subset to allow incrementally allowing
I do agree that whatever this "teaching model" is, it's a fully distinct question from the AM opsem. But it is also clearly relevant to this specific choice about
This is true w.r.t. the AM opsem, but to a developer it does act more like the AM asserting certain properties at defined points. (e.g. the I do think that retag operations (or place operations) asserting byte-validity of the place is at least a reasonable opsem, and that validity being one level deep in this way is at least as teachable as it being zero levels deep.
I think w.r.t. teaching, asserting byte-validity and And just apologies again for the brain-dump format writeup. If I had more time I'd've written a shorter letter etc. |
&mut uninit
not being immediate UB&mut uninit
not being immediate UB
Agreed, that would be rather clean. It is also not called on variables that have been moved out of (both fully and partially moved). /me briefly ponders the interaction of partial initialization and
Yeah but then we'd have to find a way to preserve all those Drop for Miri. Currently I think many get removed pretty early on, in drop elaboration.
We have to be careful here since this is not monotone. The less you allow the environment/client/caller to do, the more you can do yourself.
I am not opposed to axiomatic semantics, as long as they are proven sound against an operational ground truth. ;)
I think it has some serious problems. Consider this example: // This function can be called with `rc` and `unique` pointing to
// overlapping regions of memory.
fn evil_alias<T>(rc: &RefCell<T>, unique: &mut T) {}
fn proof<T>(x: T) {
let rc = RefCell::new(x);
let mut bmut = rc.borrow_mut();
evil_alias(&rc, &mut *bmut);
} Now when So, at least inside |
Regarding optimization, I feel like there’s some potential gains from knowing that |
Yes, and for that reason Miri (and MiniRust) carve out a special case of reference to uninhabited types; for Miri that happens here, for MiniRust here. Note that this can be checked without even looking at the pointed-to memory, which makes this somewhat different from validating, say, that This then in turn lets us declare |
The My current position is I think the rough consensus: retag does not do a read, and I could go either way on
It's also worth noting that the canonical example of passing (Adide: I'm apparently fond of byte-valid[ity] for clarifying that it's just about the value representation.)
Not currently (and this is a point of vocabulary contention between "byte inhabited" and "safely inhabited" being distinct concepts... but this (or something very closely related) also occurs for e.g. Footnotes
|
&mut uninit
not being immediate UB&mut uninit
not being immediate UB)
MOVNTI, MOVNTDQ, and friends weaken TSO when next to other stores. As most stores are not nontemporal, so LLVM uses simple stores when lowering LLVMIR like `atomic store ... release` on x86. These facts could allow something like the following code to be emitted: vmovntdq [addr], ymmreg vmovntdq [addr+N], ymmreg vmovntdq [addr+N*2], ymmreg vmovntdq [addr+N*3], ymmreg mov byte ptr [flag], 1 ; producer-consumer flag But these stores are NOT ordered with respect to each other! Nontemporal stores induce the CPU to use write-combining buffers. These writes will be resolved in bursts instead of at once, and the write may be further deferred until a serialization point. Even a non-temporal write to any other location will not force the deferred writes to be resolved first. Thus, assuming cache-line-sized buffers of 64 bytes, the CPU may resolve these writes in e.g. this actual order: vmovntdq [addr+N*2], ymmreg vmovntdq [addr+N*3], ymmreg mov byte ptr [flag], 1 vmovntdq [addr+N], ymmreg vmovntdq [addr], ymmreg This could e.g. result in other threads accessing this address after the flag is set, thus accessing memory via safe code that was assumed to be correctly synchronized. This could result in observing tearing or other inconsistent program states. If using `&mut [u8]` to write uninitialized memory is permitted ( per rust-lang/unsafe-code-guidelines#346 ), it could even result in safe code incorrectly reading uninitialized memory! To guarantee program soundness, code using nontemporal stores must currently use sfence in its safety boundary, unless and until LLVM decides this combination of facts should be considered a miscompilation and a motivation to choose lowerings that do not require explicit sfence.
MOVNTI, MOVNTDQ, and friends weaken TSO when next to other stores. As most stores are not nontemporal, so LLVM uses simple stores when lowering LLVMIR like `atomic store ... release` on x86. These facts could allow something like the following code to be emitted: ```asm vmovntdq [addr], ymmreg vmovntdq [addr+N], ymmreg vmovntdq [addr+N*2], ymmreg vmovntdq [addr+N*3], ymmreg mov byte ptr [flag], 1 ; producer-consumer flag ``` But these stores are NOT ordered with respect to each other! Nontemporal stores induce the CPU to use write-combining buffers. These writes will be resolved in bursts instead of at once, and the write may be further deferred until a serialization point. Even a non-temporal write to any other location will not force the deferred writes to be resolved first. Thus, assuming cache-line-sized buffers of 64 bytes, the CPU may resolve these writes in e.g. this actual order: ```asm vmovntdq [addr+N*2], ymmreg vmovntdq [addr+N*3], ymmreg mov byte ptr [flag], 1 vmovntdq [addr+N], ymmreg vmovntdq [addr], ymmreg ``` This could e.g. result in other threads accessing this address after the flag is set, thus accessing memory via safe code that was assumed to be correctly synchronized. This could result in observing tearing or other inconsistent program states. If using `&mut [u8]` to write uninitialized memory is permitted ( per rust-lang/unsafe-code-guidelines#346 ), it could even result in safe code incorrectly reading uninitialized memory! To guarantee program soundness, code using nontemporal stores must currently use sfence in its safety boundary, unless and until LLVM decides this combination of facts should be considered a miscompilation and a motivation to choose lowerings that do not require explicit sfence.
MOVNTI, MOVNTDQ, and friends weaken TSO when next to other stores. As most stores are not nontemporal, so LLVM uses simple stores when lowering LLVMIR like `atomic store ... release` on x86. These facts could allow something like the following code to be emitted: ```asm vmovntdq [addr], ymmreg vmovntdq [addr+N], ymmreg vmovntdq [addr+N*2], ymmreg vmovntdq [addr+N*3], ymmreg mov byte ptr [flag], 1 ; producer-consumer flag ``` But these stores are NOT ordered with respect to each other! Nontemporal stores induce the CPU to use write-combining buffers. These writes will be resolved in bursts instead of at once, and the write may be further deferred until a serialization point. Even a non-temporal write to any other location will not force the deferred writes to be resolved first. Thus, assuming cache-line-sized buffers of 64 bytes, the CPU may resolve these writes in e.g. this actual order: ```asm vmovntdq [addr+N*2], ymmreg vmovntdq [addr+N*3], ymmreg mov byte ptr [flag], 1 vmovntdq [addr+N], ymmreg vmovntdq [addr], ymmreg ``` This could e.g. result in other threads accessing this address after the flag is set, thus accessing memory via safe code that was assumed to be correctly synchronized. This could result in observing tearing or other inconsistent program states. If using `&mut [u8]` to write uninitialized memory is permitted ( per rust-lang/unsafe-code-guidelines#346 ), it could even result in safe code incorrectly reading uninitialized memory! To guarantee program soundness, code using nontemporal stores must currently use SFENCE in its safety boundary, unless and until LLVM decides this combination of facts should be considered a miscompilation and motivation to choose lowerings that do not require explicit SFENCE.
MOVNTI, MOVNTDQ, and friends weaken TSO when next to other stores. As most stores are not nontemporal, LLVM uses simple stores when lowering LLVMIR like `atomic store ... release` on x86. These facts could allow something like the following code to be emitted: ```asm vmovntdq [addr], ymmreg vmovntdq [addr+N], ymmreg vmovntdq [addr+N*2], ymmreg vmovntdq [addr+N*3], ymmreg mov byte ptr [flag], 1 ; producer-consumer flag ``` But these stores are NOT ordered with respect to each other! Nontemporal stores induce the CPU to use write-combining buffers. These writes will be resolved in bursts instead of at once, and the write may be further deferred until a serialization point. Even a non-temporal write to any other location will not force the deferred writes to be resolved first. Thus, assuming cache-line-sized buffers of 64 bytes, the CPU may resolve these writes in e.g. this actual order: ```asm vmovntdq [addr+N*2], ymmreg vmovntdq [addr+N*3], ymmreg mov byte ptr [flag], 1 vmovntdq [addr+N], ymmreg vmovntdq [addr], ymmreg ``` This could e.g. result in other threads accessing this address after the flag is set, thus accessing memory via safe code that was assumed to be correctly synchronized. This could result in observing tearing or other inconsistent program states. If using `&mut [u8]` to write uninitialized memory is permitted ( per rust-lang/unsafe-code-guidelines#346 ), it could even result in safe code incorrectly reading uninitialized memory! To guarantee program soundness, code using nontemporal stores must currently use SFENCE in its safety boundary, unless and until LLVM decides this combination of facts should be considered a miscompilation and motivation to choose lowerings that do not require explicit SFENCE.
MOVNTI, MOVNTDQ, and friends weaken TSO when next to other stores. As most stores are not nontemporal, LLVM uses simple stores when lowering LLVMIR like `atomic store ... release` on x86. These facts could allow something like the following code to be emitted: ```asm vmovntdq [addr], ymmreg vmovntdq [addr+N], ymmreg vmovntdq [addr+N*2], ymmreg vmovntdq [addr+N*3], ymmreg mov byte ptr [flag], 1 ; producer-consumer flag ``` But these stores are NOT ordered with respect to each other! Nontemporal stores induce the CPU to use write-combining buffers. These writes will be resolved in bursts instead of at once, and the write may be further deferred until a serialization point. Even a non-temporal write to any other location will not force the deferred writes to be resolved first. Thus, assuming cache-line-sized buffers of 64 bytes, the CPU may resolve these writes in e.g. this actual order: ```asm vmovntdq [addr+N*2], ymmreg vmovntdq [addr+N*3], ymmreg mov byte ptr [flag], 1 vmovntdq [addr+N], ymmreg vmovntdq [addr], ymmreg ``` This could e.g. result in other threads accessing this address after the flag is set, thus accessing memory via safe code that was assumed to be correctly synchronized. This could result in observing tearing or other inconsistent program states, especially as the number of writes, thus the number of write buffers that may begin retiring simultaneously, thus the chance of them resolving in an unfortunate order, increases. If using `&mut [u8]` to write uninitialized memory is permitted ( per rust-lang/unsafe-code-guidelines#346 ), it could even result in an access to `&[u8]` actually being reading uninitialized memory in safe code! To guarantee program soundness, code using nontemporal stores must currently use SFENCE in its safety boundary, unless and until LLVM decides this combination of facts should be considered a miscompilation and motivation to choose lowerings that do not require explicit SFENCE.
MOVNTI, MOVNTDQ, and friends weaken TSO when next to other stores. As most stores are not nontemporal, LLVM uses simple stores when lowering LLVMIR like `atomic store ... release` on x86. These facts could allow something like the following code to be emitted: ```asm vmovntdq [addr], ymmreg vmovntdq [addr+N], ymmreg vmovntdq [addr+N*2], ymmreg vmovntdq [addr+N*3], ymmreg mov byte ptr [flag], 1 ; producer-consumer flag ``` But these stores are NOT ordered with respect to each other! Nontemporal stores induce the CPU to use write-combining buffers. These writes will be resolved in bursts instead of at once, and the write may be further deferred until a serialization point. Even a non-temporal write to any other location will not force the deferred writes to be resolved first. Thus, assuming cache-line-sized buffers of 64 bytes, the CPU may resolve these writes in e.g. this actual order: ```asm vmovntdq [addr+N*2], ymmreg vmovntdq [addr+N*3], ymmreg mov byte ptr [flag], 1 vmovntdq [addr+N], ymmreg vmovntdq [addr], ymmreg ``` This could e.g. result in other threads accessing this address after the flag is set, thus accessing memory via safe code that was assumed to be correctly synchronized. This could result in observing tearing or other inconsistent program states, especially as the number of writes, thus the number of write buffers that may begin retiring simultaneously, thus the chance of them resolving in an unfortunate order, increases. If using `&mut [u8]` to write uninitialized memory is permitted ( per rust-lang/unsafe-code-guidelines#346 ), it could even result in an access to `&[u8]` actually being reading uninitialized memory in safe code! To guarantee program soundness, code using nontemporal stores must currently use SFENCE in its safety boundary, unless and until LLVM decides this combination of facts should be considered a miscompilation and motivation to choose lowerings that do not require explicit SFENCE.
I have one I find persuasive in rust-lang/rust#117800. Eliminating short-circuiting requires that it be legal to read memory as Interestingly, though, any case I can think of for this just cares about initializedness (so as not to produce poisons) but not validity. I don't know if it'd make it easier, but for example I think it'd be fine for this to allow (I came here from https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/Poison-ness.20behind.20references/near/401455323) |
No. It is library UB to call that function when the array is not fully initialized. This is just not a transformation the compiler can make.
If you want |
As @comex1 noted, the compiler actually should be able to turn (informal) That said, it does seem like it would be beneficial to have some way to eagerly directly assert byte validity behind a reference, although based on the load-first impl not getting the better codegen it wouldn't help in this case. ( Footnotes
|
(This is my first PR here, so I've probably missed some things. Please let me know what else I should do to help you as a reviewer!) # Objective Due to rust-lang/rust#117800, the `derive`'d `PartialEq::eq` on `Entity` isn't as good as it could be. Since that's used in hashtable lookup, let's improve it. ## Solution The derived `PartialEq::eq` short-circuits if the generation doesn't match. However, having a branch there is sub-optimal, especially on 64-bit systems like x64 that could just load the whole `Entity` in one load anyway. Due to complications around `poison` in LLVM and the exact details of what unsafe code is allowed to do with reference in Rust (rust-lang/unsafe-code-guidelines#346), LLVM isn't allowed to completely remove the short-circuiting. `&Entity` is marked `dereferencable(8)` so LLVM knows it's allowed to *load* all 8 bytes -- and does so -- but it has to assume that the `index` might be undef/poison if the `generation` doesn't match, and thus while it finds a way to do it without needing a branch, it has to do something slightly more complicated than optimal to combine the results. (LLVM is allowed to change non-short-circuiting code to use branches, but not the other way around.) Here's a link showing the codegen today: <https://rust.godbolt.org/z/9WzjxrY7c> ```rust #[no_mangle] pub fn demo_eq_ref(a: &Entity, b: &Entity) -> bool { a == b } ``` ends up generating the following assembly: ```asm demo_eq_ref: movq xmm0, qword ptr [rdi] movq xmm1, qword ptr [rsi] pcmpeqd xmm1, xmm0 pshufd xmm0, xmm1, 80 movmskpd eax, xmm0 cmp eax, 3 sete al ret ``` (It's usually not this bad in real uses after inlining and LTO, but it makes a strong demo.) This PR manually implements `PartialEq::eq` *without* short-circuiting, and because that tells LLVM that neither the generations nor the index can be poison, it doesn't need to be so careful and can generate the "just compare the two 64-bit values" code you'd have probably already expected: ```asm demo_eq_ref: mov rax, qword ptr [rsi] cmp qword ptr [rdi], rax sete al ret ``` Since this doesn't change the representation of `Entity`, if it's instead passed by *value*, then each `Entity` is two `u32` registers, and the old and the new code do exactly the same thing. (Other approaches, like changing `Entity` to be `[u32; 2]` or `u64`, affect this case.) This should hopefully merge easily with changes like #9907 that also want to change `Entity`. ## Benchmarks I'm not super-confident that I got my machine fully consistent for benchmarking, but whether I run the old or the new one first I get reasonably consistent results. Here's a fairly typical example of the benchmarks I added in this PR: ![image](https://github.com/bevyengine/bevy/assets/18526288/24226308-4616-4082-b0ff-88fc06285ef1) Building the sets seems to be basically the same. It's usually reported as noise, but sometimes I see a few percent slower or faster. But lookup hits in particular -- since a hit checks that the key is equal -- consistently shows around 10% improvement. `cargo run --example many_cubes --features bevy/trace_tracy --release -- --benchmark` showed as slightly faster with this change, though if I had to bet I'd probably say it's more noise than meaningful (but at least it's not worse either): ![image](https://github.com/bevyengine/bevy/assets/18526288/58bb8c96-9c45-487f-a5ab-544bbfe9fba0) This is my first PR here -- and my first time running Tracy -- so please let me know what else I should run, or run things on your own more reliable machines to double-check. --- ## Changelog (probably not worth including) Changed: micro-optimized `Entity::eq` to help LLVM slightly. ## Migration Guide (I really hope nobody was using this on uninitialized entities where sufficiently tortured `unsafe` could could technically notice that this has changed.)
This is a gorgeous writeup 😍 I think we should keep your "what is UB" intro for posterity somewhere. I particularly liked the clarity of "UB should be detectable/justified/operational". Is there a better place than the glossary for that? If not I'll PR this into the glossary. |
The glossary is meant for definitions, not rationale. But I don't know a good place for rationale, unfortunately. |
(This is my first PR here, so I've probably missed some things. Please let me know what else I should do to help you as a reviewer!) # Objective Due to rust-lang/rust#117800, the `derive`'d `PartialEq::eq` on `Entity` isn't as good as it could be. Since that's used in hashtable lookup, let's improve it. ## Solution The derived `PartialEq::eq` short-circuits if the generation doesn't match. However, having a branch there is sub-optimal, especially on 64-bit systems like x64 that could just load the whole `Entity` in one load anyway. Due to complications around `poison` in LLVM and the exact details of what unsafe code is allowed to do with reference in Rust (rust-lang/unsafe-code-guidelines#346), LLVM isn't allowed to completely remove the short-circuiting. `&Entity` is marked `dereferencable(8)` so LLVM knows it's allowed to *load* all 8 bytes -- and does so -- but it has to assume that the `index` might be undef/poison if the `generation` doesn't match, and thus while it finds a way to do it without needing a branch, it has to do something slightly more complicated than optimal to combine the results. (LLVM is allowed to change non-short-circuiting code to use branches, but not the other way around.) Here's a link showing the codegen today: <https://rust.godbolt.org/z/9WzjxrY7c> ```rust #[no_mangle] pub fn demo_eq_ref(a: &Entity, b: &Entity) -> bool { a == b } ``` ends up generating the following assembly: ```asm demo_eq_ref: movq xmm0, qword ptr [rdi] movq xmm1, qword ptr [rsi] pcmpeqd xmm1, xmm0 pshufd xmm0, xmm1, 80 movmskpd eax, xmm0 cmp eax, 3 sete al ret ``` (It's usually not this bad in real uses after inlining and LTO, but it makes a strong demo.) This PR manually implements `PartialEq::eq` *without* short-circuiting, and because that tells LLVM that neither the generations nor the index can be poison, it doesn't need to be so careful and can generate the "just compare the two 64-bit values" code you'd have probably already expected: ```asm demo_eq_ref: mov rax, qword ptr [rsi] cmp qword ptr [rdi], rax sete al ret ``` Since this doesn't change the representation of `Entity`, if it's instead passed by *value*, then each `Entity` is two `u32` registers, and the old and the new code do exactly the same thing. (Other approaches, like changing `Entity` to be `[u32; 2]` or `u64`, affect this case.) This should hopefully merge easily with changes like bevyengine#9907 that also want to change `Entity`. ## Benchmarks I'm not super-confident that I got my machine fully consistent for benchmarking, but whether I run the old or the new one first I get reasonably consistent results. Here's a fairly typical example of the benchmarks I added in this PR: ![image](https://github.com/bevyengine/bevy/assets/18526288/24226308-4616-4082-b0ff-88fc06285ef1) Building the sets seems to be basically the same. It's usually reported as noise, but sometimes I see a few percent slower or faster. But lookup hits in particular -- since a hit checks that the key is equal -- consistently shows around 10% improvement. `cargo run --example many_cubes --features bevy/trace_tracy --release -- --benchmark` showed as slightly faster with this change, though if I had to bet I'd probably say it's more noise than meaningful (but at least it's not worse either): ![image](https://github.com/bevyengine/bevy/assets/18526288/58bb8c96-9c45-487f-a5ab-544bbfe9fba0) This is my first PR here -- and my first time running Tracy -- so please let me know what else I should run, or run things on your own more reliable machines to double-check. --- ## Changelog (probably not worth including) Changed: micro-optimized `Entity::eq` to help LLVM slightly. ## Migration Guide (I really hope nobody was using this on uninitialized entities where sufficiently tortured `unsafe` could could technically notice that this has changed.)
by 1) panicking on overlapping keys and 2) returning an array of Option rather than an Option of array. And address a potential future UB by only using raw pointers rust-lang/unsafe-code-guidelines#346
by 1) panicking on overlapping keys and 2) returning an array of Option rather than an Option of array. And address a potential future UB by only using raw pointers rust-lang/unsafe-code-guidelines#346 a
In the past I have assumed that the mere **existence** of a `&mut` reference to uninitialized memory results in instant Undefined Behavior (UB), even if there are no explicit reads in the code. This scenario has been recently discussed in the internal chatroom about `unsafe` Rust code (see https://chat.google.com/room/AAAAhLsgrQ4/Fx2naiaXbeU) where rust-lang/unsafe-code-guidelines#346 was linked and where it seems that the consensus is to **not** treat `&mut uninit` as immediate UB. On one hand the discussions are still ongoing, but OTOH I don't want to make/spread safety notes that may very well be incorrect and overly conservative. So, for now, let me delete the related safety comments from `FFI.rs`. Bug: chromium:356884491 Change-Id: Ica15532493dc0c35b12332df04306fe87be10d3e Reviewed-on: https://skia-review.googlesource.com/c/skia/+/904956 Auto-Submit: Łukasz Anforowicz <[email protected]> Commit-Queue: Daniel Dilan <[email protected]> Reviewed-by: Daniel Dilan <[email protected]>
This post is a draft of my understanding of the problem space, and justification for
&mut uninit
not being UB to hold or pass around between functions.What is UB?
This is a very brief summary; see the glossary, various blog posts, and #253 for more.
In short, UB is a language-level contract that some situation does not happen. Formally speaking, a program cannot "have UB;" UB is a property of some execution resulting in some behavior considered undefined. Note that this is the formal meaning of undefined; encountering UB retroactively removes any and all guarantees about the program execution1.
Depending on who you ask, UB in C++ may have originally been about allowing implementation-defined behavior and implementations to diverge on how they implement the language. Even if this is the case, though, every commercial C++ compiler uses UB under the modern understanding for optimization, and a language without UB is one that is very difficult if not impossible to optimize2. And more importantly, Rust is not C++.
With the benefit of hindsight from the experience of C/C++ and other language design in the past 50 years, Rust takes a much more deliberate approach to UB. In particular:
Why isn't
&mut uninit
currently considered UB by Miri?In short: because it does no operation that is undefined. Expanding on that a little:
Additionally, writing to an uninitialized place is allowed as this consists of
Note, however, that writing uninit into
&mut init
is always UB. This is because this does a typed copy of the written value, which asserts that the written value is initialized.Why would
&mut uninit
be considered UB?There are two operational ways that references to uninitialized memory could be made operationally UB: in borrow validation or during conversion between references and places.
But how much memory validity? The easy answer is full memory validity at the referenced type; the minimal answer for the desired property is just that the memory is initialized. However, checking bytes are initialized still requires full type information to know which bytes are potentially padding and thus are allowed to be uninitialized, so full memory validity is simpler to check and cost us nothing extra on the implementation.
There are subtle differences to the properties derivable from when exactly memory validity is asserted, but the purpose of this document is to discuss the fundamental reasons for/against using any of them generally.
What otherwise valid programs are made UB?
There's at least two notable losses, one obvious, and one not so obvious.
The obvious one is just any program using a type like
&mut [u8]
to reference potentially-uninitialized memory. Many existing implementations ofio::Read
are written to carefully avoid reading the provided buffer before writing to it, such that they might be used to read into an uninitialized buffer. There is an existing accepted RFC allowing for safely reading into an uninitialized buffer3, but it would be very unfortunate to make nearly all existing code unsound.The less obvious one is with pointers. Writing into an uninitialized place (via
=
assignment expression) becomes UB, even if that place is behind a raw pointer. This is because the drop glue semantically callsstd::ptr::drop_in_place(&mut place)
, creating a mutable reference to the place. You can potentially recover writes to places not asserting memory validity of the place by semantically only creating the reference if the place's type has drop glue, but this has further complications around generics (as MIR is produced for the generic function in polymorphic form, andptr::drop_in_place
must be called there). It is perhaps a better idea to useptr::write
for writing into uninitialized memory anyway, but this adds an additional subtle pitfall to what are supposed to be raw pointers with mostly C-like semantics (so long as you don't create any references).What benefit is to be gained?
What
&mut uninit
being UB would theoretically provide is that references could be marked as "dereferencable(noundef N)
" where they are currently markeddereferencable(N)
in the LLVM backend. Pointee memory validation during borrow validation would likely be enough to justify this; validation, and ref-to-place or place-to-ref time could be enough for "dereferencable_on_entry(noundef N)
" if reborrowing for a function argument counts as doing a ref-to-place-to-ref conversion (how you would write it in source,&*ref
).However, there is at the time of writing no known optimization benefit to eagerly marking references as pointing to known-init memory, neither practical nor theoretical. This is due to a simple observation: when the memory is read by the source program, it then undergoes a typed copy which asserts that it is memory valid.
So the only potential optimization lost is speculative reads. However, we already justify that references must be dereferencable by the borrow validity rules, so it is perfectly fine to speculatively read memory from behind a reference. It is even valid to make decisions based on the value before it is semantically guaranteed to be read by the source program, so long as the speculative execution can deal with speculation being driven by uninit (e.g. by
freeze
ing it to an arbitrary-but-consistent noundef byte value).So if there's no optimization benefit to eagerly checking for references pointing to uninit memory, the benefit is solely in diagnosing ill-formed programs. By eagerly checking, the existence of references-to-uninit can be diagnosed when they are created rather than when the uninitialized memory is read4, properly blaming the creator of the reference rather than the place just doing safe reads of a safe reference.
However, this "victim blaming" is actually fundamental to how Miri works to diagnose operational UB. Miri does not (and cannot) understand your library's safety preconditions. The only thing Miri diagnoses is when the code violates the conditions of the Abstract Machine (exhibits UB), and will point at the point where the violation happened as the culprit, even if the bug in the code is instead a far-removed violation of a library's contract invoking "library UB5". Miri only cares about and diagnoses whether a specific execution of a specific complete compilation graph encounters language UB, and using this to show soundness is left to careful application of tests.
Why is this contentious?
It is the author's belief that
&mut uninit
is somewhat unique in the space of defined-but-unsafe Rust, in that this is a safety invariant on a otherwise very strict primitive type. Of the primitive types,!
is always invalid.()
is always valid.iNN
,uNN
,fNN
, only have the trivial noundef validity invariant.char
,bool
only have validity invariants.[_; N]
and[_]
inherit all safety/validity invariants from the contained type.*const _
and*mut _
only have the trivial noundef validity invariant.str
has officially been decided to have the validity invariant of[u8]
and for valid UTF-8 to be a safety invariant.&str
and notstr
-by-value.str
primitive has a nonempty safety invariantstr
isn't (semantically) a primitive, it just looks like one, but is actually juststruct str([u8])
.Box
is... special, but generally not considered as a primitive type outside of the compiler.References thus are special among primitive types in that they
This is likely unavoidable, as memory validity being shallow / not following references is itself a very desirable property, both for reasoning about unsafe code and for implementing the sanitizer. But I think this can be resolved as solely a teaching problem. The answer to "can a reference point to uninitialized memory" should be "no*, use
MaybeUninit
or a wrapper of it," where the asterisk is "unsafe
can break the rules, but unless you break the rules, you don't have to worry about it." I think in many ways many people are too eager to be correct to remember when it's okay and even better to put forward a simplifying lie, and then refine later as necessary.Having an operational model of "what you can do" with
unsafe
is important for being able to reason about unsafe code and to write complicated unsafe code. But for teaching unsafe, it's almost certainly better to stick to relaxing the safe rules for a significant period.TL;DR
It is the author's conclusion that:
enum
s need to do a read if we want "active variant"MaybeUninit
tracking.unsafe
is difficult.&uninit
is fine, actually" too soon.Footnotes
The smallest possible caveat may informally apply here: external synchronization via observable effects. At each point the Abstract Machine does something observable outside of the Abstract Machine (i.e. does FFI e.g. IO), the observable state of the Abstract Machine must be in the state defined by the execution to this point. The only way for UB to retroactively unguarantee the already observed behavior is if it is not a valid execution to stop the AM at the observable point before the UB occurs. This is, however, merely an informal argument; the guarantee does in fact no longer exist, it is just that there is no known way for a compiler to take advantage of this. However, note that not all behavior you may expect to be externally observable necessarily is, and neither is all behavior implemented via FFI. So long as the Abstract Machine has a definition of the operation which does not leave the AM, it is not considered externally observable. The canonical example of this is that while allocation is implemented by calling into the host OS, the AM has an internal definition of allocation and an implementation may arbitrarily call the OS allocation APIs in any defined manner. ↩
You may disagree here, and point to scripting languages like ECMAScript or even Safe Rust as languages that can be optimized while not having UB. But the insight here is that they still have UB; the difference is solely that the UB is statically prevented from happening in the surface language. As soon as you go to lower the higher-level language to another target, the set of syntactically valid possibilities extends beyond that of the surface language, and any operation which would be forbidden in the surface language is UB. ↩
In general, you should prefer using types like
&mut [MaybeUninit<u8>]
rather than&[u8]
for writing into potentially uninitialized memory. Even if the existence of the reference is not in and of itself UB, it is still wildly unsafe, and using types that allow and potentially track uninitialized memory much better describes the semantics of your program and prevent accidentally exposing references to uninit to downstream code (which is still unsound). ↩It is for this reason that if one of the measures for making
&mut uninit
UB eagerly were to be taken, the author suggests putting the check on place-to-ref conversions. ↩We say an execution's use of a library API exhibits "library UB" if it violates the documented preconditions of the library functions. If "library UB" is caused, the library has full "permission" to cause language-level UB at any later point. The program's behavior is still defined unless language UB is encountered. ↩
The text was updated successfully, but these errors were encountered: