What do we say about the equality of pointers in constants, vtables, and function pointers? #522
Comments
I think the "ideal" case is that only one vtable would exist for a given type/trait pair, but I just don't think that's going to be possible:
The next best thing would be to make wide pointer comparisons useful at least - if we can't make comparisons on vtables useful, then we must not consider vtables as part of any comparison. IMO a "useful" pointer comparison obeys the following:
There are four cases to consider:
|
Please let's focus this discussion on what we guarantee about the raw underlying identity of fn/vtable/const pointers. Potentially making wide dyn ptr comparison not use the raw underlying identity should be discussed separately.
|
Reposting from the miri issue: TBH, I'd prefer that I was bitten by the lack of uniqueness when writing a macro producing a *Associated |
Addressing the note about deduplication, for non generic pub const FOO: &str = "Hello World!";
// Becomes
pub static FOO_value: str = *"Hello World!"; // pseudo-syntax
pub const FOO: &str = &FOO_value; |
Wouldn't that mean consts like |
It wouldn't affect string literals directly, and by unique here I mean it has one address, i.e. pub const FOO: &str = "foo";
assert!(core::ptr::eq(FOO, FOO)); holds. I don't think we need to say that the addresses are distinct from all other addresses, just that there is one address when you evaluate the value of a constant that contains a reference or a pointer. If you have pub const FOO: &[&str] = &["foo", "foo", "foo"]; then the array (and thus necessarily each of its elements) would have a single address, but the literals could all be the same address, as could both arrays if you copied the |
Deduplicating constants is an important optimisation, especially for embedded and other highly constrained environments. The same goes for functions (this can even be done in the linker; mold, for example, has a flag to do this, and I don't know whether the other linkers do). Even outside of embedded, this helps reduce cache pressure, which can help performance. Relying on separate constants or functions with the same contents not getting merged would be unfortunate, especially since it is quite easy to create these by accident with monomorphization. It should also be valid to merge the backing data for partially overlapping constant data (e.g. the string constants "foo" and "foobar" could share a prefix and just have different metadata for the slice lengths). |
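To make the "shared prefix, different length metadata" idea concrete, here is a minimal sketch. It does not show the compiler or linker merging anything; it models the resulting layout by hand, by slicing one backing string. The names `BACKING`, `short_view`, and `long_view` are made up for illustration.

```rust
// Hypothetical backing data: one allocation holding "foobar".
static BACKING: &str = "foobar";

/// A view of the first three bytes: same data pointer, length 3.
fn short_view() -> &'static str {
    &BACKING[..3]
}

/// A view of the whole string: same data pointer, length 6.
fn long_view() -> &'static str {
    BACKING
}

fn main() {
    let (short, long) = (short_view(), long_view());

    // The contents are what you would expect from two separate literals.
    assert_eq!(short, "foo");
    assert_eq!(long, "foobar");

    // Both views start at the same address; only the length metadata differs.
    assert_eq!(short.as_ptr(), long.as_ptr());
    assert_ne!(short.len(), long.len());

    println!("short = {short:?} ({} bytes), long = {long:?} ({} bytes)", short.len(), long.len());
}
```

Code that only looks at the contents cannot tell this layout apart from two separate allocations; only code that compares addresses can.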
I'm not sure I like having rules that explicitly depend on whether the const is "non-generic". That's a pretty surprising non-continuity in behavior. Generally, the expectation is that if you care about having a unique address, you should use a pub static FOO: &str = "Hello World!";
// Different crate
pub const FOO_COPY: &str = crate_a::FOO;
assert!(ptr::eq(FOO, FOO_COPY)); However, for annoying and counter-intuitive reasons, that is not currently what happens. @oli-obk made things slightly better in rust-lang/rust#121644, but it doesn't help in all cases and I think it doesn't help for string literals. I wonder if we should entirely revamp how
Yes, but optimizations are off-topic here. We should be doing our best to do this optimization. But this issue is about whether we guarantee that deduplication occurs, in the sense that unsafe code may rely on it for correctness. There, the answer is currently "no".
That's a good question -- I agree that for read-only data, this should be considered a valid optimization. |
Doesn't what optimisations we want to be able to perform inform what operational semantics we want (as well as other concerns, such as lack of footguns, of course)? As I'm not well versed in formal methods (but have definite opinions based on working with small embedded microcontrollers, as well as command line tooling), I cannot necessarily express myself in opsem terms, only in what I as a user need (you often need to compile with size optimisations to even be able to fit the code onto your microcontroller, for example).
Indeed, for a debug build it makes sense not to spend the effort to try to deduplicate constants or functions, for example; you want something ready to run as quickly as possible there. For size-optimised builds I would also like to see non-generic read-only data merged aggressively where possible (again, the embedded use case). For example, if two different panic strings just happen to be the same (e.g. "unreachable"), they should be merged. That sort of thing happens more often than you might expect. So as few guarantees of uniqueness as possible is my preference, unlike what I understand @chorman0773 wants. Again, this is important for embedded as well as for cache pressure on non-embedded. And in the other direction, Rust shouldn't guarantee merging either (as that might slow down the debug/incremental/iterative workflow). |
My problem is that I find it unintuitive that a
To be clear, when I say uniqueness, I mean in the sense that the value of the To demonstrate pub const FOO: &str = "Hello World!";
pub const BAR: &str = "Hello World!";
assert!(core::ptr::eq(FOO, FOO)); // Guaranteed to hold, can be relied upon by `unsafe` code
// assert!(!core::ptr::eq(FOO, BAR)); // Not guaranteed to hold or to fail. I would like the first assert to hold (and for greater certainty, for the two pointers to be AM-identical, not just identical under |
This is necessarily the case for consts depending on generics. (The usual problem: it could get instantiated with the same generics in two different downstream crates that know nothing about each other, so there's no way they can coordinate to produce the same result.) Carving out a special case for non-generic consts doesn't fundamentally make things any more intuitive; it just makes it harder to notice the strange const semantics, so I don't view that as a good fix. I think it is much easier to teach "if you care about guaranteed unique addresses, use statics" than "you can get guaranteed unique addresses if you make sure your const satisfies the following requirements (and better make sure that does not change in future refactors)". It seems like you are saying it is better to draw the line between generic and non-generic consts, while I think it is better to draw the line between statics and consts. I can see people preferring either option; this comes down to a fairly subjective judgment call. |
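As a rough sketch of the "use statics if you care about the address" rule: the static below is one place, so pointers obtained through it can be compared reliably, while for the const only the contents are compared, because whether two uses of it yield the same address is exactly what this issue leaves unguaranteed. All item and function names here are invented for the example.

```rust
use std::ptr;

// Hypothetical items for illustration.
static GREETING_STATIC: &str = "Hello World!";
const GREETING_CONST: &str = "Hello World!";

/// Address of the static item itself: a single, fixed place.
fn static_place() -> *const &'static str {
    &GREETING_STATIC
}

/// Data pointer stored in the static: the same value on every read.
fn static_data() -> *const u8 {
    GREETING_STATIC.as_ptr()
}

/// Data pointer obtained through the const: no guarantee across uses.
fn const_data() -> *const u8 {
    GREETING_CONST.as_ptr()
}

fn main() {
    // Guaranteed: a static has exactly one address.
    assert!(ptr::eq(static_place(), static_place()));
    assert_eq!(static_data(), static_data());

    // Guaranteed: the contents agree.
    assert_eq!(GREETING_CONST, GREETING_STATIC);

    // Not guaranteed either way, so only report it.
    println!("const data pointers equal: {}", const_data() == const_data());
    println!("const vs static data pointers equal: {}", const_data() == static_data());
}
```

The two `println!` lines may print `true` on a given toolchain; the point is that `unsafe` code should not rely on either outcome.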
Quoting the reference page for constant items: (emphasis mine)
And quoting the reference page for static items: (emphasis mine)
The difference between the documented behavior for consts and statics is that the former may produce a result with a different identity (different location of the value and referenced values) but still structurally equivalent, while the latter is guaranteed to produce a result with a fixed identity. Every other difference between the two follows from the different guarantees between consts and statics. The reason statics aren't allowed to be generic is because a fixed identity is not possible to guarantee with generics. |
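One place where the "fresh result per use" versus "fixed identity" difference is directly observable on stable Rust is interior mutability. The sketch below is my own illustration, not taken from the reference: every mention of the const creates a new temporary, so an increment through it is lost, while the static is a single object that keeps its state. Recent toolchains may warn about the const usage; that warning is describing this very behavior.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical counters for illustration.
const CONST_COUNTER: AtomicUsize = AtomicUsize::new(0);
static STATIC_COUNTER: AtomicUsize = AtomicUsize::new(0);

/// Increments a temporary copy of the const, then reads another fresh copy.
fn bump_const() -> usize {
    CONST_COUNTER.fetch_add(1, Ordering::SeqCst); // modifies a temporary
    CONST_COUNTER.load(Ordering::SeqCst) // reads a different temporary
}

/// Increments the one and only static and returns the new value.
fn bump_static() -> usize {
    STATIC_COUNTER.fetch_add(1, Ordering::SeqCst) + 1
}

fn main() {
    // The const never appears to change: each use is a new value.
    assert_eq!(bump_const(), 0);
    assert_eq!(bump_const(), 0);

    // The static has one identity, so updates accumulate.
    let first = bump_static();
    let second = bump_static();
    assert!(second > first);

    println!("const stays at {}, static reached {}", bump_const(), second);
}
```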
The issue that I brought up is that there are cases you cannot use a I also don't necessarily want to turn off deduplication, which a The case which I had needed this was producing a |
Why doesn't this work? const fn test() -> &'static str {
"Hello, world!"
}
#[derive(Debug)]
struct RawPtr(*const u8);
unsafe impl Sync for RawPtr {}
static FOO: &str = test();
static BAR: RawPtr = RawPtr(FOO as *const str as *const u8);
static BAR2: RawPtr = RawPtr(unsafe {BAR.0.offset(FOO.len() as isize)});
fn main() {
println!("{FOO:?} {BAR:?} {BAR2:?}");
} |
This was in the context of a macro that could be expanded in any number of |
At a bare minimum, I hope we can guarantee that multiple copies of a single reference within a single constant evaluation could be guaranteed identical to each other, e.g. const _: () = {
let a = &0;
// with 1.82 nightly (2024-08-06 60d146580c10036ce89e)
assert!(matches!(<*const _>::guaranteed_eq(a, a), None));
};
Generic context But rather than tie constant address stability guarantees to captured generics, I think it's easier to assign to top level versus associated constant items. It's the same thing determining if unnamed const items are allowed. Yeah, it's far from ideal to lose that property if refactoring to an associated name, but maybe it's livable? Although, otoh,
The annoying thing is that we have to choose one or the other — either a consistent address or |
@CAD97 I don't see how your example relates to the others here - |
All the more reason that it should be. I originally wrote the example instead as const C: (&i32, &i32) = { let x = &0; (x, x) };
const _: () = {
let (a, b) = C;
// with 1.82 nightly (2024-08-06 60d146580c10036ce89e)
assert!(matches!(<*const _>::guaranteed_eq(a, b), None));
}; but then I changed it when I checked and Also, |
With rust-lang/rust#126660, vtables get yet another fun kind of pointer equality behavior: casting from |
Functions, vtables, and consts do not have a unique stable address, leading to interesting problems. We've so far just always said "yeah we don't make any guarantees there", but (a) that could probably be documented better, (b) Miri doesn't actually make this non-deterministic so it can easily miss issues here, and (c) maybe we could/should do better in some cases.
function address equality
vtable equality
const address equality
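For the function and vtable cases, a small sketch of what can and cannot be relied on today, under the "no guarantees" reading above. The helper names (`answer_a`, `answer_b`, `fn_addr`, `same_object`) are invented for this example. Function addresses are only printed, never asserted, since identical functions may be merged and one function may end up with several addresses. For `dyn` pointers, `std::ptr::addr_eq` compares only the data address and ignores the vtable part.

```rust
use std::fmt::Debug;
use std::ptr;

// Two hypothetical functions with identical bodies; they may or may not be merged.
fn answer_a() -> u32 {
    42
}
fn answer_b() -> u32 {
    42
}

/// Address of a function pointer as an integer, for reporting only.
fn fn_addr(f: fn() -> u32) -> usize {
    f as usize
}

/// Compares only the data addresses of two trait objects, ignoring vtables.
fn same_object(a: &dyn Debug, b: &dyn Debug) -> bool {
    ptr::addr_eq(a, b)
}

fn main() {
    // Behavior is guaranteed; address equality or inequality is not.
    assert_eq!(answer_a(), answer_b());
    println!("a == b by address: {}", fn_addr(answer_a) == fn_addr(answer_b));
    println!("a == a by address: {}", fn_addr(answer_a) == fn_addr(answer_a));

    let value = 7_i32;
    let other = 8_i32;

    // Same underlying object: the data addresses match regardless of
    // which vtable each wide pointer happens to carry.
    assert!(same_object(&value, &value));
    // Two distinct live `i32` locals never share an address.
    assert!(!same_object(&value, &other));
}
```

Comparing the full wide pointers instead would also compare vtable pointers, which is where the non-uniqueness described in this issue shows up.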