-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Should null().add(0)
work?
#299
Comments
Addendum: Miri accepts the following correct program: let slice = std::slice::from_raw_parts(0x1 as *const u8, 0); // Valid alternate spelling of &[].
let range = slice.as_ptr_range(); // Calls ptr.add(len) internally. So yeah, it seems Miri is special-casing |
0x1 isn't null though. |
My point is that Miri rejects |
With ptr::add, you can start and end one byte past an allocated object. So if you start at one byte past an allocated object then add 0, you're still one byte past the end, and thus you're fine. but if you start at one byte past the end and add 1 you've gone too far and it's UB. |
I think this is a duplicate of #93: basically, I wish I knew what the exact rules in LLVM were here, so that we can ensure Rust is following those rules. But sadly, the LangRef is far from clear, and when I asked for clarification I did not get clear answers either. So currently, Miri accepts For non-0 offsets, like in If I could have my way, |
Btw, in case you are curious, here is the code that implements this in Miri: |
Ralf: so my read of the LangRef seems inconsistent with yours. Looking at the definition of
My read of this is that the null pointer points to a zero-width allocation, so I'm pretty sure the LLVM snippet given is sound, and defined to be In particular, Clang lowers
The funny thing is that we actually have the reverse of the C++ rules:
Yup. I only mentioned it to make sure I wasn't crazy. It would not make sense to allow that. In the meantime, can we document the current Miri behavior under |
Ah, and let us not forget the classic companion to this question. Is the following program UB? std::memcpy(nullptr, nullptr, 0); All three languages say "yes". I believe Rust says no if the pointers are |
We seem to agree that But NULL is explicitly a pointer where no allocation can ever exist in LLVM. I would hence argue that there cannot exist a zero-sized allocation at NULL, either. Put differently: given that no allocation can exist at NULL, then if let ptr = Box::into_raw(Box::new(0i32));
drop(Box::from_raw(ptr));
ptr.add(0); // UB or not? Miri says UB. I tried to ask the LLVM devs about this and here is part of the thread (it's stretched across several months so browsing the full thread is super annoying). It was hard for me to extract much concrete from this, but it seemed pretty clear that It seems you are saying that maybe I was not aware of the
Since the current Miri behavior is just whatever I put in the code to make the tests pass, I don't think we should make it normative by documenting it. I think the only way to make progress on this issue is to get LLVM to improve their LangRef. Until we do that, everything Rust does is like building on quicksand. I tried and failed. Maybe someone else wants to give it a try? :D
Maybe let's not open more than one Pandora's box per issue. ;) |
Speaking of Pandora's boxes, the |
I haven't looked at the thread yet, but looking at @mcy's quote:
This implies that the null pointer is an "in bounds address" for a base of null; otherwise it would have said something like "There are no in bounds addresses for a null pointer in the default address-space." Since the only time the LangRef mentions "in bounds addresses" is with respect to getelementptr, unless I'm missing something, that directly implies that a Of course, it's possible that the LangRef is just wrong. But as it's written, my take is that it does unambiguously permit |
Looks like this language was added in:
The linked bug report is about NULL + 0, and Richard Smith mentions:
So indeed, LLVM intentionally allows 'NULL + 0' in order to match C++, notwithstanding the strangeness you mentioned. I agree this would be a good reason to reopen the discussion about 'deallocated + 0'. |
The thread doesn't cover the "base=NULL" case specifically - it's more about how an "allocation" is defined when it's not known to LLVM (ie. the base=0x1 case), My reading of that LangRef quote agrees that GEP inbounds NULL + 0 is not poison. |
@comex I agree -- I must have missed or forgotten about that part of the LangRef. However, on the Miri side, I think I'd like to maintain the symmetry that for |
Does this mean MIRI currently has a special-case for 0x1? |
No. The code implementing offset-inbounds has no special case at all. It just checks "is the memory range between the old and new pointer dereferencable". The code for "dereferencable" has some special cases:
We could of course remove the non-null case here, but then But I think defining offset-inbounds in terms of dereferencability is very sensible, and avoids having too many slightly different arbitrary definitions in the language, so I'd really like to keep this and not add a special case to offset-inbounds. And that's why I think that |
@comex haha. Not surprised richardsmith is responsible for this.
That's true. Last time folks on the C/C++ side spoke of this, trying to make things more than NULL + 0 work (like the aforementioned memcpy example) leads to madness. I agree that special cases are bad, but I'm not sure this one is avoidable. The C++ people just gave up and enshrined "NULL + 0" as a correct program, which is kind of a bad sign. AGL makes a good case for why this is a disaster in C: https://www.imperialviolet.org/2016/06/26/nonnull.html. I'm not saying it cleanly applies to us, because we have our own exciting Pandora's boxes, but he makes some pretty solid points. |
One more thing:
It is my belief that trying to exploit |
Hmm, why do you think it seems wrong? In most cases, of course, whether a dereference is at ZST type or not is known prior to LLVM IR generation, so ZST dereferences are eliminated before LLVM's optimizer gets a peek, so there's no need to make them UB under any circumstances. Pre-monomorphization MIR optimizations don't know whether a dereference is at ZST type, but there are lots of things that pre-monomorphization optimizations don't know. I suppose there's the unsized-locals issue: #![feature(unsized_locals)]
pub fn foo(px: *const [u8]) {
unsafe { let x: [u8] = *px; }
} The above code is not actually accepted (because However, the implementation of reading an unsized type uses memcpy:
So if you started with a null pointer of type However, this isn't regular memcpy; it's the LLVM memcpy intrinsic. LangRef describes its semantics as follows:
This language seems to imply that if (In fact, rustc already doesn't. The IR I quoted does have Another issue might be |
Not sure if this was already answered, but in C++ dangling pointers are indeed invalid while NULL isn’t, and strictly more operations are allowed on NULL than on invalid pointers. From http://eel.is/c++draft/basic.compound#3:
http://eel.is/c++draft/basic.stc#general-4 makes dangling pointers invalid and operations on them UB or implementation-defined; instead http://eel.is/c++draft/expr.add#4.1 hardcodes As usual for C compilers, I’d expect clang to not exploit most UB on dangling pointers, except in the mess that is pointer equality comparison, where the UB is exploited to allow using provenance. But I haven't run tests on this choice. |
I think it is avoidable -- whatever Miri currently implements seems to be compatible with at least the majority of unsafe code in the standard library, and whatever people run Miri on.
I am fully in favor of allowing offset-by-0 on invalid pointers. :) But I think we should do either one of the following two:
The concern is not just breaking programs but also having teachable rules. I don't think
"In most cases" and "under any circumstances" don't go together.^^ Either we can guarantee that all ZST accesses are removed prior to LLVM IR generation, or we certainly have to make them UB. Note that "accesses" here includes intrinsics that access memory and the access size is only determined at runtime, such as the For Also I feel like "though shalt not dereference the NULL pointer" is sufficiently deeply engraved that it would be more surprising to allow this code than to forbid it... this might be a good surprise for Rust programmers, but it is equally surprising to compiler devs and thus introduces a significant risk of accidentally making a "NULL cannot be dereferenced" assumption during optimizations.
To my knowledge, LLVM's behavior is consistent with pointer equality entirely ignoring provenance. I don't think pointer equality is ever UB in LLVM.
But is there a good reason for this? What does this extra "more invalid that NULL" UB let compilers do? |
@comex oh and also it seems somewhat inconsistent to me to say that |
@RalfJung just to answer your questions (since I can only speak re C++ which is a side discussion):
Fair enough; I'll note in C++ it's not UB, just unspecified in some cases (e.g. https://eel.is/c++draft/expr.eq#3.1), even if GCC devs would prefer it be UB, and the C++ standard wording doesn't actually license "return false when comparing pointers with different provenances" AFAICT. IIRC, the use-cases were in C++ assignment operators.
AFAIK pointer zapping is not used outside of equality comparison (the usual example — |
Yes that reflects my understanding as well.
Ah, you are right -- the "even less valid than NULL" thing in C/C++ is entirely overwhelmed by pointer zapping, so we cannot even discuss the finer points around zero-sized accesses and zero-sized inbounds offsets. LLVM does not document that it does pointer zapping, but also does not explicitly state the opposite... AFAIK LLVM is consistent with "no pointer zapping" and Rust relies on that. |
Yes. I think the best strategy is to persue this on the LLVM side long-term. I can poke some folks I know on Monday, but this is definitely an uphill battle. =(
So I think you're right, but @davidben has made a lot of good points to me about how this can make FFI a lot more dangerous than it has to be. Though he agrees with us that "allow offset-by-zero for all pointers" is optimal for everyone's sanity. (I'll let him comment further if he wants to.) |
That would be great. :D
I don't quite understand the FFI point -- is the issue that C might be passing in NULL pointers and then Rust will If |
The FFI point I think applies less to NULL + 0 (I expect Rust code will convert to slices early) and more to transiting (ptr, len) tuples across the languages. At the risk of opening Pandora's box, I think the questions to ponder there are:
For C/C++, I think it's unavoidable that (NULL, 0) is in (1). Given that, yes, C/C++ both fail (2). C messes up My understanding from @mcy is that Rust, at least at the level of a slice's internal representation and However, this means C/C++ and Rust are incompatible for (3), which has unfortunate implications for FFIs. If Rust tries to take in a slice from C/C++, perhaps out of a I imagine Rust's ban on (NULL, 0) for the internal slice representation is a lost cause at this point (though one could imagine saying |
@davidben thanks for this helpful input! If by "slice" you mean Rust does satisfy (2) for I can see how indeed this violates (3), but I don't think Rust will give up the non-NULL optimization due to this -- that optimization is just too deeply entrenched at this point, and also likely too important. I would think the best way to solve this is to have explicit conversion functions at the "edges" that turn NULL into
Now, in Rust we also have |
If foreign code is giving you ptr+len and you want to try to make it into a rust slice then naturally you should simply use some sort of "try_from_raw_parts" method which returns As Ralf says, that this function doesn't exist in the standard library today is "just" a library design issue. |
Sure, I think we're in agreement here. Short of changing the slice representation, any fix for the unsafe FFI story will be a library one. I just elaborated to clarify the FFI issue I was summoned for. Like I said, it doesn't apply to NULL + 0 per se, just all part of the mess that is empty slices. |
I was chatting w/ @regehr on twitter, who seems to believe that the Alive2 model of LLVM's semantics actually already has Perhaps this is less hopeless than we thought? See: https://twitter.com/DrawsMiguel/status/1419432464418754561?s=20 |
@mcy isn't that the wrong direction? Transforming |
Ah. That is a very good observation. I'm not the one asserting that this is ok, so I won't defend it. =) |
Looks like the opposite direction doesn't work: https://alive2.llvm.org/ce/z/7PXoVt: |
Yeah that's definitely a problem. Rust generates code like this. |
Closing in favor of #93, which discusses the more general question of zero-sized accesses/offsets. I don't think the null pointer should get any special treatment here. |
This question comes up a lot in C/C++ land. Here is the status quo:
NULL + 0
as ill-formed, because arithmetic must land inside of an "allocation".NULL + 0
, by special case, an defines it to beNULL
.getelementptr inbounds T, T* 0, i64 0
to beT* 0
, again by special case.p + 0
togetelementptr inbounds
in both language modes.ptr::null::<T>().add(0)
, for the same reasons as C.IME this has caused quite a bit of pain in C, where you might want to write:
This is UB for any sensible choice of an empty slice, including
p = NULL
. While Rust gives us the tools to avoid writing such code, it is... somewhat questionable to follow C rather than C++ here. Like, you could usewrapping_add()
, but honestly that seems a bit silly given that LLVM goes out of its way to make this work, and to my knowledge Clang's frontend doesn't care except for-fsanitize=null
.I get the impression this wasn't a conscious decision and more of an emergent property of "you can only do
add()
on valid pointers", but it'd be nice to document it if it was. Although you could infer thatp != NULL
if you writep + n
in a vaccum, I've never seen a situation where being able to deduce this is useful optimization fuel. That said, I would not mind being proven wrong for when I have equivalent conversations in C/C++ land. =)The text was updated successfully, but these errors were encountered: