Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should null().add(0) work? #299

Closed
mcy opened this issue Jul 24, 2021 · 36 comments
Closed

Should null().add(0) work? #299

mcy opened this issue Jul 24, 2021 · 36 comments

Comments

@mcy
Copy link

mcy commented Jul 24, 2021

This question comes up a lot in C/C++ land. Here is the status quo:

  • C rejects the program NULL + 0 as ill-formed, because arithmetic must land inside of an "allocation".
  • C++ accepts NULL + 0, by special case, an defines it to be NULL.
  • LLVM defines getelementptr inbounds T, T* 0, i64 0 to be T* 0, again by special case.
    • Incidentally, Clang lowers p + 0 to getelementptr inbounds in both language modes.
  • Rust rejects the program ptr::null::<T>().add(0), for the same reasons as C.
    • Miri correctly rejects it as UB (Miri seems to special-case null here? Am I reading this right?)

IME this has caused quite a bit of pain in C, where you might want to write:

void Foo(char* p, size_t len) {
  char* end = p + len;
  for(; p != end; p++) { ... }
}

This is UB for any sensible choice of an empty slice, including p = NULL. While Rust gives us the tools to avoid writing such code, it is... somewhat questionable to follow C rather than C++ here. Like, you could use wrapping_add(), but honestly that seems a bit silly given that LLVM goes out of its way to make this work, and to my knowledge Clang's frontend doesn't care except for -fsanitize=null.

I get the impression this wasn't a conscious decision and more of an emergent property of "you can only do add() on valid pointers", but it'd be nice to document it if it was. Although you could infer that p != NULL if you write p + n in a vaccum, I've never seen a situation where being able to deduce this is useful optimization fuel. That said, I would not mind being proven wrong for when I have equivalent conversations in C/C++ land. =)

@mcy
Copy link
Author

mcy commented Jul 24, 2021

Addendum: Miri accepts the following correct program:

let slice = std::slice::from_raw_parts(0x1 as *const u8, 0);  // Valid alternate spelling of &[].
let range = slice.as_ptr_range();  // Calls ptr.add(len) internally.

So yeah, it seems Miri is special-casing NULL + 0 intentionally. =/

@Lokathor
Copy link
Contributor

0x1 isn't null though.

@mcy
Copy link
Author

mcy commented Jul 24, 2021

My point is that Miri rejects (0 as *mut u8).add(0) but not (1 as *mut u8).add(0). It does reject (1 as *mut u8).add(1). Something seems wrong here. I would be willing to accept this for a ZST (in the "there exist infinitely many ZSTs at every well-aligned address" sense) but it seems very odd in the non-ZST case.

@Lokathor
Copy link
Contributor

With ptr::add, you can start and end one byte past an allocated object. So if you start at one byte past an allocated object then add 0, you're still one byte past the end, and thus you're fine.

but if you start at one byte past the end and add 1 you've gone too far and it's UB.

@RalfJung
Copy link
Member

RalfJung commented Jul 24, 2021

I think this is a duplicate of #93: basically, I wish I knew what the exact rules in LLVM were here, so that we can ensure Rust is following those rules. But sadly, the LangRef is far from clear, and when I asked for clarification I did not get clear answers either.

So currently, Miri accepts (1 as *mut u8).add(0) because otherwise the standard library is horribly unsound, but rejects (0 as *mut u8).add(0) because LLVM seems pretty clear that this is not allowed. Miri also rejects ptr.add(0) when ptr points to some allocation that was previously allocated but has since been freed (this is based on the provenance of the pointer).

For non-0 offsets, like in (1 as *mut u8).add(1), I think there is no question: this is definitely disallowed, since there would have to be an allocation of size at least 1 here to accept this. Only .add(0) is interesting.

If I could have my way, .add(0) would just be always allowed. But sadly LLVM does not seem to give us that option.

@RalfJung
Copy link
Member

Btw, in case you are curious, here is the code that implements this in Miri:

https://github.com/rust-lang/rust/blob/7d8e7b14a7ccfeca85f7206c0a7e27a3f144ce9e/compiler/rustc_mir/src/interpret/intrinsics.rs#L517

@mcy
Copy link
Author

mcy commented Jul 24, 2021

Ralf: so my read of the LangRef seems inconsistent with yours. Looking at the definition of gep inbounds1:

The base pointer has an in bounds address of an allocated object, which means that it points into an allocated object, or to its end. The only in bounds address for a null pointer in the default address-space is the null pointer itself.

My read of this is that the null pointer points to a zero-width allocation, so I'm pretty sure the LLVM snippet given is sound, and defined to be T* 0. As mentioned above, this mirrors the special case in the C++ standard (I can pull up the citation if you ask, I just don't wanna dive into the standard rn). I'm curious to see where the LLVM folks said otherwise, since that seems contrary to what I've seen them say.

In particular, Clang lowers p + 0 the same (gep inbounds) in C and C++; if LLVM had C semantics, the C++ lowering is unsound. https://godbolt.org/z/esjW1e7Md

So currently, Miri accepts (1 as *mut u8).add(0) because otherwise the standard library is horribly unsound

The funny thing is that we actually have the reverse of the C++ rules: 0x1 + 0 is disallowed, but 0x0 + 0 is defined to be 0x0. I agree that p + 0 := p for all pointers p regardless of value, liveness, provenance, or inferred volatility. This seems kind of hard without getting cooperation from LLVM.

For non-0 offsets, like in (1 as *mut u8).add(1), I think there is no question: this is definitely disallowed, since there would have to be an allocation of size at least 1 here to accept this. Only .add(0) is interesting.

Yup. I only mentioned it to make sure I wasn't crazy. It would not make sense to allow that.

In the meantime, can we document the current Miri behavior under <*mut T>::add? Zero offsets are really weird and confuse people in C, C++, and Rust alllll the time.

@mcy
Copy link
Author

mcy commented Jul 24, 2021

Ah, and let us not forget the classic companion to this question. Is the following program UB?

std::memcpy(nullptr, nullptr, 0);

All three languages say "yes". I believe Rust says no if the pointers are 0x1 or similar for the same reasons Ralf gives above.

@RalfJung
Copy link
Member

My read of this is that the null pointer points to a zero-width allocation, so I'm pretty sure the LLVM snippet given is sound, and defined to be T* 0.

We seem to agree that .add(0) is sound if and only if at the given address there exists a zero-sized allocation.

But NULL is explicitly a pointer where no allocation can ever exist in LLVM. I would hence argue that there cannot exist a zero-sized allocation at NULL, either. Put differently: given that no allocation can exist at NULL, then if add(0) is valid on the NULL pointer, it should be valid on all pointers, even pointers to deallocated objects -- would you agree to that? To be concrete about the latter case, I mean code like this:

let ptr = Box::into_raw(Box::new(0i32));
drop(Box::from_raw(ptr));
ptr.add(0); // UB or not? Miri says UB.

I tried to ask the LLVM devs about this and here is part of the thread (it's stretched across several months so browsing the full thread is super annoying). It was hard for me to extract much concrete from this, but it seemed pretty clear that .add(0) can be invalid for some pointers. I hence made Miri as conservative as I could, making .add(0) invalid for as many pointers as possible without drowning the standard library in error messages. This is using Miri to explore possible specifications, not using Miri to document any prior decision -- there cannot be a decision since LLVM is a mess. :(

It seems you are saying that maybe NULL.add(0) is actually okay (it has to be for clang to be C++-compliant -- but that does not mean much, clang has been violating the C standard for a decade or so until they finally fixed their infinite loop optimizations) -- but possibly examples like the ones above could still be not okay, making dangling pointers "more invalid" than NULL? But that seems ridiculous and pointless. If NULL.add(0) is valid, an analysis can already deduce basically nothing from a GEPi with offset 0, and hence it seems to make no sense at all to forbid this for other pointers.

I was not aware of the NULL + 0 rule in C++. so it would be good to have a citation for this if you could dig that up. Maybe with that information in our hand, we can make another attempt at talking to the LLVM people and convince them that GEPi with offset 0 should just always be valid.

In the meantime, can we document the current Miri behavior under <*mut T>::add? Zero offsets are really weird and confuse people in C, C++, and Rust alllll the time.

Since the current Miri behavior is just whatever I put in the code to make the tests pass, I don't think we should make it normative by documenting it.

I think the only way to make progress on this issue is to get LLVM to improve their LangRef. Until we do that, everything Rust does is like building on quicksand. I tried and failed. Maybe someone else wants to give it a try? :D

std::memcpy(nullptr, nullptr, 0);

Maybe let's not open more than one Pandora's box per issue. ;)

@RalfJung
Copy link
Member

Speaking of Pandora's boxes, the .add(0) issue is closely related to why our docs for pointer validity say some really surprising things about zero-sized accesses. :(

@comex
Copy link

comex commented Jul 24, 2021

I haven't looked at the thread yet, but looking at @mcy's quote:

The only in bounds address for a null pointer in the default address-space is the null pointer itself.

This implies that the null pointer is an "in bounds address" for a base of null; otherwise it would have said something like "There are no in bounds addresses for a null pointer in the default address-space." Since the only time the LangRef mentions "in bounds addresses" is with respect to getelementptr, unless I'm missing something, that directly implies that a getelementptr inbounds that takes null as the base address and returns null can be valid.

Of course, it's possible that the LangRef is just wrong. But as it's written, my take is that it does unambiguously permit NULL + 0.

@comex
Copy link

comex commented Jul 24, 2021

Looks like this language was added in:

commit 13f2e353110c3523295ed6026622d897f7506bbe
Author: Eli Friedman <[email protected]>
Date:   Thu Feb 23 00:48:18 2017 +0000

    Explicitly state the behavior of inbounds with a null pointer.
    
    See https://llvm.org/bugs/show_bug.cgi?id=31439; this reflects LLVM's
    behavior in practice, and should be compatible with C/C++ rules.
    
    Differential Revision: https://reviews.llvm.org/D28026
    
    llvm-svn: 295916

The linked bug report is about NULL + 0, and Richard Smith mentions:

C++ allows null + 0, but not anything beyond that (as if there is a 0 element array at the null pointer).

So indeed, LLVM intentionally allows 'NULL + 0' in order to match C++, notwithstanding the strangeness you mentioned. I agree this would be a good reason to reopen the discussion about 'deallocated + 0'.

@Diggsey
Copy link

Diggsey commented Jul 24, 2021

I haven't looked at the thread yet, but looking at @mcy's quote:

The thread doesn't cover the "base=NULL" case specifically - it's more about how an "allocation" is defined when it's not known to LLVM (ie. the base=0x1 case), My reading of that LangRef quote agrees that GEP inbounds NULL + 0 is not poison.

@RalfJung
Copy link
Member

@comex I agree -- I must have missed or forgotten about that part of the LangRef.

However, on the Miri side, I think I'd like to maintain the symmetry that for ptr pointing to a zero-sized type, ptr.add(n) is allowed if and only if ptr.read() is allowed. Or, to put it differently, allowing null.add(0) would require putting in a special case for the NULL pointer, whereas right now Miri is treating NULL like any deallocated address, avoiding a special case.

@Diggsey
Copy link

Diggsey commented Jul 24, 2021

So currently, Miri accepts (1 as *mut u8).add(0) because otherwise the standard library is horribly unsound, but rejects (0 as *mut u8).add(0)

to put it differently, allowing null.add(0) would require putting in a special case for the NULL pointer

Does this mean MIRI currently has a special-case for 0x1?

@RalfJung
Copy link
Member

RalfJung commented Jul 24, 2021

No. The code implementing offset-inbounds has no special case at all. It just checks "is the memory range between the old and new pointer dereferencable".

The code for "dereferencable" has some special cases:

  • if the requested size is 0, we do not require the pointer to carry the provenance of a live allocation (i.e., we allow ptrs created from integer literals)
  • but even if that case, we still require the pointer to be non-null and aligned

We could of course remove the non-null case here, but then *null::<()>() would be allowed -- i.e., we'd allow dereferencing a null pointer at ZST type. That seems wrong.

But I think defining offset-inbounds in terms of dereferencability is very sensible, and avoids having too many slightly different arbitrary definitions in the language, so I'd really like to keep this and not add a special case to offset-inbounds.

And that's why I think that null::<T>().add(0) should be UB if and only if *null::<()>() is UB.

@mcy
Copy link
Author

mcy commented Jul 24, 2021

@comex haha. Not surprised richardsmith is responsible for this.

But I think defining offset-inbounds in terms of dereferencability is very sensible, and avoids having too many slightly different arbitrary definitions in the language, so I'd really like to keep this and not add a special case to offset-inbounds.

That's true. Last time folks on the C/C++ side spoke of this, trying to make things more than NULL + 0 work (like the aforementioned memcpy example) leads to madness. I agree that special cases are bad, but I'm not sure this one is avoidable. The C++ people just gave up and enshrined "NULL + 0" as a correct program, which is kind of a bad sign.

AGL makes a good case for why this is a disaster in C: https://www.imperialviolet.org/2016/06/26/nonnull.html. I'm not saying it cleanly applies to us, because we have our own exciting Pandora's boxes, but he makes some pretty solid points.

@mcy
Copy link
Author

mcy commented Jul 24, 2021

One more thing:

But that seems ridiculous and pointless. If NULL.add(0) is valid, an analysis can already deduce basically nothing from a GEPi with offset 0, and hence it seems to make no sense at all to forbid this for other pointers.

It is my belief that trying to exploit p + 0 => "p is valid" has caused more pain and suffering, as cataloged in Adam's post I linked, than it saves us in compiler optimizations. I am almost always on the side of the compiler writers that we should break all the incorrect programs we can... but is this one worth it?

@comex
Copy link

comex commented Jul 24, 2021

We could of course remove the non-null case here, but then *null::<()>() would be allowed -- i.e., we'd allow dereferencing a null pointer at ZST type. That seems wrong.

Hmm, why do you think it seems wrong? In most cases, of course, whether a dereference is at ZST type or not is known prior to LLVM IR generation, so ZST dereferences are eliminated before LLVM's optimizer gets a peek, so there's no need to make them UB under any circumstances. Pre-monomorphization MIR optimizations don't know whether a dereference is at ZST type, but there are lots of things that pre-monomorphization optimizations don't know.

I suppose there's the unsized-locals issue:

#![feature(unsized_locals)]
pub fn foo(px: *const [u8]) {
    unsafe { let x: [u8] = *px; }
}

The above code is not actually accepted (because Copy: Sized, so unsized types can't be Copy, so it complains about copying a non-Copy type). But presumably it will be allowed in the future if unsized locals are fully implemented, and you can simulate it for now by switching px to be a Box.

However, the implementation of reading an unsized type uses memcpy:

 %0 = alloca i8, i64 %px.1, align 16
  %1 = getelementptr [0 x i8], [0 x i8]* %px.0, i64 0, i64 0
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 16 %0, i8* nonnull align 1 %1, i64 %px.1, i1 false)

So if you started with a null pointer of type *const [u8; 0], unsized it to *const [u8], and then dereferenced the result, you'd get a memcpy with a null pointer and zero size.

However, this isn't regular memcpy; it's the LLVM memcpy intrinsic. LangRef describes its semantics as follows:

If <len> is 0, it is no-op modulo the behavior of attributes attached to the arguments. If <len> is not a well-defined value, the behavior is undefined. If <len> is not zero, both <dest> and <src> should be well-defined, otherwise the behavior is undefined.

This language seems to imply that if <len> is 0, it's okay for <dest> or <src> to be null, "modulo the behavior of attributes attached to the arguments", which is probably referring to the nonnull attribute. Thus, rustc could avoid any problems here by just ensuring it doesn't add nonnull to memcpys which are generated to implement unsized reads.

(In fact, rustc already doesn't. The IR I quoted does have nonnull on the arguments, but that's because I cheated and used Box to get it to compile, and then LLVM propagated nonnull onto the memcpy call from elsewhere.)

Another issue might be std::ptr::read and other functions which are implemented in terms of std::ptr::copy. But the semantics of standard library functions are a separate topic. And given that "memcpy without assuming non-null" is efficiently implementable in LLVM IR, we should probably loosen the specification of std::ptr::copy to accept null pointers anyway,

@Blaisorblade
Copy link

Blaisorblade commented Jul 24, 2021

making dangling pointers "more invalid" than NULL? But that seems ridiculous and pointless.

Not sure if this was already answered, but in C++ dangling pointers are indeed invalid while NULL isn’t, and strictly more operations are allowed on NULL than on invalid pointers.

From http://eel.is/c++draft/basic.compound#3:

Every value of pointer type is one of the following:
(3.1) a pointer to an object or function (the pointer is said to point to the object or function), or
(3.2) a pointer past the end of an object ([expr.add]), or
(3.3) the null pointer value for that type, or
(3.4) an invalid pointer value.

http://eel.is/c++draft/basic.stc#general-4 makes dangling pointers invalid and operations on them UB or implementation-defined; instead http://eel.is/c++draft/expr.add#4.1 hardcodes NULL + 0 = NULL.

As usual for C compilers, I’d expect clang to not exploit most UB on dangling pointers, except in the mess that is pointer equality comparison, where the UB is exploited to allow using provenance. But I haven't run tests on this choice.

@RalfJung
Copy link
Member

RalfJung commented Jul 25, 2021

@mcy

That's true. Last time folks on the C/C++ side spoke of this, trying to make things more than NULL + 0 work (like the aforementioned memcpy example) leads to madness. I agree that special cases are bad, but I'm not sure this one is avoidable. The C++ people just gave up and enshrined "NULL + 0" as a correct program, which is kind of a bad sign.

I think it is avoidable -- whatever Miri currently implements seems to be compatible with at least the majority of unsafe code in the standard library, and whatever people run Miri on.
In C++, NULL is commonly used as a sentinel for "absence of a value"; we don't do this in Rust so I don't see why Rust would benefit from special-casing null().add(0). There pretty much is a guarantee that NonNull::dangling().as_ptr().add(0) is allowed.

It is my belief that trying to exploit p + 0 => "p is valid" has caused more pain and suffering, as cataloged in Adam's post I linked, than it saves us in compiler optimizations. I am almost always on the side of the compiler writers that we should break all the incorrect programs we can... but is this one worth it?

I am fully in favor of allowing offset-by-0 on invalid pointers. :) But I think we should do either one of the following two:

  • allow offset-by-0 on all pointers
  • allow offset-by-0 if and only if we'd allow a zero-sized read from that pointer

The concern is not just breaking programs but also having teachable rules. I don't think null().add(0) is a common pattern in Rust (but maybe I am looking at the wrong places?), so I think the benefit of allowing that case is outweighed by the cost of adding a special case to the spec.

@comex

Hmm, why do you think it seems wrong? In most cases, of course, whether a dereference is at ZST type or not is known prior to LLVM IR generation, so ZST dereferences are eliminated before LLVM's optimizer gets a peek, so there's no need to make them UB under any circumstances.

"In most cases" and "under any circumstances" don't go together.^^ Either we can guarantee that all ZST accesses are removed prior to LLVM IR generation, or we certainly have to make them UB. Note that "accesses" here includes intrinsics that access memory and the access size is only determined at runtime, such as the copy, copy_nonoverlapping, and write_bytes intrinsics. So, I don't think we can make that guarantee.

For ptr: *const (), doing ptr.read() should be UB if and only if other_ptr.copy_from(ptr as *const u8, 0) is UB (with the 0 not statically known, and other_ptr being an unambiguously valid *mut u8 pointer). Both conceptually read 0 bytes starting at ptr, so their behavior should be consistent.

Also I feel like "though shalt not dereference the NULL pointer" is sufficiently deeply engraved that it would be more surprising to allow this code than to forbid it... this might be a good surprise for Rust programmers, but it is equally surprising to compiler devs and thus introduces a significant risk of accidentally making a "NULL cannot be dereferenced" assumption during optimizations.

@Blaisorblade

except in the mess that is pointer equality comparison

To my knowledge, LLVM's behavior is consistent with pointer equality entirely ignoring provenance. I don't think pointer equality is ever UB in LLVM.

Not sure if this was already answered, but in C++ dangling pointers are indeed invalid while NULL isn’t, and strictly more operations are allowed on NULL than on invalid pointers.

But is there a good reason for this? What does this extra "more invalid that NULL" UB let compilers do?

@RalfJung
Copy link
Member

@comex oh and also it seems somewhat inconsistent to me to say that null::<u8>() can never be dereferenced (even on bare metal systems where there is nothing fundamentally wrong with accessing address 0), but null::<()>() can be dereferenced.

@Blaisorblade
Copy link

@RalfJung just to answer your questions (since I can only speak re C++ which is a side discussion):

To my knowledge, LLVM's behavior is consistent with pointer equality entirely ignoring provenance. I don't think pointer equality is ever UB in LLVM.

Fair enough; I'll note in C++ it's not UB, just unspecified in some cases (e.g. https://eel.is/c++draft/expr.eq#3.1), even if GCC devs would prefer it be UB, and the C++ standard wording doesn't actually license "return false when comparing pointers with different provenances" AFAICT.

IIRC, the use-cases were in C++ assignment operators.

But is there a good reason for this? What does this extra "more invalid that NULL" UB let compilers do?

AFAIK pointer zapping is not used outside of equality comparison (the usual example — auto p = new int; delete p; auto q = new int; return p == q; can return false even if the addresses coincide). http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2443.pdf reviews motivations, but doesn't list applications to optimizations.

@RalfJung
Copy link
Member

Fair enough; I'll note in C++ it's not UB, just unspecified in some cases (e.g. https://eel.is/c++draft/expr.eq#3.1), even if GCC devs would prefer it be UB, and the C++ standard wording doesn't actually license "return false when comparing pointers with different provenances" AFAICT.

Yes that reflects my understanding as well.
In LLVM, optimizations treat pointer comparison as deterministic. The intention quite clearly is that the result is always specified (and equivalent to comparing the underlying integers). Rust also pretty much relies on this since even dangling raw pointers can be compared in safe code. The only counterexample to this that I know involves a custom implementation of malloc that returns the address of a global variable for the first allocation request -- which arguably violates the noalias return type annotation of malloc so this is a different issue I think.

pointer zapping

Ah, you are right -- the "even less valid than NULL" thing in C/C++ is entirely overwhelmed by pointer zapping, so we cannot even discuss the finer points around zero-sized accesses and zero-sized inbounds offsets.

LLVM does not document that it does pointer zapping, but also does not explicitly state the opposite... AFAIK LLVM is consistent with "no pointer zapping" and Rust relies on that.

@mcy
Copy link
Author

mcy commented Jul 25, 2021

@RalfJung

allow offset-by-0 on all pointers

Yes. I think the best strategy is to persue this on the LLVM side long-term. I can poke some folks I know on Monday, but this is definitely an uphill battle. =(

I don't think null().add(0) is a common pattern in Rust (but maybe I am looking at the wrong places?)

So I think you're right, but @davidben has made a lot of good points to me about how this can make FFI a lot more dangerous than it has to be. Though he agrees with us that "allow offset-by-zero for all pointers" is optimal for everyone's sanity. (I'll let him comment further if he wants to.)

@RalfJung
Copy link
Member

Yes. I think the best strategy is to persue this on the LLVM side long-term. I can poke some folks I know on Monday, but this is definitely an uphill battle. =(

That would be great. :D
At this point (given the C++ rule) I'd honestly be rather surprised if any part of LLVM can make good use of offset-by-0 being UB for some pointers -- so my expectation is that this is a spec-only change for LLVM. But then there might be nasty surprises lurking in that codebase.

this can make FFI a lot more dangerous than it has to be

I don't quite understand the FFI point -- is the issue that C might be passing in NULL pointers and then Rust will .add(0) them?

If NULL + 0 is just a C++ rule, not a C rule, then arguably Rust is no worse wrt. FFI than C here... but yeah I could see how that's a footgun under some conditions.

@davidben
Copy link

davidben commented Jul 25, 2021

I don't quite understand the FFI point -- is the issue that C might be passing in NULL pointers and then Rust will .add(0) them?

The FFI point I think applies less to NULL + 0 (I expect Rust code will convert to slices early) and more to transiting (ptr, len) tuples across the languages. At the risk of opening Pandora's box, I think the questions to ponder there are:

  1. What are the natural/allowed representations of the empty slice as a (ptr, len) tuple in some language?
  2. Do allowed operations on non-empty (ptr, len) tuples correctly extend to (1) in that language?
  3. Are the answers for (1) the same across all languages? If not, what are the implications for FFI code?

For C/C++, I think it's unavoidable that (NULL, 0) is in (1). std::vector::data and std::span::data often return nullptr. And given a function that takes (ptr, len) as separate parameters, the most natural way to say empty slice is (NULL, 0). Whether (non-NULL placeholder, 0) should be in the set for C/C++ is less clear to me. It seems less likely to come up in normal code.

Given that, yes, C/C++ both fail (2). C messes up ptr + len, though C++ is fine. C also messes up memcpy and C++ inherits this. This is terrible and I wish C/C++ would fix this.

My understanding from @mcy is that Rust, at least at the level of a slice's internal representation and from_raw_parts/to_raw_parts, believes (1) contains (non-NULL placeholder, 0) and is UB if you use (NULL, 0). I suppose (NULL, 0) slices being UB to begin with means we don't strictly need to allow NULL + 0 for Rust to satisfy (2), but I'd advocate allowing it anyway because forbidding NULL + 0 is pointless.

However, this means C/C++ and Rust are incompatible for (3), which has unfortunate implications for FFIs. If Rust tries to take in a slice from C/C++, perhaps out of a std::span or perhaps by implementing some C (ptr, length) type signature, from_raw_parts's UB on NULL is dangerous. Conversely, if Rust tries to hand a slice to C/C++, we need to ponder how well C/C++ code can be expected to handle (non-NULL placeholder, 0). This is not a good situation. While I don't have direct experience with failures in (3), it's a variation on the problems with failing (2). I can say from experience that failures in (2) are a problem for real-world code. In the codebase I work on, we ended up even replacing memcpy with a small wrapper function to deal with C/C++'s issues here.

I imagine Rust's ban on (NULL, 0) for the internal slice representation is a lost cause at this point (though one could imagine saying Option<&[T]>::None is something real goofy like (NULL, 1)). Still, I think fixing the conversions between slices and (ptr, len) tuples is warranted. Perhaps some branches in from_raw_parts. Note this mismatch also means that a Rust slice cannot be directly used in C/C++ FFIs, even if you do make guarantees on the struct layout.

@RalfJung
Copy link
Member

@davidben thanks for this helpful input!

If by "slice" you mean &[T], then yes that is non-NULL -- unambiguously and as set in stone as it could possibly be, I think.

Rust does satisfy (2) for from_raw_parts(NonNull::dangling().as_ptr(), 0).

I can see how indeed this violates (3), but I don't think Rust will give up the non-NULL optimization due to this -- that optimization is just too deeply entrenched at this point, and also likely too important. I would think the best way to solve this is to have explicit conversion functions at the "edges" that turn NULL into NonNull::dangling() for C++ → Rust, and reset the pointer to NULL for all 0-length slices for Rust → C++. But this also feels like it is moving quite far away from the original topic of this thread -- we are not discussing language rules here any more, "just" some library design.

Option<&[T]>::None is an interesting proposal, and the pointer is indeed guaranteed to be NULL here, but the length is part of the padding and so it could even be uninitialized... so without some very specific hacks, I don't think this type will be easily suited for FFI. It also doesn't feel like the right solution since slices already have a natural "absent" case, namely the empty slice.

Now, in Rust we also have *const [T], and that does allow a NULL data pointer. But at least currently, (almost) any operation on such a NULL slice is UB. Some of the changes proposed above would change that.

@Lokathor
Copy link
Contributor

If foreign code is giving you ptr+len and you want to try to make it into a rust slice then naturally you should simply use some sort of "try_from_raw_parts" method which returns Option<&[T]> and you get the None if the pointer was null or you get a Some with the right length if the pointer was non-null.

As Ralf says, that this function doesn't exist in the standard library today is "just" a library design issue.

@davidben
Copy link

Sure, I think we're in agreement here. Short of changing the slice representation, any fix for the unsafe FFI story will be a library one. I just elaborated to clarify the FFI issue I was summoned for. Like I said, it doesn't apply to NULL + 0 per se, just all part of the mess that is empty slices.

@mcy
Copy link
Author

mcy commented Jul 26, 2021

I was chatting w/ @regehr on twitter, who seems to believe that the Alive2 model of LLVM's semantics actually already has p + 0 = p for all pointers. He provides this IR transform, which Alive2 accepts as valid, as evidence: https://alive2.llvm.org/ce/z/rZ8YDp

Perhaps this is less hopeless than we thought?

See: https://twitter.com/DrawsMiguel/status/1419432464418754561?s=20

@Diggsey
Copy link

Diggsey commented Jul 26, 2021

@mcy isn't that the wrong direction? Transforming p + 0 => p is always valid, regardless of whether p + 0 results in UB, since if it is UB then any transformation is valid...

@mcy
Copy link
Author

mcy commented Jul 26, 2021

Ah. That is a very good observation. I'm not the one asserting that this is ok, so I won't defend it. =)

@sollyucko
Copy link

Looks like the opposite direction doesn't work: https://alive2.llvm.org/ce/z/7PXoVt: (1 as *const u8) + 0 = poison

@RalfJung
Copy link
Member

Looks like the opposite direction doesn't work: https://alive2.llvm.org/ce/z/7PXoVt: (1 as *const u8) + 0 = poison

Yeah that's definitely a problem. Rust generates code like this.

@RalfJung
Copy link
Member

Closing in favor of #93, which discusses the more general question of zero-sized accesses/offsets. I don't think the null pointer should get any special treatment here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants