Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[tracking issue] raw ptr to usize cast inside constants #51910

Closed
oli-obk opened this issue Jun 29, 2018 · 44 comments
Closed

[tracking issue] raw ptr to usize cast inside constants #51910

oli-obk opened this issue Jun 29, 2018 · 44 comments
Assignees
Labels
A-const-eval Area: Constant evaluation, covers all const contexts (static, const fn, ...) B-unstable Blocker: Implemented in the nightly compiler and unstable. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@oli-obk
Copy link
Contributor

oli-obk commented Jun 29, 2018

This is the tracking issue for the const_raw_ptr_to_usize_cast feature.

Activating the feature allows casting *const T and *mut T to usize.

I did not open an RFC for this because I believe we want to experiment with the feature (and we can already emulate it with union transmute hacks).

@oli-obk oli-obk added A-const-fn A-const-eval Area: Constant evaluation, covers all const contexts (static, const fn, ...) labels Jun 29, 2018
@crlf0710
Copy link
Member

Just encountered this, when i try to define a constant that uses ::winapi::um::winuser::IDC_ARROW as an integer, which is defined as:

pub const IDC_ARROW: LPCWSTR = 32512 as LPCWSTR;

@oli-obk
Copy link
Contributor Author

oli-obk commented Oct 1, 2018

@crlf0710 what is your use case for casting a raw pointer to an integer? Is there any reason you can't use *const () or something similar?

@crlf0710
Copy link
Member

crlf0710 commented Oct 2, 2018

@oli-obk actually it is an integer in the first place, however the upstream library (winapi) cast the integer to a raw pointer (maybe for convenience using it together with the raw Windows APIs) when it defined that variable (see above comment). So i can't use the original integer value in constant evaluation context now :(

I can work around this by (re-)defining those integers myself again. (Arguably the winapi crate can change its definitions to avoid this.) I'm not seeking for any changes, just want to record this use case.

@oli-obk
Copy link
Contributor Author

oli-obk commented Oct 2, 2018

I can offer you a stable workaround, but it's very unsafe and if you screw up by placing a real pointer in an integer constant we will error, and if we don't, we might so in the future. If you do undefined behaviour at compile-time, it means the behaviour (compilation success) is also undefined.

union Transmuter {
    from: LPCWSTR,
    to: usize,
}
const MY_CONST: usize = unsafe { Transmuter { from: IDC_ARROW }.to };

@crlf0710
Copy link
Member

crlf0710 commented Oct 2, 2018

@oli-obk Great to know, thanks a lot!

@Centril Centril added T-lang Relevant to the language team, which will review and decide on the PR/issue. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC labels Dec 1, 2018
@MSxDOS
Copy link

MSxDOS commented Jul 15, 2019

if you screw up by placing a real pointer in an integer constant we will error, and if we don't, we might so in the future.

I'm confused, what's wrong with putting a real pointer in an integer constant?

Btw, the following code compiles just fine:

use std::ptr::null;

#[repr(C)]
struct Fields {
    field1: u32,
    field2: [u8; 3],
    field3: u64,
    field4: i32,
}

union Transmuter<T: 'static> {
    ptr: *const T,
    reference: &'static T,
    integer: usize,
}

const FIELD4_OFFSET: usize =
    unsafe { Transmuter { reference: &(Transmuter { ptr: null::<Fields>() }.reference).field4 }.integer };

const _STATIC_ASSERT_EQ_: [(); 16] = [(); FIELD4_OFFSET];

@oli-obk
Copy link
Contributor Author

oli-obk commented Jul 16, 2019

you are converting a null pointer to a reference (which is UB) and then back to an integer (which is fine, since 0 is a valid integer and not a pointer). What I'm talking about is if you transmute a reference to e.g. a static to an integer.

@oli-obk
Copy link
Contributor Author

oli-obk commented Jul 31, 2019

Now that #62946 is on its merry way, this issue becomes unactionable. @RalfJung I think we should just close it and make ppl use raw pointers instead of integers?

@oli-obk
Copy link
Contributor Author

oli-obk commented Jul 31, 2019

Hmm actually, if the value is an int it's still fine. So we'd still allow ptr::null() as usize... idk if that is a useful feature

@RalfJung
Copy link
Member

One case where union transmute hacks are used currently is to have a const-capable offsetof!. I am not sure what it would take to avoid this. Ultimately this has to subtract two pointers from each other -- either two integer-valued pointers (but then it did some illegal ptr arithmetic) or two pointer-valued pointers into the same allocation (then we would need new code somewhere if we really wanted to support this).

@Geobert

This comment has been minimized.

@RalfJung

This comment has been minimized.

@Geobert

This comment has been minimized.

@RalfJung

This comment has been minimized.

@Geobert

This comment has been minimized.

@jonas-schievink jonas-schievink added the B-unstable Blocker: Implemented in the nightly compiler and unstable. label Nov 26, 2019
@elichai
Copy link
Contributor

elichai commented Mar 2, 2020

Why does regular casting requires unsafe? is it because we have no idea what to put in that usize and we cannot promise that it will keep working in the future? if so isn't that redundant by already being unstable?

#![feature(const_raw_ptr_to_usize_cast)]

const A: usize = &42 as *const i32 as usize;

Error:

error[E0133]: cast of pointer to int is unsafe and requires unsafe function or block
 --> src/lib.rs:3:18
  |
3 | const A: usize = &42 as *const i32 as usize;
  |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^ cast of pointer to int
  |
  = note: casting pointers to integers in constants

https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=8b159ba225b57fd39512d1f6d9af9956

EDIT: I see that after adding "unsafe" it prints "pointer-to-integer cast" needs an rfc before being allowed inside constants so I think the bug here is printing about the unsafe part instead of the RFC comment

@oli-obk
Copy link
Contributor Author

oli-obk commented Mar 3, 2020

You can cast pointers that you derived from integers. You just can't cast real pointers. This works as intended, and the reason the feature gate is not stable is because we can't detect statically whether your cast is going to be fine. We actually need to evaluate the constant with the cast as you saw.

In case of associated constants of generic traits, we can't actually do the cast unless you instantiate the trait with specific types. This means that you could build an associated constant that is broken, but only users of your trait see that and not you (that's why it's unsafe. This operation is "const unsafe").

Avi-D-coder added a commit to Avi-D-coder/sundial-gc that referenced this issue Jun 4, 2020
rust-lang/rust#51910
We will have to provide the drop_in_place::<T> ptr upon Arena<T> registration
@luojia65
Copy link
Contributor

One temporary way by now is to using *const T raw pointer in const structures, but converting it into a usize in constants would be more nice and convenient.

@oli-obk
Copy link
Contributor Author

oli-obk commented Jun 30, 2020

I find it less and less likely for raw ptr to int casts to ever become stable in CTFE. See also #51909 for the related discussion on unconst things and soundness concerns in CTFE.

@RalfJung
Copy link
Member

The thing is, if we don't stabilize the casts, people will just transmute instead once that is stable...

@oli-obk
Copy link
Contributor Author

oli-obk commented Jun 30, 2020

But in consquence that means that we should also stabilize raw pointer comparisons as people will otherwise just cast to usize and compare there.

@RalfJung
Copy link
Member

Depends on what you mean by "stabilize". I think the == operator should continue to always error when either operand is a Scalar::Ptr -- no matter whether this is == at type usize or == at type raw pointer. That ensure people can not compare raw pointers by casting to usize.

But you have a point... almost anything people would usually do after casting to usize does not actually work (like doing arithmetic on that integer or comparing it). Literally the only thing you can do is cast it back. So in that regard, disallowing the cast could be useful -- the error could explain what the cast would be rather pointless anyway. Maybe it should be a lint instead of a hard error.

@mark-i-m
Copy link
Member

One could argue that if they transmute, they are doing something unsafe, so it is their problem...

@oli-obk
Copy link
Contributor Author

oli-obk commented Jun 30, 2020

Pointer comparison and casting raw pointers to usize would also be unsafe if we stabilized it in const fn

@mjbshaw
Copy link
Contributor

mjbshaw commented Jul 1, 2020

This might be a stupid question, but why impose such strict restrictions on pointers? What's so bad about subtracting two pointers, or comparing them, or casting to usize and back (possibly with some arithmetic while usize)?

I understand that you don't want to allow const BAD_PTR: *const u8 = &1u8 as *const u8; because that's a meaningless constant pointing to some random spot in the const VM's address space. If that's the problem these restrictions are trying to solve then this sure seems like throwing out the baby with the bathwater, so to speak.

@HadrienG2
Copy link

HadrienG2 commented Jul 1, 2020

@mjbshaw IIUC, the problem is that the correctness of const depends on a const fn returning the same result every time it is called, because the resulting value can be used in contexts like type definitions. For example, consider:

const BAD : [f32; (&1u8 as *const u8 as usize) % 2] = [4.2];

If such code is allowed, this array definition (and thus the underlying program) will sometimes compile, sometimes throw a compiler error, depending on star alignment. This is, by definition, compile-time undefined behavior, and already looks like something Rust might not want to support without at least some kind of unsafe {} guard rail. Furthermore, once we start adding layers of abstraction...

const fn coin() -> usize {
    let randomness = 1u8;
    (&randomness as *const u8 as usize) % 2
}

/* lots of other code */

const BAD : [f32; coin()] = [4.2; coin()];

...it may not be immediately obvious to either a human or a compiler that something evil is going on, and accepting this fate by forcing compilers to do the in-depth check may preclude improving the efficiency of rustc's const evaluator through optimizations that deduplicate code in the future.

@oli-obk
Copy link
Contributor Author

oli-obk commented Jul 1, 2020

@mjbshaw please read #51909 if you are interested in the icky details. There's no point in rehashing it here in detail as it's just another shade of the same thing. But yea, as explained by @HadrienG2, the TLDR is that const eval cannot possibly compute (pointer as usize) / 2 and will throw up (current impl) or decide to compile your binary into a giant nasal demon after LLVM is done with it (possible impl). There's also a video of a lang team meeting where we discuss this https://youtu.be/b3p2vX8wZ_c?t=1887 (it should link to the 31 minute mark where the whole unsafety thing is explained).

If such code is allowed, this array definition (and thus the underlying program) will sometimes compile, sometimes throw a compiler error, depending on star alignment.

That by itself is not a problem we already have this if you make an array length rely on std::mem::size_of which wildly differs between platforms and compiler versions and other things.

@RalfJung
Copy link
Member

RalfJung commented Jul 1, 2020

Speaking specifically about subtraction, we could (dynamically) allow subtraction ((ptr1 as usize) - (ptr2 as usize)) for some pointers, but IMO it is clearer to say that this will never work (unless both pointers were originally cast from an integer). For the cases that do work, people should use offset_from, which conveniently is restrictive enough such that all cases where it is non-UB (at runtime) can actually be computed fine during CTFE.

For comparison, unfortunately we don't have a convenient runtime operation to borrow from. But again I think we should rather not (dynamically) allow (ptr1 as usize) == (ptr2 as usize) ever (unless both were cast from an integer), and instead make people use ptr_guaranteed_eq/ptr_guaranteed_ne, thus explicitly acknowledging that CTFE has special rules. For ptr1 == ptr2, then again it feels better to (statically) disallow such that people are forced to use the explicit ptr_guaranteed_* methods.

Casting to usize and back we could support. Indeed, if you replace cast by transmute we cannot prevent it. The only reason to forbid it would be to help people avoid the above pitfalls where operations such as comparison and subtraction only work under specific circumstances. Thus my suggestion to make it a lint. This is based on me being unable to imagine any correct CTFE code that uses ptr-int-casts and that cannot be written better without them -- thus, the lint would not have false positives.

@RalfJung
Copy link
Member

RalfJung commented Jul 1, 2020

I just had another idea, based on overarching principles rather than a case-by-case analysis: we could statically forbid all safe operations that can cause CTFE trouble: raw ptr comparison and ptr <-> int casts. Dynamically, we make no attempt to support pointer values in integer operations (so if people transmute to get around the lack of casts, that doesn't buy them anything).

As a replacement for what is lost this way, we provide dedicated raw pointer operations that are meaningful during CTFE: offset_from, ptr_gauranteed_{eq,ne}.

This means transmute_ptr_to_int(&4) + 0 will error when compile-time-evaluated, you are expected to use (&4 as *const _).wrapping_offset(0) instead. How it errors is an open question; my personal current thinking is to treat such "unconst" operations as UB, IOW transmute_ptr_to_int(&4) + 0 during CTFE is the same as an out-of-bounds access during CTFE. But we could also guarantee that the compiler will detect this problem during evaluation. This question is orthogonal to the rest that is being discussed here.

@oli-obk
Copy link
Contributor Author

oli-obk commented Jun 19, 2021

So... how about this action plan:

  1. Add a section about pointer casts to the reference: https://doc.rust-lang.org/nightly/reference/const_eval.html?highlight=const%20fn#constant-evaluation
  2. Remove the feature gate, turning these casts into a hard error + explain the problem + link to the reference + suggest to use offset_from/ptr_guaranteed_*

@RalfJung
Copy link
Member

I agree with that plan. The documentation on casts, and transmute should probably also link to where the problem is explained.

@oli-obk oli-obk self-assigned this Jun 20, 2021
@CeleritasCelery
Copy link
Contributor

Now that constant transmutes are stabilized in #85769 does that mean this can be stabilized as well? Transmutes will allow ptr to usize casts, just in the less safe way.

@Lokathor
Copy link
Contributor

transmuting a pointer to a usize (or the other way) is probably UB because of how LLVM works. This is an open question, but the current leaning is "don't do that".

@RalfJung
Copy link
Member

Transmutes will allow ptr to usize casts, just in the less safe way.

This is explicitly excluded in the transmute docs:

Transmuting pointers to integers in a const context is undefined behavior. Any attempt to use the resulting value for integer operations will abort const-evaluation.

The feature this issue was tracking has been removed in #87020. I just forgot to close the tracking issue, so let's do that now.

@fee1-dead
Copy link
Member

fee1-dead commented Jan 14, 2022

i just encountered a possibly valid use case of this, when trying to create an x86 page table at compile time:

(The example is using legacy non-PAE paging, although paging with PAE should have the same problem)

#[repr(C)]
#[repr(align(4096))]
pub struct PageTable([u32; 1024]);

static PAGE_TABLE: PageTable = PageTable({
    let mut arr = [0; 1024];
    let mut i = 0;
    while i < 1024 {
        // supervisor level, read/write, present, identity mapped
        arr[i] = (i as u32 * 0x1000) | 3;
        i += 1;
    }
    arr
});

static PAGE_DIRECTORY: PageTable = PageTable({
    // all other entries are not present
    let mut arr = [2; 1024];
    // supervisor level, read/write, present
    arr[0] = addr_of!(PAGE_TABLE) as u32 | 3; // ERROR
    arr
});

I had semi expected this to just work and the page directory would be somehow magically wired to have the first entry point to the page table. Unfortunately I forgot about ptr-to-int cast in compile time being UB.

How much effort is needed for this use case to be valid Rust? I'd assume a lot because it means CTFE for statics now happens after we decide the actual address to place the statics.

@oli-obk
Copy link
Contributor Author

oli-obk commented Jan 14, 2022

Does the PageTable need to be an array of u32? Couldn't it be *const Something?

The big problem in your example is not the cast, it is precisely the bit-or operation. We could specifically add operations to change the bits of a pointer that are guaranteed zero due to the alignment of the reference that the pointer came from.

@fee1-dead
Copy link
Member

fee1-dead commented Jan 14, 2022

It can be *const PageTable which is always 4KiB aligned. Yes, the only requirement is to set the bits that are never set due to alignment.

EDIT: I have now adopted to populate the entries at runtime, but the example above could identity map enough memory to eliminate the need to locate the kernel so I still think this use case is valid.

@RalfJung
Copy link
Member

RalfJung commented Jan 14, 2022

I was so happy when I deleted the code from Miri that enabled "sub-alignment arithmetic and bitops"...

But I agree it is possible to support this so we should at least consider it. Then the question remains, how? IMO the typical API (cast ptr to int, to bitops there, cast ptr back) is just not a good API -- it is way too powerful, and int-to-ptr casts are super hard to define properly. The fiction that pointers and integers are isomorphic is already brittle for runtime code (keyword "pointer provenance"), it breaks down entirely for compile-time code.

So I would be much more happy if we could find a way to support something like PAGE_DIRECTORY with APIs that do not involve integers. Basically I am advocating for primitive bitops like ptr_bit_or(*const T, usize) -> *const T which would either be UB or defined to raise an error if the 2nd operand is neither all-0 nor all-1 for the not-fixed-by-alignment part of the first operand (this is easy to get wrong so it is probably better to guarantee an error).

So basically I am saying, this is the wrong tracking issue for that discussion. :)

@comex
Copy link
Contributor

comex commented Jan 15, 2022

So I would be much more happy if we could find a way to support something like PAGE_DIRECTORY with APIs that do not involve integers. Basically I am advocating for primitive bitops like ptr_bit_or(*const T, usize) -> *const T which would either be UB or defined to raise an error if the 2nd operand is neither all-0 nor all-1 for the not-fixed-by-alignment part of the first operand (this is easy to get wrong so it is probably better to guarantee an error).

Keep in mind getting something like PAGE_DIRECTORY to work requires the linker's cooperation – only the linker knows the final address of any symbol. Indeed, when any kind of dynamic linking or ASLR is involved (these days, common even in kernels), only the dynamic linker knows (or in kernels' case, whatever ad-hoc setup serves as a dynamic linker).

In other words, any computation must be expressed through relocations, and the only computation that can be portably expressed through relocations is "symbol + constant". Even the non-portable options are usually meant for specific purposes and very limited. (Here is the list for x86-64 ELF.)

In this case, PAGE_DIRECTORY actually could be implemented as "address + 3" instead of "address | 3". In Rust that could be achieved with wrapping_offset, which should be stabilized as a const fn.

In theory Rust could also provide a ptr_bit_or that, as @RalfJung suggested, verifies that the unknown bits are either all-0 or all-1, and then gets translated to either an addition (if they're all-0) or a constant ignoring the pointer (if they're all-1). But to me that seems like more complexity than it's worth. If all it can really do is add, then I'd say just expose addition.

@fee1-dead
Copy link
Member

Just encountered another thing I couldn't achieve:

extern crate x86_64;
static GDT: x86_64::GlobalDescriptorTable = /* ... */;

/// Const-friendly gdt pointer
#[repr(C, packed)]
struct GdtPointer {
    limit: u16,
    base: *const GlobalDescriptorTable,
}

unsafe impl Send for GdtPointer {}
unsafe impl Sync for GdtPointer {}

static GDTPTR: GdtPointer = GdtPointer {
    limit: 2,
    base: &GDT,
};

I wrote the code above, then I realized the base field should be 32-bits..

The solution to this might be a type named like struct TruncatePointerForConstUsage<Pointee, Storage>(Storage) and tell the compiler to track the actual pointer.

@RalfJung
Copy link
Member

RalfJung commented Jan 16, 2022

Keep in mind getting something like PAGE_DIRECTORY to work requires the linker's cooperation

I am fully aware, that is why I specifically restricted the bitops to require the part of the bitmask that interacts with the unknown (and unknoweable) part of the pointer to be al-0 or all-1. :)

In theory Rust could also provide a ptr_bit_or that, as @RalfJung suggested, verifies that the unknown bits are either all-0 or all-1, and then gets translated to either an addition (if they're all-0) or a constant ignoring the pointer (if they're all-1). But to me that seems like more complexity than it's worth. If all it can really do is add, then I'd say just expose addition.

I think the bitops are slightly more powerful than addition, for example they permit implementing "tagged pointer" schemes that pack an aligned pointer and a bool into a ptr-sized type.

But indeed it seems that for your first example, @fee1-dead, wrapping_offset on pointers is sufficient? I'm afraid I do not understand the 2nd example.

@oli-obk
Copy link
Contributor Author

oli-obk commented Jan 16, 2022

I'm not sure about the second example either.... how can it be 32 bit if it's supposed to be the address of a static item which may be anywhere in the 64 bit space of the x86_64 machine.

Edit: let's talk on zulip, this is probably not the right place

@comex
Copy link
Contributor

comex commented Jan 16, 2022

I think the bitops are slightly more powerful than addition, for example they permit implementing "tagged pointer" schemes that pack an aligned pointer and a bool into a ptr-sized type.

But if it has to be emitted as addition then it can't be more powerful… Oh, I guess your point is that bitops would allow further const evaluation to extract the tag if needed?

@RalfJung
Copy link
Member

RalfJung commented Jan 16, 2022

But if it has to be emitted as addition then it can't be more powerful… Oh, I guess your point is that bitops would allow further const evaluation to extract the tag if needed?

Yeah. Also I don't think using just addition in the language we can implement "reset the last bit of the ptr [to a 2-aligned allocation] to 0" (if we do not have ptr-to-int casts), but we can totally implement this in the CTFE interpreter using just addition in the linker relocations we emit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-const-eval Area: Constant evaluation, covers all const contexts (static, const fn, ...) B-unstable Blocker: Implemented in the nightly compiler and unstable. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests