Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

request: lint when raw pointer slicing implicitly creates a temporary reference #99437

Open
RalfJung opened this issue Jul 18, 2022 · 7 comments
Labels
A-lint Area: Lints (warnings about flaws in source code) such as unused_mut. A-raw-pointers Area: raw pointers, MaybeUninit, NonNull A-slice Area: `[T]` C-enhancement Category: An issue proposing an enhancement or a PR with one. T-opsem Relevant to the opsem team

Comments

@RalfJung
Copy link
Member

RalfJung commented Jul 18, 2022

So let's say you know about aliasing models and have learned that to avoid all aliasing rules of Rust you need to use raw pointers throughout, no references. So for example you write:

use std::ptr::addr_of_mut;

pub fn test(ptr: *mut [u8]) -> *mut [u8] {
    let layout_size = 24;
    unsafe { addr_of_mut!((*(ptr))[..layout_size]) }
}

Looks good, doesn't it?
Well sadly the MIR for this code looks as follows:

fn test(_1: *mut [u8]) -> *mut [u8] {
    debug ptr => _1;                     // in scope 0 at [src/lib.rs:3:13: 3:16](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021#)
    let mut _0: *mut [u8];               // return place in scope 0 at [src/lib.rs:3:32: 3:41](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021#)
    let _2: usize;                       // in scope 0 at [src/lib.rs:4:9: 4:20](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021#)
    let mut _3: &mut [u8];               // in scope 0 at [src/lib.rs:5:27: 5:50](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021#)
    let mut _4: &mut [u8];               // in scope 0 at [src/lib.rs:5:27: 5:35](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021#)
    let mut _5: std::ops::RangeTo<usize>; // in scope 0 at [src/lib.rs:5:36: 5:49](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021#)
    scope 1 {
        debug layout_size => _2;         // in scope 1 at [src/lib.rs:4:9: 4:20](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021#)
        scope 2 {
        }
    }

    bb0: {
        _2 = const 24_usize;             // scope 0 at [src/lib.rs:4:23: 4:25](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021#)
        _4 = &mut (*_1);                 // scope 2 at [src/lib.rs:5:27: 5:35](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021#)
        Deinit(_5);                      // scope 2 at [src/lib.rs:5:36: 5:49](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021#)
        (_5.0: usize) = const 24_usize;  // scope 2 at [src/lib.rs:5:36: 5:49](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021#)
        _3 = <[u8] as IndexMut<RangeTo<usize>>>::index_mut(move _4, move _5) -> bb1; // scope 2 at [src/lib.rs:5:27: 5:50](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021#)
                                         // mir::Constant
                                         // + span: [src/lib.rs:5:27: 5:50](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021#)
                                         // + literal: Const { ty: for<'r> fn(&'r mut [u8], RangeTo<usize>) -> &'r mut <[u8] as Index<RangeTo<usize>>>::Output {<[u8] as IndexMut<RangeTo<usize>>>::index_mut}, val: Value(<ZST>) }
    }

    bb1: {
        _0 = &raw mut (*_3);             // scope 2 at /rustc/263edd43c5255084292329423c61a9d69715ebfa/library/core/src/ptr/mod.rs:2005:5: 2005:20
        return;                          // scope 0 at [src/lib.rs:6:2: 6:2](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021#)
    }
}

Note the _4 = &mut (*_1): Rust "helpfully" created a reference for us to subslice into!
This is terrible, because it introduces UB.

This code should either be rejected, or use raw pointers throughout.
Related to #73987.

@CAD97
Copy link
Contributor

CAD97 commented Jul 18, 2022

This ties into the concept that has come up a few times of distinguishing an "unsafe place" (one created from a raw pointer deref) from a "safe place" (on created from a reference deref).

@RalfJung
Copy link
Member Author

RalfJung commented Jul 18, 2022

I don't think we want to distinguish them in the AM semantics (just like unsafe code has no effect on the operational semantics), but I do think that this distinction could be quite meaningful for diagnostics and possibly for elaboration (auto-ref, auto-deref, all that stuff).

@LegionMammal978
Copy link
Contributor

LegionMammal978 commented Jul 19, 2022

To repeat my other comment, the most unexpected part IMO is that this is valid with a single index. All official sources say that index notation goes through the Index[Mut] traits in general. Since these traits take and produce ordinary references, I'd assume by default that any indexing is unsound here. Only the Reference hints that arrays and slices might not use Index[Mut], and it says nothing about single indexing not producing transient references.

I agree that this particular case can be footgunny, though (you'd almost never want to directly IndexMut a raw *mut [T]), so I'd be in favor of making it a warn-by-default lint. The question is, what is the correct alternative? If the subslice is a constant-sized prefix, as it is here, existing stable methods can handle it, as long as you can stomach the lack of bounds checks:

use std::ptr::{self, addr_of_mut};

pub fn test(ptr: *mut [u8]) -> *mut [u8] {
    let layout_size = 24;
    unsafe { ptr::slice_from_raw_parts_mut(ptr.cast::<u8>(), layout_size) }
}

But if the subslice is length-dependent (e.g., for a suffix), it's a lot more complicated, since <*mut [T]>::len() is still unstable. We'd have to do a more roundabout metadata decomposition, which doesn't even work for null slice pointers:

use std::ptr::{self, addr_of_mut};

pub fn test(ptr: *mut [u8]) -> *mut [u8] {
    assert!(
        !ptr.is_null(),
        "no stable way to get the len of a null slice ptr"
    );
    let layout_size = 24;
    let len = unsafe { (*(ptr as *mut [()])).len() };
    assert!(len >= layout_size, "slice too short");
    let start = unsafe { ptr.cast::<u8>().add(len - layout_size) };
    unsafe { ptr::slice_from_raw_parts_mut(start, layout_size) }
}

(Weirdness like this is why I generally avoid slice pointers altogether.)

@RalfJung
Copy link
Member Author

RalfJung commented Jul 19, 2022

I think the correct alternative is to use the methods tracked in #74265, though those don't do bounds checks.

bstrie added a commit to bstrie/enarx that referenced this issue Jul 19, 2022
…ith slice

This potential source of UB was discovered while upgrading the Rust toolchain,
which upgrades us to a new version of Miri with stricter rules around raw pointers.
Specifically, an expression like `addr_of_mut!((*(ptr))[offset..])` is deliberately
attempting to operate only on raw pointers while avoiding any intermediate references,
since references have invariants that raw pointers do not.
However, there is in fact an implicit reference here that is created as a result of the
indexing operation. This is both surprising and not surprising, for interesting reasons.

First, it should not be surprising because indexing is governed by the Index traits,
whose methods function return references, so their presence here is natural.

On the other hand, it is surprising because Rust already special cases `(*ptr)[foo]` when `ptr`
is a raw slice and `foo` is not a range to avoid the Index traits entirely, which allows it to
avoid emitting an intermediate reference.

The ideal solution here is for Rust to be smart enough to not introduce the intermediate
reference here at all, which is tracked at rust-lang/rust#73987 .

In addition, while investigating this issue I brought it up to the Unsafe Code Guidelines team,
who saw fit to file rust-lang/rust#99437 as a more specific example
of the potential perils of the current behavior.
bstrie added a commit to bstrie/enarx that referenced this issue Jul 19, 2022
…ith range

This potential source of UB was discovered while upgrading the Rust toolchain,
which upgrades us to a new version of Miri with stricter rules around raw pointers.
Specifically, an expression like `addr_of_mut!((*(ptr))[offset..])` is deliberately
attempting to operate only on raw pointers while avoiding any intermediate references,
since references have invariants that raw pointers do not.
However, there is in fact an implicit reference here that is created as a result of the
indexing operation. This is both surprising and not surprising, for interesting reasons.

First, it should not be surprising because indexing is governed by the Index traits,
whose methods function return references, so their presence here is natural.

On the other hand, it is surprising because Rust already special cases `(*ptr)[foo]` when `ptr`
is a raw slice and `foo` is not a range to avoid the Index traits entirely, which allows it to
avoid emitting an intermediate reference.

The ideal solution here is for Rust to be smart enough to not introduce the intermediate
reference here at all, which is tracked at rust-lang/rust#73987 .

In addition, while investigating this issue I brought it up to the Unsafe Code Guidelines team,
who saw fit to file rust-lang/rust#99437 as a more specific example
of the potential perils of the current behavior.

Signed-off-by: bstrie <[email protected]>
bstrie added a commit to bstrie/enarx that referenced this issue Jul 19, 2022
… with range

This potential source of UB was discovered while upgrading the Rust toolchain,
which upgrades us to a new version of Miri with stricter rules around raw pointers.
Specifically, an expression like `addr_of_mut!((*(ptr))[offset..])` is deliberately
attempting to operate only on raw pointers while avoiding any intermediate references,
since references have invariants that raw pointers do not.
However, there is in fact an implicit reference here that is created as a result of the
indexing operation. This is both surprising and not surprising, for interesting reasons.

First, it should not be surprising because indexing is governed by the Index traits,
whose methods function return references, so their presence here is natural.

On the other hand, it is surprising because Rust already special cases `(*ptr)[foo]` when `ptr`
is a raw slice and `foo` is not a range to avoid the Index traits entirely, which allows it to
avoid emitting an intermediate reference.

The ideal solution here is for Rust to be smart enough to not introduce the intermediate
reference here at all, which is tracked at rust-lang/rust#73987 .

In addition, while investigating this issue I brought it up to the Unsafe Code Guidelines team,
who saw fit to file rust-lang/rust#99437 as a more specific example
of the potential perils of the current behavior.

Signed-off-by: bstrie <[email protected]>
enarxbot pushed a commit to enarx/enarx that referenced this issue Jul 19, 2022
… with range

This potential source of UB was discovered while upgrading the Rust toolchain,
which upgrades us to a new version of Miri with stricter rules around raw pointers.
Specifically, an expression like `addr_of_mut!((*(ptr))[offset..])` is deliberately
attempting to operate only on raw pointers while avoiding any intermediate references,
since references have invariants that raw pointers do not.
However, there is in fact an implicit reference here that is created as a result of the
indexing operation. This is both surprising and not surprising, for interesting reasons.

First, it should not be surprising because indexing is governed by the Index traits,
whose methods function return references, so their presence here is natural.

On the other hand, it is surprising because Rust already special cases `(*ptr)[foo]` when `ptr`
is a raw slice and `foo` is not a range to avoid the Index traits entirely, which allows it to
avoid emitting an intermediate reference.

The ideal solution here is for Rust to be smart enough to not introduce the intermediate
reference here at all, which is tracked at rust-lang/rust#73987 .

In addition, while investigating this issue I brought it up to the Unsafe Code Guidelines team,
who saw fit to file rust-lang/rust#99437 as a more specific example
of the potential perils of the current behavior.

Signed-off-by: bstrie <[email protected]>
@RalfJung
Copy link
Member Author

Rejecting code like that is probably going to be hard, but both this and #73987 can be mitigated by a lint I think. Concretely, what happens here is that the compiler desugars (*ptr)[..layout_size] to IndexMut::index_mut(&mut *ptr, ..layout_size). Whenever such an &mut is inserted implicitly, I think we want to check the place it is applied to, and if that place was constructed by deref'ing a raw pointer, we should have a lint. The lint should suggest to change the code to (&mut *ptr)[..layout_size], to make it explicit that a reference is being created here. The lint should also explain that creating a reference (explicitly or implicitly) comes with a bunch of requirements regarding the data the reference points to, and aliasing requirements on how the reference is being used.

@RalfJung RalfJung changed the title Raw pointer slicing implicitly creates a temporary reference request: lint when raw pointer slicing implicitly creates a temporary reference Oct 29, 2022
@WaffleLapkin
Copy link
Member

@rustbot claim

@jyn514 jyn514 added C-enhancement Category: An issue proposing an enhancement or a PR with one. A-lint Area: Lints (warnings about flaws in source code) such as unused_mut. A-raw-pointers Area: raw pointers, MaybeUninit, NonNull T-opsem Relevant to the opsem team A-slice Area: `[T]` labels Apr 15, 2023
@WaffleLapkin WaffleLapkin removed their assignment Apr 13, 2024
@WaffleLapkin
Copy link
Member

@Urgau is implementing this in #123239

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-lint Area: Lints (warnings about flaws in source code) such as unused_mut. A-raw-pointers Area: raw pointers, MaybeUninit, NonNull A-slice Area: `[T]` C-enhancement Category: An issue proposing an enhancement or a PR with one. T-opsem Relevant to the opsem team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants