Tracking issue for ptr::offset_from (feature: ptr_offset_from) #41079

Closed
3 of 5 tasks
Amanieu opened this issue Apr 5, 2017 · 43 comments · Fixed by #74238
Labels
A-raw-pointers (Area: raw pointers, MaybeUninit, NonNull)
B-unstable (Blocker: Implemented in the nightly compiler and unstable.)
C-tracking-issue (Category: A tracking issue for an RFC or an unstable feature.)
Libs-Tracked (Libs issues that are tracked on the team's project board.)
T-libs-api (Relevant to the library API team, which will review and decide on the PR/issue.)

Comments

@Amanieu
Member

Amanieu commented Apr 5, 2017

PR: #40943

Adds an offset_to method to calculate the distance between two raw pointers.

List o' stuff:

@Mark-Simulacrum added the B-unstable and T-libs-api labels Jun 20, 2017
@scottmcm
Member

I wonder whether this should have the unsafe + wrapping_ split that offset does.

The unsafe one could, for example, use sdiv exact by requiring that the pointers are to the beginning or past-the-end of real objects inside the same allocated object.

@strega-nil
Contributor

It seems suspicious at best to calculate the offset between two unrelated pointers... I agree with @scottmcm

@Mark-Simulacrum added the C-tracking-issue label Jul 22, 2017
@joshtriplett
Member

@pornel made the point that this should be offset_from, with the order of the subtraction reversed, which I agree with. (Or, perhaps, both methods should exist.)

@joshtriplett
Member

Also, I find it unfortunate that this forces handling the ZST case even if you know the pointer type and know that it doesn't point to a zero-sized type.

Could we use traits to provide a method that only exists for non-zero-sized types, and then returns isize?

@sunfishcode
Member

I like @scottmcm's suggestion to have an unsafe ptr::offset_to that uses sdiv exact. This is the situation sdiv exact was designed for, and it often reduces a 5-instruction sequence down to 2. And, if the unsafe version could be documented to have UB if the pointers don't point into the same array (or one-past-the-end), it would make me less worried about breaking pervasive assumptions that LLVM makes about pointer differences [0]. And it would be sufficient for Vec and related use cases.

I suggest omitting the wrapping_ form though. Even if the implementation is entirely "safe" code, I agree with @ubsan that it's suspicious at best and doesn't seem like the kind of thing that should be encouraged via a convenient standard library function.

[0] For example, GetUnderlyingObject, which is used throughout the optimizer, assumes that getelementptr X, Y aliases a subset of X. This doesn't work if Y can be the difference from X to an independent object.
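
For readers following along, here is a minimal sketch of the case the unsafe variant is meant for: two pointers into the same array. It uses the `offset_from` name that the thread later settles on; `elements_between` is a hypothetical helper made up for illustration, not part of the standard library.

```rust
/// Hypothetical helper: signed element distance between two positions
/// of the same slice, computed through raw pointers.
fn elements_between(data: &[i32], from: usize, to: usize) -> isize {
    // Both indices must be in bounds or exactly one past the end.
    assert!(from <= data.len() && to <= data.len());
    let base = data.as_ptr();
    // SAFETY: both pointers are derived from `data` and stay within it
    // (or one past its end), so they belong to the same allocation and
    // their byte distance is a multiple of size_of::<i32>().
    unsafe {
        let origin = base.add(from);
        let target = base.add(to);
        target.offset_from(origin)
    }
}

fn main() {
    let data = [10, 20, 30, 40];
    println!("{}", elements_between(&data, 1, 4));
    println!("{}", elements_between(&data, 3, 0));
}
```

The result is signed, so swapping the two positions flips the sign.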

@iitalics

iitalics commented Jan 4, 2018

I think that returning None for the zero-sized-type case is unnecessary. It is a very unlikely event, and a majority of the time the user knows they will never get the None case, so unwrapping is just clutter.
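
To make the complaint concrete, here is a rough sketch of the Option-returning shape being discussed. `offset_to_sketch` is a hypothetical stand-in written with plain address arithmetic; it is an assumption about the general behaviour, not the actual standard library implementation.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for the Option-returning method: yields `None`
/// for zero-sized types, otherwise the element distance from `from` to `to`.
fn offset_to_sketch<T>(from: *const T, to: *const T) -> Option<isize> {
    let size = size_of::<T>();
    if size == 0 {
        // A distance in units of a zero-sized type is not meaningful.
        return None;
    }
    // Plain address subtraction; the division rounds toward zero.
    let byte_diff = (to as usize).wrapping_sub(from as usize) as isize;
    Some(byte_diff / size as isize)
}

fn main() {
    let values = [1u32, 2, 3];
    let start = values.as_ptr();
    let third = start.wrapping_add(2);
    // The caller knows u32 is not zero-sized, yet still has to unwrap.
    println!("{}", offset_to_sketch(start, third).unwrap());
}
```

The `unwrap()` in `main` is the clutter in question: the type is known not to be zero-sized, but the signature still forces the caller to handle `None`.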

@SimonSapin
Contributor

Renaming to offset_from and swapping the subtraction order seems easy enough.

I don’t really understand the unsafe / sdiv exact aspect, though. I’ll leave that decision to someone else.

@Amanieu since you added this, what do you think?

@Amanieu
Member Author

Amanieu commented Mar 17, 2018

I have no objection to renaming it to offset_from.

The unsafe / sdiv exact issue basically boils down to one question: what happens if the distance between the two given pointers is not a multiple of size_of::<T>()? There are only two ways we can realistically handle this:

  • Make this situation UB (in which case offset_from becomes an unsafe function).
  • Round the result of the division towards zero.

The current implementation rounds towards zero, however I am considering going with the UB route instead. Note that this situation can only happen if one of the pointers is misaligned, so we could just require that both pointers be properly aligned for their type.
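
As an illustration of the two options, here is a small sketch on plain integers. Both helper names are hypothetical; `exact_distance` only models the "must divide exactly" requirement by returning `None`, whereas the real unsafe function would make that case UB rather than check for it.

```rust
/// Hypothetical helper: element distance with the remainder discarded.
/// Rust's integer division truncates toward zero.
fn rounded_distance(byte_diff: isize, elem_size: isize) -> isize {
    assert!(elem_size > 0);
    byte_diff / elem_size
}

/// Hypothetical helper: element distance only when the byte distance is
/// an exact multiple of the element size.
fn exact_distance(byte_diff: isize, elem_size: isize) -> Option<isize> {
    if elem_size > 0 && byte_diff % elem_size == 0 {
        Some(byte_diff / elem_size)
    } else {
        None
    }
}

fn main() {
    // A 7-byte distance between two `*const i32` is not a multiple of 4.
    println!("{}", rounded_distance(7, 4));
    println!("{}", rounded_distance(-7, 4));
    println!("{:?}", exact_distance(7, 4));
    println!("{:?}", exact_distance(8, 4));
}
```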

@SimonSapin
Contributor

> this situation can only happen if one of the pointers is misaligned

Can’t it also happen with aligned pointers, if size_of::<T>() > align_of::<T>()?
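
A concrete instance, as a sketch: `[u16; 3]` has size 6 but alignment 2, so two well-aligned pointers of that type can be 2 bytes apart. The alias `Triple` and the function `byte_gap_demo` are names made up for this example.

```rust
use std::mem::{align_of, size_of};

/// Illustrative type with size 6 and alignment 2.
type Triple = [u16; 3];

/// Returns (size, alignment, byte gap) for two aligned `*const Triple`
/// pointers that sit one `u16` apart inside the same buffer.
fn byte_gap_demo() -> (usize, usize, usize) {
    let buf = [0u16; 6];
    let p = buf.as_ptr() as *const Triple;
    let q = buf.as_ptr().wrapping_add(1) as *const Triple;
    // Both addresses are multiples of align_of::<Triple>().
    assert_eq!(p as usize % align_of::<Triple>(), 0);
    assert_eq!(q as usize % align_of::<Triple>(), 0);
    let gap = (q as usize) - (p as usize);
    (size_of::<Triple>(), align_of::<Triple>(), gap)
}

fn main() {
    let (size, align, gap) = byte_gap_demo();
    // The gap is not a multiple of the size, even though nothing is misaligned.
    println!("size = {size}, align = {align}, gap = {gap}, remainder = {}", gap % size);
}
```

The pointers are never dereferenced here; only their addresses are compared.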

@Amanieu
Member Author

Amanieu commented Mar 19, 2018

That's a good point, I'm not sure why I thought of alignment there. You are correct.

@sunfishcode
Member

@SimonSapin That's a good point. C manages to avoid that issue, because in C it's undefined behavior if the pointers aren't both pointing to elements of the same array (6.5.6p9). I don't know whether Rust's offset_from wants a similar rule, though you can get into dangerous territory with pointer aliasing if you access one object using a pointer to another plus the distance between the two.

@strega-nil
Contributor

@sunfishcode it seems... incredibly suspect to allow offset_from between two pointers from different objects.

@scottmcm
Member

I started on a PR for this at #49297; feedback and suggestions appreciated.

@scottmcm
Member

Amanieu and sunfishcode commented that doing fn(*const _, *const _) -> isize needs a subtraction that doesn't fit with any of the "does not overflow" flags available for LLVM sub (like the GEP rules are specific to it, and not available on a normal add instruction).

As such, should this API change form to something where it can be -> usize, and thus use nuw by requiring an order between the arguments? Such a thing probably shouldn't use the word offset; my strawman proposal is something involving distance (since I'm primed by C++).

@Amanieu
Member Author

Amanieu commented Mar 24, 2018

Keep in mind that std::distance and pointer subtraction in C++ both return a signed value. This is one of the reasons why I proposed making offset_to return a signed value.

LLVM IR generated by Clang is just a sub (without nsw/nuw) and a sdiv exact, so we currently don't lose anything compared to C.

I considered the name distance, however I'm not a big fan of it because the name somewhat implies that the order of parameters doesn't matter and that an absolute value is returned.

bors added a commit that referenced this issue Mar 26, 2018
Introduce unsafe offset_from on pointers

Adds intrinsics::exact_div to take advantage of the unsafe, which reduces the implementation from
```asm
    sub rcx, rdx
    mov rax, rcx
    sar rax, 63
    shr rax, 62
    lea rax, [rax + rcx]
    sar rax, 2
    ret
```
down to
```asm
    sub rcx, rdx
    sar rcx, 2
    mov rax, rcx
    ret
```
(for `*const i32`)

See discussion on the `offset_to` tracking issue #41079

Some open questions
- Would you rather I split the intrinsic PR from the library PR?
- Do we even want the safe version of the API?  #41079 (comment)  I've added some text to its documentation that even if it's not UB, it's useless to use it between pointers into different objects.

and todos
- [x] ~~I need to make a codegen test~~ Done
- [x] ~~Can the subtraction use nsw/nuw?~~ No, it can't #49297 (comment)
- [x] ~~Should there be `usize` variants of this, like there are now `add` and `sub` that you almost always want over `offset`?  For example, I imagine `sub_ptr` that returns `usize` and where it's UB if the distance is negative.~~ Can wait for later; C gives a signed result #41079 (comment), so we might as well, and this existing to go with `offset` makes sense.
@SimonSapin
Contributor

In Nightly this is the tracking issue for three different inherent methods of both *const T and *mut T.

pub fn offset_to(self, other: *const T) -> Option<isize> where T: Sized {}
pub unsafe fn offset_from(self, origin: *const T) -> isize where T: Sized {}
pub fn wrapping_offset_from(self, origin: *const T) -> isize where T: Sized {}

(Note: the *mut T methods also take a *const T parameter. I don’t know if this is intentional or if the parameter’s type should be changed to Self.)

The libs team discussed this and it wasn’t clear from this tracking issue or the implementation PR what is the motivation for this feature.

@Amanieu Could you comment on what situations this would be used in, why it should be in the standard library, and why three different methods are needed?

@Amanieu
Member Author

Amanieu commented Mar 28, 2018

My understanding is that offset_from is the "new" form which is intended to replace the old offset_to.

These methods are useful when converting between slices and raw pointers, which can happen when building data structures or with FFI. offset_to is currently used in two places in the standard library:

@SimonSapin
Contributor

If this is a replacement, should offset_to be removed? It’s also unstable.

@scottmcm
Member

I'll make a PR for that and to move vec&slice to offset_from. (Unless it should be deprecated for a bit first? I don't remember what the rules are for nightly things...)

@scottmcm
Member

scottmcm commented Apr 1, 2018

PR up at #41079

That did reinforce my feeling that isize, while good for consistency with offset, might not be the best signature, as the only places that use it in core & alloc know which pointer is higher and want usize.

bors added a commit that referenced this issue Apr 12, 2018
Deprecate offset_to; switch core&alloc to using offset_from instead

Bonus: might make code that uses `.len()` on slice iterators faster

cc #41079
@ghost

ghost commented Aug 20, 2018

I tried using the offset_from() method on a pointer today. I like the name and API; it seemed rather intuitive. The only issue I have is that the implementations of offset_from() and wrapping_offset_from() use assert!() rather than debug_assert!(). The result is that offset_from() is a bit slower than it could be when debug = false.

Is there a reason to prefer assert!()?

@scottmcm
Member

@dtrebbien The assert is on the size of the type, which is a compile-time constant, so I would expect it to always get optimized out. Are you seeing otherwise?

@RalfJung
Member

I'd like to see this stabilized. Looks like 18 months ago most concerns were already resolved, except maybe for the isize/usize issue that @scottmcm mentioned. Is that issue still present?

I am surprised there would be a problem here, since in LLVM an allocation cannot be larger than isize::MAX, so even if you know the relative order of the pointers, isize can still always represent their difference, and as usize will be lossless.
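
A sketch of that argument in code: when the caller knows which pointer is higher, the signed result is non-negative and the cast to usize loses nothing. `len_between` is a hypothetical helper for illustration; it relies on `as_ptr_range` to obtain both ends of one slice.

```rust
use std::mem::size_of;

/// Hypothetical helper: recovers a slice's length from its start and
/// one-past-the-end pointers.
fn len_between<T>(slice: &[T]) -> usize {
    // offset_from is not meaningful for zero-sized types.
    assert!(size_of::<T>() != 0);
    let range = slice.as_ptr_range();
    // SAFETY: `start` and `end` come from the same slice, so they belong
    // to the same allocation and are a whole number of elements apart.
    let diff: isize = unsafe { range.end.offset_from(range.start) };
    // `end` is never below `start`, so the value is non-negative and
    // the cast below cannot lose information.
    debug_assert!(diff >= 0);
    diff as usize
}

fn main() {
    let data = [1u64, 2, 3, 4];
    println!("{}", len_between(&data));
}
```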

Manishearth added a commit to Manishearth/rust that referenced this issue Jun 21, 2020
…from, r=Amanieu

deprecate wrapping_offset_from

As per rust-lang#41079 (comment) which seems like a consensus.

r? @Amanieu
@dhardy
Contributor

dhardy commented Jul 11, 2020

I would also like to see this resolved.

Since the only valid application appears to be comparing two positions within a slice, a safe API is possible, if that helps.

impl<T> [T] {
    fn ptr_offset(&self, p: *const T) -> Option<usize>;
    /// Returns `None` if either pointer does not point to a valid position within this slice
    fn ptr_sub(&self, p: *const T, q: *const T) -> Option<isize>;
}

Of course, the latter is certainly not optimal. Personally I care more about making this possible (in stable Rust) than optimal.

@RalfJung
Member

RalfJung commented Jul 11, 2020

A safe API is very tricky at best: offset_from has the same problem as methods that "mend" two slices together when their one-past-the-end and first pointer are equal. Just because slice_base + len <= p does not mean it is safe to call p.offset_from(slice_base).

However, (p as usize as *const _).offset_from(slice_base) should work -- except that LLVM has some long-standing bugs in this area.

@RalfJung changed the title from "Tracking issue for ptr::offset_from" to "Tracking issue for ptr::offset_from (feature: ptr_offset_from)" Jul 11, 2020
@RalfJung
Member

Stabilization PR is up: #74238
I don't know what it takes to summon rfcbot, I hope someone can take over. :)

@comex
Contributor

comex commented Jul 12, 2020

> A safe API is very tricky at best: offset_from has the same problem as methods that "mend" two slices together when their one-past-the-end and first pointer are equal. Just because slice_base + len <= p does not mean it is safe to call p.offset_from(slice_base).

But it's always safe to call wrapping_offset_from – or, since that's deprecated, cast both pointers to usize and do the subtraction that way. It might spuriously report that a pointer is one-past-the-end of a slice when it's actually at the start of a different allocation, but that's not itself unsafe or UB, just surprising...
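
For illustration, a sketch of the usize-subtraction approach applied to the `ptr_offset` idea from earlier in the thread. This is a hypothetical free function in safe code only, not a proposed or existing std API, and it has exactly the caveat described above: a pointer to the start of an unrelated, adjacent allocation may be reported as one-past-the-end.

```rust
use std::mem::size_of;

/// Hypothetical safe helper: index of `p` within `slice`, found by
/// comparing addresses as integers. An index equal to `slice.len()`
/// means "one past the end", which can also spuriously match the start
/// of a different allocation that happens to be adjacent.
fn ptr_offset<T>(slice: &[T], p: *const T) -> Option<usize> {
    let size = size_of::<T>();
    if size == 0 {
        return None;
    }
    let base = slice.as_ptr() as usize;
    // `None` if `p` lies below the start of the slice.
    let byte_diff = (p as usize).checked_sub(base)?;
    if byte_diff % size != 0 {
        // Not on an element boundary.
        return None;
    }
    let index = byte_diff / size;
    if index <= slice.len() {
        Some(index)
    } else {
        None
    }
}

fn main() {
    let data = [1u32, 2, 3];
    println!("{:?}", ptr_offset(&data, &data[2] as *const u32));
    println!("{:?}", ptr_offset(&data[1..], &data[0] as *const u32));
}
```

Nothing here dereferences `p`, which is why no `unsafe` is needed; the cost is that the answer is only as trustworthy as the address comparison.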

@RalfJung
Member

> But it's always safe to call wrapping_offset_from – or, since that's deprecated, cast both pointers to usize and do the subtraction that way. It might spuriously report that a pointer is one-past-the-end of a slice when it's actually at the start of a different allocation, but that's not itself unsafe or UB, just surprising...

True.

@KodrAus added the Libs-Tracked and A-raw-pointers labels Jul 29, 2020
@bors closed this as completed in 9d606d9 Aug 23, 2020
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Jan 17, 2022
…tolnay

fix const_ptr_offset_from tracking issue

The old tracking issue rust-lang#41079 was for exposing those functions at all, and got closed when they were stabilized. We had nothing tracking their `const`ness so I opened a new tracking issue: rust-lang#92980.