
memrchr implementations may conflict with stacked borrows #58

Closed
HeroicKatora opened this issue Aug 30, 2019 · 8 comments

Comments

@HeroicKatora
Contributor

The reverse search implementations (memrchr) appear to be illegal under stacked borrows. They all follow the same pattern, so I'll only annotate one here. It retrieves a raw pointer to the end of the haystack from a reference to an empty slice, but then uses that pointer to iterate backwards by offsetting it with negative indices. Under strict rules, however, that pointer would only be valid for accessing the bytes covered by the reference it was cast from, i.e. a zero-length slice at the end.

To my understanding, this is very likely illegal but not yet caught by Miri, since it does not strictly track the source of raw pointers (^source). @RalfJung might be able to provide more definitive insights.

Relevant code (inserted comments marked as // !):

pub fn memrchr(n1: u8, haystack: &[u8]) -> Option<usize> {
    // [...]
    let start_ptr = haystack.as_ptr();
    // ! This pointer only covers the same slice that the reference does.
    // ! Would need to create these manually from offsetting the start pointer
    // ! which covers the whole array.
    let end_ptr = haystack[haystack.len()..].as_ptr();
    let mut ptr = end_ptr;

    unsafe {
        // [...]
        ptr = (end_ptr as usize & !align) as *const u8;
        // [...]
        while loop_size == LOOP_SIZE && ptr >= ptr_add(start_ptr, loop_size) {
            // [...]
            // ! These are outside the bounds of the reference from which ptr was created.
            let a = *(ptr_sub(ptr, 2 * USIZE_BYTES) as *const usize);
            let b = *(ptr_sub(ptr, 1 * USIZE_BYTES) as *const usize);
            // [...]
            ptr = ptr_sub(ptr, loop_size);
        }
        // [...]
    }
}

Library code reduced to that version of memrchr.
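For illustration, here is a minimal standalone sketch of the questionable pattern (my own reduction, not taken from the crate): the end pointer is derived from the empty tail slice, then walked back into the slice's interior and dereferenced. It runs fine today, but a Miri with strict provenance tracking would flag the read.

```rust
// Sketch of the pattern under discussion: `end_ptr` gets its provenance from
// an empty tail slice, yet is used to read bytes *before* that tail.
fn last_via_tail_ptr(haystack: &[u8]) -> u8 {
    assert!(!haystack.is_empty());
    // Provenance: only the zero-length slice at the end.
    let end_ptr = haystack[haystack.len()..].as_ptr();
    // Walking back and dereferencing is the questionable step under
    // Stacked Borrows, even though the address is in bounds.
    unsafe { *end_ptr.sub(1) }
}

fn main() {
    assert_eq!(last_via_tail_ptr(&[1, 2, 3, 4]), 4);
}
```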

The fix is simple: create ptr by manually offsetting haystack.as_ptr(), which is valid for the whole haystack. I also don't expect any miscompilation.
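A minimal, unvectorized sketch of that fix (illustrative only; the crate's real implementation is the word-at-a-time loop quoted above): the end pointer is derived by offsetting the start pointer, so it inherits provenance for the whole slice.

```rust
// Byte-at-a-time memrchr sketch using the fixed pointer derivation.
fn memrchr_sketch(n1: u8, haystack: &[u8]) -> Option<usize> {
    let start_ptr = haystack.as_ptr();
    // Derived from start_ptr instead of `haystack[haystack.len()..].as_ptr()`,
    // so the pointer is valid for the entire haystack.
    let end_ptr = unsafe { start_ptr.add(haystack.len()) };
    let mut ptr = end_ptr;
    while ptr > start_ptr {
        unsafe {
            ptr = ptr.sub(1);
            if *ptr == n1 {
                return Some(ptr as usize - start_ptr as usize);
            }
        }
    }
    None
}

fn main() {
    assert_eq!(memrchr_sketch(b'a', b"aba"), Some(2));
    assert_eq!(memrchr_sketch(b'z', b"aba"), None);
}
```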

@BurntSushi
Owner

Thanks for the issue! Unfortunately, I don't fully grok the underlying explanation for this. Perhaps you could ELINRJ (that is, Explain Like I'm Not Ralf Jung :-)).

While these points/questions don't represent the full extent of my non-understanding, they are perhaps a start:

  • The phrase "may conflict with stacked borrows" has absolutely no meaning to me. I recall reading Ralf's post on stacked borrows a while back, but I promise you that it didn't fully sink in.
  • Does this result in undefined behavior?
  • From what I can tell, creating the pointer manually from the starting raw pointer and creating it from the empty slice at the end seem like equivalent operations to me. What is it in the language semantics that causes these two things to actually be different?
  • The ending pointer is calculated this way in all of the memchr implementations I think, not just the reverse ones. For example: https://github.com/BurntSushi/rust-memchr/blob/1ec5ecce03c220c762dd9a8b08f7a3d95522b765/src/x86/sse2.rs#L112 --- Or is there something about the reverse implementation that is special that I am not understanding?
  • How is this reconciled with things like "Both the starting and resulting pointer must be either in bounds or one byte past the end of the same allocated object." from the docs of pointer::offset?

@HeroicKatora
Contributor Author

Sure, I will try to answer the immediate questions first.

The phrase "may conflict with stacked borrows" has absolutely no meaning to me. I recall reading Ralf's post on stacked borrows a while back, but I promise you that it didn't fully sink in.

The phrase says: under the memory model of stacked borrows, the implementation here is illegal. Since stacked borrows is not the official memory model of Rust (yet?), this doesn't make the code have undefined behaviour, but the compiler may be changed so that it does. Since adapting the code to be defined even under stacked borrows should be possible, this is a future-proofing issue.

From what I can tell, creating the pointer manually from the starting raw pointer and creating it from the empty slice at the end seem like equivalent operations to me? What's going on in the language semantics that cause these two things to actually be different?

A part of stacked borrows is 'pointer provenance', the concept that the source of a pointer may influence the allowed operations. In particular, two pointers that compare equal may not be allowed to be used interchangeably (for dereferencing). (Some more reading: another blog post by Ralf Jung.) And so, while the pointer of the empty slice at the end equals the pointer offset from the starting pointer, dereferencing the two pointers, or other pointers created from them, need not be equally allowed.

The end_ptr was created by casting a reference to the empty slice at the end. Under the strict pointer provenance model it must only be used to access that empty slice. In contrast, start_ptr is created by casting the reference to the whole slice (haystack.as_ptr() does this internally) and is thus valid for dereferencing it at any index within the slice. Meanwhile, pointer offsetting and comparison is itself allowed on both pointers equally.
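To make that concrete, here is a small sketch (my own example, not from the crate): both derivations yield pointers that compare equal, but under stacked borrows only the offset-based one carries provenance for the whole slice.

```rust
// Two derivations of the one-past-the-end pointer: equal addresses,
// different provenance under Stacked Borrows.
fn end_ptrs(slice: &[u8]) -> (*const u8, *const u8) {
    // Provenance: only the empty tail slice.
    let from_empty = slice[slice.len()..].as_ptr();
    // Provenance: the whole slice.
    let from_offset = unsafe { slice.as_ptr().add(slice.len()) };
    (from_empty, from_offset)
}

fn main() {
    let data = [1u8, 2, 3, 4];
    let (a, b) = end_ptrs(&data);
    assert_eq!(a, b); // same address...
    // ...but only `b` may be walked back into the slice and dereferenced:
    assert_eq!(unsafe { *b.sub(1) }, 4);
}
```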

Or is there something about the reverse implementation that is special that I am not understanding?

Yes, see the above. memchr only inspects end_ptr, compares it against ptr, and subtracts the two, but it never dereferences end_ptr. That is the crucial part.

How is this reconciled with things like "Both the starting and resulting pointer must be either in bounds or one byte past the end of the same allocated object." from the docs of pointer::offset?

The offsetting is allowed, but dereferencing/reading from the pointee may still be illegal. Note that the two unsafe operations require separate reasoning.
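A sketch of that separation (my own example): `add` requires the result to stay within, or one past the end of, the allocation; dereferencing additionally requires the pointer's provenance to cover the bytes being read.

```rust
// Offsetting to one-past-the-end is allowed and the result may be compared,
// but dereferencing that pointer itself would be UB.
fn one_past_end(slice: &[u8]) -> *const u8 {
    unsafe { slice.as_ptr().add(slice.len()) }
}

fn main() {
    let data = [10u8, 20, 30];
    let p = one_past_end(&data);
    assert!(p > data.as_ptr()); // comparison: fine
    // Reading *p would be UB; stepping back inside the slice first is fine,
    // because `p` was derived from a pointer valid for the whole slice:
    assert_eq!(unsafe { *p.sub(1) }, 30);
}
```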

@BurntSushi
Owner

Interesting. Thanks for elaborating. I think I grok this at a surface level, but probably still lack a deeper understanding. I'll mark this as a bug for now. I'd almost slightly prefer to hold off on fixing it until Miri is able to track it, so that we get some confidence that this is the right play.

@RalfJung

RalfJung commented Aug 31, 2019

From what I can tell, creating the pointer manually from the starting raw pointer and creating it from the empty slice at the end seem like equivalent operations to me? What's going on in the language semantics that cause these two things to actually be different?

To clarify, this is where the mismatch comes in. @HeroicKatora already mentioned pointer provenance, so let me just point you to rust-lang/unsafe-code-guidelines#134, where we track whether Stacked Borrows is too strict here (but also, we might lose basically all aliasing-based optimizations if we relax this). Also see this discussion where I explained the issue in some more detail:

Raw pointer operations do not affect the permissions a pointer has, so the moment you cast a reference to a raw pointer, you are deciding what is allowed to be done with all raw pointers ever created from this one.

Basically, &slice[0] as *const T and slice.as_ptr() are not equivalent (even ignoring the empty slice case): the former is a ref-to-raw cast of the first element, so the resulting raw pointer may only access that element; the latter does a ref-to-raw cast of the entire slice, so the resulting raw pointer may be used on the entire slice. (To be clear, as_ptr does no magic; its implementation just does the right thing: first cast to a wide raw pointer for the entire slice, and only then cast to a thin raw pointer to the first element.)

@BurntSushi
Owner

BurntSushi commented Aug 31, 2019

(To be clear, as_ptr does no magic; its implementation just does the right thing: first cast to a wide raw pointer for the entire slice, and only then cast to a thin raw pointer to the first element.)

I think this is the most interesting part to me! My prior mental model is that this

    pub const fn as_ptr(&self) -> *const T {
        self as *const [T] as *const T
    }

could equivalently be implemented as

    pub const fn as_ptr(&self) -> *const T {
        core::mem::transmute(self)
    }

But it sounds like that may not be the case, since the actual series of steps one goes through via the raw pointer casts seems to matter.
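(As eddyb points out below, the transmute version cannot work at all for slices: `&[T]` is a wide (pointer, length) reference, twice the size of a thin `*const T`. A quick check:)

```rust
use core::mem::size_of;

fn main() {
    // A slice reference carries a pointer and a length, so it is twice the
    // size of a thin raw pointer; transmuting between them won't compile.
    assert_eq!(size_of::<&[u8]>(), 2 * size_of::<*const u8>());
    // An array reference, by contrast, is a thin pointer.
    assert_eq!(size_of::<&[u8; 4]>(), size_of::<*const u8>());
}
```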

I must say that I would enjoy having these restrictions placed on raw pointers, but only if tooling could reliably enforce it. (Presumably through miri.) If tooling couldn't reliably enforce it, then I think my joy would turn to misery. :-)

@RalfJung

I must say that I would enjoy having these restrictions placed on raw pointers, but only if tooling could reliably enforce it. (Presumably through miri.)

Doing this in Miri is not a big issue. (Raw pointers right now are more relaxed mostly because there is some libstd code that is broken, and cannot be fixed without rust-lang/rfcs#2582.)

But Miri cannot run on FFI code, or code interacting with the hardware (embedded/kernel stuff), or code that just does lots of stuff (Miri is very slow). I think we could have a valgrind tool that helps detect such issues, but by their very nature valgrind tools are unable to reliably find UB.

If tooling couldn't reliably enforce it, then I think my joy would turn to misery. :-)

Fair. ;)

Reference-to-raw transmutes are an interesting open question. This is related to the fact that nobody knows what the exact LLVM semantics are for pointer-to-int transmutes; making them the same as a cast would kill some quite important optimizations. Also see the discussion of type punning in §8 of this paper.

If you want your model of pointer casts shattered, have a look at this LLVM bug. inttoptr(ptrtoint x) and x are not the same thing.

@eddyb

eddyb commented Sep 9, 2019

@BurntSushi I don't actually understand your two examples (the transmute doesn't seem to be valid, because &[T] is not the same size as *const T).
The first one is the current implementation of <[T]>::as_ptr, right?

While looking at this I noticed again (playground link) that we don't allow &[T] to be cast directly to *const T while we do allow it for &[T; N].

(When someone brought up this issue elsewhere I thought the &slice[0] as *const T comment referred to code that was written instead of .as_ptr(), which is why I looked at casts)

@BurntSushi
Owner

This will be fixed in #82. I know this technically isn't required yet, but I don't see any good reason not to do it.

BurntSushi added a commit that referenced this issue Apr 30, 2021
This commit primarily adds vectorized substring search routines in
a new memmem sub-module. They were originally taken from bstr, but
heavily modified to incorporate a variant of the "generic SIMD"
algorithm[1]. The main highlights:

* We guarantee `O(m + n)` time complexity and constant space
  complexity.
* Two-Way is the primary implementation that can handle all cases.
* Vectorized variants handle a number of common cases.
* Vectorized code uses a heuristic informed by a frequency background
  distribution of bytes, originally devised inside the regex crate.
  This makes it more likely that searching will spend more time in the
  fast vector loops.

While adding memmem to this crate is perhaps a bit of a scope increase,
I think it fits well. It also puts a core primitive, substring
search, very low in the dependency DAG and therefore making it widely
available. For example, it is intended to use these new routines in the
regex, aho-corasick and bstr crates.

This commit does a number of other things, mainly as a result of
convenience. It drastically improves test coverage for substring search
(as compared to what bstr had), completely overhauls the benchmark suite
to make it more comprehensive and adds `cargo fuzz` support for all API
items in the crate.

Closes #58, Closes #72

[1] - http://0x80.pl/articles/simd-strfind.html#algorithm-1-generic-simd