Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

merge_batcher: use Rust iterators and VecDeque in place of VecQueue #380

Merged
merged 1 commit into from
Feb 27, 2023

Conversation

petrosagg
Copy link
Contributor

The standard library offers VecDeque which also has cheap (i.e non allocating) conversions to and from Vecs so we can use that directly and reduce the number of unsafe calls in the project.

Note that while all uses of the API of VecQueue were correct its methods were marked as safe but you could accidentally cause undefined behavior:

let queue = VecQueue::new();
queue.pop(); // UB, length isn't checked

The standard library offers `VecDeque` which also has cheap (i.e non
allocating) conversions to and from `Vec`s so we can use that directly
and reduce the number of unsafe calls in the project.

Note that while all uses of the API of `VecQueue` were correct its
methods were marked as safe but you could accidentally cause undefined
behavior:

```rust
let queue = VecQueue::new();
queue.pop(); // UB, length isn't checked
```

Signed-off-by: Petros Angelatos <[email protected]>
Copy link
Member

@frankmcsherry frankmcsherry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. I'm inclined to follow it with some work to remove push_unchecked. It seems entirely plausible that push_unchecked isn't needed at the moment. I suspect it existed to support alternate inner loops that don't recheck result.capacity() v result.len() each iteration.

@frankmcsherry frankmcsherry merged commit 6ae61ad into TimelyDataflow:master Feb 27, 2023
dtolnay pushed a commit to dtolnay-contrib/differential-dataflow that referenced this pull request Dec 15, 2023
…imelyDataflow#380)

The standard library offers `VecDeque` which also has cheap (i.e non
allocating) conversions to and from `Vec`s so we can use that directly
and reduce the number of unsafe calls in the project.

Note that while all uses of the API of `VecQueue` were correct its
methods were marked as safe but you could accidentally cause undefined
behavior:

```rust
let queue = VecQueue::new();
queue.pop(); // UB, length isn't checked
```

Signed-off-by: Petros Angelatos <[email protected]>
#[inline]
pub fn peek(&self) -> &T {
debug_assert!(self.head < self.tail);
unsafe { self.list.get_unchecked(self.head) }
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This original code is UB since it is using get_unchecked to index out of bounds of the vector's contents. The vector this is indexing into has length 0 due to the set_len below.

Since rust-lang/rust#116915 (Rust 1.76+, nightly-2023-12-05+), this UB is going to cause pretty much anything calling this code in differential-dataflow 0.12.0 to miscompile. I found this by way of a miscompile of the cargo-tally crate.

Copy link
Member

@frankmcsherry frankmcsherry Dec 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the subtext here that it would help to ship 0.13.0? I think 0.12.0 is multiple years old at this point, and perhaps it's worth a new, utterly incompatible version that has less unsafe in it? (I/we mostly use the repo head rather than anything crates.io; you might be one of the few who do).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not blocked on an official 0.13.0 being cut since I grabbed the code from master yesterday and published it to https://crates.io/crates/differential-dataflow-master myself. Cargo-tally can use it like this: dtolnay/cargo-tally@24a1946. For my purposes, this is completely fine — it lets you folks keep developing against head and lets me use snapshots of head in my crate whenever I like.

If you do 0.13.0 it would mostly be for the benefit of anyone else using differential-dataflow from crates.io. Right now 0.12.0 is effectively unusable, I think, and that was worth calling your attention to (also because the PR description says "all uses of the API were correct" and did not identify that this code is really problematic). But it does not need to translate into urgency to publish 0.13.0.

I had originally looked into whether backporting this single fix onto 0.12.0 as a 0.12.1 release would fix the miscompiles. I first confirmed that the commit in which this PR merged (6ae61ad) does not have the miscompile whereas its parent commit (cc88469) still does, so this is definitely the relevant fix. I backported it onto 7bc5338 (the v0.12.0 tag) as dtolnay-contrib@46edd1f. However that still miscompiled. There must be some other invalid unsafe code elsewhere that got removed earlier than this PR — you mentioned overall there is less unsafe now compared to 3 years ago. I figured finding all the rest of what would need to be backported was not worthwhile so this is when I switched to the approach of publishing the code from master.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We merged #413 recently, which also removes some unsafe code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also because the PR description says "all uses of the API were correct" and did not identify that this code is really problematic

Erm, it seems less problematic to be totally honest; certainly at the time. Afaict it matches the PPYP pattern, and the main thing that has changed is what Rust views as UB (to be fair, afaiu little enough is defined in the first place). The Rust 1.76 change certainly seems to mess up the PPYP pattern; is there something else that Rust folks recommend instead now?

More generally, any recommendations for where to follow and perhaps litigate that certain "optimizations" around unsafe are potentially harmful? It's a recurring anxiety for anyone who wants solid performance, that Rust might just eventually deem any use of unsafe UB and misoptimize it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There must be some other invalid unsafe code elsewhere that got removed earlier than this PR

Yeah, pretty much all use of unsafe is invalid, so this checks out. :D I think there are two uses of unsafe at the moment to work around LLVM not optimizing split_at_mut very well (in consolidation.rs). Everything else has been removed.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I'm here because I was sifting through backlinks)

the main thing that has changed is what Rust views as UB

Out-of-bounds indexing via get_unchecked has been documented to be UB since Rust 1.38.0, which was published in late November 2019. Not that I really expect anyone to be aware that documentation has been updated, but there was no recent change here.

The UB here has also been reliably detected by Miri for a long time, and if it is not possible to run Miri, the standard library has robust debug assertions that you can exercise today with cargo-careful or with -Zbuild-std. Some of those checks should soon be on by default without such tools.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the heads up! Unfortunately the referenced code predates 1.38, back when the method had no documented UB. =/

Fwiw, we've run Miri a fair amount, in particular to find a miscompilation induced by another crate, and it did not detect anything. I don't know if that is surprising.

Copy link

@saethlin saethlin Feb 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it did not detect anything. I don't know if that is surprising.

Unfortunately, it's not. In the past, out-of-bounds slice::get_unchecked was detected almost by accident by Stacked Borrows, so if you pass -Zmiri-disable-stacked-borrows or -Zmiri-tree-borrows, as I suspect some number of codebases need to in order to execute much of anything, it's wasn't checked at all. Miri understands the optimization hint that landed in 1.76 though. So this isn't a problem anymore.

The other problem is just the nature of dynamic checkers. If well-formed/non-malicious inputs never cause indexing out of bounds... well you'll never get a report from Miri. I've been meaning to write a helper to run Miri on a minimized fuzzing corpus to assist with this scenario. Still not a soundness proof though.

This was referenced Oct 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants