Add vector clock to epoll #3928

FrankReh · 2024-09-30T21:02:05Z

Fixes #3911. Deadlock reported on a tokio test using epoll.

tiif

Thanks for investigating the deadlock and submit the fix! :)

Since this is the part of the codebase I am interested in, I will do the initial review.

src/shims/unix/linux/epoll.rs

tiif · 2024-10-01T01:24:57Z

src/shims/unix/linux/epoll.rs

+
+                        let epoll = epfd.downcast::<Epoll>().unwrap();
+
+                        // Synchronize running thread to the epoll ready list.
+                        if let Some(clock) = &this.release_clock() {
+                            epoll.ready_list.clock.borrow_mut().join(clock);
+                        }
+


If we release the clock here, there is no guarantee that any thread will be unblocked through check_and_update_readiness later, @RalfJung would it be better if we only release clock when waiter is not empty or it won't affect correctness at all?

The waiter I referred to is this one:

for thread_id in waiter { this.unblock_thread(thread_id, BlockReason::Epoll)?; }

Not sure what you mean. The release clock isn't lost. It can be used as often as necessary?

src/shims/unix/linux/epoll.rs

tiif · 2024-10-01T01:39:58Z

Also, consider adding this test here #3911 (comment) to libc-epoll-blocking.rs so it is easier to see the effect of the change in the CI.

FrankReh · 2024-10-01T02:22:28Z

Also, consider adding this test here #3911 (comment) to libc-epoll-blocking.rs so it is easier to see the effect of the change in the CI.

I would much prefer if one of you more familiar with that test added it. Either to this branch or before or after in a different PR.

FrankReh · 2024-10-01T02:32:47Z

Thanks for investigating the deadlock and submit the fix! :)

It was my pleasure. It was nice learning more about Miri and figuring out how to tease the extensive trace lines into something I could compare from run to run. Also building a version where I could control the seeds for each of the twelve places the rng is used, so 12 rng's, was fun to figure out. It was only after knowing which rng use was making a difference that the solution was clear to @RalfJung . Still you and he rattled off what to do next in a few minutes and it still took me some time to piece the advice together. Good experience. (Wish Miri could compile faster though but it is amazing how it is part of rustc and what it can do with no special programming on the application side.) And concurrency and epoll support and being able to test more of Tokio is amazing. I look forward to socket support.

tiif · 2024-10-01T02:41:47Z

I would much prefer if one of you more familiar with that test added it. Either to this branch or before or after in a different PR.

Sure, I will leave this to someone who has commit right or do it later in another PR.

I think the current status of the PR should be

@rustbot ready

RalfJung · 2024-10-01T07:23:24Z

The PR should contain the test, indeed, to make sure the problem is indeed fixed.

It shouldn't be too hard to copy the test into the test file?

oli-obk · 2024-10-01T07:28:54Z

Ideally the test should be added in a first commit, and then your change shows how the test changes. But that's not a strict requirement

FrankReh · 2024-10-01T09:47:43Z

So I can start a new PR, redo the order of commits and rebase, add it now, or are merges squashed?

oli-obk · 2024-10-01T10:12:19Z

You can add the commit to this PR and reorder the commits with git rebase -i

Merges are not squashed.

FrankReh · 2024-10-01T10:32:40Z

You can add the commit to this PR and reorder the commits with git rebase -i

Merges are not squashed.

I appreciate the clarifications. Probably all in the contributions section which I didn’t read past how to build. One final question because actually no one likes using git and its log history more than me.

Should the two ancillary changes I made to the same source file all be rolled into the one commit as I had initially done or are we better served with a separate commit for each one? Or even distinct PRs?

oli-obk · 2024-10-01T10:45:28Z

I personally try to put such things in different commits, but they are minor drive by things, so not too important. If it's easy to do for you and you like messing with thr git history, feel free to do it, but don't put a lot of time into it

FrankReh · 2024-10-02T12:27:25Z

This style check failure is the simple test function. The 'cargo +nightly fmt' I ran from the top didn't fix the formatting of the test function. Will rebase and commit again.

This test demonstrates the need to synchronize the clock of the thread waking up from an epoll_wait from the thread that issued the epoll awake event.

A couple of instructions were left over from an earlier rebase it would seem. They don't impact the logic but the ready_list type is about to change in the next commit. Rather than modify one of these lines in the commit that changes ready_list, only to have these lines removed later on, remove them now. They don't impact the tests results.

This adds a VClock to the epoll implementation's ready_list and has this VClock synced from the thread that updates an event in the ready_list and then has the VClocks of any threads being made runnable again, out of the calls to epoll_wait, synced from it.

A simplification that doesn't impact the epoll implementation's logic. It is not necessary to clone the ready_list before reading its `is_empty` state. This avoids the clone step but more importantly avoids the invisible drop step of the clone.

RalfJung · 2024-10-02T12:58:33Z

To format things, run ./miri fmt. You should generally never invoke cargo directly in Miri development.

oli-obk · 2024-10-02T18:11:10Z

Ideally the test should be added in a first commit, and then your change shows how the test changes. But that's not a strict requirement

I created an edit of your branch at https://github.com/rust-lang/miri/compare/epoll-with-vector-clock that shows how I thought it would be laid out. Each commit builds on its own, so the first one adding the test must be adding a failing test, and then the commit that fixes the test changes the test so it passes.

FrankReh · 2024-10-02T19:42:29Z

Ideally the test should be added in a first commit, and then your change shows how the test changes. But that's not a strict requirement

I created an edit of your branch at https://github.com/rust-lang/miri/compare/epoll-with-vector-clock that shows how I thought it would be laid out. Each commit builds on its own, so the first one adding the test must be adding a failing test, and then the commit that fixes the test changes the test so it passes.

Is this the review? I pushed new changes up six hours ago with the failing test first.

oli-obk · 2024-10-02T19:52:57Z

Yes, but you pushed it as a "pass" test. The commit will not pass tests that way. Compare with my first and third commits.

But considering the entirety, it seems fine to just add the test as you did to the commit that fixes it. As I said, the scheme of adding it as a failing test first is more extra credit for the pedantic than a requirement/preference

FrankReh · 2024-10-02T19:55:15Z

oh. it goes in a different directory? even when part of one PR? Okay. If that makes it better for posterity, I'll redo.

(Because I have a new miri question and I want to stay on all your good sides. :))

oli-obk · 2024-10-02T19:56:44Z

XD this is just oli-pedantism. Your PR is fine as it is. You can ask your question in parallel without any bad feelings from our side.

FrankReh · 2024-10-02T20:00:59Z

XD this is just oli-pedantism. Your PR is fine as it is. You can ask your question in parallel without any bad feelings from our side.

Would you like to commit your branch? It has even changed at least one comment so I don't know that there's any benefit to my branch any longer.

TIA

oli-obk · 2024-10-02T20:32:19Z

Done

Add vector clock to epoll ready lists Replaces #3928 Fixes #3911

tiif reviewed Oct 1, 2024

View reviewed changes

rustbot added the S-waiting-on-review Status: Waiting for a review to complete label Oct 1, 2024

FrankReh force-pushed the epoll-with-vector-clock branch from 5fc2539 to 65647df Compare October 2, 2024 12:22

FrankReh added 4 commits October 2, 2024 08:42

epoll: add data_race test

f4c8b97

This test demonstrates the need to synchronize the clock of the thread waking up from an epoll_wait from the thread that issued the epoll awake event.

FrankReh force-pushed the epoll-with-vector-clock branch from 65647df to a7ccfce Compare October 2, 2024 12:45

oli-obk mentioned this pull request Oct 2, 2024

Add vector clock to epoll ready lists #3932

Merged

oli-obk closed this Oct 2, 2024

bors added a commit that referenced this pull request Oct 2, 2024

Auto merge of #3932 - rust-lang:epoll-with-vector-clock, r=oli-obk

483eb7d

Add vector clock to epoll ready lists Replaces #3928 Fixes #3911

FrankReh deleted the epoll-with-vector-clock branch October 2, 2024 21:58

tiif mentioned this pull request Oct 5, 2024

epoll-wait induces too much synchronization #3944

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add vector clock to epoll #3928

Add vector clock to epoll #3928

FrankReh commented Sep 30, 2024

tiif left a comment

tiif Oct 1, 2024

FrankReh Oct 1, 2024

tiif commented Oct 1, 2024

FrankReh commented Oct 1, 2024

FrankReh commented Oct 1, 2024

tiif commented Oct 1, 2024 •

edited

Loading

RalfJung commented Oct 1, 2024

oli-obk commented Oct 1, 2024

FrankReh commented Oct 1, 2024

oli-obk commented Oct 1, 2024

FrankReh commented Oct 1, 2024

oli-obk commented Oct 1, 2024

FrankReh commented Oct 2, 2024 •

edited

Loading

RalfJung commented Oct 2, 2024

oli-obk commented Oct 2, 2024

FrankReh commented Oct 2, 2024

oli-obk commented Oct 2, 2024

FrankReh commented Oct 2, 2024

oli-obk commented Oct 2, 2024

FrankReh commented Oct 2, 2024

oli-obk commented Oct 2, 2024

Add vector clock to epoll #3928

Add vector clock to epoll #3928

Conversation

FrankReh commented Sep 30, 2024

tiif left a comment

Choose a reason for hiding this comment

tiif Oct 1, 2024

Choose a reason for hiding this comment

FrankReh Oct 1, 2024

Choose a reason for hiding this comment

tiif commented Oct 1, 2024

FrankReh commented Oct 1, 2024

FrankReh commented Oct 1, 2024

tiif commented Oct 1, 2024 • edited Loading

RalfJung commented Oct 1, 2024

oli-obk commented Oct 1, 2024

FrankReh commented Oct 1, 2024

oli-obk commented Oct 1, 2024

FrankReh commented Oct 1, 2024

oli-obk commented Oct 1, 2024

FrankReh commented Oct 2, 2024 • edited Loading

RalfJung commented Oct 2, 2024

oli-obk commented Oct 2, 2024

FrankReh commented Oct 2, 2024

oli-obk commented Oct 2, 2024

FrankReh commented Oct 2, 2024

oli-obk commented Oct 2, 2024

FrankReh commented Oct 2, 2024

oli-obk commented Oct 2, 2024

tiif commented Oct 1, 2024 •

edited

Loading

FrankReh commented Oct 2, 2024 •

edited

Loading