Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add vector clock to epoll #3928

Closed
wants to merge 4 commits into from

Conversation

FrankReh
Copy link
Contributor

Fixes #3911. Deadlock reported on a tokio test using epoll.

Copy link
Contributor

@tiif tiif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for investigating the deadlock and submit the fix! :)

Since this is the part of the codebase I am interested in, I will do the initial review.

src/shims/unix/linux/epoll.rs Outdated Show resolved Hide resolved
Comment on lines +571 to +574

let epoll = epfd.downcast::<Epoll>().unwrap();

// Synchronize running thread to the epoll ready list.
if let Some(clock) = &this.release_clock() {
epoll.ready_list.clock.borrow_mut().join(clock);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we release the clock here, there is no guarantee that any thread will be unblocked through check_and_update_readiness later, @RalfJung would it be better if we only release clock when waiter is not empty or it won't affect correctness at all?

The waiter I referred to is this one:

        for thread_id in waiter {
            this.unblock_thread(thread_id, BlockReason::Epoll)?;
        }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean. The release clock isn't lost. It can be used as often as necessary?

src/shims/unix/linux/epoll.rs Show resolved Hide resolved
@tiif
Copy link
Contributor

tiif commented Oct 1, 2024

Also, consider adding this test here #3911 (comment) to libc-epoll-blocking.rs so it is easier to see the effect of the change in the CI.

@FrankReh
Copy link
Contributor Author

FrankReh commented Oct 1, 2024

Also, consider adding this test here #3911 (comment) to libc-epoll-blocking.rs so it is easier to see the effect of the change in the CI.

I would much prefer if one of you more familiar with that test added it. Either to this branch or before or after in a different PR.

@FrankReh
Copy link
Contributor Author

FrankReh commented Oct 1, 2024

Thanks for investigating the deadlock and submit the fix! :)

It was my pleasure. It was nice learning more about Miri and figuring out how to tease the extensive trace lines into something I could compare from run to run. Also building a version where I could control the seeds for each of the twelve places the rng is used, so 12 rng's, was fun to figure out. It was only after knowing which rng use was making a difference that the solution was clear to @RalfJung . Still you and he rattled off what to do next in a few minutes and it still took me some time to piece the advice together. Good experience. (Wish Miri could compile faster though but it is amazing how it is part of rustc and what it can do with no special programming on the application side.) And concurrency and epoll support and being able to test more of Tokio is amazing. I look forward to socket support.

@tiif
Copy link
Contributor

tiif commented Oct 1, 2024

I would much prefer if one of you more familiar with that test added it. Either to this branch or before or after in a different PR.

Sure, I will leave this to someone who has commit right or do it later in another PR.

I think the current status of the PR should be

@rustbot ready

@rustbot rustbot added the S-waiting-on-review Status: Waiting for a review to complete label Oct 1, 2024
@RalfJung
Copy link
Member

RalfJung commented Oct 1, 2024

The PR should contain the test, indeed, to make sure the problem is indeed fixed.

It shouldn't be too hard to copy the test into the test file?

@oli-obk
Copy link
Contributor

oli-obk commented Oct 1, 2024

Ideally the test should be added in a first commit, and then your change shows how the test changes. But that's not a strict requirement

@FrankReh
Copy link
Contributor Author

FrankReh commented Oct 1, 2024

So I can start a new PR, redo the order of commits and rebase, add it now, or are merges squashed?

@oli-obk
Copy link
Contributor

oli-obk commented Oct 1, 2024

You can add the commit to this PR and reorder the commits with git rebase -i

Merges are not squashed.

@FrankReh
Copy link
Contributor Author

FrankReh commented Oct 1, 2024

You can add the commit to this PR and reorder the commits with git rebase -i

Merges are not squashed.

I appreciate the clarifications. Probably all in the contributions section which I didn’t read past how to build. One final question because actually no one likes using git and its log history more than me.

Should the two ancillary changes I made to the same source file all be rolled into the one commit as I had initially done or are we better served with a separate commit for each one? Or even distinct PRs?

@oli-obk
Copy link
Contributor

oli-obk commented Oct 1, 2024

I personally try to put such things in different commits, but they are minor drive by things, so not too important. If it's easy to do for you and you like messing with thr git history, feel free to do it, but don't put a lot of time into it

@FrankReh
Copy link
Contributor Author

FrankReh commented Oct 2, 2024

This style check failure is the simple test function. The 'cargo +nightly fmt' I ran from the top didn't fix the formatting of the test function. Will rebase and commit again.

This test demonstrates the need to synchronize the clock
of the thread waking up from an epoll_wait from the thread
that issued the epoll awake event.
A couple of instructions were left over from an earlier rebase
it would seem. They don't impact the logic but the ready_list type
is about to change in the next commit.

Rather than modify one of these lines in the commit that changes
ready_list, only to have these lines removed later on, remove them now.
They don't impact the tests results.
This adds a VClock to the epoll implementation's ready_list
and has this VClock synced from the thread that updates
an event in the ready_list and then has the VClocks of any
threads being made runnable again, out of the calls to
epoll_wait, synced from it.
A simplification that doesn't impact the epoll implementation's logic.

It is not necessary to clone the ready_list before reading its
`is_empty` state.

This avoids the clone step but more importantly avoids the invisible
drop step of the clone.
@RalfJung
Copy link
Member

RalfJung commented Oct 2, 2024

To format things, run ./miri fmt. You should generally never invoke cargo directly in Miri development.

@oli-obk
Copy link
Contributor

oli-obk commented Oct 2, 2024

Ideally the test should be added in a first commit, and then your change shows how the test changes. But that's not a strict requirement

I created an edit of your branch at https://github.com/rust-lang/miri/compare/epoll-with-vector-clock that shows how I thought it would be laid out. Each commit builds on its own, so the first one adding the test must be adding a failing test, and then the commit that fixes the test changes the test so it passes.

@FrankReh
Copy link
Contributor Author

FrankReh commented Oct 2, 2024

Ideally the test should be added in a first commit, and then your change shows how the test changes. But that's not a strict requirement

I created an edit of your branch at https://github.com/rust-lang/miri/compare/epoll-with-vector-clock that shows how I thought it would be laid out. Each commit builds on its own, so the first one adding the test must be adding a failing test, and then the commit that fixes the test changes the test so it passes.

Is this the review? I pushed new changes up six hours ago with the failing test first.

@oli-obk
Copy link
Contributor

oli-obk commented Oct 2, 2024

Yes, but you pushed it as a "pass" test. The commit will not pass tests that way. Compare with my first and third commits.

But considering the entirety, it seems fine to just add the test as you did to the commit that fixes it. As I said, the scheme of adding it as a failing test first is more extra credit for the pedantic than a requirement/preference

@FrankReh
Copy link
Contributor Author

FrankReh commented Oct 2, 2024

oh. it goes in a different directory? even when part of one PR? Okay. If that makes it better for posterity, I'll redo.

(Because I have a new miri question and I want to stay on all your good sides. :))

@oli-obk
Copy link
Contributor

oli-obk commented Oct 2, 2024

XD this is just oli-pedantism. Your PR is fine as it is. You can ask your question in parallel without any bad feelings from our side.

@FrankReh
Copy link
Contributor Author

FrankReh commented Oct 2, 2024

XD this is just oli-pedantism. Your PR is fine as it is. You can ask your question in parallel without any bad feelings from our side.

Would you like to commit your branch? It has even changed at least one comment so I don't know that there's any benefit to my branch any longer.

TIA

@oli-obk
Copy link
Contributor

oli-obk commented Oct 2, 2024

Done

@oli-obk oli-obk closed this Oct 2, 2024
bors added a commit that referenced this pull request Oct 2, 2024
Add vector clock to epoll ready lists

Replaces #3928

Fixes #3911
@FrankReh FrankReh deleted the epoll-with-vector-clock branch October 2, 2024 21:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Waiting for a review to complete
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Epoll deadlock on tokio's test
5 participants