Skip to content

test(mempool/cat): fix flaky TestTxPool_BroadcastQueue#1321

Merged
rootulp merged 3 commits intocelestiaorg:mainfrom
notional-labs:tuan/fix-flaky-mempool-test
May 7, 2024
Merged

test(mempool/cat): fix flaky TestTxPool_BroadcastQueue#1321
rootulp merged 3 commits intocelestiaorg:mainfrom
notional-labs:tuan/fix-flaky-mempool-test

Conversation

@catconcat
Copy link
Copy Markdown

@catconcat catconcat commented Apr 23, 2024

Description

Closes #1261

@catconcat catconcat requested a review from a team as a code owner April 23, 2024 19:52
@catconcat catconcat requested review from cmwaters and rootulp and removed request for a team April 23, 2024 19:52
@rootulp rootulp changed the title fix(tests): fix flaky TestTxPool_BroadcastQueue test test(mempool/cat): fix flaky TestTxPool_BroadcastQueue Apr 24, 2024
Copy link
Copy Markdown
Collaborator

@rootulp rootulp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't get the flake to reproduce locally before or after this change. @tropicaldog do you mind explaining why this fixes the flake?

@catconcat
Copy link
Copy Markdown
Author

catconcat commented Apr 24, 2024

@rootulp I think the original code might lead to a race condition, where the goroutine may start consuming transactions before all of them are added. Using time.Sleep(10 * time.Millisecond) to introduce a delay is not a reliable solution imo.
The command I use to reproduce the flaky behavior is:
go test -timeout 30s -run ^TestTxPool_BroadcastQueue$ github.com/cometbft/cometbft/mempool/cat -race -count 10
Interestingly, the flaky behavior only occurs on my M2 Mac, where the test fails 50% of the time. When I tried running the same command several times on my Ubuntu 22.04 PC, no flakiness was observed.

Copy link
Copy Markdown
Collaborator

@rootulp rootulp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation! I'm on an M1 Mac and the test is still passing on my machine after your fix so LGTM.

Copy link
Copy Markdown
Member

@rach-id rach-id left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unable to reproduce the flake locally on Mac, but this fix looks reasonable to me. There is no need to wait to introduce the delay where we can just broadcast all the transactions before starting to consume them

Copy link
Copy Markdown
Member

@evan-forbes evan-forbes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @tropicaldog

@rootulp rootulp enabled auto-merge (squash) May 7, 2024 13:20
@evan-forbes
Copy link
Copy Markdown
Member

@tropicaldog we have to update this branch by merging main, perhaps you could give one of us perms to do that on your fork so we could merge this? either that, or you will have to merge main again 😅

@catconcat
Copy link
Copy Markdown
Author

@tropicaldog we have to update this branch by merging main, perhaps you could give one of us perms to do that on your fork so we could merge this? either that, or you will have to merge main again 😅

Yeah I guess I didn't have enough permission. Anyway I will merge main now, thanks for your reviews!

@rootulp rootulp merged commit 485332b into celestiaorg:main May 7, 2024
mergify bot pushed a commit that referenced this pull request May 7, 2024
## Description
Closes #1261

(cherry picked from commit 485332b)
evan-forbes pushed a commit that referenced this pull request May 7, 2024
… (#1345)

## Description
Closes #1261 

<hr>This is an automatic backport of pull request #1321 done by
[Mergify](https://mergify.com).

Co-authored-by: Tuan Tran <tuantran@notional.ventures>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Flake in TestTxPool_BroadcastQueue

4 participants