Fix testAbortSnapshotWhileRemovingNode #142852
Clear the mock transport rule in testAbortSnapshotWhileRemovingNode after releasing the single update_snapshot_status request we coordinate on, so any further requests are handled normally and complete before teardown. This prevents the test from failing in assertAfterTest() with "All incoming requests on node [X] should have finished". Closes elastic#142805
DaveCTurner
left a comment
This looks like a good change, but are you sure it fixes the test failure completely?
@ywangd Hey, tell me if I'm wrong here, but AFAICT, the test blocks all When the snapshot is aborted, the data node can send a second status update (e.g. FAILED/ABORTED or completion). That second request hits the same handler and blocks on the first safeAwait(barrier) with no further barrier releases, so it never completes. After the test method returns, I believe removing the masterTransportAction rules should prevent this from happening by letting all subsequent requests occur as intended.
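The blocking described in this comment can be reproduced outside Elasticsearch. The following is a minimal sketch using only `java.util.concurrent`; the class and method names (`BlockedSecondRequestSketch`, `handle`, `run`) are hypothetical and not taken from the test, and a short timeout stands in for the indefinite wait so the example terminates.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BlockedSecondRequestSketch {

    // Hypothetical stand-in for the intercepting handler: every request must
    // meet the test thread at the barrier before it is allowed to proceed.
    // Returns true if the request got past the barrier, false if it gave up.
    static boolean handle(CyclicBarrier barrier, long timeoutMillis) throws InterruptedException {
        try {
            barrier.await(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException | BrokenBarrierException e) {
            return false;
        }
    }

    // Returns {firstRequestCompleted, secondRequestCompleted}.
    static boolean[] run() throws Exception {
        CyclicBarrier barrier = new CyclicBarrier(2);
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // First request: the test thread meets it at the barrier once.
            Future<Boolean> first = pool.submit(() -> handle(barrier, 5000));
            barrier.await(5, TimeUnit.SECONDS);
            boolean firstOk = first.get();

            // Second request: the test never awaits the barrier again, so the
            // handler waits until its timeout expires.
            Future<Boolean> second = pool.submit(() -> handle(barrier, 200));
            boolean secondOk = second.get();

            return new boolean[] { firstOk, secondOk };
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        boolean[] result = run();
        System.out.println("first completed: " + result[0] + ", second completed: " + result[1]);
    }
}
```

In this sketch the second request only gives up because of the 200 ms timeout; a handler that waits without one would still be in flight when the test ends.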
I see, thanks, yes the second status update is going to cause problems here. I'd rather not just let it through like this tho, instead I think we need to use a pair of There look to be several other spots where we modify the master's update handling and don't revert it before the end of the test. We should fix them too.
    // Release the master node to respond
    snapshotStatusUpdateLatch.countDown();
    masterTransportService.clearAllRules();
Lmk if this is the wrong place to have this. I put it here since it's the line following snapshotStatusUpdateLatch being counted down; at that point the update snapshot requests are allowed to be processed, so we can remove the rules.
@DaveCTurner Hey, thank you for the pointers. I've implemented two
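For readers following the thread, here is a rough sketch of what coordinating on a pair of latches and then clearing the intercepting rule might look like. It is an assumption-laden illustration in plain `java.util.concurrent`, not the code from this PR: the `rule` reference is a hypothetical stand-in for a mock transport rule, and `TwoLatchSketch`, `handleRequest`, and `run` are invented names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

public class TwoLatchSketch {

    // Hypothetical request path: run the intercepting rule if one is
    // installed, then complete the request normally.
    static void handleRequest(AtomicReference<Runnable> rule, AtomicInteger completed) {
        Runnable installed = rule.get();
        if (installed != null) {
            installed.run();
        }
        completed.incrementAndGet();
    }

    // Returns the number of requests that completed.
    static int run() throws Exception {
        CountDownLatch received = new CountDownLatch(1); // handler -> test: a request has arrived
        CountDownLatch release = new CountDownLatch(1);  // test -> handler: you may proceed
        AtomicReference<Runnable> rule = new AtomicReference<>();
        AtomicInteger completed = new AtomicInteger();

        rule.set(() -> {
            received.countDown();
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<?> first = pool.submit(() -> handleRequest(rule, completed));
            if (!received.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("first request never arrived");
            }

            // ... the test would do its work here while the request is held ...

            release.countDown(); // let the held request through
            rule.set(null);      // revert the interception before the test ends

            // A later request now completes instead of blocking. Even if it
            // had arrived before the rule was cleared, the release latch is
            // already open and stays open, so it would pass straight through.
            Future<?> second = pool.submit(() -> handleRequest(rule, completed));

            first.get(5, TimeUnit.SECONDS);
            second.get(5, TimeUnit.SECONDS);
            return completed.get();
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("completed requests: " + run());
    }
}
```

Unlike a barrier, a counted-down latch never closes again, which is why a request that slips in before the rule is removed cannot get stuck in this sketch.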
DaveCTurner
left a comment
LGTM (I haven't actually reproduced the failure tho, it's quite rare)
Thanks for fixing this. This is a new failure since #142637, which adds a second shard snapshot update for a PAUSED shard when it is deleted.
Same issue as elastic#142805 and fixed by elastic#142852. Resolves elastic#142868 Resolves elastic#142869 Resolves elastic#142870 Resolves elastic#142871
Clear the mock transport rule in testAbortSnapshotWhileRemovingNode after releasing the single update_snapshot_status request we coordinate on, so any further requests are handled normally and complete before teardown. This prevents the test from failing in assertAfterTest() with "All incoming requests on node [X] should have finished". Closes elastic#142805
* Fix testAbortSnapshotWhileRemovingNode
  Clear the mock transport rule in testAbortSnapshotWhileRemovingNode after releasing the single update_snapshot_status request we coordinate on, so any further requests are handled normally and complete before teardown. This prevents the test from failing in assertAfterTest() with "All incoming requests on node [X] should have finished". Closes elastic#142805
* Update comment
* Remove second masterTransportService.clearAllRules();
* Use two CountDownLatches
* Add extra masterTransportService.clearAllRules();
* [CI] Auto commit changes from spotless
---------
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Clear the mock transport rule in testAbortSnapshotWhileRemovingNode after releasing the single update_snapshot_status request we coordinate on, so any further requests are handled normally and complete before teardown. This prevents the test from failing in assertAfterTest() with "All incoming requests on node [X] should have finished". Closes elastic#142805
* Fix testAbortSnapshotWhileRemovingNode
  Clear the mock transport rule in testAbortSnapshotWhileRemovingNode after releasing the single update_snapshot_status request we coordinate on, so any further requests are handled normally and complete before teardown. This prevents the test from failing in assertAfterTest() with "All incoming requests on node [X] should have finished". Closes elastic#142805
* Update comment
* Remove second masterTransportService.clearAllRules();
* Use two CountDownLatches
* Add extra masterTransportService.clearAllRules();
* [CI] Auto commit changes from spotless
---------
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
(cherry picked from commit c8d36f0)
💚 All backports created successfully
Clear the mock transport rule in testAbortSnapshotWhileRemovingNode after releasing the single update_snapshot_status request we coordinate on, so any further requests are handled normally and complete before teardown. This prevents the test from failing in assertAfterTest() with "All incoming requests on node [X] should have finished". Closes #142805
* Fix testAbortSnapshotWhileRemovingNode
  Clear the mock transport rule in testAbortSnapshotWhileRemovingNode after releasing the single update_snapshot_status request we coordinate on, so any further requests are handled normally and complete before teardown. This prevents the test from failing in assertAfterTest() with "All incoming requests on node [X] should have finished". Closes #142805
* Update comment
* Remove second masterTransportService.clearAllRules();
* Use two CountDownLatches
* Add extra masterTransportService.clearAllRules();
* [CI] Auto commit changes from spotless
---------
(cherry picked from commit c8d36f0)
Co-authored-by: Joshua Adams <joshua.adams@elastic.co>
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Thanks for this @ywangd!