HDDS-8492. Intermittent timeout in TestStorageContainerManager#testBlockDeletionTransactions. #5928
…ockDeletionTransactions.
@adoroszlai Pls review.
@DaveTeng0 can you please take a look?
…ockDeletionTransactions.
@xichen01 can you please review, too?
xichen01
left a comment
@devmadhuu The change looks good. Just a few comments inline.
        cluster.getStorageContainerManager());
    }

    Thread.sleep(3000);
You can use this method to wait for the containers to close, but you need to get the container list first.
ozone/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/TestHelper.java
Lines 326 to 327 in de76edc
    public static void waitForContainerClose(MiniOzoneCluster cluster,
        Long... containerIdList)
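The idea behind this suggestion, polling for a condition with a timeout instead of sleeping for a fixed 3 seconds, can be sketched in plain Java. This is only an illustration and not the implementation of TestHelper.waitForContainerClose: the class name WaitUtil, the method waitFor, and its parameters are hypothetical, and a real test would pass a condition that checks the state of each container.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

/** Hypothetical helper, not part of Ozone: poll a condition instead of sleeping blindly. */
public final class WaitUtil {
  private WaitUtil() { }

  /**
   * Checks the condition every intervalMs milliseconds and returns as soon as it is true.
   * Throws TimeoutException if the condition is still false after timeoutMs milliseconds.
   */
  public static void waitFor(BooleanSupplier condition, long intervalMs, long timeoutMs)
      throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException("Condition not met within " + timeoutMs + " ms");
      }
      Thread.sleep(intervalMs);
    }
  }
}
```

Compared with a fixed sleep, a wait like this returns as soon as the containers are closed and fails with a clear timeout when they never close, which is usually easier to diagnose than a later assertion failure.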
    // Wait for container report
    Thread.sleep(1000);
    for (OmKeyInfo keyInfo : keyLocations.values()) {
      OzoneTestUtils.closeContainers(keyInfo.getKeyLocationVersions(),
I see that other tests have similar logic (for example, testBlockDeletingThrottling also closes containers). Is it possible that they have the same bug?
Since this test also passes in repeated runs, I would not want to change anything in this test as of now. Kindly re-review.
@devmadhuu Thanks for the update. LGTM +1.
adoroszlai
left a comment
10x50 runs for latest commit: https://github.com/devmadhuu/ozone/actions/runs/7540060163
Thanks @devmadhuu for the patch, @xichen01 for the review.
What changes were proposed in this pull request?
This PR fixes the intermittent failure of the TestStorageContainerManager#testBlockDeletionTransactions test case.

This test case creates 5 new keys, and with them 5 containers. It then closes all 5 containers, creates the delete transaction log, and immediately verifies that the number of valid delete transactions is greater than 0 (delLog.getNumOfValidTransactions() > 0). After some time, the SCM block deleting service thread sends the DELETE BLOCK command to the datanodes, and the delete block transactions are cleaned up from the DeletedBlocksTransaction table. So after a few seconds the assertion delLog.getNumOfValidTransactions() == 0 should pass. The problem in the test case was that, after closing all containers, some containers were left OPEN, so delete block transactions were still lying in the DeletedBlocksTransaction table even after 25 seconds. This PR therefore updates a few configurations related to the DN heartbeat interval and adds a sleep to give the containers some time to close.

What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-8492
How was this patch tested?
This patch was tested with repeated CI runs (200 iterations in the CI flaky-test workflow). Here is the green CI link.