HDDS-4708. Optimization: update RetryCount less frequently (update once per ~100) #1805
Probably we should think about no longer persisting the retry count in the DB at all here.
Writing the retry count to the DB is still useful, at least when the retry count exceeds maxRetry. When some blocks cannot be deleted for some reason, the DB then holds a record that people can use to analyze the cause.
lokeshj1703
left a comment
@amaliujia Thanks for working on this! The changes look good to me. I have a few comments inline. The added test seems to be failing. Can you please take a look?
(nextCount % 100) - (currentCount % 100): this would always be 0, since currentCount would now be equal to nextCount. I think we can also use something like nextCount % 100 == 99?
Oh, I see: at line 154 I did int nextCount = currentCount++;.
Indeed, that assigns the old value of currentCount to nextCount (post-increment: use currentCount, then ++).
I updated it to int nextCount = currentCount + 1; and now (nextCount % 100) - (currentCount % 100) is supposed to work.
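To make the post-increment pitfall above concrete, here is a minimal standalone sketch. The class and method names are hypothetical and are not taken from DeletedBlockLogImpl; only the two assignment forms come from the comment.

```java
// Hypothetical demo class; not part of the Ozone code base.
final class PostIncrementDemo {

  // Mirrors "int nextCount = currentCount++;": the expression evaluates to
  // the OLD value of currentCount, and only then is currentCount bumped.
  static int nextWithPostIncrement(int currentCount) {
    int nextCount = currentCount++;
    return nextCount;
  }

  // Mirrors the fix "int nextCount = currentCount + 1;".
  static int nextWithPlusOne(int currentCount) {
    int nextCount = currentCount + 1;
    return nextCount;
  }

  public static void main(String[] args) {
    System.out.println(nextWithPostIncrement(41)); // prints 41
    System.out.println(nextWithPlusOne(41));       // prints 42
  }
}
```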
Oh, now I see what you were suggesting:
It needs to be at least (nextCount / 100) - (currentCount / 100) :)
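A small sketch of why the division form is the one that detects crossing a multiple of 100, assuming nextCount is always currentCount + 1. The helper names are hypothetical; the exact condition used in the patch is not shown in this conversation.

```java
// Hypothetical demo class comparing the two expressions from the thread.
final class BoundaryCheckDemo {

  // Difference of remainders: 1 for ordinary consecutive counts,
  // -99 when stepping from x99 to the next multiple of 100. Never 0.
  static int moduloDiff(int currentCount, int nextCount) {
    return (nextCount % 100) - (currentCount % 100);
  }

  // Difference of quotients: 0 while both counts are in the same
  // block of 100, 1 exactly when a multiple of 100 is reached.
  static int divisionDiff(int currentCount, int nextCount) {
    return (nextCount / 100) - (currentCount / 100);
  }

  public static void main(String[] args) {
    System.out.println(moduloDiff(98, 99));    // prints 1
    System.out.println(moduloDiff(99, 100));   // prints -99
    System.out.println(divisionDiff(98, 99));  // prints 0
    System.out.println(divisionDiff(99, 100)); // prints 1
  }
}
```

With consecutive counts, the remainder difference is nonzero on every retry, so it cannot be used to pick out every 100th retry, whereas the quotient difference is nonzero only at the boundary.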
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLogImpl.java
Suggested change:
- int currentCount = -1;
- if (transactionRetryCountMap.containsKey(txID)) {
-   currentCount = transactionRetryCountMap.get(txID);
- } else {
-   currentCount = block.getCount();
+ int currentCount = transactionRetryCountMap.getOrDefault(txID, block.getCount());
We can use the getOrDefault API here.
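For reference, a self-contained sketch showing that the containsKey/get form and Map.getOrDefault return the same value. The map type Map<Long, Integer> and the persistedCount parameter (standing in for block.getCount()) are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; the map type and parameter names are assumed.
final class GetOrDefaultDemo {

  // The verbose form from the original code.
  static int currentCountVerbose(Map<Long, Integer> retryCounts,
      long txID, int persistedCount) {
    int currentCount;
    if (retryCounts.containsKey(txID)) {
      currentCount = retryCounts.get(txID);
    } else {
      currentCount = persistedCount;
    }
    return currentCount;
  }

  // The compact form from the suggestion.
  static int currentCountCompact(Map<Long, Integer> retryCounts,
      long txID, int persistedCount) {
    return retryCounts.getOrDefault(txID, persistedCount);
  }

  public static void main(String[] args) {
    Map<Long, Integer> retryCounts = new HashMap<>();
    retryCounts.put(7L, 42);
    System.out.println(currentCountCompact(retryCounts, 7L, 3)); // prints 42
    System.out.println(currentCountCompact(retryCounts, 8L, 3)); // prints 3
  }
}
```

One difference worth keeping in mind: the default argument of getOrDefault is evaluated eagerly, so in the compact form the fallback expression is computed even when the key is present.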
Can we remove the commented-out code here?
Sure, @bshashikant. Let's discuss this in a separate JIRA?
@lokeshj1703 Comments addressed. I had to refactor the test code a bit to fix the failing UT.
lokeshj1703
left a comment
@amaliujia Thanks for updating the PR! Regarding TestDeletedBlockLogBase and the other inherited tests, can we move them all into TestDeletedBlockLog itself? I see that you have defined public abstract int getMaxRetry(); for configuring maxRetry. I think this can be done within a test as well, following the approach in TestDeletedBlockLog#testPersistence: we can recreate DeletedBlockLog after changing the configuration for the test. Sorry, this might require some refactoring.
There are a few other comments inline.
Redundant change.
Suggested change:
- nextCount = -1;
+ transactionRetryCountMap.remove(txID);
We can remove the entry from map here.
It's difficult to understand the logic here. Can we replace it using %? Perhaps nextCount % 100 == 0 or 99?
nextCount % 100 == 0 is good.
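Putting the review points together, here is a rough, simplified sketch of the throttling idea: keep counts in memory, persist only every 100th retry, and persist -1 and drop the in-memory entry once maxRetry is exceeded. This is not the code from DeletedBlockLogImpl; the class, the persistedCounts map (a stand-in for the DB table), and the dbWrites counter are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "update retry count once per ~100" semantics.
final class ThrottledRetryCounter {
  private final Map<Long, Integer> inMemoryCounts = new HashMap<>();
  // Stand-in for the DB table; an assumption made for this sketch.
  private final Map<Long, Integer> persistedCounts = new HashMap<>();
  private final int maxRetry;
  private int dbWrites = 0;

  ThrottledRetryCounter(int maxRetry) {
    this.maxRetry = maxRetry;
  }

  void increment(long txID) {
    int currentCount = inMemoryCounts.getOrDefault(
        txID, persistedCounts.getOrDefault(txID, 0));
    if (currentCount < 0) {
      return; // retries already exhausted for this transaction
    }
    int nextCount = currentCount + 1;
    if (nextCount > maxRetry) {
      // Always record the failure so it can be analyzed later.
      persist(txID, -1);
      inMemoryCounts.remove(txID);
    } else {
      inMemoryCounts.put(txID, nextCount);
      if (nextCount % 100 == 0) {
        persist(txID, nextCount); // only every 100th retry hits the DB
      }
    }
  }

  private void persist(long txID, int count) {
    persistedCounts.put(txID, count);
    dbWrites++;
  }

  int getDbWrites() {
    return dbWrites;
  }

  Integer getPersistedCount(long txID) {
    return persistedCounts.get(txID);
  }
}
```

In this sketch, 250 increments with maxRetry set to 250 cause two DB writes (at 100 and 200), and the 251st increment causes a third write that stores -1.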
This can be moved inside the else if.
Unrelated to this PR: can we also remove the entry from transactionToDNsCommitMap here, to be safe?
In fact, this might be a bug: I don't see transactionToDNsCommitMap being cleaned up properly in purgeTransaction.
Added the cleanup of transactionToDNsCommitMap here.
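The cleanup being discussed can be sketched as follows. This is only an illustration of removing a purged transaction from both in-memory maps; the class, the purge method, and the value type Set<UUID> are assumptions, and the real purge path also deletes the transaction from the DB.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical sketch; map value types and method names are assumed.
final class PurgeCleanupDemo {
  final Map<Long, Integer> transactionRetryCountMap = new HashMap<>();
  final Map<Long, Set<UUID>> transactionToDNsCommitMap = new HashMap<>();

  // Drop every in-memory trace of a purged transaction so that
  // neither map keeps growing with stale entries.
  void purge(long txID) {
    transactionRetryCountMap.remove(txID);
    transactionToDNsCommitMap.remove(txID);
  }
}
```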
@lokeshj1703 Your suggestion was actually great: simple and easy. Now this PR changes less code than before.
Checkstyle complains that this line exceeds the 80-character limit, so I made this change.
@lokeshj1703 PR rebased and conflicts resolved. Any further comments?
lokeshj1703
left a comment
@amaliujia Thanks for updating the PR! The changes look good to me. +1.
@amaliujia Thanks for the contribution! I have committed the PR to the master branch.
* master: (176 commits)
  HDDS-4760. Intermittent failure in ozone-ha acceptance test (apache#1853)
  HDDS-4770. Upgrade Ratis Thirdparty to 0.6.0 (apache#1868)
  HDDS-4765. Update close-pending workflow for new repo (apache#1856)
  HDDS-4737. Add ModifierOrder to checkstyle rules (apache#1839)
  HDDS-4704. Add permission check in OMDBCheckpointServlet (apache#1801)
  HDDS-4757. Unnecessary WARNING to set OZONE_CONF_DIR (apache#1849)
  HDDS-4751. TestOzoneFileSystem#testTrash failed when enabledFileSystemPaths and omRatisDisabled (apache#1851)
  HDDS-4736. Intermittent failure in testExpiredCertificate (apache#1838)
  HDDS-4758. Adjust classpath of ozone version to include log4j (apache#1850)
  HDDS-4518. Add metrics around Trash Operations. (apache#1832)
  HDDS-4708. Optimization: update RetryCount less frequently (update once per ~100) (apache#1805)
  HDDS-4748. sonarqube issue fix - "static" members should be accessed statically (apache#1748)
  HDDS-2402. Adapt hadolint check to improved CI framework (apache#1778)
  HDDS-4698. Upgrade Java for Sonar check (apache#1800)
  HDDS-4739. Upgrade Ratis to 1.1.0-eb66796d-SNAPSHOT (apache#1842)
  HDDS-4735. Fix typo in hdds.proto (apache#1837)
  HDDS-4430. OM failover timeout is too short (apache#1807)
  HDDS-4477. Delete txnId in SCMMetadataStoreImpl may drop to 0 after SCM restart. (apache#1828)
  HDDS-4688. Update Hadoop version to 3.2.2 (apache#1795)
  HDDS-4725. Change metrics unit from nanosecond to millisecond (apache#1823)
  ...
What changes were proposed in this pull request?
Details in the JIRA
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-4708
How was this patch tested?
UT