HDDS-4708. Optimization: update RetryCount less frequently (update once per ~100) #1805

amaliujia · 2021-01-15T08:03:28Z

What changes were proposed in this pull request?

Details in the JIRA

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-4708

How was this patch tested?

UT

bshashikant · 2021-01-15T09:08:51Z

Probably, we should think about removing persisting the retry count in db altogether here.
cc ~ @lokeshj1703

amaliujia · 2021-01-16T18:56:03Z

Writing the retry count to the DB is still useful, at least when the retry count exceeds maxRetry. When some blocks cannot be deleted for some reason, the DB then holds a record that people can use to analyze the cause.

lokeshj1703

@amaliujia Thanks for working on this! The changes look good to me. I have a few comments inline. The added test seems to be failing. Can you please take a look?

lokeshj1703 · 2021-01-18T12:20:09Z

hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLogImpl.java

(nextCount % 100) - (currentCount % 100) This would always be 0, since currentCount would now be equal to nextCount. I think we could also use something like nextCount % 100 == 99?

Oh, I see: at line 154 I wrote int nextCount = currentCount++;.

Indeed, that means nextCount = currentCount (the value of currentCount is used first, then incremented).

I updated it to int nextCount = currentCount + 1; and now (nextCount % 100) - (currentCount % 100) is supposed to work.

Oh, I see what you were suggesting:

It needs to be at least (nextCount / 100) - (currentCount / 100) :)
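To make the exchange above concrete, here is a minimal standalone sketch (not the code in DeletedBlockLogImpl; the class and method names are hypothetical) comparing the three expressions discussed. It assumes non-negative counts.

```java
// Hypothetical demo class; only the arithmetic mirrors the review discussion.
final class RetryBoundaryDemo {

  // Post-increment yields the OLD value, so nextCount equals the original count.
  static int postIncrement(int currentCount) {
    int nextCount = currentCount++;
    return nextCount;
  }

  // Modulo difference after the fix (nextCount = currentCount + 1).
  // It is 1 almost everywhere and -99 when nextCount lands on a multiple of 100.
  static int modDifference(int currentCount) {
    int nextCount = currentCount + 1;
    return (nextCount % 100) - (currentCount % 100);
  }

  // Integer-division difference: positive only when a multiple of 100 is crossed.
  static boolean crossesHundred(int currentCount) {
    int nextCount = currentCount + 1;
    return (nextCount / 100) - (currentCount / 100) > 0;
  }

  public static void main(String[] args) {
    System.out.println(postIncrement(41));   // 41
    System.out.println(modDifference(41));   // 1
    System.out.println(modDifference(99));   // -99
    System.out.println(crossesHundred(98));  // false
    System.out.println(crossesHundred(99));  // true
  }
}
```

The division form reads more directly as "did we cross a hundred boundary", whereas the modulo difference only signals the boundary through its sign.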

lokeshj1703 · 2021-01-18T12:26:59Z

hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLogImpl.java

Suggested change

-        int currentCount = -1;
-        if (transactionRetryCountMap.containsKey(txID)) {
-          currentCount = transactionRetryCountMap.get(txID);
-        } else {
-          currentCount = block.getCount();
+        int currentCount = transactionRetryCountMap.getOrDefault(txID, block.getCount());

We can use the getOrDefault API here.
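As a small illustration of this suggestion, the sketch below shows `Map.getOrDefault` replacing a containsKey/get/else sequence. The class, method, and parameter names are made up for the example; `persistedCount` stands in for whatever `block.getCount()` returns.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class, not part of the PR.
final class GetOrDefaultDemo {

  // Returns the in-memory count if present, otherwise the supplied fallback.
  static int currentCount(Map<Long, Integer> retryCounts, long txID, int persistedCount) {
    return retryCounts.getOrDefault(txID, persistedCount);
  }

  public static void main(String[] args) {
    Map<Long, Integer> retryCounts = new HashMap<>();
    retryCounts.put(7L, 3);
    System.out.println(currentCount(retryCounts, 7L, 0)); // 3 (from the map)
    System.out.println(currentCount(retryCounts, 8L, 5)); // 5 (fallback)
  }
}
```

Note that the fallback argument is evaluated eagerly, even when the key is present, which is harmless when the fallback is a cheap getter.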

lokeshj1703 · 2021-01-18T12:48:38Z

hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/block/TestDeletedBlockLog.java

Can we remove the commented-out code here?

lokeshj1703 · 2021-01-18T13:16:16Z

> Probably, we should think about removing persisting the retry count in db altogether here.

Sure @bshashikant. Let's discuss this in a separate JIRA?

amaliujia · 2021-01-18T23:29:14Z

@lokeshj1703 comments addressed.

Had to refactor testing code a bit to fix the failed UT.

lokeshj1703

@amaliujia Thanks for updating the PR! Regarding TestDeletedBlockLogBase and the other inherited tests, can we move them all into TestDeletedBlockLog itself? I see that you have defined public abstract int getMaxRetry(); for configuring maxRetry. I think this can be done within a test as well, following the approach in TestDeletedBlockLog#testPersistence: we can recreate DeletedBlockLog after changing the configuration for the test. Sorry! This might require some refactoring.
There are a few other comments inline.

lokeshj1703 · 2021-01-21T09:52:45Z

hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLogImpl.java

Redundant change.

lokeshj1703 · 2021-01-21T09:54:33Z

hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLogImpl.java

Suggested change

nextCount = -1;

transactionRetryCountMap.remove(txID);

We can remove the entry from map here.

lokeshj1703 · 2021-01-21T09:56:24Z

hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLogImpl.java

It's difficult to understand the logic here. Can we replace it using %? Perhaps nextCount % 100 == 0 or 99?

nextCount % 100 == 0 is good.
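A rough sketch of what the agreed `nextCount % 100 == 0` check could look like follows. This is not the PR's implementation: the class is hypothetical, a plain `HashMap` stands in for the DB, and the maxRetry handling discussed elsewhere in this thread is left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: count every retry in memory, persist once per 100.
final class ThrottledRetryCounter {
  private final Map<Long, Integer> inMemoryCounts = new HashMap<>();
  // Stand-in for the DB table; an assumption made for this example only.
  private final Map<Long, Integer> persistedCounts = new HashMap<>();
  private int dbWrites = 0;

  int increment(long txID) {
    int currentCount = inMemoryCounts.getOrDefault(
        txID, persistedCounts.getOrDefault(txID, 0));
    int nextCount = currentCount + 1;
    inMemoryCounts.put(txID, nextCount);
    // Only every 100th retry reaches the (simulated) DB.
    if (nextCount % 100 == 0) {
      persistedCounts.put(txID, nextCount);
      dbWrites++;
    }
    return nextCount;
  }

  int dbWrites() {
    return dbWrites;
  }

  int persistedCount(long txID) {
    return persistedCounts.getOrDefault(txID, 0);
  }

  public static void main(String[] args) {
    ThrottledRetryCounter counter = new ThrottledRetryCounter();
    for (int i = 0; i < 250; i++) {
      counter.increment(1L);
    }
    System.out.println(counter.dbWrites());         // 2
    System.out.println(counter.persistedCount(1L)); // 200
  }
}
```

With this shape, 250 retries cost two writes instead of 250, at the price of the persisted value lagging the in-memory one by up to 99.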

lokeshj1703 · 2021-01-21T09:56:55Z

hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLogImpl.java

Can be moved inside else if

lokeshj1703 · 2021-01-21T09:57:48Z

hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLogImpl.java

Unrelated to this PR: can we also remove the entry from transactionToDNsCommitMap here, to be safe?

In fact, this might be a bug: for purgeTransaction, I don't see transactionToDNsCommitMap being cleaned up properly.

Added the cleanup of transactionToDNsCommitMap here.
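The cleanup being discussed can be pictured with the following sketch, in which purging a transaction drops its entry from both in-memory maps. The map names come from the thread, but their value types, the surrounding class, and the method names are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical sketch; value types of both maps are assumed.
final class TransactionStateCleanup {
  private final Map<Long, Integer> transactionRetryCountMap = new HashMap<>();
  private final Map<Long, Set<UUID>> transactionToDNsCommitMap = new HashMap<>();

  void recordRetry(long txID) {
    transactionRetryCountMap.merge(txID, 1, Integer::sum);
  }

  void recordCommit(long txID, UUID datanode) {
    transactionToDNsCommitMap
        .computeIfAbsent(txID, k -> new HashSet<>())
        .add(datanode);
  }

  // Remove every trace of the transaction so neither map leaks entries.
  void purge(long txID) {
    transactionRetryCountMap.remove(txID);
    transactionToDNsCommitMap.remove(txID);
  }

  int trackedEntries() {
    return transactionRetryCountMap.size() + transactionToDNsCommitMap.size();
  }

  public static void main(String[] args) {
    TransactionStateCleanup state = new TransactionStateCleanup();
    state.recordRetry(1L);
    state.recordCommit(1L, UUID.randomUUID());
    System.out.println(state.trackedEntries()); // 2
    state.purge(1L);
    System.out.println(state.trackedEntries()); // 0
  }
}
```

Keeping the two removals in one place makes it harder for a later code path to clear one map and forget the other.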

amaliujia · 2021-01-22T03:47:38Z

@lokeshj1703 your suggestion was actually great: simple and easy.

Now this PR changes less code than before.

amaliujia · 2021-01-22T03:49:15Z

hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/block/TestDeletedBlockLog.java

Checkstyle complains that this line exceeds the 80-character limit, so I made this change.

amaliujia · 2021-01-25T22:15:23Z

@lokeshj1703 PR rebased and conflicts resolved. Any further comments?

lokeshj1703

@amaliujia Thanks for updating the PR! The changes look good to me. +1.

lokeshj1703 · 2021-01-27T07:41:59Z

@amaliujia Thanks for the contribution! I have committed the PR to master branch.

HDDS-4708. Optimization: update RetryCount less frequently (update once per ~100)

f5f3558

amaliujia force-pushed the HDDS-4708 branch from 92c9bc0 to f5f3558 Compare January 25, 2021 18:56

lokeshj1703 approved these changes Jan 27, 2021

lokeshj1703 pushed a commit that referenced this pull request Jan 27, 2021

HDDS-4708. Optimization: update RetryCount less frequently (update once per ~100) (#1805)

39027e4

lokeshj1703 closed this Jan 27, 2021

amaliujia deleted the HDDS-4708 branch January 27, 2021 18:39
