HDDS-4308. Fix issue with quota update #1489

captainzmc · 2020-10-13T07:45:47Z

What changes were proposed in this pull request?

Currently, volumeArgs is obtained with getCacheValue, and that same object is put into the doubleBuffer. This can cause an issue.

Consider the following scenario:

Initial VolumeArgs quotaBytes -> 10000

T1 -> Update VolumeArgs, subtract 1000, and put the updated volumeArgs into the DoubleBuffer.
T2 -> Update VolumeArgs and subtract 2000; this update has not yet been flushed by the double buffer.

After both transactions are flushed, the DB should have 7000 as bytes used.

Now the double buffer picks up T1 and commits it. Because the object put into the double buffer is the cached object, it already contains T2's update, so T1's flush writes bytesUsed = 7000 to the DB.

Now the OM restarts, and the DB only contains transactions up to T1. (We get this information from the TransactionInfo table; see https://issues.apache.org/jira/browse/HDDS-3685.)

T2 is then replayed because it was not committed to the DB, so 2000 is subtracted again and the DB ends up with 5000.

But after T2 the value should be 7000, so the DB is left in an incorrect state.

Issue here:

Because we put the same cached object into the double buffer, a flush can persist updates from later transactions that are not yet recorded as committed, which leads to this kind of issue.
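To make the race concrete, here is a minimal, self-contained Java sketch that replays the T1/T2 scenario above. VolumeArgsSketch, DoubleBufferScenario, and copyObject() are hypothetical simplifications (a single counter, and a list standing in for the double buffer), not the real OM classes. It contrasts buffering the cached object itself with buffering a defensive copy; the copy is only one way to break the aliasing and is not necessarily what this patch does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for OmVolumeArgs; only the counter from the scenario is modeled.
final class VolumeArgsSketch {
  private long quotaBytes;

  VolumeArgsSketch(long quotaBytes) {
    this.quotaBytes = quotaBytes;
  }

  long getQuotaBytes() {
    return quotaBytes;
  }

  void subtract(long bytes) {
    quotaBytes -= bytes;
  }

  // Snapshot that later mutations of the cached object cannot change.
  VolumeArgsSketch copyObject() {
    return new VolumeArgsSketch(quotaBytes);
  }
}

final class DoubleBufferScenario {
  /**
   * Replays the T1/T2 scenario: both updates hit the cached object, only T1's
   * double-buffer entry is flushed before a restart, and T2 is replayed
   * against the flushed DB value. Returns the final DB value.
   */
  static long run(boolean copyBeforeBuffering) {
    VolumeArgsSketch cached = new VolumeArgsSketch(10000); // table cache entry
    List<VolumeArgsSketch> doubleBuffer = new ArrayList<>();

    cached.subtract(1000); // T1
    doubleBuffer.add(copyBeforeBuffering ? cached.copyObject() : cached);
    cached.subtract(2000); // T2: applied to the cache, not yet flushed

    long db = doubleBuffer.get(0).getQuotaBytes(); // flush T1, then the OM restarts

    VolumeArgsSketch reloaded = new VolumeArgsSketch(db);
    reloaded.subtract(2000); // T2 replayed after restart
    return reloaded.getQuotaBytes();
  }

  public static void main(String[] args) {
    System.out.println("same cached object: " + run(false));
    System.out.println("copy before buffering: " + run(true));
  }
}
```

With the shared cached object, run(false) returns 5000, the incorrect value described above; when a copy is buffered, run(true) returns the expected 7000.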

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-4308

How was this patch tested?

Used the existing unit tests.

captainzmc · 2020-10-14T06:13:16Z

Hi @bharatviswa504, could you help review this PR?

captainzmc · 2020-10-27T09:55:42Z

Hi @linyiqun, I modified the implementation based on the latest comments. Could you help review this PR?

...p-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/key/OMKeyRequest.java

captainzmc · 2020-10-30T01:59:54Z

Hi @bharatviswa504 @linyiqun, this PR has been updated. Could you help review it?

linyiqun

Hi @captainzmc, please take a look at the latest comments.

linyiqun · 2020-10-30T14:31:44Z

hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/helpers/OmVolumeArgs.java

The method name setUsedBytes is confusing; can we rename it to incrUsedBytes(long bytes)?
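A hypothetical sketch of the semantic difference behind this rename: a setter overwrites the value, while an increment applies a delta (negative on delete). UsedBytesCounter is illustrative only and is not the OmVolumeArgs API.

```java
// Hypothetical counter class illustrating the naming question; not the actual
// OmVolumeArgs or OmBucketInfo API.
final class UsedBytesCounter {
  private long usedBytes;

  long getUsedBytes() {
    return usedBytes;
  }

  // Absolute setter: the name implies the caller supplies the final value.
  void setUsedBytes(long usedBytes) {
    this.usedBytes = usedBytes;
  }

  // Relative update: callers pass a delta, positive on write, negative on delete.
  void incrUsedBytes(long bytes) {
    this.usedBytes += bytes;
  }
}
```

For example, incrUsedBytes(4096) followed by incrUsedBytes(-1024) leaves getUsedBytes() at 3072.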

...ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/file/OMFileCreateRequest.java

...java/org/apache/hadoop/ozone/om/request/s3/multipart/S3MultipartUploadCommitPartRequest.java

...ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/file/OMFileCreateRequest.java

linyiqun

Thanks for addressing the comments, @captainzmc! I left one minor comment.
@bharatviswa504, does the current fix make sense to you?

...ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/file/OMFileCreateRequest.java

linyiqun

LGTM +1.
Let's wait for others to look at the latest change as well. :)

bharatviswa504

If we use the volume lock for key operations, we serialize all bucket operations across the volume, which has a major performance impact. (I understand why we need it; I am just wondering whether there is some way to avoid it.)

So, some general questions on quota:

Can we skip updating volume bytes used during key operations, and instead calculate it from all the bucket info when it is required? Taking the volume lock affects key operations.
During bucket creation, why can't we check whether the buckets created exceed the volume quota and, if so, fail the bucket creation? That way, during key operations we don't need to check volume bytesUsed; checking bucket bytesUsed is enough.

Example:
volume quota 100 MB

Bucket1 90MB success
Bucket2 20MB fail (as the total volume quota is 100MB)

With this approach, we don't need to check the volume quota during key ops. (It has an impact only during quota set operations, which are not frequent on the cluster.)

Is there a way to disable the quota feature? My concern is that after an upgrade, key creation updates bytesUsed by default, and when quota is not required we shouldn't need to update bytesUsed. (I see that we check this in checkQuota, but not when updating bytesUsed.)
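The bucket-creation check proposed above can be sketched as follows, assuming every bucket has an explicit quota in bytes. BucketQuotaAdmission and canCreateBucket are hypothetical names, not existing OM code. With the 100 MB volume example, creating Bucket1 (90MB) is accepted and Bucket2 (20MB) is rejected.

```java
import java.util.List;

// Hypothetical helper sketching the check proposed for bucket creation.
// Assumes every bucket has an explicit quota in bytes.
final class BucketQuotaAdmission {
  static final long MB = 1024L * 1024L;

  /**
   * Returns true if a bucket with newBucketQuota can be created without the
   * sum of all bucket quotas exceeding volumeQuota.
   * Math.addExact throws ArithmeticException instead of silently overflowing.
   */
  static boolean canCreateBucket(long volumeQuota, List<Long> existingBucketQuotas,
      long newBucketQuota) {
    long total = newBucketQuota;
    for (long quota : existingBucketQuotas) {
      total = Math.addExact(total, quota);
    }
    return total <= volumeQuota;
  }
}
```

Because this check only runs when a bucket is created or a quota is set, key operations never need to consult the volume.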

captainzmc · 2020-11-04T03:31:48Z

Thanks to @bharatviswa504 for the optimization suggestions.

Is there a way to disable the quota feature? My concern is that after an upgrade, key creation updates bytesUsed by default, and when quota is not required we shouldn't need to update bytesUsed. (I see that we check this in checkQuota, but not when updating bytesUsed.)

bytesUsed should always be updated. Suppose quota is not enabled when a bucket is created, and is enabled only after some data has been written. At that point we need to know how much bytesUsed had accumulated before quota was enabled so that we can update it correctly.
In addition to quota, displaying bytesUsed on buckets also gives users a more intuitive view of current bucket data usage, similar to the existing bytesUsed of containers.

Can we skip updating volume bytes used during key operations, and instead calculate it from all the bucket info when it is required? Taking the volume lock affects key operations.

During bucket creation, why can't we check whether the buckets created exceed the volume quota and, if so, fail the bucket creation? That way, during key operations we don't need to check volume bytesUsed; checking bucket bytesUsed is enough.

Agreed. I think this is a better way to avoid the current use of the volume lock. I will update this PR accordingly as soon as possible. cc @linyiqun

linyiqun · 2020-11-04T10:38:44Z

Also +1 for the no-volume-lock approach, aggregating the bucket bytes used instead.
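A minimal sketch of aggregating bucket usage on demand, under the assumption that each bucket keeps its own usedBytes. VolumeUsageAggregator, BucketUsage, and volumeUsedBytes are hypothetical names for illustration.

```java
import java.util.List;

// Hypothetical sketch: volume usage derived from its buckets on demand.
final class VolumeUsageAggregator {
  // Minimal bucket view; only usedBytes matters here.
  static final class BucketUsage {
    final String name;
    final long usedBytes;

    BucketUsage(String name, long usedBytes) {
      this.name = name;
      this.usedBytes = usedBytes;
    }
  }

  // Key operations only update bucket counters; the volume total is computed
  // when requested, so there is no volume-level counter to lock on writes.
  static long volumeUsedBytes(List<BucketUsage> buckets) {
    return buckets.stream().mapToLong(b -> b.usedBytes).sum();
  }
}
```

The trade-off is that reading volume usage costs a scan over its buckets, which is acceptable because it is far less frequent than key writes.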

bytesUsed should always be updated. Suppose quota is not enabled when a bucket is created, and is enabled only after some data has been written. At that point we need to know how much bytesUsed had accumulated before quota was enabled so that we can update it correctly.

Agreed: bytes used is still needed, but we don't need to do any quota check if there is a way to disable quota.
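A hypothetical sketch of the behavior agreed here: bytesUsed is always updated, but the quota check only runs when a quota is set. QUOTA_NOT_SET, BucketQuotaCheck, and allocate are illustrative assumptions; the real OM code reports quota violations through its own exception types rather than IllegalStateException.

```java
// Hypothetical sketch of "always track bytesUsed, check quota only if enabled".
// QUOTA_NOT_SET is an illustrative sentinel, not necessarily Ozone's constant.
final class BucketQuotaCheck {
  static final long QUOTA_NOT_SET = -1L;

  private final long quotaInBytes;
  private long usedBytes;

  BucketQuotaCheck(long quotaInBytes) {
    this.quotaInBytes = quotaInBytes;
  }

  long getUsedBytes() {
    return usedBytes;
  }

  /** Rejects the write if an enabled quota would be exceeded; otherwise records it. */
  void allocate(long bytes) {
    boolean quotaEnabled = quotaInBytes != QUOTA_NOT_SET;
    if (quotaEnabled && usedBytes + bytes > quotaInBytes) {
      throw new IllegalStateException("Quota exceeded: used=" + usedBytes
          + ", requested=" + bytes + ", quota=" + quotaInBytes);
    }
    // Usage is recorded even when quota is disabled, so enabling quota later
    // starts from an accurate bytesUsed.
    usedBytes += bytes;
  }
}
```

A rejected write leaves usedBytes unchanged, so a failed key commit does not skew the counter.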

captainzmc · 2020-11-05T16:08:23Z

Hi @bharatviswa504 @linyiqun. Based on what we discussed yesterday, I revised the PR. Can you take another look?
The changes are as follows:

Removed the dependency on volume usedBytes.
The existing logic already ensures that the total of the bucket quotas does not exceed the volume quota, so that part does not need to change. Also, quota is not checked when it is not enabled.
Modified the relevant unit tests.

runzhiwang · 2020-11-19T12:44:05Z

We no longer need to update volume usedBytes in the Response, so you can also delete the logic that updates it there.

runzhiwang · 2020-11-19T12:44:21Z

Now that we no longer need the volume's usedBytes, you need to update the contents of quota.md accordingly.

runzhiwang · 2020-11-19T12:46:07Z

Overall LGTM. Just a couple of minor suggestions.

captainzmc · 2020-11-20T16:00:39Z

The issues have been fixed. @runzhiwang, can you take another look?

.../ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/key/OMKeysDeleteRequest.java

hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/helpers/OmBucketInfo.java

...e/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/key/OMKeyCommitRequest.java

...e/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/key/OMKeyDeleteRequest.java

captainzmc · 2020-11-24T12:41:37Z

Thanks for @runzhiwang's review. The issues have been fixed.

runzhiwang

LGTM.

runzhiwang · 2020-11-25T02:02:00Z

@captainzmc thanks for the patch. @bharatviswa504 @linyiqun thanks for the review. I have merged the patch.

linyiqun · 2020-11-25T03:22:53Z

@captainzmc and @runzhiwang, I noticed that we also removed the function below:

  protected void checkVolumeQuotaInBytes(OmVolumeArgs omVolumeArgs,
      long allocateSize) throws IOException {
    ...
  }

So where will we now check the volume's quota usage? There should be one place that sums the usage of the volume's buckets and then does the quota check.

captainzmc · 2020-11-25T03:41:38Z

Thanks for @linyiqun's attention. We no longer need to check the volume quota: when bucket and volume quotas are set, we already ensure that the sum of all bucket quotas does not exceed the volume quota. Therefore, when writing a key under a bucket, we only need to check the bucket quota; as long as the bucket quota check passes, the volume quota cannot be exceeded.
This is what Bharat suggested.

linyiqun · 2020-11-25T05:28:31Z

Sounds good, thanks @captainzmc for the explanation.

captainzmc force-pushed the fix-update-usage branch from 75a47a6 to e6a0ecd on October 13, 2020 12:57

cxorm self-requested a review October 13, 2020 12:59

captainzmc force-pushed the fix-update-usage branch from e6a0ecd to f3744e7 on October 14, 2020 02:07

linyiqun reviewed Oct 27, 2020

...p-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/key/OMKeyRequest.java (outdated)

captainzmc force-pushed the fix-update-usage branch 3 times, most recently from dcfdb76 to dfc8e50 on October 29, 2020 13:16

linyiqun reviewed Oct 30, 2020

captainzmc force-pushed the fix-update-usage branch from b5a50df to a3c5ba2 on November 2, 2020 12:05

linyiqun reviewed Nov 2, 2020

...ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/file/OMFileCreateRequest.java

linyiqun reviewed Nov 2, 2020

...ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/file/OMFileCreateRequest.java

linyiqun approved these changes Nov 3, 2020

bharatviswa504 reviewed Nov 3, 2020

Fix issue with quota update

d9d17aa

captainzmc force-pushed the fix-update-usage branch from 29c4852 to d9d17aa on November 5, 2020 14:44

trigger new CI check

357d513

captainzmc requested a review from bharatviswa504 November 9, 2020 02:08

captainzmc mentioned this pull request Nov 11, 2020

HDDS-4375. OM changes the block length when receives truncate request #1517

Closed

fix review issues.

f4428bf

captainzmc requested a review from runzhiwang November 20, 2020 15:59

runzhiwang reviewed Nov 24, 2020

.../ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/key/OMKeysDeleteRequest.java (outdated)

runzhiwang reviewed Nov 24, 2020

hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/helpers/OmBucketInfo.java (outdated)

runzhiwang reviewed Nov 24, 2020

...e/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/key/OMKeyCommitRequest.java

runzhiwang reviewed Nov 24, 2020

...e/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/key/OMKeyDeleteRequest.java (outdated)

fix review issues.

f9831c4

runzhiwang approved these changes Nov 25, 2020

runzhiwang merged commit 54cca0b into apache:master Nov 25, 2020

captainzmc mentioned this pull request Nov 25, 2020

HDDS-4272. Volume namespace: add usedNamespace and update it when create and delete bucket #1445

Merged

linyiqun mentioned this pull request Nov 26, 2020

HDDS-4358: Delete : make delete an atomic operation #1607

Merged

guihecheng pushed a commit to guihecheng/ozone that referenced this pull request Mar 8, 2021

Revert cherry-pick HDDS-4308. Fix issue with quota update (apache#1489) (merge request !66) (4621298)

This reverts commit dfcd441.
