Skip to content

Conversation

@Xushaohong
Copy link
Contributor

@Xushaohong Xushaohong commented Sep 26, 2022

What changes were proposed in this pull request?

We found that the blocks the client committed may not be equal to the allocated(when EC stripe fails), and the uncommitted blocks would remain forever as orphan blocks as no key maps to them anymore.

Here I use a tricky solution that considers these blocks as repeated/ overwritten key blocks, and thus they would be deleted in the existing deletion path.


https://docs.google.com/document/d/1Oi1zPKdmvA7DFwIcUWz6zw_p-pY3D1L76gyV37jQT94/edit#

The prototype of the design.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-7258

How was this patch tested?

UT and Manual test in the test environment.

@sodonnel
Copy link
Contributor

Could you please describe how these blocks are being detected in the PR description - like a small design overview?

@Xushaohong
Copy link
Contributor Author

Could you please describe how these blocks are being detected in the PR description - like a small design overview?

Sure, will update later, I am working on fixing some issues found by UT.

@Xushaohong
Copy link
Contributor Author

@Xushaohong Xushaohong marked this pull request as draft September 29, 2022 08:49
@Xushaohong Xushaohong marked this pull request as ready for review October 8, 2022 06:19
@kaijchen
Copy link
Member

kaijchen commented Oct 8, 2022

Thanks for the work @Xushaohong.
Is it possible to delete those blocks directly instead of relying on OpenKeyCleanupService?
I'm worrying about too much coupling here.

@Xushaohong
Copy link
Contributor Author

Thx for the question~ @kaijchen
The case for OpenKeyCleanupService is not related to the logic here. They should be two cases as I mentioned in the DOC.

@kaijchen
Copy link
Member

kaijchen commented Oct 8, 2022

Thx for the question~ @kaijchen
The case for OpenKeyCleanupService is not related to the logic here. They should be two cases as I mentioned in the DOC.

Sorry I misread your purposal. The pseudoKeyInfo is moved into DeleteKeyTable
instead of staying at OpenKeyTable, which sounds reasonable to me.

Copy link
Member

@kaijchen kaijchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work @Xushaohong, mostly LGTM.
I have a few improvements inlined.

@Xushaohong
Copy link
Contributor Author

Hi, all! I have done the refactor of this PR. Could you take a look ~

Copy link
Contributor

@errose28 errose28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix @Xushaohong this is a good improvement to have.

Comment on lines 198 to 206
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the purpose of this new getKeyLocation(int, int) method is to create block lists with some overlap. However the client should never be able to commit blocks that were not already allocated in the open key. See #2108, which modified OMKeyInfo#verifyAndGetKeyLocations to drop the extra blocks and added a corresponding unit test in this class. Therefore in this test I think we only need to test the client committing a subset of the blocks it allocated. We could use the original getKeyLocation(int) method to make the allocated list, copy this, and just remove a block or two to create the commit list without needing a new method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx @errose28 for the review!
Actually committing the unknown block from the client side is still possible, #2108 this work is on the server side.
But I have done some improvements like what you said, PTAL~

@kaijchen
Copy link
Member

Thanks @Xushaohong for updating the patch. Filtering uncommitted blocks in verifyAndGetKeyLocations is a good idea. For clearity, it's better to return a Pair<updatedBlockLocations, uncommittedBlocks> instead, or create a inner class.

Copy link
Contributor

@errose28 errose28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates @Xushaohong. Looks good overall just some minor comments left.

@Xushaohong Xushaohong requested review from errose28 and removed request for adoroszlai, kerneltime and sodonnel October 24, 2022 10:33
Copy link
Member

@kaijchen kaijchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @Xushaohong for updating this PR.

Copy link
Contributor

@errose28 errose28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the improvement @Xushaohong lgtm

@errose28 errose28 merged commit dfc13a0 into apache:master Oct 25, 2022
@Xushaohong
Copy link
Contributor Author

@kaijchen @errose28 Thx for the review! :)

kaijchen pushed a commit to kaijchen/ozone that referenced this pull request Oct 26, 2022
errose28 added a commit to errose28/ozone that referenced this pull request Oct 26, 2022
* master: (718 commits)
  HDDS-7342. Move encryption-related code from MultipartCryptoKeyInputStream to OzoneCryptoInputStream (apache#3852)
  HDDS-7413. Fix logging while marking container state unhealthy (apache#3887)
  Revert "HDDS-7253. Fix exception when '/' in key name (apache#3774)"
  HDDS-7396. Force close non-RATIS containers in ReplicationManager (apache#3877)
  HDDS-7121. Support namespace summaries (du, dist & counts) for legacy FS buckets (apache#3746)
  HDDS-7258. Cleanup the allocated but uncommitted blocks (apache#3778)
  HDDS-7381. Cleanup of VolumeManagerImpl (apache#3873)
  HDDS-7253. Fix exception when '/' in key name (apache#3774)
  HDDS-7182. Add property to control RocksDB max open files (apache#3843)
  HDDS-7284. JVM crash for rocksdb for read/write after close (apache#3801)
  HDDS-7368. [Multi-Tenant] Add Volume Existence check in preExecute for OMTenantCreateRequest (apache#3869)
  HDDS-7403. README Security Improvement (apache#3879)
  HDDS-7199. Implement new mix workload Read/Write Freon command (apache#3872)
  HDDS-7248. Recon: Expand the container status page to show all unhealthy container states (apache#3837)
  HDDS-7141. Recon: Improve Disk Usage Page (apache#3789)
  HDDS-7369. Fix wrong order of command arguments in Nonrolling-Upgrade.md (apache#3866)
  HDDS-6210. EC: Add EC metrics (apache#3851)
  HDDS-7355. non-primordial scm fail to get signed cert from primordial SCM when converting an unsecure cluster to secure (apache#3859)
  HDDS-7356. Update SCM-HA.zh.md to match the English version (apache#3861)
  HDDS-6930. SCM,OM,RECON should not print ERROR and exit with code 1 on successful shutdown (apache#3848)
  ...

Conflicts:
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/LegacyReplicationManager.java
hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/replication/TestLegacyReplicationManager.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants