-
Notifications
You must be signed in to change notification settings - Fork 588
HDDS-8385. Ozone can't process snapshot when service UID > 2097151. #4580
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
jojochuang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I'm wondering if the same issue could occur in other places of the code.
TarArchiveOutputStream is used by SCM checkpoint and datanode container re-replication too.
|
@jojochuang I considered that but because I'm not too familiar with those code paths decided not to. Do you want me to? |
|
I think we should. The logic is the same so straightforward. Otherwise It would be really hard for users to troubleshoot a container re-replication failure should it happen. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @GeorgeJahad for fixing this issue.
FYI, here's all the usage of TarArchiveOutputStream in Ozone:
$ grep -rwl TarArchiveOutputStream . --include '*.java'
./hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OMDBCheckpointServlet.java
./hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/ReconUtils.java
./hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/TarContainerPacker.java
./hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/keyvalue/TestTarContainerPacker.java
./hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/HddsServerUtil.java
jojochuang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
kaijchen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* master: (440 commits) HDDS-8445. Move PlacementPolicy back to SCM (apache#4588) HDDS-8335. ReplicationManager: EC Mis and Under replication handlers should handle overloaded exceptions (apache#4593) HDDS-8355. Intermittent failure in TestOMRatisSnapshots#testInstallSnapshot (apache#4592) HDDS-8444. Increase timeout of CI build (apache#4586) HDDS-8446. Selective checks: handle change in ci.yaml (apache#4587) HDDS-8440. Ozone Manager crashed with ClassCastException when deleting FSO bucket. (apache#4582) HDDS-7309. Enable by default GRPC between S3G and OM (apache#3820) HDDS-8458. Mark TestBlockDeletion#testBlockDeletion as flaky HDDS-8385. Ozone can't process snapshot when service UID > 2097151 (apache#4580) HDDS-8424: Preserve legacy bucket getKeyInfo behavior (apache#4576) HDDS-8453. Mark TestDirectoryDeletingServiceWithFSO#testDirDeletedTableCleanUpForSnapshot as flaky HDDS-8137. [Snapshot] SnapDiff to use tombstone entries in SST files (apache#4376) HDDS-8270. Measure checkAccess latency for Ozone objects (apache#4467) HDDS-8109. Seperate Ratis and EC MisReplication Handling (apache#4577) HDDS-8429. Checkpoint is not closed properly in OMDBCheckpointServlet (apache#4575) HDDS-8253. Set ozone.metadata.dirs to temporary dir if not defined in S3 Gateway (apache#4455) HDDS-8400. Expose rocksdb last sequence number through metrics (apache#4557) HDDS-8333. ReplicationManager: Allow partial EC reconstruction if insufficient nodes available (apache#4579) HDDS-8147. Introduce latency metrics for S3 Gateway operations (apache#4383) HDDS-7908. Support OM Metadata operation Generator in `Ozone freon` (apache#4251) ...
What changes were proposed in this pull request?
Fix OM HA RatisSnapshot creation to support large group ID's
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-8385
How was this patch tested?
Reproduced the problem by setting the group id of one of the snapshot files like so:
Then confirmed the problem no longer occurs after the fix