@GeorgeJahad
Contributor

@GeorgeJahad GeorgeJahad commented Apr 17, 2023

What changes were proposed in this pull request?

Fix OM HA RatisSnapshot creation to support large group IDs.

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-8385

How was this patch tested?

Reproduced the problem by setting the group id of one of the snapshot files like so:

chgrp 1301600003 omNode-1/db.snapshots/checkpointState/om.db-c3e20908-b7ea-4650-a538-352c832ceeaa/000051.sst

Then confirmed that the problem no longer occurs after the fix.
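
For context on why a group ID like the one above triggers the failure, here is a minimal sketch of the arithmetic, assuming the classic tar header layout in which the uid and gid fields are 8 bytes wide and hold 7 octal digits plus a terminator. The class and method names (`TarGidLimit`, `maxOctalFieldValue`, `fitsInOctalField`) are hypothetical and are not part of Ozone or Commons Compress. The sketch does not show the change made in this PR.

```java
public class TarGidLimit {
    // Assumed width in bytes of the uid/gid fields in a classic tar header.
    static final int ID_FIELD_BYTES = 8;

    // Largest value that fits in a field of the given width when the value is
    // stored as octal digits followed by one terminator byte.
    static long maxOctalFieldValue(int fieldBytes) {
        int digits = fieldBytes - 1;      // one byte is reserved for the terminator
        return (1L << (3 * digits)) - 1;  // each octal digit carries 3 bits
    }

    // True if the value can be written into the field without overflowing it.
    static boolean fitsInOctalField(long value, int fieldBytes) {
        return value >= 0 && value <= maxOctalFieldValue(fieldBytes);
    }

    public static void main(String[] args) {
        long limit = maxOctalFieldValue(ID_FIELD_BYTES);
        long testGid = 1301600003L; // the group ID used in the chgrp reproduction

        System.out.println("limit = " + limit
            + " (octal " + Long.toOctalString(limit) + ")");
        System.out.println("gid " + testGid + " needs "
            + Long.toOctalString(testGid).length() + " octal digits");
        System.out.println("fits = " + fitsInOctalField(testGid, ID_FIELD_BYTES));
    }
}
```

Under that assumption the limit works out to 2097151, which matches the threshold named in the HDDS-8385 title. Commons Compress exposes `setBigNumberMode` on `TarArchiveOutputStream` (for example with `BIGNUMBER_POSIX`) for values beyond this range.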

@neils-dev neils-dev added the gr label Apr 17, 2023
Contributor

@jojochuang jojochuang left a comment

LGTM. I'm wondering if the same issue could occur in other places of the code.
TarArchiveOutputStream is used by SCM checkpoint and datanode container re-replication too.

@neils-dev neils-dev added the snapshot https://issues.apache.org/jira/browse/HDDS-6517 label Apr 17, 2023
@GeorgeJahad
Contributor Author

GeorgeJahad commented Apr 18, 2023

@jojochuang I considered that, but because I'm not too familiar with those code paths, I decided not to. Do you want me to?

@jojochuang
Contributor

I think we should. The logic is the same, so the change should be straightforward. Otherwise, it would be really hard for users to troubleshoot a container re-replication failure should it happen.

Member

@kaijchen kaijchen left a comment

Thanks @GeorgeJahad for fixing this issue.

FYI, here are all the usages of TarArchiveOutputStream in Ozone:

$ grep -rwl TarArchiveOutputStream . --include '*.java'
./hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OMDBCheckpointServlet.java
./hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/ReconUtils.java
./hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/TarContainerPacker.java
./hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/keyvalue/TestTarContainerPacker.java
./hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/HddsServerUtil.java

Contributor

@jojochuang jojochuang left a comment

+1

Member

@kaijchen kaijchen left a comment

LGTM

@kaijchen kaijchen merged commit 97b18c5 into apache:master Apr 19, 2023
errose28 added a commit to errose28/ozone that referenced this pull request Apr 20, 2023
* master: (440 commits)
  HDDS-8445. Move PlacementPolicy back to SCM (apache#4588)
  HDDS-8335. ReplicationManager: EC Mis and Under replication handlers should handle overloaded exceptions (apache#4593)
  HDDS-8355. Intermittent failure in TestOMRatisSnapshots#testInstallSnapshot (apache#4592)
  HDDS-8444. Increase timeout of CI build (apache#4586)
  HDDS-8446. Selective checks: handle change in ci.yaml (apache#4587)
  HDDS-8440. Ozone Manager crashed with ClassCastException when deleting FSO bucket. (apache#4582)
  HDDS-7309. Enable by default GRPC between S3G and OM (apache#3820)
  HDDS-8458. Mark TestBlockDeletion#testBlockDeletion as flaky
  HDDS-8385. Ozone can't process snapshot when service UID > 2097151 (apache#4580)
  HDDS-8424: Preserve legacy bucket getKeyInfo behavior (apache#4576)
  HDDS-8453. Mark TestDirectoryDeletingServiceWithFSO#testDirDeletedTableCleanUpForSnapshot as flaky
  HDDS-8137. [Snapshot] SnapDiff to use tombstone entries in SST files (apache#4376)
  HDDS-8270. Measure checkAccess latency for Ozone objects (apache#4467)
  HDDS-8109. Seperate Ratis and EC MisReplication Handling (apache#4577)
  HDDS-8429. Checkpoint is not closed properly in OMDBCheckpointServlet (apache#4575)
  HDDS-8253. Set ozone.metadata.dirs to temporary dir if not defined in S3 Gateway (apache#4455)
  HDDS-8400. Expose rocksdb last sequence number through metrics (apache#4557)
  HDDS-8333. ReplicationManager: Allow partial EC reconstruction if insufficient nodes available (apache#4579)
  HDDS-8147. Introduce latency metrics for S3 Gateway operations (apache#4383)
  HDDS-7908. Support OM Metadata operation Generator in `Ozone freon` (apache#4251)
  ...
Labels

snapshot https://issues.apache.org/jira/browse/HDDS-6517

4 participants