Skip to content

Conversation

@peterxcli
Copy link
Member

What changes were proposed in this pull request?

HDDS-9930 added test cases for single key deletion testHSyncDeletedKey. But not for batch key deletion (see PR description checkbox).

Batch key deletion tests remain to be added (in TestHSync or in a new test class).

New test should ensure the behavior of OMKeysDeleteRequest and OmKeysDeleteRequestWithFSO is correct.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-10220

How was this patch tested?

Add test case in TestHSync to cover HDDS-9930 batch key deletion

CI: https://github.com/peterxcli/ozone/actions/runs/13100233276

@peterxcli
Copy link
Member Author

I was confusing why the delete keys request works fine with Legacy layout even if the request map like this:

// DeleteKeys
addRequestClass(Type.DeleteKeys,
OMKeysDeleteRequest.class,
BucketLayout.OBJECT_STORE);
addRequestClass(Type.DeleteKeys,
OmKeysDeleteRequestWithFSO.class,
BucketLayout.FILE_SYSTEM_OPTIMIZED);

Till I see these😅:

static void addRequestClass(Type requestType,
Class<? extends OMKeyRequest>
requestClass,
BucketLayout associatedBucketLayout) {
// If the request class is associated to FSO bucket layout,
// we add a suffix to its key in the map.
OM_KEY_REQUEST_CLASSES.put(getKey(requestType, associatedBucketLayout),
requestClass);
}

static String getKey(Type requestType, BucketLayout bucketLayout) {
return requestType.toString() +
(bucketLayout.isFileSystemOptimized() ? bucketLayout : "");
}

@peterxcli peterxcli marked this pull request as ready for review February 3, 2025 02:11
@adoroszlai adoroszlai requested a review from smengcl February 3, 2025 10:03
@adoroszlai adoroszlai added the test label Feb 3, 2025
Comment on lines 625 to 630
try {
fs.open(new Path(dir, key));
fail("HSync should throw because the open key is gone");
} catch (FileNotFoundException ex) {
// expected
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this style of test should only exist in the legacy code. Please use assertThrows() for new code

Suggested change
try {
fs.open(new Path(dir, key));
fail("HSync should throw because the open key is gone");
} catch (FileNotFoundException ex) {
// expected
}
assertThrows(FileNotFoundException.class,
() -> fs.open(new Path(dir, key)));

@peterxcli peterxcli requested a review from jojochuang February 5, 2025 03:15
os.write(1);
os.hsync();
openKeys.add(keyName);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok sorry now I recall what we wanted to cover in this test case.

We want to open the files, write, hsync; while it's still open, delete the keys.

The try with resources pattern closes the file after exiting this block, so that's not what we want.

Copy link
Member Author

@peterxcli peterxcli Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jojochuang Thanks for the reminder.
I update the test to not use try with resources pattern, and it failed to receive the exception from the second hsync operation after the keys are deleted. I'm still investigating whether the root cause is my incorrect test or an issue in #6472

@ParameterizedTest
@MethodSource("bucketLayouts")
public void testBatchKeyDeletionWithHSync(BucketLayout bucketLayout) throws Exception {
  // Create a bucket with the specified layout
  OzoneBucket testBucket = TestDataUtil.createVolumeAndBucket(client, bucketLayout);
  
  final String rootPath = String.format("%s://%s/",
      OZONE_OFS_URI_SCHEME, CONF.get(OZONE_OM_ADDRESS_KEY));
  CONF.set(CommonConfigurationKeysPublic.FS_DEFAULT_NAME_KEY, rootPath);

  final String dir = OZONE_ROOT + testBucket.getVolumeName()
      + OZONE_URI_DELIMITER + testBucket.getName();

  // Create multiple files, some with hsync
  final int numFiles = 5;
  Map<String, FSDataOutputStream> openKeysOSMap = new HashMap<>();

  try (FileSystem fs = FileSystem.get(CONF)) {
    for (int i = 0; i < numFiles; i++) {
      String keyName = "batch-key-" + i;
      Path file = new Path(dir, keyName);
      FSDataOutputStream os = fs.create(file, true);
      os.write(1);
      os.hsync();
      openKeysOSMap.put(keyName, os);
    }

    try {
      testBucket.deleteKeys(new ArrayList<>(openKeysOSMap.keySet()));
    } catch (Exception ex) {
      fail("deleteKeys should not throw exception, but got " + ex);
    }

    for (Map.Entry<String, FSDataOutputStream> entry : openKeysOSMap.entrySet()) {
      // test if the os is not closed at client side
      OMException exception = assertThrows(OMException.class,
          () -> entry.getValue().hsync(), "hsync should throw because the key " + entry.getKey() + " is deleted");
      assertEquals(OMException.ResultCodes.KEY_NOT_FOUND, exception.getResult());

      // os.close() throws OMException because the key is deleted
      exception = assertThrows(OMException.class,
          () -> entry.getValue().close());
      assertEquals(OMException.ResultCodes.KEY_NOT_FOUND, exception.getResult());
    }
  }
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like a product bug to me. Potentially something inside OmKeysDeleteRequestWithFSO.

Note: we have two separate request types for handling single delete and batch deletes. So mistakes can happen.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some logs to the existing testHsync#testHSyncDeletedKey and found the following:

// delete key
(OMKeyDeleteResponseWithFSO.java:addToDBBatch(126)) - Write open key: /volume42541/bucket06654/key-hsync-del/113980087550083073 to DB with metadata: {hsyncClientId=113980087550083073, deletedHsyncKey=true}

// second hsync
OMKeyCommitRequestWithFSO.java:validateAndUpdateCache(181)) - Read open key: /-9223372036854775552/-9223372036854775040/-9223372036854775040/key-hsync-del/113980087550083073 from DB with metadata: {hsyncClientId=113980087550083073}

It appears that the key /-9223372036854775552/-9223372036854775040/-9223372036854775040/key-hsync-del/113980087550083073 was written during the first hsync process:

omMetadataManager.getOpenKeyTable(getBucketLayout()).putWithBatch(
batchOperation, getOpenKeyName(), getNewOpenKeyInfo());

Given this, I believe we should modify the approach to pass the complete key name along with the desired open key/file info metadata.

I’d love to hear your thoughts on this. I really appreciate your input, @jojochuang.

Thanks!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I added a fail(...) to test if the exception really happened

 @Test
  public void testHSyncDeletedKey() throws Exception {
    // Verify that a key can't be successfully hsync'ed again after it's deleted,
    // and that key won't reappear after a failed hsync.

    // Set the fs.defaultFS
    final String rootPath = String.format("%s://%s/",
        OZONE_OFS_URI_SCHEME, CONF.get(OZONE_OM_ADDRESS_KEY));
    CONF.set(CommonConfigurationKeysPublic.FS_DEFAULT_NAME_KEY, rootPath);

    final String dir = OZONE_ROOT + bucket.getVolumeName()
        + OZONE_URI_DELIMITER + bucket.getName();
    final Path key1 = new Path(dir, "key-hsync-del");

    try (FileSystem fs = FileSystem.get(CONF)) {
      // Create key1
      try (FSDataOutputStream os = fs.create(key1, true)) {
        os.write(1);
        os.hsync();
        fs.delete(key1, false);

        // getFileStatus should throw FNFE because the key is deleted
        assertThrows(FileNotFoundException.class,
            () -> fs.getFileStatus(key1));

        // hsync should throw because the open key is gone
        try {
          os.hsync();
+          fail("hsync should throw because the open key is gone");
        } catch (OMException omEx) {
          assertEquals(OMException.ResultCodes.KEY_NOT_FOUND, omEx.getResult());
        }
        // key1 should not reappear after failed hsync
        assertThrows(FileNotFoundException.class,
            () -> fs.getFileStatus(key1));
      } catch (OMException ex) {
        // os.close() throws OMException because the key is deleted
        assertEquals(OMException.ResultCodes.KEY_NOT_FOUND, ex.getResult());
      }
    }
  }

then it failed with:

[ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 38.015 s <<< FAILURE! - in org.apache.hadoop.fs.ozone.TestHSync
[ERROR] org.apache.hadoop.fs.ozone.TestHSync.testHSyncDeletedKey  Time elapsed: 0.804 s  <<< FAILURE!
org.opentest4j.AssertionFailedError: hsync should throw because the open key is gone
        at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:38)
        at org.junit.jupiter.api.Assertions.fail(Assertions.java:138)
        at org.apache.hadoop.fs.ozone.TestHSync.testHSyncDeletedKey(TestHSync.java:578)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
        at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)

[INFO] 
[INFO] Results:
[INFO] 
[ERROR] Failures: 
[ERROR]   TestHSync.testHSyncDeletedKey:578 hsync should throw because the open key is gone

@peterxcli
Copy link
Member Author

Hi @jojochuang, I just left a comment on another PR - #6079 (comment).
Do you think it is still necessary to add this coverage? As I guess the hsync request without new write won't be send to the server.

@adoroszlai adoroszlai requested a review from jojochuang June 5, 2025 16:44
@github-actions
Copy link

This PR has been marked as stale due to 21 days of inactivity. Please comment or remove the stale label to keep it open. Otherwise, it will be automatically closed in 7 days.

@github-actions github-actions bot added the stale label Nov 12, 2025
@github-actions
Copy link

Thank you for your contribution. This PR is being closed due to inactivity. If needed, feel free to reopen it.

@github-actions github-actions bot closed this Nov 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants