Skip to content

Conversation

@tanvipenumudy
Copy link
Contributor

@tanvipenumudy tanvipenumudy commented Jun 9, 2025

What changes were proposed in this pull request?

The PR introduces a GET /api/v1/keys/deletePending/dirs/size-summary endpoint to include totalUnreplicatedDataSize and totalReplicatedDataSize fields for tracking the total pending deletion size for FSO directories in OM.

  • totalReplicatedDataSize: Total size of directories pending deletion with expected replicated size.
  • totalUnreplicatedDataSize: Total raw size of directories pending deletion without replicated size.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-13188

How was this patch tested?

Manual Testing:

bash-5.1$ ozone sh volume create vol1
bash-5.1$ ozone sh bucket create vol1/buck1 --layout=FILE_SYSTEM_OPTIMIZED
bash-5.1$ ozone fs -mkdir -p ofs://om/vol1/buck1/dir1/dir2/dir3

# Create 1 MB file:
bash-5.1$ dd if=/dev/zero of=test.txt bs=1M count=1
bash-5.1$ ozone fs -put test.txt ofs://om/vol1/buck1/dir1/file1
bash-5.1$ ozone fs -put test.txt ofs://om/vol1/buck1/dir1/dir2/file2
bash-5.1$ ozone fs -put test.txt ofs://om/vol1/buck1/dir1/dir2/dir3/file3

Directory structure:

dir1/
    ├── file1 (1 MB)
    └── dir2/
        ├── file2 (1 MB)
        └── dir3/
            └── file3 (1 MB)

# Recursively delete top-level directory:
bash-5.1$ ozone fs -rm -r ofs://om/vol1/buck1/dir1

GET /api/v1/keys/deletePending/dirs/size-summary:

Before https://github.com/apache/ozone/pull/8568:

{
  "totalDeletedDirectories": 1,
  "totalReplicatedDataSize": 0,
  "totalUnreplicatedDataSize": 3145728
}

After https://github.com/apache/ozone/pull/8568:

{
  "totalDeletedDirectories": 1,
  "totalReplicatedDataSize": 9437184,
  "totalUnreplicatedDataSize": 3145728
}

@tanvipenumudy tanvipenumudy changed the title HDDS-13188. Track total pending deletion size for FSO directories in OM HDDS-13188. Track total pending deletion size for FSO directories in OM via Recon Jun 9, 2025
Copy link
Contributor

@aryangupta1998 aryangupta1998 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch, changes look good to me!

Copy link
Contributor

@devmadhuu devmadhuu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aryangupta1998 for the patch. Pls see comments.

@tanvipenumudy tanvipenumudy marked this pull request as draft June 10, 2025 10:57
Copy link
Contributor

@priyeshkaratha priyeshkaratha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @tanvipenumudy for the patch. Changes looks fine for me.

Copy link
Contributor

@devmadhuu devmadhuu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tanvipenumudy for the patch. Pls check comment and also correct the PR description to correct the API end point as now you have created new API - /deletePending/dirs/size

@tanvipenumudy tanvipenumudy marked this pull request as ready for review July 14, 2025 07:56
Comment on lines +636 to +685
while (iterator.hasNext()) {
Table.KeyValue<String, OmKeyInfo> kv = iterator.next();
OmKeyInfo omKeyInfo = kv.getValue();
if (omKeyInfo != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @tanvipenumudy
Currently, we iterate through the DeletedDirectoryTable, calculate the size of each deleted directory, and then find the sum total size. The issue arises when both the parent and child entries are encountered in the DeletedDirectoryTable, resulting in the same size being calculated twice.

Are we going to consider this scenario?
Or is this scenario even possible?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @ArafatKhan2198 for the review, yes for the FSO deleted directory space calculation - it is indeed possible that both the parent and child directory entries might coexist within the deletedDirectoryTable, but this wouldn't result in double counting of the child directory's size because of how Recon processes delete events today, let's consider the below two scenarios for example:

Case 1: /dir1/dir2, if dir1 (parent) is deleted first -> then /dir1 and /dir1/dir2 wouldn't coexist in the deletedDirectoryTable at any given point in time (the directories would land in the deletedDirectoryTable in order from top to bottom) -> this shouldn't cause any impact.

Case 2: /dir1/dir2, if /dir1/dir2 is deleted first following which /dir1 is deleted, then both /dir1/dir2 and /dir1 can coexist in the deletedDirectoryTable.

  • This wouldn't double-calculate the size of /dir1/dir2 when both /dir1/dir2 and /dir1 are in the deletedDirectoryTable because Recon detaches the /dir1/dir2 (child directory) from its parent (/dir1) in the NSSummaryTree as and when it sees this delete directory event.

  • Our approach would iterate over Recon's deletedDirectoryTable entries and recursively calculate the sizes of the entry's sub-directories:

    • We would first encounter /dir1/dir2 in the deletedDirectoryTable, recursively calculate the sizes of directories under it.
    • We would then encounter /dir1 in the deletedDirectoryTable, but since the child tree is detached -> we would only be recursively calculating the sizes of its existing sub-directories.

Copy link
Contributor Author

@tanvipenumudy tanvipenumudy Jul 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking another example for better clarity, consider the following directory structure:

- dir1/
  - file1 (1 GB)
  - dir4/
    - file2 (1 GB)
    - file6 (1 GB)
  - dir2/
    - file3 (1 GB)
    - dir3/
      - file4 (1 GB)
      - file5 (1 GB)

Suppose /dir1/dir2/dir3 is deleted first, followed by the deletion of /dir1:

  • First, the link between /dir1/dir2 and /dir1/dir2/dir3 would be detached.
  • As per our approach, we would iterate over the deletedDirectoryTable entries:
    • We would first encounter: /dir1/dir2/dir3 in deletedDirectoryTable
      /dir1/dir2/dir3.getSizeOfAllFiles() = 2 GB (file4 and file5)
    • We would then encounter: /dir1 in deletedDirectoryTable
      → size of /dir1 would be calculated as:
      • /dir1.sizeOfAllFiles()1 GB (file1) + recursiveSizeOfSubDirs(/dir1):
        • /dir1/dir4.sizeOfFiles()2 GB (file2 and file6) + no sub-directories.
        • /dir1/dir2.sizeOfFiles()1 GB (file3) + no sub-directories (as /dir1/dir2/dir3 is detached).
      • Total size of /dir1 = 1 GB + 2 GB + 1 GB = 4 GB.

So the total FSO deleted directory space would be: 2 GB (/dir1/dir2/dir3) + 4 GB (/dir1 with detached child: /dir1/dir2/dir3) = 6 GB.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks tanvi for the detailed explanation on this!
It makes sense.

Copy link
Contributor

@ArafatKhan2198 ArafatKhan2198 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your tremendous effort on this PR, Tanvi, and for all the insightful discussions.
There's another bug in the code that was introduced earlier and has persisted. Let me walk you through the issue.

For example, let's consider this case:

Take the following path as an example: /dir1/dir2/f1.txt

Suppose the parent directory dir2 is deleted. In this scenario, the recon component can receive events in two possible orders:

Event 1: recon first receives the deletion event for dir2.
When this happens, dir2 is moved from the directoryTable to the deletedDirectoryTable. It is also unlinked from its parent directory dir1 but remains in the NamespaceSummary, so that our method fetchSizeForDeletedDirectory can still utilize it.

Event 2: After processing the deletion of dir2, recon then receives the deletion event for f1.txt.
When this occurs, f1.txt is moved from the fileTable to the deletedTable.

Now, here's the problem:

At this point, the size of f1.txt is subtracted from the NamespaceSummary object of dir2. This means the sizeOfFiles field for dir2 becomes zero. However, the actual data blocks for f1.txt have not been deleted yet—it still resides in the deletedTable. Therefore, the space has not been reclaimed, and the deleted directory still holds space on disk.

However, when we call the method fetchSizeForDeletedDirectory, it recursively traverses the directory tree and returns the size for the deleted directory dir2 as zero. This is incorrect, because f1.txt still exists in the deletedTable and is consuming disk space.

Conclusion:

This shows that summing up file sizes using only the NamespaceSummary leads to incorrect results. Even fetching the size for a single directory can produce a wrong value if file deletions are not handled correctly.

@tanvipenumudy
Copy link
Contributor Author

Thank you, @ArafatKhan2198, for the detailed explanation of the bug we encountered in today's code and for actively working on the corresponding ticket: HDDS-13479 and PR: #8836.

As discussed offline, let's go ahead and merge the current PR into the feature branch for now. Once the fix has been merged into master, we can rebase the feature branch on top of the latest master to include the changes.

Copy link
Contributor

@ChenSammi ChenSammi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tanvipenumudy , the last patch LGTM.

Copy link
Contributor

@sumitagrawl sumitagrawl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sumitagrawl
Copy link
Contributor

HDDS-13479

We should not capture deleted files in NSSummary, it will be unlinked. Deleted files can be obtained from deletedTable.
We need not handle this here, but already reporting deleted files separately.

cc: @ArafatKhan2198 @tanvipenumudy

Copy link
Contributor

@ArafatKhan2198 ArafatKhan2198 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks tanvi!
LGTM +1

@sumitagrawl sumitagrawl merged commit c57b30a into apache:HDDS-13177 Jul 24, 2025
44 checks passed
priyeshkaratha pushed a commit to priyeshkaratha/ozone that referenced this pull request Nov 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants