@priyeshkaratha
Contributor

@priyeshkaratha priyeshkaratha commented Nov 5, 2025

What changes were proposed in this pull request?

This PR introduces a GET /api/v1/keys/deletePending/dirs/size-summary endpoint that returns the totalUnreplicatedDataSize and totalReplicatedDataSize fields, which track the total size of FSO directories pending deletion in OM.

  • totalReplicatedDataSize: Total size of directories pending deletion, including replication (the expected replicated size).
  • totalUnreplicatedDataSize: Total raw size of directories pending deletion, excluding replication.

What is the link to the Apache JIRA

HDDS-13188

How was this patch tested?

CI is green.
Test details can be found here.

@priyeshkaratha priyeshkaratha marked this pull request as ready for review November 7, 2025 05:08
@devmadhuu devmadhuu self-requested a review November 7, 2025 05:49
Contributor

@devmadhuu devmadhuu left a comment

@priyeshkaratha thanks for the patch, pls check my comment.

* }
*/
@GET
@Path("/deletePending/dirs/size-summary")
Contributor

How is this API different from the existing one here?

Contributor

@devmadhuu, "/deletePending/dirs/size-summary" is a new API that returns the calculated size and the replicated size.

"/deletePending/dirs/summary" is an existing API that returns the count directly from the DB, without any calculation.

Do you think we should merge them into one or keep them separate?

Contributor

@devmadhuu devmadhuu Nov 7, 2025

I can see that the existing API and the new API both iterate deletedDirTable. The only difference is that the new API uses the size computation that is pre-computed in NSSummary, but doesn't the existing API also use the same pre-computed NSSummary sizes? Then what exactly is the difference, and why do we need a separate API if the existing one serves the purpose? Maybe we can use the existing API and handle the limit param, which is 1000 by default.

Contributor Author

Anyway, we only use this API inside Recon. So instead of exposing it as an API, I think we can make calculateTotalPendingDeletedDirSizes a public method so that we can use it in the newly added Recon endpoint. What are your thoughts? @ChenSammi @devmadhuu

Contributor

I am still not able to understand why we need the new API code. The existing API needs only very minor changes related to limit. Can you explain?

Contributor

@ChenSammi ChenSammi Nov 7, 2025

It's a summary API, so I guess limit doesn't apply. Ideally, we should add the new function to "/deletePending/dirs/summary" too, as the pending-deletion key API does, so that all the APIs behave consistently.

   * Example response:
   *   {
   *    "totalDeletedKeys": 8,
   *    "totalReplicatedDataSize": 90000,
   *    "totalUnreplicatedDataSize": 30000
   *   }
   */
  @GET
  @Path("/deletePending/summary")
  public Response getDeletedKeySummary() {

The difference is that for directories, it iterates the omMetadataManager.getDeletedDirTable() table to calculate all the sizes, while for keys it does this:

 // Fetch the necessary metrics for deleted keys
      Long replicatedSizeDeleted = getValueFromId(reconGlobalStatsManager.getGlobalStatsValue(
          OmTableInsightTask.getReplicatedSizeKeyFromTable(DELETED_TABLE)));
      Long unreplicatedSizeDeleted = getValueFromId(reconGlobalStatsManager.getGlobalStatsValue(
          OmTableInsightTask.getUnReplicatedSizeKeyFromTable(DELETED_TABLE)));
      Long deletedKeyCount = getValueFromId(reconGlobalStatsManager.getGlobalStatsValue(
          OmTableInsightTask.getTableCountKeyFromTable(DELETED_TABLE)));

I'm not very familiar with the Recon code. Does the above code just read the data from somewhere, without requiring any further calculation?

My concern is how we use "/deletePending/summary" and "/deletePending/dirs/summary" in the Recon UI. If the pending-deletion directory size calculation is expensive and the Recon UI only needs totalDeletedDirectories, then calculating the size on every API call wastes the work, since the data is not used, and also adds latency to the call. I'm fine with merging the two APIs into one if that's not a problem.

Contributor Author

I think we can close this PR. Instead of maintaining two separate methods, we can simplify the logic by passing limit = -1 when we want to retrieve all results. Inside the loop, we can add a check like if (limit > 0 && resultSize == limit) to break when the limit is reached. This way, the same method can handle both limited and complete results without additional code duplication.
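A minimal sketch of the limit check described above, assuming a simplified in-memory table. `PendingDirScan`, `Result`, and `scan` are hypothetical names used only for illustration and are not the actual Recon code; each map value stands in for a directory's {unreplicated, replicated} size pair.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the endpoint logic; not the actual Ozone/Recon code.
final class PendingDirScan {

  /** Aggregated result: the listed directory names plus the summed sizes. */
  static final class Result {
    final List<String> dirs = new ArrayList<>();
    long totalUnreplicatedDataSize;
    long totalReplicatedDataSize;
  }

  /**
   * Walks the table once. A non-positive limit (for example -1) means "no limit".
   * Each map value is assumed to be {unreplicatedSize, replicatedSize}.
   */
  static Result scan(Map<String, long[]> deletedDirTable, int limit) {
    Result result = new Result();
    for (Map.Entry<String, long[]> entry : deletedDirTable.entrySet()) {
      // Stop only when a positive limit has been reached.
      if (limit > 0 && result.dirs.size() == limit) {
        break;
      }
      result.dirs.add(entry.getKey());
      result.totalUnreplicatedDataSize += entry.getValue()[0];
      result.totalReplicatedDataSize += entry.getValue()[1];
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, long[]> table = new LinkedHashMap<>();
    table.put("/vol/bucket/dir1", new long[] {10_000L, 30_000L});
    table.put("/vol/bucket/dir2", new long[] {20_000L, 60_000L});
    table.put("/vol/bucket/dir3", new long[] {5_000L, 15_000L});

    Result all = scan(table, -1);      // complete results
    Result firstTwo = scan(table, 2);  // limited results
    System.out.println(all.dirs.size() + " dirs, replicated=" + all.totalReplicatedDataSize);
    System.out.println(firstTwo.dirs.size() + " dirs, replicated=" + firstTwo.totalReplicatedDataSize);
  }
}
```

Note that in this sketch a limit of 0 also means "no limit"; whether that is desirable in the real method is a separate design choice.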

Contributor

@devmadhuu devmadhuu Nov 7, 2025

@ChenSammi This PR adds an API for the size summary of deletePending directories. The existing APIs /deletePending (for pending-delete keys), /deletePending/dirs (for deletePending directories), and /deletePending/dirs/summary each serve a different purpose. As far as I understand, /deletePending/dirs/summary is currently not used anywhere in the Recon UI and just returns the count pre-computed by OmTableInsightTask, not computed on the fly. The /deletePending/dirs API iterates deletedDirTable in the same way as the new /deletePending/dirs/size-summary API in this PR. If you compare their code, there is not much difference; if anything, the existing API is more sophisticated and also supports pagination. Both the existing API and the new API use the size values (replicated, unreplicated) from data pre-computed in NSSummary, so they will provide similar performance and the same data. @priyeshkaratha can test and confirm. If we can handle the limit param with -1 (all values), then there is no need for the new API in this PR, because it also iterates all keys (records) in deletedDirTable.

totalReplicatedDataSize += sizeInfo.getRight();
}
}
} catch (IOException ex) {
Contributor

Please log a message for this exception so that we know why it fails.
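One way the requested logging could look, as a sketch only. It uses `java.util.logging` to stay self-contained, whereas the project code would presumably use its own logger; `SizeSummaryLogging`, `SizeSource`, and the -1 fallback value are hypothetical and not taken from the patch.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical illustration of logging inside the catch block; not the patch code.
final class SizeSummaryLogging {
  private static final Logger LOG =
      Logger.getLogger(SizeSummaryLogging.class.getName());

  /** Hypothetical stand-in for a table read that can fail with an IOException. */
  interface SizeSource {
    long readTotalSize() throws IOException;
  }

  /** Returns the total size, or -1 after logging the cause if the read fails. */
  static long totalSizeOrMinusOne(SizeSource source) {
    try {
      return source.readTotalSize();
    } catch (IOException ex) {
      // Pass the exception itself so the message and stack trace are recorded.
      LOG.log(Level.SEVERE, "Failed to compute pending-deletion directory sizes", ex);
      return -1L;
    }
  }

  public static void main(String[] args) {
    System.out.println(totalSizeOrMinusOne(() -> 42L));
    System.out.println(totalSizeOrMinusOne(() -> {
      throw new IOException("table unavailable");
    }));
  }
}
```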

Contributor

@devmadhuu devmadhuu left a comment

Pls check

@priyeshkaratha
Contributor Author

priyeshkaratha commented Nov 10, 2025

@devmadhuu @ChenSammi I was able to use the existing method in Recon, so I am closing this PR.
