Skip to content

Conversation

@ArafatKhan2198
Copy link
Contributor

@ArafatKhan2198 ArafatKhan2198 commented Jul 10, 2023

What changes were proposed in this pull request?

1. This PR does two main things :-

  • It lets the OmTableInsightTask figure out the total count of directories marked for deletion.
  • It makes the complex OmTableInsightTask easier to work with by separating different handlers for each table.

2. Finding the Count of Deleted Directories: :-

  • It iterates through the DeletedDirectory table and gets the total count of deleted directories.

3. Refactoring the OmTableInsightTask :-

  • Noticing that we need to find sizes for keys and other entities in various tables, which has made the process() and reprocess() methods long and complicated.
  • To simplify things, we've created separate Insight-Handler classes (like OpenKeysInsightHandler, DeletedKeysInsightHandler) that handle specific tasks for each table.
  • This change makes the code easier to manage and paves the way for adding more tables or operations in the future by creating and connecting specific handlers.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-8627

How was this patch tested?

Unit Testing and Manual Testing

  1. Deleted 3 directories :- dir1, dir2, dir3 :-
image
  1. Summary printing out the count of total deleted directories :-
image

@ArafatKhan2198
Copy link
Contributor Author

@devmadhuu @dombizita @ashishkumar50 can you please take a look !

Copy link
Contributor

@ashishkumar50 ashishkumar50 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ArafatKhan2198, Thanks for working on this. Please find few comments inline.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ArafatKhan2198 thanks for working on this patch, can you pls elaborate the purpose of this method since you added tableName as extra parameter now. Is this method only for counting records of tables which are having keys as OMKeyInfo and RepeatedOmKeyInfo only ?. Also if GlobalStatsTable will not have counts of OpenFile and OpenKey table ?

Copy link
Contributor Author

@ArafatKhan2198 ArafatKhan2198 Jul 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment @devmadhuu

The purpose of the getTableSizeAndCount method is to calculate the total count of records, total unreplicated size, and total replicated size in a given iterator over a table, based on the specified tableName. It is only used for tables which want both the size as well as the count to be calculated hence in our case it will only work for OPEN_KEY_TABLE, OPEN_FILE_TABLE, DELETED_TABLE & DELETED_DIR_TABLE.

Table Name Object Associated
OPEN_KEY_TABLE OmKeyInfo
OPEN_FILE_TABLE OmKeyInfo
DELETED_DIR_TABLE OmKeyInfo
DELETED_TABLE RepeatedOmKeyInfo

In our case, the OPEN_KEY_TABLE, OPEN_FILE_TABLE, and DELETED_DIR_TABLE all have OmKeyInfo associated with them. Therefore, we need to differentiate between them based on the table name. The reason for this differentiation is because the calculation of data differs for the OPEN_KEY_TABLE and OPEN_FILE_TABLE compared to the DELETED_DIR_TABLE.

To summarize, when calculating data for the OPEN_KEY_TABLE and OPEN_FILE_TABLE, we consider the dataSize of the keys stored in the OmKeyInfo objects. However, for the DELETED_DIR_TABLE, we retrieve the size of the deleted directory from the NSSummary table. It's important to note that the replicatedSize value is not applicable to the DELETED_DIR_TABLE.

So, to clarify, the globalStats table stores both the sizes and counts of tables, and the getTableSizeAndCount method calculates these values and updates the globalStats table accordingly.

@ArafatKhan2198 ArafatKhan2198 marked this pull request as draft November 2, 2023 06:12
@ArafatKhan2198 ArafatKhan2198 marked this pull request as ready for review November 20, 2023 06:21
@ArafatKhan2198 ArafatKhan2198 marked this pull request as draft November 24, 2023 13:11
@ArafatKhan2198 ArafatKhan2198 marked this pull request as ready for review November 27, 2023 12:02
@ArafatKhan2198
Copy link
Contributor Author

@devmadhuu @adoroszlai @ashishkumar50
Could you please review this.

Copy link
Contributor

@devmadhuu devmadhuu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ArafatKhan2198 for continued effort on this. Mostly looks good to me except few minor comments.

Copy link
Contributor

@sumitagrawl sumitagrawl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ArafatKhan2198 Thanks for working over this, IMO, this insight is not required as its very difficult to handle dynamic scenario, and further can not see further use case as already we have implemented api to give "pending deletion size".

@ArafatKhan2198
Copy link
Contributor Author

ArafatKhan2198 commented Jan 29, 2024

Initially, we attempted to implement the DU space calculation of deleted directories using NSSummaries. However, Sumit highlighted the difficulty in determining the total size of all deleted directories using our original approach. Currently, we iterate through the DeletedDirectoryTable, calculate the size of each deleted directory, and then find the sum total size. The issue arises when both the parent and child entries are encountered in the DeletedDirectoryTable, resulting in the same size being calculated twice.

To address the challenge of calculating the size of deleted directories, especially when encountering both parent and child entries in the DeletedDirectoryTable, we are adjusting our approach. To obtain the count of deletePending directories and their corresponding data size, we can derive the data size from the keys of the deleted directories. These keys are already being calculated and are an integral part of the deleted directories.

cc: @devmadhuu @dombizita

Copy link
Contributor

@devmadhuu devmadhuu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ArafatKhan2198 for continued effort on this patch. Few comments, Pls check.

@sumitagrawl sumitagrawl changed the title HDDS-8627. Recon - API for Count of deletePending directories and amount of data mapped to such directories. HDDS-8627. Recon - API for Count of deletePending directories Feb 7, 2024
Copy link
Contributor

@sumitagrawl sumitagrawl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sumitagrawl sumitagrawl merged commit c1efa33 into apache:master Feb 9, 2024
@dombizita
Copy link
Contributor

Please adjust the Jira (HDDS-8627) title and description to the changes discussed during the PR review @ArafatKhan2198 @sumitagrawl

@ArafatKhan2198
Copy link
Contributor Author

Please adjust the Jira (HDDS-8627) title and description to the changes discussed during the PR review @ArafatKhan2198 @sumitagrawl

Thanks for pointing it out will do!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants