HDDS-4131. Container report should update container key count and bytes used if they differ in SCM #1339
Hi @sodonnel, since a typical container has 3 replicas and container reports are asynchronous, we need a consensus on what the container size is in SCM. Basically, AbstractContainerReportHandler is not the ideal place to handle this because it doesn't have a global view, while Replication Manager does.
ContainerReportHandler has the same view of all replicas as Replication Manager, as it has access to the ContainerManager object. I will push a new commit that adjusts the values based on all 3 replicas. I still need to add a test or two for this, but hopefully this demonstrates the point. Ideally, we should be updating these values in a single place. The ContainerReportHandler is supposed to do it, but it is not doing it correctly, so we need to fix that, rather than adding new logic elsewhere to work around the bug.
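To make the "values based on all 3 replicas" idea concrete, here is a rough, self-contained sketch. It is not the code from the commit: `ReplicaStatsSketch`, `ReplicaReport` and `combine` are hypothetical names, and taking the maximum across replicas is purely an assumption for illustration, since the thread does not say which rule is used.

```java
import java.util.List;

public class ReplicaStatsSketch {

    /** Hypothetical stand-in for the stats one replica reports to SCM. */
    static final class ReplicaReport {
        final long usedBytes;
        final long keyCount;

        ReplicaReport(long usedBytes, long keyCount) {
            this.usedBytes = usedBytes;
            this.keyCount = keyCount;
        }
    }

    /**
     * Combines the stats of all known replicas into one value per metric.
     * ASSUMPTION: the maximum is used here only as an example policy.
     * Returns {usedBytes, keyCount}; an empty list yields {0, 0}.
     */
    static long[] combine(List<ReplicaReport> replicas) {
        long usedBytes = 0L;
        long keyCount = 0L;
        for (ReplicaReport r : replicas) {
            usedBytes = Math.max(usedBytes, r.usedBytes);
            keyCount = Math.max(keyCount, r.keyCount);
        }
        return new long[] {usedBytes, keyCount};
    }

    public static void main(String[] args) {
        List<ReplicaReport> replicas = java.util.Arrays.asList(
            new ReplicaReport(400L, 4L),
            new ReplicaReport(500L, 5L),
            new ReplicaReport(450L, 4L));
        long[] combined = combine(replicas);
        System.out.println(combined[0] + " bytes, " + combined[1] + " keys");
    }
}
```

The point is only that the handler can look at every replica before settling on a value, so a single late or stale report does not decide the result on its own.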
adoroszlai
left a comment
Thanks @sodonnel for improving code responsibilities / fixing the original bug at the root cause.
...r-scm/src/main/java/org/apache/hadoop/hdds/scm/container/AbstractContainerReportHandler.java
...erver-scm/src/test/java/org/apache/hadoop/hdds/scm/container/TestContainerReportHandler.java
Outdated
Agree, -- based on my understanding -- it's better to do before replication manager:
@adoroszlai @ChenSammi Are you happy with this change at this stage? Can we commit it?
ChenSammi
left a comment
Just one minor inline comment. The rest looks good.
    SCMException.ResultCodes.FAILED_TO_FIND_CONTAINER);
}
containerInfo.updateDeleteTransactionId(entry.getValue());
containerInfo.setNumberOfKeys(containerInfoInMem.getNumberOfKeys());
Prefer to keep the KeyCount and UsedKeys persist action here.
After looking at this area a bit more I understand why that is needed now. I have added those two lines back in.
I think this change is good to commit now? @adoroszlai gave a thumbs up a few days back and I have addressed the only concern @ChenSammi raised. I will commit tomorrow unless anyone objects before then.
+1. Thanks @sodonnel for the contribution.
…es used if they differ in SCM (apache#1339)
What changes were proposed in this pull request?
In HDDS-4037 it was noted that when blocks are deleted from closed containers, the bytesUsed and Key Count metrics on the SCM container are not updated correctly.
These stats should be updated via the container reports issued by the DNs to SCM periodically. However, in AbstractContainerReportHandler#updateContainerStats, the code assumes the values are always increasing and it will not update them if they are decreasing.
In HDDS-4037 a change was made to the Replication Manager, so it updates the stats. However, I don't believe that is the correct place to perform this check, and the issue is caused by the logic described above.
In this Jira, I have removed the changes made to Replication Manager in HDDS-4037 (but retained the other changes in that Jira), ensuring the affected statistics are updated only via the container reports, and only when the reported values differ from those held in SCM.
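As a rough illustration of the difference between the two checks, the sketch below contrasts an "only ever increase" update with an "update if different" update. It is a simplified stand-in, not the code in AbstractContainerReportHandler: the class `ContainerStatsSketch`, the `Stats` holder and both method names are hypothetical.

```java
public class ContainerStatsSketch {

    /** Hypothetical, simplified stand-in for the stats SCM keeps per container. */
    static final class Stats {
        long usedBytes;
        long keyCount;

        Stats(long usedBytes, long keyCount) {
            this.usedBytes = usedBytes;
            this.keyCount = keyCount;
        }
    }

    /** The behaviour described as the bug: values are only allowed to grow. */
    static void updateIfGreater(Stats scm, long reportedBytes, long reportedKeys) {
        if (scm.usedBytes < reportedBytes) {
            scm.usedBytes = reportedBytes;
        }
        if (scm.keyCount < reportedKeys) {
            scm.keyCount = reportedKeys;
        }
    }

    /** The behaviour described as the fix: update whenever the report differs. */
    static void updateIfDifferent(Stats scm, long reportedBytes, long reportedKeys) {
        if (scm.usedBytes != reportedBytes) {
            scm.usedBytes = reportedBytes;
        }
        if (scm.keyCount != reportedKeys) {
            scm.keyCount = reportedKeys;
        }
    }

    public static void main(String[] args) {
        // Blocks were deleted from a closed container, so the report is lower.
        Stats stale = new Stats(1000L, 10L);
        updateIfGreater(stale, 400L, 4L);
        System.out.println(stale.usedBytes + " " + stale.keyCount); // prints "1000 10"

        Stats fresh = new Stats(1000L, 10L);
        updateIfDifferent(fresh, 400L, 4L);
        System.out.println(fresh.usedBytes + " " + fresh.keyCount); // prints "400 4"
    }
}
```

With the first check, SCM keeps the old, larger values after a delete; with the second, it follows whatever the report says.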
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-4131
How was this patch tested?
Made a small change to an existing unit test, which I used to reproduce the problem before making the change.