HDDS-4023. Delete closed container after all blocks have been deleted. #1338
Conversation
Thanks for working on this @ChenSammi. I wonder if it would be simpler to remove empty containers as part of Container Report processing?

In looking at how this all works, I noticed there is a bug in the existing method AbstractContainerReportHandler#updateContainerStats, which might stop your change from working: the logic assumes the bytesUsed and keyCount on a container are always increasing. However, in this case, where blocks are deleted, the keyCount and bytesUsed can decrease. Therefore I don't think the stats will ever get updated in SCM via the container reports when blocks are removed.

I then noticed you recently made a change in ReplicationManager to update these counts. I think the ReplicationManager part of that change should be reverted and this handled via the container reports by fixing the bug I mentioned above. The only reason we need the ReplicationManager change is to work around that bug; if it were fixed, we wouldn't need that logic in RM.

If we decide to keep your changes inside ReplicationManager (for this change), then we would need a few more tests added to TestReplicationManager to cover the new scenarios.

I also have a question - when the replicas are all deleted and we make that call, does it somehow cause the container to be removed from SCM memory and the persistent store?
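To make the stats issue concrete, here is a minimal, self-contained sketch of the behaviour being described; the class and method names are stand-ins for illustration, not the real ContainerInfo or updateContainerStats code.

```java
// Simplified stand-in for SCM's per-container stats; not the real Ozone types.
class ContainerStatsSketch {
  enum State { OPEN, CLOSED, DELETING, DELETED }

  State state = State.CLOSED;
  long usedBytes;
  long keyCount;

  // The behaviour described as buggy: stats only ever grow, so reports
  // reflecting block deletion on a CLOSED container are ignored.
  void updateAssumingGrowth(long reportedBytes, long reportedKeys) {
    usedBytes = Math.max(usedBytes, reportedBytes);
    keyCount = Math.max(keyCount, reportedKeys);
  }

  // The suggested behaviour: once the container is no longer OPEN, block
  // deletion can shrink it, so smaller reported values must be accepted.
  void updateAllowingShrink(long reportedBytes, long reportedKeys) {
    if (state == State.OPEN) {
      updateAssumingGrowth(reportedBytes, reportedKeys);
    } else {
      usedBytes = reportedBytes;
      keyCount = reportedKeys;
    }
  }
}
```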
Hi @sodonnel, thanks for reviewing the code and starting the discussion.
This logic works for OPEN containers, but not for CLOSED containers.
TestReplicationManager uses Mockito, so it is not a real end-to-end test. I added an end-to-end test in TestBlockDeletion.
A container in the DELETED state currently stays in SCM memory and the persistent store. Of course we could just delete the container from memory and the DB when its state changes to DELETED, but I feel the current approach is safer - it acts like a container trash bin. We can add a new "purge DELETED containers" action next, so users can manually delete these containers once they are sure the information is no longer needed. Overall, I'm open on this question and fine with either way.
The failed UT is unrelated; it is fixed by #1379.
Hi @sodonnel, do you have time for another review?
```java
handleContainerUnderDelete(container, replicas);
return;
}
```
Should we add another check here:

```java
if (state == LifeCycleState.DELETED) {
  return;
}
```

This will avoid doing any further processing on a container which is expected to have zero replicas?
As I said before, I have a feeling that it might be better to have a separate service for deleting empty containers, and it would perhaps naturally be part of a "merge small containers" service, which we will need to create eventually. However, the changes to the replication manager here are not too big, and you could argue the current solution fits here OK. Therefore I am OK with keeping this logic in the replication manager. However when it comes to merging small containers in the future, I believe we should create a separate service and not clutter RM with that new feature too. The patch looks mostly good to me, there are just a couple of areas we need to consider.
Thinking some more about what to do with containers in the DELETED state in SCM: if we are not confident enough to delete them immediately and automatically, maybe we should provide a "purge empty containers" command which can be run by an admin, or via Recon. We should show in Recon how many empty containers are present in the cluster and then provide an option to remove them. Alternatively, we could have some threshold in SCM, so that when the number of DELETED containers crosses the threshold, they are all removed. We could do either of these in a separate jira if this idea makes sense.
Thanks @sodonnel for the very thoughtful suggestions.
Agreed, we should just delete the replica in that case.
The extra stale replica is a real problem, not just in the empty container deletion case, but also for the non-empty CLOSED container case. After a delete block command is confirmed by 3 datanodes, the command is finished and removed from the block deletion log. So if a fourth replica shows up after that, SCM will not send a new delete block command to it.
Good catch!
Right, I also agree we should purge these deleted containers after a while. We can have a timer task to periodically remove the DELETED containers from the SCM DB. What do you think?
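As a rough sketch of the timer-task idea, assuming a hypothetical ContainerStore interface (not an existing SCM class):

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical periodic purge of DELETED containers; ContainerStore is an
// assumed abstraction over the SCM container DB, not a real Ozone interface.
interface ContainerStore {
  List<Long> listDeletedContainerIds();
  void removeContainer(long containerId);
}

class DeletedContainerPurgeTask {
  private final ContainerStore store;
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  DeletedContainerPurgeTask(ContainerStore store) {
    this.store = store;
  }

  // Run the purge at a fixed interval, e.g. every 60 minutes.
  void start(long intervalMinutes) {
    scheduler.scheduleAtFixedRate(this::purgeOnce,
        intervalMinutes, intervalMinutes, TimeUnit.MINUTES);
  }

  private void purgeOnce() {
    for (long id : store.listDeletedContainerIds()) {
      store.removeContainer(id);
    }
  }

  void stop() {
    scheduler.shutdown();
  }
}
```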
On second thought, deleting the container record from the SCM DB immediately while keeping it in memory may be a better and cleaner choice. If there is a stale container replica, it can still be deleted based on the in-memory information, and the next time SCM starts it doesn't need to handle DELETED containers anymore.
```java
if (isHealthy(replicaProto::getState)) {
  final ContainerInfo containerInfo = containerManager
      .getContainer(containerId);
  if (containerInfo.getState() == HddsProtos.LifeCycleState.DELETED) {
```
It doesn't seem correct to put the logic to delete the replica of a DELETED container inside updateContainerStats.
In ContainerReportHandler#processContainerReplicas(..) there is logic to delete an unknown container in the exception handler.
Could we extract this into a new method which is called from the exception handler? Then, in AbstractContainerReportHandler#updateContainerState(...), handle the containers which should be deleted in the "case DELETED" branch of the switch statement. It could call that same extracted method, so the logic to form the DeleteContainer command would be the same for both. It also seems more logical to put the delete inside updateContainerState rather than updateContainerStats.
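A sketch of the refactor being suggested, using hypothetical names (the real handlers would build a DeleteContainerCommand and publish it through SCM's command queue):

```java
// Hypothetical outline of the suggested refactor: one shared helper that both
// the unknown-container exception handler and the "case DELETED" branch call.
// The types here are simplified stand-ins, not the real report handler classes.
class ReportHandlerRefactorSketch {

  // Shared helper: ask the reporting datanode to delete its replica.
  private void deleteReplica(long containerId, String datanodeId) {
    // In the real handler this would queue a delete-container command for the
    // datanode; here we just log the intent.
    System.out.printf("delete replica of container %d on datanode %s%n",
        containerId, datanodeId);
  }

  // Caller 1: replica reported for a container SCM does not know about
  // (the exception handler path in processContainerReplicas).
  void onUnknownContainer(long containerId, String datanodeId) {
    deleteReplica(containerId, datanodeId);
  }

  // Caller 2: the "case DELETED" branch of updateContainerState, where the
  // container is known but expected to have no replicas.
  void onReplicaOfDeletedContainer(long containerId, String datanodeId) {
    deleteReplica(containerId, datanodeId);
  }
}
```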
Sorry for the slow reply on this; I have been caught up in some other things.
I think this is a good enough idea for now. If SCM is up for a very long time, perhaps in the future we will want to add a thread to clear out the in-memory DELETED containers. One small concern: if a container goes DELETED and then SCM is restarted soon after, and a DN is later restarted and reports a stale replica, it will just be seen as an unknown container. The default position there is to log a warning; the config hdds.scm.unknown-container.action controls this. This is all an edge case - most of the time, all DNs should be up anyway. I left just one comment, on a suggested refactor in the container report handler when dealing with replicas from a DELETED container. Could you also add a test in TestContainerReportHandler to check the logic around deleting a replica of a DELETED container?
Thanks @sodonnel, a new commit to address the concerns.
This change looks almost good now. I wonder about two final things:
There is the following logic in ReplicationManager which handles replicas reported while the container state is DELETING:

```java
if (state == LifeCycleState.DELETING) {
```
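The snippet above is truncated; purely as a hypothetical illustration of the kind of handling being referred to (not the actual ReplicationManager code), it might look something like this:

```java
import java.util.List;

// Hypothetical illustration only; not the ReplicationManager code quoted above.
class DeletingStateSketch {
  // While a container is DELETING, any replica that still shows up in a
  // report should simply receive (another) delete container command.
  void handleDeletingContainer(long containerId, List<String> replicaNodes) {
    for (String node : replicaNodes) {
      System.out.printf("re-send delete for container %d to %s%n",
          containerId, node);
    }
  }
}
```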
```java
 */
if (isContainerEmpty(container, replicas)) {
  deleteContainerReplicas(container, replicas);
}
```
@ChenSammi, is there any specific reason that we let ReplicationManager help clean up empty containers? After this change, ReplicationManager will additionally do an empty-container check for all healthy containers. I'm not sure putting the logic here is the most efficient option.

> I wonder if it would be simpler to remove empty containers as part of Container Report processing? In AbstractContainerReportHandler#updateContainerState, we could check the size and number of keys of the reported containers in the CLOSED branch of the switch statement, and then take action to delete an empty container there? I have a feeling it might be simpler, but I am not sure. The disadvantage of doing it in the Container Report processing is that we are dealing with only a single replica at that stage. However, if the container is CLOSED in SCM and a report says it is empty, then we should be good to simply remove the container from SCM and issue the delete container command when processing the container report.

Actually I prefer this way, as @sodonnel mentioned.

> but I am not sure. The disadvantage of doing it in the Container Report processing is that we are dealing with only a single replica at that stage

We could also get all the replica info and check the state in ContainerReportHandler, then send the delete container command.
I'm okay with the current way, but just wanted to share my thoughts on this.
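For reference, a minimal sketch of the alternative discussed here - checking emptiness while processing a single replica report for a CLOSED container; the types are simplified stand-ins, not the real AbstractContainerReportHandler:

```java
// Hypothetical check used in the CLOSED branch of container report processing.
class EmptyClosedContainerSketch {
  static final class ReportedReplica {
    final long keyCount;
    final long usedBytes;

    ReportedReplica(long keyCount, long usedBytes) {
      this.keyCount = keyCount;
      this.usedBytes = usedBytes;
    }
  }

  // If SCM already considers the container CLOSED and the single reported
  // replica carries no keys and no data, it is a candidate for deletion.
  boolean isEmptyClosedReplica(ReportedReplica replica) {
    return replica.keyCount == 0 && replica.usedBytes == 0;
  }
}
```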
Sorry, I missed that. You are correct. I am +1 on this change as it is now, so feel free to commit it. @linyiqun, I do agree this could be handled more cleanly and efficiently in the container report handler, but it's probably not much of an overhead for ReplicationManager. I am happy for us to commit the change as it is, and we can see how it performs in practice; worst case, we refactor the change out of RM into the report handler. What do you think?
+1 for this, @sodonnel. @ChenSammi, can you add a TODO comment like below while committing this PR? That would be helpful for us to revisit this in the future. +1 from me.
Thanks @sodonnel and @linyiqun for the review. Basically, I think the report handler is not a good place to handle the whole empty-container deletion process. It can tell which containers are empty, but it lacks the facilities ReplicationManager has, such as inflightDeletion tracking, sending commands to extra replicas of DELETING-state containers, and resending commands. In the future, when container compaction is considered, we can move this container deletion logic over so the two live together.
https://issues.apache.org/jira/browse/HDDS-4023