Skip to content

Conversation

@jianghuazhu
Copy link
Contributor

What changes were proposed in this pull request?

When SCM repeatedly issues deletion commands, the datanode will execute these commands repeatedly, which is unnecessary.
There is a situation that will cause this problem, that is, SCMBlockDeletingService created the DeleteBlocksCommand in advance, but did not update it.
Details: HDDS-9748

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-9748

How was this patch tested?

Unit tests need to pass validation.

@jianghuazhu
Copy link
Contributor Author

https://github.com/jianghuazhu/ozone/actions/runs/6986022982
Here is the CI result of the fork branch, which is normal.
Can you help review this PR, @adoroszlai @hemantk-12 .
Thanks.

Copy link
Contributor

@sumitagrawl sumitagrawl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jianghuazhu Thanks for working over this. This is one of solution to reduce duplicate delete blocks command,
Here, based on response from DN, removing blocks already handled in prepared request.
But this do not avoid the problem, as at DN, multiple delete block command may be queued already.

Another solution,
DN side have small cache to remove duplicate command comparing previous and next command at DN. This I was thinking which can reduce duplicate blocks deletion even its present at DN queue.

One of solution for which PR already raised, "Avoid duplicate command from SCM itself tracking when the command is send to DN and handle retry"
#4988

IMO, changes may not be required in this PR.

@jianghuazhu
Copy link
Contributor Author

Thanks @sumitagrawl for the comment and reivew.
I noticed an update related to #4988, which seems to be a passive solution.
This PR is a command to actively reduce duplication from the source (SCM). I think it can be used as a supplement to #4988.
Also, I thought of others. For example, is it necessary to update the status of Container and Pipeline?

@adoroszlai adoroszlai marked this pull request as draft December 1, 2023 14:56
@sumitagrawl
Copy link
Contributor

Thanks @sumitagrawl for the comment and reivew. I noticed an update related to #4988, which seems to be a passive solution. This PR is a command to actively reduce duplication from the source (SCM). I think it can be used as a supplement to #4988. Also, I thought of others. For example, is it necessary to update the status of Container and Pipeline?

This is not fixing the scenario completely. Duplicate blocks still exist for request in queue at DN.

@jianghuazhu
Copy link
Contributor Author

Thanks @sumitagrawl for the comment and review.
Thanks.

return commands;
}

private void updateCommands(List<SCMCommand> commands,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

  1. There is too much nesting which can be avoided using early returns.
  2. Use static import.
Suggested change
private void updateCommands(List<SCMCommand> commands,
private void updateCommands(List<SCMCommand> commands,
CommandStatusReportsProto commandStatusReport) {
if (commands == null) {
return;
}
List<StorageContainerDatanodeProtocolProtos.CommandStatus> cmdStatusList
= commandStatusReport.getCmdStatusList();
for (StorageContainerDatanodeProtocolProtos.CommandStatus cmdStatus :
cmdStatusList) {
if (cmdStatus.getType() != deleteBlocksCommand ||
cmdStatus.getStatus() != EXECUTED) {
continue;
}
ContainerBlocksDeletionACKProto ackProto =
cmdStatus.getBlockDeletionAck();
List<DeleteBlockTransactionResult> results = ackProto.getResultsList();
for (SCMCommand command : commands) {
if (command.getType() != deleteBlocksCommand) {
continue;
}
DeleteBlocksCommand deleteBlocksCommand = (DeleteBlocksCommand) command;
List<DeletedBlocksTransaction> deletedBlocks =
deleteBlocksCommand.blocksTobeDeleted();
for (DeleteBlockTransactionResult result : results) {
deletedBlocks.removeIf(delete ->
delete.getTxID() == result.getTxID());
}
}
}
}

@jianghuazhu
Copy link
Contributor Author

Thanks @hemantk-12 for the comment and review.
There is another jira working on this issue, details: HDDS-8882.
Therefore, I'm considering closing this PR.
Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants