@xichen01
Contributor

@xichen01 xichen01 commented Jun 27, 2023

What changes were proposed in this pull request?

Currently, SCM resends a DeletedBlocksTransaction to a DN whenever that DN has not yet reported, via heartbeat, that the transaction has finished. So if the DeleteBlocksCommandHandler thread of a DN is blocked for some reason (for example, while waiting for a container lock), SCM will send duplicate DeletedBlocksTransactions to that DN.

Summary

The Status of DeleteBlocksCommand

    public enum CmdStatus {
      // The DeleteBlocksCommand has not yet been sent.
      // This is the initial status of the command after it's created.
      TO_BE_SENT,
      // If the DeleteBlocksCommand has been sent but has not yet been fully
      // executed by the DN, the DeleteBlocksCommand's state will be SENT.
      // Note that the SENT state covers the following possibilities:
      //   - The command was sent but not received.
      //   - The command was sent and received by the DN,
      //     and is waiting to be executed.
      //   - The command was sent and is being executed by the DN.
      SENT,
    }

State Transfer

  • TO_BE_SENT -> SENT: The DeleteBlocksCommand has been sent by SCM, but the Datanode has not yet reported any follow-up status.

  • SENT -> null (the state record is removed from SCMDeleteBlocksCommandStatusManager)
    Once the DN has executed a DeleteBlocksCommand, its record is deleted regardless of whether the command executed successfully.
    Successful DeleteBlocksCommands are recorded in SCMDeletedBlockTransactionStatusManager#transactionToDNsCommitMap.

DeleteBlocksCommand resend

A DeleteBlocksCommand in the TO_BE_SENT or SENT state will not be resent by SCM.
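
The state transitions and the resend rule above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class CommandStatusSketch, its method names, and the choice of a nested map keyed by DN UUID and a numeric id are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical, simplified stand-in for SCMDeleteBlocksCommandStatusManager.
class CommandStatusSketch {
  enum CmdStatus { TO_BE_SENT, SENT }

  // DN id -> (command id -> status). A missing entry plays the role of "null".
  private final Map<UUID, Map<Long, CmdStatus>> statusByDn = new HashMap<>();

  // A newly created command starts in TO_BE_SENT.
  void recordCommand(UUID dn, long cmdId) {
    statusByDn.computeIfAbsent(dn, k -> new HashMap<>())
        .putIfAbsent(cmdId, CmdStatus.TO_BE_SENT);
  }

  // TO_BE_SENT -> SENT, once SCM has sent the command.
  void onSent(UUID dn, long cmdId) {
    Map<Long, CmdStatus> cmds = statusByDn.get(dn);
    if (cmds != null) {
      cmds.replace(cmdId, CmdStatus.TO_BE_SENT, CmdStatus.SENT);
    }
  }

  // SENT -> null: the record is dropped once the DN reports the command,
  // whether or not execution succeeded.
  void onDatanodeReport(UUID dn, long cmdId) {
    Map<Long, CmdStatus> cmds = statusByDn.get(dn);
    if (cmds != null) {
      cmds.remove(cmdId);
      if (cmds.isEmpty()) {
        statusByDn.remove(dn);
      }
    }
  }

  CmdStatus statusOf(UUID dn, long cmdId) {
    Map<Long, CmdStatus> cmds = statusByDn.get(dn);
    return cmds == null ? null : cmds.get(cmdId);
  }

  // Resend rule: anything still in TO_BE_SENT or SENT is not sent again.
  boolean shouldSend(UUID dn, long cmdId) {
    return statusOf(dn, cmdId) == null;
  }

  public static void main(String[] args) {
    CommandStatusSketch sketch = new CommandStatusSketch();
    UUID dn = UUID.randomUUID();
    sketch.recordCommand(dn, 1L);
    sketch.onSent(dn, 1L);
    System.out.println(sketch.shouldSend(dn, 1L)); // false: still SENT
    sketch.onDatanodeReport(dn, 1L);
    System.out.println(sketch.shouldSend(dn, 1L)); // true: record removed
  }
}
```

In this sketch a blocked DN simply leaves its record in SENT, so SCM keeps skipping it instead of sending a duplicate.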

SCMDeletedBlockTransactionStatusManager

SCMDeletedBlockTransactionStatusManager contains the transactionToDNsCommitMap, migrated from DeletedBlockLogImpl, which is used to manage committed DeletedBlocksTransactions.
SCMDeletedBlockTransactionStatusManager#SCMDeleteBlocksCommandStatusManager is used to manage DeletedBlocksTransactions that are not yet committed.
"Committed" means that the DeletedBlocksTransaction has been executed on the DN and reported to SCM via heartbeat.
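
As a rough illustration of the "committed" bookkeeping described above, the sketch below tracks which DNs have reported each transaction. It is not the actual SCMDeletedBlockTransactionStatusManager: the class CommitMapSketch, its method names, and the value type Set<UUID> are assumptions; only the field name transactionToDNsCommitMap comes from the description.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified view of the commit tracking.
class CommitMapSketch {
  // txID -> DNs that have reported, via heartbeat, that they executed it.
  private final Map<Long, Set<UUID>> transactionToDNsCommitMap =
      new ConcurrentHashMap<>();

  // Called when a heartbeat reports that a DN executed a transaction.
  void onHeartbeatReport(long txId, UUID dn) {
    transactionToDNsCommitMap
        .computeIfAbsent(txId, k -> ConcurrentHashMap.newKeySet())
        .add(dn);
  }

  // True once every DN holding a replica has reported the transaction.
  boolean isCommittedByAll(long txId, Set<UUID> replicaDns) {
    Set<UUID> committed = transactionToDNsCommitMap.get(txId);
    return committed != null && committed.containsAll(replicaDns);
  }

  public static void main(String[] args) {
    CommitMapSketch sketch = new CommitMapSketch();
    UUID dn1 = UUID.randomUUID();
    UUID dn2 = UUID.randomUUID();
    Set<UUID> replicas = new HashSet<>(Arrays.asList(dn1, dn2));
    sketch.onHeartbeatReport(7L, dn1);
    System.out.println(sketch.isCommittedByAll(7L, replicas)); // false
    sketch.onHeartbeatReport(7L, dn2);
    System.out.println(sketch.isCommittedByAll(7L, replicas)); // true
  }
}
```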

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-8882

How was this patch tested?

integration test

… sending duplicate delete transactions to the DN
@xichen01 xichen01 marked this pull request as draft June 27, 2023 17:06
@xichen01
Contributor Author

xichen01 commented Jun 27, 2023

TODO: implement unit tests and integration tests. The PR stays in draft until the test cases are completed, but any suggestions are welcome.

@github-actions

No such command. / Available commands:

  • /close : Close pending pull request temporary
  • /help : Show all the available comment commands
  • /label : add new label to the issue: /label <label>
  • /pending : Add a REQUESTED_CHANGE type review to mark issue non-mergeable: /pending <reason>
  • /ready : Dismiss all the blocking reviews by github-actions bot
  • /retest : provide help on how to trigger new CI build

@xichen01 xichen01 marked this pull request as ready for review June 30, 2023 08:49
@xichen01
Contributor Author

xichen01 commented Jul 7, 2023

@adoroszlai PTAL Thanks.

@adoroszlai adoroszlai requested review from sodonnel, sumitagrawl and symious and removed request for sodonnel July 7, 2023 18:01
Contributor

@sumitagrawl sumitagrawl left a comment

@xichen01 thanks for working on this. This looks like a good improvement: SCM can send new blocks and retry after some delay while avoiding duplicate commands. This is feasible now that the strict ordering check on transactionId at the DN has been removed (HDDS-8228). With this change, the outOfOrder metrics added at the DN may no longer be required, since out-of-order transactions will be common.

Additionally, at SCM, state is managed in the DB with retries and in multiple maps. We need to take another look and refactor so that the transactions have a combined state.

@xichen01
Contributor Author

xichen01 commented Nov 6, 2023

@xichen01, thanks for working on this. Have the latest comments from @sumitagrawl been addressed?

Yes, CmdStatus has been reduced to two states, and transactionToRetryCountMap has been moved into SCMDeletedBlockGTransactionStatus

@adoroszlai adoroszlai changed the title HDDS-8882. Add status management of SCM's DeleteBlocksCommand to avoid sending duplicate delete transactions to the DN HDDS-8882. Manage status of DeleteBlocksCommand in SCM to avoid sending duplicates to Datanode Nov 27, 2023
Contributor

@sumitagrawl sumitagrawl left a comment

@xichen01 Thanks for the update. I have left a few comments on this PR. Overall it looks good.
I will recheck the commandStatusMap cleanup after the fix.

# Conflicts:
#	hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/SCMNodeManager.java
@xichen01 xichen01 requested a review from sumitagrawl December 1, 2023 08:39
@adoroszlai
Contributor

Thanks @xichen01 for updating the patch. Can you please check TestDeletedBlockLog failures?

https://github.com/xichen01/ozone/actions/runs/7057702518/job/19212125760#step:5:1833

@xichen01
Contributor Author

xichen01 commented Dec 4, 2023

Contributor

@sumitagrawl sumitagrawl left a comment

@xichen01 LGTM

Contributor

@adoroszlai adoroszlai left a comment

Thanks again @xichen01 for the patch.

Comment on lines 90 to 91
SCMDeletedBlockTransactionStatusManager
getSCMDeletedBlockTransactionStatusManager();
Contributor

The DeletedBlockLog interface is defined in terms of operations. I don't think exposing a manager object is appropriate for the interface; it should be an implementation detail. Similarly, sharing the same lock between the two objects does not seem right.

Maybe the interface should define operations that the implementation passes through to the manager. Alternatively the manager object should have an interface defined separately, and act as a way to manipulate the DeletedBlockLog.

Contributor Author

Removed the getSCMDeletedBlockTransactionStatusManager interface from DeletedBlockLog and added DeletedBlockTransactionStatusManager related actions to DeletedBlockLog.

@sumitagrawl sumitagrawl merged commit 88e18e3 into apache:master Dec 18, 2023
@adoroszlai
Contributor

@xichen01 Could you please check HDDS-9962, intermittent failure in TestBlockDeletion? It started happening after this PR was merged.

@xichen01
Contributor Author

@xichen01 Could you please check HDDS-9962, intermittent failure in TestBlockDeletion? It started happening after this PR was merged.

OK, I will check this

xichen01 added a commit to xichen01/ozone that referenced this pull request Jul 17, 2024
…ng duplicates to Datanode (apache#4988)

(cherry picked from commit 88e18e3)
@slfan1989
Contributor

@xichen01 @adoroszlai

During our use of deletion, I noticed that it can be very slow, especially after we switched to the EC policy.

Our Ozone01 cluster currently has about 1K machines. Initially, we chose to use a Ratis-3Replica strategy, but for cost considerations, we gradually switched to the EC-6-3 strategy in July.

The following chart shows the deletion speed for Ratis-3Replica.

image

The following chart shows the deletion speed for EC-6-3.

image

By reviewing the code and analyzing the logs, we found that the following situation can cause deletion to be very slow. We will illustrate this with an example.

Background

We want to delete data from an EC container with ContainerId = 1000. Since it is EC-6-3, there are 9 replicas (DN1, DN2, DN3, ... DN9).

Process

Before deletion, we first select a batch of DNs; at this time, we may only select DN1 to DN6. We then send the deletion command to these 6 DNs, and the command executes normally, successfully deleting 6 blocks. However, if DN7 to DN9 are not selected, our deletion process will get stuck.

Code

private void getTransaction(DeletedBlocksTransaction tx,
    DatanodeDeletedBlockTransactions transactions,
    Set<DatanodeDetails> dnList, Set<ContainerReplica> replicas,
    Map<UUID, Map<Long, CmdStatus>> commandStatus) {
  DeletedBlocksTransaction updatedTxn =
      DeletedBlocksTransaction.newBuilder(tx)
          .setCount(transactionStatusManager.getOrDefaultRetryCount(
              tx.getTxID(), 0))
          .build();
  for (ContainerReplica replica : replicas) {
    DatanodeDetails details = replica.getDatanodeDetails();
    if (!dnList.contains(details)) {
      continue;
    }
    if (!transactionStatusManager.isDuplication(
        details, updatedTxn.getTxID(), commandStatus)) {
      transactions.addTransactionToDN(details.getUuid(), updatedTxn);
    }
  }
}

I came up with a possible solution to avoid getting stuck this way. We require that every replica of the container whose blocks are being deleted is hosted on a DN in the selected DN list at the same time. Otherwise, we skip that container.

private void getTransaction(DeletedBlocksTransaction tx,
      DatanodeDeletedBlockTransactions transactions,
      Set<DatanodeDetails> dnList, Set<ContainerReplica> replicas,
      Map<UUID, Map<Long, CmdStatus>> commandStatus) {
    DeletedBlocksTransaction updatedTxn =
        DeletedBlocksTransaction.newBuilder(tx)
            .setCount(transactionStatusManager.getOrDefaultRetryCount(
              tx.getTxID(), 0))
            .build();
    
    // Requiring that all replicas be present in the DN list at the same time
    // ensures that the deletion commands for all replicas of the same
    // container can be issued at once, avoiding situations where some
    // replicas of the container are deleted while others are not.
    for (ContainerReplica replica : replicas) {
      DatanodeDetails datanodeDetails = replica.getDatanodeDetails();
      if (!dnList.contains(datanodeDetails)) {
        return;
      }
    }

    for (ContainerReplica replica : replicas) {
      DatanodeDetails details = replica.getDatanodeDetails();
      if (!dnList.contains(details)) {
        continue;
      }
      if (!transactionStatusManager.isDuplication(
          details, updatedTxn.getTxID(), commandStatus)) {
        transactions.addTransactionToDN(details.getUuid(), updatedTxn);
      }
    }
  }
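
The difference between the two versions can be seen in isolation with plain sets. This is a simplified sketch, not Ozone code: DNs are represented as strings, the isDuplication check is left out, and the class ReplicaSelectionSketch and both method names are made up for the example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical model of which DNs receive a transaction in one round.
class ReplicaSelectionSketch {
  // Original behaviour: send to whichever replica holders were selected.
  static Set<String> partialTargets(Set<String> replicaDns,
      Set<String> selectedDns) {
    Set<String> targets = new LinkedHashSet<>(replicaDns);
    targets.retainAll(selectedDns);
    return targets;
  }

  // Proposed behaviour: send only if every replica holder was selected,
  // otherwise skip the transaction for this round.
  static Set<String> allOrNothingTargets(Set<String> replicaDns,
      Set<String> selectedDns) {
    if (!selectedDns.containsAll(replicaDns)) {
      return Collections.emptySet();
    }
    return new LinkedHashSet<>(replicaDns);
  }

  public static void main(String[] args) {
    Set<String> replicas = new LinkedHashSet<>(Arrays.asList(
        "DN1", "DN2", "DN3", "DN4", "DN5", "DN6", "DN7", "DN8", "DN9"));
    Set<String> selected = new LinkedHashSet<>(Arrays.asList(
        "DN1", "DN2", "DN3", "DN4", "DN5", "DN6"));
    System.out.println(partialTargets(replicas, selected).size());      // 6
    System.out.println(allOrNothingTargets(replicas, selected).size()); // 0
    System.out.println(allOrNothingTargets(replicas, replicas).size()); // 9
  }
}
```

With the EC-6-3 example above, the original logic reaches 6 of the 9 replica holders, while the proposed check either reaches all 9 or defers the transaction.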

@slfan1989
Contributor

@xichen01 @adoroszlai

We rolled out this improvement internally in the SCM around 7 PM on September 26th, and we observed a significant enhancement in deletion efficiency, with 50 million blocks being fully processed within 5 hours.

The core of this improvement is to ensure that all DNs holding replicas of the same container receive the delete command at the same time. Their ACKs then reach the SCM at approximately the same time, which makes it easier to confirm the block deletions.

image

I would like to prepare a PR and submit this change to the community.

@ashishkumar50
Contributor

@slfan1989, regarding the issue you found: has your commit been merged upstream, and what is the Jira ID?
cc: @ChenSammi

@slfan1989
Contributor

@slfan1989, regarding the issue you found: has your commit been merged upstream, and what is the Jira ID?
cc: @ChenSammi

@ashishkumar50 Thanks for the question! The relevant JIRA issue should be HDDS-11498, and this PR has already been merged.

The configuration values I used during the deletion process are as follows:

-- OM 
ozone.path.deleting.limit.per.task 150000
ozone.directory.deleting.service.interval 180s

ozone.key.deleting.limit.per.task 150000
ozone.block.deleting.service.interval 180s

-- SCM
hdds.scm.block.deletion.per-interval.max 2000000
hdds.scm.block.deleting.service.interval 300s

swamirishi pushed a commit to swamirishi/ozone that referenced this pull request Dec 3, 2025
… avoid sending duplicates to Datanode (apache#4988)

(cherry picked from commit 88e18e3)