HDDS-8882. Manage status of DeleteBlocksCommand in SCM to avoid sending duplicates to Datanode #4988

xichen01 · 2023-06-27T17:06:02Z

What changes were proposed in this pull request?

Currently, SCM resends a DeletedBlocksTransaction to a DN whenever the DN has not yet reported, via heartbeat, that the transaction has finished. So if the DeleteBlocksCommandHandler thread of a DN is blocked for some reason (for example, while waiting for a container lock), SCM sends a duplicate DeletedBlocksTransaction to that DN.

Summary

The Status of `DeleteBlocksCommand`

    public enum CmdStatus {
      // The DeleteBlocksCommand has not yet been sent.
      // This is the initial status of the command after it's created.
      TO_BE_SENT,
      // If the DeleteBlocksCommand has been sent but has not been executed
      // completely by DN, the DeleteBlocksCommand's state will be SENT.
      // Note that the state of SENT includes the following possibilities.
      //   - The command was sent but not received
      //   - The command was sent and received by the DN,
      //     and is waiting to be executed.
      //   - The command was sent and is being executed by the DN
      SENT,
    }

State Transfer

TO_BE_SENT -> SENT: The DeleteBlocksCommand has been sent by SCM, but the Datanode has not yet reported a follow-up status.
SENT -> null (the state record is removed from SCMDeleteBlocksCommandStatusManager): Once the DN has executed a DeleteBlocksCommand, its record is deleted, regardless of whether the execution succeeded.
Successful DeleteBlocksCommands are recorded in SCMDeletedBlockTransactionStatusManager#transactionToDNsCommitMap.

DeleteBlocksCommand resent

A DeleteBlocksCommand in the TO_BE_SENT or SENT state will not be resent by SCM.
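The status rules above can be sketched as a small tracker. This is a minimal illustration, not the code from this PR: the class name `CommandStatusSketch` and its methods are hypothetical, and it keys the status by transaction ID per Datanode purely for simplicity.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of the two-state tracking; not the Ozone implementation. */
final class CommandStatusSketch {
  enum CmdStatus { TO_BE_SENT, SENT }

  // dnId -> (txId -> status). No entry means nothing is in flight for that pair.
  private final Map<UUID, Map<Long, CmdStatus>> statusByDn =
      new ConcurrentHashMap<>();

  /** Starts tracking in TO_BE_SENT; returns false if already tracked (would be a duplicate). */
  boolean tryCreate(UUID dn, long txId) {
    return statusByDn
        .computeIfAbsent(dn, k -> new ConcurrentHashMap<>())
        .putIfAbsent(txId, CmdStatus.TO_BE_SENT) == null;
  }

  /** TO_BE_SENT -> SENT, once SCM has handed the command over. */
  void onSent(UUID dn, long txId) {
    Map<Long, CmdStatus> perDn = statusByDn.get(dn);
    if (perDn != null) {
      perDn.replace(txId, CmdStatus.TO_BE_SENT, CmdStatus.SENT);
    }
  }

  /** SENT -> removed, once the DN reports execution (success or failure). */
  void onExecuted(UUID dn, long txId) {
    Map<Long, CmdStatus> perDn = statusByDn.get(dn);
    if (perDn != null) {
      perDn.remove(txId, CmdStatus.SENT);
    }
  }

  /** Returns the tracked status, or null if there is no record. */
  CmdStatus statusOf(UUID dn, long txId) {
    Map<Long, CmdStatus> perDn = statusByDn.get(dn);
    return perDn == null ? null : perDn.get(txId);
  }
}
```

In this sketch a second `tryCreate` for the same Datanode and transaction is rejected until `onExecuted` has removed the record, which mirrors the "no resend while TO_BE_SENT or SENT" rule.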

SCMDeletedBlockTransactionStatusManager

SCMDeletedBlockTransactionStatusManager contains the transactionToDNsCommitMap, migrated from DeletedBlockLogImpl, which is used to manage committed DeletedBlocksTransactions.
SCMDeletedBlockTransactionStatusManager#SCMDeleteBlocksCommandStatusManager is used to manage DeletedBlocksTransactions that are not yet committed.
"Committed" means that the DeletedBlocksTransaction has been executed on the DN and reported to SCM via heartbeat.
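As a rough illustration of what a per-transaction commit map could look like, the sketch below records which Datanodes have reported a transaction and treats the transaction as complete once every replica holder has reported. The class `CommitTrackerSketch`, its method names, and the "complete when all replicas have committed" rule are assumptions made for this example, not a description of the actual transactionToDNsCommitMap handling.

```java
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of a txId -> committed-DNs map; not the Ozone implementation. */
final class CommitTrackerSketch {
  private final Map<Long, Set<UUID>> txToCommittedDns = new ConcurrentHashMap<>();

  /**
   * Records that a DN reported the transaction as executed.
   * Returns true once all replica holders have reported; the entry is then dropped.
   */
  boolean recordCommit(long txId, UUID dn, Set<UUID> replicaDns) {
    Set<UUID> committed = txToCommittedDns
        .computeIfAbsent(txId, k -> ConcurrentHashMap.newKeySet());
    committed.add(dn);
    if (committed.containsAll(replicaDns)) {
      txToCommittedDns.remove(txId);
      return true;
    }
    return false;
  }

  /** True if the DN has already committed a transaction that is still pending elsewhere. */
  boolean hasCommitted(long txId, UUID dn) {
    Set<UUID> committed = txToCommittedDns.get(txId);
    return committed != null && committed.contains(dn);
  }
}
```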

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-8882

How was this patch tested?

integration test

xichen01 · 2023-06-27T17:07:21Z

// TODO: Implement unit and integration tests. The PR stays in draft until the test cases are complete, but any suggestions are welcome.

github-actions · 2023-06-27T17:07:34Z

No such command. / Available commands:

/close : Close pending pull request temporary
/help : Show all the available comment commands
/label : add new label to the issue: /label <label>
/pending : Add a REQUESTED_CHANGE type review to mark issue non-mergeable: /pending <reason>
/ready : Dismiss all the blocking reviews by github-actions bot
/retest : provide help on how to trigger new CI build

xichen01 · 2023-07-07T17:06:43Z

@adoroszlai PTAL Thanks.

sumitagrawl

@xichen01 thanks for working on this. This seems like a good improvement: new blocks are sent, and retries happen after some delay, which avoids duplicate commands. It is feasible now that the strict ordering check of transactionId at the DN was removed in HDDS-8228. The outOfOrder metrics added at the DN may no longer be required with this change, since out-of-order transactions will be common.

Additionally, at SCM, state is managed in the DB along with the retry count and multiple maps. We need to take another look and refactor so that the transactions have a combined state.

xichen01 · 2023-11-06T11:39:13Z

> @xichen01, Thanks for working on this. Whether the latest comments given by @sumitagrawl is addressed?

Yes, CmdStatus has been reduced to two states, and transactionToRetryCountMap has been moved into SCMDeletedBlockTransactionStatusManager.

sumitagrawl

@xichen01 Thanks for the update. I have left a few comments on this PR; overall it looks good.
I will recheck the commandStatusMap cleanup after the fix.

adoroszlai · 2023-12-01T17:23:17Z

Thanks @xichen01 for updating the patch. Can you please check TestDeletedBlockLog failures?

https://github.com/xichen01/ozone/actions/runs/7057702518/job/19212125760#step:5:1833

xichen01 · 2023-12-04T11:20:04Z

@adoroszlai @sumitagrawl
All tests have passed. PTAL, thanks
https://github.com/xichen01/ozone/actions/runs/7071798914/job/19258175391

sumitagrawl

@xichen01 LGTM

adoroszlai

Thanks again @xichen01 for the patch.

adoroszlai · 2023-12-09T15:18:22Z

hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLog.java

+  SCMDeletedBlockTransactionStatusManager
+      getSCMDeletedBlockTransactionStatusManager();


The DeletedBlockLog interface is defined in terms of operations. I don't think exposing a manager object is appropriate for the interface; it should be an implementation detail. Similarly, sharing the same lock between the two objects does not seem right.

Maybe the interface should define operations that the implementation passes through to the manager. Alternatively the manager object should have an interface defined separately, and act as a way to manipulate the DeletedBlockLog.
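The first option (the interface defines operations and the implementation passes them through to a private manager) might look roughly like the sketch below. All names here (`BlockLogSketch`, `StatusManagerSketch`, `BlockLogImplSketch`, and the retry-count operations) are hypothetical stand-ins, not the actual DeletedBlockLog API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical interface: callers see operations only, never the manager. */
interface BlockLogSketch {
  void incrementRetryCount(long txId);

  int getRetryCount(long txId);
}

/** Implementation detail with its own lock (the synchronized methods). */
final class StatusManagerSketch {
  private final Map<Long, Integer> retryCounts = new HashMap<>();

  synchronized void increment(long txId) {
    retryCounts.merge(txId, 1, Integer::sum);
  }

  synchronized int get(long txId) {
    return retryCounts.getOrDefault(txId, 0);
  }
}

/** The log delegates to the manager instead of exposing it through a getter. */
final class BlockLogImplSketch implements BlockLogSketch {
  private final StatusManagerSketch manager = new StatusManagerSketch();

  @Override
  public void incrementRetryCount(long txId) {
    manager.increment(txId);
  }

  @Override
  public int getRetryCount(long txId) {
    return manager.get(txId);
  }
}
```

With this shape the manager can be swapped or restructured without touching callers, and each object guards its own state rather than sharing a lock.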

Removed getSCMDeletedBlockTransactionStatusManager from the DeletedBlockLog interface and added the DeletedBlockTransactionStatusManager-related operations to DeletedBlockLog instead.

adoroszlai · 2023-12-22T11:52:57Z

@xichen01 Could you please check HDDS-9962, intermittent failure in TestBlockDeletion? It started happening after this PR was merged.

xichen01 · 2023-12-22T15:39:31Z

> @xichen01 Could you please check HDDS-9962, intermittent failure in TestBlockDeletion? It started happening after this PR was merged.

OK, I will check this

slfan1989 · 2024-09-24T15:38:15Z

@xichen01 @adoroszlai

During our use of deletion, I noticed that it can be very slow, especially after we switched to the EC policy.

Our Ozone01 cluster currently has about 1K machines. Initially, we chose to use a Ratis-3Replica strategy, but for cost considerations, we gradually switched to the EC-6-3 strategy in July.

The following chart shows the deletion speed for Ratis-3Replica.

The following chart shows the deletion speed for EC-6-3.

By reviewing the code and analyzing the logs, we found that the following situation can cause deletion to be very slow. We will illustrate this with an example.

Background

We want to delete data from an EC container with ContainerId = 1000. Since it is EC-6-3, there are 9 replicas (DN1, DN2, DN3, ... DN9).

Process

Before deletion, we first select a batch of DNs; at this time, we may only select DN1 to DN6. We then send the deletion command to these 6 DNs, and the command executes normally, successfully deleting 6 blocks. However, if DN7 to DN9 are not selected, our deletion process will get stuck.

Code

ozone/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLogImpl.java

Lines 294 to 313 in 1f86ce8

    
    private void getTransaction(DeletedBlocksTransaction tx,
        DatanodeDeletedBlockTransactions transactions,
        Set<DatanodeDetails> dnList, Set<ContainerReplica> replicas,
        Map<UUID, Map<Long, CmdStatus>> commandStatus) {
      DeletedBlocksTransaction updatedTxn =
          DeletedBlocksTransaction.newBuilder(tx)
              .setCount(transactionStatusManager.getOrDefaultRetryCount(
                tx.getTxID(), 0))
              .build();
      for (ContainerReplica replica : replicas) {
        DatanodeDetails details = replica.getDatanodeDetails();
        if (!dnList.contains(details)) {
          continue;
        }
        if (!transactionStatusManager.isDuplication(
            details, updatedTxn.getTxID(), commandStatus)) {
          transactions.addTransactionToDN(details.getUuid(), updatedTxn);
        }
      }
    }

I came up with a possible solution to eliminate this stuck situation. We require that all replicas of the container to be deleted must be present in the selected DN list simultaneously. Otherwise, we will skip that container.

    private void getTransaction(DeletedBlocksTransaction tx,
        DatanodeDeletedBlockTransactions transactions,
        Set<DatanodeDetails> dnList, Set<ContainerReplica> replicas,
        Map<UUID, Map<Long, CmdStatus>> commandStatus) {
      DeletedBlocksTransaction updatedTxn =
          DeletedBlocksTransaction.newBuilder(tx)
              .setCount(transactionStatusManager.getOrDefaultRetryCount(
                tx.getTxID(), 0))
              .build();

      // Requiring that replicas must be present in the DN list simultaneously
      // ensures that the deletion commands for all replicas of the same
      // container can be issued at once, avoiding situations where some
      // replicas of the container are deleted while others are not.
      for (ContainerReplica replica : replicas) {
        DatanodeDetails datanodeDetails = replica.getDatanodeDetails();
        if (!dnList.contains(datanodeDetails)) {
          return;
        }
      }

      for (ContainerReplica replica : replicas) {
        DatanodeDetails details = replica.getDatanodeDetails();
        if (!dnList.contains(details)) {
          continue;
        }
        if (!transactionStatusManager.isDuplication(
            details, updatedTxn.getTxID(), commandStatus)) {
          transactions.addTransactionToDN(details.getUuid(), updatedTxn);
        }
      }
    }
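To make the all-or-nothing rule easier to see in isolation, here is a small self-contained sketch using plain strings for Datanode names. `ReplicaSelectionSketch` and `targetsFor` are hypothetical names for this illustration only; the real code works on `DatanodeDetails` and `ContainerReplica` as shown above.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical illustration of the "all replicas selected, or skip" rule. */
final class ReplicaSelectionSketch {
  private ReplicaSelectionSketch() {
  }

  /**
   * Returns the DNs that should receive the transaction. If any replica
   * holder is missing from the selected DNs, the container is skipped for
   * this round and an empty set is returned.
   */
  static Set<String> targetsFor(Set<String> replicaDns, Set<String> selectedDns) {
    if (!selectedDns.containsAll(replicaDns)) {
      return Collections.emptySet();
    }
    return new LinkedHashSet<>(replicaDns);
  }
}
```

For the EC-6-3 example, replicas DN1 to DN9 with only DN1 to DN6 selected yields an empty set, so no partial deletion is issued; once all nine are selected in the same round, all nine receive the command.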

slfan1989 · 2024-09-29T09:19:40Z

@xichen01 @adoroszlai

We rolled out this improvement internally in the SCM around 7 PM on September 26th, and we observed a significant enhancement in deletion efficiency, with 50 million blocks being fully processed within 5 hours.

The core of this improvement is to ensure that all DNs holding replicas of the same container receive the delete command at the same time. Their ACKs then reach SCM at approximately the same time, which makes it easier for SCM to confirm the block deletions.

I would like to prepare a PR and submit this change to the community.

ashishkumar50 · 2025-05-14T06:12:58Z

@slfan1989, The issue you found, Is your commit merged in upstream, what is the Jira ID?
cc: @ChenSammi

slfan1989 · 2025-05-14T06:42:52Z

> @slfan1989, The issue you found, Is your commit merged in upstream, what is the Jira ID?
> cc: @ChenSammi

@ashishkumar50 Thanks for the question! The relevant JIRA issue should be HDDS-11498, and this PR has already been merged.

The configuration I used during the deletion process is as follows:

-- OM 
ozone.path.deleting.limit.per.task 150000
ozone.directory.deleting.service.interval 180s

ozone.key.deleting.limit.per.task 150000
ozone.block.deleting.service.interval 180s

-- SCM
hdds.scm.block.deletion.per-interval.max 2000000
hdds.scm.block.deleting.service.interval 300s

HDDS-8882. Add state management of SCM's DeleteBlocksCommand to avoid sending duplicate delete transactions to the DN

ce17803

xichen01 marked this pull request as draft June 27, 2023 17:06

xichen01 added 6 commits June 28, 2023 16:07

added licensed for new file

d596899

Add unit test

2807f2a

Fix integration test

a94d6cb

Fix findbugs

2021a3b

Merge branch 'master' into HDDS-8882

e4d2e21

Add integration test

1ba8e08

xichen01 marked this pull request as ready for review June 30, 2023 08:49

xichen01 and others added 4 commits June 30, 2023 17:02

Fix test

a1117a2

Fix test

0bf4545

Fix test

5f51a92

Merge branch 'apache:master' into HDDS-8882

d10b30c

adoroszlai requested review from sodonnel, sumitagrawl and symious and removed request for sodonnel July 7, 2023 18:01

sumitagrawl reviewed Jul 11, 2023

xichen01 added 6 commits July 13, 2023 19:25

Add status cleanup when datanode is dead

b7a9c1c

Merge transactionToDNsCommitMap to SCMDeleteBlocksCommandStatusManager

391dca9

Merge branch 'HDDS-8882' of github.com:xichen01/ozone into HDDS-8882

5592902

Split TestSCMDeleteBlocksCommandStatusManager and TestSCMDeleteBlocksCommandStatusManager

f792ccc

Implement NodeManager#registerSendCommandNotify

be5e118

Fix test

5fc9b5c

xichen01 force-pushed the HDDS-8882 branch from 6fd5b8c to 5fc9b5c Compare July 17, 2023 06:58

Fix test

24e0e5a

move transactionToRetryCountMap into SCMDeletedBlockGTransactionStatusManager

d353c08

adoroszlai requested a review from sumitagrawl November 23, 2023 14:58

sumitagrawl mentioned this pull request Nov 27, 2023

HDDS-9748. When the command status is reported, the command set sent to the datanode is updated #5672

Closed

adoroszlai changed the title ~~HDDS-8882. Add status management of SCM's DeleteBlocksCommand to avoid sending duplicate delete transactions to the DN~~ HDDS-8882. Manage status of DeleteBlocksCommand in SCM to avoid sending duplicates to Datanode Nov 27, 2023

sumitagrawl reviewed Nov 29, 2023

xichen01 added 3 commits December 1, 2023 16:27

Merge recordTransactionCreated and recordTransactionCommitted; Remove useless code; Fix thread issue

d859264

Add additional processing logic in the updateStatus

addbdf5

Merge branch 'master' into HDDS-8882

3424aec

# Conflicts: # hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/SCMNodeManager.java

xichen01 requested a review from sumitagrawl December 1, 2023 08:39

xichen01 added 2 commits December 3, 2023 01:32

Fix test

343b67d

Findbugs

03cca5f

sumitagrawl approved these changes Dec 5, 2023

adoroszlai reviewed Dec 9, 2023

Remove getSCMDeletedBlockTransactionStatusManager from DeletedBlockLog

0a39cb3

adoroszlai requested review from Xushaohong, aryangupta1998 and sumitagrawl December 12, 2023 20:39

sumitagrawl merged commit 88e18e3 into apache:master Dec 18, 2023

xichen01 added a commit to xichen01/ozone that referenced this pull request Jul 17, 2024

HDDS-8882. Manage status of DeleteBlocksCommand in SCM to avoid sending duplicates to Datanode (apache#4988) (cherry picked from commit 88e18e3)

994182b

xichen01 mentioned this pull request Jul 18, 2024

[DO NOT MERGE] Backport some fixes, performance optimizations from master to ozone-1.4 #6929 #6964

Merged

swamirishi pushed a commit to swamirishi/ozone that referenced this pull request Dec 3, 2025

CDPD-57856. HDDS-8882. Manage status of DeleteBlocksCommand in SCM to avoid sending duplicates to Datanode (apache#4988) (cherry picked from commit 88e18e3)

8587d74
