
Conversation

Contributor

@sodonnel sodonnel commented Nov 11, 2019

What changes were proposed in this pull request?

The datanode receives commands over the heartbeat response and queues them all on a single queue in StateContext.commandQueue. Inside DatanodeStateMachine, a single thread (started by initCommandHandlerThread) processes this queue and passes each command to a handler. Each command type has its own handler.

The delete container command currently executes immediately on the thread that processes the command queue. Therefore, if a delete is slow for some reason (it must access the disk, so this is possible), it could cause other commands to back up.

This should be changed so the deleteContainer command is queued on a thread pool, in a similar way to ReplicateContainerCommand.
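A minimal sketch of the proposed change (class and method names here are illustrative, not the actual Ozone handler API): instead of running the delete inline on the command-dispatch thread, the handler submits each delete to a bounded thread pool, mirroring how ReplicateContainerCommand is queued.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class DeleteContainerSketch {

  private final ExecutorService executor;
  private final AtomicInteger queuedCount = new AtomicInteger(0);

  public DeleteContainerSketch(int poolSize) {
    this.executor = Executors.newFixedThreadPool(poolSize);
  }

  /** Queue the delete instead of executing it on the dispatch thread. */
  public void handle(long containerId) {
    queuedCount.incrementAndGet();
    executor.submit(() -> {
      // Potentially slow disk work happens here, off the dispatch thread.
      deleteFromDisk(containerId);
    });
  }

  private void deleteFromDisk(long containerId) {
    // Placeholder for the real container deletion work.
  }

  public int getQueuedCount() {
    return queuedCount.get();
  }

  public void shutdown() {
    executor.shutdown();
    try {
      executor.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

With this shape, a slow disk only delays the delete pool, not unrelated command types behind it on the shared queue.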

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-2448

@sodonnel
Contributor Author

This patch is not ready to commit yet, as it needs HDDS-2450 committed first and then the threadpool size made configurable.

Contributor

@anuengineer anuengineer left a comment


+1. LGTM. If you can push the change for 2 -> a name, then I will commit it immediately. Thanks for getting this done.

Contributor


Can you define this as a delete thread pool size constant somewhere? Otherwise the bare 2 is kind of confusing to people.

Contributor Author


Yes, that "2" needs to be configurable, just like the "replicate container thread pool size" in HDDS-2450. Now that HDDS-2450 has been committed, I will rebase this patch and make this thread pool configurable in the same way.

@sodonnel sodonnel force-pushed the HDDS-2448-del-container branch 2 times, most recently from 8a994d4 to 7796c58 Compare November 15, 2019 11:15
@sodonnel sodonnel force-pushed the HDDS-2448-del-container branch from 7796c58 to 471a648 Compare November 19, 2019 12:14
Contributor

@adoroszlai adoroszlai left a comment


Thanks @sodonnel for the improvement.

I have a few minor nits about code and config naming. Please consider them for this PR if another round of rebase turns out to be necessary; otherwise, please consider them for your next change in this area. Thanks.

I would also love a brief summary about how this was tested. ;)

private long totalTime;
private final AtomicInteger invocationCount = new AtomicInteger(0);
private final AtomicLong totalTime = new AtomicLong(0);
private final ThreadPoolExecutor executor;
Contributor


Prefer declaring it as the more generic ExecutorService.

Contributor Author


I have fixed this.
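The fix amounts to typing the field against the interface rather than the concrete class. A small sketch under assumed names (the class here is illustrative, not the actual Ozone handler):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class DeleteCommandHandlerSketch {

  // Declared against the ExecutorService interface; only the constructor
  // knows the concrete pool type, so the implementation can later be
  // swapped without touching any code that reads this field.
  private final ExecutorService executor;

  public DeleteCommandHandlerSketch(int threads) {
    this.executor = Executors.newFixedThreadPool(threads);
  }

  public ExecutorService getExecutor() {
    return executor;
  }
}
```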

Comment on lines 96 to 97
return invocationCount.get() == 0 ?
0 : totalTime.get() / invocationCount.get();
Contributor


The result of invocationCount.get() could be saved in a local variable for consistency.

Contributor Author


Well spotted. I have fixed this.
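Reading the atomic counter once into a local also guarantees both uses see the same value, avoiding a possible divide-by-zero if the count changed between the two get() calls. A self-contained sketch of the fixed method (class and method names are illustrative):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

public class AverageTime {

  private final AtomicInteger invocationCount = new AtomicInteger(0);
  private final AtomicLong totalTime = new AtomicLong(0);

  public void record(long elapsedMillis) {
    invocationCount.incrementAndGet();
    totalTime.addAndGet(elapsedMillis);
  }

  /** Read the counter once so the zero-check and the division agree. */
  public long getAverageTime() {
    final int count = invocationCount.get();
    return count == 0 ? 0 : totalTime.get() / count;
  }
}
```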

return replicationMaxStreams;
}

@Config(key = "delete.container.threads",
Contributor


Nit: I think container.delete.threads.max would better reflect both the hierarchical nature of config item naming, and the fact that it's an upper limit, not a fixed thread count.

Contributor Author


You are correct. It makes more sense to have "container.delete". I have changed this and also the related method names so they match up.
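The renamed setting might look like the following, using the @Config annotation pattern visible in this PR (the attribute names and enum values shown are illustrative of Ozone's config style, not a verbatim quote of the committed code):

```java
@Config(key = "container.delete.threads.max",
    type = ConfigType.INT,
    defaultValue = "2",
    tags = {ConfigTag.DATANODE},
    description = "Maximum number of threads used to delete containers on "
        + "a datanode.")
private int containerDeleteThreads = 2;
```

Putting "max" in the key makes it clear the value is an upper bound on pool size, not a fixed thread count.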

Comment on lines 74 to 77
if (val < 1) {
  LOG.warn("hdds.datanode.delete.container.threads must be greater than " +
      "zero and was set to {}. Defaulting to {}",
      val, deleteContainerThreads);
Contributor


This is great only for the case when config is set during load from XML. Later, if the config is being set programmatically, we have two (minor) issues:

  1. the default value is not the real default, rather the previously set valid config
  2. warning is human-friendly, but irrelevant to programmatic callers

E.g.

setDeleteContainerThreads(10);
setDeleteContainerThreads(-1);

would log ... Defaulting to 10.

So I think this could be improved by:

  1. creating a constant for the default value
  2. using an Integer to distinguish between unset and set states
  3. logging the warning only if previously unset
  4. consider throwing an exception (or silently ignoring invalid values) if previously already set

(I wanted to mention this for the previous, replication-related PR, but was late to the party. ;) )

Contributor Author


These are good points. For now, I have kept the warn log message, as I don't like the idea of silently ignoring bad values. From a usability perspective, I would like the DN to still start if an operator puts a bad value in the config, rather than fail completely. However, it is easy to argue this the other way too: that we should fail on a bad value rather than trying to be too clever.

I have taken on the default point and applied it to both parameters in the DatanodeConfig class.
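The reviewer's four suggestions can be combined into one setter. A hypothetical sketch (class, method, and constant names are illustrative; this follows the reviewer's proposal, including the throw-on-reset behaviour the author chose not to adopt):

```java
public class DeleteThreadsConfig {

  public static final int DEFAULT_DELETE_CONTAINER_THREADS = 2;

  // Integer rather than int, so "never set" (null) is distinguishable
  // from every valid configured value.
  private Integer deleteContainerThreads;

  public void setDeleteContainerThreads(int val) {
    if (val < 1) {
      if (deleteContainerThreads == null) {
        // First (XML-load) assignment: warn and fall back to the real
        // compile-time default, not some previously set value.
        System.err.println("delete container threads must be >= 1, was "
            + val + "; defaulting to " + DEFAULT_DELETE_CONTAINER_THREADS);
        deleteContainerThreads = DEFAULT_DELETE_CONTAINER_THREADS;
      } else {
        // A valid value was already set, so this is a programmatic
        // caller passing garbage: fail fast instead of logging.
        throw new IllegalArgumentException(
            "delete container threads must be >= 1, was " + val);
      }
    } else {
      deleteContainerThreads = val;
    }
  }

  public int getDeleteContainerThreads() {
    return deleteContainerThreads == null
        ? DEFAULT_DELETE_CONTAINER_THREADS : deleteContainerThreads;
  }
}
```

With this shape, setDeleteContainerThreads(10) followed by setDeleteContainerThreads(-1) throws rather than logging the misleading "Defaulting to 10" message.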

Contributor

@adoroszlai adoroszlai left a comment


Thanks @sodonnel for addressing my nits.

@anuengineer anuengineer merged commit 6186cf9 into apache:master Nov 19, 2019
@anuengineer
Contributor

Thanks for the contribution. I have committed this to the master branch. @adoroszlai Thanks for the reviews.

ptlrs pushed a commit to ptlrs/ozone that referenced this pull request Mar 8, 2025