HDDS-2448 Delete container command should use a thread pool #142
Conversation
This patch is not ready to commit yet, as it needs HDDS-2450 committed first and then the threadpool size made configurable.
anuengineer left a comment
+1. LGTM. If you can push the change for 2 -> a name, then I will commit it immediately. Thanks for getting this done.
Can you define this as a delete thread pool size somewhere? Otherwise 2 is kind of confusing to people.
adoroszlai left a comment
Thanks @sodonnel for the improvement.
I have a few minor nits about code and config naming. Please consider them for this PR, should another round of rebase be necessary. Otherwise, please consider them for your next change related to this area. Thanks.
I would also love a brief summary of how this was tested. ;)
- private long totalTime;
+ private final AtomicInteger invocationCount = new AtomicInteger(0);
+ private final AtomicLong totalTime = new AtomicLong(0);
+ private final ThreadPoolExecutor executor;
Prefer declaring it as the more generic ExecutorService.
I have fixed this.
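For illustration, a minimal sketch of that fix (the class and field names here are assumptions, not the PR's exact code): the field is typed against the generic ExecutorService interface, while ThreadPoolExecutor remains a construction-time detail.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class DeleteContainerCommandHandlerSketch {
  // Declared against the interface; the concrete pool type is an
  // implementation detail of the constructor.
  private final ExecutorService executor;

  DeleteContainerCommandHandlerSketch(int maxThreads) {
    this.executor = new ThreadPoolExecutor(
        maxThreads, maxThreads, 60, TimeUnit.SECONDS,
        new LinkedBlockingQueue<>());
  }
}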
return invocationCount.get() == 0 ?
    0 : totalTime.get() / invocationCount.get();
The result of invocationCount.get() could be saved in a local variable for consistency.
Well spotted. I have fixed this.
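A sketch of the adjusted accessor (the surrounding class and the method name getAverageRunTime are assumptions): reading the counter once means the zero guard and the division see the same value.

import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

class HandlerMetricsSketch {
  private final AtomicInteger invocationCount = new AtomicInteger(0);
  private final AtomicLong totalTime = new AtomicLong(0);

  // Single read of the counter; both the guard and the division use it.
  public long getAverageRunTime() {
    final int count = invocationCount.get();
    return count == 0 ? 0 : totalTime.get() / count;
  }
}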
  return replicationMaxStreams;
}

@Config(key = "delete.container.threads",
Nit: I think container.delete.threads.max would better reflect both the hierarchical nature of config item naming and the fact that it's an upper limit, not a fixed thread count.
You are correct. It makes more sense to have "container.delete". I have changed this and also the related method names so they match up.
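Sketched after the rename (the setter name, description text, and default value here are illustrative; the annotation attributes follow the hdds @Config style visible in the hunk above):

private int containerDeleteThreads = 2;

@Config(key = "container.delete.threads.max",
    type = ConfigType.INT,
    defaultValue = "2",
    tags = {ConfigTag.DATANODE},
    description = "Maximum number of threads used to delete containers "
        + "on a datanode.")
public void setContainerDeleteThreads(int threads) {
  this.containerDeleteThreads = threads;
}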
if (val < 1) {
  LOG.warn("hdds.datanode.delete.container.threads must be greater than " +
      "zero and was set to {}. Defaulting to {}",
      val, deleteContainerThreads);
This is great only for the case when config is set during load from XML. Later, if the config is being set programmatically, we have two (minor) issues:
- the default value is not the real default, rather the previously set valid config
- warning is human-friendly, but irrelevant to programmatic callers
E.g.
  setDeleteContainerThreads(10);
  setDeleteContainerThreads(-1);
would log "... Defaulting to 10".
So I think this could be improved by:
- creating a constant for the default value
- using an Integer to distinguish between unset and set states
- logging the warning only if previously unset
- considering throwing an exception (or silently ignoring invalid values) if previously already set
(I wanted to mention this for the previous, replication-related PR, but was late to the party. ;) )
These are good points. For now, I have kept the warn log message, as I don't like the idea of silently ignoring bad values. From a usability perspective, I would like the DN to still start if an operator puts a bad value in the config, rather than failing completely. However, it is easy to argue this the other way too: that we should fail on a bad value rather than trying to be too clever.
I have taken on the default point and applied it to both parameters in the DatanodeConfig class.
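A rough sketch of the suggested pattern (class, method, and config-key names are hypothetical, and LOG is assumed to be an slf4j logger): a constant holds the real default, and a nullable Integer distinguishes "never set" from "already set", so the warning only fires on the first invalid assignment.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class DatanodeConfigSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(DatanodeConfigSketch.class);

  // The real default lives in a constant, not in the field's last value.
  static final int DELETE_CONTAINER_THREADS_DEFAULT = 2;

  // null means "never explicitly set".
  private Integer deleteContainerThreads;

  public void setDeleteContainerThreads(int val) {
    if (val < 1) {
      if (deleteContainerThreads == null) {
        // Invalid value during config load: warn and keep the constant.
        LOG.warn("container.delete.threads.max must be greater than zero "
            + "and was set to {}. Defaulting to {}",
            val, DELETE_CONTAINER_THREADS_DEFAULT);
      } else {
        // Invalid programmatic overwrite of an already-valid value.
        throw new IllegalArgumentException(
            "container.delete.threads.max must be greater than zero: " + val);
      }
      return;
    }
    deleteContainerThreads = val;
  }

  public int getDeleteContainerThreads() {
    return deleteContainerThreads == null
        ? DELETE_CONTAINER_THREADS_DEFAULT
        : deleteContainerThreads;
  }
}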
adoroszlai left a comment
Thanks @sodonnel for addressing my nits.
Thanks for the contribution. I have committed this to the master branch. @adoroszlai Thanks for the reviews.
What changes were proposed in this pull request?
The datanode receives commands over the heartbeat and queues all commands on a single queue in StateContext.commandQueue. Inside DatanodeStateMachine a single thread (started by initCommandHandlerThread) is used to process this queue, passing each command to a ‘handler’. Each command type has its own handler.
The delete container command immediately executes the command on the thread used to process the command queue. Therefore, if the delete is slow for some reason (it must access disk, so this is possible), it could cause other commands to back up.
This should be changed to use a threadpool to queue the deleteContainer command, in a similar way to ReplicateContainerCommand.
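In outline (class and pool names are simplified illustrations, not the PR's exact code), the handler enqueues the work on a bounded pool and returns, so the single dispatcher thread is never blocked by a slow disk-bound delete:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class DeleteCommandQueueSketch {
  // Hypothetical fixed size; the PR makes this configurable.
  private final ExecutorService deleteExecutor =
      Executors.newFixedThreadPool(2);

  // Called on the command-dispatcher thread: submit and return
  // immediately; the delete itself runs on a pool thread.
  void handleDeleteContainer(Runnable deleteTask) {
    deleteExecutor.submit(deleteTask);
  }
}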
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-2448