This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
Support multi-threading for Custom Operator #14363
Have you evaluated this when the custom op is part of a bigger graph, and is there any performance impact? Since the CustomOperator is static, its lifetime lasts until the end of the program, and its destructor is only called at exit. This means one thread is waiting on the condition variable while the other 11 threads are blocked trying to obtain the lock. Since these are idle threads, I am not sure the impact will be significant, but it is good to verify. It will also help us come up with a good default for MXNET_CUSTOM_OP_NUM_THREADS.
Thank you!
Sorry, I do not have a server with multiple GPUs to evaluate a big graph.
In this PR, the number of threads grows when there are not enough threads, up to a maximum of MXNET_CUSTOM_OP_NUM_THREADS.
There are always threads that get the lock and execute the operator function, so I think the idle threads do not hurt performance.
I think the maximum value of MXNET_CUSTOM_OP_NUM_THREADS should be the number of CPU cores.
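To make the grow-on-demand behaviour described above concrete, here is a minimal sketch in Python. It is not the code from this PR (which is C++), and the class `GrowingPool` and its methods are hypothetical names used only for illustration. It assumes a pool that starts with no workers, adds one only when the number of queued or running tasks exceeds the number of workers, and never exceeds a fixed cap. Idle workers block on a condition variable instead of spinning.

```python
import threading
from collections import deque


class GrowingPool:
    """Hypothetical worker pool: grows on demand, capped at max_threads."""

    def __init__(self, max_threads):
        self.max_threads = max_threads
        self._lock = threading.Lock()
        # Two condition variables share one lock: one wakes workers,
        # the other wakes callers waiting for all tasks to finish.
        self._work_cv = threading.Condition(self._lock)
        self._done_cv = threading.Condition(self._lock)
        self._tasks = deque()
        self._workers = []
        self._pending = 0   # tasks that are queued or currently running
        self._stop = False

    def push(self, fn):
        with self._lock:
            self._tasks.append(fn)
            self._pending += 1
            # Grow only when the existing workers cannot cover the load.
            if (self._pending > len(self._workers)
                    and len(self._workers) < self.max_threads):
                t = threading.Thread(target=self._worker, daemon=True)
                self._workers.append(t)
                t.start()
            self._work_cv.notify()

    def _worker(self):
        while True:
            with self._lock:
                # Idle workers sleep here until a task arrives or shutdown.
                while not self._tasks and not self._stop:
                    self._work_cv.wait()
                if not self._tasks:
                    return  # stop requested and queue drained
                fn = self._tasks.popleft()
            try:
                fn()  # run the task outside the lock
            finally:
                with self._lock:
                    self._pending -= 1
                    self._done_cv.notify_all()

    def wait_all(self):
        with self._lock:
            while self._pending:
                self._done_cv.wait()

    def num_threads(self):
        with self._lock:
            return len(self._workers)

    def shutdown(self):
        with self._lock:
            self._stop = True
            self._work_cv.notify_all()
        for t in self._workers:
            t.join()
```

In this sketch the cap would be passed in by the caller, for example read from MXNET_CUSTOM_OP_NUM_THREADS with the CPU core count as a fallback, which is an assumption based on the suggestion above rather than the PR's actual default. Tasks pushed one at a time, each finishing before the next is pushed, never cause a second worker to be created.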
Yes, it's fine during op execution, but even after custom op execution there is one thread waiting on the CV and the other 11 trying to acquire the lock.
I see.
I once considered decreasing the number of threads when they are idle; however, it is difficult to estimate the right number.
Can you run example/reinforcement-learning/dqn, which includes a custom op, on CPU to check performance?
It may not be meaningful to check performance on CPU only, since there is usually only one computational stream. I will find a server with multiple GPUs and check on it.
Yes, this change is mainly for performance improvements on a GPU machine, but we should be careful not to impact CPU performance for inference. Otherwise we should keep the default small.
Sorry, I ran into a problem when running the RL demo:
I'm checking it. The problem may be related to GPERFTOOLS and JEMALLOC. I have disabled them and plan to rebuild MXNet.
In my experiment, performance does not drop in the DQN example. The FPS stays at 1500 on my laptop, which has only a CPU, an i7-7500U (2c4t).
Nice! Popping early will prevent the race condition.