Fix for import mxnet taking long time if multiple process launched #13602
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -222,3 +222,17 @@ Settings for More GPU Parallelism | |
- Set ```MXNET_GPU_WORKER_NTHREADS``` to a larger number (e.g., 2) | ||
- To reduce memory usage, consider setting ```MXNET_EXEC_NUM_TEMP```. | ||
- This might not speed things up, especially for image applications, because GPU is usually fully utilized even with serialized jobs. | ||
|
||
Settings for controlling OMP tuning | ||
--------------------------------- | ||
- Set ```MXNET_USE_OPERATOR_TUNING=0``` to disable Operator tuning code which decides whether to use OMP or not for operator | ||
- Values: String representation of MXNET_ENABLE_OPERATOR_TUNING environment variable | ||
- 0=disable all | ||
- 1=enable all | ||
- float32, float16, float32=list of types to enable, and disable those not listed | ||
- refer : https://github.com/apache/incubator-mxnet/blob/master/src/operator/operator_tune-inl.h#L444 | ||
Not sure it's a good choice to put a code link here. Once

Ah, I forgot to add the diff where I listed all the data types. I will create a separate PR to correct this.
||
|
||
- Set ```MXNET_USE_NUM_CORES_OPERATOR_TUNING``` to define num_cores to be used by operator tuning code. | ||
- This reduces operator tuning overhead when there are multiple instances of mxnet running in the system and we know that | ||
each mxnet will take only partial num_cores available with system. | ||
- refer: https://github.com/apache/incubator-mxnet/pull/13602 |
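The override behaviour documented for `MXNET_USE_NUM_CORES_OPERATOR_TUNING` can be sketched in standalone C++. This is only an illustration under stated assumptions: it uses `std::getenv` rather than `dmlc::GetEnv`, the helper name `ResolveTuningCores` is hypothetical, and falling back to the default on malformed input is a choice made for this sketch, not necessarily what MXNet does.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper, not part of MXNet: returns the number of cores the
// tuning code would use. default_cores stands in for the value the PR
// computes as omp_get_num_procs() >> 1.
inline std::size_t ResolveTuningCores(
    std::size_t default_cores,
    const char* var_name = "MXNET_USE_NUM_CORES_OPERATOR_TUNING") {
  const char* raw = std::getenv(var_name);
  // Unset or empty variable: keep the default.
  if (raw == nullptr || *raw == '\0') return default_cores;
  // Assumption for this sketch: only plain non-negative integers are accepted.
  if (!std::isdigit(static_cast<unsigned char>(*raw))) return default_cores;
  char* end = nullptr;
  const unsigned long long parsed = std::strtoull(raw, &end, 10);
  // Trailing garbage (e.g. "4x"): keep the default.
  if (*end != '\0') return default_cores;
  return static_cast<std::size_t>(parsed);
}
```

With the variable set to `4`, `ResolveTuningCores(16)` returns 4; with the variable unset it returns 16. When several MXNet processes share one machine, each process could be launched with a value matching its share of the cores.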
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -56,7 +56,7 @@ namespace op { | |
#endif | ||
#endif // MXNET_NO_INLINE | ||
|
||
#define OUTSIDE_COUNT_SHIFT 9 | ||
Does changing this impact the IsOMPFaster selection in operator_tune.h? Do we need to tweak WORKLOAD_COUNT_SHIFT too?

WORKLOAD_COUNT_SHIFT is currently 11, which means the workload count will be 2048.
||
#define OUTSIDE_COUNT_SHIFT 3 | ||
Vikas-kum marked this conversation as resolved.
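To make the numbers in this thread concrete, here is a minimal sketch. It assumes each count is derived from its shift as `1 << shift`, which is consistent with the reply's shift of 11 giving 2048. `CountFromShift` is a hypothetical helper, not an MXNet function.

```cpp
#include <cstddef>

// Hypothetical helper: assumes a count is computed from its shift as 1 << shift.
constexpr std::size_t CountFromShift(unsigned shift) {
  return static_cast<std::size_t>(1) << shift;
}

// Shift values taken from the diff and the review reply.
static_assert(CountFromShift(9) == 512, "old OUTSIDE_COUNT_SHIFT");
static_assert(CountFromShift(3) == 8, "new OUTSIDE_COUNT_SHIFT");
static_assert(CountFromShift(11) == 2048, "WORKLOAD_COUNT_SHIFT");
```

Under that assumption, the change lowers the outside count from 512 to 8, while the workload count stays at 2048.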
|
||
|
||
namespace tune { | ||
|
||
|
@@ -356,7 +356,8 @@ class OperatorTune : public OperatorTuneByType<DType> { | |
static duration_t GetOMPLoopOverhead() { | ||
// It was found empirically that OMP times was not heavily tied to number of cores, | ||
// so take an average across all core counts | ||
const auto max_cores = static_cast<size_t>(omp_get_num_procs()) >> 1; | ||
const auto max_cores_default = static_cast<size_t>(omp_get_num_procs()) >> 1; | ||
Vikas-kum marked this conversation as resolved.
|
||
const auto max_cores = dmlc::GetEnv("MXNET_USE_NUM_CORES_OPERATOR_TUNING", max_cores_default); | ||
Vikas-kum marked this conversation as resolved.
|
||
if (max_cores >= 2) { | ||
std::vector<duration_t> core_times; | ||
// Take care of any OMP lazy-init with a throwaway call | ||
|
Can we list the valid types here: "float32", "float16", "float64", "int8", "uint8", "int32", "int64"