Fix bug where onnxruntime_USE_NCCL flag would default to ON #12195
Conversation
/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, Windows CPU CI Pipeline, Windows GPU CI Pipeline

/azp run Windows GPU TensorRT CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, onnxruntime-python-checks-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed

Azure Pipelines successfully started running 6 pipeline(s).

Azure Pipelines successfully started running 10 pipeline(s).
Fix bug where onnxruntime_USE_NCCL flag would default to ON, causing ORT to not build properly. New functionality: flag is ON when training is enabled and NCCL is not disabled. Flag is OFF otherwise
Commits merged from main:

* Support optimizer opt for deepspeed 0.5.9; resolve comments
* FP16_Optimizer support for more DeepSpeed versions (#12046): fp16_optimizer for more ds versions; change ds version; bug fixes
* Fix unused function warning for decodeMIDR() (#12069): changed from a static function defined in the header to a function declared in the header and defined in a separate .cc file
* Pin protobuf version to be compatible with onnx (#12132)
* RoiAlign CPU EP: add warning for max mode with samples != 1 (#12136): RoiAlign now warns about incorrect max summation when the sample size is not 1
* Include coreml_provider_factory.h in the macOS build instead of coreml_execution_provider.h (#12138)
* List 3.10 as a supported Python version and remove 3.6 (#12141)
* Use updated symbolic_helper.check_training_mode (#11900)
* Fix GH issue 12151 by using inverse perms for updating the DQ axis attribute (#12158): the inverse perms must be used to map the axis to the one used for transposing the input; this only applies when the DQ node performs per-axis dequantization
* Fix positions for beam search gpt2 (#12156)
* Remove wrongly placed libs (#12201)
* Add file mapping for the Windows platform (#12183): add a unit test for file mapping on Windows and an error message for a misaligned offset; update data types to avoid warnings; update the CreateFileMapping2 condition for WinML compilation; add a type conversion to avoid warnings in the x86 release build
* Fix bug where onnxruntime_USE_NCCL flag would default to ON (#12195): the flag defaulting to ON caused ORT to not build properly; the flag is now ON when training is enabled and NCCL is not disabled, and OFF otherwise

Co-authored-by: zhijxu <[email protected]>
Co-authored-by: zhijxu <zhijxu>
Co-authored-by: Vincent Wang <[email protected]>
Co-authored-by: Edward Chen <[email protected]>
Co-authored-by: Ashwini Khade <[email protected]>
Co-authored-by: Ashwini Khade <[email protected]@orttrainingdev10.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Co-authored-by: Dwayne Robinson <[email protected]>
Co-authored-by: Carson Swope <[email protected]>
Co-authored-by: Randy Shuai <[email protected]>
Co-authored-by: jingyanwangms <[email protected]>
Co-authored-by: Scott McKay <[email protected]>
Co-authored-by: Viswanath Boga <[email protected]>
Co-authored-by: leqiao-1 <[email protected]>
Co-authored-by: caoting-dotcom <[email protected]>
Co-authored-by: Ting Cao <[email protected]>
Co-authored-by: Sean Murray <[email protected]>
Description: Change the onnxruntime_USE_NCCL flag behavior: the flag is ON when training is enabled and NCCL is not disabled, and OFF otherwise.
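The new default described above can be sketched as a small predicate. This is a minimal sketch, not the actual build.py code; `enable_training` and `disable_nccl` are hypothetical stand-ins for the corresponding build arguments:

```python
def nccl_flag(enable_training: bool, disable_nccl: bool) -> str:
    """Sketch of the fixed default for onnxruntime_USE_NCCL:
    ON only when training is enabled AND NCCL is not explicitly disabled."""
    return "ON" if enable_training and not disable_nccl else "OFF"

# The buggy behavior defaulted the flag to ON whenever disable_nccl was
# unset, regardless of whether training was enabled.
```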
Motivation and Context
With training not enabled and TensorRT not enabled, the onnxruntime_USE_NCCL flag defaults to ON, causing the ORT build to fail on my Linux machine with gcc 9.4.0.
The build fails with the error "/include: not a directory" because of the include directories passed to cmake. The onnxruntime_USE_NCCL flag is set to ON whenever the disable_NCCL arg/flag is not set, and it is not set by default. This occurs on line 888 of build.py.
With onnxruntime_USE_NCCL ON but CUDA and TensorRT not enabled, the path "/include" gets added to cmake's include directories; that path is not a directory on my machine, so the build fails. This occurs on line 45 of onnxruntime_framework.cmake.
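How the stray "/include" path arises can be illustrated with a minimal sketch (the variable name `nccl_home` is a hypothetical stand-in, not the actual cmake variable): when NCCL's location is never resolved, the prefix is empty and the concatenated include path degenerates to the absolute path "/include".

```python
# Hypothetical reconstruction of the failure mode described above.
nccl_home = ""  # never resolved when CUDA and TensorRT are disabled
include_dir = nccl_home + "/include"
print(include_dir)  # "/include" — an absolute path that is handed to cmake
```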