
Fix NCCL Cmake autodetect issue #17297

Merged
leezu merged 9 commits into apache:master from nccl_autodetect on Jan 22, 2020

Conversation

@ChaiBapchya (Contributor) commented on Jan 14, 2020

Description

Fixes #17239

Checklist

Essentials

  • Changes are complete (i.e. I finished coding on this PR)
  • Code is well-documented:
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

modified: cmake/Modules/FindNCCL.cmake

Test

Tested with the following command:

cmake -GNinja -DUSE_CUDA=ON -DCMAKE_CUDA_COMPILER_LAUNCHER=ccache -DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache -DCMAKE_BUILD_TYPE=Release -DUSE_CUDNN=ON -DUSE_NCCL=ON ..

Without this branch, the build failed with NCCL not found:

Could NOT find NCCL (missing: NCCL_INCLUDE_DIRS NCCL_LIBRARIES)
CMake Warning at CMakeLists.txt:636 (message):
  Could not find NCCL libraries

With this PR, it passes.
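For context only (this is not the exact diff in this PR), NCCL autodetection in a CMake find module typically follows the pattern sketched below. NCCL_ROOT and the CUDA_TOOLKIT_ROOT_DIR hint paths are illustrative assumptions; the find_package_handle_standard_args call is what produces the standard "Could NOT find NCCL (missing: ...)" message quoted above.

# Minimal sketch of NCCL autodetection in a FindNCCL.cmake-style module.
# Honour an explicit NCCL_ROOT hint first, then fall back to the CUDA
# toolkit directory and standard system prefixes.
set(NCCL_ROOT "$ENV{NCCL_ROOT}" CACHE PATH "Folder containing NVIDIA NCCL")

find_path(NCCL_INCLUDE_DIRS
  NAMES nccl.h
  HINTS ${NCCL_ROOT} ${NCCL_ROOT}/include ${CUDA_TOOLKIT_ROOT_DIR}/include)

find_library(NCCL_LIBRARIES
  NAMES nccl
  HINTS ${NCCL_ROOT} ${NCCL_ROOT}/lib ${NCCL_ROOT}/lib64
        ${CUDA_TOOLKIT_ROOT_DIR}/lib64)

include(FindPackageHandleStandardArgs)
# Sets NCCL_FOUND and reports "Could NOT find NCCL (missing: ...)" when
# either variable could not be resolved.
find_package_handle_standard_args(NCCL DEFAULT_MSG
  NCCL_INCLUDE_DIRS NCCL_LIBRARIES)

mark_as_advanced(NCCL_ROOT NCCL_INCLUDE_DIRS NCCL_LIBRARIES)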

@apeforest (Contributor) left a comment

LGTM except for a few cosmetic changes.

@ChaiBapchya (Contributor, Author) commented:

Verified this works by running the following.
Fresh CMake build:

rm -rf build
mkdir -p build && cd build
cmake -GNinja \
    -DUSE_CUDA=ON \
    -DUSE_MKL_IF_AVAILABLE=OFF \
    -DCMAKE_CUDA_COMPILER_LAUNCHER=ccache \
    -DCMAKE_C_COMPILER_LAUNCHER=ccache \
    -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
    -DCMAKE_BUILD_TYPE=Release \
    -DUSE_NCCL=ON \
..
ninja

Python binding

cd ..
pip install -e python/.
python3 -c "from mxnet.runtime import feature_list; print(feature_list())"
[✔ CUDA, ✔ CUDNN, ✔ NCCL, ✔ CUDA_RTC, ✖ TENSORRT, ✔ CPU_SSE, ✔ CPU_SSE2, ✔ CPU_SSE3, ✔ CPU_SSE4_1, ✔ CPU_SSE4_2, ✖ CPU_SSE4A, ✔ CPU_AVX, ✖ CPU_AVX2, ✔ OPENMP, ✖ SSE, ✔ F16C, ✔ JEMALLOC, ✔ BLAS_OPEN, ✖ BLAS_ATLAS, ✖ BLAS_MKL, ✖ BLAS_APPLE, ✔ LAPACK, ✖ MKLDNN, ✔ OPENCV, ✖ CAFFE, ✖ PROFILER, ✖ DIST_KVSTORE, ✖ CXX14, ✖ INT64_TENSOR_SIZE, ✔ SIGNAL_HANDLER, ✖ DEBUG, ✖ TVM_OP]

@ChaiBapchya (Contributor, Author) commented on Jan 15, 2020

@mxnet-label-bot add [pr-awaiting-review]

@lanking520 added the pr-awaiting-review (PR is waiting for code review) label on Jan 15, 2020
Commit: Also add warning about deprecation
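(For illustration only, not the actual commit contents: a deprecation warning in CMake is normally raised with message(WARNING ...). The legacy variable name NCCL_ROOT_DIR below is a hypothetical placeholder, not necessarily what this commit deprecates.)

# Hypothetical sketch: warn when a legacy cache variable is still set and
# forward its value to the new one.
if(DEFINED NCCL_ROOT_DIR)
  message(WARNING "NCCL_ROOT_DIR is deprecated; please set NCCL_ROOT instead.")
  set(NCCL_ROOT "${NCCL_ROOT_DIR}")
endif()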
@ChaiBapchya (Contributor, Author) commented:

Also, @leezu, any idea why it's failing for Windows CPU CMake? I'm unable to discern the cause from the error log.

@leezu (Contributor) commented on Jan 17, 2020

Not sure. I restarted the job to make sure it's not an intermittent issue.

@ChaiBapchya (Contributor, Author) commented:

Wow! It passed! Weird! @leezu, thanks for the retrigger.

@apeforest (Contributor) commented:

@ChaiBapchya Please fix the indentation issue I pointed out. Otherwise, this PR is good to go for me.

@apeforest (Contributor) left a comment

LGTM

@leezu merged commit 3de5cae into apache:master on Jan 22, 2020
@ChaiBapchya deleted the nccl_autodetect branch on January 22, 2020, 19:12
Labels
pr-awaiting-review PR is waiting for code review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cmake with NCCL flag does not work.
4 participants