This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Fix nightly CD for GPU builds #18205

Merged (4 commits) on Apr 30, 2020
2 changes: 1 addition & 1 deletion cd/mxnet_lib/static/Jenkins_pipeline.groovy
@@ -33,7 +33,7 @@ licenses = 'licenses/*'

// libmxnet dependencies
mx_native_deps = 'lib/libgfortran.so.4, lib/libquadmath.so.0'
-mx_deps = 'lib/libgfortran.so.4, lib/libquadmath.so.0, 3rdparty/mkldnn/build/install/include/dnnl_version.h, 3rdparty/mkldnn/build/install/include/dnnl_config.h'
+mx_deps = 'lib/libgfortran.so.4, lib/libquadmath.so.0, include/mkldnn/dnnl_version.h, include/mkldnn/dnnl_config.h'

// library type
// either static or dynamic - depending on how it links to its dependencies
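The `mx_deps` value is a comma-separated list of paths relative to the workspace, so a moved header silently breaks packaging if the list is not updated. A minimal Python sketch of how such a list could be split and checked before packaging follows. The helper names `split_deps` and `missing_deps` are hypothetical and are not part of the Jenkins pipeline.

```python
import os
import tempfile

# Value of mx_deps after this change, copied from the diff above.
MX_DEPS = ('lib/libgfortran.so.4, lib/libquadmath.so.0, '
           'include/mkldnn/dnnl_version.h, include/mkldnn/dnnl_config.h')


def split_deps(deps):
    """Split a comma-separated dependency string into clean relative paths."""
    return [p.strip() for p in deps.split(',') if p.strip()]


def missing_deps(deps, root):
    """Return the listed paths that do not exist under root (hypothetical check)."""
    return [p for p in split_deps(deps)
            if not os.path.exists(os.path.join(root, p))]


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as root:
        # Simulate a workspace that only contains the two shared libraries.
        os.makedirs(os.path.join(root, 'lib'))
        for name in ('libgfortran.so.4', 'libquadmath.so.0'):
            open(os.path.join(root, 'lib', name), 'w').close()
        # Prints the two header paths, which are absent from this fake workspace.
        print(missing_deps(MX_DEPS, root))
```

Running a check like this against the real build output directory would report any entry whose location no longer matches the list.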
4 changes: 3 additions & 1 deletion ci/docker/runtime_functions.sh
@@ -989,7 +989,8 @@ cd_unittest_ubuntu() {

# Adding these here as CI doesn't test all CUDA environments
pytest example/image-classification/test_score.py
-integrationtest_ubuntu_gpu_dist_kvstore
+# TODO(szha): fix and reenable the hanging issue. tracked in #18098
+# integrationtest_ubuntu_gpu_dist_kvstore
fi

if [[ ${mxnet_variant} = *mkl ]]; then
@@ -1885,6 +1886,7 @@ build_static_libmxnet() {
source /opt/rh/devtoolset-7/enable
source /opt/rh/rh-python36/enable
export USE_SYSTEM_CUDA=1
+export CMAKE_STATICBUILD=1
local mxnet_variant=${1:?"This function requires a python command as the first argument"}
source tools/staticbuild/build.sh ${mxnet_variant}
popd
2 changes: 1 addition & 1 deletion config/distribution/linux_cu100.cmake
@@ -33,4 +33,4 @@ set(USE_F16C OFF CACHE BOOL "Build with x86 F16C instruction support")
set(USE_LIBJPEG_TURBO ON CACHE BOOL "Build with libjpeg-turbo")

set(CUDACXX "/usr/local/cuda-10.0/bin/nvcc" CACHE STRING "Cuda compiler")
-set(MXNET_CUDA_ARCH "3.0;5.0;6.0;7.0;7.5" CACHE STRING "Cuda architectures")
+set(MXNET_CUDA_ARCH "3.0;5.0;6.0;7.0" CACHE STRING "Cuda architectures")
Contributor

Is dropping support for the 7.5 CUDA arch desirable? For example, the Tesla T4 (G4 instances) has CUDA arch 7.5.
@leezu What do you think?

Contributor Author

It's a temporary fix to get the CD working. I checked the builds for cu100 and cu102; both fail because of binary size issues. We should work on adding back the 7.5 arch after making sure the build works.

Contributor

Sidenote: 7.0 binaries also run on 7.5
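
The sidenote rests on CUDA's binary compatibility rule: device code compiled for compute capability X.y runs on devices of capability X.z with z >= y, within the same major version. A minimal Python sketch of that rule, applied to the arch list from the diff, follows. The function names are hypothetical, and the sketch models only precompiled binaries, not PTX JIT compilation or any 7.5-specific optimizations.

```python
def parse_arch(arch):
    """Parse a compute capability such as '7.5' into (major, minor)."""
    major, minor = arch.split('.')
    return int(major), int(minor)


def binary_runs_on(built_for, device):
    """Binary compatibility: same major version, device minor >= build minor."""
    b_major, b_minor = parse_arch(built_for)
    d_major, d_minor = parse_arch(device)
    return b_major == d_major and d_minor >= b_minor


def covering_archs(arch_list, device):
    """Return entries of a ';'-separated arch list whose binaries run on device."""
    return [a for a in arch_list.split(';') if binary_runs_on(a, device)]


if __name__ == '__main__':
    # Arch list after this change; the Tesla T4 is compute capability 7.5.
    print(covering_archs('3.0;5.0;6.0;7.0', '7.5'))  # ['7.0']
```

Under this rule the reduced list still covers a 7.5 device through the 7.0 entry, which is why dropping 7.5 shrinks the binary without making it unusable on a T4.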

2 changes: 1 addition & 1 deletion config/distribution/linux_cu101.cmake
@@ -35,4 +35,4 @@ set(USE_F16C OFF CACHE BOOL "Build with x86 F16C instruction support")
set(USE_LIBJPEG_TURBO ON CACHE BOOL "Build with libjpeg-turbo")

set(CUDACXX "/usr/local/cuda-10.1/bin/nvcc" CACHE STRING "Cuda compiler")
-set(MXNET_CUDA_ARCH "3.0;5.0;6.0;7.0;7.5" CACHE STRING "Cuda architectures")
+set(MXNET_CUDA_ARCH "3.0;5.0;6.0;7.0" CACHE STRING "Cuda architectures")
eric-haibin-lin marked this conversation as resolved.
2 changes: 1 addition & 1 deletion config/distribution/linux_cu102.cmake
@@ -33,4 +33,4 @@ set(USE_F16C OFF CACHE BOOL "Build with x86 F16C instruction support")
set(USE_LIBJPEG_TURBO ON CACHE BOOL "Build with libjpeg-turbo")

set(CUDACXX "/usr/local/cuda-10.2/bin/nvcc" CACHE STRING "Cuda compiler")
-set(MXNET_CUDA_ARCH "3.0;5.0;6.0;7.0;7.5" CACHE STRING "Cuda architectures")
+set(MXNET_CUDA_ARCH "3.0;5.0;6.0;7.0" CACHE STRING "Cuda architectures")
5 changes: 2 additions & 3 deletions tools/pip/setup.py
@@ -150,9 +150,8 @@ def skip_markdown_comments(md):
package_data = {'mxnet': [os.path.join('mxnet', os.path.basename(LIB_PATH[0]))],
'dmlc_tracker': []}
if variant.endswith('MKL'):
-if platform.system() == 'Darwin':
-shutil.copytree(os.path.join(CURRENT_DIR, 'mxnet-build/3rdparty/mkldnn/build/install/include'),
-os.path.join(CURRENT_DIR, 'mxnet/include/mkldnn'))
+shutil.copytree(os.path.join(CURRENT_DIR, 'mxnet-build/3rdparty/mkldnn/build/install/include'),
+os.path.join(CURRENT_DIR, 'mxnet/include/mkldnn'))
if platform.system() == 'Linux':
libdir, mxdir = os.path.dirname(LIB_PATH[0]), os.path.join(CURRENT_DIR, 'mxnet')
if os.path.exists(os.path.join(libdir, 'libgfortran.so.3')):

Contributor
@ChaiBapchya, Apr 30, 2020

Curious about two things:

  1. Previously, when nightly CD was working, why were the mkldnn headers included only for Darwin?
  2. Why does fixing nightly CD now require doing this for all OSes?

Contributor Author

This platform condition was added by mistake in an earlier commit. We are fixing that here. There is no reason to package the dnnl header files only for Darwin.
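
The `setup.py` change amounts to copying the installed header directory into the package on every platform rather than only on Darwin. A minimal, self-contained Python sketch of that unconditional copy follows. The `package_headers` helper and the temporary directory layout are assumptions made for illustration, not the actual `setup.py` code.

```python
import os
import shutil
import tempfile


def package_headers(build_include, package_dir):
    """Copy the header tree into <package_dir>/include/mkldnn on every platform."""
    dest = os.path.join(package_dir, 'include', 'mkldnn')
    # shutil.copytree creates dest (and missing parents) and raises
    # FileExistsError if dest already exists.
    shutil.copytree(build_include, dest)
    return dest


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for the build's install/include directory.
        src = os.path.join(tmp, 'install', 'include')
        os.makedirs(src)
        for name in ('dnnl_version.h', 'dnnl_config.h'):
            open(os.path.join(src, name), 'w').close()
        dest = package_headers(src, os.path.join(tmp, 'mxnet'))
        print(sorted(os.listdir(dest)))  # ['dnnl_config.h', 'dnnl_version.h']
```

Because there is no platform check, the same headers end up under `include/mkldnn` regardless of operating system, which matches the new paths listed in `mx_deps`.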