This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
[v1.x][CI]Flaky tests on Python3:GPU and cpp package GPU Makefile test suites #20011
access2rohit
changed the title
Flaky tests on Python3:GPU and cpp package GPU Makefile test suites
[v1.x][CI]Flaky tests on Python3:GPU and cpp package GPU Makefile test suites
Mar 11, 2021
access2rohit
pushed a commit
to access2rohit/incubator-mxnet
that referenced
this issue
Mar 11, 2021
Zha0q1
pushed a commit
that referenced
this issue
Mar 12, 2021
…eline (#19974) * migrating cd builds to ninja + removing static links to nvidia libs and leagacy cuda versions * installing NCCL manually for cuda11.2 container * set MSHADOW_USE_CUDNN=1 in CMakelists of mshadow to build properly for CUDNN support * adding coverage to cd requirements file to fix cu100, cu101 and cu102 tests * updating cd_test containers to ubuntu 18 * adding cmake config for linux native and adding USE_KV_STORE in linux_cpu * updating zmq builds to statically link to libmxnet.so * updating toolchains for r, clang and llvm for ubuntu18. OpenBlas Static link for 'distribution' build type only. Fix caffe build to use openCV 3. Remove leagacy Clang 3.9 from CI * fix versions for pip install in ubuntu_core_sh add new search path for cuDNN * finxing cudnn link problem for CUDA<=11.0 * adding library paths for libjpegturbo and lapack to fix failing CI on ubuntu 18 images * removing ASAN integration test from miscellaneous CI as its not required * fix lapack path for gpu builds * correctly installing libjpegturbo for ubuntu 18 * updating docker images of r,jekyll,julia etc test containers+ fix java version to 8 * installing libomp.so * removing debug test as its not required. Code clean-up * adding alternate URL source for MNIST dataset as original website is down * skipping flaky tests issue tracked #20011 Co-authored-by: Rohit Kumar Srivastava <[email protected]>
access2rohit
added a commit
to access2rohit/incubator-mxnet
that referenced
this issue
Mar 12, 2021
…eline (apache#19974) * migrating cd builds to ninja + removing static links to nvidia libs and leagacy cuda versions * installing NCCL manually for cuda11.2 container * set MSHADOW_USE_CUDNN=1 in CMakelists of mshadow to build properly for CUDNN support * adding coverage to cd requirements file to fix cu100, cu101 and cu102 tests * updating cd_test containers to ubuntu 18 * adding cmake config for linux native and adding USE_KV_STORE in linux_cpu * updating zmq builds to statically link to libmxnet.so * updating toolchains for r, clang and llvm for ubuntu18. OpenBlas Static link for 'distribution' build type only. Fix caffe build to use openCV 3. Remove leagacy Clang 3.9 from CI * fix versions for pip install in ubuntu_core_sh add new search path for cuDNN * finxing cudnn link problem for CUDA<=11.0 * adding library paths for libjpegturbo and lapack to fix failing CI on ubuntu 18 images * removing ASAN integration test from miscellaneous CI as its not required * fix lapack path for gpu builds * correctly installing libjpegturbo for ubuntu 18 * updating docker images of r,jekyll,julia etc test containers+ fix java version to 8 * installing libomp.so * removing debug test as its not required. Code clean-up * adding alternate URL source for MNIST dataset as original website is down * skipping flaky tests issue tracked apache#20011 Co-authored-by: Rohit Kumar Srivastava <[email protected]>
mseth10
added a commit
that referenced
this issue
Mar 14, 2021
…20015) * [BACKPORT]Enable CUDA 11.0 on nightly + CUDA 11.2 on pip (#19295)(#19764) (#19930) * Enable CUDA 11.0 on nightly development builds (#19295) Remove CUDA 9.2 and CUDA 10.0 * [PIP] add build variant for cuda 11.2 (#19764) * adding ci docker files for cu111 and cu112 * removing previous CUDA make versions and adding support for cuda11.2 Co-authored-by: waytrue17 <[email protected]> Co-authored-by: Sheng Zha <[email protected]> Co-authored-by: Rohit Kumar Srivastava <[email protected]> * [FEATURE]Migrating all CD pipelines to Ninja build + fix cu112 CD pipeline (#19974) * migrating cd builds to ninja + removing static links to nvidia libs and leagacy cuda versions * installing NCCL manually for cuda11.2 container * set MSHADOW_USE_CUDNN=1 in CMakelists of mshadow to build properly for CUDNN support * adding coverage to cd requirements file to fix cu100, cu101 and cu102 tests * updating cd_test containers to ubuntu 18 * adding cmake config for linux native and adding USE_KV_STORE in linux_cpu * updating zmq builds to statically link to libmxnet.so * updating toolchains for r, clang and llvm for ubuntu18. OpenBlas Static link for 'distribution' build type only. Fix caffe build to use openCV 3. Remove leagacy Clang 3.9 from CI * fix versions for pip install in ubuntu_core_sh add new search path for cuDNN * finxing cudnn link problem for CUDA<=11.0 * adding library paths for libjpegturbo and lapack to fix failing CI on ubuntu 18 images * removing ASAN integration test from miscellaneous CI as its not required * fix lapack path for gpu builds * correctly installing libjpegturbo for ubuntu 18 * updating docker images of r,jekyll,julia etc test containers+ fix java version to 8 * installing libomp.so * removing debug test as its not required. 
Code clean-up * adding alternate URL source for MNIST dataset as original website is down * skipping flaky tests issue tracked #20011 Co-authored-by: Rohit Kumar Srivastava <[email protected]> * update cudnn from 7 to 8 for cu102 (#19506) * update cudnn from 7 to 8 for cu102 (#19522) * downloading MNIST dataset from alternate URL (#20014) Co-authored-by: Rohit Kumar Srivastava <[email protected]> * fixing CI issue with v1.8.x * addressing review comments Co-authored-by: waytrue17 <[email protected]> Co-authored-by: Sheng Zha <[email protected]> Co-authored-by: Rohit Kumar Srivastava <[email protected]> Co-authored-by: Manu Seth <[email protected]>
mseth10
added a commit
to mseth10/incubator-mxnet
that referenced
this issue
Mar 15, 2021
…pache#20015) * [BACKPORT]Enable CUDA 11.0 on nightly + CUDA 11.2 on pip (apache#19295)(apache#19764) (apache#19930) * Enable CUDA 11.0 on nightly development builds (apache#19295) Remove CUDA 9.2 and CUDA 10.0 * [PIP] add build variant for cuda 11.2 (apache#19764) * adding ci docker files for cu111 and cu112 * removing previous CUDA make versions and adding support for cuda11.2 Co-authored-by: waytrue17 <[email protected]> Co-authored-by: Sheng Zha <[email protected]> Co-authored-by: Rohit Kumar Srivastava <[email protected]> * [FEATURE]Migrating all CD pipelines to Ninja build + fix cu112 CD pipeline (apache#19974) * migrating cd builds to ninja + removing static links to nvidia libs and leagacy cuda versions * installing NCCL manually for cuda11.2 container * set MSHADOW_USE_CUDNN=1 in CMakelists of mshadow to build properly for CUDNN support * adding coverage to cd requirements file to fix cu100, cu101 and cu102 tests * updating cd_test containers to ubuntu 18 * adding cmake config for linux native and adding USE_KV_STORE in linux_cpu * updating zmq builds to statically link to libmxnet.so * updating toolchains for r, clang and llvm for ubuntu18. OpenBlas Static link for 'distribution' build type only. Fix caffe build to use openCV 3. Remove leagacy Clang 3.9 from CI * fix versions for pip install in ubuntu_core_sh add new search path for cuDNN * finxing cudnn link problem for CUDA<=11.0 * adding library paths for libjpegturbo and lapack to fix failing CI on ubuntu 18 images * removing ASAN integration test from miscellaneous CI as its not required * fix lapack path for gpu builds * correctly installing libjpegturbo for ubuntu 18 * updating docker images of r,jekyll,julia etc test containers+ fix java version to 8 * installing libomp.so * removing debug test as its not required. 
Code clean-up * adding alternate URL source for MNIST dataset as original website is down * skipping flaky tests issue tracked apache#20011 Co-authored-by: Rohit Kumar Srivastava <[email protected]> * update cudnn from 7 to 8 for cu102 (apache#19506) * update cudnn from 7 to 8 for cu102 (apache#19522) * downloading MNIST dataset from alternate URL (apache#20014) Co-authored-by: Rohit Kumar Srivastava <[email protected]> * fixing CI issue with v1.8.x * addressing review comments Co-authored-by: waytrue17 <[email protected]> Co-authored-by: Sheng Zha <[email protected]> Co-authored-by: Rohit Kumar Srivastava <[email protected]> Co-authored-by: Manu Seth <[email protected]>
Description
unix-gpu has some flaky tests on Python3:GPU and cpp package GPU Makefile. They fail quite frequently, even on changes that do not touch the code they cover.
Occurrences
Python3:GPU failing test:
cpp package GPU Makefile failing test:
Next Steps
These tests are blocking PRs and making CI unstable, so the immediate action is to disable them and then investigate.
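For reference, one way to disable a flaky Python test while keeping a pointer back to this issue is a skip decorator with an explicit reason. The sketch below uses only the standard-library `unittest` module; the class name, test names, and test bodies are hypothetical placeholders and are not the tests that actually fail in this pipeline.

```python
import unittest

# Reason string shown in the test report; it points back to this issue so
# the skip can be found and removed once the flakiness is understood.
SKIP_REASON = "Flaky on unix-gpu CI; tracked in issue #20011"


class HypotheticalGpuTests(unittest.TestCase):
    """Placeholder test case; names and bodies are illustrative only."""

    @unittest.skip(SKIP_REASON)
    def test_hypothetical_flaky_operator(self):
        # Never executed while the skip decorator is in place.
        self.fail("should not run while skipped")

    def test_hypothetical_stable_operator(self):
        # A test that is not flaky keeps running as usual.
        self.assertEqual(1 + 1, 2)


def run_suite():
    """Run the placeholder tests and return the collected result."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(HypotheticalGpuTests)
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == "__main__":
    outcome = run_suite()
    for test, reason in outcome.skipped:
        print(f"SKIPPED {test.id()}: {reason}")
```

Keeping the issue number in the skip reason means the disabled tests stay visible in CI reports and are easy to grep for when someone picks up the investigation.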