This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Update base CUDA image for CI to v10.0 cuDNN 7.3.1 #14513

Merged
merged 2 commits into from
May 2, 2019

perdasilva
Contributor

@perdasilva perdasilva commented Mar 25, 2019

Description

I believe CI isn't reporting some GPU test issues because both the cuDNN version we set in the environment and the cuDNN version set by the image are lower than what some tests require (e.g. the linked test).

So, as in the example linked above, the test just ensures an error is raised. I don't know if (m)any of the cuDNN functions are being tested right now. Maybe there are bugs slipping through.

This PR bumps the base image for GPU instances to CUDA v10.0. Furthermore, it updates the test functions to use the cuDNN version given by the environment variable, falling back to 7.0.3 only if nothing is set.
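To illustrate the behaviour described above, here is a minimal sketch (not the actual MXNet test helper) of reading the cuDNN version from the environment with a 7.0.3 fallback, and of a test wrapper that only expects an error when the version is too low. The names `parse_version`, `cudnn_version_from_env`, and `expects_error_below` are hypothetical, as is the choice of `RuntimeError` as the expected error type; only the `CUDNN_VERSION` variable and the 7.0.3 default come from this PR.

```python
import functools
import os

# Default taken from the PR description; everything else here is illustrative.
DEFAULT_CUDNN_VERSION = "7.0.3"


def parse_version(text):
    """Turn a dotted version such as '7.3.1.20' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def cudnn_version_from_env(environ=None):
    """Read CUDNN_VERSION, falling back to the default when unset or empty."""
    environ = os.environ if environ is None else environ
    return parse_version(environ.get("CUDNN_VERSION") or DEFAULT_CUDNN_VERSION)


def expects_error_below(min_version, error=RuntimeError):
    """Hypothetical decorator.

    If the environment's cuDNN version satisfies min_version, the test runs
    normally. Otherwise the test must raise `error`, and the wrapper fails
    when it does not.
    """
    required = parse_version(min_version)

    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            if cudnn_version_from_env() >= required:
                return test_fn(*args, **kwargs)
            try:
                test_fn(*args, **kwargs)
            except error:
                return None  # the expected failure happened
            raise AssertionError(
                f"expected {error.__name__} with cuDNN < {min_version}"
            )

        return wrapper

    return decorator
```

With a wrapper like this, a CI image whose `CUDNN_VERSION` is below a test's minimum never exercises the cuDNN code path itself; it only checks that the error is raised, which is the gap this PR is trying to close.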

@perdasilva perdasilva changed the title [WIP] Update base CUDA image for CI [WIP] Update base CUDA image for CI to v9.2 Mar 25, 2019
@lebeg
Contributor

lebeg commented Mar 25, 2019

I'd suggest trying to update to CUDA 10 instead, as in #12850.

@perdasilva perdasilva changed the title [WIP] Update base CUDA image for CI to v9.2 [WIP] Update base CUDA image for CI to v10.0 Mar 25, 2019
@abhinavs95
Contributor

@mxnet-label-bot add[pr-work-in-progress, Test, CI]

@marcoabreu marcoabreu added the CI, pr-work-in-progress, and Test labels Mar 25, 2019
@perdasilva perdasilva force-pushed the update_cuda_image branch 3 times, most recently from b0f3580 to e4e31c1 Compare March 26, 2019 12:16
@perdasilva perdasilva force-pushed the update_cuda_image branch 2 times, most recently from 1aba52f to 29578b5 Compare March 27, 2019 07:41
@perdasilva
Contributor Author

perdasilva commented Mar 27, 2019

I'm disabling the failing tests until we can figure out how to fix them. I think it's important to apply these changes to CI asap so that we don't have errors slipping through.

@perdasilva perdasilva changed the title [WIP] Update base CUDA image for CI to v10.0 Update base CUDA image for CI to v10.0 Mar 27, 2019
@larroy
Contributor

larroy commented Mar 28, 2019

Could you add more info to the description of the issue, like pointers to line numbers? It's not clear to me what the problem is.

@larroy
Contributor

larroy commented Mar 28, 2019

Who uses the variable "CUDNN_VERSION"? When CUDA is loaded?

@lebeg
Contributor

lebeg commented Mar 28, 2019

Who uses the variable "CUDNN_VERSION"? When CUDA is loaded?

Look here

@perdasilva
Contributor Author

@larroy I've updated the PR description with the links and examples. Sorry about the confusion.

@perdasilva
Contributor Author

Trying CUDA 10.1, out of curiosity.

@perdasilva perdasilva force-pushed the update_cuda_image branch 3 times, most recently from c78796f to 007ae76 Compare April 1, 2019 17:57
@perdasilva
Contributor Author

Trying to put the failing tests on P3 instances, which should have a higher compute capability and maybe more functionality available to them.

@perdasilva
Contributor Author

I've split out the CUDNN_VERSION environment variable commit into its own PR (#14595) to see if it would get past CI as it currently is.

@perdasilva
Contributor Author

Updated the CI images to be based off cuda10-base, and I'm manually installing a lower version of cuDNN to test the theory that maybe we aren't compatible with cuDNN 7.5.

@perdasilva perdasilva changed the title Update base CUDA image for CI to v10.0 [WIP] Update base CUDA image for CI to v10.0 Apr 2, 2019
@perdasilva perdasilva mentioned this pull request Apr 2, 2019
@perdasilva perdasilva force-pushed the update_cuda_image branch 4 times, most recently from 47d3f6b to 7c0b80b Compare April 3, 2019 07:41
@perdasilva perdasilva changed the title [WIP] Update base CUDA image for CI to v10.0 [WIP] Update base CUDA image for CI to v10.0 cuDNN 7.3.1 Apr 3, 2019
@perdasilva perdasilva changed the title [WIP] Update base CUDA image for CI to v10.0 cuDNN 7.3.1 Update base CUDA image for CI to v10.0 cuDNN 7.3.1 May 2, 2019
@perdasilva
Contributor Author

Re-running CI after rebase. If it's good, please merge. @stu1130 will explore fixing any issues introduced by using cuDNN 7.5 (#14652) and thereafter bring CI to CUDA 10.1.

Contributor

@lebeg lebeg left a comment

nice

@perdasilva
Contributor Author

@mxnet-label-bot remove[pr-work-in-progress]

@marcoabreu marcoabreu removed the pr-work-in-progress label May 2, 2019
@marcoabreu marcoabreu merged commit 36c3306 into apache:master May 2, 2019
@KellenSunderland
Contributor

Nice work @perdasilva @lebeg @marcoabreu

@perdasilva perdasilva deleted the update_cuda_image branch May 3, 2019 05:40
access2rohit pushed a commit to access2rohit/incubator-mxnet that referenced this pull request May 14, 2019
* Updates Ubuntu GPU CI image base image to cuda10-devel and manually installs cuDNN version 7.3.1.20

* Updates CentOS 7 GPU CI image base image to cuda10-devel and manually installs cuDNN version 7.3.1.20
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
* Updates Ubuntu GPU CI image base image to cuda10-devel and manually installs cuDNN version 7.3.1.20

* Updates CentOS 7 GPU CI image base image to cuda10-devel and manually installs cuDNN version 7.3.1.20

6 participants