Update base CUDA image for CI to v10.0 cuDNN 7.3.1 #14513
Conversation
I'd suggest updating to CUDA 10 instead, as was done in #12850.
@mxnet-label-bot add[pr-work-in-progress, Test, CI]
Could you add more info to the description of the issue, such as pointers to line numbers? It's not clear to me what the problem is.
Who uses the variable "CUDNN_VERSION"? Is it read when CUDA is loaded?
Look here
@larroy I've updated the PR description with the links and examples. Sorry about the confusion.
Trying CUDA 10.1, out of curiosity.
Trying to put the failing tests on P3 instances, which should have a higher compute capability and maybe more functionality available to them.
I've split the CUDNN_VERSION environment variable commit out into its own PR (#14595) to see whether it gets past CI as it currently is.
Updated the CI images to be based on cuda10-base, and I'm manually installing a lower version of cuDNN to test the theory that we may not be compatible with cuDNN 7.5.
nice
@mxnet-label-bot remove[pr-work-in-progress]
Nice work @perdasilva @lebeg @marcoabreu
* Updates Ubuntu GPU CI image base image to cuda10-devel and manually installs cuDNN version 7.3.1.20
* Updates CentOS 7 GPU CI image base image to cuda10-devel and manually installs cuDNN version 7.3.1.20
Description
I believe CI isn't reporting some GPU test issues because both the cuDNN version we set in the environment and the cuDNN version set by the image are lower than what some tests require, as in the linked example test.
In that example, the test only ensures that an error is raised. I don't know whether many (or any) of the cuDNN functions are actually being tested right now, so bugs may be slipping through.
This PR bumps the base image for GPU instances to CUDA v10.0. It also updates the test functions to use the cuDNN version given by the environment variable, falling back to 7.0.3 only if nothing is set.
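
The fallback behaviour described above could look roughly like the sketch below. This is only an illustration, not the code in this PR: the helper names `get_cudnn_version` and `cudnn_at_least` are hypothetical, and it assumes `CUDNN_VERSION` holds a dotted numeric string such as `7.3.1.20`.

```python
import os

# Fallback named in the PR description; used only when CUDNN_VERSION is unset or empty.
DEFAULT_CUDNN_VERSION = "7.0.3"


def get_cudnn_version(environ=None):
    """Return the cuDNN version as a tuple of ints (hypothetical helper).

    Reads CUDNN_VERSION from the given mapping (os.environ by default) and
    falls back to DEFAULT_CUDNN_VERSION when the variable is missing or empty.
    Assumes a purely numeric dotted string, e.g. "7.3.1.20".
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("CUDNN_VERSION") or DEFAULT_CUDNN_VERSION
    return tuple(int(part) for part in raw.split("."))


def cudnn_at_least(required, environ=None):
    """Return True if the environment's cuDNN version is >= `required` (hypothetical helper)."""
    required_version = tuple(int(part) for part in required.split("."))
    return get_cudnn_version(environ) >= required_version
```

A test could then branch on `cudnn_at_least("7.3.1")` to decide whether to exercise the cuDNN code path or only assert that an error is raised, so the behaviour follows whatever version the CI image actually provides.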