
{ai}[foss/2021b] PyTorch v1.12.1 w/ Python 3.9.6 w/ CUDA 11.4.1 #17154

Closed
Flamefire wants to merge 1 commit into easybuilders:develop from Flamefire:20230119102538_new_pr_PyTorch1121


@Flamefire
Contributor

Flamefire commented Jan 19, 2023

(created using eb --new-pr)

Note that on x86 machines with AVX, this requires the compiler fix from #17135; otherwise test_quantization will fail (specifically test_qnnpack_add_broadcast and test_qnnpack_add).

@branfosj changed the title from "{ai}[foss/2021b] PyTorch v1.12.1 w/ Python 3.9.6" to "{ai}[foss/2021b] PyTorch v1.12.1 w/ Python 3.9.6 w/ CUDA 11.4.1" on Jan 19, 2023
@branfosj
Member

Test report by @branfosj
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
bear-pg0103u14a.bear.cluster - Linux RHEL 8.6, x86_64, Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz (icelake), 1 x NVIDIA NVIDIA A30, 520.61.05, Python 3.6.8
See https://gist.github.com/55a76b3a02fca4086c10db4c5f6da077 for a full test report.

@Flamefire
Contributor Author

Test report by @Flamefire
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
taurusi8026 - Linux CentOS Linux 7.9.2009, x86_64, AMD EPYC 7352 24-Core Processor (zen2), 8 x NVIDIA NVIDIA A100-SXM4-40GB, 470.57.02, Python 2.7.5
See https://gist.github.com/801d0d9790c91c6606d44a274c9ea5bc for a full test report.

@Flamefire
Contributor Author

Test report by @Flamefire
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
taurusml22 - Linux RHEL 7.6, POWER, 8335-GTX (power9le), 6 x NVIDIA Tesla V100-SXM2-32GB, 440.64.00, Python 2.7.5
See https://gist.github.com/38f893b8ca964eb7f959972e3a7fc069 for a full test report.

@Flamefire
Contributor Author

Test report by @Flamefire
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
taurusa12 - Linux CentOS Linux 7.7.1908, x86_64, Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz (broadwell), 3 x NVIDIA GeForce GTX 1080 Ti, 460.32.03, Python 2.7.5
See https://gist.github.com/0c8052ee341c27e2a59ae383ac9ad945 for a full test report.

@Flamefire
Contributor Author

Flamefire commented Feb 7, 2023

Some multi-GPU tests fail when multiple GPUs are available. I found that updating to CUDA 11.5.0 fixes this; see #17272 and pytorch/pytorch#94294.

So I'm afraid that this will not work properly unless we decide to use CUDA 11.5 here.

A PyTorch 1.12.1 for 2022a/CUDA 11.7 is available via #16484
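
As an illustration of how one might check that a failure only shows up when several GPUs are visible, here is a minimal sketch that reruns a command with `CUDA_VISIBLE_DEVICES` restricted to chosen devices. The helper name `run_with_visible_gpus` and the probe command are hypothetical and not taken from this PR; the probe only echoes the variable and would have to be replaced by the actual failing test invocation.

```python
import os
import subprocess
import sys


def run_with_visible_gpus(cmd, gpu_ids):
    """Run cmd with CUDA_VISIBLE_DEVICES limited to gpu_ids (hypothetical helper)."""
    env = os.environ.copy()
    # CUDA applications only see the devices listed in this variable.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


# Stand-in command (assumption): it just prints the variable it was given.
probe = [sys.executable, "-c",
         "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]

single = run_with_visible_gpus(probe, [0])     # one GPU visible
multi = run_with_visible_gpus(probe, [0, 1])   # two GPUs visible

print(single.stdout.strip())
print(multi.stdout.strip())
```

If a test passes in the single-GPU run but fails in the multi-GPU run, that would be consistent with the multi-GPU-specific failures described above, though it does not by itself identify the CUDA version as the cause.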

@casparvl
Contributor

@Flamefire to avoid confusion, should we close this in favour of #17272? If I understand correctly, this PR (i.e. with CUDA 11.4.1) will never work, correct?

@Flamefire
Contributor Author

> @Flamefire to avoid confusion, should we close this in favour of #17272? If I understand correctly, this PR (i.e. with CUDA 11.4.1) will never work, correct?

Correct. I didn't want to decide that on my own, as having 2 CUDA versions in a toolchain is new, at the very least.

@Flamefire deleted the 20230119102538_new_pr_PyTorch1121 branch on March 30, 2023, 13:34