{ai}[foss/2023a] PyTorch v2.1.2 w/ CUDA 12.1.1 #19666
lexming merged 15 commits into easybuilders:develop from
…atch Two more patches for PyTorch-2.1.2-foss-2023a-CUDA-12.1.1.eb

@lexming thanks!

Test report by @jfgrimm

Test report by @casparvl
From stdout:
From the EasyBuild log:

Test report by @lexming

Test report by @lexming

lexming left a comment:
On my side, with just 4 test failures, this looks pretty good to me already.

@Flamefire Any input on this?

haven't had a chance to look into the

Test report by @jfgrimm

Test report by @jfgrimm

Test report by @boegel edit:

So, test results across a variety of systems summarized:
So even in the worst case, 99.6% of all tests pass... @jfgrimm I'm strongly in favor of setting

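The summary above boils down to computing passed / total for each system and taking the minimum. A minimal Python sketch of that arithmetic, assuming you have a (failed, total) test count per system; the helper names `pass_rate` and `worst_case` and the example counts are hypothetical and not taken from the test reports in this PR:

```python
from typing import Dict, Tuple


def pass_rate(failed: int, total: int) -> float:
    """Percentage of tests that passed (hypothetical helper)."""
    if total <= 0:
        raise ValueError("total must be a positive number of tests")
    if not 0 <= failed <= total:
        raise ValueError("failed must be between 0 and total")
    return 100.0 * (total - failed) / total


def worst_case(results: Dict[str, Tuple[int, int]]) -> Tuple[str, float]:
    """Return (system, pass rate) for the system with the lowest pass rate.

    `results` maps a system name to a (failed, total) pair.
    """
    system = min(results, key=lambda name: pass_rate(*results[name]))
    return system, pass_rate(*results[system])


if __name__ == "__main__":
    # Made-up counts, purely for illustration.
    example = {"system-a": (4, 1000), "system-b": (1, 1000)}
    name, rate = worst_case(example)
    print(f"worst case: {name} with {rate:.1f}% of tests passing")
```

With these made-up counts, the worst case is `system-a`, where 4 failures out of 1000 tests give a 99.6% pass rate.
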
ohhhh damn I forgot to add

Merging, thanks @jfgrimm and @Flamefire and all the testers!

Still no idea why my builds are failing... The only thing I keep seeing consistently is
but no clue why. If anyone has ideas, I'd love to hear :D Were there fixes to the toolchain at some point? Maybe I should try and recompile that... Anyway, good to get this merged, no point in making it wait for something that is clearly specific to my machine :)

@boegelbot please test @ jsc-zen3

@lexming: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de
PR test command '
Test results coming soon (I hope)...
Details: notification for comment with ID 1966533488 processed. Message to humans: this is just bookkeeping information for me,

Test report by @boegelbot

Test report by @emdrago

@boegelbot please test @ generoso

@lexming: Request for testing this PR well received on login1
PR test command '
Test results coming soon (I hope)...
Details: notification for comment with ID 1968459391 processed. Message to humans: this is just bookkeeping information for me,

Test report by @boegelbot

Test report by @Flamefire

We tried building this here at DTU, but it failed because 9 tests failed. Shouldn't it pass if fewer than 50 tests fail? That part does not seem to work, and I can see that the tests being done above are all with
Is it a not-yet-implemented feature (in EB 5.0 perhaps) that up to 50 tests can fail, or is there some bug such that the

If it doesn't say anything about allowed failures in the output, then this is a different issue. E.g. in my test report above I have: The

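The distinction between an allowed number of failing tests and an unrelated build failure can be sketched as follows. This is a hypothetical illustration (the names `Outcome` and `classify` are made up), not the actual logic of the EasyBuild PyTorch easyblock: a limit on failing tests only applies when the test run completed and reported how many tests failed; if something else broke, e.g. a compilation error, the limit does not help.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    PASS = "pass"
    PASS_WITH_ALLOWED_FAILURES = "pass (failures within the allowed limit)"
    FAIL_TOO_MANY_TESTS = "fail (too many failing tests)"
    FAIL_OTHER = "fail (not caused by failing tests)"


def classify(num_failed: Optional[int], max_failed: int) -> Outcome:
    """Classify a test run against a maximum number of allowed failures.

    `num_failed` is None when no failure count could be determined at all,
    e.g. because something failed to compile; a failure limit says nothing
    about that case.
    """
    if num_failed is None:
        return Outcome.FAIL_OTHER
    if num_failed == 0:
        return Outcome.PASS
    if num_failed <= max_failed:
        return Outcome.PASS_WITH_ALLOWED_FAILURES
    return Outcome.FAIL_TOO_MANY_TESTS
```

For example, `classify(9, 50)` gives `Outcome.PASS_WITH_ALLOWED_FAILURES`, while `classify(None, 50)` gives `Outcome.FAIL_OTHER` no matter how generous the limit is.
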
Thank you, @Flamefire. We are having
Now we are trying to install with

Yes, so

Thanks for your help. It is really strange: it looks like nvcc fails to compile some C++ template code. Could there be some recent update to something that we have missed, e.g. if we have installed Ninja too early?

Yep, that rings a bell for me. And it is actually a good reason why I'm so eager to investigate test failures instead of dismissing them just because many/most are due to bad code. This is an issue with the pybind11 version we use (pybind11/2.11.1-GCCcore-12.3.0). The relevant patch is:
https://github.com/easybuilders/easybuild-easyconfigs/blob/f9d0188424dd253a0718af1c61ba002f66072b46/easybuild/easyconfigs/p/pybind11/pybind11-2.10.3_fix-nvcc-compat.patch
I suppose you could also apply that to your installed pybind11 module manually.
I opened easybuilders/easybuild-easyblocks#3255 to improve the message in this case.

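One way to pick up the fix is to rebuild pybind11 with the patch applied. A minimal sketch of the relevant easyconfig line, assuming a local copy of pybind11-2.11.1-GCCcore-12.3.0.eb with the patch file pybind11-2.10.3_fix-nvcc-compat.patch (from the URL above) placed next to it; if that easyconfig already lists other patches, append to the existing list rather than replacing it:

```python
# Excerpt of a local copy of pybind11-2.11.1-GCCcore-12.3.0.eb (assumed setup).
# EasyBuild applies the files listed in `patches` to the unpacked sources.
patches = ['pybind11-2.10.3_fix-nvcc-compat.patch']
```

After that, something like `eb pybind11-2.11.1-GCCcore-12.3.0.eb --rebuild` reinstalls the module. EasyBuild may complain about a missing checksum for the new patch; `eb --inject-checksums` can add it to the easyconfig. PyTorch then needs to be rebuilt against the patched pybind11.
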
Just to let you both know: I hit the same issue as @schiotz, in that I thought the
So I'll try to rebuild
I think improving the error message is indeed a good idea. Your explanation here is very clear, but without it the

After rebuilding pybind11, PyTorch compiled without issues (finished during the night). Thank you very much @Flamefire, we had built pybind11 a few weeks before your patch to it.

Test report by @emdrago
(created using eb --new-pr)
haven't actually run the tests yet to see how many fail