add patches to fix test issues for PyTorch 2.1.2 with foss/2023a + CUDA 12.1.1 #20156
easybuild/easyconfigs/p/PyTorch/PyTorch-2.1.2-foss-2023a-CUDA-12.1.1.eb
|
Test report by @casparvl |
|
Test report by @Flamefire |
|
Test report by @jfgrimm |
|
Test report by @casparvl |
Looks like we need to increase the allowed failures to ~10. Your report shows 8. 6 of them are captured in detail: test_Conv1d_pad_same_cuda_tf32, test_constant_specialization, test_delayed_optim_step_offload_true_no_shard, test_file_reader_no_memory_leak, test_file_reader_no_memory_leak, test_file_reader_no_memory_leak. The first is from test_nn, which is kind of known; the last 3 are known on your machine from other runs. The other 2 (4 including the ones not captured by name) I don't know. The full test log might be useful to enhance the regex to capture the other 2 tests too. I think it really helps to have the individual tests listed conveniently in a single place to judge the failures (see e.g. the last 3, where you can see that it is the same cause, rather than only seeing that the "test_jit_foo", "test_jit_bar", "test_jit" files failed). |
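To illustrate the idea of listing individual failing tests in one place, here is a minimal sketch. It is not the regex used by the PyTorch easyblock; it assumes the log contains standard `unittest` result headers such as `FAIL: test_foo (test_nn.TestNN)`, and the names `FAILED_TEST_RE`, `failed_test_names`, and `within_allowed_failures` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical pattern (an assumption, not the easyblock's regex): matches
# unittest-style result headers like "FAIL: test_foo (test_nn.TestNN)"
# or "ERROR: test_bar (some.module.SomeClass)".
FAILED_TEST_RE = re.compile(r"^(FAIL|ERROR): (\w+) \(", re.MULTILINE)


def failed_test_names(log_text):
    """Return the names of failed or errored tests, in order of appearance."""
    return [match.group(2) for match in FAILED_TEST_RE.finditer(log_text)]


def within_allowed_failures(log_text, max_failed_tests):
    """Check the failure count against a limit and count repeats per test name."""
    names = failed_test_names(log_text)
    return len(names) <= max_failed_tests, Counter(names)
```

Counting per name makes repeated failures visible, for example when the same test fails in several test files and therefore likely shares one cause. Failures that are reported in a different format would not be matched by this pattern and would need the full log to extend it.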
|
@boegelbot please test @ generoso |
|
@casparvl: Request for testing this PR well received on login1 PR test command '
Test results coming soon (I hope)... Details- notification for comment with ID 2016786483 processed Message to humans: this is just bookkeeping information for me, |
Detailed log: test_jit, test_proxy_tensor, distributed/fsdp/test_fsdp_core, test_jit_legacy, test_jit_profiling, test_nn |
|
@boegelbot please test @ jsc-zen3 |
|
@casparvl: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de PR test command '
Test results coming soon (I hope)... Details- notification for comment with ID 2016792987 processed Message to humans: this is just bookkeeping information for me, |
|
Test report by @boegelbot |
|
Test report by @boegelbot |
|
@casparvl Can you upload the compressed log of the failures (#20156 (comment)) such that the easyblock can be enhanced to also detect the 2 other failing tests by name? |
|
I'll send it to you in a DM. I don't assume there to be much privacy-sensitive info in there, but just to be safe I'll not share it with the world ;-) |
|
Thanks. As for the failures:
|
|
I think I know: I probably didn't rebuild I'll merge this PR: there are sufficient successful test reports, and on my system we have a reasonable understanding of the failing tests. |
|
Going in, thanks @Flamefire! |
|
Test report by @jfgrimm |
foss/2023a + CUDA 12.1.1
(created using eb --new-pr)
Fixes #19946