update PyTorch easyblock to avoid configure warnings by disabling some options #3070

Merged
branfosj merged 2 commits into easybuilders:develop from Flamefire:pytorch-options
Feb 13, 2024

@Flamefire
Contributor

Explicitly disable some options to avoid warnings during the configure step.
@boegel boegel added the bug fix label Jan 17, 2024
@boegel boegel added this to the release after 4.9.0 milestone Jan 17, 2024
@boegel
Member

boegel commented Jan 17, 2024

@Flamefire Can you clarify what kind of warnings pop up without these changes?

@boegel boegel changed the title from "Avoid configure warnings in PyTorch" to "update PyTorch easyblock to avoid configure warnings by disabling some options" Jan 17, 2024
@Flamefire
Contributor Author

Flamefire commented Jan 17, 2024

> @Flamefire Can you clarify what kind of warnings pop up without these changes?

Stuff like: "Not compiling with X. Turn this warning off by passing Y". E.g.: https://github.com/pytorch/pytorch/blob/40a6710ad34ae4c6f4987f0e47b5c94df3fc8ec7/cmake/Dependencies.cmake#L803-L813
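
As a rough illustration of the idea (not the easyblock's actual code), the sketch below gives every known build option an explicit value, so the configure step never has to guess about an unset option and warn. The option names and the `explicit_options` helper are placeholders invented for this example; the options actually changed in this PR are not listed in the thread.

```python
def explicit_options(available, wanted):
    """Return 'NAME=1' / 'NAME=0' strings for every known option.

    Options that were not requested are set to 0 explicitly instead of
    being left unset, so the configure step has nothing to warn about.
    """
    return ['%s=%d' % (name, 1 if name in wanted else 0)
            for name in sorted(available)]


# Placeholder option names (assumption), not the ones touched by this PR.
available = {'USE_FEATURE_A', 'USE_FEATURE_B', 'USE_FEATURE_C'}
wanted = {'USE_FEATURE_B'}

options = explicit_options(available, wanted)
print(' '.join(options))
```

Sorting the names keeps the generated option list stable between runs, which makes build logs easier to compare.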

@easybuilders easybuilders deleted a comment from boegelbot Jan 18, 2024
@boegel
Member

boegel commented Jan 20, 2024

Test report by @boegel

Overview of tested easyconfigs (in order)

Build succeeded for 7 out of 9 (7 easyconfigs in total)
node3905.accelgor.os - Linux RHEL 8.8, x86_64, AMD EPYC 7413 24-Core Processor (zen3), 1 x NVIDIA NVIDIA A100-SXM4-80GB, 535.129.03, Python 3.6.8
See https://gist.github.com/boegel/e6b728c6f7fb23b5351ccd707c143ab3 for a full test report.

@Flamefire
Contributor Author

@boegel

> * **FAIL (build issue)** _PyTorch-1.13.1-foss-2022a-CUDA-11.7.0.eb_ (partial log available at https://gist.github.com/boegel/4498523636b35f08ed7804ce7d9ec05a)

This failed because the check logic detected that some tests failed (the plus sign in "+ test_ops_gradients " indicates this) but could not determine how many.
We might need to adjust the regex again. Can you attach the log?
In any case, this is not new in this PR, and there seem to be only 2 failed tests, which looks fine.

@Flamefire
Contributor Author

OK, that detection issue is handled in #3085, which includes a larger refactoring of the parsing logic to make it easier to debug. Running it on your test report yields:

Failed test names:  FailedTestNames(error=[], fail=['test_fn_grad_linalg_det_singular_cpu_complex128', 'test_forward_mode_AD_linalg_det_singular_cpu_complex128'])
Test result:  TestResult(test_cnt=122917, error_cnt=0, failure_cnt=2, failed_suites=[TestSuiteResult(name='test_ops_gradients', summary='2 failed, 3454 passed, 4032 skipped, 72 xfailed, 152 warnings, 4 rerun')])

You can test the failed EC (PyTorch-1.13.1-foss-2022a-CUDA-11.7.0.eb) with that PR, which should make it pass since 2 tests are allowed to fail.
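
For readers unfamiliar with this kind of check, here is a minimal sketch of how a suite summary like the one above could be turned into counts and compared against a failure limit. It is not the parsing logic from #3085; `parse_summary` and `allowed_failures` are hypothetical names used only for illustration, and the summary string is copied from the output above.

```python
import re


def parse_summary(summary):
    """Turn '2 failed, 3454 passed, ...' into {'failed': 2, 'passed': 3454, ...}."""
    counts = {}
    for count, label in re.findall(r'(\d+) (\w+)', summary):
        counts[label] = int(count)
    return counts


summary = '2 failed, 3454 passed, 4032 skipped, 72 xfailed, 152 warnings, 4 rerun'
counts = parse_summary(summary)

# Count failures and errors together; labels missing from the summary count as 0.
failures = counts.get('failed', 0) + counts.get('error', 0) + counts.get('errors', 0)

allowed_failures = 2  # assumption: the limit mentioned in the comment above
build_ok = failures <= allowed_failures
print(counts, build_ok)
```

The check only works when a count can be extracted at all, which is why a summary line that the regex misses ends up as a build failure rather than as "2 failed tests".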

@branfosj
Member

Test report by @branfosj

Overview of tested easyconfigs (in order)

  • SUCCESS PyTorch-2.0.1-foss-2022b.eb

Build succeeded for 1 out of 1 (1 easyconfigs in total)
bear-pg0105u03a - Linux RHEL 8.6, x86_64, Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz (icelake), Python 3.6.8
See https://gist.github.com/branfosj/a581d8d414c1200d452a7bed204fe898 for a full test report.

@branfosj branfosj merged commit 597aa56 into easybuilders:develop Feb 13, 2024
@Flamefire Flamefire deleted the pytorch-options branch February 14, 2024 16:01