enhance PyTorch easyblock to print individual failed tests #2983

Merged
branfosj merged 4 commits into easybuilders:develop from Flamefire:20230810095335_new_pr_pytorch
Aug 24, 2023

@Flamefire
Contributor

@Flamefire Flamefire commented Aug 10, 2023

(created using eb --new-pr)

This enhances the log output by printing not only the test suites that contain failed tests but also the individual failed tests. This should make it easier to determine whether only a single test in a test suite fails for everyone (or for most people), so that we can disable just that test.
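To illustrate the idea, here is a minimal sketch, not the easyblock's actual implementation. It assumes the test log contains standard `unittest` result headers of the form `FAIL: test_name (module.TestClass)` or `ERROR: test_name (module.TestClass)`; the function name `failed_tests_by_suite` and the sample log are hypothetical.

```python
import re
from collections import defaultdict

# Standard unittest result header, e.g. "FAIL: test_add (test_ops.TestOps)".
# Assumption: the log being parsed contains lines in this format.
FAILED_TEST_RE = re.compile(r"^(FAIL|ERROR): (\w+) \(([\w.]+)\)", re.MULTILINE)


def failed_tests_by_suite(log_text):
    """Group individual failed tests by the test class that owns them."""
    result = defaultdict(list)
    for status, test_name, owner in FAILED_TEST_RE.findall(log_text):
        result[owner].append((test_name, status))
    return dict(result)


# Hypothetical log excerpt, for illustration only.
SAMPLE_LOG = """\
======================================================================
FAIL: test_abs_cpu (test_ops.TestOpsCPU)
----------------------------------------------------------------------
======================================================================
ERROR: test_jit_cpu (test_jit.TestJitCPU)
----------------------------------------------------------------------
======================================================================
FAIL: test_add_cpu (test_ops.TestOpsCPU)
----------------------------------------------------------------------
"""

for owner, tests in failed_tests_by_suite(SAMPLE_LOG).items():
    print(owner, tests)
```

Grouping by owner makes it easy to spot a suite in which only one test fails, which is the case this change is meant to expose.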

Edit: As identified in easybuilders/easybuild-easyconfigs#18421 (comment), there are further issues, for which these are possible solutions:

  1. Patch run_test.py to not shard at all (and maybe disable the exit-on-first-failure behavior).
  2. export BUILD_ENVIRONMENT=slow-gradcheck as a hack to disable the parallelization: https://github.com/pytorch/pytorch/blob/v1.13.1/test/run_test.py#L721. This might get forgotten when the code changes, though (it seems to have changed in PyTorch 2 already and was only introduced in 1.13).
  3. Make matching the test suite name optional for this pattern, which seems to work for this example at least.
  4. Look into parsing the XML report (enabled via --save-xml), which might be the best option but requires quite some work.
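Option 2 could be sketched roughly as follows. This is only an illustration under the assumption stated in the list above, namely that `run_test.py` in PyTorch 1.13.x checks `BUILD_ENVIRONMENT`; the helper name `serial_test_env` is hypothetical.

```python
import os


def serial_test_env(base_env=None):
    """Return a copy of the environment with BUILD_ENVIRONMENT set.

    Assumption (from the discussion above): PyTorch 1.13.x's run_test.py
    disables test parallelization when this variable is 'slow-gradcheck'.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["BUILD_ENVIRONMENT"] = "slow-gradcheck"
    return env


# Example: build the environment without touching the current process.
env = serial_test_env({"PATH": "/usr/bin"})
print(env["BUILD_ENVIRONMENT"])
```

The returned dictionary could then be passed as the `env` argument of `subprocess.run` when launching the test script, so the hack stays local to the test step.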

Edit: Regarding option 4:
It needs two Python packages (i.e. builddependencies), lxml and unittest-xml-reporting, plus a patch for PyTorch to propagate --save-xml.
With that, the report consists of folders named after the tests, each with one or more XML files containing e.g. <testsuites><testsuite name="pytest" errors="0" failures="0" skipped="127" tests="476" time="408.891" timestamp="2023-08-16T13:49:53.750990" hostname="taurusi8002"><testcase classname="TestJitCPU" name="test_jit_alias_remapping_abs_cpu_float32" time="0.063" file="test_ops_jit.py" />, which does look helpful.
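A minimal sketch of how such a report might be read, using only the standard library. The excerpt above shows only a passing `<testcase>`, so this assumes the common JUnit-style convention that a failed test carries a `<failure>` or `<error>` child element; the sample report and the function name `failed_testcases` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical report, modeled on the excerpt above.
# Assumption: failures appear as <failure>/<error> children (JUnit convention).
SAMPLE_REPORT = """<testsuites>
  <testsuite name="pytest" errors="0" failures="1" skipped="0" tests="2">
    <testcase classname="TestJitCPU" name="test_ok" file="test_ops_jit.py" />
    <testcase classname="TestJitCPU" name="test_broken" file="test_ops_jit.py">
      <failure message="AssertionError">traceback here</failure>
    </testcase>
  </testsuite>
</testsuites>"""


def failed_testcases(xml_text):
    """Return (file, classname, name) for every failed or errored testcase."""
    root = ET.fromstring(xml_text)
    failed = []
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            failed.append((case.get("file"), case.get("classname"), case.get("name")))
    return failed


print(failed_testcases(SAMPLE_REPORT))
```

Reading the report does not need lxml; per the comment above, the extra packages are needed to produce the report in the first place.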

@boegel boegel changed the title Print individual failed PyTorch tests enhance PyTorch easyblock to print individual failed tests Aug 15, 2023
@boegel boegel added this to the next release (4.8.1?) milestone Aug 15, 2023
@Flamefire
Contributor Author

@branfosj This change has now been tested multiple times and improves things. I added the possible further enhancements found in that other build to the OP.
How should we proceed?

Member

@branfosj branfosj left a comment

One minor change and then this is good to go. It has been well tested with various PyTorch PRs.

@branfosj branfosj enabled auto-merge August 24, 2023 16:00
@branfosj branfosj merged commit 6623c0e into easybuilders:develop Aug 24, 2023
@Flamefire Flamefire deleted the 20230810095335_new_pr_pytorch branch August 24, 2023 16:50

3 participants