use patch to fix flaky test optim test in PyTorch 1.12.1 w/ foss/2021b + CUDA 11.5.2 #17732

Closed
branfosj wants to merge 1 commit into easybuilders:develop from branfosj:20230415130117_new_pr_PyTorch1121

@branfosj
Member

@branfosj branfosj commented Apr 15, 2023

(created using eb --new-pr)

add patch from #17726 - using separate PRs for each easyconfig

@boegel boegel added this to the next release (4.7.2) milestone Apr 15, 2023
@boegel boegel changed the title from "fix flaky test optim test PyTorch 1.12.1 foss/2021b w/CUDA" to "use patch to fix flaky test optim test in PyTorch 1.12.1 w/ foss/2021b + CUDA 11.5.2" Apr 15, 2023
@branfosj
Member Author

Test report by @branfosj
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
bear-pg0103u11a.bear.cluster - Linux RHEL 8.6, x86_64, Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz (icelake), 1 x NVIDIA NVIDIA A100-PCIE-40GB, 520.61.05, Python 3.6.8
See https://gist.github.com/branfosj/568588a95dfa3f05063e922697a8a06b for a full test report.

@branfosj
Member Author

======================================================================
FAIL: test_rprop (__main__.TestOptim)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/dev/shm/branfosj/build-up-EL8/PyTorch/1.12.1/foss-2021b-CUDA-11.5.2/pytorch-v1.12.1/test/test_optim.py", line 889, in test_rprop
    self._test_basic_cases(
  File "/dev/shm/branfosj/build-up-EL8/PyTorch/1.12.1/foss-2021b-CUDA-11.5.2/pytorch-v1.12.1/test/test_optim.py", line 298, in _test_basic_cases
    self._test_state_dict(
  File "/dev/shm/branfosj/build-up-EL8/PyTorch/1.12.1/foss-2021b-CUDA-11.5.2/pytorch-v1.12.1/test/test_optim.py", line 240, in _test_state_dict
    self.assertEqual(bias, bias_cuda)
  File "/dev/shm/branfosj/tmp-up-EL8/eb-bn4q41sd/tmpu8l8v5iq/lib/python3.9/site-packages/torch/testing/_internal/common_utils.py", line 2219, in assertEqual
    assert_equal(
  File "/dev/shm/branfosj/tmp-up-EL8/eb-bn4q41sd/tmpu8l8v5iq/lib/python3.9/site-packages/torch/testing/_comparison.py", line 1095, in assert_equal
    raise error_metas[0].to_error(msg)
AssertionError: Tensor-likes are not close!

Mismatched elements: 1 / 10 (10.0%)
Greatest absolute difference: 4.678964614868164e-05 at index (2,) (up to 1e-05 allowed)
Greatest relative difference: 0.0001691698051077644 at index (2,) (up to 1.3e-06 allowed)
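
For a sense of how far outside tolerance this mismatch is, here is a minimal sketch that uses only the figures printed in the assertion message. It assumes the comparison combines the two limits as `atol + rtol * |expected|` and that the reported relative difference is the absolute difference divided by `|expected|`; neither is stated in the log. The helper name `allowed_difference` is hypothetical.

```python
# Figures copied from the assertion message in the traceback above.
ABS_DIFF = 4.678964614868164e-05   # greatest absolute difference
REL_DIFF = 0.0001691698051077644   # greatest relative difference
ATOL = 1e-05                       # "up to 1e-05 allowed"
RTOL = 1.3e-06                     # "up to 1.3e-06 allowed"


def allowed_difference(expected_magnitude, rtol=RTOL, atol=ATOL):
    """Assumed closeness bound (not taken from the log): atol + rtol * |expected|."""
    return atol + rtol * abs(expected_magnitude)


# Both maxima are reported at index (2,), so under the assumption that
# rel_diff = abs_diff / |expected|, the magnitude of the expected value is:
expected_magnitude = ABS_DIFF / REL_DIFF

bound = allowed_difference(expected_magnitude)
overshoot = ABS_DIFF / bound

print(f"|expected| ~ {expected_magnitude:.4f}")
print(f"allowed difference ~ {bound:.3e}")
print(f"observed / allowed ~ {overshoot:.2f}")
```

Under those assumptions the single mismatched element exceeds the allowed difference by a factor of roughly 4.5, so the gap between the CPU and CUDA results here is several times the tolerance rather than a marginal miss.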

@branfosj
Member Author

Based on the failures seen, I am closing this and suggesting we revert the patch in the other PyTorch 1.12.1 easyconfig where we've already merged it (#17737).

@branfosj branfosj closed this Apr 15, 2023
@branfosj branfosj deleted the 20230415130117_new_pr_PyTorch1121 branch April 18, 2023 09:33