-
Notifications
You must be signed in to change notification settings - Fork 703
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
{lib}[GCCcore/11.2.0,foss/2021b] CuPy v11.4.0, pytest v7.2.2 w/ Python 3.9.6 #17526
{lib}[GCCcore/11.2.0,foss/2021b] CuPy v11.4.0, pytest v7.2.2 w/ Python 3.9.6 #17526
Conversation
….2-GCCcore-11.2.0.eb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Test report by @Micket |
Test report by @Micket |
Test report by @akesandgren |
Test report by @Micket |
Test report by @Micket |
Damnit, filled up home dir quota with |
easybuild/easyconfigs/c/CuPy/CuPy-11.4.0-foss-2021b-CUDA-11.4.1.eb
Outdated
Show resolved
Hide resolved
Test report by @Micket |
uh, i have no idea how the pytest test suite failed on the V100 nodes. it worked before. |
('cuDNN', '8.2.2.26', versionsuffix, SYSTEM), | ||
('NCCL', '2.10.3', versionsuffix), | ||
('cuTENSOR', '1.6.1.5', versionsuffix, SYSTEM), | ||
# ('cuSPARSELt', '0.3.0.3', versionsuffix, SYSTEM), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this commented? If it's really not needed, can't it just be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was an optional thing, but i had problems getting that to work in my PR. I've forgotten what the reason was, only that i didn't care enough about that optional feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It currently crashes/fails to compile some part, I left in in place to be enabled for a newer toolchain
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we put that as a comment? I.e. that it's an optional dep, commented out because it gave issues, but could be re-enabled for newer toolchain?
Test report by @casparvl |
Test report by @Micket |
Test report by @Micket |
You can ignore my test, I completely overlooked the dependency on the EasyBlock. I'll re-test... |
Test report by @casparvl |
Same result, unfortunately... |
@casparvl you have an odd test failure in pytest already, can you look closer at that? |
@akesandgren: Tests failed in GitHub Actions, see https://github.com/easybuilders/easybuild-easyconfigs/actions/runs/4423597098
bleep, bloop, I'm just a bot (boegelbot v20200716.01) |
…ore according to our rules.
A40's are failing cupy test
|
Test report by @Micket |
You forgot to include the easyblock, CuPy uses "testinstall" which is fixed in that easyblock |
Ah, my bad, will retest. Thanks for the pointer. |
I was investigating the TestProduct.test_tensordot failure for the A40. @testing.for_all_dtypes()
@testing.numpy_cupy_allclose()
def test_tensordot(self, xp, dtype):
a = testing.shaped_arange((2, 3, 4), xp, dtype)
b = testing.shaped_arange((3, 4, 5), xp, dtype)
return xp.tensordot(a, b) I can generate the test data manually
So, with float16 precision on a T4, it does compute the expected output >>> np.tensordot(a_np, b_np)
array([[2938., 3016., 3094., 3172., 3250.],
[7040., 7264., 7488., 7708., 7928.]], dtype=float16)
>>> cp.tensordot(a, b)
array([[2938., 3016., 3094., 3172., 3250.],
[7040., 7264., 7488., 7708., 7928.]], dtype=float16) Since it wasn't clear from the actual/desired output, i can confirm that my icelake CPUs also produce the same output using numpy. actual = array([[2938., 3016., 3094., 3172., 3250.],
[7044., 7264., 7484., 7708., 7932.]], dtype=float16) # wrong results on A40 float16's have enough poor precision that these are actually rounding errors. With higher precision, >>> np.tensordot(np.array(a, dtype=np.float64), np.array(b, dtype=np.float64))
array([[2938., 3016., 3094., 3172., 3250.],
[7042., 7264., 7486., 7708., 7930.]]) So these numbers aren't really any worse in terms of floating point accuracy. Annoying that isn't 1 to 1 with numpy, but i think we just have to ignore this test. |
Test report by @casparvl |
Ok, finally one build that completes. But only when using local scratch disks. Also, I don't see the same test failures that @Micket see's on the A40 (we have A100s). From his comment above, I interpret it as a flaky test though. How do you both think we should proceed on this? I see two options:
And the one from @Micket from
What do you think? |
Test report by @Micket |
The values in the range 4096 - 8192 have a precision interval of 4 so the test is incorrectly written, rtol should probably be {'float16': 1e-3} that would handle both ranges. Conclusion, test is wrong and/or very badly choosen. |
@akesandgren Yep, that's a clear description. I would suggest to also link this PR, or more specifically this comment #17526 (comment) in there though. That will give people a concrete output to compare their own to if they run into this issue, and provide the full context. |
@casparvl does that look good? |
Test report by @Micket |
It's still runinng the tensordot test. I tried specifying the full path to this particular test, including the ::test_tensordot part, but it still ran and failed it.
|
'tests/cupy_tests/fft_tests/test_fft.py', | ||
# float16 has too low precision for these tests as they are written | ||
# See https://github.com/easybuilders/easybuild-easyconfigs/pull/17526#issuecomment-1470843170 for details. | ||
'tests/cupy_tests/linalg_tests/test_product.py::TestProduct', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, turns out this syntax isn't a thing in pytest. This line does nothing. Looks like this syntax pytest-dev/pytest#3198 but this was never supported, the issue was closed in favor of a regex of what to include.
Needs to be a different flag -k "not test_tensordot"
Ugh. This sucks. It won't even let us specify the full name, like "TestProduct::test_tensordot", just the last part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So options are
- let this fail on A40, whoever installs it just has to ignore test failures.
- patch this test out
- add the -k flag, which just really sucks if we ever need to add more than one.
Regardless, this broken ::TestProduct ignore line should be removed, as it does nothing
Test report by @Micket |
@Micket that last commit seem to fix the A40 problem, at least my build on A40's passed cleanly. |
Why leave in |
It is supported when when using --deselect, according to the actually merged PR for that part |
Test report by @Micket |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Going in, thanks @akesandgren! |
(created using
eb --new-pr
)Supercedes #17136
The problems with the scipy tests need more investigation.
Depends on easybuilders/easybuild-easyblocks#2872