Polish all tests #14

Open
mdhaber opened this issue Nov 22, 2024 · 2 comments

mdhaber commented Nov 22, 2024

In the first pass at writing tests, I'm interested in demonstrating that the function is mostly correct.

Take a second pass to polish, making sure that each function is bulletproof. This includes testing with multiple backends and all relevant dtypes. It's also important to check that the (random) numerical values used in tests have an appropriate distribution for exercising the function, and we may need to add tests of edge cases.
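
For concreteness, a minimal sketch of that kind of parametrization (the backend list, dtype list, and reference check here are placeholders, not this repo's actual tests; only NumPy is listed so the snippet runs as-is):

```python
import numpy as np
import pytest

# Illustrative only: `backends` would hold each array namespace under test
# (e.g. numpy, array_api_strict, torch).
backends = [np]
dtypes = ['float32', 'float64', 'int8', 'int32', 'uint8']

@pytest.mark.parametrize('xp', backends)
@pytest.mark.parametrize('dtype_name', dtypes)
def test_sum_matches_reference(xp, dtype_name):
    rng = np.random.default_rng(84912)
    x_np = rng.integers(0, 10, size=20).astype(dtype_name)
    x = xp.asarray(x_np)
    # Hypothetical check: the backend's sum agrees with a NumPy reference.
    np.testing.assert_allclose(np.asarray(xp.sum(x)), np.sum(x_np))
```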

keewis commented Dec 7, 2024

It would probably make sense to leave that kind of edge-case testing to hypothesis.

However, I don't think we need to be too thorough, as this can easily turn into testing the wrapped array's behavior, which I don't think we should do (drawing that line can be hard, though).
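
For example, a property-based test in that style might look like the sketch below (the idempotent-sort property and the use of plain NumPy are just for illustration):

```python
import numpy as np
from hypothesis import given
from hypothesis.extra.numpy import arrays, array_shapes

# Hypothesis generates the input arrays itself, including edge cases
# like NaN and inf, without us enumerating them.
@given(arrays(dtype=np.float64, shape=array_shapes(max_dims=1)))
def test_sort_idempotent(x):
    once = np.sort(x)
    # assert_array_equal treats NaNs in matching positions as equal.
    np.testing.assert_array_equal(np.sort(once), once)
```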

mdhaber commented Dec 7, 2024

> this can easily turn into testing the wrapped array's behavior

With hypothesis, it sure can. I added hypothesis to SciPy, and I think I'm still the only one to have used it. But it is too thorough: it routinely comes up with edge cases that fail because of (a) the underlying functions rather than the function being tested, or (b) limitations of the underlying algorithm. It can be very tricky to write tests that avoid that while still making good use of hypothesis. That's why I cooked up my own simple system here, and I'm mostly happy with it. The only things I'd change are:

  • have the assertion error message that shows the failing seed give something that can be copy-pasted to re-run the test (right now it requires copying the inputs into test_test manually), and
  • raise that message from a decorator rather than from the assertion functions (one possible approach is sketched below).
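
A minimal sketch of a decorator covering both points (the name and the `rng` keyword convention are hypothetical, not this repo's actual helpers):

```python
import functools
import numpy as np

def report_seed(test):
    # Hypothetical decorator: draw a fresh seed per run and, on failure,
    # re-raise with a message that can be pasted verbatim to reproduce.
    @functools.wraps(test)
    def wrapper(*args, **kwargs):
        seed = np.random.SeedSequence().entropy
        try:
            return test(*args, rng=np.random.default_rng(seed), **kwargs)
        except AssertionError as e:
            raise AssertionError(
                f"Test failed; rerun with "
                f"{test.__name__}(rng=np.random.default_rng({seed}))"
            ) from e
    return wrapper

@report_seed
def test_example(rng):
    x = rng.standard_normal(10)
    assert np.all(np.sort(x)[:-1] <= np.sort(x)[1:])
```

Raising from the decorator keeps the reproduction message in one place instead of duplicating it across the assertion functions.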

What I meant about testing edge cases is that there may still be edge cases where what the wrapper does needs to be tested. For instance, when all elements of a reduction are masked, the result should be masked (according to precedent). That case may not come up on its own very often.
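
For reference, numpy.ma's behavior, which is presumably the precedent meant here:

```python
import numpy as np

# Reducing a fully masked array yields `masked`,
# not the reduction's identity element.
x = np.ma.masked_array([1.0, 2.0], mask=[True, True])
assert x.sum() is np.ma.masked
```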

Also, originally I was generating random values from a uniform distribution, so after conversion to integer dtypes there were only 0s and 1s. This might be enough to ensure correct behavior of the masked part, but maybe not for functions like sort or unique_, which I'd prefer to test with some variety in the input (but also some ties).
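
A quick sketch of the difference, assuming NumPy's Generator API (ranges are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5824)

# Uniform floats over [0, 2) collapse to just 0s and 1s after casting:
coarse = rng.uniform(0, 2, size=10).astype(np.int64)

# Integers drawn from a modest range give both variety and ties,
# which exercises sort/unique more meaningfully:
varied = rng.integers(0, 5, size=10)
```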

I've already parametrized the tests over multiple backends and all dtypes and found some small issues. I just mean to take another pass and think about each test again when I'm not sprinting to get something written.
