
Paper: Falsify your Software: validating scientific code with property-based testing #549

Merged: 1 commit into scipy-conference:2020 on Jul 5, 2020

Conversation

Zac-HD

@Zac-HD Zac-HD commented May 28, 2020

PDF link: http://procbuild.scipy.org/download/Zac-HD-zac-hatfield-dodds

Abstract: Where traditional example-based tests check software using manually-specified input-output pairs, property-based tests exploit a general description of valid inputs and program behaviour to automatically search for falsifying examples. Given that Python has excellent property-based testing tools, such tests are often easier to work with and routinely find serious bugs that all other techniques have missed.
I present four categories of properties relevant to most scientific projects, demonstrate how each found real bugs in Numpy and Astropy, and propose that property-based testing should be adopted more widely across the SciPy ecosystem.

@deniederhut
Member

Hey! I took a quick look at the build errors, and it looks like bibtex is tripping on the backslashes in the journal names in the bibfile for this paper. If you can fix those, the rest of this should be okay.

@deniederhut deniederhut added the paper This indicates that the PR in question is a paper label May 28, 2020
@Zac-HD Zac-HD force-pushed the zac-hatfield-dodds branch 11 times, most recently from 946cd9c to fdb2392, May 30, 2020 07:32
@Zac-HD Zac-HD force-pushed the zac-hatfield-dodds branch 2 times, most recently from b880820 to 39b22c5, June 2, 2020 01:43
for the task has several advantages:

- a concise and expressive interface for describing inputs
- tests are never flaky - failing examples are cached and
@anirudhacharya anirudhacharya Jun 18, 2020
This requires more explanation: how do libraries that do property-based testing prevent flakiness in tests? If anything, adding random data generators to tests adds flakiness to the testing framework.

A more general comment: how can we reproduce test failures with PBT? With random number generators, we can seed the generator to reproduce results. Do the various PBT frameworks do something similar under the hood?

@Zac-HD Zac-HD (Author)

The basic trick is that we cache the output from the PRNG between runs, and automatically replay any previous failures on the next run.

There are a bunch of options to explicitly report and then force particular seeds or insert a buffer into the cache, but in normal development it's entirely automatic.

This means we get the benefit of a different set of examples each run, while also having any failures replayed every time. (Or, if you want the same set of examples each run, you can set the seed manually.)
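The replay mechanism described above can be sketched in a few lines of plain Python. This is a toy illustration of the idea only, not Hypothesis's actual implementation (which persists examples to an on-disk database and caches PRNG output rather than whole inputs); the names `failure_cache` and `run_property` are invented for this sketch:

```python
import random

# Toy sketch of cached-failure replay: Hypothesis persists failing
# examples to an on-disk database; here a list stands in for it.
failure_cache = []

def run_property(prop, generate, n_examples=100):
    # 1. Replay previously failing examples before anything else,
    #    so a known failure reproduces deterministically.
    for cached in list(failure_cache):
        if not prop(cached):
            return cached             # still failing: report it again
        failure_cache.remove(cached)  # bug fixed: forget the example
    # 2. Only then search with fresh random inputs.
    for _ in range(n_examples):
        x = generate()
        if not prop(x):
            failure_cache.append(x)   # cache for the next run
            return x
    return None  # no counterexample found this run
```

Each run still draws fresh random examples, but any failure it finds is cached and replayed first on subsequent runs, so the test cannot quietly pass again until the bug is actually fixed.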

@anirudhacharya anirudhacharya left a comment

Can property based testing replace the traditional example-based unit testing?

@Zac-HD
Author

Zac-HD commented Jun 18, 2020

Can property based testing replace the traditional example-based unit testing?

Mostly, yes. A good example might be my hypothesis-jsonschema project: of around 500 tests, about six are traditional example-based tests, 80 are parametrised tests (so ~10 tests averaging eight cases each), and the remaining 400 tests are all property-based and run hundreds of examples each.

Traditional example-based tests are still nice to pin down specific or weird edge cases, where asserting the exact result is easy and a property would be fiddly or error-prone, but almost all of the tests I write are property-based.
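The split described here can be illustrated with Python's built-in `sorted`. This is a hand-rolled stdlib sketch of the distinction; with Hypothesis the second test would instead be decorated with `@given(st.lists(st.integers()))` and the library would generate (and shrink) the inputs:

```python
import random
from collections import Counter

def test_sorted_example():
    # Example-based: manually specified input/output pairs,
    # good for pinning down a specific edge case exactly.
    assert sorted([3, 1, 2]) == [1, 2, 3]
    assert sorted([]) == []

def test_sorted_properties():
    # Property-based: invariants that must hold for *any* input,
    # checked over many generated examples.
    rng = random.Random(0)
    for _ in range(200):
        xs = [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]
        out = sorted(xs)
        # The output is ordered ...
        assert all(a <= b for a, b in zip(out, out[1:]))
        # ... and is a permutation of the input.
        assert Counter(out) == Counter(xs)
```

The property test never needs to know the expected output for any generated list; the invariants alone are enough to catch, say, a sort that drops duplicates or loses elements.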

@deniederhut
Member

Hey @Zac-HD ! Thanks to some awesome behind-the-scenes work by @stargaser, we now have experimental support for publishing papers with ORCIDs. If you have one and would like to add it to your paper, it's as simple as adding an :orcid: tag under your name in the paper header. You can see an example of how to do this in the 00_vanderwalt example paper. If you don't have one, no worries! You can still publish your paper as is.

@Zac-HD
Author

Zac-HD commented Jun 19, 2020

I do indeed have one, and took it out of my early draft when I noticed that it wasn't supported yet. Thanks for your persistence @stargaser!

@anirudhacharya

Can property based testing replace the traditional example-based unit testing?

Mostly, yes. A good example might be my hypothesis-jsonschema project: of around 500 tests, about six are traditional example-based tests, 80 are parametrised tests (so ~10 tests averaging eight cases each), and the remaining 400 tests are all property-based and run hundreds of examples each.

Traditional example-based tests are still nice to pin down specific or weird edge cases, where asserting the exact result is easy and a property would be fiddly or error-prone, but almost all of the tests I write are property-based.

Wouldn't it also make sense to keep unit tests for Test-Driven Development, to ensure the basic sanity of the software as it is being developed? It would seem property-based tests are not so well suited to Test-Driven Development.

@deniederhut
Member

Hi @Zac-HD ! Do you have thoughts about using hypothesis with TDD? Or do you feel this is out of scope for the paper?

@Zac-HD
Author

Zac-HD commented Jun 27, 2020

(sorry I missed this before!)

I think TDD is out of scope for this paper, but for what it's worth I also think property-based tests are just as applicable to TDD as to any other way testing fits into your development cycle. Personally, I'd rather do my basic sanity checks with properties than with specific examples - @anirudhacharya, if you've found this problematic I'd be interested to hear what difficulties you ran into!

@deniederhut
Member

Thanks for the response! @anirudhacharya do you feel that this paper is now ready for inclusion in the proceedings?

@anirudhacharya

Thanks for the response! @anirudhacharya do you feel that this paper is now ready for inclusion in the proceedings?

@deniederhut Yes I do.

@deniederhut deniederhut merged commit e120e21 into scipy-conference:2020 Jul 5, 2020