
Paper: Falsify your Software: validating scientific code with property-based testing #549

Merged: 1 commit into scipy-conference:2020 on Jul 5, 2020

Conversation

Zac-HD

@Zac-HD Zac-HD commented May 28, 2020

PDF link: http://procbuild.scipy.org/download/Zac-HD-zac-hatfield-dodds

Abstract: Where traditional example-based tests check software using manually-specified input-output pairs, property-based tests exploit a general description of valid inputs and program behaviour to automatically search for falsifying examples. Given that Python has excellent property-based testing tools, such tests are often easier to work with and routinely find serious bugs that all other techniques have missed.
I present four categories of properties relevant to most scientific projects, demonstrate how each found real bugs in Numpy and Astropy, and propose that property-based testing should be adopted more widely across the SciPy ecosystem.

@deniederhut
Member

Hey! I took a quick look at the build errors, and it looks like bibtex is tripping on the backslashes in the journal names in the bibfile for this paper. If you can fix those, the rest of this should be okay.

@deniederhut deniederhut added the paper This indicates that the PR in question is a paper label May 28, 2020
@Zac-HD Zac-HD force-pushed the zac-hatfield-dodds branch 11 times, most recently from 946cd9c to fdb2392, May 30, 2020 07:32
@Zac-HD Zac-HD force-pushed the zac-hatfield-dodds branch 2 times, most recently from b880820 to 39b22c5, June 2, 2020 01:43
for the task has several advantages:

- a concise and expressive interface for describing inputs
- tests are never flaky - failing examples are cached and
@anirudhacharya anirudhacharya Jun 18, 2020
This requires more explanation: how do libraries that do property-based testing prevent flakiness in tests? If anything, adding random data generators to tests adds flakiness to the testing framework.

A more general comment: how can we reproduce test failures with PBT? With random number generators, we can seed the generator to reproduce results. Do the various PBT frameworks do something similar under the hood?

@Zac-HD Zac-HD (Author)

The basic trick is that we cache the output from the PRNG between runs, and automatically replay any previous failures on the next run.

There are a bunch of options to explicitly report and then force particular seeds or insert a buffer into the cache, but in normal development it's entirely automatic.

This means we get the benefit of a different set of examples each run, while also having any failures replayed every time. (Or, if you want the same set of examples each run, you can set the seed manually.)
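The replay mechanism described above can be sketched in a few lines of plain Python. This is a toy illustration of the idea only, not Hypothesis's actual implementation (which persists examples to an on-disk database and caches PRNG output rather than whole inputs); the names `failure_cache` and `run_property` are invented for this sketch:

```python
import random

# Toy sketch of cached-failure replay: Hypothesis persists failing
# examples to an on-disk database; here a list stands in for it.
failure_cache = []

def run_property(prop, generate, n_examples=100):
    # 1. Replay previously failing examples before anything else,
    #    so a known failure reproduces deterministically.
    for cached in list(failure_cache):
        if not prop(cached):
            return cached             # still failing: report it again
        failure_cache.remove(cached)  # bug fixed: forget the example
    # 2. Only then search with fresh random inputs.
    for _ in range(n_examples):
        x = generate()
        if not prop(x):
            failure_cache.append(x)   # cache for the next run
            return x
    return None  # no counterexample found this run
```

Each run still draws fresh random examples, but any failure it finds is cached and replayed first on subsequent runs, so the test cannot quietly pass again until the bug is actually fixed.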

@anirudhacharya anirudhacharya left a comment

Can property based testing replace the traditional example-based unit testing?

@Zac-HD
Author

Zac-HD commented Jun 18, 2020

Can property based testing replace the traditional example-based unit testing?

Mostly, yes. A good example might be my hypothesis-jsonschema project: of around 500 tests, about six are traditional example-based tests, 80 are parametrised tests (so ~10 tests averaging eight cases each), and the remaining 400 tests are all property-based and run hundreds of examples each.

Traditional example-based tests are still nice to pin down specific or weird edge cases, where asserting the exact result is easy and a property would be fiddly or error-prone, but almost all of the tests I write are property-based.
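The split described here can be illustrated with Python's built-in `sorted`. This is a hand-rolled stdlib sketch of the distinction; with Hypothesis the second test would instead be decorated with `@given(st.lists(st.integers()))` and the library would generate (and shrink) the inputs:

```python
import random
from collections import Counter

def test_sorted_example():
    # Example-based: manually specified input/output pairs,
    # good for pinning down a specific edge case exactly.
    assert sorted([3, 1, 2]) == [1, 2, 3]
    assert sorted([]) == []

def test_sorted_properties():
    # Property-based: invariants that must hold for *any* input,
    # checked over many generated examples.
    rng = random.Random(0)
    for _ in range(200):
        xs = [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]
        out = sorted(xs)
        # The output is ordered ...
        assert all(a <= b for a, b in zip(out, out[1:]))
        # ... and is a permutation of the input.
        assert Counter(out) == Counter(xs)
```

The property test never needs to know the expected output for any generated list; the invariants alone are enough to catch, say, a sort that drops duplicates or loses elements.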

@deniederhut
Member

Hey @Zac-HD ! Thanks to some awesome behind-the-scenes work by @stargaser, we now have experimental support for publishing papers with ORCIDs. If you have one and would like to add it to your paper, it's as simple as adding an :orcid: tag under your name in the paper header. You can see an example of how to do this in the 00_vanderwalt example paper. If you don't have one, no worries! You can still publish your paper as is.

@Zac-HD
Author

Zac-HD commented Jun 19, 2020

I do indeed have one, and took it out of my early draft when I noticed that it wasn't supported yet. Thanks for your persistence @stargaser!

@anirudhacharya

Can property based testing replace the traditional example-based unit testing?

Mostly, yes. A good example might be my hypothesis-jsonschema project: of around 500 tests, about six are traditional example-based tests, 80 are parametrised tests (so ~10 tests averaging eight cases each), and the remaining 400 tests are all property-based and run hundreds of examples each.

Traditional example-based tests are still nice to pin down specific or weird edge cases, where asserting the exact result is easy and a property would be fiddly or error-prone, but almost all of the tests I write are property-based.

Wouldn't it also make sense to keep unit tests for Test-Driven Development, to ensure the basic sanity of the software as it is being developed? It would seem property-based tests are not so well suited to Test-Driven Development.

@deniederhut
Member

Hi @Zac-HD ! Do you have thoughts about using hypothesis with TDD? Or do you feel this is out of scope for the paper?

@Zac-HD
Author

Zac-HD commented Jun 27, 2020

(sorry I missed this before!)

I think TDD is out of scope for this paper, but for what it's worth I also think property-based tests are just as applicable to TDD as to any other way testing fits into your development cycle. Personally, I'd rather do my basic sanity checks with properties than with specific examples - @anirudhacharya, if you've found this problematic I'd be interested to hear what difficulties you ran into!

@deniederhut
Member

Thanks for the response! @anirudhacharya do you feel that this paper is now ready for inclusion in the proceedings?

@anirudhacharya

Thanks for the response! @anirudhacharya do you feel that this paper is now ready for inclusion in the proceedings?

@deniederhut Yes I do.

@deniederhut deniederhut merged commit e120e21 into scipy-conference:2020 Jul 5, 2020