Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improving Hypothesis' index and shape strategies #46

Closed
Zac-HD opened this issue Jun 18, 2020 · 4 comments
Closed

Improving Hypothesis' index and shape strategies #46

Zac-HD opened this issue Jun 18, 2020 · 4 comments

Comments

@Zac-HD
Copy link
Contributor

Zac-HD commented Jun 18, 2020

👋 I'm a Hypothesis core dev, and got very excited when I saw your intro blog post today.

I've been working to support better property-based testing for science code for some time now - see e.g. scipy-conference/scipy_proceedings#549 - and it's lovely to find another project and potential collaboration!

https://github.com/Quansight/ndindex/blob/0d9d131a5310915f3a51ebe95491e43afcd1dcad/ndindex/tests/helpers.py#L14-L18

In particular, both of these limitations are things that we can fix!

While I can total understand and support rolling your own rather than working upstream, it would be awesome if you could open an issue when you run into things like this - one of our limiting factors is actually user input and use-cases, so if you have suggestions they will find interested ears 😁

Before I go and upgrade our slices() and basic_indices() strategies, can I just check:

  • The "empty selection only if start == end" aspect of slices() is OK? If so I can just add some logic to maybe-invert index-from-start to index-from-end.
  • basic_indices() could similarly maybe-return t[0] if it would return a length-one tuple t, unless disabled by setting a new argument to False. Do you think this would match Numpy user's intuitions about what a basic index is?
@asmeurer
Copy link
Member

Hi. Thanks for taking notice of this project.

Regarding slices, I intentionally rolled my own, because for this library I care about a every possible corner case of slices to ensure correctness, whereas presumably for most use-cases you would want to avoid generating duplicate slices. For example, if a slice step=None, it is always equivalent to step=1. Depending on what level of abstraction you are working in, there is no need to check this. But I always want to make sure I handle every corner case here.

For shapes, I do think hypothesis can improve. See https://github.com/Quansight/ndindex/blob/0d9d131a5310915f3a51ebe95491e43afcd1dcad/ndindex/tests/helpers.py#L58-L61. I want to generate array shapes but only keep those that create arrays that are not too big (also note the NumPy bug that hypothesis found here). I think this is actually a pretty big source of filtering in my code. But I'm not sure if there's a better way to generate such shapes directly to avoid filtering.

There's also quite a bit of filtering in the tests themselves because I filter out invalid indices. Perhaps I should try to be smarter about this. I do want to generate these indices for the base tests, but generally want to avoid them for higher level tests. For example, tuple indices with more than one ellipsis should raise an error. This is tested in the constructor test_tuple_hypothesis but uses assume(False) in the other tests. I've been hesitant to get too clever here because I don't want to accidentally filter out too much, and because what is there has been working (although admittedly some bugs were not found until the tests were ran several times on CI). Or as a simpler example, I should probably avoid step=0 in all but the base Slice constructor test because it raises an exception there. But I haven't bothered, even for my manual exhaustive test.

Another question I had about hypothesis is if you had any kind of gitter or something like that where I could ask questions? On your website I only saw a link to an IRC channel, which didn't seem to have anyone in it when I checked.

@Zac-HD
Copy link
Contributor Author

Zac-HD commented Jun 19, 2020

slices

I've opened an issue to improve our st.slices() strategy. Once that's done - no eta, since I think it's a good intro for new contributors - the only thing it won't generate is slices which have start != end but select no elements due to step. This is usually helpful (because around half the time nothing is selected otherwise), but in your case you might want to roll your own to keep generating such cases.

shapes

Capping the product of side lengths without introducing weird biases or shrinking problems is surprisingly difficult... our usual approach is still to filter them. Note that these problems only really matter if you're creating arrays of the given shape though.

Drawing the side lengths from ints() | just(1) | sampled_from([1, 2]) or similar could help, by increasing the chance that several dimensions are of size one (or maybe two) to boost dimensionality without increasing the product (much).

In the end filtering out a lot isn't actually a problem in itself, so if the performance is still OK you don't need to do anything. And congratulations on the bug - are you keeping a 'trophy case' list?

indices

I'd suggest using separate strategies for "usually valid indices" and "always valid indices". I'm also keen to upgrade our upstream basic_indices() strategy to provide the latter - is there anything other than the "always a tuple" thing that we should change?

admittedly some bugs were not found until the tests were ran several times on CI

For complicated new tests I often set them to run 10k or 100k examples and leave them running for a while - 100 examples just isn't much to hit edge cases in high-dimensional spaces. There are plans in the works for a proper long-running mode, and I'm hoping to announce something really cool later this year... but for now just turn up the max_examples setting 😅

Q&A

No, we don't have a live chat kind of thing... they can devolve into unpaid support and none of us want to go there. There's a stackoverflow tag which is pretty helpful though, and you're welcome to tag me in an issue.

@asmeurer
Copy link
Member

I've opened an issue to improve our st.slices() strategy. Once that's done - no eta, since I think it's a good intro for new contributors - the only thing it won't generate is slices which have start != end but select no elements due to step. This is usually helpful (because around half the time nothing is selected otherwise), but in your case you might want to roll your own to keep generating such cases.

I will probably continue to use my own. I think another difference between my use-case and hypothesis is that I always want to generate indices independent of shape, because ndindex operations should always work independently of array shape when possible.

Capping the product of side lengths without introducing weird biases or shrinking problems is surprisingly difficult... our usual approach is still to filter them. Note that these problems only really matter if you're creating arrays of the given shape though.

Yeah, that's exactly what I'm doing. I don't know what else you would do with a shape. Actually I don't really know what a "typical" use-case for the numpy hypothesis stuff is.

In the end filtering out a lot isn't actually a problem in itself, so if the performance is still OK you don't need to do anything. And congratulations on the bug - are you keeping a 'trophy case' list?

Right, I'm mostly concerned because of the healthcheck errors. And I the filtering means that it isn't actually running as many tests as I would hope.

As an aside, I realized that I've been misusing assume(False). I've used it on error condition cases, but I actually want those to be tested to ensure that they do give an error at the right place. I was getting a lot of healthcheck errors before in some cases because of this.

I'd suggest using separate strategies for "usually valid indices" and "always valid indices". I'm also keen to upgrade our upstream basic_indices() strategy to provide the latter - is there anything other than the "always a tuple" thing that we should change?

That and slices. I can't completely tell what you are doing with ellipses but I would generate ellipses even when they are redundant (an ellipsis at the end of the tuple or an ellipsis when there are already ndim terms in the tuple).

No, we don't have a live chat kind of thing... they can devolve into unpaid support and none of us want to go there. There's a stackoverflow tag which is pretty helpful though, and you're welcome to tag me in an issue.

So where's the best place to ask questions about hypothesis?

@Zac-HD
Copy link
Contributor Author

Zac-HD commented Jul 8, 2020

(back from an outback road trip now)

That and slices. I can't completely tell what you are doing with ellipses but I would generate ellipses even when they are redundant (an ellipsis at the end of the tuple or an ellipsis when there are already ndim terms in the tuple).

We already do this upstream 😁
I've opened a new issue for the SciPy sprints to generate non-tuple basic indices though, which I think was the last thing here.

So where's the best place to ask questions about hypothesis?

Probably https://stackoverflow.com/questions/tagged/python-hypothesis for general questions; for OSS-project-specific questions you can ping me in issues too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants