Proposal

We should expose a public set of hypothesis strategies for use in testing xarray code. It could be useful for downstream users, but also for our own internal test suite. It should live in `xarray.testing.strategies`. Specifically perhaps:

- `xarray.testing.strategies.variables`
- `xarray.testing.strategies.dataarrays`
- `xarray.testing.strategies.datasets`
- `xarray.testing.strategies.datatrees` (?)
- `xarray.testing.strategies.indexes`
- `xarray.testing.strategies.chunksizes` (following `dask.array.testing.strategies.chunks`)
This issue is different from #1846 because that issue describes how we could use such strategies in our own testing code, whereas this issue is for how we create general strategies that we could use in many places (including exposing publicly).
I've become interested in this as part of wanting to see #6894 happen. #6908 would effectively close this issue, but itself is just a pulled out section of all the work @keewis did in #4972.
(Also xref #2686. Also also @max-sixty didn't you have an issue somewhere about creating better and public test fixtures?)
Previous work
I was pretty surprised to see this comment by @Zac-HD in #1846
given that we might have just used that instead of writing new ones in #4972! (@keewis had you already seen that extension?)
We could literally just include that extension in xarray and call this issue solved...
Shrinking performance of strategies
However, I was also reading yesterday about how strategies shrink, and I think we should make an effort to design strategies that produce xarray objects which shrink in a performant and well-motivated manner. In particular, by pooling the knowledge of the @xarray-dev core team we could create strategies that search for many of the edge cases we are collectively aware of.
My understanding of that guide is that our strategies ideally should:
Quickly include or exclude complexity
For instance `if draw(booleans()): # then add coordinates to generated dataset`.
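A minimal sketch of what that early branching could look like (the strategy and its name here are hypothetical, not an existing xarray API):

```python
import hypothesis.strategies as st
import numpy as np
import xarray as xr

@st.composite
def simple_dataarrays(draw):
    # Hypothetical sketch: draw the data first, then make one up-front
    # boolean decision about whether to add coordinates at all, so that
    # hypothesis can prune the entire coordinate branch in one step.
    data = draw(st.lists(st.floats(allow_nan=False, allow_infinity=False),
                         min_size=1, max_size=5))
    da = xr.DataArray(np.array(data), dims=["x"])
    if draw(st.booleans()):  # then add coordinates to generated object
        da = da.assign_coords(x=np.arange(len(data)))
    return da
```

When hypothesis shrinks a failing example, flipping that single boolean to `False` discards all coordinate-related complexity at once.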
It might also be nice to have strategy constructors which allow passing other strategies in, so the user can choose how much complexity they want their strategy to generate. e.g. I think a signature like this should be possible
```python
from hypothesis import strategies as st

@st.composite
def dataarrays(
    draw,
    data: xr.Variable | st.SearchStrategy[xr.Variable] | duckarray | st.SearchStrategy[duckarray] | None = ...,
    coords: ... = ...,
    dims: ... = ...,
    attrs: ... = ...,
    name: ... = ...,
) -> st.SearchStrategy[xr.DataArray]:
    """
    Hypothesis strategy for generating arbitrary DataArray objects.

    Parameters
    ----------
    data
        Can pass an absolute value of an appropriate type (i.e. `Variable`,
        `np.ndarray` etc.), or pass a strategy which generates such types.
        Default is that the generated DataArray could contain any possible data.
    ...
        (similar flexibility for other constructor arguments)
    """
    ...
```
Deliberately generate known edge cases
For instance deliberately create:
dimension coordinates,
names which are Hashable but not strings,
multi-indexes,
weird dtypes,
NaNs,
duckarrays instead of np.ndarray,
inconsistent chunking between different variables,
(any other ideas?)
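As one concrete sketch of the "Hashable but not strings" bullet, a hypothetical names strategy could deliberately mix ordinary strings with other hashables:

```python
import hypothesis.strategies as st

# Hypothetical sketch: dimension/variable names that are Hashable but not
# always strings, to exercise a known xarray edge case.
names = st.one_of(
    st.text(min_size=1, max_size=3),                # ordinary strings
    st.integers(),                                  # hashable, but not str
    st.tuples(st.text(max_size=2), st.integers()),  # hashable tuples
)
```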
Be very modular internally, to help with "keeping things local"
Each sub-strategy should be in its own function, so that hypothesis' decision tree can cut branches off as soon as possible.
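A sketch of that modular layout, with hypothetical helper names (nothing here is an existing xarray function):

```python
import hypothesis.strategies as st

# Hypothetical sketch: each piece of an xarray object gets its own small
# strategy function, so hypothesis's decision tree can reason about (and
# shrink) each branch independently.
def dimension_sizes() -> st.SearchStrategy[dict]:
    return st.dictionaries(
        keys=st.text(min_size=1, max_size=3),
        values=st.integers(min_value=1, max_value=4),  # small sizes by default
        min_size=1,
        max_size=3,
    )

def attrs() -> st.SearchStrategy[dict]:
    return st.dictionaries(st.text(max_size=5), st.text(max_size=5), max_size=2)
```

A composite `dataarrays`-style strategy could then draw from these building blocks rather than generating everything inline.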
Avoid obvious inefficiencies
e.g. no `.filter(...)` or `assume(...)` if we can help it, and if we do need them then keep them in the same function that generates that data. Plus just keep all sizes small by default.
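To illustrate the kind of inefficiency meant here (a generic hypothesis example, not xarray-specific): `.filter(...)` discards draws until one happens to pass, whereas `.map(...)` constructs only valid values:

```python
import hypothesis.strategies as st

# Filtering discards roughly half of all draws before a valid example appears:
even_filtered = st.integers().filter(lambda n: n % 2 == 0)

# Mapping constructs an even number from every draw, so nothing is rejected:
even_mapped = st.integers().map(lambda n: 2 * n)
```

The same principle applies to xarray objects: build consistent dims/coords/data directly rather than generating them independently and rejecting mismatches.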
Perhaps the solutions implemented in #6894 or this hypothesis xarray extension already meet these criteria - I'm not sure. I just wanted a dedicated place to discuss building the strategies specifically, without it getting mixed in with complicated discussions about whatever we're trying to use the strategies for!