Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip mean over empty axis #5207

Merged
merged 2 commits into from
Apr 24, 2021
Merged

Skip mean over empty axis #5207

merged 2 commits into from
Apr 24, 2021

Conversation

dschwoerer
Copy link
Contributor

Avoids changing the datatype if the data does not have the requested
axis.

Avoids changing the datatype if the data does not have the requested
axis.
Copy link
Collaborator

@max-sixty max-sixty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @dschwoerer !

I think the result is right.

It's implemented as a special-case, which isn't ideal. But I'm not sure how to abstract the rule that applies to mean but not to std — something like "does f([x]) == x". I guess it should apply to .sum too.

For other reviewers — here's an example of the std case (which this PR doesn't change, correctly):

In [5]: ds
Out[5]:
<xarray.Dataset>
Dimensions:  (pos: 3, time: 2)
Coordinates:
  * pos      (pos) int64 1 2 3
Dimensions without coordinates: time
Data variables:
    data     (pos, time) float64 1.0 2.0 3.0 4.0 5.0 6.0
    var      (pos) int64 2 3 4

In [6]: ds.std('time')
Out[6]:
<xarray.Dataset>
Dimensions:  (pos: 3)
Coordinates:
  * pos      (pos) int64 1 2 3
Data variables:
    data     (pos) float64 0.5 0.5 0.5
    var      (pos) float64 0.0 0.0 0.0    # this should be 0.0, unlike `.mean`'s result

xarray/tests/test_duck_array_ops.py Outdated Show resolved Hide resolved
xarray/core/duck_array_ops.py Outdated Show resolved Hide resolved
xarray/core/duck_array_ops.py Outdated Show resolved Hide resolved
@@ -85,6 +85,10 @@ Breaking changes
as positional, all others need to be passed are keyword arguments. This is part of the
refactor to support external backends (:issue:`4309`, :pull:`4989`).
By `Alessandro Amici <https://github.com/alexamici>`_.
- :py:func:`mean` does not change the data if axis is None. This
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:py:func:mean is actually a method (there should be other references in the whats-new which can be copied)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't found an example, so I have removed the explicit reference

ds["pos"] = [1, 2, 3]
ds["data"] = ("pos", "time"), [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
ds["var"] = "pos", [2, 3, 4]
ds2 = ds.mean(dim="time")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Convention:

Suggested change
ds2 = ds.mean(dim="time")
result = ds.mean(dim="time")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not store the result but use it directly. Is that ok as well?

ds = Dataset()
ds["pos"] = [1, 2, 3]
ds["data"] = ("pos", "time"), [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
ds["var"] = "pos", [2, 3, 4]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI you can construct the dataset in a single expression, but it's also fine as-is (Dataset(dict(pos=..., data=...)))

* Better testing
* Clarify comment
* Handle other functions as well, like sum, min, max
@dschwoerer
Copy link
Contributor Author

I have now changed so that several wrapped functions preserve the data.

It is more generic, and hopefully still readable. The flag might also be called identity_0d because the function is the identity function for 0d data, while invariant_0d means that any 0d data is invariant under this function.

@max-sixty
Copy link
Collaborator

Excellent! Thanks @dschwoerer .

Any thoughts from others before we merge, particularly on the identity_0d approach?

@max-sixty
Copy link
Collaborator

Merging — as ever, any additional feedback we can address in another PR.

Thank you very much @dschwoerer , we're happy to have you as a contributor!

@max-sixty max-sixty merged commit 24c357f into pydata:master Apr 24, 2021
@keewis keewis mentioned this pull request Jun 10, 2021
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Dataset.mean changes variables without specified dimension
2 participants