-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Warn on repeated dimension names during construction #8491
Warn on repeated dimension names during construction #8491
Conversation
for more information, see https://pre-commit.ci
Interesting - some of the h5netcdf tests apparently attempt to open hdf5 files that have repeated dimension names FAILED xarray/tests/test_backends.py::TestH5NetCDFDataRos3Driver::test_get_variable_list - ValueError: Duplicate dimension names are forbidden, but dimensions {'PRESSURE'} appear more than once in dims=('DATETIME', 'PRESSURE', 'PRESSURE')
FAILED xarray/tests/test_backends.py::TestH5NetCDFDataRos3Driver::test_get_variable_list_empty_driver_kwds - ValueError: Duplicate dimension names are forbidden, but dimensions {'PRESSURE'} appear more than once in dims=('DATETIME', 'PRESSURE', 'PRESSURE') |
thank you for putting this together, @TomNicholas.
if my understanding is accurate, neither h5netcdf nor netCDF appears to enforce restrictions against duplicate dimensions. are we to modify the HDF5 files accordingly or |
I guess depends what the point of those tests was. If it's just to check that we can correctly read the structure of those particular files then we could change to just check that the exact error we now expect is raised. If we wanted to check more than that then we should change the files. Obviously xfail would be the quickest way to move forward however. |
I'm not sure we should raise here since netcdf4 at least allows repeated dimension names (i bet zarr does too) import netCDF4
ds = netCDF4.Dataset("test.nc", mode="w")
ds.createDimension("time", 24)
ds.createVariable("test", int, dimensions=("time", "time"))
ds["test"][:] = 0
ds.close()
!ncdump -h test.nc
One of the takeaways in #2368 and #1378 (comment) was that we want to be able to open all netCDF files even if the resulting Xarray object isn't as useful as it could be. |
this may be a distinct issue, but should we consider updating the repr to indicate the presence of duplicated dimensions? as it stands, the current repr only displays unique dimensions. In [92]: data = np.zeros((2, 3, 3))
In [93]: dims = ['a', 'b', 'b']
In [94]: var = xr.Variable(data=data, dims=dims)
In [95]: var.shape
Out[95]: (2, 3, 3)
In [96]: var.dims
Out[96]: ('a', 'b', 'b')
In [97]: var
Out[97]:
<xarray.Variable (a: 2, b: 3)>
array([[[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]],
[[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]]]) |
Updating repr sounds good to me. |
(Apparently Zarr v3 does allow them, even whilst it recommends against them) We have three choices:
I was thinking we should do (1) now then either (2) or (3) later, but @dcherian's comment
|
(let's move your comment/discussion to an issue) |
A new issue? Or one of the existing ones? |
I would vote to add a warning to the constructor and mostly be done with it... That list of items are all good, and the library would be better off with them. But I would guess that even raising in the right places has a long tail of work, and setting clear expectations of no support is realistic (unless someone wants to champion this, in which case great) If we just say "we don't support this yet, but we will grudgingly load the file. Please rename the dims immediately", then that's quite clear... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect!
(needs whatsnew changing)
Okay, so I've changed the error in the constructor to merely a warning, and I've added an actual error that gets raised by any call to
I looked at this and it's because the repr calls |
…TomNicholas/xarray into forbid_repeated_dimension_names
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks solid to me 👍🏽
FWIW one thing I'd be keen for to do generally — though maybe this isn't the place to start it — is handle warnings in the test suite when we add a new warning — i.e. filter them out where we expect them. In this case, that would be the loading the netCDF files that have duplicate dims. Otherwise warnings become a huge block of text without much salience. I mostly see the 350 lines of them and think "meh mostly units & cftime", but then something breaks on a new upstream release that was buried in there, or we have a supported code path that is raising warnings internally. (I'm not sure whether it's possible to generally enforce that — maybe we could raise on any warnings coming from within xarray? Would be a non-trivial project to get us there though...) |
I agree @max-sixty , so I raised #8494.
I've added |
* raise FutureWarning * change some internal instances of ds.dims -> ds.sizes * improve clarity of which unexpected errors were raised * whatsnew * return a class which warns if treated like a Mapping * fix failing tests * avoid some warnings in the docs * silence warning caused by #8491 * fix another warning * typing of .get * fix various uses of ds.dims in tests * fix some warnings * add test that FutureWarnings are correctly raised * more fixes to avoid warnings * update tests to avoid warnings * yet more fixes to avoid warnings * also warn in groupby.dims * change groupby tests to match * update whatsnew to include groupby deprecation * filter warning when we actually test ds.dims * remove error I used for debugging --------- Co-authored-by: Deepak Cherian <[email protected]>
whats-new.rst
New functions/methods are listed inapi.rst