@newt0311

@newt0311 newt0311 commented Sep 5, 2017

Just a possible solution. It seemed to work for me but I'm not as familiar with the code-base so some feedback would be appreciated.

@rabernat
Contributor

rabernat commented Oct 8, 2017

@newt0311: Do you think this would fix one of the problems we have encountered in pydata/xarray#1528?

Would it enable the following code to work?

za = zarr.create(shape=(), store='tmp_file')
za[...] = 0

Currently this raises a permission error.

@newt0311
Author

newt0311 commented Oct 9, 2017

Just checked. Unfortunately no. But I think the changes needed to make this work are fairly small. I can just roll them into my existing branch...

-- PG.

@newt0311
Author

newt0311 commented Oct 9, 2017

My changes are for handling cases like arrays with shape (2, 0, 2). Without my changes, the zero-length second dimension causes problems.

ar = np.ndarray(())
ar[()] = 100
z = array(ar)
assert_array_equal(ar, z[:])
Contributor

Perfect!

@rabernat
Contributor

rabernat commented Oct 9, 2017

I'm pretty sure this fixes pydata/xarray#1528.

@newt0311
Author

newt0311 commented Oct 9, 2017

One more piece of the puzzle. Python dicts don't care if a key is an empty string but unix file-systems do...
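
To illustrate the point, here is a minimal sketch using only the standard library. It assumes chunk keys are built by joining the chunk indices with dots, so a 0-d array yields an empty key; `chunk_key` is a hypothetical helper written for this example, not zarr's actual function.

```python
import os
import tempfile


def chunk_key(chunk_coords):
    # Assumed key scheme: chunk indices joined by dots, e.g. (1, 2) -> "1.2".
    return ".".join(str(i) for i in chunk_coords)


key = chunk_key(())  # a 0-d array has no chunk indices, so the key is ""

# A dict-backed store accepts the empty string as a key without complaint.
dict_store = {}
dict_store[key] = b"chunk bytes"

# A directory-backed store cannot: joining the root with "" gives the
# directory itself, which cannot be opened for writing as a file.
with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, key)
    try:
        with open(path, "wb") as f:
            f.write(b"chunk bytes")
        fs_error = None
    except OSError as exc:  # exact subclass varies by platform
        fs_error = exc

print(repr(key), type(fs_error).__name__)
```

The exact exception subclass depends on the operating system, which is why the sketch catches the general `OSError`.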

-- PG.

@newt0311
Author

newt0311 commented Oct 9, 2017

The build failures here look unrelated to my changes. Could somebody trigger a re-build?

Thanks.
-- PG.

@newt0311 newt0311 closed this Oct 9, 2017
@newt0311 newt0311 reopened this Oct 9, 2017
@rabernat
Contributor

rabernat commented Oct 9, 2017

I think your build failure in python3.6 is related to a flake8 issue:

py36 runtests: commands[3] | flake8 --max-line-length=100 zarr
zarr/tests/test_creation.py:397:1: E302 expected 2 blank lines, found 1
zarr/tests/test_creation.py:403:1: E302 expected 2 blank lines, found 1
zarr/tests/test_creation.py:410:28: E251 unexpected spaces around keyword / parameter equals
zarr/tests/test_creation.py:410:30: E251 unexpected spaces around keyword / parameter equals
zarr/tests/test_creation.py:415:1: E302 expected 2 blank lines, found 1
ERROR: InvocationError: '/home/travis/build/alimanfoo/zarr/.tox/py36/bin/flake8 --max-line-length=100 zarr'
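
For reference, a small sketch of code that would satisfy the two checks reported above: E302 wants two blank lines before each top-level definition, and E251 wants no spaces around `=` in keyword arguments and defaults. The function names are hypothetical and not taken from zarr's test suite.

```python
def make_shape(ndim, length=0):
    # E251: write "length=0", not "length = 0".
    return (length,) * ndim


def test_make_shape():
    # E302: two blank lines separate this function from the one above.
    assert make_shape(2, length=0) == (0, 0)


test_make_shape()
```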

@newt0311
Author

newt0311 commented Oct 9, 2017

I think you're right. This should fix it.

@rabernat
Contributor

rabernat commented Oct 9, 2017

I just checked out your branch locally and confirmed it fixes 4 out of 12 failing tests in pydata/xarray#1528!

@alimanfoo: merging this PR would really help us move forward on the xarray side.

@rabernat rabernat mentioned this pull request Oct 9, 2017
@alimanfoo
Member

Nice, this seems to work with minimal changes to existing code.

It would be good to review how the h5py API works with 0d datasets (i.e. scalars) and also with nd datasets with one or more zero length dims, to be sure there is compatibility between h5py and zarr where possible. The important methods where I've tried to get compatibility are getitem and setitem on the Array class, and create_dataset and require_dataset on the Group class. This PR may already have achieved good compatibility, but I am not familiar with how h5py behaves in these cases, so I would like to get some more info on h5py behaviour. E.g., how do you create a 0d dataset in h5py? What does getitem return from a 0d dataset when given an empty tuple [()], a total slice [:], or an ellipsis [...]? The same questions apply to an nd dataset with one or more zero length dimensions.

It would also be nice to check that Array.resize works fine if there are any 0 length dims, and that sensible exceptions are raised when trying to do nonsense stuff on 0d arrays (e.g. resize).

If it would help the xarray work I'd be happy to merge this PR now and raise issues to review h5py compatibility and other API concerns, then address those a bit later on but before next release.

One other minor point, using "null" to pad out the chunk key for 0d array is a bit ugly. Suggest using "0" instead.
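
A minimal sketch of what that suggestion could look like, assuming chunk keys are the chunk indices joined by dots. `padded_chunk_key` is a hypothetical name used only for illustration; it is not the function in this PR.

```python
def padded_chunk_key(chunk_coords):
    """Build a storage key for a chunk (hypothetical helper)."""
    if not chunk_coords:
        # A 0-d array has no indices; pad with "0" so the key is never empty.
        return "0"
    return ".".join(str(i) for i in chunk_coords)


print(padded_chunk_key(()))      # 0
print(padded_chunk_key((1, 2)))  # 1.2
```

With this padding the single chunk of a 0-d array gets a key that is valid both as a dict key and as a file name.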

@rabernat
Contributor

The following works in h5py:

import h5py
f = h5py.File("mytestfile.hdf5", "w")
dset = f.create_dataset("mydataset", ())
dset[...] = 99
assert dset[...] == 99
assert dset[()] == 99

dset[:] raises ValueError: Illegal slicing argument for scalar dataspace.
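
For comparison, NumPy's 0-d arrays accept the same two indexing forms and also reject a slice, although the exception type differs (IndexError in NumPy versus the ValueError reported for h5py above). This is a sketch assuming NumPy is installed; the variable names are illustrative only.

```python
import numpy as np

scalar_array = np.array(99)  # 0-d array, shape ()

value = scalar_array[()]     # empty tuple: returns a NumPy scalar
view = scalar_array[...]     # ellipsis: returns a 0-d ndarray

try:
    scalar_array[:]          # a slice needs at least one dimension
    slice_error = None
except IndexError as exc:
    slice_error = exc

print(value, view.shape, type(slice_error).__name__)
```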

@rabernat
Contributor

The xarray work is not so urgent as to warrant a premature, rushed merge. I agree totally that h5py compatibility should be a high priority. It sounds like a few more tests, including of exceptions, are needed.

If exact compatibility with the h5py api is desired, one option would be to actually require h5py for the tests and build this into the test suite...i.e. check explicitly that zarr operations and h5py operations give identical results / raise the same exceptions.
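
One possible shape for such a check, sketched with the standard library only. `outcome` and `assert_same_outcome` are hypothetical helpers, and the list and tuple below are stand-ins for an h5py dataset and a zarr array; a real test would wrap the two libraries' calls instead.

```python
def outcome(func, *args):
    """Run func and describe what happened: a result or an exception type."""
    try:
        return ("ok", func(*args))
    except Exception as exc:
        return ("error", type(exc).__name__)


def assert_same_outcome(reference, candidate, *args):
    """Check that both callables return equal values or raise the same error."""
    expected = outcome(reference, *args)
    actual = outcome(candidate, *args)
    assert expected == actual, f"{expected!r} != {actual!r}"
    return actual


# Stand-ins for the two libraries under comparison.
def reference_get(index):
    return [10, 20, 30][index]


def candidate_get(index):
    return (10, 20, 30)[index]


print(assert_same_outcome(reference_get, candidate_get, 1))  # ('ok', 20)
print(assert_same_outcome(reference_get, candidate_get, 5))  # ('error', 'IndexError')
```

Results that are arrays would need an array-aware comparison rather than plain `==`.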

@alimanfoo
Member

alimanfoo commented Oct 11, 2017 via email

@alimanfoo
Member

alimanfoo commented Oct 11, 2017 via email

@will133

will133 commented Oct 24, 2017

Just want to add my two cents.

I used to be confused about this since different libraries (like numpy, h5py and pytables) support 0-d arrays differently. Please note that h5py and pytables don't really support them well due to hdf5 limitations, so I would not use them as the reference design if you are aiming for the correct behavior. I do like the numpy model as I think it's at least somewhat consistent. In numpy:

>>> import numpy
>>> a = numpy.arange(30)
>>> type(a[0])
numpy.int64
# Note that this is not really a 0-d array; it's a scalar type whose shape is empty
>>> a[0].shape
()
# Note that the previous scalar value is different from an empty 1-d array:
>>> a[0:0]
array([], dtype=int64)
# where the shape has length 1 (the content has length 0, but the shape is preserved)
>>> a[0:0].shape
(0,)
# the type is an n-d array
>>> print type(a[0:0])
<type 'numpy.ndarray'>

# Here is an nd-array with 1 element, which is different from the numpy scalar
>>> print a[1:2]
[1]
# Of course then it has a shape:
>>> print a[1:2].shape
(1,)
>>> print type(a[1:2])
<type 'numpy.ndarray'>

# You can actually reshape this to arbitrary dimension and it works fine:
>>> print a[1:2].reshape((1, 1, 1, 1))
[[[[1]]]]
>>> a[1:2].reshape((1, 1, 1, 1)).shape
(1, 1, 1, 1)

I think at the end of the day the types are different (scalar vs nd-array) and the round trip from numpy to zarr (or pytables/h5py for that matter) should be seamless. Last I read about this, HDF5 supports empty datasets but cannot store their shape. The end result is that you end up writing the metadata yourself to preserve the original shape, such as (10, 0, 2).
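
To make the three cases in this comment concrete, here is a short Python 3 sketch, assuming NumPy is installed. It contrasts a NumPy scalar, a 0-d array, and an array with a zero-length dimension; the variable names are illustrative only.

```python
import numpy as np

scalar = np.arange(30)[0]     # NumPy scalar, not an ndarray
zero_d = np.array(100)        # 0-d array: shape (), exactly one element
empty = np.zeros((10, 0, 2))  # zero-length dimension: shape kept, no elements

for name, obj in [("scalar", scalar), ("zero_d", zero_d), ("empty", empty)]:
    print(name, isinstance(obj, np.ndarray), obj.shape, obj.size)
# scalar False () 1
# zero_d True () 1
# empty True (10, 0, 2) 0
```

The scalar and the 0-d array share the shape `()`, so the type is what tells them apart, while the empty array keeps its full shape despite holding no data.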

@alimanfoo
Member

alimanfoo commented Oct 24, 2017 via email

@alimanfoo alimanfoo mentioned this pull request Oct 24, 2017
@alimanfoo
Member

I've created a separate PR #160 to focus on zero length dimensions. It has some fixes from this PR manually merged, plus some extra tests.

@alimanfoo alimanfoo mentioned this pull request Oct 24, 2017
@alimanfoo
Member

I've created another PR #161 to focus on zero-dimensional arrays, building on work here. Comments welcome there.

@alimanfoo alimanfoo closed this Oct 24, 2017
@alimanfoo alimanfoo added this to the v2.2 milestone Nov 20, 2017
@alimanfoo alimanfoo added the enhancement and release notes done labels Nov 20, 2017