Update guess-chunks so that it works with zero dimensions. #154

newt0311 · 2017-09-05T01:48:38Z

Just a possible solution. It seemed to work for me but I'm not as familiar with the code-base so some feedback would be appreciated.

…foo/zarr#150.

rabernat · 2017-10-08T04:29:21Z

@newt0311: Do you think this would fix one of the problems we have encountered in pydata/xarray#1528?

Would it enable the following code to work?

za = zarr.create(shape=(), store='tmp_file')
za[...] = 0

Currently this raises a permission error.

newt0311 · 2017-10-09T13:14:38Z

Just checked. Unfortunately no. But I think the changes needed to make this work are fairly small. I can just roll them into my existing branch...

-- PG.

newt0311 · 2017-10-09T13:15:54Z

My changes are for handling cases like arrays with shape (2, 0, 2). Without my changes, the second zero dimension causes problems.
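To make the zero-length case concrete, here is a minimal sketch of why a dimension of length 0 needs care when computing chunk counts. The helper `chunks_per_dim` is hypothetical and only illustrates the arithmetic; it is not zarr's actual implementation.

```python
import math

import numpy as np

# A (2, 0, 2) array is a legal 3-d array that simply holds no elements.
a = np.empty((2, 0, 2))
print(a.ndim, a.size)


def chunks_per_dim(shape, chunks):
    """Number of chunks along each axis, tolerating zero-length axes.

    Hypothetical helper for illustration only.
    """
    counts = []
    for s, c in zip(shape, chunks):
        c = max(c, 1)  # a chunk length of 0 would otherwise divide by zero
        counts.append(math.ceil(s / c))
    return tuple(counts)


print(chunks_per_dim((2, 0, 2), (2, 0, 2)))
```

Clamping the chunk length to at least 1 is one possible choice; the zero-length axis then simply contributes zero chunks.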

rabernat · 2017-10-09T13:36:59Z

zarr/tests/test_creation.py

+    ar = np.ndarray(())
+    ar[()] = 100
+    z = array(ar)
+    assert_array_equal(ar, z[:])


rabernat · 2017-10-09T13:43:04Z

I'm pretty sure this fixes pydata/xarray#1528.

…tores. The problem was the empty file name.

newt0311 · 2017-10-09T13:50:48Z

One more piece of the puzzle. Python dicts don't care if a key is an empty string but unix file-systems do...
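The difference can be reproduced without zarr. In this sketch, `write_key` is a hypothetical stand-in for what a naive directory-backed store might do; it is not the DirectoryStore code.

```python
import os
import tempfile

store = {}
store[""] = b"data"  # a dict accepts the empty string as a key


def write_key(root, key, value):
    """Write value to root/key, as a naive directory-backed store might."""
    with open(os.path.join(root, key), "wb") as f:
        f.write(value)


with tempfile.TemporaryDirectory() as root:
    try:
        # os.path.join(root, "") is just the directory itself
        write_key(root, "", b"data")
        failed = False
    except OSError:  # the exact subclass depends on the platform
        failed = True

print(failed)
```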

-- PG.

newt0311 · 2017-10-09T14:51:48Z

The build failures here look unrelated to my changes. Could somebody trigger a re-build?

Thanks.
-- PG.

rabernat · 2017-10-09T15:13:30Z

I think your build failure on Python 3.6 is caused by flake8 errors:

py36 runtests: commands[3] | flake8 --max-line-length=100 zarr
zarr/tests/test_creation.py:397:1: E302 expected 2 blank lines, found 1
zarr/tests/test_creation.py:403:1: E302 expected 2 blank lines, found 1
zarr/tests/test_creation.py:410:28: E251 unexpected spaces around keyword / parameter equals
zarr/tests/test_creation.py:410:30: E251 unexpected spaces around keyword / parameter equals
zarr/tests/test_creation.py:415:1: E302 expected 2 blank lines, found 1
ERROR: InvocationError: '/home/travis/build/alimanfoo/zarr/.tox/py36/bin/flake8 --max-line-length=100 zarr'

newt0311 · 2017-10-09T15:42:20Z

I think you're right. This should fix it.

rabernat · 2017-10-09T16:09:04Z

I just checked out your branch locally and confirmed it fixes 4 out of 12 failing tests in pydata/xarray#1528!

@alimanfoo: merging this PR would really help us move forward on the xarray side.

alimanfoo · 2017-10-09T22:14:32Z

Nice, this seems to work with minimal changes to existing code.

It would be good to review how the h5py API works with 0d datasets (i.e. scalars) and also with nd datasets with one or more zero-length dims, to be sure there is compatibility between h5py and zarr where possible. The important methods where I've tried to get compatibility are getitem and setitem on the Array class, and create_dataset and require_dataset on the Group class. This PR may already have achieved good compatibility, but I am not familiar with how h5py behaves in these cases, so I would like to get some more info on h5py behaviour. E.g., how do you create a 0d dataset in h5py? What does getitem return from a 0d dataset when given an empty tuple [()], a total slice [:], or an ellipsis [...]? And the same questions for an nd dataset with one or more zero-length dimensions.

It would also be nice to check that Array.resize works fine if there are any zero-length dims, and that sensible exceptions are raised when trying to do nonsense stuff on 0d arrays (e.g. resize).

If it would help the xarray work I'd be happy to merge this PR now and raise issues to review h5py compatibility and other API concerns, then address those a bit later on but before next release.

One other minor point, using "null" to pad out the chunk key for 0d array is a bit ugly. Suggest using "0" instead.
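A minimal sketch of that suggestion, assuming chunk keys are formed by joining the chunk grid coordinates with "."; the function name `chunk_key` is hypothetical and not taken from the zarr code.

```python
def chunk_key(chunk_coords):
    """Build a storage key from chunk grid coordinates.

    Hypothetical helper: a 0-d array has no coordinates, so joining them
    would give an empty key; "0" is used as the placeholder instead.
    """
    if len(chunk_coords) == 0:
        return "0"
    return ".".join(str(c) for c in chunk_coords)


print(chunk_key((1, 0, 2)))
print(chunk_key(()))
```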

rabernat · 2017-10-10T01:00:36Z

The following works in h5py:

import h5py
f = h5py.File("mytestfile.hdf5", "w")
dset = f.create_dataset("mydataset", ())
dset[...] = 99
assert dset[...] == 99
assert dset[()] == 99

dset[:] raises ValueError: Illegal slicing argument for scalar dataspace.
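For comparison, a short sketch of the same three indexing forms on a plain NumPy 0-d array. This only shows NumPy's behaviour and says nothing about what zarr does.

```python
import numpy as np

a = np.array(99)  # 0-d array, shape ()

ellipsis_result = a[...]  # still a 0-d ndarray
tuple_result = a[()]      # a NumPy scalar

try:
    a[:]
    slice_raised = False
except IndexError:
    # NumPy rejects a slice on a 0-d array, but with IndexError
    # rather than the ValueError that h5py raises
    slice_raised = True

print(type(ellipsis_result), type(tuple_result), slice_raised)
```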

rabernat · 2017-10-10T01:06:42Z

The xarray work is not so urgent as to warrant a premature, rushed merge. I agree totally that h5py compatibility should be a high priority. It sounds like a few more tests, including of exceptions, are needed.

If exact compatibility with the h5py API is desired, one option would be to require h5py for the tests and build this into the test suite, i.e. check explicitly that zarr operations and h5py operations give identical results or raise the same exceptions.
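One way such a check could be structured is sketched below. The helpers `outcome` and `same_behaviour` are hypothetical, and NumPy arrays stand in for the zarr and h5py objects so that the example runs without either library.

```python
import numpy as np


def outcome(func, obj):
    """Return ('ok', result) or ('raised', exception type) for func(obj)."""
    try:
        return ("ok", func(obj))
    except Exception as exc:
        return ("raised", type(exc))


def same_behaviour(func, left, right):
    """True if func gives equal results, or raises the same exception type, on both."""
    left_out = outcome(func, left)
    right_out = outcome(func, right)
    if left_out[0] != right_out[0]:
        return False
    if left_out[0] == "raised":
        return left_out[1] is right_out[1]
    return bool(np.array_equal(left_out[1], right_out[1]))


# Stand-ins: in a real test these would be a zarr array and an h5py dataset.
x = np.array(1)
y = np.array(1)
print(same_behaviour(lambda arr: arr[()], x, y))
print(same_behaviour(lambda arr: arr[:], x, y))
```

Comparing exception types rather than messages keeps the check tolerant of wording differences between the two libraries.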

alimanfoo · 2017-10-11T20:46:07Z

Thanks Ryan, some tests would be good. Comparing against h5py in the tests is a nice idea, although this isn't done anywhere else in the zarr test suite at the moment, so I'd be happy for now with some simple direct tests along the lines of the existing unit tests for Array getitem and setitem and Group create_dataset.

One other thing that may be worth reviewing is exactly what happens to scalar (0d) arrays in terms of encoding and storage. I think under the current code in this PR a scalar value would still get passed through the compressor and any filters before storage. I'm actually surprised that works, but the main point is that applying a compressor or filters to a scalar is pointless. It may be better to special-case scalars and not allow any compressor or filters, and see if any shortcuts are possible given no compressor or filters. It may also be worth mentioning some of this in the storage spec so it's explicit how scalars should be implemented.
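A quick illustration of why compressing a single value buys nothing, using the standard library's zlib as a stand-in compressor (an assumption for the sake of the example, not a statement about which compressor zarr would apply here).

```python
import struct
import zlib

raw = struct.pack("<d", 100.0)   # one float64 scalar: 8 bytes
compressed = zlib.compress(raw)  # the zlib wrapper alone adds a header and a checksum

print(len(raw), len(compressed))
```

The compressed form comes out larger than the 8 raw bytes, because the fixed container overhead outweighs anything saved on so little data.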


alimanfoo · 2017-10-11T20:48:14Z

Btw happy to work on this (tests, spec) myself when back from leave if no one has time.

will133 · 2017-10-24T15:43:30Z

Just want to add my two cents.

I used to be confused about this since different libraries (like numpy, h5py and pytables) support 0-d array differently. Please note that h5py and pytables don't really support it well due to hdf5 limitations so I would not use them as the final design if you are aiming for the correct behavior. I do like the numpy model as I think it's somewhat consistent at least. In numpy:

>>>  a = numpy.arange(30)
>>> type(a[0])
numpy.int64
# Note that this is really not a 0-d array, it's a scalar type instead where there is no shape
>>> a[0].shape
()
# Note that the previous scalar value is different from a zero-length array:
>>> a[0:0]
array([], dtype=int64)
# where the shape has length 1 (the content length is 0, so the shape is preserved)
>>> a[0:0].shape
(0,)
# the type is an n-d array
>>> print type(a[0:0])
<type 'numpy.ndarray'>

# Here is an nd-array with 1 element, which is different from the numpy scalar
>>> print a[1:2]
[1]
# Of course then it has a shape:
>>> print a[1:2].shape
(1,)
>>> print type(a[1:2])
<type 'numpy.ndarray'>

# You can actually reshape this to arbitrary dimension and it works fine:
>>> print a[1:2].reshape((1, 1, 1, 1))
[[[[1]]]]
>>> a[1:2].reshape((1, 1, 1, 1)).shape
(1, 1, 1, 1)

I think at the end of the day the types are different (scalar vs nd-array), and the round trip from numpy to zarr (or pytables/h5py for that matter) should be seamless. Last I read about this, I think HDF5 has support for empty datasets but you cannot store the shape. The end result is that you end up writing metadata yourself so you can preserve the original shape, like (10, 0, 2).
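As a reference point for what a seamless round trip looks like, here is a small sketch using NumPy's own .npy format through an in-memory buffer. The helper `roundtrip` is hypothetical and the example says nothing about zarr's or HDF5's storage.

```python
import io

import numpy as np


def roundtrip(arr):
    """Save an array to an in-memory .npy buffer and load it back."""
    buf = io.BytesIO()
    np.save(buf, arr)
    buf.seek(0)
    return np.load(buf)


empty = np.empty((10, 0, 2))  # zero-length dimension, no elements
zero_d = np.array(5)          # 0-d array
scalar = np.arange(30)[0]     # NumPy scalar, not an ndarray

print(roundtrip(empty).shape)
print(roundtrip(zero_d).shape)
print(isinstance(zero_d, np.ndarray), isinstance(scalar, np.ndarray))
```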

alimanfoo · 2017-10-24T19:02:09Z

Thanks Will, very timely, I'm just looking at this.


alimanfoo · 2017-10-24T21:38:05Z

I've created a separate PR #160 to focus on zero length dimensions. It has some fixes from this PR manually merged, plus some extra tests.

alimanfoo · 2017-10-24T23:53:57Z

I've created another PR #161 to focus on zero-dimensional arrays, building on work here. Comments welcome there.

newt0311 added 3 commits September 5, 2017 01:46

Update guess-chunks so that it works with zero dimensions. See aliman…

45b5a0c

…foo/zarr#150.

Fix util.py to handle zero-length axis.

96f2558

Fix flake8 issue.

177b4fd

Added in changes to handle the no-dimension case as well.

9cde86e

rabernat reviewed Oct 9, 2017

zarr/tests/test_creation.py

ar = np.ndarray(())
ar[()] = 100
z = array(ar)
assert_array_equal(ar, z[:])

rabernat Oct 9, 2017

Perfect!

Added in changes to make no-dims arrays work well with the DirectoryS…

0c258db

…tores. The problem was the empty file name.

newt0311 closed this Oct 9, 2017

newt0311 reopened this Oct 9, 2017

Fixed flake8 issues.

9153a3f

rabernat mentioned this pull request Oct 9, 2017

WIP: Zarr backend pydata/xarray#1528

Merged

4 tasks

Use "0" instead of "null" for the chunk key for 0-d arrays.

9b96ad3

alimanfoo mentioned this pull request Oct 24, 2017

Zero length dimensions #160

Merged

alimanfoo mentioned this pull request Oct 24, 2017

0-dimensional arrays #161

Merged

alimanfoo closed this Oct 24, 2017

alimanfoo added this to the v2.2 milestone Nov 20, 2017

alimanfoo added the "enhancement" and "release notes done" labels Nov 20, 2017
