@alimanfoo
Member

@alimanfoo alimanfoo commented Apr 26, 2017

This PR reworks the array and group reprs. Resolves #83, resolves #115, resolves #132.

@alimanfoo alimanfoo added this to the v2.2 milestone Apr 26, 2017
@alimanfoo
Member Author

I've simplified the array and group reprs to a minimal output consistent with h5py. To still get diagnostic information on an array when you want it, there is a new info() method on the array class, e.g.:

    >>> import zarr
    >>> g = zarr.group()
    >>> g
    <zarr group '/' (0 arrays, 0 groups)>
    >>> g.create_group('foo/bar')
    <zarr group '/foo/bar' (0 arrays, 0 groups)>
    >>> g
    <zarr group '/' (0 arrays, 1 groups)>
    >>> g.create_dataset('qux', shape=1000, chunks=100, dtype='i4', filters=[zarr.Delta('i4')], compression='gzip')
    <zarr array '/qux': shape (1000,), type '<i4'>
    >>> g
    <zarr group '/' (1 arrays, 1 groups)>
    >>> z = g['qux']
    >>> z
    <zarr array '/qux': shape (1000,), type '<i4'>
    >>> print(z.info())
    Name                   : /qux
    Type                   : zarr.core.Array
    Data type              : int32
    Shape                  : (1000,)
    Chunk shape            : (100,)
    Order                  : C
    Filter [0]             : Delta(dtype='<i4')
    Compressor             : Zlib(level=1)
    Store type             : zarr.storage.DictStore
    No. bytes              : 4000 (3.9K)
    No. bytes stored       : 350
    Storage ratio          : 11.43
    No. chunks initialized : 0/10

    >>> z[:] = 42
    >>> print(z.info())
    Name                   : /qux
    Type                   : zarr.core.Array
    Data type              : int32
    Shape                  : (1000,)
    Chunk shape            : (100,)
    Order                  : C
    Filter [0]             : Delta(dtype='<i4')
    Compressor             : Zlib(level=1)
    Store type             : zarr.storage.DictStore
    No. bytes              : 4000 (3.9K)
    No. bytes stored       : 490
    Storage ratio          : 8.16
    No. chunks initialized : 10/10
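The design described in this comment, a minimal h5py-style repr plus a separate `info()` report, can be sketched in plain Python without zarr. `ToyArray`, `human_readable_size` and `format_info` are hypothetical illustrative names, not zarr's implementation. The sketch reproduces the repr format from the example and the report arithmetic, assuming the storage ratio is no. bytes divided by no. bytes stored (4000 / 350, about 11.43) and that sizes are shown in 1024-based units (4000 bytes is about 3.9K).

```python
# Hypothetical sketch (not zarr's code): a cheap one-line repr plus an
# on-demand info() report for a chunked-array-like object.


def human_readable_size(size):
    """Format a byte count in 1024-based units, e.g. 4000 -> '3.9K'."""
    if size < 2**10:
        return str(size)
    if size < 2**20:
        return '%.1fK' % (size / 2**10)
    if size < 2**30:
        return '%.1fM' % (size / 2**20)
    return '%.1fG' % (size / 2**30)


def format_info(items):
    """Render (key, value) pairs as an aligned 'Key : value' report."""
    width = max(len(key) for key, _ in items)
    return '\n'.join('%s : %s' % (key.ljust(width), value) for key, value in items)


class ToyArray:
    """Stand-in holding only the metadata the report needs (hypothetical)."""

    def __init__(self, path, shape, chunks, dtype, itemsize, nbytes_stored):
        self.path = path
        self.shape = shape
        self.chunks = chunks
        self.dtype = dtype                  # dtype string, e.g. '<i4'
        self.itemsize = itemsize            # bytes per element
        self.nbytes_stored = nbytes_stored  # assumed known here, not measured

    @property
    def nbytes(self):
        # Uncompressed size: number of elements times bytes per element.
        n = 1
        for dim in self.shape:
            n *= dim
        return n * self.itemsize

    def __repr__(self):
        # Minimal, h5py-like repr built only from in-memory metadata.
        return '<zarr array %r: shape %s, type %r>' % (self.path, self.shape, self.dtype)

    def info(self):
        # Detailed diagnostics, only computed when explicitly requested.
        return format_info([
            ('Name', self.path),
            ('Data type', self.dtype),
            ('Shape', self.shape),
            ('Chunk shape', self.chunks),
            ('No. bytes', '%s (%s)' % (self.nbytes, human_readable_size(self.nbytes))),
            ('No. bytes stored', self.nbytes_stored),
            ('Storage ratio', '%.2f' % (self.nbytes / self.nbytes_stored)),
        ])


z = ToyArray('/qux', shape=(1000,), chunks=(100,), dtype='<i4', itemsize=4, nbytes_stored=350)
print(repr(z))
print(z.info())
```

Because `__repr__` only reads attributes already held in memory it stays cheap, and anything that needs byte or chunk accounting lives in `info()`.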

cc @jakirkham, comments welcome.

@alimanfoo alimanfoo changed the title from "WIP repr" to "repr changes" Oct 24, 2017
@alimanfoo
Member Author

I've made some further changes to also simplify the group repr and avoid any potentially expensive computations when generating reprs. New examples:

    >>> import zarr
    >>> root = zarr.group()
    >>> foo = root.create_group('foo')
    >>> bar = foo.zeros('bar', shape=1000000, chunks=100000)
    >>> bar[:] = 42
    >>> root
    <zarr.hierarchy.Group '/'>
    >>> root.info
    Name        : /
    Type        : zarr.hierarchy.Group
    Read-only   : False
    Store type  : zarr.storage.DictStore
    No. members : 1
    No. arrays  : 0
    No. groups  : 1
    Groups      : foo

    >>> foo
    <zarr.hierarchy.Group '/foo'>
    >>> foo.info
    Name        : /foo
    Type        : zarr.hierarchy.Group
    Read-only   : False
    Store type  : zarr.storage.DictStore
    No. members : 1
    No. arrays  : 1
    No. groups  : 0
    Arrays      : bar

    >>> bar
    <zarr.core.Array '/foo/bar' (1000000,) float64>
    >>> bar.info
    Name               : /foo/bar
    Type               : zarr.core.Array
    Data type          : float64
    Shape              : (1000000,)
    Chunk shape        : (100000,)
    Order              : C
    Read-only          : False
    Compressor         : Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)
    Store type         : zarr.storage.DictStore
    No. bytes          : 8000000 (7.6M)
    No. bytes stored   : 38482 (37.6K)
    Storage ratio      : 207.9
    Chunks initialized : 10/10
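In these examples `root.info` and `bar.info` display a report at the prompt without `print()`, while the reprs stay minimal. One way to get that behaviour is to make `info` a property that returns a small reporter object whose `__repr__` does the expensive work. The sketch below assumes a simple dict-backed store; `CountingStore`, `InfoReporter`, `LazyArray` and `info_items` are hypothetical names chosen for illustration, not zarr's code. The store counts reads to show that neither `repr(bar)` nor accessing `bar.info` touches the stored chunks.

```python
# Hypothetical sketch (illustrative names, not zarr's implementation):
# `info` is a property returning a reporter whose __repr__ builds the report,
# so nothing expensive runs until the report is actually displayed.


class CountingStore(dict):
    """Dict-backed store that counts how many stored values are read."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.reads = 0

    def __getitem__(self, key):
        self.reads += 1
        return super().__getitem__(key)


class InfoReporter:
    """Wraps an object; builds its report only when repr() is called."""

    def __init__(self, obj):
        self.obj = obj

    def __repr__(self):
        items = self.obj.info_items()
        width = max(len(key) for key, _ in items)
        return '\n'.join('%s : %s' % (key.ljust(width), value) for key, value in items)


class LazyArray:
    def __init__(self, path, shape, dtype, store):
        self.path = path
        self.shape = shape
        self.dtype = dtype
        self.store = store

    def __repr__(self):
        # Only in-memory metadata: never reads from the store.
        return '<LazyArray %r %s %s>' % (self.path, self.shape, self.dtype)

    @property
    def info(self):
        # Cheap: just wraps self. Displaying the result at the prompt calls
        # InfoReporter.__repr__, which does the real work.
        return InfoReporter(self)

    def info_items(self):
        # Potentially expensive: reads every stored chunk under this path.
        prefix = self.path.lstrip('/') + '/'
        chunk_keys = [key for key in self.store if key.startswith(prefix)]
        nbytes_stored = sum(len(self.store[key]) for key in chunk_keys)
        return [
            ('Name', self.path),
            ('Shape', self.shape),
            ('No. bytes stored', nbytes_stored),
            ('Chunks initialized', len(chunk_keys)),
        ]


store = CountingStore({'foo/bar/0': b'x' * 10, 'foo/bar/1': b'y' * 20})
bar = LazyArray('/foo/bar', (1000000,), 'float64', store)
print(repr(bar))   # no store reads
report = bar.info  # still no store reads
print(report)      # reads both chunks to build the report
```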

@alimanfoo
Member Author

Test coverage should be back up. Merging soon.

@alimanfoo alimanfoo merged commit 92c3f13 into master Oct 24, 2017
@alimanfoo alimanfoo deleted the repr_work branch October 24, 2017 21:03
@jakirkham
Member

Sorry for coming to this late; I've only skimmed the changes. Generally I like the new repr: it's compact, simple, and clean. Adding everything else to info is nice, and I also like that info is much more orderly.

My only comment is whether we want to list groups through info, or whether we should use the tree work to handle that somehow.

@alimanfoo
Member Author

No problem, thanks for the comments. FWIW I think it's worth including a listing of groups and arrays in group info: it's potentially more compact than a tree view, so it provides a convenient overview of first-level members. The tree view is then useful for browsing beyond immediate children.
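To illustrate the trade-off described here, the sketch below contrasts a one-level member listing in the style of group info with a recursive tree view. It assumes a toy hierarchy modelled as nested Python dicts (dicts are groups, any other value stands in for an array); `member_listing` and `tree_lines` are hypothetical helpers, not zarr functions.

```python
# Hypothetical model: nested dicts are groups; any other value stands in
# for an array. This is an illustration, not how zarr stores a hierarchy.


def member_listing(group):
    """Compact, one-level summary of a group's immediate members."""
    arrays = sorted(k for k, v in group.items() if not isinstance(v, dict))
    groups = sorted(k for k, v in group.items() if isinstance(v, dict))
    items = [('No. members', len(group)),
             ('No. arrays', len(arrays)),
             ('No. groups', len(groups))]
    if arrays:
        items.append(('Arrays', ', '.join(arrays)))
    if groups:
        items.append(('Groups', ', '.join(groups)))
    width = max(len(k) for k, _ in items)
    return '\n'.join('%s : %s' % (k.ljust(width), v) for k, v in items)


def tree_lines(group, name='/', depth=0):
    """Recursive tree view, for browsing beyond the immediate children."""
    yield '  ' * depth + name
    for key in sorted(group):
        child = group[key]
        if isinstance(child, dict):
            yield from tree_lines(child, key, depth + 1)
        else:
            yield '  ' * (depth + 1) + key


# Mirrors the earlier example: root contains group 'foo', which contains array 'bar'.
root = {'foo': {'bar': 'array placeholder'}}
print(member_listing(root))
print('\n'.join(tree_lines(root)))
```

The listing stays a fixed handful of lines however deep the hierarchy is, while the tree grows with every nested member.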

    commands =
        python setup.py build_ext --inplace
    -   py27,py34,py35: nosetests -v --with-coverage --cover-erase --cover-package=zarr zarr
    +   py27,py34,py35: nosetests -v zarr
Member


I missed this change the other day. Part of the problem with dropping coverage is that we may miss Python 2-specific branches. I'm suggesting we revert this one line in PR https://github.com/alimanfoo/zarr/pull/169, though it would be good to know the motivation for this change in case there is a better solution.

@alimanfoo alimanfoo added enhancement New features or improvements release notes done Automatically applied to PRs which have release notes. labels Nov 20, 2017

Development

Successfully merging this pull request may close these issues.

Group and Array repr performance
ENH: read_only in repr
Bring back "zarr" into repr

3 participants