WIP: Use kenjutsu #116

jakirkham · 2017-02-20T00:21:35Z

Fixes https://github.com/alimanfoo/zarr/issues/93
Fixes https://github.com/alimanfoo/zarr/issues/78

Use kenjutsu to handle slice normalization and shape determination.

Also extended this further to handle a sequence of indices in the selection, though this may require some discussion about whether it is OK.

jakirkham · 2017-02-20T01:56:20Z

zarr/util.py

-    if isinstance(item, tuple):
+    rf_item = reformat_slices(item)
+    if Ellipsis not in rf_item:
+        rf_item += (Ellipsis,)


This shouldn't be necessary, but is really a bug on my part where I'm being overly strict. Will fix in kenjutsu with PR ( jakirkham/kenjutsu#59 ). In the interim, this shouldn't cause any issues and should be fine even when we fix it.
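To illustrate why the interim workaround should be harmless, here is a minimal sketch using plain NumPy indexing rather than kenjutsu. The helper `with_trailing_ellipsis` is hypothetical (it is not part of zarr or kenjutsu) and only mirrors the idea in the diff above: appending an `Ellipsis` to a selection that lacks one does not change what is selected.

```python
import numpy as np


def with_trailing_ellipsis(item):
    """Append Ellipsis to a selection unless one is already present.

    Hypothetical helper for illustration only.
    """
    if not isinstance(item, tuple):
        item = (item,)
    # Identity check, so this also works if the tuple holds arrays.
    if not any(i is Ellipsis for i in item):
        item += (Ellipsis,)
    return item


a = np.arange(24).reshape(2, 3, 4)

selections = [
    (slice(0, 1),),
    (slice(None), slice(1, 3)),
    (Ellipsis, slice(0, 2)),  # already has an Ellipsis, left unchanged
]
for sel in selections:
    # The trailing Ellipsis expands to "all remaining dimensions",
    # which is what NumPy assumes anyway when dimensions are omitted.
    assert np.array_equal(a[sel], a[with_trailing_ellipsis(sel)])
```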

Solved in kenjutsu 0.4.1.

jakirkham · 2017-02-20T01:57:00Z

Appears there is not enough coverage, but only in the Python 3.6 case. 😕 Not really sure what to do about that though or whether that is something for me to improve or whether it lies outside these changes.

Edit: More info would be very helpful. ( https://github.com/alimanfoo/zarr/issues/117 )

alimanfoo · 2017-02-20T17:10:18Z

Hi @jakirkham, thank you for this.

Would you mind breaking out just the changes that address #93 into a separate PR? That would help a lot.

Regarding fancy indexing, if I've understood this correctly, I think there may be performance issues. In the worst case, chunks could get decompressed many times over. Also, again in the worst case, if there are lots of indices, then there could be a lot of Python looping. I think we need a solution that ensures each chunk of the array is decompressed at most once, and also which minimises looping. I have some ideas, happy to discuss further.
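The "each chunk decompressed at most once" requirement can be sketched for the 1-D case as follows. This is an illustration under stated assumptions, not Zarr's implementation: a list of NumPy arrays stands in for compressed chunks, indices are assumed non-negative and in bounds, and `group_indices_by_chunk` is a hypothetical helper name.

```python
import numpy as np


def group_indices_by_chunk(indices, chunk_len):
    """Map chunk number -> (positions in the output, offsets within the chunk)."""
    indices = np.asarray(indices)
    chunk_ids = indices // chunk_len
    groups = {}
    for cid in np.unique(chunk_ids):
        out_pos = np.nonzero(chunk_ids == cid)[0]
        groups[int(cid)] = (out_pos, indices[out_pos] - cid * chunk_len)
    return groups


# Toy 1-D "chunked array": each list entry stands in for one compressed chunk.
data = np.arange(10) * 10
chunk_len = 4
chunks = [data[i:i + chunk_len] for i in range(0, len(data), chunk_len)]

indices = [9, 0, 5, 1, 8]
out = np.empty(len(indices), dtype=data.dtype)
loads = 0
for cid, (out_pos, within) in group_indices_by_chunk(indices, chunk_len).items():
    chunk = chunks[cid]  # stands in for a single decompression of this chunk
    loads += 1
    out[out_pos] = chunk[within]

assert np.array_equal(out, data[indices])
```

Here the Python-level loop runs once per touched chunk (three times) rather than once per index (five times), and each chunk is read only once.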

jakirkham · 2017-02-20T17:23:51Z

> Would you mind breaking out just the changes that address #93 into a separate PR? That would help a lot.

So just move the multiple indices into a separate PR? No problem.

> Regarding fancy indexing, if I've understood this correctly, I think there may be performance issues.

Indeed. Was interested first to see if we could get it to work. Plus it would provide some demonstration as to how one might use this functionality to handle these slices. Figured optimization would require some discussion or passing of the baton. Not sure I know enough about Zarr's internals to do this alone.

jakirkham · 2017-02-20T18:05:47Z

I'll probably borrow from this PR and tweak it as it makes for a nice template. However, I have broken out the slice normalization fixes into PR ( https://github.com/alimanfoo/zarr/pull/119 ).

alimanfoo · 2017-02-20T23:44:10Z

Thanks for breaking out #119.

For fancy indexing, the critical section of code is within Array.__getitem__, particularly the loop over chunk indices. This is the crux of Zarr: it's the only bit of code that has any real complexity, and it took me some time to get right even for the simple case currently supported, slicing out a contiguous region. Before entering this loop, the get_chunk_range function figures out which chunks overlap the selected region. Then the loop iterates over only the chunks in range (an important optimisation, since iterating over all chunks would be very slow). For each chunk, the code within the loop then figures out which part of the chunk needs to be loaded, and where that should end up in the output array.
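The contiguous-slice loop described above can be sketched in one dimension. This is a simplified illustration, not the actual Zarr code: `chunk_range_1d` and `get_slice` are hypothetical names only loosely modelled on the description of get_chunk_range, chunks are plain NumPy arrays, and the selection is assumed to satisfy 0 <= start <= stop <= array length.

```python
import numpy as np


def chunk_range_1d(start, stop, chunk_len):
    """Half-open range of chunk numbers overlapping [start, stop)."""
    first = start // chunk_len      # floor division
    last = -(-stop // chunk_len)    # ceiling division
    return first, last


def get_slice(chunks, chunk_len, start, stop):
    """Assemble array[start:stop] from only the chunks that overlap it."""
    out = np.empty(stop - start, dtype=chunks[0].dtype)
    first, last = chunk_range_1d(start, stop, chunk_len)
    for cid in range(first, last):
        offset = cid * chunk_len                # where this chunk starts in the whole array
        lo = max(start, offset)                 # overlap, in whole-array coordinates
        hi = min(stop, offset + chunk_len)
        chunk = chunks[cid]                     # stands in for loading/decompressing
        # Part of the chunk to read, and where it lands in the output.
        out[lo - start:hi - start] = chunk[lo - offset:hi - offset]
    return out


data = np.arange(10)
chunk_len = 4
chunks = [data[i:i + chunk_len] for i in range(0, len(data), chunk_len)]
assert np.array_equal(get_slice(chunks, chunk_len, 3, 9), data[3:9])
```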

For fancy indexing I think it is potentially simpler to handle the fancy part of the index item as a boolean array, rather than a sequence of integer indices. The code within the loop could then be modified to slice this boolean array, to obtain a sub-array of the boolean array which would be appropriate for indexing just that chunk. I.e., the boolean array would get divided up into pieces, with one piece for each chunk along the dimension the boolean array is being applied to. Figuring out the pieces is easy because they are equal sized, determined by the size of the chunks along that dimension. Within the loop there is already the current offset for each dimension, which tells you where you are in the context of the whole array. So this offset and the chunk shape is all that is needed to slice the boolean indexing array for the current chunk. Then the whole chunk can be loaded into a numpy array and the selection applied via numpy indexing. The trick is then to figure out where the data extracted from the current chunk should end up in the output array. I think probably we'd have to keep track of the offsets into the output array too, which means counting how many True values from the boolean array have been handled so far. All of this could I think be done with an integer index array instead of a boolean array, but would be a bit harder as it would require searching the array to find the set of indices that apply to the current chunk, then mapping those indices into the chunk's coordinate system, which I think will be more complex to code and slower.
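A minimal 1-D sketch of the boolean-array idea, under the same toy assumptions (chunks are plain NumPy arrays, the mask has the same length as the array, and `get_masked` is a hypothetical name, not a Zarr function). It slices the mask per chunk using the chunk offset and keeps a running count of True values to know where each piece lands in the output.

```python
import numpy as np


def get_masked(chunks, chunk_len, mask):
    """Select array[mask] chunk by chunk, touching each chunk at most once."""
    mask = np.asarray(mask, dtype=bool)
    out = np.empty(int(mask.sum()), dtype=chunks[0].dtype)
    out_offset = 0  # number of True values handled so far
    for cid in range(len(chunks)):
        offset = cid * chunk_len
        # Piece of the mask that applies to this chunk; the last piece
        # is shorter if the final chunk is only partially filled.
        chunk_mask = mask[offset:offset + chunk_len]
        n = int(chunk_mask.sum())
        if n == 0:
            continue  # nothing selected here, so the chunk is never loaded
        chunk = chunks[cid]  # stands in for loading/decompressing
        out[out_offset:out_offset + n] = chunk[chunk_mask]
        out_offset += n
    return out


data = np.arange(10) * 10
chunk_len = 4
chunks = [data[i:i + chunk_len] for i in range(0, len(data), chunk_len)]
mask = np.zeros(len(data), dtype=bool)
mask[[0, 2, 9]] = True
assert np.array_equal(get_masked(chunks, chunk_len, mask), data[mask])
```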

There may also need to be some modification to the get_chunk_range function to handle a Boolean array, which in the first instance could be dumb and just return all chunks for the indexed dimension as overlapping the selection, but potentially could be optimised so only the chunks where there is some data to extract are returned as in-range.
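The optimised variant, returning only chunks that contain some selected data, might look like the sketch below for a single dimension. `chunks_with_selection` is a hypothetical name and this is not the get_chunk_range signature; the "dumb" first version would simply return every chunk number along the dimension.

```python
import numpy as np


def chunks_with_selection(mask, chunk_len):
    """Return the chunk numbers that contain at least one True in `mask`."""
    mask = np.asarray(mask, dtype=bool)
    n_chunks = -(-len(mask) // chunk_len)  # ceiling division
    # Pad with False up to a whole number of chunks so the mask can be
    # viewed as one row per chunk.
    padded = np.zeros(n_chunks * chunk_len, dtype=bool)
    padded[:len(mask)] = mask
    per_chunk = padded.reshape(n_chunks, chunk_len).any(axis=1)
    return np.nonzero(per_chunk)[0]


mask = np.zeros(10, dtype=bool)
mask[[1, 9]] = True
in_range = chunks_with_selection(mask, chunk_len=4)
assert in_range.tolist() == [0, 2]  # the middle chunk is skipped entirely
```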

This is probably all making very little sense, but thought I'd share ideas. This is the basic route I was planning to explore at some point, although have some other priorities I have to work through at the moment.

samiur · 2017-05-22T23:16:04Z

@alimanfoo any update on this? Would love to use this to update an Array using an array of indices.

alimanfoo · 2017-05-24T15:44:50Z

Hi @samiur, apologies, I'm snowed under with other work at the moment. TBH I think support for fancy indexing to update the contents of a zarr array may be some way off. I need to think through how to implement limited support for fancy indexing in __getitem__ first, which is not entirely trivial.

samiur · 2017-07-18T19:56:25Z

@alimanfoo no worries! Thanks for the help on this.

alimanfoo · 2017-11-10T09:44:17Z

Closing as indexing code is substantially changed via #172. Happy to revisit using kenjutsu in future if it looks like there's a lot of common code.


This was referenced Feb 20, 2017

Advanced indexing #78

Closed

BUG: Parsing ellipsis in indexing #93

Closed

jakirkham added 9 commits February 20, 2017 19:33

Require kenjutsu.

e8c11fc

Handle selection normalization with kenjutsu.

d26b3bc

Handle slice shape comparison with kenjutsu.

74978c7

Fix length determination for selections.

4c2e68c

Test handling of Ellipsis with indices.

9f937cf

Extend slice shape comparison for index sequences.

bbc90a9

Handle one sequence of indices in the selection.

774affd

Handle setting with one sequence of indices.

248dd6a

Test handling of multiple indices.

ed243fe

alimanfoo mentioned this pull request Apr 6, 2017

Release 2.2 #144

Closed

alimanfoo mentioned this pull request Oct 25, 2017

Ellipsis handling #168

Closed

alimanfoo closed this Nov 10, 2017

jakirkham deleted the use_kenjutsu branch December 2, 2018 02:00
