DataChunkIterator clarify iteration behavior #623

oruebel · 2021-06-02T21:27:35Z

Description

DataChunkIterator currently assumes that the wrapped iterator yields exactly one element along the iteration dimension at a time. We should clarify this in the documentation:

Add a note to the main docstring of DataChunkIterator that describes the expected iteration behavior.
Add a note to DataChunkIterator.maxshape indicating that, for iterators, this adds a dimension to the chunk generated by the iterator. E.g., if the iterator produces chunks of shape (x, y), the DataChunkIterator will have a maxshape of (None, x, y).
Add a corresponding note to the iterative write tutorial: https://pynwb.readthedocs.io/en/stable/tutorials/general/iterative_write.html#iterative-data-write
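
To make the shape rule concrete, here is a minimal pure-Python sketch of the behavior described above. It does not use hdmf; `frames`, `expected_maxshape`, and `stacked_shape` are hypothetical helpers written only for illustration, and nested lists stand in for arrays.

```python
def frames(num_frames, x, y):
    """Hypothetical data source: yields one (x, y) frame per iteration."""
    for i in range(num_frames):
        yield [[float(i)] * y for _ in range(x)]


def shape_of(obj):
    """Shape of a regular nested list, e.g. [[0, 0, 0], [0, 0, 0]] -> (2, 3)."""
    dims = []
    while isinstance(obj, (list, tuple)):
        dims.append(len(obj))
        if not obj:
            break
        obj = obj[0]
    return tuple(dims)


def expected_maxshape(chunk_shape):
    """Each yielded chunk is one element along a new, unlimited first axis."""
    return (None,) + tuple(chunk_shape)


def stacked_shape(chunks):
    """Final shape once every yielded chunk has been appended along axis 0."""
    count = 0
    chunk_shape = None
    for chunk in chunks:
        current = shape_of(chunk)
        if chunk_shape is None:
            chunk_shape = current
        elif current != chunk_shape:
            raise ValueError("all chunks must have the same shape")
        count += 1
    if chunk_shape is None:
        return (0,)
    return (count,) + chunk_shape


print(expected_maxshape((2, 3)))      # shape rule before iterating
print(stacked_shape(frames(4, 2, 3))) # shape after consuming 4 frames
```

The point of the sketch is only that the iterator's own chunk shape never includes the iteration dimension; that dimension comes from counting how many chunks were yielded.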

Alternative approach:

Enhance DataChunkIterator to allow iterators that return multiple blocks of data along the iteration dimension. This is useful in cases where, e.g., reading single timesteps from a file is very expensive and we want the iterator to return a range of timesteps directly from the file. This would require an additional parameter to control this behavior, e.g., multistep_chunks=True, to indicate that chunks retrieved from the iterator contain multiple parts along the iteration dimension.
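
Until something like the proposed `multistep_chunks` parameter exists (it is only a suggestion in this issue, not an available option), one possible workaround is to read several timesteps at once and then re-yield them one at a time, so the iterator still satisfies the one-element-per-iteration assumption. The sketch below is pure Python; `read_blocks` and `split_blocks` are hypothetical names and the nested lists stand in for data read from a file.

```python
def read_blocks(num_steps, block_size, width):
    """Hypothetical expensive reader: yields blocks of up to block_size timesteps.

    Each block is a list of timesteps; each timestep is a list of `width` values.
    """
    for start in range(0, num_steps, block_size):
        stop = min(start + block_size, num_steps)
        yield [[float(t)] * width for t in range(start, stop)]


def split_blocks(blocks):
    """Re-yield every timestep of every block individually.

    The wrapped source is still read block by block, but the consumer sees
    exactly one element along the iteration dimension per iteration.
    """
    for block in blocks:
        for step in block:
            yield step


steps = list(split_blocks(read_blocks(num_steps=5, block_size=2, width=3)))
print(len(steps))
print(steps[3])
```

This keeps the expensive read granular at the block level while leaving the consumer unchanged; `itertools.chain.from_iterable(blocks)` would do the same flattening.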

This has come up on the NWB Slack channel https://nwb-users.slack.com/archives/C5XKC14L9/p1622668229009900?thread_ts=1622588105.009100&cid=C5XKC14L9

Checklist

Have you ensured the bug was not already reported?
Have you included a brief and descriptive title?
Have you included a clear description of the problem you are trying to solve?
Have you included a minimal code snippet that reproduces the issue you are encountering?
Have you checked our Contributing document?


oruebel added the topic: docs Issues related to documentation label Jun 2, 2021

oruebel self-assigned this Jun 2, 2021

mavaylon1 self-assigned this Dec 6, 2021

oruebel added a commit that referenced this issue Jan 10, 2023

Fix #623 Clarify documentation of DataChunkIterator

7f5038a

This was referenced Jan 11, 2023

Clarify documentation of DataChunkIterator #813

Merged

Update iterative write and parallel I/O tutorial NeurodataWithoutBorders/pynwb#1633

Merged

oruebel closed this as completed in #813 Jan 13, 2023

oruebel added a commit that referenced this issue Jan 13, 2023

Clarify documentation of DataChunkIterator (#813)

ad55477

* Fix #623 Clarify documentation of DataChunkIterator
* Update CHANGELOG.md

Co-authored-by: Ryan Ly <[email protected]>
