Commit ad55477

oruebel and rly authored
Clarify documentation of DataChunkIterator (#813)
* Fix #623 Clarify documentation of DataChunkIterator
* Update CHANGELOG.md

Co-authored-by: Ryan Ly <[email protected]>
1 parent 95f1965 commit ad55477

2 files changed: +29 -7 lines changed


CHANGELOG.md

+1
@@ -13,6 +13,7 @@
 - Changed the name of `ExternalResources.export_to_sqlite` to `ExternalResources.to_sqlite`. @mavaylon [#799](https://github.com/hdmf-dev/hdmf/pull/799)
 - Updated the tutorial for `ExternalResources`. @mavaylon [#799](https://github.com/hdmf-dev/hdmf/pull/799)
 - Added `message` argument for assert methods defined by `hdmf.testing.TestCase` to allow developers to include custom error messages with asserts. @oruebel [#812](https://github.com/hdmf-dev/hdmf/pull/812)
+- Clarify the expected chunk shape behavior for `DataChunkIterator`. @oruebel [#813](https://github.com/hdmf-dev/hdmf/pull/813)
 
 ## HDMF 3.4.7 (November 9, 2022)

src/hdmf/data_utils.py

+28 -7
@@ -426,6 +426,16 @@ class DataChunkIterator(AbstractDataChunkIterator):
     i.e., multiple values from the input iterator can be combined to a single chunk. This is
     useful for buffered I/O operations, e.g., to improve performance by accumulating data
     in memory and writing larger blocks at once.
+
+    .. note::
+
+        DataChunkIterator assumes that the iterator that it wraps returns one element along the
+        iteration dimension at a time. I.e., the iterator is expected to return chunks that are
+        one dimension lower than the array itself. For example, when iterating over the first dimension
+        of a dataset with shape (1000, 10, 10), then the iterator would return 1000 chunks of
+        shape (10, 10) one-chunk-at-a-time. If this pattern does not match your use-case then
+        using :py:class:`~hdmf.data_utils.GenericDataChunkIterator` or
+        :py:class:`~hdmf.data_utils.AbstractDataChunkIterator` may be more appropriate.
     """
 
     __docval_init = (
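
The pattern described in the added note can be sketched with plain NumPy, without importing hdmf. The names `iter_planes` and `buffer_chunks` are hypothetical and exist only for this illustration; `buffer_chunks` is a simplified stand-in for the buffering mentioned in the docstring, not the actual `DataChunkIterator` implementation.

```python
import numpy as np


def iter_planes(array):
    """Yield one element along the first (iteration) dimension at a time."""
    for plane in array:
        yield plane


def buffer_chunks(iterator, buffer_size):
    """Hypothetical stand-in for buffering: stack up to buffer_size elements per chunk."""
    buffer = []
    for element in iterator:
        buffer.append(element)
        if len(buffer) == buffer_size:
            # Stacking adds the iteration dimension back in front of the element shape.
            yield np.stack(buffer)
            buffer = []
    if buffer:
        # Emit the remaining elements as a final, smaller chunk.
        yield np.stack(buffer)


data = np.zeros((1000, 10, 10))

# The wrapped iterator returns 1000 elements of shape (10, 10), one at a time.
planes = list(iter_planes(data))

# Buffering combines several elements into chunks of shape (n, 10, 10).
chunks = list(buffer_chunks(iter_planes(data), buffer_size=64))
```

Here each element is one dimension lower than the full array, and buffering restores the iteration dimension. An iterator that already yields blocks such as (64, 10, 10) does not match this pattern, which is the case where the note points to `GenericDataChunkIterator` or `AbstractDataChunkIterator`.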
@@ -585,10 +595,13 @@ def _read_next_chunk(self):
         return self.__next_chunk
 
     def __next__(self):
-        r"""Return the next data chunk or raise a StopIteration exception if all chunks have been retrieved.
+        """
+        Return the next data chunk or raise a StopIteration exception if all chunks have been retrieved.
 
-        HINT: numpy.s\_ provides a convenient way to generate index tuples using standard array slicing. This
-        is often useful to define the DataChunk.selection of the current chunk
+        .. tip::
+
+            :py:attr:`numpy.s_` provides a convenient way to generate index tuples using standard array slicing. This
+            is often useful to define the DataChunk.selection of the current chunk
 
         :returns: DataChunk object with the data and selection of the current chunk
         :rtype: DataChunk
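
As a small illustration of the tip, the sketch below builds an index tuple with `numpy.s_` and uses it to place a chunk into a larger array. The array names and shapes are assumptions chosen for the example. Passing the tuple as the `selection` of a `DataChunk` is left out so that the sketch runs without hdmf.

```python
import numpy as np

# Hypothetical target array and a chunk covering rows 2..3 along the first dimension.
target = np.zeros((6, 10, 10))
chunk_data = np.ones((2, 10, 10))

# np.s_ turns standard slicing syntax into a tuple of slice objects.
selection = np.s_[2:4, :, :]

# The same tuple can index the target array directly.
target[selection] = chunk_data
```

The resulting `selection` is an ordinary tuple of `slice` objects, so it can be stored alongside the chunk data and applied later by whatever code writes the chunk.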
@@ -639,11 +652,19 @@ def recommended_data_shape(self):
     @property
     def maxshape(self):
         """
-        Get a shape tuple describing the maximum shape of the array described by this DataChunkIterator. If an iterator
-        is provided and no data has been read yet, then the first chunk will be read (i.e., next will be called on the
-        iterator) in order to determine the maxshape.
+        Get a shape tuple describing the maximum shape of the array described by this DataChunkIterator.
+
+        .. note::
+
+            If an iterator is provided and no data has been read yet, then the first chunk will be read
+            (i.e., next will be called on the iterator) in order to determine the maxshape. The iterator
+            is expected to return single chunks along the iterator dimension, this means that maxshape will
+            add an additional dimension along the iteration dimension. E.g., if we iterate over
+            the first dimension and the iterator returns chunks of shape (10, 10), then the maxshape would
+            be (None, 10, 10) or (len(self.data), 10, 10), depending on whether size of the
+            iteration dimension is known.
 
-        :return: Shape tuple. None is used for dimenwions where the maximum shape is not known or unlimited.
+        :return: Shape tuple. None is used for dimensions where the maximum shape is not known or unlimited.
         """
         if self.__maxshape is None:
             # If no data has been read from the iterator yet, read the first chunk and use it to determine the maxshape
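
The rule stated in the new `maxshape` note can be written out as a short sketch. The function `infer_maxshape` and its parameters are hypothetical names introduced for this illustration; it is not the code used inside `DataChunkIterator`, only a restatement of the documented behavior.

```python
def infer_maxshape(first_chunk_shape, iter_axis=0, length=None):
    """Insert the iteration dimension into the shape of a single element.

    length is the size of the iteration dimension if known, otherwise None,
    which marks that dimension as unknown or unlimited.
    """
    shape = list(first_chunk_shape)
    shape.insert(iter_axis, length)
    return tuple(shape)


# Iterator of unknown length returning elements of shape (10, 10).
unknown_length = infer_maxshape((10, 10))

# Same elements, but the size of the iteration dimension is known.
known_length = infer_maxshape((10, 10), length=1000)
```

With these inputs `unknown_length` is `(None, 10, 10)` and `known_length` is `(1000, 10, 10)`, matching the two cases named in the docstring.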
