Merged
docs/protocol/core/v3.0.rst: 80 changes (75 additions, 5 deletions)
@@ -103,11 +103,81 @@ Data types

TODO define core data types

Regular chunk grids
-------------------

TODO define regular chunk grids, including how to form a key for each chunk in a grid

Chunk grids
-----------

A chunk grid defines a set of chunks which contain the elements of an
array. The chunks of a grid form a tessellation of the array space,
which is a space defined by the dimensionality and shape of the
array. This means that every element of the array is a member of
exactly one chunk, and there are no gaps or overlaps between chunks.

Member

Do you foresee a simpler grid definition that doesn't include the relationship to an array?

Member Author

I don't think so; I was thinking that the regular grid was the simplest. I think a grid has to be defined in relation to the array, i.e., how the grid covers the array space. Although I may have misunderstood the question?

Member

Trying to map between what you have here and the "absolute minimal" storage layer proposed by @axtimwalde, where for a given grid location one gets back nothing more than a byte stream à la #8 (comment).

Member Author

Ah OK. IIUC what @axtimwalde was saying was that there are some useful functionalities that could be provided without needing to know anything about the grid layout. For example, if you wanted to copy data from one store to another, or recode chunks using a different compressor. But I think the core protocol needs to define the full picture, in the sense that chunks always belong to an array, and that array will have some grid layout that defines what the chunks contain.

In general there are different possible types of grids. The core
protocol defines the regular grid type, where all chunks are
hyperrectangles of the same shape. Protocol extensions may define
other grid types, such as rectilinear grids where chunks are still
hyperrectangles but do not all share the same shape.

A grid type also defines rules for constructing a unique key for each
chunk, which is a string of ASCII characters that can be used to save
and retrieve chunk data in a store.

Member

This is perhaps for a different PR, but I could see specifying a "tuple of strings of ASCII characters in the set [...]". See below.

Regular grids
~~~~~~~~~~~~~

A regular grid is a type of grid where an array is divided into chunks
such that each chunk is a hyperrectangle of the same shape. The
dimensionality of the grid is the same as the dimensionality of the
array. Each chunk in the grid can be addressed by a tuple of
non-negative integers (i, j, k, ...) corresponding to the indices of
the chunk along each dimension.

The origin vertex of the chunk at grid index (i, j, k, ...) has
coordinates (i * dx, j * dy, k * dz, ...) in the array space, where
(dx, dy, dz, ...) are the grid spacings along each dimension, also
known as the chunk shape. Thus the origin vertex of the chunk at grid
index (0, 0, 0, ...) is at coordinate (0, 0, 0, ...) in the array
space, i.e., the grid is aligned with the origin of the array. If the
length of any array dimension is not an exact multiple of the chunk
length along the same dimension, then the grid will overhang the edge
of the array space, i.e., the last chunk along that dimension will
extend beyond the end of the array.
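As an illustration only, the origin-vertex rule and the resulting overhang can be sketched in Python as below. The helper names `chunk_origin` and `chunk_overhang` are hypothetical and are not part of any zarr API; the sketch assumes integer shapes and a grid aligned with the array origin, as described above.

```python
from typing import Sequence, Tuple


def chunk_origin(grid_index: Sequence[int], chunk_shape: Sequence[int]) -> Tuple[int, ...]:
    """Origin vertex of a chunk in array coordinates: (i * dx, j * dy, k * dz, ...)."""
    return tuple(g * d for g, d in zip(grid_index, chunk_shape))


def chunk_overhang(
    grid_index: Sequence[int],
    chunk_shape: Sequence[int],
    array_shape: Sequence[int],
) -> Tuple[int, ...]:
    """Per dimension, how far the chunk extends past the end of the array (0 if it fits)."""
    origin = chunk_origin(grid_index, chunk_shape)
    return tuple(max(0, o + d - n) for o, d, n in zip(origin, chunk_shape, array_shape))


# Array shape (10, 200, 3000) with chunk shape (5, 20, 400): the last chunk
# along the third dimension starts at 2800 and overhangs the array by 200.
print(chunk_origin((1, 9, 7), (5, 20, 400)))                      # (5, 180, 2800)
print(chunk_overhang((1, 9, 7), (5, 20, 400), (10, 200, 3000)))   # (0, 0, 200)
```

Whether an implementation stores the overhanging region (e.g. as fill values) or trims it is not specified in this section; the sketch only computes its size.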

The shape of the chunk grid will be (ceil(x / dx), ceil(y / dy),
ceil(z / dz), ...) where (x, y, z, ...) is the array shape, / is the
division operator and ceil() is the ceiling function. For example, if
a 3-dimensional array has shape (10, 200, 3000) and chunk shape
(5, 20, 400), then the shape of the chunk grid will be (2, 10, 8),
meaning that there will be 2 chunks along the first dimension, 10
along the second dimension, and 8 along the third dimension.
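The grid-shape formula can be checked against the worked example with a short Python sketch. The names `grid_shape` and `iter_grid_indices` are hypothetical helpers for illustration; integer ceiling division is used in place of floating-point `ceil(x / dx)`, which gives the same result for non-negative integer shapes without rounding concerns.

```python
from itertools import product
from typing import Iterator, Sequence, Tuple


def grid_shape(array_shape: Sequence[int], chunk_shape: Sequence[int]) -> Tuple[int, ...]:
    """Number of chunks along each dimension: (ceil(x / dx), ceil(y / dy), ...)."""
    if len(array_shape) != len(chunk_shape):
        raise ValueError("array shape and chunk shape must have the same dimensionality")
    # -(-n // d) is integer ceiling division for non-negative n and positive d.
    return tuple(-(-n // d) for n, d in zip(array_shape, chunk_shape))


def iter_grid_indices(
    array_shape: Sequence[int], chunk_shape: Sequence[int]
) -> Iterator[Tuple[int, ...]]:
    """Yield every chunk grid index (i, j, k, ...), last dimension varying fastest."""
    return product(*(range(m) for m in grid_shape(array_shape, chunk_shape)))


print(grid_shape((10, 200, 3000), (5, 20, 400)))  # (2, 10, 8)
```

For the example array this gives 2 * 10 * 8 = 160 chunks, the first at grid index (0, 0, 0) and the last at (1, 9, 7).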

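An element of an array with coordinates (i, j, k, ...) will occur
within the chunk at grid index (i // dx, j // dy, k // dz, ...), where
// is the floor division operator. The element will have coordinates
(i % dx, j % dy, k % dz, ...) within that chunk, where % is the modulo
operator. For example, in the array with shape (10, 200, 3000) and
chunk shape (5, 20, 400) described above, the element at coordinates
(7, 150, 2999) occurs within the chunk at grid index (1, 7, 7), and has
coordinates (2, 10, 199) within that chunk.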
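The element-to-chunk mapping is just floor division and modulo per dimension. The Python sketch below uses hypothetical helper names (`locate_element`, `element_coords`) and adds the inverse mapping, so that a brute-force round trip over a small array illustrates that each element lands in one chunk at one in-chunk position.

```python
from itertools import product
from typing import Sequence, Tuple


def locate_element(
    coords: Sequence[int], chunk_shape: Sequence[int]
) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    """Map array coordinates to (chunk grid index, coordinates within that chunk)."""
    grid_index = tuple(c // d for c, d in zip(coords, chunk_shape))
    offset = tuple(c % d for c, d in zip(coords, chunk_shape))
    return grid_index, offset


def element_coords(
    grid_index: Sequence[int], offset: Sequence[int], chunk_shape: Sequence[int]
) -> Tuple[int, ...]:
    """Inverse mapping: chunk origin (grid_index * chunk_shape) plus in-chunk offset."""
    return tuple(g * d + o for g, o, d in zip(grid_index, offset, chunk_shape))


print(locate_element((7, 150, 2999), (5, 20, 400)))  # ((1, 7, 7), (2, 10, 199))

# Round trip over every element of a small 7 x 5 array whose shape is not a
# multiple of the (3, 2) chunk shape, so the last chunks overhang the array.
for coords in product(range(7), range(5)):
    assert element_coords(*locate_element(coords, (3, 2)), (3, 2)) == coords
```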

The key for the chunk with grid index (i, j, k, ...) is formed by joining
together the path of the array in which the chunk occurs, then a
forward slash character ("/"), then a prefix, then the ASCII string
representations of each index, then a suffix. The prefix, chunk
indices and suffix are joined using a separator. The default value for
the prefix is the empty string (""), the default value for the
separator is the period character (".") and the default value for the
suffix is the empty string (""), but these values may be configured,
see the section on `Array metadata`_ below.

For example, in a 3-dimensional array at path "/foo/bar" configured
with default values for the chunk key prefix, suffix and separator,
the key for the chunk at grid index (1, 23, 45) is the string
"/foo/bar/1.23.45".
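A minimal Python sketch of the chunk key rule follows. The function name `chunk_key` and its keyword parameters are hypothetical stand-ins for the prefix, separator and suffix settings in the array metadata. One point is an assumption: the text says the prefix, indices and suffix are joined with the separator, but the example key has no leading or trailing separator, so the sketch skips an empty prefix or suffix rather than joining it.

```python
from typing import Sequence


def chunk_key(
    array_path: str,
    grid_index: Sequence[int],
    prefix: str = "",
    separator: str = ".",
    suffix: str = "",
) -> str:
    """Array path, then "/", then prefix, indices and suffix joined by the separator."""
    # Assumption: an empty prefix/suffix is omitted, so the defaults give
    # "1.23.45" rather than ".1.23.45.", matching the example in the text.
    parts = [prefix] if prefix else []
    parts.extend(str(i) for i in grid_index)  # ASCII decimal representation
    if suffix:
        parts.append(suffix)
    return array_path + "/" + separator.join(parts)


print(chunk_key("/foo/bar", (1, 23, 45)))  # /foo/bar/1.23.45
```

With `separator="/"` the same call would give "/foo/bar/1/23/45", i.e. a nested, directory-like key layout.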
Member Author

@axtimwalde, @constantinpape, @funkey, do you want to add a configuration option to allow the chunk indices to be given in reverse order, as in n5? Or is it OK to fix on using a single ordering as in zarr v2?


Note that this specification does not consider the case where the
chunk grid and the array space are not aligned at the origin vertices
of the array and the chunk at grid index (0, 0, 0, ...). However,
protocol extensions may define variations on the regular grid type
such that the grid indices may include negative integers, and the
origin vertex of the array may occur at an arbitrary position within
any chunk. Such variations would be required to allow arrays to be
extended by an arbitrary length in a "negative" direction along any
dimension.

Memory layouts
--------------