Core protocol v3.0 - chunk grids #22
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -103,11 +103,81 @@ Data types | |
|
|
||
| TODO define core data types | ||
|
|
||
| Regular chunk grids | ||
| ------------------- | ||
|
|
||
| TODO define regular chunk grids, including how to form a key for each chunk in a grid | ||
|
|
||
| Chunk grids | ||
| ----------- | ||
|
|
||
| A chunk grid defines a set of chunks which contain the elements of an | ||
| array. The chunks of a grid form a tessellation of the array space, | ||
| which is a space defined by the dimensionality and shape of the | ||
| array. This means that every element of the array is a member of one | ||
| chunk, and there are no gaps or overlaps between chunks. | ||
|
|
||
| In general there are different possible types of grids. The core | ||
alimanfoo marked this conversation as resolved.
|
||
| protocol defines the regular grid type, where all chunks are | ||
| hyperrectangles of the same shape. Protocol extensions may define | ||
| other grid types, such as rectilinear grids where chunks are still | ||
| hyperrectangles but do not all share the same shape. | ||
|
|
||
| A grid type also defines rules for constructing a unique key for each | ||
| chunk, which is a string of ASCII characters that can be used to save | ||
|
Member
This is perhaps for a different PR, but I could see specifying a "tuple of strings of ASCII characters in the set [...]". See below. |
||
| and retrieve chunk data in a store. | ||
|
|
||
| Regular grids | ||
| ~~~~~~~~~~~~~ | ||
|
|
||
| A regular grid is a type of grid where an array is divided into chunks | ||
| such that each chunk is a hyperrectangle of the same shape. The | ||
| dimensionality of the grid is the same as the dimensionality of the | ||
| array. Each chunk in the grid can be addressed by a tuple of non-negative | ||
| integers (i, j, k, ...) corresponding to the indices of the chunk | ||
| along each dimension. | ||
|
|
||
| The origin vertex of a chunk has coordinates in the array space (i * | ||
| dx, j * dy, k * dz, ...) where (dx, dy, dz, ...) are the grid spacings | ||
| along each dimension, also known as the chunk shape. Thus the origin | ||
| vertex of the chunk at grid index (0, 0, 0, ...) is at coordinate (0, | ||
| 0, 0, ...) in the array space, i.e., the grid is aligned with the | ||
| origin of the array. If the length of any array dimension is not | ||
| perfectly divisible by the chunk length along the same dimension, then | ||
| the grid will overhang the edge of the array space. | ||
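To make the origin and overhang rules above concrete, here is a minimal Python sketch. The function names `chunk_origin` and `overhang` are hypothetical and not part of the specification; the sketch only assumes that shapes and indices are given as equal-length tuples of non-negative integers.

```python
def chunk_origin(index, chunk_shape):
    """Array-space coordinates of the origin vertex of the chunk at `index`."""
    # (i * dx, j * dy, k * dz, ...)
    return tuple(i * d for i, d in zip(index, chunk_shape))


def overhang(array_shape, chunk_shape):
    """Number of positions by which the grid extends past the array edge,
    per dimension (0 where the array length is an exact multiple of the
    chunk length)."""
    return tuple((-n) % d for n, d in zip(array_shape, chunk_shape))


print(chunk_origin((0, 0, 0), (5, 20, 400)))      # origin-aligned chunk
print(chunk_origin((1, 7, 7), (5, 20, 400)))
print(overhang((10, 200, 3000), (5, 20, 400)))
```

With array shape (10, 200, 3000) and chunk shape (5, 20, 400), only the third dimension overhangs: the grid covers 3200 positions, 200 more than the array holds.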
|
|
||
| The shape of the chunk grid will be (ceil(x / dx), ceil(y / dy), | ||
| ceil(z / dz), ...) where (x, y, z, ...) is the array shape, / is the | ||
| division operator and ceil() is the ceiling function. For example, if | ||
| a 3 dimensional array has shape (10, 200, 3000), and has chunk shape | ||
| (5, 20, 400), then the shape of the chunk grid will be (2, 10, 8), | ||
| meaning that there will be 2 chunks along the first dimension, 10 | ||
| along the second dimension, and 8 along the third dimension. | ||
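The grid shape formula above can be sketched in Python as follows. The name `grid_shape` is hypothetical, and integer arithmetic is used in place of a floating-point `ceil()` purely as an implementation choice to avoid rounding issues on large shapes.

```python
def grid_shape(array_shape, chunk_shape):
    """Number of chunks along each dimension: ceil(x / dx) per dimension."""
    if len(array_shape) != len(chunk_shape):
        raise ValueError("array and chunk shapes must have the same dimensionality")
    # (n + d - 1) // d is ceil(n / d) for non-negative n and positive d.
    return tuple((n + d - 1) // d for n, d in zip(array_shape, chunk_shape))


print(grid_shape((10, 200, 3000), (5, 20, 400)))  # (2, 10, 8)
```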
|
|
||
| An element of an array with coordinates (i, j, k, ...) will occur | ||
| within the chunk at grid index (i // dx, j // dy, k // dz, ...), where | ||
| // is the floor division operator. The element will have coordinates | ||
| (i % dx, j % dy, k % dz, ...) within that chunk. For example, @@TODO | ||
| example. | ||
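One possible illustration of the floor-division and modulo rule above, written as a Python sketch. The function name `locate` and the sample coordinates are assumptions chosen for illustration and are not taken from the specification.

```python
def locate(element, chunk_shape):
    """Map array coordinates to (chunk grid index, coordinates within the chunk)."""
    chunk_index = tuple(e // d for e, d in zip(element, chunk_shape))
    within_chunk = tuple(e % d for e, d in zip(element, chunk_shape))
    return chunk_index, within_chunk


# Element (7, 150, 2999) of an array with chunk shape (5, 20, 400).
chunk_index, within_chunk = locate((7, 150, 2999), (5, 20, 400))
print(chunk_index)   # (1, 7, 7)
print(within_chunk)  # (2, 10, 199)
```

Multiplying the chunk index by the chunk shape and adding the within-chunk coordinates recovers the original element coordinates, which is a quick sanity check for any implementation.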
|
|
||
| The key for the chunk with grid index (i, j, k, ...) is formed by joining | ||
| together the path of the array in which the chunk occurs, then a | ||
| forward slash character ("/"), then a prefix, then the ASCII string | ||
| representations of each index, then a suffix. The prefix, chunk | ||
| indices and suffix are joined using a separator. The default value for | ||
| the prefix is the empty string (""), the default value for the | ||
| separator is the period character (".") and the default value for the | ||
| suffix is the empty string (""), but these values may be configured, | ||
| see the section on `Array metadata`_ below. | ||
|
|
||
| For example, in a 3 dimensional array at path "/foo/bar" configured | ||
| with default values for the chunk key prefix, suffix and separator, | ||
| the key for the chunk at grid index (1, 23, 45) is the string | ||
| "/foo/bar/1.23.45". | ||
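A rough Python sketch of the key construction described above. The function name `chunk_key` and its parameter names are hypothetical. The text does not say how an empty prefix or suffix interacts with the separator, so this sketch assumes that empty components are skipped when joining; that assumption reproduces the documented default result but may not match the intended behaviour for non-default values.

```python
def chunk_key(array_path, index, prefix="", separator=".", suffix=""):
    """Build a chunk key from the array path and the chunk's grid index."""
    parts = [prefix, *(str(i) for i in index), suffix]
    # Assumption: an empty prefix or suffix contributes no separator.
    joined = separator.join(p for p in parts if p != "")
    return f"{array_path}/{joined}"


print(chunk_key("/foo/bar", (1, 23, 45)))  # /foo/bar/1.23.45
```

Under the same assumption, a non-empty prefix such as "c" would yield "/foo/bar/c.1.23.45".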
|
Member
Author
@axtimwalde, @constantinpape, @funkey, do you want to add a configuration option to allow the chunk indices to be given in reverse order, as in n5? Or is it OK to fix on using a single ordering as in zarr v2? |
||
|
|
||
| Note that this specification does not consider the case where the | ||
| chunk grid and the array space are not aligned at the origin vertices | ||
| of the array and the chunk at grid index (0, 0, 0, ...). However, | ||
| protocol extensions may define variations on the regular grid type | ||
| such that the grid indices may include negative integers, and the | ||
| origin vertex of the array may occur at an arbitrary position within | ||
| any chunk, which is required to allow arrays to be extended by an | ||
| arbitrary length in a "negative" direction along any dimension. | ||
|
|
||
| Memory layouts | ||
| -------------- | ||
|
|
||
Do you foresee a simpler grid definition that doesn't include the relationship to an array?
I don't think so, I was thinking that the regular grid was the simplest. I think a grid has to be defined in relation to the array, i.e., how does the grid cover the array space. Although I may have misunderstood the question?
Trying to map between what you have here and the "absolute minimal" storage layer proposed by @axtimwalde where for a given grid location one gets back nothing more than a byte stream a la #8 (comment)
Ah OK. IIUC what @axtimwalde was saying was that there are some useful functionalities that could be provided without needing to know anything about the grid layout. For example, if you wanted to copy data from one store to another, or recode chunks using a different compressor. But I think the core protocol needs to define the full picture, in the sense that chunks always belong to an array, and that array will have some grid layout that defines what the chunks contain.