Core protocol v3.0 - chunk grids #22
alimanfoo merged 2 commits into zarr-developers:core-protocol-v3.0-dev from
Straw man for discussion. Note that I'm tentatively suggesting that the core protocol stick to defining regular chunk grids aligned to the array origin; however, protocol extensions can define other grid types. For example, rectilinear grids, where chunks may have different shapes, would be addressed via a protocol extension. Similarly, grids where chunks may have negative indices and the origin of the array may occur anywhere in any chunk (needed to allow arrays to "grow" in the "negative" direction) would be a protocol extension. Again, I'm just thinking of ways to keep the core protocol simple and minimal while allowing for flexibility and the development of additional features via protocol extensions.
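To make "regular chunk grid aligned to the array origin" concrete, here is a minimal sketch. It assumes every chunk has the same shape, that chunk (0, 0, ...) starts at element (0, 0, ...), and that chunks at the upper edges are simply clipped to the array bounds. The function names are hypothetical and not part of any zarr API.

```python
from typing import Tuple

Shape = Tuple[int, ...]


def grid_shape(array_shape: Shape, chunk_shape: Shape) -> Shape:
    """Number of chunks along each dimension (edge chunks may be partial)."""
    # Integer ceiling division: -(-n // c) == ceil(n / c) for positive c.
    return tuple(-(-n // c) for n, c in zip(array_shape, chunk_shape))


def chunk_index(element: Shape, chunk_shape: Shape) -> Shape:
    """Grid index of the chunk containing a given element index."""
    # Because the grid is aligned to the array origin, plain floor
    # division is enough to locate the chunk.
    return tuple(i // c for i, c in zip(element, chunk_shape))


def chunk_slices(index: Shape, array_shape: Shape, chunk_shape: Shape) -> Tuple[slice, ...]:
    """Element ranges covered by a chunk, clipped to the array bounds."""
    return tuple(
        slice(g * c, min((g + 1) * c, n))
        for g, c, n in zip(index, chunk_shape, array_shape)
    )
```

For a 10 x 7 array with 4 x 3 chunks, `grid_shape((10, 7), (4, 3))` gives `(3, 3)`, element `(9, 6)` lands in chunk `(2, 2)`, and that last chunk covers only rows 8 to 9 and column 6. Locating a chunk never needs a search, which is part of what keeps the regular grid a good fit for a minimal core.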
joshmoore left a comment:

Comments from a first reading.
> A chunk grid defines a set of chunks which contain the elements of an array. The chunks of a grid form a tessellation of the array space,
Do you foresee a simpler grid definition that doesn't include the relationship to an array?
I don't think so, I was thinking that the regular grid was the simplest. I think a grid has to be defined in relation to the array, i.e., how does the grid cover the array space. Although I may have misunderstood the question?
I'm trying to map between what you have here and the "absolute minimal" storage layer proposed by @axtimwalde, where for a given grid location one gets back nothing more than a byte stream, à la #8 (comment).
Ah OK. IIUC what @axtimwalde was saying was that there are some useful functionalities that could be provided without needing to know anything about the grid layout. For example, if you wanted to copy data from one store to another, or recode chunks using a different compressor. But I think the core protocol needs to define the full picture, in the sense that chunks always belong to an array, and that array will have some grid layout that defines what the chunks contain.
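The grid-agnostic operations mentioned here (copying between stores, recoding with a different compressor) can be sketched by treating chunk keys as opaque strings and chunk values as opaque bytes. This assumes a plain `dict` standing in for a store that holds only chunks; `copy_chunks` and `recode_chunks` are hypothetical helpers, and zlib/bz2 are just example codecs from the Python standard library.

```python
import bz2
import zlib
from typing import Callable, Mapping, MutableMapping


def copy_chunks(src: Mapping[str, bytes], dst: MutableMapping[str, bytes]) -> None:
    """Copy every chunk as opaque bytes; no knowledge of the grid is needed."""
    for key, value in src.items():
        dst[key] = value


def recode_chunks(
    store: MutableMapping[str, bytes],
    decode: Callable[[bytes], bytes],
    encode: Callable[[bytes], bytes],
) -> None:
    """Re-compress every chunk in place with a different codec."""
    # Snapshot the keys first so the loop does not depend on how the
    # store behaves while it is being modified.
    for key in list(store):
        store[key] = encode(decode(store[key]))


store = {"0.0": zlib.compress(b"abc"), "0.1": zlib.compress(b"def")}
backup: dict = {}
copy_chunks(store, backup)
recode_chunks(backup, zlib.decompress, bz2.compress)
```

Neither helper ever parses a key, which is the point being made: the grid layout only matters once something needs to know which array elements a chunk contains.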
> hyperrectangles but do not all share the same shape.
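For contrast with the regular case, here is a sketch of chunk lookup in a rectilinear grid, which the PR description suggests leaving to a protocol extension. It assumes a hypothetical representation in which the chunk sizes along each dimension are listed explicitly; nothing in this PR defines that format. Locating a chunk now needs a binary search over cumulative chunk edges instead of a single integer division.

```python
import bisect
from itertools import accumulate
from typing import Sequence, Tuple


def rectilinear_chunk_index(
    element: Sequence[int], sizes_per_dim: Sequence[Sequence[int]]
) -> Tuple[int, ...]:
    """Grid index of the chunk containing an element when chunk sizes vary."""
    index = []
    for i, sizes in zip(element, sizes_per_dim):
        # Cumulative upper edges, e.g. sizes [4, 2, 5] -> edges [4, 6, 11].
        edges = list(accumulate(sizes))
        length = edges[-1] if edges else 0
        if not 0 <= i < length:
            raise IndexError(f"element index {i} outside dimension of length {length}")
        # bisect_right returns the first chunk whose upper edge exceeds i.
        index.append(bisect.bisect_right(edges, i))
    return tuple(index)
```

With rows chunked as 4, 2, 5 and columns as 3, 3, 3, 3, `rectilinear_chunk_index((4, 10), [[4, 2, 5], [3, 3, 3, 3]])` returns `(1, 3)`: row 4 is the first row of the second row-chunk.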
> A grid type also defines rules for constructing a unique key for each chunk, which is a string of ASCII characters that can be used to save
This is perhaps for a different PR, but I could see specifying a "tuple of strings of ASCII characters in the set [...]". See below.
docs/protocol/core/v3.0.rst
> The key for chunk with grid index (i, j, k, ...) is formed by concatenating the ASCII string representation of each index, joined together via the period (".") character. For example, in a 3
Related to the "tuple of strings" from above, I would propose either leaving the joining operation to the backend or going so far as to suggest "/" as the default. My understanding is that cloud storage doesn't suffer under the use of "/", but local storage does suffer under the use of "." (every chunk of an array ends up in a single directory).
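The "tuple of strings" idea can be sketched as follows: the protocol fixes the per-dimension parts of a key, and a backend decides how to join them. The helper names are hypothetical, and the two joiners simply illustrate the "." from the draft text versus the proposed "/".

```python
from typing import Sequence, Tuple

ChunkKeyParts = Tuple[str, ...]


def chunk_key_parts(index: Sequence[int]) -> ChunkKeyParts:
    """Chunk key as a tuple of ASCII decimal strings, one per dimension."""
    return tuple(str(i) for i in index)


def flat_key(parts: ChunkKeyParts) -> str:
    """Join with "." as in the draft text: one flat name per chunk."""
    return ".".join(parts)


def nested_key(parts: ChunkKeyParts) -> str:
    """Join with "/": a store that maps "/" to directories nests the chunks."""
    return "/".join(parts)


parts = chunk_key_parts((1, 23, 45))
```

Here `flat_key(parts)` gives `"1.23.45"` and `nested_key(parts)` gives `"1/23/45"`. On a file-system store the second form spreads chunks across subdirectories instead of placing them all in one directory.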
Good point. In general I imagine that you could have a scheme where there is a prefix (default ""), a separator (default "."), and a suffix (default ""), which could be overridden. I.e., you could allow these to be configured on a per-array basis in the array metadata. I was wondering if that's something we should include in the core protocol, or could be a protocol extension.
I've just tentatively pushed an edit which allows for an array to have configurable prefix, suffix and separator for chunk keys, which would be a mechanism to allow e.g. use of "/" as chunk key separator. Happy to row back on this if anyone feels that should be a protocol extension.
> For example, in a 3 dimensional array at path "/foo/bar" configured with default values for the chunk key prefix, suffix and separator, the key for the chunk at grid index (1, 23, 45) is the string "/foo/bar/1.23.45".
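A minimal sketch of the configurable chunk key scheme, assuming a prefix (default ""), separator (default ".") and suffix (default "") as described in the thread. `ChunkKeyConfig` and its field names are hypothetical, not the actual metadata names in the pushed edit. It is also an assumption that the array path and the chunk part are joined with "/"; that is simply the reading that reproduces the example above.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ChunkKeyConfig:
    # Hypothetical per-array settings; defaults follow the discussion above.
    prefix: str = ""
    separator: str = "."
    suffix: str = ""


def chunk_key(
    array_path: str, index: Sequence[int], config: ChunkKeyConfig = ChunkKeyConfig()
) -> str:
    """Build the store key for the chunk at a given grid index."""
    body = config.separator.join(str(i) for i in index)
    # Assumption: "/" joins the array path and the chunk part of the key.
    return f"{array_path.rstrip('/')}/{config.prefix}{body}{config.suffix}"
```

With the defaults, `chunk_key("/foo/bar", (1, 23, 45))` gives `"/foo/bar/1.23.45"`, matching the spec example; `ChunkKeyConfig(separator="/")` gives `"/foo/bar/1/23/45"`, the nested layout suggested earlier in the thread.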
@axtimwalde, @constantinpape, @funkey, do you want to add a configuration option to allow the chunk indices to be given in reverse order, as in n5? Or is it OK to fix on using a single ordering as in zarr v2?
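If such an option were added, it might look like the sketch below. The `reverse` flag and the `ordered_chunk_key` name are hypothetical; the only behaviour taken from the question is that the grid indices would be emitted in reverse order, as in n5, rather than in the single fixed order used by zarr v2.

```python
from typing import Sequence


def ordered_chunk_key(index: Sequence[int], separator: str = ".", reverse: bool = False) -> str:
    """Join chunk grid indices into a key, optionally in reverse order."""
    parts = [str(i) for i in index]
    if reverse:
        # Hypothetical option: emit the last dimension first, as n5 does.
        parts.reverse()
    return separator.join(parts)
```

For grid index `(1, 23, 45)` this gives `"1.23.45"` by default and `"45.23.1"` with `reverse=True`. Because the flag only permutes the parts of the key, it composes with any choice of separator.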
The branch was force-pushed from 7a68931 to 98c0c60.
In the interests of having content together in one place, I'd like to merge this PR into the core-protocol-v3.0-dev branch. We can still discuss, revise and revisit anything after the merge. I'll merge tomorrow if there are no objections.
This PR adds a section introducing chunk grids and defining regular chunk grids.