Skip to content

Comments

Refactor codec specs into a single doc#102

Merged
joshmoore merged 6 commits intozarr-developers:core-protocol-v3.0-devfrom
alimanfoo:refactor-codecs-20201021
May 6, 2022
Merged

Refactor codec specs into a single doc#102
joshmoore merged 6 commits intozarr-developers:core-protocol-v3.0-devfrom
alimanfoo:refactor-codecs-20201021

Conversation

@alimanfoo
Copy link
Member

On reflection, I thought it might be better to refactor the codec specifications into a single document, because this would reduce the overhead of adding new codecs and maintaining the documentation. Generally for each codec we are citing some existing documentation elsewhere for the definition of the encoding algorithm and format, and so we don't actually need to provide much information for each codec, hence having a single document per codec seems overkill.

So this PR brings the existing gzip codec spec and the proposed blosc codec spec (#95) into a single "Codec registry" specification.

Comments welcome.

@alimanfoo alimanfoo requested a review from Carreau October 21, 2020 22:53
@alimanfoo
Copy link
Member Author

cc @davidbrochart

https://tools.ietf.org/html/rfc1952

.. [BLOSC] F. Alted. Blosc Chunk Format. URL:
https://github.com/Blosc/c-blosc/blob/master/README_CHUNK_FORMAT.rst
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This document describes the "Blosc Chunk Format". Does it mean a Zarr chunk consists of one or more Blosc chunks? If so, is the Zarr chunk a simple concatenation of the Blosc chunks?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "Blosc Chunk Format" describes how blosc encodes an input buffer. So one zarr chunk becomes one blosc chunk once encoded.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

docs/codecs.rst Outdated

{
"compressor": {
"codec": "https://purl.org/zarr/spec/codecs/gzip",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec version isn't part of the URL anymore? We previously had:

Suggested change
"codec": "https://purl.org/zarr/spec/codecs/gzip",
"codec": "https://purl.org/zarr/spec/codecs/gzip/1.0",

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes on reflection I thought the version number was potentially confusing and unnecessary in the codec URI. In the case of gzip (and blosc) we do not expect the encoding format to change, and so we don't need a version number.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

docs/codecs.rst Outdated

{
"compressor": {
"codec": "https://purl.org/zarr/spec/codecs/blosc",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec version isn't part of the URL anymore? We previously had:

Suggested change
"codec": "https://purl.org/zarr/spec/codecs/blosc",
"codec": "https://purl.org/zarr/spec/codecs/blosc/1.0",

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, see above.

Note that Blosc 2 would be treated as an entirely new codec, and we would probably give it a URI like https://purl.org/zarr/spec/codecs/blosc2.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@davidbrochart
Copy link
Contributor

davidbrochart commented Oct 22, 2020

Thanks @alimanfoo, it is much better than what we had previously with one document per codec, and I like the way parameters are described, similar to what we are used to in a Python docstring.

alimanfoo and others added 3 commits October 23, 2020 16:42
Co-authored-by: David Brochart <david.brochart@gmail.com>
Co-authored-by: David Brochart <david.brochart@gmail.com>
Co-authored-by: David Brochart <david.brochart@gmail.com>
@alimanfoo
Copy link
Member Author

Thanks @davidbrochart, I've added your suggestions.

Regarding the codec URIs, are you happy that we remove the version numbers, or would you like to discuss this some more?

@davidbrochart
Copy link
Contributor

Regarding the codec URIs, are you happy that we remove the version numbers, or would you like to discuss this some more?

I'm happy like that, I was indeed confused if it was the version of the spec or the version of e.g. the GZip library.

@joshmoore
Copy link
Member

joshmoore commented May 6, 2022

I've attempted an update of this PR and intend to merge it into core-protocol-v3.0-dev in case anyone would like to re-review.

The change in 2314192 matches the work from #116, #117 and #119 which the community is already depending on.

@joshmoore joshmoore merged commit c69da91 into zarr-developers:core-protocol-v3.0-dev May 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants