Refactor codec specs into a single doc #102
Merged: joshmoore merged 6 commits into zarr-developers:core-protocol-v3.0-dev from alimanfoo:refactor-codecs-20201021, May 6, 2022
fe07904 refactor codecs into a single registry spec (alimanfoo)
571cbc0 allow compression level 0 (alimanfoo)
b646e1b clarify shuffling (alimanfoo)
dffe91d Clarify automatic block size (alimanfoo)
4dbed88 Merge branch 'merge' into refactor-codecs-20201021 (joshmoore)
2314192 Re-apply s/codecs/codec/ change (joshmoore)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,11 +1,214 @@ | ||
| ====== | ||
| ============== | ||
| Codec registry | ||
| ============== | ||
| ------------------------------ | ||
| Editor's Draft 21 October 2020 | ||
| ------------------------------ | ||
|
|
||
| Specification URI: | ||
| https://purl.org/zarr/specs/codec | ||
| Issue tracking: | ||
| `GitHub issues <https://github.com/zarr-developers/zarr-specs/labels/codec>`_ | ||
| Suggest an edit for this spec: | ||
| `GitHub editor <https://github.com/zarr-developers/zarr-specs/blob/master/docs/codecs.rst>`_ | ||
|
|
||
| Copyright 2020 `Zarr core development team | ||
| <https://github.com/orgs/zarr-developers/teams/core-devs>`_. This work | ||
| is licensed under a `Creative Commons Attribution 3.0 Unported License | ||
| <https://creativecommons.org/licenses/by/3.0/>`_. | ||
|
|
||
| ---- | ||
|
|
||
|
|
||
| Abstract | ||
| ======== | ||
|
|
||
| This document defines codecs for use as compressors and/or filters as | ||
| part of a Zarr implementation. | ||
|
|
||
|
|
||
| Status of this document | ||
| ======================= | ||
|
|
||
| This document is a **Work in Progress**. It may be updated, replaced | ||
| or obsoleted by other documents at any time. It is inappropriate to | ||
| cite this document as other than work in progress. | ||
|
|
||
| Comments, questions or contributions to this document are very | ||
| welcome. Comments and questions should be raised via `GitHub issues | ||
| <https://github.com/zarr-developers/zarr-specs/labels/codec>`_. | ||
|
|
||
| This document is maintained by the `Zarr core development team | ||
| <https://github.com/orgs/zarr-developers/teams/core-devs>`_. | ||
|
|
||
|
|
||
| Document conventions | ||
| ==================== | ||
|
|
||
| This document lists a collection of codecs. For each codec, the | ||
| following information is provided: | ||
|
|
||
| * A URI which can be used to uniquely identify the codec in Zarr array | ||
| metadata. | ||
| * Any configuration parameters which can be set in Zarr array | ||
| metadata. | ||
| * A definition of the encoding/decoding algorithm and the encoded format, | ||
| or a citation to an existing specification where this is defined. | ||
| * Any additional headers added to the encoded data. | ||
|
|
||
| Conformance requirements are expressed with a combination of | ||
| descriptive assertions and [RFC2119]_ terminology. The key words | ||
| "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", | ||
| "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in the normative | ||
| parts of this document are to be interpreted as described in | ||
| [RFC2119]_. However, for readability, these words do not appear in all | ||
| uppercase letters in this specification. | ||
|
|
||
| All of the text of this specification is normative except sections | ||
| explicitly marked as non-normative, examples, and notes. Examples in | ||
| this specification are introduced with the words "for example". | ||
|
|
||
|
|
||
| Codecs | ||
| ====== | ||
|
|
||
| Under construction. | ||
| Gzip | ||
| ---- | ||
|
|
||
| Codec URI: | ||
| https://purl.org/zarr/spec/codec/gzip | ||
|
|
||
|
|
||
| Configuration parameters | ||
| ~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
|
||
| level: | ||
| An integer from 0 to 9 which controls the speed and level of | ||
| compression. A level of 1 is the fastest compression method and | ||
| produces the least compression, while 9 is the slowest and produces | ||
| the most compression. Compression is turned off completely when | ||
| level is 0. | ||
|
|
||
| For example, the array metadata below specifies that the compressor is | ||
| the Gzip codec configured with a compression level of 1:: | ||
|
|
||
| { | ||
| "compressor": { | ||
| "codec": "https://purl.org/zarr/spec/codec/gzip", | ||
| "configuration": { | ||
| "level": 1 | ||
| } | ||
| } | ||
| } | ||
|
|
||
|
|
||
| Format and algorithm | ||
| ~~~~~~~~~~~~~~~~~~~~ | ||
|
|
||
| Encoding and decoding are performed using the DEFLATE algorithm defined in | ||
| [RFC1951]_. | ||
|
|
||
| Encoded data should conform to the Gzip file format [RFC1952]_. | ||
|
|
||
|
|
||
| Blosc | ||
| ----- | ||
|
|
||
| Codec URI: | ||
| https://purl.org/zarr/spec/codec/blosc | ||
|
|
||
|
|
||
| Configuration parameters | ||
| ~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
|
||
| cname: | ||
| A string identifying the internal compression algorithm to be | ||
| used. At the time of writing, the following values are supported | ||
| by the c-blosc library: "lz4", "lz4hc", "blosclz", "zstd", | ||
| "snappy", "zlib". | ||
|
|
||
| clevel: | ||
| An integer from 0 to 9 which controls the speed and level of | ||
| compression. A level of 1 is the fastest compression method and | ||
| produces the least compression, while 9 is the slowest and produces | ||
| the most compression. Compression is turned off completely when | ||
| clevel is 0. | ||
|
|
||
| shuffle: | ||
| An integer value in the set {0, 1, 2, -1} indicating the way | ||
| bytes or bits are rearranged, which can lead to faster | ||
| and/or greater compression. A value of 1 | ||
| indicates that byte-wise shuffling is performed prior to | ||
| compression. A value of 2 indicates that bit-wise shuffling is | ||
| performed prior to compression. If a value of -1 is given, | ||
| then default shuffling is used: bit-wise shuffling for buffers | ||
| with item size of 1 byte, byte-wise shuffling otherwise. | ||
| Shuffling is turned off completely when the value is 0. | ||
|
|
||
| blocksize: | ||
| An integer giving the size in bytes of blocks into which a | ||
| buffer is divided before compression. A value of 0 | ||
| indicates that an automatic size will be used. | ||
|
|
||
| For example, the array metadata document below specifies that the | ||
| compressor is the Blosc codec configured with a compression level of | ||
| 1, byte-wise shuffling, the ``lz4`` compression algorithm and the | ||
| default block size:: | ||
|
|
||
| { | ||
| "compressor": { | ||
| "codec": "https://purl.org/zarr/spec/codec/blosc", | ||
| "configuration": { | ||
| "cname": "lz4", | ||
| "clevel": 1, | ||
| "shuffle": 1, | ||
| "blocksize": 0 | ||
| } | ||
| } | ||
| } | ||
|
|
||
|
|
||
| Format and algorithm | ||
| ~~~~~~~~~~~~~~~~~~~~ | ||
|
|
||
| Blosc is a meta-compressor, which divides an input buffer into blocks, | ||
| then applies an internal compression algorithm to each block, then | ||
| packs the encoded blocks together into a single output buffer with a | ||
| header. The format of the encoded buffer is defined in [BLOSC]_. The | ||
| reference implementation is provided by the `c-blosc library | ||
| <https://github.com/Blosc/c-blosc>`_. | ||
|
|
||
|
|
||
| Deprecated codecs | ||
| ================= | ||
|
|
||
| There are no deprecated codecs at this time. | ||
|
|
||
|
|
||
| References | ||
| ========== | ||
|
|
||
| .. [RFC2119] S. Bradner. Key words for use in RFCs to Indicate | ||
| Requirement Levels. March 1997. Best Current Practice. URL: | ||
| https://tools.ietf.org/html/rfc2119 | ||
|
|
||
| .. [RFC1951] P. Deutsch. DEFLATE Compressed Data Format Specification version | ||
| 1.3. May 1996. Informational. URL: | ||
| https://tools.ietf.org/html/rfc1951 | ||
|
|
||
| .. [RFC1952] P. Deutsch. GZIP file format specification version 4.3. | ||
| May 1996. Informational. URL: | ||
| https://tools.ietf.org/html/rfc1952 | ||
|
|
||
| .. [BLOSC] F. Alted. Blosc Chunk Format. URL: | ||
| https://github.com/Blosc/c-blosc/blob/master/README_CHUNK_FORMAT.rst | ||
|
|
||
|
|
||
| Change log | ||
| ========== | ||
|
|
||
| .. toctree:: | ||
| :maxdepth: 1 | ||
| :caption: Contents: | ||
| Editor's Draft 21 October 2020 | ||
| ------------------------------ | ||
|
|
||
| codecs/gzip/v1.0 | ||
| * Added Gzip codec. | ||
| * Added Blosc codec. | ||
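As a rough illustration of the Gzip codec entry in the diff above, the sketch below reads a `compressor` object shaped like the spec's example and round-trips a chunk with Python's standard `gzip` module. The helper names `encode_chunk` and `decode_chunk` are hypothetical and not part of any Zarr implementation; the only things taken from the spec are the codec URI, the `level` parameter and its 0 to 9 range.

```python
import gzip
import json

GZIP_URI = "https://purl.org/zarr/spec/codec/gzip"

# Metadata fragment shaped like the Gzip example in the spec.
metadata = json.loads("""
{
    "compressor": {
        "codec": "https://purl.org/zarr/spec/codec/gzip",
        "configuration": {"level": 1}
    }
}
""")


def encode_chunk(raw: bytes, compressor: dict) -> bytes:
    """Hypothetical helper: encode one chunk as a gzip member (RFC 1952)."""
    if compressor["codec"] != GZIP_URI:
        raise ValueError(f"unsupported codec: {compressor['codec']}")
    level = compressor["configuration"]["level"]
    if not isinstance(level, int) or not 0 <= level <= 9:
        raise ValueError("level must be an integer from 0 to 9")
    # Level 0 still produces a valid gzip stream, just without compression.
    return gzip.compress(raw, compresslevel=level)


def decode_chunk(encoded: bytes) -> bytes:
    """Hypothetical helper: decoding does not need the level."""
    return gzip.decompress(encoded)


raw = bytes(range(256)) * 64
encoded = encode_chunk(raw, metadata["compressor"])
roundtrip = decode_chunk(encoded)
```

Note that `level` only affects encoding; a reader can decode a chunk written at any level, including 0.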
This file was deleted.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -10,8 +10,8 @@ Under construction. | |
| :caption: Contents: | ||
|
|
||
| protocol | ||
| stores | ||
| codecs | ||
| stores | ||
|
|
||
|
|
||
| Indices and tables | ||
|
|
||
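The `shuffle` parameter of the Blosc codec in the codec registry diff above may be easier to follow with a toy example. The sketch below is a pure-Python illustration of byte-wise shuffling over a whole buffer, plus the rule the spec gives for `-1`. It is not the c-blosc implementation, which works block by block. The function names and the choice to leave trailing bytes unshuffled are assumptions made for this sketch.

```python
def resolve_shuffle(shuffle: int, itemsize: int) -> int:
    """Map the spec's shuffle value to a concrete mode (0, 1 or 2)."""
    if shuffle not in (0, 1, 2, -1):
        raise ValueError("shuffle must be one of 0, 1, 2, -1")
    if shuffle == -1:
        # Default per the spec: bit-wise for 1-byte items, byte-wise otherwise.
        return 2 if itemsize == 1 else 1
    return shuffle


def byte_shuffle(buf: bytes, itemsize: int) -> bytes:
    """Group the i-th byte of every item together (illustration only)."""
    n = len(buf) // itemsize
    body, tail = buf[: n * itemsize], buf[n * itemsize:]
    # Assumption of this sketch: bytes that do not fill an item stay in place.
    return b"".join(body[i::itemsize] for i in range(itemsize)) + tail


def byte_unshuffle(buf: bytes, itemsize: int) -> bytes:
    """Inverse of byte_shuffle."""
    n = len(buf) // itemsize
    body, tail = buf[: n * itemsize], buf[n * itemsize:]
    out = bytearray(len(body))
    for i in range(itemsize):
        out[i::itemsize] = body[i * n:(i + 1) * n]
    return bytes(out) + tail


# Four little-endian 32-bit integers: 1, 2, 3, 4.
data = b"".join(v.to_bytes(4, "little") for v in (1, 2, 3, 4))
shuffled = byte_shuffle(data, 4)
```

After shuffling, the four low-order bytes sit next to each other and are followed by a run of twelve zero bytes, which is the kind of regularity that tends to help the internal compressor.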
This document describes the "Blosc Chunk Format". Does it mean a Zarr chunk consists of one or more Blosc chunks? If so, is the Zarr chunk a simple concatenation of the Blosc chunks?
The "Blosc Chunk Format" describes how blosc encodes an input buffer. So one zarr chunk becomes one blosc chunk once encoded.
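A small sketch of what that means for a reader, assuming the 16-byte little-endian header layout from the c-blosc chunk format document (version, versionlz, flags, typesize, then three unsigned 32-bit sizes). The field names, the helper functions and the hand-built buffer are illustrative only; nothing here calls the Blosc library or produces real compressed data.

```python
import struct
from typing import NamedTuple

# Assumed layout: four single-byte fields, then three little-endian uint32s.
HEADER = struct.Struct("<BBBBIII")


class BloscHeader(NamedTuple):
    version: int
    versionlz: int
    flags: int
    typesize: int
    nbytes: int      # uncompressed size of the buffer
    blocksize: int   # size of the internal blocks
    cbytes: int      # compressed size, taken here to include the header


def read_blosc_header(stored: bytes) -> BloscHeader:
    """Hypothetical helper: parse the header of one encoded Zarr chunk."""
    if len(stored) < HEADER.size:
        raise ValueError("buffer is shorter than the 16-byte Blosc header")
    return BloscHeader(*HEADER.unpack_from(stored))


def is_single_blosc_chunk(stored: bytes) -> bool:
    """True if the stored bytes are exactly one Blosc chunk, not several."""
    return read_blosc_header(stored).cbytes == len(stored)


# Hand-built stand-in for an encoded chunk: a header plus 84 filler bytes.
payload = bytes(84)
fake = HEADER.pack(2, 1, 1, 4, 4000, 1000, HEADER.size + len(payload)) + payload
header = read_blosc_header(fake)
```

Under this reading, the stored Zarr chunk is one Blosc chunk whose header already records the uncompressed size, so no extra Zarr-level framing is needed.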
👍