Add initial draft for blosc codec #95
@@ -9,3 +9,4 @@ Under construction.
:caption: Contents:

codecs/gzip/v1.0
codecs/blosc/v1.0

@@ -0,0 +1,141 @@

=========================
Blosc Codec (version 1.0)
=========================
---------------------------------
Editor's draft 29 September 2020
---------------------------------

Specification URI:
    https://purl.org/zarr/spec/codecs/blosc/1.0
Issue tracking:
    `GitHub issues <https://github.com/zarr-developers/zarr-specs/labels/codecs-blosc-v1.0>`_
Suggest an edit for this spec:
    `GitHub editor <https://github.com/zarr-developers/zarr-specs/blob/core-protocol-v3.0-dev/docs/codecs/blosc/v1.0.rst>`_

Copyright 2020 `Zarr core development
team <https://github.com/orgs/zarr-developers/teams/core-devs>`_ (@@TODO
list institutions?). This work is licensed under a `Creative Commons
Attribution 3.0 Unported
License <https://creativecommons.org/licenses/by/3.0/>`_.

----

Abstract
========
This specification defines a codec for chunk compression using Blosc.

Status of this document
=======================
This document is a **Work in Progress**. It may be updated, replaced
or obsoleted by other documents at any time. It is inappropriate to
cite this document as other than work in progress.

Comments, questions or contributions to this document are very
welcome. Comments and questions should be raised via `GitHub issues
<https://github.com/zarr-developers/zarr-specs/labels/codecs-blosc-v1.0>`_. When
raising an issue, please add the label "codecs-blosc-v1.0".

This document was produced by the `Zarr core development team
<https://github.com/orgs/zarr-developers/teams/core-devs>`_.

Document conventions
====================
Conformance requirements are expressed with a combination of
descriptive assertions and [RFC2119]_ terminology. The key words
"MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in the normative
parts of this document are to be interpreted as described in
[RFC2119]_. However, for readability, these words do not appear in all
uppercase letters in this specification.

All of the text of this specification is normative except sections
explicitly marked as non-normative, examples, and notes. Examples in
this specification are introduced with the words "for example".

Chunk encoding/decoding with Blosc
==================================
@@TODO define how chunks are encoded and decoded.

@@TODO be sure to clarify that the encoded data should conform to the Blosc file format.

Chunks are encoded and decoded using the compression algorithm implemented in the
`C-Blosc library version 1 <https://github.com/Blosc/c-blosc>`_ or
`C-Blosc library version 2 <https://github.com/Blosc/c-blosc2>`_,
and the encoded data should be stored as-is.

Review comment (Contributor): Let's remove blosc2, which is not production ready.

Review comment (Member): Yes, blosc2 is a different codec that would need to be covered in a different codec spec.
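Because the encoded data is stored as-is, an encoded chunk begins with a Blosc header. The following is a minimal sketch, assuming the 16-byte header layout documented for C-Blosc version 1 (single-byte ``version``, ``versionlz``, ``flags`` and ``typesize`` fields, followed by three little-endian 32-bit integers ``nbytes``, ``blocksize`` and ``cbytes``), of how a reader might inspect a chunk before handing it to the library. ``BloscHeader`` and ``parse_blosc_header`` are hypothetical names; only the byte-shuffle (``0x01``), memcpy (``0x02``) and bit-shuffle (``0x04``) flag bits are decoded, and no decompression is performed.

```python
import struct
from typing import NamedTuple


class BloscHeader(NamedTuple):
    """Fields of a C-Blosc version 1 chunk header (hypothetical helper type)."""
    version: int
    versionlz: int
    flags: int
    typesize: int
    nbytes: int     # size of the uncompressed data, in bytes
    blocksize: int  # block size used during compression, in bytes
    cbytes: int     # total size of the encoded chunk, header included

    @property
    def byte_shuffled(self) -> bool:
        return bool(self.flags & 0x01)

    @property
    def memcpyed(self) -> bool:
        # Set when the data was copied into the buffer without compression.
        return bool(self.flags & 0x02)

    @property
    def bit_shuffled(self) -> bool:
        return bool(self.flags & 0x04)


def parse_blosc_header(encoded: bytes) -> BloscHeader:
    """Parse the 16-byte header of an encoded chunk (assumes the C-Blosc 1 layout)."""
    if len(encoded) < 16:
        raise ValueError("encoded chunk is shorter than the 16-byte Blosc header")
    # '<' = little-endian: four unsigned bytes, then three unsigned 32-bit integers.
    header = BloscHeader(*struct.unpack("<BBBBIII", encoded[:16]))
    # Assumes the chunk was stored as-is, so its length must equal cbytes.
    if header.cbytes != len(encoded):
        raise ValueError("header cbytes does not match the encoded chunk length")
    return header
```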

The compressor can be configured through the following parameters:

- The compression level is an integer from 0 to 9 that controls the trade-off
  between compression speed and compression ratio. A level of 1 is the fastest
  and produces the least compression, while 9 is the slowest and produces the
  most compression. Compression is turned off completely when the level is 0.
- The shuffling method is an integer from -1 to 2 that controls how bytes or
  bits are rearranged before compression, which can improve the compression
  ratio. A value of 1 performs byte-wise shuffling, and a value of 2 performs
  bit-wise shuffling. If -1 is given, the default shuffling is used: bit-wise
  shuffling for buffers with an item size of 1, byte-wise shuffling otherwise.
  Shuffling is turned off completely when the value is 0.
- The block size is an integer number of bytes that sets the size of the
  blocks into which the data is split for compression. When it is set to 0,
  a block size is chosen automatically.
- The name of the compression algorithm is a string identifier corresponding
  to one of the algorithms supported by Blosc, e.g. "lz4", "zstd", "blosclz",
  "zlib" or "snappy".
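The byte-wise shuffle and the default rule for a shuffle value of -1 can be illustrated with a small pure-Python sketch. ``resolve_shuffle``, ``byte_shuffle`` and ``byte_unshuffle`` are hypothetical names. C-Blosc applies its own optimized shuffle to each block internally, so this is not its implementation; in this sketch, trailing bytes that do not fill a whole item are simply copied unchanged.

```python
NOSHUFFLE, BYTESHUFFLE, BITSHUFFLE, AUTOSHUFFLE = 0, 1, 2, -1


def resolve_shuffle(shuffle: int, itemsize: int) -> int:
    """Map a configured shuffle value to a concrete method (-1 picks the default)."""
    if shuffle not in (AUTOSHUFFLE, NOSHUFFLE, BYTESHUFFLE, BITSHUFFLE):
        raise ValueError(f"shuffle must be an integer from -1 to 2, got {shuffle!r}")
    if shuffle == AUTOSHUFFLE:
        # Default: bit-wise for 1-byte items, byte-wise otherwise.
        return BITSHUFFLE if itemsize == 1 else BYTESHUFFLE
    return shuffle


def byte_shuffle(data: bytes, itemsize: int) -> bytes:
    """Group byte 0 of every item, then byte 1 of every item, and so on."""
    if itemsize < 1:
        raise ValueError("itemsize must be at least 1")
    nitems = len(data) // itemsize
    body = data[: nitems * itemsize]
    out = bytearray()
    for byte_index in range(itemsize):
        # Strided slice: byte `byte_index` of each whole item.
        out += body[byte_index::itemsize]
    out += data[nitems * itemsize:]  # leftover partial item, copied unchanged
    return bytes(out)


def byte_unshuffle(data: bytes, itemsize: int) -> bytes:
    """Invert byte_shuffle for the same itemsize."""
    if itemsize < 1:
        raise ValueError("itemsize must be at least 1")
    nitems = len(data) // itemsize
    out = bytearray(len(data))
    for byte_index in range(itemsize):
        # Scatter the contiguous run of byte `byte_index` back to its item positions.
        out[byte_index: nitems * itemsize: itemsize] = data[byte_index * nitems:(byte_index + 1) * nitems]
    out[nitems * itemsize:] = data[nitems * itemsize:]
    return bytes(out)
```

For example, the three little-endian 16-bit integers 1, 2 and 3 (bytes ``01 00 02 00 03 00``) become ``01 02 03 00 00 00`` after ``byte_shuffle(data, 2)``: the zero high bytes end up in one contiguous run, which is why shuffling can improve the compression ratio.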

Review comment on lines +73 to +89 (Member): Suggest moving this content into the section below on "Configuring codec in array metadata", as it provides a bit more explanation about those configuration parameters.

Configuring codec in array metadata
===================================
@@TODO define how to specify in array metadata documents.

The Blosc codec can be specified as a compressor for a Zarr array under the
``compressor`` name in the corresponding array metadata document. The URI for
the Blosc codec defined in this specification, given under the ``codec`` name,
is https://purl.org/zarr/spec/codecs/blosc/1.0.

Additionally, the following parameters must be specified under the
``configuration`` name:

- the compression level is given by the ``clevel`` name;
- the shuffling method is given by the ``shuffle`` name;
- the block size is given by the ``blocksize`` name;
- the compression algorithm is given by the ``cname`` name.
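A minimal sketch of how an implementation might check a parsed ``configuration`` object against the parameter ranges given earlier in this specification. ``validate_blosc_configuration`` and ``SUPPORTED_CNAMES`` are hypothetical names. Restricting ``cname`` to the five names listed as examples, and requiring ``blocksize`` to be non-negative, are assumptions of this sketch; the algorithms actually available depend on how Blosc was built.

```python
# Assumption: only the algorithm names listed as examples in this specification.
SUPPORTED_CNAMES = {"blosclz", "lz4", "zstd", "zlib", "snappy"}
REQUIRED_KEYS = ("clevel", "shuffle", "blocksize", "cname")


def _is_int(value) -> bool:
    # JSON true/false decode to bool, a subclass of int in Python; reject them.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_blosc_configuration(config: dict) -> None:
    """Raise ValueError if `config` violates the parameter constraints."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"missing configuration parameters: {missing}")
    clevel, shuffle = config["clevel"], config["shuffle"]
    blocksize, cname = config["blocksize"], config["cname"]
    if not _is_int(clevel) or not 0 <= clevel <= 9:
        raise ValueError(f"clevel must be an integer from 0 to 9, got {clevel!r}")
    if not _is_int(shuffle) or not -1 <= shuffle <= 2:
        raise ValueError(f"shuffle must be an integer from -1 to 2, got {shuffle!r}")
    if not _is_int(blocksize) or blocksize < 0:
        # 0 means "choose the block size automatically".
        raise ValueError(f"blocksize must be a non-negative integer, got {blocksize!r}")
    if not isinstance(cname, str) or cname not in SUPPORTED_CNAMES:
        raise ValueError(f"unsupported compression algorithm: {cname!r}")
```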

For example, the array metadata document below specifies a Blosc codec
configured with a compression level of 1, byte-wise shuffling, the ``lz4``
compression algorithm and the default block size::

    {
        "compressor": {
            "codec": "https://purl.org/zarr/spec/codecs/blosc/1.0",
            "configuration": {
                "clevel": 1,
                "shuffle": 1,
                "cname": "lz4",
                "blocksize": 0
            }
        }
    }
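The example document can be produced and consumed with any JSON library. The sketch below uses Python's standard ``json`` module with two hypothetical helpers, ``make_blosc_compressor`` and ``get_blosc_configuration``. The example is embedded as strict JSON, without trailing commas, because ``json.loads`` rejects them. Treating a missing or ``null`` ``compressor`` as "no Blosc configuration" is an assumption of this sketch.

```python
import json

BLOSC_CODEC_URI = "https://purl.org/zarr/spec/codecs/blosc/1.0"

# The example array metadata document from this section, as strict JSON.
EXAMPLE_METADATA = """
{
    "compressor": {
        "codec": "https://purl.org/zarr/spec/codecs/blosc/1.0",
        "configuration": {
            "clevel": 1,
            "shuffle": 1,
            "cname": "lz4",
            "blocksize": 0
        }
    }
}
"""


def make_blosc_compressor(clevel: int, shuffle: int, cname: str, blocksize: int = 0) -> dict:
    """Build the ``compressor`` entry of an array metadata document (hypothetical helper)."""
    return {
        "codec": BLOSC_CODEC_URI,
        "configuration": {"clevel": clevel, "shuffle": shuffle, "cname": cname, "blocksize": blocksize},
    }


def get_blosc_configuration(metadata: dict):
    """Return the Blosc configuration, or None if the array uses no Blosc compressor."""
    compressor = metadata.get("compressor")
    if compressor is None or compressor.get("codec") != BLOSC_CODEC_URI:
        return None
    return compressor["configuration"]


config = get_blosc_configuration(json.loads(EXAMPLE_METADATA))
```

Writing metadata is the reverse: ``json.dumps({"compressor": make_blosc_compressor(1, 1, "lz4")})`` yields a document equivalent to the example above.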

References
==========

.. [RFC2119] S. Bradner. Key words for use in RFCs to Indicate
   Requirement Levels. March 1997. Best Current Practice. URL:
   https://tools.ietf.org/html/rfc2119

Change log
==========

@@TODO

Review comment: I'll look up the best reference for these.