diff --git a/docs/codecs.rst b/docs/codecs.rst
index 5e6702a8..6f0b3f5b 100644
--- a/docs/codecs.rst
+++ b/docs/codecs.rst
@@ -9,3 +9,4 @@ Under construction.
    :caption: Contents:
 
    codecs/gzip/v1.0
+   codecs/blosc/v1.0
diff --git a/docs/codecs/blosc/v1.0.rst b/docs/codecs/blosc/v1.0.rst
new file mode 100644
index 00000000..5634fd21
--- /dev/null
+++ b/docs/codecs/blosc/v1.0.rst
@@ -0,0 +1,141 @@
+=========================
+Blosc Codec (version 1.0)
+=========================
+---------------------------------
+ Editor's draft 29 September 2020
+---------------------------------
+
+Specification URI:
+    https://purl.org/zarr/spec/codecs/blosc/1.0
+Issue tracking:
+    `GitHub issues `_
+Suggest an edit for this spec:
+    `GitHub editor `_
+
+Copyright 2020 `Zarr core development
+team `_ (@@TODO
+list institutions?). This work is licensed under a `Creative Commons
+Attribution 3.0 Unported
+License `_.
+
+----
+
+
+Abstract
+========
+
+This specification defines a codec for chunk compression using Blosc.
+
+
+Status of this document
+=======================
+
+This document is a **Work in Progress**. It may be updated, replaced
+or obsoleted by other documents at any time. It is inappropriate to
+cite this document as other than work in progress.
+
+Comments, questions or contributions to this document are very
+welcome. Comments and questions should be raised via `GitHub issues
+`_. When
+raising an issue, please add the label "codecs-blosc-v1.0".
+
+This document was produced by the `Zarr core development team
+`_.
+
+
+Document conventions
+====================
+
+Conformance requirements are expressed with a combination of
+descriptive assertions and [RFC2119]_ terminology. The key words
+"MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
+"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in the normative
+parts of this document are to be interpreted as described in
+[RFC2119]_.
+However, for readability, these words do not appear in all
+uppercase letters in this specification.
+
+All of the text of this specification is normative except sections
+explicitly marked as non-normative, examples, and notes. Examples in
+this specification are introduced with the words "for example".
+
+
+Chunk encoding/decoding with Blosc
+==================================
+
+@@TODO define how chunks are encoded and decoded
+@@TODO be sure to clarify that the encoded data should conform to the Blosc file format
+
+Chunks are encoded and decoded using the compression algorithm implemented in the
+`C-Blosc library version 1 `_ or
+`C-Blosc library version 2 `_,
+and the encoded data should be stored as is, without further modification.
+
+The compressor can be configured through the following parameters:
+
+- The compression level is an integer from 0 to 9 that controls the trade-off
+  between compression speed and compression ratio. A level of 1 is the fastest
+  and produces the least compression, while 9 is the slowest and produces the
+  most compression. Compression is turned off completely when the level is 0.
+- The shuffling method is an integer from -1 to 2 that controls how bytes or
+  bits are rearranged before compression, which can improve the compression
+  ratio. A value of 1 performs byte-wise shuffling and a value of 2 performs
+  bit-wise shuffling. If a value of -1 is given, the default shuffling is used:
+  bit-wise shuffling for buffers with an item size of 1, byte-wise shuffling
+  otherwise. Shuffling is turned off completely when the value is 0.
+- The block size is an integer number of bytes giving the size of the
+  compressed blocks. When it is 0, a block size is chosen automatically.
+- The compression algorithm is a string identifier naming one of the
+  algorithms supported by Blosc, e.g. "lz4", "zstd", "blosclz", "zlib" or
+  "snappy".
+
+
+Configuring codec in array metadata
+===================================
+
+@@TODO define how to specify in array metadata documents.
+
+The Blosc codec can be specified as a compressor for a Zarr array under the
+``compressor`` name in the corresponding array metadata document. The value
+of the ``codec`` name within the compressor is the URI of the Blosc codec
+defined in this specification, https://purl.org/zarr/spec/codecs/blosc/1.0.
+
+Additionally, the following parameters must be specified under the
+``configuration`` name of the compressor:
+
+- the compression level is given by the ``clevel`` name;
+- the shuffling method is given by the ``shuffle`` name;
+- the size of the compressed blocks is given by the ``blocksize`` name;
+- the compression algorithm is given by the ``cname`` name.
+
+For example, the array metadata document below (with members unrelated
+to compression omitted) specifies a Blosc codec configured with a
+compression level of 1, byte-wise shuffling, the ``lz4`` compression
+algorithm and the default (automatic) block size::
+
+
+    {
+        "compressor": {
+            "codec": "https://purl.org/zarr/spec/codecs/blosc/1.0",
+            "configuration": {
+                "clevel": 1,
+                "shuffle": 1,
+                "cname": "lz4",
+                "blocksize": 0
+            }
+        }
+    }
+
+
+References
+==========
+
+.. [RFC2119] S. Bradner. Key words for use in RFCs to Indicate
+   Requirement Levels. March 1997. Best Current Practice. URL:
+   https://tools.ietf.org/html/rfc2119
+
+
+
+Change log
+==========
+
+@@TODO
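The parameter rules in the "Configuring codec in array metadata" section can be checked mechanically before any chunk is decoded. The non-normative Python sketch below does this with the standard library only. The helper name ``validate_blosc_configuration`` is hypothetical, and the ``KNOWN_CNAMES`` set is an assumption (the compressor names shipped with C-Blosc 1.x); the specification itself only lists examples of valid names.

```python
import json

# URI identifying the Blosc codec defined by this specification.
BLOSC_CODEC_URI = "https://purl.org/zarr/spec/codecs/blosc/1.0"

# Assumption: compressor names supported by C-Blosc 1.x. The specification
# only gives examples, so treat this set as illustrative, not normative.
KNOWN_CNAMES = {"blosclz", "lz4", "lz4hc", "snappy", "zlib", "zstd"}

REQUIRED_KEYS = ("clevel", "shuffle", "blocksize", "cname")


def _is_int(value):
    # JSON true/false decode to Python bool, a subclass of int; reject them.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_blosc_configuration(metadata):
    """Hypothetical helper: check the ``compressor`` member of a parsed
    array metadata document and return its ``configuration`` dict.

    Raises ValueError if a rule from the specification is violated.
    """
    compressor = metadata.get("compressor")
    if not isinstance(compressor, dict):
        raise ValueError("missing 'compressor' object")
    if compressor.get("codec") != BLOSC_CODEC_URI:
        raise ValueError("compressor is not the Blosc 1.0 codec")
    config = compressor.get("configuration")
    if not isinstance(config, dict):
        raise ValueError("missing 'configuration' object")

    # All four parameters must be present.
    for key in REQUIRED_KEYS:
        if key not in config:
            raise ValueError(f"missing required parameter {key!r}")

    clevel = config["clevel"]
    if not _is_int(clevel) or not 0 <= clevel <= 9:
        raise ValueError("clevel must be an integer from 0 to 9")
    shuffle = config["shuffle"]
    if not _is_int(shuffle) or not -1 <= shuffle <= 2:
        raise ValueError("shuffle must be an integer from -1 to 2")
    blocksize = config["blocksize"]
    if not _is_int(blocksize) or blocksize < 0:
        # Assumption: negative sizes are invalid; 0 means "choose automatically".
        raise ValueError("blocksize must be a non-negative integer")
    cname = config["cname"]
    if not isinstance(cname, str) or cname not in KNOWN_CNAMES:
        raise ValueError(f"unsupported compression algorithm {cname!r}")
    return config


# The example from the specification, as it would be parsed from JSON.
document = json.loads("""
{
    "compressor": {
        "codec": "https://purl.org/zarr/spec/codecs/blosc/1.0",
        "configuration": {"clevel": 1, "shuffle": 1, "cname": "lz4", "blocksize": 0}
    }
}
""")
config = validate_blosc_configuration(document)
print(config["cname"], config["clevel"])  # prints: lz4 1
```

Once validated, the four values correspond one-to-one to the keyword arguments of ``numcodecs.Blosc(cname=..., clevel=..., shuffle=..., blocksize=...)`` in the numcodecs Python package, which wraps C-Blosc; the sketch itself does not depend on that package.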