1 change: 1 addition & 0 deletions docs/codecs.rst
@@ -9,3 +9,4 @@ Under construction.
:caption: Contents:

codecs/gzip/v1.0
codecs/blosc/v1.0
141 changes: 141 additions & 0 deletions docs/codecs/blosc/v1.0.rst
@@ -0,0 +1,141 @@
=========================
Blosc Codec (version 1.0)
=========================
---------------------------------
Editor's draft 29 September 2020
---------------------------------

Specification URI:
    https://purl.org/zarr/spec/codecs/blosc/1.0
Issue tracking:
    `GitHub issues <https://github.com/zarr-developers/zarr-specs/labels/codecs-blosc-v1.0>`_
Suggest an edit for this spec:
    `GitHub editor <https://github.com/zarr-developers/zarr-specs/blob/core-protocol-v3.0-dev/docs/codecs/blosc/v1.0.rst>`_

Copyright 2020 `Zarr core development
team <https://github.com/orgs/zarr-developers/teams/core-devs>`_ (@@TODO
list institutions?). This work is licensed under a `Creative Commons
Attribution 3.0 Unported
License <https://creativecommons.org/licenses/by/3.0/>`_.

----


Abstract
========

This specification defines a codec for compressing chunks using Blosc.


Status of this document
=======================

This document is a **Work in Progress**. It may be updated, replaced
or obsoleted by other documents at any time. It is inappropriate to
cite this document as other than work in progress.

Comments, questions or contributions to this document are very
welcome. Comments and questions should be raised via `GitHub issues
<https://github.com/zarr-developers/zarr-specs/labels/codecs-blosc-v1.0>`_. When
raising an issue, please add the label "codecs-blosc-v1.0".

This document was produced by the `Zarr core development team
<https://github.com/orgs/zarr-developers/teams/core-devs>`_.


Document conventions
====================

Conformance requirements are expressed with a combination of
descriptive assertions and [RFC2119]_ terminology. The key words
"MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in the normative
parts of this document are to be interpreted as described in
[RFC2119]_. However, for readability, these words do not appear in all
uppercase letters in this specification.

All of the text of this specification is normative except sections
explicitly marked as non-normative, examples, and notes. Examples in
this specification are introduced with the words "for example".


Chunk encoding/decoding with Blosc
==================================

@@TODO define how chunks are encoded and decoded
@@TODO be sure to clarify that the encoded data should conform to the Blosc file format
Comment on lines +65 to +66

Member: I'll look up the best reference for these.


Chunks are encoded and decoded using the compression algorithm implemented in the
`C-Blosc library version 1 <https://github.com/Blosc/c-blosc>`_ or
`C-Blosc library version 2 <https://github.com/Blosc/c-blosc2>`_,
and the encoded data should be stored as is.

Contributor: Let's remove Blosc2, which is not production ready.

Member: Yes, Blosc2 is a different codec, which would need to be covered in a separate codec spec.
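Because the encoded data is stored as is, a reader can inspect the Blosc header at the start of each stored chunk. The following Python sketch assumes the 16-byte header layout documented for C-Blosc version 1 (four single-byte fields followed by three little-endian 32-bit sizes) and the flag bits from that same documentation. ``read_blosc_header`` and ``BloscHeader`` are hypothetical names, not part of this specification or of any Blosc binding.

```python
import struct
from dataclasses import dataclass

# Assumed C-Blosc 1.x header layout: 4 unsigned bytes followed by
# 3 little-endian uint32 values, 16 bytes in total.
BLOSC_HEADER = struct.Struct("<BBBBIII")

# Assumed flag bits, taken from the C-Blosc 1.x header documentation.
FLAG_BYTE_SHUFFLE = 0x01
FLAG_MEMCPYED = 0x02
FLAG_BIT_SHUFFLE = 0x04


@dataclass
class BloscHeader:
    version: int    # Blosc format version
    versionlz: int  # format version of the internal compressor
    flags: int      # shuffle / memcpy / compressor bits
    typesize: int   # item size in bytes
    nbytes: int     # size of the uncompressed data
    blocksize: int  # size of the internal blocks
    cbytes: int     # size of the compressed buffer, header included

    @property
    def byte_shuffled(self) -> bool:
        return bool(self.flags & FLAG_BYTE_SHUFFLE)

    @property
    def bit_shuffled(self) -> bool:
        return bool(self.flags & FLAG_BIT_SHUFFLE)

    @property
    def memcpyed(self) -> bool:
        return bool(self.flags & FLAG_MEMCPYED)


def read_blosc_header(encoded: bytes) -> BloscHeader:
    """Parse the header at the start of a chunk stored as is (hypothetical helper)."""
    if len(encoded) < BLOSC_HEADER.size:
        raise ValueError("encoded chunk is shorter than the 16-byte Blosc header")
    header = BloscHeader(*BLOSC_HEADER.unpack_from(encoded, 0))
    # A chunk stored as is should be exactly as long as the header says.
    if header.cbytes != len(encoded):
        raise ValueError("compressed size in header does not match the chunk length")
    return header
```

Checking ``cbytes`` against the stored length is one way a reader could detect that a chunk was not stored as is; it is a design choice of this sketch, not a requirement stated here.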

The compressor can be configured through the following parameters:

- The compression level is an integer from 0 to 9 that controls the speed and
  degree of compression. A level of 1 is the fastest and produces the least
  compression, while 9 is the slowest and produces the most compression.
  Compression is turned off completely when the level is 0.
- The shuffling method is an integer from -1 to 2 that controls how bytes or
  bits are rearranged before compression, which can improve the compression
  ratio. A value of 1 performs byte-wise shuffling and a value of 2 performs
  bit-wise shuffling. If a value of -1 is given, the default shuffling is used:
  bit-wise shuffling for buffers with an item size of 1, byte-wise shuffling
  otherwise. Shuffling is turned off completely when the value is 0.
- The size of the compressed blocks is an integer number of bytes. When it is
  set to 0, the block size is chosen automatically.
- The name of the compression algorithm is a string identifier corresponding
  to one of the algorithms supported by Blosc, e.g. "lz4", "zstd", "blosclz",
  "zlib" or "snappy".
Comment on lines +73 to +89

Member: Suggest moving this content into the section below on "Configuring codec in array metadata", as it provides a bit more explanation about those configuration parameters.
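The parameter ranges described earlier in this section can be checked before a chunk is encoded. This is a minimal Python sketch, assuming the ranges exactly as stated in the parameter list; the function names and the ``KNOWN_CNAMES`` set (limited to the example names listed there) are hypothetical, and a given Blosc build may support a different set of algorithms.

```python
# Shuffle values as described in this section (constant names are hypothetical).
AUTOSHUFFLE, NOSHUFFLE, BYTESHUFFLE, BITSHUFFLE = -1, 0, 1, 2

# Only the example names from this section; an assumption, not a fixed list.
KNOWN_CNAMES = {"lz4", "zstd", "blosclz", "zlib", "snappy"}


def _is_int(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_blosc_parameters(clevel, shuffle, blocksize, cname) -> None:
    """Raise ValueError if a parameter falls outside the described ranges."""
    if not _is_int(clevel) or not 0 <= clevel <= 9:
        raise ValueError(f"clevel must be an integer from 0 to 9, got {clevel!r}")
    if not _is_int(shuffle) or not -1 <= shuffle <= 2:
        raise ValueError(f"shuffle must be an integer from -1 to 2, got {shuffle!r}")
    if not _is_int(blocksize) or blocksize < 0:
        raise ValueError(f"blocksize must be a non-negative integer, got {blocksize!r}")
    if cname not in KNOWN_CNAMES:
        raise ValueError(f"unsupported compressor name {cname!r}")


def effective_shuffle(shuffle: int, itemsize: int) -> int:
    """Resolve -1 (default shuffling) to the concrete method for an item size."""
    if shuffle == AUTOSHUFFLE:
        # Bit-wise shuffling for 1-byte items, byte-wise shuffling otherwise.
        return BITSHUFFLE if itemsize == 1 else BYTESHUFFLE
    return shuffle
```

For example, ``effective_shuffle(-1, 8)`` resolves to byte-wise shuffling for 8-byte items, while a value of 0 is passed through unchanged and disables shuffling.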



Configuring codec in array metadata
===================================

@@TODO define how to specify in array metadata documents.

The Blosc codec can be specified as a compressor for a Zarr array under the
``compressor`` name in the corresponding array metadata document. The URI for
the Blosc codec defined in this specification is
https://purl.org/zarr/spec/codecs/blosc/1.0.

Additionally, the following parameters must be specified under the
``configuration`` name:

- the compression level value is given by the ``clevel`` name.
- the shuffling method value is given by the ``shuffle`` name.
- the size of the compressed blocks is given by the ``blocksize`` name.
- the compression algorithm is given by the ``cname`` name.
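The mapping from parameters to metadata names can be sketched in Python as a helper that builds the ``compressor`` entry of an array metadata document. The helper name ``blosc_compressor_metadata`` is hypothetical; the codec URI and the names ``clevel``, ``shuffle``, ``blocksize`` and ``cname`` are the ones given in this section, and the ``codec`` key follows the example in this section.

```python
import json

BLOSC_CODEC_URI = "https://purl.org/zarr/spec/codecs/blosc/1.0"


def blosc_compressor_metadata(clevel: int, shuffle: int, cname: str, blocksize: int = 0) -> dict:
    """Return the ``compressor`` entry for an array metadata document."""
    return {
        "codec": BLOSC_CODEC_URI,
        "configuration": {
            "clevel": clevel,
            "shuffle": shuffle,
            "cname": cname,
            "blocksize": blocksize,  # 0 selects an automatic block size
        },
    }


# Level 1, byte-wise shuffling, lz4, automatic block size.
metadata = {"compressor": blosc_compressor_metadata(1, 1, "lz4")}
print(json.dumps(metadata, indent=4))
```

Defaulting ``blocksize`` to 0 in this sketch is a convenience only; the value is still written out because all four parameters must be specified.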

For example, the array metadata document below specifies a Blosc codec
configured with a compression level of 1, byte-wise shuffling, the ``lz4``
compression algorithm and the default (automatic) block size::


    {
        "compressor": {
            "codec": "https://purl.org/zarr/spec/codecs/blosc/1.0",
            "configuration": {
                "clevel": 1,
                "shuffle": 1,
                "cname": "lz4",
                "blocksize": 0
            }
        }
    }
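Going the other way, a reader can extract the Blosc configuration from a metadata document such as the example above. This is a minimal Python sketch, assuming the document is available as JSON text; ``read_blosc_configuration`` is a hypothetical helper, and it treats a missing parameter as an error because all four parameters must be specified.

```python
import json

BLOSC_CODEC_URI = "https://purl.org/zarr/spec/codecs/blosc/1.0"
REQUIRED_KEYS = ("clevel", "shuffle", "blocksize", "cname")


def read_blosc_configuration(document: str) -> dict:
    """Extract the Blosc parameters from array metadata given as JSON text."""
    metadata = json.loads(document)
    compressor = metadata.get("compressor") if isinstance(metadata, dict) else None
    # The codec is identified by its URI.
    if not isinstance(compressor, dict) or compressor.get("codec") != BLOSC_CODEC_URI:
        raise ValueError("the array is not configured with the Blosc 1.0 codec")
    configuration = compressor.get("configuration")
    if not isinstance(configuration, dict):
        raise ValueError("the Blosc codec requires a configuration object")
    # All four parameters must be specified, so a missing key is an error.
    missing = [key for key in REQUIRED_KEYS if key not in configuration]
    if missing:
        raise ValueError(f"missing Blosc configuration parameters: {missing}")
    return {key: configuration[key] for key in REQUIRED_KEYS}
```

Applied to the example document, the helper returns a level of 1, a shuffle value of 1, a block size of 0 and ``lz4`` as the algorithm name.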


References
==========

.. [RFC2119] S. Bradner. Key words for use in RFCs to Indicate
   Requirement Levels. March 1997. Best Current Practice. URL:
   https://tools.ietf.org/html/rfc2119



Change log
==========

@@TODO