Add initial draft for blosc codec#95

Closed
davidbrochart wants to merge 2 commits into zarr-developers:core-protocol-v3.0-dev from davidbrochart:core-protocol-v3.0-dev

Conversation

@davidbrochart
Contributor

I have many questions:

  • should we provide a compression algorithm name, as in numcodecs?
  • there doesn't seem to be a reference document for Blosc. Is the reference the C implementation GitHub URL?
  • it looks like Blosc and Blosc2 are incompatible. Should we provide the version number in the metadata?
  • is the file format simply the raw storage of the in-memory compressed data?

@davidbrochart force-pushed the core-protocol-v3.0-dev branch 2 times, most recently from 81b9154 to 55c0332, on September 29, 2020 09:28

Chunks are encoded and decoded using the compression algorithm implemented in the
`C-Blosc library version 1 <https://github.com/Blosc/c-blosc>`_ or
`C-Blosc library version 2 <https://github.com/Blosc/c-blosc2>`_,
Contributor

Let's remove Blosc2, which is not production ready.

Member

Yes, Blosc2 is a different codec that would need to be covered in a separate codec spec.

Comment on lines +73 to +89
The compressor can be configured through the following parameters:

- The compression level is an integer from 0 to 9 which controls the speed and
  level of compression. A level of 1 is the fastest compression method and
  produces the least compression, while 9 is slowest and produces the most
  compression. Compression is turned off completely when level is 0.
- The shuffling method is an integer from -1 to 2 which controls the way bytes or
  bits are rearranged, which can lead to better compression.
  A value of 1 performs byte-wise shuffling, and a value of 2 performs bit-wise
  shuffling. If a value of -1 is given, then default shuffling is used: bit-wise
  shuffling for buffers with item size 1, byte-wise shuffling otherwise.
  Shuffling is turned off completely when the method value is 0.
- The size of the compressed blocks is an integer number of bytes. When
  it is set to 0, an automatic size will be used.
- The name of the compression algorithm is a string identifier corresponding
  to one of the algorithms supported by Blosc, e.g. "lz4", "zstd", "blosclz",
  "zlib" or "snappy".
Member

Suggest moving this content into the section below on "Configuring codec in array metadata" as it provides a bit more explanation about those configuration parameters.
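
For reference, the parameter ranges quoted in the diff above can be checked with a short sketch like the one below. This is only an illustration under stated assumptions: the names `validate_blosc_config`, `resolve_shuffle`, and `EXAMPLE_CNAMES` are hypothetical and not part of the draft, and the algorithm set contains only the examples the draft lists after "e.g.", so a real implementation would likely accept more names.

```python
# Illustrative sketch only: helper names and the returned dict layout are
# hypothetical and are not defined by the draft spec.

# Example algorithm names listed in the draft (introduced there with "e.g.",
# so this set is not claimed to be exhaustive).
EXAMPLE_CNAMES = frozenset({"lz4", "zstd", "blosclz", "zlib", "snappy"})

# Shuffle method values as described in the draft text.
AUTOSHUFFLE, NOSHUFFLE, SHUFFLE, BITSHUFFLE = -1, 0, 1, 2


def resolve_shuffle(shuffle, itemsize):
    """Map the default value -1 to a concrete method, as the draft describes:
    bit-wise shuffling for item size 1, byte-wise shuffling otherwise."""
    if shuffle == AUTOSHUFFLE:
        return BITSHUFFLE if itemsize == 1 else SHUFFLE
    return shuffle


def validate_blosc_config(cname, clevel, shuffle, blocksize,
                          known_cnames=EXAMPLE_CNAMES):
    """Check the four parameters against the ranges given in the draft."""
    if cname not in known_cnames:
        raise ValueError(f"unknown compression algorithm: {cname!r}")
    if not isinstance(clevel, int) or not 0 <= clevel <= 9:
        raise ValueError("clevel must be an integer from 0 to 9")
    if shuffle not in (AUTOSHUFFLE, NOSHUFFLE, SHUFFLE, BITSHUFFLE):
        raise ValueError("shuffle must be one of -1, 0, 1, 2")
    if not isinstance(blocksize, int) or blocksize < 0:
        raise ValueError("blocksize must be a non-negative integer (0 = automatic)")
    return {"cname": cname, "clevel": clevel,
            "shuffle": shuffle, "blocksize": blocksize}


config = validate_blosc_config("zstd", clevel=5, shuffle=-1, blocksize=0)
effective_shuffle = resolve_shuffle(config["shuffle"], itemsize=4)
```

Keeping the default value -1 in the stored configuration and resolving it only when the item size is known is one possible design choice; the draft itself does not say when the resolution should happen.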

Comment on lines +65 to +66
@@TODO define how chunks are encoded and decoded
@@TODO be sure to clarify that the encoded data should conform to the Blosc file format
Member

I'll look up the best reference for these.

@alimanfoo
Member

Hi @davidbrochart, many thanks for doing this. On reflection I thought it might make our lives easier in future if we refactored all the codec specs into a single document, and so I have taken the content from this PR and refactored it into the proposed PR #102. Your thoughts on this would be very welcome.

@davidbrochart
Contributor Author

Hi @alimanfoo, I agree, let's close this PR then.
