Add initial draft for blosc codec #95
davidbrochart wants to merge 2 commits into zarr-developers:core-protocol-v3.0-dev
| Chunks are encoded and decoded using the compression algorithm implemented in the
| `C-Blosc library version 1 <https://github.com/Blosc/c-blosc>`_ or
| `C-Blosc library version 2 <https://github.com/Blosc/c-blosc2>`_,
Let's remove blosc2, which is not production ready.
Yes, blosc2 is a different codec that would need to be covered in a separate codec spec.
| The compressor can be configured through the following parameters:
|
| - The compression level is an integer from 0 to 9 which controls the speed and
| level of compression. A level of 1 is the fastest compression method and
| produces the least compression, while 9 is slowest and produces the most
| compression. Compression is turned off completely when level is 0.
| - The shuffling method is an integer from -1 to 2 which controls the way bytes or
| bits are rearranged, which can lead to a greater compression.
| A value of 1 performs byte-wise shuffling, and a value of 2 performs bit-wise
| shuffling. If a value of -1 is given, then default shuffling is used: bit-wise
| shuffling for buffers with item size 1, byte-wise shuffling otherwise.
| Shuffling is turned off completely when the method value is 0.
| - The size of the compressed blocks is an integer number of bytes. When
| it is set to 0, an automatic size will be used.
| - The name of the compression algorithm is a string identifier corresponding
| to one of the algorithms supported by Blosc, e.g. "lz4", "zstd", "blosclz",
| "zlib" or "snappy".
Suggest moving this content into the section below on "Configuring codec in array metadata", as it provides a bit more explanation about those configuration parameters.
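To make the parameter ranges in the draft concrete, here is a minimal sketch of how an implementation might validate such a configuration and resolve the default shuffle value. The key names (`cname`, `clevel`, `shuffle`, `blocksize`), the function names, and the constant names are assumptions for illustration and are not defined by the draft; only the value ranges and the rule for -1 come from the text above. The list of algorithm names is the draft's "e.g." list, so it may not be exhaustive.

```python
# Hypothetical names throughout; only the value ranges come from the draft text.
SHUFFLE_AUTO, SHUFFLE_NONE, SHUFFLE_BYTE, SHUFFLE_BIT = -1, 0, 1, 2

# Example algorithm names quoted in the draft; possibly not the full set.
EXAMPLE_CNAMES = {"lz4", "zstd", "blosclz", "zlib", "snappy"}


def validate_blosc_config(config):
    """Check a configuration dict against the ranges described in the draft."""
    clevel = config["clevel"]
    shuffle = config["shuffle"]
    blocksize = config["blocksize"]
    cname = config["cname"]

    if not isinstance(clevel, int) or not 0 <= clevel <= 9:
        raise ValueError("clevel must be an integer from 0 to 9")
    if not isinstance(shuffle, int) or not -1 <= shuffle <= 2:
        raise ValueError("shuffle must be an integer from -1 to 2")
    if not isinstance(blocksize, int) or blocksize < 0:
        raise ValueError("blocksize must be a non-negative integer (0 = automatic)")
    if cname not in EXAMPLE_CNAMES:
        raise ValueError("unknown compression algorithm name: %r" % (cname,))
    return dict(config)


def resolve_shuffle(shuffle, itemsize):
    """Map -1 to a concrete method: bit-wise for item size 1, byte-wise otherwise."""
    if shuffle == SHUFFLE_AUTO:
        return SHUFFLE_BIT if itemsize == 1 else SHUFFLE_BYTE
    return shuffle
```

With this sketch, `resolve_shuffle(-1, 1)` selects bit-wise shuffling and `resolve_shuffle(-1, 8)` selects byte-wise shuffling, which matches the rule stated in the draft.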
| @@TODO define how chunks are encoded and decoded
| @@TODO be sure to clarify that the encoded data should conform to the Blosc file format
I'll look up the best reference for these.
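As a possible starting point for the second TODO, here is a minimal sketch of reading the fixed header of a C-Blosc 1 chunk. It assumes the 16-byte little-endian header layout described in the c-blosc repository's chunk format README (version, versionlz, flags, typesize, then three 32-bit sizes), and the flag bit meanings used here should be checked against that document before being relied on. The function name and the returned dict keys are hypothetical.

```python
import struct

# Assumed C-Blosc 1 header: 4 single-byte fields followed by 3 uint32 fields,
# all little-endian, 16 bytes in total.
BLOSC1_HEADER = struct.Struct("<BBBBIII")


def parse_blosc1_header(chunk):
    """Return the header fields of a Blosc 1 chunk as a dict (sketch only)."""
    if len(chunk) < BLOSC1_HEADER.size:
        raise ValueError("chunk is shorter than the 16-byte Blosc header")
    (version, versionlz, flags, typesize,
     nbytes, blocksize, cbytes) = BLOSC1_HEADER.unpack_from(chunk)
    return {
        "version": version,        # Blosc format version
        "versionlz": versionlz,    # version of the internal compressor format
        "typesize": typesize,      # item size in bytes
        "nbytes": nbytes,          # uncompressed size
        "blocksize": blocksize,    # size of internal blocks
        "cbytes": cbytes,          # compressed size, header included
        "byte_shuffle": bool(flags & 0x01),  # assumed: bit 0 = byte shuffle
        "bit_shuffle": bool(flags & 0x04),   # assumed: bit 2 = bit shuffle
    }
```

A decoder could use a check like this to reject data that is too short to be a Blosc chunk before handing it to the library.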
Hi @davidbrochart, many thanks for doing this. On reflection I thought it might make our lives easier in future if we refactored all the codec specs into a single document, and so I have taken the content from this PR and refactored it into the proposed PR #102. Your thoughts on this would be very welcome.

Hi @alimanfoo, I agree, let's close this PR then.
I have many questions: