Description
numcodecs currently treats a return value of 0 from ZSTD_getDecompressedSize as an input error. A value of zero could mean one of the following.
- empty
- unknown
- error
Lines 151 to 153 in 366318f:

    dest_size = ZSTD_getDecompressedSize(source_ptr, source_size)
    if dest_size == 0:
        raise RuntimeError('Zstd decompression error: invalid input data')
Instead, numcodecs should use ZSTD_getFrameContentSize, whose return value distinguishes these cases:
- 0 means empty
- 0xffffffffffffffff, ZSTD_CONTENTSIZE_UNKNOWN, means unknown
- 0xfffffffffffffffe, ZSTD_CONTENTSIZE_ERROR, means error
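To make the three cases concrete, here is a minimal pure-Python sketch that reads the content size field from a Zstandard frame header, following the frame format described in RFC 8878. It is an illustration only, not the numcodecs or libzstd implementation: the function name `frame_content_size` and the Python constants are hypothetical, and the sketch ignores skippable frames, reserved bits, and multi-frame inputs that the real ZSTD_getFrameContentSize accounts for.

```python
ZSTD_MAGIC = 0xFD2FB528
# Hypothetical Python mirrors of the zstd.h sentinel values.
CONTENTSIZE_UNKNOWN = 0xFFFFFFFFFFFFFFFF
CONTENTSIZE_ERROR = 0xFFFFFFFFFFFFFFFE


def frame_content_size(buf: bytes) -> int:
    """Return the content size declared by the first frame header in buf."""
    # A frame starts with a 4-byte little-endian magic number
    # followed by a 1-byte Frame_Header_Descriptor.
    if len(buf) < 5 or int.from_bytes(buf[:4], "little") != ZSTD_MAGIC:
        return CONTENTSIZE_ERROR

    descriptor = buf[4]
    fcs_flag = descriptor >> 6             # bits 7-6: Frame_Content_Size_flag
    single_segment = (descriptor >> 5) & 1  # bit 5: Single_Segment_flag
    dict_id_flag = descriptor & 3           # bits 1-0: Dictionary_ID_flag

    pos = 5
    if not single_segment:
        pos += 1                            # Window_Descriptor is present
    pos += (0, 1, 2, 4)[dict_id_flag]       # optional Dictionary_ID field

    # Width of the Frame_Content_Size field in bytes.
    if fcs_flag == 0:
        fcs_len = 1 if single_segment else 0
    else:
        fcs_len = (2, 4, 8)[fcs_flag - 1]

    if len(buf) < pos + fcs_len:
        return CONTENTSIZE_ERROR            # header is truncated
    if fcs_len == 0:
        return CONTENTSIZE_UNKNOWN          # no size was recorded in the frame

    value = int.from_bytes(buf[pos:pos + fcs_len], "little")
    if fcs_len == 2:
        value += 256                        # 2-byte sizes are stored offset by 256
    return value
```

With this sketch, a frame that declares a size of zero, a frame that omits the size, and non-Zstandard input produce three different return values, which is the distinction that a bare `== 0` check cannot make.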
See zstd.h or the manual for a reference.
https://github.com/facebook/zstd/blob/7cf62bc274105f5332bf2d28c57cb6e5669da4d8/lib/zstd.h#L195-L203
https://facebook.github.io/zstd/zstd_manual.html
This error arose during the implementation of Zstandard in n5-zarr:
saalfeldlab/n5-zarr#35
There the compressor was producing blocks for which ZSTD_getFrameContentSize returns ZSTD_CONTENTSIZE_UNKNOWN. For these blocks, ZSTD_getDecompressedSize returns 0, and numcodecs incorrectly interprets this as an error.
Handling ZSTD_CONTENTSIZE_UNKNOWN may be difficult.
- If a dest buffer is provided, then perhaps its size should be used as the expected decompressed size, and an error should be raised if the actual decompressed size differs.
- If a dest buffer is not provided, we may need to either use a default size or use the streaming API to fill a growing buffer until all the data is decompressed.
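The options above could be organized as a small dispatch step that runs before any decompression call. The following is only a sketch of that decision logic under stated assumptions: `plan_decompression`, its strategy labels, and the Python constants are hypothetical names introduced for illustration, and the actual decompression (one-shot or streaming) is left out.

```python
from typing import Optional, Tuple

# Hypothetical Python mirrors of the zstd.h sentinel values.
CONTENTSIZE_UNKNOWN = 0xFFFFFFFFFFFFFFFF
CONTENTSIZE_ERROR = 0xFFFFFFFFFFFFFFFE


def plan_decompression(
    content_size: int, dest_len: Optional[int] = None
) -> Tuple[str, Optional[int]]:
    """Pick a strategy and an expected output size from the frame content size.

    content_size is the value reported by ZSTD_getFrameContentSize;
    dest_len is the size of a caller-supplied dest buffer, if any.
    """
    if content_size == CONTENTSIZE_ERROR:
        # Only a genuine error is reported as invalid input.
        raise RuntimeError("Zstd decompression error: invalid input data")

    if content_size == CONTENTSIZE_UNKNOWN:
        if dest_len is not None:
            # Treat the dest buffer size as the expected decompressed size;
            # the caller should fail if the actual output size differs.
            return ("one_shot_expect_dest", dest_len)
        # No size information at all: grow a buffer with the streaming API.
        return ("streaming", None)

    # Known size, including 0 for an empty frame, which is not an error.
    return ("one_shot", content_size)
```

For example, `plan_decompression(0)` returns `("one_shot", 0)` for an empty frame instead of raising, while `plan_decompression(CONTENTSIZE_UNKNOWN)` selects the streaming path.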