BITSHUFFLE filter not supported out of the box for HDF5 reading #981

Closed
jamesrhester opened this issue Jul 2, 2022 · 9 comments · Fixed by #986

Comments

@jamesrhester
Contributor

Commit #899 added support for using the BITSHUFFLE filter when writing, but it does not register the filter for reading. As a result, HDF5 files written with the bitshuffle filter are not, in general, readable without an external filter.

So in HDF5 the Blosc and bitshuffle filters have distinct IDs (32001 and 32008 respectively) but line 148 in H5ZBlosc.jl

filterid(::Type{BloscFilter}) = H5Z_FILTER_BLOSC

attaches the filter to id 32001 only.
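
To see which of the two filter IDs the local HDF5 library can actually resolve, something like the following sketch may help. It assumes that `HDF5.API.h5z_filter_avail` (the wrapper around the C function `H5Zfilter_avail`) is exposed by the installed HDF5.jl version; the constant names are illustrative only.

```julia
using HDF5

# Filter IDs as quoted in this issue; the constant names are hypothetical.
const BLOSC_ID = 32001
const BITSHUFFLE_ID = 32008

# Ask the HDF5 library whether each filter can currently be resolved.
for (name, id) in (("Blosc", BLOSC_ID), ("bitshuffle", BITSHUFFLE_ID))
    available = HDF5.API.h5z_filter_avail(id)
    println(name, " (id ", id, "): ", available ? "available" : "not available")
end
```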

It looks to me like adding an abstract type or two might be a quick fix for this.

@mkitti
Member

mkitti commented Jul 2, 2022

To clarify, the Blosc plugin (id 32001) cannot be used to read data that is encoded via the bitshuffle filter directly.

In the documentation, we provide an example that specifically details how one might create a new filter type for bitshuffle:
https://juliaio.github.io/HDF5.jl/stable/filters/#Creating-a-new-Filter-type

We also show how one can use the external filter interface:
https://juliaio.github.io/HDF5.jl/stable/filters/#Other-External-Filters

@mkitti
Member

mkitti commented Jul 2, 2022

If you would like to add bitshuffle plugin support, the starting place would be to add a build_tarballs.jl recipe for bitshuffle to
https://github.com/JuliaPackaging/Yggdrasil via https://github.com/JuliaPackaging/BinaryBuilder.jl .

Once we have a bitshuffle_jll package, one could then create an H5Zbitshuffle package similar to those in
https://github.com/JuliaIO/HDF5.jl/tree/master/filters

@jamesrhester
Contributor Author

I see. I had hoped that passing the BITSHUFFLE option to the Blosc filter would be sufficient.

@mkitti
Member

mkitti commented Jul 2, 2022

Could you elaborate on your current situation? You do not need to install the bitshuffle filter via Julia. HDF5 has its own independent plugin mechanism.

Do you have the bitshuffle HDF5 plugin installed?

If not, then you can download it from
https://www.hdfgroup.org/downloads/hdf5

If the plugins are not installed to a standard location, you can then use the following to tell HDF5.jl where they are:

julia> HDF5.API.h5pl_prepend("/path/to/plugins")

Reading should be transparent.
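
As a minimal sketch of that read path: the helper below is hypothetical (its name and keyword are not part of HDF5.jl), and it assumes the plugin directory, file, and dataset names are supplied by the caller. Once the plugin path is registered, a plain `read` should decode the filtered dataset.

```julia
using HDF5

# Hypothetical helper: optionally register a plugin directory, then read one dataset.
function read_with_plugins(path::AbstractString, name::AbstractString;
                           plugin_dir::Union{Nothing,AbstractString}=nothing)
    if plugin_dir !== nothing
        # Put the directory at the front of HDF5's plugin search path.
        HDF5.API.h5pl_prepend(plugin_dir)
    end
    h5open(path, "r") do f
        # No filter-specific code is needed here; HDF5 applies the pipeline itself.
        read(f[name])
    end
end
```

For example, `read_with_plugins("data.h5", "entry/data"; plugin_dir="/path/to/plugins")`, where both the file and dataset names are placeholders.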

Once you have the plugin somewhere, you can then use the following code if you would like to write data.

julia> const H5Zbitshuffle = ExternalFilter(32008, Cuint[0, 0]) # no compression
ExternalFilter(32008, 0x00000000, UInt32[0x00000000, 0x00000000], "Unknown Filter with id 32008", 0x00000000)

julia> const H5Zbitshuffle_lz4 = ExternalFilter(32008, Cuint[0, 2]) # lz4 compression
ExternalFilter(32008, 0x00000000, UInt32[0x00000000, 0x00000002], "Unknown Filter with id 32008", 0x00000000)

julia> const H5Zbitshuffle_zstd = ExternalFilter(32008, Cuint[0, 3]) # zstd compression
ExternalFilter(32008, 0x00000000, UInt32[0x00000000, 0x00000003], "Unknown Filter with id 32008", 0x00000000)
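
A rough sketch of how such a filter object might be attached when creating a dataset follows. The helper name is hypothetical, and passing the filter through the `filters` keyword of `create_dataset` is an assumption based on the HDF5.jl filter documentation linked earlier in this thread; it may differ between versions. Writing will only succeed if HDF5 can find the bitshuffle plugin.

```julia
using HDF5
using HDF5.Filters: ExternalFilter

# Same parameters as the lz4 example above: the second value selects lz4.
const bitshuffle_lz4 = ExternalFilter(32008, Cuint[0, 2])

# Hypothetical helper: write `data` as a chunked dataset using the external filter.
# Assumes the bitshuffle plugin is discoverable by the HDF5 library.
function write_bitshuffled(path, name, data; chunk=size(data))
    h5open(path, "w") do f
        # Filters require a chunked layout, so a chunk size is always given.
        dset = create_dataset(f, name, datatype(data), dataspace(data);
                              chunk=chunk, filters=[bitshuffle_lz4])
        write(dset, data)
    end
end
```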

The Python-based hdf5plugin packages derive from
https://github.com/silx-kit/hdf5plugin
That can also be used in a limited fashion.

@jamesrhester
Contributor Author

Thanks for the detail. I want to provide a minimum-effort install for a cross-platform Julia script that is supposed to read a reasonably wide selection of HDF5 files. Pkg.instantiate works well for automatically setting up the Julia environment, so the approach above of making a _jll of the bitshuffle library looks like the best solution.

@jamesrhester
Contributor Author

A bitshuffle_jll is waiting to be merged into Yggdrasil: JuliaPackaging/Yggdrasil#5103

@mkitti
Member

mkitti commented Jul 4, 2022

Excellent, thank you!

@jamesrhester
Contributor Author

Still some work to do on this as I didn't bundle in lz4 or zstd compressors, thinking that they would appear separately in the HDF5 pipeline. But they are also possible options to the bitshuffle filter.

@mkitti
Member

mkitti commented Jul 5, 2022

We already have lz4 and zstandard jlls. It would be good to dynamically link those.
