Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add bitshuffle #986

Merged
merged 10 commits into from
Jul 11, 2022
Merged

Add bitshuffle #986

merged 10 commits into from
Jul 11, 2022

Conversation

jamesrhester
Copy link
Contributor

This adds the bitshuffle filter for HDF5. Closes #981.

@mkitti
Copy link
Member

mkitti commented Jul 7, 2022

@jamesrhester
Copy link
Contributor Author

Will do.

@mkitti
Copy link
Member

mkitti commented Jul 7, 2022

I think you are missing an implementation of bshuf_h5_set_local function.
https://github.com/kiyo-masui/bitshuffle/blob/master/src/bshuf_h5filter.c#L29

@jamesrhester
Copy link
Contributor Author

Yes, working on it now. Having to write tests meant I discovered that I'd ignored the writing part of the filter...

@mkitti
Copy link
Member

mkitti commented Jul 7, 2022

See

function blosc_set_local(dcpl::API.hid_t, htype::API.hid_t, space::API.hid_t)
if you need an example of that.

Also, we could just compile the C code for the plugin, probably as a separate Yggrassil recipe. We could then just use the H5PL interface to tell HDF5 where the plugin is.

@jamesrhester
Copy link
Contributor Author

It turns out that I put the lower compat limit of bitshuffle_jll as 1.6, have to wait for that to be fixed.

@jamesrhester
Copy link
Contributor Author

OK, I'm informed (see JuliaPackaging/Yggdrasil#5111) that the lower compatibility limit for bitshuffle_jll should be 1.6. So I don't know if I should be trying to pass the Julia 1.3 checks?

@mkitti
Copy link
Member

mkitti commented Jul 7, 2022

These tests should only run with 1.6 then. Just use a @static if VERSION >= v"1.6" to qualify the tests and the package import.

@jamesrhester
Copy link
Contributor Author

jamesrhester commented Jul 8, 2022

I have used @static as suggested, not sure how to manage the import resolution fail for the 1.3 tests.

Co-authored-by: Mark Kittisopikul <[email protected]>
test/filter.jl Outdated Show resolved Hide resolved
test/runtests.jl Outdated Show resolved Hide resolved
@mkitti
Copy link
Member

mkitti commented Jul 8, 2022

I'm happy to merge this if you are. Thanks again.

@jamesrhester
Copy link
Contributor Author

Yes, I'm happy to merge. Thanks for your review and help.

@mkitti mkitti requested a review from musm July 8, 2022 07:08
@mkitti
Copy link
Member

mkitti commented Jul 8, 2022

I'm pausing for a few days to give others the chance to review. I'm also going to do some further testing with plugins from other sources and via different APIs.

@mkitti
Copy link
Member

mkitti commented Jul 11, 2022

Can the license correctly be categorized as a MIT license?

@mkitti mkitti merged commit 3a24c8e into JuliaIO:master Jul 11, 2022
@jamesrhester
Copy link
Contributor Author

Yes, it was meant to be a MIT license copied from the original bitshuffle distribution. Thanks for merging this.

@mkitti
Copy link
Member

mkitti commented Jul 12, 2022

@jamesrhester , I'm seeing a sporadic CI failure on 64-bit Windows:
https://github.com/JuliaIO/HDF5.jl/runs/7292646756?check_suite_focus=true

We may need to review

@jamesrhester
Copy link
Contributor Author

Looks like nightly is OK, but Julia 1.7 has a problem. I'll see if I can spin up a Windows 64 VM here and run the test repeatedly.

@jamesrhester
Copy link
Contributor Author

I'm suspecting it could be because of a lack of @GC.preserve at the end of the filter function with the unsafe_load and unsafe_store! calls. The corresponding H5Zblosc filter doesn't have these, however, so once I've got a test VM running I'll try and duplicate the error and then see if the @GC.preserve fixes it.

@mkitti
Copy link
Member

mkitti commented Jul 13, 2022

The original code allocates 11 Cuints for values, but you only allocate 8 (similar to tmp_values) in bitshuffle_set_local. You also don't seem to move the values to a higher position. I think you may be overwriting values stored in bs_values by API.h5p_get_filter_by_id. Perhaps I am missing something?

https://github.com/kiyo-masui/bitshuffle/blob/fdfcd404ac8dcb828857a90c559d36d8ac4c2968/src/bshuf_h5filter.c#L47-L50

You also do not bump nelements by 3.

@mkitti
Copy link
Member

mkitti commented Jul 13, 2022

I see now that you added three extra values to the struct similar to how we implemented the Blosc filter.

@mkitti
Copy link
Member

mkitti commented Jul 13, 2022

Hmm, you never initialize bs_values to 0.

@jamesrhester
Copy link
Contributor Author

The way this works, as I understand it, is that the set_local routine provides an opportunity to fill and check values in the BitshuffleFilter struct, which is what bs_values is filled with after H5API.h5p_get_filter_by_id. It would also be OK to completely skip this step and set everything at BitshuffleFilter initialisation, except that the element size isn't known at the time that the filter is constructed. So in the Julia version I have set almost every value at initialisation, except for element size, but repeat the setting of MAJOR, MINOR values.
Still sorting out access to a Windows VM.

Copy link
Member

@mkitti mkitti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other things I noticed were:

  1. The values are zero initialized.
  2. We probably should integer division. / in Julia will produce a Float64.

nbytes_uncomp = ccall((:bshuf_read_uint64_BE,libbitshuffle),Cuint,(Ptr{Cvoid},),in_buf)
# Next 4 bytes are the block size

block_size = ccall((:bshuf_read_uint32_BE,libbitshuffle),Cuint,(Ptr{Cvoid},),in_buf+8)/elem_size
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should ÷ here rather than /. block_size should be an integer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense.

nbytes_uncomp = nbytes
if compress_flag == BSHUF_H5_COMPRESS_LZ4
buf_size_out = ccall((:bshuf_compress_lz4_bound,libbitshuffle),Cuint,(Cuint,Cuint,Cuint),
nbytes_uncomp/elem_size,elem_size,block_size) + 12
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use ÷ here rather than /, though Julia might take care of translating this back for us here.

nbytes_uncomp/elem_size,elem_size,block_size) + 12
elseif compress_flag == BSHUF_H5_COMPRESS_ZSTD
buf_size_out = ccall((:bshuf_compress_zstd_bound,libbitshuffle),Cuint,(Cuint,Cuint,Cuint),
nbytes_uncomp/elem_size,elem_size,block_size)+12
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use ÷ here rather than /, though Julia might take care of translating this back for us here.

error("bitshuffle_h5plugin: Uncompressed size $nbytes_uncomp is not a multiple of $elem_size")
end

size = nbytes_uncomp/elem_size
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use ÷ here rather than /

@mkitti
Copy link
Member

mkitti commented Jul 13, 2022

One issue I encountered that relates to GC.@preserve and decompression is detailed here:
JuliaLang/julia#43402

This was due a specific change on the development branch of Julia though.

What happens if bs_values is getting freed after bitshuffle_set_local but before H5Z_filter_bitshuffle finishes?

I have a Windows box, and so far I am not able to reproduce the error reliably.

@mkitti
Copy link
Member

mkitti commented Jul 13, 2022

I'm going to try to focus testing here:
#988

@jamesrhester
Copy link
Contributor Author

One issue I encountered that relates to GC.@preserve and decompression is detailed here: JuliaLang/julia#43402

This was due a specific change on the development branch of Julia though.

I read through that but couldn't see how it applies here. Weird bug though.

What happens if bs_values is getting freed after bitshuffle_set_local but before H5Z_filter_bitshuffle finishes?

Well, as bs_values is of type Ptr{Cuint} it should not get freed by Julia if I understand the whole Ptr, Ref thing.

@mkitti
Copy link
Member

mkitti commented Jul 13, 2022

Well, as bs_values is of type Ptr{Cuint} it should not get freed by Julia if I understand the whole Ptr, Ref thing.

The problem is that you allocate it via bs_values = Vector{Cuint}(undef,8). When it gets passed to ccall, the pointer behind the Vector is passed to HDF5. After bitshuffle_set_local completes, Julia thinks it can free the pointer behind bs_values.

The question is if H5Pmodify_filter just hangs on to the pointer or if it makes a copy.

@mkitti
Copy link
Member

mkitti commented Jul 13, 2022

@mkitti
Copy link
Member

mkitti commented Jul 13, 2022

Aha, I found something.

The C function signature is:

int64_t bshuf_decompress_lz4(const void* in, void* out, const size_t size,
        const size_t elem_size, size_t block_size) {

The ccall however is

                    err = ccall((:bshuf_decompress_lz4,libbitshuffle),Cint,
                                (Ptr{Cvoid},Ptr{Cvoid},Cuint,Cuint,Cuint),
                                in_buf,out_buf,size,elem_size,block_size)

This should be

                    err = ccall((:bshuf_decompress_lz4,libbitshuffle),Int64,
                                (Ptr{Cvoid},Ptr{Cvoid},Csize_t,Csize_t,Csize_t),
                                in_buf,out_buf,size,elem_size,block_size)
julia> Csize_t
UInt64

julia> Cuint
UInt32

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BITSHUFFLE filter not supported out of the box for HDF5 reading
2 participants