Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add test for FilterPipeline and UnknownFilter, fix bug #904

Merged

Conversation

mkitti
Copy link
Member

@mkitti mkitti commented Feb 6, 2022

I noticed that when playing with UnknownFilter that the data values displayed by the FilterPipeline would be complete gibberish. While it is possible that the set local function could alter the data, I eventually found there was a display bug.

julia> using HDF5, H5Zlz4

julia> FILTERS_backup = copy(HDF5.Filters.FILTERS)
Dict{Int32, Type{<:HDF5.Filters.Filter}} with 7 entries:
  5     => NBit
  4     => Szip
  6     => ScaleOffset
  2     => Shuffle
  32004 => Lz4Filter
  3     => Fletcher32
  1     => Deflate

julia> empty!(HDF5.Filters.FILTERS)
Dict{Int32, Type{<:HDF5.Filters.Filter}}()

julia> fn = tempname()
"C:\\Users\\KITTIS~1\\AppData\\Local\\Temp\\jl_g6SQbroy8s"

julia> f = h5open(fn, "w")
🗂️ HDF5.File: (read-write) C:\Users\KITTIS~1\AppData\Local\Temp\jl_g6SQbroy8s

julia>     data = collect(1:128);

julia>     filter = UnknownFilter(H5Z_FILTER_LZ4, 0, Cuint[0, 2, 4, 6, 8, 10], "Unknown LZ4", 0)
UnknownFilter(32004, 0x00000000, UInt32[0x00000000, 0x00000002, 0x00000004, 0x00000006, 0x00000008, 0x0000000a], "Unknown LZ4", 0x00000000)

julia>     ds, dt = create_dataset(f, "data", data, chunk=(32,), filter=filter)
(HDF5.Dataset: /data (file: C:\Users\KITTIS~1\AppData\Local\Temp\jl_g6SQbroy8s xfer_mode: 0), HDF5.Datatype: H5T_STD_I64LE)

julia>     dcpl = HDF5.get_create_properties(ds)
HDF5.DatasetCreateProperties(
  alloc_time      = :incremental,
  chunk           = (32,),
  filters         = HDF5.Filters.Filter[UnknownFilter(32004, 0x00000000, UInt32[0x71353038, 0x00000000, 0x00000000, 0x00000000, 0x00089804, 0x00000000], "HDF5 lz4 filter; see http://www.hdfgroup.org/services/contributions.html", 0x00000003)],
  layout          = :chunked,
  filter          = HDF5.Filters.Filter[UnknownFilter(32004, 0x00000000, UInt32[0x00000003, 0x00000000, 0x00000003, 0x00000000, 0x0c8e4b30, 0x00000000], "HDF5 lz4 filter; see http://www.hdfgroup.org/services/contributions.html", 0x00000003)],
  obj_track_times = true,
)

julia>     pipeline = HDF5.Filters.FilterPipeline(dcpl)
1-element HDF5.Filters.FilterPipeline{HDF5.DatasetCreateProperties}:
 UnknownFilter(32004, 0x00000000, UInt32[0x00000003, 0x00000000, 0x00000003, 0x00000000, 0x0cc67610, 0x00000000], "HDF5 lz4 filter; see http://www.hdfgroup.org/services/contributions.html", 0x00000003)

julia> pipeline[1]
UnknownFilter(32004, 0x00000000, UInt32[0x0e40aa68, 0x00000000, 0x00000000, 0x00000000, 0x00089804, 0x00000000], "HDF5 lz4 filter; see http://www.hdfgroup.org/services/contributions.html", 0x00000003)

@mkitti
Copy link
Member Author

mkitti commented Feb 6, 2022

The bug is on line 306. Since we do not initially know how many elements of data there might be, getindex starts with an empty array in cd_values. On the first call to h5p_get_filter, we obtain the number of elements in cd_nelmts[].

If there are actually more data elements, then we do recursion but with an appropriately sized buffer. However, line 306 below has the opposite logic. On first evaluation, length(cd_values) is always 0. cd_nelmts[] will never be less than 0 so lines 307 through 308 are never reached.

The < should be switched for >.

function Base.getindex(f::FilterPipeline, ::Type{UnknownFilter}, i::Integer, cd_values::Vector{Cuint} = Cuint[])
flags = Ref{Cuint}()
cd_nelmts = Ref{Csize_t}(length(cd_values))
namebuf = Array{UInt8}(undef, 256)
config = Ref{Cuint}()
id = API.h5p_get_filter(f.plist, i-1, flags, cd_nelmts, cd_values, length(namebuf), namebuf, config)
if cd_nelmts[] < length(cd_values)
resize!(cd_values, cd_nelmts[])
return getindex(f, UnknownFilter, i, cd_values)
end
resize!(namebuf, findfirst(isequal(0), namebuf)-1)
resize!(cd_values, cd_nelmts[])
return UnknownFilter(id, flags[], cd_values, String(namebuf), config[])
end

I purposely added a test in 0c005ac that would fail in order to demonstrate the issue.

@mkitti
Copy link
Member Author

mkitti commented Feb 6, 2022

@simonbyrne , did I understand the intent correct?

Copy link
Collaborator

@simonbyrne simonbyrne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

test/filter.jl Show resolved Hide resolved
@@ -108,4 +110,17 @@ h5open(fn, "r") do f
@test f["filtshufdef"][] == data
end

# Filter Pipeline test for UnknownFilter
FILTERS_backup = copy(HDF5.Filters.FILTERS)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question, why back up FILTERS then restore it again?

Copy link
Member Author

@mkitti mkitti Feb 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We empty FILTERS in the next line. I do not want that to affect subsequent tests, so I back it up and restore it to its original state.

We empty FILTERS so that FilterPipeline will think the LZ4 filter is Unknown.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Thanks.

@musm musm merged commit 67fc357 into JuliaIO:master Feb 9, 2022
@musm musm deleted the mk/fix_unknown_filter_get_index_bug branch February 9, 2022 16:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants