Commit

Add initial support for H5Dchunk_iter (JuliaIO#1031)
* Add initial support for H5Dchunk_iter

* Implement h5d_chunk_iter_helper

* Implement HDF5.get_all_chunk_info

* Make tests pass via HDF5 1.14.0

* Apply formatting

* Test filters with filter_mask via H5Dchunk_iter

* Require functions to return an integer

* Provide index based chunk iteration, rename to HDF5.get_chunk_info_all

* Fix formatting

* Fix documentation

* Fix documentation

* Improve testing

* Always define _get_chunk_info_all_by_iter for documenter

* Update src/datasets.jl

Co-authored-by: Simon Byrne <[email protected]>

* Precompile get_chunk_info_all implementations before benchmarking

* Fix documentation

* Fix tests

* Formatting

---------

Co-authored-by: Simon Byrne <[email protected]>
mkitti and simonbyrne authored May 30, 2023
1 parent 4957fb8 commit cfba706
Showing 18 changed files with 315 additions and 21 deletions.
1 change: 1 addition & 0 deletions Project.toml
Original file line number Diff line number Diff line change
@@ -7,6 +7,7 @@ Compat = "34da2185-b29b-5c13-b0c7-acf172513d20"
HDF5_jll = "0234f1f7-429e-5d53-9886-15a909be8d59"
Libdl = "8f399da3-3557-5675-b5ff-fb832c97cbdb"
Mmap = "a63ad114-7e13-5084-954f-fe012c677804"
Printf = "de0858da-6303-5e67-8744-51eddeeeb8d7"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
Requires = "ae029012-a4dd-5104-9daa-d747884805df"
UUIDs = "cf7118a7-6976-5b1a-9a39-7adc72f591a4"
2 changes: 1 addition & 1 deletion docs/Manifest.toml
@@ -128,7 +128,7 @@ uuid = "f6f2d980-1ec6-471c-a70d-0270e22f1103"
version = "0.1.0"

[[deps.HDF5]]
deps = ["Compat", "HDF5_jll", "Libdl", "Mmap", "Random", "Requires", "UUIDs"]
deps = ["Compat", "HDF5_jll", "Libdl", "Mmap", "Printf", "Random", "Requires", "UUIDs"]
path = ".."
uuid = "f67ccb44-e63f-5c2f-98bd-6dc0ccc4ba2f"
version = "0.16.13"
2 changes: 2 additions & 0 deletions docs/src/api_bindings.md
@@ -89,6 +89,7 @@ h5a_write
---

## [[`H5D`](https://portal.hdfgroup.org/display/HDF5/Datasets) — Dataset Interface](@id H5D)
- [`h5d_chunk_iter`](@ref h5d_chunk_iter)
- [`h5d_close`](@ref h5d_close)
- [`h5d_create`](@ref h5d_create)
- [`h5d_create_anon`](@ref h5d_create_anon)
@@ -119,6 +120,7 @@ h5a_write
- [`h5d_write`](@ref h5d_write)
- [`h5d_write_chunk`](@ref h5d_write_chunk)
```@docs
h5d_chunk_iter
h5d_close
h5d_create
h5d_create_anon
17 changes: 16 additions & 1 deletion docs/src/interface/dataset.md
@@ -7,11 +7,15 @@ CurrentModule = HDF5
Many dataset operations are available through the indexing interface, which is aliased to the functional interface. The functional interface is described below.

```@docs
Dataset
create_dataset
Base.copyto!
Base.similar
create_external_dataset
get_datasets
open_dataset
write_dataset
read_dataset
```

## Chunks
@@ -20,10 +24,21 @@ get_datasets
do_read_chunk
do_write_chunk
get_chunk_index
get_chunk_info_all
get_chunk_length
get_chunk_offset
get_num_chunks
get_num_chunks_per_dim
read_chunk
write_chunk
```

### Private Implementation

These functions select private implementations of the public high-level API.
They should be used for diagnostic purposes only.

```@docs
_get_chunk_info_all_by_index
_get_chunk_info_all_by_iter
```
9 changes: 9 additions & 0 deletions docs/src/interface/datatype.md
@@ -0,0 +1,9 @@
# Datatypes

```@meta
CurrentModule = HDF5
```

```@docs
Datatype
```
1 change: 1 addition & 0 deletions docs/src/interface/files.md
@@ -8,5 +8,6 @@ CurrentModule = HDF5
h5open
ishdf5
Base.isopen
Base.read
start_swmr_write
```
1 change: 1 addition & 0 deletions gen/api_defs.jl
@@ -59,6 +59,7 @@
### Dataset Interface
###

@bind h5d_chunk_iter(dset_id::hid_t, dxpl_id::hid_t, cb::Ptr{Nothing}, op_data::Any)::herr_t "Error iterating over chunks" (v"1.12.3", nothing)
@bind h5d_close(dataset_id::hid_t)::herr_t "Error closing dataset"
@bind h5d_create2(loc_id::hid_t, pathname::Cstring, dtype_id::hid_t, space_id::hid_t, lcpl_id::hid_t, dcpl_id::hid_t, dapl_id::hid_t)::hid_t string("Error creating dataset ", h5i_get_name(loc_id), "/", pathname)
@bind h5d_create_anon(loc_id::hid_t, type_id::hid_t, space_id::hid_t, dcpl_id::hid_t, dapl_id::hid_t)::hid_t "Error in creating anonymous dataset"
1 change: 1 addition & 0 deletions src/HDF5.jl
@@ -6,6 +6,7 @@ using Mmap: Mmap
# needed for filter(f, tuple) in julia 1.3
using Compat
using UUIDs: uuid4
using Printf: @sprintf

### PUBLIC API ###

18 changes: 18 additions & 0 deletions src/api/functions.jl
@@ -443,6 +443,24 @@ function h5a_write(attr_hid, mem_type_id, buf)
return nothing
end

@static if v"1.12.3" ≤ _libhdf5_build_ver
@doc """
h5d_chunk_iter(dset_id::hid_t, dxpl_id::hid_t, cb::Ptr{Nothing}, op_data::Any)
See `libhdf5` documentation for [`H5Dchunk_iter`](https://portal.hdfgroup.org/display/HDF5/H5D_CHUNK_ITER).
"""
function h5d_chunk_iter(dset_id, dxpl_id, cb, op_data)
lock(liblock)
var"#status#" = try
ccall((:H5Dchunk_iter, libhdf5), herr_t, (hid_t, hid_t, Ptr{Nothing}, Any), dset_id, dxpl_id, cb, op_data)
finally
unlock(liblock)
end
var"#status#" < 0 && @h5error("Error iterating over chunks")
return nothing
end
end

"""
h5d_close(dataset_id::hid_t)
46 changes: 46 additions & 0 deletions src/api/helpers.jl
@@ -174,6 +174,52 @@ end
end
end

"""
h5d_chunk_iter(f, dataset, [dxpl_id=H5P_DEFAULT])
Call `f(offset::Ptr{hsize_t}, filter_mask::Cuint, addr::haddr_t, size::hsize_t)` for each chunk.
`dataset` may be a `HDF5.Dataset` or a dataset id.
`dxpl_id` is the dataset transfer property list and is optional.
Available only for the HDF5 1.10 series at 1.10.10 or greater, or for HDF5 1.12.3 or greater.
"""
h5d_chunk_iter() = nothing

@static if v"1.12.3" ≤ _libhdf5_build_ver ||
(_libhdf5_build_ver.minor == 10 && _libhdf5_build_ver.patch >= 10)
# H5Dchunk_iter is first available in 1.10.10, 1.12.3, and 1.14.0 in the 1.10, 1.12, and 1.14 minor version series, respectively
function h5d_chunk_iter_helper(
offset::Ptr{hsize_t},
filter_mask::Cuint,
addr::haddr_t,
size::hsize_t,
@nospecialize(data::Any)
)::H5_iter_t
func, err_ref = data
try
return convert(H5_iter_t, func(offset, filter_mask, addr, size))
catch err
err_ref[] = err
return H5_ITER_ERROR
end
end
function h5d_chunk_iter(@nospecialize(f), dset_id, dxpl_id=H5P_DEFAULT)
err_ref = Ref{Any}(nothing)
fptr = @cfunction(
h5d_chunk_iter_helper, H5_iter_t, (Ptr{hsize_t}, Cuint, haddr_t, hsize_t, Any)
)
try
return h5d_chunk_iter(dset_id, dxpl_id, fptr, (f, err_ref))
catch h5err
jlerr = err_ref[]
if !isnothing(jlerr)
rethrow(jlerr)
end
rethrow(h5err)
end
end
end
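For reference, a usage sketch of the do-block form defined above. This assumes HDF5.jl with a libhdf5 new enough to provide `H5Dchunk_iter` (1.10.10 or 1.12.3 and later); the file and dataset names are illustrative:

```julia
using HDF5

# Write a 4x4 dataset split into four 2x2 chunks (illustrative names).
h5open("chunk_iter_demo.h5", "w") do f
    f["A", chunk=(2, 2)] = rand(4, 4)
end

h5open("chunk_iter_demo.h5", "r") do f
    HDF5.API.h5d_chunk_iter(f["A"]) do offset, filter_mask, addr, size
        # offset is a Ptr{hsize_t} to C-ordered chunk coordinates;
        # addr and size locate the raw chunk bytes within the file.
        println("chunk at addr=", addr, ", ", size, " bytes")
        return HDF5.API.H5_ITER_CONT   # an Integer return (e.g. 0) is also accepted
    end
end
```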

"""
h5d_get_space_status(dataset_id)
8 changes: 7 additions & 1 deletion src/api/types.jl
@@ -72,6 +72,12 @@ end
H5_ITER_NATIVE = 2
H5_ITER_N = 3
end
@enum H5_iter_t::Cint begin
H5_ITER_CONT = 0
H5_ITER_ERROR = -1
H5_ITER_STOP = 1
end
Base.convert(::Type{H5_iter_t}, x::Integer) = H5_iter_t(x)
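The `Base.convert` method above is what lets user callbacks return a plain integer instead of an enum value. A minimal self-contained sketch of the same pattern, re-declared under hypothetical names for illustration (not the HDF5 module itself):

```julia
# Re-creation of the enum-plus-convert pattern used by HDF5.API (illustrative only).
@enum IterResult::Cint begin
    ITER_CONT  = 0
    ITER_ERROR = -1
    ITER_STOP  = 1
end
Base.convert(::Type{IterResult}, x::Integer) = IterResult(x)

# A function with a declared ::IterResult return type can now simply `return 0`:
callback()::IterResult = 0
callback()  # ITER_CONT
```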

const H5O_iterate1_t = Ptr{Cvoid}
const H5O_iterate2_t = Ptr{Cvoid}
@@ -249,7 +255,7 @@ _read_const(sym::Symbol) = unsafe_load(cglobal(Libdl.dlsym(libhdf5handle[], sym)
_has_symbol(sym::Symbol) = Libdl.dlsym(libhdf5handle[], sym; throw_error=false) !== nothing

# iteration order constants
# Moved to H5_iter_t enum
# Moved to H5_iter_order_t enum
#const H5_ITER_UNKNOWN = -1
#const H5_ITER_INC = 0
#const H5_ITER_DEC = 1
2 changes: 1 addition & 1 deletion src/api_midlevel.jl
@@ -148,7 +148,7 @@ end
Helper method to read chunks via 0-based integer `index`.
Argument `buf` is optional and defaults to a `Vector{UInt8}` of length determined by `HDF5.h5d_get_chunk_info`.
Argument `buf` is optional and defaults to a `Vector{UInt8}` of length determined by `HDF5.API.h5d_get_chunk_info`.
Argument `dxpl_id` can be supplied a keyword and defaults to `HDF5.API.H5P_DEFAULT`.
Argument `filters` can be retrieved by supplying a `Ref{UInt32}` value via a keyword argument.
112 changes: 111 additions & 1 deletion src/datasets.jl
@@ -3,8 +3,12 @@
# Get the dataspace of a dataset
dataspace(dset::Dataset) = Dataspace(API.h5d_get_space(checkvalid(dset)))

# Open Dataset
"""
open_dataset(parent::Union{File, Group}, name::AbstractString, [dapl, dxpl])
Open a dataset and return a [`HDF5.Dataset`](@ref) handle. Alternatively, just index
a file or group with `name`.
"""
open_dataset(
parent::Union{File,Group},
name::AbstractString,
@@ -112,6 +116,14 @@ create_dataset(
# Get the datatype of a dataset
datatype(dset::Dataset) = Datatype(API.h5d_get_type(checkvalid(dset)), file(dset))

"""
read_dataset(parent::Union{File,Group}, name::AbstractString)
Read the dataset named `name` from `parent`. This will typically return an array.
The dataset will be opened, read, and closed.
See also [`HDF5.open_dataset`](@ref), [`Base.read`](@ref)
"""
function read_dataset(parent::Union{File,Group}, name::AbstractString)
local ret
obj = open_dataset(parent, name)
@@ -275,6 +287,14 @@ function create_dataset(
end

# Create and write, closing the objects upon exit
"""
write_dataset(parent::Union{File,Group}, name::Union{AbstractString,Nothing}, data; pv...)
Create and write a dataset with `data`. Keywords are forwarded to [`create_dataset`](@ref).
Providing `nothing` as the name will create an anonymous dataset.
See also [`create_dataset`](@ref)
"""
function write_dataset(
parent::Union{File,Group}, name::Union{AbstractString,Nothing}, data; pv...
)
@@ -735,6 +755,96 @@ function get_chunk(dset::Dataset)
ret
end

struct ChunkInfo{N}
offset::NTuple{N,Int}
filter_mask::Cuint
addr::API.haddr_t
size::API.hsize_t
end
function Base.show(io::IO, ::MIME"text/plain", info::Vector{<:ChunkInfo})
print(io, typeof(info))
println(io, " with $(length(info)) elements:")
println(io, "Offset \tFilter Mask \tAddress\tSize")
println(io, "----------\t--------------------------------\t-------\t----")
for ci in info
println(
io,
@sprintf("%10s", ci.offset),
"\t",
bitstring(ci.filter_mask),
"\t",
ci.addr,
"\t",
ci.size
)
end
end

"""
HDF5.get_chunk_info_all(dataset, [dxpl])
Obtain information on all the chunks in a dataset. Returns a
`Vector{ChunkInfo{N}}`. The fields of `ChunkInfo{N}` are
* offset - `NTuple{N, Int}` indicating the offset of the chunk in terms of elements, reversed to F-order
* filter_mask - Cuint, 32-bit flags indicating whether filters have been applied to the chunk
* addr - haddr_t, byte-offset of the chunk in the file
* size - hsize_t, size of the chunk in bytes
"""
function get_chunk_info_all(dataset, dxpl=API.H5P_DEFAULT)
@static if hasmethod(API.h5d_chunk_iter, Tuple{API.hid_t})
return _get_chunk_info_all_by_iter(dataset, dxpl)
else
return _get_chunk_info_all_by_index(dataset, dxpl)
end
end
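A usage sketch of `get_chunk_info_all`; the file name and chunk layout are illustrative, and a chunked dataset is required:

```julia
using HDF5

# A 6x4 dataset with chunk size 3x2 yields a 2x2 grid of chunks.
h5open("chunks_demo.h5", "w") do f
    f["A", chunk=(3, 2)] = zeros(6, 4)
end

h5open("chunks_demo.h5", "r") do f
    info = HDF5.get_chunk_info_all(f["A"])
    length(info)      # 4 chunks
    info[1].offset    # element offset of a chunk, in Julia (F-order) indices
end
```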

"""
_get_chunk_info_all_by_iter(dataset, [dxpl])
Implementation of [`get_chunk_info_all`](@ref) via [`HDF5.API.h5d_chunk_iter`](@ref).
We expect this will be faster, O(N), than using `h5d_get_chunk_info` since this allows us to iterate
through the chunks once.
"""
@inline function _get_chunk_info_all_by_iter(dataset, dxpl=API.H5P_DEFAULT)
ds = dataspace(dataset)
N = ndims(ds)
info = ChunkInfo{N}[]
num_chunks = get_num_chunks(dataset)
sizehint!(info, num_chunks)
API.h5d_chunk_iter(dataset, dxpl) do offset, filter_mask, addr, size
_offset = reverse(unsafe_load(Ptr{NTuple{N,API.hsize_t}}(offset)))
push!(info, ChunkInfo{N}(_offset, filter_mask, addr, size))
return HDF5.API.H5_ITER_CONT
end
return info
end
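The `reverse` above converts the C-ordered (row-major) offsets reported by libhdf5 into Julia's column-major convention. A self-contained illustration:

```julia
# libhdf5 reports chunk offsets slowest-varying dimension first (C order);
# Julia indexes fastest-varying dimension first (F order), hence the reverse.
c_offset = (4, 0, 2)          # as delivered by the library for a 3-d dataset
f_offset = reverse(c_offset)  # (2, 0, 4), the form stored in ChunkInfo.offset
```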

"""
_get_chunk_info_all_by_index(dataset, [dxpl])
Implementation of [`get_chunk_info_all`](@ref) via [`HDF5.API.h5d_get_chunk_info`](@ref).
We expect this will be slower, O(N^2), than using `h5d_chunk_iter` since each call to `h5d_get_chunk_info`
iterates through the B-tree structure.
"""
@inline function _get_chunk_info_all_by_index(dataset, dxpl=API.H5P_DEFAULT)
ds = dataspace(dataset)
N = ndims(ds)
info = ChunkInfo{N}[]
num_chunks = get_num_chunks(dataset)
sizehint!(info, num_chunks)
for chunk_index in 0:(num_chunks - 1)
_info_nt = HDF5.API.h5d_get_chunk_info(dataset, chunk_index)
_offset = (reverse(_info_nt[:offset])...,)
filter_mask = _info_nt[:filter_mask]
addr = _info_nt[:addr]
size = _info_nt[:size]
push!(info, ChunkInfo{N}(_offset, filter_mask, addr, size))
end
return info
end

# properties that require chunks in order to work (e.g. any filter)
# values do not matter -- just needed to form a NamedTuple with the desired keys
const chunked_props = (; compress=nothing, deflate=nothing, blosc=nothing, shuffle=nothing)
12 changes: 12 additions & 0 deletions src/readwrite.jl
@@ -12,6 +12,13 @@ end

# Generic read functions

"""
read(parent::Union{HDF5.File, HDF5.Group}, name::AbstractString; pv...)
read(parent::Union{HDF5.File, HDF5.Group}, name::AbstractString => dt::HDF5.Datatype; pv...)
Read a dataset or attribute identified by `name` from an HDF5 file or group.
Optionally, specify the [`HDF5.Datatype`](@ref) to be read.
"""
function Base.read(parent::Union{File,Group}, name::AbstractString; pv...)
obj = getindex(parent, name; pv...)
val = read(obj)
@@ -33,6 +40,11 @@ end
# This infers the Julia type from the HDF5.Datatype. Specific file formats should provide their own read(dset).
const DatasetOrAttribute = Union{Dataset,Attribute}

"""
    read(obj::HDF5.DatasetOrAttribute)
Read the data within a [`HDF5.Dataset`](@ref) or [`HDF5.Attribute`](@ref).
"""
function Base.read(obj::DatasetOrAttribute)
dtype = datatype(obj)
T = get_jl_type(dtype)