Reading a part of an HDF5Compound dataset #427

Closed
tamasgal opened this issue Jul 3, 2017 · 6 comments


tamasgal commented Jul 3, 2017

I ended up using a flat data structure in our data format: instead of thousands of groups (/event/1, /event/2, etc.), we now have a single table plus a second table that holds event_index and event_length, so we can directly access the rows of any event in the flat /events dataset.
This is now really fast in Python, but I could not figure out a way to read part of an HDF5Compound dataset in Julia.
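
For readers unfamiliar with this kind of layout, here is a minimal sketch of the idea using plain in-memory Julia arrays rather than HDF5 datasets. The `Hit` struct, its fields, the sample values, and the helper `event_hits` are all hypothetical. It also assumes that `event_index` is a 0-based row offset into the flat table, which the post does not specify.

```julia
# Hypothetical compound record; the real field layout of the dataset is not given in the post.
struct Hit
    time::Float64
    channel::Int32
end

# One flat table holding the hits of all events back to back.
hits = [Hit(1.0, 1), Hit(1.5, 2), Hit(2.0, 1), Hit(3.0, 7), Hit(3.2, 8), Hit(4.0, 3)]

# Second table: where each event starts (assumed 0-based offset) and how many rows it has.
event_index  = [0, 2, 5]
event_length = [2, 3, 1]

# Return the rows belonging to event `i` (1-based event number).
function event_hits(hits, event_index, event_length, i)
    first_row = event_index[i] + 1                 # 0-based offset -> Julia's 1-based index
    last_row  = event_index[i] + event_length[i]
    return hits[first_row:last_row]
end

second = event_hits(hits, event_index, event_length, 2)
```

With an on-disk dataset, the same `first_row:last_row` range is what one would want to pass to a partial read, which is exactly the operation asked about in this issue.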

So something like

f["/hits"][100:250]

fails with

Dataset indexing (hyperslab) is available only for bits types

Now I am a bit confused: is this even possible with the current implementation of HDF5.jl?

@damiendr

I'd also like to have this feature, and I might give it a go. Did this prove problematic, or is it just not implemented yet?

@damiendr

In particular, what's up with this:

HDF5.jl/src/HDF5.jl

Lines 2137 to 2138 in c04c620

## The following doesn't work because it's in libhdf5_hl.so.
## (:h5tb_get_field_info, :H5TBget_field_info, Herr, (Hid, Ptr{UInt8}, Ptr{Ptr{UInt8}}, Ptr{UInt8}, Ptr{UInt8}, Ptr{UInt8}), (:loc_id, :table_name, :field_names, :field_sizes, :field_offsets, :type_size), :(error("Error getting field information")))

Is there a particular reason why one shouldn't depend on the high-level APIs?


carstenbauer commented Nov 22, 2018

@damiendr Did you make any progress on this? Are there any half-way attempts? I will give this a try but want to get as much info/help as possible first.

Note the related issue JuliaIO/MAT.jl#76.

@damiendr

@crstnbr not on the indexing specifically. I did write some code a while ago that sits on top of HDF5.jl and reads compound types much faster: https://github.com/damiendr/HDFTables.jl
It's not yet registered as a package or tested on Julia 1.0 (I'll get to that some time this month).

This uses HDF5.read_array(), which in turn calls h5d_read() with H5S_ALL as the dataspace. Hopefully you would only have to modify that call and supply a dataspace object for the slice, but I haven't tried it.
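
To make the "dataspace object for the slice" idea a bit more concrete: the HDF5 C function `H5Sselect_hyperslab` describes a selection per dimension by a 0-based start offset, a stride, a count, and a block size. The sketch below only computes those numbers for a 1-D Julia range. The helper name `hyperslab_params` is hypothetical, and the actual HDF5.jl calls are deliberately left out because their signatures are not shown in this thread.

```julia
# Hypothetical helper: translate a 1-based Julia range into the 0-based
# (start, stride, count, block) values used by H5Sselect_hyperslab for one dimension.
function hyperslab_params(r::AbstractRange{<:Integer})
    step(r) > 0 || throw(ArgumentError("only forward ranges are handled in this sketch"))
    return (start  = first(r) - 1,  # HDF5 offsets are 0-based
            stride = step(r),       # distance between selected elements
            count  = length(r),     # number of selected elements
            block  = 1)             # one element per block
end

params = hyperslab_params(100:250)
```

A file dataspace with this selection, together with a memory dataspace of `count` elements, would then presumably replace H5S_ALL in the read call.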


carstenbauer commented Nov 22, 2018

Turns out that JLD now allows hyperslabbing (is this a word?) in getindex of a JldDataset:

jldopen("myfile.jld", "r") do f
    f["my_dataset"][4] # only read 4th element
end

Strangely, read(f["my_dataset"], 4) doesn't work for me though. Anyways, this should be sufficient for my use case. I'll take a look at your package nonetheless!

PS: JuliaIO/JLD.jl#239

@kleinhenz

Closed by #652.
