Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Use fewer objects in h5open and read #762

Merged
merged 3 commits into from
Dec 9, 2020
Merged

Conversation

jmert
Copy link
Contributor

@jmert jmert commented Dec 7, 2020

This PR closes most of the performance gap identified in reads from #752. The changes are:

  1. Reuse the Datatype identified in each of the untyped read() functions when handing off to the generic typed-read. This is accomplished by renaming the generic read() to _read() and giving it both the HDF5 Datatype and the Julia Type as arguments.
  2. Only create a new in-memory Dataspace if a hyperslab is being selected; otherwise, reuse the file Dataspace.
  3. Only create the file access property list once in h5open, and only create a create-file property list if non-default options are being set (and close it if one was created).

image

Part of the performance gap appears to be due to libhdf5 v1.12 being a bit slower than v1.10, but if the library version is forced back to v1.10, each of the operations are on-par or faster than with HDF5.jl v0.13.7.

The RFC comes from the renaming of read() — I'm not sure just hiding the generic read as _read() is really the best option since it does make specializing on particular types require overloading the internal _read instead of public read if special data handling is required (as demonstrated by the _read(..., ::Type{Opaque}) case). Ultimately, though, it does seems some sort of similar structuring will be desired to retain the performance; might just need to iterate on the exact restructuring a bit.

(Note that the first commit is from #747 since both that PR and the changes here modify the opaque read method.)

src/HDF5.jl Outdated Show resolved Hide resolved
@@ -460,13 +459,6 @@ function h5open(filename::AbstractString, mode::AbstractString = "r"; swmr::Bool
error("HDF5 does not support appending without writing")
end

close_apl = false
if apl == DEFAULT_PROPERTIES
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yikes, by construction apl != DEFAULT_PROPERTIES, since apl[:fclose_degree] = H5F_CLOSE_STRONG on the line above.

src/HDF5.jl Show resolved Hide resolved
@musm
Copy link
Member

musm commented Dec 8, 2020

LGTM

@jmert jmert force-pushed the reuse_read branch 2 times, most recently from 89fe6c6 to 27516ed Compare December 8, 2020 02:54
@musm
Copy link
Member

musm commented Dec 8, 2020

The RFC comes from the renaming of read() — I'm not sure just hiding the generic read as _read() is really the best option since it does make specializing on particular types require overloading the internal _read instead of public read if special data handling is required (as demonstrated by the _read(..., ::Type{Opaque}) case). Ultimately, though, it does seems some sort of similar structuring will be desired to retain the performance; might just need to iterate on the exact restructuring a bit.

Yeah I don't see a way to avoid that, but I also don't see this as a huge issue. How else would it be possible to allow such re-structing without the current signature were the Datatype is passed in as the second argument?

@jmert
Copy link
Contributor Author

jmert commented Dec 9, 2020

I guess the question if we're OK with the idea (and sitting on it for another day, no other alternative has come to me) would be should we allocate a better, more descriptive name than _read? generic_read might be better. It'd make describing in a future "dev docs" how to do an Opaque-like overload look more supported.

@musm
Copy link
Member

musm commented Dec 9, 2020

generic_read might be better. It'd make describing in a future "dev docs" how to do an Opaque-like overload look more supported.

That sounds good to me generic_read or h5_read (I kind of don't like this as much)

I think as long as it's documented, having the second argument require the datatype pass though, should be fine IMO.

@jmert
Copy link
Contributor Author

jmert commented Dec 9, 2020

CI all green! 🎉

@musm
Copy link
Member

musm commented Dec 9, 2020

I don't know why, but seeing all green checkmarks gives me a dopamine hit 🎉

@jmert jmert merged commit 69e23c4 into JuliaIO:master Dec 9, 2020
@jmert jmert deleted the reuse_read branch December 9, 2020 18:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants