-
Notifications
You must be signed in to change notification settings - Fork 141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid calling non-public h5fd_xxx_init() functions #967
Conversation
The h5fd_xxx_init() functions are not part of the HDF5 public API. A method of retrieving the hid_t value of drivers that uses public/supported API calls was suggested in a comment on a related github issue and is implemented in this commit. For more details see: HDFGroup/hdf5#1809 (comment)
Fixes #960 |
OK, something weird is happening to the fapl = HDF5.FileAccessProperties(fclose_degree=:default) # NEW
close(fapl) # NEW
DRIVERS[API.h5fd_core_init()] = Core # EXISTING
DRIVERS[API.h5fd_sec2_init()] = POSIX # EXISTING I don't understand why that would break the |
Even putting the function __init__()
DRIVERS[API.h5fd_core_init()] = Core
DRIVERS[API.h5fd_sec2_init()] = POSIX
@require MPI="da04e1cc-30fd-572f-bb4f-1f8673147195" include("mpio.jl")
fapl = HDF5.init!(HDF5.FileAccessProperties()) # NEW
close(fapl) # NEW
end Is there some other part of the HDF5 libraries that need to be initialized before creating a file access property list? |
Note that there is also the MPIO driver: Line 47 in cfa5a24
|
Thanks for the pointer to the MPIO driver. And you are exactly right about the file locking messing with the |
I see two paths forward:
|
OK, yeah, I think the lazy approach is the way to go. I tried adding similar # Check whether the HDF5 libraries were compiled with parallel support.
try
fapl = HDF5.init!(HDF5.FileAccessProperties())
comm = info = reinterpret(H5MPIHandle, 0)
API.h5p_set_fapl_mpio(fapl, comm, info)
DRIVERS[API.h5p_get_driver(fapl)] = MPIO
close(fapl)
HDF5.HAS_PARALLEL[] = true
catch e
@show e
end but that resulted in an MPI error: "The MPI_Comm_dup() function was called before MPI_INIT was invoked." It's obviously best to let the user call MPI_INIT, so the non-lazy approach seems fatally flawed in this case. |
One issue with the lazy approach is that a |
Perhaps the only driver we need to initialize is driver is the default one. That may be platform dependent. This may need to be done in |
Unless the user has already set `HDF5_USE_FILE_LOCKING` in the environment, set it to `FALSE` to avoid causing problems with mmap'ing. This needs to be done before the first call to the HDF5 APIs.
Instead of detecting parallel HDF5 support by whether `h5fd_mpio_init()` throws an exception, parallel support is detected by finding the `H5Pset_fapl_mpio` symbol in any loaded library whose name matches `r"hdf5"i`. The documentation for `H5Pset_fapl_mpio()` states: > H5Pset_fapl_mpio() is available only in the parallel HDF5 library The detection has also been moved into `Drivers.__init__()` and the inclusion of `mpio.jl` (upon loading of the MPI package) will only happen if the HDF5 libraries have parallel support.
I believe SEC2 is the default on all platforms. As for checking |
This maintains compatibility with pre-1.5 versions of Julia.
You probably should just do |
src/drivers/drivers.jl
Outdated
loaded_libs = filter(s->match(r"hdf5"i, s) !== nothing, Libc.Libdl.dllist()) | ||
for libname in loaded_libs | ||
has_mpio = Libc.Libdl.dlopen(libname) do lib | ||
Libc.Libdl.dlsym(lib, :H5Pset_fapl_mpio; throw_error=false) !== nothing | ||
end | ||
if has_mpio | ||
HDF5.HAS_PARALLEL[] = true | ||
break | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
loaded_libs = filter(s->match(r"hdf5"i, s) !== nothing, Libc.Libdl.dllist()) | |
for libname in loaded_libs | |
has_mpio = Libc.Libdl.dlopen(libname) do lib | |
Libc.Libdl.dlsym(lib, :H5Pset_fapl_mpio; throw_error=false) !== nothing | |
end | |
if has_mpio | |
HDF5.HAS_PARALLEL[] = true | |
break | |
end | |
end | |
API.Libdl.dlopen(API.libhdf5) do lib | |
HDF5.HAS_PARALLEL[] = Libdl.dlsym(lib, :H5Pset_fapl_mpio; throw_error=false) !== nothing | |
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose you could also just reference API.Libdl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this will work with a custom HDF5 library either
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I revised it to use API.libhdf5handle[]
instead of HDF5_jll.libhdf5_handle
.
Why are we not using the standard JLL override mechanism?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, that's not going to work either. HDF5.API.libhdf5
should work though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nonetheless, I think we should make sure to use the exact same shared library that the API module loaded rather than doing a search through the loaded libraries.
I pushed that latest commit before I saw the "use |
Not sure why 57c0e52 didn't get a green check, but CI did run after that was pushed. I think this is "done" other than possibly changing |
The original Overrides.toml approach (https://docs.binarybuilder.org/stable/jll/#Overriding-the-artifacts-in-JLL-packages) is way too restrictive to be useful (it requires that the libraries have the same .so version as the one in the jll) The Preferences.jl approach (https://docs.binarybuilder.org/stable/jll/#Overriding-specific-products) should work though |
API.h5p_set_fapl_core(fapl, 4096, false) | ||
DRIVERS[API.h5p_get_driver(fapl)] = Core |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could lazily initialize Core
as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This way seemed closer to the original, so it's a slightly smaller change, but I don't think there would be a problem lazily initializing Core as well if that's the consensus.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps all the set_driver!
methods should uniformly register their hid_t in DRIVERS
. From this __init__
, we just call set_driver!(fapl, POSIX())
.
I would also be willing to merge this as is and revise later. |
end | ||
|
||
fapl = HDF5.init!(HDF5.FileAccessProperties()) | ||
API.h5p_set_fapl_core(fapl, 4096, false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does 4096 come from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nowhere in particular (other than typical Linux page size). TBH, I suspect this could be set to 0 here because it's only used to setup fapl, which in turn is only used to get the ID of the Core driver. This fapl is never used to actually create anything. FWIW, I think this value defaults to 8192 elsewhere in HDF5.jl, but it's not clear where that value comes from.
DRIVERS[API.h5fd_sec2_init()] = POSIX | ||
@require MPI="da04e1cc-30fd-572f-bb4f-1f8673147195" include("mpio.jl") | ||
# disable file locking as that can cause problems with mmap'ing | ||
if !haskey(ENV, "HDF5_USE_FILE_LOCKING") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason we need to add this check here as well? Is it related to Drivers
being in its own module?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What we really need to do is address #959. Basically what happens is that the Drivers
module is initialized before the HDF5
module. However, if we create a FAPL during Drivers
initialization, then the environment variable no longer has the desired effect. Then memory mapping breaks before the files are now locked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks that makes sense.
For future work, I think we should also try to get a build with HDF5 1.13 in our CI. |
I would like to merge this and setup another pull request for revisions. |
Perhaps we should deprecate deps.jl and recommend the Preferences.jl approach? |
The h5fd_xxx_init() functions are not part of the HDF5 public API. A
method of retrieving the hid_t value of drivers that uses
public/supported API calls was suggested in a comment on a related
github issue and is implemented in this commit. For more details see:
HDFGroup/hdf5#1809 (comment)