Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add support for read-only s3 virtual driver #1010

Merged
merged 16 commits into from
Nov 21, 2022
Merged

add support for read-only s3 virtual driver #1010

merged 16 commits into from
Nov 21, 2022

Conversation

gszep
Copy link
Contributor

@gszep gszep commented Sep 11, 2022

Related to #552; We would like the equivalent functionality as h5py

h5open("s3://..."; driver=Drivers.ROS3(aws_region="us-west-2", key="", id="")) do file
    file
end

@gszep gszep marked this pull request as draft September 11, 2022 22:28
Copy link
Member

@mkitti mkitti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the pull request.

We need to pass a pointer to H5Pset_fapl_ros3. Perhaps we should box the struct in a Ref or concretely as a RefValue.

src/drivers/drivers.jl Outdated Show resolved Hide resolved
src/api/types.jl Outdated Show resolved Hide resolved
src/drivers/drivers.jl Outdated Show resolved Hide resolved
src/drivers/drivers.jl Outdated Show resolved Hide resolved
This is the read-only virtual driver that enables access to HDF5 objects stored in AWS S3
"""
struct ROS3 <: Driver
fa::API.H5FD_ros3_fapl_t
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably should be a Base.RefValue{API.H5FD_ros3_fapl_t}. It can then be converted to a pointer by ccall.

@gszep
Copy link
Contributor Author

gszep commented Sep 11, 2022

@mkitti thank you for the helpful review! Do you have a recommendation for a unit test for this? We need a public HDF5 file hosted on S3.. It appears that some of the builds do not pass raising

could not find function H5Pset_fapl_ros3 in library libhdf5

suggesting that these libhdf5 builds to not have ros3

@mkitti
Copy link
Member

mkitti commented Sep 12, 2022

You might need to register this lazily like the mpio driver rather than during __init__. Not all of the HDF5 binaries we bundle have this driver enabled.

@mkitti
Copy link
Member

mkitti commented Sep 12, 2022

The modifications to functions.jl directly are not going to work. They will be removed the next time someone calls the API generator.

I will provide a suggestion t as an alternate fix.

Copy link
Member

@mkitti mkitti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are some suggestions.

Please regenerate functions.jl.

src/drivers/drivers.jl Outdated Show resolved Hide resolved
src/drivers/drivers.jl Outdated Show resolved Hide resolved
src/file.jl Show resolved Hide resolved
test/plain.jl Outdated Show resolved Hide resolved
test/ros3.jl Outdated
Comment on lines 7 to 9
h5open("http://s3.us-east-2.amazonaws.com/hdf5ros3/GMODO-SVM01.h5"; driver=HDF5.Drivers.ROS3()) do f
@test keys(f) == ["All_Data", "Data_Products"]
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Can you add a test to see if you can actually read data correctly?

@@ -477,6 +477,7 @@ h5pl_size
- [`h5p_get_fapl_mpio32`](@ref h5p_get_fapl_mpio32)
- [`h5p_get_fapl_mpio64`](@ref h5p_get_fapl_mpio64)
- [`h5p_get_fapl_multi`](@ref h5p_get_fapl_multi)
- [`h5p_get_fapl_ros`](@ref h5p_get_fapl_ros)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... I think we may need to the patch the generator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed. the name should be h5p_get_fapl_ros3

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a bind exception for ros3 in cc6625e

Copy link
Member

@mkitti mkitti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few changes. If you could just comment out the failing tests and mark it for later that would be fine. It would be even better if you could narrow the situation to remove the test / error case.

@mkitti
Copy link
Member

mkitti commented Sep 13, 2022

This is getting closer. I need to examine the lazy loading mechanism more closely.

Basically it should only try to load the driver when the user tries to use it rather than eagerly loading it during initialization.

return (ROS3 ∘ API.H5FD_ros3_fapl_t)(
version,
auth,
Base.unsafe_convert(Cstring, region),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be a regular convert or does it have to be an unsafe_convert ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It needs to be an unsafe_convert:

julia> convert(Cstring, "hello")
ERROR: MethodError: Cannot `convert` an object of type String to an object of type Cstring
Closest candidates are:
  convert(::Type{Cstring}, ::Union{Ptr{Int8}, Ptr{Nothing}, Ptr{UInt8}}) at ~/src/julia-1.7.1/share/julia/base/c.jl:167
  convert(::Type{T}, ::T) where T at ~/src/julia-1.7.1/share/julia/base/essentials.jl:218
Stacktrace:
 [1] top-level scope
   @ REPL[55]:1

julia> Base.unsafe_convert(Cstring, "hello")
Cstring(0x00007fc5c5e06838)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. you need to first call cconvert(Cstring, region) (this converts to String, and ensures it is nul-terminated)
  2. you then need to keep a reference to the result of this whenever you pass it to a ccall

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably the best option is to add them as extra fields to the ROS3 struct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't we just do unsafe_convert(Cstring, String(region))?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I missed the original point of this all. Please correct me if I'm still wrong in my thinking
So the reason we need this is because we use the ROS3 instance in further calls (specifically h5p_set_fapl_ros3 h5p_get_fapl_ros3) and those need to convert the outward driver struct ROS3 to the internal libhdf5 struct H5FD_ros3_fapl_t and right now there's a mismatch in that ROS3 doesn't agree with the layout of H5FD_ros3_fapl_t?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to convert the Julia struct to the C struct. Part of the issue is that the Julia struct and its Strings could be garbage collected. Meanwhile, the Cstring is an unmanaged pointer to a null terminated character array.

Anyways, it turns out that the C struct definition here is wrong. The correct definition is

https://github.com/HDFGroup/hdf5/blob/100b22e6c23c44a082fd69b8c05a63c7492083f7/src/H5FDros3.h#L82-L88

typedef struct H5FD_ros3_fapl_t {
    int32_t version;
    hbool_t authenticate;
    char    aws_region[H5FD_ROS3_MAX_REGION_LEN + 1];
    char    secret_id[H5FD_ROS3_MAX_SECRET_ID_LEN + 1];
    char    secret_key[H5FD_ROS3_MAX_SECRET_KEY_LEN + 1];
} H5FD_ros3_fapl_t;

In Julia, that would be

struct H5FD_ros3_fapl_t
    version::Int32
    authenticate::Bool
    aws_region::NTuple{H5FD_ROS3_MAX_REGION_LEN + 1,Cchar}
    secret_id ::NTuple{H5FD_ROS3_MAX_SECRET_ID_LEN + 1, Cchar}
    secret_key::NTuple{H5FD_ROS3_MAX_SECRET_KEY_LEN + 1, Cchar};
end

I've started to rewrite this. Also, I think we need to check our definition of hbool_t. Currently it is UInt32, but I think it probably should be UInt8.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's probably not the first time there has been a mismatch in definitions. Do you want to push your changes on this branch?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I will later today. I just started it on a different computer than I am on right now. I didn't quite get the tests to pass yet on the branch I was working on.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I quickly saw this comment on the hbool issue aldanor/hdf5-rust#28
but perhaps it doesn't affect us since we dont' use msvc

@mkitti
Copy link
Member

mkitti commented Sep 14, 2022

I think this is about ready now. Is it?

@mkitti
Copy link
Member

mkitti commented Sep 18, 2022

@simonbyrne , could you take a look at this pull request?

@mkitti
Copy link
Member

mkitti commented Sep 18, 2022

@gszep is there any reason this is still in draft form?

@mkitti
Copy link
Member

mkitti commented Sep 21, 2022

I think we should move forward with this.

@mkitti mkitti marked this pull request as ready for review September 21, 2022 08:12
@@ -0,0 +1,45 @@
"""
ROS3()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update the docstring signature

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added additonal signatures below this line.

@simonbyrne
Copy link
Collaborator

Do we have a way to test this?

@mkitti
Copy link
Member

mkitti commented Sep 22, 2022

There is a test already, but without authentication.

There seems to be an example here:
https://www.hdfgroup.org/solutions/cloud-amazon-s3-storage-hdf5-connector/

ROS3(region::AbstractString, id::AbstractString, key::AbstractString) =
ROS3(1, true, region, id, key)
ROS3() = ROS3(1, false, "", "", "")
_ntuple_to_string(x) = unsafe_string(Ptr{Cchar}(pointer_from_objref(Ref(x))), length(x))
Copy link
Member

@musm musm Sep 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't convert the ntuple to a string back for me

_ntuple_to_string(x) = unsafe_string(Ptr{Cchar}(pointer_from_objref(Ref(x))), length(x))
x = "Dfsddfs"
a = ntuple(i -> i <= length(x) ? Cchar(x[i]) : Cchar(0x0), 10 + 1)
julia> _ntuple_to_string(a)
"\xc8\x18\xb2\n\0\0\0\0X\xd0\xc3"

Shouldn't this actually look like something like the following

function _ntuple_to_string(buf)
    len = findfirst(iszero, buf) -1
    unsafe_string(Base.unsafe_convert(Ptr{Int8}, collect(buf[1:len])))
end

Looks like this can also be written as

function _ntuple_to_string(buf)
    len = findfirst(iszero, buf) -1
    unsafe_string(Base.unsafe_convert(Ptr{Int8}, collect(buf)), len)
end

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unsafe_string already searches for a null itself.

julia> function _ntuple_to_string(x)
           r = Ref(x)
           GC.@preserve r unsafe_string(Ptr{Cchar}(pointer_from_objref(r)))
       end
_ntuple_to_string (generic function with 1 method)

julia> _ntuple_to_string(a)
"Dfsddfs"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also probably should use ncodeunits and codeunits like you suggested before:
#818 (comment)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use FixedString, which is already in the package?

struct FixedString{N,PAD}
data::NTuple{N,UInt8}
end
Base.length(::Type{FixedString{N,PAD}}) where {N,PAD} = N
Base.length(str::FixedString) = length(typeof(str))
pad(::Type{FixedString{N,PAD}}) where {N,PAD} = PAD
pad(x::T) where {T<:FixedString} = pad(T)

Copy link
Member

@mkitti mkitti Sep 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FixedString is not an AbstractString.

I started working on a version of InlineStrings.jl called NStrings.jl over the weekend that basically is a full blown general implementation of this and is an AbstractString.

This is basically an implementation of this idea:
JuliaStrings/InlineStrings.jl#6

I'll publish that repo soon.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was actually about to suggest moving it to a separate repo! Nice!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The initial version of NTuple based Strings is here:
https://github.com/mkitti/NStrings.jl

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current NTuple based Strings are now here and registered:

https://github.com/mkitti/StaticStrings.jl

@mkitti
Copy link
Member

mkitti commented Nov 18, 2022

Let's merge just merge this. I'll follow up with a StaticStrings.jl pull request.

@mkitti mkitti mentioned this pull request Nov 18, 2022
@musm musm merged commit 9891593 into JuliaIO:master Nov 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants