Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrap a subset of the libhdf5 table interface methods #777

Merged
merged 9 commits into from
Dec 14, 2020

Conversation

musm
Copy link
Member

@musm musm commented Dec 10, 2020

Methods from #511


function h5tb_get_field_info(loc_id, table_name)
nfields, = h5tb_get_table_info(loc_id, table_name)
field_names = Vector{UInt8}[fill(0x00,255) for i in 1:2]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if there is any way to determine the length of these.
In https://confluence.hdfgroup.org/display/HDF5/H5TB_GET_FIELD_INFO they just allocate 255 sized buffer.

src/api_helpers.jl Outdated Show resolved Hide resolved
field_sizes = Vector{Csize_t}(undef,nfields)
field_offsets = Vector{Csize_t}(undef,nfields)
type_size = Ref{Csize_t}()
while true
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To work around the issue of determing the string buffer size, here I just initially arbitrary set it to 64 then we keep doubling until it until the string fits (detected through presence of a null byte)

Copy link
Contributor

@jmert jmert Dec 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, I don't think this is valid/safe: https://bitbucket.hdfgroup.org/projects/HDFFV/repos/hdf5/browse/hl/src/H5TB.c?at=refs%2Fheads%2Fhdf5_1_10#3092 libhdf5 just does a strcpy, so for a long name, it is overflowing the buffer.

Copy link
Member Author

@musm musm Dec 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, which is why if it doesn't fit, we resize the buffer and then check if there's a 0x00 in the buffer which would indicate it was copied correctly. If it didn't fit it would overflow the buffer and no null byte would be detected, in which case we increase the buffer size.

My local testing, where I set the fieldname to be a long string, showed that this strategy worked.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But overflowing means you're overwriting other bytes in memory — this is a classic buffer overflow attack vector.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears this is just a wrapper over datatype querying functions, so unfortunately the safe thing to do might be to just pass C_NULL pointer for the names (which the function implementation shows should just skip over filling), and then to do the loop of h5t_get_member_name over the table dataset's datatype ourselves.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, good point. I don't know what we can do other than perhaps loop through and call H5Tget_member_name to manually determine the required buffer size? At which point we're just reimplementing this function?

Heck even their example is also susceptible to overflow:
https://confluence.hdfgroup.org/display/HDF5/H5TB_GET_FIELD_INFO

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm kind of disappointed with both the documentation and implementation of this function :-\

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I updated this PR to avoid the buffer attack, by computing the column names via h5t_get_member_name

@musm
Copy link
Member Author

musm commented Dec 10, 2020

This is now ready for review.

@musm musm changed the title Wrap several table interface functions Wrap a subset of the libhdf5 table interface methods Dec 11, 2020
@musm
Copy link
Member Author

musm commented Dec 13, 2020

Sans objections I plan to merge soon

src/api_helpers.jl Outdated Show resolved Hide resolved
@musm musm merged commit bbac593 into JuliaIO:master Dec 14, 2020
@musm musm deleted the tableint branch December 14, 2020 15:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants