Segfault when writing variable length string as attribute #1129

Closed
ericphanson opened this issue Nov 1, 2023 · 8 comments · Fixed by #1130

ericphanson commented Nov 1, 2023

julia> using HDF5

julia> fid = h5open("test.h5", "w")
🗂️ HDF5.File: (read-write) test.h5

julia> attr = create_attribute(fid, "attr-name", datatype(String), dataspace(String))
🏷️ HDF5.Attribute: attr-name

julia> write_attribute(attr, datatype(String), "attr-value")

[16750] signal (11.2): Segmentation fault: 11
in expression starting at REPL[4]:1
_platform_strlen at /usr/lib/system/libsystem_platform.dylib (unknown line)
Allocations: 1346244 (Pool: 1345285; Big: 959); GC: 2
zsh: segmentation fault  julia --project

using

  [f67ccb44] HDF5 v0.17.1

and

Julia Version 1.9.3
Commit bed2cd540a1 (2023-08-24 14:43 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (arm64-apple-darwin22.4.0)
  CPU: 8 × Apple M1
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, apple-m1)
  Threads: 4 on 4 virtual cores
Environment:
  JULIA_VERSION = 1.9.3
  JULIA_NUM_THREADS = 4
  JULIA_PKG_SERVER_REGISTRY_PREFERENCE = eager

Also, the documentation around create_attribute is confusing. It says to use write_attribute to actually write the data, but if you call write_attribute(fid, "attr-key", "attr-value") instead of passing the Attribute object returned by create_attribute, it errors and says the attribute already exists.

@ericphanson
Author

I was able to work around it with the following code:

julia> function write_variable_length_string_attribute(fid, attr_key::String, attr_value::String)
           attr = create_attribute(fid, attr_key, datatype(String), dataspace(String))
           v = Vector{UInt8}(attr_value)
           GC.@preserve v begin
               p = pointer(v)
               write_attribute(attr, datatype(String), Ref(p))
           end
           return nothing
       end
write_variable_length_string_attribute (generic function with 1 method)

julia> fid = h5open("test.h5", "w")
🗂️ HDF5.File: (read-write) test.h5

julia> write_variable_length_string_attribute(fid, "attr-key", "attr-value")

julia> close(fid)

shell> h5dump test.h5
HDF5 "test.h5" {
GROUP "/" {
   ATTRIBUTE "attr-key" {
      DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_UTF8;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "attr-value"
      }
   }
}
}

The context here is that I need to write a variable-length string as an attribute, so that Python code using h5py will interpret the attribute as a string and not as a NumPy byte array (xref https://docs.h5py.org/en/stable/strings.html).

@ericphanson changed the title from "Segfault when writing attribute" to "Segfault when writing variable length string as attribute" on Nov 1, 2023

@ericphanson
Author

I don't know if something specific should be done to ensure null termination. I added a branch here, though the output seems exactly the same:

julia> using HDF5

julia> fid = h5open("test.h5", "w")
🗂️ HDF5.File: (read-write) test.h5

julia> function write_variable_length_string_attribute(fid, attr_key::String, attr_value::String)
           attr = create_attribute(fid, attr_key, datatype(String), dataspace(String))
           v = Vector{UInt8}(attr_value)
           v[end] == 0 || push!(v, 0) # null termination?
           GC.@preserve v begin
               p = pointer(v)
               write_attribute(attr, datatype(String), Ref(p))
           end
           return nothing
       end
write_variable_length_string_attribute (generic function with 1 method)

julia> write_variable_length_string_attribute(fid, "attr-key", "attr-value")

julia> close(fid)

shell> h5dump test.input
h5dump error: unable to open file "test.input"

shell> h5dump test.h5
HDF5 "test.h5" {
GROUP "/" {
   ATTRIBUTE "attr-key" {
      DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_UTF8;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "attr-value"
      }
   }
}
}

@simonbyrne
Collaborator

We should update the docs to recommend that everyone use attrs:

attrs(fid)["attr-key"] = "attr-value"

I don't get why this is segfaulting though?

@simonbyrne
Collaborator

Oh, I misunderstood: by default we write them as fixed-length strings...

@simonbyrne
Collaborator

It looks like we have to pass a pointer to a string pointer.
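
A minimal sketch, without HDF5, of what "a pointer to a string pointer" means on the Julia side. The function name `roundtrip_through_char_pp` is hypothetical, and reading back with `unsafe_string` only imitates what a C library would do with such a buffer. It is an assumption, not something confirmed in this thread, that the original call crashed because the string's own bytes were treated as an address, although that would be consistent with the `strlen` frame in the backtrace.

```julia
# Hypothetical helper: build a `char **`-style buffer for `s`, then read it back
# the way a C library would (load the `char *`, scan to the NUL terminator).
function roundtrip_through_char_pp(s::String)
    val = Base.cconvert(Cstring, s)            # object that must stay alive
    GC.@preserve val begin
        p = Base.unsafe_convert(Cstring, val)  # `char *`: address of the first byte
        buf = Ref(p)                           # one-element buffer holding that address
        return unsafe_string(buf[])            # load the address, copy bytes up to NUL
    end
end

roundtrip_through_char_pp("attr-value") == "attr-value"   # true
```

Passing `Ref(p)` rather than the string itself is the point: the callee receives the location of a pointer, not the characters.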

@ericphanson
Author

Is

function write_variable_length_string_attribute(fid, attr_key::String, attr_value::String)
    attr = create_attribute(fid, attr_key, datatype(String), dataspace(String))
    v = Vector{UInt8}(attr_value)
    v[end] == 0 || push!(v, 0) # null termination?
    GC.@preserve v begin
        p = pointer(v)
        write_attribute(attr, datatype(String), Ref(p))
    end
    return nothing
end

safe/legit? It seems to work, but I don't really know what I am doing.

@simonbyrne
Collaborator

The most reliable option is to use Base.cconvert/Base.unsafe_convert:

julia> using HDF5

julia> fid = h5open("test.h5", "w")
🗂️ HDF5.File: (read-write) test.h5

julia> attr = create_attribute(fid, "attr-name", datatype(String), dataspace(String))
🏷️ HDF5.Attribute: attr-name

julia> val = Base.cconvert(Cstring, "attr-val") # ensures string is nul-terminated
"attr-val"

julia> GC.@preserve val begin
          p = Base.unsafe_convert(Cstring, val)
          write_attribute(attr, datatype(String), Ref(p))
       end

julia> close(fid)
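
For reuse, the same pattern could be wrapped in a function. This is only a sketch under assumptions: the name `write_vlen_string_attribute` is hypothetical, the `create_attribute`/`write_attribute` calls are the ones used in this thread with HDF5 v0.17.1, and the read-back with `read_attribute` assumes the package decodes variable-length string attributes into a Julia `String`.

```julia
using HDF5

# Hypothetical helper wrapping the cconvert/unsafe_convert pattern from this thread.
function write_vlen_string_attribute(parent, name::AbstractString, value::AbstractString)
    attr = create_attribute(parent, String(name), datatype(String), dataspace(String))
    try
        val = Base.cconvert(Cstring, String(value))
        GC.@preserve val begin
            # Throws an ArgumentError if `value` contains an embedded NUL byte.
            p = Base.unsafe_convert(Cstring, val)
            write_attribute(attr, datatype(String), Ref(p))
        end
    finally
        close(attr)   # release the attribute handle even if the write fails
    end
    return nothing
end

# Usage: write to a temporary file, then read the attribute back.
path = joinpath(mktempdir(), "test.h5")
h5open(path, "w") do fid
    write_vlen_string_attribute(fid, "attr-key", "attr-value")
end
value = h5open(path, "r") do fid
    read_attribute(fid, "attr-key")
end
```

Running `h5dump` on the resulting file is a way to check that the attribute was stored with `STRSIZE H5T_VARIABLE`, as in the dumps earlier in the thread.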
