-
Notifications
You must be signed in to change notification settings - Fork 141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Define AttributeDict #948
Define AttributeDict #948
Conversation
I'm not sure if we need to deprecate |
It's kind of confusing to keep both, and everything it does can be better handled by the mid-level APIs. |
1952ff8
to
fddc71a
Compare
Even if we are to deprecate For the moment, we need to duplicate the tests, not replace them. We can remove the tests when we actually remove the API. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I like the idea of having a AbstractDict{String, Any}
interface emulating h5py. The interface is simple to use. However, we should also figure out a way to provide a Julian interface with additional type safe features.
We should retain the tests for the current attributes
API until removal. We cannot remove this without a deprecation period and at least a minor version bump.
I would also prefer to have deprecation in another PR while we evaluate the matter. It would also help to narrow the diff. We should consider if there might be a Julian way to emulate the C++ highfive interface with parametric types:
https://bluebrain.github.io/HighFive/class_high_five_1_1_attribute.html
One consideration is we could also specify the value type so that we can have an AttributeDict{T} <: AbstractDict{String, T} where T
rather than the value being Any
. Accessing attributes via this interface should use convert
to coerce the type.
In summary, I think the easiest path here would be to focus on adding the new interface and make it optionally type safe.
@@ -188,45 +193,82 @@ function h5readattr(filename, name::AbstractString) | |||
end | |||
|
|||
|
|||
struct Attributes | |||
parent::Union{File,Object} | |||
struct AttributeDict <: AbstractDict{String,Any} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's consider parameterizing this so that we can have a more specific type than Any
, which can be the default is not specified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how that could be made to work, or really what advantage it would give. Can you expand on what you had in mind? What would happen if you attempted to read or write an attribute of a different type?
My intention was to provide a simple, easy-to-use interface. I think if you want to specify the return type, it would be better to do in the mid-level interface, e.g. adding a return type to read_attribute
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps AttributeDict{T}
could use convert(T, ...)
? If convert
throws an error, then so be it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It adds a lot of complexity and potential brittleness, and I'm still skeptical as to why it would be helpful? The files I've seen with attributes use a mix of different types (typically strings, integers and vectors of integers).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's my prototype diff:
JaneliaSciComp@c3a5db3
I added a try
to Base.iterate
to skip over items where convert
fails.
Type stability can be helpful. Since I'm accessing the HDF5 file in Julia rather than Python, I would like to take advantage of the type system. In many cases, I may have prior knowledge of the type I'm expecting the attribute to be, or I can glean that from the API. This provides an easy way to enforce knowledge of the type, allowing subsequent code to be type stable without having to use a completely different API. It also throws an error if my assumptions are violated.
Having the parameter default to Any
produces the exact same results you have now. Allowing for a parameter allows the same API to be extended to encourage type stability.
julia> attrs(h5f)
AttributeDict of HDF5.Group: / (file: test.h5) with 4 attributes:
"bye" => "3.0"
"hello" => 5
"someint" => 3
"test" => 5.5
julia> attrs(h5f, Int)
AttributeDict of HDF5.Group: / (file: test.h5) with 4 attributes:
"hello" => 5
"someint" => 3
julia> attrs(h5f, String)
AttributeDict of HDF5.Group: / (file: test.h5) with 4 attributes:
"bye" => "3.0"
julia> attrs(h5f, Float64)
AttributeDict of HDF5.Group: / (file: test.h5) with 4 attributes:
"hello" => 5.0
"someint" => 3.0
"test" => 5.5
julia> attrs(h5f, Int)["test"]
ERROR: InexactError: Int64(5.5)
Stacktrace:
[1] Int64
@ ./float.jl:812 [inlined]
[2] convert(#unused#::Type{Int64}, x::Float64)
@ Base ./number.jl:7
[3] getindex(x::HDF5.AttributeDict{Int64}, name::String)
@ HDF5 ~/.julia/dev/HDF5/src/attributes.jl:224
[4] top-level scope
@ REPL[12]:1
There a few things to explore here, such as passing the parameter as the memtype
to read_attribute
, but I think this would allow this API to go a lot further. Since we are talking about parameterizing Dataset
as well, I think adding a type parameter to AttributeDict
would be consistent with future direction of this package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are talking about parameterizing Dataset as well, I think adding a type parameter to AttributeDict would be consistent with future direction of this package.
The analogy to Dataset
would be adding a type parameter to Attribute
though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dataset
would be struct Dataset{T,N} <: AbstractArray{T,N}
or struct Dataset{T,N} <: AbstractDiskArray{T,N}
.
I wanted to explore if Attribute
could become Attribute{T}
and have a corresponding struct Attributes{T2} <: AbstractDict{String, Attribute{<:T2}}
. However, you want to deprecate that interface, which is why I'm exploring the idea here.
I would be fine if we parameterized the struct but only implemented methods for AttributeDict{Any}
in this pull request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't see the benefit of adding a parameter to AttributeDict
: how would it help?
Okay, I can move the deprecations to a new PR.
I admit to not having much experience with C++, but this looks similar to the mid-level interface, combined with adding a type parameter or output array to |
Here's an excerpt from their code: https://github.com/BlueBrain/Brion/blob/2b42ecd9c251741c42970141f34d292aee53d991/brion/plugin/compartmentReportHDF5.cpp#L78-L98 bool _verifyFile(const HighFive::File& file)
{
try
{
uint32_t magic = 0;
file.getAttribute("magic").read(magic);
if (magic != _sonataMagic)
return false;
std::vector<uint32_t> version;
file.getAttribute("version").read(version);
if (version.size() != 2 || version[0] != _currentVersion[0] ||
version[1] != _currentVersion[1])
return false;
}
catch (HighFive::Exception& e)
{
return false;
}
return true;
}
|
Thanks that is helpful. If we were to parametrise |
|
The more I think about it, Attributes is really more like a |
It's not static though: I can add or delete attributes |
I meant what would the attribute type parameter be? |
|
It's a shame we can't use |
Could you move the deprecated methods to the |
I asked him to do the deprecation in another pull request. |
Agreed, but once it's deprecated it become easier. |
@@ -178,8 +184,7 @@ function h5readattr(filename, name::AbstractString) | |||
file = h5open(filename,"r") | |||
try | |||
obj = file[name] | |||
a = attributes(obj) | |||
dat = Dict(x => read(a[x]) for x in keys(a)) | |||
dat = Dict(attrs(obj)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dat = Dict(attrs(obj)) | |
dat = Dict(AttributeDict(obj)) |
Internally, we should just use the constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any difference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, there is one less level of indirection. For no difference we need const attrs = AttributeDict
.
test/attributes.jl
Outdated
@test haskey(attrs(f), "a") | ||
@test attrs(f)["a"] == 1 | ||
|
||
attrs(f)["b"] = [2,3] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test attrs
on groups and datasets as well.
Co-authored-by: Mark Kittisopikul <[email protected]>
I've had the opportunity to dig into However, there are fundamental issues down to the The only non-breaking route to type stable reads that I can see is by adding new methods. In a sense, the new That said, I believe it would be possible to eventually use the following syntax to do type stable reads of attributes. It would just need to be rebased upon a lower level API than magic = AttributeDict{UInt32}(file)["magic"]
version = AttributeDict{Vector{UInt32}}(file)["version"] The advantage of this syntax is that the second type parameter for My conclusion at the moment is that it is probably possible to achieve type stability through this new access mechanism, but I'm sympathetic to the idea that this might be too much work to be achieved in this pull request. Scoping everything to The main thing on my mind at the moment is expanding test coverage for |
My conclusion is that we should proceed with the changes in this PR and revisit the type stability issue in other PRs. I don't think it's a high priority for the new interface. We should probably instead aim to get the new API in and then focus on stability later. Personally, I'm not confident how much of a performance benefit stability could have, although I understand it's desirable in a lot of contexts.
I think we should look into this more seriously, even if it means a breaking change. Especially if it simplifies type-stability with the AttributeDict interface and other API methods. It sounds more robust than trying to monkey patch the AttributeDict interface for type stability, which doesn't sound entirely trivial with the current code. |
Within HDF5.jl, the performance impact is likely not great. However, the performance impact could be significant in downstream applications as this makes dynamic dispatch much more likely. Moreover, the issue is about correctness. Can people write scientific programs using HDF5.jl that will be robust to variations in HDF5 files? Furthermore, can we balance convenience and correctness? The immediate question is if there is anything that we can do here that will make iteration on this design easier in the future?
const attrs = AttributeDict
AttributeDict(x) = AttributeDict{Any}(x)
magic = attrs(file)["magic"]
version = attrs(file)["version"] Later we might then implement the following: magic = attrs{UInt32}(file)["magic"]
version = attrs{Vector{UInt32}}(file)["version"] |
I think both of those changes could easily be made at a later date without causing breakages. For the record, I am opposed to 1, as I fail to see the benefits from the added complexity (I think type-stable access would be better done via |
Adding a type parameter after release could be breaking if someone directly accessed the type, so we should be clear that it is not part of the public API. I do not see how qualifying the current implementation as Let's expand the testing to directly test groups, datasets, and datatypes, then merge. |
Oops, I pushed 875807a here instead of my own fork. Any objections to the expanded testing? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merge when ready
@@ -63,6 +63,16 @@ function Base.show(io::IO, attr::Attributes) | |||
print(io, "Attributes of ", attr.parent) | |||
end | |||
|
|||
Base.show(io::IO, attrdict::AttributeDict) = summary(io, attrdict) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does Julia end up printing the rest of the attributes ? Is that in some sort of fallback AbstractDict printing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the REPL printing is this one:
https://github.com/JuliaLang/julia/blob/803f90db9195c5c72df90d8a424c7066f1a8f2ee/base/show.jl#L133
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah it's the 3 argument show method. Thanks, I was wondering why @which
was sending me to a seemingly unrelated call.
This follows on from #943 (comment)
It defines
attrs(obj)
which returns anAttibuteDict
: this is mostly the same asAttributes
object, except:AbstractDict{String, Any}
, and supports the relevant features (keys
,pairs
,values
)getindex
willread
instead of returning anAttribute
delete!
TODO:
attributes
getindex
andsetindex!
onDataset
s for writing attributes