Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

move attributes to new file, add docs #943

Merged
merged 1 commit into from
May 31, 2022
Merged

Conversation

simonbyrne
Copy link
Collaborator

As part of my intermittent effort to clean up the code, this moves all the attributes-related stuff to a new file, and adds some docs.

@simonbyrne
Copy link
Collaborator Author

@musm @mkitti

@mkitti
Copy link
Member

mkitti commented May 29, 2022

Looks pretty good to me.

Should the Dataset, File, and Group related methods end up in attributes? Should we parcel those out to files as well?

Is there a bigger plan for a file tree?

@simonbyrne
Copy link
Collaborator Author

Should the Dataset, File, and Group related methods end up in attributes? Should we parcel those out to files as well?

I'm not sure what you mean: can you give an example?

Is there a bigger plan for a file tree?

I'd like to split up the functionality so it is easier to navigate and understand the API. At the moment, it is not even clear what the user-facing APIs are, let alone where they're defined. I originally tried to split them out with #833, but that was too ambitious, so I decided a piecemeal approach might be easier to manage.

This also makes it a bit easier to refactor and identify smaller changes. For example, after looking at this, there are a couple of things we might to consider to refactor (but I'll leave for separate PRs):

  • make Attributes a subtype of AbstractDict{String, Attribute}
  • add additional methods to Attributes (e.g. delete!) to make it full-featured (so that the mid-level API is not required)
  • deprecate "datasets act like attributes" with getindex/setindex! of strings (calling attributes(obj)["name"] isn't that hard).

@mkitti
Copy link
Member

mkitti commented May 29, 2022

deprecate "datasets act like attributes" with getindex/setindex! of strings (calling attributes(obj)["name"] isn't that hard).

While I kind of agree, I'm also not sure why this is a problem. Is there any ambiguity that this causes at the moment?

@mkitti
Copy link
Member

mkitti commented May 29, 2022

make Attributes a subtype of AbstractDict{String, Attribute}

I think we should target this for 0.17.

2.45
```
"""
function read_attribute(parent::Union{File,Group,Dataset,Datatype}, name::AbstractString)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we decide that read_attribute belongs in attributes.jl and not objects.jl?

If this were an OOP language, this method would belong to the types of parent.

Do we move all the attribute related methods into their own file mainly because these are leaves in the HDF5 tree?

Basically, I'm just trying to figure out the organization principle so we can be consistent going forward.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll admit that there aren't any clear rules, but there are a couple of reasons to keep it here:

  • they all use the attribute (h5a_XXX) C API functions
  • attributes are an "extra" thing stuck on top of HDF5 (i.e. you can define groups, datasets etc without referring to attributes at all)
  • in this particular case, it is a method that applies to multiple objects (so if you were to put it with those, it would need to be defined multiple times)

@simonbyrne
Copy link
Collaborator Author

While I kind of agree, I'm also not sure why this is a problem. Is there any ambiguity that this causes at the moment?

  1. it's inconsistent with other objects (Files or Groups), which use it for accessing sub-objects.
  2. when reading code, it's a lot easier to understand what attributes(dataset)["name"] does.

(anyway, that was why i wanted to leave it to a new PR)

@mkitti
Copy link
Member

mkitti commented May 29, 2022

I originally tried to split them out with #833,

I agree with changes in #833 more than I disagree with them. Using #833 as a practical example, perhaps prioritizing what to split out and why in an issue might be helpful.

Comment on lines +18 to +23
## Convenience interface

```@docs
h5readattr
h5writeattr
```
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what to call these functions (honestly, these might be another good candidate for deprecation)

@mkitti
Copy link
Member

mkitti commented May 29, 2022

While I kind of agree, I'm also not sure why this is a problem. Is there any ambiguity that this causes at the moment?

  1. it's inconsistent with other objects (Files or Groups), which use it for accessing sub-objects.
  2. when reading code, it's a lot easier to understand what attributes(dataset)["name"] does.

(anyway, that was why i wanted to leave it to a new PR)

Is it possible to have an attribute or dataset with the same name in a group?

One technical advantage I see to your approach is that we would likely improve type stability overall by doing this.

I do think attributes(dataset)["name"] is a bit verbose and hard to intuit though.

@mkitti
Copy link
Member

mkitti commented May 29, 2022

Overall, I would be in favor of merging this sooner rather than later.

@simonbyrne simonbyrne mentioned this pull request May 29, 2022
9 tasks
@simonbyrne
Copy link
Collaborator Author

Is it possible to have an attribute or dataset with the same name in a group?

Yes:

julia> h5 = h5open("xx.h5", "cw")
🗂️ HDF5.File: (read-write) xx.h5

julia> attributes(h5)["aa"] = [1.0]
1-element Vector{Float64}:
 1.0

julia> h5["aa"] = [2.0]
1-element Vector{Float64}:
 2.0

julia> h5
🗂️ HDF5.File: (read-write) xx.h5
├─ 🏷️ aa
└─ 🔢 aa

@mkitti
Copy link
Member

mkitti commented May 29, 2022

Is it possible to have an attribute or dataset with the same name in a group?

Yes:

I'm convinced. The interface needs to be deprecated.

@mkitti
Copy link
Member

mkitti commented May 29, 2022

Let's set a deadline of May 4th to merge this. I just want to give it more time because of the US holiday so everyone can see it.

end
end
Base.cconvert(::Type{API.hid_t}, attr::Attribute) = attr
Base.unsafe_convert(::Type{API.hid_t}, attr::Attribute) = attr.id
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@musm
Copy link
Member

musm commented May 31, 2022

Looks great to me, I'm also in strong approval of all the suggestions in

  • make Attributes a subtype of AbstractDict{String, Attribute}
  • add additional methods to Attributes (e.g. delete!) to make it full-featured (so that the mid-level API is not required)
  • deprecate "datasets act like attributes" with getindex/setindex! of strings (calling attributes(obj)["name"] isn't that hard).

On the last bullet, I agree. It's not very clear why dataset act like attributes and it's certainly far more descriptive to make users call attributes on the dataset if that's what they want to work with.

@musm musm merged commit 8338e09 into JuliaIO:master May 31, 2022
@simonbyrne simonbyrne deleted the sb/attributes branch May 31, 2022 17:34
@simonbyrne
Copy link
Collaborator Author

simonbyrne commented Jun 2, 2022

After looking at this a bit more, I thing one other change that may make sense is that getindex(::Attribute, ::String) should actually read the attribute value, rather than returning an Attribute object (since there is no way to read only part of an attribute). This however would be a breaking change.

@simonbyrne
Copy link
Collaborator Author

This appears to be what h5py does: https://docs.h5py.org/en/stable/quick.html#attributes

@mkitti
Copy link
Member

mkitti commented Jun 2, 2022

getindex(::Attribute, ::String)

Do you mean getindex(::Attributes, ::String)?

@mkitti
Copy link
Member

mkitti commented Jun 2, 2022

Maybe we should have a attrs that returns a AbstractDict{String, Any} and acts like the h5py interface.

Meanwhile, perhaps we should parameterize attribute by its Julia type and have it work like a Ref.

@mkitti
Copy link
Member

mkitti commented Jun 2, 2022

I would also be interested in comparing to the Zarr.jl as well.

@simonbyrne simonbyrne mentioned this pull request Jun 4, 2022
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants