Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove confusing example of create_dataset #974

Merged
merged 4 commits into from
Jun 24, 2022
Merged

Remove confusing example of create_dataset #974

merged 4 commits into from
Jun 24, 2022

Conversation

nhz2
Copy link
Member

@nhz2 nhz2 commented Jun 21, 2022

Based on the example:

We can write the `"mydataset"` by:
```@repl main
fid["mydataset"] = rand()
```
Or
```@repl main
create_dataset(fid, "myvector", rand(10))
```

I thought create_dataset would write the data, but create_dataset doesn't actually write any data, it just creates the dataset.

@mkitti
Copy link
Member

mkitti commented Jun 22, 2022

How about we show how to use write_dataset ?

@nhz2
Copy link
Member Author

nhz2 commented Jun 22, 2022

I think this is described in the "Reading and writing data" section.

HDF5.jl/docs/src/index.md

Lines 153 to 158 in d6ecf42

Datasets can be created with either
```julia
g["mydataset"] = rand(3,5)
write(g, "mydataset", rand(3,5))
```

It is also described in the "Mid-level routines" section.

HDF5.jl/docs/src/index.md

Lines 463 to 468 in d6ecf42

You can create and write data in one step,
```julia
write_dataset(parent, name, data[, lcpl, dcpl, dapl])
write_attribute(parent, name, data)
```

@musm
Copy link
Member

musm commented Jun 22, 2022

Not sure how removing this helps clarity. It seems like the paragraph above is clear on how to 'create' and also how to 'write' datasets.

Perhaps instead you could add more explanatory text on where you think the prose is lacking?

@mkitti
Copy link
Member

mkitti commented Jun 22, 2022

He's right in that the second call to create_dataset does not actually write any data.

image

I think it would clearer if we called this section "Creating a group or dataset":

Creating a group or dataset

Groups can be created via the function create_group.

create_group(fid, "mygroup")

We can create "mydataset" by indexing into fid. This also happens to write data to the dataset.

fid["mydataset"] = rand()

Alternatively, we can call create_dataset, which does not write data to the dataset. It merely creates the dataset.

create_dataset(fid, "myvector", Int, (10,))

Creating a dataset within a group is as simple as indexing into the group with the name of the dataset or calling create_dataset with the group as the first argument.

g = fid["mygroup"]
g["mydataset"] = "Hello World!"
create_dataset(g, "myvector", Int, (10,))

The do syntax is also supported. The file, group, and dataset handles will automatically be closed after the do block terminates.

h5open("example2.h5", "w") do fid
    g = create_group(fid, "mygroup")
    create_dataset(g, "myvector", Int, (10,))
end

@nhz2
Copy link
Member Author

nhz2 commented Jun 22, 2022

What you wrote seems much clearer than what is currently there.
I still find the behavior of create_dataset(fid, "myvector", rand(10)) confusing, and I think it would be clearer if it was changed in the example.

I find create_dataset(fid, "myvector", rand(10)) confusing for a couple of reasons.

  1. The h5py method with the same name does write the data. https://docs.h5py.org/en/stable/high/dataset.html#creating-datasets
  2. The example isn't a good use of create_dataset. There is no reason to call rand(10) here, since the results are just thrown away. Vector{Float64}(undef,10) would be better, or maybe something like:
data = rand(10)
dset = create_dataset(fid, "myvector", data)
write(dset, data)
  1. It doesn't fit well with the julia array syntax. Calling Vector(rand(10)) doesn't create an uninitialized vector. It uses the values returned by rand(10). The function similar creates an uninitialized array with the same type and size of a given array, so maybe create_dataset(fid, "myvector", data) should actually write the data and similar_dataset(fid, "myvector", data) would call create_dataset(fid, "myvector", eltype(data), size(data))

@mkitti
Copy link
Member

mkitti commented Jun 22, 2022

I changed all the instances in my comment to create_dataset(fid, "myvector", Int, (10,)) now exemplifying the form giving the datatype and dataspace dimensions.

@nhz2 nhz2 marked this pull request as draft June 23, 2022 15:07
Copy link
Member

@mkitti mkitti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. I suggest we consider deprecating the create_dataset form in another pull request that takes an array instance but does not write. After deprecation, we could then consider whether to reinstate it along the lines of the h5py interface.

@nhz2 nhz2 marked this pull request as ready for review June 23, 2022 19:39
@musm musm merged commit b9bf751 into JuliaIO:master Jun 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants