
Have VectorData expandable by default #1064

Closed
wants to merge 7 commits into from

Conversation

Contributor @mavaylon1 commented Mar 4, 2024

Motivation

What was the reasoning behind this change? Please explain the changes briefly.

Have VectorData expandable by default:

  • Check that reading existing non-expandable data into the now expandable VectorData is not a problem (due to shape).
  • Add a test that the data is wrapped with H5DataIO on read
  • Update VectorData
  • Add tests
  • Investigate VectorIndex. When indexing a VectorIndex whose target VectorData is wrapped, an error arises.
  • TBD

Questions to look into:

  • Do we have tests where we look at add_row when the data is wrapped with H5DataIO? ---> Yes
  • If a user wanted to make an EnumData column expandable, would they wrap both data and elements with H5DataIO?

Ideas:

  • We support Zarr, so wrapping with H5DataIO for every instance does not work. Do we instead want a field that defaults to True and wraps the data when true? In hdmf-zarr, we would override this to False (similar to how pynwb imports HERD into a pynwb version in order to reset the type map). We would also add logic so that if the user provides their own DataIO instance, it will not be wrapped automatically. Alternatively, we could skip the field entirely and, within hdmf-zarr, just unwrap and reset the data, but I understand if that is a little messy.
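The "defaults to True" idea above could be sketched roughly as follows. These are minimal stand-in classes, not the real hdmf API, and the `wrap_by_default` field name is hypothetical:

```python
# Sketch of the proposed default-wrapping behavior. DataIO, H5DataIO,
# and VectorData below are simplified stand-ins for the hdmf classes.

class DataIO:
    """Generic data wrapper base class (stub)."""
    def __init__(self, data):
        self.data = data

class H5DataIO(DataIO):
    """HDF5-specific wrapper carrying write options like maxshape (stub)."""
    def __init__(self, data, maxshape=None):
        super().__init__(data)
        self.maxshape = maxshape

class VectorData:
    # hdmf-zarr could override this class attribute to False so that
    # HDF5-specific wrapping is skipped for the Zarr backend.
    wrap_by_default = True

    def __init__(self, data):
        # Do not re-wrap if the user supplied their own DataIO instance.
        if self.wrap_by_default and not isinstance(data, DataIO):
            data = H5DataIO(data=data, maxshape=(None,))
        self.data = data

vd = VectorData([1, 2, 3])
print(type(vd.data).__name__)  # wrapped automatically

pre_wrapped = H5DataIO([4, 5, 6], maxshape=(None,))
vd2 = VectorData(pre_wrapped)
print(vd2.data is pre_wrapped)  # user-provided wrapper left untouched
```

The key design point is that both escape hatches from the bullet above appear in one place: the backend override (class attribute) and the user-provided-DataIO check.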

How to test the behavior?

Show how to reproduce the new behavior (can be a bug fix or a new feature)

Checklist

  • Did you update CHANGELOG.md with your changes?
  • Does the PR clearly describe the problem and the solution?
  • Have you reviewed our Contributing Guide?
  • Does the PR use "Fix #XXX" notation to tell GitHub to close the relevant issue numbered XXX when the PR is merged?

Contributor Author @mavaylon1 commented:

Note: This could be done by the next release if backwards compatibility does not raise an issue. Otherwise it will go to Future (the currently set milestone).

@mavaylon1 mavaylon1 self-assigned this Mar 4, 2024
@mavaylon1 mavaylon1 added category: enhancement improvements of code or code behavior priority: medium non-critical problem and/or affecting only a small set of users labels Mar 4, 2024
@mavaylon1 mavaylon1 added this to the Future milestone Mar 4, 2024
codecov bot commented Mar 4, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.53%. Comparing base (d85d0cb) to head (2f3a01a).

❗ Current head 2f3a01a differs from the pull request's most recent head c0ce73b. Consider uploading reports for commit c0ce73b to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##              dev    #1064      +/-   ##
==========================================
- Coverage   88.68%   88.53%   -0.15%     
==========================================
  Files          45       45              
  Lines        9740     9606     -134     
  Branches     2768     2732      -36     
==========================================
- Hits         8638     8505     -133     
  Misses        778      778              
+ Partials      324      323       -1     


@@ -42,6 +42,7 @@ class VectorData(Data):
'doc': 'a dataset where the first dimension is a concatenation of multiple vectors', 'default': list()},
allow_positional=AllowPositional.WARNING)
def __init__(self, **kwargs):
kwargs['data'] = H5DataIO(data=kwargs['data'], maxshape=(None,))
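The added line above relies on `maxshape=(None,)`, which is what makes an HDF5 dataset resizable along its first axis. A minimal sketch in plain h5py (not hdmf) of why that matters for add_row-style appends, using an in-memory file so nothing touches disk:

```python
# Without maxshape=(None,), an HDF5 dataset's shape is fixed at creation
# and cannot grow; with it, the dataset is chunked and resizable.
import h5py

with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    fixed = f.create_dataset("fixed", data=[1, 2, 3])
    grow = f.create_dataset("grow", data=[1, 2, 3], maxshape=(None,))

    # The expandable dataset can grow along its first (only) axis:
    grow.resize((4,))
    grow[3] = 4
    appended = [int(x) for x in grow[:]]

    # The fixed-shape dataset cannot be resized; h5py raises TypeError
    # because only chunked datasets are resizable:
    try:
        fixed.resize((4,))
        fixed_resize_failed = False
    except TypeError:
        fixed_resize_failed = True

print(appended)             # [1, 2, 3, 4]
print(fixed_resize_failed)  # True
```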
Contributor
Hardcoding H5DataIO will probably create issues with ZarrIO. In the best case, ZarrIO ignores the wrapper and worst case it would fail.... I'm not sure which. For ZarrIO we should not need to wrap for datasets to be expandable, but the problem is, we won't know which backend will be used until we call write to know which wrapper to use.

Not sure what the best solution is for this. I could imagine a few approaches:

  1. Have "expandable" as an option on the base DataIO class and wrap with DataIO as a generic wrapper that the specific backend (HDF5IO, ZarrIO, etc.) would then need to know how to translate. ---> I think this should be feasible, but would require changes in both HDF5IO and ZarrIO
  2. Maybe we would need to wrap in the build process ---> I think this could work, but may be tricky
  3. Wrap with H5DataIO and require that the other backends know how to translate it ---> I don't like this one, because it hard-codes backend-specific wrappers in the Container
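Option 1 could be sketched as below. The class and method names are hypothetical stand-ins, not the real hdmf/hdmf-zarr API; the point is only that the backend-agnostic flag lives on the base DataIO and each backend translates it at write time:

```python
# Sketch of option 1: a generic `expandable` flag on the base DataIO,
# translated into backend-specific dataset-creation options at write time.

class DataIO:
    """Backend-agnostic wrapper carrying an `expandable` hint (stub)."""
    def __init__(self, data, expandable=False):
        self.data = data
        self.expandable = expandable

class HDF5Backend:
    @staticmethod
    def translate(io):
        # HDF5 must declare resizability at creation via maxshape.
        return {"maxshape": (None,)} if io.expandable else {}

class ZarrBackend:
    @staticmethod
    def translate(io):
        # Zarr chunked arrays are resizable by default; nothing to add.
        return {}

wrapped = DataIO([1, 2, 3], expandable=True)
print(HDF5Backend.translate(wrapped))  # {'maxshape': (None,)}
print(ZarrBackend.translate(wrapped))  # {}
```

This keeps the Container free of backend-specific wrappers (the objection to option 3) at the cost of a small translation step in each backend.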

Contributor Author

This is what was going through my mind as well. I made a note documenting that; I had the idea of an expandable field.
