Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add missing TimeSeriesMap.data_attr function to link data between timeries #1766

Merged
merged 16 commits into from
Feb 8, 2024

Conversation

oruebel
Copy link
Contributor

@oruebel oruebel commented Aug 23, 2023

Motivation

Fix #1765

TimeSeries.data did not create SoftLink when set to another TimeSeries. It looks like the cause is that the TimeSeriesMap ObjectMapper does not define the corresponding TimeSeriesMap.data_attr (as is the case for timestamps) so that the link was not created. This PR adds this method.

  • Fix TimeSeriesMap
  • Add integration test for linking TimeSeries.data
  • Check and update existing tests
  • Update CHANGELOG

Creating a SoftLink for TimeSeries.data triggered a secondary error in TimeSeriesMap.unit_carg where we may not get a TimeSeries instead of a DatasetBuilder to look up the unit.

How to test the behavior?

See example script in #1765

Checklist

  • Did you update CHANGELOG.md with your changes?
  • Have you checked our Contributing document?
  • Have you ensured the PR clearly describes the problem and the solution?
  • Is your contribution compliant with our coding style? This can be checked running flake8 from the source directory.
  • Have you checked to ensure that there aren't other open Pull Requests for the same change?
  • Have you included the relevant issue number using "Fix #XXX" notation where XXX is the issue number? By including "Fix #XXX" you allow GitHub to close issue #XXX when the PR is merged.

@codecov
Copy link

codecov bot commented Aug 23, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (3597052) 91.99% compared to head (2c51bf0) 92.18%.

Additional details and impacted files
@@            Coverage Diff             @@
##              dev    #1766      +/-   ##
==========================================
+ Coverage   91.99%   92.18%   +0.19%     
==========================================
  Files          27       27              
  Lines        2623     2637      +14     
  Branches      685      689       +4     
==========================================
+ Hits         2413     2431      +18     
+ Misses        138      136       -2     
+ Partials       72       70       -2     
Flag Coverage Δ
integration 71.33% <100.00%> (+0.30%) ⬆️
unit 84.30% <33.33%> (-0.30%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@oruebel oruebel marked this pull request as draft August 23, 2023 21:37
@oruebel
Copy link
Contributor Author

oruebel commented Aug 23, 2023

Creating external links with the logic of setting TimeSeries(data=timeseries,...) and TimeSeries(timestamps=timeseries, ...) appears to be broken. Creating an external link with HDF5IO works:

timestamps=H5DataIO(
data=data_file_obt.get_acquisition('data_ts').timestamps,
link_data=True # test with setting link data

But using the approach with TimeSeries(data=timeseries, ...)

data=data_file_obt.get_acquisition('data_ts'), # test direct link

results in a broken external link:

  ...hdmf/backends/hdf5/h5tools.py:641: BrokenLinkWarning: Path to Group altered/broken at /acquisition/test_mod_ts/timestamps
    warnings.warn('Path to Group altered/broken at ' + os.path.join(h5obj.name, k), BrokenLinkWarning)

This issues appears to be a separate issue that I think we simply didn't see previously because of the bug from #1765 hiding the failure in the test. I'm not sure yet what the fix for this issue is, but I think it is likely either part of TimeSeriesMap.data_attr and TimeSeriesMap.timestamps_attr or a bug in HDF5IO.write_link

@rly do you have an idea what might be wrong here? The test that triggers this issue is pytest tests/integration/hdf5/test_modular_storage.py

@oruebel
Copy link
Contributor Author

oruebel commented Aug 24, 2023

Looking at the generated file, it appears that the path to the external file being set correctly but the path to the object is not being set correctly when using the TimeSeries(data=timeseries,...) approach:

data                     External Link {test_time_series_modular_data.nwb//data}
timestamps               External Link {test_time_series_modular_data.nwb//acquisition/data_ts/timestamps}

I.e. it should be External Link {test_time_series_modular_data.nwb//acquisition/data_ts/data} instead.

@CodyCBakerPhD
Copy link
Collaborator

To be clear, our only desire w.r.t. this feature are internal linkages. We have two TimeSeries internal to the same file that share the same data (and don't want to duplicate it) but different time bases, so this approach seemed ideal for us

Sounds like the external link thing might be a separate issue for a separate PR

@oruebel
Copy link
Contributor Author

oruebel commented Aug 24, 2023

Sounds like the external link thing might be a separate issue for a separate PR

Yes, I believe that is correct. We could update the test to not use the problematic logic and file a separate issue.

@oruebel
Copy link
Contributor Author

oruebel commented Aug 24, 2023

Sounds like the external link thing might be a separate issue for a separate PR

I've create #1767 as a separate issue. To get the test-suite to pass I have further:

  • Updated TestTimeSeriesModular to only use HDF5IO to create external links instead of setting TimeSeries(data=timeseries, ...)
  • Added TestTimeSeriesModularLinkViaTimeSeries to explicitly test the behavior with TimeSeries(data=timeseries, timeseries=timeseries). This test is currently being skipped and should be addressed in Creating external links via TimeSeries(data=timeseries,...) broken #1767

@oruebel oruebel requested a review from rly August 24, 2023 01:07
@oruebel oruebel marked this pull request as ready for review August 24, 2023 01:07
@rly
Copy link
Contributor

rly commented Aug 24, 2023

Note to self: add test to make sure dataset is coming from the expected file

@oruebel
Copy link
Contributor Author

oruebel commented Aug 24, 2023

Note to self: add test to make sure dataset is coming from the expected file

@rly I updated the test in test_modular_storage.py to check for this here:

# Make sure the timestamps and data are linked correctly. I.e., the filename of the h5py dataset should
# match between the data file and the file with links even-though they have been read from different files
self.assertEqual(
self.read_data_container.data.file.filename, # Path where the source data is stored
self.read_link_container.data.file.filename # Path where the linked h5py dataset points to
)
self.assertEqual(
self.read_data_container.timestamps.file.filename, # Path where the source data is stored
self.read_link_container.timestamps.file.filename # Path where the linked h5py dataset points to
)

I also updated the test to make it more readable. The test was getting confusing because it uses 2 different files and it wasn't always clear from the names of the variables which file the variables belonged to and what was actually being tested by the asserts.

@oruebel
Copy link
Contributor Author

oruebel commented Aug 24, 2023

I accidentally marked #1767 as fixed in one of the commits because I thought I had found the issue and the problem was actually an issue with the test. But it turns out I had made and error in that commit and the problem still remains. I fixed the tests but #1767 should remain open for now.

@oruebel
Copy link
Contributor Author

oruebel commented Aug 24, 2023

@rly just to summarize the status of this:

  • This PR fixes the issues of creating links within the same file by setting TimeSeries(data=other_timeseries) and is ready for review and I believe could be merged as is
  • Creating external links via TimeSeries(data=timeseries,...) broken #1767 describes the issue with linking to external files via TimeSeries(data=other_timeseries). This PR adds TestTimeSeriesModularLinkViaTimeSeries to explicitly test this issue but the test is currently being skipped and should be addressed either separately or (if it is a quick fix) in this PR

rly
rly previously approved these changes Feb 7, 2024
@rly
Copy link
Contributor

rly commented Feb 7, 2024

Looks good to me. Thanks for the thorough comments in the tests too.

@oruebel oruebel merged commit 047c37d into dev Feb 8, 2024
23 checks passed
@oruebel oruebel deleted the fix/link_timeseries_data branch February 8, 2024 21:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: Reusing data across TimeSeries objects does not seem to work
3 participants