Automatically migrate TimeIntervals.timeseries to use TimeSeriesReferenceVectorData #1390

oruebel · 2021-08-11T13:05:49Z

Motivation

Fix NeurodataWithoutBorders/nwb-schema#484 . This PR automatically migrates TimeIntervals.timeseries column to use the new TimeSeriesReferenceVectorData container class. It does so by patching the __class__ at runtime in the TimeIntervalsMap ObjectMapper class when the TimeIntervals object is being built on read. Since TimeIntervals.timeseries and TimeSeriesReferenceVectorData have the same schema and memory layout, swapping the class (i.e., object type) at runtime, after-the-fact should be fine.

Advantages

does not require changes to the Container classes as all changes are transparently handled in the ObjectMapper before the user Containers are set up.
leads to consistent API behavior for old and new files when accessing data

Disadvantages

Relies on monkey patching the objects after-the-fact

Major changes

Update the ObjectMapper for TimeIntervals to update the timeseries column on read
Update schema
Add tests for original schema (that use VectorData for TimeIntervals.timeseries)
Add tutorial for using TimeIntervals and TimeSeriesReferenceVectorData and TimeSeriesReference

Checklist

Did you update CHANGELOG.md with your changes?
Have you checked our Contributing document?
Have you ensured the PR clearly describes the problem and the solution?
Is your contribution compliant with our coding style? This can be checked running flake8 from the source directory.
Have you checked to ensure that there aren't other open Pull Requests for the same change?
Have you included the relevant issue number using "Fix #XXX" notation where XXX is the issue number? By including "Fix #XXX" you allow GitHub to close issue #XXX when the PR is merged.

…d for convenience and backward compatability of TimeIntervals

… to TimeSeriesReferenceVectorData

…ess rather than on set

codecov · 2021-08-11T13:07:46Z

Codecov Report

Merging #1390 (faaac10) into dev (3b886ca) will increase coverage by 0.70%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##              dev    #1390      +/-   ##
==========================================
+ Coverage   77.47%   78.18%   +0.70%     
==========================================
  Files          37       37              
  Lines        2735     2759      +24     
  Branches      455      461       +6     
==========================================
+ Hits         2119     2157      +38     
+ Misses        535      522      -13     
+ Partials       81       80       -1

Impacted Files	Coverage Δ
src/pynwb/base.py	`94.97% <100.00%> (+0.16%)`	⬆️
src/pynwb/epoch.py	`88.46% <100.00%> (+5.76%)`	⬆️
src/pynwb/io/epoch.py	`100.00% <100.00%> (ø)`
src/pynwb/file.py	`87.62% <0.00%> (+1.26%)`	⬆️
src/pynwb/io/base.py	`58.62% <0.00%> (+2.29%)`	⬆️
src/pynwb/__init__.py	`76.64% <0.00%> (+2.91%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3b886ca...faaac10. Read the comment docs.

rly · 2021-08-11T16:41:30Z

Interesting approach. I'm surprised that changing __class__ of an object is allowed, but I guess Python allows you to do anything.

A couple comments:

The migration would only happen when timeintervals.timeseries or timeintervals['timeseries'] is called but not when timeintervals.columns or timeintervals[0, 'timeseries'] are called and not when any future changes to how columns can be accessed are called.
Can the migration be done in __init__?
If an old file is read with tuples, then would timeintervals.timeseries return a TimeSeriesReferenceVectorData object containing tuples?

oruebel · 2021-08-11T20:00:22Z

Interesting approach. I'm surprised that changing __class__ of an object is allowed, but I guess Python allows you to do anything.

Yeah, Python allows you to do a lot of strange things that usually shouldn't be done ;-) It works in this case mainly because VectorData and TimeSeriesReferenceVectorData are essentially the same aside from changing behavior of a few instance methods. https://stackoverflow.com/a/42137440

The migration would only happen when timeintervals.timeseries or timeintervals['timeseries'] is called but not when

Good point.

Can the migration be done in __init__?

Yes. That is how I originally did it in this commit fe87879 Maybe the better place would be to do this in the ObjectMapper when getting the constructor arguments. In this way, the whole build-process should already be done and the swap just happens right before the TimeIntervals object is being instantiated.

If an old file is read with tuples, then would timeintervals.timeseries return a TimeSeriesReferenceVectorData object containing tuples?

I don't think this is an issue. The datatypes in HDF5 are identical for TimeIntervals.timeseries and TimeSeriesReferenceVectorData and the conversion to TimeSeriesReference objects happen when you slice into the column, so the behavior for "old" vs. "new" files should be identical.

rly · 2021-08-11T21:20:51Z

Yeah, I agree - the better place to do this would be in the ObjectMapper when getting the constructor arguments. Do you want to tackle that or do you want me to?

oruebel · 2021-08-11T21:30:29Z

Do you want to tackle that or do you want me to?

I can give it a try. We can always scrap this PR if we decide on another option. I just wanted to see if this "creative" way of changing types could work.

oruebel · 2021-08-11T23:12:34Z

I think it may be simpler to just do it in __init__. In fact, if we update the schema, then I think the only time we should have to migrate the type is on read of old files anyways and that will always go through setting the columns constructor argument, so we shouldn't have to worry about functions like, add_column, add_interval, __getitem__ etc. as those are not relevant when reading a file and once the column type has been migrated they should function the same for old and new files.

For what it's worth, here my attempt at doing the migration in the ObjectMapper. I didn't see a way to get the values for the constructor arguments that are not overwritten. It looks like this is all handled in the ObjectMapper.construct function directly. The following should work to migrate the column in the objectmapper but its a bit ugly because it uses private functions from the ObjectMapper and requires iterating overall subspecs (which isn't too terrible but doesn't seem very clean)

@register_map(TimeIntervals)
class TimeIntervalsMap(DynamicTableMap):
    pass

    @ObjectMapper.constructor_arg('columns')
    def columns(self, builder, manager):
        # This goes through all the subspecs, but we really just need the VectorData.
        # Is there a better way to get all the VectorData containers?
        subspecs = self._ObjectMapper__get_subspec_values(builder, self.spec, manager)
        vectordata_cols = []
        for subspec, value in subspecs.items():
            const_arg = self.get_const_arg(subspec)
            if const_arg is not None and const_arg == 'columns':
               vectordata_cols  += value
        # Find the timeseries column and update it if necessary
        for v in vectordata_cols:
            if v.name == 'timeseries':
                if not isinstance(v, TimeSeriesReferenceVectorData):
                    v.__class__ = TimeSeriesReferenceVectorData
        return vectordata_cols

rly · 2021-08-12T00:07:29Z

That approach isn't terrible, especially if we make some changes to the ObjectMapper to make certain functions not private. I should have pointed you to my previous two approaches here: bef1be8 (#1274)

I think we can combine yours and mine. In my approach, I also updated the builders just in case that was needed later for write/append but I'm not sure that is necessary. So something like:

@DynamicTableMap.constructor_arg('columns')
def columns_carg(self, builder, manager):
    # handle case when a TimeIntervals is read with a non-TimeSeriesReferenceVectorData "timeseries" column
    timeseries_builder = builder.get('timeseries')
    if timeseries_builder.attributes['neurodata_type'] != 'TimeSeriesReferenceVectorData':
        # override builder attributes
        timeseries_builder.attributes['neurodata_type'] = 'TimeSeriesReferenceVectorData'
        timeseries_builder.attributes['namespace'] = 'core'
        # construct new columns list
        columns = list()
        for dset_builder in builder.datasets.values():
            dset_obj = manager.construct(dset_builder)  # these have already been constructed
            # go through only the column datasets and replace the 'timeseries' column class in-place
            if isinstance(dset_obj, VectorData):
                if dset_obj.name == 'timeseries':
                    dset_obj.__class__ = TimeSeriesReferenceVectorData
                columns.append(dset_obj)

        return columns

    return None  # do not override

oruebel · 2021-08-12T00:31:18Z

30a4a75 updates the PR to do the conversion in TimeIntervals.__init__ only. I believe this works with old and new schema.

I think we can combine yours and mine.

Sure, I think that should work. Generally speaking, doing this in the ObjectMapper seems slightly trickier to do but I think its the more appropriate place for this sort of thing, rather than cluttering user-facing classes with backwards compatibility code. One thing I noticed is that it seems that DynamicTable.from_dataframe currently may not use theclass defined for the column. As such, calling TimeIntervals.from_dataframe currently triggers migration of the timeseries column in __init__ even when updating the schema to use TimeSeriesReferenceColumn for TimeIntervals.timeseries. I.e., if we fix this in the ObjectMapper then we'll likely also need to update DynamicTable.from_dataframe(but that seems like a bug that should be fixed anyways).

…ate the builders

oruebel · 2021-08-12T01:08:15Z

I think we can combine yours and mine.

@rly I updated the code as you suggested to do the migration of the builders and container in the ObjectMapper only. After a few minor fixes (the code failed when TimeIntervals.timeseries was missing), this seems to work fine. Thanks for your patience in iterating on possible solutions, but I think what we have here now seems like an elegant solution.

it seems that DynamicTable.from_dataframe currently may not use theclass

I stand corrected this seems to work correctly and I added an assert in the unit tests to make sure.

@rly You mentioned at some point that you had a test to also test agains previous files that use VectorData. Can you add that test to this PR. I've tested agains both the updated and current schema and it seems fine, but I've done this only by changing the schema by hand. It would be good to have an actual test for this.

…vals.timeseries work

oruebel · 2021-08-13T00:16:36Z

This is good to go.

…der in case we need to add custom init functionality in the future

This reverts commit e098849.

oruebel added 4 commits August 11, 2021 04:45

Support regular tuples in TimeSeriesReferenceVectorData.add_row/appen…

472be82

…d for convenience and backward compatability of TimeIntervals

Automatically migrate TimeIntervals.timeseries column from VectorData…

fe87879

… to TimeSeriesReferenceVectorData

minor enhancements

88f8b51

Update approach to convert TimeIntervals.timeseries on first read acc…

54b709a

…ess rather than on set

oruebel requested a review from rly August 11, 2021 13:06

oruebel mentioned this pull request Aug 11, 2021

Update TimeIntervals to use the new TimeSeriesReferenceVectorData NeurodataWithoutBorders/nwb-schema#484

Closed

Update to do class migration in __init__

30a4a75

Updated approach to do the migration in the ObjectMapper and also upd…

b362eb0

…ate the builders

oruebel added 2 commits August 12, 2021 16:03

Add test to ensure reading of files that use VectorData for TimeInter…

fcdb9fa

…vals.timeseries work

Update schema

4c189dc

oruebel mentioned this pull request Aug 12, 2021

Update TimeIntervals.timeseries to use new TimeSeriesReferenceVectorData NeurodataWithoutBorders/nwb-schema#486

Merged

3 tasks

oruebel added 2 commits August 12, 2021 16:25

Moved epoch legacy test

1adf145

Udpated schema

55a0d5c

oruebel marked this pull request as ready for review August 12, 2021 23:54

Updated changelog

bccf52d

oruebel added 5 commits August 12, 2021 22:01

Add gallery tutorial for annotating time intervals

43b88d1

Fix gallery flake8

2b7f4b4

Merge branch 'dev' into enh/migrate_timeintervals_col

e282358

Add TimeIntervals I/O to gallery tutoria

5f77173

Added link to the new tutorial in the file basic tutorial

1f2ce4d

oruebel and others added 6 commits August 13, 2021 18:16

Added TimeSeriesReferenceVectorData_init_internal as a empty placehol…

ddf509f

…der in case we need to add custom init functionality in the future

Merge branch 'dev' into enh/migrate_timeintervals_col

9a7689d

Update Changelog

3eb0184

Merge branch 'dev' into enh/migrate_timeintervals_col

ea267cc

Add mapping for stimulus_templates/Images

e098849

Update schema to latest

0266774

rly previously approved these changes May 17, 2022

View reviewed changes

rly added 2 commits May 17, 2022 14:20

Revert "Add mapping for stimulus_templates/Images"

1883673

This reverts commit e098849.

Rollback schema just to TimeIntervals change

faaac10

rly dismissed their stale review via faaac10 May 17, 2022 21:21

rly approved these changes May 17, 2022

View reviewed changes

rly merged commit c9ea3f2 into dev May 17, 2022

rly deleted the enh/migrate_timeintervals_col branch May 17, 2022 21:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Automatically migrate TimeIntervals.timeseries to use TimeSeriesReferenceVectorData #1390

Automatically migrate TimeIntervals.timeseries to use TimeSeriesReferenceVectorData #1390

oruebel commented Aug 11, 2021 •

edited

Loading

codecov bot commented Aug 11, 2021 •

edited

Loading

rly commented Aug 11, 2021

oruebel commented Aug 11, 2021 •

edited

Loading

rly commented Aug 11, 2021

oruebel commented Aug 11, 2021

oruebel commented Aug 11, 2021

rly commented Aug 12, 2021

oruebel commented Aug 12, 2021 •

edited

Loading

oruebel commented Aug 12, 2021

oruebel commented Aug 13, 2021

Automatically migrate TimeIntervals.timeseries to use TimeSeriesReferenceVectorData #1390

Automatically migrate TimeIntervals.timeseries to use TimeSeriesReferenceVectorData #1390

Conversation

oruebel commented Aug 11, 2021 • edited Loading

Motivation

Checklist

codecov bot commented Aug 11, 2021 • edited Loading

Codecov Report

rly commented Aug 11, 2021

oruebel commented Aug 11, 2021 • edited Loading

rly commented Aug 11, 2021

oruebel commented Aug 11, 2021

oruebel commented Aug 11, 2021

rly commented Aug 12, 2021

oruebel commented Aug 12, 2021 • edited Loading

oruebel commented Aug 12, 2021

oruebel commented Aug 13, 2021

oruebel commented Aug 11, 2021 •

edited

Loading

codecov bot commented Aug 11, 2021 •

edited

Loading

oruebel commented Aug 11, 2021 •

edited

Loading

oruebel commented Aug 12, 2021 •

edited

Loading