Rework DynamicTable/DynamicTableRegion.get when the table contains a DTR #579
Codecov Report
@@             Coverage Diff              @@
##           rc/3.0.0     #579      +/-   ##
============================================
- Coverage     86.30%   86.24%   -0.07%
============================================
  Files            41       41
  Lines          8289     8310      +21
  Branches       1786     1790       +4
============================================
+ Hits           7154     7167      +13
- Misses          799      807       +8
  Partials        336      336
This PR is nearly ready for review. There are two outstanding issues:
This is now ready for review, tested with h5py 2.10 and 3.3.
It would be useful to also add the description of the get functions' behavior, which is included in this PR's description, to the tutorial for DynamicTable: https://hdmf.readthedocs.io/en/latest/tutorials/dynamictable.html#sphx-glr-tutorials-dynamictable-py
Co-authored-by: Oliver Ruebel <[email protected]>
Motivation
Rework of #556. Fix #552.
- When a DynamicTable dt contains a DynamicTableRegion dtr and dt.get(), dt.__getitem__(), dtr.get(), or dtr.__getitem__() are called, then:
  - If df=True and index=True (default), then a pandas DataFrame will be returned where the entries for the DTR column are the indices of the DTR.
  - If df=True and index=False, then a pandas DataFrame will be returned where the entries for the DTR column are nested DataFrames (this works for multiple levels of nesting).
  - If df=False and index=True, a list of lists will be returned where the list corresponding to the DTR column contains the indices of the DTR.
  - If df=False and index=False, an error will be raised. Returning a list of lists where the DTR column contains lists of lists gets very messy in the ragged DTR case, and the nesting removes all column names and structure, so the returned value is difficult to interpret/parse. It is recommended to set df=True if you want to return a row with DynamicTableRegions resolved into nested DataFrames.
- Tested [__getitem__, get(df=True, index=True), get(df=True, index=False), get(df=False, index=True)] with the data stored as a [np.ndarray, list, h5py.Dataset], on tables with scalars, 1D arrays, ragged 1D arrays, 2D arrays, DTRs, and ragged DTRs at multiple levels of nesting.
- Removed name mangling when a DynamicTable dt contains a DynamicTableRegion dtr and dt.get(), dt.__getitem__(), dtr.get(), or dtr.__getitem__() are called. This is technically a breaking change, though it did not work correctly in many cases.
- When a DynamicTable dt contains a DynamicTableRegion dtr and dt.to_dataframe() is called, then:
  - If index=True, then a pandas DataFrame will be returned where the entries for the DTR column are the indices of the DTR. This is equivalent to table.get(slice(None, None, None)).
  - If index=False (default), then a pandas DataFrame will be returned where the entries for the DTR column are nested DataFrames (this works for multiple levels of nesting). This is equivalent to table.get(slice(None, None, None), index=False).
- Also increase the h5py minimum version from 2.9 to 2.10 because 2.9 does not support __getitem__ on an h5py.Dataset using a list of indices.

Checklist
- flake8 from the source directory.
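
The four df/index combinations described under Motivation can be illustrated with a small toy model. This is only a sketch under stated assumptions: plain dicts and lists stand in for DynamicTable, DynamicTableRegion, and pandas DataFrames, and get_row, the "target" key, and the example tables are hypothetical names invented for illustration. It is not the hdmf implementation and does not call the hdmf API.

```python
# Toy model only. Dicts stand in for tables and for DataFrame rows; a column
# with a "target" key plays the role of a (ragged) DynamicTableRegion column.
# None of these names come from hdmf.

def get_row(table, i, df=True, index=True):
    """Return row i of the toy table, following the df/index rules in the PR."""
    if not df and not index:
        # Mirrors the PR: nested lists of lists lose column names and structure.
        raise ValueError("df=False with index=False is not supported")
    row = {}
    for name, col in table["columns"].items():
        value = col["data"][i]
        if "target" in col and not index:
            # Resolve each referenced row into a nested record, recursively,
            # so several levels of region nesting are handled the same way.
            value = [get_row(col["target"], j, df=True, index=False) for j in value]
        row[name] = value
    # df=True keeps the column names; df=False returns bare values.
    return row if df else list(row.values())


# Hypothetical example data: "groups" has a region column pointing into "cells".
cells = {"columns": {"label": {"data": ["a", "b", "c"]}}}
groups = {
    "columns": {
        "name": {"data": ["g0", "g1"]},
        "members": {"data": [[0, 1], [2]], "target": cells},
    }
}

as_indices = get_row(groups, 0)               # region column holds indices
as_nested = get_row(groups, 0, index=False)   # region column holds nested rows
as_lists = get_row(groups, 0, df=False)       # values only, indices kept
```

In this toy model, the index=False result keeps the column names of the referenced table at every level, which is the property the PR relies on when it recommends df=True for resolved regions.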