Add/aligned dynamic table #551

Merged
merged 19 commits into from
Apr 13, 2021
12 changes: 8 additions & 4 deletions CHANGELOG.md
@@ -8,14 +8,18 @@
- Add `HDF5IO.get_namespaces(path=path, file=file)` method which returns a dict of namespace name mapped to the
namespace version (the largest one if there are multiple) for each namespace cached in the given HDF5 file.
@rly (#527)
- Add experimental namespace to HDMF common schema. New data types should go in the experimental namespace
(hdmf-experimental) prior to being added to the core (hdmf-common) namespace. The purpose of this is to provide
a place to test new data types that may break backward compatibility as they are refined. @ajtritt (#545)

- Add `EnumData` type for storing data that comes from a fixed set of values. This replaces `VocabData` i.e.
`VocabData` has been removed. `VocabData` stored vocabulary elements in an attribute, which has a size limit.
`EnumData` now stores elements in a separate dataset, referenced by an attribute stored on the `EnumData` dataset.
@ajtritt (#537)
- Add `AlignedDynamicTable` type which defines a DynamicTable that supports storing a collection of subtables.
Each sub-table is itself a DynamicTable that is aligned with the main table by row index. Each subtable
defines a sub-category in the main table effectively creating a table with sub-headings to organize columns.
@oruebel (#551)

### Internal improvements
- Update CI and copyright year. @rly (#523, #524)
@@ -50,13 +54,13 @@
### New features
- Add methods for automatic creation of `MultiContainerInterface` classes. @bendichter (#420, #425)
- Add ability to specify a custom class for new columns to a `DynamicTable` that are not `VectorData`,
`DynamicTableRegion`, or `VocabData` using `DynamicTable.__columns__` or `DynamicTable.add_column(...)`. @rly (#436)
- Add support for creating and specifying multi-index columns in a `DynamicTable` using `add_column(...)`.
@bendichter, @rly (#430)
- Add capability to add a row to a column after IO. @bendichter (#426)
- Add method `AbstractContainer.get_fields_conf`. @rly (#441)
- Add functionality for storing external resource references. @ajtritt (#442)
- Add method `hdmf.utils.get_docval_macro` to get a tuple of the current values for a docval_macro, e.g., 'array_data'
and 'scalar_data'. @rly (#446)
- Add `SimpleMultiContainer`, a data_type for storing a `Container` and `Data` objects together. @ajtritt (#449)
- Support `pathlib.Path` paths in `HDMFIO.__init__`, `HDF5IO.__init__`, and `HDF5IO.load_namespaces`. @dsleiter (#450)
@@ -144,7 +148,7 @@
- Allow passing None for docval enum arguments with default value None. @rly (#409)
- If a file is written with an orphan container, e.g., a link to a container that is not written, then an
`OrphanContainerBuildError` will be raised. This replaces the `OrphanContainerWarning` that was previously raised.
@rly (#407)

## HDMF 2.0.0 (July 17, 2020)

2 changes: 2 additions & 0 deletions src/hdmf/common/__init__.py
@@ -112,6 +112,7 @@ def available_namespaces():
from . import io as __io # noqa: F401,E402

from . import table # noqa: F401,E402
from . import alignedtable # noqa: F401,E402
from . import sparse # noqa: F401,E402
from . import resources # noqa: F401,E402
from . import multi # noqa: F401,E402
@@ -137,6 +138,7 @@ def available_namespaces():
CSRMatrix = __TYPE_MAP.get_container_cls('CSRMatrix', CORE_NAMESPACE)
ExternalResources = __TYPE_MAP.get_container_cls('ExternalResources', EXP_NAMESPACE)
SimpleMultiContainer = __TYPE_MAP.get_container_cls('SimpleMultiContainer', CORE_NAMESPACE)
AlignedDynamicTable = __TYPE_MAP.get_container_cls('AlignedDynamicTable', CORE_NAMESPACE)


@docval({'name': 'extensions', 'type': (str, TypeMap, list),
243 changes: 243 additions & 0 deletions src/hdmf/common/alignedtable.py
@@ -0,0 +1,243 @@
"""
Collection of Container classes for interacting with aligned and hierarchical dynamic tables
"""
from collections import OrderedDict

import numpy as np
import pandas as pd

from . import register_class
from .table import DynamicTable
from ..utils import docval, getargs, call_docval_func, popargs, get_docval


@register_class('AlignedDynamicTable')
class AlignedDynamicTable(DynamicTable):
    """
    DynamicTable container that supports storing a collection of subtables. Each sub-table is a
    DynamicTable itself that is aligned with the main table by row index. I.e., all
    DynamicTables stored in this group MUST have the same number of rows. This type effectively
    defines a 2-level table in which the main data is stored in the main table implemented by this type
    and additional columns of the table are grouped into categories, with each category being
    represented by a separate DynamicTable stored within the group.
    """
    __fields__ = ({'name': 'category_tables', 'child': True}, )

    @docval(*get_docval(DynamicTable.__init__),
            {'name': 'category_tables', 'type': list,
             'doc': 'List of DynamicTables to be added to the container', 'default': None},
            {'name': 'categories', 'type': 'array_data',
             'doc': 'List of names with the ordering of category tables', 'default': None})
    def __init__(self, **kwargs):
        in_category_tables = popargs('category_tables', kwargs)
        in_categories = popargs('categories', kwargs)
        if in_categories is None and in_category_tables is not None:
            in_categories = [tab.name for tab in in_category_tables]
        if in_categories is not None and in_category_tables is None:
            raise ValueError("Categories provided but no category_tables given")
        # At this point in_categories and in_category_tables are either both None or both a list
        if in_categories is not None:
            if len(in_categories) != len(in_category_tables):
                raise ValueError("%s category_tables given but %s categories specified" %
                                 (len(in_category_tables), len(in_categories)))
        # Initialize the main dynamic table
        call_docval_func(super().__init__, kwargs)
        # Create and set all sub-categories
        dts = OrderedDict()
        # Add the custom categories given as inputs
        if in_category_tables is not None:
            # We may need to resize our main table when adding categories as the user may not have set ids
            if len(in_category_tables) > 0:
                # We have categories to process
                if len(self.id) == 0:
                    # The user did not initialize our main table ids nor set columns for our main table
                    for i in range(len(in_category_tables[0])):
                        self.id.append(i)
            # Add the user-provided categories in the order described by the categories argument.
            # This is necessary because we do not store the categories explicitly; we maintain them
            # as the order of our self.category_tables. This makes sure look-ups are consistent.
            lookup_index = OrderedDict([(k, -1) for k in in_categories])
            for i, v in enumerate(in_category_tables):
                # Error check that the name of the table is in our categories list
                if v.name not in lookup_index:
                    raise ValueError("DynamicTable %s does not appear in categories %s" % (v.name, str(in_categories)))
                # Error check to make sure no two tables with the same name are given
                if lookup_index[v.name] >= 0:
                    raise ValueError("Duplicate table name %s found in input dynamic_tables" % v.name)
                lookup_index[v.name] = i
            for table_name, table_index in lookup_index.items():
                # This error case should not be able to occur since the length of the in_categories and
                # in_category_tables must match and we made sure that each DynamicTable we added had its
                # name in the in_categories list. We, therefore, exclude this check from coverage testing
                # but we leave it in just as a backup trigger in case something unexpected happens
                if table_index < 0:  # pragma: no cover
                    raise ValueError("DynamicTable %s listed in categories but does not appear in category_tables" %
                                     table_name)  # pragma: no cover
                # Test that all category tables have the correct number of rows
                category = in_category_tables[table_index]
                if len(category) != len(self):
                    raise ValueError('Category DynamicTable %s does not align, it has %i rows expected %i' %
                                     (category.name, len(category), len(self)))
                # Add the category table to our category_tables.
                dts[category.name] = category
        # Set the self.category_tables attribute, which will set the parent/child relationships for the category_tables
        self.category_tables = dts

    def __contains__(self, val):
        """
        Check if the given value (i.e., column) exists in this table

        :param val: If val is a string then check if the given category exists. If val is a tuple
            of two strings (category, colname) then check for the given category if the given colname exists.
        """
        if isinstance(val, str):
            return val in self.category_tables or val in self.colnames
        elif isinstance(val, tuple):
            if len(val) != 2:
                raise ValueError("Expected tuple of strings of length 2 got tuple of length %i" % len(val))
            return val[1] in self.get_category(val[0])
        else:
            return False

    @property
    def categories(self):
        """
        Get the list of names of the categories

        Short-hand for list(self.category_tables.keys())
        """
        return list(self.category_tables.keys())

    @docval({'name': 'category', 'type': DynamicTable, 'doc': 'Add a new DynamicTable category'},)
    def add_category(self, **kwargs):
        """
        Add a new DynamicTable to the AlignedDynamicTable to create a new category in the table.

        NOTE: The table must align with (i.e., have the same number of rows as) the main data table (and
        other category tables). I.e., if the AlignedDynamicTable is already populated with data
        then we have to populate the new category with the corresponding data before adding it.

        :raises: ValueError is raised if the input table does not have the same number of rows as the main table
        """
        category = getargs('category', kwargs)
        if len(category) != len(self):
            raise ValueError('New category DynamicTable does not align, it has %i rows expected %i' %
                             (len(category), len(self)))
        if category.name in self.category_tables:
            raise ValueError("Category %s already in the table" % category.name)
        self.category_tables[category.name] = category
        category.parent = self

    @docval({'name': 'name', 'type': str, 'doc': 'Name of the category we want to retrieve', 'default': None})
    def get_category(self, **kwargs):
        name = popargs('name', kwargs)
        if name is None or (name not in self.category_tables and name == self.name):
            return self
        else:
            return self.category_tables[name]

    @docval(*get_docval(DynamicTable.add_column),
            {'name': 'category', 'type': str, 'doc': 'The category the column should be added to',
             'default': None})
    def add_column(self, **kwargs):
        """
        Add a column to the table

        :raises: KeyError if the category does not exist

        """
        category_name = popargs('category', kwargs)
        if category_name is None:
            # Add the column to our main table
            call_docval_func(super().add_column, kwargs)
        else:
            # Add the column to a sub-category table
            try:
                category = self.get_category(category_name)
            except KeyError:
                raise KeyError("Category %s not in table" % category_name)
            category.add_column(**kwargs)

    @docval({'name': 'data', 'type': dict, 'doc': 'the data to put in this row', 'default': None},
            {'name': 'id', 'type': int, 'doc': 'the ID for the row', 'default': None},
            {'name': 'enforce_unique_id', 'type': bool, 'doc': 'enforce that the id in the table must be unique',
             'default': False},
            allow_extra=True)
    def add_row(self, **kwargs):
        """
        We can either provide the row data as a single dict or by specifying a dict for each category
        """
        data, row_id, enforce_unique_id = popargs('data', 'id', 'enforce_unique_id', kwargs)
        data = data if data is not None else kwargs

        # extract the category data
        category_data = {k: data.pop(k) for k in self.categories if k in data}

        # Check that we have the appropriate categories provided
        missing_categories = set(self.categories) - set(list(category_data.keys()))
        if missing_categories:
            raise KeyError(
                '\n'.join([
                    'row data keys don\'t match available categories',
                    'missing {} category keys: {}'.format(len(missing_categories), missing_categories)
                ])
            )
        # Add the data to our main dynamic table
        data['id'] = row_id
        data['enforce_unique_id'] = enforce_unique_id
        call_docval_func(super().add_row, data)

        # Add the data to all our dynamic table categories
        for category, values in category_data.items():
            self.category_tables[category].add_row(**values)

    @docval({'name': 'ignore_category_ids', 'type': bool,
             'doc': "Ignore id columns of sub-category tables", 'default': False})
    def to_dataframe(self, **kwargs):
        """Convert the collection of tables to a single pandas DataFrame"""
        dfs = [super().to_dataframe().reset_index(), ]
        if getargs('ignore_category_ids', kwargs):
            dfs += [category.to_dataframe() for category in self.category_tables.values()]
        else:
            dfs += [category.to_dataframe().reset_index() for category in self.category_tables.values()]
        names = [self.name, ] + list(self.category_tables.keys())
        res = pd.concat(dfs, axis=1, keys=names)
        res.set_index((self.name, 'id'), drop=True, inplace=True)
        return res

    def __getitem__(self, item):
        """
        If item is:
        * int : Return a single row of the table
        * string : Return a single category of the table
        * tuple: Get a column, row, or cell from a particular category. The tuple is expected to consist
                 of (category, selection) where category may be a string with the name of the sub-category
                 or None (or the name of this AlignedDynamicTable) if we want to slice into the main table.

        :returns: DataFrame when retrieving a row or category. Returns scalar when selecting a cell.
                 Returns a VectorData/VectorIndex when retrieving a single column.
        """
        if isinstance(item, (int, list, np.ndarray, slice)):
            # get a single full row from all tables
            dfs = ([super().__getitem__(item).reset_index(), ] +
                   [category[item].reset_index() for category in self.category_tables.values()])
            names = [self.name, ] + list(self.category_tables.keys())
            res = pd.concat(dfs, axis=1, keys=names)
            res.set_index((self.name, 'id'), drop=True, inplace=True)
            return res
        elif isinstance(item, str) or item is None:
            if item in self.colnames:
                # get a specific column
                return super().__getitem__(item)
            else:
                # get a single category
                return self.get_category(item).to_dataframe()
        elif isinstance(item, tuple):
            if len(item) == 2:
                return self.get_category(item[0])[item[1]]
            elif len(item) == 3:
                return self.get_category(item[0])[item[1]][item[2]]
            else:
                raise ValueError("Expected tuple of length 2 or 3 with (category, column, row) as value.")
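For readers following `add_row` above, the key handling can be sketched without hdmf installed. This is a minimal, standalone approximation under stated assumptions, not the code from this PR: `split_row` is a hypothetical helper, and the category names `electrode` and `stimulus` are made up for illustration. One deliberate difference: the sketch copies the input dict, whereas `add_row` as written pops the category keys from the dict it is given.

```python
def split_row(data, categories):
    """Split one row's data into main-table values and per-category values.

    Hypothetical helper that mirrors the key handling in
    AlignedDynamicTable.add_row; it is not part of hdmf.
    """
    data = dict(data)  # shallow copy so the caller's dict is left untouched
    # Every key that names a category holds the dict of values for that category table
    category_data = {k: data.pop(k) for k in categories if k in data}
    # Each category must receive values so that all tables stay row-aligned
    missing = set(categories) - set(category_data)
    if missing:
        raise KeyError("missing %d category keys: %s" % (len(missing), sorted(missing)))
    # Whatever is left belongs to the main table
    return data, category_data


# Illustrative row: one main-table column plus one dict per (made-up) category
row = {
    "main_col": 1,
    "electrode": {"x": 0.5, "y": 1.5},
    "stimulus": {"amplitude": 3},
}
main, per_category = split_row(row, ["electrode", "stimulus"])
```

Requiring a value dict for every category on every row is what keeps the main table and all category tables at the same length, which is the invariant that `__init__` and `add_category` check with `len(category) != len(self)`.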
1 change: 1 addition & 0 deletions src/hdmf/common/io/__init__.py
@@ -1,3 +1,4 @@
from . import multi # noqa: F401
from . import resources # noqa: F401
from . import table # noqa: F401
from . import alignedtable # noqa: F401
15 changes: 15 additions & 0 deletions src/hdmf/common/io/alignedtable.py
@@ -0,0 +1,15 @@
from .. import register_map
from ..alignedtable import AlignedDynamicTable
from .table import DynamicTableMap


@register_map(AlignedDynamicTable)
class AlignedDynamicTableMap(DynamicTableMap):
    """
    Customize the mapping for AlignedDynamicTable
    """
    def __init__(self, spec):
        super().__init__(spec)
        # By default the DynamicTables contained as sub-categories in the AlignedDynamicTable are mapped to
        # the 'dynamic_tables' class attribute. This renames the attribute to 'category_tables'
        self.map_spec('category_tables', spec.get_data_type('DynamicTable'))
4 changes: 2 additions & 2 deletions src/hdmf/common/table.py
@@ -161,7 +161,7 @@ def __getitem__(self, arg):

    def get(self, arg, **kwargs):
        """
        Select elements in this VectorIndex and retrieve the corresponding data from the self.target VectorData

        :param arg: slice or integer index indicating the elements we want to select in this VectorIndex
        :param kwargs: any additional arguments to *get* method of the self.target VectorData
@@ -1056,6 +1056,7 @@ def get(self, arg, index=False, df=True, **kwargs):

        :param arg: 1) tuple consisting of (str, int) where the string defines the column to select
                    and the int selects the row, 2) int or slice to select a subset of rows
        :param df: Boolean indicating whether we want to return the result as a pandas dataframe

        :return: Result from self.table[....] with the appropriate selection based on the
                 rows selected by this DynamicTableRegion
@@ -1099,7 +1100,6 @@ def get(self, arg, index=False, df=True, **kwargs):
ret = values.iloc[[lut[i] for i in ret]]
else:
ret = self._index_lol(values, ret, lut)

return ret
else:
raise ValueError("unrecognized argument: '%s'" % arg)