exported matrices have wrong dimensions #360

Closed
cechava opened this issue Dec 8, 2021 · 11 comments

@cechava (Collaborator) commented Dec 8, 2021

Matrices are saved rows-last in the NWB file instead of rows-first.

Code to reproduce

file = NwbFile( ...
    'session_start_time', '2021-01-01 00:00:00', ...
    'identifier', 'ident1', ...
    'session_description', 'test' ...
    );

start_times = types.hdmf_common.VectorData( ...
    'description', 'start_times', ...
    'data', [1:100].' ...
);

stop_times = types.hdmf_common.VectorData( ...
    'description', 'stop_times', ...
    'data', [2:101].' ...
);


randomvalues = types.hdmf_common.VectorData( ...
    'description', 'random values',...
    'data', rand(100,2,3) ...
);

multi_dim_table = types.core.TimeIntervals( ...
    'description','test table', ...
    'colnames', {'randomvalues', 'start_time', 'stop_time'}, ...
    'start_time', start_times, ...
    'stop_time', stop_times, ...
    'randomvalues', randomvalues, ...
    'id', types.hdmf_common.ElementIdentifiers('data', [0:99].') ...
    );
file.intervals.set('multi', multi_dim_table);
nwbExport(file, 'test_export.nwb');

Examining the dimensions in the exported file:

[Screenshot: dataset shapes in the exported file]
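For reference, the exported shapes can also be checked outside MATLAB; a minimal h5py sketch (the dataset paths follow from file.intervals.set('multi', ...) above):

import h5py

# Inspect the exported file; 'randomvalues' comes back rows-last.
with h5py.File('test_export.nwb', 'r') as f:
    print(f['/intervals/multi/randomvalues'].shape)  # (3, 2, 100)
    print(f['/intervals/multi/id'].shape)            # (100,)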

@bendichter (Contributor) commented Dec 9, 2021

Woah, hold on. This is known and documented in the README. It has to do with the way MATLAB represents arrays in memory: it uses F-ordering (column-major) instead of C-ordering (row-major). We can't change this without breaking every single user's code.
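To make the ordering difference concrete, a minimal numpy sketch (illustrative only; a stands in for the array exported above):

import numpy as np

a = np.random.rand(100, 2, 3)  # size(a) in MATLAB would be [100 2 3]
# MATLAB lays this buffer out column-major (F-order); HDF5 assumes
# row-major (C-order), so the same bytes describe the reversed shape:
assert np.array_equal(a.ravel(order='F'), a.T.ravel(order='C'))
print(a.T.shape)  # (3, 2, 100) -- the shape reported in the exported file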

@cechava (Collaborator, Author) commented Dec 9, 2021

Thanks for pointing me to the README. I vaguely remembered this but couldn't figure out where I had read it. I think this is an issue that should be discussed further: I'm finding that it leads to incompatibility between MatNWB and PyNWB when multidimensional columns are involved.
For example, generating the file as above prevents it from loading into PyNWB (see below). The issue is that PyNWB detects that the length of the ids (100) does not match the number of rows in the table as encoded (3), so the table configuration is invalid and an error is thrown.

Maybe @ln-vidrio or @rly have some thoughts on this.

import os
from pynwb import NWBHDF5IO

base_path = '/Users/cesar/Documents/CatalystNeuro/matNWB/general_dev/checkExportDims/'
file_path = os.path.join(base_path, 'test_export.nwb')

io = NWBHDF5IO(file_path, mode='r', load_namespaces=True)
nwb = io.read()

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/anaconda3/envs/nwb_conversion_env/lib/python3.7/site-packages/hdmf/build/objectmapper.py in construct(self, **kwargs)
   1244             obj = self.__new_container__(cls, builder.source, parent, builder.attributes.get(self.__spec.id_key()),
-> 1245                                          **kwargs)
   1246         except Exception as ex:

~/anaconda3/envs/nwb_conversion_env/lib/python3.7/site-packages/hdmf/build/objectmapper.py in __new_container__(self, cls, container_source, parent, object_id, **kwargs)
   1253         obj = cls.__new__(cls, container_source=container_source, parent=parent, object_id=object_id)
-> 1254         obj.__init__(**kwargs)
   1255         return obj

~/anaconda3/envs/nwb_conversion_env/lib/python3.7/site-packages/hdmf/utils.py in func_call(*args, **kwargs)
    582                 pargs = _check_args(args, kwargs)
--> 583                 return func(args[0], **pargs)
    584         else:

~/anaconda3/envs/nwb_conversion_env/lib/python3.7/site-packages/pynwb/epoch.py in __init__(self, **kwargs)
     30     def __init__(self, **kwargs):
---> 31         call_docval_func(super(TimeIntervals, self).__init__, kwargs)
     32 

~/anaconda3/envs/nwb_conversion_env/lib/python3.7/site-packages/hdmf/utils.py in call_docval_func(func, kwargs)
    423     fargs, fkwargs = fmt_docval_args(func, kwargs)
--> 424     return func(*fargs, **fkwargs)
    425 

~/anaconda3/envs/nwb_conversion_env/lib/python3.7/site-packages/hdmf/utils.py in func_call(*args, **kwargs)
    582                 pargs = _check_args(args, kwargs)
--> 583                 return func(args[0], **pargs)
    584         else:

~/anaconda3/envs/nwb_conversion_env/lib/python3.7/site-packages/hdmf/common/table.py in __init__(self, **kwargs)
    343             if not all(i == lens[0] for i in lens):
--> 344                 raise ValueError("columns must be the same length")
    345             if len(lens) > 0 and lens[0] != len(id):

ValueError: columns must be the same length

The above exception was the direct cause of the following exception:

ConstructError                            Traceback (most recent call last)
/var/folders/0l/b1c608yd6_1bwmfv3gcpwhyr0000gn/T/ipykernel_19110/841572429.py in <module>
     10                mode='r',\
     11                load_namespaces=True)
---> 12 nwb = io.read()

~/anaconda3/envs/nwb_conversion_env/lib/python3.7/site-packages/hdmf/backends/hdf5/h5tools.py in read(self, **kwargs)
    496                                        % (self.source, self.__mode))
    497         try:
--> 498             return call_docval_func(super().read, kwargs)
    499         except UnsupportedOperation as e:
    500             if str(e) == 'Cannot build data. There are no values.':  # pragma: no cover

~/anaconda3/envs/nwb_conversion_env/lib/python3.7/site-packages/hdmf/utils.py in call_docval_func(func, kwargs)
    422 def call_docval_func(func, kwargs):
    423     fargs, fkwargs = fmt_docval_args(func, kwargs)
--> 424     return func(*fargs, **fkwargs)
    425 
    426 

~/anaconda3/envs/nwb_conversion_env/lib/python3.7/site-packages/hdmf/utils.py in func_call(*args, **kwargs)
    581             def func_call(*args, **kwargs):
    582                 pargs = _check_args(args, kwargs)
--> 583                 return func(args[0], **pargs)
    584         else:
    585             def func_call(*args, **kwargs):

~/anaconda3/envs/nwb_conversion_env/lib/python3.7/site-packages/hdmf/backends/io.py in read(self, **kwargs)
     39             # TODO also check that the keys are appropriate. print a better error message
     40             raise UnsupportedOperation('Cannot build data. There are no values.')
---> 41         container = self.__manager.construct(f_builder)
     42         return container
     43 

~/anaconda3/envs/nwb_conversion_env/lib/python3.7/site-packages/hdmf/utils.py in func_call(*args, **kwargs)
    581             def func_call(*args, **kwargs):
    582                 pargs = _check_args(args, kwargs)
--> 583                 return func(args[0], **pargs)
    584         else:
    585             def func_call(*args, **kwargs):

~/anaconda3/envs/nwb_conversion_env/lib/python3.7/site-packages/hdmf/build/manager.py in construct(self, **kwargs)
    278                 # we are at the top of the hierarchy,
    279                 # so it must be time to resolve parents
--> 280                 result = self.__type_map.construct(builder, self, None)
    281                 self.__resolve_parents(result)
    282             self.prebuilt(result, builder)

~/anaconda3/envs/nwb_conversion_env/lib/python3.7/site-packages/hdmf/utils.py in func_call(*args, **kwargs)
    581             def func_call(*args, **kwargs):
    582                 pargs = _check_args(args, kwargs)
--> 583                 return func(args[0], **pargs)
    584         else:
    585             def func_call(*args, **kwargs):

~/anaconda3/envs/nwb_conversion_env/lib/python3.7/site-packages/hdmf/build/manager.py in construct(self, **kwargs)
    788             raise ValueError('No ObjectMapper found for builder of type %s' % dt)
    789         else:
--> 790             return obj_mapper.construct(builder, build_manager, parent)
    791 
    792     @docval({"name": "container", "type": AbstractContainer, "doc": "the container to convert to a Builder"},

~/anaconda3/envs/nwb_conversion_env/lib/python3.7/site-packages/hdmf/utils.py in func_call(*args, **kwargs)
    581             def func_call(*args, **kwargs):
    582                 pargs = _check_args(args, kwargs)
--> 583                 return func(args[0], **pargs)
    584         else:
    585             def func_call(*args, **kwargs):

~/anaconda3/envs/nwb_conversion_env/lib/python3.7/site-packages/hdmf/build/objectmapper.py in construct(self, **kwargs)
   1212         cls = manager.get_cls(builder)
   1213         # gather all subspecs
-> 1214         subspecs = self.__get_subspec_values(builder, self.spec, manager)
   1215         # get the constructor argument that each specification corresponds to
   1216         const_args = dict()

~/anaconda3/envs/nwb_conversion_env/lib/python3.7/site-packages/hdmf/build/objectmapper.py in __get_subspec_values(self, builder, spec, manager)
   1141                         ret[subspec] = self.__flatten(sub_builder, subspec, manager)
   1142             # now process groups and datasets
-> 1143             self.__get_sub_builders(groups, spec.groups, manager, ret)
   1144             self.__get_sub_builders(datasets, spec.datasets, manager, ret)
   1145         elif isinstance(spec, DatasetSpec):

~/anaconda3/envs/nwb_conversion_env/lib/python3.7/site-packages/hdmf/build/objectmapper.py in __get_sub_builders(self, sub_builders, subspecs, manager, ret)
   1192                 if dt is None:
   1193                     # recurse
-> 1194                     ret.update(self.__get_subspec_values(sub_builder, subspec, manager))
   1195                 else:
   1196                     ret[subspec] = manager.construct(sub_builder)

~/anaconda3/envs/nwb_conversion_env/lib/python3.7/site-packages/hdmf/build/objectmapper.py in __get_subspec_values(self, builder, spec, manager)
   1141                         ret[subspec] = self.__flatten(sub_builder, subspec, manager)
   1142             # now process groups and datasets
-> 1143             self.__get_sub_builders(groups, spec.groups, manager, ret)
   1144             self.__get_sub_builders(datasets, spec.datasets, manager, ret)
   1145         elif isinstance(spec, DatasetSpec):

~/anaconda3/envs/nwb_conversion_env/lib/python3.7/site-packages/hdmf/build/objectmapper.py in __get_sub_builders(self, sub_builders, subspecs, manager, ret)
   1184                 sub_builder = builder_dt.get(dt)
   1185                 if sub_builder is not None:
-> 1186                     sub_builder = self.__flatten(sub_builder, subspec, manager)
   1187                     ret[subspec] = sub_builder
   1188             else:

~/anaconda3/envs/nwb_conversion_env/lib/python3.7/site-packages/hdmf/build/objectmapper.py in __flatten(self, sub_builder, subspec, manager)
   1197 
   1198     def __flatten(self, sub_builder, subspec, manager):
-> 1199         tmp = [manager.construct(b) for b in sub_builder]
   1200         if len(tmp) == 1 and not subspec.is_many():
   1201             tmp = tmp[0]

~/anaconda3/envs/nwb_conversion_env/lib/python3.7/site-packages/hdmf/build/objectmapper.py in <listcomp>(.0)
   1197 
   1198     def __flatten(self, sub_builder, subspec, manager):
-> 1199         tmp = [manager.construct(b) for b in sub_builder]
   1200         if len(tmp) == 1 and not subspec.is_many():
   1201             tmp = tmp[0]

~/anaconda3/envs/nwb_conversion_env/lib/python3.7/site-packages/hdmf/utils.py in func_call(*args, **kwargs)
    581             def func_call(*args, **kwargs):
    582                 pargs = _check_args(args, kwargs)
--> 583                 return func(args[0], **pargs)
    584         else:
    585             def func_call(*args, **kwargs):

~/anaconda3/envs/nwb_conversion_env/lib/python3.7/site-packages/hdmf/build/manager.py in construct(self, **kwargs)
    274             if parent_builder is not None:
    275                 parent = self._get_proxy_builder(parent_builder)
--> 276                 result = self.__type_map.construct(builder, self, parent)
    277             else:
    278                 # we are at the top of the hierarchy,

~/anaconda3/envs/nwb_conversion_env/lib/python3.7/site-packages/hdmf/utils.py in func_call(*args, **kwargs)
    581             def func_call(*args, **kwargs):
    582                 pargs = _check_args(args, kwargs)
--> 583                 return func(args[0], **pargs)
    584         else:
    585             def func_call(*args, **kwargs):

~/anaconda3/envs/nwb_conversion_env/lib/python3.7/site-packages/hdmf/build/manager.py in construct(self, **kwargs)
    788             raise ValueError('No ObjectMapper found for builder of type %s' % dt)
    789         else:
--> 790             return obj_mapper.construct(builder, build_manager, parent)
    791 
    792     @docval({"name": "container", "type": AbstractContainer, "doc": "the container to convert to a Builder"},

~/anaconda3/envs/nwb_conversion_env/lib/python3.7/site-packages/hdmf/utils.py in func_call(*args, **kwargs)
    581             def func_call(*args, **kwargs):
    582                 pargs = _check_args(args, kwargs)
--> 583                 return func(args[0], **pargs)
    584         else:
    585             def func_call(*args, **kwargs):

~/anaconda3/envs/nwb_conversion_env/lib/python3.7/site-packages/hdmf/build/objectmapper.py in construct(self, **kwargs)
   1246         except Exception as ex:
   1247             msg = 'Could not construct %s object due to: %s' % (cls.__name__, ex)
-> 1248             raise ConstructError(builder, msg) from ex
   1249         return obj
   1250 

ConstructError: (root/intervals/multi GroupBuilder {'attributes': {'colnames': array(['randomvalues', 'start_time', 'stop_time'], dtype=object), 'description': 'test table', 'namespace': 'core', 'neurodata_type': 'TimeIntervals', 'object_id': 'aea7939a-379c-47a7-81af-5b4e3bbb746e'}, 'groups': {}, 'datasets': {'id': root/intervals/multi/id DatasetBuilder {'attributes': {'namespace': 'hdmf-common', 'neurodata_type': 'ElementIdentifiers', 'object_id': 'ec1d12ad-29dd-4850-bd8b-99fa3e8bfe11'}, 'data': <HDF5 dataset "id": shape (100,), type "|i1">}, 'randomvalues': root/intervals/multi/randomvalues DatasetBuilder {'attributes': {'description': 'random values', 'namespace': 'hdmf-common', 'neurodata_type': 'VectorData', 'object_id': '4e1beee3-1e8f-4730-a809-2001d4cc43b8'}, 'data': <HDF5 dataset "randomvalues": shape (3, 2, 100), type "<f8">}, 'start_time': root/intervals/multi/start_time DatasetBuilder {'attributes': {'description': 'start_times', 'namespace': 'hdmf-common', 'neurodata_type': 'VectorData', 'object_id': '4bb5c88b-450b-49bb-8bce-b80c0bbee7c6'}, 'data': <HDF5 dataset "start_time": shape (100,), type "<f8">}, 'stop_time': root/intervals/multi/stop_time DatasetBuilder {'attributes': {'description': 'stop_times', 'namespace': 'hdmf-common', 'neurodata_type': 'VectorData', 'object_id': '4ebebb9e-0558-4eaa-bc4a-c46527dc7d3d'}, 'data': <HDF5 dataset "stop_time": shape (100,), type "<f8">}}, 'links': {}}, 'Could not construct TimeIntervals object due to: columns must be the same length')

@cechava (Collaborator, Author) commented Dec 9, 2021

A few potential solutions come to mind:

  1. Re-generate the row ids when loading a MatNWB-generated file into PyNWB. This is not ideal, because the row ids may be meaningful (e.g., timestamps).

  2. Use single-dimensional data within the DynamicTable object (such as start_time) to infer the intended number of rows when loading files into PyNWB, and reshape multi-dimensional columns to match.

  3. Address this on the MatNWB side and figure out how to encode the data in rows-first format. I have some initial work towards this in PR write and read row-first matrices from NWB file #361.

Let me know if anyone has any other ideas or if the issue has been addressed elsewhere.

@lawrence-mbf (Collaborator)

> A few potential solutions come to mind: […]

  1. Ignoring, as I'd rather not have pynwb make changes solely to suit MatNWB.
  2. This is only feasible if we can guarantee at least one vector column. Not all DynamicTables have a start_time, and I forget whether id must be a vector, since it's user-settable. That said, id is used by addRow to determine the true number of rows anyway, so maybe that's not an issue.
  3. This is guaranteed to break something, incur some user mental load, and would require changing all possible data reads and writes everywhere. Not a big fan, but it is a solution.

How does the inverse work, i.e., loading a pynwb table in MatNWB?

@cechava (Collaborator, Author) commented Dec 9, 2021

You do bring up a good point: id can be used to determine the true number of rows. It's then most straightforward to implement a solution on the pynwb side. Something like the following sketch (Python/NumPy pseudo-code; ids and loaded_matrix stand in for the loaded datasets):

import numpy as np

n_rows = len(ids)
if loaded_matrix.shape[0] != n_rows and loaded_matrix.shape[-1] == n_rows:
    # rows were written last (MATLAB F-order): swap first and last dimensions
    loaded_matrix = np.swapaxes(loaded_matrix, 0, -1)
elif loaded_matrix.shape[0] != n_rows:
    raise ValueError('columns must be the same length')
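Applied to the file above, this would flip the (3, 2, 100) randomvalues dataset back to (100, 2, 3), matching the 100 ids.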

Checking now how loading a multidimensional pynwb table into MatNWB works out.

@bendichter (Contributor) commented Dec 9, 2021 via email

@cechava (Collaborator, Author) commented Dec 9, 2021

No worries. I won't implement anything until we're all on the same page; I just wanted to start the discussion with the relevant details.

The same issue occurs when going from PyNWB into MatNWB. Relevant code below.

from datetime import datetime
from dateutil.tz import tzlocal
import numpy as np
from pynwb import NWBFile, NWBHDF5IO

start_time = datetime(2017, 4, 3, 11, tzinfo=tzlocal())

nwbfile = NWBFile(session_description='NWB multidimensional test',  # required
                  identifier='NWB123',  # required
                  session_start_time=start_time)

nwbfile.add_trial_column(name='randomvalues', description='')

for i in range(100):
    nwbfile.add_trial(start_time=float(i + 1), stop_time=float(i + 2),
                      randomvalues=np.random.rand(3, 2), id=i)

with NWBHDF5IO('pynwb_multi_test.nwb', 'w') as io:
    io.write(nwbfile)

[Screenshot: the generated trials table in PyNWB]

file_path = '/Users/cesar/Documents/CatalystNeuro/pynwb/pynwb_multi_test.nwb';
readFile = nwbRead(file_path);

Error using types.util.dynamictable.checkConfig (line 85)
Must provide same number of ids as length of columns.

Error in types.hdmf_common.DynamicTable (line 40)
        types.util.dynamictable.checkConfig(obj,ignoreList);

Error in types.core.TimeIntervals (line 25)
        obj = obj@types.hdmf_common.DynamicTable(varargin{:});

Error in io.parseGroup (line 85)
    parsed = eval([Type.typename '(kwargs{:})']);

Error in io.parseGroup (line 38)
    subg = io.parseGroup(filename, group, Blacklist);

Error in io.parseGroup (line 38)
    subg = io.parseGroup(filename, group, Blacklist);

Error in nwbRead (line 59)
    nwb = io.parseGroup(filename, h5info(filename), Blacklist);

@bendichter (Contributor)

The above error looks like a problem with types.util.dynamictable.checkConfig, right?

@cechava (Collaborator, Author) commented Dec 13, 2021

Not quite. The issue is that MatNWB reads the matrix in the NWB file as rows-last, so the inferred number of rows (3) does not match the length of id (100) and checkConfig throws an error. This is essentially the same mismatch that PyNWB complains about when we go the other way. I put together the schematic below to summarize the issue.

[Schematic: the round-trip dimension mismatch between MatNWB and PyNWB]

@bendichter (Contributor) commented Dec 14, 2021

[Screenshot: dimensions of the loaded dataset]
This should be 3 x 2 x 100, as indicated in the README:

> The ordering of dimensions in MATLAB are reversed compared to numpy (and pynwb). Thus, a 3-D SpikeEventSeries, which in pynwb would normally be indexed in order (num_samples, num_channels, num_events), would be indexed in form (num_events, num_channels, num_samples) in MatNWB.

For the number of rows, you should be checking the last dimension, not the first.

@cechava (Collaborator, Author) commented Dec 14, 2021

@bendichter I know we agreed to tackle this by modifying the addRow method in MatNWB as described in issue #364, but thinking about it further, that falls short of addressing the issue and I think we should revisit it.

The issue is schematized below:
[Schematic: a valid PyNWB-generated table still fails to load in MatNWB]

Basically, it is still not possible to read valid pynwb-generated DynamicTable objects into MatNWB. Perhaps this is why MatNWB did not perform any checks on DynamicTable objects, and we should backtrack on having that checkConfig function.

cechava reopened this Dec 14, 2021
cechava closed this as completed Dec 14, 2021