Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

updating of functions to work as rows-last #367

Closed

Conversation

cechava
Copy link
Collaborator

@cechava cechava commented Dec 14, 2021

fixes #364

The following should be amended to work with a rows-last convention:

  • checkConfig
  • getRow
  • addRow

make sure to have tests for DynamicTable columns of following varieties:

  • one-dimensional columns
  • multi-dimensional columns
  • ragged multi-dimensional columns
  • nested ragged columns
  • expandable columns (DataPipe)

@cechava
Copy link
Collaborator Author

cechava commented Dec 14, 2021

with this first commit, valid Dynamic Tables can be defined in the following manner:

>> col1 = types.hdmf_common.VectorData( ...
    'description', 'column #1', ...
    'data', [1, 2] ...
);

col2 = types.hdmf_common.VectorData( ...
    'description', 'column #2', ...
    'data', {'a', 'b'} ...
);

my_table = types.hdmf_common.DynamicTable( ...
    'description','an example table', ...
    'colnames', {'col1','col2'}, ...
    'col1',col1, ...
    'col2',col2, ...
    'id', types.hdmf_common.ElementIdentifiers('data', [0, 1]) ...  % 0-indexed, for compatibility with Python
);

getRow needs to be updated, since current output is as follows:

>> my_table.getRow(1)

ans =

  1×2 table

     col1          col2     
    ______    ______________

    1    2    {'a'}    {'b'}

under the planned changes 'row' will be a reserved word for the the last (not the first) dimension.

@cechava
Copy link
Collaborator Author

cechava commented Dec 14, 2021

with latest commit, getRow has correct behavior for rows-last convention (at least for simple tables):

>> my_table.getRow(2)

ans =

  1×2 table

    col1    col2 
    ____    _____

     2      {'b'}

>> my_table.getRow(1)

ans =

  1×2 table

    col1    col2 
    ____    _____

     1      {'a'}

>> my_table.getRow(1:2)

ans =

  1×2 table

     col1          col2     
    ______    ______________

    1    2    {'a'}    {'b'}

@cechava
Copy link
Collaborator Author

cechava commented Dec 15, 2021

with latest commit, the last dimension of DynamicTable objets are printed along the vertical axes (i.e., as conventional rows)

start_times = types.hdmf_common.VectorData( ...
    'description', 'start_times', ...
    'data', 1:10 ...
);

stop_times = types.hdmf_common.VectorData( ...
    'description', 'stop_times', ...
    'data', 2:11 ...
);
randomvalues = types.hdmf_common.VectorData( ...
    'description', 'random values',...
    'data', rand(3,2,10) ...
);

multi_dim_table = types.core.TimeIntervals( ...
    'description','test table', ...
    'colnames', {'randomvalues', 'start_time', 'stop_time'}, ...
    'start_time', start_times, ...
    'stop_time', stop_times, ...
    'randomvalues', randomvalues, ...
     'id', types.hdmf_common.ElementIdentifiers('data', 0:9) ...
 );

>> multi_dim_table.toTable()

ans =

  10×4 table

    id     randomvalues     start_time    stop_time
    __    ______________    __________    _________

    0     [1×2×3 double]         1            2    
    1     [1×2×3 double]         2            3    
    2     [1×2×3 double]         3            4    
    3     [1×2×3 double]         4            5    
    4     [1×2×3 double]         5            6    
    5     [1×2×3 double]         6            7    
    6     [1×2×3 double]         7            8    
    7     [1×2×3 double]         8            9    
    8     [1×2×3 double]         9           10    
    9     [1×2×3 double]        10           11    


>> multi_dim_table.getRow(1:5)

ans =

  5×3 table

     randomvalues     start_time    stop_time
    ______________    __________    _________

    [1×2×3 double]        1             2    
    [1×2×3 double]        2             3    
    [1×2×3 double]        3             4    
    [1×2×3 double]        4             5    
    [1×2×3 double]        5             6    

also works for multi-dimensional ragged arrays:

data = zeros(2,3,5);
data(:,:,1) = ...
    [1, 2, 3; ...
    4, 5, 6];
data(:,:,2) = ...
    [11, 12, 13; ...
    14, 15, 16];
data(:,:,3) = ...
    [17, 18, 19; ...
    20, 21, 22];
data(:,:,4) = ...
    [23, 24, 25; ...
    26, 27, 28];
data(:,:,5) = ...
    [29, 30, 31; ...
    32, 33, 34];

col6 = types.hdmf_common.VectorData( ...
    'description', 'column #6',...
    'data', data ...
);
col6_index = types.hdmf_common.VectorIndex( ...
    'description', 'column #6 index', ...
    'target', types.untyped.ObjectView(col6),'data', [2, 3, 5] ...
);

multi_ragged_table = types.hdmf_common.DynamicTable( ...
    'description','test table', ...
    'colnames', {'col6'}, ...
    'col6', col6, ...
    'col6_index', col6_index, ...
    'id', types.hdmf_common.ElementIdentifiers('data', [0, 1, 2]) ...  
);

>> multi_ragged_table.toTable()

ans =

  3×2 table

    id         col6     
    __    ______________

    0     {2×3×2 double}
    1     {2×3   double}
    2     {2×3×2 double}

@cechava
Copy link
Collaborator Author

cechava commented Dec 15, 2021

with latest commit, can use the addRow method to add elements to the last dimension of a DynamicTable column

col1 = types.hdmf_common.VectorData( ...
    'description', 'column #1', ...
    'data', [1, 2] ...
);

col2 = types.hdmf_common.VectorData( ...
    'description', 'column #2', ...
    'data', {'a', 'b'} ...
);

my_table = types.hdmf_common.DynamicTable( ...
    'description','an example table', ...
    'colnames', {'col1','col2'}, ...
    'col1',col1, ...
    'col2',col2, ...
    'id', types.hdmf_common.ElementIdentifiers('data', [0, 1]) ...  % 0-indexed, for compatibility with Python
);
my_table.addRow('col1', 3, 'col2', {'c'}, 'id', 2);

>> my_table.toTable()

ans =

  3×3 table

    id    col1    col2 
    __    ____    _____

    0      1      {'a'}
    1      2      {'b'}
    2      3      {'c'}

also works for ragged arrays:

col1 = types.hdmf_common.VectorData( ...
    'description', 'column #1', ...
    'data', {'1a', '1b', '1c', '2a'} ...
);

col1_index = types.hdmf_common.VectorIndex( ...
    'description', 'column #1 index',...
    'target',types.untyped.ObjectView(col1), ...  % object view of target column
    'data', [3, 4] ...
);
% 
table_ragged_col = types.hdmf_common.DynamicTable( ...
    'description', 'an example table', ...
    'colnames', {'col1'}, ...
    'col1', col1, ...
    'col1_index', col1_index, ...
    'id', types.hdmf_common.ElementIdentifiers('data', [0, 1]) ...  % 0-indexed, for compatibility with Python
);

table_ragged_col.addRow('col1', {'3a','3b','3c'}, 'id', 2);

>> table_ragged_col.toTable()

ans =

  3×2 table

    id       col1   
    __    __________

    0     {1×3 cell}
    1     {1×1 cell}
    2     {1×3 cell}

@cechava
Copy link
Collaborator Author

cechava commented Dec 15, 2021

with latest commit, addRow function works for vararg input and MATLAB table inputs:

col1 = types.hdmf_common.VectorData( ...
    'description', 'column #1', ...
    'data', rand(3,2,10) ...
);

my_table = types.hdmf_common.DynamicTable( ...
    'description','an example table', ...
    'colnames', {'col1'}, ...
    'col1',col1, ...
    'id', types.hdmf_common.ElementIdentifiers('data', 0:9) ... 
);
% add row with vararg
my_table.addRow('col1',rand(3,2), 'id', 10);

% add row with table
col1 = rand(3,2,5);
sub_table = table(col1);
my_table.addRow(sub_table);

>> my_table.vectordata.get('col1')

ans = 

  VectorData with properties:

    description: 'column #1'
           data: [3×2×16 double]

>> my_table.toTable()

ans =

  14×2 table

    id         col1     
    __    ______________

     0    [1×2×3 double]
     1    [1×2×3 double]
     2    [1×2×3 double]
     3    [1×2×3 double]
     4    [1×2×3 double]
     5    [1×2×3 double]
     6    [1×2×3 double]
     7    [1×2×3 double]
     8    [1×2×3 double]
     9    [1×2×3 double]
    10    [1×2×3 double]
    11    [1×2×3 double]
    12    [1×2×3 double]
    13    [1×2×3 double]

@cechava
Copy link
Collaborator Author

cechava commented Dec 15, 2021

@bendichter @ln-vidrio

I've hit a bit of a wall in implementing changes to deal with a rows-last convention in MatNWB and I'm wondering if you have any thoughts. Let me illustrate the issue with an example. Let's say I'm trying to build a DynamicTable by progressively adding rows in the following manner:

colnames = {'start_time', 'stop_time', 'random_multi'};
time_table = types.core.TimeIntervals(...
    'description', 'test dynamic table column',...
    'colnames', colnames ...
);

for i = 1:10
    time_table.addRow( ...
        'start_time', i, ...
        'stop_time', i+1, ...
        'random_multi', rand(3,2), ...
        'id', i-1 ...
    )
end

After going through the first loop( and adding the first row) the checkConfig utility function will find an unmatched length of columns. 'start_time' and 'stop_time' will have length 1 while 'random_multi' will have length 2 because 1) there's only one 3x2 matrix in the 'random_multi' column and 2) MATLAB doesn't recognize singleton dimensions so that implicit 3x2x1 becomes 3x2.

Right now, I'm dealing with this by looking at the other columns and making the decision to not assert equal length across all columns if there's a column of length one (in which case the dimensions of such multidimensional columns are ambiguous - is it 3x2 or 3x2x1). I don't foresee this solution being a problem, but let me know if you guys can think of a more elegant solution (haven't pushed this yet but will do so shortly).

@cechava
Copy link
Collaborator Author

cechava commented Dec 15, 2021

with latest commit, Dynmic Tables can be built up row-by-row

colnames = {'start_time', 'stop_time', 'random_multi'};
time_table = types.core.TimeIntervals(...
    'description', 'test dynamic table column',...
    'colnames', colnames ...
);

for i = 1:10
    time_table.addRow( ...
        'start_time', i, ...
        'stop_time', i+1, ...
        'random_multi', rand(3,2), ...
        'id', i-1 ...
    )
end


>> time_table.vectordata.get('random_multi')

ans = 

  VectorData with properties:

    description: 'AUTOGENERATED description for column `random_multi`'
           data: [3×2×10 double]

>> time_table.toTable()

ans =

  10×4 table

    id    start_time    stop_time     random_multi 
    __    __________    _________    ______________

    0          1            2        [1×2×3 double]
    1          2            3        [1×2×3 double]
    2          3            4        [1×2×3 double]
    3          4            5        [1×2×3 double]
    4          5            6        [1×2×3 double]
    5          6            7        [1×2×3 double]
    6          7            8        [1×2×3 double]
    7          8            9        [1×2×3 double]
    8          9           10        [1×2×3 double]
    9         10           11        [1×2×3 double]

@bendichter
Copy link
Contributor

@cechava It's a shame MATLAB does not allow for singleton dimensions. That sounds like a good solution to me.

@cechava
Copy link
Collaborator Author

cechava commented Dec 15, 2021

with latest commit, adding nested ragged arrays supported

colnames = {'start_time', 'stop_time', 'randomvalues', 'random_multi','stringdata'};
time_table = types.core.TimeIntervals(...
    'description', 'test dynamic table column',...
    'colnames', colnames ...
);
% 
for i = 1:10
    time_table.addRow( ...
        'start_time', i, ...
        'stop_time', i+1, ...
        'random_multi', rand(3,2), ...
        'randomvalues', {rand(2,5), rand(2,3)}, ...
        'stringdata', {'TRUE'},...
        'id', i-1 ...
    )
end

>> time_table.vectordata.get('random_multi')

ans = 

  VectorData with properties:

    description: 'AUTOGENERATED description for column `random_multi`'
           data: [3×2×10 double]

>> time_table.vectordata.get('randomvalues')

ans = 

  VectorData with properties:

    description: 'AUTOGENERATED description for column `randomvalues`'
           data: [2×80 double]

>> time_table.toTable()

ans =

  10×6 table

    id    start_time    stop_time    randomvalues    random_multi    stringdata
    __    __________    _________    ____________    ____________    __________

    0          1            2         {1×2 cell}     {3×2 double}     {'TRUE'} 
    1          2            3         {1×2 cell}     {3×2 double}     {'TRUE'} 
    2          3            4         {1×2 cell}     {3×2 double}     {'TRUE'} 
    3          4            5         {1×2 cell}     {3×2 double}     {'TRUE'} 
    4          5            6         {1×2 cell}     {3×2 double}     {'TRUE'} 
    5          6            7         {1×2 cell}     {3×2 double}     {'TRUE'} 
    6          7            8         {1×2 cell}     {3×2 double}     {'TRUE'} 
    7          8            9         {1×2 cell}     {3×2 double}     {'TRUE'} 
    8          9           10         {1×2 cell}     {3×2 double}     {'TRUE'} 
    9         10           11         {1×2 cell}     {3×2 double}     {'TRUE'} 

@cechava
Copy link
Collaborator Author

cechava commented Dec 16, 2021

@bendichter @ln-vidrio

another potential complication is that we can't use MATLAB tables as arguments to addRows. This is because we can't create tables with an unmatched number of rows (first-dimension as MATLAB defines it). For example, the code below doesn't work anymore

colnames = {'start_time', 'stop_time', 'randomvalues', 'random_multi','stringdata'};
time_table = types.core.TimeIntervals(...
    'description', 'test dynamic table column',...
    'colnames', colnames ...
);
subTable = table(...
    (101:150), ...
    (102:151), ...
    rand(2,3,50),...
    repmat({'TRUE'}, 1, 50),...
    'VariableNames', {'start_time', 'stop_time', 'random_multi','stringdata'});
time_table.addRow(subTable)

Error using table (line 233)
All table variables must have the same number of rows.

The issue is that the 'random_multi' column has 2 rows while all other columns have one row, as defined by MATLAB. One potential work-around is to have users define MATLAB tables with the number of rows along the first-dimension and under the hood we take that first dimension and map it onto the last dimension when using the addRow method.
I can't decide if this is a good idea because I think it will make things really confusing with users if they are using a rows-first convention for MATLAB tables and then using rows-last convention for matnwb tables. Perhaps the thing to do is to just remove the option of passing in MATLAB tables to the addRow method. Let me know your thoughts.

@bendichter
Copy link
Contributor

@cechava Thanks for bringing this up. I did not think of this, and I see that it presents a problem. I think your idea of transposing under the hood is probably the best way we could handle this if we wanted to continue to support it, but I agree it is potentially confusing. I could go either way. I guess I would rather not allow tables as input, but I think it would be OK if you wanted to keep them. We'd need to be very clear about this in the documentation though

@cechava
Copy link
Collaborator Author

cechava commented Dec 16, 2021

After sleeping on it, I agree with you and think it would be best to not allow tables as input. I can gently deprecate the addTableRow function by throwing out an error with a message whenever the function is called (e.g., adding rows by supplying a MATLAB table has been deprecated. Please use key-value pair arguments)

@codecov
Copy link

codecov bot commented Dec 22, 2021

Codecov Report

Merging #367 (c770d00) into master (223a0c4) will increase coverage by 0.12%.
The diff coverage is 98.02%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #367      +/-   ##
==========================================
+ Coverage   85.64%   85.76%   +0.12%     
==========================================
  Files         124      124              
  Lines        4626     4624       -2     
==========================================
+ Hits         3962     3966       +4     
+ Misses        664      658       -6     
Impacted Files Coverage Δ
+types/+util/+dynamictable/addTableRow.m 100.00% <ø> (+11.11%) ⬆️
+types/+util/+dynamictable/nwbToTable.m 91.42% <ø> (ø)
+types/+untyped/+datapipe/BlueprintPipe.m 71.91% <66.66%> (+0.64%) ⬆️
+types/+util/+dynamictable/getRow.m 96.42% <95.65%> (-0.72%) ⬇️
+types/+util/+dynamictable/checkConfig.m 92.40% <96.96%> (+0.60%) ⬆️
+tests/+sanity/GenerationTest.m 100.00% <100.00%> (ø)
+tests/+system/DynamicTableTest.m 99.42% <100.00%> (-0.03%) ⬇️
+tests/+system/UnitTimesIOTest.m 100.00% <100.00%> (ø)
+tests/+unit/dataPipeTest.m 97.14% <100.00%> (+0.08%) ⬆️
+types/+untyped/+datapipe/BoundPipe.m 84.74% <100.00%> (+0.13%) ⬆️
... and 10 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 223a0c4...c770d00. Read the comment docs.

@cechava
Copy link
Collaborator Author

cechava commented Dec 23, 2021

@bendichter @ln-vidrio

All tests now pass

when you get a chance, please look over this PR implementing the rows-last for multidimensional columns throughout MatNWB. It was quite a tricky thing to implement so I appreciate any feedback wherever thing could be more elegant.

Let me recap some of the main points I highlighted above:


  1. In order to prevent breaking current users' scripts, single-dimensional columns should be defined with MATLAB's rows-first convention

  2. For columns with more than one dimension, the desired row should come last.

Here's a snippet illustrating these points

% create NwbFile object with required fields
file= NwbFile( ...
    'session_start_time', '2021-01-01 00:00:00', ...
    'identifier', 'ident1', ...
    'session_description', 'DynamicTableTest' ...
    );

% create VectorData objects 
start_time= types.hdmf_common.VectorData( ...
    'description', 'start times column', ...
    'data', (1:6).' ...
);

stop_time = types.hdmf_common.VectorData( ...
    'description', 'stop times column', ...
    'data', (2:7).' ...
);

random = types.hdmf_common.VectorData( ...
    'description', 'random data column', ...
    'data', rand(4,3,6) ... % rows-last
);

ids = types.hdmf_common.ElementIdentifiers( ...
    'data', (0:5).' ...
);
% create expandable table
colnames = {'start_time', 'stop_time', 'randomvalues'};
file.intervals_trials = types.core.TimeIntervals( ...
    'description', 'test dynamic table', ...
    'colnames', colnames, ...
    'start_time', start_time, ...
    'stop_time', stop_time, ...
    'randomvalues', random, ...
    'id', ids ...
);  
% export file
nwbExport(file, 'DynamicTableTest.nwb');

  1. usage of getRow and toTable methods. Given that 1) the output of both these methods are MATLAB tables and 2) for these tables to be valid the height of all columns must be matched, the dimensions of multidimensional columns are swapped when these methods are called. The need for this swap is illustrated with the following snippet:
start_times = types.hdmf_common.VectorData( ...
    'description', 'start_times', ...
    'data', 1:10 ...
);

stop_times = types.hdmf_common.VectorData( ...
    'description', 'stop_times', ...
    'data', 2:11 ...
);
randomvalues = types.hdmf_common.VectorData( ...
    'description', 'random values',...
    'data', rand(3,2,10) ...
);

multi_dim_table = types.core.TimeIntervals( ...
    'description','test table', ...
    'colnames', {'randomvalues', 'start_time', 'stop_time'}, ...
    'start_time', start_times, ...
    'stop_time', stop_times, ...
    'randomvalues', randomvalues, ...
     'id', types.hdmf_common.ElementIdentifiers('data', 0:9) ...
 );

>> multi_dim_table.toTable()

ans =

  10×4 table

    id     randomvalues     start_time    stop_time
    __    ______________    __________    _________

    0     [1×2×3 double]         1            2    
    1     [1×2×3 double]         2            3    
    2     [1×2×3 double]         3            4    
    3     [1×2×3 double]         4            5    
    4     [1×2×3 double]         5            6    
    5     [1×2×3 double]         6            7    
    6     [1×2×3 double]         7            8    
    7     [1×2×3 double]         8            9    
    8     [1×2×3 double]         9           10    
    9     [1×2×3 double]        10           11    

without the dimension swap, the 'randomvalues' column would have a height of 30 while all other columns have a height of 10.

An open question, I would like to discuss is on how we want to direct users to define single-dimensional columns when DataPipes are involved. As you both have pointed out in the other PR, the mapping between MatNWB and HDF5 dimensions differs whether or not DataPipes are involved:

MatNWB HDF5
====== =====
-—VectorData—
[N,M] <-> [M,N]
[N,M,O] <-> [O,M,N]
[N,1] <-> [N]
[1,N] <-> [N]
impossible <-> [1,N]
-—DataPipe--
[N,M] <-> [M,N]
[N,M,O] <-> [O,M,N]
[N,1] <-> [1,N]
[1,N] <-> [N,1]

That means that if one wants to end up with row vectors in the HDF5 file and subsequently in PyNWB, one should define single-dimensional columns with column vectors as follows:

% create NwbFile object with required fields
file= NwbFile( ...
    'session_start_time', '2021-01-01 00:00:00', ...
    'identifier', 'ident1', ...
    'session_description', 'DataPipeTest' ...
    );

%create VectorData objects with DataPipe objects
start_time_exp = types.hdmf_common.VectorData( ...
    'description', 'start times column', ...
    'data', types.untyped.DataPipe( ...
        'data', 1:6, ...  # data must be numerical
        'maxSize', [1, Inf], ...
        'axis', 2 ...
    ) ...
);

stop_time_exp = types.hdmf_common.VectorData( ...
    'description', 'stop times column', ...
    'data', types.untyped.DataPipe( ...
        'data', 2:7, ...  #data must be numerical
        'maxSize', [1, Inf], ...
        'axis', 2 ...
    ) ...
);

random_exp = types.hdmf_common.VectorData( ...
    'description', 'random data column', ...
    'data', types.untyped.DataPipe( ...
        'data', rand(4, 3 , 6), ...  # rows last 
        'maxSize', [4, 3,  Inf], ...
        'axis', 3 ...
    ) ...
);

ids_exp = types.hdmf_common.ElementIdentifiers( ...
    'data', types.untyped.DataPipe( ...
        'data', 0:5, ...  # data must be numerical
        'maxSize', [1, Inf], ...
        'axis', 2 ...
    ) ...
);
% create expandable table
colnames = {'start_time', 'stop_time', 'randomvalues'};
file.intervals_trials = types.core.TimeIntervals( ...
    'description', 'test expdandable dynamic table', ...
    'colnames', colnames, ...
    'start_time', start_time_exp, ...
    'stop_time', stop_time_exp, ...
    'randomvalues', random_exp, ...
    'id', ids_exp ...
);  
% export file
nwbExport(file, 'dataPipeTest.nwb');

Screen Shot 2021-12-23 at 12 58 22 PM

Does this make sense to you? It might not be the most user-friendly thing to direct users to define single-dimensional columns as row-vectors in one context and as column-vectors in another, but will probably make things easiest for compatibility between MatNWB and PyNWB.

This is more of an issue for the hdmf tutorial rather than something that would break this PR. Let me know your thoughts.

@cechava cechava marked this pull request as ready for review December 23, 2021 18:36
@bendichter
Copy link
Contributor

@cechava I don't understand this (1:6).' syntax you are using throughout. What are you trying to accomplish with that period?

+util/swapDims.m Outdated Show resolved Hide resolved
@bendichter
Copy link
Contributor

bendichter commented Dec 24, 2021

This creates incorrect tables

image

@cechava, let me know when you are available to discuss a path forward.

@bendichter
Copy link
Contributor

In order to prevent breaking current users' scripts, single-dimensional columns should be defined with MATLAB's rows-first convention

wait, what?

@cechava
Copy link
Collaborator Author

cechava commented Dec 27, 2021

This creates incorrect tables!

image

@cechava, let me know when you are available to discuss a path forward.

can you share the snippet that you used to generate that file? I just want to make sure I'm not missing a bug that is not covered by the current tests

@cechava
Copy link
Collaborator Author

cechava commented Dec 27, 2021

putting this on the shelf, for now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

addRow function appending to wrong dimension of multidimensional matrices
2 participants