-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
updating of functions to work as rows-last #367
updating of functions to work as rows-last #367
Conversation
with this first commit, valid Dynamic Tables can be defined in the following manner: >> col1 = types.hdmf_common.VectorData( ...
'description', 'column #1', ...
'data', [1, 2] ...
);
col2 = types.hdmf_common.VectorData( ...
'description', 'column #2', ...
'data', {'a', 'b'} ...
);
my_table = types.hdmf_common.DynamicTable( ...
'description','an example table', ...
'colnames', {'col1','col2'}, ...
'col1',col1, ...
'col2',col2, ...
'id', types.hdmf_common.ElementIdentifiers('data', [0, 1]) ... % 0-indexed, for compatibility with Python
); getRow needs to be updated, since current output is as follows: >> my_table.getRow(1)
ans =
1×2 table
col1 col2
______ ______________
1 2 {'a'} {'b'} under the planned changes 'row' will be a reserved word for the the last (not the first) dimension. |
with latest commit, getRow has correct behavior for rows-last convention (at least for simple tables): >> my_table.getRow(2)
ans =
1×2 table
col1 col2
____ _____
2 {'b'}
>> my_table.getRow(1)
ans =
1×2 table
col1 col2
____ _____
1 {'a'}
>> my_table.getRow(1:2)
ans =
1×2 table
col1 col2
______ ______________
1 2 {'a'} {'b'}
|
with latest commit, the last dimension of start_times = types.hdmf_common.VectorData( ...
'description', 'start_times', ...
'data', 1:10 ...
);
stop_times = types.hdmf_common.VectorData( ...
'description', 'stop_times', ...
'data', 2:11 ...
);
randomvalues = types.hdmf_common.VectorData( ...
'description', 'random values',...
'data', rand(3,2,10) ...
);
multi_dim_table = types.core.TimeIntervals( ...
'description','test table', ...
'colnames', {'randomvalues', 'start_time', 'stop_time'}, ...
'start_time', start_times, ...
'stop_time', stop_times, ...
'randomvalues', randomvalues, ...
'id', types.hdmf_common.ElementIdentifiers('data', 0:9) ...
);
>> multi_dim_table.toTable()
ans =
10×4 table
id randomvalues start_time stop_time
__ ______________ __________ _________
0 [1×2×3 double] 1 2
1 [1×2×3 double] 2 3
2 [1×2×3 double] 3 4
3 [1×2×3 double] 4 5
4 [1×2×3 double] 5 6
5 [1×2×3 double] 6 7
6 [1×2×3 double] 7 8
7 [1×2×3 double] 8 9
8 [1×2×3 double] 9 10
9 [1×2×3 double] 10 11
>> multi_dim_table.getRow(1:5)
ans =
5×3 table
randomvalues start_time stop_time
______________ __________ _________
[1×2×3 double] 1 2
[1×2×3 double] 2 3
[1×2×3 double] 3 4
[1×2×3 double] 4 5
[1×2×3 double] 5 6 also works for multi-dimensional ragged arrays: data = zeros(2,3,5);
data(:,:,1) = ...
[1, 2, 3; ...
4, 5, 6];
data(:,:,2) = ...
[11, 12, 13; ...
14, 15, 16];
data(:,:,3) = ...
[17, 18, 19; ...
20, 21, 22];
data(:,:,4) = ...
[23, 24, 25; ...
26, 27, 28];
data(:,:,5) = ...
[29, 30, 31; ...
32, 33, 34];
col6 = types.hdmf_common.VectorData( ...
'description', 'column #6',...
'data', data ...
);
col6_index = types.hdmf_common.VectorIndex( ...
'description', 'column #6 index', ...
'target', types.untyped.ObjectView(col6),'data', [2, 3, 5] ...
);
multi_ragged_table = types.hdmf_common.DynamicTable( ...
'description','test table', ...
'colnames', {'col6'}, ...
'col6', col6, ...
'col6_index', col6_index, ...
'id', types.hdmf_common.ElementIdentifiers('data', [0, 1, 2]) ...
);
>> multi_ragged_table.toTable()
ans =
3×2 table
id col6
__ ______________
0 {2×3×2 double}
1 {2×3 double}
2 {2×3×2 double} |
with latest commit, can use the addRow method to add elements to the last dimension of a DynamicTable column col1 = types.hdmf_common.VectorData( ...
'description', 'column #1', ...
'data', [1, 2] ...
);
col2 = types.hdmf_common.VectorData( ...
'description', 'column #2', ...
'data', {'a', 'b'} ...
);
my_table = types.hdmf_common.DynamicTable( ...
'description','an example table', ...
'colnames', {'col1','col2'}, ...
'col1',col1, ...
'col2',col2, ...
'id', types.hdmf_common.ElementIdentifiers('data', [0, 1]) ... % 0-indexed, for compatibility with Python
);
my_table.addRow('col1', 3, 'col2', {'c'}, 'id', 2);
>> my_table.toTable()
ans =
3×3 table
id col1 col2
__ ____ _____
0 1 {'a'}
1 2 {'b'}
2 3 {'c'}
also works for ragged arrays: col1 = types.hdmf_common.VectorData( ...
'description', 'column #1', ...
'data', {'1a', '1b', '1c', '2a'} ...
);
col1_index = types.hdmf_common.VectorIndex( ...
'description', 'column #1 index',...
'target',types.untyped.ObjectView(col1), ... % object view of target column
'data', [3, 4] ...
);
%
table_ragged_col = types.hdmf_common.DynamicTable( ...
'description', 'an example table', ...
'colnames', {'col1'}, ...
'col1', col1, ...
'col1_index', col1_index, ...
'id', types.hdmf_common.ElementIdentifiers('data', [0, 1]) ... % 0-indexed, for compatibility with Python
);
table_ragged_col.addRow('col1', {'3a','3b','3c'}, 'id', 2);
>> table_ragged_col.toTable()
ans =
3×2 table
id col1
__ __________
0 {1×3 cell}
1 {1×1 cell}
2 {1×3 cell} |
with latest commit, addRow function works for vararg input and MATLAB table inputs: col1 = types.hdmf_common.VectorData( ...
'description', 'column #1', ...
'data', rand(3,2,10) ...
);
my_table = types.hdmf_common.DynamicTable( ...
'description','an example table', ...
'colnames', {'col1'}, ...
'col1',col1, ...
'id', types.hdmf_common.ElementIdentifiers('data', 0:9) ...
);
% add row with vararg
my_table.addRow('col1',rand(3,2), 'id', 10);
% add row with table
col1 = rand(3,2,5);
sub_table = table(col1);
my_table.addRow(sub_table);
>> my_table.vectordata.get('col1')
ans =
VectorData with properties:
description: 'column #1'
data: [3×2×16 double]
>> my_table.toTable()
ans =
14×2 table
id col1
__ ______________
0 [1×2×3 double]
1 [1×2×3 double]
2 [1×2×3 double]
3 [1×2×3 double]
4 [1×2×3 double]
5 [1×2×3 double]
6 [1×2×3 double]
7 [1×2×3 double]
8 [1×2×3 double]
9 [1×2×3 double]
10 [1×2×3 double]
11 [1×2×3 double]
12 [1×2×3 double]
13 [1×2×3 double]
|
@bendichter @ln-vidrio I've hit a bit of a wall in implementing changes to deal with a rows-last convention in MatNWB and I'm wondering if you have any thoughts. Let me illustrate the issue with an example. Let's say I'm trying to build a colnames = {'start_time', 'stop_time', 'random_multi'};
time_table = types.core.TimeIntervals(...
'description', 'test dynamic table column',...
'colnames', colnames ...
);
for i = 1:10
time_table.addRow( ...
'start_time', i, ...
'stop_time', i+1, ...
'random_multi', rand(3,2), ...
'id', i-1 ...
)
end After going through the first loop( and adding the first row) the checkConfig utility function will find an unmatched length of columns. 'start_time' and 'stop_time' will have length 1 while 'random_multi' will have length 2 because 1) there's only one 3x2 matrix in the 'random_multi' column and 2) MATLAB doesn't recognize singleton dimensions so that implicit 3x2x1 becomes 3x2. Right now, I'm dealing with this by looking at the other columns and making the decision to not assert equal length across all columns if there's a column of length one (in which case the dimensions of such multidimensional columns are ambiguous - is it 3x2 or 3x2x1). I don't foresee this solution being a problem, but let me know if you guys can think of a more elegant solution (haven't pushed this yet but will do so shortly). |
with latest commit, Dynmic Tables can be built up row-by-row colnames = {'start_time', 'stop_time', 'random_multi'};
time_table = types.core.TimeIntervals(...
'description', 'test dynamic table column',...
'colnames', colnames ...
);
for i = 1:10
time_table.addRow( ...
'start_time', i, ...
'stop_time', i+1, ...
'random_multi', rand(3,2), ...
'id', i-1 ...
)
end
>> time_table.vectordata.get('random_multi')
ans =
VectorData with properties:
description: 'AUTOGENERATED description for column `random_multi`'
data: [3×2×10 double]
>> time_table.toTable()
ans =
10×4 table
id start_time stop_time random_multi
__ __________ _________ ______________
0 1 2 [1×2×3 double]
1 2 3 [1×2×3 double]
2 3 4 [1×2×3 double]
3 4 5 [1×2×3 double]
4 5 6 [1×2×3 double]
5 6 7 [1×2×3 double]
6 7 8 [1×2×3 double]
7 8 9 [1×2×3 double]
8 9 10 [1×2×3 double]
9 10 11 [1×2×3 double] |
@cechava It's a shame MATLAB does not allow for singleton dimensions. That sounds like a good solution to me. |
with latest commit, adding nested ragged arrays supported colnames = {'start_time', 'stop_time', 'randomvalues', 'random_multi','stringdata'};
time_table = types.core.TimeIntervals(...
'description', 'test dynamic table column',...
'colnames', colnames ...
);
%
for i = 1:10
time_table.addRow( ...
'start_time', i, ...
'stop_time', i+1, ...
'random_multi', rand(3,2), ...
'randomvalues', {rand(2,5), rand(2,3)}, ...
'stringdata', {'TRUE'},...
'id', i-1 ...
)
end
>> time_table.vectordata.get('random_multi')
ans =
VectorData with properties:
description: 'AUTOGENERATED description for column `random_multi`'
data: [3×2×10 double]
>> time_table.vectordata.get('randomvalues')
ans =
VectorData with properties:
description: 'AUTOGENERATED description for column `randomvalues`'
data: [2×80 double]
>> time_table.toTable()
ans =
10×6 table
id start_time stop_time randomvalues random_multi stringdata
__ __________ _________ ____________ ____________ __________
0 1 2 {1×2 cell} {3×2 double} {'TRUE'}
1 2 3 {1×2 cell} {3×2 double} {'TRUE'}
2 3 4 {1×2 cell} {3×2 double} {'TRUE'}
3 4 5 {1×2 cell} {3×2 double} {'TRUE'}
4 5 6 {1×2 cell} {3×2 double} {'TRUE'}
5 6 7 {1×2 cell} {3×2 double} {'TRUE'}
6 7 8 {1×2 cell} {3×2 double} {'TRUE'}
7 8 9 {1×2 cell} {3×2 double} {'TRUE'}
8 9 10 {1×2 cell} {3×2 double} {'TRUE'}
9 10 11 {1×2 cell} {3×2 double} {'TRUE'} |
@bendichter @ln-vidrio another potential complication is that we can't use MATLAB tables as arguments to addRows. This is because we can't create tables with an unmatched number of rows (first-dimension as MATLAB defines it). For example, the code below doesn't work anymore colnames = {'start_time', 'stop_time', 'randomvalues', 'random_multi','stringdata'};
time_table = types.core.TimeIntervals(...
'description', 'test dynamic table column',...
'colnames', colnames ...
);
subTable = table(...
(101:150), ...
(102:151), ...
rand(2,3,50),...
repmat({'TRUE'}, 1, 50),...
'VariableNames', {'start_time', 'stop_time', 'random_multi','stringdata'});
time_table.addRow(subTable)
Error using table (line 233)
All table variables must have the same number of rows. The issue is that the 'random_multi' column has 2 rows while all other columns have one row, as defined by MATLAB. One potential work-around is to have users define MATLAB tables with the number of rows along the first-dimension and under the hood we take that first dimension and map it onto the last dimension when using the addRow method. |
@cechava Thanks for bringing this up. I did not think of this, and I see that it presents a problem. I think your idea of transposing under the hood is probably the best way we could handle this if we wanted to continue to support it, but I agree it is potentially confusing. I could go either way. I guess I would rather not allow tables as input, but I think it would be OK if you wanted to keep them. We'd need to be very clear about this in the documentation though |
After sleeping on it, I agree with you and think it would be best to not allow tables as input. I can gently deprecate the addTableRow function by throwing out an error with a message whenever the function is called (e.g., adding rows by supplying a MATLAB table has been deprecated. Please use key-value pair arguments) |
Codecov Report
@@ Coverage Diff @@
## master #367 +/- ##
==========================================
+ Coverage 85.64% 85.76% +0.12%
==========================================
Files 124 124
Lines 4626 4624 -2
==========================================
+ Hits 3962 3966 +4
+ Misses 664 658 -6
Continue to review full report at Codecov.
|
@bendichter @ln-vidrio All tests now pass when you get a chance, please look over this PR implementing the rows-last for multidimensional columns throughout MatNWB. It was quite a tricky thing to implement so I appreciate any feedback wherever thing could be more elegant. Let me recap some of the main points I highlighted above:
Here's a snippet illustrating these points % create NwbFile object with required fields
file= NwbFile( ...
'session_start_time', '2021-01-01 00:00:00', ...
'identifier', 'ident1', ...
'session_description', 'DynamicTableTest' ...
);
% create VectorData objects
start_time= types.hdmf_common.VectorData( ...
'description', 'start times column', ...
'data', (1:6).' ...
);
stop_time = types.hdmf_common.VectorData( ...
'description', 'stop times column', ...
'data', (2:7).' ...
);
random = types.hdmf_common.VectorData( ...
'description', 'random data column', ...
'data', rand(4,3,6) ... % rows-last
);
ids = types.hdmf_common.ElementIdentifiers( ...
'data', (0:5).' ...
);
% create expandable table
colnames = {'start_time', 'stop_time', 'randomvalues'};
file.intervals_trials = types.core.TimeIntervals( ...
'description', 'test dynamic table', ...
'colnames', colnames, ...
'start_time', start_time, ...
'stop_time', stop_time, ...
'randomvalues', random, ...
'id', ids ...
);
% export file
nwbExport(file, 'DynamicTableTest.nwb');
start_times = types.hdmf_common.VectorData( ...
'description', 'start_times', ...
'data', 1:10 ...
);
stop_times = types.hdmf_common.VectorData( ...
'description', 'stop_times', ...
'data', 2:11 ...
);
randomvalues = types.hdmf_common.VectorData( ...
'description', 'random values',...
'data', rand(3,2,10) ...
);
multi_dim_table = types.core.TimeIntervals( ...
'description','test table', ...
'colnames', {'randomvalues', 'start_time', 'stop_time'}, ...
'start_time', start_times, ...
'stop_time', stop_times, ...
'randomvalues', randomvalues, ...
'id', types.hdmf_common.ElementIdentifiers('data', 0:9) ...
);
>> multi_dim_table.toTable()
ans =
10×4 table
id randomvalues start_time stop_time
__ ______________ __________ _________
0 [1×2×3 double] 1 2
1 [1×2×3 double] 2 3
2 [1×2×3 double] 3 4
3 [1×2×3 double] 4 5
4 [1×2×3 double] 5 6
5 [1×2×3 double] 6 7
6 [1×2×3 double] 7 8
7 [1×2×3 double] 8 9
8 [1×2×3 double] 9 10
9 [1×2×3 double] 10 11
without the dimension swap, the 'randomvalues' column would have a height of 30 while all other columns have a height of 10. An open question, I would like to discuss is on how we want to direct users to define single-dimensional columns when DataPipes are involved. As you both have pointed out in the other PR, the mapping between MatNWB and HDF5 dimensions differs whether or not DataPipes are involved: MatNWB HDF5 That means that if one wants to end up with row vectors in the HDF5 file and subsequently in PyNWB, one should define single-dimensional columns with column vectors as follows: % create NwbFile object with required fields
file= NwbFile( ...
'session_start_time', '2021-01-01 00:00:00', ...
'identifier', 'ident1', ...
'session_description', 'DataPipeTest' ...
);
%create VectorData objects with DataPipe objects
start_time_exp = types.hdmf_common.VectorData( ...
'description', 'start times column', ...
'data', types.untyped.DataPipe( ...
'data', 1:6, ... # data must be numerical
'maxSize', [1, Inf], ...
'axis', 2 ...
) ...
);
stop_time_exp = types.hdmf_common.VectorData( ...
'description', 'stop times column', ...
'data', types.untyped.DataPipe( ...
'data', 2:7, ... #data must be numerical
'maxSize', [1, Inf], ...
'axis', 2 ...
) ...
);
random_exp = types.hdmf_common.VectorData( ...
'description', 'random data column', ...
'data', types.untyped.DataPipe( ...
'data', rand(4, 3 , 6), ... # rows last
'maxSize', [4, 3, Inf], ...
'axis', 3 ...
) ...
);
ids_exp = types.hdmf_common.ElementIdentifiers( ...
'data', types.untyped.DataPipe( ...
'data', 0:5, ... # data must be numerical
'maxSize', [1, Inf], ...
'axis', 2 ...
) ...
);
% create expandable table
colnames = {'start_time', 'stop_time', 'randomvalues'};
file.intervals_trials = types.core.TimeIntervals( ...
'description', 'test expdandable dynamic table', ...
'colnames', colnames, ...
'start_time', start_time_exp, ...
'stop_time', stop_time_exp, ...
'randomvalues', random_exp, ...
'id', ids_exp ...
);
% export file
nwbExport(file, 'dataPipeTest.nwb'); Does this make sense to you? It might not be the most user-friendly thing to direct users to define single-dimensional columns as row-vectors in one context and as column-vectors in another, but will probably make things easiest for compatibility between MatNWB and PyNWB. This is more of an issue for the hdmf tutorial rather than something that would break this PR. Let me know your thoughts. |
@cechava I don't understand this |
This creates incorrect tables @cechava, let me know when you are available to discuss a path forward. |
wait, what? |
can you share the snippet that you used to generate that file? I just want to make sure I'm not missing a bug that is not covered by the current tests |
putting this on the shelf, for now |
fixes #364
The following should be amended to work with a rows-last convention:
make sure to have tests for DynamicTable columns of following varieties: