Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor addRawData to flip append dims #387

Merged
merged 23 commits into from
Feb 1, 2022
Merged

Conversation

lawrence-mbf
Copy link
Collaborator

@lawrence-mbf lawrence-mbf commented Jan 11, 2022

Fixes #390, Fixes #364

lawrence-mbf and others added 3 commits January 17, 2022 13:29
Now only supports up to 2D matrices when adding in memory.
Since we no longer assume that the first dimension is necessariily the append dimension.
@lawrence-mbf lawrence-mbf marked this pull request as ready for review January 17, 2022 18:33
Since this is an ambiguous case where we have no idea which
dimension to actually use until a second piece of data is appended.
Will now only check the config right before instantiation
instead at every abstraction level.
- Refactored a lot of checkConfig.
- Added Index infinite loop check case.
- Added error IDs to errors.
- Fix weird case where the file generation did not include the right class type check.
- Fix case where `id` column was not in-memory.
Not necessarily only called by addRow.
Forgot these IDs were actually used somewhere.
- Did not populate the ragged array properly.
- Checked for the existence of hdmf_common before checking config despite that being
supported in the function itself.
@codecov
Copy link

codecov bot commented Jan 18, 2022

Codecov Report

Merging #387 (fa66bb7) into master (1cdd68b) will increase coverage by 0.04%.
The diff coverage is 89.80%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #387      +/-   ##
==========================================
+ Coverage   85.69%   85.74%   +0.04%     
==========================================
  Files         125      125              
  Lines        4698     4714      +16     
==========================================
+ Hits         4026     4042      +16     
  Misses        672      672              
Impacted Files Coverage Δ
+schemes/Namespace.m 90.38% <ø> (ø)
+tests/+system/DynamicTableTest.m 99.52% <ø> (ø)
+types/+util/+dynamictable/addRawData.m 86.23% <84.41%> (+0.94%) ⬆️
+types/+util/+dynamictable/addVecInd.m 92.59% <85.71%> (-3.71%) ⬇️
+types/+util/+dynamictable/checkConfig.m 91.66% <94.44%> (-0.93%) ⬇️
+file/fillConstructor.m 94.44% <100.00%> (+0.17%) ⬆️
+types/+util/+dynamictable/addRow.m 100.00% <100.00%> (+7.69%) ⬆️
+types/+util/+dynamictable/addVarargRow.m 95.12% <100.00%> (+3.81%) ⬆️
+types/+untyped/+datapipe/BlueprintPipe.m 73.03% <0.00%> (ø)
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1cdd68b...fa66bb7. Read the comment docs.

Copy link
Collaborator

@cechava cechava left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

off to a strong start!
DynamicTable tests should be expanded to test expected addRow behavior. Currently getting the errors in the following scenarios:

  1. adding row to multidimensional column of table in memory
% Define 1D column
simple_col = types.hdmf_common.VectorData( ...
    'description', '1D column',...
    'data', rand(10,1) ...
);
% Define ND column
multi_col = types.hdmf_common.VectorData( ...
    'description', 'multidimensional column',...
    'data', rand(3,2,10) ...
);
% construct table
multi_dim_table = types.hdmf_common.DynamicTable( ...
    'description','test table', ...
    'colnames', {'simple','multi'}, ...
    'simple', simple_col, ...
    'multi', multi_col, ...
    'id', types.hdmf_common.ElementIdentifiers('data', (0:9)') ...  % 0-indexed, for compatibility with Python
);
% add row
multi_dim_table.addRow( ...
    'simple', 1, ...
    'multi', rand(3,2), ...
    'id', 10 ...
);

Error using cat
Dimensions of arrays being concatenated are not consistent.

Error in types.util.dynamictable.addRawData>add2MemData (line 140)
VectorData.data = cat(catDim, VectorData.data, data);

Error in types.util.dynamictable.addRawData>nestedAdd (line 118)
        numRows = add2MemData(Vector, data);

Error in types.util.dynamictable.addRawData>nestedAdd (line 105)
        numRows = nestedAdd(DynamicTable, indChain(2:end), data);

Error in types.util.dynamictable.addRawData (line 38)
nestedAdd(DynamicTable, flip(indexChain), data);

Error in types.util.dynamictable.addVarargRow (line 41)
    types.util.dynamictable.addRawData(DynamicTable, rn, rv);

Error in types.util.dynamictable.addRow (line 45)
    types.util.dynamictable.addVarargRow(DynamicTable, varargin{:});

Error in types.hdmf_common.DynamicTable/addRow (line 117)
        types.util.dynamictable.addRow(obj, varargin{:});

Error in ind_tests (line 22)
multi_dim_table.addRow( ...
  1. adding row to multidimensional column of expandable table
% Create file
file= NwbFile( ...
    'session_start_time', '2021-01-01 00:00:00', ...
    'identifier', 'ident1', ...
    'session_description', 'test_file' ...
    );

% Define Vector Data Objects with first row of table
start_time_exp = types.hdmf_common.VectorData( ...
    'description', 'start times column', ...
    'data', types.untyped.DataPipe( ...
        'data', 1, ...  
        'maxSize', Inf ...
    ) ...
);
stop_time_exp = types.hdmf_common.VectorData( ...
    'description', 'stop times column', ...
    'data', types.untyped.DataPipe( ...
        'data', 10, ...  
        'maxSize', Inf ...
    ) ...
);
random_exp = types.hdmf_common.VectorData( ...
    'description', 'random data column', ...
    'data', types.untyped.DataPipe( ...
        'data', rand(3,2,5), ...  #random data
        'maxSize', [3, 2, Inf], ...
        'axis', 3 ...
    ) ...
);
random_exp_index = types.hdmf_common.VectorIndex( ...
    'description', 'index to random data column', ...
     'target',types.untyped.ObjectView(random_exp), ...
    'data', types.untyped.DataPipe( ...
        'data', [5], ...  
        'maxSize', [Inf, 1], ...
        'axis', 1 ...
    ) ...
);
ids_exp = types.hdmf_common.ElementIdentifiers( ...
    'data', types.untyped.DataPipe( ...
        'data', 0, ...  # data must be numerical
        'maxSize', Inf ...
    ) ...
);
% Create expandable table
colnames = {'start_time', 'stop_time', 'randomvalues'};
file.intervals_trials = types.core.TimeIntervals( ...
    'description', 'test expdandable dynamic table', ...
    'colnames', colnames, ...
    'start_time', start_time_exp, ...
    'stop_time', stop_time_exp, ...
    'randomvalues', random_exp, ...
    'randomvalues_index', random_exp_index, ...
    'id', ids_exp ...
);  
% Export file
nwbExport(file, 'multiRaggedExpandableTableTest.nwb');
% Read in file
read_file = nwbRead('multiRaggedExpandableTableTest.nwb');
% add individual rows
read_file.intervals_trials.addRow( ...
    'start_time', 2, ...
    'stop_time', 20, ...
    'randomvalues', rand(3,2,6), ...
    'id', 1 ...
);
Error using types.util.dynamictable.addVarargRow>validateType (line 69)
Expected input to be two-dimensional.

Error in types.util.dynamictable.addVarargRow (line 38)
        validateType(TypeMap(rn), rv);

Error in types.util.dynamictable.addRow (line 45)
    types.util.dynamictable.addVarargRow(DynamicTable, varargin{:});

Error in types.hdmf_common.DynamicTable/addRow (line 117)
        types.util.dynamictable.addRow(obj, varargin{:});

Error in ind_tests (line 91)
read_file.intervals_trials.addRow( ...

@lawrence-mbf
Copy link
Collaborator Author

  1. adding row to multidimensional column of table in memory

Unless I read you wrong, I believe we agreed that we wanted to limit the dimensions for in-memory concatenation to 2D matrices. Did you mean 3D matrices or should I just make the error message more clear?

@lawrence-mbf lawrence-mbf requested a review from cechava January 25, 2022 18:27
@cechava
Copy link
Collaborator

cechava commented Jan 26, 2022

  1. adding row to multidimensional column of table in memory

Unless I read you wrong, I believe we agreed that we wanted to limit the dimensions for in-memory concatenation to 2D matrices. Did you mean 3D matrices or should I just make the error message more clear?

Thanks for adding that helpful error message. The error on that point was on my part. I forgot that we had agreed on to use DataPipes for multidimensional column. Thanks for clarifying.

However, there's still some issues with the behavior of DataPipe when appending to a multidimensional array column. Example code:

% Create file
file= NwbFile( ...
    'session_start_time', '2021-01-01 00:00:00', ...
    'identifier', 'ident1', ...
    'session_description', 'test_file' ...
    );

% Define Vector Data Objects with first row of table
start_time_exp = types.hdmf_common.VectorData( ...
    'description', 'start times column', ...
    'data', types.untyped.DataPipe( ...
        'data', 1, ...  
        'maxSize', Inf ...
    ) ...
);
stop_time_exp = types.hdmf_common.VectorData( ...
    'description', 'stop times column', ...
    'data', types.untyped.DataPipe( ...
        'data', 10, ...  
        'maxSize', Inf ...
    ) ...
);
random_exp = types.hdmf_common.VectorData( ...
    'description', 'random data column', ...
    'data', types.untyped.DataPipe( ...
        'data', rand(3,2,5), ...  #random data
        'maxSize', [3, 2, Inf], ...
        'axis', 3 ...
    ) ...
);
random_exp_index = types.hdmf_common.VectorIndex( ...
    'description', 'index to random data column', ...
     'target',types.untyped.ObjectView(random_exp), ...
    'data', types.untyped.DataPipe( ...
        'data', [5], ...  
        'maxSize', [Inf, 1], ...
        'axis', 1 ...
    ) ...
);
ids_exp = types.hdmf_common.ElementIdentifiers( ...
    'data', types.untyped.DataPipe( ...
        'data', 0, ...  # data must be numerical
        'maxSize', Inf ...
    ) ...
);
% Create expandable table
colnames = {'start_time', 'stop_time', 'randomvalues'};
file.intervals_trials = types.core.TimeIntervals( ...
    'description', 'test expdandable dynamic table', ...
    'colnames', colnames, ...
    'start_time', start_time_exp, ...
    'stop_time', stop_time_exp, ...
    'randomvalues', random_exp, ...
    'randomvalues_index', random_exp_index, ...
    'id', ids_exp ...
);  
% Export file
nwbExport(file, 'multiRaggedExpandableTableTest.nwb');
% Read in file
read_file = nwbRead('multiRaggedExpandableTableTest.nwb');
% add individual rows
read_file.intervals_trials.addRow( ...
    'start_time', 2, ...
    'stop_time', 20, ...
    'randomvalues', rand(3,2,6), ...
    

output after addRow line

>>  read_file.intervals_trials.vectordata.get('randomvalues').data.internal

ans = 

  BoundPipe with properties:

            config: [1×1 types.untyped.datapipe.Configuration]
    pipeProperties: {1×2 cell}
              stub: [1×1 types.untyped.DataStub]
              axis: 3
            offset: 11
          dataType: 'double'
           maxSize: [3 2 Inf]
              dims: [3 2 11]
          filename: 'multiRaggedExpandableTableTest.nwb'
              path: '/intervals/trials/randomvalues'

>> read_file.intervals_trials.vectordata.get('randomvalues_index').data(:)

ans =

     5
     6

Second entry of randomvalues_index data should be 11.

Copy link
Collaborator

@cechava cechava left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comment above for requested changes.

@lawrence-mbf
Copy link
Collaborator Author

@cechava you have to include the axis keyword argument and set the dimension manually in datapipes. However, this is an interesting usage case. Do you and @bendichter think this would be a good idea to read the last Inf in the maxSize keyword argument as the axis dimension if no axis was provided?

@lawrence-mbf
Copy link
Collaborator Author

lawrence-mbf commented Jan 31, 2022

A Blueprint pipe requires two things: the shape of the data structure and an axis for appending.
Given the keyword arguments (x indicates that the user has provided these arguments):

Axis maxSize data Result
x x x Use maxSize for shape and axis as dimension for appending. Check if data is within size bounds
x x _ As above but without the data shape check.
x _ x Generate a maxSize by using the shape provided by data and set the dimension at the axis dimension to Inf
x _ _ Not sure about this. Perhaps assume ones for all dimensions less than the axis dimension? If axis == 3 then the maxSize is [1, 1, Inf]
_ x x Check if data is within maxSize bounds. Check for Inf in maxSize and use the last occurring one. If maxSize has no Inf bounds then assume the last dimension in maxSize
_ x _ Same as above but without the data check.
_ _ x Not sure about this one either. Perhaps assume the last dimension in data?
_ _ _ Most likely an error.

@cechava
Copy link
Collaborator

cechava commented Jan 31, 2022

@cechava you have to include the axis keyword argument and set the dimension manually in datapipes. However, this is an interesting usage case. Do you and @bendichter think this would be a good idea to read the last Inf in the maxSize keyword argument as the axis dimension if no axis was provided?

I did set the 'axis' argument for the randomvalues and randomvalues_index columns. These are the relevant columns for this test, which is to verify the behavior of addRow for multidimensional columns.
I did not set axis for start_times, stop_times, or ids` columns intentionally so that they would be saved as data sets of size (N,). These are not relevant for this test.

@lawrence-mbf
Copy link
Collaborator Author

@cechava Oh I just missed that then.

@lawrence-mbf lawrence-mbf requested a review from cechava January 31, 2022 16:17
Copy link
Collaborator

@cechava cechava left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

works great. Thanks @ln-vidrio

@lawrence-mbf lawrence-mbf merged commit fb24fe3 into master Feb 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants