Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: VectorData.data is converted to string: Error using hdf5lib2 The memory type is incompatible with H5T_STRING. #462

Closed
2 tasks done
dervinism opened this issue Sep 3, 2022 · 28 comments

Comments

@dervinism
Copy link

What happened?

In some instances VectorData property data is being converted to an array of strings upon saving the NWB file. When the conversion fails, the error is issued:

Error using hdf5lib2
The memory type is incompatible with H5T_STRING.  Consider using
'H5ML_DEFAULT' instead.

If I manually convert data into an array of strings, the issue is resolved. However, I don't understand why in some instances I can use an array of numbers and in other instances I have convert them to strings. The documentation states that data property can be any type. This occurs when I try to construct the units table.

Steps to Reproduce

nwb.units = types.core.Units( ...
    'colnames', {'cluster_id','local_cluster_id','type',...
    'peak_channel_index','peak_channel_id',... % Provide the column order. All column names have to be defined below
    'local_peak_channel_id','rel_horz_pos','rel_vert_pos',...
    'isi_violations','isolation_distance','area','probe_id',...
    'spike_times','spike_times_index'}, ...
    'description', 'Units table', ...
    'id', types.hdmf_common.ElementIdentifiers( ...
    'data', int64(0:length(spikes) - 1)), ...
    'cluster_id', types.hdmf_common.VectorData( ...
    'data', metadata(:,1), ...
    'description', 'Unique cluster id'), ...
    'local_cluster_id', types.hdmf_common.VectorData( ...
    'data', metadata(:,2), ...
    'description', 'Local cluster id on the probe'), ...
    'type', types.hdmf_common.VectorData( ...
    'data', metadata(:,3), ...
    'description', 'Cluster type: unit vs mua'), ...
    'peak_channel_index', types.hdmf_common.VectorData( ...
    'data', metadata(:,4), ...
    'description', 'Peak channel row index in the electrode table'), ...
    'peak_channel_id', types.hdmf_common.VectorData( ...
    'data', metadata(:,5), ...
    'description', 'Unique ID of the channel with the largest cluster waveform amplitude'), ...
    'local_peak_channel_id', types.hdmf_common.VectorData( ...
    'data', metadata(:,6), ...
    'description', 'Local probe channel with the largest cluster waveform amplitude'), ...
    'rel_horz_pos', types.hdmf_common.VectorData( ...
    'data', metadata(:,7), ...
    'description', 'Probe-relative horizontal position in mm'), ...
    'rel_vert_pos', types.hdmf_common.VectorData( ...
    'data', metadata(:,8), ...
    'description', 'Probe tip-relative vertical position in mm'), ...
    'isi_violations', types.hdmf_common.VectorData( ...
    'data', metadata(:,9), ...
    'description', 'Interstimulus interval violations (unit quality measure)'), ...
    'isolation_distance', types.hdmf_common.VectorData( ...
    'data', metadata(:,10), ...
    'description', 'Cluster isolation distance (unit quality measure)'), ...
    'area', types.hdmf_common.VectorData( ...
    'data', metadata(:,11), ...
    'description', ['Brain area where the unit is located. Internal thalamic' ...
    'nuclei divisions are not precise, because they are derived from unit locations on the probe.']), ...
    'probe_id', types.hdmf_common.VectorData( ...
    'data', metadata(:,12), ...
    'description', 'Probe id where the unit is located'), ...
    'spike_times', spike_times_vector, ...
    'spike_times_index', spike_times_index, ...
    'electrode_group', types.hdmf_common.VectorData( ...
    'data', metadata(:,13), ...
    'description', 'Recording channel groups'), ...
    'electrodes', types.hdmf_common.DynamicTableRegion('table', ...
    types.untyped.ObjectView('/general/extracellular_ephys/electrodes'), ...
    'description',  'Probe recording channels', ...
    'data', cell2mat(table2array(metadata(:,1)))), ...
    'waveform_mean', types.hdmf_common.VectorData( ...
    'data', waveformMeans, ...
    'description', ['Mean waveforms on the probe channel with the largest waveform amplitude.' ...
    'MUA waveforms are excluded. The order that waveforms are stored match the order of units in the unit table.']) ...
    );

These lines cause the error:
    'rel_horz_pos', types.hdmf_common.VectorData( ...
    'data', metadata(:,7), ...
    'description', 'Probe-relative horizontal position in mm'), ...
    'rel_vert_pos', types.hdmf_common.VectorData( ...
    'data', metadata(:,8), ...
    'description', 'Probe tip-relative vertical position in mm'), ...

If I convert every element of metadata(:,7) and metadata(:,8) to strings, then error is resolved. Using any other data type causes the error.

Error Message

Error using hdf5lib2
The memory type is incompatible with H5T_STRING.  Consider using
'H5ML_DEFAULT' instead.

Error in H5D.write (line 100)
H5ML.hdf5lib2('H5Dwrite', varargin{:});

Error in io.writeCompound (line 110)
H5D.write(did, tid, sid, sid, 'H5P_DEFAULT', data);

Error in types.untyped.MetaClass/write_base (line 24)
                    io.writeCompound(fid, fullpath, obj.data);

Error in types.untyped.MetaClass/export (line 66)
            refs = obj.write_base(fid, fullpath, refs);

Error in types.hdmf_common.Data/export (line 37)
        refs = [email protected](obj, fid, fullpath, refs);

Error in types.hdmf_common.VectorData/export (line 55)
        refs = [email protected]_common.Data(obj, fid, fullpath, refs);

Error in types.untyped.Set/export (line 181)
                    refs = v.export(fid, propfp, refs);

Error in types.hdmf_common.DynamicTable/export (line 106)
            refs = obj.vectordata.export(fid, fullpath, refs);

Error in types.core.Units/export (line 140)
        refs = [email protected]_common.DynamicTable(obj, fid, fullpath,
        refs);

Error in types.core.NWBFile/export (line 1078)
            refs = obj.units.export(fid, [fullpath '/units'], refs);

Error in NwbFile/export (line 62)
                refs = [email protected](obj, output_file_id, '/',
                {});

Error in nwbExport (line 36)
    nwb(i).export(filename);

Error in convert2nwb (line 260)
    nwbExport(nwb, [animalDerivedDataFolderNWB filesep 'ecephys_session_0'
    num2str(iSess) '.nwb']);

Operating System

Windows

Matlab Version

R2021a

Code of Conduct

@lawrence-mbf lawrence-mbf self-assigned this Sep 6, 2022
@lawrence-mbf
Copy link
Collaborator

Hi @dervinism what is the type of the 7th and 8th column of metadata?

@dervinism
Copy link
Author

dervinism commented Sep 6, 2022

Hi Lawrence,

metadata is a Matlab table but individual values are double. Other table columns contain either doubles or strings. If I input any other table columns with doubles, they produce the same error. The only thing that works is converting doubles into strings. However, no such issue for other entries excepts for waveform_mean which has the same issue.

@lawrence-mbf
Copy link
Collaborator

Are you using any extension schemas? Specifically anything that might overload the Units table declaration? I'm not seeing anything special with rel_vert_pos and rel_horz_pos using the normal schema.

@dervinism
Copy link
Author

No extensions. rel_vert_pos and rel_horz_pos are just like other entries. Nothing special about them.

@lawrence-mbf
Copy link
Collaborator

lawrence-mbf commented Sep 6, 2022

So when you index into a table using parentheses, you actually get another table containing only that column. When exporting, this is actually presumed to be a compound type and we encode all the types that way. I'm not sure why table2struct converted your values to string arrays but this might be part of the problem.

I understand this might be asking a lot but could you try this script but replacing the table indexing instances metadata(:,1) with curly braces instead of round ones (i.e. metadata{:,1})?

@dervinism
Copy link
Author

metadata{:,1} gives a column cell array of doubles. The NWB file is saved fine, but the saved values are random strings. Presumably the numbers are translated into strings.

cell2mat(metadata{:,1}) gives a column vector of doubles. It crashes the code with the typical error.

@lawrence-mbf
Copy link
Collaborator

I am unable to recreate this issue on my end using random double values. Is it possible for you to send a subset of the non-working NWB file? I can send you a link where you can upload it.

@lawrence-mbf
Copy link
Collaborator

lawrence-mbf commented Sep 6, 2022

Oh, nevermind. I missed your reference to your cell array of doubles.

Blocking vector data in primitive cell arrays is currently not supported through regular construction (it is supported in addRow syntax). Were you trying to create ragged arrays using this format?

@dervinism
Copy link
Author

I am not sure I understand your reply. As I said, the error is caused by all data types except strings. I can share the code with the data example that reproduces the error.

@lawrence-mbf
Copy link
Collaborator

At the current moment and from what I could glean from your comments, what you're trying to do won't work without script modification. There's no support for storing cell arrays of doubles and MatNWB assumes that cell arrays are strings (which should now throw an error as of the tip of master).

Is this part of a downstream automation that was read in from another NWB file by chance? Otherwise I'm not sure why you have cell arrays of doubles from indexing into your table.

@dervinism
Copy link
Author

dervinism commented Sep 6, 2022

I don't have cell arrays. I have a Matlab table. Indexing into a table using curly brackets will give cell arrays. But you can convert them to any type of data. Whatever type is used, either the error is thrown or data is converted to strings and the output no longer makes sense.

@lawrence-mbf
Copy link
Collaborator

lawrence-mbf commented Sep 6, 2022

Right, the tip of master should now throw an error when it encounters a cell array of doubles as that is not representable in HDF5 on its own.

In the case of MATLAB tables, grouping doubles into vectors of cell arrays is to allow for a variable number of double values per row. Do you happen to know if that is the case here or is it just preventative?

@dervinism
Copy link
Author

I don't know the answer to your question. I'll try out the latest matnwb and see if I still get the error.

@lawrence-mbf
Copy link
Collaborator

lawrence-mbf commented Sep 6, 2022 via email

@dervinism
Copy link
Author

I use util.table2nwb to create an electrode table. Overall, I am following matnwb ecephys tutorial.

Now I've tested the latest master branch of matnwb and it gives the same error. Except now if I use cell arrays, I get a new type of error that you have recently introduced to prohibit cell arrays. Using all other types of data gives the same old error. So the issue remains not fixed. It is probably best to replicate the error using my code and example data.

@lawrence-mbf
Copy link
Collaborator

Yes, is there a place where we can see your code? Posting example data should help too.

@dervinism
Copy link
Author

dervinism commented Sep 7, 2022

I have tested the code on another machine (Windows 11) running Matlab R2022a and the problem goes away. I only experience it on a Windows 10 machine with Matlab R2021a. I've uploaded code and example data on a GIN repository. You would have to install GIN command line tools to download it (not sure if git only would work).

To replicate the error, replace line number 194 of convert2nwb 'data', rel_horz_pos, ... to 'data', metadata(:,7), ...
That should suffice.

@lawrence-mbf
Copy link
Collaborator

Hi @dervinism,

I got this data working (and on R2017a as well) but the script needs a few changes in how you create and use the table object. I do not believe these fixes warrant any change to the MatNWB code base itself. Would you prefer the changes described here? I'm not familiar with GIN.

@dervinism
Copy link
Author

dervinism commented Sep 7, 2022

Sure, describing here will work. Thanks for looking into my code.
GIN works very similarly to Github but for large files. Suitable for uploading research data repositories.

@lawrence-mbf
Copy link
Collaborator

It really all goes back to the cell arrays. VectorData objects cannot be instantiated with cell arrays of numeric types. Their data properties only allow for conversions of:

  1. cell arrays of character vectors (converted to arrays of variable length string types).
  2. multi-dimensional numeric matrices (converted to their equivalents in HDF5).
  3. structs, container.Map objects, and table objects (converted to compound types).
  4. ObjectView/RegionView matrices (converted to references).

With the current script, your metadata table is attempting to create VectorData with sub-tables (of the table type) containing a cell array of either character vectors, numbers, or ObjectView.

This (before the addition of the assertion check) would have been mapped to a HDF5 compound data type of variable length strings because a cell array was assumed to imply a string type. This has now been fixed in master and should error. What you want to do is construct your VectorData object with a numeric vector representing your DynamicTable's column.

The simplest method would be to replace all table indexes with this: cell2mat(metadata{:,x}) but only for numeric and ObjectView columns. There should not be any string conversions necessary for this to work.

@lawrence-mbf
Copy link
Collaborator

On a side note, waveformMeans should also be concatenated into a double matrix and not converted to a string.

@dervinism
Copy link
Author

This info is useful but this is not the cause of the error. Even if I follow your instructions I get an error on one machine but not on the other. I even now tried different Matlab versions and they all produce the same error on one machine but not on the other. Very strange!

@dervinism
Copy link
Author

I noticed that the error is different than previously reported, but still all else is relevant:

Error using hdf5lib2
The HDF5 library encountered an unknown error.

Error in H5D.write (line 100)
H5ML.hdf5lib2('H5Dwrite', varargin{:});

Error in io.writeDataset (line 35)
H5D.write(did, tid, sid, sid, 'H5P_DEFAULT', data);

Error in types.untyped.MetaClass/write_base (line 26)
                    io.writeDataset(fid, fullpath, obj.data, 'forceArray');

Error in types.untyped.MetaClass/export (line 66)
            refs = obj.write_base(fid, fullpath, refs);

Error in types.hdmf_common.Data/export (line 37)
        refs = [email protected](obj, fid, fullpath, refs);

Error in types.hdmf_common.VectorData/export (line 55)
        refs = [email protected]_common.Data(obj, fid, fullpath, refs);

Error in types.untyped.Set/export (line 181)
                    refs = v.export(fid, propfp, refs);

Error in types.hdmf_common.DynamicTable/export (line 106)
            refs = obj.vectordata.export(fid, fullpath, refs);

Error in types.core.Units/export (line 140)
        refs = [email protected]_common.DynamicTable(obj, fid, fullpath, refs);

Error in types.core.NWBFile/export (line 1078)
            refs = obj.units.export(fid, [fullpath '/units'], refs);

Error in NwbFile/export (line 62)
                refs = [email protected](obj, output_file_id, '/', {});

Error in nwbExport (line 36)
    nwb(i).export(filename);

Error in convert2nwb (line 289)
    nwbExport(nwb, [animalDerivedDataFolderNWB filesep 'ecephys_session_0' num2str(iSess) '.nwb']);

@lawrence-mbf
Copy link
Collaborator

The script with your changes works fine for me.

Can you delete the output NWB file in your cleanup script prior to running convert2nwb? MatNWB by will attempt to modify the NWB file otherwise.

@dervinism
Copy link
Author

Deleting old NWB files did solve the problem in the end. Many thanks for your help!

@lawrence-mbf
Copy link
Collaborator

Good to see!

@lawrence-mbf lawrence-mbf removed their assignment Jan 12, 2023
@NeurodataWithoutBorders NeurodataWithoutBorders locked as spam and limited conversation to collaborators Jan 12, 2023
@lawrence-mbf
Copy link
Collaborator

lawrence-mbf commented Jan 18, 2023 via email

@lawrence-mbf
Copy link
Collaborator

lawrence-mbf commented Mar 15, 2023 via email

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants