incompatibility reading boolean data written using pynwb/h5py #206
Comments
Related to #77 |
@bendichter Thanks for raising this issue. Specifically, MATLAB will read a multi-dimensional/nested cell array of Fiji appears to have issues reading, but ImageJ loads them fine. Is the way h5py stores 0-1 uint8 any more efficient (space-wise) after compression than storing it as just a uint8 matrix of 0s and 1s? |
@bahanonu Thanks for the additional context. I think the only advantage of storing them this way as opposed to 0s and 1s in |
@bendichter Sounds good, makes sense to keep the h5py compatibility in that case. Are you thinking of including a small function within matnwb repo to convert to a logical array (as opposed to each user writing their own)? Tested the issue in |
Yes, that is the goal of this ticket :-) |
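A minimal sketch of the conversion discussed above, assuming the boolean dataset arrives in MATLAB as a cell array of the strings 'TRUE' and 'FALSE'. The variable names are made up for illustration and `raw` is a hand-built stand-in for the data read from a file; none of this is MatNWB code.

```matlab
% Hand-built stand-in for an h5py-style boolean dataset as read by MATLAB
% (assumption: a cell array of 'TRUE'/'FALSE' character vectors).
raw = {'TRUE', 'FALSE'; 'FALSE', 'TRUE'};

% Guard against unexpected enum member names before converting.
assert(all(ismember(raw(:), {'TRUE', 'FALSE'})), ...
    'Expected only ''TRUE'' and ''FALSE'' entries.');

% strcmp compares every cell against 'TRUE' and returns a logical array
% with the same shape as the cell array.
mask = strcmp(raw, 'TRUE');
```

`strcmp` keeps the shape of the input, so a multi-dimensional cell array maps to a logical array of the same size.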
Maybe the best approach is to use this example code:
fid = H5F.open('example.h5');
dset_id = H5D.open(fid,'/g3/enum');
type_id = H5D.get_type(dset_id);
num_members = H5T.get_nmembers(type_id);
for j = 1:num_members
    member_name{j} = H5T.get_member_name(type_id,j-1);
    member_value(j) = H5T.enum_valueof(type_id,member_name{j});
end
Running that on an h5py-style boolean, I get
We could use this as a test and then access the data via |
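One way the enum inspection could serve as a test is sketched here. `isH5pyBoolean` is a hypothetical helper, not an existing MatNWB function. It assumes that an h5py-style boolean is an HDF5 enum with exactly two members, named FALSE and TRUE, and it does not check the underlying integer type.

```matlab
function tf = isH5pyBoolean(type_id)
% isH5pyBoolean  Hypothetical helper (not part of MatNWB).
% Returns true when the HDF5 datatype is an enum whose only members
% are named 'FALSE' and 'TRUE', the convention described in this issue.
tf = false;

% Only enumerated types can follow the h5py boolean convention.
if H5T.get_class(type_id) ~= H5ML.get_constant_value('H5T_ENUM')
    return;
end

% An h5py-style boolean has exactly two members.
if H5T.get_nmembers(type_id) ~= 2
    return;
end

% Member indices are zero-based in the HDF5 API.
names = cell(1, 2);
for j = 1:2
    names{j} = H5T.get_member_name(type_id, j - 1);
end

tf = isequal(sort(names), {'FALSE', 'TRUE'});
end
```

A caller would pass the `type_id` obtained from `H5D.get_type(dset_id)`, as in the snippet above, and convert the data to logical only when the helper returns true.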
Fixed via #77 |
HDF5 does not natively have a boolean format. h5py (and by extension PyNWB) handles this by creating an enumerated type with an underlying data type of uint8 that is constrained to 0-1 and maps onto the names "FALSE" and "TRUE" (see documentation). h5py knows to write data this way, and it also knows to convert this type of data automatically to a boolean NumPy array on read. However, MATLAB's HDF5 API does not use this convention, and reads the data according to the enumerated type mapping defined in the dataset, returning the member names "TRUE" and "FALSE" rather than logical values. The data is stored efficiently in the HDF5 file, but it is not read efficiently by MATLAB, and the user has to wrangle it back into boolean form. The pipeline is working as expected on the Python side, but we have a bug/incompatibility when boolean data written by PyNWB is read using MatNWB.