[Feature]: Add Compression support for compound objects #1041
Comments
Why? How are you thinking about this? Is the cost of leaving them uncompressed so high? They usually appear in dynamic tables, where I think the effect is not that large.

Question about this for my own learning: shouldn't those be strings and not objects? https://hdmf-schema-language.readthedocs.io/en/latest/description.html#dtype
The cost of leaving them uncompressed isn't the problem. The problem is that `DatasetIOConfiguration.from_neurodata_object` raises a `NotImplementedError` for these dtypes, so the write can't be configured at all. To be honest, it might be worthwhile to be able to specify no-compression at a higher level, in addition to addressing this specific error.
pandas defaults to `object` dtype for string columns.
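For context, with pandas 2.x defaults a string column is stored with `object` dtype, which is how these arrays often arrive in a `DynamicTable` column:

```python
import pandas as pd

# with pandas 2.x defaults, string data is stored as object dtype
quality = pd.Series(["good", "noise", "good"])
assert quality.dtype == object
assert quality.to_numpy().dtype.kind == "O"
```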
Regarding Phy/Kilosort, this is an easy case that we can modify on the fly here or in spikeinterface. But in general, maybe we should allow compression settings for strings. The problem would be differentiating which arrays of dtype object are really strings and which are not. Do we have examples of dtype-object arrays that are not strings? I think things like groups, but we would need to verify.
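One way to make that differentiation explicit is to check every element of the object-dtype array (the helper name `is_string_object_array` is hypothetical, not an existing hdmf/neuroconv function):

```python
import numpy as np

def is_string_object_array(arr: np.ndarray) -> bool:
    """Heuristic: an object-dtype array is 'really strings' only if every
    element is a str or bytes; anything else (dicts, custom classes, ...)
    cannot be stored as an HDF5 string dataset."""
    if arr.dtype != object:
        return False
    return all(isinstance(x, (str, bytes)) for x in arr.flat)

assert is_string_object_array(np.array(["good", "noise"], dtype=object))
assert not is_string_object_array(np.array([{"x": 1}, "good"], dtype=object))
```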
OK, the Phy/Kilosort problem is fixed, merged, and should be included in the upcoming release of SpikeInterface. This issue can be separated into three main cases:
(we are already doing that for arrays of numpy type "U", so it should be unified)
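A rough sketch of that unification, assuming object-dtype string arrays can simply be converted to fixed-width numpy unicode so both go through the same "U" path (not the actual hdmf/neuroconv code):

```python
import numpy as np

obj_arr = np.array(["good", "noise", "good"], dtype=object)

# astype("U") infers the fixed width from the longest element, producing
# the same kind of array as a native "<U5" string array
fixed = obj_arr.astype("U")
assert fixed.dtype.kind == "U"
assert fixed.dtype.itemsize == 5 * 4  # 5 chars, 4 bytes per UTF-32 unit
```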
Examples:

```python
from hdmf.common import DynamicTable
from pynwb.testing.mock.file import mock_NWBFile
from pynwb import TimeSeries, NWBHDF5IO
import numpy as np

nwbfile = mock_NWBFile()

dynamic_table = DynamicTable(name="table", description="desc")
dynamic_table.add_row()
dynamic_table.add_row()
dynamic_table.add_row()

# numpy coerces the mixed types to strings here
data = np.asarray(["this", 35.0, 1])
dynamic_table.add_column(name="stuff", description="description", data=data)

time_series = TimeSeries(name="ts", data=data.copy(), unit="n.a.", rate=1.0)

nwbfile.add_acquisition(dynamic_table)
nwbfile.add_acquisition(time_series)

with NWBHDF5IO("test.nwb", mode="w") as io:
    io.write(nwbfile)

io = NWBHDF5IO("test.nwb", mode="r")
nwbfile_read = io.read()
```

Arrays with python objects (this does not work):

```python
import numpy as np
from pynwb import NWBHDF5IO
from pynwb.testing.mock.file import mock_NWBFile
from hdmf.common import DynamicTable

class MyObject:
    def __init__(self, name, value):
        self.name = name
        self.value = value

obj1 = MyObject("Object1", 10)
obj2 = MyObject("Object2", 20)
obj3 = MyObject("Object3", 30)

# NumPy arrays containing custom objects (neither variant works)
object_array = np.array([obj1, obj2, obj3], dtype=object)
object_array = np.array([dict(x=1), dict(y=2), dict(z=3)])

nwbfile = mock_NWBFile()

dynamic_table = DynamicTable(name="table", description="desc")
dynamic_table.add_row()
dynamic_table.add_row()
dynamic_table.add_row()
dynamic_table.add_column(name="stuff", description="description", data=object_array.copy())
# time_series = TimeSeries(name="data", data=object_array.copy(), unit="n.a.", rate=1.0)

nwbfile.add_acquisition(dynamic_table)

# Write the NWBFile to a file (this is where the failure occurs)
with NWBHDF5IO("test.nwb", mode="w") as io:
    io.write(nwbfile)
```
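For reference, HDF5 itself has no trouble compressing variable-length string datasets once the data really is strings; a minimal h5py sketch, independent of pynwb (the file and dataset names are just examples):

```python
import h5py

# variable-length UTF-8 strings with gzip; compression requires a chunked dataset
with h5py.File("strings.h5", "w") as f:
    f.create_dataset(
        "quality",
        data=["good", "noise", "good"],
        dtype=h5py.string_dtype(encoding="utf-8"),
        compression="gzip",
        chunks=True,
    )

with h5py.File("strings.h5", "r") as f:
    assert f["quality"].compression == "gzip"
    assert list(f["quality"].asstr()[:]) == ["good", "noise", "good"]
```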
Sounds good! I can take 1 & 2 in #1042; 3 should probably be a separate PR?
Actually, numpy strings have itemsize 0, so the estimate methods fail.
How did you deal with the itemsize problem in those cases?
Can you show me the code that is failing? Maybe I misunderstood, but I just ran:

```python
import numpy as np

x = np.asarray(["a", "b", "c"])
x.dtype.itemsize, x.dtype, np.__version__
# (4, dtype('<U1'), '1.26.4')
```

Yeah, for 3 I would just ask the pynwb team; probably we don't need to do anything about it...
I see now. I was specifying dtype="U", which doesn't have an itemsize, but if numpy infers the dtype from the data, it will use a fixed size (<U1, <U2, <U3, etc.) based on the longest string.

Ah, that's a catch.
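The catch can be seen directly in numpy: an explicitly unsized "U" dtype reports itemsize 0, while letting numpy infer the width from the data gives a fixed-size dtype (a small sketch of the point above):

```python
import numpy as np

# an explicitly unsized "U" dtype has itemsize 0, so any chunk-size
# estimate based on dtype.itemsize comes out as zero bytes
assert np.dtype("U").itemsize == 0

# letting numpy infer the width yields a fixed-size dtype instead
inferred = np.asarray(["good", "noise"])
assert inferred.dtype.kind == "U"
assert inferred.dtype.itemsize == 5 * 4  # longest string is 5 chars, UTF-32
```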
What would you like to see added to NeuroConv?

`DatasetIOConfiguration.from_neurodata_object` currently raises a `NotImplementedError` for compound objects with dtype `object`. This makes it very difficult to use the `KiloSortSortingInterface`, which includes a compound object for unit quality ('good' or 'noise'). Some chunking/compression should be selected by default, maybe with a warning that it might be sub-optimal for compound objects.

Do you have any interest in helping implement the feature?

Yes.