
[Bug]: Namespace collision with two or more SpikeGLX interfaces #1112

simon-ball opened this issue Oct 11, 2024 · 11 comments

@simon-ball (Contributor) commented Oct 11, 2024

What happened?

A large proportion of the experimental data recorded within our lab uses two or more Neuropixels probes (recorded with SpikeGLX) simultaneously. Each probe recording is then clustered independently via KiloSort. I expect to then generate a single NWB file, containing these multiple sets of SpikeGLX data, correctly linked to the correct member of multiple sets of Kilosort data.

However, I have not yet found a way to handle this case via Neuroconv. Possibly I am missing something obvious; possibly it was not envisaged during development. Nowhere in the documentation, nor in the CatalystNeuro examples, can I find anything similar.

Data from a typical recording session looks something like this (irrelevant files excluded for clarity):

> data
   > subject_1
      > session_1
         > probe_1
            > ks2.5_01
            npx.imec.ap.bin
            npx.imec.ap.meta
         > probe_2
            > ks2.5_01
            npx.imec.ap.bin
            npx.imec.ap.meta

I have tried the following approaches, with the following failure modes:

Attempt 1: NWBConverter, following the documentation here

< See code attempt_1 >

This runs into several problems:

  • Since source_data must be provided as a dictionary, not a list, it does not scale to an arbitrary number of probes. I would need to duplicate the code in new, separate classes ThreeProbeConverter, FourProbeConverter, etc. (a possible dynamic workaround is sketched below)
  • There is no obvious way to specify that ks_1 is associated with probe_1 and not with probe_2
  • Most crucially, the SpikeGLXRecordingInterface does not name its component parts uniquely (despite having access to the unique serial number encoded in the metadata file used to extract the model type). For example, the Device entity produced by each SpikeGLXRecordingInterface is identical in name, description, etc., as are the various ElectrodeGroups. So when the metadata dictionary is repeatedly updated within md = converter.get_metadata(), the outcome lists only a single device, and only as many electrode groups as a single probe has.

I did not get as far as figuring out how to crosslink ks_1 with probe_1, as the device issue proved terminal.
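
As an aside, the dynamic workaround mentioned above might look like this (an untested sketch; the factory function is my own invention):

# Untested sketch: generate an NWBConverter subclass for n probes at runtime,
# so a new hand-written class per probe count is not needed
from neuroconv import NWBConverter
from neuroconv.datainterfaces import SpikeGLXRecordingInterface, KiloSortSortingInterface

def make_n_probe_converter(n_probes):
    interface_classes = {}
    for pn in range(1, n_probes + 1):
        interface_classes[f"probe_{pn}"] = SpikeGLXRecordingInterface
        interface_classes[f"ks_{pn}"] = KiloSortSortingInterface
    # type() builds the class dynamically; data_interface_classes is the only attribute needed
    return type(f"NProbeConverter_{n_probes}", (NWBConverter,), {"data_interface_classes": interface_classes})

This sidesteps the class-per-probe-count problem, but not the metadata collision described above.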


Attempt 2: ConverterPipe, similar to the documentation here, overwriting the metadata as needed to provide unique device IDs

< see code attempt_2>

This solves the issue of degenerate devices, and also avoids needing a new Converter class for each number of probes. However, it still doesn't address crosslinking the ks interfaces, and even before that, it runs into an issue with Ecephys.ElectricalSeriesAP (see Traceback below).
Unlike Ecephys.Device, Ecephys.ElectricalSeriesAP is a single entry per file (at least, right now), and forcibly casting it to a list unsurprisingly causes a validation error.

I haven't found any other potential workarounds to attempt yet.


Is what I'm trying to do possible via Neuroconv right now, and I've just missed the bleeding obvious? Given the complete absence of any such multiple-identical-interface case from the documentation, perhaps this is a sufficiently esoteric edge case that it has either never been considered, or been considered and discarded.

Steps to Reproduce

# Attempt 1
import pathlib

from neuroconv import NWBConverter
from neuroconv.datainterfaces import SpikeGLXRecordingInterface, KiloSortSortingInterface

class TwoProbeConverter(NWBConverter):
    data_interface_classes=dict(
        probe_1 = SpikeGLXRecordingInterface,
        probe_2 = SpikeGLXRecordingInterface,
        ks_1 = KiloSortSortingInterface,
        ks_2 = KiloSortSortingInterface,
    )

data = pathlib.Path("/path/to/data/subject_1/session_1")
source_data = dict()
for pn in range(1,3):    # edit: fix for consistent demonstration purposes
    source_data[f"probe_{pn}"] = dict(file_path = data / f"probe_{pn}" / "npx.imec.ap.bin")
    source_data[f"ks_{pn}"] = dict(folder_path = data / f"probe_{pn}" / "ks2.5_01")

converter = TwoProbeConverter(source_data)
md = converter.get_metadata()

# There should be two devices
print(len(md["Ecephys"]["Device"]))
>>> 1     # Oh dear



# -------------------------

# Attempt 2
from neuroconv import ConverterPipe
from neuroconv.utils import DeepDict, dict_deep_update

md = DeepDict()
interfaces = []
for pn in range(1,3):
    device_name = f"probe_{pn}"
    sglx_interface = SpikeGLXRecordingInterface(file_path=data / device_name / "npx.imec.ap.bin")
    ks_interface = KiloSortSortingInterface(folder_path=data / device_name / "ks2.5_01")
    
    sglx_md = sglx_interface.get_metadata()
    ks_md = ks_interface.get_metadata()
    
    # Handle Device updating
    sglx_md["Ecephys"]["Device"][0]["name"] = device_name
    for eg in sglx_md["Ecephys"]["ElectrodeGroup"]:
        eg["device"] = device_name
        eg["name"] = device_name + "::" + eg["name"]

    # Add metadata
    md = dict_deep_update(md, sglx_md)
    md = dict_deep_update(md, ks_md)

    # Prepare for conversion
    interfaces.append(sglx_interface)
    interfaces.append(ks_interface)

converter = ConverterPipe(data_interfaces = interfaces, verbose=True)
converter.run_conversion(nwbfile_path = data/"out.nwb", metadata = md)

Traceback

File ~/opt/anaconda3/envs/exporter/lib/python3.11/site-packages/neuroconv/nwbconverter.py:248, in NWBConverter.run_conversion(self, nwbfile_path, nwbfile, metadata, overwrite, backend, backend_configuration, conversion_options)
    239 with make_or_load_nwbfile(
    240     nwbfile_path=nwbfile_path,
    241     nwbfile=nwbfile,
   (...)
    245     verbose=getattr(self, "verbose", False),
    246 ) as nwbfile_out:
    247     if no_nwbfile_provided:
--> 248         self.add_to_nwbfile(nwbfile=nwbfile_out, metadata=metadata, conversion_options=conversion_options)
    250     if backend_configuration is None:
    251         backend_configuration = self.get_default_backend_configuration(nwbfile=nwbfile_out, backend=backend)

File ~/opt/anaconda3/envs/exporter/lib/python3.11/site-packages/neuroconv/nwbconverter.py:172, in NWBConverter.add_to_nwbfile(self, nwbfile, metadata, conversion_options)
    170 conversion_options = conversion_options or dict()
    171 for interface_name, data_interface in self.data_interface_objects.items():
--> 172     data_interface.add_to_nwbfile(
    173         nwbfile=nwbfile, metadata=metadata, **conversion_options.get(interface_name, dict())
    174     )

File ~/opt/anaconda3/envs/exporter/lib/python3.11/site-packages/neuroconv/datainterfaces/ecephys/baserecordingextractorinterface.py:367, in BaseRecordingExtractorInterface.add_to_nwbfile(self, nwbfile, metadata, stub_test, starting_time, write_as, write_electrical_series, compression, compression_opts, iterator_type, iterator_opts)
    364 if metadata is None:
    365     metadata = self.get_metadata()
--> 367 add_recording_to_nwbfile(
    368     recording=recording,
    369     nwbfile=nwbfile,
    370     metadata=metadata,
    371     starting_time=starting_time,
    372     write_as=write_as,
    373     write_electrical_series=write_electrical_series,
    374     es_key=self.es_key,
    375     compression=compression,
    376     compression_opts=compression_opts,
    377     iterator_type=iterator_type,
    378     iterator_opts=iterator_opts,
    379 )

File ~/opt/anaconda3/envs/exporter/lib/python3.11/site-packages/neuroconv/tools/spikeinterface/spikeinterface.py:1115, in add_recording_to_nwbfile(recording, nwbfile, metadata, starting_time, write_as, es_key, write_electrical_series, write_scaled, compression, compression_opts, iterator_type, iterator_opts)
   1113 number_of_segments = recording.get_num_segments()
   1114 for segment_index in range(number_of_segments):
-> 1115     add_electrical_series_to_nwbfile(
   1116         recording=recording,
   1117         nwbfile=nwbfile,
   1118         segment_index=segment_index,
   1119         starting_time=starting_time,
   1120         metadata=metadata,
   1121         write_as=write_as,
   1122         es_key=es_key,
   1123         write_scaled=write_scaled,
   1124         compression=compression,
   1125         compression_opts=compression_opts,
   1126         iterator_type=iterator_type,
   1127         iterator_opts=iterator_opts,
   1128     )

File ~/opt/anaconda3/envs/exporter/lib/python3.11/site-packages/neuroconv/tools/spikeinterface/spikeinterface.py:954, in add_electrical_series_to_nwbfile(recording, nwbfile, metadata, segment_index, starting_time, write_as, es_key, write_scaled, compression, compression_opts, iterator_type, iterator_opts)
    952 es = pynwb.ecephys.ElectricalSeries(**eseries_kwargs)
    953 if write_as == "raw":
--> 954     nwbfile.add_acquisition(es)
    955 elif write_as == "processed":
    956     ecephys_mod.data_interfaces["Processed"].add_electrical_series(es)

File ~/opt/anaconda3/envs/exporter/lib/python3.11/site-packages/hdmf/utils.py:668, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    666 def func_call(*args, **kwargs):
    667     pargs = _check_args(args, kwargs)
--> 668     return func(args[0], **pargs)

File ~/opt/anaconda3/envs/exporter/lib/python3.11/site-packages/pynwb/file.py:852, in NWBFile.add_acquisition(self, **kwargs)
    848 @docval({'name': 'nwbdata', 'type': (NWBDataInterface, DynamicTable)},
    849         {'name': 'use_sweep_table', 'type': bool, 'default': False, 'doc': 'Use the deprecated SweepTable'})
    850 def add_acquisition(self, **kwargs):
    851     nwbdata = popargs('nwbdata', kwargs)
--> 852     self._add_acquisition_internal(nwbdata)
    853     use_sweep_table = popargs('use_sweep_table', kwargs)
    854     if use_sweep_table:

File ~/opt/anaconda3/envs/exporter/lib/python3.11/site-packages/hdmf/utils.py:668, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    666 def func_call(*args, **kwargs):
    667     pargs = _check_args(args, kwargs)
--> 668     return func(args[0], **pargs)

File ~/opt/anaconda3/envs/exporter/lib/python3.11/site-packages/hdmf/container.py:1182, in MultiContainerInterface.__make_add.<locals>._func(self, **kwargs)
   1178     if tmp.name in d:
   1179         msg = (f"Cannot add {tmp.__class__} '{tmp.name}' at 0x{id(tmp)} to dict attribute '{attr_name}' in "
   1180                f"{cls} '{self.name}'. {d[tmp.name].__class__} '{tmp.name}' at 0x{id(d[tmp.name])} "
   1181                f"already exists in '{attr_name}' and has the same name.")
-> 1182         raise ValueError(msg)
   1183     d[tmp.name] = tmp
   1184 return container

ValueError: Cannot add <class 'pynwb.ecephys.ElectricalSeries'> 'ElectricalSeriesAP' at 0x13801152080 to dict attribute 'acquisition' in <class 'pynwb.file.NWBFile'> 'root'. <class 'pynwb.ecephys.ElectricalSeries'> 'ElectricalSeriesAP' at 0x13538196304 already exists in 'acquisition' and has the same name.

Operating System

macOS

Python Executable

Python

Python Version

3.11

Package Versions

annotated-types==0.7.0
appdirs==1.4.4
appnope @ file:///home/conda/feedstock_root/build_artifacts/appnope_1707233003401/work
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
asciitree==0.3.3
asttokens @ file:///home/conda/feedstock_root/build_artifacts/asttokens_1698341106958/work
attrs==24.2.0
blinker==1.8.2
certifi==2024.8.30
cffi==1.17.1
click==8.1.7
comm @ file:///home/conda/feedstock_root/build_artifacts/comm_1710320294760/work
contourpy==1.3.0
cryptography==43.0.1
cycler==0.12.1
datajoint==0.14.3
debugpy @ file:///Users/runner/miniforge3/conda-bld/debugpy_1728594122099/work
decorator @ file:///home/conda/feedstock_root/build_artifacts/decorator_1641555617451/work
docstring_parser==0.16
exceptiongroup @ file:///home/conda/feedstock_root/build_artifacts/exceptiongroup_1720869315914/work
executing @ file:///home/conda/feedstock_root/build_artifacts/executing_1725214404607/work
Faker==30.3.0
fasteners==0.19
Flask==3.0.3
fonttools==4.54.1
h5py==3.12.1
hdmf==3.14.5
hdmf_zarr==0.9.0
importlib_metadata @ file:///home/conda/feedstock_root/build_artifacts/importlib-metadata_1726082825846/work
ipykernel @ file:///Users/runner/miniforge3/conda-bld/ipykernel_1719845458456/work
ipython @ file:///home/conda/feedstock_root/build_artifacts/ipython_1727944696411/work
itsdangerous==2.2.0
jedi @ file:///home/conda/feedstock_root/build_artifacts/jedi_1696326070614/work
Jinja2==3.1.4
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
jupyter_client @ file:///home/conda/feedstock_root/build_artifacts/jupyter_client_1726610684920/work
jupyter_core @ file:///home/conda/feedstock_root/build_artifacts/jupyter_core_1727163409502/work
kiwisolver==1.4.7
MarkupSafe==3.0.1
matplotlib==3.9.2
matplotlib-inline @ file:///home/conda/feedstock_root/build_artifacts/matplotlib-inline_1713250518406/work
minio==7.2.9
neo==0.13.3
nest_asyncio @ file:///home/conda/feedstock_root/build_artifacts/nest-asyncio_1705850609492/work
networkx==3.4
neuroconv==0.6.4
numcodecs==0.13.1
numpy==1.26.4
otumat==0.3.1
packaging @ file:///home/conda/feedstock_root/build_artifacts/packaging_1718189413536/work
pandas==2.2.3
parse==1.20.2
parso @ file:///home/conda/feedstock_root/build_artifacts/parso_1712320355065/work
pexpect @ file:///home/conda/feedstock_root/build_artifacts/pexpect_1706113125309/work
pickleshare @ file:///home/conda/feedstock_root/build_artifacts/pickleshare_1602536217715/work
pillow==10.4.0
platformdirs @ file:///home/conda/feedstock_root/build_artifacts/platformdirs_1726613481435/work
probeinterface==0.2.24
prompt_toolkit @ file:///home/conda/feedstock_root/build_artifacts/prompt-toolkit_1727341649933/work
psutil @ file:///Users/runner/miniforge3/conda-bld/psutil_1725737862086/work
ptyprocess @ file:///home/conda/feedstock_root/build_artifacts/ptyprocess_1609419310487/work/dist/ptyprocess-0.7.0-py2.py3-none-any.whl
pure_eval @ file:///home/conda/feedstock_root/build_artifacts/pure_eval_1721585709575/work
pycparser==2.22
pycryptodome==3.21.0
pydantic==2.9.2
pydantic_core==2.23.4
pydot==3.0.2
Pygments @ file:///home/conda/feedstock_root/build_artifacts/pygments_1714846767233/work
PyMySQL==1.1.1
pynwb==2.8.2
pyparsing==3.1.4
python-dateutil @ file:///home/conda/feedstock_root/build_artifacts/python-dateutil_1709299778482/work
pytz==2024.2
PyYAML==6.0.2
pyzmq @ file:///Users/runner/miniforge3/conda-bld/pyzmq_1725448984636/work
quantities==0.15.0
referencing==0.35.1
rpds-py==0.20.0
ruamel.yaml==0.18.6
ruamel.yaml.clib==0.2.8
scipy==1.14.1
six @ file:///home/conda/feedstock_root/build_artifacts/six_1620240208055/work
spikeinterface==0.101.2
stack-data @ file:///home/conda/feedstock_root/build_artifacts/stack_data_1669632077133/work
threadpoolctl==3.5.0
tornado @ file:///Users/runner/miniforge3/conda-bld/tornado_1724956123063/work
tqdm==4.66.5
traitlets @ file:///home/conda/feedstock_root/build_artifacts/traitlets_1713535121073/work
typing_extensions @ file:///home/conda/feedstock_root/build_artifacts/typing_extensions_1717802530399/work
tzdata==2024.2
urllib3==2.2.3
watchdog==5.0.3
wcwidth @ file:///home/conda/feedstock_root/build_artifacts/wcwidth_1704731205417/work
Werkzeug==3.0.4
zarr==2.17.2
zipp @ file:///home/conda/feedstock_root/build_artifacts/zipp_1726248574750/work


@simon-ball simon-ball added the bug label Oct 11, 2024
@h-mayorquin h-mayorquin self-assigned this Oct 11, 2024
@h-mayorquin (Collaborator) commented:

Hi, thanks for your detailed report.

Indeed, we are lacking documentation on how to do that. Moreover, as I was trying to build an example for you, I found that our current data model is in certain parts inadequate to cover this. We will be working on a way to make this simpler in the future, but meanwhile I wanted to provide working code to solve your problem.

First, though, I am giving you code to write two SpikeGLX interfaces without name collision and adequate device and electrode group handling:

from pathlib import Path

from neuroconv.datainterfaces import SpikeGLXRecordingInterface, KiloSortSortingInterface

# ECEPHY_DATA_PATH points to a local copy of the ephy_testing_data repository
# (see the GIN link later in this thread); adjust to your own data layout
ECEPHY_DATA_PATH = Path("/path/to/ephy_testing_data")

file_path_spikeglx_1 = ECEPHY_DATA_PATH / "spikeglx" / "long_nhp_stubbed" / "snippet_g0/snippet_g0_imec0" / "snippet_g0_t0.imec0.ap.bin"
es_key_1 = "SpikeGLX1"  # Electrical Series Metadata Key
interface_spikeglx_1 = SpikeGLXRecordingInterface(file_path=file_path_spikeglx_1, es_key=es_key_1)
metadata = interface_spikeglx_1.get_metadata()
probe_name = "ProbeA"
probe_metadata = {"name": probe_name, "description": "probe_description", "manufacturer": "IMEC"}
metadata["Ecephys"]["Device"] = [probe_metadata]

electrode_group_metadata = metadata["Ecephys"]["ElectrodeGroup"]

# This is a list of dicts with entries name, description and device
# Update name to Probe_{shank_index} and device to probe_name
electrode_group_names = [electrode_metadata["name"].replace("Imec", probe_name) for electrode_metadata in electrode_group_metadata]


for entry, name in zip(electrode_group_metadata, electrode_group_names):
    entry.update(name=name, device=probe_name)

metadata["Ecephys"]["ElectrodeGroup"] = electrode_group_metadata


channel_group_names = interface_spikeglx_1.recording_extractor.get_property("group_name")
new_channel_group_names = [name.replace("Imec", probe_name) for name in channel_group_names]
interface_spikeglx_1.recording_extractor.set_property(key="group_name", values=new_channel_group_names)

# Note that the first interface needs to create the nwbfile
nwbfile = interface_spikeglx_1.create_nwbfile(metadata=metadata)


file_path_spikeglx_2 = ECEPHY_DATA_PATH / "spikeglx" / "NP2_no_sync" / "all_chan_g0_t0.exported.imec0.ap.bin"
es_key_2 = "SPIKEGLX2"  # Electrical Series Metadata Key
interface_spikeglx_2 = SpikeGLXRecordingInterface(file_path=file_path_spikeglx_2, es_key=es_key_2)
metadata = interface_spikeglx_2.get_metadata()
probe_name = "ProbeB"
probe_metadata = {"name": probe_name, "description": "probe_description", "manufacturer": "IMEC"}
metadata["Ecephys"]["Device"] = [probe_metadata]

electrode_group_metadata = metadata["Ecephys"]["ElectrodeGroup"]

# This is a list of dicts with entries name, description and device
# Update name to Probe_{shank_index} and device to probe_name
electrode_group_names = [electrode_metadata["name"].replace("Imec", probe_name) for electrode_metadata in electrode_group_metadata]


for entry, name in zip(electrode_group_metadata, electrode_group_names):
    entry.update(name=name, device=probe_name)

metadata["Ecephys"]["ElectrodeGroup"] = electrode_group_metadata


channel_group_names = interface_spikeglx_2.recording_extractor.get_property("group_name")
new_channel_group_names = [name.replace("Imec", probe_name) for name in channel_group_names]
interface_spikeglx_2.recording_extractor.set_property(key="group_name", values=new_channel_group_names)

# Subsequent interfaces add to the existing nwbfile; only the first one creates it
interface_spikeglx_2.add_to_nwbfile(nwbfile=nwbfile, metadata=metadata)
nwbfile

# When you are done write to NWB
nwbfile_path = "nwbfile.nwb"
from pynwb import NWBHDF5IO
with NWBHDF5IO(nwbfile_path, mode='w') as io:
    io.write(nwbfile)

I think this can be put inside a loop the way you wanted. Let me know if it makes sense.
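
For instance, a loop version might look roughly like this (a sketch reusing the same calls as above; the probe_configs list is a placeholder you would build from your session layout):

# Sketch of the loop version: (device name, electrical series key, file path) per probe
probe_configs = [
    ("ProbeA", "SpikeGLX1", file_path_spikeglx_1),
    ("ProbeB", "SpikeGLX2", file_path_spikeglx_2),
]

nwbfile = None
for probe_name, es_key, file_path in probe_configs:
    interface = SpikeGLXRecordingInterface(file_path=file_path, es_key=es_key)
    metadata = interface.get_metadata()
    metadata["Ecephys"]["Device"] = [
        {"name": probe_name, "description": "probe_description", "manufacturer": "IMEC"}
    ]
    for entry in metadata["Ecephys"]["ElectrodeGroup"]:
        entry.update(name=entry["name"].replace("Imec", probe_name), device=probe_name)
    group_names = interface.recording_extractor.get_property("group_name")
    interface.recording_extractor.set_property(
        key="group_name", values=[name.replace("Imec", probe_name) for name in group_names]
    )
    if nwbfile is None:
        nwbfile = interface.create_nwbfile(metadata=metadata)  # the first interface creates the file
    else:
        interface.add_to_nwbfile(nwbfile=nwbfile, metadata=metadata)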

Before adding the sorting part I just wanted to clarify:

> ..., correctly linked to the correct member of multiple sets of Kilosort data.

What do you mean by this? Just so I don't start writing code that does not relate to your intent.

@simon-ball (Contributor, Author) commented Oct 16, 2024

Thanks for the detailed response! I shall try it shortly

> What do you mean by this? Just so I don't start writing code that does not relate to your intent.

With the standard caveat that I may misunderstand the structure of NWB: we currently have a tool that converts from our internal DataJoint system to NWB. It doesn't use Neuroconv, it's based directly on pynwb, and it's the one I'm trying to augment with Neuroconv, because I'd like something that is easier to maintain.

In the current tool, when Units are added to the nwbfile, each Unit is cross-linked to an Electrode, and therefore to an ElectrodeGroup, which is linked to a Device. In pseudocode, that looks something like this:

for probe in inserted_probes:
    device = nwbfile.create_device(...)
    electrode_group = nwbfile.create_electrode_group(...)    # currently 1 group per probe, even for 4-shank probes, which may be inappropriate
    electrodes = make_electrodes(nwbfile, electrode_group)   # create however many hundred electrodes a probe has
    for unit in set_of_units_recorded_by_this_probe:
        nwbfile.add_unit(unit_id, electrodes=unit_central_electrode, electrode_group=electrode_group)

Thus, my assumption is that when I use Neuroconv to add, say, two SpikeGLXRecordingInterfaces (each of which corresponds to a unique Device) and two KiloSortSortingInterfaces, there is, or should be, some way to indicate that "units from ks_1 are associated with electrodes that are members of ElectrodeGroup_[1,2,3,4], attached to Device 1; units from ks_2 go with ElectrodeGroup_[5,6,7,8] from Device 2". But when the SpikeGLX and Kilosort interfaces are added separately and in plural, there seems to be no obvious way to either specify, or infer, that relationship.

As my original bug report indicates, I didn't get far enough into the process to see how, or if, neuroconv works as I assume it does. Maybe that expectation is wrong, and the way our current tool works is not the correct, or not the only correct, way to do it.

But that's what I meant :-)

@simon-ball (Contributor, Author) commented Oct 16, 2024

Digging into the output with the help of your example, I can generate a complete NWB file for further debugging purposes:

  • The namespace collision in Devices is addressed: two devices are correctly registered with whatever ID I care to give them. I've not yet tried to pull the actual serial numbers in from the metadata, but that's mostly a convenience for me (it would let me automate registering the implant location)
  • The namespace collision in the ElectricalSeries is solved.

However, it looks like adding KiloSort to the mix introduces a fresh collision.

Recording is done by channel, which is a subset of the entire set of electrodes: a 4-shank NPX2 probe has 5120 electrodes, of which 384 can be recorded simultaneously. The NWB file gets the list of those 384 channels as its nwbfile.electrodes table, identified by a channel_name, e.g. AP3 (I assume the AP refers to this being the full-bandwidth sampling). These channel names are only unique per probe, which is fine; that's how they're handled in SpikeGLX as well, so changing that convention would be confusing.

print("Electrode Channels")
print("Number of channels: ", len(nwbfile.electrodes))
print("Number of unique channel names: ", len(set(nwbfile.electrodes.channel_name)))
>>> Electrode Channels
>>> Number of channels:  768
>>> Number of unique channel names:  384

However, since Kilosort doesn't have any concept of plural devices, it only tracks units by channel number, and that is what is picked up by the KiloSortSortingInterface (understandably so). But because channel numbers aren't unique in a multi-device recording, once multiple KiloSortSortingInterfaces are added, the mapping of unit to unique channel is lost: a unit labelled on, e.g., channel 9 could come from either of two different probes, in different brain regions.

I'm not sure what the best solution to this is for the project as a whole, but I guess in the short term I need to build a slightly modified CustomKiloSortSortingInterface that adds an additional field to the nwbfile.units table to store a device identity as well, since (channel_number, device) should be unique.
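
Something like this, perhaps (an untested sketch; the subclass is my own invention):

# Untested sketch: tag every unit with its source device so that
# (channel_number, device) becomes unique. As far as I can tell, unit
# properties set on the sorting extractor end up as columns of nwbfile.units.
from neuroconv.datainterfaces import KiloSortSortingInterface

class CustomKiloSortSortingInterface(KiloSortSortingInterface):
    def __init__(self, folder_path, device_name, **kwargs):
        super().__init__(folder_path=folder_path, **kwargs)
        number_of_units = self.sorting_extractor.get_num_units()
        self.sorting_extractor.set_property("device", [device_name] * number_of_units)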

That's starting to get off topic from the issue I originally raised, but might be relevant to the project as a whole, requiring a more general solution.

@h-mayorquin (Collaborator) commented:

No, this is something that we have been pondering for a while. You are correct that the canonical way to make this linkage is by writing the electrodes in the units table.

As you can see, our base function for adding unit tables, add_units_table_to_nwbfile, has a way of passing this information (see the unit_electrode_indices argument), but we have not propagated it to the interface level because of some conceptual difficulties.
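
For reference, the low-level call looks roughly like this (the same call the full example further below uses; nwbfile, sorting_interface and unit_electrode_indices as in your pipeline):

from neuroconv.tools.spikeinterface import add_sorting_to_nwbfile

# unit_electrode_indices: one list of electrode-table row indices per unit,
# in the order of the sorting's unit ids
add_sorting_to_nwbfile(
    nwbfile=nwbfile,
    sorting=sorting_interface.sorting_extractor,
    unit_electrode_indices=unit_electrode_indices,
)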

Exposing this and adding documentation is something on our todo list.

Meanwhile, I will come back to you soon with a way of doing this, but it will be more manual work, like the example above.

Another option you have is to write the sorting of each probe to a different units table; see issue #1079.

Btw, do you guys have a paper where you use this multiple-probe setup, or do you know references in the literature where it is used? It is always useful for us to be able to point to specific use cases when developing features.

@simon-ball (Contributor, Author) commented Oct 17, 2024

> Btw, do you guys have a paper where you use this multiple-probe setup, or do you know references in the literature where it is used? It is always useful for us to be able to point to specific use cases when developing features.

Here is one:

> Two rats were implanted bilaterally with prototype Neuropixels ‘phase 3A’ single-shank probes and with one probe targeting MEC–PaS in each hemisphere

Here is another:

> The rats were implanted with single-shank 384-site Neuropixels probes (Jun et al., 2017) targeting the medial entorhinal cortex (MEC) in either one or both hemispheres

Those are both from 2022 using the older NPX 1 probes. I don't know if we've published anything with multiple NPX2 probes yet, but that will be coming - I discovered this issue doing the prep work for exactly that.

In addition, while I don't have access to all of our surgical records, I have data on at least 40 subjects with n>=2 Neuropixel probes implanted since we first started testing the first generation prototypes

@h-mayorquin (Collaborator) commented:

Thank you. As promised, here is some code that handles the correct mapping between raw and sorted ecephys data. To this end, it introduces a new converter class, SortedRecordingConverter. The idea is that you pass the recording and the corresponding sorting interfaces, plus a mapping that, for each unit id in the sorting object, gives all the recording channels that should be associated with it. Under the hood, the class takes care of writing the correct mapping between the newly added units table and the electrode table.

from neuroconv import ConverterPipe
from neuroconv.datainterfaces import SpikeGLXRecordingInterface, KiloSortSortingInterface
from neuroconv.datainterfaces.ecephys.baserecordingextractorinterface import BaseRecordingExtractorInterface
from neuroconv.datainterfaces.ecephys.basesortingextractorinterface import BaseSortingExtractorInterface


class SortedRecordingConverter(ConverterPipe):

    def __init__(
        self,
        recording_interface: BaseRecordingExtractorInterface,
        sorting_interface: BaseSortingExtractorInterface,
        unit_ids_to_channel_ids: dict[str, list[str]],
    ):

        self.recording_interface = recording_interface
        self.sorting_interface = sorting_interface
        self.unit_ids_to_channel_ids = unit_ids_to_channel_ids

        self.channel_ids = self.recording_interface.recording_extractor.get_channel_ids()
        self.unit_ids = self.sorting_interface.sorting_extractor.get_unit_ids()

        data_interfaces = [recording_interface, sorting_interface]
        super().__init__(data_interfaces=data_interfaces)

    def add_to_nwbfile(self, nwbfile, metadata, conversion_options=None):

        conversion_options = conversion_options or dict()
        conversion_options_recording = conversion_options.get("recording", dict())

        self.recording_interface.add_to_nwbfile(
            nwbfile=nwbfile,
            metadata=metadata,
            **conversion_options_recording,
        )

        from neuroconv.tools.spikeinterface import add_sorting_to_nwbfile
        from neuroconv.tools.spikeinterface.spikeinterface import _get_electrode_table_indices_for_recording
        
        # This returns the indices in the electrode table that corresponds to the channel ids of the recording
        electrode_table_indices = _get_electrode_table_indices_for_recording(
            recording=self.recording_interface.recording_extractor,
            nwbfile=nwbfile,
        )
        
        # Map ids in the recording to the indices in the electrode table
        channel_id_to_electrode_table_index = {
            channel_id: electrode_table_indices[channel_index]
            for channel_index, channel_id in enumerate(self.channel_ids)
        }

        # Create a list of lists with the indices of the electrodes in the electrode table for each unit
        unit_electrode_indices = []
        for unit_id in self.unit_ids:
            unit_channel_ids = self.unit_ids_to_channel_ids[unit_id]
            unit_indices = [channel_id_to_electrode_table_index[channel_id] for channel_id in unit_channel_ids]
            unit_electrode_indices.append(unit_indices)

        # TODO: this should be the interface add_to_nwbfile method but we have not exposed unit_electrode_indices yet
        add_sorting_to_nwbfile(
            nwbfile=nwbfile,
            sorting=self.sorting_interface.sorting_extractor,
            unit_electrode_indices=unit_electrode_indices,
        )

        return


# ECEPHY_DATA_PATH as defined in the earlier example
file_path_spikeglx_1 = (
    ECEPHY_DATA_PATH / "spikeglx" / "long_nhp_stubbed" / "snippet_g0/snippet_g0_imec0" / "snippet_g0_t0.imec0.ap.bin"
)
es_key_1 = "SpikeGLX1"  # Electrical Series Metadata Key
interface_spikeglx_1 = SpikeGLXRecordingInterface(file_path=file_path_spikeglx_1, es_key=es_key_1)
metadata = interface_spikeglx_1.get_metadata()
probe_name = "ProbeA"
probe_metadata = {"name": probe_name, "description": "probe_description", "manufacturer": "IMEC"}
metadata["Ecephys"]["Device"] = [probe_metadata]

electrode_group_metadata = metadata["Ecephys"]["ElectrodeGroup"]

# This is a list of dicts with entries name, description and device
# Update name to Probe_{shank_index} and device to probe_name
electrode_group_names = [
    electrode_metadata["name"].replace("Imec", probe_name) for electrode_metadata in electrode_group_metadata
]


for entry, name in zip(electrode_group_metadata, electrode_group_names):
    entry.update(name=name, device=probe_name)

metadata["Ecephys"]["ElectrodeGroup"] = electrode_group_metadata


channel_group_names = interface_spikeglx_1.recording_extractor.get_property("group_name")
new_channel_group_names = [name.replace("Imec", probe_name) for name in channel_group_names]
interface_spikeglx_1.recording_extractor.set_property(key="group_name", values=new_channel_group_names)

# Note that the first converter needs to create the nwbfile (done below)

# Interface for the corresponding sorting
folder_path = ECEPHY_DATA_PATH / "phy" / "phy_example_0"
sorting_interface_1 = KiloSortSortingInterface(folder_path=folder_path)


# Dummy mapping, you need to provide the actual mapping
recording_extractor = interface_spikeglx_1.recording_extractor
sorting_extractor = sorting_interface_1.sorting_extractor
number_of_units = sorting_extractor.get_num_units()
unit_ids = sorting_extractor.get_unit_ids()
channel_ids = recording_extractor.get_channel_ids()
unit_ids_to_channel_ids = {unit_ids[unit_index]: [channel_ids[unit_index]] for unit_index in range(number_of_units)}

sorted_recorded_interface_1 = SortedRecordingConverter(
    recording_interface=interface_spikeglx_1,
    sorting_interface=sorting_interface_1,
    unit_ids_to_channel_ids=unit_ids_to_channel_ids,
)
nwbfile = sorted_recorded_interface_1.create_nwbfile(metadata=metadata)

file_path_spikeglx_2 = ECEPHY_DATA_PATH / "spikeglx" / "NP2_no_sync" / "all_chan_g0_t0.exported.imec0.ap.bin"
es_key_2 = "SPIKEGLX2"  # Electrical Series Metadata Key
interface_spikeglx_2 = SpikeGLXRecordingInterface(file_path=file_path_spikeglx_2, es_key=es_key_2)
metadata = interface_spikeglx_2.get_metadata()
probe_name = "ProbeB"
probe_metadata = {"name": probe_name, "description": "probe_description", "manufacturer": "IMEC"}
metadata["Ecephys"]["Device"] = [probe_metadata]

electrode_group_metadata = metadata["Ecephys"]["ElectrodeGroup"]

# This is a list of dicts with entries name, description and device
# Update name to Probe_{shank_index} and device to probe_name
electrode_group_names = [
    electrode_metadata["name"].replace("Imec", probe_name) for electrode_metadata in electrode_group_metadata
]


for entry, name in zip(electrode_group_metadata, electrode_group_names):
    entry.update(name=name, device=probe_name)

metadata["Ecephys"]["ElectrodeGroup"] = electrode_group_metadata


channel_group_names = interface_spikeglx_2.recording_extractor.get_property("group_name")
new_channel_group_names = [name.replace("Imec", probe_name) for name in channel_group_names]
interface_spikeglx_2.recording_extractor.set_property(key="group_name", values=new_channel_group_names)

# Interface for the corresponding sorting
folder_path = ECEPHY_DATA_PATH / "phy" / "phy_example_0"
sorting_interface_2 = KiloSortSortingInterface(folder_path=folder_path)

# Dummy mapping, you need to provide the actual mapping
recording_extractor = interface_spikeglx_2.recording_extractor
sorting_extractor = sorting_interface_2.sorting_extractor
number_of_units = sorting_extractor.get_num_units()
unit_ids = sorting_extractor.get_unit_ids()
channel_ids = recording_extractor.get_channel_ids()
unit_ids_to_channel_ids = {unit_ids[unit_index]: [channel_ids[unit_index]] for unit_index in range(number_of_units)}


# Add the second recording and sorting to the nwbfile
sorted_recorded_interface_2 = SortedRecordingConverter(
    recording_interface=interface_spikeglx_2,
    sorting_interface=sorting_interface_2,
    unit_ids_to_channel_ids=unit_ids_to_channel_ids,
)
sorted_recorded_interface_2.add_to_nwbfile(nwbfile=nwbfile, metadata=metadata)

# When you are done write to NWB
nwbfile_path = "nwbfile.nwb"
from pynwb import NWBHDF5IO

with NWBHDF5IO(nwbfile_path, mode="w") as io:
    io.write(nwbfile)

nwbfile

We will be integrating something like this into main, but first I need to fix a couple of things: expose unit_electrode_indices at a higher level, fix the metadata mapping collision so that all the metadata shenanigans above become unnecessary, and add some documentation and edge-case testing. Meanwhile, it would be cool if you could give it a test.

@h-mayorquin (Collaborator) commented:

@simon-ball

> Those are both from 2022 using the older NPX 1 probes. I don't know if we've published anything with multiple NPX2 probes yet, but that will be coming - I discovered this issue doing the prep work for exactly that.

The example that I am using for debugging uses our test data from here:

https://gin.g-node.org/NeuralEnsemble/ephy_testing_data/src/master/spikeglx

This includes a multi-shank 2.0 probe, where the electrode groups are a bit more complicated, precisely because I wanted to cover that edge case.

> In addition, while I don't have access to all of our surgical records, I have data on at least 40 subjects with n>=2 Neuropixel probes implanted since we first started testing the first generation prototypes

Do you think you could share one of those sessions with us? It would be good to have more testing data that includes this type of setup. Feel free to reach me at h . mayorquin @ gmail (without spaces, of course) to discuss this further if you can help us with it.

@simon-ball (Contributor, Author) commented:

> unit_ids_to_channel_ids = {unit_ids[unit_index]: [channel_ids[unit_index]] for unit_index in range(number_of_units)}

Your example code seems to work, with one exception, possibly deliberate (you mention it's a dummy): it assigns channels consecutively by unit id value, rather than by each unit's recorded channel. The version below assigns by the unit's "central" channel; there probably exist ways, with which I have not yet experimented, to assign a group of electrodes instead.

I think that line should read

unit_ids_to_channel_ids = {
    unit_ids[unit_index]: [channel_ids[sorting_extractor.get_unit_property(unit_index, "ch")]]
    for unit_index in range(number_of_units)
}

With that modification, I get units assigned to electrode IDs exceeding the range of the zeroth device and, as far as I have checked manually, matching the expected electrode.

@h-mayorquin (Collaborator) commented:

Correct, the mapping I provided is for illustration purposes.

There is no guarantee that the sorting extractor will have a "ch" property, and there is no guarantee that the units come in any particular order. This is something that the user should know and/or assign at writing time.
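
If you want to be defensive about it, something along these lines (a sketch, assuming spikeinterface's property API):

# Sketch: build the mapping only if the sorter actually stored a "ch" property;
# otherwise the mapping has to come from the user's own records
if "ch" in sorting_extractor.get_property_keys():
    central_channels = sorting_extractor.get_property("ch")
    unit_ids_to_channel_ids = {
        unit_id: [channel_ids[int(channel)]]
        for unit_id, channel in zip(sorting_extractor.get_unit_ids(), central_channels)
    }
else:
    raise ValueError("Sorting has no 'ch' property; supply the unit-to-channel mapping manually.")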

@yarikoptic (Contributor) commented:

Sample data (albeit big) is available from https://datasets.datalad.org/?dir=/labs/mvdm/OdorSequence/sourcedata/raw/M541/M541-2024-08-31_g0 . If you were to use git and git-annex directly, do

git clone https://datasets.datalad.org/labs/mvdm/OdorSequence/sourcedata/raw/M541/M541-2024-08-31_g0/.git

to get it.

@h-mayorquin (Collaborator) commented:

Hi, @yarikoptic. Now that #1150 has been merged you should be able to use the SpikeGLXConverter to read your data without metadata collision. Something like this:

[screenshot: example code using the SpikeGLXConverter]
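
(The screenshot is not preserved here. A rough sketch of what it likely showed, assuming the SpikeGLXConverterPipe class; check the neuroconv documentation for the exact import path and signature:)

from neuroconv.converters import SpikeGLXConverterPipe

# Point the converter at the SpikeGLX run folder; it creates one interface
# per stream without the metadata collisions discussed above
converter = SpikeGLXConverterPipe(folder_path="M541-2024-08-31_g0")
metadata = converter.get_metadata()
converter.run_conversion(nwbfile_path="M541.nwb", metadata=metadata)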

Could you guys check if that works for you and whether the metadata is written correctly?

We are still working on #1132 for the linking of the electrode metadata.
