Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add base schema and classes #3

Merged
merged 12 commits into from
Oct 26, 2023
Merged

Add base schema and classes #3

merged 12 commits into from
Oct 26, 2023

Conversation

rly
Copy link
Collaborator

@rly rly commented Oct 23, 2023

Resolves #1. This PR adds 2 new neurodata types:

  1. HedTags - a subtype of VectorData that can be used to represent a column of string HED tags in any DynamicTable object
  2. HedNWBFile - a subtype of NWBFile that adds an optional attribute that stores the HED version schema (string) at the file level.

Example usage:

from datetime import datetime
from dateutil.tz import tzlocal
from hdmf.common import VectorIndex
from pynwb import NWBHDF5IO
from uuid import uuid4

from ndx_hed import HedNWBFile, HedTags


hed_nwbfile = HedNWBFile(
    session_description="session_description",
    identifier=str(uuid4()),
    session_start_time=datetime(1970, 1, 1, tzinfo=tzlocal()),
    hed_schema_version="8.2.0",
)

hed_nwbfile.add_trial_column("hed_tags", "HED tags for each trial", col_cls=HedTags, index=True)
hed_nwbfile.add_trial(start_time=0.0, stop_time=1.0, hed_tags=["animal_target", "correct_response"])
hed_nwbfile.add_trial(start_time=2.0, stop_time=3.0, hed_tags=["animal_target", "incorrect_response"])

with NWBHDF5IO(self.path, mode="w") as io:
    io.write(hed_nwbfile)

with NWBHDF5IO(self.path, mode="r", load_namespaces=True) as io:
    read_nwbfile = io.read()
    assert isinstance(read_nwbfile, HedNWBFile)
    assert isinstance(read_nwbfile.trials["hed_tags"], VectorIndex)
    assert isinstance(read_nwbfile.trials["hed_tags"].target, HedTags)
    # read_nwbfile.trials["hed_tags"][0] is read as a numpy array
    assert all(read_nwbfile.trials["hed_tags"][0] == ["animal_target", "correct_response"])
    assert all(read_nwbfile.trials["hed_tags"][1] == ["animal_target", "incorrect_response"])

This does not yet use the Events tables in the ndx-events extension. We are in the process of updating those neurodata types. I'll add and demonstrate support for those types in a separate PR.

I've commented out two integration tests that do not pass because, in writing them, I found a bug in one of the testing utilities of pynwb. I have a bugfix for that in pynwb here: NeurodataWithoutBorders/pynwb#1782 . After that is merged and PyNWB 2.6.0 is released (hopefully this week), we can uncomment those tests.

@rly
Copy link
Collaborator Author

rly commented Oct 23, 2023

Right now, the authors of the extension are listed as Ryan Ly, Oliver Ruebel, and Kay Robbins, and uses our institutional email addresses. I'm happy to change/add to the names, email addresses, order, etc. The extension uses the BSD-3 license. Happy to change that to another permissive license also.

@rly rly requested review from VisLab and oruebel October 23, 2023 01:55
Copy link
Member

@VisLab VisLab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should talk about the name of the type HedTags. A HED annotation consists of a comma separated list of HED tags (which might include nested parentheses to group tags).

The thing appearing in a timeseries column is a HED annotation, which we generally refer to as a HED string.

Copy link
Member

@VisLab VisLab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add Ian Callanan to the list of authors IanCa (on github). He will be doing the integration into the HED ecosystem. @IanCa

Copy link
Member

@VisLab VisLab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you amplify on what the HedNWB file represents? Thx

Copy link
Member

@VisLab VisLab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you are modeling the HED column as an array of string tags. Usually the HED annotation is a single string. We could deal with an array by just joining, but the items will need to represent not just single tags to allow parentheses.

@oruebel
Copy link
Collaborator

oruebel commented Oct 23, 2023

  • HedNWBFile - a subtype of NWBFile that adds an optional attribute that stores the HED version schema

@rly could you clarify why hed_schema_version should sit globally at the NWBFile level, rather than being an attribute on the HedTags column?

If there is indeed additional global metadata that we need to store for HED (e.g., the version), then I would suggest extending LabMetaData to add a group /general/hed to the metada

@VisLab
Copy link
Member

VisLab commented Oct 23, 2023 via email

@VisLab
Copy link
Member

VisLab commented Oct 23, 2023 via email

@oruebel
Copy link
Collaborator

oruebel commented Oct 23, 2023

Actually, we want only one version of HED for the entire dataset (Dandi set?)

I don't think that is something we can enforce in the extension. We only have control of the schema of individual files but not of the collection.

@rly
Copy link
Collaborator Author

rly commented Oct 23, 2023

Thanks for the feedback!

Yes, I am currently modeling the HED column as an array of freeform string tags. This could include strings like "(Intended-effect, Cue)" but I think I understand now that "Intended-effect" and "Cue" are HED tags and this is a nested tag.

Would it be preferred to store a single string for a HED annotation that is a comma-separated list of HED tags? To be explicit, we would change the usage to:

nwbfile.add_trial_column("HED", "HED annotations for each trial", col_cls=HedAnnotations)
nwbfile.add_trial(start_time=0.0, stop_time=1.0, HED="animal_target, correct_response, (Intended-effect, Cue)")
# ...
nwbfile.trials["HED"][0]  # returns "animal_target, correct_response, (Intended-effect, Cue)"

I will move the hed_schema_version to a new group HedMetadata with the name hed that extends LabMetaData so that it can be placed at /general/hed. This group will have only one attribute, hed_schema_version.

I will add Ian Callanan to the authors list.

@VisLab
Copy link
Member

VisLab commented Oct 24, 2023

It looks like you are modeling the HED column as an array of string tags. Usually the HED annotation is a single string. We could deal with an array by just joining, but the items will need to represent not just single tags to allow parentheses.
Yes that is correct.

Copy link
Member

@VisLab VisLab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good.... and I'm going to approve so that we can move forward. Can we enforce that if a HED column will be called "HED"?

@oruebel
Copy link
Collaborator

oruebel commented Oct 26, 2023

Can we enforce that if a HED column will be called "HED"?

Yes, you can set name: HED in the schema to enforce a fixed name. However, if the name is fixed this means that you will only be able to have one HED annotation dataset in a particular table if the name is fixed. Alternatively, you could also set default_name: HED, which would allow the user to change the name but by default the name would be HED. I think in the case of HED it is probably Ok to limit to one of these columns per table, so fixing the name may be Ok.

Even when the name is fixed, however, I would still recommend to check for the neurodata_type (i.e., here HedAnnotation) rather than for the nameof a dataset to identify the type (i.e., if a column stores HED tags). A user can add custom columns to tables and as such a user could name a column HED, even it is not a HedAnnotation column with HED tags (i.e., we can enforce only that the column from this schema must be named HED but we cannot enforce that other types of columns cannot be named HED).

@rly rly merged commit 4b193bb into main Oct 26, 2023
33 of 34 checks passed
@rly rly deleted the use-ndx-template branch October 26, 2023 16:57
@VisLab
Copy link
Member

VisLab commented Nov 22, 2023

Even when the name is fixed, however, I would still recommend to check for the neurodata_type (i.e., here HedAnnotation) rather than for the nameof a dataset to identify the type (i.e., if a column stores HED tags). A user can add custom columns to tables and as such a user could name a column HED, even it is not a HedAnnotation column with HED tags (i.e., we can enforce only that the column from this schema must be named HED but we cannot enforce that other types of columns cannot be named HED).

Yes good idea...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support annotation of NWB trials with HED tags
3 participants