Add base schema and classes #3

rly · 2023-10-23T01:51:34Z

Resolves #1. This PR adds 2 new neurodata types:

- HedTags - a subtype of VectorData that can be used to represent a column of string HED tags in any DynamicTable object
- HedNWBFile - a subtype of NWBFile that adds an optional attribute that stores the HED schema version (string) at the file level.

Example usage:

from datetime import datetime
from dateutil.tz import tzlocal
from hdmf.common import VectorIndex
from pynwb import NWBHDF5IO
from uuid import uuid4

from ndx_hed import HedNWBFile, HedTags


hed_nwbfile = HedNWBFile(
    session_description="session_description",
    identifier=str(uuid4()),
    session_start_time=datetime(1970, 1, 1, tzinfo=tzlocal()),
    hed_schema_version="8.2.0",
)

hed_nwbfile.add_trial_column("hed_tags", "HED tags for each trial", col_cls=HedTags, index=True)
hed_nwbfile.add_trial(start_time=0.0, stop_time=1.0, hed_tags=["animal_target", "correct_response"])
hed_nwbfile.add_trial(start_time=2.0, stop_time=3.0, hed_tags=["animal_target", "incorrect_response"])

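# Path of the NWB file to write (hypothetical file name; the original test used self.path)
path = "test_hed.nwb"

with NWBHDF5IO(path, mode="w") as io:
    io.write(hed_nwbfile)

with NWBHDF5IO(path, mode="r", load_namespaces=True) as io: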
    read_nwbfile = io.read()
    assert isinstance(read_nwbfile, HedNWBFile)
    assert isinstance(read_nwbfile.trials["hed_tags"], VectorIndex)
    assert isinstance(read_nwbfile.trials["hed_tags"].target, HedTags)
    # read_nwbfile.trials["hed_tags"][0] is read as a numpy array
    assert all(read_nwbfile.trials["hed_tags"][0] == ["animal_target", "correct_response"])
    assert all(read_nwbfile.trials["hed_tags"][1] == ["animal_target", "incorrect_response"])

This does not yet use the Events tables in the ndx-events extension. We are in the process of updating those neurodata types. I'll add and demonstrate support for those types in a separate PR.

I've commented out two integration tests that do not pass because, in writing them, I found a bug in one of the testing utilities of PyNWB. I have a bugfix for that in NeurodataWithoutBorders/pynwb#1782. After that is merged and PyNWB 2.6.0 is released (hopefully this week), we can uncomment those tests.

rly · 2023-10-23T01:53:53Z

Right now, the authors of the extension are listed as Ryan Ly, Oliver Ruebel, and Kay Robbins, with our institutional email addresses. I'm happy to change/add to the names, email addresses, order, etc. The extension uses the BSD-3 license. Happy to change that to another permissive license also.

VisLab

I think we should talk about the name of the type HedTags. A HED annotation consists of a comma-separated list of HED tags (which might include nested parentheses to group tags).

The thing appearing in a timeseries column is a HED annotation, which we generally refer to as a HED string.

VisLab

Add Ian Callanan (IanCa on GitHub) to the list of authors. He will be doing the integration into the HED ecosystem. @IanCa

VisLab

Could you elaborate on what the HedNWBFile represents? Thx

VisLab

It looks like you are modeling the HED column as an array of string tags. Usually the HED annotation is a single string. We could deal with an array by just joining, but the items will need to represent not just single tags to allow parentheses.
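
To make the "just joining" idea concrete, here is a minimal sketch in plain Python. The helper names `is_balanced` and `join_hed_items` are hypothetical and not part of ndx-hed or the HED tools; the only assumption is that each array item is either a single tag or a parenthesized group.

```python
def is_balanced(item: str) -> bool:
    """Return True if the parentheses in one array item are balanced."""
    depth = 0
    for ch in item:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def join_hed_items(items: list[str]) -> str:
    """Join array items (single tags or parenthesized groups) into one HED string.

    Hypothetical helper: it only joins with ", " and checks parentheses,
    it does not validate tags against a HED schema.
    """
    cleaned = [item.strip() for item in items if item.strip()]
    for item in cleaned:
        if not is_balanced(item):
            raise ValueError(f"Unbalanced parentheses in item: {item!r}")
    return ", ".join(cleaned)


hed_string = join_hed_items(["animal_target", "correct_response", "(Intended-effect, Cue)"])
```

Because a group such as "(Intended-effect, Cue)" stays a single item, the nesting survives the join.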

oruebel · 2023-10-23T17:30:39Z

HedNWBFile - a subtype of NWBFile that adds an optional attribute that stores the HED version schema

@rly could you clarify why hed_schema_version should sit globally at the NWBFile level, rather than being an attribute on the HedTags column?

If there is indeed additional global metadata that we need to store for HED (e.g., the version), then I would suggest extending LabMetaData to add a group /general/hed to the metadata.

VisLab · 2023-10-23T17:40:00Z

We want only one HED version per data file.... I think it should be global to that file.


VisLab · 2023-10-23T17:40:26Z

Actually, we want only one version of HED for the entire dataset (Dandi set?)


oruebel · 2023-10-23T17:43:42Z

Actually, we want only one version of HED for the entire dataset (Dandi set?)

I don't think that is something we can enforce in the extension. We only have control of the schema of individual files but not of the collection.
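
A check across a collection would therefore have to live outside the schema, for example in a small validation script. The sketch below is only an illustration in plain Python: the function names are hypothetical, and it assumes the HED schema version of each file has already been read into a dict mapping file name to version string.

```python
from collections import defaultdict
from typing import Optional


def group_files_by_hed_version(versions: dict[str, str]) -> dict[str, list[str]]:
    """Group file names by the HED schema version each file declares."""
    groups = defaultdict(list)
    for file_name, version in versions.items():
        groups[version].append(file_name)
    return dict(groups)


def check_single_hed_version(versions: dict[str, str]) -> Optional[str]:
    """Return the shared version, or raise if the files disagree.

    Hypothetical helper: `versions` maps file name -> version string and is
    assumed to have been collected from the files beforehand.
    """
    groups = group_files_by_hed_version(versions)
    if len(groups) > 1:
        raise ValueError(f"Multiple HED schema versions found: {groups}")
    # next(..., None) covers the empty-collection case
    return next(iter(groups), None)
```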

rly · 2023-10-23T18:49:18Z

Thanks for the feedback!

Yes, I am currently modeling the HED column as an array of freeform string tags. This could include strings like "(Intended-effect, Cue)" but I think I understand now that "Intended-effect" and "Cue" are HED tags and this is a nested tag.

Would it be preferred to store a single string for a HED annotation that is a comma-separated list of HED tags? To be explicit, we would change the usage to:

nwbfile.add_trial_column("HED", "HED annotations for each trial", col_cls=HedAnnotations)
nwbfile.add_trial(start_time=0.0, stop_time=1.0, HED="animal_target, correct_response, (Intended-effect, Cue)")
# ...
nwbfile.trials["HED"][0]  # returns "animal_target, correct_response, (Intended-effect, Cue)"
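
If a reader of such a column needs the individual top-level items back, the single string can be split on commas that sit outside parentheses. This is a rough sketch in plain Python; `split_hed_string` is a hypothetical helper, not an ndx-hed or HED tools function, and it does no schema validation.

```python
def split_hed_string(hed_string: str) -> list[str]:
    """Split a HED string on top-level commas, keeping parenthesized groups whole."""
    parts = []
    current = []
    depth = 0  # current parenthesis nesting level
    for ch in hed_string:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("Unbalanced parentheses in HED string")
        if ch == "," and depth == 0:
            # A comma outside any group ends the current item
            part = "".join(current).strip()
            if part:
                parts.append(part)
            current = []
        else:
            current.append(ch)
    if depth != 0:
        raise ValueError("Unbalanced parentheses in HED string")
    tail = "".join(current).strip()
    if tail:
        parts.append(tail)
    return parts


items = split_hed_string("animal_target, correct_response, (Intended-effect, Cue)")
# items == ["animal_target", "correct_response", "(Intended-effect, Cue)"]
```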

I will move the hed_schema_version to a new group HedMetadata with the name hed that extends LabMetaData so that it can be placed at /general/hed. This group will have only one attribute, hed_schema_version.

I will add Ian Callanan to the authors list.

VisLab · 2023-10-24T20:21:39Z

It looks like you are modeling the HED column as an array of string tags. Usually the HED annotation is a single string. We could deal with an array by just joining, but the items will need to represent not just single tags to allow parentheses.
Yes that is correct.

VisLab

This looks good, and I'm going to approve so that we can move forward. Can we enforce that a HED column will be called "HED"?

oruebel · 2023-10-26T15:18:35Z

Can we enforce that a HED column will be called "HED"?

Yes, you can set name: HED in the schema to enforce a fixed name. However, a fixed name means that you can have only one HED annotation dataset in a particular table. Alternatively, you could set default_name: HED, which makes HED the default name but still allows the user to change it. I think in the case of HED it is probably OK to limit to one of these columns per table, so fixing the name may be OK.
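
As a rough illustration of the difference, the two options are written below as Python dicts mirroring what the schema YAML would contain. The `doc` strings and the type name HedAnnotations are placeholders, and `resolve_column_name` is a hypothetical helper that only mimics the naming rule; it is not how HDMF implements it.

```python
# Option 1: fixed name. Every instance of the type must be named "HED".
FIXED_NAME_SPEC = {
    "neurodata_type_def": "HedAnnotations",
    "neurodata_type_inc": "VectorData",
    "name": "HED",
    "dtype": "text",
    "doc": "Placeholder doc string for a column of HED annotations.",
}

# Option 2: default name. "HED" is used unless the user picks another name.
DEFAULT_NAME_SPEC = {
    "neurodata_type_def": "HedAnnotations",
    "neurodata_type_inc": "VectorData",
    "default_name": "HED",
    "dtype": "text",
    "doc": "Placeholder doc string for a column of HED annotations.",
}


def resolve_column_name(spec, requested=None):
    """Return the name a column would get under `spec` (hypothetical helper)."""
    fixed = spec.get("name")
    if fixed is not None:
        # A fixed name cannot be overridden
        if requested is not None and requested != fixed:
            raise ValueError(f"Name is fixed to {fixed!r}, got {requested!r}")
        return fixed
    if requested is not None:
        return requested
    return spec.get("default_name")
```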

Even when the name is fixed, however, I would still recommend checking the neurodata_type (i.e., here HedAnnotation) rather than the name of a dataset to identify the type (i.e., whether a column stores HED tags). A user can add custom columns to tables, and as such a user could name a column HED even if it is not a HedAnnotation column with HED tags (i.e., we can enforce only that the column from this schema must be named HED, but we cannot enforce that other types of columns cannot be named HED).
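
In code, that recommendation amounts to an isinstance check instead of a name comparison. The sketch below uses small stand-in classes so it runs without PyNWB; in real code the classes would come from hdmf.common and the extension, and `find_hed_columns` is a hypothetical helper.

```python
class VectorData:
    """Stand-in for hdmf.common.VectorData (assumption, for illustration only)."""

    def __init__(self, name, data):
        self.name = name
        self.data = list(data)


class HedAnnotations(VectorData):
    """Stand-in for the extension's HED column type."""


def find_hed_columns(columns):
    """Return the columns that really are HED columns, judged by type, not by name."""
    return [col for col in columns if isinstance(col, HedAnnotations)]


# A user-defined column that happens to be named "HED" is not picked up,
# while a true HedAnnotations column is.
custom = VectorData("HED", ["free-form note"])
real = HedAnnotations("HED", ["animal_target, correct_response"])
hed_columns = find_hed_columns([custom, real])
```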

VisLab · 2023-11-22T15:28:25Z

Even when the name is fixed, however, I would still recommend checking the neurodata_type (i.e., here HedAnnotation) rather than the name of a dataset to identify the type (i.e., whether a column stores HED tags). A user can add custom columns to tables, and as such a user could name a column HED even if it is not a HedAnnotation column with HED tags (i.e., we can enforce only that the column from this schema must be named HED, but we cannot enforce that other types of columns cannot be named HED).

Yes good idea...

rly added 9 commits October 21, 2023 10:18

Initial commit from ndx-template

6763780

Update .gitignore

c71a338

Update metadata

043739d

Modify spec, add tests

aea6e3f

Update from latest template

fd6fc5a

Fix unused imports

0e185e1

Comment out not-yet-working test

49af8cf

Comment out currently broken tests

240de96

Make sure HedTags column are strings

db2e8c5

Clean up GH actions

083f105

rly requested review from VisLab and oruebel October 23, 2023 01:55

VisLab reviewed Oct 23, 2023

rly added 2 commits October 25, 2023 17:22

Update from PR feedback

156a0cc

Cleanup

4a3e488

VisLab approved these changes Oct 26, 2023

rly merged commit 4b193bb into main Oct 26, 2023

rly deleted the use-ndx-template branch October 26, 2023 16:57
