OHE attributes with existing GSC validation #858
Email this week from NCBI: John Anderson passed along your request, Ruth, to remove the validation for the following fields in the OHE BioSample package: indoor_surf [cabinet|ceiling|counter top|door|shelving|vent cover|window|wall]. We validate based on a set list of terms, as indicated in the list above. We would be happy to make the changes you requested, but we cannot do so without the approval of the GSC as well. These terms and validations appear in their packages as well as the OHE package, and therefore any changes to the validation would need to be approved by all stakeholders. We cannot simply remove the validation for one package (OHE), as that is not how our system works. Lynn, we would need the GSC to approve removing the validations for these packages and making them free text. My understanding from Ruth's request is that OHE plans to add a new picklist with specific terms, and some of those terms are not in the current validation list, so the new values will fail the existing validation. Alternatively, if the GSC and OHE could provide us with an updated list of agreed-upon terms that should be used for validation, we could make that change as well. Please let us know what decision OHE and the GSC come to, and then we can begin the work to implement the needed changes.
Best Regards,
Linda
Linda Frisse, PhD
Hello Linda, Cheers,
thanks for sharing this @lschriml
Requested in the Google group [gensc-cig] by Ruth Timme (GSC Board member)
Issue raised via email:
https://groups.google.com/g/gensc-cig/c/POQWoEXEP2c/m/ToqrFMLJAAAJ?utm_medium=email&utm_source=footer&pli=1
Hi NCBI BioSample,
I'm including Lynn Schriml here, for awareness on this question regarding existing GSC validation on the following attributes, also included in the One Health Enteric BioSample package.
indoor_surf [cabinet|ceiling|counter top|door|shelving|vent cover|window|wall]
surf_material [adobe|carpet|cinder blocks|concrete|hay bales|glass|metal|paint|plastic|stainless steel|stone|stucco|tile|vinyl|wood]
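To illustrate why OHE-specific terms fail today, here is a minimal Python sketch of exact-match picklist validation using the two GSC term lists above. It assumes validation is a plain, case-sensitive string comparison (the thread only says values must match exactly); the function name `picklist_errors` and the example term "floor" are hypothetical and not taken from NCBI's system.

```python
# Hypothetical sketch of exact-match picklist validation.
# Term lists are copied from the GSC picklists quoted above.
GSC_PICKLISTS = {
    "indoor_surf": [
        "cabinet", "ceiling", "counter top", "door",
        "shelving", "vent cover", "window", "wall",
    ],
    "surf_material": [
        "adobe", "carpet", "cinder blocks", "concrete", "hay bales",
        "glass", "metal", "paint", "plastic", "stainless steel",
        "stone", "stucco", "tile", "vinyl", "wood",
    ],
}


def picklist_errors(attribute: str, value: str) -> list:
    """Return error messages; an empty list means the value passed."""
    allowed = GSC_PICKLISTS.get(attribute)
    if allowed is None:
        return []  # no picklist: treated as free text in this sketch
    if value in allowed:  # assumed exact, case-sensitive match
        return []
    return [f"{attribute}: '{value}' is not in the picklist"]


print(picklist_errors("indoor_surf", "counter top"))  # passes
print(picklist_errors("indoor_surf", "floor"))  # hypothetical OHE term, fails
```

Any term added to an OHE-specific picklist but absent from the GSC list would be rejected the same way, which is the conflict described in this thread.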
We'd like to start curating our own picklists for these attributes, specific for our use cases. While our use cases overlap significantly with what's included here, we need to expand beyond this term list. We'd also like to perform our own validation.
Can we remove NCBI validation on these attributes when they are included in OHE package submissions?
How shall we proceed here? This question is triggered by a recent validation error we received on SUB14608130. We discussed this a few years back (see below), but I never followed up on solving it.
==========================================================================
From: Timme, Ruth <[email protected]>
Sent: Tuesday, May 10, 2022 4:18 PM
To: Pennerman, Kayla * <[email protected]>; Anderson, John B (NIH) <[email protected]>
Cc: Barrett, Tanya (NIH) <[email protected]>
Subject: Re: [EXTERNAL] OHE examples
Hi John, replying with more specific answers to your questions:
NEW ISSUE: Sorry we didn’t notice this previously, but the One Health Enteric package includes the following attributes that are already in our system and already use picklists.
[1] These attributes were originally provided by the GSC. We now see that some of the terms in your picklists differ from the GSC picklists, which are:
building_setting [GSC] [urban|suburban|exurban|rural]
indoor_surf [GSC] [cabinet|ceiling|counter top|door|shelving|vent cover|window|wall]
surf_material [GSC] [adobe|carpet|cinder blocks|concrete|hay bales|glass|metal|paint|plastic|stainless steel|stone|stucco|tile|vinyl|wood]
[2] And host_gender currently has a picklist defined by the INSDC:
host_gender [INSDC] [male|female|pooled male and female|neuter|hermaphrodite|intersex]
We could possibly edit this list to at least match GSC host_gender picklist, which I believe is:
host_gender [GSC] [female|hermaphrodite|non-binary|male|neuter|transgender|transgender (female to male)|transgender (male to female)|undeclared]
Do you think you might be able to work with the GSC to harmonize your lists? As it is, any values you supply that don’t match existing picklists will fail submission validation.
One way to stop this delaying submissions in the short term would be to omit these fields from uploads, and update at a later date once picklists have been harmonized.
Thanks,
John
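The mismatch John describes can be made concrete by comparing the two host_gender lists as sets. This is a short Python sketch; the list contents are copied from the email above, and `compare_picklists` is a hypothetical helper, not part of any NCBI or GSC tooling.

```python
# Compare the INSDC and GSC host_gender picklists quoted above to see
# which terms a harmonized list would need to reconcile.
INSDC_HOST_GENDER = [
    "male", "female", "pooled male and female",
    "neuter", "hermaphrodite", "intersex",
]
GSC_HOST_GENDER = [
    "female", "hermaphrodite", "non-binary", "male", "neuter",
    "transgender", "transgender (female to male)",
    "transgender (male to female)", "undeclared",
]


def compare_picklists(a: list, b: list) -> dict:
    """Split two picklists into shared terms and terms unique to each."""
    set_a, set_b = set(a), set(b)
    return {
        "shared": sorted(set_a & set_b),
        "only_first": sorted(set_a - set_b),
        "only_second": sorted(set_b - set_a),
    }


for key, terms in compare_picklists(INSDC_HOST_GENDER, GSC_HOST_GENDER).items():
    print(f"{key}: {terms}")
```

With these lists it reports four shared terms (female, hermaphrodite, male, neuter), two INSDC-only terms (intersex, pooled male and female) and five GSC-only terms (non-binary, the three transgender terms, undeclared).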
===================================================================
From: "Pennerman, Kayla *" <[email protected]>
Date: Tuesday, May 10, 2022 at 4:11 PM
To: "Anderson, John B (NIH)" <[email protected]>
Cc: "Timme, Ruth" <[email protected]>, "Barrett, Tanya (NIH)" <[email protected]>
Subject: RE: [EXTERNAL] OHE examples
Hi John,
We removed the ontological accessions from all picklists and excess terms from the specified picklists. Please let us know if there are still issues to address.
Thank you,
Kayla
========================================================
From: Anderson, John (NIH/NLM/NCBI) [E] <[email protected]>
Sent: Thursday, May 5, 2022 8:36 AM
To: Pennerman, Kayla * <[email protected]>
Cc: Timme, Ruth <[email protected]>; Barrett, Tanya (NIH) <[email protected]>
Subject: RE: [EXTERNAL] OHE examples
Hi Kayla,
Thanks for sending us your latest reference guide, template and vocabulary picklists.
To pass validation on our side, the ontology information will only have to be stripped from fields where NCBI supports picklists. The values provided for these fields have to be an exact match with values in our picklists, regardless of the package they are submitted under (including SARS-CoV-2). Stripping ontology information is not necessary for free text fields. The full list of attributes recognized by our system is provided at https://www.ncbi.nlm.nih.gov/biosample/docs/attributes/ (this page is not yet updated with new OneHealthEnteric attributes). The exact picklist values we support are listed in the ‘Format’ field (note, no ontology information is included within those picklist values). Note that attributes are package-agnostic at NCBI: they follow the same rules regardless of which package they participate in.
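A minimal sketch of what "package-agnostic" means in practice: the picklist is looked up by attribute name alone, so the same value passes or fails under every package. The `PICKLISTS` dictionary, the `validate` function, the package labels and the value "city" are all hypothetical; only the building_setting terms come from this thread.

```python
# Hypothetical sketch: picklists are keyed by attribute name only, so the
# package a sample is submitted under cannot change the validation result.
PICKLISTS = {
    "building_setting": {"urban", "suburban", "exurban", "rural"},
}


def validate(package: str, attributes: dict) -> list:
    """Return errors for picklist attributes whose value is not an exact match.

    `package` is accepted but intentionally unused: attribute rules are
    the same in every package.
    """
    errors = []
    for name, value in attributes.items():
        allowed = PICKLISTS.get(name)
        if allowed is None:
            continue  # free-text attribute: no picklist check
        if value not in allowed:
            errors.append(f"{name}: '{value}' is not an allowed value")
    return errors


sample = {
    "building_setting": "city",  # hypothetical non-picklist value
    "food_processing_method": "food (ground) [FOODON:00002713]",  # free text
}
for pkg in ("OneHealthEnteric", "MIMS"):  # hypothetical package labels
    print(pkg, validate(pkg, sample))
```

Both packages yield the same single error for building_setting, and the free-text attribute is never checked, which is why changing a picklist for OHE alone would also change it for every GSC package.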
NEW ISSUE: Sorry we didn’t notice this previously, but the One Health Enteric package includes the following attributes that are already in our system and already use picklists.
[1] These attributes were originally provided by the GSC. We now see that some of the terms in your picklists differ from the GSC picklists, which are:
building_setting [GSC] [urban|suburban|exurban|rural]
indoor_surf [GSC] [cabinet|ceiling|counter top|door|shelving|vent cover|window|wall]
surf_material [GSC] [adobe|carpet|cinder blocks|concrete|hay bales|glass|metal|paint|plastic|stainless steel|stone|stucco|tile|vinyl|wood]
[2] And host_gender currently has a picklist defined by the INSDC:
host_gender [INSDC] [male|female|pooled male and female|neuter|hermaphrodite|intersex]
We could possibly edit this list to at least match GSC host_gender picklist, which I believe is:
host_gender [GSC] [female|hermaphrodite|non-binary|male|neuter|transgender|transgender (female to male)|transgender (male to female)|undeclared]
Do you think you might be able to work with the GSC to harmonize your lists? As it is, any values you supply that don’t match existing picklists will fail submission validation.
One way to stop this delaying submissions in the short term would be to omit these fields from uploads, and update at a later date once picklists have been harmonized.
Thanks,
John
=================================================
Hi Kayla,
Thanks for the file. It’s great. I tested it and found a few things we need to clear up:
We also need finalized picklists for "building setting", "indoor surface", "surface material" and "host_gender".
The new attribute 'sequenced by' is optional, right?
In your test input, you still have the Ontology IDs appended to some picklist values, eg, human as food consumer [FOODON:03510026]. Ruth said it wouldn't be a problem for you to strip these out before submitting, so all future submissions will not have those, correct?
We noticed that your template omits 3 optional attributes. Should these attributes still be included in our template?
indoor_surf_subpart
serovar
surface_orientation
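Stripping the appended ontology IDs before submission, as asked about above, could look like the following Python sketch. It assumes every accession has the form `[PREFIX:digits]` at the end of a value, which matches the FOODON examples in this thread but may not hold for every ontology; `strip_ontology_id` and `strip_multi_value` are hypothetical helpers. Per John's earlier note, stripping is only required for fields where NCBI uses picklists.

```python
import re

# Assumed accession shape: "[PREFIX:digits]" at the end of a value,
# e.g. " [FOODON:03510026]". Surrounding whitespace is removed too.
ONTOLOGY_SUFFIX = re.compile(r"\s*\[[A-Za-z]+:\d+\]\s*$")


def strip_ontology_id(value: str) -> str:
    """Remove one trailing ontology accession, if present."""
    return ONTOLOGY_SUFFIX.sub("", value)


def strip_multi_value(value: str, sep: str = ";") -> str:
    """Strip accessions from each entry of a separator-delimited value."""
    return sep.join(strip_ontology_id(part) for part in value.split(sep))


print(strip_ontology_id("human as food consumer [FOODON:03510026]"))
# human as food consumer
print(strip_multi_value(
    "food (ground) [FOODON:00002713];food (frozen) [FOODON:03302148]"
))
# food (ground);food (frozen)
```

Values without an accession, such as "counter top", pass through unchanged.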
food_processing_method = "food (ground) [FOODON:00002713];food (frozen) [FOODON:03302148]": passed validation. We're not validating that attribute at all, so it’s OK to have multiple entries.
host_age "13": passed. We don't require units.
latitude and longitude "50 N 20 N": failed
latitude and longitude "USA:WI": failed
latitude and longitude "2 3": passed because it was automatically converted to "2 N 3 E"
Also, one you didn't have in orange:
latitude and longitude "120 S 90 W": failed because latitude 120 is impossible
cult_isol_date "13/27/2022": passed because we're not validating that attribute
collection date "March": failed
collection date "2022-03-04": passed. Not sure why you had this in orange; that's a valid format.
reference_material "Star*Reads": passed because we're not validating that attribute
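The latitude/longitude and collection date outcomes above can be reproduced with a small Python sketch. It is not NCBI's validator: it assumes a "LAT N|S LON E|W" layout for lat_lon, treats a bare pair of numbers as signed decimal degrees (mirroring how "2 3" became "2 N 3 E"), and accepts only ISO 8601 dates (YYYY, YYYY-MM, YYYY-MM-DD), which is likely a subset of the formats NCBI actually accepts. The function names are hypothetical.

```python
import re
from datetime import date
from typing import Optional

# Assumed layout: "<lat> N|S <lon> E|W", e.g. "38.98 N 77.11 W".
LAT_LON = re.compile(r"^(\d+(?:\.\d+)?)\s*([NS])\s+(\d+(?:\.\d+)?)\s*([EW])$")
# Two bare numbers, assumed to be signed decimal degrees, e.g. "2 3".
BARE_PAIR = re.compile(r"^(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)$")
# ISO 8601 subset only: YYYY, YYYY-MM or YYYY-MM-DD.
ISO_DATE = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$")


def normalize_lat_lon(value: str) -> Optional[str]:
    """Return 'LAT N|S LON E|W', or None if the value cannot be accepted."""
    value = value.strip()
    bare = BARE_PAIR.match(value)
    if bare:
        lat_s, lon_s = bare.groups()
        ns = "S" if lat_s.startswith("-") else "N"
        ew = "W" if lon_s.startswith("-") else "E"
        lat_s, lon_s = lat_s.lstrip("-"), lon_s.lstrip("-")
    else:
        match = LAT_LON.match(value)
        if match is None:
            return None  # e.g. "50 N 20 N" (no E/W) or "USA:WI"
        lat_s, ns, lon_s, ew = match.groups()
    if float(lat_s) > 90 or float(lon_s) > 180:
        return None  # e.g. "120 S 90 W": latitude 120 is impossible
    return f"{lat_s} {ns} {lon_s} {ew}"


def valid_collection_date(value: str) -> bool:
    """Accept only real calendar dates in the ISO subset above."""
    match = ISO_DATE.match(value.strip())
    if match is None:
        return False  # e.g. "March": no year
    year, month, day = match.groups()
    try:
        date(int(year), int(month or 1), int(day or 1))
    except ValueError:
        return False  # e.g. month 13 or day 32
    return True


for v in ("50 N 20 N", "USA:WI", "2 3", "120 S 90 W"):
    print(repr(v), "->", normalize_lat_lon(v))
for v in ("March", "2022-03-04"):
    print(repr(v), "->", valid_collection_date(v))
```

Keeping the matched digit strings, rather than reformatting floats, avoids silently rounding coordinates submitted with many decimal places.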
The following organisms were flagged for curator review:
"bacteria": this is always flagged. We require a valid tax name. In rare cases, “bacterium” would be allowed.
"Listeria sp.": 'sp.' names without an appended strain name are always flagged.
"Escherichia coli serovar O157": this is in the Taxonomy database as “Escherichia coli O157”.
"Salmonella enterica subsp. enterica servoar Dublin": "serovar" is misspelled.
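The curator-review flags listed above can be approximated with simple heuristics, sketched below in Python. These rules are hypothetical and only mirror the examples in this thread; real checks run against the NCBI Taxonomy database, which is why a naming mismatch such as "Escherichia coli serovar O157" is not caught here. The misspelling check uses `difflib.get_close_matches` with an assumed similarity cutoff of 0.8, and `review_flags` is an illustrative name.

```python
import difflib
import re

# Hypothetical heuristics mirroring the flagged examples in this thread.
SP_WITHOUT_STRAIN = re.compile(r"\bsp\.$")  # "sp." as the last token


def review_flags(organism: str) -> list:
    """Return reasons an organism name might be flagged for review."""
    name = organism.strip()
    flags = []
    if name.lower() == "bacteria":
        flags.append("generic name; a valid taxonomic name is required")
    if SP_WITHOUT_STRAIN.search(name):
        flags.append("'sp.' name without an appended strain name")
    for word in name.split():
        lowered = word.lower()
        # Near-misses of "serovar" (e.g. transposed letters) are likely typos.
        if lowered != "serovar" and difflib.get_close_matches(
            lowered, ["serovar"], n=1, cutoff=0.8
        ):
            flags.append(f"'{word}' looks like a misspelling of 'serovar'")
    return flags


for org in (
    "bacteria",
    "Listeria sp.",
    "Salmonella enterica subsp. enterica servoar Dublin",
    "Salmonella enterica subsp. enterica serovar Dublin",
):
    print(org, "->", review_flags(org))
```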
Thanks,
John
=======================================================