Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update MONDO local unique identifiers #105

Merged
merged 7 commits into from
Oct 3, 2022
Merged

Update MONDO local unique identifiers #105

merged 7 commits into from
Oct 3, 2022

Conversation

cthoyt
Copy link
Member

@cthoyt cthoyt commented Oct 2, 2022

Following #104 (comment), this PR implements the change that we won't make assumptions about how Identifiers.org might curate future OBO Foundry ontologies' local unique identifier patterns. Therefore, this PR removes the redundant prefixes in MONDO's local unique identifiers.

This PR could be merged directly or also combine into #104 if the diff would be a problem

@cthoyt cthoyt marked this pull request as ready for review October 2, 2022 19:04
@cthoyt cthoyt requested a review from bgyori October 2, 2022 19:10
@bgyori
Copy link
Contributor

bgyori commented Oct 3, 2022

Okay. We should also look into validation functions and make sure they are consistent. The function check_valid_prefix_id is used in the following places:

tests/test_validity.py
23:from biomappings.utils import SKIP_BANANA, check_valid_prefix_id, get_canonical_tuple
124:        check_valid_prefix_id(
128:        check_valid_prefix_id(

src/biomappings/utils.py
147:def check_valid_prefix_id(prefix: str, identifier: str) -> None:

src/biomappings/wsgi.py
25:    check_valid_prefix_id,
216:            check_valid_prefix_id(source_prefix, source_id)
225:            check_valid_prefix_id(target_prefix, target_id)

However, its behavior doesn't make sense currently, and after some digging, I traced this back to a significant change in logic in Bioregistry's miriam_standardize_identifier function, which, when changed in biopragmatics/bioregistry#427 was not appropriately followed up by corresponding changes here. The issue is that this function used to return a useful value (i.e., an identifier) whether or not a resource had MIRIAM data, but now it just returns None. So the validity check in Biomappings returns incorrectly after a check for a MIRIAM-standardized identifier.

@cthoyt
Copy link
Member Author

cthoyt commented Oct 3, 2022

@bgyori in 0ab87e9 I took care of deduplicating two functions that both were doing validity checking (I favored the one that wasn't baked into the unittest since it was reused elsewhere in the package). Now it's more consistent how the logic is applied (no more need for the flags for NCIT and AGRO) and more clear that the potential None return in miriam_standardize_identifier() is not actually possible since there's a guard clause (i.e., check to make sure there's a miriam prefix available before even running miriam_standardize_identifier()).

The diff isn't so great but the function reads pretty nicely now:

def check_valid_prefix_id(prefix: str, identifier: str) -> None:
"""Check the prefix/identifier pair is valid."""
resource = bioregistry.get_resource(prefix)
if resource is None:
raise InvalidPrefix(prefix)
miriam_prefix = resource.get_miriam_prefix()
if miriam_prefix is not None:
norm_id = resource.miriam_standardize_identifier(identifier)
if norm_id is None:
raise ValueError(
"should not be possible since we check for miriam prefix before running miriam_standardize_identifier"
)
if norm_id != identifier:
raise InvalidNormIdentifier(prefix, identifier, norm_id)
pattern = re.compile(resource.miriam["pattern"])
else:
norm_id = resource.standardize_identifier(identifier)
if norm_id != identifier:
raise InvalidNormIdentifier(prefix, identifier, norm_id)
pattern = resource.get_pattern_re()
if pattern is not None and not pattern.match(identifier):
raise InvalidIdentifierPattern(prefix, identifier, pattern)

cthoyt added 5 commits October 3, 2022 13:04
Now that the two implementations are unified, this is no longer strictly necessary
Copy link
Contributor

@bgyori bgyori left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks!

@bgyori bgyori merged commit 75efc9d into master Oct 3, 2022
@bgyori bgyori deleted the clean-mondo branch October 3, 2022 16:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants