-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Re-implement reconciliation code with curies
#426
Conversation
This PR was motivated by mapping-commons/sssom-py#426 (and serves as a follow-up to mapping-commons/sssom-py#216). It implements two functions that are high-level ways to rewire a given converter.
This PR is finished but is currently failing CI because of non-deterministic issues with poetry (is it time to get rid of poetry yet 😛?) |
@hrshdhgd please help, we have a poetry issue again |
@@ -1105,62 +1105,25 @@ def reconcile_prefix_and_data( | |||
""" | |||
# Discussion about this found here: | |||
# https://github.com/mapping-commons/sssom-py/issues/216#issue-1171701052 | |||
converter = msdf.converter | |||
converter = curies.remap_curie_prefixes(converter, prefix_reconciliation["prefix_synonyms"]) | |||
converter = curies.rewire(converter, prefix_reconciliation["prefix_expansion_reconciliation"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use constants instead of strings to satisfy SRP?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know what SRP means, but let's save updating this data structure for a subsequent PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Single responsibility principle: classes/methods/functions should only have one reason to change. If this string changes for whatever reason, many different classes and places in the code will have to change as well.
"r": "rdfs", | ||
"o": "owl", | ||
}, | ||
"prefix_expansion_reconciliation": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename expansion_reconciliation
to rewiring
to maintain consistent terminology?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about this?
tests/test_collapse.py
Outdated
self.assertEqual(recon_msdf.prefix_map[pfx], exp) | ||
self.assertEqual( | ||
{ | ||
"a": "http://example.org/a/", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is "a" still present? Shouldn't it have been remapped to c before it go rewired?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mapping "a":"c" got ignored since c
is already present in the prefix map and this would cause an inconsistency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the question is: do we want to add an "overwrite" setting? I don't think that having this be default functionality makes sense. if you don't want original prefix c
in the map, then maybe there should be a 3rd part to this workflow for explicit deletions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm interesting. I think we would like a --force option, which results in the removal of the already existing entry, and a logging.warning(). What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think having a flag would be the way to go. I will mock that in the curies
package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a solution! Pushing now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to work nicely! My new solution doesn't allow deletions, but requires that if you remap a->c and there is already a c, you need to give c-> something else. Seems reasonable to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No reasons not to accept, but two questions still open.
@@ -1105,62 +1105,25 @@ def reconcile_prefix_and_data( | |||
""" | |||
# Discussion about this found here: | |||
# https://github.com/mapping-commons/sssom-py/issues/216#issue-1171701052 | |||
converter = msdf.converter | |||
converter = curies.remap_curie_prefixes(converter, prefix_reconciliation["prefix_synonyms"]) | |||
converter = curies.rewire(converter, prefix_reconciliation["prefix_expansion_reconciliation"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Single responsibility principle: classes/methods/functions should only have one reason to change. If this string changes for whatever reason, many different classes and places in the code will have to change as well.
"r": "rdfs", | ||
"o": "owl", | ||
}, | ||
"prefix_expansion_reconciliation": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about this?
Ok, I merged it now under the flag of progress 🙏 |
@hrshdhgd can you comment on wether you agree to the change in the action? |
This PR replaces the
reconcile_prefix_and_data
with more carefully defined operations from thecuries
package (implemented in biopragmatics/curies#74 and biopragmatics/curies#78).The new implementation is built on
curies.remap_curie_prefixes
andcuries.rewire
. Note that if you want to overwrite a CURIE prefix, e.g., in the Bioregistry extended prefix map, you need to provide a place for the old one to go as in{"geo": "ncbi.geo", "geogeo": "geo"}
. Just doing{"geogeo": "geo"}
would not work sincegeo
already exists.This is a slight change from the existing implementation, but I think it is better since it allows for solving a much more general problem of dependent (i.e., transitive) remappings