Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-implement reconciliation code with curies #426

Merged
merged 14 commits into from
Sep 26, 2023
Merged

Conversation

cthoyt
Copy link
Member

@cthoyt cthoyt commented Sep 15, 2023

This PR replaces the reconcile_prefix_and_data with more carefully defined operations from the curies package (implemented in biopragmatics/curies#74 and biopragmatics/curies#78).

The new implementation is built on curies.remap_curie_prefixes and curies.rewire. Note that if you want to overwrite a CURIE prefix, e.g., in the Bioregistry extended prefix map, you need to provide a place for the old one to go as in {"geo": "ncbi.geo", "geogeo": "geo"}. Just doing {"geogeo": "geo"} would not work since geo already exists.

This is a slight change from the existing implementation, but I think it is better since it allows for solving a much more general problem of dependent (i.e., transitive) remappings

cthoyt added a commit to biopragmatics/curies that referenced this pull request Sep 19, 2023
This PR was motivated by
mapping-commons/sssom-py#426 (and serves as a
follow-up to mapping-commons/sssom-py#216). It
implements two functions that are high-level ways to rewire a given
converter.
@cthoyt cthoyt requested a review from matentzn September 19, 2023 09:42
@cthoyt
Copy link
Member Author

cthoyt commented Sep 19, 2023

This PR is finished but is currently failing CI because of non-deterministic issues with poetry (is it time to get rid of poetry yet 😛?)

@cthoyt
Copy link
Member Author

cthoyt commented Sep 19, 2023

@hrshdhgd please help, we have a poetry issue again

.github/workflows/qc.yml Outdated Show resolved Hide resolved
@@ -1105,62 +1105,25 @@ def reconcile_prefix_and_data(
"""
# Discussion about this found here:
# https://github.com/mapping-commons/sssom-py/issues/216#issue-1171701052
converter = msdf.converter
converter = curies.remap_curie_prefixes(converter, prefix_reconciliation["prefix_synonyms"])
converter = curies.rewire(converter, prefix_reconciliation["prefix_expansion_reconciliation"])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use constants instead of strings to satisfy SRP?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what SRP means, but let's save updating this data structure for a subsequent PR

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Single responsibility principle: classes/methods/functions should only have one reason to change. If this string changes for whatever reason, many different classes and places in the code will have to change as well.

"r": "rdfs",
"o": "owl",
},
"prefix_expansion_reconciliation": {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename expansion_reconciliation to rewiring to maintain consistent terminology?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about this?

tests/test_collapse.py Show resolved Hide resolved
self.assertEqual(recon_msdf.prefix_map[pfx], exp)
self.assertEqual(
{
"a": "http://example.org/a/",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is "a" still present? Shouldn't it have been remapped to c before it go rewired?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mapping "a":"c" got ignored since c is already present in the prefix map and this would cause an inconsistency

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the question is: do we want to add an "overwrite" setting? I don't think that having this be default functionality makes sense. if you don't want original prefix c in the map, then maybe there should be a 3rd part to this workflow for explicit deletions

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm interesting. I think we would like a --force option, which results in the removal of the already existing entry, and a logging.warning(). What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think having a flag would be the way to go. I will mock that in the curies package.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a solution! Pushing now

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to work nicely! My new solution doesn't allow deletions, but requires that if you remap a->c and there is already a c, you need to give c-> something else. Seems reasonable to me

tests/test_collapse.py Outdated Show resolved Hide resolved
@cthoyt cthoyt requested a review from matentzn September 24, 2023 12:26
Copy link
Collaborator

@matentzn matentzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reasons not to accept, but two questions still open.

.github/workflows/qc.yml Show resolved Hide resolved
@@ -1105,62 +1105,25 @@ def reconcile_prefix_and_data(
"""
# Discussion about this found here:
# https://github.com/mapping-commons/sssom-py/issues/216#issue-1171701052
converter = msdf.converter
converter = curies.remap_curie_prefixes(converter, prefix_reconciliation["prefix_synonyms"])
converter = curies.rewire(converter, prefix_reconciliation["prefix_expansion_reconciliation"])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Single responsibility principle: classes/methods/functions should only have one reason to change. If this string changes for whatever reason, many different classes and places in the code will have to change as well.

"r": "rdfs",
"o": "owl",
},
"prefix_expansion_reconciliation": {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about this?

@matentzn matentzn merged commit 11fde9d into master Sep 26, 2023
@cthoyt cthoyt deleted the update-reconciliation branch September 26, 2023 10:55
@matentzn
Copy link
Collaborator

Ok, I merged it now under the flag of progress 🙏

@matentzn
Copy link
Collaborator

@hrshdhgd can you comment on wether you agree to the change in the action?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants