Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend parse_sssom_table to report wrong prefixes and metadata #565

Merged
merged 5 commits into from
Dec 7, 2024

Conversation

matentzn
Copy link
Collaborator

@matentzn matentzn commented Dec 7, 2024

This also comes with the introduction of a strict mode, which allows draconian failing in cases where small violations are encountered (by default, we drop unknown metadata elements, and redefined built in prefixes).

So the idea is:

Look for some basic errors before the actual parsing occurs, fail if in strict mode, if not proceed to normal mode with repair.

@gouttegd feel free to check, I can already see three comments you will have, but its best I do small steps rather than none :)

This is very crude. Ideally I would add tests for every case it could fail but way over time limit now.
This also comes with the introduction of a strict mode, which allows draconian failing in cases where small violations are encountered (by default, we drop unknown metadata elements, and redefined built in prefixes).

So the idea is:

Look for some basic errors before the actual parsing occurs, fail if in strict mode, if not proceed to normal mode with repair.
@matentzn matentzn requested a review from gouttegd December 7, 2024 09:07
Copy link
Contributor

@gouttegd gouttegd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but mind that you are being stricter than what the spec requires.

logging.warning(
f"A built-in URI namespace ({builtin_uri}) was used in (one of) the provided prefix map(s), "
f"but the provided prefix ({reverse_bimap[builtin_uri]}) does not correspond to the "
f"standard prefix: {builtin_prefix}. The prefix will be ignored."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ll just note that this case (standard prefix associated to a different prefix name than the one listed in the spec, e.g. som: https://w3id.org/sssom/) is not forbidden by the spec. The spec only forbids associated a builtin prefix name to a different prefix (e.g., sssom: https://not-the-sssom-namespace.example.org/).

The spec says:

By exception, prefix names listed in the table found in the IRI prefixes section are considered “built-in”. As such, they MAY be omitted from the curie_map. If they are not omitted, they MUST point to the same IRI prefixes as in the aforementioned table.

There is nothing here that forbids using another prefix name than a builtin one to refer to one of the known namespaces.

Not requesting any change here because SSSOM-Py is free to be stricter than the spec if it wants to.

I previously failed parsing in strict mode when a non-standard metadata element was encountered on mapping_set level. This here adds support for the case that someone legally specified an extension slot according to https://w3id.org/sssom/spec).
src/sssom/parsers.py Outdated Show resolved Hide resolved
@matentzn matentzn merged commit e596d70 into master Dec 7, 2024
8 checks passed
@matentzn matentzn deleted the report-prefix-issues branch December 7, 2024 19:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants