-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend parse_sssom_table to report wrong prefixes and metadata #565
Conversation
This is very crude. Ideally I would add tests for every case it could fail but way over time limit now.
This also comes with the introduction of a strict mode, which allows draconian failing in cases where small violations are encountered (by default, we drop unknown metadata elements, and redefined built in prefixes). So the idea is: Look for some basic errors before the actual parsing occurs, fail if in strict mode, if not proceed to normal mode with repair.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, but mind that you are being stricter than what the spec requires.
logging.warning( | ||
f"A built-in URI namespace ({builtin_uri}) was used in (one of) the provided prefix map(s), " | ||
f"but the provided prefix ({reverse_bimap[builtin_uri]}) does not correspond to the " | ||
f"standard prefix: {builtin_prefix}. The prefix will be ignored." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ll just note that this case (standard prefix associated to a different prefix name than the one listed in the spec, e.g. som: https://w3id.org/sssom/
) is not forbidden by the spec. The spec only forbids associated a builtin prefix name to a different prefix (e.g., sssom: https://not-the-sssom-namespace.example.org/
).
The spec says:
By exception, prefix names listed in the table found in the IRI prefixes section are considered “built-in”. As such, they MAY be omitted from the curie_map. If they are not omitted, they MUST point to the same IRI prefixes as in the aforementioned table.
There is nothing here that forbids using another prefix name than a builtin one to refer to one of the known namespaces.
Not requesting any change here because SSSOM-Py is free to be stricter than the spec if it wants to.
I previously failed parsing in strict mode when a non-standard metadata element was encountered on mapping_set level. This here adds support for the case that someone legally specified an extension slot according to https://w3id.org/sssom/spec).
…tions Updated test case to show this works.
This also comes with the introduction of a strict mode, which allows draconian failing in cases where small violations are encountered (by default, we drop unknown metadata elements, and redefined built in prefixes).
So the idea is:
Look for some basic errors before the actual parsing occurs, fail if in strict mode, if not proceed to normal mode with repair.
@gouttegd feel free to check, I can already see three comments you will have, but its best I do small steps rather than none :)