Mistakes with IRI regular expressions #32
Comments
Thanks for the report! When there's time, we'll fix these bugs and add more tests. One aspect of this library that I never got around to was coming up with a better API. I dislike how url.URL has public fields, which allows the user to set fields to invalid values, so I didn't want to simply copy that approach. Did you consider alternatives when writing the fork?
Thank you for your response. The semi-closest that I came upon (on GitHub) was https://github.com/fogfish/iri (archived) and its successor https://github.com/fogfish/curie, which seems to have a different target.

As for the potential of invalid values: One approach could be to percent-escape values during

In my approach I let myself be guided more by the design of

(Here my thoughts went off - if not of interest, at least a thank you for rubber-ducking :) )

An unfinished thought was that the type itself could be a data object with no intelligence apart from

I also had the wild idea to remove the path-resolving code and internally create

Coming back to my unfinished thought, truly separating the two mentioned phases. The second phase would be in dedicated functions -- which would then exist for URI or IRI or whatever:
... contomap/iri is still in version
The code regarding the regular expressions used for `rdf/iri` contains mistakes:

- `iqueryRE` uses `ipath` instead of `iquery`, which is thus unused. When replacing it, further mistakes within `iquery` come to light:
  - `iprivate` contains an invalid regexp sequence: `\x{F0000]-\x{FFFFD}` should be `\x{F0000}-\x{FFFFD}`.
  - `iquery` wrongly uses "/?" as a sequence. This should be a choice, as in `[\/\?]`.
- `iuserinfo` is missing the colon character as per RFC. As such, the IRI `"https://user:[email protected]"` cannot be parsed.
- The `h16` regular expression should allow for 1-4 hex digits as per RFC, not require exactly 4 hex digits.

As a side-note, the example `"http://r&#xE9;sum&#xE9;.example.org"`, used for testing normalization, is not a proper IRI string. According to RFC chapter 1.4, the `&#xE9;` sequence is how non-US-ASCII characters are represented within a US-ASCII-only RFC text. The first `#` makes the remainder be considered a fragment, which would be invalid because of the second `#`.

I found these things as I was extracting the package as a separate library, handling all the TODOs (ending up in a large rework), and feeding in many samples from the RFC - especially those about resolving relative IRIs. See https://github.com/contomap/iri.
My rework makes it incompatible with your use in here (different type & behaviour), which is why I collect the mistakes I found only as an issue.
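For anyone who wants to reproduce the regular expression problems listed above in isolation, here is a minimal Go sketch. It is not the code from `rdf/iri` or from contomap/iri: all identifiers (`h16Fixed`, `userinfoFixed`, `queryChoice`, `compiles`, and so on) are hypothetical, the character classes are simplified to ASCII (the `ucschar` ranges of RFC 3987 are left out), and the sample inputs are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// Simplified, ASCII-only building blocks (assumption: ucschar ranges omitted).
const (
	hexDigit   = `[0-9A-Fa-f]`
	unreserved = `[A-Za-z0-9\-._~]`
	pctEncoded = `%` + hexDigit + hexDigit
	subDelims  = `[!$&'()*+,;=]`
	pchar      = `(?:` + unreserved + `|` + pctEncoded + `|` + subDelims + `|[:@])`
)

var (
	// h16: exactly four hex digits versus one to four hex digits.
	h16Exact = regexp.MustCompile(`^` + hexDigit + `{4}$`)
	h16Fixed = regexp.MustCompile(`^` + hexDigit + `{1,4}$`)

	// userinfo: without and with the colon in the allowed set.
	userinfoNoColon = regexp.MustCompile(`^(?:` + unreserved + `|` + pctEncoded + `|` + subDelims + `)*$`)
	userinfoFixed   = regexp.MustCompile(`^(?:` + unreserved + `|` + pctEncoded + `|` + subDelims + `|:)*$`)

	// query: "/?" as a two-character sequence versus "/" or "?" as a choice.
	querySequence = regexp.MustCompile(`^(?:` + pchar + `|/\?)*$`)
	queryChoice   = regexp.MustCompile(`^(?:` + pchar + `|[/?])*$`)
)

// Private-use ranges: the first pattern has "]" where "}" is needed.
const (
	brokenPrivate = `[\x{E000}-\x{F8FF}\x{F0000]-\x{FFFFD}\x{100000}-\x{10FFFD}]`
	fixedPrivate  = `[\x{E000}-\x{F8FF}\x{F0000}-\x{FFFFD}\x{100000}-\x{10FFFD}]`
)

// compiles reports whether Go's regexp package accepts the pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	fmt.Println("h16 exact-4 accepts \"1\":     ", h16Exact.MatchString("1"))
	fmt.Println("h16 1-4 accepts \"1\":         ", h16Fixed.MatchString("1"))
	fmt.Println("userinfo without colon:      ", userinfoNoColon.MatchString("user:secret"))
	fmt.Println("userinfo with colon:         ", userinfoFixed.MatchString("user:secret"))
	fmt.Println("query sequence on \"a=b/c\":   ", querySequence.MatchString("a=b/c"))
	fmt.Println("query choice on \"a=b/c?d\":   ", queryChoice.MatchString("a=b/c?d"))
	fmt.Println("broken iprivate compiles:    ", compiles(brokenPrivate))
	fmt.Println("corrected iprivate compiles: ", compiles(fixedPrivate))
}
```

Each pair puts the reported form next to the corrected one, so the difference shows up as a simple true/false flip. Inside a Go raw string the class `[/?]` is equivalent to the escaped `[\/\?]` mentioned above.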