Should elements suffixed by "Url" be xs:anyURI? #87
Comments
@demcg yes I believe that is right. |
@demcg Yes and probably with a pattern to only accept HTTP(S) protocols. |
NB: there may be reason for other protocols in the future, but, for the current URIs, HTTP(S) seems the only appropriate option/restriction. |
@jungshadow
I am not sure of the syntax, so please fix as required. |
@demcg testing my regex skills, but I think this should work.
|
@pstenbjorn Are there situations where the protocols would be in all caps? That seems…interesting. Should that be allowed or flagged as something to fix? |
I should add that I know it's a part of the spec, but it may be off-putting. |
@jungshadow not really. If we want to enforce a lowercase protocol, I think we are safe in flagging it as something needing fixing. So that would make the regex
|
@pstenbjorn will this handle encoded spaces '%20', '#', or query strings '?,&,='? We might want to stay simple, because fully supporting all possible URL styles is a big task. Thanks |
@demcg thanks to the link you provided me, the following is a validation RegEx that seems to account for both allowed and disallowed URIs. Scary to embed this into a schema, but it works.
|
@pstenbjorn we could remove "|ftp" ... but that is not much savings. Wow that is a big RegEx, but we only need it in one place ... |
@demcg Agree on both counts: we certainly don't need to permit ftp URIs, and it does only need to be declared once. @jungshadow and @jdmgoogle what are your thoughts? |
It seems like overkill to try to have a regex validating an entire url. Why not just check for an initial "http(s)://"? Also, how do we know that the regex is correct and that it doesn't have false negatives, etc? |
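For illustration only, here is a minimal Python sketch of the "just check for an initial http(s)://" idea. It is not the regex proposed earlier in this thread, and the names `HTTP_PREFIX`, `has_http_prefix`, and `has_uppercase_scheme` are made up for this example. It assumes a lowercase scheme is required and deliberately validates nothing after the prefix, so encoded spaces, '#', and query strings pass through untouched.

```python
import re

# Hypothetical pattern (an assumption, not the schema's actual restriction):
# a lowercase http:// or https:// prefix followed by at least one
# non-whitespace character. The rest of the URL is not validated.
HTTP_PREFIX = re.compile(r"https?://\S+")


def has_http_prefix(url: str) -> bool:
    """Return True if url starts with a lowercase http:// or https:// scheme."""
    return HTTP_PREFIX.fullmatch(url) is not None


def has_uppercase_scheme(url: str) -> bool:
    """Return True if url has an http(s) scheme written with uppercase letters.

    Such URLs fail the strict check above and could be flagged as fixable.
    """
    loose_match = re.fullmatch(r"https?://\S+", url, re.IGNORECASE)
    return loose_match is not None and not has_http_prefix(url)


if __name__ == "__main__":
    samples = [
        "https://example.com/a%20b?x=1&y=2#top",
        "HTTP://example.com",
        "ftp://example.com/ballot.pdf",
        "example.com",
    ]
    for sample in samples:
        print(sample, has_http_prefix(sample), has_uppercase_scheme(sample))
```

A check this loose cannot reject a malformed host or path, but it would catch the missing and malformed protocols, which are the cases a full-URL regex is hardest to get right for anyway.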
I'm actually not that concerned about restricting it to http(s), so I'm okay with just leaving it at anyURI. |
Out of curiosity, when would validations like these be applied (and are they ever)? I would also be okay with not imposing a restriction. |
I imagine we (Google and Pew) would encourage data providers to do a test validation before publishing their VIP feeds. I also imagine that we (Google) would perform a validation prior to parsing the file. An http(s)-only regexp would be useful but (a) could be overkill, and (b) could limit jurisdictions who might do something weird like host sample ballots on an FTP site or something. Plus regexps are really hard to get perfectly right, so IMHO the cost/benefit tradeoff isn't worth it. I may be wrong, though, YMMV, etc, etc. |
I wouldn't have brought it up if the URLs weren't so widely inconsistent (e.g. some have a protocol, some have a malformed protocol, some have no protocol...all in the same feed). In the end, I'd like the URLs to work in various tools (ours and others) and, currently, they fail more often than I'd like. Maybe just specifying |
Just made up my mind that it should be good enough. |
Address Issue #87 - changed Urls of type xs:string to type xs:anyURI
We have many elements named "...Url" that are of type xs:string. Should they be xs:anyURI?