Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support matching http | https #633

Closed
bradleeedmondson opened this issue Apr 5, 2018 · 16 comments
Closed

support matching http | https #633

bradleeedmondson opened this issue Apr 5, 2018 · 16 comments

Comments

@bradleeedmondson
Copy link
Contributor

bradleeedmondson commented Apr 5, 2018

As pointed out in #551, lots of licenses match out to URLs, either inline in plaintext or explicitly marked-up as links.

Typically these have been HTTP URLs, but to be good citizens we might want to allow matches to signed/encrypted HTTPS URLs as well. If we decide to do this, I can see two options.

Option (1) would be to update the spec to explicitly allow all URLs to match either HTTP or HTTPS prefixes. I'm assuming without checking here that the matching guidelines in the spec don't currently address this. (@goneall @kestewart ?)

Option (2) would be to identify and modify all license definition files with URLs to allow matching against either (the existing specific HTTP URL) | (a new specific HTTPS URL).

Does anyone have thoughts on this or feel strongly about it? I feel medium-strongly that we should accommodate HTTPS URLs, but I wouldn't push for a particular timeline. (I view it as a nice-to-have for the future, not a must-have for the near term.)

Edit:
Opened companion bug in spdx-spec repo: spdx/spdx-spec#81

@zvr
Copy link
Member

zvr commented Apr 7, 2018

I also believe that this should be handled in the matching guidelines and not in each and every license file.

@goneall
Copy link
Member

goneall commented Apr 7, 2018

I agree on adding this to the matching guidelines. It would be straightforward to explain in the matching guidelines text and it would also be straightforward to implement in a license matching algorithm.

Just a note - if this is added to the matching guidelines, we need to open an issue on the SPDX tools to implement the new guideline.

@wking
Copy link
Contributor

wking commented Apr 7, 2018

Option (2) would be to identify and modify all license definition files with URLs to allow matching against either (the existing specific HTTP URL) | (a new specific HTTPS URL).

This is my preference, because the less we have in our English-only matching guidelines, the simpler it is to write conformant matching code. I don't expect it would be that much trouble to mark up our XML to allow for this (for an example, see here). A benefit would be that we could explicitly choose the appropriate value (http or https) for each link, and avoid allowing matches where the upstream server only supported one or the other. For example, dev.bertos.org/doxygen (which we have here) has a broken HTTPS cert but a working HTTP redirect:

$ curl -sI http://dev.bertos.org/doxygen/ | grep -i '^HTTP\|location'
HTTP/1.1 301 Moved Permanently
Location: https://github.com/develersrl/bertos
$ curl -I https://dev.bertos.org/doxygen/
curl: (51) SSL: no alternative certificate subject name matches target host name 'dev.bertos.org'

That particular link is in a <crossRef> entry, so we're free to update it (I can file a PR for that later), but the same issue might apply to other links embedded in legal text, where we have less freedom to inject changes.

However, if the maintainers here feel happy with http(s) for all legal-text URLs, I'd rather add a post-processor (in LicenseListPublisher or a separate step in its pipeline) so the templates we release in license-list-data have explicit markup for this even if our XML here does not. That allows folks to stick with the simple matcher implementations (no need for them to add explicit code for a HTTP(S) rule) without needing updates to the XML here. And if, at some point in the future, we decide that we do want per-URL control, we can update our post-processor without affecting license-list-data and its downstream consumers.

@goneall
Copy link
Member

goneall commented Apr 8, 2018

However, if the maintainers here feel happy with http(s) for all legal-text URLs, I'd rather add a post-processor (in LicenseListPublisher or a separate step in its pipeline) so the templates we release in license-list-data have explicit markup for this even if our XML here does not

This is an easy change to make in LicenseListPublisher and I agree it makes it easier for other tool providers.

@bradleeedmondson
Copy link
Contributor Author

However, if the maintainers here feel happy with http(s) for all legal-text URLs, I'd rather add a post-processor (in LicenseListPublisher or a separate step in its pipeline) so the templates we release in license-list-data have explicit markup for this even if our XML here does not

This is an easy change to make in LicenseListPublisher and I agree it makes it easier for other tool providers.

Makes sense to me -- I just do not want legal to have to mark up every license for (http|https) if we agree that we're OK with a blanket rule allowing matching to HTTPS

mlinksva added a commit to mlinksva/licensee that referenced this issue Apr 19, 2018
Reflectng consensus in spdx/license-list-XML#633

Also allowing updating of some vendored licenses to https reflecting
what some license stewards have done eg
github/choosealicense.com#543 (comment)
without any risk of false negatives
@jlovejoy jlovejoy added this to the Later Release milestone Oct 26, 2018
@swinslow
Copy link
Member

Discussed on 2019-07-30 tech team call, agreed that this should be added to the matching guidelines to treat http and https as the same. @goneall noted that the tools have already been fixed to address this.

@goneall
Copy link
Member

goneall commented Jul 30, 2019

I found the "PR" with the proposal: #892 It was actually an issue and not a PR. The proposal is a duplicate of this issue but has some specific wording suggestions.

@goneall
Copy link
Member

goneall commented Jul 30, 2019

Here is the proposal from the duplicate issue amended with the proposal by @zvr to use the official nomenclature:

Propose adding the following to the matching guidelines:

  1. URL references

12.1 Purpose: To avoid a license mismatch merely because a different in http URI scheme for a URL reference (e.g. http:// vs. https://.

12.1.1 Guideline: http and https URI schemes shall be considered equivalent for matching purposes.

@swinslow
Copy link
Member

Looks good to me, with a couple of tweaks (changing 12 => 13 in all bullet points and a couple of typo tweaks). Also added a sentence at end for 13.1.1 re: templates, since some existing templates currently may have markup for this.

How does this look? If folks are good with it, I can update the website + add a PR to track the addition in the DOCS folder

Note that we would also need to bump the Matching Guidelines version number (currently v2.0). Because of that, I'd like to mention this on the next legal team call before we make it live.

= = = = =

  1. URL references

13.1 Purpose: To avoid a license mismatch merely because a different http URI scheme is used for a URL reference (e.g. http:// vs. https://).

13.1.1 Guideline: http and https URI schemes shall be considered equivalent for matching purposes. Templates may or may not include markup for this guideline.

@goneall
Copy link
Member

goneall commented Jul 30, 2019

@swinslow I'm thinking we should just remove the occurrences of http<optional>s</optional>. I'll re-create the PR since the tools are now in place to support this.

@wking
Copy link
Contributor

wking commented Jul 31, 2019

Explicit <optional>s</optional> seems easy to add to templates and makes it easy to write new matching implementations. I'd rather make this a template-entry policy and not a template-consumption rule.

@goneall
Copy link
Member

goneall commented Aug 1, 2019

Having the legal team manually add the http<optional>s</optional> for all new licenses would be quite a burden IMHO.

I guess there isn't much harm in retaining the http<optional>s</optional> in the existing licenses other than possible confusion from the inconsistency.

I did add PR #909 which removes the http<optional>s</optional> if we want to go that route.

@swinslow @zvr thoughts?

@wking
Copy link
Contributor

wking commented Aug 1, 2019

Having the legal team manually add the http<optional>s</optional> for all new licenses would be quite a burden IMHO.

So add it in tooling in the form that accepts new license submissions. Or run a fixup script here before cutting a release. Or in the tools that publish to the data repo. As long as the data we provide to our end-users has explicit markup for optional https, it will satisfy my "make it easy to implement from-scratch matchers" goal.

@goneall
Copy link
Member

goneall commented Oct 16, 2019

As long as the data we provide to our end-users has explicit markup for optional https, it will satisfy my "make it easy to implement from-scratch matchers" goal.

I agree adding the explicit markup automatically will make it easier for the downstream tools providers. However, we already have a number of items in the matching guidelines (e.g. spaces, hyphens, equivalent words) that do not have explicit markup.

The most straight forward approach would be to enhance the licenseListPublisher to add the markups to the license list data.

Pull requests to add this are more than welcome.

@swinslow
Copy link
Member

swinslow commented Feb 7, 2021

@goneall Since the HTTPS matching guideline has been added, and there haven't been further comments on this issue since 2019, are you good with us closing this issue? Thanks!

@goneall
Copy link
Member

goneall commented Feb 7, 2021

FYI - The license matcher code used in the SPDX Online Tools supports the http/s matching guidelines.

Since the HTTPS matching guideline has been added, and there haven't been further comments on this issue since 2019, are you good with us closing this issue?

@swinslow Yes - I'll go ahead and close this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants