New test: Presence of license information #1038

Closed
david-a-wheeler opened this issue Sep 17, 2021 · 28 comments · Fixed by #1178

Comments

@david-a-wheeler
Contributor

Is your feature request related to a problem? Please describe.
The lack of any clearly-stated license creates a risk in use, and would definitely reduce the likelihood of current & future review. It doesn't appear that scorecard is focused on just OSS, but even proprietary software needs a clearly stated license.

Describe the solution you'd like
Add a new test with a description such as: "The source repository must include a clearly stated license, as lack of a license will impede any kind of security review/audit and creates a legal risk for potential users. This is also required by the CII Best Practices badge criterion license_location".
To determine if there's a clearly-stated license, look for a file named
LICENSE{,.txt,.md} or COPYING{,.txt,.md} or a directory named LICENSES.

Here's the description from the CII Best Practices badge license_location:

The project MUST post the license(s) of its results
in a standard location in their source repository.
details:
One convention is posting the license
as a top-level file named LICENSE or COPYING, which
MAY be followed by an extension such as ".txt" or ".md".
An alternative convention is to have a directory named LICENSES
containing license file(s); these files are typically named as their
SPDX license identifier followed by an appropriate file extension,
as described in the
REUSE Specification.

@david-a-wheeler david-a-wheeler added the kind/enhancement New feature or request label Sep 17, 2021
@laurentsimon laurentsimon added the good first issue Good for newcomers label Sep 20, 2021
@laurentsimon
Contributor

the Security policy check is a good starting point for this https://github.com/ossf/scorecard/blob/main/checks/security_policy.go

@nanikjava
Contributor

nanikjava commented Sep 24, 2021

Happy to take this on @laurentsimon @david-a-wheeler please assign the ticket to me.

Thanks.

@david-a-wheeler
Contributor Author

@nanikjava - great, thanks!

I just remembered that .html is another common format for licenses, and obviously other formats might become common in the future. So I'd just look for these ERE/PCRE regex patterns:

^LICENSE(\.[a-z]+)?$
^COPYING(\.[a-z]+)?$
LICENSES

I've written them as case-sensitive, but I understand if you choose to match them as case-insensitive (C/POSIX locale).
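
For illustration, here is a minimal Python sketch of matching those three patterns. The function name `is_license_name` and the `ignore_case` switch are made up for this example, and treating `LICENSES` as an anchored, directory-only match is an assumption rather than something stated above.

```python
import re

# File patterns copied from the comment above.
FILE_PATTERNS = [r"^LICENSE(\.[a-z]+)?$", r"^COPYING(\.[a-z]+)?$"]
# Assumption: LICENSES names a directory, so it is anchored and only
# tested against directory names.
DIR_PATTERN = r"^LICENSES$"


def is_license_name(name: str, is_dir: bool = False, ignore_case: bool = False) -> bool:
    """Return True if a top-level entry name matches one of the patterns."""
    flags = re.IGNORECASE if ignore_case else 0
    patterns = [DIR_PATTERN] if is_dir else FILE_PATTERNS
    return any(re.search(p, name, flags) for p in patterns)
```

With the default case-sensitive matching, `LICENSE.html` is accepted and `LICENSE.TXT` is not; passing `ignore_case=True` accepts both.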

@laurentsimon
Contributor

Thanks @nanikjava. Let's make it a standalone check.

@david-a-wheeler
Contributor Author

If you're looking for more info, here's how licensee looks for license files including its regexes for license files.

The key thing to learn is that people aren't consistent about spelling, so it might be good to allow both LICENSE and LICENCE. It appears that LICENCE is the UK spelling for the noun form.

Also, you might allow both COPYING and COPYRIGHT.

Licensee allows for a number of more quirky cases, e.g., LICENSE-name_of_licence.extensions, COPYING-name_of_license.extensions, and name_of_license-COPYING.extensions. Not sure those are important, but those could be allowed, at least for common licenses like MIT, Apache, BSD, GPL, and LGPL.
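
As a rough sketch of what accepting those variants could look like (this is a hypothetical pattern written for this example, not licensee's actual regex, and the list of license names is just the handful mentioned above):

```python
import re

# Hypothetical building blocks, not taken from licensee.
STEM = r"(?:LICEN[SC]E|COPYING|COPYRIGHT)"   # both spellings, plus COPYRIGHT
KNOWN = r"(?:MIT|APACHE|BSD|GPL|LGPL)"       # common licenses named in the comment
EXT = r"(?:\.[a-z]+)?"                       # optional extension

# Accepts STEM, STEM-name, or name-STEM, each with an optional extension.
PATTERN = re.compile(
    rf"^(?:{STEM}(?:-{KNOWN})?|{KNOWN}-{STEM}){EXT}$",
    re.IGNORECASE,
)


def looks_like_license_file(name: str) -> bool:
    """Return True if the file name fits one of the quirky license forms."""
    return PATTERN.match(name) is not None
```

Limiting the `-name` forms to a short list of known licenses keeps the pattern from matching arbitrary files such as `LICENSE-unknown.txt`.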

@nanikjava
Contributor

A few things to confirm:

  • This will be a new check; shall we call it license_check.go?
  • Please help me to fill in the checks.yaml which I've included below:
  License:
    risk: ????
    short: ?????
    tags: ????
    description: >-
      The project MUST post the license(s) of its results in a standard location
      in their source repository.
      details:
      One convention is posting the license as a top-level file named LICENSE or
      COPYING, which MAY be followed by an extension such as ".txt" or ".md".
      An alternative convention is to have a directory named LICENSES containing
      license file(s);
      these files are typically named as their SPDX license identifier followed
      by an appropriate file extension, as described in the REUSE Specification.
    remediation:
      - >-
        ??????

@laurentsimon
Contributor

laurentsimon commented Oct 6, 2021

Thank you so much for your help, @nanikjava !

risk -> Low
short -> Determines if the project has defined a license
tags -> license
description -> @olivekl can you chime in?
remediation -> use what you have in description now. @olivekl to confirm

We'll add @david-a-wheeler as reviewer to your PR.

@olivekl
Contributor

olivekl commented Oct 6, 2021

Here's a description and remediation. Please confirm I've linked to the correct places for SPDX, REUSE, and SPDX License Identifier?

(The | at the start of the description will preserve formatting.)

Description: |

Risk: Low (possible impediment to security review)

This check tries to determine if the project has published a license. It works by checking standard locations for a file named according to common conventions for licenses.

A license can give users information about how the source code may or may not be used. The lack of a license will impede any kind of security review or audit and creates a legal risk for potential users.

This check will detect files in the top-level directory with any combination of the following names and extensions: LICENSE, LICENCE, COPYING, COPYRIGHT and .html, .txt, .md. It will also detect these files in a directory named LICENSES. (Files in a LICENSES directory are typically named as their SPDX license identifier followed by an appropriate file extension, as described in the REUSE Specification.)

Remediation:

  • Determine which license to apply to your project.
  • Create the license in a .txt, .html, or .md file named LICENSE or COPYING, and place it in the top-level directory.
  • Alternately, create a LICENSES directory and add license files with a name that matches your SPDX license identifier.
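
To make the detection rule in the description concrete, here is a small Python sketch working on repository-relative paths. It is only an illustration: the function name `is_license_path` is hypothetical, and it assumes that a bare name with no extension is accepted and that any file directly inside a top-level LICENSES directory counts.

```python
from pathlib import PurePosixPath

# Names and extensions listed in the description above.
NAMES = {"LICENSE", "LICENCE", "COPYING", "COPYRIGHT"}
# Assumption: "" (no extension) is allowed alongside the listed extensions.
EXTENSIONS = {"", ".html", ".txt", ".md"}


def is_license_path(path: str) -> bool:
    """Return True if a repository-relative path looks like a license file."""
    p = PurePosixPath(path)
    if len(p.parts) == 1:
        # Top-level file: the name and extension must both be on the lists.
        return p.stem in NAMES and p.suffix in EXTENSIONS
    # Assumption: any file directly inside a top-level LICENSES directory counts,
    # since those files are typically named after an SPDX identifier.
    return len(p.parts) == 2 and p.parts[0] == "LICENSES"
```

Under these assumptions `COPYRIGHT.html` and `LICENSES/MIT.txt` are detected, while `docs/LICENSE.md` and `LICENSE.rst` are not.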

@laurentsimon
Contributor

Thanks @olivekl !

@nanikjava
Contributor

nanikjava commented Oct 30, 2021

@david-a-wheeler Does the check need to keep checking after it finds a license file? In other words, is the check complete once it finds a single license file?

The reason for the question: if the check goes through a repository such as https://github.com/ghosthgy/Lienolopenwrt, which has multiple licenses in the project itself, is it sufficient for the check to find a single license and be done with it?

@nanikjava
Contributor

@laurentsimon Any thoughts about this #1038 (comment) ?

@laurentsimon
Contributor

You can stop once the first license is found. Having them all is useful in itself, but there are dedicated tools to deal with licenses in general. I think it's good enough for us to inform the user whether a project has published a license or not at all. Also, some projects vendor dependencies so may have thousands of licenses which would clobber the scorecard output AND increase running time without providing actual benefits.
If someone really disagrees, it's easy to change. So let's go with the simple/fast solution for now.
Let me know if you disagree.
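
A minimal Python sketch of the stop-at-first-match behaviour, under the assumption that the repository listing can be consumed lazily. The names `first_license` and `is_license` are hypothetical and not part of scorecard.

```python
from typing import Callable, Iterable, Optional


def first_license(paths: Iterable[str], is_license: Callable[[str], bool]) -> Optional[str]:
    """Return the first path accepted by is_license, or None if there is none.

    next() stops pulling from the iterable at the first match, so a lazy
    listing with thousands of vendored licenses is not walked to the end.
    """
    return next((p for p in paths if is_license(p)), None)


def example_listing():
    # Stand-in for a repository listing that vendors many dependencies.
    yield "README.md"
    yield "LICENSE"
    yield from (f"vendor/dep{i}/LICENSE" for i in range(10_000))


found = first_license(example_listing(), lambda p: p.rsplit("/", 1)[-1] == "LICENSE")
```

Here `found` is the top-level `LICENSE` entry, and the vendored entries after it are never generated.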

@laurentsimon
Contributor

FYI, some users state the license in their README; see https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/licensing-a-repository
Let's not worry about this for now.

@david-a-wheeler
Contributor Author

david-a-wheeler commented Nov 6, 2021 via email

@naveensrinivasan
Member

FYI, some user inform about license in their README...
Yes, but I think it's fine to not accept this as adequate. The problem is that when applications have thousands of dependencies, having the license only in a Readme is not adequate notification to potential users about their obligations, because it's not easily machine readable. It may be legally adequate, but it is not good practice because this poor notification is hostile to users.

I agree. This is a project by itself to figure it out. Could we utilize this https://github.com/google/licensecheck?

@nanikjava
Contributor

You can stop once the first license is found. Having them all is useful in itself, but there are dedicated tools to deal with licenses in general. I think it's good enough for us to inform the user whether a project has published a license or not at all.

Thanks for the confirmation.

@david-a-wheeler
Contributor Author

@nanikjava - I agree, stop when you find a license. The point of this criterion is to assert that there is clear license information. This isn't a tool to determine exactly what the license is, just that there's information clearly available, so 1 license file is enough to determine this.

@evverx
Contributor

evverx commented Dec 5, 2021

I'm puzzled as to why this has been integrated into scorecard because it doesn't seem to have anything to do with security. Apart from "as lack of a license will impede any kind of security review/audit" (which in my opinion seems to be a stretch) I'm not sure I understand why it's here. I agree it's kind of important but given that it can't even tell open source projects and proprietary projects apart I'm not sure what exactly people are supposed to do with that information.

@evverx
Contributor

evverx commented Dec 5, 2021

I'm not sure why the remediation steps don't include disclaimers like:

We hope it helps, but please keep in mind that we’re not lawyers and that we make mistakes like everyone else. For that reason, GitHub provides the information on an "as-is" basis and makes no warranties regarding any information or licenses provided on or through it, and disclaims liability for damages resulting from using the license information. If you have any questions regarding the right license for your code or any other legal issues relating to it, it’s always best to consult with a professional.

I'm pretty sure that the most important part there is "it’s always best to consult with a professional" and it should probably be among the remediation steps.

@david-a-wheeler
Contributor Author

It's indirect, but making it impractical to do a security review & audit is considered by many to also be a security problem. It's not solely a security problem of course, but after discussion it was accepted.

It's always best to consult with a professional, but for many developers that's impractical. Most developers don't have thousands of extra dollars lying around to pay a lawyer for a legal review.

@evverx
Contributor

evverx commented Dec 5, 2021

Most developers don't have thousands of extra dollars lying around to pay a lawyer for a legal review.

If they don't it's probably safe to say that they aren't actually interested in whether their dependencies are licensed properly or not. And that false sense of security they get looking at the output of scorecard saying that projects are licensed is what it is: a false sense of security. Given that scorecard can't tell licences apart it can mark projects using bespoke licenses preventing people from benchmarking, hacking and so on as "licensed" but it seems those kind of licenses aren't exactly what most people analyzing their dependencies are looking for.

@laurentsimon
Contributor

cc @olivekl

@david-a-wheeler
Contributor Author

@evverx:

If they don't it's probably safe to say that they aren't actually interested in whether their dependencies are licensed properly or not.

I do not think it's safe to say that; I think that's unlikely to be true. Someone can believe something's important (such as licensing) and not be rich. I'm not rich. Most OSS developers pick a common OSS license without consulting a lawyer, and that usually works just fine because the common OSS licenses (such as MIT, Apache-2.0, LGPL, and GPL) were created with lawyers specifically to make this kind of collaboration possible.

The reference to dependencies here is irrelevant. Scorecard evaluates a particular project, not its transitive dependencies, and that's by intent. For many projects it'd be impossible to require a scorecard rank of all transitive dependencies before they could get a given rank, there are too many dependencies. It's true that dependencies do matter for security (completely agree!), but if we try to include transitive dependencies in the score, the result would be something no one would (or could) adopt.

As far as getting a "false sense of security" from not telling the licenses apart.... fair enough!! That concern makes a lot of sense to me. I think the problem is that the tool is currently too coarse; it's just 0 or 10, but there are clearly important gradations. It would be possible to tell if a license is known OSI-approved or not. We could give OSI-approved licenses a score of 10, while giving other licenses a lower score like "5" (as there's more risk that the license might not be OSS and/or have nasty clauses), and give "no license found" a 0 (in that case we have no evidence that it's legal to use at all). There are several tools that can look at the license text to do a deeper analysis. I think we could just use one of several existing tools. A quick solution, if the repo is on GitHub, is to extract the GitHub analysis result from GitHub itself (GitHub provides this in its API).
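
A rough Python sketch of that 10 / 5 / 0 gradation, purely to illustrate the idea. The function name `license_score` is hypothetical, the SPDX identifier is assumed to come from some external license-detection tool, and `OSI_APPROVED` is a tiny illustrative subset rather than the real list.

```python
from typing import Optional

# Illustrative subset only; a real check would need the full OSI-approved list.
OSI_APPROVED = {"MIT", "Apache-2.0", "BSD-3-Clause", "GPL-3.0-only", "LGPL-3.0-only"}


def license_score(found: bool, spdx_id: Optional[str]) -> int:
    """Map license information to the proposed gradation.

    found   -- whether any license file was detected
    spdx_id -- SPDX identifier reported by a detection tool, or None if unknown
    """
    if not found:
        return 0   # no evidence that the project is legal to use at all
    if spdx_id in OSI_APPROVED:
        return 10  # known OSI-approved license
    return 5       # some license exists, but it may not be OSS
```

A license file whose text cannot be identified would land in the middle tier here, which matches the "other licenses" case described above.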

This is a new issue though; this issue (#1038) is closed & completed. If you think there should be a finer grain on licensing (& I'd support that!), let's create a new issue to discuss it. Most people ignore closed issues - they're closed.

@evverx
Contributor

evverx commented Dec 6, 2021

The reference to dependencies here is irrelevant

Having read https://github.com/ossf/scorecard#what-is-scorecards I was under the impression that scorecard is supposed to be used by consumers of open-source projects trying to figure out whether their dependencies are safe:

We created Scorecards to give consumers of open-source projects an easy way to judge whether their dependencies are safe.

so I'm not sure why dependencies are irrelevant here.

@evverx
Contributor

evverx commented Dec 6, 2021

FWIW I wasn't talking about transitive dependencies. I agree scorecard shouldn't even try to do that. What I meant is that people running scorecard do it to figure out whether projects they depend on are safe. Developers can use scorecard too of course, but I think I was already told once in one of the PRs/issues that they aren't the target audience of scorecard.

@evverx
Contributor

evverx commented Dec 6, 2021

Anyway, personally, I think that this check is completely misleading at best (even if it could tell some licences apart). But since it isn't exactly the hill I want to die on I don't think it makes much sense to keep discussing it here :-)

@david-a-wheeler
Contributor Author

No need to die on a hill :-). But a closed issue isn't really a good place to discuss a change, most people will never see this discussion.

I've created a new issue to track the license gradation idea: #1369. I think it makes sense, and hopefully captures at least some of your concerns. For that point, let's discuss the issue there.

@david-a-wheeler
Contributor Author

BTW, I do think developers are potential users of scorecard!!

6 participants