scancode-toolkit-31.0.2 returns an unknown-license-reference just before the bzip2-libbzip-2010 text #3080

Open
DennisClark opened this issue Sep 1, 2022 · 1 comment

@DennisClark
Contributor

I scanned doris-1.1.1-rc03 (available at https://github.com/apache/doris/archive/refs/tags/1.1.1-rc03.tar.gz)
using scancode-toolkit-31.0.2.
It detected most of the licenses in the rather complex notice (attached) in
doris-1.1.1-rc03/dist/LICENSE-dist.txt
but it returns both unknown-license-reference and bzip2-libbzip-2010 for this chunk of text:

be/src/gutil/valgrind.h: licensed under the following terms:

   This file is part of Valgrind, a dynamic binary instrumentation
   framework.

   Copyright (C) 2000-2008 Julian Seward.  All rights reserved.

   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

   1. Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.

   2. The origin of this software must not be misrepresented; you must
      not claim that you wrote the original software.  If you use this
      software in a product, an acknowledgment in the product
      documentation would be appreciated but is not required.

   3. Altered source versions must be plainly marked as such, and must
      not be misrepresented as being the original software.

   4. The name of the author may not be used to endorse or promote
      products derived from this software without specific prior written
      permission.

   THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
   OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
   WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
   ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
   DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
   DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
   GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
   INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
   WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

See lines 9190 through 9257 in the attached scan results to see both detection instances.

Apparently the "licensed under the following terms:" text snippet misled the scan logic, even though the scan correctly found the bzip2-libbzip-2010 license right after it. There is no reason to return unknown-license-reference for the introductory sentence, which is there primarily to provide clarity to the reader of the file.

LICENSE-dist.txt.zip

doris-1.1.1-rc03-results.json.zip

@AyanSinhaMahapatra
Contributor

@DennisClark this is already fixed in the LicenseDetection branch for the upcoming release: https://github.com/nexB/scancode-toolkit/tree/add-license-detection.

Similar to Issue 2 in #3069 (comment), and to the issue reported by the Eclipse Foundation in #2878 (comment), this is solved by:

Here the detection rule is "unknown-intro-followed-by-match": an unknown license intro is followed by a proper license detection, so the unknown can be dropped from the resulting license expression. This is achieved by tagging specific rules with is_license_intro set to True.
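
To make the idea concrete, here is a minimal sketch of that rule applied to plain dicts shaped like the matches in the JSON output. It is not the toolkit's actual code: the function name `resolve_detection` and the " AND " joining of expressions are assumptions for illustration; only the field names `license_expression` and `is_license_intro` and the rule name come from the scan output.

```python
def resolve_detection(matches):
    """Compute a detection summary from matches given in file order.

    Hypothetical helper: if the first match is a license intro and at
    least one match follows it, the intro's expression is left out of
    the detection's license expression.
    """
    rules = []
    counted = list(matches)
    if len(matches) >= 2 and matches[0].get("is_license_intro", False):
        counted = matches[1:]
        rules.append("unknown-intro-followed-by-match")

    # Keep each expression once, in order of first appearance.
    expressions = []
    for match in counted:
        expression = match["license_expression"]
        if expression not in expressions:
            expressions.append(expression)

    return {
        "license_expression": " AND ".join(expressions),
        "detection_rules": rules,
    }


example = [
    {"license_expression": "unknown-license-reference", "is_license_intro": True},
    {"license_expression": "bzip2-libbzip-2010", "is_license_intro": False},
]
print(resolve_detection(example))
```

Note that in this sketch the intro match is only excluded from the expression; the caller still has the full list of matches, which mirrors how the intro remains listed under "matches" in the output.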

New license detection looks like this:

      "detected_license_expression": "bzip2-libbzip-2010",
      "detected_license_expression_spdx": "bzip2-1.0.6",
      "license_detections": [
        {
          "license_expression": "bzip2-libbzip-2010",
          "detection_rules": [
            "unknown-intro-followed-by-match"
          ],
          "matches": [
            {
              "score": 100.0,
              "start_line": 1,
              "end_line": 1,
              "matched_length": 5,
              "match_coverage": 100.0,
              "matcher": "2-aho",
              "license_expression": "unknown-license-reference",
              "rule_identifier": "license-intro_4.RULE",
              "referenced_filenames": [],
              "is_license_text": false,
              "is_license_notice": false,
              "is_license_reference": false,
              "is_license_tag": false,
              "is_license_intro": true,
              "rule_length": 5,
              "rule_relevance": 100,
              "matched_text": "licensed under the following terms:",
              "licenses": [
                {
                  "key": "unknown-license-reference",
                  "name": "Unknown License file reference",
                  "short_name": "Unknown License reference",
                  "category": "Unstated License",
                  "is_exception": false,
                  "is_unknown": true,
                  "owner": "Unspecified",
                  "homepage_url": null,
                  "text_url": "",
                  "reference_url": "https://scancode-licensedb.aboutcode.org/unknown-license-reference",
                  "scancode_text_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/unknown-license-reference.LICENSE",
                  "scancode_data_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/unknown-license-reference.yml",
                  "spdx_license_key": "LicenseRef-scancode-unknown-license-reference",
                  "spdx_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/unknown-license-reference.LICENSE"
                }
              ]
            },
            {
              "score": 100.0,
              "start_line": 8,
              "end_line": 37,
              "matched_length": 233,
              "match_coverage": 100.0,
              "matcher": "2-aho",
              "license_expression": "bzip2-libbzip-2010",
              "rule_identifier": "bzip2-libbzip-2010.LICENSE",
              "referenced_filenames": [],
              "is_license_text": true,
              "is_license_notice": false,
              "is_license_reference": false,
              "is_license_tag": false,
              "is_license_intro": false,
              "rule_length": 233,
              "rule_relevance": 100,
              "matched_text": "Redistribution and use in source and binary forms, with or without\n   modification, are permitted provided that the following conditions\n   are met:\n\n   1. Redistributions of source code must retain the above copyright\n      notice, this list of conditions and the following disclaimer.\n\n   2. The origin of this software must not be misrepresented; you must\n      not claim that you wrote the original software.  If you use this\n      software in a product, an acknowledgment in the product\n      documentation would be appreciated but is not required.\n\n   3. Altered source versions must be plainly marked as such, and must\n      not be misrepresented as being the original software.\n\n   4. The name of the author may not be used to endorse or promote\n      products derived from this software without specific prior written\n      permission.\n\n   THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS\n   OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED\n   WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE\n   ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY\n   DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL\n   DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE\n   GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS\n   INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,\n   WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING\n   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS\n   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.",
              "licenses": [
                {
                  "key": "bzip2-libbzip-2010",
                  "name": "bzip2 License 2010",
                  "short_name": "bzip2 License 2010",
                  "category": "Permissive",
                  "is_exception": false,
                  "is_unknown": false,
                  "owner": "bzip",
                  "homepage_url": "https://github.com/asimonov-im/bzip2/blob/master/LICENSE",
                  "text_url": "",
                  "reference_url": "https://scancode-licensedb.aboutcode.org/bzip2-libbzip-2010",
                  "scancode_text_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/bzip2-libbzip-2010.LICENSE",
                  "scancode_data_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/bzip2-libbzip-2010.yml",
                  "spdx_license_key": "bzip2-1.0.6",
                  "spdx_url": "https://spdx.org/licenses/bzip2-1.0.6"
                }
              ]
            }
          ]
        }
      ],
      "license_clues": [],
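
If you want to check this in your own results, a small sketch like the following lists each detection with its rules and any intro text that was matched. It assumes `file_entry` is the per-file JSON object that contains the "license_detections" key shown above, already loaded into a dict; the function name `summarize_detections` is made up for this example.

```python
def summarize_detections(file_entry):
    """Return (license_expression, detection_rules, intro_texts) per detection."""
    summary = []
    for detection in file_entry.get("license_detections", []):
        # Collect the text of every match flagged as a license intro.
        intro_texts = [
            match["matched_text"]
            for match in detection.get("matches", [])
            if match.get("is_license_intro", False)
        ]
        summary.append(
            (
                detection["license_expression"],
                detection.get("detection_rules", []),
                intro_texts,
            )
        )
    return summary


# A trimmed-down version of the detection shown above.
file_entry = {
    "license_detections": [
        {
            "license_expression": "bzip2-libbzip-2010",
            "detection_rules": ["unknown-intro-followed-by-match"],
            "matches": [
                {
                    "license_expression": "unknown-license-reference",
                    "is_license_intro": True,
                    "matched_text": "licensed under the following terms:",
                },
                {
                    "license_expression": "bzip2-libbzip-2010",
                    "is_license_intro": False,
                    "matched_text": "Redistribution and use in source and binary forms",
                },
            ],
        }
    ]
}

for expression, rules, intros in summarize_detections(file_entry):
    print(expression, rules, intros)
```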

There was also a bug in how we group matches into a LicenseDetection; I have fixed it so that license intros are factored in when doing this grouping.
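
As a rough illustration of what "factoring in license intros" can mean for grouping, here is a sketch under stated assumptions, not the code in the branch. It assumes matches are grouped by line proximity with a hypothetical `max_gap` threshold, that an intro always opens a new group, and that the match right after an intro joins the intro's group even across a larger gap (in the example above the intro is on line 1 and the license text starts on line 8).

```python
def group_matches(matches, max_gap=4):
    """Group matches into lists, one list per would-be detection.

    Hypothetical logic: `max_gap` and the exact joining conditions are
    assumptions made for this example.
    """
    groups = []
    for match in sorted(matches, key=lambda m: m["start_line"]):
        if not groups:
            groups.append([match])
            continue
        previous = groups[-1][-1]
        if match.get("is_license_intro", False):
            # An intro introduces what follows, so it opens a new group.
            groups.append([match])
        elif (
            previous.get("is_license_intro", False)
            or match["start_line"] - previous["end_line"] <= max_gap
        ):
            # Stay with the intro, or with a nearby preceding match.
            groups[-1].append(match)
        else:
            groups.append([match])
    return groups


intro = {"start_line": 1, "end_line": 1, "is_license_intro": True}
license_text = {"start_line": 8, "end_line": 37, "is_license_intro": False}
print(len(group_matches([intro, license_text])))
```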

Here are the scan results for you to look at:

Old scan just this issue:
doris-issue-3080.json.txt

New scan just this issue:
doris-add-license-detection-issue-3080.json.txt

Old scan entire file:
doris-v31.1.1-LICENSE-dist.json.txt

New scan entire file:
doris-add-license-detection-LICENSE-dist.json.txt
