Skip to content

CycloneDX v1.3 JSON Schema - canonical (duplicate) $id errors #123

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mrutkows opened this issue Jan 18, 2022 · 13 comments
Closed

CycloneDX v1.3 JSON Schema - canonical (duplicate) $id errors #123

mrutkows opened this issue Jan 18, 2022 · 13 comments
Labels

Comments

@mrutkows
Copy link
Contributor

mrutkows commented Jan 18, 2022

General Problem

Within the latest, published v1.3 JSON schema (i.e., https://github.com/CycloneDX/specification/blob/master/schema/bom-1.3.schema.json)

The same local $id value being used within 2 different definitions (i.e., both at top-level and under the "definitions" key) which cause "canonical id" or similar errors using several JSON schema validators referenced by the JSON schema project (ignored to unknown results by some others). Output of errors shown below for two tools.

Notes:


JSON Schema reference (applicable to problem space)

Declared schema:

  "$schema": "http://json-schema.org/draft-07/schema#",

Reproducible in tools

  1. Go: https://github.com/xeipuuv/gojsonschema
  • errors.errorString {s: "Reference #/definitions/hash must be canonical"}
  • errors.errorString {s: "Reference #/definitions/component must be canonical"}
  • errors.errorString {s: "Reference #/definitions/service must be canonical"}
  1. Online: https://www.liquid-technologies.com/online-json-schema-validator
line file message
397:23 schema.json Duplicate schema id 'http://cyclonedx.org/schema/bom-1.3.schema.json#/properties/components' encountered. Path 'definitions.component.properties.components', line 397, position 23.
920:21 schema.json Duplicate schema id 'http://cyclonedx.org/schema/bom-1.3.schema.json#/properties/services' encountered. Path 'definitions.service.properties.services', line 920, position 21.
160:19 schema.json Duplicate schema id 'http://cyclonedx.org/schema/bom-1.3.schema.json#/properties/hashes' encountered. Path 'definitions.tool.properties.hashes', line 160, position 19.

Investigation

Performing a manual inspection of schema...

Base URI

  "$id": "http://cyclonedx.org/schema/bom-1.3.schema.json"

Specific instances of problem

  1. Duplicate "$id": "#/properties/hashes"
    "tool": {
      "type": "object",
      ...
      "properties": {
       ...
        "hashes": {
          "$id": "#/properties/hashes",
          "type": "array",
          "items": {"$ref": "#/definitions/hash"},
          "title": "Hashes",
          "description": "The hashes of the tool (if applicable)."
        }
      }
    },
    "externalReference": {
      "type": "object",
       ...
      "properties": {
        ...
        "hashes": {
          "$id": "#/properties/hashes",
          "type": "array",
          "items": {"$ref": "#/definitions/hash"},
          "title": "Hashes",
          "description": "The hashes of the external reference (if applicable)."
        }
      }
    },
both reference the same "hash" definition property with the $ref key:
  "definitions": {
    ...
    "hash": {
      "type": "object",
      "title": "Hash Objects",
      "required": [
        "alg",
        "content"
      ],
      "properties": {
        "alg": {
          "$ref": "#/definitions/hash-alg"
        },
        "content": {
          "$ref": "#/definitions/hash-content"
        }
      }
    },
both resolve to the same "$id" value of "http://cyclonedx.org/schema/bom-1.3.schema.json#/properties/hashes"
  1. Duplicate "$id": "#/properties/components",
  1. Duplicate "$id": "#/properties/services",

Notes/Observation

  • Unsure of use case for declaring an $id in these fragments (i.e., why do these $id" declarations exist at all)

  • Cut/paste error (copied JSON fragment from top-level to "tools"?

    for example, these are the only "definitions" that have an "$id" (again $ref" usage in call cases is fine):

        "hashes": {
          "$id": "#/tools/properties/hashes",
          "type": "array",
          "items": {"$ref": "#/definitions/hash"},
          "title": "Hashes",
          "description": "The hashes of the tool (if applicable)."
        }
        "hashes": {
          "$id": "#/externalReference/properties/hashes",
          "type": "array",
          "items": {"$ref": "#/definitions/hash"},
          "title": "Hashes",
          "description": "The hashes of the external reference (if applicable)."
        }
        "components": {
          "$id": "#/component/properties/components",
          "type": "array",
          "items": {"$ref": "#/definitions/component"},
          "uniqueItems": true,
          "title": "Components"
        },
        "services": {
          "$id": "#service/properties/services",
          "type": "array",
          "items": {"$ref": "#service/definitions/service"},
          "uniqueItems": true,
          "title": "Services"
        },
@jkowalleck
Copy link
Member

Thanks for the detailed report, @mrutkows .

Do you see similarities to #83?
Do you see any relations to #84?

How do you see the removal of the all the $îds in schema that happened in CycloneDX spec transition from version 1.3 to 1.4?
is it making things better? making things worse?

How would you describe an ideal solution?
Do you see any risks with the solution you would propose?
Are there any side effects to consider?

@mrutkows
Copy link
Contributor Author

mrutkows commented Jan 19, 2022

@jkowalleck

Do you see similarities to #83? Do you see any relations to #84?

If #83 and #84 were combined (with subsequent discussions included) and netted out eventually identify all 3 $id duplication are identified (but still do not call out all 6 specific $id usages that get called into question) in order to have a comprehensive discussion...

  • Discussion in #83 identifies the duplicate $id values for the 2 appearances of "hashes", but does not include "tools" or "services". I see #83 is closed and seems to point to v1.4 for a fix?
  • Discussion in #84 identifies the duplicate $id values for the 2 appearances in "tools" and "services". It also seems to have a PR that deletes all $id declarations.

This issue summarizes the problem along with a comprehensive list of all occurrences of duplicate $id usage (along with line numbers and snippets).

There are 2 distinct cases that I call out here:

  1. "components" and "services" $id values are declared within top-level bom "properties" (and would be correct if a use case exists for them) which means you only need to change the 2nd occurrence (to add a path). This might look like:
        "components": {
          "$id": "#/component/properties/components",
          "type": "array",
          "items": {"$ref": "#/definitions/component"},
          "uniqueItems": true,
          "title": "Components"
        },
        "services": {
          "$id": "#service/properties/services",
          "type": "array",
          "items": {"$ref": "#/definitions/service"},
          "uniqueItems": true,
          "title": "Services"
        },
  1. In the case of "hashes", both $id declarations are not top-level, but their values indicate they are top-level "properties" and likely both need an additional relative path added to their URI fragments to include the parent schema to provide uniqueness; for example:
        "hashes": {
          "$id": "#/tools/properties/hashes",
          "type": "array",
          "items": {"$ref": "#/definitions/hash"},
          "title": "Hashes",
          "description": "The hashes of the tool (if applicable)."
        }
        
        "hashes": {
          "$id": "#/externalReference/properties/hashes",
          "type": "array",
          "items": {"$ref": "#/definitions/hash"},
          "title": "Hashes",
          "description": "The hashes of the external reference (if applicable)."
        }

Again, with the proper use of $ref (in each of these cases) you could simply delete the $id declaration; however, it then begs the question of removing all $id declarations in the top-level "properties" (and perhaps what lead us to v1.4 current state).

As to fixing in v1.4 (e.g., as #83 indicates perhaps)... I would say also that (IMO) it is important for discussing a possible solution within the v1.3 schema revisions (assuming some documents "finalized" and will always declare "1.3" as their version and cannot simply re-declare for 1.4)

@mrutkows
Copy link
Contributor Author

mrutkows commented Jan 19, 2022

Ideally, my approach would be to remove the problematic $ids in favor of $ref; specifically:

  1. Case 1: remove the (duplicate) second occurrence of the $id declaration (since there is a valid $ref) in the case of "components" and "services" (i.e., case 1 above). This results in:
        "components": {
          "type": "array",
          "items": {"$ref": "#/definitions/component"},
          "uniqueItems": true,
          "title": "Components"
        },
        "services": {
          "type": "array",
          "items": {"$ref": "#/definitions/service"},
          "uniqueItems": true,
          "title": "Services"
        },
  1. Case 2: remove the $id reference from both "hashes" object declarations as both are incorrect in their paths decleared on their URI fragments .

Do you see any risks with the solution you would propose?
Are there any side effects to consider?

Risks...
Any external reference that might be "working" today are likely working "by accident" of their tooling implementation and likely getting different "JSON pointers" depending on manner of parsing/searching identifiers (e.g., HashMap would overwrite entry with last $id parsed) or "fast" (streaming) parsers would short-circuit on first match (or others simply not find the fragment at all as is the case with browsers like Mozilla but not indicate he condition as an error).

My primary concern is leaving in top-level $id declarations in case they (for some unknown/historically discussed use case) are being referenced directly by an external schema (which by my recent reading is possible, but typically not encouraged) to not break external schemas.

If you want further risk reduction...

Preserve all $ids, but fix all paths as shown in my previous comment (this is what I did locally in a local schema copy my Go code uses as a minimal fix). This does still beg the use case as to when $id values would be used of course. Happy to discuss my thoughts...

@stevespringett
Copy link
Member

@jkowalleck Does #84 address all the necessary changes required to fix the 1.2 and 1.3 JSON schemas? If not, would it be possible to do so?

@jkowalleck
Copy link
Member

@mrutkows
could you pull-request the "fixed" files, so we could spot your recommended changes easily?

#123 (comment)

Preserve all $ids, but fix all paths as shown in my previous comment (this is what I did locally in a local schema copy my Go code uses as a minimal fix). This does still beg the use case as to when $id values would be used of course. Happy to discuss my thoughts...

@jkowalleck
Copy link
Member

jkowalleck commented Jan 20, 2022

@jkowalleck Does #84 address all the necessary changes required to fix the 1.2 and 1.3 JSON schemas? If not, would it be possible to do so?

@stevespringett basically, yes. BUT
i want to see how @mrutkows fixed the issues, and compare the solutions. There are multiple options to fix/address the issue - and considering the risks/backward-compatibility one solution might be "better" than others.

@jkowalleck
Copy link
Member

jkowalleck commented Jan 22, 2022

my rule of thumb for fixing: if a $id is just the same value a JSON pointer has, we could drop the $id without any risk
my point of view: the wrong $ids are not needed, as long as JSON pointers are usable - so instead of renaming the $id we could simply drop them.
This is the way how #84 was prepared.

Therefore i would prefer #84 over #125. Maybe i am not seeing something.

@mrutkows Do you see a benefit in #125 over #84?

regarding #125 I have some questions, @mrutkows :

  • do you see any benefit in adding new / fixing $id values? they don't help anybody at the moment, because they are new and not used anywhere, yet. The elements can be accessed via JSON pointers, so a $id is not needed.
  • Furthermore the $ids you introduced are the same values the JSON pointer has, are they? Don't they shadow the JSON pointer and therefore do not have any benefit ?
  • Do you see a benefit in keeping $ids that are the same value the JSON pointer has?

@mrutkows
Copy link
Contributor Author

@jkowalleck Please see PR opened to show what could be a minimally invasive "fix" that reduces risk if adopted in the existing 1.3 release line. Again, I am in full support that with a new miner release, all $id usage beyond the base URI should be removed as I see is the case for current v1.4 schema.

@jkowalleck
Copy link
Member

@stevespringett as far as i understand it, we have two options ready to fix/address this issues:

Guess it is time for the working groups to make a decision if and how to address the topic...
And they might need to decide how "{bug}fixes" will affect already published schema versions (1.2.2? 1.2b? )

@mrutkows
Copy link
Contributor Author

@stevespringett as far as i understand it, we have two options ready to fix/address this issues:

* [Preserve  keys, but fix potential JSON pointers to reflect actual DOM… #125](https://github.com/CycloneDX/specification/pull/125)

* [fixed JSON schema id  #84](https://github.com/CycloneDX/specification/pull/84)

@jkowalleck Please know I did not produce PR 125 to "compete" in an either/or manner with PR 84, but was only providing what I was asked for in terms of "minimally invasive" (least risk). Hope you have an informed discussion.

@stevespringett
Copy link
Member

Thank you so much @mrutkows. A least risk approach is likely what the Core Working Group would move forward with. I'm going to get clarity on it this week from the other members. I'd like to push updates out which we've already agreed to do. That's being tracked in #124 .

@jkowalleck
Copy link
Member

updated the proposed CI/CT in #110

@stevespringett
Copy link
Member

Thanks for all the work on this. PR #125 has been merged that fixes schema versions 1.2 and 1.3 in a minimally invasive way. All official CycloneDX tools will be updated with newer versions of the schemas.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants