Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

It's not possible to import components with PURLs with more than 255 characters #2076

Closed
nscuro opened this issue Oct 22, 2022 · 14 comments · Fixed by #3560
Closed

It's not possible to import components with PURLs with more than 255 characters #2076

nscuro opened this issue Oct 22, 2022 · 14 comments · Fixed by #3560
Labels
defect Something isn't working
Milestone

Comments

@nscuro
Copy link
Member

nscuro commented Oct 22, 2022

Current Behavior:

Due to the chosen database schema, Dependency-Track is not able to import components with package URLs that have a length of 255 characters or more. See CycloneDX/cyclonedx-node-npm#224

Steps to Reproduce:

Upload a BOM that contains a component with a long (>255 characters) PURL.

Expected Behavior:

Dependency-Track should be able to import components with PURLs of arbitrary length.

Environment:

  • Dependency-Track Version: 4.6.1
  • Distribution: N/A
  • BOM Format & Version: N/A
  • Database Server: N/A
  • Browser: N/A

Additional Details:

The "COMPONENT"."PURL" column is indexed. Migrating it from VARCHAR(255) to MEDIUMTEXT, TEXT or similar data types may have a strong impact on query performance.

@nscuro nscuro added the defect Something isn't working label Oct 22, 2022
@ddurham2
Copy link

I've seen the same thing happen on our end.

Thankfully, the cyclonedx-npm tool supports a --short-PURLs argument which shortens the URLs at the cost of losing some information. It's a stopgap until this is fixed.

@rylyade1
Copy link

I have the same issue with the DESCRIPTION field of a project. Dependency-Track returns HTTP_500 on POST and PUT /api/v1/project, whenever the project's description exceed 255 characters.

@mehab
Copy link
Contributor

mehab commented May 19, 2023

Would it be useful if we trim the filepath, checksum, repository etc from the purl before saving it to database. And keep the full purl in the details field of the component, for example, saying that this is the original purl.
We could also put any hashes found in the hashes field of the component etc

@sahibamittal
Copy link
Contributor

sahibamittal commented May 30, 2023

Current bug as per testing in Postgres

The index defined on component's purl (COMPONENT_PURL_IDX) is not used in any query using purl match (either equals or LIKE). Screenshots proving the same :

Screenshot 2023-05-30 at 2 17 21 p m Screenshot 2023-05-30 at 2 22 15 p m

Tested solution

We can remove max length from purl and make it VARCHAR with length 1024.
@Column(name = "PURL", jdbcType = "VARCHAR", length = 1024)

To improve query performance, we can create index on method directly in database (cannot be done via annotation) using either :

  1. GIN index to use search using LIKE %purlString%
    create index "COMPONENT_PURL_IDX" on "COMPONENT" using gin (lower("PURL"::text) gin_trgm_ops);

  2. BTREE index to use search using LIKE purlString% or equals comparison
    create index "COMPONENT_PURL_IDX" on "COMPONENT" (lower("PURL"::text) text_pattern_ops);

Execution time of same query without using method index : ~40s
Execution time of same query with using method index : ~10s

Screenshot proving use of index :

Screenshot 2023-05-30 at 2 26 02 p m

@ShuP1
Copy link

ShuP1 commented Oct 6, 2023

Any update ?

@nscuro
Copy link
Member Author

nscuro commented Oct 6, 2023

FWIW, if you're running Postgres, you can simply ALTER the column definitions from VARCHAR(255) to VARCHAR(somelargenumber) or even TEXT (which is unlimited length). Postgres stores VARCHAR(X) and TEXT the same way, converting between them is a seamless operation.

We have yet to find a good way to address this in the code, such that it works for all supported databases, and does not break any indexes.

@ShuP1
Copy link

ShuP1 commented Oct 9, 2023

Just altered all PURL* columns to TEXT and the problem persist
It seems to be a check done by datanucleus not a database exception

@macblazer
Copy link

There are actually quite a few things other than just the PURL that are currently limited to 255 characters in the database and/or in the input parsing. See this query for places where the @Size attribute is used.

@msymons
Copy link
Member

msymons commented Feb 9, 2024

Whilst this issue has now been assigned to 5.x milestone, it could make it into v4.x if someone were to submit a PR. Just as long as it works across all RDBMSs.

@nscuro
Copy link
Member Author

nscuro commented Feb 9, 2024

We can remove the @Size annotations from PURL fields, which means that users can manually change their schema to accommodate for larger values. As mentioned above, for PostgreSQL it won't be a problem.

But to my knowledge MySQL requires a key length when creating indexes on text columns, and MSSQL also has some special behavior when indexing unlimited text. Because the ORM we still use does not allow us to specify specific indexes to use, per default it will create indexes that are plain useless on at least 2 of the supported RDBMSes. So the problem remains that we can't offer a general solution without breaking things for certain RDBMSes.

@nluzgin
Copy link

nluzgin commented Mar 12, 2024

Facing the same issue.
My fix is to modify SBOM purl with python. maybe useful for some one:
from packageurl import PackageURL purl = PackageURL.from_string(value) purl.qualifiers['download_url'] = '' some_dict[key] = PackageURL.to_string(purl)

this make this:
pkg:npm/%40babel/[email protected]?download_url=https%3A//myrepomanager/path/%40babel/plugin-transform-react-display-name/-/plugin-transform-react-display-name-7.23.3.tgz#packages/babel-plugin-transform-react-display-name

->

pkg:npm/%40babel/[email protected]#packages/babel-plugin-transform-react-display-name

and worked fine

@nscuro
Copy link
Member Author

nscuro commented Mar 17, 2024

I had another look at this:

  • MySQL has an index key length limit of 192 when using unicode collation
  • MSSQL < 2016 has an index key length limit of 900, both for clustered and non-clustered indexes (https://dba.stackexchange.com/a/208019)
  • MSSQL >= 2016 has an index key length limit of 1700 for non-clustered indexes
  • PostgreSQL and H2 have no index key length limit at all
  • All this would not be an issue if the ORM would allow us to specify an index key length based on which RDBMS we're running on, but it doesn't (Add MariaDB Support #271 (comment))

For MySQL, the above means that even the current database schema is incompatible when using Unicode collation. Even a PURL column defined as VARCHAR(255) exceeds the limit of 192. It is already the case that manual changes are required to make DT work with MySQL, see: #271 (comment)

Given the MSSQL constraint, what we can do is we can increase the PURL length limit from 255 to 786, which is less than the lowest index key length limit of 900. This means that everyone except MySQL users will automatically benefit from this.

786 characters should be plenty. We can re-asses the situation if that still causes issues.

nscuro added a commit to nscuro/dependency-track that referenced this issue Mar 17, 2024
Because PURLs are also used to populate the `TARGET` column of `COMPONENTANALYSISCACHE`, that column's length also has to be extended.

Fixes DependencyTrack#2076

Signed-off-by: nscuro <[email protected]>
nscuro added a commit to nscuro/dependency-track that referenced this issue Mar 17, 2024
Because PURLs are also used to populate the `TARGET` column of `COMPONENTANALYSISCACHE`, that column's length also has to be extended.

Fixes DependencyTrack#2076

Signed-off-by: nscuro <[email protected]>
@nscuro nscuro modified the milestones: 5.x, 4.11 Mar 17, 2024
nscuro added a commit to nscuro/dependency-track that referenced this issue Mar 17, 2024
Because PURLs are also used to populate the `TARGET` column of `COMPONENTANALYSISCACHE`, that column's length also has to be extended.

Fixes DependencyTrack#2076

Signed-off-by: nscuro <[email protected]>
Copy link
Contributor

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 19, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
defect Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants