Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: improve the performance of list artifacts #18610

Merged
merged 1 commit into from
Apr 30, 2023

Conversation

chlins
Copy link
Member

@chlins chlins commented Apr 27, 2023

  1. Change the query for listing tasks of scan which can use the db index.
  2. Add the gin index for task.extra_attrs.report_uuids

Fixes: #18013

Thank you for contributing to Harbor!

Comprehensive Summary of your change

Situation

The issue is that when the db has the large number of tasks(scan or other), the API of list artifacts will be quite slow when set the query parameter with_scan_overview=true, and also cost many db CPU resource. The root cause for the problem is that when querying scan reports, it is necessary to look up the scan tasks to obtain their status. However, the previous lookup method resulted in a full table scan of the database, which caused slow response times and increased resource utilization in cases with large amounts of data.

Improve

The data size of testing environment.

Artifacts: 100000+
Tasks: 500000+

Testing API: https://harbor.domain/api/v2.0/projects/project-001/repositories/repository-100/artifacts?with_tag=false&with_scan_overview=true&with_label=false&with_accessory=false&page_size=50&page=1

Metric Before After
API response time 28s+ <0.1s
DB CPU Usage ~300% < 1%

Issue being fixed

Fixes

Please indicate you've done the following:

  • Well Written Title and Summary of the PR
  • Label the PR as needed. "release-note/ignore-for-release, release-note/new-feature, release-note/update, release-note/enhancement, release-note/community, release-note/breaking-change, release-note/docs, release-note/infra, release-note/deprecation"
  • Accepted the DCO. Commits without the DCO will delay acceptance.
  • Made sure tests are passing and test coverage is added if needed.
  • Considered the docs impact and opened a new docs issue or PR with docs changes if needed in website repository.

@chlins chlins added release-note/enhancement Label to mark PR to be added under release notes as enhancement target/2.9.0 labels Apr 27, 2023
@chlins chlins requested a review from a team as a code owner April 27, 2023 02:31
@codecov
Copy link

codecov bot commented Apr 27, 2023

Codecov Report

Merging #18610 (7884e3c) into main (c09e539) will decrease coverage by 0.02%.
The diff coverage is 68.96%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main   #18610      +/-   ##
==========================================
- Coverage   67.40%   67.38%   -0.02%     
==========================================
  Files         984      984              
  Lines      107010   107029      +19     
  Branches     2670     2670              
==========================================
- Hits        72127    72121       -6     
- Misses      31001    31025      +24     
- Partials     3882     3883       +1     
Flag Coverage Δ
unittests 67.38% <68.96%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/pkg/task/dao/task.go 64.21% <60.00%> (-0.34%) ⬇️
src/pkg/task/task.go 62.42% <75.00%> (+0.98%) ⬆️
src/controller/scan/base_controller.go 60.07% <100.00%> (-0.40%) ⬇️

... and 10 files with indirect coverage changes

@chlins chlins force-pushed the perf/artifact-api-with-scan branch from 14bb443 to f12c209 Compare April 27, 2023 07:05
Copy link
Contributor

@zyyw zyyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please move the raw sql to the dao layer as we discussed.

@chlins chlins force-pushed the perf/artifact-api-with-scan branch from f12c209 to b29cf93 Compare April 28, 2023 06:23
Copy link
Contributor

@wy65701436 wy65701436 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

1. Change the query for listing tasks of scan which can use the db
   index.
2. Add the gin index for task.extra_attrs.report_uuids

Fixes: goharbor#18013

Signed-off-by: chlins <[email protected]>
@dioguerra
Copy link

@chlins It it possible to backport this to v2.7?

@chlins
Copy link
Member Author

chlins commented May 3, 2023

@chlins It it possible to backport this to v2.7?

@dioguerra Yes, it will be backported to the patch version of v2.7 and v2.8.

@dkulchinsky
Copy link
Contributor

Hey @chlins, is there anything I can do to help get this fix into 2.7 & 2.8?

I'm happy to try and prepare the PRs for this?

@chlins
Copy link
Member Author

chlins commented Jun 7, 2023

Hey @chlins, is there anything I can do to help get this fix into 2.7 & 2.8?

I'm happy to try and prepare the PRs for this?

@dkulchinsky Thanks, it has been back ported to v2.7.3 and v2.8.1, you can try to upgrade to these versions.

@dkulchinsky
Copy link
Contributor

@dkulchinsky Thanks, it has been back ported to v2.7.3 and v2.8.1, you can try to upgrade to these versions.

Ohh! that's great @chlins 👍🏼 looks like v2.7.3 was not released yet though, but I see the cherry-pick was merged #18632

Will wait for v2.7.3, hopefully soon 😄🤞🏼

WilfredAlmeida pushed a commit to WilfredAlmeida/harbor that referenced this pull request Jul 8, 2023
1. Change the query for listing tasks of scan which can use the db
   index.
2. Add the gin index for task.extra_attrs.report_uuids

Fixes: goharbor#18013

Signed-off-by: chlins <[email protected]>
Signed-off-by: Wilfred Almeida <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note/enhancement Label to mark PR to be added under release notes as enhancement target/2.9.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

UI list artifacts too slow when there is much trivy scan data
7 participants