Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ListArtifacts API is slow, UI is timing out when accessing repositories with many artifacts #18598

Closed
dkulchinsky opened this issue Apr 24, 2023 · 12 comments

Comments

@dkulchinsky
Copy link
Contributor

dkulchinsky commented Apr 24, 2023

Expected behavior and actual behavior:

I believe this is a follow-up on #16839, which was closed but I'm not entirely sure what is the recommended solution and whether it was implemented in subsequent releases, so guidance would be greatly appreciated.

When accessing a repository in the Harbor UI, a response should be provided within a reasonable time and not timeout.

We noticed that when users are browsing repositories in the UI that have more than several hundreds tags, the UI will spin for a long time and in some cases will throw a 504 due to a timeout.

Our Harbor deployment is quite large, some figures:

  1. 1,578 repositories (over 5 projects)
  2. 90,435 tags/artifacts
  3. there are several repositories that have >1000 tags and this is where we notice this issue the most

DB counts (per the bits that was requested in the other issue):

coredb=# select count(*) from artifact;
 count
--------
 108986
coredb=# select count(*) from blob;
  count
---------
 1483965

When this happens, we are observing high CPU usage on the Database (as high as 8 out of 16 cores) and ListArtifacts (GET) requests latency spikes >3 minutes
image

Based on #16839 I understand there may be some indexes that should be added to improve the performance if the DB queries, my understanding is that these should be (based on this FAQ):

CREATE INDEX concurrently idx_artifact_repository_name ON artifact USING hash (repository_name);
CREATE INDEX IF NOT EXISTS idx_artifact_repository_id ON artifact (repository_id);
CREATE INDEX IF NOT EXISTS idx_artifact_project_id ON artifact (project_id);

OTOH, I see that the first index already exists, however is btree type instead of hash, looking at the code it seems it was added in v2.6.0 but indeed has the default btree method:

CREATE INDEX IF NOT EXISTS idx_artifact_repository_name ON artifact (repository_name);

These are the indexes in the artifact table in our production deployment:

coredb=# SELECT indexname,indexdef FROM pg_indexes WHERE tablename = 'artifact';
           indexname            |                                            indexdef
--------------------------------+-------------------------------------------------------------------------------------------------
 artifact_pkey                  | CREATE UNIQUE INDEX artifact_pkey ON public.artifact USING btree (id)
 unique_artifact                | CREATE UNIQUE INDEX unique_artifact ON public.artifact USING btree (repository_id, digest)
 idx_artifact_push_time         | CREATE INDEX idx_artifact_push_time ON public.artifact USING btree (push_time)
 idx_artifact_repository_name   | CREATE INDEX idx_artifact_repository_name ON public.artifact USING btree (repository_name)
 idx_artifact_digest_project_id | CREATE INDEX idx_artifact_digest_project_id ON public.artifact USING btree (digest, project_id)

should the index idx_artifact_repository_name be adjusted/re-created to be hash type? and should I add the other two indexes?

CREATE INDEX IF NOT EXISTS idx_artifact_repository_id ON artifact (repository_id);
CREATE INDEX IF NOT EXISTS idx_artifact_project_id ON artifact (project_id);

Steps to reproduce the problem:

Populate a Harbor deployment with repositories that have many artifacts (tags) and observe slow UI when listing the repository.

Versions:
Please specify the versions of following systems.

  • harbor version: v2.7.1
  • docker engine version: N/A
  • docker-compose version: N/A

Additional context:

N/A

@dkulchinsky
Copy link
Contributor Author

Looking at the ListArtifects method implementation I believe there are 3 mains steps:

  1. Count number of artifacts in the repository
  2. Retrieve 15 artifacts in the repository (based on page number & size)
  3. Retrieve vulnerability overview of the artifacts

The UI then dynamically loads additional information (tags, cosign signatures, immutable status, etc..) but AFAICT this is non blocking and is very quick to load

I believe I was able to capture the queries for (1) & (2) above, and ran an explain analyze, the results (below) show that these two queries are very quick to execute (I've used the repository which is consistently very slow).

I also played with this API and identified that it is slow only when with_scan_overview is set to true!

  1. with_scan_overview == false takes ~580ms:
https://<registry>/api/v2.0/projects/external/repositories/maven/maven/artifacts?page=1&page_size=15&with_tag=false&with_label=false&with_scan_overview=false&with_signature=false&with_immutable_status=false&with_accessory=false
  1. with_scan_overview == true takes ~5.7 seconds! a 10x slowdown:
https://<registry>/api/v2.0/projects/external/repositories/maven/maven/artifacts?page=1&page_size=15&with_tag=false&with_label=false&with_scan_overview=true&with_signature=false&with_immutable_status=false&with_accessory=false

For reference, EXPLAIN ANALYZE of the two first queries mentioned above, I believe the issue is not here but adding for completeness.

Query (1):

coredb=# EXPLAIN ANALYZE SELECT COUNT(*) FROM artifact T0 WHERE T0.repository_name = 'external/library/redis' AND T0.id=id AND (EXISTS (SELECT 1 FROM tag WHERE tag.artifact_id = T0.id) OR NOT EXISTS (SELECT 1 FROM artifact_reference ref WHERE ref.child_id = T0.id)) AND T0.id not in (select artifact_id from artifact_accessory);
                                                                                 QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=25427.99..25428.00 rows=1 width=8) (actual time=64.937..64.943 rows=1 loops=1)
   ->  Bitmap Heap Scan on artifact t0  (cost=399.98..25426.79 rows=481 width=0) (actual time=61.107..64.920 rows=183 loops=1)
         Recheck Cond: ((repository_name)::text = 'external/library/redis'::text)
         Filter: ((id IS NOT NULL) AND (NOT (hashed SubPlan 5)) AND ((alternatives: SubPlan 1 or hashed SubPlan 2) OR (NOT (alternatives: SubPlan 3 or hashed SubPlan 4))))
         Rows Removed by Filter: 1158
         Heap Blocks: exact=744
         ->  Bitmap Index Scan on idx_artifact_repository_name  (cost=0.00..78.05 rows=1284 width=0) (actual time=0.257..0.258 rows=1343 loops=1)
               Index Cond: ((repository_name)::text = 'external/library/redis'::text)
         SubPlan 5
           ->  Seq Scan on artifact_accessory  (cost=0.00..301.85 rows=7985 width=8) (actual time=0.014..2.019 rows=8058 loops=1)
         SubPlan 1
           ->  Index Only Scan using idx_tag_artifact_id on tag  (cost=0.42..8.44 rows=1 width=0) (never executed)
                 Index Cond: (artifact_id = t0.id)
                 Heap Fetches: 0
         SubPlan 2
           ->  Seq Scan on tag tag_1  (cost=0.00..2772.99 rows=110899 width=4) (actual time=0.019..21.061 rows=111051 loops=1)
         SubPlan 3
           ->  Index Only Scan using idx_artifact_reference_child_id on artifact_reference ref  (cost=0.29..8.30 rows=1 width=0) (never executed)
                 Index Cond: (child_id = t0.id)
                 Heap Fetches: 0
         SubPlan 4
           ->  Seq Scan on artifact_reference ref_1  (cost=0.00..345.46 rows=11046 width=4) (actual time=0.016..2.298 rows=11035 loops=1)
 Planning Time: 1.207 ms
 Execution Time: 65.917 ms
(24 rows)

Execution time is very quick here.

Then Query (2):

coredb=# EXPLAIN ANALYZE SELECT T0.id, T0.type, T0.media_type, T0.manifest_media_type, T0.project_id, T0.repository_id, T0.repository_name, T0.digest, T0.size, T0.icon, T0.push_time, T0.pull_time, T0.extra_attrs, T0.annotations FROM artifact T0 WHERE T0.repository_name = 'external/library/redis' AND T0.id =id AND (EXISTS (SELECT 1 FROM tag WHERE tag.artifact_id = T0.id) OR NOT EXISTS (SELECT 1 FROM artifact_reference ref WHERE ref.child_id = T0.id)) AND T0.id not in (select artifact_id from artifact_accessory) ORDER BY T0.push_time DESC, T0.id DESC LIMIT 15;
                                                                                    QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=25438.59..25438.63 rows=15 width=668) (actual time=49.283..49.288 rows=15 loops=1)
   ->  Sort  (cost=25438.59..25439.79 rows=481 width=668) (actual time=49.282..49.285 rows=15 loops=1)
         Sort Key: t0.push_time DESC, t0.id DESC
         Sort Method: top-N heapsort  Memory: 36kB
         ->  Bitmap Heap Scan on artifact t0  (cost=399.98..25426.79 rows=481 width=668) (actual time=47.452..49.162 rows=183 loops=1)
               Recheck Cond: ((repository_name)::text = 'external/library/redis'::text)
               Filter: ((id IS NOT NULL) AND (NOT (hashed SubPlan 5)) AND ((alternatives: SubPlan 1 or hashed SubPlan 2) OR (NOT (alternatives: SubPlan 3 or hashed SubPlan 4))))
               Rows Removed by Filter: 1158
               Heap Blocks: exact=744
               ->  Bitmap Index Scan on idx_artifact_repository_name  (cost=0.00..78.05 rows=1284 width=0) (actual time=0.137..0.138 rows=1343 loops=1)
                     Index Cond: ((repository_name)::text = 'external/library/redis'::text)
               SubPlan 5
                 ->  Seq Scan on artifact_accessory  (cost=0.00..301.85 rows=7985 width=8) (actual time=0.005..1.349 rows=8060 loops=1)
               SubPlan 1
                 ->  Index Only Scan using idx_tag_artifact_id on tag  (cost=0.42..8.44 rows=1 width=0) (never executed)
                       Index Cond: (artifact_id = t0.id)
                       Heap Fetches: 0
               SubPlan 2
                 ->  Seq Scan on tag tag_1  (cost=0.00..2772.99 rows=110899 width=4) (actual time=0.012..15.749 rows=111068 loops=1)
               SubPlan 3
                 ->  Index Only Scan using idx_artifact_reference_child_id on artifact_reference ref  (cost=0.29..8.30 rows=1 width=0) (never executed)
                       Index Cond: (child_id = t0.id)
                       Heap Fetches: 0
               SubPlan 4
                 ->  Seq Scan on artifact_reference ref_1  (cost=0.00..345.46 rows=11046 width=4) (actual time=0.008..1.588 rows=11035 loops=1)
 Planning Time: 0.473 ms
 Execution Time: 49.382 ms

This is also executing very quickly.

@chlins
Copy link
Member

chlins commented Apr 25, 2023

@dkulchinsky Hi, thanks for your digging, we have found the root cause is with_scan_overview=true will slow down the API response time in the big size of task table, same issue is #18013, targeted to fix the problem in the next release.

@chlins chlins self-assigned this Apr 25, 2023
@dkulchinsky
Copy link
Contributor Author

Thanks @chlins, I went over the other issue you mentioned and looks like the fix will be in v2.9?

In the meantime, there was a possible workaround mentioned by cleaning up the task table, can you peovide more information about that? we're in a situation where people browsing through the UI are degradation the registry performance.

@chlins
Copy link
Member

chlins commented Apr 26, 2023

Although cleaning up the task table can improve performance, it requires a large amount of data to be cleaned up. Cleaning up a small amount of data may not have a significant effect, so it is not recommended for users to clean up on their own unless they are certain that these tasks have no impact on Harbor, so there is no better workaround currently available. However, we will consider including the fix to the previous release patch version(e.g 2.7.x/2.8.x).

@chlins
Copy link
Member

chlins commented Apr 27, 2023

@dkulchinsky Would you like to verify the changes in your environment? It achieved a good improvement in our testing environment. I can back port the fix to 2.7 and build a temporary package or image for your testing if you would.

@dkulchinsky
Copy link
Contributor Author

@dkulchinsky Would you like to verify the changes in your environment? It achieved a good improvement in our testing environment. I can back port the fix to 2.7 and build a temporary package or image for your testing if you would.

thanks @chlins, that would be great!

@chlins
Copy link
Member

chlins commented Apr 27, 2023

Note: This is only a temporary way for testing, the image built based on the v2.7.1 with the above PR fix, so please do not adopt following steps if you are not v2.7.1 version which will affect the future normal upgrade. Backup your database if needed. The proper procedure is to wait for the official patch release in subsequent versions and upgrade.

There are 2 steps needed:

  1. Create the index manually in the database, CREATE INDEX IF NOT EXISTS idx_task_extra_attrs_report_uuids ON task USING gin ((extra_attrs::jsonb->'report_uuids'));
  2. Replace the harbor-core image to ghcr.io/chlins/harbor-core:v2.7.1-dev

@dkulchinsky
Copy link
Contributor Author

that's awesome @chlins, I'm deploying it now to our sandbox Harbor and should have results shortly.

@dkulchinsky
Copy link
Contributor Author

@chlins very promising results! on repos with >100 tags, using the stock 2.7.1 core image load time is >2-3s, with the image you provided (and the added index), this is down to 300~500ms

@chlins
Copy link
Member

chlins commented May 1, 2023

The issue has been resolved as of #18610.

@chlins chlins closed this as completed May 1, 2023
@dkulchinsky
Copy link
Contributor Author

Thanks @chlins! could you confirm if the fix will be backported to 2.7?

@chlins
Copy link
Member

chlins commented May 3, 2023

Thanks @chlins! could you confirm if the fix will be backported to 2.7?

Sure, it will be backported to the patch release of 2.7 and 2.8.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants