ListArtifacts API is slow, UI is timing out when accessing repositories with many artifacts #18598

dkulchinsky · 2023-04-24T19:51:57Z

Expected behavior and actual behavior:

I believe this is a follow-up on #16839, which was closed but I'm not entirely sure what is the recommended solution and whether it was implemented in subsequent releases, so guidance would be greatly appreciated.

When accessing a repository in the Harbor UI, a response should be provided within a reasonable time and not timeout.

We noticed that when users are browsing repositories in the UI that have more than several hundreds tags, the UI will spin for a long time and in some cases will throw a 504 due to a timeout.

Our Harbor deployment is quite large, some figures:

1,578 repositories (over 5 projects)
90,435 tags/artifacts
there are several repositories that have >1000 tags and this is where we notice this issue the most

DB counts (per the bits that was requested in the other issue):

coredb=# select count(*) from artifact;
 count
--------
 108986

coredb=# select count(*) from blob;
  count
---------
 1483965

When this happens, we are observing high CPU usage on the Database (as high as 8 out of 16 cores) and ListArtifacts (GET) requests latency spikes >3 minutes

Based on #16839 I understand there may be some indexes that should be added to improve the performance if the DB queries, my understanding is that these should be (based on this FAQ):

CREATE INDEX concurrently idx_artifact_repository_name ON artifact USING hash (repository_name);
CREATE INDEX IF NOT EXISTS idx_artifact_repository_id ON artifact (repository_id);
CREATE INDEX IF NOT EXISTS idx_artifact_project_id ON artifact (project_id);

OTOH, I see that the first index already exists, however is btree type instead of hash, looking at the code it seems it was added in v2.6.0 but indeed has the default btree method:

harbor/make/migrations/postgresql/0090_2.6.0_schema.up.sql

Line 20 in 7ace561

    
           CREATE INDEX IF NOT EXISTS idx_artifact_repository_name ON artifact (repository_name);

These are the indexes in the artifact table in our production deployment:

coredb=# SELECT indexname,indexdef FROM pg_indexes WHERE tablename = 'artifact';

           indexname            |                                            indexdef
--------------------------------+-------------------------------------------------------------------------------------------------
 artifact_pkey                  | CREATE UNIQUE INDEX artifact_pkey ON public.artifact USING btree (id)
 unique_artifact                | CREATE UNIQUE INDEX unique_artifact ON public.artifact USING btree (repository_id, digest)
 idx_artifact_push_time         | CREATE INDEX idx_artifact_push_time ON public.artifact USING btree (push_time)
 idx_artifact_repository_name   | CREATE INDEX idx_artifact_repository_name ON public.artifact USING btree (repository_name)
 idx_artifact_digest_project_id | CREATE INDEX idx_artifact_digest_project_id ON public.artifact USING btree (digest, project_id)

should the index idx_artifact_repository_name be adjusted/re-created to be hash type? and should I add the other two indexes?

CREATE INDEX IF NOT EXISTS idx_artifact_repository_id ON artifact (repository_id);
CREATE INDEX IF NOT EXISTS idx_artifact_project_id ON artifact (project_id);

Steps to reproduce the problem:

Populate a Harbor deployment with repositories that have many artifacts (tags) and observe slow UI when listing the repository.

Versions:
Please specify the versions of following systems.

harbor version: v2.7.1
docker engine version: N/A
docker-compose version: N/A

Additional context:

N/A

The text was updated successfully, but these errors were encountered:

dkulchinsky · 2023-04-25T02:31:04Z

Looking at the ListArtifects method implementation I believe there are 3 mains steps:

Count number of artifacts in the repository
Retrieve 15 artifacts in the repository (based on page number & size)
Retrieve vulnerability overview of the artifacts

The UI then dynamically loads additional information (tags, cosign signatures, immutable status, etc..) but AFAICT this is non blocking and is very quick to load

I believe I was able to capture the queries for (1) & (2) above, and ran an explain analyze, the results (below) show that these two queries are very quick to execute (I've used the repository which is consistently very slow).

I also played with this API and identified that it is slow only when with_scan_overview is set to true!

with_scan_overview == false takes ~580ms:

https://<registry>/api/v2.0/projects/external/repositories/maven/maven/artifacts?page=1&page_size=15&with_tag=false&with_label=false&with_scan_overview=false&with_signature=false&with_immutable_status=false&with_accessory=false

with_scan_overview == true takes ~5.7 seconds! a 10x slowdown:

https://<registry>/api/v2.0/projects/external/repositories/maven/maven/artifacts?page=1&page_size=15&with_tag=false&with_label=false&with_scan_overview=true&with_signature=false&with_immutable_status=false&with_accessory=false

For reference, EXPLAIN ANALYZE of the two first queries mentioned above, I believe the issue is not here but adding for completeness.

Query (1):

coredb=# EXPLAIN ANALYZE SELECT COUNT(*) FROM artifact T0 WHERE T0.repository_name = 'external/library/redis' AND T0.id=id AND (EXISTS (SELECT 1 FROM tag WHERE tag.artifact_id = T0.id) OR NOT EXISTS (SELECT 1 FROM artifact_reference ref WHERE ref.child_id = T0.id)) AND T0.id not in (select artifact_id from artifact_accessory);
                                                                                 QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=25427.99..25428.00 rows=1 width=8) (actual time=64.937..64.943 rows=1 loops=1)
   ->  Bitmap Heap Scan on artifact t0  (cost=399.98..25426.79 rows=481 width=0) (actual time=61.107..64.920 rows=183 loops=1)
         Recheck Cond: ((repository_name)::text = 'external/library/redis'::text)
         Filter: ((id IS NOT NULL) AND (NOT (hashed SubPlan 5)) AND ((alternatives: SubPlan 1 or hashed SubPlan 2) OR (NOT (alternatives: SubPlan 3 or hashed SubPlan 4))))
         Rows Removed by Filter: 1158
         Heap Blocks: exact=744
         ->  Bitmap Index Scan on idx_artifact_repository_name  (cost=0.00..78.05 rows=1284 width=0) (actual time=0.257..0.258 rows=1343 loops=1)
               Index Cond: ((repository_name)::text = 'external/library/redis'::text)
         SubPlan 5
           ->  Seq Scan on artifact_accessory  (cost=0.00..301.85 rows=7985 width=8) (actual time=0.014..2.019 rows=8058 loops=1)
         SubPlan 1
           ->  Index Only Scan using idx_tag_artifact_id on tag  (cost=0.42..8.44 rows=1 width=0) (never executed)
                 Index Cond: (artifact_id = t0.id)
                 Heap Fetches: 0
         SubPlan 2
           ->  Seq Scan on tag tag_1  (cost=0.00..2772.99 rows=110899 width=4) (actual time=0.019..21.061 rows=111051 loops=1)
         SubPlan 3
           ->  Index Only Scan using idx_artifact_reference_child_id on artifact_reference ref  (cost=0.29..8.30 rows=1 width=0) (never executed)
                 Index Cond: (child_id = t0.id)
                 Heap Fetches: 0
         SubPlan 4
           ->  Seq Scan on artifact_reference ref_1  (cost=0.00..345.46 rows=11046 width=4) (actual time=0.016..2.298 rows=11035 loops=1)
 Planning Time: 1.207 ms
 Execution Time: 65.917 ms
(24 rows)

Execution time is very quick here.

Then Query (2):

coredb=# EXPLAIN ANALYZE SELECT T0.id, T0.type, T0.media_type, T0.manifest_media_type, T0.project_id, T0.repository_id, T0.repository_name, T0.digest, T0.size, T0.icon, T0.push_time, T0.pull_time, T0.extra_attrs, T0.annotations FROM artifact T0 WHERE T0.repository_name = 'external/library/redis' AND T0.id =id AND (EXISTS (SELECT 1 FROM tag WHERE tag.artifact_id = T0.id) OR NOT EXISTS (SELECT 1 FROM artifact_reference ref WHERE ref.child_id = T0.id)) AND T0.id not in (select artifact_id from artifact_accessory) ORDER BY T0.push_time DESC, T0.id DESC LIMIT 15;
                                                                                    QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=25438.59..25438.63 rows=15 width=668) (actual time=49.283..49.288 rows=15 loops=1)
   ->  Sort  (cost=25438.59..25439.79 rows=481 width=668) (actual time=49.282..49.285 rows=15 loops=1)
         Sort Key: t0.push_time DESC, t0.id DESC
         Sort Method: top-N heapsort  Memory: 36kB
         ->  Bitmap Heap Scan on artifact t0  (cost=399.98..25426.79 rows=481 width=668) (actual time=47.452..49.162 rows=183 loops=1)
               Recheck Cond: ((repository_name)::text = 'external/library/redis'::text)
               Filter: ((id IS NOT NULL) AND (NOT (hashed SubPlan 5)) AND ((alternatives: SubPlan 1 or hashed SubPlan 2) OR (NOT (alternatives: SubPlan 3 or hashed SubPlan 4))))
               Rows Removed by Filter: 1158
               Heap Blocks: exact=744
               ->  Bitmap Index Scan on idx_artifact_repository_name  (cost=0.00..78.05 rows=1284 width=0) (actual time=0.137..0.138 rows=1343 loops=1)
                     Index Cond: ((repository_name)::text = 'external/library/redis'::text)
               SubPlan 5
                 ->  Seq Scan on artifact_accessory  (cost=0.00..301.85 rows=7985 width=8) (actual time=0.005..1.349 rows=8060 loops=1)
               SubPlan 1
                 ->  Index Only Scan using idx_tag_artifact_id on tag  (cost=0.42..8.44 rows=1 width=0) (never executed)
                       Index Cond: (artifact_id = t0.id)
                       Heap Fetches: 0
               SubPlan 2
                 ->  Seq Scan on tag tag_1  (cost=0.00..2772.99 rows=110899 width=4) (actual time=0.012..15.749 rows=111068 loops=1)
               SubPlan 3
                 ->  Index Only Scan using idx_artifact_reference_child_id on artifact_reference ref  (cost=0.29..8.30 rows=1 width=0) (never executed)
                       Index Cond: (child_id = t0.id)
                       Heap Fetches: 0
               SubPlan 4
                 ->  Seq Scan on artifact_reference ref_1  (cost=0.00..345.46 rows=11046 width=4) (actual time=0.008..1.588 rows=11035 loops=1)
 Planning Time: 0.473 ms
 Execution Time: 49.382 ms

This is also executing very quickly.

chlins · 2023-04-25T03:24:33Z

@dkulchinsky Hi, thanks for your digging, we have found the root cause is with_scan_overview=true will slow down the API response time in the big size of task table, same issue is #18013, targeted to fix the problem in the next release.

dkulchinsky · 2023-04-25T03:43:33Z

Thanks @chlins, I went over the other issue you mentioned and looks like the fix will be in v2.9?

In the meantime, there was a possible workaround mentioned by cleaning up the task table, can you peovide more information about that? we're in a situation where people browsing through the UI are degradation the registry performance.

chlins · 2023-04-26T06:21:37Z

Although cleaning up the task table can improve performance, it requires a large amount of data to be cleaned up. Cleaning up a small amount of data may not have a significant effect, so it is not recommended for users to clean up on their own unless they are certain that these tasks have no impact on Harbor, so there is no better workaround currently available. However, we will consider including the fix to the previous release patch version(e.g 2.7.x/2.8.x).

chlins · 2023-04-27T03:33:29Z

@dkulchinsky Would you like to verify the changes in your environment? It achieved a good improvement in our testing environment. I can back port the fix to 2.7 and build a temporary package or image for your testing if you would.

dkulchinsky · 2023-04-27T03:40:31Z

@dkulchinsky Would you like to verify the changes in your environment? It achieved a good improvement in our testing environment. I can back port the fix to 2.7 and build a temporary package or image for your testing if you would.

thanks @chlins, that would be great!

chlins · 2023-04-27T03:57:12Z

Note: This is only a temporary way for testing, the image built based on the v2.7.1 with the above PR fix, so please do not adopt following steps if you are not v2.7.1 version which will affect the future normal upgrade. Backup your database if needed. The proper procedure is to wait for the official patch release in subsequent versions and upgrade.

There are 2 steps needed:

Create the index manually in the database, CREATE INDEX IF NOT EXISTS idx_task_extra_attrs_report_uuids ON task USING gin ((extra_attrs::jsonb->'report_uuids'));
Replace the harbor-core image to ghcr.io/chlins/harbor-core:v2.7.1-dev

dkulchinsky · 2023-04-27T23:29:10Z

that's awesome @chlins, I'm deploying it now to our sandbox Harbor and should have results shortly.

dkulchinsky · 2023-04-28T01:05:34Z

@chlins very promising results! on repos with >100 tags, using the stock 2.7.1 core image load time is >2-3s, with the image you provided (and the added index), this is down to 300~500ms

chlins · 2023-05-01T02:07:46Z

The issue has been resolved as of #18610.

dkulchinsky · 2023-05-01T11:15:25Z

Thanks @chlins! could you confirm if the fix will be backported to 2.7?

chlins · 2023-05-03T07:44:31Z

Thanks @chlins! could you confirm if the fix will be backported to 2.7?

Sure, it will be backported to the patch release of 2.7 and 2.8.

dkulchinsky mentioned this issue Apr 24, 2023

ListArtifacts is slow,it takes about 1s to call ListArtifacts function. And i found that the database doesn't create indexes for repository_name and content_type #16839

Closed

chlins self-assigned this Apr 25, 2023

chlins added the area/performance label Apr 25, 2023

chlins mentioned this issue Apr 27, 2023

fix: improve the performance of list artifacts #18610

Merged

5 tasks

chlins added the target/2.9.0 label May 1, 2023

chlins closed this as completed May 1, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ListArtifacts API is slow, UI is timing out when accessing repositories with many artifacts #18598

ListArtifacts API is slow, UI is timing out when accessing repositories with many artifacts #18598

dkulchinsky commented Apr 24, 2023 •

edited

Loading

dkulchinsky commented Apr 25, 2023

chlins commented Apr 25, 2023

dkulchinsky commented Apr 25, 2023

chlins commented Apr 26, 2023

chlins commented Apr 27, 2023

dkulchinsky commented Apr 27, 2023

chlins commented Apr 27, 2023 •

edited

Loading

dkulchinsky commented Apr 27, 2023

dkulchinsky commented Apr 28, 2023

chlins commented May 1, 2023

dkulchinsky commented May 1, 2023

chlins commented May 3, 2023

ListArtifacts API is slow, UI is timing out when accessing repositories with many artifacts #18598

ListArtifacts API is slow, UI is timing out when accessing repositories with many artifacts #18598

Comments

dkulchinsky commented Apr 24, 2023 • edited Loading

dkulchinsky commented Apr 25, 2023

chlins commented Apr 25, 2023

dkulchinsky commented Apr 25, 2023

chlins commented Apr 26, 2023

chlins commented Apr 27, 2023

dkulchinsky commented Apr 27, 2023

chlins commented Apr 27, 2023 • edited Loading

dkulchinsky commented Apr 27, 2023

dkulchinsky commented Apr 28, 2023

chlins commented May 1, 2023

dkulchinsky commented May 1, 2023

chlins commented May 3, 2023

dkulchinsky commented Apr 24, 2023 •

edited

Loading

chlins commented Apr 27, 2023 •

edited

Loading