Our company has over 20TB of data, and now the Garbage Collection task fails with an error #18701

Closed
SandyLing opened this issue May 19, 2023 · 7 comments

@SandyLing

Our company's Harbor repository has been in use for 7-8 years, and we have gradually upgraded it to version 2.7. We have cleaned up data several times in the past, but recently the Garbage Collection task ends with status "error" after running for about 2 days. I suspect the cause is the large data volume: since it hasn't been cleaned for a while, the job may be timing out. Is there any way we can clean up the data regularly? We currently have 20TB of data, growing by approximately 1-2TB per month. The error message is as follows:

{"errors":[{"code":"NOT_FOUND","message":"{"code":10010,"message":"object is not found","details":"42e453cd73dc854255e30b54"}"}]} ~

@Vad1mo
Member

Vad1mo commented May 19, 2023

Any logs?

@chlins chlins added the area/gc label May 22, 2023
@chlins
Member

chlins commented May 22, 2023

Could you check the jobservice dashboard status whether the GC job is running?

@SandyLing
Author

GARBAGE_COLLECTION pending count 2 , latency 117hrs 54min 0sec
I can see the GC job is running, but the data has not been cleaned up.
I remember there used to be a mirroring tool that could clean up the data. Can it be used?
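The dashboard line above can be read mechanically. The sketch below is a minimal illustration, assuming each job queue entry exposes a job type, a pending count, and a latency in seconds; the `QueueStatus` type, its field names, the `stuck_queues` helper, and the 24-hour threshold are hypothetical, chosen for illustration rather than taken from Harbor's API. A queue that still has pending jobs after many hours suggests an earlier job is holding it up.

```python
from dataclasses import dataclass


@dataclass
class QueueStatus:
    # Hypothetical shape of one jobservice queue entry (assumed, not Harbor's schema).
    job_type: str
    count: int    # number of pending jobs in the queue
    latency: int  # assumed: seconds the oldest pending job has been waiting


def format_latency(seconds: int) -> str:
    """Render seconds in the same 'Xhrs Ymin Zsec' style the dashboard shows."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}hrs {minutes}min {secs}sec"


def stuck_queues(queues, threshold_seconds=24 * 3600):
    """Return queues that have pending jobs waiting at least threshold_seconds."""
    return [q for q in queues if q.count > 0 and q.latency >= threshold_seconds]


if __name__ == "__main__":
    # 117h 54m 0s = 117 * 3600 + 54 * 60 = 424440 seconds
    queues = [
        QueueStatus("GARBAGE_COLLECTION", 2, 424440),
        QueueStatus("REPLICATION", 0, 0),
    ]
    for q in stuck_queues(queues):
        print(f"{q.job_type} pending count {q.count} , latency {format_latency(q.latency)}")
```

Run as a script, this prints `GARBAGE_COLLECTION pending count 2 , latency 117hrs 54min 0sec`, matching the reported dashboard line: two GC jobs have been waiting for nearly five days, far longer than a single GC run should block the queue.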

@stonezdj
Contributor

The errored GC job is still running in a goroutine, and it blocks other GC jobs from running until it completes.
That is why the selected job is pending in the job queue and no log is available.

@stonezdj
Contributor

stonezdj commented Jun 7, 2023

> GARBAGE_COLLECTION pending count 2 , latency 117hrs 54min 0sec
> I see GC job is running,but the data has not been cleaned.
> I remember there used to be a mirroring tool that could clean the data, can it be used?

Deleting images or blobs with third-party tools is not recommended; it might cause a discrepancy between the database and the file system.

@github-actions

github-actions bot commented Aug 7, 2023

This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days.

@github-actions github-actions bot added the Stale label Aug 7, 2023
@stonezdj
Contributor

You could try 2.9.0; with feature #18855, the total GC time could be shortened.

Projects
Status: Completed
Development

No branches or pull requests

5 participants