feat(server): library refresh go brrr #14456
Nice start! I think there are still a lot of untapped potential improvements here.
The update to |
Thanks for your comments @mertalev ! I'll first attempt to do the import path and exclusion pattern checks in SQL and then move to your suggestions |
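For readers following along, here is a minimal sketch of one way exclusion patterns could be pushed into SQL: translate each glob into a `LIKE` pattern and let the database do the matching. `globToLike` is a hypothetical helper written for illustration, not code from this PR. It assumes `*` and `**` can both become `%` (which over-matches, since `%` also crosses `/`), and it does not handle character classes or brace expansion.

```typescript
// Hypothetical helper: translate a glob-style exclusion pattern into a SQL LIKE pattern.
// Assumption: '*' and '**' both become '%', so the result may match across '/' boundaries.
function globToLike(glob: string): string {
  let out = "";
  let prevStar = false;
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    if (ch === "*") {
      // Collapse a run of '*' into a single '%'.
      if (!prevStar) out += "%";
      prevStar = true;
      continue;
    }
    prevStar = false;
    if (ch === "?") {
      out += "_"; // single-character wildcard
    } else if (ch === "%" || ch === "_" || ch === "\\") {
      out += "\\" + ch; // escape LIKE metacharacters that appear literally in the glob
    } else {
      out += ch;
    }
  }
  return out;
}

const likePatterns = ["**/@eaDir/**", "*.tmp"].map(globToLike);
// likePatterns is ["%/@eaDir/%", "%.tmp"]
```

In PostgreSQL, an array of such patterns could then be passed as a single parameter to a predicate like `"originalPath" LIKE ANY($1::text[])`; the column name here is an assumption.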
Never thought of that, I've implemented your suggestion. I'm also considering changing the initial import code to ignore file mtime, which would let us skip all file system calls except for the crawl. Metadata extraction would have to do the heavy lifting instead |
Would that mean you queue them for metadata extraction even if they're unchanged? You can test it but I think it'd be more overhead than the stat calls. Edit: also if you do this with the source set to |
I was referring to new imports, files that are new to immich. I hoped to improve the ingest performance by removing the stat call. After testing, there are two issues:
If we can mitigate the two issues above, I can rewrite the library import feature and do that in batches as well! |
I don't see why fileModifiedAt needs a non-null constraint in the DB. Might just be an oversight that didn't matter because it didn't affect our usage. I think you can change the asset entity and generate a migration to remove that constraint. For sidecar files, maybe you could add |
I might just put new Date() in for the moment to keep the PR somewhat constrained. Regarding sidecars, I have thought about that; the problem right now is that we're processing the crawled files in batches of 10k, so it might be hard to get that working properly. Maybe I'll just queue a sidecar discovery for every imported asset for now |
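As a rough illustration of the batching mentioned above, the sketch below splits a list of crawled paths into batches of 10k. The helper name `chunk` and the sample paths are hypothetical; the PR's actual crawl may well stream its results rather than hold them in one array.

```typescript
// Hypothetical helper: split a list of crawled paths into fixed-size batches.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (size < 1 || Math.floor(size) !== size) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size)); // the last batch may be shorter
  }
  return batches;
}

// Illustrative input only: 25,000 made-up paths.
const crawledPaths: string[] = [];
for (let i = 0; i < 25000; i++) {
  crawledPaths.push("/photos/" + i + ".jpg");
}

const pathBatches = chunk(crawledPaths, 10000);
// pathBatches holds three batches of 10000, 10000 and 5000 paths
```

Each batch could then be handed to a single job, so the number of queue operations scales with the number of batches, not the number of files.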
Your call on merging this one @mertalev |
If it's ready for final review I would like to take a look at it too before it is merged, I can take a look tomorrow morning. |
Thanks, I'd appreciate it! |
@etnoy I made a few changes, let me know if there's anything that seems off. |
I had a look and it's a big improvement, thanks for helping me clean this up |
Didn't get to this today sorry, on my list for tomorrow 😅 |
LGTM, nice work. You have some failing unit tests after the latest changes, but it's good to go after those are fixed I think
@zackpollard good to go now that the release is done? |
This PR significantly improves library scanning performance. Wherever suitable, we are doing jobs in batches, and many looped database interactions are replaced with SQL queries.
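To make the "looped database interactions replaced with SQL queries" idea concrete, here is a minimal sketch, not taken from this PR, of building one set-based statement for a whole batch instead of issuing one `UPDATE` per asset. The table name, column names, and the `BulkQuery` and `buildBulkOfflineUpdate` names are all assumptions for illustration; the statement uses PostgreSQL's `= ANY(array)` form.

```typescript
// Hypothetical shape of a parameterized query: SQL text plus bound values.
interface BulkQuery {
  text: string;
  values: unknown[];
}

// Instead of running one UPDATE per asset inside a loop, build a single
// statement that updates every asset in the batch.
// Assumption: an "assets" table with "id" (uuid) and "isOffline" columns.
function buildBulkOfflineUpdate(assetIds: readonly string[]): BulkQuery {
  return {
    text: 'UPDATE assets SET "isOffline" = true WHERE id = ANY($1::uuid[])',
    values: [assetIds.slice()], // the whole batch travels as one array parameter
  };
}

const offlineUpdate = buildBulkOfflineUpdate([
  "00000000-0000-0000-0000-000000000001",
  "00000000-0000-0000-0000-000000000002",
]);
// offlineUpdate.values has one entry: an array holding both ids
```

With this shape, a batch of any size costs one round trip to the database, which is where most of the savings over a per-row loop would come from.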
User testimonials
"@etnoy what on earth have you done. I tried your PR and it finished the scan for 1M assets in 37 seconds down from 728s on main. It takes 188s just to finish queuing on main" -- @mertalev
Changes made
Plus several minor cleanups and performance enhancements.
Library scanning is at least an order of magnitude faster.
Benchmark 1
A library scan with 22k items where nothing has changed since the last scan used to take 1m 22s; it now takes less than 10 seconds, a reduction of more than 87 percent!
Benchmark 2
A clean library import with 19k items takes 1m40s in main and 7 seconds in this PR.
NOTE: this benchmark covers only the library service scan and does not include metadata extraction. Also, some fs calls have been moved from the library service to the metadata service, although this should have only a minor impact on overall scan performance.
Benchmark 3
Importing a library with >5M assets.
No need to compare to main, you know it's fast!
Benchmark 4
Importing a library of 527041 files took 1m58s (without metadata extraction) in this PR.
No need to compare to main, you know it's fast!
Bonus:
This scan imports all new files:
This is an "idle scan", where a refresh finds no changes:

Future work:
Final note:
This PR allowed me to hit a milestone of 10M assets in a single Immich instance, likely a world-first. This does require max-old-space-size=8096, but that's to be expected anyway