Skip to content

feat: integrity check jobs (missing files, untracked files, checksums)#24205

Open
insertish wants to merge 140 commits intomainfrom
feat/integrity-checks-izzy
Open

feat: integrity check jobs (missing files, untracked files, checksums)#24205
insertish wants to merge 140 commits intomainfrom
feat/integrity-checks-izzy

Conversation

@insertish
Copy link
Copy Markdown
Member

@insertish insertish commented Nov 26, 2025

  • Adds integrity check jobs:
    • Find missing files: Finds assets or asset files which are missing files on system. Checks all every day by default.
    • Find untracked files: Finds stray files not attached to either assets / asset files. Checks all every day by default.
    • Check checksums: Find assets whose checksum does not match source file. Maintains a continuous queue through all files which runs in batches for 1 hour each day by default.
  • Users can view integrity reports from the maintenance settings.
    • Each report can be viewed as a table, downloaded as a CSV, and re-run.
    • Individual entries in the reports can be downloaded (if applicable), deleted (removing underlying file or trashing asset).
    • Entire reports can be cleared (deleting all underlying files or trashing all assets).

How Has This Been Tested?

  • Service tests for integrity service.
  • Updated some tests where necessary. (job, queue, system-config)
  • e2e integration tests for API
  • e2e integration test for the integrity page

Screenshots

Screen Shot 2026-01-07 at 13 10 04 Screen Shot 2026-01-07 at 13 10 14 Screen Shot 2026-01-07 at 13 10 23
Screencast_20260107_131048.webm
Screen Shot 2026-01-07 at 13 16 15

Checklist:

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation if applicable
  • I have no unrelated changes in the PR.
  • I have confirmed that any new dependencies are strictly necessary.
  • I have written tests for new code (if applicable)
  • I have followed naming conventions/patterns in the surrounding code
  • All code in src/services/ uses repositories implementations for database calls, filesystem operations, etc.
  • All code in src/repositories/ is pretty basic/simple and does not have any immich specific logic (that belongs in src/services/)

Please describe to which degree, if any, an LLM was used in creating this pull request.

Used to figure out some of the types in the SQL query builder syntax.
Used as a starting point for some research into Postgres / SQL query performance.

Future Work

Signed-off-by: izzy <me@insrt.uk>
Signed-off-by: izzy <me@insrt.uk>
Signed-off-by: izzy <me@insrt.uk>
Signed-off-by: izzy <me@insrt.uk>
Signed-off-by: izzy <me@insrt.uk>
fix: should provide arrow function to , oops!

Signed-off-by: izzy <me@insrt.uk>
Signed-off-by: izzy <me@insrt.uk>
type?: IntegrityReportType;
isDeleting: boolean;
}) => {
if (type === reportType || reportId === id) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the id globally unique? Is it only unique for a given report type?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it's globally unique.
This check succeeds if:

  • type? is present, i.e. "delete all" triggered for a type
  • reportId? is present, i.e. this specific report is being deleted

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I misread this as an && 🤦

Copy link
Copy Markdown
Member

@danieldietzler danieldietzler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks great to me. Handing it off to @jrasm91 :D

.where('asset.deletedAt', 'is', null)
.select(['asset.originalPath as path'])
.select((eb) => [
eb.ref('asset.id').$castTo<string | null>().as('assetId'),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that you cast the end result to something much more meaningful, you also don't need these casts anymore. (+ the sql template strings don't need to be typed as string | null if they're always null

},
}: ArgOf<'ConfigInit'>) {
this.integrityLock = await this.databaseRepository.tryLock(DatabaseLock.IntegrityCheck);
if (this.integrityLock) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe change this to if (!this.integrityLock) { return; } so that the entire function body does not need to be in that if

}

getIntegrityReport(dto: IntegrityGetReportDto): Promise<IntegrityReportResponseDto> {
return this.integrityRepository.getIntegrityReport({ cursor: dto.cursor, limit: dto.limit || 100 }, dto.type);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I set a limit of 0, this will give me the first 100 reports. Do you want ?? 100 instead, or is this intentional?

date: lastCreatedAt?.toISOString(),
});

printStats();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does printStats even need to be a function? You're only calling it once

Comment on lines +651 to +653
const byAsset = reports.filter((report) => report.assetId);
const byFileAsset = reports.filter((report) => report.fileAssetId);
const byPath = reports.filter((report) => !report.assetId && !report.fileAssetId);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is cursed lmao

@danieldietzler danieldietzler requested a review from jrasm91 March 2, 2026 11:48
@LeLunZ
Copy link
Copy Markdown
Collaborator

LeLunZ commented Mar 4, 2026

Suggestions:

Wouldn't it make sense to make these integrity checks also be part of the "restoring from db dump/backup" process, so you can directly verify after a restore if there are missing assets in your database or in your filesystem?


Also for assets that are on the filesystem but not in the database, it would be nice if they can be re-uploaded through immich itself, so you don't have to press "download", get them on your device, and then re-upload, but just press "import" or similar.

@akostadinov
Copy link
Copy Markdown

Will this catch instances of:

Unable to run job handler (thumbnailGeneration/generate-jpeg-thumbnail): Error: Input file contains unsupported image format

See #7082 for more discussion. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants