checksum-freshness: Fix invalid checksum calculation for binary files #151137
Merged: rust-bors merged 2 commits into rust-lang:main from osiewicz:151090-checksum-freshness-binary-files on Jan 27, 2026
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| binaryÿ |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,6 +1,8 @@ | ||
| -lib.d: lib.rs foo.rs | ||
| +lib.d: lib.rs foo.rs binary_file | ||
| | ||
| lib.rs: | ||
| foo.rs: | ||
| -# checksum:blake3=94af75ee4ed805434484c3de51c9025278e5c3ada2315e2592052e102168a503 file_len:120 lib.rs | ||
| +binary_file: | ||
| +# checksum:blake3=4ac56f3f877798fb762d714c7bcb72e70133f4cc585f80dbd99c07755ae2c7f6 file_len:222 lib.rs | ||
| # checksum:blake3=2720e17bfda4f3b2a5c96bb61b7e76ed8ebe3359b34128c0e5d8032c090a4f1a file_len:119 foo.rs | ||
| +# checksum:blake3=119a5db8711914922c5b1c1908be4958175c5afa95c08888de594725329b5439 file_len:7 binary_file |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,7 +1,8 @@ | ||
| // A basic library to be used in tests with no real purpose. | ||
|
|
||
| mod foo; | ||
|
|
||
| // Binary file with invalid UTF-8 sequence. | ||
| static BINARY_FILE: &[u8] = include_bytes!("binary_file"); | ||
| pub fn sum(a: i32, b: i32) -> i32 { | ||
| a + b | ||
| } |
Maybe make this `IndexMap<String, Option<(u64, Option<SourceFileHash>)>>` where `None` indicates that the length and hash haven't been filled in yet?
Does that get us anything? This PR makes it so that files with binary contents are inserted into both `source_map.files()` (with a null hash and `unnormalized_source_len` equal to 0) and `file_depinfo`; with the use of `IndexMap`, we can override the entries coming from `source_map.files()` with hashes coming from `file_depinfo`. Thus, the value in that `IndexMap` would never be `None`.

Now, is that the cleanest way? I'm sure it's not, but I honestly don't understand the implications of removing these "null" binary entries from the source map well enough to make that cut.
I think it makes some amount of sense to keep them like this, as null entries? Especially if the hash can never really be `None`, as Piotr says. @bjorn3 does Piotr's comment convince you? Or d'ya have a good reason why making it an `Option` makes sense?
You are first inserting into `files` with a dummy value of hash 0 + len 0 and later overwriting it with the real value, right? My suggestion is to first insert `None` and later overwrite it with the real value. This way you can unwrap the entry at use and be guaranteed that you didn't accidentally get a dummy value.
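
The suggestion above could be sketched roughly as follows. This is only an illustration, not code from the PR: `BTreeMap` stands in for `IndexMap`, `String` stands in for `SourceFileHash`, and the helper names `register_pending`, `fill_in`, and `resolved` are hypothetical.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins: the real code uses `IndexMap` and `SourceFileHash`.
type Hash = String;
type Entry = Option<(u64, Option<Hash>)>;

/// Register a path whose length and hash are not known yet.
fn register_pending(files: &mut BTreeMap<String, Entry>, path: &str) {
    files.insert(path.to_string(), None);
}

/// Overwrite the placeholder once the real length and hash are computed.
fn fill_in(files: &mut BTreeMap<String, Entry>, path: &str, len: u64, hash: Option<Hash>) {
    files.insert(path.to_string(), Some((len, hash)));
}

/// Read an entry at the point of use; panics if it was never filled in,
/// so a placeholder can never be mistaken for a real value.
fn resolved(files: &BTreeMap<String, Entry>, path: &str) -> (u64, Option<Hash>) {
    files[path]
        .clone()
        .expect("entry was registered but never filled in")
}
```

The trade-off is that this only helps if the code knows, at insertion time, which entries are placeholders.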
I see your point. What's tricky about it is that we have no way of knowing whether the items in `files` are "bugged" or not.

That is to say, in the context of that function, we have no way to tell if an entry with `hash 0 + len 0` originated in an `include_bytes!("my_1024mb_file.bin");` call (the bugged one, which this PR tries to address) or in `include_str!("empty_file.txt")` (a legit one). Both of them have a `hash 0 + len 0`, and it is the correct value for the `include_str` one.
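
To make the ambiguity concrete, here is a small sketch. It assumes that a string-only store records invalid UTF-8 as empty text; `summarize` and `recorded_text` are hypothetical helpers, and `DefaultHasher` merely stands in for the real checksum.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical summary of a stored source text: (length in bytes, hash).
/// `DefaultHasher` is a stand-in for the real checksum algorithm.
fn summarize(text: &str) -> (u64, u64) {
    let mut hasher = DefaultHasher::new();
    text.hash(&mut hasher);
    (text.len() as u64, hasher.finish())
}

/// Text as a string-only store might record it: invalid UTF-8 becomes "".
/// (Assumed fallback, for illustration only.)
fn recorded_text(bytes: &[u8]) -> &str {
    std::str::from_utf8(bytes).unwrap_or("")
}
```

Under these assumptions, `summarize(recorded_text(b""))` and `summarize(recorded_text(b"binary\xff"))` are equal, so nothing in the stored entry reveals which one came from a genuinely empty file.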
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can change line 639 to return `None` instead of `(0, None)`, right? And adapt the return type of `hash_iter_files` accordingly. Or am I missing something?
I believe that branch (on line 639) is not relevant at all, and that's not where we were getting our fake nulls (`(hash 0, len 0)`) from.

Before this PR, the file path of a binary file (with non-UTF-8 contents) was injected into the depfile through `source_map().files()`. For non-UTF-8 files the values were not correct, since `SourceMap` only accepts `String`s as the contents of a source file. See the following snippet, as that's where we "fake out" the contents of a binary file when inserting it into a `SourceMap`:

rust/compiler/rustc_span/src/source_map.rs
Line 279 in 2850ca8

Crucially, we did not/do not call `hash_iter_files` on these, as we assume that `SourceMap` calculated the length and the hash correctly. Which it did not, in some cases (non-UTF-8 binary files). This then led to binary files having the hash of an empty string and a length of 0.

With this PR, that's still the case: binary files are still present in the `source_map().files()` result, but they're also injected into `file_depinfo`, and that lets us call `hash_iter_files` on them and get the correct hash/file length. The change to the type of `files` lets us override the "faulty" value with a good one. However, as outlined in https://github.com/rust-lang/rust/pull/151137/files#r2726164490, there's no straightforward way (IMHO) to use an `Option` here, because if we knew which entries from the `SourceMap` were "faulty", we could just call `hash_iter_files` on them directly without jumping through hoops. And line 639 is not really at fault for what we're struggling with.
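
The flow described above can be approximated with a self-contained sketch. None of this is the compiler's actual code: `BTreeMap` stands in for `IndexMap`, `toy_checksum` (FNV-1a) stands in for blake3, the empty-string fallback is an assumption based on the description above, and all function names are hypothetical.

```rust
use std::collections::BTreeMap;

/// (length in bytes, optional checksum); a hypothetical stand-in entry type.
type Entry = (u64, Option<u64>);

/// Toy checksum over raw bytes (FNV-1a), standing in for blake3.
fn toy_checksum(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Entry as a string-only source map would record it
/// (assumed fallback: invalid UTF-8 is stored as empty text).
fn entry_from_source_map(bytes: &[u8]) -> Entry {
    let text = std::str::from_utf8(bytes).unwrap_or("");
    (text.len() as u64, Some(toy_checksum(text.as_bytes())))
}

/// Entry recomputed from the raw bytes, as for paths listed in the dep-info.
fn entry_from_bytes(bytes: &[u8]) -> Entry {
    (bytes.len() as u64, Some(toy_checksum(bytes)))
}

/// Source-map entries go in first; dep-info entries then replace them by key.
fn collect(source_map: &[(&str, &[u8])], depinfo: &[(&str, &[u8])]) -> BTreeMap<String, Entry> {
    let mut files = BTreeMap::new();
    for (path, bytes) in source_map {
        files.insert(path.to_string(), entry_from_source_map(bytes));
    }
    for (path, bytes) in depinfo {
        // Inserting an existing key replaces the earlier, possibly faulty value.
        files.insert(path.to_string(), entry_from_bytes(bytes));
    }
    files
}
```

With the 7-byte `binary_file` from the test fixture (`b"binary\xff"`) present in both inputs, the final entry carries a length of 7 rather than the 0 that the string-only path would have recorded.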
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. In that case I'm fine with leaving the code as is.