10 changes: 9 additions & 1 deletion compiler/rustc_builtin_macros/src/source_util.rs
@@ -275,7 +275,15 @@ fn load_binary_file(
}
};
match cx.source_map().load_binary_file(&resolved_path) {
Ok(data) => Ok(data),
Ok(data) => {
cx.sess
.psess
.file_depinfo
.borrow_mut()
.insert(Symbol::intern(&resolved_path.to_string_lossy()));

Ok(data)
}
Err(io_err) => {
let mut err = cx.dcx().struct_span_err(
macro_span,
29 changes: 14 additions & 15 deletions compiler/rustc_interface/src/passes.rs
@@ -9,6 +9,7 @@ use rustc_ast as ast;
use rustc_attr_parsing::{AttributeParser, ShouldEmit};
use rustc_codegen_ssa::traits::CodegenBackend;
use rustc_codegen_ssa::{CodegenResults, CrateInfo};
use rustc_data_structures::indexmap::IndexMap;
use rustc_data_structures::jobserver::Proxy;
use rustc_data_structures::steal::Steal;
use rustc_data_structures::sync::{AppendOnlyIndexVec, FreezeLock, WorkerLocal};
@@ -586,7 +587,7 @@ fn write_out_deps(tcx: TyCtxt<'_>, outputs: &OutputFilenames, out_filenames: &[P
let result: io::Result<()> = try {
// Build a list of files used to compile the output and
// write Makefile-compatible dependency rules
let mut files: Vec<(String, u64, Option<SourceFileHash>)> = sess
let mut files: IndexMap<String, (u64, Option<SourceFileHash>)> = sess
bjorn3 (Member):
Maybe make this IndexMap<String, Option<(u64, Option<SourceFileHash>)>> where None indicates that the length and hash haven't been filled in yet?

osiewicz (Contributor, author), Jan 19, 2026:

Does that get us anything? This PR makes it so that files with binary contents are inserted into both source_map.files() (with a null hash and an unnormalized_source_len of 0) and file_depinfo; with the use of IndexMap, we can override the entries coming from source_map.files() with hashes coming from file_depinfo. Thus, the value in that IndexMap would never be None.

Now, is that the cleanest way? I'm sure it's not, but I honestly don't understand the implications of removing these "null" binary entries from the source map well enough to make that call.

Contributor:
I think it makes some amount of sense to keep them like this, as null entries? Especially if the hash can never really be None, as Piotr says. @bjorn3, does Piotr's comment convince you, or do you have a good reason why making it an Option makes sense?

bjorn3 (Member):
You are first inserting into files with a dummy value of hash 0 + len 0 and later overwriting it with the real value, right? My suggestion is to first insert None and later overwrite it with the real value. This way you can unwrap the entry at use and be guaranteed that you didn't accidentally get a dummy value.

osiewicz (Contributor, author), Jan 26, 2026:
I see your point. What's tricky about it is that we have no way of knowing whether the items in files are "bugged" or not.
That is to say, in the context of that function, we have no way to tell whether an entry with hash 0 + len 0 originated in an include_bytes!("my_1024mb_file.bin"); call (the bugged case this PR tries to address) or an include_str!("empty_file.txt") one (a legit case). Both of them have hash 0 + len 0, and for the include_str one that is the correct value.

bjorn3 (Member):
You can change line 639 to return None instead of (0, None), right? And adapt the return type of hash_iter_files accordingly. Or am I missing something?

osiewicz (Contributor, author), Jan 26, 2026:
I believe that branch (on line 639) is not relevant at all, and it's not where we were getting our fake nulls (hash 0, len 0) from.
Before this PR, the file path of a binary file (with non-UTF-8 contents) was injected into the depfile through source_map().files(). For non-UTF-8 files the values were not correct, since SourceMap only accepts Strings as the contents of a source file. See the following snippet, as that's where we "fake out" the contents of a binary file when inserting it into a SourceMap:

let text = std::str::from_utf8(&bytes).unwrap_or("").to_string();

Crucially, we did not (and do not) call hash_iter_files on these, as we assume that SourceMap calculated the length and hash correctly, which it did not in some cases (non-UTF-8 binary files). This then led to binary files having the hash of an empty string and a length of 0.

With this PR, that's still the case: binary files are still present in the source_map().files() result, but they're also injected into file_depinfo, and that lets us call hash_iter_files on them and get the correct hash/file length. The change to the type of files lets us override the "faulty" value with a good one. However, as outlined in https://github.com/rust-lang/rust/pull/151137/files#r2726164490, there's no straightforward way (IMHO) to use an Option here; if we knew which entries from the SourceMap were "faulty", we could just call hash_iter_files on them directly without jumping through these hoops. And line 639 is not really at fault for what we're struggling with.

bjorn3 (Member):
I see. In that case I'm fine with leaving the code as is.

.source_map()
.files()
.iter()
@@ -595,10 +596,12 @@
.map(|fmap| {
(
escape_dep_filename(&fmap.name.prefer_local_unconditionally().to_string()),
// This needs to be unnormalized,
// as external tools wouldn't know how rustc normalizes them
fmap.unnormalized_source_len as u64,
fmap.checksum_hash,
(
// This needs to be unnormalized,
// as external tools wouldn't know how rustc normalizes them
fmap.unnormalized_source_len as u64,
fmap.checksum_hash,
),
)
})
.collect();
@@ -616,7 +619,7 @@ fn write_out_deps(tcx: TyCtxt<'_>, outputs: &OutputFilenames, out_filenames: &[P
fn hash_iter_files<P: AsRef<Path>>(
it: impl Iterator<Item = P>,
checksum_hash_algo: Option<SourceFileHashAlgorithm>,
) -> impl Iterator<Item = (P, u64, Option<SourceFileHash>)> {
) -> impl Iterator<Item = (P, (u64, Option<SourceFileHash>))> {
it.map(move |path| {
match checksum_hash_algo.and_then(|algo| {
fs::File::open(path.as_ref())
@@ -632,8 +635,8 @@ fn write_out_deps(tcx: TyCtxt<'_>, outputs: &OutputFilenames, out_filenames: &[P
})
.ok()
}) {
Some((file_len, checksum)) => (path, file_len, Some(checksum)),
None => (path, 0, None),
Some((file_len, checksum)) => (path, (file_len, Some(checksum))),
None => (path, (0, None)),
}
})
}
@@ -707,18 +710,14 @@ fn write_out_deps(tcx: TyCtxt<'_>, outputs: &OutputFilenames, out_filenames: &[P
file,
"{}: {}\n",
path.display(),
files
.iter()
.map(|(path, _file_len, _checksum_hash_algo)| path.as_str())
.intersperse(" ")
.collect::<String>()
files.keys().map(String::as_str).intersperse(" ").collect::<String>()
)?;
}

// Emit a fake target for each input file to the compilation. This
// prevents `make` from spitting out an error if a file is later
// deleted. For more info see #28735
for (path, _file_len, _checksum_hash_algo) in &files {
for path in files.keys() {
writeln!(file, "{path}:")?;
}

@@ -747,7 +746,7 @@ fn write_out_deps(tcx: TyCtxt<'_>, outputs: &OutputFilenames, out_filenames: &[P
if sess.opts.unstable_opts.checksum_hash_algorithm().is_some() {
files
.iter()
.filter_map(|(path, file_len, hash_algo)| {
.filter_map(|(path, (file_len, hash_algo))| {
hash_algo.map(|hash_algo| (path, file_len, hash_algo))
})
.try_for_each(|(path, file_len, checksum_hash)| {
1 change: 1 addition & 0 deletions tests/run-make/checksum-freshness/binary_file
@@ -0,0 +1 @@
binaryÿ
6 changes: 4 additions & 2 deletions tests/run-make/checksum-freshness/expected.d
@@ -1,6 +1,8 @@
lib.d: lib.rs foo.rs
lib.d: lib.rs foo.rs binary_file

lib.rs:
foo.rs:
# checksum:blake3=94af75ee4ed805434484c3de51c9025278e5c3ada2315e2592052e102168a503 file_len:120 lib.rs
binary_file:
# checksum:blake3=4ac56f3f877798fb762d714c7bcb72e70133f4cc585f80dbd99c07755ae2c7f6 file_len:222 lib.rs
# checksum:blake3=2720e17bfda4f3b2a5c96bb61b7e76ed8ebe3359b34128c0e5d8032c090a4f1a file_len:119 foo.rs
# checksum:blake3=119a5db8711914922c5b1c1908be4958175c5afa95c08888de594725329b5439 file_len:7 binary_file
3 changes: 2 additions & 1 deletion tests/run-make/checksum-freshness/lib.rs
@@ -1,7 +1,8 @@
// A basic library to be used in tests with no real purpose.

mod foo;

// Binary file with invalid UTF-8 sequence.
static BINARY_FILE: &[u8] = include_bytes!("binary_file");
pub fn sum(a: i32, b: i32) -> i32 {
a + b
}