-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Point at invalid utf-8 span on user's source code #135557
Conversation
r? @fee1-dead rustbot has assigned @fee1-dead. Use |
Some changes occurred in src/tools/compiletest cc @jieyouxu |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
@@ -101,6 +101,8 @@ pub fn load_errors(testfile: &Path, revision: Option<&str>) -> Vec<Error> { | |||
|
|||
rdr.lines() | |||
.enumerate() | |||
// We want to ignore utf-8 failures in tests during collection of annotations. | |||
.filter(|(_, line)| line.is_ok()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This...
@@ -58,7 +58,7 @@ pub fn check(tests_path: impl AsRef<Path>, bad: &mut bool) { | |||
|
|||
let mut expected_revisions = BTreeSet::new(); | |||
|
|||
let contents = std::fs::read_to_string(test).unwrap(); | |||
let Ok(contents) = std::fs::read_to_string(test) else { continue }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...this, and another line in md-doc
blow up if a test has non-utf-8 bytes.
I ended up removing a test of ".rs
file with invalid few utf-8 bytes" because md-docs is in a separate repo. I feel we should make the entire test suite more resilient to these, but I think I can make that test I added and removed be in run-make
...
compiler/rustc_parse/src/lib.rs
Outdated
let source_file = psess.source_map().load_file(path).unwrap_or_else(|e| { | ||
let msg = format!("couldn't read {}: {}", path.display(), e); | ||
let msg = format!("couldn't read `{}`: {}", path.display(), e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the error here could be an enum of an io error or a utf8 error, and store the information about the span and message to be reported, and that would probably help with deduplicating these code
@@ -210,8 +210,34 @@ pub(crate) fn expand_include_str( | |||
MacEager::expr(cx.expr_str(cx.with_def_site_ctxt(bsp), interned_src)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe worth looking into unifying the logic here and the rustc_parse
logic somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the logic to a separate function and called it from both places.
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r=me after squashing
``` error: couldn't read `$DIR/not-utf8-bin-file.rs`: stream did not contain valid UTF-8 --> $DIR/not-utf8-2.rs:6:5 | LL | include!("not-utf8-bin-file.rs"); | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | note: `[193]` is not valid utf-8 --> $DIR/not-utf8-bin-file.rs:2:14 | LL | let _ = "�|�␂!5�cc␕␂��"; | ^ = note: this error originates in the macro `include` (in Nightly builds, run with -Z macro-backtrace for more info) ``` When we attempt to load a Rust source code file, if there is a OS file failure we try reading the file as bytes. If that succeeds we try to turn it into UTF-8. If *that* fails, we provide additional context about *where* the file has the first invalid UTF-8 character. Fix rust-lang#76869.
``` error: couldn't read `$DIR/not-utf8-bin-file.rs`: stream did not contain valid UTF-8 --> $DIR/not-utf8-2.rs:6:5 | LL | include!("not-utf8-bin-file.rs"); | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | note: byte `193` is not valid utf-8 --> $DIR/not-utf8-bin-file.rs:2:14 | LL | let _ = "�|�␂!5�cc␕␂��"; | ^ = note: this error originates in the macro `include` (in Nightly builds, run with -Z macro-backtrace for more info) ``` CC rust-lang#76869, rust-lang#135557.
@bors r=fee1-dead |
Point at invalid utf-8 span on user's source code ``` error: couldn't read `$DIR/not-utf8-bin-file.rs`: stream did not contain valid UTF-8 --> $DIR/not-utf8-2.rs:6:5 | LL | include!("not-utf8-bin-file.rs"); | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | note: byte `193` is not valid utf-8 --> $DIR/not-utf8-bin-file.rs:2:14 | LL | let _ = "�|�␂!5�cc␕␂��"; | ^ = note: this error originates in the macro `include` (in Nightly builds, run with -Z macro-backtrace for more info) ``` When we attempt to load a Rust source code file, if there is a OS file failure we try reading the file as bytes. If that succeeds we try to turn it into UTF-8. If *that* fails, we provide additional context about *where* the file has the first invalid UTF-8 character. Fix rust-lang#76869.
Rollup of 8 pull requests Successful merges: - rust-lang#135557 (Point at invalid utf-8 span on user's source code) - rust-lang#135596 (Properly note when query stack is being cut off) - rust-lang#135638 (Make it possible to build GCC on CI) - rust-lang#135648 (support wasm inline assembly in `naked_asm!`) - rust-lang#135826 (Misc. `rustc_resolve` cleanups) - rust-lang#135827 (CI: free disk with in-tree script instead of GitHub Action) - rust-lang#135850 (Update the `wasm-component-ld` tool) - rust-lang#135855 (Only assert the `Parser` size on specific arches) r? `@ghost` `@rustbot` modify labels: rollup
Point at invalid utf-8 span on user's source code ``` error: couldn't read `$DIR/not-utf8-bin-file.rs`: stream did not contain valid UTF-8 --> $DIR/not-utf8-2.rs:6:5 | LL | include!("not-utf8-bin-file.rs"); | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | note: byte `193` is not valid utf-8 --> $DIR/not-utf8-bin-file.rs:2:14 | LL | let _ = "�|�␂!5�cc␕␂��"; | ^ = note: this error originates in the macro `include` (in Nightly builds, run with -Z macro-backtrace for more info) ``` When we attempt to load a Rust source code file, if there is a OS file failure we try reading the file as bytes. If that succeeds we try to turn it into UTF-8. If *that* fails, we provide additional context about *where* the file has the first invalid UTF-8 character. Fix rust-lang#76869.
…iaskrgr Rollup of 10 pull requests Successful merges: - rust-lang#132983 (Edit dangling pointers ) - rust-lang#133154 (Reword resolve errors caused by likely missing crate in dep tree) - rust-lang#135409 (Fix ICE-133117: multiple never-pattern arm doesn't have false_edge_start_block) - rust-lang#135557 (Point at invalid utf-8 span on user's source code) - rust-lang#135596 (Properly note when query stack is being cut off) - rust-lang#135794 (Detect missing fields with default values and suggest `..`) - rust-lang#135814 (ci: use ghcr buildkit image) - rust-lang#135826 (Misc. `rustc_resolve` cleanups) - rust-lang#135837 (Remove test panic from File::open) - rust-lang#135856 (Library: Finalize dyn compatibility renaming) r? `@ghost` `@rustbot` modify labels: rollup
oops wrong pr |
You had me scared for a second there "oh, no! what fresh hell did this utf-8 handling cause during rollup?" XD |
…iaskrgr Rollup of 9 pull requests Successful merges: - rust-lang#132983 (Edit dangling pointers ) - rust-lang#135409 (Fix ICE-133117: multiple never-pattern arm doesn't have false_edge_start_block) - rust-lang#135557 (Point at invalid utf-8 span on user's source code) - rust-lang#135596 (Properly note when query stack is being cut off) - rust-lang#135794 (Detect missing fields with default values and suggest `..`) - rust-lang#135814 (ci: use ghcr buildkit image) - rust-lang#135826 (Misc. `rustc_resolve` cleanups) - rust-lang#135837 (Remove test panic from File::open) - rust-lang#135856 (Library: Finalize dyn compatibility renaming) r? `@ghost` `@rustbot` modify labels: rollup
When we attempt to load a Rust source code file, if there is a OS file failure we try reading the file as bytes. If that succeeds we try to turn it into UTF-8. If that fails, we provide additional context about where the file has the first invalid UTF-8 character.
Fix #76869.