Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Point at invalid utf-8 span on user's source code #135557

Merged
merged 1 commit into from
Jan 23, 2025
Merged

Conversation

estebank
Copy link
Contributor

@estebank estebank commented Jan 15, 2025

error: couldn't read `$DIR/not-utf8-bin-file.rs`: stream did not contain valid UTF-8
  --> $DIR/not-utf8-2.rs:6:5
   |
LL |     include!("not-utf8-bin-file.rs");
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
note: byte `193` is not valid utf-8
  --> $DIR/not-utf8-bin-file.rs:2:14
   |
LL |     let _ = "�|�␂!5�cc␕␂��";
   |              ^
   = note: this error originates in the macro `include` (in Nightly builds, run with -Z macro-backtrace for more info)

When we attempt to load a Rust source code file, if there is a OS file failure we try reading the file as bytes. If that succeeds we try to turn it into UTF-8. If that fails, we provide additional context about where the file has the first invalid UTF-8 character.

Fix #76869.

@rustbot
Copy link
Collaborator

rustbot commented Jan 15, 2025

r? @fee1-dead

rustbot has assigned @fee1-dead.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added A-compiletest Area: The compiletest test runner A-testsuite Area: The testsuite used to check the correctness of rustc S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jan 15, 2025
@rustbot
Copy link
Collaborator

rustbot commented Jan 15, 2025

Some changes occurred in src/tools/compiletest

cc @jieyouxu

@rust-log-analyzer

This comment has been minimized.

@rustbot rustbot added the A-tidy Area: The tidy tool label Jan 15, 2025
@rust-log-analyzer

This comment has been minimized.

compiler/rustc_builtin_macros/src/source_util.rs Outdated Show resolved Hide resolved
compiler/rustc_parse/src/lib.rs Outdated Show resolved Hide resolved
@@ -101,6 +101,8 @@ pub fn load_errors(testfile: &Path, revision: Option<&str>) -> Vec<Error> {

rdr.lines()
.enumerate()
// We want to ignore utf-8 failures in tests during collection of annotations.
.filter(|(_, line)| line.is_ok())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This...

@@ -58,7 +58,7 @@ pub fn check(tests_path: impl AsRef<Path>, bad: &mut bool) {

let mut expected_revisions = BTreeSet::new();

let contents = std::fs::read_to_string(test).unwrap();
let Ok(contents) = std::fs::read_to_string(test) else { continue };
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...this, and another line in md-doc blow up if a test has non-utf-8 bytes.

I ended up removing a test of ".rs file with invalid few utf-8 bytes" because md-docs is in a separate repo. I feel we should make the entire test suite more resilient to these, but I think I can make that test I added and removed be in run-make...

compiler/rustc_builtin_macros/src/source_util.rs Outdated Show resolved Hide resolved
Comment on lines 77 to 81
let source_file = psess.source_map().load_file(path).unwrap_or_else(|e| {
let msg = format!("couldn't read {}: {}", path.display(), e);
let msg = format!("couldn't read `{}`: {}", path.display(), e);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the error here could be an enum of an io error or a utf8 error, and store the information about the span and message to be reported, and that would probably help with deduplicating these code

@@ -210,8 +210,34 @@ pub(crate) fn expand_include_str(
MacEager::expr(cx.expr_str(cx.with_def_site_ctxt(bsp), interned_src))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe worth looking into unifying the logic here and the rustc_parse logic somehow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the logic to a separate function and called it from both places.

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 18, 2025
@rust-log-analyzer

This comment has been minimized.

@estebank estebank added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jan 18, 2025
Copy link
Member

@fee1-dead fee1-dead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r=me after squashing

```
error: couldn't read `$DIR/not-utf8-bin-file.rs`: stream did not contain valid UTF-8
  --> $DIR/not-utf8-2.rs:6:5
   |
LL |     include!("not-utf8-bin-file.rs");
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
note: `[193]` is not valid utf-8
  --> $DIR/not-utf8-bin-file.rs:2:14
   |
LL |     let _ = "�|�␂!5�cc␕␂��";
   |              ^
   = note: this error originates in the macro `include` (in Nightly builds, run with -Z macro-backtrace for more info)
```

When we attempt to load a Rust source code file, if there is a OS file failure we try reading the file as bytes. If that succeeds we try to turn it into UTF-8. If *that* fails, we provide additional context about *where* the file has the first invalid UTF-8 character.

Fix rust-lang#76869.
estebank added a commit to estebank/rust that referenced this pull request Jan 22, 2025
```
error: couldn't read `$DIR/not-utf8-bin-file.rs`: stream did not contain valid UTF-8
  --> $DIR/not-utf8-2.rs:6:5
   |
LL |     include!("not-utf8-bin-file.rs");
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
note: byte `193` is not valid utf-8
  --> $DIR/not-utf8-bin-file.rs:2:14
   |
LL |     let _ = "�|�␂!5�cc␕␂��";
   |              ^
   = note: this error originates in the macro `include` (in Nightly builds, run with -Z macro-backtrace for more info)
```

CC rust-lang#76869, rust-lang#135557.
@estebank
Copy link
Contributor Author

@bors r=fee1-dead

@bors
Copy link
Contributor

bors commented Jan 22, 2025

📌 Commit 57dd42d has been approved by fee1-dead

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 22, 2025
jieyouxu added a commit to jieyouxu/rust that referenced this pull request Jan 22, 2025
Point at invalid utf-8 span on user's source code

```
error: couldn't read `$DIR/not-utf8-bin-file.rs`: stream did not contain valid UTF-8
  --> $DIR/not-utf8-2.rs:6:5
   |
LL |     include!("not-utf8-bin-file.rs");
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
note: byte `193` is not valid utf-8
  --> $DIR/not-utf8-bin-file.rs:2:14
   |
LL |     let _ = "�|�␂!5�cc␕␂��";
   |              ^
   = note: this error originates in the macro `include` (in Nightly builds, run with -Z macro-backtrace for more info)
```

When we attempt to load a Rust source code file, if there is a OS file failure we try reading the file as bytes. If that succeeds we try to turn it into UTF-8. If *that* fails, we provide additional context about *where* the file has the first invalid UTF-8 character.

Fix rust-lang#76869.
bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 22, 2025
Rollup of 8 pull requests

Successful merges:

 - rust-lang#135557 (Point at invalid utf-8 span on user's source code)
 - rust-lang#135596 (Properly note when query stack is being cut off)
 - rust-lang#135638 (Make it possible to build GCC on CI)
 - rust-lang#135648 (support wasm inline assembly in `naked_asm!`)
 - rust-lang#135826 (Misc. `rustc_resolve` cleanups)
 - rust-lang#135827 (CI: free disk with in-tree script instead of GitHub Action)
 - rust-lang#135850 (Update the `wasm-component-ld` tool)
 - rust-lang#135855 (Only assert the `Parser` size on specific arches)

r? `@ghost`
`@rustbot` modify labels: rollup
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Jan 22, 2025
Point at invalid utf-8 span on user's source code

```
error: couldn't read `$DIR/not-utf8-bin-file.rs`: stream did not contain valid UTF-8
  --> $DIR/not-utf8-2.rs:6:5
   |
LL |     include!("not-utf8-bin-file.rs");
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
note: byte `193` is not valid utf-8
  --> $DIR/not-utf8-bin-file.rs:2:14
   |
LL |     let _ = "�|�␂!5�cc␕␂��";
   |              ^
   = note: this error originates in the macro `include` (in Nightly builds, run with -Z macro-backtrace for more info)
```

When we attempt to load a Rust source code file, if there is a OS file failure we try reading the file as bytes. If that succeeds we try to turn it into UTF-8. If *that* fails, we provide additional context about *where* the file has the first invalid UTF-8 character.

Fix rust-lang#76869.
bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 22, 2025
…iaskrgr

Rollup of 10 pull requests

Successful merges:

 - rust-lang#132983 (Edit dangling pointers )
 - rust-lang#133154 (Reword resolve errors caused by likely missing crate in dep tree)
 - rust-lang#135409 (Fix ICE-133117: multiple never-pattern arm doesn't have false_edge_start_block)
 - rust-lang#135557 (Point at invalid utf-8 span on user's source code)
 - rust-lang#135596 (Properly note when query stack is being cut off)
 - rust-lang#135794 (Detect missing fields with default values and suggest `..`)
 - rust-lang#135814 (ci: use ghcr buildkit image)
 - rust-lang#135826 (Misc. `rustc_resolve` cleanups)
 - rust-lang#135837 (Remove test panic from File::open)
 - rust-lang#135856 (Library: Finalize dyn compatibility renaming)

r? `@ghost`
`@rustbot` modify labels: rollup
@matthiaskrgr
Copy link
Member

@bors r-
#135891 (comment)

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Jan 22, 2025
@matthiaskrgr
Copy link
Member

oops wrong pr
@bors r=fee1-dead

@bors
Copy link
Contributor

bors commented Jan 22, 2025

📌 Commit 57dd42d has been approved by fee1-dead

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jan 22, 2025
@estebank
Copy link
Contributor Author

oops wrong pr

You had me scared for a second there "oh, no! what fresh hell did this utf-8 handling cause during rollup?" XD

bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 22, 2025
…iaskrgr

Rollup of 9 pull requests

Successful merges:

 - rust-lang#132983 (Edit dangling pointers )
 - rust-lang#135409 (Fix ICE-133117: multiple never-pattern arm doesn't have false_edge_start_block)
 - rust-lang#135557 (Point at invalid utf-8 span on user's source code)
 - rust-lang#135596 (Properly note when query stack is being cut off)
 - rust-lang#135794 (Detect missing fields with default values and suggest `..`)
 - rust-lang#135814 (ci: use ghcr buildkit image)
 - rust-lang#135826 (Misc. `rustc_resolve` cleanups)
 - rust-lang#135837 (Remove test panic from File::open)
 - rust-lang#135856 (Library: Finalize dyn compatibility renaming)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit b4266b0 into rust-lang:master Jan 23, 2025
6 checks passed
@rustbot rustbot added this to the 1.86.0 milestone Jan 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-compiletest Area: The compiletest test runner A-testsuite Area: The testsuite used to check the correctness of rustc A-tidy Area: The tidy tool S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ER] Line number for 'stream did not contain valid UTF-8' errors
7 participants