Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The diagnostic emitter doesn't deal with non-ASCII characters correctly when eliding parts of overly long source lines #132860

Open
fmease opened this issue Nov 10, 2024 · 1 comment
Assignees
Labels
A-diagnostics Area: Messages for errors, warnings, and lints A-Unicode Area: Unicode C-bug Category: This is a bug. D-diagnostic-infra Diagnostics: Issues that affect all diagnostics, or relate to the diagnostic machinery itself. P-low Low priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@fmease
Copy link
Member

fmease commented Nov 10, 2024

Found while reviewing #126597 for the nth and final time. Many parts of the emitter look fishy to me wrt. Unicode handling and to be honest it was quite frustrating to review code related to it because the emitter doesn't make any attempt to newtype the different units used, namely byte lengths/offsets, char / Unicode scalar lengths/offsets and Unicode widths (I hope rust-lang/annotate-snippets-rs will remedy that). That's just an aside.


The string offset acrobatics performed in HumanEmitter::render_source_line and HumanEmitter::draw_line look incredibly suspicious to me. Let me just link some parts where we likely incorrectly reinterpret different units (byte len, char count, Unicode width):

let left = margin.left(source_string.len());
// Account for unicode characters of width !=0 that were removed.
let left = source_string.chars().take(left).map(|ch| char_width(ch)).sum();

let line_len = source_string.len();
// Create the source line we will highlight.
let left = margin.left(line_len);
let right = margin.right(line_len);
// On long lines, we strip the source line, accounting for unicode.
let mut taken = 0;
let code: String = source_string
.chars()
.skip(left)
.take_while(|ch| {
// Make sure that the trimming on the right will fall within the terminal width.
let next = char_width(*ch);
if taken + next > right - left {
return false;
}
taken += next;
true
})
.collect();

I gave up trying to make sense of this – having to look at all the weakly typed variables and fields of type usize.
However, based on these functions I crafted a pathological input file where it's clear something is amiss.

Example Reproducer

/*这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。*/?

Compiler Output

Clearly butchered:

error: expected item, found `?`
 --> col.rs:1:170
  |
1.|....
  |^ expected item
  |
  = note: for a full list of items that can appear in modules, see <https://doc.rust-lang.org/reference/items.html>

error: aborting due to 1 previous error

Counterexample

Compare this to ASCII-only input:

/*aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa*/?

Compiler Output

Perfectly sensible:

error: expected item, found `?`
 --> col_ascii.rs:1:335
  |
1 | ...aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa*/?
  |                                                                                       ^ expected item
  |
  = note: for a full list of items that can appear in modules, see <https://doc.rust-lang.org/reference/items.html>

error: aborting due to 1 previous error
@fmease fmease added A-Unicode Area: Unicode C-bug Category: This is a bug. D-diagnostic-infra Diagnostics: Issues that affect all diagnostics, or relate to the diagnostic machinery itself. P-low Low priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Nov 10, 2024
@fmease fmease self-assigned this Nov 10, 2024
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Nov 10, 2024
@fmease fmease added A-diagnostics Area: Messages for errors, warnings, and lints and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Nov 10, 2024
@fmease
Copy link
Member Author

fmease commented Nov 10, 2024

I've tentatively assigned myself but I'm not sure if I will actually tackle this. Not before #126597 is merged, that's for sure.

Priority set to low because this should be turned into an upstream issue over at https://github.com/rust-lang/annotate-snippets-rs since I want to prioritize migrating rustc to annote-snippet-rs (#59346) instead of messing with rustc's emitter.rs. Note that --error-format=human-annotate-rs -Zunstable-options currently panics on the very input I gave above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-diagnostics Area: Messages for errors, warnings, and lints A-Unicode Area: Unicode C-bug Category: This is a bug. D-diagnostic-infra Diagnostics: Issues that affect all diagnostics, or relate to the diagnostic machinery itself. P-low Low priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

2 participants