Comment indentation/alignment changed in Rust 1.81 #6351

printfn · 2024-09-28T02:43:39Z

When I updated to Rust 1.81 I noticed that the comment indentation changed. Not sure if this is an intentional breakage or a bug.

I have the following code (formatted with Rust 1.80.1, rustfmt 1.7.0):

enum Enum12 {
	Fn,
	NotEquals,
	Backslash,
}

fn parse_symbol2(ch: char) -> Enum12 {
	match ch {
		'=' => Enum12::Fn,
		'\u{2260}' => Enum12::NotEquals,       // unicode not equal to symbol
		'\\' | '\u{3bb}' => Enum12::Backslash, // lambda symbol
		_ => todo!(),
	}
}

.rustfmt.toml:

hard_tabs = true

When I updated to Rust 1.81 (rustfmt 1.7.1) I got this diff:

diff --git a/src/lib.rs b/src/lib.rs
index 634b281..edc0fca 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -7,7 +7,7 @@ enum Enum12 {
 fn parse_symbol2(ch: char) -> Enum12 {
        match ch {
                '=' => Enum12::Fn,
-               '\u{2260}' => Enum12::NotEquals,       // unicode not equal to symbol
+               '\u{2260}' => Enum12::NotEquals, // unicode not equal to symbol
                '\\' | '\u{3bb}' => Enum12::Backslash, // lambda symbol
                _ => todo!(),
        }

Strangely the bug seems to be very dependent on the length of some of my identifiers. If I change the enum name to Enum or E the formatting difference goes away. Are comments like this meant to be aligned or not?

The text was updated successfully, but these errors were encountered:

ytmimi · 2024-09-29T20:28:26Z

Thanks for the report. I took a look at the 1.7.1 CHANGELOG, but it's unclear to me which of those changes, if any, caused this. Also, I'm unable to reproduce this when building the v1.7.1 tag and running rustfmt on the input.

git switch v1.7.1 --detach
cargo run --bin rustfmt -- --config=version=One,hard_tabs=true --check

That said, I am able to reproduce this when I run with the 1.81 toolchain:

 rustfmt +1.81 --config=hard_tabs=true --check

There's a chance that this change in behavior is related to changes in rustc (it has happened before). I think the best way to figure out what's going on here is to bisect between the 1.80.1 and 1.81 release to see what commit caused this change in behavior.

printfn · 2024-09-30T00:51:49Z

cargo-bisect-rustc identified this commit: rust-lang/rust@a70b2ae.

searched nightlies: from nightly-2024-06-07 to nightly-2024-07-20
regressed nightly: nightly-2024-06-10
searched commit range: rust-lang/rust@f21554f...a70b2ae
regressed commit: rust-lang/rust@a70b2ae

bisected with cargo-bisect-rustc v0.6.9

Host triple: x86_64-unknown-linux-gnu
Reproduce with:

cargo bisect-rustc --start=1.80.1 --end=1.81.0 -c rustfmt -- fmt --check

printfn · 2024-09-30T05:22:09Z

I've tracked down the regression to the unicode_width crate, which was updated from 0.1.12 to 0.1.13 in this PR: rust-lang/rust#126172. The PR unicode-rs/unicode-width#45 updated widths of control characters from 0 to 1, including the width of tab characters.

This patch reverses the behaviour and fixes the regression:

diff --git a/src/utils.rs b/src/utils.rs
index d1cfc6ac..31957e97 100644
--- a/src/utils.rs
+++ b/src/utils.rs
@@ -690,7 +690,7 @@ impl NodeIdExt for NodeId {
 }

 pub(crate) fn unicode_str_width(s: &str) -> usize {
-    s.width()
+    s.replace('\t', "").width()
 }

 #[cfg(test)]
@@ -713,4 +713,9 @@ mod test {
             Some("aaa\n    bbb\n    ccc".to_string())
         );
     }
+
+    #[test]
+    fn tab_width() {
+        assert_eq!(unicode_str_width("\t"), 0);
+    }
 }

ytmimi · 2024-09-30T15:19:37Z

Thanks for digging into this! That helped me remember that #6203 mentioned there would be an impact to non-ascii unicode chars, but I don't think we expected this to impact \t characters.

@calebcartwright do you have any thoughts on how we should handle this breakage?

If we don't run this with hard_tabs=true then both rustfmt +1.81 and rustfmt +1.80 format this as follows:

enum Enum12 {
    Fn,
    NotEquals,
    Backslash,
}

fn parse_symbol2(ch: char) -> Enum12 {
    match ch {
        '=' => Enum12::Fn,
        '\u{2260}' => Enum12::NotEquals, // unicode not equal to symbol
        '\\' | '\u{3bb}' => Enum12::Backslash, // lambda symbol
        _ => todo!(),
    }
}

Maybe that suggests that \t being assigned a width of 0 before was actually a mistake?

printfn · 2024-09-30T23:11:48Z

I don't think either 0 or 1 width is correct, rustfmt needs to properly calculate the tab width to make it consistent.

Here's an example that's broken with Rust 1.81's version of rustfmt (i.e. unicode-width v0.1.13):

cargo fmt -- --config hard_tabs=false

#[derive(Copy, Clone)]
pub enum EitherOrBoth {
    Left(VersionChunk),
    Right(VersionChunk),
    Both(VersionChunk, VersionChunk),
}

#[derive(Copy, Clone)]
pub enum VersionChunk {
    Str(&'static str),
    Number { source: &'static str },
}

pub fn version_sort(either_or_both: EitherOrBoth) -> std::cmp::Ordering {
    loop {
        match either_or_both {
            EitherOrBoth::Left(_) => return std::cmp::Ordering::Greater,
            EitherOrBoth::Right(_) => return std::cmp::Ordering::Less,
            EitherOrBoth::Both(a, b) => match (a, b) {
                (VersionChunk::Str(ca), VersionChunk::Str(cb))
                | (VersionChunk::Str(ca), VersionChunk::Number { source: cb, .. })
                | (VersionChunk::Number { source: ca, .. }, VersionChunk::Str(cb)) => {
                    match ca.cmp(&cb) {
                        std::cmp::Ordering::Equal => {
                            continue;
                        }
                        order @ _ => return order,
                    }
                }
                _ => todo!(),
            },
        }
    }
}

cargo fmt -- --config hard_tabs=true

#[derive(Copy, Clone)]
pub enum EitherOrBoth {
	Left(VersionChunk),
	Right(VersionChunk),
	Both(VersionChunk, VersionChunk),
}

#[derive(Copy, Clone)]
pub enum VersionChunk {
	Str(&'static str),
	Number { source: &'static str },
}

pub fn version_sort(either_or_both: EitherOrBoth) -> std::cmp::Ordering {
	loop {
		match either_or_both {
			EitherOrBoth::Left(_) => return std::cmp::Ordering::Greater,
			EitherOrBoth::Right(_) => return std::cmp::Ordering::Less,
			EitherOrBoth::Both(a, b) => match (a, b) {
				(VersionChunk::Str(ca), VersionChunk::Str(cb))
				| (VersionChunk::Str(ca), VersionChunk::Number { source: cb, .. })
				| (VersionChunk::Number { source: ca, .. }, VersionChunk::Str(cb)) => match ca.cmp(&cb) {
					std::cmp::Ordering::Equal => {
						continue;
					}
					order @ _ => return order,
				},
				_ => todo!(),
			},
		}
	}
}

You can see how the match ca.cmp(&cb) expression is formatted quite differently depending on whether we're using spaces or tabs.

ytmimi · 2024-11-02T18:30:52Z

Agreed that neither 0 nor 1 are correct. Getting unicode_str_width to calculating the precise width of any given tab character while rustfmt is rewriting your code will be challenging. It'll requires having context of all the characters that have already been formatted on the current line, which I don't think is always possible.

klensy · 2024-11-03T16:30:40Z

unicode-width 0.1 (0.1.14) branch reverted to old behavior, while 0.2 using new: unicode-rs/unicode-width#67

printfn · 2024-11-03T18:50:42Z

That's not correct, only the behaviour for newlines was reverted; the behaviour for tabs and other control characters has not been reverted. Regardless, passing any ASCII character <=31 to unicode-width is a bug. Other users of the crate such as rustc use code like this, which rustfmt also needs to do:

https://github.com/rust-lang/rust/blob/7028d9318fadc20e5e3058d52e44d785d31a6aaa/compiler/rustc_span/src/lib.rs#L2211

calebcartwright · 2024-11-03T19:25:55Z

Are comments like this meant to be aligned or not?

No

@calebcartwright do you have any thoughts on how we should handle this breakage?
It'll requires having context of all the characters that have already been formatted on the current line, which I don't think is always possible.

Caveat this by saying I'm not looking at the code so just brainstorming, but if all the call sites have the relevant info (presumably the config, maybe the shape?) in context would it be conceivable to pass those along and swap the tab width for the configured tab spaces that will be replaced later and just use the unicode crate for the width on other characters?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comment indentation/alignment changed in Rust 1.81 #6351

Comment indentation/alignment changed in Rust 1.81 #6351

printfn commented Sep 28, 2024

ytmimi commented Sep 29, 2024

printfn commented Sep 30, 2024 •

edited

Loading

printfn commented Sep 30, 2024

ytmimi commented Sep 30, 2024

printfn commented Sep 30, 2024

ytmimi commented Nov 2, 2024

klensy commented Nov 3, 2024

printfn commented Nov 3, 2024

calebcartwright commented Nov 3, 2024

Comment indentation/alignment changed in Rust 1.81 #6351

Comment indentation/alignment changed in Rust 1.81 #6351

Comments

printfn commented Sep 28, 2024

ytmimi commented Sep 29, 2024

printfn commented Sep 30, 2024 • edited Loading

printfn commented Sep 30, 2024

ytmimi commented Sep 30, 2024

printfn commented Sep 30, 2024

ytmimi commented Nov 2, 2024

klensy commented Nov 3, 2024

printfn commented Nov 3, 2024

calebcartwright commented Nov 3, 2024

printfn commented Sep 30, 2024 •

edited

Loading