Off-by-one in generated DWARF columns #65437
Comments
@RReverser it was definitely not intentional to make them 0-based or inconsistent with what clang is doing. I agree that this is a bug. This problem has been present in other places too:
So this bug isn't unique. I guess the root cause of this issue is Rust storing column numbers 0-based internally. |
@est31 Thanks for the confirmation. Do you want me to send a PR or could you make a change? |
Thanks, but the PR you linked was in fact one of my last PRs to this repo. I have since directed the focus of my free work to other projects. While I'm still open to paid work for contributions in the rust-lang org, this change is too small for that :). TLDR: I'd prefer if you did it :). |
Fair enough. If nobody does this earlier, I'll try to make a PR myself once I have some more spare time. Need to understand the best place to make this sort of adjustment first, because, as you said, it potentially affects several places. |
I'm still around to give gratis consulting :). Quick & easy fix would be in A longer term solution of the root of this problem would be to make spans start at column 1 everywhere in the compiler. Maybe I'm missing something and this is actually a bad idea, but the best way to determine that is to attempt a change. You'd probably have to remove +1's in a few places :). I suggest doing the easy fix first, then this bug is fixed. And in a second step, the internal format can be changed. Btw kudos to @bjorn3 : I just looked up cranelift's source code and they seem to add 1: https://github.com/bjorn3/rustc_codegen_cranelift/blob/0934dc84fd73d32b4964d6f80390bc9582e9152b/src/debuginfo.rs#L306 |
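The "quick & easy" adjustment described above amounts to adding 1 at the point where a column is handed to the debug info. A minimal sketch of that idea, assuming the internal column is a 0-based `usize`; the helper name `to_dwarf_column` is hypothetical and is not the compiler's actual API:

```rust
/// Hypothetical helper: convert a 0-based internal column into the
/// 1-based column that DWARF consumers expect.
fn to_dwarf_column(col0: usize) -> u32 {
    // Clamp first so that the +1 cannot overflow on absurdly long lines.
    col0.min(u32::MAX as usize - 1) as u32 + 1
}

fn main() {
    // First character of a line: internal column 0, emitted column 1.
    assert_eq!(to_dwarf_column(0), 1);
    // A location currently emitted as column 12 would become column 13.
    assert_eq!(to_dwarf_column(12), 13);
    println!("ok");
}
```

Keeping the conversion in one function makes it easier to remove later if spans are changed to be 1-based everywhere.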
Yeah, but I think I'll also need to account for the beginning of line, which is currently correctly reported at column |
Interesting, good point. However, I believe this isn't pointing to columns of 1 because of special casing for line starts. If I take the line with The spans where the column is 1 are all pointing to closing braces
const FOO: fn(u32) -> bool = |_v| true; // 38
const FOO: fn(u32) -> bool = |_v| true // 38
;
const FOO: fn(u32) -> bool = |_v| (true); // 40
const FOO: fn(u32) -> bool = (|_v| true); // 40
const FOO: fn(u32) -> bool = |_v| { true }; // 42
const FOO: fn(u32) -> bool = |_v| { true
}; // 1
These confirm the hypothesis: all spans, if you add 1, point to right behind the last character of the function. Further experiments show that even if a function has multiple return statements spread out over the function, it has only one LLVM |
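The numbers in the comments above can be cross-checked with a little arithmetic. The sketch below assumes the quoted lines had no leading indentation in the original test file and that each `// N` comment is the column found in the debug info; `end_column0` is a hypothetical helper that uses the index of the terminating `;` as the 0-based position right behind the closure expression:

```rust
/// 0-based column of the position right behind the closure expression,
/// taken here as the index of the terminating `;` (fine for ASCII lines).
fn end_column0(line: &str) -> Option<usize> {
    line.find(';')
}

fn main() {
    // (source line, column reported in the debug info)
    let cases = [
        ("const FOO: fn(u32) -> bool = |_v| true;", 38),
        ("const FOO: fn(u32) -> bool = |_v| (true);", 40),
        ("const FOO: fn(u32) -> bool = (|_v| true);", 40),
        ("const FOO: fn(u32) -> bool = |_v| { true };", 42),
        ("};", 1),
    ];
    for (line, reported) in cases {
        // The reported value matches the 0-based end of the expression,
        // so adding 1 gives the 1-based column right behind it.
        assert_eq!(end_column0(line), Some(reported));
    }
}
```

The case where `;` sits on its own line is left out, because the `;` lookup does not apply there.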
@est31 Ah, thanks, you're right, I haven't noticed that column is The case with multiple |
Although this probably still means that I need to handle this special case and not just add |
Sorry that sentence I wrote was easy to misunderstand. I got the same results as you. All the return statements get converted to jumps to a section on the end, which then has a single
|
Ah, yeah, I thought you meant that there are several |
I did a bit of digging on the "final ret instruction" question. Apparently this already occurs in MIR.
fn foo() -> bool {
    false
}
MIR is (generated by playground):
Here you can see the return instruction having a span starting at line 3, column 2, and ending at that same place (the printing code accurately adds 1 to the column so there's no offset here). I guess the MIR A little bit of further digging gives us this code snippet which is responsible for generating the MIR rust/src/librustc_mir/build/mod.rs Lines 633 to 645 in ac45719
You can see that As for improving this, you might use the body param passed to the |
Also, IMO making the |
Yeah, absolutely, and, as I said above, I'll try to make it soon-ish. Just trying to figure out all implications, like this one where after my PR there will be now a 2-char difference between the real column and produced one. |
@RReverser small nit: in the case of the |
@est31 There is no direct equivalent, but presumably you'd want debugger to point at something in the source code when stepping through or extracting ranges, and for that It already doesn't and points to non-existing column instead (1 char past the brace), which can easily cause problems for tooling that consumes DWARF, and with my suggested PR it would be pointing even further, 2 chars past the brace - this is what I meant when saying "2-char difference". |
Btw, as you can see above or by playing with multiple returns, this is what Clang is doing as well. |
@RReverser I opened a PR that contains a test for the proper column numbers (as well as the simple off-by-one fix in |
I don't expect to be able to continue work on #65676 for the time being, so if anyone reading this wants to pick it up, go for it! |
While parsing DWARF in custom tooling, I've noticed that Rust generates 0-based columns, whereas other compilers, including LLVM-based ones (e.g. Clang and Swift), generate 1-based columns.
This creates issues for tools that want to extract source spans, generate error messages or otherwise link to the original source location.
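To illustrate why the base matters for such tools, here is a minimal sketch of a consumer that maps a 1-based (line, column) pair to a byte offset in the source text. The function `byte_offset` and the sample source are hypothetical, and counting columns in bytes is an assumption of this sketch:

```rust
/// Hypothetical helper for a DWARF consumer: map a 1-based (line, column)
/// pair to a byte offset in `src`. Columns are counted in bytes here.
/// Returns None for 0 or out-of-range values.
fn byte_offset(src: &str, line: u32, column: u32) -> Option<usize> {
    let line_idx = line.checked_sub(1)? as usize;
    let col_idx = column.checked_sub(1)? as usize;
    let mut start = 0;
    for (i, text) in src.split('\n').enumerate() {
        if i == line_idx {
            return if col_idx <= text.len() { Some(start + col_idx) } else { None };
        }
        start += text.len() + 1; // +1 for the '\n'
    }
    None
}

fn main() {
    let src = "fn main() {\n    abort();\n}\n";
    // A 1-based producer emits 2:5, which lands on the `a` of `abort`.
    let off = byte_offset(src, 2, 5).unwrap();
    assert!(src[off..].starts_with("abort"));
    // A 0-based producer emits 2:4, which lands on the preceding space.
    let off = byte_offset(src, 2, 4).unwrap();
    assert!(src[off..].starts_with(" abort"));
    // Column 0 is rejected in this sketch.
    assert_eq!(byte_offset(src, 2, 0), None);
}
```

A consumer written this way silently reports a location one character too early when fed 0-based columns, which is the mismatch described here.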
Let's take an example for Clang:
This generates the following debug info (Godbolt):
You can see how locations for expressions start at the first char, e.g. 5:9 for abort(...), 10:5 for assert_less(...) etc.
Now let's take a Rust example:
This generates debug info (Godbolt):
You can see how locations for beginning of the line are still starting at column 1 like in Clang, but locations for various expressions are like 8:12 for abort, 14:4 for assert_less, etc. - that is, starting right before the expression rather than at its first character.
I wasn't sure which one is correct, so I also checked Swift compiler (it does the same as Clang) and GCC (it turns out not to implement column information yet).
I also looked at LLVM docs, and their examples for debug information also use 1-based columns, pointing at the first char of an expression: https://llvm.org/docs/SourceLevelDebugging.html
Finally, I checked the DWARF spec, and, while I couldn't find exactly how columns are supposed to be represented for expressions, I found this for declarations and expect it to be true / consistent for other items as well:
All in all, it looks like Rust is the one generating columns that violate LLVM and, subsequently, DWARF descriptions and expectations.