Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Off-by-one in generated DWARF columns #65437

Closed
RReverser opened this issue Oct 15, 2019 · 20 comments · Fixed by #69357
Closed

Off-by-one in generated DWARF columns #65437

RReverser opened this issue Oct 15, 2019 · 20 comments · Fixed by #69357
Labels
A-debuginfo Area: Debugging information in compiled programs (DWARF, PDB, etc.) C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@RReverser
Copy link
Contributor

While parsing DWARF in custom tooling, I've noticed that Rust generates 0-based columns, whereas other compilers, including LLVM-based (e.g. Clang and Swift) generate 1-based column.

This creates issues for tools that want to extract source spans, generate error messages or otherwise link to the original source location.

Let's take an example for Clang:

extern void abort();

void assert_less(int num1, int num2) {
    if (num1 >= num2) {
        abort();
    }
}

int main() {
    assert_less(10, 20);
    assert_less(30, 20);
    return 0;
}

This generates the following debug info (Godbolt):

!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 9.0.0 (tags/RELEASE_900/final 372344)", isOptimized: false, runtimeVersion: 0, emissionKind: LineTablesOnly, enums: !2, nameTableKind: None)
!1 = !DIFile(filename: "example.c", directory: "/home/ubuntu")
...
!7 = distinct !DISubprogram(name: "assert_less", scope: !8, file: !8, line: 3, type: !9, scopeLine: 3, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0, retainedNodes: !2)
!8 = !DIFile(filename: "./example.c", directory: "/home/ubuntu")
!9 = !DISubroutineType(types: !2)
!10 = !DILocation(line: 4, column: 9, scope: !7)
!11 = !DILocation(line: 4, column: 17, scope: !7)
!12 = !DILocation(line: 4, column: 14, scope: !7)
!13 = !DILocation(line: 5, column: 9, scope: !7)
!14 = !DILocation(line: 7, column: 1, scope: !7)
!15 = distinct !DISubprogram(name: "main", scope: !8, file: !8, line: 9, type: !9, scopeLine: 9, spFlags: DISPFlagDefinition, unit: !0, retainedNodes: !2)
!16 = !DILocation(line: 10, column: 5, scope: !15)
!17 = !DILocation(line: 11, column: 5, scope: !15)
!18 = !DILocation(line: 12, column: 5, scope: !15)

You can see how locations for expressions start at the first char, e.g. 5:9 for abort(...), 10:5 for assert_less(...) etc.

Now let's take a Rust example:

extern {
    fn abort();
}

fn assert_less(num1: i32, num2: i32) {
    if num1 >= num2 {
        unsafe {
            abort();
        }
    }
}

pub fn main() {
    assert_less(10, 20);
    assert_less(30, 20);
}

This generates debug info (Godbolt):

...
!2 = distinct !DICompileUnit(language: DW_LANG_Rust, file: !3, producer: "clang LLVM (rustc version 1.38.0 (625451e37 2019-09-23))", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: !4)
!3 = !DIFile(filename: "./example.rs", directory: "/home/ubuntu")
!4 = !{}
!5 = distinct !DISubprogram(name: "assert_less", linkageName: "_ZN7example11assert_less17h3f90e1d508137775E", scope: !6, file: !3, line: 5, type: !7, scopeLine: 5, flags: DIFlagPrototyped, spFlags: DISPFlagLocalToUnit | DISPFlagDefinition, unit: !2, templateParams: !4, retainedNodes: !4)
!6 = !DINamespace(name: "example", scope: null)
!7 = !DISubroutineType(types: !4)
!8 = !DILocation(line: 6, column: 7, scope: !5)
!9 = !DILocation(line: 6, column: 4, scope: !5)
!10 = !DILocation(line: 8, column: 12, scope: !11)
!11 = distinct !DILexicalBlock(scope: !5, file: !3, line: 7, column: 8)
!12 = !DILocation(line: 11, column: 1, scope: !5)
!13 = distinct !DISubprogram(name: "main", linkageName: "_ZN7example4main17h3efe614eed321469E", scope: !6, file: !3, line: 13, type: !7, scopeLine: 13, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !2, templateParams: !4, retainedNodes: !4)
!14 = !DILocation(line: 14, column: 4, scope: !13)
!15 = !DILocation(line: 15, column: 4, scope: !13)
!16 = !DILocation(line: 16, column: 1, scope: !13)

You can see how locations for beginning of the line are still starting at column 1 like in Clang, but locations for various expressions are like 8:12 for abort, 14:4 for assert_less, etc. - that is, starting right before the expression rather than at its first character.

I wasn't sure which one is correct, so I also checked Swift compiler (it does the same as Clang) and GCC (it turns out not to implement column information yet).

I also looked at LLVM docs, and their examples for debug information also use 1-based columns, pointing at the first char of an expression: https://llvm.org/docs/SourceLevelDebugging.html

Finally, I checked the DWARF spec, and, while I couldn't find exactly how columns are supposed to be represented for expressions, I found this for declarations and expect it to be true / consistent for other items as well:

The value of the DW_AT_decl_line attribute represents the source line number at which the first character of the identifier of the declared object appears. The value 0 indicates that no source line has been specified.

The value of the DW_AT_decl_column attribute represents the source column number at which the first character of the identifier of the declared object appears. The value 0 indicates that no column has been specified.

All in all, it looks like Rust is the one generating columns that violate LLVM and, subsequently, DWARF descriptions and expectations.

@jonas-schievink jonas-schievink added A-debuginfo Area: Debugging information in compiled programs (DWARF, PDB, etc.) C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Oct 15, 2019
@RReverser
Copy link
Contributor Author

RReverser commented Oct 15, 2019

@est31 Looks like you implemented relevant code in #51980, so maybe want to take a look? I'm happy to make a PR, but not sure if my assumptions here are wrong or something changed since the initial implementation that introduced the bug.

@est31
Copy link
Member

est31 commented Oct 15, 2019

@RReverser it was definitely not intentional to make them 0 based or inconsistent to what clang's doing. I agree that this is a bug. This problem had been present in other places too:

So this bug isn't unique. I guess the root cause of this issue is Rust storing column numbers 0 based internally.

@RReverser
Copy link
Contributor Author

@est31 Thanks for the confirmation. Do you want me to send a PR or could you make a change?

@est31
Copy link
Member

est31 commented Oct 15, 2019

Thanks, but the PR you linked was in fact one of my last PRs to this repo. I have since directed the focus of my free work to other projects. While I'm still open to paid work for contributions in the rust-lang org, this change is too small for that :). TLDR: I'd prefer if you did it :).

@RReverser
Copy link
Contributor Author

RReverser commented Oct 16, 2019

Fair enough. If nobody does this earlier, I'll try to make a PR myself once I have some more spare time. Need to understand the best place to make this sort of adjustment first, because, as you said, it potentially affects several places.

@est31
Copy link
Member

est31 commented Oct 16, 2019

I'm still around to give gratis consulting :).

Quick & easy fix would be in source_loc.rs where my PR introduced the change. For such a fix figuring out the right way to test would probably be the hardest part.

A longer term solution of the root of this problem would be to make spans start at column 1 everywhere in the compiler. Maybe I'm missing something and this is actually a bad idea, but the best way to determine that is to attempt a change. You'd probably have to remove +1's in a few places :).

I suggest doing the easy fix first, then this bug is fixed. And in a second step, the internal format can be changed.

Btw kudos to @bjorn3 : I just looked up cranelift's source code and they seem to add 1: https://github.com/bjorn3/rustc_codegen_cranelift/blob/0934dc84fd73d32b4964d6f80390bc9582e9152b/src/debuginfo.rs#L306

@RReverser
Copy link
Contributor Author

Quick & easy fix would be in source_loc.rs where my PR introduced the change.

Yeah, but I think I'll also need to account for the beginning of line, which is currently correctly reported at column 1 and shouldn't be moved forward. It's not hard to add a condition here, but that feels weird and error-prone.

@est31
Copy link
Member

est31 commented Oct 17, 2019

Interesting, good point. However, I believe this isn't pointing to columns of 1 because of special casing for line starts. If I take the line with abort() in your example and remove all indentation, it removes column info in LLVM IR, indicating that 0 was passed to LLVM (which then removes it as it was 0).

The spans where the column is 1 are all pointing to closing braces } of functions and are attached to ret statements. If you add a space prior to the }, the column becomes 2 instead of remaining at 1. So I don't think that the 1 is meant to indicate the } character, but what's behind it. I've also done some experiments with closures where you can omit the }:

const FOO: fn(u32) -> bool = |_v| true; // 38
const FOO: fn(u32) -> bool = |_v| true // 38
;
const FOO: fn(u32) -> bool = |_v| (true); // 40
const FOO: fn(u32) -> bool = (|_v| true); // 40
const FOO: fn(u32) -> bool = |_v| { true }; // 42
const FOO: fn(u32) -> bool = |_v| { true
}; // 1

These confirm the hypothesis: all spans, if you add 1, point to right behind the last character of the function. Further experiments show that even if a function has multiple return statements spread out over the function, it has only one LLVM ret statement that has this behaviour. Not sure whether it should be considered a bug or not, just to keep it in mind.

@RReverser
Copy link
Contributor Author

@est31 Ah, thanks, you're right, I haven't noticed that column is 1 only for closing braces. That makes way more sense now.

The case with multiple ret is interesting - do you have an example code? All the minimal samples with multiple returns I've tried so far compile down to a single ret with a phi instruction, which doesn't help to reproduce these differences.

@RReverser
Copy link
Contributor Author

I haven't noticed that column is 1 only for closing braces. That makes way more sense now.

Although this probably still means that I need to handle this special case and not just add 1 since it would point yet another char further beyond the last one.

@est31
Copy link
Member

est31 commented Oct 17, 2019

The case with multiple ret is interesting - do you have an example code? All the minimal samples with multiple returns I've tried so far compile down to a single ret with a phi instruction, which doesn't help to reproduce these differences.

Sorry that sentence I wrote was easy to misunderstand. I got the same results as you. All the return statements get converted to jumps to a section on the end, which then has a single ret instruction per function (not multiple). That ret has that specific span that points to beyond the end of the function.

Further experiments show that even if a function has multiple return statements spread out over the function, it has only one LLVM ret statement that has this behaviour.

@RReverser
Copy link
Contributor Author

Ah, yeah, I thought you meant that there are several ret statements but only one has this behaviour.

@est31
Copy link
Member

est31 commented Oct 17, 2019

I did a bit of digging on the "final ret instruction" question. Apparently this already occurs in MIR.
Given this example:

fn foo() -> bool {
    false
}

MIR is (generated by playground):

fn  foo() -> bool {
    let mut _0: bool;                    // return place in scope 0 at src/lib.rs:1:13: 1:17

    bb0: {
        _0 = const false;                // bb0[0]: scope 0 at src/lib.rs:2:5: 2:10
                                         // ty::Const
                                         // + ty: bool
                                         // + val: Scalar(0x00)
                                         // mir::Constant
                                         // + span: src/lib.rs:2:5: 2:10
                                         // + ty: bool
                                         // + literal: Const { ty: bool, val: Scalar(0x00) }
        return;                          // bb0[1]: scope 0 at src/lib.rs:3:2: 3:2
    }
}

Here you can see the return instrunction having a span starting at line 3, column 2, and ending at that same place (The printing code accurately adds 1 to the column so there's no offset here).

I guess the MIR return statement is being lowered to the LLVM ret statement we observed.

A little bit of further digging gives us this code snippet which is responsible for generating the MIR return statements:

// Attribute epilogue to function's closing brace
let fn_end = span.shrink_to_hi();
let source_info = builder.source_info(fn_end);
let return_block = builder.return_block();
builder.cfg.terminate(block, source_info,
TerminatorKind::Goto { target: return_block });
builder.cfg.terminate(return_block, source_info,
TerminatorKind::Return);
// Attribute any unreachable codepaths to the function's closing brace
if let Some(unreachable_block) = builder.cached_unreachable_block {
builder.cfg.terminate(unreachable_block, source_info,
TerminatorKind::Unreachable);
}

You can see that shrink_to_hi is being called on the function's span to obtain the span of the return statement. This is where the problem lies IMO because shrink_to_hi is just taking the end location for the span and building a zero-sized span around it.

As for improving this, you might use the body param passed to the construct_fn function and do a match on body.value.kind. If it's not a ExprKind::Block, you have stuff like closures. Here you can just do the old behaviour and return the shrink_to_hi on the function's span. If it is a ExprKind::Block, you need to do special casing. I'm not sure what will contain the needed info, but trial and error might help. Maybe it's enough to just take the block's span value and call shrink_to_hi on it. Maybe you need to iterate on all items in the block (iterator on stmts chained with iterator on expr) and then call shrink_to_hi on the final one. If there is no statement (a block like {}), then I don't know how a span can be obtained that points into the block.

@est31
Copy link
Member

est31 commented Oct 17, 2019

Also, IMO making the ret's span point to somewhere inside the block instead of slightly to its end is a different issue from this bug you filed. This bug is completely resolved by just a simple +1 PR.

@RReverser
Copy link
Contributor Author

This bug is completely resolved by just a simple +1 PR.

Yeah, absolutely, and, as I said above, I'll try to make it soon-ish. Just trying to figure out all implications, like this one where after my PR there will be now a 2-char difference between the real column and produced one.

@est31
Copy link
Member

est31 commented Oct 18, 2019

@RReverser small nit: in the case of the ret there will only be a difference of one char. Those ret statements actually have nothing corresponding to them in Rust's syntax.

@RReverser
Copy link
Contributor Author

@est31 There is no direct equivalent, but presumably you'd want debugger to point at something in the source code when stepping through or extracting ranges, and for that ret location needs to point at the closing brace.

It already doesn't and points to non-existing column instead (1 char past the brace), which can easily cause problems for tooling that consumes DWARF, and with my suggested PR it would be pointing even further, 2 chars past the brace - this is what I meant when saying "2-char difference".

@RReverser
Copy link
Contributor Author

and for that ret location needs to point at the closing brace

Btw, as you can see above or by playing with multiple returns, this is what Clang is doing as well.

@ecstatic-morse
Copy link
Contributor

@RReverser I opened a PR that contains a test for the proper column numbers (as well as the simple off-by-one fix in source_loc.rs). Perhaps it will be useful if you wish to proceed further.

@ecstatic-morse
Copy link
Contributor

I don't expect to be able to continue work on #65676 for the time being, so if anyone reading this wants to pick it up, go for it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-debuginfo Area: Debugging information in compiled programs (DWARF, PDB, etc.) C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
4 participants