Malformed coverage data when using llvm-cov #119453

Open
StackOverflowExcept1on opened this issue Dec 30, 2023 · 16 comments
Labels
A-code-coverage Area: Source-based code coverage (-Cinstrument-coverage) C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@StackOverflowExcept1on
Contributor

I tried to fuzz our Substrate runtime, but I am getting this error:

error: Failed to load coverage: 'target/x86_64-unknown-linux-gnu/coverage/x86_64-unknown-linux-gnu/release/main': Malformed coverage data
How to reproduce (on our large repo)
git clone --branch av/rust-1.76-support https://github.com/gear-tech/gear.git
cd gear
git checkout 849dbb301c751c951754b73b39a50a02e7296bef

cd utils/runtime-fuzzer

mkdir -p fuzz/corpus/main
dd if=/dev/urandom of=fuzz/corpus/main/fuzzer-seed-corpus bs=1 count=350000

# Run the fuzzer for at least 3 minutes, then press Ctrl-C to stop fuzzing.
cargo fuzz run \
    --release \
    --sanitizer=none \
    main \
    fuzz/corpus/main \
    -- \
        -rss_limit_mb=8192 \
        -max_len=450000 \
        -len_control=0

cargo fuzz coverage \
    --release \
    --sanitizer=none \
    main \
    fuzz/corpus/main \
    -- \
        -rss_limit_mb=8192 \
        -max_len=450000 \
        -len_control=0

HOST_TARGET=$(rustc -Vv | grep "host: " | sed "s/^host: \(.*\)$/\1/")
cargo cov -- show target/$HOST_TARGET/coverage/$HOST_TARGET/release/main \
  --format=text \
  --show-line-counts \
  --Xdemangler=rustfilt \
  --instr-profile=fuzz/coverage/main/coverage.profdata \
  --ignore-filename-regex=/rustc/ \
  --ignore-filename-regex=.cargo/ &> fuzz/coverage/main/coverage.txt

Meta

rustc --version --verbose:

rustc 1.77.0-nightly (3cdd004e5 2023-12-29)
binary: rustc
commit-hash: 3cdd004e55c869faa2b7b25efd3becf50346e7d6
commit-date: 2023-12-29
host: x86_64-unknown-linux-gnu
release: 1.77.0-nightly
LLVM version: 17.0.6
StackOverflowExcept1on added the C-bug label Dec 30, 2023
rustbot added the needs-triage label Dec 30, 2023
@StackOverflowExcept1on
Contributor Author

@Zalathar I don't know how to reproduce this with a minimal example, but would appreciate it if you could somehow debug this in LLVM code. As you wrote earlier this comes from llvm-cov. It throws coveragemap_error::malformed somewhere but without any backtrace.

Noratrieb added the T-compiler and A-code-coverage labels and removed the needs-triage label Dec 30, 2023
@StackOverflowExcept1on
Contributor Author

StackOverflowExcept1on commented Dec 30, 2023

I debugged LLVM and this comes from the same place: https://github.com/rust-lang/llvm-project/blob/fef3d7b14ede45d051dc688aae0bb8c8b02a0566/llvm/lib/ProfileData/Coverage/CoverageMappingReader.cpp#L340-L344
But I have no idea how to get the source file from ExpandedFileID in order to provide more info.

@Zalathar
Contributor

Pinpointing the LLVM error was very helpful, thanks. Without that I wouldn’t be able to do much.

Can you give me an example of the LineStart/ColumnStart/NumLines/ColumnEnd values that are present when it fails? That should give me a better idea of what’s going wrong on the Rust side.

@Zalathar
Contributor

(I might not be able to track down the underlying issue, but hopefully I should at least be able to add some extra checks to the compiler to make sure LLVM doesn’t encounter a fatal error.)

@StackOverflowExcept1on
Contributor Author

@Zalathar

LineStart=652 LineStartDelta=0 ColumnStart=9 NumLines=0 ColumnEnd=10
Counter in file 0 652:9 -> 652:10, #44
LineStart=1112 LineStartDelta=460 ColumnStart=1 NumLines=4294966836 ColumnEnd=10
Counter in file 0 1112:1 -> 4294967948:10, #0
here 1112, 1652, 10
error: Failed to load coverage: 'target/x86_64-unknown-linux-gnu/coverage/x86_64-unknown-linux-gnu/release/main': malformed coverage data
#10

@Zalathar
Contributor

I've submitted a possible workaround in #119460.

It doesn't address the underlying question of why we were producing improper regions in the first place, but it does at least mean that we will detect and discard those regions early, instead of emitting them and having llvm-cov fail.
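As a rough illustration of what "detect and discard those regions early" could look like, here is a minimal sketch. The `Region` type, its field names, and `discard_improper` are hypothetical and are not taken from the compiler or from #119460; the only assumption carried over from this thread is that a region counts as improper when its end coordinate sorts before its start coordinate.

```rust
/// Hypothetical coverage region in line/column coordinates.
/// The field names are illustrative and do not mirror rustc's internal types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Region {
    start_line: u32,
    start_col: u32,
    end_line: u32,
    end_col: u32,
}

impl Region {
    /// A region is properly ordered when (start_line, start_col) does not
    /// come after (end_line, end_col). Tuples compare lexicographically.
    fn is_properly_ordered(&self) -> bool {
        (self.start_line, self.start_col) <= (self.end_line, self.end_col)
    }
}

/// Drop every improperly ordered region instead of emitting it.
fn discard_improper(regions: Vec<Region>) -> Vec<Region> {
    regions
        .into_iter()
        .filter(Region::is_properly_ordered)
        .collect()
}
```

With the two regions from the log above, `652:9 -> 652:10` would be kept, while a region running from `1112:1` back to `652:10` would be dropped before it ever reaches llvm-cov.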

@Zalathar
Contributor

NumLines=4294966836 is 0xFFFF_FE34, which is -460 (relative to LineStart=1112).

This suggests that the original coordinates were 1112:1 -> 652:10.
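The arithmetic above can be checked with a few lines of Rust. This is only a sketch: it assumes `NumLines` was produced by an unsigned 32-bit subtraction of the start line from the end line, and the helper names are made up for illustration.

```rust
/// Reinterpret a wrapped unsigned `NumLines` value as the signed delta
/// it would represent in 32-bit two's complement.
fn signed_line_delta(num_lines: u32) -> i32 {
    num_lines as i32
}

/// Recover the end line that the wrapped value points back to,
/// using wrapping 32-bit addition.
fn intended_end_line(line_start: u32, num_lines: u32) -> u32 {
    line_start.wrapping_add(num_lines)
}

/// Show how the wrapped value arises in the first place:
/// end_line - start_line computed in unsigned 32-bit arithmetic.
fn wrapped_num_lines(line_start: u32, line_end: u32) -> u32 {
    line_end.wrapping_sub(line_start)
}
```

For the values in the log, `signed_line_delta(4294966836)` is `-460`, `intended_end_line(1112, 4294966836)` is `652`, and `wrapped_num_lines(1112, 652)` gives back `4294966836`.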

@Zalathar
Contributor

Spans within the compiler are always properly-ordered in terms of byte positions (enforced in Span::new), so I believe the only way to end up with improper line/column coordinates is if the span starts and ends in different files.

I'm not sure how we're ending up with a span like that. I initially suspected fn_sig_span, but it turns out that we require fn_sig_span and body_span to start in the same file, so that probably isn't the cause.
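To make the byte-position argument concrete, here is a toy model of how a span can be ordered by byte position and still yield out-of-order line numbers. It is a sketch under stated assumptions: the `SourceFile` struct, the `lookup_line` helper, and the idea of one global position space shared by all files are simplifications for illustration, not the compiler's actual source map API.

```rust
/// Hypothetical, simplified source file: each file occupies a contiguous
/// range of one global byte-position space.
struct SourceFile {
    name: &'static str,
    /// Global position of the file's first byte.
    start_pos: u32,
    /// Global positions at which each line begins, in ascending order.
    line_starts: Vec<u32>,
}

/// Return (file name, 1-based line number) for a global byte position.
/// `files` must be sorted by `start_pos`.
fn lookup_line(files: &[SourceFile], pos: u32) -> (&'static str, usize) {
    // The owning file is the last one that starts at or before `pos`.
    let file = files
        .iter()
        .rev()
        .find(|f| f.start_pos <= pos)
        .expect("position lies before the first file");
    // The count of line starts at or before `pos` is the 1-based line number.
    let line = file.line_starts.partition_point(|&s| s <= pos);
    (file.name, line)
}
```

If a span's low end falls late in one file and its high end falls early in the next file, the byte positions are still ascending, but the line numbers read in isolation go backwards, which is the shape of the `1112:1 -> 652:10` region.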

@Zalathar
Contributor

I wonder if the span adjustment in filtered_terminator_span for TerminatorKind::Call could be responsible, as it combines the endpoints of two potentially-unrelated spans.

Though if the resulting span crosses file boundaries, I would expect it to be discarded by unexpand_into_body_span, unless the body span also crosses file boundaries. But we don't do any ad-hoc manipulation of the endpoints of the body span itself, so the file-crossing span would have to already be present in MIR before coverage gets involved. I don't know whether that's possible or not.

@StackOverflowExcept1on
Contributor Author

@Zalathar Is there a way to find out which source file is causing the problem? I think it's pretty easy to edit the llvm-cov source code on the fly and then put the rebuilt binary into ~/.rustup/toolchains/<toolchain>/lib/rustlib/...

@Zalathar
Contributor

If you look at method RawCoverageMappingReader::readMappingRegionsSubArray, you should see a parameter unsigned InferredFileID.

That file ID should be an index into the field std::vector<StringRef> &Filenames in RawCoverageMappingReader.

So you might be able to rig up something to print out that filename.

@StackOverflowExcept1on
Contributor Author

StackOverflowExcept1on commented Dec 31, 2023

The problem comes from the macro construct_runtime!(): https://github.com/gear-tech/gear/blob/av/rust-1.76-support/runtime/vara/src/lib.rs#L1112

Counter in file 0 1112:1 -> 4294967948:10, #0
/home/.../work/gear/runtime/vara/src/lib.rs
LineStart=1112 LineStartDelta=460 ColumnStart=1 NumLines=4294966836 ColumnEnd=10
error: Failed to load coverage: 'target/x86_64-unknown-linux-gnu/coverage/x86_64-unknown-linux-gnu/release/main': malformed coverage data

@Zalathar
Contributor

That looks like the source of the 1112:1, but it leaves me puzzled as to where the 652:10 is coming from.

I don’t think those coordinates point anywhere meaningful in the same file, but I’m not sure what other file they could be trying to refer to.

@StackOverflowExcept1on
Contributor Author

I don’t know what the hell is going on at 652:9 -> 652:10, but I can attach the full log: coverage.txt

@Zalathar
Contributor

Zalathar commented Jan 1, 2024

Another piece of information that might be useful is the name of the function that has the malformed region in its coverage mappings.

It looks like you should be able to dump it from R.FunctionName in BinaryCoverageReader::readNextRecord.

But you might need to dump it before the call to Reader.read(), because I believe that's the call that fails when it encounters the bad region.

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Jan 22, 2024
…wiser

coverage: Never emit improperly-ordered coverage regions

If we emit a coverage region that is improperly ordered (end < start), `llvm-cov` will fail with `coveragemap_error::malformed`, which is inconvenient for users and also very hard to debug.

Ideally we would fix the root causes of these situations, but they tend to occur in very obscure edge-case scenarios (often involving nested macros), and we don't always have a good MCVE to work from. So it makes sense to also have a catch-all check that will prevent improperly-ordered regions from ever being emitted.

---

This is mainly aimed at resolving rust-lang#119453. We don't have a specific way to reproduce it, which is why I haven't been able to add a test case in this PR. But based on the information provided in that issue, this change seems likely to avoid the error in `llvm-cov`.

`@rustbot` label +A-code-coverage
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Jan 24, 2024
Rollup merge of rust-lang#119460 - Zalathar:improper-region, r=wesleywiser

@atodorov

atodorov commented Mar 8, 2024

FTR I was seeing the same issue when running cargo llvm-cov with another Substrate-based chain. rustc 1.77.0-nightly fixed it for me. I will wait for this to become stable and then upgrade.

@Zalathar thanks for your patch!
