Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression on nightly since LLVM 8 upgrade: thread sanitizer doesn't compile anymore #53945

Open
PaulGrandperrin opened this issue Sep 4, 2018 · 19 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-sanitizers Area: Sanitizers for correctness and code quality T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@PaulGrandperrin
Copy link

Hi, the fuzzer I maintain is failing to build on the latest nightlies:

The interesting part of the error log seems to be:

note: /usr/bin/ld: __sancov_guards has both ordered [`__sancov_guards' in /home/travis/build/rust-fuzz/honggfuzz-rs/example/hfuzz_target/x86_64-unknown-linux-gnu/release/deps/example-934b6185f6a63e31.example.ajs5tmgw-cgu.0.rcgu.o] and unordered [`__sancov_guards' in /home/travis/build/rust-fuzz/honggfuzz-rs/example/hfuzz_target/x86_64-unknown-linux-gnu/release/deps/example-934b6185f6a63e31.example.ajs5tmgw-cgu.0.rcgu.o] sections
          /usr/bin/ld: final link failed: Bad value
          collect2: error: ld returned 1 exit status

You can find the full log here:
https://travis-ci.org/rust-fuzz/honggfuzz-rs/jobs/424079778

I bisected on my computer the exact rust version that fails and it seems to be related to the LLVM 8 upgrade.

This version works well:

# rustup default nightly-2018-09-01
# rustc -vV
rustc 1.30.0-nightly (aaa170beb 2018-08-31)
binary: rustc
commit-hash: aaa170bebe31d03e2eea14e8cb06dc2e8891216b
commit-date: 2018-08-31
host: x86_64-unknown-linux-gnu
release: 1.30.0-nightly
LLVM version: 7.0

This version doesn't:

# rustup default nightly-2018-09-02
# rustc -vV                                                                                                                             Tue 04 Sep 2018 02:25:07 PM CEST
rustc 1.30.0-nightly (28bcffead 2018-09-01)
binary: rustc
commit-hash: 28bcffead74d5e17c6cb1f7de432e37f93a6b50c
commit-date: 2018-09-01
host: x86_64-unknown-linux-gnu
release: 1.30.0-nightly
LLVM version: 8.0
@guidovranken
Copy link

I have the same issue.

@PaulGrandperrin
Copy link
Author

libfuzzer-sys, which is used by cargo-fuzz also fails with the same error:
https://travis-ci.org/rust-fuzz/libfuzzer-sys/builds/417491674

@PaulGrandperrin PaulGrandperrin changed the title Regression on nightly: /usr/bin/ld: __sancov_guards has both ordered [...] and unordered [...] sections Regression on nightly since LLVM 8 upgrade: /usr/bin/ld: __sancov_guards has both ordered [...] and unordered [...] sections Sep 4, 2018
@Havvy Havvy added the regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. label Sep 5, 2018
PaulGrandperrin added a commit to rust-fuzz/honggfuzz-rs that referenced this issue Sep 6, 2018
PaulGrandperrin added a commit to rust-fuzz/honggfuzz-rs that referenced this issue Sep 6, 2018
@PaulGrandperrin
Copy link
Author

I progressed a little bit on narrowing down the root cause of the issue.
It's only triggered when using the thread sanitizer.
How to reproduce:

cd /tmp
git clone https://github.com/rust-fuzz/honggfuzz-rs.git
cd honggfuzz-rs/example/
RUSTFLAGS="-Z sanitizer=thread" ./test.sh

If you use the address or leak sanitizer or no sanitizers, there is no issues.

@eddyb eddyb removed the regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. label Sep 6, 2018
@eddyb
Copy link
Member

eddyb commented Sep 6, 2018

(The regression is nightly-to-nightly and recent, the label must've been an accident)

@PaulGrandperrin PaulGrandperrin changed the title Regression on nightly since LLVM 8 upgrade: /usr/bin/ld: __sancov_guards has both ordered [...] and unordered [...] sections Regression on nightly since LLVM 8 upgrade: thread sanitizer doesn't compile anymore Sep 9, 2018
@Shnatsel
Copy link
Member

Shnatsel commented Sep 13, 2018

I am facing the same issue with address sanitizer on rustc 1.30.0-nightly (f2302daef 2018-09-12) when building with libfuzzer (cargo-fuzz)

@memoryruins memoryruins added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Sep 15, 2018
@Aaron1011
Copy link
Member

I'd like to work on this.

@Aaron1011
Copy link
Member

Some initial findings:

That error is emited by ld when(I think) a section links to both both unordered and order sections. Ordered sections are defined by the presence of the SHF_LINK_ORDER ELF section header flag, which is described here

LLVM emits this flag in TargetLoweringObjectFileImpl.cpp here and here, in response to LLVMContext::MD_associated being set.

From what I can see, LLVMContext::MD_associated is unconditionally set by SanitizerCoverage when writing to the __sancov_guards section.

I'll need to investigate further to determine how this flag is getting left off.

@Aaron1011
Copy link
Member

I've determine that passing -C opt-level=0 causes the compilation to succeed, while passing -C opt-level=1 causes it to fail.

I suspect that this issue is caused by an interaction between LLVM's Dead Global Elimination pass (which doesn't run with opt-level=0) and the sanitizer. My guess is that LLVM ends up deleting an unused function referenced by MD_ASSOCIATED. This would leave FunctionGuardArray with a dangling reference to its function, causing getAssociatedSymbol to return null.

In this case, LLVM would no longer add the SHF_LINK_ORDER flag to the ELF section, resulting in a linker error due to the missing flag.

However, this is all still somewhat speculative. I'm going to try to come up with a minimal reproduction, which can hopefully be induced to fail/succeed by toggling the Dead Global Elimination pass.

@Aaron1011
Copy link
Member

Aaron1011 commented Sep 30, 2018

TL;DR: As as a temporary workaround, pass -C opt-level=0. This issue is caused by an LLVM bug, so it will need to be fixed upstream.

I've now determined that this is definitely an LLVM bug. I've created a minimal reproduction, which only uses Clang and other LLVM tools, here: https://github.com/Aaron1011/llvm_arg_elim

The issue occurs due to the behavior of LLVM's DeadArgumentEliminationPass (not Dead Global Elimination, as I had previously thought). When DeadArgumentEliminationPass removes arguments/return values from a function, it actually creates an entirely new function, and updates all references to the previous function. However, it fails to update any MD_associated metadata entries
targeting the old function.

As I described in my previous comment, this results in LLVM leaving off the SHF_LINK_ORDER flag when generating the ELF section header. Since there are still other __sancov_guarc sections with the header present (from functions that DeadArgumentEliminationPass didn't modify), ld will error when it sees the mismatched flags.

I'll be filing a bug with LLVM once I'm given an account on their bugtracker. For now, you can work around this issue by passing -C opt-level=0 to rustc. This will disable running LLVM optimizations, including DeadArgumentEliminationPass. Unfortunately, there doesn't seem to be a way to disable that particular pass, other than by disabling all optimizations.

@Aaron1011
Copy link
Member

Using the gold linker with -Clink-arg=-fuse-ld=gold seems to avoid this problem entirely.

When using the default (BFD) linker, the 'has both ordered and unordered' error appears to be triggered by two separate bugs:

  1. The LLVM DeadArgumentEliminationPass bug, which I'm still planning to upstream a fix for.
  2. The Dead Global Elimiation interaction that I mentioend here. I'm not sure if this is actually an LLVM bug - the existance of an MD_Associated global shouldn't prevent a function from being deleted, but there's no good way to delete the __sancov_gen global entirely. Since golddoesn't complain about SHF_LINK_ORDER being used inconsistently, I'm not sure if this is a real issue or not.

@Aaron1011
Copy link
Member

I've managed to come up with a full fix locally. I'll be submitting my changes to LLVM tomorrow, and will post the Phabricator link(s) here once I do so.

The cause of the issue:

  1. Several LLVM passes (ArgumentPromotion, DeadArgumentElimination, Inliner, GlobalDCE, GlobalOpt, Internalize, and possibly others) mishandle COMDATs and/or MD_Associated metadata - either through improper deletion, or failure to properly update.
  2. This mishandling can result in two kinds of malformed __sancov_guards sections:
    1. The associated function is stripped from the object, but the __sancov_gen_ symbol associated with it is still emitted in a __sancov_guards section. Since the associated function section does not exist in the object, the __sancov_guards section will have nothing to link to. This is due to LLVM failing to take COMDATs into account in several places when deleting dead code/objects.
    2. The associated function still exists in the object, but the __sancov_guards section is not linked to it. This is due to several LLVM passes accidentally removing the MD_Associated metadata from the __sancov_gen_ global object.

In both of these cases, the BFD linker will see proper, 'ordered' __sancov_guards sections (sh_link is set and the SHF_LINK_ORDER flag is set) in addition to an improper, unordered __sancov_guards section (which LLVM failed to link to its associated function).

@Aaron1011
Copy link
Member

Here are all of the LLVM patches I've submitted (checked-off items have been merged):

@sfackler
Copy link
Member

Nice! Might be worth pushing them to our llvm fork so we can pick them up more quickly?

@Aaron1011
Copy link
Member

I think it might be best to wait until they're (hopefully) all accepted by LLVM. Getting them into the rust LLVM fork is going to require cherry-picking some additional commits, and it's possible that the LLVM team might want some changes before my patches are merged.

SingingTree added a commit to SingingTree/afl.rs that referenced this issue Oct 22, 2018
rust-fuzz#141 + rust-lang/rust#53945 track issues with linkage
which regressed when rust updated to llvm 8. This commit adds a work
around for such issues for cargo-afl. This helps with the ergonomics of
cargo-afl, particularly for those less familiar with the project and the
above issues.

These changes can be safely removed once patches are landed in llvm and
rust updates to use the patched version.
frewsxcv pushed a commit to rust-fuzz/afl.rs that referenced this issue Oct 23, 2018
#141 + rust-lang/rust#53945 track issues with linkage
which regressed when rust updated to llvm 8. This commit adds a work
around for such issues for cargo-afl. This helps with the ergonomics of
cargo-afl, particularly for those less familiar with the project and the
above issues.

These changes can be safely removed once patches are landed in llvm and
rust updates to use the patched version.
brson added a commit to brson/tikv that referenced this issue Jan 1, 2019
Recent Rust compilers have bugs that appear when fuzzing
optimized binaries:

rust-lang/rust#53945

This patch works around the issue by adding the "-C codegen-units=1 -C
incremental=fuzz-incremental" arguments to `RUSTFLAGS`.

Why this works I don't actually know. This workaround isn't mentioned
in the linked issue, and afaik the "incremental" flag is simply
changing the directory of the incremental cache, not turning it on or
off.

Signed-off-by: Brian Anderson <[email protected]>
garyttierney added a commit to garyttierney/secsp that referenced this issue Jan 3, 2019
This is a workaround for an LLVM 8 codegen issue. See
rust-lang/rust#53945 for more information.
brson added a commit to brson/tikv that referenced this issue Jan 3, 2019
Recent Rust compilers have bugs that appear when fuzzing
optimized binaries:

rust-lang/rust#53945

This patch works around the issue by adding the "-C codegen-units=1 -C
incremental=fuzz-incremental" arguments to `RUSTFLAGS`.

Why this works I don't actually know. This workaround isn't mentioned
in the linked issue, and afaik the "incremental" flag is simply
changing the directory of the incremental cache, not turning it on or
off.

Signed-off-by: Brian Anderson <[email protected]>
brson added a commit to brson/tikv that referenced this issue Jan 3, 2019
Recent Rust compilers have bugs that appear when fuzzing
optimized binaries:

rust-lang/rust#53945

This patch works around the issue by adding the "-C codegen-units=1 -C
incremental=fuzz-incremental" arguments to `RUSTFLAGS`.

Why this works I don't actually know. This workaround isn't mentioned
in the linked issue, and afaik the "incremental" flag is simply
changing the directory of the incremental cache, not turning it on or
off.

Signed-off-by: Brian Anderson <[email protected]>
garyttierney added a commit to garyttierney/secsp that referenced this issue Jan 3, 2019
This is a workaround for an LLVM 8 codegen issue. See
rust-lang/rust#53945 for more information.
garyttierney added a commit to garyttierney/secsp that referenced this issue Jan 3, 2019
This is a workaround for an LLVM 8 codegen issue. See
rust-lang/rust#53945 for more information.
brson added a commit to brson/tikv that referenced this issue Jan 7, 2019
Recent Rust compilers have bugs that appear when fuzzing
optimized binaries:

rust-lang/rust#53945

This patch works around the issue by adding the "-C codegen-units=1 -C
incremental=fuzz-incremental" arguments to `RUSTFLAGS`.

Why this works I don't actually know. This workaround isn't mentioned
in the linked issue, and afaik the "incremental" flag is simply
changing the directory of the incremental cache, not turning it on or
off.

Signed-off-by: Brian Anderson <[email protected]>
brson added a commit to brson/tikv that referenced this issue Jan 9, 2019
Recent Rust compilers have bugs that appear when fuzzing
optimized binaries:

rust-lang/rust#53945

This patch works around the issue by adding the "-C codegen-units=1 -C
incremental=fuzz-incremental" arguments to `RUSTFLAGS`.

Why this works I don't actually know. This workaround isn't mentioned
in the linked issue, and afaik the "incremental" flag is simply
changing the directory of the incremental cache, not turning it on or
off.

Signed-off-by: Brian Anderson <[email protected]>
brson added a commit to brson/tikv that referenced this issue Jan 9, 2019
Recent Rust compilers have bugs that appear when fuzzing
optimized binaries:

rust-lang/rust#53945

This patch works around the issue by adding the "-C codegen-units=1 -C
incremental=fuzz-incremental" arguments to `RUSTFLAGS`.

Why this works I don't actually know. This workaround isn't mentioned
in the linked issue, and afaik the "incremental" flag is simply
changing the directory of the incremental cache, not turning it on or
off.

Signed-off-by: Brian Anderson <[email protected]>
jamii added a commit to MaterializeInc/materialize that referenced this issue Apr 25, 2019
@nagisa
Copy link
Member

nagisa commented Oct 20, 2019

This might be related to the linker versions. I.e. something that fails to build on travis builds successfully on my local machine with a super recent toolchain.

@jonhoo
Copy link
Contributor

jonhoo commented Feb 4, 2020

That's some great detective work @Aaron1011! Did you ever hear anything more back about those patches?

jonhoo added a commit to jonhoo/flurry that referenced this issue Feb 4, 2020
jonhoo added a commit to jonhoo/flurry that referenced this issue Feb 4, 2020
The fix of using `ld.gold` from
rust-lang/rust#53945 (comment)
sadly did not work for me, so instead we're going to only run ASAN on
unit and integration tests. It's not great.
@g2p
Copy link
Contributor

g2p commented Apr 4, 2020

I would suggest raising the urgency on this, because the ld.gold workaround doesn't work here anymore.

Relevant versions:

ld.gold --version
GNU gold (GNU Binutils for Ubuntu 2.34) 1.16

apt policy binutils
Installed: 2.34-5ubuntu1
(running the soon to be released Ubuntu 20.04)

rustc --version
rustc 1.42.0 (b8cedc004 2020-03-09)

Both honggfuzz-rs and bolero (all backends: libfuser, honggfuzz, afl) now get linker errors.

Using the gold linker with -Clink-arg=-fuse-ld=gold seems to avoid this problem entirely.

When using the default (BFD) linker, the 'has both ordered and unordered' error appears to be triggered by two separate bugs:

1. The LLVM `DeadArgumentEliminationPass` bug, which I'm still planning to upstream a fix for.

2. The `Dead Global Elimiation` interaction that I mentioend [here](https://github.com/rust-lang/rust/issues/53945#issuecomment-425620542). I'm not sure if this is actually an LLVM bug - the existance of an `MD_Associated` global shouldn't prevent a function from being deleted, but there's no good way to delete the `__sancov_gen` global entirely. Since `gold`doesn't complain about `SHF_LINK_ORDER` being used inconsistently, I'm not sure if this is a real issue or not.

@Jules-Bertholet
Copy link
Contributor

@rustbot label A-sanitizers

@rustbot rustbot added the A-sanitizers Area: Sanitizers for correctness and code quality label Apr 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-sanitizers Area: Sanitizers for correctness and code quality T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests