valgrind: Mark ppc64 w/ ELFv2 as unsupported #295906
OPNA2608 wants to merge 1 commit into NixOS:staging
Dig deeper, we shall do…
Okay… Adélie's valgrind works with binaries produced by Adélie's GCC+musl toolchain. So does our valgrind with binaries from Adélie's toolchain. But Adélie's valgrind dies in the same manner as reported above with binaries from our GCC toolchain (glibc or musl, doesn't matter). So I guess something about our toolchain is breaking this? I'll rework this to just mark the tests as unsupported instead, since they're definitely broken in this configuration: the ELFv2 patch comes from Adélie, and they don't run the tests there. Maybe we can also fetch another ppc64 patch from them, though I haven't tested yet whether it works on glibc.
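The elimination in this comment (which build of valgrind vs. which toolchain built the binary) can be written down as a tiny truth table. The sketch below is a minimal, hypothetical illustration: the names `OBSERVATIONS` and `factor_explains` are made up, and the data are only the combinations reported in this thread. With this few samples it can only show that one factor is *consistent* with every outcome, not prove that it is the cause.

```python
# Observed outcomes from this thread; True means valgrind ran the binary cleanly.
# Key: (who built valgrind, which toolchain built the binary)
# "ours" = Nixpkgs, "adelie" = Adélie's GCC+musl toolchain.
OBSERVATIONS = {
    ("adelie", "adelie"): True,
    ("ours", "adelie"): True,
    ("adelie", "ours"): False,  # fails with both glibc and musl binaries
    ("ours", "ours"): False,    # the original report in the PR description
}


def factor_explains(observations, index):
    """Return True if the factor at `index` alone maps to a single outcome per value."""
    outcome_by_value = {}
    for key, ok in observations.items():
        value = key[index]
        # The first outcome seen for a value is remembered; any disagreement rules it out.
        if outcome_by_value.setdefault(value, ok) != ok:
            return False
    return True


print("valgrind build explains it:", factor_explains(OBSERVATIONS, 0))
print("binary toolchain explains it:", factor_explains(OBSERVATIONS, 1))
```

Only the binary toolchain splits the table cleanly, which matches the suspicion that something about the Nixpkgs toolchain output is involved.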
Testing with
Also tried:
The graceful error:

Zero experience with Valgrind internals or POWER register details. AFAICT it's trying to identify which register that offset & size map to, and comes up empty. The 64-bit POWER ELF V2 specs say offset 44 is in the middle of an 8-byte general-purpose register, so the offset & size are bad. It would match GPR12 on 32-bit PowerPC… but I'm not sure how we could arrive there. Also, there's still the issue of all the memory error traces inside the libcs before this. I'm really not sure what to make of all of this… Considering that for all intents and purposes, Nixpkgs' …
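The reasoning about offset 44 is plain division: if registers of a fixed width sit back to back, an offset maps to `(offset - base) // width` with a remainder giving the byte position inside that register. The sketch below assumes exactly such a contiguous layout; `locate` is a hypothetical helper, and the `base` of 0 is a placeholder. In valgrind's real guest state other fields precede the GPRs, so the indices printed here are illustrative only and are not meant to identify the register Memcheck was looking for.

```python
def locate(offset, size, base, width):
    """Map the byte range [offset, offset + size) onto a hypothetical register file.

    Assumes registers of `width` bytes laid out contiguously starting at `base`.
    Returns (register_index, byte_within_register, covers_whole_register).
    """
    rel = offset - base
    if rel < 0:
        raise ValueError("offset precedes the register file")
    index, within = divmod(rel, width)
    if within + size > width:
        raise ValueError("range straddles two registers")
    return index, within, (within == 0 and size == width)


# off=44, sz=4 from the Memcheck assertion, with a placeholder base of 0:
print(locate(44, 4, base=0, width=8))  # 8-byte GPRs: (5, 4, False), half of a register
print(locate(44, 4, base=0, width=4))  # 4-byte GPRs: (11, 0, True), a whole register
```

With 8-byte registers the 4-byte access starts halfway into a slot, which is why a lookup table that only knows whole 64-bit registers would come up empty; with 4-byte registers the same access lines up exactly.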
(force-pushed 9d83d9e to 61d47db)
I don't have the capacity / willpower to debug this much further for now, so let's just try marking this as unsupported. I'm also switching the ELFv2 patch from the Void version to the Adélie one, to track this patch's upstream more closely.
(force-pushed 3162b09 to c140563)
- Tests are still failing due to ELFv1 hardcoding
- We can build for ppc64 ELFv2, but running the resulting valgrind binary on anything Nixpkgs produces results in
  - a SIGSEGV with glibc
    - many reports of uninitialised value usage inside glibc
    - the process terminating on a SIGSEGV with "Bad permissions for mapped region" inside glibc
    - valgrind itself crashing with a SIGSEGV in `do_syscall_WRK`
  - a slightly more graceful crash of valgrind with musl
    - similar reports of uninitialised value usage inside musl
    - valgrind exits upon encountering "the impossible":

```
MC_(get_otrack_shadow_offset)(ppc64)(off=44,sz=4)
Memcheck: mc_machine.c:403 (get_otrack_shadow_offset_wrk): the 'impossible' happened.
host stacktrace:
==4118== at 0x58053514: show_sched_status_wrk (m_libcassert.c:406)
==4118== by 0x580536E7: report_and_quit (m_libcassert.c:477)
==4118== by 0x58053887: vgPlain_assert_fail (m_libcassert.c:543)
==4118== by 0x5804285F: get_otrack_shadow_offset_wrk (mc_machine.c:403)
==4118== by 0x5804285F: vgMemCheck_get_otrack_shadow_offset (mc_machine.c:97)
==4118== by 0x58008BBB: mb_get_origin_for_guest_offset (mc_main.c:4618)
==4118== by 0x58008BBB: mc_pre_reg_read (mc_main.c:4685)
==4118== by 0x580E2BAB: vgSysWrap_generic_sys_read_before (syswrap-generic.c:4261)
==4118== by 0x580D0E7B: vgPlain_client_syscall (syswrap-main.c:2240)
==4118== by 0x580CAE37: handle_syscall (scheduler.c:1206)
==4118== by 0x580CDFDB: vgPlain_scheduler (scheduler.c:1552)
==4118== by 0x5812011B: thread_wrapper (syswrap-linux.c:102)
==4118== by 0x5812011B: run_a_thread_NORETURN (syswrap-linux.c:155)
sched status:
running_tid=1
Thread 1: status = VgTs_Runnable syscall 3 (lwpid 4118)
==4118== at 0x407B2C0: ??? (in /nix/store/5837cqfasvlj1xzdwd4f47fgi3c6cszg-musl-powerpc64-unknown-linux-musl-1.2.3/lib/libc.so)
==4118== by 0x4069FC7: __syscall_cp (in /nix/store/5837cqfasvlj1xzdwd4f47fgi3c6cszg-musl-powerpc64-unknown-linux-musl-1.2.3/lib/libc.so)
==4118== by 0x407590F: read (in /nix/store/5837cqfasvlj1xzdwd4f47fgi3c6cszg-musl-powerpc64-unknown-linux-musl-1.2.3/lib/libc.so)
==4118== by 0x40780CB: map_library (in /nix/store/5837cqfasvlj1xzdwd4f47fgi3c6cszg-musl-powerpc64-unknown-linux-musl-1.2.3/lib/libc.so)
==4118== by 0x4078CFF: load_library (in /nix/store/5837cqfasvlj1xzdwd4f47fgi3c6cszg-musl-powerpc64-unknown-linux-musl-1.2.3/lib/libc.so)
==4118== by 0x4079F93: __dls3 (in /nix/store/5837cqfasvlj1xzdwd4f47fgi3c6cszg-musl-powerpc64-unknown-linux-musl-1.2.3/lib/libc.so)
==4118== by 0x407969B: __dls2b (in /nix/store/5837cqfasvlj1xzdwd4f47fgi3c6cszg-musl-powerpc64-unknown-linux-musl-1.2.3/lib/libc.so)
==4118== by 0x40795A7: __dls2 (in /nix/store/5837cqfasvlj1xzdwd4f47fgi3c6cszg-musl-powerpc64-unknown-linux-musl-1.2.3/lib/libc.so)
==4118== by 0x4076AF3: _dlstart_c (in /nix/store/5837cqfasvlj1xzdwd4f47fgi3c6cszg-musl-powerpc64-unknown-linux-musl-1.2.3/lib/libc.so)
==4118== by 0x407B313: ??? (in /nix/store/5837cqfasvlj1xzdwd4f47fgi3c6cszg-musl-powerpc64-unknown-linux-musl-1.2.3/lib/libc.so)
```
(force-pushed c140563 to 9c397c6)
This is a long shot (and apologies for the random ping, feel free to disregard if you're busy), but @awilfox, maybe you have some input/advice on this? I'm not super knowledgeable about how to debug this… Maybe our compiler is slightly misconfigured for your valgrind patch?
I tried, but I find digging through Nix's GCC packaging quite exhausting. Everyone I know who used to use Nix has switched away, so this is rough. However, I would be quite surprised if it weren't something related to the toolchain.
Perhaps you could compile a very small hello world binary using our toolchain and yours, and check the output of …

N.B. we don't run valgrind's test suite because a lot of it has hardcoded assumptions conflating endianness with ABI type (BE = ELFv1, LE = ELFv2), which is incorrect. The tests that don't have those assumptions pass easily; I think we have around 70% passing on ppc64. Fixing it is very low on my priority list right now, but I'm hoping to fix up test suites in general in the autumn, and that might be one of them.
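The point that endianness and ABI version are independent can be checked directly on a hello-world binary. A minimal Python sketch, assuming the standard ELF64 header layout: endianness comes from `e_ident[EI_DATA]`, and on ppc64 the ABI version sits in the low two bits of `e_flags` (1 = ELFv1, 2 = ELFv2, 0 = unspecified). The helper names `classify_ppc64` and `fake_header` are hypothetical; `fake_header` only builds a synthetic header to show that a big-endian ELFv2 combination is perfectly well-formed.

```python
import struct

EM_PPC64 = 21        # e_machine value for 64-bit PowerPC
EF_PPC64_ABI = 0x3   # e_flags bits holding the ppc64 ABI version


def classify_ppc64(header: bytes):
    """Return (endianness, abi) read independently from an ELF64 header."""
    if header[:4] != b"\x7fELF" or header[4] != 2:  # EI_CLASS must be ELFCLASS64
        raise ValueError("not an ELF64 file")
    if header[5] == 1:    # EI_DATA == ELFDATA2LSB
        endian, prefix = "little", "<"
    elif header[5] == 2:  # EI_DATA == ELFDATA2MSB
        endian, prefix = "big", ">"
    else:
        raise ValueError("unknown EI_DATA")
    (machine,) = struct.unpack_from(prefix + "H", header, 18)  # e_machine
    if machine != EM_PPC64:
        raise ValueError("not a ppc64 binary")
    (flags,) = struct.unpack_from(prefix + "I", header, 48)  # e_flags
    abi = {0: "unspecified", 1: "ELFv1", 2: "ELFv2"}.get(flags & EF_PPC64_ABI, "reserved")
    return endian, abi


def fake_header(endian, abi_flags):
    """Build the first 52 bytes of a synthetic ppc64 ELF64 header (hypothetical helper)."""
    prefix = "<" if endian == "little" else ">"
    # magic, EI_CLASS=64-bit, EI_DATA, EI_VERSION=1, then 9 bytes of OS ABI/padding
    ident = b"\x7fELF" + bytes([2, 1 if endian == "little" else 2, 1]) + bytes(9)
    # e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, e_flags
    rest = struct.pack(prefix + "HHIQQQI", 2, EM_PPC64, 1, 0, 0, 0, abi_flags)
    return ident + rest


print(classify_ppc64(fake_header("big", 2)))     # a big-endian ELFv2 header
print(classify_ppc64(fake_header("little", 1)))  # a little-endian ELFv1 header
```

On a real file, `classify_ppc64(open(path, "rb").read(64))` would report both properties for the binaries from each toolchain, which is exactly the distinction the hardcoded BE/LE test assumptions miss.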
Description of changes
Details on debugging
Here is a full log of `valgrind <hello>/bin/hello` (both ELFv2). Lots of `Conditional jump or move depends on uninitialised value(s)` & `Use of uninitialised value of size`, ending in …

This works fine on ELFv1 (ELFv1 valgrind, on ELFv1 hello):
So I've arrived at the conclusion that this just doesn't work.
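When comparing the ELFv2 log against the working ELFv1 one, it can help to tally the Memcheck error headlines instead of scrolling through them. A small Python sketch, assuming logs in Memcheck's default text format where every line is prefixed with `==PID== `; `tally` and `KINDS` are hypothetical names, and `KINDS` only lists headlines of the kind quoted in this PR.

```python
import re
from collections import Counter

# Valgrind prefixes every line with "==PID== "; the text after it is captured.
_LINE = re.compile(r"^==\d+== (.*)$")

# Hypothetical selection of error headlines to count.
KINDS = (
    "Conditional jump or move depends on uninitialised value(s)",
    "Use of uninitialised value of size",
    "Process terminating with default action of signal",
)


def tally(log_text):
    """Count how often each headline in KINDS starts a valgrind log line."""
    counts = Counter()
    for line in log_text.splitlines():
        m = _LINE.match(line)
        if not m:
            continue  # not a valgrind-prefixed line (e.g. program output)
        body = m.group(1)
        for kind in KINDS:
            if body.startswith(kind):
                counts[kind] += 1
    return counts
```

Running `tally` over both logs gives two small tables that make a difference like "thousands of uninitialised-value reports vs. none" obvious at a glance.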
Adding this platform to `badPlatforms` lets us use `lib.meta.availableOn` in other derivations (e.g. libdrm, which already uses it).

Draft, because I still want to try installing Adélie Linux on my hardware and test the valgrind there (which is where this patch seems to originate). If it works there, then I'll try to dig deeper into this.

Edit: See follow-up comments for the results of that.
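To make the `badPlatforms` / `lib.meta.availableOn` interaction concrete, here is a simplified Python model. It is an approximation written for illustration, not the Nix implementation: string patterns are compared with the platform's `system`, attribute-set patterns must match as a nested subset, and a package is available when it is allowed by `platforms` (if set) and matched by no `badPlatforms` entry. The platform dictionaries and the `parsed.abi.name` values below are hypothetical.

```python
def _match_attrs(pattern, value):
    """True if every key in `pattern` exists in `value` with an equal (or recursively matching) value."""
    for key, want in pattern.items():
        if key not in value:
            return False
        have = value[key]
        if isinstance(want, dict):
            if not isinstance(have, dict) or not _match_attrs(want, have):
                return False
        elif have != want:
            return False
    return True


def platform_match(platform, pattern):
    """Simplified model: strings match the system name, dicts match as nested subsets."""
    if isinstance(pattern, str):
        return platform.get("system") == pattern
    return _match_attrs(pattern, platform)


def available_on(platform, meta):
    """Model of availableOn: allowed by `platforms` and not hit by `badPlatforms`."""
    platforms = meta.get("platforms")
    allowed = platforms is None or any(platform_match(platform, p) for p in platforms)
    return allowed and not any(platform_match(platform, p) for p in meta.get("badPlatforms", []))


# Hypothetical platform descriptions; attribute names are illustrative only.
ppc64_elfv2 = {"system": "powerpc64-linux", "parsed": {"abi": {"name": "elfv2"}}}
ppc64_elfv1 = {"system": "powerpc64-linux", "parsed": {"abi": {"name": "elfv1"}}}
valgrind_meta = {
    "platforms": ["powerpc64-linux", "x86_64-linux"],
    "badPlatforms": [{"parsed": {"abi": {"name": "elfv2"}}}],
}
```

Both ppc64 variants share the same `system`, so `platforms` alone cannot tell them apart; the attribute-set entry in `badPlatforms` is what excludes only the ELFv2 one, and a consuming derivation can then add valgrind as a dependency only when the check passes.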
CC @alyssais, did the resulting binary work when you added the patch in #213341? I tried locally undoing just the valgrind bumps since then, but the binary still completely fails.
Things done
- … `nix.conf`? (See Nix manual)
  - `sandbox = relaxed`
  - `sandbox = true`
- … `nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD"`. Note: all changes have to be committed, also see nixpkgs-review usage
- … (`./result/bin/`)