Valgrind always crashes Rust programs on FreeBSD with "failed to allocate a guard page" #154

Closed
asomers opened this issue Feb 3, 2021 · 8 comments

Comments

@asomers

asomers commented Feb 3, 2021

Every Rust program crashes on FreeBSD when run under Valgrind, with the error "failed to allocate a guard page". ripgrep is one example. The crash happens with every tool: memcheck, cachegrind, callgrind, helgrind, drd, massif, lackey, exp-bbv, and even none.

STEPS TO REPRODUCE

  1. pkg install ripgrep
  2. valgrind --tool=callgrind /usr/local/bin/rg

OBSERVED RESULT
$ valgrind --tool=memcheck /usr/local/bin/rg
==10062== Memcheck, a memory error detector
==10062== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==10062== Using Valgrind-3.17.0.GIT and LibVEX; rerun with -h for copyright info
==10062== Command: /usr/local/bin/rg
==10062==
thread '' panicked at 'failed to allocate a guard page', library/std/src/sys/unix/thread.rs:364:17
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
fatal runtime error: failed to initiate panic, error 5
==10062==
==10062== Process terminating with default action of signal 6 (SIGABRT): dumping core

EXPECTED RESULT
The program should run normally.

SOFTWARE/OS VERSIONS
FreeBSD. Reproduced on 14.0-CURRENT, 12.2-RELEASE, 11.2-RELEASE, and 11.4-RELEASE amd64.
Reproduced with Valgrind 3.10.1 and 3.17.0.GIT.

ADDITIONAL INFORMATION

Rust bug entry. The Rust team believes this to be a Valgrind bug, however.
rust-lang/rust#67153

Rust code that allocates the guard page on startup of every program.
https://doc.rust-lang.org/src/std/sys/unix/thread.rs.html#346

@paulfloyd
Owner

paulfloyd commented Feb 3, 2021

Here are some traces

SYSCALL6127,1 sys___sysctlbyname ( 0x491113d(kern.sched.cpusetsize), 0x15, 0x4935740, 0x7fc0002f8, 0 )[sync] --> Success(0x0)
SYSCALL6127,1 sys_mmap ( 0x0, 4096, 3, 4098, 4294967295, 0x0)--6127-- di_notify_mmap-0:
--6127-- di_notify_mmap-1: 0x4b20000-0x4d5dfff rw-
--> [pre-success] Success(0x4d5d000)
SYSCALL6127,1 sys_cpuset_getaffinity ( 3, 1, 101876, 32, 0x4d5d000 )[sync] --> Success(0x0)
SYSCALL6127,1 sys_mmap ( 0x7fffdffff000, 4096, 3, 4114, 4294967295, 0x0) --> [pre-fail] Failure(0x16)
SYSCALL6127,1 sys_mmap ( 0x0, 131072, 3, 4098, 4294967295, 0x0)--6127-- di_notify_mmap-0:
--6127-- di_notify_mmap-1: 0x4b20000-0x4d7dfff rw-
--> [pre-success] Success(0x4d5e000)
SYSCALL6127,1 sys_mmap ( 0x0, 12288, 3, 4098, 4294967295, 0x0)--6127-- di_notify_mmap-0:
--6127-- di_notify_mmap-1: 0x4b20000-0x4d80fff rw-
--> [pre-success] Success(0x4d7e000)
SYSCALL6127,1 sys_mmap ( 0x0, 4096, 3, 4098, 4294967295, 0x0)--6127-- di_notify_mmap-0:
--6127-- di_notify_mmap-1: 0x4b20000-0x4d81fff rw-
--> [pre-success] Success(0x4d81000)
SYSCALL6127,1 sys_mmap ( 0x0, 20480, 3, 4098, 4294967295, 0x0)--6127-- di_notify_mmap-0:
--6127-- di_notify_mmap-1: 0x4b20000-0x4d86fff rw-
--> [pre-success] Success(0x4d82000)

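The failure is the fixed-address sys_mmap call at 0x7fffdffff000, which returns [pre-fail] Failure(0x16).

It will take me a while to figure out what the Rust application is mmapping and how that differs from C/C++ applications.

The error code looks like EINVAL (0x16 is 22):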

SYSCALL6127,1 sys_mmap ( 0x7fffdffff000, 4096, 3, 4114, 4294967295, 0x0) --> [pre-fail] Failure(0x16)
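
To make the difference between the failing call and the successful ones easier to see, here is a small sketch that decodes the flags argument. The numeric values are the FreeBSD ones from <sys/mman.h> as far as I know, hard-coded so it builds on any OS; the FBSD_ names and describe_mmap_flags are made up for this illustration. The other arguments decode as prot 3 = PROT_READ|PROT_WRITE and fd 4294967295 = -1 as a 32-bit value.

```c
#include <stdio.h>
#include <string.h>

/* FreeBSD mmap flag values (assumed from <sys/mman.h>), hard-coded so the
   sketch does not depend on the host's own definitions. */
enum {
    FBSD_MAP_PRIVATE = 0x0002,
    FBSD_MAP_FIXED   = 0x0010,
    FBSD_MAP_ANON    = 0x1000
};

/* errno value reported in the trace as Failure(0x16). */
enum { FBSD_EINVAL = 22 };

/* Hypothetical helper: write the names of the known flag bits, separated
   by '|', into buf and return buf. Unknown bits are ignored. */
static const char *describe_mmap_flags(unsigned flags, char *buf, size_t len)
{
    static const struct { unsigned bit; const char *name; } known[] = {
        { FBSD_MAP_PRIVATE, "MAP_PRIVATE" },
        { FBSD_MAP_FIXED,   "MAP_FIXED"   },
        { FBSD_MAP_ANON,    "MAP_ANON"    },
    };

    if (len == 0)
        return buf;
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++) {
        if (flags & known[i].bit) {
            if (buf[0] != '\0')
                strncat(buf, "|", len - strlen(buf) - 1);
            strncat(buf, known[i].name, len - strlen(buf) - 1);
        }
    }
    return buf;
}
```

Passing 4114 gives MAP_PRIVATE|MAP_FIXED|MAP_ANON and 4098 gives MAP_PRIVATE|MAP_ANON, so the only difference in the failing call is MAP_FIXED together with an explicit address.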

Other than demangling, there is little Rust-specific code in Valgrind.

Looking a bit at the VG code, the failure is here:

   if (forClient && req->rkind == MFixed) {
      Int  iLo   = find_nsegment_idx(reqStart);
      Int  iHi   = find_nsegment_idx(reqEnd);
      Bool allow = True;
      for (i = iLo; i <= iHi; i++) {
         if (nsegments[i].kind == SkFree
             || nsegments[i].kind == SkFileC
             || nsegments[i].kind == SkAnonC
             || nsegments[i].kind == SkShmC
             || nsegments[i].kind == SkResvn) {
            /* ok */
         } else {
            allow = False;
            VG_(printf)("in advisory about to go bad, kind %d\n", (int)nsegments[i].kind );
            break;
         }
      }
      if (allow) {
         /* Acceptable.  Granted. */
         *ok = True;
         return reqStart;
      }
      /* Not acceptable.  Fail. */
      VG_(printf)("in advisory bad 0\n");
      *ok = False;
      return 0;
   }

with a few added printfs. The kind is 4, which is
SkAnonV = 0x04, // anonymous mapping belonging to valgrind

So the guest is trying to mmap into anonymous memory that belongs to the host (Valgrind itself).
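
The rule applied by that loop can be modelled outside Valgrind. This is only a sketch of the check, not Valgrind's code: SkAnonV = 0x04 is quoted above, the other enum values are my assumption about the header, and client_may_map and fixed_request_allowed are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Segment kinds. Only SkAnonV = 0x04 is confirmed by the thread; the other
   numeric values are assumed for illustration. */
typedef enum {
    SkFree  = 0x01,  /* unmapped space */
    SkAnonC = 0x02,  /* anonymous mapping belonging to the client */
    SkAnonV = 0x04,  /* anonymous mapping belonging to valgrind */
    SkFileC = 0x08,  /* file mapping belonging to the client */
    SkFileV = 0x10,  /* file mapping belonging to valgrind */
    SkShmC  = 0x20,  /* shared memory segment belonging to the client */
    SkResvn = 0x40   /* reservation */
} SegKind;

/* A fixed client mapping may only land on free space, client-owned
   segments, or reservations. */
static bool client_may_map(SegKind kind)
{
    switch (kind) {
    case SkFree:
    case SkFileC:
    case SkAnonC:
    case SkShmC:
    case SkResvn:
        return true;
    default:
        return false;
    }
}

/* Mirror of the loop above: the request is granted only if every segment
   it overlaps is acceptable. */
static bool fixed_request_allowed(const SegKind *kinds, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (!client_may_map(kinds[i]))
            return false;
    }
    return true;
}
```

A request that overlaps even one SkAnonV segment is refused, which is the case hit here.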

@paulfloyd
Owner

Not surprisingly, there is no useful stack info from the guest stack:

(gdb) p vgPlain_get_and_pp_StackTrace(0, 6)
==58190==    at 0x0: ???
$4 = void
(gdb) p vgPlain_get_and_pp_StackTrace(1, 6)
==58190==    at 0x4B06C2A: thr_kill (in /lib/libc.so.7)
==58190==    by 0x4B05083: raise (in /lib/libc.so.7)
==58190==    by 0x4A7B278: abort (in /lib/libc.so.7)
==58190==    by 0x521D19: ??? (in /usr/local/bin/rg)
==58190==    by 0x510B3F: ??? (in /usr/local/bin/rg)
==58190==    by 0x51AA73: ??? (in /usr/local/bin/rg)

@paulfloyd
Owner

paulfloyd commented Mar 11, 2021

Installing the Rust package and building hello world with debug info gives:

==59160== Process terminating with default action of signal 6 (SIGABRT): dumping core
==59160==    at 0x4A4AC2A: thr_kill (in /lib/libc.so.7)
==59160==    by 0x49BF278: abort (in /lib/libc.so.7)
==59160==    by 0x13C599: std::sys::unix::abort_internal (in /usr/home/paulf/scratch/vg_rust/hello)
==59160==    by 0x13562F: std::sys_common::util::abort (in /usr/home/paulf/scratch/vg_rust/hello)
==59160==    by 0x1393A3: rust_panic (in /usr/home/paulf/scratch/vg_rust/hello)
==59160==    by 0x13930B: std::panicking::rust_panic_with_hook (in /usr/home/paulf/scratch/vg_rust/hello)
==59160==    by 0x129115: std::panicking::begin_panic::{{closure}} (in /usr/home/paulf/scratch/vg_rust/hello)
==59160==    by 0x128A3F: std::sys_common::backtrace::__rust_end_short_backtrace (in /usr/home/paulf/scratch/vg_rust/hello)
==59160==    by 0x1390EE: std::panicking::begin_panic (in /usr/home/paulf/scratch/vg_rust/hello)
==59160==    by 0x131AA6: std::sys::unix::thread::guard::init (in /usr/home/paulf/scratch/vg_rust/hello)
==59160==    by 0x1393DA: std::rt::lang_start_internal (in /usr/home/paulf/scratch/vg_rust/hello)
==59160==    by 0x11CA91: std::rt::lang_start (rt.rs:65)

@paulfloyd
Owner

This doesn't look like it will be that easy to fix. The problem is that we have two functions:

    pthread_attr_get_np(pthread_self(), &attr);
    pthread_attr_getstack(&attr, &stackaddr, &stacksize);

In the first function we know the tid, so we can tell whether it is the main thread or not, but we don't want to mess with attr.
In the second we can't tell whether it is the primary thread or not.
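
For reference, here is a minimal sketch of how a program can query its own stack with those two calls and derive a page-aligned address at the bottom of it. The FreeBSD branch uses the calls quoted above; the other branch assumes Linux with glibc (pthread_getattr_np) purely so the sketch builds elsewhere. current_stack and round_up_to_page are hypothetical names, and this is not taken from the Rust source.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__FreeBSD__)
#include <pthread_np.h>  /* pthread_attr_get_np */
#else
/* Assumption: Linux/glibc. Declared explicitly so the sketch does not
   depend on _GNU_SOURCE being defined before the first include. */
extern int pthread_getattr_np(pthread_t thread, pthread_attr_t *attr);
#endif

/* Fill attr with the attributes of the calling thread. Returns 0 on
   success; on success the caller must pthread_attr_destroy(attr). */
static int get_thread_attr(pthread_attr_t *attr)
{
#if defined(__FreeBSD__)
    int rc = pthread_attr_init(attr);
    if (rc != 0)
        return rc;
    rc = pthread_attr_get_np(pthread_self(), attr);
    if (rc != 0)
        pthread_attr_destroy(attr);
    return rc;
#else
    return pthread_getattr_np(pthread_self(), attr);
#endif
}

/* Report the lowest address and the size of the calling thread's stack. */
static int current_stack(void **addr, size_t *size)
{
    pthread_attr_t attr;
    int rc = get_thread_attr(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_attr_getstack(&attr, addr, size);
    pthread_attr_destroy(&attr);
    return rc;
}

/* Round an address up to the next multiple of page (page must be nonzero). */
static uintptr_t round_up_to_page(uintptr_t addr, uintptr_t page)
{
    uintptr_t remainder = addr % page;
    return remainder == 0 ? addr : addr + (page - remainder);
}
```

A guard page would then be requested at round_up_to_page((uintptr_t)addr, page_size). Under Valgrind the point is that addr has to describe the guest stack, not the host's.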

@asomers
Author

asomers commented Mar 29, 2021

Do you know why it's an issue for Rust but not for C?

@paulfloyd
Owner

Probably because the C startup code isn't trying to add a guard page (or at least doing so differently).

@paulfloyd
Owner

paulfloyd commented Apr 7, 2021

Fixed with the latest push

commit 5923237
Author: Paul Floyd
Date: Wed Apr 7 08:37:20 2021 +0200

Modify the value returned by the kern.usrstack sysctl to reflect the
user stack that Valgrind synthesizes for the guest. Without this change
the sysctl will return the stack of the Valgrind host. This manifested itself
as a problem on rust compiled binaries, which were trying to add an extra
guard page but were failing since Valgrind refused guest mmaps into what it
considered to be its own memory space.
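
For anyone who wants to check the value outside Valgrind, here is a rough sketch that reads kern.usrstack with sysctlbyname. It only does real work on FreeBSD; elsewhere it fails with ENOSYS. read_usrstack and lowest_stack_page are hypothetical names. The arithmetic helper rests on an assumption I have not verified in this thread: that the stack top is 0x7ffffffff000 and the maximum stack size is 512 MiB, which I believe are common amd64 defaults.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__FreeBSD__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Read kern.usrstack, the top of the main thread's stack.
   Returns 0 on success, -1 on failure (always -1 outside FreeBSD). */
static int read_usrstack(uint64_t *top)
{
#if defined(__FreeBSD__)
    unsigned long value = 0;
    size_t len = sizeof value;

    if (sysctlbyname("kern.usrstack", &value, &len, NULL, 0) != 0)
        return -1;
    *top = (uint64_t)value;
    return 0;
#else
    (void)top;
    errno = ENOSYS;
    return -1;
#endif
}

/* Lowest address of a stack that grows down from top and may use at most
   max_stack_size bytes. */
static uint64_t lowest_stack_page(uint64_t top, uint64_t max_stack_size)
{
    return top - max_stack_size;
}
```

Under the assumed defaults, lowest_stack_page(0x7ffffffff000, 0x20000000) is 0x7fffdffff000, the address in the failing mmap, which is consistent with the guard page having been computed from the host's stack.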

@asomers
Author

asomers commented Apr 7, 2021

Thanks Paul!
