bt: cannot transition from exception stack to current process stack #43

Open
stevenh opened this issue Nov 13, 2019 · 4 comments

stevenh commented Nov 13, 2019

When attempting to debug a kernel panic on Ubuntu 16.04 LTS with HWE, the stack is unavailable.

Having manually built crash 7.2.7, I got a little further but still hit a dead end, where bt fails with:
bt: cannot transition from exception stack to current process stack:

./crash /usr/lib/debug/boot/vmlinux-4.15.0-1041-gcp /var/crash/201911112306/dump.201911112306 

crash 7.2.7
Copyright (C) 2002-2019  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.
 
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

WARNING: kernel relocated [918MB]: patching 99648 gdb minimal_symbol values

      KERNEL: /usr/lib/debug/boot/vmlinux-4.15.0-1041-gcp              
    DUMPFILE: /var/crash/201911112306/dump.201911112306  [PARTIAL DUMP]
        CPUS: 8
        DATE: Mon Nov 11 23:06:47 2019
      UPTIME: 02:06:18
LOAD AVERAGE: 1.49, 1.69, 1.72
       TASKS: 396
    NODENAME: XXXXX
     RELEASE: 4.15.0-1041-gcp
     VERSION: #43-Ubuntu SMP Wed Aug 21 09:04:51 UTC 2019
     MACHINE: x86_64  (2300 Mhz)
      MEMORY: 30 GB
       PANIC: "BUG: unable to handle kernel paging request at ffffffffba678770"
         PID: 13869
     COMMAND: "PoolThread 6"
        TASK: ffffa0245a690000  [THREAD_INFO: ffffa0245a690000]
         CPU: 1
       STATE: TASK_RUNNING (PANIC)

crash> bt 13869
PID: 13869  TASK: ffffa0245a690000  CPU: 1   COMMAND: "PoolThread 6"
 #0 [fffffe0000033d60] machine_kexec at ffffffffba6669ce
 #1 [fffffe0000033dc0] __crash_kexec at ffffffffba732bd9
 #2 [fffffe0000033e88] panic at ffffffffba691a45
 #3 [fffffe0000033f10] df_debug at ffffffffba66ae0d
 #4 [fffffe0000033f28] do_double_fault at ffffffffba62f49a
 #5 [fffffe0000033f50] double_fault at ffffffffbb000fe3
    [exception RIP: __sprint_symbol+69]
    RIP: ffffffffba731165  RSP: fffffe0000032fe8  RFLAGS: 00010046
    RAX: 0000000000000000  RBX: ffffffffba678770  RCX: fffffe0000032fe8
    RDX: fffffe0000032ff0  RSI: fffffe0000032ff8  RDI: ffffffffba678770
    RBP: fffffe0000033030   R8: fffffe0000033051   R9: fffffe0000033320
    R10: fffffe0000033388  R11: ffffffffbbd5e80d  R12: fffffe0000033051
    R13: 0000000000000000  R14: 0000000000000001  R15: ffffffffbb6a51b0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <DOUBLEFAULT exception stack> ---
 #6 [fffffe0000032fe8] __sprint_symbol at ffffffffba731165
bt: cannot transition from exception stack to current process stack:
    exception stack pointer: fffffe0000033d60
      process stack pointer: fffffe0000033038
         current stack base: ffffb2d48cfdc000
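
To make the error message easier to read, here is a minimal sketch that checks where the reported pointers fall relative to each other. The addresses are copied from the output above; the 16 KiB stack size and 4 KiB page size are assumptions (typical for x86_64 kernels of this era, but not stated in the dump), and the helper names `in_stack` and `page_of` are made up for illustration.

```python
# Addresses copied from the bt output above.
EXCEPTION_SP = 0xfffffe0000033d60   # "exception stack pointer"
PROCESS_SP = 0xfffffe0000033038     # "process stack pointer"
STACK_BASE = 0xffffb2d48cfdc000     # "current stack base"
FAULT_RSP = 0xfffffe0000032fe8      # RSP in the exception frame

# Assumption (not stated in the dump): 16 KiB kernel stacks, 4 KiB pages.
THREAD_SIZE = 16 * 1024
PAGE_SIZE = 4 * 1024


def in_stack(addr, base=STACK_BASE, size=THREAD_SIZE):
    """Return True if addr lies inside [base, base + size)."""
    return base <= addr < base + size


def page_of(addr):
    """Round addr down to the start of its page."""
    return addr & ~(PAGE_SIZE - 1)


if __name__ == "__main__":
    print("process sp inside task stack:", in_stack(PROCESS_SP))
    for name, addr in [("exception sp", EXCEPTION_SP),
                       ("process sp", PROCESS_SP),
                       ("fault rsp", FAULT_RSP)]:
        print(f"{name:>12}: {addr:#x}  page {page_of(addr):#x}")
```

Under those assumptions, the process stack pointer is not inside the task's own stack. It lands in the same page as the exception stack pointer, and the faulting RSP sits in the page immediately below, which may be why bt cannot find a process stack to continue on.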

Last two entries from the log which may be relevant:

[29413.763776] unable to execute userspace code (SMEP?) (uid: 2000)
[29413.769982] BUG: unable to handle kernel paging request at ffffffffba678770

Any ideas on how to identify what user space stack caused the kernel to panic?

This is something that happens on a semi-regular basis with this app.

crash-utility (Collaborator) commented Nov 13, 2019 via email

stevenh (Author) commented Nov 14, 2019

Thanks for the feedback, here's the output from the requested commands:

crash> sym ffffffffba678770
ffffffffba678770 (T) do_page_fault /build/linux-gcp-lp4Fx0/linux-gcp-4.15.0/arch/x86/mm/fault.c: 1529
crash> vtop ffffffffba678770
VIRTUAL           PHYSICAL        
ffffffffba678770  3b2a78770       

PGD DIRECTORY: ffffffffbb80a000
PAGE DIRECTORY: 3b3c0e067
   PUD: 3b3c0eff0 => 3b3c0f063
   PMD: 3b3c0fe98 => 3b2a000e1
  PAGE: 3b2a00000  (2MB)

   PTE     PHYSICAL   FLAGS
3b2a000e1  3b2a00000  (PRESENT|ACCESSED|DIRTY|PSE)

      PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
fffffbd1ceca9e00 3b2a78000                0        0  1 17ffffc0000800 reserved
crash> 
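
As a cross-check of the vtop output, the following sketch re-derives the table offsets, the flag names, and the physical address from the raw values. It assumes standard x86-64 4-level paging with the usual flag bit positions (bit 0 present, bit 1 writable, bit 2 user, bit 5 accessed, bit 6 dirty, bit 7 page size, bit 63 no-execute). The entry values are copied from the output above, with the PAGE DIRECTORY line taken to be the top-level entry; the function names are hypothetical.

```python
# Values copied from the vtop output above.
VADDR = 0xffffffffba678770
PGD_ENTRY = 0x3b3c0e067   # "PAGE DIRECTORY" line
PUD_ENTRY = 0x3b3c0f063
PMD_ENTRY = 0x3b2a000e1   # 2MB page, so this is the leaf entry

# Assumption: standard x86-64 page-table flag bit positions.
FLAG_BITS = {0: "PRESENT", 1: "RW", 2: "USER", 3: "PWT", 4: "PCD",
             5: "ACCESSED", 6: "DIRTY", 7: "PSE", 8: "GLOBAL", 63: "NX"}

ADDR_MASK_4K = ((1 << 52) - 1) & ~((1 << 12) - 1)   # bits 12..51
ADDR_MASK_2M = ((1 << 52) - 1) & ~((1 << 21) - 1)   # bits 21..51


def decode_flags(entry):
    """List the names of the flag bits set in a page-table entry."""
    return [name for bit, name in sorted(FLAG_BITS.items())
            if (entry >> bit) & 1]


def entry_addr(table_entry, vaddr, shift):
    """Physical address of the next-level entry selected by vaddr."""
    index = (vaddr >> shift) & 0x1ff          # 9 index bits per level
    return (table_entry & ADDR_MASK_4K) + index * 8


def translate_2m(pmd_entry, vaddr):
    """Physical address for a vaddr mapped by a 2MB PMD entry."""
    return (pmd_entry & ADDR_MASK_2M) | (vaddr & ((1 << 21) - 1))


if __name__ == "__main__":
    print(f"PUD entry at {entry_addr(PGD_ENTRY, VADDR, 30):#x}")
    print(f"PMD entry at {entry_addr(PUD_ENTRY, VADDR, 21):#x}")
    print(f"physical     {translate_2m(PMD_ENTRY, VADDR):#x}")
    for name, entry in [("PGD", PGD_ENTRY), ("PUD", PUD_ENTRY),
                        ("PMD", PMD_ENTRY)]:
        print(name, "|".join(decode_flags(entry)))
```

The derived values agree with what vtop printed (3b3c0eff0, 3b3c0fe98, and 3b2a78770), and the PMD flags decode to PRESENT|ACCESSED|DIRTY|PSE. Under the assumed bit layout, the top-level entry is the only one of the three with the USER bit set, and none has NX set, which may be worth keeping in mind when reading the SMEP message in the log.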

crash-utility (Collaborator)

I cannot make sense of how this scenario evolved. It appears that the
fault handler made it to no_context(), where the "BUG: unable to handle
kernel ..." message is printed before crashing. It normally gets there via
do_page_fault(), but it's that function's address that is the address
that it cannot translate. Or perhaps it thinks it's in user mode and
therefore cannot translate a kernel virtual address? Sorry, I don't really
have any suggestions.

crash-utility (Collaborator)

As you surmised, it's presumably related to that X86_CR4_SMEP message.
"Supervisor Mode Execution Prevention" is something I know nothing about.
