Segmentation fault seen while decoding ARM64 kdump. #176

Closed
codernavi18 opened this issue Mar 15, 2024 · 5 comments
Comments

@codernavi18

codernavi18 commented Mar 15, 2024

I am using the crash-8.0.4 release (built with make target=ARM64) on my x86_64 host to decode a kdump generated on an ARM64 target. When I open that kdump, crash itself segfaults.

$ sudo ./crash ~/.repos/src/arm64/linux/vmlinux /home/naveen/nfsroot/rootfs-buildroot-arm64/kernel.20240315160627.core.kdump

crash 8.0.4
Copyright (C) 2002-2022  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011, 2020-2022  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
Copyright (C) 2015, 2021  VMware, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.

GNU gdb (GDB) 10.2
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "--host=x86_64-pc-linux-gnu --target=aarch64-elf-linux".
Type "show configuration" for configuration details.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...

please wait... (determining panic task)Segmentation fault
@codernavi18 codernavi18 changed the title Segmentation fault seen while decoding kdump. Segmentation fault seen while decoding ARM64 kdump. Mar 15, 2024
@codernavi18
Author

Here's the backtrace of the crash :

GNU gdb (GDB) 10.2
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "--host=x86_64-pc-linux-gnu --target=aarch64-elf-linux".
Type "show configuration" for configuration details.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...

please wait... (determining panic task)
Thread 1 "crash" received signal SIGSEGV, Segmentation fault.
value_search_module_6_4 (value=18446603338276298752, offset=0x7ffffffface0) at symbols.c:5564
5564				if (value < sp->value)
(gdb) bt
#0  value_search_module_6_4 (value=18446603338276298752, offset=0x7ffffffface0) at symbols.c:5564
#1  0x0000555555812bd0 in value_to_symstr (value=18446603338276298752,
    buf=buf@entry=0x7fffffffb9c0 "", radix=10, radix@entry=0) at symbols.c:5872
#2  0x00005555557694a2 in display_memory (addr=<optimized out>, count=2048, flag=208,
    memtype=memtype@entry=1, opt=opt@entry=0x0) at memory.c:1740
#3  0x0000555555769e1f in raw_stack_dump (stackbase=<optimized out>, size=<optimized out>)
    at memory.c:2194
#4  0x00005555557923ff in get_active_set_panic_task () at task.c:8639
#5  0x00005555557930d2 in get_dumpfile_panic_task () at task.c:7628
#6  0x00005555557a89d3 in panic_search () at task.c:7380
#7  get_panic_context () at task.c:6267
#8  task_init () at task.c:687
#9  0x00005555557305b3 in main_loop () at main.c:787
#10 0x0000555555a64331 in captured_main (data=<optimized out>) at main.c:1284
#11 gdb_main (args=<optimized out>) at main.c:1313
#12 0x0000555555a643b0 in gdb_main_entry (argc=<optimized out>, argv=argv@entry=0x7fffffffe508)
    at main.c:1338
#13 0x00005555557d1ece in gdb_main_loop (argc=<optimized out>, argc@entry=3,
    argv=argv@entry=0x7fffffffe508) at gdb_interface.c:81
#14 0x0000555555728dfc in main (argc=3, argv=0x7fffffffe508) at main.c:720

@codernavi18
Author

The kernel module being loaded is a dummy module that just has a null pointer dereference in its init function, to trigger a kernel panic intentionally. The debug symbols are present.

$ /opt/arm-gnu-toolchain-13.2.Rel1-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu-objdump -t drivers/naveen/npdereference.ko

drivers/naveen/npdereference.ko:     file format elf64-littleaarch64

SYMBOL TABLE:
0000000000000000 l    d  .text	0000000000000000 .text
0000000000000000 l    d  .init.text	0000000000000000 .init.text
0000000000000000 l    d  .exit.text	0000000000000000 .exit.text
0000000000000000 l    d  .plt	0000000000000000 .plt
0000000000000000 l    d  .init.plt	0000000000000000 .init.plt
0000000000000000 l    d  .text.ftrace_trampoline	0000000000000000 .text.ftrace_trampoline
0000000000000000 l    d  .rodata.str1.8	0000000000000000 .rodata.str1.8
0000000000000000 l    d  .modinfo	0000000000000000 .modinfo
0000000000000000 l    d  .note.gnu.property	0000000000000000 .note.gnu.property
0000000000000000 l    d  .note.gnu.build-id	0000000000000000 .note.gnu.build-id
0000000000000000 l    d  .note.Linux	0000000000000000 .note.Linux
0000000000000000 l    d  .data	0000000000000000 .data
0000000000000000 l    d  .exit.data	0000000000000000 .exit.data
0000000000000000 l    d  .init.data	0000000000000000 .init.data
0000000000000000 l    d  .gnu.linkonce.this_module	0000000000000000 .gnu.linkonce.this_module
0000000000000000 l    d  .bss	0000000000000000 .bss
0000000000000000 l    d  .note.GNU-stack	0000000000000000 .note.GNU-stack
0000000000000000 l    d  .comment	0000000000000000 .comment
0000000000000000 l    d  .debug_info	0000000000000000 .debug_info
0000000000000000 l    d  .debug_abbrev	0000000000000000 .debug_abbrev
0000000000000000 l    d  .debug_aranges	0000000000000000 .debug_aranges
0000000000000000 l    d  .debug_rnglists	0000000000000000 .debug_rnglists
0000000000000000 l    d  .debug_line	0000000000000000 .debug_line
0000000000000000 l    d  .debug_str	0000000000000000 .debug_str
0000000000000000 l    d  .debug_line_str	0000000000000000 .debug_line_str
0000000000000000 l    d  .debug_frame	0000000000000000 .debug_frame
0000000000000000 l    df *ABS*	0000000000000000 npdereference.c
0000000000000000 l     F .init.text	0000000000000040 null_deref_module_init
0000000000000000 l     F .exit.text	0000000000000024 null_deref_module_exit
0000000000000000 l     O .exit.data	0000000000000008 __UNIQUE_ID___addressable_cleanup_module332
0000000000000000 l     O .init.data	0000000000000008 __UNIQUE_ID___addressable_init_module331
0000000000000000 l     O .modinfo	0000000000000049 __UNIQUE_ID_description330
0000000000000049 l     O .modinfo	0000000000000011 __UNIQUE_ID_author329
000000000000005a l     O .modinfo	000000000000000c __UNIQUE_ID_license328
0000000000000000 l    df *ABS*	0000000000000000 npdereference.mod.c
0000000000000066 l     O .modinfo	0000000000000009 __UNIQUE_ID_depends331
000000000000006f l     O .modinfo	0000000000000009 __UNIQUE_ID_intree330
0000000000000078 l     O .modinfo	0000000000000013 __UNIQUE_ID_name329
000000000000008b l     O .modinfo	0000000000000048 __UNIQUE_ID_vermagic328
0000000000000000 l     O .note.Linux	0000000000000018 _note_15
0000000000000018 l     O .note.Linux	0000000000000018 _note_14
0000000000000000 g     O .gnu.linkonce.this_module	0000000000000440 __this_module
0000000000000000 g     F .exit.text	0000000000000024 cleanup_module
0000000000000000 g     F .init.text	0000000000000040 init_module
0000000000000000         *UND*	0000000000000000 _printk


naveen@workstation:~/.repos/src/arm64/linux$ file drivers/naveen/npdereference.ko
drivers/naveen/npdereference.ko: ELF 64-bit LSB relocatable, ARM aarch64, version 1 (SYSV), BuildID[sha1]=118e35b0267440ef364c551c5890ff934392fb6c, with debug_info, not stripped
naveen@workstation:~/.repos/src/arm64/linux$

@liutgnu
Member

liutgnu commented Apr 2, 2024

Here's the backtrace of the crash :

please wait... (determining panic task)
Thread 1 "crash" received signal SIGSEGV, Segmentation fault.
value_search_module_6_4 (value=18446603338276298752, offset=0x7ffffffface0) at symbols.c:5564
5564				if (value < sp->value)

Interesting. The code above shouldn't segfault unless sp is an invalid pointer. Could you please print the value of sp?

In addition, if sp is a valid pointer, then the line printed here is probably not where the segfault really happens. In that case, you can rebuild the crash utility without compiler optimization: in the crash source tree, run "make target=ARM64" to build crash the first time, then "make clean && make target=ARM64" to clean up and build a second time. The second build is done without compiler optimization :). Then re-test and post your findings here.

If none of this works for you, you are welcome to send me your vmcore/vmlinux via Google Drive or any other sharing method, by private email if the vmcore shouldn't be public, so I can have a look myself.

Thanks,
Tao Liu

(gdb) bt
#0  value_search_module_6_4 (value=18446603338276298752, offset=0x7ffffffface0) at symbols.c:5564
#1  0x0000555555812bd0 in value_to_symstr (value=18446603338276298752,
    buf=buf@entry=0x7fffffffb9c0 "", radix=10, radix@entry=0) at symbols.c:5872
#2  0x00005555557694a2 in display_memory (addr=<optimized out>, count=2048, flag=208,
    memtype=memtype@entry=1, opt=opt@entry=0x0) at memory.c:1740
#3  0x0000555555769e1f in raw_stack_dump (stackbase=<optimized out>, size=<optimized out>)
    at memory.c:2194
#4  0x00005555557923ff in get_active_set_panic_task () at task.c:8639
#5  0x00005555557930d2 in get_dumpfile_panic_task () at task.c:7628
#6  0x00005555557a89d3 in panic_search () at task.c:7380
#7  get_panic_context () at task.c:6267
#8  task_init () at task.c:687
#9  0x00005555557305b3 in main_loop () at main.c:787
#10 0x0000555555a64331 in captured_main (data=<optimized out>) at main.c:1284
#11 gdb_main (args=<optimized out>) at main.c:1313
#12 0x0000555555a643b0 in gdb_main_entry (argc=<optimized out>, argv=argv@entry=0x7fffffffe508)
    at main.c:1338
#13 0x00005555557d1ece in gdb_main_loop (argc=<optimized out>, argc@entry=3,
    argv=argv@entry=0x7fffffffe508) at gdb_interface.c:81
#14 0x0000555555728dfc in main (argc=3, argv=0x7fffffffe508) at main.c:720

@codernavi18
Author

Sorry, my bad. I forgot to mention that sp is NULL.
kdump : https://drive.google.com/file/d/1z55OHcPLuKy5KvsMml1uTJ2kJYf3LqwI/view?usp=drive_link
vmlinux : https://drive.google.com/file/d/1DusF8Ipu24b5VQBYmbUjGg5nfCoPErdM/view?usp=drive_link

The kernel uImage is built from the vanilla Linux 6.5 release for arm64 using defconfig. The module has just two lines of code in its init function to trigger a null pointer dereference; loading the module triggers the kdump. The kdump is generated with the makedumpfile utility using the command:
makedumpfile --message-level 4 -d 17,31 /proc/vmcore "${FILENAME}"
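
For readers unfamiliar with the -d argument: the sketch below decodes a makedumpfile dump level into the page types it excludes. It assumes the bit meanings given in the makedumpfile man page (1 = zero pages, 2 = non-private cache, 4 = private cache, 8 = user data, 16 = free pages); the function and table names are hypothetical and are not part of makedumpfile. As far as I understand the man page, a comma-separated list such as 17,31 gives fallback levels to retry with, so each level is decoded on its own.

```c
#include <stddef.h>
#include <stdio.h>

/* Assumed dump-level bits, taken from the makedumpfile man page. */
static const struct {
    unsigned bit;
    const char *what;
} dump_level_bits[] = {
    { 1,  "zero pages" },
    { 2,  "non-private cache pages" },
    { 4,  "private cache pages" },
    { 8,  "user data pages" },
    { 16, "free pages" },
};

/*
 * Hypothetical helper: write a comma-separated list of the page types
 * excluded by `level` into buf (at most len bytes) and return buf.
 * Output is silently truncated if buf is too small.
 */
static char *describe_dump_level(unsigned level, char *buf, size_t len)
{
    size_t used = 0;

    if (len == 0)
        return buf;
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof dump_level_bits / sizeof dump_level_bits[0]; i++) {
        if (!(level & dump_level_bits[i].bit))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         used ? ", " : "", dump_level_bits[i].what);
        if (n < 0 || (size_t)n >= len - used)
            break; /* did not fit; stop appending */
        used += (size_t)n;
    }
    return buf;
}
```

Under these assumptions, level 17 excludes zero pages and free pages, and level 31 excludes all five page types.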

@liutgnu
Member

liutgnu commented Apr 2, 2024

Patch posted upstream; it works according to my test. Thanks for reporting the bug and providing the vmcore!

@k-hagio k-hagio closed this as completed in ced754d Apr 4, 2024
liutgnu added a commit to liutgnu/crash-preview that referenced this issue Dec 5, 2024
The following segmentation fault occurred during session initialization:

  $ crash vmlinux vmcore
  ...
  please wait... (determining panic task)Segmentation fault

Here is the backtrace of the crash-utility:

  (gdb) bt
  #0  value_search_module_6_4 (value=18446603338276298752, offset=0x7ffffffface0) at symbols.c:5564
  #1  0x0000555555812bd0 in value_to_symstr (value=18446603338276298752,
      buf=buf@entry=0x7fffffffb9c0 "", radix=10, radix@entry=0) at symbols.c:5872
  #2  0x00005555557694a2 in display_memory (addr=<optimized out>, count=2048, flag=208,
      memtype=memtype@entry=1, opt=opt@entry=0x0) at memory.c:1740
  #3  0x0000555555769e1f in raw_stack_dump (stackbase=<optimized out>, size=<optimized out>)
      at memory.c:2194
  #4  0x00005555557923ff in get_active_set_panic_task () at task.c:8639
  #5  0x00005555557930d2 in get_dumpfile_panic_task () at task.c:7628
  #6  0x00005555557a89d3 in panic_search () at task.c:7380
  #7  get_panic_context () at task.c:6267
  #8  task_init () at task.c:687
  #9  0x00005555557305b3 in main_loop () at main.c:787
  ...

This is caused by a missing existence check on the module symbol table.
Not every mod_mem_type exists for every module, e.g. in the following
module:

  (gdb) p lm->symtable[0]
  $1 = (struct syment *) 0x4dcbad0
  (gdb) p lm->symtable[1]
  $2 = (struct syment *) 0x4dcbb70
  (gdb) p lm->symtable[2]
  $3 = (struct syment *) 0x4dcbc10
  (gdb) p lm->symtable[3]
  $4 = (struct syment *) 0x0
  (gdb) p lm->symtable[4]
  $5 = (struct syment *) 0x4dcbcb0
  (gdb) p lm->symtable[5]
  $6 = (struct syment *) 0x4dcbd00
  (gdb) p lm->symtable[6]
  $7 = (struct syment *) 0x0

MOD_RO_AFTER_INIT(3) and MOD_INIT_RODATA(6) do not exist for this module
and must be skipped; otherwise the segmentation fault occurs.

Fixes: 7750e61 ("Support module memory layout change on Linux 6.4")
Closes: crash-utility#176
Reported-by: Naveen Chaudhary <[email protected]>
Signed-off-by: Tao Liu <[email protected]>
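
The check described in the commit message can be sketched as follows. This is a minimal, self-contained illustration and not the actual patch: struct load_module_sketch, find_module_symbol and the simplified struct syment are hypothetical stand-ins for crash's internal types, and treating symend as one past the last entry is an assumption made for this example. The enum order follows the indices quoted above (MOD_RO_AFTER_INIT = 3, MOD_INIT_RODATA = 6).

```c
#include <stddef.h>

/* Module memory types; the order matches the indices in the commit message. */
enum mod_mem_type {
    MOD_TEXT,
    MOD_DATA,
    MOD_RODATA,
    MOD_RO_AFTER_INIT,
    MOD_INIT_TEXT,
    MOD_INIT_DATA,
    MOD_INIT_RODATA,
    MOD_MEM_NUM_TYPES
};

/* Simplified stand-in for a symbol table entry. */
struct syment {
    unsigned long value;
    const char *name;
};

/* Hypothetical, reduced view of a loaded module's per-type symbol tables. */
struct load_module_sketch {
    struct syment *symtable[MOD_MEM_NUM_TYPES]; /* first entry, or NULL */
    struct syment *symend[MOD_MEM_NUM_TYPES];   /* one past last entry (assumed) */
};

/*
 * Return the symbol with the highest address that is <= value, or NULL.
 * Memory types with no symbol table are skipped before sp is dereferenced,
 * which is the existence check the commit message says was missing.
 */
static const struct syment *
find_module_symbol(const struct load_module_sketch *lm, unsigned long value,
                   unsigned long *offset)
{
    const struct syment *best = NULL;

    for (int t = 0; t < MOD_MEM_NUM_TYPES; t++) {
        const struct syment *sp = lm->symtable[t];
        const struct syment *end = lm->symend[t];

        if (sp == NULL || end == NULL)
            continue; /* this memory type does not exist for the module */

        if (value < sp->value)
            continue; /* safe: sp is known to be non-NULL here */

        for (; sp < end; sp++) {
            if (sp->value <= value && (best == NULL || sp->value >= best->value))
                best = sp;
        }
    }

    if (best != NULL && offset != NULL)
        *offset = value - best->value;
    return best;
}
```

Without the NULL check, the comparison `value < sp->value` dereferences a null pointer for memory types such as indices 3 and 6 in the module above, which matches the faulting line shown in the backtrace.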