some trouble I meet when I use the crash -utility to parse the dump of arch arm64 #12

Closed
hzy0 opened this issue May 27, 2017 · 1 comment

hzy0 commented May 27, 2017

The error looks like this:
crash64> extend /home/huangzaiyang/WorkSpack/crash_ext_64/gcore.so
/home/huangzaiyang/WorkSpack/crash_ext_64/gcore.so: shared object already loaded

crash64> bt
PID: 545 TASK: ffffffc0620cf000 CPU: 3 COMMAND: "ftmdaemon"
bt: WARNING: cannot determine starting stack frame for task ffffffc0620cf000

crash64> gcore 545
Failed.

Can anybody help me with this?


crash-utility commented May 28, 2017 via email

k-hagio added a commit to k-hagio/crash that referenced this issue Apr 6, 2021
Fix for the 'bt' command and its options on Linux 5.8-rc1 or later kernels
that contain merge commit 076f14be7fc942e112c94c841baec44124275cd0.
The merged patches changed the names of the exception functions that
the crash utility uses to check for an exception frame.
Without the patch, the command and its options cannot display the exception frame.

Before:
  crash> bt
  PID: 8752   TASK: ffff8f80cb244380  CPU: 2   COMMAND: "insmod"
   #0 [ffffa3e40187f9f8] machine_kexec at ffffffffab25d267
   #1 [ffffa3e40187fa48] __crash_kexec at ffffffffab38e2ed
   #2 [ffffa3e40187fb10] crash_kexec at ffffffffab38f1dd
   #3 [ffffa3e40187fb28] oops_end at ffffffffab222cbd
   #4 [ffffa3e40187fb48] do_trap at ffffffffab21fea1
   #5 [ffffa3e40187fb90] do_error_trap at ffffffffab21ff75
   #6 [ffffa3e40187fbd0] exc_invalid_op at ffffffffabb76a2c
   #7 [ffffa3e40187fbf0] asm_exc_invalid_op at ffffffffabc00a72
   #8 [ffffa3e40187fc78] init_module at ffffffffc042b018 [invalid]
   #9 [ffffa3e40187fca0] init_module at ffffffffc042b018 [invalid]
  #10 [ffffa3e40187fca8] do_one_initcall at ffffffffab202806
  #11 [ffffa3e40187fd18] do_init_module at ffffffffab3888ba
  #12 [ffffa3e40187fd38] load_module at ffffffffab38afde

After:
  crash> bt
  PID: 8752   TASK: ffff8f80cb244380  CPU: 2   COMMAND: "insmod"
   #0 [ffffa3e40187f9f8] machine_kexec at ffffffffab25d267
   #1 [ffffa3e40187fa48] __crash_kexec at ffffffffab38e2ed
   #2 [ffffa3e40187fb10] crash_kexec at ffffffffab38f1dd
   #3 [ffffa3e40187fb28] oops_end at ffffffffab222cbd
   #4 [ffffa3e40187fb48] do_trap at ffffffffab21fea1
   #5 [ffffa3e40187fb90] do_error_trap at ffffffffab21ff75
   #6 [ffffa3e40187fbd0] exc_invalid_op at ffffffffabb76a2c
   #7 [ffffa3e40187fbf0] asm_exc_invalid_op at ffffffffabc00a72
      [exception RIP: init_module+24]
      RIP: ffffffffc042b018  RSP: ffffa3e40187fca8  RFLAGS: 00010246
      RAX: 000000000000001c  RBX: 0000000000000000  RCX: 0000000000000000
      RDX: 0000000000000000  RSI: ffff8f80fbd18000  RDI: ffff8f80fbd18000
      RBP: ffffffffc042b000   R8: 000000000000029d   R9: 000000000000002c
      R10: 0000000000000000  R11: ffffa3e40187fb58  R12: ffffffffc042d018
      R13: ffffa3e40187fdf0  R14: ffffffffc042d000  R15: ffffa3e40187fe90
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
   #8 [ffffa3e40187fca0] init_module at ffffffffc042b018 [invalid]
   #9 [ffffa3e40187fca8] do_one_initcall at ffffffffab202806
  #10 [ffffa3e40187fd18] do_init_module at ffffffffab3888ba
  #11 [ffffa3e40187fd38] load_module at ffffffffab38afde

Signed-off-by: Kazuhito Hagio <[email protected]>
k-hagio added a commit that referenced this issue Apr 16, 2021
Fix for the 'bt' command and its options on Linux 5.8-rc1 and later kernels
that contain merge commit 076f14be7fc942e112c94c841baec44124275cd0.
The merged patches changed the names of the exception functions that
the crash utility uses to check for an exception frame.
Without the patch, the command and its options cannot display the exception frame.

Before:
  crash> bt
  PID: 8752   TASK: ffff8f80cb244380  CPU: 2   COMMAND: "insmod"
   #0 [ffffa3e40187f9f8] machine_kexec at ffffffffab25d267
   #1 [ffffa3e40187fa48] __crash_kexec at ffffffffab38e2ed
   #2 [ffffa3e40187fb10] crash_kexec at ffffffffab38f1dd
   #3 [ffffa3e40187fb28] oops_end at ffffffffab222cbd
   #4 [ffffa3e40187fb48] do_trap at ffffffffab21fea1
   #5 [ffffa3e40187fb90] do_error_trap at ffffffffab21ff75
   #6 [ffffa3e40187fbd0] exc_invalid_op at ffffffffabb76a2c
   #7 [ffffa3e40187fbf0] asm_exc_invalid_op at ffffffffabc00a72
   #8 [ffffa3e40187fc78] init_module at ffffffffc042b018 [invalid]
   #9 [ffffa3e40187fca0] init_module at ffffffffc042b018 [invalid]
  #10 [ffffa3e40187fca8] do_one_initcall at ffffffffab202806
  #11 [ffffa3e40187fd18] do_init_module at ffffffffab3888ba
  #12 [ffffa3e40187fd38] load_module at ffffffffab38afde

After:
  crash> bt
  PID: 8752   TASK: ffff8f80cb244380  CPU: 2   COMMAND: "insmod"
   #0 [ffffa3e40187f9f8] machine_kexec at ffffffffab25d267
   #1 [ffffa3e40187fa48] __crash_kexec at ffffffffab38e2ed
   #2 [ffffa3e40187fb10] crash_kexec at ffffffffab38f1dd
   #3 [ffffa3e40187fb28] oops_end at ffffffffab222cbd
   #4 [ffffa3e40187fb48] do_trap at ffffffffab21fea1
   #5 [ffffa3e40187fb90] do_error_trap at ffffffffab21ff75
   #6 [ffffa3e40187fbd0] exc_invalid_op at ffffffffabb76a2c
   #7 [ffffa3e40187fbf0] asm_exc_invalid_op at ffffffffabc00a72
      [exception RIP: init_module+24]
      RIP: ffffffffc042b018  RSP: ffffa3e40187fca8  RFLAGS: 00010246
      RAX: 000000000000001c  RBX: 0000000000000000  RCX: 0000000000000000
      RDX: 0000000000000000  RSI: ffff8f80fbd18000  RDI: ffff8f80fbd18000
      RBP: ffffffffc042b000   R8: 000000000000029d   R9: 000000000000002c
      R10: 0000000000000000  R11: ffffa3e40187fb58  R12: ffffffffc042d018
      R13: ffffa3e40187fdf0  R14: ffffffffc042d000  R15: ffffa3e40187fe90
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
   #8 [ffffa3e40187fca0] init_module at ffffffffc042b018 [invalid]
   #9 [ffffa3e40187fca8] do_one_initcall at ffffffffab202806
  #10 [ffffa3e40187fd18] do_init_module at ffffffffab3888ba
  #11 [ffffa3e40187fd38] load_module at ffffffffab38afde

Signed-off-by: Kazuhito Hagio <[email protected]>
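
The renaming described in the commit message above can be illustrated with a small sketch. It is not crash's actual code: the old-style names in `OLD_STYLE_ENTRIES` are assumptions about pre-5.8 x86_64 entry symbols, and the only new-style name confirmed by the trace above is `asm_exc_invalid_op`. The idea is simply that a backtrace tool has to recognize the exception entry symbol under both naming schemes before it prints an exception frame.

```python
# Sketch only. The old-style names are assumptions about pre-5.8 kernels;
# only asm_exc_invalid_op is taken from the backtrace shown above.
OLD_STYLE_ENTRIES = {"invalid_op", "page_fault", "general_protection"}
NEW_STYLE_PREFIX = "asm_exc_"


def is_exception_entry(symbol: str) -> bool:
    """Return True if the symbol looks like an exception entry stub."""
    if symbol in OLD_STYLE_ENTRIES:
        return True
    # New-style stubs carry the asm_exc_ prefix followed by the exception name.
    return symbol.startswith(NEW_STYLE_PREFIX) and len(symbol) > len(NEW_STYLE_PREFIX)


def frames_needing_eframe(frames):
    """Return indexes of frames after which an exception frame is expected."""
    return [i for i, symbol in enumerate(frames) if is_exception_entry(symbol)]


# Frame symbols #0 to #8 from the "After" backtrace above.
frames = [
    "machine_kexec", "__crash_kexec", "crash_kexec", "oops_end", "do_trap",
    "do_error_trap", "exc_invalid_op", "asm_exc_invalid_op", "init_module",
]
print(frames_needing_eframe(frames))
```

With only the old names in the table, frame #7 would not be recognized and the exception frame would be skipped, which matches the "Before" output.
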
yangh added a commit to yangh/crash that referenced this issue Nov 15, 2021
The overflow stack has been supported since kernel 4.14 (commit 872d8327ce8).
Without this patch, the bt command triggers a SIGSEGV because the SP
points into the overflow stack, which crash has not yet loaded.

Before:

      KERNEL: ../vmlinux
    DUMPFILE: la_guestdump.gcore
        CPUS: 8
        DATE: Tue Jul 13 19:59:44 CST 2021
      UPTIME: 00:00:42
LOAD AVERAGE: 3.99, 1.13, 0.39
       TASKS: 1925
    NODENAME: localhost
     RELEASE: 4.14.156+
     VERSION: #1 SMP PREEMPT Tue Jul 13 10:37:23 UTC 2021
     MACHINE: aarch64  (unknown Mhz)
      MEMORY: 8.7 GB
       PANIC: "Kernel panic - not syncing: kernel stack overflow"
         PID: 1969
     COMMAND: "irq/139-0-0024"
        TASK: ffffffcc1a230000  [THREAD_INFO: ffffffcc1a230000]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

crash-7.3.0> bt
PID: 1969   TASK: ffffffcc1a230000  CPU: 0   COMMAND: "irq/139-0-0024"
Segmentation fault (core dumped)

After:

crash> bt
PID: 1969   TASK: ffffffcc1a230000  CPU: 0   COMMAND: "irq/139-0-0024"
  #0 [ffffffcc7fd5cf50] __delay at ffffff8008c80774
  #1 [ffffffcc7fd5cf60] __const_udelay at ffffff8008c80864
  #2 [ffffffcc7fd5cf80] msm_trigger_wdog_bite at ffffff80084e9430
  #3 [ffffffcc7fd5cfa0] do_vm_restart at ffffff80087bc974
  #4 [ffffffcc7fd5cfc0] machine_restart at ffffff80080856fc
  #5 [ffffffcc7fd5cfd0] emergency_restart at ffffff80080d49bc
  #6 [ffffffcc7fd5d140] panic at ffffff80080af4c0
  #7 [ffffffcc7fd5d150] nmi_panic at ffffff80080af150
  #8 [ffffffcc7fd5d190] handle_bad_stack at ffffff800808b0b8
  #9 [ffffffcc7fd5d2d0] __bad_stack at ffffff800808285c
--- <IRQ stack> ---
 #10 [ffffff801187bc60] el1_error_invalid at ffffff8008082e7c
 #11 [ffffff801187bcc0] cyttsp6_mt_attention at ffffff8000e8498c [cyttsp6]
 #12 [ffffff801187bd20] call_atten_cb at ffffff8000e82030 [cyttsp6]
 #13 [ffffff801187bdc0] cyttsp6_irq at ffffff8000e81e34 [cyttsp6]
 #14 [ffffff801187bdf0] irq_thread_fn at ffffff8008128dd8
 #15 [ffffff801187be50] irq_thread at ffffff8008128ca4
 #16 [ffffff801187beb0] kthread at ffffff80080d2fc4
crash>

Signed-off-by: Hong YANG <[email protected]>
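
The failure mode described above (an SP that lies in a stack the tool does not know about) can be sketched as a range lookup. This is a simplified model rather than crash's implementation: the stack names, base addresses and sizes in `STACKS` are hypothetical values chosen only so that the addresses from the backtrace above fall inside them.

```python
from typing import Dict, Optional, Tuple

# Hypothetical (base, size) ranges for one CPU. Real values come from the dump;
# these were picked only to contain the SP values shown in the backtrace above.
STACKS: Dict[str, Tuple[int, int]] = {
    "task": (0xFFFFFF8011878000, 0x4000),
    "irq": (0xFFFFFF8008010000, 0x4000),
    "overflow": (0xFFFFFFCC7FD5C300, 0x1000),
}


def classify_sp(sp: int, stacks: Dict[str, Tuple[int, int]] = STACKS) -> Optional[str]:
    """Return the name of the stack containing sp, or None if it is unknown."""
    for name, (base, size) in stacks.items():
        if base <= sp < base + size:
            return name
    return None


# Model of a tool that has not loaded the overflow stack.
without_overflow = {k: v for k, v in STACKS.items() if k != "overflow"}

print(classify_sp(0xFFFFFFCC7FD5CF50))                    # SP of frame #0
print(classify_sp(0xFFFFFFCC7FD5CF50, without_overflow))  # unknown stack
```

A caller that receives `None` has to stop unwinding or report an error; dereferencing stack data for an unknown range is the kind of situation that can end in a segmentation fault.
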
k-hagio pushed a commit that referenced this issue Feb 16, 2023
Kernel commit 7d65f4a65532 ("irq: Consolidate do_softirq() arch overriden
implementations") renamed call_softirq to do_softirq_own_stack, and
there is likewise no exception frame when coming from do_softirq_own_stack.
Without the patch, crash may unnecessarily output an exception frame with
a warning, as shown below:

  crash> foreach bt
  ...
  PID: 0        TASK: ffff914f820a8000  CPU: 25   COMMAND: "swapper/25"
   #0 [fffffe0000504e48] crash_nmi_callback at ffffffffa665d763
   #1 [fffffe0000504e50] nmi_handle at ffffffffa662a423
   #2 [fffffe0000504ea8] default_do_nmi at ffffffffa6fe7dc9
   #3 [fffffe0000504ec8] do_nmi at ffffffffa662a97f
   #4 [fffffe0000504ef0] end_repeat_nmi at ffffffffa70015e8
      [exception RIP: clone_endio+172]
      RIP: ffffffffc005c1ec  RSP: ffffa1d403d08e98  RFLAGS: 00000246
      RAX: 0000000000000000  RBX: ffff915326fba230  RCX: 0000000000000018
      RDX: ffffffffc0075400  RSI: 0000000000000000  RDI: ffff915326fba230
      RBP: ffff915326fba1c0   R8: 0000000000001000   R9: ffff915308d6d2a0
      R10: 000000a97dfe5e10  R11: ffffa1d40038fe98  R12: ffff915302babc40
      R13: ffff914f94360000  R14: 0000000000000000  R15: 0000000000000000
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
  --- <NMI exception stack> ---
   #5 [ffffa1d403d08e98] clone_endio at ffffffffc005c1ec [dm_mod]
   #6 [ffffa1d403d08ed0] blk_update_request at ffffffffa6a96954
   #7 [ffffa1d403d08f10] scsi_end_request at ffffffffa6c9b968
   #8 [ffffa1d403d08f48] scsi_io_completion at ffffffffa6c9bb3e
   #9 [ffffa1d403d08f90] blk_complete_reqs at ffffffffa6aa0e95
   #10 [ffffa1d403d08fa0] __softirqentry_text_start at ffffffffa72000dc
   #11 [ffffa1d403d08ff0] do_softirq_own_stack at ffffffffa7000f9a
  --- <IRQ stack> ---
   #12 [ffffa1d40038fe70] do_softirq_own_stack at ffffffffa7000f9a
      [exception RIP: unknown or invalid address]
      RIP: 0000000000000000  RSP: 0000000000000000  RFLAGS: 00000000
      RAX: ffffffffa672eae5  RBX: ffffffffa83b34e0  RCX: ffffffffa672eb12
      RDX: 0000000000000010  RSI: 8b7d6c8869010c00  RDI: 0000000000000085
      RBP: 0000000000000286   R8: ffff914f820a8000   R9: ffffffffa67a94e0
      R10: 0000000000000286  R11: ffffffffa66fb4c5  R12: ffffffffa67a898b
      R13: 0000000000000000  R14: fffffffffffffff8  R15: ffffffffa67a1e68
      ORIG_RAX: 0000000000000000  CS: 0000  SS: ffffffffa672edff
   bt: WARNING: possibly bogus exception frame
   #13 [ffffa1d40038ff30] start_secondary at ffffffffa665fa2c
   #14 [ffffa1d40038ff50] secondary_startup_64_no_verify at ffffffffa6600116
   ...

Reported-by: Marco Patalano <[email protected]>
Signed-off-by: Lianbo Jiang <[email protected]>
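
To see why the second register dump above is flagged as "possibly bogus", it helps to compare it with the valid one. The sketch below is a deliberately simplified plausibility check, not crash's actual verification logic: it assumes a kernel-mode frame should carry the CS and SS values seen in the valid frame above (0010 and 0018) and a non-zero RIP and RSP, and it ignores user-mode frames entirely.

```python
# Selector values taken from the valid exception frame in the output above.
KERNEL_CS = 0x10
KERNEL_SS = 0x18


def looks_like_kernel_eframe(rip: int, rsp: int, cs: int, ss: int) -> bool:
    """Simplified plausibility check for a kernel-mode exception frame."""
    return cs == KERNEL_CS and ss == KERNEL_SS and rip != 0 and rsp != 0


# Frame printed after end_repeat_nmi (valid).
print(looks_like_kernel_eframe(0xFFFFFFFFC005C1EC, 0xFFFFA1D403D08E98, 0x10, 0x18))
# Frame printed after do_softirq_own_stack (reported as possibly bogus).
print(looks_like_kernel_eframe(0x0, 0x0, 0x0, 0xFFFFFFFFA672EDFF))
```

The patch itself avoids the problem earlier, by not expecting an exception frame at all when the IRQ stack was entered through do_softirq_own_stack.
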
k-hagio pushed a commit that referenced this issue Nov 29, 2023
…usly

For kernel modules, "dis -rl" fails to display the module's code line
number data after the "bt" command has been executed in crash.

Without the patch:
  crash> mod -S
  crash> bt
  PID: 1500     TASK: ff2bd8b093524000  CPU: 16   COMMAND: "lpfc_worker_0"
   #0 [ff2c9f725c39f9e0] machine_kexec at ffffffff8e0686d3
   ...snip...
   #8 [ff2c9f725c39fcc0] __lpfc_sli_release_iocbq_s4 at ffffffffc0f2f425 [lpfc]
   ...snip...
  crash> dis -rl ffffffffc0f60f82
  0xffffffffc0f60eb0 <lpfc_nlp_get>:      nopl   0x0(%rax,%rax,1) [FTRACE NOP]
  0xffffffffc0f60eb5 <lpfc_nlp_get+5>:    push   %rbp
  0xffffffffc0f60eb6 <lpfc_nlp_get+6>:    push   %rbx
  0xffffffffc0f60eb7 <lpfc_nlp_get+7>:    test   %rdi,%rdi

With the patch:
  crash> mod -S
  crash> bt
  PID: 1500     TASK: ff2bd8b093524000  CPU: 16   COMMAND: "lpfc_worker_0"
   #0 [ff2c9f725c39f9e0] machine_kexec at ffffffff8e0686d3
   ...snip...
   #8 [ff2c9f725c39fcc0] __lpfc_sli_release_iocbq_s4 at ffffffffc0f2f425 [lpfc]
   ...snip...
  crash> dis -rl ffffffffc0f60f82
  /usr/src/debug/kernel-4.18.0-425.13.1.el8_7/linux-4.18.0-425.13.1.el8_7.x86_64/drivers/scsi/lpfc/lpfc_hbadisc.c: 6756
  0xffffffffc0f60eb0 <lpfc_nlp_get>:      nopl   0x0(%rax,%rax,1) [FTRACE NOP]
  /usr/src/debug/kernel-4.18.0-425.13.1.el8_7/linux-4.18.0-425.13.1.el8_7.x86_64/drivers/scsi/lpfc/lpfc_hbadisc.c: 6759
  0xffffffffc0f60eb5 <lpfc_nlp_get+5>:    push   %rbp

The root cause is that, after a kernel module has been loaded by the mod command,
its symtable is not yet expanded on the gdb side. The crash bt or dis command
triggers such an expansion. However, the symtable expansion differs between the two commands:

The stack trace of "dis -rl" for symtable expanding:

  #0  0x00000000008d8d9f in add_compunit_symtab_to_objfile ...
  #1  0x00000000006d3293 in buildsym_compunit::end_symtab_with_blockvector ...
  #2  0x00000000006d336a in buildsym_compunit::end_symtab_from_static_block ...
  #3  0x000000000077e8e9 in process_full_comp_unit ...
  #4  process_queue ...
  #5  dw2_do_instantiate_symtab ...
  #6  0x000000000077ed67 in dw2_instantiate_symtab ...
  #7  0x000000000077f75e in dw2_expand_all_symtabs ...
  #8  0x00000000008f254d in gdb_get_line_number ...
  #9  0x00000000008f22af in gdb_command_funnel_1 ...
  #10 0x00000000008f2003 in gdb_command_funnel ...
  #11 0x00000000005b7f02 in gdb_interface ...
  #12 0x00000000005f5bd8 in get_line_number ...
  #13 0x000000000059e574 in cmd_dis ...

The stack trace of "bt" for symtable expanding:

  #0  0x00000000008d8d9f in add_compunit_symtab_to_objfile ...
  #1  0x00000000006d3293 in buildsym_compunit::end_symtab_with_blockvector ...
  #2  0x00000000006d336a in buildsym_compunit::end_symtab_from_static_block ...
  #3  0x000000000077e8e9 in process_full_comp_unit ...
  #4  process_queue ...
  #5  dw2_do_instantiate_symtab ...
  #6  0x000000000077ed67 in dw2_instantiate_symtab ...
  #7  0x000000000077f8ed in dw2_lookup_symbol ...
  #8  0x00000000008e6d03 in lookup_symbol_via_quick_fns ...
  #9  0x00000000008e7153 in lookup_symbol_in_objfile ...
  #10 0x00000000008e73c6 in lookup_symbol_global_or_static_iterator_cb ...
  #11 0x00000000008b99c4 in svr4_iterate_over_objfiles_in_search_order ...
  #12 0x00000000008e754e in lookup_global_or_static_symbol ...
  #13 0x00000000008e75da in lookup_static_symbol ...
  #14 0x00000000008e632c in lookup_symbol_aux ...
  #15 0x00000000008e5a7a in lookup_symbol_in_language ...
  #16 0x00000000008e5b30 in lookup_symbol ...
  #17 0x00000000008f2a4a in gdb_get_datatype ...
  #18 0x00000000008f22c0 in gdb_command_funnel_1 ...
  #19 0x00000000008f2003 in gdb_command_funnel ...
  #20 0x00000000005b7f02 in gdb_interface ...
  #21 0x00000000005f8a9f in datatype_info ...
  #22 0x0000000000599947 in cpu_map_size ...
  #23 0x00000000005a975d in get_cpus_online ...
  #24 0x0000000000637a8b in diskdump_get_prstatus_percpu ...
  #25 0x000000000062f0e4 in get_netdump_regs_x86_64 ...
  #26 0x000000000059fe68 in back_trace ...
  #27 0x00000000005ab1cb in cmd_bt ...

In the "dis -rl" stack trace, dw2_expand_all_symtabs() is called to expand
all symtables of the objfile ("*.ko.debug" in our case). In the "bt" stack
trace, however, not all symtables are expanded, only the subset that is
needed to find a symbol via dw2_lookup_symbol(). As a result,
objfile->compunit_symtabs, which is the head of a singly linked list of
struct compunit_symtab, is not NULL but does not contain all symtables. A
later "dis -rl" will not expand the rest in gdb_get_line_number() because
the !objfile_has_full_symbols(objfile) check fails, so it cannot display
the proper code line number data.

Since the objfile_has_full_symbols(objfile) check cannot ensure that all symbols
have been expanded, this patch adds a new member to struct objfile as a flag
that records whether all symbols have been expanded. The flag is set only after
expand_all_symtabs has been called.

Signed-off-by: Tao Liu <[email protected]>
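
The interaction described above can be reproduced with a toy model. The sketch below is a Python illustration of the logic, not gdb code: the class, the flag name `all_symtabs_expanded` and the second file name `lpfc_sli.c` are assumptions made for the example. It shows why a "non-empty list" check is not enough once a partial expansion has happened, and how an explicit flag fixes it.

```python
class Objfile:
    """Toy model of a gdb objfile with lazily expanded compilation units."""

    def __init__(self, units):
        self.units = list(units)            # every unit in the debuginfo file
        self.compunit_symtabs = []          # units expanded so far
        self.all_symtabs_expanded = False   # hypothetical name for the new flag

    def has_full_symbols(self):
        # Mirrors the old check: true as soon as anything has been expanded.
        return bool(self.compunit_symtabs)

    def lookup_symbol(self, unit):
        # "bt" path: expand only the unit needed to resolve one symbol.
        if unit not in self.compunit_symtabs:
            self.compunit_symtabs.append(unit)

    def expand_all_symtabs(self):
        # "dis -rl" path: expand everything and record that fact.
        for unit in self.units:
            if unit not in self.compunit_symtabs:
                self.compunit_symtabs.append(unit)
        self.all_symtabs_expanded = True


def line_info_available_old(objfile, unit):
    """Old behaviour: expand all only if nothing was expanded before."""
    if not objfile.has_full_symbols():
        objfile.expand_all_symtabs()
    return unit in objfile.compunit_symtabs


def line_info_available_new(objfile, unit):
    """Patched behaviour: expand all unless the flag says it already happened."""
    if not objfile.all_symtabs_expanded:
        objfile.expand_all_symtabs()
    return unit in objfile.compunit_symtabs


UNITS = ["lpfc_sli.c", "lpfc_hbadisc.c"]

old = Objfile(UNITS)
old.lookup_symbol("lpfc_sli.c")                         # "bt" ran first
print(line_info_available_old(old, "lpfc_hbadisc.c"))   # line data missing

new = Objfile(UNITS)
new.lookup_symbol("lpfc_sli.c")                         # "bt" ran first
print(line_info_available_new(new, "lpfc_hbadisc.c"))   # line data found
```
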
liutgnu added a commit to liutgnu/crash-preview that referenced this issue Nov 30, 2023
liutgnu added a commit to liutgnu/crash-preview that referenced this issue Dec 1, 2023
sugarfillet pushed a commit to sugarfillet/crash that referenced this issue Dec 13, 2023
This patch introduces per-cpu IRQ stacks for RISCV64 so that
"bt" can backtrace on them and 'bt -E' can search them for eframes,
and the 'help -m' command displays the address of each
per-cpu IRQ stack.

TEST: a vmcore dumped by hacking handle_irq_event_percpu().
(Why not use lkdtm INT_HW_IRQ_EN EXCEPTION?
 There is a deadlock[1] in the crash_kexec path if that is used.)

```
crash> bt
PID: 0        TASK: ffffffff8140db00  CPU: 0    COMMAND: "swapper/0"
 #0 [ff20000000003e60] __handle_irq_event_percpu at ffffffff8006462e
 #1 [ff20000000003ed0] handle_irq_event_percpu at ffffffff80064702
 #2 [ff20000000003ef0] handle_irq_event at ffffffff8006477c
 #3 [ff20000000003f20] handle_fasteoi_irq at ffffffff80068664
 #4 [ff20000000003f50] generic_handle_domain_irq at ffffffff80063988
 #5 [ff20000000003f60] plic_handle_irq at ffffffff8046633e
 #6 [ff20000000003fb0] generic_handle_domain_irq at ffffffff80063988
 #7 [ff20000000003fc0] riscv_intc_irq at ffffffff80465f8e
 #8 [ff20000000003fd0] handle_riscv_irq at ffffffff808361e8
     PC: ffffffff80837314  [default_idle_call+50]
     RA: ffffffff80837310  [default_idle_call+46]
     SP: ffffffff81403da0  CAUSE: 8000000000000009
epc : ffffffff80837314 ra : ffffffff80837310 sp : ffffffff81403da0
 gp : ffffffff814ef848 tp : ffffffff8140db00 t0 : ff2000000004bb18
 t1 : 0000000000032c73 t2 : ffffffff81200a48 s0 : ffffffff81403db0
 s1 : 0000000000000000 a0 : 0000000000000004 a1 : 0000000000000000
 a2 : ff6000009f1e7000 a3 : 0000000000002304 a4 : ffffffff80c1c2d8
 a5 : 0000000000000000 a6 : ff6000001fe01958 a7 : 00002496ea89dbf1
 s2 : ffffffff814f0220 s3 : 0000000000000001 s4 : 000000000000003f
 s5 : ffffffff814f03d8 s6 : 0000000000000000 s7 : ffffffff814f00d0
 s8 : ffffffff81526f10 s9 : ffffffff80c1d880 s10: 0000000000000000
 s11: 0000000000000001 t3 : 0000000000003392 t4 : 0000000000000000
 t5 : 0000000000000000 t6 : 0000000000000040
 status: 0000000200000120 badaddr: 0000000000000000
  cause: 8000000000000009 orig_a0: ffffffff80837310
--- <IRQ stack> ---
 #9 [ffffffff81403da0] default_idle_call at ffffffff80837314
 #10 [ffffffff81403db0] do_idle at ffffffff8004d0a0
 #11 [ffffffff81403e40] cpu_startup_entry at ffffffff8004d21e
 #12 [ffffffff81403e60] kernel_init at ffffffff8083746a
 #13 [ffffffff81403e70] arch_post_acpi_subsys_init at ffffffff80a006d8
 #14 [ffffffff81403e80] console_on_rootfs at ffffffff80a00c92
crash>

crash> bt -E
CPU 0 IRQ STACK:
KERNEL-MODE EXCEPTION FRAME AT: ff20000000003a48
     PC: ffffffff8006462e  [__handle_irq_event_percpu+30]
     RA: ffffffff80064702  [handle_irq_event_percpu+18]
     SP: ff20000000003e60  CAUSE: 000000000000000d
epc : ffffffff8006462e ra : ffffffff80064702 sp : ff20000000003e60
 gp : ffffffff814ef848 tp : ffffffff8140db00 t0 : 0000000000046600
 t1 : ffffffff80836464 t2 : ffffffff81200a48 s0 : ff20000000003ed0
 s1 : 0000000000000000 a0 : 0000000000000000 a1 : 0000000000000118
 a2 : 0000000000000052 a3 : 0000000000000000 a4 : 0000000000000000
 a5 : 0000000000010001 a6 : ff6000001fe01958 a7 : 00002496ea89dbf1
 s2 : ff60000000941ab0 s3 : ffffffff814a0658 s4 : ff60000000089230
 s5 : ffffffff814a0518 s6 : ffffffff814a0620 s7 : ffffffff80e5f0f8
 s8 : ffffffff80fc50b0 s9 : ffffffff80c1d880 s10: 0000000000000000
 s11: 0000000000000001 t3 : 0000000000003392 t4 : 0000000000000000
 t5 : 0000000000000000 t6 : 0000000000000040
 status: 0000000200000100 badaddr: 0000000000000078
  cause: 000000000000000d orig_a0: ff20000000003ea0

CPU 1 IRQ STACK:(none found)

crash>

crash> help -m
<snip>
           machspec: ced1e0
        irq_stack_size: 16384
         irq_stacks[0]: ff20000000000000
         irq_stacks[1]: ff20000000008000
crash>
```
[1]: https://lore.kernel.org/linux-riscv/[email protected]/

Signed-off-by: Song Shuai <[email protected]>
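
The 'help -m' values above are enough to check by hand which CPU's IRQ stack an address belongs to. The sketch below does that lookup in Python; it is an illustration using the stack size and base addresses printed above, not the code added by the patch, and the function name is made up for the example.

```python
from typing import Optional

# Values copied from the 'help -m' output above.
IRQ_STACK_SIZE = 16384
IRQ_STACKS = [0xFF20000000000000, 0xFF20000000008000]


def irq_stack_cpu(addr: int) -> Optional[int]:
    """Return the CPU whose IRQ stack contains addr, or None if there is none."""
    for cpu, base in enumerate(IRQ_STACKS):
        if base <= addr < base + IRQ_STACK_SIZE:
            return cpu
    return None


print(irq_stack_cpu(0xFF20000000003E60))  # SP of frame #0 in the 'bt' output
print(irq_stack_cpu(0xFFFFFFFF81403DA0))  # SP of frame #9, on the task stack
```

Frames #0 to #8 in the 'bt' output all have stack addresses inside CPU 0's IRQ stack, while frame #9 onward falls outside both ranges, which is consistent with the stack switch shown in the trace.
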
k-hagio pushed a commit that referenced this issue Jan 11, 2024
This patch introduces per-cpu IRQ stacks for RISCV64 so that
"bt" can backtrace on them and 'bt -E' can search them for eframes,
and the 'help -m' command displays the address of each
per-cpu IRQ stack.

TEST: a vmcore dumped by hacking handle_irq_event_percpu().
(Why not use lkdtm INT_HW_IRQ_EN EXCEPTION?
 There is a deadlock[1] in the crash_kexec path if that is used.)

  crash> bt
  PID: 0        TASK: ffffffff8140db00  CPU: 0    COMMAND: "swapper/0"
   #0 [ff20000000003e60] __handle_irq_event_percpu at ffffffff8006462e
   #1 [ff20000000003ed0] handle_irq_event_percpu at ffffffff80064702
   #2 [ff20000000003ef0] handle_irq_event at ffffffff8006477c
   #3 [ff20000000003f20] handle_fasteoi_irq at ffffffff80068664
   #4 [ff20000000003f50] generic_handle_domain_irq at ffffffff80063988
   #5 [ff20000000003f60] plic_handle_irq at ffffffff8046633e
   #6 [ff20000000003fb0] generic_handle_domain_irq at ffffffff80063988
   #7 [ff20000000003fc0] riscv_intc_irq at ffffffff80465f8e
   #8 [ff20000000003fd0] handle_riscv_irq at ffffffff808361e8
       PC: ffffffff80837314  [default_idle_call+50]
       RA: ffffffff80837310  [default_idle_call+46]
       SP: ffffffff81403da0  CAUSE: 8000000000000009
  epc : ffffffff80837314 ra : ffffffff80837310 sp : ffffffff81403da0
   gp : ffffffff814ef848 tp : ffffffff8140db00 t0 : ff2000000004bb18
   t1 : 0000000000032c73 t2 : ffffffff81200a48 s0 : ffffffff81403db0
   s1 : 0000000000000000 a0 : 0000000000000004 a1 : 0000000000000000
   a2 : ff6000009f1e7000 a3 : 0000000000002304 a4 : ffffffff80c1c2d8
   a5 : 0000000000000000 a6 : ff6000001fe01958 a7 : 00002496ea89dbf1
   s2 : ffffffff814f0220 s3 : 0000000000000001 s4 : 000000000000003f
   s5 : ffffffff814f03d8 s6 : 0000000000000000 s7 : ffffffff814f00d0
   s8 : ffffffff81526f10 s9 : ffffffff80c1d880 s10: 0000000000000000
   s11: 0000000000000001 t3 : 0000000000003392 t4 : 0000000000000000
   t5 : 0000000000000000 t6 : 0000000000000040
   status: 0000000200000120 badaddr: 0000000000000000
    cause: 8000000000000009 orig_a0: ffffffff80837310
  --- <IRQ stack> ---
   #9 [ffffffff81403da0] default_idle_call at ffffffff80837314
   #10 [ffffffff81403db0] do_idle at ffffffff8004d0a0
   #11 [ffffffff81403e40] cpu_startup_entry at ffffffff8004d21e
   #12 [ffffffff81403e60] kernel_init at ffffffff8083746a
   #13 [ffffffff81403e70] arch_post_acpi_subsys_init at ffffffff80a006d8
   #14 [ffffffff81403e80] console_on_rootfs at ffffffff80a00c92
  crash>

  crash> bt -E
  CPU 0 IRQ STACK:
  KERNEL-MODE EXCEPTION FRAME AT: ff20000000003a48
       PC: ffffffff8006462e  [__handle_irq_event_percpu+30]
       RA: ffffffff80064702  [handle_irq_event_percpu+18]
       SP: ff20000000003e60  CAUSE: 000000000000000d
  epc : ffffffff8006462e ra : ffffffff80064702 sp : ff20000000003e60
   gp : ffffffff814ef848 tp : ffffffff8140db00 t0 : 0000000000046600
   t1 : ffffffff80836464 t2 : ffffffff81200a48 s0 : ff20000000003ed0
   s1 : 0000000000000000 a0 : 0000000000000000 a1 : 0000000000000118
   a2 : 0000000000000052 a3 : 0000000000000000 a4 : 0000000000000000
   a5 : 0000000000010001 a6 : ff6000001fe01958 a7 : 00002496ea89dbf1
   s2 : ff60000000941ab0 s3 : ffffffff814a0658 s4 : ff60000000089230
   s5 : ffffffff814a0518 s6 : ffffffff814a0620 s7 : ffffffff80e5f0f8
   s8 : ffffffff80fc50b0 s9 : ffffffff80c1d880 s10: 0000000000000000
   s11: 0000000000000001 t3 : 0000000000003392 t4 : 0000000000000000
   t5 : 0000000000000000 t6 : 0000000000000040
   status: 0000000200000100 badaddr: 0000000000000078
    cause: 000000000000000d orig_a0: ff20000000003ea0

  CPU 1 IRQ STACK: (none found)

  crash>

  crash> help -m
  <snip>
             machspec: ced1e0
          irq_stack_size: 16384
           irq_stacks[0]: ff20000000000000
           irq_stacks[1]: ff20000000008000
  crash>
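
As a rough illustration of how the `help -m` values relate to the backtrace above, the following Python sketch checks which per-cpu IRQ stack, if any, contains a given stack pointer. The helper name `irq_stack_cpu` and the half-open range check are assumptions made for this example; it is not the code from the patch (crash itself is written in C).

```python
# Values copied from the `help -m` output above.
IRQ_STACK_SIZE = 16384
IRQ_STACKS = [0xff20000000000000, 0xff20000000008000]


def irq_stack_cpu(sp, stacks=IRQ_STACKS, size=IRQ_STACK_SIZE):
    """Return the CPU whose IRQ stack contains sp, or None.

    Hypothetical helper: treats each stack as the half-open range
    [base, base + size), which is an assumption of this sketch.
    """
    for cpu, base in enumerate(stacks):
        if base <= sp < base + size:
            return cpu
    return None
```

With these values, the stack pointer of frame #0 (`ff20000000003e60`) falls inside the IRQ stack of CPU 0, while that of frame #9 (`ffffffff81403da0`) is on no IRQ stack, which matches the switch shown at the `--- <IRQ stack> ---` marker.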

[1]: https://lore.kernel.org/linux-riscv/[email protected]/

Signed-off-by: Song Shuai <[email protected]>
liutgnu pushed a commit to liutgnu/crash-preview that referenced this issue Feb 21, 2024
adi-g15-ibm pushed a commit to adi-g15-ibm/crash that referenced this issue Mar 10, 2024
Stack unwinding is meant for kernel addresses only. When a non-kernel
address is encountered, it is usually a user-space address or a
non-address value such as a function call parameter. Stopping the unwind
at the first non-kernel address therefore reduces invalid unwind results.
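
The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `KERNEL_SPACE_START`, `is_kernel_address` and `trim_unwind` are hypothetical names, and the single boundary test (upper canonical half on x86_64 with 48-bit addresses) is a simplification, not the check the patch actually uses.

```python
# Assumption for this sketch: anything below the upper canonical half
# of the x86_64 address space is treated as a non-kernel value.
KERNEL_SPACE_START = 0xffff800000000000


def is_kernel_address(addr):
    """Hypothetical check; crash uses its own per-architecture logic."""
    return addr >= KERNEL_SPACE_START


def trim_unwind(pcs):
    """Keep frames up to and including the first non-kernel pc, then stop."""
    kept = []
    for pc in pcs:
        kept.append(pc)
        if not is_kernel_address(pc):
            break
    return kept
```

Applied to the frame addresses of the "Before" listing, the result ends at `0x00007f0449407923`, as in the "After" listing: the values that follow it are never visited, even though some of them look like kernel addresses.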

Before:
crash> gdb bt
 #0  0xffffffff816a8f65 in context_switch ...
 #1  __schedule () ...
 #2  0xffffffff816a94e9 in schedule ...
 #3  0xffffffff816a86fd in schedule_hrtimeout_range_clock ...
 #4  0xffffffff816a8733 in schedule_hrtimeout_range ...
 #5  0xffffffff8124bb7e in ep_poll ...
 #6  0xffffffff8124d00d in SYSC_epoll_wait ...
 #7  SyS_epoll_wait ...
 #8  <signal handler called>
 #9  0x00007f0449407923 in ?? ()
 #10 0xffff880100000001 in ?? ()
 #11 0xffff880169b3c010 in ?? ()
 #12 0x0000000000000040 in irq_stack_union ()
 #13 0xffff880169b3c058 in ?? ()
 #14 0xffff880169b3c048 in ?? ()
 #15 0xffff880169b3c050 in ?? ()
 #16 0x0000000000000000 in ?? ()

After:
crash> gdb bt
 #0  0xffffffff816a8f65 in context_switch ...
 #1  __schedule () ...
 #2  0xffffffff816a94e9 in schedule () ...
 #3  0xffffffff816a86fd in schedule_hrtimeout_range_clock ...
 #4  0xffffffff816a8733 in schedule_hrtimeout_range ...
 #5  0xffffffff8124bb7e in ep_poll ...
 #6  0xffffffff8124d00d in SYSC_epoll_wait ...
 #7  SyS_epoll_wait ...
 #8  <signal handler called>
 #9  0x00007f0449407923 in ?? ()

Signed-off-by: Tao Liu <ltao(a)redhat.com>
adi-g15-ibm pushed a commit to adi-g15-ibm/crash that referenced this issue Mar 27, 2024
adi-g15-ibm pushed a commit to adi-g15-ibm/crash that referenced this issue Mar 27, 2024
adi-g15-ibm pushed a commit to adi-g15-ibm/crash that referenced this issue Mar 27, 2024
adi-g15-ibm pushed a commit to adi-g15-ibm/crash that referenced this issue Mar 28, 2024
adi-g15-ibm pushed a commit to adi-g15-ibm/crash that referenced this issue Mar 28, 2024
adi-g15-ibm pushed a commit to adi-g15-ibm/crash that referenced this issue Mar 29, 2024
adi-g15-ibm pushed a commit to adi-g15-ibm/crash that referenced this issue May 19, 2024
Stack unwinding is meant for kernel addresses only. When a non-kernel
address is encountered, it is usually a user-space address or a
non-address value such as a function call parameter. Stopping the unwind
at the first non-kernel address therefore reduces invalid unwind results.

Before:
crash> gdb bt
 #0  0xffffffff816a8f65 in context_switch ...
 #1  __schedule () ...
 #2  0xffffffff816a94e9 in schedule ...
 #3  0xffffffff816a86fd in schedule_hrtimeout_range_clock ...
 #4  0xffffffff816a8733 in schedule_hrtimeout_range ...
 #5  0xffffffff8124bb7e in ep_poll ...
 #6  0xffffffff8124d00d in SYSC_epoll_wait ...
 #7  SyS_epoll_wait ...
 #8  <signal handler called>
 #9  0x00007f0449407923 in ?? ()
 #10 0xffff880100000001 in ?? ()
 #11 0xffff880169b3c010 in ?? ()
 #12 0x0000000000000040 in irq_stack_union ()
 #13 0xffff880169b3c058 in ?? ()
 #14 0xffff880169b3c048 in ?? ()
 #15 0xffff880169b3c050 in ?? ()
 #16 0x0000000000000000 in ?? ()

After:
crash> gdb bt
 #0  0xffffffff816a8f65 in context_switch ...
 #1  __schedule () ...
 #2  0xffffffff816a94e9 in schedule () ...
 #3  0xffffffff816a86fd in schedule_hrtimeout_range_clock ...
 #4  0xffffffff816a8733 in schedule_hrtimeout_range ...
 #5  0xffffffff8124bb7e in ep_poll ...
 #6  0xffffffff8124d00d in SYSC_epoll_wait ...
 #7  SyS_epoll_wait ...
 #8  <signal handler called>
 #9  0x00007f0449407923 in ?? ()

Cc: Sourabh Jain <[email protected]>
Cc: Hari Bathini <[email protected]>
Cc: Mahesh J Salgaonkar <[email protected]>
Cc: Naveen N. Rao <[email protected]>
Cc: Lianbo Jiang <[email protected]>
Cc: HAGIO KAZUHITO(萩尾 一仁) <[email protected]>
Cc: Tao Liu <[email protected]>
Cc: Alexey Makhalov <[email protected]>
Signed-off-by: Tao Liu <[email protected]>
adi-g15-ibm pushed a commit to adi-g15-ibm/crash that referenced this issue May 19, 2024
gdb may prefix a disassembly line with "=>". In that case parse_line()
returns the string "=>" as arglist[0], and the subsequent conversion of
that string to a number by htol() fails. E.g.:

crash> gdb x/40i __list_del_entry
   ...
   0xffffffff8133c384 <__list_del_entry+36>:    cmp    %rcx,%rax
   0xffffffff8133c387 <__list_del_entry+39>:    je     0xffffffff8133c403 <__list_del_entry+163>
=> 0xffffffff8133c389 <__list_del_entry+41>:    mov    (%rax),%r8
   0xffffffff8133c38c <__list_del_entry+44>:    cmp    %r8,%rdi
   0xffffffff8133c38f <__list_del_entry+47>:    jne    0xffffffff8133c3e4 <__list_del_entry+132>
   0xffffffff8133c391 <__list_del_entry+49>:    mov    0x8(%rdx),%r8

Before the patch:

crash> bt
 ...
 #10 [ffff880095647c00] async_page_fault at ffffffff816a8638
    [exception RIP: __list_del_entry+41]
    RIP: ffffffff8133c389  RSP: ffff880095647cb0  RFLAGS: 00010207
    RAX: 0000000000000000  RBX: ffffea0400408020  RCX: dead000000000200
    RDX: 0000000000000000  RSI: 0000000000000246  RDI: ffffea0400408020
    RBP: ffff880095647cb0   R8: 0000000080000431   R9: ffffffff81e835c0
    R10: 0000000000000000  R11: 0000000000000400  R12: ffff880138795b58
    R13: 0000000010010201  R14: ffff880095647d70  R15: 0000000400408040
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 bt: invalid input: "=>"
 #11 [ffff880095647cb8] list_del at ffffffff8133c43d
 #12 [ffff880095647cd0] devm_memremap_pages at ffffffff81180c53

After the patch:

The 'bt: invalid input: "=>"' message no longer appears in the output.
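
A minimal Python sketch of the parsing idea, assuming the fix amounts to skipping a leading "=>" token before the address is converted. The function name `parse_disasm_address` is hypothetical; the real code works on the `arglist[]` returned by `parse_line()` in C, and this block is not taken from the patch.

```python
def parse_disasm_address(line):
    """Return the address that starts a gdb disassembly line, or None.

    Hypothetical helper: a leading "=>" marker is skipped so that the
    hexadecimal conversion always sees the address token.
    """
    tokens = line.split()
    if tokens and tokens[0] == "=>":
        tokens = tokens[1:]
    if not tokens:
        return None
    try:
        return int(tokens[0], 16)
    except ValueError:
        # Not a hexadecimal address (for example a "..." filler line).
        return None
```

Both the marked line and the unmarked lines from the `gdb x/40i` listing above then yield their address, so the caller no longer has to deal with "=>" as input.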

Cc: Sourabh Jain <[email protected]>
Cc: Hari Bathini <[email protected]>
Cc: Mahesh J Salgaonkar <[email protected]>
Cc: Naveen N. Rao <[email protected]>
Cc: Lianbo Jiang <[email protected]>
Cc: HAGIO KAZUHITO(萩尾 一仁) <[email protected]>
Cc: Tao Liu <[email protected]>
Cc: Alexey Makhalov <[email protected]>
Signed-off-by: Tao Liu <[email protected]>
adi-g15-ibm pushed a commit to adi-g15-ibm/crash that referenced this issue Jul 31, 2024
Stack unwinding is meant for kernel addresses only. When a non-kernel
address is encountered, it is usually a user-space address or a
non-address value such as a function call parameter. Stopping the unwind
at the first non-kernel address therefore reduces invalid unwind results.

Before:
crash> gdb bt
 #0  0xffffffff816a8f65 in context_switch ...
 #1  __schedule () ...
 #2  0xffffffff816a94e9 in schedule ...
 #3  0xffffffff816a86fd in schedule_hrtimeout_range_clock ...
 #4  0xffffffff816a8733 in schedule_hrtimeout_range ...
 #5  0xffffffff8124bb7e in ep_poll ...
 #6  0xffffffff8124d00d in SYSC_epoll_wait ...
 #7  SyS_epoll_wait ...
 #8  <signal handler called>
 #9  0x00007f0449407923 in ?? ()
 #10 0xffff880100000001 in ?? ()
 #11 0xffff880169b3c010 in ?? ()
 #12 0x0000000000000040 in irq_stack_union ()
 #13 0xffff880169b3c058 in ?? ()
 #14 0xffff880169b3c048 in ?? ()
 #15 0xffff880169b3c050 in ?? ()
 #16 0x0000000000000000 in ?? ()

After:
crash> gdb bt
 #0  0xffffffff816a8f65 in context_switch ...
 #1  __schedule () ...
 #2  0xffffffff816a94e9 in schedule () ...
 #3  0xffffffff816a86fd in schedule_hrtimeout_range_clock ...
 #4  0xffffffff816a8733 in schedule_hrtimeout_range ...
 #5  0xffffffff8124bb7e in ep_poll ...
 #6  0xffffffff8124d00d in SYSC_epoll_wait ...
 #7  SyS_epoll_wait ...
 #8  <signal handler called>

Cc: Sourabh Jain <[email protected]>
Cc: Hari Bathini <[email protected]>
Cc: Mahesh J Salgaonkar <[email protected]>
Cc: Naveen N. Rao <[email protected]>
Cc: Lianbo Jiang <[email protected]>
Cc: HAGIO KAZUHITO(萩尾 一仁) <[email protected]>
Cc: Tao Liu <[email protected]>
Cc: Alexey Makhalov <[email protected]>
Signed-off-by: Tao Liu <[email protected]>
adi-g15-ibm pushed a commit to adi-g15-ibm/crash that referenced this issue Jul 31, 2024
adi-g15-ibm pushed a commit to adi-g15-ibm/crash that referenced this issue Aug 27, 2024
adi-g15-ibm pushed a commit to adi-g15-ibm/crash that referenced this issue Aug 27, 2024
There may be extra "=>" prefix before gdb disassembly, as a result,
parse_line() will return string "=>" as arglist[0], which will be
converted to number by htol() and fails. E.g.:

crash> gdb x/40i __list_del_entry
   ...
   0xffffffff8133c384 <__list_del_entry+36>:    cmp    %rcx,%rax
   0xffffffff8133c387 <__list_del_entry+39>:    je     0xffffffff8133c403 <__list_del_entry+163>
=> 0xffffffff8133c389 <__list_del_entry+41>:    mov    (%rax),%r8
   0xffffffff8133c38c <__list_del_entry+44>:    cmp    %r8,%rdi
   0xffffffff8133c38f <__list_del_entry+47>:    jne    0xffffffff8133c3e4 <__list_del_entry+132>
   0xffffffff8133c391 <__list_del_entry+49>:    mov    0x8(%rdx),%r8

Before the patch:

crash> bt
 ...
 crash-utility#10 [ffff880095647c00] async_page_fault at ffffffff816a8638
    [exception RIP: __list_del_entry+41]
    RIP: ffffffff8133c389  RSP: ffff880095647cb0  RFLAGS: 00010207
    RAX: 0000000000000000  RBX: ffffea0400408020  RCX: dead000000000200
    RDX: 0000000000000000  RSI: 0000000000000246  RDI: ffffea0400408020
    RBP: ffff880095647cb0   R8: 0000000080000431   R9: ffffffff81e835c0
    R10: 0000000000000000  R11: 0000000000000400  R12: ffff880138795b58
    R13: 0000000010010201  R14: ffff880095647d70  R15: 0000000400408040
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 bt: invalid input: "=>"
 #11 [ffff880095647cb8] list_del at ffffffff8133c43d
 #12 [ffff880095647cd0] devm_memremap_pages at ffffffff81180c53

After the patch:

The 'bt: invalid input: "=>"' message no longer appears in the output.
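
The failure mode described in this commit message can be illustrated with a minimal C sketch. It is not the code from the patch: disasm_line_addr() is a hypothetical helper that uses strtoull() instead of crash's parse_line() and htol(), and it only shows the idea of skipping an optional "=>" marker before reading the address.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from crash: extract the instruction
 * address from one line of gdb "x/i" output. An optional "=>" marker
 * in front of the address is skipped. On success *ok is set to 1 and
 * the address is returned; otherwise *ok is set to 0 and 0 is returned.
 */
unsigned long long disasm_line_addr(const char *line, int *ok)
{
    const char *p = line;
    char *end;
    unsigned long long addr;

    *ok = 0;

    /* Skip leading whitespace. */
    while (isspace((unsigned char)*p))
        p++;

    /* Skip the current-instruction marker, if present. */
    if (strncmp(p, "=>", 2) == 0) {
        p += 2;
        while (isspace((unsigned char)*p))
            p++;
    }

    /* The next token must start like a hexadecimal number. */
    if (!isxdigit((unsigned char)*p))
        return 0;

    /* Base 16 accepts an optional "0x" prefix. */
    addr = strtoull(p, &end, 16);
    if (end == p)
        return 0;

    *ok = 1;
    return addr;
}
```

With this approach a line with the marker and a line without it yield the address in the same way, so the caller never sees "=>" as its first token.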

Cc: Sourabh Jain <[email protected]>
Cc: Hari Bathini <[email protected]>
Cc: Mahesh J Salgaonkar <[email protected]>
Cc: Naveen N. Rao <[email protected]>
Cc: Lianbo Jiang <[email protected]>
Cc: HAGIO KAZUHITO(萩尾 一仁) <[email protected]>
Cc: Tao Liu <[email protected]>
Cc: Alexey Makhalov <[email protected]>
Signed-off-by: Tao Liu <[email protected]>
lian-bo pushed a commit that referenced this issue Sep 18, 2024
Previously, "retq" was used to detect the end of a function, and therefore
the end of the framesize calculation. However, gdb may output "ret" rather
than "retq". In that case the framesize is calculated incorrectly and a
bogus stack trace is printed.

Without the patch:

   $ crash -d 3 vmcore vmlinux
   crash> bt
   0xffffffff92da7545 <copy_process+5>: push   %rbp     [framesize: 8]
   ...
   0xffffffff92da7561 <copy_process+33>:        sub    $0x238,%rsp      [framesize: 624]
   ...
   0xffffffff92da776a <copy_process+554>:       pop    %r15     [framesize: 8]
   0xffffffff92da776c <copy_process+556>:       pop    %rbp     [framesize: 0]
   0xffffffff92da776d <copy_process+557>:       ret

   crash> bt -D dump
   framesize_cache_entries:
      ...
      [  3]: ffffffff92dadcbd 0 CF (copy_process+26493)

   crash> bt
   ...
   #9  [ffff888263157bc0] copy_process at ffffffff92dadcbd
   #10 [ffff888263157d20] __mutex_init at ffffffff92ed8dd5
   #11 [ffff888263157d38] __alloc_file at ffffffff93458397
   #12 [ffff888263157d60] alloc_empty_file at ffffffff934585d2
   #13 [ffff888263157da8] __alloc_fd at ffffffff934b5ead
   #14 [ffff888263157e38] _do_fork at ffffffff92dae7a1
   #15 [ffff888263157f28] do_syscall_64 at ffffffff92c085f4

Frames #10 to #13 are bogus and misleading.

With the patch:
   ...
   0xffffffff92da776d <copy_process+557>:       ret     [framesize restored to: 624]

   crash> bt -D dump
      ...
      [  3]: ffffffff92dadcbd 624 CF (copy_process+26493)

   crash> bt
   ...
   #9  [ffff888263157bc0] copy_process at ffffffff92dadcbd
   #10 [ffff888263157e38] _do_fork at ffffffff92dae7a1
   #11 [ffff888263157f28] do_syscall_64 at ffffffff92c085f4

Signed-off-by: Tao Liu <[email protected]>
lian-bo pushed a commit that referenced this issue Nov 4, 2024
gdb may prefix a disassembly line with "=>". When it does, parse_line()
returns the string "=>" as arglist[0], and htol() then fails to convert
it to a number. E.g.:

crash> gdb x/40i __list_del_entry
   ...
   0xffffffff8133c384 <__list_del_entry+36>:    cmp    %rcx,%rax
   0xffffffff8133c387 <__list_del_entry+39>:    je     0xffffffff8133c403 <__list_del_entry+163>
=> 0xffffffff8133c389 <__list_del_entry+41>:    mov    (%rax),%r8
   0xffffffff8133c38c <__list_del_entry+44>:    cmp    %r8,%rdi
   0xffffffff8133c38f <__list_del_entry+47>:    jne    0xffffffff8133c3e4 <__list_del_entry+132>
   0xffffffff8133c391 <__list_del_entry+49>:    mov    0x8(%rdx),%r8

Before the patch:

crash> bt
 ...
 #10 [ffff880095647c00] async_page_fault at ffffffff816a8638
    [exception RIP: __list_del_entry+41]
    RIP: ffffffff8133c389  RSP: ffff880095647cb0  RFLAGS: 00010207
    RAX: 0000000000000000  RBX: ffffea0400408020  RCX: dead000000000200
    RDX: 0000000000000000  RSI: 0000000000000246  RDI: ffffea0400408020
    RBP: ffff880095647cb0   R8: 0000000080000431   R9: ffffffff81e835c0
    R10: 0000000000000000  R11: 0000000000000400  R12: ffff880138795b58
    R13: 0000000010010201  R14: ffff880095647d70  R15: 0000000400408040
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 bt: invalid input: "=>"
 #11 [ffff880095647cb8] list_del at ffffffff8133c43d
 #12 [ffff880095647cd0] devm_memremap_pages at ffffffff81180c53

After the patch:

The 'bt: invalid input: "=>"' message no longer appears in the output.

Signed-off-by: Tao Liu <[email protected]>
liutgnu added a commit to liutgnu/crash-preview that referenced this issue Dec 5, 2024
…usly

For kernel modules, "dis -rl" fails to display source line number data
if the "bt" command has been executed first in crash.

Without the patch:
  crash> mod -S
  crash> bt
  PID: 1500     TASK: ff2bd8b093524000  CPU: 16   COMMAND: "lpfc_worker_0"
   #0 [ff2c9f725c39f9e0] machine_kexec at ffffffff8e0686d3
   ...snip...
   #8 [ff2c9f725c39fcc0] __lpfc_sli_release_iocbq_s4 at ffffffffc0f2f425 [lpfc]
   ...snip...
  crash> dis -rl ffffffffc0f60f82
  0xffffffffc0f60eb0 <lpfc_nlp_get>:      nopl   0x0(%rax,%rax,1) [FTRACE NOP]
  0xffffffffc0f60eb5 <lpfc_nlp_get+5>:    push   %rbp
  0xffffffffc0f60eb6 <lpfc_nlp_get+6>:    push   %rbx
  0xffffffffc0f60eb7 <lpfc_nlp_get+7>:    test   %rdi,%rdi

With the patch:
  crash> mod -S
  crash> bt
  PID: 1500     TASK: ff2bd8b093524000  CPU: 16   COMMAND: "lpfc_worker_0"
   #0 [ff2c9f725c39f9e0] machine_kexec at ffffffff8e0686d3
   ...snip...
   #8 [ff2c9f725c39fcc0] __lpfc_sli_release_iocbq_s4 at ffffffffc0f2f425 [lpfc]
   ...snip...
  crash> dis -rl ffffffffc0f60f82
  /usr/src/debug/kernel-4.18.0-425.13.1.el8_7/linux-4.18.0-425.13.1.el8_7.x86_64/drivers/scsi/lpfc/lpfc_hbadisc.c: 6756
  0xffffffffc0f60eb0 <lpfc_nlp_get>:      nopl   0x0(%rax,%rax,1) [FTRACE NOP]
  /usr/src/debug/kernel-4.18.0-425.13.1.el8_7/linux-4.18.0-425.13.1.el8_7.x86_64/drivers/scsi/lpfc/lpfc_hbadisc.c: 6759
  0xffffffffc0f60eb5 <lpfc_nlp_get+5>:    push   %rbp

The root cause is that, after a kernel module has been loaded by the mod
command, its symtable is not expanded on the gdb side. The crash bt or dis
command triggers that expansion. However, the symtable expansion differs
between the two commands:

The stack trace of "dis -rl" for symtable expanding:

  #0  0x00000000008d8d9f in add_compunit_symtab_to_objfile ...
  #1  0x00000000006d3293 in buildsym_compunit::end_symtab_with_blockvector ...
  #2  0x00000000006d336a in buildsym_compunit::end_symtab_from_static_block ...
  #3  0x000000000077e8e9 in process_full_comp_unit ...
  #4  process_queue ...
  #5  dw2_do_instantiate_symtab ...
  #6  0x000000000077ed67 in dw2_instantiate_symtab ...
  #7  0x000000000077f75e in dw2_expand_all_symtabs ...
  #8  0x00000000008f254d in gdb_get_line_number ...
  #9  0x00000000008f22af in gdb_command_funnel_1 ...
  #10 0x00000000008f2003 in gdb_command_funnel ...
  #11 0x00000000005b7f02 in gdb_interface ...
  #12 0x00000000005f5bd8 in get_line_number ...
  #13 0x000000000059e574 in cmd_dis ...

The stack trace of "bt" for symtable expanding:

  #0  0x00000000008d8d9f in add_compunit_symtab_to_objfile ...
  #1  0x00000000006d3293 in buildsym_compunit::end_symtab_with_blockvector ...
  #2  0x00000000006d336a in buildsym_compunit::end_symtab_from_static_block ...
  #3  0x000000000077e8e9 in process_full_comp_unit ...
  #4  process_queue ...
  #5  dw2_do_instantiate_symtab ...
  #6  0x000000000077ed67 in dw2_instantiate_symtab ...
  #7  0x000000000077f8ed in dw2_lookup_symbol ...
  #8  0x00000000008e6d03 in lookup_symbol_via_quick_fns ...
  #9  0x00000000008e7153 in lookup_symbol_in_objfile ...
  #10 0x00000000008e73c6 in lookup_symbol_global_or_static_iterator_cb ...
  #11 0x00000000008b99c4 in svr4_iterate_over_objfiles_in_search_order ...
  #12 0x00000000008e754e in lookup_global_or_static_symbol ...
  #13 0x00000000008e75da in lookup_static_symbol ...
  #14 0x00000000008e632c in lookup_symbol_aux ...
  #15 0x00000000008e5a7a in lookup_symbol_in_language ...
  #16 0x00000000008e5b30 in lookup_symbol ...
  #17 0x00000000008f2a4a in gdb_get_datatype ...
  #18 0x00000000008f22c0 in gdb_command_funnel_1 ...
  #19 0x00000000008f2003 in gdb_command_funnel ...
  #20 0x00000000005b7f02 in gdb_interface ...
  #21 0x00000000005f8a9f in datatype_info ...
  #22 0x0000000000599947 in cpu_map_size ...
  #23 0x00000000005a975d in get_cpus_online ...
  #24 0x0000000000637a8b in diskdump_get_prstatus_percpu ...
  #25 0x000000000062f0e4 in get_netdump_regs_x86_64 ...
  #26 0x000000000059fe68 in back_trace ...
  #27 0x00000000005ab1cb in cmd_bt ...

In the "dis -rl" stack trace, dw2_expand_all_symtabs() is called to expand
all symtables of the objfile, "*.ko.debug" in our case. In the "bt" stack
trace, however, only the subset of symtables needed by dw2_lookup_symbol()
to find a symbol is expanded. As a result, objfile->compunit_symtabs, the
head of a singly linked list of struct compunit_symtab, is not NULL but
does not contain all symtables. A later "dis -rl" will not reinitialize it
in gdb_get_line_number(), because the !objfile_has_full_symbols(objfile)
check fails, so the proper code line number data cannot be displayed.

Since the objfile_has_full_symbols(objfile) check cannot ensure that all
symbols have been expanded, this patch adds a new member to struct objfile
as a flag that records whether all symbols have been expanded. The flag is
set only after expand_all_symtabs has been called.
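
The difference between the two checks can be modelled with a minimal C sketch. Every name ending in _sketch, the all_symtabs_expanded member, and both needs_expansion_*() helpers are hypothetical and are not gdb's or crash's real definitions. The old check is modelled as "the list head is not NULL", which is an assumption about what objfile_has_full_symbols() tests.

```c
#include <stddef.h>

/* Hypothetical stand-in for gdb's struct compunit_symtab. */
struct compunit_symtab_sketch {
    struct compunit_symtab_sketch *next;
};

/* Hypothetical stand-in for gdb's struct objfile. */
struct objfile_sketch {
    /* Non-NULL as soon as any symtab has been expanded. */
    struct compunit_symtab_sketch *compunit_symtabs;
    /* Assumed name for the new flag: set only after a full expansion. */
    int all_symtabs_expanded;
};

/* Model of a full expansion; only the flag handling is of interest. */
void expand_all_symtabs_sketch(struct objfile_sketch *obj)
{
    /* ... expand every symtab of the objfile here ... */
    obj->all_symtabs_expanded = 1;
}

/* Old behaviour (as modelled): a non-empty list counts as complete. */
int needs_expansion_old(const struct objfile_sketch *obj)
{
    return obj->compunit_symtabs == NULL;
}

/* New behaviour: only the explicit flag counts as complete. */
int needs_expansion_new(const struct objfile_sketch *obj)
{
    return !obj->all_symtabs_expanded;
}
```

After a partial expansion such as the one triggered by "bt", the old check reports that nothing more needs to be done, while the flag-based check still requests a full expansion.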

Signed-off-by: Tao Liu <[email protected]>
liutgnu pushed a commit to liutgnu/crash-preview that referenced this issue Dec 5, 2024
This patch adds support for per-cpu IRQ stacks on RISCV64, so that
"bt" can backtrace through them and 'bt -E' can search them for
exception frames. The 'help -m' command now displays the address of
each per-cpu IRQ stack.
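
To make the per-cpu lookup concrete, here is a minimal C sketch of mapping a stack address to the CPU whose IRQ stack contains it. irq_stack_cpu() is a hypothetical helper, not the patch's code, and treating each stack as the range [base, base + size) is an assumption.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not taken from crash: return the index of the
 * CPU whose IRQ stack contains addr, or -1 if none does. Each stack
 * is assumed to cover [irq_stacks[i], irq_stacks[i] + irq_stack_size).
 * A base address of 0 is treated as "no IRQ stack for this CPU".
 */
int irq_stack_cpu(unsigned long long addr,
                  const unsigned long long *irq_stacks,
                  size_t ncpus,
                  unsigned long long irq_stack_size)
{
    size_t i;

    for (i = 0; i < ncpus; i++) {
        unsigned long long base = irq_stacks[i];

        /* Subtracting only after the >= test avoids wraparound. */
        if (base != 0 && addr >= base && addr - base < irq_stack_size)
            return (int)i;
    }
    return -1;
}
```

With the values from the 'help -m' output in this message (irq_stack_size 16384, stacks at ff20000000000000 and ff20000000008000), frame address ff20000000003e60 falls in the CPU 0 IRQ stack, while ffffffff81403da0 is on no IRQ stack.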

TEST: a vmcore dumped by hacking handle_irq_event_percpu()
(Why not use lkdtm INT_HW_IRQ_EN EXCEPTION?
  Using it hits a deadlock[1] in the crash_kexec path.)

  crash> bt
  PID: 0        TASK: ffffffff8140db00  CPU: 0    COMMAND: "swapper/0"
   #0 [ff20000000003e60] __handle_irq_event_percpu at ffffffff8006462e
   #1 [ff20000000003ed0] handle_irq_event_percpu at ffffffff80064702
   #2 [ff20000000003ef0] handle_irq_event at ffffffff8006477c
   #3 [ff20000000003f20] handle_fasteoi_irq at ffffffff80068664
   #4 [ff20000000003f50] generic_handle_domain_irq at ffffffff80063988
   #5 [ff20000000003f60] plic_handle_irq at ffffffff8046633e
   #6 [ff20000000003fb0] generic_handle_domain_irq at ffffffff80063988
   #7 [ff20000000003fc0] riscv_intc_irq at ffffffff80465f8e
   #8 [ff20000000003fd0] handle_riscv_irq at ffffffff808361e8
       PC: ffffffff80837314  [default_idle_call+50]
       RA: ffffffff80837310  [default_idle_call+46]
       SP: ffffffff81403da0  CAUSE: 8000000000000009
  epc : ffffffff80837314 ra : ffffffff80837310 sp : ffffffff81403da0
   gp : ffffffff814ef848 tp : ffffffff8140db00 t0 : ff2000000004bb18
   t1 : 0000000000032c73 t2 : ffffffff81200a48 s0 : ffffffff81403db0
   s1 : 0000000000000000 a0 : 0000000000000004 a1 : 0000000000000000
   a2 : ff6000009f1e7000 a3 : 0000000000002304 a4 : ffffffff80c1c2d8
   a5 : 0000000000000000 a6 : ff6000001fe01958 a7 : 00002496ea89dbf1
   s2 : ffffffff814f0220 s3 : 0000000000000001 s4 : 000000000000003f
   s5 : ffffffff814f03d8 s6 : 0000000000000000 s7 : ffffffff814f00d0
   s8 : ffffffff81526f10 s9 : ffffffff80c1d880 s10: 0000000000000000
   s11: 0000000000000001 t3 : 0000000000003392 t4 : 0000000000000000
   t5 : 0000000000000000 t6 : 0000000000000040
   status: 0000000200000120 badaddr: 0000000000000000
    cause: 8000000000000009 orig_a0: ffffffff80837310
  --- <IRQ stack> ---
   #9 [ffffffff81403da0] default_idle_call at ffffffff80837314
   #10 [ffffffff81403db0] do_idle at ffffffff8004d0a0
   #11 [ffffffff81403e40] cpu_startup_entry at ffffffff8004d21e
   #12 [ffffffff81403e60] kernel_init at ffffffff8083746a
   #13 [ffffffff81403e70] arch_post_acpi_subsys_init at ffffffff80a006d8
   #14 [ffffffff81403e80] console_on_rootfs at ffffffff80a00c92
  crash>

  crash> bt -E
  CPU 0 IRQ STACK:
  KERNEL-MODE EXCEPTION FRAME AT: ff20000000003a48
       PC: ffffffff8006462e  [__handle_irq_event_percpu+30]
       RA: ffffffff80064702  [handle_irq_event_percpu+18]
       SP: ff20000000003e60  CAUSE: 000000000000000d
  epc : ffffffff8006462e ra : ffffffff80064702 sp : ff20000000003e60
   gp : ffffffff814ef848 tp : ffffffff8140db00 t0 : 0000000000046600
   t1 : ffffffff80836464 t2 : ffffffff81200a48 s0 : ff20000000003ed0
   s1 : 0000000000000000 a0 : 0000000000000000 a1 : 0000000000000118
   a2 : 0000000000000052 a3 : 0000000000000000 a4 : 0000000000000000
   a5 : 0000000000010001 a6 : ff6000001fe01958 a7 : 00002496ea89dbf1
   s2 : ff60000000941ab0 s3 : ffffffff814a0658 s4 : ff60000000089230
   s5 : ffffffff814a0518 s6 : ffffffff814a0620 s7 : ffffffff80e5f0f8
   s8 : ffffffff80fc50b0 s9 : ffffffff80c1d880 s10: 0000000000000000
   s11: 0000000000000001 t3 : 0000000000003392 t4 : 0000000000000000
   t5 : 0000000000000000 t6 : 0000000000000040
   status: 0000000200000100 badaddr: 0000000000000078
    cause: 000000000000000d orig_a0: ff20000000003ea0

  CPU 1 IRQ STACK: (none found)

  crash>

  crash> help -m
  <snip>
             machspec: ced1e0
          irq_stack_size: 16384
           irq_stacks[0]: ff20000000000000
           irq_stacks[1]: ff20000000008000
  crash>

[1]: https://lore.kernel.org/linux-riscv/[email protected]/

Signed-off-by: Song Shuai <[email protected]>
liutgnu added a commit to liutgnu/crash-preview that referenced this issue Dec 5, 2024
Previously, "retq" was used to detect the end of a function, and therefore
the end of the framesize calculation. However, gdb may output "ret" rather
than "retq". In that case the framesize is calculated incorrectly and a
bogus stack trace is printed.

Without the patch:

   $ crash -d 3 vmcore vmlinux
   crash> bt
   0xffffffff92da7545 <copy_process+5>: push   %rbp     [framesize: 8]
   ...
   0xffffffff92da7561 <copy_process+33>:        sub    $0x238,%rsp      [framesize: 624]
   ...
   0xffffffff92da776a <copy_process+554>:       pop    %r15     [framesize: 8]
   0xffffffff92da776c <copy_process+556>:       pop    %rbp     [framesize: 0]
   0xffffffff92da776d <copy_process+557>:       ret

   crash> bt -D dump
   framesize_cache_entries:
      ...
      [  3]: ffffffff92dadcbd 0 CF (copy_process+26493)

   crash> bt
   ...
   #9  [ffff888263157bc0] copy_process at ffffffff92dadcbd
   #10 [ffff888263157d20] __mutex_init at ffffffff92ed8dd5
   #11 [ffff888263157d38] __alloc_file at ffffffff93458397
   #12 [ffff888263157d60] alloc_empty_file at ffffffff934585d2
   #13 [ffff888263157da8] __alloc_fd at ffffffff934b5ead
   #14 [ffff888263157e38] _do_fork at ffffffff92dae7a1
   #15 [ffff888263157f28] do_syscall_64 at ffffffff92c085f4

Frames #10 to #13 are bogus and misleading.

With the patch:
   ...
   0xffffffff92da776d <copy_process+557>:       ret     [framesize restored to: 624]

   crash> bt -D dump
      ...
      [  3]: ffffffff92dadcbd 624 CF (copy_process+26493)

   crash> bt
   ...
   #9  [ffff888263157bc0] copy_process at ffffffff92dadcbd
   #10 [ffff888263157e38] _do_fork at ffffffff92dae7a1
   #11 [ffff888263157f28] do_syscall_64 at ffffffff92c085f4

Signed-off-by: Tao Liu <[email protected]>
liutgnu added a commit to liutgnu/crash-preview that referenced this issue Dec 5, 2024
gdb may prefix a disassembly line with "=>". When it does, parse_line()
returns the string "=>" as arglist[0], and htol() then fails to convert
it to a number. E.g.:

crash> gdb x/40i __list_del_entry
   ...
   0xffffffff8133c384 <__list_del_entry+36>:    cmp    %rcx,%rax
   0xffffffff8133c387 <__list_del_entry+39>:    je     0xffffffff8133c403 <__list_del_entry+163>
=> 0xffffffff8133c389 <__list_del_entry+41>:    mov    (%rax),%r8
   0xffffffff8133c38c <__list_del_entry+44>:    cmp    %r8,%rdi
   0xffffffff8133c38f <__list_del_entry+47>:    jne    0xffffffff8133c3e4 <__list_del_entry+132>
   0xffffffff8133c391 <__list_del_entry+49>:    mov    0x8(%rdx),%r8

Before the patch:

crash> bt
 ...
 #10 [ffff880095647c00] async_page_fault at ffffffff816a8638
    [exception RIP: __list_del_entry+41]
    RIP: ffffffff8133c389  RSP: ffff880095647cb0  RFLAGS: 00010207
    RAX: 0000000000000000  RBX: ffffea0400408020  RCX: dead000000000200
    RDX: 0000000000000000  RSI: 0000000000000246  RDI: ffffea0400408020
    RBP: ffff880095647cb0   R8: 0000000080000431   R9: ffffffff81e835c0
    R10: 0000000000000000  R11: 0000000000000400  R12: ffff880138795b58
    R13: 0000000010010201  R14: ffff880095647d70  R15: 0000000400408040
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 bt: invalid input: "=>"
 #11 [ffff880095647cb8] list_del at ffffffff8133c43d
 #12 [ffff880095647cd0] devm_memremap_pages at ffffffff81180c53

After the patch:

The 'bt: invalid input: "=>"' message no longer appears in the output.

Signed-off-by: Tao Liu <[email protected]>