-
Notifications
You must be signed in to change notification settings - Fork 280
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow to not include build date, hostname, user #14
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
if indicated by the environment to allow for reproducible builds This is slightly stretching the definition of the variable defined at https://reproducible-builds.org/specs/source-date-epoch/ Signed-off-by: Bernhard M. Wiedemann <[email protected]>
Hi Bernhard,
I don't normally accept github pull requests, and prefer that patches
be posted to the crash-utility mailing list. But this one's simple
enough, and so I went ahead and applied it.
Thanks,
Dave
…----- Original Message -----
if indicated by the environment
to allow for reproducible builds
This is slightly stretching the definition of the variable defined at
https://reproducible-builds.org/specs/source-date-epoch/
Signed-off-by: Bernhard M. Wiedemann ***@***.***>
You can view, comment on, or merge this pull request online at:
#14
-- Commit Summary --
* Allow to not include build date, hostname, user
-- File Changes --
M configure.c (4)
-- Patch Links --
https://github.com/crash-utility/crash/pull/14.patch
https://github.com/crash-utility/crash/pull/14.diff
--
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
#14
|
This was referenced Nov 28, 2019
yangh
added a commit
to yangh/crash
that referenced
this pull request
Nov 15, 2021
Overflow stack supported since kernel 4.14 in commit 872d8327ce8, without this patch, bt command trigger a SIGSEGV fault due the SP pointed to the overflow stack which not yet loaded by crash. Before: KERNEL: ../vmlinux DUMPFILE: la_guestdump.gcore CPUS: 8 DATE: Tue Jul 13 19:59:44 CST 2021 UPTIME: 00:00:42 LOAD AVERAGE: 3.99, 1.13, 0.39 TASKS: 1925 NODENAME: localhost RELEASE: 4.14.156+ VERSION: crash-utility#1 SMP PREEMPT Tue Jul 13 10:37:23 UTC 2021 MACHINE: aarch64 (unknown Mhz) MEMORY: 8.7 GB PANIC: "Kernel panic - not syncing: kernel stack overflow" PID: 1969 COMMAND: "irq/139-0-0024" TASK: ffffffcc1a230000 [THREAD_INFO: ffffffcc1a230000] CPU: 0 STATE: TASK_RUNNING (PANIC) crash-7.3.0> bt PID: 1969 TASK: ffffffcc1a230000 CPU: 0 COMMAND: "irq/139-0-0024" Segmentation fault (core dumped) After: crash> bt PID: 1969 TASK: ffffffcc1a230000 CPU: 0 COMMAND: "irq/139-0-0024" #0 [ffffffcc7fd5cf50] __delay at ffffff8008c80774 crash-utility#1 [ffffffcc7fd5cf60] __const_udelay at ffffff8008c80864 crash-utility#2 [ffffffcc7fd5cf80] msm_trigger_wdog_bite at ffffff80084e9430 crash-utility#3 [ffffffcc7fd5cfa0] do_vm_restart at ffffff80087bc974 crash-utility#4 [ffffffcc7fd5cfc0] machine_restart at ffffff80080856fc crash-utility#5 [ffffffcc7fd5cfd0] emergency_restart at ffffff80080d49bc crash-utility#6 [ffffffcc7fd5d140] panic at ffffff80080af4c0 crash-utility#7 [ffffffcc7fd5d150] nmi_panic at ffffff80080af150 crash-utility#8 [ffffffcc7fd5d190] handle_bad_stack at ffffff800808b0b8 crash-utility#9 [ffffffcc7fd5d2d0] __bad_stack at ffffff800808285c --- <IRQ stack> --- crash-utility#10 [ffffff801187bc60] el1_error_invalid at ffffff8008082e7c crash-utility#11 [ffffff801187bcc0] cyttsp6_mt_attention at ffffff8000e8498c [cyttsp6] crash-utility#12 [ffffff801187bd20] call_atten_cb at ffffff8000e82030 [cyttsp6] crash-utility#13 [ffffff801187bdc0] cyttsp6_irq at ffffff8000e81e34 [cyttsp6] crash-utility#14 [ffffff801187bdf0] irq_thread_fn at ffffff8008128dd8 crash-utility#15 [ffffff801187be50] irq_thread at ffffff8008128ca4 crash-utility#16 [ffffff801187beb0] kthread at ffffff80080d2fc4 crash> Signed-off-by: Hong YANG <[email protected]>
k-hagio
pushed a commit
that referenced
this pull request
Feb 16, 2023
Kernel commit 7d65f4a65532 ("irq: Consolidate do_softirq() arch overriden implementations") renamed the call_softirq to do_softirq_own_stack, and there is no exception frame also when coming from do_softirq_own_stack. Without the patch, crash may unnecessarily output an exception frame with a warning as below: crash> foreach bt ... PID: 0 TASK: ffff914f820a8000 CPU: 25 COMMAND: "swapper/25" #0 [fffffe0000504e48] crash_nmi_callback at ffffffffa665d763 #1 [fffffe0000504e50] nmi_handle at ffffffffa662a423 #2 [fffffe0000504ea8] default_do_nmi at ffffffffa6fe7dc9 #3 [fffffe0000504ec8] do_nmi at ffffffffa662a97f #4 [fffffe0000504ef0] end_repeat_nmi at ffffffffa70015e8 [exception RIP: clone_endio+172] RIP: ffffffffc005c1ec RSP: ffffa1d403d08e98 RFLAGS: 00000246 RAX: 0000000000000000 RBX: ffff915326fba230 RCX: 0000000000000018 RDX: ffffffffc0075400 RSI: 0000000000000000 RDI: ffff915326fba230 RBP: ffff915326fba1c0 R8: 0000000000001000 R9: ffff915308d6d2a0 R10: 000000a97dfe5e10 R11: ffffa1d40038fe98 R12: ffff915302babc40 R13: ffff914f94360000 R14: 0000000000000000 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #5 [ffffa1d403d08e98] clone_endio at ffffffffc005c1ec [dm_mod] #6 [ffffa1d403d08ed0] blk_update_request at ffffffffa6a96954 #7 [ffffa1d403d08f10] scsi_end_request at ffffffffa6c9b968 #8 [ffffa1d403d08f48] scsi_io_completion at ffffffffa6c9bb3e #9 [ffffa1d403d08f90] blk_complete_reqs at ffffffffa6aa0e95 #10 [ffffa1d403d08fa0] __softirqentry_text_start at ffffffffa72000dc #11 [ffffa1d403d08ff0] do_softirq_own_stack at ffffffffa7000f9a --- <IRQ stack> --- #12 [ffffa1d40038fe70] do_softirq_own_stack at ffffffffa7000f9a [exception RIP: unknown or invalid address] RIP: 0000000000000000 RSP: 0000000000000000 RFLAGS: 00000000 RAX: ffffffffa672eae5 RBX: ffffffffa83b34e0 RCX: ffffffffa672eb12 RDX: 0000000000000010 RSI: 8b7d6c8869010c00 RDI: 0000000000000085 RBP: 0000000000000286 R8: ffff914f820a8000 R9: ffffffffa67a94e0 R10: 0000000000000286 R11: ffffffffa66fb4c5 R12: ffffffffa67a898b R13: 0000000000000000 R14: fffffffffffffff8 R15: ffffffffa67a1e68 ORIG_RAX: 0000000000000000 CS: 0000 SS: ffffffffa672edff bt: WARNING: possibly bogus exception frame #13 [ffffa1d40038ff30] start_secondary at ffffffffa665fa2c #14 [ffffa1d40038ff50] secondary_startup_64_no_verify at ffffffffa6600116 ... Reported-by: Marco Patalano <[email protected]> Signed-off-by: Lianbo Jiang <[email protected]>
k-hagio
pushed a commit
that referenced
this pull request
Nov 29, 2023
…usly There is an issue that, for kernel modules, "dis -rl" fails to display modules code line number data after execute "bt" command in crash. Without the patch: crsah> mod -S crash> bt PID: 1500 TASK: ff2bd8b093524000 CPU: 16 COMMAND: "lpfc_worker_0" #0 [ff2c9f725c39f9e0] machine_kexec at ffffffff8e0686d3 ...snip... #8 [ff2c9f725c39fcc0] __lpfc_sli_release_iocbq_s4 at ffffffffc0f2f425 [lpfc] ...snip... crash> dis -rl ffffffffc0f60f82 0xffffffffc0f60eb0 <lpfc_nlp_get>: nopl 0x0(%rax,%rax,1) [FTRACE NOP] 0xffffffffc0f60eb5 <lpfc_nlp_get+5>: push %rbp 0xffffffffc0f60eb6 <lpfc_nlp_get+6>: push %rbx 0xffffffffc0f60eb7 <lpfc_nlp_get+7>: test %rdi,%rdi With the patch: crash> mod -S crash> bt PID: 1500 TASK: ff2bd8b093524000 CPU: 16 COMMAND: "lpfc_worker_0" #0 [ff2c9f725c39f9e0] machine_kexec at ffffffff8e0686d3 ...snip... #8 [ff2c9f725c39fcc0] __lpfc_sli_release_iocbq_s4 at ffffffffc0f2f425 [lpfc] ...snip... crash> dis -rl ffffffffc0f60f82 /usr/src/debug/kernel-4.18.0-425.13.1.el8_7/linux-4.18.0-425.13.1.el8_7.x86_64/drivers/scsi/lpfc/lpfc_hbadisc.c: 6756 0xffffffffc0f60eb0 <lpfc_nlp_get>: nopl 0x0(%rax,%rax,1) [FTRACE NOP] /usr/src/debug/kernel-4.18.0-425.13.1.el8_7/linux-4.18.0-425.13.1.el8_7.x86_64/drivers/scsi/lpfc/lpfc_hbadisc.c: 6759 0xffffffffc0f60eb5 <lpfc_nlp_get+5>: push %rbp The root cause is, after kernel module been loaded by mod command, the symtable is not expanded in gdb side. crash bt or dis command will trigger such an expansion. However the symtable expansion is different for the 2 commands: The stack trace of "dis -rl" for symtable expanding: #0 0x00000000008d8d9f in add_compunit_symtab_to_objfile ... #1 0x00000000006d3293 in buildsym_compunit::end_symtab_with_blockvector ... #2 0x00000000006d336a in buildsym_compunit::end_symtab_from_static_block ... #3 0x000000000077e8e9 in process_full_comp_unit ... #4 process_queue ... #5 dw2_do_instantiate_symtab ... #6 0x000000000077ed67 in dw2_instantiate_symtab ... #7 0x000000000077f75e in dw2_expand_all_symtabs ... #8 0x00000000008f254d in gdb_get_line_number ... #9 0x00000000008f22af in gdb_command_funnel_1 ... #10 0x00000000008f2003 in gdb_command_funnel ... #11 0x00000000005b7f02 in gdb_interface ... #12 0x00000000005f5bd8 in get_line_number ... #13 0x000000000059e574 in cmd_dis ... The stack trace of "bt" for symtable expanding: #0 0x00000000008d8d9f in add_compunit_symtab_to_objfile ... #1 0x00000000006d3293 in buildsym_compunit::end_symtab_with_blockvector ... #2 0x00000000006d336a in buildsym_compunit::end_symtab_from_static_block ... #3 0x000000000077e8e9 in process_full_comp_unit ... #4 process_queue ... #5 dw2_do_instantiate_symtab ... #6 0x000000000077ed67 in dw2_instantiate_symtab ... #7 0x000000000077f8ed in dw2_lookup_symbol ... #8 0x00000000008e6d03 in lookup_symbol_via_quick_fns ... #9 0x00000000008e7153 in lookup_symbol_in_objfile ... #10 0x00000000008e73c6 in lookup_symbol_global_or_static_iterator_cb ... #11 0x00000000008b99c4 in svr4_iterate_over_objfiles_in_search_order ... #12 0x00000000008e754e in lookup_global_or_static_symbol ... #13 0x00000000008e75da in lookup_static_symbol ... #14 0x00000000008e632c in lookup_symbol_aux ... #15 0x00000000008e5a7a in lookup_symbol_in_language ... #16 0x00000000008e5b30 in lookup_symbol ... #17 0x00000000008f2a4a in gdb_get_datatype ... #18 0x00000000008f22c0 in gdb_command_funnel_1 ... #19 0x00000000008f2003 in gdb_command_funnel ... #20 0x00000000005b7f02 in gdb_interface ... #21 0x00000000005f8a9f in datatype_info ... #22 0x0000000000599947 in cpu_map_size ... #23 0x00000000005a975d in get_cpus_online ... #24 0x0000000000637a8b in diskdump_get_prstatus_percpu ... #25 0x000000000062f0e4 in get_netdump_regs_x86_64 ... #26 0x000000000059fe68 in back_trace ... #27 0x00000000005ab1cb in cmd_bt ... For the stacktrace of "dis -rl", it calls dw2_expand_all_symtabs() to expand all symtable of the objfile, or "*.ko.debug" in our case. However for the stacktrace of "bt", it doesn't expand all, but only a subset of symtable which is enough to find a symbol by dw2_lookup_symbol(). As a result, the objfile->compunit_symtabs, which is the head of a single linked list of struct compunit_symtab, is not NULL but didn't contain all symtables. It will not be reinitialized in gdb_get_line_number() by "dis -rl" because !objfile_has_full_symbols(objfile) check will fail, so it cannot display the proper code line number data. Since objfile_has_full_symbols(objfile) check cannot ensure all symbols been expanded, this patch add a new member as a flag for struct objfile to record if all symbols have been expanded. The flag will be set only ofter expand_all_symtabs been called. Signed-off-by: Tao Liu <[email protected]>
liutgnu
added a commit
to liutgnu/crash-preview
that referenced
this pull request
Nov 30, 2023
…usly There is an issue that, for kernel modules, "dis -rl" fails to display modules code line number data after execute "bt" command in crash. Without the patch: crsah> mod -S crash> bt PID: 1500 TASK: ff2bd8b093524000 CPU: 16 COMMAND: "lpfc_worker_0" #0 [ff2c9f725c39f9e0] machine_kexec at ffffffff8e0686d3 ...snip... crash-utility#8 [ff2c9f725c39fcc0] __lpfc_sli_release_iocbq_s4 at ffffffffc0f2f425 [lpfc] ...snip... crash> dis -rl ffffffffc0f60f82 0xffffffffc0f60eb0 <lpfc_nlp_get>: nopl 0x0(%rax,%rax,1) [FTRACE NOP] 0xffffffffc0f60eb5 <lpfc_nlp_get+5>: push %rbp 0xffffffffc0f60eb6 <lpfc_nlp_get+6>: push %rbx 0xffffffffc0f60eb7 <lpfc_nlp_get+7>: test %rdi,%rdi With the patch: crash> mod -S crash> bt PID: 1500 TASK: ff2bd8b093524000 CPU: 16 COMMAND: "lpfc_worker_0" #0 [ff2c9f725c39f9e0] machine_kexec at ffffffff8e0686d3 ...snip... crash-utility#8 [ff2c9f725c39fcc0] __lpfc_sli_release_iocbq_s4 at ffffffffc0f2f425 [lpfc] ...snip... crash> dis -rl ffffffffc0f60f82 /usr/src/debug/kernel-4.18.0-425.13.1.el8_7/linux-4.18.0-425.13.1.el8_7.x86_64/drivers/scsi/lpfc/lpfc_hbadisc.c: 6756 0xffffffffc0f60eb0 <lpfc_nlp_get>: nopl 0x0(%rax,%rax,1) [FTRACE NOP] /usr/src/debug/kernel-4.18.0-425.13.1.el8_7/linux-4.18.0-425.13.1.el8_7.x86_64/drivers/scsi/lpfc/lpfc_hbadisc.c: 6759 0xffffffffc0f60eb5 <lpfc_nlp_get+5>: push %rbp The root cause is, after kernel module been loaded by mod command, the symtable is not expanded in gdb side. crash bt or dis command will trigger such an expansion. However the symtable expansion is different for the 2 commands: The stack trace of "dis -rl" for symtable expanding: #0 0x00000000008d8d9f in add_compunit_symtab_to_objfile ... crash-utility#1 0x00000000006d3293 in buildsym_compunit::end_symtab_with_blockvector ... crash-utility#2 0x00000000006d336a in buildsym_compunit::end_symtab_from_static_block ... crash-utility#3 0x000000000077e8e9 in process_full_comp_unit ... crash-utility#4 process_queue ... crash-utility#5 dw2_do_instantiate_symtab ... crash-utility#6 0x000000000077ed67 in dw2_instantiate_symtab ... crash-utility#7 0x000000000077f75e in dw2_expand_all_symtabs ... crash-utility#8 0x00000000008f254d in gdb_get_line_number ... crash-utility#9 0x00000000008f22af in gdb_command_funnel_1 ... crash-utility#10 0x00000000008f2003 in gdb_command_funnel ... crash-utility#11 0x00000000005b7f02 in gdb_interface ... crash-utility#12 0x00000000005f5bd8 in get_line_number ... crash-utility#13 0x000000000059e574 in cmd_dis ... The stack trace of "bt" for symtable expanding: #0 0x00000000008d8d9f in add_compunit_symtab_to_objfile ... crash-utility#1 0x00000000006d3293 in buildsym_compunit::end_symtab_with_blockvector ... crash-utility#2 0x00000000006d336a in buildsym_compunit::end_symtab_from_static_block ... crash-utility#3 0x000000000077e8e9 in process_full_comp_unit ... crash-utility#4 process_queue ... crash-utility#5 dw2_do_instantiate_symtab ... crash-utility#6 0x000000000077ed67 in dw2_instantiate_symtab ... crash-utility#7 0x000000000077f8ed in dw2_lookup_symbol ... crash-utility#8 0x00000000008e6d03 in lookup_symbol_via_quick_fns ... crash-utility#9 0x00000000008e7153 in lookup_symbol_in_objfile ... crash-utility#10 0x00000000008e73c6 in lookup_symbol_global_or_static_iterator_cb ... crash-utility#11 0x00000000008b99c4 in svr4_iterate_over_objfiles_in_search_order ... crash-utility#12 0x00000000008e754e in lookup_global_or_static_symbol ... crash-utility#13 0x00000000008e75da in lookup_static_symbol ... crash-utility#14 0x00000000008e632c in lookup_symbol_aux ... crash-utility#15 0x00000000008e5a7a in lookup_symbol_in_language ... crash-utility#16 0x00000000008e5b30 in lookup_symbol ... crash-utility#17 0x00000000008f2a4a in gdb_get_datatype ... crash-utility#18 0x00000000008f22c0 in gdb_command_funnel_1 ... crash-utility#19 0x00000000008f2003 in gdb_command_funnel ... crash-utility#20 0x00000000005b7f02 in gdb_interface ... crash-utility#21 0x00000000005f8a9f in datatype_info ... crash-utility#22 0x0000000000599947 in cpu_map_size ... crash-utility#23 0x00000000005a975d in get_cpus_online ... crash-utility#24 0x0000000000637a8b in diskdump_get_prstatus_percpu ... crash-utility#25 0x000000000062f0e4 in get_netdump_regs_x86_64 ... crash-utility#26 0x000000000059fe68 in back_trace ... crash-utility#27 0x00000000005ab1cb in cmd_bt ... For the stacktrace of "dis -rl", it calls dw2_expand_all_symtabs() to expand all symtable of the objfile, or "*.ko.debug" in our case. However for the stacktrace of "bt", it doesn't expand all, but only a subset of symtable which is enough to find a symbol by dw2_lookup_symbol(). As a result, the objfile->compunit_symtabs, which is the head of a single linked list of struct compunit_symtab, is not NULL but didn't contain all symtables. It will not be reinitialized in gdb_get_line_number() by "dis -rl" because !objfile_has_full_symbols(objfile) check will fail, so it cannot display the proper code line number data. Since objfile_has_full_symbols(objfile) check cannot ensure all symbols been expanded, this patch add a new member as a flag for struct objfile to record if all symbols have been expanded. The flag will be set only ofter expand_all_symtabs been called. Signed-off-by: Tao Liu <[email protected]>
liutgnu
added a commit
to liutgnu/crash-preview
that referenced
this pull request
Dec 1, 2023
…eviously There is an issue that, for kernel modules, "dis -rl" fails to display modules code line number data after execute "bt" command in crash. Without the patch: crsah> mod -S crash> bt PID: 1500 TASK: ff2bd8b093524000 CPU: 16 COMMAND: "lpfc_worker_0" #0 [ff2c9f725c39f9e0] machine_kexec at ffffffff8e0686d3 ...snip... crash-utility#8 [ff2c9f725c39fcc0] __lpfc_sli_release_iocbq_s4 at ffffffffc0f2f425 [lpfc] ...snip... crash> dis -rl ffffffffc0f60f82 0xffffffffc0f60eb0 <lpfc_nlp_get>: nopl 0x0(%rax,%rax,1) [FTRACE NOP] 0xffffffffc0f60eb5 <lpfc_nlp_get+5>: push %rbp 0xffffffffc0f60eb6 <lpfc_nlp_get+6>: push %rbx 0xffffffffc0f60eb7 <lpfc_nlp_get+7>: test %rdi,%rdi With the patch: crash> mod -S crash> bt PID: 1500 TASK: ff2bd8b093524000 CPU: 16 COMMAND: "lpfc_worker_0" #0 [ff2c9f725c39f9e0] machine_kexec at ffffffff8e0686d3 ...snip... crash-utility#8 [ff2c9f725c39fcc0] __lpfc_sli_release_iocbq_s4 at ffffffffc0f2f425 [lpfc] ...snip... crash> dis -rl ffffffffc0f60f82 /usr/src/debug/kernel-4.18.0-425.13.1.el8_7/linux-4.18.0-425.13.1.el8_7.x86_64/drivers/scsi/lpfc/lpfc_hbadisc.c: 6756 0xffffffffc0f60eb0 <lpfc_nlp_get>: nopl 0x0(%rax,%rax,1) [FTRACE NOP] /usr/src/debug/kernel-4.18.0-425.13.1.el8_7/linux-4.18.0-425.13.1.el8_7.x86_64/drivers/scsi/lpfc/lpfc_hbadisc.c: 6759 0xffffffffc0f60eb5 <lpfc_nlp_get+5>: push %rbp The root cause is, after kernel module been loaded by mod command, the symtable is not expanded in gdb side. crash bt or dis command will trigger such an expansion. However the symtable expansion is different for the 2 commands: The stack trace of "dis -rl" for symtable expanding: #0 0x00000000008d8d9f in add_compunit_symtab_to_objfile ... crash-utility#1 0x00000000006d3293 in buildsym_compunit::end_symtab_with_blockvector ... crash-utility#2 0x00000000006d336a in buildsym_compunit::end_symtab_from_static_block ... crash-utility#3 0x000000000077e8e9 in process_full_comp_unit ... crash-utility#4 process_queue ... crash-utility#5 dw2_do_instantiate_symtab ... crash-utility#6 0x000000000077ed67 in dw2_instantiate_symtab ... crash-utility#7 0x000000000077f75e in dw2_expand_all_symtabs ... crash-utility#8 0x00000000008f254d in gdb_get_line_number ... crash-utility#9 0x00000000008f22af in gdb_command_funnel_1 ... crash-utility#10 0x00000000008f2003 in gdb_command_funnel ... crash-utility#11 0x00000000005b7f02 in gdb_interface ... crash-utility#12 0x00000000005f5bd8 in get_line_number ... crash-utility#13 0x000000000059e574 in cmd_dis ... The stack trace of "bt" for symtable expanding: #0 0x00000000008d8d9f in add_compunit_symtab_to_objfile ... crash-utility#1 0x00000000006d3293 in buildsym_compunit::end_symtab_with_blockvector ... crash-utility#2 0x00000000006d336a in buildsym_compunit::end_symtab_from_static_block ... crash-utility#3 0x000000000077e8e9 in process_full_comp_unit ... crash-utility#4 process_queue ... crash-utility#5 dw2_do_instantiate_symtab ... crash-utility#6 0x000000000077ed67 in dw2_instantiate_symtab ... crash-utility#7 0x000000000077f8ed in dw2_lookup_symbol ... crash-utility#8 0x00000000008e6d03 in lookup_symbol_via_quick_fns ... crash-utility#9 0x00000000008e7153 in lookup_symbol_in_objfile ... crash-utility#10 0x00000000008e73c6 in lookup_symbol_global_or_static_iterator_cb ... crash-utility#11 0x00000000008b99c4 in svr4_iterate_over_objfiles_in_search_order ... crash-utility#12 0x00000000008e754e in lookup_global_or_static_symbol ... crash-utility#13 0x00000000008e75da in lookup_static_symbol ... crash-utility#14 0x00000000008e632c in lookup_symbol_aux ... crash-utility#15 0x00000000008e5a7a in lookup_symbol_in_language ... crash-utility#16 0x00000000008e5b30 in lookup_symbol ... crash-utility#17 0x00000000008f2a4a in gdb_get_datatype ... crash-utility#18 0x00000000008f22c0 in gdb_command_funnel_1 ... crash-utility#19 0x00000000008f2003 in gdb_command_funnel ... crash-utility#20 0x00000000005b7f02 in gdb_interface ... crash-utility#21 0x00000000005f8a9f in datatype_info ... crash-utility#22 0x0000000000599947 in cpu_map_size ... crash-utility#23 0x00000000005a975d in get_cpus_online ... crash-utility#24 0x0000000000637a8b in diskdump_get_prstatus_percpu ... crash-utility#25 0x000000000062f0e4 in get_netdump_regs_x86_64 ... crash-utility#26 0x000000000059fe68 in back_trace ... crash-utility#27 0x00000000005ab1cb in cmd_bt ... For the stacktrace of "dis -rl", it calls dw2_expand_all_symtabs() to expand all symtable of the objfile, or "*.ko.debug" in our case. However for the stacktrace of "bt", it doesn't expand all, but only a subset of symtable which is enough to find a symbol by dw2_lookup_symbol(). As a result, the objfile->compunit_symtabs, which is the head of a single linked list of struct compunit_symtab, is not NULL but didn't contain all symtables. It will not be reinitialized in gdb_get_line_number() by "dis -rl" because !objfile_has_full_symbols(objfile) check will fail, so it cannot display the proper code line number data. Since objfile_has_full_symbols(objfile) check cannot ensure all symbols been expanded, this patch add a new member as a flag for struct objfile to record if all symbols have been expanded. The flag will be set only ofter expand_all_symtabs been called. Signed-off-by: Tao Liu <[email protected]>
sugarfillet
pushed a commit
to sugarfillet/crash
that referenced
this pull request
Dec 13, 2023
This patch introduces per-cpu IRQ stacks for RISCV64 to let "bt" do backtrace on it and 'bt -E' search eframes on it, and the 'help -m' command displays the addresses of each per-cpu IRQ stack. TEST: a vmcore dumped via hacking the handle_irq_event_percpu() ( Why not using lkdtm INT_HW_IRQ_EN EXCEPTION ? There is a deadlock[1] in crash_kexec path if use that) ``` crash> bt PID: 0 TASK: ffffffff8140db00 CPU: 0 COMMAND: "swapper/0" #0 [ff20000000003e60] __handle_irq_event_percpu at ffffffff8006462e crash-utility#1 [ff20000000003ed0] handle_irq_event_percpu at ffffffff80064702 crash-utility#2 [ff20000000003ef0] handle_irq_event at ffffffff8006477c crash-utility#3 [ff20000000003f20] handle_fasteoi_irq at ffffffff80068664 crash-utility#4 [ff20000000003f50] generic_handle_domain_irq at ffffffff80063988 crash-utility#5 [ff20000000003f60] plic_handle_irq at ffffffff8046633e crash-utility#6 [ff20000000003fb0] generic_handle_domain_irq at ffffffff80063988 crash-utility#7 [ff20000000003fc0] riscv_intc_irq at ffffffff80465f8e crash-utility#8 [ff20000000003fd0] handle_riscv_irq at ffffffff808361e8 PC: ffffffff80837314 [default_idle_call+50] RA: ffffffff80837310 [default_idle_call+46] SP: ffffffff81403da0 CAUSE: 8000000000000009 epc : ffffffff80837314 ra : ffffffff80837310 sp : ffffffff81403da0 gp : ffffffff814ef848 tp : ffffffff8140db00 t0 : ff2000000004bb18 t1 : 0000000000032c73 t2 : ffffffff81200a48 s0 : ffffffff81403db0 s1 : 0000000000000000 a0 : 0000000000000004 a1 : 0000000000000000 a2 : ff6000009f1e7000 a3 : 0000000000002304 a4 : ffffffff80c1c2d8 a5 : 0000000000000000 a6 : ff6000001fe01958 a7 : 00002496ea89dbf1 s2 : ffffffff814f0220 s3 : 0000000000000001 s4 : 000000000000003f s5 : ffffffff814f03d8 s6 : 0000000000000000 s7 : ffffffff814f00d0 s8 : ffffffff81526f10 s9 : ffffffff80c1d880 s10: 0000000000000000 s11: 0000000000000001 t3 : 0000000000003392 t4 : 0000000000000000 t5 : 0000000000000000 t6 : 0000000000000040 status: 0000000200000120 badaddr: 0000000000000000 cause: 8000000000000009 orig_a0: ffffffff80837310 --- <IRQ stack> --- crash-utility#9 [ffffffff81403da0] default_idle_call at ffffffff80837314 crash-utility#10 [ffffffff81403db0] do_idle at ffffffff8004d0a0 crash-utility#11 [ffffffff81403e40] cpu_startup_entry at ffffffff8004d21e crash-utility#12 [ffffffff81403e60] kernel_init at ffffffff8083746a crash-utility#13 [ffffffff81403e70] arch_post_acpi_subsys_init at ffffffff80a006d8 crash-utility#14 [ffffffff81403e80] console_on_rootfs at ffffffff80a00c92 crash> crash> bt -E CPU 0 IRQ STACK: KERNEL-MODE EXCEPTION FRAME AT: ff20000000003a48 PC: ffffffff8006462e [__handle_irq_event_percpu+30] RA: ffffffff80064702 [handle_irq_event_percpu+18] SP: ff20000000003e60 CAUSE: 000000000000000d epc : ffffffff8006462e ra : ffffffff80064702 sp : ff20000000003e60 gp : ffffffff814ef848 tp : ffffffff8140db00 t0 : 0000000000046600 t1 : ffffffff80836464 t2 : ffffffff81200a48 s0 : ff20000000003ed0 s1 : 0000000000000000 a0 : 0000000000000000 a1 : 0000000000000118 a2 : 0000000000000052 a3 : 0000000000000000 a4 : 0000000000000000 a5 : 0000000000010001 a6 : ff6000001fe01958 a7 : 00002496ea89dbf1 s2 : ff60000000941ab0 s3 : ffffffff814a0658 s4 : ff60000000089230 s5 : ffffffff814a0518 s6 : ffffffff814a0620 s7 : ffffffff80e5f0f8 s8 : ffffffff80fc50b0 s9 : ffffffff80c1d880 s10: 0000000000000000 s11: 0000000000000001 t3 : 0000000000003392 t4 : 0000000000000000 t5 : 0000000000000000 t6 : 0000000000000040 status: 0000000200000100 badaddr: 0000000000000078 cause: 000000000000000d orig_a0: ff20000000003ea0 CPU 1 IRQ STACK:(none found) crash> crash> help -m <snip> machspec: ced1e0 irq_stack_size: 16384 irq_stacks[0]: ff20000000000000 irq_stacks[1]: ff20000000008000 crash> ``` [1]: https://lore.kernel.org/linux-riscv/[email protected]/ Signed-off-by: Song Shuai <[email protected]>
k-hagio
pushed a commit
that referenced
this pull request
Jan 11, 2024
This patch introduces per-cpu IRQ stacks for RISCV64 to let "bt" do backtrace on it and 'bt -E' search eframes on it, and the 'help -m' command displays the addresses of each per-cpu IRQ stack. TEST: a vmcore dumped via hacking the handle_irq_event_percpu() ( Why not using lkdtm INT_HW_IRQ_EN EXCEPTION ? There is a deadlock[1] in crash_kexec path if use that) crash> bt PID: 0 TASK: ffffffff8140db00 CPU: 0 COMMAND: "swapper/0" #0 [ff20000000003e60] __handle_irq_event_percpu at ffffffff8006462e #1 [ff20000000003ed0] handle_irq_event_percpu at ffffffff80064702 #2 [ff20000000003ef0] handle_irq_event at ffffffff8006477c #3 [ff20000000003f20] handle_fasteoi_irq at ffffffff80068664 #4 [ff20000000003f50] generic_handle_domain_irq at ffffffff80063988 #5 [ff20000000003f60] plic_handle_irq at ffffffff8046633e #6 [ff20000000003fb0] generic_handle_domain_irq at ffffffff80063988 #7 [ff20000000003fc0] riscv_intc_irq at ffffffff80465f8e #8 [ff20000000003fd0] handle_riscv_irq at ffffffff808361e8 PC: ffffffff80837314 [default_idle_call+50] RA: ffffffff80837310 [default_idle_call+46] SP: ffffffff81403da0 CAUSE: 8000000000000009 epc : ffffffff80837314 ra : ffffffff80837310 sp : ffffffff81403da0 gp : ffffffff814ef848 tp : ffffffff8140db00 t0 : ff2000000004bb18 t1 : 0000000000032c73 t2 : ffffffff81200a48 s0 : ffffffff81403db0 s1 : 0000000000000000 a0 : 0000000000000004 a1 : 0000000000000000 a2 : ff6000009f1e7000 a3 : 0000000000002304 a4 : ffffffff80c1c2d8 a5 : 0000000000000000 a6 : ff6000001fe01958 a7 : 00002496ea89dbf1 s2 : ffffffff814f0220 s3 : 0000000000000001 s4 : 000000000000003f s5 : ffffffff814f03d8 s6 : 0000000000000000 s7 : ffffffff814f00d0 s8 : ffffffff81526f10 s9 : ffffffff80c1d880 s10: 0000000000000000 s11: 0000000000000001 t3 : 0000000000003392 t4 : 0000000000000000 t5 : 0000000000000000 t6 : 0000000000000040 status: 0000000200000120 badaddr: 0000000000000000 cause: 8000000000000009 orig_a0: ffffffff80837310 --- <IRQ stack> --- #9 [ffffffff81403da0] default_idle_call at ffffffff80837314 #10 [ffffffff81403db0] do_idle at ffffffff8004d0a0 #11 [ffffffff81403e40] cpu_startup_entry at ffffffff8004d21e #12 [ffffffff81403e60] kernel_init at ffffffff8083746a #13 [ffffffff81403e70] arch_post_acpi_subsys_init at ffffffff80a006d8 #14 [ffffffff81403e80] console_on_rootfs at ffffffff80a00c92 crash> crash> bt -E CPU 0 IRQ STACK: KERNEL-MODE EXCEPTION FRAME AT: ff20000000003a48 PC: ffffffff8006462e [__handle_irq_event_percpu+30] RA: ffffffff80064702 [handle_irq_event_percpu+18] SP: ff20000000003e60 CAUSE: 000000000000000d epc : ffffffff8006462e ra : ffffffff80064702 sp : ff20000000003e60 gp : ffffffff814ef848 tp : ffffffff8140db00 t0 : 0000000000046600 t1 : ffffffff80836464 t2 : ffffffff81200a48 s0 : ff20000000003ed0 s1 : 0000000000000000 a0 : 0000000000000000 a1 : 0000000000000118 a2 : 0000000000000052 a3 : 0000000000000000 a4 : 0000000000000000 a5 : 0000000000010001 a6 : ff6000001fe01958 a7 : 00002496ea89dbf1 s2 : ff60000000941ab0 s3 : ffffffff814a0658 s4 : ff60000000089230 s5 : ffffffff814a0518 s6 : ffffffff814a0620 s7 : ffffffff80e5f0f8 s8 : ffffffff80fc50b0 s9 : ffffffff80c1d880 s10: 0000000000000000 s11: 0000000000000001 t3 : 0000000000003392 t4 : 0000000000000000 t5 : 0000000000000000 t6 : 0000000000000040 status: 0000000200000100 badaddr: 0000000000000078 cause: 000000000000000d orig_a0: ff20000000003ea0 CPU 1 IRQ STACK: (none found) crash> crash> help -m <snip> machspec: ced1e0 irq_stack_size: 16384 irq_stacks[0]: ff20000000000000 irq_stacks[1]: ff20000000008000 crash> [1]: https://lore.kernel.org/linux-riscv/[email protected]/ Signed-off-by: Song Shuai <[email protected]>
liutgnu
pushed a commit
to liutgnu/crash-preview
that referenced
this pull request
Feb 21, 2024
This patch introduces per-cpu IRQ stacks for RISCV64 to let "bt" do backtrace on it and 'bt -E' search eframes on it, and the 'help -m' command displays the addresses of each per-cpu IRQ stack. TEST: a vmcore dumped via hacking the handle_irq_event_percpu() ( Why not using lkdtm INT_HW_IRQ_EN EXCEPTION ? There is a deadlock[1] in crash_kexec path if use that) crash> bt PID: 0 TASK: ffffffff8140db00 CPU: 0 COMMAND: "swapper/0" #0 [ff20000000003e60] __handle_irq_event_percpu at ffffffff8006462e crash-utility#1 [ff20000000003ed0] handle_irq_event_percpu at ffffffff80064702 crash-utility#2 [ff20000000003ef0] handle_irq_event at ffffffff8006477c crash-utility#3 [ff20000000003f20] handle_fasteoi_irq at ffffffff80068664 crash-utility#4 [ff20000000003f50] generic_handle_domain_irq at ffffffff80063988 crash-utility#5 [ff20000000003f60] plic_handle_irq at ffffffff8046633e crash-utility#6 [ff20000000003fb0] generic_handle_domain_irq at ffffffff80063988 crash-utility#7 [ff20000000003fc0] riscv_intc_irq at ffffffff80465f8e crash-utility#8 [ff20000000003fd0] handle_riscv_irq at ffffffff808361e8 PC: ffffffff80837314 [default_idle_call+50] RA: ffffffff80837310 [default_idle_call+46] SP: ffffffff81403da0 CAUSE: 8000000000000009 epc : ffffffff80837314 ra : ffffffff80837310 sp : ffffffff81403da0 gp : ffffffff814ef848 tp : ffffffff8140db00 t0 : ff2000000004bb18 t1 : 0000000000032c73 t2 : ffffffff81200a48 s0 : ffffffff81403db0 s1 : 0000000000000000 a0 : 0000000000000004 a1 : 0000000000000000 a2 : ff6000009f1e7000 a3 : 0000000000002304 a4 : ffffffff80c1c2d8 a5 : 0000000000000000 a6 : ff6000001fe01958 a7 : 00002496ea89dbf1 s2 : ffffffff814f0220 s3 : 0000000000000001 s4 : 000000000000003f s5 : ffffffff814f03d8 s6 : 0000000000000000 s7 : ffffffff814f00d0 s8 : ffffffff81526f10 s9 : ffffffff80c1d880 s10: 0000000000000000 s11: 0000000000000001 t3 : 0000000000003392 t4 : 0000000000000000 t5 : 0000000000000000 t6 : 0000000000000040 status: 0000000200000120 badaddr: 0000000000000000 cause: 8000000000000009 orig_a0: ffffffff80837310 --- <IRQ stack> --- crash-utility#9 [ffffffff81403da0] default_idle_call at ffffffff80837314 crash-utility#10 [ffffffff81403db0] do_idle at ffffffff8004d0a0 crash-utility#11 [ffffffff81403e40] cpu_startup_entry at ffffffff8004d21e crash-utility#12 [ffffffff81403e60] kernel_init at ffffffff8083746a crash-utility#13 [ffffffff81403e70] arch_post_acpi_subsys_init at ffffffff80a006d8 crash-utility#14 [ffffffff81403e80] console_on_rootfs at ffffffff80a00c92 crash> crash> bt -E CPU 0 IRQ STACK: KERNEL-MODE EXCEPTION FRAME AT: ff20000000003a48 PC: ffffffff8006462e [__handle_irq_event_percpu+30] RA: ffffffff80064702 [handle_irq_event_percpu+18] SP: ff20000000003e60 CAUSE: 000000000000000d epc : ffffffff8006462e ra : ffffffff80064702 sp : ff20000000003e60 gp : ffffffff814ef848 tp : ffffffff8140db00 t0 : 0000000000046600 t1 : ffffffff80836464 t2 : ffffffff81200a48 s0 : ff20000000003ed0 s1 : 0000000000000000 a0 : 0000000000000000 a1 : 0000000000000118 a2 : 0000000000000052 a3 : 0000000000000000 a4 : 0000000000000000 a5 : 0000000000010001 a6 : ff6000001fe01958 a7 : 00002496ea89dbf1 s2 : ff60000000941ab0 s3 : ffffffff814a0658 s4 : ff60000000089230 s5 : ffffffff814a0518 s6 : ffffffff814a0620 s7 : ffffffff80e5f0f8 s8 : ffffffff80fc50b0 s9 : ffffffff80c1d880 s10: 0000000000000000 s11: 0000000000000001 t3 : 0000000000003392 t4 : 0000000000000000 t5 : 0000000000000000 t6 : 0000000000000040 status: 0000000200000100 badaddr: 0000000000000078 cause: 000000000000000d orig_a0: ff20000000003ea0 CPU 1 IRQ STACK: (none found) crash> crash> help -m <snip> machspec: ced1e0 irq_stack_size: 16384 irq_stacks[0]: ff20000000000000 irq_stacks[1]: ff20000000008000 crash> [1]: https://lore.kernel.org/linux-riscv/[email protected]/ Signed-off-by: Song Shuai <[email protected]>
adi-g15-ibm
pushed a commit
to adi-g15-ibm/crash
that referenced
this pull request
Mar 10, 2024
The stack unwinding is for kernel addresses only. If non-kernel address encountered, it is usually a user space address, or non-address value like a function call parameter. So stopping stack unwinding at non-kernel address will decrease the invalid unwind results. Before: crash> gdb bt #0 0xffffffff816a8f65 in context_switch ... crash-utility#1 __schedule () ... crash-utility#2 0xffffffff816a94e9 in schedule ... crash-utility#3 0xffffffff816a86fd in schedule_hrtimeout_range_clock ... crash-utility#4 0xffffffff816a8733 in schedule_hrtimeout_range ... crash-utility#5 0xffffffff8124bb7e in ep_poll ... crash-utility#6 0xffffffff8124d00d in SYSC_epoll_wait ... crash-utility#7 SyS_epoll_wait ... crash-utility#8 <signal handler called> crash-utility#9 0x00007f0449407923 in ?? () crash-utility#10 0xffff880100000001 in ?? () crash-utility#11 0xffff880169b3c010 in ?? () crash-utility#12 0x0000000000000040 in irq_stack_union () crash-utility#13 0xffff880169b3c058 in ?? () crash-utility#14 0xffff880169b3c048 in ?? () crash-utility#15 0xffff880169b3c050 in ?? () crash-utility#16 0x0000000000000000 in ?? () After: crash> gdb bt #0 0xffffffff816a8f65 in context_switch ... crash-utility#1 __schedule () ... crash-utility#2 0xffffffff816a94e9 in schedule () ... crash-utility#3 0xffffffff816a86fd in schedule_hrtimeout_range_clock ... crash-utility#4 0xffffffff816a8733 in schedule_hrtimeout_range ... crash-utility#5 0xffffffff8124bb7e in ep_poll ... crash-utility#6 0xffffffff8124d00d in SYSC_epoll_wait ... crash-utility#7 SyS_epoll_wait ... crash-utility#8 <signal handler called> crash-utility#9 0x00007f0449407923 in ?? () Signed-off-by: Tao Liu <ltao(a)redhat.com>
adi-g15-ibm
pushed a commit
to adi-g15-ibm/crash
that referenced
this pull request
Mar 27, 2024
The stack unwinding is for kernel addresses only. If non-kernel address encountered, it is usually a user space address, or non-address value like a function call parameter. So stopping stack unwinding at non-kernel address will decrease the invalid unwind results. Before: crash> gdb bt #0 0xffffffff816a8f65 in context_switch ... crash-utility#1 __schedule () ... crash-utility#2 0xffffffff816a94e9 in schedule ... crash-utility#3 0xffffffff816a86fd in schedule_hrtimeout_range_clock ... crash-utility#4 0xffffffff816a8733 in schedule_hrtimeout_range ... crash-utility#5 0xffffffff8124bb7e in ep_poll ... crash-utility#6 0xffffffff8124d00d in SYSC_epoll_wait ... crash-utility#7 SyS_epoll_wait ... crash-utility#8 <signal handler called> crash-utility#9 0x00007f0449407923 in ?? () crash-utility#10 0xffff880100000001 in ?? () crash-utility#11 0xffff880169b3c010 in ?? () crash-utility#12 0x0000000000000040 in irq_stack_union () crash-utility#13 0xffff880169b3c058 in ?? () crash-utility#14 0xffff880169b3c048 in ?? () crash-utility#15 0xffff880169b3c050 in ?? () crash-utility#16 0x0000000000000000 in ?? () After: crash> gdb bt #0 0xffffffff816a8f65 in context_switch ... crash-utility#1 __schedule () ... crash-utility#2 0xffffffff816a94e9 in schedule () ... crash-utility#3 0xffffffff816a86fd in schedule_hrtimeout_range_clock ... crash-utility#4 0xffffffff816a8733 in schedule_hrtimeout_range ... crash-utility#5 0xffffffff8124bb7e in ep_poll ... crash-utility#6 0xffffffff8124d00d in SYSC_epoll_wait ... crash-utility#7 SyS_epoll_wait ... crash-utility#8 <signal handler called> crash-utility#9 0x00007f0449407923 in ?? () Signed-off-by: Tao Liu <[email protected]>
adi-g15-ibm
pushed a commit
to adi-g15-ibm/crash
that referenced
this pull request
Mar 27, 2024
The stack unwinding is for kernel addresses only. If non-kernel address encountered, it is usually a user space address, or non-address value like a function call parameter. So stopping stack unwinding at non-kernel address will decrease the invalid unwind results. Before: crash> gdb bt #0 0xffffffff816a8f65 in context_switch ... crash-utility#1 __schedule () ... crash-utility#2 0xffffffff816a94e9 in schedule ... crash-utility#3 0xffffffff816a86fd in schedule_hrtimeout_range_clock ... crash-utility#4 0xffffffff816a8733 in schedule_hrtimeout_range ... crash-utility#5 0xffffffff8124bb7e in ep_poll ... crash-utility#6 0xffffffff8124d00d in SYSC_epoll_wait ... crash-utility#7 SyS_epoll_wait ... crash-utility#8 <signal handler called> crash-utility#9 0x00007f0449407923 in ?? () crash-utility#10 0xffff880100000001 in ?? () crash-utility#11 0xffff880169b3c010 in ?? () crash-utility#12 0x0000000000000040 in irq_stack_union () crash-utility#13 0xffff880169b3c058 in ?? () crash-utility#14 0xffff880169b3c048 in ?? () crash-utility#15 0xffff880169b3c050 in ?? () crash-utility#16 0x0000000000000000 in ?? () After: crash> gdb bt #0 0xffffffff816a8f65 in context_switch ... crash-utility#1 __schedule () ... crash-utility#2 0xffffffff816a94e9 in schedule () ... crash-utility#3 0xffffffff816a86fd in schedule_hrtimeout_range_clock ... crash-utility#4 0xffffffff816a8733 in schedule_hrtimeout_range ... crash-utility#5 0xffffffff8124bb7e in ep_poll ... crash-utility#6 0xffffffff8124d00d in SYSC_epoll_wait ... crash-utility#7 SyS_epoll_wait ... crash-utility#8 <signal handler called> crash-utility#9 0x00007f0449407923 in ?? () Signed-off-by: Tao Liu <[email protected]>
adi-g15-ibm
pushed a commit
to adi-g15-ibm/crash
that referenced
this pull request
Mar 27, 2024
The stack unwinding is for kernel addresses only. If non-kernel address encountered, it is usually a user space address, or non-address value like a function call parameter. So stopping stack unwinding at non-kernel address will decrease the invalid unwind results. Before: crash> gdb bt #0 0xffffffff816a8f65 in context_switch ... crash-utility#1 __schedule () ... crash-utility#2 0xffffffff816a94e9 in schedule ... crash-utility#3 0xffffffff816a86fd in schedule_hrtimeout_range_clock ... crash-utility#4 0xffffffff816a8733 in schedule_hrtimeout_range ... crash-utility#5 0xffffffff8124bb7e in ep_poll ... crash-utility#6 0xffffffff8124d00d in SYSC_epoll_wait ... crash-utility#7 SyS_epoll_wait ... crash-utility#8 <signal handler called> crash-utility#9 0x00007f0449407923 in ?? () crash-utility#10 0xffff880100000001 in ?? () crash-utility#11 0xffff880169b3c010 in ?? () crash-utility#12 0x0000000000000040 in irq_stack_union () crash-utility#13 0xffff880169b3c058 in ?? () crash-utility#14 0xffff880169b3c048 in ?? () crash-utility#15 0xffff880169b3c050 in ?? () crash-utility#16 0x0000000000000000 in ?? () After: crash> gdb bt #0 0xffffffff816a8f65 in context_switch ... crash-utility#1 __schedule () ... crash-utility#2 0xffffffff816a94e9 in schedule () ... crash-utility#3 0xffffffff816a86fd in schedule_hrtimeout_range_clock ... crash-utility#4 0xffffffff816a8733 in schedule_hrtimeout_range ... crash-utility#5 0xffffffff8124bb7e in ep_poll ... crash-utility#6 0xffffffff8124d00d in SYSC_epoll_wait ... crash-utility#7 SyS_epoll_wait ... crash-utility#8 <signal handler called> crash-utility#9 0x00007f0449407923 in ?? () Signed-off-by: Tao Liu <[email protected]>
adi-g15-ibm
pushed a commit
to adi-g15-ibm/crash
that referenced
this pull request
Mar 28, 2024
The stack unwinding is for kernel addresses only. If non-kernel address encountered, it is usually a user space address, or non-address value like a function call parameter. So stopping stack unwinding at non-kernel address will decrease the invalid unwind results. Before: crash> gdb bt #0 0xffffffff816a8f65 in context_switch ... crash-utility#1 __schedule () ... crash-utility#2 0xffffffff816a94e9 in schedule ... crash-utility#3 0xffffffff816a86fd in schedule_hrtimeout_range_clock ... crash-utility#4 0xffffffff816a8733 in schedule_hrtimeout_range ... crash-utility#5 0xffffffff8124bb7e in ep_poll ... crash-utility#6 0xffffffff8124d00d in SYSC_epoll_wait ... crash-utility#7 SyS_epoll_wait ... crash-utility#8 <signal handler called> crash-utility#9 0x00007f0449407923 in ?? () crash-utility#10 0xffff880100000001 in ?? () crash-utility#11 0xffff880169b3c010 in ?? () crash-utility#12 0x0000000000000040 in irq_stack_union () crash-utility#13 0xffff880169b3c058 in ?? () crash-utility#14 0xffff880169b3c048 in ?? () crash-utility#15 0xffff880169b3c050 in ?? () crash-utility#16 0x0000000000000000 in ?? () After: crash> gdb bt #0 0xffffffff816a8f65 in context_switch ... crash-utility#1 __schedule () ... crash-utility#2 0xffffffff816a94e9 in schedule () ... crash-utility#3 0xffffffff816a86fd in schedule_hrtimeout_range_clock ... crash-utility#4 0xffffffff816a8733 in schedule_hrtimeout_range ... crash-utility#5 0xffffffff8124bb7e in ep_poll ... crash-utility#6 0xffffffff8124d00d in SYSC_epoll_wait ... crash-utility#7 SyS_epoll_wait ... crash-utility#8 <signal handler called> crash-utility#9 0x00007f0449407923 in ?? () Signed-off-by: Tao Liu <[email protected]>
adi-g15-ibm
pushed a commit
to adi-g15-ibm/crash
that referenced
this pull request
Mar 28, 2024
The stack unwinding is for kernel addresses only. If non-kernel address encountered, it is usually a user space address, or non-address value like a function call parameter. So stopping stack unwinding at non-kernel address will decrease the invalid unwind results. Before: crash> gdb bt #0 0xffffffff816a8f65 in context_switch ... crash-utility#1 __schedule () ... crash-utility#2 0xffffffff816a94e9 in schedule ... crash-utility#3 0xffffffff816a86fd in schedule_hrtimeout_range_clock ... crash-utility#4 0xffffffff816a8733 in schedule_hrtimeout_range ... crash-utility#5 0xffffffff8124bb7e in ep_poll ... crash-utility#6 0xffffffff8124d00d in SYSC_epoll_wait ... crash-utility#7 SyS_epoll_wait ... crash-utility#8 <signal handler called> crash-utility#9 0x00007f0449407923 in ?? () crash-utility#10 0xffff880100000001 in ?? () crash-utility#11 0xffff880169b3c010 in ?? () crash-utility#12 0x0000000000000040 in irq_stack_union () crash-utility#13 0xffff880169b3c058 in ?? () crash-utility#14 0xffff880169b3c048 in ?? () crash-utility#15 0xffff880169b3c050 in ?? () crash-utility#16 0x0000000000000000 in ?? () After: crash> gdb bt #0 0xffffffff816a8f65 in context_switch ... crash-utility#1 __schedule () ... crash-utility#2 0xffffffff816a94e9 in schedule () ... crash-utility#3 0xffffffff816a86fd in schedule_hrtimeout_range_clock ... crash-utility#4 0xffffffff816a8733 in schedule_hrtimeout_range ... crash-utility#5 0xffffffff8124bb7e in ep_poll ... crash-utility#6 0xffffffff8124d00d in SYSC_epoll_wait ... crash-utility#7 SyS_epoll_wait ... crash-utility#8 <signal handler called> crash-utility#9 0x00007f0449407923 in ?? () Signed-off-by: Tao Liu <[email protected]>
adi-g15-ibm
pushed a commit
to adi-g15-ibm/crash
that referenced
this pull request
Mar 29, 2024
The stack unwinding is for kernel addresses only. If non-kernel address encountered, it is usually a user space address, or non-address value like a function call parameter. So stopping stack unwinding at non-kernel address will decrease the invalid unwind results. Before: crash> gdb bt #0 0xffffffff816a8f65 in context_switch ... crash-utility#1 __schedule () ... crash-utility#2 0xffffffff816a94e9 in schedule ... crash-utility#3 0xffffffff816a86fd in schedule_hrtimeout_range_clock ... crash-utility#4 0xffffffff816a8733 in schedule_hrtimeout_range ... crash-utility#5 0xffffffff8124bb7e in ep_poll ... crash-utility#6 0xffffffff8124d00d in SYSC_epoll_wait ... crash-utility#7 SyS_epoll_wait ... crash-utility#8 <signal handler called> crash-utility#9 0x00007f0449407923 in ?? () crash-utility#10 0xffff880100000001 in ?? () crash-utility#11 0xffff880169b3c010 in ?? () crash-utility#12 0x0000000000000040 in irq_stack_union () crash-utility#13 0xffff880169b3c058 in ?? () crash-utility#14 0xffff880169b3c048 in ?? () crash-utility#15 0xffff880169b3c050 in ?? () crash-utility#16 0x0000000000000000 in ?? () After: crash> gdb bt #0 0xffffffff816a8f65 in context_switch ... crash-utility#1 __schedule () ... crash-utility#2 0xffffffff816a94e9 in schedule () ... crash-utility#3 0xffffffff816a86fd in schedule_hrtimeout_range_clock ... crash-utility#4 0xffffffff816a8733 in schedule_hrtimeout_range ... crash-utility#5 0xffffffff8124bb7e in ep_poll ... crash-utility#6 0xffffffff8124d00d in SYSC_epoll_wait ... crash-utility#7 SyS_epoll_wait ... crash-utility#8 <signal handler called> crash-utility#9 0x00007f0449407923 in ?? () Signed-off-by: Tao Liu <[email protected]>
k-hagio
pushed a commit
that referenced
this pull request
Apr 17, 2024
On x86_64, there are cases where crash cannot handle IRQ exception frames properly. For example, with RHEL9.3 kernel, "bt" command fails with with "WARNING possibly bogus exception frame": crash> bt -c 30 PID: 2898241 TASK: ff4cb0ce0da0c680 CPU: 30 COMMAND: "star-ccm+" #0 [fffffe4658d88e58] crash_nmi_callback at ffffffffa00675e8 #1 [fffffe4658d88e68] nmi_handle at ffffffffa002ebab ... --- <NMI exception stack> --- ... #13 [ff5eba269937cf90] __do_softirq at ffffffffa0c6c007 #14 [ff5eba269937cfe0] __irq_exit_rcu at ffffffffa010ef61 #15 [ff5eba269937cff0] sysvec_apic_timer_interrupt at ffffffffa0c58ca2 --- <IRQ stack> --- RIP: 0000000000000010 RSP: 0000000000000018 RFLAGS: ff5eba26ddc9f7e8 RAX: 0000000000000a20 RBX: ff5eba26ddc9f940 RCX: 0000000000001000 RDX: ffffffb559980000 RSI: ff4cb12d67207400 RDI: ffffffffffffffff RBP: 0000000000001000 R8: ff5eba26ddc9f940 R9: ff5eba26ddc9f8af R10: 0000000000000003 R11: 0000000000000a20 R12: ff5eba26ddc9f8b0 R13: 000000283c07f000 R14: ff4cb0f5a29a1c00 R15: 0000000000000001 ORIG_RAX: ffffffffa07c4e60 CS: 0206 SS: 7000001cf0380001 bt: WARNING: possibly bogus exception frame Running "crash" with "--machdep irq_eframe_link=0xffffffffffffffe8" option (i.e. thus irq_eframe_link = -24) works properly: PID: 2898241 TASK: ff4cb0ce0da0c680 CPU: 30 COMMAND: "star-ccm+" #0 [fffffe4658d88e58] crash_nmi_callback at ffffffffa00675e8 #1 [fffffe4658d88e68] nmi_handle at ffffffffa002ebab ... --- <NMI exception stack> --- ... #13 [ff5eba269937cf90] __do_softirq at ffffffffa0c6c007 #14 [ff5eba269937cfe0] __irq_exit_rcu at ffffffffa010ef61 #15 [ff5eba269937cff0] sysvec_apic_timer_interrupt at ffffffffa0c58ca2 --- <IRQ stack> --- #16 [ff5eba26ddc9f738] asm_sysvec_apic_timer_interrupt at ffffffffa0e00e06 [exception RIP: alloc_pte.constprop.0+32] RIP: ffffffffa07c4e60 RSP: ff5eba26ddc9f7e8 RFLAGS: 00000206 RAX: ff5eba26ddc9f940 RBX: 0000000000001000 RCX: 0000000000000a20 RDX: 0000000000001000 RSI: ffffffb559980000 RDI: ff4cb12d67207400 RBP: ff5eba26ddc9f8b0 R8: ff5eba26ddc9f8af R9: 0000000000000003 R10: 0000000000000a20 R11: ff5eba26ddc9f940 R12: 000000283c07f000 R13: ff4cb0f5a29a1c00 R14: 0000000000000001 R15: ff4cb0f5a29a1bf8 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #17 [ff5eba26ddc9f830] iommu_v1_map_pages at ffffffffa07c5648 #18 [ff5eba26ddc9f8f8] __iommu_map at ffffffffa07d7803 #19 [ff5eba26ddc9f990] iommu_map_sg at ffffffffa07d7b71 #20 [ff5eba26ddc9f9f0] iommu_dma_map_sg at ffffffffa07ddcc9 #21 [ff5eba26ddc9fa90] __dma_map_sg_attrs at ffffffffa01b5205 ... Some background: asm_common_interrupt: callq error_entry movq %rax,%rsp movq %rsp,%rdi movq 0x78(%rsp),%rsi movq $-0x1,0x78(%rsp) call common_interrupt # rsp pointing to regs common_interrupt: pushq %r12 pushq %rbp pushq %rbx [...] movq hardirq_stack_ptr,%r11 movq %rsp,(%r11) movq %r11,%rsp [...] call __common_interrupt # rip:common_interrupt So frame_size(rip:common_interrupt) = 32 (3 push + ret). Hence "machdep->machspec->irq_eframe_link = -32;" (see x86_64_irq_eframe_link_init()). Now: asm_sysvec_apic_timer_interrupt: pushq $-0x1 callq error_entry movq %rax,%rsp movq %rsp,%rdi callq sysvec_apic_timer_interrupt sysvec_apic_timer_interrupt: pushq %r12 pushq %rbp [...] movq hardirq_stack_ptr,%r11 movq %rsp,(%r11) movq %r11,%rsp [...] call __sysvec_apic_timer_interrupt # rip:sysvec_apic_timer_interrupt Here frame_size(rip:sysvec_apic_timer_interrupt) = 24 (2 push + ret) We should also notice that: rip = *(hardirq_stack_ptr - 8) rsp = *(hardirq_stack_ptr) regs = rsp - frame_size(rip) But x86_64_get_framesize() does not work with IRQ handlers (returns 0). So not many options other than hardcoding the most likely value and looking around it. Actually x86_64_irq_eframe_link() was trying -32 (default), and then -40, but not -24. Signed-off-by: Georges Aureau <[email protected]>
adi-g15-ibm
pushed a commit
to adi-g15-ibm/crash
that referenced
this pull request
May 19, 2024
The stack unwinding is for kernel addresses only. If non-kernel address encountered, it is usually a user space address, or non-address value like a function call parameter. So stopping stack unwinding at non-kernel address will decrease the invalid unwind results. Before: crash> gdb bt #0 0xffffffff816a8f65 in context_switch ... crash-utility#1 __schedule () ... crash-utility#2 0xffffffff816a94e9 in schedule ... crash-utility#3 0xffffffff816a86fd in schedule_hrtimeout_range_clock ... crash-utility#4 0xffffffff816a8733 in schedule_hrtimeout_range ... crash-utility#5 0xffffffff8124bb7e in ep_poll ... crash-utility#6 0xffffffff8124d00d in SYSC_epoll_wait ... crash-utility#7 SyS_epoll_wait ... crash-utility#8 <signal handler called> crash-utility#9 0x00007f0449407923 in ?? () crash-utility#10 0xffff880100000001 in ?? () crash-utility#11 0xffff880169b3c010 in ?? () crash-utility#12 0x0000000000000040 in irq_stack_union () crash-utility#13 0xffff880169b3c058 in ?? () crash-utility#14 0xffff880169b3c048 in ?? () crash-utility#15 0xffff880169b3c050 in ?? () crash-utility#16 0x0000000000000000 in ?? () After: crash> gdb bt #0 0xffffffff816a8f65 in context_switch ... crash-utility#1 __schedule () ... crash-utility#2 0xffffffff816a94e9 in schedule () ... crash-utility#3 0xffffffff816a86fd in schedule_hrtimeout_range_clock ... crash-utility#4 0xffffffff816a8733 in schedule_hrtimeout_range ... crash-utility#5 0xffffffff8124bb7e in ep_poll ... crash-utility#6 0xffffffff8124d00d in SYSC_epoll_wait ... crash-utility#7 SyS_epoll_wait ... crash-utility#8 <signal handler called> crash-utility#9 0x00007f0449407923 in ?? () Cc: Sourabh Jain <[email protected]> Cc: Hari Bathini <[email protected]> Cc: Mahesh J Salgaonkar <[email protected]> Cc: Naveen N. Rao <[email protected]> Cc: Lianbo Jiang <[email protected]> Cc: HAGIO KAZUHITO(萩尾 一仁) <[email protected]> Cc: Tao Liu <[email protected]> Cc: Alexey Makhalov <[email protected]> Signed-off-by: Tao Liu <[email protected]>
adi-g15-ibm
pushed a commit
to adi-g15-ibm/crash
that referenced
this pull request
Jul 31, 2024
The stack unwinding is for kernel addresses only. If non-kernel address encountered, it is usually a user space address, or non-address value like a function call parameter. So stopping stack unwinding at non-kernel address will decrease the invalid unwind results. Before: crash> gdb bt #0 0xffffffff816a8f65 in context_switch ... crash-utility#1 __schedule () ... crash-utility#2 0xffffffff816a94e9 in schedule ... crash-utility#3 0xffffffff816a86fd in schedule_hrtimeout_range_clock ... crash-utility#4 0xffffffff816a8733 in schedule_hrtimeout_range ... crash-utility#5 0xffffffff8124bb7e in ep_poll ... crash-utility#6 0xffffffff8124d00d in SYSC_epoll_wait ... crash-utility#7 SyS_epoll_wait ... crash-utility#8 <signal handler called> crash-utility#9 0x00007f0449407923 in ?? () crash-utility#10 0xffff880100000001 in ?? () crash-utility#11 0xffff880169b3c010 in ?? () crash-utility#12 0x0000000000000040 in irq_stack_union () crash-utility#13 0xffff880169b3c058 in ?? () crash-utility#14 0xffff880169b3c048 in ?? () crash-utility#15 0xffff880169b3c050 in ?? () crash-utility#16 0x0000000000000000 in ?? () After: crash> gdb bt #0 0xffffffff816a8f65 in context_switch ... crash-utility#1 __schedule () ... crash-utility#2 0xffffffff816a94e9 in schedule () ... crash-utility#3 0xffffffff816a86fd in schedule_hrtimeout_range_clock ... crash-utility#4 0xffffffff816a8733 in schedule_hrtimeout_range ... crash-utility#5 0xffffffff8124bb7e in ep_poll ... crash-utility#6 0xffffffff8124d00d in SYSC_epoll_wait ... crash-utility#7 SyS_epoll_wait ... crash-utility#8 <signal handler called> Cc: Sourabh Jain <[email protected]> Cc: Hari Bathini <[email protected]> Cc: Mahesh J Salgaonkar <[email protected]> Cc: Naveen N. Rao <[email protected]> Cc: Lianbo Jiang <[email protected]> Cc: HAGIO KAZUHITO(萩尾 一仁) <[email protected]> Cc: Tao Liu <[email protected]> Cc: Alexey Makhalov <[email protected]> Signed-off-by: Tao Liu <[email protected]>
adi-g15-ibm
pushed a commit
to adi-g15-ibm/crash
that referenced
this pull request
Aug 27, 2024
The stack unwinding is for kernel addresses only. If non-kernel address encountered, it is usually a user space address, or non-address value like a function call parameter. So stopping stack unwinding at non-kernel address will decrease the invalid unwind results. Before: crash> gdb bt #0 0xffffffff816a8f65 in context_switch ... crash-utility#1 __schedule () ... crash-utility#2 0xffffffff816a94e9 in schedule ... crash-utility#3 0xffffffff816a86fd in schedule_hrtimeout_range_clock ... crash-utility#4 0xffffffff816a8733 in schedule_hrtimeout_range ... crash-utility#5 0xffffffff8124bb7e in ep_poll ... crash-utility#6 0xffffffff8124d00d in SYSC_epoll_wait ... crash-utility#7 SyS_epoll_wait ... crash-utility#8 <signal handler called> crash-utility#9 0x00007f0449407923 in ?? () crash-utility#10 0xffff880100000001 in ?? () crash-utility#11 0xffff880169b3c010 in ?? () crash-utility#12 0x0000000000000040 in irq_stack_union () crash-utility#13 0xffff880169b3c058 in ?? () crash-utility#14 0xffff880169b3c048 in ?? () crash-utility#15 0xffff880169b3c050 in ?? () crash-utility#16 0x0000000000000000 in ?? () After: crash> gdb bt #0 0xffffffff816a8f65 in context_switch ... crash-utility#1 __schedule () ... crash-utility#2 0xffffffff816a94e9 in schedule () ... crash-utility#3 0xffffffff816a86fd in schedule_hrtimeout_range_clock ... crash-utility#4 0xffffffff816a8733 in schedule_hrtimeout_range ... crash-utility#5 0xffffffff8124bb7e in ep_poll ... crash-utility#6 0xffffffff8124d00d in SYSC_epoll_wait ... crash-utility#7 SyS_epoll_wait ... crash-utility#8 <signal handler called> Cc: Sourabh Jain <[email protected]> Cc: Hari Bathini <[email protected]> Cc: Mahesh J Salgaonkar <[email protected]> Cc: Naveen N. Rao <[email protected]> Cc: Lianbo Jiang <[email protected]> Cc: HAGIO KAZUHITO(萩尾 一仁) <[email protected]> Cc: Tao Liu <[email protected]> Cc: Alexey Makhalov <[email protected]> Signed-off-by: Tao Liu <[email protected]>
lian-bo
pushed a commit
that referenced
this pull request
Sep 18, 2024
Previously, "retq" is used to determine the end of a function, so the end of framesize calculation. However "ret" might be outputted by gdb rather than "retq", as a result, the framesize is returned incorrectly, and bogus stack trace will be outputted. Without the patch: $ crash -d 3 vmcore vmlinux crash> bt 0xffffffff92da7545 <copy_process+5>: push %rbp [framesize: 8] ... 0xffffffff92da7561 <copy_process+33>: sub $0x238,%rsp [framesize: 624] ... 0xffffffff92da776a <copy_process+554>: pop %r15 [framesize: 8] 0xffffffff92da776c <copy_process+556>: pop %rbp [framesize: 0] 0xffffffff92da776d <copy_process+557>: ret crash> bt -D dump framesize_cache_entries: ... [ 3]: ffffffff92dadcbd 0 CF (copy_process+26493) crash> bt ... #9 [ffff888263157bc0] copy_process at ffffffff92dadcbd #10 [ffff888263157d20] __mutex_init at ffffffff92ed8dd5 #11 [ffff888263157d38] __alloc_file at ffffffff93458397 #12 [ffff888263157d60] alloc_empty_file at ffffffff934585d2 #13 [ffff888263157da8] __alloc_fd at ffffffff934b5ead #14 [ffff888263157e38] _do_fork at ffffffff92dae7a1 #15 [ffff888263157f28] do_syscall_64 at ffffffff92c085f4 Stack #10 ~ #13 are bogus and misleading. With the patch: ... 0xffffffff92da776d <copy_process+557>: ret [framesize restored to: 624] crash> bt -D dump ... [ 3]: ffffffff92dadcbd 624 CF (copy_process+26493) crash> bt ... #9 [ffff888263157bc0] copy_process at ffffffff92dadcbd #10 [ffff888263157e38] _do_fork at ffffffff92dae7a1 #11 [ffff888263157f28] do_syscall_64 at ffffffff92c085f4 Signed-off-by: Tao Liu <[email protected]>
liutgnu
added a commit
to liutgnu/crash-preview
that referenced
this pull request
Dec 5, 2024
…usly There is an issue that, for kernel modules, "dis -rl" fails to display modules code line number data after execute "bt" command in crash. Without the patch: crsah> mod -S crash> bt PID: 1500 TASK: ff2bd8b093524000 CPU: 16 COMMAND: "lpfc_worker_0" #0 [ff2c9f725c39f9e0] machine_kexec at ffffffff8e0686d3 ...snip... crash-utility#8 [ff2c9f725c39fcc0] __lpfc_sli_release_iocbq_s4 at ffffffffc0f2f425 [lpfc] ...snip... crash> dis -rl ffffffffc0f60f82 0xffffffffc0f60eb0 <lpfc_nlp_get>: nopl 0x0(%rax,%rax,1) [FTRACE NOP] 0xffffffffc0f60eb5 <lpfc_nlp_get+5>: push %rbp 0xffffffffc0f60eb6 <lpfc_nlp_get+6>: push %rbx 0xffffffffc0f60eb7 <lpfc_nlp_get+7>: test %rdi,%rdi With the patch: crash> mod -S crash> bt PID: 1500 TASK: ff2bd8b093524000 CPU: 16 COMMAND: "lpfc_worker_0" #0 [ff2c9f725c39f9e0] machine_kexec at ffffffff8e0686d3 ...snip... crash-utility#8 [ff2c9f725c39fcc0] __lpfc_sli_release_iocbq_s4 at ffffffffc0f2f425 [lpfc] ...snip... crash> dis -rl ffffffffc0f60f82 /usr/src/debug/kernel-4.18.0-425.13.1.el8_7/linux-4.18.0-425.13.1.el8_7.x86_64/drivers/scsi/lpfc/lpfc_hbadisc.c: 6756 0xffffffffc0f60eb0 <lpfc_nlp_get>: nopl 0x0(%rax,%rax,1) [FTRACE NOP] /usr/src/debug/kernel-4.18.0-425.13.1.el8_7/linux-4.18.0-425.13.1.el8_7.x86_64/drivers/scsi/lpfc/lpfc_hbadisc.c: 6759 0xffffffffc0f60eb5 <lpfc_nlp_get+5>: push %rbp The root cause is, after kernel module been loaded by mod command, the symtable is not expanded in gdb side. crash bt or dis command will trigger such an expansion. However the symtable expansion is different for the 2 commands: The stack trace of "dis -rl" for symtable expanding: #0 0x00000000008d8d9f in add_compunit_symtab_to_objfile ... crash-utility#1 0x00000000006d3293 in buildsym_compunit::end_symtab_with_blockvector ... crash-utility#2 0x00000000006d336a in buildsym_compunit::end_symtab_from_static_block ... crash-utility#3 0x000000000077e8e9 in process_full_comp_unit ... crash-utility#4 process_queue ... crash-utility#5 dw2_do_instantiate_symtab ... crash-utility#6 0x000000000077ed67 in dw2_instantiate_symtab ... crash-utility#7 0x000000000077f75e in dw2_expand_all_symtabs ... crash-utility#8 0x00000000008f254d in gdb_get_line_number ... crash-utility#9 0x00000000008f22af in gdb_command_funnel_1 ... crash-utility#10 0x00000000008f2003 in gdb_command_funnel ... crash-utility#11 0x00000000005b7f02 in gdb_interface ... crash-utility#12 0x00000000005f5bd8 in get_line_number ... crash-utility#13 0x000000000059e574 in cmd_dis ... The stack trace of "bt" for symtable expanding: #0 0x00000000008d8d9f in add_compunit_symtab_to_objfile ... crash-utility#1 0x00000000006d3293 in buildsym_compunit::end_symtab_with_blockvector ... crash-utility#2 0x00000000006d336a in buildsym_compunit::end_symtab_from_static_block ... crash-utility#3 0x000000000077e8e9 in process_full_comp_unit ... crash-utility#4 process_queue ... crash-utility#5 dw2_do_instantiate_symtab ... crash-utility#6 0x000000000077ed67 in dw2_instantiate_symtab ... crash-utility#7 0x000000000077f8ed in dw2_lookup_symbol ... crash-utility#8 0x00000000008e6d03 in lookup_symbol_via_quick_fns ... crash-utility#9 0x00000000008e7153 in lookup_symbol_in_objfile ... crash-utility#10 0x00000000008e73c6 in lookup_symbol_global_or_static_iterator_cb ... crash-utility#11 0x00000000008b99c4 in svr4_iterate_over_objfiles_in_search_order ... crash-utility#12 0x00000000008e754e in lookup_global_or_static_symbol ... crash-utility#13 0x00000000008e75da in lookup_static_symbol ... crash-utility#14 0x00000000008e632c in lookup_symbol_aux ... crash-utility#15 0x00000000008e5a7a in lookup_symbol_in_language ... crash-utility#16 0x00000000008e5b30 in lookup_symbol ... crash-utility#17 0x00000000008f2a4a in gdb_get_datatype ... crash-utility#18 0x00000000008f22c0 in gdb_command_funnel_1 ... crash-utility#19 0x00000000008f2003 in gdb_command_funnel ... crash-utility#20 0x00000000005b7f02 in gdb_interface ... crash-utility#21 0x00000000005f8a9f in datatype_info ... crash-utility#22 0x0000000000599947 in cpu_map_size ... crash-utility#23 0x00000000005a975d in get_cpus_online ... crash-utility#24 0x0000000000637a8b in diskdump_get_prstatus_percpu ... crash-utility#25 0x000000000062f0e4 in get_netdump_regs_x86_64 ... crash-utility#26 0x000000000059fe68 in back_trace ... crash-utility#27 0x00000000005ab1cb in cmd_bt ... For the stacktrace of "dis -rl", it calls dw2_expand_all_symtabs() to expand all symtable of the objfile, or "*.ko.debug" in our case. However for the stacktrace of "bt", it doesn't expand all, but only a subset of symtable which is enough to find a symbol by dw2_lookup_symbol(). As a result, the objfile->compunit_symtabs, which is the head of a single linked list of struct compunit_symtab, is not NULL but didn't contain all symtables. It will not be reinitialized in gdb_get_line_number() by "dis -rl" because !objfile_has_full_symbols(objfile) check will fail, so it cannot display the proper code line number data. Since objfile_has_full_symbols(objfile) check cannot ensure all symbols been expanded, this patch add a new member as a flag for struct objfile to record if all symbols have been expanded. The flag will be set only ofter expand_all_symtabs been called. Signed-off-by: Tao Liu <[email protected]>
liutgnu
pushed a commit
to liutgnu/crash-preview
that referenced
this pull request
Dec 5, 2024
This patch introduces per-cpu IRQ stacks for RISCV64 to let "bt" do backtrace on it and 'bt -E' search eframes on it, and the 'help -m' command displays the addresses of each per-cpu IRQ stack. TEST: a vmcore dumped via hacking the handle_irq_event_percpu() ( Why not using lkdtm INT_HW_IRQ_EN EXCEPTION ? There is a deadlock[1] in crash_kexec path if use that) crash> bt PID: 0 TASK: ffffffff8140db00 CPU: 0 COMMAND: "swapper/0" #0 [ff20000000003e60] __handle_irq_event_percpu at ffffffff8006462e crash-utility#1 [ff20000000003ed0] handle_irq_event_percpu at ffffffff80064702 crash-utility#2 [ff20000000003ef0] handle_irq_event at ffffffff8006477c crash-utility#3 [ff20000000003f20] handle_fasteoi_irq at ffffffff80068664 crash-utility#4 [ff20000000003f50] generic_handle_domain_irq at ffffffff80063988 crash-utility#5 [ff20000000003f60] plic_handle_irq at ffffffff8046633e crash-utility#6 [ff20000000003fb0] generic_handle_domain_irq at ffffffff80063988 crash-utility#7 [ff20000000003fc0] riscv_intc_irq at ffffffff80465f8e crash-utility#8 [ff20000000003fd0] handle_riscv_irq at ffffffff808361e8 PC: ffffffff80837314 [default_idle_call+50] RA: ffffffff80837310 [default_idle_call+46] SP: ffffffff81403da0 CAUSE: 8000000000000009 epc : ffffffff80837314 ra : ffffffff80837310 sp : ffffffff81403da0 gp : ffffffff814ef848 tp : ffffffff8140db00 t0 : ff2000000004bb18 t1 : 0000000000032c73 t2 : ffffffff81200a48 s0 : ffffffff81403db0 s1 : 0000000000000000 a0 : 0000000000000004 a1 : 0000000000000000 a2 : ff6000009f1e7000 a3 : 0000000000002304 a4 : ffffffff80c1c2d8 a5 : 0000000000000000 a6 : ff6000001fe01958 a7 : 00002496ea89dbf1 s2 : ffffffff814f0220 s3 : 0000000000000001 s4 : 000000000000003f s5 : ffffffff814f03d8 s6 : 0000000000000000 s7 : ffffffff814f00d0 s8 : ffffffff81526f10 s9 : ffffffff80c1d880 s10: 0000000000000000 s11: 0000000000000001 t3 : 0000000000003392 t4 : 0000000000000000 t5 : 0000000000000000 t6 : 0000000000000040 status: 0000000200000120 badaddr: 0000000000000000 cause: 8000000000000009 orig_a0: ffffffff80837310 --- <IRQ stack> --- crash-utility#9 [ffffffff81403da0] default_idle_call at ffffffff80837314 crash-utility#10 [ffffffff81403db0] do_idle at ffffffff8004d0a0 crash-utility#11 [ffffffff81403e40] cpu_startup_entry at ffffffff8004d21e crash-utility#12 [ffffffff81403e60] kernel_init at ffffffff8083746a crash-utility#13 [ffffffff81403e70] arch_post_acpi_subsys_init at ffffffff80a006d8 crash-utility#14 [ffffffff81403e80] console_on_rootfs at ffffffff80a00c92 crash> crash> bt -E CPU 0 IRQ STACK: KERNEL-MODE EXCEPTION FRAME AT: ff20000000003a48 PC: ffffffff8006462e [__handle_irq_event_percpu+30] RA: ffffffff80064702 [handle_irq_event_percpu+18] SP: ff20000000003e60 CAUSE: 000000000000000d epc : ffffffff8006462e ra : ffffffff80064702 sp : ff20000000003e60 gp : ffffffff814ef848 tp : ffffffff8140db00 t0 : 0000000000046600 t1 : ffffffff80836464 t2 : ffffffff81200a48 s0 : ff20000000003ed0 s1 : 0000000000000000 a0 : 0000000000000000 a1 : 0000000000000118 a2 : 0000000000000052 a3 : 0000000000000000 a4 : 0000000000000000 a5 : 0000000000010001 a6 : ff6000001fe01958 a7 : 00002496ea89dbf1 s2 : ff60000000941ab0 s3 : ffffffff814a0658 s4 : ff60000000089230 s5 : ffffffff814a0518 s6 : ffffffff814a0620 s7 : ffffffff80e5f0f8 s8 : ffffffff80fc50b0 s9 : ffffffff80c1d880 s10: 0000000000000000 s11: 0000000000000001 t3 : 0000000000003392 t4 : 0000000000000000 t5 : 0000000000000000 t6 : 0000000000000040 status: 0000000200000100 badaddr: 0000000000000078 cause: 000000000000000d orig_a0: ff20000000003ea0 CPU 1 IRQ STACK: (none found) crash> crash> help -m <snip> machspec: ced1e0 irq_stack_size: 16384 irq_stacks[0]: ff20000000000000 irq_stacks[1]: ff20000000008000 crash> [1]: https://lore.kernel.org/linux-riscv/[email protected]/ Signed-off-by: Song Shuai <[email protected]>
liutgnu
pushed a commit
to liutgnu/crash-preview
that referenced
this pull request
Dec 5, 2024
On x86_64, there are cases where crash cannot handle IRQ exception frames properly. For example, with RHEL9.3 kernel, "bt" command fails with with "WARNING possibly bogus exception frame": crash> bt -c 30 PID: 2898241 TASK: ff4cb0ce0da0c680 CPU: 30 COMMAND: "star-ccm+" #0 [fffffe4658d88e58] crash_nmi_callback at ffffffffa00675e8 crash-utility#1 [fffffe4658d88e68] nmi_handle at ffffffffa002ebab ... --- <NMI exception stack> --- ... crash-utility#13 [ff5eba269937cf90] __do_softirq at ffffffffa0c6c007 crash-utility#14 [ff5eba269937cfe0] __irq_exit_rcu at ffffffffa010ef61 crash-utility#15 [ff5eba269937cff0] sysvec_apic_timer_interrupt at ffffffffa0c58ca2 --- <IRQ stack> --- RIP: 0000000000000010 RSP: 0000000000000018 RFLAGS: ff5eba26ddc9f7e8 RAX: 0000000000000a20 RBX: ff5eba26ddc9f940 RCX: 0000000000001000 RDX: ffffffb559980000 RSI: ff4cb12d67207400 RDI: ffffffffffffffff RBP: 0000000000001000 R8: ff5eba26ddc9f940 R9: ff5eba26ddc9f8af R10: 0000000000000003 R11: 0000000000000a20 R12: ff5eba26ddc9f8b0 R13: 000000283c07f000 R14: ff4cb0f5a29a1c00 R15: 0000000000000001 ORIG_RAX: ffffffffa07c4e60 CS: 0206 SS: 7000001cf0380001 bt: WARNING: possibly bogus exception frame Running "crash" with "--machdep irq_eframe_link=0xffffffffffffffe8" option (i.e. thus irq_eframe_link = -24) works properly: PID: 2898241 TASK: ff4cb0ce0da0c680 CPU: 30 COMMAND: "star-ccm+" #0 [fffffe4658d88e58] crash_nmi_callback at ffffffffa00675e8 crash-utility#1 [fffffe4658d88e68] nmi_handle at ffffffffa002ebab ... --- <NMI exception stack> --- ... crash-utility#13 [ff5eba269937cf90] __do_softirq at ffffffffa0c6c007 crash-utility#14 [ff5eba269937cfe0] __irq_exit_rcu at ffffffffa010ef61 crash-utility#15 [ff5eba269937cff0] sysvec_apic_timer_interrupt at ffffffffa0c58ca2 --- <IRQ stack> --- crash-utility#16 [ff5eba26ddc9f738] asm_sysvec_apic_timer_interrupt at ffffffffa0e00e06 [exception RIP: alloc_pte.constprop.0+32] RIP: ffffffffa07c4e60 RSP: ff5eba26ddc9f7e8 RFLAGS: 00000206 RAX: ff5eba26ddc9f940 RBX: 0000000000001000 RCX: 0000000000000a20 RDX: 0000000000001000 RSI: ffffffb559980000 RDI: ff4cb12d67207400 RBP: ff5eba26ddc9f8b0 R8: ff5eba26ddc9f8af R9: 0000000000000003 R10: 0000000000000a20 R11: ff5eba26ddc9f940 R12: 000000283c07f000 R13: ff4cb0f5a29a1c00 R14: 0000000000000001 R15: ff4cb0f5a29a1bf8 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 crash-utility#17 [ff5eba26ddc9f830] iommu_v1_map_pages at ffffffffa07c5648 crash-utility#18 [ff5eba26ddc9f8f8] __iommu_map at ffffffffa07d7803 crash-utility#19 [ff5eba26ddc9f990] iommu_map_sg at ffffffffa07d7b71 crash-utility#20 [ff5eba26ddc9f9f0] iommu_dma_map_sg at ffffffffa07ddcc9 crash-utility#21 [ff5eba26ddc9fa90] __dma_map_sg_attrs at ffffffffa01b5205 ... Some background: asm_common_interrupt: callq error_entry movq %rax,%rsp movq %rsp,%rdi movq 0x78(%rsp),%rsi movq $-0x1,0x78(%rsp) call common_interrupt # rsp pointing to regs common_interrupt: pushq %r12 pushq %rbp pushq %rbx [...] movq hardirq_stack_ptr,%r11 movq %rsp,(%r11) movq %r11,%rsp [...] call __common_interrupt # rip:common_interrupt So frame_size(rip:common_interrupt) = 32 (3 push + ret). Hence "machdep->machspec->irq_eframe_link = -32;" (see x86_64_irq_eframe_link_init()). Now: asm_sysvec_apic_timer_interrupt: pushq $-0x1 callq error_entry movq %rax,%rsp movq %rsp,%rdi callq sysvec_apic_timer_interrupt sysvec_apic_timer_interrupt: pushq %r12 pushq %rbp [...] movq hardirq_stack_ptr,%r11 movq %rsp,(%r11) movq %r11,%rsp [...] call __sysvec_apic_timer_interrupt # rip:sysvec_apic_timer_interrupt Here frame_size(rip:sysvec_apic_timer_interrupt) = 24 (2 push + ret) We should also notice that: rip = *(hardirq_stack_ptr - 8) rsp = *(hardirq_stack_ptr) regs = rsp - frame_size(rip) But x86_64_get_framesize() does not work with IRQ handlers (returns 0). So not many options other than hardcoding the most likely value and looking around it. Actually x86_64_irq_eframe_link() was trying -32 (default), and then -40, but not -24. Signed-off-by: Georges Aureau <[email protected]>
liutgnu
added a commit
to liutgnu/crash-preview
that referenced
this pull request
Dec 5, 2024
Previously, "retq" is used to determine the end of a function, so the end of framesize calculation. However "ret" might be outputted by gdb rather than "retq", as a result, the framesize is returned incorrectly, and bogus stack trace will be outputted. Without the patch: $ crash -d 3 vmcore vmlinux crash> bt 0xffffffff92da7545 <copy_process+5>: push %rbp [framesize: 8] ... 0xffffffff92da7561 <copy_process+33>: sub $0x238,%rsp [framesize: 624] ... 0xffffffff92da776a <copy_process+554>: pop %r15 [framesize: 8] 0xffffffff92da776c <copy_process+556>: pop %rbp [framesize: 0] 0xffffffff92da776d <copy_process+557>: ret crash> bt -D dump framesize_cache_entries: ... [ 3]: ffffffff92dadcbd 0 CF (copy_process+26493) crash> bt ... crash-utility#9 [ffff888263157bc0] copy_process at ffffffff92dadcbd crash-utility#10 [ffff888263157d20] __mutex_init at ffffffff92ed8dd5 crash-utility#11 [ffff888263157d38] __alloc_file at ffffffff93458397 crash-utility#12 [ffff888263157d60] alloc_empty_file at ffffffff934585d2 crash-utility#13 [ffff888263157da8] __alloc_fd at ffffffff934b5ead crash-utility#14 [ffff888263157e38] _do_fork at ffffffff92dae7a1 crash-utility#15 [ffff888263157f28] do_syscall_64 at ffffffff92c085f4 Stack crash-utility#10 ~ crash-utility#13 are bogus and misleading. With the patch: ... 0xffffffff92da776d <copy_process+557>: ret [framesize restored to: 624] crash> bt -D dump ... [ 3]: ffffffff92dadcbd 624 CF (copy_process+26493) crash> bt ... crash-utility#9 [ffff888263157bc0] copy_process at ffffffff92dadcbd crash-utility#10 [ffff888263157e38] _do_fork at ffffffff92dae7a1 crash-utility#11 [ffff888263157f28] do_syscall_64 at ffffffff92c085f4 Signed-off-by: Tao Liu <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
if indicated by the environment
to allow for reproducible builds
This is slightly stretching the definition of the variable defined at
https://reproducible-builds.org/specs/source-date-epoch/
Signed-off-by: Bernhard M. Wiedemann [email protected]
The alternative would be to only include build time information on request
or drop it completely