x86_64: Fix "bt" command on kernels with random_kstack_offset=on #15

Merged 6 commits on Feb 27, 2023

fengjixuchui
Owner

No description provided.

georges-aureau and others added 6 commits February 14, 2023 14:59
For CONFIG_SLAB_FREELIST_HARDENED, the crash memory.c:freelist_ptr()
code decides whether to apply an additional bswap with a simple release
test, e.g. THIS_KERNEL_VERSION >= LINUX(5,7,0), which in practice means
RHEL9 and beyond.

However, RHEL8.6 and later also have CONFIG_SLAB_FREELIST_HARDENED=y
with the additional bswap, and crash currently does not handle this
case. As a result, "kmem -s|-S" does not work properly: free objects
are neither counted nor reported correctly.

An example from a RHEL8.6 x86_64 kdump: for a kmem cache with a single
slab holding 42 objects, only the freelist head is seen as free because
crash can't walk the freelist next pointers, so crash wrongly reports 41
allocated objects:

  crash> sys | grep RELEASE
       RELEASE: 4.18.0-372.9.1.el8.x86_64
  crash> kmem -s nfs_commit_data
  CACHE             OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE  NAME
  ffff9ad40c7cb2c0      728         41        42      1    32k  nfs_commit_data

When properly accounting for the additional bswap, crash can walk the
freelist and find 38 free objects, so it now reports only 4 allocated
objects:

  crash> kmem -s nfs_commit_data
  CACHE             OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE  NAME
  ffff9ad40c7cb2c0      728          4        42      1    32k  nfs_commit_data
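
The decoding step described above can be sketched in C as follows. This is only an illustration under stated assumptions, not the code in memory.c: the names sketch_bswap64(), sketch_freelist_ptr() and SKETCH_FREELIST_PTR_BSWAP are hypothetical, and the XOR scheme (stored value, per-cache random value, address of the pointer slot) is assumed from how hardened freelists are generally understood to work.

```c
#include <stdint.h>

/* Hypothetical flag: set when the kernel byte-swaps the slot address. */
#define SKETCH_FREELIST_PTR_BSWAP 0x1u

/* Reverse the byte order of a 64-bit value. */
uint64_t sketch_bswap64(uint64_t v)
{
    uint64_t r = 0;
    int i;

    for (i = 0; i < 8; i++) {
        r = (r << 8) | (v & 0xff);
        v >>= 8;
    }
    return r;
}

/*
 * Decode an obfuscated freelist "next" pointer (illustrative only).
 *   stored   - value read from the free object
 *   ptr_addr - address the value was read from
 *   random   - per-cache random value
 *   flags    - SKETCH_FREELIST_PTR_BSWAP if the extra bswap applies
 */
uint64_t sketch_freelist_ptr(uint64_t stored, uint64_t ptr_addr,
                             uint64_t random, unsigned int flags)
{
    if (flags & SKETCH_FREELIST_PTR_BSWAP)
        ptr_addr = sketch_bswap64(ptr_addr);
    return stored ^ random ^ ptr_addr;
}
```

If the bswap is skipped on a kernel that uses it, the decoded value is not a valid object address, which is consistent with the walk stopping at the freelist head in the first output above.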

Signed-off-by: Georges Aureau <[email protected]>
Currently, the "net -s" option fails to show IPv6 addresses and ports
for the SOURCE-PORT and DESTINATION-PORT columns on Linux 3.13 and later
kernels, which have kernel commit efe4208f47f907 ("ipv6: make lookups
simpler and faster").  For example:

  crash> net -s
  PID: 305524   TASK: ffff9bc449895580  CPU: 6    COMMAND: "sshd"
  FD      SOCKET            SOCK       FAMILY:TYPE SOURCE-PORT DESTINATION-PORT
   3 ffff9bc446e9a680 ffff9bc4455b5940 UNIX:DGRAM
   4 ffff9bc446e9c600 ffff9bc3b2b24e00 INET6:STREAM

With the patch:

  crash> net -s
  PID: 305524   TASK: ffff9bc449895580  CPU: 6    COMMAND: "sshd"
  FD      SOCKET            SOCK       FAMILY:TYPE SOURCE-PORT DESTINATION-PORT
   3 ffff9bc446e9a680 ffff9bc4455b5940 UNIX:DGRAM
   4 ffff9bc446e9c600 ffff9bc3b2b24e00 INET6:STREAM xxxx:xx:x:xxxx:xxxx:xxxx:xxxx:xxxx-22 yyyy:yy:y:yyyy:yyyy:yyyy:yyyy:yyyy-44870
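
The "ADDRESS-PORT" text in the two rightmost columns can be produced along these lines. This is a minimal sketch that assumes the address and port have already been read from the dump; sketch_format_inet6() is a hypothetical helper, not a function in crash.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/*
 * Format an IPv6 address and port as "ADDRESS-PORT".
 * port_be is in network byte order.
 * Returns 0 on success, -1 if formatting fails or buf is too small.
 */
int sketch_format_inet6(const struct in6_addr *addr, unsigned short port_be,
                        char *buf, size_t len)
{
    char text[INET6_ADDRSTRLEN];
    int n;

    assert(addr != NULL && buf != NULL);

    if (inet_ntop(AF_INET6, addr, text, sizeof(text)) == NULL)
        return -1;

    n = snprintf(buf, len, "%s-%u", text, (unsigned int)ntohs(port_be));
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```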

Reported-by: Buland Kumar Singh <[email protected]>
Signed-off-by: Lianbo Jiang <[email protected]>
Signed-off-by: Kazuhito Hagio <[email protected]>
The "kmem -i" option may output bogus statistics for CACHED, which
might be observed in extreme situations in the kernel, such as OOM,
disk IO errors, etc.

The following calculation in dump_kmeminfo() may produce a negative
value:
  page_cache_size = nr_file_pages - swapper_space_nrpages - buffer_pages;

As a result, the negative value is converted to an unsigned long
integer, wraps around, and is printed as a huge number:

  crash> kmem -i
                   PAGES        TOTAL      PERCENTAGE
      TOTAL MEM  255314511     973.9 GB         ----
           FREE   533574         2 GB    0% of TOTAL MEM
           USED  254780937     971.9 GB   99% of TOTAL MEM
         SHARED     1713       6.7 MB    0% of TOTAL MEM
        BUFFERS      374       1.5 MB    0% of TOTAL MEM
         CACHED     -114  70368744177664 GB  72251060080% of TOTAL MEM
                    ^^^^  ^^^^^^^^^^^^^^     ^^^^^^^^^^^^
         ...

Let's normalize it to zero and print an info message to fix such corner cases.
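
A minimal sketch of that normalization, assuming the three inputs are page counts that fit in a signed long. The helper name sketch_page_cache_size() and the message text are hypothetical and not taken from the patch.

```c
#include <stdio.h>

/*
 * Compute the CACHED page count and clamp a negative result to zero.
 * The subtraction is done in signed arithmetic so that a negative
 * result can be detected before it wraps around as an unsigned value.
 */
unsigned long sketch_page_cache_size(unsigned long nr_file_pages,
                                     unsigned long swapper_space_nrpages,
                                     unsigned long buffer_pages)
{
    long pages = (long)nr_file_pages - (long)swapper_space_nrpages
                 - (long)buffer_pages;

    if (pages < 0) {
        /* Hypothetical wording for the info message. */
        fprintf(stderr, "page_cache_size is negative (%ld), setting it to 0\n",
                pages);
        pages = 0;
    }
    return (unsigned long)pages;
}
```

For instance, with illustrative inputs of 260 file pages, 0 swapper pages and 374 buffer pages, the raw result is -114 and the helper returns 0.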

Reported-by: Buland Kumar Singh <[email protected]>
Signed-off-by: Lianbo Jiang <[email protected]>
Signed-off-by: Kazuhito Hagio <[email protected]>
Kernel commit 7d65f4a65532 ("irq: Consolidate do_softirq() arch overriden
implementations") renamed call_softirq to do_softirq_own_stack, and
there is likewise no exception frame when coming from
do_softirq_own_stack. Without the patch, crash may unnecessarily output
an exception frame with a warning, as below:

  crash> foreach bt
  ...
  PID: 0        TASK: ffff914f820a8000  CPU: 25   COMMAND: "swapper/25"
   #0 [fffffe0000504e48] crash_nmi_callback at ffffffffa665d763
   #1 [fffffe0000504e50] nmi_handle at ffffffffa662a423
   #2 [fffffe0000504ea8] default_do_nmi at ffffffffa6fe7dc9
   #3 [fffffe0000504ec8] do_nmi at ffffffffa662a97f
   #4 [fffffe0000504ef0] end_repeat_nmi at ffffffffa70015e8
      [exception RIP: clone_endio+172]
      RIP: ffffffffc005c1ec  RSP: ffffa1d403d08e98  RFLAGS: 00000246
      RAX: 0000000000000000  RBX: ffff915326fba230  RCX: 0000000000000018
      RDX: ffffffffc0075400  RSI: 0000000000000000  RDI: ffff915326fba230
      RBP: ffff915326fba1c0   R8: 0000000000001000   R9: ffff915308d6d2a0
      R10: 000000a97dfe5e10  R11: ffffa1d40038fe98  R12: ffff915302babc40
      R13: ffff914f94360000  R14: 0000000000000000  R15: 0000000000000000
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
  --- <NMI exception stack> ---
   #5 [ffffa1d403d08e98] clone_endio at ffffffffc005c1ec [dm_mod]
   #6 [ffffa1d403d08ed0] blk_update_request at ffffffffa6a96954
   #7 [ffffa1d403d08f10] scsi_end_request at ffffffffa6c9b968
   #8 [ffffa1d403d08f48] scsi_io_completion at ffffffffa6c9bb3e
   #9 [ffffa1d403d08f90] blk_complete_reqs at ffffffffa6aa0e95
   #10 [ffffa1d403d08fa0] __softirqentry_text_start at ffffffffa72000dc
   #11 [ffffa1d403d08ff0] do_softirq_own_stack at ffffffffa7000f9a
  --- <IRQ stack> ---
   #12 [ffffa1d40038fe70] do_softirq_own_stack at ffffffffa7000f9a
      [exception RIP: unknown or invalid address]
      RIP: 0000000000000000  RSP: 0000000000000000  RFLAGS: 00000000
      RAX: ffffffffa672eae5  RBX: ffffffffa83b34e0  RCX: ffffffffa672eb12
      RDX: 0000000000000010  RSI: 8b7d6c8869010c00  RDI: 0000000000000085
      RBP: 0000000000000286   R8: ffff914f820a8000   R9: ffffffffa67a94e0
      R10: 0000000000000286  R11: ffffffffa66fb4c5  R12: ffffffffa67a898b
      R13: 0000000000000000  R14: fffffffffffffff8  R15: ffffffffa67a1e68
      ORIG_RAX: 0000000000000000  CS: 0000  SS: ffffffffa672edff
   bt: WARNING: possibly bogus exception frame
   #13 [ffffa1d40038ff30] start_secondary at ffffffffa665fa2c
   #14 [ffffa1d40038ff50] secondary_startup_64_no_verify at ffffffffa6600116
   ...
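
One way to express the check is a lookup against the symbols named in this message. The sketch below is an assumption about the shape of such a test, not the patched code; sketch_irq_exit_has_no_eframe() and its table are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Symbols for which no exception frame is expected, per the text above. */
static const char *const sketch_no_eframe_syms[] = {
    "call_softirq",
    "do_softirq_own_stack",
};

/* Return true if an exception frame should not be dumped for this symbol. */
bool sketch_irq_exit_has_no_eframe(const char *sym)
{
    size_t i;
    size_t count = sizeof(sketch_no_eframe_syms) /
                   sizeof(sketch_no_eframe_syms[0]);

    assert(sym != NULL);

    for (i = 0; i < count; i++) {
        if (strcmp(sym, sketch_no_eframe_syms[i]) == 0)
            return true;
    }
    return false;
}
```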

Reported-by: Marco Patalano <[email protected]>
Signed-off-by: Lianbo Jiang <[email protected]>
…code

For gdb-10.2, a line of disassembly may start with "=>", which needs to
be stripped before calculating the address. Otherwise, parsing the
address fails because the current code always assumes that the line
starts with "0x". For example:

  crash> gdb disassemble 0xffffffffa2317add
  Dump of assembler code for function native_queued_spin_lock_slowpath:
     ...
     0xffffffffa2317ad3 <+35>:    mov    %edx,%eax
     0xffffffffa2317ad5 <+37>:    lock cmpxchg %ecx,(%rdi)
  => 0xffffffffa2317ad9 <+41>:    cmp    %eax,%edx
     0xffffffffa2317adb <+43>:    jne    0xffffffffa2317ac0 ...
     0xffffffffa2317add <+45>:    pop    %rbp
     ...

Without the patch:
  crash> dis 0xffffffffa2317add -r | tail -5
  0xffffffffa2317ad3 <native_queued_spin_lock_slowpath+35>:	mov    %edx,%eax
  0xffffffffa2317ad5 <native_queued_spin_lock_slowpath+37>:	lock cmpxchg %ecx,(%rdi)
  0xffffffffa2317ad5 <native_queued_spin_lock_slowpath+37>:	cmp    %eax,%edx
                                                       ^^
  0xffffffffa2317adb <native_queued_spin_lock_slowpath+43>:	jne    0xffffffffa2317ac0 ...
  0xffffffffa2317add <native_queued_spin_lock_slowpath+45>:	pop    %rbp

With the patch:

  crash> dis 0xffffffffa2317add -r | tail -5
  0xffffffffa2317ad3 <native_queued_spin_lock_slowpath+35>:	mov    %edx,%eax
  0xffffffffa2317ad5 <native_queued_spin_lock_slowpath+37>:	lock cmpxchg %ecx,(%rdi)
  0xffffffffa2317ad9 <native_queued_spin_lock_slowpath+41>:	cmp    %eax,%edx
  0xffffffffa2317adb <native_queued_spin_lock_slowpath+43>:	jne    0xffffffffa2317ac0 ...
  0xffffffffa2317add <native_queued_spin_lock_slowpath+45>:	pop    %rbp
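
The stripping step can be sketched as below. This is an illustration only: sketch_parse_dis_addr() is a hypothetical helper, and the actual change in crash may be structured differently.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse the leading address of a gdb disassembly line, skipping an
 * optional "=>" current-instruction marker.
 * Returns 1 and stores the address on success, 0 otherwise.
 */
int sketch_parse_dis_addr(const char *line, unsigned long long *addr)
{
    char *end;

    assert(line != NULL && addr != NULL);

    while (isspace((unsigned char)*line))
        line++;

    /* Strip the marker and any whitespace that follows it. */
    if (strncmp(line, "=>", 2) == 0) {
        line += 2;
        while (isspace((unsigned char)*line))
            line++;
    }

    if (strncmp(line, "0x", 2) != 0)
        return 0;

    *addr = strtoull(line, &end, 16);

    /* Require at least one hex digit after the "0x" prefix. */
    return end > line + 2;
}
```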

Reported-by: Vernon Lovejoy <[email protected]>
Signed-off-by: Lianbo Jiang <[email protected]>
On kernels configured with CONFIG_RANDOMIZE_KSTACK_OFFSET=y and
random_kstack_offset=on, a random offset is added to task stacks with
__kstack_alloca() at the beginning of do_syscall_64() and other syscall
entry functions.  This eventually executes the following instruction:

  <do_syscall_64+32>:  sub    %rax,%rsp

On the other hand, crash uses only part of the ORC unwinder data to
unwind stacks, and if an ip value has no usable ORC data, it
calculates the frame size by parsing the assembly of the function.

However, crash cannot calculate the frame size correctly with the
instruction above, and prints stale return addresses like this:

  crash> bt 1
  PID: 1        TASK: ffff9c250023b880  CPU: 0    COMMAND: "systemd"
    #0 [ffffb7e5c001fc80] __schedule at ffffffff91ae2b16
    #1 [ffffb7e5c001fd00] schedule at ffffffff91ae2ed3
    #2 [ffffb7e5c001fd18] schedule_hrtimeout_range_clock at ffffffff91ae7ed8
    #3 [ffffb7e5c001fda8] ep_poll at ffffffff913ef828
    #4 [ffffb7e5c001fe48] do_epoll_wait at ffffffff913ef943
    #5 [ffffb7e5c001fe80] __x64_sys_epoll_wait at ffffffff913f0130
    #6 [ffffb7e5c001fed0] do_syscall_64 at ffffffff91ad7169
    #7 [ffffb7e5c001fef0] do_syscall_64 at ffffffff91ad7179             <<
    #8 [ffffb7e5c001ff10] syscall_exit_to_user_mode at ffffffff91adaab2 << stale entries
    #9 [ffffb7e5c001ff20] do_syscall_64 at ffffffff91ad7179             <<
   #10 [ffffb7e5c001ff50] entry_SYSCALL_64_after_hwframe at ffffffff91c0009b
       RIP: 00007f258d9427ae  RSP: 00007fffda631d60  RFLAGS: 00000293
       ...

To fix this, enhance the use of ORC data.  The ORC unwinder often uses
the %rbp value, so keep it from exception frames and inactive task
stacks.
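
To illustrate why assembly parsing alone cannot size such a frame, the sketch below classifies a "sub" on %rsp as either a known immediate or a register amount that is only known at run time. It is a simplified stand-in, not crash's parser; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

enum sketch_rsp_adjust {
    SKETCH_RSP_NONE,      /* not a "sub ...,%rsp" instruction */
    SKETCH_RSP_IMMEDIATE, /* "sub $IMM,%rsp": size known statically */
    SKETCH_RSP_DYNAMIC    /* "sub %reg,%rsp": size known only at run time */
};

/* Classify one AT&T-syntax instruction; *bytes is set for immediates only. */
enum sketch_rsp_adjust sketch_classify_sub(const char *insn,
                                           unsigned long *bytes)
{
    unsigned long imm;
    char reg[16];
    int n;

    assert(insn != NULL && bytes != NULL);

    /* %n is stored only if the whole pattern, including ",%rsp", matched. */
    n = 0;
    if (sscanf(insn, " sub $%lx,%%rsp%n", &imm, &n) == 1 && n > 0) {
        *bytes = imm;
        return SKETCH_RSP_IMMEDIATE;
    }

    n = 0;
    if (sscanf(insn, " sub %%%15[a-z0-9],%%rsp%n", reg, &n) == 1 && n > 0)
        return SKETCH_RSP_DYNAMIC;

    return SKETCH_RSP_NONE;
}
```

For the instruction shown earlier, "sub %rax,%rsp", the result is SKETCH_RSP_DYNAMIC, so a static frame-size calculation has nothing to add.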

Signed-off-by: Kazuhito Hagio <[email protected]>
fengjixuchui merged commit 63315d0 into fengjixuchui:master on Feb 27, 2023
fengjixuchui pushed a commit that referenced this pull request Mar 5, 2024
…usly

There is an issue where, for kernel modules, "dis -rl" fails to display
the module's source line number data after the "bt" command has been
executed in crash.

Without the patch:
  crash> mod -S
  crash> bt
  PID: 1500     TASK: ff2bd8b093524000  CPU: 16   COMMAND: "lpfc_worker_0"
   #0 [ff2c9f725c39f9e0] machine_kexec at ffffffff8e0686d3
   ...snip...
   #8 [ff2c9f725c39fcc0] __lpfc_sli_release_iocbq_s4 at ffffffffc0f2f425 [lpfc]
   ...snip...
  crash> dis -rl ffffffffc0f60f82
  0xffffffffc0f60eb0 <lpfc_nlp_get>:      nopl   0x0(%rax,%rax,1) [FTRACE NOP]
  0xffffffffc0f60eb5 <lpfc_nlp_get+5>:    push   %rbp
  0xffffffffc0f60eb6 <lpfc_nlp_get+6>:    push   %rbx
  0xffffffffc0f60eb7 <lpfc_nlp_get+7>:    test   %rdi,%rdi

With the patch:
  crash> mod -S
  crash> bt
  PID: 1500     TASK: ff2bd8b093524000  CPU: 16   COMMAND: "lpfc_worker_0"
   #0 [ff2c9f725c39f9e0] machine_kexec at ffffffff8e0686d3
   ...snip...
   #8 [ff2c9f725c39fcc0] __lpfc_sli_release_iocbq_s4 at ffffffffc0f2f425 [lpfc]
   ...snip...
  crash> dis -rl ffffffffc0f60f82
  /usr/src/debug/kernel-4.18.0-425.13.1.el8_7/linux-4.18.0-425.13.1.el8_7.x86_64/drivers/scsi/lpfc/lpfc_hbadisc.c: 6756
  0xffffffffc0f60eb0 <lpfc_nlp_get>:      nopl   0x0(%rax,%rax,1) [FTRACE NOP]
  /usr/src/debug/kernel-4.18.0-425.13.1.el8_7/linux-4.18.0-425.13.1.el8_7.x86_64/drivers/scsi/lpfc/lpfc_hbadisc.c: 6759
  0xffffffffc0f60eb5 <lpfc_nlp_get+5>:    push   %rbp

The root cause is that, after a kernel module has been loaded by the mod
command, the symtable is not expanded on the gdb side. The crash bt or dis
command will trigger such an expansion. However, the symtable expansion
differs between the two commands:

The stack trace of "dis -rl" for symtable expanding:

  #0  0x00000000008d8d9f in add_compunit_symtab_to_objfile ...
  #1  0x00000000006d3293 in buildsym_compunit::end_symtab_with_blockvector ...
  #2  0x00000000006d336a in buildsym_compunit::end_symtab_from_static_block ...
  #3  0x000000000077e8e9 in process_full_comp_unit ...
  #4  process_queue ...
  #5  dw2_do_instantiate_symtab ...
  #6  0x000000000077ed67 in dw2_instantiate_symtab ...
  #7  0x000000000077f75e in dw2_expand_all_symtabs ...
  #8  0x00000000008f254d in gdb_get_line_number ...
  #9  0x00000000008f22af in gdb_command_funnel_1 ...
  #10 0x00000000008f2003 in gdb_command_funnel ...
  #11 0x00000000005b7f02 in gdb_interface ...
  #12 0x00000000005f5bd8 in get_line_number ...
  #13 0x000000000059e574 in cmd_dis ...

The stack trace of "bt" for symtable expanding:

  #0  0x00000000008d8d9f in add_compunit_symtab_to_objfile ...
  #1  0x00000000006d3293 in buildsym_compunit::end_symtab_with_blockvector ...
  #2  0x00000000006d336a in buildsym_compunit::end_symtab_from_static_block ...
  #3  0x000000000077e8e9 in process_full_comp_unit ...
  #4  process_queue ...
  #5  dw2_do_instantiate_symtab ...
  #6  0x000000000077ed67 in dw2_instantiate_symtab ...
  #7  0x000000000077f8ed in dw2_lookup_symbol ...
  #8  0x00000000008e6d03 in lookup_symbol_via_quick_fns ...
  #9  0x00000000008e7153 in lookup_symbol_in_objfile ...
  #10 0x00000000008e73c6 in lookup_symbol_global_or_static_iterator_cb ...
  #11 0x00000000008b99c4 in svr4_iterate_over_objfiles_in_search_order ...
  #12 0x00000000008e754e in lookup_global_or_static_symbol ...
  #13 0x00000000008e75da in lookup_static_symbol ...
  #14 0x00000000008e632c in lookup_symbol_aux ...
  #15 0x00000000008e5a7a in lookup_symbol_in_language ...
  #16 0x00000000008e5b30 in lookup_symbol ...
  #17 0x00000000008f2a4a in gdb_get_datatype ...
  #18 0x00000000008f22c0 in gdb_command_funnel_1 ...
  #19 0x00000000008f2003 in gdb_command_funnel ...
  #20 0x00000000005b7f02 in gdb_interface ...
  #21 0x00000000005f8a9f in datatype_info ...
  #22 0x0000000000599947 in cpu_map_size ...
  #23 0x00000000005a975d in get_cpus_online ...
  #24 0x0000000000637a8b in diskdump_get_prstatus_percpu ...
  #25 0x000000000062f0e4 in get_netdump_regs_x86_64 ...
  #26 0x000000000059fe68 in back_trace ...
  #27 0x00000000005ab1cb in cmd_bt ...

In the stack trace of "dis -rl", dw2_expand_all_symtabs() is called to
expand all symtables of the objfile, "*.ko.debug" in our case. In the
stack trace of "bt", however, not everything is expanded, only the subset
of symtables needed to find a symbol via dw2_lookup_symbol(). As a
result, objfile->compunit_symtabs, the head of a singly linked list of
struct compunit_symtab, is not NULL but does not contain all symtables.
It will not be reinitialized in gdb_get_line_number() by "dis -rl"
because the !objfile_has_full_symbols(objfile) check will fail, so the
proper code line number data cannot be displayed.

Since the objfile_has_full_symbols(objfile) check cannot ensure that all
symbols have been expanded, this patch adds a new member to struct
objfile as a flag recording whether all symbols have been expanded. The
flag is set only after expand_all_symtabs has been called.
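
The idea can be sketched with simplified stand-in types. Nothing below is gdb's real definition: struct sketch_objfile, the member name all_symtabs_expanded and both helpers are hypothetical, and the actual expansion work is elided.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a compunit symtab list node. */
struct sketch_symtab {
    struct sketch_symtab *next;
};

/* Simplified stand-in for an objfile. */
struct sketch_objfile {
    struct sketch_symtab *compunit_symtabs; /* may be only partly filled */
    bool all_symtabs_expanded;              /* hypothetical flag name */
};

/* Stand-in for the expand-everything step; records that it has run. */
void sketch_expand_all_symtabs(struct sketch_objfile *obj)
{
    assert(obj != NULL);
    /* Real expansion work elided in this sketch. */
    obj->all_symtabs_expanded = true;
}

/*
 * Decide whether a line-number lookup must expand symtabs first.
 * A non-NULL list head is not enough, since "bt" can leave the list
 * partially populated; only the flag is trusted.
 */
bool sketch_needs_full_expansion(const struct sketch_objfile *obj)
{
    assert(obj != NULL);
    return !obj->all_symtabs_expanded;
}
```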

Signed-off-by: Tao Liu <[email protected]>