Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dump maple tree offset variables by "help -o" #13

Merged
merged 23 commits into from
Jan 12, 2023

Conversation

fengjixuchui
Copy link
Owner

No description provided.

k-hagio and others added 23 commits December 16, 2022 13:24
Kernel commit f1a7941243c1 ("mm: convert mm's rss stats into
percpu_counter"), which is contained in Linux 6.2-rc1 and later
kernels, changed mm_struct.rss_stat from struct mm_rss_stat into an
array of struct percpu_counter.

Without the patch, "ps" and several commands fail with the following
error message:

  ps: invalid structure member offset: mm_rss_stat_count
      FILE: memory.c  LINE: 4724  FUNCTION: get_task_mem_usage()

Signed-off-by: Kazuhito Hagio <[email protected]>
Recently the following failure has been observed on some vmcores when
using the mount command:

  crash> mount
       MOUNT           SUPERBLK     TYPE   DEVNAME   DIRNAME
  ffff97a4818a3480 ffff979500013800 rootfs none      /
  ffff97e4846ca700 ffff97e484653000 sysfs  sysfs     /sys
  ...
  ffff97b484753420                0 mount: invalid kernel virtual address: 0  type: "super_block buffer"

The kernel virtual address of the super_block is zero when the mount
command fails with the vfsmnt address 0xffff97b484753420. And the
remaining mount information will be discarded. That is not expected.

Check the address and skip it with a warning, if this is an invalid
kernel virtual address, that can avoid truncating the remaining mount
dumps.

Reported-by: Dave Wysochanski <[email protected]>
Signed-off-by: Lianbo Jiang <[email protected]>
This patch mainly added some environment configurations, macro definitions,
specific architecture structures and some function declarations supported
by the RISCV64 architecture.

We can use the build command to get the simplest version crash tool:
	make target=RISCV64 -j2

Co-developed-by: Lifang Xia <[email protected]>
Signed-off-by: Xianting Tian <[email protected]>
1. Add riscv64_init() implementation, do all necessary machine-specific setup,
   which will be called multiple times during initialization.
2. Add riscv64 sv39/48/57 pagetable macro definitions, the function of converting
   virtual address to a physical address via 4K page table.
   For 2M and 1G pagesize, they will be implemented in the future(currently not supported).
3. Add the implementation of the vtop command, which is used to convert a
   virtual address to a physical address(call the functions defined in 2).
4. Add the implementation to get virtual memory layout, va_bits, phys_ram_base
   from vmcoreinfo. As these configurations changes from time to time, we sent
   a Linux kernel patch to export these configurations, which can simplify the
   development of crash tool.
   The kernel commit: 649d6b1019a2 ("RISC-V: Add arch_crash_save_vmcoreinfo")
5. Add riscv64_get_smp_cpus() implementation, get the number of cpus.
6. Add riscv64_get_page_size() implementation, get page size.
And so on.

With this patch, we can enter crash command line, and run "vtop", "mod", "rd",
"*", "p", "kmem" ...

Tested on QEMU RISCV64 end and SoC platform of T-head Xuantie 910 CPU.

      KERNEL: vmlinux
    DUMPFILE: vmcore
        CPUS: 1
        DATE: Fri Jul 15 10:24:25 CST 2022
      UPTIME: 00:00:33
LOAD AVERAGE: 0.05, 0.01, 0.00
       TASKS: 41
    NODENAME: buildroot
     RELEASE: 5.18.9
     VERSION: #30 SMP Fri Jul 15 09:47:03 CST 2022
     MACHINE: riscv64  (unknown Mhz)
      MEMORY: 1 GB
       PANIC: "Kernel panic - not syncing: sysrq triggered crash"
         PID: 113
     COMMAND: "sh"
        TASK: ff60000002269600  [THREAD_INFO: ff60000002269600]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

crash> p mem_map
mem_map = $1 = (struct page *) 0xff6000003effbf00

crash> p /x *(struct page *) 0xff6000003effbf00
$5 = {
  flags = 0x1000,
  {
    {
      {
        lru = {
          next = 0xff6000003effbf08,
          prev = 0xff6000003effbf08
        },
        {
          __filler = 0xff6000003effbf08,
          mlock_count = 0x3effbf08
        }
      },
      mapping = 0x0,
      index = 0x0,
      private = 0x0
    },

crash> mod
     MODULE       NAME             BASE         SIZE  OBJECT FILE
ffffffff0113e740  nvme_core  ffffffff01133000  98304  (not loaded)  [CONFIG_KALLSYMS]
ffffffff011542c0  nvme       ffffffff0114c000  61440  (not loaded)  [CONFIG_KALLSYMS]

crash> rd ffffffff0113e740 8
ffffffff0113e740:  0000000000000000 ffffffff810874f8   .........t......
ffffffff0113e750:  ffffffff011542c8 726f635f656d766e   .B......nvme_cor
ffffffff0113e760:  0000000000000065 0000000000000000   e...............
ffffffff0113e770:  0000000000000000 0000000000000000   ................

crash> vtop ffffffff0113e740
VIRTUAL           PHYSICAL
ffffffff0113e740  8254d740

  PGD: ffffffff810e9ff8 => 2ffff001
  P4D: 0000000000000000 => 000000002fffec01
  PUD: 00005605c2957470 => 0000000020949801
  PMD: 00007fff7f1750c0 => 0000000020947401
  PTE: 0 => 209534e7
 PAGE: 000000008254d000

  PTE     PHYSICAL  FLAGS
209534e7  8254d000  (PRESENT|READ|WRITE|GLOBAL|ACCESSED|DIRTY)

      PAGE       PHYSICAL      MAPPING       INDEX CNT FLAGS
ff6000003f0777d8 8254d000                0        0  1 0

Tested-by: Yixun Lan <[email protected]>
Signed-off-by: Xianting Tian <[email protected]>
Use generic_dis_filter() function to support dis command implementation.

With this patch, we can get the disassembled code,
crash> dis __crash_kexec
0xffffffff80088580 <__crash_kexec>:     addi    sp,sp,-352
0xffffffff80088582 <__crash_kexec+2>:   sd      s0,336(sp)
0xffffffff80088584 <__crash_kexec+4>:   sd      s1,328(sp)
0xffffffff80088586 <__crash_kexec+6>:   sd      s2,320(sp)
0xffffffff80088588 <__crash_kexec+8>:   addi    s0,sp,352
0xffffffff8008858a <__crash_kexec+10>:  sd      ra,344(sp)
0xffffffff8008858c <__crash_kexec+12>:  sd      s3,312(sp)
0xffffffff8008858e <__crash_kexec+14>:  sd      s4,304(sp)
0xffffffff80088590 <__crash_kexec+16>:  auipc   s2,0x1057
0xffffffff80088594 <__crash_kexec+20>:  addi    s2,s2,-1256
0xffffffff80088598 <__crash_kexec+24>:  ld      a5,0(s2)
0xffffffff8008859c <__crash_kexec+28>:  mv      s1,a0
0xffffffff8008859e <__crash_kexec+30>:  auipc   a0,0xfff

Signed-off-by: Xianting Tian <[email protected]>
With the patch, we can get the irq info,
crash> irq
 IRQ   IRQ_DESC/_DATA      IRQACTION      NAME
 0       (unused)          (unused)
 1   ff60000001329600  ff60000001d17180  "101000.rtc"
 2   ff60000001329800  ff60000001d17680  "ttyS0"
 3   ff60000001329a00  ff60000001c33c00  "virtio0"
 4   ff60000001329c00  ff60000001c33f80  "virtio1"
 5   ff6000000120f400  ff60000001216000  "riscv-timer"

Signed-off-by: Xianting Tian <[email protected]>
1, Add the implementation to get stack frame from active & inactive
   task's stack.
2, Add 'bt -l' command support get a line number associated with a
   current pc address.
3, Add 'bt -f' command support to display all stack data contained
   in a frame

With the patch, we can get the backtrace,
crash> bt
PID: 113      TASK: ff6000000226c200  CPU: 0    COMMAND: "sh"
 #0 [ff20000010333b90] riscv_crash_save_regs at ffffffff800078f8
 #1 [ff20000010333cf0] panic at ffffffff806578c6
 #2 [ff20000010333d50] sysrq_reset_seq_param_set at ffffffff8038c03c
 #3 [ff20000010333da0] __handle_sysrq at ffffffff8038c604
 #4 [ff20000010333e00] write_sysrq_trigger at ffffffff8038cae4
 #5 [ff20000010333e20] proc_reg_write at ffffffff801b7ee8
 #6 [ff20000010333e40] vfs_write at ffffffff80152bb2
 #7 [ff20000010333e80] ksys_write at ffffffff80152eda
 #8 [ff20000010333ed0] sys_write at ffffffff80152f52

crash> bt -l
PID: 113      TASK: ff6000000226c200  CPU: 0    COMMAND: "sh"
 #0 [ff20000010333b90] riscv_crash_save_regs at ffffffff800078f8
    /buildroot/qemu_riscv64_virt_defconfig/build/linux-custom/arch/riscv/kernel/crash_save_regs.S: 47
 #1 [ff20000010333cf0] panic at ffffffff806578c6
    /buildroot/qemu_riscv64_virt_defconfig/build/linux-custom/kernel/panic.c: 276
 ... ...

crash> bt -f
PID: 113      TASK: ff6000000226c200  CPU: 0    COMMAND: "sh"
 #0 [ff20000010333b90] riscv_crash_save_regs at ffffffff800078f8
    [PC: ffffffff800078f8 RA: ffffffff806578c6 SP: ff20000010333b90 SIZE: 352]
    ff20000010333b90: ff20000010333bb0 ffffffff800078f8
    ff20000010333ba0: ffffffff8008862c ff20000010333b90
    ff20000010333bb0: ffffffff810dde38 ff6000000226c200
    ff20000010333bc0: ffffffff8032be68 0720072007200720
 ... ...

Signed-off-by: Xianting Tian <[email protected]>
Add support form printing out the registers from the dump file.

With the patch, we can get the regs,
crash> help -r
CPU 0:
epc : 00ffffffa5537400 ra : ffffffff80088620 sp : ff2000001039bb90
 gp : ffffffff810dde38 tp : ff60000002269600 t0 : ffffffff8032be5c
 t1 : 0720072007200720 t2 : 666666666666663c s0 : ff2000001039bcf0
 s1 : 0000000000000000 a0 : ff2000001039bb98 a1 : 0000000000000001
 a2 : 0000000000000010 a3 : 0000000000000000 a4 : 0000000000000000
 a5 : ff60000001c7d000 a6 : 000000000000003c a7 : ffffffff8035c998
 s2 : ffffffff810df0a8 s3 : ffffffff810df718 s4 : ff2000001039bb98
 s5 : 0000000000000000 s6 : 0000000000000007 s7 : ffffffff80c4a468
 s8 : 00fffffffde45410 s9 : 0000000000000007 s10: 00aaaaaad1640700
 s11: 0000000000000001 t3 : ff60000001218f00 t4 : ff60000001218f00
 t5 : ff60000001218000 t6 : ff2000001039b988

Signed-off-by: Xianting Tian <[email protected]>
Add riscv64_dump_machdep_table() implementation, display machdep_table.

crash> help -m
              flags: 80 ()
             kvbase: ff60000000000000
  identity_map_base: ff60000000000000
           pagesize: 4096
          pageshift: 12
           pagemask: fffffffffffff000
         pageoffset: fff
        pgdir_shift: 48
       ptrs_per_pgd: 512
       ptrs_per_pte: 512
          stacksize: 16384
                 hz: 250
            memsize: 1071644672 (0x3fe00000)
               bits: 64
         back_trace: riscv64_back_trace_cmd()
    processor_speed: riscv64_processor_speed()
              uvtop: riscv64_uvtop()
              kvtop: riscv64_kvtop()
    get_stack_frame: riscv64_get_stack_frame()
      get_stackbase: generic_get_stackbase()
       get_stacktop: generic_get_stacktop()
      translate_pte: riscv64_translate_pte()
        memory_size: generic_memory_size()
      vmalloc_start: riscv64_vmalloc_start()
       is_task_addr: riscv64_is_task_addr()
      verify_symbol: riscv64_verify_symbol()
         dis_filter: generic_dis_filter()
           dump_irq: generic_dump_irq()
    show_interrupts: generic_show_interrupts()
   get_irq_affinity: generic_get_irq_affinity()
           cmd_mach: riscv64_cmd_mach()
       get_smp_cpus: riscv64_get_smp_cpus()
          is_kvaddr: riscv64_is_kvaddr()
          is_uvaddr: riscv64_is_uvaddr()
       verify_paddr: generic_verify_paddr()
    init_kernel_pgd: NULL
    value_to_symbol: generic_machdep_value_to_symbol()
  line_number_hooks: NULL
      last_pgd_read: ffffffff810e9000
      last_p4d_read: 81410000
      last_pud_read: 81411000
      last_pmd_read: 81412000
     last_ptbl_read: 81415000
                pgd: 560d586f3ab0
                p4d: 560d586f4ac0
                pud: 560d586f5ad0
                pmd: 560d586f6ae0
               ptbl: 560d586f7af0
  section_size_bits: 27
   max_physmem_bits: 56
  sections_per_root: 0
           machspec: 560d57d204a0

Signed-off-by: Xianting Tian <[email protected]>
With the patch we can get some basic machine state information,
crash> mach
                MACHINE TYPE: riscv64
                 MEMORY SIZE: 1 GB
                        CPUS: 1
             PROCESSOR SPEED: (unknown)
                          HZ: 250
                   PAGE SIZE: 4096
           KERNEL STACK SIZE: 16384

Signed-off-by: Xianting Tian <[email protected]>
Verify the symbol to accept or reject a symbol from the kernel namelist.

Signed-off-by: Xianting Tian <[email protected]>
The following kernel commits split slab info from struct page into
struct slab in Linux 5.17.

  d122019bf061 ("mm: Split slab into its own type")
  07f910f9b729 ("mm: Remove slab from struct page")

Crash commit 5f390ed followed the change for SLUB, but crash still
uses the offset of page.lru inappropriately.  Luckily, it could work
because it was the same value as the offset of slab.slab_list until
Linux 6.1.

However, kernel commit 130d4df57390 ("mm/sl[au]b: rearrange struct slab
fields to allow larger rcu_head") in Linux 6.2-rc1 changed the offset of
slab.slab_list.  As a result, without the patch, "kmem -s|-S" options
print the following errors and fail to print values correctly for
kernels configured with CONFIG_SLUB.

  crash> kmem -S filp
  CACHE             OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE  NAME
  kmem: filp: partial list slab: ffffcc650405ab88 invalid page.inuse: -1
  ffff8fa0401eca00      232       1267      1792     56     8k  filp
  ...
  KMEM_CACHE_NODE   NODE  SLABS  PARTIAL  PER-CPU
  ffff8fa0401cb8c0     0     56       24        8
  NODE 0 PARTIAL:
    SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
  kmem: filp: invalid partial list slab pointer: ffffcc650405ab88

Signed-off-by: Kazuhito Hagio <[email protected]>
… later

Kernel commit d42f3245c7e2 ("mm: memcg: convert vmstat slab counters to
bytes"), which is contained in Linux v5.9-rc1 and later kernels, renamed
NR_SLAB_{RECLAIMABLE,UNRECLAIMABLE} to NR_SLAB_{RECLAIMABLE,UNRECLAIMABLE}_B.

Without the patch, "kmem -i" command will display incorrect SLAB
statistics:

  crash> kmem -i | grep -e PAGES -e SLAB
                   PAGES        TOTAL      PERCENTAGE
           SLAB    89458     349.4 MB    0% of TOTAL MEM
                   ^^^^^     ^^^^^

With the patch, the actual result is:
  crash> kmem -i | grep -e PAGES -e SLAB
                   PAGES        TOTAL      PERCENTAGE
           SLAB   261953    1023.3 MB    0% of TOTAL MEM

Reported-by: Buland Kumar Singh <[email protected]>
Signed-off-by: Lianbo Jiang <[email protected]>
Signed-off-by: Kazuhito Hagio <[email protected]>
With glibc-2.23 and earlier (e.g. RHEL7), crash build fails with errors
like this due to EM_RISCV undeclared:

  $ make -j 24 warn
  TARGET: X86_64
  CRASH: 8.0.2++
  GDB: 10.2
  ...
  symbols.c: In function 'is_kernel':
  symbols.c:3746:8: error: 'EM_RISCV' undeclared (first use in this function)
     case EM_RISCV:
          ^
  ...

Define EM_RISCV as 243 [1][2] if not defined.

[1] https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=94e73c95d9b5
[2] http://www.sco.com/developers/gabi/latest/ch4.eheader.html

Signed-off-by: Kazuhito Hagio <[email protected]>
This is a backported patch from gdb. Without the patch, the following
crash command may abort due to an assertion failure in the gdb's
copy_type():

  crash> px __per_cpu_start:0
  gdbtypes.c:5505: internal-error: type* copy_type(const type*): Assertion `TYPE_OBJFILE_OWNED (type)' failed.
  A problem internal to GDB has been detected,
  further debugging may prove unreliable.
  Quit this debugging session? (y or n)

The gdb commit 8e2da1651879 ("Fix assertion failure in copy_type")
solved the current issue.

Reported-by: Buland Kumar Singh <[email protected]>
Signed-off-by: Lianbo Jiang <[email protected]>
Kernel commit e36ce448a08d ("mm/slab: use kmalloc_node() for off slab
freelist_idx_t array allocation"), which is contained in Linux 6.1 and
later kernels, removed kmem_cache.freelist_cache member on kernels
configured with CONFIG_SLAB=y.

Without the patch, crash does not set SLAB_OVERLOAD_PAGE and
"kmem -s|-S" options fail with the following error:

  kmem: invalid structure member offset: slab_list
        FILE: memory.c  LINE: 12156  FUNCTION: verify_slab_v2()

Use kmem_cache.freelist_size instead, which was introduced together
with kmem_cache.freelist_cache by kernel commit 8456a648cf44.

Signed-off-by: Kazuhito Hagio <[email protected]>
Kernel commit 130d4df57390 ("mm/sl[au]b: rearrange struct slab fields to
allow larger rcu_head"), which is contained in Linux 6.2-rc1 and later
kernels, changed the offset of slab.slab_list and now it's not equal to
the offset of page.lru.

Without the patch, "kmem -s|-S" options print errors and zeros for slab
counters like this for kernels configured with CONFIG_SLAB=y.

  crash> kmem -s
  CACHE             OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE  NAME
  kmem: rpc_inode_cache: partial list: page/slab: fffff31ac4125190  bad active counter: 99476865
  kmem: rpc_inode_cache: partial list: page/slab: fffff31ac4125190  bad s_mem pointer: 100000003
  kmem: rpc_inode_cache: full list: page/slab: fffff31ac4125150  bad active counter: 99476225
  kmem: rpc_inode_cache: full list: page/slab: fffff31ac4125150  bad active counter: 99476225
  kmem: rpc_inode_cache: full list: page/slab: fffff31ac4125150  bad s_mem pointer: 100000005
  ffff930202adfb40      704          0         0      0     4k rpc_inode_cache
  ...

Signed-off-by: Kazuhito Hagio <[email protected]>
There have been two ways to iterate vm_area_struct until Linux 6.0:
 1) by rbtree, aka vma.vm_rb;
 2) by linked list, aka vma.vm_{next,prev}.
However with the maple tree patches[1][2] in Linux 6.1, vm_rb and
vm_{next,prev} are removed from vm_area_struct. The vm_area_dump()
in crash mainly uses the linked list for vma iteration, which will
not work for this case. So the maple tree iteration needs to be
ported to crash.

For crash, currently it only iteratively reads the maple tree,
no more rcu safe or maple tree modification features needed.
So we only port a subset of kernel maple tree features.
In addition, we need to modify the ported kernel source code,
making it compatible with crash.

This patch deals with the two issues:
 1) Poring mt_dump() function and all its dependencies from
    kernel source to crash, to enable crash maple tree iteration,
 2) adapting the ported code with crash.

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=524e00b36e8c547f5582eef3fb645a8d9fc5e3df
[2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=763ecb035029f500d7e6dc99acd1ad299b7726a1

Signed-off-by: Tao Liu <[email protected]>
The maple tree is a new data structure for crash, so "tree" command
needs to support it for users to dump and view the content of maple
trees.  This patch achieves this by using ported mt_dump() and its
related functions from kernel and adapting them with "tree" command.

Also introduce a new -v arg specifically for dumping the complete
content of a maple tree:

    crash> tree -t maple 0xffff9034c006aec0 -v

    maple_tree(ffff9034c006aec0) flags 309, height 2 root 0xffff9034de70041e

    0-18446744073709551615: node 0xffff9034de700400 depth 0 type 3 parent 0xffff9034c006aec1 contents:...
      0-140112331583487: node 0xffff9034c01e8800 depth 1 type 1 parent 0xffff9034de700406 contents:...
        0-94643156942847: (nil)
        94643156942848-94643158024191: 0xffff9035131754c0
        94643158024192-94643160117247: (nil)
        ...

The existing options of "tree" command can work as well:

    crash> tree -t maple -r mm_struct.mm_mt 0xffff9034c006aec0 -p
    ffff9035131754c0
      index: 1  position: root/0/1
    ffff9035131751c8
      index: 2  position: root/0/3
    ffff9035131757b8
      index: 3  position: root/0/4
    ...

    crash> tree -t maple 0xffff9034c006aec0 -p -x -s vm_area_struct.vm_start,vm_end
    ffff9035131754c0
      index: 1  position: root/0/1
      vm_start = 0x5613d3c00000,
      vm_end = 0x5613d3d08000,
    ffff9035131751c8
      index: 2  position: root/0/3
      vm_start = 0x5613d3f07000,
      vm_end = 0x5613d3f0b000,
    ffff9035131757b8
      index: 3  position: root/0/4
      vm_start = 0x5613d3f0b000,
      vm_end = 0x5613d3f14000,
    ....

Signed-off-by: Tao Liu <[email protected]>
do_maple_tree() is similar to do_radix_tree() and do_xarray(), which
takes the same do_maple_tree_traverse entry as tree command.

Signed-off-by: Tao Liu <[email protected]>
Since memory.c:vm_area_dump() will iterate all vma, this patch mainly
introduces maple tree vma iteration to it.

We extract the code which handles each vma into a function. If
mm_struct_mmap exist, aka the linked list of vma iteration available,
we goto the original way; if not and mm_struct_mm_mt exist, aka
maple tree is available, then we goto the maple tree vma iteration.

Signed-off-by: Tao Liu <[email protected]>
In the previous patches, some variables are added to offset_table and
size_table, print them out with "help -o" command.

Signed-off-by: Tao Liu <[email protected]>
@fengjixuchui fengjixuchui merged commit 710d80b into fengjixuchui:master Jan 12, 2023
fengjixuchui pushed a commit that referenced this pull request Feb 27, 2023
Kernel commit 7d65f4a65532 ("irq: Consolidate do_softirq() arch overriden
implementations") renamed the call_softirq to do_softirq_own_stack, and
there is no exception frame also when coming from do_softirq_own_stack.
Without the patch, crash may unnecessarily output an exception frame with
a warning as below:

  crash> foreach bt
  ...
  PID: 0        TASK: ffff914f820a8000  CPU: 25   COMMAND: "swapper/25"
   #0 [fffffe0000504e48] crash_nmi_callback at ffffffffa665d763
   #1 [fffffe0000504e50] nmi_handle at ffffffffa662a423
   #2 [fffffe0000504ea8] default_do_nmi at ffffffffa6fe7dc9
   #3 [fffffe0000504ec8] do_nmi at ffffffffa662a97f
   #4 [fffffe0000504ef0] end_repeat_nmi at ffffffffa70015e8
      [exception RIP: clone_endio+172]
      RIP: ffffffffc005c1ec  RSP: ffffa1d403d08e98  RFLAGS: 00000246
      RAX: 0000000000000000  RBX: ffff915326fba230  RCX: 0000000000000018
      RDX: ffffffffc0075400  RSI: 0000000000000000  RDI: ffff915326fba230
      RBP: ffff915326fba1c0   R8: 0000000000001000   R9: ffff915308d6d2a0
      R10: 000000a97dfe5e10  R11: ffffa1d40038fe98  R12: ffff915302babc40
      R13: ffff914f94360000  R14: 0000000000000000  R15: 0000000000000000
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
  --- <NMI exception stack> ---
   #5 [ffffa1d403d08e98] clone_endio at ffffffffc005c1ec [dm_mod]
   #6 [ffffa1d403d08ed0] blk_update_request at ffffffffa6a96954
   #7 [ffffa1d403d08f10] scsi_end_request at ffffffffa6c9b968
   #8 [ffffa1d403d08f48] scsi_io_completion at ffffffffa6c9bb3e
   #9 [ffffa1d403d08f90] blk_complete_reqs at ffffffffa6aa0e95
   #10 [ffffa1d403d08fa0] __softirqentry_text_start at ffffffffa72000dc
   #11 [ffffa1d403d08ff0] do_softirq_own_stack at ffffffffa7000f9a
  --- <IRQ stack> ---
   #12 [ffffa1d40038fe70] do_softirq_own_stack at ffffffffa7000f9a
      [exception RIP: unknown or invalid address]
      RIP: 0000000000000000  RSP: 0000000000000000  RFLAGS: 00000000
      RAX: ffffffffa672eae5  RBX: ffffffffa83b34e0  RCX: ffffffffa672eb12
      RDX: 0000000000000010  RSI: 8b7d6c8869010c00  RDI: 0000000000000085
      RBP: 0000000000000286   R8: ffff914f820a8000   R9: ffffffffa67a94e0
      R10: 0000000000000286  R11: ffffffffa66fb4c5  R12: ffffffffa67a898b
      R13: 0000000000000000  R14: fffffffffffffff8  R15: ffffffffa67a1e68
      ORIG_RAX: 0000000000000000  CS: 0000  SS: ffffffffa672edff
   bt: WARNING: possibly bogus exception frame
   #13 [ffffa1d40038ff30] start_secondary at ffffffffa665fa2c
   #14 [ffffa1d40038ff50] secondary_startup_64_no_verify at ffffffffa6600116
   ...

Reported-by: Marco Patalano <[email protected]>
Signed-off-by: Lianbo Jiang <[email protected]>
fengjixuchui pushed a commit that referenced this pull request Mar 5, 2024
…usly

There is an issue that, for kernel modules, "dis -rl" fails to display
modules code line number data after execute "bt" command in crash.

Without the patch:
  crsah> mod -S
  crash> bt
  PID: 1500     TASK: ff2bd8b093524000  CPU: 16   COMMAND: "lpfc_worker_0"
   #0 [ff2c9f725c39f9e0] machine_kexec at ffffffff8e0686d3
   ...snip...
   #8 [ff2c9f725c39fcc0] __lpfc_sli_release_iocbq_s4 at ffffffffc0f2f425 [lpfc]
   ...snip...
  crash> dis -rl ffffffffc0f60f82
  0xffffffffc0f60eb0 <lpfc_nlp_get>:      nopl   0x0(%rax,%rax,1) [FTRACE NOP]
  0xffffffffc0f60eb5 <lpfc_nlp_get+5>:    push   %rbp
  0xffffffffc0f60eb6 <lpfc_nlp_get+6>:    push   %rbx
  0xffffffffc0f60eb7 <lpfc_nlp_get+7>:    test   %rdi,%rdi

With the patch:
  crash> mod -S
  crash> bt
  PID: 1500     TASK: ff2bd8b093524000  CPU: 16   COMMAND: "lpfc_worker_0"
   #0 [ff2c9f725c39f9e0] machine_kexec at ffffffff8e0686d3
   ...snip...
   #8 [ff2c9f725c39fcc0] __lpfc_sli_release_iocbq_s4 at ffffffffc0f2f425 [lpfc]
   ...snip...
  crash> dis -rl ffffffffc0f60f82
  /usr/src/debug/kernel-4.18.0-425.13.1.el8_7/linux-4.18.0-425.13.1.el8_7.x86_64/drivers/scsi/lpfc/lpfc_hbadisc.c: 6756
  0xffffffffc0f60eb0 <lpfc_nlp_get>:      nopl   0x0(%rax,%rax,1) [FTRACE NOP]
  /usr/src/debug/kernel-4.18.0-425.13.1.el8_7/linux-4.18.0-425.13.1.el8_7.x86_64/drivers/scsi/lpfc/lpfc_hbadisc.c: 6759
  0xffffffffc0f60eb5 <lpfc_nlp_get+5>:    push   %rbp

The root cause is, after kernel module been loaded by mod command, the symtable
is not expanded in gdb side. crash bt or dis command will trigger such an
expansion. However the symtable expansion is different for the 2 commands:

The stack trace of "dis -rl" for symtable expanding:

  #0  0x00000000008d8d9f in add_compunit_symtab_to_objfile ...
  #1  0x00000000006d3293 in buildsym_compunit::end_symtab_with_blockvector ...
  #2  0x00000000006d336a in buildsym_compunit::end_symtab_from_static_block ...
  #3  0x000000000077e8e9 in process_full_comp_unit ...
  #4  process_queue ...
  #5  dw2_do_instantiate_symtab ...
  #6  0x000000000077ed67 in dw2_instantiate_symtab ...
  #7  0x000000000077f75e in dw2_expand_all_symtabs ...
  #8  0x00000000008f254d in gdb_get_line_number ...
  #9  0x00000000008f22af in gdb_command_funnel_1 ...
  #10 0x00000000008f2003 in gdb_command_funnel ...
  #11 0x00000000005b7f02 in gdb_interface ...
  #12 0x00000000005f5bd8 in get_line_number ...
  #13 0x000000000059e574 in cmd_dis ...

The stack trace of "bt" for symtable expanding:

  #0  0x00000000008d8d9f in add_compunit_symtab_to_objfile ...
  #1  0x00000000006d3293 in buildsym_compunit::end_symtab_with_blockvector ...
  #2  0x00000000006d336a in buildsym_compunit::end_symtab_from_static_block ...
  #3  0x000000000077e8e9 in process_full_comp_unit ...
  #4  process_queue ...
  #5  dw2_do_instantiate_symtab ...
  #6  0x000000000077ed67 in dw2_instantiate_symtab ...
  #7  0x000000000077f8ed in dw2_lookup_symbol ...
  #8  0x00000000008e6d03 in lookup_symbol_via_quick_fns ...
  #9  0x00000000008e7153 in lookup_symbol_in_objfile ...
  #10 0x00000000008e73c6 in lookup_symbol_global_or_static_iterator_cb ...
  #11 0x00000000008b99c4 in svr4_iterate_over_objfiles_in_search_order ...
  #12 0x00000000008e754e in lookup_global_or_static_symbol ...
  #13 0x00000000008e75da in lookup_static_symbol ...
  #14 0x00000000008e632c in lookup_symbol_aux ...
  #15 0x00000000008e5a7a in lookup_symbol_in_language ...
  #16 0x00000000008e5b30 in lookup_symbol ...
  #17 0x00000000008f2a4a in gdb_get_datatype ...
  #18 0x00000000008f22c0 in gdb_command_funnel_1 ...
  crash-utility#19 0x00000000008f2003 in gdb_command_funnel ...
  crash-utility#20 0x00000000005b7f02 in gdb_interface ...
  crash-utility#21 0x00000000005f8a9f in datatype_info ...
  crash-utility#22 0x0000000000599947 in cpu_map_size ...
  crash-utility#23 0x00000000005a975d in get_cpus_online ...
  crash-utility#24 0x0000000000637a8b in diskdump_get_prstatus_percpu ...
  crash-utility#25 0x000000000062f0e4 in get_netdump_regs_x86_64 ...
  crash-utility#26 0x000000000059fe68 in back_trace ...
  crash-utility#27 0x00000000005ab1cb in cmd_bt ...

For the stacktrace of "dis -rl", it calls dw2_expand_all_symtabs() to expand
all symtable of the objfile, or "*.ko.debug" in our case. However for
the stacktrace of "bt", it doesn't expand all, but only a subset of symtable
which is enough to find a symbol by dw2_lookup_symbol(). As a result, the
objfile->compunit_symtabs, which is the head of a single linked list of
struct compunit_symtab, is not NULL but didn't contain all symtables. It
will not be reinitialized in gdb_get_line_number() by "dis -rl" because
!objfile_has_full_symbols(objfile) check will fail, so it cannot display
the proper code line number data.

Since objfile_has_full_symbols(objfile) check cannot ensure all symbols
been expanded, this patch add a new member as a flag for struct objfile
to record if all symbols have been expanded. The flag will be set only ofter
expand_all_symtabs been called.

Signed-off-by: Tao Liu <[email protected]>
fengjixuchui pushed a commit that referenced this pull request Mar 5, 2024
This patch introduces per-cpu IRQ stacks for RISCV64 to let
"bt" do backtrace on it and 'bt -E' search eframes on it,
and the 'help -m' command displays the addresses of each
per-cpu IRQ stack.

TEST: a vmcore dumped via hacking the handle_irq_event_percpu()
( Why not using lkdtm INT_HW_IRQ_EN EXCEPTION ?
  There is a deadlock[1] in crash_kexec path if use that)

  crash> bt
  PID: 0        TASK: ffffffff8140db00  CPU: 0    COMMAND: "swapper/0"
   #0 [ff20000000003e60] __handle_irq_event_percpu at ffffffff8006462e
   #1 [ff20000000003ed0] handle_irq_event_percpu at ffffffff80064702
   #2 [ff20000000003ef0] handle_irq_event at ffffffff8006477c
   #3 [ff20000000003f20] handle_fasteoi_irq at ffffffff80068664
   #4 [ff20000000003f50] generic_handle_domain_irq at ffffffff80063988
   #5 [ff20000000003f60] plic_handle_irq at ffffffff8046633e
   #6 [ff20000000003fb0] generic_handle_domain_irq at ffffffff80063988
   #7 [ff20000000003fc0] riscv_intc_irq at ffffffff80465f8e
   #8 [ff20000000003fd0] handle_riscv_irq at ffffffff808361e8
       PC: ffffffff80837314  [default_idle_call+50]
       RA: ffffffff80837310  [default_idle_call+46]
       SP: ffffffff81403da0  CAUSE: 8000000000000009
  epc : ffffffff80837314 ra : ffffffff80837310 sp : ffffffff81403da0
   gp : ffffffff814ef848 tp : ffffffff8140db00 t0 : ff2000000004bb18
   t1 : 0000000000032c73 t2 : ffffffff81200a48 s0 : ffffffff81403db0
   s1 : 0000000000000000 a0 : 0000000000000004 a1 : 0000000000000000
   a2 : ff6000009f1e7000 a3 : 0000000000002304 a4 : ffffffff80c1c2d8
   a5 : 0000000000000000 a6 : ff6000001fe01958 a7 : 00002496ea89dbf1
   s2 : ffffffff814f0220 s3 : 0000000000000001 s4 : 000000000000003f
   s5 : ffffffff814f03d8 s6 : 0000000000000000 s7 : ffffffff814f00d0
   s8 : ffffffff81526f10 s9 : ffffffff80c1d880 s10: 0000000000000000
   s11: 0000000000000001 t3 : 0000000000003392 t4 : 0000000000000000
   t5 : 0000000000000000 t6 : 0000000000000040
   status: 0000000200000120 badaddr: 0000000000000000
    cause: 8000000000000009 orig_a0: ffffffff80837310
  --- <IRQ stack> ---
   #9 [ffffffff81403da0] default_idle_call at ffffffff80837314
   #10 [ffffffff81403db0] do_idle at ffffffff8004d0a0
   #11 [ffffffff81403e40] cpu_startup_entry at ffffffff8004d21e
   #12 [ffffffff81403e60] kernel_init at ffffffff8083746a
   #13 [ffffffff81403e70] arch_post_acpi_subsys_init at ffffffff80a006d8
   #14 [ffffffff81403e80] console_on_rootfs at ffffffff80a00c92
  crash>

  crash> bt -E
  CPU 0 IRQ STACK:
  KERNEL-MODE EXCEPTION FRAME AT: ff20000000003a48
       PC: ffffffff8006462e  [__handle_irq_event_percpu+30]
       RA: ffffffff80064702  [handle_irq_event_percpu+18]
       SP: ff20000000003e60  CAUSE: 000000000000000d
  epc : ffffffff8006462e ra : ffffffff80064702 sp : ff20000000003e60
   gp : ffffffff814ef848 tp : ffffffff8140db00 t0 : 0000000000046600
   t1 : ffffffff80836464 t2 : ffffffff81200a48 s0 : ff20000000003ed0
   s1 : 0000000000000000 a0 : 0000000000000000 a1 : 0000000000000118
   a2 : 0000000000000052 a3 : 0000000000000000 a4 : 0000000000000000
   a5 : 0000000000010001 a6 : ff6000001fe01958 a7 : 00002496ea89dbf1
   s2 : ff60000000941ab0 s3 : ffffffff814a0658 s4 : ff60000000089230
   s5 : ffffffff814a0518 s6 : ffffffff814a0620 s7 : ffffffff80e5f0f8
   s8 : ffffffff80fc50b0 s9 : ffffffff80c1d880 s10: 0000000000000000
   s11: 0000000000000001 t3 : 0000000000003392 t4 : 0000000000000000
   t5 : 0000000000000000 t6 : 0000000000000040
   status: 0000000200000100 badaddr: 0000000000000078
    cause: 000000000000000d orig_a0: ff20000000003ea0

  CPU 1 IRQ STACK: (none found)

  crash>

  crash> help -m
  <snip>
             machspec: ced1e0
          irq_stack_size: 16384
           irq_stacks[0]: ff20000000000000
           irq_stacks[1]: ff20000000008000
  crash>

[1]: https://lore.kernel.org/linux-riscv/[email protected]/

Signed-off-by: Song Shuai <[email protected]>
fengjixuchui pushed a commit that referenced this pull request Mar 5, 2024
…ss range

Previously, to find a module symbol and its offset by an arbitrary address,
all symbols within the module will be iterated by address ascending order
until the last symbol with a smaller address been noticed.

However if the address is not within the module address range, e.g.
the address is higher than the module's last symbol's address, then
the module can be surely skipped, because its symbol iteration is
unnecessary. This can speed up the kernel module symbols finding and improve
the overall performance.

Without the patch:
  $ time echo "bt 8993" | ~/crash-dev/crash vmcore vmlinux
  crash> bt 8993
  PID: 8993     TASK: ffff927569cc2100  CPU: 2    COMMAND: "WriterPool0"
   #0 [ffff927569cd76f0] __schedule at ffffffffb3db78d8
   #1 [ffff927569cd7758] schedule_preempt_disabled at ffffffffb3db8bf9
   #2 [ffff927569cd7768] __mutex_lock_slowpath at ffffffffb3db6ca7
   #3 [ffff927569cd77c0] mutex_lock at ffffffffb3db602f
   #4 [ffff927569cd77d8] ucache_retrieve at ffffffffc0cf4409 [secfs2]
   ...snip the stacktrace of the same module...
   #11 [ffff927569cd7ba0] cskal_path_vfs_getattr_nosec at ffffffffc05cae76 [falcon_kal]
   ...snip...
   #13 [ffff927569cd7c40] _ZdlPv at ffffffffc086e751 [falcon_lsm_serviceable]
   ...snip...
   crash-utility#20 [ffff927569cd7ef8] unload_network_ops_symbols at ffffffffc06f11c0 [falcon_lsm_pinned_14713]
   crash-utility#21 [ffff927569cd7f50] system_call_fastpath at ffffffffb3dc539a
      RIP: 00007f2b28ed4023  RSP: 00007f2a45fe7f80  RFLAGS: 00000206
      RAX: 0000000000000012  RBX: 00007f2a68302e00  RCX: 00007f2a682546d8
      RDX: 0000000000000826  RSI: 00007eb57ea6a000  RDI: 00000000000000e3
      RBP: 00007eb57ea6a000   R8: 0000000000000826   R9: 00000002670bdfd2
      R10: 00000002670bdfd2  R11: 0000000000000293  R12: 00000002670bdfd2
      R13: 00007f29d501a480  R14: 0000000000000826  R15: 00000002670bdfd2
      ORIG_RAX: 0000000000000012  CS: 0033  SS: 002b
  crash>
  real	7m14.826s
  user	7m12.502s
  sys	0m1.091s

With the patch:
  $ time echo "bt 8993" | ~/crash-dev/crash vmcore vmlinux
  crash> bt 8993
  PID: 8993     TASK: ffff927569cc2100  CPU: 2    COMMAND: "WriterPool0"
   #0 [ffff927569cd76f0] __schedule at ffffffffb3db78d8
   #1 [ffff927569cd7758] schedule_preempt_disabled at ffffffffb3db8bf9
   ...snip the same output...
  crash>
  real	0m8.827s
  user	0m7.896s
  sys	0m0.938s

Signed-off-by: Tao Liu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants