Error reading global variable value in module with p command #50
----- Original Message -----
Hi:
I use crash 7.2.6-3 to parse a vmcore. The vmcore was generated by a 4.19
aarch64 kernel.
When I read the global variables in the module, the values returned by the
p command and the rd command are different.
#crash /boot/vmlinux vmcore
crash 7.2.6-3
Copyright (C) 2002-2019 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "aarch64-unknown-linux-gnu"...
WARNING: cannot find NT_PRSTATUS note for cpu: 78
KERNEL: /boot/vmlinux
DUMPFILE: vmcore [PARTIAL DUMP]
CPUS: 96
DATE: Wed Feb 12 16:23:47 2020
UPTIME: 17 days, 13:05:44
LOAD AVERAGE: 5253.54, 5244.11, 5221.62
TASKS: 11580
NODENAME: 121-6
RELEASE: 4.19.aarch64
VERSION: #1 SMP Mon Jul 22 00:00:00 UTC 2019
MACHINE: aarch64 (unknown Mhz)
MEMORY: 96 GB
PANIC: "kernel BUG at /xxx/upi_cache.c:120!"
PID: 29229
COMMAND: "Jpool"
TASK: ffff8022be10be00 [THREAD_INFO: ffff8022be10be00]
CPU: 18
STATE: TASK_RUNNING (PANIC)
crash> mod -s snas_ds ./modules/snas_ds.ko
MODULE NAME SIZE OBJECT FILE
ffff000003ed0900 snas_ds 2887680 ./modules/snas_ds.ko
crash> p g_bCheckMetaCap
g_bCheckMetaCap = $1 = 2432712771
crash>
crash> rd g_bCheckMetaCap
ffff000003ececc0: 0000000000000001 ........
crash>
crash> set debug 31
debug: 31
crash> set debug 31
debug: 31
text hit rate: 0% (0 of 1)
crash> rd g_bCheckMetaCap
<addr: ffff000003ececc0 count: 1 flag: 490 (KVADDR)>
<readmem: ffff000003ececc0, KVADDR, "64-bit KVADDR", 8, (FOE), ffffc570ddb0>
<read_diskdump: addr: ffff000003ececc0 paddr: 202d7e8cecc0 cnt: 8>
read_diskdump: paddr/pfn: 202d7e8cecc0/202d7e8ce -> physical page is cached:
202d7e8ce000
ffff000003ececc0: 0000000000000001 ........
text hit rate: 0% (0 of 1)
crash> p g_bCheckMetaCap
p: per_cpu_symbol_search(g_bCheckMetaCap): NULL
g_bCheckMetaCap = GETBUF(328 -> 0)
$2 = 2432712771
FREEBUF(0)
text hit rate: 50% (1 of 2)
I don't understand why there's no debug output after the "p g_bCheckMetaCap" command?
There should be a "<readmem: ..." line with a virtual address and a "gdb_readmem_callback"
type string.
Note that the "rd g_bCheckMetaCap" command shows a readmem debug output line with virtual
address ffff000003ececc0 and type "64-bit KVADDR".
In any case, both the rd and the p commands should be requesting the same virtual address,
which would be the address shown by "sym g_bCheckMetaCap". But presumably that's not the
case for some reason.
Hi, the p command doesn't seem to readmem() the virtual address. cmd_p() calls the gdb interface to get the value. I read a lot of global variables defined in the module in my vmcore, and some of them are displayed incorrectly. I don't really understand the scenario and the specific implementation of the p command; can you give me some guidance?
----- Original Message -----
Hi,
The p command doesn't seem to readmem() the virtual address of g_bCheckMetaCap.
cmd_p() calls the gdb interface to get the value.
Below is the cmd_p code.
sp = NULL;
if ((sp = symbol_search(args[optind])) && !args[optind+1]) {	/* <-- enters this branch */
	if ((percpu_sp = per_cpu_symbol_search(args[optind])) &&
	    display_per_cpu_info(percpu_sp, radix, cpuspec))
		return;
	/* <-- sp->value is the correct virtual address */
	if (module_symbol(sp->value, NULL, NULL, NULL, *gdb_output_radix))
		do_load_module_filter = TRUE;
} else if ((percpu_sp = per_cpu_symbol_search(args[optind])) &&
	   display_per_cpu_info(percpu_sp, radix, cpuspec))
	return;
else if (st->flags & LOAD_MODULE_SYMS)
	do_load_module_filter = TRUE;

if (cpuspec) {
	if (sp)
		error(WARNING, "%s is not percpu; cpuspec ignored.\n",
		    sp->name);
	else
		/* maybe a valid C expression (e.g. ':') */
		*(cpuspec-1) = ':';
}

process_gdb_output(concat_args(buf1, 0, TRUE), radix,
    sp ? sp->name : NULL, do_load_module_filter);
That's correct. However, when gdb needs to read the data in order to
display it, it calls back into the crash utility's gdb_readmem_callback()
function. And gdb_readmem_callback() then does the requested readmem() call.
Hi, I found that the input parameters of the gdb_readmem_callback() function are incorrect. Regarding "And gdb_readmem_callback() then does the requested readmem() call.": can you tell me how the addr parameter of gdb_readmem_callback() is passed?
----- Original Message -----
Hi,
I found that the input parameters of the gdb_readmem_callback() function are
incorrect.
crash> rd g_bCheckMetaCap
ffff000003ececc0: 0000000000000001 ........
but gdb_readmem_callback(addr=0xffff000003d79cc0)
"And gdb_readmem_callback() then does the requested readmem() call."
Because the value was read from the cache, readmem() was not called.
Can you tell me how the addr parameter of gdb_readmem_callback() is
passed?
The "p g_bCheckMetaCap" string is passed to the embedded gdb module; the gdb
code evaluates it and then reads the resulting address via the callback
into gdb_readmem_callback().
Can you give me some suggestions on where to look in the gdb code to find out why the p command returns the wrong address?
The gdb sources are incredibly convoluted, and I am by no means an expert.
Start with print_command() in gdb-7.6/gdb/printcmd.c, and go from there.
Somewhere in there it will parse the string and evaluate it to an address.
I still haven't found why gdb can't read the address of a global variable in the module correctly.
If I load the module with mod -s test test.o and then read the 'along' variable, the following correct information is displayed:
If I load it with mod -s test test.ko, the 'along' information is read incorrectly.
The problem reproduces every time. Can any expert give me some advice?
@bhupesh-sharma @lian-bo It looks like RHEL8 also has the same or a similar issue. I could reproduce it on RHEL8.2 for arm64 and its crash-7.2.7-3.el8, though I could not on x86_64. As @cjl20062529 said above,
Yes, this problem can easily be reproduced on arm64, including on RHEL8.2. In frame 0 of the call stack, 'attr' contains the address of the module's global variable, and attr is obtained from the 'die'. I can't find where the die information comes from.
@k-hagio
gdb/gdb /home/mod/test/test.ko
GNU gdb (GDB) 9.2
[root@hpe-apollo-cn99xx-14-vm-08 build]# gdb/gdb /home/mod/test/test.o
@cjl20062529 Can you help report this issue to gdb? Thanks. BTW: I would like to forward this issue to the gdb maintainer (Pedro Alves), who knows more about gdb, so that we can also discuss it with them. Thanks.
@lian-bo, thanks for the info. I tried the following and it looks to be fixed in gdb-9.2 (and gdb-9.1 as well):
If this behavior is the issue, it looks to me like it was fixed by this patch:
FYI, the add-symbol-file command is seen in crash's debug output:
It may be similar to this issue, but I'm not sure whether gdb still has another problem related to it; the behavior looks strange.
In general, I load it with the file command as follows, and got the wrong result. When I used the add-symbol-file command with additional parameters to load the symbol file, I got the correct result.
[root@hpe-apollo-cn99xx-14-vm-08 test]# /home/mod/binutils-gdb/build/gdb/gdb /usr/lib/debug/usr/lib/modules/4.18.0-193.el8.aarch64/vmlinux /proc/kcore
[root@hpe-apollo-cn99xx-14-vm-08 test]# /home/mod/binutils-gdb/build/gdb/gdb /usr/lib/debug/usr/lib/modules/4.18.0-193.el8.aarch64/vmlinux /proc/kcore
Good finding. I just tested commit 4b610737f0 ("Handle copy relocations") with your method, and gdb worked well and gave the correct result as expected.
The above commit may fix the problem we are currently facing in the crash utility. In view of this, should we backport this patch to the crash utility or rebase onto the latest gdb? Any thoughts?
This is really good news, thanks. @k-hagio @lian-bo I tried to upgrade the gdb in crash before, but it failed because the differences were too large. But you have shown that this patch fixes the problem, which is really good. I tried to consult the gdb maintainers, but I couldn't find a way to contact them.
I'll see if the patch can be applied to the crash utility later on.
Hmm, I tried to apply the patch to crash, but it looks pretty hard for me because gdb has since been converted from C to C++, and I'm also not an expert on gdb. Who can do this? Another approach I can think of: as far as I've checked, the debug modules (*.ko.debug) in the RHEL8 kernel-debuginfo package don't reproduce this issue, so there might be some option or config that avoids it?
There are too many differences between gdb-7.6 and gdb-8.3+, and there are some dependencies. It's not easy to backport from the latest gdb. Anyway, let me investigate later to see if this is doable.
It might be worth looking into what happened.
I have not found any good workaround or fix for this issue so far.
Hi, I am using crash-8.0.2 with gdb 10.2, and the problem still exists.