Error reading global variable value in module with p command #50

Open
cjl20062529 opened this issue Mar 4, 2020 · 19 comments

Comments

@cjl20062529

Hi,
I am using crash 7.2.6-3 to analyze a vmcore that was generated by a 4.19 aarch64 kernel.

When I read a global variable defined in a module, the p command and the rd command return different values.

#crash /boot/vmlinux vmcore
crash 7.2.6-3
Copyright (C) 2002-2019 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "aarch64-unknown-linux-gnu"...

WARNING: cannot find NT_PRSTATUS note for cpu: 78
KERNEL: /boot/vmlinux
DUMPFILE: vmcore [PARTIAL DUMP]
CPUS: 96
DATE: Wed Feb 12 16:23:47 2020
UPTIME: 17 days, 13:05:44
LOAD AVERAGE: 5253.54, 5244.11, 5221.62
TASKS: 11580
NODENAME: 121-6
RELEASE: 4.19.aarch64
VERSION: #1 SMP Mon Jul 22 00:00:00 UTC 2019
MACHINE: aarch64 (unknown Mhz)
MEMORY: 96 GB
PANIC: "kernel BUG at /xxx/upi_cache.c:120!"
PID: 29229
COMMAND: "Jpool"
TASK: ffff8022be10be00 [THREAD_INFO: ffff8022be10be00]
CPU: 18
STATE: TASK_RUNNING (PANIC)

crash> mod -s snas_ds ./modules/snas_ds.ko
MODULE NAME SIZE OBJECT FILE
ffff000003ed0900 snas_ds 2887680 ./modules/snas_ds.ko
crash> p g_bCheckMetaCap
g_bCheckMetaCap = $1 = 2432712771
crash>
crash> rd g_bCheckMetaCap
ffff000003ececc0: 0000000000000001 ........
crash>
crash> set debug 31
debug: 31
crash> set debug 31
debug: 31
text hit rate: 0% (0 of 1)
crash> rd g_bCheckMetaCap
<addr: ffff000003ececc0 count: 1 flag: 490 (KVADDR)>
<readmem: ffff000003ececc0, KVADDR, "64-bit KVADDR", 8, (FOE), ffffc570ddb0>
<read_diskdump: addr: ffff000003ececc0 paddr: 202d7e8cecc0 cnt: 8>
read_diskdump: paddr/pfn: 202d7e8cecc0/202d7e8ce -> physical page is cached: 202d7e8ce000
ffff000003ececc0: 0000000000000001 ........
text hit rate: 0% (0 of 1)
crash> p g_bCheckMetaCap
p: per_cpu_symbol_search(g_bCheckMetaCap): NULL
g_bCheckMetaCap = GETBUF(328 -> 0)
$2 = 2432712771
FREEBUF(0)
text hit rate: 50% (1 of 2)

@crash-utility
Collaborator

crash-utility commented Mar 4, 2020 via email

@cjl20062529
Author

cjl20062529 commented Mar 5, 2020

Hi,
g_bCheckMetaCap is defined as U32 g_bCheckMetaCap = 1.
crash> p g_bCheckMetaCap
g_bCheckMetaCap = $1 = 2432712771
I set debug to 31, but no debug info is shown.
crash> rd g_bCheckMetaCap
ffff000003ececc0: 0000000000000001 ........

The p command does not seem to call readmem() on the symbol's virtual address. Instead, cmd_p() calls the gdb interface to get the value. I read many global variables defined in the module from my vmcore, and some of them are displayed incorrectly.

I don't fully understand how the p command is implemented or which scenarios it is meant for. Can you give me some guidance?

Below is the cmd_p code.

sp = NULL;
if ((sp = symbol_search(args[optind])) && !args[optind+1]) {	/* <-- this branch is taken */
	if ((percpu_sp = per_cpu_symbol_search(args[optind])) &&
	    display_per_cpu_info(percpu_sp, radix, cpuspec))
		return;
	/* sp->value is the correct virtual address of g_bCheckMetaCap */
	if (module_symbol(sp->value, NULL, NULL, NULL, *gdb_output_radix))
		do_load_module_filter = TRUE;
} else if ((percpu_sp = per_cpu_symbol_search(args[optind])) &&
	   display_per_cpu_info(percpu_sp, radix, cpuspec))
	return;
else if (st->flags & LOAD_MODULE_SYMS)
	do_load_module_filter = TRUE;

if (cpuspec) {
	if (sp)
		error(WARNING, "%s is not percpu; cpuspec ignored.\n",
		      sp->name);
	else
		/* maybe a valid C expression (e.g. ':') */
		*(cpuspec-1) = ':';
}

process_gdb_output(concat_args(buf1, 0, TRUE), radix,
		   sp ? sp->name : NULL, do_load_module_filter);

@crash-utility
Collaborator

crash-utility commented Mar 5, 2020 via email

@cjl20062529
Author

cjl20062529 commented Mar 10, 2020

Hi,

I found that the addr argument passed to gdb_readmem_callback() is incorrect.
crash> rd g_bCheckMetaCap
ffff000003ececc0: 0000000000000001 ........
but gdb_readmem_callback(addr=0xffff000003d79cc0)

" And gdb_readmem_callback() then does the requested readmem() call. "
In my case the data is read from the cache, so readmem() is not called.

Can you tell me how the addr argument of gdb_readmem_callback() is determined?
Thanks.
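
The mismatch can be quantified from the two addresses above. The following is a minimal C sketch, not crash code: the helper names are hypothetical, and the 4 KiB page size is an assumption (arm64 kernels can also be built with 16 KiB or 64 KiB pages). It computes how far apart the rd address and the gdb_readmem_callback() address are, and whether they share the same offset within a page.

```c
#include <stdint.h>

/* Assumed page size for this sketch; not taken from the vmcore. */
#define SKETCH_PAGE_SIZE 0x1000ULL

/* Signed distance from the address gdb used to the address rd used. */
int64_t addr_delta(uint64_t rd_addr, uint64_t gdb_addr)
{
	return (int64_t)(rd_addr - gdb_addr);
}

/* Nonzero when both addresses have the same offset within a page. */
int same_page_offset(uint64_t a, uint64_t b)
{
	return ((a ^ b) & (SKETCH_PAGE_SIZE - 1)) == 0;
}
```

For the two addresses above, the difference is 0x155000 and both end in 0xcc0. That would be consistent with gdb applying a different base address to the right offset, although this is only a guess at this point.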

@crash-utility
Collaborator

crash-utility commented Mar 10, 2020 via email

@cjl20062529
Author

----- Original Message -----
The "p bCheckMetaCap" string is passed to the embedded gdb module, the gdb code evaluates it, and then reads the resultant address via the call-back into gdb_readmem_callback().

Can you give me some suggestions on where to look in the gdb code to find out why the p command ends up with the wrong address?
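
To make the flow described above concrete, here is a toy C model of it. Everything in it is a hypothetical stand-in, not crash or gdb code: an "evaluator" resolves the name in its own symbol table and then fetches the value through a read callback. The fake dump holds the value rd displayed at the address rd used, and it assumes that the value p printed is what sits at the address seen in gdb_readmem_callback().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; none of this is crash or gdb code. */
struct toy_sym { const char *name; uint64_t addr; };
struct toy_mem { uint64_t addr; uint32_t value; };

/* Fake dump contents (assumption: p's value lives at the callback address). */
static const struct toy_mem toy_dump[] = {
	{ 0xffff000003ececc0ULL, 1u },
	{ 0xffff000003d79cc0ULL, 2432712771u },
};

typedef int (*toy_read_cb)(uint64_t addr, uint32_t *out);

/* Plays the role of the read callback: fetch a 32-bit value from the fake dump. */
int toy_readmem(uint64_t addr, uint32_t *out)
{
	for (size_t i = 0; i < sizeof(toy_dump) / sizeof(toy_dump[0]); i++) {
		if (toy_dump[i].addr == addr) {
			*out = toy_dump[i].value;
			return 0;
		}
	}
	return -1;
}

/* Plays the role of the evaluator: resolve the name in its own table, then call back. */
int toy_print(const struct toy_sym *tab, size_t n, const char *name,
	      toy_read_cb read_cb, uint32_t *out)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(tab[i].name, name) == 0)
			return read_cb(tab[i].addr, out);
	return -1;
}
```

With one table mapping g_bCheckMetaCap to ffff000003ececc0 and another mapping it to ffff000003d79cc0, the same read callback returns different values. The point of the sketch is that the callback itself can be correct while the address handed to it is not.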

@crash-utility
Collaborator

crash-utility commented Mar 10, 2020 via email

@cjl20062529
Author

cjl20062529 commented May 30, 2020

I still haven't found the reason why gdb resolves the wrong address for global variables in a module.
I do have a new finding: crash also fails to read module global variables correctly on a live system. My test module is as follows:

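#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>

unsigned long along = 0x1234;
struct aaa {
	int aa;
	unsigned long bb;
} test;

static int test_init(void)
{
	printk("hello, test begin...\n");
	printk("along=0x%lx\n", along);
	test.aa = 0xabc;
	test.bb = 0x789;
	/* test.aa is an int, so it is printed with %x rather than %lx */
	printk("test.aa=0x%x  test.bb=0x%lx\n", test.aa, test.bb);
	return 0;
}

static void test_exit(void)
{
	printk("bye!\n");
}

/* Standard module registration, assumed here so that the module builds and loads */
module_init(test_init);
module_exit(test_exit);
MODULE_LICENSE("GPL");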

If I load the symbols with mod -s test test.o and then read the 'along' variable, the correct information is displayed:

crash> mod -s test test.o
     MODULE       NAME                 SIZE  OBJECT FILE
ffff000000a24040  test                16384  test.o 
crash> p /x along
$1 = 0x1234
crash> sym along
ffff000000a24000 (D) along [test]
crash> p /x &along
$2 = 0xffff000000a24000

If I load the symbols with mod -s test test.ko instead, reading 'along' returns wrong information:

crash> mod -s test test.ko
     MODULE       NAME                 SIZE  OBJECT FILE
ffff000000a24040  test                16384  test.ko 
crash> p /x  along
$2 = 0x1400000004
crash> sym along
ffff000000a24000 (D) along [test]
crash> p /x  &along
$3 = 0xffff000000a23000

This problem reproduces every time. Can any expert give me some advice?
@crash-utility @bhupesh-sharma @k-hagio @lian-bo
Thanks.

@k-hagio
Contributor

k-hagio commented Jun 2, 2020

@bhupesh-sharma @lian-bo
(Seems editing a comment doesn't send a notification..)

It looks like RHEL8 also has the same or a similar issue. I could reproduce it on RHEL8.2 for arm64 with its crash-7.2.7-3.el8, though not on x86_64. As @cjl20062529 said above, mod -s test test.ko fails, but mod -s test test.o looks OK:

crash> mod -s test test.ko
     MODULE       NAME               SIZE  OBJECT FILE
ffff3efb81930040  test             262144  test.ko 
crash> sym -m test
ffff3efb81910000 MODULE START: test
ffff3efb81910000 (t) init_mod
ffff3efb81910000 (T) init_module
ffff3efb81910070 (T) cleanup_module
ffff3efb81910070 (t) exit_mod
ffff3efb81930000 (D) testint
ffff3efb81930008 (D) testlong
ffff3efb81930040 (D) __this_module
ffff3efb81950000 MODULE END: test
crash> rd testint
ffff3efb81930000:  0000000000001234                    4.......
crash> p testint
p: gdb request failed: p testint
crash> mod -d test
crash> mod -s test test.o
     MODULE       NAME               SIZE  OBJECT FILE
ffff3efb81930040  test             262144  test.o 
crash> sym -m test
ffff3efb81910000 MODULE START: test
ffff3efb81910000 (t) init_mod
ffff3efb81910000 (T) init_module
ffff3efb81910070 (T) cleanup_module
ffff3efb81910070 (t) exit_mod
ffff3efb81930000 (D) testint
ffff3efb81930008 (D) testlong
ffff3efb81930040 (d) __this_module
ffff3efb81950000 MODULE END: test
crash> rd testint
ffff3efb81930000:  0000000000001234                    4.......
crash> p testint
testint = $10 = 4660
crash> p -x testint
testint = $20 = 0x1234

@cjl20062529
Author

Yes, this problem can be easily reproduced on arm64, and it can also be reproduced on RHEL8.2.
I have narrowed it down: gdb computes the wrong address for the variable. The call stack in which gdb resolves the variable's location is roughly as follows:
#0 var_decode_location (attr=0xaaaaacaf90f8, sym=0xaaaaad8845b0, cu=0xaaaaac1bd590) at dwarf2read.c:15760
#1 0x0000aaaaaae20d98 in new_symbol_full (die=0xaaaaacaf9080, type=, cu=0xaaaaac1bd590, space=) at dwarf2read.c:15976
#2 0x0000aaaaaae2276c in new_symbol (cu=0xaaaaac1bd590, type=0x0, die=0xaaaaacaf9080) at dwarf2read.c:16222
#3 process_die (die=0xaaaaacaf9080, cu=0xaaaaac1bd590) at dwarf2read.c:7275
#4 0x0000aaaaaae22914 in read_file_scope (cu=0xaaaaac1bd590, die=0xaaaaacab20d0) at dwarf2read.c:8015
#5 process_die (die=0xaaaaacab20d0, cu=0xaaaaac1bd590) at dwarf2read.c:7201
#6 0x0000aaaaaae2653c in process_full_comp_unit (pretend_language=, per_cu=) at dwarf2read.c:7005
#7 process_queue () at dwarf2read.c:6570
#8 dw2_do_instantiate_symtab (per_cu=) at dwarf2read.c:2295
#9 0x0000aaaaaae27b34 in dwarf2_read_symtab (self=0xaaaaacaab140, objfile=0xaaaaacb1da00) at dwarf2read.c:6459
#10 0x0000aaaaaad94684 in psymtab_to_symtab (objfile=objfile@entry=0xaaaaacb1da00, pst=pst@entry=0xaaaaacaab140) at psymtab.c:781
#11 0x0000aaaaaad96224 in lookup_symbol_aux_psymtabs (objfile=0xaaaaacb1da00, block_index=0, name=0xaaaaab723de0 "along", domain=VAR_DOMAIN) at psymtab.c:515
#12 0x0000aaaaaad8efe4 in lookup_symbol_aux_quick (objfile=0xaaaaacb1da00, kind=0, name=0xaaaaab723de0 "along", domain=VAR_DOMAIN) at symtab.c:1645
#13 0x0000aaaaaad8f1ec in lookup_symbol_global_iterator_cb (objfile=0xaaaaacb1da00, cb_data=0xffffffffc010) at symtab.c:1774
#14 0x0000aaaaaadfaed4 in default_iterate_over_objfiles_in_search_order (gdbarch=, cb=0xaaaaaad8f188 <lookup_symbol_global_iterator_cb>, cb_data=0xffffffffc010, current_objfile=)
at objfiles.c:1436
#15 0x0000aaaaaad8eba8 in lookup_symbol_global (name=0xaaaaab723de0 "along", block=, domain=VAR_DOMAIN) at symtab.c:1804
#16 0x0000aaaaaad8f3fc in lookup_symbol_aux (is_a_field_of_this=0x0, language=language_c, domain=VAR_DOMAIN, block=0x0, name=0xaaaaab723de0 "along") at symtab.c:1380
#17 lookup_symbol_in_language (name=name@entry=0xaaaaab723de0 "along", block=0x0, domain=VAR_DOMAIN, lang=language_c, is_a_field_of_this=0x0) at symtab.c:1213
#18 0x0000aaaaaad8f570 in lookup_symbol (name=name@entry=0xaaaaab723de0 "along", block=, block@entry=0x0, domain=domain@entry=VAR_DOMAIN, is_a_field_of_this=) at symtab.c:1241
#19 0x0000aaaaaad27768 in classify_name (block=0x0) at c-exp.y:2766
#20 0x0000aaaaaad299d0 in c_lex () at c-exp.y:2934
#21 c_parse_internal () at c-exp.c:1938
#22 0x0000aaaaaad2bd70 in c_parse () at c-exp.y:3064
#23 0x0000aaaaaadf0dcc in parse_exp_in_context (stringptr=0x0, stringptr@entry=0xffffffffdfb0, pc=pc@entry=0, block=block@entry=0x0, comma=comma@entry=0, out_subexp=out_subexp@entry=0x0, void_context_p=0)
at parse.c:1234
#24 0x0000aaaaaadf1034 in parse_exp_1 (stringptr=stringptr@entry=0xffffffffdfd8, pc=pc@entry=0, block=block@entry=0x0, comma=comma@entry=0) at parse.c:1136
#25 0x0000aaaaaadf10bc in parse_expression (string=) at parse.c:1279
#26 0x0000aaaaaad88fc4 in print_command_1 (exp=, voidprint=1) at ./printcmd.c:972
#27 0x0000aaaaaae83e28 in execute_command (p=, from_tty=1) at top.c:484
#28 0x0000aaaaaad93d9c in gdb_command_funnel (req=0xaaaaab220070 <shared_bufs>, req@entry=0x1) at symtab.c:5174
#29 0x0000aaaaaac2f504 in gdb_interface (req=0x1, req@entry=0xaaaaab220070 <shared_bufs>) at gdb_interface.c:397
#30 0x0000aaaaaac2fc4c in gdb_pass_through (cmd=cmd@entry=0xffffffffe948 "p along", fptr=fptr@entry=0x0, flags=flags@entry=8) at gdb_interface.c:332
#31 0x0000aaaaaac59ba4 in process_gdb_output (gdb_request=0xffffffffe948 "p along", radix=radix@entry=0, leader=0xaaaaacf54b40 "along", do_load_module_filter=do_load_module_filter@entry=1) at symbols.c:7323
#32 0x0000aaaaaac63854 in cmd_p () at symbols.c:7305
#33 0x0000aaaaaab958d8 in exec_command () at main.c:879
#34 0x0000aaaaaab95c1c in main_loop () at main.c:826
#35 0x0000aaaaaadc3488 in captured_command_loop (data=) at main.c:258
#36 0x0000aaaaaadc18fc in catch_errors (func=0x1, func@entry=0xaaaaaadc3468 <captured_command_loop>, func_args=0x1, func_args@entry=0x0, errstring=0xfffffffff1a0 "\020\362\377\377\377\377",
errstring@entry=0xaaaaaaff6840 "", mask=403589793, mask@entry=6) at exceptions.c:557
#37 0x0000aaaaaadc4664 in captured_main (data=) at main.c:1064
#38 0x0000aaaaaadc18fc in catch_errors (func=0xaaaaaab94014 <main+2620>, func@entry=0xaaaaaadc3810 <captured_main>, func_args=0xaaaaab18ac58 , func_args@entry=0xfffffffff248,
errstring=0xfffffffff260 "\340\362\377\377\377\377", errstring@entry=0xaaaaaaff6840 "", mask=403589793, mask@entry=6) at exceptions.c:557
#39 0x0000aaaaaadc4a14 in gdb_main (args=0xfffffffff248) at main.c:1079
#40 gdb_main_entry (argc=, argv=) at main.c:1099
#41 0x0000aaaaaab94014 in main (argc=43690, argv=0x0) at main.c:707

In frame #0 of the call stack, attr contains the address of the module's global variable, and attr is obtained from die. I can't find where the die information comes from.
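
For context, as far as I understand it, the location attribute of a plain global variable usually holds a small DWARF expression: the DW_OP_addr opcode (0x03) followed by an address. The debugger then adds the load offset of the section it believes the symbol belongs to. The C sketch below illustrates only that idea. The function name and the example bytes are hypothetical, and it is not the gdb implementation of var_decode_location().

```c
#include <stddef.h>
#include <stdint.h>

#define DW_OP_addr 0x03	/* DWARF opcode: the operand is a target address */

/*
 * Decode a location expression of the form "DW_OP_addr <8-byte address>"
 * (little-endian, 64-bit target) and add the load offset of the section.
 * Returns 0 on success, -1 if the block has a different shape.
 */
int decode_static_location(const uint8_t *block, size_t len,
			   uint64_t section_offset, uint64_t *addr)
{
	uint64_t v = 0;

	if (len != 9 || block[0] != DW_OP_addr)
		return -1;
	for (int i = 0; i < 8; i++)
		v |= (uint64_t)block[1 + i] << (8 * i);
	*addr = v + section_offset;
	return 0;
}
```

With the hypothetical bytes 03 08 00 00 00 00 00 00 00 and a section offset of 0xffff3efb81930000, the result is 0xffff3efb81930008, the address that sym -m reported for testlong above. The same bytes combined with the offset of a different section would produce a different address.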

@lian-bo
Member

lian-bo commented Jun 6, 2020

@k-hagio
This looks related to gdb behavior rather than a crash issue. When gdb itself loads test.ko, the problem can be reproduced on the latest gdb-9.2 as follows:

gdb/gdb /home/mod/test/test.ko

GNU gdb (GDB) 9.2
......
#Reading symbols from /home/mod/test/test.ko...
(gdb) p along
$1 = 85899345924    <-- incorrect value

[root@hpe-apollo-cn99xx-14-vm-08 build]# gdb/gdb /home/mod/test/test.o
GNU gdb (GDB) 9.2
......
Reading symbols from /home/mod/test/test.o...
(gdb) p along
$1 = 4660    <-- correct value, as expected (0x1234)

@cjl20062529 Can you help report this issue to gdb? Thanks.

BTW: I would like to forward this issue to the gdb maintainer (Pedro Alves), who knows gdb in much more detail, so that we can discuss it together. Thanks.

@k-hagio
Contributor

k-hagio commented Jun 6, 2020

@lian-bo, thanks for the info.
But I'm not sure whether the gdb behavior you showed is the same as this issue in crash.

I tried the following and it looks to be fixed in gdb-9.2 (and gdb-9.1 as well):

# ../gdb-9.2 /usr/lib/debug/usr/lib/modules/4.18.0-193.el8.aarch64/vmlinux /proc/kcore
...
(gdb) add-symbol-file test.ko 0xffff0000091d0000 -s .data 0xffff0000091f0000
add symbol table from file "test.ko" at
        .text_addr = 0xffff0000091d0000
        .data_addr = 0xffff0000091f0000
(y or n) y
Reading symbols from test.ko...
(gdb) print /x testlong
$1 = 0x1234            <<-- correct

# ../gdb-8.3 /usr/lib/debug/usr/lib/modules/4.18.0-193.el8.aarch64/vmlinux /proc/kcore
...
(gdb) add-symbol-file test.ko 0xffff0000091d0000 -s .data 0xffff0000091f0000
add symbol table from file "test.ko" at
        .text_addr = 0xffff0000091d0000
        .data_addr = 0xffff0000091f0000
(y or n) y
Reading symbols from test.ko...
(gdb) p /x testlong
$1 = 0x554e4700000003  <<-- wrong
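
As a side observation, the wrong values seen in this thread are not random. Read as little-endian 32-bit fields, 0x1400000004 (the wrong value of along) is 4 followed by 0x14, and 0x554e4700000003 (the wrong value of testlong) is 3 followed by the bytes "GNU\0". That has the shape of the start of an ELF note: namesz, descsz, type, then the name, with type 3 being NT_GNU_BUILD_ID. The two values come from different test modules, so treating them as consecutive words is an assumption. The minimal C sketch below, with a hypothetical function name, just performs that check.

```c
#include <stdint.h>

#define NT_GNU_BUILD_ID 3	/* ELF note type used for GNU build IDs */

/*
 * Treat two little-endian 64-bit words as the first 16 bytes of an ELF
 * note: namesz, descsz, type (4 bytes each), then the start of the name.
 * Returns 1 if they look like the header of a GNU build-ID note.
 */
int looks_like_gnu_build_id_note(uint64_t word0, uint64_t word1)
{
	uint32_t namesz = (uint32_t)word0;
	uint32_t descsz = (uint32_t)(word0 >> 32);
	uint32_t type = (uint32_t)word1;
	uint32_t name = (uint32_t)(word1 >> 32);	/* "GNU\0" read as little-endian */

	return namesz == 4 && descsz > 0 && type == NT_GNU_BUILD_ID &&
	       name == 0x00554e47u;
}
```

If this reading is right, it would suggest that gdb applies the variable's offset to the start of a note section instead of .data, but that is a hypothesis and has not been confirmed here.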

If this behavior is the issue, it looks to me like it is fixed by this patch:

commit 4b610737f02338b2aea7641ab771aa5e137d067c
Author: Tom Tromey <[email protected]>
Date:   Tue Jun 25 12:50:45 2019 -0600

    Handle copy relocations

FYI, the add-symbol-file command is seen in crash's debug output:

crash> set debug 1
crash> mod -s test test.ko
...
add-symbol-file test.ko 0xffff0000091d0000  -s .data 0xffff0000091f0000
add symbol table from file "test.ko" at
        .text_addr = 0xffff0000091d0000
        .data_addr = 0xffff0000091f0000
     MODULE       NAME               SIZE  OBJECT FILE
ffff0000091f0040  test             262144  test.ko 
crash> 
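
The command line shown in the debug output has a simple shape: the object file, the text address, and then one "-s section address" pair per additional section. The C sketch below shows how such a string could be assembled. It is a hypothetical illustration of the command format only, not the code crash uses; the struct and function names are made up.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container for one "-s <section> <address>" pair. */
struct section_addr {
	const char *name;
	uint64_t addr;
};

/*
 * Format "add-symbol-file <path> <text_addr> -s <sec> <addr> ..." into buf.
 * Returns the string length, or -1 if buf is too small.
 */
int build_add_symbol_file(char *buf, size_t size, const char *path,
			  uint64_t text_addr,
			  const struct section_addr *secs, size_t nsecs)
{
	int n = snprintf(buf, size, "add-symbol-file %s 0x%" PRIx64,
			 path, text_addr);

	if (n < 0 || (size_t)n >= size)
		return -1;
	for (size_t i = 0; i < nsecs; i++) {
		size_t left = size - (size_t)n;
		int m = snprintf(buf + n, left, " -s %s 0x%" PRIx64,
				 secs[i].name, secs[i].addr);

		if (m < 0 || (size_t)m >= left)
			return -1;
		n += m;
	}
	return n;
}
```

Called with "test.ko", 0xffff0000091d0000 and a single .data entry at 0xffff0000091f0000, it produces the same command as in the debug output above, apart from spacing.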

@lian-bo
Member

lian-bo commented Jun 7, 2020

@k-hagio, regarding whether the gdb behavior I showed is the same as this issue in crash: it may be similar, but I'm not sure whether gdb has yet another issue here, because the behavior looks strange.

I usually load the module with the file command, as shown below, and that gives the wrong result. If I instead load the symbol file with the add-symbol-file command and pass the section addresses explicitly, I get the correct result.

[root@hpe-apollo-cn99xx-14-vm-08 test]# /home/mod/binutils-gdb/build/gdb/gdb /usr/lib/debug/usr/lib/modules/4.18.0-193.el8.aarch64/vmlinux /proc/kcore
GNU gdb (GDB) 8.3.50.20191002-git
......
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/lib/debug/usr/lib/modules/4.18.0-193.el8.aarch64/vmlinux...
[New process 1]
Core was generated by `BOOT_IMAGE=(hd0,gpt2)/vmlinuz-4.18.0-193.el8.aarch64 root=/dev/mapper/rhel_hpe-'.
#0 0x0000000000000000 in ?? ()
(gdb) file test.ko
warning: core file may not match specified executable file.
Load new symbol table from "test.ko"? (y or n) y
Reading symbols from test.ko...
(gdb) p along
$1 = 85899345924
(gdb) quit

[root@hpe-apollo-cn99xx-14-vm-08 test]# /home/mod/binutils-gdb/build/gdb/gdb /usr/lib/debug/usr/lib/modules/4.18.0-193.el8.aarch64/vmlinux /proc/kcore
GNU gdb (GDB) 8.3.50.20191002-git
......
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/lib/debug/usr/lib/modules/4.18.0-193.el8.aarch64/vmlinux...
[New process 1]
Core was generated by `BOOT_IMAGE=(hd0,gpt2)/vmlinuz-4.18.0-193.el8.aarch64 root=/dev/mapper/rhel_hpe-'.
#0 0x0000000000000000 in ?? ()
(gdb) add-symbol-file test.ko 0xffff3fedc60e0000 -s .data 0xffff3fedc6100000 -s .bss 0xffff3fedc61003c0
add symbol table from file "test.ko" at
.text_addr = 0xffff3fedc60e0000
.data_addr = 0xffff3fedc6100000
.bss_addr = 0xffff3fedc61003c0
(y or n) y
Reading symbols from test.ko...
(gdb) p along
$1 = 4660
(gdb) quit

Good findings. I just tested commit 4b610737f0 ("Handle copy relocations") with your method, and gdb works well and returns the correct result as expected.

That commit may be able to fix the problem we are facing in the crash utility. Given that, should we backport this patch to the crash utility, or rebase onto the latest gdb? Any thoughts?
Thanks.

@cjl20062529
Author

This is really good news, thanks. @k-hagio @lian-bo

I tried to upgrade the gdb embedded in crash before, but it failed because the differences are too large. You have now shown that this patch fixes the problem, which is really good.

I tried to contact the gdb maintainers, but I could not find a communication channel.

@k-hagio
Contributor

k-hagio commented Jun 8, 2020

I'll see later whether the patch can be applied to the crash utility.
Rebasing gdb would be very tough work for us (at least for me), so I think it should be a last resort.

@k-hagio
Contributor

k-hagio commented Jun 9, 2020

Hmm, I tried to apply the patch to crash, but it looks pretty hard for me because gdb has changed from C to C++ and I'm also not an expert on gdb. Who can do this?

Another possible approach: as far as I've checked, the debug modules (*.ko.debug) in RHEL8 kernel-debuginfo don't reproduce this issue, so there might be some option or config that avoids it?

@lian-bo
Member

lian-bo commented Jun 10, 2020

About applying the patch to crash: there are too many differences between gdb-7.6 and gdb-8.3+, and the patch has some dependencies, so it is not easy to backport from the latest gdb. Anyway, let me investigate later to see if this is doable.

About the RHEL8 debug modules (*.ko.debug) not reproducing the issue: it might be worth looking into why.
BTW, I checked the config but unfortunately didn't see any useful clues. Maybe some compile options are involved?

@k-hagio
Contributor

k-hagio commented Jun 18, 2020

I have not found any good workaround or fix for this issue so far.
Please use rd and struct commands instead of p for now.

@xuchunmei000

Hi, I am using crash-8.0.2 with gdb 10.2, and the problem still exists.
