kernel BUG at kernel/sched/core.c:2734 #89

Closed
diegofaraujo opened this issue May 24, 2016 · 2 comments

diegofaraujo commented May 24, 2016

BBB rev A5C freezes every day with kernel 4.1.22-ti-rt-r59.

[48816.076042] kernel BUG at kernel/sched/core.c:2734!
[48816.076047] Internal error: Oops - BUG: 0 [#7] PREEMPT SMP THUMB2
[48816.076095] Modules linked in: 8021q garp mrp stp llc usb_f_acm u_serial usb_f_rndis g_multi usb_f_mass_storage u_ether libcomposite pru_rproc snd_soc_evm pruss omap_rng tilcdc rng_core spi_omap2_mcspi snd_soc_davinci_mcasp snd_soc_edma tda998x snd_soc_hdmi_codec uio_pdrv_genirq uio
[48816.076104] CPU: 0 PID: 73 Comm: irq/162-mmc1 Tainted: G      D         4.1.22-ti-rt-r59 #1
[48816.076107] Hardware name: Generic AM33XX (Flattened Device Tree)
[48816.076112] task: dce29680 ti: dceec000 task.ti: dceec000
[48816.076119] PC is at __schedule+0x3ce/0x4f8
[48816.076124] LR is at rcu_preempt_qs+0x17/0x48
[48816.076130] pc : [<c06d679e>]    lr : [<c007ca5f>]    psr: 300f01b3
[48816.076130] sp : dceed298  ip : 1766e000  fp : 00000002
[48816.076134] r10: c06d68fd  r9 : df913280  r8 : c0b662b0
[48816.076139] r7 : dceed2a8  r6 : dce66d00  r5 : df913280  r4 : c0b5c280
[48816.076144] r3 : 57ac6e9d  r2 : c004f2dd  r1 : c09e0f2c  r0 : 00000000
[48816.076149] Flags: nzCV  IRQs off  FIQs on  Mode SVC_32  ISA Thumb  Segment user
[48816.076154] Control: 50c5387d  Table: 9dc54019  DAC: 00000015
[48816.076158] Process irq/162-mmc1 (pid: 73, stack limit = 0xdceec218)
[48816.076163] Stack: (0xdceed298 to 0xdceee000)
[48816.076168] d280:                                                       c095f2dc c0b65188
[48816.076177] d2a0: dceed458 600f0193 0000000b c06d67a6 00000002 c00748c7 c095f2dc dceec000
[48816.076186] d2c0: dce29680 c0b65188 dceed2e0 600f0193 0000000b c06d67a6 00000002 c06d68fd
[48816.076195] d2e0: dce29680 0000000b dceed458 c0035821 dceed458 c06d3c4f dceed458 c0074e33
[48816.076204] d300: c0b57664 c0c4b244 c0c4b244 c0b65188 dceed458 600f0193 0000000b c06d67a6
[48816.076213] d320: 00000002 c0011a97 dceec218 0000000b 302f3535 c0b6b1b0 c0b65188 00000020
[48816.076221] d340: bf000000 00000004 c0952fd0 c0952fd8 00000005 36000021 20623939 38393734
[48816.076230] d360: 30363420 36652034 28203739 32306564 38362029 00203333 00000000 00000000
[48816.076238] d380: c0b65188 00000000 c0c8eeec dc8ba316 00000000 00000000 00030001 00000004
[48816.076247] d3a0: dceed458 c06d679e df913280 c06d68fd 00000002 c0009107 00000006 c0072c79
[48816.076256] d3c0: 00000004 00000000 00030001 c06d679e c0c4eaaf ffffffff dceed41c 00000010
[48816.076265] d3e0: 000cec69 00000000 0000bead 00000010 0000000a 00000006 00000006 00000000
[48816.076273] d400: 00000000 00000000 c0b65188 00000000 c0b65188 00000000 ffffffff 00000030
[48816.076282] d420: 00000010 00000001 00000010 00000000 00000000 00000000 c0b65188 00000000
[48816.076291] d440: c06d67a0 300f01b3 ffffffff dceed48c c0b662b0 c06d8eff 00000000 c09e0f2c
[48816.076299] d460: c004f2dd 57ac6e9d c0b5c280 df913280 dce66d00 dceed4f0 c0b662b0 df913280
[48816.076308] d480: c06d68fd 00000002 ce7ce000 dceed4e0 c007ca5f c06d679e 300f01b3 ffffffff
[48816.076317] d4a0: c0b662b0 c007ca5f 00000000 dce29680 c0c3341c c007f76b c0b5c280 c0b5c280
[48816.076326] d4c0: df913280 c0b65188 dceed4f0 c0b662b0 0000000b c06d68fd 00000002 c06d6407
[48816.076334] d4e0: c095f2dc c0b65188 dceed6a0 600f0193 0000000b c06d67a6 00000002 c00748c7
[48816.076343] d500: c095f2dc dceec000 dce29680 c0b65188 dceed528 600f0193 0000000b c06d67a6
[48816.076352] d520: 00000002 c06d68fd dce29680 0000000b dceed6a0 c0035821 dceed6a0 c06d3c4f
[48816.076361] d540: dceed6a0 c0074e33 c0b57664 c0c4b244 c0c4b244 c0b65188 dceed6a0 600f0193
[48816.076369] d560: 0000000b c06d67a6 00000002 c0011a97 dceec218 0000000b 302f3535 c0b6b1b0
[48816.076378] d580: c0b65188 00000020 bf000000 00000004 c0952fd0 c0952fd8 00000004 36000021
[48816.076387] d5a0: 20623939 38393734 30363420 36652034 28203739 32306564 38362029 00203333
[48816.076396] d5c0: 00000000 00000000 c0b65188 00000000 c0c8eeec dc8ba316 00000000 00000000
[48816.076404] d5e0: 00030001 00000004 dceed6a0 c06d679e df913280 c06d68fd 00000002 c0009107
[48816.076413] d600: 00000006 c0072c79 00000004 00000000 00030001 c06d679e c0c4eaaf ffffffff
[48816.076422] d620: dceed664 00000010 0000e1c4 00000000 0000bead 00000010 0000000a 00000005
[48816.076430] d640: 00000006 00000000 00000000 00000000 c0b65188 00000000 c0b65188 00000000
[48816.076439] d660: ffffffff 00000030 00000010 00000001 00000010 00000000 00000000 00000000
[48816.076448] d680: c0b65188 00000000 c06d67a0 300f01b3 ffffffff dceed6d4 c0b662b0 c06d8eff
[48816.076456] d6a0: 00000000 c09e0f2c c004f2dd 57ac6e9d c0b5c280 df913280 dce66d00 dceed738
[48816.076465] d6c0: c0b662b0 df913280 c06d68fd 00000002 9278f800 dceed728 c007ca5f c06d679e
[48816.076474] d6e0: 300f01b3 ffffffff c0b662b0 c007ca5f 00000000 dce29680 c0c3341c c007f76b
[48816.076483] d700: c0b5c280 c0b5c280 df913280 c0b65188 dceed738 c0b662b0 0000000b c06d68fd
[48816.076491] d720: 00000002 c06d6407 c095f2dc c0b65188 dceed8e8 600f0193 0000000b c06d67a6
[48816.076500] d740: 00000002 c00748c7 c095f2dc dceec000 dce29680 c0b65188 dceed770 600f0193
[48816.076509] d760: 0000000b c06d67a6 00000002 c06d68fd dce29680 0000000b dceed8e8 c0035821
[48816.076517] d780: dceed8e8 c06d3c4f dceed8e8 c0074e33 c0b57664 c0c4b244 c0c4b244 c0b65188
[48816.076526] d7a0: dceed8e8 600f0193 0000000b c06d67a6 00000002 c0011a97 dceec218 0000000b
[48816.076535] d7c0: 302f3535 c0b6b1b0 c0b65188 00000020 bf000000 00000004 c0952fd0 c0952fd8
[48816.076544] d7e0: 00000003 36000021 20623939 38393734 30363420 36652034 28203739 32306564
[48816.076552] d800: 38362029 00203333 00000000 00000000 c0b65188 00000000 c0c8eeec dc8ba316
[48816.076561] d820: 00000000 00000000 00030001 00000004 dceed8e8 c06d679e df913280 c06d68fd
[48816.076569] d840: 00000002 c0009107 00000006 c0072c79 00000004 00000000 00030001 c06d679e
[48816.076578] d860: c0c4eaaf ffffffff dceed8ac 00000010 00077306 00000000 0000beac 00000010
[48816.076586] d880: 0000000a 00000006 00000006 00000000 00000000 00000000 c0b65188 00000000
[48816.076595] d8a0: c0b65188 00000000 ffffffff 00000030 00000010 00000001 00000010 00000000
[48816.076604] d8c0: 00000000 00000000 c0b65188 00000000 c06d67a0 300f01b3 ffffffff dceed91c
[48816.076612] d8e0: c0b662b0 c06d8eff 00000000 c09e0f2c c004f2dd 57ac6e9d c0b5c280 df913280
[48816.076621] d900: dce66d00 dceed980 c0b662b0 df913280 c06d68fd 00000002 63731000 dceed970
[48816.076630] d920: c007ca5f c06d679e 300f01b3 ffffffff c0b662b0 c007ca5f 00000000 dce29680
[48816.076639] d940: c0c3341c c007f76b c0b5c280 c0b5c280 df913280 c0b65188 dceed980 c0b662b0
[48816.076647] d960: 0000000b c06d68fd 00000002 c06d6407 c095f2dc c0b65188 dceedb30 600f0193
[48816.076656] d980: 0000000b c06d67a6 00000002 c00748c7 c095f2dc dceec000 dce29680 c0b65188
[48816.076665] d9a0: dceed9b8 600f0193 0000000b c06d67a6 00000002 c06d68fd dce29680 0000000b
[48816.076674] d9c0: dceedb30 c0035821 dceedb30 c06d3c4f dceedb30 c0074e33 c0b57664 c0c4b244
[48816.076682] d9e0: c0c4b244 c0b65188 dceedb30 600f0193 0000000b c06d67a6 00000002 c0011a97
[48816.076691] da00: dceec218 0000000b 302f3535 c0b6b1b0 c0b65188 00000020 bf000000 00000004
[48816.076700] da20: c0952fd0 c0952fd8 00000002 36000022 20623939 38393734 30363420 36652034
[48816.076708] da40: 28203739 32306564 38362029 00203333 c0c8eeec 00000022 c0c8ef13 00000002
[48816.076717] da60: 00000000 dc8ba316 00000000 00000000 00030001 00000004 dceedb30 c06d679e
[48816.076725] da80: df913280 c06d68fd 00000002 c0009107 00000006 c0c8f2cc 00000004 00000000
[48816.076734] daa0: 00030001 c06d679e c0c4eaaf ffffffff dceedaf4 00000010 000286fb 00000000
[48816.076743] dac0: 0000beac 00000010 0000000a 00000006 00000006 00000000 00000000 00000000
[48816.076751] dae0: c0b65188 00000000 c0b65188 00000000 ffffffff 00000030 00000010 00000001
[48816.076760] db00: 00000010 00000000 00000000 00000000 c0b65188 00000000 c06d67a0 300f01b3
[48816.076769] db20: ffffffff dceedb64 c0b662b0 c06d8eff 00000000 c09e0f2c c004f2dd 57ac6e9d
[48816.076778] db40: c0b5c280 df913280 dce66d00 dceedbc8 c0b662b0 df913280 c06d68fd 00000002
[48816.076786] db60: 41831000 dceedbb8 c007ca5f c06d679e 300f01b3 ffffffff c0b662b0 c007ca5f
[48816.076795] db80: 00000000 dce29680 c0c3341c c007f76b c0b5c280 c0b5c280 df913280 c0b65188
[48816.076804] dba0: dceedbc8 c0b662b0 0000000b c06d68fd 00000002 c06d6407 c095f2dc c0b65188
[48816.076813] dbc0: dceeddd8 600f0193 0000000b c0048892 00000002 c00748c7 c095f2dc dceec000
[48816.076821] dbe0: dce29680 c0b65188 dceedc00 600f0193 0000000b c0048892 00000002 c06d68fd
[48816.076830] dc00: dce29680 0000000b dceeddd8 c0035821 dceeddd8 c06d3c4f dceeddd8 c0074e33
[48816.076839] dc20: c0b57664 c0c4b244 c0c4b244 c0b65188 dceeddd8 600f0193 0000000b c0048892
[48816.076847] dc40: 00000002 c0011a97 dceec218 0000000b 00000000 c0b6b1b0 c0b65188 00000020
[48816.076856] dc60: bf000000 00000004 c0952fd0 c0952fd8 00000000 66000001 20643538 34306265
[48816.076865] dc80: 64386620 33332030 28203462 33353866 63302029 00203431 00000002 c00748c7
[48816.076874] dca0: c0a07034 dceedccc 00000017 dc8ba316 000001ff ffffffec 00000017 c095543c
[48816.076882] dcc0: 00000017 dceeddd8 00000000 c06d954c 00000002 c06d37e1 00000000 c06d980d
[48816.076891] dce0: c0c8ef01 c0c8f2cc dceedd2c 00000001 00000049 00000000 0000bea9 00000001
[48816.076900] dd00: 0000000a 00000001 0000ffff c06d951d c0b6b84c 00000017 ffffffec dceeddd8
[48816.076909] dd20: c0b6b8bc c06d954c 00000002 c000920d 00000010 00000001 00000010 00000000
[48816.076917] dd40: 00000000 dc8ba316 c0b65188 c0c8ef1f c095f337 c0072c79 ffffffff 00000000
[48816.076926] dd60: 00000044 c0c5b8bc 00000032 00000002 00000000 00000000 c0c4e580 c0072d5f
[48816.076935] dd80: ff0a0100 00000000 00000006 00000006 c0b8e284 00000032 c0b8e284 c007460f
[48816.076943] dda0: 00000002 00000032 00000000 c0c4e580 c0c8eeee c007460f 00000000 c004888a
[48816.076952] ddc0: 800f01b3 ffffffff dceede0c 200f0193 0000000b c06d8db5 dce29680 dceedf38
[48816.076961] dde0: 000004f4 00000000 dce29680 00000000 c0c4ca88 dce29680 200f0193 0000000b
[48816.076970] de00: c06d954c 00000002 2e4c1000 dceede20 c0076579 c004888a 800f01b3 ffffffff
[48816.076978] de20: c0076559 dce29c3c 00000000 c00471a1 00000000 dce29680 00000000 00000001
[48816.076987] de40: dceede50 c003548f dceee010 c06d3c4f 200f0193 c0074e33 dceee010 c0c4b244
[48816.076996] de60: c0c4b244 c0b65188 dceee010 200f0193 0000000b c06d954c 00000002 c0011a97
[48816.077004] de80: dceec218 0000000b 00000000 c0b6b1b0 c0b65188 00000020 00000000 00000004
[48816.077013] dea0: c0952fd0 c0952fd8 00000000 66000001 20333230 66313330 31363020 39362030
[48816.077022] dec0: 28206531 36643866 32372029 00203065 00000000 c00748c7 c0a07034 dceedf04
[48816.077031] dee0: 00000017 dc8ba316 000002e1 000002e1 00000017 c095543c 00000017 dceee010
[48816.077039] df00: 00000000 00000000 00000000 c06d37e1 00000000 c06d980d c0c32e1f 00000000
[48816.077048] df20: 00000000 c06d68fd dced0324 dceec000 dced0300 c0076661 00000000 c0076559
[48816.077057] df40: dceedf58 c06d951d c0b6b84c 00000017 000002e1 dceee010 c0b6b8bc 00000000
[48816.077065] df60: 00000000 c000920d 2f20a0e0 00000000 dd8d0780 dced0300 00000000 00000000
[48816.077074] df80: dceedf80 dceedf80 00000001 00010001 dceedf90 dceedf90 dceedfa0 dced0340
[48816.077082] dfa0: c00483c9 00000000 00000000 c000e961 00000000 00000000 00000000 00000000
[48816.077090] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[48816.077099] dfe0: 00000000 00000000 00000000 00000000 00000013 c06d9544 a00f01b3 ffffffff
[48816.077110] [<c06d679e>] (__schedule) from [<c06d68fd>] (schedule+0x35/0x90)
[48816.077122] [<c06d68fd>] (schedule) from [<c0035821>] (do_exit+0x615/0x7c0)
[48816.077131] [<c0035821>] (do_exit) from [<c0011a97>] (die+0x35b/0x378)
[48816.077140] [<c0011a97>] (die) from [<c0009107>] (do_undefinstr+0xf3/0x124)
[48816.077150] [<c0009107>] (do_undefinstr) from [<c06d8eff>] (__und_svc_finish+0x1/0x22)
[48816.077155] Exception stack(0xdceed458 to 0xdceed4a0)
[48816.077160] d440:                                                       00000000 c09e0f2c
[48816.077169] d460: c004f2dd 57ac6e9d c0b5c280 df913280 dce66d00 dceed4f0 c0b662b0 df913280
[48816.077178] d480: c06d68fd 00000002 ce7ce000 dceed4e0 c007ca5f c06d679e 300f01b3 ffffffff
[48816.077188] [<c06d8eff>] (__und_svc_finish) from [<c06d679e>] (__schedule+0x3ce/0x4f8)
[48816.077197] [<c06d679e>] (__schedule) from [<c06d68fd>] (schedule+0x35/0x90)
[48816.077207] [<c06d68fd>] (schedule) from [<c0035821>] (do_exit+0x615/0x7c0)
[48816.077216] [<c0035821>] (do_exit) from [<c0011a97>] (die+0x35b/0x378)
[48816.077224] [<c0011a97>] (die) from [<c0009107>] (do_undefinstr+0xf3/0x124)
[48816.077234] [<c0009107>] (do_undefinstr) from [<c06d8eff>] (__und_svc_finish+0x1/0x22)
[48816.077238] Exception stack(0xdceed6a0 to 0xdceed6e8)
[48816.077247] d6a0: 00000000 c09e0f2c c004f2dd 57ac6e9d c0b5c280 df913280 dce66d00 dceed738
[48816.077256] d6c0: c0b662b0 df913280 c06d68fd 00000002 9278f800 dceed728 c007ca5f c06d679e
[48816.077261] d6e0: 300f01b3 ffffffff
[48816.077271] [<c06d8eff>] (__und_svc_finish) from [<c06d679e>] (__schedule+0x3ce/0x4f8)
[48816.077280] [<c06d679e>] (__schedule) from [<c06d68fd>] (schedule+0x35/0x90)
[48816.077289] [<c06d68fd>] (schedule) from [<c0035821>] (do_exit+0x615/0x7c0)
[48816.077298] [<c0035821>] (do_exit) from [<c0011a97>] (die+0x35b/0x378)
[48816.077307] [<c0011a97>] (die) from [<c0009107>] (do_undefinstr+0xf3/0x124)
[48816.077316] [<c0009107>] (do_undefinstr) from [<c06d8eff>] (__und_svc_finish+0x1/0x22)
[48816.077320] Exception stack(0xdceed8e8 to 0xdceed930)
[48816.077328] d8e0:                   00000000 c09e0f2c c004f2dd 57ac6e9d c0b5c280 df913280
[48816.077337] d900: dce66d00 dceed980 c0b662b0 df913280 c06d68fd 00000002 63731000 dceed970
[48816.077343] d920: c007ca5f c06d679e 300f01b3 ffffffff
[48816.077353] [<c06d8eff>] (__und_svc_finish) from [<c06d679e>] (__schedule+0x3ce/0x4f8)
[48816.077362] [<c06d679e>] (__schedule) from [<c06d68fd>] (schedule+0x35/0x90)
[48816.077371] [<c06d68fd>] (schedule) from [<c0035821>] (do_exit+0x615/0x7c0)
[48816.077380] [<c0035821>] (do_exit) from [<c0011a97>] (die+0x35b/0x378)
[48816.077389] [<c0011a97>] (die) from [<c0009107>] (do_undefinstr+0xf3/0x124)
[48816.077399] [<c0009107>] (do_undefinstr) from [<c06d8eff>] (__und_svc_finish+0x1/0x22)
[48816.077403] Exception stack(0xdceedb30 to 0xdceedb78)
[48816.077409] db20:                                     00000000 c09e0f2c c004f2dd 57ac6e9d
[48816.077418] db40: c0b5c280 df913280 dce66d00 dceedbc8 c0b662b0 df913280 c06d68fd 00000002
[48816.077426] db60: 41831000 dceedbb8 c007ca5f c06d679e 300f01b3 ffffffff
[48816.077436] [<c06d8eff>] (__und_svc_finish) from [<c06d679e>] (__schedule+0x3ce/0x4f8)
[48816.077444] [<c06d679e>] (__schedule) from [<c06d68fd>] (schedule+0x35/0x90)
[48816.077454] [<c06d68fd>] (schedule) from [<c0035821>] (do_exit+0x615/0x7c0)
[48816.077463] [<c0035821>] (do_exit) from [<c0011a97>] (die+0x35b/0x378)
[48816.077475] [<c0011a97>] (die) from [<c06d37e1>] (__do_kernel_fault.part.0+0x55/0x60)
[48816.077486] [<c06d37e1>] (__do_kernel_fault.part.0) from [<c06d980d>] (do_page_fault+0x2f1/0x2fc)
[48816.077497] [<c06d980d>] (do_page_fault) from [<c000920d>] (do_DataAbort+0x35/0x90)
[48816.077506] [<c000920d>] (do_DataAbort) from [<c06d8db5>] (__dabt_svc+0x35/0x60)
[48816.077510] Exception stack(0xdceeddd8 to 0xdceede20)
[48816.077516] ddc0:                                                       dce29680 dceedf38
[48816.077525] dde0: 000004f4 00000000 dce29680 00000000 c0c4ca88 dce29680 200f0193 0000000b
[48816.077533] de00: c06d954c 00000002 2e4c1000 dceede20 c0076579 c004888a 800f01b3 ffffffff
[48816.077546] [<c06d8db5>] (__dabt_svc) from [<c004888a>] (kthread_data+0xa/0x10)
[48816.077558] [<c004888a>] (kthread_data) from [<c0076579>] (irq_thread_dtor+0x21/0x8c)
[48816.077567] [<c0076579>] (irq_thread_dtor) from [<c00471a1>] (task_work_run+0x79/0xac)
[48816.077579] [<c00471a1>] (task_work_run) from [<c003548f>] (do_exit+0x283/0x7c0)
[48816.077588] [<c003548f>] (do_exit) from [<c0011a97>] (die+0x35b/0x378)
[48816.077599] [<c0011a97>] (die) from [<c06d37e1>] (__do_kernel_fault.part.0+0x55/0x60)
[48816.077611] [<c06d37e1>] (__do_kernel_fault.part.0) from [<c06d980d>] (do_page_fault+0x2f1/0x2fc)
[48816.077620] [<c06d980d>] (do_page_fault) from [<c000920d>] (do_DataAbort+0x35/0x90)
[48816.077629] Code: 699b 4798 4604 e697 (de02) 6833
[48817.528672] ---[ end trace 0000000000000008 ]---
[48817.528675] Fixing recursive fault but reboot is needed!
[48817.528683] ------------[ cut here ]------------
RobertCNelson pushed a commit that referenced this issue May 31, 2018
[ Upstream commit 00c20cd ]

When aacraid init fails with "AAC0: adapter self-test failed.", shutdown
leads to UBSAN warning and then oops:

[154316.118423] ================================================================================
[154316.118508] UBSAN: Undefined behaviour in drivers/scsi/scsi_lib.c:2328:27
[154316.118566] member access within null pointer of type 'struct Scsi_Host'
[154316.118631] CPU: 2 PID: 14530 Comm: reboot Tainted: G        W        4.15.0-dirty #89
[154316.118701] Hardware name: Hewlett Packard HP NetServer/HP System Board, BIOS 4.06.46 PW 06/25/2003
[154316.118774] Call Trace:
[154316.118848]  dump_stack+0x48/0x65
[154316.118916]  ubsan_epilogue+0xe/0x40
[154316.118976]  __ubsan_handle_type_mismatch+0xfb/0x180
[154316.119043]  scsi_block_requests+0x20/0x30
[154316.119135]  aac_shutdown+0x18/0x40 [aacraid]
[154316.119196]  pci_device_shutdown+0x33/0x50
[154316.119269]  device_shutdown+0x18a/0x390
[...]
[154316.123435] BUG: unable to handle kernel NULL pointer dereference at 000000f4
[154316.123515] IP: scsi_block_requests+0xa/0x30

This is because aac_shutdown() does

        struct Scsi_Host *shost = pci_get_drvdata(dev);
        scsi_block_requests(shost);

and that assumes shost has been assigned with pci_set_drvdata().

However, pci_set_drvdata(pdev, shost) is called in aac_probe_one() well
after the point where the function bails out on an error from the init
function ((*aac_drivers[index].init)(aac)). When the init function fails,
no error is returned from aac_probe_one(), so the PCI layer assumes a
driver is attached and tries to shut it down later.

Fix it by returning an error from aac_probe_one() when the card-specific
init function fails.

This fixes reboot on my HP NetRAID-4M with dead battery.

Signed-off-by: Meelis Roos <[email protected]>
Reviewed-by: Dave Carroll <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
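
The failure mode described in the commit message above can be sketched as a small user-space C model. Every name here (`fake_dev`, `fake_host`, `card_init`, `probe_one_buggy`, `probe_one_fixed`, `bus_probe`, `shutdown_would_oops`) is a hypothetical stand-in, not the aacraid driver or the PCI core API. It only illustrates why a probe routine that reports success after a failed init leaves the shutdown path with a null driver-data pointer, and how propagating the error avoids that.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct Scsi_Host and struct pci_dev. */
struct fake_host { int blocked; };
struct fake_dev {
    void *drvdata;  /* what pci_set_drvdata() would store */
    int   bound;    /* nonzero once the bus considers a driver attached */
};

struct fake_host the_host;

/* Models the card-specific init function; fails when healthy is 0. */
int card_init(int healthy)
{
    return healthy ? 0 : -ENODEV;
}

/* Buggy shape: skips the rest of setup on init failure but returns 0. */
int probe_one_buggy(struct fake_dev *dev, int healthy)
{
    if (card_init(healthy) != 0)
        return 0;               /* error swallowed, drvdata never set */
    dev->drvdata = &the_host;
    return 0;
}

/* Fixed shape: propagate the init error so the bus does not bind. */
int probe_one_fixed(struct fake_dev *dev, int healthy)
{
    int err = card_init(healthy);

    if (err != 0)
        return err;
    dev->drvdata = &the_host;
    return 0;
}

/* Models the bus core: the device is bound only if probe returned 0. */
void bus_probe(struct fake_dev *dev,
               int (*probe)(struct fake_dev *, int), int healthy)
{
    assert(dev != NULL);
    dev->drvdata = NULL;
    dev->bound = (probe(dev, healthy) == 0);
}

/*
 * Models the bus shutdown path. Returns 1 if the driver's shutdown hook
 * would be called with NULL driver data (the oops case), 0 otherwise.
 */
int shutdown_would_oops(const struct fake_dev *dev)
{
    if (!dev->bound)
        return 0;               /* no driver attached, hook not called */
    return dev->drvdata == NULL;
}
```

In this model, probing with `probe_one_buggy` and a failing init leaves the device bound with null driver data, while `probe_one_fixed` leaves it unbound so the shutdown hook is never reached.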
RobertCNelson pushed a commit that referenced this issue Jun 4, 2018
[ Upstream commit 00c20cd ]

RobertCNelson pushed a commit that referenced this issue Jun 22, 2018
[ Upstream commit 0dfc0c7 ]

Flushing more than 4GB of device TLBs is allowed, so the mask should be
64 bits wide. UBSAN captured the fault shown below.

[    3.760024] ================================================================================
[    3.768440] UBSAN: Undefined behaviour in drivers/iommu/dmar.c:1348:3
[    3.774864] shift exponent 64 is too large for 32-bit type 'int'
[    3.780853] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G     U            4.17.0-rc1+ #89
[    3.788661] Hardware name: Dell Inc. OptiPlex 7040/0Y7WYT, BIOS 1.2.8 01/26/2016
[    3.796034] Call Trace:
[    3.798472]  <IRQ>
[    3.800479]  dump_stack+0x90/0xfb
[    3.803787]  ubsan_epilogue+0x9/0x40
[    3.807353]  __ubsan_handle_shift_out_of_bounds+0x10e/0x170
[    3.812916]  ? qi_flush_dev_iotlb+0x124/0x180
[    3.817261]  qi_flush_dev_iotlb+0x124/0x180
[    3.821437]  iommu_flush_dev_iotlb+0x94/0xf0
[    3.825698]  iommu_flush_iova+0x10b/0x1c0
[    3.829699]  ? fq_ring_free+0x1d0/0x1d0
[    3.833527]  iova_domain_flush+0x25/0x40
[    3.837448]  fq_flush_timeout+0x55/0x160
[    3.841368]  ? fq_ring_free+0x1d0/0x1d0
[    3.845200]  ? fq_ring_free+0x1d0/0x1d0
[    3.849034]  call_timer_fn+0xbe/0x310
[    3.852696]  ? fq_ring_free+0x1d0/0x1d0
[    3.856530]  run_timer_softirq+0x223/0x6e0
[    3.860625]  ? sched_clock+0x5/0x10
[    3.864108]  ? sched_clock+0x5/0x10
[    3.867594]  __do_softirq+0x1b5/0x6f5
[    3.871250]  irq_exit+0xd4/0x130
[    3.874470]  smp_apic_timer_interrupt+0xb8/0x2f0
[    3.879075]  apic_timer_interrupt+0xf/0x20
[    3.883159]  </IRQ>
[    3.885255] RIP: 0010:poll_idle+0x60/0xe7
[    3.889252] RSP: 0018:ffffb1b201943e30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[    3.896802] RAX: 0000000080200000 RBX: 000000000000008e RCX: 000000000000001f
[    3.903918] RDX: 0000000000000000 RSI: 000000002819aa06 RDI: 0000000000000000
[    3.911031] RBP: ffff9e93c6b33280 R08: 00000010f717d567 R09: 000000000010d205
[    3.918146] R10: ffffb1b201943df8 R11: 0000000000000001 R12: 00000000e01b169d
[    3.925260] R13: 0000000000000000 R14: ffffffffb12aa400 R15: 0000000000000000
[    3.932382]  cpuidle_enter_state+0xb4/0x470
[    3.936558]  do_idle+0x222/0x310
[    3.939779]  cpu_startup_entry+0x78/0x90
[    3.943693]  start_secondary+0x205/0x2e0
[    3.947607]  secondary_startup_64+0xa5/0xb0
[    3.951783] ================================================================================

Signed-off-by: Changbin Du <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
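
The UBSAN report above concerns a shift carried out in a 32-bit `int`. A minimal sketch of the general remedy follows, assuming two hypothetical helpers, `low_mask64()` and `is_aligned_pow2()`, that are not kernel functions and are not taken from the commit. The shift is done on a 64-bit unsigned operand, and a width of exactly 64 is handled separately because shifting a 64-bit value by 64 is itself undefined in C.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): return a mask with the
 * low `bits` bits set, for 0 <= bits <= 64. The shift uses a 64-bit
 * unsigned operand; `1 << bits` on a 32-bit int is undefined once the
 * result no longer fits, which is what UBSAN flagged.
 */
uint64_t low_mask64(unsigned int bits)
{
    assert(bits <= 64);
    if (bits == 64)
        return UINT64_MAX;      /* a shift by 64 would be undefined */
    return (UINT64_C(1) << bits) - 1;
}

/* Returns 1 if addr is a multiple of 2^bits, 0 otherwise. */
int is_aligned_pow2(uint64_t addr, unsigned int bits)
{
    return (addr & low_mask64(bits)) == 0;
}
```

With this shape, an alignment check against a 4GB boundary (`bits == 32`) no longer depends on a shift that overflows `int`.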
crow-misia pushed a commit to crow-misia/linux that referenced this issue May 11, 2019
[ Upstream commit 6c0afef ]

syzbot was able to catch a use-after-free read in pid_nr_ns() [1]

ip6fl_seq_show() appears to rely on RCU protection when dereferencing
fl->owner.pid, but fl_free() releases fl->owner.pid before the RCU grace
period has started.

[1]

BUG: KASAN: use-after-free in pid_nr_ns+0x128/0x140 kernel/pid.c:407
Read of size 4 at addr ffff888094012a04 by task syz-executor.0/18087

CPU: 0 PID: 18087 Comm: syz-executor.0 Not tainted 5.1.0-rc6+ #89
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x172/0x1f0 lib/dump_stack.c:113
 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187
 kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
 __asan_report_load4_noabort+0x14/0x20 mm/kasan/generic_report.c:131
 pid_nr_ns+0x128/0x140 kernel/pid.c:407
 ip6fl_seq_show+0x2f8/0x4f0 net/ipv6/ip6_flowlabel.c:794
 seq_read+0xad3/0x1130 fs/seq_file.c:268
 proc_reg_read+0x1fe/0x2c0 fs/proc/inode.c:227
 do_loop_readv_writev fs/read_write.c:701 [inline]
 do_loop_readv_writev fs/read_write.c:688 [inline]
 do_iter_read+0x4a9/0x660 fs/read_write.c:922
 vfs_readv+0xf0/0x160 fs/read_write.c:984
 kernel_readv fs/splice.c:358 [inline]
 default_file_splice_read+0x475/0x890 fs/splice.c:413
 do_splice_to+0x12a/0x190 fs/splice.c:876
 splice_direct_to_actor+0x2d2/0x970 fs/splice.c:953
 do_splice_direct+0x1da/0x2a0 fs/splice.c:1062
 do_sendfile+0x597/0xd00 fs/read_write.c:1443
 __do_sys_sendfile64 fs/read_write.c:1498 [inline]
 __se_sys_sendfile64 fs/read_write.c:1490 [inline]
 __x64_sys_sendfile64+0x15a/0x220 fs/read_write.c:1490
 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x458da9
Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f300d24bc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000028
RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 0000000000458da9
RDX: 00000000200000c0 RSI: 0000000000000008 RDI: 0000000000000007
RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000
R10: 000000000000005a R11: 0000000000000246 R12: 00007f300d24c6d4
R13: 00000000004c5fa3 R14: 00000000004da748 R15: 00000000ffffffff

Allocated by task 17543:
 save_stack+0x45/0xd0 mm/kasan/common.c:75
 set_track mm/kasan/common.c:87 [inline]
 __kasan_kmalloc mm/kasan/common.c:497 [inline]
 __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:470
 kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:505
 slab_post_alloc_hook mm/slab.h:437 [inline]
 slab_alloc mm/slab.c:3393 [inline]
 kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3555
 alloc_pid+0x55/0x8f0 kernel/pid.c:168
 copy_process.part.0+0x3b08/0x7980 kernel/fork.c:1932
 copy_process kernel/fork.c:1709 [inline]
 _do_fork+0x257/0xfd0 kernel/fork.c:2226
 __do_sys_clone kernel/fork.c:2333 [inline]
 __se_sys_clone kernel/fork.c:2327 [inline]
 __x64_sys_clone+0xbf/0x150 kernel/fork.c:2327
 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 7789:
 save_stack+0x45/0xd0 mm/kasan/common.c:75
 set_track mm/kasan/common.c:87 [inline]
 __kasan_slab_free+0x102/0x150 mm/kasan/common.c:459
 kasan_slab_free+0xe/0x10 mm/kasan/common.c:467
 __cache_free mm/slab.c:3499 [inline]
 kmem_cache_free+0x86/0x260 mm/slab.c:3765
 put_pid.part.0+0x111/0x150 kernel/pid.c:111
 put_pid+0x20/0x30 kernel/pid.c:105
 fl_free+0xbe/0xe0 net/ipv6/ip6_flowlabel.c:102
 ip6_fl_gc+0x295/0x3e0 net/ipv6/ip6_flowlabel.c:152
 call_timer_fn+0x190/0x720 kernel/time/timer.c:1325
 expire_timers kernel/time/timer.c:1362 [inline]
 __run_timers kernel/time/timer.c:1681 [inline]
 __run_timers kernel/time/timer.c:1649 [inline]
 run_timer_softirq+0x652/0x1700 kernel/time/timer.c:1694
 __do_softirq+0x266/0x95a kernel/softirq.c:293

The buggy address belongs to the object at ffff888094012a00
 which belongs to the cache pid_2 of size 88
The buggy address is located 4 bytes inside of
 88-byte region [ffff888094012a00, ffff888094012a58)
The buggy address belongs to the page:
page:ffffea0002500480 count:1 mapcount:0 mapping:ffff88809a483080 index:0xffff888094012980
flags: 0x1fffc0000000200(slab)
raw: 01fffc0000000200 ffffea00018a3508 ffffea0002524a88 ffff88809a483080
raw: ffff888094012980 ffff888094012000 000000010000001b 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff888094012900: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
 ffff888094012980: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
>ffff888094012a00: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
                   ^
 ffff888094012a80: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
 ffff888094012b00: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc

Fixes: 4f82f45 ("net ip6 flowlabel: Make owner a union of struct pid * and kuid_t")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Reported-by: syzbot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
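
The ordering problem described in the commit message above can be sketched as a user-space C model. All names (`owner`, `label`, `label_free_buggy`, `label_free_fixed`, `grace_period_callback`, `reader_owner_is_live`) are hypothetical and none of them are kernel APIs. The remedy shown, deferring the owner release to the callback that runs after the grace period, is an assumption made for illustration and is not quoted from the commit. A reference count of zero stands in for a freed object so that the model can be inspected safely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted object; refs == 0 models "already freed". */
struct owner { int refs; };

struct label {
    struct owner *owner;
    int owner_released;  /* set once the owner reference was dropped */
    int queued;          /* set while the label awaits the grace period */
};

void owner_put(struct owner *o)
{
    assert(o != NULL && o->refs > 0);
    o->refs--;
}

static void release_owner_once(struct label *l)
{
    if (!l->owner_released) {
        owner_put(l->owner);
        l->owner_released = 1;
    }
}

/* Buggy order: the owner is released before the grace period starts. */
void label_free_buggy(struct label *l)
{
    release_owner_once(l);   /* readers may still follow l->owner */
    l->queued = 1;
}

/* Assumed fix: only queue the label; the owner stays valid for readers. */
void label_free_fixed(struct label *l)
{
    l->queued = 1;
}

/* Models the deferred callback, run once no reader can hold the label. */
void grace_period_callback(struct label *l)
{
    assert(l->queued);
    release_owner_once(l);
    l->queued = 0;
}

/* Models a reader still inside its read-side critical section. */
int reader_owner_is_live(const struct label *l)
{
    return l->owner->refs > 0;
}
```

A reader that runs between the free call and the callback sees a dead owner in the buggy variant and a live one in the fixed variant.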
RobertCNelson pushed a commit that referenced this issue Nov 4, 2019
commit fe9bc16 upstream.

Nullify the resource task struct pointer to ensure that subsequent calls
won't try to release task_struct again.

------------[ cut here ]------------
ODEBUG: free active (active state 1) object type: rcu_head hint: (null)
WARNING: CPU: 0 PID: 6048 at lib/debugobjects.c:329 debug_print_object+0x16a/0x210 lib/debugobjects.c:326
Kernel panic - not syncing: panic_on_warn set ...

CPU: 0 PID: 6048 Comm: syz-executor022 Not tainted 4.19.0-rc7-next-20181008+ #89
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x244/0x3ab lib/dump_stack.c:113
  panic+0x238/0x4e7 kernel/panic.c:184
  __warn.cold.8+0x163/0x1ba kernel/panic.c:536
  report_bug+0x254/0x2d0 lib/bug.c:186
  fixup_bug arch/x86/kernel/traps.c:178 [inline]
  do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
  do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290
  invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:969
RIP: 0010:debug_print_object+0x16a/0x210 lib/debugobjects.c:326
Code: 41 88 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 92 00 00 00 48 8b 14 dd 60 02 41 88 4c 89 fe 48 c7 c7 00 f8 40 88 e8 36 2f b4 fd <0f> 0b 83 05 a9 f4 5e 06 01 48 83 c4 18 5b 41 5c 41 5d 41 5e 41 5f
RSP: 0018:ffff8801d8c3eda8 EFLAGS: 00010086
RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff8164d235 RDI: 0000000000000005
RBP: ffff8801d8c3ede8 R08: ffff8801d70aa280 R09: ffffed003b5c3eda
R10: ffffed003b5c3eda R11: ffff8801dae1f6d7 R12: 0000000000000001
R13: ffffffff8939a760 R14: 0000000000000000 R15: ffffffff8840fca0
  __debug_check_no_obj_freed lib/debugobjects.c:786 [inline]
  debug_check_no_obj_freed+0x3ae/0x58d lib/debugobjects.c:818
  kmem_cache_free+0x202/0x290 mm/slab.c:3759
  free_task_struct kernel/fork.c:163 [inline]
  free_task+0x16e/0x1f0 kernel/fork.c:457
  __put_task_struct+0x2e6/0x620 kernel/fork.c:730
  put_task_struct include/linux/sched/task.h:96 [inline]
  finish_task_switch+0x66c/0x900 kernel/sched/core.c:2715
  context_switch kernel/sched/core.c:2834 [inline]
  __schedule+0x8d7/0x21d0 kernel/sched/core.c:3480
  schedule+0xfe/0x460 kernel/sched/core.c:3524
  freezable_schedule include/linux/freezer.h:172 [inline]
  futex_wait_queue_me+0x3f9/0x840 kernel/futex.c:2530
  futex_wait+0x45c/0xa50 kernel/futex.c:2645
  do_futex+0x31a/0x26d0 kernel/futex.c:3528
  __do_sys_futex kernel/futex.c:3589 [inline]
  __se_sys_futex kernel/futex.c:3557 [inline]
  __x64_sys_futex+0x472/0x6a0 kernel/futex.c:3557
  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x446549
Code: e8 2c b3 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 2b 09 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f3a998f5da8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: ffffffffffffffda RBX: 00000000006dbc38 RCX: 0000000000446549
RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00000000006dbc38
RBP: 00000000006dbc30 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006dbc3c
R13: 2f646e6162696e69 R14: 666e692f7665642f R15: 00000000006dbd2c
Kernel Offset: disabled

Reported-by: [email protected]
Fixes: ed7a01f ("RDMA/restrack: Release task struct which was hold by CM_ID object")
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Cc: Pavel Machek <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
@pdp7
Contributor

pdp7 commented Jun 9, 2020

@RobertCNelson should I just close very old (4 years) issues like this?

@pdp7
Contributor

pdp7 commented Jun 10, 2020

Please re-open if still an issue with our current Debian images:
https://github.com/beagleboard/Latest-Images

pdp7 closed this as completed Jun 10, 2020
RobertCNelson pushed a commit that referenced this issue May 14, 2021
commit dbcc7d5 upstream.

While resolving backreferences, as part of a logical ino ioctl call or
fiemap, we can end up hitting a BUG_ON() when replaying tree mod log
operations of a root, triggering a stack trace like the following:

  ------------[ cut here ]------------
  kernel BUG at fs/btrfs/ctree.c:1210!
  invalid opcode: 0000 [#1] SMP KASAN PTI
  CPU: 1 PID: 19054 Comm: crawl_335 Tainted: G        W         5.11.0-2d11c0084b02-misc-next+ #89
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
  RIP: 0010:__tree_mod_log_rewind+0x3b1/0x3c0
  Code: 05 48 8d 74 10 (...)
  RSP: 0018:ffffc90001eb70b8 EFLAGS: 00010297
  RAX: 0000000000000000 RBX: ffff88812344e400 RCX: ffffffffb28933b6
  RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff88812344e42c
  RBP: ffffc90001eb7108 R08: 1ffff11020b60a20 R09: ffffed1020b60a20
  R10: ffff888105b050f9 R11: ffffed1020b60a1f R12: 00000000000000ee
  R13: ffff8880195520c0 R14: ffff8881bc958500 R15: ffff88812344e42c
  FS:  00007fd1955e8700(0000) GS:ffff8881f5600000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007efdb7928718 CR3: 000000010103a006 CR4: 0000000000170ee0
  Call Trace:
   btrfs_search_old_slot+0x265/0x10d0
   ? lock_acquired+0xbb/0x600
   ? btrfs_search_slot+0x1090/0x1090
   ? free_extent_buffer.part.61+0xd7/0x140
   ? free_extent_buffer+0x13/0x20
   resolve_indirect_refs+0x3e9/0xfc0
   ? lock_downgrade+0x3d0/0x3d0
   ? __kasan_check_read+0x11/0x20
   ? add_prelim_ref.part.11+0x150/0x150
   ? lock_downgrade+0x3d0/0x3d0
   ? __kasan_check_read+0x11/0x20
   ? lock_acquired+0xbb/0x600
   ? __kasan_check_write+0x14/0x20
   ? do_raw_spin_unlock+0xa8/0x140
   ? rb_insert_color+0x30/0x360
   ? prelim_ref_insert+0x12d/0x430
   find_parent_nodes+0x5c3/0x1830
   ? resolve_indirect_refs+0xfc0/0xfc0
   ? lock_release+0xc8/0x620
   ? fs_reclaim_acquire+0x67/0xf0
   ? lock_acquire+0xc7/0x510
   ? lock_downgrade+0x3d0/0x3d0
   ? lockdep_hardirqs_on_prepare+0x160/0x210
   ? lock_release+0xc8/0x620
   ? fs_reclaim_acquire+0x67/0xf0
   ? lock_acquire+0xc7/0x510
   ? poison_range+0x38/0x40
   ? unpoison_range+0x14/0x40
   ? trace_hardirqs_on+0x55/0x120
   btrfs_find_all_roots_safe+0x142/0x1e0
   ? find_parent_nodes+0x1830/0x1830
   ? btrfs_inode_flags_to_xflags+0x50/0x50
   iterate_extent_inodes+0x20e/0x580
   ? tree_backref_for_extent+0x230/0x230
   ? lock_downgrade+0x3d0/0x3d0
   ? read_extent_buffer+0xdd/0x110
   ? lock_downgrade+0x3d0/0x3d0
   ? __kasan_check_read+0x11/0x20
   ? lock_acquired+0xbb/0x600
   ? __kasan_check_write+0x14/0x20
   ? _raw_spin_unlock+0x22/0x30
   ? __kasan_check_write+0x14/0x20
   iterate_inodes_from_logical+0x129/0x170
   ? iterate_inodes_from_logical+0x129/0x170
   ? btrfs_inode_flags_to_xflags+0x50/0x50
   ? iterate_extent_inodes+0x580/0x580
   ? __vmalloc_node+0x92/0xb0
   ? init_data_container+0x34/0xb0
   ? init_data_container+0x34/0xb0
   ? kvmalloc_node+0x60/0x80
   btrfs_ioctl_logical_to_ino+0x158/0x230
   btrfs_ioctl+0x205e/0x4040
   ? __might_sleep+0x71/0xe0
   ? btrfs_ioctl_get_supported_features+0x30/0x30
   ? getrusage+0x4b6/0x9c0
   ? __kasan_check_read+0x11/0x20
   ? lock_release+0xc8/0x620
   ? __might_fault+0x64/0xd0
   ? lock_acquire+0xc7/0x510
   ? lock_downgrade+0x3d0/0x3d0
   ? lockdep_hardirqs_on_prepare+0x210/0x210
   ? lockdep_hardirqs_on_prepare+0x210/0x210
   ? __kasan_check_read+0x11/0x20
   ? do_vfs_ioctl+0xfc/0x9d0
   ? ioctl_file_clone+0xe0/0xe0
   ? lock_downgrade+0x3d0/0x3d0
   ? lockdep_hardirqs_on_prepare+0x210/0x210
   ? __kasan_check_read+0x11/0x20
   ? lock_release+0xc8/0x620
   ? __task_pid_nr_ns+0xd3/0x250
   ? lock_acquire+0xc7/0x510
   ? __fget_files+0x160/0x230
   ? __fget_light+0xf2/0x110
   __x64_sys_ioctl+0xc3/0x100
   do_syscall_64+0x37/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7fd1976e2427
  Code: 00 00 90 48 8b 05 (...)
  RSP: 002b:00007fd1955e5cf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
  RAX: ffffffffffffffda RBX: 00007fd1955e5f40 RCX: 00007fd1976e2427
  RDX: 00007fd1955e5f48 RSI: 00000000c038943b RDI: 0000000000000004
  RBP: 0000000001000000 R08: 0000000000000000 R09: 00007fd1955e6120
  R10: 0000557835366b00 R11: 0000000000000246 R12: 0000000000000004
  R13: 00007fd1955e5f48 R14: 00007fd1955e5f40 R15: 00007fd1955e5ef8
  Modules linked in:
  ---[ end trace ec8931a1c36e57be ]---

  (gdb) l *(__tree_mod_log_rewind+0x3b1)
  0xffffffff81893521 is in __tree_mod_log_rewind (fs/btrfs/ctree.c:1210).
  1205                     * the modification. as we're going backwards, we do the
  1206                     * opposite of each operation here.
  1207                     */
  1208                    switch (tm->op) {
  1209                    case MOD_LOG_KEY_REMOVE_WHILE_FREEING:
  1210                            BUG_ON(tm->slot < n);
  1211                            fallthrough;
  1212                    case MOD_LOG_KEY_REMOVE_WHILE_MOVING:
  1213                    case MOD_LOG_KEY_REMOVE:
  1214                            btrfs_set_node_key(eb, &tm->key, tm->slot);

Here's what happens to hit that BUG_ON():

1) We have one tree mod log user (through fiemap or the logical ino ioctl),
   with a sequence number of 1, so we have fs_info->tree_mod_seq == 1;

2) Another task is at ctree.c:balance_level() and we have eb X currently as
   the root of the tree, and we promote its single child, eb Y, as the new
   root.

   Then, at ctree.c:balance_level(), we call:

      tree_mod_log_insert_root(eb X, eb Y, 1);

3) At tree_mod_log_insert_root() we create tree mod log elements for each
   slot of eb X, of operation type MOD_LOG_KEY_REMOVE_WHILE_FREEING each
   with a ->logical pointing to ebX->start. These are placed in an array
   named tm_list.
   Let's assume there are N elements (N pointers in eb X);

4) Then, still at tree_mod_log_insert_root(), we create a tree mod log
   element of operation type MOD_LOG_ROOT_REPLACE, ->logical set to
   ebY->start, ->old_root.logical set to ebX->start, ->old_root.level set
   to the level of eb X and ->generation set to the generation of eb X;

5) Then tree_mod_log_insert_root() calls tree_mod_log_free_eb() with
   tm_list as argument. After that, tree_mod_log_free_eb() calls
   __tree_mod_log_insert() for each member of tm_list in reverse order,
   from highest slot in eb X, slot N - 1, to slot 0 of eb X;

6) __tree_mod_log_insert() sets the sequence number of each given tree mod
   log operation - it increments fs_info->tree_mod_seq and sets
   fs_info->tree_mod_seq as the sequence number of the given tree mod log
   operation.

   This means that for the tm_list created at tree_mod_log_insert_root(),
   the element corresponding to slot 0 of eb X has the highest sequence
   number (1 + N), and the element corresponding to the last slot has the
   lowest sequence number (2);

7) Then, after inserting tm_list's elements into the tree mod log rbtree,
   the MOD_LOG_ROOT_REPLACE element is inserted, which gets the highest
   sequence number, which is N + 2;

8) Back to ctree.c:balance_level(), we free eb X by calling
   btrfs_free_tree_block() on it. Because eb X was created in the current
   transaction, has no other references and writeback did not happen for
   it, we add it back to the free space cache/tree;

9) Later some other task T allocates the metadata extent from eb X, since
   it is marked as free space in the space cache/tree, and uses it as a
   node for some other btree;

10) The tree mod log user task calls btrfs_search_old_slot(), which calls
    get_old_root(), and finally that calls __tree_mod_log_oldest_root()
    with time_seq == 1 and eb_root == eb Y;

11) First iteration of the while loop finds the tree mod log element with
    sequence number N + 2, for the logical address of eb Y and of type
    MOD_LOG_ROOT_REPLACE;

12) Because the operation type is MOD_LOG_ROOT_REPLACE, we don't break out
    of the loop, and set root_logical to point to tm->old_root.logical
    which corresponds to the logical address of eb X;

13) On the next iteration of the while loop, the call to
    tree_mod_log_search_oldest() returns the smallest tree mod log element
    for the logical address of eb X, which has a sequence number of 2, an
    operation type of MOD_LOG_KEY_REMOVE_WHILE_FREEING and corresponds to
    the old slot N - 1 of eb X (eb X had N items in it before being freed);

14) We then break out of the while loop and return the tree mod log operation
    of type MOD_LOG_ROOT_REPLACE (eb Y), and not the one for slot N - 1 of
    eb X, to get_old_root();

15) At get_old_root(), we process the MOD_LOG_ROOT_REPLACE operation
    and set "logical" to the logical address of eb X, which was the old
    root. We then call tree_mod_log_search() passing it the logical
    address of eb X and time_seq == 1;

16) Then before calling tree_mod_log_search(), task T adds a key to eb X,
    which results in adding a tree mod log operation of type
    MOD_LOG_KEY_ADD to the tree mod log - this is done at
    ctree.c:insert_ptr() - but after adding the tree mod log operation
    and before updating the number of items in eb X from 0 to 1...

17) The task at get_old_root() calls tree_mod_log_search() and gets the
    tree mod log operation of type MOD_LOG_KEY_ADD just added by task T.
    Then it enters the following if branch:

    if (old_root && tm && tm->op != MOD_LOG_KEY_REMOVE_WHILE_FREEING) {
       (...)
    } (...)

    It then calls read_tree_block() for eb X, which gets a reference on
    eb X but does not lock it - task T has it locked.
    Then it clones eb X while it has nritems set to 0 in its header, before
    task T sets nritems to 1 in eb X's header. From here on we use the
    clone of eb X, which no other task has access to;

18) Then we call __tree_mod_log_rewind(), passing it the MOD_LOG_KEY_ADD
    mod log operation we just got from tree_mod_log_search() in the
    previous step and the cloned version of eb X;

19) At __tree_mod_log_rewind(), we set the local variable "n" to the number
    of items set in eb X's clone, which is 0. Then we enter the while loop,
    and in its first iteration we process the MOD_LOG_KEY_ADD operation,
    which just decrements "n" from 0 to (u32)-1, since "n" is declared with
    a type of u32. At the end of this iteration we call rb_next() to find the
    next tree mod log operation for eb X, that gives us the mod log operation
    of type MOD_LOG_KEY_REMOVE_WHILE_FREEING, for slot 0, with a sequence
    number of N + 1 (steps 3 to 6);

20) Then we go back to the top of the while loop and trigger the following
    BUG_ON():

        (...)
        switch (tm->op) {
        case MOD_LOG_KEY_REMOVE_WHILE_FREEING:
                 BUG_ON(tm->slot < n);
                 fallthrough;
        (...)

    Because "n" has a value of (u32)-1 (4294967295) and tm->slot is 0.
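
The sequence numbering from steps 5 to 7 and the wraparound from steps 19
and 20 can be modelled in plain user-space C. This is only a sketch under
stated assumptions: assign_seqs() and rewind_hits_bug_on() are hypothetical
names invented for illustration, not btrfs functions, and they reproduce
just the arithmetic described above, not the tree mod log itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of steps 5 to 7 (not the btrfs implementation).
 * The per-slot elements are inserted in reverse slot order, and each
 * insertion increments the counter and takes the new value as its
 * sequence number. seq_of_slot must have room for nslots entries.
 * Returns the sequence number the root replace element gets afterwards.
 */
static uint64_t assign_seqs(uint64_t tree_mod_seq, uint32_t nslots,
                            uint64_t *seq_of_slot)
{
    assert(seq_of_slot);

    /* Highest slot first, slot 0 last, as described in step 5. */
    for (uint32_t i = nslots; i > 0; i--)
        seq_of_slot[i - 1] = ++tree_mod_seq;

    /* The root replace element is inserted last (step 7). */
    return ++tree_mod_seq;
}

/*
 * Hypothetical model of steps 19 and 20. "n" starts as the number of
 * items seen in the clone. Rewinding one key add decrements it, and
 * because it is an unsigned 32-bit value, 0 wraps to UINT32_MAX.
 * Returns true when the condition checked by the BUG_ON() holds.
 */
static bool rewind_hits_bug_on(uint32_t nritems_in_clone, uint32_t freed_slot)
{
    uint32_t n = nritems_in_clone;

    n--;                    /* rewind of the key add */
    return freed_slot < n;  /* condition from the BUG_ON() */
}
```

With a starting counter of 1 and N slots, the model gives slot N - 1 the
sequence number 2, slot 0 the sequence number N + 1, and the root replace
element N + 2. Calling rewind_hits_bug_on(0, 0) returns true, matching the
crash, while rewind_hits_bug_on(1, 0) returns false, which is what a clone
taken after nritems was updated would produce.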

Fix this by taking a read lock on the extent buffer before cloning it at
ctree.c:get_old_root(). This should be done regardless of the extent
buffer having been freed and reused, as a concurrent task might be
modifying it (while holding a write lock on it).

Reported-by: Zygo Blaxell <[email protected]>
Link: https://lore.kernel.org/linux-btrfs/[email protected]/
Fixes: 834328a ("Btrfs: tree mod log's old roots could still be part of the tree")
CC: [email protected] # 4.4+
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>