Found more than one non-landing-pad successor #322

Closed
bwendling opened this issue Jan 18, 2019 · 11 comments
Assignees
Labels
asm goto related to the implementation of asm goto [BUG] llvm A bug that should be fixed in upstream LLVM Compiler crash This bug makes Clang crash, emitting a backtrace unreproducible Not or no longer reproducible wontfix This will not be worked on

Comments

@bwendling

clang-8: /usr/local/google/home/morbo/llvm/llvm.src/lib/CodeGen/MachineBasicBlock.cpp:548: void llvm::MachineBasicBlock::updateTerminator(): Assertion `!TBB && "Found more than one non-landing-pad successor!"' failed.
Stack dump:
0.	Program arguments: /sandbox/morbo/llvm/llvm.opt.install/bin/clang-8 -cc1 -triple x86_64-unknown-linux-gnu -S -disable-free -main-file-name intel.c -mrelocation-model static -mthread-model posix -fno-delete-null-pointer-checks -mllvm -warn-stack-size=2048 -mdisable-fp-elim -relaxed-aliasing -fmath-errno -masm-verbose -no-integrated-as -mconstructor-aliases -fuse-init-array -mcode-model kernel -target-cpu x86-64 -target-feature +retpoline-indirect-calls -target-feature +retpoline-indirect-branches -target-feature -sse -target-feature -mmx -target-feature -sse2 -target-feature -3dnow -target-feature -avx -target-feature -x87 -target-feature +retpoline-external-thunk -disable-red-zone -dwarf-column-info -debugger-tuning=gdb -coverage-notes-file /usr/local/google/home/morbo/prodkernel-gcc/arch/x86/kernel/cpu/microcode/intel.gcno -nostdsysteminc -nobuiltininc -resource-dir /sandbox/morbo/llvm/llvm.opt.install/lib/clang/9.0.0 -dependency-file arch/x86/kernel/cpu/microcode/.intel.o.d -MT arch/x86/kernel/cpu/microcode/intel.o -sys-header-deps -isystem /sandbox/morbo/llvm/llvm.opt.install/lib/clang/9.0.0/include -include ./include/linux/kconfig.h -include ./include/linux/compiler_types.h -I ./arch/x86/include -I ./arch/x86/include/generated -I ./include -I ./arch/x86/include/uapi -I ./arch/x86/include/generated/uapi -I ./include/uapi -I ./include/generated/uapi -D __KERNEL__ -D CONFIG_X86_X32_ABI -D CONFIG_AS_CFI=1 -D CONFIG_AS_CFI_SIGNAL_FRAME=1 -D CONFIG_AS_CFI_SECTIONS=1 -D CONFIG_AS_FXSAVEQ=1 -D CONFIG_AS_SSSE3=1 -D CONFIG_AS_AVX=1 -D CONFIG_AS_AVX2=1 -D CONFIG_AS_AVX512=1 -D CONFIG_AS_SHA1_NI=1 -D CONFIG_AS_SHA256_NI=1 -D CC_USING_FENTRY -D KBUILD_BASENAME="intel" -D KBUILD_MODNAME="microcode" -O2 -Wall -Wundef -Werror=strict-prototypes -Wno-trigraphs -Werror-implicit-function-declaration -Werror=implicit-int -Wno-format-security -Wno-sign-compare -Wno-format-invalid-specifier -Wno-gnu -Wno-address-of-packed-member -Wno-tautological-compare -Wno-unused-const-variable 
-Wdeclaration-after-statement -Wvla -Wno-pointer-sign -Werror=date-time -Werror=incompatible-pointer-types -Wno-initializer-overrides -Wno-unused-value -Wno-format -Wno-sign-compare -Wno-format-zero-length -Wno-uninitialized -std=gnu89 -fno-dwarf-directory-asm -fdebug-compilation-dir /usr/local/google/home/morbo/prodkernel-gcc -ferror-limit 19 -fmessage-length 0 -fsanitize-coverage-type=3 -fsanitize-coverage-trace-cmp -fsanitize-coverage-trace-pc -fsanitize=kernel-address -fsanitize-recover=kernel-address -pg -mfentry -fwrapv -stack-protector 2 -mstack-alignment=8 -fwchar-type=short -fno-signed-wchar -fobjc-runtime=gcc -fno-common -fdiagnostics-show-option -vectorize-loops -vectorize-slp -mllvm -asan-mapping-offset=0xdffffc0000000000 -mllvm -asan-globals=1 -mllvm -asan-instrumentation-with-call-threshold=0 -mllvm -asan-stack=1 -mllvm -asan-use-after-scope=1 -o /tmp/intel-a6b9a9.s -x c arch/x86/kernel/cpu/microcode/intel.c 
1.	<eof> parser at end of file
2.	Code generation
3.	Running pass 'Function Pass Manager' on module 'arch/x86/kernel/cpu/microcode/intel.c'.
4.	Running pass 'Branch Probability Basic Block Placement' on function '@generic_load_microcode'
 #0 0x00005640793e158a llvm::sys::PrintStackTrace(llvm::raw_ostream&) (/sandbox/morbo/llvm/llvm.opt.install/bin/clang-8+0x27bf58a)
 #1 0x00005640793df554 llvm::sys::RunSignalHandlers() (/sandbox/morbo/llvm/llvm.opt.install/bin/clang-8+0x27bd554)
 #2 0x00005640793df682 SignalHandler(int) (/sandbox/morbo/llvm/llvm.opt.install/bin/clang-8+0x27bd682)
 #3 0x00007f2ae87420c0 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x110c0)
 #4 0x00007f2ae72d3fcf gsignal (/lib/x86_64-linux-gnu/libc.so.6+0x32fcf)
 #5 0x00007f2ae72d53fa abort (/lib/x86_64-linux-gnu/libc.so.6+0x343fa)
 #6 0x00007f2ae72cce37 (/lib/x86_64-linux-gnu/libc.so.6+0x2be37)
 #7 0x00007f2ae72ccee2 (/lib/x86_64-linux-gnu/libc.so.6+0x2bee2)
 #8 0x0000564078a745e0 llvm::MachineBasicBlock::addLiveIn(unsigned short, llvm::TargetRegisterClass const*) (/sandbox/morbo/llvm/llvm.opt.install/bin/clang-8+0x1e525e0)
 #9 0x0000564078cdb7b4 (anonymous namespace)::MachineBlockPlacement::buildCFGChains() (/sandbox/morbo/llvm/llvm.opt.install/bin/clang-8+0x20b97b4)
#10 0x0000564078cdc493 (anonymous namespace)::MachineBlockPlacement::runOnMachineFunction(llvm::MachineFunction&) (.part.348) (/sandbox/morbo/llvm/llvm.opt.install/bin/clang-8+0x20ba493)
#11 0x0000564078ab9edf llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (/sandbox/morbo/llvm/llvm.opt.install/bin/clang-8+0x1e97edf)
#12 0x0000564078e4f829 llvm::FPPassManager::runOnFunction(llvm::Function&) (/sandbox/morbo/llvm/llvm.opt.install/bin/clang-8+0x222d829)
#13 0x0000564078e4f8d9 llvm::FPPassManager::runOnModule(llvm::Module&) (/sandbox/morbo/llvm/llvm.opt.install/bin/clang-8+0x222d8d9)
#14 0x0000564078e4ebf2 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/sandbox/morbo/llvm/llvm.opt.install/bin/clang-8+0x222cbf2)
#15 0x00005640795d414c (anonymous namespace)::EmitAssemblyHelper::EmitAssembly(clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >) (/sandbox/morbo/llvm/llvm.opt.install/bin/clang-8+0x29b214c)
#16 0x00005640795d5af5 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::DataLayout const&, llvm::Module*, clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >) (/sandbox/morbo/llvm/llvm.opt.install/bin/clang-8+0x29b3af5)
#17 0x0000564079f23200 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/sandbox/morbo/llvm/llvm.opt.install/bin/clang-8+0x3301200)
#18 0x000056407a786719 clang::ParseAST(clang::Sema&, bool, bool) (/sandbox/morbo/llvm/llvm.opt.install/bin/clang-8+0x3b64719)
#19 0x0000564079f21f20 clang::CodeGenAction::ExecuteAction() (/sandbox/morbo/llvm/llvm.opt.install/bin/clang-8+0x32fff20)
#20 0x0000564079a5af1e clang::FrontendAction::Execute() (/sandbox/morbo/llvm/llvm.opt.install/bin/clang-8+0x2e38f1e)
#21 0x0000564079a1ca06 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/sandbox/morbo/llvm/llvm.opt.install/bin/clang-8+0x2dfaa06)
#22 0x0000564079afae45 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/sandbox/morbo/llvm/llvm.opt.install/bin/clang-8+0x2ed8e45)
#23 0x0000564077946208 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/sandbox/morbo/llvm/llvm.opt.install/bin/clang-8+0xd24208)
#24 0x00005640778a494c main (/sandbox/morbo/llvm/llvm.opt.install/bin/clang-8+0xc8294c)
#25 0x00007f2ae72c12b1 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202b1)
#26 0x0000564077941dca _start (/sandbox/morbo/llvm/llvm.opt.install/bin/clang-8+0xd1fdca)
clang-8: error: unable to execute command: Aborted
clang-8: error: clang frontend command failed due to signal (use -v to see invocation)
clang version 9.0.0 (trunk) (llvm/trunk 351520)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/local/google/home/morbo/llvm/llvm.opt.install/bin
clang-8: note: diagnostic msg: PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
clang-8: note: diagnostic msg: 
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang-8: note: diagnostic msg: /tmp/intel-a595d8.c
clang-8: note: diagnostic msg: /tmp/intel-a595d8.sh
clang-8: note: diagnostic msg: 

********************
scripts/Makefile.build:276: recipe for target 'arch/x86/kernel/cpu/microcode/intel.o' failed
make[4]: *** [arch/x86/kernel/cpu/microcode/intel.o] Error 254
scripts/Makefile.build:492: recipe for target 'arch/x86/kernel/cpu/microcode' failed
make[3]: *** [arch/x86/kernel/cpu/microcode] Error 2
scripts/Makefile.build:492: recipe for target 'arch/x86/kernel/cpu' failed
make[2]: *** [arch/x86/kernel/cpu] Error 2
scripts/Makefile.build:492: recipe for target 'arch/x86/kernel' failed
make[1]: *** [arch/x86/kernel] Error 2
Makefile:1043: recipe for target 'arch/x86' failed
make: *** [arch/x86] Error 2

intel-a595d8.c.txt
intel-a595d8.sh.txt

@nickdesaulniers nickdesaulniers added the [BUG] llvm A bug that should be fixed in upstream LLVM label Jan 18, 2019
@nickdesaulniers
Member

nickdesaulniers commented Jan 22, 2019

@gwelymernans, with the attached files I was not able to reproduce.

$ mv intel-a595d8.c.txt intel-a595d8.c
$ mv intel-a595d8.sh.txt intel-a595d8.sh
$ chmod +x intel-a595d8.sh
$ vim !$
<edit to remove absolute path to clang>
$ ./intel-a595d8.sh
$ file intel-a595d8.s 
intel-a595d8.s: assembler source, ASCII text

This is with a build of Clang with D53765 and D56571 patched in. (Am I understanding correctly that this is asm goto related, or not?)

intel-a595d8.s.txt

@nickdesaulniers nickdesaulniers added the unreproducible Not or no longer reproducible label Jan 22, 2019
@bwendling
Author

Weird. I got a different error, though: #324. It didn't get far enough to compile this file...

@nickdesaulniers
Member

Ctopper mentions that the abort is due to an assert, which I wouldn't observe with my release builds. Let me retry with a debug build.
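
For context, here is a minimal C sketch of why a failure that is only caught by an assertion goes unnoticed in a release build. This is not LLVM's code: `struct succ` and `find_fallthrough` are hypothetical names that only loosely mirror the check quoted in the crash message. With assertions enabled, a second non-landing-pad successor aborts; when compiled with `-DNDEBUG`, as release builds typically are, the `assert` expands to nothing and the function returns quietly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a basic block's successor entry. */
struct succ {
    int is_landing_pad;
};

/*
 * Return the index of the non-landing-pad successor, or -1 if there is none.
 * Built without NDEBUG, a second such successor trips the assert and aborts.
 * Built with -DNDEBUG, the assert is compiled out and the last match wins.
 */
int find_fallthrough(const struct succ *s, size_t n)
{
    int found = -1;

    for (size_t i = 0; i < n; i++) {
        if (s[i].is_landing_pad)
            continue;
        assert(found == -1 && "Found more than one non-landing-pad successor!");
        found = (int)i;
    }
    return found;
}
```

That difference would explain why the same reproducer can produce an assembly file under one build of Clang and a backtrace under another.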

@nickdesaulniers
Member

ah, now I'm able to repro. Bisecting.

@nickdesaulniers nickdesaulniers removed the unreproducible Not or no longer reproducible label Jan 23, 2019
@nickdesaulniers nickdesaulniers self-assigned this Jan 23, 2019
@nickdesaulniers nickdesaulniers added the asm goto related to the implementation of asm goto label Jan 23, 2019
@dileks
Collaborator

dileks commented Jan 25, 2019

I am testing Linux v5.0-rc3+ on Debian/buster AMD64 with a self-built llvm-toolchain-8.0.0-rc1 (that is, llvm, clang and compiler-rt) with the asm-goto RFC prototype patches D53765.id182593.diff and D56571.id181973.diff applied.

With this setup I see in drivers/hwmon/abituguru.c and net/mac80211/mlme.c:

clang-8: /home/sdi/src/llvm-toolchain/llvm/lib/CodeGen/MachineBasicBlock.cpp:548: void llvm::MachineBasicBlock::updateTerminator(): Assertion `!TBB && "Found more than one non-landing-pad successor!"' failed.

As a workaround, I disabled these kernel configs (the build is still running):

 MAC80211 m -> n
 SENSORS_ABITUGURU m -> n
 SENSORS_ABITUGURU3 m -> n

Attached are the reproducers.

abituguru-f9b0af.c.txt
abituguru-f9b0af.sh.txt
mlme-03ee06.c.txt
mlme-03ee06.sh.txt

@nickdesaulniers
Member

nickdesaulniers commented Jan 25, 2019

Hmmm....creduce is not reaching a fixed point for this. (It narrows down the repro, then just keeps expanding in size, moving in the wrong direction).

I was able to pare down the command line flags. Looks like some kind of interaction between -fsanitize=kernel-address (cc @ramosian-glider , @dvyukov ) and -mrelocation-model static.

intel-a595d8.sh.txt
intel-a595d8.c.txt

Going to take a look now at @dileks's report, since I'll bet it's not adding a bunch of funny configs.

@nickdesaulniers
Member

@dileks I'm not able to repro. Can you provide steps to reproduce starting from a defconfig?

$ make CC=clang -j46 defconfig
$ ./scripts/config -e SENSORS_ABITUGURU -e SENSORS_ABITUGURU3
$ grep ABITU .config
CONFIG_SENSORS_ABITUGURU=y
CONFIG_SENSORS_ABITUGURU3=y
$ make CC=clang -j46 drivers/hwmon/abituguru.o
...
  CC      drivers/hwmon/abituguru.o
$ echo $?
0
$ make CC=clang -j46 net/mac80211/mlme.o
...
  CC      net/mac80211/mlme.o
$ echo $?
0

@nickdesaulniers
Member

Fixed in https://reviews.llvm.org/D53765 diff 183623. Thanks @topperc !

@nickdesaulniers nickdesaulniers added the unreproducible Not or no longer reproducible label Jan 26, 2019
@bwendling
Author

BTW, I was able to build an allyesconfig with Clang + asm goto patches. (It failed to link because of some gcov missing symbols, but I think those are unrelated to the asm goto stuff.)

@dileks
Collaborator

dileks commented Jan 26, 2019

@nickdesaulniers

My setup is now llvm-toolchain-8.0.0rc1 (llvm and clang only, dropped compiler-rt) and applied D53765-id183688.diff and D56571-id181973.diff.

Testing with x86_64_defconfig and...

$ make CC=clang-8 -j46 defconfig
$ cp -v .config ../config-x86_64_defconfig

$ ./scripts/config -m SENSORS_ABITUGURU -m SENSORS_ABITUGURU3 -m MAC80211
$ ./scripts/diffconfig ../config-x86_64_defconfig .config
 MAC80211 y -> m
 SENSORS_ABITUGURU n -> m
 SENSORS_ABITUGURU3 n -> m

$ make CC=clang-8 -j46 drivers/hwmon/abituguru.o
scripts/kconfig/conf  --syncconfig Kconfig
  DESCEND  objtool
  CALL    scripts/checksyscalls.sh
  CC [M]  drivers/hwmon/abituguru.o

$ make CC=clang-8 -j46 net/mac80211/mlme.o
  DESCEND  objtool
  CALL    scripts/checksyscalls.sh
  CC [M]  net/mac80211/mlme.o

...looks good/promising.

Will do more testing with kernel-config from Debian's config-4.20.0-trunk-amd64.

@nickdesaulniers
Member

@gwelymernans see #17. TL;DR: the patches are currently at V3 in upstream review via @vo4

@tpimh tpimh added the Compiler crash This bug makes Clang crash, emitting a backtrace label Mar 7, 2019
@nickdesaulniers nickdesaulniers added wontfix This will not be worked on and removed wontfix This will not be worked on labels May 20, 2019
nathanchance pushed a commit that referenced this issue Feb 4, 2020
When module is being initialized, __init() calls bus_register() and
driver_register().
These functions internally create various resources and sysfs files.
The sysfs files are used for basic operations(add/del device).
/sys/bus/netdevsim/new_device
/sys/bus/netdevsim/del_device

These sysfs files use netdevsim resources, most of which are allocated
and initialized in the ->probe() function, nsim_dev_probe().
However, the sysfs files can be written before ->probe() has finished,
so uninitialized data may be accessed.

Another problem is very similar.
/sys/bus/netdevsim/new_device internally creates sysfs files.
/sys/devices/netdevsim<id>/new_port
/sys/devices/netdevsim<id>/del_port

These sysfs files also use netdevsim resources, most of which are allocated
and initialized in the device creation routine, nsim_bus_dev_new().
They too can be written before nsim_bus_dev_new() has finished,
so uninitialized data may be accessed.

To fix these problems, this patch adds flags that indicate whether
initialization has finished.
The flag variable 'nsim_bus_enable' indicates whether the netdevsim bus has
been initialized.
It is protected by nsim_bus_dev_list_lock.
The flag variable 'nsim_bus_dev->init' indicates whether nsim_bus_dev has
been initialized.
It can be used in {new/del}_port_store() without a lock.
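
The flag-under-lock idea can be sketched in user space as follows. This is only an illustration under stated assumptions: it uses a pthread mutex in place of the kernel mutex, the names `bus_dev_list_lock`, `bus_enable`, `bus_set_enable` and `new_device_store` are hypothetical stand-ins, and the `-EBUSY` return value is an assumption rather than what the patch necessarily returns.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins for the lock and flag named above. */
static pthread_mutex_t bus_dev_list_lock = PTHREAD_MUTEX_INITIALIZER;
static bool bus_enable; /* true only once bus setup has completed */

/* Called at the end of init (true) and at the start of teardown (false). */
void bus_set_enable(bool on)
{
    pthread_mutex_lock(&bus_dev_list_lock);
    bus_enable = on;
    pthread_mutex_unlock(&bus_dev_list_lock);
}

/* Sketch of a store handler: refuse to touch bus state until it is ready. */
int new_device_store(void)
{
    int ret = 0;

    pthread_mutex_lock(&bus_dev_list_lock);
    if (!bus_enable)
        ret = -EBUSY;   /* assumption: the real patch may use another errno */
    /* else: it would be safe to allocate and register the device here */
    pthread_mutex_unlock(&bus_dev_list_lock);
    return ret;
}
```

Because the flag is read and written under the same lock, a write to the sysfs file that arrives before setup completes is rejected instead of reaching uninitialized state.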

Test commands:
    #SHELL1
    modprobe netdevsim
    while :
    do
        echo "1 1" > /sys/bus/netdevsim/new_device
        echo "1 1" > /sys/bus/netdevsim/del_device
    done

    #SHELL2
    while :
    do
        echo 1 > /sys/devices/netdevsim1/new_port
        echo 1 > /sys/devices/netdevsim1/del_port
    done

Splat looks like:
[   47.508954][ T1008] general protection fault, probably for non-canonical address 0xdffffc0000000021: 0000 I
[   47.510793][ T1008] KASAN: null-ptr-deref in range [0x0000000000000108-0x000000000000010f]
[   47.511963][ T1008] CPU: 2 PID: 1008 Comm: bash Not tainted 5.5.0+ #322
[   47.512823][ T1008] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[   47.514041][ T1008] RIP: 0010:__mutex_lock+0x10a/0x14b0
[   47.514699][ T1008] Code: 08 84 d2 0f 85 7f 12 00 00 44 8b 0d 10 23 65 02 45 85 c9 75 29 49 8d 7f 68 48 b8 00 00 00 0f
[   47.517163][ T1008] RSP: 0018:ffff888059b4fbb0 EFLAGS: 00010206
[   47.517802][ T1008] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
[   47.518941][ T1008] RDX: 0000000000000021 RSI: ffffffff85926440 RDI: 0000000000000108
[   47.519732][ T1008] RBP: ffff888059b4fd30 R08: ffffffffc073fad0 R09: 0000000000000000
[   47.520729][ T1008] R10: ffff888059b4fd50 R11: ffff88804bb38040 R12: 0000000000000000
[   47.521702][ T1008] R13: dffffc0000000000 R14: ffffffff871976c0 R15: 00000000000000a0
[   47.522760][ T1008] FS:  00007fd4be05a740(0000) GS:ffff88806c800000(0000) knlGS:0000000000000000
[   47.523877][ T1008] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   47.524627][ T1008] CR2: 0000561c82b69cf0 CR3: 0000000065dd6004 CR4: 00000000000606e0
[   47.527662][ T1008] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   47.528604][ T1008] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   47.529531][ T1008] Call Trace:
[   47.529874][ T1008]  ? nsim_dev_port_add+0x50/0x150 [netdevsim]
[   47.530470][ T1008]  ? mutex_lock_io_nested+0x1380/0x1380
[   47.531018][ T1008]  ? _kstrtoull+0x76/0x160
[   47.531449][ T1008]  ? _parse_integer+0xf0/0xf0
[   47.531874][ T1008]  ? kernfs_fop_write+0x1cf/0x410
[   47.532330][ T1008]  ? sysfs_file_ops+0x160/0x160
[   47.532773][ T1008]  ? kstrtouint+0x86/0x110
[   47.533168][ T1008]  ? nsim_dev_port_add+0x50/0x150 [netdevsim]
[   47.533721][ T1008]  nsim_dev_port_add+0x50/0x150 [netdevsim]
[   47.534336][ T1008]  ? sysfs_file_ops+0x160/0x160
[   47.534858][ T1008]  new_port_store+0x99/0xb0 [netdevsim]
[   47.535439][ T1008]  ? del_port_store+0xb0/0xb0 [netdevsim]
[   47.536035][ T1008]  ? sysfs_file_ops+0x112/0x160
[   47.536544][ T1008]  ? sysfs_kf_write+0x3b/0x180
[   47.537029][ T1008]  kernfs_fop_write+0x276/0x410
[   47.537548][ T1008]  ? __sb_start_write+0x215/0x2e0
[   47.538110][ T1008]  vfs_write+0x197/0x4a0
[ ... ]

Fixes: f9d9db4 ("netdevsim: add bus attributes to add new and delete devices")
Fixes: 794b2c0 ("netdevsim: extend device attrs to support port addition and deletion")
Signed-off-by: Taehee Yoo <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
nathanchance pushed a commit that referenced this issue Feb 4, 2020
devlink reload destroys resources and allocates resources again.
So, when devices and ports resources are being used, devlink reload
function should not be executed. In order to avoid this race, a new
lock is added and new_port() and del_port() call devlink_reload_disable()
and devlink_reload_enable().

Thread0                      Thread1
{new/del}_port()             {new/del}_port()
devlink_reload_disable()
                             devlink_reload_disable()
devlink_reload_enable()
                             //here
                             devlink_reload_enable()

Before Thread1 calls devlink_reload_enable(), devlink is already allowed
to execute a reload because Thread0 has re-enabled it. The devlink reload
disable/enable state is a bool, so the above case can occur.
Therefore, the disable/enable pair must be executed atomically.
To do that, a new lock is used.
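
The interleaving in the table above can be replayed deterministically in a single thread to see why a plain bool is not enough. This is a minimal sketch: `struct reload_gate` and the helper names are hypothetical and are not the devlink API.

```c
#include <stdbool.h>

/* Hypothetical model of a bool-based reload gate. */
struct reload_gate {
    bool reload_enabled;
};

static void gate_disable(struct reload_gate *g) { g->reload_enabled = false; }
static void gate_enable(struct reload_gate *g)  { g->reload_enabled = true; }

/*
 * Replay the Thread0/Thread1 schedule from the table and report whether
 * reload is permitted at the point marked "//here", i.e. while Thread1 is
 * still between its disable and enable calls.
 */
bool reload_allowed_at_here(void)
{
    struct reload_gate g = { .reload_enabled = true };

    gate_disable(&g);   /* Thread0: devlink_reload_disable() */
    gate_disable(&g);   /* Thread1: devlink_reload_disable() */
    gate_enable(&g);    /* Thread0: devlink_reload_enable()  */

    return g.reload_enabled;    /* Thread1 has not re-enabled yet */
}
```

The function returns true, so a reload could run while Thread1 still expects it to be blocked. Serializing each disable/enable pair with a lock, as the patch describes, closes that window.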

Test commands:
    modprobe netdevsim
    echo 1 > /sys/bus/netdevsim/new_device
    while :
    do
        echo 1 > /sys/devices/netdevsim1/new_port &
        echo 1 > /sys/devices/netdevsim1/del_port &
        devlink dev reload netdevsim/netdevsim1 &
    done

Splat looks like:
[   23.342145][  T932] DEBUG_LOCKS_WARN_ON(mutex_is_locked(lock))
[   23.342159][  T932] WARNING: CPU: 0 PID: 932 at kernel/locking/mutex-debug.c:103 mutex_destroy+0xc7/0xf0
[   23.344182][  T932] Modules linked in: netdevsim openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_dx
[   23.346485][  T932] CPU: 0 PID: 932 Comm: devlink Not tainted 5.5.0+ #322
[   23.347696][  T932] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[   23.348893][  T932] RIP: 0010:mutex_destroy+0xc7/0xf0
[   23.349505][  T932] Code: e0 07 83 c0 03 38 d0 7c 04 84 d2 75 2e 8b 05 00 ac b0 02 85 c0 75 8b 48 c7 c6 00 5e 07 96 40
[   23.351887][  T932] RSP: 0018:ffff88806208f810 EFLAGS: 00010286
[   23.353963][  T932] RAX: dffffc0000000008 RBX: ffff888067f6f2c0 RCX: ffffffff942c4bd4
[   23.355222][  T932] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff96dac5b4
[   23.356169][  T932] RBP: ffff888067f6f000 R08: fffffbfff2d235a5 R09: fffffbfff2d235a5
[   23.357160][  T932] R10: 0000000000000001 R11: fffffbfff2d235a4 R12: ffff888067f6f208
[   23.358288][  T932] R13: ffff88806208fa70 R14: ffff888067f6f000 R15: ffff888069ce3800
[   23.359307][  T932] FS:  00007fe2a3876740(0000) GS:ffff88806c000000(0000) knlGS:0000000000000000
[   23.360473][  T932] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   23.361319][  T932] CR2: 00005561357aa000 CR3: 000000005227a006 CR4: 00000000000606f0
[   23.362323][  T932] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   23.363417][  T932] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   23.364414][  T932] Call Trace:
[   23.364828][  T932]  nsim_dev_reload_destroy+0x77/0xb0 [netdevsim]
[   23.365655][  T932]  nsim_dev_reload_down+0x84/0xb0 [netdevsim]
[   23.366433][  T932]  devlink_reload+0xb1/0x350
[   23.367010][  T932]  genl_rcv_msg+0x580/0xe90

[ ...]

[   23.531729][ T1305] kernel BUG at lib/list_debug.c:53!
[   23.532523][ T1305] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[   23.533467][ T1305] CPU: 2 PID: 1305 Comm: bash Tainted: G        W         5.5.0+ #322
[   23.534962][ T1305] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[   23.536503][ T1305] RIP: 0010:__list_del_entry_valid+0xe6/0x150
[   23.538346][ T1305] Code: 89 ea 48 c7 c7 00 73 1e 96 e8 df f7 4c ff 0f 0b 48 c7 c7 60 73 1e 96 e8 d1 f7 4c ff 0f 0b 44
[   23.541068][ T1305] RSP: 0018:ffff888047c27b58 EFLAGS: 00010282
[   23.542001][ T1305] RAX: 0000000000000054 RBX: ffff888067f6f318 RCX: 0000000000000000
[   23.543051][ T1305] RDX: 0000000000000054 RSI: 0000000000000008 RDI: ffffed1008f84f61
[   23.544072][ T1305] RBP: ffff88804aa0fca0 R08: ffffed100d940539 R09: ffffed100d940539
[   23.545085][ T1305] R10: 0000000000000001 R11: ffffed100d940538 R12: ffff888047c27cb0
[   23.546422][ T1305] R13: ffff88806208b840 R14: ffffffff981976c0 R15: ffff888067f6f2c0
[   23.547406][ T1305] FS:  00007f76c0431740(0000) GS:ffff88806c800000(0000) knlGS:0000000000000000
[   23.548527][ T1305] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   23.549389][ T1305] CR2: 00007f5048f1a2f8 CR3: 000000004b310006 CR4: 00000000000606e0
[   23.550636][ T1305] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   23.551578][ T1305] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   23.552597][ T1305] Call Trace:
[   23.553004][ T1305]  mutex_remove_waiter+0x101/0x520
[   23.553646][ T1305]  __mutex_lock+0xac7/0x14b0
[   23.554218][ T1305]  ? nsim_dev_port_del+0x4e/0x140 [netdevsim]
[   23.554908][ T1305]  ? mutex_lock_io_nested+0x1380/0x1380
[   23.555570][ T1305]  ? _parse_integer+0xf0/0xf0
[   23.556043][ T1305]  ? kstrtouint+0x86/0x110
[   23.556504][ T1305]  ? nsim_dev_port_del+0x4e/0x140 [netdevsim]
[   23.557133][ T1305]  nsim_dev_port_del+0x4e/0x140 [netdevsim]
[   23.558024][ T1305]  del_port_store+0xcc/0xf0 [netdevsim]
[ ... ]

Fixes: 75ba029 ("netdevsim: implement proper devlink reload")
Signed-off-by: Taehee Yoo <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
nathanchance pushed a commit that referenced this issue Feb 4, 2020
nsim_dev_take_snapshot_write() uses nsim_dev and nsim_dev->dummy_region.
So, while this function runs, this data must not be removed.
But nothing in this function protects it.

There are two similar cases.
1. reload case
reload could be called during nsim_dev_take_snapshot_write().
When reload is being executed, nsim_dev_reload_down() is called and it
calls nsim_dev_reload_destroy(). nsim_dev_reload_destroy() calls
devlink_region_destroy() to destroy nsim_dev->dummy_region.
So, during nsim_dev_take_snapshot_write(), nsim_dev->dummy_region
could be removed.
At that point, snapshot_write() would access a freed pointer.
In order to fix this case, take_snapshot file will be removed before
devlink_region_destroy().
The take_snapshot file will be re-created by ->reload_up().

2. del_device_store case
del_device_store() also could call nsim_dev_reload_destroy()
during nsim_dev_take_snapshot_write(). If so, panic would occur.
This is actually the same problem as in the first case.
So, this problem will be fixed by the first case's solution.

Test commands:
    modprobe netdevsim
    while :
    do
        echo 1 > /sys/bus/netdevsim/new_device &
        echo 1 > /sys/bus/netdevsim/del_device &
        devlink dev reload netdevsim/netdevsim1 &
        echo 1 > /sys/kernel/debug/netdevsim/netdevsim1/take_snapshot &
    done

Splat looks like:
[   45.564513][  T975] general protection fault, probably for non-canonical address 0xdffffc000000003a: 0000 [#1] SMP DEI
[   45.566131][  T975] KASAN: null-ptr-deref in range [0x00000000000001d0-0x00000000000001d7]
[   45.566135][  T975] CPU: 1 PID: 975 Comm: bash Not tainted 5.5.0+ #322
[   45.569020][  T975] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[   45.569026][  T975] RIP: 0010:__mutex_lock+0x10a/0x14b0
[   45.570518][  T975] Code: 08 84 d2 0f 85 7f 12 00 00 44 8b 0d 10 23 65 02 45 85 c9 75 29 49 8d 7f 68 48 b8 00 00 00 0f
[   45.570522][  T975] RSP: 0018:ffff888046ccfbf0 EFLAGS: 00010206
[   45.572305][  T975] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
[   45.572308][  T975] RDX: 000000000000003a RSI: ffffffffac926440 RDI: 00000000000001d0
[   45.576843][  T975] RBP: ffff888046ccfd70 R08: ffffffffab610645 R09: 0000000000000000
[   45.576847][  T975] R10: ffff888046ccfd90 R11: ffffed100d6360ad R12: 0000000000000000
[   45.578471][  T975] R13: dffffc0000000000 R14: ffffffffae1976c0 R15: 0000000000000168
[   45.578475][  T975] FS:  00007f614d6e7740(0000) GS:ffff88806c400000(0000) knlGS:0000000000000000
[   45.581492][  T975] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   45.582942][  T975] CR2: 00005618677d1cf0 CR3: 000000005fb9c002 CR4: 00000000000606e0
[   45.584543][  T975] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   45.586633][  T975] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   45.589889][  T975] Call Trace:
[   45.591445][  T975]  ? devlink_region_snapshot_create+0x55/0x4a0
[   45.601250][  T975]  ? mutex_lock_io_nested+0x1380/0x1380
[   45.602817][  T975]  ? mutex_lock_io_nested+0x1380/0x1380
[   45.603875][  T975]  ? mark_held_locks+0xa5/0xe0
[   45.604769][  T975]  ? _raw_spin_unlock_irqrestore+0x2d/0x50
[   45.606147][  T975]  ? __mutex_unlock_slowpath+0xd0/0x670
[   45.607723][  T975]  ? crng_backtrack_protect+0x80/0x80
[   45.613530][  T975]  ? wait_for_completion+0x390/0x390
[   45.615152][  T975]  ? devlink_region_snapshot_create+0x55/0x4a0
[   45.616834][  T975]  devlink_region_snapshot_create+0x55/0x4a0
[ ... ]

Fixes: 4418f86 ("netdevsim: implement support for devlink region and snapshots")
Signed-off-by: Taehee Yoo <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
nathanchance pushed a commit that referenced this issue Feb 4, 2020
When netdevsim dev is being created, a debugfs directory is created.
The variable "dev_ddir_name" is a 16-byte buffer for the device name, and
the device name is "netdevsim<dev id>".
The dev id can be up to 10 digits long.
So, 16 bytes is not enough for the device name.
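
The arithmetic can be checked with a short C sketch: "netdevsim" is 9 characters, so a 10-digit id plus the terminating NUL needs 20 bytes. The helper name `dev_name_bytes_needed` is hypothetical, and printing the id with `%u` is an assumption, since the driver's actual format string is not shown in this message.

```c
#include <stdio.h>

/* Bytes needed to hold "netdevsim<id>" including the terminating NUL. */
int dev_name_bytes_needed(unsigned int id)
{
    /* snprintf with a NULL buffer and size 0 only computes the length. */
    return snprintf(NULL, 0, "netdevsim%u", id) + 1;
}
```

For the id used in the test command below, 1000000000, this comes to 20 bytes, which overruns a 16-byte buffer by 4.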

Test commands:
    modprobe netdevsim
    echo "1000000000 0" > /sys/bus/netdevsim/new_device

Splat looks like:
[  249.622710][  T900] BUG: KASAN: stack-out-of-bounds in number+0x824/0x880
[  249.623658][  T900] Write of size 1 at addr ffff88804c527988 by task bash/900
[  249.624521][  T900]
[  249.624830][  T900] CPU: 1 PID: 900 Comm: bash Not tainted 5.5.0+ #322
[  249.625691][  T900] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[  249.626712][  T900] Call Trace:
[  249.627103][  T900]  dump_stack+0x96/0xdb
[  249.627639][  T900]  ? number+0x824/0x880
[  249.628173][  T900]  print_address_description.constprop.5+0x1be/0x360
[  249.629022][  T900]  ? number+0x824/0x880
[  249.629569][  T900]  ? number+0x824/0x880
[  249.630105][  T900]  __kasan_report+0x12a/0x170
[  249.630717][  T900]  ? number+0x824/0x880
[  249.631201][  T900]  kasan_report+0xe/0x20
[  249.631723][  T900]  number+0x824/0x880
[  249.632235][  T900]  ? put_dec+0xa0/0xa0
[  249.632716][  T900]  ? rcu_read_lock_sched_held+0x90/0xc0
[  249.633392][  T900]  vsnprintf+0x63c/0x10b0
[  249.633983][  T900]  ? pointer+0x5b0/0x5b0
[  249.634543][  T900]  ? mark_lock+0x11d/0xc40
[  249.635200][  T900]  sprintf+0x9b/0xd0
[  249.635750][  T900]  ? scnprintf+0xe0/0xe0
[  249.636370][  T900]  nsim_dev_probe+0x63c/0xbf0 [netdevsim]
[ ... ]

Reviewed-by: Jakub Kicinski <[email protected]>
Fixes: ab1d0cc ("netdevsim: change debugfs tree topology")
Signed-off-by: Taehee Yoo <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
nathanchance pushed a commit that referenced this issue Feb 21, 2020
After bond_release(), netdev_update_lockdep_key() should be called.
But neither the ioctl path nor the attribute path calls
netdev_update_lockdep_key().
This patch adds the missing netdev_update_lockdep_key() calls.

Test commands:
    ip link add bond0 type bond
    ip link add bond1 type bond
    ifenslave bond0 bond1
    ifenslave -d bond0 bond1
    ifenslave bond1 bond0

Splat looks like:
[   29.501182][ T1046] WARNING: possible circular locking dependency detected
[   29.501945][ T1039] hardirqs last disabled at (1962): [<ffffffffac6c807f>] handle_mm_fault+0x13f/0x700
[   29.503442][ T1046] 5.5.0+ #322 Not tainted
[   29.503447][ T1046] ------------------------------------------------------
[   29.504277][ T1039] softirqs last  enabled at (1180): [<ffffffffade00678>] __do_softirq+0x678/0x981
[   29.505443][ T1046] ifenslave/1046 is trying to acquire lock:
[   29.505886][ T1039] softirqs last disabled at (1169): [<ffffffffac19c18a>] irq_exit+0x17a/0x1a0
[   29.509997][ T1046] ffff88805d5da280 (&dev->addr_list_lock_key#3){+...}, at: dev_mc_sync_multiple+0x95/0x120
[   29.511243][ T1046]
[   29.511243][ T1046] but task is already holding lock:
[   29.512192][ T1046] ffff8880460f2280 (&dev->addr_list_lock_key#4){+...}, at: bond_enslave+0x4482/0x47b0 [bonding]
[   29.514124][ T1046]
[   29.514124][ T1046] which lock already depends on the new lock.
[   29.514124][ T1046]
[   29.517297][ T1046]
[   29.517297][ T1046] the existing dependency chain (in reverse order) is:
[   29.518231][ T1046]
[   29.518231][ T1046] -> #1 (&dev->addr_list_lock_key#4){+...}:
[   29.519076][ T1046]        _raw_spin_lock+0x30/0x70
[   29.519588][ T1046]        dev_mc_sync_multiple+0x95/0x120
[   29.520208][ T1046]        bond_enslave+0x448d/0x47b0 [bonding]
[   29.520862][ T1046]        bond_option_slaves_set+0x1a3/0x370 [bonding]
[   29.521640][ T1046]        __bond_opt_set+0x1ff/0xbb0 [bonding]
[   29.522438][ T1046]        __bond_opt_set_notify+0x2b/0xf0 [bonding]
[   29.523251][ T1046]        bond_opt_tryset_rtnl+0x92/0xf0 [bonding]
[   29.524082][ T1046]        bonding_sysfs_store_option+0x8a/0xf0 [bonding]
[   29.524959][ T1046]        kernfs_fop_write+0x276/0x410
[   29.525620][ T1046]        vfs_write+0x197/0x4a0
[   29.526218][ T1046]        ksys_write+0x141/0x1d0
[   29.526818][ T1046]        do_syscall_64+0x99/0x4f0
[   29.527430][ T1046]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   29.528265][ T1046]
[   29.528265][ T1046] -> #0 (&dev->addr_list_lock_key#3){+...}:
[   29.529272][ T1046]        __lock_acquire+0x2d8d/0x3de0
[   29.529935][ T1046]        lock_acquire+0x164/0x3b0
[   29.530638][ T1046]        _raw_spin_lock+0x30/0x70
[   29.531187][ T1046]        dev_mc_sync_multiple+0x95/0x120
[   29.531790][ T1046]        bond_enslave+0x448d/0x47b0 [bonding]
[   29.532451][ T1046]        bond_option_slaves_set+0x1a3/0x370 [bonding]
[   29.533163][ T1046]        __bond_opt_set+0x1ff/0xbb0 [bonding]
[   29.533789][ T1046]        __bond_opt_set_notify+0x2b/0xf0 [bonding]
[   29.534595][ T1046]        bond_opt_tryset_rtnl+0x92/0xf0 [bonding]
[   29.535500][ T1046]        bonding_sysfs_store_option+0x8a/0xf0 [bonding]
[   29.536379][ T1046]        kernfs_fop_write+0x276/0x410
[   29.537057][ T1046]        vfs_write+0x197/0x4a0
[   29.537640][ T1046]        ksys_write+0x141/0x1d0
[   29.538251][ T1046]        do_syscall_64+0x99/0x4f0
[   29.538870][ T1046]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   29.539659][ T1046]
[   29.539659][ T1046] other info that might help us debug this:
[   29.539659][ T1046]
[   29.540953][ T1046]  Possible unsafe locking scenario:
[   29.540953][ T1046]
[   29.541883][ T1046]        CPU0                    CPU1
[   29.542540][ T1046]        ----                    ----
[   29.543209][ T1046]   lock(&dev->addr_list_lock_key#4);
[   29.543880][ T1046]                                lock(&dev->addr_list_lock_key#3);
[   29.544873][ T1046]                                lock(&dev->addr_list_lock_key#4);
[   29.545863][ T1046]   lock(&dev->addr_list_lock_key#3);
[   29.546525][ T1046]
[   29.546525][ T1046]  *** DEADLOCK ***
[   29.546525][ T1046]
[   29.547542][ T1046] 5 locks held by ifenslave/1046:
[   29.548196][ T1046]  #0: ffff88806044c478 (sb_writers#5){.+.+}, at: vfs_write+0x3bb/0x4a0
[   29.549248][ T1046]  #1: ffff88805af00890 (&of->mutex){+.+.}, at: kernfs_fop_write+0x1cf/0x410
[   29.550343][ T1046]  #2: ffff88805b8b54b0 (kn->count#157){.+.+}, at: kernfs_fop_write+0x1f2/0x410
[   29.551575][ T1046]  #3: ffffffffaecf4cf0 (rtnl_mutex){+.+.}, at: bond_opt_tryset_rtnl+0x5f/0xf0 [bonding]
[   29.552819][ T1046]  #4: ffff8880460f2280 (&dev->addr_list_lock_key#4){+...}, at: bond_enslave+0x4482/0x47b0 [bonding]
[   29.554175][ T1046]
[   29.554175][ T1046] stack backtrace:
[   29.554907][ T1046] CPU: 0 PID: 1046 Comm: ifenslave Not tainted 5.5.0+ #322
[   29.555854][ T1046] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[   29.557064][ T1046] Call Trace:
[   29.557504][ T1046]  dump_stack+0x96/0xdb
[   29.558054][ T1046]  check_noncircular+0x371/0x450
[   29.558723][ T1046]  ? print_circular_bug.isra.35+0x310/0x310
[   29.559486][ T1046]  ? hlock_class+0x130/0x130
[   29.560100][ T1046]  ? __lock_acquire+0x2d8d/0x3de0
[   29.560761][ T1046]  __lock_acquire+0x2d8d/0x3de0
[   29.561366][ T1046]  ? register_lock_class+0x14d0/0x14d0
[   29.562045][ T1046]  ? find_held_lock+0x39/0x1d0
[   29.562641][ T1046]  lock_acquire+0x164/0x3b0
[   29.563199][ T1046]  ? dev_mc_sync_multiple+0x95/0x120
[   29.563872][ T1046]  _raw_spin_lock+0x30/0x70
[   29.564464][ T1046]  ? dev_mc_sync_multiple+0x95/0x120
[   29.565146][ T1046]  dev_mc_sync_multiple+0x95/0x120
[   29.565793][ T1046]  bond_enslave+0x448d/0x47b0 [bonding]
[   29.566487][ T1046]  ? bond_update_slave_arr+0x940/0x940 [bonding]
[   29.567279][ T1046]  ? bstr_printf+0xc20/0xc20
[   29.567857][ T1046]  ? stack_trace_consume_entry+0x160/0x160
[   29.568614][ T1046]  ? deactivate_slab.isra.77+0x2c5/0x800
[   29.569320][ T1046]  ? check_chain_key+0x236/0x5d0
[   29.569939][ T1046]  ? sscanf+0x93/0xc0
[   29.570442][ T1046]  ? vsscanf+0x1e20/0x1e20
[   29.571003][ T1046]  bond_option_slaves_set+0x1a3/0x370 [bonding]
[ ... ]

Fixes: ab92d68 ("net: core: add generic lockdep keys")
Signed-off-by: Taehee Yoo <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
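
For readers unfamiliar with this kind of report, here is a rough userspace sketch of why the test commands above trip the warning. It is a toy lock-order graph, not the kernel's lockdep implementation. The class `LockOrderGraph`, its methods, and the use of the device names `bond0` and `bond1` as lock-class labels are all made up for illustration. `reset_class()` assumes that refreshing a device's lockdep key makes lockdep forget the ordering recorded for the old key, which is how the commit message reads to me.

```python
from collections import defaultdict


class LockOrderGraph:
    """Toy model of a lock-class dependency graph (hypothetical, not lockdep)."""

    def __init__(self):
        # edges[a] holds every class that was acquired while a was held.
        self.edges = defaultdict(set)

    def _reachable(self, src, dst):
        """Return True if dst can be reached from src along recorded edges."""
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges[node])
        return False

    def acquire(self, held, new):
        """Record 'new taken while held'; return False if that closes a cycle."""
        if self._reachable(new, held):
            return False  # 'held' already depends on 'new': circular order
        self.edges[held].add(new)
        return True

    def reset_class(self, cls):
        """Assumed effect of a key refresh: drop every edge touching cls."""
        self.edges.pop(cls, None)
        for targets in self.edges.values():
            targets.discard(cls)


if __name__ == "__main__":
    graph = LockOrderGraph()
    # ifenslave bond0 bond1: bond0's lock is held while bond1's is taken.
    print(graph.acquire("bond0", "bond1"))
    # ifenslave bond1 bond0 without a key refresh: the order is reversed.
    print(graph.acquire("bond1", "bond0"))
    # With a refresh after the release, the stale ordering is gone.
    graph.reset_class("bond1")
    print(graph.acquire("bond1", "bond0"))
```

The second `acquire()` is rejected because the reverse ordering is still on record, which mirrors the "possible circular locking dependency" message in the splat. After `reset_class()` the same call is accepted.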