Blocking accept() does not return #40

Closed
mjmartineau opened this issue Jun 25, 2020 · 2 comments

Comments

@mjmartineau
Member

When running a minimal test program with a typical socket/bind/listen/accept sequence on the server side (blocking calls), the accept() does not return after the handshake completes. I first found this with b239a7b (export/20200624T164427).

After adding some debug output, I see that the call to ssock->ops->accept() in mptcp_accept() is not returning. It does resume after SIGINT when the test program is terminated.

The self tests currently run only nonblocking tests. @pabeni mentioned that the fallback refactor changed some connect-time signaling.

I will bisect this and post an update.
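
A minimal sketch of the kind of blocking server-side check described above. It is not the original test program (which is not shown in this issue); it is written in Python, assumes Linux (where IPPROTO_MPTCP is 262) and falls back to plain TCP when an MPTCP socket cannot be created. The helper names make_stream_socket and blocking_accept_returns are hypothetical.

```python
import socket
import threading

# Assumption: Linux value of IPPROTO_MPTCP; socket.IPPROTO_MPTCP exists from Python 3.10.
IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)


def make_stream_socket():
    """Return a stream socket, preferring MPTCP and falling back to TCP."""
    try:
        return socket.socket(socket.AF_INET, socket.SOCK_STREAM, IPPROTO_MPTCP)
    except OSError:
        # MPTCP not built in or disabled: use plain TCP so the sketch still runs.
        return socket.socket(socket.AF_INET, socket.SOCK_STREAM)


def blocking_accept_returns(timeout=5.0):
    """Run socket/bind/listen/accept with a blocking accept() in a worker thread.

    Returns True if accept() returned within `timeout` seconds of the client
    connecting, False if it was still blocked.
    """
    server = make_stream_socket()
    server.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    accepted = threading.Event()

    def serve():
        conn, _addr = server.accept()  # the blocking call under test
        accepted.set()
        conn.close()

    threading.Thread(target=serve, daemon=True).start()

    client = make_stream_socket()
    client.settimeout(timeout)
    try:
        client.connect(("127.0.0.1", port))
        return accepted.wait(timeout)
    finally:
        client.close()
        server.close()


if __name__ == "__main__":
    print("accept() returned:", blocking_accept_returns())
```

On a kernel with the behaviour reported here, the expectation is that the function returns False because the worker thread never leaves accept(); the timeout keeps the check from hanging.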

@mjmartineau
Member Author

mjmartineau commented Jun 25, 2020

Looks like this was introduced in export/20200614T012255 (33f2d74) export/20200608T112436 (a235216). This is the first tag with the fallback commit abbab27

@mjmartineau
Member Author

Fixed in export/20200626T181651

jenkins-tessares pushed a commit that referenced this issue Aug 19, 2020
…card()

If a loop device is created with a backing NVMe SSD, the current loop device
driver does not correctly set its queue's limits.discard_granularity and
leaves it as 0. If a discard request is issued at LBA 0 on this loop device,
the req_sects calculated in __blkdev_issue_discard() will be 0, and the
zero-length discard request will trigger a BUG() panic in generic block layer
code at block/blk-mq.c:563.

[  955.565006][   C39] ------------[ cut here ]------------
[  955.559660][   C39] invalid opcode: 0000 [#1] SMP NOPTI
[  955.622171][   C39] CPU: 39 PID: 248 Comm: ksoftirqd/39 Tainted: G            E     5.8.0-default+ #40
[  955.622171][   C39] Hardware name: Lenovo ThinkSystem SR650 -[7X05CTO1WW]-/-[7X05CTO1WW]-, BIOS -[IVE160M-2.70]- 07/17/2020
[  955.622175][   C39] RIP: 0010:blk_mq_end_request+0x107/0x110
[  955.622177][   C39] Code: 48 8b 03 e9 59 ff ff ff 48 89 df 5b 5d 41 5c e9 9f ed ff ff 48 8b 35 98 3c f4 00 48 83 c7 10 48 83 c6 19 e8 cb 56 c9 ff eb cb <0f> 0b 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 54
[  955.622179][   C39] RSP: 0018:ffffb1288701fe28 EFLAGS: 00010202
[  955.749277][   C39] RAX: 0000000000000001 RBX: ffff956fffba5080 RCX: 0000000000004003
[  955.749278][   C39] RDX: 0000000000000003 RSI: 0000000000000000 RDI: 0000000000000000
[  955.749279][   C39] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  955.749279][   C39] R10: ffffb1288701fd28 R11: 0000000000000001 R12: ffffffffa8e05160
[  955.749280][   C39] R13: 0000000000000004 R14: 0000000000000004 R15: ffffffffa7ad3a1e
[  955.749281][   C39] FS:  0000000000000000(0000) GS:ffff95bfbda00000(0000) knlGS:0000000000000000
[  955.749282][   C39] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  955.749282][   C39] CR2: 00007f6f0ef766a8 CR3: 0000005a37012002 CR4: 00000000007606e0
[  955.749283][   C39] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  955.749284][   C39] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  955.749284][   C39] PKRU: 55555554
[  955.749285][   C39] Call Trace:
[  955.749290][   C39]  blk_done_softirq+0x99/0xc0
[  957.550669][   C39]  __do_softirq+0xd3/0x45f
[  957.550677][   C39]  ? smpboot_thread_fn+0x2f/0x1e0
[  957.550679][   C39]  ? smpboot_thread_fn+0x74/0x1e0
[  957.550680][   C39]  ? smpboot_thread_fn+0x14e/0x1e0
[  957.550684][   C39]  run_ksoftirqd+0x30/0x60
[  957.550687][   C39]  smpboot_thread_fn+0x149/0x1e0
[  957.886225][   C39]  ? sort_range+0x20/0x20
[  957.886226][   C39]  kthread+0x137/0x160
[  957.886228][   C39]  ? kthread_park+0x90/0x90
[  957.886231][   C39]  ret_from_fork+0x22/0x30
[  959.117120][   C39] ---[ end trace 3dacdac97e2ed164 ]---

The panic can be reproduced with the following procedure:
  # modprobe scsi_debug delay=0 dev_size_mb=2048 max_queue=1
  # losetup -f /dev/nvme0n1 --direct-io=on
  # blkdiscard /dev/loop0 -o 0 -l 0x200

This patch fixes the issue by checking q->limits.discard_granularity in
__blkdev_issue_discard() before composing the discard bio. If the value
is 0, it prints a warning with oops information and returns -EOPNOTSUPP
to the caller to indicate that this buggy device driver doesn't support
discard requests.
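
The guard described above can be modelled in a few lines. This is only a sketch of the decision, not the kernel code: the function name discard_precheck and the warning text are hypothetical, and the real check lives in C inside __blkdev_issue_discard().

```python
import errno
import sys


def discard_precheck(discard_granularity):
    """Model of the guard: refuse discards when the queue reports granularity 0.

    Returns 0 when it is safe to go on composing discard bios, or
    -EOPNOTSUPP when discard_granularity is 0 (a buggy driver).
    """
    if discard_granularity == 0:
        # Stand-in for the kernel warning; the message text is illustrative.
        print("warning: discard_granularity is 0, rejecting discard",
              file=sys.stderr)
        return -errno.EOPNOTSUPP
    return 0
```

Returning an error before any bio is composed means a zero-length request can never reach the block layer, whatever the driver reported.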

Fixes: 9b15d10 ("block: improve discard bio alignment in __blkdev_issue_discard()")
Fixes: c52abf5 ("loop: Better discard support for block devices")
Reported-and-suggested-by: Ming Lei <[email protected]>
Signed-off-by: Coly Li <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Jack Wang <[email protected]>
Cc: Bart Van Assche <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Darrick J. Wong <[email protected]>
Cc: Enzo Matsumiya <[email protected]>
Cc: Evan Green <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Martin K. Petersen <[email protected]>
Cc: Xiao Ni <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
jenkins-tessares pushed a commit that referenced this issue Aug 27, 2021
…king KUEP

Commit b5efec0 ("powerpc/32s: Move KUEP locking/unlocking in C")
removed the 'isync' instruction after adding/removing NX bit in user
segments. The reasoning behind this change was that when setting the
NX bit we don't mind it taking effect with delay as the kernel never
executes text from userspace, and when clearing the NX bit this is
to return to userspace and then the 'rfi' should synchronise the
context.

However, it looks like on book3s/32 having a hash page table, at least
on the G3 processor, we get an unexpected fault from userspace, then
this is followed by something wrong in the verification of MSR_PR
at end of another interrupt.

This is fixed by adding back the removed isync() following update
of NX bit in user segment registers. Only do it for cores with a
hash table, as 603 cores don't exhibit that problem and the two isync
increase the ./null_syscall selftest by 6 cycles on an MPC 832x.

First problem: unexpected WARN_ON() for mysterious PROTFAULT

  WARNING: CPU: 0 PID: 1660 at arch/powerpc/mm/fault.c:354 do_page_fault+0x6c/0x5b0
  Modules linked in:
  CPU: 0 PID: 1660 Comm: Xorg Not tainted 5.13.0-pmac-00028-gb3c15b60339a #40
  NIP:  c001b5c8 LR: c001b6f8 CTR: 00000000
  REGS: e2d09e40 TRAP: 0700   Not tainted  (5.13.0-pmac-00028-gb3c15b60339a)
  MSR:  00021032 <ME,IR,DR,RI>  CR: 42d04f30  XER: 20000000
  GPR00: c000424c e2d09f00 c301b680 e2d09f40 0000001e 42000000 00cba028 00000000
  GPR08: 08000000 48000010 c301b680 e2d09f30 22d09f30 00c1fff0 00cba000 a7b7ba4c
  GPR16: 00000031 00000000 00000000 00000000 00000000 00000000 a7b7b0d0 00c5c010
  GPR24: a7b7b64c a7b7d2f0 00000004 00000000 c1efa6c0 00cba02c 00000300 e2d09f40
  NIP [c001b5c8] do_page_fault+0x6c/0x5b0
  LR [c001b6f8] do_page_fault+0x19c/0x5b0
  Call Trace:
  [e2d09f00] [e2d09f04] 0xe2d09f04 (unreliable)
  [e2d09f30] [c000424c] DataAccess_virt+0xd4/0xe4
  --- interrupt: 300 at 0xa7a261dc
  NIP:  a7a261dc LR: a7a253bc CTR: 00000000
  REGS: e2d09f40 TRAP: 0300   Not tainted  (5.13.0-pmac-00028-gb3c15b60339a)
  MSR:  0000d032 <EE,PR,ME,IR,DR,RI>  CR: 228428e2  XER: 20000000
  DAR: 00cba02c DSISR: 42000000
  GPR00: a7a27448 afa6b0e0 a74c35c0 a7b7b614 0000001e a7b7b614 00cba028 00000000
  GPR08: 00020fd9 00000031 00cb9ff8 a7a273b0 220028e2 00c1fff0 00cba000 a7b7ba4c
  GPR16: 00000031 00000000 00000000 00000000 00000000 00000000 a7b7b0d0 00c5c010
  GPR24: a7b7b64c a7b7d2f0 00000004 00000002 0000001e a7b7b614 a7b7aff4 00000030
  NIP [a7a261dc] 0xa7a261dc
  LR [a7a253bc] 0xa7a253bc
  --- interrupt: 300
  Instruction dump:
  7c4a1378 810300a0 75278410 83820298 83a300a4 553b018c 551e0036 4082038c
  2e1b0000 40920228 75280800 41820220 <0fe00000> 3b600000 41920214 81420594

Second problem: MSR PR is seen unset although the interrupt frame shows it set

  kernel BUG at arch/powerpc/kernel/interrupt.c:458!
  Oops: Exception in kernel mode, sig: 5 [#1]
  BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2 PowerMac
  Modules linked in:
  CPU: 0 PID: 1660 Comm: Xorg Tainted: G        W         5.13.0-pmac-00028-gb3c15b60339a #40
  NIP:  c0011434 LR: c001629c CTR: 00000000
  REGS: e2d09e70 TRAP: 0700   Tainted: G        W          (5.13.0-pmac-00028-gb3c15b60339a)
  MSR:  00029032 <EE,ME,IR,DR,RI>  CR: 42d09f30  XER: 00000000
  GPR00: 00000000 e2d09f30 c301b680 e2d09f40 83440000 c44d0e68 e2d09e8c 00000000
  GPR08: 00000002 00dc228a 00004000 e2d09f30 22d09f30 00c1fff0 afa6ceb4 00c26144
  GPR16: 00c25fb8 00c26140 afa6ceb8 90000000 00c944d8 0000001c 00000000 00200000
  GPR24: 00000000 000001fb afa6d1b4 00000001 00000000 a539a2a0 a530fd80 00000089
  NIP [c0011434] interrupt_exit_kernel_prepare+0x10/0x70
  LR [c001629c] interrupt_return+0x9c/0x144
  Call Trace:
  [e2d09f30] [c000424c] DataAccess_virt+0xd4/0xe4 (unreliable)
  --- interrupt: 300 at 0xa09be008
  NIP:  a09be008 LR: a09bdfe8 CTR: a09bdfc0
  REGS: e2d09f40 TRAP: 0300   Tainted: G        W          (5.13.0-pmac-00028-gb3c15b60339a)
  MSR:  0000d032 <EE,PR,ME,IR,DR,RI>  CR: 420028e2  XER: 20000000
  DAR: a539a308 DSISR: 0a000000
  GPR00: a7b90d50 afa6b2d0 a74c35c0 a0a8b690 a0a8b698 a5365d70 a4fa82a8 00000004
  GPR08: 00000000 a09bdfc0 00000000 a5360000 a09bde7c 00c1fff0 afa6ceb4 00c26144
  GPR16: 00c25fb8 00c26140 afa6ceb8 90000000 00c944d8 0000001c 00000000 00200000
  GPR24: 00000000 000001fb afa6d1b4 00000001 00000000 a539a2a0 a530fd80 00000089
  NIP [a09be008] 0xa09be008
  LR [a09bdfe8] 0xa09bdfe8
  --- interrupt: 300
  Instruction dump:
  80010024 83e1001c 7c0803a6 4bffff80 3bc00800 4bffffd0 486b42fd 4bffffcc
  81430084 71480002 41820038 554a0462 <0f0a0000> 80620060 74630001 40820034

Fixes: b5efec0 ("powerpc/32s: Move KUEP locking/unlocking in C")
Cc: [email protected] # v5.13+
Reported-by: Stan Johnson <[email protected]>
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/4856f5574906e2aec0522be17bf3848a22b2cd0b.1629269345.git.christophe.leroy@csgroup.eu
jenkins-tessares pushed a commit that referenced this issue Apr 5, 2022
…frame()

The following KASAN warning is detected by QEMU.

==================================================================
BUG: KASAN: stack-out-of-bounds in unwind_frame+0x508/0x870
Read of size 4 at addr c36bba90 by task cat/163

CPU: 1 PID: 163 Comm: cat Not tainted 5.10.0-rc1 #40
Hardware name: ARM-Versatile Express
[<c0113fac>] (unwind_backtrace) from [<c010e71c>] (show_stack+0x10/0x14)
[<c010e71c>] (show_stack) from [<c0b805b4>] (dump_stack+0x98/0xb0)
[<c0b805b4>] (dump_stack) from [<c0b7d658>] (print_address_description.constprop.0+0x58/0x4bc)
[<c0b7d658>] (print_address_description.constprop.0) from [<c031435c>] (kasan_report+0x154/0x170)
[<c031435c>] (kasan_report) from [<c0113c44>] (unwind_frame+0x508/0x870)
[<c0113c44>] (unwind_frame) from [<c010e298>] (__save_stack_trace+0x110/0x134)
[<c010e298>] (__save_stack_trace) from [<c01ce0d8>] (stack_trace_save+0x8c/0xb4)
[<c01ce0d8>] (stack_trace_save) from [<c0313520>] (kasan_set_track+0x38/0x60)
[<c0313520>] (kasan_set_track) from [<c0314cb8>] (kasan_set_free_info+0x20/0x2c)
[<c0314cb8>] (kasan_set_free_info) from [<c0313474>] (__kasan_slab_free+0xec/0x120)
[<c0313474>] (__kasan_slab_free) from [<c0311e20>] (kmem_cache_free+0x7c/0x334)
[<c0311e20>] (kmem_cache_free) from [<c01c35dc>] (rcu_core+0x390/0xccc)
[<c01c35dc>] (rcu_core) from [<c01013a8>] (__do_softirq+0x180/0x518)
[<c01013a8>] (__do_softirq) from [<c0135214>] (irq_exit+0x9c/0xe0)
[<c0135214>] (irq_exit) from [<c01a40e4>] (__handle_domain_irq+0xb0/0x110)
[<c01a40e4>] (__handle_domain_irq) from [<c0691248>] (gic_handle_irq+0xa0/0xb8)
[<c0691248>] (gic_handle_irq) from [<c0100b0c>] (__irq_svc+0x6c/0x94)
Exception stack(0xc36bb928 to 0xc36bb970)
b920:                   c36bb9c0 00000000 c0126919 c0101228 c36bb9c0 b76d7730
b940: c36b8000 c36bb9a0 c3335b00 c01ce0d8 00000003 c36bba3c c36bb940 c36bb978
b960: c010e298 c011373c 60000013 ffffffff
[<c0100b0c>] (__irq_svc) from [<c011373c>] (unwind_frame+0x0/0x870)
[<c011373c>] (unwind_frame) from [<00000000>] (0x0)

The buggy address belongs to the page:
page:(ptrval) refcount:0 mapcount:0 mapping:00000000 index:0x0 pfn:0x636bb
flags: 0x0()
raw: 00000000 00000000 ef867764 00000000 00000000 00000000 ffffffff 00000000
page dumped because: kasan: bad access detected

addr c36bba90 is located in stack of task cat/163 at offset 48 in frame:
 stack_trace_save+0x0/0xb4

this frame has 1 object:
 [32, 48) 'trace'

Memory state around the buggy address:
 c36bb980: f1 f1 f1 f1 00 04 f2 f2 00 00 f3 f3 00 00 00 00
 c36bba00: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
>c36bba80: 00 00 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
                 ^
 c36bbb00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 c36bbb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================

The same issue exists on x86 and was resolved by commit f7d27c3
("x86/mm, kasan: Silence KASAN warnings in get_wchan()").
The same solution can be applied to the arm architecture too.

Signed-off-by: Lin Yujun <[email protected]>
Reported-by: He Ying <[email protected]>
Signed-off-by: Russell King (Oracle) <[email protected]>
jenkins-tessares pushed a commit that referenced this issue Apr 9, 2022
The BPF STX/LDX instruction uses offset relative to the FP to address
stack space. Since the BPF_FP locates at the top of the frame, the offset
is usually a negative number. However, arm64 str/ldr immediate instruction
requires that offset be a positive number.  Therefore, this patch tries to
convert the offsets.

The method is to first find the negative offset furthest from the FP,
add it to the FP to calculate a bottom position, called FPB, and then
adjust the offsets in the other STX/LDX instructions to be relative to FPB.

FPB is saved using the callee-saved register x27 of arm64 which is not
used yet.

Before adjusting the offset, the patch checks every instruction to ensure
that the FP does not change in run-time. If the FP may change, no offset
is adjusted.
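
The offset conversion described above amounts to simple arithmetic, sketched here in Python as a model rather than as the JIT's actual C code. The function name rebase_stack_offsets and its fp_is_constant flag are hypothetical; the input offsets are the ones visible in the listings that follow (-136 to -8 in steps of 8).

```python
def rebase_stack_offsets(offsets, fp_is_constant=True):
    """Rebase FP-relative stack offsets onto a lower base address (FPB).

    offsets: FP-relative byte offsets used by stack loads/stores.
    Returns (fpb_distance, rebased) where FPB = FP - fpb_distance and each
    rebased offset is relative to FPB. Nothing is adjusted if the FP may
    change at run time or if there is no negative offset.
    """
    negative = [off for off in offsets if off < 0]
    if not fp_is_constant or not negative:
        return 0, list(offsets)
    # The negative offset furthest from the FP defines the bottom position.
    fpb_distance = -min(negative)
    return fpb_distance, [off + fpb_distance for off in offsets]


if __name__ == "__main__":
    fp_offsets = list(range(-136, 0, 8))  # -136, -128, ..., -8
    distance, rebased = rebase_stack_offsets(fp_offsets)
    print(hex(distance), rebased)
```

With these inputs the distance is 0x88 and the rebased offsets run from 0 to 128, which lines up with the "sub x27, x25, #0x88" and "str x0, [x27, #N]" instructions in the patched listing.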

For example, for the following bpftrace command:

  bpftrace -e 'kprobe:do_sys_open { printf("opening: %s\n", str(arg1)); }'

Without this patch, JITed code (fragment):

   0:   bti     c
   4:   stp     x29, x30, [sp, #-16]!
   8:   mov     x29, sp
   c:   stp     x19, x20, [sp, #-16]!
  10:   stp     x21, x22, [sp, #-16]!
  14:   stp     x25, x26, [sp, #-16]!
  18:   mov     x25, sp
  1c:   mov     x26, #0x0                       // #0
  20:   bti     j
  24:   sub     sp, sp, #0x90
  28:   add     x19, x0, #0x0
  2c:   mov     x0, #0x0                        // #0
  30:   mov     x10, #0xffffffffffffff78        // #-136
  34:   str     x0, [x25, x10]
  38:   mov     x10, #0xffffffffffffff80        // #-128
  3c:   str     x0, [x25, x10]
  40:   mov     x10, #0xffffffffffffff88        // #-120
  44:   str     x0, [x25, x10]
  48:   mov     x10, #0xffffffffffffff90        // #-112
  4c:   str     x0, [x25, x10]
  50:   mov     x10, #0xffffffffffffff98        // #-104
  54:   str     x0, [x25, x10]
  58:   mov     x10, #0xffffffffffffffa0        // #-96
  5c:   str     x0, [x25, x10]
  60:   mov     x10, #0xffffffffffffffa8        // #-88
  64:   str     x0, [x25, x10]
  68:   mov     x10, #0xffffffffffffffb0        // #-80
  6c:   str     x0, [x25, x10]
  70:   mov     x10, #0xffffffffffffffb8        // #-72
  74:   str     x0, [x25, x10]
  78:   mov     x10, #0xffffffffffffffc0        // #-64
  7c:   str     x0, [x25, x10]
  80:   mov     x10, #0xffffffffffffffc8        // #-56
  84:   str     x0, [x25, x10]
  88:   mov     x10, #0xffffffffffffffd0        // #-48
  8c:   str     x0, [x25, x10]
  90:   mov     x10, #0xffffffffffffffd8        // #-40
  94:   str     x0, [x25, x10]
  98:   mov     x10, #0xffffffffffffffe0        // #-32
  9c:   str     x0, [x25, x10]
  a0:   mov     x10, #0xffffffffffffffe8        // #-24
  a4:   str     x0, [x25, x10]
  a8:   mov     x10, #0xfffffffffffffff0        // #-16
  ac:   str     x0, [x25, x10]
  b0:   mov     x10, #0xfffffffffffffff8        // #-8
  b4:   str     x0, [x25, x10]
  b8:   mov     x10, #0x8                       // #8
  bc:   ldr     x2, [x19, x10]
  [...]

With this patch, JITed code (fragment):

   0:   bti     c
   4:   stp     x29, x30, [sp, #-16]!
   8:   mov     x29, sp
   c:   stp     x19, x20, [sp, #-16]!
  10:   stp     x21, x22, [sp, #-16]!
  14:   stp     x25, x26, [sp, #-16]!
  18:   stp     x27, x28, [sp, #-16]!
  1c:   mov     x25, sp
  20:   sub     x27, x25, #0x88
  24:   mov     x26, #0x0                       // #0
  28:   bti     j
  2c:   sub     sp, sp, #0x90
  30:   add     x19, x0, #0x0
  34:   mov     x0, #0x0                        // #0
  38:   str     x0, [x27]
  3c:   str     x0, [x27, #8]
  40:   str     x0, [x27, #16]
  44:   str     x0, [x27, #24]
  48:   str     x0, [x27, #32]
  4c:   str     x0, [x27, #40]
  50:   str     x0, [x27, #48]
  54:   str     x0, [x27, #56]
  58:   str     x0, [x27, #64]
  5c:   str     x0, [x27, #72]
  60:   str     x0, [x27, #80]
  64:   str     x0, [x27, #88]
  68:   str     x0, [x27, #96]
  6c:   str     x0, [x27, #104]
  70:   str     x0, [x27, #112]
  74:   str     x0, [x27, #120]
  78:   str     x0, [x27, #128]
  7c:   ldr     x2, [x19, #8]
  [...]

Signed-off-by: Xu Kuohai <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
jenkins-tessares pushed a commit that referenced this issue Feb 21, 2023
Xsk Tx can be triggered via either sendmsg() or poll() syscalls. These
two paths share a call to common function xsk_xmit() which has two
sanity checks within. A pseudo code example to show the two paths:

__xsk_sendmsg() :                       xsk_poll():
if (unlikely(!xsk_is_bound(xs)))        if (unlikely(!xsk_is_bound(xs)))
    return -ENXIO;                          return mask;
if (unlikely(need_wait))                (...)
    return -EOPNOTSUPP;                 xsk_xmit()
mark napi id
(...)
xsk_xmit()

xsk_xmit():
if (unlikely(!(xs->dev->flags & IFF_UP)))
	return -ENETDOWN;
if (unlikely(!xs->tx))
	return -ENOBUFS;

As can be observed above, in sendmsg() the napi id can be marked on an
interface that was not brought up, and this causes a NULL pointer
dereference:

[31757.505631] BUG: kernel NULL pointer dereference, address: 0000000000000018
[31757.512710] #PF: supervisor read access in kernel mode
[31757.517936] #PF: error_code(0x0000) - not-present page
[31757.523149] PGD 0 P4D 0
[31757.525726] Oops: 0000 [#1] PREEMPT SMP NOPTI
[31757.530154] CPU: 26 PID: 95641 Comm: xdpsock Not tainted 6.2.0-rc5+ #40
[31757.536871] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[31757.547457] RIP: 0010:xsk_sendmsg+0xde/0x180
[31757.551799] Code: 00 75 a2 48 8b 00 a8 04 75 9b 84 d2 74 69 8b 85 14 01 00 00 85 c0 75 1b 48 8b 85 28 03 00 00 48 8b 80 98 00 00 00 48 8b 40 20 <8b> 40 18 89 85 14 01 00 00 8b bd 14 01 00 00 81 ff 00 01 00 00 0f
[31757.570840] RSP: 0018:ffffc90034f27dc0 EFLAGS: 00010246
[31757.576143] RAX: 0000000000000000 RBX: ffffc90034f27e18 RCX: 0000000000000000
[31757.583389] RDX: 0000000000000001 RSI: ffffc90034f27e18 RDI: ffff88984cf3c100
[31757.590631] RBP: ffff88984714a800 R08: ffff88984714a800 R09: 0000000000000000
[31757.597877] R10: 0000000000000001 R11: 0000000000000000 R12: 00000000fffffffa
[31757.605123] R13: 0000000000000000 R14: 0000000000000003 R15: 0000000000000000
[31757.612364] FS:  00007fb4c5931180(0000) GS:ffff88afdfa00000(0000) knlGS:0000000000000000
[31757.620571] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[31757.626406] CR2: 0000000000000018 CR3: 000000184b41c003 CR4: 00000000007706e0
[31757.633648] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[31757.640894] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[31757.648139] PKRU: 55555554
[31757.650894] Call Trace:
[31757.653385]  <TASK>
[31757.655524]  sock_sendmsg+0x8f/0xa0
[31757.659077]  ? sockfd_lookup_light+0x12/0x70
[31757.663416]  __sys_sendto+0xfc/0x170
[31757.667051]  ? do_sched_setscheduler+0xdb/0x1b0
[31757.671658]  __x64_sys_sendto+0x20/0x30
[31757.675557]  do_syscall_64+0x38/0x90
[31757.679197]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[31757.687969] Code: 8e f6 ff 44 8b 4c 24 2c 4c 8b 44 24 20 41 89 c4 44 8b 54 24 28 48 8b 54 24 18 b8 2c 00 00 00 48 8b 74 24 10 8b 7c 24 08 0f 05 <48> 3d 00 f0 ff ff 77 3a 44 89 e7 48 89 44 24 08 e8 b5 8e f6 ff 48
[31757.707007] RSP: 002b:00007ffd49c73c70 EFLAGS: 00000293 ORIG_RAX: 000000000000002c
[31757.714694] RAX: ffffffffffffffda RBX: 000055a996565380 RCX: 00007fb4c5727c16
[31757.721939] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
[31757.729184] RBP: 0000000000000040 R08: 0000000000000000 R09: 0000000000000000
[31757.736429] R10: 0000000000000040 R11: 0000000000000293 R12: 0000000000000000
[31757.743673] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[31757.754940]  </TASK>

To fix this, let's make xsk_xmit a function that is responsible for
generic Tx, where RCU is handled accordingly, and pull the sanity checks
and xs->zc handling out of it. Move the sanity checks into __xsk_sendmsg()
and xsk_poll().
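
A small Python model of the reordering, for illustration only. The class XskModel, its fields and the function names are hypothetical stand-ins for the kernel structures, and the exact order of checks in the real patch is an assumption; the point shown is that the device and Tx ring checks run before the napi id is marked.

```python
import errno
from dataclasses import dataclass


@dataclass
class XskModel:
    """Hypothetical stand-in for the socket state the checks look at."""
    bound: bool = True
    dev_up: bool = True
    has_tx_ring: bool = True
    napi_id_marked: bool = False


def tx_sanity_check(xs):
    """The two checks that used to live inside xsk_xmit()."""
    if not xs.dev_up:
        return -errno.ENETDOWN
    if not xs.has_tx_ring:
        return -errno.ENOBUFS
    return 0


def sendmsg_model(xs, need_wait=False):
    """sendmsg path with the sanity checks done before marking the napi id."""
    if not xs.bound:
        return -errno.ENXIO
    if need_wait:
        return -errno.EOPNOTSUPP
    err = tx_sanity_check(xs)
    if err:
        return err
    xs.napi_id_marked = True  # only reached when the device is up
    return 0  # stands in for the generic Tx call
```

In this model a socket on a device that is down returns -ENETDOWN with napi_id_marked still False, which is the property the fix is after.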

Fixes: ca2e1a6 ("xsk: Mark napi_id on sendmsg()")
Fixes: 18b1ab7 ("xsk: Fix race at socket teardown")
Signed-off-by: Maciej Fijalkowski <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin KaFai Lau <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
jenkins-tessares pushed a commit that referenced this issue Sep 15, 2023
When a fault is injected while probing btrfs.ko, if kstrdup() fails in
eventfs_prepare_ef() in eventfs_add_dir(), it returns an ERR_PTR that is
assigned to file->ef. But eventfs_remove(), reached on the
trace_module_remove_events() path, only checks for NULL, which causes
the NULL pointer dereference below.

As both Masami and Steven suggest, the allocator side should handle the
error carefully and remove it, so fix the places where it failed.

 Could not create tracefs 'raid56_write' directory
 Btrfs loaded, zoned=no, fsverity=no
 Unable to handle kernel NULL pointer dereference at virtual address 000000000000001c
 Mem abort info:
   ESR = 0x0000000096000004
   EC = 0x25: DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
   FSC = 0x04: level 0 translation fault
 Data abort info:
   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
 user pgtable: 4k pages, 48-bit VAs, pgdp=0000000102544000
 [000000000000001c] pgd=0000000000000000, p4d=0000000000000000
 Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
 Dumping ftrace buffer:
    (ftrace buffer empty)
 Modules linked in: btrfs(-) libcrc32c xor xor_neon raid6_pq cfg80211 rfkill 8021q garp mrp stp llc ipv6 [last unloaded: btrfs]
 CPU: 15 PID: 1343 Comm: rmmod Tainted: G                 N 6.5.0+ #40
 Hardware name: linux,dummy-virt (DT)
 pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : eventfs_remove_rec+0x24/0xc0
 lr : eventfs_remove+0x68/0x1d8
 sp : ffff800082d63b60
 x29: ffff800082d63b60 x28: ffffb84b80ddd00c x27: ffffb84b3054ba40
 x26: 0000000000000002 x25: ffff800082d63bf8 x24: ffffb84b8398e440
 x23: ffffb84b82af3000 x22: dead000000000100 x21: dead000000000122
 x20: ffff800082d63bf8 x19: fffffffffffffff4 x18: ffffb84b82508820
 x17: 0000000000000000 x16: 0000000000000000 x15: 000083bc876a3166
 x14: 000000000000006d x13: 000000000000006d x12: 0000000000000000
 x11: 0000000000000001 x10: 00000000000017e0 x9 : 0000000000000001
 x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffffb84b84289804
 x5 : 0000000000000000 x4 : 9696969696969697 x3 : ffff33a5b7601f38
 x2 : 0000000000000000 x1 : ffff800082d63bf8 x0 : fffffffffffffff4
 Call trace:
  eventfs_remove_rec+0x24/0xc0
  eventfs_remove+0x68/0x1d8
  remove_event_file_dir+0x88/0x100
  event_remove+0x140/0x15c
  trace_module_notify+0x1fc/0x230
  notifier_call_chain+0x98/0x17c
  blocking_notifier_call_chain+0x4c/0x74
  __arm64_sys_delete_module+0x1a4/0x298
  invoke_syscall+0x44/0x100
  el0_svc_common.constprop.1+0x68/0xe0
  do_el0_svc+0x1c/0x28
  el0_svc+0x3c/0xc4
  el0t_64_sync_handler+0xa0/0xc4
  el0t_64_sync+0x174/0x178
 Code: 5400052c a90153b3 aa0003f3 aa0103f4 (f9401400)
 ---[ end trace 0000000000000000 ]---
 Kernel panic - not syncing: Oops: Fatal exception
 SMP: stopping secondary CPUs
 Dumping ftrace buffer:
    (ftrace buffer empty)
 Kernel Offset: 0x384b00c00000 from 0xffff800080000000
 PHYS_OFFSET: 0xffffcc5b80000000
 CPU features: 0x88000203,3c020000,1000421b
 Memory Limit: none
 Rebooting in 1 seconds..

Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Link: https://lore.kernel.org/all/[email protected]/
Link: https://lore.kernel.org/all/[email protected]/
Link: https://lore.kernel.org/all/[email protected]/
Link: https://lore.kernel.org/all/[email protected]/

Cc: Ajay Kaher <[email protected]>
Fixes: 5bdcd5f ("eventfs: Implement removal of meta data from eventfs")
Signed-off-by: Jinjie Ruan <[email protected]>
Suggested-by: Masami Hiramatsu (Google) <[email protected]>
Suggested-by: Steven Rostedt <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
matttbe pushed a commit that referenced this issue Feb 9, 2024
A last minute revert in 6.7-final introduced a potential deadlock when
enabling ASPM during probe of Qualcomm PCIe controllers as reported by
lockdep:

  ============================================
  WARNING: possible recursive locking detected
  6.7.0 #40 Not tainted
  --------------------------------------------
  kworker/u16:5/90 is trying to acquire lock:
  ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pcie_aspm_pm_state_change+0x58/0xdc

              but task is already holding lock:
  ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pci_walk_bus+0x34/0xbc

              other info that might help us debug this:
   Possible unsafe locking scenario:

         CPU0
         ----
    lock(pci_bus_sem);
    lock(pci_bus_sem);

               *** DEADLOCK ***

  Call trace:
   print_deadlock_bug+0x25c/0x348
   __lock_acquire+0x10a4/0x2064
   lock_acquire+0x1e8/0x318
   down_read+0x60/0x184
   pcie_aspm_pm_state_change+0x58/0xdc
   pci_set_full_power_state+0xa8/0x114
   pci_set_power_state+0xc4/0x120
   qcom_pcie_enable_aspm+0x1c/0x3c [pcie_qcom]
   pci_walk_bus+0x64/0xbc
   qcom_pcie_host_post_init_2_7_0+0x28/0x34 [pcie_qcom]

The deadlock can easily be reproduced on machines like the Lenovo ThinkPad
X13s by adding a delay to increase the race window during asynchronous
probe where another thread can take a write lock.

Add a new pci_set_power_state_locked() and associated helper functions that
can be called with the PCI bus semaphore held to avoid taking the read lock
twice.
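
The "_locked" helper pattern can be sketched as below. This is a simplified Python model, not the PCI code: a plain non-reentrant threading.Lock stands in for the PCI bus semaphore (the kernel read lock only deadlocks when a writer queues between the two acquisitions, while this mutex would block immediately), devices are plain dicts, and every name except the idea of a "_locked" variant is hypothetical.

```python
import threading

# Stand-in for the PCI bus semaphore; deliberately not reentrant.
bus_lock = threading.Lock()


def set_power_state_locked(dev, state):
    """Change the state; the caller must already hold bus_lock."""
    dev["state"] = state


def set_power_state(dev, state):
    """Public variant: takes bus_lock itself, then calls the locked helper."""
    with bus_lock:
        set_power_state_locked(dev, state)


def walk_bus(devices, callback):
    """Call callback on every device while holding bus_lock."""
    with bus_lock:
        for dev in devices:
            callback(dev)


if __name__ == "__main__":
    devices = [{"name": "dev0", "state": "D3"}, {"name": "dev1", "state": "D3"}]
    # Inside walk_bus the lock is already held, so use the _locked variant;
    # calling set_power_state() here would try to take bus_lock a second time.
    walk_bus(devices, lambda dev: set_power_state_locked(dev, "D0"))
    print(devices)
```

Splitting the function this way keeps a single implementation of the state change while letting callers that already hold the lock avoid taking it again.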

Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/[email protected]
Fixes: f93e71a ("Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()"")
Signed-off-by: Johan Hovold <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Cc: <[email protected]>	# 6.7
matttbe pushed a commit that referenced this issue Aug 12, 2024
The queue stats API queries the queues according to
real_num_[tr]x_queues. If the device is down and the channels were not
yet created, don't try to query their statistics.

To trigger the panic, run this command before the interface is brought
up:
./cli.py --spec ../../../Documentation/netlink/specs/netdev.yaml --dump qstats-get --json '{"ifindex": 4}'

BUG: kernel NULL pointer dereference, address: 0000000000000c00
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP PTI
CPU: 3 UID: 0 PID: 977 Comm: python3 Not tainted 6.10.0+ #40
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:mlx5e_get_queue_stats_rx+0x3c/0xb0 [mlx5_core]
Code: fc 55 48 63 ee 53 48 89 d3 e8 40 3d 70 e1 85 c0 74 58 4c 89 ef e8 d4 07 04 00 84 c0 75 41 49 8b 84 24 f8 39 00 00 48 8b 04 e8 <48> 8b 90 00 0c 00 00 48 03 90 40 0a 00 00 48 89 53 08 48 8b 90 08
RSP: 0018:ffff888116be37d0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff888116be3868 RCX: 0000000000000004
RDX: ffff88810ada4000 RSI: 0000000000000000 RDI: ffff888109df09c0
RBP: 0000000000000000 R08: 0000000000000004 R09: 0000000000000004
R10: ffff88813461901c R11: ffffffffffffffff R12: ffff888109df0000
R13: ffff888109df09c0 R14: ffff888116be38d0 R15: 0000000000000000
FS:  00007f4375d5c740(0000) GS:ffff88852c980000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000c00 CR3: 0000000106ada006 CR4: 0000000000370eb0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 ? __die+0x1f/0x60
 ? page_fault_oops+0x14e/0x3d0
 ? exc_page_fault+0x73/0x130
 ? asm_exc_page_fault+0x22/0x30
 ? mlx5e_get_queue_stats_rx+0x3c/0xb0 [mlx5_core]
 netdev_nl_stats_by_netdev+0x2a6/0x4c0
 ? __rmqueue_pcplist+0x351/0x6f0
 netdev_nl_qstats_get_dumpit+0xc4/0x1b0
 genl_dumpit+0x2d/0x80
 netlink_dump+0x199/0x410
 __netlink_dump_start+0x1aa/0x2c0
 genl_family_rcv_msg_dumpit+0x94/0xf0
 ? __pfx_genl_start+0x10/0x10
 ? __pfx_genl_dumpit+0x10/0x10
 ? __pfx_genl_done+0x10/0x10
 genl_rcv_msg+0x116/0x2b0
 ? __pfx_netdev_nl_qstats_get_dumpit+0x10/0x10
 ? __pfx_genl_rcv_msg+0x10/0x10
 netlink_rcv_skb+0x54/0x100
 genl_rcv+0x24/0x40
 netlink_unicast+0x21a/0x340
 netlink_sendmsg+0x1f4/0x440
 __sys_sendto+0x1b6/0x1c0
 ? do_sock_setsockopt+0xc3/0x180
 ? __sys_setsockopt+0x60/0xb0
 __x64_sys_sendto+0x20/0x30
 do_syscall_64+0x50/0x110
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f43757132b0
Code: c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 1d 45 31 c9 45 31 c0 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 68 c3 0f 1f 80 00 00 00 00 41 54 48 83 ec 20
RSP: 002b:00007ffd258da048 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007ffd258da0f8 RCX: 00007f43757132b0
RDX: 000000000000001c RSI: 00007f437464b850 RDI: 0000000000000003
RBP: 00007f4375085de0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: ffffffffc4653600 R14: 0000000000000001 R15: 00007f43751a6147
 </TASK>
Modules linked in: netconsole xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core zram zsmalloc mlx5_core fuse [last unloaded: netconsole]
CR2: 0000000000000c00
---[ end trace 0000000000000000 ]---
RIP: 0010:mlx5e_get_queue_stats_rx+0x3c/0xb0 [mlx5_core]
Code: fc 55 48 63 ee 53 48 89 d3 e8 40 3d 70 e1 85 c0 74 58 4c 89 ef e8 d4 07 04 00 84 c0 75 41 49 8b 84 24 f8 39 00 00 48 8b 04 e8 <48> 8b 90 00 0c 00 00 48 03 90 40 0a 00 00 48 89 53 08 48 8b 90 08
RSP: 0018:ffff888116be37d0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff888116be3868 RCX: 0000000000000004
RDX: ffff88810ada4000 RSI: 0000000000000000 RDI: ffff888109df09c0
RBP: 0000000000000000 R08: 0000000000000004 R09: 0000000000000004
R10: ffff88813461901c R11: ffffffffffffffff R12: ffff888109df0000
R13: ffff888109df09c0 R14: ffff888116be38d0 R15: 0000000000000000
FS:  00007f4375d5c740(0000) GS:ffff88852c980000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000c00 CR3: 0000000106ada006 CR4: 0000000000370eb0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Fixes: 7b66ae5 ("net/mlx5e: Add per queue netdev-genl stats")
Signed-off-by: Gal Pressman <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Reviewed-by: Joe Damato <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>