Description
I think I encountered a bug in the synchronization between the ppoll() fds and FRR's event structures. I printed ppoll()'s return value by LD_PRELOAD-ing libc function calls in the bgpd_io thread (the ppoll call has already returned before the first line is printed; the wrapper then prints the returned events). The first column is the MONOTONIC timestamp in microseconds.
000000000596019575 S34: __ppoll_chk(nfds=7, timeout=-1us)=1
000000000596019577 S34: fds[0]=31 is normal fd, events=POLLIN, revents=
000000000596019579 S34: fds[1]=37 is TCP fd, events=POLLIN, revents=
000000000596019581 S34: fds[2]=40 is TCP fd, events=POLLIN, revents=
000000000596019583 S34: fds[3]=42 is TCP fd, events=POLLIN, revents=
000000000596021361 S34: fds[4]=43 is TCP fd, events=, revents=
000000000596021367 S34: fds[5]=45 is TCP fd, events=, revents=POLLHUP
000000000596021369 S34: fds[6]=21 is normal fd, events=POLLIN, revents=
000000000596021375 S34: Entering read(fd=45)
and it blocks in that final read() call forever. Note the gap between 596019583 and 596021361. (A sketch of the wrapper that produced this log is below.)
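For reference, here is a minimal sketch of the kind of ppoll() interposer used to produce the log above. The actual wrapper I use also hooks __ppoll_chk and other libc calls; the names and output format here are illustrative only.

```c
/* Minimal sketch of an LD_PRELOAD ppoll() tracer, illustrative only.
 * Build: gcc -shared -fPIC -o libppolltrace.so ppolltrace.c -ldl
 * Run:   LD_PRELOAD=./libppolltrace.so bgpd ...
 * Note: with _FORTIFY_SOURCE, glibc may call __ppoll_chk instead, which
 * would need its own wrapper (not shown here). */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <poll.h>
#include <signal.h>
#include <stdio.h>
#include <time.h>

typedef int (*ppoll_fn)(struct pollfd *, nfds_t, const struct timespec *,
                        const sigset_t *);

static long long mono_us(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000LL + ts.tv_nsec / 1000;
}

int ppoll(struct pollfd *fds, nfds_t nfds, const struct timespec *tmo_p,
          const sigset_t *sigmask)
{
    static ppoll_fn real_ppoll;
    int ret;
    nfds_t i;

    if (!real_ppoll)
        real_ppoll = (ppoll_fn)dlsym(RTLD_NEXT, "ppoll");

    /* Call the real ppoll(), then dump the result and every fd's events. */
    ret = real_ppoll(fds, nfds, tmo_p, sigmask);

    fprintf(stderr, "%018lld ppoll(nfds=%lu)=%d\n", mono_us(),
            (unsigned long)nfds, ret);
    for (i = 0; i < nfds; i++)
        fprintf(stderr, "%018lld   fds[%lu]=%d events=%#x revents=%#x\n",
                mono_us(), (unsigned long)i, fds[i].fd,
                (unsigned)fds[i].events, (unsigned)fds[i].revents);

    return ret;
}
```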
On the other side, in the bgpd main process, the following happened during this gap:
That is, fd 45 is closed in the main process and the fd number is immediately reused by another connection, even before the bgpd_io thread handles the ppoll return value. (A standalone demonstration of this kind of reuse is sketched below.)
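To illustrate what I mean by reuse, here is a standalone demonstration (not FRR code, just the kernel's fd allocation behaviour): once a descriptor is closed, the next descriptor created gets that same number, so a thread that still holds the old number ends up referring to a completely different connection.

```c
/* Standalone demonstration of fd-number reuse after close(). */
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int a = socket(AF_UNIX, SOCK_STREAM, 0);
    printf("first socket:  fd=%d\n", a);

    close(a);                                 /* "main thread" closes it */
    int b = socket(AF_UNIX, SOCK_STREAM, 0);  /* "new connection" arrives */
    printf("second socket: fd=%d\n", b);      /* same number, different socket */

    return 0;
}
```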
However, I still cannot understand why the read() in bgpd_io blocked, given that fcntl(45, F_SETFL) set the O_NONBLOCK flag (0x800, i.e. 04000 in octal) successfully according to the log above.
My questions are:
1. Is this kind of fd reuse handled in FRR in any way?
2. If not, is the blocking read() caused by it?
P.S. I noticed a difference between this kind of fd and a normal fd: for a normal fd, bgpd tries to get the RTT with getsockopt(). This is why I suspect that the error above isn't handled in FRR.
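(For context, by "get the RTT with getsockopt" I mean the kind of TCP_INFO probe below; this is only an illustration of the call, not FRR's actual code.)

```c
/* Illustration of an RTT query via getsockopt(TCP_INFO) on Linux; it fails
 * on a non-TCP fd such as a Unix domain socket. */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <sys/socket.h>

static int print_tcp_rtt(int fd)
{
    struct tcp_info ti;
    socklen_t len = sizeof(ti);

    if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &len) < 0)
        return -1;                  /* e.g. not a TCP socket */

    printf("fd=%d rtt=%u us\n", fd, ti.tcpi_rtt);
    return 0;
}
```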
How to reproduce
I can only reproduce this using large topologies where each node is emulated in Docker, such as a k=30 fat tree. To reproduce it in a small topology, one could perhaps sleep for a while after the ppoll call and make lots of connections to the peer during the sleep, so that the fd is reused within the sleep window (see the sketch after this paragraph).
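A hypothetical way to widen the race window with the wrapper above: right after the real ppoll() returns, delay only the bgpd_io thread, and open many connections to the peer during the delay. The thread-name check and the 2-second delay below are made up for illustration.

```c
/* Hypothetical helper, called from the ppoll() wrapper right after the real
 * ppoll() returns: delay only the bgpd_io thread so the main thread has time
 * to close and reuse fds before the io thread looks at revents. */
#define _GNU_SOURCE
#include <pthread.h>
#include <string.h>
#include <unistd.h>

static void maybe_delay_io_thread(void)
{
    char name[16] = "";     /* glibc thread names are at most 16 bytes */

    if (pthread_getname_np(pthread_self(), name, sizeof(name)) != 0)
        return;
    if (strcmp(name, "bgpd_io") == 0)
        usleep(2 * 1000 * 1000);   /* 2 s window for fd close + reuse */
}
```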
Expected behavior
The read() call should not block.
If possible, no fd reuse should happen before the ppoll call returns. Or, if reuse has to happen because of lock usage, this reuse-after-close situation should be detected and handled.
Actual behavior
The read() call blocks; the fd reuse doesn't seem to be detected and handled correctly by FRR.
Additional context
Actually, I replaced the TCP connections with Unix domain sockets by LD_PRELOAD-ing all socket-related libc calls (a minimal sketch of that interposition is below), but I currently don't think this is my preloading library's fault. Mainly I would like an explanation of whether the kind of fd reuse I observed is handled in FRR, and how, if it is; then I can dig further by myself.
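For completeness, the socket-family interposition is along these lines. Only the socket() hook is shown; AF_INET6 and the address translation in connect()/bind()/accept()/getpeername() are omitted.

```c
/* Minimal sketch of mapping IPv4 TCP sockets onto Unix domain sockets via
 * LD_PRELOAD; illustrative only, not the full preloading library. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <sys/socket.h>

typedef int (*socket_fn)(int, int, int);

int socket(int domain, int type, int protocol)
{
    static socket_fn real_socket;

    if (!real_socket)
        real_socket = (socket_fn)dlsym(RTLD_NEXT, "socket");

    /* Swap IPv4 stream sockets for Unix-domain stream sockets; keep any
     * SOCK_NONBLOCK / SOCK_CLOEXEC flags OR'd into the type. */
    if (domain == AF_INET &&
        (type & ~(SOCK_NONBLOCK | SOCK_CLOEXEC)) == SOCK_STREAM)
        return real_socket(AF_UNIX, type, 0);

    return real_socket(domain, type, protocol);
}
```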
Besides, the blocking read() call has the following kernel stack (cat /proc/34/stack) and user stack (printed by gdb):
wow, that's a beauty - thanks for all that detailed info. I'd be very surprised if we expected this kind of fd reuse; I've seen it happen, but I think we'll have to do some work in bgpd to get this fixed.
Is there a plan to fix this? I think fixing this requires a thorough understanding of lock usages etc. in the event system, and is beyond my ability now.
It's going to take some thinking about what bgp is doing between the main and io pthreads to sort this out; that's not really a plan, though.