Description
I think I encountered a bug in the synchronization between the ppoll() fds and FRR's event structures. I printed ppoll()'s return value by LD_PRELOAD-ing libc function calls in the bgpd_io thread (the ppoll call has already returned before the first line is printed; the wrapper then prints the returned events). The first column is the MONOTONIC timestamp in microseconds.
000000000596019575 S34: __ppoll_chk(nfds=7, timeout=-1us)=1
000000000596019577 S34: fds[0]=31 is normal fd, events=POLLIN, revents=
000000000596019579 S34: fds[1]=37 is TCP fd, events=POLLIN, revents=
000000000596019581 S34: fds[2]=40 is TCP fd, events=POLLIN, revents=
000000000596019583 S34: fds[3]=42 is TCP fd, events=POLLIN, revents=
000000000596021361 S34: fds[4]=43 is TCP fd, events=, revents=
000000000596021367 S34: fds[5]=45 is TCP fd, events=, revents=POLLHUP
000000000596021369 S34: fds[6]=21 is normal fd, events=POLLIN, revents=
000000000596021375 S34: Entering read(fd=45)
and it blocks in that final read() call forever. Note the gap between 596019583 and 596021361. (A sketch of the wrapper that produced this log is below.)
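For reference, here is a minimal sketch of the kind of ppoll() interposer used to produce the log above. The actual wrapper I use also hooks __ppoll_chk and other libc calls; the names and output format here are illustrative only.

```c
/* Minimal sketch of an LD_PRELOAD ppoll() tracer, illustrative only.
 * Build: gcc -shared -fPIC -o libppolltrace.so ppolltrace.c -ldl
 * Run:   LD_PRELOAD=./libppolltrace.so bgpd ...
 * Note: with _FORTIFY_SOURCE, glibc may call __ppoll_chk instead, which
 * would need its own wrapper (not shown here). */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <poll.h>
#include <signal.h>
#include <stdio.h>
#include <time.h>

typedef int (*ppoll_fn)(struct pollfd *, nfds_t, const struct timespec *,
                        const sigset_t *);

static long long mono_us(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000LL + ts.tv_nsec / 1000;
}

int ppoll(struct pollfd *fds, nfds_t nfds, const struct timespec *tmo_p,
          const sigset_t *sigmask)
{
    static ppoll_fn real_ppoll;
    int ret;
    nfds_t i;

    if (!real_ppoll)
        real_ppoll = (ppoll_fn)dlsym(RTLD_NEXT, "ppoll");

    /* Call the real ppoll(), then dump the result and every fd's events. */
    ret = real_ppoll(fds, nfds, tmo_p, sigmask);

    fprintf(stderr, "%018lld ppoll(nfds=%lu)=%d\n", mono_us(),
            (unsigned long)nfds, ret);
    for (i = 0; i < nfds; i++)
        fprintf(stderr, "%018lld   fds[%lu]=%d events=%#x revents=%#x\n",
                mono_us(), (unsigned long)i, fds[i].fd,
                (unsigned)fds[i].events, (unsigned)fds[i].revents);

    return ret;
}
```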
On the other side, in the bgpd main process, the following happened during this gap:
That is, fd 45 is closed in the main process and the fd number is immediately reused by another connection, even before the bgpd_io thread handles the ppoll return value. (A standalone demonstration of this kind of reuse is sketched below.)
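To illustrate what I mean by reuse, here is a standalone demonstration (not FRR code, just the kernel's fd allocation behaviour): once a descriptor is closed, the next descriptor created gets that same number, so a thread that still holds the old number ends up referring to a completely different connection.

```c
/* Standalone demonstration of fd-number reuse after close(). */
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int a = socket(AF_UNIX, SOCK_STREAM, 0);
    printf("first socket:  fd=%d\n", a);

    close(a);                                 /* "main thread" closes it */
    int b = socket(AF_UNIX, SOCK_STREAM, 0);  /* "new connection" arrives */
    printf("second socket: fd=%d\n", b);      /* same number, different socket */

    return 0;
}
```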
However, I still cannot understand why the read() in bgpd_io blocked, given that fcntl(45, F_SETFL) set the O_NONBLOCK flag (0x800, i.e. 04000 in octal) successfully according to the log above.
My questions are:
1. Is this kind of fd reuse handled in FRR in any way?
2. If not, is the blocking read() caused by it?
P.S. I noticed a difference between this kind of fd and a normal fd: for a normal fd, bgpd tries to get the RTT with getsockopt(). This is why I suspect that the error above isn't handled in FRR.
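(For context, by "get the RTT with getsockopt" I mean the kind of TCP_INFO probe below; this is only an illustration of the call, not FRR's actual code.)

```c
/* Illustration of an RTT query via getsockopt(TCP_INFO) on Linux; it fails
 * on a non-TCP fd such as a Unix domain socket. */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <sys/socket.h>

static int print_tcp_rtt(int fd)
{
    struct tcp_info ti;
    socklen_t len = sizeof(ti);

    if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &len) < 0)
        return -1;                  /* e.g. not a TCP socket */

    printf("fd=%d rtt=%u us\n", fd, ti.tcpi_rtt);
    return 0;
}
```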
How to reproduce
I can only reproduce this using large topologies where each node is emulated in Docker, such as a k=30 fat tree. To reproduce it in a small topology, one could perhaps sleep for a while after the ppoll call and make lots of connections to the peer during the sleep, so that the fd is reused within the sleep window (see the sketch after this paragraph).
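A hypothetical way to widen the race window with the wrapper above: right after the real ppoll() returns, delay only the bgpd_io thread, and open many connections to the peer during the delay. The thread-name check and the 2-second delay below are made up for illustration.

```c
/* Hypothetical helper, called from the ppoll() wrapper right after the real
 * ppoll() returns: delay only the bgpd_io thread so the main thread has time
 * to close and reuse fds before the io thread looks at revents. */
#define _GNU_SOURCE
#include <pthread.h>
#include <string.h>
#include <unistd.h>

static void maybe_delay_io_thread(void)
{
    char name[16] = "";     /* glibc thread names are at most 16 bytes */

    if (pthread_getname_np(pthread_self(), name, sizeof(name)) != 0)
        return;
    if (strcmp(name, "bgpd_io") == 0)
        usleep(2 * 1000 * 1000);   /* 2 s window for fd close + reuse */
}
```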
Expected behavior
The read() call should not block.
If possible, no fd reuse should happen before the ppoll call returns. Or, if reuse has to happen because of lock usage, this reuse-after-close situation should be detected and handled.
Actual behavior
The read() call blocks; the fd reuse doesn't seem to be detected and handled correctly by FRR.
Additional context
Actually, I replaced the TCP connections with Unix domain sockets by LD_PRELOAD-ing all socket-related libc calls (a minimal sketch of that interposition is below), but I currently don't think this is my preloading library's fault. Mainly I would like an explanation of whether the kind of fd reuse I observed is handled in FRR, and how, if it is; then I can dig further by myself.
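For completeness, the socket-family interposition is along these lines. Only the socket() hook is shown; AF_INET6 and the address translation in connect()/bind()/accept()/getpeername() are omitted.

```c
/* Minimal sketch of mapping IPv4 TCP sockets onto Unix domain sockets via
 * LD_PRELOAD; illustrative only, not the full preloading library. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <sys/socket.h>

typedef int (*socket_fn)(int, int, int);

int socket(int domain, int type, int protocol)
{
    static socket_fn real_socket;

    if (!real_socket)
        real_socket = (socket_fn)dlsym(RTLD_NEXT, "socket");

    /* Swap IPv4 stream sockets for Unix-domain stream sockets; keep any
     * SOCK_NONBLOCK / SOCK_CLOEXEC flags OR'd into the type. */
    if (domain == AF_INET &&
        (type & ~(SOCK_NONBLOCK | SOCK_CLOEXEC)) == SOCK_STREAM)
        return real_socket(AF_UNIX, type, 0);

    return real_socket(domain, type, protocol);
}
```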
Besides, the blocking read() call has the following kernel stack (cat /proc/34/stack) and user stack (printed by gdb):
wow, that's a beauty - thanks for all that detailed info. I'd be very surprised if we expected this kind of fd reuse; I've seen it happen, but I think we'll have to do some work in bgpd to get this fixed.
Is there a plan to fix this? I think fixing this requires a thorough understanding of lock usages etc. in the event system, and is beyond my ability now.
It's going to take some thinking about what bgp is doing between the main and io pthreads to sort this out; that's not really a plan, though.