Segmentation fault when running the kernel's netfilter concurrency tests #1750

Closed
xTire opened this issue Aug 23, 2024 · 20 comments · Fixed by #1752
Comments

@xTire
Contributor

xTire commented Aug 23, 2024

Context

  • Version of iperf3: 3.17.1+ (cJSON 1.7.15)

  • Hardware: x86_64 & aarch64

  • Operating system (and distribution, if any): openEuler & openSUSE leap 15.6

  • Other relevant information:

  1. The number of processor cores exceeds 16; the more cores, the more likely the problem is to be triggered.
  2. The problem is much easier to reproduce on physical machines than on virtual machines.

Bug Report

  • Expected Behavior
    The kernel self-test script completes its run.

  • Actual Behavior
    During the concurrency test phase, a segmentation fault occurred in iperf3:

TEST: concurrency
  net,port                                                      [ OK ]
  port,net                                                      [ OK ]
  net6,port                                                     ./nft_concat_range.sh: line 456: 799955 Segmentation fault      (core dumped) ip netns exec B "$@" > /dev/null 2>&1
./nft_concat_range.sh: line 456: 799958 Segmentation fault      (core dumped) ip netns exec B "$@" > /dev/null 2>&1
./nft_concat_range.sh: line 456: 799965 Segmentation fault      (core dumped) ip netns exec B "$@" > /dev/null 2>&1
[ OK ]
  net,port,net                                                  [ OK ]
  net6,port,net6,port                                           [ OK ]
TEST: timeout
  net,port                                                      [ OK ]
  • Steps to Reproduce
  1. Download the Linux kernel self-test script:
    https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/plain/tools/testing/selftests/netfilter/nft_concat_range.sh?h=v5.10.224
  2. Run the script multiple times in a row:
    for i in $(seq 5); do ./nft_concat_range.sh; done
  • Call trace
warning: Can't open file /tmp/iperf3.74s4y2 (deleted) during file-backed mapping note processing
[New LWP 2683041]
[New LWP 2682985]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib64/libthread_db.so.1".
Core was generated by `iperf3 -s -DB 2001:db8::0000:000c -p 13'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00000000392f1cb9 in ?? ()
[Current thread is 1 (Thread 0x7f38bd3f5640 (LWP 2683041))]
(gdb) bt
#0  0x00000000392f1cb9 in ?? ()
#1  0x0000000000000000 in ?? ()
(gdb) thread 2
[Switching to thread 2 (Thread 0x7f38bd3f6740 (LWP 2682985))]
#0  0x00007f38bd51214f in __GI___libc_write (nbytes=46, buf=0x7ffd738062a0, fd=2) at ../sysdeps/unix/sysv/linux/write.c:26
26        return SYSCALL_CANCEL (write, fd, buf, nbytes);
(gdb) bt
#0  0x00007f38bd51214f in __GI___libc_write (nbytes=46, buf=0x7ffd738062a0, fd=2) at ../sysdeps/unix/sysv/linux/write.c:26
#1  __GI___libc_write (fd=2, buf=0x7ffd738062a0, nbytes=46) at ../sysdeps/unix/sysv/linux/write.c:24
#2  0x00007f38bd4980cd in _IO_new_file_write (f=0x7f38bd6074c0 <_IO_2_1_stderr_>, data=0x7ffd738062a0, n=46) at fileops.c:1181
#3  0x00007f38bd497436 in new_do_write (fp=fp@entry=0x7f38bd6074c0 <_IO_2_1_stderr_>,
    data=data@entry=0x7ffd738062a0 "iperf3: interrupt - the server has terminated\n", to_do=to_do@entry=46) at libioP.h:947
#4  0x00007f38bd4987ee in _IO_new_file_xsputn (n=46, data=<optimized out>, f=<optimized out>) at fileops.c:1255
#5  _IO_new_file_xsputn (f=0x7f38bd6074c0 <_IO_2_1_stderr_>, data=<optimized out>, n=46) at fileops.c:1197
#6  0x00007f38bd485b5e in buffered_vfprintf (s=0x7f38bd6074c0 <_IO_2_1_stderr_>,
    format=format@entry=0x7f38bdab8c44 "iperf3: %s\n", args=args@entry=0x7ffd738082e0, mode_flags=mode_flags@entry=0)
    at ../libio/libioP.h:947
#7  0x00007f38bd484b6b in __vfprintf_internal (s=<optimized out>, format=0x7f38bdab8c44 "iperf3: %s\n",
    ap=ap@entry=0x7ffd738082e0, mode_flags=mode_flags@entry=0) at vfprintf-internal.c:1377
#8  0x00007f38bd47051a in __fprintf (stream=<optimized out>, format=<optimized out>) at fprintf.c:32
#9  0x00007f38bdaa997a in iperf_errexit (test=0x1b322a0, format=0x7f38bdab88c3 "interrupt - %s") at iperf_error.c:128
#10 0x00007f38bdaa872f in iperf_got_sigend (test=0x1b322a0) at iperf_api.c:4800
#11 0x00000000004013d6 in run (test=0x1b322a0) at main.c:148
#12 0x000000000040134f in main (argc=6, argv=0x7ffd73808a58) at main.c:124
(gdb)
@davidBar-On
Contributor

Per the gdb output (see below), it may be that the problem happens when iperf3 completes the tests and the server is killed (SIGTERM) in line 1340 of the shell script (kill $(for pid in ${flood_pids}; do echo ${pid}; done) 2>/dev/null).

In any case, it is not clear whether the problem is in iperf3 or somehow in the openEuler implementation of setjmp()/longjmp(). To test this, I suggest that you run iperf3 as a server (iperf3 -s) and then kill -SIGTERM the iperf3 process, to see what happens. If the SIGSEGV doesn't happen, retry this using the actual command that was running when the error happened: iperf3 -s -DB 2001:db8::0000:000c -p 13.

Other information that could help is the register contents shown by gdb when the error happened, and the assembly code around the instruction where it happened. That may help to understand what is causing the SIGSEGV and give a hint about the root cause.

Evaluation of the gdb output (which is very helpful!) shows that the SIGSEGV happened in __libc_write(), which was called from the iperf3 fprintf() to stderr (trying to print "iperf3: interrupt - the server has terminated", as also seen in the gdb output). This is after the server's iperf_got_sigend() signal handler was called, catching either SIGINT, SIGTERM or SIGHUP, as set by iperf_catch_sigend(). The signal handler was called via the setjmp()/longjmp() pair set up and executed here.
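
For reference, the general shape of that setjmp()/longjmp()-based signal handling looks roughly like the sketch below (names and structure are illustrative, not iperf3's actual code):

#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static jmp_buf sigend_jmp_buf;

static void sigend_handler(int sig)
{
    (void)sig;
    longjmp(sigend_jmp_buf, 1);      /* jump back to the point set before the main loop */
}

int main(void)
{
    signal(SIGINT, sigend_handler);
    signal(SIGTERM, sigend_handler);
    signal(SIGHUP, sigend_handler);

    if (setjmp(sigend_jmp_buf)) {
        /* corresponds to the iperf_got_sigend() path: report and exit */
        fprintf(stderr, "iperf3: interrupt - the server has terminated\n");
        exit(1);
    }

    for (;;)
        pause();                     /* stand-in for the server's main loop */
}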

@xTire
Contributor Author

xTire commented Aug 25, 2024

Per the gdb output (see below), it may be that the problem happens when the iperf3 completes the tests and the server is killed (SIGTERM) in line 1340 of the shell script (kill $(for pid in ${flood_pids}; do echo ${pid}; done) 2>/dev/null).

Indeed, iperf3 should have received the SIGTERM signal sent by the shell script before the SIGSEGV. This means thread/process synchronization has to be considered when the signal arrives.

The script uses the -D option when starting the iperf3 server, as can be seen from line 634 and line 734. After removing this option, the problem did not reproduce in thousands of tests. This again points the problem toward multi-process signal handling.

In any case, it is not clear whether the problem is in iperf3 or somehow in the openEuler implementation of setjmp()/longjmp(). To test this, I suggest that you run iperf3 as a server (iperf3 -s) and then kill -SIGTERM the iperf3 process, to see what happens. If the SIGSEGV doesn't happen, retry this using the actual command that was running when the error happened: iperf3 -s -DB 2001:db8::0000:000c -p 13.

openEuler can roughly be understood as a clone of CentOS 7; there is nothing unique about its setjmp()/longjmp() implementation. Moreover, the problem can be reproduced on openSUSE Leap 15.6 (I'll add this information to the initial post), using the same iperf3 code and script.

Unfortunately, manually starting iperf3 -s and then running kill -SIGTERM did not trigger the segmentation fault. As mentioned above, the -D option is most likely required.

Other information that could help is the register contents shown by gdb when the error happened, and the assembly code around the instruction where it happened. That may help to understand what is causing the SIGSEGV and give a hint about the root cause.

Evaluation of the gdb output (which is very helpful!) shows that the SIGSEGV happened in __libc_write(), which was called from the iperf3 fprintf() to stderr (trying to print "iperf3: interrupt - the server has terminated", as also seen in the gdb output). This is after the server's iperf_got_sigend() signal handler was called, catching either SIGINT, SIGTERM or SIGHUP, as set by iperf_catch_sigend(). The signal handler was called via the setjmp()/longjmp() pair set up and executed here.

gdb shows two threads, as shown in the initial post, and the call trace that can be parsed (thread 2 in gdb, Thread 0x7f38bd3f6740) shows the flow of iperf3's signal handling when the SIGTERM signal is caught. However, it is thread 1 that generates the SIGSEGV, and its call stack seems to be smashed (I had used CFLAGS='-g -O0' ./configure to generate the iperf3 Makefile beforehand).

Let me read the corresponding code first; thank you for the hint anyway.

@xTire xTire closed this as completed Aug 25, 2024
@xTire xTire reopened this Aug 25, 2024
@xTire
Contributor Author

xTire commented Aug 25, 2024

Closed by mistake; reopening it.

@ikernel-mryao

Per the gdb output (see below), it may be that the problem happens when the iperf3 completes the tests and the server is killed (SIGTERM) in line 1340 of the shell script (kill $(for pid in ${flood_pids}; do echo ${pid}; done) 2>/dev/null).

In any case, it is not clear whether the problem is in iperf3 or somehow in the openEuler implementation of setjmp()/longjmp(). To test this, I suggest that you run iperf3 as a server (iperf3 -s) and then kill -SIGTERM the iperf3 process, to see what happens. If the SIGSEGV doesn't happen, retry this using the actual command that was running when the error happened: iperf3 -s -DB 2001:db8::0000:000c -p 13.

Other information that could help is the register contents shown by gdb when the error happened, and the assembly code around the instruction where it happened. That may help to understand what is causing the SIGSEGV and give a hint about the root cause.

Evaluation of the gdb output (which is very helpful!) shows that the SIGSEGV happened in __libc_write(), which was called from the iperf3 fprintf() to stderr (trying to print "iperf3: interrupt - the server has terminated", as also seen in the gdb output). This is after the server's iperf_got_sigend() signal handler was called, catching either SIGINT, SIGTERM or SIGHUP, as set by iperf_catch_sigend(). The signal handler was called via the setjmp()/longjmp() pair set up and executed here.

I have also reproduced this issue, and I found the first commit that caused the coredump.
commit url: 277b84b
Here are my tests of this commit.

Context:

  • Version of iperf3: 3.13 (cJSON 1.7.15)
    steps to compile iperf3
  1. git reset --hard e691ad3 (commit id e691ad3 refers to the url above)
  2. compile
  • Hardware: x86_64 & aarch64

  • Operating system (and distribution, if any): openEuler

  • Other relevant information (for example, non-default compilers, libraries, cross-compiling, etc.): The number of processor cores exceeds 16. The more cores, the more likely the problem is to be triggered.

Steps to Reproduce

  1. Download the Linux kernel self-test script:

    Modify nft_concat_range.sh and select a reproducible scenario. That is, TESTS="concurrency", TYPES="net_port_net" and set quicktest=1 to skip the TESTS=performance test.

  2. Run the script multiple times in a row:
    for i in $(seq 10); do ./nft_concat_range.sh; done

Bug Report

  • Expected Behavior

The kernel self-test script completes its run and prints OK.

  • Actual Behavior

During the concurrency test phase, a segmentation fault occurred in iperf3, producing many coredump files.
The stack of each core file is different; I selected one core file for analysis.

core-iperf3-3353376-11:

warning: Can't open file /tmp/iperf3.q7ZIKq (deleted) during file-backed mapping note processing
[New LWP 3353376]
[New LWP 3353471]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib64/libthread_db.so.1".
Core was generated by `iperf3 -s -DB 10.0.0.6 -p 2'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007faa9439124d in iperf_stats_callback (test=0x7662a0) at iperf_api.c:3214
3214        SLIST_FOREACH(sp, &test->streams, streams) {
[Current thread is 1 (Thread 0x7faa94091740 (LWP 3353376))]
(gdb) bt
#0  0x00007faa9439124d in iperf_stats_callback (test=0x7662a0) at iperf_api.c:3214
#1  0x00007faa94395d66 in iperf_got_sigend (test=0x7662a0) at iperf_api.c:4565
#2  0x00000000004013f0 in run (test=0x7662a0) at main.c:132
#3  0x0000000000401369 in main (argc=6, argv=0x7fff6bc354b8) at main.c:108
(gdb) x/i 0x7faa9439124d
=> 0x7faa9439124d <iperf_stats_callback+972>:   mov    0x1b0(%rax),%rax
(gdb) info registers
rax            0x7d1b83            8199043
rbx            0x0                 0
rcx            0x769318            7770904
rdx            0x0                 0
rsi            0x7fff6bc35280      140735001350784
rdi            0x7692e0            7770848
rbp            0x7fff6bc35300      0x7fff6bc35300
rsp            0x7fff6bc35140      0x7fff6bc35140
r8             0xffffffffffffffe0  -32
r9             0x80                128
r10            0x7faa94285b60      140370606840672
r11            0x7691c0            7770560
r12            0x7fff6bc354b8      140735001351352
r13            0x4012a2            4199074
r14            0x7faa943e1a80      140370608265856
r15            0x403dd8            4210136
rip            0x7faa9439124d      0x7faa9439124d <iperf_stats_callback+972>
eflags         0x10287             [ CF PF SF IF RF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
k0             0x0                 0
k1             0x0                 0
k2             0x0                 0
k3             0x0                 0
k4             0x0                 0
k5             0x0                 0
k6             0x0                 0
k7             0x0                 0
(gdb) x/1x (0x7d1b83 + 0x1b0)
0x7d1d33:       Cannot access memory at address 0x7d1d33
(gdb) p/x  (size_t)&(((struct iperf_stream *)0)->streams.sle_next)
$1 = 0x1b0
(gdb) p sp
$2 = (struct iperf_stream *) 0x7d1b83
(gdb) thread apply all bt

Thread 2 (Thread 0x7faa94090640 (LWP 3353471)):
#0  0x0000000006133ba5 in ?? ()
#1  0x000117bc00000b36 in ?? ()
#2  0x0000000600000000 in ?? ()
#3  0x0000000000000001 in ?? ()
#4  0x0000000000084298 in ?? ()
#5  0x0000000000000006 in ?? ()
#6  0x00000000000a092b in ?? ()
#7  0x00000000000007ec in ?? ()
#8  0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7faa94091740 (LWP 3353376)):
#0  0x00007faa9439124d in iperf_stats_callback (test=0x7662a0) at iperf_api.c:3214
#1  0x00007faa94395d66 in iperf_got_sigend (test=0x7662a0) at iperf_api.c:4565
#2  0x00000000004013f0 in run (test=0x7662a0) at main.c:132
#3  0x0000000000401369 in main (argc=6, argv=0x7fff6bc354b8) at main.c:108

Possible reason?
There are two threads here because the "Create one thread" change is in commit e691ad3, and there is no -P option in the iperf3 command. When thread 1 reaches line 3214, thread 2 has already released iperf_test, so a segmentation fault occurs? Looking forward to your reply. Thank you very much.
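
For reference, the gdb expression above is the classic offsetof() trick; a small self-contained sketch of the same check, with an illustrative stand-in for the real structure:

#include <stddef.h>
#include <stdio.h>
#include <sys/queue.h>

/* Illustrative stand-in for iperf3's struct iperf_stream: only the list
 * linkage matters here.  The real structure has many earlier fields,
 * which is why the link ends up at offset 0x1b0 in the core dump. */
struct iperf_stream {
    char other_fields[0x1b0];              /* placeholder, not the real layout */
    SLIST_ENTRY(iperf_stream) streams;     /* sle_next lives at this offset */
};

int main(void)
{
    /* Prints the same value gdb computed; the faulting instruction
     * mov 0x1b0(%rax),%rax is therefore the sp->streams.sle_next load,
     * with sp (0x7d1b83) pointing at an unmapped address. */
    printf("0x%zx\n", offsetof(struct iperf_stream, streams.sle_next));
    return 0;
}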

@bmah888 bmah888 linked a pull request Aug 26, 2024 that will close this issue
@xTire
Contributor Author

xTire commented Aug 27, 2024

Since the call traces in the core files (from my tests and ikernel-mryao's) are incomplete or corrupted, I have to infer from the code logic that the race condition is caused by multiple threads calling the signal handler at the same time.

When SIGTERM arrives, one of the threads (it can be the main thread or a sub-thread) calls the signal handler to prepare for exiting the process, and then the main thread frees the test resource. At this time, another thread also calls the signal handler and accesses the freed test inside it, which eventually causes a segmentation fault.

PR #1752 allows only the main thread to handle signals, while the thread synchronization facilities in the existing code ensure that child threads do not race with the main thread for resources.
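
As a rough illustration of that approach (a minimal sketch, not the actual PR #1752 change): worker threads can block the termination signals with pthread_sigmask(), so they are delivered to the main thread only.

#include <pthread.h>
#include <signal.h>
#include <unistd.h>

static void *worker_thread(void *arg)
{
    sigset_t set;

    /* Block the end-of-run signals in this thread; they will then be
     * delivered to the main thread, which owns the cleanup and exit path. */
    sigemptyset(&set);
    sigaddset(&set, SIGTERM);
    sigaddset(&set, SIGINT);
    sigaddset(&set, SIGHUP);
    pthread_sigmask(SIG_BLOCK, &set, NULL);

    /* ... per-stream send/receive work would go here ... */
    (void)arg;
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, worker_thread, NULL);
    /* the main thread keeps the default mask and handles SIGTERM/SIGINT/SIGHUP */
    pthread_join(t, NULL);
    return 0;
}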

Please let me know if you have any different opinions. :)

@xTire
Contributor Author

xTire commented Aug 27, 2024

When SIGTERM arrives, one of the threads (it can be the main thread or a sub-thread) calls the signal handler to prepare for exiting the process, and then the main thread frees the test resource. At this time, another thread also calls the signal handler and accesses the freed test inside it, which eventually causes a segmentation fault.

I didn't realize that exit() may have undefined behavior when called from a signal handler. See man 3 exit: exit() is an MT-Unsafe interface.

The call flow is as follows:

iperf_got_sigend()
     iperf_errexit()
          exit(1)

In the presence of multiple threads, it may be necessary to handle synchronization and exit more gracefully.
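
One common way to avoid calling exit() from signal context is for the handler to only set an async-signal-safe flag and let the main loop do the cleanup and exit; a minimal generic sketch (not a proposed iperf3 patch):

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigend = 0;

static void sigend_handler(int sig)
{
    (void)sig;
    got_sigend = 1;                  /* async-signal-safe: just set a flag */
}

int main(void)
{
    struct sigaction sa;

    sa.sa_handler = sigend_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGTERM, &sa, NULL);
    sigaction(SIGINT, &sa, NULL);
    sigaction(SIGHUP, &sa, NULL);

    while (!got_sigend)
        pause();                     /* stand-in for the server's event loop */

    /* Cleanup and exit happen here, outside signal context. */
    fprintf(stderr, "iperf3: interrupt - the server has terminated\n");
    exit(1);
}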

@davidBar-On
Contributor

In the presence of multiple threads, it may be necessary to handle synchronization and exit more gracefully.

Good observation about exit()! This may be the right direction for a solution. I tried running iperf3 as a daemon using the PR #1752 code, and after the server starts the first test, it cannot be terminated using SIGTERM. In addition, ^C and SIGTERM also don't kill the client.

If exit() is the real problem, then the solution suggested by this StackOverflow question (using pthread_mutex_lock() before the exit()) may be the right approach.

@xTire
Contributor Author

xTire commented Aug 28, 2024

Good observation about exit()! This may be the right direction for a solution. I tried running iperf3 as a daemon using the PR #1752 code, and after the server starts the first test, it cannot be terminated using SIGTERM. In addition, ^C and SIGTERM also don't kill the client.

Oops, my bad. I didn't notice that the client needs to use a signal handler to control its exit logic. But no abnormal exit could be reproduced after PR #1752. This at least shows that there is a race condition in the multi-threaded response to the signal.

If exit() is the real problem, then the solution suggested by this StackOverflow question (using pthread_mutex_lock() before the exit()) may be the right approach.

Unfortunately, I tried using a mutex to protect exit(), but it still didn't solve the issue. Please forgive my unfamiliarity with the code. So far, I personally have no good way to ensure that only the main thread calls exit().

@davidBar-On
Contributor

I have also reproduced this issue, and I found the first commit that caused the coredump.
commit url: 277b84b

@ikernel-mryao, thanks for the detailed information! It shows that the issue in this case happened in iperf_stats_callback() when trying to access the next stream in the SLIST_FOREACH(sp, &test->streams, streams) loop. However, my current thinking (see below) is that at that point the server would have to already be in the IPERF_DONE state, whereas iperf_got_sigend() calls iperf_stats_callback() only if the server is in the TEST_RUNNING state. This may be because the version used is an early implementation of the threads, so going forward it is better to use the latest version.

Based on the inputs and discussions so far, I think the problem may be that the server is killed before it has completed the test cleanup. That is, the client has completed, but the server is still in the process of terminating the worker threads. This is why there are still two server threads, as after the test cleanup only the main thread should be running.

@xTire / @ikernel-mryao, to test the above conjecture, can you retry running the test script (I don't have openEuler installed), but add sleep 5 before the kill $(for pid in ${flood_pids}; do echo ${pid}; done) 2>/dev/null line?

If this solves the problem, the next step will be to find a way to change the code so that the signal handler does not try to do problematic actions when the state is not TEST_RUNNING, and maybe add different locks, or something like that.

@xTire
Contributor Author

xTire commented Aug 30, 2024

@xTire / @ikernel-mryao, to test the above conjecture, can you retry running the test script (I don't have openEuler installed), but add sleep 5 before the kill $(for pid in ${flood_pids}; do echo ${pid}; done) 2>/dev/null line?

If this solves the problem, the next step will be to find a way to change the code so that the signal handler does not try to do problematic actions when the state is not TEST_RUNNING, and maybe add different locks, or something like that.

Sure, this should modify the test script to:

        for pid in ${flood_pids}; do
                sleep 5
                kill ${pid} 2>/dev/null
                echo ${pid}
        done

right?

IIUC, after adding sleep 5 before the kill, the symptom still persists, whether using the binary shipped with openEuler or one compiled from master.

Here is the test script I'm currently using. You can try it in your local test environment. Thanks to @ikernel-mryao for simplifying the test suite to shorten the test time.
Theoretically, this problem can be reproduced on most Linux distributions, but reproduction requires a certain number of processors (I actually have 64 cores). As I mentioned at the beginning, the more processor cores, the easier it is to reproduce.

@xTire
Contributor Author

xTire commented Aug 31, 2024

Good observation about exit()! This may be the right direction for a solution. I tried running iperf3 as a daemon using the PR #1752 code, and after the server starts the first test, it cannot be terminated using SIGTERM. In addition, ^C and SIGTERM also don't kill the client.

Oops, my bad. I didn't notice that the client needs to use a signal handler to control its exit logic. But no abnormal exit could be reproduced after PR #1752. This at least shows that there is a race condition in the multi-threaded response to the signal.

Hi @davidBar-On,
I wonder if I can ask you to review da21dc2 and 33a4b96? Smoke testing suggests that both the inability to terminate the client via ^C and this issue are fixed. I really don't want to mess up any existing facilities. TIA.

If exit() is the real problem, then the solution suggested by this StackOverflow question (using pthread_mutex_lock() before the exit()) may be the right approach.

Unfortunately, I tried using a mutex to protect exit(), but it still didn't solve the issue. Please forgive my unfamiliarity with the code. So far, I personally have no good way to ensure that only the main thread calls exit().

I think the reason why using a mutex to protect exit() did not work before is that the scope of the critical section was not correctly determined. When iperf_got_sigend() was called, both threads were already running very close to the point where exit(1) was called. A critical section that was too small was not enough to cover exit(), and a critical section that was too large might affect existing functionality and even cause a deadlock.

It would be great if gdb could parse all the call traces, but in fact it can't, and that is the obstacle to locating the problem.

If you can reproduce this problem locally and collect enough core files, you will eventually find a thread calling the alias __GI_exit(), which is a sign that glibc has entered the exit() implementation (gdb info follows). This is why I keep thinking that exit() is the root cause of this issue. Of the two threads, the one whose call stack cannot be parsed is the thread that actually causes the segfault, but it is a victim.

[New LWP 3463634]
[New LWP 3463580]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib64/libthread_db.so.1".
Core was generated by `/usr/local/bin/iperf3 -s -DB 10.0.0.8 -p 4'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f54a2df90fd in state_to_text (state=0 '\000') at iperf_util.c:614
614             case EXCHANGE_RESULTS: txt = "EXCHANGE_RESULTS"; break;
[Current thread is 1 (Thread 0x7f54a273a640 (LWP 3463634))]
(gdb) t a a bt

Thread 2 (Thread 0x7f54a273b740 (LWP 3463580)):
#0  0x00007f54a2d42cc0 in __do_global_dtors_aux () from /usr/lib64/libssl.so.1.1
#1  0x00007f54a2e11e1b in _dl_fini () at dl-fini.c:139
#2  0x00007f54a279d5cd in __run_exit_handlers (status=1, listp=0x7f54a294b658 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:116
#3  0x00007f54a279d710 in __GI_exit (status=<optimized out>) at exit.c:146
#4  0x00007f54a2dee9dc in iperf_errexit (test=0x152a2a0, format=0x7f54a2dfd8c3 "interrupt - %s") at iperf_error.c:138
#5  0x00007f54a2ded74f in iperf_got_sigend (test=0x152a2a0) at iperf_api.c:4800
#6  0x00000000004013d6 in run (test=0x152a2a0) at main.c:148
#7  0x000000000040134f in main (argc=6, argv=0x7ffd3767ae78) at main.c:124

Thread 1 (Thread 0x7f54a273a640 (LWP 3463634)):
#0  0x00007f54a2df90fd in state_to_text (state=0 '\000') at iperf_util.c:614
#1  0x0000000000000000 in ?? ()

@davidBar-On
Contributor

davidBar-On commented Aug 31, 2024

I wonder if I can ask you to review da21dc2 and 33a4b96? Smoke testing suggests that both the inability to terminate the client via ^C and this issue are fixed. I really don't want to mess up any existing facilities. TIA.

I tested this version, and it seems to work o.k. 😄 For both the server and the client (using -P4) I tried killing the main thread or a "worker" thread, and in all cases termination was as expected. I guess it will anyway be good to also add a mutex to protect the exit() in iperf_errexit().

Did you try running the tests using this version to see if it solves the SIGSEGV problem?

Sure, this should modify the test script to:
...
right?

Yes, this is what I meant (although the sleep 5 can be done just once, before the servers' kill loop). See below for what I think is the reason it didn't help.

reproduction requires a certain number of processors (actually I have 64 cores). As I mentioned at the beginning, the more processor cores, the easier it is to reproduce.

As the test is with only one stream, in principle the number of cores should not have an impact. Therefore, I think the problem might be in the operating system or in the system/network functionality (see below).

If you can reproduce this problem locally and collect enough core files, you will eventually find a thread calling the alias __GI_exit(), which is a sign that glibc has entered the exit() implementation (gdb info follows). This is why I keep thinking that exit() is the root cause of this issue. Of the two threads, the one whose call stack cannot be parsed is the thread that actually causes the segfault, but it is a victim.

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f54a2df90fd in state_to_text (state=0 '\000') at iperf_util.c:614
614             case EXCHANGE_RESULTS: txt = "EXCHANGE_RESULTS"; break;

Unfortunately, I am not able to reproduce the problem. However, from this and the other gdb outputs I suspect that the root cause is related to the operating system's multi-core processing. When everything works o.k., the server cancels all the "worker" threads when the client ends execution (maybe with some small delay; this is why I asked for the sleep 5).

Since both the main and the "worker" thread are still active when the server is being killed (even after the 5-second sleep), I can think of two ways this can happen:

  1. At the end of the test, the client sends the IPERF_DONE state to the server and then exits immediately. At that point the server is in a read() call, waiting to receive the state. If for some reason the sent state did not reach the server because of some system/network issue, or the operating system did not terminate the read() with an error, the two threads of the server will still be active when the server is killed.
  2. Similar to the above, but the client terminated for some reason before the normal end. In this case I also expect the server to be in a read() (or write()) call when killed.

In both cases, it may be that iperf3 receives a return code from read() that it doesn't handle correctly, etc. To check these conjectures, the following may be done:

  1. After the sleep 5, but before killing the server, list all the active iperf3 threads, e.g. with ps -afT | grep iperf3. It may also help to run this command before waiting for the client termination, to see the active threads while the test is active.
  2. If possible, set the -V --debug=3 options on both the client and the server and show their output when the problem happens. This may help to understand what happens (for example, this will show the state changes of the processes).
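
Regarding the read() return-code conjecture above, a minimal generic POSIX sketch (not iperf3's code) of the cases a blocking control-connection read has to distinguish:

#include <errno.h>
#include <unistd.h>

/* Returns 1 when a state byte was read, 0 if the peer closed the
 * connection (e.g. the client already exited), -1 on a real error. */
int read_state_byte(int fd, char *state)
{
    for (;;) {
        ssize_t n = read(fd, state, 1);
        if (n == 1)
            return 1;
        if (n == 0)
            return 0;                /* orderly shutdown by the peer */
        if (errno == EINTR)
            continue;                /* interrupted by a signal: retry */
        return -1;
    }
}

int main(void)
{
    int fds[2];
    char state;

    pipe(fds);
    write(fds[1], "\x10", 1);
    read_state_byte(fds[0], &state); /* -> 1 */
    close(fds[1]);
    read_state_byte(fds[0], &state); /* -> 0 once the writer side is closed */
    return 0;
}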

@xTire
Contributor Author

xTire commented Sep 1, 2024

I tested this version, and it seems to work o.k. 😄 For both the server and the client (using -P4) I tried killing the main thread or a "worker" thread, and in all cases termination was as expected. I guess it will anyway be good to also add a mutex to protect the exit() in iperf_errexit().

Thank you for your time.
I'm not against using a mutex, but I'm not confident about the scope of the critical section. It would be much appreciated if you could add that protection.

Did you try running the tests using this version to see if it solves the SIGSEGV problem?

Yes, I ran the test script through another 1000 loops, and no new segfault.

As the test is with only one stream, in principle the number of cores should not have an impact. Therefore, I think the problem might be in the operating system (see below).

Yes, multi-core processors do not impact the kill loop. But they do impact the scheduling of multiple threads, especially when multiple threads enter the signal handler at the same time to respond to the signal. This process is asynchronous, so a mutex or some other synchronization is needed.

If you can reproduce this problem locally and collect enough core files, you will eventually find a thread calling the alias __GI_exit(), which is a sign that glibc has entered the exit() implementation (gdb info follows). This is why I keep thinking that exit() is the root cause of this issue. Of the two threads, the one whose call stack cannot be parsed is the thread that actually causes the segfault, but it is a victim.

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f54a2df90fd in state_to_text (state=0 '\000') at iperf_util.c:614
614             case EXCHANGE_RESULTS: txt = "EXCHANGE_RESULTS"; break;

Unfortunately, I am not able to reproduce the problem.
...

If the number of processors is sufficient, it is recommended to use bare metal to reproduce the issue. The probability of reproducing it in a virtual machine environment is very low. In another test environment of mine, a virtual machine with 48 processor cores was assigned, and only 5 segmentation fault records were obtained after running the test script 200 times.

Let's go back to iperf_util.c:614, which caused the segmentation fault. The assignment of a string constant to the local variable char *txt caused the segmentation fault. Since char *txt does not point to any address outside the scope of the function, I don't know what else can explain this phenomenon except that the resources backing the function state_to_text() were released or destroyed.
Anyway, this is just one of many coredumps, and you'll notice that the stack trace of each segfault looks different and always occurs in some unlikely code (see the attachment below).

In both cases, it may be that iperf3 receives a return code from read() that it doesn't handle correctly, etc. To check these conjunctures, the following may be done:

  1. After the sleep(5), but before killing the server, list all the active iperf3 threads, e.g. by ps -afT | grep iperf3. It may also help to run this command before waiting for the client termination, to see the active threads while the test is active.

Before the kill command in the script is executed, filter htop with the 'iperf3' keyword to obtain the following. The white entries in the "Command" column are processes, and the green ones are threads. Similar results were obtained with ps -afT.
[screenshot: htop output filtered by 'iperf3']

  2. If possible, set the -V --debug=3 options on both the client and the server and show their output when the problem happens. This may help to understand what happens (for example, this will show the state changes of the processes).

Unfortunately, turning on debug output in my tests made the problem unreproducible. Remember what I mentioned before: if you remove the -D from -s -DB, the problem cannot be reproduced (or, more accurately, it is much harder to reproduce). This is because verbose output adds relatively slow I/O that affects process scheduling. As a result, exit() has enough time to do a complete resource release before another thread is scheduled, so the process has completely exited before the segmentation fault can occur.

In case the problem cannot be reproduced in your environment, I have attached some core files I collected. The core files in the attachment were generated by iperf3 built from commit 67ca2c8; the attachment also contains the test script.
core_issue_1750.zip

Please let me know if I missed anything.

@davidBar-On
Contributor

I'm not against using a mutex, but I'm not confident about the scope of the critical section. It would be much appreciated if you could add that protection.

I believe that a mutex around these lines should be sufficient:

iperf/src/iperf_error.c

Lines 137 to 139 in 7679199

if (test)
    iperf_delete_pidfile(test);
exit(1);

In addition, the iperf_timestrerr should be changed to a local variable in iperf_err() and iperf_errexit(), instead of being global. (These changes can be done in a separate PR, but I think it is better to add them to #1752.)

Yes, I ran the test script through another 1000 loops, and no new segfault.

That's great! I believe that in this case PR #1752 is good enough to resolve the problem, even if the root cause is not fully understood.

I don't know what else can explain this phenomenon except that the resources corresponding to the function state_to_text() are released or destroyed.

I fully agree. The release of the resources by a "worker" thread (which your PR #1752 fixes) caused the fault in the main thread. What I still don't understand is why the "worker" thread is still active at that point.

Before the kill command in the script is executed, filter htop with the 'iperf3' keyword to obtain the following.

Is this taken after the wait for the clients to terminate? If so, it is very strange that the clients are still active, as the script should not have reached the kill line before all clients ended execution. Do you have the output of the echo ${pid}, or can you even print "pids", so it will be clear whether the script waited for all the clients to end?

In case the problem cannot be reproduced in your environment, I have attached some core files I collected. The core files in the attachment are generated by ipef3 compiled by commit 67ca2c8, and also contains test script.
core_issue_1750.zip

Thanks a lot! I will try to evaluate these core dumps later to see if they can provide any further insight about the problem.

@xTire
Contributor Author

xTire commented Sep 2, 2024

I believe that a mutex around these lines should be sufficient:

iperf/src/iperf_error.c

Lines 137 to 139 in 7679199

if (test)
iperf_delete_pidfile(test);
exit(1);

In addition, the iperf_timestrerr should be changed to a local variable in iperf_err() and iperf_errexit(), instead of being global. (These changes can be done in a separate PR, but I think it is better to add them to #1752.)

Sorry for posting the modification directly here, as it does not fix the SIGSEGV issue. For your double-check:

diff --git a/src/iperf.h b/src/iperf.h
index f297587..8757402 100644
--- a/src/iperf.h
+++ b/src/iperf.h
@@ -288,6 +288,7 @@ enum debug_level {
 struct iperf_test
 {
     pthread_mutex_t print_mutex;
+    pthread_mutex_t exit_mutex;

     char      role;                             /* 'c' lient or 's' erver */
     enum iperf_mode mode;
diff --git a/src/iperf_api.c b/src/iperf_api.c
index dafbadf..7d71fee 100644
--- a/src/iperf_api.c
+++ b/src/iperf_api.c
@@ -2875,6 +2875,11 @@ iperf_new_test()
         perror("iperf_new_test: pthread_mutex_init");
     }

+
+    if (pthread_mutex_init(&(test->exit_mutex), &mutexattr) != 0) {
+        perror("iperf_new_test: pthread_mutex_init");
+    }
+
     pthread_mutexattr_destroy(&mutexattr);

     test->settings = (struct iperf_settings *) malloc(sizeof(struct iperf_settings));
@@ -3137,6 +3142,13 @@ iperf_free_test(struct iperf_test *test)
         errno = rc;
         perror("iperf_free_test: pthread_mutex_destroy");
     }
+
+    /* Destroy exit mutex */
+    rc = pthread_mutex_destroy(&(test->exit_mutex));
+    if (rc != 0) {
+        errno = rc;
+        perror("iperf_free_test: pthread_mutex_destroy");
+    }

     if (test->logfile) {
        free(test->logfile);
diff --git a/src/iperf_error.c b/src/iperf_error.c
index ce925a8..0b34f92 100644
--- a/src/iperf_error.c
+++ b/src/iperf_error.c
@@ -35,7 +35,6 @@

 int gerror;

-char iperf_timestrerr[100];

 /* Do a printf to stderr. */
 void
@@ -43,6 +42,7 @@ iperf_err(struct iperf_test *test, const char *format, ...)
 {
     va_list argp;
     char str[1000];
+    char iperf_timestrerr[100];
     time_t now;
     struct tm *ltm = NULL;
     char *ct = NULL;
@@ -91,6 +91,7 @@ iperf_errexit(struct iperf_test *test, const char *format, ...)
 {
     va_list argp;
     char str[1000];
+    char iperf_timestrerr[100];
     time_t now;
     struct tm *ltm = NULL;
     char *ct = NULL;
@@ -134,9 +135,20 @@ iperf_errexit(struct iperf_test *test, const char *format, ...)
     }

     va_end(argp);
+
+    if (pthread_mutex_lock(&(test->exit_mutex)) != 0) {
+           perror("iperf_errexit: pthread_mutex_lock:exit_mutext");
+    }
+
     if (test)
         iperf_delete_pidfile(test);
     exit(1);
+
+    /* Never run here, is unlock really needed? */
+    if (pthread_mutex_unlock(&(test->exit_mutex)) != 0) {
+           perror("iperf_errexit: pthread_mutex_unlock:exit_mutext");
+    }
+
 }

 int i_errno;

After the modification, a SIGSEGV occurred around the 3rd script loop:
[screenshot: terminal output showing the SIGSEGV]
and gdb said:

Using host libthread_db library "/usr/lib64/libthread_db.so.1".
Core was generated by `/usr/local/bin/iperf3 -s -DB 10.0.0.11 -p 7'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000000000000 in ?? ()
[Current thread is 1 (Thread 0x7f7d06225640 (LWP 3390660))]
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x000000000000006e in ?? ()
#2  0x0000000000000000 in ?? ()
(gdb) t a a bt

Thread 2 (Thread 0x7f7d06226740 (LWP 3390641)):
#0  0x00007f7d062b5b40 in buffered_vfprintf (s=0x7ffc4bedbf10, format=format@entry=0x7f7d068e8c44 "iperf3: %s\n", args=args@entry=0x7ffc4bede050, mode_flags=mode_flags@entry=0) at vfprintf-internal.c:2275
#1  0x00007f7d062b4c1b in __vfprintf_internal (s=<optimized out>, format=0x7f7d068e8c44 "iperf3: %s\n", ap=ap@entry=0x7ffc4bede050, mode_flags=mode_flags@entry=0) at vfprintf-internal.c:1377
#2  0x00007f7d062a05ca in __fprintf (stream=<optimized out>, format=<optimized out>) at fprintf.c:32
#3  0x00007f7d068d9a22 in iperf_errexit (test=0x1fae2a0, format=0x7f7d068e88c3 "interrupt - %s") at iperf_error.c:129
#4  0x00007f7d068d87da in iperf_got_sigend (test=0x1fae2a0) at iperf_api.c:4812
#5  0x00000000004013d6 in run (test=0x1fae2a0) at main.c:148
#6  0x000000000040134f in main (argc=6, argv=0x7ffc4bede838) at main.c:124

Thread 1 (Thread 0x7f7d06225640 (LWP 3390660)):
#0  0x0000000000000000 in ?? ()
#1  0x000000000000006e in ?? ()
#2  0x0000000000000000 in ?? ()
(gdb)

To my uneducated eye, maybe a separate PR is needed. Based on the previous discussion, being able to reproduce the problem locally should be very helpful.

Do you have the output of the echo ${pid}, or can you even print "pids", so it will be clear whether the script waited for all the clients to end?

Please ignore the htop output; the timing may not be correct (but at least it shows that the problem can be reproduced on a 64-core processor 😄).
After modifying the script to the following content and testing it, it shows that the clients can exit.

        for pid in ${flood_pids}; do
                sleep 5
                kill ${pid} 2>/dev/null
                echo ${pid}
                cat /proc/${pid}/cmdline
        done

test output:
[screenshot: test output]
In addition, while the above test was running, we could see that the script ran two rounds and a SIGSEGV still occurred, but the terminal did not display it.

@davidBar-On
Contributor

@xTire, thanks for continuing to put effort into testing this issue.

Sorry for posting the modification directly here, as it does not fix the SIGSEGV issue. For your double check:
....
To my uneducated eye, maybe a separated PR is needed.

Maybe you are right that a separate PR is required, but if possible, I suggest another try. It may be that the problem with your solution is that exit_mutex is defined as part of the iperf_test structure. As the first exit releases test, the exit_mutex no longer exists. Therefore, since only iperf_errexit() uses it, I suggest defining exit_mutex as a static variable in iperf_error.c instead of as part of the iperf_test structure.
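
A minimal sketch of that suggestion (the helper name and the cleanup call are stand-ins, not the actual iperf3 code):

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* File-scope, so it does not depend on the iperf_test structure,
 * which may already have been freed by the time a second thread
 * reaches the exit path. */
static pthread_mutex_t exit_mutex = PTHREAD_MUTEX_INITIALIZER;

static void errexit_once(void *test)     /* stand-in for iperf_errexit()'s tail */
{
    /* The first thread to arrive holds the lock until exit(); any later
     * thread blocks here instead of racing on the shared test state. */
    pthread_mutex_lock(&exit_mutex);
    if (test)
        fprintf(stderr, "cleanup (e.g. iperf_delete_pidfile) would run here\n");
    exit(1);
}

int main(void)
{
    errexit_once(NULL);
}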

/* Never run here, is unlock really needed? */

I don't know whether this is needed ... For now, I believe it is o.k. to keep it.

After modifying the script to the following content and testing it, it shows that the client can exit.

 for pid in ${flood_pids}; do
            sleep 5
            kill ${pid} 2>/dev/null
            echo ${pid}
            cat /proc/${pid}/cmdline
    done

The above change shows the status of the main Server threads (the flood_pids) after the kill. It does not show the status before the kill and does not show the Client processes and threads (the Client main threads are in pids). I think something like the below may help:

        # shellcheck disable=SC2046,SC2086 # word splitting wanted here
        echo Waiting for Clients to terminate: ${pids}
        ps -afT | grep iperf3
        wait $(for pid in ${pids}; do echo ${pid}; done)
        echo Ended waiting for Clients to terminate
        ps -afT | grep iperf3
        for pid in ${pids}; do
                echo ${pid}
                cat /proc/${pid}/cmdline
        done

        # shellcheck disable=SC2046,SC2086
        # test added (https://github.com/esnet/iperf/issues/1750#issuecomment-2317382909)
        sleep 5
        echo Killing server: ${flood_pids}
        ps -afT | grep iperf3
        for pid in ${flood_pids}; do
                echo ${pid} before kill
                kill ${pid} 2>/dev/null
                echo ${pid} after kill
                cat /proc/${pid}/cmdline
        done

(Note that I will probably not be available for about the next two weeks, although I hope there will be others who can help, especially if they have the environment for reproducing the issue.)

@xTire
Contributor Author

xTire commented Sep 3, 2024

It may be that the problem with your solution is that exit_mutex is defined as part of the iperf_test structure. As the first exit releases test, the exit_mutex no longer exists. Therefore, since only iperf_errexit() uses it, I suggest defining exit_mutex as a static variable in iperf_error.c instead of as part of the iperf_test structure.

I just tried it and it still doesn't work.

As you mentioned, the mutex belonging to test may have been released. However, once exit() is called, any resource, whether local or global, static or dynamic, may be released. IMHO, changing the variable's scope is futile. This is why I said earlier that the critical section cannot be determined.

In my opinion, the following three conditions need to be met at the same time to trigger this problem:

  1. Multithreading
  2. Signal handling
  3. Calling exit()

As @ikernel-mryao mentioned,

I have also reproduced this issue, and I found the first commit that caused the coredump.
commit url: 277b84b

I guess this commit is where the worker sub-thread was first introduced, so rolling the code back removes condition 1; PR #1752 only allows the main thread to call the signal handler, which also removes condition 1.
Avoiding condition 2 is probably impossible. To remove condition 3, exit() would need to be moved out of the signal handler, which would impact the iperf3 API.

So far, PR #1752 is a lower cost solution unless it introduces a new regression.

The above change shows the status of the main Server threads (the flood_pids) after the kill. It does not show the status before the kill and does not show the Client processes and threads (the Client main threads are in pids). I think something like the below may help:
...

TEST: concurrency
  net,port,net                                                  Waiting for Clients to terminate:  4018532 4018533 4018535 4018537 4018539 4018541 4018543 4018545 4018548 4018551 4018554 4018557 4018559 4018563 4018565 4018570 4018574 4018581 4018585 4018589 4018592 4018598 4018606 4018614 4018618 4018624 4018631 4018642 4018648 4018655 4018663 4018672 4018678 4018688 4018699 4018709 4018720 4018733 4018740 4018753 4018764 4018772 4018784 4018789 4018800 4018815 4018824 4018834 4018843 4018852 4018866 4018873 4018887 4018896 4018908 4018913 4018924 4018930 4018941 4018951 4018962 4018971 4018977 4018985
root     4018510 4018510 4018381  0 17:07 pts/2    00:00:00 /usr/local/bin/iperf3 -c 10.0.0.6 -p 2 -B 10.0.7.214 -l16 -t 1000
root     4018510 4018512 4018381 48 17:07 pts/2    00:00:03 /usr/local/bin/iperf3 -c 10.0.0.6 -p 2 -B 10.0.7.214 -l16 -t 1000
root     4018513 4018513 4018399  0 17:07 pts/2    00:00:00 /usr/local/bin/iperf3 -c 10.0.0.7 -p 3 -B 10.0.7.215 -l16 -t 1000
root     4018513 4018515 4018399 48 17:07 pts/2    00:00:03 /usr/local/bin/iperf3 -c 10.0.0.7 -p 3 -B 10.0.7.215 -l16 -t 1000
root     4018516 4018516 4018422  0 17:07 pts/2    00:00:00 /usr/local/bin/iperf3 -c 10.0.0.8 -p 4 -B 10.0.7.216 -l16 -t 1000
root     4018516 4018518 4018422 47 17:07 pts/2    00:00:03 /usr/local/bin/iperf3 -c 10.0.0.8 -p 4 -B 10.0.7.216 -l16 -t 1000
root     4018519 4018519 4018442  0 17:07 pts/2    00:00:00 /usr/local/bin/iperf3 -c 10.0.0.9 -p 5 -B 10.0.7.217 -l16 -t 1000
root     4018519 4018520 4018442 47 17:07 pts/2    00:00:03 /usr/local/bin/iperf3 -c 10.0.0.9 -p 5 -B 10.0.7.217 -l16 -t 1000
root     4018522 4018522 4018465  0 17:07 pts/2    00:00:00 /usr/local/bin/iperf3 -c 10.0.0.10 -p 6 -B 10.0.7.218 -l16 -t 1000
root     4018522 4018525 4018465 47 17:07 pts/2    00:00:03 /usr/local/bin/iperf3 -c 10.0.0.10 -p 6 -B 10.0.7.218 -l16 -t 1000
root     4018526 4018526 4018488  0 17:07 pts/2    00:00:00 /usr/local/bin/iperf3 -c 10.0.0.11 -p 7 -B 10.0.7.219 -l16 -t 1000
root     4018526 4018529 4018488 48 17:07 pts/2    00:00:03 /usr/local/bin/iperf3 -c 10.0.0.11 -p 7 -B 10.0.7.219 -l16 -t 1000
root     4019001 4019001 4018329  0 17:07 pts/2    00:00:00 grep iperf3
Ended waiting for Clients to terminate
root     4064540 4064540 4018329  0 17:08 pts/2    00:00:00 grep iperf3
4018532
cat: /proc/4018532/cmdline: No such file or directory
4018533
cat: /proc/4018533/cmdline: No such file or directory
4018535
cat: /proc/4018535/cmdline: No such file or directory
4018537
cat: /proc/4018537/cmdline: No such file or directory
4018539
cat: /proc/4018539/cmdline: No such file or directory
4018541
cat: /proc/4018541/cmdline: No such file or directory
4018543
cat: /proc/4018543/cmdline: No such file or directory
4018545
cat: /proc/4018545/cmdline: No such file or directory
4018548
cat: /proc/4018548/cmdline: No such file or directory
4018551
cat: /proc/4018551/cmdline: No such file or directory
4018554
cat: /proc/4018554/cmdline: No such file or directory
4018557
cat: /proc/4018557/cmdline: No such file or directory
4018559
cat: /proc/4018559/cmdline: No such file or directory
4018563
cat: /proc/4018563/cmdline: No such file or directory
4018565
cat: /proc/4018565/cmdline: No such file or directory
4018570
cat: /proc/4018570/cmdline: No such file or directory
4018574
cat: /proc/4018574/cmdline: No such file or directory
4018581
cat: /proc/4018581/cmdline: No such file or directory
4018585
cat: /proc/4018585/cmdline: No such file or directory
4018589
cat: /proc/4018589/cmdline: No such file or directory
4018592
cat: /proc/4018592/cmdline: No such file or directory
4018598
cat: /proc/4018598/cmdline: No such file or directory
4018606
cat: /proc/4018606/cmdline: No such file or directory
4018614
cat: /proc/4018614/cmdline: No such file or directory
4018618
cat: /proc/4018618/cmdline: No such file or directory
4018624
cat: /proc/4018624/cmdline: No such file or directory
4018631
cat: /proc/4018631/cmdline: No such file or directory
4018642
cat: /proc/4018642/cmdline: No such file or directory
4018648
cat: /proc/4018648/cmdline: No such file or directory
4018655
cat: /proc/4018655/cmdline: No such file or directory
4018663
cat: /proc/4018663/cmdline: No such file or directory
4018672
cat: /proc/4018672/cmdline: No such file or directory
4018678
cat: /proc/4018678/cmdline: No such file or directory
4018688
cat: /proc/4018688/cmdline: No such file or directory
4018699
cat: /proc/4018699/cmdline: No such file or directory
4018709
cat: /proc/4018709/cmdline: No such file or directory
4018720
cat: /proc/4018720/cmdline: No such file or directory
4018733
cat: /proc/4018733/cmdline: No such file or directory
4018740
cat: /proc/4018740/cmdline: No such file or directory
4018753
cat: /proc/4018753/cmdline: No such file or directory
4018764
cat: /proc/4018764/cmdline: No such file or directory
4018772
cat: /proc/4018772/cmdline: No such file or directory
4018784
cat: /proc/4018784/cmdline: No such file or directory
4018789
cat: /proc/4018789/cmdline: No such file or directory
4018800
cat: /proc/4018800/cmdline: No such file or directory
4018815
cat: /proc/4018815/cmdline: No such file or directory
4018824
cat: /proc/4018824/cmdline: No such file or directory
4018834
cat: /proc/4018834/cmdline: No such file or directory
4018843
cat: /proc/4018843/cmdline: No such file or directory
4018852
cat: /proc/4018852/cmdline: No such file or directory
4018866
cat: /proc/4018866/cmdline: No such file or directory
4018873
cat: /proc/4018873/cmdline: No such file or directory
4018887
cat: /proc/4018887/cmdline: No such file or directory
4018896
cat: /proc/4018896/cmdline: No such file or directory
4018908
cat: /proc/4018908/cmdline: No such file or directory
4018913
cat: /proc/4018913/cmdline: No such file or directory
4018924
cat: /proc/4018924/cmdline: No such file or directory
4018930
cat: /proc/4018930/cmdline: No such file or directory
4018941
cat: /proc/4018941/cmdline: No such file or directory
4018951
cat: /proc/4018951/cmdline: No such file or directory
4018962
cat: /proc/4018962/cmdline: No such file or directory
4018971
cat: /proc/4018971/cmdline: No such file or directory
4018977
cat: /proc/4018977/cmdline: No such file or directory
4018985
cat: /proc/4018985/cmdline: No such file or directory
Killing server:  4018381 4018399 4018422 4018442 4018465 4018488
root     4064607 4064607 4018329  0 17:08 pts/2    00:00:00 grep iperf3
4018381 before kill
4018381 after kill
cat: /proc/4018381/cmdline: No such file or directory
4018399 before kill
4018399 after kill
cat: /proc/4018399/cmdline: No such file or directory
4018422 before kill
4018422 after kill
cat: /proc/4018422/cmdline: No such file or directory
4018442 before kill
4018442 after kill
cat: /proc/4018442/cmdline: No such file or directory
4018465 before kill
4018465 after kill
cat: /proc/4018465/cmdline: No such file or directory
4018488 before kill
4018488 after kill
cat: /proc/4018488/cmdline: No such file or directory
[ OK ]

(Note that I will probably not be available for about the next two weeks, although I hope there will be others who can help, especially if they have the environment for reproducing the issue.)

Unfortunately, I may soon lose access to the physical machine environment, as the machine will be needed for other purposes. I really hope someone else can reproduce this problem; so far it seems that all you need is enough processor cores.

@davidBar-On
Contributor

So far, PR #1752 is a lower cost solution unless it introduces a new regression.

With your tests and analysis I now fully agree that PR #1752 should remain as is and that the exit() mutex should be handled separately (if at all).

The above change shows the status of the main Server threads (the flood_pids) after the kill. It does not show the status before the kill and does not show the Client processes and threads (the Client main threads are in pids). I think something like the below may help:
...

From the output I see that I completely misunderstood the script and that pids does not hold the clients' pids. Also, the server processes are not shown in the grep (are they running on a different machine)? I don't have time now to understand this further, but it would help if we could see the list of the running clients and servers and their threads: before waiting for the clients to terminate, after waiting, and 5 seconds after that but before killing the servers. This will give a better understanding of the running threads before the servers are killed, since the main issue is why the servers have more than one running thread at that point.

(As I wrote before, I will probably not be able to respond further for about the next two weeks, but if you have more info about the processes before/after the kill, it would be helpful to show it. Thanks.)

@xTire
Contributor Author

xTire commented Sep 5, 2024

From the output I see that I completely misunderstood the script and that pids does not hold the clients' pids. Also, the server processes are not shown in the grep (are they running on a different machine)?

pids holds the shell subprocesses of the test script, as the content of /proc/[PID]/cmdline is the name of the test script.
So far, all data uploaded or shown in this issue are from the same machine with 64 processor cores.

@ikernel-mryao

ikernel-mryao commented Sep 6, 2024

Let me briefly introduce the logic of the nft_concat_range.sh script.
In this script, we set the value of TESTS to "concurrency" and the value of TYPES to "net_port_net" (these two settings can reproduce the problem), and use this as an example to illustrate the general logic:

First of all, it should be noted that the key function is test_concurrency.

  1. Create a new namespace B and set up the virtual Ethernet device veth_b to send traffic, while the main namespace sets up the virtual Ethernet device veth_a to receive traffic, and configure the default route so that the two can communicate.

  2. Execute 6 flood processes concurrently in the background, namely flood_pids. The first flood process creates two groups of flow tests: in the first group the server command is iperf3 -s -DB 10.0.0.6 -p 2 and the client command is iperf3 -c 10.0.0.6 -p 2 -B 10.0.7.214 -l16 -t 1000; in the second group the server command is iperf3 -s -DB 10.0.0.7 -p 3 and the client command is iperf3 -c 10.0.0.7 -p 3 -B 10.0.7.215 -l16 -t 1000. Note that iperf3 -s xxx runs in the main namespace (traffic on veth_a) and iperf3 -c xxx runs in namespace B (traffic on veth_b). By analogy, the second flood process creates three groups of flow tests, the third flood process creates four groups of flow tests, and so on. The IP addresses and port numbers inside each flood process are different. In addition, each flood process configures MAC addresses for veth_a and veth_b.

  3. $nproc pid processes (the number is equal to the number of CPU cores on the machine), also known as pids, are executed concurrently in the background. Each pid process repeatedly adds and deletes the IP addresses, port numbers and other information of the flood processes in the nftables test set. Their purpose is to keep changing the network traffic rules.

  4. Finally, we reach the wait and kill commands that we are familiar with and have often mentioned:

	# shellcheck disable=SC2046,SC2086 # word splitting wanted here
	wait $(for pid in ${pids}; do echo ${pid}; done)
	# shellcheck disable=SC2046,SC2086
	kill $(for pid in ${flood_pids}; do echo ${pid}; done) 2>/dev/null
	# shellcheck disable=SC2046,SC2086
	wait $(for pid in ${flood_pids}; do echo ${pid}; done) 2>/dev/null
