slow udp stream generation with message size 1400B #899

Open
olichtne opened this issue Jul 16, 2019 · 4 comments

@olichtne
Contributor

olichtne commented Jul 16, 2019

Context

  • Version of iperf3: master

  • Hardware: 10G ixgbe nic, other parts are probably not relevant

  • Operating system (and distribution, if any): RHEL8.0 but should be reproducible on other Linux distros

  • Other relevant information (for example, non-default compilers,
    libraries, cross-compiling, etc.):

Bug Report

  • Expected Behavior

When running

iperf3 -c <dst_ip> -B <src_ip> -u -b 0 -t 10 -l 1400 -A <cpuid>

I'd expect the iperf generator to either reach line-rate throughput (close to 10 Gbps) or fully utilize a single CPU core (meaning the processor isn't fast enough to generate more packets per second).

  • Actual Behavior

The generator only reaches 4.3 Gbps, while utilization of the single CPU core is only ~40%.

Running netperf in a similar configuration (a basic IPv4 UDP stream with 1400-byte messages) does reach line rate without issues.

  • Steps to Reproduce
  1. configure a simple network of 2 hosts with a connection capable of 10Gbps
  2. start an iperf3 server
  3. run the client with the following command, replacing the <> values with whatever you configured: iperf3 -c <dst_ip> -B <src_ip> -u -b 0 -t 10 -l 1400 -A <cpuid>
  • Possible Solution

I investigated this issue with a colleague who has experience with UDP kernel development, and we found that it seems to be caused by an unfortunate combination of bursty UDP stream generation and the fact that after every 10 write() calls on the test stream, select() is called to check on the control connection to the server.

What seems to happen is that a number of writes to the test socket fill its buffer for a short time, at which point the select() call suspends the iperf client process, and waking the process up again takes a bit of time. Since this happens every 10 writes, we think it can affect generator performance.
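The pattern described above can be sketched roughly as follows. This is a simplified illustration, not the actual iperf3 code: the function name send_iteration and its parameters are made up for this example, and the assumption that a single select() watches both the control socket (for reading) and the data socket (for writing) is a simplification.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, NOT the real iperf3 implementation.
 * One iteration of a generator loop: wait in select(), then issue up to
 * `multisend` write() calls on the data socket.
 * Returns the number of successful writes, or -1 if select() fails.
 * *ctrl_ready is set to 1 if the control descriptor became readable. */
int send_iteration(int data_fd, int ctrl_fd, const void *buf, size_t len,
                   int multisend, int *ctrl_ready)
{
    fd_set read_set, write_set;
    int max_fd = data_fd > ctrl_fd ? data_fd : ctrl_fd;
    int sent = 0;

    FD_ZERO(&read_set);
    FD_ZERO(&write_set);
    FD_SET(ctrl_fd, &read_set);
    FD_SET(data_fd, &write_set);

    /* The process may be put to sleep here when the send buffer is full
     * and nothing is pending on the control descriptor. */
    if (select(max_fd + 1, &read_set, &write_set, NULL, NULL) < 0)
        return -1;

    *ctrl_ready = FD_ISSET(ctrl_fd, &read_set) ? 1 : 0;

    if (FD_ISSET(data_fd, &write_set)) {
        for (int i = 0; i < multisend; i++) {
            if (write(data_fd, buf, len) < 0)
                break;          /* stop the burst on the first error */
            sent++;
        }
    }
    return sent;
}
```

With multisend set to 10, the process goes through select() once per 10 datagrams. A larger value amortizes that call over more writes.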

We reached this conclusion after:

  • comparing the strace output of iperf and netperf
    • both create and configure the socket in comparable ways
    • one difference is that iperf uses write and netperf uses sendto, but this shouldn't have the measured impact
    • netperf doesn't interleave sendto with select calls, which means that when the socket buffer fills up for a moment the sendto call blocks, and we suspect that waking up from this kind of blocking might be faster than from the select call in iperf
  • we found that the value 10 comes from https://github.com/esnet/iperf/blob/master/src/iperf_api.c#L2331 and when it is changed to a larger value (e.g. 1000) the issue disappears

When looking for possible solutions, I found that a larger send buffer size (set with the -w argument) also works around the issue. However, since netperf doesn't configure this and uses the default send buffer size, I don't consider this a valid solution.
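For reference, enlarging the send buffer on a socket generally comes down to a setsockopt() call with SO_SNDBUF. The sketch below assumes that is roughly what the -w workaround does for the sending side; the helper name set_send_buffer is made up for this example and is not taken from iperf3.

```c
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: request a send buffer of `bytes` bytes and return
 * the size the kernel reports afterwards, or -1 on error. */
int set_send_buffer(int fd, int bytes)
{
    int actual = 0;
    socklen_t len = sizeof(actual);

    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    /* Read the value back: the kernel may adjust the request. */
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &actual, &len) < 0)
        return -1;
    return actual;
}
```

On Linux the reported value is typically double the requested one and is capped by the net.core.wmem_max sysctl, so the value read back can differ from what was asked for.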

Configuring a burst value with -b 0/1000 should override the multisend variable; however, this currently doesn't work, and I submitted pull request #898 to fix it. That said, I'm not sure whether this is a good and intended way to make iperf faster for UDP streams. Maybe the "multisend" variable should also be configurable via a separate CLI argument?

I'm willing to look into implementing a solution and sending a pull request after discussing it here.

@bmah888
Contributor

bmah888 commented Aug 23, 2019

So with respect to the burst mode, I'm wondering if it would make sense to have something in the argument processing to force the burst size to zero if we're doing an "unlimited" bitrate UDP test. That would keep the special-casing out of the time-critical code. I still need to respond to your other points, but I was actually staring at, and thinking about, your pull request several times this week.

@olichtne
Contributor Author

I think that makes sense as long as the argument processing logic for these values can be overridden somehow.

Currently, the logic that decides how big a burst to send is part of the iperf_send function. Moving it out and extending it to recognize an "unlimited" UDP test is a good idea, but IMO it still just implements a "default/recommended value" that the tester should be able to control directly, which would mean either introducing a new argument or splitting the -b argument into two.

@bmah888
Contributor

bmah888 commented Apr 28, 2020

Having lost track of this dialog (apologies), are we done here and can I close this issue? It looks like the pull request that I merged mostly fixes the problem you were seeing. Thanks.

@olichtne
Contributor Author

I think that depends on whether you view "burst mode" and "multisend" as the same thing...

To simplify, the original problem is that a single iteration of the generator loop (implemented in the iperf_send function) calls the snd function (i.e. a write to a socket) ten times and then calls select() on the control socket. This 10:1 ratio means that flow generation is suboptimal when packet sizes are smaller. Suboptimal here means that the generated flow is slower than the line rate of the NIC while the CPU core is not fully utilized - if iperf could run faster, it could generate more packets.
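To put rough numbers on the 10:1 ratio, here is a back-of-the-envelope calculation. It is only an illustration, not a measurement from this thread: it counts payload bits only (ignoring UDP/IP/Ethernet overhead), and the function names are made up for this example.

```c
#include <stdint.h>

/* Datagrams per second needed to carry `bits_per_sec` of payload
 * (integer division, header overhead ignored). */
uint64_t datagrams_per_sec(uint64_t bits_per_sec, uint64_t payload_bytes)
{
    return bits_per_sec / (payload_bytes * 8);
}

/* select() calls per second if one select() follows every
 * `writes_per_select` datagram writes. */
uint64_t selects_per_sec(uint64_t bits_per_sec, uint64_t payload_bytes,
                         uint64_t writes_per_select)
{
    return datagrams_per_sec(bits_per_sec, payload_bytes) / writes_per_select;
}
```

Under these assumptions, 10 Gbps of 1400-byte payloads is about 892,857 datagrams per second. That corresponds to roughly 89,285 select() calls per second at a 10:1 ratio, versus roughly 892 at 1000:1.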

The 10:1 ratio can be changed by specifying the burst value; however, it is originally defined by the multisend attribute, set in iperf_defaults as:

 testp->multisend = 10;	/* arbitrary */

The pull request that was merged a year ago was about a bug related to manipulating the multisend value via the burst parameter. I didn't bother creating an issue for that as I was pretty sure that it was an actual bug and I already had a pretty easy fix.

On the other hand I opened this issue because:

  1. the burst mode resolves our one use case, but it's limited to a maximum value of 1000 (defined as MAX_BURST in the iperf.h file)
  2. the multisend value is marked as arbitrary where it is initialized, which made me question whether burst mode is the correct way to resolve the full problem of the 10:1 ratio, or whether there should be a way to configure the multisend value directly as well.
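The relationship between the two values, as described in this thread, could be sketched like this. It is a simplified illustration under stated assumptions, not the actual iperf3 code: both helper functions are hypothetical, and only the MAX_BURST limit of 1000 and the multisend default of 10 come from the discussion.

```c
#define MAX_BURST 1000         /* upper bound on the burst value quoted in the thread */
#define DEFAULT_MULTISEND 10   /* the "arbitrary" default quoted in the thread */

/* Hypothetical: returns 1 if a requested burst value is within bounds. */
int burst_is_valid(int burst)
{
    return burst >= 0 && burst <= MAX_BURST;
}

/* Hypothetical: number of writes per loop iteration. A nonzero burst
 * overrides multisend; a burst of 0 means "use the default". */
int writes_per_iteration(int burst, int multisend)
{
    return burst != 0 ? burst : multisend;
}
```

In this sketch the writes-per-select ratio cannot exceed 1000:1, because the burst value is the only knob and it is capped. A separately configurable multisend would lift that limit.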

In summary, if you think that burst mode and multisend are the same thing and that using burst mode is the correct way to change the 10:1 ratio then this issue can be closed.

Hopefully I explained what the issue is; if it's still somehow confusing, feel free to ask more questions :).
