AWS ENA DPDK only able to use one core with igb_uio #509
Comments
Every mode has different requirements from the DPDK driver. Please try STL or ASTF mode.
Thanks for the quick response Hanoh. I know that, but we kind of rely on STF mode. I can see if I can migrate some parts of the testing to STL/ASTF, but it would be interesting to know why this limitation exists. The ENA NIC does support several queues. Or is this something we would have to ask the DPDK folks who maintain the ENA DPDK driver?
STF mode is rather old and hasn't progressed like STL and ASTF to support "full software mode". It is theoretically possible to add this support.
So what would you suggest if someone wants to mimic the
I assumed that STF mode is more or less the default mode and the way to go, given all the provided examples in the
And I would have expected that ASTF might have the same limitation with ENA and thus not use the full potential of a 20/40 CPU core machine in AWS. I did test
Yes, this is the ASTF profile; it is just a different format (Python) and you get many more features (full TCP/UDP counters in case of errors) at the cost of CPU% resources :-)
If this does not work, you could scale by adding more virtual interfaces and have one core per pair of virtual interfaces. STL requires even less from the driver and should almost always work in software mode. Try doing this (interactive mode)
and then connect from another terminal using our console.
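A minimal sketch of what that could look like; the exact flags are my assumption (the TRex package ships t-rex-64 and trex-console, but check the manual for your version):

```
# terminal 1: start TRex in interactive (STL) mode; --software assumed for software RX/TX
sudo ./t-rex-64 -i -c 1 --software

# terminal 2: attach the TRex console
./trex-console
```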
I would have thought that ENA does support it, but somehow it doesn't in that case. I guess I will have to talk to the DPDK folks about that. We tried more virtual interfaces, which worked with 4 instead of 2, but with 6 it didn't. The Intel NICs at AWS work better but lack the traffic mirroring feature. I guess we have to spawn more EC2 instances to generate a load of 10 Gbit/s for now :/
@norg could you share the error?
Sure:
I meant multi-queue, e.g. 2 cores per pair of virtual interfaces.
That case was with 6 ENA NICs, but maybe you meant something else by virtual interfaces? So no additional virtual NICs? It might be a misunderstanding :)
You should be able to work with multiple Rx/Tx queues with ENA/ASTF.
That's what I had already tried, but I did it again, with
I see the diff with interactive mode; TCP_LRO is not checked in https://trex-tgn.cisco.com/trex/doc/trex_manual.html#_hardware_recommendations, so that's expected.
Try adding --lro-disable to the CLI for interactive mode.
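For example, appended to the launch command above (flag placement is an assumption):

```
sudo ./t-rex-64 -i --astf -c 4 --lro-disable
```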
That helped. So
With 4 I get:
I guess if we can reach 8, that should be enough to hit the 10 Gbit/s goal.
@norg we are making progress. Try adding "-v 8" to the CLI; this will enable debug output. I think the default number of mbufs is low for a virtual driver, so try adding more descriptors and mbufs (2k).
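For example (same assumed launch command as before, with verbose output enabled):

```
sudo ./t-rex-64 -i --astf -c 4 --lro-disable -v 8
```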
I have not yet added the additional descriptors and will test this tomorrow, but here is the debug info already:
With 4 instead of 8:
With 6 it's the same as with 4, but I will play with the mbuf settings tomorrow to see if this helps in the 4/6 scenario, or maybe even in the one with
Thanks a lot so far!
@norg you can apply this patch to solve this:
Let me know. BTW, LRO is enabled by mistake too; this should also be fixed.
Hi @hhaim,
Please send the output of running the server with
I think it is this line
Another option is to add more 9k mbufs; add this to trex_cfg.yaml:
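The snippet itself wasn't captured above; as a rough sketch of the shape such a config section could take (interface addresses and exact values are placeholders, see the TRex manual for the full trex_cfg.yaml schema):

```yaml
- version: 2
  interfaces: ['00:06.0', '00:07.0']   # placeholder PCI addresses
  port_limit: 2
  memory:
    mbuf_9k: 9000    # enlarge the 9k-mbuf pool (the mbuf_9k key mentioned below)
```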
I increased the mbuf_9k a bit before as well but that didn't change much.
And with increased mbuf:
I'll put in the debugger line next and get back to you.
@tielou thanks for the logs. I will need to get my hands on a setup to understand why the driver is complaining. Will update.
@norg @tielou just add more rx/tx descriptors, e.g. 1024 or 2048 will do (the default is 512). Another thing: set the port_bw to 50 instead of 10 to have more mbufs; see this example
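The referenced example isn't reproduced here; a sketch of what those settings might look like in trex_cfg.yaml (the key names rx_desc, tx_desc, and port_bandwidth_gb are my assumptions from the TRex manual; verify against your version):

```yaml
- version: 2
  interfaces: ['00:06.0', '00:07.0']   # placeholder PCI addresses
  port_limit: 2
  port_bandwidth_gb: 50   # size mbuf pools as if this were a 50G port
  rx_desc: 2048           # more RX descriptors than the 512 default
  tx_desc: 2048           # more TX descriptors than the 512 default
```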
Running it:
However, running a simple ASTF profile (http simple) shows OOO (out-of-order) packets; need to look into the DUT.
Looking into the code further, it seems the hash function supported by ENA is CRC32 and can't be changed to Toeplitz (which explains why RSS does not work). These are the changes I made.
I've googled it and found this (it's actually Toeplitz, but with the wrong key).
Hi @hhaim, I will give the driver patch a shot later!
Hi @tielou, you didn't read it carefully. :-)
The ENA driver does not support it, even in the latest version.
Does this only affect ASTF, or also STF? (We will still look into ASTF to solve the overall issue and ideally end up with one machine doing the fancy advanced traffic generation.)
Only ASTF; STL works fine as there is NO need for a specific distribution (RSS) in this mode. I've looked into the ENA code more; it seems that this (giving the right key and key_len)
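The actual change isn't shown above; as a rough sketch of the kind of DPDK call involved when supplying an RSS key and key length to a port (the key bytes and hash-type flags are placeholders, not the real patch):

```c
#include <rte_ethdev.h>

/* Placeholder 40-byte Toeplitz key; the real key would have to match
 * what the ENA hardware/driver actually uses. */
static uint8_t rss_key[40] = { 0x6d, 0x5a, /* ... */ };

static int update_rss_key(uint16_t port_id)
{
    struct rte_eth_rss_conf rss_conf = {
        .rss_key     = rss_key,
        .rss_key_len = sizeof(rss_key),
        .rss_hf      = ETH_RSS_IP | ETH_RSS_TCP | ETH_RSS_UDP, /* pre-21.11 macro names */
    };
    /* Program the key and hash types on an already-configured port. */
    return rte_eth_dev_rss_hash_update(port_id, &rss_conf);
}
```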
Looks like I might be facing the same issue with the ramp-up HTTP test with more than one CPU core. It works fine with a single CPU but not at all with two or more. Is there any timetable for when DPDK is going to be updated in TRex? Shouldn't the new ENA driver in 20.11 fix the issue, if I understood correctly?
@priikone DPDK AWS driver development is super slow and does not support the full RSS API. So in the foreseeable future (see the issue in the ENA DPDK driver) I don't see a solution on this front.
Thanks for the info. I see now that they didn't change anything wrt. ENA RSS in 20.11. Out of curiosity, why does TRex need to set the RSS key and RETA itself? Wouldn't this work by default without it as well? Drivers typically set the default RSS to the configured queues automatically, unless TRex needs to redirect to some specific core(s). My need is just to be able to send and receive traffic on as many cores as possible; I don't care about latency measurements, for example, at all in this type of setup.
@priikone when a flow is generated from core x (ASTF), it is required to know which tuple to generate so that the reverse flow will come back to the same core x. Without knowing the key and the distribution model, there is a need for software rings between the cores to redirect the packets (in software, which is slow) back to the core that generated the flow.
Understood, thanks. This is of course a common problem in networking and there are many ways it can be handled, but I'm not familiar with the TRex architecture and what limitations there are. What's the plan with the software solution? How is it going to work?
@priikone
That's probably fine with the packet rates in the ramp-up HTTP test, I'm thinking... It looks like you want to keep everything per-thread. Have you considered doing a flow lookup (lockless, of course) so it wouldn't matter where the traffic arrives? Or moving the flow once it's known where it balances? In principle it would even be possible to precompute the RSS hash beforehand to know where the return traffic is going to balance. Again, I'm not familiar with the TRex architecture, so I have no clue what's easiest to do there...
@priikone calculating the reverse flow destination is not possible when you don't know how the RSS function works. That is the whole point of doing a software redirect.
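For context, this is roughly what "precomputing the RSS hash" would involve if the hash function and key were known; a minimal software Toeplitz sketch (the key, tuple layout, and queue mapping are illustrative assumptions, not TRex code):

```c
#include <stdint.h>
#include <stddef.h>

/* Software Toeplitz hash over a byte string (e.g. the 12-byte IPv4 tuple
 * src-ip | dst-ip | src-port | dst-port, all in network byte order).
 * key must hold at least len + 4 bytes; 40 bytes is the usual NIC key size. */
static uint32_t toeplitz_hash(const uint8_t key[40], const uint8_t *data, size_t len)
{
    uint32_t hash = 0;
    /* 32-bit window over the key, shifted left one bit per input bit */
    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                      ((uint32_t)key[2] << 8)  |  (uint32_t)key[3];
    size_t key_bit = 32; /* index of the next key bit to shift into the window */

    for (size_t i = 0; i < len; i++) {
        for (int b = 7; b >= 0; b--) {
            if ((data[i] >> b) & 1)
                hash ^= window;
            window = (window << 1) |
                     ((key[key_bit / 8] >> (7 - (key_bit % 8))) & 1);
            key_bit++;
        }
    }
    return hash;
}

/* A generator that knew the key could then predict the RX queue of the
 * reverse flow, e.g. queue = reta[hash % reta_size], and pick source ports
 * so the reply lands on the core that created the flow. Without the real
 * key and indirection table this prediction is impossible, hence the
 * software redirect. */
```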
Hi @priikone,
It works! Thanks, guys. I didn't do much performance testing yet, though. One issue is that I can't use all 8 queues. If I try (-c 8), I get: Ethdev port_id=0 nb_tx_queues=10 > 8. With 6 queues or fewer it works fine.
@priikone the code was released. You can't use 8 cores because the AWS driver is limited to 8 TX queues and we need another 2 queues.
Hmm... is that for the Tx side? Or is there a reason why 8 Rx queues couldn't exist? Or what are they used for?
@priikone when you specify
Ok, so an improvement could be to utilize more queues when latency measurement isn't used.
@priikone we had this in the past and removed that optimization because the init code became too complex. Add more queues in AWS
With 8 cores (10 queues) everything seems to work fine, but with >8 cores (>10 queues) something goes wrong with TCP. I'm seeing invalid TCP sequence numbers, with SYN (ACK=0) and RST (ACK=0) packets dropped by the DUT. No issues seen with 8 or fewer cores.
@priikone maybe this is an issue with AWS. With bare metal and software mode, it works fine with 20 cores.
Actually, it was an ARP problem and the errors were because of retransmissions.
@priikone could you elaborate? I don't see how the number of cores is related to ARP.
The range of IPs defined in ASTFIPGenDist covers the number of CPU cores, and I didn't get ARP replies for IPs beyond the first 8 in the range. But now I do, and traffic works fine with 16 cores.
We're trying to achieve 10 Gbit/s on a big AWS instance with the ENA interfaces, but so far we can't get it running over 2.5 Gbit/s due to the fact that only one core can be used for each interface.

When I run it with just -c 1 I see the DPDK message:

And it's running, but with just one core for each interface it hits a cap.

So I'm wondering: is it not an option to get it running like that in AWS, the way I have it on some bare-metal machines? There I see the DPDK mode being set to DROP_QUE_FILTER on X710 Intel NICs. What made me wonder as well is that the DPDK mode is missing entirely when I run -c 10. Is this rather a DPDK issue with the ena/igb_uio module? Any hint would be helpful.