Native stack bug when the num_cores > num_queues #654
/cc @gleb-cloudius |
What did you patch? The code in `connect` is not supposed to use the current cpu. On the contrary, it tries to figure out which cpu will receive a packet with a given tuple. Can you describe your NIC HW? How many hw queues does it have, and does it support HW RSS? |
The exact modification that I made was changing:
to:
I'm using a C5.9xlarge EC2 instance with an ENA adapter. It has 8 hardware queues, and it does not support RSS (Or at least, Seastar is not using hardware RSS). |
How can it not support RSS if it has 8 queues? How does it balance the traffic between the queues? It looks like the NIC and seastar disagree about how a particular queue is chosen. |
It may support RSS, but when I added print statements to:
The else branch was being taken. |
This does not mean it does not support RSS; it means it does not provide the RSS hash value in the packet descriptor, so we need to calculate it ourselves. The best way to figure out the RSS configuration is to instrument dpdk device initialization in |
Hmm, well, it seems it does support RSS, per this: https://github.com/amzn/amzn-drivers/tree/master/kernel/linux/ena I am also seeing set_rss_table called and:
Printed at startup. So yes, I think RSS is both supported by the NIC and being used by Seastar/DPDK. EDIT: I will note that I also get this message:
I wonder, is the issue that Seastar/DPDK are having trouble configuring hardware flow control, and that's creating problems for RSS? |
Maybe the ena driver is outdated. You can try with https://groups.google.com/d/msg/seastar-dev/qUN4ig1BWa8/Dvbm8B9GAAAJ (still undergoing review). |
Ah, good idea. I had tried to upgrade Seastar's DPDK myself earlier when I noticed that the ENA PMD driver was not the latest, but I ran into some trouble. I imagine I'll have more luck with one of your patches :). I'll give that a try. |
It's from @cyb70289, to give credit |
Can you provide the full dpdk output? I also see that they do support providing the rss hash in the packet descriptor in the Linux driver, so maybe the new driver will indeed help. |
Here's the dpdk output, using the patched version with v19.05. It's currently not working:
I've tracked down this issue to failing to allocate buffers in |
@ashaffer, the main update of Seastar with the 19.05 patch is that it now uses iommu+vfio for dpdk memory management; uio is not supported. Did you bind your nic to the vfio-pci driver? You mentioned you're running in an EC2 instance, not bare metal? I'm not sure if iommu+vfio can work in a virtualized environment. |
@cyb70289 Thanks for pointing that out. I did indeed have trouble getting that working at first, but it turns out you can get it working on EC2. I found instructions here: https://github.com/amzn/amzn-drivers/tree/master/userspace/dpdk I'm now getting the same output as before, except with VFIO enabled:
I will note one thing, which is that I am using VFIO in the "unsafe noiommu" mode as recommended by the page I linked. Does your patch rely on it using IOMMU specifically, or should it work with VFIO without that? |
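For readers following along, the no-IOMMU VFIO setup being discussed usually looks something like the sketch below. These commands follow the amzn-drivers guide linked above; the PCI address is a placeholder, and the `dpdk-devbind.py` path assumes a DPDK 19.05-era source tree.

```shell
# Load the VFIO driver and enable the "unsafe no-IOMMU" mode,
# which is required on instances without a usable IOMMU.
modprobe vfio-pci
echo 1 > /sys/module/vfio/parameters/enable_unsafe_noiommu_mode

# Bind the ENA device (placeholder PCI address) to vfio-pci.
./usertools/dpdk-devbind.py --bind=vfio-pci 0000:00:06.0
```

Note that unsafe no-IOMMU mode removes the memory-protection guarantees VFIO normally provides, which is why it has to be enabled explicitly.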
@ashaffer, Thanks for the report. |
Thanks, I appreciate the help. Dmesg output when running seastar:
Testpmd does seem to work, which I guess is a good sign that it is at least possible in principle for VFIO to work on a non-metal EC2 instance. Also, I traced exactly where the call is failing. It's inside of

That's about as far as I've gotten trying to debug it myself. It's clearly some problem allocating memory, but I don't yet understand why it's failing there.

EDIT: For comparison, when I run
|
I'll add another issue I've noticed. I hadn't been using hugepages, so I tried using them. Ultimately I encountered the same issue as above, but in order to get to that point I had to explicitly specify |
Alright guys...after fumbling around in the dark for a while, I have managed to get it working with the latest DPDK on my EC2 instance. Here's what I had to change:
|
Great to hear latest dpdk works on EC2 ENA nic. Thank you @ashaffer. |
On Sun, Jun 23, 2019 at 03:45:17PM -0700, Andrew Shaffer wrote:
2. Even with the new dpdk, my original issue doesn't seem to be fixed, so I had to reinstate the change of `interface::dispatch_packet` to use `_dev->hash2qid(hash)` rather than `engine().cpu_id()`. However, I do notice that with the new DPDK it seems to now be taking advantage of the "Low Latency Queue" option for the ENA NIC, which is great.
This will result in an additional cross-cpu hop for each packet. In
interface::dispatch_packet(), _dev->hash2qid(hash) == engine().cpu_id()
has to be true. If it is not, the HW and seastar disagree on how RSS
works: either the redirection tables are different, or the hash function
used, or the keys. Maybe ENA ignores some of our configuration without
issuing any errors. Does it work for you if the number of nic queues
equals the number of cpus? Can you provide a full dpdk initialization
output?
…--
Gleb.
|
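The invariant described in the reply above can be sketched concretely. The code below is an illustrative model of RSS queue selection, not Seastar's actual implementation: the NIC hashes the packet's tuple, then indexes a redirection table (RETA) with the hash to pick a receive queue. Since the native stack services queue N from cpu N, packets reach the cpu holding the connection state only when this lookup agrees with the cpu that opened the connection.

```cpp
#include <cstdint>
#include <vector>

// Illustrative model (not Seastar code): RSS picks a queue by indexing
// a redirection table with the low bits of the packet hash. The NIC and
// the host software must agree on the hash function, the key, and this
// table, or the delivering queue and the expected cpu will not match.
uint32_t hash2qid(const std::vector<uint32_t>& reta, uint32_t hash) {
    return reta[hash % reta.size()];
}
```

With a typical 128-entry table spreading hashes across 8 queues, any disagreement in hash function or key between NIC and host sends most packets to the "wrong" queue, which is consistent with connections succeeding only a small fraction of the time.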
> Does it work for you if nic queues equal to number of cpus?

If I restrict the cpus using `--cpuset`, it gets more likely to succeed, as you would expect, but the issue remains.

> Can you provide a full dpdk initialization output?

How do I do that? |
On Mon, Jun 24, 2019 at 12:51:52AM -0700, Andrew Shaffer wrote:
> Does it work for you if nic queues equal to number of cpus?
If I restrict the cpus using `--cpuset`, it gets more likely to succeed, as you would expect, but the issue remains.
OK, so this is indeed an RSS issue and not a seastar-internal
redirection issue.
> Can you provide a full dpdk initialization output?
How do I do that?
The output that the application prints when it starts. You already
provided it here, but for a failed boot. Can you show one for a
successful boot as well?
…--
Gleb.
|
Sure:
It hangs here if I don't make that |
It looks like the dpdk ena driver supports changing neither the
RSS hash function nor the RSS key. They set the function to CRC32
by default and ignore the configuration seastar provides
(ena_com_fill_hash_function is called only during default config).
Looks like a bug to me. They should at least return an error on a
re-configure attempt.
…--
Gleb.
|
I think I've found the culprit: it seems to be defaulting to a CRC32 hash. I'm not sure how to configure this from the outside, but I'll try recompiling DPDK with this changed to toeplitz and see if that fixes the issue; then we can try to figure out how to set it externally. EDIT: Looks like we found it at the same time. Yeah, I can't figure out how to set it either. I guess I'll either modify my ENA driver or add a CRC32 hash to Seastar. Would you guys be open to adding the CRC32 option to Seastar and switching to it in the case of an ENA driver? If not, it's cool, I'll just manage my own little fork. |
We set up the hash function here: https://github.com/scylladb/seastar/blob/master/src/net/dpdk.cc#L1729 and the key here: https://github.com/scylladb/seastar/blob/master/src/net/dpdk.cc#L1524 The problem is that the ena driver does not report an error on either of those, but just silently ignores the requested configuration. It would be great to add CRC32 support to seastar and fall back to it if the function cannot be changed, but for that the ena driver needs to return a proper error. EDIT: I think it is fair to complain to the dpdk developers to either add function-changing support or report an error. |
Just a little update here. I added CRC32 hashing, but it doesn't seem to match the output of the NIC. I forced the NIC to report the RSS hash it's generating, and it's quite odd. The hash is always of the form: |
Alright... after many trials and tribulations... we have a resolution. It turns out that even though the Amazon DPDK driver sets CRC32 hashing without allowing you to configure it, the NIC itself completely ignores all that and uses Toeplitz. It does a weird thing where it mirrors the upper and lower words of the Toeplitz hash, but since we're only using the low-order bits anyway, we don't need to worry about that. The main reason that things weren't working initially is that they don't use the Mellanox RSS key by default; they use this one, which they claim is used by the majority of NIC vendors:
I'm not sure how you guys would want to handle this. Detect which NIC it is and if it's an ENA, switch to this key? Or just make this one the default and hope keys are configurable for others? |
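For illustration, the Toeplitz algorithm being discussed can be sketched as below. This is not Seastar's or the ENA driver's code; the key shown is the widely published default RSS key that many NIC vendors ship, included here only as a plausible example of the kind of key the comment above refers to.

```cpp
#include <cstddef>
#include <cstdint>

// A commonly used default 40-byte RSS key (illustrative; not necessarily
// the exact key the ENA NIC uses).
static const uint8_t default_rss_key[40] = {
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
    0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
    0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
    0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
    0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
};

// Toeplitz hash sketch: walk the input bits MSB-first; whenever a bit is
// set, XOR in the 32-bit window of the key starting at that bit offset.
uint32_t toeplitz_hash(const uint8_t* key, const uint8_t* data, size_t len) {
    uint32_t hash = 0;
    // Current 32-bit key window, initially key bits [0, 32).
    uint32_t window = (uint32_t(key[0]) << 24) | (uint32_t(key[1]) << 16) |
                      (uint32_t(key[2]) << 8) | uint32_t(key[3]);
    for (size_t i = 0; i < len; ++i) {
        for (int b = 7; b >= 0; --b) {
            if (data[i] & (1u << b)) {
                hash ^= window;
            }
            // Slide the window one bit, pulling in the next key bit.
            size_t next = i * 8 + (7 - b) + 32;
            uint32_t bit = (key[(next / 8) % 40] >> (7 - next % 8)) & 1u;
            window = (window << 1) | bit;
        }
    }
    return hash;
}
```

One useful property: the hash is linear over GF(2) in the input, so two inputs differing in the same bit positions shift the hash by the same XOR delta. It also explains why the key matters so much here; with a different key, every hash (and therefore every queue choice) changes.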
I don't really know. I recommend raising this on the dpdk mailing list and copying the ena maintainers. Ideally it would be resolved in dpdk and we would just upgrade to a fixed version. If dpdk refuses to fix this, we can add a workaround in Seastar, but we should try upstream first. |
Unfortunately dpdk can't fix it. It's at the hardware/VM level on EC2. I tried going in and modifying the ENA DPDK driver, and that wasn't sufficient, because the NIC simply ignores commands related to RSS. It's not a huge deal to me; I've gone ahead and changed the key in my fork. The AWS guys say that they're going to enable configurable RSS "soon", but they've been saying that for a little while now, it seems. |
Can't they change the dpdk API to report that you can't change the hash function and that it's toeplitz? If not, then we can patch it in Seastar. |
Just want to check back on this thread. I am facing a similar issue to the one @ashaffer mentioned on AWS EC2 ENA. Below are the steps I followed on an AWS EC2 ENA-enabled instance:
With this I am seeing a hang issue when using multiple cpus for the Seastar-based app. If I use one cpu for running the app with cpuset=1, then I am not facing the connection problem, but this limits the application to a single cpu. Do we have any solution for the Seastar + DPDK + ENA combination that works? Also, what is the current stable Seastar version that we can use? Does "seastar-20.05-branch" seem good? @avikivity, @ashaffer, or others, any thoughts? |
Best is to try master. |
I just got started using Seastar, so it's entirely possible I'm configuring something incorrectly. However, I spent the last several days tracking down a nasty little issue I was having running the native stack on an EC2 instance. I noticed that I could complete a TCP connection a small percentage of the time, but it seemed completely random. Eventually I tracked it down to these offending lines of code.

In `interface::dispatch_packet` (net.cc line 328):

```c++
auto fw = _dev->forward_dst(engine().cpu_id(), [&p, &l3, this] () {
```

and in `tcp::connect` (tcp.hh line 844).

As you can see, CPU selection is done slightly differently when opening a connection than when receiving packets. Inside of `hash2cpu`, the CPU is selected like this:

```c++
return forward_dst(hash2qid(hash), [hash] { return hash; });
```

It passes in `hash2qid` rather than `engine().cpu_id()`, like `tcp::connect` does. What this ends up meaning is that my connection only works if, by chance, these two values happen to match, which ends up being a small percentage of the time on the instance I'm using.

I know that this is the issue, because if I change the `engine().cpu_id()` call in `dispatch_packet` to `hash2qid`, everything works reliably again. However, I don't think that's going to spread the load over all the cores in the way that I want.

Is this an issue of me misunderstanding some aspect of configuration, or is this a real bug?
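The "small percentage of the time" behavior described above matches what a simple model predicts: if the queue the NIC derives from the packet hash is unrelated to the cpu that opened the connection, they agree only by chance, roughly 1/nqueues of the time. A toy simulation (illustrative only, not Seastar code):

```cpp
#include <cstdint>
#include <random>

// Toy model: a connection "works" only when the queue the NIC derives
// from the packet hash equals the cpu that created the connection state.
double match_fraction(unsigned nqueues, unsigned trials, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<uint32_t> dist;
    unsigned hits = 0;
    for (unsigned t = 0; t < trials; ++t) {
        unsigned delivering_queue = dist(rng) % nqueues;  // NIC's RSS choice
        unsigned connecting_cpu = dist(rng) % nqueues;    // cpu that called connect()
        if (delivering_queue == connecting_cpu) {
            ++hits;
        }
    }
    return double(hits) / trials;
}
```

On an 8-queue NIC like the c5.9xlarge's ENA adapter, this gives roughly one in eight connections succeeding, consistent with the randomness reported above.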