
ena_com_get_dev_basic_stats spin more than 1 sec #140

Closed
hhaim opened this issue Aug 25, 2020 · 18 comments

Comments

@hhaim

hhaim commented Aug 25, 2020

  • DPDK 20.02
1 0x5584cf3f0faa ./_t-rex-64(+0x194faa) [0x5584cf3f0faa]
2 0x7f0f006668a0 /lib/x86_64-linux-gnu/libpthread.so.0(+0x128a0) [0x7f0f006668a0]
3 0x7f0f00661f85 pthread_cond_timedwait + 821
4 0x5584cf80e393 ena_com_execute_admin_command + 435
5 0x5584cf810659 ena_com_get_dev_basic_stats + 57
6 0x5584cf808129 ./_t-rex-64(+0x5ac129) [0x5584cf808129]
7 0x5584cf6d8020 rte_eth_stats_get + 128
8 0x5584cf490f30 CTRexExtendedDriverBase::get_extended_stats_fixed(CPhyEthIF*, CPhyEthIFStats*, int, int) + 32
9 0x5584cf362922 CPhyEthIF::get_extended_stats() + 28
10 0x5584cf362b67 CPhyEthIF::update_counters() + 17
11 0x5584cf36c73e CGlobalTRex::update_stats() + 56
12 0x5584cf36d657 CGlobalTRex::sync_threads_stats() + 9
13 0x5584cf36e457 CGlobalTRex::port_stats_to_json(Json::Value&, unsigned char) + 17
14 0x5584cf5440e2 TrexRpcCmdGetPortStats::_run(Json::Value const&, Json::Value&) + 66
15 0x5584cf534911 TrexRpcCommand::run(Json::Value const&, Json::Value&) + 81
16 0x5584cf530474 JsonRpcMethod::_execute(Json::Value&) + 52
17 0x5584cf52dbd3 TrexJsonRpcV2ParsedObject::execute(Json::Value&) + 131
18 0x5584cf52c16d TrexRpcServerReqRes::process_request_raw(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) + 541
19 0x5584cf52c90e TrexRpcServerReqRes::process_zipped_request(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) + 462
20 0x5584cf52cce5 TrexRpcServerReqRes::handle_request(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 469
21 0x5584cf52d2fb TrexRpcServerReqRes::_rpc_thread_cb_int() + 1339
22 0x5584cf52da2b TrexRpcServerReqRes::_rpc_thread_cb() + 11
23 0x7f0efff6b27f so/x86_64/libstdc++.so.6(+0xba27f) [0x7f0efff6b27f]
24 0x7f0f0065b6db /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f0f0065b6db]
25 0x7f0eff62ba3f clone + 63

This is a new crash in https://github.com/cisco-system-traffic-generator/trex-core
It means that ena_com_get_dev_basic_stats spins for more than 1 sec.
Is this expected?

@AWSNB
Contributor

AWSNB commented Aug 25, 2020 via email

@hhaim
Author

hhaim commented Aug 25, 2020

@AWSNB I have forwarded the questions.
I think it is an EC2 C5 instance type.

@hhaim
Author

hhaim commented Aug 25, 2020

@AWSNB one more thing: it happens on some instances and not on others. It is a rather new thing, starting in the past week.

@jaygmuru

Happened on a t3.xlarge, i-089921d082eaf5bc6

@AWSNB
Contributor

AWSNB commented Aug 25, 2020 via email

@AWSNB
Contributor

AWSNB commented Aug 25, 2020 via email

@hhaim
Author

hhaim commented Aug 25, 2020

@AWSNB thanks for the quick answer.
We can disable the watchdog to verify that it's only a 1 sec spin and not more.
@jaygmuru could you disable the watchdog by adding --no-watchdog to the invocation?
In any case, I don't think you want to run on an instance that gets stuck for 1 sec for TGN purposes.

@mpastyl

mpastyl commented Sep 16, 2021

(sorry for necrobumping)
Were there any new insights on this? We see the same issue on a c5n.xlarge running VPP.

@yastreb78

@mpastyl Please share the instance ID and time

@mpastyl

mpastyl commented Sep 17, 2021

@yastreb78 Thank you for your reply. Happened on i-00ffbaea3c3768628 at roughly Sept 16 09:15:24 UTC

I have appended the stack trace below. Basically, we kill the application automatically if it fails to reply within a timeout, hence the signal at frame 4.
Please note one difference from the original issue: the calling function is ena_com_get_eni_stats, not ena_com_get_dev_basic_stats, but both spin in the same function afterwards.

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ff0d20a0859 in __GI_abort () at abort.c:79
#2  0x000055beccd8afbe in os_exit (code=code@entry=1) at /codebuild/output/src073226041/src/github.com/EMnify/vpp/src/vpp/vnet/main.c:433
#3  0x00007ff0d23cb77c in unix_signal_handler (signum=11, si=<optimized out>, uc=<optimized out>) at /codebuild/output/src073226041/src/github.com/EMnify/vpp/src/vlib/unix/main.c:187
#4  <signal handler called>
#5  futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0x7fef904ddf70, clockid=<optimized out>, expected=0, futex_word=0xac0203538) at ../sysdeps/nptl/futex-internal.h:320
#6  __pthread_cond_wait_common (abstime=0x7fef904ddf70, clockid=<optimized out>, mutex=0xac0203540, cond=0xac0203510) at pthread_cond_wait.c:520
#7  __pthread_cond_timedwait (cond=0xac0203510, mutex=0xac0203540, abstime=0x7fef904ddf70) at pthread_cond_wait.c:656
#8  0x00007fefd0eb7d86 in ena_com_execute_admin_command () from /usr/lib/x86_64-linux-gnu/vpp_plugins/dpdk_plugin.so
#9  0x00007fefd0eba0c5 in ena_com_get_eni_stats () from /usr/lib/x86_64-linux-gnu/vpp_plugins/dpdk_plugin.so
#10 0x00007fefd0eb1d7b in ena_copy_eni_stats () from /usr/lib/x86_64-linux-gnu/vpp_plugins/dpdk_plugin.so
#11 0x00007fefd0eb201c in ena_xstats_get () from /usr/lib/x86_64-linux-gnu/vpp_plugins/dpdk_plugin.so
#12 0x00007fefd0d4dea9 in rte_eth_xstats_get () from /usr/lib/x86_64-linux-gnu/vpp_plugins/dpdk_plugin.so
#13 0x00007fefd137ff75 in dpdk_get_xstats (xd=0x7fefd4bc1540) at /codebuild/output/src073226041/src/github.com/EMnify/vpp/src/plugins/dpdk/device/dpdk_priv.h:66
#14 dpdk_update_counters (now=<optimized out>, xd=0x7fefd4bc1540) at /codebuild/output/src073226041/src/github.com/EMnify/vpp/src/plugins/dpdk/device/dpdk_priv.h:116
#15 format_dpdk_device (s=0x7fefd4bd8ef0 "eth_Gp", ' ' <repeats 29 times>, "2     up   eth_Gp\n  Link speed: unknown\n  Ethernet address 06:79:e4:a7:a9:8f\n  ", args=<optimized out>)
    at /codebuild/output/src073226041/src/github.com/EMnify/vpp/src/plugins/dpdk/device/format.c:556
#16 0x00007ff0d22bb6ad in do_percent (va=0x7fef904de850, fmt=<optimized out>, _s=<synthetic pointer>) at /codebuild/output/src073226041/src/github.com/EMnify/vpp/src/vppinfra/format.c:373
#17 va_format (s=0x7fefd4bd8ef0 "eth_Gp", ' ' <repeats 29 times>, "2     up   eth_Gp\n  Link speed: unknown\n  Ethernet address 06:79:e4:a7:a9:8f\n  ", fmt=<optimized out>, va=va@entry=0x7fef904de850)
    at /codebuild/output/src073226041/src/github.com/EMnify/vpp/src/vppinfra/format.c:404
#18 0x00007ff0d22bca68 in format (s=<optimized out>, fmt=fmt@entry=0x7ff0d3363c4c "\n%U%U") at /codebuild/output/src073226041/src/github.com/EMnify/vpp/src/vppinfra/format.c:428
#19 0x00007ff0d2da000a in format_vnet_hw_interface (s=<optimized out>, args=<optimized out>) at /codebuild/output/src073226041/src/github.com/EMnify/vpp/src/vnet/interface_format.c:218
#20 0x00007ff0d22bb6ad in do_percent (va=0x7fef904dea78, fmt=<optimized out>, _s=<synthetic pointer>) at /codebuild/output/src073226041/src/github.com/EMnify/vpp/src/vppinfra/format.c:373
#21 va_format (s=s@entry=0x0, fmt=<optimized out>, va=va@entry=0x7fef904dea78) at /codebuild/output/src073226041/src/github.com/EMnify/vpp/src/vppinfra/format.c:404
#22 0x00007ff0d2364b5d in vlib_cli_output (vm=vm@entry=0x7ff0d23e7680 <vlib_global_main>, fmt=fmt@entry=0x7ff0d3335463 "%U\n") at /codebuild/output/src073226041/src/github.com/EMnify/vpp/src/vlib/cli.c:707
#23 0x00007ff0d2d9c6e9 in show_or_clear_hw_interfaces (vm=0x7ff0d23e7680 <vlib_global_main>, input=<optimized out>, is_show=1, cmd=<optimized out>) at /codebuild/output/src073226041/src/github.com/EMnify/vpp/src/vnet/interface_cli.c:131
#24 0x00007ff0d2365402 in vlib_cli_dispatch_sub_commands (vm=vm@entry=0x7ff0d23e7680 <vlib_global_main>, cm=cm@entry=0x7ff0d23e78d0 <vlib_global_main+592>, input=input@entry=0x7fef904dede0, parent_command_index=<optimized out>)
    at /codebuild/output/src073226041/src/github.com/EMnify/vpp/src/vlib/cli.c:574
#25 0x00007ff0d236550e in vlib_cli_dispatch_sub_commands (vm=vm@entry=0x7ff0d23e7680 <vlib_global_main>, cm=cm@entry=0x7ff0d23e78d0 <vlib_global_main+592>, input=input@entry=0x7fef904dede0, parent_command_index=parent_command_index@entry=0)
    at /codebuild/output/src073226041/src/github.com/EMnify/vpp/src/vlib/cli.c:529
#26 0x00007ff0d23659ad in vlib_cli_input (vm=vm@entry=0x7ff0d23e7680 <vlib_global_main>, input=input@entry=0x7fef904dede0, function=function@entry=0x55beccd95560 <inband_cli_output>, function_arg=function_arg@entry=140666894937520)
    at /codebuild/output/src073226041/src/github.com/EMnify/vpp/src/vlib/cli.c:678
#27 0x000055beccd95c26 in vl_api_cli_inband_t_handler (mp=0x7fefd4b8db00) at /codebuild/output/src073226041/src/github.com/EMnify/vpp/src/vpp/api/api.c:233
#28 0x00007ff0d3470c4c in msg_handler_internal (free_it=0, do_it=1, trace_it=<optimized out>, the_msg=0x7fefd4b8db00, am=0x7ff0d3483f40 <api_global_main>) at /codebuild/output/src073226041/src/github.com/EMnify/vpp/src/vlibapi/api_shared.c:505
#29 vl_msg_api_handler_no_free (the_msg=0x7fefd4b8db00) at /codebuild/output/src073226041/src/github.com/EMnify/vpp/src/vlibapi/api_shared.c:710
#30 0x00007ff0d3462431 in vl_socket_process_api_msg (rp=<optimized out>, input_v=<optimized out>) at /codebuild/output/src073226041/src/github.com/EMnify/vpp/src/vlibmemory/socket_api.c:209
#31 0x00007ff0d3467dcb in vl_api_clnt_process (vm=<optimized out>, node=<optimized out>, f=<optimized out>) at /codebuild/output/src073226041/src/github.com/EMnify/vpp/src/vlibmemory/vlib_api.c:405
#32 0x00007ff0d237e48b in vlib_process_bootstrap (_a=<optimized out>) at /codebuild/output/src073226041/src/github.com/EMnify/vpp/src/vlib/main.c:1477
#33 0x00007ff0d22c20a4 in clib_calljmp () from /lib/x86_64-linux-gnu/libvppinfra.so.21.01+emnify8.3.0
#34 0x0000000000000000 in ?? ()

@ndagan

ndagan commented Sep 19, 2021

We are looking into this - will keep this thread updated.

@shaibran
Contributor

@mpastyl Thanks for the info, we will review the instance data from our side

@shaibran
Contributor

Hey, we reviewed the instance logs from our side but did not observe any errors on the HW side when the issue reproduced.
@mpastyl @hhaim We would appreciate it if you could assist with two reproduction attempts:

  1. Check whether the issue is resolved by increasing the application timeout to 4 seconds.
  2. Enable ena_com logs: build the driver with the RTE_LIBRTE_ENA_COM_DEBUG flag, then enable the logs by passing the EAL argument "--log-level=pmd.net.ena.com:8". If this logging level is a problem on your side, you can decrease it from 8 down to 7 or 6; anything lower will not provide us any useful information.

We would also like to attempt to reproduce this in-house in parallel. We would appreciate it if you could share the following information:

  1. Confirm DPDK version
  2. Confirm instance size and type
  3. Frequency of this issue, if known
  4. The utilization of the resources when this happens (how many CPUs, network load, etc.)
  5. Any commands, configuration, or scenario which can assist us with reproduction.

Thanks in advance

@mpastyl

mpastyl commented Sep 30, 2021

Hi @shaibran, thank you for taking the time to look at this. We have been trying to reproduce and isolate the issue ourselves.

Here are some updates:

  • Originally we were using VPP v21.01, which uses DPDK v20.11. We recently merged the latest stable version of VPP (v21.06), which uses DPDK v21.02. The problem still remains, but it happens at a lower frequency (1-2 times per day, as opposed to once every 2-3 hours), which is something we can live with for now.
  • I have attached some ena_com logs here. During this run, we saw the problem happen from roughly 11:31:08 to 11:31:10. During this period, the ena driver stops producing logs.

To answer your previous questions:

Check if this is resolved by increasing timeout on application to 4 seconds?

This would indeed stop the application from timing out, because we see that the function gets unstuck after roughly 3 sec. However, a spurious 3 sec freeze is a problem for our use case and we would like to avoid it.

  1. Currently running DPDK v21.02.
  2. c5n.xlarge.
  3. The problem happens infrequently and at seemingly random intervals. In our current setup (VPP v21.06 -> DPDK v21.02) it happens only 1-2 times per day. We know that it is also related to how often our application queries device stats (currently every 30 sec): setting the query frequency to 1 sec increased the frequency of the problem to 2-3 times per hour.
  4. There is no noticeable change in resource utilization when the problem happens. We also did some profiling with perf; when the problem happens, perf does not seem to collect any samples for the main thread (the one you can see in the stack trace).
  5. Other (hopefully useful) information:
  • The arguments VPP uses to initialize the DPDK EAL: -c e -n 4 --log-level pmd.net.ena.com:8 --in-memory --no-telemetry --file-prefix vpp -a 0000:00:06.0 -a 0000:00:07.0 -a 0000:00:08.0 --main-lcore 1
  • I have tried to reproduce the problem in standalone DPDK by continuously measuring the time it takes to query xstats from the devices, but I do not see any long delays. Maybe the problem lies in how VPP uses DPDK.

Thank you for your time.

@shaibran
Contributor

Thank you @mpastyl for the detailed response; we will review this and update.

@mejedi

mejedi commented Oct 4, 2021

Together with @mpastyl, we've been looking further into this issue. I believe we've identified the root cause and fixed the bug.

We observe the main thread occasionally waiting on a condition variable in the ENA DPDK driver for 3 seconds, found with Linux perf by tracing sched_switch events. The backtrace:

	    7fa3d1b917b1 __pthread_cond_timedwait
  futex-internal.h:320 (inlined)
	    7fa2d08be484 ena_com_wait_and_process_admin_cq_interrupts
  ena_com.c:764 (inlined)
	    7fa2d08be484 ena_com_wait_and_process_admin_cq
  ena_com.c:764 (inlined)
	    7fa2d08be484 ena_com_execute_admin_command (/usr/lib/x86_64-linux-gnu/vpp_plugins/dpdk_plugin.so)
  ena_com.c:764
	    7fa2d08be7d9 ena_get_dev_stats (/usr/lib/x86_64-linux-gnu/vpp_plugins/dpdk_plugin.so)
  ena_com.c:2206
	    7fa2d08c038a ena_com_get_dev_basic_stats (/usr/lib/x86_64-linux-gnu/vpp_plugins/dpdk_plugin.so)
  ena_com.c:2240
	    7fa2d08b9f82 ena_stats_get (/usr/lib/x86_64-linux-gnu/vpp_plugins/dpdk_plugin.so)
  ena_ethdev.c:970
	    7fa2d0786523 rte_eth_stats_get (/usr/lib/x86_64-linux-gnu/vpp_plugins/dpdk_plugin.so)
  rte_ethdev.c:2693
	    7fa2d0786597 eth_basic_stats_get (/usr/lib/x86_64-linux-gnu/vpp_plugins/dpdk_plugin.so)
  rte_ethdev.c:3003
	    7fa2d0786fdf rte_eth_xstats_get (/usr/lib/x86_64-linux-gnu/vpp_plugins/dpdk_plugin.so)
  rte_ethdev.c:3179
	    ...

Checking the source code:

static int ena_com_wait_and_process_admin_cq_interrupts(struct ena_comp_ctx *comp_ctx,
                            struct ena_com_admin_queue *admin_queue)
{
    unsigned long flags = 0;
    int ret;

    ENA_WAIT_EVENT_WAIT(comp_ctx->wait_event, // <-- ena_com.c:764
                admin_queue->completion_timeout);

    /* In case the command wasn't completed find out the root cause.
     * There might be 2 kinds of errors
     * 1) No completion (timeout reached)
     * 2) There is completion but the device didn't get any msi-x interrupt.
     */
    if (unlikely(comp_ctx->status == ENA_CMD_SUBMITTED)) {
        ENA_SPINLOCK_LOCK(admin_queue->q_lock, flags);
        ena_com_handle_admin_completion(admin_queue);
        admin_queue->stats.no_completion++;
        ENA_SPINLOCK_UNLOCK(admin_queue->q_lock, flags);

        if (comp_ctx->status == ENA_CMD_COMPLETED) {
            ena_trc_err("The ena device sent a completion but the driver didn't receive a MSI-X interrupt (cmd %d), autopolling mode is %s\n",
                    comp_ctx->cmd_opcode, admin_queue->auto_polling ? "ON" : "OFF");
            /* Check if fallback to polling is enabled */
            if (admin_queue->auto_polling)
                admin_queue->polling = true;
        } else {
            ena_trc_err("The ena device didn't send a completion for the admin cmd %d status %d\n",
                    comp_ctx->cmd_opcode, comp_ctx->status);
        }

Surprisingly, completion_timeout is exactly 3 s, so it looks like the wait on the condition variable times out.

Furthermore, our build had RTE_LIBRTE_ENA_COM_DEBUG enabled and we didn't see any errors, meaning the request itself succeeded. Is it possible that ENA_WAIT_EVENT_WAIT somehow missed the wake-up from ENA_WAIT_EVENT_SIGNAL?

#define q_waitqueue_t                   \
        struct {                        \
                pthread_cond_t cond;    \
                pthread_mutex_t mutex;  \
        }

#define ena_wait_queue_t q_waitqueue_t
#define ENA_WAIT_EVENT_WAIT(waitevent, timeout)                         \
        do {                                                            \
                struct timespec wait;                                   \
                struct timeval now;                                     \
                unsigned long timeout_us;                               \
                gettimeofday(&now, NULL);                               \
                wait.tv_sec = now.tv_sec + timeout / 1000000UL;         \
                timeout_us = timeout % 1000000UL;                       \
                wait.tv_nsec = (now.tv_usec + timeout_us) * 1000UL;     \
                pthread_mutex_lock(&waitevent.mutex);                   \
                pthread_cond_timedwait(&waitevent.cond,                 \
                                &waitevent.mutex, &wait);               \
                pthread_mutex_unlock(&waitevent.mutex);                 \
        } while (0)
#define ENA_WAIT_EVENT_SIGNAL(waitevent) pthread_cond_signal(&waitevent.cond)

This definitely doesn't stick to the textbook condition-variable pattern: the wait is not guarded by a predicate checked under the mutex.

We suspect that in extremely rare cases ENA_WAIT_EVENT_SIGNAL is executed before ENA_WAIT_EVENT_WAIT (the device being fast, the main thread being slow), so the wake-up is lost and the wait runs to its full timeout.

We implemented a change similar to commit 072b9f2bbc2 in http://dpdk.org/git/dpdk, and have been running for 48 hours without issues.

@shaibran
Contributor

Thanks @mejedi and @mpastyl for taking the time to assist with this investigation.
We are reviewing this and will update shortly

@shaibran
Contributor

shaibran commented Oct 11, 2021

@mejedi @mpastyl @hhaim Thank you again. Indeed, commit 072b9f2bbc2402a8c86194fe9e11458c1605540a ("net/ena: handle spurious wakeups in wait event") resolves the scenario where the interrupt arrives faster than ENA_WAIT_EVENT_WAIT. It was not part of DPDK v21.02 (it is included only in 21.05 and later). We are preparing a troubleshooting document that will also include a recommendation to update ENA or cherry-pick the specific commit.

@akiyano akiyano closed this as completed Dec 23, 2021

9 participants