CSubscriber create dead lock #1628

Open
kubbo opened this issue Jun 13, 2024 · 1 comment

kubbo commented Jun 13, 2024

Problem Description

Sometimes creating a CSubscriber gets stuck. The stacks of the two threads involved are:

Thread 20 (Thread 0xffff5cff5900 (LWP 4756)):
#0  futex_abstimed_wait (private=0, abstime=0x0, clockid=0, expected=2, futex_word=<optimized out>) at ../sysdeps/nptl/futex-internal.h:284
#1  __pthread_rwlock_wrlock_full (abstime=0x0, clockid=0, rwlock=0xffff74001d30) at pthread_rwlock_common.c:830
#2  __GI___pthread_rwlock_wrlock (rwlock=0xffff74001d30) at pthread_rwlock_wrlock.c:27
#3  0x0000ffffa8303da4 in eCAL::CSubGate::Register(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::shared_ptr<eCAL::CDataReader> const&) () from /lib/aarch64-linux-gnu/libecal_core.so.5
#4  0x0000ffffa8306b80 in eCAL::CSubscriber::Create(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, eCAL::SDataTypeInformation const&) () from /lib/aarch64-linux-gnu/libecal_core.so.5
#5  0x0000ffffa8307540 in eCAL::CSubscriber::CSubscriber(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, eCAL::SDataTypeInformation const&) () from /lib/aarch64-linux-gnu/libecal_core.so.5
#6  0x0000ffffa83075b8 in eCAL::CSubscriber::CSubscriber(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /lib/aarch64-linux-gnu/libecal_core.so.5
#7  0x0000ffffaa94bf28 in ?? ()
#8  0x0000ffff50001030 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
Thread 13 (Thread 0xffff90f8b900 (LWP 4749)):
#0  __lll_lock_wait (futex=futex@entry=0xffff984b2000, private=128) at lowlevellock.c:52
#1  0x0000ffffa9fd9cd8 in __GI___pthread_mutex_lock (mutex=0xffff984b2000) at pthread_mutex_lock.c:80
#2  0x0000ffffa82b59e4 in eCAL::CNamedMutexImpl::Lock(long) () from /lib/aarch64-linux-gnu/libecal_core.so.5
#3  0x0000ffffa82b69b0 in eCAL::CMemoryFile::Create(char const*, bool, unsigned long, bool) () from /lib/aarch64-linux-gnu/libecal_core.so.5
#4  0x0000ffffa82bacc8 in eCAL::CMemFileObserver::Create(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /lib/aarch64-linux-gnu/libecal_core.so.5
#5  0x0000ffffa82bbf10 in eCAL::CMemFileThreadPool::ObserveFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, std::function<unsigned long (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, char const*, unsigned long, long long, long long, long long, unsigned long)> const&) () from /lib/aarch64-linux-gnu/libecal_core.so.5
#6  0x0000ffffa831d888 in eCAL::CSHMReaderLayer::SetConnectionParameter(eCAL::SReaderLayerPar&) () from /lib/aarch64-linux-gnu/libecal_core.so.5
#7  0x0000ffffa830b608 in eCAL::CDataReader::ApplyLocLayerParameter(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, eCAL::pb::eTLayerType, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /lib/aarch64-linux-gnu/libecal_core.so.5
#8  0x0000ffffa8302d40 in eCAL::CSubGate::ApplyLocPubRegistration(eCAL::pb::Sample const&) () from /lib/aarch64-linux-gnu/libecal_core.so.5
#9  0x0000ffffa832f9f0 in eCAL::CRegistrationReceiver::ApplySample(eCAL::pb::Sample const&) () from /lib/aarch64-linux-gnu/libecal_core.so.5
#10 0x0000ffffa82d153c in eCAL::UDP::CSampleReceiver::Process(char const*, unsigned long) () from /lib/aarch64-linux-gnu/libecal_core.so.5
#11 0x0000ffffa82d05a0 in void eCAL::CCallbackThread::callbackFunction<std::chrono::duration<long, std::ratio<1l, 1000l> > >(std::chrono::duration<long, std::ratio<1l, 1000l> >) () from /lib/aarch64-linux-gnu/libecal_core.so.5
#12 0x0000ffffa86d8f9c in ?? () from /lib/aarch64-linux-gnu/libstdc++.so.6
#13 0x0000ffffa9fd7624 in start_thread (arg=0xffffa86d8f80) at pthread_create.c:477
#14 0x0000ffffa854662c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78
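
One possible reading of the two traces, offered as an assumption and not a confirmed diagnosis: Thread 13 sits inside `CSubGate::ApplyLocPubRegistration` and may be holding a shared (read) lock on the same rwlock that Thread 20 wants exclusively in `CSubGate::Register`, while Thread 13 itself waits on a named mutex that is never released. The sketch below reproduces that pattern with standard C++ primitives only. `Gate`, `table_lock`, `file_lock` and `writer_starves_while_reader_blocked` are hypothetical names and do not come from eCAL; the timeouts exist only so the demonstration terminates.

```cpp
#include <chrono>
#include <future>
#include <mutex>
#include <shared_mutex>
#include <thread>

// Hypothetical stand-ins for the two locks seen in the traces (not eCAL types).
struct Gate {
    std::shared_timed_mutex table_lock;  // role of the rwlock Thread 20 waits on
    std::timed_mutex        file_lock;   // role of the named mutex Thread 13 waits on
};

// Returns true if the writer could not take the exclusive lock within `wait`
// because a reader holds the shared lock while blocked on file_lock.
bool writer_starves_while_reader_blocked(std::chrono::milliseconds wait)
{
    Gate gate;

    // Simulate a mutex whose owner never releases it.
    gate.file_lock.lock();

    std::promise<void> reader_has_shared;
    std::thread reader([&] {
        // Reader side: take the shared lock, then block on the second mutex.
        std::shared_lock<std::shared_timed_mutex> shared(gate.table_lock);
        reader_has_shared.set_value();
        // Bounded wait so the demo ends; a plain lock() would wait forever.
        if (gate.file_lock.try_lock_for(wait * 4)) {
            gate.file_lock.unlock();
        }
    });

    // Make sure the reader really holds the shared lock before the writer tries.
    reader_has_shared.get_future().wait();

    bool acquired = false;
    {
        // Writer side: needs exclusive access, which the blocked reader prevents.
        std::unique_lock<std::shared_timed_mutex> exclusive(gate.table_lock,
                                                            std::defer_lock);
        acquired = exclusive.try_lock_for(wait);
    }

    // Release the "stuck" mutex so the reader can finish and be joined.
    gate.file_lock.unlock();
    reader.join();
    return !acquired;
}
```

Calling `writer_starves_while_reader_blocked(std::chrono::milliseconds(50))` is expected to report that the writer was starved. With untimed locks in place of the `try_lock_for` calls, both threads would wait indefinitely, which matches the shape of the two backtraces.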

How to reproduce

I have no way to manually reproduce it.

How did you get eCAL?

Ubuntu PPA (apt-get)

Environment

eCAL Version: 5.13.0

eCAL System Information

; --------------------------------------------------
; NETWORK SETTINGS
; --------------------------------------------------
; network_enabled                  = true / false                  true  = all eCAL components communicate over network boundaries
;                                                                  false = local host only communication
;
; multicast_config_version         = v1 / v2                       UDP configuration version (Since eCAL 5.12.)
;                                                                    v1: default behavior
;                                                                    v2: new behavior, comes with a bit more intuitive handling regarding masking of the groups
; multicast_group                  = 239.0.0.1                     UDP multicast group base
;                                                                  All registration and logging is sent on this address
; multicast_mask                   = 0.0.0.1-0.0.0.255             v1: Mask maximum number of dynamic multicast group
;                                    255.0.0.0-255.255.255.255     v2: masks are now considered like routes masking
;
; multicast_port                   = 14000 + x                     UDP multicast port number (eCAL will use at least the 2 following port
;                                                                    numbers too, so please modify in steps of 10 (e.g. 1010, 1020 ...)
;
; multicast_ttl                    = 0 + x                         UDP ttl value, also known as hop limit, is used in determining 
;                                                                    the intermediate routers being traversed towards the destination
;
; multicast_sndbuf                 = 1024 * x                      UDP send buffer in bytes
;  
; multicast_rcvbuf                 = 1024 * x                      UDP receive buffer in bytes
;
; multicast_join_all_if            = false                         Linux specific setting to enable joining multicast groups on all network interfaces
;                                                                    independent of their link state. Enabling this makes sure that eCAL processes
;                                                                    receive data if they are started before network devices are up and running.
;  
; bandwidth_max_udp                = -1                            UDP bandwidth limit for eCAL udp layer (-1 == unlimited)
;  
; inproc_rec_enabled               = true                          Enable to receive on eCAL inner process layer
; shm_rec_enabled                  = true                          Enable to receive on eCAL shared memory layer
; udp_mc_rec_enabled               = true                          Enable to receive on eCAL udp multicast layer
;
; npcap_enabled                    = false                         Enable to receive UDP traffic with the Npcap based receiver
;
; tcp_pubsub_num_executor_reader   = 4                             Tcp_pubsub reader amount of threads that shall execute workload
; tcp_pubsub_num_executor_writer   = 4                             Tcp_pubsub writer amount of threads that shall execute workload
; tcp_pubsub_max_reconnections     = 5                             Tcp_pubsub reconnection attempts the session will try to reconnect in
;                                                                    case of an issue (a negative value means infinite reconnection attempts)
;
; host_group_name                  =                               Common host group name that enables interprocess mechanisms across 
;                                                                    (virtual) host borders (e.g, Docker); by default equivalent to local host name
; --------------------------------------------------

[network]
network_enabled                    = false
multicast_config_version           = v1
multicast_group                    = 239.0.0.1
multicast_mask                     = 0.0.0.15
multicast_port                     = 14000
multicast_ttl                      = 2
multicast_sndbuf                   = 5242880
multicast_rcvbuf                   = 5242880

multicast_join_all_if              = false

bandwidth_max_udp                  = -1

inproc_rec_enabled                 = true
shm_rec_enabled                    = true
tcp_rec_enabled                    = true
udp_mc_rec_enabled                 = true

npcap_enabled                      = false

tcp_pubsub_num_executor_reader     = 4
tcp_pubsub_num_executor_writer     = 4
tcp_pubsub_max_reconnections       = 5

host_group_name                    =

; --------------------------------------------------
; COMMON SETTINGS
; --------------------------------------------------
; registration_timeout             = 60000                         Timeout for topic registration in ms (internal)
; registration_refresh             = 1000                          Topic registration refresh cycle (has to be smaller than registration timeout!)

; --------------------------------------------------
[common]
registration_timeout               = 60000
registration_refresh               = 10

; --------------------------------------------------
; TIME SETTINGS
; --------------------------------------------------
; timesync_module_rt               = "ecaltime-localtime"          Time synchronisation interface name (dynamic library)
;                                                                  The name will be extended with platform suffix (32|64), debug suffix (d) and platform extension (.dll|.so)
;
;                                                                  Available modules are:
;                                                                    - ecaltime-localtime    local system time without synchronization        
;                                                                    - ecaltime-linuxptp     For PTP / gPTP synchronization over ethernet on Linux
;                                                                                            (device configuration in ecaltime.ini)
;                                                                    - ecaltime-simtime      Simulation time as published by the eCAL Player.
; --------------------------------------------------
[time]
timesync_module_rt                 = "ecaltime-localtime"

; ---------------------------------------------
; PROCESS SETTINGS
; ---------------------------------------------
;
; terminal_emulator                = /usr/bin/x-terminal-emulator -e    command for starting applications with an external terminal emulator. If empty, the command will be ignored. Ignored on Windows.
;                                                                       e.g.  /usr/bin/x-terminal-emulator -e
;                                                                             /usr/bin/gnome-terminal -x
;                                                                             /usr/bin/xterm -e
;
; ---------------------------------------------
[process]
terminal_emulator                  = 

; --------------------------------------------------
; PUBLISHER SETTINGS
; --------------------------------------------------
; use_inproc                       = 0, 1, 2                       Use inner process transport layer (0 = off, 1 = on, 2 = auto, default = 0)
; use_shm                          = 0, 1, 2                       Use shared memory transport layer (0 = off, 1 = on, 2 = auto, default = 2)
; use_tcp                          = 0, 1, 2                       Use tcp transport layer           (0 = off, 1 = on, 2 = auto, default = 0)
; use_udp_mc                       = 0, 1, 2                       Use udp multicast transport layer (0 = off, 1 = on, 2 = auto, default = 2)
;
; memfile_minsize                  = x * 4096 kB                   Default memory file size for new publisher
;
; memfile_reserve                  = 50 .. x %                     Dynamic file size reserve before recreating memory file if topic size changes
;
; memfile_ack_timeout              = 0 .. x ms                     Publisher timeout for ack event from subscriber that memory file content is processed
;
; memfile_buffer_count             = 1 .. x                        Number of parallel used memory file buffers for 1:n publish/subscribe ipc connections (default = 1)
; memfile_zero_copy                = 0, 1                          Allow matching subscriber to access memory file without copying its content in advance (blocking mode)
;
; share_ttype                      = 0, 1                          Share topic type via registration layer
; share_tdesc                      = 0, 1                          Share topic description via registration layer (switch off to disable reflection)
; --------------------------------------------------
[publisher]
use_inproc                         = 0
use_shm                            = 2
use_tcp                            = 0
use_udp_mc                         = 2

memfile_minsize                    = 4096
memfile_reserve                    = 50
memfile_ack_timeout                = 0
memfile_buffer_count               = 1
memfile_zero_copy                  = 0

share_ttype                        = 1
share_tdesc                        = 1

; --------------------------------------------------
; SERVICE SETTINGS
; --------------------------------------------------
; protocol_v0                      = 0, 1                          Support service protocol v0, eCAL 5.11 and older (0 = off, 1 = on)
; protocol_v1                      = 0, 1                          Support service protocol v1, eCAL 5.12 and newer (0 = off, 1 = on)
; --------------------------------------------------
[service]
protocol_v0                        = 1
protocol_v1                        = 1

; --------------------------------------------------
; MONITORING SETTINGS
; --------------------------------------------------
; timeout                          = 1000 + (x * 1000)             Timeout for topic monitoring in ms
; filter_excl                      = ^__.*$                        Topics blacklist as regular expression (will not be monitored)
; filter_incl                      =                               Topics whitelist as regular expression (will be monitored only)
; filter_log_con                   = info, warning, error, fatal   Log messages logged to console (all, info, warning, error, fatal, debug1, debug2, debug3, debug4)
; filter_log_file                  =                               Log messages logged to the file system
; filter_log_udp                   = info, warning, error, fatal   Log messages logged via udp network
; --------------------------------------------------
[monitoring]
timeout                            = 1000
filter_excl                        = ^__.*$
filter_incl                        =
filter_log_con                     = info, warning, error, fatal
filter_log_file                    =
filter_log_udp                     = info, warning, error, fatal

; --------------------------------------------------
; SYS SETTINGS
; --------------------------------------------------
; filter_excl                      = App1,App2                     Apps blacklist to be excluded when importing tasks from cloud
; --------------------------------------------------
[sys]
filter_excl                        = ^eCALSysClient$|^eCALSysGUI$|^eCALSys$

; --------------------------------------------------
; EXPERIMENTAL SETTINGS
; --------------------------------------------------
; shm_monitoring_enabled           = false                         Enable distribution of monitoring/registration information via shared memory
; shm_monitoring_domain            = ecal_monitoring               Domain name for shared memory based monitoring/registration
; shm_monitoring_queue_size        = 1024                          Queue size of monitoring/registration events
; network_monitoring_disabled      = false                         Disable distribution of monitoring/registration information via network
;
; drop_out_of_order_messages       = false                         Enable dropping of payload messages that arrive out of order
; --------------------------------------------------
[experimental]
shm_monitoring_enabled             = false
shm_monitoring_domain              = ecal_mon
shm_monitoring_queue_size          = 1024
network_monitoring_disabled        = false
drop_out_of_order_messages         = false

kubbo changed the title from "CSubscriber create stucked" to "CSubscriber create dead lock" on Jun 13, 2024

KerstinKeller (Contributor) commented

Hi @kubbo, sorry, somehow we missed your issue.
Which program are you running exactly? Do you have a reproducible sample?
Which eCAL version are you using? We have recently released eCAL 5.12.6 and 5.13.2.
