Manticore should not rely on DNS to resolve cluster nodes when running in Kubernetes.
When a pod restarts because of a node failure, upgrade, etc., its entry is removed from the Kubernetes DNS (CoreDNS). Manticore then crashes on the remaining node with the error below, because it cannot resolve the failed node by name:
[Thu Aug 12 12:18:54.614 2021] [13] FATAL: no AF_INET address found for: backend-manticoresearch-worker-1.backend-manticoresearch-worker
FATAL: no AF_INET address found for: backend-manticoresearch-worker-1.backend-manticoresearch-worker
[Thu Aug 12 12:18:54.673 2021] [1] caught SIGTERM, shutting down
caught SIGTERM, shutting down
------- FATAL: CRASH DUMP -------
[Thu Aug 12 12:18:54.673 2021] [ 1]
[Thu Aug 12 12:19:19.674 2021] [1] WARNING: GlobalCrashQueryGetRef: thread-local info is not set! Use ad-hoc
WARNING: GlobalCrashQueryGetRef: thread-local info is not set! Use ad-hoc
--- crashed invalid query ---
--- request dump end ---
--- local index:
Manticore 3.6.0 96d61d8bf@210504 release
Handling signal 11
Crash!!! Handling signal 11
-------------- backtrace begins here ---------------
Program compiled with 7
Configured with flags: Configured by CMake with these definitions: -DCMAKE_BUILD_TYPE=RelWithDebInfo -DDISTR_BUILD=bionic -DUSE_SSL=ON -DDL_UNIXODBC=1 -DUNIXODBC_LIB=libodbc.so.2 -DDL_EXPAT=1 -DEXPAT_LIB=libexpat.so.1 -DUSE_LIBICONV=1 -DDL_MYSQL=1 -DMYSQL_LIB=libmysqlclient.so.20 -DDL_PGSQL=1 -DPGSQL_LIB=libpq.so.5 -DLOCALDATADIR=/var/data -DFULL_SHARE_DIR=/usr/share/manticore -DUSE_RE2=1 -DUSE_ICU=1 -DUSE_BISON=ON -DUSE_FLEX=ON -DUSE_SYSLOG=1 -DWITH_EXPAT=1 -DWITH_ICONV=ON -DWITH_MYSQL=1 -DWITH_ODBC=ON -DWITH_PGSQL=1 -DWITH_RE2=1 -DWITH_STEMMER=1 -DWITH_ZLIB=ON -DGALERA_SONAME=libgalera_manticore.so.31 -DSYSCONFDIR=/etc/manticoresearch
Host OS is Linux x86_64
Stack bottom = 0x7fff43fad227, thread stack size = 0x20000
Trying manual backtrace:
Something wrong with thread stack, manual backtrace may be incorrect (fp=0x5c95bbd0f9002)
Wrong stack limit or frame pointer, manual backtrace failed (fp=0x5c95bbd0f9002, stack=0x564947870000, stacksize=0x20000)
Trying system backtrace:
begin of system symbols:
searchd(_Z12sphBacktraceib+0xcb)[0x564946faf75b]
searchd(_ZN11CrashLogger11HandleCrashEi+0x1ac)[0x564946dcd66c]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7fa45a7fa890]
searchd(_ZN11CSphNetLoop11StopNetLoopEv+0xa)[0x564946eb978a]
searchd(_Z8Shutdownv+0xd0)[0x564946dd2c00]
searchd(_Z12CheckSignalsv+0x63)[0x564946de04a3]
searchd(_Z8TickHeadv+0x1b)[0x564946de04fb]
searchd(_Z11ServiceMainiPPc+0x1cea)[0x564946dfa5ea]
searchd(main+0x63)[0x564946dcb6a3]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7fa4594b4b97]
searchd(_start+0x2a)[0x564946dcca6a]
-------------- backtrace ends here ---------------
Please, create a bug report in our bug tracker (https://github.com/manticoresoftware/manticore/issues)
and attach there:
a) searchd log, b) searchd binary, c) searchd symbols.
Look into the chapter 'Reporting bugs' in the manual
(https://manual.manticoresearch.com/Reporting_bugs)
Dump with GDB is not available
--- BT to source lines (depth 11): ---
conversion failed (error 'No such file or directory'):
1. Run the command provided below over the crashed binary (for example, 'searchd'):
2. Attach the source.txt to the bug report.
addr2line -e searchd 0x46faf75b 0x46dcd66c 0x5a7fa890 0x46eb978a 0x46dd2c00 0x46de04a3 0x46de04fb
0x46dfa5ea 0x46dcb6a3 0x594b4b97 0x46dcca6a > source.txt
After the cluster node comes back online, the remaining node cannot start because it cannot resolve its own IP: its own entry has been removed from DNS. The NXDOMAIN DNS response is cached again and again by the OS of the Kubernetes cluster node, so the node can no longer start at all.
Although the issue referenced above is resolved, Manticore should not rely on DNS in Kubernetes-based installations, because DNS there is unreliable overall (entries are removed and added dynamically, and responses are cached). Instead, Manticore should query kube-api directly for the IPs of its pods.
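To illustrate what querying kube-api for pod IPs could look like, here is a minimal Python sketch. It is not Manticore code and not a proposed implementation. It assumes the process runs inside a pod with the default service account mount, and that the Manticore pods can be picked out by a label selector. The function names `pod_ips` and `fetch_pod_list` are hypothetical. The sketch reads `status.podIP` from the standard pod list endpoint.

```python
import json
import os
import ssl
import urllib.parse
import urllib.request

# Default in-cluster service account mount (assumes it has not been disabled).
SA_DIR = "/var/run/secrets/kubernetes.io/serviceaccount"


def pod_ips(pod_list):
    """Map pod name -> pod IP from a decoded PodList object.

    Pods that have no IP assigned yet (for example, still pending) are skipped.
    """
    result = {}
    for item in pod_list.get("items", []):
        name = item.get("metadata", {}).get("name")
        ip = item.get("status", {}).get("podIP")
        if name and ip:
            result[name] = ip
    return result


def fetch_pod_list(namespace, label_selector):
    """Fetch the PodList for a namespace from the in-cluster API server.

    Hypothetical helper: assumes an IPv4 API server address in
    KUBERNETES_SERVICE_HOST and a service account allowed to list pods.
    """
    host = os.environ["KUBERNETES_SERVICE_HOST"]
    port = os.environ["KUBERNETES_SERVICE_PORT"]
    with open(os.path.join(SA_DIR, "token")) as f:
        token = f.read().strip()
    query = urllib.parse.urlencode({"labelSelector": label_selector})
    url = f"https://{host}:{port}/api/v1/namespaces/{namespace}/pods?{query}"
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    # Verify the API server certificate against the cluster CA.
    context = ssl.create_default_context(cafile=os.path.join(SA_DIR, "ca.crt"))
    with urllib.request.urlopen(request, context=context, timeout=5) as response:
        return json.load(response)
```

Inside a pod, a call such as `pod_ips(fetch_pod_list("default", "app=manticoresearch"))` would return a name-to-IP mapping; the namespace and label here are placeholders. The service account needs RBAC permission to list pods in that namespace, otherwise the request is rejected with 403.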