Manticore should not rely on DNS when running in kubernetes. #6

zerthimon · 2021-08-12T12:15:26Z

Manticore should not rely on DNS to resolve cluster nodes when running in Kubernetes.

When a pod restarts because of a node failure, upgrade, etc., its entry is removed from the Kubernetes DNS (CoreDNS). Manticore then crashes on the remaining node with an error, because it cannot resolve the failed node by name:

[Thu Aug 12 12:18:54.614 2021] [13] FATAL: no AF_INET address found for: backend-manticoresearch-worker-1.backend-manticoresearch-worker  
FATAL: no AF_INET address found for: backend-manticoresearch-worker-1.backend-manticoresearch-worker  
[Thu Aug 12 12:18:54.673 2021] [1] caught SIGTERM, shutting down  
caught SIGTERM, shutting down  
------- FATAL: CRASH DUMP -------  
[Thu Aug 12 12:18:54.673 2021] [    1]  
[Thu Aug 12 12:19:19.674 2021] [1] WARNING: GlobalCrashQueryGetRef: thread-local info is not set! Use ad-hoc  
WARNING: GlobalCrashQueryGetRef: thread-local info is not set! Use ad-hoc  
  
--- crashed invalid query ---  
  
--- request dump end ---  
--- local index:  
Manticore 3.6.0 96d61d8bf@210504 release  
Handling signal 11  
Crash!!! Handling signal 11  
-------------- backtrace begins here ---------------  
Program compiled with 7  
Configured with flags: Configured by CMake with these definitions: -DCMAKE_BUILD_TYPE=RelWithDebInfo -DDISTR_BUILD=bionic -DUSE_SSL=ON -DDL_UNIXODBC=1 -DUNIXODBC_LIB=libodbc.so.2 -DDL_EXPAT=1 -DEXPAT_LIB=libexpat.so.1 -DUSE_LIBICONV=1 -DDL_MYSQL=1 -DMYSQL_LIB=libmysqlclient.so.20 -DDL_PGSQL=1 -DPGSQL_LIB=libpq.so.5 -DLOCALDATADIR=/var/data -DFULL_SHARE_DIR=/usr/share/manticore -DUSE_RE2=1 -DUSE_ICU=1 -DUSE_BISON=ON -DUSE_FLEX=ON -DUSE_SYSLOG=1 -DWITH_EXPAT=1 -DWITH_ICONV=ON -DWITH_MYSQL=1 -DWITH_ODBC=ON -DWITH_PGSQL=1 -DWITH_RE2=1 -DWITH_STEMMER=1 -DWITH_ZLIB=ON -DGALERA_SONAME=libgalera_manticore.so.31 -DSYSCONFDIR=/etc/manticoresearch  
Host OS is Linux x86_64  
Stack bottom = 0x7fff43fad227, thread stack size = 0x20000  
Trying manual backtrace:  
Something wrong with thread stack, manual backtrace may be incorrect (fp=0x5c95bbd0f9002)  
Wrong stack limit or frame pointer, manual backtrace failed (fp=0x5c95bbd0f9002, stack=0x564947870000, stacksize=0x20000)  
Trying system backtrace:  
begin of system symbols:  
searchd(_Z12sphBacktraceib+0xcb)[0x564946faf75b]  
searchd(_ZN11CrashLogger11HandleCrashEi+0x1ac)[0x564946dcd66c]  
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7fa45a7fa890]  
searchd(_ZN11CSphNetLoop11StopNetLoopEv+0xa)[0x564946eb978a]  
searchd(_Z8Shutdownv+0xd0)[0x564946dd2c00]  
searchd(_Z12CheckSignalsv+0x63)[0x564946de04a3]  
searchd(_Z8TickHeadv+0x1b)[0x564946de04fb]  
searchd(_Z11ServiceMainiPPc+0x1cea)[0x564946dfa5ea]  
searchd(main+0x63)[0x564946dcb6a3]  
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7fa4594b4b97]  
searchd(_start+0x2a)[0x564946dcca6a]  
-------------- backtrace ends here ---------------  
Please, create a bug report in our bug tracker (https://github.com/manticoresoftware/manticore/issues)  
and attach there:  
a) searchd log, b) searchd binary, c) searchd symbols.  
Look into the chapter 'Reporting bugs' in the manual  
(https://manual.manticoresearch.com/Reporting_bugs)  
Dump with GDB is not available  
--- BT to source lines (depth 11): ---  
conversion failed (error 'No such file or directory'):  
  1. Run the command provided below over the crashed binary (for example, 'searchd'):  
  2. Attach the source.txt to the bug report.  
addr2line -e searchd 0x46faf75b 0x46dcd66c 0x5a7fa890 0x46eb978a 0x46dd2c00 0x46de04a3 0x46de04fb   
0x46dfa5ea 0x46dcb6a3 0x594b4b97 0x46dcca6a > source.txt

After the cluster node comes back online, the remaining node cannot start because it cannot resolve its own IP: its own entry has been removed from DNS. The NXDOMAIN DNS response is cached repeatedly by the Kubernetes cluster node OS, so the node can no longer start at all.
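To illustrate the behaviour being asked for, here is a minimal sketch (in Python, not Manticore's actual code) of peer resolution that treats a failed lookup as "peer temporarily unknown" instead of a fatal error. The names `resolve_ipv4` and `resolve_peers` are hypothetical and only serve the example; the only real API used is the standard library's `socket.getaddrinfo`.

```python
import socket
from typing import Callable, Dict, Iterable, Optional


def resolve_ipv4(host: str) -> str:
    """Return the first IPv4 address for host.

    Raises socket.gaierror when the name cannot be resolved,
    e.g. on an NXDOMAIN response.
    """
    infos = socket.getaddrinfo(host, None, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET, sockaddr is (address, port).
    return infos[0][4][0]


def resolve_peers(
    hosts: Iterable[str],
    resolver: Callable[[str], str] = resolve_ipv4,
) -> Dict[str, Optional[str]]:
    """Resolve every peer, mapping unresolvable names to None.

    A missing DNS entry is recorded rather than propagated, so the
    caller can retry later instead of shutting down.
    """
    resolved: Dict[str, Optional[str]] = {}
    for host in hosts:
        try:
            resolved[host] = resolver(host)
        except socket.gaierror:
            resolved[host] = None  # hypothetical policy: retry on the next pass
    return resolved
```

A caller could re-run `resolve_peers` periodically and keep working with whichever peers currently have an address, which is one possible way to survive a pod's DNS entry disappearing during a restart.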


zerthimon · 2021-08-12T14:03:32Z

While the issue referenced above is resolved, for Kubernetes-based installations Manticore should not rely on DNS, as it is unreliable overall (entries are removed and added dynamically, and responses are cached). Instead, Manticore should query the kube-api directly for the IPs of its pods.
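As a rough sketch of what "query kube-api directly" could look like: the Kubernetes API exposes an Endpoints object per Service (`GET /api/v1/namespaces/<namespace>/endpoints/<service>`), whose `subsets` list pod IPs under `addresses` and `notReadyAddresses`. The Python function below only parses such an object; its name `pod_ips_from_endpoints` is hypothetical, and fetching the JSON (service account token, API server address) is assumed to happen elsewhere. This is an illustration, not something Manticore implements.

```python
from typing import Dict


def pod_ips_from_endpoints(endpoints: dict,
                           include_not_ready: bool = True) -> Dict[str, str]:
    """Map pod name -> pod IP from a Kubernetes Endpoints object (as a dict).

    With include_not_ready=True, pods that are still starting up
    (listed under notReadyAddresses) are returned as well, so a
    restarting peer stays addressable.
    """
    keys = ["addresses"]
    if include_not_ready:
        keys.append("notReadyAddresses")

    ips: Dict[str, str] = {}
    for subset in endpoints.get("subsets") or []:
        for key in keys:
            for addr in subset.get(key) or []:
                # Prefer the pod hostname, then the targetRef name,
                # and fall back to the IP itself as the key.
                name = (addr.get("hostname")
                        or (addr.get("targetRef") or {}).get("name")
                        or addr["ip"])
                ips[name] = addr["ip"]
    return ips
```

Including `notReadyAddresses` is a design choice for this scenario: a pod that is restarting is not yet ready, and excluding it would reproduce the same "peer not found" situation described above.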

sanikolaev · 2021-10-21T09:19:52Z

Can you please elaborate more on

Manticore should not rely on DNS as it's unreliable overall (with entries removed and added dynamically and the DNS caching)

and provide some example?

zerthimon mentioned this issue Aug 12, 2021

Manticore cluster node crashes when unable to resolve a node by name manticoresoftware/manticoresearch#607

Closed

sanikolaev closed this as completed Oct 21, 2021

sanikolaev reopened this Oct 21, 2021

sanikolaev added the waiting Waiting for the original poster (in most cases) or something else label Oct 21, 2021
