Something wrong with thread stack #286

wildshaman · 2019-12-10T15:10:15Z

Manticore 3.1.2 47b6bc2@190822 release

Linux 3.10.0-1062.1.1.el7.x86_64 #1 SMP Fri Sep 13 22:55:44 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Sometimes, while inserting data (different data, different RT indexes), Manticore crashes with this error:

    
Manticore 3.1.2 47b6bc2@190822 release
Handling signal 11
-------------- backtrace begins here ---------------
Program compiled with 4.8.5
Configured with flags: Configured by CMake with these definitions: -DCMAKE_BUILD_TYPE=RelWithDebInfo -DDISTR_BUILD=rhel7 -DDL_UNIXODBC=1 -DUNIXODBC_LIB=libodbc.so.2 -DDL_EXPAT=1 -DEXPAT_LIB=libexpat.so.1 -DUSE_LIBICONV=1 -DDL_MYSQL=1 -DMYSQL_LIB=libmysqlclient.so.18 -DDL_PGSQL=1 -DPGSQL_LIB=libpq.so.5 -DLOCALDATADIR=/var/data -DFULL_SHARE_DIR=/usr/share/manticore -DICU_IS_SHARED=1 -DDL_ICU=1 -DICU_LIB=libicuuc.so.50 -DUSE_BISON=ON -DUSE_FLEX=ON -DUSE_SYSLOG=1 -DWITH_EXPAT=1 -DWITH_ICONV=ON -DWITH_MYSQL=1 -DWITH_ODBC=ON -DWITH_PGSQL=1 -DWITH_RE2=1 -DWITH_STEMMER=1 -DWITH_ZLIB=ON -DGALERA_SOVERSION=31 -DSYSCONFDIR=etc/sphinx
Host OS is Linux runner-72989761-project-3858465-concurrent-0 4.19.23-coreos-r1 #1 SMP Mon Feb 25 23:40:01 -00 2019 x86_64 x86_64 x86_64 GNU/Linux
Stack bottom = 0x7f606450ddbf, thread stack size = 0x100000
Trying manual backtrace:
Something wrong with thread stack, manual backtrace may be incorrect (fp=0x2a7abe00)
Wrong stack limit or frame pointer, manual backtrace failed (fp=0x2a7abe00, stack=0x7f6064510000, stacksize=0x100000)
Trying system backtrace:
begin of system symbols:
searchd(_Z12sphBacktraceib+0x90)[0x6baca0]
searchd(_ZN16SphCrashLogger_c11HandleCrashEi+0x18a)[0x533b8a]
/lib64/libpthread.so.0(+0xf5f0)[0x7f6a689805f0]
searchd(_ZN13CSphIndex_VLN9KillMultiERK11VecTraits_TIlE+0x104)[0x5fc0c4]
searchd(_ZN9RtIndex_c16CommitReplayableEP11RtSegment_tRN3sph8Vector_TIlNS2_13DefaultCopy_TIlEENS2_14DefaultRelimitENS2_16DefaultStorage_TIlEEEEPib+0x14dd)[0x7d625d]
searchd(_ZN9RtIndex_c6CommitEPiP9RtAccum_t+0x144)[0x7d6af4]
searchd(_ZNK15CommitMonitor_c18CommitNonEmptyCmdsEP9RtIndex_iRK20ReplicationCommand_tbR10CSphString+0x6b)[0x5cf12b]
searchd(_ZN15CommitMonitor_c6CommitER10CSphString+0x68)[0x5cf468]
searchd[0x5dcd14]
searchd(_Z20sphHandleMysqlInsertR19StmtErrorReporter_iR9SqlStmt_tbbR10CSphStringR16CSphSessionAccum13ESphCollationRN3sph8Vector_TIlNS8_13DefaultCopy_TIlEENS8_14DefaultRelimitENS8_16DefaultStorage_TIlEEEE+0x1ff7)[0x57e227]
searchd(_ZN16CSphinxqlSession7ExecuteERK10CSphStringR16ISphOutputBufferRhRN7Threads9ThdDesc_tE+0xc8c)[0x5b5c1c]
searchd[0x58e585]
searchd[0x58e914]
searchd(_Z17HandlerThreadFuncPv+0x19)[0x58ebc9]
searchd(_ZN16SphCrashLogger_c13ThreadWrapperEPv+0x44)[0x532b74]
searchd(_Z20sphThreadProcWrapperPv+0x25)[0x6c0b15]
/lib64/libpthread.so.0(+0x7e65)[0x7f6a68978e65]
/lib64/libc.so.6(clone+0x6d)[0x7f6a6718f88d]
-------------- backtrace ends here ---------------


tomatolog · 2019-12-10T15:12:21Z

Could you check the index that causes the crash with indextool?

wildshaman · 2019-12-10T15:15:18Z

I'll run the index check a bit later, thanks.

Just got the error again (while reindexing some of the rows):

-------------- backtrace begins here ---------------
Program compiled with 4.8.5
Configured with flags: Configured by CMake with these definitions: -DCMAKE_BUILD_TYPE=RelWithDebInfo -DDISTR_BUILD=rhel7 -DDL_UNIXODBC=1 -DUNIXODBC_LIB=libodbc.so.2 -DDL_EXPAT=1 -DEXPAT_LIB=libexpat.so.1 -DUSE_LIBICONV=1 -DDL_MYSQL=1 -DMYSQL_LIB=libmysqlclient.so.18 -DDL_PGSQL=1 -DPGSQL_LIB=libpq.so.5 -DLOCALDATADIR=/var/data -DFULL_SHARE_DIR=/usr/share/manticore -DICU_IS_SHARED=1 -DDL_ICU=1 -DICU_LIB=libicuuc.so.50 -DUSE_BISON=ON -DUSE_FLEX=ON -DUSE_SYSLOG=1 -DWITH_EXPAT=1 -DWITH_ICONV=ON -DWITH_MYSQL=1 -DWITH_ODBC=ON -DWITH_PGSQL=1 -DWITH_RE2=1 -DWITH_STEMMER=1 -DWITH_ZLIB=ON -DGALERA_SOVERSION=31 -DSYSCONFDIR=etc/sphinx
Host OS is Linux runner-72989761-project-3858465-concurrent-0 4.19.23-coreos-r1 #1 SMP Mon Feb 25 23:40:01 -00 2019 x86_64 x86_64 x86_64 GNU/Linux
Stack bottom = 0x7f60509f6dbf, thread stack size = 0x100000
Trying manual backtrace:
Something wrong with thread stack, manual backtrace may be incorrect (fp=0x7f5ea906e530)
Wrong stack limit or frame pointer, manual backtrace failed (fp=0x7f5ea906e530, stack=0x7f60509f0000, stacksize=0x100000)
Trying system backtrace:
begin of system symbols:
searchd(_Z12sphBacktraceib+0x90)[0x6baca0]
searchd(_ZN16SphCrashLogger_c11HandleCrashEi+0x18a)[0x533b8a]
/lib64/libpthread.so.0(+0xf5f0)[0x7f6a689805f0]
searchd(_ZN13CSphIndex_VLN9KillMultiERK11VecTraits_TIlE+0x104)[0x5fc0c4]
searchd(_ZN9RtIndex_c16CommitReplayableEP11RtSegment_tRN3sph8Vector_TIlNS2_13DefaultCopy_TIlEENS2_14DefaultRelimitENS2_16DefaultStorage_TIlEEEEPib+0x14dd)[0x7d625d]
searchd(_ZN9RtIndex_c6CommitEPiP9RtAccum_t+0x144)[0x7d6af4]
searchd(_ZNK15CommitMonitor_c18CommitNonEmptyCmdsEP9RtIndex_iRK20ReplicationCommand_tbR10CSphString+0x6b)[0x5cf12b]
searchd(_ZN15CommitMonitor_c6CommitER10CSphString+0x68)[0x5cf468]
searchd[0x5dcd14]
searchd(_Z20sphHandleMysqlInsertR19StmtErrorReporter_iR9SqlStmt_tbbR10CSphStringR16CSphSessionAccum13ESphCollationRN3sph8Vector_TIlNS8_13DefaultCopy_TIlEENS8_14DefaultRelimitENS8_16DefaultStorage_TIlEEEE+0x1ff7)[0x57e227]
searchd(_ZN16CSphinxqlSession7ExecuteERK10CSphStringR16ISphOutputBufferRhRN7Threads9ThdDesc_tE+0xc8c)[0x5b5c1c]
searchd[0x58e585]
searchd[0x58e914]
searchd(_Z17HandlerThreadFuncPv+0x19)[0x58ebc9]
searchd(_ZN16SphCrashLogger_c13ThreadWrapperEPv+0x44)[0x532b74]
searchd(_Z20sphThreadProcWrapperPv+0x25)[0x6c0b15]
/lib64/libpthread.so.0(+0x7e65)[0x7f6a68978e65]
/lib64/libc.so.6(clone+0x6d)[0x7f6a6718f88d]
-------------- backtrace ends here ---------------
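The "Wrong stack limit or frame pointer" lines in the two logs above can be sanity-checked by hand. Below is a minimal sketch, assuming the check is a plain range test (the frame pointer must lie within `stacksize` bytes below the reported `stack` address). The function name is hypothetical and the real check inside searchd may differ in detail.

```python
def fp_within_stack(fp: int, stack: int, stacksize: int) -> bool:
    """Return True if fp lies in [stack - stacksize, stack].

    Assumed form of the check; not taken from the searchd source.
    """
    return stack - stacksize <= fp <= stack


# Values copied from the two crash logs in this thread.
first = fp_within_stack(0x2A7ABE00, 0x7F6064510000, 0x100000)
second = fp_within_stack(0x7F5EA906E530, 0x7F60509F0000, 0x100000)
print(first, second)
```

Under that assumption both frame pointers fall well outside the 1 MiB (0x100000) thread stack, which would explain why the manual backtrace gives up and only the system backtrace is printed.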

wildshaman · 2019-12-10T15:16:45Z

Please tell me the full indextool command (with arguments).

tomatolog · 2019-12-10T15:19:46Z

If indextool reports that the index is damaged, there is no point in inserting more data: you have to clean the index and reinsert the data from scratch.

The command is ./indextool -c your.conf --check index_name

However, the daemon should be shut down first; otherwise indextool cannot check the RAM part of an RT index that is being served by the daemon. Alternatively, copy your RT index to another location and fix the path in your config for the check.
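The offline check described above could be scripted roughly as follows. This is only a sketch: it builds the command lines without running them, it assumes `searchd` and `indextool` are on `PATH`, and the helper name `offline_check_commands` is hypothetical. On installs managed by a service manager, the stop and start steps would likely go through that manager instead.

```python
import shlex
from typing import List


def offline_check_commands(conf: str, index: str) -> List[List[str]]:
    """Build the stop / check / start sequence for one RT index."""
    return [
        # Stop the daemon and wait until it has fully shut down.
        ["searchd", "--config", conf, "--stopwait"],
        # The check command quoted in this thread.
        ["indextool", "-c", conf, "--check", index],
        # Start the daemon again once the check is done.
        ["searchd", "--config", conf],
    ]


for argv in offline_check_commands("your.conf", "index_name"):
    print(shlex.join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)` once the sequence looks right for the installation in question.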

wildshaman · 2019-12-10T15:36:57Z

Maybe there is a way to prevent searchd from crashing when inserting "problem" rows (if the insert touches damaged parts), so that the index does not have to be repopulated if it is damaged?

I can insert data; the error does not appear every time, only sometimes, and in various indexes.

So it would be good to be able to insert the good data and skip the "bad" data, to keep searchd from crashing and restarting (and to avoid losing the data held in RAM when it crashes).

As I understand it, I can ask the question in my native Russian so that I am understood more accurately :)

The gist is this: searchd crashes when I try to insert a batch of data (REPLACE INTO with 100-500 comma-separated value rows). It crashes occasionally: sometimes once every couple of days, sometimes several times a day.

It also crashes on inserts into completely different indexes. Moreover, the latest crashes were in a freshly created (freshly populated) index that would hardly have had time to get corrupted. I cannot run the index check right now because I cannot stop the daemon. So I suspect some bug (though I am not asserting it, of course) rather than broken indexes, since I can insert data into them until some crash happens.

Perhaps the cause can be determined from the debug log above?

It would also be very good if an error on inserting data that causes a crash were simply written to the log (losing the data that cannot be inserted, of course), instead of killing searchd entirely and losing everything that is in RAM and has not yet been flushed to disk.

tomatolog · 2019-12-10T20:30:12Z

If you have binlog enabled (the binlog_path option in the searchd section), nothing is lost: all data that had not yet been saved to the RT index is re-applied to the index when the daemon restarts after the crash. You can read more in the documentation.

The log shows that the crash happens while deleting a document in disk chunks that have already been saved. So without a reproducible case it is hard to tell what exactly the cause is.

If I were you, I would configure binlog_path so that the data inserted into the RT index is saved, and after a crash I would send the RT index together with all its disk chunks, the binlog, and the daemon log showing the query and the crash stack to our FTP for further investigation.
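For readers who want to reproduce this reading of the backtrace, here is a small sketch that parses the system backtrace lines and picks the first named frame after the crash handler. It assumes the unnamed libpthread frame right after `HandleCrash` is the signal trampoline, and it decodes only the leading length-prefixed name parts of a mangled symbol; it is not a full demangler (a tool such as `c++filt` does that properly). All function names in it are hypothetical.

```python
import re

FRAME_RE = re.compile(
    r"^(?P<module>[^(\[]+)"
    r"(?:\((?P<symbol>[^+)]*)(?:\+(?P<offset>0x[0-9a-f]+))?\))?"
    r"\[(?P<addr>0x[0-9a-f]+)\]$"
)


def short_name(symbol):
    """Decode the leading name parts of a mangled symbol, e.g. Class::Method."""
    if not symbol or not symbol.startswith("_Z"):
        return symbol or ""
    rest = symbol[2:]
    nested = rest.startswith("N")
    if nested:
        rest = rest[1:]
        if rest.startswith("K"):  # const-qualified member function
            rest = rest[1:]
    parts = []
    while True:
        m = re.match(r"\d+", rest)
        if not m:
            break
        end = m.end() + int(m.group())
        parts.append(rest[m.end():end])
        rest = rest[end:]
        if not nested:  # a plain function has a single name component
            break
    return "::".join(parts) or symbol


def parse_frame(line):
    """Split one backtrace line into module, symbol, offset, addr, name."""
    m = FRAME_RE.match(line.strip())
    if not m:
        return None
    frame = m.groupdict()
    frame["name"] = short_name(frame["symbol"])
    return frame


def faulting_frame(lines):
    """Return the first named frame below the crash handler, or None."""
    frames = [f for f in map(parse_frame, lines) if f]
    for i, f in enumerate(frames):
        if "HandleCrash" in f["name"]:
            for g in frames[i + 1:]:
                if g["name"]:  # skip the unnamed trampoline frame
                    return g
    return None


LOG = """searchd(_Z12sphBacktraceib+0x90)[0x6baca0]
searchd(_ZN16SphCrashLogger_c11HandleCrashEi+0x18a)[0x533b8a]
/lib64/libpthread.so.0(+0xf5f0)[0x7f6a689805f0]
searchd(_ZN13CSphIndex_VLN9KillMultiERK11VecTraits_TIlE+0x104)[0x5fc0c4]
searchd(_ZN9RtIndex_c6CommitEPiP9RtAccum_t+0x144)[0x7d6af4]
searchd[0x5dcd14]"""

frame = faulting_frame(LOG.splitlines())
print(frame["name"], frame["offset"], frame["addr"])
```

On the frames quoted in this thread the sketch reports `CSphIndex_VLN::KillMulti`, reached from `RtIndex_c::Commit`, which appears consistent with a crash while killing documents during a commit.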
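A minimal config sketch for the binlog suggestion above. Only the `binlog_path` option and the `searchd` section come from this thread; the directory shown is a hypothetical placeholder and should be replaced with a location that searchd can write to.

```
searchd
{
    # Existing searchd settings stay as they are.
    # Hypothetical directory; pick one that searchd can write to.
    binlog_path = /path/to/binlog
}
```

The daemon has to be restarted for a config change like this to take effect.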

manticoresearch added the waiting label Dec 12, 2019

githubmanticore removed the waiting label Dec 16, 2019

hgfdd mentioned this issue Jan 27, 2020

UPDATE MVA is freezing #305

Open
