Manticore crashed with errors #1782

Vickcle · 2024-02-01T06:41:58Z

Describe the bug
Manticore Search suddenly crashed.

To Reproduce

Create a real-time table that contains several full-text columns.
Insert 30 million rows of test data.
Drop one of the full-text columns.

Expected behavior
The column is dropped successfully, with no sudden crash.

Describe the environment:

Manticore 6.2.12 dc5144d@230822 (columnar 2.2.4 5aec342@230822) (secondary 2.2.4 5aec342@230822)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2023, Manticore Software LTD (https://manticoresearch.com)

OS version (uname -a if on a Unix-like system):
Linux 3.10.0-1127.el7.x86_64 #1 SMP Tue Mar 31 23:36:51 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Messages from log files:
--- crashed SphinxAPI request dump ---
AAABIgAAAf8AAAAUAAAAAQAAAEAAAAAAAAAACgAAAAYAAAAIAAAAXihzdW0oKDgqd29yZF9jb3VudCs0Kmxjcysy
KihtaW5faGl0X3Bvcz09MSkrZXhhY3RfaGl0KSp1c2VyX3dlaWdodCkqMTAwMCtibTI1KStsb2cyKDEr
cHYpKjEwMDAAAAAEAAAAFHdlaWdodCBERVNDLCBwdiBERVNDAAAANkBAcmVsYXhlZCBAKHRpdGxlX2pp
ZWJhLHN1bW1hcnlfamllYmEpICJweSBhcm0g5LquIi8xIAAAAAAAAAAFcml2YWwAAAABAAAAAAAAAAD/
/////////wAAAAAAAAAEAAAAAAAAAAoAAAANQGdyb3VwYnkgZGVzY/////8AAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAACAAAAC3RpdGxlX2ppZWJhAAAACgAAAA1zdW1tYXJ5X2ppZWJhAAAABQAAAAAA
AAAAAAAAF2lkLCBwdiwgd2VpZ2h0KCkgd2VpZ2h0AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAQAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAMAAAACaWQAAAACaWQAAAAAAAAAAnB2AAAAAnB2AAAAAAAA
AAZ3ZWlnaHQAAAAId2VpZ2h0KCkAAAAAAAAAAAAAAAAAAAAA
--- request dump end ---
--- local index:rival
Manticore 6.2.12 dc5144d@230822 (columnar 2.2.4 5aec342@230822) (secondary 2.2.4 5aec342@230822)
Handling signal 11
-------------- backtrace begins here ---------------
Program compiled with Clang 15.0.7
Configured with flags: Configured with these definitions: -DDISTR_BUILD=jammy -DUSE_SYSLOG=1 -DWITH_GALERA=1 -DWITH_RE2=1 -DWITH_RE2_FORCE_STATIC=1 -DWITH_STEMMER=1 -DWITH_STEMMER_FORCE_STATIC=1 -DWITH_NLJSON=1 -DWITH_UNIALGO=1 -DWITH_ICU=1 -DWITH_ICU_FORCE_STATIC=1 -DWITH_SSL=1 -DWITH_ZLIB=1 -DWITH_ZSTD=1 -DDL_ZSTD=1 -DZSTD_LIB=libzstd.so.1 -DWITH_CURL=1 -DDL_CURL=1 -DCURL_LIB=libcurl.so.4 -DWITH_ODBC=1 -DDL_ODBC=1 -DODBC_LIB=libodbc.so.2 -DWITH_EXPAT=1 -DDL_EXPAT=1 -DEXPAT_LIB=libexpat.so.1 -DWITH_ICONV=1 -DWITH_MYSQL=1 -DDL_MYSQL=1 -DMYSQL_LIB=libmysqlclient.so.21 -DWITH_POSTGRESQL=1 -DDL_POSTGRESQL=1 -DPOSTGRESQL_LIB=libpq.so.5 -DLOCALDATADIR=/var/lib/manticore -DFULL_SHARE_DIR=/usr/share/manticore
Built on Linux x86_64 (jammy) (cross-compiled)
Stack bottom = 0x7f909c025da0, thread stack size = 0x20000
Trying manual backtrace:
Something wrong with thread stack, manual backtrace may be incorrect (fp=0x1)
Wrong stack limit or frame pointer, manual backtrace failed (fp=0x1, stack=0x7f909c020000, stacksize=0x20000)
Trying system backtrace:
begin of system symbols:
searchd(_Z12sphBacktraceib+0x22a)[0x55e8fe01d9fa]
searchd(_ZN11CrashLogger11HandleCrashEi+0x355)[0x55e8fde9c2b5]
/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f9322693520]
searchd(_ZNK13Expr_GetInt_c4EvalERK9CSphMatch+0x38)[0x55e8febd3c88]
searchd(_ZNK10Expr_Add_c4EvalERK9CSphMatch+0x27)[0x55e8febdd1c7]
searchd(_ZNK11Expr_Log2_c4EvalERK9CSphMatch+0xb)[0x55e8febd5d5b]
searchd(_ZNK11Expr_Madd_c4EvalERK9CSphMatch+0x27)[0x55e8febd8627]
searchd(_ZN19RankerState_Expr_fnILb0ELb0EE8FinalizeERK9CSphMatch+0x3b)[0x55e8fec27e0b]
searchd(_ZN17ExtRanker_State_TI19RankerState_Expr_fnILb0ELb0EELb1EE10GetMatchesEv+0x4c6)[0x55e8fec26b26]
searchd(+0x1c1cd66)[0x55e8fec5ed66]
searchd(+0x1c0dc9d)[0x55e8fec4fc9d]
searchd(_ZNK9RtIndex_c10MultiQueryER15CSphQueryResultRK9CSphQueryRK11VecTraits_TIP15ISphMatchSorterERK18CSphMultiQueryArgs+0x1c20)[0x55e8fec4e310]
searchd(_ZNK13CSphIndexStub12MultiQueryExEiPK9CSphQueryP15CSphQueryResultPP15ISphMatchSorterRK18CSphMultiQueryArgs+0x71)[0x55e8fdfea1b1]
searchd(+0xebf20b)[0x55e8fdf0120b]
searchd(+0x1d223cd)[0x55e8fed643cd]
searchd(_ZN7Threads4Coro8ExecuteNEiOSt8functionIFvvEE+0x78)[0x55e8ff169ff8]
searchd(_ZN15SearchHandler_c16RunLocalSearchesEv+0xb39)[0x55e8fdeb2689]
searchd(_ZN15SearchHandler_c9RunSubsetEii+0x51a)[0x55e8fdeb39ea]
searchd(_ZN15SearchHandler_c10RunQueriesEv+0xd4)[0x55e8fdeb06e4]
searchd(_Z19HandleCommandSearchR16ISphOutputBuffertR13InputBuffer_c+0x333)[0x55e8fdebb4d3]
searchd(_Z8ApiServeSt10unique_ptrI16AsyncNetBuffer_cSt14default_deleteIS0_EE+0x7b1)[0x55e8fde2a011]
searchd(_Z10MultiServeSt10unique_ptrI16AsyncNetBuffer_cSt14default_deleteIS0_EESt4pairIitE7Proto_e+0x12e)[0x55e8fde27ffe]
searchd(+0xde6af2)[0x55e8fde28af2]
searchd(_ZZN7Threads11CoRoutine_c13CreateContextESt8functionIFvvEESt4pairIN5boost7context13stack_contextENS_14StackFlavour_EEEENUlNS6_6detail10transfer_tEE_8__invokeESB_+0x1c)[0x55e8ff16cdac]
searchd(make_fcontext+0x37)[0x55e8ff18d167]
Trying boost backtrace:
0# sphBacktrace(int, bool) in searchd
1# CrashLogger::HandleCrash(int) in searchd
2# 0x00007F9322693520 in /lib/x86_64-linux-gnu/libc.so.6
3# Expr_GetInt_c::Eval(CSphMatch const&) const in searchd
4# Expr_Add_c::Eval(CSphMatch const&) const in searchd
5# Expr_Log2_c::Eval(CSphMatch const&) const in searchd
6# Expr_Madd_c::Eval(CSphMatch const&) const in searchd
7# RankerState_Expr_fn<false, false>::Finalize(CSphMatch const&) in searchd
8# ExtRanker_State_T<RankerState_Expr_fn<false, false>, true>::GetMatches() in searchd
9# 0x000055E8FEC5ED66 in searchd
10# 0x000055E8FEC4FC9D in searchd
11# RtIndex_c::MultiQuery(CSphQueryResult&, CSphQuery const&, VecTraits_T<ISphMatchSorter*> const&, CSphMultiQueryArgs const&) const in searchd
12# CSphIndexStub::MultiQueryEx(int, CSphQuery const*, CSphQueryResult*, ISphMatchSorter**, CSphMultiQueryArgs const&) const in searchd
13# 0x000055E8FDF0120B in searchd
14# 0x000055E8FED643CD in searchd
15# Threads::Coro::ExecuteN(int, std::function<void ()>&&) in searchd
16# SearchHandler_c::RunLocalSearches() in searchd
17# SearchHandler_c::RunSubset(int, int) in searchd
18# SearchHandler_c::RunQueries() in searchd
19# HandleCommandSearch(ISphOutputBuffer&, unsigned short, InputBuffer_c&) in searchd
20# ApiServe(std::unique_ptr<AsyncNetBuffer_c, std::default_delete<AsyncNetBuffer_c> >) in searchd
21# MultiServe(std::unique_ptr<AsyncNetBuffer_c, std::default_delete<AsyncNetBuffer_c> >, std::pair<int, unsigned short>, Proto_e) in searchd
22# 0x000055E8FDE28AF2 in searchd
23# Threads::CoRoutine_c::CreateContext(std::function<void ()>, std::pair<boost::context::stack_context, Threads::StackFlavour_E>)::{lambda(boost::context::detail::transfer_t)#1}::__invoke(boost::context::detail::transfer_t) in searchd
24# make_fcontext in searchd

-------------- backtrace ends here ---------------
Please, create a bug report in our bug tracker (https://github.com/manticoresoftware/manticore/issues)
and attach there:
a) searchd log, b) searchd binary, c) searchd symbols.
Look into the chapter 'Reporting bugs' in the manual
(https://manual.manticoresearch.com/Reporting_bugs)
Dump with GDB via watchdog
--- active threads ---
thd 0 (work_9), proto sphinx, state query, command search
--- Totally 4 threads, and 1 client-working threads ---
------- CRASH DUMP END -------
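
Editor's note: the "crashed SphinxAPI request dump" block in the log above appears to be a base64-encoded binary request. The sketch below is one way to peek inside it. It assumes the dump is standard base64 and simply pulls out runs of printable ASCII instead of parsing the SphinxAPI wire format. The function name `printable_runs` and the constant `DUMP_FRAGMENT` are illustrative only. The fragment is a single line copied from the dump; joining all dump lines should work the same way, provided the dump was copied intact.

```python
import base64
import re

# One line copied verbatim from the request dump in the crash log above.
DUMP_FRAGMENT = (
    "cHYpKjEwMDAAAAAEAAAAFHdlaWdodCBERVNDLCBwdiBERVNDAAAANkBAcmVsYXhlZCBAKHRpdGxlX2pp"
)


def printable_runs(b64_text: str, min_len: int = 4) -> list[str]:
    """Decode base64 text and return runs of printable ASCII of at least min_len bytes.

    This is a heuristic, not a protocol parser: length-prefix bytes that
    happen to be printable will show up glued to the front of a string.
    """
    # Drop line breaks and spaces so a multi-line dump can be passed as-is.
    raw = base64.b64decode("".join(b64_text.split()))
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.decode("ascii") for m in pattern.findall(raw)]


if __name__ == "__main__":
    for run in printable_runs(DUMP_FRAGMENT):
        print(run)
```

On this fragment the runs include the tail of the ranker expression (`pv)*1000`) and the sort clause (`weight DESC, pv DESC`). That is consistent with the Expr_Log2_c, Expr_Add_c and Expr_GetInt_c frames in the backtrace, which suggests the crash happened while the ranker expression was reading the pv attribute.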

Additional context


sanikolaev · 2024-02-01T06:45:02Z

Thanks for reporting. Can you please:

check that the table is valid by running indextool --check
check if you can reproduce the crash in the latest dev version (https://mnt.cr/nightly)?

Vickcle · 2024-02-01T07:45:23Z

When I run indextool --check, it reports: "check FAILED, 99 of 2370815 failures reported, 6.5 sec elapsed
checking disk chunk, extension 8613, 1(192)...
checking schema...
checking dictionary...
checking data...
checking rows...
checking attribute blocks index...
checking kill-list...
checking docstore...
checking dead row map...
checking doc-id lookup...
check FAILED, 99 of 2370815 failures reported, 12.8 sec elapsed"

Vickcle · 2024-02-01T07:52:34Z

I guess something went wrong when I tried to drop a full-text column, and the process may not have actually finished.

sanikolaev · 2024-02-01T07:54:27Z

> I guess something went wrong when I tried to drop a full-text column, and the process may not have actually finished.

Yes, it might have corrupted the table. As said in the docs about ALTER:

> ❗It's recommended to backup table files before ALTERing it to avoid data corruption in case of a sudden power interruption or other similar issues.

If you have a backup, I recommend restoring from it.

Vickcle · 2024-02-02T01:13:27Z

Okay, we have more than one replica, so that's acceptable. But should Manticore consider redoing or resuming the operation, so that an interruption does not leave it half-done? I mean not relying on a backup, but being able to continue the drop-column job.

sanikolaev · 2024-02-02T06:05:19Z

> I mean not relying on a backup, but being able to continue the drop-column job.

AFAIK, there's no known issue where an ALTER corrupts a table. In most cases it turns out that the table was already corrupted before the ALTER, and then the ALTER crashes. ALTER can be considered safe, especially when you make a backup and run indextool --check before it.
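
Editor's note: one way to act on this advice is to capture the indextool --check output before an ALTER and refuse to proceed if it contains a failure summary. The sketch below only parses text that has already been captured. It assumes the summary line has the "check FAILED, N of M failures reported" shape quoted earlier in this thread. The function name `check_failures` is illustrative, and what the second number counts is not stated in the thread, so it is just called `total` here.

```python
import re

# Shape of the failure summary as quoted in this thread.
SUMMARY_RE = re.compile(r"check FAILED, (\d+) of (\d+) failures reported")


def check_failures(output: str):
    """Return (failures, total) from the last 'check FAILED' line, or None.

    None only means no failure summary was found in the text. It is not
    proof that the table is healthy, so the caller should also confirm
    that the check actually ran to completion.
    """
    matches = SUMMARY_RE.findall(output)
    if not matches:
        return None
    failures, total = matches[-1]
    return int(failures), int(total)


if __name__ == "__main__":
    sample = "check FAILED, 99 of 2370815 failures reported, 12.8 sec elapsed"
    result = check_failures(sample)
    if result is not None:
        print(f"not running ALTER: {result[0]} failures reported")
```

Taking the last match is a deliberate choice: the output quoted above contains more than one summary line, and the final one reflects the whole run.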

sanikolaev added the waiting label Feb 1, 2024

Vickcle closed this as completed Feb 2, 2024

Vickcle reopened this Feb 2, 2024

Vickcle closed this as completed Feb 2, 2024
