Manticore crashed with errors #1782

Closed
Vickcle opened this issue Feb 1, 2024 · 6 comments
Labels
waiting Waiting for the original poster (in most cases) or something else

Comments

@Vickcle

Vickcle commented Feb 1, 2024

Describe the bug
Manticore Search crashed suddenly.

To Reproduce

  1. Create a real-time table that contains several full-text columns.
  2. Insert 30 million rows of test data.
  3. Drop one of the full-text columns.

Expected behavior
The column is dropped successfully and the daemon does not crash.

Describe the environment:

Manticore 6.2.12 dc5144d@230822 (columnar 2.2.4 5aec342@230822) (secondary 2.2.4 5aec342@230822)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2023, Manticore Software LTD (https://manticoresearch.com)

Messages from log files:
--- crashed SphinxAPI request dump ---
AAABIgAAAf8AAAAUAAAAAQAAAEAAAAAAAAAACgAAAAYAAAAIAAAAXihzdW0oKDgqd29yZF9jb3VudCs0Kmxjcysy
KihtaW5faGl0X3Bvcz09MSkrZXhhY3RfaGl0KSp1c2VyX3dlaWdodCkqMTAwMCtibTI1KStsb2cyKDEr
cHYpKjEwMDAAAAAEAAAAFHdlaWdodCBERVNDLCBwdiBERVNDAAAANkBAcmVsYXhlZCBAKHRpdGxlX2pp
ZWJhLHN1bW1hcnlfamllYmEpICJweSBhcm0g5LquIi8xIAAAAAAAAAAFcml2YWwAAAABAAAAAAAAAAD/
/////////wAAAAAAAAAEAAAAAAAAAAoAAAANQGdyb3VwYnkgZGVzY/////8AAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAACAAAAC3RpdGxlX2ppZWJhAAAACgAAAA1zdW1tYXJ5X2ppZWJhAAAABQAAAAAA
AAAAAAAAF2lkLCBwdiwgd2VpZ2h0KCkgd2VpZ2h0AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAQAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAMAAAACaWQAAAACaWQAAAAAAAAAAnB2AAAAAnB2AAAAAAAA
AAZ3ZWlnaHQAAAAId2VpZ2h0KCkAAAAAAAAAAAAAAAAAAAAA
--- request dump end ---
--- local index:rival
Manticore 6.2.12 dc5144d@230822 (columnar 2.2.4 5aec342@230822) (secondary 2.2.4 5aec342@230822)
Handling signal 11
-------------- backtrace begins here ---------------
Program compiled with Clang 15.0.7
Configured with flags: Configured with these definitions: -DDISTR_BUILD=jammy -DUSE_SYSLOG=1 -DWITH_GALERA=1 -DWITH_RE2=1 -DWITH_RE2_FORCE_STATIC=1 -DWITH_STEMMER=1 -DWITH_STEMMER_FORCE_STATIC=1 -DWITH_NLJSON=1 -DWITH_UNIALGO=1 -DWITH_ICU=1 -DWITH_ICU_FORCE_STATIC=1 -DWITH_SSL=1 -DWITH_ZLIB=1 -DWITH_ZSTD=1 -DDL_ZSTD=1 -DZSTD_LIB=libzstd.so.1 -DWITH_CURL=1 -DDL_CURL=1 -DCURL_LIB=libcurl.so.4 -DWITH_ODBC=1 -DDL_ODBC=1 -DODBC_LIB=libodbc.so.2 -DWITH_EXPAT=1 -DDL_EXPAT=1 -DEXPAT_LIB=libexpat.so.1 -DWITH_ICONV=1 -DWITH_MYSQL=1 -DDL_MYSQL=1 -DMYSQL_LIB=libmysqlclient.so.21 -DWITH_POSTGRESQL=1 -DDL_POSTGRESQL=1 -DPOSTGRESQL_LIB=libpq.so.5 -DLOCALDATADIR=/var/lib/manticore -DFULL_SHARE_DIR=/usr/share/manticore
Built on Linux x86_64 (jammy) (cross-compiled)
Stack bottom = 0x7f909c025da0, thread stack size = 0x20000
Trying manual backtrace:
Something wrong with thread stack, manual backtrace may be incorrect (fp=0x1)
Wrong stack limit or frame pointer, manual backtrace failed (fp=0x1, stack=0x7f909c020000, stacksize=0x20000)
Trying system backtrace:
begin of system symbols:
searchd(_Z12sphBacktraceib+0x22a)[0x55e8fe01d9fa]
searchd(_ZN11CrashLogger11HandleCrashEi+0x355)[0x55e8fde9c2b5]
/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f9322693520]
searchd(_ZNK13Expr_GetInt_c4EvalERK9CSphMatch+0x38)[0x55e8febd3c88]
searchd(_ZNK10Expr_Add_c4EvalERK9CSphMatch+0x27)[0x55e8febdd1c7]
searchd(_ZNK11Expr_Log2_c4EvalERK9CSphMatch+0xb)[0x55e8febd5d5b]
searchd(_ZNK11Expr_Madd_c4EvalERK9CSphMatch+0x27)[0x55e8febd8627]
searchd(_ZN19RankerState_Expr_fnILb0ELb0EE8FinalizeERK9CSphMatch+0x3b)[0x55e8fec27e0b]
searchd(_ZN17ExtRanker_State_TI19RankerState_Expr_fnILb0ELb0EELb1EE10GetMatchesEv+0x4c6)[0x55e8fec26b26]
searchd(+0x1c1cd66)[0x55e8fec5ed66]
searchd(+0x1c0dc9d)[0x55e8fec4fc9d]
searchd(_ZNK9RtIndex_c10MultiQueryER15CSphQueryResultRK9CSphQueryRK11VecTraits_TIP15ISphMatchSorterERK18CSphMultiQueryArgs+0x1c20)[0x55e8fec4e310]
searchd(_ZNK13CSphIndexStub12MultiQueryExEiPK9CSphQueryP15CSphQueryResultPP15ISphMatchSorterRK18CSphMultiQueryArgs+0x71)[0x55e8fdfea1b1]
searchd(+0xebf20b)[0x55e8fdf0120b]
searchd(+0x1d223cd)[0x55e8fed643cd]
searchd(_ZN7Threads4Coro8ExecuteNEiOSt8functionIFvvEE+0x78)[0x55e8ff169ff8]
searchd(_ZN15SearchHandler_c16RunLocalSearchesEv+0xb39)[0x55e8fdeb2689]
searchd(_ZN15SearchHandler_c9RunSubsetEii+0x51a)[0x55e8fdeb39ea]
searchd(_ZN15SearchHandler_c10RunQueriesEv+0xd4)[0x55e8fdeb06e4]
searchd(_Z19HandleCommandSearchR16ISphOutputBuffertR13InputBuffer_c+0x333)[0x55e8fdebb4d3]
searchd(_Z8ApiServeSt10unique_ptrI16AsyncNetBuffer_cSt14default_deleteIS0_EE+0x7b1)[0x55e8fde2a011]
searchd(_Z10MultiServeSt10unique_ptrI16AsyncNetBuffer_cSt14default_deleteIS0_EESt4pairIitE7Proto_e+0x12e)[0x55e8fde27ffe]
searchd(+0xde6af2)[0x55e8fde28af2]
searchd(ZZN7Threads11CoRoutine_c13CreateContextESt8functionIFvvEESt4pairIN5boost7context13stack_contextENS_14StackFlavour_EEEENUlNS6_6detail10transfer_tEE_8__invokeESB+0x1c)[0x55e8ff16cdac]
searchd(make_fcontext+0x37)[0x55e8ff18d167]
Trying boost backtrace:
0# sphBacktrace(int, bool) in searchd
1# CrashLogger::HandleCrash(int) in searchd
2# 0x00007F9322693520 in /lib/x86_64-linux-gnu/libc.so.6
3# Expr_GetInt_c::Eval(CSphMatch const&) const in searchd
4# Expr_Add_c::Eval(CSphMatch const&) const in searchd
5# Expr_Log2_c::Eval(CSphMatch const&) const in searchd
6# Expr_Madd_c::Eval(CSphMatch const&) const in searchd
7# RankerState_Expr_fn<false, false>::Finalize(CSphMatch const&) in searchd
8# ExtRanker_State_T<RankerState_Expr_fn<false, false>, true>::GetMatches() in searchd
9# 0x000055E8FEC5ED66 in searchd
10# 0x000055E8FEC4FC9D in searchd
11# RtIndex_c::MultiQuery(CSphQueryResult&, CSphQuery const&, VecTraits_T<ISphMatchSorter*> const&, CSphMultiQueryArgs const&) const in searchd
12# CSphIndexStub::MultiQueryEx(int, CSphQuery const*, CSphQueryResult*, ISphMatchSorter**, CSphMultiQueryArgs const&) const in searchd
13# 0x000055E8FDF0120B in searchd
14# 0x000055E8FED643CD in searchd
15# Threads::Coro::ExecuteN(int, std::function<void ()>&&) in searchd
16# SearchHandler_c::RunLocalSearches() in searchd
17# SearchHandler_c::RunSubset(int, int) in searchd
18# SearchHandler_c::RunQueries() in searchd
19# HandleCommandSearch(ISphOutputBuffer&, unsigned short, InputBuffer_c&) in searchd
20# ApiServe(std::unique_ptr<AsyncNetBuffer_c, std::default_delete<AsyncNetBuffer_c> >) in searchd
21# MultiServe(std::unique_ptr<AsyncNetBuffer_c, std::default_delete<AsyncNetBuffer_c> >, std::pair<int, unsigned short>, Proto_e) in searchd
22# 0x000055E8FDE28AF2 in searchd
23# Threads::CoRoutine_c::CreateContext(std::function<void ()>, std::pair<boost::context::stack_context, Threads::StackFlavour_E>)::{lambda(boost::context::detail::transfer_t)#1}::__invoke(boost::context::detail::transfer_t) in searchd
24# make_fcontext in searchd

-------------- backtrace ends here ---------------
Please, create a bug report in our bug tracker (https://github.com/manticoresoftware/manticore/issues)
and attach there:
a) searchd log, b) searchd binary, c) searchd symbols.
Look into the chapter 'Reporting bugs' in the manual
(https://manual.manticoresearch.com/Reporting_bugs)
Dump with GDB via watchdog
--- active threads ---
thd 0 (work_9), proto sphinx, state query, command search
--- Totally 4 threads, and 1 client-working threads ---
------- CRASH DUMP END -------
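
The "crashed SphinxAPI request dump" block in the log above is base64 text, so the query that was running at crash time can be partly recovered from it. Below is a minimal sketch, not an official Manticore tool: the function name `extract_strings` and the `min_len` threshold are hypothetical, and the sketch does not parse the binary SphinxAPI protocol. It only decodes the dump and lists runs of printable ASCII, which should be enough to surface things like the table name and the ranker or sort expressions.

```python
import base64
import re


def extract_strings(dump_b64: str, min_len: int = 4) -> list[str]:
    """Return printable ASCII runs found in a base64-encoded request dump.

    Hypothetical helper: it does not understand the SphinxAPI wire format,
    it only looks for human-readable fragments inside the decoded bytes.
    """
    # The log wraps the dump across several lines; drop all whitespace first.
    compact = "".join(dump_b64.split())
    raw = base64.b64decode(compact)
    # Runs of printable ASCII (space through tilde), at least min_len bytes long.
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, raw)]
```

To use it, paste the lines between "request dump" and "request dump end" into a string and pass it to `extract_strings`. Non-ASCII query terms will not appear in the result because only ASCII runs are matched.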

Additional context

@sanikolaev
Collaborator

Thanks for reporting. Can you please:

  • check that the table is valid by running indextool --check
  • check whether you can reproduce the crash in the latest dev version (https://mnt.cr/nightly)?

@sanikolaev added the "waiting" label Feb 1, 2024
@Vickcle
Author

Vickcle commented Feb 1, 2024

When I run indextool --check, it reports: "check FAILED, 99 of 2370815 failures reported, 6.5 sec elapsed
checking disk chunk, extension 8613, 1(192)...
checking schema...
checking dictionary...
checking data...
checking rows...
checking attribute blocks index...
checking kill-list...
checking docstore...
checking dead row map...
checking doc-id lookup...
check FAILED, 99 of 2370815 failures reported, 12.8 sec elapsed"
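
When this check is run across many tables or replicas, the summary lines can be picked out of the output automatically. The following is a small sketch that parses only the "check FAILED" line format quoted above; the function names are hypothetical, and the meaning assigned to the two counts (failures reported versus failures found) is an assumption based on the wording of the message.

```python
import re

# Matches summary lines such as:
#   check FAILED, 99 of 2370815 failures reported, 6.5 sec elapsed
FAILED_LINE = re.compile(
    r"check FAILED, (\d+) of (\d+) failures reported, ([\d.]+) sec elapsed"
)


def failed_summaries(output: str) -> list[tuple[int, int, float]]:
    """Return (reported, total, seconds) for every FAILED summary line."""
    return [
        (int(reported), int(total), float(seconds))
        for reported, total, seconds in FAILED_LINE.findall(output)
    ]


def has_failures(output: str) -> bool:
    """True if the captured indextool output contains any FAILED summary."""
    return bool(failed_summaries(output))
```

Note that `has_failures` returning False only means no FAILED summary was seen. It does not prove the check passed, for example if the output is empty or the tool did not run.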

@Vickcle
Author

Vickcle commented Feb 1, 2024

I guess something went wrong when I tried to drop a full-text column, and the operation may not have completed.

@sanikolaev
Collaborator

I guess something went wrong when I tried to drop a full-text column, and the operation may not have completed.

Yes, it might have corrupted the table. As said in the docs about ALTER:

❗It's recommended to backup table files before ALTERing it to avoid data corruption in case of a sudden power interruption or other similar issues.

If you have a backup, I recommend restoring from it.

@Vickcle
Copy link
Author

Vickcle commented Feb 2, 2024

Okay, we have more than one replica, so that's acceptable. But should ALTER be able to redo or resume the operation after an interruption? I mean not relying on a backup, but being able to continue the drop-column job.

@Vickcle closed this as completed Feb 2, 2024
@Vickcle reopened this Feb 2, 2024
@Vickcle closed this as completed Feb 2, 2024
@sanikolaev
Collaborator

I mean not relying on a backup, but being able to continue the drop-column job.

AFAIK, there is no known issue where ALTER corrupts a table. In most cases it turns out that the table was already corrupted before the ALTER, and the ALTER then crashes. ALTER can be considered safe, especially if you make a backup and run indextool --check before it.
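
One way to make the "backup before ALTER" step routine is a plain file copy with verification. This is a rough sketch under several assumptions: it is not Manticore's own backup tooling, the function names are hypothetical, and it assumes the table files are not being written while the copy runs (for instance, searchd is stopped). The paths in the usage note are illustrative guesses based on the `LOCALDATADIR` shown in the crash log.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large table files are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def digests(root: Path) -> dict[str, str]:
    """Map each file's path relative to root to its SHA-256 digest."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def backup_table_dir(table_dir: Path, backup_dir: Path) -> dict[str, str]:
    """Copy table_dir to backup_dir and verify the copy file by file.

    Assumes nothing modifies table_dir during the copy. copytree raises
    FileExistsError if backup_dir already exists, so an older backup is
    never silently overwritten.
    """
    shutil.copytree(table_dir, backup_dir)
    source, copy = digests(table_dir), digests(backup_dir)
    if source != copy:
        raise RuntimeError("backup does not match the source table files")
    return copy
```

A call might look like `backup_table_dir(Path("/var/lib/manticore/rival"), Path("/backups/rival-before-alter"))`, where both paths are assumptions. Only after the copy verifies and indextool --check reports no failures would the ALTER be run.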

2 participants