crash dump searchd #190

Closed
rlattuad opened this issue May 2, 2019 · 9 comments

Comments

@rlattuad

rlattuad commented May 2, 2019

Describe the environment

CentOS
Apache Version | 2.4.38
PHP Version | 7.0.33

Manticore Search version:
Manticore 2.8.0 4006794@190128 release

Describe the problem

searchd Crash

Description of the issue:

Steps to reproduce:
Unable to reproduce

Messages from log files:
------- FATAL: CRASH DUMP -------
[Thu May 2 03:59:45.181 2019] [27885]

--- crashed SphinxQL request dump ---
SELECT * FROM myHB_index2 WHERE MATCH ('"Bouillie bordelaise Rame sotto forma di
preparati a base di calce Copper in the form of prepared lime based "/3') AND (language_code
IN (8, 27, 10)) ORDER BY pubDate DESC LIMIT 0,64 OPTION max_matches=64, ranker=sph04
--- request dump end ---
Manticore 2.8.0 4006794@190128 release
Handling signal 11
-------------- backtrace begins here ---------------
Program compiled with 4.8.5
Configured with flags: Configured by CMake with these definitions: -DCMAKE_BUILD_TYPE=RelWithDebInfo -DDISTR_BUILD=rhel7 -DDL_UNIXODBC=1 -DUNIXODBC_LIB=libodbc.so.2 -DDL_EXPAT=1 -DEXPAT_LIB=libexpat.so.1 -DUSE_LIBICONV=1 -DDL_MYSQL=1 -DMYSQL_LIB=libmysqlclient.so.18 -DDL_PGSQL=1 -DPGSQL_LIB=libpq.so.5 -DDATADIR=/var/data -DFULL_SHARE_DIR=/usr/share/manticore -DSPLIT_SYMBOLS=ON -DUSE_BISON=ON -DUSE_FLEX=ON -DUSE_SYSLOG=1 -DWITH_EXPAT=ON -DWITH_ICONV=ON -DWITH_MYSQL=ON -DWITH_ODBC=ON -DWITH_PGSQL=ON -DWITH_RE2=ON -DWITH_STEMMER=ON -DWITH_ZLIB=ON -DSYSCONFDIR=/etc/sphinx
Host OS is Linux runner-72989761-project-3858465-concurrent-0 4.14.48-coreos-r2 #1 SMP Thu Jun 14 08:23:03 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Stack bottom = 0x7fbef4871eff, thread stack size = 0x40000
Trying manual backtrace:
Frame pointer is null, manual backtrace failed (did you build with -fomit-frame-pointer?)
Trying system backtrace:
begin of system symbols:
/usr/bin/searchd(_Z12sphBacktraceib+0x93)[0x6a0413]
/usr/bin/searchd(_ZN16SphCrashLogger_c11HandleCrashEi+0x187)[0x4f8e77]
/lib64/libpthread.so.0(+0xf5d0)[0x7fbf695225d0]
/usr/bin/searchd(_ZNK9CSphMatch7GetAttrERK15CSphAttrLocator+0x78)[0x55d228]
/usr/bin/searchd(_ZN16MatchGeneric2_fn6IsLessERK9CSphMatchS2_RK24CSphMatchComparatorState+0x17c)[0x6ddd0c]
/usr/bin/searchd(_ZN14CSphMatchQueueI16MatchGeneric2_fnLb0EE4PushERK9CSphMatch+0xe7)[0x6de087]
/usr/bin/searchd(_ZNK13CSphIndex_VLN13MatchExtendedEP16CSphQueryContextPK9CSphQueryiPP15ISphMatchSorterP10ISphRankerii+0x1de)[0x5b575e]
/usr/bin/searchd(_ZNK13CSphIndex_VLN16ParsedMultiQueryEPK9CSphQueryP15CSphQueryResultiPP15ISphMatchSorterRK9XQQuery_tP8CSphDictRK18CSphMultiQueryArgsP18CSphQueryNodeCacheRK20SphWordStatChecker_t+0xfba)[0x5c8cca]
/usr/bin/searchd(_ZNK13CSphIndex_VLN10MultiQueryEPK9CSphQueryP15CSphQueryResultiPP15ISphMatchSorterRK18CSphMultiQueryArgs+0x690)[0x5fe880]
/usr/bin/searchd(_ZNK13CSphIndex_VLN12MultiQueryExEiPK9CSphQueryPP15CSphQueryResultPP15ISphMatchSorterRK18CSphMultiQueryArgs+0x8b0)[0x5e9d60]
/usr/bin/searchd(_ZN15SearchHandler_c16RunLocalSearchesEv+0xa30)[0x541500]
/usr/bin/searchd(_ZN15SearchHandler_c9RunSubsetEii+0x2328)[0x546e18]
/usr/bin/searchd(_ZN15SearchHandler_c10RunQueriesEv+0xb5)[0x5477b5]
/usr/bin/searchd(_Z17HandleMysqlSelectR14SqlRowBuffer_cR15SearchHandler_c+0x1c0)[0x547a60]
/usr/bin/searchd(_ZN16CSphinxqlSession7ExecuteERK10CSphStringR16ISphOutputBufferRhR9ThdDesc_t+0x137c)[0x57feac]
/usr/bin/searchd[0x5551bb]
/usr/bin/searchd(_ZN10ThdJobQL_t4CallEv+0x2b2)[0x555bd2]
/usr/bin/searchd(_ZN11CSphThdPool4TickEPv+0x9b)[0x6abc6b]
/usr/bin/searchd(_Z20sphThreadProcWrapperPv+0x25)[0x6a9265]
/lib64/libpthread.so.0(+0x7dd5)[0x7fbf6951add5]
/lib64/libc.so.6(clone+0x6d)[0x7fbf68406ead]
-------------- backtrace ends here ---------------
--- BT to source lines finished ---
--- 1 active threads ---
thd 0, proto sphinxql, state query, command select
------- CRASH DUMP END -------

@tomatolog
Contributor

It seems the crash is due to an error while matching on the index myHB_index2. Could you check your data with indextool, like this:

./indextool -c your.conf --check myHB_index2 

Then, if indextool finds 0 errors, upload your index to our write-only FTP:

ftp: dev.manticoresearch.com
user: manticorebugs
pass: shithappens

I will replay the query that caused this crash and investigate its root cause.

@rlattuad
Author

rlattuad commented May 3, 2019 via email

@tomatolog
Contributor

No, these "unable to open stopwords" errors seem to be just a tool error, and the index is valid if these are the only errors reported.

@rlattuad
Author

rlattuad commented May 3, 2019

I just uploaded the index to the directory issue-190.

When I run the query from the mysql client, it works OK for me.

Thanks
Roberto

@tomatolog
Contributor

I ran the query that caused the crash, and the daemon now works fine and replies with 0 rows found.

However, the crash log shows that there were matches (which the daemon was trying to sort) at the time of the crash. It seems your index was rebuilt after the crash and no longer causes crashes.

If you still see crashes or dropped connections, upload more crash logs and the data related to them. It is hard to investigate a data-related crash without the data that causes it.
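The reading of the crash log above relies on the symbol names in the system backtrace (for example, the frames in `CSphMatchQueue::Push` and `MatchGeneric2_fn::IsLess` sit just below the signal handler). As a rough sketch, and not part of Manticore itself, the mangled names can be pulled out of such lines with a small script and then passed to a demangler such as `c++filt`. The function name `parse_frames` and the regular expression are assumptions made for this illustration; they only cover the `module(symbol+offset)[address]` layout seen in this log.

```python
import re

# Assumed frame layout, taken from the log above:
#   /path/to/module(mangled_symbol+0xOFFSET)[0xADDRESS]
# The "(symbol+offset)" part may be missing or may lack the symbol name.
FRAME_RE = re.compile(
    r"^(?P<module>[^\s(\[]+)"
    r"(?:\((?P<symbol>[^+)]*)(?:\+(?P<offset>0x[0-9a-fA-F]+))?\))?"
    r"\[(?P<address>0x[0-9a-fA-F]+)\]$"
)


def parse_frames(backtrace_text):
    """Return one dict per recognizable backtrace frame (hypothetical helper)."""
    frames = []
    for line in backtrace_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match is None:
            continue  # not a frame line (headers, separators, and so on)
        frames.append({
            "module": match.group("module"),
            "symbol": match.group("symbol") or None,  # None when no name is present
            "offset": match.group("offset"),
            "address": match.group("address"),
        })
    return frames


# A few frames copied from the crash dump in this issue.
SAMPLE = """\
/lib64/libpthread.so.0(+0xf5d0)[0x7fbf695225d0]
/usr/bin/searchd(_ZNK9CSphMatch7GetAttrERK15CSphAttrLocator+0x78)[0x55d228]
/usr/bin/searchd(_ZN16MatchGeneric2_fn6IsLessERK9CSphMatchS2_RK24CSphMatchComparatorState+0x17c)[0x6ddd0c]
/usr/bin/searchd[0x5551bb]
"""

if __name__ == "__main__":
    # Print only the mangled names, one per line, ready to pipe into a demangler.
    for frame in parse_frames(SAMPLE):
        if frame["symbol"]:
            print(frame["symbol"])
```

Frames without a symbol name are kept with `symbol` set to `None`, so the frame order of the original backtrace is preserved.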

@abhijo89-uc

@rlattuad Can you try with the new release? It comes with a 2 to 3 migration tool.

@rlattuad
Author

rlattuad commented May 7, 2019

I will try, but I now have a fair idea of what happened and what caused the problem.
There was a connection that would stay open for a long time and issue a huge number of queries. The connection was never closed and eventually (despite all settings) would time out. At that point "MySQL was no longer available", but in fact it was just a connection timeout.
After removing the cause (a misbehaving crawler) the problem disappeared.

@abhijo89-uc

@manticoresearch: Do you think too many connections can crash searchd? Any updates?

@rlattuad
Author

Yes: there were too many connections on the mysqld connection/port that would never close. Eventually they would time out, and subsequent queries produced an error.
Extending the timeout or adding threads would simply delay the problem.
I think the problem was caused by a mix of a misbehaving application (a badly written crawler) and PHP pages not checking for some extreme conditions.
Overall, I think searchd behaved correctly, except for the crashes that happened once in a while under heavy load and while restarting, with connections disappearing under the hood.
It is very difficult to trace the whole set of events, really.
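The client-side condition described above (a long-lived connection that eventually times out while the calling code does not check for it) can be handled roughly as in the following sketch. It is written in Python rather than PHP and is only an illustration under stated assumptions: `QueryRunner`, `FakeConnection`, the `connect` factory, and the `max_queries_per_connection` limit are all hypothetical names, and the sketch assumes the client library reports a dropped connection as `ConnectionError`. It is not the code that was used in this issue.

```python
class QueryRunner:
    """Hypothetical wrapper: recycles the connection periodically and
    reconnects once when a query fails with a connection error."""

    def __init__(self, connect, max_queries_per_connection=1000):
        self._connect = connect  # assumed factory returning an object with query() and close()
        self._max = max_queries_per_connection
        self._conn = None
        self._count = 0

    def _reset(self):
        # Close the current connection (if any) and forget it.
        if self._conn is not None:
            try:
                self._conn.close()
            except ConnectionError:
                pass  # the connection is already gone
        self._conn = None
        self._count = 0

    def query(self, sql):
        # Do not keep one connection open for an unbounded number of queries.
        if self._conn is not None and self._count >= self._max:
            self._reset()
        for attempt in range(2):
            if self._conn is None:
                self._conn = self._connect()
            try:
                result = self._conn.query(sql)
            except ConnectionError:
                self._reset()
                if attempt == 1:
                    raise  # second failure in a row: report it to the caller
                continue
            self._count += 1
            return result


class FakeConnection:
    """Stand-in for a real client connection; it fails after `lifetime` queries."""

    opened = 0  # counts how many connections were created

    def __init__(self, lifetime):
        FakeConnection.opened += 1
        self._left = lifetime

    def query(self, sql):
        if self._left <= 0:
            raise ConnectionError("connection timed out")
        self._left -= 1
        return "ok: " + sql

    def close(self):
        pass


if __name__ == "__main__":
    runner = QueryRunner(lambda: FakeConnection(lifetime=2))
    for _ in range(5):
        print(runner.query("SELECT 1"))
```

The retry is limited to one reconnect per query so that a server that is really down still surfaces as an error instead of looping forever.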
