intermittent core dump when running query with SNIPPET() function #387
Comments
Downgraded to Manticore 3.5.0 1d34c49@200722; the issue no longer seems to occur, so the bug was introduced sometime after that build. Will continue monitoring. It seems the variable pIndex passed into the method is null; I don't have sufficient familiarity with the code base to debug this. |
Could you provide your config and idx_subtitles_content_dev so we can debug the crash locally? Is the index plain or RT? Do you reindex your data and rotate that index? |
Do you mean the entire index? It's around 20GB, I don't think it's feasible. |
You can upload all your private data to our write-only FTP
However, a core file can usually be examined only on the box where the crash happened, so it might be useless on our box. Could you upload your daemon log, with the events logged before and after the crash? |
nice password, uploaded to /github-issue-387. There doesn't seem to be much interesting in the daemon log. |
I will upload the idx_subtitles_content_dev_2 shard shortly, it will be around 2GB compressed. Hopefully it will be sufficient to reproduce the problem. |
Upload completed into /github-issue-387 |
Hello, this issue still happens for me in 3.5.4 13f8d08@201211 , intermittently as before. |
@popalot2 Hello. I can't find your config either in the FTP folder or here. Can you upload it to the same folder or post it in this issue? |
test fix for intermittent core dump when running query with SNIPPET() function manticoresoftware#387. The cause is that the mutable ExprHook_c m_tHook is shared across all threads. When a parallel local query uses SNIPPET there is a race condition: multiple threads call tCtx.m_tHook.SetIndex ( pServed->m_pIndex ) during parsing, which causes a core dump. Added a lock to prevent the race condition. This should probably use a different hook per thread, but I am not familiar enough with the code base to make such a fix. Tested extensively and the problem no longer occurs. The performance impact seems negligible, but it might be noticeable for a workload of many small queries.
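To make the "different hook per thread" idea concrete, here is a minimal sketch, not Manticore's actual code. `Index`, `Hook` and `CountMismatches` are hypothetical stand-ins invented for illustration; only the method names `SetIndex` and `CreateNode` echo the ones in this issue. Each worker copies a read-only prototype hook before binding an index to it, so no thread can observe a pointer set by another one.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for an index; not Manticore's real class.
struct Index {
    std::string name;
};

// Hypothetical stand-in for the expression hook discussed above.
struct Hook {
    const Index* index = nullptr;
    void SetIndex(const Index* p) { index = p; }
    // Mimics a SNIPPET() node reading the hook's index at parse time.
    bool CreateNode(std::string& out) const {
        if (!index) return false;
        out = index->name;
        return true;
    }
};

// Every search works on its own copy of the hook, so SetIndex only
// touches state owned by the calling thread.
int CountMismatches(const std::vector<Index>& indexes, int rounds) {
    const Hook prototype{};  // shared, but never mutated
    std::vector<int> mismatches(indexes.size(), 0);
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < indexes.size(); ++i) {
        workers.emplace_back([&, i] {
            for (int r = 0; r < rounds; ++r) {
                Hook local = prototype;  // private copy per search
                local.SetIndex(&indexes[i]);
                std::string seen;
                if (!local.CreateNode(seen) || seen != indexes[i].name)
                    ++mismatches[i];  // each thread writes its own slot
            }
        });
    }
    for (auto& w : workers) w.join();
    int total = 0;
    for (int m : mismatches) total += m;
    return total;
}
```

With private copies, `CountMismatches` should report zero mismatches for any number of indexes and rounds. Calling `SetIndex` on one shared `Hook` from all workers instead would be a data race, which is the situation the commit message describes.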
Hello @sanikolaev, it seems your ftp ran out of space, I can't upload. Command: MKD /github-issue-387-update2 I've diagnosed the issue and made a test fix here: due to mutable ExprHook_c m_tHook being shared across all threads, You have this line in your code but it seems you didn't get around to fixing this.
|
Sorry about that, please try now
Awesome! Can you make a PR? |
I think I made a PR; I am not experienced with GitHub and not really sure how that works. Uploaded config, index and query to github-issue-387/update_04_01_2021/popalot2bug387update01Jan2021..zip |
Right. Here it is: #472. I've reproduced the crash on your data/query. Our developers will look into that and into the PR. |
➤ Sergey Nikolaev commented: How to reproduce on the dev server:
Notes:
|
➤ Aleksey N. Vinogradov commented: At first glance, concurrent searches (and the comment about them) do not look relevant, since the way it works now is an isolated clone of the context which belongs to one thread only and so should not be affected by any concurrency. The fact that serializing all searches helps may be just a side effect of that serialization (one static mutex makes all searches on all threads pass through a single serialization point), so we need to look deeper into how the index pointer might become zeroed (maybe the moment of cloning is the key). Anyway, the random mutex here is a good hint. |
I agree that this needs a deeper dig. |
I'm against applying PR #472 as it is clear that Or another option is to limit concurrency to 1 when m_tHook is present and the code for cloning it cannot easily be written |
I concur that this fix should not be merged into the production branch. |
we will fix the issue in the upcoming release |
➤ Aleksey N. Vinogradov commented:
The patch you provided does it ) One static global variable (the lock) means it affects every search regardless of the indexes in use (simply because it is global). And it is taken on each pass before creating sorters, so every search, whether it uses snippets or not, goes through the lock. |
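The point about the global lock can be sketched as follows. This is an illustration under assumptions, not the code from PR #472: `Query`, `CreateSortersSerialized` and `CountSerialized` are hypothetical names. It only shows that a function-level static mutex taken before sorter creation is hit by every search, snippet or not.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical query descriptor; only the flag matters for this sketch.
struct Query {
    bool usesSnippet;
};

// Counts how many searches passed through the serialization point.
static int g_lockedSearches = 0;

// Same shape as the patch described above: the lock is taken before
// sorters are created, without looking at what the query needs.
void CreateSortersSerialized(const Query& /*q*/) {
    static std::mutex s_lock;  // one lock for the whole process
    std::lock_guard<std::mutex> guard(s_lock);
    ++g_lockedSearches;        // every search lands here
    // ... sorter creation would happen here, under the lock ...
}

// Runs a batch and reports how many searches were serialized.
int CountSerialized(const std::vector<Query>& batch) {
    g_lockedSearches = 0;
    for (const Query& q : batch) CreateSortersSerialized(q);
    return g_lockedSearches;
}
```

For a batch where only one query out of four uses SNIPPET(), all four still take the lock, which is why the cost would show up mostly with many small queries.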
➤ Aleksey N. Vinogradov commented: Strictly speaking, running provided line However using true distributed idx_manual_subtitles_content_prod_dump works. |
I've pushed the fix: e8b3b38. If you are interested, the cause was misuse of that hook. That is explained by the fact that snippets on distributed indexes weren't among the use cases at all before, so the hook and its misuse were just a kind of 'theoretical future issue'. |
Describe the bug
Manticore dev 3.5.1 intermittently produces a core dump when running a query (about 5% of the time it dumps core; the other 95% of the time the query works fine).
To Reproduce
unclear yet
Expected behavior
expect not to produce coredump
Describe the environment:
Messages from log files:
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/bin/searchd --coredump --config /etc/manticoresearch/manticore.conf'.
Program terminated with signal 11, Segmentation fault.
#0 SnippetBuilder_c::Impl_c::Setup (this=0x7efc1c010540, pIndex=0x0, tSettings=...) at /usr/src/debug/manticore-3.5.1-200811-04af034-release-rhel7/bin/src_0/src/sphinxexcerpt.cpp:1369
1369 m_pDict = GetStatelessDict ( pIndex->GetDictionary () );
(gdb) p
The history is empty.
(gdb) where
#0 SnippetBuilder_c::Impl_c::Setup (this=0x7efc1c010540, pIndex=0x0, tSettings=...) at /usr/src/debug/manticore-3.5.1-200811-04af034-release-rhel7/bin/src_0/src/sphinxexcerpt.cpp:1369
#1 0x00000000006fe6d8 in SnippetBuilder_c::Setup (this=<optimized out>, pIndex=<optimized out>, tQuery=...) at /usr/src/debug/manticore-3.5.1-200811-04af034-release-rhel7/bin/src_0/src/sphinxexcerpt.cpp:1590
#2 0x000000000092ffa4 in Expr_Snippet_c::Expr_Snippet_c (this=0x7efc1c00fc20, pArglist=0x7efc1c00fbf0, pIndex=<optimized out>, pProfiler=<optimized out>, eQueryType=<optimized out>, sError=...)
at /usr/src/debug/manticore-3.5.1-200811-04af034-release-rhel7/bin/src_0/src/searchdexpr.cpp:365
#3 0x0000000000932a07 in ExprHook_c::CreateNode (this=0x7ef6ec0413e0, iID=<optimized out>, pLeft=0x7efc1c00fbf0, pEvalStage=<optimized out>, sError=...) at /usr/src/debug/manticore-3.5.1-200811-04af034-release-rhel7/bin/src_0/src/searchdexpr.cpp:727
#4 0x000000000082b57b in ExprParser_t::CreateTree (this=this@entry=0x7efc500fe340, iNode=13) at /usr/src/debug/manticore-3.5.1-200811-04af034-release-rhel7/bin/src_0/src/sphinxexpr.cpp:6206
#5 0x000000000082eb3c in ExprParser_t::Parse (this=this@entry=0x7efc500fe340, sExpr=sExpr@entry=0x7efc50017ce0 "snippet(subtitles, QUERY(),'before_match=<<<','after_match=>>>','limit=300','around=30','limit_passages=1')", tSchema=..., pAttrType=0x7efc500fe4f8,
pUsesWeight=0x7efc500fe542, sError=...) at /usr/src/debug/manticore-3.5.1-200811-04af034-release-rhel7/bin/src_0/src/sphinxexpr.cpp:9309
#6 0x000000000082ee86 in sphExprParse (sExpr=0x7efc50017ce0 "snippet(subtitles, QUERY(),'before_match=<<<','after_match=>>>','limit=300','around=30','limit_passages=1')", tSchema=..., sError=..., tArgs=...)
at /usr/src/debug/manticore-3.5.1-200811-04af034-release-rhel7/bin/src_0/src/sphinxexpr.cpp:9347
#7 0x000000000073f1b0 in QueueCreator_c::ParseQueryItem (this=this@entry=0x7efc1c4afb20, tItem=...) at /usr/src/debug/manticore-3.5.1-200811-04af034-release-rhel7/bin/src_0/src/sphinxsort.cpp:6239
#8 0x000000000073f832 in operator() (v=..., __closure=<optimized out>) at /usr/src/debug/manticore-3.5.1-200811-04af034-release-rhel7/bin/src_0/src/sphinxsort.cpp:6430
#9 TestAll<QueueCreator_c::MaybeAddExpressionsFromSelectList()::__lambda39> (cond=<optimized out>, this=0x7efc500150b8) at /usr/src/debug/manticore-3.5.1-200811-04af034-release-rhel7/bin/src_0/src/sphinxstd.h:1253
#10 QueueCreator_c::MaybeAddExpressionsFromSelectList (this=0x7efc1c4afb20) at /usr/src/debug/manticore-3.5.1-200811-04af034-release-rhel7/bin/src_0/src/sphinxsort.cpp:6430
#11 0x0000000000741ca8 in CreateMultiQueue (dCreators=..., tQueue=..., dQueries=..., dSorters=..., dErrors=..., tRes=..., dExtras=...) at /usr/src/debug/manticore-3.5.1-200811-04af034-release-rhel7/bin/src_0/src/sphinxsort.cpp:7036
#12 0x000000000074220c in sphCreateMultiQueue (tQueue=..., dQueries=..., dSorters=..., dErrors=..., tRes=..., dExtras=...) at /usr/src/debug/manticore-3.5.1-200811-04af034-release-rhel7/bin/src_0/src/sphinxsort.cpp:7152
#13 0x000000000059d53b in SearchHandler_c::CreateMultiQueryOrFacetSorters (this=0x7ef6ec0402b0, pIndex=<optimized out>, dSorters=..., dErrors=..., dExtraSchemas=..., tQueueRes=...) at /usr/src/debug/manticore-3.5.1-200811-04af034-release-rhel7/bin/src_0/src/searchd.cpp:5214
#14 0x00000000005b4398 in std::_Function_handler<void(), SearchHandler_c::RunLocalSearchesCoro()::__lambda18>::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/src/debug/manticore-3.5.1-200811-04af034-release-rhel7/bin/src_0/src/searchd.cpp:5569
#15 0x0000000000969947 in operator() (__closure=0x7efc500de900) at /usr/src/debug/manticore-3.5.1-200811-04af034-release-rhel7/bin/src_0/src/task_info.cpp:216
#16 std::_Function_handler<void(), myinfo::OwnMini(Threads::Handler)::__lambda14>::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/4.8.2/functional:2071
#17 0x0000000000965bf7 in WorkerLowest (tCtx=<optimized out>, this=0x7efc500de948) at /usr/src/debug/manticore-3.5.1-200811-04af034-release-rhel7/bin/src_0/src/coroutine.cpp:74
#18 operator() (pT=..., __closure=0x0) at /usr/src/debug/manticore-3.5.1-200811-04af034-release-rhel7/bin/src_0/src/coroutine.cpp:95
#19 Threads::CoRoutine_c::CoRoutine_c(std::function<void ()>, unsigned long)::{lambda(boost::context::detail::transfer_t)#1}::_FUN(boost::context::detail::transfer_t) (pT=...) at /usr/src/debug/manticore-3.5.1-200811-04af034-release-rhel7/bin/src_0/src/coroutine.cpp:96
#20 0x0000000000965bf7 in CoroWorker_c (pScheduler=<optimized out>, fnHandler=..., this=<optimized out>)
#21 StartPrimary (pScheduler=<optimized out>, fnHandler=...) at /usr/src/debug/manticore-3.5.1-200811-04af034-release-rhel7/bin/src_0/src/coroutine.cpp:301
#22 Threads::CoRoutine_c::CoRoutine_c(std::function<void ()>, unsigned long)::{lambda(boost::context::detail::transfer_t)#1}::_FUN(boost::context::detail::transfer_t) () at /usr/src/debug/manticore-3.5.1-200811-04af034-release-rhel7/bin/src_0/src/coroutine.cpp:434
#23 0x000000000096b2ef in make_fcontext ()
#24 0x0000000000000000 in ?? ()
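Frame #0 of the trace shows pIndex=0x0 being dereferenced at sphinxexcerpt.cpp:1369. As a rough illustration of what a defensive check at that point could look like, here is a sketch with hypothetical types (`Dict`, `Index`, `SetupSnippet`); it is not the change made in e8b3b38, and it would only turn the crash into a query error without addressing why the pointer is null.

```cpp
#include <cassert>
#include <string>

// Hypothetical minimal types standing in for the real dictionary and index.
struct Dict {};

struct Index {
    Dict dict;
    const Dict* GetDictionary() const { return &dict; }
};

// Reports an error instead of dereferencing a null index pointer.
bool SetupSnippet(const Index* pIndex, const Dict*& pDict, std::string& sError) {
    if (!pIndex) {
        sError = "snippet setup: no index bound to the expression hook";
        return false;
    }
    pDict = pIndex->GetDictionary();
    return true;
}
```

A caller would check the return value and surface `sError` to the client, so a query that hits the bad state fails on its own rather than taking the daemon down.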
Additional context
query:
SELECT id,vid videoid,allowembed,SNIPPET(subtitles, QUERY(),'before_match=<<<','after_match=>>>','limit=300','around=30','limit_passages=1') snip,title,viewcount,likecount,dislikecount,subnum,subisauto,sublang FROM idx_subtitles_content_dev WHERE MATCH('@subtitles "hello world"') AND subisauto=1 LIMIT 0,50 OPTION max_query_time=3000;
Currently trying to rebuild the index to see if the problem goes away. If more data is needed, please let me know.