Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash and corrupted index on 3.4.2 under query load #367

Closed
lox opened this issue Jul 5, 2020 · 3 comments
Closed

Crash and corrupted index on 3.4.2 under query load #367

lox opened this issue Jul 5, 2020 · 3 comments
Labels
bug waiting Waiting for the original poster (in most cases) or something else wontfix

Comments

@lox
Copy link

lox commented Jul 5, 2020

Manticore Search version (top line in output of bin/searchd -v or bin/indexer -v):

Manticore 3.4.2

OS version (uname -a if on a Unix-like system):

Distributor ID: Ubuntu
Description: Ubuntu 20.04 LTS
Release: 20.04
Codename: focal

Linux xxx 5.4.0-26-generic #30-Ubuntu SMP Mon Apr 20 16:58:30 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Build version:

manticore_3.4.2-200410-69033058-release.buster_amd64-bin.deb

Describe the problem:

This crash happens repeatably after I recreate the index over the mysql connection from a golang project and then start doing heavy searches against it.

I add things to the index in bulk with INSERT and then run OPTIMIZE afterwards, then run queries against it. This works great on macOS (manticore 3.4.2), but seems to crash on the production linux environment.

It seems like this actually corrupts the index as after searchd restarts the index doesn't exist and I need to manually delete the folder for it to be recreateable.

Steps to reproduce:

I'm afraid it's going to be tough to create a simple reproduceable case 🤔

See logs below, nothing custom. Index is created with the following options (excuse the code, couldn't figure out how to get these out of sphinxql):

go code with table options
    
	var options = map[string]interface{}{    
		`wordforms`:        filepath.Join(wd, `manticore`, `wordforms.txt`),    
		`stopwords`:        filepath.Join(wd, `manticore`, `stopwords.txt`),    
		`rt_mem_limit`:     `512M`,    
		`charset_table`:    `non_cjk, %, _, &`,    
		`ignore_chars`:     `\'`,    
		`min_infix_len`:    `3`,    
		`morphology`:       `stem_en`,    
		`min_stemming_len`: `4`,    
		`regexp_filter`: []string{    
			`(?i)\\b0 ([1-9]) => \\1`,                              // 04 -> 4    
			`(?i)\\b([0-9] )yo\\b => \\1 yo`,                       // 12yo -> 12 yo    
			`(?i)\\b([0-9] (?:\\.[0-9] )?)\\s*\\(gradi\\) => \\1%`, // 43 (gradi) -> 43%    
			`(?i)\\b([0-9] )proof => \\1 proof`,                    // 92proof -> 92 proof    
			`(?i)\\b([0-9] )\\.0 % => \\1%`,                        // 50.0% -> 50%    
			`(?i)\\b([0-9] )\\.([0-9] )% => \\1_\\2%`,              // 50.1% -> 50_1%    
			`(?i)\\b([0-9] ),([0-9] )% => \\1_\\2%`,                // 50,1% -> 50_1%    
			`(?i)\\b([0-9] ),0 % => \\1%`,                          // 50,0% -> 50%    
			`(?i)\\b([0-9] )\\s ml => \\1ml`,                       // 750 ml -> 750ml    
			`(?i)\\b([0-9] )\\s*cl => \\10ml`,                      // 75cl -> 750ml    
			`(?i)\\b0\.(\\d{1})\\s*L => \\100ml`,                   // 0.7L -> 700ml    
			`(?i)\\b0\.(\\d{2})\\s*L => \\10ml`,                    // 0.75L -> 750ml    
			`(?i)\\b([1-9][0-9]*)\\s*L => \\1000ml`,                // 1L -> 1000ml    
			`(?i)\\b([\\w-] )\-glenlivet\\b => \\1`,                // aberlour-glenlivet -> aberlour    
			`(?i)\\b(whisky)(\\w )\\b => \\1 \\2`,                  // whiskybroker -> whisky broker    
			`(?i)\\b(glenn?)(\\w )\\b => \\1 \\2`,                  // glenkeith -> glen keith    
			`\\b(\\d ),(\\d{3,})\\b => \\1\\2`,                     // 24,000 -> 24000    
		},    
	}    

My workflow looks like:

  1. Recreate index (using mysql protocol)
  2. Optimize index
  3. Run my indexer program that searches manticore

It doesn't always crash, but it seems to quite often on different queries.

Messages from log files:

searchd.log
    
[Mon Jun 22 11:38:26.377 2020] [872009] watchdog: main process 872010 forked ok    
[Mon Jun 22 11:38:26.378 2020] [872010] listening on 127.0.0.1:9312    
[Mon Jun 22 11:38:26.378 2020] [872010] listening on 127.0.0.1:9306    
[Mon Jun 22 11:38:26.378 2020] [872010] listening on 127.0.0.1:9308    
[Mon Jun 22 11:38:26.379 2020] [872010] accepting connections    
[Mon Jun 22 11:38:26.380 2020] [872014] prereading 0 indexes    
[Mon Jun 22 11:38:26.380 2020] [872014] prereaded 0 indexes in 0.000 sec    
[Sat Jul  4 04:57:39.603 2020] [615] watchdog: main process 616 forked ok    
[Sat Jul  4 04:57:39.606 2020] [616] listening on 127.0.0.1:9312    
[Sat Jul  4 04:57:39.607 2020] [616] listening on 127.0.0.1:9306    
[Sat Jul  4 04:57:39.607 2020] [616] listening on 127.0.0.1:9308    
[Sat Jul  4 04:57:39.611 2020] [616] binlog: replaying log /var/lib/manticore/binlog/binlog.001    
[Sat Jul  4 04:57:39.611 2020] [616] WARNING: binlog: empty binlog /var/lib/manticore/binlog/binlog.001 detected, skipping    
[Sat Jul  4 04:57:39.611 2020] [616] binlog: finished replaying total 1 in 0.001 sec    
[Sat Jul  4 04:57:39.642 2020] [616] accepting connections    
[Sat Jul  4 04:57:39.642 2020] [624] prereading 0 indexes    
[Sat Jul  4 04:57:39.642 2020] [624] prereaded 0 indexes in 0.000 sec    
[Sun Jul  5 08:43:04.023 2020] [41338] rt: index whisky: optimized (progressive) chunk(s) 0 ( of 0 ) in 0.000 sec    
[Sun Jul  5 08:51:51.618 2020] [41338] rt: index whisky: optimized (progressive) chunk(s) 0 ( of 0 ) in 0.000 sec    
[Sun Jul  5 08:52:00.936 2020] [41338] last message repeated 100 times    
------- FATAL: CRASH DUMP -------    
[Sun Jul  5 09:31:39.837 2020] [  616]    

--- crashed SphinxQL request dump ---
SELECT , WEIGHT() (votes10) weight
FROM whisky
WHERE match('@(distillery,
bottler,brand) "starward 7 yo 2012 2019 59% ob whisky exchange 20th anniversary refill
australian apera cask 220"/1 (@* "starward 7 yo 2012 2019 59% ob whisky exchange 20th
anniversary refill australian apera cask 220"/16 | @* "starward 7 yo 2012 2019 59%
ob whisky exchange 20th anniversary refill australian apera cask 220"/15 | @* "starward
7 yo 2012 2019 59% ob whisky exchange 20th anniversary refill australian apera cask
220"/14 | @* "starward 7 yo 2012 2019 59% ob whisky exchange 20th anniversary refill
australian apera cask 220"/13 | @* "starward 7 yo 2012 2019 59% ob whisky exchange
20th anniversary refill australian apera cask 220"/12 | @* "starward 7 yo 2012 2019
59% ob whisky exchange 20th anniversary refill australian apera cask 220"/11)')

ORDER BY weight DESC, bottledyear DESC, nobottledfor DESC, nobottlingseries DESC,
sizemls DESC
LIMIT 20
OPTION ranker=expr('sum(lcs*user_weight)*1000 bm25')
--- request dump end ---
--- local index:
Manticore 3.4.2 69033058@200410 release
Handling signal 11
-------------- backtrace begins here ---------------
Program compiled with 8
Configured with flags: Configured by CMake with these definitions: -DCMAKE_BUILD_TYPE=RelWithDebInfo -DDISTR_BUILD=buster -DUSE_SSL=ON -DDL_UNIXODBC=1 -DUNIXODBC_LIB=libodbc.so.2 -DDL_EXPAT=1 -DEXPAT_LIB=libexpat.so.1 -DUSE_LIBICONV=1 -DDL_MYSQL=1 -DMYSQL_LIB=libmariadb.so.3 -DDL_PGSQL=1 -DPGSQL_LIB=libpq.so.5 -DLOCALDATADIR=/var/data -DFULL_SHARE_DIR=/usr/share/manticore -DUSE_ICU=1 -DUSE_BISON=ON -DUSE_FLEX=ON -DUSE_SYSLOG=1 -DWITH_EXPAT=1 -DWITH_ICONV=ON -DWITH_MYSQL=1 -DWITH_ODBC=ON -DWITH_PGSQL=1 -DWITH_RE2=1 -DWITH_STEMMER=1 -DWITH_ZLIB=ON -DGALERA_SOVERSION=31 -DSYSCONFDIR=/etc/manticoresearch
Host OS is Linux runner-0277ea0f-project-3858465-concurrent-0 4.19.78-coreos #1 SMP Mon Oct 14 22:56:39 -00 2019 x86_64 GNU/Linux
Stack bottom = 0x7fe1da365e9f, thread stack size = 0x100000
Trying manual backtrace:
Something wrong with thread stack, manual backtrace may be incorrect (fp=0x55d9d02ed4b0)
Wrong stack limit or frame pointer, manual backtrace failed (fp=0x55d9d02ed4b0, stack=0x7fe1da360000, stacksize=0x100000)
Trying system backtrace:
begin of system symbols:
/usr/bin/searchd(_Z12sphBacktraceib 0x9b)[0x55d9d0474fcb]
/usr/bin/searchd(_ZN16SphCrashLogger_c11HandleCrashEi 0x20a)[0x55d9d02edaca]
/lib/x86_64-linux-gnu/libpthread.so.0( 0x153c0)[0x7fe1dad563c0]
/usr/bin/searchd(_Z11sphUnzipIntRPKh 0xa)[0x55d9d03d9bba]
/usr/bin/searchd(_ZNK12DocstoreRT_c6GetDocEjPK11VecTraits_TIiElb 0x1f8)[0x55d9d06576d8]
/usr/bin/searchd(_ZNK9RtIndex_c6GetDocER13DocstoreDoc_tlPK11VecTraits_TIiElb 0x158)[0x55d9d05c13b8]
/usr/bin/searchd(_ZNK15Expr_GetField_c16StringEvalPackedERK9CSphMatch 0x87)[0x55d9d057a5e7]
/usr/bin/searchd( 0x28eae3)[0x55d9d0325ae3]
/usr/bin/searchd(_Z18MinimizeAggrResultR12AggrResult_tRK9CSphQuerybRKN3sph9StringSetEP16CSphQueryProfilePK18CSphFilterSettingsbbR12VecRefPtrs_tIP11AgentConn_tE 0xead)[0x55d9d032c33d]
/usr/bin/searchd(_ZN15SearchHandler_c9RunSubsetEii 0x11a7)[0x55d9d032d9e7]
/usr/bin/searchd(_ZN15SearchHandler_c10RunQueriesEv 0x95)[0x55d9d032e115]
/usr/bin/searchd(_Z17HandleMysqlSelectR11RowBuffer_iR15SearchHandler_c 0x152)[0x55d9d032e712]
/usr/bin/searchd(_ZN16CSphinxqlSession7ExecuteERK10CSphStringR11RowBuffer_iRN7Threads9ThdDesc_tE 0x58b)[0x55d9d0357b4b]
/usr/bin/searchd(_Z15LoopClientMySQLRhR16CSphinxqlSessionR10CSphStringibRN7Threads9ThdDesc_tER13InputBuffer_cR16ISphOutputBuffer 0x2c5)[0x55d9d03375f5]
/usr/bin/searchd(_ZN10ThdJobQL_t4CallEv 0x236)[0x55d9d03c85e6]
/usr/bin/searchd(_ZN11CSphThdPool4TickEPv 0x8f)[0x55d9d0480bdf]
/usr/bin/searchd(_Z20sphThreadProcWrapperPv 0x23)[0x55d9d047c933]
/lib/x86_64-linux-gnu/libpthread.so.0( 0x9609)[0x7fe1dad4a609]
/lib/x86_64-linux-gnu/libc.so.6(clone 0x43)[0x7fe1da593103]
-------------- backtrace ends here ---------------
Please, create a bug report in our bug tracker (https://github.com/manticoresoftware/manticore/issues)
and attach there:
a) searchd log, b) searchd binary, c) searchd symbols.
Look into the chapter 'Reporting bugs' in the documentation
(http://docs.manticoresearch.com/latest/html/reporting_bugs.html)
Dump with GDB via watchdog
[Sun Jul 5 09:31:40.350 2020] [615] watchdog: got USR1, performing dump of child's stack
Will run gdb on '/usr/bin/searchd', pid '616'
--- 2 active threads ---
thd 0, proto sphinxql, state net_write, command replace
thd 1, proto sphinxql, state query, command select
------- CRASH DUMP END -------
[Sun Jul 5 09:31:43.451 2020] [615] watchdog: main process 616 crashed via CRASH_EXIT (exit code 2), will be restarted
[Sun Jul 5 09:31:43.451 2020] [615] watchdog: got error 10, No child processes
[Sun Jul 5 09:31:43.451 2020] [615] watchdog: main process 42531 forked ok
[Sun Jul 5 09:31:43.455 2020] [42531] listening on 127.0.0.1:9312
[Sun Jul 5 09:31:43.456 2020] [42531] listening on 127.0.0.1:9306
[Sun Jul 5 09:31:43.456 2020] [42531] listening on 127.0.0.1:9308
[Sun Jul 5 09:31:43.458 2020] [42531] binlog: replaying log /var/lib/manticore/binlog/binlog.001
[Sun Jul 5 09:31:44.385 2020] [42531] WARNING: binlog: commit: CRC mismatch (index=wb_whisky, tid=33734, pos=160278760)
[Sun Jul 5 09:31:44.385 2020] [42531] WARNING: binlog: replay error at pos=160278755)
[Sun Jul 5 09:31:44.385 2020] [42531] binlog: index wb_whisky: recovered from tid 0 to tid 33733
[Sun Jul 5 09:31:44.385 2020] [42531] binlog: replay stats: 195172 rows in 33733 commits; 0 updates, 0 reconfigure; 0 pq-add; 0 pq-delete; 1 indexes
[Sun Jul 5 09:31:44.385 2020] [42531] binlog: finished replaying /var/lib/manticore/binlog/binlog.001; 152.8 MB in 0.926 sec
[Sun Jul 5 09:31:44.385 2020] [42531] binlog: replaying log /var/lib/manticore/binlog/binlog.001
[Sun Jul 5 09:31:45.075 2020] [42531] WARNING: binlog: commit: CRC mismatch (index=wb_whisky, tid=33734, pos=160278760)
[Sun Jul 5 09:31:45.075 2020] [42531] WARNING: binlog: replay error at pos=160278755)
[Sun Jul 5 09:31:45.075 2020] [42531] binlog: index wb_whisky: recovered from tid 0 to tid 33733
[Sun Jul 5 09:31:45.075 2020] [42531] binlog: replay stats: 195172 rows in 33733 commits; 0 updates, 0 reconfigure; 0 pq-add; 0 pq-delete; 1 indexes
[Sun Jul 5 09:31:45.075 2020] [42531] binlog: finished replaying /var/lib/manticore/binlog/binlog.001; 152.8 MB in 0.689 sec
[Sun Jul 5 09:31:45.075 2020] [42531] binlog: finished replaying total 2 in 1.616 sec
[Sun Jul 5 09:31:45.076 2020] [42531] accepting connections
[Sun Jul 5 09:31:45.076 2020] [42535] prereading 0 indexes
[Sun Jul 5 09:31:45.076 2020] [42535] prereaded 0 indexes in 0.000 sec
[Sun Jul 5 09:44:07.825 2020] [42533] WARNING: index 'wb_whisky': all source tokens are stopwords (wordform='', file='/var/lib/manticore/whisky/wordforms.txt'). IGNORED.
[Sun Jul 5 09:44:08.981 2020] [42531] WARNING: last message repeated 3 times

Nothing noteable in dmesg

manticore.conf
    
searchd    
{    
	listen			 = 127.0.0.1:9312    
	listen			 = 127.0.0.1:9306:mysql41    
	listen			 = 127.0.0.1:9308:http    
	log				 = /var/log/manticore/searchd.log    
	query_log		 = /var/log/manticore/query.log    
	pid_file		 = /var/run/manticore/searchd.pid    
	data_dir		 = /var/lib/manticore    
	query_log_format         = sphinxql    
        binlog_flush             = 1    
        max_open_files           = 10000    
}    

In case you've faced a crash what indextool --check returns:

    
Manticore 3.4.2 69033058@200410 release    
Copyright (c) 2001-2016, Andrew Aksyonoff    
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)    
Copyright (c) 2017-2020, Manticore Software LTD (http://manticoresearch.com)    

using config file '/etc/manticoresearch/manticore.conf'...
checking index 'wb_whisky'...
WARNING: failed to load RAM chunks, checking only 0 disk chunks
checking schema...
check passed, 0.0 sec elapsed

@lox
Copy link
Author

lox commented Jul 5, 2020

The query I use is a little odd, it's basically to do a search for "the best possible match with as many of these keywords as possible":

SELECT , WEIGHT()+(votes10) weight
FROM whisky
WHERE match('@(distillery,
bottler,brand) "starward 7 yo 2012 2019 59% ob whisky exchange 20th anniversary refill
australian apera cask 220"/1 (@* "starward 7 yo 2012 2019 59% ob whisky exchange 20th
anniversary refill australian apera cask 220"/16 | @* "starward 7 yo 2012 2019 59%
ob whisky exchange 20th anniversary refill australian apera cask 220"/15 | @* "starward
7 yo 2012 2019 59% ob whisky exchange 20th anniversary refill australian apera cask
220"/14 | @* "starward 7 yo 2012 2019 59% ob whisky exchange 20th anniversary refill
australian apera cask 220"/13 | @* "starward 7 yo 2012 2019 59% ob whisky exchange
20th anniversary refill australian apera cask 220"/12 | @* "starward 7 yo 2012 2019
59% ob whisky exchange 20th anniversary refill australian apera cask 220"/11)')

I use a "phrases"/N | "phrases"/N-1 | "phrases"/N-2 ... pattern as it seems like just "phrases"/1 returns less relevant matches.

@tomatolog
Copy link
Contributor

I'm afraid it's going to be tough to create a simple reproduceable case

could you set correct binlog_path searchd option then after daemon got crashed upload manticore.json and your config along with content of binlog_path folder into our write only FTP at

ftp: dev.manticoresearch.com
user: manticorebugs
pass: shithappens

Just create folder there with proper name.

This way all incoming TRX got stored into binlog that I could replay and get index at same state as you got prior to crash then I could try to investigate crash further.
Or you might upload you script that populate index with data and script that query index at FTP.

@githubmanticore githubmanticore added the waiting Waiting for the original poster (in most cases) or something else label Jul 6, 2020
@stale
Copy link

stale bot commented Aug 12, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Feel free to re-open the issue in case it becomes actual.

@stale stale bot added the wontfix label Aug 12, 2020
@stale stale bot closed this as completed Aug 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug waiting Waiting for the original poster (in most cases) or something else wontfix
Projects
None yet
Development

No branches or pull requests

3 participants