Crash of searchd v2.2.11 #30

Closed
isqad opened this issue Nov 13, 2017 · 8 comments

Comments

@isqad
Contributor

isqad commented Nov 13, 2017

Hello!

Recently we have had a few crashes with the following post-mortem log:

------- FATAL: CRASH DUMP -------
[Wed Nov 1 05:07:31.691 2017] [ 6979]

--- crashed SphinxAPI request dump ---
AAABGQAAAdcAAAAAAAAAAQAAAAAAABOIAAAABgAAAAIAAAAAAAAAAAAAACNAY29tcGFueV9pZF9pZHggY29tcGFu
eV9pZF8xMDg5NzQwNgAAAAAAAAAHcHJvZHVjdAAAAAEAAAAAAAAAAAAAAAAAAAAAAAAABQAAAA5zcGhp
bnhfZGVsZXRlZAAAAAAAAAABAAAAAAAAAAAAAAAAAAAACWNsYXNzX2NyYwAAAAAAAAABAAAAALydhNYA
AAAAAAAACmNvbXBhbnlfaWQAAAAAAAAAAQAAAAAApkf+AAAAAAAAAAVzdGF0ZQAAAAAAAAAEAAAAAAAA
AAEAAAAAAAAAAgAAAAAAAAADAAAAAAAAAAUAAAAAAAAADHB1YmxpY19zdGF0ZQAAAAAAAAABAAAA
AAAAAAEAAAAAAAAABAAAAA5wcm9kdWN0X2dyb3VwcwAAE4gAAAAMQHdlaWdodCBERVNDAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAQAAAAEbmFtZQAAIygAAAAIYW5ub3VuY2UAAAAeAAAACHN5bm9u
eW1zAAAAAQAAAAthcnRpY2xlX2lkeAAAAAEAAAAAAAAAAAAAABJzcGhpbnhfaW50ZXJuYWxfaWQ=
--- request dump end ---
Sphinx 2.2.11-id64-release (95ae9a6)
Handling signal 11
-------------- backtrace begins here ---------------
Program compiled with x86_64-pc-linux-gnu-gcc 4.9.4
Configured with flags: '--prefix=/usr' '--host=x86_64-pc-linux-gnu' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--datadir=/usr/share' '--sysconfdir=/etc' '--localstatedir=/var/lib' '--disable-dependency-tracking' '--libdir=/usr/lib64' '--sysconfdir=/etc/sphinx' '--enable-id64' '--without-debug' '--with-mysql' '--without-unixodbc' '--with-pgsql' '--without-libstemmer' '--with-syslog' '--without-libexpat' '--build=x86_64-pc-linux-gnu' 'build_alias=x86_64-pc-linux-gnu' 'host_alias=x86_64-pc-linux-gnu' 'CFLAGS=-O2 -march=nocona -pipe' 'LDFLAGS=-Wl,-O1 -Wl,--as-needed' 'CXXFLAGS=-O2 -march=nocona -pipe'
Host OS is Linux oberon 4.4.52-gentoo-universal-03 0000013 SMP Mon Apr 3 09:58:58 +05 2017 x86_64 Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz GenuineIntel GNU/Linux
Stack bottom = 0x7f4a24744ef7, thread stack size = 0x100000
[Wed Nov 1 05:07:32.000 2017] [34789] FATAL: Cannot get stack frame pointer on this architecture
Trying system backtrace:
begin of system symbols:
searchd[0x5605d6]
searchd[0x40d352]
/lib64/libpthread.so.0(+0x10460)[0x7f4a94430460]
searchd[0x667dc5]
searchd[0x64b269]
searchd[0x445fbf]
searchd[0x44e941]
searchd[0x450180]
searchd[0x4504a3]
searchd[0x455e13]
searchd[0x456028]
searchd[0x407733]
searchd[0x5684df]
/lib64/libpthread.so.0(+0x728c)[0x7f4a9442728c]
/lib64/libc.so.6(clone+0x6d)[0x7f4a930bd57d]
-------------- backtrace ends here ---------------
Please, create a bug report in our bug tracker (http://sphinxsearch.com/bugs [^]) and attach there:
a) searchd log, b) searchd binary, c) searchd symbols.
Look into the chapter 'Reporting bugs' in the documentation
(/usr/share/doc/sphinx/sphinx.txt or http://sphinxsearch.com/docs/current.html#reporting-bugs [^])
--- BT to source lines (depth 15): ---
--- BT to source lines finished ---
--- 131 active threads ---

The searchd symbols are attached here:
searchd.tar.gz

My little experiment with gdb showed me that the issue manifests in the addition of numbers in macros:
https://gist.github.com/isqad/92450e3d2cc05af2258f082c1d078c26
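
The `crashed SphinxAPI request dump` block in the log above is base64 text, so the binary request can be recovered for inspection. A small sketch, assuming the lines between the `--- crashed SphinxAPI request dump ---` and `--- request dump end ---` markers were saved to a file (the file and function names are hypothetical) and that a `base64` utility accepting `-d` is available:

```shell
#!/bin/sh
# Hypothetical helper: decode a saved request dump.
# Strips the line wrapping first, then base64-decodes to standard output.
decode_dump() {
    tr -d '[:space:]' < "$1" | base64 -d
}

# Usage (file names are placeholders):
#   decode_dump request.b64 > request.bin
#   od -c request.bin | head
```

The decoded request is a binary structure, but readable fragments such as the index and attribute names (for example `company_id_idx`) should be visible among the binary fields.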

@klirichek
Contributor

Hm, it seems the crash log is from Sphinx 2.2.x. What is it doing here?

BTW, just a quick look:
addr2line is enough to dig into the symbols; it is simpler than gdb for this very task. Run addr2line -f -C -e searchd and just feed it the addresses.
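
To make the addr2line suggestion concrete, here is a minimal shell sketch. It assumes the crash log was saved to a file and that a `searchd` binary (or separate debug file) with matching symbols is at hand. The helper names `extract_frames` and `resolve_frames` and both file paths are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; the file names used below are assumptions.

# Print one hex address per line for every "searchd[0x...]" frame in a crash log.
# Frames that belong to shared libraries (libpthread, libc) are skipped.
extract_frames() {
    grep -o 'searchd\[0x[0-9a-fA-F]*]' "$1" | sed -e 's/^searchd\[//' -e 's/]$//'
}

# Resolve those addresses to function names and source lines.
# addr2line reads addresses from standard input when none are given as arguments.
# $1 = crash log, $2 = binary or debug file that carries the symbols.
resolve_frames() {
    extract_frames "$1" | addr2line -f -C -e "$2"
}

# Usage (paths are placeholders):
#   resolve_frames searchd.log ./searchd
```

If the symbols do not belong to the binary that actually crashed, the resolved names will look unrelated to each other, which is itself a useful signal.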

How are you sure that the symbols are from this very searchd? I see no debuglink between the files. So it may mean that, from the viewpoint of the attached symbols, the addresses are nonsense. At least, the provided backtrace together with the symbols makes no sense at all.

Usually if we build with symbols, we do:

objcopy --only-keep-debug searchd searchd.dbg
objcopy --strip-all searchd
objcopy --add-gnu-debuglink searchd.dbg searchd

Then the files are linked. The header of each of them stores a SHA1 BuildID, which shows that these symbols match this binary. The absence of such a link, together with the nonsense in the backtrace, suggests that these symbols don't match this executable.
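
Whether a symbols file belongs to a given binary can be checked before trusting a backtrace. The sketch below compares the GNU build ID notes of two ELF files with `readelf -n`. It assumes binutils is installed and that both files still carry a build ID note (not every toolchain emits one by default). The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; the file names used below are assumptions.

# Print the GNU build ID of an ELF file, or nothing if there is none.
build_id() {
    readelf -n "$1" 2>/dev/null | sed -n 's/^ *Build ID: *//p'
}

# Succeed only if both files carry the same, non-empty build ID.
same_build() {
    id_a=$(build_id "$1")
    id_b=$(build_id "$2")
    [ -n "$id_a" ] && [ "$id_a" = "$id_b" ]
}

# Usage (file names are placeholders):
#   same_build searchd searchd.dbg && echo "symbols match" || echo "mismatch or no build ID"
#
# The debuglink section itself can be inspected with:
#   readelf --string-dump=.gnu_debuglink searchd
```

An empty build ID is treated as a mismatch on purpose: without the note there is nothing to compare, so the check cannot vouch for the pair.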

@isqad
Contributor Author

isqad commented Nov 15, 2017

Thank you for the reply!

Hm, it seems the crash log is from Sphinx 2.2.x. What is it doing here?

I thought that support for 2.2.x had not ended.

How are you sure that the symbols are from this very searchd?

During the build of searchd we extract the debug symbols and then strip the binary.
We did not run objcopy --add-gnu-debuglink searchd.dbg searchd, but
gdb successfully read the symbols:

$ gdb
(gdb) file searchd
Reading symbols from searchd...Reading symbols from /home/index/sphinx/debug/bak/searchd.debug...done.
done.

I think that the symbols do match this searchd binary.

What should I do? I can execute objcopy --add-gnu-debuglink searchd.dbg searchd and attach the archive again.

@klirichek
Contributor

There is nothing to do with these symbols. They can't help to investigate the issue.
--add-gnu-debuglink is what you use right after compiling the binary and extracting the symbols. It kind of 'glues' the binary and the symbols together, so that the binary matches the symbols file and vice versa. With such a link, gdb and other utilities will warn about or refuse to load a wrong symbol file. Without it, loading is still possible, but backtraces will show nonsense.

@isqad
Contributor Author

isqad commented Nov 15, 2017

So I should execute objcopy --add-gnu-debuglink searchd.dbg searchd and wait for the next crash?

@klirichek
Contributor

If you can recompile and reliably extract the symbols, then yes, you can try.
Adding a link between the files you currently attached will not help them match each other, however.

Also, any fixes to 2.2.x are impossible from our side 'officially'. Remember, we are not Sphinx, but a fork. We can't commit there.

@isqad
Contributor Author

isqad commented Nov 15, 2017

I found my mistake!

We extracted the debug symbols after rebuilding the executable, but the daemon was still running from the old searchd executable.
( >_<) So the backtrace is wrong.
I will restart the daemon and wait for the next crash to debug.

Thanks a lot!
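
The mix-up described above (a daemon still running from a binary that has since been replaced on disk) can be spotted on Linux through `/proc`. A minimal sketch, assuming a Linux `/proc` filesystem and that the rebuild unlinked or replaced the old file rather than leaving it in place. The function name and the PID variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper; assumes a Linux /proc filesystem.

# Succeed if the process with PID $1 runs from an executable that has been
# deleted or replaced on disk. Linux marks such targets with a " (deleted)" suffix.
runs_stale_binary() {
    target=$(readlink "/proc/$1/exe" 2>/dev/null) || return 1
    case "$target" in
        *' (deleted)') return 0 ;;
        *) return 1 ;;
    esac
}

# Usage ($pid is a placeholder for the searchd process ID):
#   runs_stale_binary "$pid" && echo "restart searchd before collecting a backtrace"
```

A process that cannot be inspected (wrong PID, no permission) is reported as not stale, so a negative result is only meaningful for a PID that is known to exist.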

@iivanov

iivanov commented Nov 27, 2017

Hello. I also have a crash in the same Sphinx version. Maybe this crash dump contains more information.

------- FATAL: CRASH DUMP -------
[Mon Nov 27 15:39:05.198 2017] [ 6245]

--- crashed SphinxQL request dump ---
SELECT id, (is_group_primary or group_id=0) as for_catalog FROM section_1728 WHERE
details_49495 in (105330) AND status in (1) AND status_inherited in (1) AND sell_status
in (0, 1, 2, 3, 4, 7) AND for_catalog=1 LIMIT 1 FACET details_77247 LIMIT 1000 FACET
producer LIMIT 1000 FACET details_60402 LIMIT 1000 FACET mpath LIMIT 1000 FACET details_49495
LIMIT 1000
--- request dump end ---
Sphinx 2.2.11-id64-release (95ae9a6)
Handling signal 11
-------------- backtrace begins here ---------------
Program compiled with gcc 4.8.2
Configured with flags:  '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--program-prefix=' '--disable-dependency-tracking' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--localstatedir=/var' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--sysconfdir=/etc/sphinx' '--with-mysql' '--with-re2' '--with-libstemmer' '--with-unixodbc' '--with-iconv' '--enable-id64' '--with-pgsql' '--with-syslog' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'CFLAGS=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic' 'LDFLAGS=-Wl,-z,relro ' 'CXXFLAGS=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic'
Host OS is Linux localhost.localdomain 3.10.0-123.6.3.el7.x86_64 #1 SMP Wed Jul 16 15:10:46 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux
Stack bottom = 0x7ff3d0c6ddf7, thread stack size = 0x100000
Trying manual backtrace:
Something wrong with thread stack, manual backtrace may be incorrect (fp=0x7ff3d0c61cf0)
Stack looks OK, attempting backtrace.
0x4106d2
0x678483
0x691ee4
0x6755ad
0x44a91e
0x45417b
0x454ecd
0x455574
0x455ed3
0x47e2a7
0x458a2e
0x45ae4c
0x41046e
0x57970c
0x7ff3fde49dc5
Something wrong in frame pointers, manual backtrace failed (fp=0)
Trying system backtrace:
[Mon Nov 27 15:39:06.691 2017] [8080] watchdog: main process 6245 killed dirtily with signal 11, will be restarted
[Mon Nov 27 15:39:06.692 2017] [8080] watchdog: main process 22031 forked ok
[Mon Nov 27 15:39:06.726 2017] [22031] listening on all interfaces, port=9312
[Mon Nov 27 15:39:06.726 2017] [22031] listening on all interfaces, port=9306

@isqad
Contributor Author

isqad commented Mar 23, 2018

After some time in gdb I found that we have a problem similar to the one in #29.

It was fixed by applying this commit: 72dcf66
