
crash on alter table tbl add column col uint #1692

Closed
xdimus opened this issue Dec 25, 2023 · 11 comments

xdimus commented Dec 25, 2023

searchd crashes on alter table tbl add column col uint.

uname -a
Linux dcn35 6.2.9-1-pve #1 SMP PREEMPT_DYNAMIC PVE 6.2.9-1 x86_64 GNU/Linux

------- FATAL: CRASH DUMP -------
[Thu Dec 21 10:42:44.891 2023] [143411]

--- crashed SphinxQL request dump ---
alter table item add column la3 uint
--- request dump end ---
--- local index:characteristic_
Manticore 6.2.12 dc5144d@230822 (columnar 2.2.4 5aec342@230822) (secondary 2.2.4 5aec342@230822)
Handling signal 11
-------------- backtrace begins here ---------------
Program compiled with Clang 15.0.7
Configured with flags: Configured with these definitions: -DDISTR_BUILD=bullseye -DUSE_SYSLOG=1 -DWITH_GALERA=1 -DWITH_RE2=1 -DWITH_RE2_FORCE_STATIC=1 -DWITH_STEMMER=1 -DWITH_STEMMER_FORCE_STATIC=1 -DWITH_NLJSON=1 -DWITH_UNIALGO=1 -DWITH_ICU=1 -DWITH_ICU_FORCE_STATIC=1 -DWITH_SSL=1 -DWITH_ZLIB=1 -DWITH_ZSTD=1 -DDL_ZSTD=1 -DZSTD_LIB=libzstd.so.1 -DWITH_CURL=1 -DDL_CURL=1 -DCURL_LIB=libcurl.so.4 -DWITH_ODBC=1 -DDL_ODBC=1 -DODBC_LIB=libodbc.so.2 -DWITH_EXPAT=1 -DDL_EXPAT=1 -DEXPAT_LIB=libexpat.so.1 -DWITH_ICONV=1 -DWITH_MYSQL=1 -DDL_MYSQL=1 -DMYSQL_LIB=libmariadb.so.3 -DWITH_POSTGRESQL=1 -DDL_POSTGRESQL=1 -DPOSTGRESQL_LIB=libpq.so.5 -DLOCALDATADIR=/var/lib/manticore -DFULL_SHARE_DIR=/usr/share/manticore
Built on Linux x86_64 (bullseye) (cross-compiled)
Stack bottom = 0x7fcd687d3850, thread stack size = 0x20000
Trying manual backtrace:
Something wrong with thread stack, manual backtrace may be incorrect (fp=0x1)
Wrong stack limit or frame pointer, manual backtrace failed (fp=0x1, stack=0x7fcd687d0000, stacksize=0x20000)
Trying system backtrace:
begin of system symbols:
/usr/bin/searchd(_Z12sphBacktraceib+0x22a)[0x56006752fcfa]
/usr/bin/searchd(_ZN11CrashLogger11HandleCrashEi+0x355)[0x5600673aece5]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x13140)[0x7fcece6a3140]
/lib/x86_64-linux-gnu/libc.so.6(+0x16804d)[0x7fcece62404d]
/usr/bin/searchd(_ZN18IndexAlterHelper_c26Alter_AddRemoveRowwiseAttrERK10CSphSchemaS2_PKjjPKhR14WriteWrapper_cS8_bRK10CSphString+0x293)[0x5600682a63a3]
/usr/bin/searchd(_ZN13CSphIndex_VLN18AddRemoveAttributeEbRK18AttrAddRemoveCtx_tR10CSphString+0x66a)[0x56006743e55a]
/usr/bin/searchd(_ZN9RtIndex_c18AddRemoveAttributeEbRK18AttrAddRemoveCtx_tR10CSphString+0x3d5)[0x560068164c35]
/usr/bin/searchd(+0xea7982)[0x5600673fc982]
/usr/bin/searchd(_ZN15ClientSession_c7ExecuteESt4pairIPKciER11RowBuffer_i+0x193b)[0x5600673f947b]
/usr/bin/searchd(_Z20ProcessSqlQueryBuddySt4pairIPKciERhR21GenericOutputBuffer_c+0x52)[0x5600673591a2]
/usr/bin/searchd(_Z8SqlServeSt10unique_ptrI16AsyncNetBuffer_cSt14default_deleteIS0_EE+0x105d)[0x56006733e74d]
/usr/bin/searchd(_Z10MultiServeSt10unique_ptrI16AsyncNetBuffer_cSt14default_deleteIS0_EESt4pairIitE7Proto_e+0x43)[0x56006733a503]
/usr/bin/searchd(+0xde60b2)[0x56006733b0b2]
/usr/bin/searchd(_ZZN7Threads11CoRoutine_c13CreateContextESt8functionIFvvEESt4pairIN5boost7context13stack_contextENS_14StackFlavour_EEEENUlNS6_6detail10transfer_tEE_8__invokeESB_+0x1c)[0x56006867d62c]
/usr/bin/searchd(make_fcontext+0x2f)[0x56006869d9cf]
Trying boost backtrace:
0# sphBacktrace(int, bool) in /usr/bin/searchd
1# CrashLogger::HandleCrash(int) in /usr/bin/searchd
2# 0x00007FCECE6A3140 in /lib/x86_64-linux-gnu/libpthread.so.0
3# 0x00007FCECE62404D in /lib/x86_64-linux-gnu/libc.so.6
4# IndexAlterHelper_c::Alter_AddRemoveRowwiseAttr(CSphSchema const&, CSphSchema const&, unsigned int const*, unsigned int, unsigned char const*, WriteWrapper_c&, WriteWrapper_c&, bool, CSphString const&) in /usr/bin/searchd
5# CSphIndex_VLN::AddRemoveAttribute(bool, AttrAddRemoveCtx_t const&, CSphString&) in /usr/bin/searchd
6# RtIndex_c::AddRemoveAttribute(bool, AttrAddRemoveCtx_t const&, CSphString&) in /usr/bin/searchd
7# 0x00005600673FC982 in /usr/bin/searchd
8# ClientSession_c::Execute(std::pair<char const*, int>, RowBuffer_i&) in /usr/bin/searchd
9# ProcessSqlQueryBuddy(std::pair<char const*, int>, unsigned char&, GenericOutputBuffer_c&) in /usr/bin/searchd
10# SqlServe(std::unique_ptr<AsyncNetBuffer_c, std::default_delete<AsyncNetBuffer_c> >) in /usr/bin/searchd
11# MultiServe(std::unique_ptr<AsyncNetBuffer_c, std::default_delete<AsyncNetBuffer_c> >, std::pair<int, unsigned short>, Proto_e) in /usr/bin/searchd
12# 0x000056006733B0B2 in /usr/bin/searchd
13# Threads::CoRoutine_c::CreateContext(std::function<void ()>, std::pair<boost::context::stack_context, Threads::StackFlavour_E>)::{lambda(boost::context::detail::transfer_t)#1}::__invoke(boost::context::detail::transfer_t) in /usr/bin/searchd
14# make_fcontext in /usr/bin/searchd
-------------- backtrace ends here ---------------

UPDATE

The task is to check that all attribute files exist before trying to modify the table's attribute list.

tomatolog (Contributor) commented:

Could you upload the table where the crash happened, as described in the upload section of our manual?

sanikolaev (Collaborator) commented:

I can't reproduce it like this:

snikolaev@dev2:~$ docker run -e EXTRA=1 --name manticore --rm -d manticoresearch/manticore:6.2.12 && echo "Waiting for Manticore docker to start. Consider mapping the data_dir to make it start faster next time" && until docker logs manticore 2>&1 | grep -q "accepting connections"; do sleep 1; echo -n .; done && echo && docker exec -it manticore mysql && docker stop manticore
d98f4810b5992dd7b56f52a6b9d9e5035a476a90d9632c60ab725ca602473593
Waiting for Manticore docker to start. Consider mapping the data_dir to make it start faster next time
....
mysql> create table item(la2 int);
mysql> alter table item add column la3 uint;
mysql> drop table item;
mysql> create table item(la2 int);
mysql> insert into item(la2) values(1),(43786487364),(343);
mysql> flush ramchunk item;
mysql> insert into item(la2) values(1),(43786487364),(343);
mysql> flush table item;
mysql> insert into item(la2) values(1),(43786487364),(343);
mysql> alter table item add column la3 uint;
mysql> desc item;
+-------+--------+------------+
| Field | Type   | Properties |
+-------+--------+------------+
| id    | bigint |            |
| la2   | uint   |            |
| la3   | uint   |            |
+-------+--------+------------+
mysql> select * from item;
+--------------------+-----------+------+
| id                 | la2       | la3  |
+--------------------+-----------+------+
| 362751603444285444 |         1 |    0 |
| 362751603444285441 |         1 |    0 |
| 362751603444285447 |         1 |    0 |
| 362751603444285442 | 836814404 |    0 |
| 362751603444285445 | 836814404 |    0 |
| 362751603444285448 | 836814404 |    0 |
| 362751603444285446 |       343 |    0 |
| 362751603444285449 |       343 |    0 |
| 362751603444285443 |       343 |    0 |
+--------------------+-----------+------+

We need something to reproduce it locally to fix the crash: the data files or a way to recreate the table from scratch.

sanikolaev added the "waiting" and "bug" labels on Dec 25, 2023.

xdimus commented Dec 28, 2023

It crashes only when the table is non-empty; if I truncate it, there is no crash.

tomatolog (Contributor) commented:

We need a reproducible example to investigate the crash further. Either upload the table that causes the crash, as described in the upload section of our manual, or provide an MRE (minimal reproducible example): a CREATE TABLE statement that creates the table structure, INSERT statements that populate it, and the ALTER statement that causes the crash.


xdimus commented Jan 17, 2024

I've uploaded the files.

sanikolaev (Collaborator) commented:

Thanks. I've reproduced the crash:

 0# sphBacktrace(int, bool) in searchd
 1# CrashLogger::HandleCrash(int) in searchd
 2# 0x00007FA911FFB520 in /lib/x86_64-linux-gnu/libc.so.6
 3# 0x00007FA912159741 in /lib/x86_64-linux-gnu/libc.so.6
 4# IndexAlterHelper_c::Alter_AddRemoveRowwiseAttr(CSphSchema const&, CSphSchema const&, unsigned int const*, unsigned int, unsigned char const*, WriteWrapper_c&, WriteWrapper_c&, bool, CSphString const&) in searchd
 5# CSphIndex_VLN::AddRemoveAttribute(bool, AttrAddRemoveCtx_t const&, CSphString&) in searchd
 6# RtIndex_c::AddRemoveAttribute(bool, AttrAddRemoveCtx_t const&, CSphString&) in searchd
 7# ClientSession_c::Execute(std::pair<char const*, int>, RowBuffer_i&) in searchd
 8# SqlServe(std::unique_ptr<AsyncNetBuffer_c, std::default_delete<AsyncNetBuffer_c> >) in searchd
 9# MultiServe(std::unique_ptr<AsyncNetBuffer_c, std::default_delete<AsyncNetBuffer_c> >, std::pair<int, unsigned short>, Proto_e) in searchd
10# 0x0000560665768542 in searchd
11# Threads::CoRoutine_c::CreateContext(std::function<void ()>, std::pair<boost::context::stack_context, Threads::StackFlavour_E>)::{lambda(boost::context::detail::transfer_t)#1}::__invoke(boost::context::detail::transfer_t) in searchd
12# make_fcontext in searchd

But the point is that, even before the ALTER, indextool reports:

FAILED, unable to open attributes: failed to open datadir/active_item/active_item.65.spa: No such file or directory

So the table is corrupted. If I remove chunk 65, I can run:

mysql> alter table active_item add column col uint;
Query OK, 0 rows affected (11.43 sec)

It succeeds, and the new column is there:

mysql> desc active_item;
...
| col                        | uint   |                |

and the table is not corrupted after that:

snikolaev@dev2:~/issue-1692$ indextool -c manticore.conf --check active_item
...
snikolaev@dev2:~/issue-1692$ echo $?
0

So the ALTER crashes due to the corrupted table.

What we can do in this specific case is check that all attribute files exist before trying to modify the table's attribute list.

sanikolaev removed the "waiting" label on Jan 17, 2024.

xdimus commented Jan 18, 2024

> If I remove chunk 65, I can do:

How do I correctly delete a bad chunk?

tomatolog (Contributor) commented:

The correct way is to truncate the table and then reindex the data from scratch.


xdimus commented Jan 25, 2024

That takes too long. Can I delete only the bad chunk, alter the table, and then re-fill it?

sanikolaev (Collaborator) commented:

In theory, yes. You can delete it by:

  • stopping Manticore
  • updating table's meta file
  • starting it back

Then you can re-insert data from the bad chunk.

klirichek closed this as completed in b7c3384 3 days ago.

sanikolaev (Collaborator) commented:

Reopening: it turns out that with this change a few columnar tests don't pass: https://github.com/manticoresoftware/columnar/runs/24096217755
