Lost connection to MySQL server during query #325

Closed
glukkkk opened this issue Apr 8, 2020 · 57 comments

@glukkkk

glukkkk commented Apr 8, 2020

--- crashed SphinxQL request dump ---
select count(distinct un_sku_id) From sku_20200317
--- request dump end ---
--- local index:sku_20200317
Manticore 3.4.0 b212975@200327 release
Handling signal 11
-------------- backtrace begins here ---------------
Program compiled with 4.8.5
Configured with flags: Configured by CMake with these definitions: -DCMAKE_BUILD_TYPE=RelWithDebInfo -DDISTR_BUILD=rhel7 -DUSE_SSL=ON -DDL_UNIXODBC=1 -DUNIXODBC_LIB=libodbc.so.2 -DDL_EXPAT=1 -DEXPAT_LIB=libexpat.so.1 -DUSE_LIBICONV=1
-DDL_MYSQL=1 -DMYSQL_LIB=libmysqlclient.so.18 -DDL_PGSQL=1 -DPGSQL_LIB=libpq.so.5 -DLOCALDATADIR=/var/data -DFULL_SHARE_DIR=/usr/share/manticore -DUSE_ICU=1 -DUSE_BISON=ON -DUSE_FLEX=ON -DUSE_SYSLOG=1 -DWITH_EXPAT=1 -DWITH_ICONV=ON
-DWITH_MYSQL=1 -DWITH_ODBC=ON -DWITH_PGSQL=1 -DWITH_RE2=1 -DWITH_STEMMER=1 -DWITH_ZLIB=ON -DGALERA_SOVERSION=31 -DSYSCONFDIR=/etc/manticoresearch
Host OS is Linux runner-ed2dce3a-project-3858465-concurrent-0 4.19.78-coreos #1 SMP Mon Oct 14 22:56:39 -00 2019 x86_64 x86_64 x86_64 GNU/Linux
Stack bottom = 0x7fd8e7982d7f, thread stack size = 0x100000
Trying manual backtrace:
Something wrong with thread stack, manual backtrace may be incorrect (fp=0xc)
Wrong stack limit or frame pointer, manual backtrace failed (fp=0xc, stack=0x7fd8e7980000, stacksize=0x100000)
Trying system backtrace:
begin of system symbols:
searchd(_Z12sphBacktraceib+0x90)[0x710e20]
searchd(_ZN16SphCrashLogger_c11HandleCrashEi+0x1fe)[0x58409e]
/lib64/libpthread.so.0(+0xf5f0)[0x7fd8f8c375f0]
searchd[0x8987c0]
searchd(_Z14sphGetBlobAttrRK9CSphMatchRK15CSphAttrLocatorPKh+0x22)[0x899f02]
searchd(_ZNK9CSphMatch13FetchAttrDataERK15CSphAttrLocatorPKh+0x10)[0x665fa0]
searchd[0x71efc6]
searchd(_ZN23CSphImplicitGroupSorterI16MatchGeneric2_fnLb1ELb0EE6MoveToEP15ISphMatchSorter+0xc9)[0x7669e9]
searchd(_Z15FlattenToSorterP15ISphMatchSorter11VecTraits_TIS0_E+0x30)[0x84b080]
searchd(_ZN13Tls_context_c8FinalizeEv+0x1e1)[0x870711]
searchd(_Z15QueryDiskChunksPK9CSphQueryP15CSphQueryResultRK18CSphMultiQueryArgsR15SphChunkGuard_tR11VecTraits_TIP15ISphMatchSorterEP16CSphQueryProfilebPK15CSphOrderedHashIl10CSphString15CSphStrHashFuncLi256EElPKcRS9_IPKhEl+0x49a)[0x85d52a]
searchd(_ZNK9RtIndex_c10MultiQueryEPK9CSphQueryP15CSphQueryResultiPP15ISphMatchSorterRK18CSphMultiQueryArgs+0x45f)[0x85f7af]
searchd(_ZNK9RtIndex_c12MultiQueryExEiPK9CSphQueryPP15CSphQueryResultPP15ISphMatchSorterRK18CSphMultiQueryArgs+0x77)[0x861517]
searchd(_ZN15SearchHandler_c16RunLocalSearchesEv+0x76a)[0x5bc57a]
searchd(_ZN15SearchHandler_c9RunSubsetEii+0xca7)[0x5ced77]
searchd(_ZN15SearchHandler_c10RunQueriesEv+0xb5)[0x5cfaf5]
searchd(_Z17HandleMysqlSelectR11RowBuffer_iR15SearchHandler_c+0x1b0)[0x5d0180]
searchd(_ZN16CSphinxqlSession7ExecuteERK10CSphStringR11RowBuffer_iRN7Threads9ThdDesc_tE+0x1411)[0x5f45c1]
searchd(_Z15LoopClientMySQLRhR16CSphinxqlSessionR10CSphStringibRN7Threads9ThdDesc_tER13InputBuffer_cR16ISphOutputBuffer+0x322)[0x5d1022]
searchd[0x5d13cb]
searchd(_Z17HandlerThreadFuncPv+0x19)[0x5d3519]
searchd(_ZN16SphCrashLogger_c13ThreadWrapperEPv+0x43)[0x583cd3]
searchd(_Z20sphThreadProcWrapperPv+0x23)[0x7149a3]
/lib64/libpthread.so.0(+0x7e65)[0x7fd8f8c2fe65]
/lib64/libc.so.6(clone+0x6d)[0x7fd8f744688d]
-------------- backtrace ends here ---------------
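
For anyone who wants to compare backtraces like the one above mechanically, here is a minimal Python sketch. It assumes the glibc-style `module(symbol+offset)[address]` frame layout seen in this dump, and it also tolerates a space where the `+` was lost in copy-paste. `FRAME_RE` and `parse_frames` are hypothetical helper names, not part of Manticore.

```python
import re

# One frame per line, e.g. searchd(_Z12sphBacktraceib+0x90)[0x710e20]
# The "(symbol+offset)" part is optional, and the symbol may be empty.
FRAME_RE = re.compile(
    r"^(?P<module>[^\s(\[]+)"
    r"(?:\((?P<symbol>[^\s+)]*)[+ ]?(?P<offset>0x[0-9a-fA-F]+)?\))?"
    r"\[(?P<address>0x[0-9a-fA-F]+)\]$"
)


def parse_frames(log_text):
    """Return one dict per backtrace frame found in log_text."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match is None:
            continue  # not a frame line (headers, timestamps, ...)
        frames.append({
            "module": match.group("module"),
            "symbol": match.group("symbol") or None,  # mangled C++ name, if any
            "offset": match.group("offset"),
            "address": int(match.group("address"), 16),
        })
    return frames
```

The mangled names collected this way can be passed through `c++filt` to get readable C++ signatures.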

@glukkkk
Author

glukkkk commented Apr 9, 2020

It happens every time I execute this query. The same error occurs when executing the following query:

select un_sku_id from sku_20200317 group by un_sku_id

@githubmanticore
Contributor

➤ Sergey Nikolaev commented:

Hi

I can't reproduce it on a test index:

mysql> show index rt status;  
+-------------------+---------------------------------------------------------------------------------------------------------+  
| Variable_name     | Value                                                                                                   |  
+-------------------+---------------------------------------------------------------------------------------------------------+  
| index_type        | rt                                                                                                      |  
| indexed_documents | 10000000                                                                                                |  
| indexed_bytes     | 4070133566                                                                                              |  
| ram_bytes         | 1914450281                                                                                              |  
| disk_bytes        | 3673594629                                                                                              |  
| ram_chunk         | 91268981                                                                                                |  
| ram_chunks_count  | 22                                                                                                      |  
| disk_chunks       | 34                                                                                                      |  
| mem_limit         | 104857600                                                                                               |  
| ram_bytes_retired | 0                                                                                                       |  
| tid               | 12104                                                                                                   |  
| tid_saved         | 12080                                                                                                   |  
| query_time_1min   | {"queries":1, "avg_sec":2.308, "min_sec":2.308, "max_sec":2.308, "pct95_sec":2.308, "pct99_sec":2.308}  |  
| query_time_5min   | {"queries":1, "avg_sec":2.308, "min_sec":2.308, "max_sec":2.308, "pct95_sec":2.308, "pct99_sec":2.308}  |  
| query_time_15min  | {"queries":1, "avg_sec":2.308, "min_sec":2.308, "max_sec":2.308, "pct95_sec":2.308, "pct99_sec":2.308}  |  
| query_time_total  | {"queries":11, "avg_sec":0.555, "min_sec":0.155, "max_sec":2.308, "pct95_sec":2.308, "pct99_sec":2.308} |  
| found_rows_1min   | {"queries":1, "avg":1, "min":1, "max":1, "pct95":1, "pct99":1}                                          |  
| found_rows_5min   | {"queries":1, "avg":1, "min":1, "max":1, "pct95":1, "pct99":1}                                          |  
| found_rows_15min  | {"queries":1, "avg":1, "min":1, "max":1, "pct95":1, "pct99":1}                                          |  
| found_rows_total  | {"queries":11, "avg":1, "min":1, "max":1, "pct95":1, "pct99":1}                                         |  
+-------------------+---------------------------------------------------------------------------------------------------------+  
20 rows in set (0.00 sec)  
  
mysql> select count(distinct site_id) from rt;  
+-------------------------+  
| count(distinct site_id) |  
+-------------------------+  
|                    1687 |  
+-------------------------+  
1 row in set (2.31 sec)  

So if possible - can you upload your index to our write-only FTP server?

ftp: dev.manticoresearch.com      
user: manticorebugs      
pass: shithappens  

It will be very helpful to debug the issue.

@glukkkk
Author

glukkkk commented Apr 9, 2020

So if possible - can you upload your index to our write-only FTP server?

ftp: dev.manticoresearch.com      
user: manticorebugs      
pass: shithappens  

It will be very helpful to debug the issue.

Done.

@tomatolog
Contributor

It seems your RT index got changed since the crash, because I tried to reproduce the crash on the data you uploaded and saw no issue; I got a correct reply:

mysql> select count(distinct un_sku_id) From sku_20200317;
+---------------------------+
| count(distinct un_sku_id) |
+---------------------------+
|                    128133 |
+---------------------------+

I need the same index that causes the crash, or a way to reproduce the crash locally.

@glukkkk
Author

glukkkk commented Apr 9, 2020

seems your RT index got changed since the crash

No, it hasn't. What additional info can I provide so that you can reproduce the issue?

@glukkkk
Author

glukkkk commented Apr 9, 2020

@tomatolog Maybe there is a difference in the config files? Can you please provide your searchd config? I'll try it on our server.

@tomatolog
Contributor

Here is the config you provided, with only the searchd section coming from me:

stas@dev:~/bin/1302$ cat s.conf
index sku_20200317
{
 type = rt
 path = sku/sku_20200317

 rt_mem_limit=1024M
 morphology = stem_enru
 index_exact_words = 1
 expand_keywords = 1
 min_infix_len = 3
 blend_chars = -
 charset_table = 0..9, english, russian, _, U+0401->U+0435, U+0451->U+0435

 # list of fields to index
 rt_field = aggregated
 rt_field = cert_num
 rt_field = holder_address
 rt_field = holder_country
 rt_field = holder
 rt_field = atc_code
 rt_field = atc
 rt_field = ptg
 rt_field = trade_name
 rt_field = drugs_presence
 rt_field = circulation_period
 rt_field = man_info_country
 rt_field = man_info_address
 rt_field = man_info_company
 rt_field = man_info_stage
 rt_field = sku_pack_text
 rt_field = consumer_pack_type_text
 rt_field = consumer_pack_type
 rt_field = purpose
 rt_field = orig_man_form_dosage_text
 rt_field = orig_man_form_dosage_form_text
 rt_field = orig_man_form_shelf_life
 rt_field = orig_man_form_storage_conditions
 rt_field = pack_component
 rt_field = primary_pack_type_text
 rt_field = primary_pack_type
 rt_field = man_form_measure_unit
 rt_field = man_form_shelf_life
 rt_field = man_form_storage_conditions
 rt_field = man_form_dosage_form
 rt_field = inn_man_form_measure_unit
 rt_field = inn
 rt_field = inn_code
 rt_field = inn_text
 rt_field = cert_num_id
 rt_field = atc_id
 rt_field = barcode_type_id
 rt_field = consumer_pack_type_id
 rt_field = grls_data_id
 rt_field = grls_sku_id
 rt_field = grls_uuid
 rt_field = grls_id
 rt_field = holder_address_id
 rt_field = holder_country_id
 rt_field = holder_id
 rt_field = inn_id
 rt_field = inn_man_form_measure_unit_id
 rt_field = man_form_dosage_form_id
 rt_field = man_form_measure_unit_id
 rt_field = man_info_address_id
 rt_field = man_info_company_id
 rt_field = man_info_country_id
 rt_field = man_info_stage_id
 rt_field = pack_component_id
 rt_field = primary_pack_type_id
 rt_field = un_sku_id
 rt_field = total_count
 rt_field = sku_base_name
 rt_field = inn_list
 rt_field = inn_list_id

 # list of attributes
 rt_attr_string = cert_num
 rt_attr_string = cert_num_id
 rt_attr_bigint = validity_date_to
 rt_attr_bigint = validity_date_from
 rt_attr_string = holder_address
 rt_attr_string = holder_address_id
 rt_attr_string = holder_country
 rt_attr_string = holder_country_id
 rt_attr_string = holder
 rt_attr_string = holder_id
 rt_attr_string = atc
 rt_attr_string = atc_code
 rt_attr_string = atc_id
 rt_attr_string = ptg
 rt_attr_string = trade_name
 rt_attr_bigint = is_interchangeable
 rt_attr_bigint = is_ref
 rt_attr_bool = is_life_important
 rt_attr_bool = is_eaeu
 rt_attr_string = drugs_presence
 rt_attr_string = circulation_period
 rt_attr_bigint = cancel_date
 rt_attr_bigint = renew_date
 rt_attr_bigint = exp_date
 rt_attr_bigint = reg_date
 rt_attr_bigint = created_at
 rt_attr_string = grls_data_id
 rt_attr_string = grls_id
 rt_attr_string = grls_uuid
 rt_attr_string = man_info_country
 rt_attr_string = man_info_country_id
 rt_attr_string = man_info_address
 rt_attr_string = man_info_address_id
 rt_attr_string = man_info_company
 rt_attr_string = man_info_company_id
 rt_attr_string = man_info_stage
 rt_attr_string = man_info_stage_id
 rt_attr_bigint = is_recipe
 rt_attr_string = sku_pack_text
 rt_attr_string = consumer_pack_type_text
 rt_attr_string = consumer_pack_type
 rt_attr_string = consumer_pack_type_id
 rt_attr_string = purpose
 rt_attr_string = orig_man_form_dosage_text
 rt_attr_string = orig_man_form_dosage_form_text
 rt_attr_uint = consumer_pack_count
 rt_attr_uint = consumer_pack_count_end
 rt_attr_string = grls_sku_id
 rt_attr_string = un_sku_id
 rt_attr_uint = pack_component_count
 rt_attr_string = pack_component
 rt_attr_string = pack_component_id
 rt_attr_string = primary_pack_type_text
 rt_attr_string = primary_pack_type
 rt_attr_string = primary_pack_type_id
 rt_attr_float = man_form_count
 rt_attr_float = man_form_count_end
 rt_attr_string = man_form_measure_unit
 rt_attr_string = man_form_measure_unit_id
 rt_attr_uint = primary_pack_count
 rt_attr_uint = primary_pack_count_end
 rt_attr_string = man_form_dosage_form
 rt_attr_string = man_form_dosage_form_id
 rt_attr_string = inn_man_form_measure_unit
 rt_attr_string = inn_man_form_measure_unit_id
 rt_attr_float = inn_man_form_count_end
 rt_attr_float = inn_man_form_count
 rt_attr_string = inn
 rt_attr_string = inn_id
 rt_attr_bigint = inn_is_inn
 rt_attr_string = inn_text
 rt_attr_string = man_form_shelf_life
 rt_attr_string = orig_man_form_shelf_life
 rt_attr_bool = has_pim_reference
 rt_attr_string = sku_base_name
}


searchd
{
        listen                  = 7306:mysql41_vip
        log                             = searchd.log
        pid_file                = searchd.pid
        query_log               = query.log
        query_log_format = sphinxql
        workers                 = thread_pool # threads # thread_pool #threads # for RT to work
        max_children = 4
        binlog_path             = data
        seamless_rotate = 1

        client_timeout = 15000
        read_timeout = 10
        max_packet_size = 128M
}

@tomatolog
Contributor

Could you provide or upload searchd.log? Maybe there are many different crashes, or the log shows which queries cause the daemon to crash.

@tomatolog
Contributor

Could you also provide your box's OS and the package that you use?

Could you create a test case in a Docker container that crashes the daemon and upload it to our FTP?

@glukkkk
Author

glukkkk commented Apr 10, 2020

Could you provide or upload searchd.log? Maybe there are many different crashes, or the log shows which queries cause the daemon to crash.

@tomatolog Uploaded to the same folder.

Could you also provide your box's OS and the package that you use?

Linux dev-search.local 3.10.0-1062.18.1.el7.x86_64 #1 SMP Tue Mar 17 23:49:17 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Could you create a test case in a Docker container that crashes the daemon and upload it to our FTP?

We're working on it.

@glukkkk
Author

glukkkk commented Apr 10, 2020

JFYI, the same index configuration works properly in Sphinx 3.2.1 on the same server.

@tomatolog
Contributor

From the log you provided I see that a lot of updates were replayed from the binlog after every daemon restart.

Did you upload your index after a clean daemon shutdown? Otherwise the index might not be in the state that causes the crash.

@glukkkk
Author

glukkkk commented Apr 10, 2020

From the log you provided I see that a lot of updates were replayed from the binlog after every daemon restart.

Did you upload your index after a clean daemon shutdown? Otherwise the index might not be in the state that causes the crash.

Yes, I stopped the Manticore service, put the index data into /data/sphinx and cleared the binlogs. After this I started the service. Almost all queries against this index execute properly, except the problematic ones.

I've uploaded *.vdi images to your FTP server (dev-search.zip). The problem reproduces there. The root password is 1234.

@tomatolog
Contributor

what OS should I use for your vdi file?

@glukkkk
Author

glukkkk commented Apr 10, 2020

what OS should I use for your vdi file?

Linux 64-bit. There is CentOS inside.

@tomatolog
Contributor

There are two vdi files inside. Should I mount both, or a specific one?

@glukkkk
Author

glukkkk commented Apr 10, 2020

Mount both

@tomatolog
Contributor

I checked the VM you provided and reproduced the crash with this data.

However, I checked that index with indextool and see that a disk chunk of the RT index is damaged:

C:\dev\sphinx\manticore\build\src\Debug>indextool -c a2.conf --check sku_20200317
Manticore 3.4.2 8a46d92a@200410 dev
using config file 'a2.conf'...
checking index 'weblog2'...
checking schema...
checking disk chunk, extension 0, 0(2)...
checking schema...
checking rows...
FAILED, Unknown blob row type: 129 at offset 88769916, docid=3857731375265344514, rowid=40245 of 136352
FAILED, Unknown blob row type: 128 at offset 88771812, docid=6807777356173493252, rowid=40246 of 136352
checking attribute blocks index...

That is why the daemon crashed when reading such an invalid string attribute.

I need a way to investigate how the index got into such a state.
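
To track which rows a check reports as damaged across repeated attempts, the FAILED lines can be collected programmatically. The sketch below is only an illustration: the line layout is inferred from the two FAILED lines quoted in this comment, other indextool messages may look different, and `FAILURE_RE` and `parse_check_failures` are hypothetical names.

```python
import re

# Layout inferred from the two FAILED lines in this thread (an assumption):
#   FAILED, <message> at offset N, docid=N, rowid=N of N
FAILURE_RE = re.compile(
    r"^FAILED, (?P<message>.+?) at offset (?P<offset>\d+), "
    r"docid=(?P<docid>\d+), rowid=(?P<rowid>\d+) of (?P<rows>\d+)$"
)


def parse_check_failures(output):
    """Extract the damaged-row reports from captured indextool --check output."""
    failures = []
    for line in output.splitlines():
        match = FAILURE_RE.match(line.strip())
        if match is None:
            continue  # progress lines such as "checking rows..."
        record = {
            key: int(value)
            for key, value in match.groupdict().items()
            if key != "message"
        }
        record["message"] = match.group("message")
        failures.append(record)
    return failures
```

Applied to the output above, this yields two records with adjacent rowids, 40245 and 40246.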

@glukkkk
Author

glukkkk commented Apr 12, 2020

I just start filling the index and it breaks at some point every time. The same process works properly with the latest version of Sphinx.

I can fill the index from scratch and provide you with query.log and searchd.log after that. Will it help you?

@tomatolog
Contributor

No.
I need the source query stream for that index. That should be an SQL file with the insert / replace / update statements that fill the index with data, plus the flush and select statements that issue searches and show that the daemon breaks at some point.

query.log is not appropriate, as it contains only select / search queries and we cannot recreate the index from it.

@glukkkk
Author

glukkkk commented Apr 12, 2020

Ok, I'll try to make such a file in 1-2 days. I'll upload it to the server and let you know about it.

@glukkkk
Author

glukkkk commented Apr 14, 2020

@tomatolog Uploaded to /issue-325/log.sql.zip

@tomatolog
Contributor

I truncated your index, then inserted the data you provided with mysql -h 127.0.0.1 -P 8306 < log.sql

After the populate finished I checked the index with indextool and saw none of the issues that the index from the VM has.

I don't need raw data or an already broken index; I need a way to recreate how the index becomes broken.

Maybe there should be a script that inserts data from the stream in N parallel threads, or interleaves the populate with searches or OPTIMIZE commands. Do you have any thoughts on how to get a broken index?
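
One way such a script could be structured is sketched below in Python. It is an assumption-laden illustration, not a tested reproducer: `build_plan` and `run_plan` are hypothetical helpers, and `execute` is a caller-supplied function (for example, a wrapper around one MySQL-protocol connection to searchd per thread) that this sketch deliberately leaves out.

```python
from concurrent.futures import ThreadPoolExecutor


def build_plan(statements, workers, probe, every):
    """Split write statements round-robin into `workers` lanes and
    append `probe` (a search or OPTIMIZE) after every `every` writes."""
    if workers < 1 or every < 1:
        raise ValueError("workers and every must be >= 1")
    lanes = [[] for _ in range(workers)]
    for position, statement in enumerate(statements):
        lanes[position % workers].append(statement)
    plan = []
    for lane in lanes:
        mixed = []
        for count, statement in enumerate(lane, start=1):
            mixed.append(statement)
            if count % every == 0:
                mixed.append(probe)
        plan.append(mixed)
    return plan


def run_plan(plan, execute):
    """Run every lane in its own thread; execute(lane_index, statement)
    is supplied by the caller. Returns the statement count per lane."""
    def run_lane(index):
        for statement in plan[index]:
            execute(index, statement)
        return len(plan[index])

    with ThreadPoolExecutor(max_workers=len(plan)) as pool:
        return list(pool.map(run_lane, range(len(plan))))
```

Feeding it the statements from log.sql with a probe such as `select count(distinct un_sku_id) from sku_20200317` would interleave populate and search load, which is the scenario suggested above.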

@glukkkk glukkkk closed this as completed Apr 15, 2020
@glukkkk glukkkk reopened this Apr 15, 2020
@glukkkk
Author

glukkkk commented Apr 15, 2020

@tomatolog Ok, I have uploaded an extended SQL file to the server (/issue-325/log_extended.sql.gz)

What I did to reproduce the problem:

  • stopped the manticore/searchd service
  • cleared data folder
  • cleared binlogs folder
  • started the manticore/searchd service
  • executed mysql -h 127.0.0.1 -P9306 < log_extended.sql
  • entered mysql console and executed select un_sku_id from sku group by un_sku_id

After this I got ERROR 2013 (HY000): Lost connection to MySQL server during query.

Note that this problem reproduces in the VM using the vdi images uploaded earlier.

@tomatolog
Contributor

OK, I will try on a native box first, then on the VM you provided if the native build shows no issue.

@glukkkk
Author

glukkkk commented Apr 20, 2020

@tomatolog Any updates?

@glukkkk
Author

glukkkk commented Apr 27, 2020

@tomatolog Hello again! Did you reproduce the issue with the provided info?

@glukkkk
Author

glukkkk commented May 13, 2020

@tomatolog Unfortunately, it doesn't work. It still fails on the following query:

select count(distinct un_sku_id) from sku_20200317;

Server version: 3.4.3 ab7cbe5@200511 dev

@githubmanticore
Contributor

githubmanticore commented May 13, 2020

➤ Stan commented:

Could you provide the crash log from the new build?

@glukkkk
Author

glukkkk commented May 13, 2020

@githubmanticore @tomatolog

[Wed May 13 08:06:46.914 2020] [1270] rt: index sku_20200317: diskchunk 13(1), segments 32 saved in 12.928 sec
[Wed May 13 08:11:04.333 2020] [1270] rt: index sku_20200317: diskchunk 14(2), segments 32 saved in 12.911 sec
------- FATAL: CRASH DUMP -------
[Wed May 13 08:17:46.919 2020] [ 966]

--- crashed SphinxQL request dump ---
select count(distinct un_sku_id) from sku_20200317
--- request dump end ---
--- local index:sku_20200317
Manticore 3.4.3 ab7cbe5@200511 dev
Handling signal 11
-------------- backtrace begins here ---------------
Program compiled with 4.8.5
Configured with flags: Configured by CMake with these definitions: -DCMAKE_BUILD_TYPE=RelWithDebInfo -DDISTR_BUILD=rhel7 -DUSE_SSL=ON -DDL_UNIXODBC=1 -DUNIXODBC_LIB=libodbc.so.2 -DDL_EXPAT=1 -DEXPAT_LIB=libexpat.so.1 -DUSE_LIBICONV=1 -DDL_MYSQL=1 -DMYSQL_LIB=libmysqlclient.so.18 -DDL_PGSQL=1 -DPGSQL_LIB=libpq.so.5 -DLOCALDATADIR=/var/data -DFULL_SHARE_DIR=/usr/share/manticore -DUSE_ICU=1 -DUSE_BISON=ON -DUSE_FLEX=ON -DUSE_SYSLOG=1 -DWITH_EXPAT=1 -DWITH_ICONV=ON -DWITH_MYSQL=1 -DWITH_ODBC=ON -DWITH_PGSQL=1 -DWITH_RE2=1 -DWITH_STEMMER=1 -DWITH_ZLIB=ON -DGALERA_SOVERSION=31 -DSYSCONFDIR=/etc/manticoresearch
Host OS is Linux dev-search.pharm.local 3.10.0-1062.18.1.el7.x86_64 #1 SMP Tue Mar 17 23:49:17 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Stack bottom = 0x7f667700ed7f, thread stack size = 0x100000
Trying manual backtrace:
Something wrong with thread stack, manual backtrace may be incorrect (fp=0xc)
Wrong stack limit or frame pointer, manual backtrace failed (fp=0xc, stack=0x7f6677010000, stacksize=0x100000)
Trying system backtrace:
begin of system symbols:
/usr/bin/searchd(_Z12sphBacktraceib+0x90)[0x715a50]
/usr/bin/searchd(_ZN16SphCrashLogger_c11HandleCrashEi+0x1fe)[0x58613e]
/lib64/libpthread.so.0(+0xf630)[0x7f668036a630]
/usr/bin/searchd[0x8a37c0]
/usr/bin/searchd(_Z14sphGetBlobAttrRK9CSphMatchRK15CSphAttrLocatorPKh+0x22)[0x8a4ef2]
/usr/bin/searchd(_ZNK9CSphMatch13FetchAttrDataERK15CSphAttrLocatorPKh+0x10)[0x669030]
/usr/bin/searchd[0x723586]
/usr/bin/searchd(_ZN23CSphImplicitGroupSorterI16MatchGeneric2_fnLb1ELb0EE6MoveToEP15ISphMatchSorter+0xc9)[0x76ac19]
/usr/bin/searchd(_Z15FlattenToSorterP15ISphMatchSorter11VecTraits_TIS0_E+0x30)[0x856410]
/usr/bin/searchd(_ZN13Tls_context_c8FinalizeEv+0x1e4)[0x87b684]
/usr/bin/searchd(_Z15QueryDiskChunksPK9CSphQueryP15CSphQueryResultRK18CSphMultiQueryArgsR15SphChunkGuard_tR11VecTraits_TIP15ISphMatchSorterEP16CSphQueryProfilebPK15CSphOrderedHashIl10CSphString15CSphStrHashFuncLi256EElPKcRS9_IPKhEl+0x5d2)[0x868642]
/usr/bin/searchd(_ZNK9RtIndex_c10MultiQueryEPK9CSphQueryP15CSphQueryResultiPP15ISphMatchSorterRK18CSphMultiQueryArgs+0x425)[0x86a9b5]
/usr/bin/searchd(_ZNK9RtIndex_c12MultiQueryExEiPK9CSphQueryPP15CSphQueryResultPP15ISphMatchSorterRK18CSphMultiQueryArgs+0x77)[0x86c777]
/usr/bin/searchd(_ZN15SearchHandler_c16RunLocalSearchesEv+0x961)[0x5beb51]
/usr/bin/searchd(_ZN15SearchHandler_c9RunSubsetEii+0xca7)[0x5d23c7]
/usr/bin/searchd(_ZN15SearchHandler_c10RunQueriesEv+0xb5)[0x5d3145]
/usr/bin/searchd(_Z17HandleMysqlSelectR11RowBuffer_iR15SearchHandler_c+0x1b0)[0x5d37d0]
/usr/bin/searchd(_ZN16CSphinxqlSession7ExecuteERK10CSphStringR11RowBuffer_iRN7Threads9ThdDesc_tE+0x1405)[0x5f6605]
/usr/bin/searchd(_Z15LoopClientMySQLRhR16CSphinxqlSessionR10CSphStringibRN7Threads9ThdDesc_tER13InputBuffer_cR16ISphOutputBuffer+0x322)[0x5d4752]
/usr/bin/searchd[0x5d4afb]
/usr/bin/searchd(_Z17HandlerThreadFuncPv+0x19)[0x5d6d69]
/usr/bin/searchd(_ZN16SphCrashLogger_c13ThreadWrapperEPv+0x43)[0x585d73]
/usr/bin/searchd(_Z20sphThreadProcWrapperPv+0x23)[0x7195d3]
/lib64/libpthread.so.0(+0x7ea5)[0x7f6680362ea5]
/lib64/libc.so.6(clone+0x6d)[0x7f667eb798dd]
-------------- backtrace ends here ---------------
Please, create a bug report in our bug tracker (https://github.com/manticoresoftware/manticore/issues)
and attach there:
a) searchd log, b) searchd binary, c) searchd symbols.
Look into the chapter 'Reporting bugs' in the documentation
(http://docs.manticoresearch.com/latest/html/reporting_bugs.html)
Dump with GDB via watchdog
[Wed May 13 08:17:47.204 2020] [965] watchdog: got USR1, performing dump of child's stack
Will run gdb on '/usr/bin/searchd', pid '966'
Error reading attached process's symbol file.
: No such file or directory.
Error reading attached process's symbol file.
: No such file or directory.
[New LWP 1340]
[New LWP 1292]
[New LWP 1291]
[New LWP 1290]
[New LWP 1289]
[New LWP 968]
[New LWP 967]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007f667eb709a3 in select () from /lib64/libc.so.6
Id Target Id Frame
8 Thread 0x7f6680780700 (LWP 967) "TaskSched" 0x00007f6680366de2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
7 Thread 0x7f6677114700 (LWP 968) "TaskW_1" 0x00007f6680366de2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
6 Thread 0x7f6676f0a700 (LWP 1289) "rtsearch_0" 0x00007f6680366a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
5 Thread 0x7f6676ace700 (LWP 1290) "rtsearch_1" 0x00007f6680366a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
4 Thread 0x7f66769c9700 (LWP 1291) "rtsearch_2" 0x00007f6680366a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
3 Thread 0x7f66768c4700 (LWP 1292) "rtsearch_3" 0x00007f6680366a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
2 Thread 0x7f667700f700 (LWP 1340) "handler" 0x00007f667eb709a3 in select () from /lib64/libc.so.6
* 1 Thread 0x7f66807828c0 (LWP 966) "searchd" 0x00007f667eb709a3 in select () from /lib64/libc.so.6

Thread 8 (Thread 0x7f6680780700 (LWP 967)):
#0 0x00007f6680366de2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x000000000071a106 in ?? ()
#2 0x000000005ebb82fb in ?? ()
#3 0x000000002cb4ef84 in ?? ()
#4 0x00000000000003e7 in ?? ()
#5 0x0000000000e2af60 in ?? ()
#6 0x00000000000003e7 in ?? ()
#7 0x00000000000f4202 in ?? ()
#8 0x0000000000000000 in ?? ()

Thread 7 (Thread 0x7f6677114700 (LWP 968)):
#0 0x00007f6680366de2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000000000719fb6 in ?? ()
#2 0x000000005ebb8552 in ?? ()
#3 0x000000002cb41fda in ?? ()
#4 0x0000000002363228 in ?? ()
#5 0x0000000023c345ff in ?? ()
#6 0x0000000002363228 in ?? ()
#7 0x0000000000000002 in ?? ()
#8 0x0000000002363250 in ?? ()
#9 0x0000000000624f74 in ?? ()
#10 0x00007f6670000ad0 in ?? ()
#11 0x0000000000e2b0a8 in ?? ()
#12 0x0000000000e2b0a8 in ?? ()
#13 0x0000000000e2b098 in ?? ()
#14 0x0000000000000000 in ?? ()

Thread 6 (Thread 0x7f6676f0a700 (LWP 1289)):
#0 0x00007f6680366a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000000000910c59 in ?? ()
#2 0x0000000000001000 in ?? ()
#3 0x00007f666c376208 in ?? ()
#4 0x0000000000000000 in ?? ()

Thread 5 (Thread 0x7f6676ace700 (LWP 1290)):
#0 0x00007f6680366a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000000000910c59 in ?? ()
#2 0x0000000000000000 in ?? ()

Thread 4 (Thread 0x7f66769c9700 (LWP 1291)):
#0 0x00007f6680366a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000000000910c59 in ?? ()
#2 0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7f66768c4700 (LWP 1292)):
#0 0x00007f6680366a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000000000910c59 in ?? ()
#2 0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7f667700f700 (LWP 1340)):
#0 0x00007f667eb709a3 in select () from /lib64/libc.so.6
#1 0x0000000000661284 in ?? ()
#2 0x0000000000000002 in ?? ()
#3 0x00000000000dff5b in ?? ()
#4 0x00000000000000f6 in ?? ()
#5 0x0000000000715ad0 in ?? ()
#6 0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f66807828c0 (LWP 966)):
#0 0x00007f667eb709a3 in select () from /lib64/libc.so.6
#1 0x0000000000596b74 in ?? ()
#2 0x00007ffcffffffff in ?? ()
#3 0x00007ffc07679460 in ?? ()
#4 0x0000000000000000 in ?? ()

Main thread:
#0 0x00007f667eb709a3 in select () from /lib64/libc.so.6
#1 0x0000000000596b74 in ?? ()
#2 0x00007ffcffffffff in ?? ()
#3 0x00007ffc07679460 in ?? ()
#4 0x0000000000000000 in ?? ()

Local variables:
No symbol table info available.
[Inferior 1 (process 966) detached]
You can obtain the sources of this version from https://github.com/manticoresoftware/manticoresearch/archive/ab7cbe5.zip
and set up debug env with this shippet (select wget or curl version below):

wget https://codeload.github.com/manticoresoftware/manticoresearch/zip/ab7cbe5 -O manticore.zip
curl https://codeload.github.com/manticoresoftware/manticoresearch/zip/ab7cbe5 -o manticore.zip

Unpack the sources by command:
mkdir -p /tmp/manticore && unzip manticore.zip -d /tmp/manticore

For comfortable debug also suggest to append a substitution def to your ~/.gdbinit file:
set substitute-path "/home/ztsv/manticoresearch" /tmp/manticore/manticoresearch-ab7cbe5
--- 1 active threads ---
thd 0, proto sphinxql, state query, command select
------- CRASH DUMP END -------
[Wed May 13 08:17:50.335 2020] [965] watchdog: main process 966 crashed via CRASH_EXIT (exit code 2), will be restarted
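
When a searchd.log contains several dumps like this one, it can help to reduce each to the crashed query, the version line, and the signal before comparing them. The following is a minimal Python sketch under the assumption that every dump uses the same marker lines as the one above; `summarize_crashes` and the constants are hypothetical names.

```python
import re

QUERY_START = "--- crashed SphinxQL request dump ---"
QUERY_END = "--- request dump end ---"
VERSION_RE = re.compile(r"^Manticore (\d+\.\d+\.\d+) (\S+)")
SIGNAL_RE = re.compile(r"^Handling signal (\d+)$")


def summarize_crashes(log_text):
    """Return one summary dict per crash dump found in log_text."""
    crashes = []
    current = None    # summary of the dump currently being read
    in_query = False  # True between the two request-dump markers
    for raw in log_text.splitlines():
        line = raw.strip()
        if line == QUERY_START:
            current = {"query": "", "version": None, "build": None, "signal": None}
            crashes.append(current)
            in_query = True
        elif line == QUERY_END:
            in_query = False
        elif current is None:
            continue  # ordinary log line before the first dump
        elif in_query:
            current["query"] = (current["query"] + "\n" + line).strip()
        elif current["version"] is None and VERSION_RE.match(line):
            current["version"], current["build"] = VERSION_RE.match(line).groups()
        elif current["signal"] is None and SIGNAL_RE.match(line):
            current["signal"] = int(SIGNAL_RE.match(line).group(1))
    return crashes
```

For the dump above this gives the `select count(distinct un_sku_id) from sku_20200317` query, version 3.4.3, build ab7cbe5@200511, and signal 11.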

@tomatolog
Contributor

Could you upload this index (where the crash happens) to our FTP?

@glukkkk
Author

glukkkk commented May 13, 2020

@tomatolog Uploaded to /issue-325/sku(2020May13).tar.gz

@tomatolog
Contributor

I got a correct reply for your recent index from the archive sku(2020May13).tar.gz and your query:

mysql> select count(distinct un_sku_id) from sku;
+---------------------------+
| count(distinct un_sku_id) |
+---------------------------+
|                    120005 |
+---------------------------+
1 row in set (0.10 sec)

indextool --check also found no errors related to attributes.

Running the daemon under valgrind shows no errors either.

Could you upload to our FTP the package with which the daemon crashes, along with its debug package? Or the daemon binary along with its symbol file? For binaries I need not only the daemon and its symbols but also the indexer binary, to check which cmake options you use for the build.

With the daemon that I built (Manticore 3.4.3 ab7cbe5@200511 dev) I see no crash like the one you describe.

@glukkkk
Author

glukkkk commented May 13, 2020

We will try again on a clean Manticore installation when a new version is released and will let you know the result.

@tomatolog BTW, which flags should be used for building the daemon? Maybe we can try to rebuild it properly.

@tomatolog
Contributor

Here is the output of the binaries I tried on the dev box with your data:

stas@dev:~/bin$ indexer -h
Built by gcc/clang v 5.4.0,

Built on Linux dev.manticoresearch.com 4.15.0-30-generic #32~16.04.1-Ubuntu SMP Thu Jul 26 20:25:39 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Configured by CMake with these definitions: -DCMAKE_BUILD_TYPE=RelWithDebugInfo -DUSE_SSL=ON -DDL_EXPAT=1 -DEXPAT_LIB=libexpat.so.1 -DUSE_LIBICONV=1 -DDL_MYSQL=1 -DMYSQL_LIB=libmysqlclient.so.20 -DUSE_ICU=1 -DUSE_BISON=ON -DUSE_FLEX=ON -DUSE_SYSLOG=1 -DWITH_EXPAT=ON -DWITH_ICONV=ON -DWITH_MYSQL=ON -DWITH_RE2=1 -DWITH_STEMMER=ON -DWITH_ZLIB=ON -DGALERA_SOVERSION=31
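
Since the question is which flags differ between the two builds, the two "Configured by CMake" lines in this thread can be compared mechanically. Below is a minimal Python sketch; `parse_definitions` and `diff_definitions` are hypothetical names, and the comparison is purely textual (so `1` versus `ON` shows up as a difference even though CMake treats both as true).

```python
def parse_definitions(text):
    """Collect -DNAME=VALUE tokens from a 'Configured by CMake' line."""
    definitions = {}
    for token in text.split():
        if token.startswith("-D") and "=" in token:
            name, value = token[2:].split("=", 1)
            definitions[name] = value
    return definitions


def diff_definitions(first, second):
    """Return (only_in_first, only_in_second, changed) for two dicts
    produced by parse_definitions."""
    only_first = {k: first[k] for k in first.keys() - second.keys()}
    only_second = {k: second[k] for k in second.keys() - first.keys()}
    changed = {
        k: (first[k], second[k])
        for k in first.keys() & second.keys()
        if first[k] != second[k]
    }
    return only_first, only_second, changed
```

Run on the crash log's flag line and the one above, it would report, among others, `DISTR_BUILD=rhel7` only in the crashing build and different `CMAKE_BUILD_TYPE` and `MYSQL_LIB` values.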

@tomatolog
Contributor

Here are rhel7 packages that I built from the master version in the CI pipeline we use for regular releases.

Could you check it?

@glukkkk
Author

glukkkk commented Jun 9, 2020

Still no luck. I'll wait for the next release version and try there. If the problem is fixed, I will close the issue.

@sanikolaev
Collaborator

You can also check one of these https://repo.manticoresearch.com/#browse/browse:dev:release%2Fcentos%2F7

The most recent one at the moment is from June 8.

@glukkkk
Author

glukkkk commented Jun 10, 2020

@sanikolaev We have tried it using your latest dev-release, it fails as well.

I've uploaded the file to your server (/issue-325/vagrant-pack.tar.gz). Please follow the instructions in the README.md file and you will reproduce the error.

@sanikolaev
Collaborator

Thanks! We'll take a look into it.

@tomatolog
Contributor

I installed Vagrant, Ansible and VirtualBox on a VPS; however, it failed to start with the following error message:

vagrant up
...
    default: Guest Additions Version: 6.1.6
    default: VirtualBox Version: 5.1
==> default: Mounting shared folders...
    default: /vagrant => /root
==> default: Running provisioner: ansible...
    default: Running ansible-playbook...
PYTHONUNBUFFERED=1 ANSIBLE_FORCE_COLOR=true ANSIBLE_HOST_KEY_CHECKING=false ANSIBLE_SSH_ARGS='-o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ControlMaster=auto -o ControlPersist=60s' ansible-playbook --connection=ssh --timeout=30 --limit='default' --inventory-file=/root/.vagrant/provisioners/ansible/inventory -v playbook.yml
Using /etc/ansible/ansible.cfg as config file
ERROR! The handlers/main.yml file for role 'manticore-search' must contain a list of tasks

The error appears to have been in '/root/roles/manticore-search/tasks/main.yml': line 2, column 1, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

---
- name: Install ManticoreSearch
^ here

Ansible failed to complete successfully. Any error output should be
visible above. Please fix these errors and try again.

@tomatolog
Contributor

If I replay your data.sql on a bare-metal box with the daemon installed, I get the correct reply:

stas@dev:~/bin/325/06_15$ mysql -e "select count(distinct un_sku_id) From sku;"
+---------------------------+
| count(distinct un_sku_id) |
+---------------------------+
|                    163509 |
+---------------------------+

@tomatolog tomatolog added the waiting Waiting for the original poster (in most cases) or something else label Jun 15, 2020
@glukkkk
Author

glukkkk commented Jun 15, 2020

> I installed Vagrant and Ansible and VirtualBox at VPS however it failed to start with following error message
>
> vagrant up
> ...
> ERROR! The handlers/main.yml file for role 'manticore-search' must contain a list of tasks

Not sure why you are getting this error. We checked it on macOS and Ubuntu with different versions of Ansible, Vagrant and VBox, and it works properly.

Could you please check it using another environment?

Thank you in advance.

@tomatolog
Contributor

If you still use VirtualBox, could you repeat your steps but, before the last step where the daemon crashes on the count query, save a snapshot of the working VM and upload that snapshot along with the vdi file to the FTP?

I could then import that VM into VirtualBox and check the crash without posting data or doing any setup.

@glukkkk
Author

glukkkk commented Jun 16, 2020

Uploaded to /issue-325/vagrant-pack_default_1592323550793_68799.ova
login/pass: vagrant/vagrant

Just run the machine and execute: mysql -h 127.0.0.1 -P9306 < /tmp/data.sql

@klirichek
Contributor

Pavel,
it would be good if, in EACH case when a crash is reproduced, you cite the tail of searchd.log with the backtrace and all related stuff.
It might be possible that the cause could be deciphered, and then fixed and confirmed, even without going through the full cycle of all these exercises of starting the VM, Vagrant and other parts (if it actually doesn't depend on very narrow conditions of the environment and can also be reproduced more widely, on a bare-metal system).
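
A minimal sketch of one way to cut that tail out of the log, assuming the crash report in searchd.log starts with a line beginning `--- crashed` (as in the dump at the top of this issue). The function name `last_crash_report` and the log path in the usage comment are hypothetical and not part of Manticore.

```python
CRASH_MARKER = "--- crashed"  # assumed prefix of the first line of a crash report


def last_crash_report(log_text: str, marker: str = CRASH_MARKER) -> str:
    """Return the log from the last crash marker line to the end, or '' if none."""
    pos = log_text.rfind(marker)
    if pos == -1:
        return ""
    # Back up to the beginning of the line that contains the marker.
    line_start = log_text.rfind("\n", 0, pos) + 1
    return log_text[line_start:]


# Hypothetical usage (the path depends on your searchd configuration):
#   from pathlib import Path
#   text = Path("/var/log/manticore/searchd.log").read_text(errors="replace")
#   print(last_crash_report(text))
```

Only the most recent report is returned, so repeated crashes in one log do not get mixed into a single paste.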

@tomatolog
Contributor

I reproduced the issue with the VM image you provided.

The installed version crashes on inserting data.
However, after I updated to master HEAD (3.4.3 751f340@200616), the data was inserted fine, but it crashed on the select query

 select count(distinct un_sku_id) From sku_20200317;

with the following stack

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f76326ed700 (LWP 18792)]
GetBlobAttr (pRow=0x8 <Address 0x8 out of bounds>, iBlobAttrId=34, nBlobAttrs=52)
    at /usr/src/debug/manticore-3.4.3-200616-751f340-release-rhel7/bin/src_0/src/attribute.cpp:538
538             switch ( *pRow )
(gdb) bt
#0  GetBlobAttr (pRow=0x8 <Address 0x8 out of bounds>, iBlobAttrId=34, nBlobAttrs=52)
    at /usr/src/debug/manticore-3.4.3-200616-751f340-release-rhel7/bin/src_0/src/attribute.cpp:538
#1  0x00000000008bea62 in sphGetBlobAttr (tMatch=..., tLocator=..., pBlobPool=<optimized out>)
    at /usr/src/debug/manticore-3.4.3-200616-751f340-release-rhel7/bin/src_0/src/attribute.cpp:563
#2  0x000000000066edd0 in CSphMatch::FetchAttrData (this=<optimized out>, tLoc=..., pPool=<optimized out>)
    at /usr/src/debug/manticore-3.4.3-200616-751f340-release-rhel7/bin/src_0/src/sphinx.cpp:2443
#3  0x0000000000728cb6 in AddDistinctKeys<CSphImplicitGroupSorter<COMPGROUP, DISTINCT, NOTIFICATIONS>::UpdateDistinct(const CSphMatch&, bool) [with COMPGROUP = MatchGeneric2_fn; bool DISTINCT = true; bool NOTIFICATIONS = false]::__lambda35>(const CSphMatch &, CSphAttrLocator &, ESphAttr, const BYTE *, <unknown type in /usr/lib/debug/usr/bin/searchd.debug, CU 0x5f0483, DIE 0x6e0fd4>) (tEntry=..., tDistinctLoc=..., eDistinctAttr=<optimized out>,
    pBlobPool=<optimized out>, fnAdd=<unknown type in /usr/lib/debug/usr/bin/searchd.debug, CU 0x5f0483, DIE 0x6e0fd4>)
    at /usr/src/debug/manticore-3.4.3-200616-751f340-release-rhel7/bin/src_0/src/sphinxsort.cpp:2307
#4  0x000000000076fae9 in CSphImplicitGroupSorter<MatchGeneric2_fn, true, false>::MoveTo (this=0x7f75d8000980, pRhs=0x7f762817aa60)
    at /usr/src/debug/manticore-3.4.3-200616-751f340-release-rhel7/bin/src_0/src/sphinxsort.cpp:4217
#5  0x000000000086a2f0 in FlattenToSorter (pResult=0x7f762817aa60, pSources=...)
    at /usr/src/debug/manticore-3.4.3-200616-751f340-release-rhel7/bin/src_0/src/sphinxrt.cpp:5942
#6  0x000000000088fcc4 in Tls_context_c::Finalize (this=this@entry=0x7f76326e4440)
    at /usr/src/debug/manticore-3.4.3-200616-751f340-release-rhel7/bin/src_0/src/sphinxrt.cpp:6064
#7  0x000000000087c97a in QueryDiskChunks (pQuery=pQuery@entry=0x7f76284378c8, pResult=pResult@entry=0x7f761c151738, tArgs=..., tGuard=...,
    dSorters=..., pProfiler=pProfiler@entry=0x0, bGotLocalDF=bGotLocalDF@entry=false, pLocalDocs=pLocalDocs@entry=0x0,
    iTotalDocs=iTotalDocs@entry=370551, szIndexName=0x27aaf30 "sku_20200317", dDiskBlobPools=..., tmMaxTimer=tmMaxTimer@entry=0)
    at /usr/src/debug/manticore-3.4.3-200616-751f340-release-rhel7/bin/src_0/src/sphinxrt.cpp:6303
#8  0x000000000087ecf5 in RtIndex_c::MultiQuery (this=this@entry=0x27ad7c0, pQuery=pQuery@entry=0x7f76284378c8, pResult=0x7f761c151738,
    iSorters=iSorters@entry=1, ppSorters=ppSorters@entry=0x7f76280e8f50, tArgs=...)
    at /usr/src/debug/manticore-3.4.3-200616-751f340-release-rhel7/bin/src_0/src/sphinxrt.cpp:6408
#9  0x0000000000880ab7 in RtIndex_c::MultiQueryEx (this=0x27ad7c0, iQueries=<optimized out>, ppQueries=<optimized out>, ppResults=0x7f7615994cf0,
    ppSorters=0x7f76280e8f50, tArgs=...) at /usr/src/debug/manticore-3.4.3-200616-751f340-release-rhel7/bin/src_0/src/sphinxrt.cpp:6817
#10 0x00000000005c4821 in SearchHandler_c::RunLocalSearches (this=this@entry=0x7f76326e8350)
    at /usr/src/debug/manticore-3.4.3-200616-751f340-release-rhel7/bin/src_0/src/searchd.cpp:5726
#11 0x00000000005d9707 in SearchHandler_c::RunSubset (this=this@entry=0x7f76326e8350, iStart=iStart@entry=0, iEnd=iEnd@entry=0)
    at /usr/src/debug/manticore-3.4.3-200616-751f340-release-rhel7/bin/src_0/src/searchd.cpp:6477
#12 0x00000000005da485 in SearchHandler_c::RunQueries (this=this@entry=0x7f76326e8350)
    at /usr/src/debug/manticore-3.4.3-200616-751f340-release-rhel7/bin/src_0/src/searchd.cpp:5113
#13 0x00000000005dab10 in HandleMysqlSelect (dRows=..., tHandler=...)
    at /usr/src/debug/manticore-3.4.3-200616-751f340-release-rhel7/bin/src_0/src/searchd.cpp:12491
#14 0x00000000005fc667 in CSphinxqlSession::Execute (this=this@entry=0x7f76326eaec0, sQuery=..., tOut=..., tThd=...)
    at /usr/src/debug/manticore-3.4.3-200616-751f340-release-rhel7/bin/src_0/src/searchd.cpp:15248

I will investigate the issue and let you know about the fix.

@glukkkk
Author

glukkkk commented Jun 17, 2020

> Pavel,
> it would be good if on EACH case when a crash is reproduced, you cite the tail of searchd.log with backtrace and all related stuff.
> It might be possible that the reason would be deciphered, and then fixed and confirmed even without full circle of running through all these exercises about running/starting system of vm, vagrant and other parts (if it actually doesn't depend on very narrow conditions of environment, and could be reproduced more widely also, on bare metal/system)

Hello!

I have done it twice; however, it did not help you find the cause of the problem. It's not hard for me to provide the log info, but I think it would be better if you reproduced the problem yourselves.

The provided VB snapshot will certainly help you. Just import it and restore the dump, and you will see the problem.

@glukkkk
Author

glukkkk commented Jun 17, 2020

> I reproduced the issue with VM image you provided.
> ...
> I will investigate the issue and inform you on fix

Good to hear that! I look forward to your reply!

@githubmanticore githubmanticore removed the waiting Waiting for the original poster (in most cases) or something else label Jun 22, 2020
@tomatolog
Contributor

Do you have crashes similar to these outside of a VM?

Do you work mostly with VMs or with metal boxes (VPS)?

Inside the VM you provided I get crashes on insert, or on select after the insert finishes, but the crashes happen at different places each time. It is still hard to get a stable reproduction and debug the root cause, as there is no single point where the crashes happen.

However, on a metal box or VPS (centos7, ubuntu) I see no crashes on either insert or select. That is why I ask about crashes in your environment.

@glukkkk
Author

glukkkk commented Jun 25, 2020

Yes, we have the crash reproduced on the VPS server running on CentOS. We built this VM image similar to our VPS environment just to provide you with it.

@glukkkk
Author

glukkkk commented Jul 21, 2020

Hello there! Any news? @tomatolog

@sanikolaev
Collaborator

Hello @glukkkk. We can reproduce the issue only intermittently, and only in the VM you provided; we cannot reproduce it on a bare-metal server running centos/ubuntu, in Docker, or in a Hetzner VPS running Centos 7. The problem seems to be too specific and is taking too much time. If it's mission critical for you, please consider using our professional services https://manticoresearch.com/services ; you can then perhaps give us access to your server so we can do the debugging right there.
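
Since the crash shows up only on some runs, a rough way to compare environments is to repeat the query and count how often the client fails. The sketch below is only an illustration: the helper names `run_query_via_mysql` and `count_failures` are hypothetical, the host and port are the ones used earlier in this thread, and any non-zero exit of the `mysql` client (including "Lost connection" and a refused connection while the daemon restarts) is counted as a failure.

```python
import subprocess
from typing import Callable

# Query from this issue; adjust the index name for your setup.
QUERY = "select count(distinct un_sku_id) From sku_20200317"


def run_query_via_mysql(query: str) -> bool:
    """Run one query through the mysql client; True means exit code 0."""
    result = subprocess.run(
        ["mysql", "-h", "127.0.0.1", "-P9306", "-e", query],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


def count_failures(run: Callable[[str], bool], query: str, attempts: int) -> int:
    """Call run(query) the given number of times and count the failed calls."""
    failures = 0
    for _ in range(attempts):
        if not run(query):
            failures += 1
    return failures


# Hypothetical usage:
#   print(count_failures(run_query_via_mysql, QUERY, attempts=20), "of 20 runs failed")
```

Passing the runner in as a parameter keeps the counting logic usable with any client, not only the `mysql` command line tool.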

@glukkkk
Author

glukkkk commented Aug 11, 2020

@tomatolog @sanikolaev

Thank you for your time and patience. We have installed the latest version 3.5.0 and the problem no longer reproduces there. It seems that some of your commits fixed it!

@glukkkk glukkkk closed this as completed Aug 11, 2020