Searchd 6.2.12 thread hangs when executing a SELECT query #1774

kzakhark · 2024-01-30T09:36:39Z

Describe the bug

A searchd 6.2.12 thread hangs indefinitely when executing a SELECT query and consumes 100% of a CPU core. The query cannot be killed with the kill CLI command; searchd itself must be killed with system tools to clear the stuck thread. Index checks run fine and show no issues.

To Reproduce
Steps to reproduce the behavior:

1. Connect to searchd with a MySQL client.
2. Execute a specific query against our dataset:

SELECT id,length(concat(street,city)) as length,weight() FROM `coverage_autocomplete` WHERE MATCH('@(city,street) "karlstad"') AND number_and_letter='' AND country_code='SE' ORDER BY weight() DESC,length ASC LIMIT 1 OFFSET 0;

3. The query hangs, and the thread executing it consumes 100% of one CPU core.

Expected behavior
The query is expected to complete and return results in finite time. We have another system with identical OS and software versions and a similarly sized index (with somewhat different data), where the same query returns results very quickly without any issues.

Describe the environment:

Manticore Search version:

Server version: 6.2.12 dc5144d35@230822 (columnar 2.2.4 5aec342@230822) (secondary 2.2.4 5aec342@230822) git branch manticore-6.2.12...origin/manticore-6.2.12

OS version (uname -a if on a Unix-like system):

Linux systemname 3.10.0-1160.45.1.el7.x86_64 #1 SMP Wed Oct 13 17:20:51 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

CentOS Linux release 7.9.2009 (Core)
Derived from Red Hat Enterprise Linux 7.9 (Source)
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

Messages from log files:
There's nothing relevant in searchd.log and query.log.

Additional context
I'll provide additional information separately, as some files are too large to upload to GitHub.
threads-output.txt
lsof-output.txt
gdb-output.txt

kzakhark · 2024-01-30T09:47:16Z

I've uploaded additional data to s3.manticoresearch.com, manticore/write-only/issue-1774:

core.26109.gz - searchd process core dump;
coverage_autocomplete.tar.gz - complete tarball of the data against which we run the query;
Manticore.sql.gz - SQL dump of the above data.

kzakhark · 2024-01-30T12:01:46Z

Interestingly, after the first SELECT thread hangs, the same query issued repeatedly from a new connection works correctly:

| 32000 | work_0     | mysql | query | 127.0.0.1:43210 |   2986 | 1491.909407 | 31s       |    117316 | 25m           | No (working) | Mini Mini Mini Query Conn  | 5 ch 0: 5 ch 2: api-search query="@(city,street) "karlstad"" comment="" index="coverage_autocomplete" SELECT id,length(concat(street,city)) as length,weight() FROM `coverage_autocomplete` WHERE MATCH('@(city,street) "karlstad"') AND number_and_letter='' AND country_code='SE' ORDER BY weight() DESC,length ASC LIMIT 1 OFFSET 0 |
...
MySQL [(none)]> SELECT id,length(concat(street,city)) as length,weight() FROM `coverage_autocomplete` WHERE MATCH('@(city,street) "karlstad"') AND number_and_letter='' AND country_code='SE' ORDER BY weight() DESC,length ASC LIMIT 1 OFFSET 0;
+---------+--------+----------+
| id      | length | weight() |
+---------+--------+----------+
| 2147906 |     24 |     1625 |
+---------+--------+----------+
1 row in set (0.002 sec)

sanikolaev · 2024-01-30T12:14:49Z

Would you like to try to reproduce it in the latest dev version (https://mnt.cr/nightly)?

kzakhark · 2024-01-30T12:38:08Z

We only face the issue on a production system, where we'd rather not use nightly builds unless absolutely necessary. We're unable to reproduce the issue on a test system, although the test system doesn't have the same load as the production environment.

kzakhark · 2024-02-05T05:00:25Z

An update: unfortunately, we were unable to reproduce the issue in our test systems, including an exact clone of the affected system. After we rebooted the affected system, the issue disappeared, and we haven't been able to reproduce it over the last few days. It is unclear what caused specific queries to get stuck, as the affected system didn't show any abnormal readings or other problems, but it's no longer happening after the reboot.

sanikolaev · 2024-02-05T08:10:05Z

BTW, I tried to reproduce this issue on our side on 6.2.12 by sending the SELECT query mentioned above with some concurrency, alongside a few concurrent insert/update threads, and it didn't hang or crash.

kzakhark · 2024-02-05T08:13:00Z

Same here: we were unable to reproduce the issue even at a rather high request rate with high concurrency. We're rather puzzled, as it's unclear what the culprit could have been.
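
For anyone attempting a similar reproduction, here is a minimal sketch of a concurrent stress harness of the kind described in the last two comments. It is not the script either side actually used. The names `find_stuck_queries` and `run_query` are hypothetical; `run_query` is assumed to open its own MySQL-protocol connection to searchd and execute the SELECT from the report (the client library is left open, since the thread does not name one). The harness only reports calls that are still running at the deadline; it cannot cancel them, which matches the observation that a stuck query could not be killed.

```python
import concurrent.futures as cf


def find_stuck_queries(run_query, n_requests=100, workers=8, deadline_s=5.0):
    """Call run_query(i) for i in range(n_requests) on a thread pool.

    Returns the sorted indexes of calls that were still executing when
    the overall deadline expired. run_query is a caller-supplied
    (hypothetical) function that sends one query to the server.
    """
    pool = cf.ThreadPoolExecutor(max_workers=workers)
    futures = {pool.submit(run_query, i): i for i in range(n_requests)}

    # Wait for everything to finish, or for the overall deadline to pass.
    _done, not_done = cf.wait(futures, timeout=deadline_s)

    # Only calls that have started but not finished count as stuck;
    # calls still waiting in the queue never reached the server.
    stuck = sorted(futures[f] for f in not_done if f.running())

    # Drop queued work and return without blocking on the stuck threads
    # (cancel_futures requires Python 3.9 or later).
    pool.shutdown(wait=False, cancel_futures=True)
    return stuck
```

The deadline applies to the whole batch rather than to each call, so it should be set well above the expected total run time; in this report a healthy query returned in about 0.002 seconds.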

sanikolaev added bug rel::upcoming Upcoming release labels Jan 30, 2024

kzakhark closed this as completed Feb 5, 2024

sanikolaev removed the rel::upcoming Upcoming release label Feb 5, 2024
