
Setting max_threads_per_query = 12 leads to 99.9% CPU load for two threads on a 16-core box #1631

Closed
starinacool opened this issue Nov 26, 2023 · 8 comments
Labels
bug rel::6.3.0 Released in 6.3.0

Comments

@starinacool

Describe the bug
2 of 16 worker threads go to 99.9% CPU when I change max_threads_per_query from 10 to 12 on a 16-core box. Even after removing all workload from the server, these two threads keep consuming 99.9% CPU.
The server cannot be stopped with systemctl stop manticore; only kill -9 helps.

To Reproduce
Steps to reproduce the behavior:

  1. Set up a 16-core, 32 GB box with an SSD and an RT index
  2. Load some data
  3. Change max_threads_per_query from 10 to 12 and restart
  4. Add workload

Expected behavior
All worker threads work normally.

Describe the environment:
Manticore 6.2.12 dc5144d@230822 (columnar 2.2.4 5aec342@230822) (secondary 2.2.4 5aec342@230822)

Messages from log files:
[Sun Nov 26 06:31:34.042 2023] [634140] caught SIGTERM, shutting down
[Sun Nov 26 06:31:39.550 2023] [634140] WARNING: still 2 alive tasks during shutdown, after 5.508 sec
[Sun Nov 26 06:31:39.701 2023] [634153] rt: table listing_finished: ramchunk saved in 0.150 sec

Additional context
Config:
optimize_cutoff = 8
max_threads_per_query = 10
access_doclists=mmap
access_hitlists=mmap
network_timeout = 20
client_timeout = 300
seamless_rotate = 1
unlink_old = 1
max_packet_size = 64M
max_filter_values = 65535
listen_backlog = 255
max_batch_queries = 32
subtree_docs_cache = 16M
subtree_hits_cache = 32M
binlog_flush = 2
binlog_max_log_size = 128M
expansion_limit = 100
query_log_format = sphinxql
collation_server = utf8_general_ci
collation_libc_locale = ru_RU.UTF-8
query_log_min_msec = 200
predicted_time_costs = doc=64, hit=48, skip=2048, match=64

@sanikolaev
Collaborator

Even after removing all workload from the server, these two threads keep consuming 99.9% CPU.

Please show the following at this moment:

  • top
  • vmstat 5 during a minute
  • show threads option format=all
  • select * from @@system.sessions
  • show status
  • searchd log
  • query log
  • show table <name> status of your table(s)
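For reference, the SQL-level items above can be collected in a single session over the MySQL protocol (assuming the default SphinxQL listener on port 9306); a minimal sketch, using the table name listing_finished seen in the log above as the example table:

SHOW THREADS OPTION format=all;
SELECT * FROM @@system.sessions;
SHOW STATUS;
SHOW TABLE listing_finished STATUS;

top and vmstat 5 are run from the shell on the same box while the two threads are spinning.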

@sanikolaev sanikolaev added the waiting Waiting for the original poster (in most cases) or something else label Nov 26, 2023
@Korkman

Korkman commented Jan 10, 2024

I have observed a similar failure: workers would go to 100% CPU and the connection to the client would break (the client receives no response).

They were processing Sphinx-protocol requests querying a text field "all_childs", which can contain the words "child_1", "child_2", ... up to "child_18". These were the hanging queries I had to kill -9:

@(all_childs) child_4 | child_5
@(all_childs) child_4 | child_5 @(all_childs) child_4 | child_5 (yes, duplicate expression)
@(all_childs) child_1 | child_2 | child_3 | child_16 | child_17 | child_18
@(all_childs) child_1 | child_2 | child_3 | child_16 | child_17 | child_18
@(all_childs) child_1 | child_2 | child_3 | child_16 | child_17 | child_18

RT indices were present, but the queries ran against a non-RT index.
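For reference, the equivalent of the first hanging query issued over SphinxQL would look roughly like this (a sketch only; the table name my_index is a placeholder, since the original queries arrived over the Sphinx API protocol):

SELECT id FROM my_index WHERE MATCH('@(all_childs) child_4 | child_5');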

strace started from htop showed no syscall activity on the crashed worker processes.

These unspecific queries match a good portion of 230k documents. Other, more specific queries did not crash.

After reading this issue, I set max_threads_per_query = 4 to lower my threads per query. No failing workers so far.
UPDATE: this setting did not fix the issue for me.

Hardware: AMD Ryzen 9 5950X 16-Core Processor, 128 GB RAM
OS: Debian Bookworm within a KVM VM, 16 vcores assigned (hyperthreading is enabled, so this is 16 of 32 possible vcores) and 16 GB RAM
Config:

max_connections = 100
expansion_limit = 500
seamless_rotate = 1
collation_libc_locale = de_DE.UTF-8
network_timeout = 5m
qcache_max_bytes = 0

searchd: Manticore 6.2.12 dc5144d35@230822 (columnar 2.2.4 5aec342@230822) (secondary 2.2.4 5aec342@230822)


@sanikolaev
Collaborator

@Korkman if you can stably reproduce it by running one of the @(all_childs) queries, could you share your table files and your config with us by sending them to our write-only S3 storage - https://manual.manticoresearch.com/Reporting_bugs#Uploading-your-data ? If we can reproduce this issue on our side, we'll be able to fix it.

@tomatolog
Contributor

Could you try the head of the dev version? It includes fixes to the CPU limiting during full-text queries.

@Korkman

Korkman commented Jan 10, 2024

@tomatolog @sanikolaev 6.2.13 a2af06ca3@240110 dev (columnar 2.2.5 1d1e432@231204) (secondary 2.2.5 1d1e432@231204) (knn 2.2.5 1d1e432@231204) seems to work fine.

@tomatolog Would a workaround be possible in 6.2.12 or can this only be fixed with the release of 6.2.13?

@tomatolog
Contributor

You could set max_threads_per_query for the full-text queries with multiple OR terms to keep CPU under control on 6.2.12, or use 6.2.13, since the dev version will soon be released into the main repository.
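A minimal sketch of that per-query workaround on 6.2.12, assuming this refers to Manticore's per-query threads option (the table name my_index is a placeholder and the limit value is illustrative):

SELECT id FROM my_index
WHERE MATCH('@(all_childs) child_1 | child_2 | child_3 | child_16 | child_17 | child_18')
OPTION threads=1;

Alternatively, the instance-wide max_threads_per_query directive in the searchd config section can be lowered, as tried above with max_threads_per_query = 4.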

@sanikolaev
Collaborator

seems to work fine.

Thanks. I'm closing this issue then.

@starinacool feel free to reopen in case it doesn't work for you in the dev version or the upcoming release.

@sanikolaev sanikolaev added rel::upcoming Upcoming release and removed waiting Waiting for the original poster (in most cases) or something else labels Jan 12, 2024
@sanikolaev sanikolaev added the bug label Feb 7, 2024
@sanikolaev sanikolaev added rel::6.3.0 Released in 6.3.0 and removed rel::upcoming Upcoming release labels May 23, 2024