iotune fails on i4i.32xlarge #17154

travisdowns · 2024-03-18T02:58:26Z

Version & Environment

Redpanda version: 23.3

What went wrong?

On the EC2 i4i.32xlarge instance type, running iotune in the usual way via rpk iotune fails: the iotune process exits with an EAGAIN error from io_setup.

This is most likely due to the large number of CPUs (128) combined with AIO control block (CB) calculations that are slightly or significantly off: at this instance size we try to consume more or less exactly all 1M AIO CBs, as evidenced by the log message about reducing the networking IO CBs.
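The numbers in the log under Additional information fit this explanation. Assuming one shard per CPU (128 shards), the rejected request of 1411328 slots is 11026 per shard, which would be the 10000 networking control blocks plus 1026 others. Likewise 1048576 / 128 = 8192 per shard, and 8192 minus those 1026 gives exactly the 7166 that the networking value was cut to. The Python sketch below models that adjustment. The function name and the 1026 split are assumptions inferred from the log, not Seastar's actual code. It shows that after the cut the estimate uses all 1048576 slots with no headroom, so any io_setup call the estimate does not cover would be expected to fail with EAGAIN.

```python
# Hypothetical model of the per-shard AIO budget described in this issue.
# The 1026 "other" control blocks per shard is inferred from the log
# (1411328 / 128 - 10000), not taken from Seastar's source.


def fit_networking_cbs(aio_max_nr: int, aio_nr: int, shards: int,
                       other_cbs_per_shard: int, networking_cbs: int) -> int:
    """Return the networking control blocks per shard that fit the limit.

    Approximates the behaviour seen in the log: if the full request does
    not fit, shrink the networking share so the total fills the budget.
    """
    available = aio_max_nr - aio_nr
    requested = shards * (other_cbs_per_shard + networking_cbs)
    if requested <= available:
        return networking_cbs
    per_shard_budget = available // shards
    reduced = per_shard_budget - other_cbs_per_shard
    if reduced <= 0:
        raise ValueError("not enough AIO slots even without networking")
    return reduced


shards = 128        # assumption: one shard per CPU
other = 1026        # inferred from the log, see above
net = fit_networking_cbs(aio_max_nr=1048576, aio_nr=0, shards=shards,
                         other_cbs_per_shard=other, networking_cbs=10000)
total = shards * (other + net)
print(net, total, 1048576 - total)  # 7166 1048576 0
```

With a larger limit, for example 2000000, the full request of 1411328 fits and the networking value is returned unchanged.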

What should have happened instead?

iotune completes successfully.

How to reproduce the issue?

Run rpk iotune on an i4i.32xlarge instance with fs.aio-max-nr set to 1M (1048576).
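Before running rpk iotune, the limit can be set as root with sysctl -w fs.aio-max-nr=1048576. The Python sketch below is a hypothetical pre-flight check: it reads /proc/sys/fs/aio-max-nr (the file named in the seastar warning) and /proc/sys/fs/aio-nr (the kernel's current system-wide count) and computes an even per-shard share. The helper names are illustrative, and splitting evenly across shards is an assumption, not a description of how Seastar allocates slots.

```python
# Hypothetical pre-flight check: read the kernel's system-wide AIO limit
# and current usage from procfs, then compute an even per-shard share.
from pathlib import Path


def read_aio_limits(proc_fs: str = "/proc/sys/fs") -> tuple[int, int]:
    """Return (aio_max_nr, aio_nr) read from the given procfs directory."""
    base = Path(proc_fs)
    aio_max_nr = int((base / "aio-max-nr").read_text().strip())
    aio_nr = int((base / "aio-nr").read_text().strip())
    return aio_max_nr, aio_nr


def per_shard_headroom(aio_max_nr: int, aio_nr: int, shards: int) -> int:
    """Slots left for each shard if the free slots were split evenly."""
    return (aio_max_nr - aio_nr) // shards
```

On this instance, with a 1M limit, no other AIO users and 128 shards, per_shard_headroom(1048576, 0, 128) gives 8192 slots per shard.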

Additional information

Log:

[client 0:0] Overriding evaluation directories with: ["/mnt/vectorized/0ada61eec5354e22856072fe0cbe7cff"]
[client 0:0] Starting iotune...
[client 0:0] 02:46:43.693  DEBUG  Running 'iotune-redpanda' with '[`--evaluation-directory` `/mnt/vectorized/0ada61eec5354e22856072fe0cbe7cff` `--format` `seastar` `--properties-file` `/mnt/vectorized/0ada61eec5354e22856072fe0cbe7cff/io-config.yaml` `--duration` `600`]'
[client 0:0] 02:46:43.693  DEBUG  Running command 'iotune-redpanda' with arguments '[--evaluation-directory /mnt/vectorized/0ada61eec5354e22856072fe0cbe7cff --format seastar --properties-file /mnt/vectorized/0ada61eec5354e22856072fe0cbe7cff/io-config.yaml --duration 600]'
[client 0:0] error during iotune execution: err=exit status 1, stderr=WARN  2024-03-18 02:46:43,896 seastar - Requested AIO slots too large, please increase request capacity in /proc/sys/fs/aio-max-nr. configured:1048576 available:1048576 requested:1411328
[client 0:0] WARN  2024-03-18 02:46:43,896 seastar - max-networking-io-control-blocks adjusted from 10000 to 7166, since AIO slots are unavailable
[client 0:0] INFO  2024-03-18 02:46:43,896 seastar - Reactor backend: linux-aio
[client 0:0] INFO  2024-03-18 02:46:44,355 [shard   0:n/a ] seastar - Perf-based stall detector creation failed (EACCESS), try setting /proc/sys/kernel/perf_event_paranoid to 1 or less to enable kernel backtraces: falling back to posix timer.
[client 0:0] INFO  2024-03-18 02:46:44,367 [shard   0:n/a ] cpu_profiler - Perf-based cpu profiler creation failed (EACCESS), try setting /proc/sys/kernel/perf_event_paranoid to 1 or less to enable kernel backtraces: falling back to posix timer.
[client 0:0] INFO  2024-03-18 02:46:44,378 [shard   0:main] seastar - Created fair group io-queue-0 for 64 queues, capacity rate 2147483:2147483, limit 12582912, rate 16777216 (factor 1), threshold 2000, per tick grab 196608
[client 0:0] INFO  2024-03-18 02:46:44,378 [shard   0:main] seastar - IO queue uses 0.75ms latency goal for device 0
[client 0:0] INFO  2024-03-18 02:46:44,378 [shard   0:main] seastar - Created io group dev(0), length limit 4194304:4194304, rate 2147483647:2147483647
[client 0:0] INFO  2024-03-18 02:46:44,378 [shard   0:main] seastar - Created io queue dev(0) capacities: 512:2000:2000 1024:3000:3000 2048:5000:5000 4096:9000:9000 8192:17000:17000 16384:33000:33000 32768:65000:65000 65536:129000:129000 131072:257000:257000
[client 0:0] INFO  2024-03-18 02:46:44,837 [shard 109:main] seastar - Created fair group io-queue-0 for 64 queues, capacity rate 2147483:2147483, limit 12582912, rate 16777216 (factor 1), threshold 2000, per tick grab 196608
[client 0:0] INFO  2024-03-18 02:46:44,837 [shard 109:main] seastar - IO queue uses 0.75ms latency goal for device 0
[client 0:0] INFO  2024-03-18 02:46:44,837 [shard 109:main] seastar - Created io group dev(0), length limit 4194304:4194304, rate 2147483647:2147483647
[client 0:0] ERROR 2024-03-18 02:46:44,956 [shard   0:main] seastar - Exiting on unhandled exception: std::__1::system_error (error system:11, io_setup: Resource temporarily unavailable)
[client 0:0]

JIRA Link: CORE-1885


github-actions · 2024-09-24T06:39:05Z

This issue hasn't seen activity in 3 months. If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in two weeks.

github-actions · 2025-01-17T06:37:45Z

This issue hasn't seen activity in 3 months. If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in two weeks.

travisdowns added kind/bug Something isn't working performance labels Mar 18, 2024

travisdowns self-assigned this Mar 18, 2024

github-actions bot added the stale label Sep 24, 2024

travisdowns removed the stale label Oct 1, 2024

github-actions bot added the stale label Jan 17, 2025

travisdowns removed the stale label Jan 18, 2025
