iotune fails on i4i.32xlarge #17154

Open
travisdowns opened this issue Mar 18, 2024 · 2 comments

travisdowns commented Mar 18, 2024

Version & Environment

Redpanda version: 23.3

What went wrong?

On the EC2 i4i.32xlarge instance type, when run in the usual way via rpk iotune, the iotune process fails with an EAGAIN error in io_setup.

This is most likely due to the large number of CPUs (128) combined with aio cb calculations that are slightly (or very) off: at this instance size we try to consume more or less exactly all 1m aio cbs, as evidenced by the log message about reducing networking io cbs.
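The figures in the log below can be reproduced with some simple arithmetic. This is a rough sketch, not Seastar's actual code: the networking default of 10000 comes from the log, while the per-shard storage (1024) and preemption (2) reservations are assumptions chosen because they reproduce the logged `requested` and adjusted values exactly. The function name `aio_budget` is hypothetical.

```python
# Assumed per-shard AIO control-block reservations. Only the networking
# default (10000) is stated in the log; the other two are inferred because
# they reproduce the logged totals exactly.
STORAGE_IOCBS = 1024   # assumption
PREEMPT_IOCBS = 2      # assumption
NETWORK_IOCBS = 10000  # from the log: "adjusted from 10000 to 7166"


def aio_budget(shards, available, network_iocbs=NETWORK_IOCBS):
    """Return (requested, adjusted_network_iocbs, total_after_adjustment)."""
    other = STORAGE_IOCBS + PREEMPT_IOCBS
    requested = shards * (other + network_iocbs)
    if requested <= available:
        # Everything fits, so no adjustment is needed.
        return requested, network_iocbs, requested
    # Shrink the networking share so the per-shard total fits what is available.
    adjusted = available // shards - other
    return requested, adjusted, shards * (other + adjusted)


requested, adjusted, total = aio_budget(shards=128, available=1048576)
print(requested, adjusted, total)  # 1411328 7166 1048576
```

Under these assumptions the adjusted total equals the configured limit exactly, leaving no headroom, which is consistent with the hypothesis above. It does not by itself show why io_setup still returns EAGAIN.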

What should have happened instead?

iotune completes successfully.

How to reproduce the issue?

  1. Run rpk iotune on an i4i.32xlarge instance with aio-max-nr set to 1m (1048576).
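Before reproducing, it may help to confirm how many AIO slots are actually free on the host. The following is a minimal sketch, assuming a Linux host that exposes /proc/sys/fs/aio-max-nr and /proc/sys/fs/aio-nr; the helper name `aio_headroom` is hypothetical and is not part of rpk or Seastar.

```python
from pathlib import Path

AIO_MAX_NR = "/proc/sys/fs/aio-max-nr"  # system-wide limit on AIO slots
AIO_NR = "/proc/sys/fs/aio-nr"          # slots currently reserved


def aio_headroom(max_nr_path=AIO_MAX_NR, nr_path=AIO_NR):
    """Return (aio_max_nr, aio_nr, free_slots) read from the given files."""
    max_nr = int(Path(max_nr_path).read_text().split()[0])
    in_use = int(Path(nr_path).read_text().split()[0])
    return max_nr, in_use, max_nr - in_use


if __name__ == "__main__":
    # Only query the live system when both files are present.
    if Path(AIO_MAX_NR).exists() and Path(AIO_NR).exists():
        max_nr, in_use, free = aio_headroom()
        print(f"aio-max-nr={max_nr} aio-nr={in_use} free={free}")
```

To match the conditions in this report, the first value should read 1048576 and the second 0 before iotune starts, as the log reports configured:1048576 and available:1048576.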

Additional information

Log:

[client 0:0] Overriding evaluation directories with: ["/mnt/vectorized/0ada61eec5354e22856072fe0cbe7cff"]
[client 0:0] Starting iotune...
[client 0:0] 02:46:43.693  DEBUG  Running 'iotune-redpanda' with '[`--evaluation-directory` `/mnt/vectorized/0ada61eec5354e22856072fe0cbe7cff` `--format` `seastar` `--properties-file` `/mnt/vectorized/0ada61eec5354e22856072fe0cbe7cff/io-config.yaml` `--duration` `600`]'
[client 0:0] 02:46:43.693  DEBUG  Running command 'iotune-redpanda' with arguments '[--evaluation-directory /mnt/vectorized/0ada61eec5354e22856072fe0cbe7cff --format seastar --properties-file /mnt/vectorized/0ada61eec5354e22856072fe0cbe7cff/io-config.yaml --duration 600]'
[client 0:0] error during iotune execution: err=exit status 1, stderr=WARN  2024-03-18 02:46:43,896 seastar - Requested AIO slots too large, please increase request capacity in /proc/sys/fs/aio-max-nr. configured:1048576 available:1048576 requested:1411328
[client 0:0] WARN  2024-03-18 02:46:43,896 seastar - max-networking-io-control-blocks adjusted from 10000 to 7166, since AIO slots are unavailable
[client 0:0] INFO  2024-03-18 02:46:43,896 seastar - Reactor backend: linux-aio
[client 0:0] INFO  2024-03-18 02:46:44,355 [shard   0:n/a ] seastar - Perf-based stall detector creation failed (EACCESS), try setting /proc/sys/kernel/perf_event_paranoid to 1 or less to enable kernel backtraces: falling back to posix timer.
[client 0:0] INFO  2024-03-18 02:46:44,367 [shard   0:n/a ] cpu_profiler - Perf-based cpu profiler creation failed (EACCESS), try setting /proc/sys/kernel/perf_event_paranoid to 1 or less to enable kernel backtraces: falling back to posix timer.
[client 0:0] INFO  2024-03-18 02:46:44,378 [shard   0:main] seastar - Created fair group io-queue-0 for 64 queues, capacity rate 2147483:2147483, limit 12582912, rate 16777216 (factor 1), threshold 2000, per tick grab 196608
[client 0:0] INFO  2024-03-18 02:46:44,378 [shard   0:main] seastar - IO queue uses 0.75ms latency goal for device 0
[client 0:0] INFO  2024-03-18 02:46:44,378 [shard   0:main] seastar - Created io group dev(0), length limit 4194304:4194304, rate 2147483647:2147483647
[client 0:0] INFO  2024-03-18 02:46:44,378 [shard   0:main] seastar - Created io queue dev(0) capacities: 512:2000:2000 1024:3000:3000 2048:5000:5000 4096:9000:9000 8192:17000:17000 16384:33000:33000 32768:65000:65000 65536:129000:129000 131072:257000:257000
[client 0:0] INFO  2024-03-18 02:46:44,837 [shard 109:main] seastar - Created fair group io-queue-0 for 64 queues, capacity rate 2147483:2147483, limit 12582912, rate 16777216 (factor 1), threshold 2000, per tick grab 196608
[client 0:0] INFO  2024-03-18 02:46:44,837 [shard 109:main] seastar - IO queue uses 0.75ms latency goal for device 0
[client 0:0] INFO  2024-03-18 02:46:44,837 [shard 109:main] seastar - Created io group dev(0), length limit 4194304:4194304, rate 2147483647:2147483647
[client 0:0] ERROR 2024-03-18 02:46:44,956 [shard   0:main] seastar - Exiting on unhandled exception: std::__1::system_error (error system:11, io_setup: Resource temporarily unavailable)
[client 0:0] 

JIRA Link: CORE-1885

travisdowns added the kind/bug and performance labels Mar 18, 2024
travisdowns self-assigned this Mar 18, 2024

This issue hasn't seen activity in 3 months. If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in two weeks.
