Improve indexing performance: batch stat()s with io_uring #2821

Open
wants to merge 1 commit into
base: master

dcolascione
Contributor

Add support for using io_uring to batch stat() calls when scanning maildir directories. This approach reduces the number of individual syscalls by processing stats in batches of up to 16384 files at a time; it also allows the kernel to continue doing kernel-internal stat()s while we do indexing work.
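The batching idea can be illustrated without io_uring at all. The sketch below is not the PR's code: it walks a list of paths in batches of at most 16384 and uses plain POSIX `stat()` as a stand-in for the submission step, handing each completed batch to a callback so that indexing work could proceed per batch. The names `StatResult`, `kMaxBatch`, and `stat_in_batches` are hypothetical and chosen only for illustration.

```cpp
#include <sys/stat.h>

#include <algorithm>
#include <cerrno>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result record: one entry per path, in input order.
struct StatResult {
    std::string path;
    int         err;  // 0 on success, otherwise errno from stat()
    struct stat st;   // meaningful only when err == 0
};

// Batch limit quoted in the PR description.
constexpr std::size_t kMaxBatch = 16384;

// Stat every path, working through the list in batches of at most
// max_batch entries. on_batch is invoked once per completed batch.
// Returns the number of batches processed.
//
// Assumption: plain stat() stands in for the io_uring submission here;
// a real io_uring version would queue one request per path and submit
// the whole batch with a single syscall.
template <typename OnBatch>
std::size_t stat_in_batches(const std::vector<std::string>& paths,
                            OnBatch on_batch,
                            std::size_t max_batch = kMaxBatch)
{
    if (max_batch == 0)
        return 0;  // nothing sensible to do with an empty batch size

    std::size_t n_batches = 0;
    std::vector<StatResult> batch;
    for (std::size_t start = 0; start < paths.size(); start += max_batch) {
        const std::size_t end = std::min(paths.size(), start + max_batch);
        batch.clear();
        batch.reserve(end - start);
        for (std::size_t i = start; i < end; ++i) {
            StatResult r{paths[i], 0, {}};
            if (::stat(paths[i].c_str(), &r.st) != 0)
                r.err = errno;  // record the failure, keep going
            batch.push_back(std::move(r));
        }
        on_batch(batch);
        ++n_batches;
    }
    return n_batches;
}
```

With this shape, only the inner loop would need to change to swap in a different backend; the batch boundaries and the per-batch callback stay the same.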

Performance impact is moderate but noticeable on a real-world maildir with ~490k messages:

Without io_uring: 17.5s real time (0.75s user, 5.6s sys)
With io_uring: 15.4s real time (0.74s user, 8.4s sys)

The higher sys time with io_uring reflects the batch processing happening in kernel space rather than repeated userspace->kernel transitions.
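For scale, the relative changes implied by the two timings above work out as follows (simple arithmetic on the quoted figures, which come from the single comparison reported here):

```math
\frac{17.5 - 15.4}{17.5} = \frac{2.1}{17.5} = 0.12
\qquad\qquad
\frac{8.4}{5.6} = 1.5
```

That is, wall-clock time drops by about 12% while sys time rises by a factor of 1.5; user time is essentially unchanged (0.75s vs 0.74s).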

Only enable io_uring for maildir directories since:

  1. They're the only directories likely to be large enough to benefit
  2. This ensures the io_uring instance isn't used concurrently

The feature can be disabled at runtime via MU_DISABLE_IO_URING=1 or at build time via -Diouring=disabled. Requires liburing >= 2.3.
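As a rough illustration of the runtime switch, a check along these lines would be enough. This is a sketch, not the PR's implementation: the function and enum names are hypothetical, and it assumes that only the literal value `1` disables the feature, since that is the only value the description mentions.

```cpp
#include <cstdlib>
#include <cstring>

// Returns true when MU_DISABLE_IO_URING asks for the fallback path.
// Assumption: only the literal value "1" counts as "disabled"; the PR
// text does not say how other values are treated.
inline bool io_uring_disabled_by_env()
{
    const char* val = std::getenv("MU_DISABLE_IO_URING");
    return val != nullptr && std::strcmp(val, "1") == 0;
}

// Hypothetical backend selector combining the build-time and runtime
// switches described above.
enum class StatBackend { IoUring, Plain };

inline StatBackend choose_stat_backend(bool built_with_io_uring)
{
    if (!built_with_io_uring || io_uring_disabled_by_env())
        return StatBackend::Plain;
    return StatBackend::IoUring;
}
```

Here `built_with_io_uring` stands for whatever the build system defines when `-Diouring=disabled` is not given; how that is actually wired up is not shown in this conversation.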

@djcb
Owner

djcb commented Feb 23, 2025

Looks interesting. It will take a bit longer to review this (I'm not too familiar with io_uring); I don't think I'll get to it before 1.12.9 (which should be out in the coming week).

The performance improvements are modest, so we'll have to weigh that against the added complexity. Anyway, thanks for doing this!
