Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TLS: use fewer inotify instances and/or be more resilient to them not working #2513

Open
nyh opened this issue Oct 21, 2024 · 4 comments
Open
Assignees
Labels

Comments

@nyh
Copy link
Contributor

nyh commented Oct 21, 2024

Recently some ScyllaDB test runs failed (scylladb/scylladb#21199) with the error:

std::system_error (error system:24, could not create inotify instance: Too many open files)

It turns out that Seastar's TLS implementation internally uses inotify to automatically recognize when the certificate files have changed. The error message is misleading - for inotify_init() an EMFILE errno (which gets printed as "Too many open files") does not refer the limit on number of open files, but to the separate per-user limit on the number of inotify instances. This number is configured in /proc/sys/fs/inotify/max_user_instances and is often fairly low - e.g., on my Fedora 40 it defaults to just 128. It appears that Seastar creates an inotify instance for each shard, and the ScyllaDB test framework ran many tests in parallel, and the result was running out of inotify instances.

The ScyllaDB testers easily solved this problem by increasing the /proc/sys/fs/inotify/max_user_instances, but I'm worried that this problem can hit other Seastar users as well, who won't be aware that Seastar TLS even uses inotify, and that /proc/sys/fs/inotify/max_user_instances is so low. I want to propose that we consider two options, perhaps even both:

  1. Use fewer inotify instances - perhaps even just once per per process instead of one per shard (e.g., have just shard 0 be responsible for the very rare inotify work). On modern many-shard machines, this will make a big difference.
  2. Be forgiving towards failing inotify_init() in TLS: Make it a logged warning (don't use the phrase "Too many open files" in the logged message) - not an exception. When inotify_init() failed, either the certificate reloading would not work at all - or if we consider it an important feature, we can implement something similar without inotify (e.g., once a minute check if the file changed).

CC @elcallio

@bhalevy
Copy link
Member

bhalevy commented Dec 8, 2024

@regevran please assign

@elcallio elcallio self-assigned this Dec 8, 2024
@bhalevy
Copy link
Member

bhalevy commented Dec 8, 2024

Is it possible the inotify instances are leaking somehow when running the tests via test.py?

@elcallio
Copy link
Contributor

elcallio commented Dec 8, 2024

No. The problem is that we use a few per shard (one for each file referenced - typically 3-4 per TLS config, say up to 3-4 TLS configs in scylla.yaml, run 8 shards and then run tests in parallel. This adds up. With ludicrously small default settings we hit the ceiling.

I've done some work to maybe make the TLS usage in scylla use a shard-0-only solution for TLS inotify, but I got sidetracked. In any case, however you design it, it will be slightly hackish...

elcallio pushed a commit to elcallio/seastar that referenced this issue Dec 11, 2024
…ose rebuild

Refs scylladb#2513

Adds a more advanced callback type, which takes both actual reloading builder as
argument (into which new files are loaded), and allows proper future-wait in
callback.

Exposes certificates rebuilding (via builder) to allow "manual", quick, reload of certs.

The point of these seemingly small changes is to allow client code to, for example,
limit actual reloadable_certs (and by extension inotify watches) to shard 0 (or whatever),
and simply use this as a trigger for manual reload of other shards.

Note: we cannot do any magical "shard-0-only" file monitor in the objects themselves,
not without making the certs/builders sharded or similarly stored (which contradict the
general design of light objects, copyable between shards etc). But with this, a calling
app in which certs _are_ held in sharded manners, we can fairly easily delegate non-shard-0
ops in a way that fits that topology.

Note: a builder can be _called_ from any shard (as long as it is safe in its originating
shard), but the objects returned are only valid on the current shard.

Similarly, it is safe to share the reloading builder across shards _in the callback_,
since rebuilding is blocked for the duration of the call.
elcallio pushed a commit to elcallio/seastar that referenced this issue Dec 16, 2024
…ose rebuild

Refs scylladb#2513

Adds a more advanced callback type, which takes both actual reloading builder as
argument (into which new files are loaded), and allows proper future-wait in
callback.

Exposes certificates rebuilding (via builder) to allow "manual", quick, reload of certs.

The point of these seemingly small changes is to allow client code to, for example,
limit actual reloadable_certs (and by extension inotify watches) to shard 0 (or whatever),
and simply use this as a trigger for manual reload of other shards.

Note: we cannot do any magical "shard-0-only" file monitor in the objects themselves,
not without making the certs/builders sharded or similarly stored (which contradict the
general design of light objects, copyable between shards etc). But with this, a calling
app in which certs _are_ held in sharded manners, we can fairly easily delegate non-shard-0
ops in a way that fits that topology.

Note: a builder can be _called_ from any shard (as long as it is safe in its originating
shard), but the objects returned are only valid on the current shard.

Similarly, it is safe to share the reloading builder across shards _in the callback_,
since rebuilding is blocked for the duration of the call.
xemul pushed a commit that referenced this issue Dec 24, 2024
…ose rebuild

Refs #2513

Adds a more advanced callback type, which takes both actual reloading builder as
argument (into which new files are loaded), and allows proper future-wait in
callback.

Exposes certificates rebuilding (via builder) to allow "manual", quick, reload of certs.

The point of these seemingly small changes is to allow client code to, for example,
limit actual reloadable_certs (and by extension inotify watches) to shard 0 (or whatever),
and simply use this as a trigger for manual reload of other shards.

Note: we cannot do any magical "shard-0-only" file monitor in the objects themselves,
not without making the certs/builders sharded or similarly stored (which contradict the
general design of light objects, copyable between shards etc). But with this, a calling
app in which certs _are_ held in sharded manners, we can fairly easily delegate non-shard-0
ops in a way that fits that topology.

Note: a builder can be _called_ from any shard (as long as it is safe in its originating
shard), but the objects returned are only valid on the current shard.

Similarly, it is safe to share the reloading builder across shards _in the callback_,
since rebuilding is blocked for the duration of the call.

Closes #2573
@mykaul
Copy link

mykaul commented Dec 30, 2024

6f39b89 is merged in Seastar, what else do we need?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants