
How to avoid max open files limit #391

Closed
DirkFries opened this issue Mar 13, 2020 · 9 comments
Labels: bug (Something isn't working) · caching (affects the caching layer)

@DirkFries

Hi @ALL,

Tonight Trickster ran into its open files limit, which was configured to 32k files.

2020-03-13T01:16:10.910147+01:00 host trickster-1.0.1.linux-amd64[1416]: 2020/03/13 01:16:10 http: Accept error: accept tcp [::]:9092: accept4: too many open files; retrying in 1s

Of course, one way to fix this is to raise the open files limit, but I am wondering which Trickster configuration option affects the number of files the daemon opens.

Does anyone have an idea?

Thanks a lot !

Bye, Dirk

@jranson (Member) commented Mar 13, 2020

Thanks for the report! Can you let us know what type of cache you are using in your Trickster config?

@DirkFries (Author)

I am using

cache_type = 'memory'

@jwshieldsGetty

Hello,

We are using Trickster at my organization, running version 1.0.1. We've configured our instance to use an in-memory cache and have given it a limit of 64k files.

Lately, we've been running into this issue, where our instance soft-crashes and floods our logs with "too many open files" errors.

Do you have any idea where this may be coming from, or what can be done to mitigate or fix it?

@jranson (Member) commented Mar 16, 2020

Thanks for the additional info. Since this is happening with a memory cache, it points to network connections exhausting the file descriptors rather than files on disk, and the default limit (32k) should work just fine, so something may be leaking. We'll take a look at the pprof diagnostics to try to isolate this and issue a patch.
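For anyone who wants to do this kind of inspection themselves, Go's standard net/http/pprof package exposes a goroutine dump over HTTP. This is a general-purpose sketch, not Trickster's configuration: the address localhost:6060 and the helper names are my own choices.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers on http.DefaultServeMux
	"time"
)

// startPprof serves the pprof endpoints in the background, the way a
// long-running daemon would. The address is an arbitrary choice.
func startPprof(addr string) {
	go func() {
		log.Println(http.ListenAndServe(addr, nil))
	}()
}

// goroutineProfile fetches the plain-text goroutine dump. Thousands of
// goroutines blocked on the same mutex would confirm a lock-contention leak.
func goroutineProfile(addr string) (string, error) {
	var resp *http.Response
	var err error
	for i := 0; i < 50; i++ { // retry briefly until the server is up
		resp, err = http.Get("http://" + addr + "/debug/pprof/goroutine?debug=1")
		if err == nil {
			break
		}
		time.Sleep(50 * time.Millisecond)
	}
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	startPprof("localhost:6060")
	profile, err := goroutineProfile("localhost:6060")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("goroutine profile bytes:", len(profile))
}
```

With a healthy process the dump lists a handful of goroutines; a huge number of them parked in sync.(*Mutex).Lock would point straight at a lock leak.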

@DirkFries (Author)

Hi @ALL,

I had the same error tonight and collected some information before restarting.

lsof showed 1028 open files like this:

trickster 29131 trickster 1023u IPv6 68885247 0t0 TCP host:XmlIpcRegSvc->nagios:35762 (ESTABLISHED)

So @jranson's assumption is correct: the problem is not really open files but open network connections.
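As a complement to lsof, the descriptor count can also be read from inside the process. A minimal Go sketch, assuming Linux (where /proc/self/fd lists one entry per open descriptor); the function name is hypothetical, not part of Trickster:

```go
package main

import (
	"fmt"
	"os"
)

// countOpenFDs returns the number of file descriptors currently open in
// this process, by counting the entries in /proc/self/fd (Linux only).
func countOpenFDs() (int, error) {
	entries, err := os.ReadDir("/proc/self/fd")
	if err != nil {
		return 0, err
	}
	return len(entries), nil
}

func main() {
	n, err := countOpenFDs()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("open file descriptors:", n)
}
```

Polling this value and logging it would show whether the descriptor count climbs steadily toward the ulimit between restarts.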

@jranson (Member) commented Apr 14, 2020

@jwshieldsGetty @DirkFries Thanks for all the information on this - we have identified the mutex deadlock that is the root cause of this (causing subsequent inbound requests to hang while waiting on lock acquisition and exhausting the file descriptors), and will be issuing a patch in the next 24 hours.

@jranson self-assigned this Apr 14, 2020
@jranson added the bug (Something isn't working) and caching (affects the caching layer) labels Apr 14, 2020
This was referenced Apr 14, 2020
@jranson (Member) commented Apr 16, 2020

All, we have cut release v1.0.3, which is the patch release to address this issue. We would really appreciate it if you could try it out and report back on whether the issue is resolved for you. Thanks again for the report, and for your patience while we got it worked out.

@jwshieldsGetty

Hello! I've rolled this change out to our environments this morning. So far, around four hours in, things are holding stable, but I will report back here if I notice anything.

Thank you all for the responsiveness and the fix on this!

@jranson (Member) commented Jun 2, 2020

It seems this issue was resolved during the beta cycle; please reopen if the GA version of 1.1 still runs you into the max open files limit.

@jranson closed this as completed Jun 2, 2020