improved warning for ulimit -n 4096 needed #1758
Comments
tbh, if we're going to spend time on this, I think we should just require fewer FDs in tests. I think most of them are used because we write files to disk just to immediately read them back into the database? That should be pretty easy to mock.
I'm not sure it's about files. TCP connections also consume an FD. This is very unscientific but I ran
I killed the tests after seeing that, so I think one or more of the tests late in this list were responsible:
@jsha a more reliable way to see the file descriptors used is to run strace and track |
This seems to be one contributor, but not the sole cause: Line 25 in 846add3
And there's a test case that intentionally tests that max: Lines 866 to 879 in 846add3
But removing that one test case, or reducing MAX_CONCURRENT_UPLOADS to 10, doesn't solve the problem.
I think part of the problem is that the storage is never dropped because the strong count on the I'm not sure why we are keeping it alive, I didn't think there were any loops in the |
Oh, there's a comment on why Lines 494 to 496 in 846add3
We do shut down the database underneath the storage, so that shouldn't be it.
I added some logging and during the complete test suite we create 2347 tokio runtimes and drop 2066 of them, so there's 281 leaked from somewhere... (also, that's like 10 runtimes per test, why are we creating so many 😰, I would expect like 4 max (one for database, one for S3, one for reqwest, maybe one somewhere else)).
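For anyone wanting to reproduce this kind of created-versus-dropped count, here is a minimal sketch of one way to do it. It is not the logging that was actually added; `Tracked`, `CREATED`, `DROPPED`, and `leaked` are hypothetical names, and `T` stands in for whatever resource is being tracked (a runtime, a storage handle, and so on).

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Global counters, bumped on construction and on drop (hypothetical names).
static CREATED: AtomicUsize = AtomicUsize::new(0);
static DROPPED: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical wrapper that records when the wrapped value is created and dropped.
struct Tracked<T> {
    inner: T,
}

impl<T> Tracked<T> {
    fn new(inner: T) -> Self {
        CREATED.fetch_add(1, Ordering::Relaxed);
        Tracked { inner }
    }

    /// Gives access to the wrapped value.
    fn get(&self) -> &T {
        &self.inner
    }
}

impl<T> Drop for Tracked<T> {
    fn drop(&mut self) {
        DROPPED.fetch_add(1, Ordering::Relaxed);
    }
}

/// Number of wrapped values that were created but not (yet) dropped.
fn leaked() -> usize {
    CREATED
        .load(Ordering::Relaxed)
        .saturating_sub(DROPPED.load(Ordering::Relaxed))
}
```

Printing `leaked()` at the end of a test run gives a number comparable to the 281 above. Anything kept alive by a reference cycle or an explicit `std::mem::forget` never reaches `Drop`, so it shows up in the count.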
Ok, I tracked down the leaked runtimes, they're from the |
It took me a minute to figure out why we leak the webserver, so copying the comment here (from web/mod.rs):
That's a bummer! It would be so nice to fix that underlying problem, but that would necessitate getting off of iron, which I realize is a huge task. Given that, your proposed solution - make the Another approach: what about stubbing out the |
I'm relatively near to a first PR that starts the axum migration (and a branch where all handlers work with axum + sqlx),
I don't think it's doing live requests, I remember @GuillaumeGomez mocked the requests via We could stub out the updater when it's an issue, otherwise it would probably be fixed when we only have one runtime and most things are |
The README says:
All tests are failing or timing out
Our test setup needs a certain amount of file descriptors.
At least 4096 should be enough, you can set it via:
$ ulimit -n 4096
But I've forgotten and relearned this a number of times, and it's cost me a lot of frustration each time, since the errors (and especially the hangs) are opaque. Is there a way we can improve on this? Maybe each test can check for available FDs and fast-fail if they're not available?
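A minimal sketch of what such a fast-fail check could look like, assuming Linux (it reads `/proc/self/limits`) and using only the standard library. `parse_soft_nofile_limit` and `check_fd_limit` are hypothetical helpers, not existing functions in this project, and the 4096 threshold in the usage note comes from the README text quoted above.

```rust
use std::fs;

/// Extracts the soft "Max open files" limit from the text of `/proc/self/limits`.
/// Returns `None` if the line is missing or malformed; "unlimited" maps to `u64::MAX`.
fn parse_soft_nofile_limit(limits: &str) -> Option<u64> {
    const LABEL: &str = "Max open files";
    let line = limits.lines().find(|l| l.starts_with(LABEL))?;
    // Remaining columns are: <soft limit> <hard limit> <units>
    let soft = line[LABEL.len()..].split_whitespace().next()?;
    if soft == "unlimited" {
        Some(u64::MAX)
    } else {
        soft.parse().ok()
    }
}

/// Hypothetical helper: returns an error message if the soft limit is known and too low.
/// If the limit cannot be determined (for example on non-Linux systems), it returns
/// `Ok(())` instead of guessing.
fn check_fd_limit(required: u64) -> Result<(), String> {
    let Ok(text) = fs::read_to_string("/proc/self/limits") else {
        return Ok(());
    };
    match parse_soft_nofile_limit(&text) {
        Some(soft) if soft < required => Err(format!(
            "open-file limit is {soft}, but the tests need at least {required}; \
             run `ulimit -n {required}` first"
        )),
        _ => Ok(()),
    }
}
```

A shared test setup function could then call something like `check_fd_limit(4096).expect("file descriptor limit too low")`, so a low limit produces an explicit panic message up front rather than an opaque hang later.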