Possible memory leak in WP entrypoint binary #1192
Comments
While we've been very careful to try to avoid something like this, I believe it. We'll have to do some investigation on what's going on here. But can I ask: are you using any features such as app config, exec, logs access, etc.?
I've been running a rather unscientific test in the background as I go through my day today. I'll update this post with any findings. For this test, I deployed into Docker locally, since we're testing for an entrypoint binary leak that should hopefully be reproducible regardless of platform, and Docker is easy for me to set up for testing.
So far I'm not seeing any leaks. That doesn't mean they don't exist, of course; it's just that maybe they're not obvious to trigger yet. I'll keep my refresh loop running for the day and see what happens...
@kfh If I added a signal handler (i.e.
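For reference, here is a minimal Go sketch of what a profile-dumping signal handler could look like; the SIGUSR1 choice, the /tmp file path, and the function names are all illustrative assumptions, not taken from the Waypoint codebase:

```go
// Hypothetical sketch: write a pprof heap profile whenever the process
// receives SIGUSR1, so a long-running binary can be inspected on demand.
package main

import (
	"fmt"
	"os"
	"os/signal"
	"runtime/pprof"
	"syscall"
	"time"
)

func watchForProfileSignal() {
	sigCh := make(chan os.Signal, 1)
	signal.Notify(sigCh, syscall.SIGUSR1)

	go func() {
		for range sigCh {
			// Timestamped file names let repeated dumps be compared later
			// with `go tool pprof`.
			name := fmt.Sprintf("/tmp/entrypoint-heap-%d.pprof", time.Now().Unix())
			f, err := os.Create(name)
			if err != nil {
				continue // best effort; don't crash the entrypoint over a dump
			}
			pprof.Lookup("heap").WriteTo(f, 0)
			f.Close()
		}
	}()
}

func main() {
	watchForProfileSignal()
	select {} // the real entrypoint work would run here
}
```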
Our waypoint configurations are fairly small. We are also not using much of the functionality provided by the entrypoint anymore. For logs we have to use AWS CloudWatch, since we need more advanced features and that works out of the box. The only feature we are missing when not injecting the entrypoint is the possibility to exec directly into a shell on the running Fargate container, something that isn't supported by default on Fargate, though there are workarounds.
We ran a deployment with my PR above over the weekend and dumped the profile today. We spent some time thinking about it and we think we found the source of the leak. The good news is we think we can solve this server-side (a TCP connection on the server isn't being properly closed out, which is causing the clients to hold on). We're also gonna investigate making the client more robust so we can just force
Will update you soon!
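For anyone following along, the general failure mode being described, a server-side connection table whose entries are never closed out, might look roughly like the sketch below. All names and structure here are hypothetical and only illustrate the pattern, not Waypoint's actual code:

```go
// Hypothetical illustration: a server tracks active client connections in a
// table. If an entry is not removed on every exit path (errors, timeouts,
// clients going away), the connection and its buffers stay pinned in memory
// and the client keeps holding its side open as well.
package server

import (
	"net"
	"sync"
)

type connTable struct {
	mu    sync.Mutex
	conns map[string]net.Conn
}

func newConnTable() *connTable {
	return &connTable{conns: make(map[string]net.Conn)}
}

func (t *connTable) add(id string, c net.Conn) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.conns[id] = c
}

// remove closes the connection and drops it from the table. Missing a call to
// this on some code path is exactly the kind of omission that produces a slow,
// steady leak like the one reported in this issue.
func (t *connTable) remove(id string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if c, ok := t.conns[id]; ok {
		c.Close()
		delete(t.conns, id)
	}
}
```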
This brings in a partial fix for #1192 by adding a close timeout to open Yamux streams to release the connection from the connection table. There is another change coming in to fix this more directly, but this adds a fail-safe.
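A minimal sketch of what such a fail-safe could look like on the server side, assuming hashicorp/yamux's StreamCloseTimeout config option; the accept loop and the 5-minute value are illustrative assumptions, not the actual change:

```go
// Sketch: configure yamux sessions with a stream close timeout so a stream
// stuck half-closed is eventually reaped instead of pinning its connection
// table entry forever.
package main

import (
	"log"
	"net"
	"time"

	"github.com/hashicorp/yamux"
)

func serve(ln net.Listener) error {
	for {
		conn, err := ln.Accept()
		if err != nil {
			return err
		}

		cfg := yamux.DefaultConfig()
		cfg.StreamCloseTimeout = 5 * time.Minute // fail-safe: reap streams stuck in closing

		session, err := yamux.Server(conn, cfg)
		if err != nil {
			log.Printf("yamux session error: %v", err)
			conn.Close()
			continue
		}
		go handleSession(session)
	}
}

func handleSession(s *yamux.Session) {
	defer s.Close()
	for {
		stream, err := s.AcceptStream()
		if err != nil {
			return // session ended
		}
		go func(st *yamux.Stream) {
			// Handle RPC traffic on the stream here; always close when done.
			defer st.Close()
		}(stream)
	}
}
```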
The fixes are coming in. These are more fixes than are strictly necessary, but ultimately they harden both the client and the server against this source of memory leak so it can't happen again.
After merging these in we'll probably want to run another test for a couple of days to completely validate this, then cut a release.
This is fixed and verified over the past 24 hours! We're going to cut a new 0.2.x release soon and that will have the fix in it.
WP server: 0.2.3
WP client: 0.2.3
I've chosen my wording carefully because we haven't done any form of isolated profiling on the entrypoint; I'm instead basing this on our observations over time as heavy users of WP in general, with and without the entrypoint injected into our containers.
This should be seen in relation to the memory-draining issue reported here. Using the soft limit only delays the memory drain; eventually the container will be killed.
This is a slow burner: even using the smallest possible instance size on Fargate (512 MB), it takes some days before memory is drained completely. But the pattern is clear: even when the deployed apps are only idling, there is a constant rise in memory usage. As mentioned in the linked issue, we have profiled our apps extensively using AWS CodeGuru to make sure we are not the source of this issue. The attached image illustrates our observations:
When we opt out of injecting the entrypoint binary into our containers, we have not seen any memory-draining issues with our WP-enabled applications.