-
Notifications
You must be signed in to change notification settings - Fork 5.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
memory leak: session/observers are not cleared on ungracefully closed websocket #1769
Comments
That does sound difficult to reproduce and track down. I looked at your packages and saw that you were not using the new Facts package in your reproduction. It counts observers, subscriptions, and other related things and might be useful to confirm what you are seeing. Just mentioning since the package is relatively new and undocumented. |
@Tarangp and anyone interested may find the server info package similarly useful. It gives you an instantaneous snapshot of what subs / LRSes etc Meteor has open. |
I started off using livedata because I wanted to know why the sessions stayed in memory. Originally I thought perhaps people aren't closing their tabs or the browser's thumbnail preview is doing this. I managed to get the same thing to happen with the Facts package. That can be reproduced locally which may be better but another computer/iphone is still needed. With a bare bones app with the facts package
With locally hosted Atmosphere and the packages dump:
So its quite likely this is the main memory leak on atmosphere as there was only actually one session active at the end of this. Possible Fixes
|
We did see a handles leak in our mystery hunt meteor app over the course of a weekend as well, FWIW. We went from ~5k handles open to ~7k handles open over the course of 24 hours, then reset to around ~5k after a restart. Of course, our conditions weren't very reproducible, so kudos to @Tarangp for his substantial work above. |
Thanks for the great writeup and replication, @Tarangp. I've replicated this and we're looking into it. |
We've narrowed this down to the http proxy. The symptoms go away if you use port 3001 instead of 3000. The real fix is probably the same as #1543: ddp level heartbeats. However, we can hopefully fix the proxy to pass errors along better and avoid the leak in this case. |
Previously, certain errors on the client/proxy socket would result in the proxy keeping the websocket to the server open forever rather than closing it. For example, disconnecting the client from the internet without a graceful shutdown. This could easily be reproduced with any app using `facts`, eg https://github.com/tarangp/test-observers. Run the app, connect locally with one browser, connect from a second computer, observe sessions going to 2, disconnect the second computer. With this commit, sessions will go back to 1 in about a minute. Without it, it never will. This particular fix is not very compelling, since it uses undocumented features of the stream interface. I will file an issue with the node-http-proxy project tonight asking how we're supposed to do this. This addresses #1769. I'll close that once we have a more compelling fix, and once the similar bug is fixed in the proxies used in the `meteor deploy` server and in Galaxy.
@n1mmy meant to say "ddp level heartbeats". (Specifically, heartbeats with responses that are actually used by both client and server to terminate the connection when they don't arrive.) DDP level heartbeats will be more consistent and faster than TCP/websocket level detection. But that doesn't excuse not also getting the TCP-level error handling right. |
Oops, yeah. I meant heartbeats. Brain-o =). Comment updated to avoid confusing future readers. |
Wouldn't it be better to hook into sockjs' heartbeats? It makes sense with server to server DDP because there's no sockjs layer underneath. It would be a bit redundant with the ordinary browser-client meteor-server scenario because there's this 'h' sockjs sends every 30s or 60s not too sure. |
Well, there are two things that DDP-level heartbeats could provide that would be improvements over the sockjs heartbeats: (a) They could be bidrectional, so that clients could detect vanishing servers (like in #1543). (b) SockJS heartbeats don't have a reply. As far as I understand, their main purpose is to ensure that something goes over the wire every so often. This allows polling transports to finish intermediate queries, and for websocket transports to be pushing something over the network so that TCP will eventually notice that transmissions are timing out. But many apps will want something stronger: to be able to clean up resources associated with clients (or servers) that aren't actively responding to pings right now. |
Also, (c): they could happen on raw websockets when there is no sockjs framing (eg on server-to-server DDP connections). We could in theory use websocket ping frames in this case, but that may not work on future transport layers (eg raw TCP, etc). |
Status update for clarity: this is fixed in |
@n1mmy Which version of meteor was this fix applied to? There might be another way it still occurs, at least on the old atmosphere which runs 0.7.1.2. Its not as bad as before so there may have been another way a session stays in memory. |
I am observing this issue on my test app for https://github.com/mizzao/meteor-user-status as well, except the mystery connections are from a normal Google Chrome desktop client. http://user-status.meteor.com/ makes it really easy to see the ghost sessions (there are some right now, which can be easily seen as dupes from the same IP, and will probably stay until the server is restarted.) This issue was hypothesized in tmeasday/meteor-presence#17 and I'm glad it was discovered and replicated. |
This is fixed in the development mode proxy in Meteor 0.7.1+. It is not totally fixed on The real solution here is DDP heartbeats. See #1865. |
We're using the new http-proxy on |
Cc: @tmeasday @n1mmy
After monitoring atmosphere over the past week (sorry about the downtime during this) I think i've found a potential issue when sockets are ungracefully closed. Meteor keeps these sessions in memory and they build up.
It's quite difficult to test this because its hard to emulate what ungraceful exit is. I could see lots of sessions in memory on atmosphere that were over 14 hours old so this is what I started to watch for.
Additionally meteorite, until last week, used
process.exit()
to exit instead of disconnecting the active socket. I'm not sure if this could have played a role in all this.There were about 430 active sessions in memory, 30% being from meteorite, most over 14 hours old (310 sessions). I say this because those 30% didn't subscribe to the typical methods one would on atmosphere). The meteorite
process.exit()
issue was fixed last week: oortcloud/meteorite@6d89351I tried to make an app with a modified livedata package that keeps track of all the active sockets at any given time. (Sorry about the js errors, I don't think linker plays well with custom modified core packages. it doesn't work without
--debug
) : https://github.com/Tarangp/Meteor-memory-leak-bugTo give you an idea, this is what the memory on atmosphere looks like at the moment, so this may be what is causing it.
How to reproduce the issue
--debug
. Also hosted at : https://testsessionbug.meteor.com/socket.on('close'..)
isn't firing) :meteor/packages/livedata/stream_server.js
Line 70 in b5cd66d
This might be a bug in sockjs or there might be a different way to handle the ungracefully closed connection?
Amplification
While this bug may be small, when a session is left in memory this way all its observer's and publish methods stay active. This is probably the memory used builds up so quickly.
I tested this in a separate project similar to the above just monitoring to see if
onStop
was called. If the socket is closed ungracefullyonStop
isn't called and all the observers remain active.This is probably why Nick mentioned there might be a CPU leak too, likely from calculating the diffs on the observer methods from the publish functions. This would have been done for all these unclosed sessions wastefully when a document is changed.
There are also > 800 packages on atmosphere which are subscribed to in entirety for each of these sessions. Meteor keeps track of each users subscription's documents too (to identify what to remove when the subscription is
unsub
'd and what to send.changed
for a particular client) so this would add up for each of these sessions.The issues on atmosphere started when the custom download counter was added in and its likely that it contributed to using more memory and cpu per session.
Alongside higher traffic than usual this is what might have taken atmosphere down, especially when the sessions build over a weeks time. I've been redeploying atmosphere every 2 days for the moment to try and prevent this.
Related
The text was updated successfully, but these errors were encountered: