-
-
Notifications
You must be signed in to change notification settings - Fork 150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Memory leak: engineio.Socket object never freed in error conditions #328
Comments
You seem to be correct, it looks like some socket objects are not removed when there's a lot of concurrent reconnections. I've been testing with this patch and this appears to remove all lingering sockets:
I need to test this more. I'm also not sure why some disconnections are caught due to a timeout and not with the disconnect that the browser sends (though this may be that the browser doesn't properly do this when it has to refresh too many pages at the same time). Let me know if this helps. |
@pierrehebert I committed a fix to the main branch. Can I ask you to retest on your side and report back? Thanks! |
Thank you for your fast reply and patch. Also, there should be a signal indicating that the socket is no more valid, isn't it ? This might be a read/write exception or something else but once the socket has been closed, whatever the reason, the OS should report it reliably to the app. When closing 50 tabs, all sockets are gone with no delay and this can be seen in |
Low-level sockets do not have a notification system. If one side closes the socket nothing happens on the other side, which will only find out of the closure the next time it tries to read or write on it. The JavaScript client adds a With respect to the |
|
@pierrehebert monkey patching is not required and should not make any difference because the writer task is started using the eventlet/gevent implementations of the threading module, not the Python one. If eventlet and/or gevent call Python's |
@pierrehebert I did some debugging on the
Can I ask you to give this patch a try on your side and report back if you see improvements? |
Hello,
Bug description
I'm facing memory leak issues when using python-engineio inside a Flask app, with Flask-SocketIO and either gevent or eventlet. This is easily reproducible using the sample below.
I believe this is closely related to #323 , but there is more than that, and this is why I'm opening a new issue. There are multiple problems in one, and probably not all in python-engineio. I would like to honestly say that my understanding is not perfect, so I apologize if my assumptions or conclusions are wrong.
Also I'm sorry for this long bug report...
Setup
Using Arch Linux with python 3.11.3 and this requirements.txt file:
For the reference, pip list gives:
The python code below creates an app that can use either gevent or eventlet, either in standalone mode or using gunicorn. This app creates a single web page displaying a live count of connected clients.
Using a venv and the aforementioned requirements.txt, the app can be launched these ways:
Steps to trigger the leak
Open a browser with a few tabs on http://127.0.0.1:5001/. This works correctly and the displayed counter reflects the exact count of connections. Now duplicate the tabs to get between 30 or 50 connections. Select all tabs and reload them all at once using Ctrl-R multiple times to stress the server. The connections count will grow above the number of tabs but will eventually decrease and reach again the right count. But looking at the process RSS memory, it shows a constantly growing memory use and the process will ultimately be killed by the OOM. Underlying socket file descriptors have been closed and enough time have been given to the server to handle timeouts but RAM is never released.
Another but longer way to reproduce the issue is to simulate a network disconnection using iptables:
Initially I caught the issue because of random network disconnections, but the 50 tabs trick is a better and faster way to reproduce the bug.
Please note that if the server is not stressed a bit, the leak doesn't seem to appear frequently. So the issue presumably appears in a context of errors, such as timeout or aborted connection.
Preliminary analysis
Using tracemalloc and objgraph shows that the number of instances of engineio.Socket objects grows. This can be monitored using this patch:
Because this object has a packet queue and messages sent in the sample are rather large and sent to all clients, the memory leak rapidly becomes visible.
After removing the
@socketio.on('connect')
and@socketio.on('disconnect')
handlers, the leak is still present, although a lot less noticeable of course.A simple (but obviously unsatisfying) workaround is to call
del self.sockets[sid]
somewhere in the _service_task where check_ping_timeout is called.This improves a lot the situation, but alas, there is another leak. I'm not sure this one is linked with python-engineio though, but hopefully someone could point me in the right direction if this is not the case. The issue is related to python _DummyThread objects, used to handle threads that have been created outside of the python context. Such instances are, in the context of the above sample, created when using
writer_task.join()
from engineio/socket.py:260 (in particular but not only). These _DummyThread instances should not be created, but something is wrong somewhere in python's threading.current_thread() and get_ident(). I can see this with eventlet but not with gevent. Maybe this has something to do with eventlet but that's not obvious.I could also prevent this leak using a dirty workaround (skip calls to the threading.current_thread() function), however there is still another leak (both with eventlet and gevent). I didn't started investigated this one yet.
Any help would be appreciated! And sorry again for the long post...
The text was updated successfully, but these errors were encountered: