4.3.2: Crashing use-after-free in poller_base_t::execute_timers #3645
Comments
I can imagine that the changes made by fa976f8 changed the behaviour here subtly. Could you try to revert the changes made by that commit to poller_base.cpp and see if this fixes your issue? However, the code didn't seem to be correct before. It assumed already that |
Hi, I'm using ZMQ_CLIENT with libzmq version 4.3.2 and I see a very similar symptom: Thread 4 "ZMQbg/IO/0" received signal SIGSEGV, Segmentation fault. Is there any solution for this? |
Hi. The issue is that the class which created the timer calls
I guess the simple mutex locking on all three |
@SteveReadSNS , @rhymerjr Can you please try with the latest commit from https://github.com/bjovke/libzmq.git? |
Sorry, I've been busy with other things recently. Based on the analysis of my colleague who has also worked on this, I'd say that a mutex is not going to solve this problem in my case, because the accesses to the container are all from the same thread (ZMQ's I/O thread - in my case there's only one I/O thread). |
That's not to say that a mutex isn't important, just that if there's only one I/O thread, it won't help in most cases, and I suspect it shouldn't help at all in any case. The reaper shouldn't be manipulating these objects while they still have timers open in an I/O thread, and it shouldn't be possible to have one I/O thread manipulating objects that are in another I/O thread's timer queue. |
Might be, but then again it might not. |
Hi. |
Hi, I'm a colleague of SteveReadSNS, and I will follow this issue. The most important thing is that the iteration on the map is not safe, because its content may be changed by the called |
Hi @ebourneuf . Sorry for the late reply.
This is also valid when new timers are inserted, none of the existing iterators are invalidated. I'm wondering, since you don't have this issue any more (as I understood), why is this an issue for you now? |
I don't have this issue any more because we fixed the function with the following patch |
Great. I get the idea, this is the safest way. I'll try to make a patch based on this. |
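The hazard discussed in the comments above (a timer callback modifying the container that the same thread is currently iterating) can be illustrated with a minimal sketch. This is not the patch attached by ebourneuf and not libzmq's code: `timer_queue`, `add`, `cancel`, and `execute` are hypothetical names, and the approach shown (remove the entry before invoking its callback, then re-read `begin()` on every pass) is only one way to avoid holding an iterator across the callback.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

// Hypothetical stand-in for a timer queue; NOT libzmq's poller_base_t.
class timer_queue {
public:
    using callback_t = std::function<void(int id)>;

    void add(std::uint64_t expiry, int id, callback_t cb) {
        assert(cb != nullptr);  // an empty callback would throw when fired
        timers_.emplace(expiry, entry{id, std::move(cb)});
    }

    // Remove every timer with the given id. Safe to call from inside a
    // callback, because execute() holds no iterator while callbacks run.
    void cancel(int id) {
        for (auto it = timers_.begin(); it != timers_.end();) {
            if (it->second.id == id)
                it = timers_.erase(it);  // erase returns the next valid iterator
            else
                ++it;
        }
    }

    // Fire all timers whose expiry is <= now; returns how many fired.
    // Caveat: a callback that re-adds a timer with expiry <= now will be
    // fired again in this same call.
    std::size_t execute(std::uint64_t now) {
        std::size_t fired = 0;
        while (!timers_.empty()) {
            auto it = timers_.begin();    // re-read on every pass
            if (it->first > now)
                break;                    // earliest timer is not due yet
            entry e = std::move(it->second);
            timers_.erase(it);            // detach BEFORE calling out
            e.cb(e.id);                   // may add or cancel timers freely
            ++fired;
        }
        return fired;
    }

    std::size_t size() const { return timers_.size(); }

private:
    struct entry {
        int id;
        callback_t cb;
    };
    std::multimap<std::uint64_t, entry> timers_;
};
```

With this shape, a callback for timer 1 that cancels timer 2 (both due at the same time) results in timer 2 never firing, and no iterator to an erased element is ever dereferenced. A mutex would not help here, since the modification happens re-entrantly on the same thread.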
OK, I've made the changes at https://github.com/bjovke/libzmq.git. |
Yes! I have changed poller_base.cpp using the patch attached by ebourneuf! |
Great! Please let us know when you have some results. |
After more than a week I can say that the crash has gone! |
Great! |
To avoid a use-after-free crash in `execute_timers` that showed up when running the test suite with valgrind. Refs: zeromq#3645
To avoid a use-after-free crash in `execute_timers` that showed up when running the test suite with valgrind. Refs: zeromq#3645 PR-URL: #6 Reviewed-by: Trevor Norris
Issue description
While `poller_base_t::execute_timers` is walking the collection of timers that are expiring / have already expired, a different thread cancels the specific timer whose timer_event method is about to be called. In my environment, the heap functions including new/delete will fill blocks with 0x5A when they are freed.
The crash happens at line 103 of poller_base.cpp:
when the code attempts to dereference `it->second.sink`.
Environment
Calling language is C++ via cppzmq. I only have one thread, so there are no issues surrounding my code using libzmq objects from more than one thread. At the moment of the crash, there are three threads:
Minimal test code / Steps to reproduce the issue
I don't have a clear picture of what exactly causes this. I can't tell what kind of object was just removed from the timer multimap because the iterator points to deleted memory (see note above about deleted blocks being filled with 0x5A). My best guess is a PUB or SUB socket that's being disconnected or unbound from my code at the same moment as the TCP connection to its peer is about to be reconnected, and the poller_base_t is about to dispatch to the timer_event handler.
At certain times, my program will find that its ZeroMQ sockets are disconnected by network events at about the same time that those network events cause a notification to my program to change which endpoints it uses (unbind-then-bind on one socket, disconnect-then-connect on another).
There are also zmq IPC sockets, but they are only rarely changed while the program is running, whereas the network events mentioned above are expected to happen (even if we hope they don't), and they will cause the disconnect/connect and unbind/bind sequences.
What's the actual result? (include assertion message & call stack if applicable)
What's the expected result?
Does not crash.