4.2.2 crash in zmq::msg_t::close() #2617

Closed
lacombar opened this issue Jul 6, 2017 · 13 comments

lacombar commented Jul 6, 2017

One of my programs recently crashed with the following backtrace:

#0 0x00000000008b6f52 in zmq::msg_t::close (this=this@entry=0x7fc3ec000f60) at .../libzmq/src/msg.cpp:237           
#1 0x00000000008e61b8 in zmq::stream_engine_t::~stream_engine_t (this=0x7fc3ec000f40, __in_chrg=<optimized out>) at .../libzmq/src/stream_engine.cpp:163
#2 0x00000000008e64a9 in zmq::stream_engine_t::~stream_engine_t (this=0x7fc3ec000f40, __in_chrg=<optimized out>) at .../libzmq/src/stream_engine.cpp:177
#3 0x00000000008e470d in zmq::stream_engine_t::error (this=this@entry=0x7fc3ec000f40, reason=reason@entry=zmq::stream_engine_t::connection_error) at .../libzmq/src/stream_engine.cpp:981
#4 0x00000000008e59b9 in zmq::stream_engine_t::in_event (this=0x7fc3ec000f40) at .../libzmq/src/stream_engine.cpp:318
#5 0x00000000008b46cb in zmq::epoll_t::loop (this=0x1126dd0) at .../libzmq/src/epoll.cpp:180
#6 0x00000000008cbb85 in thread_routine (arg_=0x1126e50) at .../libzmq/src/thread.cpp:100
#7 0x00007fc4077e7184 in start_thread (arg=0x7fc3fd7fa700) at pthread_create.c:312
#8 0x00007fc40605cffd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Code involved is the following:

int zmq::msg_t::close ()
{
    //  Check the validity of the message.
    if (unlikely (!check ())) {
        errno = EFAULT;
        return -1;
    }

    if (u.base.type == type_lmsg) {

        //  If the content is not shared, or if it is shared and the reference
        //  count has dropped to zero, deallocate it.
        if (!(u.lmsg.flags & msg_t::shared) ||
              !u.lmsg.content->refcnt.sub (1)) {

            //  We used "placement new" operator to initialize the reference
            //  counter so we call the destructor explicitly now.
            u.lmsg.content->refcnt.~atomic_counter_t ();

            if (u.lmsg.content->ffn)
                u.lmsg.content->ffn (u.lmsg.content->data,
                    u.lmsg.content->hint); // <== line 237
            free (u.lmsg.content);
        }
    }

FYI, I have a core dump available and can provide any forensic details if needed.

Author

lacombar commented Jul 7, 2017

Note: this might be corruption of the struct. u.lmsg.content->ffn is 0xc2b54c14b0300400, which does not appear to be in any mapped region of the program (as far as gdb's info files can tell).
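
For what it's worth, a value like this can be ruled out as a valid pointer even without the memory map. The sketch below assumes x86-64 with 48-bit virtual addresses (an assumption about the platform, not something confirmed in this thread), where a usable address must have bits 63 to 47 all equal. The helper name is_canonical_48 is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if addr is a canonical address under the
// assumed 48-bit x86-64 scheme (bits 63..47 are all zeros or all ones).
constexpr bool is_canonical_48(std::uint64_t addr) {
    return (addr >> 47) == 0 || (addr >> 47) == 0x1FFFF;
}

// The ffn value reported above is non-canonical, so it cannot be mapped.
static_assert(!is_canonical_48(0xc2b54c14b0300400ULL), "ffn is non-canonical");

// For comparison, the 'this' pointer from frame #0 is canonical.
static_assert(is_canonical_48(0x00007fc3ec000f60ULL), "this pointer is canonical");
```

A non-canonical function pointer would fault as soon as it is called, which would be consistent with a crash at the ffn call site, though it does not say how the value got there.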

Author

lacombar commented Jul 7, 2017

The whole structure (minus data) seems to be corrupted:

(gdb) print *u.lmsg.content
$12 = {data = 0x7fc3c0000388, size = 140478716576648, ffn = 0xc2b54c14b0300400, hint = 0xba70512344aee9e1, refcnt = {value = 3160290249}

Author

lacombar commented Jul 7, 2017

size is 0x7fc3c0000388 in hex (the same value as data), which looks more like a pointer value than a valid size.
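
A quick way to double-check the conversion, sketched in C++. The helper to_hex and the constant names are hypothetical; the two values are the size and data fields from the gdb output above.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: format an unsigned 64-bit value as lowercase hex.
std::string to_hex(std::uint64_t v) {
    std::ostringstream os;
    os << "0x" << std::hex << v;
    return os.str();
}

// Values copied from the gdb print of *u.lmsg.content.
constexpr std::uint64_t kSize = 140478716576648ULL;  // 'size' field (decimal)
constexpr std::uint64_t kData = 0x7fc3c0000388ULL;   // 'data' field (pointer)

// The decimal size is bit-for-bit the same number as the data pointer.
static_assert(kSize == kData, "size field equals the data pointer value");
```

Calling to_hex(kSize) yields the string "0x7fc3c0000388", matching the data field.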

Contributor

bjovke commented Jul 7, 2017

@lacombar This doesn't necessarily mean that u.lmsg.content->ffn is corrupted. It might be that u.lmsg.content was freed or never allocated, or that the message type is not lmsg at all.
Do you have a minimal example that reproduces this issue?

Member

bluca commented Jul 7, 2017

Also, please show the dump of the whole msg so we can check what msg type it was.

Author

lacombar commented Jul 7, 2017

@bjovke I do not have a reproducible test case yet. However, this crash might be associated with a Python stack trace on the other end:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/gevent-1.1.0-py2.7-linux-x86_64.egg/gevent/greenlet.py", line 534, in run
    result = self._run(*self.args, **self.kwargs)
  File ".../msg_broker/message_broker.py", line 367, in responseListener
    response = self.jsonToIpcMessage(self.__msgBrokerResponsePullerServer.recvJson())
  File ".../utils/zmq_handler.py", line 126, in recvJson
    message = self.recv()
[...]
ZMQError: Interrupted system call
<Greenlet at 0x7f28993c6870: <bound method MessageBroker.responseListener of <MessageBroker at 0x7f28993c6690>>> failed with ZMQError

@bluca I'll have a look into this later in the day, thanks!

Member

bluca commented Jul 7, 2017

The type is an int, and the enum is defined here: https://github.com/zeromq/libzmq/blob/master/src/msg.hpp#L146

Contributor

bjovke commented Jul 10, 2017

The message type in zmq::msg_t::close() is definitely type_lmsg when the crash happens:

    if (u.base.type == type_lmsg) {
        ......
        free (u.lmsg.content);
        .......
    }

Either u.base.type is wrong or u.lmsg.content is unallocated.
A double free of u.lmsg.content is less likely, since there's always a u.lmsg.content->refcnt.sub (1) check before freeing it.
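
To make that reasoning concrete, here is a simplified model of a refcount-guarded free. It is only a sketch, not libzmq's code: content_model, msg_model, and close_model are hypothetical stand-ins, and std::atomic replaces atomic_counter_t. As in the quoted condition, the counter is only consulted when the shared flag is set.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the heap-allocated message content.
struct content_model {
    std::atomic<int> refcnt;
    explicit content_model(int n) : refcnt(n) {}
};

// Hypothetical stand-in for a message that may share its content.
struct msg_model {
    content_model *content;
    bool shared;
};

// Releases the content if this message is the last holder.
// Increments 'frees' each time the content is actually deallocated.
bool close_model(msg_model &m, int &frees) {
    if (m.content == nullptr)
        return false;  // already closed in this model
    // Unshared: free unconditionally. Shared: free only when the
    // decrement takes the counter from 1 to 0.
    const bool last = !m.shared || m.content->refcnt.fetch_sub(1) == 1;
    if (last) {
        delete m.content;
        ++frees;
    }
    m.content = nullptr;  // model-only hygiene: forget the pointer
    return last;
}

// Two messages sharing one content block: only the second close frees it.
int frees_after_closing_two_shared_copies() {
    int frees = 0;
    content_model *c = new content_model(2);
    msg_model a{c, true};
    msg_model b{c, true};
    close_model(a, frees);
    close_model(b, frees);
    return frees;
}
```

In this model, closing both shared copies deallocates the content exactly once. The guard offers no protection if the message struct itself is overwritten or copied bytewise, which is one reason a dump of the whole msg would help.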

Anyway, it's hard to say anything more without a working example and/or more information.

Member

bluca commented Mar 24, 2018

@lacombar any update?


stale bot commented Mar 24, 2019

This issue has been automatically marked as stale because it has not had activity for 365 days. It will be closed if no further activity occurs within 56 days. Thank you for your contributions.


eclazi commented Nov 8, 2022

I believe we have seen this issue also. @yinhuan99 did you find a resolution?


uczwq commented Sep 28, 2023

Is there a solution?
I have the same problem.


hmenn commented May 25, 2024

Hi,

I'm using v4.3.5 and have hit the same problem. It happens periodically, after 15+ days of running.
Is it possible to re-open this issue?

#0  __libc_do_syscall () at libc-do-syscall.S:49
#1  0xb68501a4 in __libc_signal_restore_set (set=0xaf9f89a8) at ../sysdeps/unix/sysv/linux/internal-signals.h:86
#2  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:48
#3  0xb6841762 in __GI_abort () at abort.c:79
#4  0xb6876e88 in __libc_message (action=action@entry=do_abort, fmt=<optimized out>) at ../sysdeps/posix/libc_fatal.c:155
#5  0xb687bca6 in malloc_printerr (str=<optimized out>) at malloc.c:5347
#6  0xb687d28a in _int_free (av=0xb6916594 <main_arena>, p=0x6927f8, have_lock=<optimized out>) at malloc.c:4314
#7  0xb6ce3f12 in zmq::msg_t::close (this=this@entry=0xaf9f8d10) at /usr/src/debug/zeromq/4.3.5-r0/zeromq-4.3.5/src/msg.cpp:262
#8  0xb6cd86b4 in zmq_msg_close (msg_=msg_@entry=0xaf9f8d10) at /usr/src/debug/zeromq/4.3.5-r0/zeromq-4.3.5/src/zmq.cpp:627
#9  0x0044a55c in zmq::message_t::~message_t (this=0xaf9f8d10, __in_chrg=<optimized out>) at /opt/poky/3.1.32/sysroots/cortexa8t2hf-neon-poky-linux-gnueabi/usr/include/zmq.hpp:388
