prov/socket segfaults in sock_ep_connect() when it tries to dereference dest_addr #2676

Closed
tonyzinger opened this issue Jan 26, 2017 · 2 comments

Comments

@tonyzinger
Contributor

The socket provider segfaults in sock_ep_connect() when it tries to dereference the dest_addr field of the sock_ep_attr structure.

The endpoint type is FI_EP_MSG.
The dest_addr field was NULL.

The gdb backtrace from the core files is:
Program terminated with signal 11, Segmentation fault.
#0 0x00007ffff78923e8 in sock_ep_connect (ep_attr=0x64fdc0, index=0) at prov/sockets/src/sock_conn.c:414
414 addr = *ep_attr->dest_addr;
(gdb) bt
#0 0x00007ffff78923e8 in sock_ep_connect (ep_attr=0x64fdc0, index=0) at prov/sockets/src/sock_conn.c:414
#1 0x00007ffff7880b43 in sock_ep_get_conn (attr=0x64fdc0, tx_ctx=0x650610, index=2, pconn=0x7fffffff5b88) at prov/sockets/src/sock_ep.c:1794
#2 0x00007ffff7895cd9 in sock_ep_tx_atomic (ep=0x64fcd0, msg=0x7fffffff5cd0, comparev=0x0, compare_desc=0x0, compare_count=0, resultv=0x0, result_desc=0x0,
result_count=0, flags=2305843009213693952) at prov/sockets/src/sock_atomic.c:101
#3 0x00007ffff7896627 in sock_ep_atomic_writemsg (ep=0x64fcd0, msg=0x7fffffff5cd0, flags=2305843009213693952) at prov/sockets/src/sock_atomic.c:272
#4 0x00007ffff7896708 in sock_ep_atomic_write (ep=0x64fcd0, buf=0x69a340, count=1, desc=0x69a3f0, dest_addr=2, addr=6923712, key=2, datatype=FI_INT32, op=FI_SUM,
context=0x7fffffff5e88) at prov/sockets/src/sock_atomic.c:304

(gdb) print *ep_attr
$1 = {fclass = 3, tx_shared = 0, rx_shared = 0, buffered_len = 0, min_multi_recv = 64, ref = {val = 0, is_initialized = 1}, eq = 0x0, av = 0x699fd0, domain = 0x62be80,
rx_ctx = 0x650990, tx_ctx = 0x650610, rx_array = 0x62c160, tx_array = 0x62ad60, num_rx_ctx = {val = 0, is_initialized = 1}, num_tx_ctx = {val = 0,
is_initialized = 1}, rx_ctx_entry = {next = 0x650ad8, prev = 0x650ad8}, tx_ctx_entry = {next = 0x650760, prev = 0x650760}, info = {next = 0x62aff0,
caps = 216172782117008144, mode = 0, addr_format = 2, src_addrlen = 16, dest_addrlen = 0, src_addr = 0x62a790, dest_addr = 0x0, handle = 0x0, tx_attr = 0x62ae10,
rx_attr = 0x62ae60, ep_attr = 0x62aeb0, domain_attr = 0x62af20, fabric_attr = 0x62afc0}, ep_attr = {type = FI_EP_UNSPEC, protocol = 0, protocol_version = 0,
max_msg_size = 0, msg_prefix_size = 0, max_order_raw_size = 0, max_order_war_size = 0, max_order_waw_size = 0, mem_tag_format = 0, tx_ctx_cnt = 1, rx_ctx_cnt = 1,
auth_keylen = 0, auth_key = 0x0}, ep_type = FI_EP_MSG, src_addr = 0x62ad40, dest_addr = 0x0, msg_src_port = 0, msg_dest_port = 0, peer_fid = 0, key = 0,
is_enabled = 1, cm = {sock = 0, do_listen = 0, signal_fds = {11, 12}, next_msg_id = 0, lock = {impl = 1, is_initialized = 1}, is_connected = 0, listener_thread = 0,
msg_list = {next = 0x64ff80, prev = 0x64ff80}}, listener = {sock = 0, do_listen = 0, is_ready = 0, signal_fds = {0, 0}, listener_thread = 0,
service = '\000' <repeats 31 times>}, lock = {impl = 1, is_initialized = 1}, conn_idm = {array = {0x0 <repeats 64 times>}, count = {0 <repeats 64 times>}},
av_idm = {array = {0x69a660, 0x0 <repeats 63 times>}, count = {1, 0 <repeats 63 times>}}, cmap = {table = 0x650b50, epoll_set = {fd = 13, size = 1024, used = 0,
events = 0x664b60}, used = 0, size = 1024, lock = {impl = 1, is_initialized = 1}}}

The sock_ep_connect() source code that I was testing against is:
403 struct sock_conn *sock_ep_connect(struct sock_ep_attr *ep_attr, fi_addr_t index)
404 {
405 int conn_fd = -1, ret;
406 int do_retry = sock_conn_retry;
407 struct sock_conn *conn, *new_conn;
408 struct sockaddr_in addr;
409 socklen_t lon;
410 int valopt = 0;
411 struct pollfd poll_fd;
412
413 if (ep_attr->ep_type == FI_EP_MSG) {
414 addr = *ep_attr->dest_addr;
415 addr.sin_port = htons(ep_attr->msg_dest_port);
416 } else {
417 addr = *((struct sockaddr_in *)&ep_attr->av->table[index].addr);
418 }
419
420 do_connect:
421 fastlock_acquire(&ep_attr->cmap.lock);
422 conn = sock_ep_lookup_conn(ep_attr, index, &addr);
423 fastlock_release(&ep_attr->cmap.lock);
424
425 if (conn != SOCK_CM_CONN_IN_PROGRESS)
426 return conn;
427
428 conn_fd = socket(AF_INET, SOCK_STREAM, 0);
429 if (conn_fd == -1) {
430 SOCK_LOG_ERROR("failed to create conn_fd, errno: %d\n", errno);
431 errno = FI_EOTHER;
432 return NULL;
433 }
434
435 ret = fd_set_nonblock(conn_fd);

The test case I was using was invalid: it never called fi_connect() on the FI_EP_MSG endpoint before trying to send data. Even so, the socket provider should not segfault when a user uses Libfabric incorrectly.
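
For illustration, the kind of guard I would expect around line 414 can be sketched as below. This is only a sketch under stated assumptions, not the provider's actual code: `struct demo_ep_attr`, `demo_resolve_dest()` and the choice of `-ENOTCONN` as the error are hypothetical stand-ins, and the real `sock_ep_connect()` returns a `struct sock_conn *` rather than an int.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical, trimmed-down stand-in for struct sock_ep_attr: only the
 * fields read on the FI_EP_MSG path of sock_ep_connect() are kept. */
enum demo_ep_type { DEMO_EP_MSG, DEMO_EP_OTHER };

struct demo_ep_attr {
	enum demo_ep_type ep_type;
	struct sockaddr_in *dest_addr;  /* NULL until a peer address is known */
	unsigned short msg_dest_port;   /* host byte order */
};

/* Copy the peer address of a connected (MSG) endpoint into *out.
 * Returns 0 on success. Returns -ENOTCONN (an assumed error code) when no
 * destination address has been set, instead of dereferencing NULL. */
int demo_resolve_dest(const struct demo_ep_attr *ep_attr,
		      struct sockaddr_in *out)
{
	/* NULL arguments are a caller bug, not a runtime condition. */
	assert(ep_attr != NULL && out != NULL);

	/* Other endpoint types look the address up in the AV; not modelled. */
	if (ep_attr->ep_type != DEMO_EP_MSG)
		return -EINVAL;

	/* The check that was missing at line 414. */
	if (ep_attr->dest_addr == NULL)
		return -ENOTCONN;

	*out = *ep_attr->dest_addr;
	out->sin_port = htons(ep_attr->msg_dest_port);
	return 0;
}
```

With a check like this, a send on an unconnected FI_EP_MSG endpoint would fail with an error code that can be reported to the caller rather than crash the process.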

@tonyzinger tonyzinger changed the title prov/socket segfaults in sock_ep_connect() when ti tries to dereference dest_addr prov/socket segfaults in sock_ep_connect() when it tries to dereference dest_addr Jan 26, 2017
@dmitrygx
Member

dmitrygx commented Feb 7, 2017

under investigation by @gladkovdmitry17

dmitrygx added a commit to dmitrygx/fabtests that referenced this issue Feb 9, 2017
…it tries to

dereference dest_addr" issue
ofiwg/libfabric#2676

fi_msg_sockets: Add new test case that covers Issue #2676
Invokes fi_send when no connection is established and no destination address:port
pair is passed to fi_info

Change-Id: I1a64131eafa882b9f60a725d055ede039ad2250b
Signed-off-by: Gladkov, Dmitry <[email protected]>
dmitrygx added a commit to dmitrygx/libfabric that referenced this issue Feb 9, 2017
…nce dest_addr

ofiwg#2676

A NULL check on the destination address has been added to
sock_ep_connect. The destination address is NULL when no destination
address is passed in fi_info and no connection is established, but fi_send is
called.

Change-Id: I0e526b286360756ed4dd5732b36f29ca08ed18d4
Signed-off-by: Dmitry Gladkov <[email protected]>
@dmitrygx
Member

dmitrygx commented Feb 9, 2017

Should work in #2714

dmitrygx added a commit to dmitrygx/libfabric that referenced this issue Feb 13, 2017
…nce dest_addr

ofiwg#2676

A NULL check on the destination address has been added to
sock_ep_connect. The destination address is NULL when no destination
address is passed in fi_info and no connection is established, but fi_send is
called.

Change-Id: I0e526b286360756ed4dd5732b36f29ca08ed18d4
Signed-off-by: Dmitry Gladkov <[email protected]>
sayantansur pushed a commit to sayantansur/libfabric that referenced this issue Feb 15, 2017
…nce dest_addr

ofiwg#2676

A NULL check on the destination address has been added to
sock_ep_connect. The destination address is NULL when no destination
address is passed in fi_info and no connection is established, but fi_send is
called.

Change-Id: I0e526b286360756ed4dd5732b36f29ca08ed18d4
Signed-off-by: Dmitry Gladkov <[email protected]>
@shefty shefty closed this as completed Feb 24, 2017