rdma: (fix) memory leak: don't alloc recv_conn_resp_req on EAGAIN
Move the call to `prepare_recv_conn_resp_req` before the entry point of
the `COMM_SEND_CONN` stage.

This resolves a potential memory leak: the `COMM_SEND_CONN` stage is
re-entered each time the send returns EAGAIN, so
`prepare_recv_conn_resp_req` was called, and allocated a new request,
on every retry.

Also fix up a similar case in `accept`. No memory was allocated there,
so nothing leaked.

Signed-off-by: Eric Raut <[email protected]>
(cherry picked from commit 398b853)
rauteric authored and rajachan committed Apr 14, 2024
1 parent 6969756 commit ed1861e
Showing 1 changed file with 8 additions and 8 deletions.
16 changes: 8 additions & 8 deletions src/nccl_ofi_rdma.c
@@ -3790,10 +3790,6 @@ static int accept(nccl_net_ofi_listen_comm_t *listen_comm,
 /* Reset request state for connect response message */
 prepare_send_conn_resp_req(l_comm);

-l_comm->stage = COMM_SEND_CONN;
-
-case COMM_SEND_CONN:
-
 /* Initialize connect response message */
 ret = prepare_conn_resp(ep, l_comm, dev_id);
 if (ret != 0) {
@@ -3806,6 +3802,10 @@ static int accept(nccl_net_ofi_listen_comm_t *listen_comm,
 /* Send r_comm's remote comm ID */
 conn_msg->remote_comm_id = r_comm->remote_comm_id;

+l_comm->stage = COMM_SEND_CONN;
+
+case COMM_SEND_CONN:
+
 /* COMM_SEND_CONN: Send connect response message to remote */
 ret = post_send_conn_resp(r_comm, conn_msg, device, ep, req);
 if (ret == -FI_EAGAIN) {
@@ -5159,17 +5159,17 @@ static int connect(nccl_net_ofi_ep_t *base_ep,
 }
 comm_state->req = &req->base;

-comm_state->stage = COMM_SEND_CONN;
-
-case COMM_SEND_CONN:
-
 /* Prepare request to receive connect response message */
 s_comm->conn_resp_req = prepare_recv_conn_resp_req(s_comm);
 if (OFI_UNLIKELY(s_comm->conn_resp_req == NULL)) {
 send_close(s_comm);
 return -EINVAL;
 }

+comm_state->stage = COMM_SEND_CONN;
+
+case COMM_SEND_CONN:
+
 /* COMM_SEND_CONN: Post a connect message to send peer connections */
 ret = post_send_conn(s_comm, device, ep, req);
 if (ret == -FI_EAGAIN) {
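
The effect of moving the allocation above the `case COMM_SEND_CONN:` label can be shown with a small stand-alone model of a resumable, stage-based function. This is only a sketch under stated assumptions, not the plugin's code: `struct conn`, `fake_post_send`, `fake_prepare_req`, `connect_step`, and `count_allocs` are hypothetical names, plain `EAGAIN` stands in for `FI_EAGAIN`, and the "allocation" is just a counter so the model itself leaks nothing.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stages, loosely modeled on the stage named in the diff. */
enum stage { STAGE_START, STAGE_SEND_CONN, STAGE_DONE };

struct conn {
    enum stage stage;
    int resp_req_id;   /* stands in for s_comm->conn_resp_req */
    int allocs;        /* how many times a request was "allocated" */
    int eagain_left;   /* how many times the fake send reports EAGAIN */
};

/* Fake send: returns -EAGAIN a fixed number of times, then succeeds. */
static int fake_post_send(struct conn *c)
{
    if (c->eagain_left > 0) {
        c->eagain_left--;
        return -EAGAIN;
    }
    return 0;
}

/* Stand-in for the allocating prepare call: counts and returns a new id. */
static int fake_prepare_req(struct conn *c)
{
    c->allocs++;
    return c->allocs;
}

/*
 * One call of the resumable state machine. The caller re-invokes it after
 * -EAGAIN, and the switch resumes at the saved stage.
 * alloc_before_label != 0 selects the ordering after the fix.
 */
static int connect_step(struct conn *c, int alloc_before_label)
{
    int ret;

    switch (c->stage) {
    case STAGE_START:
        if (alloc_before_label)
            c->resp_req_id = fake_prepare_req(c); /* runs exactly once */
        c->stage = STAGE_SEND_CONN;
        /* fall through */
    case STAGE_SEND_CONN:
        if (!alloc_before_label)
            /* Old ordering: runs on every retry and overwrites the
             * previous request, which is then unreachable. */
            c->resp_req_id = fake_prepare_req(c);
        ret = fake_post_send(c);
        if (ret == -EAGAIN)
            return ret; /* stage stays STAGE_SEND_CONN for the retry */
        c->stage = STAGE_DONE;
        /* fall through */
    case STAGE_DONE:
        return 0;
    }
    return -EINVAL;
}

/* Retry until completion and report how many requests were allocated. */
int count_allocs(int alloc_before_label, int eagain_count)
{
    struct conn c = { STAGE_START, 0, 0, eagain_count };
    int ret;

    do {
        ret = connect_step(&c, alloc_before_label);
    } while (ret == -EAGAIN);
    assert(ret == 0 && c.stage == STAGE_DONE);
    return c.allocs;
}
```

In this model, `count_allocs(0, 3)` reports 4 allocations, of which only the last is still referenced, while `count_allocs(1, 3)` reports 1. With no EAGAIN both orderings allocate once, which is why the leak only shows up under retries.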