Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use sender thread consistently to send msgs to a peer #3067

Merged
merged 1 commit into from
Oct 7, 2019

Conversation

antiochp
Copy link
Member

@antiochp antiochp commented Sep 30, 2019

We currently send p2p messages two different ways -

  • broadcast msgs to peers via the peer_write thread
  • receive msgs via the peer_read thread then respond with a msg (but still via peer_read thread)

Each peer has a dedicated mpsc queue for sending messages out via the peer_write thread. This is polled regularly, taking a message from the queue and sending it out to the peer.

But message responses are not sent via this queue and instead sent out immediately, using the same peer_read thread used to receive the original request from the peer.

This PR reworks the message response implementation to use the mpsc queue as follows -

  • receive a msg from a peer on peer_read thread
  • if we want to send a response, push it onto the mpsc queue
  • the peer_write thread will then pull this response msg off the queue and send it

The peer_read thread reads from the tcp stream and potentially writes to the mpsc queue.
The peer_write thread reads from the mpsc queue and writes to the tcp stream.

Having two threads per peer both potentially sending messages out via the same tcp connection is potentially asking for trouble.
Most messages are "safe" in this situation as we simply create a single buffer and then use a single stream.write_all() to send the entire message or none of the message.
But this is not the case for messages with attachments, specifically a response to a txhashset request where we send back the txhashset archive to the peer. This involves multiple calls to stream.write_all() and I think there is a risk of the multiple threads effectively corrupting the data here with interleaved messages.

The way to solve this is to funnel all messages back out via the same peer_write thread.

  • Introduce a Msg struct to represent messages on the mpsc queue.
  • Send messages out by building msg instances and pushing them to the queue.
  • Send responses out by building msg instances and pushing them onto the same queue.
  • The peer_write thread polls this queue (as it does currently) and sends to the peer.

These changes had the nice side-effect of removing some duplication as we no longer need to manage message responses as a separate code path.
We also reduced potentially blocking activity on the peer_read threads, which should allow a node to receive messages from peers with less latency and process them faster.

@antiochp antiochp added this to the 3.0.0 milestone Sep 30, 2019
@antiochp antiochp force-pushed the send_via_sender_thread branch from 7fde1cb to f9a3a57 Compare September 30, 2019 20:05
@antiochp antiochp force-pushed the send_via_sender_thread branch from f9a3a57 to ffd9306 Compare October 4, 2019 11:04
@antiochp antiochp changed the title [DNM] Use sender thread consistently to send msgs to a peer Use sender thread consistently to send msgs to a peer Oct 4, 2019
@antiochp antiochp requested a review from hashmap October 4, 2019 11:11
Copy link
Member

@yeastplume yeastplume left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes absolute sense to me, were there observed cases of a peer 'corrupting' the write stream with other messages when sending back the txhashset, or just a suspicion this might be happening in some cases?

@antiochp
Copy link
Member Author

antiochp commented Oct 7, 2019

Makes absolute sense to me, were there observed cases of a peer 'corrupting' the write stream with other messages when sending back the txhashset, or just a suspicion this might be happening in some cases?

Mainly just a suspicion. But we still have some unexplained failures during txhashset receipt which_may_ be related to this. I'm not 100% sure though.

@antiochp antiochp merged commit a3f3fc2 into mimblewimble:master Oct 7, 2019
@antiochp antiochp deleted the send_via_sender_thread branch October 7, 2019 15:22
@antiochp antiochp mentioned this pull request Dec 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants