Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

msg deserialization error causes peer thread to loop continuously #2793

Closed
antiochp opened this issue May 2, 2019 · 4 comments
Closed

msg deserialization error causes peer thread to loop continuously #2793

antiochp opened this issue May 2, 2019 · 4 comments
Labels

Comments

@antiochp
Copy link
Member

antiochp commented May 2, 2019

We fail to deserialize a p2p msg if the msg_len exceeds a defined max_len (per msg type).

If this happens we have read the 11 bytes of the msg header off the stream but we have not read the msg body itself.

We require every msg to begin with a pair of "magic bytes" that indicate the start of a msg header.

  • we attempt to deserialize the msg header
    • 11 bytes (fixed size) read from stream
    • deserialization error due to msg_len exceeding max_len
  • try_break! loops again
  • we then attempt to deserialize the next msg header
    • 11 bytes (fixed size) read from stream
    • the magic bytes will not match as these 11 bytes are not actually a msg header
  • try_break! is now effectively looping looking for a valid 11 byte chunk starting with magic bytes

If the next bytes are not the magic bytes then we are going to read chunks of 11 bytes at a time and if we are misaligned by even one byte we will not identify the next msg header correctly.

In fact this is not limited to the msg_len check. Any error occurring during deserialization of the msg header or the msg itself will cause this.

Is my understanding here correct?

  • Exceeding max_len is treated as a serialization error
  • We do not drop the connection on serialization errors
  • Our msg polling logic will loop again on a serialization error
  • Once we are out of alignment" we are effectively stuck reading garbage off the stream

A couple of alternative approaches -

  1. If we don't read a full msg successfully we need to drop the connection. We cannot rely on the next bytes read being the next pair of magic bytes.
  2. Alternatively - consider adding some logic to read (and discard) bytes until we encounter the next pair of magic bytes? I'm not sure if this is an approach often used in practice?

Related #2791.

@antiochp antiochp added the bug label May 2, 2019
@antiochp
Copy link
Member Author

antiochp commented May 2, 2019

cc @hashmap

@antiochp
Copy link
Member Author

antiochp commented May 2, 2019

False alarm.
The internal logic in try_break! does not suppress ser::Error and will close the connection -

grin/p2p/src/conn.rs

Lines 47 to 64 in b9db129

// Macro to simplify the boilerplate around async I/O error handling,
// especially with WouldBlock kind of errors.
macro_rules! try_break {
($chan:ident, $inner:expr) => {
match $inner {
Ok(v) => Some(v),
Err(Error::Connection(ref e)) if e.kind() == io::ErrorKind::WouldBlock => None,
Err(Error::Store(_))
| Err(Error::Chain(_))
| Err(Error::Internal)
| Err(Error::NoDandelionRelay) => None,
Err(e) => {
let _ = $chan.send(e);
break;
}
}
};
}

@antiochp
Copy link
Member Author

antiochp commented May 2, 2019

This stuff is really confusing - we put an error on the error channel in the above case (the connection close logic is asynchronous).

This is then picked up in peer.check_connection() that reads off the error channel.

But the logic here does subtly different things for a serialization error vs. any other kind of error -

grin/p2p/src/peer.rs

Lines 416 to 463 in b9db129

fn check_connection(&self) -> bool {
let connection = match self.connection.as_ref() {
Some(conn) => conn.lock(),
None => return false,
};
match connection.error_channel.try_recv() {
Ok(Error::Serialization(e)) => {
let need_stop = {
let mut state = self.state.write();
if State::Banned != *state {
*state = State::Disconnected;
true
} else {
false
}
};
if need_stop {
debug!(
"Client {} corrupted, will disconnect ({:?}).",
self.info.addr, e
);
stop_with_connection(&connection);
}
false
}
Ok(e) => {
let need_stop = {
let mut state = self.state.write();
if State::Disconnected != *state {
*state = State::Disconnected;
true
} else {
false
}
};
if need_stop {
debug!("Client {} connection lost: {:?}", self.info.addr, e);
stop_with_connection(&connection);
}
false
}
Err(_) => {
let state = self.state.read();
State::Connected == *state
}
}
}
}

Specifically - we will close a connection for a serialization error unless the peer is banned.
Whereas we will close a connection for any other kind of error unless the peer is disconnected.

We will not close a connection for a non-serialization error if the peer is banned.
Edit:

  • If the error is a serialization error and the peer is not currently banned then we will update the state to disconnected and close the connection.
  • If the error is a non-serialization error and the peer is not currently disconnected (it could be banned) then set the state to disconnected and close the connection.

i.e. We will unban a currently banned peer as part of disconnecting it if we encounter a non-serialization error here.

This does not immediately seem correct to me.
This seems clearly wrong.

@antiochp
Copy link
Member Author

This is resolved now I think (went down various related rabbit holes).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant