network: connection deduplication#4695
Conversation
Now, via headers passed between peers, an Identity Challenge is passed between peers. This object is used for cryptographic identification, currently for connection deduplication.
Also clean up debug logging lines and reorganize code slightly.
brianolson
left a comment
There was a problem hiding this comment.
I should re-read netidentity.go more carefully, but a couple early notes
Codecov Report
@@ Coverage Diff @@
## master #4695 +/- ##
==========================================
+ Coverage 53.42% 53.62% +0.19%
==========================================
Files 431 432 +1
Lines 54372 54541 +169
==========================================
+ Hits 29050 29249 +199
+ Misses 23062 23039 -23
+ Partials 2260 2253 -7
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
brianolson
left a comment
There was a problem hiding this comment.
LGTM. I think that got everything.
like prioScheme, only initialize identityScheme if it is enabled
iansuvak
left a comment
There was a problem hiding this comment.
Looks good to me in general, my comments and questions are minor and not-blocking
| // as a header. It returns the identityChallengeValue used for this challenge, so the network can | ||
| // confirm it later (by passing it to VerifyResponse), or returns an empty challenge if dedupName is | ||
| // not set. | ||
| func (i identityChallengePublicKeyScheme) AttachChallenge(attach http.Header, addr string) identityChallengeValue { |
There was a problem hiding this comment.
nit, feel free to drop: I'd prefer the attach variable name to be something like attachTo or baseHeaders. To me attach is ambiguous and the first parse of it sounds like the attachment itself rather than the target.
There was a problem hiding this comment.
Easy enough! consider it done.
| err := protocol.Decode(message.Data, &msg) | ||
| if err != nil { | ||
| networkPeerIdentityError.Inc(nil) | ||
| peer.net.log.With("err", err).With("remote", peer.OriginAddress()).With("local", localAddr).Warn("peer identity verification could not be decoded, disconnecting") |
There was a problem hiding this comment.
Would it make sense to add a short named field for reason instead of having that information in the Message string itself?
It's a low cardinality value and it would make looking at the breakdown of reasons for identity related disconnects easier without breaking out the telemetry metric
There was a problem hiding this comment.
When a wsNetwork disconnects a peer, it has either the exported Disconnect(peer) method, or the internal disconnect(peer, reason) method. Since we are using the internal, I added a reason "disconnectDuplicateConnection". So, the accounting should already be handled.
Plus, all non-connections (due to duplicate or error) trigger metric counter increments.
There was a problem hiding this comment.
that made me think, what do you think of changing all the Disconnect(peer) calls added in this file where we disconnect due to identity failures to use a new disconnectReason like disconnect(peer, disconnectBadIdentityData)?
There was a problem hiding this comment.
That's partially what I was thinking but even more granular on the reason for what went wrong in the identity. It's findable right now though via the message field in Elastic Search though so that's good enough.
| ) | ||
|
|
||
| // The following msgp objects are implemented in this file: | ||
| // disconnectReason |
There was a problem hiding this comment.
looks like it's not hurting but do we actually need to generate methods for this type or is it safe to msgp:ignore it?
There was a problem hiding this comment.
No clue, my goal was to get msgp working for the networking package. As others start using msgp in networking, they'll need to make sure the generated files work as intended for them.
There was a problem hiding this comment.
My guess is not if you are not using it and network didn't run msgp:generate prior to your commit. nbd but you can msgp:ignore it which will cut down a few tests and remove unnecessary interface implementation
There was a problem hiding this comment.
Agreed, disconnectReason should be internal only and can be ignored
| // will prevent this connection from attempting in the first place | ||
| // in the real world, that isConnectedTo doesn't always trigger, if the hosts are behind | ||
| // a load balancer or other NAT | ||
| if _, ok := netA.tryConnectReserveAddr(addrB); ok || true { |
There was a problem hiding this comment.
You're testing for ok || true here but I assume this really means require.True(t, ok)?
There was a problem hiding this comment.
Actually it means you don't need to call tryConnectReserveAddr at all, right? Since the A->B conn is already connected, ok will always be false
There was a problem hiding this comment.
That's right. I wrote it like this for a couple reasons:
-
All the tests I wrote that use tryConnect attempt a
tryConnectReserveAddrbecause it's how tryConnect gets used in the wsNetwork. In this case I really don't care about the value, but I would rather explicitly overload it and demonstrate/explain the difference than simply not include it. -
At the original time of writing, I was unsure of the potential side effects that
tryConnectReserveAddrwould have on the upcomingtryConnect, so I wanted to make sure it matched the real-world code.
There was a problem hiding this comment.
In this case it seems like you would want to assert that ok is false to assert the behavior that tryConnectReserveAddr wouldn't let you connect twice to the same address, and that it didn't lose track of the first conn's address.
have mockIdentityTracker wrap real identityTracker
Identity tests require a full HTTP to Web-Socket peering, and are then checked for expected for the expected peerings. This means some time is required to let the connection fully settle, and let each network's readLoop for the peer find potentially closed connections. Also removes a duplicate test case
| version: matchingVersion, | ||
| identity: peerID, | ||
| identityChallenge: peerIDChallenge, | ||
| identityVerified: 0, |
There was a problem hiding this comment.
nit: identityVerified is zero so it doesn't need to be listed here (and the other wsPeer struct you fill out an identity: peerID below doesn't have the zero value initialized)
| const maxAddressLen = 256 + 32 // Max DNS (255) + margin for port specification | ||
|
|
||
| // identityChallengeValue is 32 random bytes used for identity challenge exchange | ||
| type identityChallengeValue [32]byte |
There was a problem hiding this comment.
nit
| type identityChallengeValue [32]byte | |
| type identityChallengeValue crypto.Digest |
| return identityChallengeValue{}, crypto.PublicKey{}, err | ||
| } | ||
| if !idChal.Verify() { | ||
| return identityChallengeValue{}, crypto.PublicKey{}, fmt.Errorf("identity challenge incorrectly signed") |
There was a problem hiding this comment.
use named error created with errors.New(...)
we are trying to eliminate errors literals and new code should not add more
| return crypto.PublicKey{}, []byte{}, err | ||
| } | ||
| if resp.Msg.Challenge != c { | ||
| return crypto.PublicKey{}, []byte{}, fmt.Errorf("challenge response did not contain originally issued challenge value") |
| return crypto.PublicKey{}, []byte{}, fmt.Errorf("challenge response did not contain originally issued challenge value") | ||
| } | ||
| if !resp.Verify() { | ||
| return crypto.PublicKey{}, []byte{}, fmt.Errorf("challenge response incorrectly signed ") |
| if err != nil { | ||
| networkPeerIdentityError.Inc(nil) | ||
| peer.net.log.With("err", err).With("remote", peer.OriginAddress()).With("local", localAddr).Warn("peer identity verification could not be decoded, disconnecting") | ||
| peer.net.disconnect(peer, disconnectBadIdentityData) |
There was a problem hiding this comment.
why disconnecting right here instead of returning Action=Disconnect as other handler do?
| if ok { | ||
| url, err := url.Parse(addr) | ||
| if err == nil { | ||
| wn.config.PublicAddress = fmt.Sprintf("%s:%s", url.Hostname(), url.Port()) |
There was a problem hiding this comment.
modifying config that has constant/read only semantic in random parts of codebase is not good at all. Why not a local var since all the usage of it is right here?
Summary
What
Introduces "Identity Challenge" exchange during peering. This is a process in which two peers exchange signed challenges to register one another with public keys that validate their identities. This validated identity is then used as the mechanism to prevent duplicate and bidirectional connections between peers on the network.
Why
Today, Algorand nodes connect outwardly to a number of peers on the network (default of 4). Peer selection strategies within the node will regularly drop the worst connection, and seek a random new peer. This means that over time, nodes will select highly performant connections. However, it also means the likelihood of peers connecting to one another bidirectionally is high, since highly performant connections would tend to be mutually preferred.
Why can't we simply deduplicate against addresses, Telemetry ID, or other basic node information? Because Algorand is distributed, nodes need a way to trust the identity being presented as secure. A bad faith actor could impersonate a node's identity, and "claim" connections to all relay peers, which is a denial of service vector. Instead, we need to be able to validate a connection against a secret identity on the node. A cryptographic solution is the obvious one, where peers exchange signed messages.
How
From the package comment:
Additional Testing Feature:
Public Address: autoNote: use only for local networks and performance tests.
Thank you very much @brianolson for this idea -- in order to enable IdentityExchange, nodes must have set their PublicAddress to what they expect the DNS knows them to be. This can be tricky in test scenarios (or even live scenarios) where the exact Public Address wouldn't be easily determined -- if "auto" is specified, the PublicAddress value is set once the listener is up. This allows for simpler tests, and could also have utility for node operators who maybe don't want to explicitly set Public Address.
Test Plan
Unit Tests:
Identity Challenges and Responses:
Identity Tracker Tests:
wsNetwork
Manual Testing
(last done Jan17)
tryConnect. If the receiver does not have a ConectionDeduplicationName, the header payload is never even checked, and all peering continues without identity, as expected.setPeersByIDmethod to return false (indicating the key is in use) and observed that the connection is disconnected at the point that the peer has a verified identity, as expected.Potential outcomes: