Skip to content

quic: Connect client UDP sockets before usage#35652

Merged
RyanTheOptimist merged 22 commits intoenvoyproxy:mainfrom
abeyad:udp_sockets
Aug 30, 2024
Merged

quic: Connect client UDP sockets before usage#35652
RyanTheOptimist merged 22 commits intoenvoyproxy:mainfrom
abeyad:udp_sockets

Conversation

@abeyad
Copy link
Contributor

@abeyad abeyad commented Aug 9, 2024

This change does a few things:

  1. Calls connect() on the QUIC UDP sockets, which allows the connection to receive async ICMP error messages on recvmsg calls throughout the lifetime of the connection.
  2. Does not call bind() on a QUIC client UDP socket unless the local address is specified.
  3. If the socket is already connected, call writev/send instead of sendmsg. Connected sockets should not specify a peer address and hence, writev/send is more appropriate.
  4. Previously, when EnvoyQuicClientConnection did preferred server address probing, it specified the existing remote address, not the new preferred server address. This still worked because the sendmsg call took in the actual destination IP address. However, when calling connect(), this caused the socket to connect to the wrong address. This issue is fixed in this PR by creating a socket connection to the preferred server address.
  5. Uses the loopback address for the local address, if one isn't specified, as it is only needed in createConnectionSocket to get an IoHandle. This avoids having to make the more expensive getifaddrs call, as mentioned in Network::Utility::getLocalAddress performance is bad in multi-threads #35137.
  6. The connect() behavior is guarded by the runtime guard envoy.reloadable_features.quic_connect_client_udp_sockets, defaulted to true.

Fixes #35137

This change does a few things:
  1. Calls `connect()` on the QUIC UDP sockets, which allows the socket
     to know if the peer address is routable earlier.
  2. Does not call `bind()` on a QUIC client UDP socket. It's incorrect
     to do so.
  3. If the socket is already connected, call writev/send instead of
     sendmsg. Connected sockets should not specify a peer address and
     hence, writev/send is more appropriate.

Signed-off-by: Ali Beyad <abeyad@google.com>
@abeyad abeyad requested review from kyessenov and soulxu as code owners August 9, 2024 20:54
@abeyad abeyad marked this pull request as draft August 9, 2024 20:55
@danzh2010
Copy link
Contributor

  1. Calls connect() on the QUIC UDP sockets, which allows the socket to know if the peer address is routable earlier.

Not really, we saw connect() succeeded but following write cause "no route to host". The reason we want to connect() here is that connected UDP socket will surface async socket error received from ICMP messages.

abeyad added 5 commits August 13, 2024 14:22
Signed-off-by: Ali Beyad <abeyad@google.com>
Signed-off-by: Ali Beyad <abeyad@google.com>
Signed-off-by: Ali Beyad <abeyad@google.com>
Signed-off-by: Ali Beyad <abeyad@google.com>
Signed-off-by: Ali Beyad <abeyad@google.com>
@abeyad abeyad marked this pull request as ready for review August 19, 2024 22:51
@abeyad
Copy link
Contributor Author

abeyad commented Aug 19, 2024

/assign @danzh2010

@abeyad abeyad changed the title udp: Connect QUIC sockets before usage quic: Connect UDP sockets before usage Aug 19, 2024
@abeyad abeyad changed the title quic: Connect UDP sockets before usage quic: Connect client UDP sockets before usage Aug 19, 2024
Signed-off-by: Ali Beyad <abeyad@google.com>
@adisuissa
Copy link
Contributor

Alyssa is OOO this week. Assigning Ryan who has more QuIC context.
/assign @RyanTheOptimist

Copy link
Contributor

@danzh2010 danzh2010 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! Just a few nits!

abeyad added 2 commits August 20, 2024 14:59
Signed-off-by: Ali Beyad <abeyad@google.com>
Signed-off-by: Ali Beyad <abeyad@google.com>
@abeyad
Copy link
Contributor Author

abeyad commented Aug 22, 2024

@danzh2010 I pushed up new changes that I believe fixes all the tests. PTAL!

c
Signed-off-by: Ali Beyad <abeyad@google.com>
abeyad added 3 commits August 23, 2024 18:44
Signed-off-by: Ali Beyad <abeyad@google.com>
Signed-off-by: Ali Beyad <abeyad@google.com>
Signed-off-by: Ali Beyad <abeyad@google.com>
@abeyad abeyad requested a review from mattklein123 as a code owner August 23, 2024 19:03
Copy link
Contributor Author

@abeyad abeyad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also fixed some more broken tests. PTAL, thanks!

@abeyad
Copy link
Contributor Author

abeyad commented Aug 23, 2024

/retest

Signed-off-by: Ali Beyad <abeyad@google.com>
@abeyad
Copy link
Contributor Author

abeyad commented Aug 24, 2024

/retest

abeyad added 2 commits August 24, 2024 15:51
Signed-off-by: Ali Beyad <abeyad@google.com>
Signed-off-by: Ali Beyad <abeyad@google.com>
Copy link
Contributor Author

@abeyad abeyad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed yesterday's comments, thanks!

Signed-off-by: Ali Beyad <abeyad@google.com>
@abeyad
Copy link
Contributor Author

abeyad commented Aug 27, 2024

/retest

@abeyad
Copy link
Contributor Author

abeyad commented Aug 27, 2024

@danzh2010 friendly ping, thanks!

Copy link
Contributor

@RyanTheOptimist RyanTheOptimist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, have we tried to use EM with this PR to make sure it works as expected?

Also, should we add a runtime guard since this changes behavior.

Network::Socket::Type::Datagram,
// Use the loopback address if `local_addr` is null, to pass in the socket interface used to
// create the IoHandle, without having to make the more expensive `getifaddrs` call.
local_addr ? local_addr : getLoopbackAddress(peer_addr->ip()->version()), peer_addr,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious how this local address is used. In the server case, we need a local address 'cause that's what we listen. But in the normal POSIX client socket case, a caller typically does not specify a local address. Do you understand how this is used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, the local address is needed so we can get a SocketInterface instance because it is needed to create an IoHandle instance. If we don't supply an address here, we segfault in creating the IoHandle in the SocketImpl constructor. Dan and I discussed this and we think the setup of SocketImpl isn't correct - we should be able to use a SocketInterface independent of an Address instance. But that's a change for a separate PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. That makes sense, but it makes me nervous. Have we tried running this PR against an external server (using HTTP/3)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should only affect upstream traffic. And if we hide it behind a runtime feature, it should be fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have tested it on a mobile app connecting to a H/3 server, but for added safety, I added a runtime guard.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent. Can you also mention that in the PR description and update the release notes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

abeyad added 2 commits August 27, 2024 20:04
Signed-off-by: Ali Beyad <abeyad@google.com>
Signed-off-by: Ali Beyad <abeyad@google.com>
@abeyad
Copy link
Contributor Author

abeyad commented Aug 28, 2024

/retest

1 similar comment
@abeyad
Copy link
Contributor Author

abeyad commented Aug 28, 2024

/retest

@danzh2010
Copy link
Contributor

  1. Calls connect() on the QUIC UDP sockets, which allows the connection to know if the socket is not routable upon the first recvmsg call via an async ICMP message.

"No route to host" can be reported synchronously during connect() as we observed in the tests. But we want connected UDP socket overall because it surfaces any async errors (from ICMP messages) via following recvmsg through out the connection life time.

Copy link
Contributor

@RyanTheOptimist RyanTheOptimist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good!

Network::Socket::Type::Datagram,
// Use the loopback address if `local_addr` is null, to pass in the socket interface used to
// create the IoHandle, without having to make the more expensive `getifaddrs` call.
local_addr ? local_addr : getLoopbackAddress(peer_addr->ip()->version()), peer_addr,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. That makes sense, but it makes me nervous. Have we tried running this PR against an external server (using HTTP/3)?

@abeyad
Copy link
Contributor Author

abeyad commented Aug 28, 2024

  1. Calls connect() on the QUIC UDP sockets, which allows the connection to know if the socket is not routable upon the first recvmsg call via an async ICMP message.

"No route to host" can be reported synchronously during connect() as we observed in the tests. But we want connected UDP socket overall because it surfaces any async errors (from ICMP messages) via following recvmsg through out the connection life time.

Updated the PR description

Signed-off-by: Ali Beyad <abeyad@google.com>
@repokitteh-read-only
Copy link

CC @envoyproxy/runtime-guard-changes: FYI only for changes made to (source/common/runtime/runtime_features.cc).

🐱

Caused by: #35652 was synchronize by abeyad.

see: more, trace.

Signed-off-by: Ali Beyad <abeyad@google.com>
@abeyad
Copy link
Contributor Author

abeyad commented Aug 29, 2024

/retest

Copy link
Contributor

@danzh2010 danzh2010 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this!

@abeyad
Copy link
Contributor Author

abeyad commented Aug 30, 2024

Thanks for the great reviews, @danzh2010 and @RyanTheOptimist !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Network::Utility::getLocalAddress performance is bad in multi-threads

4 participants