Skip to content

udp: support forwarding packets between workers#13086

Merged
ggreenway merged 45 commits intoenvoyproxy:masterfrom
ggreenway:udp-worker-routing
Sep 23, 2020
Merged

udp: support forwarding packets between workers#13086
ggreenway merged 45 commits intoenvoyproxy:masterfrom
ggreenway:udp-worker-routing

Conversation

@ggreenway
Copy link
Member

@ggreenway ggreenway commented Sep 14, 2020

Signed-off-by: Greg Greenway ggreenway@apple.com

Commit Message: Also, a few fixes in active_quic_listener that were discovered while fixing
tests to work with this change:

  • Buffered CHLOs wouldn't be processed unless another readable event occurred on the socket. Fixed it so that some CHLOs are processed every event loop until the queue is empty.
  • Disabling a quic listener wouldn't immediately stop accepting new connections; it only stopped accepting connections that were buffered. The implementation would allow up to 16 connections to be accepted after disabling the listener. This made it tricky to make tests work consistently. Fixed by having also checking the runtime disable in onData().

Additional Description:
Risk Level: Medium
Testing: Fixed all existing tests; still need to add some new
Docs Changes:
Release Notes: This is mostly invisible to the user; will be used to build user-facing features later
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Deprecated:]

Also, a few fixes in active_quic_listener that were discovered while fixing
tests to work with this change.

Signed-off-by: Greg Greenway <ggreenway@apple.com>
@ggreenway
Copy link
Member Author

I took inspiration on this from the ConnectionBalancer code, since it's doing a similar operation.

@ggreenway
Copy link
Member Author

@lambdai @mattklein123 would the crash fixed in #12748 logically apply to this change as well? Eg, should I use a shared_ptr in the same way #12748 did?

Copy link
Contributor

@danzh2010 danzh2010 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice packet re-route interface! I have some thought regarding to the QUIC fix. Overall looks good!

Signed-off-by: Greg Greenway <ggreenway@apple.com>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this. Flushing out a few high level comments before diving into the detailed review.

/wait

the dispatcher for processing buffered chlos

Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM at a high level. Flushing out a few more comments. Thank you!

/wait

Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
@ggreenway
Copy link
Member Author

ci failure due to disk full

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines, to retry CircleCI checks, use /retest-circle.
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #13086 (comment) was created by @ggreenway.

see: more, trace.

@ggreenway
Copy link
Member Author

coverage error wasn't caused by this PR:

Code coverage 96.5 is good and higher than limit of 96.5
Checking per-extension coverage
Per-extension coverage failed:
Code coverage for source/server/admin is lower than limit of 95.5 (95.3)

Copy link
Contributor

@danzh2010 danzh2010 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, alone aside the renaming request. Also can you speak of #13086 (comment)?

Network::UdpPacketWriterPtr udp_packet_writer =
std::make_unique<Quic::UdpGsoBatchWriter>(io_handle, scope);
return udp_packet_writer;
#if UDP_GSO_BATCH_WRITER_PLATFORM_SUPPORT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now GSO support is determined at runtime. But GSO code can also compile at the linux kernels that doesn't support it. I think what we want here is PLATFORM_SUPPORT_POSIX or something like that.

Signed-off-by: Greg Greenway <ggreenway@apple.com>
@ggreenway
Copy link
Member Author

Also can you speak of #13086 (comment)?

Sorry, I missed that earlier. Replied.

@ggreenway
Copy link
Member Author

@mattklein123 can you give this a final pass/approval?

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks this is really neat. Just a few small things from another pass through. Thank you!

/wait

virtual ConnectionHandler::ActiveListenerPtr
createActiveUdpListener(ConnectionHandler& parent, Event::Dispatcher& disptacher,
Network::ListenerConfig& config) PURE;
createActiveUdpListener(uint32_t worker_id, ConnectionHandler& parent,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: worker_index, here and anywhere else we still have id. Also needs doc comment update.

Network::UdpListenerCallbacks* udp_listener =
dynamic_cast<ActiveUdpListenerBase*>(details.listener_.get());
ASSERT(udp_listener != nullptr);
details.typed_listener_ = *static_cast<Network::UdpListenerCallbacks*>(udp_listener);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this static cast needed? It's the type of the variable above?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That must have been added during some previous variation. Removed.

Comment on lines +62 to +63
Network::UdpListenerCallbacks* udp_listener =
dynamic_cast<ActiveUdpListenerBase*>(details.listener_.get());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the dynamic_cast needed here? This seems bad? Can you add a TODO of some type to figure out how to clean this up?

IIRC there is some mess here that we have dealt with in the past around extensions actually implementing active listener base, etc.? Is that right? It would be good to have a comment if that is the case. I know I have commented on similar things in the past. I think possible if we make QUIC/QUICHE not extension based this gets better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.

I fixed this by promoting part of ActiveUdpListenerBase to /include, so that the factory can return that type. No more casts needed. Take a look at the commit; I had to switch inheritance of ActiveListener to virtual. If we don't like that, the same can be accomplished by making ActiveListenerImplBase into a mixin template, or templatizing whether it inherits from ActiveListener or ActiveUdpListener. Either is fine with me.

Comment on lines +344 to +346
absl::variant<absl::monostate, std::reference_wrapper<ActiveTcpListener>,
std::reference_wrapper<Network::UdpListenerCallbacks>>
typed_listener_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for my understanding, why do we need the variant? Is this just because it truly is a variant and it avoids multiple variables that are XOR? Maybe just add a small comment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just a safe union

Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
needing unsafe dynamic_cast

Signed-off-by: Greg Greenway <ggreenway@apple.com>
danzh2010
danzh2010 previously approved these changes Sep 22, 2020
mattklein123
mattklein123 previously approved these changes Sep 22, 2020
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Epic, thanks for the cleanups!

@ggreenway
Copy link
Member Author

coverage error wasn't caused by this PR:

Code coverage 96.5 is good and higher than limit of 96.5
Checking per-extension coverage
Per-extension coverage failed:
Code coverage for source/server/admin is lower than limit of 95.5 (95.3)

Same on latest run

@ggreenway
Copy link
Member Author

coverage error wasn't caused by this PR:

Code coverage 96.5 is good and higher than limit of 96.5
Checking per-extension coverage
Per-extension coverage failed:
Code coverage for source/server/admin is lower than limit of 95.5 (95.3)

Same on latest run

Actually, it looks like the stub I added is counting against coverage:

    Network::UdpListenerWorkerRouterOptRef udpListenerWorkerRouter() override {
      NOT_REACHED_GCOVR_EXCL_LINE;
    }

Signed-off-by: Greg Greenway <ggreenway@apple.com>
@ggreenway
Copy link
Member Author

asan failure says (killed) on.a compilation step, so I think it's some resource exhaustion, not a real failure caused by this PR. @mattklein123 can you approve one more time?

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines, to retry CircleCI checks, use /retest-circle.
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #13086 (comment) was created by @ggreenway.

see: more, trace.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants