listener manager: implement graceful draining using a local drain manager #1219

Merged
mattklein123 merged 6 commits into master from listener_draining on Jul 10, 2017
Conversation

@mattklein123 (Member):

listener manager: implement graceful draining using a local drain manager

Listeners now drain as part of the removal/update process using the
server-configured drain time options. drainClose() decisions are the union of
the local drain manager and the global drain manager.

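The "union" decision described in the commit message can be sketched as a logical OR of the two managers' drainClose() results. A minimal sketch, with illustrative names (`FakeDrainManager`, `effectiveDrainClose`) rather than Envoy's actual API:

```cpp
// Minimal stand-in for Envoy's DrainDecision interface (illustrative only).
struct DrainDecision {
  virtual ~DrainDecision() = default;
  virtual bool drainClose() const = 0;
};

struct FakeDrainManager : DrainDecision {
  bool draining = false;
  bool drainClose() const override { return draining; }
};

// The listener's effective decision is the union of the local (per-listener)
// and global (server-wide) drain managers: either one draining is enough to
// start closing connections.
inline bool effectiveDrainClose(const DrainDecision& local,
                                const DrainDecision& global) {
  return local.drainClose() || global.drainClose();
}
```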
@mattklein123 (Member Author):

@lyft/network-team @htuch @dnoe

@htuch I fixed your comments from the other PR here.

@dnoe (Contributor) left a comment:

Looks good. A few comments/questions.

/**
* @return TRUE if the manager is currently draining connections.
* Invoked to begin the drain procedure. (Making drain close operations more likely).
* @param completion supplies the completion that will be called when the drain sequence is
Contributor:

Looks like this is optional (it could be an unassigned std::function), judging from the code below. Might be useful to call that out here.
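Since a std::function can be left unassigned, a caller-safe implementation would guard the callback before invoking it. A sketch of that pattern (not the actual Envoy code; the function name is made up):

```cpp
#include <functional>

// Sketch: start a drain sequence with an optional completion callback.
// A default-constructed std::function is falsy, so guard before invoking.
inline void startDrainSequenceSketch(std::function<void()> completion) {
  // ... begin draining (elided) ...
  if (completion) {
    completion();  // only called when a callback was actually supplied
  }
}
```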

// TODO(mattklein123): Real listener draining with a per listener drain manager.
// TODO(mattklein123): Detect runtime worker listener addition failure and handle.
// TODO(mattklein123): Consider getting rid of pre-worker start and post-worker start code by
// initializing all listeners after workers are started. This is related to correctly dealing with
Contributor:

Do you want the text to line up? I don't care either way.

Member Author:

I prefer lining up, but I think when all the TODOs got reformatted a while ago the alignment was removed. I will line them up again.

Upstream::ClusterManager& clusterManager() override { return parent_.server_.clusterManager(); }
Event::Dispatcher& dispatcher() override { return parent_.server_.dispatcher(); }
DrainManager& drainManager() override { return parent_.server_.drainManager(); }
Network::DrainDecision& drainManager() override { return *this; }
Contributor:

This is a bit confusing since there is also a local drain manager. Could the member be renamed drainDecision()?

@mattklein123 (Member Author) Jul 7, 2017:

I thought about renaming it, but at the time I believed it would be a breaking change. Now I realize the whole factory context mechanism is new in 1.4.0, so I will change it.

@@ -37,12 +40,15 @@ TEST(DrainManagerImplTest, All) {
// Test drain sequence.
Event::MockTimer* drain_timer = new Event::MockTimer(&server.dispatcher_);
Contributor:

This looks like it leaks memory. Not a big deal for the test but can it be unique_ptr?

@mattklein123 (Member Author) Jul 7, 2017:

It doesn't leak. This is a pretty common pattern in tests where the code under test needs to own a unique_ptr but the test needs access to the underlying object, so we keep the raw pointer. (This timer gets wrapped in a unique_ptr and returned to the code under test.)
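The pattern being described can be sketched as follows. `FakeTimer` and `TimerUser` are stand-ins for the mock and the code under test, not the real classes:

```cpp
#include <memory>
#include <utility>

// Stand-in for a mock allocated with `new` in a test.
struct FakeTimer {
  bool enabled = false;
  void enableTimer() { enabled = true; }
};

// Stand-in for the code under test, which takes ownership via unique_ptr.
struct TimerUser {
  std::unique_ptr<FakeTimer> timer;
  void adopt(std::unique_ptr<FakeTimer> t) {
    timer = std::move(t);  // ownership now lives here
    timer->enableTimer();
  }
};
```

Deletion happens when TimerUser's unique_ptr is destroyed, so nothing leaks even though the test keeps the raw pointer around for expectations.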

Contributor:

Got it, so the deletion actually happens when that unique_ptr is destroyed.

Member:

@alyssawilk who had a question on the Envoy-ish pattern for this form of memory management the other day.

Member Author:

If you would like me to document this somewhere in the style guide, let me know. It allows us to use unique_ptr in product code whenever possible at the expense of making some of the memory management in the tests a bit strange, which IMO is a good compromise.

Contributor:

I think it would be good to document in the style guide since this explicitly departs from the usual unique_ptr hygiene. Agree it makes total sense for test code once I got the explanation.

std::string listener_foo_json = R"EOF(
{
"name": "foo",
"address": "tcp://127.0.0.1:1234",
Contributor:

Will this test run in an IPv6 only environment?

Member Author:

Yes, the tests use mock sockets.

@mattklein123 (Member Author):

@dnoe updated

htuch previously approved these changes Jul 9, 2017
@htuch (Member) left a comment:

LGTM, some minor comments.

* more likely).
*/
virtual void startDrainSequence() PURE;
virtual void startDrainSequence(std::function<void()> completion) PURE;
Member:

Do we drain globally on each LDS update? This seems reasonable given the current semantics (each LDS refresh is an update of the complete set of listeners), but won't this also preclude the possibility of skipping the drain on listeners that aren't being modified (i.e. they have the same config hash)?

Member:

Oh, I see, we create a new drain manager per listener as well as the singleton per-server. Makes sense (could be called out in some comment as well).
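The structure described here, one server-wide singleton plus a locally created drain manager per listener, might be sketched like this (names like `ServerSketch` are illustrative, not Envoy's actual classes):

```cpp
#include <memory>
#include <vector>

struct SimpleDrainManager {
  bool draining = false;
  void startDrainSequence() { draining = true; }
};

// Sketch: the server owns the singleton global manager; each listener gets
// its own local manager, so an LDS update can drain only the listeners being
// removed or replaced without touching the rest.
struct ServerSketch {
  SimpleDrainManager global_drain_manager;
  std::vector<std::unique_ptr<SimpleDrainManager>> listener_drain_managers;

  SimpleDrainManager& addListener() {
    listener_drain_managers.push_back(std::make_unique<SimpleDrainManager>());
    return *listener_drain_managers.back();
  }
};
```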

Member Author:

OK will find a comment to add to.

void ListenerManagerImpl::drainListener(ListenerImplPtr&& listener) {
// TODO(mattklein123): Actually implement timed draining in a follow up. Currently we just
// correctly synchronize removal across all workers.
// First add the listener to the draining list. Thist must be done under the lock since it
Member:

Nit: s/Thist/This/

Does the timed draining TODO no longer apply?

Member Author:

It no longer applies. The drain manager handles the timer and then removal via the completion callback. I will add more comments.
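The flow being described, where the drain manager arms the drain-time timer and listener removal happens in the completion callback once it fires, can be sketched as (illustrative class, not the real implementation):

```cpp
#include <functional>
#include <utility>

// Sketch of a per-listener drain manager: startDrainSequence() stores the
// completion; when the (simulated) drain timer expires, the completion runs
// and the caller performs the actual listener removal there.
struct LocalDrainManagerSketch {
  std::function<void()> completion;
  bool draining = false;

  void startDrainSequence(std::function<void()> cb) {
    draining = true;
    completion = std::move(cb);
    // In the real code a timer would be armed here for the drain time.
  }

  // Stands in for the drain-time timer firing.
  void timerFired() {
    if (completion) {
      completion();
    }
  }
};
```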


@mattklein123 (Member Author):

@htuch @dnoe updated

@mattklein123 mattklein123 merged commit 32ffab0 into master Jul 10, 2017
@mattklein123 mattklein123 deleted the listener_draining branch July 10, 2017 17:26
rshriram pushed a commit to rshriram/envoy that referenced this pull request Oct 30, 2018
Automatic merge from submit-queue.

Add support for v1 config.

**What this PR does / why we need it**:

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes envoyproxy#1217 

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
jpsim pushed a commit that referenced this pull request Nov 28, 2022
Description: This PR implements all of the header-related classes: the request/response headers/trailers, their builders, and their base classes.
Risk Level: Low
Testing: N/A, will follow up when everything is finished with e2e testing + unit testing in Python and/or C++
Docs Changes: N/A
Release Notes: N/A

Signed-off-by: Cerek Hillen <chillen@lyft.com>
Signed-off-by: JP Simard <jp@jpsim.com>
jpsim pushed a commit that referenced this pull request Nov 29, 2022
Description: This PR implements all of the header-related classes: the request/response headers/trailers, their builders, and their base classes.
Risk Level: Low
Testing: N/A, will follow up when everything is finished with e2e testing + unit testing in Python and/or C++
Docs Changes: N/A
Release Notes: N/A

Signed-off-by: Cerek Hillen <chillen@lyft.com>
Signed-off-by: JP Simard <jp@jpsim.com>
mathetake pushed a commit that referenced this pull request Mar 3, 2026
…t model (#1230)

**Description**

This implements `OriginalModel` as the model name extracted from the
incoming request body before any virtualization applies. In doing so, we
allow metrics such as "gen_ai.server.token.usage" to be aggregated
either on the model the router received (`OriginalModel`) or an override
(`RequestModel`).

Flow:

1. Router filter extracts model from request body
2. If ModelNameOverride is configured, RequestModel differs from OriginalModel
3. Provider responds with ResponseModel (may differ from RequestModel)

Example:

1. `OriginalModel`: OpenAI Client sends: `{"model": "gpt-5"}`
2. `RequestModel`: ModelNameOverride replaces with "gpt-5-nano"
3. `ResponseModel`: OpenAI Platform sends: `{"model": "gpt-5-nano-2025-08-07"}`

OpenTelemetry:

In OpenTelemetry Generative AI Metrics, this is an attribute on metrics
such as "gen_ai.server.token.usage". For example, an OpenAI Chat
Completion request to the "gpt-5" model results in a plain text string
attribute: "gen_ai.original.model" -> "gpt-5"

**Related Issues/PRs (if applicable)**

Builds on #1219

---------

Signed-off-by: Adrian Cole <adrian@tetrate.io>