server: avoid flushing while a flush is in progress by snowp · Pull Request #16370 · envoyproxy/envoy

snowp · 2021-05-06T19:33:30Z

To support calls to Server::Instance::flushStats (let's call this an "external flush") while a flush timer is also active, we need to handle an external flush being requested while a periodic flush is still in progress. As it stands, this risks triggering an ASSERT because histogram merging is asynchronous. See envoyproxy/envoy-mobile#748 for an example of this happening.

This PR proposes a simple solution: ignore calls to flushStats while histograms are still being merged.
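
A minimal sketch of the idea described above, not the actual Envoy change. `FakeStore`, `FlushingServer`, `completeMerge` and the member names are hypothetical stand-ins; the only assumption taken from the description is that the merge finishes asynchronously and that starting a second merge before then is an error.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-in for a stats store whose histogram merge completes later.
class FakeStore {
public:
  using PostMergeCb = std::function<void()>;

  // Starting a merge while one is pending is treated as a programming error,
  // mirroring the ASSERT risk described above.
  void mergeHistograms(PostMergeCb cb) {
    assert(!pending_cb_ && "merge already in progress");
    pending_cb_ = std::move(cb);
  }

  // Simulates the asynchronous merge finishing. Returns false if nothing was pending.
  bool completeMerge() {
    if (!pending_cb_) {
      return false;
    }
    PostMergeCb cb = std::move(pending_cb_);
    pending_cb_ = nullptr; // A moved-from std::function is unspecified, so reset it.
    cb();
    return true;
  }

private:
  PostMergeCb pending_cb_;
};

// Hypothetical server wrapper that ignores flush requests while a merge is pending.
class FlushingServer {
public:
  explicit FlushingServer(FakeStore& store) : store_(store) {}

  // Returns true if a flush was started, false if the request was ignored.
  bool flushStats() {
    if (flush_in_progress_) {
      return false;
    }
    flush_in_progress_ = true;
    store_.mergeHistograms([this] {
      ++completed_flushes_;
      flush_in_progress_ = false;
    });
    return true;
  }

  int completedFlushes() const { return completed_flushes_; }

private:
  FakeStore& store_;
  bool flush_in_progress_{false};
  int completed_flushes_{0};
};
```

In this sketch a second `flushStats()` call made before `completeMerge()` returns false and never reaches the store, so the store's assertion cannot fire.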

Signed-off-by: Snow Pettersen <snowp@lyft.com>

Risk Level: Medium
Testing: UTs
Docs Changes: n/a
Release Notes: n/a

snowp · 2021-05-06T19:34:39Z

@jmarantz This is a fairly naive solution to the problem, so I wanted to put it up for feedback before I started writing tests. I'm not sure if it makes sense to support other ways of handling this (e.g. enqueue a flush instead of dropping it), or if it makes more sense to make this part of the stats store (e.g. have mergeHistograms do nothing if merging is already happening).

jmarantz · 2021-05-06T20:06:34Z

I like this approach. I think if stat-flushing is slow you don't want to queue flushes. You want to drop a new flush if the previous one is not done.

I would suggest adding a stat counting the dropped flushes. I think that would be more useful to us than info-logs.
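
One way to picture the suggestion, as a sketch only: a small gate that either admits a flush or records it as dropped. `FlushGate`, `tryBegin`, `end` and `dropped` are hypothetical names, and a plain integer stands in for whatever stat type the real change uses.

```cpp
#include <cstdint>

// Hypothetical gate: admits one flush at a time and counts the ones it drops.
class FlushGate {
public:
  // Returns true if the caller may start a flush; otherwise records a dropped flush.
  bool tryBegin() {
    if (in_progress_) {
      ++dropped_;
      return false;
    }
    in_progress_ = true;
    return true;
  }

  // Called once the in-flight flush has fully completed.
  void end() { in_progress_ = false; }

  std::uint64_t dropped() const { return dropped_; }

private:
  bool in_progress_{false};
  std::uint64_t dropped_{0};
};
```

Unlike a log line, the counter can be checked directly in a test or exported alongside other stats.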

dmitri-d · 2021-05-07T20:25:18Z

How responsive does stat-flushing need to be? If it could wait you could queue up a flush and cancel the timer until the flush is complete...

snowp · 2021-05-08T22:22:24Z

> How responsive does stat-flushing need to be? If it could wait you could queue up a flush and cancel the timer until the flush is complete...

I think we already pause the timer until we complete the flush (i.e. it's rescheduled after the flush is complete). The issue this PR is trying to fix is letting out-of-band flushes happen without conflicting with scheduled ones.

/retest

repokitteh-read-only · 2021-05-08T22:22:29Z

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #16370 (comment) was created by @snowp.

jmarantz

Looks great with one minor comment.

jmarantz · 2021-05-10T12:58:15Z

test/integration/server.h

-  void mergeHistograms(PostMergeCb) override {}
+  void mergeHistograms(PostMergeCb cb) override { merge_cb_ = cb; }
+
+  PostMergeCb merge_cb_;


nit: expose runMergeCallback() as a public function and leave merge_cb_ private.
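
A sketch of what this nit could look like in a test double. `TestStore` is a hypothetical name and the `PostMergeCb` signature is assumed to be a plain `void()` callback; the real class in test/integration/server.h may differ.

```cpp
#include <functional>
#include <utility>

// Assumed callback signature, for illustration only.
using PostMergeCb = std::function<void()>;

// Hypothetical test store: captures the post-merge callback and lets the test
// decide when the "merge" finishes.
class TestStore {
public:
  void mergeHistograms(PostMergeCb cb) { merge_cb_ = std::move(cb); }

  // Runs the stored callback once, then clears it. Does nothing if none is stored.
  void runMergeCallback() {
    if (!merge_cb_) {
      return;
    }
    PostMergeCb cb = std::move(merge_cb_);
    merge_cb_ = nullptr;
    cb();
  }

private:
  PostMergeCb merge_cb_; // Kept private; tests go through runMergeCallback().
};
```

Keeping the member private means a test can only complete the merge through `runMergeCallback()`, which also clears the callback so it cannot run twice.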

jmarantz

thanks!

jmarantz · 2021-05-12T12:08:58Z

tsan failure in //test/integration:multiplexed_integration_test, 2 out of 5 runs. Not sure whether that's related.

snowp · 2021-05-12T17:43:05Z

I see the same failures listed in #test-flaky, so I believe these are also happening on main.

server: avoid flushing while a flush is in progress

b26a1eb

Signed-off-by: Snow Pettersen <snowp@lyft.com>

Snow Pettersen added 3 commits May 7, 2021 17:32

add stat + test

3fc1439

Signed-off-by: Snow Pettersen <snowp@lyft.com>

better test

aed5e39

Signed-off-by: Snow Pettersen <snowp@lyft.com>

format

d234bdf

Signed-off-by: Snow Pettersen <snowp@lyft.com>

snowp marked this pull request as ready for review May 7, 2021 18:12

snowp assigned jmarantz May 7, 2021

jmarantz reviewed May 10, 2021

Snow Pettersen added 2 commits May 11, 2021 18:39

runMergeCallback

9b83ed6

Signed-off-by: Snow Pettersen <snowp@lyft.com>

format

77f1ef2

Signed-off-by: Snow Pettersen <snowp@lyft.com>

jmarantz approved these changes May 11, 2021

snowp merged commit c784d89 into envoyproxy:main May 12, 2021

snowp merged 6 commits into envoyproxy:main from snowp:flush-during-flush
