event: update fd registration mask even if it hasn't changed. by antoniovicente · Pull Request #16389 · envoyproxy/envoy

antoniovicente · 2021-05-08T03:40:22Z

Commit Message:
event: update fd registration mask even if it hasn't changed.

Updates to the fd mask can result in new events when operating in EDGE trigger mode.
Doing this update unconditionally is especially important in cases where a synthetic event was scheduled
and setEnabled ends up clearing it before the call to updateEvents. By skipping the update we prevent the
generation of a new real event to replace the lost synthetic event.

Without this change, calling close(Flush) on a socket that is readDisabled and had a pending synthetic write event
can result in the write event never being delivered, so the final flush will fail due to a timeout.
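
To illustrate why re-registering with an unchanged mask matters in edge trigger mode, here is a minimal Linux-only sketch using raw epoll. It is an analogy, not Envoy code: Envoy registers fds through libevent rather than calling epoll directly, and the names `EdgeDemoResult` and `runEdgeTriggerDemo` are hypothetical. The sketch assumes that `EPOLL_CTL_MOD` re-checks readiness and queues a fresh event when the fd is already ready.

```cpp
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>

// Event counts from three zero-timeout polls on one edge-triggered registration.
struct EdgeDemoResult {
  int first_wait;  // right after EPOLL_CTL_ADD
  int second_wait; // no state change since the first wait
  int after_remod; // after EPOLL_CTL_MOD with the same mask
};

// Hypothetical helper for illustration only.
EdgeDemoResult runEdgeTriggerDemo() {
  int fds[2];
  int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
  assert(rc == 0);
  int epfd = epoll_create1(0);
  assert(epfd >= 0);

  // A fresh socket is writable, so EPOLLOUT is ready immediately.
  epoll_event ev{};
  ev.events = EPOLLOUT | EPOLLET;
  ev.data.fd = fds[0];
  rc = epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev);
  assert(rc == 0);

  epoll_event out{};
  EdgeDemoResult result{};
  // The initial edge is delivered once.
  result.first_wait = epoll_wait(epfd, &out, 1, 0);
  // Nothing changed on the socket, so no new edge is reported.
  result.second_wait = epoll_wait(epfd, &out, 1, 0);
  // Re-register with the identical mask; the kernel re-checks readiness.
  rc = epoll_ctl(epfd, EPOLL_CTL_MOD, fds[0], &ev);
  assert(rc == 0);
  (void)rc;
  result.after_remod = epoll_wait(epfd, &out, 1, 0);

  close(epfd);
  close(fds[0]);
  close(fds[1]);
  return result;
}
```

Calling `runEdgeTriggerDemo()` and comparing `second_wait` with `after_remod` shows the effect: if the re-registration were skipped because the mask is unchanged, the second write event would not be produced, which mirrors the lost write event described above.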

Additional Description:
The issue was introduced by this optimization request from me which works for EmulatedEdge but not for Edge triggered sockets: https://github.com/envoyproxy/envoy/pull/13787/files#r520315847

I don't know if there are current situations where this bug would trigger in the proxy. I ran into this while trying to change the order of operations when generating HTTP/1.0 responses framed by connection close; before my prototype changes there was a call to readDisable(false) before Connection::close(Flush), which allowed existing tests to pass.

Risk Level: low
Testing: unit
Docs Changes: n/a
Release Notes: n/a
Platform Specific Features: n/a

Signed-off-by: Antonio Vicente <avd@google.com>

antoniovicente · 2021-05-11T23:58:54Z

/assign-from @envoyproxy/first-pass-reviewers

repokitteh-read-only · 2021-05-11T23:58:58Z

@envoyproxy/first-pass-reviewers assignee is @dio

davinci26 · 2021-05-12T00:12:52Z

I have this PR on my radar, but I haven't had time to understand the edge case exactly, which is why I haven't commented on it.

davinci26

Thanks for fixing a Windows issue!

antoniovicente · 2021-05-12T17:33:08Z

> Thanks for fixing a Windows issue!

Windows works fine AFAIK; it's Linux that has the issues.

Thanks for the review!

antoniovicente · 2021-05-12T17:33:24Z

/assign-from @envoyproxy/senior-maintainers

repokitteh-read-only · 2021-05-12T17:33:27Z

@envoyproxy/senior-maintainers assignee is @alyssawilk

alyssawilk

Thanks for the thorough explanation (and sorry for review delay - second shot hit me hard)
LGTM modulo some explanatory comments :-)

alyssawilk · 2021-05-17T14:15:49Z

source/common/event/file_event_impl.cc

    if (trigger_ == FileTriggerType::EmulatedEdge) {
      auto new_event_mask = enabled_events_ & ~event;
-      updateEvents(new_event_mask);
+      if (new_event_mask != enabled_events_) {


Comment here and below on why this case doesn't need the update?

Writing the comment for updateEvents made me rethink what we should do here. I reverted these changes and instead added a trigger_mode_ check to updateEvents so the update is skipped in modes where it is truly a no-op.
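
For readers following the thread, the check described in this reply could look roughly like the sketch below. This is not the code from the PR: `TriggerType` and `canSkipUpdate` are hypothetical stand-ins, and it assumes the modes where the update is a true no-op are Level and EmulatedEdge, based on the PR description saying the original optimization works for EmulatedEdge but not for Edge.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the trigger type enum; not copied from Envoy.
enum class TriggerType { Level, Edge, EmulatedEdge };

// Returns true when re-registering the fd can safely be skipped.
// Assumption: an unchanged mask is only a no-op outside Edge mode, because
// in Edge mode the re-registration itself can generate a new real event.
bool canSkipUpdate(TriggerType trigger, uint32_t current_mask, uint32_t new_mask) {
  if (new_mask != current_mask) {
    return false; // the mask changed, so the registration must be updated
  }
  return trigger != TriggerType::Edge;
}
```

With this shape, callers such as setEnabled can always call the update path and let a single place decide whether the work is needed.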

alyssawilk · 2021-05-17T14:17:19Z

source/common/event/file_event_impl.cc

@@ -85,9 +85,6 @@ void FileEventImpl::assignEvents(uint32_t events, event_base* base) {

 void FileEventImpl::updateEvents(uint32_t events) {
  ASSERT(dispatcher_.isThreadSafe());


I'd like either a comment here on why this is important, or a comment in the test calling out that it's regression testing [info from PR description] just to ensure no clever person decides to improve perf by undoing this PR :-P

Comment added. Thanks for asking for further info in the code, there's a lot of subtle behavior that can be accidentally missed.

Signed-off-by: Antonio Vicente <avd@google.com>

antoniovicente · 2021-05-20T06:41:52Z

> Thanks for the thorough explanation (and sorry for review delay - second shot hit me hard)
> LGTM modulo some explanatory comments :-)

I know the feeling. The second shot is rough but so worth it.

alyssawilk

LGTM!

alyssawilk · 2021-05-20T20:14:50Z

Coverage flaked. I'll kick off another run but feel free to merge once CI is happy

antoniovicente · 2021-05-20T22:34:08Z

> Coverage flaked. I'll kick off another run but feel free to merge once CI is happy

The coverage failure is real, I'll fix it by adding a test case for Level trigger events.

/wait

Signed-off-by: Antonio Vicente <avd@google.com>

wrowe · 2021-05-21T20:17:39Z

//test/integration:integration_test is a known flake in CI, investigating, don't let that stop you if the rest of CI passes.

antoniovicente · 2021-05-21T21:41:18Z

The issues I see right now are a bazel RPC failure while building ASAN, and a TSAN failure related to the issue I'm trying to address in #16590

/retest

repokitteh-read-only · 2021-05-21T21:41:22Z

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

antoniovicente added 3 commits May 7, 2021 23:23

Remove call to dispatcher exit that is not needed

6a856ef

Signed-off-by: Antonio Vicente <avd@google.com>

lint

955b037

Signed-off-by: Antonio Vicente <avd@google.com>

antoniovicente requested a review from davinci26 May 8, 2021 03:40

antoniovicente assigned davinci26 May 8, 2021

repokitteh-read-only bot assigned dio May 11, 2021

davinci26 previously approved these changes May 12, 2021

repokitteh-read-only bot assigned alyssawilk May 12, 2021

alyssawilk reviewed May 17, 2021

antoniovicente added 2 commits May 20, 2021 02:32

add comment and improve implementation

ea266f1

Signed-off-by: Antonio Vicente <avd@google.com>

Merge remote-tracking branch 'upstream/main' into lost_events

19c59da

Signed-off-by: Antonio Vicente <avd@google.com>

antoniovicente dismissed davinci26’s stale review via 19c59da May 20, 2021 06:33

alyssawilk previously approved these changes May 20, 2021

repokitteh-read-only bot added the waiting label May 20, 2021

improve coverage

7c53822

Signed-off-by: Antonio Vicente <avd@google.com>

antoniovicente dismissed alyssawilk’s stale review via 7c53822 May 21, 2021 17:16

repokitteh-read-only bot removed the waiting label May 21, 2021

fix spelling. activations is not a word.

c2d02fb

Signed-off-by: Antonio Vicente <avd@google.com>

alyssawilk approved these changes May 24, 2021

wrowe approved these changes May 24, 2021

wrowe merged commit fb274d6 into envoyproxy:main May 24, 2021
