Skip to content

test: adding some debug logs to diagnose hang#6875

Merged
alyssawilk merged 4 commits intoenvoyproxy:masterfrom
alyssawilk:hang
May 16, 2019
Merged

test: adding some debug logs to diagnose hang#6875
alyssawilk merged 4 commits intoenvoyproxy:masterfrom
alyssawilk:hang

Conversation

@alyssawilk
Copy link
Contributor

@alyssawilk alyssawilk commented May 9, 2019

Risk Level: n/a (test only)
Testing: yep
Docs Changes: no
Release Notes: no

#6874

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@alyssawilk
Copy link
Contributor Author

I can't repro #6874 locally - this is my best idea is to try to get some more information.
Feel free to reject as really kludgy :-P

initialize();

// Debug logs for #6874
std::cerr << "Waiting for load stats stream.\n";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you'll need use std::endl as end of line so it flushes the buffer.

@mattklein123
Copy link
Member

Hopefully not needed anymore via #6784

@alyssawilk
Copy link
Contributor Author

lizan merged commit 27b3e48 into envoyproxy:master 3 days ago

Sat, May 11, 8:07 PM (2 days ago)
https://circleci.com/gh/envoyproxy/envoy/215476?utm_campaign=workflow-failed&utm_medium=email&utm_source=notification
[ RUN ] IpVersions/LoadStatsIntegrationTest.LocalityWeighted/IPv4

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed let's merge this and see if we can figure this out.

@alyssawilk
Copy link
Contributor Author

Thanks. One more hang last night
https://circleci.com/gh/envoyproxy/envoy/217508?utm_campaign=workflow-failed&utm_medium=email&utm_source=notification
so hopefully next time we'll get more info!

@alyssawilk alyssawilk merged commit 04f0cc9 into envoyproxy:master May 16, 2019
alyssawilk added a commit that referenced this pull request Jun 19, 2019
For posterity #6875 got us the information it was a "waiting for stats" bug, which is then debuggable by lowering load stats interval to 1ms.

This exposed two issues with merge. When request 1 arrived in a separate interval from request 2, the stats protos could arrive in the opposite order (fixed by ignoring order) and that if there was a stats-update with a request in progress it never resolved because we were always adding fields (oops!)

I'm tempted to leave the interval low now that this is thoroughly debugged, but there's expectations on the time being within a range which wouldn't work if we lower it so meh?

Risk Level: n/a (test only)
Testing: yes. very much so.
Docs Changes: n/a
Release Notes: n/a
Fixes #6874

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@alyssawilk alyssawilk deleted the hang branch April 20, 2020 13:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants