-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: grpc reconnection #1521
fix: grpc reconnection #1521
Conversation
|
Codecov Report
@@ Coverage Diff @@
## main #1521 +/- ##
=====================================
Coverage 78.0% 78.0%
=====================================
Files 127 127
Lines 6600 6600
=====================================
+ Hits 5153 5154 +1
+ Misses 1203 1202 -1
Partials 244 244
|
Is there an issue tracking this? If not, can we start there? It would be nice to better understand the problem this is attempting to solve. |
I hope it helps a little bit: |
It would be great if we had a test that would prevent regressions, since this bug has been a major issue of this nature. I am not very familiar with the code, but this certainly looks correct. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't reproduce this bug, and the test TestNewExporter_collectorConnectionDiesThenReconnects
works fine.
@@ -13,6 +13,10 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm | |||
- Rename project default branch from `master` to `main`. | |||
- Reverse order in which `Resource` attributes are merged, per change in spec. (#1501) | |||
|
|||
## Fixed | |||
|
|||
- Fixed otlpgrpc reconnection issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Fixed otlpgrpc reconnection issue. | |
- Fixed otlpgrpc reconnection issue. (#1521) |
thank you - i didn't notice integration tests. I'm not sure about my solution but I think this test doesn't cover this line: Which i think causes goroutine to hang. I will try to write new test or extend this. |
I think changing disconnectedCh channel to buffered may be better option but I will change this PR once i have time to write integration tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may also fix the test flake I ran into elsewhere (TestNewExporter_collectorConnectionDiesThenReconnects).
Actually, I saw it flake several times in CI yesterday, and locally I see it fail about 5% of the time (https://gist.githubusercontent.com/punya/1db2a5fc7aede3baa06685997e145f3b/raw/bc52d8e2ba68772539a36c87f8ff9d1492e2ecf6/runs.txt). |
btw. in this case connected() function was very misleading for me (grpc exporter). It doesn't mean exporter is connected and thanks to this reconnection goroutine it almost always returns true. First reconnection always successfully creates new "connection" which is really not established. I changed fix to buffered channel, please review and let me know what do you think? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good 👍
* fix: grpc reconnection fixed * chore: changelog update * fix: grpc reconnection issue - red test * fix: grpc reconnection open-telemetry#1524 * fix: grpc reconnection issue cleanup
picking up the change from open-telemetry#1521
I have successfully reproduced this issue on 0.15.0 but i think it also applies to recent version. After disconnection it tries to reconnect only once. With this fix it should try reconnect periodically until successful connection.
Please let me know what do you think. Maybe there is some other better way to fix it.