Flaky Test/AuthorityRevive in google.golang.org/grpc/xds/internal/xdsclient/tests #7365
There is roughly 0.4% flakiness when run on forge: 399 failures out of 100000 runs.
Investigation
It looks like the subchannel picks the same address twice in case of failures: tryAllAddrs is being called twice.
Root Cause
When addrConn.connect releases the mutex after checking for idleness, another call to addrConn.connect can come in and also see the channel as idle, because resetTransport hasn't acquired the lock yet. This leaves two connection attempts running in parallel. A simple fix is to set the state to connecting while addrConn.connect still holds the mutex. I tried it and it fixed the flakiness. Will discuss with the team and raise a PR.
https://github.com/grpc/grpc-go/actions/runs/9711071699/job/26803095072?pr=7364