Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

grpc: perform a blocking close of the balancer in ccb #6497

Merged
merged 1 commit into from
Aug 3, 2023

Conversation

easwars
Copy link
Contributor

@easwars easwars commented Aug 3, 2023

This PR fixes the following issue:

  • Channel is READY and things are good
  • After a period of inactivity, channel moves to IDLE
    • As part of this the LB policy is shutdown. But currently a goroutine is spawned to shutdown the LB policy.
  • At the same time, an RPC is initiated
    • This brings the channel out of IDLE and a new LB policy is created
  • New policy moves channel to READY
  • Still, the old policy hasn't been completely shutdown because the goroutine did not get a change to run
    • The connections belonging to the old policy and shutdown, and the old policy receives state updates on these subConns
    • The old policy updates channel with TRANSIENT_FAILURE

This whole scenario can be prevented if the old policy was shutdown inline when the channel entered IDLE.

This PR also includes a minor fix to one of the tests to prevent a race where the test is waiting for the channel become READY. But it could so happen that the channel became READY and in fact became IDLE before the test started the check. Instead of relying on the connectivity state, the test now performs an RPC to ensure that the channel is in fact READY.

RELEASE NOTES:

  • grpc: fix a bug where the channel could erroneously move to TRANSIENT_FAILURE when actually moving to IDLE

@easwars easwars requested a review from dfawley August 3, 2023 16:31
@easwars easwars added this to the 1.58 Release milestone Aug 3, 2023
@dfawley
Copy link
Member

dfawley commented Aug 3, 2023

  • The connections belonging to the old policy and shutdown, and the policy receives state updates on these subConns
  • The policy updates channel with TRANSIENT_FAILURE

Can you clarify this? Is this the new policy or the old policy?

Comment on lines 408 to +414
// Verify that the ClientConn moves to READY.
ctx, cancel := context.WithTimeout(context.Background(), defaultTestTimeout)
defer cancel()
awaitState(ctx, t, cc, connectivity.Ready)
client := testgrpc.NewTestServiceClient(cc)
if _, err := client.EmptyCall(ctx, &testpb.Empty{}); err != nil {
t.Errorf("EmptyCall RPC failed: %v", err)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just delete this entirely? We're going to perform 20 RPCs after this anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This acts as a sanity check to ensure that the channel is READY and can make RPCs before we get to our actual test. So, if for some reason, the channel doesn't become READY, then we will fail here and will have a better signal instead of failing down below wherein we might initially think the failure is because of a bad interaction between idle timeout firing and a new RPC being made.

@dfawley dfawley assigned easwars and unassigned dfawley Aug 3, 2023
@easwars
Copy link
Contributor Author

easwars commented Aug 3, 2023

  • The connections belonging to the old policy and shutdown, and the policy receives state updates on these subConns
  • The policy updates channel with TRANSIENT_FAILURE

Can you clarify this? Is this the new policy or the old policy?

It is the old policy. Updated the PR description.

@easwars easwars merged commit 5d3d9d7 into grpc:master Aug 3, 2023
1 check passed
@easwars easwars deleted the idle_balancer_close_bug_fix branch August 3, 2023 18:41
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 31, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants