Skip to content

[konnectivity-client] Ensure grpc tunnel is closed on dial failure#398

Merged
k8s-ci-robot merged 5 commits intokubernetes-sigs:masterfrom
tallclair:client-dial-cls
Sep 7, 2022
Merged

[konnectivity-client] Ensure grpc tunnel is closed on dial failure#398
k8s-ci-robot merged 5 commits intokubernetes-sigs:masterfrom
tallclair:client-dial-cls

Conversation

@tallclair
Copy link
Copy Markdown
Contributor

3 commits:

  1. Handle DIAL_CLS packets in the client - treat it similarly to a DIAL_RSP with a failure
  2. Send a best-effort DIAL_CLS to the server if the client times out (or the request is canceled)
  3. Explicitly close the client connection and stop serving the tunnel on dial failures

Note to reviewers: I'm hoping to backport this to older Kubernetes versions, so consider backwards compatibility (both with older k8s, as well as older proxy-servers). I think it's OK, but would appreciate extra attention on this detail.

/assign @cheftako

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 3, 2022
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 3, 2022
// Override goleakVerifyNone to set t.Helper.
// TODO: delete this once goleak has been updated to include
// https://github.com/uber-go/goleak/commit/2dfebe88ddf19de216c4ab15a1189fc640b1ea9f
func goleakVerifyNone(t *testing.T, options ...goleak.Option) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider making this part of a util in konn-client.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack. Will follow up in a separate PR.

klog.V(5).InfoS("Context canceled waiting for DialResp", "ctxErr", requestCtx.Err(), "dialID", random)
go t.closeDial(random)
return nil, &dialFailure{"dial timeout, context", DialFailureContext}
case <-t.done:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need 3 paths for we're shutting down?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately yes, I don't see a clean way to consolidate these. The timeout & context canceled cases could easily be collapsed, but then we wouldn't be able to distinguish between them in the logs. This <-t.done case is not strictly necessary (the timeout case would be hit eventually), but it makes for a cleaner shutdown if the underlying connection gets closed prematurely (outside of konnectivity-client).

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 7, 2022
@tallclair
Copy link
Copy Markdown
Contributor Author

Addressed feedback. Also added a commit to to ensure the tunnel is always closed when the net.Conn is closed (fixing another potential leak)


if !ok {
// If the DIAL_RSP does not match a pending dial, it means one of two things:
// 1. There was a second DIAL_RSP for the connection request (this is very unlikely but possible)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should ever get two DIAL_RSP for a pending dial. Collisions shouldn't be possible given the client sets req.Random. Maybe this could happen if the server had a race between sending the DIAL_RESP and a DIAL_CLS on something like a timeout?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost the inverse of #1 below.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was also confused by this comment. The comment was there before (moved to here), but I can't think of when this would occur. I'll update it.

@cheftako
Copy link
Copy Markdown
Contributor

cheftako commented Sep 7, 2022

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 7, 2022
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cheftako, tallclair

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 7, 2022
@k8s-ci-robot k8s-ci-robot merged commit 4271084 into kubernetes-sigs:master Sep 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants