Set TCP_USER_TIMEOUT socket option by p0lyn0mial · Pull Request #926 · openshift/library-go

p0lyn0mial · 2020-10-19T16:24:00Z

This PR sets the TCP_USER_TIMEOUT socket option (https://man7.org/linux/man-pages/man7/tcp.7.html), which controls how long transmitted data may remain unacknowledged before the connection is forcibly closed.

Without that option, we rely on the TCP stack to detect a broken network connection, which can take up to 15 minutes. During that time our platform might be unavailable.

There are already reported cases in which aggregated APIs (i.e. openshift-apiserver) were unable to establish a new connection to the Kube API for 15 minutes after an "ungraceful termination" (https://bugzilla.redhat.com/show_bug.cgi?id=1881878) and after a network error (https://bugzilla.redhat.com/show_bug.cgi?id=1879232#c39).

It looks like detecting broken connections at the application level is gaining traction and is preferable (see [1] and [2]). Unfortunately, it is on the slow track and will require backporting to Go's standard library. Until then, we would like to take advantage of TCP_USER_TIMEOUT.

[1] https://go-review.googlesource.com/c/net/+/198040
[2] https://go-review.googlesource.com/c/net/+/236498#message-7bd657ac6960f0dc7acbbe28cbe3d80ac4f3a34b
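As a rough illustration of the approach described in the PR summary above, the sketch below applies TCP_USER_TIMEOUT from a net.Dialer Control hook. It is not the code from this PR. The helper names (setUserTimeout, newDialer, demo) are hypothetical, the constant 0x12 is the Linux value of TCP_USER_TIMEOUT written out by hand because the standard syscall package does not export it (golang.org/x/sys/unix does), and the 25s value is borrowed from the Timeout that appears later in the review. It is Linux-only.

```go
package main

import (
	"fmt"
	"net"
	"syscall"
	"time"
)

// tcpUserTimeout is the Linux value of TCP_USER_TIMEOUT (linux/tcp.h).
// Assumption: hard-coded here to keep the sketch stdlib-only.
const tcpUserTimeout = 0x12

// setUserTimeout sets TCP_USER_TIMEOUT on fd. The kernel expects milliseconds.
func setUserTimeout(fd int, d time.Duration) error {
	return syscall.SetsockoptInt(fd, syscall.IPPROTO_TCP, tcpUserTimeout, int(d/time.Millisecond))
}

// newDialer (hypothetical helper) returns a dialer whose sockets get
// TCP_USER_TIMEOUT applied after creation and before connect() is called.
func newDialer(userTimeout time.Duration) *net.Dialer {
	return &net.Dialer{
		Timeout: userTimeout,
		Control: func(network, address string, c syscall.RawConn) error {
			var optErr error
			if err := c.Control(func(fd uintptr) {
				optErr = setUserTimeout(int(fd), userTimeout)
			}); err != nil {
				return err
			}
			return optErr
		},
	}
}

// demo sets the option on an unconnected TCP socket and reads it back,
// returning the value the kernel reports in milliseconds.
func demo() (int, error) {
	fd, err := syscall.Socket(syscall.AF_INET, syscall.SOCK_STREAM, 0)
	if err != nil {
		return 0, err
	}
	defer syscall.Close(fd)
	if err := setUserTimeout(fd, 25*time.Second); err != nil {
		return 0, err
	}
	return syscall.GetsockoptInt(fd, syscall.IPPROTO_TCP, tcpUserTimeout)
}

func main() {
	ms, err := demo()
	fmt.Println("TCP_USER_TIMEOUT (ms):", ms, "err:", err)
	_ = newDialer(25 * time.Second) // e.g. pass its DialContext to an http.Transport
}
```

The Control hook runs before the connection is established, so the option is in place for the whole lifetime of the socket.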

p0lyn0mial · 2020-10-19T16:24:43Z

/assign @sttts @squeed

p0lyn0mial · 2020-10-19T16:27:12Z

pkg/config/client/client_config.go

 		KeepAlive: 30 * time.Second,
-	}).DialContext
+		Control: func(network, address string, c syscall.RawConn) error {
+			// Supported only on Linux


do we have to detect connections to the local host?

squeed · 2020-10-20T09:52:16Z

Given that I don't think many people will be accessing the apiserver over satellite internet, these timing parameters seem just fine. However, we need to verify that they do what we want, since these knobs are non-orthogonal and never obvious.

The easiest way I can think of to test this is:

1. Spin up a cluster with this PR.
2. Start a pod, then start a tcpdump on a master capturing all traffic from/to that pod IP.
3. Issue a watch for something that has no updates.
4. Verify that TCP keepalives do what we expect.
5. Using iptables, block access to that pod from the apiserver.
6. Verify that the connection is closed.

p0lyn0mial · 2020-10-20T09:52:26Z

pkg/network/dialer_unix.go

+func dialerWithDefaultOptions() *net.Dialer {
+	return &net.Dialer{
+		// TCP_USER_TIMEOUT does affect the behaviour of connect() which is controlled by this field so we set it to the same value
+		Timeout: 25 * time.Second,


does it make sense?

Timeout, in this case, is timeout to establish the TCP session. 25 seconds might be gigantic, if this is exclusively intra-cluster traffic.

yes, this function is primarily used by the operators (https://github.com/search?q=org%3Aopenshift+GetKubeConfigOrInClusterConfig&type=code).

Setting it to 5s would be okay?

actually, now that I think about it, 20 is probably about right. In a cluster that is thrashing but making progress, setting too low a timeout just leads to additional resource exhaustion. As always, it's a balancing act between keeping lightly-loaded clusters performant vs. tolerating heavily-loaded ones.

I wish we had better numbers to make this decision. Let me see if we have some metrics.

sttts · 2020-10-20T10:38:30Z

pkg/network/dialer_others.go

+	klog.V(2).Info("Creating the default network Dialer (unsupported platform). It may take up to 15 minutes to detect broken connections and establish a new one")
+	return &net.Dialer{
+		Timeout:   30 * time.Second,
+		KeepAlive: 30 * time.Second,


keepalive in linux is gone?

no, KeepAlive sets both TCP_KEEPINTVL and TCP_KEEPIDLE to the same value. Since we want distinct values, we are now setting them in the setDefaultSocketOptions function.
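To make the distinction concrete, here is a minimal Linux-only sketch of setting the two keep-alive options independently. It is an illustration, not the body of setDefaultSocketOptions from this PR: the helper names (setKeepAlive, demo) are hypothetical, the 5s interval matches the probe spacing reported later in this thread, and the 2s idle value is purely an assumption.

```go
package main

import (
	"fmt"
	"syscall"
	"time"
)

// setKeepAlive enables keep-alive probes on fd and sets the idle threshold
// (TCP_KEEPIDLE) and the probe interval (TCP_KEEPINTVL) to distinct values.
// Both kernel options take whole seconds.
func setKeepAlive(fd int, idle, interval time.Duration) error {
	if err := syscall.SetsockoptInt(fd, syscall.SOL_SOCKET, syscall.SO_KEEPALIVE, 1); err != nil {
		return err
	}
	if err := syscall.SetsockoptInt(fd, syscall.IPPROTO_TCP, syscall.TCP_KEEPIDLE, int(idle/time.Second)); err != nil {
		return err
	}
	return syscall.SetsockoptInt(fd, syscall.IPPROTO_TCP, syscall.TCP_KEEPINTVL, int(interval/time.Second))
}

// demo applies the options to an unconnected TCP socket and reads them back.
// It returns the idle and interval values in seconds as the kernel reports them.
func demo() (int, int, error) {
	fd, err := syscall.Socket(syscall.AF_INET, syscall.SOCK_STREAM, 0)
	if err != nil {
		return 0, 0, err
	}
	defer syscall.Close(fd)

	// Assumed values: 2s idle (illustrative), 5s interval.
	if err := setKeepAlive(fd, 2*time.Second, 5*time.Second); err != nil {
		return 0, 0, err
	}
	idle, err := syscall.GetsockoptInt(fd, syscall.IPPROTO_TCP, syscall.TCP_KEEPIDLE)
	if err != nil {
		return 0, 0, err
	}
	interval, err := syscall.GetsockoptInt(fd, syscall.IPPROTO_TCP, syscall.TCP_KEEPINTVL)
	if err != nil {
		return 0, 0, err
	}
	return idle, interval, nil
}

func main() {
	idle, interval, err := demo()
	fmt.Println("TCP_KEEPIDLE:", idle, "TCP_KEEPINTVL:", interval, "err:", err)
}
```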

p0lyn0mial · 2020-10-20T12:46:39Z

/hold

for testing

p0lyn0mial · 2020-10-28T11:18:46Z

re e8be127 commit

it turned out that TCP_KEEPINTVL and TCP_KEEPIDLE weren't set because they got overwritten after calling the DialContext method, here: https://github.com/golang/go/blob/master/src/net/dial.go#L431
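One way to avoid the overwrite described above, sketched under assumptions: net.Dialer only configures keep-alive itself when its KeepAlive field is not negative, so a negative value leaves whatever the Control hook set untouched. The thread does not show the actual change in e8be127, so treat this as an illustration only. The names newDialer and setKeepAlive and the 2s idle value are hypothetical, and the example is Linux-only.

```go
package main

import (
	"fmt"
	"net"
	"syscall"
	"time"
)

// setKeepAlive enables keep-alive and sets idle=2s (assumed), interval=5s.
func setKeepAlive(fd int) error {
	if err := syscall.SetsockoptInt(fd, syscall.SOL_SOCKET, syscall.SO_KEEPALIVE, 1); err != nil {
		return err
	}
	if err := syscall.SetsockoptInt(fd, syscall.IPPROTO_TCP, syscall.TCP_KEEPIDLE, 2); err != nil {
		return err
	}
	return syscall.SetsockoptInt(fd, syscall.IPPROTO_TCP, syscall.TCP_KEEPINTVL, 5)
}

// newDialer (hypothetical helper) returns a dialer that sets keep-alive
// options in Control and stops net.Dialer from applying its own afterwards.
func newDialer() *net.Dialer {
	return &net.Dialer{
		Timeout: 25 * time.Second,
		// Negative: the dialer does not touch keep-alive after connecting,
		// so the values set in Control survive.
		KeepAlive: -1,
		Control: func(network, address string, c syscall.RawConn) error {
			var optErr error
			if err := c.Control(func(fd uintptr) { optErr = setKeepAlive(int(fd)) }); err != nil {
				return err
			}
			return optErr
		},
	}
}

func main() {
	// Dial a throwaway loopback listener and read the option back.
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	defer ln.Close()

	conn, err := newDialer().Dial("tcp", ln.Addr().String())
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	raw, err := conn.(*net.TCPConn).SyscallConn()
	if err != nil {
		panic(err)
	}
	var interval int
	var optErr error
	if err := raw.Control(func(fd uintptr) {
		interval, optErr = syscall.GetsockoptInt(int(fd), syscall.IPPROTO_TCP, syscall.TCP_KEEPINTVL)
	}); err != nil {
		panic(err)
	}
	fmt.Println("TCP_KEEPINTVL after dial:", interval, "err:", optErr)
}
```

With a non-negative KeepAlive (for example the 30s used before this PR), the dialer would apply its own keep-alive settings after the connection is established, which matches the overwrite reported in the comment.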

p0lyn0mial · 2020-10-28T12:03:49Z

I did test it on an idle connection and it looks like it actually works.

I created a simple app that establishes a watch on never changing test01/secrets resources to the Kube API server and simply dropped network traffic originating from the app.

Without the patch, the connection is terminated after ~270s (30s * 9); here is a tcpdump

With the patch, the connection is terminated after ~25s. After the traffic was dropped, approximately 4 keep-alive probes were sent, one every 5s. Here is a tcpdump and the modified app
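The ~270s figure above is the keep-alive interval multiplied by the number of unanswered probes. The small calculation below reproduces it, assuming the 9 in the comment is the Linux default for net.ipv4.tcp_keepalive_probes (the comment does not name it). The function names are hypothetical, and this is a simplified model rather than a description of the kernel's exact timing.

```go
package main

import (
	"fmt"
	"time"
)

// probeWindow is how long the kernel keeps probing an unresponsive peer once
// keep-alive probing has started: one interval per unanswered probe.
func probeWindow(interval time.Duration, probes int) time.Duration {
	return interval * time.Duration(probes)
}

// withoutPatch models the unpatched dialer: a 30s KeepAlive and an assumed
// default of 9 probes.
func withoutPatch() time.Duration {
	return probeWindow(30*time.Second, 9)
}

func main() {
	fmt.Println(withoutPatch()) // 4m30s, i.e. 270s
}
```

With the patch the probe count no longer decides the outcome: tcp(7) documents that TCP_USER_TIMEOUT overrides keep-alive when determining when to close the connection, which is consistent with the ~25s observed.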

p0lyn0mial · 2020-10-29T11:33:13Z

I did test it on an active connection and it looks like it actually works.

I created a simple app that sends 3 req per second to list test01/secrets resources from the Kube API server and simply dropped network traffic originating from the app.

Without the patch, the connection is terminated after ~16m, here is a tcpdump

With the patch, the connection is terminated after ~25s, here is a tcpdump and the modified app
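The ~16m figure for an active connection is in line with how Linux retransmits unacknowledged data. The sketch below sums the exponentially backed-off retransmission timeouts under stated assumptions: the default net.ipv4.tcp_retries2 of 15, a starting RTO at the 200ms minimum, and the 120s cap. The real RTO depends on the measured round-trip time, so the result is a lower bound, not a prediction. The function name is hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

const (
	rtoMin = 200 * time.Millisecond // Linux TCP_RTO_MIN
	rtoMax = 120 * time.Second      // Linux TCP_RTO_MAX
)

// retransmitTimeout returns the hypothetical time the kernel keeps waiting on
// unacknowledged data: the initial RTO plus one doubled RTO per retry, each
// capped at rtoMax.
func retransmitTimeout(retries int) time.Duration {
	var total time.Duration
	rto := rtoMin
	for i := 0; i <= retries; i++ {
		total += rto
		rto *= 2
		if rto > rtoMax {
			rto = rtoMax
		}
	}
	return total
}

func main() {
	fmt.Println(retransmitTimeout(15)) // 15m24.6s
}
```

The 924.6s result is the same figure the kernel documentation gives for tcp_retries2=15. TCP_USER_TIMEOUT replaces this retry-count limit with a fixed time limit, hence the ~25s with the patch.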

p0lyn0mial · 2020-10-30T12:59:27Z

/hold cancel

sttts · 2020-11-02T09:00:56Z

/lgtm
/approve

openshift-ci-robot · 2020-11-02T09:01:20Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: p0lyn0mial, sttts

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [sttts]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

squeed · 2020-11-02T09:23:08Z

post-merge /lgtm - this is awesome!

p0lyn0mial · 2020-11-03T13:10:34Z

/cherry-pick release-4.6

openshift-cherrypick-robot · 2020-11-03T13:10:45Z

@p0lyn0mial: new pull request created: #937

In response to this:

/cherry-pick release-4.6

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci-robot added the do-not-merge/work-in-progress label Oct 19, 2020

openshift-ci-robot requested review from deads2k and smarterclayton October 19, 2020 16:24

p0lyn0mial force-pushed the tcp-usr-timeout-dialer branch 2 times, most recently from 7c2abcc to 6f5b6b3 Compare October 20, 2020 09:52

p0lyn0mial force-pushed the tcp-usr-timeout-dialer branch 3 times, most recently from 5b15e39 to 3073050 Compare October 20, 2020 10:18

p0lyn0mial added 2 commits October 20, 2020 12:54

provides DefaultClientDialer that returns a network dialer with default options sets.

7c42c3e

go mod vendor

9693567

p0lyn0mial force-pushed the tcp-usr-timeout-dialer branch from 3073050 to 9693567 Compare October 20, 2020 10:54

p0lyn0mial changed the title ~~[WIP]: Set TCP_USER_TIMEOUT socket option~~ Set TCP_USER_TIMEOUT socket option Oct 20, 2020

openshift-ci-robot removed the do-not-merge/work-in-progress label Oct 20, 2020

openshift-ci-robot added the do-not-merge/hold label Oct 20, 2020

to DialContext

e8be127

p0lyn0mial force-pushed the tcp-usr-timeout-dialer branch from f6b30d9 to e8be127 Compare October 28, 2020 11:11

p0lyn0mial mentioned this pull request Oct 29, 2020

[WIP] fake library-go bump to test the tcp patch openshift/openshift-apiserver#152

Closed

openshift-ci-robot removed the do-not-merge/hold label Oct 30, 2020

openshift-ci-robot assigned sttts Nov 2, 2020

openshift-ci-robot added the lgtm label Nov 2, 2020

openshift-ci-robot added the approved label Nov 2, 2020

openshift-merge-robot merged commit c4fa0f5 into openshift:master Nov 2, 2020

p0lyn0mial mentioned this pull request Nov 2, 2020

bumps (library-go) openshift/openshift-apiserver#153

Merged

openshift-cherrypick-robot mentioned this pull request Nov 3, 2020

[release-4.6] Set TCP_USER_TIMEOUT socket option #937

Merged

p0lyn0mial mentioned this pull request Nov 16, 2020

UPSTREAM: <drop>: enable TCP_USER_TIMEOUT on client connections openshift/kubernetes#457

Closed

p0lyn0mial mentioned this pull request Nov 23, 2020

improve DefaultClientDialContext #944

Merged

p0lyn0mial mentioned this pull request Dec 9, 2020

Set TCP_USER_TIMEOUT socket option #967

Merged

p0lyn0mial mentioned this pull request Jan 25, 2021

release-3.11: Set TCP_USER_TIMEOUT socket option #984

Closed

p0lyn0mial mentioned this pull request Dec 9, 2021

Add DialContext configuration override #1264

Closed
