grpc: disable and document overrides of OS default TCP keepalive by Go #6672
…t implemention of server side overrides
Thanks for addressing the previous comments. The changes LGTM, modulo minor formatting nits.
In general for docstrings, ensure that commentary is readable from source even on narrow screens. When a comment line gets too long (typically anything over 80 characters), it is recommended to wrap it into multiple single-line comments.
cc: @dfawley. Requesting a 2nd review from you.
dialoptions.go
Outdated
// Note: Go overrides the OS defaults for TCP keepalive time and interval to 15s.
// To retain OS defaults, use a net.Dialer with the KeepAlive field set to a negative value.
Suggested change:
- // Note: Go overrides the OS defaults for TCP keepalive time and interval to 15s.
- // To retain OS defaults, use a net.Dialer with the KeepAlive field set to a negative value.
+ // Note: Go overrides the OS defaults for TCP keepalive time and interval to 15s.
+ // To retain OS defaults, use a net.Dialer with the KeepAlive field set to a
+ // negative value.
Let's say something like "As of Go 1.21, the standard library overrides ............"
Also, something that gives clarity into our default behavior would be nice, too. E.g. "gRPC-Go's default dialer does this in order to restore the OS defaults."
Also:
//
// For more information, please see [issue 48622] in the Go github repo.
//
// [issue 48622]: https://github.com/golang/go/issues/48622
Co-authored-by: Arvind Bright <[email protected]>
@dfawley I've reverted the changes to the greeter_server example as requested. Actually, the default TCP keepalive times have been in place since Go 1.13. The original issue in the Go github repo that discussed the implementation of default keepalive times is here. Here is the change I made for the documentation in dialoptions.go instead, do let me know if it is okay.
Also, I am not really sure what you meant by including "gRPC-Go's default dialer does this in order to restore the OS defaults." Based on my understanding, there has not been any restoration of OS defaults on the grpc side until this PR, and the only restoration this PR achieves is on the client side (see this).
Sorry, by "As of Go 1.21" I meant "ALL versions of go, up to and including 1.21 SO FAR". I think we should document that it's currently the case, but that it may not always be the case, and is even expected to be changed. If you can think of better wording than "As of" for this, then that's fine too!
LGTM modulo the punctuation and potentially the benchmark changes.
@arvindbr8 please review again after @JaydenTeoh updates this. Thanks!
This PR is labeled as requiring an update from the reporter, and no update has been received after 6 days. If no update is provided in the next 7 days, this issue will be automatically closed.
@arvindbr8 updated. Hmm but seems like the tests are failing. I don't think it is related to changing the dialer on the
@JaydenTeoh -- it is a known flake, I've rerun the failed test for you. Meanwhile, let me take a look at the diff.
LGTM, modulo minor comment about docstring. Just to maintain a similar formatting for references.
Never mind, I've been informed this doesn't work. Your comment was per the Go doc.
@@ -176,7 +176,9 @@ func dial(ctx context.Context, fn func(context.Context, string) (net.Conn, error
	if networkType == "tcp" && useProxy {
		return proxyDial(ctx, address, grpcUA)
	}
	return (&net.Dialer{}).DialContext(ctx, networkType, address)
	// KeepAlive is set to a negative value to prevent Go's override of the TCP
	// keepalive time and interval; retain the OS default.
While this comment is technically correct, I'm not sure this is what you intended and I think the comment could be more clear: the default on Linux is to disable keepalives unless opted in via SO_KEEPALIVE, which then uses the OS-level keepalive configuration.
At least this is different from Java and what Eric proposed in #6250 (comment), which would be to set SO_KEEPALIVE to true unconditionally. This could be done by adding something like:
	Control: func(network, address string, c syscall.RawConn) error {
		return c.Control(func(fd uintptr) {
			syscall.SetsockoptInt(int(fd), syscall.SOL_SOCKET, syscall.SO_KEEPALIVE, 1)
		})
	},
the default on linux is to disable keepalives unless opted in via SO_KEEPALIVE, which then uses the os-level keepalive configuration.
Hmm, I believe you are right. We were under the impression that keepalives were always enabled and that a negative value would not disable them, but would use the OS defaults for the timers. However, this doesn't call setKeepAlive when the value is negative:
https://go.dev/src/net/tcpsock.go#L238
And they set SO_KEEPALIVE explicitly for posix systems in setKeepAlive:
https://go.dev/src/net/sockopt_posix.go#L116
So presumably not doing that means keepalives will be disabled otherwise.
What do you think of unconditionally enabling keepalives via Dialer control then?
(3) is something that Java's been doing forever, and still do independent of (1). Even with (1), (3) is still useful as some people configure their OS to use more aggressive settings for their specific environment (e.g., AWS). For those environments (3) behaves well without any extra work from users. And if you aren't in such an environment, then (3) is very low or zero cost.
I think we should try to do that before the release.
edit: nevermind, just saw your comment on #6250 (comment)
What do you think of unconditionally enabling keepalives via Dialer control then?
A little bit more thought is required I guess. We do set the Dialer.Control to a different function when running inside of the Google production network to set certain socket options that are appropriate for that environment. Unconditionally enabling TCP keepalives using the Dialer.Control here could still work (if we enhance the above function to also do the same). Thoughts @dfawley ?
Also, it appears that the proposal mentioned here has made significant progress and has a pending change. This would allow us to disable TCP keepalive or use OS defaults with a simple API. We would have to wait for a new Go version to be able to use this, though, and it would be a while until that Go version becomes the minimum supported Go version for grpc.
Pertaining to #6250
Summary of Changes made (as suggested by @easwars):
transport/http2_client.go: Disable Go's override of OS default TCP keepalive interval (currently 15s) on the client-side
helloworld/greeter_server
I am still unable to replicate the bug mentioned originally in the issue and test whether my changes on client-side will fix it.
grpc version: v1.58.2
go version: go1.20.4 darwin/arm64
OS: macOS Ventura 13.4
At packet 3773, I applied the rule to pfctl, but my server does not reset the connection after a single lost keepalive packet (I am using the default keepalive settings on both client and server side).
RELEASE NOTES: none