-
Notifications
You must be signed in to change notification settings - Fork 537
[Socket Handler]: Adds HTTP/2 PING keep-alive to detect broken connections in pool. #5788
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -183,6 +183,42 @@ public static HttpMessageHandler CreateSocketsHttpHandlerHelper( | |
| DefaultTrace.TraceWarning("Failed to set EnableMultipleHttp2Connections on SocketsHttpHandler: {0}", ex.Message); | ||
| } | ||
|
|
||
| // Enable HTTP/2 PING keep-alive to detect broken connections. | ||
| // Without this, a broken HTTP/2 connection (e.g. after a network blip or load balancer | ||
| // reset) can remain in the pool indefinitely, causing persistent request failures | ||
| // that only resolve after application restart. | ||
| // KeepAlivePingDelay/Timeout/Policy are available on SocketsHttpHandler in .NET 5.0+. | ||
| try | ||
| { | ||
| int pingDelayInSeconds = ConfigurationManager.GetEnvironmentVariable<int>( | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟡 Recommendation — Resilience: Add int pingDelayInSeconds = ConfigurationManager.GetEnvironmentVariable<int>(
ConfigurationManager.Http2KeepAlivePingDelayInSeconds,
defaultValue: 1);
int pingTimeoutInSeconds = ConfigurationManager.GetEnvironmentVariable<int>(
ConfigurationManager.Http2KeepAlivePingTimeoutInSeconds,
defaultValue: 2);Both
The feature this PR adds is silently defeated with only a trace warning. This is especially subtle because the delay failure also prevents the The codebase already has an established pattern for this in // Existing pattern
return Math.Max(
ConfigurationManager.GetEnvironmentVariable(..., defaultValue: ...),
MinMaxRetriesInLocalRegionWhenRemoteRegionPreferred);Suggested fix: int pingDelayInSeconds = Math.Max(1, ConfigurationManager.GetEnvironmentVariable<int>(
ConfigurationManager.Http2KeepAlivePingDelayInSeconds,
defaultValue: 1));
int pingTimeoutInSeconds = Math.Max(1, ConfigurationManager.GetEnvironmentVariable<int>(
ConfigurationManager.Http2KeepAlivePingTimeoutInSeconds,
defaultValue: 2));This guarantees values are always ≥ 1 second, which is the minimum |
||
| ConfigurationManager.Http2KeepAlivePingDelayInSeconds, | ||
| defaultValue: 1); | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Defaults seems very aggressive.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Seems like there are the RUST defaults.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The set defaults are what have been set in Rust as well https://github.com/Azure/azure-sdk-for-rust/blob/955b4c4803f6a96f082f3ecd394fd4ae58bc8cd6/sdk/cosmos/azure_data_cosmos_driver/src/options/connection_pool.rs#L254-L255
kirankumarkolli marked this conversation as resolved.
|
||
|
|
||
| int pingTimeoutInSeconds = ConfigurationManager.GetEnvironmentVariable<int>( | ||
| ConfigurationManager.Http2KeepAlivePingTimeoutInSeconds, | ||
| defaultValue: 2); | ||
|
|
||
| PropertyInfo keepAlivePingDelayInfo = socketHandlerType.GetProperty("KeepAlivePingDelay"); | ||
| keepAlivePingDelayInfo?.SetValue(socketHttpHandler, TimeSpan.FromSeconds(pingDelayInSeconds)); | ||
|
|
||
| PropertyInfo keepAlivePingTimeoutInfo = socketHandlerType.GetProperty("KeepAlivePingTimeout"); | ||
| keepAlivePingTimeoutInfo?.SetValue(socketHttpHandler, TimeSpan.FromSeconds(pingTimeoutInSeconds)); | ||
|
|
||
| // HttpKeepAlivePingPolicy.Always = 1: send pings even for idle connections, | ||
| // which is critical for detecting broken connections lingering in the pool. | ||
| PropertyInfo keepAlivePingPolicyInfo = socketHandlerType.GetProperty("KeepAlivePingPolicy"); | ||
| if (keepAlivePingPolicyInfo != null) | ||
| { | ||
| Type pingPolicyType = keepAlivePingPolicyInfo.PropertyType; | ||
| object alwaysValue = Enum.ToObject(pingPolicyType, 1); | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟢 Suggestion — Resilience: Prefer object alwaysValue = Enum.ToObject(pingPolicyType, 1);The hardcoded Consider using name-based lookup instead: object alwaysValue = Enum.Parse(pingPolicyType, "Always");This is self-documenting, resilient to ordinal changes, and equally correct. The performance difference is irrelevant — this runs once at client startup. |
||
| keepAlivePingPolicyInfo.SetValue(socketHttpHandler, alwaysValue); | ||
| } | ||
| } | ||
| catch (Exception ex) | ||
| { | ||
| DefaultTrace.TraceWarning("Failed to configure HTTP/2 keep-alive ping on SocketsHttpHandler: {0}", ex.Message); | ||
| } | ||
|
|
||
| if (serverCertificateCustomValidationCallback != null) | ||
| { | ||
| //Get SslOptions Property | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -466,6 +466,11 @@ public void CreateSocketsHttpHandlerCreatesCorrectValueType() | |
| Assert.AreEqual(gatewayLimit, socketsHandler.MaxConnectionsPerServer); | ||
| Assert.IsTrue(socketsHandler.EnableMultipleHttp2Connections, "EnableMultipleHttp2Connections should be true for HTTP/2 thin client support"); | ||
|
|
||
| // HTTP/2 PING keep-alive: detects broken connections lingering in the pool | ||
| Assert.AreEqual(TimeSpan.FromSeconds(1), socketsHandler.KeepAlivePingDelay, "KeepAlivePingDelay should be 1 second for HTTP/2 connection health monitoring"); | ||
| Assert.AreEqual(TimeSpan.FromSeconds(2), socketsHandler.KeepAlivePingTimeout, "KeepAlivePingTimeout should be 2 seconds"); | ||
| Assert.AreEqual(HttpKeepAlivePingPolicy.Always, socketsHandler.KeepAlivePingPolicy, "KeepAlivePingPolicy should be Always to detect broken idle connections"); | ||
|
|
||
| //Create cert for test | ||
| X509Certificate2 x509Certificate2 = new CertificateRequest("cn=www.test", ECDsa.Create(), HashAlgorithmName.SHA256).CreateSelfSigned(DateTime.Now, DateTime.Now.AddYears(1)); | ||
| X509Chain x509Chain = new X509Chain(); | ||
|
|
@@ -495,6 +500,46 @@ public void CreateHttpClientHandlerCreatesCorrectValueType() | |
| Assert.IsFalse(clientHandler.ServerCertificateCustomValidationCallback.Invoke(new HttpRequestMessage(), x509Certificate2, x509Chain, sslPolicyErrors)); | ||
| } | ||
|
|
||
| [TestMethod] | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Possible concurrency issue with other tests? |
||
| public void CreateSocketsHttpHandlerRespectsEnvironmentVariableOverrides() | ||
| { | ||
| int customPingDelay = 60; | ||
| int customPingTimeout = 10; | ||
|
|
||
| try | ||
| { | ||
| Environment.SetEnvironmentVariable( | ||
| ConfigurationManager.Http2KeepAlivePingDelayInSeconds, | ||
| customPingDelay.ToString()); | ||
| Environment.SetEnvironmentVariable( | ||
| ConfigurationManager.Http2KeepAlivePingTimeoutInSeconds, | ||
| customPingTimeout.ToString()); | ||
|
|
||
| HttpMessageHandler handler = CosmosHttpClientCore.CreateSocketsHttpHandlerHelper( | ||
| gatewayModeMaxConnectionLimit: 10, | ||
| webProxy: null, | ||
| serverCertificateCustomValidationCallback: null); | ||
|
|
||
| SocketsHttpHandler socketsHandler = (SocketsHttpHandler)handler; | ||
|
|
||
| Assert.AreEqual(TimeSpan.FromSeconds(customPingDelay), socketsHandler.KeepAlivePingDelay, | ||
| "KeepAlivePingDelay should respect environment variable override"); | ||
| Assert.AreEqual(TimeSpan.FromSeconds(customPingTimeout), socketsHandler.KeepAlivePingTimeout, | ||
| "KeepAlivePingTimeout should respect environment variable override"); | ||
| Assert.AreEqual(HttpKeepAlivePingPolicy.Always, socketsHandler.KeepAlivePingPolicy, | ||
| "KeepAlivePingPolicy should always be Always regardless of environment variables"); | ||
| } | ||
| finally | ||
| { | ||
| Environment.SetEnvironmentVariable( | ||
| ConfigurationManager.Http2KeepAlivePingDelayInSeconds, | ||
| null); | ||
| Environment.SetEnvironmentVariable( | ||
| ConfigurationManager.Http2KeepAlivePingTimeoutInSeconds, | ||
| null); | ||
| } | ||
| } | ||
|
|
||
| [TestMethod] | ||
| public async Task HttpTimeoutPolicyForThinClientOn503TestAsync() | ||
| { | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🟡 Recommendation — Resilience: Separate try-catch blocks for independent property sets
The single
try/catchblock (lines 191–220) groups env var parsing,KeepAlivePingDelay,KeepAlivePingTimeout, andKeepAlivePingPolicytogether. If any one operation fails — including aFormatExceptionfrom a non-numeric env var string, or aTargetInvocationExceptionfrom the delay setter — the entire block is skipped, including the criticalKeepAlivePingPolicy = Always.The sibling
EnableMultipleHttp2Connectionsjust above (lines 176–184) uses its own isolated try-catch, which is the established pattern in this method. Each logical property set is independent and should fail independently.Concrete failure scenario: A user sets
AZURE_COSMOS_HTTP2_KEEPALIVE_PING_DELAY_IN_SECONDS=fast(non-numeric).Convert.ChangeTypethrows before any property is set → the catch fires → delay, timeout, AND policy are all skipped → the handler has .NET defaults (KeepAlivePingDelay = Timeout.InfiniteTimeSpan,KeepAlivePingPolicy = WithActiveRequests) → PING keep-alive is completely disabled. A bad delay value shouldn't prevent the timeout and policy from being configured with their defaults.Suggested structure:
This way, a failure in one property still allows the others to be applied — all partial states are safe since the .NET defaults are reasonable fallbacks.