@linkvt linkvt commented Oct 31, 2025

Fixes #15432
Fixes #10962

Release Note

Fall back to HTTP1 on failed HTTP2 health probes (e.g. on connection error or non-readiness)

Summary

Improves queue-proxy HTTP/2 probing reliability by switching from upgrade-based detection to direct H2C probes with fallback to HTTP/1.1.

Changes

  • HTTP/2 probe fallback logic: Replaces the deprecated HTTP upgrade mechanism (OPTIONS with Connection: Upgrade, HTTP2-Settings) with direct H2C GET requests. Falls back to HTTP/1.1 if the H2C probe fails or returns a non-ready status.
  • Simplified transport handling: Removes version-spoofing transport wrapper in favor of protocol hints via req.ProtoMajor
  • Test updates: Rewrites hellohttp2 test service to use standard library HTTP/2 server
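
The fallback described in the first bullet could be sketched roughly as follows. This is not the code from the PR: it assumes Go 1.24+ and uses the stdlib `http.Protocols` setting for prior-knowledge H2C instead of golang.org/x/net/http2, it treats any 2xx/3xx status as ready, and every function name (`newProbeClient`, `probeOnce`, `probeWithFallback`, `demoProbe`) is hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// newProbeClient returns a client restricted to a single protocol for
// cleartext targets: prior-knowledge H2C when h2c is true, HTTP/1.1 otherwise.
// Assumption: Go 1.24+, where http.Transport has a Protocols field.
func newProbeClient(h2c bool) *http.Client {
	p := new(http.Protocols)
	if h2c {
		p.SetUnencryptedHTTP2(true)
	} else {
		p.SetHTTP1(true)
	}
	return &http.Client{
		Transport: &http.Transport{Protocols: p},
		Timeout:   2 * time.Second,
	}
}

// probeOnce sends a GET and returns the HTTP major version of a ready
// response. Connection errors and non-ready statuses both yield an error.
// Assumption: "ready" means 200 <= status < 400.
func probeOnce(ctx context.Context, c *http.Client, url string) (int, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return 0, err
	}
	resp, err := c.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode >= 400 {
		return 0, fmt.Errorf("not ready: status %d", resp.StatusCode)
	}
	return resp.ProtoMajor, nil
}

// probeWithFallback tries H2C first and falls back to HTTP/1.1 if that fails.
func probeWithFallback(ctx context.Context, url string) (int, error) {
	h2c, h1 := newProbeClient(true), newProbeClient(false)
	defer h2c.CloseIdleConnections()
	defer h1.CloseIdleConnections()
	if major, err := probeOnce(ctx, h2c, url); err == nil {
		return major, nil
	}
	return probeOnce(ctx, h1, url)
}

// demoProbe runs the probe against a local HTTP/1.1-only test server and
// against one that also accepts H2C, returning the detected major versions.
func demoProbe() (h1Only, h2cCapable int, err error) {
	ok := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})

	plain := httptest.NewServer(ok) // speaks HTTP/1.1 only
	defer plain.Close()

	both := httptest.NewUnstartedServer(ok)
	both.Config.Protocols = new(http.Protocols)
	both.Config.Protocols.SetHTTP1(true)
	both.Config.Protocols.SetUnencryptedHTTP2(true)
	both.Start()
	defer both.Close()

	ctx := context.Background()
	if h1Only, err = probeWithFallback(ctx, plain.URL); err != nil {
		return 0, 0, err
	}
	h2cCapable, err = probeWithFallback(ctx, both.URL)
	return h1Only, h2cCapable, err
}
```

Calling `demoProbe()` exercises both paths: the HTTP/1.1-only server should reject the H2C attempt and be detected through the fallback, while the H2C-capable server should answer the first probe directly.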

Additional Change (Could be Separate PR)

Also includes support for overriding the queue-proxy image via the queue.sidecar.serving.knative.dev/image annotation on KService specs.
I found this very useful during my tests, as I could:

  • ko apply a single file
  • get the update of both the service and queue-proxy in one revision
  • compare the behavior of different KServices with different queue-proxy images (relevant for the next section)
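
As a rough illustration of the override, the lookup could work like the sketch below. Only the annotation key comes from this PR description; the helper name `resolveQueueImage`, its signature, and the idea of passing the default image as a parameter are assumptions made for the example, not the PR's actual code.

```go
package main

// queueImageAnnotation is the annotation key named in the PR description.
const queueImageAnnotation = "queue.sidecar.serving.knative.dev/image"

// resolveQueueImage is a hypothetical helper: it returns the annotation value
// when it is present and non-empty, and the default image otherwise.
func resolveQueueImage(annotations map[string]string, defaultImage string) string {
	if img, ok := annotations[queueImageAnnotation]; ok && img != "" {
		return img
	}
	return defaultImage
}
```

With this shape, two KServices that differ only in the annotation value would each get a revision running its own queue-proxy image, which is what makes side-by-side comparison possible.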

Replacement of golang.org/x/net/http2 with stdlib

I looked into replacing golang.org/x/net/http2 (as h2c and http2 support has existed in the stdlib since Go 1.24) and golang.org/x/net/http2/h2c (a proposal for its deprecation exists) in pkg, but didn't include it in this PR because it unexpectedly turned out to be a huge topic.
Switching to the stdlib is not that easy, because the http2.Client sets up the HTTP/2 connection in a non-standard way: it sends an HTTP/2 connection preface despite using a TLS connection, which the HTTP/2 specification reserves for cleartext TCP:

A client that knows that a server supports HTTP/2 can establish a TCP connection and send the connection preface (Section 3.4) followed by HTTP/2 frames. Servers can identify these connections by the presence of the connection preface. This only affects the establishment of HTTP/2 connections over cleartext TCP; HTTP/2 connections over TLS MUST use protocol negotiation in TLS [TLS-ALPN].

This means that during a Knative upgrade, updating the queue-proxy pods before the activator would cause issues: the activator would send the preface, which the Go stdlib HTTP/2 implementation in queue-proxy does not handle. We might be able to ignore such requests, but I haven't tested that yet.
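
To make the preface concrete, here is a minimal sketch of how a server-side component could recognise it at the start of a (decrypted) byte stream without consuming it. This only illustrates the detection step; it is not how the stdlib or queue-proxy handles such connections, and the helper names are hypothetical.

```go
package main

import (
	"bufio"
	"bytes"
)

// clientPreface is the 24-octet HTTP/2 connection preface sent by clients.
const clientPreface = "PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"

// startsWithPreface peeks at the reader without consuming any bytes and
// reports whether the stream begins with the HTTP/2 client preface.
// Streams shorter than the preface are reported as false.
func startsWithPreface(br *bufio.Reader) bool {
	b, err := br.Peek(len(clientPreface))
	if err != nil {
		return false
	}
	return bytes.Equal(b, []byte(clientPreface))
}

// bytesStartWithPreface is a convenience wrapper for in-memory data.
func bytesStartWithPreface(data []byte) bool {
	return startsWithPreface(bufio.NewReader(bytes.NewReader(data)))
}
```

Because `Peek` leaves the bytes in the buffer, a caller could still hand the same reader to an HTTP/1.1 or HTTP/2 code path after the check.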

The second task would be to set up H2 connections the standard way: via TLS ALPN.
But how do we know in queue-proxy and the activator whether we actually want to use HTTP/2?
The queue-proxy currently has no knowledge of this (besides it using the HTTP/2 port 8013). It could rely on the probe to the user-container to figure this out and only afterwards accept HTTP/2 connections.
We could derive this info during Revision reconciliation, but that would conflict with removing the port naming restriction (see #4283).

The activator could potentially add the h2 protocol in the transport so that proxy connections using it would attempt h2.
But how does queue-proxy behave? Always accept h2 (defined in the TLS Config of the server) without knowing whether the service supports h2?
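
For reference, the standard negotiation path can be observed with a small stdlib-only sketch. It uses a local `httptest` TLS server that advertises h2 and a client that offers it, then reads the ALPN result from the response. It does not answer the open question of when queue-proxy should advertise h2; the function names are hypothetical.

```go
package main

import (
	"io"
	"net/http"
	"net/http/httptest"
)

// negotiated performs a GET and returns the ALPN protocol chosen during the
// TLS handshake together with the HTTP major version of the response.
func negotiated(c *http.Client, url string) (alpn string, major int, err error) {
	resp, err := c.Get(url)
	if err != nil {
		return "", 0, err
	}
	defer resp.Body.Close()
	io.Copy(io.Discard, resp.Body)
	if resp.TLS != nil {
		alpn = resp.TLS.NegotiatedProtocol
	}
	return alpn, resp.ProtoMajor, nil
}

// demoALPN starts a local TLS test server that advertises h2 and reports what
// a client that also offers h2 ends up negotiating.
func demoALPN() (string, int, error) {
	srv := httptest.NewUnstartedServer(http.HandlerFunc(
		func(w http.ResponseWriter, _ *http.Request) {
			w.WriteHeader(http.StatusOK)
		}))
	srv.EnableHTTP2 = true // the test server adds "h2" to its TLS NextProtos
	srv.StartTLS()
	defer srv.Close()

	// srv.Client() trusts the test certificate and attempts HTTP/2.
	return negotiated(srv.Client(), srv.URL)
}
```

In this setup the server accepts h2 purely based on its own TLS configuration, which mirrors the concern above: the proxy would agree to h2 before knowing whether the backend behind it supports it.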

Then there is also h2c (http2 cleartext) - how are things negotiated in this case?

I have more questions than answers right now, but fortunately solving this isn't required to fix the issues referenced above.

Thanks for the feedback!

@knative-prow knative-prow bot added the size/L label (denotes a PR that changes 100-499 lines, ignoring generated files) Oct 31, 2025

knative-prow bot commented Oct 31, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: linkvt
Once this PR has been reviewed and has the lgtm label, please assign dprotaso for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow knative-prow bot requested review from dprotaso and skonto October 31, 2025 08:50

codecov bot commented Oct 31, 2025

Codecov Report

❌ Patch coverage is 93.54839% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.18%. Comparing base (2f3129a) to head (c52f12c).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/queue/health/probe.go | 89.47% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #16205      +/-   ##
==========================================
+ Coverage   80.05%   80.18%   +0.12%     
==========================================
  Files         214      214              
  Lines       13281    13275       -6     
==========================================
+ Hits        10632    10644      +12     
+ Misses       2291     2271      -20     
- Partials      358      360       +2     

@linkvt linkvt force-pushed the fallback-to-http1-on-failed-http2-probe branch from 386f8c6 to 9e6b6fa Compare October 31, 2025 09:55

// Returns a transport that uses HTTP/2 if it's known to be supported, and otherwise
// spoofs the request & response versions to HTTP/1.1.
func autoDowngradingTransport(opt HTTPProbeConfigOptions) http.RoundTripper {

I could not come up with a reason why this autoDowngradingTransport would be needed at all. Maybe it was required some time ago, but that doesn't seem to be the case today.

t := http.DefaultTransport.(*http.Transport).Clone()
t.TLSClientConfig.InsecureSkipVerify = true
return t
}()

This transport was only used during HTTP/2 detection. As we don't support https/h2 probes, it is IMO not really needed.

@linkvt linkvt force-pushed the fallback-to-http1-on-failed-http2-probe branch from 9e6b6fa to 0c5f9df Compare October 31, 2025 10:14
@linkvt linkvt force-pushed the fallback-to-http1-on-failed-http2-probe branch from 0c5f9df to c52f12c Compare October 31, 2025 10:18

Development

Successfully merging this pull request may close these issues.

  • Queue Proxy health checks incompatible with non-HTTP/2 applications
  • [gRPC/http2 auto-detect] Flakiness and potential connection leak
