client: process RPC stats/tracing only when a handler is configured#8874
client: process RPC stats/tracing only when a handler is configured#8874mbissa merged 4 commits intogrpc:masterfrom
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #8874 +/- ##
==========================================
- Coverage 81.41% 80.70% -0.72%
==========================================
Files 416 416
Lines 33429 33434 +5
==========================================
- Hits 27216 26982 -234
+ Misses 4660 4648 -12
- Partials 1553 1804 +251
🚀 New features to boost your workflow:
|
|
Curious, how did you run into this? Is there an issue for this? |
Can you please elaborate on this? My understanding is that when the I think it would be good to open an issue that documents all the details and the approach taken (and why this is the preferred approach, if there are other approached available at all). Thanks. |
| } | ||
|
|
||
| s, err := as.transport.NewStream(as.ctx, as.callHdr) | ||
| s, err := as.transport.NewStream(as.ctx, as.callHdr, nil) |
There was a problem hiding this comment.
Is this how the health and ORCA streams get a nil stats handler? We want to eliminate this duplication between the regular client stream and the non-retry client stream at some point. We tried to prioritize that work for Q1, but it didn't happen. This has been a maintenance burden for a while now, and a source of hard to find bugs.
|
Just did a rebase with master to fast forward my commit and address the vet-proto failure. |
dde0677 to
734cd3c
Compare
|
Just FYI, no action needed. I was considering whether we'd need to pass grpc-go/stats/opentelemetry/client_metrics.go Lines 192 to 199 in d7b3f93 Consequently, this change should also eliminate the error log from the OTel metrics handler as well. As before, per-call metrics will not be recorded for RPCs initiated by producers. |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request refactors how RPC stats and tracing are handled to resolve an issue with repetitive error logging for health and ORCA producers. The change introduces a per-stream stats.Handler, moving away from a transport-level handler for RPC events. This allows streams like those for health checks, which don't have stats configured, to bypass stats processing, thus preventing the erroneous log messages. The implementation is clean and effective: the stats.Handler is now part of the ClientStream struct and is passed during stream creation. The NewStream function signature has been updated across the transport layer, and call sites have been correctly modified. For regular RPCs, the existing stats handler is plumbed through, while for internal streams like health checks, nil is passed, correctly disabling stats for them. The changes are logical and well-executed. This is a good improvement to reduce log noise and make stats handling more precise.
easwars
left a comment
There was a problem hiding this comment.
Would it be possible to add a test that verifies that the offending log is not seen for health/ORCA streams?
| headerChanClosed uint32 // set when headerChan is closed. Used to avoid closing headerChan multiple times. | ||
| bytesReceived atomic.Bool // indicates whether any bytes have been received on this stream | ||
| unprocessed atomic.Bool // set if the server sends a refused stream or GOAWAY including this stream | ||
| statsHandler stats.Handler |
There was a problem hiding this comment.
Can we please add a short trailing comment explaining when this can be nil? Thanks.
| } | ||
|
|
||
| s, err := as.transport.NewStream(as.ctx, as.callHdr) | ||
| s, err := as.transport.NewStream(as.ctx, as.callHdr, nil) |
There was a problem hiding this comment.
Can we please add a comment here as to why we are setting it to nil here, so that future readers dont have to second guess themselves.
|
|
||
| // NewStream creates a Stream for an RPC. | ||
| NewStream(ctx context.Context, callHdr *CallHdr) (*ClientStream, error) | ||
| NewStream(ctx context.Context, callHdr *CallHdr, handler stats.Handler) (*ClientStream, error) |
There was a problem hiding this comment.
I see that a stats.Handler field is stored in the transport now. Is it required anymore at all given that the stats handler is passed at stream creation time?
There was a problem hiding this comment.
We can look at all the details and address it as part of bigger cleanup - don't want to include more changes here as this is an urgent change.
…rpc#8874) This resolves an issue where below log statement keeps printing repeatedly: "ERROR: [otel-plugin] ctx passed into client side stats handler metrics event handling has no client attempt data present". When new stream is created for health/orca producers, stats and tracing is not setup. However, this fact is ignored during RPC and an error logs is printed to denote that stats cannot be handled. We will enable stream to have its own reference to the stats handler and only process per RPC implementation when it is present (like in case of regular data streams). Internal issue: b/385685802 RELEASE NOTES: * stats: only process RPC stats/tracing in health and ORCA producers if a handler is configured, preventing unnecessary error logging
Cherry picks [PR](#8874) into 1.79.x RELEASE NOTES: * stats: only process RPC stats/tracing in health and ORCA producers if a handler is configured, preventing unnecessary error logging
…rpc#8874) This resolves an issue where below log statement keeps printing repeatedly: "ERROR: [otel-plugin] ctx passed into client side stats handler metrics event handling has no client attempt data present". When new stream is created for health/orca producers, stats and tracing is not setup. However, this fact is ignored during RPC and an error logs is printed to denote that stats cannot be handled. We will enable stream to have its own reference to the stats handler and only process per RPC implementation when it is present (like in case of regular data streams). Internal issue: b/385685802 RELEASE NOTES: * stats: only process RPC stats/tracing in health and ORCA producers if a handler is configured, preventing unnecessary error logging
…rpc#8874) This resolves an issue where below log statement keeps printing repeatedly: "ERROR: [otel-plugin] ctx passed into client side stats handler metrics event handling has no client attempt data present". When new stream is created for health/orca producers, stats and tracing is not setup. However, this fact is ignored during RPC and an error logs is printed to denote that stats cannot be handled. We will enable stream to have its own reference to the stats handler and only process per RPC implementation when it is present (like in case of regular data streams). Internal issue: b/385685802 RELEASE NOTES: * stats: only process RPC stats/tracing in health and ORCA producers if a handler is configured, preventing unnecessary error logging
#3314) Bump google.golang.org/grpc from 1.56.3 to 1.79.3 in the go_modules group across 1 directory Bumps the go_modules group with 1 update in the / directory: google.golang.org/grpc. Updates google.golang.org/grpc from 1.56.3 to 1.79.3 Release notes Sourced from google.golang.org/grpc's releases. Release 1.79.3 Security server: fix an authorization bypass where malformed :path headers (missing the leading slash) could bypass path-based restricted "deny" rules in interceptors like grpc/authz. Any request with a non-canonical path is now immediately rejected with an Unimplemented error. (#8981) Release 1.79.2 Bug Fixes stats: Prevent redundant error logging in health/ORCA producers by skipping stats/tracing processing when no stats handler is configured. (grpc/grpc-go#8874) Release 1.79.1 Bug Fixes grpc: Remove the -dev suffix from the User-Agent header. (grpc/grpc-go#8902) Release 1.79.0 API Changes mem: Add experimental API SetDefaultBufferPool to change the default buffer pool. (#8806) Special Thanks: @vanja-p experimental/stats: Update MetricsRecorder to require embedding the new UnimplementedMetricsRecorder (a no-op struct) in all implementations for forward compatibility. (#8780) Behavior Changes balancer/weightedtarget: Remove handling of Addresses and only handle Endpoints in resolver updates. (#8841) New Features experimental/stats: Add support for asynchronous gauge metrics through the new AsyncMetricReporter and RegisterAsyncReporter APIs. (#8780) pickfirst: Add support for weighted random shuffling of endpoints, as described in gRFC A113. This is enabled by default, and can be turned off using the environment variable GRPC_EXPERIMENTAL_PF_WEIGHTED_SHUFFLING. (#8864) xds: Implement :authority rewriting, as specified in gRFC A81. (#8779) balancer/randomsubsetting: Implement the random_subsetting LB policy, as specified in gRFC A68. (#8650) Special Thanks: @marek-szews Bug Fixes credentials/tls: Fix a bug where the port was not stripped from the authority override before validation. (#8726) Special Thanks: @Atul1710 xds/priority: Fix a bug causing delayed failover to lower-priority clusters when a higher-priority cluster is stuck in CONNECTING state. (#8813) health: Fix a bug where health checks failed for clients using legacy compression options (WithDecompressor or RPCDecompressor). (#8765) Special Thanks: @sanki92 transport: Fix an issue where the HTTP/2 server could skip header size checks when terminating a stream early. (#8769) Special Thanks: @joybestourous server: Propagate status detail headers, if available, when terminating a stream during request header processing. (#8754) Special Thanks: @joybestourous Performance Improvements credentials/alts: Optimize read buffer alignment to reduce copies. (#8791) mem: Optimize pooling and creation of buffer objects. (#8784) transport: Reduce slice re-allocations by reserving slice capacity. (#8797) ... (truncated) Commits dda86db Change version to 1.79.3 (#8983) 72186f1 grpc: enforce strict path checking for incoming requests on the server (#8981) 97ca352 Changing version to 1.79.3-dev (#8954) 8902ab6 Change the version to release 1.79.2 (#8947) a928670 Cherry-pick #8874 to v1.79.x (#8904) 06df363 Change version to 1.79.2-dev (#8903) 782f2de Change version to 1.79.1 (#8902) 850eccb Change version to 1.79.1-dev (#8851) 765ff05 Change version to 1.79.0 (#8850) 68804be Cherry pick #8864 to v1.79.x (#8896) Additional commits viewable in compare view Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase. Dependabot commands and options You can trigger Dependabot actions by commenting on this PR: @dependabot rebase will rebase this PR @dependabot recreate will recreate this PR, overwriting any edits that have been made to it @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency @dependabot ignore <dependency name> major version will close this group update PR and stop Dependabot creating any more for the specific dependency's major version (unless you unignore this specific dependency's major version or upgrade to it yourself) @dependabot ignore <dependency name> minor version will close this group update PR and stop Dependabot creating any more for the specific dependency's minor version (unless you unignore this specific dependency's minor version or upgrade to it yourself) @dependabot ignore <dependency name> will close this group update PR and stop Dependabot creating any more for the specific dependency (unless you unignore this specific dependency or upgrade to it yourself) @dependabot unignore <dependency name> will remove all of the ignore conditions of the specified dependency @dependabot unignore <dependency name> <ignore condition> will remove the ignore condition of the specified dependency and ignore conditions You can disable automated security fix PRs for this repo from the Security Alerts page. Reviewed-by: Anton Sidelnikov Reviewed-by: Artem Lifshits
This resolves an issue where below log statement keeps printing repeatedly:
"ERROR: [otel-plugin] ctx passed into client side stats handler metrics event handling has no client attempt data present".
When new stream is created for health/orca producers, stats and tracing is not setup. However, this fact is ignored during RPC and an error logs is printed to denote that stats cannot be handled. We will enable stream to have its own reference to the stats handler and only process per RPC implementation when it is present (like in case of regular data streams).
Internal issue: b/385685802
RELEASE NOTES: