Skip to content

Conversation

@santigimeno
Copy link
Member

@santigimeno santigimeno commented Jun 5, 2025

Summary by CodeRabbit

  • New Features

    • Introduced advanced TCP frame handling, tracing, and diagnostics for improved transport observability.
    • Added new configuration options for extended channel monitoring and node lifetime control.
    • Enhanced SPIFFE identity and bundle support for secure transport authentication.
    • Improved channelz monitoring with extensible data sources and detailed JSON reporting.
    • Added persistent and ephemeral builder scopes for flexible configuration management.
    • Added comprehensive HTTP/2 transport ping and keepalive management with promise-based asynchronous control.
    • Integrated detailed HTTP/2 frame and flow control tracing for diagnostics.
    • Added new transport extensions including chaotic_good and CHTTP2 plugins with modern channel creation APIs.
    • Enhanced filter fusion with new methods and improved instance management.
    • Added JSON serialization support for compression filters and HTTP server filters in channelz.
    • Introduced scoped builder registration and endpoint transport registry in core configuration.
    • Added new keepalive and ping management for HTTP/2 transport.
    • Added message assembler and disassembler for gRPC message framing.
    • Added new transport context abstraction for chaotic_good transport.
    • Added support for ALPN protocol preferences via channel arguments in SSL security connectors.
  • Bug Fixes

    • Improved error handling and validation in HTTP/2 frame parsing and stream management.
    • Enhanced random number generation and padding logic for transport consistency.
    • Fixed handshake and frame header validation logic in chaotic_good transport.
    • Resolved issues with callback ownership and threading in client channel connectivity watchers.
    • Corrected HTTP/2 stream ID masking and padding validation per RFC 9113.
  • Refactor

    • Unified internal APIs for channel creation, server port addition, and transport context handling.
    • Modernized channelz registry and node management for better concurrency and scalability.
    • Removed legacy and deprecated code paths in HTTP/2 and transport layers.
    • Refactored chaotic_good transport connection logic to handshake-driven model with status return.
    • Simplified client and server transport frame handling and tracing integration.
    • Replaced legacy subchannel pool with sharded implementation for improved concurrency.
    • Consolidated protobuf generated code with added oneof clear functions and removed map entry types.
    • Enhanced HTTP/2 client and server transports with explicit stream ID management and error handling.
    • Improved channelz node and data source architecture with extensible JSON reporting.
    • Refined filter fusion implementation with new wrapper and method adaptations.
  • Documentation

    • Updated and clarified comments throughout the codebase, including deprecation notices and improved configuration descriptions.
    • Enhanced channel argument documentation with detailed explanations and references.
    • Added deprecation warnings for OpenCensus and GCP Observability APIs.
  • Chores

    • Added new oneof clear functions to generated protobuf headers for more robust message handling.
    • Updated build configurations and dependencies to support new features and transport extensions.
    • Added inclusion of port definition headers in generated gRPC proto source files for consistent compilation.
    • Cleaned up unused includes and improved code formatting in multiple modules.

These updates collectively provide enhanced transport flexibility, security, diagnostics, and configuration options for end-users.

@santigimeno santigimeno self-assigned this Jun 5, 2025
@coderabbitai
Copy link

coderabbitai bot commented Jun 5, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

This update delivers a major refactor and extension of gRPC's core transport, channelz, and configuration infrastructure. It introduces new transport abstractions, tracing and telemetry, channelz node extensibility, and significant HTTP/2 transport improvements. The patch also adds new configuration options, enhances promise-based async flows, and modernizes the build system and API surface. Numerous generated protobuf headers are updated to remove map entry typedefs and add explicit oneof clear functions.

Changes

File(s) / Area Change Summary
Core transport: chaotic_good, chttp2, inproc, legacy, etc. Major refactor: new transport context, frame header types, tracing, channelz integration, endpoint/port APIs, TCP call tracing, and promise-based flows. Adds new files for frame headers, tracing collectors, and context.
HTTP/2 transport (chttp2): core, client, server, frame, settings Adds structured HTTP/2 error/status handling, frame parsing/serialization improvements, promise-based keepalive/ping, detailed ztrace tracing, and channelz data source integration.
Channelz: registry, node, trace, ztrace_collector Refactors registry to sharded/weak-ref design, adds extensible data sources, new node types, tracing, and JSON reporting.
Core configuration Builder registration now supports persistent/ephemeral scopes, with improved reset and tracking.
Channel/credentials/client/server/connector Updates to handle new transport context, socket node tracking, and subchannel pool sharding.
Filters: call, fusion, http, compression, logging Adds channelz DataSource support, filter fusion enhancements, and improved pointer arithmetic for call data.
EventEngine API Endpoint interface extended for write metrics, encapsulation of read/write args, and metrics sink support.
Build files, Makefile, .bcr, .gitattributes, .gitignore Updates for new source files, build flags, version bumps, and generated file tracking.
Generated upb protobuf headers Removes map entry typedefs, adds explicit oneof clear functions, and updates field size metadata.
Misc: spiffe, URI, ALPN, random generators, comments, version info Adds spiffe utilities, improves URI construction, ALPN protocol support, random generator changes, doc updates.

Sequence Diagram(s)

sequenceDiagram
    participant App
    participant Channel
    participant Transport
    participant Endpoint
    participant Channelz
    participant Tracer

    App->>Channel: Create channel (target, args)
    Channel->>Transport: ChannelCreate(target, args)
    Transport->>Endpoint: Setup endpoints (control/data)
    Endpoint-->>Transport: Provide metrics/tracing sink
    Transport->>Channelz: Register socket node, data source
    Transport->>Tracer: Append trace events (frame, write, error)
    Channelz->>App: Expose JSON/trace info
    App->>Channel: Make call
    Channel->>Transport: StartCall
    Transport->>Endpoint: Read/Write (with metrics/tracing)
    Endpoint-->>Transport: Report write metrics/events
    Transport->>Tracer: Trace call activity
Loading

Possibly related PRs

  • Node v22.15.1 nsolid v5.7.2 release #310: Removes continuous CPU profiling and related gRPC/protobuf code in Node.js N|Solid, directly related to this PR's removal and refactor of profiling and transport code.

Suggested reviewers

  • juanarbol
  • RafaelGSS

Poem

🐇
In the tunnels of code where the packets do race,
New transports and tracing now quicken the pace.
Channelz nodes multiply, metrics abound,
With promises, sockets, and endpoints profound.
From oneof to tracing, the changes are grand—
A hoppy new future for gRPC is planned!
🥕✨

✨ Finishing Touches
  • 📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@windsurf-bot windsurf-bot bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 files skipped due to size limits:
  • deps/grpc/Makefile
  • deps/grpc/src/compiler/cpp_generator.cc
  • deps/grpc/src/core/BUILD

💡 To request another review, post a new comment with "/windsurf-review".

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 308046b and f074f4d.

📒 Files selected for processing (7)
  • deps/grpc/.bcr/presubmit.yml (1 hunks)
  • deps/grpc/Makefile (1 hunks)
  • deps/grpc/include/grpcpp/version_info.h (1 hunks)
  • deps/grpc/src/compiler/cpp_generator.cc (2 hunks)
  • deps/grpc/src/core/BUILD (2 hunks)
  • deps/grpc/src/core/load_balancing/xds/xds_override_host.cc (6 hunks)
  • deps/grpc/src/core/util/linux/env.cc (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: lint-js-and-md
  • GitHub Check: test-macOS
  • GitHub Check: build-docs
  • GitHub Check: test-linux (ubuntu-24.04)
  • GitHub Check: test-linux (ubuntu-24.04-arm)
  • GitHub Check: build-tarball
🔇 Additional comments (13)
deps/grpc/Makefile (1)

371-371: LGTM! Version update is correct.

The CPP_VERSION bump from 1.72.0 to 1.72.1 is consistent with the PR objective of updating grpc to version 1.72.1.

deps/grpc/.bcr/presubmit.yml (1)

41-41: LGTM! Bazel flag syntax update looks correct.

The change from --//third_party:use_openssl=true to --define=@grpc//third_party:use_openssl=true properly adds the --define= prefix and explicit workspace scope @grpc//, which is a valid Bazel flag format improvement.

deps/grpc/include/grpcpp/version_info.h (1)

23-25: LGTM! Version macros updated consistently.

Both the patch version macro and version string have been correctly updated from 1.72.0 to 1.72.1, maintaining consistency between the numeric and string representations.

deps/grpc/src/core/util/linux/env.cc (1)

40-46: LGTM! Build-time configuration option added correctly.

The conditional compilation logic properly adds support for the GRPC_FORCE_UNSECURE_GETENV macro to allow falling back to getenv when needed, while preserving the existing default behavior of using secure_getenv when available.

deps/grpc/src/compiler/cpp_generator.cc (1)

747-755: LGTM: Clean consolidation of includes with proper ordering.

The consolidation of headers into a single array with explicit documentation about ports_def.inc needing to be last is a good improvement. This ensures proper macro definition/undefinition pairing.

deps/grpc/src/core/BUILD (2)

39-42: LGTM! Clean addition of build configuration.

The new config_setting follows Bazel conventions correctly and provides a clear mechanism to control getenv behavior at build time. The naming is descriptive and the syntax is proper.


469-472: LGTM! Proper conditional compilation setup.

The defines attribute correctly uses select() to conditionally define the GRPC_FORCE_UNSECURE_GETENV macro based on the config_setting. This provides build-time control while maintaining backward compatibility (default behavior unchanged when the flag is not set).

deps/grpc/src/core/load_balancing/xds/xds_override_host.cc (6)

289-316: LGTM! Proper thread-safe storage for per-endpoint args.

The addition of per_endpoint_args_ member with appropriate getter/setter methods follows the established pattern for other guarded members. The mutex protection ensures thread safety.


806-812: Good use of move semantics in the AddressInfo constructor.

The struct properly extends to include per-endpoint args and efficiently moves all arguments in the constructor to avoid unnecessary copies.


847-849: LGTM! Proper propagation of endpoint args.

The code correctly passes endpoint.args() when constructing AddressInfo entries, ensuring per-endpoint channel arguments are captured from the resolver update.


883-888: Comprehensive logging of per-endpoint args.

Good addition to the debug logging to include per_endpoint_args for better observability when troubleshooting.


929-975: Well-designed lock management for subchannel creation.

The three-step process with clear documentation effectively handles the challenge of avoiding deadlock while maintaining consistency. The implementation:

  • Properly releases the lock before calling into arbitrary channel code
  • Handles race conditions by rechecking map state after reacquiring the lock
  • Returns early if another thread has already created the subchannel

This is a good example of careful concurrent programming.


1012-1024:

✅ Verification successful

Verify per-address args logging and propagation.

The Helper::CreateSubchannel method correctly logs and passes through the per-address args. The implementation maintains consistency with the overall design.

Let me verify that the per_address_args parameter is consistently used throughout the codebase for subchannel creation:


🏁 Script executed:

#!/bin/bash
# Check for other CreateSubchannel implementations to ensure consistency
ast-grep --pattern 'CreateSubchannel($$$)'

Length of output: 44


To ensure we’re not missing any other CreateSubchannel definitions or invocations, let’s broaden the search beyond ast-grep. We’ll use ripgrep to find all occurrences:


🏁 Script executed:

#!/bin/bash
# Search for all CreateSubchannel definitions and calls across the repo
rg -n "CreateSubchannel\(" || true

Length of output: 4486


Approve per-address args logging and propagation

I’ve verified that every CreateSubchannel(address, per_address_args, args) implementation across the load-balancing policies—including xds_override_host.cc lines 1012–1024—correctly accepts, logs, and forwards per_address_args to channel_control_helper(). No inconsistencies were found; the change aligns with the overall design.

@santigimeno santigimeno changed the title deps: update grpc to 1.72.1 deps: update grpc to 1.73.1 Jul 18, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 36

🔭 Outside diff range comments (3)
deps/grpc/src/core/ext/transport/chttp2/transport/frame.cc (1)

241-251: Consider handling stream ID overflow more robustly

While the code logs when last_stream_id exceeds the 31-bit limit, it silently truncates the value which could lead to protocol violations. Consider either:

  • Returning an error status to the caller
  • Using CHECK_LE to catch this as a programming error
  • Explicitly clamping the value with a clear API contract
-    if (GPR_UNLIKELY(frame.last_stream_id > RFC9113::kMaxStreamId31Bit)) {
-      LOG(ERROR) << "Stream ID will be truncated. The MSB will be set to 0 "
-                 << frame.last_stream_id;
-    }
+    CHECK_LE(frame.last_stream_id, RFC9113::kMaxStreamId31Bit)
+        << "GOAWAY last_stream_id exceeds 31-bit limit";
deps/grpc/src/core/BUILD (2)

2105-2105: Missing transport_auth_context BUILD rule and header

The transport_auth_context target has been added systematically in deps/grpc/src/core/BUILD (e.g. lines 3801, 4105, 4182, …, 9155), but there is no corresponding cc_library(name = "transport_auth_context", …) definition in any BUILD or .bzl file, nor is there a transport_auth_context.h header in the repo. This will break the build.

Please:

  • Add a cc_library(name = "transport_auth_context", srcs = […], hdrs = ["transport_auth_context.h"], …) rule in the appropriate BUILD file (or at the workspace root if that’s intended).
  • Ensure transport_auth_context.h (and any implementation .cc) are present under the matching directory.

1786-1793: Confirm presence of implementation files for new HTTP/2 transport components

It looks like you’ve added only header files for header_assembler, message_assembler, writable_stream_list, and stream_data_queue and wired them up in deps/grpc/src/core/BUILD, but no .cc implementation files were found in the repo. Please verify whether these libraries are meant to be fully header-only (with all definitions inline) or if their corresponding .cc files still need to be added and listed in the BUILD rules.

• BUILD entries located at deps/grpc/src/core/BUILD around lines 7694, 7715, 7735, and 7748
• Header files under deps/grpc/src/core/ext/transport/chttp2/transport/*.h
• No matching *.cc files detected for these components

🧹 Nitpick comments (37)
deps/grpc/src/core/ext/transport/chttp2/transport/http2_server_transport.h (2)

78-81: Consider implementing GetSocketNode() for channelz support

The method returns nullptr, which may prevent proper channelz integration. Consider implementing this if channelz support is needed for HTTP/2 server transports.


134-149: Track implementation of CloseStream and CloseTransport methods

Both methods are marked with TODO comments indicating they need implementation. These are critical for proper resource cleanup and should be tracked.

Would you like me to create issues to track the implementation of these critical cleanup methods?

deps/grpc/src/core/ext/transport/chttp2/transport/header_assembler.h (1)

33-34: TODO comment indicates missing benchmarks.

The TODO comment suggests micro benchmarks are planned for assembler and disassembler code. This indicates the implementation may not be performance-optimized yet.

Would you like me to help track this TODO or create an issue to ensure the benchmarks are implemented?

deps/grpc/src/core/ext/transport/chaotic_good/frame_transport.h (1)

71-75: Consider documenting the TODO comment.

The OutgoingFrame struct correctly encapsulates a frame with its optional tracer. However, the TODO comment about non-TCP transports should be addressed or tracked.

Would you like me to create an issue to track the non-TCP transport tracing design?

deps/grpc/src/core/client_channel/backup_poller.cc (1)

163-171: Helper function improves code clarity and maintainability.

The new g_can_poll_in_background() function encapsulates the conditional logic for determining whether background polling is supported. The function correctly:

  • Returns grpc_iomgr_run_in_background() when POSIX poller is instantiated
  • Returns false when POSIX poller is not instantiated (as noted in the comment)

Minor typo in comment.

There's a typo in the comment on line 168: "backgroung" should be "background".

-  // are able to run in backgroung.
+  // are able to run in background.
deps/grpc/src/core/ext/transport/chaotic_good/control_endpoint.h (1)

113-117: Document the intended use case for GetEventEngineEndpoint

This method exposes the underlying endpoint implementation, which could break encapsulation. Consider adding a comment explaining when and why callers should use this method, particularly if it's intended for channelz integration.

deps/grpc/src/core/ext/transport/chttp2/transport/ping_promise.cc (2)

1-2: Remove extra empty line in copyright header

The copyright header has an unusual double empty line at the beginning.

-//
-//
+//
 // Copyright 2025 gRPC authors.

92-93: Address TODO: Update keepalive flags

The TODO indicates missing keepalive flag updates when a ping is sent.

Would you like me to help implement the keepalive flag updates or create an issue to track this?

deps/grpc/src/core/channelz/channel_trace.h (2)

27-27: Unnecessary include of sys/types.h

The inclusion of <sys/types.h> appears unnecessary as the code doesn't use any types from this header. Consider removing it to reduce dependencies.

-#include <sys/types.h>

58-69: Consider making SeverityString constexpr

The SeverityString method could be made constexpr for compile-time evaluation when possible.

-static const char* SeverityString(ChannelTrace::Severity severity) {
+static constexpr const char* SeverityString(ChannelTrace::Severity severity) {
deps/grpc/src/core/ext/transport/chaotic_good/tcp_frame_header.h (2)

24-28: Potential integer overflow in padding calculation

The DataConnectionPadding function doesn't check for integer overflow when calculating alignment - (payload_length % alignment). While unlikely with typical alignment values, consider adding a check or using saturating arithmetic.

inline uint32_t DataConnectionPadding(uint32_t payload_length,
                                      uint32_t alignment) {
+  if (alignment == 0) return 0;  // Avoid division by zero
   if (payload_length % alignment == 0) return 0;
   return alignment - (payload_length % alignment);
}

42-42: Parse methods should document error conditions

The static Parse methods return StatusOr but don't document what error conditions might occur (e.g., insufficient data, invalid format). Consider adding documentation.

   // Parses a frame header from a buffer of kFrameHeaderSize bytes. All
   // kFrameHeaderSize bytes are consumed.
+  // Returns an error status if the data is malformed or insufficient.
   static absl::StatusOr<TcpFrameHeader> Parse(const uint8_t* data);

Also applies to: 65-65

deps/grpc/src/core/client_channel/global_subchannel_pool.cc (2)

76-77: Old map cleanup relies on RAII

The old_map1 and old_map2 variables will be destroyed when going out of scope, which should clean up the old AVL tree nodes. This is correct but subtle - consider adding a comment.

   SubchannelMap old_map1;
   SubchannelMap old_map2;
+  // Old maps will be destroyed when going out of scope, cleaning up old AVL nodes

119-119: Consider making shard count configurable

The hardcoded value of 127 shards might not be optimal for all deployments. Consider making this configurable or at least a named constant with documentation about why 127 was chosen.

+// Prime number chosen to reduce hash collisions while keeping memory overhead reasonable
+static constexpr size_t kDefaultShardCount = 127;
+
 size_t GlobalSubchannelPool::ShardIndex(const SubchannelKey& key) {
   absl::string_view addr(key.address().addr, key.address().len);
-  return absl::HashOf(addr) % kShards;
+  return absl::HashOf(addr) % kDefaultShardCount;
 }
deps/grpc/src/core/ext/transport/chttp2/transport/message_assembler.h (4)

37-38: Consider implementing the micro benchmarks for performance validation.

The TODO indicates that micro benchmarks are needed for the assembler and disassembler code. This would be valuable for validating performance characteristics, especially given the message batching and buffer management operations.

Would you like me to help generate micro benchmark code for the GrpcMessageAssembler and GrpcMessageDisassembler classes?


112-112: Address the TODO for Http2StatusOr integration.

The TODO indicates that Http2StatusOr should be used here once available. This suggests the error handling might need to be updated for consistency with the rest of the codebase.

Would you like me to help track this TODO or create an issue for implementing the Http2StatusOr integration?


158-160: Consider using std::min for clarity.

The ternary operator could be replaced with std::min for better readability.

-    const uint32_t current_length =
-        message_.Length() >= max_length ? max_length : message_.Length();
+    const uint32_t current_length = std::min(message_.Length(), max_length);

50-50: Consider documenting thread safety guarantees.

Both GrpcMessageAssembler and GrpcMessageDisassembler classes should document their thread safety guarantees. Based on the implementation and the comment about "one stream for its lifetime", these appear to be designed for single-threaded use per stream instance.

Add thread safety documentation to the class comments to clarify usage expectations.

Also applies to: 130-130

deps/grpc/src/core/client_channel/global_subchannel_pool.h (1)

78-78: Consider making the shard count configurable or power of 2.

The hardcoded value of 127 shards seems arbitrary. Consider:

  1. Making it configurable via environment variable or channel args
  2. Using a power of 2 (e.g., 128) for more efficient modulo operations in ShardIndex()
-  static const size_t kShards = 127;
+  static const size_t kShards = 128;  // Power of 2 for efficient modulo
deps/grpc/src/core/ext/transport/chaotic_good/server_transport.cc (2)

279-279: Consider graceful error handling instead of CHECK

Using CHECK(ctx_ != nullptr) will cause a crash if the context is null. While this might be intentional to catch programming errors early, consider whether graceful error handling would be more appropriate for a transport layer component.

If graceful error handling is preferred, consider returning an error status or using DCHECK for debug builds only:

-  CHECK(ctx_ != nullptr);
+  if (ctx_ == nullptr) {
+    LOG(ERROR) << "Frame transport context is null";
+    // Handle error appropriately
+    return;
+  }

311-312: Consider using VLOG or DEBUG level for transport closure logging

Logging transport closures at INFO level might be too verbose for production environments where transport closures are common during normal operation.

-  GRPC_TRACE_LOG(chaotic_good, INFO)
-      << "CHAOTIC_GOOD: OnFrameTransportClosed: " << status;
+  GRPC_TRACE_VLOG(chaotic_good, 2)
+      << "CHAOTIC_GOOD: OnFrameTransportClosed: " << status;
deps/grpc/src/core/ext/transport/chttp2/transport/http2_status.h (1)

95-95: Remove unnecessary semicolon after switch statement.

-  };
+  }
deps/grpc/src/core/ext/transport/chttp2/transport/http2_ztrace_collector.h (3)

1-17: Correct copyright year

The copyright notice shows 2025, which is in the future. This should be 2024.

-// Copyright 2025 The gRPC Authors
+// Copyright 2024 The gRPC Authors

28-36: Config class needs implementation

The Config class appears to be a stub. The Finishes method always returns false, which may prevent proper trace lifecycle management. Consider adding a TODO comment or implementing the actual configuration logic.

Do you want me to help implement the configuration logic or create an issue to track this?


40-240: Well-designed frame trace structures

The trace structs comprehensively capture HTTP/2 frame data with proper memory accounting and structured JSON output. Each struct correctly includes all relevant fields for its frame type according to RFC 9113.

Consider adding string constants for frame type names to avoid duplication:

namespace {
constexpr absl::string_view kDataFrameType = "DATA";
constexpr absl::string_view kHeadersFrameType = "HEADERS";
// ... etc
}
deps/grpc/src/core/ext/transport/chttp2/server/chttp2_server.cc (1)

761-761: Unusual nullptr cast pattern

The cast static_cast<Chttp2ServerListener*>(nullptr) is unusual. Consider using a more idiomatic approach:

-  listener_ = static_cast<Chttp2ServerListener*>(nullptr);
+  listener_ = nullptr;
deps/grpc/src/core/ext/transport/chttp2/transport/http2_server_transport.cc (3)

97-239: Skeleton implementations pending completion

All Process*Frame functions currently return Http2Status::Ok() with TODO comments indicating they need implementation. This is acceptable for initial skeleton code, but ensure these are implemented before the feature is enabled.

Do you want me to help create GitHub issues to track the implementation of each frame processor?


303-311: Consider using explicit discard instead of GRPC_UNUSED

While the error handling is correct, consider using std::ignore or (void) cast instead of GRPC_UNUSED for discarding the return value:

-    GRPC_UNUSED absl::Status error_status =
-        self->HandleError(Http2Status::AbslConnectionError(
-            status.code(), std::string(status.message())));
+    std::ignore = self->HandleError(Http2Status::AbslConnectionError(
+        status.code(), std::string(status.message())));

335-342: Apply same improvement to OnWriteLoopEnded

Same suggestion as OnReadLoopEnded for consistency.

-    GRPC_UNUSED absl::Status error_status =
-        self->HandleError(Http2Status::AbslConnectionError(
-            status.code(), std::string(status.message())));
+    std::ignore = self->HandleError(Http2Status::AbslConnectionError(
+        status.code(), std::string(status.message())));
deps/grpc/src/core/channelz/ztrace_collector.h (2)

38-38: Use default constructor for ZTraceImpl.

The explicitly defined empty constructor can be replaced with a default constructor.

-  explicit ZTraceImpl() {}
+  ZTraceImpl() = default;

286-290: Optimize single instance case to avoid iterator overhead.

When there's only one instance, using an iterator might be less efficient than direct access.

-          auto& instances = impl->instances;
-          auto& instance = *instances.begin();
+          auto& instance = **impl->instances.begin();
           const bool finishes = instance->config.Finishes(value.second);
           instance->Append(std::move(value));
           if (finishes) {
             instance->Finish(absl::OkStatus());
-            instances.clear();
+            impl->instances.clear();
           }
deps/grpc/src/core/ext/transport/chaotic_good/tcp_ztrace_collector.h (1)

87-89: MemoryUsage calculation may underestimate actual memory usage.

The calculation doesn't account for the memory used by the string contents in the metrics vector. Each absl::string_view may reference external string data that should be included in the memory estimate.

 size_t MemoryUsage() const {
-  return sizeof(*this) + sizeof(metrics[0]) * metrics.size();
+  size_t size = sizeof(*this) + sizeof(metrics[0]) * metrics.size();
+  // Note: This doesn't account for external string memory referenced by string_views
+  return size;
 }
deps/grpc/src/core/ext/transport/chttp2/transport/http2_client_transport.cc (2)

486-512: Consider async-safe stream lifecycle management

The comment correctly identifies that transport_mutex_ might cause issues with synchronous OnDone callbacks. Since MakeStream is called under stream_id_mutex_, consider using lock-free data structures or InterActivity patterns for stream_list_ management.


552-559: Verify half-close behavior on cancellation

The TODO correctly identifies a concern about sending END_STREAM when the call is cancelled. According to HTTP/2 spec, sending END_STREAM on a cancelled stream is acceptable but might waste resources.

Consider checking cancellation before sending the final END_STREAM frame to avoid unnecessary network traffic.

deps/grpc/src/core/BUILD (1)

858-859: Unusual dependencies for promise_like

The promise_like library is adding dependencies on function_signature and json, which seems unusual for a promise abstraction. These dependencies might indicate feature creep or improper separation of concerns.

Consider whether these dependencies truly belong in the promise abstraction or if they should be moved to a higher-level component.

deps/grpc/src/core/ext/transport/chttp2/transport/http2_client_transport.h (1)

361-369: Consider the architectural improvement suggested in the comment.

The comment about creating a separate ref-counted struct to avoid back references is valid. This would simplify the cleanup path and reduce coupling between transport members and the transport itself.

deps/grpc/src/core/channelz/channelz_registry.h (1)

240-241: Consider using a power of 2 for shard count.

The choice of 63 shards is unusual. Using a power of 2 (like 64) would allow for more efficient modulo operations using bit masking and potentially better hash distribution.

-  static constexpr size_t kNodeShards = 63;
+  static constexpr size_t kNodeShards = 64;
   size_t NodeShardIndex(BaseNode* node) {
-    return absl::HashOf(node) % kNodeShards;
+    return absl::HashOf(node) & (kNodeShards - 1);
   }

Comment on lines +31 to +44
class StreamDataQueue {
public:
~StreamDataQueue() = default;

StreamDataQueue(StreamDataQueue&& rhs) = delete;
StreamDataQueue& operator=(StreamDataQueue&& rhs) = delete;
StreamDataQueue(const StreamDataQueue&) = delete;
StreamDataQueue& operator=(const StreamDataQueue&) = delete;

private:
// TODO(akshitpatel) : [PH2][P1] Keep this either in the transport or here.
// Not both places.
GrpcMessageDisassembler disassembler;
};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Incomplete class implementation with design uncertainty.

The StreamDataQueue class appears to be a placeholder with several concerns:

  1. No public interface: The class has no public methods beyond the destructor, making it unusable in its current form.
  2. Design uncertainty: The TODO comment indicates uncertainty about where the GrpcMessageDisassembler should reside.
  3. Incomplete functionality: The class doesn't expose any way to interact with the disassembler member.

Consider these actions:

  • Define the public interface for StreamDataQueue
  • Resolve the TODO comment by deciding on the disassembler placement
  • Add necessary methods to make the class functional

🏁 Script executed:

#!/bin/bash
# Check if StreamDataQueue is actually used anywhere in the codebase
rg -A 5 -B 5 "StreamDataQueue" --type cpp --type h

Length of output: 1868


Implement StreamDataQueue interface and resolve disassembler placement

The new StreamDataQueue in
deps/grpc/src/core/ext/transport/chttp2/transport/stream_data_queue.h (lines 31–44)
is currently unused and provides no public methods:

  • No way to enqueue or dequeue HTTP/2 frames or message fragments.
  • The GrpcMessageDisassembler member has an open TODO about ownership.
  • No integration points in the transport—no calls to or from this class exist in the codebase.

Please address the following before landing:

• Define and implement the public API for StreamDataQueue (e.g., methods to accept raw data, produce reassembled messages, handle backpressure).
• Decide whether GrpcMessageDisassembler should live here or be owned by the surrounding transport component; remove the TODO and update accordingly.
• Integrate StreamDataQueue into the HTTP/2 stream lifecycle (so that it’s actually instantiated and invoked).

🤖 Prompt for AI Agents
In deps/grpc/src/core/ext/transport/chttp2/transport/stream_data_queue.h around
lines 31 to 44, the StreamDataQueue class lacks a public interface and
integration. Define and implement public methods to enqueue raw HTTP/2 data,
dequeue reassembled messages, and manage backpressure. Decide on the ownership
of GrpcMessageDisassembler—either keep it as a member here or move it to the
transport component—then remove the TODO comment accordingly. Finally, integrate
StreamDataQueue into the HTTP/2 stream lifecycle by instantiating and invoking
it within the transport code to ensure it is used.

Comment on lines +39 to +52
class HeaderAssembler {
public:
Http2ErrorCode AppendHeaderFrame(GRPC_UNUSED Http2HeaderFrame& frame) {
return Http2ErrorCode::kNoError;
}

Http2ErrorCode AppendContinuationFrame(
GRPC_UNUSED Http2ContinuationFrame& frame) {
return Http2ErrorCode::kNoError;
}

private:
GRPC_UNUSED uint32_t stream_id_;
};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Incomplete stub implementation detected.

The HeaderAssembler class contains stub implementations that simply return Http2ErrorCode::kNoError without any actual processing. The GRPC_UNUSED parameters and private unused stream_id_ member indicate this is placeholder code.

Please ensure this implementation is intended for the current release or consider adding proper error handling if this code could be invoked.

🤖 Prompt for AI Agents
In deps/grpc/src/core/ext/transport/chttp2/transport/header_assembler.h between
lines 39 and 52, the HeaderAssembler class methods AppendHeaderFrame and
AppendContinuationFrame are stub implementations that return success without
processing, and the stream_id_ member is unused. Review whether this placeholder
code is acceptable for the current release; if not, implement actual header
frame processing logic and proper error handling, and remove the GRPC_UNUSED
annotations to reflect real usage of parameters and members.

Comment on lines +96 to 126
auto Send(MessageHandle message, uint32_t stream_id,
std::shared_ptr<TcpCallTracer> call_tracer, Output& output) {
return If(
ShouldChunk(*message),
[&]() {
BeginMessageFrame begin;
begin.body.set_length(message->payload()->Length());
begin.stream_id = stream_id;
return Seq(output.Send(std::move(begin)),
Loop([chunker = message_chunker_detail::PayloadChunker(
max_chunk_size_, alignment_, stream_id,
std::move(*message->payload())),
&output]() mutable {
auto next = chunker.NextChunk();
return Map(output.Send(std::move(next.frame)),
[done = next.done](
StatusFlag x) -> LoopCtl<StatusFlag> {
if (!done) return Continue{};
return x;
});
}));
return Seq(
output.Send(OutgoingFrame{std::move(begin), call_tracer}),
Loop([chunker = message_chunker_detail::PayloadChunker(
max_chunk_size_, alignment_, stream_id,
std::move(*message->payload())),
&output, call_tracer = std::move(call_tracer)]() mutable {
auto next = chunker.NextChunk();
return Map(
output.Send(
OutgoingFrame{std::move(next.frame), call_tracer}),
[done = next.done](StatusFlag x) -> LoopCtl<StatusFlag> {
if (!done) return Continue{};
return x;
});
}));
},
[&]() {
MessageFrame frame;
frame.message = std::move(message);
frame.stream_id = stream_id;
return output.Send(std::move(frame));
return output.Send(OutgoingFrame{std::move(frame), nullptr});
});
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Inconsistent tracer propagation for chunked vs unchunked messages.

The implementation propagates the call_tracer for chunked messages but passes nullptr for unchunked messages (line 124). This inconsistency could lead to missing tracing data for unchunked messages.

Consider using the call_tracer for both paths to ensure consistent telemetry coverage.

-          return output.Send(OutgoingFrame{std::move(frame), nullptr});
+          return output.Send(OutgoingFrame{std::move(frame), call_tracer});
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
auto Send(MessageHandle message, uint32_t stream_id,
std::shared_ptr<TcpCallTracer> call_tracer, Output& output) {
return If(
ShouldChunk(*message),
[&]() {
BeginMessageFrame begin;
begin.body.set_length(message->payload()->Length());
begin.stream_id = stream_id;
return Seq(output.Send(std::move(begin)),
Loop([chunker = message_chunker_detail::PayloadChunker(
max_chunk_size_, alignment_, stream_id,
std::move(*message->payload())),
&output]() mutable {
auto next = chunker.NextChunk();
return Map(output.Send(std::move(next.frame)),
[done = next.done](
StatusFlag x) -> LoopCtl<StatusFlag> {
if (!done) return Continue{};
return x;
});
}));
return Seq(
output.Send(OutgoingFrame{std::move(begin), call_tracer}),
Loop([chunker = message_chunker_detail::PayloadChunker(
max_chunk_size_, alignment_, stream_id,
std::move(*message->payload())),
&output, call_tracer = std::move(call_tracer)]() mutable {
auto next = chunker.NextChunk();
return Map(
output.Send(
OutgoingFrame{std::move(next.frame), call_tracer}),
[done = next.done](StatusFlag x) -> LoopCtl<StatusFlag> {
if (!done) return Continue{};
return x;
});
}));
},
[&]() {
MessageFrame frame;
frame.message = std::move(message);
frame.stream_id = stream_id;
return output.Send(std::move(frame));
return output.Send(OutgoingFrame{std::move(frame), nullptr});
});
}
auto Send(MessageHandle message, uint32_t stream_id,
std::shared_ptr<TcpCallTracer> call_tracer, Output& output) {
return If(
ShouldChunk(*message),
[&]() {
BeginMessageFrame begin;
begin.body.set_length(message->payload()->Length());
begin.stream_id = stream_id;
return Seq(
output.Send(OutgoingFrame{std::move(begin), call_tracer}),
Loop([chunker = message_chunker_detail::PayloadChunker(
max_chunk_size_, alignment_, stream_id,
std::move(*message->payload())),
&output, call_tracer = std::move(call_tracer)]() mutable {
auto next = chunker.NextChunk();
return Map(
output.Send(
OutgoingFrame{std::move(next.frame), call_tracer}),
[done = next.done](StatusFlag x) -> LoopCtl<StatusFlag> {
if (!done) return Continue{};
return x;
});
}));
},
[&]() {
MessageFrame frame;
frame.message = std::move(message);
frame.stream_id = stream_id;
return output.Send(OutgoingFrame{std::move(frame), call_tracer});
});
}
🤖 Prompt for AI Agents
In deps/grpc/src/core/ext/transport/chaotic_good/message_chunker.h between lines
96 and 126, the call_tracer is propagated for chunked messages but set to
nullptr for unchunked messages, causing inconsistent tracing. Modify the
unchunked message path to pass the call_tracer instead of nullptr when
constructing the OutgoingFrame to ensure consistent tracer propagation and
telemetry coverage.

Comment on lines +56 to +59
arena->SetContext(ctx_->event_engine.get());
write_party_ = Party::Make(arena);
CHECK(event_engine != nullptr);
write_party_->arena()->SetContext(event_engine);
CHECK(ctx_->event_engine != nullptr);
write_party_->arena()->SetContext(ctx_->event_engine.get());
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Move the CHECK before dereferencing event_engine.

The CHECK for null event_engine at line 58 comes after it's already dereferenced at line 56. If ctx_->event_engine were null, the code would crash before reaching the CHECK.

Apply this diff to fix the issue:

   auto arena = SimpleArenaAllocator(0)->MakeArena();
-  arena->SetContext(ctx_->event_engine.get());
-  write_party_ = Party::Make(arena);
   CHECK(ctx_->event_engine != nullptr);
+  arena->SetContext(ctx_->event_engine.get());
+  write_party_ = Party::Make(arena);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
arena->SetContext(ctx_->event_engine.get());
write_party_ = Party::Make(arena);
CHECK(event_engine != nullptr);
write_party_->arena()->SetContext(event_engine);
CHECK(ctx_->event_engine != nullptr);
write_party_->arena()->SetContext(ctx_->event_engine.get());
auto arena = SimpleArenaAllocator(0)->MakeArena();
CHECK(ctx_->event_engine != nullptr);
arena->SetContext(ctx_->event_engine.get());
write_party_ = Party::Make(arena);
write_party_->arena()->SetContext(ctx_->event_engine.get());
🤖 Prompt for AI Agents
In deps/grpc/src/core/ext/transport/chaotic_good/control_endpoint.cc around
lines 56 to 59, the code dereferences ctx_->event_engine before checking if it
is null, which can cause a crash. Move the CHECK(ctx_->event_engine != nullptr)
line to before any dereferencing of ctx_->event_engine to ensure it is not null
before use.

Comment on lines +86 to +88
const ChannelArgs& handshake_result_args() const {
return handshake_result_args_.value();
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Add safety check for optional access.

The value() call on optional could throw std::bad_optional_access if the optional is not set. Consider adding a check or using a safer accessor.

Consider one of these approaches:

 const ChannelArgs& handshake_result_args() const {
+  CHECK(handshake_result_args_.has_value());
   return handshake_result_args_.value();
 }

Or return a pointer/optional reference to allow callers to check:

-const ChannelArgs& handshake_result_args() const {
-  return handshake_result_args_.value();
-}
+const ChannelArgs* handshake_result_args() const {
+  return handshake_result_args_.has_value() ? &*handshake_result_args_ : nullptr;
+}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const ChannelArgs& handshake_result_args() const {
return handshake_result_args_.value();
}
// Approach 1: assert that the value is present before unwrapping
const ChannelArgs& handshake_result_args() const {
CHECK(handshake_result_args_.has_value());
return handshake_result_args_.value();
}
Suggested change
const ChannelArgs& handshake_result_args() const {
return handshake_result_args_.value();
}
// Approach 2: return nullptr when there’s no value, letting callers handle absence
const ChannelArgs* handshake_result_args() const {
return handshake_result_args_.has_value()
? &*handshake_result_args_
: nullptr;
}
🤖 Prompt for AI Agents
In
deps/grpc/src/core/ext/transport/chaotic_good_legacy/server/chaotic_good_server.h
around lines 86 to 88, the method handshake_result_args() calls value() on an
optional without checking if it is set, which can throw
std::bad_optional_access. Modify the method to first check if
handshake_result_args_ has a value before accessing it, or change the return
type to a pointer or optional reference so callers can safely handle the absence
of a value.

Comment on lines +39 to +42
config_setting(
name = "force_unsecure_getenv",
values = {"define": "GRPC_FORCE_UNSECURE_GETENV=1"},
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Verify the security implications of GRPC_FORCE_UNSECURE_GETENV

This configuration setting allows forcing unsecure environment variable access. This could pose security risks if environment variables contain sensitive information.

Please ensure this is only used in development/testing scenarios and document its security implications.


🌐 Web query:

What is GRPC_FORCE_UNSECURE_GETENV and what are its security implications in gRPC?

💡 Result:

GRPC_FORCE_UNSECURE_GETENV is an environment variable or internal flag used in the gRPC library that controls whether gRPC allows insecure (non-TLS) channel creation based on environment variables. When this flag is enabled, gRPC permits the creation of insecure channels even if typically a secure channel would be expected.

What It Does

  • This flag can be checked inside gRPC implementations to determine if insecure channel creation should be forcibly allowed, usually by reading an environment variable (such as GRPC_FORCE_UNSECURE_GETENV) at runtime.
  • It is primarily used during development, testing, or in very controlled environments to bypass the default security settings that require TLS encryption.
  • By setting this flag, developers can override default secure behavior and force gRPC to work with insecure connections — meaning no encryption or data integrity verification during the remote procedure call communication.

Security Implications

  • Enabling GRPC_FORCE_UNSECURE_GETENV effectively disables default security protections in gRPC, allowing communication over unsecured channels.
  • This translates to communication without TLS, which puts the data transmitted over the network at risk of interception (Man-in-the-Middle attacks), eavesdropping, and tampering.
  • Using insecure channels can also expose the system to various injection attacks and increase the risk of compromised authentication and authorization, as the communication is not encrypted or verified.
  • It is considered dangerous to enable this flag in production or any environment where security is a concern, since sensitive data may be leaked.
  • The appropriate use of TLS and proper authentication/authorization mechanisms (such as mTLS, JWTs, or OAuth2 tokens) is recommended to maintain the confidentiality and integrity of gRPC communications[2][4][6][8].

Summary

Aspect Description
What it is Flag/environment variable allowing insecure gRPC connections
Default behavior gRPC enforces secure channels (TLS)
Effect of enabling it Allows unencrypted, insecure connections
Security risk Data interception, tampering, loss of confidentiality and integrity
Recommended use Development/testing only, never in production
Better alternatives Use TLS, mutual TLS (mTLS), JWT, OAuth 2.0 authentication

In summary, GRPC_FORCE_UNSECURE_GETENV enables insecure gRPC channels by overriding defaults that require secure (TLS) communication, and its use should be restricted to non-production scenarios due to serious security risks such as data leaks and Man-in-the-Middle attacks. Proper channel security with encryption and robust authentication mechanisms is fundamental for secure gRPC deployments[4][6][8].

Citations:


Document and Restrict GRPC_FORCE_UNSECURE_GETENV Usage

The new config_setting in deps/grpc/src/core/BUILD (lines 39–42):

config_setting(
    name = "force_unsecure_getenv",
    values = {"define": "GRPC_FORCE_UNSECURE_GETENV=1"},
)

enables insecure (non-TLS) gRPC channels by overriding the default security enforcement. This exposes RPC traffic to interception, tampering, and man-in-the-middle attacks.

Action items:

  • In the BUILD file (deps/grpc/src/core/BUILD @ lines 39–42), add a clear comment:
    ⚠️ DEVELOPMENT/TESTING ONLY — do not enable in production environments.”
  • Update the gRPC documentation (or a top-level README) to explain:
    • What GRPC_FORCE_UNSECURE_GETENV does
    • Its serious security implications
    • The recommended use case (dev/test only)
  • (Optional) If your Bazel setup supports separate build profiles, gate this config_setting behind a “dev” or “test” profile to prevent accidental production rollout.
🤖 Prompt for AI Agents
In deps/grpc/src/core/BUILD at lines 39 to 42, add a clear comment above the
config_setting named "force_unsecure_getenv" stating "⚠️ DEVELOPMENT/TESTING
ONLY — do not enable in production environments." Additionally, update the gRPC
documentation or a top-level README to explain what GRPC_FORCE_UNSECURE_GETENV
does, its serious security risks, and that it should only be used in development
or testing. Optionally, if your Bazel setup supports it, restrict this
config_setting to only be active in dev or test build profiles to avoid
accidental use in production.

Comment on lines +351 to +359
Promise<absl::Status> PingTimeout() override {
// TODO(akshitpatel) : [PH2][P1] : Trigger goaway here.
// Returns a promise that resolves once goaway is sent.
LOG(INFO) << "Ping timeout at time: " << Timestamp::Now();

return Immediate(
transport_->HandleError(Http2Status::Http2ConnectionError(
Http2ErrorCode::kRefusedStream, "Ping timeout")));
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

PingTimeout needs to send GOAWAY frame before closing connection.

The TODO at line 352 indicates that GOAWAY should be sent on ping timeout. Currently, it only triggers a connection error. This is important for proper HTTP/2 connection termination.

🤖 Prompt for AI Agents
In deps/grpc/src/core/ext/transport/chttp2/transport/http2_client_transport.h
around lines 351 to 359, the PingTimeout method currently only triggers a
connection error without sending a GOAWAY frame as indicated by the TODO
comment. To fix this, modify the method to first send a GOAWAY frame on the
transport before returning the error promise. Ensure the GOAWAY frame is
properly triggered to signal connection termination according to HTTP/2 protocol
before handling the error.

Comment on lines +239 to +256
// This function MUST be idempotent.
void CloseStream(uint32_t stream_id, absl::Status status,
DebugLocation whence = {}) {
HTTP2_CLIENT_DLOG << "Http2ClientTransport::CloseStream for stream id: "
<< stream_id << " status=" << status
<< " location=" << whence.file() << ":" << whence.line();
// TODO(akshitpatel) : [PH2][P1] : Implement this.
}
RefCountedPtr<Http2ClientTransport::Stream> LookupStream(uint32_t stream_id);

// This function MUST be idempotent.
void CloseTransport(const Http2Status& status, DebugLocation whence = {}) {
HTTP2_CLIENT_DLOG << "Http2ClientTransport::CloseTransport status="
<< status << " location=" << whence.file() << ":"
<< whence.line();
// TODO(akshitpatel) : [PH2][P1] : Implement this.
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Critical functions CloseStream and CloseTransport are not implemented.

Both functions are marked with [PH2][P1] TODOs indicating they must be implemented before the current sub-project is complete. These are essential for proper resource cleanup and error handling.

Would you like me to help implement these critical cleanup functions or create an issue to track this work?

🤖 Prompt for AI Agents
In deps/grpc/src/core/ext/transport/chttp2/transport/http2_client_transport.h
around lines 239 to 256, the critical functions CloseStream and CloseTransport
are currently unimplemented, which is necessary for proper resource cleanup and
error handling. Implement CloseStream to safely close the specified stream by
its stream_id, ensuring idempotency and updating any relevant internal state or
resources. Implement CloseTransport to close the entire transport connection
with the given status, also ensuring idempotency and performing necessary
cleanup of transport-level resources. Follow existing patterns for logging and
error handling, and remove the TODO comments once implemented.

Comment on lines +376 to +378
// TODO(ctiller): keeping channel args here can create odd circular references
// that are hard to reason about. Consider moving this to a DataSource.
ChannelArgs channel_args_;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Address the circular reference concern with channel args storage.

The TODOs correctly identify that storing channel args in nodes can create problematic circular references. Consider moving this to a DataSource as suggested to improve lifetime management and reduce coupling.

The DataSource pattern already established in this file would be a good fit for channel args storage, providing clear ownership and cleanup semantics.

Also applies to: 443-445, 494-496

🤖 Prompt for AI Agents
In deps/grpc/src/core/channelz/channelz.h around lines 376 to 378, 443 to 445,
and 494 to 496, the current storage of channel_args_ within nodes creates
circular references that complicate lifetime management. Refactor the code to
move channel_args_ storage into a DataSource object following the existing
DataSource pattern in this file. This will centralize ownership, simplify
cleanup, and reduce coupling by decoupling channel args from node lifetimes.

Comment on lines +248 to +251
DataSource::~DataSource() {
DCHECK(node_ == nullptr) << "DataSource must be ResetDataSource()'d in the "
"most derived class before destruction";
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Consider making DataSource destruction more robust.

The DCHECK requiring ResetDataSource() to be called in derived classes could lead to crashes in debug builds if developers forget to call it. Consider automatically calling ResetDataSource() in the destructor with a warning log instead.

 DataSource::~DataSource() {
-  DCHECK(node_ == nullptr) << "DataSource must be ResetDataSource()'d in the "
-                              "most derived class before destruction";
+  if (node_ != nullptr) {
+    LOG(ERROR) << "DataSource was not ResetDataSource()'d before destruction";
+    ResetDataSource();
+  }
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
DataSource::~DataSource() {
DCHECK(node_ == nullptr) << "DataSource must be ResetDataSource()'d in the "
"most derived class before destruction";
}
DataSource::~DataSource() {
if (node_ != nullptr) {
LOG(ERROR) << "DataSource was not ResetDataSource()'d before destruction";
ResetDataSource();
}
}
🤖 Prompt for AI Agents
In deps/grpc/src/core/channelz/channelz.cc around lines 248 to 251, the
DataSource destructor currently uses a DCHECK to enforce that ResetDataSource()
is called in derived classes, which can cause debug build crashes if forgotten.
To make destruction more robust, modify the destructor to check if node_ is not
nullptr, and if so, log a warning and call ResetDataSource() automatically
instead of asserting. This ensures safe cleanup without relying on derived
classes to always call ResetDataSource().

santigimeno added a commit that referenced this pull request Jul 18, 2025
PR-URL: #324
Reviewed-By: Rafael Gonzaga <[email protected]>
@santigimeno
Copy link
Member Author

Landed in cb229ed

@santigimeno santigimeno deleted the santi/update_grpc branch July 18, 2025 15:32
santigimeno added a commit that referenced this pull request Jul 23, 2025
PR-URL: #324
Reviewed-By: Rafael Gonzaga <[email protected]>
santigimeno added a commit that referenced this pull request Aug 25, 2025
PR-URL: #324
Reviewed-By: Rafael Gonzaga <[email protected]>
santigimeno added a commit that referenced this pull request Aug 26, 2025
PR-URL: #324
Reviewed-By: Rafael Gonzaga <[email protected]>
PR-URL: #359
Reviewed-By: Juan José Arboleda <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants