[None][feat] Support CancelRequest for Disaggregated Serving #6587

Shunkangz · 2025-08-04T03:58:44Z

Summary by CodeRabbit

New Features
- Added ability to cancel in-flight KV cache transfers through a new cancel_request API (available from C++ and Python).
- Introduced readiness signaling between sender and receiver to coordinate cancellations and avoid unnecessary work.
- Enhanced cancellation handling in the executor to defer termination until the transceiver confirms, improving stability in asynchronous workflows.
Tests
- Added tests for readiness signaling and canceling requests during transmission, including multi-GPU scenarios.

Description

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

coderabbitai · 2025-08-04T03:58:51Z

📝 Walkthrough

Walkthrough

Introduces end-to-end cancellation for in-flight KV-cache transmissions, adds a readiness signal between sender/receiver, extends cache/data transceiver public interfaces and implementations, updates agent connection notifications to carry ReadySignalInfo, exposes cancel_request to Python, integrates cancellation handling in PyExecutor, and updates unit/integration tests.

Changes

Cohort / File(s)	Summary
Cache transceiver API `cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h`, `cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp`	Adds BaseCacheTransceiver::cancelRequest and implements CacheTransceiver::cancelRequest routing to DataResponder/DataRequester based on request type.
Data transceiver interfaces `cpp/tensorrt_llm/batch_manager/dataTransceiver.h`, `cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.h`	Adds sendReadySignal/receiveReadySignal APIs and kREADY_SIGNAL_TAG; adds cancelRequest on DataResponder/DataRequester.
Data transceiver impl `cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp`, `cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp`	Implements cancellation tracking/handling on requester/responder; introduces ready signaling send/receive paths; integrates readiness gate before transfer.
Agent connection notifications `cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h`, `cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp`	Adds ReadySignalInfo and extends NotificationInfo variant; implements AgentConnection send/recv ready signal; generalizes AgentConnectionManager wait via templated waitForNotification and adds waitForReadySignal; adjusts waitForSyncInfo signature.
Python bindings (C++) `cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.cpp`, `cpp/tensorrt_llm/pybind/batch_manager/cacheTransceiver.cpp`	Exposes BaseCacheTransceiver::cancelRequest to Python (`cancel_request`); updates trampoline classes with cancelRequest overrides.
Python executor integration `tensorrt_llm/_torch/pyexecutor/kv_cache_transceiver.py`, `tensorrt_llm/_torch/pyexecutor/py_executor.py`	Adds KvCacheTransceiver.cancel_request and wrapper; PyExecutor tracks cancel_request_in_transmission and invokes kv_cache_transceiver.cancel_request for in-transit cancellations; defers termination accordingly.
Tests `cpp/tests/unit_tests/multi_gpu/cacheTransceiverTest.cpp`, `tests/unittest/others/test_kv_cache_transceiver.py`	Mocks ready signaling in C++ tests; extends Python test with transfer status checks and adds test_cancel_request_in_transmission.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Exec as PyExecutor
  participant PyTx as KvCacheTransceiver (Py)
  participant CTx as CacheTransceiver (C++)
  participant Req as DataRequester
  participant Res as DataResponder
  note over Exec,Res: Cancellation of in-flight KV-cache request

  Exec->>Exec: detect canceled request in-transit
  Exec->>PyTx: cancel_request(req)
  PyTx->>CTx: cancelRequest(req)
  alt Context-only request
    CTx->>Res: cancelRequest(req)
    Res->>Res: mark cancelled if queued/not active
    Res-->>CTx: bool (success/fail)
  else Generation-only request
    CTx->>Req: cancelRequest(req)
    Req->>Req: remove from local async queue if present
    Req-->>CTx: bool (success/fail)
  end
  CTx-->>PyTx: bool
  PyTx-->>Exec: bool
  note over Exec: defer termination if marked in cancel_request_in_transmission

sequenceDiagram
  autonumber
  participant Res as DataResponder
  participant Sender as DataSenderImpl
  participant Recv as DataReceiverImpl
  participant Req as DataRequester

  note over Res,Req: Readiness handshake before transfer

  Res->>Sender: sendReadySignal(requestId, isReady)
  Sender->>Recv: kREADY_SIGNAL_TAG (isReady)
  Req->>Recv: receiveReadySignal(session)
  Recv-->>Req: bool isReady
  alt isReady == true
    Req->>Req: proceed with receiveSync
  else isReady == false
    Req->>Req: set kDISAGG_TRANS_ERROR and end transfer early
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

[TRTLLM-7096][infra] Testing cache transmission functionality in Python #7025 — Touches the same Python test file for KV-cache transceiver; this PR extends it with new assertions and a cancellation-in-transit test.
[None][feat] Add Request specific exception #6931 — Modifies batch manager transceiver stack around request state/error handling, intersecting with newly added cancellation and readiness flows.

Suggested labels

KV-Cache Management

Suggested reviewers

Shixiaowei02
chuangz0

Pre-merge checks (1 passed, 2 warnings)

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Description Check	⚠️ Warning	The PR description is still the unfilled template stub and contains no explanation of what was changed, why it was changed, or which tests cover the new logic, leaving reviewers without context.	Please complete the Description section with a concise summary of the implementation details and objectives, list the relevant tests in the Test Coverage section, and remove any unused template placeholders.
Docstring Coverage	⚠️ Warning	Docstring coverage is 5.63% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The title succinctly specifies the core feature added—cancelRequest support for disaggregated serving—uses the correct prefix format, and clearly reflects the main change without unnecessary detail.

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
		  - name: "Undocumented Breaking Changes"
			  mode: "warning"
			  instructions: |
				  Flag potential breaking changes that are not documented:
				  1. Identify changes to public APIs/exports, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints (including removed/renamed items and changes to types, required params, return values, defaults, or behavior).
				  2. Ignore purely internal/private changes (e.g., code not exported from package entry points or marked internal).
				  3. Verify documentation exists: a "Breaking Change" section in the PR description and updates to CHANGELOG.md.

Please share your feedback with us on this Discord post.

✨ Finishing Touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2279cec and cbc9f98.

📒 Files selected for processing (5)

cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h (2 hunks)
cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp (1 hunks)
cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (4 hunks)
cpp/tensorrt_llm/batch_manager/dataTransceiver.h (2 hunks)
cpp/tensorrt_llm/pybind/batch_manager/cacheTransceiver.cpp (2 hunks)

🧰 Additional context used

📓 Path-based instructions (3)

**/*.{cpp,h,hpp,cc,cxx}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.{cpp,h,hpp,cc,cxx}: Closing braces of namespaces should have a comment saying the namespace it closes (e.g., } // namespace foo)
Prefer const or constexpr variables over #defines whenever possible, as the latter are not visible to the compiler.
A variable that is not modified after its initialization should be declared as const.
Except 0 (only used in comparison for checking signness/existence/emptiness) and nullptr, true, false, all other literals should only be used for variable initialization.
Use the Allman indentation style for braces in C++ code.
Put the semicolon for an empty for or while loop in a new line.
The statement forming the body of a switch, while, do .. while or for statement shall be a compound statement (use brace-delimited statements).
If and else should always be followed by brace-delimited statements, even if empty or a single statement.
C++ filenames should use camel case with first letter lowercase (e.g., thisIsAFilename.cpp), and all files involved in the compilation of a target must have filenames that are case-insensitive unique.
All types (including class names) are camel case with uppercase first letter (e.g., FooBarClass).
Local variables, methods, and namespaces use camel case with first letter lowercase (e.g., localFooBar).
Non-magic-number global variables that are non-static and not defined in anonymous namespace use camel case prefixed by a lower case 'g' (e.g., gDontUseGlobalFoos).
Non-magic-number global variables that are static or defined in an anonymous namespace use camel case prefixed by a lower case 's' (e.g., sMutableStaticGlobal).
Locally visible static variable uses camel case with lowercase prefix 's' as the first letter of the name (e.g., static std::once_flag sFlag;).
Class member variables use camel case prefixed with an 'm' (e.g., mNbFooValues). Public member variables do not require the 'm' prefix but it is encouraged for clarity.
Enumerations, global constants, static constants at class-scope and function-...

Files:

cpp/tensorrt_llm/batch_manager/dataTransceiver.h
cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp
cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h
cpp/tensorrt_llm/pybind/batch_manager/cacheTransceiver.cpp
cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp

**/*.{h,hpp}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

Use a preprocessor guard in header files. The guard name must have prefix TRTLLM_ followed by the filename, all in caps, and no trailing underscore.

Files:

cpp/tensorrt_llm/batch_manager/dataTransceiver.h
cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h

**/*.{cpp,h,hpp,cc,cxx,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

cpp/tensorrt_llm/batch_manager/dataTransceiver.h
cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp
cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h
cpp/tensorrt_llm/pybind/batch_manager/cacheTransceiver.cpp
cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp

🧠 Learnings (1)

📚 Learning: in tensorrt-llm's multimodal processing pipeline, shared tensor recovery using `from_shared_tensor()...

Learnt from: yechank-nvidia
PR: NVIDIA/TensorRT-LLM#6254
File: tensorrt_llm/_torch/pyexecutor/model_engine.py:1201-1204
Timestamp: 2025-07-22T09:22:14.726Z
Learning: In TensorRT-LLM's multimodal processing pipeline, shared tensor recovery using `from_shared_tensor()` is only needed during the context phase. Generation requests reuse the already-recovered tensor data and only need to call `strip_for_generation()` to remove unnecessary multimodal data while preserving the recovered tensors. This avoids redundant tensor recovery operations during generation.

Applied to files:

cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Pre-commit Check

🔇 Additional comments (9)

cpp/tensorrt_llm/batch_manager/dataTransceiver.h (2)

240-242: LGTM!

The method declaration is well-documented and follows the existing code style. The const qualifier is appropriate for a cancellation operation.

265-267: LGTM!

The method declaration is consistent with the DataResponder interface and properly documented.

cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h (2)

67-67: LGTM!

The pure virtual method declaration properly extends the abstract interface and maintains consistency with existing method declarations.

108-108: LGTM!

The override declaration is consistent with the base class interface and follows the established pattern.

cpp/tensorrt_llm/pybind/batch_manager/cacheTransceiver.cpp (2)

72-75: LGTM!

The pybind11 override implementation correctly handles the pure virtual method and follows the established pattern in the file.

88-88: LGTM!

The Python binding properly exposes the cancelRequest method using the appropriate snake_case naming convention for Python.

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (3)

135-155: LGTM with good thread safety and error handling.

The DataResponder cancellation logic correctly:

Uses proper mutex locking to ensure thread safety

Checks if the request is not currently being processed before canceling

Provides appropriate logging for non-cancelable requests

Has proper exception handling with re-throwing

554-557: LGTM!

The public wrapper method properly delegates to the implementation class.

571-574: LGTM!

The public wrapper method properly delegates to the implementation class.

cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cbc9f98 and 7f51fa0.

📒 Files selected for processing (7)

cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h (2 hunks)
cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp (1 hunks)
cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (9 hunks)
cpp/tensorrt_llm/batch_manager/dataTransceiver.h (4 hunks)
cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp (2 hunks)
cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.h (3 hunks)
cpp/tensorrt_llm/pybind/batch_manager/cacheTransceiver.cpp (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (5)

cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp
cpp/tensorrt_llm/batch_manager/dataTransceiver.h
cpp/tensorrt_llm/pybind/batch_manager/cacheTransceiver.cpp
cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h
cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp

🧰 Additional context used

📓 Path-based instructions (3)

**/*.{cpp,h,hpp,cc,cxx}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.{cpp,h,hpp,cc,cxx}: Closing braces of namespaces should have a comment saying the namespace it closes (e.g., } // namespace foo)
Prefer const or constexpr variables over #defines whenever possible, as the latter are not visible to the compiler.
A variable that is not modified after its initialization should be declared as const.
Except 0 (only used in comparison for checking signness/existence/emptiness) and nullptr, true, false, all other literals should only be used for variable initialization.
Use the Allman indentation style for braces in C++ code.
Put the semicolon for an empty for or while loop in a new line.
The statement forming the body of a switch, while, do .. while or for statement shall be a compound statement (use brace-delimited statements).
If and else should always be followed by brace-delimited statements, even if empty or a single statement.
C++ filenames should use camel case with first letter lowercase (e.g., thisIsAFilename.cpp), and all files involved in the compilation of a target must have filenames that are case-insensitive unique.
All types (including class names) are camel case with uppercase first letter (e.g., FooBarClass).
Local variables, methods, and namespaces use camel case with first letter lowercase (e.g., localFooBar).
Non-magic-number global variables that are non-static and not defined in anonymous namespace use camel case prefixed by a lower case 'g' (e.g., gDontUseGlobalFoos).
Non-magic-number global variables that are static or defined in an anonymous namespace use camel case prefixed by a lower case 's' (e.g., sMutableStaticGlobal).
Locally visible static variable uses camel case with lowercase prefix 's' as the first letter of the name (e.g., static std::once_flag sFlag;).
Class member variables use camel case prefixed with an 'm' (e.g., mNbFooValues). Public member variables do not require the 'm' prefix but it is encouraged for clarity.
Enumerations, global constants, static constants at class-scope and function-...

Files:

cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp
cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.h

**/*.{cpp,h,hpp,cc,cxx,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp
cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.h

**/*.{h,hpp}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

Use a preprocessor guard in header files. The guard name must have prefix TRTLLM_ followed by the filename, all in caps, and no trailing underscore.

Files:

cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.h

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Pre-commit Check

🔇 Additional comments (4)

cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.h (3)

39-39: LGTM! Well-defined constant for ready signal communication.

The addition of kREADY_SIGNAL_TAG with value 42 follows the established pattern of other tag constants and provides a clear identifier for ready signal communication between peers.

56-56: LGTM! Method signature follows interface contract.

The sendReadySignal method declaration correctly follows the virtual method signature from the base DataSender interface, with appropriate parameter types and override specification.

87-87: LGTM! Method signature follows interface contract.

The receiveReadySignal method declaration correctly follows the virtual method signature from the base DataReceiver interface, with appropriate parameter types and return type.

cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp (1)

106-116: LGTM! Clean implementation of ready signal broadcast.

The method correctly:

Validates the session exists using TLLM_CHECK

Retrieves all connections from the session

Broadcasts the ready signal to all connections using the appropriate tag

cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp

pcastonguay

Are there any tests that could be added to test/verify this new feature?

Shunkangz · 2025-08-09T05:23:42Z

Are there any tests that could be added to test/verify this new feature?

This PR is still in progress and I need to merge #6362 first. Thanks.

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp (1)
277-290: Clarify AND semantics, remove TODO, validate connections, and use fixed-width wire type for readiness

This still carries the unresolved TODO and uses bool over the wire. Also, dereferencing connections without validating they’re non-null risks crashes if not all connections are established. Prior feedback already requested clarifying the AND behavior and removing the TODO.

Apply the following to document AND semantics, validate connections, and read a fixed-width byte:
 bool DataReceiverImpl::receiveReadySignal(TransferSession& session)
 {
-    bool isReadyFinal = true;
-    bool isReady = false;
-    auto connections = session.getConnections();
-    // TODO: check if the logic is correct
-    for (size_t i = 0; i < connections.size(); i++)
-    {
-        connections.at(i)->recv(executor::kv_cache::DataContext{kREADY_SIGNAL_TAG}, &isReady, sizeof(isReady));
-        isReadyFinal &= isReady;
-    }
-
-    return isReadyFinal;
+    // Combine readiness across all peers: return true only if every connection signals ready.
+    bool isReadyFinal = true;
+    std::uint8_t wireReady = 0U;
+    auto connections = session.getConnections();
+    for (size_t i = 0; i < connections.size(); i++)
+    {
+        auto* conn = connections.at(i);
+        TLLM_CHECK(conn != nullptr);
+        conn->recv(executor::kv_cache::DataContext{kREADY_SIGNAL_TAG}, &wireReady, sizeof(wireReady));
+        bool const isReady = (wireReady != 0U);
+        isReadyFinal &= isReady;
+    }
+    return isReadyFinal;
 }
Notes:

I didn’t add early-exit on first false to avoid leaving unread messages pending on other connections. If your protocol guarantees this method is called only once per round and unread messages won’t accumulate, we can short-circuit for latency. Otherwise, keep draining all messages as above.

Please add unit/integration tests covering:

Single peer true/false.

Multiple peers mixed true/false — verify AND semantics.

Behavior when some connections are not yet established (should fail fast or never be invoked in that state). I can draft these tests if helpful.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 7f51fa0 and a5e97f9.

📒 Files selected for processing (6)

cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h (2 hunks)
cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp (1 hunks)
cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (9 hunks)
cpp/tensorrt_llm/batch_manager/dataTransceiver.h (4 hunks)
cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp (2 hunks)
cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.h (3 hunks)

🚧 Files skipped from review as they are similar to previous changes (5)

cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h
cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp
cpp/tensorrt_llm/batch_manager/dataTransceiver.h
cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp
cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.h

🧰 Additional context used

📓 Path-based instructions (4)

**/*.{cpp,cxx,cc,cu,h,hpp,hxx,hh,cuh}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.{cpp,cxx,cc,cu,h,hpp,hxx,hh,cuh}: In C++, close namespaces with a comment naming the namespace (e.g., } // namespace foo)
Prefer const/constexpr variables over #define for constants
Declare variables const if not modified after initialization
Use Allman brace style in C++
C++ filenames use lowerCamelCase and must be case-insensitively unique within a build target
C++ type names use UpperCamelCase
Local variables, methods, and namespaces use lowerCamelCase
Global non-static variables not in anonymous namespace use gPrefix lowerCamelCase (e.g., gExample)
Static globals or globals in anonymous namespaces use sPrefix lowerCamelCase
Locally visible static variables start with 's' (e.g., static std::once_flag sFlag;)
Member variables use mPrefix lowerCamelCase; public members may omit but are encouraged to use 'm'
Constants (enums, global/static/function-scope magic numbers) use kPREFIXED_UPPER_SNAKE (e.g., kDIGIT_NUM)
If macros are unavoidable, use UPPER_SNAKE_CASE (prefer constants over #define)
Constructor parameter that conflicts with a public member name gets trailing underscore (foo_)
Literal suffixes should be uppercase (e.g., 1234L not 1234l)
C++: use spaces only; indent 4 spaces
Run clang-format (LLVM style) before submitting; wrap lines at 120 characters
If formatting must be bypassed, use // clang-format off/on around the section
Prefer smart pointers; use unique_ptr for sole ownership, shared_ptr for shared; weak_ptr only in exceptional cases
Do not use deprecated pre-C++11 smart pointers
Use C++ style comments; avoid C comments except special inline cases; prefer // single-line
Capitalize and punctuate full-sentence comments
Follow Doxygen rules: use //! for comments and //!< for members in C++
Disable code with #if/#endif and mnemonic conditions; avoid commented-out code; avoid dead code
Do not throw exceptions across library boundaries
Use least-forceful casts; avoid removing const/volatile; avoid C-style and functional casts (except constructors); p...

Files:

cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp

**/*.{cpp,cxx,cc,cu}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.{cpp,cxx,cc,cu}: Avoid literal values except for 0, nullptr, true, false; use named constexpr for other literals
Place semicolon of empty for/while loop on a new line
Always use brace-delimited bodies for switch/while/do-for/if/else
Use inline C comments in argument lists when parameter meaning is unclear (e.g., /* checkForErrors = */ false)
Do not use assignment in subexpressions (e.g., if (x = y) ... is forbidden)
Switch on enums should enumerate all values and omit default to catch new values at compile time
Structure switch statements; prohibit fallthrough except between empty cases; each case ends with break or throw; return at end of case not allowed; put break inside braces for compound case
Prefer anonymous namespaces over static for internal linkage of functions
Every defined function must be called at least once (no unused methods)

Files:

cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp

**/*.{h,hpp,hxx,hh,cuh,cpp,cxx,cc,cu}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

Parameter names must be consistent between declarations and definitions

Files:

cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp

**/*.{cpp,cxx,cc,cu,h,hpp,hxx,hh,cuh,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

Prepend NVIDIA copyright header (current year) to all source files

Files:

cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp

🧠 Learnings (1)

📚 Learning: 2025-08-06T08:18:28.669Z

Learnt from: zhengd-nv
PR: NVIDIA/TensorRT-LLM#6633
File: cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp:145-155
Timestamp: 2025-08-06T08:18:28.669Z
Learning: In cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp, the existing `mMtxForMap` mutex in DataSenderImpl is sufficient to synchronize measurement file operations in the `release` method, as all file operations occur within the same critical section that protects the `mRequestToSession` map access.

Applied to files:

cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Pre-commit Check

cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (1)

259-306: Response loop can dereference empty mCurrentRequest; add guard and consolidate locking.

Avoid bad_optional_access on mCurrentRequest.value() and race on mReadyResponses iterator.

-                auto it = getCurrentResponse();
-                bool isReady = !isCancelled(mCurrentRequest.value());
+                if (!isSending())
+                {
+                    continue;
+                }
+                auto it = getCurrentResponse();
+                auto const curReqId = getCurrentRequestId();
+                bool isReady = !isCancelled(curReqId);
                 if (it != mReadyResponses.end() && isReady)
                 {
-                    auto reqId = mCurrentRequest.value();
+                    auto reqId = curReqId;
                     auto count = --mRemainSendCount[reqId];
@@
                 else
                 {
-                    // The request has been cancelled. Release the resources.
-                    auto it = mReadyResponses.find(mCurrentRequest.value());
+                    // The request has been cancelled. Release the resources.
+                    auto it = mReadyResponses.find(curReqId);
                     auto newException = std::make_exception_ptr(std::runtime_error("Request cancelled"));
                     it->second.mPromise.set_exception(newException);
                     {
                         std::unique_lock lkResp(mResponderMutex);
                         mReadyResponses.erase(it);
-                        mCancelledRequests.erase(mCurrentRequest.value());
+                        mCancelledRequests.erase(curReqId);
                     }
                     mCurrentRequest = std::nullopt;

♻️ Duplicate comments (1)

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (1)

428-455: Cancel in requester: handle missing processInfo and protect map access.

at(processInfo) can throw; also mInstanceToAsyncResource is accessed without synchronization from multiple threads.

-    bool cancelRequest(LlmRequest& llmRequest)
+    bool cancelRequest(LlmRequest& llmRequest)
     {
-
         std::string processInfo = "default";
         if (common::getEnvRequestKVCacheConcurrent())
         {
             processInfo = llmRequest.getDataTransceiverState().getCommState()->toString();
         }
 
         bool isCancelled = false;
-        auto& asyncResource = mInstanceToAsyncResource.at(processInfo);
+        auto itRes = mInstanceToAsyncResource.find(processInfo);
+        if (itRes == mInstanceToAsyncResource.end())
+        {
+            TLLM_LOG_WARNING("Cannot cancel request %zu: processInfo not found", llmRequest.mRequestId);
+            return false;
+        }
+        auto& asyncResource = itRes->second;
         {
             std::unique_lock<std::mutex> lck(asyncResource->mMtxForQueue);
             auto it = std::find_if(asyncResource->mRequestsQueue.begin(), asyncResource->mRequestsQueue.end(),
                 [&llmRequest](RequestAndPromise const& requestAndPromise)
                 { return requestAndPromise.mRequest->mRequestId == llmRequest.mRequestId; });
             if (it != asyncResource->mRequestsQueue.end())
             {
                 asyncResource->mRequestsQueue.erase(it);
                 isCancelled = true;
             }
             else
             {
                 TLLM_LOG_WARNING("Cannot cancel request %zu", llmRequest.mRequestId);
             }
         }
         return isCancelled;
     }

Follow-up: consider adding a mutex guarding mInstanceToAsyncResource and mRequestFutures to avoid data races.

🧹 Nitpick comments (5)

cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.h (2)

58-59: Document sendReadySignal contract (per-connection broadcast, payload size, threading).

Brief Doxygen helps prevent misuse; specify if thread-safe and whether it must be called after recvRequestInfo.

90-91: Clarify receiveReadySignal semantics.

Specify whether it blocks until all peers respond, what “false” means (cancel vs transient not-ready), and error handling on recv failures.
cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (3)
242-252: READY handshake: consider short-circuiting without touching mRemainSendCount when not ready.

Right now you still initialize mRemainSendCount even if isReady=false; move that initialization under the “ready” path.

359-363: Make isCancelled const-correct or document thread-safety.

If feasible, mark isCancelled as const and make mResponderMutex/mCancelledRequests mutable; otherwise leave as-is but add a brief comment that the mutex guards read access.

479-484: Remove unreachable return after TLLM_THROW.

The return is dead code.
-        if (!isReady)
-        {
-            TLLM_THROW("Request %zu is not ready to be received.", llmRequest.mRequestId);
-            return;
-        }
+        if (!isReady)
+        {
+            TLLM_THROW("Request %zu is not ready to be received.", llmRequest.mRequestId);
+        }

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between a5e97f9 and b538dab.

📒 Files selected for processing (6)

cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h (2 hunks)
cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp (1 hunks)
cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (9 hunks)
cpp/tensorrt_llm/batch_manager/dataTransceiver.h (4 hunks)
cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp (2 hunks)
cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.h (3 hunks)

🚧 Files skipped from review as they are similar to previous changes (3)

cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h
cpp/tensorrt_llm/batch_manager/dataTransceiver.h
cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp

🧰 Additional context used

📓 Path-based instructions (4)

**/*.{c,cc,cpp,cxx,cu}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.{c,cc,cpp,cxx,cu}: Closing braces of C++ namespaces must include a trailing comment naming the namespace (e.g., } // namespace foo)
Use Allman brace style; empty for/while loop semicolon on its own line; always use braces for control statements
C++ filenames must be lowerCamelCase (e.g., thisIsAFilename.cpp) and be case-insensitively unique within a compilation target
Use smart pointers; prefer unique_ptr for sole ownership, shared_ptr for shared; weak_ptr only in exceptional cases; do not use deprecated smart pointers
In implementation, prefer C++ comments (//); use inline C comments only for annotating parameters in calls (e.g., /* checkForErrors = */ false)
Do not use assignment in subexpressions (e.g., if (x = y) or chained x = y = z)

Files:

cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp
cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp

**/*.{c,cc,cpp,cxx,cu,h,hh,hpp,hxx,cuh}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.{c,cc,cpp,cxx,cu,h,hh,hpp,hxx,cuh}: Prefer const or constexpr variables over #define for constants; variables not modified after initialization must be declared const
Avoid using literals (except 0, nullptr, true, false) outside of initialization; prefer named constexpr constants
Type names (classes, structs, enums, typedefs) must be UpperCamelCase
Local variables, methods, and namespaces must be lowerCamelCase
Non-magic-number global variables that are non-static/not in anonymous namespace must be prefixed with g (e.g., gDontUseGlobalFoos)
Non-magic-number globals that are static or in an anonymous namespace must be prefixed with s (e.g., sMutableStaticGlobal)
Locally visible static variables should be lowerCamelCase prefixed with s (e.g., static std::once_flag sFlag)
Member variables should be lowerCamelCase prefixed with m (e.g., mNbFooValues); public members may omit but prefix is encouraged for clarity
Constants (enums, globals, static constants, and function-scope magic numbers) should be UPPER_SNAKE_CASE with k prefix (e.g., kDIGIT_NUM)
Avoid Hungarian notation except limited 'apps Hungarian' like nb for counts; literal suffixes should be uppercase (e.g., 1234L)
Use spaces only; indent with 4 spaces (no tabs)
Format C++ code with clang-format (LLVM style) and limit lines to 120 characters; exceptions must be bracketed with // clang-format off/on
Disable code with #if/#endif (prefer mnemonic conditions) or macros that noop in release; do not comment out code; avoid dead code
Use the least forceful cast necessary; avoid removing const/volatile; avoid C-style and functional casts (except explicit constructors); cast void* to T* with static_cast; use reinterpret_cast only as last resort; avoid dynamic_cast
Switch on enum should cover all values and omit default when possible; switch statements must be well-structured with no fall-through except between adjacent empty cases; each case must end with break or throw; returns at end of case are not allowed; if ...

Files:

cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp
cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.h
cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp

**/*.{c,cc,cpp,cxx,h,hh,hpp,hxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Prepend the NVIDIA copyright header (current year) to all source files (.cpp, .h, .cu, .py, etc.)

Files:

cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp
cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.h
cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp

**/*.{h,hh,hpp,hxx,cuh}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.{h,hh,hpp,hxx,cuh}: Closing braces of C++ namespaces in headers must include a trailing comment naming the namespace
Use Allman brace style and always use braces for control statements in headers as well
C++ header filenames must be lowerCamelCase and case-insensitively unique within a compilation target
Document public C++ interfaces with Doxygen using //! and //!<; C-style comments are not allowed except inline special cases; single-line comments should use // and be properly capitalized and punctuated if full sentences
Avoid assignment in subexpressions within header inline/template code as well
All class/function templates and their members should be instantiated at least once; if a class is not POD, its data members should be private
Use header include guards; name as TRTLLM__H (all caps of filename only, no dirs), no leading underscore and no trailing underscore

Files:

cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.h

🧠 Learnings (3)

📚 Learning: 2025-08-21T09:41:49.347Z

Learnt from: eopXD
PR: NVIDIA/TensorRT-LLM#6768
File: cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp:2010-2045
Timestamp: 2025-08-21T09:41:49.347Z
Learning: In cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp, updateSequenceCacheBlockOffsets is specifically for updating bookkeeping when blocks are added during the context phase, not for refreshing offsets after detach operations. During detach operations, GenerationRequest::removeFrontBlock handles the necessary cache block bookkeeping internally.

Applied to files:

cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp

📚 Learning: 2025-08-20T06:48:45.368Z

Learnt from: eopXD
PR: NVIDIA/TensorRT-LLM#6768
File: cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h:0-0
Timestamp: 2025-08-20T06:48:45.368Z
Learning: In cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp, updateSequenceCacheBlockOffsets is only called when adding a sequence, not during detach operations. During detach, the cache block bookkeeping is handled by GenerationRequest::removeFrontBlock.

Applied to files:

cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp

📚 Learning: 2025-08-06T08:18:28.669Z

Learnt from: zhengd-nv
PR: NVIDIA/TensorRT-LLM#6633
File: cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp:145-155
Timestamp: 2025-08-06T08:18:28.669Z
Learning: In cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp, the existing `mMtxForMap` mutex in DataSenderImpl is sufficient to synchronize measurement file operations in the `release` method, as all file operations occur within the same critical section that protects the `mRequestToSession` map access.

Applied to files:

cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.h
cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp

🧬 Code graph analysis (3)

cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp (2)

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (13)

cancelRequest (618-621)

cancelRequest (618-618)

cancelRequest (635-638)

cancelRequest (635-635)

llmRequest (145-145)

llmRequest (172-188)

llmRequest (172-172)

llmRequest (388-388)

llmRequest (394-394)

llmRequest (428-455)

llmRequest (428-428)

llmRequest (471-491)

llmRequest (471-471)

cpp/tensorrt_llm/pybind/batch_manager/cacheTransceiver.cpp (7)

llmRequest (42-45)

llmRequest (42-42)

llmRequest (47-50)

llmRequest (47-47)

llmRequest (52-55)

llmRequest (52-52)

llmRequest (72-75)

cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.h (1)

cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp (4)

sendReadySignal (123-133)

sendReadySignal (123-123)

receiveReadySignal (277-290)

receiveReadySignal (277-277)

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (5)

cpp/tensorrt_llm/batch_manager/dataTransceiver.h (2)

nodiscard (162-172)

nodiscard (204-241)

cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp (3)

nodiscard (61-112)

nodiscard (135-138)

nodiscard (145-150)

cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.h (1)

nodiscard (54-118)

cpp/tensorrt_llm/common/envUtils.cpp (2)

getEnvRequestKVCacheConcurrent (333-337)

getEnvRequestKVCacheConcurrent (333-333)

cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp (2)

cancelRequest (551-562)

cancelRequest (551-551)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Pre-commit Check

🔇 Additional comments (4)

cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp (1)
551-562: Harden cancelRequest: null-check and log unsupported types; keep behavior explicit.

Add TLLM_CHECK for llmRequest and warn/return false for unsupported request types.
[ suggest_essential_refactor ]
 bool CacheTransceiver::cancelRequest(LlmRequest* llmRequest)
 {
-    if (llmRequest->isContextOnlyRequest())
+    TLLM_CHECK(llmRequest);
+    if (llmRequest->isContextOnlyRequest())
     {
         return mDataResponder->cancelRequest(*llmRequest);
     }
     else if (llmRequest->isGenerationOnlyRequest())
     {
         return mDataRequester->cancelRequest(*llmRequest);
     }
-    return false;
+    TLLM_LOG_WARNING("Cannot cancel request %zu: unsupported request type", llmRequest->mRequestId);
+    return false;
 }
cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.h (1)

41-41: READY signal tag addition looks good.

Constant is named and scoped; no issues.

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (2)

618-621: Forwarder LGTM.

Simple delegation; consistent with interface.

635-638: Forwarder LGTM.

Simple delegation; consistent with interface.

cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp

Shunkangz · 2025-08-27T03:34:07Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-08-27T03:39:16Z

PR_Github #16629 [ run ] triggered by Bot

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (1)

259-305: On cancel path, release session and fulfill the promise to avoid leaks/hangs.

TransferSession held by the sender is never released on cancel → memory/state leak.
The associated promise is erased without being fulfilled → futures may complete with broken_promise or hang depending on usage.

Apply this diff to set an explicit exception, clean up counters, and release the session:

-                else
-                {
-                    // The request has been cancelled. Release the resources.
-                    auto it = mReadyResponses.find(mCurrentRequest.value());
-                    {
-                        std::unique_lock lkResp(mResponderMutex);
-                        mReadyResponses.erase(it);
-                        mCancelledRequests.erase(mCurrentRequest.value());
-                        mRemainSendCount.erase(mCurrentRequest.value());
-                    }
-                    mCurrentRequest = std::nullopt;
-
-                    if (mReadyResponses.empty())
-                    {
-                        std::unique_lock lk(mCondMutex);
-                        mAnyReady = false;
-                    }
-                }
+                else
+                {
+                    // The request has been cancelled. Fulfill promise and release resources.
+                    auto const reqId = mCurrentRequest.value();
+                    {
+                        std::unique_lock lkResp(mResponderMutex);
+                        auto itReady = mReadyResponses.find(reqId);
+                        if (itReady != mReadyResponses.end())
+                        {
+                            itReady->second.mPromise.set_exception(
+                                std::make_exception_ptr(std::runtime_error("Request cancelled")));
+                            mReadyResponses.erase(itReady);
+                        }
+                        mCancelledRequests.erase(reqId);
+                        mRemainSendCount.erase(reqId);
+                    }
+                    // Release sender-side session created in recvRequestInfo()
+                    mSender->release(reqId);
+                    mCurrentRequest = std::nullopt;
+
+                    if (mReadyResponses.empty())
+                    {
+                        std::unique_lock lk(mCondMutex);
+                        mAnyReady = false;
+                    }
+                }

♻️ Duplicate comments (3)

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (3)

172-188: Wake responder after marking cancellation (and avoid silent delays).

After inserting into mCancelledRequests, set mAnyReady and notify the CV so the responder reacts promptly. This also aligns with prior feedback.

Apply this diff:

     bool cancelRequest(LlmRequest& llmRequest)
     {
         bool isCancelled = false;
         std::unique_lock lkResp(mResponderMutex);
         auto it = mReadyResponses.find(llmRequest.mRequestId);
         // If the request is not the current request and already in the ready queue, we can cancel it.
         if (it != mReadyResponses.end() && (!isSending() || getCurrentRequestId() != llmRequest.mRequestId))
         {
             mCancelledRequests.insert(llmRequest.mRequestId);
             isCancelled = true;
         }
         else
         {
             TLLM_LOG_WARNING("Cannot cancel request %zu", llmRequest.mRequestId);
         }
+        if (isCancelled)
+        {
+            std::unique_lock lkCond(mCondMutex);
+            mAnyReady = true; // ensure responder proceeds
+        }
+        mResponderCv.notify_all();
         return isCancelled;
     }

373-373: Missing direct includes for std::set and std::deque.

mCancelledRequests uses std::set and AsyncResource uses std::deque but the file doesn't include /. Add direct includes to avoid reliance on transitive headers.

Add near the other standard headers:

#include <set>
#include <deque>

427-454: Guard missing processInfo entry and complete the promise on cancel.

at(processInfo) can throw if the async resource hasn’t been created.
Erasing the queued item without setting the promise leaves the future in broken or waiting state.

Apply this diff:

     bool cancelRequest(LlmRequest& llmRequest)
     {
-
-        std::string processInfo = "default";
+        std::string processInfo = "default";
         if (common::getEnvRequestKVCacheConcurrent())
         {
             processInfo = llmRequest.getDataTransceiverState().getCommState()->toString();
         }
 
         bool isCancelled = false;
-        auto& asyncResource = mInstanceToAsyncResource.at(processInfo);
+        auto itAsync = mInstanceToAsyncResource.find(processInfo);
+        if (itAsync == mInstanceToAsyncResource.end())
+        {
+            TLLM_LOG_WARNING("Cannot cancel request %zu: processInfo not found", llmRequest.mRequestId);
+            return false;
+        }
+        auto& asyncResource = itAsync->second;
         {
             std::unique_lock<std::mutex> lck(asyncResource->mMtxForQueue);
             auto it = std::find_if(asyncResource->mRequestsQueue.begin(), asyncResource->mRequestsQueue.end(),
                 [&llmRequest](RequestAndPromise const& requestAndPromise)
                 { return requestAndPromise.mRequest->mRequestId == llmRequest.mRequestId; });
             if (it != asyncResource->mRequestsQueue.end())
             {
-                asyncResource->mRequestsQueue.erase(it);
+                // Complete the promise to unblock waiters
+                if (it->mPromise)
+                {
+                    it->mPromise->set_exception(std::make_exception_ptr(std::runtime_error("Request cancelled")));
+                }
+                asyncResource->mRequestsQueue.erase(it);
                 isCancelled = true;
             }
             else
             {
                 TLLM_LOG_WARNING("Cannot cancel request %zu", llmRequest.mRequestId);
             }
         }
+        asyncResource->mCVforQueue.notify_all();
         return isCancelled;
     }

🧹 Nitpick comments (3)

tests/unittest/others/test_kv_cache_transceiver.py (2)

1-1: Avoid fixed sleeps; prefer polling/futures to reduce flakiness.

time.sleep-based sync is brittle on busy CI. Below I propose a retry-loop for cancel and waiting on the returned future instead of sleeping.

130-196: Harden cancellation test: retry cancel, distinct IDs, await async, and assert no mutation.

Retry cancel until it succeeds (avoid racing the sender).
Use a different request_id for gen to avoid accidental collisions.
Capture and await the async future instead of sleeping.
Assert the gen buffer stays unchanged after cancellation.

Apply this diff within the shown hunk:

@@
 def test_cancel_request_in_transmission(attention_type):
@@
-    ctx_request = LlmRequest(
+    ctx_request = LlmRequest(
         request_id=0,
@@
-    # wait for ctx request to be sent
-    time.sleep(2)
-
-    # cancel ctx request
-    is_cancelled = kv_cache_transceiver_ctx.cancel_request(ctx_request)
-    assert is_cancelled
+    # cancel ctx request (retry until it’s not the current-sending one)
+    start = time.monotonic()
+    is_cancelled = False
+    while time.monotonic() - start < 5.0:
+        if kv_cache_transceiver_ctx.cancel_request(ctx_request):
+            is_cancelled = True
+            break
+        time.sleep(0.01)
+    assert is_cancelled, "Failed to cancel ctx request within 5s"
@@
-    gen_request = LlmRequest(
-        request_id=0,
+    gen_request = LlmRequest(
+        request_id=1,
         max_new_tokens=1,
@@
-    kv_cache_manager_gen.impl.add_sequence(gen_request.py_request_id,
+    # Snapshot before receive to assert no mutation on cancel
+    gen_before = kv_cache_manager_gen.get_buffers(0).clone()
+    kv_cache_manager_gen.impl.add_sequence(gen_request.py_request_id,
                                            gen_request.prompt_len, 1,
                                            gen_request)
@@
-    kv_cache_transceiver_gen.request_and_receive_async(gen_request)
-
-    # Block the main thread due to the async operation
-    time.sleep(2)
+    fut = kv_cache_transceiver_gen.request_and_receive_async(gen_request)
+    # Prefer awaiting the future if available; otherwise fall back to a short sleep
+    try:
+        if fut is not None:
+            fut.result(timeout=5)
+        else:
+            time.sleep(0.2)
+    except Exception:
+        # Cancellation path may propagate as an exception; acceptable for this test.
+        pass
+    # Verify the cancelled transfer did not modify the gen buffer
+    assert torch.equal(kv_cache_manager_gen.get_buffers(0), gen_before), "gen buffer changed despite cancellation"

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (1)

358-363: Minor: make isCancelled const-friendly.

Method reads state under mutex; consider marking it const and the mutex as mutable to communicate intent. Low priority.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between b538dab and db0aef7.

📒 Files selected for processing (2)

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (9 hunks)
tests/unittest/others/test_kv_cache_transceiver.py (2 hunks)

🧰 Additional context used

📓 Path-based instructions (4)

**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Code must target Python 3.8+
Indent Python code with 4 spaces; do not use tabs
Preserve module namespaces when importing; import modules/packages and access members via the module (e.g., from package.subpackage import foo; foo.SomeClass())
Python file names should be snake_case
Python class names should be PascalCase
Python functions/methods and local variables should be snake_case; variables beginning with a number should be prefixed with k_ (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE prefixed with G_ (e.g., G_MY_GLOBAL); constants should be UPPER_SNAKE_CASE
Avoid shadowing variables from outer scopes; initialize all externally visible members in init
Prefer docstrings for interfaces used outside a file; comments should be reserved for in-function or file-local interfaces
Use Google-style docstrings for classes and functions; attributes and variables may be documented inline with trailing string literals
Avoid reflection when simpler, explicit code suffices (e.g., avoid dict(**locals()) patterns)
In try/except, catch the narrowest exceptions possible
For duck-typing patterns, keep the try body minimal and move logic to else to avoid masking unrelated failures

Files:

tests/unittest/others/test_kv_cache_transceiver.py

**/*.{c,cc,cpp,cxx,h,hh,hpp,hxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Prepend the NVIDIA copyright header (current year) to all source files (.cpp, .h, .cu, .py, etc.)

Files:

tests/unittest/others/test_kv_cache_transceiver.py
cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp

**/*.{c,cc,cpp,cxx,cu}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.{c,cc,cpp,cxx,cu}: Closing braces of C++ namespaces must include a trailing comment naming the namespace (e.g., } // namespace foo)
Use Allman brace style; empty for/while loop semicolon on its own line; always use braces for control statements
C++ filenames must be lowerCamelCase (e.g., thisIsAFilename.cpp) and be case-insensitively unique within a compilation target
Use smart pointers; prefer unique_ptr for sole ownership, shared_ptr for shared; weak_ptr only in exceptional cases; do not use deprecated smart pointers
In implementation, prefer C++ comments (//); use inline C comments only for annotating parameters in calls (e.g., /* checkForErrors = */ false)
Do not use assignment in subexpressions (e.g., if (x = y) or chained x = y = z)

Files:

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp

**/*.{c,cc,cpp,cxx,cu,h,hh,hpp,hxx,cuh}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.{c,cc,cpp,cxx,cu,h,hh,hpp,hxx,cuh}: Prefer const or constexpr variables over #define for constants; variables not modified after initialization must be declared const
Avoid using literals (except 0, nullptr, true, false) outside of initialization; prefer named constexpr constants
Type names (classes, structs, enums, typedefs) must be UpperCamelCase
Local variables, methods, and namespaces must be lowerCamelCase
Non-magic-number global variables that are non-static/not in anonymous namespace must be prefixed with g (e.g., gDontUseGlobalFoos)
Non-magic-number globals that are static or in an anonymous namespace must be prefixed with s (e.g., sMutableStaticGlobal)
Locally visible static variables should be lowerCamelCase prefixed with s (e.g., static std::once_flag sFlag)
Member variables should be lowerCamelCase prefixed with m (e.g., mNbFooValues); public members may omit but prefix is encouraged for clarity
Constants (enums, globals, static constants, and function-scope magic numbers) should be UPPER_SNAKE_CASE with k prefix (e.g., kDIGIT_NUM)
Avoid Hungarian notation except limited 'apps Hungarian' like nb for counts; literal suffixes should be uppercase (e.g., 1234L)
Use spaces only; indent with 4 spaces (no tabs)
Format C++ code with clang-format (LLVM style) and limit lines to 120 characters; exceptions must be bracketed with // clang-format off/on
Disable code with #if/#endif (prefer mnemonic conditions) or macros that noop in release; do not comment out code; avoid dead code
Use the least forceful cast necessary; avoid removing const/volatile; avoid C-style and functional casts (except explicit constructors); cast void* to T* with static_cast; use reinterpret_cast only as last resort; avoid dynamic_cast
Switch on enum should cover all values and omit default when possible; switch statements must be well-structured with no fall-through except between adjacent empty cases; each case must end with break or throw; returns at end of case are not allowed; if ...

Files:

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp

🧠 Learnings (1)

📚 Learning: 2025-08-06T08:18:28.669Z

Learnt from: zhengd-nv
PR: NVIDIA/TensorRT-LLM#6633
File: cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp:145-155
Timestamp: 2025-08-06T08:18:28.669Z
Learning: In cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp, the existing `mMtxForMap` mutex in DataSenderImpl is sufficient to synchronize measurement file operations in the `release` method, as all file operations occur within the same critical section that protects the `mRequestToSession` map access.

Applied to files:

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp

🧬 Code graph analysis (2)

tests/unittest/others/test_kv_cache_transceiver.py (4)

tensorrt_llm/mapping.py (1)

Mapping (32-513)

tensorrt_llm/_torch/pyexecutor/kv_cache_transceiver.py (1)

create_kv_cache_transceiver (30-63)

tensorrt_llm/_torch/pyexecutor/llm_request.py (1)

LlmRequest (282-424)

tensorrt_llm/sampling_params.py (1)

_get_sampling_config (408-438)

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (4)

cpp/tensorrt_llm/batch_manager/dataTransceiver.h (2)

nodiscard (162-172)

nodiscard (204-241)

cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp (3)

nodiscard (61-112)

nodiscard (135-138)

nodiscard (145-150)

cpp/tensorrt_llm/common/envUtils.cpp (2)

getEnvRequestKVCacheConcurrent (333-337)

getEnvRequestKVCacheConcurrent (333-333)

cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp (2)

cancelRequest (551-562)

cancelRequest (551-551)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Pre-commit Check

🔇 Additional comments (1)

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (1)

242-252: Ready-signal handshake on cancellation looks good.

sendReadySignal(reqId, isReady) based on the cancelled set is the right contract for short-circuiting receivers.

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp

tensorrt-cicd · 2025-08-27T04:24:34Z

PR_Github #16629 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #12485 completed with status: 'FAILURE'

Shunkangz · 2025-08-27T04:30:35Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-08-27T04:35:59Z

PR_Github #16637 [ run ] triggered by Bot

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

cpp/tensorrt_llm/pybind/batch_manager/cacheTransceiver.cpp (1)

81-81: Fix typo: use py::class_ (not py::classh)

Compilation will fail with pybind11: classh is invalid.

Apply:

-    py::classh<tb::BaseCacheTransceiver, PyCacheTransceiver>(m, "BaseCacheTransceiver")
+    py::class_<tb::BaseCacheTransceiver, PyCacheTransceiver>(m, "BaseCacheTransceiver")
...
-    py::classh<tb::CacheTransceiver, tb::BaseCacheTransceiver>(m, "CacheTransceiver")
+    py::class_<tb::CacheTransceiver, tb::BaseCacheTransceiver>(m, "CacheTransceiver")

Also applies to: 94-94

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (1)

259-305: Propagate cancellation to waiting futures instead of dropping promise

Currently the promise is erased without signaling; callers will see std::future_error(broken_promise) instead of a clear cancellation error.

-                else
-                {
-                    // The request has been cancelled. Release the resources.
-                    auto it = mReadyResponses.find(mCurrentRequest.value());
-                    {
-                        std::unique_lock lkResp(mResponderMutex);
-                        mReadyResponses.erase(it);
-                        mCancelledRequests.erase(mCurrentRequest.value());
-                        mRemainSendCount.erase(mCurrentRequest.value());
-                    }
-                    mCurrentRequest = std::nullopt;
-
-                    if (mReadyResponses.empty())
-                    {
-                        std::unique_lock lk(mCondMutex);
-                        mAnyReady = false;
-                    }
-                }
+                else
+                {
+                    // The request has been cancelled. Notify the waiter and release resources.
+                    auto curId = mCurrentRequest.value();
+                    {
+                        std::unique_lock lkResp(mResponderMutex);
+                        auto it2 = mReadyResponses.find(curId);
+                        if (it2 != mReadyResponses.end())
+                        {
+                            it2->second.mPromise.set_exception(
+                                std::make_exception_ptr(std::runtime_error("Request cancelled")));
+                            mReadyResponses.erase(it2);
+                        }
+                        mCancelledRequests.erase(curId);
+                        mRemainSendCount.erase(curId);
+                    }
+                    mCurrentRequest = std::nullopt;
+
+                    if (mReadyResponses.empty())
+                    {
+                        std::unique_lock lk(mCondMutex);
+                        mAnyReady = false;
+                    }
+                }

Add direct include:

+#include <stdexcept>

near other standard headers.

♻️ Duplicate comments (3)

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (3)

172-188: Wake responder thread after marking cancellation

Without notifying, the responder may not promptly observe the cancellation.

     bool cancelRequest(LlmRequest& llmRequest)
     {
         bool isCancelled = false;
         std::unique_lock lkResp(mResponderMutex);
         auto it = mReadyResponses.find(llmRequest.mRequestId);
         // If the request is not the current request and already in the ready queue, we can cancel it.
         if (it != mReadyResponses.end() && (!isSending() || getCurrentRequestId() != llmRequest.mRequestId))
         {
             mCancelledRequests.insert(llmRequest.mRequestId);
             isCancelled = true;
         }
         else
         {
             TLLM_LOG_WARNING("Cannot cancel request %zu", llmRequest.mRequestId);
         }
+        if (isCancelled)
+        {
+            std::unique_lock lkCond(mCondMutex);
+            mAnyReady = true; // ensure waiter proceeds
+            mResponderCv.notify_all();
+        }
         return isCancelled;
     }

373-373: Add missing header for std::set

Avoid relying on transitive includes.

 #include <unordered_map>
+#include <set>

427-454: Guard missing processInfo entry to avoid out_of_range

at() will throw if the key isn’t present; handle gracefully.

     bool cancelRequest(LlmRequest& llmRequest)
     {
 
         std::string processInfo = "default";
         if (common::getEnvRequestKVCacheConcurrent())
         {
             processInfo = llmRequest.getDataTransceiverState().getCommState()->toString();
         }
 
         bool isCancelled = false;
-        auto& asyncResource = mInstanceToAsyncResource.at(processInfo);
+        auto itRes = mInstanceToAsyncResource.find(processInfo);
+        if (itRes == mInstanceToAsyncResource.end())
+        {
+            TLLM_LOG_WARNING("Cannot cancel request %zu: processInfo '%s' not found",
+                llmRequest.mRequestId, processInfo.c_str());
+            return false;
+        }
+        auto& asyncResource = itRes->second;
         {
-            std::unique_lock<std::mutex> lck(asyncResource->mMtxForQueue);
-            auto it = std::find_if(asyncResource->mRequestsQueue.begin(), asyncResource->mRequestsQueue.end(),
+            std::unique_lock<std::mutex> lck(asyncResource->mMtxForQueue);
+            auto it = std::find_if(asyncResource->mRequestsQueue.begin(), asyncResource->mRequestsQueue.end(),
                 [&llmRequest](RequestAndPromise const& requestAndPromise)
                 { return requestAndPromise.mRequest->mRequestId == llmRequest.mRequestId; });
-            if (it != asyncResource->mRequestsQueue.end())
+            if (it != asyncResource->mRequestsQueue.end())
             {
-                asyncResource->mRequestsQueue.erase(it);
+                asyncResource->mRequestsQueue.erase(it);
                 isCancelled = true;
             }
             else
             {
                 TLLM_LOG_WARNING("Cannot cancel request %zu", llmRequest.mRequestId);
             }
         }
         return isCancelled;
     }

🧹 Nitpick comments (2)

cpp/tensorrt_llm/pybind/batch_manager/cacheTransceiver.cpp (1)
2-2: Consider updating the copyright year

Guidelines require current year; header shows 2022-2024. If appropriate for this repo, extend to 2025.
- * SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-FileCopyrightText: Copyright (c) 2022-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (1)
358-362: Make isCancelled const-safe (optional)

Consider marking as const and making mResponderMutex mutable for clarity of thread-safety intent.
-    [[nodiscard]] bool isCancelled(RequestIdType requestId)
+    [[nodiscard]] bool isCancelled(RequestIdType requestId) const

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between db0aef7 and 1df042c.

📒 Files selected for processing (2)

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (9 hunks)
cpp/tensorrt_llm/pybind/batch_manager/cacheTransceiver.cpp (2 hunks)

🧰 Additional context used

📓 Path-based instructions (3)

**/*.{c,cc,cpp,cxx,cu}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.{c,cc,cpp,cxx,cu}: Closing braces of C++ namespaces must include a trailing comment naming the namespace (e.g., } // namespace foo)
Use Allman brace style; empty for/while loop semicolon on its own line; always use braces for control statements
C++ filenames must be lowerCamelCase (e.g., thisIsAFilename.cpp) and be case-insensitively unique within a compilation target
Use smart pointers; prefer unique_ptr for sole ownership, shared_ptr for shared; weak_ptr only in exceptional cases; do not use deprecated smart pointers
In implementation, prefer C++ comments (//); use inline C comments only for annotating parameters in calls (e.g., /* checkForErrors = */ false)
Do not use assignment in subexpressions (e.g., if (x = y) or chained x = y = z)

Files:

cpp/tensorrt_llm/pybind/batch_manager/cacheTransceiver.cpp
cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp

**/*.{c,cc,cpp,cxx,cu,h,hh,hpp,hxx,cuh}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.{c,cc,cpp,cxx,cu,h,hh,hpp,hxx,cuh}: Prefer const or constexpr variables over #define for constants; variables not modified after initialization must be declared const
Avoid using literals (except 0, nullptr, true, false) outside of initialization; prefer named constexpr constants
Type names (classes, structs, enums, typedefs) must be UpperCamelCase
Local variables, methods, and namespaces must be lowerCamelCase
Non-magic-number global variables that are non-static/not in anonymous namespace must be prefixed with g (e.g., gDontUseGlobalFoos)
Non-magic-number globals that are static or in an anonymous namespace must be prefixed with s (e.g., sMutableStaticGlobal)
Locally visible static variables should be lowerCamelCase prefixed with s (e.g., static std::once_flag sFlag)
Member variables should be lowerCamelCase prefixed with m (e.g., mNbFooValues); public members may omit but prefix is encouraged for clarity
Constants (enums, globals, static constants, and function-scope magic numbers) should be UPPER_SNAKE_CASE with k prefix (e.g., kDIGIT_NUM)
Avoid Hungarian notation except limited 'apps Hungarian' like nb for counts; literal suffixes should be uppercase (e.g., 1234L)
Use spaces only; indent with 4 spaces (no tabs)
Format C++ code with clang-format (LLVM style) and limit lines to 120 characters; exceptions must be bracketed with // clang-format off/on
Disable code with #if/#endif (prefer mnemonic conditions) or macros that noop in release; do not comment out code; avoid dead code
Use the least forceful cast necessary; avoid removing const/volatile; avoid C-style and functional casts (except explicit constructors); cast void* to T* with static_cast; use reinterpret_cast only as last resort; avoid dynamic_cast
Switch on enum should cover all values and omit default when possible; switch statements must be well-structured with no fall-through except between adjacent empty cases; each case must end with break or throw; returns at end of case are not allowed; if ...

Files:

cpp/tensorrt_llm/pybind/batch_manager/cacheTransceiver.cpp
cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp

**/*.{c,cc,cpp,cxx,h,hh,hpp,hxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Prepend the NVIDIA copyright header (current year) to all source files (.cpp, .h, .cu, .py, etc.)

Files:

cpp/tensorrt_llm/pybind/batch_manager/cacheTransceiver.cpp
cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp

🧠 Learnings (1)

📚 Learning: 2025-08-06T08:18:28.669Z

Learnt from: zhengd-nv
PR: NVIDIA/TensorRT-LLM#6633
File: cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp:145-155
Timestamp: 2025-08-06T08:18:28.669Z
Learning: In cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp, the existing `mMtxForMap` mutex in DataSenderImpl is sufficient to synchronize measurement file operations in the `release` method, as all file operations occur within the same critical section that protects the `mRequestToSession` map access.

Applied to files:

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp

🧬 Code graph analysis (2)

cpp/tensorrt_llm/pybind/batch_manager/cacheTransceiver.cpp (2)

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (13)

llmRequest (145-145)

llmRequest (172-188)

llmRequest (172-172)

llmRequest (387-387)

llmRequest (393-393)

llmRequest (427-454)

llmRequest (427-427)

llmRequest (470-490)

llmRequest (470-470)

cancelRequest (617-620)

cancelRequest (617-617)

cancelRequest (634-637)

cancelRequest (634-634)

cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp (4)

checkGenTransferComplete (546-549)

checkGenTransferComplete (546-546)

cancelRequest (551-562)

cancelRequest (551-551)

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (5)

cpp/tensorrt_llm/pybind/batch_manager/cacheTransceiver.cpp (8)

llmRequest (42-45)

llmRequest (42-42)

llmRequest (47-50)

llmRequest (47-47)

llmRequest (52-55)

llmRequest (52-52)

llmRequest (72-75)

llmRequest (72-72)

cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.h (1)

nodiscard (54-118)

cpp/tensorrt_llm/batch_manager/dataTransceiver.h (2)

nodiscard (162-172)

nodiscard (204-241)

cpp/tensorrt_llm/common/envUtils.cpp (2)

getEnvRequestKVCacheConcurrent (333-337)

getEnvRequestKVCacheConcurrent (333-333)

cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp (2)

cancelRequest (551-562)

cancelRequest (551-551)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Pre-commit Check

🔇 Additional comments (5)

cpp/tensorrt_llm/pybind/batch_manager/cacheTransceiver.cpp (2)

71-75: Expose cancelRequest override — LGTM

The pure override with PYBIND11_OVERLOAD_PURE is correct.

87-88: Python API bindings — LGTM

Names are consistent (snake_case) and map correctly to C++ methods.

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (3)

242-252: Ready-signal handshake — LGTM

Correctly computes readiness and signals counterpart.

617-620: Responder cancelRequest wrapper — LGTM

Thin wrapper correctly forwards to Impl.

634-637: Requester cancelRequest wrapper — LGTM

Forwarding to Impl is correct.

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp

tensorrt-cicd · 2025-08-27T08:43:38Z

PR_Github #16637 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #12491 completed with status: 'FAILURE'

cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp

tensorrt-cicd · 2025-09-10T04:28:31Z

PR_Github #18152 [ run ] completed with state ABORTED

Shunkangz · 2025-09-16T08:08:31Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-09-16T08:14:14Z

PR_Github #18749 [ run ] triggered by Bot

tensorrt-cicd · 2025-09-16T13:00:16Z

PR_Github #18749 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #14058 completed with status: 'FAILURE'

Shunkangz · 2025-09-22T08:51:22Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-09-22T08:57:12Z

PR_Github #19552 [ run ] triggered by Bot

tensorrt-cicd · 2025-09-22T21:19:27Z

PR_Github #19552 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #14698 completed with status: 'SUCCESS'

cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp

cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp

Shixiaowei02 · 2025-09-23T03:09:37Z

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp

+        for (size_t i = 0; i < connections.size(); i++)
+        {
+            auto* agentConnectionManager = dynamic_cast<executor::kv_cache::AgentConnectionManager*>(mManager);
+            if (agentConnectionManager != nullptr)


Suggested change

if (agentConnectionManager != nullptr)

if (agentConnectionManager)

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp

Shixiaowei02 · 2025-09-23T03:11:00Z

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp

+        for (size_t i = 0; i < connections.size(); i++)
+        {
+            auto* agentConnectionManager = dynamic_cast<executor::kv_cache::AgentConnectionManager*>(mManager);
+            if (agentConnectionManager != nullptr)


Suggested change

if (agentConnectionManager != nullptr)

if (agentConnectionManager)

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp

tensorrt_llm/_torch/pyexecutor/py_executor.py

cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp

chuangz0 · 2025-09-23T07:25:52Z

tests/unittest/others/test_kv_cache_transceiver.py

+                                           gen_request.prompt_len, 1,
+                                           gen_request)
+    # send gen request
+    kv_cache_transceiver_gen.request_and_receive_async(gen_request)


should we check whether the state is DISAGG_TRANS_ERROR

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp

pcastonguay · 2025-09-24T15:41:41Z

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp

-                CacheSender::Impl::sendAndRemoveResponse(it->first, std::move(it->second));
+                // TODO: if the generation does not require the kv cache, the request will
+                // not be removed from mCancelledRequests. This should be handled by timeout.
+                auto it = mReadyResponses.find(mCurrentRequest.value());


Should we check that it != end?

I think that we do not need to check. The logic comes here because the request has been cancelled. In the cancel_request, we do not modify the mReadyResponses. Instead, we insert the request id into mCancelledRequests.

cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp

cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h

tensorrt_llm/_torch/pyexecutor/py_executor.py

Signed-off-by: Shunkang <[email protected]>

pcastonguay · 2025-10-09T18:44:47Z

Closing, superseded by #8114

coderabbitai bot reviewed Aug 4, 2025

View reviewed changes

cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp Outdated Show resolved Hide resolved

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp Outdated Show resolved Hide resolved

Shunkangz force-pushed the cancel_request_pd branch from cbc9f98 to 7f51fa0 Compare August 4, 2025 06:55

coderabbitai bot reviewed Aug 4, 2025

View reviewed changes

cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp Outdated Show resolved Hide resolved

Shunkangz self-assigned this Aug 4, 2025

Shunkangz requested review from Shixiaowei02, chuangz0, pcastonguay and zhengd-nv August 4, 2025 07:03

pcastonguay requested changes Aug 8, 2025

View reviewed changes

Shunkangz mentioned this pull request Aug 16, 2025

[https://nvbugs/5429636][fix] fix KVC mem leakage by adding kv_transfer_timeout_ms #6801

Closed

Shunkangz force-pushed the cancel_request_pd branch from 7f51fa0 to a5e97f9 Compare August 16, 2025 08:32

Shunkangz requested review from a team as code owners August 16, 2025 08:32

Shunkangz requested a review from Naveassaf August 16, 2025 08:32

coderabbitai bot reviewed Aug 16, 2025

View reviewed changes

cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp Outdated Show resolved Hide resolved

Shunkangz force-pushed the cancel_request_pd branch from a5e97f9 to b538dab Compare August 27, 2025 01:35

coderabbitai bot reviewed Aug 27, 2025

View reviewed changes

cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp Show resolved Hide resolved

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp Outdated Show resolved Hide resolved

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp Outdated Show resolved Hide resolved

Shunkangz force-pushed the cancel_request_pd branch from b538dab to db0aef7 Compare August 27, 2025 03:33

coderabbitai bot reviewed Aug 27, 2025

View reviewed changes

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp Show resolved Hide resolved

Shunkangz force-pushed the cancel_request_pd branch from db0aef7 to 1df042c Compare August 27, 2025 04:30

coderabbitai bot reviewed Aug 27, 2025

View reviewed changes

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp Outdated Show resolved Hide resolved

Shunkangz force-pushed the cancel_request_pd branch from 1df042c to 3b9218e Compare August 27, 2025 10:51

chuangz0 reviewed Sep 10, 2025

View reviewed changes

cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp Outdated Show resolved Hide resolved

Shunkangz force-pushed the cancel_request_pd branch from 5285358 to e1d67c6 Compare September 16, 2025 08:08

Shunkangz force-pushed the cancel_request_pd branch from e1d67c6 to ba55434 Compare September 22, 2025 08:50

Shunkangz requested review from chuangz0 and pcastonguay September 23, 2025 02:30

Shixiaowei02 reviewed Sep 23, 2025

View reviewed changes

cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp Outdated Show resolved Hide resolved

Shixiaowei02 reviewed Sep 23, 2025

View reviewed changes

cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h Show resolved Hide resolved