Resolve cancel scope error in MCP session cleanup with lifetime task #931

AnuradhaKaruppiah · 2025-10-06T23:38:28Z

Description

Previously, session cleanup called client.__aexit__() from a different task than the one that called __aenter__(), violating anyio's CancelScope requirement that a scope be entered and exited in the same task. This caused "Attempted to exit cancel scope in a different task" errors during session cleanup.

Solution:

- Introduce a per-client lifetime task that manages the entire client lifecycle
- The lifetime task enters the client context (async with client:) and waits for a stop_event signal before exiting
- Session cleanup now signals the stop_event and waits for the lifetime task to complete, ensuring __aexit__ runs in the correct task context

This ensures proper cancel scope handling and prevents resource leaks while maintaining thread-safe session management.
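
The pattern described above can be sketched in plain asyncio. This is a minimal illustration, not the PR's code: `DemoClient`, `lifetime`, and `run_demo` are hypothetical names, and the real client is an MCP client rather than this stand-in.

```python
import asyncio


class DemoClient:
    """Hypothetical stand-in for the MCP client; it only logs enter/exit."""

    def __init__(self) -> None:
        self.log: list[str] = []

    async def __aenter__(self) -> "DemoClient":
        self.log.append("enter")
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        self.log.append("exit")


async def lifetime(client: DemoClient, stop_event: asyncio.Event) -> None:
    """Own the client's whole lifecycle inside one task."""
    async with client:           # __aenter__ runs in this task
        await stop_event.wait()  # park until cleanup signals a stop
    # leaving the block ran __aexit__ in this same task


async def run_demo() -> list[str]:
    client = DemoClient()
    stop_event = asyncio.Event()
    task = asyncio.create_task(lifetime(client, stop_event))
    await asyncio.sleep(0)  # give the lifetime task a chance to enter the context
    stop_event.set()        # cleanup step 1: signal the stop
    await task              # cleanup step 2: wait for the lifetime task to exit
    return client.log
```

The caller never touches `__aexit__` directly; it only sets the event and awaits the task, so enter and exit stay in the same task.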

By Submitting this PR I confirm:

I am familiar with the Contributing Guidelines.
We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
- Any contribution which contains commits that are not Signed-Off will not be accepted.
When the PR is ready for review, new or existing tests cover these changes.
When the PR is ready for review, the documentation is up to date with these changes.

Summary by CodeRabbit

Bug Fixes
- Sessions now shut down cleanly (stop signaling and fallback cleanup), reducing hangs and timeouts; inactive sessions are reliably removed.
Refactor
- Session lifecycle reworked to include explicit stop events and managed background lifetime tasks for more robust session management.
Chores
- Initialization and cleanup wiring improved to avoid orphaned tasks and lower resource use during idle/long runs.
Tests
- Test harness updated with session cleanup helpers to ensure per-session tasks are torn down and improve isolation.

Signed-off-by: Anuradha Karuppiah <[email protected]>

…ation

Signed-off-by: Anuradha Karuppiah <[email protected]>

Co-authored-by: Will Killian <[email protected]>
Signed-off-by: Anuradha Karuppiah <[email protected]>

Signed-off-by: Anuradha Karuppiah <[email protected]>

Also add per-session locks for ref counting

Signed-off-by: Anuradha Karuppiah <[email protected]>

Signed-off-by: Anuradha Karuppiah <[email protected]>

…k pattern

Previously, session cleanup was calling client.__aexit__() from a different task context than where __aenter__() was called, violating anyio's CancelScope requirement that enter and exit must happen in the same task. This caused "Attempted to exit cancel scope in a different task" errors during session cleanup.

Solution:
- Introduce a per-client lifetime task that manages the entire client lifecycle
- The lifetime task enters the client context (async with client:) and waits for a stop_event signal before exiting
- Session cleanup now signals the stop_event and waits for the lifetime task to complete, ensuring __aexit__ runs in the correct task context
- Add SessionData fields: stop_event (asyncio.Event) and lifetime_task (asyncio.Task)
- Update _create_session_client to return (client, stop_event, task) tuple

This ensures proper cancel scope handling and prevents resource leaks while maintaining thread-safe session management.

Signed-off-by: Anuradha Karuppiah <[email protected]>

…ation-2

coderabbitai · 2025-10-06T23:39:00Z

Walkthrough

Session lifecycle expanded: SessionData gains stop_event and lifetime_task; _create_session_client now returns (client, stop_event, lifetime_task) and starts a background lifetime task that manages the client's async context; _get_session_client stores these; _cleanup_inactive_sessions signals stop_event and awaits or falls back to direct close, then removes sessions.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| Session lifecycle management `packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py` | `SessionData` now includes `stop_event: asyncio.Event` and `lifetime_task: asyncio.Task` |
| Session cleanup flow `packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py` | `_cleanup_inactive_sessions` collects sessions to close, sets each session's `stop_event` (if present), awaits the `lifetime_task` when available, falls back to calling `client._close()` if no lifetime task exists, and removes sessions from tracking regardless of cleanup success. |
| Tests: session teardown helper `tests/nat/mcp/test_mcp_session_management.py` | Added `cleanup_sessions(function_group)` helper and imported `SessionData`; tests now call the helper to stop per-session tasks (`stop_event`, `lifetime_task`) and clear the in-memory `_sessions` mapping to prevent leaks between tests. |

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Caller
    participant ClientImpl as MCP ClientImpl
    participant Factory as _create_session_client
    participant Lifetime as lifetime_task
    participant MCP as MCPBaseClient

    Caller->>ClientImpl: request session client
    ClientImpl->>Factory: _create_session_client(session_id)
    activate Factory
    Factory->>MCP: instantiate client
    Factory->>Lifetime: spawn _lifetime(client, stop_event)
    Factory-->>ClientImpl: (client, stop_event, lifetime_task)
    deactivate Factory
    ClientImpl->>ClientImpl: store SessionData(client, stop_event, lifetime_task)
    ClientImpl-->>Caller: return client

    Note over Lifetime,MCP: _lifetime runs "async with client", awaits `stop_event`

    Caller->>ClientImpl: trigger cleanup inactive sessions
    ClientImpl->>ClientImpl: collect SessionData to_close
    ClientImpl->>Lifetime: stop_event.set()
    alt lifetime_task present
      ClientImpl->>Lifetime: await lifetime_task
    else no lifetime_task
      ClientImpl->>MCP: call client._close() (fallback)
    end
    Lifetime->>MCP: __aexit__() (client closed)
    Lifetime-->>ClientImpl: task completed

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

breaking

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title uses imperative mood to summarize the primary fix of the cancel scope error by introducing a lifetime task in MCP session cleanup, is under the 72-character limit, and accurately reflects the core changes described in the pull request. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 90.00% which is sufficient. The required threshold is 80.00%. |


📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fe7d883 and 4bf9cad.

📒 Files selected for processing (1)

tests/nat/mcp/test_mcp_session_management.py (12 hunks)

🧰 Additional context used

📓 Path-based instructions (5)

**/*.{py,yaml,yml}

📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)

**/*.{py,yaml,yml}: Configure response_seq as a list of strings; values cycle per call, and [] yields an empty string.
Configure delay_ms to inject per-call artificial latency in milliseconds for nat_test_llm.

Files:

tests/nat/mcp/test_mcp_session_management.py

**/*.py

📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)

**/*.py: Programmatic use: create TestLLMConfig(response_seq=[...], delay_ms=...), add with builder.add_llm("", cfg).
When retrieving the test LLM wrapper, use builder.get_llm(name, wrapper_type=LLMFrameworkEnum.) and call the framework’s method (e.g., ainvoke, achat, call).

**/*.py: In code comments/identifiers use NAT abbreviations as specified: nat for API namespace/CLI, nvidia-nat for package name, NAT for env var prefixes; do not use these abbreviations in documentation
Follow PEP 20 and PEP 8; run yapf with column_limit=120; use 4-space indentation; end files with a single trailing newline
Run ruff check --fix as linter (not formatter) using pyproject.toml config; fix warnings unless explicitly ignored
Respect naming: snake_case for functions/variables, PascalCase for classes, UPPER_CASE for constants
Treat pyright warnings as errors during development
Exception handling: use bare raise to re-raise; log with logger.error() when re-raising to avoid duplicate stack traces; use logger.exception() when catching without re-raising
Provide Google-style docstrings for every public module, class, function, and CLI command; first line concise and ending with a period; surround code entities with backticks
Validate and sanitize all user input, especially in web or CLI interfaces
Prefer httpx with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use async/await for I/O-bound work; profile CPU-heavy paths with cProfile or mprof before optimizing; cache expensive computations with functools.lru_cache or external cache; leverage NumPy vectorized operations when beneficial

Files:

tests/nat/mcp/test_mcp_session_management.py

tests/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Unit tests reside under tests/ and should use markers defined in pyproject.toml (e.g., integration)

Files:

tests/nat/mcp/test_mcp_session_management.py

⚙️ CodeRabbit configuration file

tests/**/*.py:
- Ensure that tests are comprehensive, cover edge cases, and validate the functionality of the code.
- Test functions should be named using the test_ prefix, using snake_case.
- Any frequently repeated code should be extracted into pytest fixtures.
- Pytest fixtures should define the name argument when applying the pytest.fixture decorator. The fixture function being decorated should be named using the fixture_ prefix, using snake_case.
Example:
@pytest.fixture(name="my_fixture")
def fixture_my_fixture():
    pass

Files:

tests/nat/mcp/test_mcp_session_management.py

{tests/**/*.py,examples/*/tests/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

{tests/**/*.py,examples/*/tests/**/*.py}: Use pytest (with pytest-asyncio for async); name test files test_*.py; test functions start with test_; extract repeated code into fixtures; fixtures must set name in decorator and be named with fixture_ prefix
Mock external services with pytest_httpserver or unittest.mock; do not hit live endpoints
Mark expensive tests with @pytest.mark.slow or @pytest.mark.integration

Files:

tests/nat/mcp/test_mcp_session_management.py

**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions
- Ensure the code follows best practices and coding standards.
- For Python code, follow PEP 20 and PEP 8 for style guidelines.
- Check for security vulnerabilities and potential issues.
- Python methods should use type hints for all parameters and return values.
Example:
def my_function(param1: int, param2: str) -> bool:
    pass

For Python exception handling, ensure proper stack trace preservation:
- When re-raising exceptions: use bare raise statements to maintain the original stack trace, and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.
- When catching and logging exceptions without re-raising: always use logger.exception() to capture the full stack trace information.

Documentation Review Instructions
- Verify that documentation and comments are clear and comprehensive.
- Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum".
- Verify that the documentation doesn't contain any offensive or outdated terms.
- Verify that documentation and comments are free of spelling mistakes. Ensure the documentation doesn't contain any words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file; words that might appear to be spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc.
- All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0, and should contain an Apache License 2.0 header comment at the top of each file.
- Confirm that copyright years are up-to date whenever a file is changed.

Files:

tests/nat/mcp/test_mcp_session_management.py

🧬 Code graph analysis (1)

tests/nat/mcp/test_mcp_session_management.py (1)

packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py (6)

SessionData (40-49)

cleanup_sessions (186-199)

_create_session_client (347-413)

_cleanup_inactive_sessions (201-256)

_get_session_client (258-314)

_session_usage_context (317-345)

🪛 Ruff (0.13.3)

tests/nat/mcp/test_mcp_session_management.py

599-599: Unused function argument: self

(ARG001)

607-607: Pattern passed to match= contains metacharacters but is neither escaped nor raw

(RUF043)

644-644: Unused function argument: self

(ARG001)

650-650: Unused function argument: self

(ARG001)
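
For the RUF043 finding, a short sketch of why an unescaped `match=` pattern can misbehave. `pytest.raises(match=...)` applies `re.search` to the exception text, so the example uses `re` directly; the message string is hypothetical and is not taken from the test file.

```python
import re

# Hypothetical error message; the real one in the test may differ.
message = "Failed to initialize client (timeout=5.0s)"
literal = "client (timeout=5.0s)"

# Unescaped, the parentheses become a regex group, so the literal "(" never matches.
unescaped_hit = re.search(literal, message)

# Escaping the pattern makes every metacharacter match itself.
escaped_hit = re.search(re.escape(literal), message)
```

Passing `re.escape(...)` or a raw string with explicit escapes to `match=` addresses the warning.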

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: CI Pipeline / Check

🔇 Additional comments (12)

tests/nat/mcp/test_mcp_session_management.py (12)

28-28: LGTM!

The import of SessionData is necessary for tests that directly construct session data objects to verify lifecycle management.

34-50: LGTM with robust cleanup logic.

The helper method properly manages session cleanup by:

Signaling stop events for each session's lifetime task

Implementing timeout handling to prevent indefinite waits

Gracefully handling task cancellation

Ensuring the sessions mapping is cleared

This approach ensures test isolation and prevents resource leaks between tests.
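
A helper of this shape could look roughly like the sketch below. Only the name `cleanup_sessions(function_group)` and the `_sessions`, `stop_event`, and `lifetime_task` attributes come from the PR; the body, the `timeout` parameter, and the `demo` harness are assumptions for illustration.

```python
import asyncio
from types import SimpleNamespace


async def cleanup_sessions(function_group, timeout: float = 1.0) -> None:
    """Stop each session's lifetime task, then clear the session mapping."""
    for session in list(function_group._sessions.values()):
        session.stop_event.set()  # ask the lifetime task to exit its context
        task = session.lifetime_task
        if task is not None and not task.done():
            try:
                # wait_for cancels the task itself if the timeout expires
                await asyncio.wait_for(task, timeout=timeout)
            except asyncio.TimeoutError:
                pass  # the task was stuck and has been cancelled
    function_group._sessions.clear()


async def demo() -> tuple:
    stop_event = asyncio.Event()
    well_behaved = asyncio.create_task(stop_event.wait())
    stuck = asyncio.create_task(asyncio.sleep(3600))  # ignores its stop_event
    group = SimpleNamespace(_sessions={
        "a": SimpleNamespace(stop_event=stop_event, lifetime_task=well_behaved),
        "b": SimpleNamespace(stop_event=asyncio.Event(), lifetime_task=stuck),
    })
    await cleanup_sessions(group, timeout=0.05)
    return group._sessions, well_behaved.cancelled(), stuck.cancelled()
```

The timeout keeps one misbehaving task from hanging the whole test run.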

127-127: LGTM - Proper test cleanup.

The cleanup calls ensure test isolation by properly stopping lifetime tasks and clearing sessions after each test. This prevents resource leaks and ensures tests don't interfere with each other.

Also applies to: 148-148, 174-174, 199-199, 251-251, 281-281, 308-308, 364-364, 414-414, 454-454, 552-552, 580-580, 675-675, 802-802, 844-844

554-581: LGTM - Comprehensive lifecycle verification.

The test properly verifies:

Successful client initialization with lifetime task

Correct types for returned values (client, stop_event, lifetime_task)

__aenter__ invocation during initialization

Proper cleanup via stop_event signaling

582-592: LGTM - Error handling verification.

The test correctly verifies that initialization failures in __aenter__ are properly propagated with descriptive error messages.

610-632: LGTM - Stop event mechanism verification.

The test properly verifies:

Lifetime task responds to stop_event signal

Task completes after stop_event is set

__aexit__ is called with correct arguments (None, None, None) indicating normal exit

633-676: LGTM - Critical test for cancel scope compliance.

This test validates the core objective of the PR: ensuring __aenter__ and __aexit__ execute in the same task to respect anyio's CancelScope requirements. The verification of matching task IDs and proper task naming provides strong evidence that the fix prevents the "Attempted to exit cancel scope in a different task" error.

Note: The unused self parameters on lines 644 and 650 are acceptable as they match the async context manager method signatures.
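
The same-task check can be illustrated with a fake client that records `asyncio.current_task()` in each hook. This is a sketch of the idea only; `RecordingClient` and `check_same_task` are hypothetical names, not the test's actual code.

```python
import asyncio


class RecordingClient:
    """Hypothetical fake client that records the task running each hook."""

    def __init__(self) -> None:
        self.enter_task = None
        self.exit_task = None

    async def __aenter__(self):
        self.enter_task = asyncio.current_task()
        return self

    async def __aexit__(self, exc_type, exc, tb):
        self.exit_task = asyncio.current_task()


async def check_same_task() -> bool:
    client = RecordingClient()
    stop_event = asyncio.Event()

    async def lifetime() -> None:
        async with client:
            await stop_event.wait()

    task = asyncio.create_task(lifetime())
    await asyncio.sleep(0)  # let the lifetime task run __aenter__
    stop_event.set()        # signalled from the caller's task
    await task
    # Both hooks ran in the lifetime task, not in the caller's task.
    return (client.enter_task is task and client.exit_task is task
            and task is not asyncio.current_task())
```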

677-705: LGTM - Cleanup integration verification.

The test properly verifies that the cleanup mechanism:

Signals the lifetime task via stop_event

Removes the session from tracking

Ensures __aexit__ is called during cleanup

706-739: LGTM - Active session protection.

The test correctly verifies that sessions with active references (ref_count > 0) are protected from cleanup, even when they're old. This ensures in-use sessions are not prematurely terminated.
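
The protection rule can be sketched as a filter over tracked sessions. Only `ref_count` appears in the PR discussion; `SessionInfo`, `last_activity`, `max_age`, and `select_inactive` are hypothetical names for this illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SessionInfo:
    """Hypothetical subset of per-session bookkeeping."""
    last_activity: datetime
    ref_count: int = 0


def select_inactive(sessions: dict[str, SessionInfo], max_age: timedelta, now: datetime) -> list[str]:
    """Return IDs of sessions that are both idle too long and not in use."""
    return [
        session_id for session_id, info in sessions.items()
        if info.ref_count == 0 and now - info.last_activity > max_age
    ]
```

An old session with `ref_count > 0` is skipped, which is the behavior the test asserts.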

740-769: LGTM - Edge case handling.

The test verifies that cleanup handles already-completed lifetime tasks gracefully, ensuring the cleanup logic is robust against race conditions where a task might complete before cleanup is triggered.

770-803: LGTM - End-to-end lifecycle test.

The test provides comprehensive verification of the complete session lifecycle:

Session creation with lifetime task

Context manager usage with proper ref_count tracking

Graceful cleanup via stop_event

This integration test ensures all components work together correctly.

804-845: LGTM - Multi-session independence verification.

The test properly verifies that multiple sessions with lifetime tasks operate independently:

Each session has its own lifecycle

Stop events are handled independently

Tasks complete without interfering with each other

The use of asyncio.gather with return_exceptions=True ensures both tasks complete even if one fails.
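
As a small illustration of that `asyncio.gather` behavior (the coroutine names here are hypothetical):

```python
import asyncio


async def finishes() -> str:
    return "done"


async def fails() -> str:
    raise RuntimeError("boom")


async def run_both() -> list:
    # With return_exceptions=True, the failure is returned as a value
    # instead of being raised, so the other awaitable still completes.
    return await asyncio.gather(finishes(), fails(), return_exceptions=True)
```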


coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py (1)

47-49: Consider making lifetime_task non-optional for clarity.

The type hint lifetime_task: asyncio.Task | None = None suggests it's optional, but _create_session_client always creates and returns a task (lines 377-382). In practice, lifetime_task should never be None after session creation. The optional type and None default exist only for dataclass initialization, but could confuse readers about the actual invariants.

Consider either:

Removing the | None from the type hint and using field(init=False) if a default is needed

Adding a comment explaining that lifetime_task is always set during session creation and should never be None in normal operation

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 62c05b5 and 1571c44.

📒 Files selected for processing (1)

packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py (7 hunks)

🧰 Additional context used

📓 Path-based instructions (6)

**/*.{py,yaml,yml}

📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)

**/*.{py,yaml,yml}: Configure response_seq as a list of strings; values cycle per call, and [] yields an empty string.
Configure delay_ms to inject per-call artificial latency in milliseconds for nat_test_llm.

Files:

packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py

**/*.py

📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)

**/*.py: Programmatic use: create TestLLMConfig(response_seq=[...], delay_ms=...), add with builder.add_llm("", cfg).
When retrieving the test LLM wrapper, use builder.get_llm(name, wrapper_type=LLMFrameworkEnum.) and call the framework’s method (e.g., ainvoke, achat, call).

**/*.py: In code comments/identifiers use NAT abbreviations as specified: nat for API namespace/CLI, nvidia-nat for package name, NAT for env var prefixes; do not use these abbreviations in documentation
Follow PEP 20 and PEP 8; run yapf with column_limit=120; use 4-space indentation; end files with a single trailing newline
Run ruff check --fix as linter (not formatter) using pyproject.toml config; fix warnings unless explicitly ignored
Respect naming: snake_case for functions/variables, PascalCase for classes, UPPER_CASE for constants
Treat pyright warnings as errors during development
Exception handling: use bare raise to re-raise; log with logger.error() when re-raising to avoid duplicate stack traces; use logger.exception() when catching without re-raising
Provide Google-style docstrings for every public module, class, function, and CLI command; first line concise and ending with a period; surround code entities with backticks
Validate and sanitize all user input, especially in web or CLI interfaces
Prefer httpx with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use async/await for I/O-bound work; profile CPU-heavy paths with cProfile or mprof before optimizing; cache expensive computations with functools.lru_cache or external cache; leverage NumPy vectorized operations when beneficial

Files:

packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py

packages/*/src/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Importable Python code inside packages must live under packages//src/

Files:

packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py

{src/**/*.py,packages/*/src/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

All public APIs must have Python 3.11+ type hints on parameters and return values; prefer typing/collections.abc abstractions; use typing.Annotated when useful

Files:

packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py

**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions
- Ensure the code follows best practices and coding standards.
- For Python code, follow PEP 20 and PEP 8 for style guidelines.
- Check for security vulnerabilities and potential issues.
- Python methods should use type hints for all parameters and return values.
Example:
def my_function(param1: int, param2: str) -> bool:
    pass

For Python exception handling, ensure proper stack trace preservation:
- When re-raising exceptions: use bare raise statements to maintain the original stack trace, and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.
- When catching and logging exceptions without re-raising: always use logger.exception() to capture the full stack trace information.

Documentation Review Instructions
- Verify that documentation and comments are clear and comprehensive.
- Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum".
- Verify that the documentation doesn't contain any offensive or outdated terms.
- Verify that documentation and comments are free of spelling mistakes. Ensure the documentation doesn't contain any words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file; words that might appear to be spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc.
- All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0, and should contain an Apache License 2.0 header comment at the top of each file.
- Confirm that copyright years are up-to date whenever a file is changed.

Files:

packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py

packages/**/*

⚙️ CodeRabbit configuration file

packages/**/*:
- This directory contains optional plugin packages for the toolkit; each should contain a pyproject.toml file.
- The pyproject.toml file should declare a dependency on nvidia-nat or another package with a name starting with nvidia-nat-. This dependency should be declared using ~=<version>, and the version should be a two-digit version (ex: ~=1.0).
- Not all packages contain Python code; if they do, they should also contain their own set of tests, in a tests/ directory at the same level as the pyproject.toml file.

Files:

packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py

🧬 Code graph analysis (1)

packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py (2)

packages/nvidia_nat_mcp/src/nat/plugins/mcp/utils.py (1)

truncate_session_id (24-37)

packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_base.py (1)

MCPBaseClient (128-407)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: CI Pipeline / Check

🔇 Additional comments (2)

packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py (2)

297-304: LGTM!

The unpacking of the triple return value and initialization of SessionData correctly handles the new session lifecycle management pattern.

342-342: LGTM!

The function signature correctly reflects the new return type with proper type hints as required by the coding guidelines.

packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py

Signed-off-by: Anuradha Karuppiah <[email protected]>

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py (1)
370-413: Excellent implementation of lifetime task pattern!

The lifetime task implementation correctly solves the cancel scope violation by ensuring __aenter__ and __aexit__ occur in the same task context. Key strengths:

Prevents initialization hang: The try/except block (lines 378-384) ensures ready.set() is called even if __aenter__ fails, preventing indefinite blocking at line 391.

Timeout protection: Initialization waits with a timeout (lines 388-401), preventing resource leaks from stuck clients.

Failure detection: Lines 404-409 properly detect and propagate initialization failures by checking if the task completed before entering the wait state.

Session ID privacy: Task name uses truncate_session_id() (line 386) to avoid exposing sensitive information in debugging tools.

Optional refinement: At line 401, consider using from None to clarify the timeout is a new error context unrelated to any caught exception:
     raise RuntimeError(f"Session client initialization timed out after {timeout}s")
+    # Or more explicitly:
+    raise RuntimeError(f"Session client initialization timed out after {timeout}s") from None
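
The strengths listed above, together with the `from None` refinement, can be put into one rough sketch. It assumes a `ready` event and a `timeout` parameter as the review describes; `start_client` and the exact control flow are hypothetical, and the PR's real `_create_session_client` may differ (for example, it uses try/except where this sketch uses try/finally).

```python
import asyncio


async def start_client(client, timeout: float = 5.0):
    """Start a lifetime task and wait until the client is ready or has failed."""
    stop_event = asyncio.Event()
    ready = asyncio.Event()

    async def lifetime() -> None:
        try:
            async with client:
                ready.set()              # initialization succeeded
                await stop_event.wait()
        finally:
            ready.set()                  # also unblocks the waiter if __aenter__ raised

    task = asyncio.create_task(lifetime())
    try:
        await asyncio.wait_for(ready.wait(), timeout=timeout)
    except asyncio.TimeoutError:
        task.cancel()
        await asyncio.gather(task, return_exceptions=True)  # avoid an orphaned task
        # "from None": the timeout is reported as a fresh error, without chaining.
        raise RuntimeError(f"Session client initialization timed out after {timeout}s") from None

    if task.done():
        # The task finished before reaching its wait state, so __aenter__ failed.
        raise RuntimeError("Session client initialization failed") from task.exception()
    return client, stop_event, task
```

Without the `finally`, a failing `__aenter__` would leave the caller blocked until the timeout instead of failing fast.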

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1571c44 and 49be113.

📒 Files selected for processing (1)

packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py (7 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py (2)

packages/nvidia_nat_mcp/src/nat/plugins/mcp/utils.py (1)

truncate_session_id (24-37)

packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_base.py (2)

MCPBaseClient (128-407)

name (598-600)

🪛 Ruff (0.13.3)

packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py

398-400: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

401-401: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

401-401: Avoid specifying long messages outside the exception class

(TRY003)

408-408: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

409-409: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (3)

packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py (3)

47-49: LGTM! Lifetime task fields properly integrated.

The stop_event and lifetime_task fields are correctly added with appropriate defaults. The Event factory ensures each SessionData instance gets its own event object, and lifetime_task being optional handles cases where it hasn't been created yet.

239-256: LGTM! Cleanup properly respects cancel scope boundaries.

The cleanup logic correctly addresses the core issue by:

Signaling stop_event to trigger graceful shutdown in the lifetime task

Awaiting the lifetime task to ensure __aexit__ runs in the same task that called __aenter__

Including fallback handling for edge cases where the lifetime task is missing

This prevents the "Attempted to exit cancel scope in a different task" error while maintaining thread-safe session management.
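The signal-then-await sequence described above can be sketched with plain asyncio. `RecordingClient`, `run_lifetime`, and `close_session` are hypothetical names used only for illustration. The review does not show what the real fallback does when the lifetime task is missing, so this sketch merely reports that case.

```python
import asyncio
from types import SimpleNamespace


class RecordingClient:
    """Hypothetical client that records which task exits it."""

    def __init__(self) -> None:
        self.exit_task = None

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        self.exit_task = asyncio.current_task()
        return False


async def run_lifetime(client, ready, stop_event) -> None:
    # __aenter__ and __aexit__ both run inside this one task.
    async with client:
        ready.set()
        await stop_event.wait()


async def close_session(session) -> bool:
    """Return True if the lifetime task performed the exit."""
    if session.lifetime_task is None:
        # Assumption: the real fallback is not shown in the review.
        return False
    session.stop_event.set()    # 1. signal graceful shutdown
    await session.lifetime_task  # 2. wait until __aexit__ has run
    return True


async def demo():
    client = RecordingClient()
    ready, stop_event = asyncio.Event(), asyncio.Event()
    task = asyncio.create_task(run_lifetime(client, ready, stop_event))
    await ready.wait()
    session = SimpleNamespace(client=client, stop_event=stop_event, lifetime_task=task)
    closed = await close_session(session)  # called from a different task
    return closed, client.exit_task is task


closed_by_lifetime_task, exited_in_lifetime_task = asyncio.run(demo())
```

The cleanup caller never touches `__aexit__` itself; it only sets the event and awaits the task, which keeps enter and exit in the same task.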

302-309: LGTM! Session creation properly wired.

The unpacking and SessionData initialization correctly handles the new return type from _create_session_client, storing all lifecycle components needed for proper cleanup.

Signed-off-by: Anuradha Karuppiah <[email protected]>

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 49be113 and fe7d883.

📒 Files selected for processing (1)

tests/nat/mcp/test_mcp_session_management.py (12 hunks)

🧰 Additional context used

📓 Path-based instructions (5)

**/*.{py,yaml,yml}

📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)

**/*.{py,yaml,yml}: Configure response_seq as a list of strings; values cycle per call, and [] yields an empty string.
Configure delay_ms to inject per-call artificial latency in milliseconds for nat_test_llm.

Files:

tests/nat/mcp/test_mcp_session_management.py

**/*.py

📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)

**/*.py: Programmatic use: create TestLLMConfig(response_seq=[...], delay_ms=...), add with builder.add_llm("", cfg).
When retrieving the test LLM wrapper, use builder.get_llm(name, wrapper_type=LLMFrameworkEnum.) and call the framework’s method (e.g., ainvoke, achat, call).

**/*.py: In code comments/identifiers use NAT abbreviations as specified: nat for API namespace/CLI, nvidia-nat for package name, NAT for env var prefixes; do not use these abbreviations in documentation
Follow PEP 20 and PEP 8; run yapf with column_limit=120; use 4-space indentation; end files with a single trailing newline
Run ruff check --fix as linter (not formatter) using pyproject.toml config; fix warnings unless explicitly ignored
Respect naming: snake_case for functions/variables, PascalCase for classes, UPPER_CASE for constants
Treat pyright warnings as errors during development
Exception handling: use bare raise to re-raise; log with logger.error() when re-raising to avoid duplicate stack traces; use logger.exception() when catching without re-raising
Provide Google-style docstrings for every public module, class, function, and CLI command; first line concise and ending with a period; surround code entities with backticks
Validate and sanitize all user input, especially in web or CLI interfaces
Prefer httpx with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use async/await for I/O-bound work; profile CPU-heavy paths with cProfile or mprof before optimizing; cache expensive computations with functools.lru_cache or external cache; leverage NumPy vectorized operations when beneficial

Files:

tests/nat/mcp/test_mcp_session_management.py

tests/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Unit tests reside under tests/ and should use markers defined in pyproject.toml (e.g., integration)

Files:

tests/nat/mcp/test_mcp_session_management.py

⚙️ CodeRabbit configuration file

tests/**/*.py:
- Ensure that tests are comprehensive, cover edge cases, and validate the functionality of the code.
- Test functions should be named using the test_ prefix, using snake_case.
- Any frequently repeated code should be extracted into pytest fixtures.
- Pytest fixtures should define the name argument when applying the pytest.fixture decorator. The fixture function being decorated should be named using the fixture_ prefix, using snake_case. Example:
@pytest.fixture(name="my_fixture")
def fixture_my_fixture():
    pass

Files:

tests/nat/mcp/test_mcp_session_management.py

{tests/**/*.py,examples/*/tests/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

{tests/**/*.py,examples/*/tests/**/*.py}: Use pytest (with pytest-asyncio for async); name test files test_*.py; test functions start with test_; extract repeated code into fixtures; fixtures must set name in decorator and be named with fixture_ prefix
Mock external services with pytest_httpserver or unittest.mock; do not hit live endpoints
Mark expensive tests with @pytest.mark.slow or @pytest.mark.integration

Files:

tests/nat/mcp/test_mcp_session_management.py

**/*

⚙️ CodeRabbit configuration file

**/*:

Code Review Instructions
- Ensure the code follows best practices and coding standards.
- For Python code, follow PEP 20 and PEP 8 for style guidelines.
- Check for security vulnerabilities and potential issues.
- Python methods should use type hints for all parameters and return values. Example:
def my_function(param1: int, param2: str) -> bool:
    pass
- For Python exception handling, ensure proper stack trace preservation:
  - When re-raising exceptions: use bare raise statements to maintain the original stack trace, and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.
  - When catching and logging exceptions without re-raising: always use logger.exception() to capture the full stack trace information.

Documentation Review Instructions
- Verify that documentation and comments are clear and comprehensive.
- Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum".
- Verify that the documentation doesn't contain any offensive or outdated terms.
- Verify that documentation and comments are free of spelling mistakes. Ensure the documentation doesn't contain any words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file; words that might appear to be spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc.
- All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0, and should contain an Apache License 2.0 header comment at the top of each file.
- Confirm that copyright years are up to date whenever a file is changed.

Files:

tests/nat/mcp/test_mcp_session_management.py

🧬 Code graph analysis (1)

tests/nat/mcp/test_mcp_session_management.py (1)

packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py (6)

SessionData (40-49)

cleanup_sessions (186-199)

_create_session_client (347-413)

_cleanup_inactive_sessions (201-256)

_get_session_client (258-314)

_session_usage_context (317-345)

🪛 Ruff (0.13.3)

tests/nat/mcp/test_mcp_session_management.py

599-599: Unused function argument: self

(ARG001)

607-607: Pattern passed to match= contains metacharacters but is neither escaped nor raw

(RUF043)

644-644: Unused function argument: self

(ARG001)

650-650: Unused function argument: self

(ARG001)
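The RUF043 finding above matters because `pytest.raises(match=...)` applies `re.search` to the exception text, so unescaped metacharacters change what the pattern means. The sketch below uses only the `re` module, and the message string is a made-up example rather than the one from the test file.

```python
import re

# Hypothetical exception text containing regex metacharacters.
message = "Failed to initialize session client (timeout)"

# Unescaped: the parentheses form a group, so this looks for "client timeout".
unescaped_hit = re.search("client (timeout)", message)

# Escaped: the parentheses are matched literally.
escaped_hit = re.search(re.escape("client (timeout)"), message)
```

In a test, passing `match=re.escape(text)` or a raw-string pattern with the metacharacters escaped by hand makes the intent explicit and satisfies the rule.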

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: CI Pipeline / Check

🔇 Additional comments (5)

tests/nat/mcp/test_mcp_session_management.py (5)

34-49: LGTM! Well-structured test cleanup helper.

The cleanup logic properly handles the lifetime task lifecycle: signals stop_event, waits with timeout, and gracefully handles cancellation. The defensive hasattr checks ensure robustness.
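A helper along the lines described here might look like the following sketch. `stop_lifetime_task` and its `timeout` parameter are hypothetical and do not reproduce the helper in the test file. It relies on `asyncio.wait_for` cancelling the awaited task when the timeout expires.

```python
import asyncio
from types import SimpleNamespace


async def stop_lifetime_task(session, timeout: float = 1.0) -> None:
    """Signal a session's lifetime task to stop and wait for it, with a timeout."""
    task = getattr(session, "lifetime_task", None)
    if task is None:
        return
    if hasattr(session, "stop_event"):
        session.stop_event.set()
    try:
        # On timeout, wait_for cancels the task before raising TimeoutError.
        await asyncio.wait_for(task, timeout=timeout)
    except (asyncio.TimeoutError, asyncio.CancelledError):
        # Acceptable in a test helper: the task is finished either way.
        pass


async def demo():
    good_stop = asyncio.Event()
    good = SimpleNamespace(stop_event=good_stop,
                           lifetime_task=asyncio.create_task(good_stop.wait()))
    # This task ignores its stop_event, so only the timeout can end it.
    stuck = SimpleNamespace(stop_event=asyncio.Event(),
                            lifetime_task=asyncio.create_task(asyncio.sleep(3600)))
    await stop_lifetime_task(good)
    await stop_lifetime_task(stuck, timeout=0.05)
    return good.lifetime_task, stuck.lifetime_task


good_task, stuck_task = asyncio.run(demo())
```

Swallowing `CancelledError` is a deliberate simplification for test teardown; production cleanup would usually let an outer cancellation propagate.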

126-127: Good test hygiene with cleanup calls.

Adding explicit cleanup at the end of tests that create sessions ensures proper resource management and prevents leaks between tests.

Also applies to: 147-148, 173-174, 198-199, 250-251, 280-281, 307-308, 363-364, 413-414, 453-454, 551-552

554-580: LGTM! Comprehensive test of successful initialization.

The test properly verifies the lifetime task is created, running, and that __aenter__ is called. The cleanup sequence (set stop_event, await task) is correct.

582-591: LGTM! Proper failure handling test.

The test correctly verifies that initialization failures are propagated with appropriate error messages.

610-844: Excellent comprehensive test coverage for lifetime task functionality.

These tests thoroughly cover:

Cleanup on stop_event signal

Cancel scope task boundaries (verifying enter/exit in same task)

Cleanup with lifetime tasks

Preservation of active sessions

Handling of already-completed tasks

Complete session lifecycle

Multiple independent sessions

The logic is correct, assertions are appropriate, and edge cases are well covered.
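The "multiple independent sessions" case in the list above can be pictured with a small sketch: each session owns its own stop event and lifetime task, so stopping one must leave the other running. The names are illustrative and are not taken from the test file.

```python
import asyncio


async def lifetime(stop_event: asyncio.Event) -> None:
    # Stand-in for a lifetime task; the real one would hold the client open.
    await stop_event.wait()


async def demo():
    stops = {name: asyncio.Event() for name in ("a", "b")}
    tasks = {name: asyncio.create_task(lifetime(ev)) for name, ev in stops.items()}

    # Stop only session "a".
    stops["a"].set()
    await tasks["a"]
    state_after_first_stop = (tasks["a"].done(), tasks["b"].done())

    # Tidy up session "b" so no task is left pending.
    stops["b"].set()
    await tasks["b"]
    return state_after_first_stop


a_done, b_done = asyncio.run(demo())
```

Awaiting a task that has already completed returns immediately, which is why the already-completed case needs no special handling in a sketch like this.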

Note: Static analysis warnings about unused self parameters on lines 644 and 650 are false positives—these are intentional mock method signatures that require the parameter to match the async context manager protocol.

tests/nat/mcp/test_mcp_session_management.py

Signed-off-by: Anuradha Karuppiah <[email protected]>

AnuradhaKaruppiah · 2025-10-07T02:32:59Z

/merge

AnuradhaKaruppiah and others added 30 commits October 2, 2025 08:54

Initial changes for user isolation

9a24b3b

Signed-off-by: Anuradha Karuppiah <[email protected]>

Add a config control for enabling/disabling session aware tools

062fe8f

Signed-off-by: Anuradha Karuppiah <[email protected]>

Move config to a separate file

31e1e00

Signed-off-by: Anuradha Karuppiah <[email protected]>

Set the user_id in per-session auth_provider and drop metadata

38a1f34

Signed-off-by: Anuradha Karuppiah <[email protected]>

Update docs with warnings

19c12b6

Signed-off-by: Anuradha Karuppiah <[email protected]>

Trigger cleanup on new session creation

badaf0f

Signed-off-by: Anuradha Karuppiah <[email protected]>

Add a limit on the number of sessions that can be created

4c4da17

Signed-off-by: Anuradha Karuppiah <[email protected]>

Make the idle timeout configurable

a63968b

Signed-off-by: Anuradha Karuppiah <[email protected]>

Session_id cannot be None for tool calls

5c8f50b

Signed-off-by: Anuradha Karuppiah <[email protected]>

Fix cleanup lock handling

b06318b

Signed-off-by: Anuradha Karuppiah <[email protected]>

Add a lock for session creation

28c71bb

Signed-off-by: Anuradha Karuppiah <[email protected]>

Truncate session id

979ff64

Signed-off-by: Anuradha Karuppiah <[email protected]>

Style fixes

05f376a

Signed-off-by: Anuradha Karuppiah <[email protected]>

Use a shared auth provider

6bbc661

Signed-off-by: Anuradha Karuppiah <[email protected]>

Update docs with the new config options

564d1b9

Signed-off-by: Anuradha Karuppiah <[email protected]>

Fix unit tests

e239a50

Signed-off-by: Anuradha Karuppiah <[email protected]>

Merge remote-tracking branch 'upstream/release/1.3' into ak-user-isol…

0f42615

…ation

Add unit tests for session management

40ef3cb

Signed-off-by: Anuradha Karuppiah <[email protected]>

Remove unused import

22f1b24

Signed-off-by: Anuradha Karuppiah <[email protected]>

Make session cleanup a private method

dcd1651

Signed-off-by: Anuradha Karuppiah <[email protected]>

Add a more graceful error when max sessions is exceeded

2d01f6a

Signed-off-by: Anuradha Karuppiah <[email protected]>

Log the number of sessions on create and cleanup

6299936

Signed-off-by: Anuradha Karuppiah <[email protected]>

Update docs/source/workflows/mcp/mcp-auth.md

14b4bc2

Co-authored-by: Will Killian <[email protected]>
Signed-off-by: Anuradha Karuppiah <[email protected]>

Fix test failure

d4ccbfb

Signed-off-by: Anuradha Karuppiah <[email protected]>

Fix style

ef8bd77

Signed-off-by: Anuradha Karuppiah <[email protected]>

Move to a multiple reader lock

4028a46

Also add per-session locks for ref counting

Signed-off-by: Anuradha Karuppiah <[email protected]>

Consolidate the state data and simplify checks

65825e5

Signed-off-by: Anuradha Karuppiah <[email protected]>

Address review comments

7114e46

Signed-off-by: Anuradha Karuppiah <[email protected]>

Use base client if user-id matches the default

30d1a4f

Signed-off-by: Anuradha Karuppiah <[email protected]>

Add a public method for session cleanup and unit tests

49adebf

Signed-off-by: Anuradha Karuppiah <[email protected]>

AnuradhaKaruppiah added 2 commits October 6, 2025 15:58

Merge remote-tracking branch 'upstream/release/1.3' into ak-user-isol…

1571c44

…ation-2

AnuradhaKaruppiah self-assigned this Oct 6, 2025

AnuradhaKaruppiah added the bug and non-breaking labels Oct 6, 2025

AnuradhaKaruppiah changed the base branch from develop to release/1.3 October 6, 2025 23:38

AnuradhaKaruppiah marked this pull request as ready for review October 6, 2025 23:41

AnuradhaKaruppiah requested a review from a team as a code owner October 6, 2025 23:41

coderabbitai bot added the breaking label Oct 6, 2025

AnuradhaKaruppiah removed the breaking label Oct 6, 2025

yczhang-nv approved these changes Oct 6, 2025

coderabbitai bot reviewed Oct 6, 2025

packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py (resolved)

packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py (outdated, resolved)

packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py (outdated, resolved)

AnuradhaKaruppiah changed the title ~~Resolve cancel scope error in MCP session cleanup using lifetime task pattern~~ Resolve cancel scope error in MCP session cleanup with lifetime task Oct 7, 2025

AnuradhaKaruppiah added 2 commits October 6, 2025 17:18

Update in response to review comments

656bd3f

Signed-off-by: Anuradha Karuppiah <[email protected]>

Avoid waiting indefinitely if session fails to init

49be113

Signed-off-by: Anuradha Karuppiah <[email protected]>

coderabbitai bot added the breaking label Oct 7, 2025

AnuradhaKaruppiah removed the breaking label Oct 7, 2025

coderabbitai bot reviewed Oct 7, 2025

AnuradhaKaruppiah added 2 commits October 6, 2025 18:09

Add unit tests for lifetime task

7a13333

Signed-off-by: Anuradha Karuppiah <[email protected]>

Add cleanup to tests

fe7d883

Signed-off-by: Anuradha Karuppiah <[email protected]>

coderabbitai bot added the breaking label Oct 7, 2025

AnuradhaKaruppiah removed the breaking label Oct 7, 2025

coderabbitai bot reviewed Oct 7, 2025

tests/nat/mcp/test_mcp_session_management.py (resolved)

Fix CI failures

4bf9cad

Signed-off-by: Anuradha Karuppiah <[email protected]>

coderabbitai bot added the breaking label Oct 7, 2025

AnuradhaKaruppiah removed the breaking label Oct 7, 2025

rapids-bot bot merged commit 21f2e30 into NVIDIA:release/1.3 Oct 7, 2025
17 checks passed

AnuradhaKaruppiah deleted the ak-user-isolation-2 branch October 7, 2025 21:37
