@AnuradhaKaruppiah AnuradhaKaruppiah commented Oct 6, 2025

Description

Previously, session cleanup called client.__aexit__() from a different task than the one that called __aenter__(), violating anyio's CancelScope requirement that a scope be entered and exited in the same task. This caused "Attempted to exit cancel scope in a different task" errors during session cleanup.
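
The failure mode can be reproduced without anyio. The sketch below is an illustration only: `TaskBoundScope` is a hypothetical stand-in that mimics the same-task check, not anyio's actual implementation, and `exit_from_other_task` is a made-up helper name.

```python
import asyncio


class TaskBoundScope:
    """Hypothetical scope that must be exited by the task that entered it."""

    def __init__(self) -> None:
        self._owner = None

    async def __aenter__(self) -> "TaskBoundScope":
        # Remember which task entered the scope.
        self._owner = asyncio.current_task()
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        # Reject an exit that comes from any other task.
        if asyncio.current_task() is not self._owner:
            raise RuntimeError("Attempted to exit cancel scope in a different task")


async def exit_from_other_task() -> str:
    scope = TaskBoundScope()

    async def enter_only() -> None:
        await scope.__aenter__()

    # Enter in a child task, then try to exit from the calling task.
    await asyncio.create_task(enter_only())
    try:
        await scope.__aexit__(None, None, None)
    except RuntimeError as err:
        return str(err)
    return "no error"
```

Calling `asyncio.run(exit_from_other_task())` returns the error message, because the exit runs in the caller's task rather than the task that entered.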

Solution:

  • Introduce a per-client lifetime task that manages the entire client lifecycle
  • The lifetime task enters the client context (async with client:) and waits for a stop_event signal before exiting
  • Session cleanup now signals the stop_event and waits for the lifetime task to complete, ensuring __aexit__ runs in the correct task context

This ensures proper cancel scope handling and prevents resource leaks while maintaining thread-safe session management.
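
A minimal sketch of the lifetime task pattern described above, assuming a generic async context manager. `FakeClient`, `_lifetime`, and `demo` are hypothetical names used for illustration and are not taken from client_impl.py.

```python
import asyncio


class FakeClient:
    """Hypothetical stand-in for an MCP client; records which task enters and exits it."""

    def __init__(self) -> None:
        self.enter_task = None
        self.exit_task = None

    async def __aenter__(self) -> "FakeClient":
        self.enter_task = asyncio.current_task()
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        self.exit_task = asyncio.current_task()


async def _lifetime(client: FakeClient, stop_event: asyncio.Event) -> None:
    # The context is entered and exited inside this one task.
    async with client:
        await stop_event.wait()


async def demo() -> FakeClient:
    client = FakeClient()
    stop_event = asyncio.Event()
    lifetime_task = asyncio.create_task(_lifetime(client, stop_event))

    await asyncio.sleep(0)  # give the lifetime task a chance to enter the context

    # Cleanup runs in a different task: it only signals and waits.
    stop_event.set()
    await lifetime_task
    return client
```

After `asyncio.run(demo())`, `client.enter_task` and `client.exit_task` refer to the same task, even though cleanup was triggered from elsewhere.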

By Submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
    • Any contribution which contains commits that are not Signed-Off will not be accepted.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

Summary by CodeRabbit

  • Bug Fixes

    • Sessions now shut down cleanly (stop signaling and fallback cleanup), reducing hangs and timeouts; inactive sessions are reliably removed.
  • Refactor

    • Session lifecycle reworked to include explicit stop events and managed background lifetime tasks for more robust session management.
  • Chores

    • Initialization and cleanup wiring improved to avoid orphaned tasks and lower resource use during idle/long runs.
  • Tests

    • Test harness updated with session cleanup helpers to ensure per-session tasks are torn down and improve isolation.

AnuradhaKaruppiah and others added 30 commits October 2, 2025 08:54
Signed-off-by: Anuradha Karuppiah <[email protected]>
Co-authored-by: Will Killian <[email protected]>
Signed-off-by: Anuradha Karuppiah <[email protected]>
Also add per-session locks for ref counting

Signed-off-by: Anuradha Karuppiah <[email protected]>
Signed-off-by: Anuradha Karuppiah <[email protected]>
…k pattern

Previously, session cleanup was calling client.__aexit__() from a different
task context than where __aenter__() was called, violating anyio's CancelScope
requirement that enter and exit must happen in the same task. This caused
"Attempted to exit cancel scope in a different task" errors during session
cleanup.

Solution:
- Introduce a per-client lifetime task that manages the entire client lifecycle
- The lifetime task enters the client context (async with client:) and waits
  for a stop_event signal before exiting
- Session cleanup now signals the stop_event and waits for the lifetime task
  to complete, ensuring __aexit__ runs in the correct task context
- Add SessionData fields: stop_event (asyncio.Event) and lifetime_task (asyncio.Task)
- Update _create_session_client to return (client, stop_event, task) tuple

This ensures proper cancel scope handling and prevents resource leaks while
maintaining thread-safe session management.

Signed-off-by: Anuradha Karuppiah <[email protected]>
@AnuradhaKaruppiah AnuradhaKaruppiah self-assigned this Oct 6, 2025
@AnuradhaKaruppiah AnuradhaKaruppiah added bug Something isn't working non-breaking Non-breaking change labels Oct 6, 2025
@AnuradhaKaruppiah AnuradhaKaruppiah changed the base branch from develop to release/1.3 October 6, 2025 23:38

coderabbitai bot commented Oct 6, 2025

Walkthrough

Session lifecycle expanded: SessionData gains stop_event and lifetime_task; _create_session_client now returns (client, stop_event, lifetime_task) and starts a background lifetime task that manages the client's async context; _get_session_client stores these; _cleanup_inactive_sessions signals stop_event and awaits or falls back to direct close, then removes sessions.

Changes

| Cohort / File(s) | Change Summary |
|---|---|
| Session lifecycle management: packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py | SessionData now includes stop_event: asyncio.Event and lifetime_task: asyncio.Task |
| Session cleanup flow: packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py | _cleanup_inactive_sessions collects sessions to close, sets each session's stop_event (if present), awaits the lifetime_task when available, falls back to calling client._close() if no lifetime task exists, and removes sessions from tracking regardless of cleanup success. |
| Tests: session teardown helper: tests/nat/mcp/test_mcp_session_management.py | Added cleanup_sessions(function_group) helper and imported SessionData; tests now call the helper to stop per-session tasks (stop_event, lifetime_task) and clear the in-memory _sessions mapping to prevent leaks between tests. |
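
The cleanup flow summarized above could look roughly like the following. This is a sketch under assumptions: `Session`, `close_directly`, and `cleanup` are hypothetical names, and the real `_cleanup_inactive_sessions` may differ in locking, timeouts, and logging.

```python
import asyncio
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger(__name__)


@dataclass
class Session:
    """Hypothetical, simplified stand-in for the PR's SessionData."""
    stop_event: Optional[asyncio.Event] = None
    lifetime_task: Optional[asyncio.Task] = None
    closed_directly: bool = False

    async def close_directly(self) -> None:
        # Stand-in for the fallback close used when no lifetime task exists.
        self.closed_directly = True


async def cleanup(sessions: dict, to_close: list) -> None:
    for session_id in to_close:
        session = sessions[session_id]
        try:
            if session.stop_event is not None:
                session.stop_event.set()
            if session.lifetime_task is not None:
                await session.lifetime_task  # the context exits inside the owner task
            else:
                await session.close_directly()
        except Exception:
            logger.exception("Error while closing session")
        finally:
            # Drop the session from tracking whether or not the close succeeded.
            sessions.pop(session_id, None)


async def demo():
    async def lifetime(stop: asyncio.Event) -> None:
        await stop.wait()

    stop = asyncio.Event()
    managed = Session(stop_event=stop, lifetime_task=asyncio.create_task(lifetime(stop)))
    unmanaged = Session()
    sessions = {"a": managed, "b": unmanaged}
    await cleanup(sessions, ["a", "b"])
    return sessions, managed, unmanaged
```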

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Caller
    participant ClientImpl as MCP ClientImpl
    participant Factory as _create_session_client
    participant Lifetime as lifetime_task
    participant MCP as MCPBaseClient

    Caller->>ClientImpl: request session client
    ClientImpl->>Factory: _create_session_client(session_id)
    activate Factory
    Factory->>MCP: instantiate client
    Factory->>Lifetime: spawn _lifetime(client, stop_event)
    Factory-->>ClientImpl: (client, stop_event, lifetime_task)
    deactivate Factory
    ClientImpl->>ClientImpl: store SessionData(client, stop_event, lifetime_task)
    ClientImpl-->>Caller: return client

    Note over Lifetime,MCP: _lifetime runs "async with client", awaits `stop_event`

    Caller->>ClientImpl: trigger cleanup inactive sessions
    ClientImpl->>ClientImpl: collect SessionData to_close
    ClientImpl->>Lifetime: stop_event.set()
    alt lifetime_task present
      ClientImpl->>Lifetime: await lifetime_task
    else no lifetime_task
      ClientImpl->>MCP: call client._close() (fallback)
    end
    Lifetime->>MCP: __aexit__() (client closed)
    Lifetime-->>ClientImpl: task completed

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

breaking

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title uses imperative mood to summarize the primary fix of the cancel scope error by introducing a lifetime task in MCP session cleanup, is under the 72-character limit, and accurately reflects the core changes described in the pull request. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 90.00% which is sufficient. The required threshold is 80.00%. |

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fe7d883 and 4bf9cad.

📒 Files selected for processing (1)
  • tests/nat/mcp/test_mcp_session_management.py (12 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.{py,yaml,yml}

📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)

**/*.{py,yaml,yml}: Configure response_seq as a list of strings; values cycle per call, and [] yields an empty string.
Configure delay_ms to inject per-call artificial latency in milliseconds for nat_test_llm.

Files:

  • tests/nat/mcp/test_mcp_session_management.py
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)

**/*.py: Programmatic use: create TestLLMConfig(response_seq=[...], delay_ms=...), add with builder.add_llm("", cfg).
When retrieving the test LLM wrapper, use builder.get_llm(name, wrapper_type=LLMFrameworkEnum.) and call the framework’s method (e.g., ainvoke, achat, call).

**/*.py: In code comments/identifiers use NAT abbreviations as specified: nat for API namespace/CLI, nvidia-nat for package name, NAT for env var prefixes; do not use these abbreviations in documentation
Follow PEP 20 and PEP 8; run yapf with column_limit=120; use 4-space indentation; end files with a single trailing newline
Run ruff check --fix as linter (not formatter) using pyproject.toml config; fix warnings unless explicitly ignored
Respect naming: snake_case for functions/variables, PascalCase for classes, UPPER_CASE for constants
Treat pyright warnings as errors during development
Exception handling: use bare raise to re-raise; log with logger.error() when re-raising to avoid duplicate stack traces; use logger.exception() when catching without re-raising
Provide Google-style docstrings for every public module, class, function, and CLI command; first line concise and ending with a period; surround code entities with backticks
Validate and sanitize all user input, especially in web or CLI interfaces
Prefer httpx with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use async/await for I/O-bound work; profile CPU-heavy paths with cProfile or mprof before optimizing; cache expensive computations with functools.lru_cache or external cache; leverage NumPy vectorized operations when beneficial

Files:

  • tests/nat/mcp/test_mcp_session_management.py
tests/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Unit tests reside under tests/ and should use markers defined in pyproject.toml (e.g., integration)

Files:

  • tests/nat/mcp/test_mcp_session_management.py

⚙️ CodeRabbit configuration file

tests/**/*.py:
  • Ensure that tests are comprehensive, cover edge cases, and validate the functionality of the code.
  • Test functions should be named using the test_ prefix, using snake_case.
  • Any frequently repeated code should be extracted into pytest fixtures.
  • Pytest fixtures should define the name argument when applying the pytest.fixture decorator. The fixture function being decorated should be named using the fixture_ prefix, using snake_case. Example:
    @pytest.fixture(name="my_fixture")
    def fixture_my_fixture():
        pass

Files:

  • tests/nat/mcp/test_mcp_session_management.py
{tests/**/*.py,examples/*/tests/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

{tests/**/*.py,examples/*/tests/**/*.py}: Use pytest (with pytest-asyncio for async); name test files test_*.py; test functions start with test_; extract repeated code into fixtures; fixtures must set name in decorator and be named with fixture_ prefix
Mock external services with pytest_httpserver or unittest.mock; do not hit live endpoints
Mark expensive tests with @pytest.mark.slow or @pytest.mark.integration

Files:

  • tests/nat/mcp/test_mcp_session_management.py
**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions

  • Ensure the code follows best practices and coding standards.
  • For Python code, follow PEP 20 and PEP 8 for style guidelines.
  • Check for security vulnerabilities and potential issues.
  • Python methods should use type hints for all parameters and return values.
    Example:
    def my_function(param1: int, param2: str) -> bool:
        pass
  • For Python exception handling, ensure proper stack trace preservation:
    • When re-raising exceptions: use bare raise statements to maintain the original stack trace,
      and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.
    • When catching and logging exceptions without re-raising: always use logger.exception()
      to capture the full stack trace information.

Documentation Review Instructions

  • Verify that documentation and comments are clear and comprehensive.
  • Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum".
  • Verify that the documentation doesn't contain any offensive or outdated terms.
  • Verify that documentation and comments are free of spelling mistakes, and ensure the documentation doesn't contain any words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file. Words that might appear to be spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc.

  • All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0, and should contain an Apache License 2.0 header comment at the top of each file.
  • Confirm that copyright years are up to date whenever a file is changed.

Files:

  • tests/nat/mcp/test_mcp_session_management.py
🧬 Code graph analysis (1)
tests/nat/mcp/test_mcp_session_management.py (1)
packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py (6)
  • SessionData (40-49)
  • cleanup_sessions (186-199)
  • _create_session_client (347-413)
  • _cleanup_inactive_sessions (201-256)
  • _get_session_client (258-314)
  • _session_usage_context (317-345)
🪛 Ruff (0.13.3)
tests/nat/mcp/test_mcp_session_management.py

599-599: Unused function argument: self (ARG001)
607-607: Pattern passed to match= contains metacharacters but is neither escaped nor raw (RUF043)
644-644: Unused function argument: self (ARG001)
650-650: Unused function argument: self (ARG001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: CI Pipeline / Check
🔇 Additional comments (12)
tests/nat/mcp/test_mcp_session_management.py (12)

28-28: LGTM!

The import of SessionData is necessary for tests that directly construct session data objects to verify lifecycle management.


34-50: LGTM with robust cleanup logic.

The helper method properly manages session cleanup by:

  • Signaling stop events for each session's lifetime task
  • Implementing timeout handling to prevent indefinite waits
  • Gracefully handling task cancellation
  • Ensuring the sessions mapping is cleared

This approach ensures test isolation and prevents resource leaks between tests.
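
A test helper with the properties listed above might be shaped like this. It is a sketch, not the helper from the test file: `stop_task` and its `timeout` parameter are hypothetical, and it relies on `asyncio.wait_for` cancelling the awaited task when the timeout expires.

```python
import asyncio


async def stop_task(stop_event: asyncio.Event, task: asyncio.Task, timeout: float = 1.0) -> bool:
    """Signal a lifetime task to stop and wait for it, bounded by a timeout.

    Returns True if the task finished on its own, False if it timed out
    and was cancelled.
    """
    stop_event.set()
    try:
        await asyncio.wait_for(task, timeout=timeout)
        return True
    except asyncio.TimeoutError:
        # wait_for has already cancelled the task at this point.
        return False


async def demo():
    async def cooperative(stop: asyncio.Event) -> None:
        await stop.wait()

    async def stuck(stop: asyncio.Event) -> None:
        await asyncio.sleep(3600)  # ignores the stop event

    stop_a, stop_b = asyncio.Event(), asyncio.Event()
    task_a = asyncio.create_task(cooperative(stop_a))
    task_b = asyncio.create_task(stuck(stop_b))
    first = await stop_task(stop_a, task_a)
    second = await stop_task(stop_b, task_b, timeout=0.05)
    return first, second, task_b.cancelled()
```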


127-127: LGTM - Proper test cleanup.

The cleanup calls ensure test isolation by properly stopping lifetime tasks and clearing sessions after each test. This prevents resource leaks and ensures tests don't interfere with each other.

Also applies to: 148-148, 174-174, 199-199, 251-251, 281-281, 308-308, 364-364, 414-414, 454-454, 552-552, 580-580, 675-675, 802-802, 844-844


554-581: LGTM - Comprehensive lifecycle verification.

The test properly verifies:

  • Successful client initialization with lifetime task
  • Correct types for returned values (client, stop_event, lifetime_task)
  • __aenter__ invocation during initialization
  • Proper cleanup via stop_event signaling

582-592: LGTM - Error handling verification.

The test correctly verifies that initialization failures in __aenter__ are properly propagated with descriptive error messages.


610-632: LGTM - Stop event mechanism verification.

The test properly verifies:

  • Lifetime task responds to stop_event signal
  • Task completes after stop_event is set
  • __aexit__ is called with correct arguments (None, None, None) indicating normal exit

633-676: LGTM - Critical test for cancel scope compliance.

This test validates the core objective of the PR: ensuring __aenter__ and __aexit__ execute in the same task to respect anyio's CancelScope requirements. The verification of matching task IDs and proper task naming provides strong evidence that the fix prevents the "Attempted to exit cancel scope in a different task" error.

Note: The unused self parameters on lines 644 and 650 are acceptable as they match the async context manager method signatures.


677-705: LGTM - Cleanup integration verification.

The test properly verifies that the cleanup mechanism:

  • Signals the lifetime task via stop_event
  • Removes the session from tracking
  • Ensures __aexit__ is called during cleanup

706-739: LGTM - Active session protection.

The test correctly verifies that sessions with active references (ref_count > 0) are protected from cleanup, even when they're old. This ensures in-use sessions are not prematurely terminated.
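
The protection rule can be expressed as a small selection function. The sketch below is illustrative: `SessionInfo`, `last_activity`, and `select_inactive` are hypothetical names; only `ref_count` is mentioned in the review.

```python
from dataclasses import dataclass


@dataclass
class SessionInfo:
    """Hypothetical record; field names are illustrative."""
    last_activity: float
    ref_count: int = 0


def select_inactive(sessions: dict, max_age_s: float, now: float) -> list:
    """Return IDs of sessions that are both idle for too long and unreferenced."""
    return [
        session_id
        for session_id, info in sessions.items()
        if info.ref_count == 0 and now - info.last_activity > max_age_s
    ]
```

A session that is old but still has `ref_count > 0` is never selected, so it cannot be closed while in use.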


740-769: LGTM - Edge case handling.

The test verifies that cleanup handles already-completed lifetime tasks gracefully, ensuring the cleanup logic is robust against race conditions where a task might complete before cleanup is triggered.


770-803: LGTM - End-to-end lifecycle test.

The test provides comprehensive verification of the complete session lifecycle:

  • Session creation with lifetime task
  • Context manager usage with proper ref_count tracking
  • Graceful cleanup via stop_event

This integration test ensures all components work together correctly.


804-845: LGTM - Multi-session independence verification.

The test properly verifies that multiple sessions with lifetime tasks operate independently:

  • Each session has its own lifecycle
  • Stop events are handled independently
  • Tasks complete without interfering with each other

The use of asyncio.gather with return_exceptions=True ensures both tasks complete even if one fails.
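
For reference, a small standalone sketch of that `asyncio.gather` behavior; the coroutine names `ok`, `boom`, and `demo` are made up for the example.

```python
import asyncio


async def ok() -> str:
    return "done"


async def boom() -> str:
    raise ValueError("failed")


async def demo() -> list:
    # With return_exceptions=True the ValueError is returned in the result
    # list instead of being raised out of gather(), so the other result is kept.
    return await asyncio.gather(ok(), boom(), return_exceptions=True)
```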



@AnuradhaKaruppiah AnuradhaKaruppiah marked this pull request as ready for review October 6, 2025 23:41
@AnuradhaKaruppiah AnuradhaKaruppiah requested a review from a team as a code owner October 6, 2025 23:41
@coderabbitai coderabbitai bot added the breaking Breaking change label Oct 6, 2025
@AnuradhaKaruppiah AnuradhaKaruppiah removed the breaking Breaking change label Oct 6, 2025
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (1)
packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py (1)

47-49: Consider making lifetime_task non-optional for clarity.

The type hint lifetime_task: asyncio.Task | None = None suggests it's optional, but _create_session_client always creates and returns a task (lines 377-382). In practice, lifetime_task should never be None after session creation. The optional type and None default exist only for dataclass initialization, but could confuse readers about the actual invariants.

Consider either:

  1. Removing the | None from the type hint and using field(init=False) if a default is needed
  2. Adding a comment explaining that lifetime_task is always set during session creation and should never be None in normal operation
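
Option 1 could be sketched as below. This is a reduced, hypothetical version of the dataclass (only `session_id` and `lifetime_task` are shown, and `session_id` is an assumed field), intended to show what `field(init=False)` changes rather than to prescribe the final definition.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class SessionData:
    """Hypothetical, reduced version of the dataclass under discussion."""
    session_id: str
    # Not an __init__ parameter and no None default: it must be assigned
    # right after construction, and the annotation says it is always a Task.
    lifetime_task: asyncio.Task = field(init=False)
```

One trade-off of this form is that the attribute does not exist until it is assigned, so code that reads it too early raises AttributeError instead of seeing None.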

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 62c05b5 and 1571c44.

📒 Files selected for processing (1)
  • packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py (7 hunks)
🧰 Additional context used
📓 Path-based instructions (6)
packages/*/src/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Importable Python code inside packages must live under packages//src/

Files:

  • packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py
{src/**/*.py,packages/*/src/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

All public APIs must have Python 3.11+ type hints on parameters and return values; prefer typing/collections.abc abstractions; use typing.Annotated when useful

Files:

  • packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py
packages/**/*

⚙️ CodeRabbit configuration file

packages/**/*: - This directory contains optional plugin packages for the toolkit, each should contain a pyproject.toml file. - The pyproject.toml file should declare a dependency on nvidia-nat or another package with a name starting
with nvidia-nat-. This dependency should be declared using ~=<version>, and the version should be a two
digit version (ex: ~=1.0).

  • Not all packages contain Python code, if they do they should also contain their own set of tests, in a
    tests/ directory at the same level as the pyproject.toml file.

Files:

  • packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py
🧬 Code graph analysis (1)
packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py (2)
packages/nvidia_nat_mcp/src/nat/plugins/mcp/utils.py (1)
  • truncate_session_id (24-37)
packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_base.py (1)
  • MCPBaseClient (128-407)
🔇 Additional comments (2)
packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py (2)

297-304: LGTM!

The unpacking of the triple return value and initialization of SessionData correctly handles the new session lifecycle management pattern.


342-342: LGTM!

The function signature correctly reflects the new return type with proper type hints as required by the coding guidelines.

@AnuradhaKaruppiah AnuradhaKaruppiah changed the title Resolve cancel scope error in MCP session cleanup using lifetime task pattern Resolve cancel scope error in MCP session cleanup with lifetime task Oct 7, 2025
@coderabbitai coderabbitai bot added the breaking Breaking change label Oct 7, 2025
@AnuradhaKaruppiah AnuradhaKaruppiah removed the breaking Breaking change label Oct 7, 2025
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py (1)

370-413: Excellent implementation of lifetime task pattern!

The lifetime task implementation correctly solves the cancel scope violation by ensuring __aenter__ and __aexit__ occur in the same task context. Key strengths:

  1. Prevents initialization hang: The try/except block (lines 378-384) ensures ready.set() is called even if __aenter__ fails, preventing indefinite blocking at line 391.

  2. Timeout protection: Initialization waits with a timeout (lines 388-401), preventing resource leaks from stuck clients.

  3. Failure detection: Lines 404-409 properly detect and propagate initialization failures by checking if the task completed before entering the wait state.

  4. Session ID privacy: Task name uses truncate_session_id() (line 386) to avoid exposing sensitive information in debugging tools.

Optional refinement: At line 401, consider using from None to clarify the timeout is a new error context unrelated to any caught exception:

     raise RuntimeError(f"Session client initialization timed out after {timeout}s")
+    # Or more explicitly:
+    raise RuntimeError(f"Session client initialization timed out after {timeout}s") from None
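
The four points above can be combined in one sketch. It is written under assumptions: `FlakyClient` and `start_client` are hypothetical names, the `ready` event mirrors the one mentioned in the review, and the real `_create_session_client` may be structured differently.

```python
import asyncio


class FlakyClient:
    """Hypothetical client whose __aenter__ can be made to fail."""

    def __init__(self, fail: bool = False) -> None:
        self.fail = fail
        self.exited = False

    async def __aenter__(self) -> "FlakyClient":
        if self.fail:
            raise ConnectionError("cannot connect")
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        self.exited = True


async def start_client(client, timeout: float = 1.0):
    """Start a lifetime task and wait until the client is ready or has failed."""
    stop_event = asyncio.Event()
    ready = asyncio.Event()

    async def _lifetime() -> None:
        try:
            async with client:
                ready.set()
                await stop_event.wait()
        except Exception:
            ready.set()  # unblock the creator even when __aenter__ fails
            raise

    task = asyncio.create_task(_lifetime())
    try:
        await asyncio.wait_for(ready.wait(), timeout=timeout)
    except asyncio.TimeoutError:
        task.cancel()
        raise RuntimeError(f"Session client initialization timed out after {timeout}s") from None

    if task.done():
        # The task ended before reaching the wait state, so __aenter__ failed.
        raise RuntimeError("Session client initialization failed") from task.exception()
    return stop_event, task
```

Chaining with `from task.exception()` keeps the original connection error visible as the cause, while `from None` on the timeout path suppresses the unrelated `TimeoutError` context.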
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 1571c44 and 49be113.

📒 Files selected for processing (1)
  • packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py (7 hunks)
🧰 Additional context used
📓 Path-based instructions (6)
**/*.{py,yaml,yml}

📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)

**/*.{py,yaml,yml}: Configure response_seq as a list of strings; values cycle per call, and [] yields an empty string.
Configure delay_ms to inject per-call artificial latency in milliseconds for nat_test_llm.

Files:

  • packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)

**/*.py: Programmatic use: create TestLLMConfig(response_seq=[...], delay_ms=...), add with builder.add_llm("", cfg).
When retrieving the test LLM wrapper, use builder.get_llm(name, wrapper_type=LLMFrameworkEnum.) and call the framework’s method (e.g., ainvoke, achat, call).

**/*.py: In code comments/identifiers use NAT abbreviations as specified: nat for API namespace/CLI, nvidia-nat for package name, NAT for env var prefixes; do not use these abbreviations in documentation
Follow PEP 20 and PEP 8; run yapf with column_limit=120; use 4-space indentation; end files with a single trailing newline
Run ruff check --fix as linter (not formatter) using pyproject.toml config; fix warnings unless explicitly ignored
Respect naming: snake_case for functions/variables, PascalCase for classes, UPPER_CASE for constants
Treat pyright warnings as errors during development
Exception handling: use bare raise to re-raise; log with logger.error() when re-raising to avoid duplicate stack traces; use logger.exception() when catching without re-raising
Provide Google-style docstrings for every public module, class, function, and CLI command; first line concise and ending with a period; surround code entities with backticks
Validate and sanitize all user input, especially in web or CLI interfaces
Prefer httpx with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use async/await for I/O-bound work; profile CPU-heavy paths with cProfile or mprof before optimizing; cache expensive computations with functools.lru_cache or external cache; leverage NumPy vectorized operations when beneficial

Files:

  • packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py
packages/*/src/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Importable Python code inside packages must live under packages//src/

Files:

  • packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py
{src/**/*.py,packages/*/src/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

All public APIs must have Python 3.11+ type hints on parameters and return values; prefer typing/collections.abc abstractions; use typing.Annotated when useful

Files:

  • packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py
**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions

  • Ensure the code follows best practices and coding standards. - For Python code, follow
    PEP 20 and
    PEP 8 for style guidelines.
  • Check for security vulnerabilities and potential issues. - Python methods should use type hints for all parameters and return values.
    Example:
    def my_function(param1: int, param2: str) -> bool:
        pass
  • For Python exception handling, ensure proper stack trace preservation:
    • When re-raising exceptions: use bare raise statements to maintain the original stack trace,
      and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.
    • When catching and logging exceptions without re-raising: always use logger.exception()
      to capture the full stack trace information.

# Documentation Review Instructions

  • Verify that documentation and comments are clear and comprehensive.
  • Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum".
  • Verify that the documentation doesn't contain any offensive or outdated terms.
  • Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any
    words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file, words that might appear to be
    spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

# Misc.

  • All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,
    and should contain an Apache License 2.0 header comment at the top of each file.
  • Confirm that copyright years are up-to date whenever a file is changed.

Files:

  • packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py
packages/**/*

⚙️ CodeRabbit configuration file

packages/**/*:

  • This directory contains optional plugin packages for the toolkit, each should contain a pyproject.toml file.
  • The pyproject.toml file should declare a dependency on nvidia-nat or another package with a name starting
    with nvidia-nat-. This dependency should be declared using ~=<version>, and the version should be a two
    digit version (ex: ~=1.0).
  • Not all packages contain Python code, if they do they should also contain their own set of tests, in a
    tests/ directory at the same level as the pyproject.toml file.

Files:

  • packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py
🧬 Code graph analysis (1)
packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py (2)
packages/nvidia_nat_mcp/src/nat/plugins/mcp/utils.py (1)
  • truncate_session_id (24-37)
packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_base.py (2)
  • MCPBaseClient (128-407)
  • name (598-600)
🪛 Ruff (0.13.3)
packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py

398-400: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


401-401: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


401-401: Avoid specifying long messages outside the exception class

(TRY003)


408-408: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


409-409: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (3)
packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py (3)

47-49: LGTM! Lifetime task fields properly integrated.

The stop_event and lifetime_task fields are correctly added with appropriate defaults. The Event factory ensures each SessionData instance gets its own event object, and lifetime_task being optional handles cases where it hasn't been created yet.


239-256: LGTM! Cleanup properly respects cancel scope boundaries.

The cleanup logic correctly addresses the core issue by:

  1. Signaling stop_event to trigger graceful shutdown in the lifetime task
  2. Awaiting the lifetime task to ensure __aexit__ runs in the same task that called __aenter__
  3. Including fallback handling for edge cases where the lifetime task is missing

This prevents the "Attempted to exit cancel scope in a different task" error while maintaining thread-safe session management.
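
The pattern described above can be sketched in a few lines. This is an illustration under stated assumptions, not the code from `client_impl.py`: `fake_client`, `lifetime`, and `demo` are hypothetical names, and the fake client only records which task entered and exited it.

```python
import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def fake_client(log: list[str]):
    """Hypothetical client that records the name of the task entering and exiting it."""
    log.append(f"enter:{asyncio.current_task().get_name()}")
    try:
        yield
    finally:
        log.append(f"exit:{asyncio.current_task().get_name()}")


async def lifetime(log: list[str], stop_event: asyncio.Event, ready: asyncio.Event) -> None:
    # Enter and exit both happen inside this one task.
    async with fake_client(log):
        ready.set()
        await stop_event.wait()


async def demo() -> list[str]:
    log: list[str] = []
    stop_event = asyncio.Event()
    ready = asyncio.Event()
    task = asyncio.create_task(lifetime(log, stop_event, ready), name="lifetime")
    await ready.wait()
    # Cleanup runs in a different task (the main one): it only signals and waits.
    stop_event.set()
    await task
    return log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The cleanup side never touches the context manager directly, which is what keeps the exit in the task that performed the enter.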


302-309: LGTM! Session creation properly wired.

The unpacking and SessionData initialization correctly handles the new return type from _create_session_client, storing all lifecycle components needed for proper cleanup.

Signed-off-by: Anuradha Karuppiah <[email protected]>
Signed-off-by: Anuradha Karuppiah <[email protected]>
@coderabbitai coderabbitai bot added the breaking Breaking change label Oct 7, 2025
@AnuradhaKaruppiah AnuradhaKaruppiah removed the breaking Breaking change label Oct 7, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 49be113 and fe7d883.

📒 Files selected for processing (1)
  • tests/nat/mcp/test_mcp_session_management.py (12 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.{py,yaml,yml}

📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)

**/*.{py,yaml,yml}: Configure response_seq as a list of strings; values cycle per call, and [] yields an empty string.
Configure delay_ms to inject per-call artificial latency in milliseconds for nat_test_llm.

Files:

  • tests/nat/mcp/test_mcp_session_management.py
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)

**/*.py: Programmatic use: create TestLLMConfig(response_seq=[...], delay_ms=...), add with builder.add_llm("", cfg).
When retrieving the test LLM wrapper, use builder.get_llm(name, wrapper_type=LLMFrameworkEnum.) and call the framework’s method (e.g., ainvoke, achat, call).

**/*.py: In code comments/identifiers use NAT abbreviations as specified: nat for API namespace/CLI, nvidia-nat for package name, NAT for env var prefixes; do not use these abbreviations in documentation
Follow PEP 20 and PEP 8; run yapf with column_limit=120; use 4-space indentation; end files with a single trailing newline
Run ruff check --fix as linter (not formatter) using pyproject.toml config; fix warnings unless explicitly ignored
Respect naming: snake_case for functions/variables, PascalCase for classes, UPPER_CASE for constants
Treat pyright warnings as errors during development
Exception handling: use bare raise to re-raise; log with logger.error() when re-raising to avoid duplicate stack traces; use logger.exception() when catching without re-raising
Provide Google-style docstrings for every public module, class, function, and CLI command; first line concise and ending with a period; surround code entities with backticks
Validate and sanitize all user input, especially in web or CLI interfaces
Prefer httpx with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use async/await for I/O-bound work; profile CPU-heavy paths with cProfile or mprof before optimizing; cache expensive computations with functools.lru_cache or external cache; leverage NumPy vectorized operations when beneficial

Files:

  • tests/nat/mcp/test_mcp_session_management.py
tests/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Unit tests reside under tests/ and should use markers defined in pyproject.toml (e.g., integration)

Files:

  • tests/nat/mcp/test_mcp_session_management.py

⚙️ CodeRabbit configuration file

tests/**/*.py:

  • Ensure that tests are comprehensive, cover edge cases, and validate the functionality of the code.
  • Test functions should be named using the test_ prefix, using snake_case.
  • Any frequently repeated code should be extracted into pytest fixtures.
  • Pytest fixtures should define the name argument when applying the pytest.fixture decorator. The fixture
    function being decorated should be named using the fixture_ prefix, using snake_case.
    Example:
    @pytest.fixture(name="my_fixture")
    def fixture_my_fixture():
        pass

Files:

  • tests/nat/mcp/test_mcp_session_management.py
{tests/**/*.py,examples/*/tests/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

{tests/**/*.py,examples/*/tests/**/*.py}: Use pytest (with pytest-asyncio for async); name test files test_*.py; test functions start with test_; extract repeated code into fixtures; fixtures must set name in decorator and be named with fixture_ prefix
Mock external services with pytest_httpserver or unittest.mock; do not hit live endpoints
Mark expensive tests with @pytest.mark.slow or @pytest.mark.integration

Files:

  • tests/nat/mcp/test_mcp_session_management.py
**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions

  • Ensure the code follows best practices and coding standards.
    • For Python code, follow PEP 20 and PEP 8 for style guidelines.
  • Check for security vulnerabilities and potential issues.
    • Python methods should use type hints for all parameters and return values.
      Example:
      def my_function(param1: int, param2: str) -> bool:
          pass
  • For Python exception handling, ensure proper stack trace preservation:
    • When re-raising exceptions: use bare raise statements to maintain the original stack trace,
      and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.
    • When catching and logging exceptions without re-raising: always use logger.exception()
      to capture the full stack trace information.

# Documentation Review Instructions

  • Verify that documentation and comments are clear and comprehensive.
  • Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum".
  • Verify that the documentation doesn't contain any offensive or outdated terms.
  • Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any
    words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file, words that might appear to be
    spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

# Misc.

  • All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,
    and should contain an Apache License 2.0 header comment at the top of each file.
  • Confirm that copyright years are up-to date whenever a file is changed.

Files:

  • tests/nat/mcp/test_mcp_session_management.py
🧬 Code graph analysis (1)
tests/nat/mcp/test_mcp_session_management.py (1)
packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py (6)
  • SessionData (40-49)
  • cleanup_sessions (186-199)
  • _create_session_client (347-413)
  • _cleanup_inactive_sessions (201-256)
  • _get_session_client (258-314)
  • _session_usage_context (317-345)
🪛 Ruff (0.13.3)
tests/nat/mcp/test_mcp_session_management.py

599-599: Unused function argument: self

(ARG001)


607-607: Pattern passed to match= contains metacharacters but is neither escaped nor raw

(RUF043)
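
For context on RUF043: `pytest.raises(..., match=...)` applies the pattern with `re.search`, so unescaped metacharacters change what is matched. The sketch below reproduces the effect with `re` alone; the message text is hypothetical, since the test's actual pattern is not shown in this review.

```python
import re

# Hypothetical error message containing regex metacharacters: "(", ")" and ".".
message = "Failed to connect (status=1.0)"


def matches(pattern: str, text: str) -> bool:
    """Mimic how a match= pattern is applied: a regex search over the message."""
    return re.search(pattern, text) is not None


# Unescaped: "(...)" is parsed as a group, so the literal parentheses are never matched.
unescaped_hit = matches(message, message)

# Escaped: re.escape turns every metacharacter into a literal.
escaped_hit = matches(re.escape(message), message)

# A raw string with explicit escapes is the other common fix.
raw_hit = matches(r"\(status=1\.0\)", message)
```

Here the unescaped pattern fails to match its own text, while the escaped and raw-string variants both succeed.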


644-644: Unused function argument: self

(ARG001)


650-650: Unused function argument: self

(ARG001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: CI Pipeline / Check
🔇 Additional comments (5)
tests/nat/mcp/test_mcp_session_management.py (5)

34-49: LGTM! Well-structured test cleanup helper.

The cleanup logic properly handles the lifetime task lifecycle: signals stop_event, waits with timeout, and gracefully handles cancellation. The defensive hasattr checks ensure robustness.
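
A helper along these lines might look like the sketch below. It is an assumption-laden illustration rather than the helper from the test file: `cleanup_session`, its `timeout` parameter, and the `SimpleNamespace` sessions are hypothetical, and only the `stop_event` and `lifetime_task` attribute names come from the review.

```python
import asyncio
from types import SimpleNamespace


async def cleanup_session(session: object, timeout: float = 1.0) -> None:
    """Signal the lifetime task to stop and wait for it, cancelling on timeout."""
    if hasattr(session, "stop_event"):
        session.stop_event.set()
    task = getattr(session, "lifetime_task", None)
    if task is None or task.done():
        return
    try:
        await asyncio.wait_for(task, timeout=timeout)
    except asyncio.TimeoutError:
        # wait_for has already cancelled the task by the time this is raised.
        pass


async def demo() -> tuple[bool, bool]:
    async def waits_for_stop(event: asyncio.Event) -> None:
        await event.wait()

    async def ignores_stop() -> None:
        await asyncio.sleep(3600)

    good = SimpleNamespace(stop_event=asyncio.Event())
    good.lifetime_task = asyncio.create_task(waits_for_stop(good.stop_event))
    stuck = SimpleNamespace(stop_event=asyncio.Event(),
                            lifetime_task=asyncio.create_task(ignores_stop()))

    await cleanup_session(good)
    await cleanup_session(stuck, timeout=0.05)
    return good.lifetime_task.cancelled(), stuck.lifetime_task.cancelled()
```

In `demo`, the cooperative task finishes on its own after the stop signal, while the task that ignores the signal is cancelled once the timeout expires.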


126-127: Good test hygiene with cleanup calls.

Adding explicit cleanup at the end of tests that create sessions ensures proper resource management and prevents leaks between tests.

Also applies to: 147-148, 173-174, 198-199, 250-251, 280-281, 307-308, 363-364, 413-414, 453-454, 551-552


554-580: LGTM! Comprehensive test of successful initialization.

The test properly verifies the lifetime task is created, running, and that __aenter__ is called. The cleanup sequence (set stop_event, await task) is correct.


582-591: LGTM! Proper failure handling test.

The test correctly verifies that initialization failures are propagated with appropriate error messages.


610-844: Excellent comprehensive test coverage for lifetime task functionality.

These tests thoroughly cover:

  • Cleanup on stop_event signal
  • Cancel scope task boundaries (verifying enter/exit in same task)
  • Cleanup with lifetime tasks
  • Preservation of active sessions
  • Handling of already-completed tasks
  • Complete session lifecycle
  • Multiple independent sessions

The logic is correct, assertions are appropriate, and edge cases are well covered.

Note: Static analysis warnings about unused self parameters on lines 644 and 650 are false positives—these are intentional mock method signatures that require the parameter to match the async context manager protocol.

Signed-off-by: Anuradha Karuppiah <[email protected]>
@coderabbitai coderabbitai bot added the breaking Breaking change label Oct 7, 2025
@AnuradhaKaruppiah AnuradhaKaruppiah removed the breaking Breaking change label Oct 7, 2025
@AnuradhaKaruppiah

/merge

@rapids-bot rapids-bot bot merged commit 21f2e30 into NVIDIA:release/1.3 Oct 7, 2025
17 checks passed
@AnuradhaKaruppiah AnuradhaKaruppiah deleted the ak-user-isolation-2 branch October 7, 2025 21:37

Labels

bug Something isn't working non-breaking Non-breaking change
