Skip to content

Conversation

@nnshah1
Copy link
Contributor

@nnshah1 nnshah1 commented Jun 3, 2025

Overview:

Adding initial end to end test framework for further expansion.

Details:

Adds top level tests directory with dynamo cli based tests and marks

Where should the reviewer start?

tests

Summary by CodeRabbit

Summary by CodeRabbit

  • New Features
    • Introduced comprehensive end-to-end tests for deployment scenarios, including multi-GPU and multimodal model serving.
    • Added utilities for managing external runtime services (etcd and NATS) used during testing.
    • Implemented a process management utility for robust test environment setup and teardown.
    • Expanded pytest marker support for more granular test selection and filtering.
  • Documentation
    • Added detailed testing framework documentation, including setup instructions, test conventions, and troubleshooting guidance.

@copy-pr-bot
Copy link

copy-pr-bot bot commented Jun 3, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
tests/utils/managed_process.py (3)

92-94: Fix remaining f-string logging issues.

Despite previous reviews flagging this, f-string logging is still present. Use lazy formatting for better performance and consistency with logging best practices.

Apply this diff:

-        self._logger.info(
-            f"Running command: {' '.join(self.command)} in {self.working_dir or os.getcwd()}"
-        )
+        self._logger.info(
+            "Running command: %s in %s", 
+            ' '.join(self.command), 
+            self.working_dir or os.getcwd()
+        )

142-142: Fix f-string logging in warning message.

This logging statement still uses f-string format instead of lazy formatting.

Apply this diff:

-            self._logger.warning(f"Warning: Failed to remove directory {path}: {e}")
+            self._logger.warning("Failed to remove directory %s: %s", path, e)

153-153: Complete the f-string logging cleanup throughout the file.

Multiple logging statements throughout the file still use f-strings instead of lazy formatting. This was flagged in previous reviews but appears incomplete.

Apply lazy formatting to all remaining f-string log statements:

-        self._logger.info(f"Checking Port: {port}")
+        self._logger.info("Checking Port: %s", port)

-        self._logger.info(f"SUCCESS: Check Port:{port}")
+        self._logger.info("SUCCESS: Check Port: %s", port)

-        self._logger.error(f"FAILED: Check Port: {port}")
+        self._logger.error("FAILED: Check Port: %s", port)

# And similar patterns for all other f-string logs in the file

Also applies to: 157-157, 160-160, 176-176, 182-182, 185-185, 188-188, 193-193, 200-200, 203-203, 205-205, 207-209

🧹 Nitpick comments (3)
tests/utils/managed_process.py (1)

29-42: Consider refactoring to reduce instance attributes.

The static analysis correctly flags that this dataclass has 17 instance attributes, which exceeds the recommended limit of 7. While this is functional, it suggests the class may have too many responsibilities.

Consider splitting into separate classes:

  • ProcessConfig: command, env, working_dir, display_output
  • HealthCheckConfig: health_check_ports, health_check_urls, timeout
  • CleanupConfig: data_dir, terminate_existing, stragglers, log_dir

This would improve maintainability and follow the Single Responsibility Principle.

🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 30-30: Too many instance attributes (17/7)

(R0902)

tests/README.md (2)

33-33: Add missing article for better readability.

The static analysis correctly identifies a missing article that would improve readability.

Apply this diff:

-To run all tests:
+To run all the tests:
🧰 Tools
🪛 LanguageTool

[uncategorized] ~33-~33: You might be missing the article “the” here.
Context: ... "dynamo" ## Running Tests To run all tests: bash pytest To run only specif...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)


91-91: Format the HuggingFace URL as a proper markdown link.

The bare URL should be formatted as a proper markdown link for better presentation and to satisfy markdown linting rules.

Apply this diff:

-  - Get a token from https://huggingface.co/settings/tokens
+  - Get a token from [HuggingFace settings](https://huggingface.co/settings/tokens)
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

91-91: Bare URL used
null

(MD034, no-bare-urls)

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 35010a0 and 702c713.

📒 Files selected for processing (2)
  • tests/README.md (1 hunks)
  • tests/utils/managed_process.py (1 hunks)
🧰 Additional context used
🪛 LanguageTool
tests/README.md

[uncategorized] ~33-~33: You might be missing the article “the” here.
Context: ... "dynamo" ## Running Tests To run all tests: bash pytest To run only specif...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)

🪛 markdownlint-cli2 (0.17.2)
tests/README.md

91-91: Bare URL used
null

(MD034, no-bare-urls)

🪛 Pylint (3.3.7)
tests/utils/managed_process.py

[refactor] 30-30: Too many instance attributes (17/7)

(R0902)


[refactor] 121-128: Consider using 'with' for resource-allocating operations

(R1732)


[refactor] 130-134: Consider using 'with' for resource-allocating operations

(R1732)


[refactor] 115-117: Consider using 'with' for resource-allocating operations

(R1732)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Build and Test - vllm
🔇 Additional comments (1)
tests/README.md (1)

1-98: Excellent documentation structure and comprehensive coverage.

The README provides clear, well-organized documentation for the testing framework. The structure is logical, covering discovery, organization, execution, and environment setup. The marker categorization is particularly well thought out and will help developers understand how to properly categorize their tests.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~33-~33: You might be missing the article “the” here.
Context: ... "dynamo" ## Running Tests To run all tests: bash pytest To run only specif...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)

🪛 markdownlint-cli2 (0.17.2)

91-91: Bare URL used
null

(MD034, no-bare-urls)

@nnshah1 nnshah1 disabled auto-merge June 5, 2025 00:30
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
tests/utils/managed_process.py (2)

188-188: Use proper warning method instead of deprecated warn().

The warn() method is deprecated in favor of warning().

-                self._logger.warn("URL check failed: %s", e)
+                self._logger.warning("URL check failed: %s", e)

223-239: Consider adding docstring to main function.

The main function serves as a usage example but lacks documentation explaining its purpose and parameters.

 def main():
+    """
+    Example usage of ManagedProcess for running a Dynamo command with health checks.
+    """
     with ManagedProcess(
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 702c713 and b5db16f.

📒 Files selected for processing (2)
  • tests/serve/test_dynamo_serve.py (1 hunks)
  • tests/utils/managed_process.py (1 hunks)
🧰 Additional context used
🪛 Pylint (3.3.7)
tests/serve/test_dynamo_serve.py

[refactor] 155-155: Too few public methods (0/2)

(R0903)

tests/utils/managed_process.py

[refactor] 30-30: Too many instance attributes (17/7)

(R0902)


[refactor] 123-130: Consider using 'with' for resource-allocating operations

(R1732)


[refactor] 132-136: Consider using 'with' for resource-allocating operations

(R1732)


[refactor] 117-119: Consider using 'with' for resource-allocating operations

(R1732)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Build and Test - vllm
🔇 Additional comments (4)
tests/serve/test_dynamo_serve.py (3)

74-152: Well-structured deployment graph configurations.

The deployment graph dictionary provides comprehensive coverage of different deployment scenarios (aggregated, disaggregated, multimodal, etc.) with appropriate pytest markers for GPU requirements and backend types. The configuration is clean and follows a consistent pattern.


155-184: DynamoServeProcess implementation is solid.

The class appropriately extends ManagedProcess with Dynamo-specific logic. The conditional health check handling for multimodal deployments and the custom model readiness check are well-implemented.

🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 155-155: Too few public methods (0/2)

(R0903)


222-265: Robust retry and timeout handling.

The test correctly implements retry logic for common failure scenarios (no instances, model not found) and properly handles timeouts with the while-else pattern. The explicit pytest.fail() on timeout ensures tests don't silently pass.

tests/utils/managed_process.py (1)

29-47: Consider the high number of instance attributes.

While the dataclass has 17 instance attributes (exceeding the typical limit of 7), this is justified for a comprehensive process management utility that needs to handle various aspects like health checks, logging, cleanup, and configuration. The attributes are logically grouped and necessary for the functionality.

🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 30-30: Too many instance attributes (17/7)

(R0902)

@nnshah1 nnshah1 enabled auto-merge (squash) June 5, 2025 00:46
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (1)
tests/utils/managed_process.py (1)

165-169: Fix timeout parameter passing in URL checks.

Similar to port checks, the timeout should be decremented between URL checks to respect the overall timeout limit.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b5db16f and 2358c7d.

📒 Files selected for processing (1)
  • tests/utils/managed_process.py (1 hunks)
🧰 Additional context used
🪛 Pylint (3.3.7)
tests/utils/managed_process.py

[refactor] 30-30: Too many instance attributes (17/7)

(R0902)


[refactor] 123-130: Consider using 'with' for resource-allocating operations

(R1732)


[refactor] 132-136: Consider using 'with' for resource-allocating operations

(R1732)


[refactor] 117-119: Consider using 'with' for resource-allocating operations

(R1732)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Build and Test - vllm
🔇 Additional comments (1)
tests/utils/managed_process.py (1)

43-48: 🛠️ Refactor suggestion

Move class-level attributes to instance level.

Class-level attributes like _logger, _command_name, etc. are shared across all instances, which could lead to state pollution between different ManagedProcess instances running concurrently.

-    _logger = logging.getLogger()
-    _command_name = None
-    _log_path = None
-    _tee_proc = None
-    _sed_proc = None
+    def __post_init__(self):
+        self._logger = None
+        self._command_name = None
+        self._log_path = None
+        self._tee_proc = None
+        self._sed_proc = None

Likely an incorrect or invalid review comment.

nnshah1 and others added 2 commits June 4, 2025 18:04
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Neelay Shah <[email protected]>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Neelay Shah <[email protected]>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
tests/utils/managed_process.py (2)

62-63: Ensure timeout remains positive after health checks.

The timeout reduction logic could result in negative timeouts if health checks take longer than expected. While the current implementation might work, it's safer to ensure the timeout never goes negative.

Apply this diff to ensure timeout remains positive:

-            self.timeout -= self._check_ports(self.timeout)
-            self.timeout -= self._check_urls(self.timeout)
+            elapsed_port_time = self._check_ports(self.timeout)
+            self.timeout = max(0, self.timeout - elapsed_port_time)
+            elapsed_url_time = self._check_urls(self.timeout)
+            self.timeout = max(0, self.timeout - elapsed_port_time - elapsed_url_time)

149-153: 🛠️ Refactor suggestion

Fix timeout parameter passing in port checks.

The timeout value is not being decremented between port checks, potentially causing the total time to exceed the intended timeout when checking multiple ports.

Apply this diff to properly manage timeout across multiple port checks:

     def _check_ports(self, timeout):
         time_taken = 0
+        remaining_timeout = timeout
         for port in self.health_check_ports:
-            time_taken += self._check_port(port, timeout)
+            if remaining_timeout <= 0:
+                raise RuntimeError("Timeout exceeded while checking ports")
+            port_time = self._check_port(port, remaining_timeout)
+            time_taken += port_time
+            remaining_timeout = max(0, timeout - time_taken)
         return time_taken
🧹 Nitpick comments (1)
tests/utils/managed_process.py (1)

30-47: Consider separating configuration from internal state.

The dataclass mixes configuration parameters with internal state fields (prefixed with _). This violates the single responsibility principle and makes the class harder to understand.

Consider moving the internal state fields out of the dataclass definition and initializing them in __post_init__ or __enter__:

 @dataclass
 class ManagedProcess:
     command: List[str]
     env: Optional[dict] = None
     health_check_ports: List[int] = field(default_factory=list)
     health_check_urls: List[Any] = field(default_factory=list)
     timeout: int = 300
     working_dir: Optional[str] = None
     display_output: bool = False
     data_dir: Optional[str] = None
     terminate_existing: bool = True
     stragglers: List[str] = field(default_factory=list)
     log_dir: str = os.getcwd()
-
-    _logger = logging.getLogger()
-    _command_name = None
-    _log_path = None
-    _tee_proc = None
-    _sed_proc = None
+
+    def __post_init__(self):
+        self._logger = logging.getLogger(self.__class__.__name__)
+        self._command_name = None
+        self._log_path = None
+        self._tee_proc = None
+        self._sed_proc = None
+        self.proc = None
🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 30-30: Too many instance attributes (17/7)

(R0902)

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2358c7d and a0a4307.

📒 Files selected for processing (1)
  • tests/utils/managed_process.py (1 hunks)
🧰 Additional context used
🪛 Pylint (3.3.7)
tests/utils/managed_process.py

[refactor] 30-30: Too many instance attributes (17/7)

(R0902)


[refactor] 126-133: Consider using 'with' for resource-allocating operations

(R1732)


[refactor] 135-139: Consider using 'with' for resource-allocating operations

(R1732)


[refactor] 120-122: Consider using 'with' for resource-allocating operations

(R1732)

🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/1339/merge) by nnshah1.
tests/utils/managed_process.py

[error] 88-89: Black formatting check failed. The file was reformatted by the black hook. Run 'black tests/utils/managed_process.py' to fix code style issues.

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Build and Test - vllm
🔇 Additional comments (1)
tests/utils/managed_process.py (1)

1-15: LGTM!

The license header is properly formatted with standard Apache 2.0 licensing.

@nnshah1
Copy link
Contributor Author

nnshah1 commented Jun 5, 2025

@coderabbitai full review

@nnshah1 nnshah1 disabled auto-merge June 5, 2025 17:35
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
tests/serve/test_dynamo_serve.py (1)

224-229: Catch only request-related exceptions, not Exception

This broad except Exception swallows programming errors (e.g., KeyError, TypeError) that should fail the test immediately.

-            except Exception as e:
-                logger.warning("Request failed: %s", e)
+            except (requests.RequestException, requests.Timeout) as e:
+                logger.warning("Request failed, retrying: %s", e)

Same recommendation was made earlier; please avoid reverting to the generic catch-all.

🧹 Nitpick comments (3)
tests/README.md (1)

91-91: Minor formatting improvement: Use proper markdown link syntax.

The bare URL should be formatted as a proper markdown link for better consistency.

-- Get a token from https://huggingface.co/settings/tokens
+- Get a token from [HuggingFace Settings](https://huggingface.co/settings/tokens)
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

91-91: Bare URL used
null

(MD034, no-bare-urls)

tests/conftest.py (1)

30-30: Minor documentation inconsistency in date format comment.

The comment mentions "ISO 8601 UTC format with microseconds" but the format string doesn't include microseconds or timezone information.

-    datefmt="%Y-%m-%dT%H:%M:%S",  # ISO 8601 UTC format with microseconds
+    datefmt="%Y-%m-%dT%H:%M:%S",  # ISO 8601 format
tests/utils/deployment_graph.py (1)

46-67: Consider consistent error handling approach across response handlers.

The two response handlers have different error handling strategies:

  • multimodal_response_handler returns empty string for non-200 responses
  • completions_response_handler uses assertions that raise exceptions for validation failures

For consistency and predictable test behavior, consider aligning these approaches.

Option 1: Make both handlers return empty strings on errors:

 def completions_response_handler(response):
     if response.status_code != 200:
         return ""
     result = response.json()
-    assert "choices" in result, "Missing 'choices' in response"
-    assert len(result["choices"]) > 0, "Empty choices in response"
-    assert "message" in result["choices"][0], "Missing 'message' in first choice"
-    assert "content" in result["choices"][0]["message"], "Missing 'content' in message"
+    if ("choices" not in result or len(result["choices"]) == 0 or
+        "message" not in result["choices"][0] or
+        "content" not in result["choices"][0]["message"]):
+        return ""
     return result["choices"][0]["message"]["content"]

Option 2: Make both handlers raise exceptions for better error reporting in tests.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ba16ed5 and 1d0cf46.

📒 Files selected for processing (10)
  • container/deps/requirements.test.txt (1 hunks)
  • pyproject.toml (1 hunks)
  • tests/README.md (1 hunks)
  • tests/__init__.py (1 hunks)
  • tests/conftest.py (1 hunks)
  • tests/serve/__init__.py (1 hunks)
  • tests/serve/conftest.py (1 hunks)
  • tests/serve/test_dynamo_serve.py (1 hunks)
  • tests/utils/deployment_graph.py (1 hunks)
  • tests/utils/managed_process.py (1 hunks)
🧰 Additional context used
🪛 LanguageTool
tests/README.md

[uncategorized] ~33-~33: You might be missing the article “the” here.
Context: ... "dynamo" ## Running Tests To run all tests: bash pytest To run only specif...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)

🪛 markdownlint-cli2 (0.17.2)
tests/README.md

91-91: Bare URL used
null

(MD034, no-bare-urls)

🪛 Pylint (3.3.7)
tests/serve/test_dynamo_serve.py

[refactor] 155-155: Too few public methods (0/2)

(R0903)

tests/utils/managed_process.py

[refactor] 30-30: Too many instance attributes (17/7)

(R0902)


[refactor] 127-134: Consider using 'with' for resource-allocating operations

(R1732)


[refactor] 136-140: Consider using 'with' for resource-allocating operations

(R1732)


[refactor] 121-123: Consider using 'with' for resource-allocating operations

(R1732)

tests/conftest.py

[refactor] 34-34: Too few public methods (0/2)

(R0903)


[refactor] 58-58: Too few public methods (0/2)

(R0903)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Build and Test - vllm
🔇 Additional comments (9)
tests/__init__.py (1)

1-15: LGTM! Proper package initialization with correct license header.

The file correctly establishes the tests directory as a Python package with the appropriate Apache 2.0 license header.

container/deps/requirements.test.txt (1)

16-16: LGTM! Appropriate dependency for test process management.

The psutil>=5.0.0 dependency addition is well-justified for supporting process management utilities in the test framework.

tests/serve/__init__.py (1)

1-15: LGTM! Proper package initialization for serve tests.

The file correctly establishes the tests/serve directory as a Python package with consistent license header formatting.

tests/serve/conftest.py (1)

1-15: LGTM! Placeholder conftest.py with proper license header.

The file establishes a serve-specific pytest configuration file for future expansion of test fixtures and configurations.

pyproject.toml (1)

160-169: LGTM! Pytest markers are now comprehensive and well-organized.

The marker configuration successfully addresses previous feedback by:

  • Replacing the generic "gpu" marker with specific "gpu_1" and "gpu_2" markers for consistency
  • Adding all the markers referenced in the documentation (integration, unit, stress, etc.)
  • Including backend-specific markers (vllm, tensorrtllm, sglang) for granular test filtering

This provides a solid foundation for the new end-to-end testing framework.

tests/README.md (1)

1-99: Excellent testing framework documentation!

This README provides comprehensive guidance for the new testing framework, covering all essential aspects from directory structure to marker usage. The alignment with the pytest configuration in pyproject.toml ensures consistency between documentation and implementation.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~33-~33: You might be missing the article “the” here.
Context: ... "dynamo" ## Running Tests To run all tests: bash pytest To run only specif...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)

🪛 markdownlint-cli2 (0.17.2)

91-91: Bare URL used
null

(MD034, no-bare-urls)

tests/conftest.py (1)

34-76: Excellent fixture design for runtime services.

The implementation properly encapsulates etcd and NATS server management using the context manager pattern. The configuration parameters are well-chosen for testing environments, and the fixture design allows easy reuse across tests.

🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 34-34: Too few public methods (0/2)

(R0903)


[refactor] 58-58: Too few public methods (0/2)

(R0903)

tests/utils/deployment_graph.py (1)

20-43: Well-designed dataclasses for deployment testing.

The DeploymentGraph and Payload dataclasses provide a clean, type-safe structure for test configuration and data. The use of field(default_factory=list) for the optional marks field is correct and avoids mutable default arguments.

tests/utils/managed_process.py (1)

186-190: ⚠️ Potential issue

timeout - elapsed may turn negative → requests raises immediately

When the loop is almost out of time, timeout - elapsed becomes 0 or negative, which causes requests.get(..., timeout=0) to raise ValueError("timeout must be > 0"), aborting the health-check prematurely.

-                response = requests.get(url, timeout=timeout - elapsed)
+                remaining = max(0.1, timeout - elapsed)
+                response = requests.get(url, timeout=remaining)

Clamping the per-call timeout guarantees it stays valid while still respecting the overall deadline.

Likely an incorrect or invalid review comment.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (6)
tests/README.md (3)

84-86: Use inline code formatting instead of fenced triple-backticks inside a sentence

Triple-backticks open a fenced block and break Markdown rendering.
For inline commands, wrap them in single backticks:

-Tests are designed to run in the appropriate framework container built
-via ```./container/build.sh --framework X``` and run via
-```./container/run.sh --mount-workspace -it -- pytest```.
+Tests are designed to run in the appropriate framework container built
+via `./container/build.sh --framework X` and run via
+`./container/run.sh --mount-workspace -it -- pytest`.

33-35: Add the missing definite article for clarity

run all testsrun all the tests to match standard English usage.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~33-~33: You might be missing the article “the” here.
Context: ... "dynamo" ## Running Tests To run all tests: bash pytest To run only specif...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)


9-17: Prefer text or omit the language spec for directory diagrams

Specifying bash misleads editors/linters because the snippet is not executable shell
but an ASCII tree. Either change to text or drop the language hint.

-```bash
+```text
tests/conftest.py (1)

72-76: Scope could be widened to session for large start-up costs

Starting NATS + etcd for every single test adds ~2-3 s per test. If all
E2E tests rely on the same pair, declare:

@pytest.fixture(scope="session")
def runtime_services(request):
    ...

to spin them up once and reuse.

tests/serve/test_dynamo_serve.py (2)

225-236: Unused elapsed variable and repeated time.time() calls

Store the loop start once and reuse elapsed for clarity & micro-perf:

-start_time = time.time()
-...
-while time.time() - start_time < deployment_graph.timeout:
-    elapsed = time.time() - start_time
+start_time = time.time()
+while (elapsed := time.time() - start_time) < deployment_graph.timeout:

156-166: Allow dynamic ports to enable parallel E2E runs

Hard-coding port=8000 makes concurrent test workers contend. Accept a
parameter or pick a free port:

port = port or socket.getfreeport()

and propagate it to health-checks and request URLs.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1d0cf46 and b3823c2.

📒 Files selected for processing (5)
  • tests/README.md (1 hunks)
  • tests/conftest.py (1 hunks)
  • tests/serve/test_dynamo_serve.py (1 hunks)
  • tests/utils/deployment_graph.py (1 hunks)
  • tests/utils/managed_process.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/utils/deployment_graph.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
tests/conftest.py (1)
tests/utils/managed_process.py (1)
  • ManagedProcess (30-230)
🪛 Pylint (3.3.7)
tests/serve/test_dynamo_serve.py

[refactor] 155-155: Too few public methods (0/2)

(R0903)


[refactor] 208-208: Too many local variables (16/15)

(R0914)

tests/conftest.py

[refactor] 34-34: Too few public methods (0/2)

(R0903)


[refactor] 58-58: Too few public methods (0/2)

(R0903)

tests/utils/managed_process.py

[refactor] 30-30: Too many instance attributes (17/7)

(R0902)


[refactor] 127-134: Consider using 'with' for resource-allocating operations

(R1732)


[refactor] 136-140: Consider using 'with' for resource-allocating operations

(R1732)


[refactor] 121-123: Consider using 'with' for resource-allocating operations

(R1732)

🪛 LanguageTool
tests/README.md

[uncategorized] ~33-~33: You might be missing the article “the” here.
Context: ... "dynamo" ## Running Tests To run all tests: bash pytest To run only specif...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Build and Test - vllm
🔇 Additional comments (4)
tests/utils/managed_process.py (4)

29-48: The class complexity is justified for this subprocess management utility.

While static analysis flags the high number of instance attributes (17), this is reasonable for a comprehensive process management utility that needs to handle command execution, health checks, logging, cleanup, and various configuration options. The dataclass approach with proper defaults makes the interface clean despite the complexity.

🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 30-30: Too many instance attributes (17/7)

(R0902)


49-91: Well-implemented context manager with comprehensive cleanup.

The context manager properly handles both success and failure cases:

  • Exception handling in __enter__ ensures cleanup occurs on startup failures
  • __exit__ performs thorough cleanup of processes, file handles, and directories
  • Process iteration includes proper exception handling for race conditions

The implementation addresses previous review concerns effectively.


92-200: Process management and health checks are well-implemented.

The code demonstrates good practices:

  • Proper subprocess configuration with logging pipeline (sed/tee)
  • Health check methods with cumulative timeout management
  • Appropriate exception handling for network requests
  • Previous concerns about file descriptor inheritance were clarified as safe

The implementation effectively addresses feedback from extensive past reviews.

🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 127-134: Consider using 'with' for resource-allocating operations

(R1732)


[refactor] 136-140: Consider using 'with' for resource-allocating operations

(R1732)


[refactor] 121-123: Consider using 'with' for resource-allocating operations

(R1732)


201-254: Robust process termination and clear usage example.

The process termination implementation handles edge cases well:

  • Comprehensive exception handling for psutil operations
  • Recursive process tree termination
  • Graceful handling of already-terminated processes
  • Clear main function demonstrating typical usage

This represents a mature implementation that has evolved through thorough review feedback.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
tests/serve/test_dynamo_serve.py (1)

178-178: Consider more specific straggler process identification.

The stragglers=["http"] configuration may terminate unrelated system processes with that name. Consider using a more specific identifier or process tracking mechanism to avoid affecting other services on the system.

tests/utils/managed_process.py (1)

171-175: 🛠️ Refactor suggestion

Add timeout boundary check in URL health checks.

The URL checking loop could attempt requests with negative timeouts if the elapsed time exceeds the total timeout, potentially causing request errors.

     def _check_urls(self, timeout):
         elapsed = 0.0
         for url in self.health_check_urls:
-            elapsed += self._check_url(url, timeout - elapsed)
+            remaining_timeout = timeout - elapsed
+            if remaining_timeout <= 0:
+                raise RuntimeError("Timeout exceeded while checking URLs")
+            elapsed += self._check_url(url, remaining_timeout)
         return elapsed
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b3823c2 and b9ea317.

📒 Files selected for processing (3)
  • tests/conftest.py (1 hunks)
  • tests/serve/test_dynamo_serve.py (1 hunks)
  • tests/utils/managed_process.py (1 hunks)
🧰 Additional context used
🪛 Pylint (3.3.7)
tests/utils/managed_process.py

[refactor] 30-30: Too many instance attributes (17/7)

(R0902)


[refactor] 127-134: Consider using 'with' for resource-allocating operations

(R1732)


[refactor] 136-140: Consider using 'with' for resource-allocating operations

(R1732)


[refactor] 121-123: Consider using 'with' for resource-allocating operations

(R1732)

tests/conftest.py

[refactor] 35-35: Too few public methods (0/2)

(R0903)


[refactor] 60-60: Too few public methods (0/2)

(R0903)

tests/serve/test_dynamo_serve.py

[refactor] 155-155: Too few public methods (0/2)

(R0903)


[refactor] 211-211: Too many local variables (16/15)

(R0914)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Build and Test - vllm

@nnshah1 nnshah1 enabled auto-merge (squash) June 5, 2025 19:51
@nnshah1 nnshah1 merged commit 373f1f3 into main Jun 5, 2025
6 checks passed
@nnshah1 nnshah1 deleted the neelays/tests branch June 5, 2025 20:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants