chore(vllm): port range, tests #2187

alec-flowers · 2025-07-30T16:03:01Z

Overview:

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

closes GitHub issue: #xxx

Summary by CodeRabbit

New Features
- Added configurable port range selection and improved port allocation logic for Dynamo vLLM backend, supporting allocation of contiguous port blocks for advanced parallelism scenarios.
- Introduced robust error handling and validation for port allocations.
- Added utilities for distributed port reservation and host IP detection.
Documentation
- Added README with instructions for running and configuring tests for the Dynamo vLLM backend.
- Provided a pytest configuration template for test customization.
Tests
- Introduced comprehensive unit and integration tests for port management utilities, including port range validation, metadata serialization, port holding, ETCD reservation, and host IP detection.
Chores
- Updated .gitignore to exclude coverage reports.
- Updated project configuration to support custom test paths.

coderabbitai · 2025-07-30T17:26:22Z

Walkthrough

This update modularizes and refactors port allocation and reservation for the Dynamo vLLM backend. It introduces a dedicated ports.py module for ETCD-coordinated port management, updates configuration to support user-defined port ranges, and adds comprehensive unit tests for the new logic. Documentation and configuration files are also updated.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| Port Management Utilities `components/backends/vllm/src/dynamo/vllm/ports.py` | New module for port allocation and ETCD-based reservation, including data classes for port range and metadata, context manager for holding ports, and synchronous/asynchronous APIs for port management and host IP detection. |
| Port Allocation Refactor `components/backends/vllm/src/dynamo/vllm/args.py` | Refactored to use new port management utilities, added support for configurable port ranges, updated CLI arguments, and improved logic for block allocation and error handling. Removed legacy port allocation and host IP logic. |
| Port Management Tests `components/backends/vllm/src/dynamo/vllm/tests/test_ports.py` | Added comprehensive tests for port range validation, metadata, port holding, ETCD reservation, single and block port allocation, and host IP retrieval using mocks and async testing. |
| Test Suite Documentation `components/backends/vllm/src/dynamo/vllm/tests/README.md` | Added README with instructions for running tests, coverage, and dependencies. |
| Test Suite Initialization `components/backends/vllm/src/dynamo/vllm/tests/__init__.py` | New file with module docstring and license headers for test package initialization. |
| Pytest Configuration `components/backends/vllm/src/dynamo/vllm/tests/pytest.ini` | Added fully-commented pytest configuration template for test discovery, markers, and options. |
| Pytest Path Configuration `pyproject.toml` | Added `[pytest]` section specifying Python path for test discovery and imports. |
| Coverage Ignore `.gitignore` | Added `.coverage` to ignore test coverage data files. |

Sequence Diagram(s)

sequenceDiagram
    participant CLI/User
    participant ArgsParser
    participant Config
    participant PortsModule
    participant EtcdClient

    CLI/User->>ArgsParser: Provide CLI args (including port min/max)
    ArgsParser->>Config: Set config.port_range from args
    Config->>PortsModule: Request port/block allocation
    PortsModule->>PortsModule: Hold and check port(s) availability
    PortsModule->>EtcdClient: Reserve port(s) atomically in ETCD
    EtcdClient-->>PortsModule: Confirm reservation
    PortsModule-->>Config: Return allocated port(s)
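
The reserve-and-retry part of the diagram can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in ports.py: `InMemoryStore` is a hypothetical stand-in for ETCD, `allocate_block`, `create_if_absent`, and the `ports/<host>/<port>` key format are invented for the example, and the local socket-holding step is left out.

```python
import random


class InMemoryStore:
    """Hypothetical stand-in for ETCD: create-if-absent on a plain dict."""

    def __init__(self):
        self.data = {}

    def create_if_absent(self, key, value):
        """Return True if the key was created, False if it already existed."""
        if key in self.data:
            return False
        self.data[key] = value
        return True


def allocate_block(store, host, port_min, port_max, block_size,
                   max_attempts=100, rng=None):
    """Reserve `block_size` contiguous ports inside [port_min, port_max].

    Returns the reserved ports, or raises RuntimeError when no free block
    is found within `max_attempts` tries.
    """
    rng = rng or random.Random()
    last_start = port_max - block_size + 1
    if last_start < port_min:
        raise ValueError("block_size does not fit in the port range")

    for _ in range(max_attempts):
        start = rng.randint(port_min, last_start)
        ports = list(range(start, start + block_size))
        reserved = []
        for port in ports:
            if store.create_if_absent(f"ports/{host}/{port}", "reserved"):
                reserved.append(port)
            else:
                break
        if len(reserved) == block_size:
            return ports
        # Roll back the partial reservation before trying another start.
        for port in reserved:
            del store.data[f"ports/{host}/{port}"]
    raise RuntimeError("could not reserve a contiguous port block")
```

Rolling back partial reservations keeps a failed attempt from leaving stray keys that would block later callers.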

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~35 minutes

Possibly related PRs

fix: add better port logic #2175: Introduces the same ports.py module and refactors args.py to use it, directly overlapping with this PR.
fix: port race condition through deterministic ports #1937: Also addresses port allocation but uses a deterministic approach without ETCD; related by topic but not by direct code overlap.

Poem

In a warren of ports, the rabbits convene,
To allocate numbers both tidy and clean.
With ETCD’s help, they hop to reserve,
Ensuring no conflicts, just what we deserve!
Tests in the meadow, configs in the breeze—
The Dynamo backend now runs with such ease.
🐇✨

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (10)

components/backends/vllm/src/dynamo/vllm/tests/__init__.py (1)

4-4: Docstring could convey more context.

A one-liner is okay, but consider briefly stating what is being tested (e.g., “port-allocation utilities”) to help new contributors quickly understand the scope of this test package.
components/backends/vllm/src/dynamo/vllm/tests/README.md (2)
9-11: Incorrect path in usage example.

The cd target omits the vllm segment; newcomers will end up in a non-existent directory.
-cd components/backends/vllm/src/dynamo/tests
+cd components/backends/vllm/src/dynamo/vllm/tests
25-27: Spell out full example once.

Repeating the module path is redundant; a single example plus pointer to pytest -k pattern matching keeps the README concise.
components/backends/vllm/src/dynamo/vllm/tests/pytest.ini (1)

1-25: Consider deleting or uncommenting template.

Keeping an entirely commented-out pytest.ini adds noise. Either delete it (config lives in pyproject.toml) or uncomment the options you actually need.
components/backends/vllm/src/dynamo/vllm/tests/test_ports.py (4)
26-50: Good test coverage for DynamoPortRange validation.

The tests effectively cover valid ranges, out-of-bounds ranges, and invalid ordering. Consider adding a test for boundary values (min=1024, max=49151) to ensure edge cases work correctly.
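
A boundary-value test along these lines might look like the sketch below. `PortRange` is a hypothetical stand-in for `DynamoPortRange`, written only so the example runs on its own; the 1024 and 49151 limits come from the comment above, and treating `min == max` as invalid is an assumption.

```python
from dataclasses import dataclass

REGISTERED_MIN = 1024   # lowest registered port
REGISTERED_MAX = 49151  # highest registered port


@dataclass(frozen=True)
class PortRange:
    """Hypothetical stand-in for DynamoPortRange (not the PR's class)."""

    min: int
    max: int

    def __post_init__(self):
        for name, value in (("min", self.min), ("max", self.max)):
            if not REGISTERED_MIN <= value <= REGISTERED_MAX:
                raise ValueError(
                    f"{name}={value} is outside {REGISTERED_MIN}-{REGISTERED_MAX}"
                )
        # Assumption: an empty or single-port range is rejected.
        if self.min >= self.max:
            raise ValueError(f"min ({self.min}) must be less than max ({self.max})")


def test_boundary_values_are_accepted():
    """Both ends of the registered range should be valid."""
    port_range = PortRange(min=REGISTERED_MIN, max=REGISTERED_MAX)
    assert port_range.min == 1024
    assert port_range.max == 49151
```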

52-67: Consider expanding test coverage for PortMetadata.

The test covers metadata with block info well, but consider adding:

- A test case for metadata without block info
- Verification that reserved_at and pid fields are included in the output
def test_to_etcd_value_without_block_info(self):
    """Test converting metadata to ETCD value without block info."""
    metadata = PortMetadata(
        worker_id="test-worker",
        reason="test-reason",
    )
    
    value = metadata.to_etcd_value()
    assert value["worker_id"] == "test-worker"
    assert value["reason"] == "test-reason"
    assert "reserved_at" in value
    assert "pid" in value
    assert isinstance(value["reserved_at"], float)
    assert isinstance(value["pid"], int)
143-155: Consider using a cleaner mock pattern for context managers.

The test logic is correct, but the context manager mocking could be more idiomatic.
with patch("dynamo.vllm.ports.check_port_available", return_value=True):
    with patch("dynamo.vllm.ports.hold_ports") as mock_hold:
        # Use MagicMock's built-in context manager support
        mock_hold.return_value = MagicMock()
        
        port = await allocate_and_reserve_port(
            context, metadata, port_range, max_attempts=5
        )
212-226: Consider adding tests for error/fallback scenarios.

The successful case is well tested, but get_host_ip has fallback logic for various error conditions that should also be tested.

Add test cases for:

- When gethostname() fails
- When gethostbyname() fails
- When binding to the resolved IP fails

Example:
def test_get_host_ip_hostname_failure(self):
    """Test fallback when hostname retrieval fails."""
    with patch("socket.gethostname", side_effect=socket.error("Failed")):
        ip = get_host_ip()
        assert ip == "127.0.0.1"
components/backends/vllm/src/dynamo/vllm/ports.py (1)
220-221: Consider increasing retry delay for better backoff.

The 0.01s sleep between retries might be too aggressive, especially under high contention. Consider implementing exponential backoff or at least a slightly longer delay.
if attempt < actual_max_attempts:
    # Exponential backoff with jitter
    delay = min(0.1 * (2 ** (attempt - 1)), 1.0) + random.uniform(0, 0.1)
    await asyncio.sleep(delay)
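
The delay formula in this suggestion can be pulled into a small helper so it is testable in isolation. This is a sketch: `backoff_delay` and its parameter names are hypothetical, with defaults that mirror the constants in the snippet above.

```python
import random


def backoff_delay(attempt, base=0.1, cap=1.0, jitter=0.1, rng=random):
    """Seconds to wait before retry number `attempt` (1-based).

    The exponential part doubles per attempt and is capped at `cap`;
    a uniform jitter in [0, jitter] is added on top.
    """
    exponential = min(base * (2 ** (attempt - 1)), cap)
    return exponential + rng.uniform(0, jitter)
```

With these defaults the exponential part grows as 0.1, 0.2, 0.4, 0.8 and then stays at 1.0, so the total delay is at most 1.1 seconds.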
components/backends/vllm/src/dynamo/vllm/args.py (1)
188-194: Consider enhancing the error message for better debugging.

The error message could be more helpful by explaining the relationship between dp_rank, tp_size, and required port range.
if base_side_channel_port < 0:
    min_required_start = dp_rank * tp_size + config.port_range.min
    raise ValueError(
        f"NIXL base port calculation resulted in negative port: "
        f"first_allocated_port={first_port_for_dp_rank}, offset={nixl_offset}, "
        f"base_port={base_side_channel_port}. "
        f"For dp_rank={dp_rank} and tp_size={tp_size}, the minimum allocated port "
        f"must be at least {min_required_start}. "
        f"Current range: {config.port_range.min}-{config.port_range.max}. "
        f"Consider using a higher port range or reducing dp_rank/tp_size."
    )

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b69c507 and 2b94a66.

📒 Files selected for processing (8)

.gitignore (1 hunks)
components/backends/vllm/src/dynamo/vllm/args.py (5 hunks)
components/backends/vllm/src/dynamo/vllm/ports.py (1 hunks)
components/backends/vllm/src/dynamo/vllm/tests/README.md (1 hunks)
components/backends/vllm/src/dynamo/vllm/tests/__init__.py (1 hunks)
components/backends/vllm/src/dynamo/vllm/tests/pytest.ini (1 hunks)
components/backends/vllm/src/dynamo/vllm/tests/test_ports.py (1 hunks)
pyproject.toml (1 hunks)

🔇 Additional comments (17)

.gitignore (1)

92-96: Good catch adding .coverage.

Ignoring coverage artifacts keeps the repo clean. ✅

components/backends/vllm/src/dynamo/vllm/tests/test_ports.py (4)

1-24: LGTM! Well-organized imports and proper licensing headers.

The imports are properly structured and all imported items are used in the tests.

69-99: Excellent test design for hold_ports context manager.

The tests effectively verify port holding and release behavior. Good use of dynamic port allocation to avoid conflicts during testing.

101-127: Well-structured async test for ETCD reservation.

The test properly mocks ETCD interactions and validates key format, lease ID usage, and JSON serialization. Good verification of the complete reservation flow.

157-210: Comprehensive test coverage for port block allocation.

Good testing of both successful allocation and error conditions. The test properly verifies contiguous port allocation and validates that the correct number of ETCD reservations are made.

components/backends/vllm/src/dynamo/vllm/ports.py (9)

1-23: LGTM! Well-structured module header and imports.

Good choice of default port range (20000-30000) within the registered ports section.

25-41: Excellent validation logic for port ranges.

The dataclass properly validates that ports are within the registered range and enforces proper ordering. Clear error messages help with debugging.

43-54: Clean ETCD context abstraction.

Good key design including the host IP for multi-node support.

56-75: Well-designed metadata structure.

Good inclusion of debugging information (timestamp, PID) and flexible handling of block allocation metadata.
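
For readers without the diff at hand, a metadata record of this shape could look like the sketch below. It is a reconstruction from the field names used in the review's test snippet (`worker_id`, `reason`, `reserved_at`, `pid`), not the PR's code; `PortMetadataSketch`, `block_info`, and `port_key` with its key layout are hypothetical.

```python
import json
import os
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PortMetadataSketch:
    """Hypothetical reconstruction of the reservation metadata."""

    worker_id: str
    reason: str
    block_info: Optional[dict] = None  # assumed to be optional extra fields
    reserved_at: float = field(default_factory=time.time)
    pid: int = field(default_factory=os.getpid)

    def to_etcd_value(self) -> dict:
        """Build the JSON-serializable value stored next to the port key."""
        value = {
            "worker_id": self.worker_id,
            "reason": self.reason,
            "reserved_at": self.reserved_at,
            "pid": self.pid,
        }
        if self.block_info:
            value.update(self.block_info)
        return value


def port_key(namespace: str, host_ip: str, port: int) -> str:
    """Hypothetical key layout; including the host IP keeps nodes apart."""
    return f"{namespace}/ports/{host_ip}/{port}"
```

Because the key contains the host IP, two nodes can reserve the same port number without colliding.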

88-114: Robust context manager implementation.

Good resource management with proper cleanup. The SO_REUSEADDR flag is appropriate for this use case.
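
The pattern being praised can be sketched as follows. The function names match those patched in the tests (`hold_ports`, `check_port_available`), but the bodies are a guess at the general technique, and binding on all interfaces is an assumption.

```python
import socket
from contextlib import contextmanager


@contextmanager
def hold_ports(ports, host=""):
    """Bind to each port and keep the sockets open until the block exits."""
    if isinstance(ports, int):
        ports = [ports]
    sockets = []
    try:
        for port in ports:
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            sockets.append(sock)  # track first so cleanup covers bind failures
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            sock.bind((host, port))
        yield
    finally:
        for sock in sockets:
            sock.close()


def check_port_available(port, host=""):
    """Return True if a fresh socket can bind to the port right now."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return True
    except OSError:
        return False
    finally:
        sock.close()
```

Appending each socket before binding means the `finally` block also closes a socket whose `bind` raised.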

116-124: Clean port availability check.

Simple and effective implementation.

126-140: Proper ETCD reservation implementation.

Good use of lease ID for automatic cleanup and JSON serialization for metadata.

229-261: Clean wrapper for single port allocation.

Good code reuse by leveraging the block allocation function.

263-291: Excellent robust IP detection implementation.

The function handles multiple failure scenarios gracefully with appropriate fallbacks. The bind test ensures the resolved IP is actually usable, which is a nice touch.
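
A fallback chain of the kind described here might look like this sketch. `detect_host_ip` is a hypothetical name, and the `127.0.0.1` fallback is taken from the example test earlier in this review; the real `get_host_ip` may differ.

```python
import socket


def detect_host_ip(fallback="127.0.0.1"):
    """Resolve this host's IPv4 address, falling back when anything fails."""
    try:
        hostname = socket.gethostname()
        ip = socket.gethostbyname(hostname)
    except OSError:
        # Covers socket.error and socket.gaierror, both OSError subclasses.
        return fallback

    # Bind test: make sure the resolved address is usable on this machine.
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            probe.bind((ip, 0))
    except OSError:
        return fallback
    return ip
```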

components/backends/vllm/src/dynamo/vllm/args.py (3)

15-25: LGTM! Clean import of port management utilities.

All imported components are properly used in the refactored code.

44-44: Good addition of port range configuration.

The type annotation properly uses the imported DynamoPortRange class.

75-86: Well-documented CLI arguments for port configuration.

The help text clearly explains the constraints and defaults are sensible.

coderabbitai · 2025-07-30T17:26:25Z

components/backends/vllm/src/dynamo/vllm/args.py

    assert (
        config.side_channel_port is not None
    ), "Must set the kv_port, use configure_ports_with_etcd"


⚠️ Potential issue

Fix outdated assertion message referencing kv_port.

The assertion only checks side_channel_port, but its message still mentions kv_port.

-    ), "Must set the kv_port, use configure_ports_with_etcd"
+    ), "Must set the side_channel_port, use configure_ports_with_etcd"

📝 Committable suggestion

Suggested change

-assert (
-    config.side_channel_port is not None
-), "Must set the kv_port, use configure_ports_with_etcd"
+assert (
+    config.side_channel_port is not None
+), "Must set the side_channel_port, use configure_ports_with_etcd"

🤖 Prompt for AI Agents

In components/backends/vllm/src/dynamo/vllm/args.py around lines 206 to 208, update the assertion error message to correctly reference side_channel_port instead of kv_port. Change the message to clearly indicate that side_channel_port must be set and suggest using configure_ports_with_etcd if needed, ensuring the comment matches the actual assertion condition.

coderabbitai · 2025-07-30T17:26:25Z

pyproject.toml

+[pytest]
+pythonpath = [
+    ".",
+    "components/backends/vlm/src"
+]


⚠️ Potential issue

Typo breaks test discovery – vlm ➜ vllm.

The Python path entry is missing an “l”. Pytest will not be able to import modules under components/backends/vllm/src, causing ImportErrors.

 pythonpath = [
     ".",
-    "components/backends/vlm/src"
+    "components/backends/vllm/src"
 ]

📝 Committable suggestion

Suggested change

-[pytest]
-pythonpath = [
-    ".",
-    "components/backends/vlm/src"
-]
+[pytest]
+pythonpath = [
+    ".",
+    "components/backends/vllm/src"
+]

🤖 Prompt for AI Agents

In pyproject.toml around lines 135 to 139, the pythonpath entry has a typo: "vlm" should be corrected to "vllm". Update the path from "components/backends/vlm/src" to "components/backends/vllm/src" to ensure pytest can correctly discover and import the modules for testing.
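
A typo like this is cheap to catch mechanically. The helper below is a hypothetical sketch, not part of the PR: given the configured path entries, it reports those that do not exist as directories under the repository root.

```python
from pathlib import Path


def missing_pythonpath_entries(entries, root="."):
    """Return the entries that are not existing directories under `root`."""
    root_path = Path(root)
    return [entry for entry in entries if not (root_path / entry).is_dir()]
```

Run from the repository root with the entries from this diff, it would be expected to flag `components/backends/vlm/src` if no such directory exists.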

grahamking · 2025-08-13T19:47:11Z

@alec-flowers Are you still working on this? There are a couple of important Code Rabbit comments, then it looks ready.

github-actions · 2025-09-13T09:32:12Z

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions · 2025-09-18T09:34:17Z

This PR has been closed due to inactivity. If you believe this PR is still relevant, please feel free to reopen it with additional context or information.

alec-flowers added 2 commits July 29, 2025 23:28

add new port reservation behavior

d9c09e6

bug fix

fc6d2a1

alec-flowers requested review from a team, GuanLuo, PeaBrane, biswapanda, grahamking, ishandhanani, jthomson04, kkranen, nnshah1, paulhendricks, piotrm-nvidia, ptarasiewiczNV, rmccorm4, ryanolson, tanmayv25, tedzhouhk and tmonty12 as code owners July 30, 2025 16:03

pull-request-size bot added the size/L label Jul 30, 2025

copy-pr-bot bot temporarily deployed to GITLAB July 30, 2025 16:03 Inactive

add tests

619c6bc

copy-pr-bot bot temporarily deployed to GITLAB July 30, 2025 16:03 Inactive

alec-flowers force-pushed the aflowers/add-tests-for-ports branch from de20a64 to 619c6bc Compare July 30, 2025 16:03

copy-pr-bot bot temporarily deployed to GITLAB July 30, 2025 16:04 Inactive

copy-pr-bot bot temporarily deployed to GITLAB July 30, 2025 16:09 Inactive

fix mypy

2b94a66

copy-pr-bot bot temporarily deployed to GITLAB July 30, 2025 16:14 Inactive

Base automatically changed from aflowers/fix-port-race-2 to main July 30, 2025 17:21

pull-request-size bot added size/XL and removed size/L labels Jul 30, 2025

coderabbitai bot reviewed Jul 30, 2025

grahamking approved these changes Aug 4, 2025

grahamking changed the title ~~add tests~~ chore(vllm): port range, tests Aug 4, 2025

github-actions bot added the chore label Aug 4, 2025

github-actions bot added the Stale label Sep 13, 2025

github-actions bot closed this Sep 18, 2025

github-actions bot deleted the aflowers/add-tests-for-ports branch September 18, 2025 09:34
