Skip to content

Conversation

@alec-flowers
Copy link
Contributor

@alec-flowers alec-flowers commented Jul 30, 2025

Overview:

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • New Features

    • Added configurable port range selection and improved port allocation logic for Dynamo vLLM backend, supporting allocation of contiguous port blocks for advanced parallelism scenarios.
    • Introduced robust error handling and validation for port allocations.
    • Added utilities for distributed port reservation and host IP detection.
  • Documentation

    • Added README with instructions for running and configuring tests for the Dynamo vLLM backend.
    • Provided a pytest configuration template for test customization.
  • Tests

    • Introduced comprehensive unit and integration tests for port management utilities, including port range validation, metadata serialization, port holding, ETCD reservation, and host IP detection.
  • Chores

    • Updated .gitignore to exclude coverage reports.
    • Updated project configuration to support custom test paths.

Base automatically changed from aflowers/fix-port-race-2 to main July 30, 2025 17:21
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jul 30, 2025

Walkthrough

This update modularizes and refactors port allocation and reservation for the Dynamo vLLM backend. It introduces a dedicated ports.py module for ETCD-coordinated port management, updates configuration to support user-defined port ranges, and adds comprehensive unit tests for the new logic. Documentation and configuration files are also updated.

Changes

Cohort / File(s) Change Summary
Port Management Utilities
components/backends/vllm/src/dynamo/vllm/ports.py
New module for port allocation and ETCD-based reservation, including data classes for port range and metadata, context manager for holding ports, and synchronous/asynchronous APIs for port management and host IP detection.
Port Allocation Refactor
components/backends/vllm/src/dynamo/vllm/args.py
Refactored to use new port management utilities, added support for configurable port ranges, updated CLI arguments, and improved logic for block allocation and error handling. Removed legacy port allocation and host IP logic.
Port Management Tests
components/backends/vllm/src/dynamo/vllm/tests/test_ports.py
Added comprehensive tests for port range validation, metadata, port holding, ETCD reservation, single and block port allocation, and host IP retrieval using mocks and async testing.
Test Suite Documentation
components/backends/vllm/src/dynamo/vllm/tests/README.md
Added README with instructions for running tests, coverage, and dependencies.
Test Suite Initialization
components/backends/vllm/src/dynamo/vllm/tests/__init__.py
New file with module docstring and license headers for test package initialization.
Pytest Configuration
components/backends/vllm/src/dynamo/vllm/tests/pytest.ini
Added fully-commented pytest configuration template for test discovery, markers, and options.
Pytest Path Configuration
pyproject.toml
Added [pytest] section specifying Python path for test discovery and imports.
Coverage Ignore
.gitignore
Added .coverage to ignore test coverage data files.

Sequence Diagram(s)

sequenceDiagram
    participant CLI/User
    participant ArgsParser
    participant Config
    participant PortsModule
    participant EtcdClient

    CLI/User->>ArgsParser: Provide CLI args (including port min/max)
    ArgsParser->>Config: Set config.port_range from args
    Config->>PortsModule: Request port/block allocation
    PortsModule->>PortsModule: Hold and check port(s) availability
    PortsModule->>EtcdClient: Reserve port(s) atomically in ETCD
    EtcdClient-->>PortsModule: Confirm reservation
    PortsModule-->>Config: Return allocated port(s)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~35 minutes

Possibly related PRs

Poem

In a warren of ports, the rabbits convene,
To allocate numbers both tidy and clean.
With ETCD’s help, they hop to reserve,
Ensuring no conflicts, just what we deserve!
Tests in the meadow, configs in the breeze—
The Dynamo backend now runs with such ease.
🐇✨


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (10)
components/backends/vllm/src/dynamo/vllm/tests/__init__.py (1)

4-4: Docstring could convey more context.

A one-liner is okay, but consider briefly stating what is being tested (e.g., “port-allocation utilities”) to help new contributors quickly understand the scope of this test package.

components/backends/vllm/src/dynamo/vllm/tests/README.md (2)

9-11: Incorrect path in usage example.

The cd target omits the vllm segment; newcomers will end up in a non-existent directory.

-cd components/backends/vllm/src/dynamo/tests
+cd components/backends/vllm/src/dynamo/vllm/tests

25-27: Spell out full example once.

Repeating the module path is redundant; a single example plus pointer to pytest -k pattern matching keeps the README concise.

components/backends/vllm/src/dynamo/vllm/tests/pytest.ini (1)

1-25: Consider deleting or uncommenting template.

Keeping an entirely commented-out pytest.ini adds noise. Either delete it (config lives in pyproject.toml) or uncomment the options you actually need.

components/backends/vllm/src/dynamo/vllm/tests/test_ports.py (4)

26-50: Good test coverage for DynamoPortRange validation.

The tests effectively cover valid ranges, out-of-bounds ranges, and invalid ordering. Consider adding a test for boundary values (min=1024, max=49151) to ensure edge cases work correctly.


52-67: Consider expanding test coverage for PortMetadata.

The test covers metadata with block info well, but consider adding:

  1. A test case for metadata without block info
  2. Verification that reserved_at and pid fields are included in the output
def test_to_etcd_value_without_block_info(self):
    """Test converting metadata to ETCD value without block info."""
    metadata = PortMetadata(
        worker_id="test-worker",
        reason="test-reason",
    )
    
    value = metadata.to_etcd_value()
    assert value["worker_id"] == "test-worker"
    assert value["reason"] == "test-reason"
    assert "reserved_at" in value
    assert "pid" in value
    assert isinstance(value["reserved_at"], float)
    assert isinstance(value["pid"], int)

143-155: Consider using a cleaner mock pattern for context managers.

The test logic is correct, but the context manager mocking could be more idiomatic.

with patch("dynamo.vllm.ports.check_port_available", return_value=True):
    with patch("dynamo.vllm.ports.hold_ports") as mock_hold:
        # Use MagicMock's built-in context manager support
        mock_hold.return_value = MagicMock()
        
        port = await allocate_and_reserve_port(
            context, metadata, port_range, max_attempts=5
        )

212-226: Consider adding tests for error/fallback scenarios.

The successful case is well tested, but get_host_ip has fallback logic for various error conditions that should also be tested.

Add test cases for:

  1. When gethostname() fails
  2. When gethostbyname() fails
  3. When binding to the resolved IP fails

Example:

def test_get_host_ip_hostname_failure(self):
    """Test fallback when hostname retrieval fails."""
    with patch("socket.gethostname", side_effect=socket.error("Failed")):
        ip = get_host_ip()
        assert ip == "127.0.0.1"
components/backends/vllm/src/dynamo/vllm/ports.py (1)

220-221: Consider increasing retry delay for better backoff.

The 0.01s sleep between retries might be too aggressive, especially under high contention. Consider implementing exponential backoff or at least a slightly longer delay.

if attempt < actual_max_attempts:
    # Exponential backoff with jitter
    delay = min(0.1 * (2 ** (attempt - 1)), 1.0) + random.uniform(0, 0.1)
    await asyncio.sleep(delay)
components/backends/vllm/src/dynamo/vllm/args.py (1)

188-194: Consider enhancing the error message for better debugging.

The error message could be more helpful by explaining the relationship between dp_rank, tp_size, and required port range.

if base_side_channel_port < 0:
    min_required_start = dp_rank * tp_size + config.port_range.min
    raise ValueError(
        f"NIXL base port calculation resulted in negative port: "
        f"first_allocated_port={first_port_for_dp_rank}, offset={nixl_offset}, "
        f"base_port={base_side_channel_port}. "
        f"For dp_rank={dp_rank} and tp_size={tp_size}, the minimum allocated port "
        f"must be at least {min_required_start}. "
        f"Current range: {config.port_range.min}-{config.port_range.max}. "
        f"Consider using a higher port range or reducing dp_rank/tp_size."
    )
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b69c507 and 2b94a66.

📒 Files selected for processing (8)
  • .gitignore (1 hunks)
  • components/backends/vllm/src/dynamo/vllm/args.py (5 hunks)
  • components/backends/vllm/src/dynamo/vllm/ports.py (1 hunks)
  • components/backends/vllm/src/dynamo/vllm/tests/README.md (1 hunks)
  • components/backends/vllm/src/dynamo/vllm/tests/__init__.py (1 hunks)
  • components/backends/vllm/src/dynamo/vllm/tests/pytest.ini (1 hunks)
  • components/backends/vllm/src/dynamo/vllm/tests/test_ports.py (1 hunks)
  • pyproject.toml (1 hunks)
🔇 Additional comments (17)
.gitignore (1)

92-96: Good catch adding .coverage.

Ignoring coverage artifacts keeps the repo clean. ✅

components/backends/vllm/src/dynamo/vllm/tests/test_ports.py (4)

1-24: LGTM! Well-organized imports and proper licensing headers.

The imports are properly structured and all imported items are used in the tests.


69-99: Excellent test design for hold_ports context manager.

The tests effectively verify port holding and release behavior. Good use of dynamic port allocation to avoid conflicts during testing.


101-127: Well-structured async test for ETCD reservation.

The test properly mocks ETCD interactions and validates key format, lease ID usage, and JSON serialization. Good verification of the complete reservation flow.


157-210: Comprehensive test coverage for port block allocation.

Good testing of both successful allocation and error conditions. The test properly verifies contiguous port allocation and validates that the correct number of ETCD reservations are made.

components/backends/vllm/src/dynamo/vllm/ports.py (9)

1-23: LGTM! Well-structured module header and imports.

Good choice of default port range (20000-30000) within the registered ports section.


25-41: Excellent validation logic for port ranges.

The dataclass properly validates that ports are within the registered range and enforces proper ordering. Clear error messages help with debugging.


43-54: Clean ETCD context abstraction.

Good key design including the host IP for multi-node support.


56-75: Well-designed metadata structure.

Good inclusion of debugging information (timestamp, PID) and flexible handling of block allocation metadata.


88-114: Robust context manager implementation.

Good resource management with proper cleanup. The SO_REUSEADDR flag is appropriate for this use case.


116-124: Clean port availability check.

Simple and effective implementation.


126-140: Proper ETCD reservation implementation.

Good use of lease ID for automatic cleanup and JSON serialization for metadata.


229-261: Clean wrapper for single port allocation.

Good code reuse by leveraging the block allocation function.


263-291: Excellent robust IP detection implementation.

The function handles multiple failure scenarios gracefully with appropriate fallbacks. The bind test ensures the resolved IP is actually usable, which is a nice touch.

components/backends/vllm/src/dynamo/vllm/args.py (3)

15-25: LGTM! Clean import of port management utilities.

All imported components are properly used in the refactored code.


44-44: Good addition of port range configuration.

The type annotation properly uses the imported DynamoPortRange class.


75-86: Well-documented CLI arguments for port configuration.

The help text clearly explains the constraints and defaults are sensible.

Comment on lines 206 to 208
assert (
config.side_channel_port is not None
), "Must set the kv_port, use configure_ports_with_etcd"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix outdated comment referencing kv_port.

The assertion only checks side_channel_port, but the comment still mentions kv_port.

-    ), "Must set the kv_port, use configure_ports_with_etcd"
+    ), "Must set the side_channel_port, use configure_ports_with_etcd"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
assert (
config.side_channel_port is not None
), "Must set the kv_port, use configure_ports_with_etcd"
assert (
config.side_channel_port is not None
), "Must set the side_channel_port, use configure_ports_with_etcd"
🤖 Prompt for AI Agents
In components/backends/vllm/src/dynamo/vllm/args.py around lines 206 to 208,
update the assertion error message to correctly reference side_channel_port
instead of kv_port. Change the message to clearly indicate that
side_channel_port must be set and suggest using configure_ports_with_etcd if
needed, ensuring the comment matches the actual assertion condition.

Comment on lines +135 to +139
[pytest]
pythonpath = [
".",
"components/backends/vlm/src"
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Typo breaks test discovery – vlmvllm.

The Python path entry is missing an “l”. Pytest will not be able to import modules under components/backends/vllm/src, causing ImportErrors.

 pythonpath = [
     ".",
-    "components/backends/vlm/src"
+    "components/backends/vllm/src"
 ]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
[pytest]
pythonpath = [
".",
"components/backends/vlm/src"
]
[pytest]
pythonpath = [
".",
"components/backends/vllm/src"
]
🤖 Prompt for AI Agents
In pyproject.toml around lines 135 to 139, the pythonpath entry has a typo:
"vlm" should be corrected to "vllm". Update the path from
"components/backends/vlm/src" to "components/backends/vllm/src" to ensure pytest
can correctly discover and import the modules for testing.

@grahamking grahamking changed the title add tests chore(vllm): port range, tests Aug 4, 2025
@github-actions github-actions bot added the chore label Aug 4, 2025
@grahamking
Copy link
Contributor

@alec-flowers Are you still working on this? There are a couple of important Code Rabbit comments, then it looks ready.

@github-actions
Copy link

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Sep 13, 2025
@github-actions
Copy link

This PR has been closed due to inactivity. If you believe this PR is still relevant, please feel free to reopen it with additional context or information.

@github-actions github-actions bot closed this Sep 18, 2025
@github-actions github-actions bot deleted the aflowers/add-tests-for-ports branch September 18, 2025 09:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants