-
Notifications
You must be signed in to change notification settings - Fork 676
chore(vllm): port range, tests #2187
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
de20a64 to
619c6bc
Compare
WalkthroughThis update modularizes and refactors port allocation and reservation for the Dynamo vLLM backend. It introduces a dedicated Changes
Sequence Diagram(s)sequenceDiagram
participant CLI/User
participant ArgsParser
participant Config
participant PortsModule
participant EtcdClient
CLI/User->>ArgsParser: Provide CLI args (including port min/max)
ArgsParser->>Config: Set config.port_range from args
Config->>PortsModule: Request port/block allocation
PortsModule->>PortsModule: Hold and check port(s) availability
PortsModule->>EtcdClient: Reserve port(s) atomically in ETCD
EtcdClient-->>PortsModule: Confirm reservation
PortsModule-->>Config: Return allocated port(s)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~35 minutes Possibly related PRs
Poem
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (10)
components/backends/vllm/src/dynamo/vllm/tests/__init__.py (1)
4-4: Docstring could convey more context.A one-liner is okay, but consider briefly stating what is being tested (e.g., “port-allocation utilities”) to help new contributors quickly understand the scope of this test package.
components/backends/vllm/src/dynamo/vllm/tests/README.md (2)
9-11: Incorrect path in usage example.The
cdtarget omits thevllmsegment; newcomers will end up in a non-existent directory.-cd components/backends/vllm/src/dynamo/tests +cd components/backends/vllm/src/dynamo/vllm/tests
25-27: Spell out full example once.Repeating the module path is redundant; a single example plus pointer to
pytest -kpattern matching keeps the README concise.components/backends/vllm/src/dynamo/vllm/tests/pytest.ini (1)
1-25: Consider deleting or uncommenting template.Keeping an entirely commented-out
pytest.iniadds noise. Either delete it (config lives inpyproject.toml) or uncomment the options you actually need.components/backends/vllm/src/dynamo/vllm/tests/test_ports.py (4)
26-50: Good test coverage for DynamoPortRange validation.The tests effectively cover valid ranges, out-of-bounds ranges, and invalid ordering. Consider adding a test for boundary values (min=1024, max=49151) to ensure edge cases work correctly.
52-67: Consider expanding test coverage for PortMetadata.The test covers metadata with block info well, but consider adding:
- A test case for metadata without block info
- Verification that
reserved_atandpidfields are included in the outputdef test_to_etcd_value_without_block_info(self): """Test converting metadata to ETCD value without block info.""" metadata = PortMetadata( worker_id="test-worker", reason="test-reason", ) value = metadata.to_etcd_value() assert value["worker_id"] == "test-worker" assert value["reason"] == "test-reason" assert "reserved_at" in value assert "pid" in value assert isinstance(value["reserved_at"], float) assert isinstance(value["pid"], int)
143-155: Consider using a cleaner mock pattern for context managers.The test logic is correct, but the context manager mocking could be more idiomatic.
with patch("dynamo.vllm.ports.check_port_available", return_value=True): with patch("dynamo.vllm.ports.hold_ports") as mock_hold: # Use MagicMock's built-in context manager support mock_hold.return_value = MagicMock() port = await allocate_and_reserve_port( context, metadata, port_range, max_attempts=5 )
212-226: Consider adding tests for error/fallback scenarios.The successful case is well tested, but
get_host_iphas fallback logic for various error conditions that should also be tested.Add test cases for:
- When
gethostname()fails- When
gethostbyname()fails- When binding to the resolved IP fails
Example:
def test_get_host_ip_hostname_failure(self): """Test fallback when hostname retrieval fails.""" with patch("socket.gethostname", side_effect=socket.error("Failed")): ip = get_host_ip() assert ip == "127.0.0.1"components/backends/vllm/src/dynamo/vllm/ports.py (1)
220-221: Consider increasing retry delay for better backoff.The 0.01s sleep between retries might be too aggressive, especially under high contention. Consider implementing exponential backoff or at least a slightly longer delay.
if attempt < actual_max_attempts: # Exponential backoff with jitter delay = min(0.1 * (2 ** (attempt - 1)), 1.0) + random.uniform(0, 0.1) await asyncio.sleep(delay)components/backends/vllm/src/dynamo/vllm/args.py (1)
188-194: Consider enhancing the error message for better debugging.The error message could be more helpful by explaining the relationship between dp_rank, tp_size, and required port range.
if base_side_channel_port < 0: min_required_start = dp_rank * tp_size + config.port_range.min raise ValueError( f"NIXL base port calculation resulted in negative port: " f"first_allocated_port={first_port_for_dp_rank}, offset={nixl_offset}, " f"base_port={base_side_channel_port}. " f"For dp_rank={dp_rank} and tp_size={tp_size}, the minimum allocated port " f"must be at least {min_required_start}. " f"Current range: {config.port_range.min}-{config.port_range.max}. " f"Consider using a higher port range or reducing dp_rank/tp_size." )
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
.gitignore(1 hunks)components/backends/vllm/src/dynamo/vllm/args.py(5 hunks)components/backends/vllm/src/dynamo/vllm/ports.py(1 hunks)components/backends/vllm/src/dynamo/vllm/tests/README.md(1 hunks)components/backends/vllm/src/dynamo/vllm/tests/__init__.py(1 hunks)components/backends/vllm/src/dynamo/vllm/tests/pytest.ini(1 hunks)components/backends/vllm/src/dynamo/vllm/tests/test_ports.py(1 hunks)pyproject.toml(1 hunks)
🔇 Additional comments (17)
.gitignore (1)
92-96: Good catch adding.coverage.Ignoring coverage artifacts keeps the repo clean. ✅
components/backends/vllm/src/dynamo/vllm/tests/test_ports.py (4)
1-24: LGTM! Well-organized imports and proper licensing headers.The imports are properly structured and all imported items are used in the tests.
69-99: Excellent test design for hold_ports context manager.The tests effectively verify port holding and release behavior. Good use of dynamic port allocation to avoid conflicts during testing.
101-127: Well-structured async test for ETCD reservation.The test properly mocks ETCD interactions and validates key format, lease ID usage, and JSON serialization. Good verification of the complete reservation flow.
157-210: Comprehensive test coverage for port block allocation.Good testing of both successful allocation and error conditions. The test properly verifies contiguous port allocation and validates that the correct number of ETCD reservations are made.
components/backends/vllm/src/dynamo/vllm/ports.py (9)
1-23: LGTM! Well-structured module header and imports.Good choice of default port range (20000-30000) within the registered ports section.
25-41: Excellent validation logic for port ranges.The dataclass properly validates that ports are within the registered range and enforces proper ordering. Clear error messages help with debugging.
43-54: Clean ETCD context abstraction.Good key design including the host IP for multi-node support.
56-75: Well-designed metadata structure.Good inclusion of debugging information (timestamp, PID) and flexible handling of block allocation metadata.
88-114: Robust context manager implementation.Good resource management with proper cleanup. The SO_REUSEADDR flag is appropriate for this use case.
116-124: Clean port availability check.Simple and effective implementation.
126-140: Proper ETCD reservation implementation.Good use of lease ID for automatic cleanup and JSON serialization for metadata.
229-261: Clean wrapper for single port allocation.Good code reuse by leveraging the block allocation function.
263-291: Excellent robust IP detection implementation.The function handles multiple failure scenarios gracefully with appropriate fallbacks. The bind test ensures the resolved IP is actually usable, which is a nice touch.
components/backends/vllm/src/dynamo/vllm/args.py (3)
15-25: LGTM! Clean import of port management utilities.All imported components are properly used in the refactored code.
44-44: Good addition of port range configuration.The type annotation properly uses the imported DynamoPortRange class.
75-86: Well-documented CLI arguments for port configuration.The help text clearly explains the constraints and defaults are sensible.
| assert ( | ||
| config.side_channel_port is not None | ||
| ), "Must set the kv_port, use configure_ports_with_etcd" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix outdated comment referencing kv_port.
The assertion only checks side_channel_port, but the comment still mentions kv_port.
- ), "Must set the kv_port, use configure_ports_with_etcd"
+ ), "Must set the side_channel_port, use configure_ports_with_etcd"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| assert ( | |
| config.side_channel_port is not None | |
| ), "Must set the kv_port, use configure_ports_with_etcd" | |
| assert ( | |
| config.side_channel_port is not None | |
| ), "Must set the side_channel_port, use configure_ports_with_etcd" |
🤖 Prompt for AI Agents
In components/backends/vllm/src/dynamo/vllm/args.py around lines 206 to 208,
update the assertion error message to correctly reference side_channel_port
instead of kv_port. Change the message to clearly indicate that
side_channel_port must be set and suggest using configure_ports_with_etcd if
needed, ensuring the comment matches the actual assertion condition.
| [pytest] | ||
| pythonpath = [ | ||
| ".", | ||
| "components/backends/vlm/src" | ||
| ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo breaks test discovery – vlm ➜ vllm.
The Python path entry is missing an “l”. Pytest will not be able to import modules under components/backends/vllm/src, causing ImportErrors.
pythonpath = [
".",
- "components/backends/vlm/src"
+ "components/backends/vllm/src"
]📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| [pytest] | |
| pythonpath = [ | |
| ".", | |
| "components/backends/vlm/src" | |
| ] | |
| [pytest] | |
| pythonpath = [ | |
| ".", | |
| "components/backends/vllm/src" | |
| ] |
🤖 Prompt for AI Agents
In pyproject.toml around lines 135 to 139, the pythonpath entry has a typo:
"vlm" should be corrected to "vllm". Update the path from
"components/backends/vlm/src" to "components/backends/vllm/src" to ensure pytest
can correctly discover and import the modules for testing.
|
@alec-flowers Are you still working on this? There are a couple of important Code Rabbit comments, then it looks ready. |
|
This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days. |
|
This PR has been closed due to inactivity. If you believe this PR is still relevant, please feel free to reopen it with additional context or information. |
Overview:
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
New Features
Documentation
Tests
Chores
.gitignoreto exclude coverage reports.