-
Notifications
You must be signed in to change notification settings - Fork 692
feat: reorganize sglang and add expert distribution endpoints #2181
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…uest handlers for SGLang endpoints
WalkthroughThis change introduces a new common module for sglang worker handlers, refactors both decode and worker handler classes to inherit from a shared abstract base class, and centralizes utility functions such as graceful shutdown. It also removes placeholder documentation sections from the SGLang backend README. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant WorkerHandler (BaseWorkerHandler)
participant Engine
participant Runtime
User->>WorkerHandler: generate(request)
WorkerHandler->>Engine: generate tokens
Engine-->>WorkerHandler: tokens/result
WorkerHandler-->>User: response
User->>WorkerHandler: flush_cache(request)
WorkerHandler->>Engine: flush_cache()
Engine-->>WorkerHandler: result
WorkerHandler-->>User: {"success": True/False, "msg": ...}
User->>WorkerHandler: start_expert_distribution_record(request)
WorkerHandler->>Engine: start_recording()
WorkerHandler-->>User: {"success": True, "msg": ...}
User->>WorkerHandler: stop_expert_distribution_record(request)
WorkerHandler->>Engine: stop_recording()
WorkerHandler-->>User: {"success": True, "msg": ...}
User->>WorkerHandler: dump_expert_distribution_record(request)
WorkerHandler->>Engine: dump_record()
WorkerHandler-->>User: {"success": True/False, "msg": ...}
Note over WorkerHandler,Runtime: On shutdown signal
WorkerHandler->>Runtime: graceful_shutdown()
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~15 minutes Possibly related PRs
Poem
Note ⚡️ Unit Test Generation is now available in beta!Learn more here, or try it out under "Finishing Touches" below. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (3)
components/backends/sglang/src/dynamo/sglang/common/base_handlers.py (2)
15-26: Consider making init concrete instead of abstract.Making the constructor abstract is unusual in Python. Since all subclasses need the same base initialization logic, consider making this a concrete method that subclasses can call via
super().__init__().- @abstractmethod - def __init__( + def __init__( self, engine: sgl.Engine, server_args: ServerArgs, component, decode_client: Optional[Any] = None, ): self.engine = engine self.server_args = server_args self.component = component
70-84: Improve directory validation and error handling.The directory validation logic could be more robust by checking if the path is actually a directory and handling edge cases.
# Validate directory - if not os.path.exists(dir): + if not os.path.exists(dir): yield { "status": "error", "message": f"Directory {dir} does not exist", } return + + if not os.path.isdir(dir): + yield { + "status": "error", + "message": f"Path {dir} is not a directory", + } + returncomponents/backends/sglang/src/dynamo/sglang/worker/main.py (1)
287-293: Clarify flush_cache method override.This class has its own
flush_cachemethod that differs from the base class implementation (usestokenizer_manager.flush_cache()vsengine.flush_cache()). Consider adding a comment explaining why this override is necessary.async def flush_cache(self, request: dict): + """Override base class to use tokenizer_manager for worker-specific cache flushing""" _ = request asyncio.create_task(self.engine.tokenizer_manager.flush_cache())
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
components/backends/sglang/README.md(0 hunks)components/backends/sglang/src/dynamo/sglang/common/__init__.py(1 hunks)components/backends/sglang/src/dynamo/sglang/common/base_handlers.py(1 hunks)components/backends/sglang/src/dynamo/sglang/common/sgl_utils.py(2 hunks)components/backends/sglang/src/dynamo/sglang/decode_worker/main.py(2 hunks)components/backends/sglang/src/dynamo/sglang/worker/main.py(1 hunks)
💤 Files with no reviewable changes (1)
- components/backends/sglang/README.md
🧰 Additional context used
🧠 Learnings (3)
components/backends/sglang/src/dynamo/sglang/common/sgl_utils.py (1)
Learnt from: nnshah1
PR: #1444
File: tests/fault_tolerance/utils/metrics.py:30-32
Timestamp: 2025-07-01T13:55:03.940Z
Learning: The @dynamo_worker() decorator in the dynamo codebase returns a wrapper that automatically injects the runtime parameter before calling the wrapped function. This means callers only need to provide the non-runtime parameters, while the decorator handles injecting the runtime argument automatically. For example, a function with signature async def get_metrics(runtime, log_dir) decorated with @dynamo_worker() can be called as get_metrics(log_dir) because the decorator wrapper injects the runtime parameter.
components/backends/sglang/src/dynamo/sglang/decode_worker/main.py (1)
Learnt from: nnshah1
PR: #1444
File: tests/fault_tolerance/utils/metrics.py:30-32
Timestamp: 2025-07-01T13:55:03.940Z
Learning: The @dynamo_worker() decorator in the dynamo codebase returns a wrapper that automatically injects the runtime parameter before calling the wrapped function. This means callers only need to provide the non-runtime parameters, while the decorator handles injecting the runtime argument automatically. For example, a function with signature async def get_metrics(runtime, log_dir) decorated with @dynamo_worker() can be called as get_metrics(log_dir) because the decorator wrapper injects the runtime parameter.
components/backends/sglang/src/dynamo/sglang/worker/main.py (1)
Learnt from: nnshah1
PR: #1444
File: tests/fault_tolerance/utils/metrics.py:30-32
Timestamp: 2025-07-01T13:55:03.940Z
Learning: The @dynamo_worker() decorator in the dynamo codebase returns a wrapper that automatically injects the runtime parameter before calling the wrapped function. This means callers only need to provide the non-runtime parameters, while the decorator handles injecting the runtime argument automatically. For example, a function with signature async def get_metrics(runtime, log_dir) decorated with @dynamo_worker() can be called as get_metrics(log_dir) because the decorator wrapper injects the runtime parameter.
🪛 GitHub Actions: Copyright Checks
components/backends/sglang/src/dynamo/sglang/common/base_handlers.py
[error] 1-1: Copyright check failed: Invalid or missing copyright header.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: pre-merge-rust (.)
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: Build and Test - vllm
🔇 Additional comments (7)
components/backends/sglang/src/dynamo/sglang/common/sgl_utils.py (2)
18-18: LGTM: Logging import added appropriately.The logging import is correctly added to support the new
graceful_shutdownfunction.
61-64: LGTM: Clean graceful shutdown implementation.The
graceful_shutdownfunction is well-implemented with appropriate logging and clean separation of concerns. The function signature correctly expects the runtime parameter to be passed explicitly, which aligns with how it's called in the worker modules.components/backends/sglang/src/dynamo/sglang/worker/main.py (2)
30-35: LGTM: Clean import consolidation.The import consolidation from
dynamo.sglang.commonimproves code organization and reduces import complexity.
40-48: LGTM: Proper inheritance implementation.The
RequestHandlerclass correctly inherits fromBaseWorkerHandlerand properly calls the superclass constructor.components/backends/sglang/src/dynamo/sglang/decode_worker/main.py (3)
18-22: LGTM: Consistent import consolidation.The import consolidation follows the same pattern as the worker module, improving consistency across the codebase.
27-29: LGTM: Proper base class integration.The
DecodeRequestHandlercorrectly inherits fromBaseWorkerHandlerand the constructor properly passes the required parameters to the superclass.
79-79: LGTM: Handler instantiation updated correctly.The handler instantiation correctly includes the new
server_argsandcomponentparameters required by the updated constructor.
components/backends/sglang/src/dynamo/sglang/common/__init__.py
Outdated
Show resolved
Hide resolved
components/backends/sglang/src/dynamo/sglang/common/__init__.py
Outdated
Show resolved
Hide resolved
components/backends/sglang/src/dynamo/sglang/common/base_handlers.py
Outdated
Show resolved
Hide resolved
…prove cache management in SGLang components
…PLB) and update SGLang HTTP server details
|
/ok to test ba4bc6e |
|
@ishandhanani would it make sense to put the benchmarking scripts somewhere in Also, not part of this PR, but there are some duplicate instructions in the |
biswapanda
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
…orks in pre-merge workflow
components/backends/sglang/src/dynamo/sglang/decode_worker/main.py
Outdated
Show resolved
Hide resolved
components/backends/sglang/src/dynamo/sglang/common/base_handlers.py
Outdated
Show resolved
Hide resolved
rmccorm4
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left some non-blocking comments/questions
…nd add ignore glob in pyproject.toml
Summary by CodeRabbit
Documentation
New Features
Refactor