Conversation
Summary of Changes
Hello @slin1237, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request aims to complete the SGLang Router's OpenAI API specification by incorporating a suite of SGLang-specific extensions. The primary goal is to bridge compatibility gaps and provide richer functionality for advanced use cases, such as fine-tuned control over sampling, structured output generation, and integration with specialized model features like LoRA and O1-style reasoning. This expansion ensures that the SGLang Router can fully leverage the capabilities of SGLang models while maintaining seamless integration with existing OpenAI API clients. Additionally, I've taken the opportunity to improve the maintainability of our test suite by reducing redundant code.
Highlights
- **Enhanced Sampling Parameters**: I've added comprehensive SGLang-specific sampling parameters to both the Chat Completions and Completions APIs, including `top_k`, `min_p`, `min_tokens`, and `repetition_penalty`.
- **Structured Generation Capabilities**: New structured generation constraints such as `regex`, `ebnf`, and `json_schema` (for Completions) are now supported, enabling more controlled and predictable model outputs.
- **Fine-Grained Generation Control**: Advanced generation control options such as `stop_token_ids`, `no_stop_trim`, `ignore_eos`, `continue_final_message`, and `skip_special_tokens` have been integrated.
- **LoRA and Model Customization**: The router now supports LoRA adapters via `lora_path` and includes `session_params` for session management, along with the ability to return `hidden_states`.
- **Reasoning Model Integration**: Support for O1-style reasoning models has been added with `separate_reasoning`, `stream_reasoning`, and `reasoning_content` fields.
- **Test Code Refactoring and Cleanup**: I've significantly cleaned up test code across `benchmark_integration.rs`, `request_processing.rs`, and `pd_types.rs` by introducing helper functions and refactoring existing tests, reducing boilerplate by approximately 850 lines.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution. ↩
Code Review
This pull request does a great job of extending the OpenAI API types with SGLang-specific features and refactoring the test suite to be more maintainable by removing significant boilerplate. The changes are well-structured and the intent is clear.
However, I've identified a critical issue: while the new API fields are correctly defined in the request structs, they are not being passed through the to_pd_request adapter functions. This means the new parameters would be dropped and ignored by the backend. I've provided suggestions to fix this for each of the affected request types. I also included a minor suggestion to improve the conciseness of one of the new helper functions.
> **Warning:** You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!
@slin1237
Motivation
This PR updates the SGLang Router's OpenAI API specification by implementing critical SGLang-specific extensions that were previously missing.
Key Problems Addressed:
- Missing sampling parameters: `top_k`, `min_p`, `repetition_penalty`, `min_tokens`
- Missing structured generation: `regex`, `ebnf`, `json_schema` constraints
- Missing generation control: `stop_token_ids`, `ignore_eos`, `no_stop_trim` parameters
- Missing reasoning model support (`separate_reasoning`, `stream_reasoning`)

This improvement bridges the compatibility gap while maintaining full backward compatibility with existing OpenAI clients.
Modifications
Core SGLang Sampling & Generation Extensions
- Completed Chat Completions API (`ChatCompletionRequest`)
- Completed Completions API (`CompletionRequest`)

Advanced SGLang Features
- LoRA & Model Customization
- Reasoning Models (O1-style) Support
- Enhanced Response Types
Code Quality Improvements
Test Code Cleanup: eliminated ~850 lines of repetitive boilerplate.
Accuracy Test
This PR does not modify model-side code, kernels, or model architecture. All changes are to the router's API type definitions and request handling logic. Accuracy testing is not applicable.
Benchmark & Profiling
Performance Impact: Minimal; all changes maintain existing performance.
Serialization Performance (using existing benchmarks):
Checklist