Conversation

@whybeyoung
Collaborator

Usage:
When the router starts without a specified tokenizer path or model path, it will fall back to retrieving the tokenizer and chat template content from the worker.

./sglang-router \
  --worker-urls=grpc://127.0.0.1:8000
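The fallback decision can be sketched as follows (illustrative Python only; the actual router is implemented in Rust, and the function name here is hypothetical):

```python
def resolve_tokenizer_source(tokenizer_path, model_path, worker_urls):
    """Decide where tokenizer assets come from: local paths win,
    otherwise fall back to the first gRPC-capable worker.

    Hypothetical sketch of the fallback behavior described above.
    """
    if tokenizer_path or model_path:
        return ("local", tokenizer_path or model_path)
    grpc_workers = [u for u in worker_urls if u.startswith("grpc://")]
    if not grpc_workers:
        raise ValueError(
            "No gRPC worker URL found. Cannot fetch tokenizer from worker"
        )
    return ("worker", grpc_workers[0])
```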

@gemini-code-assist
Contributor

Summary of Changes

Hello @whybeyoung, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the SGLang router's flexibility by enabling it to retrieve tokenizer and chat template configurations directly from the worker via a new gRPC service. This eliminates the previous requirement for local tokenizer file paths, simplifying deployment in distributed environments and improving operational resilience. The router now includes a fallback mechanism to fetch these assets from a worker if local paths are not specified, alongside improvements to handle large tokenizer data and more robust chat template rendering.

Highlights

  • Worker-side Tokenizer Retrieval: A new gRPC endpoint, GetTokenizerInfo, has been implemented on the worker to allow direct retrieval of tokenizer and chat template file contents.
  • Router Fallback Mechanism: The router now includes a fallback mechanism to fetch tokenizer information from a worker via gRPC if local tokenizer_path or model_path are not explicitly configured.
  • Dynamic Tokenizer Creation: A new utility function, create_tokenizer_from_memory, was added to enable the creation of tokenizers from in-memory content, facilitating the use of fetched tokenizer data.
  • gRPC Message Size Increase: The maximum gRPC message size for the scheduler client has been increased to 256MB to accommodate potentially large tokenizer files during transfer.
  • Chat Template Robustness: Chat template processing has been improved to gracefully handle cases where tools or documents parameters are None, preventing template rendering errors.
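To make the first two highlights concrete, here is a minimal sketch of the worker-side file-gathering step behind GetTokenizerInfo (the helper name and exact defaults are assumptions for illustration; the real handler lives in the Python scheduler's gRPC service):

```python
import os

# Assumed default file list, mirroring common HF tokenizer layouts.
DEFAULT_TOKENIZER_FILES = (
    "tokenizer.json",
    "tokenizer_config.json",
    "chat_template.jinja",
    "chat_template.json",
)


def collect_tokenizer_files(base_path, requested_files=None):
    """Read requested tokenizer files under base_path into a dict.

    Hypothetical sketch: if base_path is a file, serve it from its
    parent directory; files that do not exist are silently skipped so
    the router can decide how to fall back.
    """
    requested = set(requested_files or DEFAULT_TOKENIZER_FILES)
    base_dir = base_path
    if os.path.isfile(base_path):
        base_dir = os.path.dirname(base_path)
        requested.add(os.path.basename(base_path))
    files = {}
    for name in requested:
        path = os.path.join(base_dir, name)
        if os.path.isfile(path):
            with open(path, "rb") as f:
                files[name] = f.read()
    return files
```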

@gemini-code-assist bot left a comment (Contributor)

Code Review

This pull request introduces a valuable feature allowing the sglang-router to fetch tokenizer information directly from the worker via a new gRPC endpoint. This is particularly useful for deployments where the router and worker do not share a filesystem. The implementation is solid, with changes correctly applied to both the Python worker and the Rust router, including new protobuf definitions and robust fallback logic. My review focuses on improving code clarity and maintainability by suggesting refactors to reduce code duplication and simplify some of the new logic. I've also pointed out a few minor style issues. Overall, this is a great addition to the project.

Comment on lines +349 to +368
            requested_files = list(request.requested_files) if request.requested_files else []
            if not requested_files:
                # Default: read common tokenizer files
                requested_files = [
                    "tokenizer.json",
                    "tokenizer_config.json",
                    "chat_template.jinja",
                    "chat_template.json",
                ]

            files = {}
            base_dir = base_path

            # Check if base_path is a file or directory
            if os.path.isfile(base_path):
                base_dir = os.path.dirname(base_path)
                # If it's tokenizer.json, include it
                if os.path.basename(base_path) == "tokenizer.json":
                    if "tokenizer.json" not in requested_files:
                        requested_files.append("tokenizer.json")

Severity: medium

Using a list for requested_files can lead to processing duplicate filenames if the client sends them, and membership checking is less efficient than with a set. It's more robust to use a set to automatically handle duplicates and provide faster lookups (O(1) average time complexity).

            requested_files = set(request.requested_files)
            if not requested_files:
                # Default: read common tokenizer files
                requested_files = {
                    "tokenizer.json",
                    "tokenizer_config.json",
                    "chat_template.jinja",
                    "chat_template.json",
                }

            files = {}
            base_dir = base_path

            # Check if base_path is a file or directory
            if os.path.isfile(base_path):
                base_dir = os.path.dirname(base_path)
                # If it's tokenizer.json, include it
                if os.path.basename(base_path) == "tokenizer.json":
                    requested_files.add("tokenizer.json")
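A quick standalone illustration of the set behavior this recommendation relies on (plain Python, not the worker code itself):

```python
# Duplicate filenames sent by a client collapse automatically in a set,
# and membership checks are O(1) on average rather than O(n) as with a list.
requested = set(["tokenizer.json", "tokenizer.json", "chat_template.jinja"])
assert requested == {"tokenizer.json", "chat_template.jinja"}
assert "tokenizer.json" in requested  # O(1) average lookup
```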

Comment on lines +441 to +463
        let worker_url = match &config.mode {
            crate::config::types::RoutingMode::Regular { worker_urls }
            | crate::config::types::RoutingMode::OpenAI { worker_urls } => {
                worker_urls
                    .iter()
                    .find(|url| url.starts_with("grpc://"))
                    .ok_or_else(|| {
                        "No gRPC worker URL found. Cannot fetch tokenizer from worker".to_string()
                    })?
            }
            crate::config::types::RoutingMode::PrefillDecode { prefill_urls, decode_urls, .. } => {
                // Try prefill workers first, then decode workers
                // prefill_urls is Vec<(String, Option<u16>)>, decode_urls is Vec<String>
                prefill_urls
                    .iter()
                    .map(|(url, _)| url)
                    .chain(decode_urls.iter())
                    .find(|url| url.starts_with("grpc://"))
                    .ok_or_else(|| {
                        "No gRPC worker URL found. Cannot fetch tokenizer from worker".to_string()
                    })?
            }
        };

Severity: medium

The logic for finding a gRPC worker URL is duplicated for different routing modes. This can be simplified by first creating an iterator over all worker URLs and then searching for a gRPC URL. This refactoring improves code clarity and reduces repetition.

        let worker_urls_iter: Box<dyn Iterator<Item = &String>> = match &config.mode {
            crate::config::types::RoutingMode::Regular { worker_urls }
            | crate::config::types::RoutingMode::OpenAI { worker_urls } => Box::new(worker_urls.iter()),
            crate::config::types::RoutingMode::PrefillDecode { prefill_urls, decode_urls, .. } => {
                Box::new(prefill_urls.iter().map(|(url, _)| url).chain(decode_urls.iter()))
            }
        };

        let worker_url = worker_urls_iter
            .find(|url| url.starts_with("grpc://"))
            .ok_or_else(|| {
                "No gRPC worker URL found. Cannot fetch tokenizer from worker".to_string()
            })?;
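For readers less familiar with iterator chaining, the same search pattern can be expressed in Python with itertools.chain (illustrative only; the URLs below are made up):

```python
from itertools import chain

prefill_urls = [("http://prefill-1", None), ("grpc://10.0.0.1:8000", None)]
decode_urls = ["grpc://10.0.0.2:8000"]

# Chain prefill and decode URLs into one lazy stream,
# then pick the first gRPC-capable worker, or None if there is none.
all_urls = chain((url for url, _ in prefill_urls), decode_urls)
worker_url = next((u for u in all_urls if u.startswith("grpc://")), None)
assert worker_url == "grpc://10.0.0.1:8000"
```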


/// Load chat template from tokenizer config JSON
pub fn load_chat_template_from_config(config_path: &str) -> Result<Option<String>> {
    use tracing::info;

Severity: medium

It is conventional to place use statements at the top of the file, rather than inside a function. This improves readability and makes it easier to see all dependencies at a glance. Please move use tracing::info; to the top of the file.

    tokenizer_config_content: Option<&str>,
    chat_template_content: Option<&str>,
) -> Result<Arc<dyn traits::Tokenizer>> {
    use tempfile::TempDir;

Severity: medium

It is conventional to place use statements at the top of the file, rather than inside a function. This improves readability and makes it easier to see all dependencies at a glance. Please move use tempfile::TempDir; to the top of the file.

Comment on lines +207 to +239
    let final_chat_template = chat_template_path.or_else(|| {
        let config_path = temp_path.join("tokenizer_config.json");
        if config_path.exists() {
            // Try to extract chat_template field from tokenizer_config.json
            if let Ok(content) = fs::read_to_string(&config_path) {
                if let Ok(config_json) = serde_json::from_str::<serde_json::Value>(&content) {
                    if let Some(template_str) = config_json.get("chat_template").and_then(|v| v.as_str()) {
                        // Write extracted template to a separate file
                        let extracted_path = temp_path.join("chat_template.jinja");
                        if fs::write(&extracted_path, template_str).is_ok() {
                            info!(
                                config_path = ?config_path,
                                extracted_path = ?extracted_path,
                                temp_dir = ?temp_path,
                                "Extracted chat template from tokenizer_config.json and saved to temporary file"
                            );
                            extracted_path.to_str().map(|s| s.to_string())
                        } else {
                            None
                        }
                    } else {
                        None
                    }
                } else {
                    None
                }
            } else {
                None
            }
        } else {
            None
        }
    });

Severity: medium

The logic for discovering and extracting the chat template from tokenizer_config.json is deeply nested, which can make it hard to read and maintain. This can be flattened using the ? operator within the closure, which works with Option and makes the code more linear and easier to follow.

    let final_chat_template = chat_template_path.or_else(|| {
        let config_path = temp_path.join("tokenizer_config.json");
        if !config_path.exists() {
            return None;
        }

        let content = fs::read_to_string(&config_path).ok()?;
        let config_json: serde_json::Value = serde_json::from_str(&content).ok()?;
        let template_str = config_json.get("chat_template")?.as_str()?;

        let extracted_path = temp_path.join("chat_template.jinja");
        if fs::write(&extracted_path, template_str).is_ok() {
            info!(
                config_path = ?config_path,
                extracted_path = ?extracted_path,
                temp_dir = ?temp_path,
                "Extracted chat template from tokenizer_config.json and saved to temporary file"
            );
            extracted_path.to_str().map(|s| s.to_string())
        } else {
            None
        }
    });
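The same early-return flattening pattern, sketched in Python for comparison (the function name is hypothetical; Rust's `?` operator plays the role of the early `return None` here):

```python
import json
import os


def extract_chat_template(config_path):
    """Return the chat_template string from a tokenizer_config.json,
    or None, using early returns instead of nested conditionals."""
    if not os.path.exists(config_path):
        return None
    try:
        with open(config_path) as f:
            config = json.load(f)
    except (OSError, json.JSONDecodeError):
        return None
    template = config.get("chat_template")
    return template if isinstance(template, str) else None
```

Each failure mode exits immediately, so the happy path reads top to bottom without pyramid-shaped nesting.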
