Supports direct retrieval of tokenizer content and chat template content from the worker. #12513
Conversation
Signed-off-by: ybyang <[email protected]>
Summary of Changes
Hello @whybeyoung, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly enhances the SGLang router's flexibility by enabling it to retrieve tokenizer and chat template configurations directly from the worker via a new gRPC service. This eliminates the previous requirement for local tokenizer file paths, simplifying deployment in distributed environments and improving operational resilience. The router now includes a fallback mechanism to fetch these assets from a worker if local paths are not specified, alongside improvements to handle large tokenizer data and more robust chat template rendering.
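To make the described fallback concrete, here is a minimal Rust sketch of the flow. All identifiers (`RouterConfig`, `fetch_tokenizer_files`, `resolve_tokenizer_source`) are illustrative assumptions, not the PR's actual names: prefer a locally configured tokenizer path; otherwise pick a gRPC worker, fetch the tokenizer files from it, and write them to disk before loading.

```rust
use std::collections::HashMap;

// Hypothetical slice of the router configuration used in this sketch.
struct RouterConfig {
    tokenizer_path: Option<String>,
    worker_urls: Vec<String>,
}

// Placeholder for the new RPC call; a real implementation would use the
// generated gRPC client to request tokenizer.json, tokenizer_config.json, etc.
fn fetch_tokenizer_files(_worker_url: &str) -> Result<HashMap<String, Vec<u8>>, String> {
    Ok(HashMap::new())
}

fn resolve_tokenizer_source(config: &RouterConfig) -> Result<String, String> {
    if let Some(path) = &config.tokenizer_path {
        // A local path is configured: use it directly, as before.
        return Ok(path.clone());
    }
    // Fallback: find a gRPC worker and fetch the tokenizer content from it.
    let worker = config
        .worker_urls
        .iter()
        .find(|u| u.starts_with("grpc://"))
        .ok_or_else(|| "no gRPC worker available for tokenizer fetch".to_string())?;
    let files = fetch_tokenizer_files(worker)?;
    // Write the fetched files to a directory and return that directory path,
    // so the existing path-based tokenizer loading code can be reused.
    let dir = std::env::temp_dir().join("router-tokenizer");
    std::fs::create_dir_all(&dir).map_err(|e| e.to_string())?;
    for (name, bytes) in files {
        std::fs::write(dir.join(name), bytes).map_err(|e| e.to_string())?;
    }
    Ok(dir.to_string_lossy().into_owned())
}
```

The actual PR wires this into router startup and performs the fetch over the new gRPC service; the sketch only shows the decision logic under those assumptions.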
Code Review
This pull request introduces a valuable feature allowing the sglang-router to fetch tokenizer information directly from the worker via a new gRPC endpoint. This is particularly useful for deployments where the router and worker do not share a filesystem. The implementation is solid, with changes correctly applied to both the Python worker and the Rust router, including new protobuf definitions and robust fallback logic. My review focuses on improving code clarity and maintainability by suggesting refactors to reduce code duplication and simplify some of the new logic. I've also pointed out a few minor style issues. Overall, this is a great addition to the project.
requested_files = list(request.requested_files) if request.requested_files else []
if not requested_files:
    # Default: read common tokenizer files
    requested_files = [
        "tokenizer.json",
        "tokenizer_config.json",
        "chat_template.jinja",
        "chat_template.json",
    ]

files = {}
base_dir = base_path

# Check if base_path is a file or directory
if os.path.isfile(base_path):
    base_dir = os.path.dirname(base_path)
    # If it's tokenizer.json, include it
    if os.path.basename(base_path) == "tokenizer.json":
        if "tokenizer.json" not in requested_files:
            requested_files.append("tokenizer.json")
Using a list for requested_files can lead to processing duplicate filenames if the client sends them, and membership checking is less efficient than with a set. It's more robust to use a set to automatically handle duplicates and provide faster lookups (O(1) average time complexity).
requested_files = set(request.requested_files)
if not requested_files:
    # Default: read common tokenizer files
    requested_files = {
        "tokenizer.json",
        "tokenizer_config.json",
        "chat_template.jinja",
        "chat_template.json",
    }

files = {}
base_dir = base_path

# Check if base_path is a file or directory
if os.path.isfile(base_path):
    base_dir = os.path.dirname(base_path)
    # If it's tokenizer.json, include it
    if os.path.basename(base_path) == "tokenizer.json":
        requested_files.add("tokenizer.json")
let worker_url = match &config.mode {
    crate::config::types::RoutingMode::Regular { worker_urls }
    | crate::config::types::RoutingMode::OpenAI { worker_urls } => {
        worker_urls
            .iter()
            .find(|url| url.starts_with("grpc://"))
            .ok_or_else(|| {
                "No gRPC worker URL found. Cannot fetch tokenizer from worker".to_string()
            })?
    }
    crate::config::types::RoutingMode::PrefillDecode { prefill_urls, decode_urls, .. } => {
        // Try prefill workers first, then decode workers
        // prefill_urls is Vec<(String, Option<u16>)>, decode_urls is Vec<String>
        prefill_urls
            .iter()
            .map(|(url, _)| url)
            .chain(decode_urls.iter())
            .find(|url| url.starts_with("grpc://"))
            .ok_or_else(|| {
                "No gRPC worker URL found. Cannot fetch tokenizer from worker".to_string()
            })?
    }
};
The logic for finding a gRPC worker URL is duplicated for different routing modes. This can be simplified by first creating an iterator over all worker URLs and then searching for a gRPC URL. This refactoring improves code clarity and reduces repetition.
let worker_urls_iter: Box<dyn Iterator<Item = &String>> = match &config.mode {
    crate::config::types::RoutingMode::Regular { worker_urls }
    | crate::config::types::RoutingMode::OpenAI { worker_urls } => Box::new(worker_urls.iter()),
    crate::config::types::RoutingMode::PrefillDecode { prefill_urls, decode_urls, .. } => {
        Box::new(prefill_urls.iter().map(|(url, _)| url).chain(decode_urls.iter()))
    }
};

let worker_url = worker_urls_iter
    .find(|url| url.starts_with("grpc://"))
    .ok_or_else(|| {
        "No gRPC worker URL found. Cannot fetch tokenizer from worker".to_string()
    })?;
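The suggestion relies on boxing both iterators so the two match arms produce a single concrete type, `Box<dyn Iterator<Item = &String>>`. A tiny standalone illustration of that pattern, unrelated to the router types:

```rust
// Both branches must evaluate to the same type, so each concrete iterator
// is boxed behind `dyn Iterator` before being returned.
fn numbers(use_pairs: bool) -> Box<dyn Iterator<Item = u32>> {
    let pairs = vec![(1u32, 'a'), (2, 'b')];
    let plain = vec![3u32, 4];
    if use_pairs {
        Box::new(pairs.into_iter().map(|(n, _)| n))
    } else {
        Box::new(plain.into_iter())
    }
}

fn main() {
    assert_eq!(numbers(true).collect::<Vec<_>>(), vec![1, 2]);
    assert_eq!(numbers(false).collect::<Vec<_>>(), vec![3, 4]);
}
```

The cost is one heap allocation and dynamic dispatch, which is irrelevant for a one-off lookup performed at startup.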
/// Load chat template from tokenizer config JSON
pub fn load_chat_template_from_config(config_path: &str) -> Result<Option<String>> {
    use tracing::info;
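The hunk above only shows the new function's signature. As a rough sketch of what a loader with this signature typically does (an assumption for illustration, not the PR's implementation; the error type is simplified to `Box<dyn Error>` rather than the crate's own `Result` alias):

```rust
use serde_json::Value;
use std::fs;

// Parse tokenizer_config.json and return its "chat_template" field, if any.
pub fn load_chat_template_from_config(
    config_path: &str,
) -> Result<Option<String>, Box<dyn std::error::Error>> {
    let content = fs::read_to_string(config_path)?;
    let config: Value = serde_json::from_str(&content)?;
    Ok(config
        .get("chat_template")
        .and_then(|v| v.as_str())
        .map(|s| s.to_string()))
}
```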
    tokenizer_config_content: Option<&str>,
    chat_template_content: Option<&str>,
) -> Result<Arc<dyn traits::Tokenizer>> {
    use tempfile::TempDir;
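This signature suggests the tokenizer is built from in-memory contents by writing them into a temporary directory and reusing the existing path-based loading code. A minimal sketch of that materialization step, assuming the `tempfile` crate (the function name `materialize_tokenizer_files` is made up for illustration):

```rust
use std::fs;
use tempfile::TempDir;

// Write the fetched contents to files in a fresh temporary directory so the
// existing file-based tokenizer loader can be pointed at dir.path().
fn materialize_tokenizer_files(
    tokenizer_content: &str,
    tokenizer_config_content: Option<&str>,
    chat_template_content: Option<&str>,
) -> std::io::Result<TempDir> {
    let dir = TempDir::new()?;
    fs::write(dir.path().join("tokenizer.json"), tokenizer_content)?;
    if let Some(cfg) = tokenizer_config_content {
        fs::write(dir.path().join("tokenizer_config.json"), cfg)?;
    }
    if let Some(tpl) = chat_template_content {
        fs::write(dir.path().join("chat_template.jinja"), tpl)?;
    }
    Ok(dir)
}
```

Returning the `TempDir` keeps the directory alive for as long as the caller needs it; dropping it removes the files.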
let final_chat_template = chat_template_path.or_else(|| {
    let config_path = temp_path.join("tokenizer_config.json");
    if config_path.exists() {
        // Try to extract chat_template field from tokenizer_config.json
        if let Ok(content) = fs::read_to_string(&config_path) {
            if let Ok(config_json) = serde_json::from_str::<serde_json::Value>(&content) {
                if let Some(template_str) = config_json.get("chat_template").and_then(|v| v.as_str()) {
                    // Write extracted template to a separate file
                    let extracted_path = temp_path.join("chat_template.jinja");
                    if fs::write(&extracted_path, template_str).is_ok() {
                        info!(
                            config_path = ?config_path,
                            extracted_path = ?extracted_path,
                            temp_dir = ?temp_path,
                            "Extracted chat template from tokenizer_config.json and saved to temporary file"
                        );
                        extracted_path.to_str().map(|s| s.to_string())
                    } else {
                        None
                    }
                } else {
                    None
                }
            } else {
                None
            }
        } else {
            None
        }
    } else {
        None
    }
});
The logic for discovering and extracting the chat template from tokenizer_config.json is deeply nested, which can make it hard to read and maintain. This can be flattened using the ? operator within the closure, which works with Option and makes the code more linear and easier to follow.
let final_chat_template = chat_template_path.or_else(|| {
    let config_path = temp_path.join("tokenizer_config.json");
    if !config_path.exists() {
        return None;
    }

    let content = fs::read_to_string(&config_path).ok()?;
    let config_json: serde_json::Value = serde_json::from_str(&content).ok()?;
    let template_str = config_json.get("chat_template")?.as_str()?;

    let extracted_path = temp_path.join("chat_template.jinja");
    if fs::write(&extracted_path, template_str).is_ok() {
        info!(
            config_path = ?config_path,
            extracted_path = ?extracted_path,
            temp_dir = ?temp_path,
            "Extracted chat template from tokenizer_config.json and saved to temporary file"
        );
        extracted_path.to_str().map(|s| s.to_string())
    } else {
        None
    }
});
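For readers less familiar with the pattern, a tiny self-contained example of using `?` on `Option` inside a closure (it only needs `serde_json` and is unrelated to the router code):

```rust
fn main() {
    let raw = r#"{"chat_template": "{{ messages }}"}"#;
    // The closure returns Option<String>; each `?` short-circuits to None
    // instead of adding another level of nested `if let`.
    let template: Option<String> = (|| {
        let config: serde_json::Value = serde_json::from_str(raw).ok()?;
        Some(config.get("chat_template")?.as_str()?.to_string())
    })();
    assert_eq!(template.as_deref(), Some("{{ messages }}"));
}
```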
Usage:
When the router is started without a locally specified tokenizer path or chat template, it falls back to retrieving the content from the worker.
./sglang-router \
    --worker-urls=grpc://127.0.0.1:8000