
support new qwen3_coder_detector #16744

Merged
Kangyan-Zhou merged 11 commits into sgl-project:main from attack204:feature/gaoji_support_new_qwen3_coder_detector
Jan 19, 2026

Conversation


@attack204 attack204 commented Jan 8, 2026

UT

(screenshot)

E2E Test

(screenshot)

The test results, provided by Zeyu Cui from the Qwen Team @cyente, have been confirmed to meet expectations despite the presence of some jitter.

@gemini-code-assist
Contributor

Summary of Changes

Hello @attack204, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significantly improved Qwen3CoderDetector for handling function calls within Qwen3 models. The core change is a complete overhaul of the parsing logic, moving to a more robust, cursor-based streaming approach that can accurately extract tool names and their arguments, including complex data types, even when input is fragmented. This enhancement is backed by a comprehensive new test framework designed to rigorously validate the parser's behavior across various scenarios.

Highlights

  • New Qwen3 Coder Detector: Introduced a new Qwen3CoderDetector to support function call parsing for Qwen3 models, replacing previous detection logic.
  • Robust Streaming Parser: Implemented a new cursor-based streaming parser in Qwen3CoderDetector for incremental and reliable processing of tool calls and their arguments.
  • Schema-Aware Parameter Conversion: Added functionality to convert parameter values to appropriate Python types (e.g., int, bool, list, dict) based on the provided tool schemas.
  • Comprehensive Test Suite: Included a new, extensive test suite with a dedicated test framework, fixtures, and various test cases to validate both streaming and non-streaming parsing, covering edge cases and complex data structures.
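The schema-aware parameter conversion highlighted above can be sketched as follows. This is a minimal illustration with a hypothetical helper name (`convert_param`), not the PR's actual `_convert_param_value`, which handles many more type aliases and fallbacks:

```python
import json

def convert_param(value: str, schema_type: str):
    """Convert a raw string parameter to a Python value per its schema type.

    Hypothetical helper illustrating the idea; the real implementation
    in the PR covers more aliases (e.g. "uint", "varchar") and fallbacks.
    """
    if value.lower() == "null":
        return None
    if schema_type in ("integer", "int"):
        try:
            return int(value)
        except ValueError:
            return value  # degrade to string on failure
    if schema_type in ("boolean", "bool"):
        return value.lower() == "true"
    if schema_type in ("object", "array"):
        try:
            return json.loads(value)
        except json.JSONDecodeError:
            return value
    return value  # default: keep the string

print(convert_param("42", "integer"))       # 42
print(convert_param("true", "boolean"))     # True
print(convert_param('{"a": 1}', "object"))  # {'a': 1}
```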


Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@attack204
Contributor Author

/tag-and-rerun-ci

@github-actions github-actions bot added the run-ci label Jan 8, 2026
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors the Qwen3CoderDetector to use a more robust cursor-based streaming parser, which is a significant improvement over the previous regex-based implementation. It also introduces a comprehensive suite of tests to validate the new parser's functionality for both streaming and non-streaming scenarios.

While the changes are a positive step forward, I've identified a few critical and high-severity issues that need to be addressed:

  • The Qwen3CoderDetector class is currently not instantiable due to an unimplemented abstract method.
  • There's a debug logger.critical statement that should be removed.
  • The streaming state is not correctly reset, which will lead to issues when the detector instance is reused.

Additionally, there are some medium-severity issues related to maintainability, such as comments in Chinese and the use of bare except blocks. Please see the detailed comments for suggestions on how to resolve these issues.

Comment on lines +16 to +18
logger.critical(
    f"[xixi.yjx] PARSER: 1231 Try to port from vLLM parser: Using cursor-based streaming parser."
)

high

This logger.critical call appears to be a debug statement. Such statements should be removed from production code to avoid polluting logs with irrelevant information, especially at a critical level.

Comment on lines +59 to +69
def _reset_streaming_state(self):
    """Reset internal streaming cursors."""
    self.parsed_pos = 0
    self.current_tool_param_count = 0
    self.json_started = False
    self.is_inside_tool_call = False  # [FIX] Reset state

    # Base class state reset is handled by base class logic mostly,
    # but we ensure our cursor aligns with buffer resets.
    if hasattr(self, "_buffer") and not self._buffer:
        self.parsed_pos = 0

high

The _reset_streaming_state method is incomplete. It fails to reset state variables inherited from BaseFormatDetector, such as current_tool_id and current_tool_name_sent. Since these are modified during parsing, not resetting them will cause state to leak between streaming sessions on the same detector instance, leading to incorrect behavior. The method should reset all streaming-related state from both the base and child class.

    def _reset_streaming_state(self):
        """Reset internal streaming cursors and all streaming state."""
        # Reset state for this class
        self.parsed_pos = 0
        self.current_tool_param_count = 0
        self.json_started = False
        self.is_inside_tool_call = False
        self.current_func_name = None

        # Reset state from BaseFormatDetector
        # Note: _buffer is also part of state that should be reset.
        self._buffer = ""
        self.prev_tool_call_arr = []
        self.current_tool_id = -1
        self.current_tool_name_sent = False
        self.streamed_args_for_tool = []

Comment on lines +43 to +58
# Streaming State
# Overrides the parent class's _buffer management, or works alongside it.
# SGLang BaseFormatDetector usually has its own _buffer, but we manage it explicitly here to keep the logic clear
if not hasattr(self, "_buffer"):
    self._buffer = ""

# Index of the next character in the buffer to be processed
self.parsed_pos = 0
# Parameter count inside the current tool, used to decide whether to add a comma
self.current_tool_param_count = 0
# Flag indicating whether the current tool has already sent '{'
self.json_started = False

# [FIX] New state flag: marks whether we are inside a tool_call block
self.is_inside_tool_call = False


medium

This __init__ method has a couple of issues:

  1. The instance attribute self.current_func_name is used in parse_streaming_increment but it's not initialized here. This could lead to an AttributeError. It should be initialized, for example to None.
  2. There are several comments in Chinese (e.g., lines 44-45, 49, 51, 53, 56). For consistency and maintainability in this codebase, they should be translated to English.
  3. The comment [FIX] on line 56 should be removed.

Comment on lines +98 to +179
def _convert_param_value(
    self, param_value: str, param_name: str, param_config: dict, func_name: str
) -> Any:
    """Convert parameter value based on its type in the schema."""
    # Handle null value for any type
    if param_value.lower() == "null":
        return None

    if param_name not in param_config:
        if param_config != {}:
            logger.warning(
                f"Parsed parameter '{param_name}' is not defined in the tool "
                f"parameters for tool '{func_name}', directly returning the string value."
            )
        return param_value

    if (
        isinstance(param_config[param_name], dict)
        and "type" in param_config[param_name]
    ):
        param_type = str(param_config[param_name]["type"]).strip().lower()
    else:
        param_type = "string"
    if param_type in ["string", "str", "text", "varchar", "char", "enum"]:
        return param_value
    elif (
        param_type.startswith("int")
        or param_type.startswith("uint")
        or param_type.startswith("long")
        or param_type.startswith("short")
        or param_type.startswith("unsigned")
    ):
        try:
            param_value = int(param_value)
        except:
            logger.warning(
                f"Parsed value '{param_value}' of parameter '{param_name}' is not an integer in tool "
                f"'{func_name}', degenerating to string."
            )
        return param_value
    elif param_type.startswith("num") or param_type.startswith("float"):
        try:
            float_param_value = float(param_value)
            param_value = (
                float_param_value
                if float_param_value - int(float_param_value) != 0
                else int(float_param_value)
            )
        except:
            logger.warning(
                f"Parsed value '{param_value}' of parameter '{param_name}' is not a float in tool "
                f"'{func_name}', degenerating to string."
            )
        return param_value
    elif param_type in ["boolean", "bool", "binary"]:
        param_value = param_value.lower()
        if param_value not in ["true", "false"]:
            logger.warning(
                f"Parsed value '{param_value}' of parameter '{param_name}' is not a boolean (`true` or `false`) in tool '{func_name}', degenerating to false."
            )
        return param_value == "true"
    else:
        if (
            param_type in ["object", "array", "arr"]
            or param_type.startswith("dict")
            or param_type.startswith("list")
        ):
            try:
                param_value = json.loads(param_value)
                return param_value
            except:
                logger.warning(
                    f"Parsed value '{param_value}' of parameter '{param_name}' cannot be parsed with json.loads in tool "
                    f"'{func_name}', will try other methods to parse it."
                )
        try:
            param_value = ast.literal_eval(param_value)  # safer
        except:
            logger.warning(
                f"Parsed value '{param_value}' of parameter '{param_name}' cannot be converted via Python `ast.literal_eval()` in tool '{func_name}', degenerating to string."
            )
        return param_value

medium

The try...except blocks in this method use a bare except:, which is too broad and can mask unexpected errors. It's better to catch specific exceptions that you expect to occur during type conversion, such as ValueError for int() and float(), and json.JSONDecodeError or SyntaxError for json.loads() and ast.literal_eval(). This makes the error handling more robust and predictable.
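The reviewer's suggestion can be sketched as follows, using a hypothetical helper `parse_structured` to show which specific exceptions each parser raises:

```python
import ast
import json
import logging

logger = logging.getLogger(__name__)

def parse_structured(value: str):
    """Parse a string as JSON first, then as a Python literal.

    Sketch of the review suggestion: catch the specific exceptions each
    parser raises instead of using a bare `except:`.
    """
    try:
        return json.loads(value)
    except json.JSONDecodeError:
        logger.warning("json.loads failed; falling back to ast.literal_eval")
    try:
        return ast.literal_eval(value)
    except (ValueError, SyntaxError):
        logger.warning("ast.literal_eval failed; returning the raw string")
    return value

print(parse_structured("[1, 2, 3]"))   # [1, 2, 3]
print(parse_structured("(1, 2)"))      # (1, 2)
print(parse_structured("not a list"))  # not a list
```

Note that `ast.literal_eval` raises `ValueError` for malformed literals and `SyntaxError` for unparsable input, so both are caught; system-exiting exceptions such as `KeyboardInterrupt` still propagate.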

Comment on lines +50 to +58
"""
运行单个测试用例

Args:
test_name: 测试名称
response_text: 响应文本
mode: 流式生成模式
expected: 期望的解析结果,包含 'text' 和 'tools' 字段
"""

medium

The docstrings and comments in this test file are in Chinese. To maintain consistency and improve accessibility for all contributors, please translate them to English.

Comment on lines +49 to +63
"""
运行单个流式测试

Args:
test_name: 测试名称
response_text: 完整的响应文本
mode: 流式生成模式 ('char', 'atomic_tags', 或其他)
tools: 工具列表,如果为 None 则使用默认工具
verbose: 是否打印详细日志,如果为 None 则使用实例属性 self.verbose
compare_with_non_streaming: 是否与非流式解析结果对比,如果为 None 则根据 parser_mode 自动决定
expected: 期望的解析结果,包含 'text' 和 'tools' 字段

Returns:
StreamingTestResult: 测试结果
"""

medium

The docstrings and comments in this test framework file are in Chinese. For better maintainability and to ensure the code is accessible to all contributors, please translate these to English.

@attack204 attack204 force-pushed the feature/gaoji_support_new_qwen3_coder_detector branch 5 times, most recently from ba3597a to fa40bac on January 10, 2026
@attack204
Contributor Author

/rerun-failed-ci

@zhaochenyang20
Collaborator

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant refactoring of the Qwen3CoderDetector to support the new Qwen3 Coder model's tool call format. The implementation is completely rewritten, moving from a simpler regex-based parser to a more robust, cursor-based streaming parser. This new approach improves handling of various edge cases and incremental JSON construction. Additionally, it adds sophisticated parameter type conversion based on the tool's schema. The accompanying tests have also been completely rewritten to provide comprehensive coverage for the new implementation, including basic functionality, streaming, parameter types, and edge cases.

The changes are a great improvement in terms of robustness and functionality. I have a few suggestions to further improve the code quality, mainly around exception handling and a small bug in type conversion logic.

return raw


class Qwen3CoderDetector(BaseFormatDetector):

medium

The docstring for Qwen3CoderDetector was removed. The new implementation is significantly more complex than the previous one, involving a cursor-based streaming parser and type conversion logic. Adding a new, detailed docstring explaining the class's purpose, its state variables, and the expected tool call format would greatly improve maintainability. For example:

    """
    Detector for Qwen3 Coder models.

    This detector uses a cursor-based streaming parser to handle the XML-like
    tool call format. It supports incremental parsing of tool calls and converts
    parameter values to their appropriate types based on the provided tool schema.

    Assumed format:
        <tool_call>
        <function=function_name>
        <parameter=param_name1>value1</parameter>
        <parameter=param_name2>value2</parameter>
        </function>
        </tool_call>
    """

Comment on lines +129 to +133
except:
logger.warning(
f"Parsed value '{param_value}' of parameter '{param_name}' is not an integer in tool "
f"'{func_name}', degenerating to string."
)

medium

Using a bare except: is generally discouraged as it can catch unexpected exceptions like SystemExit or KeyboardInterrupt, making the program harder to debug and control. It's better to catch Exception if you want to catch all general exceptions. This applies to the other try...except: blocks in this function as well (lines 143, 165, and 172).

Suggested change
except:
    logger.warning(
        f"Parsed value '{param_value}' of parameter '{param_name}' is not an integer in tool "
        f"'{func_name}', degenerating to string."
    )
except Exception:
    logger.warning(
        f"Parsed value '{param_value}' of parameter '{param_name}' is not an integer in tool "
        f"'{func_name}', degenerating to string."
    )

Comment on lines +137 to +142
maybe_convert = (
    False if "." in param_value or "e" in param_value.lower() else True
)
param_value: float = float(param_value)
if maybe_convert and param_value.is_integer():
    param_value = int(param_value)

medium

The logic to convert a float-like string to an integer seems to have a flaw. The maybe_convert flag prevents conversion for strings containing a . like "3.0", which will remain a float instead of being converted to an integer. This is likely not the intended behavior. You can simplify the logic to correctly handle all cases where a float represents a whole number.

Suggested change
maybe_convert = (
    False if "." in param_value or "e" in param_value.lower() else True
)
param_value: float = float(param_value)
if maybe_convert and param_value.is_integer():
    param_value = int(param_value)
val = float(param_value)
if val.is_integer():
    param_value = int(val)
else:
    param_value = val
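The behavior of the suggested fix can be demonstrated with a small sketch (hypothetical helper `normalize_number`): whole-number floats collapse to `int` regardless of how the string was formatted, while true fractions stay `float`.

```python
def normalize_number(raw: str):
    """Parse a numeric string, collapsing whole-number floats to int.

    Mirrors the suggested fix: "3.0" and "3" both become the integer 3,
    while "3.5" stays a float. Raises ValueError on non-numeric input,
    which the caller is expected to handle.
    """
    val = float(raw)
    return int(val) if val.is_integer() else val

print(normalize_number("3"))    # 3
print(normalize_number("3.0"))  # 3
print(normalize_number("3.5"))  # 3.5
print(normalize_number("1e2"))  # 100
```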

Comment on lines +143 to +147
except:
    logger.warning(
        f"Parsed value '{param_value}' of parameter '{param_name}' is not a float in tool "
        f"'{func_name}', degenerating to string."
    )

medium

Using a bare except: is generally discouraged as it can hide unexpected errors. It's better to catch Exception to avoid catching system-exiting exceptions.

Suggested change
except:
    logger.warning(
        f"Parsed value '{param_value}' of parameter '{param_name}' is not a float in tool "
        f"'{func_name}', degenerating to string."
    )
except Exception:
    logger.warning(
        f"Parsed value '{param_value}' of parameter '{param_name}' is not a float in tool "
        f"'{func_name}', degenerating to string."
    )

Comment on lines +165 to +169
except:
    logger.warning(
        f"Parsed value '{param_value}' of parameter '{param_name}' cannot be parsed with json.loads in tool "
        f"'{func_name}', will try other methods to parse it."
    )

medium

Using a bare except: can catch more exceptions than intended. Please specify the exception type, or use except Exception: for a general catch.

Suggested change
except:
    logger.warning(
        f"Parsed value '{param_value}' of parameter '{param_name}' cannot be parsed with json.loads in tool "
        f"'{func_name}', will try other methods to parse it."
    )
except Exception:
    logger.warning(
        f"Parsed value '{param_value}' of parameter '{param_name}' cannot be parsed with json.loads in tool "
        f"'{func_name}', will try other methods to parse it."
    )

Comment on lines +172 to +175
except:
    logger.warning(
        f"Parsed value '{param_value}' of parameter '{param_name}' cannot be converted via Python `ast.literal_eval()` in tool '{func_name}', degenerating to string."
    )

medium

A bare except: is too broad. It's recommended to catch Exception to avoid suppressing critical system-level exceptions.

Suggested change
except:
    logger.warning(
        f"Parsed value '{param_value}' of parameter '{param_name}' cannot be converted via Python `ast.literal_eval()` in tool '{func_name}', degenerating to string."
    )
except Exception:
    logger.warning(
        f"Parsed value '{param_value}' of parameter '{param_name}' cannot be converted via Python `ast.literal_eval()` in tool '{func_name}', degenerating to string."
    )

Comment on lines +56 to +65
def _reset_streaming_state(self):
    """Reset internal streaming cursors."""
    self.parsed_pos = 0
    self.current_tool_param_count = 0
    self.json_started = False
    self.is_inside_tool_call = False  # [FIX] Reset state

    # Base class state reset is handled by base class logic mostly,
    # but we ensure our cursor aligns with buffer resets.
    if hasattr(self, "_buffer") and not self._buffer:
Collaborator

If a stream is interrupted mid-generation (user disconnects), self.current_func_name might retain a stale value. When the detector is reused, it could corrupt the next request.

Please ensure self.current_func_name = None is explicitly set in _reset_streaming_state.

# -------------------------------------------------------
if current_slice.startswith(self.tool_call_start_token):
    self.parsed_pos += len(self.tool_call_start_token)
    self.is_inside_tool_call = True  # [FIX] Enter tool call region
Collaborator

Please clean up these comments (the [FIX] markers and the separator line).

Comment on lines +43 to +50
if not hasattr(self, "_buffer"):
    self._buffer = ""

# Index pointing to the next character to be processed in buffer
self.parsed_pos = 0
# Parameter count inside the current tool being processed, used to determine whether to add comma
self.current_tool_param_count = 0
# Flag indicating whether current tool has already sent '{'
Collaborator

In our style guide, we try to avoid hasattr as much as possible, as in these two examples:

  • Avoid Dynamic Attributes: Minimize the use of getattr or setattr. Code should be explicit for better traceability. (Example, Example 2)

Collaborator

class Qwen3CoderDetector(BaseFormatDetector):
    def __init__(self):
        super().__init__()
        
        # Explicitly define all attributes with Type Hints
        # No 'hasattr' checks needed. We control the lifecycle here.
        self._buffer: str = "" 
        self.parsed_pos: int = 0
        self.current_tool_param_count: int = 0
        self.json_started: bool = False
        self.is_inside_tool_call: bool = False
        
        # Initialize attributes that were missing in the original PR
        self.current_func_name: Optional[str] = None

@attack204 attack204 force-pushed the feature/gaoji_support_new_qwen3_coder_detector branch from be3a864 to bf57b75 on January 11, 2026
@attack204
Contributor Author

/rerun-failed-ci

@attack204 attack204 force-pushed the feature/gaoji_support_new_qwen3_coder_detector branch from f55b7fe to c41c78b on January 18, 2026
@zhaochenyang20
Collaborator

/rerun-failed-ci

@zhaochenyang20
Collaborator

/rerun-failed-ci

@zhaochenyang20 zhaochenyang20 left a comment

You are great

@Kangyan-Zhou Kangyan-Zhou merged commit 858a4d6 into sgl-project:main Jan 19, 2026
226 of 241 checks passed
DotSlash-A pushed a commit to DotSlash-A/sglang that referenced this pull request Jan 19, 2026