Merged
Commits
70 commits
1f0e24c
Upgrade transformers from 5.3.0 to 5.4.0
JustinTong0323 Mar 27, 2026
926cbda
Fix rope_parameters validation error for unregistered model types
JustinTong0323 Mar 27, 2026
2106153
Add easydict dependency for DeepSeek-OCR remote model code
JustinTong0323 Mar 27, 2026
a225590
Patch LlamaFlashAttention2 at import time for remote code compat
JustinTong0323 Mar 27, 2026
b80d563
Add addict dependency to pyproject_xpu.toml for DeepSeek-OCR
JustinTong0323 Mar 28, 2026
3714fc0
Patch BaseImageProcessor.__call__ for remote code compat
JustinTong0323 Mar 28, 2026
d4ced8c
Fix lint and TokenizersBackend fallback for transformers 5.4.0
JustinTong0323 Mar 28, 2026
be30493
Fix lint and improve TokenizersBackend fallback
JustinTong0323 Mar 28, 2026
76d74fe
Refactor transformers 5.4.0 compat patches into dedicated module
JustinTong0323 Mar 28, 2026
427a055
Extract helpers: _is_mistral_model, _ensure_sub_configs
JustinTong0323 Mar 28, 2026
9de7636
Split hf_transformers_utils.py into hf_transformers/ subpackage
JustinTong0323 Mar 28, 2026
6cbce29
Fix PR review findings: missing re-exports, version bug, logging
JustinTong0323 Mar 28, 2026
991929e
Update rope_parameters TODO with upstream fix reference (#45049)
JustinTong0323 Mar 28, 2026
58c6b92
Move mistral_utils.py into hf_transformers/ subpackage
JustinTong0323 Mar 28, 2026
3d72d60
Move Mistral helpers into mistral_utils.py for cohesion
JustinTong0323 Mar 28, 2026
88a181a
Remove unnecessary comments from hf_transformers package
JustinTong0323 Mar 28, 2026
d32d4c6
Refactor internals of hf_transformers package files
JustinTong0323 Mar 28, 2026
68fbd19
Improve error handling and fix docstring typo in hf_transformers
JustinTong0323 Mar 28, 2026
e2e97c0
Fix lint: black formatting in mistral_utils.py and tokenizer.py
JustinTong0323 Mar 28, 2026
6fa802f
Fix: don't escalate trust_remote_code in TokenizersBackend retry
JustinTong0323 Mar 28, 2026
617a93d
Simplify TokenizersBackend retry to single use_fast=False attempt
JustinTong0323 Mar 28, 2026
509fdf9
Absorb PR #21586: patch is_base_mistral in CI to avoid HF 429 rate li…
JustinTong0323 Mar 28, 2026
eefe2e3
Merge remote-tracking branch 'upstream/main' into upgrade/transformer…
JustinTong0323 Mar 28, 2026
1741525
Fix PR review findings: error handling, dead code, and code quality
JustinTong0323 Mar 28, 2026
0a39854
Revert TokenizersBackend raise to warning for CI compatibility
JustinTong0323 Mar 28, 2026
53bceac
Fix TokenizersBackend in processor: reload tokenizer via get_tokenizer
JustinTong0323 Mar 31, 2026
7867f01
Merge branch 'main' into upgrade/transformers-5.4.0
JustinTong0323 Mar 31, 2026
39ec3d3
Fix TokenizersBackend for models with unmapped model_type
JustinTong0323 Apr 1, 2026
d4efd5e
Trigger CI
JustinTong0323 Apr 1, 2026
67f8298
Merge remote-tracking branch 'upstream/main' into upgrade/transformer…
JustinTong0323 Apr 1, 2026
6df79f4
Merge branch 'main' into upgrade/transformers-5.4.0
JustinTong0323 Apr 1, 2026
7de1261
Fix error handling and code quality in hf_transformers package
JustinTong0323 Apr 1, 2026
b01b4b3
Merge branch 'main' into upgrade/transformers-5.4.0
JustinTong0323 Apr 1, 2026
f341d7f
Simplify hf_transformers: dedup calls, extract constant, use shared l…
JustinTong0323 Apr 1, 2026
749714d
Move mistral/pixtral helpers to mistral_utils, import directly
JustinTong0323 Apr 1, 2026
42aea7d
Fix isort: remove extra blank line in __init__.py
JustinTong0323 Apr 1, 2026
f3a35a5
Merge branch 'main' into upgrade/transformers-5.4.0
JustinTong0323 Apr 1, 2026
ad4cffc
Downgrade TokenizersBackend trust_remote_code error to warning
JustinTong0323 Apr 1, 2026
14a17d4
Trigger CI rerun
JustinTong0323 Apr 1, 2026
350a703
Fix TokenizersBackend for models with auto_map custom tokenizers
JustinTong0323 Apr 1, 2026
58dd64b
Fix transformers 5.4 compat: CUDA tensor numpy + nemotron_h pattern
JustinTong0323 Apr 2, 2026
b1e7f07
Simplify: dedup config read, hoist import, narrow except, fix docstring
JustinTong0323 Apr 2, 2026
86b8e6b
Merge branch 'main' into upgrade/transformers-5.4.0
JustinTong0323 Apr 2, 2026
6411f96
Merge branch 'main' into upgrade/transformers-5.4.0
JustinTong0323 Apr 6, 2026
cc81f62
Merge branch 'main' into upgrade/transformers-5.4.0
JustinTong0323 Apr 8, 2026
140ee38
Port Gemma 4 config remapping from upstream to config.py
JustinTong0323 Apr 8, 2026
9f6f59f
Port upstream changes to hf_transformers subpackage
JustinTong0323 Apr 8, 2026
16a34c0
Move patch_mistral_common_tokenizer to mistral_utils.py
JustinTong0323 Apr 8, 2026
6a0c664
Fix error handling and add unit tests for hf_transformers subpackage
JustinTong0323 Apr 8, 2026
d03b8ff
Fix revision kwarg bug, dead code, comment, and add compat patch tests
JustinTong0323 Apr 8, 2026
05114eb
Add test_multi_item_scoring.py to not_in_ci in old test runner
JustinTong0323 Apr 8, 2026
9a7ed0c
Upgrade transformers from 5.4.0 to 5.5.0
JustinTong0323 Apr 9, 2026
7c259d9
Restore ValueError handler for deepseek_v32 in get_config
JustinTong0323 Apr 10, 2026
226cd4d
Bump transformers from 5.5.0 to 5.5.3
JustinTong0323 Apr 10, 2026
647c4d6
Refactor get_tokenizer into focused helpers and move Mistral code to …
JustinTong0323 Apr 10, 2026
0b9017e
Trigger CI rerun after maintenance window
JustinTong0323 Apr 10, 2026
6f59c75
Merge branch 'main' into upgrade/transformers-5.4.0
JustinTong0323 Apr 10, 2026
5cfdb0a
Trigger fresh NVIDIA CI run
JustinTong0323 Apr 10, 2026
66a21d1
Merge branch 'main' into upgrade/transformers-5.4.0
JustinTong0323 Apr 11, 2026
84797a2
Merge branch 'main' into upgrade/transformers-5.4.0
JustinTong0323 Apr 11, 2026
d9aceaa
Remove stale test_multi_item_scoring.py reference from run_suite.py
JustinTong0323 Apr 11, 2026
c191d81
Bump mistral_common>=1.11.0 for transformers 5.5.3 compatibility
JustinTong0323 Apr 12, 2026
22d00c1
Fix Qwen3-VL-MoE garbled output with transformers 5.5.3
JustinTong0323 Apr 12, 2026
7f25c2e
Fix InternVL test crash with transformers 5.5.3
JustinTong0323 Apr 12, 2026
cf28335
Fix MiniCPM-V-4 test with transformers 5.5.3 TokenizersBackend
JustinTong0323 Apr 12, 2026
52013c2
Fix Step-3.5-Flash config validation with transformers 5.5.3
JustinTong0323 Apr 13, 2026
adccb87
Merge branch 'main' into upgrade/transformers-5.4.0
JustinTong0323 Apr 14, 2026
ffb090b
Fix Qwen3_5Moe config classes with transformers 5.5.3
JustinTong0323 Apr 15, 2026
c15583e
Bump transformers 5.5.3 -> 5.5.4
JustinTong0323 Apr 15, 2026
8620596
Trigger CI
JustinTong0323 Apr 15, 2026
5 changes: 3 additions & 2 deletions python/pyproject.toml
@@ -35,6 +35,7 @@ dependencies = [
     "modelscope",
     "msgspec",
     "ninja",
+    "easydict",  # Required by remote model code (e.g. DeepSeek-OCR) loaded via trust_remote_code; validated by transformers 5.4+ check_imports
     "numpy",
     "nvidia-cutlass-dsl>=4.4.1",
     "nvidia-ml-py",
@@ -70,8 +71,8 @@ dependencies = [
     "av ; sys_platform == 'linux' and (platform_machine == 'aarch64' or platform_machine == 'arm64' or platform_machine == 'armv7l')",
     "torchvision",
     "tqdm",
-    "mistral_common>=1.9.0",
-    "transformers==5.3.0",
+    "mistral_common>=1.11.0",
+    "transformers==5.5.4",
     "uvicorn",
     "uvloop",
     "watchfiles",
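Why easydict lands in every pyproject variant: transformers 5.4+ runs a check_imports pass over remote modeling code pulled in via trust_remote_code and refuses to execute a file whose top-level imports are not installed, and DeepSeek-OCR's remote code imports easydict. A minimal sketch of that failure mode follows; the helper below is illustrative only, not transformers' actual implementation.

import importlib.util

def check_remote_code_imports(source: str) -> None:
    """Illustrative stand-in for transformers' check_imports: collect the
    top-level module names a remote modeling file imports and fail early
    if any of them are not installed."""
    needed = set()
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith(("import ", "from ")):
            needed.add(stripped.split()[1].split(".")[0])
    missing = sorted(m for m in needed if importlib.util.find_spec(m) is None)
    if missing:
        raise ImportError(
            "This modeling file requires packages that were not found in your "
            f"environment: {', '.join(missing)}."
        )

# DeepSeek-OCR's remote code does `from easydict import EasyDict`; without the
# new "easydict" dependency this check raises before the model ever loads.
check_remote_code_imports("from easydict import EasyDict")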
5 changes: 3 additions & 2 deletions python/pyproject_cpu.toml
@@ -31,6 +31,7 @@ dependencies = [
     "llguidance>=0.7.11,<0.8.0",
     "modelscope",
     "msgspec",
+    "easydict",
     "ninja",
     "numpy",
     "openai-harmony==0.0.4",
@@ -60,8 +61,8 @@ dependencies = [
     "torchaudio==2.9.0",
     "torchvision==0.24.0",
     "tqdm",
-    "mistral_common>=1.9.0",
-    "transformers==5.3.0",
+    "mistral_common>=1.11.0",
+    "transformers==5.5.4",
     "triton==3.5.0",
     "uvicorn",
     "uvloop",
5 changes: 3 additions & 2 deletions python/pyproject_npu.toml
@@ -25,6 +25,7 @@ dependencies = [
     "datasets",
     "einops",
     "fastapi",
+    "easydict",
     "gguf",
     "hf_transfer",
     "huggingface_hub",
@@ -57,8 +58,8 @@ dependencies = [
     "timm==1.0.16",
     "torchao==0.9.0",
     "tqdm",
-    "mistral_common>=1.9.0",
-    "transformers==5.3.0",
+    "mistral_common>=1.11.0",
+    "transformers==5.5.4",
     "uvicorn",
     "uvloop",
     "xgrammar==0.1.32",
5 changes: 3 additions & 2 deletions python/pyproject_other.toml
@@ -25,6 +25,7 @@ runtime_common = [
     "build",
     "compressed-tensors",
     "datasets",
+    "easydict",
     "einops",
     "fastapi",
     "gguf",
@@ -57,8 +58,8 @@ runtime_common = [
     "timm==1.0.16",
     "torchao==0.9.0",
     "tqdm",
-    "mistral_common>=1.9.0",
-    "transformers==5.3.0",
+    "mistral_common>=1.11.0",
+    "transformers==5.5.4",
     "uvicorn",
     "uvloop",
     "xgrammar==0.1.32",
6 changes: 4 additions & 2 deletions python/pyproject_xpu.toml
@@ -27,7 +27,9 @@ dependencies = [
     "blobfile==3.0.0",
     "build",
     "compressed-tensors",
+    "addict",
     "datasets",
+    "easydict",
     "einops",
     "fastapi",
     "gguf",
@@ -60,8 +62,8 @@ dependencies = [
     "timm==1.0.16",
     "torchao==0.9.0+xpu",
     "tqdm",
-    "mistral_common>=1.9.0",
-    "transformers==5.3.0",
+    "mistral_common>=1.11.0",
+    "transformers==5.5.4",
     "uvicorn",
     "uvloop",
     # "xgrammar==0.1.24", xgrammar depends on CUDA PyTorch and Triton only
16 changes: 16 additions & 0 deletions python/sglang/srt/configs/qwen3_5.py
@@ -8,6 +8,9 @@ class Qwen3_5VisionConfig(Qwen3VLVisionConfig):
     model_type = "qwen3_5"
     base_config_key = "vision_config"

+    def __init__(self, **kwargs):
+        super().__init__(**kwargs)
+

 class Qwen3_5TextConfig(Qwen3NextConfig):
     model_type = "qwen3_5_text"
@@ -109,14 +112,27 @@ def __init__(
 class Qwen3_5MoeVisionConfig(Qwen3_5VisionConfig):
     model_type = "qwen3_5_moe"

+    def __init__(self, **kwargs):
+        super().__init__(**kwargs)
+

 class Qwen3_5MoeTextConfig(Qwen3_5TextConfig):
     model_type = "qwen3_5_moe_text"

+    def __init__(self, **kwargs):
+        super().__init__(**kwargs)
+

+# All Moe variant classes need explicit __init__ because the kw_only=True
+# dataclass decorator in transformers v5.5.3+ auto-generates __init__ for
+# subclasses, bypassing parent __init__ methods that set up attributes
+# (e.g. norm_topk_prob, rope_scaling) and convert sub-config dicts to objects.
 class Qwen3_5MoeConfig(Qwen3_5Config):
     model_type = "qwen3_5_moe"
     sub_configs = {
         "vision_config": Qwen3_5MoeVisionConfig,
         "text_config": Qwen3_5MoeTextConfig,
     }
+
+    def __init__(self, **kwargs):
+        super().__init__(**kwargs)
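The comment above is the crux of the Qwen3.5 changes: once the parent config classes go through a kw_only dataclass-style decorator (transformers v5.5.3+), a subclass that declares nothing gets a generated __init__ that never runs the hand-written parent __init__. A standard-library sketch of the same trap, and of why a trivial explicit __init__ fixes it (class and attribute names here are made up for illustration):

from dataclasses import dataclass

class ParentConfig:
    def __init__(self, **kwargs):
        # The parent does real setup work, e.g. deriving attributes and
        # converting nested dicts into config objects.
        self.norm_topk_prob = kwargs.pop("norm_topk_prob", True)
        for key, value in kwargs.items():
            setattr(self, key, value)

@dataclass(kw_only=True)  # analogous to the transformers v5.5.3+ decorator
class BrokenChildConfig(ParentConfig):
    model_type: str = "broken_child"

print(hasattr(BrokenChildConfig(), "norm_topk_prob"))  # False: parent __init__ bypassed

@dataclass(kw_only=True)
class FixedChildConfig(ParentConfig):
    model_type: str = "fixed_child"

    def __init__(self, **kwargs):
        # dataclass skips generating __init__ when the class defines one,
        # so the parent's setup runs again -- the same trick as the PR.
        super().__init__(**kwargs)

print(hasattr(FixedChildConfig(), "norm_topk_prob"))  # True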
9 changes: 9 additions & 0 deletions python/sglang/srt/configs/step3p5.py
@@ -94,4 +94,13 @@ def __init__(
         self.moe_layers_enum = moe_layers_enum
         self.layer_types = layer_types
         self.sliding_window = sliding_window
+        # The upstream Step-3.5-Flash config has layer_types with 48 entries
+        # but num_hidden_layers=45. The extra 3 are for MTP/nextn predict
+        # layers (indices 45-47) used by Step3p5DecoderLayer during EAGLE
+        # speculative decoding. Temporarily align num_hidden_layers to pass
+        # the transformers v5.5.3+ validator, then restore the real value.
+        real_num_hidden_layers = self.num_hidden_layers
+        if layer_types is not None and len(layer_types) != self.num_hidden_layers:
+            self.num_hidden_layers = len(layer_types)
         super().__init__(**kwargs)
+        self.num_hidden_layers = real_num_hidden_layers
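For readers unfamiliar with the validator being dodged here: newer transformers base configs cross-check len(layer_types) against num_hidden_layers during __init__, so a config whose layer_types also covers MTP/nextn layers fails validation. The stand-in below only paraphrases that check (the real transformers code differs) to show why the temporary override and restore around super().__init__ works:

class StrictBaseConfig:
    """Hypothetical stand-in for the transformers >=5.5.3 layer_types check."""

    def __init__(self, **kwargs):
        layer_types = getattr(self, "layer_types", None)
        if layer_types is not None and len(layer_types) != self.num_hidden_layers:
            raise ValueError(
                f"layer_types has {len(layer_types)} entries, expected "
                f"{self.num_hidden_layers}"
            )

class Step3p5LikeConfig(StrictBaseConfig):
    def __init__(self, num_hidden_layers=45, layer_types=None, **kwargs):
        self.num_hidden_layers = num_hidden_layers
        # 45 decoder layers + 3 MTP/nextn predict layers, as in Step-3.5-Flash.
        self.layer_types = layer_types or ["full_attention"] * 48
        real_num_hidden_layers = self.num_hidden_layers
        if len(self.layer_types) != self.num_hidden_layers:
            self.num_hidden_layers = len(self.layer_types)  # satisfy the validator
        super().__init__(**kwargs)
        self.num_hidden_layers = real_num_hidden_layers  # restore the real value

cfg = Step3p5LikeConfig()
assert cfg.num_hidden_layers == 45 and len(cfg.layer_types) == 48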
8 changes: 7 additions & 1 deletion python/sglang/srt/models/qwen3_vl.py
@@ -1091,9 +1091,15 @@ def __init__(
         if language_model_cls is Qwen3LLMModel:
             self.config: Qwen3VLConfig = config  # for qwen3-vl
         else:
-            self.config = config.text_config  # for qwen3-omni
+            self.config = config.text_config  # for qwen3-omni / qwen3-vl-moe
         self.config.encoder_only = getattr(config, "encoder_only", False)
         self.config.language_only = getattr(config, "language_only", False)
+        # Propagate tie_word_embeddings from parent config. In transformers
+        # v5.5.3+, Qwen3VLMoeTextConfig sets tie_word_embeddings=True by
+        # default but the actual model checkpoint has a separate lm_head.
+        # The parent Qwen3VLMoeConfig correctly has tie_word_embeddings=False.
+        if hasattr(config, "tie_word_embeddings"):
+            self.config.tie_word_embeddings = config.tie_word_embeddings

         if not hasattr(config, "encoder_only") or not config.encoder_only:
             self.model = language_model_cls(
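The tie_word_embeddings comment describes a config-level mismatch; the toy snippet below (not SGLang's actual loader, just an illustration of the mechanism) shows why the wrong flag can produce garbled output: once the head is tied, a checkpoint's independently trained lm_head weight can no longer coexist with the embedding matrix.

import torch.nn as nn

class TinyLM(nn.Module):
    def __init__(self, vocab_size=16, hidden_size=8, tie_word_embeddings=False):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, hidden_size)
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)
        if tie_word_embeddings:
            # Tied: the output projection *is* the embedding matrix.
            self.lm_head.weight = self.embed_tokens.weight

# Checkpoint trained with an independent lm_head, as Qwen3-VL-MoE is.
checkpoint = TinyLM(tie_word_embeddings=False).state_dict()

tied = TinyLM(tie_word_embeddings=True)
tied.load_state_dict(checkpoint)
# Both checkpoint keys now write into the same underlying tensor, so whichever
# is copied last wins, and either the embeddings or the output projection ends
# up with the wrong values -- one way a wrongly tied head leads to garbled
# generations until the parent config's tie_word_embeddings=False is propagated.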
67 changes: 67 additions & 0 deletions python/sglang/srt/utils/hf_transformers/__init__.py
@@ -0,0 +1,67 @@
# Copyright 2023-2024 SGLang Team
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Hugging Face Transformers utilities.

This package provides HF Transformers helpers, split into submodules
(common, compat, config, tokenizer, processor, mistral_utils).
All public symbols are re-exported here for convenience. The old import
path ``sglang.srt.utils.hf_transformers_utils`` is preserved by a
separate shim module.
"""

from .compat import apply_all as _apply_compat

_apply_compat()

from .common import ( # noqa: E402
    CONTEXT_LENGTH_KEYS,
    AutoConfig,
    attach_additional_stop_token_ids,
    check_gguf_file,
    download_from_hf,
    get_context_length,
    get_generation_config,
    get_hf_text_config,
    get_rope_config,
    get_sparse_attention_config,
    get_tokenizer_from_processor,
)
from .compat import normalize_rope_scaling_compat  # noqa: E402
from .config import get_config  # noqa: E402
from .processor import get_processor  # noqa: E402
from .tokenizer import (  # noqa: E402
    _fix_added_tokens_encoding,
    _fix_v5_add_bos_eos_token,
    get_tokenizer,
)

__all__ = [
    "AutoConfig",
    "CONTEXT_LENGTH_KEYS",
    "_fix_added_tokens_encoding",
    "_fix_v5_add_bos_eos_token",
    "attach_additional_stop_token_ids",
    "check_gguf_file",
    "download_from_hf",
    "get_config",
    "get_context_length",
    "get_generation_config",
    "get_hf_text_config",
    "get_processor",
    "get_rope_config",
    "get_sparse_attention_config",
    "get_tokenizer",
    "get_tokenizer_from_processor",
    "normalize_rope_scaling_compat",
]
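A short usage sketch of the new layout (the call signatures below are assumptions based on how these helpers are typically used in SGLang, not verified against the rest of this diff): everything re-exported here can be imported from the package root, and the pre-split path keeps working through the shim module mentioned in the docstring.

from sglang.srt.utils.hf_transformers import get_config, get_tokenizer

config = get_config("Qwen/Qwen2.5-0.5B-Instruct", trust_remote_code=True)
tokenizer = get_tokenizer("Qwen/Qwen2.5-0.5B-Instruct")

# Old import path, preserved by the separate hf_transformers_utils shim:
from sglang.srt.utils.hf_transformers_utils import get_tokenizer as legacy_get_tokenizer
assert legacy_get_tokenizer is get_tokenizer  # same function object, re-exported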