Changes from all commits
67 commits
72f9f07
Update dependencies and refactor rope parameter handling across multi…
JustinTong0323 Jan 26, 2026
24b2db0
Remove `huggingface_hub>=1.0.0` dependency from multiple pyproject fi…
JustinTong0323 Jan 26, 2026
a5584c3
Update `peft` dependency to version `>=0.18.0` in multiple pyproject …
JustinTong0323 Jan 27, 2026
c2328d9
Merge branch 'main' into update-transformers-v5
JustinTong0323 Jan 27, 2026
f298583
Merge branch 'main' into update-transformers-v5
JustinTong0323 Jan 27, 2026
4d5cc69
Merge branch 'main' into update-transformers-v5
JustinTong0323 Feb 2, 2026
330fbcd
fix batch decode
JustinTong0323 Feb 2, 2026
88f617e
Merge branch 'main' into update-transformers-v5
JustinTong0323 Feb 13, 2026
69114ce
fix get rope_scaling
JustinTong0323 Feb 13, 2026
3ebf145
Update transformers dependency to version 5.1.0
JustinTong0323 Feb 13, 2026
4535638
Merge branch 'main' into update-transformers-v5
JustinTong0323 Feb 13, 2026
28b5ff5
remove deadcode `pad_token_id` for qwen
JustinTong0323 Feb 14, 2026
32524ba
use rope_parameters["rope_theta"] to avoid silent fail
JustinTong0323 Feb 14, 2026
c78a45d
fix for minimax_m2's external config
JustinTong0323 Feb 14, 2026
ca1ebf3
Merge branch 'main' into update-transformers-v5
JustinTong0323 Feb 14, 2026
4cdd034
fix gemma3 for transformers v5
JustinTong0323 Feb 14, 2026
5162738
lint
JustinTong0323 Feb 14, 2026
acc119f
follow up use rope_parameters["rope_theta"] to avoid silent fail
JustinTong0323 Feb 14, 2026
6225a4b
fix: Add synchronization for text configuration attributes in transfo…
JustinTong0323 Feb 14, 2026
62396ca
Remove accidentally added files
JustinTong0323 Feb 14, 2026
1dd0b49
fix: Address special tokens pattern issue in tokenizer in tf v5
JustinTong0323 Feb 14, 2026
003677d
[diffusion] fix qwen2_5vl visual model output
JustinTong0323 Feb 14, 2026
ad091ad
fix: Introduce get_rope_config utility for improved rope parameter ha…
JustinTong0323 Feb 14, 2026
db914a2
fix llava: Handle CLIPImageProcessorFast returning torch.Tensor in tr…
JustinTong0323 Feb 14, 2026
a1fb733
upgrade to 5.2.0
JustinTong0323 Feb 24, 2026
c2077df
Merge branch 'main' into update-transformers-v5
JustinTong0323 Feb 24, 2026
327bf0b
Merge branch 'main' into update-transformers-v5
JustinTong0323 Feb 24, 2026
5fb844b
Merge branch 'main' into update-transformers-v5
Kangyan-Zhou Feb 25, 2026
78ecd17
Remove clean_up_tokenization in InternLM2Tokenizer class as it's not …
JustinTong0323 Feb 27, 2026
4a00cd3
Implement workaround for transformers v5 bug in _ensure_gguf_version …
JustinTong0323 Feb 27, 2026
60311f6
Update diffusion/ Qwen2_5_VLModel to use pooler_output for visual enc…
JustinTong0323 Feb 27, 2026
41ed387
fix import of apply_rotary_emb in test_qkv_proj_with_rope.py to use t…
JustinTong0323 Feb 27, 2026
37cfed0
Refactor visual model initialization in TestQwenVLUnderstandsImage to…
JustinTong0323 Feb 27, 2026
7e2b30e
Merge branch 'main' into update-transformers-v5
JustinTong0323 Feb 27, 2026
b180787
Merge branch 'main' into update-transformers-v5
JustinTong0323 Feb 28, 2026
e257085
Refactor Qwen3.5 attention decoder to streamline rope configuration h…
JustinTong0323 Mar 2, 2026
a624bda
Fix transformers v5 compatibility: clean_up_tokenization monkey-patch…
Mar 1, 2026
7abd2b7
Merge branch 'main' into update-transformers-v5
alisonshao Mar 2, 2026
9b2b05f
Fix transformers v5: InternVL meta tensor crash and MiniCPM-V-4 speci…
Mar 3, 2026
9aadaea
Fix _fix_added_tokens_encoding to scan tokenizer attributes instead o…
Mar 3, 2026
dde8d16
Merge branch 'main' into update-transformers-v5
alisonshao Mar 3, 2026
70aa688
Fix transformers v5: MiniCPM-o processor, InternVL meta tensor, clean…
Mar 3, 2026
e648016
Fix transformers v5: GGUF tokenizer, DeepSeek-OCR config, embedding i…
Mar 4, 2026
59d0799
Merge branch 'main' into update-transformers-v5
alisonshao Mar 4, 2026
f87955f
Fix dict hf_text_config: convert to PretrainedConfig in _patch_text_c…
Mar 5, 2026
1a4391d
Fix dict sub-config assert in get_hf_text_config
Mar 5, 2026
177609c
Merge branch 'main' into update-transformers-v5
alisonshao Mar 5, 2026
b9834ae
Fix transformers v5: is_torch_fx_available shim, MiniCPMV all_tied_we…
Mar 5, 2026
acc6799
Fix v5 compat: rope_scaling normalization, MiniCPMV special tokens, e…
Mar 5, 2026
a8ee10d
Fix KimiVL v5 test: rope_scaling reset, tie_weights patch, dtype cast
Mar 6, 2026
da27704
Use getattr for rope config in internlm2 for sub-config compatibility
Mar 6, 2026
21728fc
Bump transformers v5.2.0 → v5.3.0, fix DeepSeek-OCR OOM and Tokenizer…
Mar 7, 2026
57190eb
Fix v5 tokenizer component mismatch for DeepSeek-V3.2 streaming funct…
Mar 7, 2026
c1476ef
Fix NemotronH config KeyError when upstream v5 class conflicts with s…
Mar 7, 2026
e2c875c
Merge branch 'main' into update-transformers-v5
alisonshao Mar 7, 2026
7140f42
Increase AWQ test est_time to 700s for H100 cold cache
Mar 7, 2026
02c975f
Fix v5 rope_parameters/rope_scaling KeyError and TypeError
Mar 7, 2026
4dd7ffd
Fix v5 crash-level bugs: factory.py KeyError, deepseek_v32 config, Mi…
Mar 7, 2026
800e9f7
Merge branch 'main' into update-transformers-v5
JustinTong0323 Mar 9, 2026
090333b
Fix MiMo-V2 rope_parameters v5 regression and ModelOpt loader KeyError
Mar 8, 2026
e603cf5
Bump transformers 5.2.0 → 5.3.0 in all pyproject files
Mar 9, 2026
dde6ef6
Add warnings for missing rope_scaling keys instead of silent defaults
Mar 9, 2026
3460231
Fix longcat_flash rope_scaling merge conflict: revert to None
Mar 9, 2026
476d371
Merge branch 'main' into update-transformers-v5
Mar 9, 2026
76fc7f2
Fix ModelOpt config passthrough and v5 memory release hooks
Mar 10, 2026
63cfbf7
fix diffusion attempt
mickqian Mar 11, 2026
b4dd7cd
fix diffusion attempt
mickqian Mar 11, 2026
7 changes: 3 additions & 4 deletions python/pyproject.toml
@@ -30,8 +30,6 @@ dependencies = [
"flashinfer_python==0.6.4", # keep it aligned with jit-cache version in Dockerfile
"flashinfer_cubin==0.6.4",
"gguf",
"hf_transfer",
"huggingface_hub",
"interegular",
"llguidance>=0.7.11,<0.8.0",
"modelscope",
@@ -72,7 +70,7 @@ dependencies = [
"av ; sys_platform == 'linux' and (platform_machine == 'aarch64' or platform_machine == 'arm64' or platform_machine == 'armv7l')",
"torchvision",
"tqdm",
"transformers==4.57.1",
"transformers==5.3.0",
"uvicorn",
"uvloop",
"watchfiles",
@@ -135,14 +133,15 @@ tracing = [

test = [
"accelerate",
"addict",
"bitsandbytes",
"expecttest",
"jsonlines",
"lm-eval[api]>=0.4.9.2",
"matplotlib",
"pandas",
"parameterized",
"peft",
"peft>=0.18.0",
"pytest",
"sentence_transformers",
"tabulate",
6 changes: 2 additions & 4 deletions python/pyproject_cpu.toml
@@ -26,8 +26,6 @@ dependencies = [
"einops",
"fastapi",
"gguf",
"hf_transfer",
"huggingface_hub",
"intel-openmp; platform_machine == 'x86_64'",
"interegular",
"llguidance>=0.7.11,<0.8.0",
@@ -62,7 +60,7 @@ dependencies = [
"torchaudio==2.9.0",
"torchvision==0.24.0",
"tqdm",
"transformers==4.57.1",
"transformers==5.3.0",
"triton==3.5.0",
"uvicorn",
"uvloop",
@@ -85,7 +83,7 @@ test = [
"jsonlines",
"matplotlib",
"pandas",
"peft",
"peft>=0.18.0",
"pytest",
"sentence_transformers",
]
6 changes: 2 additions & 4 deletions python/pyproject_npu.toml
@@ -26,8 +26,6 @@ dependencies = [
"einops",
"fastapi",
"gguf",
"hf_transfer",
"huggingface_hub",
"interegular",
"llguidance>=0.7.11,<0.8.0",
"modelscope",
@@ -57,7 +55,7 @@ dependencies = [
"timm==1.0.16",
"torchao==0.9.0",
"tqdm",
"transformers==4.57.1",
"transformers==5.3.0",
"uvicorn",
"uvloop",
"xgrammar==0.1.27",
@@ -98,7 +96,7 @@ test = [
"jsonlines",
"matplotlib",
"pandas",
"peft",
"peft>=0.18.0",
"pytest",
"sentence_transformers",
"tabulate",
6 changes: 2 additions & 4 deletions python/pyproject_other.toml
@@ -28,8 +28,6 @@ runtime_common = [
"einops",
"fastapi",
"gguf",
"hf_transfer",
"huggingface_hub",
"interegular",
"llguidance>=0.7.11,<0.8.0",
"modelscope",
@@ -59,7 +57,7 @@ runtime_common = [
"timm==1.0.16",
"torchao==0.9.0",
"tqdm",
"transformers==4.57.1",
"transformers==5.3.0",
"uvicorn",
"uvloop",
"xgrammar==0.1.27",
@@ -143,7 +141,7 @@ test = [
"jsonlines",
"matplotlib",
"pandas",
"peft",
"peft>=0.18.0",
"pytest",
"sentence_transformers",
"tabulate",
6 changes: 2 additions & 4 deletions python/pyproject_xpu.toml
@@ -31,8 +31,6 @@ dependencies = [
"einops",
"fastapi",
"gguf",
"hf_transfer",
"huggingface_hub",
"interegular",
"llguidance>=0.7.11,<0.8.0",
"modelscope",
@@ -62,7 +60,7 @@ dependencies = [
"timm==1.0.16",
"torchao==0.9.0",
"tqdm",
"transformers==4.57.1",
"transformers==5.3.0",
"uvicorn",
"uvloop",
# "xgrammar==0.1.24", , xgrammar depends on CUDA PyTorch and Triton only
@@ -84,7 +82,7 @@ test = [
"jsonlines",
"matplotlib",
"pandas",
"peft",
"peft>=0.18.0",
"pytest",
"sentence_transformers",
"tabulate",
1 change: 0 additions & 1 deletion python/sglang/check_env.py
@@ -30,7 +30,6 @@ def is_cuda_v2():
"numpy",
"aiohttp",
"fastapi",
"hf_transfer",
"huggingface_hub",
"interegular",
"modelscope",
16 changes: 0 additions & 16 deletions python/sglang/multimodal_gen/runtime/loader/weight_utils.py
@@ -12,7 +12,6 @@
from pathlib import Path

import filelock
- import huggingface_hub.constants
import torch
from safetensors.torch import safe_open
from torch.distributed.tensor import DTensor
@@ -37,21 +36,6 @@
temp_dir = tempfile.gettempdir()


- def enable_hf_transfer() -> None:
-     """automatically activates hf_transfer"""
-     if "HF_HUB_ENABLE_HF_TRANSFER" not in os.environ:
-         try:
-             # enable hf hub transfer if available
-             import hf_transfer  # type: ignore # noqa
-
-             huggingface_hub.constants.HF_HUB_ENABLE_HF_TRANSFER = True
-         except ImportError:
-             pass
-
-
- enable_hf_transfer()


class DisabledTqdm(tqdm):

def __init__(self, *args, **kwargs):
@@ -227,8 +227,8 @@ def __init__(
) -> None:
super().__init__()
self.hidden_size = config.hidden_size
rope_theta = getattr(config, "rope_theta", 10000)
rope_scaling = getattr(config, "rope_scaling", None)
rope_theta = config.rope_parameters["rope_theta"]
rope_scaling = config.rope_parameters
if rope_scaling is not None and getattr(
config, "original_max_position_embeddings", None
):
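Several hunks in this PR replace `getattr(config, "rope_theta", …)` / `getattr(config, "rope_scaling", None)` with reads from `config.rope_parameters`. The PR does introduce a `get_rope_config` utility (commit ad091ad), but its actual signature is not shown in this diff, so the helper below is only an illustrative sketch of the pattern:

```python
def get_rope_config(config, default_theta=10000.0):
    """Return (rope_theta, rope_scaling) for v4- or v5-style configs."""
    rope_params = getattr(config, "rope_parameters", None)
    if rope_params is not None:
        # transformers v5: rope_theta lives inside rope_parameters.
        # Index it directly so a missing key raises loudly instead of
        # silently falling back to a wrong default (see commit 32524ba).
        return rope_params["rope_theta"], rope_params
    # transformers v4: separate rope_theta / rope_scaling attributes.
    return (
        getattr(config, "rope_theta", default_theta),
        getattr(config, "rope_scaling", None),
    )
```

Indexing `rope_parameters["rope_theta"]` rather than using `.get()` is deliberate: a KeyError surfaces a broken config immediately, which is the "avoid silent fail" rationale in the commit messages.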
@@ -798,6 +798,11 @@ def get_image_features(
"""
pixel_values = pixel_values.type(self.visual.dtype)
image_embeds = self.visual(pixel_values, grid_thw=image_grid_thw)
+         if not isinstance(image_embeds, torch.Tensor):
+             # In transformers v5, the visual encoder returns BaseModelOutputWithPooling.
+             # pooler_output contains the spatially merged embeddings (what we need),
+             # while last_hidden_state contains the raw unmerged output.
+             image_embeds = image_embeds.pooler_output
split_sizes = (
image_grid_thw.prod(-1) // self.visual.spatial_merge_size**2
).tolist()
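The v4-vs-v5 return-type difference handled above can be isolated in a tiny adapter. This is an illustrative sketch (hypothetical name, duck-typed on `pooler_output` so it runs without torch), not the PR's actual code:

```python
def unwrap_visual_embeds(output):
    """Return merged image embeddings whether the visual encoder gave
    back a raw tensor (transformers v4) or a
    BaseModelOutputWithPooling-style object (transformers v5)."""
    if hasattr(output, "pooler_output"):
        # v5: pooler_output holds the spatially merged embeddings;
        # last_hidden_state would be the raw, unmerged output.
        return output.pooler_output
    return output  # v4: already a plain tensor
```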
@@ -204,8 +204,8 @@ def __init__(
) -> None:
super().__init__()
self.hidden_size = config.hidden_size
rope_theta = getattr(config, "rope_theta", 1000000.0)
rope_scaling = getattr(config, "rope_scaling", None)
rope_theta = config.rope_parameters["rope_theta"]
rope_scaling = config.rope_parameters
max_position_embeddings = getattr(config, "max_position_embeddings", 40960)
attention_bias = getattr(config, "attention_bias", False)

@@ -7,6 +7,8 @@
This module contains implementations of image encoding stages for diffusion pipelines.
"""

+ import inspect

import PIL
import torch
from diffusers.models.autoencoders.vae import DiagonalGaussianDistribution
@@ -111,9 +113,15 @@ def forward(
server_args.pipeline_config.prepare_image_processor_kwargs(batch)
)

+         params = inspect.signature(self.image_processor.__call__).parameters
+         image_processor_kwargs = {
+             k: v for k, v in image_processor_kwargs.items() if k in params
+         }

image_inputs = self.image_processor(
images=image, return_tensors="pt", **image_processor_kwargs
).to(cuda_device)

if self.image_encoder:
# if an image encoder is provided
with set_forward_context(current_timestep=0, attn_metadata=None):
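The signature-filtering trick above, which keeps only the kwargs the processor actually accepts, generalizes to any callable. A sketch with a hypothetical helper name; the `**kwargs` guard at the top is an extra safety check not present in the hunk itself:

```python
import inspect

def filter_kwargs(callable_obj, kwargs):
    """Drop kwargs that `callable_obj` does not accept, so image
    processors whose signatures narrowed across transformers versions
    don't raise TypeError on unknown arguments."""
    params = inspect.signature(callable_obj).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)  # the callable takes **kwargs: pass everything
    return {k: v for k, v in kwargs.items() if k in params}
```

Without the `VAR_KEYWORD` check, a processor declared as `__call__(self, images, **kwargs)` would have all of its extra arguments silently dropped, which is why a production version of this filter needs the guard.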
1 change: 0 additions & 1 deletion python/sglang/srt/configs/internvl.py
@@ -593,7 +593,6 @@ def convert_tokens_to_string(self, tokens):
current_sub_tokens.append(token)
prev_is_special = False
out_string += self.sp_model.decode(current_sub_tokens)
-         out_string = self.clean_up_tokenization(out_string)
out_string = self._maybe_add_prefix_space(tokens=tokens, decoded=out_string)
return out_string[1:]

40 changes: 33 additions & 7 deletions python/sglang/srt/configs/model_config.py
@@ -51,10 +51,15 @@ class ModelImpl(str, Enum):
MINDSPORE = "mindspore"


- def is_deepseek_nsa(config: PretrainedConfig) -> bool:
+ def is_deepseek_nsa(config) -> bool:
+     architectures = (
+         config.get("architectures")
+         if isinstance(config, dict)
+         else getattr(config, "architectures", None)
+     )
return (
-         config.architectures is not None
-         and config.architectures[0]
+         architectures is not None
+         and architectures[0]
in [
"DeepseekV3ForCausalLM",
"DeepseekV32ForCausalLM",
Expand All @@ -63,7 +68,12 @@ def is_deepseek_nsa(config: PretrainedConfig) -> bool:
"PixtralForConditionalGeneration",
"GlmMoeDsaForCausalLM",
]
and getattr(config, "index_topk", None) is not None
and (
config.get("index_topk")
if isinstance(config, dict)
else getattr(config, "index_topk", None)
)
is not None
)
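The dict-vs-object branching in `is_deepseek_nsa` recurs for every field it reads; it could be collapsed into one accessor. An illustrative sketch, assuming (as the hunk implies) that transformers v5 sub-configs can surface as plain dicts:

```python
def cfg_get(config, key, default=None):
    """Read `key` from a config that may be a plain dict (as some
    transformers v5 sub-configs are) or a PretrainedConfig-like
    object (transformers v4)."""
    if isinstance(config, dict):
        return config.get(key, default)
    return getattr(config, key, default)
```

With such a helper, the hunk's two inline conditionals would reduce to `cfg_get(config, "architectures")` and `cfg_get(config, "index_topk")`.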


@@ -458,7 +468,13 @@ def _derive_model_shapes(self):
)
if rope_type != "default":
mscale_all_dim = rope_scaling.get("mscale_all_dim", False)
scaling_factor = rope_scaling["factor"]
if "factor" not in rope_scaling:
logger.warning(
"rope_scaling (type=%s) missing 'factor', defaulting to 1.0. "
"Check model accuracy.",
rope_type,
)
scaling_factor = rope_scaling.get("factor", 1.0)
mscale = yarn_get_mscale(scaling_factor, float(mscale_all_dim))
self.scaling = self.scaling * mscale * mscale
elif "MiniCPM3ForCausalLM" in self.hf_config.architectures:
@@ -504,7 +520,12 @@ def _derive_model_shapes(self):
mscale_all_dim = self.hf_config.rope_scaling.get(
"mscale_all_dim", False
)
scaling_factor = self.hf_config.rope_scaling["factor"]
if "factor" not in self.hf_config.rope_scaling:
logger.warning(
"BailingMoe rope_scaling missing 'factor', defaulting to 1.0. "
"Check model accuracy.",
)
scaling_factor = self.hf_config.rope_scaling.get("factor", 1.0)
mscale = yarn_get_mscale(scaling_factor, float(mscale_all_dim))
self.scaling = self.scaling * mscale * mscale
elif "SarvamMLAForCausalLM" in self.hf_config.architectures:
@@ -521,7 +542,12 @@
mscale_all_dim = self.hf_config.rope_scaling.get(
"mscale_all_dim", False
)
scaling_factor = self.hf_config.rope_scaling["factor"]
if "factor" not in self.hf_config.rope_scaling:
logger.warning(
"SarvamMLA rope_scaling missing 'factor', defaulting to 1.0. "
"Check model accuracy.",
)
scaling_factor = self.hf_config.rope_scaling.get("factor", 1.0)
mscale = yarn_get_mscale(scaling_factor, float(mscale_all_dim))
self.scaling = self.scaling * mscale * mscale
else:
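The warn-and-default pattern repeated across the three `_derive_model_shapes` hunks (DeepSeek, BailingMoe, SarvamMLA) can be sketched as a single helper. Hypothetical name; the PR inlines this logic at each site instead:

```python
import logging

logger = logging.getLogger(__name__)

def get_scaling_factor(rope_scaling, rope_type="unknown", default=1.0):
    """rope_scaling['factor'] with a loud warning, rather than a
    KeyError or a silent default, when the key is absent."""
    if "factor" not in rope_scaling:
        logger.warning(
            "rope_scaling (type=%s) missing 'factor', defaulting to %s. "
            "Check model accuracy.",
            rope_type,
            default,
        )
        return default
    return rope_scaling["factor"]
```

This matches commit dde6ef6's intent ("Add warnings for missing rope_scaling keys instead of silent defaults"): the fallback still happens, but it is now observable in the logs.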