Upgrade transformers==5.3.0 #17784

Open
JustinTong0323 wants to merge 64 commits into sgl-project:main from
JustinTong0323:update-transformers-v5

Conversation

Collaborator

@JustinTong0323 JustinTong0323 commented Jan 26, 2026

Motivation

Address #17779 — Upgrade transformers to 5.3.0.

Changes

  • Bump transformers>=5.2.0, huggingface_hub>=1.0.0; remove hf_transfer
  • get_rope_config() utility for backward-compatible config.rope_parameters access
  • Qwen2: remove padding_idx (transformers#41541)
  • Gemma3: adapt for v5 API changes
  • LLaVA: handle CLIPImageProcessorFast returning torch.Tensor instead of ndarray
  • Qwen2.5-VL encoder: use pooler_output instead of last_hidden_state
  • Tokenizer: explicitly use_fast=True in auto mode; fix special_tokens_pattern; sync text_config
  • Config: register custom configs with AutoConfig; GGUF version parsing workaround
  • Tests: fix _apply_rotary_emb import path; fix Qwen2.5-VL .visual → .model.visual
  • InternVL test: patch meta-tensor .item() and missing all_tied_weights_keys for v5 compat
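The get_rope_config() bullet above is the crux of the v4→v5 migration: v5 moves rope_theta and rope_scaling into a single config.rope_parameters dict. A minimal sketch of such a backward-compatible accessor (illustrative only; the PR's actual helper and signature may differ):

```python
def get_rope_config(config, default_theta=10000.0):
    """Return (rope_theta, rope_scaling) for both transformers v4 and v5 configs.

    v5 exposes a config.rope_parameters dict; v4 exposed rope_theta and
    rope_scaling as plain attributes.
    """
    params = getattr(config, "rope_parameters", None)
    if isinstance(params, dict):
        theta = params.get("rope_theta", default_theta)
        # v5 populates rope_parameters even when no scaling is configured;
        # treat rope_type == "default" as "no scaling" (None)
        scaling = {k: v for k, v in params.items() if k != "rope_theta"} or None
        if scaling and scaling.get("rope_type", scaling.get("type")) == "default":
            scaling = None
        return theta, scaling
    return (getattr(config, "rope_theta", default_theta),
            getattr(config, "rope_scaling", None))
```

Returning None for the "default" rope type matters because downstream code (e.g. MLA attention) distinguishes "no scaling" from a scaling dict.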

TODO

  • Rope parameter handling (config.rope_parameters)
  • Qwen2 padding_idx removal
  • Gemma3 v5 adaptation
  • LLaVA CLIPImageProcessorFast tensor handling
  • Qwen2.5-VL encoder pooler_output
  • Tokenizer use_fast=True default, special_tokens_pattern fix
  • GGUF InvalidVersion: 'N/A' workaround
  • Test import path fix (_apply_rotary_emb)
  • Test fix: Qwen2.5-VL .visual moved to .model.visual
  • InternVL test: v5 meta-tensor init crashes torch.linspace().item() + missing all_tied_weights_keys
  • clean_up_tokenization removed in v5 — InternVL's HF Hub tokenizer (trust_remote_code) still calls it; TOKENIZER_MAPPING.register is bypassed by auto_map
  • Kimi-VL: is_torch_fx_available removed — upstream model code (moonshotai) or sglang shim
  • fp8 quantization incompatible with diffusers — upstream
  • Embedding model crash — SRT engine forward_batch_embedding fails with batch.input_ids=None (TypeError: object of type 'NoneType' has no len() in ForwardBatch.init_new)
  • MiniCPM-o-2_6: v5 AutoProcessor fails — ValueError: Unrecognized feature extractor; v5 can't resolve feature extractor for MiniCPM-o model type
  • MiniCPM-V-4: model sees "text/slice" instead of images — vision embeddings not working correctly (needs investigation)
  • InternVL test: model loads but inference output is wrong (describes both images as SGL logos)
  • DeepSeek-OCR: missing addict, matplotlib packages in CI (not v5-related)
  • DeepSeek-OCR: is_deepseek_nsa() crashes on a dict hf_text_config with AttributeError: 'dict' object has no attribute 'architectures'
  • Embedding test_matryoshka_embedding: v5 respects config.is_causal=false → bidirectional attention in HF reference, but SGLang always uses causal
  • InternVL2.5-8B piecewise cuda graph: MGSM accuracy drops to ~0.36 (v5-specific; individual prompts correct, fails under concurrent eval load)

…ple files

- Updated `huggingface_hub` dependency to version `>=1.0.0` in `pyproject_cpu.toml`, `pyproject_npu.toml`, `pyproject_other.toml`, `pyproject_xpu.toml`, and `pyproject.toml`.
- Upgraded `transformers` dependency to version `5.0.0` in the same files.
- Removed `hf_transfer` from the dependencies in the aforementioned files.
- Refactored the handling of `rope_theta` and `rope_scaling` parameters to use `config.rope_parameters` in various model files for consistency and improved maintainability.

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
@github-actions github-actions bot added the quant (LLM Quantization), dependencies (Pull requests that update a dependency file), deepseek, npu, and diffusion (SGLang Diffusion) labels Jan 26, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello @JustinTong0323, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily focuses on upgrading core dependencies, most notably the transformers library to version 5.0.0 and huggingface_hub to >=1.0.0. These updates necessitate significant refactoring across various model implementations to align with changes in how Rotary Positional Embedding (RoPE) parameters (rope_theta and rope_scaling) are accessed, moving towards a more unified config.rope_parameters approach. Additionally, the hf_transfer dependency has been removed, and test runners have been adapted to new API changes in the transformers library for multimodal models. The overall impact is enhanced compatibility with the latest Hugging Face ecosystem and improved code consistency.

Highlights

  • Dependency Upgrades: The transformers library has been updated to version 5.0.0, and huggingface_hub to >=1.0.0 across all pyproject.toml configurations.
  • hf_transfer Removal: The hf_transfer dependency and its related activation logic have been removed from project configurations and utility files, streamlining dependencies.
  • RoPE Parameter Refactoring: The handling of rope_theta and rope_scaling parameters has been refactored across numerous model implementations to consistently use config.rope_parameters.get() for improved maintainability and compatibility with the updated transformers library.
  • Test Runner Adaptations: Model loading and feature extraction logic in sglang/test/runners.py have been updated to reflect changes in the transformers library, specifically replacing AutoModelForVision2Seq with AutoModelForImageTextToText and adjusting feature extraction calls to use return_dict=True and pooler_output.
  • Tokenizer Usage Update: A minor adjustment was made in test/registered/core/test_score_api.py to use tokenizer() directly instead of tokenizer.encode_plus().



Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request primarily focuses on updating the transformers library to version 5.0.0 and adapting the codebase to changes introduced in this new version. Key modifications include updating huggingface_hub and transformers dependencies, removing the hf_transfer dependency and its associated code, and refactoring the access pattern for rope_theta and rope_scaling parameters across various model files to use config.rope_parameters.get(...). Additionally, transformers API calls in test files have been updated to reflect changes in class names and output access methods. These changes are well-justified and necessary for compatibility with the upgraded transformers library, improving overall code maintainability and consistency.

…les for cleaner dependency management.

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
@JustinTong0323
Collaborator Author

JustinTong0323 commented Jan 26, 2026

/tag-and-rerun-ci run again again

@tugot17
Contributor

tugot17 commented Jan 29, 2026

The tokenizer_manager.py will also have to be changed, right?

# Wrap each token ID in its own list for batch_decode to decode them separately.
# batch_decode([1, 2, 3]) concatenates tokens; batch_decode([[1], [2], [3]]) decodes separately.
token_texts = self.tokenizer.batch_decode([[idx] for idx in token_logprobs_idx])

At least I had to change this to make it run with sglang after manually upgrading.
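To make the behavior difference concrete, here is a toy stand-in (not a real HF tokenizer) showing why wrapping each id in its own list decodes per-token texts:

```python
class StubTokenizer:
    """Toy stand-in for a tokenizer: each inner list passed to batch_decode
    is treated as one sequence whose tokens are concatenated into one string."""
    vocab = {1: "Hello", 2: " world", 3: "!"}

    def batch_decode(self, sequences):
        return ["".join(self.vocab[i] for i in seq) for seq in sequences]

tok = StubTokenizer()
token_logprobs_idx = [1, 2, 3]
# One sequence of three tokens -> a single concatenated string
assert tok.batch_decode([token_logprobs_idx]) == ["Hello world!"]
# One singleton sequence per id -> one text per token, as logprob reporting needs
assert tok.batch_decode([[idx] for idx in token_logprobs_idx]) == ["Hello", " world", "!"]
```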

@vincentzed vincentzed mentioned this pull request Jan 29, 2026
Fix tokenizer behavior in auto mode to ensure compatibility with Transformers v5 by explicitly setting use_fast=True when not provided.
…s_causal

- Add _ensure_gguf_version() in get_tokenizer() to fix InvalidVersion: 'N/A'
  when detokenizer loads GGUF models (was only called in get_config())
- Handle dict hf_text_config in is_deepseek_nsa() to fix AttributeError
  for DeepSeek-OCR models whose text_config is a plain dict
- Add config_kwargs={"is_causal": True} to HF embedding test runner so
  SentenceTransformer uses causal attention matching SGLang's behavior
  (v5 now respects config.is_causal=false → bidirectional attention)
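The dict-vs-object problem in the second bullet can be smoothed over with a small dict-tolerant accessor (helper name is hypothetical, not the PR's actual code):

```python
def config_get(cfg, key, default=None):
    """Read a field from a sub-config that may be a plain dict (as for
    DeepSeek-OCR's text_config under v5) or an attribute-style config object."""
    if isinstance(cfg, dict):
        return cfg.get(key, default)
    return getattr(cfg, key, default)
```

With this, a check like is_deepseek_nsa() can read config_get(hf_text_config, "architectures", []) regardless of which shape v5 hands back.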
@alisonshao alisonshao requested a review from yuan-luo as a code owner March 4, 2026 07:50
@adarshxs
Collaborator

adarshxs commented Mar 4, 2026

  • Kimi-VL: is_torch_fx_available removed — upstream model code (moonshotai) or sglang shim

Kimi-VL works and is verified. The modeling code is not affected since it is an sglang-native implementation with no references to is_torch_fx_available.

Alison Shao and others added 4 commits March 4, 2026 16:33
…onfig

Models like DeepSeek-OCR store text_config as a plain dict. Previously
this was returned as-is, causing AttributeError crashes at every
downstream site that uses attribute access (e.g. config.hidden_size in
_derive_model_shapes, config.architectures in is_deepseek_nsa).

Convert the dict to PretrainedConfig at the source so all downstream
code works uniformly with attribute access.
Convert dict sub-configs (text_config, llm_config, language_config) to
PretrainedConfig early in get_hf_text_config(), before hasattr() asserts
that fail on plain dicts.  Also propagates torch_dtype from parent config.
…ights_keys

- Add is_torch_fx_available compat shim: remote HF model code (e.g.
  MiniCPM-V) imports this function which was removed in transformers v5.
  Patch it back as always-True since torch.fx is always available in
  PyTorch >= 2.0.

- Add try/except fallback in TestMiniCPMVUnderstandsImage._init_visual:
  v5's from_pretrained expects model.all_tied_weights_keys but MiniCPM-V-4
  remote code only has _tied_weights_keys. Fall back to from_config +
  manual safetensors weight loading (same pattern as InternVL fix).
@alisonshao alisonshao force-pushed the update-transformers-v5 branch from b4a9863 to b9834ae Compare March 5, 2026 20:43
Alison Shao and others added 7 commits March 5, 2026 14:08
…mbedding hidden_size

- Add normalize_rope_scaling_compat() to ensure rope_scaling dicts have
  "type" alongside "rope_type" (fixes KimiVL KeyError on v5)
- Call _fix_added_tokens_encoding() on test processors so special tokens
  like <image_id>, <image>, <slice> encode as single tokens in v5
- Patch composite config hidden_size fallback in runners.py for
  sentence_transformers compatibility with v5 Qwen2VLConfig
- normalize_rope_scaling_compat: wrap getattr in try/except for configs
  where rope_scaling property raises AttributeError, remove redundant
  rope_parameters patching
- KimiVL _init_visual: reset v5 auto-populated rope_scaling to None
  so remote code takes correct branch, patch tie_weights to accept
  recompute_mapping kwarg from v5 post_init, cast pixel_values dtype
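The normalize_rope_scaling_compat() fix above guards against code written for either transformers version looking up a key the other uses. A sketch under the assumption that only the "type"/"rope_type" key names differ:

```python
def normalize_rope_scaling_compat(rope_scaling):
    """Mirror the v5 'rope_type' key and the legacy v4 'type' key so code
    written against either transformers version finds the key it expects."""
    if not isinstance(rope_scaling, dict):
        return rope_scaling
    normalized = dict(rope_scaling)
    if "rope_type" in normalized and "type" not in normalized:
        normalized["type"] = normalized["rope_type"]
    elif "type" in normalized and "rope_type" not in normalized:
        normalized["rope_type"] = normalized["type"]
    return normalized
```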
InternLM2 is used as a sub-model for InternVL, where its config may be
created from a dict via InternLM2Config(**dict). Use getattr with
defaults instead of config.rope_parameters[] for safer access.
…sBackend fallback

- Bump transformers from v5.2.0 to v5.3.0 to fix InternVL2.5 MGSM
  accuracy regression (0.38 → 0.90 on v5.3.0)
- Fix DeepSeek-OCR OOM: _override_deepseek_ocr_v_head_dim now also
  patches language_config (which get_hf_text_config may prefer over
  text_config), preventing KV cache profiler from underestimating by 2x
- Add TokenizersBackend auto-retry: detect when transformers v5 silently
  falls back to a generic TokenizersBackend and retry with
  trust_remote_code=True

Tested on H200 with v5.3.0: InternVL2.5-8B MGSM 0.896, DeepSeek-OCR,
MiniCPM-o-2_6, InternVL2-2B all pass.
…ion calls

In transformers v5, LlamaTokenizer rebuilds the pre_tokenizer and decoder
from scratch with Llama-specific components (Metaspace), discarding the
originals from tokenizer.json. This breaks models like DeepSeek-V3.2 that
specify LlamaTokenizerFast but use ByteLevel pre_tokenizer/decoder.

Add _fix_v5_tokenizer_components() to detect the mismatch by comparing
the loaded tokenizer's pre_tokenizer against the raw tokenizer.json and
restore the original components when they differ.

Update TestDeepSeekV32Detector to use sglang's get_tokenizer() so the
fix applies in tests too.

Tested: all 145 function_call_parser tests pass on v5.3.0.
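The mismatch detection described above boils down to comparing the loaded tokenizer's serialized components against the on-disk tokenizer.json. A simplified sketch (function name hypothetical; the real _fix_v5_tokenizer_components() also restores the original components):

```python
import json

def pre_tokenizer_differs(loaded_pre_tokenizer, tokenizer_json_path):
    """Return True when the loaded tokenizer's serialized pre_tokenizer no
    longer matches the one stored in tokenizer.json, which is the symptom of
    v5's LlamaTokenizer rebuilding it with Metaspace components."""
    with open(tokenizer_json_path, "r", encoding="utf-8") as f:
        original = json.load(f).get("pre_tokenizer")
    return original is not None and original != loaded_pre_tokenizer
```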
…glang's

Transformers v5.3.0 registers its own NemotronHConfig with model_type
"nemotron_h", which only supports 3 layer types (M, *, E) and crashes
with KeyError on '-' (MLP) in hybrid_override_pattern. Sglang's custom
NemotronHConfig supports all 4 types but loses the AutoConfig registration
since the upstream one takes priority.

Catch KeyError in get_config(), extract model_type from the raw config
dict, and fall back to sglang's config class when available.
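The fallback pattern above can be sketched framework-free (registry and names are hypothetical; the real code routes through get_config() and AutoConfig):

```python
# Hypothetical registry of sglang's own config classes, keyed by model_type
CUSTOM_CONFIG_CLASSES = {}

def load_config_with_fallback(raw_config, load_upstream):
    """Try the upstream (transformers) config first; on KeyError (such as an
    unsupported layer type in hybrid_override_pattern) fall back to a
    registered custom config class for the same model_type."""
    try:
        return load_upstream(raw_config)
    except KeyError:
        custom_cls = CUSTOM_CONFIG_CLASSES.get(raw_config.get("model_type"))
        if custom_cls is None:
            raise  # no custom class registered: surface the original error
        return custom_cls(**raw_config)
```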
@Kangyan-Zhou Kangyan-Zhou changed the title Upgrade transformers==5.2.0 Upgrade transformers==5.3.0 Mar 7, 2026
Mixtral-8x7B AWQ MoE int4 kernels require Triton autotuning on H100
runners with cold cache, taking ~400s for the first CUDA graph batch.
This increases the CI estimated time to avoid timeout on slow runners.

Alison Shao added 2 commits March 7, 2026 13:38
- qwen3_next.py: Use getattr(config, "rope_theta", 10000) instead of
  config.rope_parameters["rope_theta"] which is None when no rope_scaling
- model_config.py: Use rope_scaling.get("factor", 1.0) instead of
  rope_scaling["factor"] which may be missing in v5's rope_parameters format
…Mo pad_token_id, memory release

- factory.py: Use .get() for rope_scaling["factor"] and ["original_max_position_embeddings"]
  across all scaling types (llama3, linear, dynamic, yarn, deepseek_yarn, longrope, cpu)
- deepseek_v2.py: Use .get("factor", 1.0) for mscale computation
- hf_transformers_utils.py: Handle KeyError 'deepseek_v32' from v5 CONFIG_MAPPING
  (v5 throws KeyError instead of ValueError for unrecognized model_type)
- mimo_v2_flash.py: Use getattr for pad_token_id (MiMoV2FlashConfig lacks attribute in v5)
- test_multi_instance_release_memory_occupation.py: Add gc.collect() before empty_cache()
  (v5's from_pretrained dispatch hooks create circular refs preventing deallocation)
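The gc.collect() fix in the last bullet rests on a CPython detail: plain reference counting never frees reference cycles, so memory held by cyclic hooks stays allocated until the cycle collector runs, and only then can torch.cuda.empty_cache() return it. A framework-free sketch:

```python
import gc
import weakref

class Holder:
    """Stands in for an object caught in a reference cycle, as v5's
    from_pretrained dispatch hooks reportedly are."""

def make_cycle():
    a, b = Holder(), Holder()
    a.other, b.other = b, a  # circular reference keeps both alive
    return weakref.ref(a)

ref = make_cycle()
assert ref() is not None      # the cycle survives the function returning
gc.collect()                  # the cycle collector breaks the cycle
assert ref() is None          # only now could empty_cache() reclaim the memory
```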
JustinTong0323 and others added 6 commits March 9, 2026 18:47
- MiMo-V2: Use direct config attributes (rope_theta, rope_scaling,
  swa_rope_theta) instead of v5's rope_parameters property which
  loses custom fields. Treat rope_type=default as None.
- ModelOpt loader: Wrap AutoConfig.from_pretrained in try/except
  falling back to get_config() for models needing config patching.
pyproject_xpu.toml, pyproject_other.toml, pyproject_npu.toml, and
pyproject_cpu.toml were still pinned to 5.2.0 while pyproject.toml
was already at 5.3.0.
Per reviewer feedback (Xinyuan), replace silent .get("factor", 1.0)
defaults with _get_rope_param() helper that logs a WARNING when a key
is missing. Makes accuracy bugs from v5 config mismatches easier to
debug.

- factory.py: Add _get_rope_param() helper, use for all scaling types
- model_config.py: Add inline warnings for BailingMoe/SarvamMLA/DeepSeek MLA
- deepseek_v2.py: Add inline warning for mscale factor
The merge of main into update-transformers-v5 incorrectly resolved
rope_scaling=None (v5 branch) vs rope_scaling=getattr(config, "rope_scaling", None)
(main) as rope_scaling=config.rope_parameters. This is wrong because in v5,
config.rope_parameters always returns a non-None dict even for models with
no scaling, which would break MLA attention that expects None for no-scaling.

Labels

deepseek, dependencies (Pull requests that update a dependency file), diffusion (SGLang Diffusion), high priority, Multi-modal (multi-modal language model), npu, quant (LLM Quantization), run-ci
