Upgrade transformers==5.3.0 #17784

Open
JustinTong0323 wants to merge 64 commits into sgl-project:main from
JustinTong0323:update-transformers-v5

Conversation

Collaborator

@JustinTong0323 JustinTong0323 commented Jan 26, 2026

Motivation

Address #17779 — Upgrade transformers to 5.3.0.

Changes

  • Bump transformers>=5.2.0, huggingface_hub>=1.0.0; remove hf_transfer
  • get_rope_config() utility for backward-compatible config.rope_parameters access
  • Qwen2: remove padding_idx (transformers#41541)
  • Gemma3: adapt for v5 API changes
  • LLaVA: handle CLIPImageProcessorFast returning torch.Tensor instead of ndarray
  • Qwen2.5-VL encoder: use pooler_output instead of last_hidden_state
  • Tokenizer: explicitly use_fast=True in auto mode; fix special_tokens_pattern; sync text_config
  • Config: register custom configs with AutoConfig; GGUF version parsing workaround
  • Tests: fix _apply_rotary_emb import path; fix Qwen2.5-VL .visual → .model.visual
  • InternVL test: patch meta-tensor .item() and missing all_tied_weights_keys for v5 compat
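The get_rope_config() bullet above is the crux of the v4→v5 migration: v5 moves rope_theta and rope_scaling into a single config.rope_parameters dict. A minimal sketch of such a backward-compatible accessor (illustrative only; the PR's actual helper and signature may differ):

```python
def get_rope_config(config, default_theta=10000.0):
    """Return (rope_theta, rope_scaling) for both transformers v4 and v5 configs.

    v5 exposes a config.rope_parameters dict; v4 exposed rope_theta and
    rope_scaling as plain attributes.
    """
    params = getattr(config, "rope_parameters", None)
    if isinstance(params, dict):
        theta = params.get("rope_theta", default_theta)
        # v5 populates rope_parameters even when no scaling is configured;
        # treat rope_type == "default" as "no scaling" (None)
        scaling = {k: v for k, v in params.items() if k != "rope_theta"} or None
        if scaling and scaling.get("rope_type", scaling.get("type")) == "default":
            scaling = None
        return theta, scaling
    return (getattr(config, "rope_theta", default_theta),
            getattr(config, "rope_scaling", None))
```

Returning None for the "default" rope type matters because downstream code (e.g. MLA attention) distinguishes "no scaling" from a scaling dict.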

TODO

  • Rope parameter handling (config.rope_parameters)
  • Qwen2 padding_idx removal
  • Gemma3 v5 adaptation
  • LLaVA CLIPImageProcessorFast tensor handling
  • Qwen2.5-VL encoder pooler_output
  • Tokenizer use_fast=True default, special_tokens_pattern fix
  • GGUF InvalidVersion: 'N/A' workaround
  • Test import path fix (_apply_rotary_emb)
  • Test fix: Qwen2.5-VL .visual moved to .model.visual
  • InternVL test: v5 meta-tensor init crashes torch.linspace().item() + missing all_tied_weights_keys
  • clean_up_tokenization removed in v5 — InternVL's HF Hub tokenizer (trust_remote_code) still calls it; TOKENIZER_MAPPING.register is bypassed by auto_map
  • Kimi-VL: is_torch_fx_available removed — upstream model code (moonshotai) or sglang shim
  • fp8 quantization incompatible with diffusers — upstream
  • Embedding model crash — SRT engine forward_batch_embedding fails with batch.input_ids=None (TypeError: object of type 'NoneType' has no len() in ForwardBatch.init_new)
  • MiniCPM-o-2_6: v5 AutoProcessor fails — ValueError: Unrecognized feature extractor; v5 can't resolve feature extractor for MiniCPM-o model type
  • MiniCPM-V-4: model sees "text/slice" instead of images — vision embeddings not working correctly (needs investigation)
  • InternVL test: model loads but inference output is wrong (describes both images as SGL logos)
  • DeepSeek-OCR: missing addict, matplotlib packages in CI (not v5-related)
  • DeepSeek-OCR: is_deepseek_nsa() crashes on a dict hf_text_config with AttributeError: 'dict' object has no attribute 'architectures'
  • Embedding test_matryoshka_embedding: v5 respects config.is_causal=false → bidirectional attention in HF reference, but SGLang always uses causal
  • InternVL2.5-8B piecewise cuda graph: MGSM accuracy drops to ~0.36 (v5-specific; individual prompts correct, fails under concurrent eval load)

…ple files

- Updated `huggingface_hub` dependency to version `>=1.0.0` in `pyproject_cpu.toml`, `pyproject_npu.toml`, `pyproject_other.toml`, `pyproject_xpu.toml`, and `pyproject.toml`.
- Upgraded `transformers` dependency to version `5.0.0` in the same files.
- Removed `hf_transfer` from the dependencies in the aforementioned files.
- Refactored the handling of `rope_theta` and `rope_scaling` parameters to use `config.rope_parameters` in various model files for consistency and improved maintainability.

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
@github-actions github-actions bot added the quant (LLM Quantization), dependencies (Pull requests that update a dependency file), deepseek, npu, and diffusion (SGLang Diffusion) labels Jan 26, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello @JustinTong0323, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily focuses on upgrading core dependencies, most notably the transformers library to version 5.0.0 and huggingface_hub to >=1.0.0. These updates necessitate significant refactoring across various model implementations to align with changes in how Rotary Positional Embedding (RoPE) parameters (rope_theta and rope_scaling) are accessed, moving towards a more unified config.rope_parameters approach. Additionally, the hf_transfer dependency has been removed, and test runners have been adapted to new API changes in the transformers library for multimodal models. The overall impact is enhanced compatibility with the latest Hugging Face ecosystem and improved code consistency.

Highlights

  • Dependency Upgrades: The transformers library has been updated to version 5.0.0, and huggingface_hub to >=1.0.0 across all pyproject.toml configurations.
  • hf_transfer Removal: The hf_transfer dependency and its related activation logic have been removed from project configurations and utility files, streamlining dependencies.
  • RoPE Parameter Refactoring: The handling of rope_theta and rope_scaling parameters has been refactored across numerous model implementations to consistently use config.rope_parameters.get() for improved maintainability and compatibility with the updated transformers library.
  • Test Runner Adaptations: Model loading and feature extraction logic in sglang/test/runners.py have been updated to reflect changes in the transformers library, specifically replacing AutoModelForVision2Seq with AutoModelForImageTextToText and adjusting feature extraction calls to use return_dict=True and pooler_output.
  • Tokenizer Usage Update: A minor adjustment was made in test/registered/core/test_score_api.py to use tokenizer() directly instead of tokenizer.encode_plus().



Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request primarily focuses on updating the transformers library to version 5.0.0 and adapting the codebase to changes introduced in this new version. Key modifications include updating huggingface_hub and transformers dependencies, removing the hf_transfer dependency and its associated code, and refactoring the access pattern for rope_theta and rope_scaling parameters across various model files to use config.rope_parameters.get(...). Additionally, transformers API calls in test files have been updated to reflect changes in class names and output access methods. These changes are well-justified and necessary for compatibility with the upgraded transformers library, improving overall code maintainability and consistency.

…les for cleaner dependency management.

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
@JustinTong0323
Collaborator Author

JustinTong0323 commented Jan 26, 2026

/tag-and-rerun-ci run again again

@tugot17
Contributor

tugot17 commented Jan 29, 2026

The tokenizer_manager.py will also have to be changed, right?

# Wrap each token ID in its own list for batch_decode to decode them separately.
# batch_decode([1, 2, 3]) concatenates tokens; batch_decode([[1], [2], [3]]) decodes separately.
token_texts = self.tokenizer.batch_decode([[idx] for idx in token_logprobs_idx])

At least I had to change this to make it run with sglang after manually upgrading.
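To make the behavior difference concrete, here is a toy stand-in (not a real HF tokenizer) showing why wrapping each id in its own list decodes per-token texts:

```python
class StubTokenizer:
    """Toy stand-in for a tokenizer: each inner list passed to batch_decode
    is treated as one sequence whose tokens are concatenated into one string."""
    vocab = {1: "Hello", 2: " world", 3: "!"}

    def batch_decode(self, sequences):
        return ["".join(self.vocab[i] for i in seq) for seq in sequences]

tok = StubTokenizer()
token_logprobs_idx = [1, 2, 3]
# One sequence of three tokens -> a single concatenated string
assert tok.batch_decode([token_logprobs_idx]) == ["Hello world!"]
# One singleton sequence per id -> one text per token, as logprob reporting needs
assert tok.batch_decode([[idx] for idx in token_logprobs_idx]) == ["Hello", " world", "!"]
```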

@vincentzed vincentzed mentioned this pull request Jan 29, 2026
Fix tokenizer behavior in auto mode to ensure compatibility with Transformers v5 by explicitly setting use_fast=True when not provided.
…s_causal

- Add _ensure_gguf_version() in get_tokenizer() to fix InvalidVersion: 'N/A'
  when detokenizer loads GGUF models (was only called in get_config())
- Handle dict hf_text_config in is_deepseek_nsa() to fix AttributeError
  for DeepSeek-OCR models whose text_config is a plain dict
- Add config_kwargs={"is_causal": True} to HF embedding test runner so
  SentenceTransformer uses causal attention matching SGLang's behavior
  (v5 now respects config.is_causal=false → bidirectional attention)
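The dict-vs-object problem in the second bullet can be smoothed over with a small dict-tolerant accessor (helper name is hypothetical, not the PR's actual code):

```python
def config_get(cfg, key, default=None):
    """Read a field from a sub-config that may be a plain dict (as for
    DeepSeek-OCR's text_config under v5) or an attribute-style config object."""
    if isinstance(cfg, dict):
        return cfg.get(key, default)
    return getattr(cfg, key, default)
```

With this, a check like is_deepseek_nsa() can read config_get(hf_text_config, "architectures", []) regardless of which shape v5 hands back.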
@alisonshao alisonshao requested a review from yuan-luo as a code owner March 4, 2026 07:50
@adarshxs
Collaborator

adarshxs commented Mar 4, 2026

  • Kimi-VL: is_torch_fx_available removed — upstream model code (moonshotai) or sglang shim

Kimi-VL works and is verified. The modeling code is not affected since it is an sglang-native implementation with no references to is_torch_fx_available.

Alison Shao and others added 4 commits March 4, 2026 16:33
…onfig

Models like DeepSeek-OCR store text_config as a plain dict. Previously
this was returned as-is, causing AttributeError crashes at every
downstream site that uses attribute access (e.g. config.hidden_size in
_derive_model_shapes, config.architectures in is_deepseek_nsa).

Convert the dict to PretrainedConfig at the source so all downstream
code works uniformly with attribute access.
Convert dict sub-configs (text_config, llm_config, language_config) to
PretrainedConfig early in get_hf_text_config(), before hasattr() asserts
that fail on plain dicts.  Also propagates torch_dtype from parent config.
…ights_keys

- Add is_torch_fx_available compat shim: remote HF model code (e.g.
  MiniCPM-V) imports this function which was removed in transformers v5.
  Patch it back as always-True since torch.fx is always available in
  PyTorch >= 2.0.

- Add try/except fallback in TestMiniCPMVUnderstandsImage._init_visual:
  v5's from_pretrained expects model.all_tied_weights_keys but MiniCPM-V-4
  remote code only has _tied_weights_keys. Fall back to from_config +
  manual safetensors weight loading (same pattern as InternVL fix).
@alisonshao alisonshao force-pushed the update-transformers-v5 branch from b4a9863 to b9834ae Compare March 5, 2026 20:43
Alison Shao and others added 7 commits March 5, 2026 14:08
…mbedding hidden_size

- Add normalize_rope_scaling_compat() to ensure rope_scaling dicts have
  "type" alongside "rope_type" (fixes KimiVL KeyError on v5)
- Call _fix_added_tokens_encoding() on test processors so special tokens
  like <image_id>, <image>, <slice> encode as single tokens in v5
- Patch composite config hidden_size fallback in runners.py for
  sentence_transformers compatibility with v5 Qwen2VLConfig
- normalize_rope_scaling_compat: wrap getattr in try/except for configs
  where rope_scaling property raises AttributeError, remove redundant
  rope_parameters patching
- KimiVL _init_visual: reset v5 auto-populated rope_scaling to None
  so remote code takes correct branch, patch tie_weights to accept
  recompute_mapping kwarg from v5 post_init, cast pixel_values dtype
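The normalize_rope_scaling_compat() fix above guards against code written for either transformers version looking up a key the other uses. A sketch under the assumption that only the "type"/"rope_type" key names differ:

```python
def normalize_rope_scaling_compat(rope_scaling):
    """Mirror the v5 'rope_type' key and the legacy v4 'type' key so code
    written against either transformers version finds the key it expects."""
    if not isinstance(rope_scaling, dict):
        return rope_scaling
    normalized = dict(rope_scaling)
    if "rope_type" in normalized and "type" not in normalized:
        normalized["type"] = normalized["rope_type"]
    elif "type" in normalized and "rope_type" not in normalized:
        normalized["rope_type"] = normalized["type"]
    return normalized
```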
InternLM2 is used as a sub-model for InternVL, where its config may be
created from a dict via InternLM2Config(**dict). Use getattr with
defaults instead of config.rope_parameters[] for safer access.
…sBackend fallback

- Bump transformers from v5.2.0 to v5.3.0 to fix InternVL2.5 MGSM
  accuracy regression (0.38 → 0.90 on v5.3.0)
- Fix DeepSeek-OCR OOM: _override_deepseek_ocr_v_head_dim now also
  patches language_config (which get_hf_text_config may prefer over
  text_config), preventing KV cache profiler from underestimating by 2x
- Add TokenizersBackend auto-retry: detect when transformers v5 silently
  falls back to a generic TokenizersBackend and retry with
  trust_remote_code=True

Tested on H200 with v5.3.0: InternVL2.5-8B MGSM 0.896, DeepSeek-OCR,
MiniCPM-o-2_6, InternVL2-2B all pass.
…ion calls

In transformers v5, LlamaTokenizer rebuilds the pre_tokenizer and decoder
from scratch with Llama-specific components (Metaspace), discarding the
originals from tokenizer.json. This breaks models like DeepSeek-V3.2 that
specify LlamaTokenizerFast but use ByteLevel pre_tokenizer/decoder.

Add _fix_v5_tokenizer_components() to detect the mismatch by comparing
the loaded tokenizer's pre_tokenizer against the raw tokenizer.json and
restore the original components when they differ.

Update TestDeepSeekV32Detector to use sglang's get_tokenizer() so the
fix applies in tests too.

Tested: all 145 function_call_parser tests pass on v5.3.0.
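The mismatch detection described above boils down to comparing the loaded tokenizer's serialized components against the on-disk tokenizer.json. A simplified sketch (function name hypothetical; the real _fix_v5_tokenizer_components() also restores the original components):

```python
import json

def pre_tokenizer_differs(loaded_pre_tokenizer, tokenizer_json_path):
    """Return True when the loaded tokenizer's serialized pre_tokenizer no
    longer matches the one stored in tokenizer.json, which is the symptom of
    v5's LlamaTokenizer rebuilding it with Metaspace components."""
    with open(tokenizer_json_path, "r", encoding="utf-8") as f:
        original = json.load(f).get("pre_tokenizer")
    return original is not None and original != loaded_pre_tokenizer
```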
…glang's

Transformers v5.3.0 registers its own NemotronHConfig with model_type
"nemotron_h", which only supports 3 layer types (M, *, E) and crashes
with KeyError on '-' (MLP) in hybrid_override_pattern. Sglang's custom
NemotronHConfig supports all 4 types but loses the AutoConfig registration
since the upstream one takes priority.

Catch KeyError in get_config(), extract model_type from the raw config
dict, and fall back to sglang's config class when available.
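The fallback pattern above can be sketched framework-free (registry and names are hypothetical; the real code routes through get_config() and AutoConfig):

```python
# Hypothetical registry of sglang's own config classes, keyed by model_type
CUSTOM_CONFIG_CLASSES = {}

def load_config_with_fallback(raw_config, load_upstream):
    """Try the upstream (transformers) config first; on KeyError (such as an
    unsupported layer type in hybrid_override_pattern) fall back to a
    registered custom config class for the same model_type."""
    try:
        return load_upstream(raw_config)
    except KeyError:
        custom_cls = CUSTOM_CONFIG_CLASSES.get(raw_config.get("model_type"))
        if custom_cls is None:
            raise  # no custom class registered: surface the original error
        return custom_cls(**raw_config)
```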
@Kangyan-Zhou Kangyan-Zhou changed the title Upgrade transformers==5.2.0 Upgrade transformers==5.3.0 Mar 7, 2026
Mixtral-8x7B AWQ MoE int4 kernels require Triton autotuning on H100
runners with cold cache, taking ~400s for the first CUDA graph batch.
This increases the CI estimated time to avoid timeout on slow runners.

Alison Shao added 2 commits March 7, 2026 13:38
- qwen3_next.py: Use getattr(config, "rope_theta", 10000) instead of
  config.rope_parameters["rope_theta"] which is None when no rope_scaling
- model_config.py: Use rope_scaling.get("factor", 1.0) instead of
  rope_scaling["factor"] which may be missing in v5's rope_parameters format
…Mo pad_token_id, memory release

- factory.py: Use .get() for rope_scaling["factor"] and ["original_max_position_embeddings"]
  across all scaling types (llama3, linear, dynamic, yarn, deepseek_yarn, longrope, cpu)
- deepseek_v2.py: Use .get("factor", 1.0) for mscale computation
- hf_transformers_utils.py: Handle KeyError 'deepseek_v32' from v5 CONFIG_MAPPING
  (v5 throws KeyError instead of ValueError for unrecognized model_type)
- mimo_v2_flash.py: Use getattr for pad_token_id (MiMoV2FlashConfig lacks attribute in v5)
- test_multi_instance_release_memory_occupation.py: Add gc.collect() before empty_cache()
  (v5's from_pretrained dispatch hooks create circular refs preventing deallocation)
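The gc.collect() fix in the last bullet rests on a CPython detail: plain reference counting never frees reference cycles, so memory held by cyclic hooks stays allocated until the cycle collector runs, and only then can torch.cuda.empty_cache() return it. A framework-free sketch:

```python
import gc
import weakref

class Holder:
    """Stands in for an object caught in a reference cycle, as v5's
    from_pretrained dispatch hooks reportedly are."""

def make_cycle():
    a, b = Holder(), Holder()
    a.other, b.other = b, a  # circular reference keeps both alive
    return weakref.ref(a)

ref = make_cycle()
assert ref() is not None      # the cycle survives the function returning
gc.collect()                  # the cycle collector breaks the cycle
assert ref() is None          # only now could empty_cache() reclaim the memory
```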
JustinTong0323 and others added 6 commits March 9, 2026 18:47
- MiMo-V2: Use direct config attributes (rope_theta, rope_scaling,
  swa_rope_theta) instead of v5's rope_parameters property which
  loses custom fields. Treat rope_type=default as None.
- ModelOpt loader: Wrap AutoConfig.from_pretrained in try/except
  falling back to get_config() for models needing config patching.
pyproject_xpu.toml, pyproject_other.toml, pyproject_npu.toml, and
pyproject_cpu.toml were still pinned to 5.2.0 while pyproject.toml
was already at 5.3.0.
Per reviewer feedback (Xinyuan), replace silent .get("factor", 1.0)
defaults with _get_rope_param() helper that logs a WARNING when a key
is missing. Makes accuracy bugs from v5 config mismatches easier to
debug.

- factory.py: Add _get_rope_param() helper, use for all scaling types
- model_config.py: Add inline warnings for BailingMoe/SarvamMLA/DeepSeek MLA
- deepseek_v2.py: Add inline warning for mscale factor
The merge of main into update-transformers-v5 incorrectly resolved
rope_scaling=None (v5 branch) vs rope_scaling=getattr(config, "rope_scaling", None)
(main) as rope_scaling=config.rope_parameters. This is wrong because in v5,
config.rope_parameters always returns a non-None dict even for models with
no scaling, which would break MLA attention that expects None for no-scaling.

Labels

deepseek, dependencies (Pull requests that update a dependency file), diffusion (SGLang Diffusion), high priority, Multi-modal (multi-modal language model), npu, quant (LLM Quantization), run-ci
