Upgrade transformers==5.3.0 #17784
…ple files - Updated `huggingface_hub` dependency to version `>=1.0.0` in `pyproject_cpu.toml`, `pyproject_npu.toml`, `pyproject_other.toml`, `pyproject_xpu.toml`, and `pyproject.toml`. - Upgraded `transformers` dependency to version `5.0.0` in the same files. - Removed `hf_transfer` from the dependencies in the aforementioned files. - Refactored the handling of `rope_theta` and `rope_scaling` parameters to use `config.rope_parameters` in various model files for consistency and improved maintainability. Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Code Review
This pull request primarily focuses on updating the transformers library to version 5.0.0 and adapting the codebase to changes introduced in this new version. Key modifications include updating huggingface_hub and transformers dependencies, removing the hf_transfer dependency and its associated code, and refactoring the access pattern for rope_theta and rope_scaling parameters across various model files to use config.rope_parameters.get(...). Additionally, transformers API calls in test files have been updated to reflect changes in class names and output access methods. These changes are well-justified and necessary for compatibility with the upgraded transformers library, improving overall code maintainability and consistency.
…les for cleaner dependency management. Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
/tag-and-rerun-ci run again
…files for v5 compatibility. Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
```python
# Wrap each token ID in its own list for batch_decode to decode them separately:
# batch_decode([1, 2, 3]) concatenates tokens, batch_decode([[1], [2], [3]]) decodes separately.
token_texts = self.tokenizer.batch_decode([[idx] for idx in token_logprobs_idx])
```

At least I had to change this to make it run with sglang after manually upgrading.
Fix tokenizer behavior in auto mode to ensure compatibility with Transformers v5 by explicitly setting use_fast=True when not provided.
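The fix above boils down to injecting a default only when the caller did not choose a backend. A minimal sketch, assuming a hypothetical wrapper name (`load_tokenizer_with_default`) and a stub standing in for `AutoTokenizer.from_pretrained` — not sglang's actual API:

```python
def load_tokenizer_with_default(loader, name, **kwargs):
    # Hypothetical sketch: transformers v5's auto mode needs an explicit
    # use_fast choice, so default to the fast (Rust) tokenizer unless the
    # caller passed use_fast themselves.
    kwargs.setdefault("use_fast", True)
    return loader(name, **kwargs)

# Stub loader standing in for AutoTokenizer.from_pretrained:
def fake_loader(name, **kwargs):
    return (name, kwargs)

name, kwargs = load_tokenizer_with_default(fake_loader, "some/model")
```

`setdefault` keeps an explicit `use_fast=False` from the caller intact, so only the unset ("auto") case changes behavior.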
…s_causal
- Add _ensure_gguf_version() in get_tokenizer() to fix InvalidVersion: 'N/A'
when detokenizer loads GGUF models (was only called in get_config())
- Handle dict hf_text_config in is_deepseek_nsa() to fix AttributeError
for DeepSeek-OCR models whose text_config is a plain dict
- Add config_kwargs={"is_causal": True} to HF embedding test runner so
SentenceTransformer uses causal attention matching SGLang's behavior
(v5 now respects config.is_causal=false → bidirectional attention)
Kimi-VL works and is verified. Modeling code is not affected since it's an sglang-native implementation with no references to
…onfig Models like DeepSeek-OCR store text_config as a plain dict. Previously this was returned as-is, causing AttributeError crashes at every downstream site that uses attribute access (e.g. config.hidden_size in _derive_model_shapes, config.architectures in is_deepseek_nsa). Convert the dict to PretrainedConfig at the source so all downstream code works uniformly with attribute access.
Convert dict sub-configs (text_config, llm_config, language_config) to PretrainedConfig early in get_hf_text_config(), before hasattr() asserts that fail on plain dicts. Also propagates torch_dtype from parent config.
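The idea can be sketched with `SimpleNamespace` as a stand-in for `PretrainedConfig` (the real change converts to `PretrainedConfig`; `to_attr_config` is a hypothetical name for illustration):

```python
from types import SimpleNamespace

def to_attr_config(cfg):
    # Recursively convert plain-dict (sub-)configs to attribute access so
    # downstream code like config.text_config.architectures works uniformly
    # instead of crashing with AttributeError on a plain dict.
    if isinstance(cfg, dict):
        return SimpleNamespace(**{k: to_attr_config(v) for k, v in cfg.items()})
    return cfg

# Placeholder data shaped like a DeepSeek-OCR-style config with a dict text_config:
cfg = to_attr_config({"hidden_size": 4096,
                      "text_config": {"architectures": ["ExampleArchitecture"]}})
```

Doing the conversion once at the source (`get_hf_text_config()`) means no downstream call site needs its own dict-vs-object branch.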
…ights_keys - Add is_torch_fx_available compat shim: remote HF model code (e.g. MiniCPM-V) imports this function which was removed in transformers v5. Patch it back as always-True since torch.fx is always available in PyTorch >= 2.0. - Add try/except fallback in TestMiniCPMVUnderstandsImage._init_visual: v5's from_pretrained expects model.all_tied_weights_keys but MiniCPM-V-4 remote code only has _tied_weights_keys. Fall back to from_config + manual safetensors weight loading (same pattern as InternVL fix).
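The compat shim amounts to re-adding a removed symbol. A generic sketch — `patch_missing_symbol` is a hypothetical helper, and the demo uses a stand-in module; the actual fix targets `transformers.utils.is_torch_fx_available`:

```python
import types

def patch_missing_symbol(module, name, fallback):
    # Re-add a function that upstream removed, but only if it is actually
    # missing, so versions that still (or again) provide it are untouched.
    if not hasattr(module, name):
        setattr(module, name, fallback)

# Demo on a stand-in module (real target would be transformers.utils):
utils = types.ModuleType("fake_transformers_utils")
patch_missing_symbol(utils, "is_torch_fx_available", lambda: True)
```

Returning `True` unconditionally is safe here because `torch.fx` ships with every PyTorch >= 2.0 that sglang supports.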
…mbedding hidden_size - Add normalize_rope_scaling_compat() to ensure rope_scaling dicts have "type" alongside "rope_type" (fixes KimiVL KeyError on v5) - Call _fix_added_tokens_encoding() on test processors so special tokens like <image_id>, <image>, <slice> encode as single tokens in v5 - Patch composite config hidden_size fallback in runners.py for sentence_transformers compatibility with v5 Qwen2VLConfig
- normalize_rope_scaling_compat: wrap getattr in try/except for configs where rope_scaling property raises AttributeError, remove redundant rope_parameters patching - KimiVL _init_visual: reset v5 auto-populated rope_scaling to None so remote code takes correct branch, patch tie_weights to accept recompute_mapping kwarg from v5 post_init, cast pixel_values dtype
InternLM2 is used as a sub-model for InternVL, where its config may be created from a dict via InternLM2Config(**dict). Use getattr with defaults instead of config.rope_parameters[] for safer access.
…sBackend fallback - Bump transformers from v5.2.0 to v5.3.0 to fix InternVL2.5 MGSM accuracy regression (0.38 → 0.90 on v5.3.0) - Fix DeepSeek-OCR OOM: _override_deepseek_ocr_v_head_dim now also patches language_config (which get_hf_text_config may prefer over text_config), preventing KV cache profiler from underestimating by 2x - Add TokenizersBackend auto-retry: detect when transformers v5 silently falls back to a generic TokenizersBackend and retry with trust_remote_code=True Tested on H200 with v5.3.0: InternVL2.5-8B MGSM 0.896, DeepSeek-OCR, MiniCPM-o-2_6, InternVL2-2B all pass.
…ion calls In transformers v5, LlamaTokenizer rebuilds the pre_tokenizer and decoder from scratch with Llama-specific components (Metaspace), discarding the originals from tokenizer.json. This breaks models like DeepSeek-V3.2 that specify LlamaTokenizerFast but use ByteLevel pre_tokenizer/decoder. Add _fix_v5_tokenizer_components() to detect the mismatch by comparing the loaded tokenizer's pre_tokenizer against the raw tokenizer.json and restore the original components when they differ. Update TestDeepSeekV32Detector to use sglang's get_tokenizer() so the fix applies in tests too. Tested: all 145 function_call_parser tests pass on v5.3.0.
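The detection step can be sketched as comparing component types; the function name and file handling are illustrative simplifications (the real `_fix_v5_tokenizer_components()` inspects the loaded tokenizer's backend and restores the original components, not just detects the mismatch):

```python
import json

def pre_tokenizer_mismatch(loaded_type, tokenizer_json_path):
    # Compare the loaded tokenizer's pre_tokenizer type (e.g. "Metaspace",
    # rebuilt from scratch by v5's LlamaTokenizer) against what the raw
    # tokenizer.json actually declares (e.g. "ByteLevel" for DeepSeek-V3.2).
    with open(tokenizer_json_path) as f:
        raw = json.load(f)
    original = (raw.get("pre_tokenizer") or {}).get("type")
    return original is not None and original != loaded_type
```

When a mismatch is found, the original components from `tokenizer.json` are the source of truth and get restored onto the loaded tokenizer.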
…glang's Transformers v5.3.0 registers its own NemotronHConfig with model_type "nemotron_h", which only supports 3 layer types (M, *, E) and crashes with KeyError on '-' (MLP) in hybrid_override_pattern. Sglang's custom NemotronHConfig supports all 4 types but loses the AutoConfig registration since the upstream one takes priority. Catch KeyError in get_config(), extract model_type from the raw config dict, and fall back to sglang's config class when available.
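The fallback flow can be sketched with stub loaders (all names hypothetical; the real logic lives inside `get_config()`):

```python
def load_config_with_fallback(raw_cfg, auto_loader, custom_config_classes):
    # Try the upstream AutoConfig path first; if it raises KeyError (e.g.
    # transformers v5's NemotronHConfig choking on the '-' MLP layer type in
    # hybrid_override_pattern), extract model_type from the raw config dict
    # and fall back to a registered custom config class.
    try:
        return auto_loader(raw_cfg)
    except KeyError:
        cls = custom_config_classes.get(raw_cfg.get("model_type"))
        if cls is None:
            raise
        return cls(**raw_cfg)

class CustomNemotronHConfig:
    # Stand-in for sglang's NemotronHConfig, which supports all 4 layer types.
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

def upstream_loader(raw_cfg):
    raise KeyError("-")  # simulate v5 rejecting the '-' layer type

cfg = load_config_with_fallback(
    {"model_type": "nemotron_h", "hybrid_override_pattern": "M-M*"},
    upstream_loader,
    {"nemotron_h": CustomNemotronHConfig},
)
```

Catching only `KeyError` keeps genuine config errors (bad JSON, missing files) surfacing unchanged.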
Mixtral-8x7B AWQ MoE int4 kernels require Triton autotuning on H100 runners with a cold cache, taking ~400s for the first CUDA graph batch. Increase the CI estimated time to avoid timeouts on slow runners.
- qwen3_next.py: Use getattr(config, "rope_theta", 10000) instead of
config.rope_parameters["rope_theta"] which is None when no rope_scaling
- model_config.py: Use rope_scaling.get("factor", 1.0) instead of
rope_scaling["factor"] which may be missing in v5's rope_parameters format
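Both fixes follow the same defensive-access pattern; a sketch under assumed semantics (`safe_rope_theta` is a hypothetical helper name, and the fallback chain mirrors the commit's description of v5 behavior):

```python
from types import SimpleNamespace

DEFAULT_ROPE_THETA = 10000.0

def safe_rope_theta(config):
    # v5's config.rope_parameters can report rope_theta=None when the model
    # has no rope_scaling; fall back to the plain rope_theta attribute,
    # then to the conventional default.
    params = getattr(config, "rope_parameters", None) or {}
    theta = params.get("rope_theta")
    if theta is None:
        theta = getattr(config, "rope_theta", DEFAULT_ROPE_THETA)
    return theta
```

The same `dict.get()`-with-default idea covers `rope_scaling["factor"]`, which v5's `rope_parameters` format may omit entirely.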
…Mo pad_token_id, memory release
- factory.py: Use .get() for rope_scaling["factor"] and ["original_max_position_embeddings"]
across all scaling types (llama3, linear, dynamic, yarn, deepseek_yarn, longrope, cpu)
- deepseek_v2.py: Use .get("factor", 1.0) for mscale computation
- hf_transformers_utils.py: Handle KeyError 'deepseek_v32' from v5 CONFIG_MAPPING
(v5 throws KeyError instead of ValueError for unrecognized model_type)
- mimo_v2_flash.py: Use getattr for pad_token_id (MiMoV2FlashConfig lacks attribute in v5)
- test_multi_instance_release_memory_occupation.py: Add gc.collect() before empty_cache()
(v5's from_pretrained dispatch hooks create circular refs preventing deallocation)
- MiMo-V2: Use direct config attributes (rope_theta, rope_scaling, swa_rope_theta) instead of v5's rope_parameters property, which loses custom fields. Treat rope_type=default as None.
- ModelOpt loader: Wrap AutoConfig.from_pretrained in try/except, falling back to get_config() for models needing config patching.
pyproject_xpu.toml, pyproject_other.toml, pyproject_npu.toml, and pyproject_cpu.toml were still pinned to 5.2.0 while pyproject.toml was already at 5.3.0.
Per reviewer feedback (Xinyuan), replace silent .get("factor", 1.0)
defaults with _get_rope_param() helper that logs a WARNING when a key
is missing. Makes accuracy bugs from v5 config mismatches easier to
debug.
- factory.py: Add _get_rope_param() helper, use for all scaling types
- model_config.py: Add inline warnings for BailingMoe/SarvamMLA/DeepSeek MLA
- deepseek_v2.py: Add inline warning for mscale factor
The merge of main into update-transformers-v5 incorrectly resolved rope_scaling=None (v5 branch) vs rope_scaling=getattr(config, "rope_scaling", None) (main) as rope_scaling=config.rope_parameters. This is wrong because in v5, config.rope_parameters always returns a non-None dict even for models with no scaling, which would break MLA attention that expects None for no-scaling.
Motivation

Address #17779 — Upgrade `transformers` to `5.3.0`.

Changes

- `transformers>=5.2.0`, `huggingface_hub>=1.0.0`; remove `hf_transfer`
- `get_rope_config()` utility for backward-compatible `config.rope_parameters` access
- `padding_idx` (transformers#41541)
- `CLIPImageProcessorFast` returning `torch.Tensor` instead of `ndarray`
- `pooler_output` instead of `last_hidden_state`
- `use_fast=True` in auto mode; fix `special_tokens_pattern`; sync `text_config` with `AutoConfig`; GGUF version parsing workaround
- `_apply_rotary_emb` import path; fix Qwen2.5-VL `.visual` → `.model.visual`
- `.item()` and missing `all_tied_weights_keys` for v5 compat

TODO

- `config.rope_parameters`
- `padding_idx` removal
- `CLIPImageProcessorFast` tensor handling
- `pooler_output`
- `use_fast=True` default, `special_tokens_pattern` fix
- `InvalidVersion: 'N/A'` workaround
- `_apply_rotary_emb`
- `.visual` moved to `.model.visual`
- `torch.linspace().item()` + missing `all_tied_weights_keys`
- `clean_up_tokenization` removed in v5 — InternVL's HF Hub tokenizer (trust_remote_code) still calls it; `TOKENIZER_MAPPING.register` is bypassed by `auto_map`
- `is_torch_fx_available` removed — upstream model code (moonshotai) or sglang shim
- `diffusers` — upstream
- `forward_batch_embedding` fails with `batch.input_ids=None` (`TypeError: object of type 'NoneType' has no len()` in `ForwardBatch.init_new`)
- `AutoProcessor` fails — `ValueError: Unrecognized feature extractor`; v5 can't resolve a feature extractor for the MiniCPM-o model type
- Missing `addict`, `matplotlib` packages in CI (not v5-related)
- `is_deepseek_nsa()` crashes on dict `hf_text_config` — `AttributeError: 'dict' object has no attribute 'architectures'`
- `test_matryoshka_embedding`: v5 respects `config.is_causal=false` → bidirectional attention in HF reference, but SGLang always uses causal