
Revert "Various Transformers v5 fixes" (#38127) #38229

Closed

zhewenl wants to merge 1 commit into vllm-project:main from zhewenl:auto-revert/pr-38127

Conversation

@zhewenl
Collaborator

@zhewenl zhewenl commented Mar 26, 2026

Revert of #38127

This reverts commit 3c3c084.

Reason

PR #38127 changed deepseek_vl2.py to remove kv_lora_rank when None before passing to DeepseekV2Config. This caused KV cache initialization failures in NixlConnector PD tests using deepseek-ai/deepseek-vl2-tiny:

  • CrossLayer KV layout Distributed NixlConnector PD accuracy tests (4 GPUs): AssertionError: KV cache sizes must match between P and D when replicated
  • DP EP Distributed NixlConnector PD accuracy tests (4 GPUs): RuntimeError: All kv cache tensors must have the same number of blocks
  • Distributed NixlConnector PD accuracy (4 GPUs): AssertionError: All kv cache tensors must have the same number of blocks

All 3 failures are new in build #58181.


Auto-generated by CI failure analyzer.

@mergify mergify bot added the deepseek Related to DeepSeek models label Mar 26, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request primarily introduces support for Olmo3Config by moving its definition to vllm.transformers_utils.configs and integrating it into the vLLM configuration system. However, two high-severity issues were identified in the review. The first issue is a re-introduced bug in vllm/config/model.py where sliding_window=0 is not correctly handled as disabled, potentially leading to incorrect behavior. The second issue is a re-introduced bug in vllm/transformers_utils/configs/deepseek_vl2.py where passing kv_lora_rank: None in language_config can cause a TypeError.

Comment thread vllm/config/model.py
Comment on lines 589 to 590
self.original_max_model_len = self.max_model_len
self.max_model_len = self.get_and_verify_max_len(self.max_model_len)
Contributor

Severity: high

This revert re-introduces an issue where sliding_window=0 is not correctly handled as 'disabled'. Some models use 0 to indicate that sliding window attention is disabled, but vLLM expects None. Without this conversion, a sliding_window of 0 might be passed to attention layers, which could lead to incorrect behavior or errors.

The original change seems correct and it's likely this was reverted as part of a larger revert. It should be re-introduced to prevent issues with models that use sliding_window: 0.

        # Some checkpoints set sliding_window to 0 to indicate that sliding window is
        # disabled, but vLLM uses None for that. Convert 0 to None to avoid errors.
        # Set before get_and_verify_max_len to ensure that max_model_len does not get
        # capped to 0.
        if self.get_sliding_window() == 0:
            self.disable_sliding_window = True
            self.hf_text_config.sliding_window = None

        self.original_max_model_len = self.max_model_len
        self.max_model_len = self.get_and_verify_max_len(self.max_model_len)
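As a standalone sketch of the normalization described above (a simplified stand-in class, not vLLM's actual ModelConfig):

```python
# Minimal sketch of the 0-to-None sliding-window normalization described
# above. The class and method names are simplified stand-ins, not vLLM's
# actual ModelConfig API.
class ModelConfigSketch:
    def __init__(self, sliding_window):
        self.sliding_window = sliding_window
        self.disable_sliding_window = False

    def get_sliding_window(self):
        return self.sliding_window

    def normalize_sliding_window(self):
        # Some checkpoints use 0 to mean "sliding window disabled", but
        # downstream code expects None; convert so max_model_len is not
        # capped to 0.
        if self.get_sliding_window() == 0:
            self.disable_sliding_window = True
            self.sliding_window = None


cfg = ModelConfigSketch(sliding_window=0)
cfg.normalize_sliding_window()
print(cfg.sliding_window, cfg.disable_sliding_window)  # None True
```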

Comment thread vllm/transformers_utils/configs/deepseek_vl2.py
Comment on lines 116 to 117
        language_config = kwargs.get("language_config", {})
        # remove kv_lora_rank if not specified, passing None is prohibited
        if language_config.get("kv_lora_rank") is None:
            language_config.pop("kv_lora_rank", None)
        self.text_config = DeepseekV2Config(**language_config)
Contributor

Severity: high

This revert re-introduces a potential bug. If language_config contains kv_lora_rank: None, it will be passed to DeepseekV2Config, which can cause a TypeError later on when kv_lora_rank is used in arithmetic operations (e.g., in DeepseekV2Attention).
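A minimal illustration of the failure mode (hypothetical field values, not the actual DeepseekV2Attention code):

```python
# Hypothetical illustration: if kv_lora_rank is left as None, arithmetic
# on it (as in MLA head-dimension computations) raises TypeError.
kv_lora_rank = None
qk_rope_head_dim = 64  # assumed value for illustration only

try:
    head_dim = kv_lora_rank + qk_rope_head_dim
except TypeError as exc:
    print(f"TypeError: {exc}")
    # unsupported operand type(s) for +: 'NoneType' and 'int'
```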

The original change attempted to fix this by removing the key, but that seems to have caused other issues, likely due to in-place modification of the language_config dictionary. A safer fix that avoids this side effect is to create a copy before modification.

Consider this alternative which should be safer and prevent both the TypeError and the CI failures seen in the original PR:

Suggested change
-        language_config = kwargs.get("language_config", {})
-        # remove kv_lora_rank if not specified, passing None is prohibited
-        if language_config.get("kv_lora_rank") is None:
-            language_config.pop("kv_lora_rank", None)
-        self.text_config = DeepseekV2Config(**language_config)
+        language_config = kwargs.get("language_config", {}).copy()
+        if language_config.get("kv_lora_rank") is None:
+            language_config.pop("kv_lora_rank", None)
+        self.text_config = DeepseekV2Config(**language_config)
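The in-place vs. copy difference can be sketched as follows (illustrative dict contents, not the real model config):

```python
# Sketch of the side effect: dict.get returns a reference to the nested
# dict, so popping from it mutates the caller's kwargs; .copy() avoids
# that. Keys and values here are illustrative only.
kwargs = {"language_config": {"kv_lora_rank": None, "hidden_size": 1024}}

# In-place variant (the reverted fix): the caller's dict loses the key.
lc = kwargs.get("language_config", {})
if lc.get("kv_lora_rank") is None:
    lc.pop("kv_lora_rank", None)
print("kv_lora_rank" in kwargs["language_config"])  # False -- mutated!

# Copy variant (the suggested fix): the caller's dict is untouched.
kwargs = {"language_config": {"kv_lora_rank": None, "hidden_size": 1024}}
lc = kwargs.get("language_config", {}).copy()
if lc.get("kv_lora_rank") is None:
    lc.pop("kv_lora_rank", None)
print("kv_lora_rank" in kwargs["language_config"])  # True -- intact
```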

@hmellor
Member

hmellor commented Mar 26, 2026

Reverting this will block vLLM from upgrading to Transformers v5, because the reverted code misuses DeepseekV2Config.

@hmellor
Member

hmellor commented Mar 26, 2026

This also reverts a bunch of unrelated things. I'm going to fix this properly.

@hmellor hmellor closed this Mar 26, 2026
@hmellor
Member

hmellor commented Mar 26, 2026

The fix for the specific issue is applied in #38247
