Revert "Various Transformers v5 fixes" (#38127) #38229
zhewenl wants to merge 1 commit into vllm-project:main
Conversation
This reverts commit 3c3c084.
Code Review
This pull request primarily introduces support for Olmo3Config by moving its definition to vllm.transformers_utils.configs and integrating it into the vLLM configuration system. However, two high-severity issues were identified in the review. The first issue is a re-introduced bug in vllm/config/model.py where sliding_window=0 is not correctly handled as disabled, potentially leading to incorrect behavior. The second issue is a re-introduced bug in vllm/transformers_utils/configs/deepseek_vl2.py where passing kv_lora_rank: None in language_config can cause a TypeError.
```python
self.original_max_model_len = self.max_model_len
self.max_model_len = self.get_and_verify_max_len(self.max_model_len)
```
This revert re-introduces an issue where sliding_window=0 is not correctly handled as 'disabled'. Some models use 0 to indicate that sliding window attention is disabled, but vLLM expects None. Without this conversion, a sliding_window of 0 might be passed to attention layers, which could lead to incorrect behavior or errors.
The original change seems correct and it's likely this was reverted as part of a larger revert. It should be re-introduced to prevent issues with models that use sliding_window: 0.
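To make the 0-vs-None distinction concrete, here is a minimal, dependency-free sketch of the normalization described above. `TextConfig` and `ModelConfigSketch` are hypothetical stand-ins, not vLLM's real classes:

```python
class TextConfig:
    """Hypothetical stand-in for a HF text config carrying sliding_window."""

    def __init__(self, sliding_window):
        self.sliding_window = sliding_window


class ModelConfigSketch:
    """Hypothetical stand-in for vLLM's ModelConfig, showing only the
    sliding_window == 0 -> None normalization."""

    def __init__(self, hf_text_config):
        self.hf_text_config = hf_text_config
        self.disable_sliding_window = False
        # Some checkpoints use 0 to mean "sliding window disabled"; vLLM
        # expects None. Normalize before any max-length verification so
        # max_model_len cannot be capped to 0.
        if self.get_sliding_window() == 0:
            self.disable_sliding_window = True
            self.hf_text_config.sliding_window = None

    def get_sliding_window(self):
        return self.hf_text_config.sliding_window


cfg = ModelConfigSketch(TextConfig(sliding_window=0))
print(cfg.disable_sliding_window)         # True
print(cfg.hf_text_config.sliding_window)  # None
```

A config with a genuine window (e.g. `sliding_window=4096`) passes through unchanged; only the `0` sentinel is rewritten.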
```python
# Some checkpoints set sliding_window to 0 to indicate that sliding window is
# disabled, but vLLM uses None for that. Convert 0 to None to avoid errors.
# Set before get_and_verify_max_len to ensure that max_model_len does not get
# capped to 0.
if self.get_sliding_window() == 0:
    self.disable_sliding_window = True
    self.hf_text_config.sliding_window = None

self.original_max_model_len = self.max_model_len
self.max_model_len = self.get_and_verify_max_len(self.max_model_len)
```

```python
language_config = kwargs.get("language_config", {})
# remove kv_lora_rank if not specified, passing None is prohibited
if language_config.get("kv_lora_rank") is None:
    language_config.pop("kv_lora_rank", None)
self.text_config = DeepseekV2Config(**language_config)
```
This revert re-introduces a potential bug. If language_config contains kv_lora_rank: None, it will be passed to DeepseekV2Config, which can cause a TypeError later on when kv_lora_rank is used in arithmetic operations (e.g., in DeepseekV2Attention).
The original change attempted to fix this by removing the key, but that seems to have caused other issues, likely due to in-place modification of the language_config dictionary. A safer fix that avoids this side effect is to create a copy before modification.
Consider this alternative, which should be safer and prevent both the `TypeError` and the CI failures seen in the original PR:
```diff
-language_config = kwargs.get("language_config", {})
-# remove kv_lora_rank if not specified, passing None is prohibited
-if language_config.get("kv_lora_rank") is None:
-    language_config.pop("kv_lora_rank", None)
-self.text_config = DeepseekV2Config(**language_config)
+language_config = kwargs.get("language_config", {}).copy()
+if language_config.get("kv_lora_rank") is None:
+    language_config.pop("kv_lora_rank", None)
+self.text_config = DeepseekV2Config(**language_config)
```
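The in-place-mutation side effect can be shown with a minimal, dependency-free sketch. The `DeepseekV2Config` call is replaced by simply returning the dict, since only the mutation of the caller's dictionary matters here; `build_inplace` and `build_copy` are hypothetical names:

```python
def build_inplace(**kwargs):
    # kwargs.get() returns the caller's dict object itself, so pop()
    # mutates it in place.
    language_config = kwargs.get("language_config", {})
    if language_config.get("kv_lora_rank") is None:
        language_config.pop("kv_lora_rank", None)
    return language_config


def build_copy(**kwargs):
    # .copy() makes a shallow copy first; the caller's dict is untouched.
    language_config = kwargs.get("language_config", {}).copy()
    if language_config.get("kv_lora_rank") is None:
        language_config.pop("kv_lora_rank", None)
    return language_config


cfg = {"kv_lora_rank": None, "hidden_size": 1024}
build_inplace(language_config=cfg)
print("kv_lora_rank" in cfg)   # False: the key vanished from the caller's dict

cfg2 = {"kv_lora_rank": None, "hidden_size": 1024}
out = build_copy(language_config=cfg2)
print("kv_lora_rank" in cfg2)  # True: the caller's dict is unchanged
print("kv_lora_rank" in out)   # False: only the copy was filtered
```

If other code later re-reads the original `language_config` (e.g. to reconstruct or serialize the config), the in-place version silently changes what it sees, which is a plausible explanation for the CI failures the original PR hit.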
Reverting this will block vLLM from upgrading to Transformers v5 because it is misusing …
This also reverts a bunch of unrelated things. I'm going to fix this properly.
The fix for the specific issue is applied in #38247 |
Revert of #38127
This reverts commit 3c3c084.
Reason
PR #38127 changed `deepseek_vl2.py` to remove `kv_lora_rank` when it is `None` before passing it to `DeepseekV2Config`. This caused KV cache initialization failures in NixlConnector PD tests using `deepseek-ai/deepseek-vl2-tiny`:

- `AssertionError: KV cache sizes must match between P and D when replicated`
- `RuntimeError: All kv cache tensors must have the same number of blocks`
- `AssertionError: All kv cache tensors must have the same number of blocks`

All 3 failures are new in build #58181.
Auto-generated by CI failure analyzer.