[Bugfix] DeepSeek V4: support transformers >= 4.57 normalized compress_ratios#42806
[Bugfix] DeepSeek V4: support transformers >= 4.57 normalized compress_ratios#42806dparikh79 wants to merge 1 commit into
Conversation
|
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. Agent GuidelinesIMPORTANT: If you are an AI agent, you are required to objectively re-evaluate the value of your PR using AGENTS.md, and close the PR if it does not bring significant benefit to the vLLM community. Failure to do so may result in an immediate ban. 🚀 |
There was a problem hiding this comment.
Code Review
This pull request updates the DeepSeek-V4 model executor to handle configuration changes introduced in transformers version 4.57, specifically regarding how compression ratios are stored. It adds logic to support both the legacy 'compress_ratios' attribute and the newer 'compress_rates' and 'layer_types' fields. Feedback suggests using 'getattr' with default values when accessing these new configuration fields to prevent potential AttributeErrors and improve robustness against incomplete configurations.
| rates = config.compress_rates or {} | ||
| raw = rates.get(config.layer_types[layer_id], 0) |
There was a problem hiding this comment.
Accessing config.compress_rates and config.layer_types directly may raise an AttributeError if these fields are missing from the configuration object (which can happen if they are not explicitly set in the model's config.json). Using getattr with appropriate defaults is more robust and follows defensive programming practices, ensuring the model can load even if the configuration schema varies slightly or is incomplete.
| rates = config.compress_rates or {} | |
| raw = rates.get(config.layer_types[layer_id], 0) | |
| rates = getattr(config, "compress_rates", {}) or {} | |
| layer_types = getattr(config, "layer_types", []) or [] | |
| raw = rates.get(layer_types[layer_id], 0) if layer_id < len(layer_types) else 0 |
…s_ratios `DeepseekV4Attention.__init__` reads `config.compress_ratios[layer_id]` directly. transformers >= 4.57 normalizes the same JSON field on `DeepseekV4Config.__init__` into `layer_types` (list[str]) + `compress_rates` (dict[str, int]) and stops exposing `compress_ratios`, so every DSV4 model fails to load with: AttributeError: 'DeepseekV4Config' object has no attribute 'compress_ratios'. Did you mean: 'compress_rates'? Read from the normalized fields when `compress_ratios` is absent. The per-layer ratio is reconstructed via the documented 1-to-1 mapping `compress_ratios[i] == compress_rates.get(layer_types[i], 0)`, and the existing `max(1, ...)` clamp keeps the downstream invariant (compress ratio is never 0) intact. Legacy configs with `compress_ratios` keep the original code path, so anyone pinning a pre-4.57 transformers stack sees no behavior change. Fixes vllm-project#42741 Signed-off-by: Dhruvil <dhruvilparikh79@gmail.com>
|
Thanks for the review. Good call on the getattr fallbacks. Pushed an amended commit that:
The Re-ran the standalone repro against three new partial-config shapes (no |
b8d1174 to
8e9c69d
Compare
|
This pull request has merge conflicts that must be resolved before it can be |
Purpose
Closes #42741.
DeepseekV4Attention.__init__readsconfig.compress_ratios[layer_id]directly (vllm/model_executor/models/deepseek_v4.py:960). transformers >= 4.57 reshapes the legacy JSON field onDeepseekV4Config.__init__intolayer_types(list[str]) +compress_rates(dict[str, int]) and stops exposingcompress_ratios. Every DSV4 checkpoint then fails to load on vLLM >= 0.20.2:Read from the normalized fields when
compress_ratiosis absent. The per-layer ratio is reconstructed via the issue author's documented 1-to-1 mapping:The existing
max(1, ...)clamp at the end keeps the downstream invariant (compress ratio is never 0, see line 1019 whereself.compress_ratio > 1gatescompress_rope_theta) intact for both paths. Legacy configs withcompress_ratioskeep the original code path, so anyone pinning a pre-4.57 transformers stack sees no behavior change.Duplicate-work check
No existing PR addresses this bug.
Test Plan
The CUDA / GPU code path that DSV4 actually exercises is not reachable on a CPU dev box, but the resolution logic is small enough to validate at the Python level with a synthetic config. Twenty assertions covering both schemas:
AttributeErroron a normalized-only config.compress_ratios): 8 layer indices, fixed values match the pre-fix path.layer_types+compress_rates): 8 layer indices, fixed values match the legacy schema for the same logical mapping.compress_rates: falls back to 1 via themax(1, ...)clamp.compress_rates is None: same fallback (covered byconfig.compress_rates or {}).All twenty cases pass under the fix.
Test Result
ruff checkandruff format --checkclean on the modified file.Related work
Same DSV4 bug-report set:
name_mappedUnboundLocalErrorin expert loading.head.weight->lm_head.weightis non-idempotent #42777 / PR [Bugfix] WeightsMapper: make orig_to_new_suffix idempotent #42805:WeightsMapper.orig_to_new_suffixnon-idempotent.Issue author originally offered to bundle the three; opening them separately so reviewers can take each independently.
AI Assistance Disclosure
Per
AGENTS.md, disclosing that this PR was drafted with AI assistance (Claude Code). I read the surroundingDeepseekV4Attention.__init__, traced every downstream use ofself.compress_ratio(lines 1019, 1049, 1081), confirmed the invariant the existingmax(1, ...)clamp preserves, and ran the standalone Python repro against both schemas and the documented edge cases. The fix shape is the minimum to handle the schema variation without changing behavior for any config the upstream code already handles correctly.