fix(gemma4): key shared KV by layer_type on transformers >=5.8 #3701
The fused Gemma4 attention monkeypatch read and stored shared KV states by `kv_shared_layer_index`/`layer_idx`, but transformers 5.8 dropped the `kv_shared_layer_index` attribute and switched to keying `shared_kv_states` by `layer_type`. On the pinned transformers 5.9, any Gemma4 model with `num_kv_shared_layers > 0` (e.g. gemma-4-E2B vision) raised `AttributeError: 'Gemma4TextAttention' object has no attribute 'kv_shared_layer_index'` once execution reached a shared layer. Derive the read/store key from whichever attribute the installed transformers exposes, keeping compatibility with both the old and new APIs. Add a fused-attn regression with `num_kv_shared_layers > 0` so the shared-KV branch is actually exercised (existing tests defaulted to 0).
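The key derivation described above can be sketched roughly as follows. This is an illustrative approximation, not the PR's actual code: the function names, the `SimpleNamespace` stand-ins for attention modules, and the assumption that older transformers set `kv_shared_layer_index` to `None` on non-shared layers are all hypothetical. Only the attribute names `kv_shared_layer_index`, `layer_idx`, and `layer_type` come from the description.

```python
from types import SimpleNamespace


def shared_kv_read_key(attn):
    """Key a KV-shared (consumer) layer uses to look up reused states."""
    legacy_index = getattr(attn, "kv_shared_layer_index", None)
    if legacy_index is not None:
        # Older API (assumed): the consumer points at the producer's layer index.
        return legacy_index
    # Newer API (per the PR description): states are keyed by layer_type.
    return attn.layer_type


def shared_kv_store_key(attn):
    """Key a producer layer uses when publishing its states."""
    if hasattr(attn, "kv_shared_layer_index"):
        # Older API (assumed): producers store under their own layer_idx.
        return attn.layer_idx
    return attn.layer_type


# Hypothetical stand-ins for attention modules under each API.
old_producer = SimpleNamespace(
    layer_idx=13, layer_type="sliding_attention", kv_shared_layer_index=None
)
old_consumer = SimpleNamespace(
    layer_idx=20, layer_type="sliding_attention", kv_shared_layer_index=13
)
new_producer = SimpleNamespace(layer_idx=13, layer_type="sliding_attention")
new_consumer = SimpleNamespace(layer_idx=20, layer_type="sliding_attention")
```

Under these assumptions, the producer's store key and the consumer's read key agree in both cases: the layer index on the older API and the layer type on the newer one.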
📝 Walkthrough
This PR adds helper functions to the Gemma4 fused attention monkeypatch that abstract away shared-KV dictionary key derivation, enabling support for multiple Transformers version conventions. The fused forward pass is updated to use these helpers for both KV retrieval and storage, and regression tests exercise the shared-KV code path end-to-end.
Changes
Gemma4 Shared-KV Key Mapping
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/axolotl/monkeypatch/models/gemma4/fused_attn.py`:
- Around line 34-44: The shared-KV keying is inconsistent and unsafe: change
_shared_kv_read_key and _shared_kv_store_key to consistently prefer
attn.kv_shared_layer_index (return attn.kv_shared_layer_index when present)
rather than falling back to layer_type/store layer_idx in only one function;
also make the selection safe by checking/getting kv_shared_layer_index on both
producer and consumer (e.g., use getattr(attn, "kv_shared_layer_index", None)
and only use it if not None on both sides), otherwise fall back to the same
consistent legacy key (e.g., attn.layer_idx or attn.layer_type) in both
_shared_kv_read_key and _shared_kv_store_key so read/store use the identical
cache keying.
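The property this comment is concerned with, that a consumer layer's read key must land on the entry its producer stored, can be checked with a small round trip. The sketch below is a hypothetical illustration: `round_trip_ok` and the three key functions are invented for this example, and the attention modules are plain `SimpleNamespace` stand-ins rather than real `Gemma4TextAttention` objects.

```python
from types import SimpleNamespace


def by_layer_idx(attn):
    return attn.layer_idx


def by_shared_index(attn):
    return attn.kv_shared_layer_index


def by_layer_type(attn):
    return attn.layer_type


def round_trip_ok(producer, consumer, store_key, read_key):
    """Return True if the consumer's read key finds the producer's entry."""
    shared_kv_states = {}
    # Producer publishes its states under the store key.
    shared_kv_states[store_key(producer)] = ("key_states", "value_states")
    # Consumer succeeds only if its read key resolves to that same entry.
    return read_key(consumer) in shared_kv_states


# Hypothetical producer/consumer pair of the same layer type.
producer = SimpleNamespace(
    layer_idx=13, layer_type="sliding_attention", kv_shared_layer_index=None
)
consumer = SimpleNamespace(
    layer_idx=20, layer_type="sliding_attention", kv_shared_layer_index=13
)

legacy_pairing = round_trip_ok(producer, consumer, by_layer_idx, by_shared_index)
new_pairing = round_trip_ok(producer, consumer, by_layer_type, by_layer_type)
mixed_pairing = round_trip_ok(producer, consumer, by_layer_idx, by_layer_type)
```

In this toy setup the two self-consistent pairings succeed, while mixing a legacy store key with a new-style read key misses the cache, which is the failure mode a read/store mismatch would produce.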
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 51b26822-2991-4e3b-b0c6-6a409acc6f00
📒 Files selected for processing (2)
src/axolotl/monkeypatch/models/gemma4/fused_attn.py
tests/monkeypatch/test_gemma4_fused_attn.py
Description
Fix Gemma4 KV-sharing fine-tuning on transformers >=5.8. The fused Gemma4 attention monkeypatch keyed `shared_kv_states` by `kv_shared_layer_index`/`layer_idx`, but transformers 5.8 removed `kv_shared_layer_index` and now keys by `layer_type`. The key is now derived from whichever attribute the installed transformers exposes, keeping both the old (<=5.5) and new (>=5.8) APIs working.
Motivation and Context
On Axolotl v0.17.0 (pinned `transformers==5.9.0`), LoRA fine-tuning Gemma4 vision models (e.g. `gemma-4-E2B-it`, `gemma-4-E4B-it`) fails with `AttributeError: 'Gemma4TextAttention' object has no attribute 'kv_shared_layer_index'`. Only models with `num_kv_shared_layers > 0` hit it (E2B: 20/35 layers, E4B: 18/42); `gemma-4-31B-it` and `gemma-4-26B-A4B-it` have 0 shared layers and were unaffected. v0.16 worked.
How has this been tested?
`TestFusedAttnSharedKV` with `num_kv_shared_layers > 0`; suite: 7 passed, 1 xfailed.
AI Usage Disclaimer
Yes — Claude Code Opus 4.8 assisted with debug and fix