fix(gemma4): key shared KV by layer_type on transformers >=5.8 by thad0ctor · Pull Request #3701 · axolotl-ai-cloud/axolotl

thad0ctor · 2026-06-03T15:31:09Z

Description

Fix Gemma4 KV-sharing fine-tuning on transformers >=5.8. The fused Gemma4 attention monkeypatch keyed shared_kv_states by kv_shared_layer_index/layer_idx, but transformers 5.8 removed kv_shared_layer_index and now keys by layer_type. The key is now derived from whichever attribute the installed transformers exposes, keeping both the old (<=5.5) and new (>=5.8) APIs working.

Motivation and Context

On Axolotl v0.17.0 (pinned transformers==5.9.0), LoRA fine-tuning Gemma4 vision models (e.g. gemma-4-E2B-it, gemma-4-E4B-it) fails with AttributeError: 'Gemma4TextAttention' object has no attribute 'kv_shared_layer_index'. Only models with num_kv_shared_layers > 0 hit it (E2B: 20/35 layers, E4B: 18/42); gemma-4-31B-it and gemma-4-26B-A4B-it have 0 shared layers and were unaffected. v0.16 worked.

How has this been tested?

Real-weights GPU reproduce->fix (transformers 5.8.1, bf16, sdpa, gradient checkpointing): E2B reproduces the exact error with the old key, then runs a clean forward+backward (finite loss & grads); E4B reproduces and passes forward.
Config-level rendezvous check across E2B/E4B/31B/26B-A4B: every shared layer's read-key is written by a storing layer under both keying conventions.
New regression test TestFusedAttnSharedKV with num_kv_shared_layers > 0; suite: 7 passed, 1 xfailed.

AI Usage Disclaimer

Yes — Claude Code Opus 4.8 assisted with debug and fix

Types of changes

Bug fix (non-breaking change which fixes an issue)

Summary by CodeRabbit

Bug Fixes
- Enhanced Gemma4 shared KV attention mechanism to properly support multiple Transformers library conventions.
Tests
- Added regression tests validating Gemma4 shared KV forward and backward propagation with various layer configurations.

The fused Gemma4 attention monkeypatch read and stored shared KV states by `kv_shared_layer_index`/`layer_idx`, but transformers 5.8 dropped the `kv_shared_layer_index` attribute and switched to keying `shared_kv_states` by `layer_type`. On the pinned transformers 5.9, any Gemma4 model with `num_kv_shared_layers > 0` (e.g. gemma-4-E2B vision) raised `AttributeError: 'Gemma4TextAttention' object has no attribute 'kv_shared_layer_index'` once execution reached a shared layer. Derive the read/store key from whichever attribute the installed transformers exposes, keeping compatibility with both the old and new APIs. Add a fused-attn regression with `num_kv_shared_layers > 0` so the shared-KV branch is actually exercised (existing tests defaulted to 0).

coderabbitai · 2026-06-03T15:31:25Z

📝 Walkthrough

Walkthrough

This PR adds helper functions to the Gemma4 fused attention monkeypatch that abstract away shared-KV dictionary key derivation, enabling support for multiple Transformers version conventions. The fused forward pass is updated to use these helpers for both KV retrieval and storage, and regression tests exercise the shared-KV code path end-to-end.

Changes

Gemma4 Shared-KV Key Mapping

Layer / File(s)	Summary
Shared-KV key mapping helpers and integration `src/axolotl/monkeypatch/models/gemma4/fused_attn.py`	`_shared_kv_read_key()` and `_shared_kv_store_key()` helper functions choose the correct shared-KV dictionary keys based on which Gemma4 attention attributes exist, abstracting version differences. Fused forward pass updated to retrieve and store shared KV tensors using these helpers instead of direct attribute access.
Regression tests for shared-KV fused forward `tests/monkeypatch/test_gemma4_fused_attn.py`	`_build_kv_shared_config()` creates a hybrid Gemma4 model configuration with shared-KV layers, and `TestFusedAttnSharedKV.test_shared_kv_forward_backward()` validates fused output shape, finiteness, cosine similarity to stock implementation, and backward pass correctness.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

axolotl-ai-cloud/axolotl#3611: Both PRs modify the shared-KV storage path in fused_attn.py—fix: [gemma4] fix VRAM leak in hybrid FA2+SDPA (hybrid attentiuon) path under activation check… #3611 reroutes through thread-local storage while this PR changes how read/write keys are computed.

Suggested labels

ready to merge

Suggested reviewers

winglian
ved1beta

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 37.50% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the core fix: addressing how shared KV is keyed in Gemma4 for transformers >=5.8, which directly aligns with the main change of introducing key-mapping helpers to support multiple keying conventions.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/axolotl/monkeypatch/models/gemma4/fused_attn.py`:
- Around line 34-44: The shared-KV keying is inconsistent and unsafe: change
_shared_kv_read_key and _shared_kv_store_key to consistently prefer
attn.kv_shared_layer_index (return attn.kv_shared_layer_index when present)
rather than falling back to layer_type/store layer_idx in only one function;
also make the selection safe by checking/getting kv_shared_layer_index on both
producer and consumer (e.g., use getattr(attn, "kv_shared_layer_index", None)
and only use it if not None on both sides), otherwise fall back to the same
consistent legacy key (e.g., attn.layer_idx or attn.layer_type) in both
_shared_kv_read_key and _shared_kv_store_key so read/store use the identical
cache keying.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 51b26822-2991-4e3b-b0c6-6a409acc6f00

📥 Commits

Reviewing files that changed from the base of the PR and between 09d325b and cc09537.

📒 Files selected for processing (2)

src/axolotl/monkeypatch/models/gemma4/fused_attn.py
tests/monkeypatch/test_gemma4_fused_attn.py

codecov · 2026-06-03T15:42:22Z

Codecov Report

❌ Patch coverage is 20.00000% with 8 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
...rc/axolotl/monkeypatch/models/gemma4/fused_attn.py	20.00%	8 Missing ⚠️

📢 Thoughts on this report? Let us know!

coderabbitai Bot reviewed Jun 3, 2026

View reviewed changes

Comment thread src/axolotl/monkeypatch/models/gemma4/fused_attn.py

winglian approved these changes Jun 4, 2026

View reviewed changes

winglian merged commit 41ef48f into axolotl-ai-cloud:main Jun 4, 2026
50 of 64 checks passed

coderabbitai Bot mentioned this pull request Jun 5, 2026

Feat: add gemma4 unified #3706

Open

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(gemma4): key shared KV by layer_type on transformers >=5.8#3701

fix(gemma4): key shared KV by layer_type on transformers >=5.8#3701
winglian merged 1 commit into
axolotl-ai-cloud:mainfrom
thad0ctor:fix/gemma4-vlm-kv-shared-layer-index

thad0ctor commented Jun 3, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 3, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

codecov Bot commented Jun 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

thad0ctor commented Jun 3, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Motivation and Context

How has this been tested?

AI Usage Disclaimer

Types of changes

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

codecov Bot commented Jun 3, 2026

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

thad0ctor commented Jun 3, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 3, 2026 •

edited

Loading