Add unified arch for gemma4#305

Merged
neilmehta24 merged 7 commits into main from neil/gemma4-unified
Apr 8, 2026

Conversation

Member

@neilmehta24 neilmehta24 commented Apr 7, 2026

Summary of changes:

  • Removed _BatchedLogitsProcessorAdapter now that ml-explore/mlx-lm#1115 ("Align batch logits processor token contract") has landed.
  • Added a vision addon for gemma4:
    • The vision features for gemma4 include both embeddings and per-layer inputs, so a model patch ensures that per_layer_inputs is wired into the language model.
    • embed_scale is unapplied before the embeddings are passed into the language model, because mlx-lm applies it itself.
    • Added tests for vision input, text input, and prompt caching.
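The embed_scale point above can be sketched in plain Python. This is only an illustration, not the code from this PR: it assumes the language model multiplies every input embedding by embed_scale = sqrt(hidden_size) and that the vision features arrive already at that scale. The helper names merge_image_features and language_model_scale, and the tiny hidden size, are hypothetical.

```python
import math


def merge_image_features(text_embeds, image_features, image_positions, hidden_size):
    """Place image features into the (unscaled) text embeddings.

    Assumption: the downstream language model multiplies every row by
    embed_scale, so image rows are divided by it first.
    """
    embed_scale = math.sqrt(hidden_size)
    merged = [row[:] for row in text_embeds]  # copy so the input is not mutated
    for pos, feat in zip(image_positions, image_features):
        # Divide now so that the later multiply restores the original value.
        merged[pos] = [v / embed_scale for v in feat]
    return merged


def language_model_scale(embeds, hidden_size):
    """Stand-in for the scaling assumed to happen inside the language model."""
    embed_scale = math.sqrt(hidden_size)
    return [[v * embed_scale for v in row] for row in embeds]


hidden_size = 4  # toy value for illustration only
text_embeds = [[0.1, 0.2, 0.3, 0.4], [0.0, 0.0, 0.0, 0.0], [0.5, 0.6, 0.7, 0.8]]
image_features = [[1.0, -2.0, 3.0, -4.0]]  # occupies position 1

merged = merge_image_features(text_embeds, image_features, [1], hidden_size)
scaled = language_model_scale(merged, hidden_size)
print(scaled[1])  # image row after the language model re-applies the scale
```

Without the division, the image row would be scaled twice while the text rows would be scaled once.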

To verify that this implementation is correct, I checked the following:

  • The input_ids reaching the first mlx-lm pass match native mlx-vlm.
  • The per_layer_inputs reaching that same pass also match native mlx-vlm.
  • The embeddings seen by mlx-lm match native behavior. The only difference is a slight rounding error in the image embeddings, caused by unapplying embed_scale and later re-applying it.
  • I also ran end-to-end generation on (1) a multi-image summary task and (2) a single-image transcription task, and noticed no quality degradation between mlx-vlm and this unified implementation.
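The rounding error mentioned above can be illustrated with a small half-precision round trip. This is a sketch under assumptions, not a measurement from the PR: the dtype (IEEE 754 half precision), the hidden size of 1536, and the sample values are all hypothetical, chosen only to show the size of the effect.

```python
import math
import struct


def to_half(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_trip(value, embed_scale):
    """Divide by embed_scale, then multiply again, rounding to half at each step."""
    scale = to_half(embed_scale)
    unscaled = to_half(to_half(value) / scale)
    return to_half(unscaled * scale)


embed_scale = math.sqrt(1536)  # hypothetical hidden size, for illustration only
values = [0.37, -1.25, 2.718, 12.5]

errors = []
for v in values:
    ref = to_half(v)
    out = round_trip(v, embed_scale)
    rel_err = abs(out - ref) / abs(ref)
    errors.append(rel_err)
    print(f"{ref:.6f} -> {out:.6f} (relative error {rel_err:.2e})")
```

Each rounding step contributes at most about half a unit in the last place, so the round trip stays within roughly one part in a thousand for values in the normal half-precision range. A discrepancy of that size calls for a tolerance-based comparison rather than exact equality.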

Codex review:
[Image: Screenshot 2026-04-07 at 12 27 01 PM]

@github-actions github-actions bot added the CLA signed label (Indicates that all contributors have signed) Apr 7, 2026
@neilmehta24 neilmehta24 requested a review from will-lms April 7, 2026 16:42
Contributor

Did you confirm the logits are the same for both text-only and vision prompts before and after the patch? Consider adding tests to test_patched_models.py in line with the Qwen 3.5 heavy tests to verify.

Member Author

I verified that the logits match for text-only prompts, and that they are close within a tolerance for image+text prompts. I added a test for each of these cases.
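One way a "close within a tolerance" check on logits could look is sketched below. It is not the test added in this PR: the function topk_probs_close is hypothetical, and its defaults only mirror the values of the GEMMA4_IMAGE_TOPK, GEMMA4_IMAGE_TOPK_PROB_RTOL, and GEMMA4_IMAGE_TOPK_PROB_REF_FLOOR constants visible in the diff. How those constants are actually used is an assumption.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def topk_probs_close(ref_logits, test_logits, k=5, rtol=0.25, ref_floor=1e-3):
    """Return True if the k most likely reference tokens have similar probabilities.

    The error is measured relative to max(reference prob, ref_floor) so that
    near-zero reference probabilities do not inflate the ratio.
    """
    ref_p = softmax(ref_logits)
    test_p = softmax(test_logits)
    topk = sorted(range(len(ref_p)), key=lambda i: ref_p[i], reverse=True)[:k]
    for i in topk:
        denom = max(ref_p[i], ref_floor)
        if abs(test_p[i] - ref_p[i]) / denom > rtol:
            return False
    return True


ref = [4.0, 2.0, 1.0, 0.5, 0.0, -1.0, -3.0]
near = [x + 0.05 * ((-1) ** i) for i, x in enumerate(ref)]  # small perturbation
far = list(reversed(ref))  # a clearly different distribution

print(topk_probs_close(ref, near))
print(topk_probs_close(ref, far))
```

Comparing only the top-k probabilities keeps the check focused on tokens that could plausibly be sampled, which tolerates small numeric drift in the long tail of the distribution.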

Comment on lines +31 to +42
REAL_MODEL_CASES = [
    pytest.param("lmstudio-community/Qwen3.5-2B-MLX-4bit", id="dense"),
    pytest.param(
        "lmstudio-community/Qwen3.5-35B-A3B-MLX-4bit",
        marks=pytest.mark.heavy,
        id="moe",
    ),
]
GEMMA4_MODEL_NAME = "lmstudio-community/gemma-4-E2B-it-MLX-4bit"
GEMMA4_IMAGE_TOPK = 5
GEMMA4_IMAGE_TOPK_PROB_RTOL = 0.25
GEMMA4_IMAGE_TOPK_PROB_REF_FLOOR = 1e-3

Contributor

Feels like these should be in the model-specific test files, not the shared utils.

Member Author

Moved these to the model-specific test files.

from tests.shared import read_image_b64


def test_gemma4_text_only_generation_patched_matches_unpatched():

Contributor

Should this be heavy? I see the VLM test is.

Member Author

Yes, missed it. Added pytestmark = pytest.mark.heavy

@neilmehta24 neilmehta24 merged commit 147cc6f into main Apr 8, 2026
2 checks passed
@neilmehta24 neilmehta24 deleted the neil/gemma4-unified branch April 8, 2026 20:19
@github-actions github-actions bot locked and limited conversation to collaborators Apr 8, 2026