Conversation
Did you confirm the logits are the same for both text-only and vision prompts before and after the patch? Consider adding tests to test_patched_models.py in line with the Qwen 3.5 heavy tests to verify.
I verified that the logits match exactly for text-only prompts, and that they are close enough within a tolerance for image+text prompts. I added a test for each of these cases.
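The tolerance-based check described above can be sketched as a small helper. This is a hypothetical sketch: the function name is invented, and the top-k/rtol/floor defaults mirror the GEMMA4_IMAGE_TOPK* constants shown in the diff below, but the real test's logic may differ.

```python
import numpy as np


def assert_topk_probs_close(ref_logits, new_logits, k=5, rtol=0.25, ref_floor=1e-3):
    """Hypothetical helper: check that the patched model's next-token
    probabilities agree with the reference model's for the reference's
    top-k tokens, within a relative tolerance, skipping entries whose
    reference probability is below a floor."""
    def softmax(x):
        x = np.asarray(x, dtype=np.float64)
        e = np.exp(x - x.max())
        return e / e.sum()

    ref_p, new_p = softmax(ref_logits), softmax(new_logits)
    for tok in np.argsort(ref_p)[::-1][:k]:
        if ref_p[tok] < ref_floor:
            continue  # reference probability too small to compare meaningfully
        np.testing.assert_allclose(new_p[tok], ref_p[tok], rtol=rtol)
```

Text-only prompts can then use strict equality on the raw logits, while image+text prompts use this looser top-k probability comparison.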
tests/patched_model_test_utils.py
    REAL_MODEL_CASES = [
        pytest.param("lmstudio-community/Qwen3.5-2B-MLX-4bit", id="dense"),
        pytest.param(
            "lmstudio-community/Qwen3.5-35B-A3B-MLX-4bit",
            marks=pytest.mark.heavy,
            id="moe",
        ),
    ]
    GEMMA4_MODEL_NAME = "lmstudio-community/gemma-4-E2B-it-MLX-4bit"
    GEMMA4_IMAGE_TOPK = 5
    GEMMA4_IMAGE_TOPK_PROB_RTOL = 0.25
    GEMMA4_IMAGE_TOPK_PROB_REF_FLOOR = 1e-3
Feels like these should be in the model-specific test files, not the shared utils.
Moved these to the model-specific files
    from tests.shared import read_image_b64


    def test_gemma4_text_only_generation_patched_matches_unpatched():
Should this be heavy? I see the VLM test is.
Yes, I missed it. Added pytestmark = pytest.mark.heavy.
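For context, a module-level pytestmark applies the mark to every test in the file, so the whole module can be deselected with `-m "not heavy"`. A minimal sketch (the test body here is a placeholder, not the actual test):

```python
import pytest

# Module-level mark: every test collected from this file carries the
# "heavy" mark, matching the fix described above.
pytestmark = pytest.mark.heavy


def test_gemma4_text_only_generation_patched_matches_unpatched():
    # Placeholder body; the real test compares patched vs. unpatched output.
    assert True
```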
Summary of changes:
- _BatchedLogitsProcessorAdapter, now that "Align batch logits processor token contract" (ml-explore/mlx-lm#1115) has landed.
- per_layer_inputs is wired into the language model.
- embed_scale is un-applied before passing embeddings into the language model, because mlx-lm will apply it.

To verify that this implementation is correct, I ensured the following things:

- The input_ids reaching the first mlx-lm pass match native mlx-vlm.
- The per_layer_inputs reaching that same pass also match native mlx-vlm.
- The embeddings reaching mlx-lm match native behavior. The only difference is a slight rounding error for image embeddings from the un-application of the embed_scale followed by a later re-application.
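The embed_scale round trip can be illustrated with a small sketch (the function name is hypothetical; in the actual patch this happens on the vision embeddings before they are handed to mlx-lm, and the rounding error mentioned above comes from doing this divide/re-multiply in low precision):

```python
import numpy as np


def prepare_inputs_embeds(scaled_embeds, embed_scale):
    """Hypothetical sketch: the vision path produces already-scaled
    embeddings, but mlx-lm multiplies by embed_scale itself, so we
    divide the scale out first to avoid double-scaling."""
    return np.asarray(scaled_embeds, dtype=np.float64) / embed_scale


embed_scale = 32.0
raw = np.array([0.5, -1.25, 2.0])
scaled = raw * embed_scale
# Un-apply the scale, then simulate mlx-lm re-applying it internally.
roundtrip = prepare_inputs_embeds(scaled, embed_scale) * embed_scale
assert np.allclose(roundtrip, scaled)
```

In full precision the round trip is exact; in a quantized/half-precision pipeline it introduces the slight rounding error noted above.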
