[tests] Review tests for PR #679#24
Closed
danielhanchen wants to merge 14 commits into
Closed
Conversation
Fixes a "The size of tensor a (N) must match the size of tensor b (M)"
RuntimeError in gpt-oss inference when generation crosses the sliding
window (e.g. short prompt with max_new_tokens that pushes the sequence
past 128 tokens) on transformers 5.x running below its required torch
(< 2.11). In that case the KV cache returns more positions for a
full-attention layer than the causal mask covers (pre-allocated cache
slots), so the eager path's `attn_weights += attention_mask` crashes.
The surplus key positions are masked out anyway, so attend only the
overlap: trim key/value (or the mask) to the shorter length in both
eager attention variants. This keeps inference correct and shape
consistent across transformers 4.57.x / 5.5.x and torch 2.9 - 2.11.
Also drop the dead singular `past_key_value` cache_position-free forward
variant: the singular naming predates transformers dropping
`cache_position`, so that signature combination does not exist in any
release. Only the `past_key_values` variant is reachable.
Verified on gpt-oss-20b across the full matrix (transformers
{4.57.6, 5.5.0} x torch {2.9.1, 2.10.0, 2.11.0}): all reasoning efforts
generate coherent output, and greedy continuations match across torch
versions for a given transformers version.
Reload + metadata follow-ups from review feedback: 1. loader._apply_lora_at_paths() now also recreates LoRASwitchLinear for SwitchLinear / QuantizedSwitchLinear and LoRAEmbedding for Embedding / QuantizedEmbedding. The previous version saved switch rank metadata but the reload helper only wrapped Linear, so MoE adapter weights were silently dropped at load time. Switch/embedding imports are wrapped in try/except for older mlx-lm. 2. utils._get_mlx_dropout_probability() now reads MLX _p_1 keep-prob first and falls back to .p. Compat shims that expose both (.p=0.0 default plus a real _p_1) previously wrote stale dropout=0.0. 3. utils._infer_mlx_lora_rank() returns None instead of falling back to lora_a.shape[-1] when lora_a and lora_b disagree on the rank dimension. The previous fallback wrote the input dim as rank for partially materialized or non-LoRA shapes. 4. utils._enrich_mlx_adapter_config() only infers rank/scale/dropout from modules that appear in the caller-provided unsloth_mlx_lora_module_paths set. Previously an unrelated earlier LoRA in named_modules() would write the wrong language-tower params when the caller selected a vision/projector path. Also distinguishes "caller passed nothing" from "caller passed [] or None" so explicit empty values are preserved. 5. utils._enrich_mlx_adapter_config() now fills num_layers (derived via _get_transformer_layers) and fine_tune_type when callers invoke save_lora_adapters() without a trainer-built config. mlx-lm load_adapters() dereferences config.num_layers and previously crashed with AttributeError on these direct-save artifacts.
The 3D rank-consistency check in _infer_mlx_lora_rank() used lora_b_shape[-2] which is the out_features axis. mlx-lm's LoRASwitchLinear stores lora_b as (num_experts, out_dims, rank) so rank lives on lora_b_shape[-1]. Using [-2] caused valid MoE adapters to be rejected as contradicted and rank to be left unrecorded in adapter_config.json.
- _enrich_mlx_adapter_config: an explicit empty unsloth_mlx_lora_module_paths list now preserves the caller topology but no longer suppresses live rank/scale/dropout inference (treat empty as no filter, with a fallback module captured before filtering). - MLXTrainer.save_model: skip lora_a-bearing modules whose rank cannot be inferred instead of silently falling back to rank 8 while still copying the bad module scale and dropout. - _apply_lora_at_paths: when from_base does not accept scale/dropout kwargs on older mlx-lm, retry with r=rank alone and restore the saved scale on the wrapped layer (scale is a Python attribute, not loaded by load_weights).
- _enrich_mlx_adapter_config: capture the fallback rank/scale/dropout only from modules that pass the explicit_path_set filter, so an explicit caller-supplied path list whose modules are not inferable no longer borrows metadata from unrelated LoRA layers. Auto-discovery and empty explicit-list paths still benefit from the fallback unchanged. - _apply_lora_at_paths: when from_base() cannot accept the dropout kwarg on older mlx-lm, patch wrapped.dropout._p_1 (or .p on shims) after the fallback so the saved adapter dropout is faithfully restored alongside the scale that was already being patched.
- _infer_mlx_lora_rank: replace getattr(lora_a, 'shape', ()) with explicit None and hasattr guards so a None tensor attribute can no longer raise AttributeError on callers that share this helper outside the existing hasattr-gated sites. - _enrich_mlx_adapter_config: remove fallback_rank/scale/dropout variables and the post-loop rescue. After the explicit-path filter was moved ahead of fallback capture, both fallback_* and lora_rank are assigned from the same filtered module pool, making the rescue unreachable. The scoping decision is intentional: an explicit-filtered save must not borrow metadata from unselected modules.
Collaborator
Author
|
Fixes pushed to unslothai#679. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Automated test files from review process