[tests] Review tests for PR #601 by danielhanchen · Pull Request #11 · shimmyshimmer/unsloth-zoo-staging-2

danielhanchen · 2026-04-20T14:29:46Z

Automated test files from review process

The shared `_make_qwen_moe_lora_extractor` (used by Qwen3-MoE, Qwen3.5/3.6 MoE, and Qwen3-Next) produced `first=(E, out_dim, R)` instead of the `(E, in_dim, R)` shape expected by `forward_native_grouped_mm`. On models such as Qwen3.6-35B-A3B, this triggered the following error during the first training step:

`torch._grouped_mm(inputs, weight, offs=offsets)`
`RuntimeError: contraction dimension of mat_a and mat_b must match`

The error occurred when `permuted_input` of shape `(N, in_dim)` was multiplied by a `first_weight` whose second-to-last dim was `out_dim` (e.g. `2*intermediate_dim` for `gate_up_proj` on Qwen3.6-35B-A3B's 256-expert architecture).

Root cause: the explicit `param_name in ("gate_up_proj", "down_proj")` branches and the `dim_B == hidden_dim` branch all constructed `first_weight = weight_B.view(dim_B, E, R).permute(1, 0, 2)`. That is, they derived it from `lora_B`, which has shape `(out_dim, E*R)`, so `first.shape[-2]` ended up as `out_dim`, not `in_dim`. The final fallback at the bottom of the function was already correct.

Fix: drop the broken branches. The correct mapping is format-independent and identical to the default extractor in `moe_utils.py::_extract_lora_from_wrapper` and to the working Qwen3-VL-MoE extractor in `qwen3_vl_moe.py::_qwen3_vl_lora_extractor`:

- `weight_A`: `(E*R, in_dim)` -> `view(E, R, in_dim).permute(0, 2, 1)` = `(E, in_dim, R)`
- `weight_B`: `(out_dim, E*R)` -> `view(out_dim, E, R).permute(1, 2, 0)` = `(E, R, out_dim)`

PEFT LoRA weights have a fixed shape relative to the linear layer's in/out dims. They don't depend on whether the base weights are stored "standard" `(E, out, in)` or "transposed" `(E, in, out)`; that distinction is handled upstream by `preprocess_weight`.

Verified against Qwen3.6-35B-A3B (`unsloth/Qwen3.6-35B-A3B`): the LoRA forward path through `torch._grouped_mm` no longer fails with the contraction-dim error, and training progresses past the first forward pass into the expected memory-bound regime.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
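The shape mapping in the fix can be sanity-checked with a small NumPy sketch. Here NumPy's `reshape`/`transpose` stand in for torch's `view`/`permute` (same C-order semantics on contiguous data), and `extract_grouped_lora` and `grouped_mm` are hypothetical helpers, not unsloth-zoo functions. `grouped_mm` is a loop-based stand-in for `torch._grouped_mm` that assumes `offs` holds cumulative end row indices per expert. The sketch checks that the corrected operands reproduce the per-expert LoRA delta `x @ A_e.T @ B_e.T`, and that building `first` from `weight_B`, as the removed branches did, fails on the contraction dimension whenever `in_dim != out_dim`.

```python
import numpy as np


def extract_grouped_lora(weight_A, weight_B, num_experts, rank):
    """Hypothetical helper mirroring the fixed mapping (NumPy stand-in for view/permute).

    weight_A: (E*R, in_dim)  -> first:  (E, in_dim, R)
    weight_B: (out_dim, E*R) -> second: (E, R, out_dim)
    """
    E, R = num_experts, rank
    in_dim = weight_A.shape[1]
    out_dim = weight_B.shape[0]
    # Row e*R + r of weight_A belongs to expert e, so split rows expert-major.
    first = weight_A.reshape(E, R, in_dim).transpose(0, 2, 1)
    # Column e*R + r of weight_B belongs to expert e, so split columns expert-major.
    second = weight_B.reshape(out_dim, E, R).transpose(1, 2, 0)
    return first, second


def grouped_mm(inputs, weight, offs):
    """Loop-based stand-in for a grouped matmul (assumption: offs = cumulative end rows).

    Rows [offs[g-1], offs[g]) of `inputs` are multiplied by weight[g].
    """
    outs = []
    start = 0
    for g, end in enumerate(offs):
        a, b = inputs[start:end], weight[g]
        if a.shape[1] != b.shape[0]:
            raise ValueError(f"contraction dim mismatch: {a.shape[1]} vs {b.shape[0]}")
        outs.append(a @ b)
        start = end
    return np.concatenate(outs, axis=0)


rng = np.random.default_rng(0)
E, R, in_dim, out_dim = 4, 2, 8, 6
weight_A = rng.standard_normal((E * R, in_dim))   # PEFT lora_A, experts fused along rows
weight_B = rng.standard_normal((out_dim, E * R))  # PEFT lora_B, experts fused along columns

first, second = extract_grouped_lora(weight_A, weight_B, E, R)
assert first.shape == (E, in_dim, R) and second.shape == (E, R, out_dim)

# Tokens already sorted by expert; expert 1 receives no tokens.
offs = np.cumsum([3, 0, 5, 2])
x = rng.standard_normal((int(offs[-1]), in_dim))
delta = grouped_mm(grouped_mm(x, first, offs), second, offs)

# Reference: per-expert LoRA delta x_e @ A_e.T @ B_e.T.
starts = np.concatenate([[0], offs[:-1]])
ref = np.concatenate([
    x[s:e] @ weight_A[g * R:(g + 1) * R].T @ weight_B[:, g * R:(g + 1) * R].T
    for g, (s, e) in enumerate(zip(starts, offs))
])
assert np.allclose(delta, ref)

# The removed branches built `first` from lora_B, giving (E, out_dim, R).
bad_first = weight_B.reshape(out_dim, E, R).transpose(1, 0, 2)
try:
    grouped_mm(x, bad_first, offs)
except ValueError as err:
    print(err)  # contraction dim mismatch: 8 vs 6
```

Note that neither mapping inspects the base weight layout: the `first`/`second` operands depend only on the PEFT factor shapes, which is why the same reshape works for every storage format once `preprocess_weight` has normalized the base weights.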

danielhanchen · 2026-04-20T15:06:10Z

Fixes pushed to unslothai#601.

danielhanchen force-pushed the `pr-601-tests` branch from 14683d2 to 2ba0920 on April 20, 2026 at 14:59

danielhanchen added 3 commits on April 20, 2026 at 15:05:

- Merge remote-tracking branch 'origin/main' (10fd68b)
- Add review tests for qwen MoE LoRA extractor (d3e29e8)
- Consolidate review tests for qwen MoE LoRA extractor (2f0d8d9)

danielhanchen force-pushed the `pr-601-tests` branch from d74969d to 2f0d8d9 on April 20, 2026 at 15:05

danielhanchen closed this Apr 20, 2026

danielhanchen wants to merge 4 commits into `main` from `pr-601-tests`.


2 participants
