Merged
45 commits
345a2fd
in. com.
Goekdeniz-Guelmez Sep 9, 2025
9d73b43
adding attention + gated rms norm
Goekdeniz-Guelmez Sep 9, 2025
ae3ba82
adding Qwen3NextDecoderLayer
Goekdeniz-Guelmez Sep 9, 2025
eeb9e22
adding Qwen3NextModel
Goekdeniz-Guelmez Sep 9, 2025
9e62688
adding Model
Goekdeniz-Guelmez Sep 9, 2025
8aa2017
adding MLP
Goekdeniz-Guelmez Sep 9, 2025
0dd5093
adding Qwen3NextGatedDeltaNet
Goekdeniz-Guelmez Sep 9, 2025
91f527f
updates
Goekdeniz-Guelmez Sep 9, 2025
416e0c7
updates
Goekdeniz-Guelmez Sep 9, 2025
936a72a
upd. ackn.
Goekdeniz-Guelmez Sep 9, 2025
a720053
nits
Goekdeniz-Guelmez Sep 9, 2025
089c2be
making it trainable
Goekdeniz-Guelmez Sep 9, 2025
222627f
inference fix
Goekdeniz-Guelmez Sep 9, 2025
4176d9d
gibberish inference
Goekdeniz-Guelmez Sep 9, 2025
0c5507c
fix training
Goekdeniz-Guelmez Sep 9, 2025
71a2d48
Merge branch 'main' into adding-qwen3-next
Goekdeniz-Guelmez Sep 10, 2025
aa65e7d
fix for batching
Goekdeniz-Guelmez Sep 10, 2025
fd6c110
nits
Goekdeniz-Guelmez Sep 10, 2025
fa20e46
optimize
Goekdeniz-Guelmez Sep 10, 2025
f95f3fe
updates
Goekdeniz-Guelmez Sep 10, 2025
3864198
closer
Goekdeniz-Guelmez Sep 10, 2025
48f5222
upd.
Goekdeniz-Guelmez Sep 10, 2025
7bf6f8a
fix inference
Goekdeniz-Guelmez Sep 10, 2025
daf6f0b
fix
Goekdeniz-Guelmez Sep 10, 2025
1d952a4
optimization
Goekdeniz-Guelmez Sep 10, 2025
21afa60
nits
Goekdeniz-Guelmez Sep 10, 2025
1d07811
minimize
Goekdeniz-Guelmez Sep 10, 2025
12560cd
clean ups
Goekdeniz-Guelmez Sep 10, 2025
65f4250
format
Goekdeniz-Guelmez Sep 10, 2025
e42be94
nits
Goekdeniz-Guelmez Sep 10, 2025
ac55338
format again
Goekdeniz-Guelmez Sep 10, 2025
fa2e5c4
set some defaults
Goekdeniz-Guelmez Sep 11, 2025
e1f104e
alternateing layer defaults
Goekdeniz-Guelmez Sep 11, 2025
7d248a1
remove MTP layers
Goekdeniz-Guelmez Sep 11, 2025
06a97ed
add head dim but optional
Goekdeniz-Guelmez Sep 11, 2025
6e30a19
nits + format
Goekdeniz-Guelmez Sep 11, 2025
605c4c6
some nits
Sep 11, 2025
39f207d
some fixes
Sep 11, 2025
bcf76ac
fixes
Sep 12, 2025
ef346c3
move f to innit
Goekdeniz-Guelmez Sep 12, 2025
7b792c9
optimized recurrent_gated_delta_rule
Goekdeniz-Guelmez Sep 12, 2025
a0685b1
optmize and shorten recurrent_gated_delta_rule a lot + moving g = mx.…
Goekdeniz-Guelmez Sep 12, 2025
ca24475
make train better
Goekdeniz-Guelmez Sep 12, 2025
8a9809a
nits
Sep 12, 2025
16ca09a
nits + fix
Sep 13, 2025
8 changes: 4 additions & 4 deletions ACKNOWLEDGMENTS.md
@@ -13,11 +13,11 @@ MLX LM was developed with contributions from the following individuals:
  THUKEG's `GLM4`, Rednote `dots.llm1`, Baisu's `Ernie4.5 MoE`, inclusionAI's
  `Bailing MoE e.g. Ling-family`, Klear team - Kuaishou Technology's `Klear`,
  IBM's `Granite MoE`, Meituan's `LongCat`, Nvidia's `Nemotron H`, Swiss-AI's
- `Apertus`, Nikity's `Lille130m`, and Allenai's `OLMoE`; Added support for the
- following training algorithms: `Full Weight Fine-Tuning`, and the `Muon`
+ `Apertus`, Nikity's `Lille130m`, Alibaba Qwen's `Qwen3Next`, and Allenai's `OLMoE`;
+ Helped add support for the following model architectures: Alibaba Qwen's `Qwen3 & Qwen3MoE)`;
+ Added support for the following training algorithms: `Full Weight Fine-Tuning`, and the `Muon`
  optimizer; Added support for the following other features: `Multiple Optimizers
- to choose for training`, and `reporting training metrics to WandB (Weights &
- Biases)`.
+ to choose for training`, and `reporting training metrics to WandB (Weights & Biases)`.
  - Prince Canuma: Helped add support for the following model architectures:
  HuggingFace's `Starcoder2`, Cohere's `Cohere (1 and 2)`, Alibaba Qwen's `Qwen
  (2, 3 and MoE)`, Microsoft's `Phi (3 and 3.5 MoE)`, `BitNet1.58`, Meta's `Llama
2 changes: 1 addition & 1 deletion mlx_lm/models/qwen3_moe.py
@@ -127,7 +127,7 @@ def __call__(
         gates = mx.softmax(gates, axis=-1, precise=True)

         k = self.top_k
-        inds = mx.stop_gradient(mx.argpartition(-gates, kth=k - 1, axis=-1)[..., :k])
+        inds = mx.argpartition(gates, kth=-k, axis=-1)[..., -k:]
         scores = mx.take_along_axis(gates, inds, axis=-1)
         if self.norm_topk_prob:
             scores /= mx.sum(scores, axis=-1, keepdims=True)
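
The hunk above swaps one way of picking the top-k experts for another: the old line negates the gates and keeps the first `k` partitioned indices, while the new line partitions the gates directly and keeps the last `k`. A minimal pure-Python sketch of why both select the same set of experts follows. It is not the MLX implementation: a full sort stands in for `mx.argpartition`, it handles a single token rather than a batch, and the helper names `softmax`, `top_k_last` and `top_k_negated` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a flat list of router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_last(gates, k, norm_topk_prob=True):
    """Mirror of the new line: order ascending, keep the last k indices.

    Assumption: a full sort replaces argpartition, so the indices come back
    ordered here, whereas a partition only guarantees the right set of indices.
    """
    order = sorted(range(len(gates)), key=gates.__getitem__)
    inds = order[-k:]
    scores = [gates[i] for i in inds]
    if norm_topk_prob:
        total = sum(scores)
        scores = [s / total for s in scores]
    return inds, scores


def top_k_negated(gates, k):
    """Mirror of the old line: order the negated gates, keep the first k."""
    order = sorted(range(len(gates)), key=lambda i: -gates[i])
    return order[:k]


gates = softmax([0.1, 2.0, -1.0, 1.5])
inds, scores = top_k_last(gates, k=2)
old_inds = top_k_negated(gates, k=2)
print(sorted(inds), sorted(old_inds))
```

Because only the set of selected experts and their gate values feed the later weighted sum, the two forms are interchangeable as far as this sketch goes, apart from the order in which the indices come back.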