Merged
57 commits
730c876
New tiny model generation
qgallouedec Apr 24, 2026
a060e6d
cohere and fix vocab size
qgallouedec Apr 24, 2026
158b891
print pr
qgallouedec Apr 24, 2026
f5eedfb
precommit
qgallouedec Apr 24, 2026
ffbf3b1
precommit
qgallouedec Apr 24, 2026
d24a76c
cohere2
qgallouedec Apr 24, 2026
f0f5563
deepseek v3
qgallouedec Apr 24, 2026
59cb16e
revert to keep this focused
qgallouedec Apr 24, 2026
9bc6ad4
nit
qgallouedec Apr 24, 2026
a7ad64a
revert
qgallouedec Apr 24, 2026
6b361e1
revove force and update readme
qgallouedec Apr 24, 2026
b2cf603
nit commit message
qgallouedec Apr 24, 2026
b4bae78
better
qgallouedec Apr 24, 2026
540502a
Align tiny-Glm4MoeForCausalLM with GLM-4.5 reference config
qgallouedec Apr 24, 2026
0b7fa20
fix generation config peft
qgallouedec Apr 24, 2026
538f486
Merge branch 'main' into new-tiny-model-generation
qgallouedec Apr 24, 2026
49d5fca
Merge branch 'new-tiny-model-generation' into fix-tiny-glm4-moe
qgallouedec Apr 24, 2026
39bafd4
Qwen3.6 integration (#5642)
qgallouedec Apr 26, 2026
07e65d7
Release: v1.3 (#5647)
qgallouedec Apr 26, 2026
7198c14
⬆️ Bump dev version (#5648)
qgallouedec Apr 26, 2026
71b8219
Add Qwen3.6 model generation script with updated configuration
qgallouedec Apr 27, 2026
545e5e9
merge main
qgallouedec Apr 27, 2026
db13f29
Merge remote-tracking branch 'origin/main' into new-tiny-model-genera…
qgallouedec Apr 27, 2026
5cc7fc8
Merge branch 'main' into new-tiny-model-generation
qgallouedec Apr 27, 2026
7f25397
Merge remote-tracking branch 'origin/main' into new-tiny-model-genera…
qgallouedec Apr 28, 2026
4730fec
Qwen3 Instruct-2507
qgallouedec Apr 28, 2026
37c6893
Merge branch 'new-tiny-model-generation' into fix-tiny-glm4-moe
qgallouedec Apr 28, 2026
4c4f843
Merge branch 'main' into new-tiny-model-generation
qgallouedec Apr 28, 2026
6a8be8f
Merge branch 'main' into new-tiny-model-generation
qgallouedec Apr 29, 2026
385ac24
Merge branch 'main' into new-tiny-model-generation
qgallouedec Apr 29, 2026
977b0c4
Merge branch 'new-tiny-model-generation' into fix-tiny-glm4-moe
qgallouedec Apr 29, 2026
36baabe
rm smoke test for enc dec
qgallouedec Apr 29, 2026
b80a1fb
Merge branch 'new-tiny-model-generation' into fix-tiny-glm4-moe
qgallouedec Apr 29, 2026
b231373
Upload testing suite for `DistillationTrainer` (#5615)
cmpatino Apr 30, 2026
abb98ac
Fix OOM in CI by reducing batch size in VLM SFT tests (#5687)
albertvillanova Apr 30, 2026
9f19b4a
Fix OOM in CI test reruns due to GPU memory leak from traceback frame…
albertvillanova Apr 30, 2026
3f56be7
Add training-invariance tests (#5686)
qgallouedec Apr 30, 2026
d232332
Regenerate invariance data + relax the tolerance (#5688)
qgallouedec May 1, 2026
3cde729
Merge remote-tracking branch 'origin/main' into new-tiny-model-genera…
qgallouedec May 3, 2026
e05a232
Merge branch 'new-tiny-model-generation' into fix-tiny-glm4-moe
qgallouedec May 3, 2026
bd5e693
gemma3
qgallouedec May 4, 2026
9e3b6d6
Merge branch 'main' into new-tiny-model-generation
qgallouedec May 4, 2026
6366461
Merge branch 'new-tiny-model-generation' into fix-tiny-glm4-moe
qgallouedec May 4, 2026
3bc2a57
fix conftest
qgallouedec May 4, 2026
4c5ac17
Merge branch 'main' into new-tiny-model-generation
qgallouedec May 5, 2026
8967867
Merge branch 'new-tiny-model-generation' into fix-tiny-glm4-moe
qgallouedec May 5, 2026
6ba6f64
Merge remote-tracking branch 'origin/main' into fix-tiny-glm4-moe
qgallouedec May 5, 2026
9a76c3d
Skip GLM4 model tests for transformers version < 5.0.0
qgallouedec May 5, 2026
5d8ba77
Merge branch 'main' into fix-tiny-glm4-moe
qgallouedec May 6, 2026
e513313
Merge branch 'main' into fix-tiny-glm4-moe
qgallouedec May 6, 2026
40f2916
Merge branch 'main' into fix-tiny-glm4-moe
qgallouedec May 9, 2026
2d7c42a
fix
qgallouedec May 9, 2026
c4d64d1
Merge branch 'main' into fix-tiny-glm4-moe
qgallouedec May 11, 2026
baa5e0e
Merge branch 'main' into fix-tiny-glm4-moe
qgallouedec May 12, 2026
7e7e558
Merge branch 'main' into fix-tiny-glm4-moe
qgallouedec May 14, 2026
995f20b
style
qgallouedec May 14, 2026
1961d87
revert conftest
qgallouedec May 14, 2026
@@ -32,14 +32,23 @@
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
generation_config = GenerationConfig.from_pretrained(MODEL_ID)
config = Glm4MoeConfig(
-    vocab_size=len(tokenizer.vocab),
+    vocab_size=151365,
    hidden_size=8,
    num_attention_heads=4,
    num_key_value_heads=2,
    num_hidden_layers=2,
    intermediate_size=32,
    moe_intermediate_size=32,
    head_dim=2,
    n_routed_experts=4,
    num_experts_per_tok=2,
    attention_bias=True,
    eos_token_id=[151329, 151336, 151338],
    pad_token_id=151329,
    rope_theta=1000000,
    routed_scaling_factor=2.5,
    use_qk_norm=True,
    num_nextn_predict_layers=1,
)
model = Glm4MoeForCausalLM(config).to(dtype=torch.bfloat16)
init_weights_tiny_model(model)
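The hyperparameters in this hunk have to stay mutually consistent once `vocab_size` is a hardcoded number instead of being read from the tokenizer. The following is a minimal sketch of such a sanity check, not code from this PR: `check_tiny_config` and `TINY_GLM4_MOE` are hypothetical names, and a plain dict stands in for `Glm4MoeConfig` so the snippet runs without `transformers` installed. The values are copied from the diff; the three checks are general constraints (token ids must index the embedding table, query heads must divide evenly into key/value groups, top-k routing cannot pick more experts than exist).

```python
# Hyperparameters copied from the diff above, held in a plain dict so the
# sketch runs without transformers installed.
TINY_GLM4_MOE = {
    "vocab_size": 151365,
    "num_attention_heads": 4,
    "num_key_value_heads": 2,
    "n_routed_experts": 4,
    "num_experts_per_tok": 2,
    "eos_token_id": [151329, 151336, 151338],
    "pad_token_id": 151329,
}


def check_tiny_config(cfg):
    """Hypothetical helper: return a list of consistency problems (empty if none)."""
    problems = []

    # Every special token id must be a valid row index of the embedding table.
    special_ids = list(cfg["eos_token_id"]) + [cfg["pad_token_id"]]
    for token_id in special_ids:
        if not 0 <= token_id < cfg["vocab_size"]:
            problems.append(
                f"token id {token_id} outside vocab_size {cfg['vocab_size']}"
            )

    # Grouped-query attention shares each key/value head across a whole
    # group of query heads, so the head counts must divide evenly.
    if cfg["num_attention_heads"] % cfg["num_key_value_heads"] != 0:
        problems.append("num_attention_heads is not a multiple of num_key_value_heads")

    # Top-k routing cannot select more experts than exist.
    if cfg["num_experts_per_tok"] > cfg["n_routed_experts"]:
        problems.append("num_experts_per_tok exceeds n_routed_experts")

    return problems


print(check_tiny_config(TINY_GLM4_MOE))  # prints []
```

With the values from the diff, the largest special token id (151338) is below the hardcoded `vocab_size` of 151365, so the check returns an empty list. Shrinking `vocab_size` below any of the listed ids would be reported as a problem.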