New WR 156s (1.25% better than PR #122): Optimize distributed training, improve skip connection gating, and enhance bfloat16 usage #125

Merged: ClassicLarry merged 11 commits into KellerJordan:master from bernard24:new_wr on Oct 15, 2025

Commits

Commits on Aug 23, 2025

Commits on Aug 27, 2025

Commits on Sep 2, 2025

Commits on Sep 11, 2025