Code for "Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking"

Code:

train_mod_add.py: Train a two-layer ReLU net on modular addition.
train_mod_add_nowd.py: Train a two-layer ReLU net on modular addition without weight decay. A special learning rate schedule is applied to speed up the training in the late phase.
train_diag_cls.py: Train a diagonal linear net on sparse linear classification.
train_diag_cls2.py: Train a diagonal linear net on linear classification, where the data has a very large L2 margin.
train_mc.py: Solve an overparameterized matrix completion problem.
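
For readers unfamiliar with the modular addition task used by the first two scripts, the following is a minimal, standard-library-only sketch of how such a dataset could be built. It is not taken from this repository: the function names `make_mod_add_data` and `one_hot_pair`, the modulus `p=7`, the 50% train fraction, and the concatenated one-hot input encoding are all illustrative assumptions, and the actual scripts may construct and encode the data differently.

```python
import random


def make_mod_add_data(p=7, train_frac=0.5, seed=0):
    """Hypothetical helper: enumerate all pairs (a, b) with label (a + b) mod p,
    shuffle them, and split into train and test sets."""
    examples = [((a, b), (a + b) % p) for a in range(p) for b in range(p)]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(examples)
    n_train = int(train_frac * len(examples))
    return examples[:n_train], examples[n_train:]


def one_hot_pair(a, b, p):
    """Hypothetical encoding: concatenate one-hot vectors of a and b
    into a single input vector of length 2p."""
    x = [0.0] * (2 * p)
    x[a] = 1.0
    x[p + b] = 1.0
    return x


# Assumed settings for illustration only.
train, test = make_mod_add_data(p=7, train_frac=0.5)
(a, b), label = train[0]
x = one_hot_pair(a, b, p=7)
```

Training on only a fraction of the p² pairs and evaluating on the held-out remainder is what makes a delayed jump in test accuracy observable in this kind of setup.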
