fix: top p for full version
fix: validation top p as in paper
fix: remove unnecessary verify from naive
chore: format
feat: CI for DAPO
In the GRPO example scripts, modify the entropy_coeff to 0 to ensur…
Pull request merge
fix: prevent NaN when all items are intentionally masked by adding ep…
Pull request merge
chore: wandb run of an early version