fix: on GB200 use single-thread checkpoint save to avoid Cpu OOM #1703

Merged: terrykong merged 4 commits into NVIDIA-NeMo:main from guyueh1:fix_gb200_dpsk_ckpt on Jan 5, 2026
Merge branch 'main' into fix_gb200_dpsk_ckpt

f37f830
DCO / DCO succeeded Jan 5, 2026 in 0s

Commit sign-off was manually approved.