Closed
40 commits
219b0cf
hybrid-cp feature for dev branch (Author: Parth Kunlun Tailai)
xiaoyao0115 Oct 16, 2025
c76f060
add thd support to run sft training with hybrid-cp
xiaoyao0115 Oct 20, 2025
a6c1818
pack sequence and wrap data_iterator
xiaoyao0115 Oct 27, 2025
d06805f
fix nan bugs, need to fix flops calculation.
xiaoyao0115 Nov 2, 2025
4f529db
support tp
xiaoyao0115 Nov 5, 2025
11d9960
support hybrid-cp+fsdp and support pp
xiaoyao0115 Nov 12, 2025
65f457e
hybrid-cp feature for dev branch (Author: Parth Kunlun Tailai)
xiaoyao0115 Oct 16, 2025
575dc00
add thd support to run sft training with hybrid-cp
xiaoyao0115 Oct 20, 2025
b495c9f
pack sequence and wrap data_iterator
xiaoyao0115 Oct 27, 2025
19cc585
fix nan bugs, need to fix flops calculation.
xiaoyao0115 Nov 2, 2025
9291da2
support tp
xiaoyao0115 Nov 5, 2025
b3a3190
support hybrid-cp+fsdp and support pp
xiaoyao0115 Nov 12, 2025
be8d859
Use NullTokenizer for mock data; TFLOPs calculation to be fixed later
xiaoyao0115 Nov 18, 2025
1167117
debugging nan issue when using FSDP+THD
xiaoyao0115 Dec 3, 2025
0d2c232
add hotswitch solver
Dec 4, 2025
501a5f6
add only_packing_no_scheduling for hybrid-cp
xiaoyao0115 Dec 4, 2025
37678db
add gpu_timer and pipeline simulator
Dec 8, 2025
91bf9b2
add support for vpp and debug moe nan issue
xiaoyao0115 Dec 8, 2025
d737717
merge remote branch
Dec 9, 2025
36bef70
merge remote branch
Dec 9, 2025
b66cb47
hybrid-cp feature for dev branch (Author: Parth Kunlun Tailai)
xiaoyao0115 Oct 16, 2025
a7e15f0
clean code, need to fix thd+fsdp nan issue and vpp input tensor out-o…
xiaoyao0115 Dec 10, 2025
f1fec85
add profile memory; add multiprocess solver
Dec 11, 2025
e1b244e
interface change
Dec 22, 2025
7cb4878
fix solver
Dec 23, 2025
c0beac7
resolve merge remote conflict
Dec 23, 2025
045b657
merge conflict
Dec 23, 2025
01282e2
fix mem simulation for hotcp samples && fix TP bcast
Dec 24, 2025
c640108
simplify indexed dataset code
Dec 24, 2025
0996e2d
Merge branch 'hybrid-cp-example' of https://github.com/AndyBug0/Megat…
Dec 25, 2025
c28de99
adaptation
Dec 30, 2025
8eaf516
modify script
Dec 30, 2025
ca5b537
add stash commit
Dec 30, 2025
afa67c2
optimize packing perf
Dec 30, 2025
b3782c7
optimize packing perf
Dec 30, 2025
266c548
add flag for async scheduler and new scheduler
Dec 30, 2025
333ad89
fix launch script
Dec 30, 2025
c828a9e
resolve merge conflict and fix some bugs
Dec 30, 2025
8017417
Merge pull request #1 from AndyBug0/wcy/hybrid_cp_example
AndyBug0 Dec 30, 2025
3b309c6
limit thread number in background process
Jan 20, 2026
20 changes: 20 additions & 0 deletions examples/nsys_profile_rank.sh
@@ -0,0 +1,20 @@
#!/bin/bash

set -e

# nsys profile -t cuda,nvtx,osrt -s none --cpuctxsw none --python-sampling true --python-sampling-frequency 1000 $@ || true

# nsys profile -t cuda,nvtx,osrt --force-overwrite true \
# --capture-range=cudaProfilerApi --capture-range-end=stop --gpu-metrics-device=0 \
# --python-sampling-frequency 1000 --python-sampling true \
# $@ || true

nsys profile -w true -t cublas,cuda,nvtx,osrt -s cpu -c cudaProfilerApi -o "$NSYS_DIR/datetime_${DATETIME}_gpt_sft_hetero_cp_iter2_4_flash_global_8192_rank${OMPI_COMM_WORLD_RANK}" "$@" || true

# PROFILE_RANKS=(0 1 2 3 4 5 6 7 8)

# if [[ " ${PROFILE_RANKS[*]} " =~ " $OMPI_COMM_WORLD_RANK " ]]; then
# nsys profile -w true -t cublas,cuda,nvtx,osrt -s cpu -c cudaProfilerApi -o "datetime_${DATETIME}_gpt_sft_hetero_cp_iter2_4_flash_global_8192_rank${OMPI_COMM_WORLD_RANK}" $@ || true
# else
# $@ || true
# fi
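
Review note: the commented-out block above gates profiling on the rank. A minimal sketch of that idea with quoted arguments follows. It swaps the regex match for a plain loop. The names `should_profile_rank` and `run_with_optional_profile`, the fallback values for `OMPI_COMM_WORLD_RANK` and `NSYS_DIR`, and the shortened output name are all assumptions for illustration and are not part of this PR. The `nsys` flags are the ones already used in the script.

```shell
#!/bin/bash

# Hypothetical helper (not in the PR): succeeds when the rank given as the
# first argument equals one of the remaining arguments.
should_profile_rank() {
    local rank="$1"
    shift
    local candidate
    for candidate in "$@"; do
        if [[ "$candidate" == "$rank" ]]; then
            return 0
        fi
    done
    return 1
}

# Hypothetical wrapper (not in the PR): runs the command under nsys on the
# ranks listed in PROFILE_RANKS and runs it unprofiled on every other rank.
run_with_optional_profile() {
    # Assumed fallbacks so the sketch also works outside an MPI launch.
    local rank="${OMPI_COMM_WORLD_RANK:-0}"
    local out_dir="${NSYS_DIR:-.}"
    if should_profile_rank "$rank" "${PROFILE_RANKS[@]}"; then
        # Same nsys flags as the script above; output name shortened here.
        nsys profile -w true -t cublas,cuda,nvtx,osrt -s cpu -c cudaProfilerApi \
            -o "${out_dir}/profile_rank${rank}" "$@"
    else
        # Quoting "$@" keeps arguments that contain spaces intact.
        "$@"
    fi
}
```

With `PROFILE_RANKS=(0 1)` set, the script could end with `run_with_optional_profile "$@"`, so only ranks 0 and 1 would write a report. Unlike the `|| true` form, this sketch passes the wrapped command's exit status through.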