Commit 2beead7
[PP] move FSDP reduce scatters to end of step (pytorch#165106)
Move the FSDP reduce-scatters to the end of the PP step. The reduce-scatter's sync with the compute stream blocks the other stages from executing their backward passes, which leads to pipeline bubbles. There should be a way to execute these reduce-scatters (RS) earlier, but this is a quick fix for now.
<img width="1056" height="463" alt="image" src="https://github.com/user-attachments/assets/b945dd55-8ab1-4acc-b862-c6e2e476b834" />
Pull Request resolved: pytorch#165106
Approved by: https://github.com/weifengpy
ghstack dependencies: pytorch#164976
1 parent 3a110c9 commit 2beead7
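The effect described in the commit message can be illustrated with a toy timeline. This is only a sketch under stated assumptions: it models a single stream where every action blocks the next one, and the names `Action`, `eager_schedule`, `deferred_schedule`, and `backward_finish_times`, as well as the costs, are hypothetical and are not part of `torch.distributed.pipelining` or FSDP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str   # "B" = backward of one microbatch, "RS" = reduce-scatter
    stage: int  # index of the pipeline stage the action belongs to


def eager_schedule(backwards):
    """Issue each stage's reduce-scatter right after that stage's last backward."""
    last_index = {a.stage: i for i, a in enumerate(backwards)}
    out = []
    for i, a in enumerate(backwards):
        out.append(a)
        if last_index[a.stage] == i:
            out.append(Action("RS", a.stage))
    return out


def deferred_schedule(backwards):
    """Run every backward first, then all reduce-scatters at the end of the step."""
    stages = []
    for a in backwards:
        if a.stage not in stages:
            stages.append(a.stage)
    return list(backwards) + [Action("RS", s) for s in stages]


def backward_finish_times(schedule, b_cost=1, rs_cost=3):
    """Toy model (assumption): one stream, each action blocks the next."""
    t, finish = 0, []
    for a in schedule:
        t += b_cost if a.kind == "B" else rs_cost
        if a.kind == "B":
            finish.append(t)
    return finish, t


# Two stages on one rank, two microbatches each (hypothetical workload).
backwards = [Action("B", 1), Action("B", 1), Action("B", 0), Action("B", 0)]
print(backward_finish_times(eager_schedule(backwards)))     # ([1, 2, 6, 7], 10)
print(backward_finish_times(deferred_schedule(backwards)))  # ([1, 2, 3, 4], 10)
```

In this model the total step time is unchanged, but the last backward finishes at t=4 instead of t=7 when the reduce-scatters are deferred, so stages waiting on those backwards are unblocked sooner.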
File tree
3 files changed, +40 -38 lines changed
- test/distributed
- torch/distributed/pipelining
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 146-151 | 146-152 | +1 (new line 149) |
| 163-168 | 164-170 | +1 (new line 167) |
| 185-190 | 187-193 | +1 (new line 190) |
| 523-530 | 526-533 | -2 (original lines 526-527), +2 (new lines 529-530) |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 625-630 | 625-634 | +4 (new lines 628-631) |
| 773-782 | 777-782 | -4 (original lines 776-779) |
| 951-960 | 951-956 | -4 (original lines 954-957) |
| 1555-1560 | 1551-1562 | +6 (new lines 1554-1559) |
| 2086-2100 | 2088-2099 | -3 (original lines 2089, 2096-2097) |
| 2131-2143 | 2130-2139 | -3 (original lines 2134, 2139-2140) |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 651-678 | 651-656 | -22 (original lines 654-675) |
| 998-1003 | 976-1006 | +25 (new lines 979-1003) |