Zero2: avoid graph breaks in torch.compile by using param_idx #6803

nelyahu · 2024-11-28T06:29:00Z

Inside reduce_independent_p_g_buckets_and_remove_grads and reduce_ipg_grads, both of which run during the backward (BWD) hook in ZeRO-2, the model param itself is stored in params_in_ipg_bucket. torch.compile has a hard time tracing parameters, which leads to graph breaks.
By storing the param's static index inside its group instead, the same logic can be maintained with less complexity.
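A minimal sketch of the idea, not the DeepSpeed implementation: `ToyParam` and `ToyBucket` are hypothetical stand-ins, and only the names `params_in_ipg_bucket`, `bit16_groups` and `param_idx_in_group` come from the change itself. The bucket holds integer indices only, and the parameter is looked up from its group at reduce time.

```python
from dataclasses import dataclass, field


@dataclass
class ToyParam:
    """Hypothetical stand-in for a model parameter (not a torch type)."""
    name: str
    grad: float


@dataclass
class ToyBucket:
    """Hypothetical, simplified stand-in for the ZeRO-2 IPG bucket."""
    bit16_groups: list  # list of parameter groups, each a list of ToyParam
    params_in_ipg_bucket: list = field(default_factory=list)

    def add(self, group_idx, param_idx_in_group, param_id):
        # Store only integers; no parameter object is captured in the bucket.
        self.params_in_ipg_bucket.append((group_idx, param_idx_in_group, param_id))

    def reduce(self):
        reduced = []
        for i, param_idx_in_group, param_id in self.params_in_ipg_bucket:
            # Resolve the parameter from its static position only when needed.
            param = self.bit16_groups[i][param_idx_in_group]
            reduced.append((param_id, param.name, param.grad))
        self.params_in_ipg_bucket.clear()
        return reduced


groups = [[ToyParam("w0", 0.5), ToyParam("b0", 0.1)], [ToyParam("w1", 0.25)]]
bucket = ToyBucket(groups)
bucket.add(0, 1, param_id=1)
bucket.add(1, 0, param_id=2)
result = bucket.reduce()
```

Because the indices are static for the lifetime of the optimizer groups, the lookup returns the same object that would otherwise have been stored directly.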


deepspeed/runtime/zero/stage_1_and_2.py

assign param according to the param idx Co-authored-by: Olatunji Ruwase <[email protected]>

loadams · 2024-12-16T23:09:46Z

deepspeed/runtime/zero/stage_1_and_2.py

@@ -1067,7 +1068,8 @@ def average_tensor(self, tensor):

            process_group = self.dp_process_group
            # count = 0
-            for i, param, param_id in self.params_in_ipg_bucket:
+            for i, param_idx_in_group, param_id in self.params_in_ipg_bucket:
+                param = self.bit16_groups[group_idx][param_idx_in_group]


@nelyahu - this is failing on group_idx just fyi

fix index value to 'i' instead of 'group_idx'
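The thread does not say how the lookup failed. One plausible reading, stated here as an assumption, is that no `group_idx` name is bound inside the loop, while the first tuple element `i` already carries the group index. A toy reproduction with plain lists standing in for `self.bit16_groups`:

```python
# Toy stand-ins (hypothetical data, not DeepSpeed objects).
bit16_groups = [["w0", "b0"], ["w1"]]
# Each entry is (i, param_idx_in_group, param_id), as in the diff above.
params_in_ipg_bucket = [(0, 1, 101), (1, 0, 102)]


def lookup_with_group_idx():
    out = []
    for i, param_idx_in_group, param_id in params_in_ipg_bucket:
        # 'group_idx' is never bound in this scope (assumed failure mode).
        out.append(bit16_groups[group_idx][param_idx_in_group])
    return out


def lookup_with_i():
    # Use the loop variable 'i' as the group index instead.
    return [bit16_groups[i][p] for i, p, _ in params_in_ipg_bucket]


try:
    lookup_with_group_idx()
    failed = False
except NameError:
    failed = True

fixed = lookup_with_i()
```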

nelyahu requested review from tjruwase, loadams and tohtana as code owners November 28, 2024 06:29

tjruwase reviewed Dec 2, 2024


nelyahu and others added 2 commits December 11, 2024 11:56

Update deepspeed/runtime/zero/stage_1_and_2.py

7d1c883

assign param according to the param idx Co-authored-by: Olatunji Ruwase <[email protected]>

Merge branch 'master' into zero2_param_idx

726deee

tjruwase approved these changes Dec 12, 2024


tjruwase and others added 3 commits December 11, 2024 20:06

Merge branch 'master' into zero2_param_idx

9cb29d8

Merge branch 'master' into zero2_param_idx

4bb8f91

Formatting, fix indent

f7e8d53

loadams reviewed Dec 16, 2024


nelyahu and others added 2 commits December 17, 2024 09:51

Update stage_1_and_2.py

7606795

fix index value to 'i' instead of 'group_idx'

Merge branch 'master' into zero2_param_idx

9e58c41
