
Conversation

@puyuan1996
Collaborator

Description

Related Issue

TODO

Check List

  • merge the latest version source branch/repo, and resolve all the conflicts
  • pass style check
  • pass all the tests

if param.grad is not None:
    allreduce(param.grad.data)
else:
    # If the gradient is None, create a zero tensor of the same size as the parameter and allreduce it
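The fallback described in this comment can be sketched without torch, using a stand-in `allreduce` callable; `sync_grad`, `param_numel`, and the list-based zero buffer are illustrative names, not the PR's actual code:

```python
def sync_grad(param_grad, param_numel, allreduce):
    """Reduce the gradient, or zeros when it is missing (sketch, assumed names)."""
    # If the gradient exists, reduce it directly.
    if param_grad is not None:
        return allreduce(param_grad)
    # Otherwise reduce a zero buffer of the same size, so every rank still
    # issues the same collective call and the allreduce does not deadlock.
    return allreduce([0.0] * param_numel)
```

The point of the zero buffer is that collectives must be called the same number of times on every rank, even by parameters that received no gradient this step.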
Member

remove the commented code and add an English comment, then these modifications will be merged

Collaborator Author

done

import datetime
# The timeout env var set before launch does not take effect, so pass the timeout explicitly here
dist.init_process_group(backend=backend, rank=rank, world_size=world_size, timeout=datetime.timedelta(seconds=60000))
Member

why add this

Collaborator Author

This is because the environment variable set before launching the program does not take effect; the timeout only works when passed in explicitly here. I have changed it to a default value, as the last parameter.
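The reply above can be sketched as a small helper: the timeout becomes the last parameter with a default, so it is explicit at the `init_process_group` call site but invisible to existing callers. `build_init_kwargs` is a hypothetical name; only the parameter order and the 60000 s default come from the discussion:

```python
import datetime

def build_init_kwargs(backend, rank, world_size,
                      timeout=datetime.timedelta(seconds=60000)):
    # timeout is the last parameter with a default, so existing callers
    # that never pass it keep the 60000 s behavior from this PR
    return dict(backend=backend, rank=rank, world_size=world_size, timeout=timeout)
```

The resulting dict would be splatted into `dist.init_process_group(**kwargs)`.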

# if self._rank == 0:
#     self._monitor = get_simple_monitor_type(self._policy.monitor_vars())(TickTime(), expire=10)

self._monitor = get_simple_monitor_type(self._policy.monitor_vars())(TickTime(), expire=10)
Member

add an argument named only_monitor_rank0 to control the logic, defaults to True
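A minimal sketch of the requested `only_monitor_rank0` argument, using a plain class and a monitor factory callable in place of the real learner and `get_simple_monitor_type`; all names here are stand-ins:

```python
class LearnerSketch:
    """Sketch of the requested change, assuming a rank attribute and a monitor factory."""

    def __init__(self, rank, monitor_factory, only_monitor_rank0=True):
        self._rank = rank
        self._monitor = None
        # With the default, only rank 0 builds a monitor; passing False
        # restores the old behavior of monitoring on every rank.
        if not only_monitor_rank0 or rank == 0:
            self._monitor = monitor_factory()
```

Defaulting to `True` keeps multi-rank runs quiet while leaving single-process behavior unchanged, since rank 0 always monitors.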

Collaborator Author

done

for k in engine.log_buffer:
    engine.log_buffer[k].clear()
return
# if engine.rank != 0:
Member

also pass the only_monitor_rank0 argument to the hook class
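The hook-side change could look like the following sketch, with a stub engine instead of the real one; `LogReduceHookSketch` and the string return values are illustrative only:

```python
class LogReduceHookSketch:
    """Sketch: the hook also receives only_monitor_rank0 (assumed wiring)."""

    def __init__(self, only_monitor_rank0=True):
        self._only_monitor_rank0 = only_monitor_rank0

    def __call__(self, engine):
        # On non-rank-0 processes, just clear the log buffers and return early.
        if self._only_monitor_rank0 and engine.rank != 0:
            for k in engine.log_buffer:
                engine.log_buffer[k].clear()
            return 'cleared'
        # Rank 0 (or every rank when the flag is False) would run the real logging path.
        return 'logged'
```

Threading the same flag through both the learner and the hook keeps the two early-exit conditions consistent.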

Collaborator Author

done

    self._global_state_encoder = nn.Identity()
elif len(global_obs_shape) == 3:
    self._mixer = Mixer(agent_num, embedding_size, embedding_size, activation=activation)
    self._global_state_encoder = ConvEncoder(global_obs_shape, hidden_size_list=hidden_size_list, activation=activation, norm_type='BN')
Member

why BN rather than using LN as default here

agent_state, global_state = agent_state.unsqueeze(0), global_state.unsqueeze(0)
agent_state = agent_state.unsqueeze(0)
if single_step and len(global_state.shape) == 2:
    global_state = global_state.unsqueeze(0)
Member

add shape comments
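The requested shape comments might look like this numpy sketch of the unsqueeze logic; the `B`/`A`/`N`/`M` dimension names are assumptions, not taken from the PR:

```python
import numpy as np

def prepare_states(agent_state, global_state, single_step=True):
    """Sketch: prepend a time dim of 1 for single-step inputs (assumed shapes)."""
    if single_step:
        # agent_state: (B, A, N) -> (T=1, B, A, N)
        agent_state = np.expand_dims(agent_state, 0)
        if global_state.ndim == 2:
            # global_state: (B, M) -> (T=1, B, M); 3-dim inputs already carry T
            global_state = np.expand_dims(global_state, 0)
    return agent_state, global_state
```

The extra `ndim` guard is what the second `if` in the diff encodes: an image-like global state already has more than two dims and must not gain a spurious axis.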

agent_q_act = agent_q_act.squeeze(-1)  # T, B, A
if self.mixer:
    global_state_embedding = self._global_state_encoder(global_state)
    if len(global_state.shape) == 5:
Member

add some comments

"""
if self.share_encoder:
    x = self.encoder(x)
Member

modify the corresponding API comments, and use isinstance(x, dict) to control the logic
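A sketch of the `isinstance(x, dict)`-controlled logic, with a toy encoder standing in for the real `nn.Module`; `EncoderSketch` and the dict-of-modalities branch are assumptions about the intended shape of the change:

```python
class EncoderSketch:
    """Sketch: dispatch on input type, as requested in review (assumed names)."""

    def __init__(self, share_encoder=True):
        self.share_encoder = share_encoder

    def encoder(self, x):
        return x * 2  # stand-in for a real nn.Module forward

    def forward(self, x):
        # isinstance(x, dict) controls the logic: dict inputs are encoded
        # per key, plain tensors go through the shared encoder when enabled.
        if isinstance(x, dict):
            return {k: self.encoder(v) for k, v in x.items()}
        if self.share_encoder:
            x = self.encoder(x)
        return x
```

Dispatching on the input type keeps a single `forward` signature while supporting both plain observations and keyed multi-input observations.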

Collaborator Author

done

@PaParaZz1 PaParaZz1 added enhancement New feature or request env Questions about RL environment labels Mar 10, 2025
@PaParaZz1 PaParaZz1 changed the title WIP: feature(pu): adapt to unizero-multitask ddp, and adapt ppo to support jericho config feature(pu): adapt to unizero-multitask ddp, and adapt ppo to support jericho config Mar 10, 2025
@puyuan1996
Collaborator Author

We have a new polished PR: #860

@puyuan1996 puyuan1996 closed this Mar 12, 2025
