-
Hi @Joo236. One approach is to implement custom logic in the code that runs the RL loop (getting the actions from the agent and stepping the environment). For example, in the case of the skrl library, you can edit the trainer base class (which runs the RL loop for single-agent experiments). Note the 3 blocks delimited by the `# N. +++` / `# ---` comments:
```python
def single_agent_train(self) -> None:
    """Train a single agent

    This method executes the following steps in loop:

    - Pre-interaction
    - Compute actions
    - Interact with the environments
    - Render scene
    - Record transitions
    - Post-interaction
    - Reset environments
    """
    assert self.num_agents == 1, "This method is only valid for a single agent"

    # 1. +++++++++++++++++++++++++++
    last_done = None
    only_once_actions = None
    # ------------------------------

    # reset env
    states = self.env.reset()

    for timestep in range(self.initial_timestep, self.timesteps):
        # show progress
        self.show_progress(timestep=timestep, timesteps=self.timesteps)

        # pre-interaction
        self.agents.pre_interaction(timestep=timestep, timesteps=self.timesteps)

        # compute actions
        with torch.no_grad():
            actions, _, _ = self.agents.act(states, inference=True, timestep=timestep, timesteps=self.timesteps)

        # 2. ++++++++++++++++++++++++++++
        if only_once_actions is None:
            only_once_actions = actions.clone()
        if last_done is not None and last_done.any():
            indices = last_done.nonzero()
            only_once_actions[indices[:, 0]] = actions[indices[:, 0]]
        actions = only_once_actions.clone()
        # ------------------------------

        # step the environments
        next_states, rewards, dones, infos = self.env.step(actions)

        # 3. +++++++++++++++++++++++++++
        last_done = dones.clone()
        # ------------------------------

        # render scene
        if not self.headless:
            self.env.render()

        # record the environments' transitions
        with torch.no_grad():
            self.agents.record_transition(states=states,
                                          actions=actions,
                                          rewards=rewards,
                                          next_states=next_states,
                                          dones=dones,
                                          infos=infos,
                                          timestep=timestep,
                                          timesteps=self.timesteps)

        # post-interaction
        self.agents.post_interaction(timestep=timestep, timesteps=self.timesteps)

        # reset environments
        with torch.no_grad():
            if dones.any():
                states = self.env.reset()
            else:
                states.copy_(next_states)
```
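For reference, this is roughly how the modified loop would then be driven. It is only a minimal sketch: it assumes a skrl version whose base `Trainer` exposes `single_agent_train` (as in the snippet above), and `env` and `agent` are placeholders for an already-wrapped Isaac Gym environment and a configured PPO agent, which are not defined here.

```python
from skrl.trainers.torch import SequentialTrainer

# placeholders (assumptions): `env` is an Isaac Gym environment already wrapped
# for skrl, and `agent` is a configured PPO agent
cfg_trainer = {"timesteps": 16000, "headless": True}
trainer = SequentialTrainer(cfg=cfg_trainer, env=env, agents=agent)

# for single-agent setups this dispatches to the edited single_agent_train(),
# so the "only-once actions" logic above takes effect
trainer.train()
```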
-
Hi,
thank you for this great open-source library. I am currently trying to use PPO from this library in conjunction with Isaac Gym.
More specifically, I am trying to find a way to update the actions only once during the whole episode, which means one action should be sampled from the action buffer at the beginning of the episode and remain constant until the episode ends. Is there a way to do this?