-
Hi @Joo236. One approach is to implement custom logic in the code that runs the RL loop (getting the actions from the agent and stepping the environment). For example, in the case of the skrl library, you can edit the trainer base class (which runs the RL loop for single-agent experiments). Note the 3 blocks delimited by the `# N. +++` / `# ---` comments:
```python
def single_agent_train(self) -> None:
    """Train a single agent

    This method executes the following steps in loop:

    - Pre-interaction
    - Compute actions
    - Interact with the environments
    - Render scene
    - Record transitions
    - Post-interaction
    - Reset environments
    """
    assert self.num_agents == 1, "This method is only valid for a single agent"

    # 1. +++++++++++++++++++++++++++
    last_done = None
    only_once_actions = None
    # ------------------------------

    # reset env
    states = self.env.reset()

    for timestep in range(self.initial_timestep, self.timesteps):
        # show progress
        self.show_progress(timestep=timestep, timesteps=self.timesteps)

        # pre-interaction
        self.agents.pre_interaction(timestep=timestep, timesteps=self.timesteps)

        # compute actions
        with torch.no_grad():
            actions, _, _ = self.agents.act(states, inference=True, timestep=timestep, timesteps=self.timesteps)

        # 2. ++++++++++++++++++++++++++++
        if only_once_actions is None:
            only_once_actions = actions.clone()
        if last_done is not None and last_done.any():
            indices = last_done.nonzero()
            only_once_actions[indices[:, 0]] = actions[indices[:, 0]]
        actions = only_once_actions.clone()
        # ------------------------------

        # step the environments
        next_states, rewards, dones, infos = self.env.step(actions)

        # 3. +++++++++++++++++++++++++++
        last_done = dones.clone()
        # ------------------------------

        # render scene
        if not self.headless:
            self.env.render()

        # record the environments' transitions
        with torch.no_grad():
            self.agents.record_transition(states=states,
                                          actions=actions,
                                          rewards=rewards,
                                          next_states=next_states,
                                          dones=dones,
                                          infos=infos,
                                          timestep=timestep,
                                          timesteps=self.timesteps)

        # post-interaction
        self.agents.post_interaction(timestep=timestep, timesteps=self.timesteps)

        # reset environments
        with torch.no_grad():
            if dones.any():
                states = self.env.reset()
            else:
                states.copy_(next_states)
```
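For reference, this is roughly how the modified loop would then be driven. It is only a minimal sketch: it assumes a skrl version whose base `Trainer` exposes `single_agent_train` (as in the snippet above), and `env` and `agent` are placeholders for an already-wrapped Isaac Gym environment and a configured PPO agent, which are not defined here.

```python
from skrl.trainers.torch import SequentialTrainer

# placeholders (assumptions): `env` is an Isaac Gym environment already wrapped
# for skrl, and `agent` is a configured PPO agent
cfg_trainer = {"timesteps": 16000, "headless": True}
trainer = SequentialTrainer(cfg=cfg_trainer, env=env, agents=agent)

# for single-agent setups this dispatches to the edited single_agent_train(),
# so the "only-once actions" logic above takes effect
trainer.train()
```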
-
Hi,
thank you for this great open-source library. I am currently trying to use PPO from this library in conjunction with Isaac Gym.
More specifically, I am trying to find a way to update the actions only once during the whole episode, which means one action should be sampled from the action buffer at the beginning of the episode and remain constant until the episode ends. Is there a way to do this?