
My custom env training goes on forever #1032

Closed

Abdelrahman-Alkhodary opened this issue Aug 25, 2022 · 6 comments
Labels
custom gym env (Issue related to Custom Gym Env), more information needed (Please fill the issue template completely), question (Further information is requested)

Comments

@Abdelrahman-Alkhodary

Hello everyone,

I have created my own custom environment following the example in the docs and ran the env checker; it passed except for a warning about box bound precision. Now I am trying to train a model with the TD3 algorithm, but the training runs forever even though I set total_timesteps=1 just to check that there is nothing wrong with my env or the training. You can find my gym env here:
https://github.com/Abdelrahman-Alkhodary/Custom-gym-env

and this is the system info:
sb3.get_system_info()
OS: Linux-5.15.0-46-generic-x86_64-with-debian-bullseye-sid #49~20.04.1-Ubuntu SMP Thu Aug 4 19:15:44 UTC 2022
Python: 3.7.9
Stable-Baselines3: 1.6.0
PyTorch: 1.11.0+cu102
GPU Enabled: True
Numpy: 1.19.2
Gym: 0.21.0


@Abdelrahman-Alkhodary Abdelrahman-Alkhodary added custom gym env Issue related to Custom Gym Env question Further information is requested labels Aug 25, 2022
@araffin araffin added the more information needed Please fill the issue template completely label Aug 26, 2022
@Abdelrahman-Alkhodary
Author

Hello @araffin
The custom environment that I made is a reaching task: there is a goal and my agent tries to reach it. The step function is driven by a neural network; the network reads the observation and gives the action. I have trained TD3 manually, without using Stable-Baselines, and it worked. When I tried to use Stable-Baselines, model.learn() runs forever even though I set total_timesteps=1 just to check. I also put a dummy counter inside the step function and print the count every 1000 steps to see whether the model is stuck somewhere else. Please tell me exactly what is unclear; I have also included my code.

@araffin
Member

araffin commented Aug 26, 2022

Hello,
please provide a minimal code example to reproduce the error as requested by the custom env issue template.

See #982 (comment) for a definition of what is "minimal" ;)

@Abdelrahman-Alkhodary
Author

Abdelrahman-Alkhodary commented Aug 26, 2022

import random
import gym
from gym import spaces
import numpy as np
import pandas as pd
class SofaArmEnv(gym.Env):
    """Custom Environment that follows gym interface"""
    metadata = {'render.modes': ['human']}
    def __init__(self):
        super(SofaArmEnv, self).__init__()
        # Define action and observation space
        # The action will be the displacement in the cables
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(8,), dtype=np.float32)
        # observation or the state of the env will be the tip position, goal position, cables' displacement 
        # Eff_X, Eff_Y, Eff_Z, g_X, g_Y, g_Z, c_L0, c_L1, c_L2, c_L3, c_S0, c_S1, c_S2, c_S3
        self.observation_space = spaces.Box(
            low=np.array([-150.0, -150.0, -30.0, -150.0, -150.0, -30.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), 
            high=np.array([150.0, 150.0, 195.0, 150.0, 150.0, 195.0, 60.0, 60.0, 60.0, 60.0, 40.0, 40.0, 40.0, 40.0]), 
            shape=(14,),
            dtype=np.float32
        )

        # dataframe that contains all the possible goals
        self.goals_df = pd.read_csv('./goals_xyz.csv')
        # set the done as False
        self.done = False 
        self.dummy_count = 1

    def step(self, delta_action):
        self.dummy_count += 1
        if self.dummy_count % 1000 == 0:
            print(f'steps taken: {self.dummy_count}')
        # here should be the neural network model that take the action and produce the tip
        self.new_tip_pos = np.random.random((3,))
        # The observation or the state is the stacking of the goal position, tip position and the actuations of the cables
        observation = np.hstack([self.goal_pos, self.new_tip_pos, delta_action]).astype(np.float32)
        # the reward is the negative distance between the tip position and the goal
        reward = self.get_reward(self.new_tip_pos)
        # update the tip position for the next step
        self.tip_pos = self.new_tip_pos
        if self.distance(self.new_tip_pos) < 5:
            self.done = True
        else: 
            self.done = False
        info = {}
        return observation, reward, self.done, info
    
    def reset(self):
        x, y, z = self.goals_df.iloc[random.randrange(0,len(self.goals_df)),:]
        self.goal_pos = [x, y, z]
        self.tip_pos = [0, 0, 195]
        self.action = [0, 0, 0, 0, 0, 0, 0, 0]
        observation = np.hstack([self.goal_pos, self.tip_pos, self.action]).astype(np.float32)
        self.done = False
        return observation  # reward, done, info can't be included

    def get_reward(self, tip_pos):
        current_distance = self.distance(tip_pos)
        return - current_distance 

    def distance(self, tip_pos):
        eff_goal_dist = np.sqrt((self.goal_pos[0] - tip_pos[0]) ** 2 +
                                (self.goal_pos[1] - tip_pos[1]) ** 2 +
                                (self.goal_pos[2] - tip_pos[2]) ** 2) 
        return eff_goal_dist

    def close(self):
        print('close method called')

and this is the training code


import numpy as np
from sofa_arm_env import SofaArmEnv

from stable_baselines3 import TD3
from stable_baselines3.common.noise import NormalActionNoise

env = SofaArmEnv()
env.reset()

n_actions = env.action_space.shape[-1]
action_noise = NormalActionNoise(mean=np.zeros(n_actions), sigma=0.1 * np.ones(n_actions))
model = TD3('MlpPolicy', env, verbose=1)
model.learn(total_timesteps=1, reset_num_timesteps=False)

@araffin
Member

araffin commented Aug 26, 2022

Please read carefully the link I provided (the provided code is not functional), and please use markdown code blocks to format your code (as also shown in the link and the issue template).

@araffin
Member

araffin commented Aug 26, 2022

My guess is that train_freq is set to (1, "episode") by default and your episode never finishes. A solution is to set train_freq to 1. You are also not using the action noise.
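
For reference, a minimal sketch of the suggested fix against the training script posted above (an illustration, not the exact code the author ran): pass train_freq=1 so TD3 performs gradient updates every environment step instead of waiting for an episode to finish, and actually pass the action_noise object to the constructor.

import numpy as np
from sofa_arm_env import SofaArmEnv

from stable_baselines3 import TD3
from stable_baselines3.common.noise import NormalActionNoise

env = SofaArmEnv()

n_actions = env.action_space.shape[-1]
action_noise = NormalActionNoise(mean=np.zeros(n_actions), sigma=0.1 * np.ones(n_actions))

# train_freq=1 triggers training after every environment step,
# so learning no longer waits for the (possibly never-ending) episode to end;
# the action noise is now actually passed to the model
model = TD3('MlpPolicy', env, action_noise=action_noise, train_freq=1, verbose=1)
model.learn(total_timesteps=1)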

@Abdelrahman-Alkhodary
Author

@araffin you are right; with train_freq set to 1, it ran normally and didn't get stuck. Thanks!
