
No tensorboard log is created during training #242

Closed
hn2 opened this issue Nov 23, 2020 · 7 comments
Labels: bug, more information needed, windows

Comments

hn2 commented Nov 23, 2020

This is the code that I am using:

```python
import os
from os.path import join as path_join  # assumed: path_join behaves like os.path.join

from stable_baselines3 import SAC
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize
from stable_baselines3.sac import MlpPolicy

# MODELS_DIR, LOGS_DIR, model_name and total_timesteps are defined elsewhere (not shown)
v_env = ...
v_model_dir = path_join(MODELS_DIR, model_name)
v_model_file_name = path_join(v_model_dir, 'model.zip')
v_model_file_name_stats = path_join(v_model_dir, 'stats.pkl')
v_model_replay_buffer = path_join(v_model_dir, 'replay_buffer.pkl')

os.makedirs(v_model_dir, exist_ok=True)

v_tensorboard_log = path_join(LOGS_DIR, model_name)

print('===================')
print(v_tensorboard_log)
print('===================')

# Monitor records per-episode reward/length and writes monitor.csv into LOGS_DIR
v_DummyVecEnv = DummyVecEnv([lambda: Monitor(v_env, LOGS_DIR)])

# Load the most recent checkpoint and its normalization statistics, if both exist
if os.path.isfile(v_model_file_name) and os.path.isfile(v_model_file_name_stats):
    v_VecNormalize = VecNormalize.load(v_model_file_name_stats, v_DummyVecEnv)
    v_VecNormalize.reset()
    model = SAC.load(v_model_file_name, env=v_VecNormalize)
    print('===================')
    print('Model Loaded ...')
    print('===================')
else:
    # v_VecNormalize = VecNormalize(v_DummyVecEnv, norm_obs, norm_reward, clip_obs, clip_reward, gamma)
    v_VecNormalize = VecNormalize(v_DummyVecEnv)
    model = SAC(env=v_VecNormalize, policy=MlpPolicy, verbose=1, tensorboard_log=LOGS_DIR,
                policy_kwargs=dict(net_arch=[64, 64]))

# Restore the replay buffer so off-policy training resumes with past transitions
if os.path.isfile(v_model_replay_buffer):
    model.load_replay_buffer(v_model_replay_buffer)

model.learn(total_timesteps=total_timesteps, log_interval=1, reset_num_timesteps=False, tb_log_name=model_name)

model.save(v_model_file_name)
model.save_replay_buffer(v_model_replay_buffer)
v_VecNormalize.save(v_model_file_name_stats)

v_env.close()
```

There is no error and the model trains, but no TensorBoard log is created under LOGS_DIR.
It does create monitor.csv.
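When logging does work, SB3 appears to write each run not directly into `tensorboard_log` but into a numbered subdirectory `<tb_log_name>_<N>`, so the files would not land at the `v_tensorboard_log` path printed above. The sketch below mirrors our reading of SB3's logger; `latest_run_id` and `expected_tb_run_dir` are hypothetical helpers, and the rule that `reset_num_timesteps=False` reuses the latest run directory is an assumption to check against the installed SB3 source. Under that assumption, a first run with `reset_num_timesteps=False` would go to `LOGS_DIR/model_name_0`.

```python
import glob
import os

# Hypothetical helpers mirroring our reading of SB3's run-directory naming.


def latest_run_id(log_path: str, log_name: str) -> int:
    """Highest N among existing '<log_name>_N' entries in log_path (0 if none)."""
    max_run_id = 0
    pattern = os.path.join(glob.escape(log_path), f"{glob.escape(log_name)}_[0-9]*")
    for path in glob.glob(pattern):
        # Split 'model_name_3' into ('model_name', '3'); names may contain underscores.
        prefix, _, suffix = os.path.basename(path).rpartition("_")
        if prefix == log_name and suffix.isdigit():
            max_run_id = max(max_run_id, int(suffix))
    return max_run_id


def expected_tb_run_dir(tensorboard_log: str, tb_log_name: str, reset_num_timesteps: bool) -> str:
    """Directory a new learn() call would write to, under the assumed SB3 convention."""
    run_id = latest_run_id(tensorboard_log, tb_log_name)
    if not reset_num_timesteps:
        # Assumed: continued training writes into the latest existing run.
        run_id -= 1
    return os.path.join(tensorboard_log, f"{tb_log_name}_{run_id + 1}")


if __name__ == "__main__":
    # "logs" and "my_model" are placeholders for LOGS_DIR and model_name.
    print(expected_tb_run_dir("logs", "my_model", reset_num_timesteps=False))
```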

My environment:
Windows 10
Python 3.6.11

tensorboard 1.14.0
tensorboardX 2.0
tensorflow 1.14.0
tensorflow-estimator 1.14.0
tensorflow-gpu 1.14.0

torch 1.6.0+cu101

hn2 added the bug label Nov 23, 2020
hn2 changed the title from "[Bug] bug title" to "No tensorboard log is created during training" Nov 23, 2020
araffin added the more information needed label Nov 23, 2020
araffin (Member) commented Nov 23, 2020

Hello,

Could you try with tensorboard==2.3.0?

Also, please tell us which SB3 version you are using and how it was installed.
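The TensorBoard version matters here because, as far as we know, PyTorch's `torch.utils.tensorboard` refuses to import when TensorBoard is older than 1.15, while the environment listed in the issue has tensorboard 1.14.0. A minimal sketch of that check; the 1.15 floor is an assumption taken from PyTorch's import guard, and `parse_version`, `tensorboard_version_ok` and `installed_tensorboard_version` are hypothetical helpers:

```python
from typing import Optional, Tuple

# Assumed floor, taken from PyTorch's own check in torch.utils.tensorboard.
MIN_TENSORBOARD = (1, 15)


def parse_version(version: str) -> Tuple[int, ...]:
    """'1.14.0' -> (1, 14, 0); stops at the first non-numeric part ('2.3.0rc1' -> (2, 3))."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def tensorboard_version_ok(version: str, minimum: Tuple[int, ...] = MIN_TENSORBOARD) -> bool:
    """True if `version` is at least `minimum` (tuples compare element by element)."""
    return parse_version(version) >= minimum


def installed_tensorboard_version() -> Optional[str]:
    """Version string of the installed tensorboard package, or None if it is missing."""
    try:
        import tensorboard
    except ImportError:
        return None
    return getattr(tensorboard, "__version__", None)


if __name__ == "__main__":
    version = installed_tensorboard_version()
    if version is None or not tensorboard_version_ok(version):
        print(f"tensorboard {version} is missing or too old; try upgrading, e.g. tensorboard==2.3.0")
```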

hn2 (Author) commented Nov 23, 2020

Works now (with tensorboard==2.3.0). What is the difference between ep_len_mean and ep_rew_mean?

araffin (Member) commented Nov 23, 2020

> Works now (with tensorboard==2.3.0)

Hmm, we should probably require a minimum TensorBoard version (because of PyTorch).
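This would also explain why no error was shown: SB3's logger appears to wrap `from torch.utils.tensorboard import SummaryWriter` in a `try/except ImportError` and simply skip TensorBoard output when that import fails. The sketch below performs the same import so a training script can warn up front instead of failing quietly; `summary_writer_available` is a hypothetical helper, and the claim about SB3's fallback is our reading of its source, not something confirmed in this thread.

```python
import importlib


def summary_writer_available() -> bool:
    """True if torch.utils.tensorboard.SummaryWriter can be imported."""
    # Hypothetical helper; assumes SB3 treats an ImportError here as "no TensorBoard".
    try:
        module = importlib.import_module("torch.utils.tensorboard")
    except ImportError:
        # Missing torch, missing tensorboard, or a tensorboard older than PyTorch accepts.
        return False
    return hasattr(module, "SummaryWriter")


if __name__ == "__main__":
    if not summary_writer_available():
        print("torch.utils.tensorboard cannot be imported; SB3 will likely skip TensorBoard logs.")
```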

> What is the difference between ep_len_mean and ep_rew_mean?

ep_len_mean is the mean episode length; ep_rew_mean is the mean episode reward.
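Both values are averages over per-episode summaries that the `Monitor` wrapper attaches as `info["episode"]`, with key `"r"` for the episode's total reward and `"l"` for its length; SB3 keeps the most recent ones in a bounded buffer and logs their means. A minimal sketch, assuming a window of 100 episodes (SB3's default at the time, as far as we know) and a hypothetical `rollout_stats` helper:

```python
from collections import deque
from typing import Dict, Iterable


def rollout_stats(episodes: Iterable[Dict[str, float]]) -> Dict[str, float]:
    """Mean reward and mean length over recent finished episodes (hypothetical helper)."""
    episodes = list(episodes)
    if not episodes:
        # No finished, monitored episode yet: nothing to report.
        return {}
    return {
        "ep_rew_mean": sum(ep["r"] for ep in episodes) / len(episodes),
        "ep_len_mean": sum(ep["l"] for ep in episodes) / len(episodes),
    }


# Bounded buffer: once full, the oldest episode is dropped (assumed window of 100).
ep_info_buffer = deque(maxlen=100)

# Example infos from three steps; only steps that end an episode carry "episode".
for info in [{"episode": {"r": 10.0, "l": 200}}, {}, {"episode": {"r": 20.0, "l": 100}}]:
    if "episode" in info:
        ep_info_buffer.append(info["episode"])

print(rollout_stats(ep_info_buffer))  # {'ep_rew_mean': 15.0, 'ep_len_mean': 150.0}
```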

Related: hill-a/stable-baselines#121

araffin closed this as completed Nov 23, 2020
hn2 (Author) commented Nov 23, 2020

For some reason, ep_len_mean and ep_rew_mean sometimes don't show up in TensorBoard.

I guess the full statistics are only shown when the env is wrapped in a Monitor.

araffin (Member) commented Nov 23, 2020

> For some reason, it sometimes doesn't show ep_len_mean and ep_rew_mean in tensorboard
> I guess that only if env is monitored it will show full statistics.

Yes (see #232); that's also why we recommend using the RL Zoo.
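Why the statistics can be missing: without a `Monitor` wrapper, nothing writes `info["episode"]` at the end of an episode, so there is nothing to average. The hypothetical `EpisodeStatsWrapper` below is a dependency-free imitation of what `Monitor` adds (not SB3's implementation), shown with a toy custom env that uses the four-value `step()` return of the Gym API from that period:

```python
import time


class EpisodeStatsWrapper:
    """Hypothetical, gym-free imitation of the per-episode info SB3's Monitor adds."""

    def __init__(self, env):
        self.env = env
        self.rewards = []
        self.t_start = time.time()

    def reset(self):
        self.rewards = []
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.rewards.append(reward)
        if done:
            # Only at episode end: attach total reward, length and elapsed wall time.
            info = dict(info, episode={
                "r": sum(self.rewards),
                "l": len(self.rewards),
                "t": time.time() - self.t_start,
            })
        return obs, reward, done, info


class ToyCustomEnv:
    """Stand-in custom env: every episode lasts 3 steps with reward 1.0 per step."""

    def reset(self):
        self.steps = 0
        return self.steps

    def step(self, action):
        self.steps += 1
        return self.steps, 1.0, self.steps >= 3, {}


env = EpisodeStatsWrapper(ToyCustomEnv())
env.reset()
infos = [env.step(0)[3] for _ in range(3)]
print(infos[-1]["episode"]["r"], infos[-1]["episode"]["l"])  # 3.0 3
```

The unwrapped `ToyCustomEnv` returns empty info dicts, so a custom env needs the wrapper for these statistics to exist.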

hn2 (Author) commented Nov 23, 2020

I use a custom env...

hn2 (Author) commented Nov 24, 2020

There is another issue, probably for a new thread, and I am not sure whether it is a bug.
When I train models with SB3 and PyTorch, CPU utilization is always 100%.
This was not the case when I was using SB (Stable Baselines) with TensorFlow.
I am not sure whether this is PyTorch's default behavior or something specific to the SB3 implementation.
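One plausible factor, not confirmed in this thread: PyTorch by default runs CPU operations on a pool of intra-op threads (typically one per physical core), so even a small MLP can keep every core busy. A minimal sketch of capping that pool with `torch.set_num_threads`, plus the `OMP_NUM_THREADS`/`MKL_NUM_THREADS` environment variables, which generally need to be set before `torch` is first imported; `limit_cpu_threads` is a hypothetical helper meant to run at the very top of the training script.

```python
import os


def limit_cpu_threads(n_threads: int = 1):
    """Cap CPU threads used by PyTorch; returns torch's thread count, or None without torch."""
    # Hypothetical helper. OpenMP/MKL usually read these when first loaded,
    # so they only take effect if set before torch is imported.
    os.environ["OMP_NUM_THREADS"] = str(n_threads)
    os.environ["MKL_NUM_THREADS"] = str(n_threads)
    try:
        import torch
    except ImportError:
        return None
    # Also cap the intra-op pool directly; this works even if torch was already imported.
    torch.set_num_threads(n_threads)
    return torch.get_num_threads()


if __name__ == "__main__":
    print(limit_cpu_threads(1))
```

Fewer threads can make each gradient step slower, so the right value is a trade-off to measure rather than a fixed recommendation.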
