Question

Hello,

I'm trying to pretrain a policy using BC and then continue training it with SB3. I read #27 and followed the Colab notebook mentioned there by @araffin: https://colab.research.google.com/github/Stable-Baselines-Team/rl-colab-notebooks/blob/sb3/pretraining.ipynb. That worked fine. The problem comes when continuing to learn with SB3: while the average episode reward of the pretrained policy is very high, it drops dramatically after a few more RL training steps. It's as if pretraining had been of no use at all. My question is: is this the correct approach with SB3 (and therefore it should work fine, so the problem is specific to my case), or is there something I'm missing?

Additional context

This is the code I'm using. pretrain_agent is the one from the Colab notebook linked above. The observation space is an image (Box, 45x45x1, values 0-255) and the action space is Discrete(3112).

# Initialize the model
env = RSEnv()
model = A2C('CnnPolicy', env, verbose=1, n_steps=1, seed=1, learning_rate=0.0007)

# Pretrain the policy with behavior cloning
pretrain_agent(model, batch_size=64, epochs=30, learning_rate=5.0, test_batch_size=64)

# Continue training with RL
model.learn(10000)
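A minimal way to quantify the before/after drop, rather than eyeballing the episode reward, is SB3's evaluate_policy helper; this is only a sketch, assuming the same env is suitable for evaluation (the 10-episode count is an arbitrary choice):

from stable_baselines3.common.evaluation import evaluate_policy

# Reward of the BC-pretrained policy, before any RL updates
mean_before, std_before = evaluate_policy(model, env, n_eval_episodes=10)

model.learn(10000)

# Reward after continuing with RL; a large drop here reproduces the issue
mean_after, std_after = evaluate_policy(model, env, n_eval_episodes=10)
print(f"before RL: {mean_before:.1f} +/- {std_before:.1f}, after RL: {mean_after:.1f} +/- {std_after:.1f}")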
This goes more into the "exploration" side of things and we do not have definitive answers. However, it is fairly well known that RL training after BC will quickly erase the behaviour learned during BC. Some suggestions:

- Try a bigger batch size and/or n_steps (more data collected before the first RL updates; with n_steps=1 in your snippet, A2C updates after every single environment step).
- Requires modifications to the code, but try training only the value head for some time first, to initialize it with a value estimate consistent with the BC policy (similar to what is done e.g. in this SC2 paper); see the first sketch after this list.
- Requires modifications to the code, but add a KL-penalty loss that keeps the RL policy outputs close to the BC policy outputs (similar to what is done e.g. in this SC2 paper and this competition submission); see the second sketch after this list.
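For the value-head idea, here is a rough sketch, assuming SB3's default ActorCriticPolicy (the parameter name value_net comes from SB3; the step counts are arbitrary): freeze everything except the value parameters, run learn() for a while, then unfreeze and continue.

# Freeze everything except parameters whose name contains "value"
# (the value head and the value branch of the MLP extractor);
# note that the shared CNN features are frozen too, so this is only a rough warm-up
for name, param in model.policy.named_parameters():
    param.requires_grad = "value" in name

# Rollouts are still collected with the (unchanged) BC-pretrained policy,
# but only the value function is updated during these steps
model.learn(20_000)

# Unfreeze everything and continue normal RL training
for param in model.policy.parameters():
    param.requires_grad = True
model.learn(100_000)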
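For the KL-penalty idea, the snippet below is not a drop-in change: the penalty would have to be added to A2C's loss, e.g. by subclassing A2C and overriding train(). It only shows how the penalty term itself could be computed, assuming a recent SB3 version where ActorCriticPolicy.get_distribution() is available; bc_policy, kl_coef and kl_to_bc are hypothetical names.

import copy

import torch

# Keep a frozen copy of the policy right after BC pretraining
bc_policy = copy.deepcopy(model.policy)
for param in bc_policy.parameters():
    param.requires_grad = False

def kl_to_bc(policy, observations):
    # Distribution of the current (RL-trained) policy on this batch of observations
    current_dist = policy.get_distribution(observations).distribution
    # Distribution of the frozen BC policy, without tracking gradients
    with torch.no_grad():
        bc_dist = bc_policy.get_distribution(observations).distribution
    # KL(BC || current): grows when the RL policy drifts away from the BC policy
    return torch.distributions.kl_divergence(bc_dist, current_dist).mean()

# Inside a custom A2C.train(), the total loss would then become something like:
#   loss = loss + kl_coef * kl_to_bc(self.policy, rollout_data.observations)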
> after training a few more steps with RL it decreases dramatically. It's like pre-training has been of no use at all.

I recommend looking into offline RL to understand the issues you are facing (start with the BCQ paper ;)) and probably reading this paper: https://arxiv.org/abs/2006.09359