
[Question] Pretraining policy using BC and continuing training using SB3 #543

Closed
hectorIzquierdo opened this issue Aug 17, 2021 · 3 comments
Labels
question Further information is requested

Comments

@hectorIzquierdo

hectorIzquierdo commented Aug 17, 2021

Question

Hello,

I'm trying to pretrain a policy using BC and then continue training it with SB3. I read #27 and followed the Colab notebook mentioned there by @araffin: https://colab.research.google.com/github/Stable-Baselines-Team/rl-colab-notebooks/blob/sb3/pretraining.ipynb. That worked fine. The problem comes when continuing to learn using SB3. While the average episode reward when using the pretrained policy is very high, after training a few more steps with RL it decreases dramatically. It's like pre-training has been of no use at all. My question is: is this the correct approach with SB3 (so it should be working fine and the problem is specific to my case), or is there something I'm missing?

Additional context

This is the code I'm using. The pretrain_agent function is the one used in the Colab notebook I just shared. The observation space is an image (Box, 45x45x1, values 0-255) and the action space is Discrete(3112).

Initialize model

from stable_baselines3 import A2C

# RSEnv is my custom environment
env = RSEnv()
model = A2C('CnnPolicy', env, verbose = 1, n_steps = 1, seed = 1, learning_rate = 0.0007)

Pretrain model

# pretrain_agent is the BC pretraining function from the Colab notebook linked above
pretrain_agent(model, batch_size = 64, epochs = 30, learning_rate = 5.0, test_batch_size = 64)

Continue training

model.learn(10000)
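
For reference, a minimal sketch of how the before/after comparison can be measured, using SB3's evaluate_policy with the model and env from the snippet above (the number of evaluation episodes is arbitrary):

from stable_baselines3.common.evaluation import evaluate_policy

# Average episode reward right after BC pretraining
mean_before, std_before = evaluate_policy(model, env, n_eval_episodes=20)

# Continue with RL, then evaluate again
model.learn(10000)
mean_after, std_after = evaluate_policy(model, env, n_eval_episodes=20)

print(f"before RL: {mean_before:.1f} +/- {std_before:.1f}, after RL: {mean_after:.1f} +/- {std_after:.1f}")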

Checklist

  • I have read the documentation (required)
  • I have checked that there is no similar issue in the repo (required)
hectorIzquierdo added the question (Further information is requested) label on Aug 17, 2021
@Miffyli
Collaborator

Miffyli commented Aug 17, 2021

This goes more into the "exploration" side of things, and we do not have the right answers here. However, it is somewhat well known that RL training after BC will quickly erase the behaviour learned during BC training. Some suggestions:

  • Try a bigger batch size and/or n_steps (more data before the first RL updates)
  • Requires modifications to the code, but try first training only the value head for some time, to initialize it with the correct value estimates of the BC model (something similar is done e.g. in this SC2 paper).
  • Requires modifications to the code, but add a KL-penalty loss which aims to keep the RL model outputs close to the BC model outputs (something similar is done e.g. in this SC2 paper and this competition submission); a rough sketch of this follows the list.
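
As a rough illustration of the KL-penalty idea (this is not SB3's API: bc_policy and rl_policy here are placeholder modules that map observations to action logits, and wiring this term into A2C's loss would require subclassing the algorithm), the penalty can be computed with PyTorch distributions:

import torch
from torch.distributions import Categorical
from torch.distributions.kl import kl_divergence

def kl_penalty(bc_policy, rl_policy, obs, coef=0.1):
    # Action distribution of the frozen BC policy (no gradients flow into it)
    with torch.no_grad():
        bc_dist = Categorical(logits=bc_policy(obs))
    # Action distribution of the current RL policy
    rl_dist = Categorical(logits=rl_policy(obs))
    # Penalize divergence from the BC policy; add this term to the RL loss
    return coef * kl_divergence(bc_dist, rl_dist).mean()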

You could also try asking on e.g. RL Discord.

@araffin
Member

araffin commented Aug 23, 2021

after training a few more steps with RL it decreases dramatically. It's like pre-training has been of no use at all.

I recommend looking into offline RL to understand the issues you are facing (start with the BCQ paper ;)) and probably also reading this paper: https://arxiv.org/abs/2006.09359

araffin closed this as completed on Aug 30, 2021
@hectorIzquierdo
Author

Ok, thank you so much for your help. I'll take a look at it.
