I am currently using your DDPG implementation for real-life training of an inverted pendulum system. Although you already save checkpoints for the 4 networks, I have noticed that they are not enough to resume training. After some research, I believe we need to save the 4 networks, their optimizers, and the replay buffer to a file on disk, so that the next time we resume training we can load all of them and continue where we left off.
In my current situation, I have done 1000 episodes of training and saved/loaded the checkpoints of the 4 networks. When I resumed training (by loading the checkpoint), the network retained its performance, but it seems to retrain by re-experiencing all the mistakes it had already made much earlier.
For example, upon loading the checkpoint I could see the pendulum genuinely trying to swing up, yet after a few episodes of resumed training it started banging against the extremities for many episodes, as if it had never learnt those lessons in the past.
For the optimizers, I can use state_dict() to extract the parameters for saving, but it does not appear to be as straightforward for the replay buffer. Could you add that to your code, please?
Also, should I save only the most recent batch_size transitions (I used a batch size of only 64) from the memory, or should I save the entire 1e6-entry memory (state, state_, action, reward, terminal)?
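To illustrate what I mean by the two points above, here is a rough sketch of what I imagined; the attribute names (actor.optimizer, state_memory, mem_cntr, ...) are just my guesses and not necessarily what your code uses:

```python
import torch
import numpy as np

# Rough sketch only -- attribute names are guesses, not the repo's actual API.
def save_training_state(agent, opt_path='optimizers.pt', buf_path='replay_buffer.npz'):
    # Optimizer parameters (e.g. Adam running statistics) via state_dict()
    torch.save({
        'actor_optimizer': agent.actor.optimizer.state_dict(),
        'critic_optimizer': agent.critic.optimizer.state_dict(),
    }, opt_path)

    # The whole replay buffer (all slots plus the fill counter),
    # not just the last batch of 64 transitions
    mem = agent.memory
    np.savez_compressed(
        buf_path,
        state=mem.state_memory,
        new_state=mem.new_state_memory,
        action=mem.action_memory,
        reward=mem.reward_memory,
        terminal=mem.terminal_memory,
        mem_cntr=mem.mem_cntr,
    )
```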
Do you know if there is any other critical information that needs to be saved/loaded in order to resume training?
This is my first time posting an "Issue" on GitHub, so please pardon me if I am not posting in the usual way.
Thanks for your explanations on YouTube and for sharing your code here!
If the goal is to be able to save an agent at any point in your code and reload it later, I would use dill or pickle. dill.dump lets you write a whole object to a binary file and reload it when you need it. If you save the agent object, that automatically saves the replay buffer and all the networks together.
I normally use this strategy to avoid re-running the whole training if something goes wrong on the cluster where I run my code.
I believe a method along these lines in the agent class would work:
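A minimal sketch, assuming the class is called Agent and that everything worth keeping (networks, optimizers, replay buffer) lives as attributes on the agent object; the file name is just a placeholder:

```python
import dill

class Agent:
    # ... existing networks, optimizers and replay buffer ...

    def save_agent(self, filename='agent.pkl'):
        # Serialise the whole agent object -- whatever is attached to it
        # (networks, optimizers, replay buffer, counters) -- into one binary file.
        with open(filename, 'wb') as f:
            dill.dump(self, f)

    @staticmethod
    def load_agent(filename='agent.pkl'):
        # Rebuild the agent exactly as it was at save time.
        with open(filename, 'rb') as f:
            return dill.load(f)
```

To resume, you would then call agent = Agent.load_agent('agent.pkl') instead of constructing a fresh agent, and continue the training loop as usual.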