I am currently using your DDPG implementation for real-life training of an inverted pendulum system. Although you already save checkpoints for the 4 networks, I have noticed that they are not enough to resume training. After some research, I believe we need to save the 4 networks, their optimizers, and the replay buffer to a file on disk, so that the next time we resume training we can load all of them and continue where we left off.
In my current situation, I have done 1000 episodes of training and saved/loaded the checkpoints of the 4 networks. When I resumed training (by loading the checkpoint), the network retained its performance, but it seems to retrain by re-experiencing all the mistakes it had already made much earlier.
For example, upon loading the checkpoint I could see the pendulum genuinely trying to swing up, yet after a few episodes of resumed training it started banging against the extremities for many episodes, as if it had never learnt those lessons in the past.
For the optimizers, I can use state_dict() to extract the parameters for saving, but it does not appear to be as straightforward for the replay buffer. Could you add that to your code, please?
Also, should I save only the most recent batch_size transitions (I used a batch size of only 64) from the memory, or should I save the entire 1e6-entry memory (state, state_, action, reward, terminal)?
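To illustrate what I mean by the two points above, here is a rough sketch of what I imagined; the attribute names (actor.optimizer, state_memory, mem_cntr, ...) are just my guesses and not necessarily what your code uses:

```python
import torch
import numpy as np

# Rough sketch only -- attribute names are guesses, not the repo's actual API.
def save_training_state(agent, opt_path='optimizers.pt', buf_path='replay_buffer.npz'):
    # Optimizer parameters (e.g. Adam running statistics) via state_dict()
    torch.save({
        'actor_optimizer': agent.actor.optimizer.state_dict(),
        'critic_optimizer': agent.critic.optimizer.state_dict(),
    }, opt_path)

    # The whole replay buffer (all slots plus the fill counter),
    # not just the last batch of 64 transitions
    mem = agent.memory
    np.savez_compressed(
        buf_path,
        state=mem.state_memory,
        new_state=mem.new_state_memory,
        action=mem.action_memory,
        reward=mem.reward_memory,
        terminal=mem.terminal_memory,
        mem_cntr=mem.mem_cntr,
    )
```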
Do you know if there is any other critical information that needs to be saved/loaded in order to resume training?
This is my first time posting an "Issue" on GitHub, so please pardon me if I am not posting in the usual way.
Thanks for your explanations on YouTube and for sharing your code here!
If the goal is to be able to save an agent at any point in your code and reload it later, I would use dill or pickle. dill.dump lets you write a whole object to a binary file and reload it when you need it. If you save the agent object, that automatically saves the replay buffer and all the networks together.
I normally use this strategy to avoid re-running the whole training if something goes wrong on the cluster where I run my code.
I believe a method along these lines in the agent class would work:
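A minimal sketch, assuming the class is called Agent and that everything worth keeping (networks, optimizers, replay buffer) lives as attributes on the agent object; the file name is just a placeholder:

```python
import dill

class Agent:
    # ... existing networks, optimizers and replay buffer ...

    def save_agent(self, filename='agent.pkl'):
        # Serialise the whole agent object -- whatever is attached to it
        # (networks, optimizers, replay buffer, counters) -- into one binary file.
        with open(filename, 'wb') as f:
            dill.dump(self, f)

    @staticmethod
    def load_agent(filename='agent.pkl'):
        # Rebuild the agent exactly as it was at save time.
        with open(filename, 'rb') as f:
            return dill.load(f)
```

To resume, you would then call agent = Agent.load_agent('agent.pkl') instead of constructing a fresh agent, and continue the training loop as usual.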