loading checkpoint doesn't work for resuming training and test #247
Comments
@qiuyuchen14 thank you for reporting the issue. I'll take a look.
Thanks, @ViktorM! Any updates?
I reproduced the issue, found another one as well, and will push a fix tomorrow.
Hi: I've designed an environment in IsaacGym and am currently training it with the A2C continuous PPO implementation. I am running into a similar error when trying to resume training from a checkpoint or when using a checkpoint for evaluation: my training rewards have converged around ~2000, while my evaluation rewards are around ~200. As a more informative metric, the task terminates if the agent performs any unrecoverable behavior such as falling down, or if the episode length hits 700. My average episode length before termination in training is ~690-700 upon convergence, while in testing it is ~50-100. Nothing in my environment code changes the environment between training and testing, and my training reward has consistently been around 2000, so I'm confident the issue is not overfitting. I have also tried both the stochastic and deterministic player settings.
@sashwat-mahalingam could you try the regular Ant task? Does it work?
@sashwat-mahalingam this one could be related to the way the reporting is done. When you restart, the first couple of total-reward reports come from the ants that failed early, because you don't get any results from the good ants until the 1000th step. I'll double-check it anyway.
I see; this makes sense. Do you have any pointers as to what else I could be missing that would cause my environment to show this discrepancy between training and testing results (since for the Ant environment this is clearly not the case)? While I make sure not to apply randomizations to my custom environment, I am wondering whether there are other configurations I am missing that change the testing environment drastically from the training one when using the RLGames training setup. I've already discussed trying stochastic vs. deterministic players above, but beyond that, from my read of the RLGames codebase, I am not sure what other configurations might cause the environment to behave much differently in training vs. testing. Here is an example of how my yaml looks for the actual environment itself:
I think the issue is that the PhysX solver is only deterministic under the same sequence of operations and seed. It will not act exactly the same way if I load the checkpoint under a fixed seed and evaluate it versus training the model from scratch. I had assumed that the distribution of physics steps would be the same given just the seed, but it seems this is not guaranteed either, so I can't expect to achieve the same expected performance during reruns. Since I do not do domain randomization, which the other tasks all seem to do, could adding it in fix the problem?
To resume training, you have to pass the "checkpoint" path into runner.run, like this:
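A minimal sketch of such a call, assuming the rl-games Runner API; the config and checkpoint paths below are placeholders, not paths from this thread:

```python
# Minimal sketch: resuming training (or evaluating) via rl-games' Runner by
# passing a checkpoint path in the args dict. Paths below are placeholders.
import yaml
from rl_games.torch_runner import Runner

with open('cfg/train/MyTaskPPO.yaml') as f:      # placeholder config path
    config = yaml.safe_load(f)

runner = Runner()
runner.load(config)
runner.run({
    'train': True,                               # resume training
    'play': False,                               # set play=True, train=False to evaluate instead
    'checkpoint': 'runs/MyTask/nn/MyTask.pth',   # placeholder checkpoint path
    'sigma': None,                               # optional in some versions
})
```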
Hope this helps!
Hi! I'm trying the Humanoid SAC example, but loading a checkpoint doesn't seem to work for either testing or resuming training. Here is what I did:
python train.py task=HumanoidSAC train=HumanoidSAC
python train.py task=HumanoidSAC train=HumanoidSAC test=True checkpoint=runs/HumanoidSAC_19-21-24-22/nn/HumanoidSAC.pth
python train.py task=HumanoidSAC train=HumanoidSAC checkpoint=runs/HumanoidSAC_19-21-24-22/nn/HumanoidSAC.pth
The training itself works fine and the reward goes up to >5000, but if I test or resume from the saved checkpoint, the model doesn't seem to be initialized with the weights properly and the reward is only around 40. I did a quick check and it does go through restore and set_full_state_weights, so I'm not sure where the problem might be. One thing I did change was here: weights['step'] -> weights['steps'], due to a KeyError.
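To illustrate what I mean, here is a simplified sketch of that kind of save/load key mismatch; this is not the actual rl-games source, and the class and method bodies are only approximations:

```python
# Simplified sketch (not the actual rl-games code): a checkpoint saved with
# the key 'steps' but restored via 'step' raises a KeyError on load.
class AgentSketch:
    def __init__(self):
        self.step = 0

    def get_full_state_weights(self):
        # the checkpoint is written with the key 'steps'
        return {'steps': self.step}

    def set_full_state_weights(self, weights):
        # reading weights['step'] here fails with a KeyError;
        # reading weights['steps'] matches the key that was saved
        self.step = weights['steps']
```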
I'm using rl-games 1.6.0 and IsaacGym 1.0rc4.
Thank you!!