
Regarding pretrained models #14

Open
LostXine opened this issue Aug 12, 2023 · 6 comments

Comments

@LostXine

Hello,

Thanks again for this great project. It would be great if you could help diagnose this issue regarding the pretrained models.

When I evaluate your pretrained models for the hybrid CNN setting, they do not work properly on Push-T, Transport (ph), and Transport (mh). There may be more affected tasks, but I haven't tried them yet.
The action trajectories look relatively reasonable (not random noisy actions), but the agent just cannot finish the task (Push-T mean score: 0.09, Transport mean score: 0).
However, when I train a model from scratch and evaluate it, it works fine, which suggests the evaluation code itself is correct.
I tested the models on two machines at different locations; all of them were downloaded directly from your website. I also ran an integrity check and confirmed that the two copies on the two machines are identical. The training code loads the model file properly and the number of epochs matches the filename, but it just does not generate the correct actions. After days of debugging, I could not find any promising directions to look into.

So could you please share some insights on what may cause this issue?

Thank you so much!

Best regards,

@cheng-chi
Collaborator

Hi @LostXine, interesting. Do you mind sharing your code for evaluating the pretrained models?

@LostXine
Author

Hi @cheng-chi ,

Thanks a lot for your response. I tried two versions:

  1. Untouched training code from this repo (diffusion_policy/workspace/train_diffusion_unet_hybrid_workspace.py).
  2. A simplified version whose core run function looks like this:
# ========= eval for this epoch ==========
policy = self.model
if cfg.training.use_ema:
    policy = self.ema_model
policy.to(device)
policy.eval()

# configure env
env_runner: BaseImageRunner
env_runner = hydra.utils.instantiate(cfg.task.env_runner, output_dir=self.output_dir)
assert isinstance(env_runner, BaseImageRunner)
# run rollout
runner_log = env_runner.run(policy)
del env_runner
# log all
step_log.update(runner_log)
json_logger.log(step_log)
print(step_log)

Both show the same behavior: only the model I trained myself works. I could also check the hash sum of the downloaded model if you think that would be helpful.
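
For context, before the snippet above runs I build the workspace from the config downloaded from the website and restore the checkpoint into it, roughly like the sketch below (load_checkpoint is how I understand this repo's BaseWorkspace API; paths and file names are placeholders):

import hydra
from omegaconf import OmegaConf

# config downloaded from the website (placeholder path)
cfg = OmegaConf.load("config.yaml")
# instantiate the workspace class named in the config
workspace_cls = hydra.utils.get_class(cfg._target_)
workspace = workspace_cls(cfg)
# restore the downloaded weights into workspace.model / workspace.ema_model
workspace.load_checkpoint(path="epoch=0500-test_mean_score=0.884.ckpt")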

Thank you so much!

@cheng-chi
Collaborator

Hi @LostXine, I'm not sure exactly what the problem is in your script, but I have just created a script (fairly similar to what you have) that can evaluate all provided checkpoints.
Please check out the updated README for usage.
On Push-T lowdim + Diffusion Policy CNN I'm getting "test/mean_score": 0.9150393806777066 using epoch=0550-test_mean_score=0.969.ckpt.
On Push-T Image + Diffusion Policy CNN I'm getting "test/mean_score": 0.9177610059407988 using epoch=0500-test_mean_score=0.884.ckpt.
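
The basic usage looks roughly like this (paths and checkpoint names below are placeholders; please refer to the README for the exact command):

python eval.py --checkpoint data/epoch=0550-test_mean_score=0.969.ckpt --output_dir data/pusht_eval_output --device cuda:0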

@LostXine
Author

Hi @cheng-chi , thank you so much for your effort. I'll check it and get back to you soon.
Best regards,

@LostXine
Author

Hi @cheng-chi ,
Thanks for all your efforts; we finally figured it out.
It turns out that the order of the states in policy.shape_meta.obs changes the order of the state features (i.e., the channel order) in the global_cond tensor. The config files currently listed on the website do not match the configs stored inside the checkpoints in terms of state ordering, even though the values are the same. As a result, the channel order of global_cond differs, which causes the unexpected behavior.
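
To illustrate the mechanism (a minimal standalone sketch, not this repo's actual code): the per-key observation features are concatenated in the order the keys appear, so two configs with the same keys and values but a different ordering yield global_cond tensors with permuted channels, which the pretrained network cannot interpret.

import torch

def build_global_cond(obs, key_order):
    # concatenate per-key observation features in the given key order
    return torch.cat([obs[k] for k in key_order], dim=-1)

obs = {
    'agent_pos': torch.randn(1, 2),
    'image_feat': torch.randn(1, 4),
}

cond_ckpt = build_global_cond(obs, ['image_feat', 'agent_pos'])  # key order baked into the checkpoint
cond_web = build_global_cond(obs, ['agent_pos', 'image_feat'])   # key order in the website config
assert cond_ckpt.shape == cond_web.shape      # same shape and same values...
assert not torch.equal(cond_ckpt, cond_web)   # ...but a different channel layout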
Hope it helps, and thanks again.

@cheng-chi
Collaborator

@LostXine Oh great, good to know! I will probably add sorting for the keys in the future. Depending on YAML ordering is indeed a bit problematic.
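
Something like iterating the observation keys in sorted order should make the layout deterministic; a rough sketch (not an actual patch of this repo):

import torch

obs = {
    'image_feat': torch.randn(1, 4),
    'agent_pos': torch.randn(1, 2),
}
# sorted keys give the same concatenation order regardless of how the
# yaml config happened to list the observations
global_cond = torch.cat([obs[k] for k in sorted(obs)], dim=-1)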
