Expert datasets #2
Comments
@araffin Hello! Thank you for reaching out about my repository! Of course, I'm interested in expert datasets! One reason I did not make them was that I was not sure how well expert policies should perform on pybullet environments, since most papers use MuJoCo environments instead of pybullet. I took a quick look at the performance reports you posted. The performance seems good enough to call it "expert". So, would you be willing to share the expert datasets? Or would you prefer that I run my SAC for 1M steps to make the datasets? Thank you!
Yes, that's a shame.
I don't have much time this week but I could share them. I have been working with PyBullet envs for almost two years now, and the reported performance is expert-level ;) There is one detail we need to agree on though: to account for termination due to the time limit, I add a time feature to the input observation. Another way of solving the problem is to set
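To make the time-feature idea concrete, here is a minimal sketch in plain Python. It is not the implementation used for the reported results: `ToyEnv`, `TimeFeatureWrapper`, `max_steps`, and the `timeout` info key are hypothetical names chosen for illustration, and the feature is assumed to be the fraction of the episode remaining, decreasing from 1 to 0.

```python
class ToyEnv:
    """Hypothetical stand-in environment; the observation is a list of floats."""

    def reset(self):
        self.state = 0.0
        return [self.state]

    def step(self, action):
        self.state += action
        reward = -abs(self.state)
        done = False  # this toy task never terminates on its own
        return [self.state], reward, done, {}


class TimeFeatureWrapper:
    """Appends the remaining-time fraction to each observation (assumed design)."""

    def __init__(self, env, max_steps):
        self.env = env
        self.max_steps = max_steps
        self.t = 0

    def _augment(self, obs):
        # 1.0 at the start of the episode, 0.0 when the time limit is reached
        remaining = 1.0 - self.t / self.max_steps
        return list(obs) + [remaining]

    def reset(self):
        self.t = 0
        return self._augment(self.env.reset())

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.t += 1
        timeout = self.t >= self.max_steps
        if timeout and not done:
            # record that the episode ended because of the time limit only
            info = dict(info, timeout=True)
        return self._augment(obs), reward, done or timeout, info


env = TimeFeatureWrapper(ToyEnv(), max_steps=4)
first_obs = env.reset()
trajectory = [env.step(0.5) for _ in range(4)]
last_obs, last_reward, last_done, last_info = trajectory[-1]
```

With the extra input, the policy and value function can tell how close the episode is to being cut off, so a time-limit ending no longer looks like an unexplained termination.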
Okay, then let me try to reproduce the performance with my own SAC implemented in this repository. And I'll post the update here!
Although I'm aware of this problem, I have not handled it yet because the original d4rl does not seem to handle it. So I'll make the expert datasets without differentiating the two cases. In a future version, we may add an environmental terminal flag alongside the episodic terminal flags. What do you think?
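As a rough sketch of what separate flags could look like, the snippet below stores an environment-termination flag and a time-limit flag per transition and bootstraps the value target only when the environment truly terminated. The field names (`terminal`, `timeout`) and the helpers `make_transition` and `td_target` are hypothetical and do not describe this repository's dataset format.

```python
def make_transition(obs, action, reward, next_obs, env_done, step, max_steps):
    """Build one dataset record with two distinct end-of-episode flags."""
    # The episode was cut by the time limit only if the env itself did not end it.
    timeout = (step >= max_steps) and not env_done
    return {
        "obs": obs,
        "action": action,
        "reward": reward,
        "next_obs": next_obs,
        "terminal": env_done,  # environmental terminal flag
        "timeout": timeout,    # time-limit flag
    }


def td_target(reward, next_value, terminal, gamma=0.99):
    """One-step TD target that keeps bootstrapping through time-limit endings."""
    not_terminal = 0.0 if terminal else 1.0
    return reward + gamma * next_value * not_terminal


# Episode cut off at the time limit: the value of the next state still counts.
cut = make_transition([0.1], [0.0], 1.0, [0.2], env_done=False, step=1000, max_steps=1000)
target_cut = td_target(cut["reward"], 10.0, cut["terminal"], gamma=0.5)

# Episode ended by the environment (e.g. the robot fell): no bootstrapping.
fell = make_transition([0.1], [0.0], 1.0, [0.2], env_done=True, step=37, max_steps=1000)
target_fell = td_target(fell["reward"], 10.0, fell["terminal"], gamma=0.5)
```

Keeping both flags lets a consumer of the dataset still detect episode boundaries (`terminal or timeout`) while treating only true terminations as zero-value states.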
I'm running experiments to obtain expert performance now. Once that succeeds, I'll upload the expert datasets.
Sorry for the late update. I've now changed some default parameters and added some code. The performance seems to be good. I'll share the performance in 24h :)
Looks good for Hopper, but the performance is a bit low for HalfCheetah and Ant... Probably due to the time limit...
I tried several random seeds, but the results were very similar. The hyperparameters are as below:
I suspect the time limit flag, as you said.
Well, handling the time limit will help (see "Influence of the time feature" in the appendix here), but I would also recommend that you update the hyperparameters (those work but require 2M steps).
Hi, is there any update on the expert datasets?
Hello,
I'm glad someone finally made that repo ;)
I was wondering if you would be interested in expert datasets? (where behavior cloning should perform relatively well)
I would use SB3 and the tuned hyperparameters from the zoo for that: https://github.com/DLR-RM/rl-baselines3-zoo
Performance report: DLR-RM/stable-baselines3#48