
Expert datasets #2

Open
araffin opened this issue Aug 26, 2020 · 10 comments

Comments

@araffin

araffin commented Aug 26, 2020

Hello,

I'm glad someone finally made this repo ;)
I was wondering if you would be interested in expert datasets (where behavior cloning should perform relatively well)?

I would use SB3 and the tuned hyperparameters from the zoo for that: https://github.com/DLR-RM/rl-baselines3-zoo

Performance report: DLR-RM/stable-baselines3#48
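For reference, generating such a dataset with SB3 could look roughly like the sketch below (the environment id, training length, dataset size, and save format are illustrative choices, not the zoo's tuned setup):

```python
# Rough sketch: train SAC with Stable-Baselines3 on a PyBullet env, then roll out
# the trained policy to collect an "expert" dataset. Default SB3 hyperparameters
# are used here for brevity; the zoo's tuned values would replace them.
import gym
import numpy as np
import pybullet_envs  # noqa: F401  (registers the *BulletEnv-v0 environments)
from stable_baselines3 import SAC

env = gym.make("HalfCheetahBulletEnv-v0")
model = SAC("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=1_000_000)

# Collect deterministic rollouts from the trained policy.
observations, actions, rewards, terminals = [], [], [], []
obs = env.reset()
for _ in range(100_000):  # dataset size (illustrative)
    action, _ = model.predict(obs, deterministic=True)
    next_obs, reward, done, info = env.step(action)
    observations.append(obs)
    actions.append(action)
    rewards.append(reward)
    terminals.append(done)
    obs = env.reset() if done else next_obs

np.savez(
    "halfcheetah_bullet_expert.npz",  # hypothetical file name
    observations=np.array(observations),
    actions=np.array(actions),
    rewards=np.array(rewards),
    terminals=np.array(terminals),
)
```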

araffin changed the title from "Expert dataset" to "Expert datasets" on Aug 26, 2020
@takuseno
Owner

@araffin Hello! Thank you for reaching out about my repository!

Of course, I'm interested in expert datasets! One reason I did not make them was that I was not sure how well expert policies should perform in PyBullet environments, since most papers use MuJoCo environments instead of PyBullet.

I took a quick look at the performance report you posted. The performance seems good enough to call it expert. Would you kindly share the expert datasets, or would you like me to run my SAC for 1M steps to make the datasets?
I just want to know your preference :)

Thank you!

@araffin
Author

araffin commented Aug 26, 2020

> since most papers use MuJoCo environments instead of PyBullet.

yes, that's a shame.

> Would you kindly share the expert datasets, or would you like me to run my SAC for 1M steps to make the datasets?

I don't have much time this week, but I could share them. I have been working with PyBullet envs for almost two years now and the reported performances are expert-level ;)
Otherwise, you can run SAC for 2M steps instead of 1M to reach maximal performance, with the corresponding episodic returns (see the evaluation sketch below):
HalfCheetah: ~3000
Ant: ~3300
Hopper: ~2500
Walker: ~2500 (hard to reach)
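Checking a trained agent against these returns could be done with SB3's evaluation helper; the environment id and checkpoint name below are hypothetical:

```python
# Sketch: verify that a trained agent reaches the expert-level episodic return.
import gym
import pybullet_envs  # noqa: F401
from stable_baselines3 import SAC
from stable_baselines3.common.evaluation import evaluate_policy

env = gym.make("HopperBulletEnv-v0")
model = SAC.load("sac_hopper_bullet", env=env)  # hypothetical checkpoint
mean_return, std_return = evaluate_policy(model, env, n_eval_episodes=20)
print(f"mean return: {mean_return:.0f} +/- {std_return:.0f}")  # expert Hopper: ~2500
```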

There is one detail we need to agree on, though: to account for termination due to the time limit, I add a time feature to the input observation. Another way of solving the problem is to set done=False when the termination is due to the time limit (as done in the TD3 paper, for instance).
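A minimal sketch of the time-feature idea as a standalone gym wrapper (an illustrative re-implementation; the RL zoo ships a similar TimeFeatureWrapper). The alternative mentioned above can rely on the TimeLimit.truncated flag that gym's TimeLimit wrapper writes into info on timeouts.

```python
# Sketch: append the remaining fraction of the episode to each observation so the
# agent can "see" the approaching time limit. Illustrative re-implementation;
# max_steps=1000 matches the PyBullet locomotion envs' default episode length.
import gym
import numpy as np


class TimeFeatureWrapper(gym.Wrapper):
    def __init__(self, env, max_steps=1000):
        super().__init__(env)
        self._max_steps = max_steps
        self._step = 0
        low = np.append(env.observation_space.low, 0.0)
        high = np.append(env.observation_space.high, 1.0)
        self.observation_space = gym.spaces.Box(low=low, high=high, dtype=np.float32)

    def reset(self, **kwargs):
        self._step = 0
        return self._add_time(self.env.reset(**kwargs))

    def step(self, action):
        self._step += 1
        obs, reward, done, info = self.env.step(action)
        return self._add_time(obs), reward, done, info

    def _add_time(self, obs):
        remaining = 1.0 - self._step / self._max_steps  # 1.0 at reset, 0.0 at the limit
        return np.append(obs, remaining).astype(np.float32)
```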

@takuseno
Owner

takuseno commented Aug 26, 2020

Okay, then let me try to reproduce the performance with my own SAC implementation in this repository. I'll post an update here!

> There is one detail we need to agree on, though: to account for termination due to the time limit

Although I'm aware of this problem, I have not handled it yet because the original d4rl does not seem to handle it either. So I'll make the expert datasets without differentiating the two cases. In a future version, we could add an environmental terminal flag alongside the episodic terminal flag. What do you think?
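If it helps, storing both flags could look like the sketch below, mirroring d4rl's terminals/timeouts naming (the collection loop and the policy callable are illustrative):

```python
# Sketch: record both a true environment-terminal flag and a separate timeout flag,
# following d4rl's terminals/timeouts naming. The loop assumes the old gym step API
# and gym's TimeLimit wrapper, which sets info["TimeLimit.truncated"] on timeouts.
import numpy as np


def collect_episode(env, policy):
    observations, actions, rewards, terminals, timeouts = [], [], [], [], []
    obs = env.reset()
    done = False
    while not done:
        action = policy(obs)
        next_obs, reward, done, info = env.step(action)
        timeout = info.get("TimeLimit.truncated", False)
        observations.append(obs)
        actions.append(action)
        rewards.append(reward)
        terminals.append(done and not timeout)  # true environment termination
        timeouts.append(timeout)                # episode cut off by the time limit
        obs = next_obs
    return dict(
        observations=np.array(observations),
        actions=np.array(actions),
        rewards=np.array(rewards),
        terminals=np.array(terminals),
        timeouts=np.array(timeouts),
    )
```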

@takuseno
Owner

takuseno commented Aug 26, 2020

I'm running experiments to obtain expert performance now. Once that succeeds, I'll upload the expert datasets.

@takuseno
Owner

Sorry for the late update. I've now changed some default parameters and added some code. The performance seems to be good. I'll share the results within 24h :)

@takuseno
Owner

takuseno commented Aug 28, 2020

These are the evaluation results for a single random seed. Maybe I need to run multiple seeds to get the maximum performance.

[three evaluation return curves: HalfCheetah, Hopper, and Ant]

@araffin
Author

araffin commented Sep 20, 2020

Looks good for Hopper, but the performance is a bit low for HalfCheetah and Ant... Probably due to the time limit...
What hyperparameters did you use?

@takuseno
Owner

I did try several random seeds, but they gave very similar results. The hyperparameters are as follows:

  • learning rate (including the Lagrangian multiplier): 3e-4
  • tau: 0.005
  • gamma: 0.99
  • hidden units: [256, 256]

I suspect the time limit flag, as you said.
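For comparison, the same settings expressed with Stable-Baselines3's SAC would look roughly like this (shown only because its signature is public; this is not the repository's own SAC API, and the environment id is illustrative):

```python
# Sketch: the reported hyperparameters written out with Stable-Baselines3's SAC.
import gym
import pybullet_envs  # noqa: F401
from stable_baselines3 import SAC

env = gym.make("AntBulletEnv-v0")
model = SAC(
    "MlpPolicy",
    env,
    learning_rate=3e-4,  # shared by actor, critic, and the entropy coefficient
    tau=0.005,           # target network soft-update rate
    gamma=0.99,          # discount factor
    policy_kwargs=dict(net_arch=[256, 256]),  # two hidden layers of 256 units
)
model.learn(total_timesteps=2_000_000)
```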

@araffin
Author

araffin commented Sep 21, 2020

Well, handling the time limit will help (see "Influence of the time feature" in the appendix here), but I would also recommend updating the hyperparameters (those work but require 2M steps).
You can look at that paper or the RL zoo for tuned hyperparameters ;)

@ryanxhr

ryanxhr commented Jun 3, 2021

Hi, is there any update on the expert datasets?
