
Expert datasets #2

Open
araffin opened this issue Aug 26, 2020 · 10 comments

Comments

@araffin

araffin commented Aug 26, 2020

Hello,

I'm glad someone finally made this repo ;)
I was wondering if you would be interested in expert datasets (where behavior cloning should perform relatively well)?

I would use SB3 and the tuned hyperparameters from the zoo for that: https://github.com/DLR-RM/rl-baselines3-zoo

Performance report: DLR-RM/stable-baselines3#48
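For reference, generating such a dataset with SB3 could look roughly like the sketch below (the environment id, training length, dataset size, and save format are illustrative choices, not the zoo's tuned setup):

```python
# Rough sketch: train SAC with Stable-Baselines3 on a PyBullet env, then roll out
# the trained policy to collect an "expert" dataset. Default SB3 hyperparameters
# are used here for brevity; the zoo's tuned values would replace them.
import gym
import numpy as np
import pybullet_envs  # noqa: F401  (registers the *BulletEnv-v0 environments)
from stable_baselines3 import SAC

env = gym.make("HalfCheetahBulletEnv-v0")
model = SAC("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=1_000_000)

# Collect deterministic rollouts from the trained policy.
observations, actions, rewards, terminals = [], [], [], []
obs = env.reset()
for _ in range(100_000):  # dataset size (illustrative)
    action, _ = model.predict(obs, deterministic=True)
    next_obs, reward, done, info = env.step(action)
    observations.append(obs)
    actions.append(action)
    rewards.append(reward)
    terminals.append(done)
    obs = env.reset() if done else next_obs

np.savez(
    "halfcheetah_bullet_expert.npz",  # hypothetical file name
    observations=np.array(observations),
    actions=np.array(actions),
    rewards=np.array(rewards),
    terminals=np.array(terminals),
)
```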

araffin changed the title from "Expert dataset" to "Expert datasets" on Aug 26, 2020
@takuseno
Owner

@araffin Hello! Thank you for reaching out about my repository!

Of course, I'm interested in expert datasets! One reason I did not make them was that I was not sure how well expert policies should perform in PyBullet environments, since most papers use MuJoCo environments instead of PyBullet.

I took a quick look at the performance report you posted. The performance seems good enough to call it expert. Would you kindly share the expert datasets, or would you like me to run my SAC for 1M steps to make the datasets?
I just want to know your preference :)

Thank you!

@araffin
Author

araffin commented Aug 26, 2020

> since most papers use MuJoCo environments instead of PyBullet.

yes, that's a shame.

> Would you kindly share the expert datasets, or would you like me to run my SAC for 1M steps to make the datasets?

I don't have much time this week, but I could share them. I have been working with PyBullet envs for almost two years now and the reported performances are expert-level ;)
Otherwise, you can run SAC for 2M steps instead of 1M to reach maximal performance, with the corresponding episodic returns (see the evaluation sketch below):
HalfCheetah: ~3000
Ant: ~3300
Hopper: ~2500
Walker: ~2500 (hard to reach)
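Checking a trained agent against these returns could be done with SB3's evaluation helper; the environment id and checkpoint name below are hypothetical:

```python
# Sketch: verify that a trained agent reaches the expert-level episodic return.
import gym
import pybullet_envs  # noqa: F401
from stable_baselines3 import SAC
from stable_baselines3.common.evaluation import evaluate_policy

env = gym.make("HopperBulletEnv-v0")
model = SAC.load("sac_hopper_bullet", env=env)  # hypothetical checkpoint
mean_return, std_return = evaluate_policy(model, env, n_eval_episodes=20)
print(f"mean return: {mean_return:.0f} +/- {std_return:.0f}")  # expert Hopper: ~2500
```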

There is one detail we need to agree on, though: to account for termination due to the time limit, I add a time feature to the input observation. Another way of solving the problem is to set done=False when the termination is due to the time limit (as done in the TD3 paper, for instance).
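A minimal sketch of the time-feature idea as a standalone gym wrapper (an illustrative re-implementation; the RL zoo ships a similar TimeFeatureWrapper). The alternative mentioned above can rely on the TimeLimit.truncated flag that gym's TimeLimit wrapper writes into info on timeouts.

```python
# Sketch: append the remaining fraction of the episode to each observation so the
# agent can "see" the approaching time limit. Illustrative re-implementation;
# max_steps=1000 matches the PyBullet locomotion envs' default episode length.
import gym
import numpy as np


class TimeFeatureWrapper(gym.Wrapper):
    def __init__(self, env, max_steps=1000):
        super().__init__(env)
        self._max_steps = max_steps
        self._step = 0
        low = np.append(env.observation_space.low, 0.0)
        high = np.append(env.observation_space.high, 1.0)
        self.observation_space = gym.spaces.Box(low=low, high=high, dtype=np.float32)

    def reset(self, **kwargs):
        self._step = 0
        return self._add_time(self.env.reset(**kwargs))

    def step(self, action):
        self._step += 1
        obs, reward, done, info = self.env.step(action)
        return self._add_time(obs), reward, done, info

    def _add_time(self, obs):
        remaining = 1.0 - self._step / self._max_steps  # 1.0 at reset, 0.0 at the limit
        return np.append(obs, remaining).astype(np.float32)
```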

@takuseno
Owner

takuseno commented Aug 26, 2020

Okay, then let me try to reproduce the performance with my own SAC implementation in this repository. I'll post an update here!

> There is one detail we need to agree on, though: to account for termination due to the time limit

Although I'm aware of this problem, I have not handled it yet because the original d4rl does not seem to handle it either. So I'll make the expert datasets without differentiating the two cases. In a future version, we could add an environmental terminal flag alongside the episodic terminal flag. What do you think?
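If it helps, storing both flags could look like the sketch below, mirroring d4rl's terminals/timeouts naming (the collection loop and the policy callable are illustrative):

```python
# Sketch: record both a true environment-terminal flag and a separate timeout flag,
# following d4rl's terminals/timeouts naming. The loop assumes the old gym step API
# and gym's TimeLimit wrapper, which sets info["TimeLimit.truncated"] on timeouts.
import numpy as np


def collect_episode(env, policy):
    observations, actions, rewards, terminals, timeouts = [], [], [], [], []
    obs = env.reset()
    done = False
    while not done:
        action = policy(obs)
        next_obs, reward, done, info = env.step(action)
        timeout = info.get("TimeLimit.truncated", False)
        observations.append(obs)
        actions.append(action)
        rewards.append(reward)
        terminals.append(done and not timeout)  # true environment termination
        timeouts.append(timeout)                # episode cut off by the time limit
        obs = next_obs
    return dict(
        observations=np.array(observations),
        actions=np.array(actions),
        rewards=np.array(rewards),
        terminals=np.array(terminals),
        timeouts=np.array(timeouts),
    )
```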

@takuseno
Owner

takuseno commented Aug 26, 2020

I'm running experiments to obtain expert performance now. Once that succeeds, I'll upload the expert datasets.

@takuseno
Owner

Sorry for the late update. I've now changed some default parameters and added some code. The performance seems to be good. I'll share the results within 24h :)

@takuseno
Owner

takuseno commented Aug 28, 2020

These are the evaluation results for a single random seed. Maybe I need to run multiple seeds to get the maximum performance.

[three evaluation return curves: HalfCheetah, Hopper, and Ant]

@araffin
Author

araffin commented Sep 20, 2020

Looks good for Hopper, but the performance is a bit low for HalfCheetah and Ant... Probably due to the time limit...
What hyperparameters did you use?

@takuseno
Owner

I did try several random seeds, but they gave very similar results. The hyperparameters are as follows:

  • learning rate (including the Lagrangian multiplier): 3e-4
  • tau: 0.005
  • gamma: 0.99
  • hidden units: [256, 256]

I suspect the time limit flag, as you said.
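For comparison, the same settings expressed with Stable-Baselines3's SAC would look roughly like this (shown only because its signature is public; this is not the repository's own SAC API, and the environment id is illustrative):

```python
# Sketch: the reported hyperparameters written out with Stable-Baselines3's SAC.
import gym
import pybullet_envs  # noqa: F401
from stable_baselines3 import SAC

env = gym.make("AntBulletEnv-v0")
model = SAC(
    "MlpPolicy",
    env,
    learning_rate=3e-4,  # shared by actor, critic, and the entropy coefficient
    tau=0.005,           # target network soft-update rate
    gamma=0.99,          # discount factor
    policy_kwargs=dict(net_arch=[256, 256]),  # two hidden layers of 256 units
)
model.learn(total_timesteps=2_000_000)
```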

@araffin
Author

araffin commented Sep 21, 2020

Well, handling the time limit will help (see "Influence of the time feature" in the appendix here), but I would also recommend updating the hyperparameters (those work but require 2M steps).
You can look at that paper or the RL zoo for tuned hyperparameters ;)

@ryanxhr

ryanxhr commented Jun 3, 2021

Hi, is there any update on the expert datasets?
