
How to run BlackDROPS with GP-MI #13

Open
urnotmeeto opened this issue May 27, 2020 · 4 comments
@urnotmeeto

Have you implemented the BlackDROPS with GP-MI algorithm that was proposed in your ICRA 2018 paper in this repo? I am very interested in that idea and wondering how to replicate your experimental results.

@costashatz
Member

First of all, thank you for your interest.

Have you implemented the BlackDROPS with GP-MI algorithm that was proposed in your ICRA 2018 paper in this repo?

Yes and no. Yes because we have already implemented the GP-MI optimization procedure (see here), but no because we haven't included an example usage.
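The core GP-MI idea (as described in the ICRA 2018 paper: a Gaussian process whose mean function is a tunable parametric model, so the GP only has to learn the residual) can be sketched roughly as follows. This is an illustrative toy sketch, not the repository's implementation; all function names are hypothetical, and the least-squares mean identification stands in for the black-box optimization used in practice:

```python
import numpy as np

def parametric_mean(X, theta):
    # stand-in for a parameterized dynamics model (e.g. a simulator);
    # here just a linear model for illustration
    return X @ theta

def fit_mean_params(X, y):
    # identify the mean parameters from data (least squares as a toy
    # stand-in for the actual black-box optimization)
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

def gp_posterior_mean(X_train, resid, X_query, lengthscale=1.0, noise=1e-6):
    # zero-mean GP with an RBF kernel, fitted on the residuals only
    def k(A, B):
        d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d / lengthscale**2)
    K = k(X_train, X_train) + noise * np.eye(len(X_train))
    return k(X_query, X_train) @ np.linalg.solve(K, resid)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
true_theta = np.array([1.5, -0.7])
y = X @ true_theta + 0.01 * rng.normal(size=20)

theta = fit_mean_params(X, y)                       # identify the mean model
resid = y - parametric_mean(X, theta)               # what the GP must explain
pred = parametric_mean(X, theta) + gp_posterior_mean(X, resid, X)
```

The prediction is the identified mean model plus the GP correction, which is what lets a rough prior model be combined with data.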

Let me create an example in the cartpole scenario (which is easy and fast to do), and I will ping you. Give me until the 15th of June, as I have a few urgent things to finish before then.

@costashatz costashatz self-assigned this May 27, 2020
@urnotmeeto
Author

That would be great! I'll check the optimization procedure before you release an example.
Thank you very much!

@costashatz
Member

costashatz commented Jul 10, 2020

@urnotmeeto sorry for being almost a month late, but lots of things came up.

I have created a branch with an example of using GP-MI with the cartpole: gp_mi_example. Compile everything, and then you can run the example with: ./deps/limbo/build/exp/blackdrops/src/classic_control/cartpole_mi_simu -m -1 -r 1 -n 10 -b 5 -e 1 -u -s. Replace simu with graphic to visualize what's going on. I am still debugging it for possible errors/mistakes (there is something fishy going on in the initial optimization), but it should be a good enough starting point for you, and I do not want you to wait any longer.

The process starts by optimizing the mean model first (with an initial guess of the optimization variables of the mean --- different from the actual system), and then proceeds with the normal loop of optimizing the model and then the policy given the model. Beware that both the model optimization and the policy optimization will take much longer (we are calling the mean function every time we query the model).
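The loop described above can be sketched as follows. This is a hedged toy sketch of the control flow only, with dummy stand-ins for every step; none of these function names correspond to the repository's actual API:

```python
import numpy as np

def optimize_mean_model(data):
    # identify the parametric mean model from data, starting from an
    # initial guess that differs from the actual system (toy: least squares)
    X, y = data
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

def learn_model(data, theta):
    # GP residual learning would go here; toy version returns the residuals
    X, y = data
    return y - X @ theta

def optimize_policy(model):
    # policy search given the model (e.g. a black-box optimizer over
    # policy parameters) would go here; toy version returns fixed params
    return {"params": np.zeros(3)}

def rollout(policy):
    # executing the policy on the real system; toy version returns fake data
    rng = np.random.default_rng(1)
    X = rng.normal(size=(10, 2))
    return X, X @ np.array([1.0, 2.0])

data = rollout(policy=None)
theta = optimize_mean_model(data)        # step 1: identify the mean model
for episode in range(3):                 # step 2: the normal BlackDROPS loop
    residual_model = learn_model(data, theta)   # model given data
    policy = optimize_policy(residual_model)    # policy given model
    data = rollout(policy)                      # new data from the system
```

The extra cost mentioned above comes from the mean model being evaluated inside every model query during both learning and policy search.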

@urnotmeeto
Author

@costashatz Great! I'll check it. Thank you!
