[rllib] Add squash_to_range model option #2239
Test PASSed.
Test FAILed.
Test PASSed.
Test FAILed.
Can you check for regressions on pendulum-ppo?
It's already done; works fine.
On Tue, Jun 19, 2018 at 1:20 PM, Richard Liaw wrote:
> Can you check for regressions on pendulum-ppo?
Test PASSed.
* 'master' of https://github.com/ray-project/ray: (157 commits)
  Fix build failure while using make -j1. Issue 2257 (ray-project#2279)
  Cast locator with index type (ray-project#2274)
  fixing zero length partitions (ray-project#2237)
  Make actor handles work in Python mode. (ray-project#2283)
  [xray] Add error table and push error messages to driver through node manager. (ray-project#2256)
  addressing comments (ray-project#2210)
  Re-enable some actor tests. (ray-project#2276)
  Experimental: enable automatic GCS flushing with configurable policy. (ray-project#2266)
  [xray] Sets good object manager defaults. (ray-project#2255)
  [tune] Update Trainable doc to expose interface (ray-project#2272)
  [rllib] Add a simple REST policy server and client example (ray-project#2232)
  [asv] Pushing to s3 (ray-project#2246)
  [rllib] Remove need to pass around registry (ray-project#2250)
  Support multiple availability zones in AWS (fix ray-project#2177) (ray-project#2254)
  [rllib] Add squash_to_range model option (ray-project#2239)
  Mitigate randomly building failure: adding gen_local_scheduler_fbs to raylet lib. (ray-project#2271)
  [rllib] Refactor Multi-GPU for PPO (ray-project#1646)
  [rllib] Envs for vectorized execution, async execution, and policy serving (ray-project#2170)
  [Dataframe] Change pandas and ray.dataframe imports (ray-project#1942)
  [Java] Replace binary rewrite with Remote Lambda Cache (SerdeLambda) (ray-project#2245)
  ...
What do these changes do?
PPO, A3C, and PG currently ignore the `low`/`high` bounds of a `Box` action space and can emit actions outside that range. This PR adds a `squash_to_range` model option that applies `tf.sigmoid` to squash the policy outputs into [0, 1] and then rescales them to [low, high].
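A minimal sketch of that transform, written in plain Python rather than TensorFlow and not taken from this PR: the helper `squash_to_range` below is hypothetical. It applies a sigmoid to each raw output and then maps the result affinely onto that dimension's `[low, high]` bounds.

```python
import math
from typing import List, Sequence


def _sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid, mapping (-inf, inf) to [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so exp(x) cannot overflow
    return z / (1.0 + z)


def squash_to_range(raw: Sequence[float],
                    low: Sequence[float],
                    high: Sequence[float]) -> List[float]:
    """Hypothetical helper: squash unbounded outputs into per-dimension bounds.

    Step 1: sigmoid squashes each raw value into [0, 1].
    Step 2: low + s * (high - low) rescales that onto [low, high].
    """
    if not (len(raw) == len(low) == len(high)):
        raise ValueError("raw, low and high must have the same length")
    out = []
    for x, lo, hi in zip(raw, low, high):
        if hi < lo:
            raise ValueError("each high bound must be >= its low bound")
        s = _sigmoid(x)
        out.append(lo + s * (hi - lo))
    return out


if __name__ == "__main__":
    # Bounds of [-2, 2]: a raw output of 0 maps to the midpoint 0.0,
    # and large-magnitude outputs approach (but stay within) the bounds.
    print(squash_to_range([0.0, 10.0, -10.0], [-2.0] * 3, [2.0] * 3))
```

In real arithmetic the sigmoid never reaches 0 or 1, so actions only approach the bounds; in floating point, very large raw outputs saturate to exactly `low` or `high`, which is still inside the valid range.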
Related issue number
#1862