[rllib] Envs for vectorized execution, async execution, and policy serving #2170
ericl merged 85 commits into ray-project:master
Conversation
What do these changes do?
* **Vectorized envs**: Users can either implement `VectorEnv`, or alternatively set `num_envs=N` to auto-vectorize gym envs (this vectorizes just the action computation part); see the usage sketch after this list.
* **Async envs**: The more general form of `VectorEnv` is `AsyncVectorEnv`, which allows agents to execute out of lockstep. We use this as an adapter to support `ServingEnv`. Since we can convert any other form of env to `AsyncVectorEnv`, `utils.sampler` has been rewritten to run against this interface.
* **Policy serving**: This provides an env which is not stepped. Rather, the env executes in its own thread, querying the policy for actions via `self.get_action(obs)` and reporting results via `self.log_returns(rewards)`. We also support logging of off-policy actions via `self.log_action(obs, action)`. This is a more convenient API for some use cases, and it also provides parallelizable support for policy serving (for example, if you start an HTTP server in the env) and for ingest of offline logs (if the env reads from serving logs); see the sketch after this list.

Any of these env types can be passed to RLlib agents. RLlib handles the conversions internally in `CommonPolicyEvaluator`, for example:
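The sketch below illustrates the idea: every supported env type is reduced to the most general `AsyncVectorEnv` form before sampling. `VectorEnv`, `AsyncVectorEnv`, and `ServingEnv` are the interfaces this PR describes; the import paths and the adapter helpers in the sketch are assumptions for illustration, not the actual API.

```python
import gym

# Import paths and adapter helpers are assumptions; only the class names
# VectorEnv / AsyncVectorEnv / ServingEnv come from the description above.
from ray.rllib.async_vector_env import AsyncVectorEnv  # assumed path
from ray.rllib.vector_env import VectorEnv              # assumed path
from ray.rllib.serving_env import ServingEnv            # assumed path


def to_async_vector_env(env, make_env=None, num_envs=1):
    """Normalize any supported env type to an AsyncVectorEnv (sketch)."""
    if isinstance(env, AsyncVectorEnv):
        return env  # already in the most general form
    if isinstance(env, ServingEnv):
        return AsyncVectorEnv.wrap_serving(env)  # assumed adapter
    if isinstance(env, VectorEnv):
        return AsyncVectorEnv.wrap_vector(env)   # assumed adapter
    if isinstance(env, gym.Env):
        # Plain gym env: auto-vectorize into num_envs copies first
        # (this vectorizes just the action computation part).
        vector_env = VectorEnv.wrap(make_env=make_env, num_envs=num_envs)
        return AsyncVectorEnv.wrap_vector(vector_env)
    raise ValueError("Unsupported env type: {}".format(type(env)))
```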
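For the auto-vectorization path, usage might look like the following. The `num_envs` option name comes from the description above; the agent class, import path, and config handling are assumptions for illustration.

```python
import ray
from ray.rllib.ppo import PPOAgent, DEFAULT_CONFIG  # assumed import path

ray.init()

# num_envs=N auto-vectorizes the gym env: N copies are stepped, and only
# the action computation is batched across them.
config = DEFAULT_CONFIG.copy()
config["num_envs"] = 4

agent = PPOAgent(config=config, env="CartPole-v0")
for _ in range(3):
    result = agent.train()
    print(result["episode_reward_mean"])
```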
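A policy-serving env, per the API described above, might be sketched as follows. `get_action`, `log_returns`, and `log_action` are the calls named in the description; the constructor arguments, the `run()` entry point, the import path, and any episode bookkeeping are assumptions and may differ from the actual interface.

```python
import gym
from ray.rllib.serving_env import ServingEnv  # assumed import path


class CartPoleServing(ServingEnv):
    """Serves a policy against a local gym env (sketch)."""

    def __init__(self):
        env = gym.make("CartPole-v0")
        # Constructor signature is an assumption for illustration.
        ServingEnv.__init__(self, env.action_space, env.observation_space)
        self._env = env

    def run(self):
        # Runs in its own thread: the env is never step()ed by RLlib.
        # Instead it queries the policy for actions and reports rewards.
        obs = self._env.reset()
        while True:
            action = self.get_action(obs)      # query the policy
            obs, reward, done, _ = self._env.step(action)
            self.log_returns(reward)           # report the reward
            # Off-policy actions chosen outside the policy could instead be
            # reported with self.log_action(obs, action).
            if done:
                obs = self._env.reset()
```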
TODO:
Related issue number
#2053