[rllib] Refactor Multi-GPU for PPO #1646
Conversation
Test FAILed.
Looks like this needs to be rebased (also some PAL stuff leaked in).
we should get rid of this
The file? Or the line `from __future__ import absolute_import`? We include that line at the top of every Python file.
ah this comment is for the file
python/ray/rllib/dqn/models.py
Outdated
we should revert all the changes in this file
python/ray/rllib/models/catalog.py
Outdated
we should revert all the changes in this file
python/ray/rllib/ppo/rollout.py
Outdated
we should just delete this file
python/ray/rllib/dqn/dqn.py
Outdated
and revert changes here
@victorsun123 can you fix so that this isn't a hack?
python/ray/rllib/utils/__init__.py
Outdated
@victorsun123 is this still in progress?
I have to clean it up, yeah.
GPU IDs are not detected.
@@ -0,0 +1,13 @@
from __future__ import absolute_import
We should support seeding (this utility was incredibly useful during debugging). If so, we also need to support environment seeding, which is not covered in this utility.
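A seeding utility along these lines would need to cover every RNG source in the rollout path, including the environment. A minimal sketch (all names here are illustrative, not the PR's actual API):

```python
import random

import numpy as np


def seed_everything(seed, env=None):
    """Hypothetical helper: seed all RNG sources used during rollouts."""
    random.seed(seed)
    np.random.seed(seed)
    # A TF graph would additionally need tf.set_random_seed(seed).
    # Environment seeding is the part the existing utility does not cover:
    if env is not None and hasattr(env, "seed"):
        env.seed(seed)
```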
# [e.sample.remote() for e in self.remote_evaluators]))
from ray.rllib.ppo.rollout import collect_samples
samples = collect_samples(self.remote_evaluators,
                          self.timesteps_per_batch)
there's probably a better treatment for this?
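For context, `collect_samples` keeps pulling rollouts from the remote evaluators until the batch is full. A simplified, non-distributed sketch of the idea (the real helper issues `sample.remote()` calls and uses `ray.wait` to overlap them; this version is synchronous):

```python
def collect_samples(evaluators, timesteps_per_batch):
    # Round-robin over evaluators until enough timesteps are gathered.
    batches, total, i = [], 0, 0
    while total < timesteps_per_batch:
        batch = evaluators[i % len(evaluators)].sample()
        batches.append(batch)
        total += len(batch)
        i += 1
    # Concatenate the per-evaluator batches into one sample batch.
    return [row for batch in batches for row in batch]
```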
self.kl_coeff, self.distribution_class, self.config,
self.sess, self.registry)

def init_extra_ops(self, device_losses):
This was odd because of the chain of dependencies:
- Local evaluator creates a model
- MultiGPU creates Variable replicas (which are just refs to the local model)
- These ops are created, which use nodes from (2)
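The chain above can be illustrated with a toy numpy stand-in for the TF graph (all class and function names here are illustrative): the device replicas only hold references to the local model's variables, so the extra ops built on top of them can only be created last.

```python
import numpy as np


class LocalModel:
    # (1) The local evaluator's model owns the canonical weights.
    def __init__(self, size):
        self.weights = np.zeros(size)


class DeviceTower:
    # (2) A per-device replica holds only a reference to the local
    #     model, so it always sees the current weights.
    def __init__(self, model):
        self.model = model

    def loss(self, batch):
        return float(np.dot(self.model.weights, batch))


def init_extra_ops(towers):
    # (3) Extra ops are built from the tower nodes created in (2),
    #     which is why they can only be initialized after the replicas.
    return lambda batch: sum(t.loss(batch) for t in towers) / len(towers)
```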
@ericl this is ready for a read-through.
ericl left a comment:
Looks fine. I'm not too worried about the ugly bits since we'll need to port this to CommonPolicyEvaluator next anyways....
Test PASSed.
What do these changes do?
Refactor Multi-GPU support for the PPO algorithm.
Main notes:
- Each evaluator, including the local evaluator, has a copy of the graph.
- The MultiGPU optimizer also has a copy of the graph, with N gradient op copies, where N is the number of devices exposed.
- Before and after each optimizer step, the many copies of the graph are synchronized.
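The sync-compute-sync cycle can be sketched in numpy as a stand-in for the TF graph (the toy loss used here, with gradient `2w * mean(batch)`, is an assumption purely for illustration):

```python
import numpy as np


def sgd_step_multi_gpu(weights, device_batches, lr=0.1):
    # 1. Broadcast the canonical weights to each device replica.
    # 2. Each replica computes a gradient on its shard of the batch.
    # 3. Average the per-device gradients and apply them once; the
    #    updated weights are then synced back out to all copies.
    grads = []
    for batch in device_batches:
        replica = weights.copy()                    # per-device weight copy
        grads.append(2.0 * replica * batch.mean())  # toy quadratic loss
    avg_grad = np.mean(grads, axis=0)
    return weights - lr * avg_grad
```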
TODOs:
- python /ray/python/ray/rllib/train.py --env CartPole-v1 --run PPO --stop '{"training_iteration": 2}' --config '{"kl_coeff": 1.0, "num_sgd_iter": 10, "sgd_stepsize": 1e-4, "sgd_batchsize": 64, "timesteps_per_batch": 2000, "num_workers": 1, "use_gae": false}' is failing
- Comments/cleanup
- YAPF + lint
With matching weights and dataset, the graph outputs the same loss, KL, and entropy. However, the resulting weights differ after 1 SGD step (also when the Adam optimizer is replaced with plain SGD).
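For debugging this kind of divergence, a small parity check over the flattened weight tensors can localize how far the two graphs drift after one step (a hypothetical helper, not part of the PR):

```python
import numpy as np


def max_weight_diff(weights_a, weights_b):
    # Compare two lists of weight arrays (e.g. the single-GPU graph vs.
    # the multi-GPU graph after one SGD step) and return the largest
    # absolute per-element gap across all tensors.
    return max(float(np.abs(a - b).max())
               for a, b in zip(weights_a, weights_b))
```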