[rllib] Refactor Multi-GPU for PPO #1646
Conversation
Test FAILed.
Looks like this needs to be rebased (also some PAL stuff leaked in).
we should get rid of this
The file? Or the line `from __future__ import absolute_import`? We include that line at the top of every Python file.
ah this comment is for the file
python/ray/rllib/dqn/models.py
Outdated
we should revert all the changes in this file
python/ray/rllib/models/catalog.py
Outdated
we should revert all the changes in this file
python/ray/rllib/ppo/rollout.py
Outdated
we should just delete this file
python/ray/rllib/dqn/dqn.py
Outdated
and revert changes here
@victorsun123 can you fix so that this isn't a hack?
python/ray/rllib/utils/__init__.py
Outdated
@victorsun123 is this still in progress?
I have to clean it up, yeah.
GPU IDs are not detected.
@@ -0,0 +1,13 @@
from __future__ import absolute_import
We should support seeding (this utility was incredibly useful during debugging). If so, we also need to support environment seeding, which is not covered in this utility.
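A seeding utility along these lines would need to cover every RNG source in the rollout path, including the environment. A minimal sketch (all names here are illustrative, not the PR's actual API):

```python
import random

import numpy as np


def seed_everything(seed, env=None):
    """Hypothetical helper: seed all RNG sources used during rollouts."""
    random.seed(seed)
    np.random.seed(seed)
    # A TF graph would additionally need tf.set_random_seed(seed).
    # Environment seeding is the part the existing utility does not cover:
    if env is not None and hasattr(env, "seed"):
        env.seed(seed)
```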
# [e.sample.remote() for e in self.remote_evaluators]))
from ray.rllib.ppo.rollout import collect_samples
samples = collect_samples(self.remote_evaluators,
                          self.timesteps_per_batch)
there's probably a better treatment for this?
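For context, `collect_samples` keeps pulling rollouts from the remote evaluators until the batch is full. A simplified, non-distributed sketch of the idea (the real helper issues `sample.remote()` calls and uses `ray.wait` to overlap them; this version is synchronous):

```python
def collect_samples(evaluators, timesteps_per_batch):
    # Round-robin over evaluators until enough timesteps are gathered.
    batches, total, i = [], 0, 0
    while total < timesteps_per_batch:
        batch = evaluators[i % len(evaluators)].sample()
        batches.append(batch)
        total += len(batch)
        i += 1
    # Concatenate the per-evaluator batches into one sample batch.
    return [row for batch in batches for row in batch]
```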
self.kl_coeff, self.distribution_class, self.config,
self.sess, self.registry)

def init_extra_ops(self, device_losses):
This was odd because of the chain of dependencies:
- Local evaluator creates a model
- MultiGPU creates Variable replicas (which are just refs to the local model)
- These ops are created, which use nodes from (2)
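The chain above can be illustrated with a toy numpy stand-in for the TF graph (all class and function names here are illustrative): the device replicas only hold references to the local model's variables, so the extra ops built on top of them can only be created last.

```python
import numpy as np


class LocalModel:
    # (1) The local evaluator's model owns the canonical weights.
    def __init__(self, size):
        self.weights = np.zeros(size)


class DeviceTower:
    # (2) A per-device replica holds only a reference to the local
    #     model, so it always sees the current weights.
    def __init__(self, model):
        self.model = model

    def loss(self, batch):
        return float(np.dot(self.model.weights, batch))


def init_extra_ops(towers):
    # (3) Extra ops are built from the tower nodes created in (2),
    #     which is why they can only be initialized after the replicas.
    return lambda batch: sum(t.loss(batch) for t in towers) / len(towers)
```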
@ericl this is ready for a read-through.
ericl left a comment:
Looks fine. I'm not too worried about the ugly bits since we'll need to port this to CommonPolicyEvaluator next anyways....
Test PASSed.
What do these changes do?
Refactor Multi-GPU support for the PPO algorithm.
Main notes:
- Each evaluator, including the local evaluator, has a copy of the graph.
- The MultiGPU optimizer also has a copy of the graph, with N gradient op copies, where N is the number of devices exposed.
- Before and after each optimizer step, the many copies of the graph are synchronized.
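The sync-compute-sync cycle can be sketched in numpy as a stand-in for the TF graph (the toy loss used here, with gradient `2w * mean(batch)`, is an assumption purely for illustration):

```python
import numpy as np


def sgd_step_multi_gpu(weights, device_batches, lr=0.1):
    # 1. Broadcast the canonical weights to each device replica.
    # 2. Each replica computes a gradient on its shard of the batch.
    # 3. Average the per-device gradients and apply them once; the
    #    updated weights are then synced back out to all copies.
    grads = []
    for batch in device_batches:
        replica = weights.copy()                    # per-device weight copy
        grads.append(2.0 * replica * batch.mean())  # toy quadratic loss
    avg_grad = np.mean(grads, axis=0)
    return weights - lr * avg_grad
```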
TODOs:
- python /ray/python/ray/rllib/train.py --env CartPole-v1 --run PPO --stop '{"training_iteration": 2}' --config '{"kl_coeff": 1.0, "num_sgd_iter": 10, "sgd_stepsize": 1e-4, "sgd_batchsize": 64, "timesteps_per_batch": 2000, "num_workers": 1, "use_gae": false}' is failing
- Comments/cleanup
- YAPF + lint
With matching weights and dataset, the graph outputs the same loss, KL, and entropy. However, the resulting weights differ after 1 SGD step (also when the Adam optimizer is replaced with plain SGD).
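For debugging this kind of divergence, a small parity check over the flattened weight tensors can localize how far the two graphs drift after one step (a hypothetical helper, not part of the PR):

```python
import numpy as np


def max_weight_diff(weights_a, weights_b):
    # Compare two lists of weight arrays (e.g. the single-GPU graph vs.
    # the multi-GPU graph after one SGD step) and return the largest
    # absolute per-element gap across all tensors.
    return max(float(np.abs(a - b).max())
               for a, b in zip(weights_a, weights_b))
```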