Fix A3C PyTorch implementation by alok · Pull Request #2036 · ray-project/ray

alok · 2018-05-11T11:22:14Z

What do these changes do?

Fixes up old broken torch code.
Ensures that data is the proper shape.
Renames some variables and removes unused ones.
Makes code more idiomatic.

Related issue number

#2021. These changes are a subset of the ones in that PR, broken off to make
review easier.

AmplabJenkins · 2018-05-11T12:31:28Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5335/
Test PASSed.

alok · 2018-05-11T23:03:22Z

python/ray/rllib/a3c/shared_torch_policy.py

        overall_err.backward()
-        torch.nn.utils.clip_grad_norm(
-            self._model.parameters(), self.config["grad_clip"])
+        torch.nn.utils.clip_grad_norm_(self._model.parameters(),


`clip_grad_norm` is deprecated in favor of the in-place underscore version, `clip_grad_norm_`, hence the change.
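For readers unfamiliar with what this call does, here is a minimal pure-Python sketch of global-norm gradient clipping. It is an illustration only, not the PyTorch implementation: the function name `clip_grad_norm_sketch`, the `eps` default, and the list-of-lists gradient layout are assumptions made for the example. The trailing underscore in the real function follows PyTorch's naming convention for operations that modify their arguments in place.

```python
import math


def clip_grad_norm_sketch(grads, max_norm, eps=1e-6):
    """Rescale `grads` in place so their global L2 norm is at most `max_norm`.

    `grads` is a list of lists of floats, standing in for per-parameter
    gradient tensors. Returns the norm measured before clipping.
    """
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    clip_coef = max_norm / (total_norm + eps)
    if clip_coef < 1.0:
        # Only shrink gradients; never scale them up.
        for grad in grads:
            for i in range(len(grad)):
                grad[i] *= clip_coef
    return total_norm


grads = [[3.0], [4.0]]
norm_before = clip_grad_norm_sketch(grads, max_norm=1.0)
norm_after = math.sqrt(sum(g * g for grad in grads for g in grad))
print(norm_before, norm_after)
```

Because all gradients share one scaling factor, their direction is preserved and only the overall magnitude is capped.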

alok · 2018-05-11T23:04:02Z

python/ray/rllib/models/pytorch/model.py

        if initializer:
            initializer(conv.weight)
-        nn.init.constant(conv.bias, bias_init)
+        nn.init.constant_(conv.bias, bias_init)


`nn.init.constant` is deprecated in favor of the in-place underscore version, `nn.init.constant_`, hence the change.

richardliaw

Nice! Overall looks good; have you tested it out?

alok · 2018-05-11T23:32:15Z

Tested on CartPole and it hit 200 reward pretty quickly, so I think it works.

richardliaw · 2018-05-12T00:06:48Z

awesome; can you make sure it runs on Pong? just as a sanity check.

We should seriously add pytorch to the test suite...

richardliaw

lgtm conditioned on Pong running

alok · 2018-05-12T01:48:57Z

As it turns out, Pong doesn't run. I'm getting shape errors with the Conv2d layers, which I think are unrelated to these changes and have just been broken for a while. @richardliaw Do you mind trying out this PR on Pong and taking a look at the errors?

AmplabJenkins · 2018-05-12T02:23:34Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5346/
Test FAILed.

AmplabJenkins · 2018-05-14T20:02:44Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5381/
Test FAILed.

* master:
  Create RemoteFunction class, remove FunctionProperties, simplify worker Python code. (ray-project#2052)
  Don't crash on duplicate actor notifications (ray-project#2043)
  Fixed attribute name in code example (ray-project#2054)
  [xray] Add Travis build for testing xray on Linux. (ray-project#2047)
  Added missing comma to code example (ray-project#2050)
  Use more CPUs for testMultipleWaitsAndGets. (ray-project#2051)
  use jobid_nil (ray-project#2044)
  Fix typo in tune. (ray-project#2046)
  Fix error in api.rst. (ray-project#2048)
  Improve shared_ptr usage (ray-project#2030)

AmplabJenkins · 2018-05-15T01:09:03Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5390/
Test PASSed.

richardliaw · 2018-05-15T02:54:18Z

python/ray/rllib/models/pytorch/visionnet.py

            [16, [8, 8], 4],
            [32, [4, 4], 2],
-            [512, [10, 10], 1]
+            [512, [10, 1], 1],


are you sure about this?

alok

No. This worked so I ran with it, but I don't know much about convnets, so this is just a SWAG. I'd appreciate it if you could take a look.
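One way to sanity-check a filter spec like this is to work out the spatial output size layer by layer. The sketch below is only an illustration under stated assumptions, none of which the thread confirms: an 80x80 input (taken from the `"dim": 80` setting), "same"-style padding on the first two layers so that output = ceil(input / stride), and no padding on the last layer. The helper names `same_out`, `valid_out`, and `final_shape` are hypothetical.

```python
import math


def same_out(size, stride):
    # Assumed "same"-style padding: output size depends only on the stride.
    return math.ceil(size / stride)


def valid_out(size, kernel, stride):
    # No padding: the standard convolution output-size formula.
    return (size - kernel) // stride + 1


def final_shape(dim, last_kernel):
    """Spatial (height, width) after the three-layer filter spec."""
    h = w = dim
    # First two layers from the diff: [16, [8, 8], 4] and [32, [4, 4], 2].
    for _channels, _kernel, stride in [(16, (8, 8), 4), (32, (4, 4), 2)]:
        h, w = same_out(h, stride), same_out(w, stride)
    kh, kw = last_kernel
    # Last layer: 512 channels, stride 1, assumed unpadded.
    return valid_out(h, kh, 1), valid_out(w, kw, 1)


square = final_shape(80, (10, 10))
narrow = final_shape(80, (10, 1))
print(square, narrow)
```

Under these assumptions the `[10, 10]` kernel reduces the feature map to a single position, while `[10, 1]` leaves ten positions along the width. If the real input layout or padding differs, the numbers change, so this is a way to frame the question rather than an answer to it.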

richardliaw · 2018-05-15T23:43:58Z

which pytorch version are you using?

alok · 2018-05-15T23:45:38Z

0.4.0

richardliaw · 2018-05-16T06:00:58Z

I've been using the following script to test:

import copy

import ray
import torch
from ray.rllib.a3c import A3CAgent, DEFAULT_CONFIG

ray.init()

# Deep-copy so that edits to nested dicts ("model", "optimizer") do not
# mutate the shared DEFAULT_CONFIG.
config = copy.deepcopy(DEFAULT_CONFIG)
config["use_pytorch"] = True
config["model"]["channel_major"] = True
config["num_workers"] = 1
config["optimizer"]["grads_per_step"] = 10

import ipdb; ipdb.set_trace()  # debugging breakpoint before the agent is built
agent = A3CAgent(config=config, env="Pong-v0")
evaluator = agent.local_evaluator
agent.train()
policy = evaluator.policy
# Reset the env and turn the first observation into a float batch of size one.
state = evaluator.sampler.env.reset()
ob = torch.from_numpy(state).float().unsqueeze(0)

AmplabJenkins · 2018-05-16T07:04:58Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5417/
Test PASSed.

AmplabJenkins · 2018-05-16T07:10:17Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5418/
Test PASSed.

* master:
  Pin Pandas version for Travis to 0.22 (ray-project#2075)
  Fix python linting (ray-project#2076)
  [xray] Fix GCS table prefixes (ray-project#2065)
  Some tests for _submit API. (ray-project#2062)
  [rllib] Queue lib for python 2.7 (ray-project#2057)
  [autoscaler] Remove faulty assert that breaks during downscaling, pull configs from env (ray-project#2006)
  [DataFrame] Refactor indexers and implement setitem (ray-project#2020)
  [rllib]Update bc/policy.py (ray-project#2012)

AmplabJenkins · 2018-05-29T07:46:14Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5683/
Test PASSed.

richardliaw

OK, I'm going to add regression tests for PyTorch because, for some reason, CartPole-v0 does not work on my machine.

richardliaw · 2018-05-29T18:58:25Z

python/ray/rllib/a3c/a3c.py

        "dim": 80,
        # (Image statespace) - Converts image shape to (C, dim, dim)
-        "channel_major": False
+        "channel_major" : False,


is there supposed to be a space here?

alok · 2018-05-29T21:17:03Z

I'm looking at the best way to overhaul how state/action spaces are handled in torch. Since torch now supports scalars, we should be able to support the same range of envs as TF.

richardliaw · 2018-05-29T21:18:54Z

How much do you think should go in this PR and how much do you think should go into a subsequent one? Keep in mind #2149 is pretty big and will go in soon...

AmplabJenkins · 2018-05-29T21:27:05Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5692/
Test FAILed.

alok · 2018-05-29T21:29:51Z

I think this one should fix the shapes for Pendulum, CartPole, and Pong. Anything else is probably best handled in a followup.

AmplabJenkins · 2018-05-29T22:10:22Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5694/
Test FAILed.

richardliaw · 2018-05-29T22:11:15Z

Ok awesome - Jenkins is running PyTorch tests and Pong is passing while CartPole is not.

AmplabJenkins · 2018-05-29T22:52:46Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5695/
Test FAILed.

alok · 2018-05-29T23:59:39Z

@richardliaw I think the current torch version of A3C only works in discrete action spaces since it always samples from a multinomial distribution, so we could punt supporting Pendulum and related envs for another PR while fixing CartPole and Pong in this PR.
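To make the limitation concrete, here is a small stdlib-only sketch of sampling an action from a categorical (multinomial) distribution over logits. It is not the PR's code; `softmax` and `sample_action` are hypothetical helper names. The point is that the result is always an integer index into a finite set of actions, which fits CartPole and Pong but not an env that expects a real-valued action.

```python
import math
import random


def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, rng=random):
    """Draw one discrete action index with probability softmax(logits)."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


probs = softmax([1.0, 2.0, 0.5])
action = sample_action([1.0, 2.0, 0.5], random.Random(0))
print(probs, action)
```

A continuous-control env such as Pendulum would instead need the policy to output something like the parameters of a continuous distribution, which is the kind of change deferred to a follow-up.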

alok · 2018-05-30T00:23:36Z

@richardliaw @ericl Can you check this? This should be an OK set of changes to merge before the larger overhaul.

Pendulum doesn't work since it's an edge case (expects singleton arrays, which `.squeeze()` collapses to scalars).
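The edge case can be illustrated at the level of shapes alone. The sketch below is not torch code; `squeeze_shape` is a hypothetical helper that mimics what a no-argument squeeze does to a shape, namely dropping every dimension of size 1.

```python
def squeeze_shape(shape):
    """Return the shape left after removing all size-1 dimensions."""
    return tuple(d for d in shape if d != 1)


# A column of 5 per-step values flattens to the intended 1D shape.
batch_of_values = squeeze_shape((5, 1))

# A single one-element action loses its only dimension and becomes a scalar.
single_action = squeeze_shape((1,))

print(batch_of_values, single_action)
```

The first case is the desired flattening; the second is the problem, since an env that expects a one-element array receives a zero-dimensional value instead. Squeezing a specific dimension, rather than all of them, is one common way to avoid this, though the thread does not say which fix was planned.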

alok · 2018-05-30T00:28:26Z

It fails Pendulum but runs on CartPole and Pong.

richardliaw · 2018-05-30T00:33:22Z

Yeah I just came to that decision too - that sounds good to me.

AmplabJenkins · 2018-05-30T01:32:19Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5697/
Test PASSed.

AmplabJenkins · 2018-05-30T01:37:42Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5700/
Test FAILed.

AmplabJenkins · 2018-05-30T01:55:19Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5699/
Test PASSed.

AmplabJenkins · 2018-05-30T06:02:34Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5708/
Test PASSed.

alok · 2018-05-30T17:14:28Z

@richardliaw This passes lint and implements the decision to only support envs like CartPole and Pong (for now).

richardliaw · 2018-05-30T17:48:01Z

Test failures unrelated

richardliaw · 2018-05-30T17:48:25Z

thanks for contributing this!

* master:
  [autoscaler] GCP node provider (ray-project#2061)
  [xray] Evict tasks from the lineage cache (ray-project#2152)
  [ASV] Add ray.init and simple Ray benchmarks (ray-project#2166)
  Re-encrypt key for uploading to S3 from travis to use travis-ci.com. (ray-project#2169)
  [rllib] Fix A3C PyTorch implementation (ray-project#2036)
  [JavaWorker] Do not kill local-scheduler-forked workers in RunManager.cleanup (ray-project#2151)
  Update Travis CI badge from travis-ci.org to travis-ci.com. (ray-project#2155)
  Implement Python global state API for xray. (ray-project#2125)
  [xray] Improve flush algorithm for the lineage cache (ray-project#2130)
  Fix support for actor classmethods (ray-project#2146)
  Add empty df test (ray-project#1879)
  [JavaWorker] Enable java worker support (ray-project#2094)
  [DataFrame] Fixing the code formatting of the tests (ray-project#2123)
  Update resource documentation (remove outdated limitations). (ray-project#2022)
  bugfix: use array redis_primary_addr out of its scope (ray-project#2139)
  Fix infinite retry in Push function. (ray-project#2133)
  [JavaWorker] Changes to the directory under src for support java worker (ray-project#2093)
  Integrate credis with Ray & route task table entries into credis. (ray-project#1841)

alok added 5 commits May 11, 2018 10:53

Use F.softmax instead of a pointless network layer

51901ad

Stateless functions should not be network layers.

Use correct pytorch functions

5d7fc19

Rename argument name to out_size

8583616

Matches in_size and makes more sense.

Fix shapes of tensors

18c4a4c

Advantages and rewards both should be scalars, and therefore a list of them should be 1D.
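As a rough illustration of the shape this commit message describes, the sketch below computes discounted returns and advantages as flat lists with one float per timestep. The discounting scheme, the `gamma` value, and the value estimates are assumptions chosen for the example, not taken from the PR.

```python
def discounted_returns(rewards, gamma):
    """Return one discounted return per timestep as a flat list of floats."""
    returns = []
    running = 0.0
    # Accumulate from the last step backwards: G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(returns, values):
    """Advantage per timestep: return minus the (hypothetical) value estimate."""
    return [g - v for g, v in zip(returns, values)]


returns = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
adv = advantages(returns, [0.5, 0.5, 0.5])
print(returns, adv)
```

Both results have length equal to the number of timesteps and no nesting, which is the 1D layout a loss computation over a batch of steps would expect.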

Fmt

64ae2ab

alok commented May 11, 2018

richardliaw reviewed May 11, 2018

replace deprecated function

8accdae

richardliaw approved these changes May 12, 2018

rm unnecessary Variable wrapper

8645cd7

alok added 4 commits May 14, 2018 23:06

rm all use of torch Variables

47e8ebd

Torch does this for us now.

Ensure that values are flat list

884a6a8

Fix shape error in conv nets

7d1b205

richardliaw reviewed May 15, 2018

richardliaw mentioned this pull request May 16, 2018

Ray fails to serialize Torch tensor. #955

Closed

richardliaw reviewed May 29, 2018

try adding pytorch tests

9acd029

typo

c4b8ca7

fixup docker messages

6a79793

Fix A3C for some envs

7cdedf3

Pendulum doesn't work since it's an edge case (expects singleton arrays, which `.squeeze()` collapses to scalars).

alok and others added 2 commits May 29, 2018 17:51

fmt

da414fc

nit flake

3b9234f

small lint

9ddab77

richardliaw merged commit fd234e3 into ray-project:master May 30, 2018

alok deleted the fix-a3c-torch branch June 1, 2018 05:24
