[rllib, WIP, DO NOT MERGE] TRPO PyTorch implementation by alok · Pull Request #2021 · ray-project/ray

alok · 2018-05-09T09:32:06Z

What do these changes do?

Adds TRPO to RLlib. PyTorch only, for now.

Doesn't work yet. The agent will step through the environment, but there are a few
issues to work out before merging.

- Remote evaluators error out.
- Some agent methods, like _save, still need to be implemented.
- apply_gradients zeros out the gradients that _backward worked so
  hard to produce. I got around that by sticking in a deepcopy, but
  that's obviously not ideal.
- The implementation isn't correct yet. There's a bug or two in it, though
  I don't anticipate it needing a total rewrite.
- There are extra layers of indirection that will need removing.
- There's an annoying use of chain(...) to get around the fact that generators are
  consumed on use.
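A minimal sketch of two of the issues above, assuming a stand-in `nn.Linear` model rather than the PR's actual policy network. `snapshot_grads` is a hypothetical helper, not an RLlib API: it clones gradients so that a later in-place zeroing cannot wipe them, which is a lighter alternative to a full `deepcopy`. The first few lines show why a generator of parameters needs materializing, the problem the `chain(...)` workaround was addressing.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # stand-in for the policy network (assumption)

# model.parameters() returns a generator, so a second pass over the
# same object sees nothing.
gen = model.parameters()
first_pass = list(gen)
second_pass = list(gen)  # empty: the generator is already exhausted

# Materializing the parameters once makes them reusable without chain(...).
params = list(model.parameters())


def snapshot_grads(params):
    """Hypothetical helper: copy gradients before anything zeros them in place."""
    return [p.grad.detach().clone() for p in params]


loss = model(torch.randn(3, 4)).pow(2).sum()
loss.backward()
saved = snapshot_grads(params)

# Zeroing the live gradients in place leaves the snapshot untouched.
for p in params:
    p.grad.zero_()
```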

Related issue number

None, though I think there are some commits worth cherry-picking that fix a few
issues in the PyTorch model code. @rliaw may find some of them relevant.





AmplabJenkins · 2018-05-24T08:09:17Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5618/




AmplabJenkins · 2018-05-30T09:21:02Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5714/



AmplabJenkins · 2018-06-04T00:17:55Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5841/

ericl · 2018-07-19T12:30:05Z

Is this still in progress or should we close?

alok · 2018-07-19T14:43:22Z

I was planning to resubmit this one since it's become obsolete in light of the new RLlib changes.

ericl · 2018-07-19T14:56:58Z

Sounds good!


alok added 30 commits May 2, 2018 05:07

keep_dims -> keepdims

222bf13

keep_dims is deprecated.

WIP

a2ab3f7

Get single sample

7fc25c6

add test scripts

470d56a

temp ones

WIP

19b6510

Silence PyTorch warnings

4a24483

- `constant` is deprecated in favor of `constant_`
- specify the dimension to softmax explicitly
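A sketch of both changes against a stand-in `nn.Linear` layer; the layer sizes and the random batch are arbitrary assumptions. Passing `dim` explicitly makes it unambiguous that the softmax normalizes over actions rather than over the batch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

layer = nn.Linear(8, 3)  # 8 input features, 3 actions (placeholder sizes)

# In-place initializers carry a trailing underscore (constant_, xavier_uniform_, ...).
nn.init.constant_(layer.bias, 0.0)

logits = layer(torch.randn(5, 8))  # shape (batch=5, num_actions=3)

# Normalize over the action dimension (last dim), not the batch.
probs = F.softmax(logits, dim=-1)
```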

Test A3C with only 1 worker

4c52ede

Debug torch script without multithreading getting in the way yet.

Use PyTorch's new scalar support

18b7692

Fixes some bugs that crept in

Use F.mse_loss instead of rolling our own

6745a4b

mse_loss is faster, easier to read, and builtin
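A quick check, assuming the hand-rolled loss averaged squared errors over the batch (the tensors are made up), that `F.mse_loss` with its default mean reduction computes the same value. If the original summed instead, `reduction='sum'` is the matching option.

```python
import torch
import torch.nn.functional as F

values = torch.tensor([0.5, 1.0, 2.0])
targets = torch.tensor([1.0, 1.0, 0.0])

# Hand-rolled version: mean of squared differences.
manual = ((values - targets) ** 2).mean()

# Built-in equivalent; the default reduction is the mean.
builtin = F.mse_loss(values, targets)
```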

Use correct samplebatch key

aaabe16

Write magic methods for SampleBatch/PartialRollout

5e7fe40

Since we typically care about the `.data` attribute, but may not want to just subclass dict (I assume), we can use `dict`'s already implemented magic methods to clean up syntax. These let us just treat rollouts more or less as dicts.
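A minimal sketch of the idea with a hypothetical `RolloutSketch` class; RLlib's real `SampleBatch` and `PartialRollout` may differ. Each magic method forwards to the wrapped `.data` dict, so callers can write `batch["rewards"]` instead of `batch.data["rewards"]`.

```python
class RolloutSketch(object):
    """Hypothetical stand-in for SampleBatch/PartialRollout: columns live in .data."""

    def __init__(self, **columns):
        self.data = dict(columns)

    # Each method delegates to the underlying dict.
    def __getitem__(self, key):
        return self.data[key]

    def __setitem__(self, key, value):
        self.data[key] = value

    def __contains__(self, key):
        return key in self.data

    def __iter__(self):
        return iter(self.data)

    def __len__(self):
        # Number of columns, not number of timesteps.
        return len(self.data)


batch = RolloutSketch(obs=[0, 1, 2], rewards=[1.0, 0.0, 1.0])
batch["dones"] = [False, False, True]
```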

WIP

3d53186

Fix IndentationError

19490f4

rm Variable for torch 0.4.0

0f0a17b

No longer need to explicitly wrap
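For reference, a sketch of the before/after pattern (tensor values are arbitrary): PyTorch 0.4 merged `Variable` into `Tensor`, so `requires_grad` goes directly on the tensor.

```python
import torch

# Before 0.4: x = Variable(torch.ones(3), requires_grad=True)
# From 0.4 on, tensors track gradients themselves.
x = torch.ones(3, requires_grad=True)
y = (2 * x).sum()
y.backward()  # d(sum(2x))/dx = 2 for every element
```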

misc

1dade7c

- rm comments
- try out different params to test if shapes are correct

Fix some shape errors in TRPO

353cff9

Use kl_divergence provided by PyTorch
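TRPO constrains the KL divergence between the old and new policies. A sketch assuming a discrete policy represented as a `torch.distributions.Categorical` (the logits are arbitrary), checking `kl_divergence` against the formula written out by hand:

```python
import torch
from torch.distributions import Categorical
from torch.distributions.kl import kl_divergence

old_dist = Categorical(logits=torch.tensor([[1.0, 0.0, -1.0]]))
new_dist = Categorical(logits=torch.tensor([[0.5, 0.5, -0.5]]))

# KL(old || new), one value per batch row.
kl = kl_divergence(old_dist, new_dist)

# Same quantity by hand: sum_a p_old(a) * (log p_old(a) - log p_new(a)).
# Categorical stores normalized log-probabilities in .logits.
manual = (old_dist.probs * (old_dist.logits - new_dist.logits)).sum(dim=-1)
```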

06611c2

Use detach() over .data

06192a5

detach() can alert you to errors in the computation graph construction
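A sketch of the failure mode this guards against, closely following the example in the PyTorch 0.4 migration guide: an in-place edit through `detach()` makes `backward()` raise, while the same edit through `.data` goes unnoticed and produces a wrong gradient.

```python
import torch

# detach() shares autograd's version counter, so the in-place edit is caught.
a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
out = a.sigmoid()
out.detach().zero_()  # overwrites values that sigmoid's backward needs
try:
    out.sum().backward()
    detach_raised = False
except RuntimeError:
    detach_raised = True

# .data hides the edit from autograd: backward runs on the zeroed
# values and silently returns a wrong (all-zero) gradient.
b = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
out_b = b.sigmoid()
out_b.data.zero_()
out_b.sum().backward()
```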

Rename variables

bc3dca6

old_p is not a good variable name

rm unnecessary probs attribute

779fa5c

Activations should not be named attributes since they carry no state.
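A sketch of the pattern using a hypothetical `PolicyNetSketch` with placeholder sizes: only layers that own parameters are attributes, activations are called functionally in `forward`, and probabilities are returned rather than stored on the module, in the spirit of the `probs` removal above. `super(...)` is spelled out to stay Python 2.7 compatible.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PolicyNetSketch(nn.Module):
    """Hypothetical two-layer policy network; sizes are placeholders."""

    def __init__(self, obs_size, hidden_size, num_actions):
        super(PolicyNetSketch, self).__init__()
        # Only layers that own parameters are registered as attributes.
        self.hidden = nn.Linear(obs_size, hidden_size)
        self.action_head = nn.Linear(hidden_size, num_actions)

    def forward(self, obs):
        # Stateless activations are plain function calls, and the
        # probabilities are returned rather than cached on self.
        h = F.relu(self.hidden(obs))
        return F.softmax(self.action_head(h), dim=-1)


net = PolicyNetSketch(obs_size=4, hidden_size=16, num_actions=2)
probs = net(torch.randn(3, 4))
```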

Fix rewards shape

2c46098

Rewards are always scalars and should be treated as such. Treating a scalar as a 1x1 array is a great way to give yourself a headache.
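An illustration with made-up numbers, assuming advantages are computed as rewards minus value estimates: storing each reward as a length-1 row does not raise an error, it silently broadcasts into the wrong shape.

```python
import numpy as np

values = np.array([0.5, 0.2, 0.9])             # value estimates, shape (3,)
rewards_1x1 = np.array([[1.0], [0.0], [1.0]])  # each reward in a length-1 row, shape (3, 1)
rewards = np.array([1.0, 0.0, 1.0])            # each reward a scalar, shape (3,)

# (3, 1) minus (3,) broadcasts to (3, 3) instead of raising an error.
wrong = rewards_1x1 - values
right = rewards - values
```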

size -> out_size in SlimFC

0125eef

`size` is an awful variable name compared to out_size, especially since `in_size` is defined too.

.size() -> .shape

a3ec08d

Matches NumPy (the return type, torch.Size, is a subclass of tuple).
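A small check of the equivalence (the shape is chosen arbitrarily):

```python
import torch

t = torch.zeros(2, 3)

# .shape and .size() return the same torch.Size, which subclasses tuple,
# so it compares and unpacks like one.
same = t.shape == t.size()
rows, cols = t.shape
```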

Use chain to adjust only action head

6d45db2

WIP

eda75ae

Seems to work now though

rm trailing comma

8ba4c17

This is a syntax error in Python 2.7. I'm counting down the days till end of life.

Update test scripts

2d43fd0

leave note to debug remote_evaluators

fec90b7

Use .item() to extract number from torch scalar
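A sketch with an arbitrary loss tensor: `.item()` returns a plain Python number from a one-element tensor, which is what logging and result dicts usually want.

```python
import torch

loss = torch.tensor([0.25, 0.75]).mean()  # 0-dim tensor

# Convert to a plain Python float, e.g. for logging or JSON-serializable stats.
loss_value = loss.item()
```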

c2592fd

alok added 3 commits May 24, 2018 00:32

Add missing channel major

db9804d

alok and others added 13 commits May 24, 2018 18:18

Merge branch 'fix-a3c-torch' into trpo

a62fa6e

* fix-a3c-torch: Prototype named actors. (ray-project#2129) Update arrow to latest master (ray-project#2100) [DataFrame] Speed up dtypes (ray-project#2118) do not fetch from dead Plasma Manager (ray-project#2116)

Revert reshape of action

27cd897

This should be handled by the agent or at least in a cleaner way that doesn't break existing envs.

Squeeze action

75ea9a7

Squeeze actions along first dimension

87ab87e

This should deal with some cases such as cartpole where actions are scalars while leaving alone cases where actions are arrays (some robotics tasks).
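A sketch of the difference with made-up actions, assuming the policy returns actions with a leading batch dimension of 1:

```python
import numpy as np

cartpole_action = np.array([1])              # discrete: shape (1,)
robot_action = np.array([[0.1, -0.2, 0.3]])  # continuous: shape (1, 3)

# Squeezing only axis 0 drops the batch dimension and nothing else.
a1 = np.squeeze(cartpole_action, axis=0)  # shape (): a scalar action
a2 = np.squeeze(robot_action, axis=0)     # shape (3,): the array is kept intact
```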

try adding pytorch tests

9acd029

typo

c4b8ca7

fixup docker messages

6a79793

Fix A3C for some envs

7cdedf3

Pendulum doesn't work since it's an edge case (expects singleton arrays, which `.squeeze()` collapses to scalars).
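A sketch of the edge case, assuming a Pendulum-style action space whose shape is `(1,)`; reshaping back to the expected shape is one possible fix, not necessarily what the PR settled on.

```python
import numpy as np

# Assumed: the environment expects actions shaped like its Box space, here (1,).
expected_shape = (1,)

raw = np.array([[0.7]])                   # policy output with a batch dim: shape (1, 1)
over_squeezed = np.squeeze(raw)           # bare squeeze drops every size-1 axis: shape ()
batch_squeezed = np.squeeze(raw, axis=0)  # only the batch axis: shape (1,)

# One possible fix: restore the shape the environment expects.
restored = np.reshape(over_squeezed, expected_shape)
```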

fmt

da414fc

nit flake

3b9234f

small lint

9ddab77

Merge branch 'fix-a3c-torch' into trpo

ca9b33c

* fix-a3c-torch: small lint nit flake fmt Fix A3C for some envs fixup docker messages typo try adding pytorch tests Squeeze actions along first dimension Squeeze action Revert reshape of action

alok added 7 commits June 1, 2018 03:34

Use A3C's save/restore/optimizer

51dc392

ent_coeff -> entropy_coeff

f3d401f

Clearer name.

Clean up config dicts

c57989d

A lot of the comments are pointless.

fmt

6c15780

Use async optimizer for TRPO

cbbaf32

Use single quotes

78ab9d4

alok closed this Jul 19, 2018

alok wants to merge 90 commits into ray-project:master from alok:trpo
4 participants