

[rllib, WIP, DO NOT MERGE] TRPO PyTorch implementation#2021

Closed
alok wants to merge 90 commits into ray-project:master from alok:trpo

@alok (Contributor) commented May 9, 2018

What do these changes do?

Adds TRPO to RLlib. PyTorch only, for now.
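For context on what the update has to compute: TRPO usually obtains its step by approximately solving F x = g with conjugate gradient. Here F is the Fisher information matrix and g is the policy gradient, and F is accessed only through Fisher-vector products. The step is then rescaled so that the quadratic KL estimate equals the trust-region size. The sketch below is a minimal illustration of that textbook procedure, not code from this PR. It uses NumPy so that it stands alone, and `conjugate_gradient`, `trpo_step_direction`, `fvp` and `max_kl` are hypothetical names.

```python
import numpy as np


def conjugate_gradient(fvp, g, iters=10, tol=1e-10):
    """Approximately solve F x = g, given only fvp(v) = F @ v.

    Assumes F is symmetric positive definite (the damped Fisher matrix is)
    and that g is a float array.
    """
    x = np.zeros_like(g)
    r = g.copy()          # residual g - F x (x starts at 0)
    p = r.copy()          # current search direction
    rr = r @ r
    for _ in range(iters):
        if rr < tol:      # converged; also avoids dividing by ~0 below
            break
        Fp = fvp(p)
        alpha = rr / (p @ Fp)
        x += alpha * p
        r -= alpha * Fp
        rr_new = r @ r
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x


def trpo_step_direction(fvp, g, max_kl=0.01):
    """Rescale the natural-gradient direction s so that 0.5 * s^T F s == max_kl.

    A full TRPO update would follow this with a backtracking line search.
    """
    s = conjugate_gradient(fvp, g)
    shs = s @ fvp(s)
    return s * np.sqrt(2 * max_kl / shs)
```

Because `fvp` is only a callable, the same solver works when F is never formed explicitly. In PyTorch, F would typically come from a double-backward through the KL divergence.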

It doesn't work yet. The agent runs in the environment, but there are a few
issues to work out before merging:

  1. Remote evaluators error out.
  2. Some agent methods, like `_save`, still need to be implemented.
  3. `apply_gradients` zeros out the gradients that `_backward` worked so
    hard to produce. I got around that by inserting a deepcopy, but
    that's obviously not ideal.
  4. The implementation isn't correct yet. There's a bug or two in it, though
    I don't expect it to need a total rewrite.
  5. There are extra layers of indirection that will need to be removed.
  6. There's an awkward use of `chain(...)` to work around the fact that
    generators are consumed on use.
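Items 3 and 6 may have a common, cheaper workaround. `model.parameters()` returns a generator, so materializing it into a list once makes it reusable without `chain(...)`. Gradients can also be copied out with `torch.cat` before anything calls `zero_grad()`, which avoids a deepcopy. The sketch below assumes a plain `torch.nn.Module` whose parameters all receive gradients. `flat_grad`, `get_flat_params` and `set_flat_params` are hypothetical helpers, not existing RLlib APIs.

```python
import torch
import torch.nn as nn


def flat_grad(params):
    """Concatenate each parameter's .grad into one new 1-D tensor.

    torch.cat always allocates fresh storage, so the result survives a later
    zero_grad() without a deepcopy. Assumes every p.grad is populated.
    """
    return torch.cat([p.grad.reshape(-1) for p in params])


def get_flat_params(params):
    return torch.cat([p.detach().reshape(-1) for p in params])


def set_flat_params(params, flat):
    """Write a flat vector back into the parameters, in list order."""
    offset = 0
    with torch.no_grad():
        for p in params:
            n = p.numel()
            p.copy_(flat[offset:offset + n].view_as(p))
            offset += n


model = nn.Linear(3, 2)
# A list (unlike the generator from parameters()) can be iterated repeatedly.
params = list(model.parameters())

loss = model(torch.ones(4, 3)).pow(2).mean()
loss.backward()
g = flat_grad(params)   # independent copy of the gradients
model.zero_grad()       # zeros the .grad fields (or sets them to None)...
# ...but g is unaffected and can still be used for the TRPO update.
```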

Related issue number

None, though I think there are some commits worth cherry-picking that fix a few
issues in the PyTorch model code. @rliaw may find some of them relevant.

alok added 30 commits May 2, 2018 05:07
keep_dims is deprecated.
temp ones
- constant deprecated for constant_
- specify dimension to softmax explicitly
Debug torch script without multithreading getting in the way yet.
Fixes some bugs that crept in
mse_loss is faster, easier to read, and built in
Since we typically care about the `.data` attribute, but may not want to just
subclass dict (I assume), we can use `dict`'s already implemented magic methods
to clean up syntax. These let us just treat rollouts more or less as dicts.
No longer need to explicitly wrap
- rm comments
- try out different params to test if shapes are correct
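The commit above about rollouts describes forwarding dict syntax to a `.data` attribute instead of subclassing `dict`. A minimal sketch of that pattern follows. `Rollout` and its `add` method are hypothetical names used only for illustration, and the choice of `defaultdict(list)` for storage is an assumption.

```python
from collections import defaultdict


class Rollout(object):
    """Hypothetical rollout container that keeps its columns in self.data.

    Dict-style access is delegated to self.data, so callers can write
    rollout["rewards"] instead of rollout.data["rewards"].
    """

    def __init__(self):
        self.data = defaultdict(list)

    def add(self, **kwargs):
        for key, value in kwargs.items():
            self.data[key].append(value)

    # Delegate the mapping protocol to the underlying dict. As with any
    # defaultdict, indexing a missing key creates an empty column.
    def __getitem__(self, key):
        return self.data[key]

    def __setitem__(self, key, value):
        self.data[key] = value

    def __contains__(self, key):
        return key in self.data

    def __iter__(self):
        return iter(self.data)

    def __len__(self):
        return len(self.data)
```

Compared with subclassing `dict`, this keeps `.data` as the single source of truth. Other attributes, such as a flag marking whether the episode terminated, can then live on the object without appearing as keys.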
* master: (21 commits)
  Expand local_dir in Trial init (ray-project#2013)
  Fixing ascii error for Python2 (ray-project#2009)
  [DataFrame] Implements df.update (ray-project#1997)
  [DataFrame] Implements df.as_matrix (ray-project#2001)
  [DataFrame] Implement quantile (ray-project#1992)
  [DataFrame] Implement sort_values and sort_index (ray-project#1977)
  [DataFrame] Implement rank (ray-project#1991)
  [DataFrame] Implemented prod, product, added test suite (ray-project#1994)
  [DataFrame] Implemented __setitem__, select_dtypes, and astype (ray-project#1941)
  [DataFrame] Implement diff (ray-project#1996)
  [DataFrame] Implemented nunique, skew (ray-project#1995)
  [DataFrame] Implements filter and dropna (ray-project#1959)
  [DataFrame] Implements df.pipe (ray-project#1999)
  [DataFrame] Apply() for Lists and Dicts (ray-project#1973)
  Clean up syntax for supported Python versions. (ray-project#1963)
  [DataFrame] Implements mode, to_datetime, and get_dummies (ray-project#1956)
  [DataFrame] Fix dtypes (ray-project#1930)
  keep_dims -> keepdims (ray-project#1980)
  add pthread linking (ray-project#1986)
  [DataFrame] Add layer of abstraction to allow OID instantiation (ray-project#1984)
  ...
detach() can alert you to errors in the computation graph construction
old_p is not a good variable name
Activations should not be named attributes since they carry no state.
Rewards are always scalars and should be treated as such. Treating a scalar as
a 1x1 array is a great way to give yourself a headache.
`size` is an awful variable name compared to out_size, especially since
`in_size` is defined too.
Matches NumPy (return type (torch.Size) is a wrapper over tuples).
Seems to work now though
This is a syntax error in Python 2.7. I'm counting down the days until its end
of life.
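The "rewards are scalars" commit above points to a concrete failure mode. Under broadcasting, a reward stored as a 1-element array raises no error. It silently changes the shape of downstream arithmetic instead. The illustration below uses NumPy with made-up values, and PyTorch follows the same broadcasting rules.

```python
import numpy as np

# Value estimates for five time steps, shape (N,).
values = np.array([0.5, 0.4, 0.3, 0.2, 0.1])

# Rewards stored as 1-element arrays stack into shape (N, 1)...
rewards_as_arrays = np.array([np.array([1.0]) for _ in range(5)])
bad = rewards_as_arrays - values      # broadcasts (N, 1) - (N,) -> (N, N)

# ...while plain scalar rewards stack into shape (N,).
rewards_as_scalars = np.array([1.0 for _ in range(5)])
good = rewards_as_scalars - values    # elementwise, shape (N,)

print(bad.shape, good.shape)          # (5, 5) (5,)
```

Any loss built on `bad` (for example via `.mean()`) still returns a number, which is why this kind of bug is hard to spot.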
alok added 3 commits May 24, 2018 00:32
* fix-a3c-torch: (37 commits)
  Add missing channel major
  Use correct filter size
  Add TODO
  Fix shape errors
  fmt
  Performance fix (ray-project#2110)
  Use flake8-comprehensions (ray-project#1976)
  Improve error message printing and suppression. (ray-project#2104)
  [rllib] [doc] Broken link in ddpg doc
  YAPF, take 3 (ray-project#2098)
  [rllib] rename async -> _async (ray-project#2097)
  fix unused lambda capture (ray-project#2102)
  [xray] Use pubsub instead of timeout for ObjectManager Pull. (ray-project#2079)
  [DataFrame] Update _inherit_docstrings (ray-project#2085)
  [JavaWorker] Changes to the build system for support java worker (ray-project#2092)
  [xray] Fix bug in updating actor execution dependencies (ray-project#2064)
  [DataFrame] Refactor __delitem__ (ray-project#2080)
  [xray] Better error messaging when pulling from self. (ray-project#2068)
  Use source code in hash where possible (fix ray-project#2089) (ray-project#2090)
  Functions for flushing done tasks and evicted objects. (ray-project#2033)
  ...
* master:
  [DataFrame] Refactor GroupBy Methods and Implement Reindex (ray-project#2101)
  Initial Support for Airspeed Velocity (ray-project#2113)
  Use automatic memory management in Redis modules. (ray-project#1797)
  [DataFrame] Test bugfixes (ray-project#2111)
  [DataFrame] Update initializations of IndexMetadata which use outdated APIs (ray-project#2103)
@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5618/

alok and others added 13 commits May 24, 2018 18:18
* master:
  Prototype named actors. (ray-project#2129)
  Update arrow to latest master (ray-project#2100)
  [DataFrame] Speed up dtypes (ray-project#2118)
  do not fetch from dead Plasma Manager (ray-project#2116)
  [DataFrame] Refactor GroupBy Methods and Implement Reindex (ray-project#2101)
  Initial Support for Airspeed Velocity (ray-project#2113)
  Use automatic memory management in Redis modules. (ray-project#1797)
  [DataFrame] Test bugfixes (ray-project#2111)
  [DataFrame] Update initializations of IndexMetadata which use outdated APIs (ray-project#2103)
* fix-a3c-torch:
  Prototype named actors. (ray-project#2129)
  Update arrow to latest master (ray-project#2100)
  [DataFrame] Speed up dtypes (ray-project#2118)
  do not fetch from dead Plasma Manager (ray-project#2116)
This should be handled by the agent or at least in a cleaner way that doesn't
break existing envs.
This should deal with some cases such as cartpole where actions are scalars
while leaving alone cases where actions are arrays (some robotics tasks).
Pendulum doesn't work since it's an edge case (expects singleton arrays, which
`.squeeze()` collapses to scalars).
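One possible way around the Pendulum edge case is to squeeze only the leading batch axis rather than every singleton axis. This is an assumption, not the fix that was merged. The sketch assumes the policy returns a batch of one: shape `(1,)` for a discrete action and `(1, k)` for a k-dimensional continuous action (Pendulum's action space is `Box` with shape `(1,)`). `unbatch_action` is a hypothetical helper.

```python
import numpy as np


def unbatch_action(action):
    """Drop only the leading batch dimension of a single-step action.

    Assumes a batch of size 1: (1,) -> () for discrete actions,
    (1, k) -> (k,) for continuous ones.
    """
    return np.squeeze(action, axis=0)


cartpole_action = np.array([1])          # discrete, batch of one
pendulum_action = np.array([[0.3]])      # Box of shape (1,), batch of one

# A plain squeeze also removes Pendulum's action dimension:
print(np.squeeze(pendulum_action).shape)           # ()
print(unbatch_action(cartpole_action).shape,
      unbatch_action(pendulum_action).shape)       # () (1,)
```

Squeezing a named axis keeps scalar-action environments working. It also leaves array actions, including singleton ones, with the shape the environment declared.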
* fix-a3c-torch:
  small lint
  nit flake
  fmt
  Fix A3C for some envs
  fixup docker messages
  typo
  try adding pytorch tests
  Squeeze actions along first dimension
  Squeeze action
  Revert reshape of action
@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5714/

alok added 7 commits June 1, 2018 03:34
A lot of the comments are pointless.
* master:
  [autoscaler] GCP node provider (ray-project#2061)
  [xray] Evict tasks from the lineage cache (ray-project#2152)
  [ASV] Add ray.init and simple Ray benchmarks (ray-project#2166)
  Re-encrypt key for uploading to S3 from travis to use travis-ci.com. (ray-project#2169)
  [rllib] Fix A3C PyTorch implementation (ray-project#2036)
  [JavaWorker] Do not kill local-scheduler-forked workers in RunManager.cleanup (ray-project#2151)
  Update Travis CI badge from travis-ci.org to travis-ci.com. (ray-project#2155)
  Implement Python global state API for xray. (ray-project#2125)
  [xray] Improve flush algorithm for the lineage cache (ray-project#2130)
  Fix support for actor classmethods (ray-project#2146)
  Add empty df test (ray-project#1879)
  [JavaWorker] Enable java worker support (ray-project#2094)
  [DataFrame] Fixing the code formatting of the tests (ray-project#2123)
  Update resource documentation (remove outdated limitations). (ray-project#2022)
  bugfix: use array redis_primary_addr out of its scope (ray-project#2139)
  Fix infinite retry in Push function. (ray-project#2133)
  [JavaWorker] Changes to the directory under src for support java worker (ray-project#2093)
  Integrate credis with Ray & route task table entries into credis. (ray-project#1841)
@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5841/

@ericl (Contributor) commented Jul 19, 2018

Is this still in progress or should we close?

@alok (Contributor, Author) commented Jul 19, 2018

I was planning to resubmit this one since it's become obsolete in light of the new RLlib changes.

alok closed this Jul 19, 2018
@ericl (Contributor) commented Jul 19, 2018 via email


Labels

None yet

Projects

None yet


4 participants