[rllib, WIP, DO NOT MERGE] TRPO PyTorch implementation #2021
Closed
alok wants to merge 90 commits into ray-project:master
keep_dims is deprecated.
temp ones
- `constant` deprecated for `constant_`
- specify dimension to softmax explicitly
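For readers following along, a minimal sketch of what these two changes look like in PyTorch. The layer sizes and variable names are made up for illustration and are not taken from this PR.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical layer; the sizes are arbitrary.
layer = nn.Linear(4, 3)

# In-place initializers carry a trailing underscore.
nn.init.constant_(layer.bias, 0.0)

logits = layer(torch.ones(2, 4))

# Name the dimension to normalize over rather than relying on an implicit one.
probs = F.softmax(logits, dim=-1)
```

With `dim=-1`, each row of `probs` is a distribution over the three outputs.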
Debug torch script without multithreading getting in the way yet.
Fixes some bugs that crept in
mse_loss is faster, easier to read, and builtin
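A small sketch of the equivalence this commit relies on, assuming the replaced code was a hand-written mean of squared differences (the PR text does not show the original). The tensor names and values are illustrative only.

```python
import torch
import torch.nn.functional as F

# Illustrative value-function predictions and targets.
values = torch.tensor([1.0, 2.0, 3.0])
returns = torch.tensor([1.5, 2.0, 2.0])

# Hand-rolled mean squared error.
manual = ((values - returns) ** 2).mean()

# Built-in equivalent; the default reduction is the mean.
builtin = F.mse_loss(values, returns)
```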
Since we typically care about the `.data` attribute, but may not want to just subclass dict (I assume), we can use `dict`'s already implemented magic methods to clean up syntax. These let us just treat rollouts more or less as dicts.
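Roughly, the idea can be sketched as below. The `Rollout` class here is a hypothetical stand-in, not the class from this PR; only the `.data` attribute is taken from the commit message.

```python
class Rollout:
    """Hypothetical container that forwards dict-style access to `.data`."""

    def __init__(self, data=None):
        self.data = dict(data or {})

    def __getitem__(self, key):
        return self.data[key]

    def __setitem__(self, key, value):
        self.data[key] = value

    def __contains__(self, key):
        return key in self.data

    def __iter__(self):
        return iter(self.data)

    def __len__(self):
        return len(self.data)


rollout = Rollout({"rewards": [1.0, 0.0]})
rollout["actions"] = [0, 1]  # instead of rollout.data["actions"] = ...
```

Callers can then write `rollout["rewards"]` or `"actions" in rollout` without reaching into `.data` directly.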
No longer need to explicitly wrap
* master: (21 commits) Expand local_dir in Trial init (ray-project#2013) Fixing ascii error for Python2 (ray-project#2009) [DataFrame] Implements df.update (ray-project#1997) [DataFrame] Implements df.as_matrix (ray-project#2001) [DataFrame] Implement quantile (ray-project#1992) [DataFrame] Impement sort_values and sort_index (ray-project#1977) [DataFrame] Implement rank (ray-project#1991) [DataFrame] Implemented prod, product, added test suite (ray-project#1994) [DataFrame] Implemented __setitem__, select_dtypes, and astype (ray-project#1941) [DataFrame] Implement diff (ray-project#1996) [DataFrame] Implemented nunique, skew (ray-project#1995) [DataFrame] Implements filter and dropna (ray-project#1959) [DataFrame] Implements df.pipe (ray-project#1999) [DataFrame] Apply() for Lists and Dicts (ray-project#1973) Clean up syntax for supported Python versions. (ray-project#1963) [DataFrame] Implements mode, to_datetime, and get_dummies (ray-project#1956) [DataFrame] Fix dtypes (ray-project#1930) keep_dims -> keepdims (ray-project#1980) add pthread linking (ray-project#1986) [DataFrame] Add layer of abstraction to allow OID instantiation (ray-project#1984) ...
detach() can alert you to errors in the computation graph construction
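One way to see this, assuming the commit swapped `.data` for `.detach()` (the message does not say what was replaced): an in-place edit through a detached tensor is tracked by autograd and raises during `backward()`, while the same edit through `.data` goes unnoticed and yields wrong gradients. The helper name below is hypothetical.

```python
import torch


def backward_after_inplace_edit(use_detach):
    """Zero a tensor that backward() still needs, then try to backpropagate."""
    a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
    out = a.sigmoid()  # sigmoid's backward pass reuses `out`
    alias = out.detach() if use_detach else out.data
    alias.zero_()  # in-place edit of memory shared with `out`
    try:
        out.sum().backward()
    except RuntimeError:
        return "error raised"
    return a.grad


detach_result = backward_after_inplace_edit(use_detach=True)
data_result = backward_after_inplace_edit(use_detach=False)
```

In the `.data` case the gradient comes back as all zeros with no warning, which is the kind of silent graph error the commit message alludes to.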
old_p is not a good variable name
Activations should not be named attributes since they carry no state.
Rewards are always scalars and should be treated as such. Treating a scalar as a 1x1 array is a great way to give yourself a headache.
`size` is an awful variable name compared to out_size, especially since `in_size` is defined too.
Matches NumPy (return type (torch.Size) is a wrapper over tuples).
This is a syntax error in Python 2.7. I'm counting down the days till end of life.
* fix-a3c-torch: (37 commits) Add missing channel major Use correct filter size Add TODO Fix shape errors fmt Performance fix (ray-project#2110) Use flake8-comprehensions (ray-project#1976) Improve error message printing and suppression. (ray-project#2104) [rllib] [doc] Broken link in ddpg doc YAPF, take 3 (ray-project#2098) [rllib] rename async -> _async (ray-project#2097) fix unused lambda capture (ray-project#2102) [xray] Use pubsub instead of timeout for ObjectManager Pull. (ray-project#2079) [DataFrame] Update _inherit_docstrings (ray-project#2085) [JavaWorker] Changes to the build system for support java worker (ray-project#2092) [xray] Fix bug in updating actor execution dependencies (ray-project#2064) [DataFrame] Refactor __delitem__ (ray-project#2080) [xray] Better error messaging when pulling from self. (ray-project#2068) Use source code in hash where possible (fix ray-project#2089) (ray-project#2090) Functions for flushing done tasks and evicted objects. (ray-project#2033) ...
* master: [DataFrame] Refactor GroupBy Methods and Implement Reindex (ray-project#2101) Initial Support for Airspeed Velocity (ray-project#2113) Use automatic memory management in Redis modules. (ray-project#1797) [DataFrame] Test bugfixes (ray-project#2111) [DataFrame] Update initializations of IndexMetadata which use outdated APIs (ray-project#2103)
Test FAILed.
* master: Prototype named actors. (ray-project#2129) Update arrow to latest master (ray-project#2100) [DataFrame] Speed up dtypes (ray-project#2118) do not fetch from dead Plasma Manager (ray-project#2116) [DataFrame] Refactor GroupBy Methods and Implement Reindex (ray-project#2101) Initial Support for Airspeed Velocity (ray-project#2113) Use automatic memory management in Redis modules. (ray-project#1797) [DataFrame] Test bugfixes (ray-project#2111) [DataFrame] Update initializations of IndexMetadata which use outdated APIs (ray-project#2103)
* fix-a3c-torch: Prototype named actors. (ray-project#2129) Update arrow to latest master (ray-project#2100) [DataFrame] Speed up dtypes (ray-project#2118) do not fetch from dead Plasma Manager (ray-project#2116)
This should be handled by the agent or at least in a cleaner way that doesn't break existing envs.
This should deal with some cases such as cartpole where actions are scalars while leaving alone cases where actions are arrays (some robotics tasks).
Pendulum doesn't work since it's an edge case (expects singleton arrays, which `.squeeze()` collapses to scalars).
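The edge case can be reproduced with plain NumPy. The arrays below are illustrative action values, not output from the actual environments or from this PR's code.

```python
import numpy as np

# A multi-dimensional action (e.g. a robotics task) is left alone by squeeze().
robot_action = np.array([0.1, 0.2, 0.3])
robot_squeezed = robot_action.squeeze()

# A singleton action array, as Pendulum expects, collapses to a 0-d scalar.
pendulum_action = np.array([0.5])
pendulum_squeezed = pendulum_action.squeeze()

# One possible repair (an assumption, not what the PR does): restore one dimension.
pendulum_restored = np.atleast_1d(pendulum_squeezed)
```

Here `pendulum_squeezed.shape` is `()`, while `pendulum_restored.shape` is `(1,)` again.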
* fix-a3c-torch: small lint nit flake fmt Fix A3C for some envs fixup docker messages typo try adding pytorch tests Squeeze actions along first dimension Squeeze action Revert reshape of action
Test PASSed.
Clearer name.
A lot of the comments are pointless.
* master: [autoscaler] GCP node provider (ray-project#2061) [xray] Evict tasks from the lineage cache (ray-project#2152) [ASV] Add ray.init and simple Ray benchmarks (ray-project#2166) Re-encrypt key for uploading to S3 from travis to use travis-ci.com. (ray-project#2169) [rllib] Fix A3C PyTorch implementation (ray-project#2036) [JavaWorker] Do not kill local-scheduler-forked workers in RunManager.cleanup (ray-project#2151) Update Travis CI badge from travis-ci.org to travis-ci.com. (ray-project#2155) Implement Python global state API for xray. (ray-project#2125) [xray] Improve flush algorithm for the lineage cache (ray-project#2130) Fix support for actor classmethods (ray-project#2146) Add empty df test (ray-project#1879) [JavaWorker] Enable java worker support (ray-project#2094) [DataFrame] Fixing the code formatting of the tests (ray-project#2123) Update resource documentation (remove outdated limitations). (ray-project#2022) bugfix: use array redis_primary_addr out of its scope (ray-project#2139) Fix infinite retry in Push function. (ray-project#2133) [JavaWorker] Changes to the directory under src for support java worker (ray-project#2093) Integrate credis with Ray & route task table entries into credis. (ray-project#1841)
Test PASSed.
Contributor
Is this still in progress or should we close?
Contributor
Author
I was planning to resubmit this one since it's become obsolete in light of the new RLlib changes.
Contributor
Sounds good!

On Thu, Jul 19, 2018 at 7:43 AM Alok Singh wrote:
> Closed #2021.
What do these changes do?
Adds TRPO to RLlib. PyTorch only, for now.
Doesn't work yet. The agent will execute the environment, but there are a few
issues to work out before merging:

- `_save` need to be implemented.
- `apply_gradients` zeros out the gradients that `_backward` worked so hard to produce. I got around that by sticking in a deepcopy, but that's obviously not ideal. I don't anticipate it needing a total rewrite.
- `chain(...)` to get around the fact that generators are consumed on use.
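The start of the last item is missing, so exactly how `chain(...)` was used is not recoverable. As a general illustration of the underlying problem, here is a sketch with a hypothetical `params()` generator standing in for something like a model's parameter iterator.

```python
from itertools import chain


def params():
    """Hypothetical stand-in for a generator of parameters."""
    yield from (1, 2, 3)


gen = params()
first_pass = list(gen)
second_pass = list(gen)  # empty: the generator was exhausted by the first pass

# chain() joins several iterables into one stream; wrapping it in list()
# materializes the result so it can be traversed more than once.
combined = list(chain(params(), params()))
reused = [sum(combined), len(combined)]
```

Materializing to a list trades memory for reusability, which is usually fine for a modest number of parameters.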
Related issue number
None, though I think there are some commits to cherry-pick that fix a few
issues in the PyTorch model code. @rliaw may find some of them relevant.