[RLlib] Fix failing env step in MultiAgentEnvRunner.
#55567
Signed-off-by: Kamil Kaczmarek <[email protected]>
simonsays1980
left a comment
Great PR @kamil-kaczmarek! Thanks a lot for working through this complex setup. I left some comments and would like to request another debugging iteration to better understand why this happens now and why the intended process is somehow interrupted.
      )
      # In case all environments had been terminated `to_module` will be
    - # empty and no actions are needed b/c we reset all environemnts.
    + # empty and no actions are needed b/c we reset all environments.
I wonder why `to_module` is now None here. Can we do another round of debugging to see where this happens? Then check the autoreset and the connector run (I know this is complex).
What should happen is: the env resets automatically; the initial observation goes through the `to_module` connector pipeline and produces `to_module`, which can in turn be passed through the module and the `to_env` pipeline to produce an action.
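To make the expected flow concrete, here is a minimal, self-contained sketch. It is not RLlib code: `env_to_module`, `module_forward`, `module_to_env`, and `compute_actions` are hypothetical stand-ins for the connector pipelines and the module, and the observations are plain floats. It only illustrates that a reset observation should yield a non-empty `to_module` batch and therefore an action, while an empty batch means there is nothing to act on.

```python
from typing import Dict, List, Optional

Obs = float
Batch = Dict[int, Obs]  # env index -> observation to feed the module


def env_to_module(observations: List[Optional[Obs]]) -> Batch:
    """Hypothetical stand-in for the env-to-module connector pipeline.

    Keeps every env index that currently has an observation to act on.
    """
    return {i: o for i, o in enumerate(observations) if o is not None}


def module_forward(batch: Batch) -> Dict[int, float]:
    """Hypothetical module: a toy computation per observation."""
    return {i: 2.0 * o for i, o in batch.items()}


def module_to_env(outputs: Dict[int, float]) -> Dict[int, int]:
    """Hypothetical module-to-env pipeline: map outputs to discrete actions."""
    return {i: int(out > 0.0) for i, out in outputs.items()}


def compute_actions(observations: List[Optional[Obs]]) -> Dict[int, int]:
    to_module = env_to_module(observations)
    if not to_module:
        # Nothing to feed the module, so no actions can be produced.
        # This is the situation the guarded code path has to handle.
        return {}
    return module_to_env(module_forward(to_module))


# After an automatic reset, each env has an initial observation,
# so the batch is non-empty and every env receives an action.
actions_after_reset = compute_actions([0.5, -1.0])

# If no observation reaches the pipeline, the batch is empty.
actions_without_obs = compute_actions([None, None])
```

In this toy setup the second call is the unexpected case raised in the comment: with a working autoreset, the pipeline should always see an initial observation.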
Let me investigate this more. This first happened last Thursday in the release tests. Will look into the code diff.
… true after env.step(). Signed-off-by: Kamil Kaczmarek <[email protected]>
This pull request has been automatically marked as stale because it has not had recent activity. You can always ask for help on our discussion forum or Ray's public slack channel. If you'd like to keep this open, just leave any comment, and the stale label will be removed.
unstale
simonsays1980
left a comment
LGTM. Nice patch @kamil-kaczmarek ! Thanks for the work!
…5567)

## Why are these changes needed?

Fix failing release test: `learning_tests_multi_agent_cartpole_appo_multi_gpu`.

## Related issue number

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR.
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [x] Unit tests
   - [x] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: Kamil Kaczmarek <[email protected]>
Signed-off-by: Kamil Kaczmarek <[email protected]>
Signed-off-by: Zhiqiang Ma <[email protected]>
## Why are these changes needed?

We introduced an improved error message when environments fail in #55567. At the same time, this bypasses the silencing of env step errors. This PR consolidates the messages.

---------

Co-authored-by: Kamil Kaczmarek <[email protected]>
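As a rough illustration of what consolidating the two behaviors could look like, here is a minimal sketch. It is not the RLlib implementation: `EnvStepError`, `try_env_step`, and the `silence_errors` flag are hypothetical names chosen for this example. The idea is that a single code path builds the detailed message, then either logs it (when errors are silenced) or raises it.

```python
import logging

logger = logging.getLogger("env_runner_sketch")


class EnvStepError(RuntimeError):
    """Hypothetical error type carrying the detailed env-step message."""


def try_env_step(step_fn, actions, *, silence_errors: bool):
    """Call `step_fn(actions)` and handle failures in one place.

    The detailed message is built once. If `silence_errors` is True it is
    only logged and None is returned; otherwise it is raised.
    """
    try:
        return step_fn(actions)
    except Exception as exc:
        message = f"Environment step failed with actions={actions!r}: {exc!r}"
        if silence_errors:
            logger.warning(message)
            return None
        raise EnvStepError(message) from exc
```

With this shape, the improved message and the silencing option cannot drift apart, because both are derived from the same `message` string.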