-
Notifications
You must be signed in to change notification settings - Fork 7k
[RLlib] Move new restart message into EnvRunner #56750
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RLlib] Move new restart message into EnvRunner #56750
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request effectively consolidates error messaging for environment step failures by moving the logic into the base EnvRunner class. This change removes redundant logging from MultiAgentEnvRunner and correctly restores the silencing of StepFailedRecreateEnvError, which was being bypassed. The new error message is more informative, and the logic for retrying the sample with a forced reset is now more explicit. The changes improve code maintainability and the clarity of error logs. The implementation is solid and I have no further suggestions.
kamil-kaczmarek
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ArturNiederfahrenhorst!
Let's use this PR as an opportunity to simplify the logic here and provide user with more info. We can do in the else block:
logger.exception(
f"RLlib {self.__class__.__name__}: Environment step failed and "
"'config.restart_failed_sub_environments' is False. "
"This env will not be recreated. "
"Consider setting 'fault_tolerance(restart_failed_sub_environments=True)' in your AlgorithmConfig "
"in order to automatically re-create and force-reset an env."
f"The original error type: {type(e)}. "
f"{e}"
)
raise RuntimeError from e
Perhaps RuntimeError would be better that ValueError.
Your thoughts?
|
Great idea. Better now? |
kamil-kaczmarek
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one more comment and we are good to go! Thanks a lot for your patience with this! 💯
|
This pull request has been automatically marked as stale because it has not had You can always ask for help on our discussion forum or Ray's public slack channel. If you'd like to keep this open, just leave any comment, and the stale label will be removed. |
|
This pull request has been automatically closed because there has been no more activity in the 14 days Please feel free to reopen or open a new pull request if you'd still like this to be addressed. Again, you can always ask for help on our discussion forum or Ray's public slack channel. Thanks again for your contribution! |
|
yo @ArturNiederfahrenhorst! It's worth finishing. Can you fix DCO? |
## Why are these changes needed? We introduced an improved error message when environments fail in ray-project#55567. At the same time, this bypasses the silencing of env step errors. This PR consolidates the messages. --------- Co-authored-by: Kamil Kaczmarek <[email protected]>
Why are these changes needed?
We introduced an improved error message when environments fail in #55567.
At the same time, this bypasses the silencing of env step errors.
This PR consolidates the messages.