@ArturNiederfahrenhorst
Contributor

Why are these changes needed?

We introduced an improved error message when environments fail in #55567.
However, that change also bypasses the silencing of env step errors.
This PR consolidates the messages.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request effectively consolidates error messaging for environment step failures by moving the logic into the base EnvRunner class. This change removes redundant logging from MultiAgentEnvRunner and correctly restores the silencing of StepFailedRecreateEnvError, which was being bypassed. The new error message is more informative, and the logic for retrying the sample with a forced reset is now more explicit. The changes improve code maintainability and the clarity of error logs. The implementation is solid and I have no further suggestions.
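The pattern described in this review can be sketched roughly as follows. This is a minimal, self-contained illustration and not RLlib's actual implementation: `SketchEnvRunner`, `StepFailedRecreateEnv`, `FlakyEnv`, and the method names are hypothetical stand-ins, and only the flag name `restart_failed_sub_environments` is taken from the discussion. The idea is that the base runner owns all step-failure messaging, so subclasses do not log the same failure a second time.

```python
import logging

logger = logging.getLogger("env_runner_sketch")


class StepFailedRecreateEnv(Exception):
    """Hypothetical marker: the step failed, but the env was recreated."""


class FlakyEnv:
    """Toy env whose step() fails a fixed number of times (demo only)."""

    failures_left = 1

    def reset(self):
        return 0

    def step(self, action):
        if FlakyEnv.failures_left > 0:
            FlakyEnv.failures_left -= 1
            raise ValueError("simulated env crash")
        return action + 1


class SketchEnvRunner:
    """Hypothetical base runner that handles step failures in one place."""

    def __init__(self, make_env, restart_failed_sub_environments):
        self.make_env = make_env
        self.restart_failed_sub_environments = restart_failed_sub_environments
        self.env = make_env()

    def _try_step(self, action):
        try:
            return self.env.step(action)
        except Exception as e:
            if self.restart_failed_sub_environments:
                # Quiet path: a single warning, then recreate the env and
                # signal the caller to retry with a forced reset.
                logger.warning(
                    "Env step failed (%s); recreating env.", type(e).__name__
                )
                self.env = self.make_env()
                raise StepFailedRecreateEnv() from e
            # Loud path: log the traceback once, then fail with a chained error.
            logger.exception("Env step failed and the env will not be recreated.")
            raise RuntimeError("Environment step failed") from e

    def sample(self, action):
        try:
            return self._try_step(action)
        except StepFailedRecreateEnv:
            # Retry once on the fresh env after an explicit reset.
            self.env.reset()
            return self._try_step(action)


runner = SketchEnvRunner(FlakyEnv, restart_failed_sub_environments=True)
result = runner.sample(action=1)  # first step fails, the retry succeeds
```

With the flag set to `False`, the same failure would instead surface as a `RuntimeError` whose `__cause__` is the original exception.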

@ray-gardener ray-gardener bot added the rllib RLlib related issues label Sep 19, 2025
Contributor

@kamil-kaczmarek kamil-kaczmarek left a comment

Thanks @ArturNiederfahrenhorst!

Let's use this PR as an opportunity to simplify the logic here and give the user more info. In the else block, we could do:

logger.exception(
    f"RLlib {self.__class__.__name__}: Environment step failed and "
    "'config.restart_failed_sub_environments' is False. "
    "This env will not be recreated. "
    "Consider setting 'fault_tolerance(restart_failed_sub_environments=True)' in your AlgorithmConfig "
    "in order to automatically re-create and force-reset an env. "
    f"The original error type: {type(e)}. "
    f"{e}"
)
raise RuntimeError from e

Perhaps RuntimeError would be better than ValueError.

Your thoughts?
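For readers unfamiliar with `raise ... from e`, here is a small standalone sketch of the exception chaining the suggestion relies on. The function names `step_env` and `guarded_step` are hypothetical and exist only for this illustration. Note that a bare `raise RuntimeError from e` produces an exception with an empty message, so this sketch passes a message explicitly; whether the merged code does the same is not shown in this thread.

```python
def step_env():
    # Stand-in for an environment step that fails.
    raise ValueError("simulated env failure")


def guarded_step():
    try:
        step_env()
    except Exception as e:
        # Chain the original error so it stays reachable via __cause__
        # and is shown in the traceback as the direct cause.
        raise RuntimeError(
            f"Environment step failed ({type(e).__name__}: {e})"
        ) from e


try:
    guarded_step()
except RuntimeError as err:
    caught = err

# The original exception is preserved on the new one.
original = caught.__cause__
```

Callers can then catch the `RuntimeError` while still inspecting the type and message of the underlying environment error.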

@ArturNiederfahrenhorst
Contributor Author

Great idea. Better now?

Contributor

@kamil-kaczmarek kamil-kaczmarek left a comment

Just one more comment and we are good to go! Thanks a lot for your patience with this! 💯

@ArturNiederfahrenhorst ArturNiederfahrenhorst added the go add ONLY when ready to merge, run all tests label Sep 24, 2025
@github-actions

github-actions bot commented Nov 4, 2025

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Nov 4, 2025
@github-actions

This pull request has been automatically closed because there has been no more activity in the 14 days
since being marked stale.

Please feel free to reopen or open a new pull request if you'd still like this to be addressed.

Again, you can always ask for help on our discussion forum or Ray's public slack channel.

Thanks again for your contribution!

@kamil-kaczmarek
Contributor

kamil-kaczmarek commented Nov 19, 2025

yo @ArturNiederfahrenhorst! It's worth finishing. Can you fix DCO?

@github-actions github-actions bot added unstale A PR that has been marked unstale. It will not get marked stale again if this label is on it. and removed stale The issue is stale. It will be closed within 7 days unless there are further conversation labels Nov 20, 2025
@ArturNiederfahrenhorst ArturNiederfahrenhorst merged commit 8de65ce into ray-project:master Nov 25, 2025
6 checks passed
SheldonTsen pushed a commit to SheldonTsen/ray that referenced this pull request Dec 1, 2025
Co-authored-by: Kamil Kaczmarek <[email protected]>