@pseudo-rnd-thoughts

Description

RLlib reports metrics in a nested structure (for example, "{ENV_RUNNER_RESULTS}/{EPISODE_RETURN_MEAN}"), which Result.get_best_checkpoint doesn't support.
ResultGrid.get_best_result() already resolves such keys with unflattened_lookup, so I've made get_best_checkpoint use it as well and added tests for nested metrics and for backward compatibility.
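
To illustrate the idea, here is a minimal, self-contained sketch of selecting a best checkpoint by a "/"-delimited nested metric key. It does not import Ray: `lookup_nested` and `best_checkpoint` are hypothetical helpers that only stand in for `unflattened_lookup` and `Result.get_best_checkpoint`, and the metric names and checkpoint paths are made up for the example.

```python
from typing import Any, Dict, List, Tuple


def lookup_nested(flat_key: str, metrics: Dict[str, Any], delimiter: str = "/") -> Any:
    """Hypothetical helper: resolve a key such as "a/b" against a nested dict.

    A key that is present verbatim at the top level is returned directly,
    so plain (non-nested) metric names keep working.
    """
    if flat_key in metrics:
        return metrics[flat_key]
    value: Any = metrics
    for part in flat_key.split(delimiter):
        if not isinstance(value, dict) or part not in value:
            raise KeyError(flat_key)
        value = value[part]
    return value


def best_checkpoint(
    checkpoints: List[Tuple[str, Dict[str, Any]]], metric: str, mode: str = "max"
) -> str:
    """Hypothetical stand-in: pick the checkpoint path with the best metric value."""
    if mode not in ("max", "min"):
        raise ValueError(f"mode must be 'max' or 'min', got {mode!r}")
    scored = []
    for path, metrics in checkpoints:
        try:
            scored.append((lookup_nested(metric, metrics), path))
        except KeyError:
            # Skip checkpoints that did not report this metric.
            continue
    if not scored:
        raise RuntimeError(f"No checkpoint reported the metric {metric!r}")
    pick = max if mode == "max" else min
    return pick(scored, key=lambda item: item[0])[1]


# Illustrative data only: (checkpoint path, reported metrics) pairs.
checkpoints = [
    ("ckpt_0", {"env_runners": {"episode_return_mean": 20.0}}),
    ("ckpt_1", {"env_runners": {"episode_return_mean": 150.0}}),
    ("ckpt_2", {"loss": 0.1}),  # never reported the nested metric
]
print(best_checkpoint(checkpoints, "env_runners/episode_return_mean", "max"))
```

In this sketch, checkpoints that lack the requested metric are skipped rather than treated as an error; whether the real method behaves the same way is not stated in this PR.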

Reproduction script

from ray import tune
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.core.rl_module.default_model_config import DefaultModelConfig
from ray.rllib.utils.metrics import (
    ENV_RUNNER_RESULTS,
    EPISODE_RETURN_MEAN,
    NUM_ENV_STEPS_SAMPLED_LIFETIME,
)
from ray.tune.result import TRAINING_ITERATION

config = (
    PPOConfig()
    .environment("CartPole-v1")
    .training(num_epochs=6)
)

tuner = tune.Tuner(
    "PPO",
    param_space=config.to_dict(),
    run_config=tune.RunConfig(
        "PPO_Reproduce",
        checkpoint_config=tune.CheckpointConfig(
            num_to_keep=10,
            checkpoint_score_attribute=f"{ENV_RUNNER_RESULTS}/{EPISODE_RETURN_MEAN}",
            checkpoint_at_end=True,
            checkpoint_frequency=5,
        ),
        stop={
            f"{ENV_RUNNER_RESULTS}/{NUM_ENV_STEPS_SAMPLED_LIFETIME}": 3e5,
            f"{ENV_RUNNER_RESULTS}/{EPISODE_RETURN_MEAN}": 450,
            TRAINING_ITERATION: 100,
        },
    ),
)

results = tuner.fit()
best_result = results.get_best_result()
ckpt = best_result.get_best_checkpoint(f"{ENV_RUNNER_RESULTS}/{EPISODE_RETURN_MEAN}", "max")
print(ckpt.path)

Related issues

#57533

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds support for nested metrics in Result.get_best_checkpoint by using unflattened_lookup. The changes are correct and are accompanied by good tests covering nested metrics, different modes, and backward compatibility. I've suggested one improvement to the error message when an invalid metric is provided, to make it more helpful for users of nested metrics.
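
The suggested wording is not shown in this conversation. As a rough sketch of what a more helpful message could look like, the snippet below lists the available metrics as "/"-joined keys so that nested names are discoverable. `flatten_keys` and `invalid_metric_message` are hypothetical helpers written for this example, not Ray functions, and the message text is an assumption.

```python
from typing import Any, Dict, List


def flatten_keys(metrics: Dict[str, Any], delimiter: str = "/", prefix: str = "") -> List[str]:
    """Hypothetical helper: list every leaf of a nested dict as a delimiter-joined key."""
    keys: List[str] = []
    for name, value in metrics.items():
        full = f"{prefix}{delimiter}{name}" if prefix else name
        if isinstance(value, dict) and value:
            # Recurse into non-empty sub-dicts to reach the leaves.
            keys.extend(flatten_keys(value, delimiter, full))
        else:
            keys.append(full)
    return sorted(keys)


def invalid_metric_message(metric: str, metrics: Dict[str, Any], delimiter: str = "/") -> str:
    """Hypothetical error text that shows nested metrics in their flattened form."""
    available = ", ".join(flatten_keys(metrics, delimiter))
    return (
        f"Invalid metric name {metric!r}. Nested metrics are addressed with "
        f"{delimiter!r}-joined keys. Available metrics: {available}"
    )


# Illustrative metrics dict; the names are examples only.
reported = {
    "env_runners": {"episode_return_mean": 120.0, "num_env_steps_sampled_lifetime": 4000},
    "training_iteration": 3,
}
print(invalid_metric_message("episode_return_mean", reported))
```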

Signed-off-by: Mark Towers <[email protected]>
import pyarrow

import ray
from ray._private.dict import unflattened_lookup

@justinvyu do we want to support this for Train V2 as well, or should we diverge for Tune?

I'm ok to support this in Train. This is only needed if users self-report nested dicts.

@pseudo-rnd-thoughts pseudo-rnd-thoughts changed the title [rllib, air, train] Add support for nested metrics for Result.get_best_checkpoint [rllib, air, train] Add support for nested metrics in Result.get_best_checkpoint Nov 13, 2025

@pseudo-rnd-thoughts pseudo-rnd-thoughts added the go (add ONLY when ready to merge, run all tests) label Nov 19, 2025
@justinvyu justinvyu left a comment

Thanks!

@justinvyu justinvyu merged commit 0325fab into ray-project:master Nov 21, 2025
7 checks passed
@justinvyu justinvyu changed the title [rllib, air, train] Add support for nested metrics in Result.get_best_checkpoint [rllib, train] Add support for nested metrics in Result.get_best_checkpoint Nov 21, 2025
400Ping pushed a commit to 400Ping/ray that referenced this pull request Nov 21, 2025
…ckpoint` (ray-project#58537)

RLlib uses nested metric structure (like
`"{ENV_RUNNER_RESULTS}/{EPISODE_RETURN_MEAN}"`) which
`Result.get_best_checkpoint` doesn't support.
Following `ResultGrid.get_best_result()` to use `unflattened_lookup`,
I've added that to `get_best_checkpoint` along with testing for nested
structures (and its backward compatibility)

---------

Signed-off-by: Mark Towers <[email protected]>
Signed-off-by: Mark Towers <[email protected]>
Co-authored-by: Mark Towers <[email protected]>
Co-authored-by: Justin Yu <[email protected]>
ykdojo pushed a commit to ykdojo/ray that referenced this pull request Nov 27, 2025
…ckpoint` (ray-project#58537)
Signed-off-by: YK <[email protected]>
SheldonTsen pushed a commit to SheldonTsen/ray that referenced this pull request Dec 1, 2025
…ckpoint` (ray-project#58537)
Labels

go: add ONLY when ready to merge, run all tests
rllib: RLlib related issues
rllib-logging: This problem is related to logging metrics
train-tune