[rllib, train] Add support for nested metrics in `Result.get_best_checkpoint` #58537

pseudo-rnd-thoughts · 2025-11-11T10:28:01Z

Description

RLlib reports metrics in a nested structure (for example, `"{ENV_RUNNER_RESULTS}/{EPISODE_RETURN_MEAN}"`), which `Result.get_best_checkpoint` does not support.
Following `ResultGrid.get_best_result()`, which uses `unflattened_lookup`, I've added the same lookup to `get_best_checkpoint`, along with tests for nested metrics and for backward compatibility with flat metrics.
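
To illustrate the idea, here is a minimal, self-contained sketch of resolving a `/`-delimited key against a nested metrics dict and using it to pick the best checkpoint. The helper names `lookup_nested` and `pick_best`, the sample data, and the `(checkpoint, metrics)` pair layout are assumptions made for illustration; this is not Ray's `unflattened_lookup` implementation.

```python
from typing import Any, Dict, List, Tuple


def lookup_nested(flat_key: str, metrics: Dict[str, Any], delimiter: str = "/") -> Any:
    """Resolve a key such as "env_runners/episode_return_mean" in a nested dict.

    A top-level key that matches exactly wins, so flat metrics keep working.
    Raises KeyError if the key cannot be resolved.
    """
    if flat_key in metrics:
        return metrics[flat_key]
    node: Any = metrics
    for part in flat_key.split(delimiter):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(flat_key)
        node = node[part]
    return node


def pick_best(
    checkpoints: List[Tuple[str, Dict[str, Any]]], metric: str, mode: str = "max"
) -> str:
    """Return the checkpoint whose (possibly nested) metric is best."""
    if mode not in ("min", "max"):
        raise ValueError(f"mode must be 'min' or 'max', got {mode!r}")
    chooser = max if mode == "max" else min
    best, _ = chooser(checkpoints, key=lambda pair: lookup_nested(metric, pair[1]))
    return best


# Hypothetical sample data: checkpoint labels paired with reported metrics.
checkpoints = [
    ("ckpt_5", {"env_runners": {"episode_return_mean": 120.0}, "training_iteration": 5}),
    ("ckpt_10", {"env_runners": {"episode_return_mean": 310.0}, "training_iteration": 10}),
    ("ckpt_15", {"env_runners": {"episode_return_mean": 275.0}, "training_iteration": 15}),
]

print(pick_best(checkpoints, "env_runners/episode_return_mean", "max"))  # ckpt_10
print(pick_best(checkpoints, "training_iteration", "min"))  # ckpt_5
```

Checking for an exact top-level match first is what keeps flat keys such as `training_iteration` working unchanged in this sketch.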

Reproduction script

from ray import tune
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.utils.metrics import (
    ENV_RUNNER_RESULTS,
    EPISODE_RETURN_MEAN,
    NUM_ENV_STEPS_SAMPLED_LIFETIME,
)
from ray.tune.result import TRAINING_ITERATION

config = (
    PPOConfig()
    .environment("CartPole-v1")
    .training(num_epochs=6)
)

tuner = tune.Tuner(
    "PPO",
    param_space=config.to_dict(),
    run_config=tune.RunConfig(
        "PPO_Reproduce",
        checkpoint_config=tune.CheckpointConfig(
            num_to_keep=10,
            checkpoint_score_attribute=f"{ENV_RUNNER_RESULTS}/{EPISODE_RETURN_MEAN}",
            checkpoint_at_end=True,
            checkpoint_frequency=5,
        ),
        stop={
            f"{ENV_RUNNER_RESULTS}/{NUM_ENV_STEPS_SAMPLED_LIFETIME}": 3e5,
            f"{ENV_RUNNER_RESULTS}/{EPISODE_RETURN_MEAN}": 450,
            TRAINING_ITERATION: 100,
        },
    ),
)

results = tuner.fit()
best_result = results.get_best_result()
ckpt = best_result.get_best_checkpoint(f"{ENV_RUNNER_RESULTS}/{EPISODE_RETURN_MEAN}", "max")
print(ckpt.path)

Related issues

#57533

gemini-code-assist

Code Review

This pull request adds support for nested metrics in Result.get_best_checkpoint by using unflattened_lookup. The changes are correct and are accompanied by good tests covering nested metrics, different modes, and backward compatibility. I've suggested one improvement to the error message when an invalid metric is provided, to make it more helpful for users of nested metrics.
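
The suggested wording is not shown in this conversation. As a rough illustration only, one way to make such an error more helpful is to list the available keys in flattened `a/b` form, so that nested candidates are visible to the user. The function names `flatten_keys` and `invalid_metric_message` and the message text are hypothetical and not taken from the PR.

```python
from typing import Any, Dict, List


def flatten_keys(metrics: Dict[str, Any], delimiter: str = "/", prefix: str = "") -> List[str]:
    """List every leaf key of a nested dict in flattened form."""
    keys: List[str] = []
    for name, value in metrics.items():
        path = f"{prefix}{delimiter}{name}" if prefix else str(name)
        if isinstance(value, dict) and value:
            # Recurse into non-empty sub-dicts, carrying the path so far.
            keys.extend(flatten_keys(value, delimiter, path))
        else:
            keys.append(path)
    return keys


def invalid_metric_message(metric: str, metrics: Dict[str, Any]) -> str:
    """Build a hypothetical error message naming the valid (flattened) keys."""
    available = ", ".join(sorted(flatten_keys(metrics)))
    return (
        f"Invalid metric {metric!r}. "
        f"Available metrics (nested keys joined by '/'): {available}"
    )


# Hypothetical sample metrics dict.
sample = {"env_runners": {"episode_return_mean": 310.0}, "training_iteration": 10}
print(invalid_metric_message("episode_return_mean", sample))
```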

matthewdeng · 2025-11-11T20:55:22Z

python/ray/air/result.py

 import pyarrow

 import ray
+from ray._private.dict import unflattened_lookup


@justinvyu do we want to support this for Train V2 as well, or should we diverge for Tune?

justinvyu · 2025-11-17T20:44:10Z

I'm ok to support this in Train. This is only needed if users self-report nested dicts.

justinvyu

Thanks!

…ckpoint` (ray-project#58537) RLlib uses nested metric structure (like `"{ENV_RUNNER_RESULTS}/{EPISODE_RETURN_MEAN}"`) which `Result.get_best_checkpoint` doesn't support. Following `ResultGrid.get_best_result()` to use `unflattened_lookup`, I've added that to `get_best_checkpoint` along with testing for nested structures (and its backward compatibility) --------- Signed-off-by: Mark Towers <[email protected]> Signed-off-by: Mark Towers <[email protected]> Co-authored-by: Mark Towers <[email protected]> Co-authored-by: Justin Yu <[email protected]>

[rllib, air, train] Add support for nested metrics for `Result.get_be…

f99d7a1

…st_checkpoint` Signed-off-by: Mark Towers <[email protected]>

pseudo-rnd-thoughts requested a review from a team as a code owner November 11, 2025 10:28

pseudo-rnd-thoughts added the rllib (RLlib related issues), rllib-logging (This problem is related to logging metrics), and train-tune labels Nov 11, 2025

pseudo-rnd-thoughts mentioned this pull request Nov 11, 2025

[RLlib] Cannot retreive checkpoint from tune due to nested metrics of RLlib #57533

Closed

gemini-code-assist bot reviewed Nov 11, 2025

Fix RLlib spelling

e3027e7

Signed-off-by: Mark Towers <[email protected]>

cursor bot reviewed Nov 11, 2025

matthewdeng reviewed Nov 11, 2025

Morphlng reviewed Nov 13, 2025

pseudo-rnd-thoughts changed the title ~~[rllib, air, train] Add support for nested metrics for Result.get_best_checkpoint~~ [rllib, air, train] Add support for nested metrics in Result.get_best_checkpoint Nov 13, 2025

justinvyu reviewed Nov 17, 2025

pseudo-rnd-thoughts and others added 3 commits November 17, 2025 20:48

Update python/ray/train/v2/tests/test_result.py

3549004

Co-authored-by: Justin Yu <[email protected]> Signed-off-by: Mark Towers <[email protected]>

Merge branch 'master' into issue-57533

50d949d

pre-commit

d3813bb

Signed-off-by: Mark Towers <[email protected]>

cursor bot reviewed Nov 17, 2025

pseudo-rnd-thoughts added the go (add ONLY when ready to merge, run all tests) label Nov 19, 2025

justinvyu approved these changes Nov 21, 2025

justinvyu merged commit 0325fab into ray-project:master Nov 21, 2025
7 checks passed

justinvyu changed the title ~~[rllib, air, train] Add support for nested metrics in Result.get_best_checkpoint~~ [rllib, train] Add support for nested metrics in Result.get_best_checkpoint Nov 21, 2025
