
[RLlib] Cannot retrieve checkpoint from Tune due to nested RLlib metrics #57533

@Morphlng

What happened + What you expected to happen

Description:

I have already posted this issue on the Ray forum. In short, with the latest Ray release (2.49.2), get_best_checkpoint fails on results from tune.Tuner when the metric is nested under the "env_runners" key.

What I expected

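from ray.rllib.utils.metrics import ENV_RUNNER_RESULTS, EPISODE_RETURN_MEAN

# `tuner` is the tune.Tuner built in the reproduction script below.
results = tuner.fit()
best_result = results.get_best_result()
ckpt = best_result.get_best_checkpoint(f"{ENV_RUNNER_RESULTS}/{EPISODE_RETURN_MEAN}", "max")
print(ckpt.path)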

With the code above, I expect to get the best checkpoint according to the specified metric.

What happened

Instead of returning the checkpoint, it raises a RuntimeError:

RuntimeError: Invalid metric name env_runners/episode_return_mean! You may choose from the following metrics: dict_keys(['timers', 'env_runners', 'learners', 'num_training_step_calls_per_iteration', 'num_env_steps_sampled_lifetime', 'fault_tolerance', 'env_runner_group', 'done', 'training_iteration', 'trial_id', 'date', 'timestamp', 'time_this_iter_s', 'time_total_s', 'pid', 'hostname', 'node_ip', 'config', 'time_since_restore', 'iterations_since_restore', 'perf', 'experiment_tag']).
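
The error lists only top-level keys such as 'env_runners' and 'learners', which suggests that the metrics stored with each checkpoint are a nested dict and that the "/"-joined name is checked as a single key. That is an inference from the message, not something confirmed against Ray's source. The stdlib-only sketch below shows why such a lookup fails and how flattening with "/" (the separator Tune already accepts for nested keys, e.g. in the stop criteria of the reproduction script) makes it succeed. flatten_metrics is a hypothetical helper and the metric values are placeholders.

```python
from typing import Any, Dict


def flatten_metrics(metrics: Dict[str, Any], sep: str = "/", prefix: str = "") -> Dict[str, Any]:
    """Flatten a nested metrics dict into {"a/b/c": value} form (hypothetical helper)."""
    flat: Dict[str, Any] = {}
    for key, value in metrics.items():
        full_key = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested dicts, carrying the joined key as the prefix.
            flat.update(flatten_metrics(value, sep=sep, prefix=full_key))
        else:
            flat[full_key] = value
    return flat


# Placeholder metrics shaped like RLlib's nested result dict.
nested = {
    "env_runners": {"episode_return_mean": 450.0, "num_env_steps_sampled_lifetime": 12000},
    "training_iteration": 5,
}

metric = "env_runners/episode_return_mean"
print(metric in nested)                   # False: only top-level keys are visible
print(metric in flatten_metrics(nested))  # True: the nested value is reachable after flattening
```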

Workaround

If I load the trial directory back with Result.from_path (the code below uses train.Result, formerly air.Result), I can retrieve the checkpoint using the nested metric. This confuses me, because the traceback shows that both paths call the same method.

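from ray import train
from ray.rllib.utils.metrics import ENV_RUNNER_RESULTS, EPISODE_RETURN_MEAN

# Load the trial directory directly instead of going through the ResultGrid.
result = train.Result.from_path("D:/Cache/ray_results/PPO_Reproduce/PPO_CartPole-v1_53ebe_00000_0_2025-10-07_09-17-01")
best_ckpt = result.get_best_checkpoint(f"{ENV_RUNNER_RESULTS}/{EPISODE_RETURN_MEAN}", "max")
print(best_ckpt.path)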
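
Another option that avoids reloading from disk is to bypass get_best_checkpoint and pick the checkpoint directly from best_result.best_checkpoints, which Ray's Result exposes as a list of (Checkpoint, metrics) pairs. The sketch below walks the "/"-separated path through each nested metrics dict (and also accepts an already-flattened key). best_checkpoint_by_path is a hypothetical helper, not part of Ray, and placeholder strings stand in for Checkpoint objects.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")
_MISSING = object()  # sentinel for "metric not present"


def _lookup(metrics: Dict[str, Any], path: str, sep: str = "/") -> Any:
    """Return metrics[a][b]... for path "a/b/...", or _MISSING if any level is absent."""
    # Accept an already-flattened key as well as a nested dict.
    if path in metrics:
        return metrics[path]
    node: Any = metrics
    for part in path.split(sep):
        if not isinstance(node, dict) or part not in node:
            return _MISSING
        node = node[part]
    return node


def best_checkpoint_by_path(
    best_checkpoints: Sequence[Tuple[T, Dict[str, Any]]],
    metric: str,
    mode: str = "max",
) -> Optional[T]:
    """Pick the checkpoint whose (possibly nested) metric is best; None if none has it."""
    if mode not in ("max", "min"):
        raise ValueError(f"mode must be 'max' or 'min', got {mode!r}")
    scored: List[Tuple[T, Any]] = []
    for ckpt, metrics in best_checkpoints:
        value = _lookup(metrics, metric)
        if value is not _MISSING:
            scored.append((ckpt, value))
    if not scored:
        return None
    pick = max if mode == "max" else min
    return pick(scored, key=lambda pair: pair[1])[0]


# Placeholder data: strings stand in for ray.train.Checkpoint objects.
pairs = [
    ("checkpoint_000000", {"env_runners": {"episode_return_mean": 120.0}}),
    ("checkpoint_000001", {"env_runners": {"episode_return_mean": 480.0}}),
    ("checkpoint_000002", {"env_runners": {"episode_return_mean": 310.0}}),
]
print(best_checkpoint_by_path(pairs, "env_runners/episode_return_mean"))  # checkpoint_000001
```

With the objects from the first snippet, the call would be best_checkpoint_by_path(best_result.best_checkpoints, f"{ENV_RUNNER_RESULTS}/{EPISODE_RETURN_MEAN}").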

Related issue

However, RLlib's default result dict does not log checkpoint_dir_name correctly, and air.Result.from_path needs this field to read the checkpoints.
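
One way to check whether a trial directory is affected is to scan its result.json (Tune's JSON logger writes one result object per line) and see which rows actually carry checkpoint_dir_name. rows_with_checkpoint_dir is a hypothetical helper, and the demo writes a small placeholder file rather than reading a real trial directory.

```python
import json
import tempfile
from pathlib import Path
from typing import List, Tuple


def rows_with_checkpoint_dir(result_json: Path) -> List[Tuple[int, str]]:
    """Return (training_iteration, checkpoint_dir_name) for every row that has the field."""
    found: List[Tuple[int, str]] = []
    with result_json.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            row = json.loads(line)
            name = row.get("checkpoint_dir_name")
            if name:
                found.append((row.get("training_iteration", -1), name))
    return found


# Demo with placeholder rows: only iteration 5 recorded a checkpoint directory name.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "result.json"
    demo_rows = [
        {"training_iteration": 4},
        {"training_iteration": 5, "checkpoint_dir_name": "checkpoint_000000"},
    ]
    demo.write_text("\n".join(json.dumps(r) for r in demo_rows) + "\n", encoding="utf-8")
    print(rows_with_checkpoint_dir(demo))  # [(5, 'checkpoint_000000')]
```

An empty list for a trial that did save checkpoints would be consistent with the missing field described here.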

This has been an issue since at least Ray 2.8.0 (the version I started with), and I work around it by adding the field manually through a callback:

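from typing import Optional

from ray.rllib.callbacks.callbacks import RLlibCallback
from ray.rllib.utils.metrics.metrics_logger import MetricsLogger


class CheckpointCallback(RLlibCallback):
    def on_train_result(
        self,
        *,
        algorithm: "Algorithm",
        metrics_logger: Optional[MetricsLogger] = None,
        result: dict,
        **kwargs
    ):
        # `_storage` is private; it is typically only set when the Algorithm runs under Tune.
        if algorithm._storage:
            # Temporarily advance the index to read the directory name for the upcoming
            # checkpoint, write it into the result, then restore the index.
            algorithm._storage.current_checkpoint_index += 1
            result["checkpoint_dir_name"] = algorithm._storage.checkpoint_dir_name
            algorithm._storage.current_checkpoint_index -= 1

# Register it with: config.callbacks(CheckpointCallback)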
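
The callback relies on the private algorithm._storage object: it advances current_checkpoint_index so that checkpoint_dir_name yields the name for the next index, then restores it. If reading the name raised, the index would stay advanced, so a try/finally is a slightly safer shape. The sketch below uses a hypothetical stand-in class instead of Ray's storage object and assumes the checkpoint_000000-style naming seen in Tune trial directories; it is not Ray's implementation.

```python
from dataclasses import dataclass


@dataclass
class StandInStorage:
    """Hypothetical stand-in for the private storage object the callback touches."""

    current_checkpoint_index: int = 0

    @property
    def checkpoint_dir_name(self) -> str:
        # Assumed naming scheme, e.g. "checkpoint_000004".
        return f"checkpoint_{self.current_checkpoint_index:06d}"


def peek_next_checkpoint_dir_name(storage: StandInStorage) -> str:
    """Return the directory name for the next index, leaving the index unchanged."""
    storage.current_checkpoint_index += 1
    try:
        return storage.checkpoint_dir_name
    finally:
        # Always restore the index, even if reading the name raises.
        storage.current_checkpoint_index -= 1


storage = StandInStorage(current_checkpoint_index=3)
print(peek_next_checkpoint_dir_name(storage))  # checkpoint_000004
print(storage.current_checkpoint_index)        # 3
```

Inside the callback, the three index lines would then collapse to result["checkpoint_dir_name"] = peek_next_checkpoint_dir_name(algorithm._storage).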

That said, it would be great if the Ray team could fix this officially.

Versions / Dependencies

Environment:

  • Ray version: 2.49.2
  • Python version: 3.10.18
  • OS: Windows 11 24H2
  • Cloud/Infrastructure: Local
  • Other libs/tools (if relevant): PyTorch

Reproduction script

from ray import tune
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.core.rl_module.default_model_config import DefaultModelConfig
from ray.rllib.utils.metrics import (
    ENV_RUNNER_RESULTS,
    EPISODE_RETURN_MEAN,
    NUM_ENV_STEPS_SAMPLED_LIFETIME,
)
from ray.tune.result import TRAINING_ITERATION

config = (
    PPOConfig()
    .environment("CartPole-v1")
    .rl_module(
        model_config=DefaultModelConfig(
            fcnet_hiddens=[32],
            fcnet_activation="linear",
            vf_share_layers=True,
        ),
    )
    .training(
        lr=0.0003,
        num_epochs=6,
        vf_loss_coeff=0.01,
    )
)

tuner = tune.Tuner(
    "PPO",
    param_space=config.to_dict(),
    run_config=tune.RunConfig(
        "PPO_Reproduce",
        checkpoint_config=tune.CheckpointConfig(
            num_to_keep=10,
            checkpoint_score_attribute=f"{ENV_RUNNER_RESULTS}/{EPISODE_RETURN_MEAN}",
            checkpoint_at_end=True,
            checkpoint_frequency=5,
        ),
        stop={
            f"{ENV_RUNNER_RESULTS}/{NUM_ENV_STEPS_SAMPLED_LIFETIME}": 3e5,
            f"{ENV_RUNNER_RESULTS}/{EPISODE_RETURN_MEAN}": 450,
            TRAINING_ITERATION: 100,
        },
    ),
)

results = tuner.fit()
best_result = results.get_best_result()
ckpt = best_result.get_best_checkpoint(f"{ENV_RUNNER_RESULTS}/{EPISODE_RETURN_MEAN}", "max")
print(ckpt.path)

Issue Severity

Low: It annoys or frustrates me.

Metadata

Assignees: No one assigned

Labels: P0 (Issues that should be fixed in short order), bug (Something that is supposed to be working; but isn't), community-backlog, rllib (RLlib related issues), stability, tune (Tune-related issues), usability

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions