What happened + What you expected to happen
Description:
This is an issue I have already posted on the Ray forum. In short, with the latest Ray 2.49.2, one cannot get_best_checkpoint from a tune.Tuner result when the metric is nested inside the "env_runners" key.
What I expected
results = tuner.fit()
best_result = results.get_best_result()
ckpt = best_result.get_best_checkpoint(f"{ENV_RUNNER_RESULTS}/{EPISODE_RETURN_MEAN}", "max")
print(ckpt.path)
Running the code above, I expect to get back the best checkpoint with respect to the given metric.
What happened
Instead of returning the checkpoint, it raises a RuntimeError:
RuntimeError: Invalid metric name env_runners/episode_return_mean! You may choose from the following metrics: dict_keys(['timers', 'env_runners', 'learners', 'num_training_step_calls_per_iteration', 'num_env_steps_sampled_lifetime', 'fault_tolerance', 'env_runner_group', 'done', 'training_iteration', 'trial_id', 'date', 'timestamp', 'time_this_iter_s', 'time_total_s', 'pid', 'hostname', 'node_ip', 'config', 'time_since_restore', 'iterations_since_restore', 'perf', 'experiment_tag']).
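For context, the metric name I pass follows the usual Tune convention of flattening nested result dicts into "/"-joined keys. A minimal illustration using Tune's own flatten_dict helper (the sample dict here is made up):

from ray.tune.utils import flatten_dict

# Nested result dicts are conventionally flattened into "/"-joined keys,
# which is why "env_runners/episode_return_mean" should be a valid name.
nested = {"env_runners": {"episode_return_mean": 450.0}}
print(flatten_dict(nested))  # -> {'env_runners/episode_return_mean': 450.0}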
Ways to work around it
It seems that if we use air.Result to load these checkpoints back, we can retrieve the checkpoint with the nested metric just fine. This really confuses me, since the traceback shows that the same method is being called.
from ray import train
from ray.rllib.utils.metrics import ENV_RUNNER_RESULTS, EPISODE_RETURN_MEAN
result = train.Result.from_path("D:/Cache/ray_results/PPO_Reproduce/PPO_CartPole-v1_53ebe_00000_0_2025-10-07_09-17-01")
best_ckpt = result.get_best_checkpoint(f"{ENV_RUNNER_RESULTS}/{EPISODE_RETURN_MEAN}", "max")
print(best_ckpt.path)
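Another way around it, as a sketch rather than a verified fix: Result.best_checkpoints pairs each retained Checkpoint with the metrics dict reported when it was saved, so one can pick the best checkpoint manually, assuming those stored metrics keep the nested layout shown in the error above.

# Manual selection over best_result.best_checkpoints, a list of
# (Checkpoint, metrics_dict) tuples for the retained checkpoints.
best_ckpt, _ = max(
    best_result.best_checkpoints,
    key=lambda cm: cm[1][ENV_RUNNER_RESULTS][EPISODE_RETURN_MEAN],
)
print(best_ckpt.path)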
Related issue
However, RLlib's default result dict does not correctly log checkpoint_dir_name, which air.Result.from_path requires in order to read the checkpoints.
This has been an issue since at least Ray 2.8.0 (when I started using Ray), and I have come up with a workaround that adds the field manually through a callback.
from ray.rllib.callbacks.callbacks import RLlibCallback
from ray.rllib.utils.metrics.metrics_logger import MetricsLogger

class CheckpointCallback(RLlibCallback):
    def on_train_result(
        self,
        *,
        algorithm: "Algorithm",
        metrics_logger: MetricsLogger = None,
        result: dict,
        **kwargs,
    ):
        # Log the directory name of the checkpoint that is about to be
        # written, so that Result.from_path can locate it later.
        if algorithm._storage:
            algorithm._storage.current_checkpoint_index += 1
            result["checkpoint_dir_name"] = algorithm._storage.checkpoint_dir_name
            algorithm._storage.current_checkpoint_index -= 1
I attach this callback via the config, as sketched below, but it would be great if the Ray team could officially fix the problem.
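A minimal sketch of the registration (assuming the PPOConfig from the reproduction script below; .callbacks() is the standard AlgorithmConfig hook for registering callback classes):

# Attach the workaround callback before building/tuning the algorithm.
config = (
    PPOConfig()
    .environment("CartPole-v1")
    .callbacks(CheckpointCallback)
)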
Versions / Dependencies
Environment:
- Ray version: 2.49.2
- Python version: 3.10.18
- OS: Windows 11 24H2
- Cloud/Infrastructure: Local
- Other libs/tools (if relevant): PyTorch
Reproduction script
from ray import tune
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.core.rl_module.default_model_config import DefaultModelConfig
from ray.rllib.utils.metrics import (
    ENV_RUNNER_RESULTS,
    EPISODE_RETURN_MEAN,
    NUM_ENV_STEPS_SAMPLED_LIFETIME,
)
from ray.tune.result import TRAINING_ITERATION

config = (
    PPOConfig()
    .environment("CartPole-v1")
    .rl_module(
        model_config=DefaultModelConfig(
            fcnet_hiddens=[32],
            fcnet_activation="linear",
            vf_share_layers=True,
        ),
    )
    .training(
        lr=0.0003,
        num_epochs=6,
        vf_loss_coeff=0.01,
    )
)

tuner = tune.Tuner(
    "PPO",
    param_space=config.to_dict(),
    run_config=tune.RunConfig(
        name="PPO_Reproduce",
        checkpoint_config=tune.CheckpointConfig(
            num_to_keep=10,
            checkpoint_score_attribute=f"{ENV_RUNNER_RESULTS}/{EPISODE_RETURN_MEAN}",
            checkpoint_at_end=True,
            checkpoint_frequency=5,
        ),
        stop={
            f"{ENV_RUNNER_RESULTS}/{NUM_ENV_STEPS_SAMPLED_LIFETIME}": 3e5,
            f"{ENV_RUNNER_RESULTS}/{EPISODE_RETURN_MEAN}": 450,
            TRAINING_ITERATION: 100,
        },
    ),
)

results = tuner.fit()
best_result = results.get_best_result()
ckpt = best_result.get_best_checkpoint(f"{ENV_RUNNER_RESULTS}/{EPISODE_RETURN_MEAN}", "max")
print(ckpt.path)
Issue Severity
Low: It annoys or frustrates me.