[RLlib] Cannot retrieve checkpoint from `tune` due to nested metrics of RLlib #57533

### What happened + What you expected to happen

## Description:

I've already posted this issue to the [Ray forum](https://discuss.ray.io/t/cannot-retreive-checkpoint-from-tune-due-to-nested-metrics-of-rllib/23230). In short, with the latest Ray release (2.49.2), `get_best_checkpoint` fails on results returned by `tune.Tuner` when the metric is nested under the `env_runners` key.


### What you expect

```python
results = tuner.fit()
best_result = results.get_best_result()
ckpt = best_result.get_best_checkpoint(f"{ENV_RUNNER_RESULTS}/{EPISODE_RETURN_MEAN}", "max")
print(ckpt.path)
```

With the code above, we expect to get the best checkpoint with respect to the specified metric.

### What happened

Instead of returning the checkpoint, it raises a `RuntimeError`:

```bash
RuntimeError: Invalid metric name env_runners/episode_return_mean! You may choose from the following metrics: dict_keys(['timers', 'env_runners', 'learners', 'num_training_step_calls_per_iteration', 'num_env_steps_sampled_lifetime', 'fault_tolerance', 'env_runner_group', 'done', 'training_iteration', 'trial_id', 'date', 'timestamp', 'time_this_iter_s', 'time_total_s', 'pid', 'hostname', 'node_ip', 'config', 'time_since_restore', 'iterations_since_restore', 'perf', 'experiment_tag']).
```

<img width="1377" height="783" alt="Image" src="https://github.com/user-attachments/assets/592c1ad1-baf9-4a47-95d6-06b458c7d489" />
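The keys listed in the error message are all top-level entries of the result dict, while `episode_return_mean` sits one level down under `env_runners`. The following is a minimal sketch of how a `/`-delimited name relates to such a nested dict. `flatten_metrics` is a hypothetical helper written for illustration (not part of Ray's API), and the metric values are made up.

```python
from typing import Any, Dict


def flatten_metrics(
    nested: Dict[str, Any], sep: str = "/", prefix: str = ""
) -> Dict[str, Any]:
    """Flatten a nested metrics dict into one level with `sep`-joined keys.

    Hypothetical helper for illustration only; not part of Ray's API.
    """
    flat: Dict[str, Any] = {}
    for key, value in nested.items():
        full_key = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-dicts, carrying the joined key as the prefix.
            flat.update(flatten_metrics(value, sep=sep, prefix=full_key))
        else:
            flat[full_key] = value
    return flat


# Made-up values that mimic the shape of an RLlib result dict.
nested_result = {
    "env_runners": {"episode_return_mean": 123.4},
    "training_iteration": 5,
}
flat_result = flatten_metrics(nested_result)

# The nested dict only exposes top-level keys, so this membership check fails ...
print("env_runners/episode_return_mean" in nested_result)
# ... while the flattened view contains the delimited name.
print("env_runners/episode_return_mean" in flat_result)
```

If the metric name is checked against nested metrics, as the error message suggests, a lookup like the first `print` would explain why the delimited name is rejected.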

## Ways to work around

It seems that if we use `air.Result` to load these checkpoints back from disk, we can then retrieve the checkpoint with nested metrics. This confuses me, since the traceback shows that both paths call the same method.

```python
from ray import train

result = train.Result.from_path("D:/Cache/ray_results/PPO_Reproduce/PPO_CartPole-v1_53ebe_00000_0_2025-10-07_09-17-01")
best_ckpt = result.get_best_checkpoint(f"{ENV_RUNNER_RESULTS}/{EPISODE_RETURN_MEAN}", "max")
print(best_ckpt.path)
```
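Another possible workaround is to pick the best checkpoint manually from the in-memory result. This sketch assumes that `best_result.best_checkpoints` is a list of `(checkpoint, metrics)` pairs whose `metrics` dicts may still be nested. `lookup_metric` and `pick_best_checkpoint` are hypothetical helpers (not Ray APIs), and the example data uses plain strings in place of real checkpoint objects.

```python
from typing import Any, List, Optional, Tuple


def lookup_metric(metrics: dict, name: str, sep: str = "/") -> Optional[Any]:
    """Return the value for `name`, trying a flat key first, then a nested path."""
    if name in metrics:
        return metrics[name]
    node: Any = metrics
    for part in name.split(sep):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def pick_best_checkpoint(
    pairs: List[Tuple[Any, dict]], metric: str, mode: str = "max"
) -> Any:
    """Pick the checkpoint whose metrics score best on `metric`.

    Hypothetical helper; `pairs` is assumed to look like `Result.best_checkpoints`.
    """
    if mode not in ("max", "min"):
        raise ValueError(f"mode must be 'max' or 'min', got {mode!r}")
    # Keep only checkpoints that actually report the metric.
    scored = [
        (score, ckpt)
        for ckpt, metrics in pairs
        if (score := lookup_metric(metrics, metric)) is not None
    ]
    if not scored:
        raise RuntimeError(f"No checkpoint reports metric {metric!r}")
    choose = max if mode == "max" else min
    return choose(scored, key=lambda item: item[0])[1]


# Made-up data: strings stand in for checkpoint objects.
pairs = [
    ("checkpoint_000000", {"env_runners": {"episode_return_mean": 21.0}}),
    ("checkpoint_000001", {"env_runners": {"episode_return_mean": 187.5}}),
    ("checkpoint_000002", {"env_runners/episode_return_mean": 95.0}),
]
best = pick_best_checkpoint(pairs, "env_runners/episode_return_mean", "max")
print(best)
```

With a real run, the call would presumably look like `pick_best_checkpoint(best_result.best_checkpoints, f"{ENV_RUNNER_RESULTS}/{EPISODE_RETURN_MEAN}")`, though I have not verified this against every Ray version.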

### Related issue

However, RLlib's default `result_dict` does not correctly log `checkpoint_dir_name`, which `air.Result.from_path` requires in order to read the checkpoints.

<img width="1974" height="633" alt="Image" src="https://github.com/user-attachments/assets/8d1abaea-e93d-4e38-97e2-c1c2804c7636" />

This has been an issue since at least Ray 2.8.0 (when I started using Ray), and I have come up with a workaround that adds the field manually through a callback.

```python
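from typing import Optional

from ray.rllib.callbacks.callbacks import RLlibCallback
from ray.rllib.utils.metrics.metrics_logger import MetricsLogger


class CheckpointCallback(RLlibCallback):
    def on_train_result(
        self,
        *,
        algorithm: "Algorithm",
        metrics_logger: Optional[MetricsLogger] = None,
        result: dict,
        **kwargs,
    ):
        # `_storage` is a private attribute; do nothing when it is not set.
        if algorithm._storage:
            # Bump the index temporarily so `checkpoint_dir_name` reflects the
            # next checkpoint index, record it in the result, then restore it.
            algorithm._storage.current_checkpoint_index += 1
            result["checkpoint_dir_name"] = algorithm._storage.checkpoint_dir_name
            algorithm._storage.current_checkpoint_index -= 1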

But it would be great if the Ray team could fix the problem officially.

### Versions / Dependencies

## Environment:
- Ray version: 2.49.2
- Python version: 3.10.18
- OS: Windows 11 24H2
- Cloud/Infrastructure: Local
- Other libs/tools (if relevant): PyTorch

### Reproduction script

```python
from ray import tune
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.core.rl_module.default_model_config import DefaultModelConfig
from ray.rllib.utils.metrics import (
    ENV_RUNNER_RESULTS,
    EPISODE_RETURN_MEAN,
    NUM_ENV_STEPS_SAMPLED_LIFETIME,
)
from ray.tune.result import TRAINING_ITERATION

config = (
    PPOConfig()
    .environment("CartPole-v1")
    .rl_module(
        model_config=DefaultModelConfig(
            fcnet_hiddens=[32],
            fcnet_activation="linear",
            vf_share_layers=True,
        ),
    )
    .training(
        lr=0.0003,
        num_epochs=6,
        vf_loss_coeff=0.01,
    )
)

tuner = tune.Tuner(
    "PPO",
    param_space=config.to_dict(),
    run_config=tune.RunConfig(
        "PPO_Reproduce",
        checkpoint_config=tune.CheckpointConfig(
            num_to_keep=10,
            checkpoint_score_attribute=f"{ENV_RUNNER_RESULTS}/{EPISODE_RETURN_MEAN}",
            checkpoint_at_end=True,
            checkpoint_frequency=5,
        ),
        stop={
            f"{ENV_RUNNER_RESULTS}/{NUM_ENV_STEPS_SAMPLED_LIFETIME}": 3e5,
            f"{ENV_RUNNER_RESULTS}/{EPISODE_RETURN_MEAN}": 450,
            TRAINING_ITERATION: 100,
        },
    ),
)

results = tuner.fit()
best_result = results.get_best_result()
ckpt = best_result.get_best_checkpoint(f"{ENV_RUNNER_RESULTS}/{EPISODE_RETURN_MEAN}", "max")
print(ckpt.path)
```

### Issue Severity

Low: It annoys or frustrates me.
