[RLlib] params.json is not a valid JSON file when using PPO #50051

Open

ema-pe opened this issue Jan 24, 2025 · 2 comments
Labels
bug · rllib · rllib-checkpointing-or-recovery · rllib-oldstack-cleanup

Comments

@ema-pe

ema-pe commented Jan 24, 2025

What happened + What you expected to happen

Bug: When running a training with the PPO algorithm, the params.json file in the experiment directory is not a valid JSON representation of the experiment parameters, while the `params.pkl` file is fine because it is the pickled PPOConfig object.

As an example, I ran an experiment with a simple environment using PPO; here are the contents of the results directory and of the params.json file:

$ ls ~/ray_results/PPO_SimplexTest_2025-01-24_15-26-24fb0ql0hd/
events.out.tfevents.1737728784.dfaas-marl  params.json  params.pkl  progress.csv  result.json
$ cat ~/ray_results/PPO_SimplexTest_2025-01-24_15-26-24fb0ql0hd/params.json
"<ray.rllib.algorithms.ppo.ppo.PPOConfig object at 0x7647680efe30>"

Expected behavior: The `params.json` file should contain a valid JSON representation of the PPOConfig object.

This is a low-severity issue because params.pkl is serialized correctly and Ray RLlib uses that file, not the JSON one, when loading from a checkpoint. I have not tried other algorithms. Still, the JSON file is the only interoperable, human-readable, Python-version-independent record of the algorithm configuration.
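For illustration, here is a minimal sketch of what I would expect params.json to be built from, assuming AlgorithmConfig.to_dict() is used to turn the config into a plain dictionary first (the default=str fallback for non-serializable values is my assumption, not necessarily what RLlib does internally):

import json

from ray.rllib.algorithms.ppo import PPOConfig

# Any AlgorithmConfig works here; CartPole-v1 keeps the sketch self-contained.
config = PPOConfig().environment(env="CartPole-v1")

# Turn the config into a plain dict and dump it as JSON. `default=str` is a
# fallback for values that are not natively JSON-serializable (classes,
# spaces, callables, ...).
with open("params.json", "w") as f:
    json.dump(config.to_dict(), f, indent=2, default=str)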

Versions / Dependencies

Ubuntu 24.04.1 LTS with Ray 2.40.0.

Reproduction script

Just run this script and then browse the results directory to find the `params.json` file.

import gymnasium as gym

from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.utils.spaces.simplex import Simplex
from ray.tune.registry import register_env


class SimplexTest(gym.Env):
    """SimplexTest is a sample environment that basically does nothing."""

    def __init__(self, config=None):
        self.action_space = Simplex(shape=(3,))
        self.observation_space = gym.spaces.Box(shape=(1,), low=-1, high=1)
        self.max_steps = 100

    def reset(self, seed=None, options=None):
        self.current_step = 0

        obs = self.observation_space.sample()
        return obs, {}

    def step(self, action):
        self.current_step += 1

        obs = self.observation_space.sample()
        reward = self.np_random.random()
        terminated = self.current_step == self.max_steps
        return obs, reward, terminated, False, {}


register_env("SimplexTest", lambda env_config: SimplexTest(config=env_config))


if __name__ == "__main__":
    # Algorithm config.
    ppo_config = (
        PPOConfig()
        # By default RLlib uses the new API stack, but I use the old one.
        .api_stack(
            enable_rl_module_and_learner=False, enable_env_runner_and_connector_v2=False
        )
        .environment(env="SimplexTest")
        .framework("torch")
        .env_runners(num_env_runners=0)  # Get experiences in the main process.
        .evaluation(evaluation_interval=None)  # No automatic evaluation.
        .resources(num_gpus=1)
    )

    # Build the experiment.
    ppo_algo = ppo_config.build()
    print(f"Algorithm initialized ({ppo_algo.logdir = }")

    iterations = 2
    print(f"Start of training ({iterations = })")
    for iteration in range(iterations):
        print(f"Iteration {iteration}")
        ppo_algo.train()
    print("Training terminated")

    ppo_algo.stop()
    print("Training end")

Issue Severity

Low: It annoys or frustrates me.

@simonsays1980
Collaborator

@ema-pe Thanks for raising this issue. We are going to take a look into it. Does this also occur when using our new API stack?

@ema-pe
Author

ema-pe commented Jan 27, 2025

@ema-pe Thanks for raising this issue. We are going to take a look into it. Does this also occur when using our new API stack?

Yes, it does. The Simplex action space is not yet supported by the new API stack, so I replaced it with a Box action space in a new reproduction script:

# This script simply runs a training experiment of a dummy environment with PPO
# using Ray RLlib with the new API stack.
import gymnasium as gym

from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.utils.spaces.simplex import Simplex
from ray.tune.registry import register_env


class SimplexTest(gym.Env):
    """SimplexTest is a sample environment that basically does nothing."""

    def __init__(self, config=None):
        #self.action_space = Simplex(shape=(3,))
        self.action_space = gym.spaces.Box(shape=(3,), low=0, high=1)
        self.observation_space = gym.spaces.Box(shape=(1,), low=-1, high=1)
        self.max_steps = 100

    def reset(self, seed=None, options=None):
        self.current_step = 0

        obs = self.observation_space.sample()
        return obs, {}

    def step(self, action):
        self.current_step += 1

        obs = self.observation_space.sample()
        reward = self.np_random.random()
        terminated = self.current_step == self.max_steps
        return obs, reward, terminated, False, {}


register_env("SimplexTest", lambda env_config: SimplexTest(config=env_config))


if __name__ == "__main__":
    # Algorithm config.
    ppo_config = (
        PPOConfig()
        .environment(env="SimplexTest")
        .framework("torch")
        .env_runners(num_env_runners=0)  # Get experiences in the main process.
        .evaluation(evaluation_interval=None)  # No automatic evaluation.
        .resources(num_gpus=1)
    )

    # Build the experiment.
    ppo_algo = ppo_config.build()
    print(f"Algorithm initialized ({ppo_algo.logdir = }")

    iterations = 2
    print(f"Start of training ({iterations = })")
    for iteration in range(iterations):
        print(f"Iteration {iteration}")
        ppo_algo.train()
    print("Training terminated")

    ppo_algo.stop()
    print("Training end")

After the training:

$ cat /home/emanuele/ray_results/PPO_SimplexTest_2025-01-27_14-21-28_6cvgbr_/params.json
"<ray.rllib.algorithms.ppo.ppo.PPOConfig object at 0x79a94a3fc290>"
