[RLlib] params.json is not a valid JSON file when using PPO #50051

Open

ema-pe opened this issue Jan 24, 2025 · 2 comments
Labels
bug · rllib · rllib-checkpointing-or-recovery · rllib-oldstack-cleanup

Comments

@ema-pe

ema-pe commented Jan 24, 2025

What happened + What you expected to happen

Bug: When running a training with the PPO algorithm, the params.json file in the experiment directory is not a valid JSON representation of the experiment parameters, while the `params.pkl` file is fine because it is the pickled PPOConfig object.

As an example, I ran an experiment with a simple environment using PPO; here are the contents of the results directory and of the params.json file:

$ ls ~/ray_results/PPO_SimplexTest_2025-01-24_15-26-24fb0ql0hd/
events.out.tfevents.1737728784.dfaas-marl  params.json  params.pkl  progress.csv  result.json
$ cat ~/ray_results/PPO_SimplexTest_2025-01-24_15-26-24fb0ql0hd/params.json
"<ray.rllib.algorithms.ppo.ppo.PPOConfig object at 0x7647680efe30>"

Expected behavior: The `params.json` file should contain a valid JSON representation of the PPOConfig object.

This is a low-severity issue because params.pkl is serialized correctly and Ray RLlib uses that file, not the JSON one, when loading from a checkpoint. I have not tried other algorithms. Still, the JSON file is the only interoperable, human-readable, Python-version-independent record of the algorithm configuration.
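For illustration, here is a minimal sketch of what I would expect params.json to be built from, assuming AlgorithmConfig.to_dict() is used to turn the config into a plain dictionary first (the default=str fallback for non-serializable values is my assumption, not necessarily what RLlib does internally):

import json

from ray.rllib.algorithms.ppo import PPOConfig

# Any AlgorithmConfig works here; CartPole-v1 keeps the sketch self-contained.
config = PPOConfig().environment(env="CartPole-v1")

# Turn the config into a plain dict and dump it as JSON. `default=str` is a
# fallback for values that are not natively JSON-serializable (classes,
# spaces, callables, ...).
with open("params.json", "w") as f:
    json.dump(config.to_dict(), f, indent=2, default=str)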

Versions / Dependencies

Ubuntu 24.04.1 LTS with Ray 2.40.0.

Reproduction script

Just run this script and then browse the results directory to find the `params.json` file.

import gymnasium as gym

from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.utils.spaces.simplex import Simplex
from ray.tune.registry import register_env


class SimplexTest(gym.Env):
    """SimplexTest is a sample environment that basically does nothing."""

    def __init__(self, config=None):
        self.action_space = Simplex(shape=(3,))
        self.observation_space = gym.spaces.Box(shape=(1,), low=-1, high=1)
        self.max_steps = 100

    def reset(self, seed=None, options=None):
        self.current_step = 0

        obs = self.observation_space.sample()
        return obs, {}

    def step(self, action):
        self.current_step += 1

        obs = self.observation_space.sample()
        reward = self.np_random.random()
        terminated = self.current_step == self.max_steps
        return obs, reward, terminated, False, {}


register_env("SimplexTest", lambda env_config: SimplexTest(config=env_config))


if __name__ == "__main__":
    # Algorithm config.
    ppo_config = (
        PPOConfig()
        # By default RLlib uses the new API stack, but I use the old one.
        .api_stack(
            enable_rl_module_and_learner=False, enable_env_runner_and_connector_v2=False
        )
        .environment(env="SimplexTest")
        .framework("torch")
        .env_runners(num_env_runners=0)  # Get experiences in the main process.
        .evaluation(evaluation_interval=None)  # No automatic evaluation.
        .resources(num_gpus=1)
    )

    # Build the experiment.
    ppo_algo = ppo_config.build()
    print(f"Algorithm initialized ({ppo_algo.logdir = }")

    iterations = 2
    print(f"Start of training ({iterations = })")
    for iteration in range(iterations):
        print(f"Iteration {iteration}")
        ppo_algo.train()
    print("Training terminated")

    ppo_algo.stop()
    print("Training end")

Issue Severity

Low: It annoys or frustrates me.

@simonsays1980
Collaborator

@ema-pe Thanks for raising this issue. We are going to take a look into it. Does this also occur when using our new API stack?

@ema-pe
Author

ema-pe commented Jan 27, 2025

@ema-pe Thanks for raising this issue. We are going to take a look into it. Does this also occur when using our new API stack?

Yes, it does. The Simplex action space is not yet supported by the new API stack, so I replaced it with a Box action space in a new reproduction script:

# This script simply runs a training experiment of a dummy environment with PPO
# using Ray RLlib with the new API stack.
import gymnasium as gym

from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.utils.spaces.simplex import Simplex
from ray.tune.registry import register_env


class SimplexTest(gym.Env):
    """SimplexTest is a sample environment that basically does nothing."""

    def __init__(self, config=None):
        #self.action_space = Simplex(shape=(3,))
        self.action_space = gym.spaces.Box(shape=(3,), low=0, high=1)
        self.observation_space = gym.spaces.Box(shape=(1,), low=-1, high=1)
        self.max_steps = 100

    def reset(self, seed=None, options=None):
        self.current_step = 0

        obs = self.observation_space.sample()
        return obs, {}

    def step(self, action):
        self.current_step += 1

        obs = self.observation_space.sample()
        reward = self.np_random.random()
        terminated = self.current_step == self.max_steps
        return obs, reward, terminated, False, {}


register_env("SimplexTest", lambda env_config: SimplexTest(config=env_config))


if __name__ == "__main__":
    # Algorithm config.
    ppo_config = (
        PPOConfig()
        .environment(env="SimplexTest")
        .framework("torch")
        .env_runners(num_env_runners=0)  # Get experiences in the main process.
        .evaluation(evaluation_interval=None)  # No automatic evaluation.
        .resources(num_gpus=1)
    )

    # Build the experiment.
    ppo_algo = ppo_config.build()
    print(f"Algorithm initialized ({ppo_algo.logdir = }")

    iterations = 2
    print(f"Start of training ({iterations = })")
    for iteration in range(iterations):
        print(f"Iteration {iteration}")
        ppo_algo.train()
    print("Training terminated")

    ppo_algo.stop()
    print("Training end")

After the training:

$ cat /home/emanuele/ray_results/PPO_SimplexTest_2025-01-27_14-21-28_6cvgbr_/params.json
"<ray.rllib.algorithms.ppo.ppo.PPOConfig object at 0x79a94a3fc290>"
