
Conversation

@pseudo-rnd-thoughts (Member) commented Nov 18, 2025

Description

The algorithm config wasn't updating `rl_module_spec.model_config` when a custom one was specified, which meant the learner and the env-runner were built with different model configs. As a result, the env-runner model wasn't being updated.
The reason this problem wasn't detected previously is that we used `strict=False` when loading the model state-dict.
Therefore, I've added an error check that the missing keys are always empty; it detects when the env-runner module is missing components that the learner's model update provides.

```python
from ray.rllib.algorithms import PPOConfig
from ray.rllib.core.rl_module import RLModuleSpec
from ray.rllib.policy.sample_batch import DEFAULT_POLICY_ID


config = (
    PPOConfig()
    .environment('CartPole-v1')
    .env_runners(
        num_env_runners=0,
        num_envs_per_env_runner=1,
    )
    .rl_module(
        rl_module_spec=RLModuleSpec(
            model_config={
                # This used to cause an encoder.config.shared mismatch.
                "head_fcnet_hiddens": (32,),
            }
        )
    )
)

algo = config.build_algo()

# Compare the learner encoder's `shared` flag against the env-runner
# modules' encoder configs -- before the fix, these disagreed.
learner_module = algo.learner_group._learner._module[DEFAULT_POLICY_ID]
env_runner_modules = algo.env_runner_group.foreach_env_runner(lambda runner: runner.module)

print(f'{learner_module.encoder.config.shared=}')
print(f'{[mod.encoder.config.shared for mod in env_runner_modules]=}')

algo.train()
```
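For reference, here's a minimal plain-PyTorch sketch (illustrative module names, not RLlib's actual code) of why `strict=False` hid the mismatch: `load_state_dict` reports missing/unexpected keys instead of raising, so a runner built from a stale config silently keeps its old weights.

```python
import torch.nn as nn

# Learner built with the custom (non-shared) config, env-runner with the
# stale default (shared) config -- the parameter names no longer line up.
learner_net = nn.ModuleDict({"actor_encoder": nn.Linear(4, 32), "pi": nn.Linear(32, 2)})
runner_net = nn.ModuleDict({"shared_encoder": nn.Linear(4, 32), "pi": nn.Linear(32, 2)})

result = runner_net.load_state_dict(learner_net.state_dict(), strict=False)
print(result.missing_keys)     # ['shared_encoder.weight', 'shared_encoder.bias']
print(result.unexpected_keys)  # ['actor_encoder.weight', 'actor_encoder.bias']

# The fix: treat any missing key as a hard error instead of ignoring it.
if result.missing_keys:
    raise ValueError(f"Architecture mismatch! Missing parameters: {result.missing_keys}")
```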

Related issues

Closes #58715

@pseudo-rnd-thoughts pseudo-rnd-thoughts requested a review from a team as a code owner November 18, 2025 12:47
@pseudo-rnd-thoughts pseudo-rnd-thoughts added rllib RLlib related issues rllib-algorithms An RLlib algorithm/Trainer is not learning. rllib-models An issue related to RLlib (default or custom) Models. labels Nov 18, 2025
@gemini-code-assist (bot, Contributor) left a comment

Code Review

This pull request addresses a model configuration mismatch between the environment runner and the learner by correctly merging the algorithm's base model config with the RLModuleSpec's custom config. It also introduces a valuable error check in TorchRLModule.set_state to detect architecture mismatches when loading state into an inference_only module. My review includes a critical fix for the config merging logic to prevent a TypeError when the model config is a dataclass and to ensure compatibility with Python versions older than 3.9.
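As a rough illustration of the merge fix the bot refers to (illustrative names, not RLlib's actual helper), the custom config has to be merged without assuming it is a plain dict, and without the Python 3.9+ dict-union operator:

```python
import dataclasses
from typing import Any, Dict, Union


def merge_model_configs(base: Dict[str, Any], custom: Union[Dict[str, Any], Any]) -> Any:
    # A dataclass model config is a complete config object; merging a dict
    # into it with `|` would raise a TypeError, so pass it through unchanged.
    if dataclasses.is_dataclass(custom):
        return custom
    # `{**base, **custom}` works on Python < 3.9, unlike `base | custom`;
    # user-specified keys take precedence over the algorithm defaults.
    return {**base, **custom}
```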

@pseudo-rnd-thoughts pseudo-rnd-thoughts added the go add ONLY when ready to merge, run all tests label Nov 19, 2025
@HassamSheikh (Contributor) commented:

LGTM

@HassamSheikh HassamSheikh self-assigned this Nov 20, 2025
@HassamSheikh (Contributor) left a comment:

LGTM

@ArturNiederfahrenhorst ArturNiederfahrenhorst enabled auto-merge (squash) November 21, 2025 11:11
```python
raise ValueError(
    "Architecture mismatch detected when loading state into inference_only module! "
    f"Missing parameters (not found in source state): {list(missing_keys)} "
    "This usually indicates the learner and env-runner have different architectures."
)
```
Contributor commented:

Let's please be a little more precise here. What does having a different architecture mean here?

Contributor commented:

Basically, this error should give the user a good clue about what they're doing wrong.

@pseudo-rnd-thoughts (Member, Author) replied:

Good spot, it should probably reference the layer names being different.
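A hypothetical sketch of that more precise message (`check_state_compatible` is illustrative, not RLlib API): name the exact layers present on one side but not the other.

```python
from typing import Any, Dict

import torch.nn as nn


def check_state_compatible(module: nn.Module, state: Dict[str, Any]) -> None:
    # Keys the module expects but the incoming state lacks, and vice versa.
    missing = sorted(set(module.state_dict()) - set(state))
    unexpected = sorted(set(state) - set(module.state_dict()))
    if missing:
        raise ValueError(
            "Architecture mismatch when loading state into an inference_only "
            f"module: the env-runner defines layers {missing} that the learner "
            f"state does not provide (learner-only layers: {unexpected}). This "
            "usually means the learner and env-runner were built from "
            "different model configs."
        )
```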

@github-actions github-actions bot disabled auto-merge November 21, 2025 11:42
@ArturNiederfahrenhorst (Contributor) left a comment:

thanks!

@ArturNiederfahrenhorst ArturNiederfahrenhorst merged commit 32cc715 into ray-project:master Nov 22, 2025
6 checks passed
ykdojo pushed a commit to ykdojo/ray that referenced this pull request Nov 27, 2025
…er (ray-project#58739)
SheldonTsen pushed a commit to SheldonTsen/ray that referenced this pull request Dec 1, 2025
…er (ray-project#58739)

Labels

go (add ONLY when ready to merge, run all tests), rllib (RLlib related issues), rllib-algorithms (An RLlib algorithm/Trainer is not learning), rllib-models (An issue related to RLlib (default or custom) Models)


Development

Successfully merging this pull request may close these issues.

[RLlib] Encoders on EnvRunner objects are set to 'shared' when they shouldn't be if model_config is used and vf_share_layers isn't explicitly set.

4 participants