[rllib] Fix custom model-config mismatch between env-runner and learner #58739
Conversation
Signed-off-by: Mark Towers <[email protected]>
Code Review
This pull request addresses a model configuration mismatch between the environment runner and the learner by correctly merging the algorithm's base model config with the RLModuleSpec's custom config. It also introduces a valuable error check in TorchRLModule.set_state to detect architecture mismatches when loading state into an inference_only module. My review includes a critical fix for the config merging logic to prevent a TypeError when the model config is a dataclass and to ensure compatibility with Python versions older than 3.9.
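For reference, a minimal sketch of the kind of merge the review describes, assuming the base model config may be a dataclass and targeting Python versions older than 3.9 (so avoiding the `|` dict-merge operator). The helper name and arguments are illustrative, not RLlib's actual internals:

```python
import dataclasses

def merge_model_configs(base_model_config, custom_model_config):
    """Hypothetical helper: merge an algorithm's base model config with a
    user-supplied custom config, custom values taking precedence."""
    # A dataclass config must be converted to a plain dict first;
    # unpacking a dataclass instance with ** raises a TypeError.
    if dataclasses.is_dataclass(base_model_config):
        base_model_config = dataclasses.asdict(base_model_config)
    # {**a, **b} rather than a | b keeps this compatible with Python < 3.9.
    return {**base_model_config, **(custom_model_config or {})}
```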
HassamSheikh
left a comment
LGTM
```python
raise ValueError(
    "Architecture mismatch detected when loading state into inference_only module! "
    f"Missing parameters (not found in source state): {list(missing_keys)} "
    "This usually indicates the learner and env-runner have different architectures."
)
```
Let's please be a little more precise here.
-> What does having a different architecture mean here?
Basically, this error should give the user a good clue about what they are doing wrong.
Good spot, it should probably reference the layer names that are different.
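As a sketch, a more precise message could surface the differing parameter (layer) names on both sides, assuming both `missing_keys` and `unexpected_keys` from `load_state_dict` are in scope; the wording is illustrative:

```python
# Illustrative only: name the parameters that differ on each side.
raise ValueError(
    "State mismatch between learner and env-runner RLModule! "
    f"Parameters expected by this module but absent from the incoming "
    f"state: {sorted(missing_keys)}. "
    f"Parameters in the incoming state but absent from this module: "
    f"{sorted(unexpected_keys)}. "
    "This usually means the two modules were built from different model "
    "configs (e.g. different layer sizes)."
)
```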
Signed-off-by: Mark Towers <[email protected]>
ArturNiederfahrenhorst
left a comment
thanks!
Description
The algorithm config wasn't updating `rl_module_spec.model_config` when a custom one was specified, which meant the learner and env-runner were built with different model configs. As a result, the env-runner model wasn't being updated. The reason this problem wasn't detected previously is that we used `strict=False` when updating the model state-dict. Therefore, I've added an error check that the missing keys should always be empty, which will detect when the env-runner is missing components from the learner's updated model.

```python
from ray.rllib.algorithms import PPOConfig
from ray.rllib.core.rl_module import RLModuleSpec
from ray.rllib.policy.sample_batch import DEFAULT_POLICY_ID

config = (
    PPOConfig()
    .environment('CartPole-v1')
    .env_runners(
        num_env_runners=0,
        num_envs_per_env_runner=1,
    )
    .rl_module(
        rl_module_spec=RLModuleSpec(
            model_config={
                # This used to cause an encoder.config.shared mismatch.
                "head_fcnet_hiddens": (32,),
            }
        )
    )
)

algo = config.build_algo()
learner_module = algo.learner_group._learner._module[DEFAULT_POLICY_ID]
env_runner_modules = algo.env_runner_group.foreach_env_runner(
    lambda runner: runner.module
)
print(f'{learner_module.encoder.config.shared=}')
print(f'{[mod.encoder.config.shared for mod in env_runner_modules]=}')
algo.train()
```

Related issues
Closes #58715

Signed-off-by: Mark Towers <[email protected]>
Co-authored-by: Mark Towers <[email protected]>
Co-authored-by: Hassam Ullah Sheikh <[email protected]>