[rollout, trainer] feat: extend agent loop for custom implementations#4548
[rollout, trainer] feat: extend agent loop for custom implementations#4548wuxibin89 merged 16 commits intoverl-project:mainfrom
Conversation
Allow dynamic loading of custom AgentLoopManager class through configuration. This enables users to specify agent_loop_manager_class in their config to use custom implementations like OsmosisAgentLoopManager.
- Introduced a new optional field `tool_calls` in the `TokenOutput` model to accommodate OpenAI-style tool calls extracted from generation. - This enhancement improves the model's capability to handle additional response data.
- Introduced an optional field `agent_loop_manager_class` in the `AgentLoopConfig` to allow users to specify a custom implementation for the AgentLoopManager, facilitating external integrations such as remote rollouts.
- Introduced a new optional field `custom` in the `RolloutConfig` class to allow users to specify arbitrary configurations for external integrations, enhancing flexibility for custom rollout implementations.
…tLoopManager - Updated the RayPPOTrainer to use importlib for importing the custom AgentLoopManager class specified in the configuration, enhancing security and maintainability by avoiding the use of exec.
Pure Extract Method refactoring with no behavioral changes.
Extracted functions from ToolAgentLoop to utils.py:
- compute_system_prompt: computes system prompt tokens for prefix stripping
- apply_chat_template_with_processor: applies chat template via processor
- apply_chat_template_with_tokenizer: applies chat template via tokenizer
- tokenize_with_processor: tokenizes pre-rendered prompt via processor
- build_gpt_oss_tool_response_text: builds gpt-oss tool response text
All call sites maintain identical behavior:
- Parameter mapping is 1:1 with original inline code
- Executor context (sync vs async) unchanged at call sites
- Empty dict {} passed where original code omitted kwargs (equivalent)
- Optional `tools` parameter handled via conditional branching
This enables reuse in external projects (e.g., remote rollout servers) without duplicating tokenization logic.
…icate compute_system_prompt
- Moved tokenization utility functions from ToolAgentLoop to chat_template.py for better reusability. - Updated import statements to reflect the new structure, ensuring cleaner code organization. - Added detailed docstrings for new functions to clarify their usage and parameters.
… utilities - Modified the apply_chat_template_with_processor and apply_chat_template_with_tokenizer functions to accept apply_chat_template_kwargs as an optional parameter, defaulting to None. - Updated internal logic to initialize apply_chat_template_kwargs to an empty dictionary if not provided, ensuring consistent behavior across function calls. - Removed unnecessary empty dictionary arguments from ToolAgentLoop class methods, streamlining the code.
There was a problem hiding this comment.
Code Review
This pull request enhances the extensibility of the agent loop by allowing a custom AgentLoopManager, extracting tokenization utilities for reusability, and adding a generic custom configuration field. The changes are well-structured and the refactoring of tokenization logic into utility functions is a good step towards better modularity. My review focuses on the new dynamic import mechanism for the custom AgentLoopManager. I have identified a robustness issue that could lead to a crash if the configuration is malformed, and a potential security vulnerability related to code injection. I've provided suggestions to make the implementation more robust and secure.
… import - Added validation to ensure the agent_loop_manager_class is a fully qualified class name. - Implemented detailed error handling for ImportError and AttributeError during dynamic import of the AgentLoopManager, providing clearer feedback for configuration issues. - Updated documentation in AgentLoopConfig to emphasize the need for trusted class paths.
verl/trainer/ppo/ray_trainer.py
Outdated
| if self.config.actor_rollout_ref.rollout.mode == "async": | ||
| from verl.experimental.agent_loop import AgentLoopManager | ||
| # Support custom AgentLoopManager via config | ||
| manager_class = self.config.actor_rollout_ref.rollout.get("agent", {}).get("agent_loop_manager_class") |
There was a problem hiding this comment.
Please move this to a util function
There was a problem hiding this comment.
Thanks for the suggestion!
I've refactored this by adding a load_class_from_fqn() utility function to verl/utils/import_utils.py. This function handles FQN parsing, dynamic import, and provides clear error messages. Commit: 3b78336
The code in ray_trainer.py is now simplified from ~20 lines to just 4 lines:
manager_class_fqn = self.config.actor_rollout_ref.rollout.get("agent", {}).get("agent_loop_manager_class")
if manager_class_fqn:
AgentLoopManager = load_class_from_fqn(manager_class_fqn, "AgentLoopManager")
else:
from verl.experimental.agent_loop import AgentLoopManagerThis also removes the now-unused import importlib from the file.
| prometheus: PrometheusConfig = field(default_factory=PrometheusConfig) | ||
|
|
||
| # Extension point for custom configurations | ||
| custom: Optional[dict] = None |
There was a problem hiding this comment.
This field seems not used?
There was a problem hiding this comment.
This field is designed as an extension point for downstream projects to pass custom configurations without modifying core config classes.
For example, a project implementing remote rollout could use:
actor_rollout_ref:
rollout:
custom:
remote_rollout:
server_url: "https://..."
callback_url: "http://..."
timeout_seconds: 300Then access it via config.actor_rollout_ref.rollout.get("custom", {}) in their custom AgentLoopManager.
Happy to add a docstring clarifying this is an extension point!
…unction - Replaced manual import logic for AgentLoopManager with a new utility function, load_class_from_fqn, to enhance code readability and maintainability. - Added comprehensive error handling in load_class_from_fqn for improved feedback on import issues. - Updated RayPPOTrainer to utilize the new utility, simplifying the configuration process for custom agent loop managers.
|
@wuxibin89 thanks for the review!
Would you mind taking another look when you have a chance? 🙏 |
verl/utils/chat_template.py
Outdated
| """ | ||
| if apply_chat_template_kwargs is None: | ||
| apply_chat_template_kwargs = {} | ||
| if tools is None: |
There was a problem hiding this comment.
It's a bit redundant here, we can safely pass tools=None to processor.apply_chat_template?
|
I don't quit understand why we need these util function, it's more straightforward to call processor and tokenizer function directly.
|
…tLoop - Removed redundant tokenization utility functions from chat_template.py, consolidating logic within the ToolAgentLoop class. - Updated the application of chat templates to utilize the processor and tokenizer directly, enhancing code clarity and maintainability. - Simplified the handling of apply_chat_template_kwargs, ensuring consistent behavior across different processing paths. - Improved the overall structure of the code by eliminating unnecessary imports and functions, promoting better reusability.
You're right. I initially extracted these utilities for consistency with |
…verl-project#4548) ### What does this PR do? This PR enhances the extensibility of the agent loop module by: 1. **Supporting custom AgentLoopManager**: Allow users to specify a custom `AgentLoopManager` class via configuration (`agent_loop_manager_class`), enabling external projects to implement their own rollout management logic. 2. **Extracting gpt-oss tool response builder**: Extract the gpt-oss specific tool response formatting logic into a reusable utility function `build_gpt_oss_tool_response_text()`, combining existing `format_gpt_oss_tool_response_manually()` and `add_generation_prompt_for_gpt_oss()` calls. 3. **Adding extension point for custom configurations**: Introduce a `custom: Optional[dict]` field in `RolloutConfig` to support arbitrary user-defined configurations. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: https://github.com/volcengine/verl/pulls?q=is%3Apr+AgentLoopManager - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) ### Test This PR is a refactoring and extension of existing functionality: - The `build_gpt_oss_tool_response_text()` function maintains the same behavior as the original inline code in `ToolAgentLoop` - The custom `AgentLoopManager` loading uses `importlib` which is a standard Python mechanism - No behavioral changes to existing workflows when `agent_loop_manager_class` is not configured ### API and Usage Example #### 1. Custom AgentLoopManager Users can now specify a custom `AgentLoopManager` class in their config: ```yaml actor_rollout_ref: rollout: mode: async agent: agent_loop_manager_class: "myproject.custom_manager.MyAgentLoopManager" ``` > **Security Note**: The `agent_loop_manager_class` configuration uses dynamic import via `importlib`, which will execute the specified module's code. Only use class paths from trusted sources. Ensure your configuration files are not writable by untrusted users or processes. #### 2. Reusable gpt-oss Tool Response Builder External projects can now import and use the gpt-oss utility: ```python from verl.experimental.agent_loop.utils import build_gpt_oss_tool_response_text # Build gpt-oss tool response text (combines formatting + generation prompt) tool_response_text = build_gpt_oss_tool_response_text(messages, tool_call_names) ``` #### 3. Custom Configuration Extension Point ```yaml actor_rollout_ref: rollout: custom: my_custom_key: "my_custom_value" another_setting: 123 ``` ### Design & Code Changes #### High-Level Design ``` ┌─────────────────────────────────────────────────────────────────┐ │ ray_trainer.py │ │ ┌───────────────────────────────────────────────────────────┐ │ │ │ if agent_loop_manager_class configured: │ │ │ │ dynamically import via importlib │ │ │ │ else: │ │ │ │ use default AgentLoopManager │ │ │ └───────────────────────────────────────────────────────────┘ │ └─────────────────────────────────────────────────────────────────┘ ┌─────────────────────────────────────────────────────────────────┐ │ verl/experimental/agent_loop/utils.py (new export) │ │ ┌───────────────────────────────────────────────────────────┐ │ │ │ build_gpt_oss_tool_response_text() │ │ │ │ - combines format_gpt_oss_tool_response_manually() │ │ │ │ - and add_generation_prompt_for_gpt_oss() │ │ │ └───────────────────────────────────────────────────────────┘ │ └─────────────────────────────────────────────────────────────────┘ ``` #### Specific Changes | File | Changes | |------|---------| | `verl/trainer/ppo/ray_trainer.py` | Added dynamic loading of custom `AgentLoopManager` via `importlib` | | `verl/workers/config/rollout.py` | Added `agent_loop_manager_class` to `AgentLoopConfig`; Added `custom` field to `RolloutConfig` | | `verl/experimental/agent_loop/utils.py` | Added `build_gpt_oss_tool_response_text()` combining existing gpt-oss formatting functions | | `verl/experimental/agent_loop/tool_agent_loop.py` | Refactored to use `build_gpt_oss_tool_response_text()` for cleaner code | ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). **N/A** - This PR targets advanced users extending verl. The code is self-documenting with type hints and docstrings, and the PR description includes complete usage examples. - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. **N/A** - This is a pure refactoring with no behavioral changes: (1) `build_gpt_oss_tool_response_text()` maintains identical behavior to the original inline code; (2) `agent_loop_manager_class` is optional and defaults to the existing `AgentLoopManager`; (3) existing agent loop tests already cover the default code path. - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
…verl-project#4548) ### What does this PR do? This PR enhances the extensibility of the agent loop module by: 1. **Supporting custom AgentLoopManager**: Allow users to specify a custom `AgentLoopManager` class via configuration (`agent_loop_manager_class`), enabling external projects to implement their own rollout management logic. 2. **Extracting gpt-oss tool response builder**: Extract the gpt-oss specific tool response formatting logic into a reusable utility function `build_gpt_oss_tool_response_text()`, combining existing `format_gpt_oss_tool_response_manually()` and `add_generation_prompt_for_gpt_oss()` calls. 3. **Adding extension point for custom configurations**: Introduce a `custom: Optional[dict]` field in `RolloutConfig` to support arbitrary user-defined configurations. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: https://github.com/volcengine/verl/pulls?q=is%3Apr+AgentLoopManager - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) ### Test This PR is a refactoring and extension of existing functionality: - The `build_gpt_oss_tool_response_text()` function maintains the same behavior as the original inline code in `ToolAgentLoop` - The custom `AgentLoopManager` loading uses `importlib` which is a standard Python mechanism - No behavioral changes to existing workflows when `agent_loop_manager_class` is not configured ### API and Usage Example #### 1. Custom AgentLoopManager Users can now specify a custom `AgentLoopManager` class in their config: ```yaml actor_rollout_ref: rollout: mode: async agent: agent_loop_manager_class: "myproject.custom_manager.MyAgentLoopManager" ``` > **Security Note**: The `agent_loop_manager_class` configuration uses dynamic import via `importlib`, which will execute the specified module's code. Only use class paths from trusted sources. Ensure your configuration files are not writable by untrusted users or processes. #### 2. Reusable gpt-oss Tool Response Builder External projects can now import and use the gpt-oss utility: ```python from verl.experimental.agent_loop.utils import build_gpt_oss_tool_response_text # Build gpt-oss tool response text (combines formatting + generation prompt) tool_response_text = build_gpt_oss_tool_response_text(messages, tool_call_names) ``` #### 3. Custom Configuration Extension Point ```yaml actor_rollout_ref: rollout: custom: my_custom_key: "my_custom_value" another_setting: 123 ``` ### Design & Code Changes #### High-Level Design ``` ┌─────────────────────────────────────────────────────────────────┐ │ ray_trainer.py │ │ ┌───────────────────────────────────────────────────────────┐ │ │ │ if agent_loop_manager_class configured: │ │ │ │ dynamically import via importlib │ │ │ │ else: │ │ │ │ use default AgentLoopManager │ │ │ └───────────────────────────────────────────────────────────┘ │ └─────────────────────────────────────────────────────────────────┘ ┌─────────────────────────────────────────────────────────────────┐ │ verl/experimental/agent_loop/utils.py (new export) │ │ ┌───────────────────────────────────────────────────────────┐ │ │ │ build_gpt_oss_tool_response_text() │ │ │ │ - combines format_gpt_oss_tool_response_manually() │ │ │ │ - and add_generation_prompt_for_gpt_oss() │ │ │ └───────────────────────────────────────────────────────────┘ │ └─────────────────────────────────────────────────────────────────┘ ``` #### Specific Changes | File | Changes | |------|---------| | `verl/trainer/ppo/ray_trainer.py` | Added dynamic loading of custom `AgentLoopManager` via `importlib` | | `verl/workers/config/rollout.py` | Added `agent_loop_manager_class` to `AgentLoopConfig`; Added `custom` field to `RolloutConfig` | | `verl/experimental/agent_loop/utils.py` | Added `build_gpt_oss_tool_response_text()` combining existing gpt-oss formatting functions | | `verl/experimental/agent_loop/tool_agent_loop.py` | Refactored to use `build_gpt_oss_tool_response_text()` for cleaner code | ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). **N/A** - This PR targets advanced users extending verl. The code is self-documenting with type hints and docstrings, and the PR description includes complete usage examples. - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. **N/A** - This is a pure refactoring with no behavioral changes: (1) `build_gpt_oss_tool_response_text()` maintains identical behavior to the original inline code; (2) `agent_loop_manager_class` is optional and defaults to the existing `AgentLoopManager`; (3) existing agent loop tests already cover the default code path. - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
What does this PR do?
This PR enhances the extensibility of the agent loop module by:
Supporting custom AgentLoopManager: Allow users to specify a custom
AgentLoopManagerclass via configuration (agent_loop_manager_class), enabling external projects to implement their own rollout management logic.Extracting gpt-oss tool response builder: Extract the gpt-oss specific tool response formatting logic into a reusable utility function
build_gpt_oss_tool_response_text(), combining existingformat_gpt_oss_tool_response_manually()andadd_generation_prompt_for_gpt_oss()calls.Adding extension point for custom configurations: Introduce a
custom: Optional[dict]field inRolloutConfigto support arbitrary user-defined configurations.Checklist Before Starting
[{modules}] {type}: {description}(This will be checked by the CI)Test
This PR is a refactoring and extension of existing functionality:
build_gpt_oss_tool_response_text()function maintains the same behavior as the original inline code inToolAgentLoopAgentLoopManagerloading usesimportlibwhich is a standard Python mechanismagent_loop_manager_classis not configuredAPI and Usage Example
1. Custom AgentLoopManager
Users can now specify a custom
AgentLoopManagerclass in their config:2. Reusable gpt-oss Tool Response Builder
External projects can now import and use the gpt-oss utility:
3. Custom Configuration Extension Point
Design & Code Changes
High-Level Design
Specific Changes
verl/trainer/ppo/ray_trainer.pyAgentLoopManagerviaimportlibverl/workers/config/rollout.pyagent_loop_manager_classtoAgentLoopConfig; Addedcustomfield toRolloutConfigverl/experimental/agent_loop/utils.pybuild_gpt_oss_tool_response_text()combining existing gpt-oss formatting functionsverl/experimental/agent_loop/tool_agent_loop.pybuild_gpt_oss_tool_response_text()for cleaner codeChecklist Before Submitting
pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=alwaysbuild_gpt_oss_tool_response_text()maintains identical behavior to the original inline code; (2)agent_loop_manager_classis optional and defaults to the existingAgentLoopManager; (3) existing agent loop tests already cover the default code path.ci-requestchannel in theverlSlack workspace. (If not accessible, please try the Feishu group (飞书群).)