[rollout, trainer] feat: extend agent loop for custom implementations by JoyboyBrian · Pull Request #4548 · verl-project/verl

JoyboyBrian · 2025-12-16T18:28:42Z

What does this PR do?

This PR enhances the extensibility of the agent loop module by:

1. Supporting custom AgentLoopManager: Allow users to specify a custom AgentLoopManager class via configuration (agent_loop_manager_class), enabling external projects to implement their own rollout management logic.
2. Extracting gpt-oss tool response builder: Extract the gpt-oss specific tool response formatting logic into a reusable utility function build_gpt_oss_tool_response_text(), combining existing format_gpt_oss_tool_response_manually() and add_generation_prompt_for_gpt_oss() calls.
3. Adding extension point for custom configurations: Introduce a custom: Optional[dict] field in RolloutConfig to support arbitrary user-defined configurations.

Checklist Before Starting

Search for similar PRs. Paste at least one query link here: https://github.com/volcengine/verl/pulls?q=is%3Apr+AgentLoopManager
Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)

Test

This PR is a refactoring and extension of existing functionality:

- The build_gpt_oss_tool_response_text() function maintains the same behavior as the original inline code in ToolAgentLoop.
- The custom AgentLoopManager loading uses importlib, which is a standard Python mechanism.
- Existing workflows are unchanged when agent_loop_manager_class is not configured.

API and Usage Example

1. Custom AgentLoopManager

Users can now specify a custom AgentLoopManager class in their config:

actor_rollout_ref:
  rollout:
    mode: async
    agent:
      agent_loop_manager_class: "myproject.custom_manager.MyAgentLoopManager"

Security Note: The agent_loop_manager_class configuration uses dynamic import via importlib, which will execute the specified module's code. Only use class paths from trusted sources. Ensure your configuration files are not writable by untrusted users or processes.
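One way to act on the security note is to check the configured class path before it is ever imported. The following is a minimal sketch, not part of verl: the helper name `is_trusted_class_path` and the prefix list are hypothetical, and a real deployment would choose its own trusted packages.

```python
# Hypothetical allowlist check; not part of verl.
TRUSTED_PREFIXES = ("verl.", "myproject.")  # assumed package names


def is_trusted_class_path(fqn: str, prefixes: tuple[str, ...] = TRUSTED_PREFIXES) -> bool:
    """Return True if fqn looks like 'pkg.module.ClassName' under a trusted prefix."""
    parts = fqn.split(".")
    # Require at least 'module.ClassName' and every segment to be a valid identifier.
    if len(parts) < 2 or not all(p.isidentifier() for p in parts):
        return False
    # str.startswith accepts a tuple and matches any of its entries.
    return fqn.startswith(prefixes)
```

A check like this only narrows which modules can be named in the config; it does not make an untrusted module safe to import.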

2. Reusable gpt-oss Tool Response Builder

External projects can now import and use the gpt-oss utility:

from verl.experimental.agent_loop.utils import build_gpt_oss_tool_response_text

# Build gpt-oss tool response text (combines formatting + generation prompt)
tool_response_text = build_gpt_oss_tool_response_text(messages, tool_call_names)

3. Custom Configuration Extension Point

actor_rollout_ref:
  rollout:
    custom:
      my_custom_key: "my_custom_value"
      another_setting: 123

Design & Code Changes

High-Level Design

┌─────────────────────────────────────────────────────────────────┐
│                      ray_trainer.py                             │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  if agent_loop_manager_class configured:                  │  │
│  │      dynamically import via importlib                     │  │
│  │  else:                                                    │  │
│  │      use default AgentLoopManager                         │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│         verl/experimental/agent_loop/utils.py (new export)      │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  build_gpt_oss_tool_response_text()                       │  │
│  │    - combines format_gpt_oss_tool_response_manually()     │  │
│  │    - and add_generation_prompt_for_gpt_oss()              │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘

Specific Changes

| File | Changes |
|------|---------|
| `verl/trainer/ppo/ray_trainer.py` | Added dynamic loading of custom `AgentLoopManager` via `importlib` |
| `verl/workers/config/rollout.py` | Added `agent_loop_manager_class` to `AgentLoopConfig`; added `custom` field to `RolloutConfig` |
| `verl/experimental/agent_loop/utils.py` | Added `build_gpt_oss_tool_response_text()` combining existing gpt-oss formatting functions |
| `verl/experimental/agent_loop/tool_agent_loop.py` | Refactored to use `build_gpt_oss_tool_response_text()` for cleaner code |

Checklist Before Submitting

Read the Contribute Guide.
Apply pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
Add / Update the documentation. N/A - This PR targets advanced users extending verl. The code is self-documenting with type hints and docstrings, and the PR description includes complete usage examples.
Add unit or end-to-end test(s) to the CI workflow to cover all the code. N/A - This is a pure refactoring with no behavioral changes: (1) build_gpt_oss_tool_response_text() maintains identical behavior to the original inline code; (2) agent_loop_manager_class is optional and defaults to the existing AgentLoopManager; (3) existing agent loop tests already cover the default code path.
Once your PR is ready for CI, send a message in the ci-request channel in the verl Slack workspace. (If not accessible, please try the Feishu group (飞书群).)

- Introduced a new optional field `tool_calls` in the `TokenOutput` model to accommodate OpenAI-style tool calls extracted from generation.
- This enhancement improves the model's capability to handle additional response data.

Pure Extract Method refactoring with no behavioral changes.

Extracted functions from ToolAgentLoop to utils.py:
- compute_system_prompt: computes system prompt tokens for prefix stripping
- apply_chat_template_with_processor: applies chat template via processor
- apply_chat_template_with_tokenizer: applies chat template via tokenizer
- tokenize_with_processor: tokenizes pre-rendered prompt via processor
- build_gpt_oss_tool_response_text: builds gpt-oss tool response text

All call sites maintain identical behavior:
- Parameter mapping is 1:1 with original inline code
- Executor context (sync vs async) unchanged at call sites
- Empty dict {} passed where original code omitted kwargs (equivalent)
- Optional `tools` parameter handled via conditional branching

This enables reuse in external projects (e.g., remote rollout servers) without duplicating tokenization logic.

- Moved tokenization utility functions from ToolAgentLoop to chat_template.py for better reusability.
- Updated import statements to reflect the new structure, ensuring cleaner code organization.
- Added detailed docstrings for new functions to clarify their usage and parameters.

… utilities

- Modified the apply_chat_template_with_processor and apply_chat_template_with_tokenizer functions to accept apply_chat_template_kwargs as an optional parameter, defaulting to None.
- Updated internal logic to initialize apply_chat_template_kwargs to an empty dictionary if not provided, ensuring consistent behavior across function calls.
- Removed unnecessary empty dictionary arguments from ToolAgentLoop class methods, streamlining the code.
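The "default to None, then initialize to an empty dict" pattern described in this commit is the usual Python idiom for optional dict parameters. A minimal sketch follows; `render_prompt` and its `separator` option are hypothetical stand-ins for illustration, not the verl functions or a real chat-template API.

```python
from typing import Any, Optional


def render_prompt(messages: list[dict], template_kwargs: Optional[dict[str, Any]] = None) -> str:
    """Hypothetical stand-in for a chat-template call; joins messages into one string."""
    # Normalize None to a fresh dict on every call. A literal `{}` default would
    # be created once at function definition time and shared across calls.
    if template_kwargs is None:
        template_kwargs = {}
    sep = template_kwargs.get("separator", "\n")
    return sep.join(f"{m['role']}: {m['content']}" for m in messages)
```

With this shape, call sites that have no extra options can simply omit the argument instead of passing `{}`.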

gemini-code-assist

Code Review

This pull request enhances the extensibility of the agent loop by allowing a custom AgentLoopManager, extracting tokenization utilities for reusability, and adding a generic custom configuration field. The changes are well-structured and the refactoring of tokenization logic into utility functions is a good step towards better modularity. My review focuses on the new dynamic import mechanism for the custom AgentLoopManager. I have identified a robustness issue that could lead to a crash if the configuration is malformed, and a potential security vulnerability related to code injection. I've provided suggestions to make the implementation more robust and secure.

verl/trainer/ppo/ray_trainer.py

… import

- Added validation to ensure the agent_loop_manager_class is a fully qualified class name.
- Implemented detailed error handling for ImportError and AttributeError during dynamic import of the AgentLoopManager, providing clearer feedback for configuration issues.
- Updated documentation in AgentLoopConfig to emphasize the need for trusted class paths.

wuxibin89 · 2025-12-17T05:39:06Z

verl/trainer/ppo/ray_trainer.py

        if self.config.actor_rollout_ref.rollout.mode == "async":
-            from verl.experimental.agent_loop import AgentLoopManager
+            # Support custom AgentLoopManager via config
+            manager_class = self.config.actor_rollout_ref.rollout.get("agent", {}).get("agent_loop_manager_class")


Please move this to a util function

Thanks for the suggestion!
I've refactored this by adding a load_class_from_fqn() utility function to verl/utils/import_utils.py. This function parses the FQN, performs the dynamic import, and provides clear error messages. Commit: 3b78336
The code in ray_trainer.py is now simplified from ~20 lines to just 4 lines:

manager_class_fqn = self.config.actor_rollout_ref.rollout.get("agent", {}).get("agent_loop_manager_class")
if manager_class_fqn:
    AgentLoopManager = load_class_from_fqn(manager_class_fqn, "AgentLoopManager")
else:
    from verl.experimental.agent_loop import AgentLoopManager

This also removes the now-unused import importlib from the file.
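For readers who have not seen this kind of helper, here is a minimal sketch of what loading a class from a fully qualified name can look like. It is not the code from verl/utils/import_utils.py: the name `load_class_from_fqn_sketch` is hypothetical, and treating the second argument as a human-readable description for error messages is an assumption based on the call shown above.

```python
import importlib


def load_class_from_fqn_sketch(fqn: str, description: str = "class") -> type:
    """Import 'package.module.ClassName' and return the class object."""
    module_path, _, class_name = (fqn or "").rpartition(".")
    # Reject values without a module part or a class part, e.g. "Foo" or "pkg.".
    if not module_path or not class_name:
        raise ValueError(
            f"{description} must be a fully qualified name like 'pkg.module.Class', got {fqn!r}"
        )
    try:
        # Importing executes the module's top-level code, hence the trust requirement.
        module = importlib.import_module(module_path)
    except ImportError as e:
        raise ImportError(f"Could not import module {module_path!r} for {description}: {e}") from e
    try:
        return getattr(module, class_name)
    except AttributeError as e:
        raise AttributeError(
            f"Module {module_path!r} has no attribute {class_name!r} for {description}"
        ) from e
```

Splitting on the last dot keeps nested packages intact, and re-raising with the description in the message points the user at the config key that was wrong.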

wuxibin89 · 2025-12-17T05:40:26Z

verl/workers/config/rollout.py

    prometheus: PrometheusConfig = field(default_factory=PrometheusConfig)

+    # Extension point for custom configurations
+    custom: Optional[dict] = None


This field seems not used?

This field is designed as an extension point for downstream projects to pass custom configurations without modifying core config classes.

For example, a project implementing remote rollout could use:

actor_rollout_ref:
  rollout:
    custom:
      remote_rollout:
        server_url: "https://..."
        callback_url: "http://..."
        timeout_seconds: 300

Then access it via config.actor_rollout_ref.rollout.get("custom", {}) in their custom AgentLoopManager.

Happy to add a docstring clarifying this is an extension point!
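A downstream project might wrap that lookup in a small typed reader. The sketch below uses a plain dict in place of the real config object; `RemoteRolloutSettings` and `read_remote_rollout_settings` are hypothetical names, and the field names are taken from the example YAML above. One detail worth noting: because `custom` defaults to None, a plain-dict `.get("custom", {})` returns None when the key is present but unset, so the sketch normalizes with `or {}`.

```python
from dataclasses import dataclass


@dataclass
class RemoteRolloutSettings:
    """Hypothetical typed view over rollout.custom['remote_rollout']."""
    server_url: str
    callback_url: str
    timeout_seconds: int = 300


def read_remote_rollout_settings(rollout_cfg: dict) -> RemoteRolloutSettings:
    # `custom` is optional, so treat a missing key and an explicit None the same way.
    custom = rollout_cfg.get("custom") or {}
    section = custom.get("remote_rollout")
    if section is None:
        raise KeyError("rollout.custom.remote_rollout is not configured")
    # Unknown keys raise TypeError here, which surfaces config typos early.
    return RemoteRolloutSettings(**section)
```

Keeping the parsing in one function means the custom manager fails fast with a clear message instead of hitting a None deep inside rollout code.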

…unction

- Replaced manual import logic for AgentLoopManager with a new utility function, load_class_from_fqn, to enhance code readability and maintainability.
- Added comprehensive error handling in load_class_from_fqn for improved feedback on import issues.
- Updated RayPPOTrainer to utilize the new utility, simplifying the configuration process for custom agent loop managers.

JoyboyBrian · 2025-12-17T06:32:59Z

@wuxibin89 thanks for the review!

“Move this to a util function”: addressed — I extracted the dynamic import logic into a reusable utility (load_class_from_fqn) and simplified the ray_trainer.py path accordingly.
Commit: 3b78336
“This field seems not used?”: clarified — RolloutConfig.custom is intentionally an extension point for downstream projects, and I added inline documentation to make that intent explicit.

Would you mind taking another look when you have a chance? 🙏

wuxibin89 · 2025-12-17T08:14:36Z

verl/utils/chat_template.py

+    """
+    if apply_chat_template_kwargs is None:
+        apply_chat_template_kwargs = {}
+    if tools is None:


This is a bit redundant. Can't we safely pass tools=None to processor.apply_chat_template?

wuxibin89 · 2025-12-17T08:21:16Z

I don't quite understand why we need these util functions; it's more straightforward to call the processor and tokenizer functions directly.

apply_chat_template_with_processor
apply_chat_template_with_tokenizer
tokenize_with_processor

…tLoop

- Removed redundant tokenization utility functions from chat_template.py, consolidating logic within the ToolAgentLoop class.
- Updated the application of chat templates to utilize the processor and tokenizer directly, enhancing code clarity and maintainability.
- Simplified the handling of apply_chat_template_kwargs, ensuring consistent behavior across different processing paths.
- Improved the overall structure of the code by eliminating unnecessary imports and functions, promoting better reusability.

JoyboyBrian · 2025-12-17T08:49:22Z

> I don't quite understand why we need these util functions; it's more straightforward to call the processor and tokenizer functions directly.
>
> apply_chat_template_with_processor
> apply_chat_template_with_tokenizer
> tokenize_with_processor

You're right. I initially extracted these utilities for consistency with initialize_system_prompt which was already in chat_template.py, thinking it might help centralize chat template related operations. But looking at it again, these wrappers are too thin to justify the indirection. Reverted.

JoyboyBrian added 13 commits December 16, 2025 00:09

feat: support configurable AgentLoopManager via config

8775425

Allow dynamic loading of custom AgentLoopManager class through configuration. This enables users to specify agent_loop_manager_class in their config to use custom implementations like OsmosisAgentLoopManager.

feat(rollout): add support for custom AgentLoopManager class

4b62ab5

- Introduced an optional field `agent_loop_manager_class` in the `AgentLoopConfig` to allow users to specify a custom implementation for the AgentLoopManager, facilitating external integrations such as remote rollouts.

feat(rollout): add custom configuration support in RolloutConfig

3b00d00

- Introduced a new optional field `custom` in the `RolloutConfig` class to allow users to specify arbitrary configurations for external integrations, enhancing flexibility for custom rollout implementations.

fix(trainer): replace exec with importlib for dynamic loading of Agen…

85f1d62

…tLoopManager

- Updated the RayPPOTrainer to use importlib for importing the custom AgentLoopManager class specified in the configuration, enhancing security and maintainability by avoiding the use of exec.

simplify custom configuration documentation

e77999b

remove comments

58315a2

remove redundant import statement for importlib

b879412

refactor: move tokenization utils to chat_template.py and remove dupl…

687fb36

…icate compute_system_prompt

delete unused fields

9039efe

JoyboyBrian requested review from PeterSH6, eric-haibin-lin, tongyx361 and vermouth1992 as code owners December 16, 2025 18:28

JoyboyBrian changed the title ~~Extend agent loop~~ [rollout, trainer] feat: extend agent loop for custom implementations Dec 16, 2025

gemini-code-assist bot reviewed Dec 16, 2025

wuxibin89 reviewed Dec 17, 2025

wuxibin89 approved these changes Dec 17, 2025

wuxibin89 merged commit 022c0ae into verl-project:main Dec 17, 2025
52 of 55 checks passed

wuxibin89 merged 16 commits into verl-project:main from Osmosis-AI:extend-agent-loop
