Skip to content

[rollout, trainer] feat: extend agent loop for custom implementations#4548

Merged
wuxibin89 merged 16 commits intoverl-project:mainfrom
Osmosis-AI:extend-agent-loop
Dec 17, 2025
Merged

[rollout, trainer] feat: extend agent loop for custom implementations#4548
wuxibin89 merged 16 commits intoverl-project:mainfrom
Osmosis-AI:extend-agent-loop

Conversation

@JoyboyBrian
Copy link
Contributor

@JoyboyBrian JoyboyBrian commented Dec 16, 2025

What does this PR do?

This PR enhances the extensibility of the agent loop module by:

  1. Supporting custom AgentLoopManager: Allow users to specify a custom AgentLoopManager class via configuration (agent_loop_manager_class), enabling external projects to implement their own rollout management logic.

  2. Extracting gpt-oss tool response builder: Extract the gpt-oss specific tool response formatting logic into a reusable utility function build_gpt_oss_tool_response_text(), combining existing format_gpt_oss_tool_response_manually() and add_generation_prompt_for_gpt_oss() calls.

  3. Adding extension point for custom configurations: Introduce a custom: Optional[dict] field in RolloutConfig to support arbitrary user-defined configurations.

Checklist Before Starting

Test

This PR is a refactoring and extension of existing functionality:

  • The build_gpt_oss_tool_response_text() function maintains the same behavior as the original inline code in ToolAgentLoop
  • The custom AgentLoopManager loading uses importlib which is a standard Python mechanism
  • No behavioral changes to existing workflows when agent_loop_manager_class is not configured

API and Usage Example

1. Custom AgentLoopManager

Users can now specify a custom AgentLoopManager class in their config:

actor_rollout_ref:
  rollout:
    mode: async
    agent:
      agent_loop_manager_class: "myproject.custom_manager.MyAgentLoopManager"

Security Note: The agent_loop_manager_class configuration uses dynamic import via importlib, which will execute the specified module's code. Only use class paths from trusted sources. Ensure your configuration files are not writable by untrusted users or processes.

2. Reusable gpt-oss Tool Response Builder

External projects can now import and use the gpt-oss utility:

from verl.experimental.agent_loop.utils import build_gpt_oss_tool_response_text

# Build gpt-oss tool response text (combines formatting + generation prompt)
tool_response_text = build_gpt_oss_tool_response_text(messages, tool_call_names)

3. Custom Configuration Extension Point

actor_rollout_ref:
  rollout:
    custom:
      my_custom_key: "my_custom_value"
      another_setting: 123

Design & Code Changes

High-Level Design

┌─────────────────────────────────────────────────────────────────┐
│                      ray_trainer.py                             │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  if agent_loop_manager_class configured:                  │  │
│  │      dynamically import via importlib                     │  │
│  │  else:                                                    │  │
│  │      use default AgentLoopManager                         │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│         verl/experimental/agent_loop/utils.py (new export)      │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  build_gpt_oss_tool_response_text()                       │  │
│  │    - combines format_gpt_oss_tool_response_manually()     │  │
│  │    - and add_generation_prompt_for_gpt_oss()              │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘

Specific Changes

File Changes
verl/trainer/ppo/ray_trainer.py Added dynamic loading of custom AgentLoopManager via importlib
verl/workers/config/rollout.py Added agent_loop_manager_class to AgentLoopConfig; Added custom field to RolloutConfig
verl/experimental/agent_loop/utils.py Added build_gpt_oss_tool_response_text() combining existing gpt-oss formatting functions
verl/experimental/agent_loop/tool_agent_loop.py Refactored to use build_gpt_oss_tool_response_text() for cleaner code

Checklist Before Submitting

  • Read the Contribute Guide.
  • Apply pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
  • Add / Update the documentation. N/A - This PR targets advanced users extending verl. The code is self-documenting with type hints and docstrings, and the PR description includes complete usage examples.
  • Add unit or end-to-end test(s) to the CI workflow to cover all the code. N/A - This is a pure refactoring with no behavioral changes: (1) build_gpt_oss_tool_response_text() maintains identical behavior to the original inline code; (2) agent_loop_manager_class is optional and defaults to the existing AgentLoopManager; (3) existing agent loop tests already cover the default code path.
  • Once your PR is ready for CI, send a message in the ci-request channel in the verl Slack workspace. (If not accessible, please try the Feishu group (飞书群).)

  Allow dynamic loading of custom AgentLoopManager class through configuration.
  This enables users to specify agent_loop_manager_class in their config to use
  custom implementations like OsmosisAgentLoopManager.
- Introduced a new optional field `tool_calls` in the `TokenOutput` model to accommodate OpenAI-style tool calls extracted from generation.
- This enhancement improves the model's capability to handle additional response data.
- Introduced an optional field `agent_loop_manager_class` in the `AgentLoopConfig` to allow users to specify a custom implementation for the AgentLoopManager, facilitating external integrations such as remote rollouts.
- Introduced a new optional field `custom` in the `RolloutConfig` class to allow users to specify arbitrary configurations for external integrations, enhancing flexibility for custom rollout implementations.
…tLoopManager

- Updated the RayPPOTrainer to use importlib for importing the custom AgentLoopManager class specified in the configuration, enhancing security and maintainability by avoiding the use of exec.
Pure Extract Method refactoring with no behavioral changes.

Extracted functions from ToolAgentLoop to utils.py:
- compute_system_prompt: computes system prompt tokens for prefix stripping
- apply_chat_template_with_processor: applies chat template via processor
- apply_chat_template_with_tokenizer: applies chat template via tokenizer
- tokenize_with_processor: tokenizes pre-rendered prompt via processor
- build_gpt_oss_tool_response_text: builds gpt-oss tool response text

All call sites maintain identical behavior:
- Parameter mapping is 1:1 with original inline code
- Executor context (sync vs async) unchanged at call sites
- Empty dict {} passed where original code omitted kwargs (equivalent)
- Optional `tools` parameter handled via conditional branching

This enables reuse in external projects (e.g., remote rollout servers) without duplicating tokenization logic.
- Moved tokenization utility functions from ToolAgentLoop to chat_template.py for better reusability.
- Updated import statements to reflect the new structure, ensuring cleaner code organization.
- Added detailed docstrings for new functions to clarify their usage and parameters.
… utilities

- Modified the apply_chat_template_with_processor and apply_chat_template_with_tokenizer functions to accept apply_chat_template_kwargs as an optional parameter, defaulting to None.
- Updated internal logic to initialize apply_chat_template_kwargs to an empty dictionary if not provided, ensuring consistent behavior across function calls.
- Removed unnecessary empty dictionary arguments from ToolAgentLoop class methods, streamlining the code.
@JoyboyBrian JoyboyBrian changed the title Extend agent loop [rollout, trainer] feat: extend agent loop for custom implementations Dec 16, 2025
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances the extensibility of the agent loop by allowing a custom AgentLoopManager, extracting tokenization utilities for reusability, and adding a generic custom configuration field. The changes are well-structured and the refactoring of tokenization logic into utility functions is a good step towards better modularity. My review focuses on the new dynamic import mechanism for the custom AgentLoopManager. I have identified a robustness issue that could lead to a crash if the configuration is malformed, and a potential security vulnerability related to code injection. I've provided suggestions to make the implementation more robust and secure.

… import

- Added validation to ensure the agent_loop_manager_class is a fully qualified class name.
- Implemented detailed error handling for ImportError and AttributeError during dynamic import of the AgentLoopManager, providing clearer feedback for configuration issues.
- Updated documentation in AgentLoopConfig to emphasize the need for trusted class paths.
if self.config.actor_rollout_ref.rollout.mode == "async":
from verl.experimental.agent_loop import AgentLoopManager
# Support custom AgentLoopManager via config
manager_class = self.config.actor_rollout_ref.rollout.get("agent", {}).get("agent_loop_manager_class")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move this to a util function

Copy link
Contributor Author

@JoyboyBrian JoyboyBrian Dec 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion!
I've refactored this by adding a load_class_from_fqn() utility function to verl/utils/import_utils.py. This function handles FQN parsing, dynamic import, and provides clear error messages. Commit: 3b78336
The code in ray_trainer.py is now simplified from ~20 lines to just 4 lines:

  manager_class_fqn = self.config.actor_rollout_ref.rollout.get("agent", {}).get("agent_loop_manager_class")
  if manager_class_fqn:
      AgentLoopManager = load_class_from_fqn(manager_class_fqn, "AgentLoopManager")
  else:
      from verl.experimental.agent_loop import AgentLoopManager

This also removes the now-unused import importlib from the file.

prometheus: PrometheusConfig = field(default_factory=PrometheusConfig)

# Extension point for custom configurations
custom: Optional[dict] = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This field seems not used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This field is designed as an extension point for downstream projects to pass custom configurations without modifying core config classes.

For example, a project implementing remote rollout could use:

actor_rollout_ref:
  rollout:
    custom:
      remote_rollout:
        server_url: "https://..."
        callback_url: "http://..."
        timeout_seconds: 300

Then access it via config.actor_rollout_ref.rollout.get("custom", {}) in their custom AgentLoopManager.

Happy to add a docstring clarifying this is an extension point!

…unction

- Replaced manual import logic for AgentLoopManager with a new utility function, load_class_from_fqn, to enhance code readability and maintainability.
- Added comprehensive error handling in load_class_from_fqn for improved feedback on import issues.
- Updated RayPPOTrainer to utilize the new utility, simplifying the configuration process for custom agent loop managers.
@JoyboyBrian
Copy link
Contributor Author

JoyboyBrian commented Dec 17, 2025

@wuxibin89 thanks for the review!

  1. “Move this to a util function”: addressed — I extracted the dynamic import logic into a reusable utility (load_class_from_fqn) and simplified the ray_trainer.py path accordingly.
    Commit: 3b78336

  2. “This field seems not used?”: clarified — RolloutConfig.custom is intentionally an extension point for downstream projects, and I added inline documentation to make that intent explicit.

Would you mind taking another look when you have a chance? 🙏

"""
if apply_chat_template_kwargs is None:
apply_chat_template_kwargs = {}
if tools is None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit redundant here, we can safely pass tools=None to processor.apply_chat_template?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted!

@wuxibin89
Copy link
Collaborator

I don't quit understand why we need these util function, it's more straightforward to call processor and tokenizer function directly.

  • apply_chat_template_with_processor
  • apply_chat_template_with_tokenizer
  • tokenize_with_processor

…tLoop

- Removed redundant tokenization utility functions from chat_template.py, consolidating logic within the ToolAgentLoop class.
- Updated the application of chat templates to utilize the processor and tokenizer directly, enhancing code clarity and maintainability.
- Simplified the handling of apply_chat_template_kwargs, ensuring consistent behavior across different processing paths.
- Improved the overall structure of the code by eliminating unnecessary imports and functions, promoting better reusability.
@JoyboyBrian
Copy link
Contributor Author

I don't quit understand why we need these util function, it's more straightforward to call processor and tokenizer function directly.

  • apply_chat_template_with_processor
  • apply_chat_template_with_tokenizer
  • tokenize_with_processor

You're right. I initially extracted these utilities for consistency with initialize_system_prompt which was already in chat_template.py, thinking it might help centralize chat template related operations. But looking at it again, these wrappers are too thin to justify the indirection. Reverted.

@wuxibin89 wuxibin89 merged commit 022c0ae into verl-project:main Dec 17, 2025
52 of 55 checks passed
vyomakesh0728 added a commit to vyomakesh0728/verl that referenced this pull request Jan 22, 2026
…verl-project#4548)

### What does this PR do?

This PR enhances the extensibility of the agent loop module by:

1. **Supporting custom AgentLoopManager**: Allow users to specify a
custom `AgentLoopManager` class via configuration
(`agent_loop_manager_class`), enabling external projects to implement
their own rollout management logic.

2. **Extracting gpt-oss tool response builder**: Extract the gpt-oss
specific tool response formatting logic into a reusable utility function
`build_gpt_oss_tool_response_text()`, combining existing
`format_gpt_oss_tool_response_manually()` and
`add_generation_prompt_for_gpt_oss()` calls.

3. **Adding extension point for custom configurations**: Introduce a
`custom: Optional[dict]` field in `RolloutConfig` to support arbitrary
user-defined configurations.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
https://github.com/volcengine/verl/pulls?q=is%3Apr+AgentLoopManager
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)

### Test

This PR is a refactoring and extension of existing functionality:

- The `build_gpt_oss_tool_response_text()` function maintains the same
behavior as the original inline code in `ToolAgentLoop`
- The custom `AgentLoopManager` loading uses `importlib` which is a
standard Python mechanism
- No behavioral changes to existing workflows when
`agent_loop_manager_class` is not configured

### API and Usage Example

#### 1. Custom AgentLoopManager

Users can now specify a custom `AgentLoopManager` class in their config:

```yaml
actor_rollout_ref:
  rollout:
    mode: async
    agent:
      agent_loop_manager_class: "myproject.custom_manager.MyAgentLoopManager"
```

> **Security Note**: The `agent_loop_manager_class` configuration uses
dynamic import via `importlib`, which will execute the specified
module's code. Only use class paths from trusted sources. Ensure your
configuration files are not writable by untrusted users or processes.

#### 2. Reusable gpt-oss Tool Response Builder

External projects can now import and use the gpt-oss utility:

```python
from verl.experimental.agent_loop.utils import build_gpt_oss_tool_response_text

# Build gpt-oss tool response text (combines formatting + generation prompt)
tool_response_text = build_gpt_oss_tool_response_text(messages, tool_call_names)
```

#### 3. Custom Configuration Extension Point

```yaml
actor_rollout_ref:
  rollout:
    custom:
      my_custom_key: "my_custom_value"
      another_setting: 123
```

### Design & Code Changes

#### High-Level Design

```
┌─────────────────────────────────────────────────────────────────┐
│                      ray_trainer.py                             │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  if agent_loop_manager_class configured:                  │  │
│  │      dynamically import via importlib                     │  │
│  │  else:                                                    │  │
│  │      use default AgentLoopManager                         │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│         verl/experimental/agent_loop/utils.py (new export)      │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  build_gpt_oss_tool_response_text()                       │  │
│  │    - combines format_gpt_oss_tool_response_manually()     │  │
│  │    - and add_generation_prompt_for_gpt_oss()              │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
```

#### Specific Changes

| File | Changes |
|------|---------|
| `verl/trainer/ppo/ray_trainer.py` | Added dynamic loading of custom
`AgentLoopManager` via `importlib` |
| `verl/workers/config/rollout.py` | Added `agent_loop_manager_class` to
`AgentLoopConfig`; Added `custom` field to `RolloutConfig` |
| `verl/experimental/agent_loop/utils.py` | Added
`build_gpt_oss_tool_response_text()` combining existing gpt-oss
formatting functions |
| `verl/experimental/agent_loop/tool_agent_loop.py` | Refactored to use
`build_gpt_oss_tool_response_text()` for cleaner code |

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
**N/A** - This PR targets advanced users extending verl. The code is
self-documenting with type hints and docstrings, and the PR description
includes complete usage examples.
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. **N/A** - This is a pure refactoring with no
behavioral changes: (1) `build_gpt_oss_tool_response_text()` maintains
identical behavior to the original inline code; (2)
`agent_loop_manager_class` is optional and defaults to the existing
`AgentLoopManager`; (3) existing agent loop tests already cover the
default code path.
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
sophiayyya pushed a commit to sophiayyya/verl that referenced this pull request Jan 25, 2026
…verl-project#4548)

### What does this PR do?

This PR enhances the extensibility of the agent loop module by:

1. **Supporting custom AgentLoopManager**: Allow users to specify a
custom `AgentLoopManager` class via configuration
(`agent_loop_manager_class`), enabling external projects to implement
their own rollout management logic.

2. **Extracting gpt-oss tool response builder**: Extract the gpt-oss
specific tool response formatting logic into a reusable utility function
`build_gpt_oss_tool_response_text()`, combining existing
`format_gpt_oss_tool_response_manually()` and
`add_generation_prompt_for_gpt_oss()` calls.

3. **Adding extension point for custom configurations**: Introduce a
`custom: Optional[dict]` field in `RolloutConfig` to support arbitrary
user-defined configurations.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
https://github.com/volcengine/verl/pulls?q=is%3Apr+AgentLoopManager
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)

### Test

This PR is a refactoring and extension of existing functionality:

- The `build_gpt_oss_tool_response_text()` function maintains the same
behavior as the original inline code in `ToolAgentLoop`
- The custom `AgentLoopManager` loading uses `importlib` which is a
standard Python mechanism
- No behavioral changes to existing workflows when
`agent_loop_manager_class` is not configured

### API and Usage Example

#### 1. Custom AgentLoopManager

Users can now specify a custom `AgentLoopManager` class in their config:

```yaml
actor_rollout_ref:
  rollout:
    mode: async
    agent:
      agent_loop_manager_class: "myproject.custom_manager.MyAgentLoopManager"
```

> **Security Note**: The `agent_loop_manager_class` configuration uses
dynamic import via `importlib`, which will execute the specified
module's code. Only use class paths from trusted sources. Ensure your
configuration files are not writable by untrusted users or processes.

#### 2. Reusable gpt-oss Tool Response Builder

External projects can now import and use the gpt-oss utility:

```python
from verl.experimental.agent_loop.utils import build_gpt_oss_tool_response_text

# Build gpt-oss tool response text (combines formatting + generation prompt)
tool_response_text = build_gpt_oss_tool_response_text(messages, tool_call_names)
```

#### 3. Custom Configuration Extension Point

```yaml
actor_rollout_ref:
  rollout:
    custom:
      my_custom_key: "my_custom_value"
      another_setting: 123
```

### Design & Code Changes

#### High-Level Design

```
┌─────────────────────────────────────────────────────────────────┐
│                      ray_trainer.py                             │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  if agent_loop_manager_class configured:                  │  │
│  │      dynamically import via importlib                     │  │
│  │  else:                                                    │  │
│  │      use default AgentLoopManager                         │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│         verl/experimental/agent_loop/utils.py (new export)      │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  build_gpt_oss_tool_response_text()                       │  │
│  │    - combines format_gpt_oss_tool_response_manually()     │  │
│  │    - and add_generation_prompt_for_gpt_oss()              │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
```

#### Specific Changes

| File | Changes |
|------|---------|
| `verl/trainer/ppo/ray_trainer.py` | Added dynamic loading of custom
`AgentLoopManager` via `importlib` |
| `verl/workers/config/rollout.py` | Added `agent_loop_manager_class` to
`AgentLoopConfig`; Added `custom` field to `RolloutConfig` |
| `verl/experimental/agent_loop/utils.py` | Added
`build_gpt_oss_tool_response_text()` combining existing gpt-oss
formatting functions |
| `verl/experimental/agent_loop/tool_agent_loop.py` | Refactored to use
`build_gpt_oss_tool_response_text()` for cleaner code |

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
**N/A** - This PR targets advanced users extending verl. The code is
self-documenting with type hints and docstrings, and the PR description
includes complete usage examples.
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. **N/A** - This is a pure refactoring with no
behavioral changes: (1) `build_gpt_oss_tool_response_text()` maintains
identical behavior to the original inline code; (2)
`agent_loop_manager_class` is optional and defaults to the existing
`AgentLoopManager`; (3) existing agent loop tests already cover the
default code path.
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants