fix: support verl 0.7.1 EngineWorker in agent_workflow_trainer #474
Conversation
verl 0.7.1 defaults to `use_legacy_worker_impl: disable`, which uses `EngineWorker` instead of `TrainingWorker`. This changes the worker API:

- `compute_log_prob` / `compute_ref_log_prob` / `update_actor` now return `TensorDict` instead of `DataProto`
- Workers operate in no-padding format internally, so outputs must be converted back via `no_padding_2_padding`
- `ppo_loss` requires `global_batch_size`, `temperature`, etc. in the batch TensorDict; `compute_log_prob` needs `compute_loss=False` and `calculate_entropy=True`

Without this fix, training crashes with:

- `KeyError: 'temperature'` / `KeyError: 'global_batch_size'`
- `AttributeError: 'TensorDict' object has no attribute 'batch'`
- `RuntimeError: tensor size mismatch` (no-padding vs padding)

Changes:

- `compute_log_prob`: convert `DataProto` → `TensorDict` → no-padding before the call, set `compute_loss=False` + `calculate_entropy=True`, convert the output back to a padded `DataProto` with `old_log_probs`/`entropys` keys
- `compute_ref_log_prob`: same `TensorDict` handling + `no_padding_2_padding`
- `update_actor`: inject `mini_batch_size`, `epochs`, `seed`, `global_batch_size`, `temperature`, `calculate_entropy` into the batch before the call; handle the `TensorDict` return for metrics extraction
- Validation/distillation `compute_log_prob`: same pattern
- All changes are backward-compatible (`isinstance` checks for `TensorDict` vs `DataProto` returns)
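The backward-compatible dispatch described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: `TensorDictLike` and `DataProtoLike` are hypothetical stand-ins for `tensordict.TensorDict` and verl's `DataProto`, and `extract_log_probs` is an invented helper name.

```python
# Minimal sketch of the isinstance-based backward compatibility described
# in this PR. The classes below are hypothetical stand-ins: the real code
# dispatches on tensordict.TensorDict vs verl's DataProto.

class TensorDictLike(dict):
    """Stand-in for the flat TensorDict returned by the new EngineWorker."""

class DataProtoLike:
    """Stand-in for the legacy DataProto, which keeps tensors under `.batch`."""
    def __init__(self, batch):
        self.batch = batch

def extract_log_probs(output):
    # New EngineWorker path: output is a flat TensorDict-style mapping.
    if isinstance(output, TensorDictLike):
        return output["log_probs"]
    # Legacy TrainingWorker path: tensors live under output.batch.
    return output.batch["old_log_probs"]

# The same call site now handles both worker implementations.
new_style = TensorDictLike(log_probs=[0.1, 0.2])
old_style = DataProtoLike(batch={"old_log_probs": [0.1, 0.2]})
assert extract_log_probs(new_style) == extract_log_probs(old_style)
```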
cc @jeffreysijuntan for review
|
Hi @yifannnwu thx for the (much needed) help on Verl support! Will take a look soon!
|
Thanks for the fix! Actually, we are thinking about deprecating the legacy path altogether as well, so this is definitely a first step we need (while ensuring backward compatibility). I will also port these changes into
Thanks for the quick merge @listar2000! Glad this aligns with your direction on deprecating the legacy path. We appreciate your support and look forward to large-scale, stable RL training.
|
@yifannnwu I'm already working on some [...]. The idea is to [...] (since [...]). Either way, feel free to also send PRs/comments/issues on other Verl-related things; this is one of our recent focus areas for improvement.
Summary
verl 0.7.1 defaults to `use_legacy_worker_impl: disable`, which activates the new `EngineWorker` class instead of the legacy `TrainingWorker`. This changes the worker API in ways that break `agent_workflow_trainer.py`:

- `compute_log_prob`, `compute_ref_log_prob`, and `update_actor` now return `TensorDict` instead of `DataProto`
- Workers operate in no-padding format internally, so outputs require `no_padding_2_padding` conversion
- `ppo_loss` reads `global_batch_size`, `temperature`, etc. from the batch TensorDict; these must be injected before calls
- `compute_log_prob` must set `compute_loss=False` to skip unnecessary loss computation (which requires `global_batch_size`)

Without this fix, any rllm user on verl 0.7.1 (default config) hits a cascade of errors:

- `KeyError: 'temperature'`
- `KeyError: 'global_batch_size'`
- `AttributeError: 'TensorDict' object has no attribute 'batch'`
- `RuntimeError: tensor size mismatch` (no-padding vs padding format)

Changes
All changes are backward-compatible via `isinstance(output, TensorDict)` checks:

- `compute_log_prob` (training path): Convert `DataProto` → `TensorDict` → no-padding before the call. Set `compute_loss=False`, `calculate_entropy=True`. Convert the output back to a padded `DataProto` with `old_log_probs`/`entropys` keys. This mirrors verl's own `_compute_old_log_prob` in `ray_trainer.py`.
- `compute_ref_log_prob`: Same `TensorDict` return handling + `no_padding_2_padding` conversion.
- `update_actor`: Inject `mini_batch_size`, `epochs`, `seed`, `dataloader_kwargs`, `global_batch_size`, `temperature`, `calculate_entropy` into batch metadata before the call. Handle the `TensorDict` return for metrics extraction. This mirrors verl's own `_update_actor` in `ray_trainer.py`.
- Validation/distillation `compute_log_prob`: Same no-padding pattern.

Test plan

- `isinstance` checks ensure the legacy `DataProto` path still works
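As a rough illustration of the padding conversion these changes revolve around, the sketch below shows a padded ↔ no-padding round trip on plain lists. This is a hypothetical simplification: `padding_to_no_padding` and `no_padding_to_padding` are invented helper names standing in for verl's real tensor-based conversions (only `no_padding_2_padding` is named in this PR), and the real workers operate on token tensors, not lists.

```python
# Hypothetical sketch of the padded <-> no-padding round trip that the new
# EngineWorker path requires. Helper names here are illustrative stand-ins
# for verl's tensor-based conversions.

PAD = 0

def padding_to_no_padding(padded_rows, lengths):
    """Drop pad tokens, keeping only each row's real tokens."""
    return [row[:n] for row, n in zip(padded_rows, lengths)]

def no_padding_to_padding(rows, max_len):
    """Re-pad variable-length rows back into a rectangular batch."""
    return [row + [PAD] * (max_len - len(row)) for row in rows]

batch = [[5, 7, PAD, PAD], [1, 2, 3, PAD]]
lengths = [2, 3]

unpadded = padding_to_no_padding(batch, lengths)  # [[5, 7], [1, 2, 3]]
repadded = no_padding_to_padding(unpadded, 4)
assert repadded == batch  # lossless round trip
```

If the re-padding step is skipped, downstream code that expects a rectangular batch sees ragged shapes, which is the `tensor size mismatch` failure mode listed above.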