[vllm, fully_async] fix: clamp max_tokens to response_length instead of max_model_len - prompt_len in async vLLM rollout #5505

Closed
Silas-11 wants to merge 2 commits into verl-project:main from Silas-11:release
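
The PR title describes clamping the generation budget to the configured response_length rather than the full remaining context window. The sketch below is a hypothetical illustration of that clamping logic, not the actual verl patch; the function name and parameters (compute_max_tokens, prompt_len, max_model_len, response_length) are assumptions for illustration.

```python
# Hypothetical sketch of the clamping behavior described in the PR title;
# not the actual verl/vLLM code.

def compute_max_tokens(prompt_len: int, max_model_len: int, response_length: int) -> int:
    """Compute the max_tokens to request from the rollout engine.

    Assumed previous behavior: max_tokens = max_model_len - prompt_len,
    which can exceed the rollout's configured response_length.
    Fixed behavior: cap generation at response_length while still
    respecting the remaining context window.
    """
    remaining_context = max_model_len - prompt_len
    return max(0, min(response_length, remaining_context))


# Example: a 30k-token prompt in a 32k-context model with response_length=1024
# yields 1024, not 2048.
print(compute_max_tokens(prompt_len=30_000, max_model_len=32_768, response_length=1024))
```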