With the vLLM backend, there is currently no way for us to control the batch size defined here, and the vLLM model config does not expose a way to set a specific batch size. However, vLLM itself lets us control the maximum number of concurrent sequences (effectively the batch size) directly, as seen in examples such as this.
Solution/Feature
Propagate the max_num_seqs parameter into the initialization of the vLLM model.
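A minimal sketch of the proposed plumbing. The helper name and the config dict shape below are hypothetical, not part of the project's actual API; the only assumption taken from vLLM itself is that its `LLM`/engine-args accept a `max_num_seqs` keyword, which caps the number of sequences scheduled per iteration.

```python
# Hypothetical sketch: forward max_num_seqs from our backend config into
# the kwargs used to initialize the vLLM model. build_vllm_engine_kwargs
# and the model_config dict are illustrative names, not the real API.

def build_vllm_engine_kwargs(model_config: dict) -> dict:
    """Collect kwargs for vLLM initialization, propagating max_num_seqs."""
    kwargs = {"model": model_config["model"]}
    # max_num_seqs caps how many sequences vLLM schedules concurrently,
    # which is the effective batch size we want to control.
    if "max_num_seqs" in model_config:
        kwargs["max_num_seqs"] = model_config["max_num_seqs"]
    return kwargs


if __name__ == "__main__":
    cfg = {"model": "facebook/opt-125m", "max_num_seqs": 64}
    kwargs = build_vllm_engine_kwargs(cfg)
    print(kwargs)
    # The backend would then do something like:
    #   from vllm import LLM
    #   llm = LLM(**kwargs)
```

With this in place, users could set `max_num_seqs` in the model config and have it reach the vLLM engine untouched, instead of being stuck with vLLM's default.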
Possible alternatives
The main alternative is to implement batching ourselves, which would be overkill since the vLLM backend already supports it.