[Performance]: Reevaluate padding requirement for Sequence Parallelism #29136

@ProExpertProg

Description
Proposal to improve performance

Currently, when Sequence Parallelism (SP) is enabled, we pad num_tokens up to a multiple of the TP size. We should benchmark the performance cost of this padding and compare it against two alternatives: (1) manually padding by -num_tokens % tp_size tokens only around the sequence-parallel section, or (2) doing uneven work across TP ranks by manipulating the sizes returned by reduce_scatter (more complicated, but theoretically the best performance).
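To make the two padding strategies concrete, here is a minimal sketch (helper names are hypothetical, not vLLM APIs): the first function computes the filler-token count the -num_tokens % tp_size padding would add, and the second computes the uneven per-rank chunk sizes that a size-manipulated reduce_scatter would need so that no padding is required at all.

```python
def sp_pad_amount(num_tokens: int, tp_size: int) -> int:
    """Filler tokens needed so num_tokens divides evenly across tp_size ranks."""
    return -num_tokens % tp_size


def uneven_sp_splits(num_tokens: int, tp_size: int) -> list[int]:
    """Per-rank chunk sizes for an unpadded split: the first
    (num_tokens % tp_size) ranks each take one extra token."""
    base, rem = divmod(num_tokens, tp_size)
    return [base + (1 if rank < rem else 0) for rank in range(tp_size)]


# Example: 10 tokens on TP=4. Padding adds 2 filler tokens (10 -> 12),
# while the uneven-split approach covers all 10 with no filler.
print(sp_pad_amount(10, 4))     # 2
print(uneven_sp_splits(10, 4))  # [3, 3, 2, 2]
```

The benchmark question is essentially whether the wasted compute on the 2 filler tokens (option 1) outweighs the bookkeeping cost of carrying uneven chunk sizes through the sequence-parallel section (option 2).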
