Open
Description
Proposal to improve performance
Currently, when Sequence Parallelism (SP) is enabled, we pad num_tokens to a multiple of the TP (SP) size. We should benchmark how much performance this padding costs and compare it against two alternatives: manually padding by -num_tokens % tp_size tokens only around the sequence-parallel section, or doing uneven work across TP ranks by manipulating the sizes returned by reduce_scatter (more complicated, but theoretically the best performance).
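To make the two quantities concrete, here is a minimal sketch in plain Python of the padding amount and of one possible uneven per-rank split. The helper names `pad_amount` and `uneven_split_sizes` are hypothetical and exist only for illustration; they are not vLLM functions, and the "give the remainder to the lowest ranks" policy is an assumption, not something this issue specifies.

```python
def pad_amount(num_tokens: int, tp_size: int) -> int:
    """Extra tokens needed so that num_tokens becomes a multiple of tp_size.

    In Python, -num_tokens % tp_size is (-num_tokens) % tp_size, which is
    always in the range [0, tp_size).
    """
    return -num_tokens % tp_size


def uneven_split_sizes(num_tokens: int, tp_size: int) -> list[int]:
    """Per-rank token counts without padding (assumed policy).

    The first (num_tokens % tp_size) ranks each take one extra token,
    so the sizes always sum to num_tokens.
    """
    base, rem = divmod(num_tokens, tp_size)
    return [base + 1 if rank < rem else base for rank in range(tp_size)]


if __name__ == "__main__":
    num_tokens, tp_size = 10, 4
    pad = pad_amount(num_tokens, tp_size)
    print("padded length:", num_tokens + pad)
    print("uneven split:", uneven_split_sizes(num_tokens, tp_size))
```

With padding, every rank processes the same number of tokens but some of that work is wasted on pad tokens; with the uneven split, no tokens are wasted but the ranks no longer do equal work, which is the trade-off the proposed benchmark would measure.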
Status
Ready