Open
Labels: performance (Performance-related issues)
Description
Performance issues
As seen in #27080, Inductor partition is not always faster than Dynamo partition or no partition. There are two separate issues:
- On Blackwell, no partition is sometimes faster than Inductor partition (particularly TTFT with attention+quant fusion). This might just be attributable to the difference in performance between the `FULL_AND_PIECEWISE` and `FULL_DECODE_ONLY` cudagraph modes.
- On Hopper, Inductor partition seems to outperform Dynamo partition, although we only have numbers for TP=4. We should try again with llama-8B TP=1.
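To separate the two effects above, it may help to write down the full set of runs before benchmarking. The following is only a rough sketch of such a run matrix: the partition labels (`none`, `dynamo`, `inductor`) and the dictionary keys are hypothetical names for illustration, not vLLM config fields, and some combinations may not be valid in practice, so a real harness would need to skip those.

```python
from itertools import product

# Hypothetical labels for the three partitioning strategies discussed in the issue.
PARTITION_MODES = ("none", "dynamo", "inductor")
# Cudagraph mode names exactly as they appear in the issue text.
CUDAGRAPH_MODES = ("FULL_AND_PIECEWISE", "FULL_DECODE_ONLY")
# TP=4 already has numbers; TP=1 is the proposed follow-up.
TP_SIZES = (1, 4)


def benchmark_matrix():
    """Enumerate every (partition, cudagraph mode, TP) combination to run.

    The keys are illustrative only; map them onto the real engine
    configuration when wiring this into an actual benchmark script.
    """
    return [
        {"partition": partition, "cudagraph_mode": mode, "tp": tp}
        for partition, mode, tp in product(PARTITION_MODES, CUDAGRAPH_MODES, TP_SIZES)
    ]


if __name__ == "__main__":
    for run in benchmark_matrix():
        print(run)
```

Holding the cudagraph mode fixed while varying the partition strategy (and vice versa) would show whether the Blackwell gap comes from partitioning itself or from the cudagraph mode.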
BoyuanFeng
Projects
Status
Ready