[Performance][torch.compile]: FlashInfer Attention + quant fusion performance issue with TP=4 #27829

@ProExpertProg

Description

As seen in #27080, attention+quant fusion with the FlashInfer attention backend on 4xB200 (TP=4) performs worse than the unfused code path. This should be resolved so that the fusion can be turned on by default.

cc @nvpohanh @pavanimajety @zou3519 for visibility
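For reference, a minimal reproduction sketch for comparing fused vs. unfused runs. This is an assumption on my part, not the exact setup from #27080: the FP8 checkpoint is a stand-in, and the `VLLM_ATTENTION_BACKEND` env var plus the `enable_attn_fusion` / `custom_ops` compilation-config knobs are how I understand the fusion to be toggled; adjust to whatever #27080 actually used.

```python
import os

# Assumption: select the FlashInfer attention backend via env var,
# set before vLLM is imported.
os.environ["VLLM_ATTENTION_BACKEND"] = "FLASHINFER"

from vllm import LLM, SamplingParams

# Hypothetical FP8 checkpoint; the issue does not name the model used.
MODEL = "neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8"

llm = LLM(
    model=MODEL,
    tensor_parallel_size=4,  # TP=4, matching the 4xB200 setup
    compilation_config={
        # Assumption: this pass-config flag and custom-op entry enable the
        # attention+quant fusion pass; set enable_attn_fusion to False
        # (or drop the dict) to get the unfused baseline for comparison.
        "pass_config": {"enable_attn_fusion": True},
        "custom_ops": ["+quant_fp8"],
    },
)

out = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```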
