add varlen attention for qwen 3 #2084

liangel-02 · 2025-11-24T21:31:10Z

As title

Testing

Performance and loss are on par.

wwwjn

LGTM! A random question about the loss: why are they not identical? Is it because both Flex and Varlen attention are compiled to different kernels, so there is no numerics guarantee? If we don't compile the attention (and bear the lower efficiency), will the numerics be the same?
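For context on why two kernels need not agree bit for bit: floating-point addition is not associative, so kernels that reduce the same values in a different order can return slightly different results. The snippet below is a minimal, generic Python illustration of that effect. It is not torchtitan or PyTorch code, and the helper name `sums_match` is made up for this example.

```python
import math

# The same three values, reduced in two different orders.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)


def sums_match(a, b, rel_tol=1e-12):
    """Compare two results with a relative tolerance instead of bitwise equality."""
    return math.isclose(a, b, rel_tol=rel_tol)


print(left_to_right == right_to_left)            # False: not bitwise identical
print(sums_match(left_to_right, right_to_left))  # True: equal within tolerance
```

This is why loss curves from two attention backends are usually compared within a tolerance rather than expected to match exactly.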

torchtitan/models/qwen3/model/model.py

tianyu-l · 2025-11-25T04:23:20Z

Also need to add SAC support for Qwen3 varlen.

liangel-02 · 2025-11-25T17:32:24Z

@wwwjn Yes, I think compiling both is causing the slight difference in loss. This is the loss when neither is compiled, which ends up being a lot closer.
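As a rough illustration of the comparison being discussed, here is a toy pure-Python sketch of two mathematically equivalent ways to attend over packed documents: looping over each document using cumulative sequence lengths (varlen style), and applying a dense document-causal mask over the whole packed sequence (mask style). Everything here is an assumption made for illustration: the function names, the `cu_seqlens` and `doc_ids` layouts, and the toy data are hypothetical and are not the torchtitan or PyTorch implementations.

```python
import math


def _attend(query, keys, values, allowed):
    """Softmax attention for one query; key j is masked out when allowed[j] is False."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [
        scale * sum(a * b for a, b in zip(query, key)) if ok else -math.inf
        for key, ok in zip(keys, allowed)
    ]
    peak = max(scores)  # finite, because each token may always attend to itself
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def varlen_attention(q, k, v, cu_seqlens):
    """Causal attention computed one packed document at a time."""
    out = []
    for start, end in zip(cu_seqlens, cu_seqlens[1:]):
        for i in range(start, end):
            span = i + 1 - start
            out.append(_attend(q[i], k[start:i + 1], v[start:i + 1], [True] * span))
    return out


def masked_attention(q, k, v, doc_ids):
    """The same computation expressed as a dense document-causal mask."""
    n = len(q)
    return [
        _attend(q[i], k, v, [doc_ids[j] == doc_ids[i] and j <= i for j in range(n)])
        for i in range(n)
    ]


# Two packed documents of lengths 2 and 3; toy 4-dimensional token vectors.
x = [[math.sin(3 * i + d) for d in range(4)] for i in range(5)]
cu_seqlens = [0, 2, 5]
doc_ids = [0, 0, 1, 1, 1]

out_varlen = varlen_attention(x, x, x, cu_seqlens)
out_masked = masked_attention(x, x, x, doc_ids)
max_diff = max(
    abs(a - b)
    for row_a, row_b in zip(out_varlen, out_masked)
    for a, b in zip(row_a, row_b)
)
print(max_diff)
```

The two paths should agree to within floating-point tolerance. Real kernels tile and reduce in different orders, so a tolerance check (rather than exact equality) is the safer way to compare them.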

cc @drisspg

wwwjn

LGTM!

meta-cla bot added the CLA Signed label Nov 24, 2025

liangel-02 requested a review from drisspg November 24, 2025 21:31

liangel-02 marked this pull request as ready for review November 25, 2025 01:46

liangel-02 requested review from fegin, tianyu-l, wconstab and wwwjn as code owners November 25, 2025 01:46

wwwjn reviewed Nov 25, 2025

tianyu-l reviewed Nov 25, 2025

torchtitan/models/qwen3/model/model.py (outdated)

torchtitan/models/qwen3/model/model.py (outdated)

add varlen attention for qwen 3

cf4c8a1

liangel-02 force-pushed the qwen3 branch from a47df59 to cf4c8a1 Compare November 25, 2025 16:34

liangel-02 requested a review from tianyu-l November 25, 2025 18:03

wwwjn approved these changes Nov 25, 2025

drisspg approved these changes Nov 25, 2025

tianyu-l approved these changes Nov 25, 2025

liangel-02 merged commit 1b9cfda into main Nov 25, 2025
3 checks passed
