Conversation

@liangel-02
Contributor

As title

Testing

Screenshot 2025-11-24 at 4 30 53 PM

Performance and loss are on par.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Nov 24, 2025
@liangel-02 liangel-02 requested a review from drisspg November 24, 2025 21:31
@liangel-02 liangel-02 marked this pull request as ready for review November 25, 2025 01:46
Contributor

@wwwjn wwwjn left a comment

LGTM! A random question about the loss: why are they not identical? Is it because both Flex and Varlen attention are compiled to different kernels, so there is no numerics guarantee? If we don't compile the attention (accepting the lower efficiency), will the numerics be the same?

@tianyu-l
Contributor

Also need to add SAC support for Qwen3 varlen.

@liangel-02
Contributor Author

@wwwjn Yes, I think compiling both is causing the slight difference in loss. This is the loss when neither is compiled, which ends up being a lot closer:

Screenshot 2025-11-25 at 12 31 12 PM
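
As a rough illustration of why two different kernels may not match bit for bit: floating-point addition is not associative, so kernels that accumulate in a different order can round differently. This is a generic sketch in plain Python, not the Flex or Varlen attention code, and the variable names are made up for the example.

```python
import math

# The same three values, added with two different groupings.
# Different kernels may reduce in different orders, much like this.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

# The results differ in the last bits but agree to a tight tolerance.
print(left_to_right == right_to_left)
print(math.isclose(left_to_right, right_to_left, rel_tol=1e-12))
```

Small per-operation differences of this kind can accumulate over training steps, which would be consistent with loss curves that are close but not identical.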

cc @drisspg

@liangel-02 liangel-02 requested a review from tianyu-l November 25, 2025 18:03
Contributor

@wwwjn wwwjn left a comment

LGTM!

@liangel-02 liangel-02 merged commit 1b9cfda into main Nov 25, 2025
3 checks passed