Attention Sinks Perf Boost #78

Merged
LucasWilkinson merged 25 commits into main from lwilkinson/aux-fast-api-clean on Aug 9, 2025
@LucasWilkinson LucasWilkinson commented Aug 7, 2025

Shout-out to @jayhshah (the performance wizard 🪄) for the implementation

PR

reasoning-effort: low
model: openai/gpt-oss-20b

Results (n=5)
mean_chars: 53.7662878788
stderr_chars: 1.6426099883
mean_score: 0.5571969697
stderr_score: 0.0072987006


Main

reasoning-effort: low
model: openai/gpt-oss-20b

Results (n=5)
mean_chars: 53.2223484848
stderr_chars: 1.8772311021
mean_score: 0.5637626263
stderr_score: 0.0020261119
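
To put the two result sets side by side, here is a minimal sketch that expresses the PR-vs-main difference in units of the combined standard error. It assumes the two runs are independent and that the `stderr_*` values are standard errors of the mean; the `compare` helper is hypothetical and not part of this PR or of any eval tooling.

```python
import math


def compare(mean_a, stderr_a, mean_b, stderr_b):
    """Return (difference, combined standard error, z) for two independent runs."""
    diff = mean_a - mean_b
    # Standard errors of independent estimates add in quadrature.
    combined = math.hypot(stderr_a, stderr_b)
    return diff, combined, diff / combined


# Numbers copied from the results above: (mean, stderr), PR first, then main.
score_diff, score_se, score_z = compare(
    0.5571969697, 0.0072987006, 0.5637626263, 0.0020261119
)
chars_diff, chars_se, chars_z = compare(
    53.7662878788, 1.6426099883, 53.2223484848, 1.8772311021
)

print(f"mean_score: diff={score_diff:+.4f}, combined stderr={score_se:.4f}, z={score_z:+.2f}")
print(f"mean_chars: diff={chars_diff:+.4f}, combined stderr={chars_se:.4f}, z={chars_z:+.2f}")
```

Under those assumptions, both differences come out at less than one combined standard error, which is consistent with the PR leaving accuracy unchanged within noise at n=5.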

@jayhshah jayhshah force-pushed the lwilkinson/aux-fast-api-clean branch from 994e966 to deb7484 Compare August 8, 2025 00:46
@LucasWilkinson LucasWilkinson force-pushed the lwilkinson/aux-fast-api-clean branch from deb7484 to e1f506c Compare August 8, 2025 01:07
jayhshah and others added 22 commits August 8, 2025 01:08
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
…in cmakelists

@LucasWilkinson LucasWilkinson force-pushed the lwilkinson/aux-fast-api-clean branch from e1f506c to 4bca7a3 Compare August 8, 2025 01:08
@LucasWilkinson LucasWilkinson changed the title [WIP] Attention Sinks Perf Boost Attention Sinks Perf Boost Aug 9, 2025
@LucasWilkinson LucasWilkinson marked this pull request as ready for review August 9, 2025 03:58
…-api-clean

@LucasWilkinson LucasWilkinson merged commit 2d3b750 into main Aug 9, 2025
1 check passed
@mickaelseznec
This breaks FlashAttention3 (required for FP8 attention) in vLLM V0.

I know V0 is deprecated, but it's still worth adding a note announcing that support has been removed.
