Conversation

@fzyzcjy (Collaborator) commented Apr 7, 2025

Motivation

Tune outputs will be in #5092; here I only include the script updates, to avoid making #5092 too big.

Modifications

Checklist

CatherineSue and others added 30 commits April 4, 2025 16:36
# Conflicts:
#	python/sglang/srt/layers/attention/flashattention_backend.py
This reverts commit ac4cca3.
1. Adds a `use_irope` parameter to the RadixAttention class to indicate whether a layer should use local attention based on iRoPE
2. Modifies Llama4Attention to pass `use_irope=not self.nope` to RadixAttention, leveraging the existing NoPE flag
3. Updates FlashAttentionBackend.forward_extend to check for the `use_irope` flag when determining if local attention should be used
4. Simplifies local attention activation logic by directly checking `attention_chunk_size is not None` instead of using a separate flag (see the sketch after this list)
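
A minimal sketch of how the wiring in points 1–4 could fit together. The class, method, and parameter names (`RadixAttention`, `Llama4Attention`, `FlashAttentionBackend.forward_extend`, `use_irope`, `nope`, `attention_chunk_size`) come from the list above; the surrounding structure is simplified for illustration and is not the actual sglang implementation.

```python
from typing import Optional


class RadixAttention:
    """Per-layer attention wrapper; `use_irope` marks layers that should use
    chunked local attention (iRoPE) instead of global attention."""

    def __init__(self, layer_id: int, use_irope: bool = False):
        self.layer_id = layer_id
        self.use_irope = use_irope


class Llama4Attention:
    """NoPE layers attend globally, so iRoPE local attention applies only
    when `nope` is False (hence `use_irope=not self.nope`)."""

    def __init__(self, layer_id: int, nope: bool):
        self.nope = nope
        self.attn = RadixAttention(layer_id, use_irope=not self.nope)


class FlashAttentionBackend:
    """Backend that decides per layer whether local attention is active."""

    def __init__(self, attention_chunk_size: Optional[int] = None):
        # Local attention is only possible when the model configures a chunk size.
        self.attention_chunk_size = attention_chunk_size

    def forward_extend(self, layer: RadixAttention) -> str:
        # Use local attention only when a chunk size is configured *and*
        # the layer itself opted in via `use_irope`.
        use_local_attn = self.attention_chunk_size is not None and layer.use_irope
        if use_local_attn:
            return f"layer {layer.layer_id}: local attention (chunk={self.attention_chunk_size})"
        return f"layer {layer.layer_id}: global attention"


if __name__ == "__main__":
    backend = FlashAttentionBackend(attention_chunk_size=8192)
    rope_layer = Llama4Attention(layer_id=0, nope=False)  # iRoPE layer -> local
    nope_layer = Llama4Attention(layer_id=1, nope=True)   # NoPE layer -> global
    print(backend.forward_extend(rope_layer.attn))
    print(backend.forward_extend(nope_layer.attn))
```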