Commit 60a931e

add gqa limit
Signed-off-by: fsx950223 <[email protected]>
1 parent b267b82 commit 60a931e

1 file changed: +1 -1 lines changed

vllm/attention/ops/chunked_prefill_paged_decode.py

Lines changed: 1 addition & 1 deletion
@@ -286,7 +286,7 @@ def chunked_prefill_paged_decode(
             num_queries_per_kv,
             max_seq_len, sliding_window,
             kv_cache_dtype, alibi_slopes)
-    if use_custom and head_size <= 128:
+    if use_custom and head_size <= 128 and num_queries_per_kv <= 16:
         _PARTITION_SIZE_ROCM = 256
         max_num_partitions = ((max_seq_len + _PARTITION_SIZE_ROCM - 1) //
                               _PARTITION_SIZE_ROCM)
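
The changed condition can be read as a dispatch predicate: the branch is taken only when `use_custom` is set, `head_size` is at most 128, and `num_queries_per_kv` is at most 16. A minimal sketch of that check and of the partition-count arithmetic from the context lines follows. It assumes `num_queries_per_kv` is the number of query heads divided by the number of KV heads (the GQA ratio); the helper names `can_use_custom_path` and `num_partitions` are hypothetical and are not part of vLLM.

```python
_PARTITION_SIZE_ROCM = 256  # value taken from the diff context


def can_use_custom_path(use_custom: bool, head_size: int,
                        num_queries_per_kv: int) -> bool:
    """Hypothetical helper mirroring the condition after this commit."""
    return use_custom and head_size <= 128 and num_queries_per_kv <= 16


def num_partitions(max_seq_len: int) -> int:
    """Ceiling division, matching the max_num_partitions expression."""
    return (max_seq_len + _PARTITION_SIZE_ROCM - 1) // _PARTITION_SIZE_ROCM


if __name__ == "__main__":
    # Assumed example shapes: 32 query heads over 8 KV heads gives a ratio of 4.
    print(can_use_custom_path(True, 128, 32 // 8))
    # 64 query heads over 2 KV heads gives a ratio of 32, above the new limit.
    print(can_use_custom_path(True, 128, 64 // 2))
    print(num_partitions(1000))
```

Under these assumptions, a configuration with a GQA ratio above 16 skips this branch and is handled by whatever code follows the `if` in the original function.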
