Commit b267b82

add more condition
Signed-off-by: fsx950223 <[email protected]>
1 parent dd36f79 commit b267b82

1 file changed: +1 -1 lines changed

vllm/attention/ops/chunked_prefill_paged_decode.py

Lines changed: 1 addition & 1 deletion
@@ -286,7 +286,7 @@ def chunked_prefill_paged_decode(
             num_queries_per_kv,
             max_seq_len, sliding_window,
             kv_cache_dtype, alibi_slopes)
-    if use_custom:
+    if use_custom and head_size <= 128:
         _PARTITION_SIZE_ROCM = 256
         max_num_partitions = ((max_seq_len + _PARTITION_SIZE_ROCM - 1) //
                               _PARTITION_SIZE_ROCM)
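
For readers skimming the hunk, the two pieces of logic it touches can be restated as a small standalone sketch. This is not the vLLM code: `num_partitions` and `takes_custom_path` are hypothetical helper names introduced only for illustration, and the constants simply mirror the values visible in the diff (`_PARTITION_SIZE_ROCM = 256` and the new `head_size <= 128` bound). The commit does not state why 128 was chosen, so the sketch makes no claim about that.

```python
# Illustrative sketch only; these helpers do not exist in vLLM.
PARTITION_SIZE = 256        # mirrors _PARTITION_SIZE_ROCM in the diff
MAX_CUSTOM_HEAD_SIZE = 128  # bound added by this commit


def num_partitions(max_seq_len: int, partition_size: int = PARTITION_SIZE) -> int:
    """Ceiling division: partitions needed to cover max_seq_len tokens."""
    return (max_seq_len + partition_size - 1) // partition_size


def takes_custom_path(use_custom: bool, head_size: int) -> bool:
    """Restates the new guard: the custom branch also requires head_size <= 128."""
    return use_custom and head_size <= MAX_CUSTOM_HEAD_SIZE


if __name__ == "__main__":
    # A 257-token sequence spills into a second 256-token partition.
    print(num_partitions(256), num_partitions(257))
    # Before this commit, head_size=256 with use_custom=True entered the branch.
    print(takes_custom_path(True, 128), takes_custom_path(True, 256))
```

Under this reading, a call with `use_custom=True` and `head_size=256` now skips the branch shown in the hunk; which code path handles it instead is outside the lines shown here.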
