CUDA: mul_mat_vec_q max. batch size 8 -> 4 #5370

JohannesGaessler · 2024-02-06T17:35:58Z

As a follow-up to #5351, this PR reduces the maximum batch size for which mul_mat_vec_q is used from 8 to 4.
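
For readers unfamiliar with this kind of threshold, here is a minimal sketch of batch-size-based kernel selection. It is not the actual llama.cpp code: the names `MAX_BATCH_SIZE_VEC_KERNEL`, `MatMulPath`, and `choose_path` are hypothetical, and the inclusive comparison (`<=`) is an assumption made for illustration.

```cpp
#include <cstdint>

// Hypothetical constant: largest batch size still routed to the
// matrix-vector style kernel. The PR describes lowering this from 8 to 4.
constexpr int64_t MAX_BATCH_SIZE_VEC_KERNEL = 4;

// Hypothetical labels for the two code paths.
enum class MatMulPath {
    VecQ,    // matrix-vector kernel, used for small batches
    General  // general matrix-matrix path, used for larger batches
};

// Select a path from the batch size (assumed inclusive upper bound).
inline MatMulPath choose_path(int64_t batch_size) {
    return batch_size <= MAX_BATCH_SIZE_VEC_KERNEL ? MatMulPath::VecQ
                                                   : MatMulPath::General;
}
```

Under these assumptions, batch sizes 1 through 4 take the vector-kernel path, and batch sizes 5 through 8, which previously would have taken it, now fall through to the general path.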

CUDA: mul_mat_vec_q max. batch size 8 -> 4

4e1d68b

ggerganov approved these changes Feb 6, 2024


ggerganov merged commit 17c97fb into ggerganov:master Feb 6, 2024
52 checks passed

jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024

CUDA: mul_mat_vec_q max. batch size 8 -> 4 (ggerganov#5370)

61a0422

hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024

CUDA: mul_mat_vec_q max. batch size 8 -> 4 (ggerganov#5370)

a640cf3

This was referenced Apr 16, 2024

Quantized Matmul: Small batches are slower than no-batch huggingface/candle#2074

Closed

Quantized Mistral: Batching is slower than non batches EricLBuehler/mistral.rs#139

Closed
