- OS: Linux 2d078bb41859 5.15.0-83-generic #92~20.04.1-Ubuntu SMP Mon Aug 21 14:00:49 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
- Instance: 1x RTX 3090
- Load test tool: k6
Hi, I am running a load test against the llama.cpp server, but the number of requests being processed concurrently is capped at the `--parallel` value. Evidence below:

Is my command for batch inference wrong? Also, after the load test completed, I manually sent one request and the model did not respond at all (it seems the slots were not released yet?).
Any help is appreciated, thank you.
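For reference, here is a sketch of how I understand the server should be launched for concurrent requests. This is an assumption based on the flags in the server's `--help` output; the binary path, model file, and context size are placeholders for my setup, not a confirmed-correct command:

```shell
# Hedged sketch: -np/--parallel sets the number of slots, so at most N requests
# decode concurrently; a load test capping at N in-flight requests is expected,
# and extra requests should queue for a free slot rather than fail.
./server -m models/model.gguf \
    --parallel 4 \
    --cont-batching \
    -c 8192
# --parallel 4     -> 4 slots, i.e. at most 4 concurrent requests
# --cont-batching  -> slots share one decode batch (continuous batching)
# -c 8192          -> total context, divided across slots (here 2048 per slot)
```

If that reading is right, the cap at `--parallel` is by design, and the real bug would be the slots not being released after the test finishes.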