- OS: Linux 2d078bb41859 5.15.0-83-generic #92~20.04.1-Ubuntu SMP Mon Aug 21 14:00:49 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
- Instance: 1x RTX 3090
- Load test tool: k6
Hi, I am running a load test against the llama.cpp server, but the number of requests being processed concurrently is capped at the `--parallel` value. Evidence below:

Is my command for batch inference wrong? Also, after the load test completed, I manually sent one request and the model did not respond at all (it seems the slots were not released yet?).
Any help is appreciated, thank you.
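For reference, here is a sketch of how I understand the server should be launched for concurrent requests. This is an assumption based on the flags in the server's `--help` output; the binary path, model file, and context size are placeholders for my setup, not a confirmed-correct command:

```shell
# Hedged sketch: -np/--parallel sets the number of slots, so at most N requests
# decode concurrently; a load test capping at N in-flight requests is expected,
# and extra requests should queue for a free slot rather than fail.
./server -m models/model.gguf \
    --parallel 4 \
    --cont-batching \
    -c 8192
# --parallel 4     -> 4 slots, i.e. at most 4 concurrent requests
# --cont-batching  -> slots share one decode batch (continuous batching)
# -c 8192          -> total context, divided across slots (here 2048 per slot)
```

If that reading is right, the cap at `--parallel` is by design, and the real bug would be the slots not being released after the test finishes.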