Continuous batching load test stuck #5827

@Kev1ntan

Description

OS: Linux 2d078bb41859 5.15.0-83-generic #92~20.04.1-Ubuntu SMP Mon Aug 21 14:00:49 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

instance: 1xRTX 3090

load test tool: k6

Hi, I am running a load test against the llama.cpp server, but the number of concurrent requests is capped at the --parallel n value. Below is the evidence:
[Screenshots, 2024-03-02: k6 load test results and server output showing requests capped at the parallel slot count]
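
For context, my k6 script is roughly along these lines (the URL, payload, and VU count here are illustrative placeholders, not my exact values):

```js
import http from 'k6/http';
import { check } from 'k6';

export const options = {
  vus: 16,          // more virtual users than server slots, to stress batching
  duration: '60s',
};

export default function () {
  // Hit the llama.cpp server completion endpoint with a small prompt
  const payload = JSON.stringify({
    prompt: 'Hello, how are you?',
    n_predict: 64,
  });
  const res = http.post('http://localhost:8080/completion', payload, {
    headers: { 'Content-Type': 'application/json' },
  });
  check(res, { 'status is 200': (r) => r.status === 200 });
}
```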

Is the command for batch inference wrong? When the load test completed, I tried to manually send one request, but the model didn't respond at all (it seems the slots were not released yet?).
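
For reference, I start the server with a command roughly like this (the model file, context size, and slot count are placeholders, not my exact values):

```sh
./server -m models/model.gguf \
    -c 4096 -ngl 99 \
    -np 4 -cb \
    --host 0.0.0.0 --port 8080
```

Here -np / --parallel sets the number of slots and -cb enables continuous batching.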

Any help is appreciated, thank you.
