Closed
Description
Looking at the options presented in the help-text, there doesn't appear to be a way to actually turn off continuous batching:
parallel:
  -dt, --defrag-thold N   KV cache defragmentation threshold (default: -1.0, < 0 - disabled)
  -np, --parallel N       number of parallel sequences to decode (default: 1)
  -ns, --sequences N      number of sequences to decode (default: 1)
  -cb, --cont-batching    enable continuous batching (a.k.a dynamic batching) (default: enabled)
The default value in common.h is true:
https://github.com/ggerganov/llama.cpp/blob/4e24cffd8cccd653634e24ee461c252bd77b1426/common/common.h#L166
And then setting the -cb flag also sets the param to true:
https://github.com/ggerganov/llama.cpp/blob/4e24cffd8cccd653634e24ee461c252bd77b1426/common/common.cpp#L796-L799
I haven't tested this much on my own, but it seems there should at least be a way to turn it off, unless there is never a good reason to disable it.
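To make the problem concrete, here is a minimal sketch of the parsing behaviour described above, plus one way an opt-out could look. It is not the actual llama.cpp code: `sketch_params` and `parse_args` are hypothetical stand-ins for the real `gpt_params` handling, and the `-nocb` / `--no-cont-batching` flag is an assumed name that does not exist at the linked commit.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the relevant field of gpt_params in common.h.
struct sketch_params {
    bool cont_batching = true; // the default is already "enabled"
};

// Simplified argument loop (assumption: the real parser is more involved).
sketch_params parse_args(const std::vector<std::string> & args) {
    sketch_params params;
    for (const std::string & arg : args) {
        if (arg == "-cb" || arg == "--cont-batching") {
            // Current behaviour: the flag can only set the value to true,
            // which is a no-op because true is also the default.
            params.cont_batching = true;
        } else if (arg == "-nocb" || arg == "--no-cont-batching") {
            // Hypothetical addition: an explicit way to turn it off.
            params.cont_batching = false;
        }
    }
    return params;
}
```

With only the first branch, `parse_args({})` and `parse_args({"-cb"})` both leave `cont_batching` set to true, so there is no argument combination that yields false. The second branch is the smallest change I can think of that would fix that without changing the default.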