Closed
Awesome project. We have a paper https://arxiv.org/abs/2310.14034 with really complicated KV caching that I would love to go back and implement in SGLang.
I tried to get an example working in Colab for a demo, but I got kind of stuck getting the server running.
This runs fine:
!nohup python -m sglang.launch_server --model-path TheBloke/Mistral-7B-v0.1-AWQ --port 30000
But then when I run the following,
%%script bash
curl http://localhost:30000/v1/completions \
-H "Content-Type: application/json" \
-d '{
"prompt": "Say this is a test",
"max_tokens": 16,
"temperature": 0
}'
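For reference, here is a rough sketch of the same request sent from a Python cell instead of `curl`, using only the standard library. The helper names `build_request` and `send_completion` are made up for this example; the URL, headers, and JSON body are copied from the curl command above.

```python
import json
import urllib.request


def build_request(port: int = 30000) -> urllib.request.Request:
    """Build the same POST request as the curl command above."""
    payload = {
        "prompt": "Say this is a test",
        "max_tokens": 16,
        "temperature": 0,
    }
    return urllib.request.Request(
        f"http://localhost:{port}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_completion(port: int = 30000, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response.

    Assumes the server started with sglang.launch_server is already
    listening on the given port.
    """
    with urllib.request.urlopen(build_request(port), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `send_completion()` in a notebook cell once the server is up should hit the same endpoint as the curl cell.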
I just get this in the server output, over and over:
KV cache pool leak detected!
Warning: available_size=75821, max_total_num_token=75833
[the same two lines repeat roughly 30 times in total, with identical numbers each time]
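To put a number on the gap the warning reports, here is a small sketch that parses the warning line and subtracts the two values. The regular expression is written against the log text pasted above only; it is an assumption, not something taken from SGLang's logging code, and `missing_tokens` is a made-up helper name.

```python
import re
from typing import Optional

# Pattern based solely on the warning line pasted above (an assumption,
# not SGLang's own log format definition).
WARNING_RE = re.compile(r"available_size=(\d+), max_total_num_token=(\d+)")


def missing_tokens(line: str) -> Optional[int]:
    """Return max_total_num_token - available_size, or None if no match."""
    match = WARNING_RE.search(line)
    if match is None:
        return None
    available, total = (int(g) for g in match.groups())
    return total - available


line = "Warning: available_size=75821, max_total_num_token=75833"
print(missing_tokens(line))  # 12
```

So every warning reports the same 12-token difference between the pool size and what is available.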
Any ideas?