Colab?  #14

@srush

Awesome project. We have a paper https://arxiv.org/abs/2310.14034 with really complicated KV caching that I would love to go back and implement in SGLang.

I tried to get an example working in Colab for a demo, but I got kind of stuck getting the server running.

This runs fine:

!nohup python -m sglang.launch_server --model-path TheBloke/Mistral-7B-v0.1-AWQ --port 30000
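For reference, one way to start the same command from a notebook cell without blocking it, and then wait until the port is reachable, might look like the sketch below. The helper names `launch_in_background` and `wait_for_port` and the log file name are hypothetical, not part of SGLang. An open TCP port only shows that something is listening; it does not guarantee the model has finished loading, and this is not a diagnosis of the warning shown further down.

```python
import socket
import subprocess
import sys
import time


def launch_in_background(args, log_path="server.log"):
    """Start a child process without blocking; its output goes to log_path."""
    # The child keeps its own copy of the file descriptor, so the parent
    # can close its handle right after the process has been spawned.
    with open(log_path, "w") as log_file:
        return subprocess.Popen(args, stdout=log_file, stderr=subprocess.STDOUT)


def wait_for_port(host, port, timeout=300.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Nothing is listening yet (or the connection timed out); retry.
            time.sleep(interval)
    return False


# Same module, model path, and port as the command above.
SERVER_ARGS = [
    sys.executable, "-m", "sglang.launch_server",
    "--model-path", "TheBloke/Mistral-7B-v0.1-AWQ",
    "--port", "30000",
]
```

In a later cell, `server = launch_in_background(SERVER_ARGS)` followed by `wait_for_port("localhost", 30000)` would start the server and block until the port accepts connections or the timeout expires.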

But when I then run the following:

%%script bash
curl http://localhost:30000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Say this is a test",
    "max_tokens": 16,
    "temperature": 0
  }'
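The same request can also be sent from a Python cell, which makes it easier to inspect the response or catch a connection error. This is a minimal sketch using only the standard library; the helper names `build_completion_request` and `send_completion` are hypothetical, and the URL and JSON body are copied from the curl command above.

```python
import json
import urllib.request

# Same endpoint the curl command above targets.
COMPLETIONS_URL = "http://localhost:30000/v1/completions"


def build_completion_request(prompt, max_tokens=16, temperature=0,
                             url=COMPLETIONS_URL):
    """Build (but do not send) a POST request with the curl command's JSON body."""
    payload = {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_completion(prompt, timeout=60.0, **kwargs):
    """Send the request and return the server's response decoded from JSON."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send_completion("Say this is a test")` issues the request; if the server is not reachable, `urlopen` raises `urllib.error.URLError` instead of failing silently.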

I just get this, over and over:

KV cache pool leak detected!
Warning: available_size=75821, max_total_num_token=75833
[the same two lines repeat for the rest of the output]

Any ideas?
