bug: router crashes because of OOM and prefix cache HashTrie memory usage is unbounded

### Describe the bug

During a stability test, we found that the router Pod keeps restarting after running for some time. Kubernetes appears to be killing it for running out of memory (OOM). We initially configured the router Pod with a 2G memory limit. After raising the limit to 16G, we observed that the router's memory usage keeps ramping up without bound.

### To Reproduce

1. Create a vLLM production stack router connected to multiple vLLM model server Pods.
2. Set the router's memory limit to 2GB.
3. Run a continuous long-document QA benchmark for over 2 hours.
4. The router Pod is periodically restarted because of OOM.

### Expected behavior

The HashTrie size should be capped by a limit, and perhaps a GC could run periodically to clean up stale nodes and free memory, e.g. using a TTL or LRU policy.
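
To make the LRU idea concrete, here is a minimal sketch of a node-capped prefix trie. It is not the router's actual HashTrie: the names `BoundedHashTrie`, `max_nodes`, `chunk_size`, and `longest_prefix_match` are hypothetical, and Python's built-in `hash()` stands in for whatever chunk hash the router uses. Each path is touched leaf-first, so a parent is always more recent than its children and the oldest entry is always a leaf that can be dropped safely.

```python
from collections import OrderedDict


class _Node:
    """One trie node, keyed by the hash of a text chunk."""
    __slots__ = ("key", "parent", "children", "endpoints")

    def __init__(self, key=None, parent=None):
        self.key = key
        self.parent = parent
        self.children = {}
        self.endpoints = set()


class BoundedHashTrie:
    """Hypothetical prefix trie whose node count never exceeds max_nodes."""

    def __init__(self, max_nodes, chunk_size=4):
        if max_nodes < 1:
            raise ValueError("max_nodes must be >= 1")
        self.max_nodes = max_nodes
        self.chunk_size = chunk_size
        self.root = _Node()
        self._lru = OrderedDict()  # node -> None, oldest entry first

    def _chunk_keys(self, text):
        for i in range(0, len(text), self.chunk_size):
            yield hash(text[i:i + self.chunk_size])

    def _touch(self, path):
        # Leaf first, root-side last: parents end up newer than children.
        for node in reversed(path):
            self._lru[node] = None
            self._lru.move_to_end(node)

    def _evict(self):
        # The oldest entry is always a leaf, so unlinking it is safe.
        while len(self._lru) > self.max_nodes:
            victim, _ = self._lru.popitem(last=False)
            del victim.parent.children[victim.key]

    def insert(self, text, endpoint):
        node, path = self.root, []
        for key in self._chunk_keys(text):
            child = node.children.get(key)
            if child is None:
                child = _Node(key, node)
                node.children[key] = child
            child.endpoints.add(endpoint)
            path.append(child)
            node = child
        self._touch(path)
        self._evict()

    def longest_prefix_match(self, text):
        """Return (number of matched chunks, endpoints at the deepest match)."""
        node, path = self.root, []
        for key in self._chunk_keys(text):
            child = node.children.get(key)
            if child is None:
                break
            path.append(child)
            node = child
        self._touch(path)
        return len(path), (set(node.endpoints) if path else set())

    def __len__(self):
        return len(self._lru)
```

A TTL policy could be layered on the same structure by storing a timestamp per node and dropping expired leaves in a periodic sweep. The cap here counts nodes rather than bytes, so `max_nodes` would have to be sized against the Pod's memory limit.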

### Additional context

Does the vLLM production stack team have any plans to fix this?

bug: router crashes because of OOM and prefix cache HashTrie memory usage is unbounded #687
