
bug: router crashes because of OOM and prefix cache HashTrie memory usage is unbounded #687

@can-sun

Description


Describe the bug

During a stability test, we found that the router Pod keeps restarting after running for a while. It appears that Kubernetes restarts the router because it runs out of memory (OOM). We initially configured the router Pod with a 2GB memory limit. After raising the limit to 16GB, we observed that the router's memory usage is unbounded and keeps ramping up.

To Reproduce

  1. Create a vLLM production stack router connected to multiple vLLM model server Pods
  2. Configure the router's memory limit to 2GB
  3. Run a continuous long-document QA benchmark for over 2 hours
  4. The router Pod should periodically get restarted because of OOM

Expected behavior

The HashTrie size should be capped by a limit, and perhaps a GC could run periodically to clean up stale nodes and free memory, e.g., using TTL or LRU eviction.
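The capped trie with LRU and TTL eviction suggested above could look roughly like the following. This is a minimal sketch, not the production-stack router's code: it assumes the trie is keyed by hashes of prompt chunks and that each node records which endpoints have served that prefix. `BoundedHashTrie`, `max_nodes`, `ttl_seconds`, `insert`, `longest_prefix`, and `expire` are hypothetical names chosen for illustration.

```python
import time
from collections import OrderedDict
from typing import Callable, Dict, List, Optional, Set, Tuple


class _Node:
    """A trie node: children keyed by chunk hash, plus endpoints that served this prefix."""

    __slots__ = ("key", "parent", "children", "endpoints", "last_access")

    def __init__(self, key: Optional[int], parent: Optional["_Node"]) -> None:
        self.key = key
        self.parent = parent
        self.children: Dict[int, "_Node"] = {}
        self.endpoints: Set[str] = set()
        self.last_access = 0.0


class BoundedHashTrie:
    """Hypothetical prefix trie capped at max_nodes, with LRU and optional TTL eviction."""

    def __init__(self, max_nodes: int, ttl_seconds: Optional[float] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.root = _Node(None, None)
        self.max_nodes = max_nodes
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        # All non-root nodes, least recently used first. Ancestors are always
        # touched after their descendants, so the front entry is always a leaf.
        self._lru: "OrderedDict[_Node, None]" = OrderedDict()

    def __len__(self) -> int:
        return len(self._lru)

    def _touch_path(self, path: List[_Node]) -> None:
        now = self.clock()
        for node in reversed(path):  # deepest first, so parents end up more recent
            node.last_access = now
            self._lru[node] = None
            self._lru.move_to_end(node)

    def _remove_oldest(self) -> None:
        # The least recently used node is a leaf, so no subtree is orphaned.
        leaf, _ = self._lru.popitem(last=False)
        del leaf.parent.children[leaf.key]

    def insert(self, chunk_hashes: List[int], endpoint: str) -> None:
        node, path = self.root, []
        for h in chunk_hashes:
            child = node.children.get(h)
            if child is None:
                child = _Node(h, node)
                node.children[h] = child
            child.endpoints.add(endpoint)
            path.append(child)
            node = child
        self._touch_path(path)
        while len(self._lru) > self.max_nodes:  # enforce the size cap
            self._remove_oldest()

    def longest_prefix(self, chunk_hashes: List[int]) -> Tuple[int, Set[str]]:
        """Return (number of matched chunks, endpoints holding the deepest match)."""
        node, path = self.root, []
        for h in chunk_hashes:
            child = node.children.get(h)
            if child is None:
                break
            node = child
            path.append(node)
        self._touch_path(path)
        return len(path), set(node.endpoints)

    def expire(self) -> int:
        """Drop nodes not touched within ttl_seconds; returns how many were removed."""
        if self.ttl_seconds is None:
            return 0
        cutoff = self.clock() - self.ttl_seconds
        removed = 0
        while self._lru and next(iter(self._lru)).last_access < cutoff:
            self._remove_oldest()
            removed += 1
        return removed
```

Touching a matched path from the deepest node up to the shallowest keeps every node more recent than all of its descendants, so the least recently used entry is always a leaf and can be evicted in O(1). In this sketch, `expire()` is intended to be called from a periodic background task, matching the GC idea above, while the `max_nodes` cap is enforced on every insert.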

Additional context

Does the vLLM production stack team have any plans to fix this?

Metadata


Assignees: No one assigned
Labels: bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
