
Redundant Prefix Stores for Single Base Model #190

@sagiahrac

Description


Summary

The KV-Cache manager currently creates and maintains multiple prefix stores (prompt-to-tokens caches) even though it serves only a single base model. This adds redundant cache-management overhead.
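To make the redundancy concrete, here is a minimal sketch of what a prompt-to-tokens prefix store does and why several stores for one model duplicate work. It is illustrative only: `PrefixStore`, `add`, `longest_prefix`, and the token IDs are hypothetical and not the manager's actual types or data. The linear scan stands in for whatever index the real store uses.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PrefixStore:
    """Hypothetical prompt-to-tokens cache: maps cached prompts to token IDs."""

    entries: dict[str, list[int]] = field(default_factory=dict)

    def add(self, prompt: str, tokens: list[int]) -> None:
        # Cache the tokenization of a prompt.
        self.entries[prompt] = tokens

    def longest_prefix(self, prompt: str) -> tuple[str, list[int]] | None:
        # Linear scan for the longest cached prompt that is a prefix of the query.
        # A production store would more likely use a trie or block hashes.
        best = None
        for cached, tokens in self.entries.items():
            if prompt.startswith(cached) and (best is None or len(cached) > len(best[0])):
                best = (cached, tokens)
        return best


# Two stores serving the same base model end up caching identical entries.
store_a, store_b = PrefixStore(), PrefixStore()
for store in (store_a, store_b):
    store.add("You are a helpful assistant.", [101, 2017, 2024])  # made-up token IDs
duplicated = sum(len(s.entries) for s in (store_a, store_b))  # 2 entries for 1 prompt
```

Because both stores tokenize and hold the same prompt for the same model, every insertion and lookup is paid twice without improving hit rates.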

Action Required

Consolidate the prefix-caching mechanism to use a single, global store per base-model instance, and remove the redundant stores.
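One possible shape for the consolidation is a registry that hands out exactly one shared store per base model, so every caller serving that model reads and writes the same cache. This is a sketch under assumptions: stores are keyed by a base-model name, and `PrefixStoreRegistry`, `get`, and `"base-model"` are hypothetical names rather than the manager's API.

```python
from __future__ import annotations

import threading


class PrefixStore:
    """Hypothetical prompt-to-tokens cache (stand-in for the real store)."""

    def __init__(self) -> None:
        self.entries: dict[str, list[int]] = {}


class PrefixStoreRegistry:
    """Hands out exactly one shared PrefixStore per base model name."""

    def __init__(self) -> None:
        self._stores: dict[str, PrefixStore] = {}
        self._lock = threading.Lock()

    def get(self, base_model: str) -> PrefixStore:
        # Lock so concurrent callers cannot create two stores for one model.
        with self._lock:
            store = self._stores.get(base_model)
            if store is None:
                store = PrefixStore()
                self._stores[base_model] = store
            return store


registry = PrefixStoreRegistry()
first = registry.get("base-model")
second = registry.get("base-model")
first.entries["You are a helpful assistant."] = [101, 2017, 2024]  # made-up token IDs
```

With a single base model the registry holds one entry, and a write through `first` is visible through `second` because both names refer to the same object. The lock keeps store creation idempotent if several components request the store concurrently.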
