Commit a7be540

[KVCache] Initialize one extra page than specified (#16849)
This PR updates PagedKVCache to initialize one more page than the number specified via the constructor. The reason is that applications usually depend on the number of free pages (returned from `GetNumAvailablePages`) to decide their KV cache operation policy. Without this extra page, the KV cache reports no available pages even when the last allocated pages are not full, which may give applications the false impression that the KV cache is completely full and cause further issues.
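The situation described above can be reproduced with a little arithmetic. The following is a minimal sketch, not code from TVM: the helper names `PagesNeeded`, `PoolSizeBefore`, `PoolSizeAfter`, and `FreePages` are hypothetical, and the model assumes that a page is never shared between sequences.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; none of these names exist in TVM.

// Pages needed to hold `num_tokens` tokens (ceiling division).
constexpr int64_t PagesNeeded(int64_t num_tokens, int64_t page_size) {
  return (num_tokens + page_size - 1) / page_size;
}

// Pool size before this commit: just enough pages for the token capacity.
constexpr int64_t PoolSizeBefore(int64_t total_token_capacity, int64_t page_size) {
  return PagesNeeded(total_token_capacity, page_size);
}

// Pool size after this commit: one extra page on top.
constexpr int64_t PoolSizeAfter(int64_t total_token_capacity, int64_t page_size) {
  return PoolSizeBefore(total_token_capacity, page_size) + 1;
}

// Free pages when `num_seqs` sequences each hold `tokens_per_seq` tokens,
// assuming (for this sketch) that a page is never shared between sequences.
constexpr int64_t FreePages(int64_t pool_size, int64_t num_seqs,
                            int64_t tokens_per_seq, int64_t page_size) {
  return pool_size - num_seqs * PagesNeeded(tokens_per_seq, page_size);
}

// Capacity of 64 tokens, 16 tokens per page, 4 sequences of 10 tokens each:
// only 40 of the 64 token slots are used, yet all 4 original pages are taken.
static_assert(FreePages(PoolSizeBefore(64, 16), 4, 10, 16) == 0,
              "without the extra page, no page is reported as free");
static_assert(FreePages(PoolSizeAfter(64, 16), 4, 10, 16) == 1,
              "with the extra page, one page is still reported as free");
```

In this example an application that only looks at the free-page count would see zero before the change, although 24 token slots remain unused in the partially filled pages.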
1 parent a156181 commit a7be540

1 file changed, +2 -2 lines changed


src/runtime/relax_vm/paged_kv_cache.cc

Lines changed: 2 additions & 2 deletions
@@ -1790,7 +1790,7 @@ TVM_REGISTER_GLOBAL("vm.builtin.paged_attention_kv_cache_create")
   int64_t prefill_chunk_size = cache_config[2];
   int64_t page_size = cache_config[3];
   bool support_sliding_window = cache_config[4];
-  int64_t num_total_pages = (total_token_capacity + page_size - 1) / page_size;
+  int64_t num_total_pages = (total_token_capacity + page_size - 1) / page_size + 1;
   if (support_sliding_window) {
     // When sliding window is enabled, each sequence may use two more pages at most.
     num_total_pages += reserved_num_seqs * 2;
@@ -1827,7 +1827,7 @@ TVM_REGISTER_GLOBAL("vm.builtin.paged_attention_kv_cache_create_reduced")
   int64_t prefill_chunk_size = cache_config[2];
   int64_t page_size = cache_config[3];
   bool support_sliding_window = cache_config[4];
-  int64_t num_total_pages = (total_token_capacity + page_size - 1) / page_size;
+  int64_t num_total_pages = (total_token_capacity + page_size - 1) / page_size + 1;
   if (support_sliding_window) {
     // When sliding window is enabled, each sequence may use two more pages at most.
     num_total_pages += reserved_num_seqs * 2;
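
Both hunks apply the same page-count computation. As a rough sketch, the updated logic could be factored into a standalone function like the one below. The function name `ComputeNumTotalPages` is hypothetical and does not appear in the TVM source; the parameter names are taken from the variables visible in the diff.

```cpp
#include <cstdint>

// Hypothetical standalone version of the updated computation (not a TVM API).
constexpr int64_t ComputeNumTotalPages(int64_t total_token_capacity,
                                       int64_t page_size,
                                       int64_t reserved_num_seqs,
                                       bool support_sliding_window) {
  // Ceiling division for the requested capacity, plus the one extra page
  // added by this commit, plus two pages per reserved sequence when
  // sliding window support is enabled.
  return (total_token_capacity + page_size - 1) / page_size + 1 +
         (support_sliding_window ? reserved_num_seqs * 2 : 0);
}

// 100 tokens with 16 tokens per page: ceil(100 / 16) = 7 pages, plus 1 extra.
static_assert(ComputeNumTotalPages(100, 16, 4, false) == 8, "7 + 1");
// With sliding window and 4 reserved sequences: 8 + 4 * 2 = 16 pages.
static_assert(ComputeNumTotalPages(100, 16, 4, true) == 16, "7 + 1 + 8");
```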
