Commit 6eda9e8

[KV cache manager] No functional change intended, simplify block allocation in KVCacheManager::addSequence
Simply put, we allocate enough blocks to accommodate up to "window size" tokens.
Signed-off-by: eopXD <[email protected]>
1 parent 31ef6c0 commit 6eda9e8

1 file changed: +1 -5 lines changed

cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp

Lines changed: 1 addition & 5 deletions
@@ -1961,11 +1961,7 @@ void KVCacheManager::addSequence(
 
     for (auto const [windowSize, metadata] : mBlockManager.getWindowSizesMetadata())
     {
-        auto const maxTokenNum = metadata.maxTokenNum;
-        auto const temporaryAttentionWindow = metadata.temporaryAttentionWindow;
-
-        // Consider the temporaryAttentionWindow when allocating blocks.
-        auto const effectiveInputLength = std::min(inputLength, maxTokenNum + temporaryAttentionWindow);
+        auto const effectiveInputLength = std::min(inputLength, windowSize);
         auto const numContextBlocks = tc::ceilDiv(effectiveInputLength, getTokensPerBlock());
         if (!sequence.isCyclic() && mEnableBlockReuse)
         {
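
For illustration, the arithmetic in the changed lines can be sketched in isolation. This is a minimal standalone sketch, not the code from kvCacheManager.cpp: `ceilDiv` here is a hypothetical stand-in for `tc::ceilDiv`, `numContextBlocks` is a free function invented for this example, plain `int` replaces whatever size type the real code uses, and the token counts are made-up values.

```cpp
#include <algorithm>

// Hypothetical stand-in for tc::ceilDiv: integer division rounded up.
// Assumes non-negative numerator and positive denominator.
constexpr int ceilDiv(int numerator, int denominator)
{
    return (numerator + denominator - 1) / denominator;
}

// Sketch of the per-window computation shown in the diff: clamp the input
// length to the window size, then round up to a whole number of blocks.
constexpr int numContextBlocks(int inputLength, int windowSize, int tokensPerBlock)
{
    auto const effectiveInputLength = std::min(inputLength, windowSize);
    return ceilDiv(effectiveInputLength, tokensPerBlock);
}

// Illustrative values only (not taken from the commit).
static_assert(numContextBlocks(100, 4096, 64) == 2, "short prompt: 100 tokens round up to 2 blocks");
static_assert(numContextBlocks(10000, 4096, 64) == 64, "long prompt: clamped to the 4096-token window");
```

Under these assumptions, a prompt longer than the window never requests more blocks than the window itself needs, which matches the commit message's description of allocating for "window size" tokens.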
