Commit 6e93c1b

revert: revert the change for prepare estimation:
- we shouldn't use max_seq_len as the kv config max_tokens: the estimation pass doesn't need that many tokens, and it makes preparation more prone to OOM, especially for long sequences.

Signed-off-by: qixiang-99 <[email protected]>
1 parent 33cc07d commit 6e93c1b

File tree

1 file changed: +2 −6 lines


tensorrt_llm/_torch/pyexecutor/_util.py

Lines changed: 2 additions & 6 deletions
```diff
@@ -193,12 +193,8 @@ def try_prepare_estimation(self) -> bool:
         estimating_kv_cache = False
         if 'cp_type' not in self._mapping.cp_config:
             estimating_kv_cache = True
-            max_attention_window_from_config = self._executor_config.kv_cache_config.max_attention_window
-            max_seq_len = max(
-                max_attention_window_from_config
-            ) if max_attention_window_from_config is not None else self._executor_config.max_seq_len
-            self._executor_config.kv_cache_config.max_tokens = max(
-                self._get_token_num_for_estimation(), max_seq_len)
+            self._executor_config.kv_cache_config.max_tokens = self._get_token_num_for_estimation(
+            )
         return estimating_kv_cache

     def configure_kv_cache_capacity(self, py_executor: PyExecutor) -> None:
```
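The rationale behind the revert can be sketched with a minimal, self-contained example. The function names and numbers below are illustrative stand-ins, not the actual TensorRT-LLM code: the point is only that clamping the KV-cache token budget up to `max_seq_len` inflates the pool allocated for the estimation pass, which is what risks OOM on long-context configs.

```python
# Sketch of the reverted logic vs. the restored one (all values hypothetical).

def kv_tokens_before(token_estimate: int, max_seq_len: int) -> int:
    # Reverted behavior: never size the KV cache below max_seq_len,
    # even though the estimation pass does not need that many tokens.
    return max(token_estimate, max_seq_len)

def kv_tokens_after(token_estimate: int) -> int:
    # Restored behavior: size the KV cache from the estimation pass alone.
    return token_estimate

token_estimate = 4_096    # tokens the estimation pass actually needs
max_seq_len = 1_048_576   # a long-context model

print(kv_tokens_before(token_estimate, max_seq_len))  # 1048576: huge pool, OOM risk
print(kv_tokens_after(token_estimate))                # 4096: only what estimation needs
```

For a 1M-token context the old path allocated a KV pool roughly 256× larger than the estimation pass required, which is exactly the "preparation OOM" the commit message describes.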
