[None][fix] Remove overwrite of kv_cache_config.max_tokens
The existing code overwrites kv_cache_config.max_tokens, which prevents
kv_cache_config.max_tokens from being passed to the kv_cache_manager.
This is incorrect, and this commit fixes it. Additionally, we now have
`max_gpu_total_bytes` from NVIDIA#5933 to estimate GPU memory.
The next step is to remove the `max_tokens` concept, as it is confusing
under a VSWA scheme and overlaps with `max_gpu_total_bytes` under a
full attention scheme.
Signed-off-by: eopXD <[email protected]>