Skip to content

Refactor: review improvements for KV cache quantization#72

Closed
janhilgard wants to merge 1 commit intowaybarrios:mainfrom
janhilgard:review-improvements
Closed

Refactor: review improvements for KV cache quantization#72
janhilgard wants to merge 1 commit intowaybarrios:mainfrom
janhilgard:review-improvements

Conversation

@janhilgard
Copy link
Copy Markdown
Collaborator

Summary

  • Extract _maybe_dequantize() helper method to eliminate 5 repeated conditional dequantize blocks in fetch() (DRY)
  • Replace duck-typed hasattr/isinstance(keys, (list, tuple)) check with explicit isinstance(layer_cache, QuantizedKVCache) in estimate_kv_cache_memory() — more robust and self-documenting
  • Replace __new__() bypass with normal QuantizedKVCache(group_size=..., bits=...) constructor in _trim_cache_offset() — forward-compatible if __init__ adds new attributes
  • Extract _add_kv_cache_quantization_args() helper to deduplicate identical CLI argument definitions between serve and bench parsers

Test plan

  • All 16 test_kv_cache_quantization.py tests pass
  • ruff check clean on changed files (pre-existing N806 on _QKVCache unrelated)
  • Pure refactoring — no behavioral changes

🤖 Generated with Claude Code

@janhilgard
Copy link
Copy Markdown
Collaborator Author

Hey @waybarrios — ideally I'd push these changes directly into #62, but I don't have write access and maintainerCanModify is disabled on PR #62. It would be cleaner if you could either grant me write access to the repo or enable maintainerCanModify on #62, so I can push directly to your branch instead of creating separate PRs. Thanks!

@waybarrios
Copy link
Copy Markdown
Owner

@janhilgard Sorry for the late. Being pretty busy. but now you have access!! :)

@janhilgard
Copy link
Copy Markdown
Collaborator Author

Thank you very much @waybarrios! Really appreciate it 🙏

- Add `_maybe_dequantize()` method to replace 5 repeated dequantize patterns in fetch()
- Use explicit `isinstance(QuantizedKVCache)` instead of duck-typing in estimate_kv_cache_memory()
- Use normal QuantizedKVCache constructor instead of __new__() bypass in _trim_cache_offset()
- Extract `_add_kv_cache_quantization_args()` to deduplicate 4 CLI args in serve/bench parsers

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@janhilgard janhilgard changed the base branch from feat/kv-cache-quantization to main February 14, 2026 16:44
@janhilgard
Copy link
Copy Markdown
Collaborator Author

Closing: Related PRs #73 and #69 have already been merged. This refactor is now outdated.

@janhilgard janhilgard closed this Mar 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants