Merged
19 commits
b5f0b7c
[Metal] Fix hybrid model KV cache page size alignment
Alex-ai-future Apr 5, 2026
bce19e3
[Metal] Fix Qwen3.5 hybrid model initialization
Alex-ai-future Apr 5, 2026
12a609e
Restore model_runner.py to upstream version
Alex-ai-future Apr 5, 2026
d185630
Fix test for determine_available_memory single_sequence mode
Alex-ai-future Apr 5, 2026
c533f45
[Metal] Add unit tests for update_block_size_for_backend and improve …
Alex-ai-future Apr 5, 2026
2bf692e
[Metal] Improve error handling in update_block_size_for_backend
Alex-ai-future Apr 5, 2026
3598de5
[Metal] Add MLA support to update_block_size_for_backend
Alex-ai-future Apr 6, 2026
6315e47
[Metal] Fix lint issues in MLA support changes
Alex-ai-future Apr 6, 2026
1c22bf0
[Metal] Fix ruff format issues
Alex-ai-future Apr 6, 2026
3049380
use new device
Alex-ai-future Apr 6, 2026
ae3722f
[Metal] Address reviewer feedback on hybrid + paged attention
Alex-ai-future Apr 6, 2026
ffbe490
[Metal] Fix import order and add test for hybrid + paged error case
Alex-ai-future Apr 6, 2026
3234dd7
[Metal] Fix inaccurate docstring and test comments
Alex-ai-future Apr 6, 2026
fe70fc9
[Metal] Fix test_prefix_cache.py to use mx.device_info()
Alex-ai-future Apr 6, 2026
c768a77
[Metal] Add warning for hybrid + paged attention block-size translation
Alex-ai-future Apr 8, 2026
cbd6a91
[Metal] Use cache_config.gpu_memory_utilization instead of hardcoded 0.8
Alex-ai-future Apr 8, 2026
74854e5
[Metal] Remove code duplication in update_block_size_for_backend
Alex-ai-future Apr 8, 2026
a4f9512
[Metal] Restore one-sequence estimate in determine_available_memory
Alex-ai-future Apr 8, 2026
4c8e508
[Test] Update tests for hybrid + paged attention warning and one-sequ…
Alex-ai-future Apr 8, 2026