[MoE][Offload] Run MoE models exceeding VRAM via expert CPU offloading with GPU cache (--moe-expert-cache-size) #37190

Open
e1n00r wants to merge 3 commits into vllm-project:main from e1n00r:feature/moe-expert-lru-cache

Commits

Commits on Apr 9, 2026