UPSTREAM PR #17744: model: add llama 4 scaling for mistral-large (deepseek arch) #423
Conversation
Performance Analysis Summary: PR #423

Overview: PR #423 adds Llama 4 attention temperature scaling support for Mistral Large models that use the DeepSeek2 architecture. The change enables context lengths beyond 16K by adding optional temperature-tuning parameters. Modified files: 2.
Force-pushed df48f9e to cb46586
Force-pushed 048ad94 to 6c1fde6
Force-pushed ef7afbe to d4c3480
Mirrored from ggml-org/llama.cpp#17744
Continuation of ggml-org/llama.cpp#17730
This should allow Mistral Large to go past a 16K context length (hopefully someone with enough VRAM can verify whether it works).
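For context, the Llama 4-style temperature tuning referenced here can be sketched as a position-dependent scale applied to the attention queries, so that attention logits at very long positions are not flattened by softmax. The constants below (`floor_scale=8192`, `attn_scale=0.1`) are the published Llama 4 defaults and are assumptions for illustration, not values confirmed by this PR:

```python
import math

def attn_temp_scale(pos: int, floor_scale: int = 8192, attn_scale: float = 0.1) -> float:
    """Llama 4-style attention temperature tuning (sketch).

    Returns a multiplier for the query at token position `pos`.
    The scale is 1.0 for short contexts and grows logarithmically
    once positions exceed `floor_scale`, sharpening long-context
    attention. Constants are assumed Llama 4 defaults.
    """
    return math.log(math.floor((pos + 1) / floor_scale) + 1) * attn_scale + 1.0

# Short positions are unaffected; positions past 16K get a boost.
print(attn_temp_scale(0))       # 1.0
print(attn_temp_scale(20000))   # > 1.0
```

Since the scale is 1.0 below `floor_scale`, behavior at short context lengths is unchanged, which is why the parameters can be optional.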