Fix meta backend tensor reads for split tensors during state serialization #22063

Merged
JohannesGaessler merged 1 commit into ggml-org:master from ssam18:fix/meta-buffer-get-tensor-multi-segment on Apr 18, 2026

Conversation

ssam18 (Contributor) commented Apr 17, 2026

This PR fixes a crash when saving recurrent state with tensor-split models using the meta backend. The previous code assumed that a tensor read would always map to a single segment, which is not always true when -sm tensor is enabled. The fix handles multi-segment tensor reads correctly instead of hitting the split_state.n_segments == 1 assertion. This should allow checkpoint/state serialization to work reliably with tensor-parallel CUDA setups. Fixes #22058
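
To illustrate the idea of a multi-segment read, here is a minimal sketch, not the code from this PR. It assumes each segment of a split tensor covers one contiguous byte range of the logical tensor, which may not match how the meta backend actually lays out split data. The names `segment` and `read_split` are hypothetical and are not part of ggml.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for one device-local piece of a split tensor.
// Assumption: each piece holds a contiguous byte range of the logical tensor.
struct segment {
    size_t               offset; // start of this piece within the logical tensor, in bytes
    std::vector<uint8_t> data;   // bytes held by this piece
};

// Copy `size` bytes starting at logical byte `offset` into `dst`.
// Instead of requiring exactly one segment, visit every segment that
// overlaps the requested range and copy only the overlapping part.
// Returns the number of bytes actually copied.
static size_t read_split(const std::vector<segment> & segs, void * dst, size_t offset, size_t size) {
    assert(dst != nullptr || size == 0);

    const size_t end    = offset + size;
    size_t       copied = 0;

    for (const segment & s : segs) {
        const size_t seg_begin = s.offset;
        const size_t seg_end   = s.offset + s.data.size();

        // Intersection of [offset, end) with [seg_begin, seg_end).
        const size_t lo = std::max(offset, seg_begin);
        const size_t hi = std::min(end, seg_end);
        if (lo >= hi) {
            continue; // this segment does not overlap the requested range
        }

        std::memcpy(static_cast<uint8_t *>(dst) + (lo - offset),
                    s.data.data() + (lo - seg_begin),
                    hi - lo);
        copied += hi - lo;
    }
    return copied;
}
```

A caller can compare the return value against `size` to detect a range that is not fully covered by the segments, rather than asserting on the segment count up front.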

github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) Apr 17, 2026

pwilkin (Member) commented Apr 18, 2026

Can confirm it now works. Performance is slightly worse: 93 t/s token generation vs. 110 t/s with -sm layer.

JohannesGaessler merged commit 59accc8 into ggml-org:master Apr 18, 2026
50 of 51 checks passed

pwilkin (Member) commented Apr 18, 2026

Actually, never mind, I was missing NCCL. With NCCL installed, performance is up to 123 t/s.

samuraieng pushed a commit to samuraieng/llama.cpp that referenced this pull request Apr 19, 2026
mengqin pushed a commit to mengqin/llama.cpp that referenced this pull request Apr 20, 2026
ArberSephirotheca pushed a commit to ArberSephirotheca/llama.cpp that referenced this pull request Apr 21, 2026
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Apr 23, 2026
rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 1, 2026
jimbothigpen pushed a commit to jimbothigpen/frankenturbo2 that referenced this pull request May 2, 2026
ljubomirj pushed a commit to ljubomirj/llama.cpp that referenced this pull request May 6, 2026
Labels

ggml changes relating to the ggml tensor library for machine learning

Development

Successfully merging this pull request may close these issues.

Eval bug: Qwen 3.6 35B fails on -sm tensor

3 participants