graph : utilize ggml_build_forward_select() to avoid reallocations #18898
Merged
ngxson reviewed on Jan 18, 2026
src/llama-graph.cpp (outdated)

```cpp
inp->embd = ggml_new_tensor_2d(ctx0, GGML_TYPE_F32, n_embd, ubatch.n_tokens);
ggml_set_input(inp->embd);

if (hparams.n_deepstack_layers > 0) {
```
Contributor
I think the more generic condition here is `n_embd_inp != n_embd`
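To illustrate the suggestion, here is a standalone sketch (not llama.cpp code) of the two guards. It assumes, as a labeled assumption, that for deepstack models the input embedding width is the base width times `(n_deepstack_layers + 1)`; the generic check also covers any other model whose input width differs from the base width.

```cpp
#include <cassert>
#include <cstdint>

// Standalone sketch of the two guards discussed above (not llama.cpp code).
// Assumption: for deepstack models the input embedding width is the base
// width times (n_deepstack_layers + 1); other models may widen the input
// for different reasons, which is why the generic check is preferred.
struct hparams_sketch {
    int64_t n_embd;             // base embedding width
    int64_t n_deepstack_layers; // extra stacked feature layers (deepstack)

    int64_t n_embd_inp() const {
        return n_embd * (n_deepstack_layers + 1);
    }
};

// Specific guard: only fires for deepstack models.
bool needs_wide_input_deepstack(const hparams_sketch & hp) {
    return hp.n_deepstack_layers > 0;
}

// Generic guard: fires whenever the input width differs from the base width.
bool needs_wide_input_generic(const hparams_sketch & hp) {
    return hp.n_embd_inp() != hp.n_embd;
}
```

Both guards agree for deepstack models, but only the generic one would also catch a non-deepstack model with a widened input.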
src/llama-graph.cpp (outdated), lines +1331 to +1332

```cpp
cur = ggml_view_2d(ctx0, cur, hparams.n_embd, n_tokens, cur->nb[1], 0);
cur = ggml_cont (ctx0, cur); // makes the shape of this node the same as the ubatch.token path
```
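For readers unfamiliar with these ops: the `ggml_view_2d` + `ggml_cont` pair keeps only the first `n_embd` values of each `n_embd_inp`-wide row and materializes the result contiguously, giving the node the same shape as the token path. A plain-C++ sketch of that shape arithmetic (not the actual ggml kernels):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain-C++ sketch of what the ggml_view_2d + ggml_cont pair above does:
// from a row-major [n_tokens x n_embd_inp] buffer, keep only the first
// n_embd values of every row and copy them into a contiguous
// [n_tokens x n_embd] buffer (the shape the token path produces).
std::vector<float> trim_rows(const std::vector<float> & src,
                             size_t n_tokens, size_t n_embd_inp, size_t n_embd) {
    assert(n_embd <= n_embd_inp);
    assert(src.size() == n_tokens * n_embd_inp);

    std::vector<float> dst(n_tokens * n_embd);
    for (size_t t = 0; t < n_tokens; ++t) {
        for (size_t i = 0; i < n_embd; ++i) {
            // the row stride of the view (cur->nb[1] in ggml terms) stays
            // n_embd_inp; ggml_cont then copies into the tighter layout
            dst[t * n_embd + i] = src[t * n_embd_inp + i];
        }
    }
    return dst;
}
```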
Contributor
Instead of resizing the input embeddings' row size down to `n_embd`, I'm wondering if we can/should do the reverse: pad the input token embeddings up to `n_embd_inp` using `ggml_pad`
Contributor
Note that doing it this way would allow us to remove the `ggml_build_forward_select` in the model cgraph, since the `ggml_add(ctx0, cur, ds)` path would always be taken. However, I'm a bit worried that `ggml_pad` could have a negative impact on performance
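The suggested reverse approach can be sketched in plain C++ as zero-padding each token-embedding row from `n_embd` up to `n_embd_inp`, so both input types share the wider shape and the `ggml_add(ctx0, cur, ds)` path always type-checks. How this would be wired up with `ggml_pad` in the real graph is an assumption; this only models the data movement:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain-C++ sketch of the suggested alternative: instead of trimming the
// embedding input down to n_embd, zero-pad the token embeddings up to
// n_embd_inp so both input types share the wider shape. In ggml this
// would roughly be a ggml_pad along dim 0 (an assumption about wiring).
std::vector<float> pad_rows(const std::vector<float> & src,
                            size_t n_tokens, size_t n_embd, size_t n_embd_inp) {
    assert(n_embd_inp >= n_embd);
    assert(src.size() == n_tokens * n_embd);

    std::vector<float> dst(n_tokens * n_embd_inp, 0.0f); // padded region stays zero
    for (size_t t = 0; t < n_tokens; ++t) {
        for (size_t i = 0; i < n_embd; ++i) {
            dst[t * n_embd_inp + i] = src[t * n_embd + i];
        }
    }
    return dst;
}
```

The performance concern in the comment is that this copy (and the zeros it adds) happens on every batch, whereas the trim-down approach only pays when embeddings are the input.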
Force-pushed from 9a066d0 to 3daf8e3 (compare)
ronaldmannak pushed a commit to PicoMLX/llama.cpp that referenced this pull request on Jan 24, 2026:

…gml-org#18898)
* graph : avoid branches between embedding and token inputs
* models : make deepstack graphs (e.g. Qwen3 VL) have constant topology
* ci : enable -DGGML_SCHED_NO_REALLOC=ON for server CI
* cont : pad token embeddings to n_embd_inp
Anhelor pushed a commit with the same message to Anhelor/llama.cpp that referenced this pull request on Jan 24, 2026.
shaofeiqi pushed a commit with the same message to qualcomm/llama.cpp that referenced this pull request on Feb 6, 2026.
target #18550
cont #17617

Extracted the usage of the new ggml_build_forward_select() from #18550 into a separate PR in order to more clearly demonstrate how it can be applied to avoid graph reallocations. Here we utilize it to avoid reallocations when switching between different types of inputs (tokens or embeddings) for most models.

Also enable the server CI to report errors if unexpected reallocations occur during the server tests.
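The motivation can be sketched conceptually: if the compute graph sometimes contains the token-input branch and sometimes the embedding-input branch, its node list changes between batches and the backend scheduler must reallocate; always building both branches and merely selecting one keeps the topology constant. The sketch below models only this idea in standalone C++; it does not use the real `ggml_build_forward_select()` API (which is defined in #18550):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Conceptual sketch of why a constant-topology graph avoids reallocations.
// This does NOT use the real ggml_build_forward_select() API; it only
// models the idea: always build both input branches, then mark one as
// selected, so the node list seen by the scheduler is identical for
// every batch.
struct node { std::string name; bool selected; };

std::vector<node> build_graph(bool use_token_input) {
    // both branches are always present -> constant topology
    return {
        { "inp_tokens", use_token_input  },
        { "inp_embd",   !use_token_input },
        { "result",     true             },
    };
}

// A scheduler only needs to reallocate when the node list changes shape.
bool needs_realloc(const std::vector<node> & prev, const std::vector<node> & next) {
    if (prev.size() != next.size()) return true;
    for (size_t i = 0; i < prev.size(); ++i) {
        if (prev[i].name != next[i].name) return true;
    }
    return false;
}
```

Switching between token and embedding inputs flips only the `selected` flags, never the node list, so `needs_realloc` stays false across batches.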