app: re-inject subcommand when router spawns children under unified binary by ServeurpersoCom · Pull Request #23442 · ggml-org/llama.cpp

ServeurpersoCom · 2026-05-20T21:30:57Z

Overview

When built with LLAMA_BUILD_APP=ON, /proc/self/exe resolves to the unified llama binary, so the router spawns each child as "llama --host ...", which exits with an unknown-command error. The dispatcher now exports the subcommand in the LLAMA_APP_CMD environment variable and the router re-injects it, so the child starts as "llama serve ...". The standalone llama-server binary is unaffected.
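To make the re-injection concrete, here is a minimal sketch of how a router could assemble the child's argument vector. It is not the code from this PR: `build_child_argv` and `build_child_argv_from_env` are hypothetical helper names, and the only details taken from the description are the LLAMA_APP_CMD variable and the "llama serve ..." result.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper (not the PR's actual code): build the argv for a child
// process. `self_exe` is the path the router resolved for itself (for example
// from /proc/self/exe), `app_cmd` is the exported subcommand or nullptr.
std::vector<std::string> build_child_argv(const std::string & self_exe,
                                          const char * app_cmd,
                                          const std::vector<std::string> & child_args) {
    std::vector<std::string> argv;
    argv.push_back(self_exe);
    // Under the unified binary the subcommand must come right after argv[0];
    // for the standalone binary the variable is unset and nothing is injected.
    if (app_cmd != nullptr && app_cmd[0] != '\0') {
        argv.emplace_back(app_cmd);
    }
    argv.insert(argv.end(), child_args.begin(), child_args.end());
    return argv;
}

// Convenience wrapper that reads the subcommand exported by the dispatcher.
std::vector<std::string> build_child_argv_from_env(const std::string & self_exe,
                                                   const std::vector<std::string> & child_args) {
    return build_child_argv(self_exe, std::getenv("LLAMA_APP_CMD"), child_args);
}
```

With LLAMA_APP_CMD set to "serve", the sketch yields "llama serve --host ..."; with the variable unset it yields the arguments unchanged, which matches the claim that the standalone llama-server binary is unaffected.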

Additional information

@ngxson WDYT? There are several possible approaches here.

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: YES


ggerganov · 2026-05-21T06:29:41Z

Can we do discovery on startup and avoid setting env variables? Check which of the two tools is available. Print an error if neither is.

angt · 2026-05-21T06:35:51Z

The current way of doing it is very fragile, btw.

angt · 2026-05-21T06:48:20Z

Setting the env from llama-app as a quickfix LGTM, but at some point I believe we should reorganize the code to spawn many models.

ServeurpersoCom · 2026-05-21T07:46:37Z

Yes, intended as a quickfix. We can do better in a follow-up PR.

* origin/master: (138 commits)
  fix(flash-attn): replace f32 with kv_type and q_type (ggml-org#23372)
  tests : move save-load-state from examples to tests (ggml-org#23336)
  server: expose prompt token counts in /slots endpoint (ggml-org#23454)
  metal : optimize concat kernel and fix set kernel threads (ggml-org#23411)
  server : free draft/MTP resources on sleep to fix VRAM leak (ggml-org#23461)
  server: re-inject subcommand when router spawns children under unified binary (ggml-org#23442)
  app : add batched-bench, fit-params, quantize & perplexity (ggml-org#23459)
  mtp: use inp_out_ids for skipping logit computation (ggml-org#23433)
  vocab : add Carbon-3B (HybridDNATokenizer) support (ggml-org#23410)
  doc: fix spec mtp typo (ggml-org#23435)
  ui: Improve Git Hooks for UI development (ggml-org#23403)
  ggml : Check the right iface method before using the fallback 2d get (ggml-org#23306)
  llama-graph: fix null-buffer crash in llm_graph_input_attn_kv_iswa for SWA-only models (ggml-org#23131)
  hexagon: ssm-conv fix for large prompts (ggml-org#23307)
  app : show version (ggml-org#23426)
  mtmd, model : merge HunyuanOCR into HunyuanVL and fix OCR vision precision (ggml-org#23329)
  ui: Add max image size option (ggml-org#22849)
  Move to backend sampling for MTP draft path (ggml-org#23287)
  opencl: refactor backend initilization (ggml-org#23318)
  common/speculative : fix nullptr crash in get_devices_str (ggml-org#23386)
  ...

server: re-inject subcommand when router spawns children under unified binary

c222838

ServeurpersoCom requested a review from a team as a code owner May 20, 2026 21:30

ServeurpersoCom mentioned this pull request May 20, 2026

app : introduce the llama unified executable #23296

Merged

allozaur approved these changes May 20, 2026


github-actions bot added the examples and server labels May 21, 2026

ggerganov approved these changes May 21, 2026


allozaur merged commit c902171 into ggml-org:master May 21, 2026
49 checks passed

ProTekk pushed a commit to ProTekk/buun-llama-cpp that referenced this pull request May 21, 2026

server: re-inject subcommand when router spawns children under unified binary (ggml-org#23442)

4748eec

nyo16 mentioned this pull request May 21, 2026

Bump llama.cpp to 52fb93a2b (30 commits) nyo16/llama_cpp_ex#42

Merged

4 tasks

baramofme pushed a commit to baramofme/llama-cpp-turboquant that referenced this pull request May 23, 2026

server: re-inject subcommand when router spawns children under unified binary (ggml-org#23442)

fd90020

srossitto79 pushed a commit to srossitto79/llama.cpp that referenced this pull request May 23, 2026

server: re-inject subcommand when router spawns children under unified binary (ggml-org#23442)

41d2868

fewtarius pushed a commit to fewtarius/llama.cpp that referenced this pull request May 30, 2026

server: re-inject subcommand when router spawns children under unified binary (ggml-org#23442)

0b08480

turbo-tan pushed a commit to turbo-tan/llama.cpp-tq3 that referenced this pull request Jun 2, 2026

server: re-inject subcommand when router spawns children under unified binary (ggml-org#23442)

2770812

allozaur merged 1 commit into ggml-org:master from ServeurpersoCom:fix/router-spawns-unified

ServeurpersoCom commented May 20, 2026 • edited by ggerganov


4 participants
