Conversation

@ggerganov
Member

  • Add "model_alias" to /props endpoint
  • Render the model alias when specified
llama-server -hf ggml-org/gpt-oss-20b-GGUF --jinja -c 0 --port 8033 --alias gpt-oss-20-example
(screenshot: webui rendering the model alias "gpt-oss-20-example")
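For context, a client can read the new field from the `/props` endpoint. A minimal sketch in TypeScript, assuming the server started with the command above is listening on port 8033, and assuming a fallback on the pre-existing `model_path` field when no alias is set:

```ts
// Minimal sketch: fetch /props and prefer the alias when one was set.
// Assumes the llama-server instance from the command above (port 8033).
// The fallback to model_path, and an unset alias arriving as an empty
// string, are assumptions about the response shape, not confirmed here.
async function getDisplayedModelName(baseUrl: string): Promise<string> {
  const res = await fetch(`${baseUrl}/props`);
  if (!res.ok) throw new Error(`GET /props failed: ${res.status}`);
  const props = await res.json();
  // model_alias is the field this PR adds; fall back to the model path.
  return props.model_alias || props.model_path || "unknown model";
}

getDisplayedModelName("http://localhost:8033")
  .then((name) => console.log(name)); // e.g. "gpt-oss-20-example"
```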

@allozaur
Collaborator

allozaur commented Nov 2, 2025

Looking good to me!

@CISC
Collaborator

CISC commented Nov 2, 2025

Not a massively helpful message, but I guess something needs to be fixed :)
https://github.com/ggml-org/llama.cpp/actions/runs/19014962680/job/54301478932?pr=16943#step:6:15

@allozaur
Collaborator

allozaur commented Nov 2, 2025

> Not a massively helpful message, but I guess something needs to be fixed :)
>
> https://github.com/ggml-org/llama.cpp/actions/runs/19014962680/job/54301478932?pr=16943#step:6:15

I hadn't seen that check failing in CI. @ggerganov you can just run `npm run format` locally and push.

@ggerganov ggerganov force-pushed the gg/server-use-alias branch from e63315b to 0e42f25 on November 3, 2025 09:33
@allozaur allozaur merged commit 48bd265 into master Nov 3, 2025
68 of 70 checks passed
@ggerganov ggerganov deleted the gg/server-use-alias branch November 3, 2025 13:46
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Nov 3, 2025
* origin/master: (169 commits)
opencl: support imrope (ggml-org#16914)
fix: Viewing multiple PDF attachments (ggml-org#16974)
model-conversion : pass config to from_pretrained (ggml-org#16963)
server : add props.model_alias (ggml-org#16943)
ggml: CUDA: add head size 72 for flash-attn (ggml-org#16962)
mtmd: add --image-min/max-tokens (ggml-org#16921)
mtmd: pad mask for qwen2.5vl (ggml-org#16954)
ggml : LoongArch fixes (ggml-org#16958)
sync: minja (glm 4.6 & minmax m2 templates) (ggml-org#16949)
SYCL: optimized repeat_back kernel (3× fewer asm instructions, 2× faster) (ggml-org#16869)
feat(webui): improve LaTeX rendering with currency detection (ggml-org#16508)
test-backend-ops : fix segfault in moe-expert-reduce test in support mode and coverage (ggml-org#16936)
ci : disable failing riscv cross build (ggml-org#16952)
model: add Janus Pro for image understanding (ggml-org#16906)
clip : use FA (ggml-org#16837)
server : support unified cache across slots (ggml-org#16736)
common : move gpt-oss reasoning processing to init params (ggml-org#16937)
docs: remove llama_sampler_accept reference in sampling sample usage (ggml-org#16920)
CUDA: add FLOOR, CEIL, ROUND, TRUNC unary ops (ggml-org#16917)
devops: fix failing s390x docker build (ggml-org#16918)
...
GittyBurstein pushed a commit to yael-works/llama.cpp that referenced this pull request Nov 5, 2025
* server : add props.model_alias

* webui : npm run format

5 participants