UPSTREAM PR #18607: model : add LFM2-ColBert-350M (#820)
Mirrored from ggml-org/llama.cpp#18607
This PR adds support for LFM2-ColBert-350M by introducing `n_embd_out`, a separate output embedding dimension that can differ from the input embedding dimension (`n_embd`).

Initially, I introduced `LLAMA_POOLING_TYPE_TOKEN`, which applied `cls_out` and output all embeddings, but then switched to `n_embd_out`. `n_embd_out` will be used in future multimodal models as well.

New GGUF key and API:
- `LLM_KV_EMBEDDING_LENGTH_OUT` stores the output embedding dimension in `hparams.n_embd_out` if set, falling back to `hparams.n_embd`.

Testing
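The fallback behavior described above can be sketched as follows (a minimal illustration, not the PR's C++ implementation; the metadata key names used here are assumptions):

```python
# Hypothetical sketch of the output-embedding-dimension fallback:
# prefer the value stored under the new output-embedding-length key,
# otherwise fall back to the regular embedding length.
# Key names below are illustrative, not the exact GGUF keys.

def resolve_n_embd_out(metadata: dict) -> int:
    n_embd = metadata["lfm2.embedding_length"]                 # input dim (n_embd)
    return metadata.get("lfm2.embedding_length_out", n_embd)   # n_embd_out if present

# A model without the new key keeps output dim equal to input dim.
print(resolve_n_embd_out({"lfm2.embedding_length": 1024}))  # → 1024
```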
1. Convert
2. Launch server
3. Run the attached Python script `rerank.py`
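The attached `rerank.py` is not reproduced here. For context, ColBERT-style reranking scores a query against a document with MaxSim over per-token embeddings, which can be sketched as (a pure-Python illustration, not the PR's script):

```python
# Illustrative ColBERT-style MaxSim scoring (not the PR's rerank.py).
# Each text is one vector per token; the score sums, over query tokens,
# the maximum dot product against any document token.

def maxsim_score(query_vecs, doc_vecs):
    score = 0.0
    for q in query_vecs:
        score += max(sum(qi * di for qi, di in zip(q, d)) for d in doc_vecs)
    return score

query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]   # matches both query tokens
doc_b = [[0.5, 0.0], [0.0, 0.2]]   # weaker matches
print(maxsim_score(query, doc_a))  # → 2.0
print(maxsim_score(query, doc_b))  # → 0.7
```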
cc: @ngxson