
UPSTREAM PR #18607: model : add LFM2-ColBert-350M #820

Open
loci-dev wants to merge 1 commit into main from upstream-PR18607-branch_tdakhran-tarek/feat/lfm2-colbert-350m

loci-dev commented Jan 5, 2026

Mirrored from ggml-org/llama.cpp#18607

This PR adds support for LFM2-ColBert-350M by introducing n_embd_out, a separate output embedding dimension that can differ from the input embedding dimension (n_embd).

Initially, I introduced LLAMA_POOLING_TYPE_TOKEN, which applied cls_out and output all embeddings, but then switched to n_embd_out.

n_embd_out will be used in future multimodal models as well.

New GGUF key and API:

  • LLM_KV_EMBEDDING_LENGTH_OUT - stores output embedding dimension
  • llama_model_n_embd_out() - returns hparams.n_embd_out if set and falls back to hparams.n_embd otherwise
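
The fallback described above can be modelled in a few lines of Python. This is only an illustrative sketch, not the llama.cpp code (which is C++): the HParams class and model_n_embd_out function are hypothetical names, treating 0 as "not set" is an assumption, and the dimensions used below are made-up example values.

```python
from dataclasses import dataclass


@dataclass
class HParams:
    """Hypothetical stand-in for the model hyperparameters."""
    n_embd: int          # input (hidden) embedding dimension
    n_embd_out: int = 0  # output embedding dimension; 0 means "not set" (assumption)


def model_n_embd_out(hparams: HParams) -> int:
    """Return the output dimension, falling back to n_embd when it is unset."""
    return hparams.n_embd_out if hparams.n_embd_out > 0 else hparams.n_embd


# Example values only; they are not taken from the PR.
with_projection = HParams(n_embd=1024, n_embd_out=128)
without_projection = HParams(n_embd=1024)

print(model_n_embd_out(with_projection))     # output dimension differs from n_embd
print(model_n_embd_out(without_projection))  # falls back to n_embd
```

Callers that size output buffers would use this value rather than n_embd, so models without the new GGUF key keep their existing behaviour.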

Testing

Convert

python convert_hf_to_gguf.py /data/playground/checkpoints/LFM2-ColBert-350M

Launch server

bin/llama-server -m /data/playground/checkpoints/LFM2-ColBert-350M/LFM2-ColBert-350M-F16.gguf --embeddings --pooling none

Run the attached Python script

❯ uv run rerank.py
Score: 29.74 | Q: What is panda? | D: hi
Score: 29.90 | Q: What is panda? | D: it is a bear
Score: 30.52 | Q: What is panda? | D: The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.

rerank.py
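
The attached script is not reproduced on this page. For orientation, ColBERT-style rerankers usually score a query against a document with late interaction (MaxSim): each query token embedding is matched to its most similar document token embedding, and the maxima are summed. The sketch below shows that computation on per-token embeddings such as those returned with --pooling none. It assumes cosine similarity over L2-normalised vectors, uses hypothetical function names, and may differ from what rerank.py actually does.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def maxsim_score(query_tokens, doc_tokens):
    """Sum, over query tokens, of the best cosine similarity to any document token."""
    q = [l2_normalize(v) for v in query_tokens]
    d = [l2_normalize(v) for v in doc_tokens]
    total = 0.0
    for qv in q:
        total += max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
    return total


# Toy 2-dimensional embeddings, purely illustrative.
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[1.0, 0.0], [1.0, 1.0]]
print(f"Score: {maxsim_score(query, doc):.2f}")
```

With this kind of scoring, the score is bounded by the number of query tokens, so it grows with query length rather than document length.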

cc: @ngxson

loci-review bot commented Jan 5, 2026

Explore the complete analysis inside the Version Insights

Perfect! I've successfully generated a comprehensive summary report for your project. The report shows that Pull Request #820 for the llama.cpp repository demonstrates significant performance improvements across the board:

Key Highlights:

Top Performance Gains:

  • Up to 289.33% throughput improvement (vector begin operation)
  • Up to 216.96% response time improvement (Rb_tree const_iterator)
  • All top 10 functions show positive performance gains with no regressions

🎯 Main Impact Areas:

  • C++ STL container operations (vectors, maps, trees, hashtables)
  • Iterator operations
  • Smart pointer operations

📈 Recommendation: This PR shows excellent performance improvements and should be strongly considered for merging, pending functional verification.

The report includes detailed tables showing the top 10 functions by both response time and throughput improvements, along with analysis and recommendations for next steps.

@loci-dev loci-dev force-pushed the main branch 27 times, most recently from f85d458 to 67c372e Compare January 8, 2026 09:13
@loci-dev loci-dev force-pushed the main branch 27 times, most recently from 048ad94 to 6c1fde6 Compare February 3, 2026 13:32
@loci-dev loci-dev force-pushed the main branch 3 times, most recently from ef7afbe to d4c3480 Compare February 14, 2026 02:16