vocab: Support tokenizer for LFM2.5-8B-A1B by tdakhran · Pull Request #23826 · ggml-org/llama.cpp

tdakhran · 2026-05-28T15:53:20Z

Overview

LFM2.5-8B-A1B shares architecture with LFM2-8B-A1B but comes with a new extended tokenizer.

This PR adds support for it.

GGUFs are uploaded to LiquidAI/LFM2.5-8B-A1B-GGUF

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: NO

ngxson

🚀

CISC

Why both in pre_computed_hashes?

CISC

Hmmm, looking more closely, the LFM2 pre-tokenizer does not match LFM2.5; it uses a new regex:

'(?i:[sdmt]|ll|ve|re)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]|\\s+(?!\\S)|\\s
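To make the regex above easier to reason about, here is a minimal sketch of how a pre-tokenizer pattern of this shape splits text. It rests on several assumptions: the doubled backslashes in the quoted pattern are JSON escaping, and `\p{L}` / `\p{N}` are narrowed to ASCII letters and digits because Python's standard `re` module has no Unicode property classes. The names `PRETOKENIZE_ASCII` and `pre_tokenize` are illustrative only, and this is neither the llama.cpp implementation nor an exact reproduction of the model's tokenizer.

```python
import re

# ASCII-only approximation (assumption): \p{L} -> A-Za-z, \p{N} -> 0-9.
# Each alternative mirrors one branch of the regex quoted above.
PRETOKENIZE_ASCII = re.compile(
    r"'(?i:[sdmt]|ll|ve|re)"          # contractions: 's 'd 'm 't 'll 've 're
    r"|[^\r\nA-Za-z0-9]?[A-Za-z]+"    # optional leading symbol/space + letters
    r"|[0-9]{1,3}"                    # digits in groups of at most three
    r"| ?[^\sA-Za-z0-9]+[\r\n]*"      # punctuation run, optional leading space
    r"|\s*[\r\n]"                     # whitespace ending in a line break
    r"|\s+(?!\S)"                     # whitespace not followed by a non-space
    r"|\s"                            # any remaining single whitespace char
)


def pre_tokenize(text):
    """Split text into pre-token pieces using the approximated pattern."""
    return PRETOKENIZE_ASCII.findall(text)


pieces = pre_tokenize("I'll pay 12345 dollars!")
print(pieces)
# ['I', "'ll", ' pay', ' ', '123', '45', ' dollars', '!']
```

Note how the number is split into chunks of at most three digits (`123`, `45`). Details like this are where two pre-tokenizer regexes can diverge even when typical text tokenizes identically.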

tdakhran · 2026-05-29T06:08:57Z

Thanks for the feedback, @CISC.

> Why both in pre_computed_hashes?

I tried to place them both in the table, but then re-running convert_hf_to_gguf_update.py leaves conversion/base.py in an incorrect state. So I followed the falcon-h1 example and placed them into pre_computed_hashes.

> Hmmm, looking more closely, the LFM2 pre-tokenizer does not match LFM2.5; it uses a new regex:

On the regex: the difference is minimal, and testing on real-life use cases didn't show any difference.

However, we discovered that tool calling doesn't work with this chat template in llama.cpp (it works in other frameworks), and we are currently debugging it.

CISC · 2026-05-29T06:31:38Z

> Why both in pre_computed_hashes?
>
> I tried to place them both in the table, but then re-running convert_hf_to_gguf_update.py leaves conversion/base.py in an incorrect state. So I followed the falcon-h1 example and placed them into pre_computed_hashes.

One (the original) should stay in models; any duplicates go in pre_computed_hashes.

> Hmmm, looking more closely, the LFM2 pre-tokenizer does not match LFM2.5; it uses a new regex:
>
> On the regex: the difference is minimal, and testing on real-life use cases didn't show any difference.

Still, since there is an actual difference, it would be prudent to add an lfm2.5 pre-tokenizer.
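The split between models and pre_computed_hashes discussed above can be sketched as a lookup from a tokenizer fingerprint to a pre-tokenizer name. This is a simplified illustration, not the contents of convert_hf_to_gguf_update.py. The token IDs are made up, the `fingerprint` helper and the dictionary fields are hypothetical, and hashing the string form of a token-ID list with SHA-256 is an assumption about the scheme.

```python
from hashlib import sha256


def fingerprint(token_ids):
    """Hash a token-ID list into a hex digest (assumed fingerprint scheme)."""
    return sha256(str(token_ids).encode()).hexdigest()


# Hypothetical token IDs that two tokenizers produce for the same check text.
ORIGINAL_IDS = [101, 7592, 2088]
EXTENDED_IDS = [101, 7592, 2088, 65536]

# "models": one entry per pre-tokenizer name; its hash is recomputed each
# time the update script runs (here, from the stored token IDs).
models = [{"name": "lfm2", "token_ids": ORIGINAL_IDS}]

# "pre_computed_hashes": extra fingerprints that reuse an existing name and
# are stored as fixed hashes instead of being recomputed.
pre_computed_hashes = [{"name": "lfm2", "chkhsh": fingerprint(EXTENDED_IDS)}]


def build_lookup(models, pre_computed_hashes):
    """Map every known fingerprint to its pre-tokenizer name."""
    lookup = {}
    for entry in models:
        lookup[fingerprint(entry["token_ids"])] = entry["name"]
    for entry in pre_computed_hashes:
        lookup[entry["chkhsh"]] = entry["name"]
    return lookup


lookup = build_lookup(models, pre_computed_hashes)
print(lookup[fingerprint(ORIGINAL_IDS)], lookup[fingerprint(EXTENDED_IDS)])
```

Under these assumptions, both tokenizers resolve to the same pre-tokenizer name while only the original one lives in the recomputed table. That matches the arrangement requested in the review.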

tdakhran · 2026-05-29T15:58:20Z

@CISC, we reworked the chat template to use a regex similar to lfm2's; see https://huggingface.co/LiquidAI/LFM2.5-8B-A1B/discussions/5

I moved the existing tokenizer back to models; hope it looks good to merge!

CISC · 2026-05-29T18:04:57Z

> @CISC, we reworked the chat template to use a regex similar to lfm2's; see https://huggingface.co/LiquidAI/LFM2.5-8B-A1B/discussions/5
>
> I moved the existing tokenizer back to models; hope it looks good to merge!

That works too I guess. :)

* origin/master:
  vocab : support tokenizer for LFM2.5-8B-A1B (ggml-org#23826)
  graph : ensure DS32 kq_mask_lid is F32 (ggml-org#23864)
  server: remove obsolete scripts (ggml-org#23870)
  ci : update macos release to use macos-26 runner (ggml-org#23878)
  download: add option to skip_download (ggml-org#23059)
  mtmd: Add DeepSeekOCR 2 Support (ggml-org#20975)
  CUDA: Check PTX version on host side to guard PDL dispatch (ggml-org#23530)
  server: bump timeout to 3600s (ggml-org#23842)
  model : support for DeepseekV32ForCausalLM with generic DeepSeek Sparse Attention (DSA) implementation (ggml-org#23346)
  llama: use f16 mask for FA to save VRAM (ggml-org#23764)
  sync : ggml
  ggml : bump version to 0.13.1 (ggml/1523)
  ngram-mod : Add missing include (ggml-org#23857)
  llama: add llm_graph_input_mtp (ggml-org#23643)
  app : move licences to llama-app (ggml-org#23824)
  cuda : disables launch_fattn PDL enrollment due to compiler bug (ggml-org#23825)
  meta : Add missing `buffer` set in allreduce fallback !COMPUTE clear (ggml-org#23480)

* vocab: Support tokenizer for LFM2.5-8B-A1B
* Keep liquid6 tokenizer in models

tdakhran requested a review from CISC as a code owner May 28, 2026 15:53

github-actions bot added the "python" label (python script changes) May 28, 2026

ngxson approved these changes May 28, 2026

CISC approved these changes May 28, 2026

CISC requested changes May 28, 2026

tdakhran added 2 commits May 29, 2026 17:54

vocab: Support tokenizer for LFM2.5-8B-A1B

69cc285

Keep liquid6 tokenizer in models

6f8fa55

tdakhran force-pushed the tarek/feat/liquid7-tokenizer branch from 964007f to 6f8fa55 Compare May 29, 2026 15:56

tdakhran requested a review from CISC May 29, 2026 15:59

CISC approved these changes May 29, 2026

CISC merged commit 2084434 into ggml-org:master May 29, 2026
7 checks passed

tdakhran deleted the tarek/feat/liquid7-tokenizer branch May 29, 2026 21:11

fewtarius pushed a commit to fewtarius/llama.cpp that referenced this pull request May 30, 2026

vocab : support tokenizer for LFM2.5-8B-A1B (ggml-org#23826)

e03828f

* vocab: Support tokenizer for LFM2.5-8B-A1B
* Keep liquid6 tokenizer in models

turbo-tan pushed a commit to turbo-tan/llama.cpp-tq3 that referenced this pull request Jun 2, 2026

vocab : support tokenizer for LFM2.5-8B-A1B (ggml-org#23826)

98bb790

* vocab: Support tokenizer for LFM2.5-8B-A1B
* Keep liquid6 tokenizer in models
