UPSTREAM PR #19249: support infill for Falcon-H1-Tiny-Coder#1123

Open
loci-dev wants to merge 1 commit into main from loci/pr-19249-infill-falcon-h1

Conversation

@loci-dev loci-dev commented Feb 1, 2026

Note

Source pull request: ggml-org/llama.cpp#19249

Added the FIM tokens used by Falcon-H1-Tiny-Coder (see https://huggingface.co/tiiuae/Falcon-H1-Tiny-Coder-90M-GGUF#usage) so that the llama-server POST /infill endpoint works with this model.
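For context, a minimal sketch of what a client sends to the /infill endpoint once FIM support is in place. The server wraps `input_prefix`/`input_suffix` with the model's FIM tokens itself, so the client supplies only raw text; the host/port and the sample snippet below are assumptions, not part of this PR.

```python
import json

def build_infill_payload(prefix: str, suffix: str, n_predict: int = 64) -> str:
    """Assemble the JSON body for llama-server's POST /infill handler."""
    payload = {
        "input_prefix": prefix,   # code before the cursor
        "input_suffix": suffix,   # code after the cursor
        "n_predict": n_predict,   # cap on generated tokens
    }
    return json.dumps(payload)

body = build_infill_payload("def add(a, b):\n    return ", "\n\nprint(add(1, 2))")
# POST this body to e.g. http://localhost:8080/infill (default port is an assumption):
#   curl -X POST http://localhost:8080/infill -d '<body>'
```

The model then generates the "middle" text that fits between prefix and suffix, which is what the added FIM vocabulary tokens enable for Falcon-H1-Tiny-Coder.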


loci-review bot commented Feb 1, 2026

Overview

Analysis of llama.cpp across 115,331 functions (9 modified, 4 new, 4 removed, 115,314 unchanged) reveals negligible performance impact from a single commit adding Falcon-H1-Tiny-Coder FIM vocabulary tokens.

Power Consumption Changes:

  • build.bin.libllama.so: +0.03% (+75.13 nJ)
  • build.bin.llama-cvector-generator: +0.00%
  • build.bin.llama-tts: +0.00%
  • build.bin.libmtmd.so: +0.00%
  • build.bin.llama-tokenize: 0.00%
  • build.bin.llama-quantize: 0.00%
  • build.bin.llama-qwen2vl-cli: 0.00%
  • build.bin.libggml-base.so: 0.00%
  • build.bin.libggml-cpu.so: 0.00%
  • build.bin.libggml.so: 0.00%
  • build.bin.llama-gguf-split: 0.00%
  • build.bin.llama-llava-cli: 0.00%
  • build.bin.llama-minicpmv-cli: 0.00%
  • build.bin.llama-gemma3-cli: 0.00%
  • build.bin.llama-bench: 0.00%

System-wide power consumption increased by 0.005% (+78.52 nJ).

Function Analysis

All measurable changes affect C++ standard library functions, not llama.cpp application code:

_M_swap_data (std::vector internal, build.bin.libllama.so): Response time improved 243.10ns → 169.67ns (-30.2%), throughput time 150.81ns → 77.38ns (-48.7%). Compiler optimization improvement in vector swap operations used during regex tokenization.

operator+ (iterator arithmetic, build.bin.libllama.so): Response time regressed 141.11ns → 177.26ns (+25.6%), throughput time 119.84ns → 155.99ns (+30.2%). Used in BPE tokenization paths with llm_symbol vectors. Regression likely from compiler version differences.

_M_insert_character_class_matcher (regex compiler, build.bin.libllama.so): Response time 27,224.06ns → 27,319.64ns (+0.35%), throughput time 251.63ns → 347.50ns (+38.1%). A significant self-time regression with negligible overall impact: this function runs only during initialization (argument parsing and template loading), not in inference paths.

Source code changes (vocabulary token additions in llama-vocab.cpp) do not directly affect these standard library functions. Performance variations stem from build environment differences rather than code modifications.

Additional Findings

Core inference operations remain unchanged: matrix multiplication (70-90% of inference time), attention computation, KV cache management, and the GPU acceleration backends show zero modification. The vocabulary addition is purely additive and does not touch performance-critical paths. Tokenization may see a 1-2% regression in worst-case scenarios, which translates to <0.1% overall inference impact.

🔎 Full breakdown: Loci Inspector.
💬 Questions? Tag @loci-dev.

@loci-dev loci-dev force-pushed the main branch 26 times, most recently from cbda11a to 03fef13 Compare February 3, 2026 00:46
@loci-dev loci-dev force-pushed the main branch 10 times, most recently from 823244c to bab7d39 Compare February 19, 2026 02:17
@loci-dev loci-dev force-pushed the main branch 10 times, most recently from a92fe2a to 6495042 Compare February 27, 2026 02:17
@loci-dev loci-dev force-pushed the main branch 8 times, most recently from 4298c74 to 0db6c47 Compare March 7, 2026 02:16
@loci-dev loci-dev force-pushed the main branch 2 times, most recently from 8019888 to 17452e3 Compare March 9, 2026 02:17