UPSTREAM PR #18825: model: Add PaddleOCR-VL model support (#914)
Conversation
### Performance Review Report

**Summary**

The target version introduces PaddleOCR-VL multimodal model support across 11 commits, with 21 modified files, 39 additions, and 3 deletions. Performance impact is negligible: libllama.so shows a +0.18% power consumption increase and libmtmd.so a +0.70% increase, while all other binaries are unchanged. The top 10 functions with the largest percentage changes are all standard library utilities (STL vector/iterator operations) with absolute timing differences under 200 ns, representing compiler optimization variance rather than algorithmic regressions.

**Code Changes Context**

The commit history shows functional additions for PaddleOCR-VL vision-language model integration: 4D m-rope position encoding, 16x patch merging for memory efficiency, dynamic resolution support, and conservative warmup settings (784 tokens) to prevent OOM. No changes were made to performance-critical inference paths (llama_decode, KV cache operations, matrix multiplication kernels, or quantization logic).

**Function-Level Analysis**

All 10 analyzed functions are compiler-generated STL template instantiations with no source code modifications. These functions execute during initialization (model loading, backend setup) or on error paths, not in the inference hot path, which is dominated by matrix operations and attention computation.

**Power Consumption**

Total power consumption changes are minimal: libllama.so increased by 440 nanojoules (+0.18%) and libmtmd.so by 1,253 nanojoules (+0.70%). These microscopic increases align with the addition of new model architecture code paths rather than performance degradation in existing functionality.

**Assessment**

The performance differences represent normal compiler optimization variance in standard library code generation between build environments. The absolute timing changes (all under 300 ns per function) are negligible compared to typical LLM inference times (10-100 ms per token). The PaddleOCR-VL feature additions are isolated to new code paths and do not impact existing model inference performance.
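The 16x patch merging mentioned in the report can be sketched as follows. This is a minimal illustration under the assumption that merging concatenates each 4x4 block of vision patch embeddings into a single token; the function name and list-based layout are hypothetical, not the actual llama.cpp implementation, which operates on ggml tensors.

```python
def merge_patches(grid, merge=4):
    """Merge each (merge x merge) block of patch embeddings into one
    token by concatenation, reducing the token count by merge**2
    (16x for merge=4). `grid` is an H x W grid of embedding lists."""
    h, w = len(grid), len(grid[0])
    assert h % merge == 0 and w % merge == 0, "grid must tile evenly"
    merged = []
    for i in range(0, h, merge):
        row = []
        for j in range(0, w, merge):
            # Concatenate the merge*merge patch vectors in this block.
            token = []
            for di in range(merge):
                for dj in range(merge):
                    token.extend(grid[i + di][j + dj])
            row.append(token)
        merged.append(row)
    return merged

# 8x8 grid of 2-dim patch embeddings -> 2x2 grid of 32-dim tokens,
# i.e. 64 image tokens reduced to 4 (a 16x reduction).
grid = [[[float(i), float(j)] for j in range(8)] for i in range(8)]
out = merge_patches(grid)
```

The memory saving is what matters here: the KV cache and attention cost scale with the number of image tokens, so a 16x reduction in token count makes high-resolution OCR inputs far cheaper to process.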
Force-pushed: ad54807 → d388dca
Force-pushed: 048ad94 → 6c1fde6
Force-pushed: 823244c → bab7d39
Force-pushed: 45aacad → 6e8718a
Mirrored from ggml-org/llama.cpp#18825
Add PaddleOCR-VL model support.
Test with some images:
with command:
```shell
./build/bin/llama-cli -m /media/shun/bigdata/Models/PaddleOCR_VL_SFT/PaddleOCR-VL-GGUF.gguf \
    --mmproj /media/shun/bigdata/Models/PaddleOCR_VL_SFT/PaddleOCR-VL-GGUF-mmproj.gguf \
    --color on \
    --image /home/shun/Pictures/1640.jpeg \
    --prompt "OCR:"
```

with command:

```shell
./build/bin/llama-cli -m /media/shun/bigdata/Models/PaddleOCR_VL_SFT/PaddleOCR-VL-GGUF.gguf \
    --mmproj /media/shun/bigdata/Models/PaddleOCR_VL_SFT/PaddleOCR-VL-GGUF-mmproj.gguf \
    --color on \
    --image /home/shun/Pictures/paddleocr.jpg \
    --prompt "Table Recognition:"
```

The recognized table can be formatted:
P.S. Thanks to @ngxson for ggml-org/llama.cpp#16701.