

UPSTREAM PR #18825: model: Add PaddleOCR-VL model support #914

Open
loci-dev wants to merge 11 commits into main from upstream-PR18825-branch_megemini-paddleocr-vl


@loci-dev

Mirrored from ggml-org/llama.cpp#18825

Add PaddleOCR-VL model support.

Tested with the following images:

  1. A receipt

[Image: 1640.jpeg]

with command:

./build/bin/llama-cli -m /media/shun/bigdata/Models/PaddleOCR_VL_SFT/PaddleOCR-VL-GGUF.gguf \
  --mmproj /media/shun/bigdata/Models/PaddleOCR_VL_SFT/PaddleOCR-VL-GGUF-mmproj.gguf \
  --color on \
  --image /home/shun/Pictures/1640.jpeg \
  --prompt "OCR:"
[Screenshot: llama-cli output for the receipt, 2026-01-14 13-54-35]
  2. A table

[Image: paddleocr.jpg]

with command:

./build/bin/llama-cli -m /media/shun/bigdata/Models/PaddleOCR_VL_SFT/PaddleOCR-VL-GGUF.gguf \
  --mmproj /media/shun/bigdata/Models/PaddleOCR_VL_SFT/PaddleOCR-VL-GGUF-mmproj.gguf \
  --color on \
  --image /home/shun/Pictures/paddleocr.jpg \
  --prompt "Table Recognition:"
[Screenshot: llama-cli output for the table, 2026-01-14 13-56-26]

The output can be rendered as a formatted table:

[Screenshot: the table output rendered, 2026-01-14 13-56-53]
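The two test runs above can be replayed from one script. This is a minimal POSIX sh sketch, assuming the same model and image file names as in the PR; `MODEL_DIR`, `IMG_DIR`, `LLAMA_CLI`, `DRY_RUN`, and `run_ocr` are hypothetical names introduced here, and only the llama-cli flags already used in the PR are passed. By default it only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper that replays the PR's two llama-cli test runs.
# MODEL_DIR, IMG_DIR and LLAMA_CLI are placeholders; override them via the environment.
MODEL_DIR=${MODEL_DIR:-./models/PaddleOCR_VL_SFT}
IMG_DIR=${IMG_DIR:-./images}
LLAMA_CLI=${LLAMA_CLI:-./build/bin/llama-cli}

# run_ocr IMAGE PROMPT: build the llama-cli command with the flags used in the PR.
# With DRY_RUN=1 (the default) the command is printed; DRY_RUN=0 executes it.
run_ocr() {
  # "$1" and "$2" are expanded before `set --` replaces the positional parameters.
  set -- "$LLAMA_CLI" \
    -m "$MODEL_DIR/PaddleOCR-VL-GGUF.gguf" \
    --mmproj "$MODEL_DIR/PaddleOCR-VL-GGUF-mmproj.gguf" \
    --color on \
    --image "$1" \
    --prompt "$2"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# The two cases from the PR: receipt -> "OCR:", table -> "Table Recognition:".
run_ocr "$IMG_DIR/1640.jpeg" "OCR:"
run_ocr "$IMG_DIR/paddleocr.jpg" "Table Recognition:"
```

Building the argument list with `set --` keeps each flag value (such as the prompt `Table Recognition:`, which contains a space) as a single argument without requiring bash arrays. Run it with `DRY_RUN=0` and your own `MODEL_DIR`/`IMG_DIR` to execute both checks.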

P.S. Thanks to @ngxson (ggml-org/llama.cpp#16701).

@loci-review

loci-review bot commented Jan 14, 2026

Explore the complete analysis inside the Version Insights



Performance Review Report

Summary

The target version introduces PaddleOCR-VL multimodal model support across 11 commits with 21 modified files, 39 additions, and 3 deletions. Performance impact is negligible: libllama.so shows a +0.18% increase in power consumption and libmtmd.so a +0.70% increase, while all other binaries remain unchanged. The 10 functions with the largest percentage changes are all standard library utilities (STL vector, iterator, and error-condition helpers) with absolute timing differences under 200ns, consistent with compiler optimization variance rather than algorithmic regressions.

Code Changes Context

The commit history reveals functional additions for PaddleOCR-VL vision-language model integration: 4D m-rope position encoding, 16x patch merging for memory efficiency, dynamic resolution support, and conservative warmup settings (784 tokens) to prevent OOM. No changes were made to performance-critical inference paths (llama_decode, KV cache operations, matrix multiplication kernels, or quantization logic).

Function-Level Analysis

All 10 analyzed functions are compiler-generated STL template instantiations with no source code modifications:

  • std::vector::end() (llama-kv-cache.cpp): +183ns regression from compiler code reorganization adding extra indirection
  • std::make_move_iterator: +169ns from additional branch instructions at function entry
  • std::make_error_condition: +187ns in error handling path (initialization only)
  • std::vector reallocation functions: Mixed results (-42ns to +41ns) from compiler optimization differences in memory management

These functions execute during initialization (model loading, backend setup) or error paths, not in the inference hot path dominated by matrix operations and attention computation.

Power Consumption

Total power consumption changes are minimal: libllama.so increased by 440 nanojoules (+0.18%) and libmtmd.so by 1,253 nanojoules (+0.70%). These small increases are consistent with the newly added model architecture code paths rather than with performance degradation in existing functionality.
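As a sanity check on scale, the absolute and relative deltas together imply the baseline energy of each library in the measured run (baseline = delta / (percent / 100)). A small sketch using `awk`; `implied_baseline` is a hypothetical helper, and because the reported percentages are rounded, the results are only approximate.

```shell
#!/bin/sh
# implied_baseline DELTA PCT: estimate the baseline from an absolute delta
# and its percentage change (baseline = delta / (pct / 100)), rounded to an integer.
implied_baseline() {
  awk -v d="$1" -v p="$2" 'BEGIN { printf "%.0f\n", d / (p / 100) }'
}

implied_baseline 440 0.18    # libllama.so, delta in nJ
implied_baseline 1253 0.70   # libmtmd.so, delta in nJ
```

This yields roughly 244,444 nJ (about 0.24 mJ) for libllama.so and 179,000 nJ (about 0.18 mJ) for libmtmd.so, so the added energy is a sub-microjoule to ~1 microjoule change on a sub-millijoule baseline.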

Assessment

The performance differences reflect normal compiler optimization variance in standard library code generation between build environments. The absolute timing changes (all under 200ns per function) are negligible compared to typical LLM inference times (10-100ms per token). The PaddleOCR-VL additions are isolated to new code paths and do not affect existing model inference performance.
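The "negligible" claim can be quantified by expressing the largest listed delta (187ns, `std::make_error_condition`) as a share of the quoted 10-100ms per-token range. A sketch with `awk`; `share_of_token` is a hypothetical helper, and it pessimistically assumes the function runs once per generated token, whereas the review places these functions in initialization and error paths.

```shell
#!/bin/sh
# share_of_token NS MS: a per-call delta in nanoseconds as a percentage
# of a per-token time in milliseconds (1 ms = 1e6 ns).
share_of_token() {
  awk -v ns="$1" -v ms="$2" 'BEGIN { printf "%.5f%%\n", ns / (ms * 1e6) * 100 }'
}

share_of_token 187 10    # fast end of the 10-100 ms/token range
share_of_token 187 100   # slow end of the range
```

This prints 0.00187% and 0.00019%, i.e. even under the once-per-token assumption the largest delta stays around two thousandths of a percent of token latency or less.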

@loci-dev loci-dev force-pushed the main branch 16 times, most recently from ad54807 to d388dca on January 16, 2026 11:08
@loci-dev loci-dev force-pushed the main branch 14 times, most recently from 048ad94 to 6c1fde6 on February 3, 2026 13:32
@loci-dev loci-dev force-pushed the main branch 10 times, most recently from 823244c to bab7d39 on February 19, 2026 02:17
@loci-dev loci-dev force-pushed the main branch 6 times, most recently from 45aacad to 6e8718a on February 24, 2026 02:17
