UPSTREAM PR #18825: model: Add PaddleOCR-VL model support (#914)
Conversation
### Performance Review Report

**Summary**

The target version introduces PaddleOCR-VL multimodal model support across 11 commits, with 21 modified files, 39 additions, and 3 deletions. Performance impact is negligible: libllama.so shows a +0.18% power consumption increase and libmtmd.so a +0.70% increase, while all other binaries are unchanged. The top 10 functions with the largest percentage changes are all standard library utilities (STL vector/iterator operations) with absolute timing differences under 200 ns, representing compiler optimization variance rather than algorithmic regressions.

**Code Changes Context**

The commit history shows functional additions for PaddleOCR-VL vision-language model integration: 4D m-rope position encoding, 16x patch merging for memory efficiency, dynamic resolution support, and conservative warmup settings (784 tokens) to prevent OOM. No changes were made to performance-critical inference paths (llama_decode, KV cache operations, matrix multiplication kernels, or quantization logic).

**Function-Level Analysis**

All 10 analyzed functions are compiler-generated STL template instantiations with no source code modifications. These functions execute during initialization (model loading, backend setup) or on error paths, not in the inference hot path, which is dominated by matrix operations and attention computation.

**Power Consumption**

Total power consumption changes are minimal: libllama.so increased by 440 nanojoules (+0.18%) and libmtmd.so by 1,253 nanojoules (+0.70%). These microscopic increases align with the addition of new model architecture code paths rather than performance degradation in existing functionality.

**Assessment**

The performance differences represent normal compiler optimization variance in standard library code generation between build environments. The absolute timing changes (all under 300 ns per function) are negligible compared to typical LLM inference times (10-100 ms per token). The PaddleOCR-VL feature additions are isolated to new code paths and do not impact existing model inference performance.
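The 16x patch merging mentioned in the report can be sketched as follows. This is a minimal illustration under the assumption that merging concatenates each 4x4 block of vision patch embeddings into a single token; the function name and list-based layout are hypothetical, not the actual llama.cpp implementation, which operates on ggml tensors.

```python
def merge_patches(grid, merge=4):
    """Merge each (merge x merge) block of patch embeddings into one
    token by concatenation, reducing the token count by merge**2
    (16x for merge=4). `grid` is an H x W grid of embedding lists."""
    h, w = len(grid), len(grid[0])
    assert h % merge == 0 and w % merge == 0, "grid must tile evenly"
    merged = []
    for i in range(0, h, merge):
        row = []
        for j in range(0, w, merge):
            # Concatenate the merge*merge patch vectors in this block.
            token = []
            for di in range(merge):
                for dj in range(merge):
                    token.extend(grid[i + di][j + dj])
            row.append(token)
        merged.append(row)
    return merged

# 8x8 grid of 2-dim patch embeddings -> 2x2 grid of 32-dim tokens,
# i.e. 64 image tokens reduced to 4 (a 16x reduction).
grid = [[[float(i), float(j)] for j in range(8)] for i in range(8)]
out = merge_patches(grid)
```

The memory saving is what matters here: the KV cache and attention cost scale with the number of image tokens, so a 16x reduction in token count makes high-resolution OCR inputs far cheaper to process.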
Force-pushed: ad54807 → d388dca
Force-pushed: 048ad94 → 6c1fde6
Force-pushed: 823244c → bab7d39
Force-pushed: 45aacad → 6e8718a
Mirrored from ggml-org/llama.cpp#18825
Add PaddleOCR-VL model support.
Test with some images:
with command:
```shell
./build/bin/llama-cli -m /media/shun/bigdata/Models/PaddleOCR_VL_SFT/PaddleOCR-VL-GGUF.gguf \
    --mmproj /media/shun/bigdata/Models/PaddleOCR_VL_SFT/PaddleOCR-VL-GGUF-mmproj.gguf \
    --color on \
    --image /home/shun/Pictures/1640.jpeg \
    --prompt "OCR:"
```

with command:

```shell
./build/bin/llama-cli -m /media/shun/bigdata/Models/PaddleOCR_VL_SFT/PaddleOCR-VL-GGUF.gguf \
    --mmproj /media/shun/bigdata/Models/PaddleOCR_VL_SFT/PaddleOCR-VL-GGUF-mmproj.gguf \
    --color on \
    --image /home/shun/Pictures/paddleocr.jpg \
    --prompt "Table Recognition:"
```

The recognized table can be formatted:
P.S. Thanks to @ngxson for ggml-org/llama.cpp#16701.