@ngxson (Collaborator) commented Oct 21, 2025

This is a very early WIP

Progress:

  • Only the language model is working now. The vision encoder is not yet implemented.
  • The vision encoder is added, but it is not yet numerically correct.
  • The model generates hallucinated text, likely because the projector is incorrect.

@ngxson ngxson linked an issue Oct 21, 2025 that may be closed by this pull request
4 tasks
@github-actions github-actions bot added the python python script changes label Oct 21, 2025
@TalonBvV

@ngxson thanks for the great work on this. I was really looking forward to benchmarking this model until I saw its limitations. On your point that "Model generate hallucinated text, likely because of the projector being incorrect": I don't think the projector is the cause. I cloned your branch to see why it hallucinates, and it seems to be due to the missing input pre-processing that "PP-DocLayoutV2" normally performs. PaddleOCR-VL is not an end-to-end VLM; it relies on "PP-DocLayoutV2" for detection. It's basically a glorified version of LayoutLM.

@ngxson (Collaborator, Author) commented Nov 3, 2025

@TalonBvV thanks for the info. Yes, I came to almost the same conclusion. The main issue is that PaddleOCR is not one monolithic model like Qwen or DeepSeek-OCR; it is more like a pipeline of multiple models glued together. Therefore, I don't think we currently have the infrastructure to bring it into llama.cpp.
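To make the "pipeline of multiple models" point concrete, here is a minimal sketch of a two-stage flow: a layout detector proposes regions, and a recognizer is run on each region separately. Everything in it is hypothetical and for illustration only: `Region`, `crop`, `run_pipeline`, and the two `fake_*` stages are made-up names, not PaddleOCR or llama.cpp APIs, and the assumption that the recognizer receives cropped regions (rather than the full page) is inferred from the discussion above, not confirmed.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical types for illustration only; not PaddleOCR or llama.cpp APIs.
Image = List[List[int]]          # rows of grayscale pixel values
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


@dataclass
class Region:
    label: str  # e.g. "text", "table", "formula"
    box: Box


def crop(image: Image, box: Box) -> Image:
    """Cut one region out of the page image."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def run_pipeline(
    image: Image,
    detect_layout: Callable[[Image], List[Region]],
    recognize: Callable[[Image, str], str],
) -> List[Tuple[str, str]]:
    """Stage 1 finds regions; stage 2 recognizes each crop separately."""
    results = []
    for region in detect_layout(image):
        text = recognize(crop(image, region.box), region.label)
        results.append((region.label, text))
    return results


# Stand-in stages so the sketch runs; real models would replace both.
def fake_detect_layout(image: Image) -> List[Region]:
    height, width = len(image), len(image[0])
    half = height // 2
    return [Region("text", (0, 0, width, half)),
            Region("table", (0, half, width, height))]


def fake_recognize(region_image: Image, label: str) -> str:
    return f"<{label} {len(region_image[0])}x{len(region_image)}>"


page = [[0] * 8 for _ in range(4)]
print(run_pipeline(page, fake_detect_layout, fake_recognize))
```

The point of the sketch is the shape of the control flow: the second stage never sees the whole page, so running it alone on a full page image would feed it input it was not prepared for.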

I'll close this PR for now, as it's not giving any meaningful results. For users who need OCR, I would recommend having a look at the latest Qwen3-VL series or LightOnOCR-1B.

@ngxson ngxson closed this Nov 3, 2025

Labels

examples, python (python script changes)

Development

Successfully merging this pull request may close these issues.

Feature Request: support PaddleOCR-VL

2 participants