[Model] Add PP-Chart2Table Model Support #43767
| Original file line number | Diff line number | Diff line change | ||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,174 @@ | ||||||||||||||||
| <!--Copyright 2026 The HuggingFace Team. All rights reserved. | ||||||||||||||||
|
|
||||||||||||||||
| Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with | ||||||||||||||||
| the License. You may obtain a copy of the License at | ||||||||||||||||
|
|
||||||||||||||||
| http://www.apache.org/licenses/LICENSE-2.0 | ||||||||||||||||
|
|
||||||||||||||||
| Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on | ||||||||||||||||
| an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the | ||||||||||||||||
| specific language governing permissions and limitations under the License. | ||||||||||||||||
|
|
||||||||||||||||
| ⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be | ||||||||||||||||
| rendered properly in your Markdown viewer. | ||||||||||||||||
|
|
||||||||||||||||
| --> | ||||||||||||||||
| *This model was released on 2025-05-20 and added to Hugging Face Transformers on 2026-03-16.* | ||||||||||||||||
|
|
||||||||||||||||
| # PP-Chart2Table | ||||||||||||||||
|
|
||||||||||||||||
| <div class="flex flex-wrap space-x-1"> | ||||||||||||||||
| <img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white"> | ||||||||||||||||
| </div> | ||||||||||||||||
|
|
||||||||||||||||
| ## Overview | ||||||||||||||||
|
|
||||||||||||||||
| **PP-Chart2Table** is a state-of-the-art multimodal model developed by the PaddlePaddle team that specializes in parsing charts in both Chinese and English. Its performance is driven by a novel "Shuffled Chart Data Retrieval" training task which, combined with a refined token masking strategy, significantly improves its efficiency in converting charts to data tables. The model is further strengthened by a data synthesis pipeline that uses high-quality seed data, RAG, and LLM persona design to create a richer, more diverse training set. To handle large-scale unlabeled, out-of-distribution (OOD) data, the team implemented a two-stage distillation process that improves adaptability and generalization on real-world data. | ||||||||||||||||
|
|
||||||||||||||||
| ## Model Architecture | ||||||||||||||||
| PP-Chart2Table adopts a multimodal fusion architecture that combines a vision tower for chart feature extraction and a language model for table structure generation, enabling end-to-end chart-to-table conversion. | ||||||||||||||||
|
|
||||||||||||||||
|
|
||||||||||||||||
| ## Usage | ||||||||||||||||
|
|
||||||||||||||||
| ### Single input inference | ||||||||||||||||
|
|
||||||||||||||||
| The example below demonstrates how to convert a chart image to a table with PP-Chart2Table using [`Pipeline`] or the [`AutoModel`] class. | ||||||||||||||||
|
|
||||||||||||||||
| <hfoptions id="usage"> | ||||||||||||||||
| <hfoption id="Pipeline"> | ||||||||||||||||
|
|
||||||||||||||||
| ```py | ||||||||||||||||
| import requests | ||||||||||||||||
| from PIL import Image | ||||||||||||||||
| from transformers import pipeline | ||||||||||||||||
| model_path = "PaddlePaddle/PP-Chart2Table_safetensors" | ||||||||||||||||
| pipe = pipeline( | ||||||||||||||||
| task="image-text-to-text", | ||||||||||||||||
| model=model_path, | ||||||||||||||||
| device_map="auto", | ||||||||||||||||
| ) | ||||||||||||||||
| image = Image.open(requests.get("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/chart_parsing_02.png", stream=True).raw) | ||||||||||||||||
| result = pipe( | ||||||||||||||||
| images=image, | ||||||||||||||||
| text="",  # left empty: PP-Chart2Table uses prefilled instructions that are forced | ||||||||||||||||
|
Collaborator
Maybe add a small comment, as a note or tip somewhere, to say that we use prefilled instructions that are forced. I don't think it's super obvious.
Contributor
Author
done
||||||||||||||||
| do_sample=False, | ||||||||||||||||
| max_new_tokens=256 | ||||||||||||||||
| ) | ||||||||||||||||
| print(result) | ||||||||||||||||
|
|
||||||||||||||||
| ``` | ||||||||||||||||
|
|
||||||||||||||||
| </hfoption> | ||||||||||||||||
|
|
||||||||||||||||
| <hfoption id="AutoModel"> | ||||||||||||||||
|
|
||||||||||||||||
| ```py | ||||||||||||||||
| import requests | ||||||||||||||||
| from PIL import Image | ||||||||||||||||
| from transformers import AutoModelForImageTextToText, AutoProcessor | ||||||||||||||||
|
|
||||||||||||||||
| model_path = "PaddlePaddle/PP-Chart2Table_safetensors" | ||||||||||||||||
| model = AutoModelForImageTextToText.from_pretrained( | ||||||||||||||||
| model_path, | ||||||||||||||||
| dtype="float32", | ||||||||||||||||
|
Collaborator
Suggested change
Contributor
Author
done
||||||||||||||||
| device_map="auto", | ||||||||||||||||
| ) | ||||||||||||||||
| processor = AutoProcessor.from_pretrained(model_path, use_fast=True) | ||||||||||||||||
|
Collaborator
Suggested change
not sure but I don't think we need this
Contributor
Author
done
||||||||||||||||
|
|
||||||||||||||||
| image = Image.open(requests.get("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/chart_parsing_02.png", stream=True).raw) | ||||||||||||||||
| inputs = processor(images=image, return_tensors="pt").to(model.device)  # move the tensors, not the processor, to the model device | ||||||||||||||||
|
|
||||||||||||||||
| generated_ids = model.generate(**inputs, use_cache=True, do_sample=False, max_new_tokens=256) | ||||||||||||||||
|
Collaborator
Suggested change
same here, shouldn't be needed I think
Contributor
Author
done
||||||||||||||||
| generated_ids_trimmed = [out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)] | ||||||||||||||||
| result = processor.batch_decode(generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False) | ||||||||||||||||
| print(result) | ||||||||||||||||
|
|
||||||||||||||||
| ``` | ||||||||||||||||
|
|
||||||||||||||||
| </hfoption> | ||||||||||||||||
| </hfoptions> | ||||||||||||||||
|
|
||||||||||||||||
| ### Batched inference | ||||||||||||||||
|
|
||||||||||||||||
| The example below runs batched inference on several chart images with PP-Chart2Table using [`Pipeline`] or the [`AutoModel`] class: | ||||||||||||||||
|
|
||||||||||||||||
| <hfoptions id="usage"> | ||||||||||||||||
| <hfoption id="Pipeline"> | ||||||||||||||||
|
|
||||||||||||||||
| ```py | ||||||||||||||||
| import requests | ||||||||||||||||
| from transformers import pipeline | ||||||||||||||||
| from PIL import Image | ||||||||||||||||
| model_path = "PaddlePaddle/PP-Chart2Table_safetensors" | ||||||||||||||||
| pipe = pipeline( | ||||||||||||||||
| task="image-text-to-text", | ||||||||||||||||
| model=model_path, | ||||||||||||||||
| device_map="auto", | ||||||||||||||||
| ) | ||||||||||||||||
| image = Image.open(requests.get("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/chart_parsing_02.png", stream=True).raw) | ||||||||||||||||
| result = pipe( | ||||||||||||||||
| images=[image, image], | ||||||||||||||||
| text="",  # left empty: PP-Chart2Table uses prefilled instructions that are forced | ||||||||||||||||
| do_sample=False, | ||||||||||||||||
| max_new_tokens=256 | ||||||||||||||||
| ) | ||||||||||||||||
| print(result) | ||||||||||||||||
| ``` | ||||||||||||||||
|
|
||||||||||||||||
| </hfoption> | ||||||||||||||||
|
|
||||||||||||||||
| <hfoption id="AutoModel"> | ||||||||||||||||
|
|
||||||||||||||||
| ```py | ||||||||||||||||
| import requests | ||||||||||||||||
| from PIL import Image | ||||||||||||||||
| from transformers import AutoModelForImageTextToText, AutoProcessor | ||||||||||||||||
|
|
||||||||||||||||
| model_path = "PaddlePaddle/PP-Chart2Table_safetensors" | ||||||||||||||||
| model = AutoModelForImageTextToText.from_pretrained( | ||||||||||||||||
| model_path, | ||||||||||||||||
| dtype="float32", | ||||||||||||||||
|
Collaborator
Suggested change
Contributor
Author
done
||||||||||||||||
| device_map="auto", | ||||||||||||||||
| ) | ||||||||||||||||
| processor = AutoProcessor.from_pretrained(model_path) | ||||||||||||||||
|
|
||||||||||||||||
| image = Image.open(requests.get("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/chart_parsing_02.png", stream=True).raw) | ||||||||||||||||
| inputs = processor(images=[image, image], return_tensors="pt").to(model.device)  # move the tensors, not the processor, to the model device | ||||||||||||||||
|
|
||||||||||||||||
| generated_ids = model.generate(**inputs, do_sample=False, max_new_tokens=256) | ||||||||||||||||
| generated_ids_trimmed = [out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)] | ||||||||||||||||
| result = processor.batch_decode(generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False) | ||||||||||||||||
| print(result) | ||||||||||||||||
| ``` | ||||||||||||||||
|
|
||||||||||||||||
| </hfoption> | ||||||||||||||||
| </hfoptions> | ||||||||||||||||
|
|
||||||||||||||||
| ## PPChart2TableForConditionalGeneration | ||||||||||||||||
|
|
||||||||||||||||
| [[autodoc]] PPChart2TableForConditionalGeneration | ||||||||||||||||
|
|
||||||||||||||||
| ## PPChart2TableModel | ||||||||||||||||
|
|
||||||||||||||||
| [[autodoc]] PPChart2TableModel | ||||||||||||||||
|
|
||||||||||||||||
| ## PPChart2TableConfig | ||||||||||||||||
|
|
||||||||||||||||
| [[autodoc]] PPChart2TableConfig | ||||||||||||||||
|
|
||||||||||||||||
| ## PPChart2TableVisionPreTrainedModel | ||||||||||||||||
|
|
||||||||||||||||
| [[autodoc]] PPChart2TableVisionPreTrainedModel | ||||||||||||||||
|
|
||||||||||||||||
| ## PPChart2TablePreTrainedModel | ||||||||||||||||
|
|
||||||||||||||||
| [[autodoc]] PPChart2TablePreTrainedModel | ||||||||||||||||
|
Collaborator
Suggested change
Nit: don't really need those exposed in the docs
Contributor
Author
done
||||||||||||||||
|
|
||||||||||||||||
| ## PPChart2TableImageProcessorFast | ||||||||||||||||
|
|
||||||||||||||||
| [[autodoc]] PPChart2TableImageProcessorFast | ||||||||||||||||
|
|
||||||||||||||||
| ## PPChart2TableProcessor | ||||||||||||||||
|
|
||||||||||||||||
| [[autodoc]] PPChart2TableProcessor | ||||||||||||||||
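
Both AutoModel examples in the documentation diff above slice the prompt tokens off each generated sequence before decoding. As a minimal sketch of that slicing step, the snippet below uses plain Python lists in place of tensors. The helper name `trim_prompt_tokens` and the token ids are made up for illustration and are not part of the PR.

```python
def trim_prompt_tokens(input_ids, generated_ids):
    """Drop the leading prompt tokens from each generated sequence.

    Assumes each generated sequence starts with its own prompt, which is
    how the doc example treats the output of `generate`.
    """
    return [out[len(inp):] for inp, out in zip(input_ids, generated_ids)]


# Made-up token ids: two identical 3-token prompts and their continuations.
input_ids = [[1, 7, 9], [1, 7, 9]]
generated_ids = [[1, 7, 9, 42, 43, 2], [1, 7, 9, 50, 2, 0]]

new_tokens = trim_prompt_tokens(input_ids, generated_ids)
print(new_tokens)  # [[42, 43, 2], [50, 2, 0]]
```

Only the trimmed ids are then passed to `batch_decode`, so the decoded text does not repeat the prompt.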
| Original file line number | Diff line number | Diff line change | ||
|---|---|---|---|---|
|
|
@@ -450,6 +450,7 @@ def register_checkpoint_conversion_mapping( | |||
| "sam3_tracker", | ||||
| "sam3_tracker_video", | ||||
| "paddleocrvl", | ||||
| "ppchart2table", | ||||
|
Collaborator
This changed on main, we need to do the same as here
Contributor
Author
done
||||
| # NOTE: Slightly different from `model_type` (to follow naming conventions in vllm/sglang) | ||||
| "ernie4_5_vlmoe", | ||||
| "ernie4_5_vl_moe", # BC alias | ||||
|
|
||||
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -985,6 +985,7 @@ class _BaseModelWithGenerate(PreTrainedModel, GenerationMixin): | |||||
| ("perception_lm", "PerceptionLMForConditionalGeneration"), | ||||||
| ("pix2struct", "Pix2StructForConditionalGeneration"), | ||||||
| ("pixtral", "LlavaForConditionalGeneration"), | ||||||
| ("pp_chart2table", "PPChart2TableForConditionalGeneration"), | ||||||
|
Collaborator
Suggested change
this way we don't reimplement but have the connection with
Contributor
Author
done
||||||
| ("qwen2_5_vl", "Qwen2_5_VLForConditionalGeneration"), | ||||||
| ("qwen2_vl", "Qwen2VLForConditionalGeneration"), | ||||||
| ("qwen3_5", "Qwen3_5ForConditionalGeneration"), | ||||||
|
|
||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -255,6 +255,7 @@ | |
| else ("TokenizersBackend" if is_tokenizers_available() else None), | ||
| ), | ||
| ("plbart", "PLBartTokenizer" if is_tokenizers_available() else None), | ||
| ("pp_chart2table", "TokenizersBackend" if is_tokenizers_available() else None), | ||
|
Collaborator
Not necessarily needed iirc, we auto fallback to the tokenizers backend but not a big deal to have it here as well
Contributor
Author
done
||
| ("prophetnet", "ProphetNetTokenizer"), | ||
| ("qdqbert", "BertTokenizer" if is_tokenizers_available() else None), | ||
| ("qwen2", "Qwen2Tokenizer" if is_tokenizers_available() else None), | ||
|
|
||
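
The reviewer's remark on this hunk is that an explicit mapping entry is optional when the lookup already falls back to a default backend. The snippet below is a minimal sketch of that idea and not the Transformers implementation: the registry contents, `DEFAULT_BACKEND`, and `resolve_tokenizer_name` are hypothetical names chosen for illustration.

```python
# Hypothetical registry: model type -> tokenizer class name (illustrative only).
TOKENIZER_NAMES = {
    "plbart": "PLBartTokenizer",
    "prophetnet": "ProphetNetTokenizer",
}

# Hypothetical default used when a model type has no explicit entry.
DEFAULT_BACKEND = "TokenizersBackend"


def resolve_tokenizer_name(model_type, registry=None, default=DEFAULT_BACKEND):
    """Return the registered tokenizer name, or the default backend if absent."""
    registry = TOKENIZER_NAMES if registry is None else registry
    return registry.get(model_type, default)


# Without an explicit entry, the lookup falls back to the default backend.
implicit = resolve_tokenizer_name("pp_chart2table")

# With an explicit entry mapping to the same backend, the result is identical.
explicit_registry = {**TOKENIZER_NAMES, "pp_chart2table": "TokenizersBackend"}
explicit = resolve_tokenizer_name("pp_chart2table", registry=explicit_registry)

print(implicit, explicit)
```

Under these assumptions the explicit entry changes nothing functionally; it only makes the association visible in the table.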
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,30 @@ | ||
| # Copyright 2026 The HuggingFace Team. All rights reserved. | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
|
|
||
| from typing import TYPE_CHECKING | ||
|
|
||
| from ...utils import _LazyModule | ||
| from ...utils.import_utils import define_import_structure | ||
|
|
||
|
|
||
| if TYPE_CHECKING: | ||
|     from .configuration_pp_chart2table import * | ||
|     from .image_processing_pp_chart2table_fast import * | ||
|     from .modeling_pp_chart2table import * | ||
|     from .processing_pp_chart2table import * | ||
| else: | ||
|     import sys | ||
|
|
||
|     _file = globals()["__file__"] | ||
|     sys.modules[__name__] = _LazyModule(__name__, _file, define_import_structure(_file), module_spec=__spec__) | ||
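
The `else` branch above replaces the package module with a lazy module so that submodules are loaded only when one of their names is first accessed. The sketch below illustrates that general pattern with the standard library only. `LazyNamespace` and its name-to-module table are hypothetical and do not reproduce how `_LazyModule` or `define_import_structure` work internally.

```python
import importlib
import types


class LazyNamespace(types.ModuleType):
    """Hypothetical lazy module: imports the owning module on first access."""

    def __init__(self, name, name_to_module):
        super().__init__(name)
        # Maps an exported attribute name to the module that defines it.
        self._name_to_module = dict(name_to_module)

    def __getattr__(self, attr):
        # Called only when normal lookup fails, i.e. before the value is cached.
        try:
            module_name = self._name_to_module[attr]
        except KeyError:
            raise AttributeError(
                f"module {self.__name__!r} has no attribute {attr!r}"
            ) from None
        value = getattr(importlib.import_module(module_name), attr)
        setattr(self, attr, value)  # cache so later lookups skip __getattr__
        return value


# Standard-library modules stand in for a package's submodules.
lazy = LazyNamespace("demo_lazy", {"sqrt": "math", "dumps": "json"})

cached_before = "dumps" in vars(lazy)  # nothing resolved yet
encoded = lazy.dumps({"a": 1})         # triggers the import and caches the name
cached_after = "dumps" in vars(lazy)

print(cached_before, encoded, cached_after)
```

The design choice is the same trade-off: importing the package stays cheap, and the cost of importing a heavy submodule is paid only by code that actually uses it.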
Just to double check, we don't have a release date yet / will it be done later in a follow-up PR?
done, released on 2025-05-20