Merged
Changes from 27 commits
Commits
68 commits
8aa566b
init
XingweiDeng Feb 5, 2026
1a5908d
fix doc
XingweiDeng Feb 9, 2026
c51b1c6
update
XingweiDeng Feb 24, 2026
5e3f1d3
update
XingweiDeng Feb 25, 2026
2c064cc
update
XingweiDeng Feb 25, 2026
81514c0
update
XingweiDeng Feb 26, 2026
fc7c75f
update
XingweiDeng Feb 27, 2026
01f2b29
update
XingweiDeng Feb 28, 2026
d8cc881
update
XingweiDeng Mar 2, 2026
117a6cb
update
XingweiDeng Mar 9, 2026
053f59e
Merge remote-tracking branch 'official/main' into feat/pp_chart2table
XingweiDeng Mar 10, 2026
db1e9a8
refactor image_processor_fast
XingweiDeng Mar 10, 2026
d8763e5
update
XingweiDeng Mar 13, 2026
3d8a654
update
XingweiDeng Mar 13, 2026
4abb70d
update
XingweiDeng Mar 13, 2026
1efd48b
update
XingweiDeng Mar 13, 2026
779cbca
update
XingweiDeng Mar 15, 2026
d61079a
update
XingweiDeng Mar 15, 2026
618c63c
update
XingweiDeng Mar 15, 2026
1079052
update
XingweiDeng Mar 15, 2026
974d3b1
update
XingweiDeng Mar 16, 2026
3f01494
update
XingweiDeng Mar 16, 2026
3b91e2d
update
XingweiDeng Mar 16, 2026
587652c
merge transformers main
XingweiDeng Mar 16, 2026
65b7d01
update
XingweiDeng Mar 16, 2026
cc85b83
update
XingweiDeng Mar 16, 2026
b419732
update
XingweiDeng Mar 16, 2026
cc8bbca
update
XingweiDeng Mar 17, 2026
9094eb5
update
XingweiDeng Mar 17, 2026
55664d1
update
XingweiDeng Mar 17, 2026
6bb4dbc
upddate
XingweiDeng Mar 17, 2026
f79d83b
update
XingweiDeng Mar 17, 2026
d050fe6
update
XingweiDeng Mar 18, 2026
bae2c96
update
XingweiDeng Mar 18, 2026
8e4062b
update
XingweiDeng Mar 18, 2026
ac2bc66
update
XingweiDeng Mar 18, 2026
636e759
Merge official main
XingweiDeng Mar 18, 2026
45907f9
update
XingweiDeng Mar 18, 2026
d0bf04f
update
XingweiDeng Mar 18, 2026
0f7ed31
update
XingweiDeng Mar 18, 2026
beea17f
update
XingweiDeng Mar 18, 2026
6915583
update
XingweiDeng Mar 18, 2026
6fe075b
update
XingweiDeng Mar 18, 2026
ff8f634
Merge remote-tracking branch 'official/main' into feat/pp_chart2table
XingweiDeng Mar 18, 2026
86e9ec5
update
XingweiDeng Mar 18, 2026
6d791e7
small fixes
vasqu Mar 18, 2026
8e201c1
more explicit skip msg
vasqu Mar 18, 2026
787e650
some quick fixes
vasqu Mar 18, 2026
da92f90
Merge branch 'main' into feat/pp_chart2table
vasqu Mar 18, 2026
7280cf5
fix
vasqu Mar 18, 2026
957220a
quick cleanups
vasqu Mar 18, 2026
9f74fa1
update
XingweiDeng Mar 19, 2026
bcaef3d
merge
XingweiDeng Mar 19, 2026
f33cfb5
update
XingweiDeng Mar 19, 2026
8394c08
update
XingweiDeng Mar 19, 2026
3c0b028
update
XingweiDeng Mar 19, 2026
b50607c
update
XingweiDeng Mar 19, 2026
44529f7
update
XingweiDeng Mar 19, 2026
ca92529
merge
XingweiDeng Mar 19, 2026
ba238c8
update
XingweiDeng Mar 19, 2026
e7401f0
fixup after new refactor
vasqu Mar 19, 2026
28653c3
fix
vasqu Mar 19, 2026
d7d8ee8
update
XingweiDeng Mar 19, 2026
d71e07b
update
XingweiDeng Mar 19, 2026
c095f11
last fixups
vasqu Mar 19, 2026
eb5c2a5
update
XingweiDeng Mar 19, 2026
081537c
Merge branch 'main' into feat/pp_chart2table
vasqu Mar 19, 2026
bcccd9d
remove my todos I left there
vasqu Mar 19, 2026
2 changes: 2 additions & 0 deletions docs/source/en/_toctree.yml
@@ -1272,6 +1272,8 @@
title: PP-OCRv5_mobile_det
- local: model_doc/pp_ocrv5_server_det
title: PP-OCRv5_server_det
- local: model_doc/pp_chart2table
title: PPChart2Table
- local: model_doc/pp_lcnet
title: PPLCNet
- local: model_doc/pp_lcnet_v3
174 changes: 174 additions & 0 deletions docs/source/en/model_doc/pp_chart2table.md
@@ -0,0 +1,174 @@
<!--Copyright 2026 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->
*This model was released on {release_date} and added to Hugging Face Transformers on 2026-03-16.*

Collaborator

Just to double-check: we don't have a release date yet, or will that be done later in a follow-up PR?

Contributor Author

done, released on 2025-05-20


# PP-Chart2Table

<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
</div>

## Overview

**PP-Chart2Table** is a state-of-the-art multimodal model developed by the PaddlePaddle team that specializes in parsing Chinese and English charts. Its performance is driven by a novel "Shuffled Chart Data Retrieval" training task which, combined with a refined token masking strategy, significantly improves how efficiently it converts charts to data tables. The model is further strengthened by a data synthesis pipeline that uses high-quality seed data, RAG, and LLM persona design to create a richer, more diverse training set. To handle large-scale unlabeled, out-of-distribution (OOD) data, the team applied a two-stage distillation process to improve adaptability and generalization on real-world data.

## Model Architecture

PP-Chart2Table adopts a multimodal fusion architecture that combines a vision tower for chart feature extraction and a language model for table structure generation, enabling end-to-end chart-to-table conversion.


## Usage

### Single input inference

The example below demonstrates how to convert a chart image to a table with PP-Chart2Table using [`Pipeline`] or [`AutoModel`].

<hfoptions id="usage">
<hfoption id="Pipeline">

```py
import requests
from PIL import Image
from transformers import pipeline

model_path = "PaddlePaddle/PP-Chart2Table_safetensors"
pipe = pipeline(
    task="image-text-to-text",
    model=model_path,
    device_map="auto",
)
image = Image.open(requests.get("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/chart_parsing_02.png", stream=True).raw)
result = pipe(
    images=image,
    text="",  # left empty on purpose: the model uses a prefilled, forced instruction

Collaborator

Maybe a small comment to tell that we use prefilled instructions that are forced - as a note or tip somewhere. I don't think it's super obvious

Contributor Author

done

    do_sample=False,
    max_new_tokens=256,
)
print(result)

```

</hfoption>

<hfoption id="AutoModel">

```py
import requests
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

model_path = "PaddlePaddle/PP-Chart2Table_safetensors"
model = AutoModelForImageTextToText.from_pretrained(
    model_path,
    dtype="float32",

Collaborator

Suggested change
dtype="float32",

Contributor Author

done

    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_path, use_fast=True).to(model.device)

Collaborator

Suggested change
processor = AutoProcessor.from_pretrained(model_path, use_fast=True).to(model.device)
processor = AutoProcessor.from_pretrained(model_path).to(model.device)

not sure but I don't think we need this

Contributor Author

done


image = Image.open(requests.get("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/chart_parsing_02.png", stream=True).raw)
inputs = processor(images=image)

generated_ids = model.generate(**inputs, use_cache=True, do_sample=False, max_new_tokens=256)

Collaborator

Suggested change
generated_ids = model.generate(**inputs, use_cache=True, do_sample=False, max_new_tokens=256)
generated_ids = model.generate(**inputs, do_sample=False, max_new_tokens=256)

same here, shouldn't be needed I think

Contributor Author

done

generated_ids_trimmed = [out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)]
result = processor.batch_decode(generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False)
print(result)

```

</hfoption>
</hfoptions>
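
The `generated_ids_trimmed` line in the AutoModel example strips the prompt tokens from each generated sequence before decoding. The sketch below isolates that slicing step with made-up token ids (the numbers are purely illustrative, not real tokenizer output), assuming, as the example's slicing implies, that `generate` returns each prompt followed by its newly generated tokens.

```py
# Toy token ids standing in for `inputs.input_ids` and the output of `model.generate`.
# Assumption: every generated row starts with a copy of its prompt tokens.
input_ids = [[101, 7, 8], [101, 9, 10]]
generated_ids = [[101, 7, 8, 40, 41, 2], [101, 9, 10, 50, 2, 0]]

# Drop the first len(prompt) ids from each row so only the new tokens remain.
generated_ids_trimmed = [out_ids[len(in_ids):] for in_ids, out_ids in zip(input_ids, generated_ids)]

print(generated_ids_trimmed)  # [[40, 41, 2], [50, 2, 0]]
```

Because `zip` pairs each prompt with its own output row, the same line works unchanged for the batched example.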

### Batched inference

The example below demonstrates batched inference with PP-Chart2Table using [`Pipeline`] or [`AutoModel`]:

<hfoptions id="usage">
<hfoption id="Pipeline">

```py
import requests
from PIL import Image
from transformers import pipeline

model_path = "PaddlePaddle/PP-Chart2Table_safetensors"
pipe = pipeline(
    task="image-text-to-text",
    model=model_path,
    device_map="auto",
)
image = Image.open(requests.get("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/chart_parsing_02.png", stream=True).raw)
result = pipe(
    images=[image, image],
    text="",
    do_sample=False,
    max_new_tokens=256,
)
print(result)
```

</hfoption>

<hfoption id="AutoModel">

```py
import requests
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

model_path = "PaddlePaddle/PP-Chart2Table_safetensors"
model = AutoModelForImageTextToText.from_pretrained(
    model_path,
    dtype="float32",

Collaborator

Suggested change
dtype="float32",

Contributor Author

done

    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_path).to(model.device)

image = Image.open(requests.get("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/chart_parsing_02.png", stream=True).raw)
inputs = processor(images=[image, image])

generated_ids = model.generate(**inputs, do_sample=False, max_new_tokens=256)
generated_ids_trimmed = [out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)]
result = processor.batch_decode(generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False)
print(result)
```

</hfoption>
</hfoptions>

## PPChart2TableForConditionalGeneration

[[autodoc]] PPChart2TableForConditionalGeneration

## PPChart2TableModel

[[autodoc]] PPChart2TableModel

## PPChart2TableConfig

[[autodoc]] PPChart2TableConfig

## PPChart2TableVisionPreTrainedModel

[[autodoc]] PPChart2TableVisionPreTrainedModel

## PPChart2TablePreTrainedModel

[[autodoc]] PPChart2TablePreTrainedModel

Collaborator

Suggested change
## PPChart2TableVisionPreTrainedModel
[[autodoc]] PPChart2TableVisionPreTrainedModel
## PPChart2TablePreTrainedModel
[[autodoc]] PPChart2TablePreTrainedModel

Nit: don't really need those exposed in the docs

Contributor Author

done


## PPChart2TableImageProcessorFast

[[autodoc]] PPChart2TableImageProcessorFast

## PPChart2TableProcessor

[[autodoc]] PPChart2TableProcessor
1 change: 1 addition & 0 deletions src/transformers/conversion_mapping.py
@@ -450,6 +450,7 @@ def register_checkpoint_conversion_mapping(
"sam3_tracker",
"sam3_tracker_video",
"paddleocrvl",
"ppchart2table",

Collaborator

This changed on main, we need to do the same as here

for this model type

Contributor Author

done

# NOTE: Slightly different from `model_type` (to follow naming conventions in vllm/sglang)
"ernie4_5_vlmoe",
"ernie4_5_vl_moe", # BC alias
1 change: 1 addition & 0 deletions src/transformers/models/__init__.py
@@ -317,6 +317,7 @@
from .plbart import *
from .poolformer import *
from .pop2piano import *
from .pp_chart2table import *
from .pp_doclayout_v2 import *
from .pp_doclayout_v3 import *
from .pp_lcnet import *
2 changes: 2 additions & 0 deletions src/transformers/models/auto/configuration_auto.py
@@ -355,6 +355,7 @@
("plbart", "PLBartConfig"),
("poolformer", "PoolFormerConfig"),
("pop2piano", "Pop2PianoConfig"),
("pp_chart2table", "PPChart2TableConfig"),
Comment thread
vasqu marked this conversation as resolved.
("pp_doclayout_v2", "PPDocLayoutV2Config"),
("pp_doclayout_v3", "PPDocLayoutV3Config"),
("pp_lcnet", "PPLCNetConfig"),
@@ -869,6 +870,7 @@
("plbart", "PLBart"),
("poolformer", "PoolFormer"),
("pop2piano", "Pop2Piano"),
("pp_chart2table", "PPChart2Table"),
("pp_doclayout_v2", "PPDocLayoutV2"),
("pp_doclayout_v3", "PPDocLayoutV3"),
("pp_lcnet", "PPLCNet"),
1 change: 1 addition & 0 deletions src/transformers/models/auto/image_processing_auto.py
@@ -170,6 +170,7 @@
("pixio", ("BitImageProcessor", "BitImageProcessorFast")),
("pixtral", ("PixtralImageProcessor", "PixtralImageProcessorFast")),
("poolformer", ("PoolFormerImageProcessor", "PoolFormerImageProcessorFast")),
("pp_chart2table", (None, "PPChart2TableImageProcessorFast")),
("pp_doclayout_v2", (None, "PPDocLayoutV2ImageProcessorFast")),
("pp_doclayout_v3", (None, "PPDocLayoutV3ImageProcessorFast")),
("pp_lcnet", (None, "PPLCNetImageProcessorFast")),
1 change: 1 addition & 0 deletions src/transformers/models/auto/modeling_auto.py
@@ -985,6 +985,7 @@ class _BaseModelWithGenerate(PreTrainedModel, GenerationMixin):
("perception_lm", "PerceptionLMForConditionalGeneration"),
("pix2struct", "Pix2StructForConditionalGeneration"),
("pixtral", "LlavaForConditionalGeneration"),
("pp_chart2table", "PPChart2TableForConditionalGeneration"),

Collaborator

Suggested change
("pp_chart2table", "PPChart2TableForConditionalGeneration"),
("pp_chart2table", "GotOcr2ForConditionalGeneration"),

this way we don't reimplement but have the connection with model_type. Keep in mind that the config on the hub needs "architectures": ["GotOcr2ForConditionalGeneration"] then

Contributor Author

done

("qwen2_5_vl", "Qwen2_5_VLForConditionalGeneration"),
("qwen2_vl", "Qwen2VLForConditionalGeneration"),
("qwen3_5", "Qwen3_5ForConditionalGeneration"),
1 change: 1 addition & 0 deletions src/transformers/models/auto/processing_auto.py
@@ -136,6 +136,7 @@
("pix2struct", "Pix2StructProcessor"),
("pixtral", "PixtralProcessor"),
("pop2piano", "Pop2PianoProcessor"),
("pp_chart2table", "PPChart2TableProcessor"),
("qwen2_5_omni", "Qwen2_5OmniProcessor"),
("qwen2_5_vl", "Qwen2_5_VLProcessor"),
("qwen2_audio", "Qwen2AudioProcessor"),
1 change: 1 addition & 0 deletions src/transformers/models/auto/tokenization_auto.py
@@ -255,6 +255,7 @@
else ("TokenizersBackend" if is_tokenizers_available() else None),
),
("plbart", "PLBartTokenizer" if is_tokenizers_available() else None),
("pp_chart2table", "TokenizersBackend" if is_tokenizers_available() else None),

Collaborator

Not necessarily needed iirc, we auto fallback to the tokenizers backend but not a big deal to have it here as well

Contributor Author

done

("prophetnet", "ProphetNetTokenizer"),
("qdqbert", "BertTokenizer" if is_tokenizers_available() else None),
("qwen2", "Qwen2Tokenizer" if is_tokenizers_available() else None),
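
For readers following the auto-class hunks above: each one registers the same `pp_chart2table` key in a different name mapping. The sketch below is a simplified illustration of that lookup pattern, not the actual Transformers implementation. The dictionary names, `resolve`, and `pick_image_processor` are hypothetical, and preferring the fast image processor is an assumption made for the sketch. The model entry uses the reviewer's suggested `GotOcr2ForConditionalGeneration` target.

```py
# Hypothetical, trimmed-down copies of the mappings edited in this PR.
config_names = {"pp_chart2table": "PPChart2TableConfig"}
# Reviewer suggestion: point the model type at the existing GOT-OCR2 class.
model_names = {"pp_chart2table": "GotOcr2ForConditionalGeneration"}
# (slow, fast) pair; None means no slow image processor is registered.
image_processor_names = {"pp_chart2table": (None, "PPChart2TableImageProcessorFast")}


def resolve(model_type, mapping):
    """Return the class name registered for `model_type`, or raise a readable error."""
    if model_type not in mapping:
        raise ValueError(f"Unrecognized model type: {model_type!r}")
    return mapping[model_type]


def pick_image_processor(model_type):
    """Sketch assumption: use the fast class when present, else the slow one."""
    slow, fast = resolve(model_type, image_processor_names)
    return fast or slow


print(resolve("pp_chart2table", config_names))
print(resolve("pp_chart2table", model_names))
print(pick_image_processor("pp_chart2table"))
```

Keying every mapping on the same `model_type` string is what lets one entry in a checkpoint's config select the config, model, and processor classes together.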
30 changes: 30 additions & 0 deletions src/transformers/models/pp_chart2table/__init__.py
@@ -0,0 +1,30 @@
# Copyright 2026 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from typing import TYPE_CHECKING

from ...utils import _LazyModule
from ...utils.import_utils import define_import_structure


if TYPE_CHECKING:
    from .configuration_pp_chart2table import *
    from .image_processing_pp_chart2table_fast import *
    from .modeling_pp_chart2table import *
    from .processing_pp_chart2table import *
Comment thread
vasqu marked this conversation as resolved.
else:
    import sys

    _file = globals()["__file__"]
    sys.modules[__name__] = _LazyModule(__name__, _file, define_import_structure(_file), module_spec=__spec__)
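
The `else` branch above swaps the package's entry in `sys.modules` for a lazy module, so submodules are imported only when one of their names is first accessed. The sketch below illustrates that general idea and is not the `_LazyModule` implementation. `LazyModule`, `toy_lazy`, and the export table are hypothetical, and standard-library modules stand in for the model's submodules.

```py
import importlib
import sys
import types


class LazyModule(types.ModuleType):
    """Toy lazy module: import the module that owns a name on first attribute access."""

    def __init__(self, name, export_to_module):
        super().__init__(name)
        self._export_to_module = export_to_module

    def __getattr__(self, attr):
        # Called only when normal lookup fails, i.e. before the name has been cached.
        if attr not in self._export_to_module:
            raise AttributeError(f"module {self.__name__!r} has no attribute {attr!r}")
        module = importlib.import_module(self._export_to_module[attr])
        value = getattr(module, attr)
        setattr(self, attr, value)  # cache so later lookups bypass __getattr__
        return value


# Hypothetical export table mapping public names to the modules that define them.
lazy = LazyModule("toy_lazy", {"dumps": "json", "sqrt": "math"})
sys.modules["toy_lazy"] = lazy

import toy_lazy  # resolved from sys.modules, no file on disk is needed

print(toy_lazy.sqrt(9.0))        # 3.0
print("sqrt" in vars(toy_lazy))  # True: cached after the first access
```

The `TYPE_CHECKING` branch complements this: static type checkers see the eager star imports, while at runtime only the lazy path executes.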