[Model] Add PP-Chart2Table Model Support by XingweiDeng · Pull Request #43767 · huggingface/transformers

XingweiDeng · 2026-02-05T13:54:13Z

What does this PR do?

Fixes # (issue)

Before submitting

- This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

XingweiDeng · 2026-03-15T11:45:13Z

Hi @molbap @vasqu @yonigozlan, PP-Chart2Table is a model contributed by our team. Could you please take a look and review it when you have a moment? Thanks a lot!

zhang-prog · 2026-03-16T10:12:32Z

@vasqu This is PP-Chart2Table, one of the five models we plan to merge this week. PTAL.🤗

vasqu

Super solid! I only have a few smaller comments to align with a few smaller standards but overall nice job

vasqu · 2026-03-16T15:06:48Z

+model_path = "PaddlePaddle/PP-Chart2Table_safetensors"
+model = AutoModelForImageTextToText.from_pretrained(
+    model_path, 
+    dtype="float32",


Suggested change

dtype="float32",

vasqu · 2026-03-16T15:07:21Z

+    dtype="float32",
+    device_map="auto",
+)
+processor = AutoProcessor.from_pretrained(model_path, use_fast=True).to(model.device)


Suggested change

-processor = AutoProcessor.from_pretrained(model_path, use_fast=True).to(model.device)
+processor = AutoProcessor.from_pretrained(model_path).to(model.device)

not sure but I don't think we need this

vasqu · 2026-03-16T15:07:44Z

+image = Image.open(requests.get("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/chart_parsing_02.png", stream=True).raw)
+inputs = processor(images=image)
+
+generated_ids = model.generate(**inputs, use_cache=True, do_sample=False, max_new_tokens=256)


Suggested change

-generated_ids = model.generate(**inputs, use_cache=True, do_sample=False, max_new_tokens=256)
+generated_ids = model.generate(**inputs, do_sample=False, max_new_tokens=256)

same here, shouldn't be needed I think

vasqu · 2026-03-16T15:08:50Z

+model_path = "PaddlePaddle/PP-Chart2Table_safetensors"
+model = AutoModelForImageTextToText.from_pretrained(
+    model_path, 
+    dtype="float32",


Suggested change

dtype="float32",

vasqu · 2026-03-16T15:10:29Z

+## PPChart2TableVisionPreTrainedModel
+
+[[autodoc]] PPChart2TableVisionPreTrainedModel
+
+## PPChart2TablePreTrainedModel
+
+[[autodoc]] PPChart2TablePreTrainedModel


Suggested change

## PPChart2TableVisionPreTrainedModel

[[autodoc]] PPChart2TableVisionPreTrainedModel

## PPChart2TablePreTrainedModel

[[autodoc]] PPChart2TablePreTrainedModel

Nit: don't really need those exposed in the docs

vasqu · 2026-03-16T15:32:54Z

+    @unittest.skip(reason="PPChart2Table does not support this test.")
+    def test_model_is_small(self):
+        pass


Definitely shouldn't be skipped, let's shrink the model to make this pass. It's fairly important for our CI to run fast
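
To make the "shrink the model" request concrete, here is a rough back-of-the-envelope sketch of how the tester config sizes drive the parameter count. Everything in it is an assumption for illustration: the function name, the layer layout (four attention projections and a gated MLP, no biases), the example sizes, and the one-million-parameter budget are not taken from this PR or from the actual test.

```python
# Rough parameter-count estimate for a decoder-only text model.
# All names, sizes, and the budget below are hypothetical, not taken from the PR.

def estimate_decoder_params(vocab_size, hidden_size, intermediate_size,
                            num_layers, tie_word_embeddings=True):
    embeddings = vocab_size * hidden_size
    attention = 4 * hidden_size * hidden_size    # q, k, v, o projections, no biases
    mlp = 3 * hidden_size * intermediate_size    # gate, up, down projections
    norms = 2 * hidden_size                      # two norm weight vectors per layer
    total = embeddings + num_layers * (attention + mlp + norms)
    total += hidden_size                         # final norm weight
    if not tie_word_embeddings:
        total += vocab_size * hidden_size        # separate output head
    return total


PARAM_BUDGET = 1_000_000  # assumed "small model" threshold for unit tests

# A tiny tester-style config versus a production-sized one.
tiny = estimate_decoder_params(vocab_size=99, hidden_size=32,
                               intermediate_size=64, num_layers=2)
large = estimate_decoder_params(vocab_size=32000, hidden_size=1024,
                                intermediate_size=2816, num_layers=24)

print(tiny, tiny < PARAM_BUDGET)
print(large < PARAM_BUDGET)
```

In this sketch the vocabulary and hidden sizes dominate, so reducing those in the test config is usually the quickest way to get under a parameter budget.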

vasqu · 2026-03-16T15:33:16Z

+    @unittest.skip(
+        reason="PPChart2Table have reused the GotOcr2 model, which does not implement the latest logic for capturing attentions and hidden_states introduced in Transformers v5."
+    )
+    def test_get_image_features_attentions(self):
+        pass
+
+    @unittest.skip(
+        reason="PPChart2Table have reused the GotOcr2 model, which does not implement the latest logic for capturing attentions and hidden_states introduced in Transformers v5."
+    )
+    def test_get_image_features_hidden_states(self):
+        pass


They are not skipped in GotOcr2, so imo we should check what goes wrong

vasqu · 2026-03-16T15:34:29Z

+            **inputs,
+            use_cache=True,
+            do_sample=False,
+            max_new_tokens=1024,


I see that we properly end early but we probably should reduce this nonetheless

Suggested change

max_new_tokens=1024,

max_new_tokens=32,

just a wild guess
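
As an illustration of why `max_new_tokens` is only an upper bound when generation ends early, here is a toy greedy decoding loop. It is a simplified sketch, not the library's `generate` implementation; `greedy_generate`, the scripted toy model, and the token ids are all made up for this example.

```python
# Toy greedy decoding loop: stops at EOS or after max_new_tokens, whichever comes first.
# Hypothetical helper for illustration only, not the transformers implementation.

def greedy_generate(next_token_fn, prompt_ids, eos_token_id, max_new_tokens):
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        token = next_token_fn(ids)
        ids.append(token)
        if token == eos_token_id:
            break  # early stop: the remaining token budget is never used
    return ids[len(prompt_ids):]


EOS = 0
PROMPT = [1, 2]
SCRIPT = [5, 6, 7, EOS]  # the toy "model" always emits this sequence


def scripted_model(ids):
    # Pick the next scripted token based on how many tokens were generated so far.
    return SCRIPT[len(ids) - len(PROMPT)]


print(greedy_generate(scripted_model, PROMPT, EOS, max_new_tokens=1024))
print(greedy_generate(scripted_model, PROMPT, EOS, max_new_tokens=32))
print(greedy_generate(scripted_model, PROMPT, EOS, max_new_tokens=2))
```

With an early EOS, the caps of 1024 and 32 give the same output, while a cap below the true output length truncates it. A smaller cap mainly bounds the worst case if the model fails to emit EOS.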

vasqu · 2026-03-16T15:36:42Z

    "BambaConfig": ["attn_layer_indices"],
    "Dots1Config": ["max_window_layers"],
    "JambaConfig": ["attn_layer_offset", "attn_layer_period", "expert_layer_offset", "expert_layer_period"],
+    "PPChart2TableConfig": ["tie_word_embeddings"],


Suggested change

"PPChart2TableConfig": ["tie_word_embeddings"],

shouldn't be needed, we have excluded some common attributes with ATTRIBUTES_TO_ALLOW in the file
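
A minimal sketch of the idea behind that comment: a global allow-list makes per-config entries redundant for common attributes. Only the name `ATTRIBUTES_TO_ALLOW` comes from the review; its contents here, the `SPECIAL_CASES` dict, and the `unused_attributes` helper are assumptions for illustration and do not reproduce the real check script.

```python
# Simplified model of a "config attributes must be used" check.
# ATTRIBUTES_TO_ALLOW contents and the helper below are assumed for illustration.

ATTRIBUTES_TO_ALLOW = {"tie_word_embeddings", "initializer_range"}

# Per-config exceptions for attributes that are legitimately unused in modeling code.
SPECIAL_CASES = {
    "BambaConfig": {"attn_layer_indices"},
    "Dots1Config": {"max_window_layers"},
}


def unused_attributes(config_name, config_attrs, attrs_used_in_modeling):
    """Return config attributes that are neither used nor explicitly allowed."""
    allowed = ATTRIBUTES_TO_ALLOW | SPECIAL_CASES.get(config_name, set())
    return sorted(
        attr for attr in config_attrs
        if attr not in attrs_used_in_modeling and attr not in allowed
    )


# No per-config entry is needed: the global allow-list already covers the attribute.
print(unused_attributes(
    "PPChart2TableConfig",
    ["hidden_size", "tie_word_embeddings"],
    attrs_used_in_modeling={"hidden_size"},
))
```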

zucchini-nlp · 2026-03-19T13:45:39Z

+        if images is None:
+            raise ValueError("At least one of `images` must be provided")
+        image_inputs = self.image_processor(images=images, **output_kwargs["images_kwargs"])
+
+        # Prepare input ids for batch
+        if text is None:
+            raise ValueError("At least one of `text` must be provided")
+
+        if not isinstance(text, list):
+            text = [text]
+
+        input_ids = self.tokenizer(text, **output_kwargs["text_kwargs"]).input_ids
+


nit: except for errors, looks same as super().__call__. We can do

if text is None or images is None:
    raise ValueError("Both `images` and `text` must be provided")
return super().__call__(images=images, text=text, **kwargs)
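
The validate-then-delegate pattern suggested here can be sketched with plain Python classes. `BaseProcessor` and `ChartProcessor` are hypothetical stand-ins, not the real processor classes, and the base `__call__` only mimics the list-wrapping of `text` seen in the diff above.

```python
# Validate required inputs in the subclass, then delegate everything else to the parent.
# BaseProcessor and ChartProcessor are hypothetical stand-ins for illustration.

class BaseProcessor:
    def __call__(self, images=None, text=None, **kwargs):
        # Stand-in for the shared logic: normalize text to a list and bundle the inputs.
        if text is not None and not isinstance(text, list):
            text = [text]
        return {"images": images, "text": text, **kwargs}


class ChartProcessor(BaseProcessor):
    def __call__(self, images=None, text=None, **kwargs):
        # Only the model-specific requirement lives here.
        if text is None or images is None:
            raise ValueError("Both `images` and `text` must be provided")
        return super().__call__(images=images, text=text, **kwargs)


processor = ChartProcessor()
print(processor(images="<image>", text="Chart to table"))

try:
    processor(images="<image>")
except ValueError as err:
    print(err)
```

Keeping only the extra check in the subclass means later changes to the shared call logic are picked up automatically.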

vasqu · 2026-03-19T14:57:54Z

#43514 just got merged, we need to fixup the img processor a bit, let me know if I should step in

yonigozlan

Cc @vasqu changes needed after image proc refactor ;)

vasqu · 2026-03-19T17:44:17Z

run-slow: pp_chart2table

github-actions · 2026-03-19T17:45:32Z

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/pp_chart2table"]
quantizations: []

github-actions · 2026-03-19T18:04:52Z

CI Results

Workflow Run ⚙️

Commit Info

| Context | Commit | Description |
| --- | --- | --- |
| RUN | 3200c811 | workflow commit (merge commit) |
| PR | eb5c2a51 | branch commit (from PR) |
| main | e94695e5 | base commit (on `main`) |

Model CI Report

❌ 3 new failed tests from this PR 😭

pp_chart2table:
tests/models/pp_chart2table/test_modeling_pp_chart2table.py::PPChart2TableIntegrationTest::test_small_model_integration_test_pp_chart2table (✅ ⟹ ❌)
tests/models/pp_chart2table/test_modeling_pp_chart2table.py::PPChart2TableIntegrationTest::test_small_model_integration_test_pp_chart2table_batched (✅ ⟹ ❌)
tests/models/pp_chart2table/test_processing_pp_chart2table.py::PPChart2TableProcessorTest::test_ocr_queries (✅ ⟹ ❌)

vasqu · 2026-03-19T18:36:20Z

run-slow: pp_chart2table

github-actions · 2026-03-19T18:37:23Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, pp_chart2table

github-actions · 2026-03-19T18:37:52Z

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/pp_chart2table"]
quantizations: []

github-actions · 2026-03-19T18:48:02Z

CI Results

Workflow Run ⚙️

Commit Info

| Context | Commit | Description |
| --- | --- | --- |
| RUN | e8916c60 | workflow commit (merge commit) |
| PR | 081537cc | branch commit (from PR) |
| main | b96f8a98 | base commit (on `main`) |

✅ No failing test specific to this PR 🎉 👏 !

HuggingFaceDocBuilderDev · 2026-03-19T18:59:55Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

XingweiDeng added 17 commits February 5, 2026 21:52

init

8aa566b

fix doc

1a5908d

update

c51b1c6

update

5e3f1d3

update

2c064cc

update

81514c0

update

fc7c75f

update

01f2b29

update

d8cc881

update

117a6cb

Merge remote-tracking branch 'official/main' into feat/pp_chart2table

053f59e

refactor image_processor_fast

db1e9a8

update

d8763e5

update

3d8a654

update

4abb70d

update

1efd48b

update

779cbca

XingweiDeng added 10 commits March 15, 2026 19:53

update

d61079a

update

618c63c

update

1079052

update

974d3b1

update

3f01494

update

3b91e2d

merge transformers main

587652c

update

65b7d01

update

cc85b83

update

b419732

vasqu reviewed Mar 16, 2026

zucchini-nlp reviewed Mar 19, 2026

XingweiDeng added 4 commits March 19, 2026 22:33

update

3c0b028

update

b50607c

update

44529f7

merge

ca92529

XingweiDeng and others added 4 commits March 19, 2026 22:58

update

ba238c8

fixup after new refactor

e7401f0

fix

28653c3

update

d7d8ee8

XingweiDeng closed this Mar 19, 2026

XingweiDeng reopened this Mar 19, 2026

update

d71e07b

yonigozlan reviewed Mar 19, 2026

Comment thread tests/models/pp_chart2table/test_image_processing_pp_chart2table.py Outdated

Comment thread tests/models/pp_chart2table/test_image_processing_pp_chart2table.py Outdated

Comment thread tests/models/pp_chart2table/test_image_processing_pp_chart2table.py Outdated

vasqu and others added 2 commits March 19, 2026 18:22

last fixups

c095f11

update

eb5c2a5

Merge branch 'main' into feat/pp_chart2table

081537c

remove my todos I left there

bcccd9d

vasqu enabled auto-merge March 19, 2026 18:50

vasqu added this pull request to the merge queue Mar 19, 2026

Merged via the queue into huggingface:main with commit aa1c36f Mar 19, 2026
28 checks passed
