
[llava] one pixel is missing from padding when length is odd#37819

Merged
zucchini-nlp merged 15 commits into huggingface:main from cyr0930:fix/one_pixel_off
May 6, 2025
Conversation


@cyr0930 cyr0930 commented Apr 28, 2025

One pixel is missing from the padding when the length is odd in the llava family and the aria model, which differs from the logic in the LLaVA-NeXT repo. In LLaVA-NeXT this is handled automatically, since its logic resizes the image first and then pastes (overwrites) the original image onto it (https://github.com/LLaVA-VL/LLaVA-NeXT/blob/main/llava/mm_utils.py#L186).
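The off-by-one can be avoided by deriving the second pad side from the total rather than reusing the floor-half. A minimal sketch of the idea (not the transformers implementation):

```python
import numpy as np

def pad_to_square(image: np.ndarray, target: int) -> np.ndarray:
    """Center-pad an H x W x C image to target x target with zeros."""
    h, w = image.shape[:2]
    pad_top = (target - h) // 2
    pad_left = (target - w) // 2
    # Derive the remaining side from the total instead of reusing the
    # floor-half: when (target - h) is odd, pad_bottom gets the extra
    # pixel rather than dropping it.
    pad_bottom = target - h - pad_top
    pad_right = target - w - pad_left
    return np.pad(image, ((pad_top, pad_bottom), (pad_left, pad_right), (0, 0)))
```

With h = 5 and target = 8, pad_top is 1 and pad_bottom is 2; computing both sides as (target - h) // 2 would give 1 + 1 and come up one pixel short.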

Also add a vision_aspect_ratio argument to LlavaOnevisionImageProcessor to regulate max_num_patches during unpadding; it was hard-coded to "9" before. (https://github.com/LLaVA-VL/LLaVA-NeXT/blob/main/llava/model/llava_arch.py#L387)
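For context, the "anyres_max_N" string encodes the patch budget. A hypothetical parser for it (not the PR's actual code) could look like:

```python
def max_patches_from_aspect_ratio(vision_aspect_ratio: str = "anyres_max_9") -> int:
    """Extract N from an "anyres_max_N" aspect-ratio string.

    removeprefix is used here instead of str.strip("anyres_max_"):
    strip removes a *set* of characters from both ends and only works
    for this string by coincidence (digits are not in that set).
    """
    return int(vision_aspect_ratio.removeprefix("anyres_max_"))
```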

And fix the typo "processinf" to "processing" in the documentation.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@amyeroberts, @qubvel

@github-actions github-actions bot marked this pull request as draft April 28, 2025 03:13
@github-actions

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

@qubvel

qubvel commented Apr 28, 2025

cc @zucchini-nlp


@zucchini-nlp zucchini-nlp left a comment


Thanks for submitting PR, I agree it has to be padded in square. Can you update the model code as well, where we do unpadding and mark the PR as ready for review when finalized?

Comment on lines +270 to +266
max_num_patches = int(vision_aspect_ratio.strip("anyres_max_"))
ratio = math.sqrt(current_height * current_width / (max_num_patches * patches_height**2))
Member


afaik we don't have hf-style checkpoints with different anyres patching. In that case there's no need to support since we try to not bloat code with unused features


@cyr0930 cyr0930 May 2, 2025


Oh, I understand your intention.
However, I think supporting the "vision_aspect_ratio" argument is a good idea for the sake of consistency, as the following code already allows a variable vision_aspect_ratio.

https://github.com/huggingface/transformers/blob/v4.51.3/src/transformers/models/llava_onevision/modeling_llava_onevision.py#L450


@zucchini-nlp zucchini-nlp May 2, 2025


Ah oke, if code uses different anyres we can keep it :)


cyr0930 commented May 2, 2025

Thanks for submitting PR, I agree it has to be padded in square. Can you update the model code as well, where we do unpadding and mark the PR as ready for review when finalized?

Okay I will. Thanks for the review tho.

@cyr0930 cyr0930 force-pushed the fix/one_pixel_off branch from ed31cb5 to cc581ab Compare May 2, 2025 07:04
@cyr0930 cyr0930 marked this pull request as ready for review May 2, 2025 07:15

cyr0930 commented May 2, 2025

Maybe the logic in the LLaVA-NeXT project should be fixed too


@zucchini-nlp zucchini-nlp left a comment


Thanks a lot, looks good to me! Hm, I just checked the LLaVA-NeXT repo and it seems they unpad the image by removing the same number of pixels from the height/width. We try to stay close to the original implementation, so it would be great to flag it to the authors, yeah

Also, could you add a small test for edge cases with odd shapes? As an example, see a previous issue (#34522) on an edge-case image size where the model fails to unpad correctly. After testing, we're good to merge :)
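Such an edge-case test might be sketched as follows (hypothetical shapes and helpers, not the test actually added by the PR): pad an odd-sized image to a square, unpad with the mirrored arithmetic, and check that the round trip restores the original.

```python
import numpy as np

def pad_to_square(image, target):
    # Center-pad; the odd leftover pixel goes to the bottom/right side.
    h, w = image.shape[:2]
    top, left = (target - h) // 2, (target - w) // 2
    return np.pad(image, ((top, target - h - top), (left, target - w - left), (0, 0)))

def unpad(image, orig_h, orig_w):
    # Mirror of the padding arithmetic: crop starts at the floor-half offset.
    h, w = image.shape[:2]
    top, left = (h - orig_h) // 2, (w - orig_w) // 2
    return image[top:top + orig_h, left:left + orig_w]

# Odd dimensions exercise the off-by-one path.
img = np.ones((335, 584, 3))
padded = pad_to_square(img, 584)
assert padded.shape == (584, 584, 3)
restored = unpad(padded, 335, 584)
assert np.array_equal(restored, img)
```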

@cyr0930 cyr0930 force-pushed the fix/one_pixel_off branch from ea998a8 to 63bd86e Compare May 6, 2025 06:32

cyr0930 commented May 6, 2025

Looks like "vision_aspect_ratio" is better placed in the processor than in the image_processor

@cyr0930 cyr0930 requested a review from zucchini-nlp May 6, 2025 09:31

@zucchini-nlp zucchini-nlp left a comment


Oke, sounds good to keep in processor. Thanks for the tests!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp zucchini-nlp merged commit acded47 into huggingface:main May 6, 2025
20 checks passed
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025
…face#37819)

* [fix] one pixel should be added when length is odd

* [fix] add vision_aspect_ratio args & typo

* [fix] style

* [fix] do not fix fast file directly

* [fix] convert using modular

* remove duplicate codes

* match unpad logic with pad logic

* test odd-sized images for llava & aria

* test unpad odd-sized padding for llava family

* fix style

* add kwarg to onvision modular

* move vision_aspect_ratio from image_processor to processor
(llava_onevision)


4 participants