Improve BatchFeature: stack list and lists of torch tensors #42750

Merged
yonigozlan merged 5 commits into huggingface:main from yonigozlan:improve-batch-feature
Dec 12, 2025

Conversation

@yonigozlan yonigozlan commented Dec 9, 2025

Contributor

I have been wanting to change this for a while. It shouldn't be a breaking change; it aligns what BatchFeature supports between numpy arrays and torch tensors.
The issue was that np.array() works on lists of arrays (and even nested lists of arrays), but torch.tensor() doesn't, so calling BatchFeature with lists of tensors raised an error. I haven't added support for nested lists of tensors, as I haven't seen the need anywhere.
Also, the errors we were getting were very generic and not very useful; this should help with that.

Needed for #41388
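
To make the discrepancy described above concrete, here is a minimal sketch (not code from the PR; the variable names are illustrative). It assumes numpy and torch are installed and shows that np.array() batches a list of arrays, that torch.tensor() rejects a list of multi-element tensors, and that torch.stack() produces the batched result instead.

```python
import numpy as np
import torch

# Two "images" of identical shape, once as numpy arrays and once as torch tensors.
arrays = [np.zeros((2, 3)), np.ones((2, 3))]
tensors = [torch.zeros(2, 3), torch.ones(2, 3)]

# np.array() accepts a list of equally shaped arrays and adds a leading batch dimension.
batched_np = np.array(arrays)
print(batched_np.shape)

# torch.tensor() does not accept a list of multi-element tensors.
try:
    torch.tensor(tensors)
    torch_tensor_failed = False
except (ValueError, TypeError, RuntimeError) as err:
    torch_tensor_failed = True
    print(f"torch.tensor() failed: {err}")

# torch.stack() is the torch counterpart for batching a list of tensors.
batched_pt = torch.stack(tensors)
print(tuple(batched_pt.shape))
```

Both batched results have the same shape and values, which is the alignment between the numpy and torch paths that the description refers to.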

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp zucchini-nlp left a comment

Member

Great PR! Left a few questions to make sure I got it right

Comment thread src/transformers/models/eomt/image_processing_eomt_fast.py
Comment on lines 92 to 95
BatchFeature class for Fuyu image processor and processor.

The outputs dictionary from the processors contains a mix of tensors and lists of tensors.
"""

Member

looks like Fuyu has its own BatchFeature because we couldn't skip conversion or convert nested lists? 🤔 Can you inspect and maybe remove it if possible, now that the base class supports skipping?

Contributor Author

Yes, I can do that in another PR! I haven't looked in much detail at what the Fuyu batch feature does, but it would be great to get rid of it.

Comment on lines +122 to +127

# stack list of tensors if tensor_type is PyTorch (# torch.tensor() does not support list of tensors)
if isinstance(value, (list, tuple)) and len(value) > 0 and torch.is_tensor(value[0]):
    return torch.stack(value)

# convert list of numpy arrays to numpy array (stack) if tensor_type is Numpy

Member

I dunno if you saw the PR. A community member noticed that VideoMetadata objects throw an error when the return type is 'pt', because they can't be converted to tensors.

I think we can add the fix here by checking whether value is a list/array/etc. and exiting early otherwise. We won't be able to convert non-list objects anyway.

Contributor Author

Agreed. I was just wondering why we restricted BatchFeature to only contain array/tensor structures in the first place, to make sure we wouldn't break an important assumption by silently allowing other objects in BatchFeature.
Also, these changes should be made along with changes to the ".to()" method, no?

Member

I think so, and we never had a variety of model inputs in the past. Usually whatever is output from processor goes directly in forward, so 99% chance it's an array-like object

IMO we can break the assumption now

Contributor Author

Ok sgtm, I might do that in another PR though, as this would be quite a big change, and it might be lost in the 63+ files modified here just due to allowing stacking tensors
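
The direction discussed in this thread (stack lists of tensors, skip objects that cannot be converted, and report which key failed) could look roughly like the sketch below. convert_value and FakeMetadata are hypothetical names used only for illustration; this is not the BatchFeature implementation, and the early exit for non-array-like objects was deferred to a follow-up PR.

```python
from dataclasses import dataclass

import numpy as np
import torch


@dataclass
class FakeMetadata:
    """Hypothetical stand-in for an object that cannot become a tensor."""

    fps: float


def convert_value(key, value):
    """Hypothetical helper: convert one dictionary entry to a torch tensor."""
    # Already a tensor: nothing to convert.
    if torch.is_tensor(value):
        return value
    # Early exit: leave anything that is not list-like, an array, or a number untouched.
    if not isinstance(value, (list, tuple, np.ndarray, int, float)):
        return value
    try:
        # torch.tensor() does not accept a list of tensors, so stack them instead.
        if isinstance(value, (list, tuple)) and len(value) > 0 and torch.is_tensor(value[0]):
            return torch.stack(list(value))
        return torch.tensor(value)
    except (ValueError, TypeError, RuntimeError) as err:
        # Name the offending key so the failure is easier to track down.
        raise ValueError(
            f"Unable to convert '{key}' to a tensor; check that all items have the same shape."
        ) from err


stacked = convert_value("pixel_values", [torch.zeros(3, 4), torch.ones(3, 4)])
metadata = FakeMetadata(fps=24.0)
untouched = convert_value("video_metadata", metadata)
print(tuple(stacked.shape), untouched is metadata)
```

Passing tensors of different shapes under the same key raises a ValueError that mentions the key, rather than a generic conversion error.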


  mean_value = round(pixel_values.mean().item(), 4)
- self.assertEqual(mean_value, 0.2353)
+ self.assertEqual(mean_value, -0.2303)

Member

to make sure: was it a typo or did this PR change pixel outputs?

Contributor Author

I'll have to check on a CI runner whether that value is correct, but this test failed on main on my end (using an A10).

Comment thread tests/utils/test_feature_extraction_utils.py
@github-actions

Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: beit, bridgetower, cohere2_vision, convnext, deepseek_vl, deepseek_vl_hybrid, depth_pro, dinov3_vit, donut, dpt, efficientloftr, efficientnet, eomt, flava, fuyu, gemma3

@yonigozlan yonigozlan force-pushed the improve-batch-feature branch from e2ceac2 to 875c36e Compare December 10, 2025 18:33
@yonigozlan yonigozlan changed the title from "Improve BatchFeature: stack list and nested lists of torch tensors" to "Improve BatchFeature: stack list and lists of torch tensors" Dec 10, 2025
@github-actions

Contributor

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=42750&sha=e2ceac

@zucchini-nlp zucchini-nlp left a comment

Member

Oke, thanks, let's not forget about Fuyu and video metadata in subsequent PRs

@Cyrilvallez Cyrilvallez left a comment

Member

Nice! LGTM

@yonigozlan yonigozlan merged commit a61aba5 into huggingface:main Dec 12, 2025
32 checks passed
yonigozlan added a commit to yonigozlan/transformers that referenced this pull request Dec 12, 2025
…ace#42750)

* stack lists of tensors in BatchFeature, improve error messages, add tests

* remove unnecessary stack in fast image processors and video processors

* make style

* fix tests
ArthurZucker pushed a commit that referenced this pull request Jan 21, 2026
* remove attributes and add all missing sub processors to their auto classes

* remove all mentions of .attributes

* cleanup

* fix processor tests

* fix modular

* remove last attributes

* fixup

* fixes after merge

* fix wrong tokenizer in auto florence2

* fix missing audio_processor + nits

* Override __init__ in NewProcessor and change hf-internal-testing-repo (temporarily)

* fix auto tokenizer test

* add init to markup_lm

* update CustomProcessor in custom_processing

* remove print

* nit

* refactor processor tests first part

* refactor part 2

* fix test modeling owlv2

* fix test_processing_layoutxlm

* Fix owlv2, wav2vec2, markuplm, voxtral issues

* part3

* refactor all processor with mixin

* add support for loading and saving multiple tokenizer natively

* remove exclude_attributes from save_pretrained

* get processor from pretrained instead of components in tests

* skip tests in colqwen2, pixtral

* modifs after review

* fix style and copies

* Fix after review

* add test_processor_from_pretrained_vs_from_components, fix failing tests

* fix overflowing_tokens tests

* add config for layoutxlm

* fix ci

* use modular

* fic docstring

* Fix most tests

* Standardize mgp_str tests

* fix oneformer processing tests + fix copies

* fix after review

* fix missing fet_images in fast image processors

* fix 01 - to check

* fix 02 - to check

* fix 03 - to check

* fix 03 - to check

* fix 03 - to check

* fix 04 - to check

* fix 05 - to check

* fix 06 - sytle

* fix 07 - revert

* Fix some errors

* Improve BatchFeature: stack list and lists of torch tensors (#42750)

* stack lists of tensors in BatchFeature, improve error messages, add tests

* remove unnecessary stack in fast image processors and video processors

* make style

* fix tests

* fix remaining tests

* fix copies

* Fix Lfm2_vl im proc test

* nit after review

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request Jan 23, 2026
SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request Jan 23, 2026