
Conversation

@akacmazz

What does this PR do?

This commit introduces GLPNImageProcessorFast, a PyTorch-optimized image processor for GLPN models with enhanced multi-channel support.

Key improvements:

  • Added GLPNImageProcessorFast class with native PyTorch tensor processing
  • Enhanced support for 1, 3, and 4-channel images (including RGBA)
  • Optimized preprocessing pipeline using torchvision transforms
  • Updated GLPNImageProcessor to support 4-channel inference
  • Added comprehensive tests for multi-channel image processing
  • Added proper documentation for the new processor

The fast processor leverages PyTorch tensors throughout the processing pipeline, providing better performance and memory efficiency compared to the PIL-based approach. Both processors now support variable channel dimensions for improved flexibility.

Technical details:

  • Uses torchvision.transforms for efficient tensor-based preprocessing
  • Implements proper channel dimension handling with infer_channel_dimension_format(num_channels=(1, 3, 4))
  • Maintains API compatibility with existing GLPNImageProcessor
  • Provides significant performance improvements for PyTorch workflows

This enhancement enables GLPN models to work seamlessly with RGBA images and other multi-channel inputs, which is particularly useful for computer vision applications involving images with transparency channels.
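For illustration (this is not code from the PR), the rounding-down behavior driven by `size_divisor` boils down to integer arithmetic:

```python
def round_down_to_multiple(value: int, divisor: int) -> int:
    """Round value down to the closest multiple of divisor."""
    return (value // divisor) * divisor

# With size_divisor=32, (height, width) = (260, 170) becomes (256, 160),
# the behavior described for GLPN's resize step.
height, width = 260, 170
new_hw = (round_down_to_multiple(height, 32), round_down_to_multiple(width, 32))
print(new_hw)  # (256, 160)
```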

Fixes #36978

Who can review?

@yonigozlan, @amyeroberts, @qubvel

This PR focuses on vision models (GLPN) and adds a new fast image processor with enhanced channel support. The changes include both the core implementation and comprehensive testing.

@github-actions (Contributor)

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, glpn

@yonigozlan (Member) left a comment


Hey @akacmazz, thanks for contributing! Quite a few simplifications are possible here!

    if input_data_format is None:
        # We assume that all images have the same channel dimension format.
-       input_data_format = infer_channel_dimension_format(images[0])
+       input_data_format = infer_channel_dimension_format(images[0], num_channels=(1, 3, 4))


Not sure why that would be needed if the numpy 4 channels test was passing.
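For context, a toy sketch (not the transformers implementation) of why the `num_channels` argument can matter: the inference heuristic checks whether the first or last axis length is an allowed channel count, so a 4-channel image is only recognized when 4 is in the allowed set:

```python
def infer_channel_dim(shape, num_channels=(1, 3)):
    """Toy sketch of channel-dimension inference for a 3D image shape.

    Not the real transformers function; it only illustrates the role of
    the num_channels allow-list.
    """
    first, last = shape[0], shape[-1]
    if first in num_channels and last not in num_channels:
        return "channels_first"
    if last in num_channels and first not in num_channels:
        return "channels_last"
    if first in num_channels:  # ambiguous case: prefer channels-first
        return "channels_first"
    raise ValueError(f"Unable to infer channel dimension from shape {shape}")

# An RGBA image is rejected with the default allow-list, but accepted
# once 4 is added:
infer_channel_dim((4, 64, 64), num_channels=(1, 3, 4))  # "channels_first"
```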

Comment on lines +32 to +33
if is_torchvision_available():
    from torchvision.transforms import functional as F


use v2 if available
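One common shape for this (a sketch; transformers has its own `is_torchvision_v2_available` helper, and the function name here is illustrative):

```python
import importlib.util

def get_torchvision_functional():
    """Return torchvision's functional transforms, preferring the v2 API.

    Falls back to the v1 namespace on older torchvision, and returns None
    when torchvision is not installed at all.
    """
    if importlib.util.find_spec("torchvision") is None:
        return None
    try:
        # The v2 namespace ships with torchvision >= 0.15
        from torchvision.transforms.v2 import functional as F
    except ImportError:
        from torchvision.transforms import functional as F
    return F
```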



@auto_docstring
@requires(backends=("torchvision", "torch"))


No need

Suggested change
@requires(backends=("torchvision", "torch"))

Comment on lines +42 to +57
r"""
Constructs a fast GLPN image processor using PyTorch and TorchVision.

Args:
    do_resize (`bool`, *optional*, defaults to `True`):
        Whether to resize the image's (height, width) dimensions, rounding them down to the closest multiple of
        `size_divisor`. Can be overridden by `do_resize` in `preprocess`.
    size_divisor (`int`, *optional*, defaults to 32):
        When `do_resize` is `True`, images are resized so their height and width are rounded down to the closest
        multiple of `size_divisor`. Can be overridden by `size_divisor` in `preprocess`.
    resample (`PIL.Image` resampling filter, *optional*, defaults to `PILImageResampling.BILINEAR`):
        Resampling filter to use if resizing the image. Can be overridden by `resample` in `preprocess`.
    do_rescale (`bool`, *optional*, defaults to `True`):
        Whether or not to apply the scaling factor (to make pixel values floats between 0. and 1.). Can be
        overridden by `do_rescale` in `preprocess`.
"""


Handled by auto_docstring

Comment on lines +61 to +74
def __init__(
    self,
    do_resize: bool = True,
    size_divisor: int = 32,
    resample=PILImageResampling.BILINEAR,
    do_rescale: bool = True,
    **kwargs,
) -> None:
    self.do_resize = do_resize
    self.do_rescale = do_rescale
    self.size_divisor = size_divisor
    self.resample = resample
    self.rescale_factor = 1 / 255
    super().__init__(**kwargs)


This is not how we handle kwargs in fast image processors. Also, size_divisor is a custom kwarg, so let's add it to a GLPNFastImageProcessorKwargs class. Look at other fast image processors to see how it's done.
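A rough, self-contained sketch of the pattern the reviewer is pointing at (real fast processors inherit from a shared base class and kwargs TypedDict in transformers; the class names and mechanics below are illustrative only):

```python
from typing import Optional, TypedDict

class GLPNFastImageProcessorKwargs(TypedDict, total=False):
    # Custom kwarg specific to GLPN, declared on top of the shared defaults.
    size_divisor: Optional[int]

class GLPNImageProcessorFastSketch:
    # Defaults live as class attributes; a shared base __init__ copies any
    # recognized kwargs onto the instance instead of a hand-written __init__.
    do_resize = True
    do_rescale = True
    size_divisor = 32
    valid_kwargs = GLPNFastImageProcessorKwargs

    def __init__(self, **kwargs):
        for key in self.valid_kwargs.__annotations__:
            if key in kwargs:
                setattr(self, key, kwargs[key])

processor = GLPNImageProcessorFastSketch(size_divisor=64)
print(processor.size_divisor)  # 64
```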

Comment on lines +83 to +99
"""
Resize the image, rounding the (height, width) dimensions down to the closest multiple of size_divisor.

If the image is of dimension (3, 260, 170) and size_divisor is 32, the image will be resized to (3, 256, 160).

Args:
    image (`torch.Tensor`):
        The image to resize.
    size_divisor (`int`):
        The image is resized so its height and width are rounded down to the closest multiple of
        `size_divisor`.
    interpolation (`F.InterpolationMode`, *optional*):
        Resampling filter to use when resizing the image e.g. `F.InterpolationMode.BILINEAR`.

Returns:
    `torch.Tensor`: The resized image.
"""


No need, self explanatory

Suggested change
"""
Resize the image, rounding the (height, width) dimensions down to the closest multiple of size_divisor.
If the image is of dimension (3, 260, 170) and size_divisor is 32, the image will be resized to (3, 256, 160).
Args:
    image (`torch.Tensor`):
        The image to resize.
    size_divisor (`int`):
        The image is resized so its height and width are rounded down to the closest multiple of
        `size_divisor`.
    interpolation (`F.InterpolationMode`, *optional*):
        Resampling filter to use when resizing the image e.g. `F.InterpolationMode.BILINEAR`.
Returns:
    `torch.Tensor`: The resized image.
"""

Comment on lines +114 to +155
def _process_image(
    self,
    image: "torch.Tensor",
    do_convert_rgb: bool = True,
    input_data_format: Optional[Union[str, ChannelDimension]] = None,
    device: Optional["torch.device"] = None,
) -> "torch.Tensor":
    """
    Process a single image tensor, supporting variable channel dimensions including 4-channel images.

    Overrides the base class method to support 1, 3, and 4 channel images.
    """
    if is_torch_available():
        import torch
        from torchvision.transforms.functional import pil_to_tensor

    # Convert PIL image to tensor if needed
    if isinstance(image, torch.Tensor):
        # Already a tensor, just ensure it's float
        if image.dtype != torch.float32:
            image = image.float()
    elif hasattr(image, "mode") and hasattr(image, "size"):  # PIL Image
        image = pil_to_tensor(image).float()
    else:
        # Assume it's numpy array
        image = torch.from_numpy(image)
        if image.dtype != torch.float32:
            image = image.float()

    # Infer the channel dimension format if not provided, supporting 1, 3, and 4 channels
    if input_data_format is None:
        input_data_format = infer_channel_dimension_format(image, num_channels=(1, 3, 4))

    if input_data_format == ChannelDimension.LAST:
        # We force the channel dimension to be first for torch tensors as this is what torchvision expects.
        image = image.permute(2, 0, 1).contiguous()

    # Now that we have torch tensors, we can move them to the right device
    if device is not None:
        image = image.to(device)

    return image


No need for that, handled by the parent class

Comment on lines +166 to +179
attrs_to_remove = [
    "crop_size",
    "do_center_crop",
    "do_normalize",
    "image_mean",
    "image_std",
    "do_convert_rgb",
    "size",
    "input_data_format",
    "device",
    "return_tensors",
    "disable_grouping",
    "rescale_factor",
]


most of these shouldn't be removed

Comment on lines +186 to +319
@auto_docstring
def preprocess(
    self,
    images: ImageInput,
    do_resize: Optional[bool] = None,
    size_divisor: Optional[int] = None,
    resample=None,
    do_rescale: Optional[bool] = None,
    return_tensors: Optional[Union[TensorType, str]] = None,
    data_format: ChannelDimension = ChannelDimension.FIRST,
    input_data_format: Optional[Union[str, ChannelDimension]] = None,
    **kwargs,
) -> BatchFeature:
    r"""
    Preprocess the given images.

    Args:
        images (`ImageInput`):
            Images to preprocess. Expects a single or batch of images with pixel values ranging from 0 to 255. If
            passing in images with pixel values between 0 and 1, set `do_rescale=False`.
        do_resize (`bool`, *optional*, defaults to `self.do_resize`):
            Whether to resize the input such that the (height, width) dimensions are a multiple of `size_divisor`.
        size_divisor (`int`, *optional*, defaults to `self.size_divisor`):
            When `do_resize` is `True`, images are resized so their height and width are rounded down to the
            closest multiple of `size_divisor`.
        resample (`PIL.Image` resampling filter, *optional*, defaults to `self.resample`):
            `PIL.Image` resampling filter to use if resizing the image e.g. `PILImageResampling.BILINEAR`. Only has
            an effect if `do_resize` is set to `True`.
        do_rescale (`bool`, *optional*, defaults to `self.do_rescale`):
            Whether or not to apply the scaling factor (to make pixel values floats between 0. and 1.).
        return_tensors (`str` or `TensorType`, *optional*):
            The type of tensors to return. Can be one of:
            - `None`: Return a list of `torch.Tensor`.
            - `TensorType.PYTORCH` or `'pt'`: Return a batch of type `torch.Tensor`.
        data_format (`ChannelDimension` or `str`, *optional*, defaults to `ChannelDimension.FIRST`):
            The channel dimension format for the output image. Can be one of:
            - `ChannelDimension.FIRST`: image in (num_channels, height, width) format.
            - `ChannelDimension.LAST`: image in (height, width, num_channels) format.
        input_data_format (`ChannelDimension` or `str`, *optional*):
            The channel dimension format for the input image. If unset, the channel dimension format is inferred
            from the input image. Can be one of:
            - `"channels_first"` or `ChannelDimension.FIRST`: image in (num_channels, height, width) format.
            - `"channels_last"` or `ChannelDimension.LAST`: image in (height, width, num_channels) format.
            - `"none"` or `ChannelDimension.NONE`: image in (height, width) format.
    """
    do_resize = do_resize if do_resize is not None else self.do_resize
    do_rescale = do_rescale if do_rescale is not None else self.do_rescale
    size_divisor = size_divisor if size_divisor is not None else self.size_divisor
    resample = resample if resample is not None else self.resample

    # Convert PIL resampling to torchvision InterpolationMode
    if is_torchvision_available():
        from ...image_utils import pil_torch_interpolation_mapping

        interpolation = (
            pil_torch_interpolation_mapping[resample]
            if isinstance(resample, (PILImageResampling, int))
            else resample
        )
    else:
        interpolation = F.InterpolationMode.BILINEAR

    # Prepare images
    images = self._prepare_image_like_inputs(
        images=images,
        do_convert_rgb=False,  # Don't force RGB conversion to support variable channels
        input_data_format=input_data_format,
    )

    return self._preprocess(
        images=images,
        do_resize=do_resize,
        size_divisor=size_divisor,
        interpolation=interpolation,
        do_rescale=do_rescale,
        rescale_factor=self.rescale_factor,
        return_tensors=return_tensors,
        **kwargs,
    )

def _preprocess(
    self,
    images: list["torch.Tensor"],
    do_resize: bool,
    size_divisor: Optional[int],
    interpolation: Optional["F.InterpolationMode"],
    do_rescale: bool,
    rescale_factor: float,
    return_tensors: Optional[Union[str, TensorType]],
    **kwargs,
) -> BatchFeature:
    """
    Preprocess the images for GLPN.

    Args:
        images (`list[torch.Tensor]`):
            List of images to preprocess.
        do_resize (`bool`):
            Whether to resize the images.
        size_divisor (`int`, *optional*):
            Size divisor for resizing. If None, uses self.size_divisor.
        interpolation (`F.InterpolationMode`, *optional*):
            Interpolation mode for resizing.
        do_rescale (`bool`):
            Whether to rescale pixel values to [0, 1].
        rescale_factor (`float`):
            Factor to rescale pixel values by.
        return_tensors (`str` or `TensorType`, *optional*):
            Type of tensors to return.

    Returns:
        `BatchFeature`: Processed images in a BatchFeature.
    """
    if size_divisor is None:
        size_divisor = self.size_divisor

    processed_images = []

    for image in images:
        # Resize if needed
        if do_resize:
            image = self.resize(image, size_divisor=size_divisor, interpolation=interpolation)

        # Rescale to [0, 1] if needed
        if do_rescale:
            image = self.rescale(image, scale=rescale_factor)

        processed_images.append(image)

    # Stack images into a batch if return_tensors is specified
    if return_tensors:
        processed_images = torch.stack(processed_images, dim=0)

    return BatchFeature(data={"pixel_values": processed_images}, tensor_type=return_tensors)


This shouldn't need to be overridden, as the preprocessing is quite simple. Parameters just need to be set accordingly (do_normalize=False etc.)
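In other words (an illustrative sketch, not the final implementation), declaring class-level defaults should let the shared fast-processor pipeline skip normalization and cropping without any preprocess override:

```python
# Illustrative class-attribute defaults; the attribute names follow the
# conventions of transformers fast image processors, but this is a sketch,
# not the merged GLPN implementation.
class GLPNFastDefaults:
    do_resize = True
    do_rescale = True
    rescale_factor = 1 / 255
    do_normalize = False     # GLPN rescales to [0, 1] but applies no mean/std
    do_center_crop = False   # no fixed crop; output size is driven by size_divisor
    size_divisor = 32
```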

@yonigozlan (Member)

Closing this PR as #41725 is about to be merged

@yonigozlan closed this Nov 3, 2025


Development

Successfully merging this pull request may close these issues.

[Contributions Welcome] Add Fast Image Processors
