Add Image Processor Fast Deformable DETR #34353

Merged
yonigozlan merged 10 commits into huggingface:main from yonigozlan:add-copied-detr-fast
Nov 19, 2024

@yonigozlan
Member

@yonigozlan yonigozlan commented Oct 23, 2024

What does this PR do?

Adds a fast image processor for Deformable DETR. Follows issue #33810.
This image processor is a result of this work on comparing different image processing methods.

The diffs look bad but this PR is almost exclusively made up of # Copied from based on the fast image processor for DETR!

Implementation

See #34063

Usage

Except for the fact that it only returns torch tensors, this fast processor is fully compatible with the current one.
It can be instantiated through AutoImageProcessor with use_fast=True, or through the class directly:

from transformers import AutoImageProcessor

processor = AutoImageProcessor.from_pretrained("SenseTime/deformable-detr", use_fast=True)

from transformers import DeformableDetrImageProcessorFast

processor = DeformableDetrImageProcessorFast.from_pretrained("SenseTime/deformable-detr")

Usage is the same as the current processor, except for the device kwarg:

from torchvision.io import read_image
from transformers import DeformableDetrImageProcessorFast

images = read_image(image_path)  # image_path: path to an image file on disk
processor = DeformableDetrImageProcessorFast.from_pretrained("SenseTime/deformable-detr")
images_processed = processor(images, return_tensors="pt", device="cuda")

If device is not specified:

  • If the input images are tensors, the processing will be done on the device of the images.
  • If the inputs are PIL or Numpy images, the processing is done on CPU.
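The fallback rules above can be sketched in plain Python. This is only an illustration of the described behaviour, not the processor's actual code: `resolve_device` and `TensorLike` are hypothetical names, and `TensorLike` stands in for a torch tensor so the sketch runs without torch installed.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class TensorLike:
    """Hypothetical stand-in for a torch.Tensor; only `device` matters here."""
    device: str


def resolve_device(images: Any, device: Optional[str] = None) -> str:
    """Pick the processing device following the rules described above."""
    # An explicitly passed device is used as-is.
    if device is not None:
        return device
    # Inspect the first image of a batch, or the single image itself.
    if isinstance(images, (list, tuple)):
        first = images[0] if images else None
    else:
        first = images
    # Tensor inputs are processed on the device they already live on.
    if isinstance(first, TensorLike):
        return first.device
    # PIL or NumPy inputs (anything that is not a tensor) fall back to CPU.
    return "cpu"
```

With this sketch, `resolve_device(TensorLike("cuda:0"))` gives `"cuda:0"`, while a non-tensor input with no `device` argument gives `"cpu"`.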

Performance gains

  • Average over 100 runs on the same 480x640 image. No padding needed, as "all" the images have the same size.

[Figure: benchmark_results_full_pipeline_deformable_detr_fast_single]


  • Average over 10% of the COCO 2017 validation dataset, with batch_size=8. Padding is forced to 1333x1333 (="longest_edge"), because torch.compile otherwise needs to recompile whenever batches have different maximum sizes.

[Figure: benchmark_results_full_pipeline_deformable_detr_fast_batched_compiled]


  • Average over 10% of the COCO 2017 validation dataset, with batch_size=1. Forcing padding to 1333x1333.
    [Figure: benchmark_results_full_pipeline_deformable_detr_fast_padded]
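To illustrate why the batched benchmarks force a fixed padding size, here is a small sketch in plain Python. It assumes that each batch is otherwise padded to its own largest height and width; `padded_shape` and the example image sizes are made up for illustration and are not taken from the benchmark code.

```python
from typing import List, Optional, Tuple

Size = Tuple[int, int]  # (height, width)


def padded_shape(batch: List[Size], fixed: Optional[Size] = None) -> Size:
    """Return the (height, width) that every image in `batch` is padded to."""
    if fixed is not None:
        # Forced padding: every batch ends up with the same shape.
        return fixed
    # Dynamic padding: each batch is padded to its own largest height/width.
    return (max(h for h, _ in batch), max(w for _, w in batch))


# Two hypothetical batches of already-resized images.
batches = [
    [(480, 640), (800, 1333)],
    [(640, 480), (1333, 800)],
]

dynamic = {padded_shape(b) for b in batches}
static = {padded_shape(b, fixed=(1333, 1333)) for b in batches}
print(len(dynamic), len(static))  # prints "2 1"
```

With dynamic padding the two batches produce two different input shapes, which is the situation the description says makes torch.compile recompile; with the fixed 1333x1333 size there is a single shape across all batches.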

Tests

  • The new image processor is run against all the existing tests for the current processor.
  • I have also added two consistency tests (panoptic and detection) for processing on GPU vs CPU.

Who can review?

@ArthurZucker Pinging you directly as there is almost no "new" code here.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@yonigozlan yonigozlan marked this pull request as ready for review October 23, 2024 17:43
@ArthurZucker ArthurZucker requested a review from molbap October 24, 2024 13:51
@yonigozlan yonigozlan force-pushed the add-copied-detr-fast branch from b9cfe3b to f9848a7 Compare October 24, 2024 22:22
Collaborator

@ArthurZucker ArthurZucker left a comment

Thanks, same comment as for the other PR mostly! 🤗

@yonigozlan
Member Author

I will make the modifications once PR #34354 is merged, as most of them will be copied from it :)

Collaborator

@ArthurZucker ArthurZucker left a comment

One thing I don't understand: literally everything is copied from. Why not directly map to use the detr class?

@yonigozlan yonigozlan force-pushed the add-copied-detr-fast branch from f7f480a to 4bc94a4 Compare November 5, 2024 15:57
@yonigozlan
Member Author

One thing I don't understand: literally everything is copied from. Why not directly map to use the detr class?

All pre-processing functions are copied from image_processing_detr_fast, but the post-processing functions are copied from image_processing_deformable_detr. I guess the post-processing functions are also the reason why there is a base DeformableDetrImageProcessorFast in the first place, as all the base pre-processing functions are also copied from image_processing_detr.

Collaborator

@ArthurZucker ArthurZucker left a comment

Got it, thanks! Let's work to make it simpler to add these, with maybe a bit of abstraction on the FastImageProcessor class!

@yonigozlan yonigozlan merged commit eedc113 into huggingface:main Nov 19, 2024
BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024
* add deformable detr image processor fast

* add fast processor to doc

* fix copies

* nit docstring

* Add tests gpu/cpu and fix docstrings

* fix docstring

* import changes from detr

* fix imports

* rebase and fix

* fix input data format change in detr and rtdetr fast
5 participants