This repository has been archived by the owner on Oct 9, 2023. It is now read-only.

Customizable data pipeline for object detection #159

Closed
reactivetype opened this issue Mar 4, 2021 · 3 comments
Labels: enhancement (New feature or request), help wanted (Extra attention is needed)

Comments


reactivetype commented Mar 4, 2021

🚀 Feature

I would like a flexible interface for customizing the dataset and data pipeline for object detection.

Motivation

Thanks for creating this fantastic library. For research or applications, I want to use datasets other than CustomCOCODataset. There are two possible scenarios:

  • Using datasets that are readily available in a different format (e.g. YOLO) without converting them from YOLO to COCO. Here, I assume my model knows how to read and interpret the label format (e.g. xyxy, xywh) and build targets from the dataset labels.
  • Applying multi-image data augmentation such as Mixup or mosaic augmentation to create a new training image from a combination of multiple images in the dataset.
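
To make the second scenario concrete, here is a minimal sketch of Mixup written as a wrapper around an indexable detection dataset. Everything in it is an assumption for illustration: the class name `MixupDetectionDataset`, the `(image, boxes, labels)` sample layout, and the use of plain nested lists for images are hypothetical and not part of the library discussed here.

```python
import random


class MixupDetectionDataset:
    """Wraps an indexable detection dataset and blends pairs of samples.

    Assumption: each sample is (image, boxes, labels), where image is a
    2D list of floats (H x W, one channel for brevity) and all images
    share the same shape.
    """

    def __init__(self, base, alpha=1.5, seed=None):
        self.base = base
        self.alpha = alpha
        self.rng = random.Random(seed)

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        img_a, boxes_a, labels_a = self.base[idx]
        # Pick a second sample at random to mix with.
        other = self.rng.randrange(len(self.base))
        img_b, boxes_b, labels_b = self.base[other]
        # Mixing weight drawn from a Beta(alpha, alpha) distribution.
        lam = self.rng.betavariate(self.alpha, self.alpha)
        mixed = [
            [lam * a + (1.0 - lam) * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(img_a, img_b)
        ]
        # For detection, the targets of both source images are kept.
        return mixed, boxes_a + boxes_b, labels_a + labels_b
```

Because the wrapper only relies on `__len__` and `__getitem__`, it could in principle sit between any map-style dataset and the data loader, whatever label format the underlying dataset uses.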

Is either of these two scenarios possible? Can I swap the CustomCOCODataset for my custom LightningDataModule? Do we need to customize ObjectDetectionDataPipeline? I am not sure what the task pipeline is for. Some guidance would be appreciated. Thanks.

reactivetype added the enhancement and help wanted labels Mar 4, 2021
@kaushikb11
Contributor

Hi, @reactivetype! Yes, that sounds great. Currently, the flow for object detection (OD) looks like this:

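# Import path as of Flash 0.2.x (an assumption; it may differ in other releases).
from flash.vision import ObjectDetectionData, ObjectDetector

# Build the datamodule from a COCO-format image folder and annotation file.
datamodule = ObjectDetectionData.from_coco(
    train_folder="data/coco128/images/train2017/",
    train_ann_file="data/coco128/annotations/instances_train2017.json",
    batch_size=5,
)

# The number of classes is inferred from the annotations.
model = ObjectDetector(num_classes=datamodule.num_classes)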

We could add support for more datasets by adding class methods to the ObjectDetectionData class, for example ObjectDetectionData.from_yolo(..), ObjectDetectionData.from_voc(..), etc.
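
As a rough idea of what a YOLO-oriented loader would have to do internally, the sketch below parses YOLO-style label text (one `class cx cy w h` line per object, with coordinates normalized to the image size) into class ids and xyxy pixel boxes. The function name `parse_yolo_labels` and its return layout are hypothetical; no `from_yolo` method exists in the thread above, so this is not the library's API.

```python
def parse_yolo_labels(text, img_width, img_height):
    """Parse YOLO-style label text into (labels, boxes).

    Each non-empty line is expected to be "class cx cy w h", with the box
    center and size normalized to [0, 1]. Boxes are returned as
    [x_min, y_min, x_max, y_max] in pixels.
    """
    labels, boxes = [], []
    for line_no, line in enumerate(text.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 5:
            raise ValueError(f"line {line_no}: expected 5 fields, got {len(parts)}")
        class_id = int(parts[0])
        cx, cy, w, h = (float(p) for p in parts[1:])
        # Convert normalized center/size to absolute corner coordinates.
        x_min = (cx - w / 2) * img_width
        y_min = (cy - h / 2) * img_height
        x_max = (cx + w / 2) * img_width
        y_max = (cy + h / 2) * img_height
        labels.append(class_id)
        boxes.append([x_min, y_min, x_max, y_max])
    return labels, boxes
```

A loader built this way would still need to pair each label file with its image and read the image size, which is left out here.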

Yes, you could pass transformation functions to the train_transform argument of ObjectDetectionData.from_coco.
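
For illustration, a detection transform usually has to update the boxes together with the image. The sketch below shows a horizontal flip under stated assumptions: the image is a list of pixel rows, boxes are xyxy in pixels, and the function takes and returns `(image, boxes)`. The exact callable signature that `train_transform` expects is not given in this thread, so the function would likely need adapting before being passed in.

```python
def horizontal_flip(image, boxes):
    """Flip an image left to right and mirror its xyxy boxes.

    Assumption: image is a non-empty list of equal-length rows, and each
    box is [x_min, y_min, x_max, y_max] in pixel coordinates.
    """
    width = len(image[0])
    flipped_image = [list(reversed(row)) for row in image]
    # Mirroring swaps the roles of x_min and x_max around the image width.
    flipped_boxes = [
        [width - x_max, y_min, width - x_min, y_max]
        for x_min, y_min, x_max, y_max in boxes
    ]
    return flipped_image, flipped_boxes
```

Applying the function twice returns the original image and boxes, which is a quick sanity check for this kind of transform.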

The purpose of the DataPipeline is to define the flow of data transformations using hooks. So, depending on your data requirements, you could tweak it by creating a subclass of it.

But right now, we are refactoring DataPipeline in #141. Hence, the behavior could change, but it should be a better experience for the user! :)

@edgarriba
Contributor

edgarriba commented May 3, 2021

@reactivetype The DataPipeline refactor is already merged. Please check whether it suits your use case. In addition, we are refactoring the data modules to make them more flexible and user-friendly when working with custom data structures. Take a look at #256.

@edenlightning
Contributor

Please feel free to reopen if needed!
