Are new models planned to be added? #2707

talcs · 2020-09-24T20:14:28Z

oke-aditya · 2020-09-25T02:35:59Z

fmassa · 2020-09-25T15:15:41Z

Hi,

To complement @oke-aditya's great answer, we will be adding more models to torchvision, including EfficientNets and MobileNetV3.

The current limitation is that we would like to ensure that we can reproduce the pretrained models using the training scripts from references/classification, but those models require a different training recipe than the one present in [`references/classification`](https://github.com/pytorch/vision/tree/master/references/classification), so we will need to update those recipes before uploading the new models.

songyuc · 2021-01-11T08:11:18Z

I hope the Mish activation function can be added.

digantamisra98 · 2021-01-19T04:26:41Z

@songyuc There is a closed feature request on PyTorch for adding Mish. You can comment over there for increased visibility so that Mish can be considered to be added in the future. Link to the issue - pytorch/pytorch#25584

WZMIAOMIAO · 2021-02-24T02:29:38Z

First, thanks for your great work.
I hope Swish activation and NFNets (High-Performance Large-Scale Image Recognition Without Normalization, https://arxiv.org/abs/2102.06171) can be added.
In addition, I would like to ask when EfficientNet will be added. I found that it was mentioned in 2019, but now it's 2021. Referring to the MobileNetV3 model in torchvision, I built EfficientNet models (Test9_efficientNet), but I don't have a GPU to train them with.

oke-aditya · 2021-02-24T03:38:24Z

Hi @WZMIAOMIAO, the Swish activation function has been added to PyTorch (not torchvision) as nn.SiLU.
MobileNetV3 will hopefully be available in the next release.
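
For readers unfamiliar with the two activations discussed above, here is a minimal pure-Python sketch of their standard definitions: Swish with β = 1 (the function PyTorch exposes as `torch.nn.SiLU`) is `x * sigmoid(x)`, and Mish is `x * tanh(softplus(x))`. The helper names below are illustrative only and are not part of PyTorch or torchvision.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def swish(x: float) -> float:
    """Swish with beta = 1, also known as SiLU: x * sigmoid(x)."""
    return x * sigmoid(x)


def softplus(x: float) -> float:
    """log(1 + exp(x)), computed in a numerically stable form."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))
```

Both functions are smooth and close to the identity for large positive inputs, which is why they are often discussed as alternatives to ReLU.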

WZMIAOMIAO · 2021-02-24T07:10:48Z

@oke-aditya Thank you for your reply. I've seen MobileNetv3 in the torchvision repository. When will EfficientNet, RegNet and NFNet be added?

stanwinata · 2021-10-06T17:38:36Z

Hey guys, I was wondering if the PyTorch team is open to public contributions for these models? 🤔
I assume we can follow similar PR formats to the one here and here along with validation/proof that we can reproduce paper results.

datumbox · 2021-10-06T17:55:54Z

@stwinata Thanks for offering. Which models do you have in mind to contribute?

The process of model contribution was a bit problematic (mainly due to the training bit) and we still haven't figured out all the details. But depending on the proposal, we might be able to work something out. :)

stanwinata · 2021-10-06T18:28:01Z

> @stwinata Thanks for offering. Which models do you have in mind to contribute?

@datumbox thanks for the quick reply! I am interested in DETR or EfficientDet. I was thinking that for a first commit DETR might be easier, since we can use DETR's original repo for reference and may be able to try loading its weights for preliminary validation.

> The process of model contribution was a bit problematic (mainly due to the training bit) and we still haven't figured out all the details. But depending on the proposal, we might be able to work something out. :)

Perhaps we can also try to determine a canonical pipeline for model contribution through this experience and document it so that others can contribute easily in the future 😃 !

stanwinata · 2021-10-06T18:48:55Z

> (mainly due to the training bit)

@datumbox Does this come down to lack of GPU resources? Or is it due to the need to validate that it can properly train?

datumbox · 2021-10-06T19:04:54Z

@stwinata DETR sounds like a good addition to me. Since @fmassa is one of the main authors, I will let him have the final say on this.

Contributing models is tricky because:

Reproducing a paper is an iterative process of code + recipe + training. A PR that just adds the implementation is less useful because one of the maintainers needs to do the heavy lifting of reproducing the model, which is the time-consuming bit. This is why we have avoided accepting contributions in this space in the past.
On the other hand, if someone sends a PR that reproduces the paper and includes weights, then the maintainers only need to confirm the accuracy of the pre-trained weights and retrain the model using the reference script to ensure our recipe works as expected.
GPU resources are not a concern for us, but they are for the contributor. We can train models, but this infrastructure is not available to open-source maintainers.

Happy to discuss more and see if it's worth doing this now.

stanwinata · 2021-10-06T19:20:00Z

@datumbox These comments make sense 😃

> Reproducing a paper is an iterative process of code + recipe + training. A PR that just adds the implementation is less useful because one of the maintainers needs to do the heavy lifting of reproducing the model, which is the time-consuming bit. This is why we have avoided accepting contributions in this space in the past.
>
> On the other hand, if someone sends a PR that reproduces the paper and includes weights, then the maintainers only need to confirm the accuracy of the pre-trained weights and retrain the model using the reference script to ensure our recipe works as expected.

Yeah, I agree. Some might even say that getting the models to be "useful", i.e. reproducing the paper's results, is the fun bit 😃
I think in the future model contributions/PRs should include:

Implementation
Saved weights
Proof of Paper's Benchmark reproduction
Documentation

I think this way, we can ease the load on PyTorch/Vision maintainers and make PRs much more concrete and useful.

Perhaps we can also have a simple util script that tests trained candidate implementations on various benchmarks. (This might be another feature request 😄)

> GPU resources are not a concern for us, but they are for the contributor. We can train models, but this infrastructure is not available to open-source maintainers.

I also agree with this. Moreover, I think these days GPU resources, either at home or through AWS and GCP, are becoming ubiquitous enough for contributors to do the training themselves 😃

datumbox · 2021-10-07T08:43:20Z

@stwinata Thanks for the comments. I think we agree. Below are a few thoughts on the potential process we could adopt.

The minimum to merge such a contribution is:

The PR must include the code implementation, have documentation and tests.
It should also extend the existing reference scripts used to train the model.
The weights need to reproduce closely the results of the paper in terms of accuracy.
The PR description should include commands/configuration used to train the model, so that we can easily run them on our infra to verify.
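
As a rough illustration of the "reproduce closely" criterion above, the check could be sketched as a comparison between the accuracy reported in the paper and the accuracy measured with the contributed weights. Everything here is hypothetical: the `AccuracyReport` type, the `reproduces_paper` helper, and the default tolerance of 0.5 percentage points are assumptions made for the example, not a torchvision policy or API.

```python
from dataclasses import dataclass


@dataclass
class AccuracyReport:
    """Top-1 / top-5 accuracy in percent (hypothetical container)."""
    top1: float
    top5: float


def reproduces_paper(paper: AccuracyReport,
                     measured: AccuracyReport,
                     tolerance: float = 0.5) -> bool:
    """Return True if the measured accuracy is at most `tolerance`
    percentage points below the paper's numbers on both metrics.

    The tolerance is an assumed value chosen for illustration only.
    """
    return (paper.top1 - measured.top1 <= tolerance
            and paper.top5 - measured.top5 <= tolerance)


# Illustrative numbers only, not taken from any paper or model.
paper = AccuracyReport(top1=75.0, top5=92.0)
close_enough = reproduces_paper(paper, AccuracyReport(top1=74.8, top5=91.9))
too_far = reproduces_paper(paper, AccuracyReport(top1=73.0, top5=91.0))
```

In practice the acceptable gap would be agreed with the maintainers for each model; the sketch only shows the shape of the comparison.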

Note that there are details here related to the code quality etc, but these are rules that apply in all PRs.

For someone who would be interested in adding a model, here are a few important considerations:

Training big models requires lots of resources and the cost quickly adds up.
Reproducing models is fun but also risky, as you might not always get the results reported in the paper. It might require a huge amount of effort to close the gap.
The contribution might not get merged if it falls significantly short in terms of accuracy, speed, etc.

The above is a very big ask, I think. But if an OSS contributor is willing to give it a try despite these difficulties, then we would be happy to pair up and help. This should happen in a coordinated way to:

Ensure that the model in question is of interest and that nobody else is already working on adding it.
Ensure there is an assigned maintainer providing support, guidance and regular feedback.

@fmassa let me know your thoughts on this as well.

xiaohu2015 · 2021-11-16T13:30:37Z

I am aiming to add FCOS to torchvision.
https://github.com/xiaohu2015/vision/blob/main/torchvision/models/detection/fcos.py

xiaohu2015 · 2022-05-12T02:17:44Z

@santhoshnumberone I think DINO is more practical, since users can train for fewer epochs to get a good mAP.

santhoshnumberone · 2022-05-13T07:03:16Z

> @santhoshnumberone I think DINO is more practical, since users can train for fewer epochs to get a good mAP.

What I meant was to ask whether any of you could check if facebookresearch/dino and "DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection" are the same.

I feel they are different.

oke-aditya · 2022-05-13T07:08:41Z

There are many variants of DETR, e.g. deformable DETR and modulated DETR. A simple search would give these results.

https://github.com/search?q=DETR

Let's start off by including vanilla DETR :)

zhiqwang · 2022-06-22T03:41:53Z

Hi, MobileOne, introduced by Apple, is interesting. The mobile-vision team implemented it at facebookresearch/mobile-vision#91. Is there any plan to support it here?

datumbox · 2022-06-22T07:39:54Z

@zhiqwang MobileOne wasn't in our shortlist, but we can certainly keep an eye on it if it builds momentum.

abhi-glitchhg · 2022-06-23T19:05:25Z

MobileViT is another lightweight vision transformer-based model. The code is available here. Might be good to keep an eye on this one too.

oke-aditya · 2022-07-10T04:29:15Z

Any small list for Semantic Segmentation models?

Maybe a tentative one

I can try U2Net. Maybe it's an easy Model

talregev · 2022-07-10T04:33:38Z

Please add BiFPN

https://paperswithcode.com/method/bifpn#:~:text=A%20BiFPN%2C%20or%20Weighted%20Bi,fast%20multi%2Dscale%20feature%20fusion.

oke-aditya · 2022-07-10T04:38:32Z

Yesss BiFPN is very popular. Maybe once we initiate EfficientDet it would get added.

talregev · 2022-07-10T04:40:04Z

Will you try to add EfficientDet?

oke-aditya · 2022-07-10T05:26:46Z

I'm pretty noob when it comes to implementing models. But maybe I will give it a shot after I add a few easy models.

talregev · 2022-07-10T18:03:48Z

@datumbox Please Pin this issue.

talregev · 2022-07-14T11:39:22Z

> U-Net (Not SOTA? But we do have AlexNet which was SOTA long back)

@oke-aditya Can you add U-Net?

oke-aditya · 2022-07-14T11:46:21Z

Will need to discuss with the maintainers, and I think U2-Net will be more helpful. U-Net I'm not sure. I think I can implement them.

talregev · 2022-07-14T11:54:06Z

> Will need to discuss with the maintainers, and I think U2-Net will be more helpful. U-Net I'm not sure. I think I can implement them.

The maintainers are not responsive to issues. Once you open a PR, they will talk with you.

NicolasHug · 2022-07-14T12:46:03Z

Hi @talregev ,

It's nice to see that you're excited about torchvision models. I'm going to ask you to be a little more patient here. @datumbox is on well deserved holidays and I'm sure he'll get back to you as soon as he can.

> The maintainers are not responsive to issues. Once you open a PR, they will talk with you.

Rest assured that we are responsive to issues. Furthermore, like in most projects, we don't encourage opening PRs prior to opening issues in order to leave time and space for discussing the requested feature.

As a side note:

> Please do X

> Will you/can you do Y

> @username (as in #2707 (comment))

isn't the best way to engage with open source projects. We always welcome suggestions and feature requests, but in order for us to help you best, we usually need a bit more detail on what is requested, and why it would be useful to you. Also, while a gentle ping can sometimes be appropriate, just at-ing people without context or form might not get you the outcome that you're looking for.

talregev · 2022-07-14T12:50:26Z

> Hi @talregev ,
>
> It's nice to see that you're excited about torchvision models. I'm going to ask you to be a little more patient here. @datumbox is on well deserved holidays and I'm sure he'll get back to you as soon as he can.
>
> > The maintainers are not responsive to issues. Once you open a PR, they will talk with you.
>
> Rest assured that we are responsive to issues. Furthermore, like in most projects, we don't encourage opening PRs prior to opening issues in order to leave time and space for discussing the requested feature.
>
> As a side note:
>
> > Please do X
>
> > Will you/can you do Y
>
> > @username (as in #2707 (comment))
>
> isn't the best way to engage with open source projects. We always welcome suggestions and feature requests, but in order for us to help you best, we usually need a bit more detail on what is requested, and why it would be useful to you. Also, while a gentle ping can sometimes be appropriate, just at-ing people without context or form might not get you the outcome that you're looking for.

@NicolasHug Thank you for your nice suggestion. Can you pin this issue?

datumbox · 2022-07-25T15:59:30Z

@talregev Apologies for the delayed response. I was on my annual leave.

As others pointed out, U-Net is a bit old now (released in 2015) and there are quite a few good community implementations already. If we were to add more models, we would probably prioritize transformer approaches that yield better results. We don't have immediate plans for this though, as our focus this half will be videos.

Concerning pinning the issue, we got quite a few pinned ones already so I'm not sure this will increase the visibility. There are at least 4-5 more tickets like this for losses, operators, data augmentations etc. Perhaps the solution here is to pin the issue with our H2 roadmap once finalized and then link to this issue from there.

yokosyun · 2022-08-18T22:01:15Z

I would also like to have a BiFPN neck.

pri1311 · 2022-11-19T05:55:38Z

> Any small list for Semantic Segmentation models?

Hey, I was currently working with SegFormer, TransUnet, UNETR (although I am working with medical imaging datasets solely now).

All three of them are relatively new models (early 2021) but they have shown good results and also have a fair amount of citations.
Any thoughts from the maintainers, on whether they are worth adding to torchvision?

Edit: MaskFormer and DPT could also be good additions I believe.

oke-aditya · 2022-11-20T19:28:42Z

Hi. While these models are new, they look to be specialized in medical image segmentation.

Are these also valid on general datasets such as Pascal VOC or COCO? Or is there any valid performance measurement over these standard datasets?

pri1311 · 2022-11-20T19:41:50Z

> These look to be specialized in medical image segmentation.

UNETR and TransUnet yes.

SegFormer, MaskFormer and DPT support Cityscapes, ADE20k, COCO, etc.

Coderx7 · 2023-02-16T13:05:50Z

Hello everyone,
Would you kindly consider adding the SimpleNet architecture to the classification section?
SimpleNet is a 2016 architecture that outperformed deeper and more complex architectures at the time using a plain CNN.
There was never an ImageNet model because I did not have the proper infrastructure, but recently I was able to train some variants that perform very well.
Here's the result as of now:

| Method | #Params | ImageNet | ImageNet-Real-Labels (val) |
|---|---|---|---|
| simplenetv1_9m_m2 (36.3 MB) | 9.5m | 74.23/91.748 | 81.22/94.756 |
| simplenetv1_5m_m2 (22 MB) | 5.7m | 72.03/90.324 | 79.328/93.714 |
| simplenetv1_small_m2_075 (12.6 MB) | 3m | 68.506/88.15 | 76.283/92.02 |
| simplenetv1_small_m2_05 (5.78 MB) | 1.5m | 61.67/83.488 | 69.31/88.195 |

As you can see, it outperforms VGGNet and many other architectures. It also outperforms resnet18 (11m vs 5m) and some MobileNet variants, and achieves high accuracy despite being super simple. Compared to architectures such as DenseNet, it performs very well with a fraction of the memory usage.

It performs much faster on older GTX cards, but still performs decently on new hardware as well.
Unlike the MobileNet class of architectures, it doesn't need QAT to get maximum accuracy after quantization; static quantization can already provide excellent results that are pretty close to those before quantization.

I believe having an efficient yet simple architecture that uses basic operators and provides decent performance can be a good addition to the diversity of models in the PyTorch repository.

Here is the link to our PyTorch implementation of SimpleNet: https://github.com/Coderx7/SimpleNet_Pytorch

I'd be delighted to answer any questions.
Thank you very much for your time.

senarvi · 2023-05-02T15:40:06Z

There's new discussion about adding YOLO in this issue.

pmeier added new feature module: models needs discussion labels Sep 25, 2020

