
Are new models planned to be added? #2707

Open
19 of 37 tasks
talcs opened this issue Sep 24, 2020 · 62 comments · Fixed by #5197
Comments

@talcs
Contributor

talcs commented Sep 24, 2020

🚀 Feature

Adding new models to the models section.

Motivation

Many new models have been proposed in recent years and do not exist in the models module.
For example, the EfficientNets provide 8 models of different complexities that outperform everything else that exists at each complexity level.

Pitch

See Contributing to Torchvision - Models for guidance on adding new models.

Add pre-trained weights for the following variants:

@oke-aditya
Contributor

oke-aditya commented Sep 25, 2020

This request has come up often. Just linking all of those here for reference.

archived - update issue instead

Edit by @datumbox: I shamelessly edited your comment and moved your fantastic up-to-date list to the issue description for greater visibility.

Reply by @oke-aditya: I was actually going to suggest to do the same 😃

A generalized guideline for adding models is being added to the contributing.md file in PR #2663.

@fmassa
Member

fmassa commented Sep 25, 2020

Hi,

To complement @oke-aditya's great answer: we will be adding more models to torchvision, including EfficientNets and MobileNetV3.

The current limitation is that we would like to ensure that we can reproduce the pretrained models using the training scripts from references/classification, but those models require a different training recipe than the one present in [references/classification](https://github.com/pytorch/vision/tree/master/references/classification), so we will need to update those recipes before uploading the new models.

@songyuc

songyuc commented Jan 11, 2021

I hope the Mish activation function can be added.

@digantamisra98

@songyuc There is a closed feature request on PyTorch for adding Mish. You can comment over there for increased visibility so that Mish can be considered for addition in the future. Link to the issue: pytorch/pytorch#25584
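For reference, Mish is defined as x * tanh(softplus(x)). A minimal hand-rolled module might look like the sketch below; it is purely illustrative (recent PyTorch releases also ship a built-in nn.Mish):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Mish(nn.Module):
    """Mish activation: x * tanh(softplus(x))."""
    def forward(self, x):
        return x * torch.tanh(F.softplus(x))

x = torch.randn(4)
print(Mish()(x))
```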

@WZMIAOMIAO
Contributor

First, thanks for your great work.
I hope to see the Swish activation and NFNets (High-Performance Large-Scale Image Recognition Without Normalization, https://arxiv.org/abs/2102.06171) added.
In addition, I would like to ask when EfficientNet can be added. I found that it was mentioned in 2019, but now it's 2021. Following the MobileNetV3 model in torchvision, I built EfficientNet models (Test9_efficientNet), but I don't have a GPU to train with.

@oke-aditya
Contributor

Hi @WZMIAOMIAO, the Swish activation function has been added to PyTorch (not torchvision) as nn.SiLU.
MobileNetV3 will hopefully be available in the next release.
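For reference, a quick sketch of the built-in SiLU/Swish in its module and functional forms (assuming a reasonably recent PyTorch, 1.7 or newer):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(4)
swish = nn.SiLU()   # module form: silu(x) = x * sigmoid(x)
print(swish(x))
print(F.silu(x))    # functional form, same result
```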

@WZMIAOMIAO
Contributor

@oke-aditya Thank you for your reply. I've seen MobileNetV3 in the torchvision repository. When will EfficientNet, RegNet, and NFNet be added?

@stanwinata

Hey guys, I was wondering if the PyTorch team is open to public contributions for these models? 🤔
I assume we can follow PR formats similar to the ones here and here, along with validation/proof that we can reproduce the paper results.

@datumbox
Contributor

datumbox commented Oct 6, 2021

@stwinata Thanks for offering. Which models do you have in mind to contribute?

The process of model contribution has been a bit problematic (mainly due to the training bit) and we still haven't figured out all the details. But depending on the proposal, we might be able to work something out. :)

@stanwinata

stanwinata commented Oct 6, 2021

@stwinata Thanks for offering. Which models do you have in mind to contribute?

@datumbox thanks for the quick reply! I am interested in DETR or EfficientDet. For a first contribution, DETR might be easier, since we can use DETR's original repo for reference and may be able to load its weights for preliminary validation (a rough sketch follows at the end of this comment).

The process of model contribution has been a bit problematic (mainly due to the training bit) and we still haven't figured out all the details. But depending on the proposal, we might be able to work something out. :)

Perhaps we can also try to determine a canonical pipeline for model contribution through this experience and document it so that others can contribute easily in the future 😃!
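As a rough sketch of that preliminary-validation idea, DETR and its pretrained weights can be pulled from the authors' repo via torch.hub. The entry-point name below comes from facebookresearch/detr's hubconf; treat the exact name and output keys as assumptions if the repo has changed:

```python
import torch

# Load DETR with a ResNet-50 backbone and the authors' pretrained weights.
model = torch.hub.load("facebookresearch/detr", "detr_resnet50", pretrained=True)
model.eval()

x = torch.randn(1, 3, 800, 800)  # dummy image batch
with torch.no_grad():
    out = model(x)
# DETR returns per-query class logits and normalized boxes.
print(out["pred_logits"].shape, out["pred_boxes"].shape)
```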

@stanwinata

stanwinata commented Oct 6, 2021

(mainly due to the training bit)

@datumbox Does this come down to a lack of GPU resources? Or is it due to the need to validate that the model can train properly?

@datumbox
Contributor

datumbox commented Oct 6, 2021

@stwinata DETR sounds like a good addition to me. Since @fmassa is one of the main authors, I will let him have the final say on this.

Contributing models is tricky because:

  1. Reproducing a paper is an iterative process of code + recipe + training. A PR that just adds the implementation is less useful, because one of the maintainers then needs to do the heavy lifting of reproducing the model, which is the time-consuming bit. This is why we have avoided accepting contributions in this space in the past.
  2. On the other hand, if someone sends a PR that reproduces the paper and includes weights, then the only thing left for the maintainers is to confirm the accuracy of the pre-trained weights and retrain the model using the reference script to ensure our recipe works as expected.
  3. GPU resources are not a concern for us but for the contributor. We can train models, but this infra is not available to open-source contributors.

Happy to discuss more and see if it's worth doing this now.

@stanwinata

stanwinata commented Oct 6, 2021

@datumbox These comments make sense 😃

  1. Reproducing a paper is an iterative process of code + recipe + training. A PR that just adds the implementation is less useful, because one of the maintainers then needs to do the heavy lifting of reproducing the model, which is the time-consuming bit. This is why we have avoided accepting contributions in this space in the past.
  2. On the other hand, if someone sends a PR that reproduces the paper and includes weights, then the only thing left for the maintainers is to confirm the accuracy of the pre-trained weights and retrain the model using the reference script to ensure our recipe works as expected.

Yeah, I agree. Some might even say that getting the models to be "useful", i.e. reproducing the paper's results, is the fun bit 😃
I think future model contributions/PRs should include:

  • Implementation
  • Saved weights
  • Proof of reproducing the paper's benchmarks
  • Documentation

I think this way we can ease the load on the pytorch/vision maintainers and make PRs much more concrete and useful.

Perhaps we can also have a simple util script that tests trained candidate implementations on various benchmarks (this might be another feature request 😄; see the sketch at the end of this comment).

  1. GPU resources are not a concern for us but for the contributor. We can train models, but this infra is not available to open-source contributors.

I also agree with this. Moreover, these days GPU resources, whether at home or through AWS and GCP, are becoming ubiquitous enough for contributors to do the training themselves 😃
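To illustrate the util-script idea, here's a rough sketch of a top-1/top-5 evaluation helper. The function name, the ImageNet-style preprocessing, and the ImageFolder layout are my own illustrative choices, not an existing torchvision utility:

```python
import torch
import torchvision
from torchvision import transforms

@torch.no_grad()
def evaluate_top1_top5(model, val_dir, batch_size=64, device="cpu"):
    """Report top-1/top-5 accuracy of a classification model on a val folder."""
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])
    dataset = torchvision.datasets.ImageFolder(val_dir, transform=preprocess)
    loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size)
    model.eval().to(device)
    top1 = top5 = total = 0
    for images, targets in loader:
        images, targets = images.to(device), targets.to(device)
        _, pred5 = model(images).topk(5, dim=1)   # top-5 predicted classes
        correct = pred5.eq(targets.unsqueeze(1))  # (batch, 5) boolean matrix
        top1 += correct[:, 0].sum().item()
        top5 += correct.any(dim=1).sum().item()
        total += targets.size(0)
    return top1 / total, top5 / total
```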

@datumbox
Contributor

datumbox commented Oct 7, 2021

@stwinata Thanks for the comments. I think we agree. Below are a few thoughts on the potential process we could adopt.

The minimum to merge such a contribution is:

  1. The PR must include the code implementation, documentation and tests.
  2. It should also extend the existing reference scripts used to train the model.
  3. The weights need to closely reproduce the results of the paper in terms of accuracy.
  4. The PR description should include the commands/configuration used to train the model, so that we can easily run them on our infra to verify.

Note that there are further details here related to code quality etc., but these are rules that apply to all PRs.
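To make point 1 concrete, a minimal smoke test of the kind a model PR might include could look like the sketch below; the model and the expected shape are stand-ins, not torchvision's actual test suite:

```python
import torch
import torchvision

def test_forward_shape():
    # resnet18 stands in for whatever architecture the PR adds;
    # no pre-trained weights are needed for a shape check.
    model = torchvision.models.resnet18()
    model.eval()
    x = torch.randn(2, 3, 224, 224)
    with torch.no_grad():
        out = model(x)
    # A classification model should return logits of shape (batch, num_classes).
    assert out.shape == (2, 1000)
```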

For someone who would be interested in adding a model, here are a few important considerations:

  1. Training big models requires lots of resources, and the cost quickly adds up.
  2. Reproducing models is fun but also risky, as you might not always get the results reported in the paper. It can require a huge amount of effort to close the gap.
  3. The contribution might not get merged if it significantly lags in terms of accuracy, speed, etc.

The above is a very big ask, I think. But if an OSS contributor is willing to give it a try despite these adversities, then we would be happy to pair up and help. This should happen in a coordinated way to:

  1. Ensure that the model in question is of interest and that nobody else is already working on adding it.
  2. Ensure there is an assigned maintainer providing support, guidance and regular feedback.

@fmassa let me know your thoughts on this as well.

@xiaohu2015
Contributor

xiaohu2015 commented Nov 16, 2021

I am aiming at adding FCOS to torchvision.
https://github.com/xiaohu2015/vision/blob/main/torchvision/models/detection/fcos.py

@xiaohu2015
Contributor

@santhoshnumberone I think DINO is more practical, since users can train for fewer epochs to get good mAP.

@santhoshnumberone

@santhoshnumberone I think DINO is more practical, since users can train for fewer epochs to get good mAP.

What I meant was: could one of you check whether facebookresearch/dino and "DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection" are the same project?

I believe they are different.

@oke-aditya
Contributor

There are many variants of DETR, e.g. Deformable DETR and modulated DETR; a simple search would give these results.

https://github.com/search?q=DETR

Let's start off by including vanilla DETR :)

@zhiqwang
Contributor

zhiqwang commented Jun 22, 2022

Hi, MobileOne introduced by Apple is interesting; the mobile-vision team implemented it at facebookresearch/mobile-vision#91. Is there any plan to support it here?

@datumbox
Contributor

@zhiqwang MobileOne wasn't on our shortlist, but we can certainly keep an eye on it if it builds momentum.

@abhi-glitchhg
Contributor

MobileViT is another lightweight vision transformer-based model. The code is available here. Might be good to keep an eye on this one too.

@oke-aditya
Contributor

oke-aditya commented Jul 10, 2022

Any small list for Semantic Segmentation models?

Maybe a tentative one

  • U2Net
  • HRNet
  • DeepLabV3+
  • SegNet (Too old?)
  • U-Net (Not SOTA? But we do have AlexNet which was SOTA long back)

I can try U2Net. Maybe it's an easy model.

@talregev
Contributor

Please add BiFPN

https://paperswithcode.com/method/bifpn

@oke-aditya
Contributor

Yesss, BiFPN is very popular. Maybe once we initiate EfficientDet it will get added.

@talregev
Contributor

Will you try to add EfficientDet?

@oke-aditya
Contributor

I'm pretty much a noob when it comes to implementing models. But maybe I will give it a shot after I add a few easier models.

@talregev
Contributor

@datumbox Please pin this issue.

@talregev
Contributor

  • U-Net (Not SOTA? But we do have AlexNet which was SOTA long back)

@oke-aditya Can you add U-Net?

@oke-aditya
Contributor

oke-aditya commented Jul 14, 2022

Will need to discuss with the maintainers, and I think U2-Net will be more helpful. U-Net I'm not sure about. I think I can implement them.

@talregev
Contributor

Will need to discuss with the maintainers, and I think U2-Net will be more helpful. U-Net I'm not sure about. I think I can implement them.

The maintainers are not responsive to issues. Once you open a PR, they will talk with you.

@NicolasHug
Member

NicolasHug commented Jul 14, 2022

Hi @talregev ,

It's nice to see that you're excited about torchvision models. I'm going to ask you to be a little more patient here. @datumbox is on well deserved holidays and I'm sure he'll get back to you as soon as he can.

The maintainers are not responsive to issues. Once you open a PR, they will talk with you.

Rest assured that we are responsive to issues. Furthermore, as in most projects, we don't encourage opening PRs prior to opening issues, in order to leave time and space for discussing the requested feature.


As a side note:

Please do X

Will you/can you do Y

@username (as in #2707 (comment))

isn't the best way to engage with open source projects. We always welcome suggestions and feature requests, but in order for us to help you best, we usually need a bit more detail on what is requested and why it would be useful to you. Also, while a gentle ping can sometimes be appropriate, just at-ing people without context or form might not get you the outcome that you're looking for.

@talregev
Contributor

@NicolasHug Thank you for your nice suggestion. Can you pin this issue?

@datumbox
Contributor

@talregev Apologies for the delayed response. I was on my annual leave.

As others pointed out, U-Net is a bit old now (released in 2015) and there are quite a few good community implementations already. If we were to add more models, we would probably prioritize transformer approaches that yield better results. We don't have immediate plans for this though, as our focus this half will be Videos.

Concerning pinning the issue: we have quite a few pinned issues already, so I'm not sure this would increase visibility. There are at least 4-5 more tickets like this for losses, operators, data augmentations, etc. Perhaps the solution here is to pin the issue with our H2 roadmap once it's finalized and then link to this issue from there.

@yokosyun
Contributor

I would also like to have a BiFPN neck.

@pri1311

pri1311 commented Nov 19, 2022

Any small list for Semantic Segmentation models?

Hey, I have recently been working with SegFormer, TransUNet, and UNETR (although I am working solely with medical imaging datasets at the moment).

All three are relatively new models (early 2021), but they have shown good results and have a fair number of citations.
Any thoughts from the maintainers on whether they are worth adding to torchvision?

Edit: I believe MaskFormer and DPT could also be good additions.

@oke-aditya
Contributor

Hi. While these models are new, they look to be specialized for medical image segmentation.

Are they also validated on general datasets such as Pascal VOC or COCO? Or are there any solid performance measurements on these standard datasets?

@pri1311

pri1311 commented Nov 20, 2022

They look to be specialized for medical image segmentation.

For UNETR and TransUNet, yes.

SegFormer, MaskFormer, and DPT support Cityscapes, ADE20K, COCO, etc.

@Coderx7

Coderx7 commented Feb 16, 2023

Hello everyone,
Would you kindly consider adding the SimpleNet architecture to the classification section?
SimpleNet is a 2016 architecture that outperformed deeper and more complex architectures of its time using a plain CNN.
There was never an ImageNet model, due to me not having the proper infrastructure, but recently I was able to train some variants that perform very well.
Here are the results as of now:

Method                               #Params   ImageNet        ImageNet-Real-Labels (val)
simplenetv1_9m_m2 (36.3 MB)          9.5m      74.23/91.748    81.22/94.756
simplenetv1_5m_m2 (22 MB)            5.7m      72.03/90.324    79.328/93.714
simplenetv1_small_m2_075 (12.6 MB)   3m        68.506/88.15    76.283/92.02
simplenetv1_small_m2_05 (5.78 MB)    1.5m      61.67/83.488    69.31/88.195

As you can see, it outperforms VGGNet and many other architectures; it also outperforms ResNet-18 (11M vs 5M parameters) and some MobileNet variants, and achieves high accuracy despite being super simple. Compared to architectures such as DenseNet, it performs very well with a fraction of the memory usage.

It performs much faster on older GTX cards, and still performs decently on new hardware as well.
Unlike the MobileNet class of architectures, it doesn't need QAT to reach maximum accuracy after quantization; static quantization already provides excellent results, pretty close to the pre-quantization accuracy (see the sketch below).
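For context, a minimal post-training static quantization sketch in PyTorch eager mode; quantize_static and calibration_loader are placeholders, and in practice the model usually also needs QuantStub/DeQuantStub wrappers and fused modules:

```python
import torch

def quantize_static(model, calibration_loader):
    model.eval()
    # Default server-side (fbgemm) quantization config.
    model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
    prepared = torch.quantization.prepare(model)      # insert observers
    with torch.no_grad():
        for images, _ in calibration_loader:          # calibrate activation ranges
            prepared(images)
    return torch.quantization.convert(prepared)       # produce the int8 model
```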

I believe that an efficient yet simple architecture that uses basic operators and provides decent performance would be a good addition to the diversity of models in the PyTorch repository.

Here is the link to our PyTorch implementation of SimpleNet: https://github.com/Coderx7/SimpleNet_Pytorch

I'd be delighted to answer any questions.
Thank you very much for your time.

@senarvi

senarvi commented May 2, 2023

There's new discussion about adding YOLO in this issue.
