diff --git a/.github/workflows/build_documentation.yml b/.github/workflows/build_documentation.yml
index 0fa5224f9166..ff928f1eb221 100644
--- a/.github/workflows/build_documentation.yml
+++ b/.github/workflows/build_documentation.yml
@@ -45,7 +45,7 @@ jobs:
         run: |
           sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev
-          pip install git+https://github.com/huggingface/doc-builder@add_install_cell
+          pip install git+https://github.com/huggingface/doc-builder
           pip install git+https://github.com/huggingface/transformers#egg=transformers[dev]
           export TORCH_VERSION=$(python -c "from torch import version; print(version.__version__.split('+')[0])")
diff --git a/docs/source/index.rst b/docs/source/index.rst
deleted file mode 100644
index 003e5b867fc3..000000000000
--- a/docs/source/index.rst
+++ /dev/null
@@ -1,718 +0,0 @@
-..
-    Copyright 2020 The HuggingFace Team. All rights reserved.
-
-    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
-    the License. You may obtain a copy of the License at
-
-        http://www.apache.org/licenses/LICENSE-2.0
-
-    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
-    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
-    specific language governing permissions and limitations under the License.
-
-Transformers
-=======================================================================================================================
-
-State-of-the-art Natural Language Processing for Jax, PyTorch and TensorFlow
-
-🤗 Transformers (formerly known as `pytorch-transformers` and `pytorch-pretrained-bert`) provides general-purpose
-architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet...) for Natural Language Understanding (NLU) and Natural
-Language Generation (NLG) with 32+ pretrained models in 100+ languages and deep interoperability between Jax,
-PyTorch and TensorFlow.
-
-This is the documentation of our repository `transformers `__. You can
-also follow our `online course `__ that teaches how to use this library, as well as the
-other libraries developed by Hugging Face and the Hub.
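The quickest way to see what these pretrained architectures give you is the high-level ``pipeline`` API. A minimal sketch, for illustration only (the ``"sentiment-analysis"`` task string is real; the default checkpoint it resolves to, and therefore the exact score printed, depends on the library version):

.. code-block:: python

    from transformers import pipeline

    # "sentiment-analysis" downloads a default pretrained checkpoint from the model hub.
    classifier = pipeline("sentiment-analysis")
    print(classifier("We are very happy to show you the 🤗 Transformers library."))
    # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]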
-
-If you are looking for custom support from the Hugging Face team
------------------------------------------------------------------------------------------------------------------------
-
-.. raw:: html
-
-    HuggingFace Expert Acceleration Program
-
-Features
------------------------------------------------------------------------------------------------------------------------
-
-- High performance on NLU and NLG tasks
-- Low barrier to entry for educators and practitioners
-
-State-of-the-art NLP for everyone:
-
-- Deep learning researchers
-- Hands-on practitioners
-- AI/ML/NLP teachers and educators
-
-Lower compute costs, smaller carbon footprint:
-
-- Researchers can share trained models instead of always retraining
-- Practitioners can reduce compute time and production costs
-- 8 architectures with over 30 pretrained models, some in more than 100 languages
-
-Choose the right framework for every part of a model's lifetime:
-
-- Train state-of-the-art models in 3 lines of code
-- Deep interoperability between Jax, PyTorch and TensorFlow models
-- Move a single model between Jax/PyTorch/TensorFlow frameworks at will (see the sketch below)
-- Seamlessly pick the right framework for training, evaluation, production
-
-The support for Jax is still experimental (with a few models right now); expect to see it grow in the coming months!
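The interoperability bullets above translate directly into the ``Auto`` classes. A minimal sketch, assuming only that ``bert-base-cased`` stands in for any checkpoint published with weights in all three frameworks:

.. code-block:: python

    from transformers import AutoModel, TFAutoModel, FlaxAutoModel

    pt_model = AutoModel.from_pretrained("bert-base-cased")        # PyTorch
    tf_model = TFAutoModel.from_pretrained("bert-base-cased")      # TensorFlow
    flax_model = FlaxAutoModel.from_pretrained("bert-base-cased")  # Flax/Jax

    # Weights saved from one framework can be reloaded in another,
    # e.g. PyTorch -> TensorFlow via from_pt=True:
    pt_model.save_pretrained("./my-bert")
    tf_model = TFAutoModel.from_pretrained("./my-bert", from_pt=True)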
-
-`All the model checkpoints `__ are seamlessly integrated from the huggingface.co `model
-hub `__ where they are uploaded directly by `users `__ and
-`organizations `__.
-
-Current number of checkpoints: |checkpoints|
-
-.. |checkpoints| image:: https://img.shields.io/endpoint?url=https://huggingface.co/api/shields/models&color=brightgreen
-
-Contents
------------------------------------------------------------------------------------------------------------------------
-
-The documentation is organized in five parts:
-
-- **GET STARTED** contains a quick tour, the installation instructions and some useful information about our philosophy
-  and a glossary.
-- **USING 🤗 TRANSFORMERS** contains general tutorials on how to use the library.
-- **ADVANCED GUIDES** contains more advanced guides that are more specific to a given script or part of the library.
-- **RESEARCH** focuses on tutorials that have less to do with how to use the library but more about general research in
-  transformers models.
-- The last three sections contain the documentation of each public class and function, grouped in:
-
-  - **MAIN CLASSES** for the main classes exposing the important APIs of the library.
-  - **MODELS** for the classes and functions related to each model implemented in the library.
-  - **INTERNAL HELPERS** for the classes and functions we use internally.
-
-The library currently contains Jax, PyTorch and TensorFlow implementations, pretrained model weights, usage scripts and
-conversion utilities for the following models.
-
-Supported models
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-..
-    This list is updated automatically from the README with `make fix-copies`. Do not update manually!
-
-1. :doc:`ALBERT ` (from Google Research and the Toyota Technological Institute at Chicago) released
-   with the paper `ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
-   `__, by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush
-   Sharma, Radu Soricut.
-2. :doc:`BART ` (from Facebook) released with the paper `BART: Denoising Sequence-to-Sequence
-   Pre-training for Natural Language Generation, Translation, and Comprehension
-   `__ by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman
-   Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
-3. :doc:`BARThez ` (from École polytechnique) released with the paper `BARThez: a Skilled Pretrained
-   French Sequence-to-Sequence Model `__ by Moussa Kamal Eddine, Antoine J.-P.
-   Tixier, Michalis Vazirgiannis.
-4. :doc:`BARTpho ` (from VinAI Research) released with the paper `BARTpho: Pre-trained
-   Sequence-to-Sequence Models for Vietnamese `__ by Nguyen Luong Tran, Duong Minh Le
-   and Dat Quoc Nguyen.
-5. :doc:`BEiT ` (from Microsoft) released with the paper `BEiT: BERT Pre-Training of Image Transformers
-   `__ by Hangbo Bao, Li Dong, Furu Wei.
-6. :doc:`BERT ` (from Google) released with the paper `BERT: Pre-training of Deep Bidirectional
-   Transformers for Language Understanding `__ by Jacob Devlin, Ming-Wei Chang,
-   Kenton Lee and Kristina Toutanova.
-7. :doc:`BERTweet ` (from VinAI Research) released with the paper `BERTweet: A pre-trained language
-   model for English Tweets `__ by Dat Quoc Nguyen, Thanh Vu and Anh Tuan
-   Nguyen.
-8. :doc:`BERT For Sequence Generation ` (from Google) released with the paper `Leveraging
-   Pre-trained Checkpoints for Sequence Generation Tasks `__ by Sascha Rothe, Shashi
-   Narayan, Aliaksei Severyn.
-9. :doc:`BigBird-RoBERTa ` (from Google Research) released with the paper `Big Bird: Transformers
-   for Longer Sequences `__ by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua
-   Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
-10. :doc:`BigBird-Pegasus ` (from Google Research) released with the paper `Big Bird:
-    Transformers for Longer Sequences `__ by Manzil Zaheer, Guru Guruganesh, Avinava
-    Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr
-    Ahmed.
-11. :doc:`Blenderbot ` (from Facebook) released with the paper `Recipes for building an
-    open-domain chatbot `__ by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary
-    Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
-12. :doc:`BlenderbotSmall ` (from Facebook) released with the paper `Recipes for building
-    an open-domain chatbot `__ by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju,
-    Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
-13. :doc:`BORT ` (from Alexa) released with the paper `Optimal Subarchitecture Extraction For BERT
-    `__ by Adrian de Wynter and Daniel J. Perry.
-14. :doc:`ByT5 ` (from Google Research) released with the paper `ByT5: Towards a token-free future with
-    pre-trained byte-to-byte models `__ by Linting Xue, Aditya Barua, Noah Constant,
-    Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
-15. :doc:`CamemBERT ` (from Inria/Facebook/Sorbonne) released with the paper `CamemBERT: a Tasty
-    French Language Model `__ by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz
-    Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
-16. :doc:`CANINE ` (from Google Research) released with the paper `CANINE: Pre-training an Efficient
-    Tokenization-Free Encoder for Language Representation `__ by Jonathan H. Clark,
-    Dan Garrette, Iulia Turc, John Wieting.
-17. :doc:`CLIP ` (from OpenAI) released with the paper `Learning Transferable Visual Models From
-    Natural Language Supervision `__ by Alec Radford, Jong Wook Kim, Chris Hallacy,
-    Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen
-    Krueger, Ilya Sutskever.
-18. :doc:`ConvBERT ` (from YituTech) released with the paper `ConvBERT: Improving BERT with
-    Span-based Dynamic Convolution `__ by Zihang Jiang, Weihao Yu, Daquan Zhou,
-    Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
-19. :doc:`CPM ` (from Tsinghua University) released with the paper `CPM: A Large-scale Generative
-    Chinese Pre-trained Language Model `__ by Zhengyan Zhang, Xu Han, Hao Zhou, Pei
-    Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng,
-    Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang,
-    Juanzi Li, Xiaoyan Zhu, Maosong Sun.
-20. :doc:`CTRL ` (from Salesforce) released with the paper `CTRL: A Conditional Transformer Language
-    Model for Controllable Generation `__ by Nitish Shirish Keskar*, Bryan McCann*,
-    Lav R. Varshney, Caiming Xiong and Richard Socher.
-21. :doc:`DeBERTa ` (from Microsoft) released with the paper `DeBERTa: Decoding-enhanced BERT with
-    Disentangled Attention `__ by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu
-    Chen.
-22. :doc:`DeBERTa-v2 ` (from Microsoft) released with the paper `DeBERTa: Decoding-enhanced BERT
-    with Disentangled Attention `__ by Pengcheng He, Xiaodong Liu, Jianfeng Gao,
-    Weizhu Chen.
-23. :doc:`DeiT ` (from Facebook) released with the paper `Training data-efficient image transformers &
-    distillation through attention `__ by Hugo Touvron, Matthieu Cord, Matthijs
-    Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
-24. :doc:`DETR ` (from Facebook) released with the paper `End-to-End Object Detection with Transformers
-    `__ by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier,
-    Alexander Kirillov, Sergey Zagoruyko.
-25. :doc:`DialoGPT ` (from Microsoft Research) released with the paper `DialoGPT: Large-Scale
-    Generative Pre-training for Conversational Response Generation `__ by Yizhe
-    Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
-26. :doc:`DistilBERT ` (from HuggingFace), released together with the paper `DistilBERT, a
-    distilled version of BERT: smaller, faster, cheaper and lighter `__ by Victor
-    Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into `DistilGPT2
-    `__, RoBERTa into `DistilRoBERTa
-    `__, Multilingual BERT into
-    `DistilmBERT `__ and a German
-    version of DistilBERT.
-27. :doc:`DPR ` (from Facebook) released with the paper `Dense Passage Retrieval for Open-Domain
-    Question Answering `__ by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick
-    Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
-28. :doc:`EncoderDecoder ` (from Google Research) released with the paper `Leveraging
-    Pre-trained Checkpoints for Sequence Generation Tasks `__ by Sascha Rothe, Shashi
-    Narayan, Aliaksei Severyn.
-29. :doc:`ELECTRA ` (from Google Research/Stanford University) released with the paper `ELECTRA:
-    Pre-training text encoders as discriminators rather than generators `__ by Kevin
-    Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
-30. :doc:`FlauBERT ` (from CNRS) released with the paper `FlauBERT: Unsupervised Language Model
-    Pre-training for French `__ by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne,
-    Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
-31. :doc:`FNet ` (from Google Research) released with the paper `FNet: Mixing Tokens with Fourier
-    Transforms `__ by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago
-    Ontanon.
-32. :doc:`Funnel Transformer ` (from CMU/Google Brain) released with the paper `Funnel-Transformer:
-    Filtering out Sequential Redundancy for Efficient Language Processing `__ by
-    Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
-33. :doc:`GPT ` (from OpenAI) released with the paper `Improving Language Understanding by Generative
-    Pre-Training `__ by Alec Radford, Karthik Narasimhan, Tim Salimans
-    and Ilya Sutskever.
-34. :doc:`GPT-2 ` (from OpenAI) released with the paper `Language Models are Unsupervised Multitask
-    Learners `__ by Alec Radford*, Jeffrey Wu*, Rewon Child, David
-    Luan, Dario Amodei** and Ilya Sutskever**.
-35. :doc:`GPT-J ` (from EleutherAI) released in the repository `kingoflolz/mesh-transformer-jax
-    `__ by Ben Wang and Aran Komatsuzaki.
-36. :doc:`GPT Neo ` (from EleutherAI) released in the repository `EleutherAI/gpt-neo
-    `__ by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
-37. :doc:`Hubert ` (from Facebook) released with the paper `HuBERT: Self-Supervised Speech
-    Representation Learning by Masked Prediction of Hidden Units `__ by Wei-Ning Hsu,
-    Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
-38. :doc:`I-BERT ` (from Berkeley) released with the paper `I-BERT: Integer-only BERT Quantization
-    `__ by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.
-39. `ImageGPT `__ (from OpenAI) released with the
-    paper `Generative Pretraining from Pixels `__ by Mark Chen, Alec Radford, Rewon
-    Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever.
-40. :doc:`LayoutLM ` (from Microsoft Research Asia) released with the paper `LayoutLM: Pre-training
-    of Text and Layout for Document Image Understanding `__ by Yiheng Xu, Minghao Li,
-    Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
-41. :doc:`LayoutLMv2 ` (from Microsoft Research Asia) released with the paper `LayoutLMv2:
-    Multi-modal Pre-training for Visually-Rich Document Understanding `__ by Yang Xu,
-    Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min
-    Zhang, Lidong Zhou.
-42. :doc:`LayoutXLM ` (from Microsoft Research Asia) released with the paper `LayoutXLM:
-    Multimodal Pre-training for Multilingual Visually-rich Document Understanding `__
-    by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei.
-43. :doc:`LED ` (from AllenAI) released with the paper `Longformer: The Long-Document Transformer
-    `__ by Iz Beltagy, Matthew E. Peters, Arman Cohan.
-44. :doc:`Longformer ` (from AllenAI) released with the paper `Longformer: The Long-Document
-    Transformer `__ by Iz Beltagy, Matthew E. Peters, Arman Cohan.
-45. :doc:`LUKE ` (from Studio Ousia) released with the paper `LUKE: Deep Contextualized Entity
-    Representations with Entity-aware Self-attention `__ by Ikuya Yamada, Akari Asai,
-    Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto.
-46. :doc:`LXMERT ` (from UNC Chapel Hill) released with the paper `LXMERT: Learning Cross-Modality
-    Encoder Representations from Transformers for Open-Domain Question Answering `__
-    by Hao Tan and Mohit Bansal.
-47. :doc:`M2M100 ` (from Facebook) released with the paper `Beyond English-Centric Multilingual
-    Machine Translation `__ by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma,
-    Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal,
-    Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
-48. :doc:`MarianMT ` Machine translation models trained using `OPUS `__ data by
-    Jörg Tiedemann. The `Marian Framework `__ is being developed by the Microsoft
-    Translator Team.
-49. :doc:`MBart ` (from Facebook) released with the paper `Multilingual Denoising Pre-training for
-    Neural Machine Translation `__ by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li,
-    Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
-50. :doc:`MBart-50 ` (from Facebook) released with the paper `Multilingual Translation with Extensible
-    Multilingual Pretraining and Finetuning `__ by Yuqing Tang, Chau Tran, Xian Li,
-    Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
-51. :doc:`Megatron-BERT ` (from NVIDIA) released with the paper `Megatron-LM: Training
-    Multi-Billion Parameter Language Models Using Model Parallelism `__ by Mohammad
-    Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
-52. :doc:`Megatron-GPT2 ` (from NVIDIA) released with the paper `Megatron-LM: Training
-    Multi-Billion Parameter Language Models Using Model Parallelism `__ by Mohammad
-    Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
-53. :doc:`MPNet ` (from Microsoft Research) released with the paper `MPNet: Masked and Permuted
-    Pre-training for Language Understanding `__ by Kaitao Song, Xu Tan, Tao Qin,
-    Jianfeng Lu, Tie-Yan Liu.
-54. :doc:`MT5 ` (from Google AI) released with the paper `mT5: A massively multilingual pre-trained
-    text-to-text transformer `__ by Linting Xue, Noah Constant, Adam Roberts, Mihir
-    Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
-55. :doc:`Pegasus ` (from Google) released with the paper `PEGASUS: Pre-training with Extracted
-    Gap-sentences for Abstractive Summarization `__ by Jingqing Zhang, Yao Zhao,
-    Mohammad Saleh and Peter J. Liu.
-56. `Perceiver IO `__ (from Deepmind) released
-    with the paper `Perceiver IO: A General Architecture for Structured Inputs & Outputs
-    `__ by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch,
-    Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M.
-    Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
-57. :doc:`PhoBERT ` (from VinAI Research) released with the paper `PhoBERT: Pre-trained language
-    models for Vietnamese `__ by Dat Quoc Nguyen and Anh Tuan
-    Nguyen.
-58. :doc:`ProphetNet ` (from Microsoft Research) released with the paper `ProphetNet: Predicting
-    Future N-gram for Sequence-to-Sequence Pre-training `__ by Yu Yan, Weizhen Qi,
-    Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
-59. :doc:`QDQBert ` (from NVIDIA) released with the paper `Integer Quantization for Deep Learning
-    Inference: Principles and Empirical Evaluation `__ by Hao Wu, Patrick Judd,
-    Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius.
-60. :doc:`Reformer ` (from Google Research) released with the paper `Reformer: The Efficient
-    Transformer `__ by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
-61. :doc:`RemBERT ` (from Google Research) released with the paper `Rethinking embedding coupling in
-    pre-trained language models `__ by Hyung Won Chung, Thibault Févry, Henry
-    Tsai, M. Johnson, Sebastian Ruder.
-62. :doc:`RoBERTa ` (from Facebook), released together with the paper `RoBERTa: A Robustly Optimized
-    BERT Pretraining Approach `__ by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar
-    Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
-63. :doc:`RoFormer ` (from ZhuiyiTechnology), released together with the paper `RoFormer:
-    Enhanced Transformer with Rotary Position Embedding `__ by Jianlin Su and
-    Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
-64. :doc:`SegFormer ` (from NVIDIA) released with the paper `SegFormer: Simple and Efficient
-    Design for Semantic Segmentation with Transformers `__ by Enze Xie, Wenhai Wang,
-    Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo.
-65. :doc:`SEW ` (from ASAPP) released with the paper `Performance-Efficiency Trade-offs in Unsupervised
-    Pre-training for Speech Recognition `__ by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu
-    Han, Kilian Q. Weinberger, Yoav Artzi.
-66. :doc:`SEW-D ` (from ASAPP) released with the paper `Performance-Efficiency Trade-offs in
-    Unsupervised Pre-training for Speech Recognition `__ by Felix Wu, Kwangyoun Kim,
-    Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
-67. :doc:`SpeechToTextTransformer ` (from Facebook), released together with the paper
-    `fairseq S2T: Fast Speech-to-Text Modeling with fairseq `__ by Changhan Wang, Yun
-    Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
-68. :doc:`SpeechToTextTransformer2 ` (from Facebook), released together with the paper
-    `Large-Scale Self- and Semi-Supervised Learning for Speech Translation `__ by
-    Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau.
-69. :doc:`Splinter ` (from Tel Aviv University), released together with the paper `Few-Shot
-    Question Answering by Pretraining Span Selection `__ by Ori Ram, Yuval Kirstain,
-    Jonathan Berant, Amir Globerson, Omer Levy.
-70. :doc:`SqueezeBert ` (from Berkeley) released with the paper `SqueezeBERT: What can computer
-    vision teach NLP about efficient neural networks? `__ by Forrest N. Iandola,
-    Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
-71. :doc:`T5 ` (from Google AI) released with the paper `Exploring the Limits of Transfer Learning with a
-    Unified Text-to-Text Transformer `__ by Colin Raffel and Noam Shazeer and Adam
-    Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
-72. :doc:`T5v1.1 ` (from Google AI) released in the repository
-    `google-research/text-to-text-transfer-transformer
-    `__ by
-    Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi
-    Zhou and Wei Li and Peter J. Liu.
-73. :doc:`TAPAS ` (from Google AI) released with the paper `TAPAS: Weakly Supervised Table Parsing via
-    Pre-training `__ by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller,
-    Francesco Piccinno and Julian Martin Eisenschlos.
-74. :doc:`Transformer-XL ` (from Google/CMU) released with the paper `Transformer-XL:
-    Attentive Language Models Beyond a Fixed-Length Context `__ by Zihang Dai*,
-    Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
-75. :doc:`TrOCR ` (from Microsoft), released together with the paper `TrOCR: Transformer-based Optical
-    Character Recognition with Pre-trained Models `__ by Minghao Li, Tengchao Lv, Lei
-    Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
-76. :doc:`UniSpeech ` (from Microsoft Research) released with the paper `UniSpeech: Unified Speech
-    Representation Learning with Labeled and Unlabeled Data `__ by Chengyi Wang, Yu
-    Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
-77. :doc:`UniSpeechSat ` (from Microsoft Research) released with the paper `UNISPEECH-SAT:
-    UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING `__ by
-    Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li,
-    Xiangzhan Yu.
-78. :doc:`Vision Transformer (ViT) ` (from Google AI) released with the paper `An Image is Worth 16x16
-    Words: Transformers for Image Recognition at Scale `__ by Alexey Dosovitskiy,
-    Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias
-    Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
-79. :doc:`VisualBERT ` (from UCLA NLP) released with the paper `VisualBERT: A Simple and
-    Performant Baseline for Vision and Language `__ by Liunian Harold Li, Mark
-    Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
-80. :doc:`Wav2Vec2 ` (from Facebook AI) released with the paper `wav2vec 2.0: A Framework for
-    Self-Supervised Learning of Speech Representations `__ by Alexei Baevski, Henry
-    Zhou, Abdelrahman Mohamed, Michael Auli.
-81. :doc:`XLM ` (from Facebook) released together with the paper `Cross-lingual Language Model
-    Pretraining `__ by Guillaume Lample and Alexis Conneau.
-82. :doc:`XLM-ProphetNet ` (from Microsoft Research) released with the paper `ProphetNet:
-    Predicting Future N-gram for Sequence-to-Sequence Pre-training `__ by Yu Yan,
-    Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
-83. :doc:`XLM-RoBERTa ` (from Facebook AI), released together with the paper `Unsupervised
-    Cross-lingual Representation Learning at Scale `__ by Alexis Conneau*, Kartikay
-    Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke
-    Zettlemoyer and Veselin Stoyanov.
-84. :doc:`XLNet ` (from Google/CMU) released with the paper `XLNet: Generalized Autoregressive
-    Pretraining for Language Understanding `__ by Zhilin Yang*, Zihang Dai*, Yiming
-    Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
-85. :doc:`XLSR-Wav2Vec2 ` (from Facebook AI) released with the paper `Unsupervised
-    Cross-Lingual Representation Learning For Speech Recognition `__ by Alexis
-    Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
-
-
-Supported frameworks
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The table below represents the current support in the library for each of those models: whether they have a Python
-tokenizer (called "slow"), a "fast" tokenizer backed by the 🤗 Tokenizers library, and whether they have support in Jax
-(via Flax), PyTorch, and/or TensorFlow.
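The slow/fast distinction is visible directly on the tokenizer objects. A minimal sketch (``bert-base-cased`` is again only an example of a model that ships both implementations):

.. code-block:: python

    from transformers import AutoTokenizer

    # AutoTokenizer returns the fast, 🤗 Tokenizers-backed implementation when one
    # exists for the checkpoint; use_fast=False forces the pure-Python "slow" one.
    fast_tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
    slow_tokenizer = AutoTokenizer.from_pretrained("bert-base-cased", use_fast=False)
    print(fast_tokenizer.is_fast, slow_tokenizer.is_fast)  # True False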
-
-..
-    This table is updated automatically from the auto modules with `make fix-copies`. Do not update manually!
-
-.. rst-class:: center-aligned-table
-
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| Model                       | Tokenizer slow | Tokenizer fast | PyTorch support | TensorFlow support | Flax Support |
-+=============================+================+================+=================+====================+==============+
-| ALBERT                      | ✅             | ✅             | ✅              | ✅                 | ✅           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| BART                        | ✅             | ✅             | ✅              | ✅                 | ✅           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| BEiT                        | ❌             | ❌             | ✅              | ❌                 | ✅           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| BERT                        | ✅             | ✅             | ✅              | ✅                 | ✅           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| Bert Generation             | ✅             | ❌             | ✅              | ❌                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| BigBird                     | ✅             | ✅             | ✅              | ❌                 | ✅           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| BigBirdPegasus              | ❌             | ❌             | ✅              | ❌                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| Blenderbot                  | ✅             | ✅             | ✅              | ✅                 | ✅           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| BlenderbotSmall             | ✅             | ✅             | ✅              | ✅                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| CamemBERT                   | ✅             | ✅             | ✅              | ✅                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| Canine                      | ✅             | ❌             | ✅              | ❌                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| CLIP                        | ✅             | ✅             | ✅              | ❌                 | ✅           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| ConvBERT                    | ✅             | ✅             | ✅              | ✅                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| CTRL                        | ✅             | ❌             | ✅              | ✅                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| DeBERTa                     | ✅             | ✅             | ✅              | ✅                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| DeBERTa-v2                  | ✅             | ❌             | ✅              | ✅                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| DeiT                        | ❌             | ❌             | ✅              | ❌                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| DETR                        | ❌             | ❌             | ✅              | ❌                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| DistilBERT                  | ✅             | ✅             | ✅              | ✅                 | ✅           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| DPR                         | ✅             | ✅             | ✅              | ✅                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| ELECTRA                     | ✅             | ✅             | ✅              | ✅                 | ✅           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| Encoder decoder             | ❌             | ❌             | ✅              | ✅                 | ✅           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| FairSeq Machine-Translation | ✅             | ❌             | ✅              | ❌                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| FlauBERT                    | ✅             | ❌             | ✅              | ✅                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| FNet                        | ✅             | ✅             | ✅              | ❌                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| Funnel Transformer          | ✅             | ✅             | ✅              | ✅                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| GPT Neo                     | ❌             | ❌             | ✅              | ❌                 | ✅           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| GPT-J                       | ❌             | ❌             | ✅              | ❌                 | ✅           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| Hubert                      | ❌             | ❌             | ✅              | ✅                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| I-BERT                      | ❌             | ❌             | ✅              | ❌                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| ImageGPT                    | ❌             | ❌             | ✅              | ❌                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| LayoutLM                    | ✅             | ✅             | ✅              | ✅                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| LayoutLMv2                  | ✅             | ✅             | ✅              | ❌                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| LED                         | ✅             | ✅             | ✅              | ✅                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| Longformer                  | ✅             | ✅             | ✅              | ✅                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| LUKE                        | ✅             | ❌             | ✅              | ❌                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| LXMERT                      | ✅             | ✅             | ✅              | ✅                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| M2M100                      | ✅             | ❌             | ✅              | ❌                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| Marian                      | ✅             | ❌             | ✅              | ✅                 | ✅           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| mBART                       | ✅             | ✅             | ✅              | ✅                 | ✅           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| MegatronBert                | ❌             | ❌             | ✅              | ❌                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| MobileBERT                  | ✅             | ✅             | ✅              | ✅                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| MPNet                       | ✅             | ✅             | ✅              | ✅                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| mT5                         | ✅             | ✅             | ✅              | ✅                 | ✅           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| OpenAI GPT                  | ✅             | ✅             | ✅              | ✅                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| OpenAI GPT-2                | ✅             | ✅             | ✅              | ✅                 | ✅           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| Pegasus                     | ✅             | ✅             | ✅              | ✅                 | ✅           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| Perceiver                   | ✅             | ❌             | ✅              | ❌                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| ProphetNet                  | ✅             | ❌             | ✅              | ❌                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| QDQBert                     | ❌             | ❌             | ✅              | ❌                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| RAG                         | ✅             | ❌             | ✅              | ✅                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| Reformer                    | ✅             | ✅             | ✅              | ❌                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| RemBERT                     | ✅             | ✅             | ✅              | ✅                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| RetriBERT                   | ✅             | ✅             | ✅              | ❌                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| RoBERTa                     | ✅             | ✅             | ✅              | ✅                 | ✅           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| RoFormer                    | ✅             | ✅             | ✅              | ✅                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| SegFormer                   | ❌             | ❌             | ✅              | ❌                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| SEW                         | ❌             | ❌             | ✅              | ❌                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| SEW-D                       | ❌             | ❌             | ✅              | ❌                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| Speech Encoder decoder      | ❌             | ❌             | ✅              | ❌                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| Speech2Text                 | ✅             | ❌             | ✅              | ❌                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| Speech2Text2                | ✅             | ❌             | ❌              | ❌                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| Splinter                    | ✅             | ✅             | ✅              | ❌                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| SqueezeBERT                 | ✅             | ✅             | ✅              | ❌                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| T5                          | ✅             | ✅             | ✅              | ✅                 | ✅           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| TAPAS                       | ✅             | ❌             | ✅              | ✅                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| Transformer-XL              | ✅             | ❌             | ✅              | ✅                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| TrOCR                       | ❌             | ❌             | ✅              | ❌                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| UniSpeech                   | ❌             | ❌             | ✅              | ❌                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| UniSpeechSat                | ❌             | ❌             | ✅              | ❌                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| Vision Encoder decoder      | ❌             | ❌             | ✅              | ❌                 | ✅           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| VisionTextDualEncoder       | ❌             | ❌             | ✅              | ❌                 | ✅           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| VisualBert                  | ❌             | ❌             | ✅              | ❌                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| ViT                         | ❌             | ❌             | ✅              | ✅                 | ✅           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| Wav2Vec2                    | ✅             | ❌             | ✅              | ✅                 | ✅           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| XLM                         | ✅             | ❌             | ✅              | ✅                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| XLM-RoBERTa                 | ✅             | ✅             | ✅              | ✅                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| XLMProphetNet               | ✅             | ❌             | ✅              | ❌                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-| XLNet                       | ✅             | ✅             | ✅              | ✅                 | ❌           |
-+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
-
-.. toctree::
-    :maxdepth: 2
-    :caption: Get started
-
-    quicktour
-    installation
-    philosophy
-    glossary
-
-.. toctree::
-    :maxdepth: 2
-    :caption: Using 🤗 Transformers
-
-    task_summary
-    model_summary
-    preprocessing
-    training
-    model_sharing
-    tokenizer_summary
-    multilingual
-
-.. toctree::
-    :maxdepth: 2
-    :caption: Advanced guides
-
-    pretrained_models
-    examples
-    troubleshooting
-    custom_datasets
-    notebooks
-    sagemaker
-    community
-    converting_tensorflow_models
-    migration
-    contributing
-    add_new_model
-    add_new_pipeline
-    fast_tokenizers
-    performance
-    parallelism
-    testing
-    debugging
-    serialization
-    pr_checks
-
-.. toctree::
-    :maxdepth: 2
-    :caption: Research
-
-    bertology
-    perplexity
-    benchmarks
-
-.. toctree::
-    :maxdepth: 2
-    :caption: Main Classes
-
-    main_classes/callback
-    main_classes/configuration
-    main_classes/data_collator
-    main_classes/keras_callbacks
-    main_classes/logging
-    main_classes/model
-    main_classes/optimizer_schedules
-    main_classes/output
-    main_classes/pipelines
-    main_classes/processors
-    main_classes/tokenizer
-    main_classes/trainer
-    main_classes/deepspeed
-    main_classes/feature_extractor
-
-.. toctree::
-    :maxdepth: 2
-    :caption: Models
-
-    model_doc/albert
-    model_doc/auto
-    model_doc/bart
-    model_doc/barthez
-    model_doc/bartpho
-    model_doc/beit
-    model_doc/bert
-    model_doc/bertweet
-    model_doc/bertgeneration
-    model_doc/bert_japanese
-    model_doc/bigbird
-    model_doc/bigbird_pegasus
-    model_doc/blenderbot
-    model_doc/blenderbot_small
-    model_doc/bort
-    model_doc/byt5
-    model_doc/camembert
-    model_doc/canine
-    model_doc/clip
-    model_doc/convbert
-    model_doc/cpm
-    model_doc/ctrl
-    model_doc/deberta
-    model_doc/deberta_v2
-    model_doc/deit
-    model_doc/detr
-    model_doc/dialogpt
-    model_doc/distilbert
-    model_doc/dpr
-    model_doc/electra
-    model_doc/encoderdecoder
-    model_doc/flaubert
-    model_doc/fnet
-    model_doc/fsmt
-    model_doc/funnel
-    model_doc/herbert
-    model_doc/ibert
-    model_doc/imagegpt
-    model_doc/layoutlm
-    model_doc/layoutlmv2
-    model_doc/layoutxlm
-    model_doc/led
-    model_doc/longformer
-    model_doc/luke
-    model_doc/lxmert
-    model_doc/marian
-    model_doc/m2m_100
-    model_doc/mbart
-    model_doc/megatron_bert
-    model_doc/megatron_gpt2
-    model_doc/mobilebert
-    model_doc/mpnet
-    model_doc/mt5
-    model_doc/gpt
-    model_doc/gpt2
-    model_doc/gptj
-    model_doc/gpt_neo
-    model_doc/hubert
-    model_doc/pegasus
-    model_doc/perceiver
-    model_doc/phobert
-    model_doc/prophetnet
-    model_doc/qdqbert
-    model_doc/rag
-    model_doc/reformer
-    model_doc/rembert
-    model_doc/retribert
-    model_doc/roberta
-    model_doc/roformer
-    model_doc/segformer
-    model_doc/sew
-    model_doc/sew_d
-    model_doc/speechencoderdecoder
-    model_doc/speech_to_text
-    model_doc/speech_to_text_2
-    model_doc/splinter
-    model_doc/squeezebert
-    model_doc/t5
-    model_doc/t5v1.1
-    model_doc/tapas
-    model_doc/transformerxl
-    model_doc/trocr
-    model_doc/unispeech
-    model_doc/unispeech_sat
-    model_doc/visionencoderdecoder
-    model_doc/vision_text_dual_encoder
-    model_doc/vit
-    model_doc/visual_bert
-    model_doc/wav2vec2
-    model_doc/xlm
-    model_doc/xlmprophetnet
-    model_doc/xlmroberta
-    model_doc/xlnet
-    model_doc/xlsr_wav2vec2
-
-.. toctree::
-    :maxdepth: 2
-    :caption: Internal Helpers
-
-    internal/modeling_utils
-    internal/pipelines_utils
-    internal/tokenization_utils
-    internal/trainer_utils
-    internal/generation_utils
-    internal/file_utils