From e2154a0533566975bdb485758768d2d0e0d2309e Mon Sep 17 00:00:00 2001
From: eharper
Date: Thu, 15 Feb 2024 23:09:35 -0700
Subject: [PATCH 01/28] update

Signed-off-by: eharper
---
 README.rst | 47 +++++++++++++++++++++++++++++++++++------------
 1 file changed, 35 insertions(+), 12 deletions(-)

diff --git a/README.rst b/README.rst
index 44e5df6b7488..00a40c5f1087 100644
--- a/README.rst
+++ b/README.rst
@@ -57,31 +57,54 @@ such as FSDP, Mixture-of-Experts, and RLHF with TensorRT-LLM to provide speedups
 Introduction
 ------------

-NVIDIA NeMo is a conversational AI toolkit built for researchers working on automatic speech recognition (ASR),
-text-to-speech synthesis (TTS), large language models (LLMs), and
-natural language processing (NLP).
-The primary objective of NeMo is to help researchers from industry and academia to reuse prior work (code and pretrained models)
-and make it easier to create new `conversational AI models `_.
+NVIDIA NeMo Framework is a generative AI framework built for researchers and pytorch developers
+working on large language models (LLMs), multimodal models, automatic speech recognition (ASR),
+and text-to-speech synthesis (TTS).
+The primary objective of NeMo is to provide a scalable framework for researchers and developers from industry and academia
+to more easily implement and design new generative AI models by being able to leverage existing code and pretrained models.
+
+For technical documentation, please see the `NeMo Framework User Guide `_.

 All NeMo models are trained with `Lightning `_ and
 training is automatically scalable to 1000s of GPUs.
-Additionally, NeMo Megatron LLM models can be trained up to 1 trillion parameters using tensor and pipeline model parallelism.
-NeMo models can be optimized for inference and deployed for production use-cases with `NVIDIA Riva `_.
+
+When applicable, NeMo models take advantage of the latest possible distributed training techniques,
+including parallelism strategies such as
+* data parallelism
+* tensor paralellsim
+* pipeline model parallelism
+* fully sharded data parallelism (FSDP)
+* sequence parallelism
+* context parallelism
+* mixture-of-experts (MoE)
+and mixed precision training recipes with bfloat16 and FP8 training.
+
+NeMo's Transformer based LLM and Multimodal models leverage `NVIDIA Transformer Engine `_ for FP8 training on NVIDIA Hopper GPUs
+and leverages `NVIDIA Megatron Core `_ for scaling transformer model training.
+
+NeMo LLM and Multimodal models can be deployed and optimized with `NVIDIA Inference Microservices (Early Access)`_.
+
+NeMo ASR and TTS models can be optimized for inference and deployed for production use-cases with `NVIDIA Riva `_.
+
+For scaling NeMo LLM and Multimodal training on Slurm clusters or public clouds, please see the `NVIDIA Framework Launcher `_.
+The NeMo Framework launcher has extensive recipes, scripts, utilities, and documentation for training NeMo LLMs and Multimodal models and also has an `Autoconfigurator `_
+which can be used to find the optimal model parallel configuration for training on a specific cluster.
+To get started quickly with the NeMo Framework Launcher, please see the `NeMo Framework Playbooks `_
+The NeMo Framework Launcher does not currently support ASR and TTS training but will soon.

 Getting started with NeMo is simple.
 State of the Art pretrained NeMo models are freely available on `HuggingFace Hub `_ and
 `NVIDIA NGC `_.
-These models can be used to transcribe audio, synthesize speech, or translate text in just a few lines of code.
+These models can be used to generate text or images, transcribe audio, and synthesize speech in just a few lines of code.

 We have extensive `tutorials `_ that
-can be run on `Google Colab `_.
+can be run on `Google Colab `_ or with our `NGC NeMo Framework Container. `_
+
+

 For advanced users that want to train NeMo models from scratch or finetune existing NeMo models
 we have a full suite of `example scripts `_ that support multi-GPU/multi-node training.

-For scaling NeMo LLM training on Slurm clusters or public clouds, please see the `NVIDIA NeMo Megatron Launcher `_.
-The NM launcher has extensive recipes, scripts, utilities, and documentation for training NeMo LLMs and also has an `Autoconfigurator `_
-which can be used to find the optimal model parallel configuration for training on a specific cluster.

 Key Features
 ------------
From 7bf5c27870af226136fb9bbc9841ae14693e9283 Mon Sep 17 00:00:00 2001
From: eharper
Date: Thu, 15 Feb 2024 23:27:26 -0700
Subject: [PATCH 02/28] udpate

Signed-off-by: eharper
---
 README.rst | 82 ++----------------------------------------------------------------------------
 1 file changed, 3 insertions(+), 79 deletions(-)

diff --git a/README.rst b/README.rst
index 00a40c5f1087..b168464e4ac0 100644
--- a/README.rst
+++ b/README.rst
@@ -35,7 +35,7 @@

 .. _main-readme:

-**NVIDIA NeMo**
+**NVIDIA NeMo Framework**
 ===============

 Latest News
@@ -100,73 +100,9 @@ These models can be used to generate text or images, transcribe audio, and synth
 We have extensive `tutorials `_ that
 can be run on `Google Colab `_ or with our `NGC NeMo Framework Container. `_

-
-
 For advanced users that want to train NeMo models from scratch or finetune existing NeMo models
 we have a full suite of `example scripts `_ that support multi-GPU/multi-node training.
-
-Key Features
------------
-
-* Speech processing
-    * `HuggingFace Space for Audio Transcription (File, Microphone and YouTube) `_
-    * `Pretrained models `_ available in 14+ languages
-    * `Automatic Speech Recognition (ASR) `_
-        * Supported ASR `models `_:
-            * Jasper, QuartzNet, CitriNet, ContextNet
-            * Conformer-CTC, Conformer-Transducer, FastConformer-CTC, FastConformer-Transducer
-            * Squeezeformer-CTC and Squeezeformer-Transducer
-            * LSTM-Transducer (RNNT) and LSTM-CTC
-        * Supports the following decoders/losses:
-            * CTC
-            * Transducer/RNNT
-            * Hybrid Transducer/CTC
-            * NeMo Original `Multi-blank Transducers `_ and `Token-and-Duration Transducers (TDT) `_
-        * Streaming/Buffered ASR (CTC/Transducer) - `Chunked Inference Examples `_
-        * `Cache-aware Streaming Conformer `_ with multiple lookaheads (including microphone streaming `tutorial `_).
-        * Beam Search decoding
-        * `Language Modelling for ASR (CTC and RNNT) `_: N-gram LM in fusion with Beam Search decoding, Neural Rescoring with Transformer
-        * `Support of long audios for Conformer with memory efficient local attention `_
-    * `Speech Classification, Speech Command Recognition and Language Identification `_: MatchboxNet (Command Recognition), AmberNet (LangID)
-    * `Voice activity Detection (VAD) `_: MarbleNet
-        * ASR with VAD Inference - `Example `_
-    * `Speaker Recognition `_: TitaNet, ECAPA_TDNN, SpeakerNet
-    * `Speaker Diarization `_
-        * Clustering Diarizer: TitaNet, ECAPA_TDNN, SpeakerNet
-        * Neural Diarizer: MSDD (Multi-scale Diarization Decoder)
-    * `Speech Intent Detection and Slot Filling `_: Conformer-Transformer
-* Natural Language Processing
-    * `NeMo Megatron pre-training of Large Language Models `_
-    * `Neural Machine Translation (NMT) `_
-    * `Punctuation and Capitalization `_
-    * `Token classification (named entity recognition) `_
-    * `Text classification `_
-    * `Joint Intent and Slot Classification `_
-    * `Question answering `_
-    * `GLUE benchmark `_
-    * `Information retrieval `_
-    * `Entity Linking `_
-    * `Dialogue State Tracking `_
-    * `Prompt Learning `_
-    * `NGC collection of pre-trained NLP models. `_
-    * `Synthetic Tabular Data Generation `_
-* Text-to-Speech Synthesis (TTS):
-    * `Documentation `_
-    * Mel-Spectrogram generators: FastPitch, SSL FastPitch, Mixer-TTS/Mixer-TTS-X, RAD-TTS, Tacotron2
-    * Vocoders: HiFiGAN, UnivNet, WaveGlow
-    * End-to-End Models: VITS
-    * `Pre-trained Model Checkpoints in NVIDIA GPU Cloud (NGC) `_
-* `Tools `_
-    * `Text Processing (text normalization and inverse text normalization) `_
-    * `NeMo Forced Aligner `_
-    * `CTC-Segmentation tool `_
-    * `Speech Data Explorer `_: a dash-based tool for interactive exploration of ASR/TTS datasets
-    * `Speech Data Processor `_
-
-
-Built for speed, NeMo can utilize NVIDIA's Tensor Cores and scale out training to multiple GPUs and multiple nodes.
-
 Requirements
 ------------

@@ -174,8 +110,8 @@ Requirements
 2) Pytorch 1.13.1 or above
 3) NVIDIA GPU, if you intend to do model training

-Documentation
--------------
+Developer Documentation
+-----------------------

 .. |main| image:: https://readthedocs.com/projects/nvidia-nemo/badge/?version=main
   :alt: Documentation Status
@@ -195,18 +131,6 @@ Documentation
 | Stable | |stable| | `Documentation of the stable (i.e. most recent release) branch. `_ |
 +---------+-------------+------------------------------------------------------------------------------------------------------------------------------------------+

-Tutorials
----------
-A great way to start with NeMo is by checking `one of our tutorials `_.
-
-You can also get a high-level overview of NeMo by watching the talk *NVIDIA NeMo: Toolkit for Conversational AI*, presented at PyData Yerevan 2022:
-
-|pydata|
-
-.. |pydata| image:: https://img.youtube.com/vi/J-P6Sczmas8/maxres3.jpg
-  :target: https://www.youtube.com/embed/J-P6Sczmas8?mute=0&start=14&autoplay=0
-  :width: 600
-  :alt: NeMo presentation at PyData@Yerevan 2022

 Getting help with NeMo
 ----------------------
From 790815f71ce8c4cf737daa3bcfd60302b7499d91 Mon Sep 17 00:00:00 2001
From: eharper
Date: Thu, 15 Feb 2024 23:29:30 -0700
Subject: [PATCH 03/28] update

Signed-off-by: eharper
---
 nemo/collections/asr/README.md | 37 ++++++++++++++++++++++++++++++++++
 nemo/collections/tts/README.md | 7 +++++++
 2 files changed, 44 insertions(+)
 create mode 100644 nemo/collections/asr/README.md
 create mode 100644 nemo/collections/tts/README.md

diff --git a/nemo/collections/asr/README.md b/nemo/collections/asr/README.md
new file mode 100644
index 000000000000..691c9df2bb35
--- /dev/null
+++ b/nemo/collections/asr/README.md
@@ -0,0 +1,37 @@
+# Automatic Speech Recognition (ASR)
+
+## Key Features
+
+* `HuggingFace Space for Audio Transcription (File, Microphone and YouTube) `_
+* `Pretrained models `_ available in 14+ languages
+* `Automatic Speech Recognition (ASR) `_
+    * Supported ASR `models `_:
+        * Jasper, QuartzNet, CitriNet, ContextNet
+        * Conformer-CTC, Conformer-Transducer, FastConformer-CTC, FastConformer-Transducer
+        * Squeezeformer-CTC and Squeezeformer-Transducer
+        * LSTM-Transducer (RNNT) and LSTM-CTC
+    * Supports the following decoders/losses:
+        * CTC
+        * Transducer/RNNT
+        * Hybrid Transducer/CTC
+        * NeMo Original `Multi-blank Transducers `_ and `Token-and-Duration Transducers (TDT) `_
+    * Streaming/Buffered ASR (CTC/Transducer) - `Chunked Inference Examples `_
+    * `Cache-aware Streaming Conformer `_ with multiple lookaheads (including microphone streaming `tutorial `_).
+    * Beam Search decoding
+    * `Language Modelling for ASR (CTC and RNNT) `_: N-gram LM in fusion with Beam Search decoding, Neural Rescoring with Transformer
+    * `Support of long audios for Conformer with memory efficient local attention `_
+* `Speech Classification, Speech Command Recognition and Language Identification `_: MatchboxNet (Command Recognition), AmberNet (LangID)
+* `Voice activity Detection (VAD) `_: MarbleNet
+    * ASR with VAD Inference - `Example `_
+* `Speaker Recognition `_: TitaNet, ECAPA_TDNN, SpeakerNet
+* `Speaker Diarization `_
+    * Clustering Diarizer: TitaNet, ECAPA_TDNN, SpeakerNet
+    * Neural Diarizer: MSDD (Multi-scale Diarization Decoder)
+* `Speech Intent Detection and Slot Filling `_: Conformer-Transformer
+
+You can also get a high-level overview of NeMo ASR by watching the talk *NVIDIA NeMo: Toolkit for Conversational AI*, presented at PyData Yerevan 2022:
+
+
+[![NVIDIA NeMo: Toolkit for Conversational AI](https://img.youtube.com/vi/J-P6Sczmas8/maxres3.jpg
+)](https://www.youtube.com/embed/J-P6Sczmas8?mute=0&start=14&autoplay=0
+ "NeMo presentation at PyData@Yerevan 2022")
diff --git a/nemo/collections/tts/README.md b/nemo/collections/tts/README.md
new file mode 100644
index 000000000000..54af33ab31cb
--- /dev/null
+++ b/nemo/collections/tts/README.md
@@ -0,0 +1,7 @@
+# Text-to-Speech Synthesis (TTS):
+
+* `Documentation `_
+* Mel-Spectrogram generators: FastPitch, SSL FastPitch, Mixer-TTS/Mixer-TTS-X, RAD-TTS, Tacotron2
+* Vocoders: HiFiGAN, UnivNet, WaveGlow
+* End-to-End Models: VITS
+* `Pre-trained Model Checkpoints in NVIDIA GPU Cloud (NGC) `_
\ No newline at end of file
From 0cab11a2ae73e6ecedfa1a34868ef18de02fa550 Mon Sep 17 00:00:00 2001
From: eharper
Date: Thu, 15 Feb 2024 23:30:46 -0700
Subject: [PATCH 04/28] update

Signed-off-by: eharper
---
 README.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.rst b/README.rst
index b168464e4ac0..e3667721e38b 100644
--- a/README.rst
+++ b/README.rst
@@ -82,7 +82,7 @@ and mixed precision training recipes with bfloat16 and FP8 training.
 NeMo's Transformer based LLM and Multimodal models leverage `NVIDIA Transformer Engine `_ for FP8 training on NVIDIA Hopper GPUs
 and leverages `NVIDIA Megatron Core `_ for scaling transformer model training.

-NeMo LLM and Multimodal models can be deployed and optimized with `NVIDIA Inference Microservices (Early Access)`_.
+NeMo LLM and Multimodal models can be deployed and optimized with `NVIDIA Inference Microservices (Early Access) `_.

 NeMo ASR and TTS models can be optimized for inference and deployed for production use-cases with `NVIDIA Riva `_.

From d0389a59facc77cb1ed1e7aa04359a276239efbe Mon Sep 17 00:00:00 2001
From: eharper
Date: Thu, 15 Feb 2024 23:44:16 -0700
Subject: [PATCH 05/28] update

Signed-off-by: eharper
---
 README.rst | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/README.rst b/README.rst
index e3667721e38b..a1fc1c5e714f 100644
--- a/README.rst
+++ b/README.rst
@@ -82,6 +82,9 @@ and mixed precision training recipes with bfloat16 and FP8 training.
 NeMo's Transformer based LLM and Multimodal models leverage `NVIDIA Transformer Engine `_ for FP8 training on NVIDIA Hopper GPUs
 and leverages `NVIDIA Megatron Core `_ for scaling transformer model training.

+NeMo LLMs can be aligned with state of the art methods such as SteerLM, DPO and Reinforcement Learning from Human Feedback (RLHF),
+see `NVIDIA NeMo Aligner `_ for more details.
+
 NeMo LLM and Multimodal models can be deployed and optimized with `NVIDIA Inference Microservices (Early Access) `_.

 NeMo ASR and TTS models can be optimized for inference and deployed for production use-cases with `NVIDIA Riva `_.
From 9221f13976d55242249645dfef2a732a78469251 Mon Sep 17 00:00:00 2001
From: ntajbakhsh
Date: Fri, 16 Feb 2024 11:39:47 -0800
Subject: [PATCH 06/28] landing pages added

---
 nemo/collections/multimodal/README.md | 26 ++++++++++++++++++++++++++
 nemo/collections/nlp/README.md | 11 +++++++++++
 2 files changed, 37 insertions(+)
 create mode 100644 nemo/collections/multimodal/README.md
 create mode 100644 nemo/collections/nlp/README.md

diff --git a/nemo/collections/multimodal/README.md b/nemo/collections/multimodal/README.md
new file mode 100644
index 000000000000..015d1fedfaaf
--- /dev/null
+++ b/nemo/collections/multimodal/README.md
@@ -0,0 +1,26 @@
+## NeMo Multimodal Collections
+
+The NeMo Multimodal Collection supports a diverse range of multimodal models tailored for various tasks, including text-2-image generation, text-2-NeRF synthesis, multimodal language models (LLM), and foundational vision and language models. Leveraging existing modules from other NeMo collections such as LLM and Vision whenever feasible, our multimodal collections prioritize efficiency by avoiding redundant implementations and maximizing reuse of NeMo's existing modules. Here's a comprehensive list of the models currently supported within the multimodal collection:
+
+- **Foundation Vision-Language Models:**
+  - CLIP
+
+- **Foundation Text-to-Image Generation:**
+  - Stable Diffusion
+  - Imagen
+
+- **Customizable Text-to-Image Models:**
+  - SD-LoRA
+  - SD-ControlNet
+  - SD-Instruct pix2pix
+
+- **Multimodal Language Models:**
+  - NeVA
+  - LLAVA
+
+- **Text-to-NeRF Synthesis:**
+  - DreamFusion++
+
+- **NSFW Detection Support**
+
+Our documentation provides detailed information on each supported model, facilitating seamless integration and utilization within your projects.
\ No newline at end of file
diff --git a/nemo/collections/nlp/README.md b/nemo/collections/nlp/README.md
new file mode 100644
index 000000000000..1fd5f7b126ed
--- /dev/null
+++ b/nemo/collections/nlp/README.md
@@ -0,0 +1,11 @@
+## NeMo NLP/LLM Collection
+
+The NeMo NLP/LLM Collection is designed to provide comprehensive support for on-demand large language community models as well as Nvidia's top LLM offerings. Leveraging the constantly evolving Megatron Core and Transformer Engine libraries, our LLM collection is highly optimized, enabling NeMo users to perform foundation model training across thousands of GPUs and fine-tuning LLMs with SFT and PEFT. Additionally, we prioritize supporting TRTLLM export for the released models, which can accelerate inference by 2-3x depending on the model size. Here's a detailed list of the models currently supported within the LLM collection:
+
+- **Bert**
+- **GPT-style models**
+- **Falcon**
+- **code-llama 7B**
+- **Mixtral**
+
+Our documentation offers comprehensive insights into each supported model, facilitating seamless integration and utilization within your projects.
From 50cb408ae149660287f3d542b587ba5fbe9f3204 Mon Sep 17 00:00:00 2001
From: ntajbakhsh
Date: Fri, 16 Feb 2024 13:54:17 -0800
Subject: [PATCH 07/28] landing page added for vision

---
 nemo/collections/vision/README.md | 4 ++++
 1 file changed, 4 insertions(+)
 create mode 100644 nemo/collections/vision/README.md

diff --git a/nemo/collections/vision/README.md b/nemo/collections/vision/README.md
new file mode 100644
index 000000000000..1cfe05a97ab9
--- /dev/null
+++ b/nemo/collections/vision/README.md
@@ -0,0 +1,4 @@
+NeMo Vision Collection
+========================
+
+The NeMo Vision Collection is designed to support the multimodal collection, particularly for models like LLAVA that necessitate a vision encoder implementation. At present, the vision collection features support for ViT, a customized version of the transformer model from Megatron core.
\ No newline at end of file
From b5d62e48268adcee5057dc68c6c2d950a30f6c1b Mon Sep 17 00:00:00 2001
From: ntajbakhsh
Date: Fri, 16 Feb 2024 12:06:18 -0800
Subject: [PATCH 08/28] landing pages updated

---
 nemo/collections/multimodal/README.md | 7 ++++---
 nemo/collections/nlp/README.md | 6 ++++--
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/nemo/collections/multimodal/README.md b/nemo/collections/multimodal/README.md
index 015d1fedfaaf..e05fa3d88e11 100644
--- a/nemo/collections/multimodal/README.md
+++ b/nemo/collections/multimodal/README.md
@@ -1,6 +1,7 @@
-## NeMo Multimodal Collections
+NeMo Multimodal Collections
+============================

-The NeMo Multimodal Collection supports a diverse range of multimodal models tailored for various tasks, including text-2-image generation, text-2-NeRF synthesis, multimodal language models (LLM), and foundational vision and language models. Leveraging existing modules from other NeMo collections such as LLM and Vision whenever feasible, our multimodal collections prioritize efficiency by avoiding redundant implementations and maximizing reuse of NeMo's existing modules. Here's a comprehensive list of the models currently supported within the multimodal collection:
+The NeMo Multimodal Collection supports a diverse range of multimodal models tailored for various tasks, including text-2-image generation, text-2-NeRF synthesis, multimodal language models (LLM), and foundational vision and language models. Leveraging existing modules from other NeMo collections such as LLM and Vision whenever feasible, our multimodal collections prioritize efficiency by avoiding redundant implementations and maximizing reuse of NeMo's existing modules. Here's a detailed list of the models currently supported within the multimodal collection:

 - **Foundation Vision-Language Models:**
   - CLIP
@@ -23,4 +24,4 @@ The NeMo Multimodal Collection supports a diverse range of multimodal models tai

 - **NSFW Detection Support**

-Our documentation provides detailed information on each supported model, facilitating seamless integration and utilization within your projects.
\ No newline at end of file
+Our documentation provides detailed information on each supported model, facilitating seamless integration and utilization within your projects.
diff --git a/nemo/collections/nlp/README.md b/nemo/collections/nlp/README.md
index 1fd5f7b126ed..785bd4832c3a 100644
--- a/nemo/collections/nlp/README.md
+++ b/nemo/collections/nlp/README.md
@@ -1,11 +1,13 @@
-## NeMo NLP/LLM Collection
+NeMo NLP/LLM Collection
+========================

-The NeMo NLP/LLM Collection is designed to provide comprehensive support for on-demand large language community models as well as Nvidia's top LLM offerings. Leveraging the constantly evolving Megatron Core and Transformer Engine libraries, our LLM collection is highly optimized, enabling NeMo users to perform foundation model training across thousands of GPUs and fine-tuning LLMs with SFT and PEFT. Additionally, we prioritize supporting TRTLLM export for the released models, which can accelerate inference by 2-3x depending on the model size. Here's a detailed list of the models currently supported within the LLM collection:
+The NeMo NLP/LLM Collection is designed to provide comprehensive support for on-demand large language community models as well as Nvidia's top LLM offerings. By harnessing the cutting-edge Megatron Core, our LLM collection is highly optimized, empowering NeMo users to undertake foundation model training across thousands of GPUs while facilitating fine-tuning of LLMs using techniques such as SFT and PEFT. Leveraging the Transformer Engine library, our collection ensures seamless support for FP8 workloads on Hopper H100 GPUs. Additionally, we prioritize supporting TRTLLM export for the released models, which can accelerate inference by 2-3x depending on the model size. Here's a detailed list of the models currently supported within the LLM collection:

 - **Bert**
 - **GPT-style models**
 - **Falcon**
 - **code-llama 7B**
+- **Mistral**
 - **Mixtral**

 Our documentation offers comprehensive insights into each supported model, facilitating seamless integration and utilization within your projects.
From e10c12ab47a2db5c9e00596d9fdef1818424737c Mon Sep 17 00:00:00 2001
From: ntajbakhsh
Date: Fri, 16 Feb 2024 12:29:38 -0800
Subject: [PATCH 09/28] some minor changes to the main readme

---
 README.rst | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/README.rst b/README.rst
index a1fc1c5e714f..f3f2ccbb08cb 100644
--- a/README.rst
+++ b/README.rst
@@ -65,13 +65,15 @@ to more easily implement and design new generative AI models by being able to le

 For technical documentation, please see the `NeMo Framework User Guide `_.

+
+## Model Training and Scalability
 All NeMo models are trained with `Lightning `_ and
 training is automatically scalable to 1000s of GPUs.

 When applicable, NeMo models take advantage of the latest possible distributed training techniques,
 including parallelism strategies such as
 * data parallelism
-* tensor paralellsim
+* tensor parallelism
 * pipeline model parallelism
 * fully sharded data parallelism (FSDP)
 * sequence parallelism
@@ -82,19 +84,23 @@ and mixed precision training recipes with bfloat16 and FP8 training.
 NeMo's Transformer based LLM and Multimodal models leverage `NVIDIA Transformer Engine `_ for FP8 training on NVIDIA Hopper GPUs
 and leverages `NVIDIA Megatron Core `_ for scaling transformer model training.

+## Model Alignment
 NeMo LLMs can be aligned with state of the art methods such as SteerLM, DPO and Reinforcement Learning from Human Feedback (RLHF),
 see `NVIDIA NeMo Aligner `_ for more details.

+## Model Deployment
 NeMo LLM and Multimodal models can be deployed and optimized with `NVIDIA Inference Microservices (Early Access) `_.

 NeMo ASR and TTS models can be optimized for inference and deployed for production use-cases with `NVIDIA Riva `_.

+## Scaling and Training on Clusters
 For scaling NeMo LLM and Multimodal training on Slurm clusters or public clouds, please see the `NVIDIA Framework Launcher `_.
 The NeMo Framework launcher has extensive recipes, scripts, utilities, and documentation for training NeMo LLMs and Multimodal models and also has an `Autoconfigurator `_
 which can be used to find the optimal model parallel configuration for training on a specific cluster.
 To get started quickly with the NeMo Framework Launcher, please see the `NeMo Framework Playbooks `_
 The NeMo Framework Launcher does not currently support ASR and TTS training but will soon.

+## Getting Started
 Getting started with NeMo is simple.
 State of the Art pretrained NeMo models are freely available on `HuggingFace Hub `_ and
 `NVIDIA NGC `_.
From cac4ec8efeab577e34215ece05af4204b691f0d3 Mon Sep 17 00:00:00 2001
From: eharper
Date: Fri, 16 Feb 2024 17:42:00 -0700
Subject: [PATCH 10/28] update

Signed-off-by: eharper
---
 README.rst | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/README.rst b/README.rst
index a1fc1c5e714f..eae908aeb66a 100644
--- a/README.rst
+++ b/README.rst
@@ -106,6 +106,16 @@ can be run on `Google Colab `_ or with our `N
 For advanced users that want to train NeMo models from scratch or finetune existing NeMo models
 we have a full suite of `example scripts `_ that support multi-GPU/multi-node training.

+Key Features
+------------
+
+NeMo models are organized into collections:
+
+* NeMo LLM
+* NeMo Multimodal
+* NeMo ASR
+* NeMo TTS
+
 Requirements
 ------------
From c34182499c7f477e1d4965c2cc5de2df93c92c3b Mon Sep 17 00:00:00 2001
From: eharper
Date: Fri, 16 Feb 2024 17:44:45 -0700
Subject: [PATCH 11/28] update

Signed-off-by: eharper
---
 README.rst | 18 +++++-------------
 nemo/collections/nlp/README.md | 2 +-
 2 files changed, 6 insertions(+), 14 deletions(-)

diff --git a/README.rst b/README.rst
index 90bf38d9cb80..520a88b70828 100644
--- a/README.rst
+++ b/README.rst
@@ -65,15 +65,13 @@ to more easily implement and design new generative AI models by being able to le

 For technical documentation, please see the `NeMo Framework User Guide `_.

-
-## Model Training and Scalability
 All NeMo models are trained with `Lightning `_ and
 training is automatically scalable to 1000s of GPUs.

 When applicable, NeMo models take advantage of the latest possible distributed training techniques,
 including parallelism strategies such as
 * data parallelism
-* tensor parallelism
+* tensor paralellsim
 * pipeline model parallelism
 * fully sharded data parallelism (FSDP)
 * sequence parallelism
@@ -84,23 +82,19 @@ and mixed precision training recipes with bfloat16 and FP8 training.
 NeMo's Transformer based LLM and Multimodal models leverage `NVIDIA Transformer Engine `_ for FP8 training on NVIDIA Hopper GPUs
 and leverages `NVIDIA Megatron Core `_ for scaling transformer model training.

-## Model Alignment
 NeMo LLMs can be aligned with state of the art methods such as SteerLM, DPO and Reinforcement Learning from Human Feedback (RLHF),
 see `NVIDIA NeMo Aligner `_ for more details.

-## Model Deployment
 NeMo LLM and Multimodal models can be deployed and optimized with `NVIDIA Inference Microservices (Early Access) `_.

 NeMo ASR and TTS models can be optimized for inference and deployed for production use-cases with `NVIDIA Riva `_.

-## Scaling and Training on Clusters
 For scaling NeMo LLM and Multimodal training on Slurm clusters or public clouds, please see the `NVIDIA Framework Launcher `_.
 The NeMo Framework launcher has extensive recipes, scripts, utilities, and documentation for training NeMo LLMs and Multimodal models and also has an `Autoconfigurator `_
 which can be used to find the optimal model parallel configuration for training on a specific cluster.
 To get started quickly with the NeMo Framework Launcher, please see the `NeMo Framework Playbooks `_
 The NeMo Framework Launcher does not currently support ASR and TTS training but will soon.

-## Getting Started
 Getting started with NeMo is simple.
 State of the Art pretrained NeMo models are freely available on `HuggingFace Hub `_ and
 `NVIDIA NGC `_.
@@ -115,12 +109,10 @@ we have a full suite of `example scripts `_
+* `Multimodal `_
+* `Automatic Speech Recognition `_
+* `Text to Speech `_

 Requirements
 ------------
diff --git a/nemo/collections/nlp/README.md b/nemo/collections/nlp/README.md
index 785bd4832c3a..faef438ddb58 100644
--- a/nemo/collections/nlp/README.md
+++ b/nemo/collections/nlp/README.md
@@ -10,4 +10,4 @@ The NeMo NLP/LLM Collection is designed to provide comprehensive support for on-
 - **Mistral**
 - **Mixtral**

-Our documentation offers comprehensive insights into each supported model, facilitating seamless integration and utilization within your projects.
+Our `documentation `_ offers comprehensive insights into each supported model, facilitating seamless integration and utilization within your projects.
From d84503a2c6e6f992eb5313a53a233b878b34c302 Mon Sep 17 00:00:00 2001
From: eharper
Date: Fri, 16 Feb 2024 17:53:17 -0700
Subject: [PATCH 12/28] update

Signed-off-by: eharper
---
 README.rst | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.rst b/README.rst
index 520a88b70828..761ce200a15d 100644
--- a/README.rst
+++ b/README.rst
@@ -70,6 +70,7 @@ training is automatically scalable to 1000s of GPUs.

 When applicable, NeMo models take advantage of the latest possible distributed training techniques,
 including parallelism strategies such as
+
 * data parallelism
 * tensor paralellsim
 * pipeline model parallelism
 * fully sharded data parallelism (FSDP)
 * sequence parallelism
 * context parallelism
 * mixture-of-experts (MoE)
+
 and mixed precision training recipes with bfloat16 and FP8 training.

 NeMo's Transformer based LLM and Multimodal models leverage `NVIDIA Transformer Engine `_ for FP8 training on NVIDIA Hopper GPUs
From d8d08afb07e97fcd936d1bc8acc221d1cf438531 Mon Sep 17 00:00:00 2001
From: eharper
Date: Fri, 16 Feb 2024 17:55:08 -0700
Subject: [PATCH 13/28] update

Signed-off-by: eharper
---
 README.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.rst b/README.rst
index 761ce200a15d..511667869da1 100644
--- a/README.rst
+++ b/README.rst
@@ -104,6 +104,7 @@ These models can be used to generate text or images, transcribe audio, and synth

 We have extensive `tutorials `_ that
 can be run on `Google Colab `_ or with our `NGC NeMo Framework Container. `_
+and we have `playbooks `_ for users that want to train NeMo models with the NeMo Framework Launcher.

 For advanced users that want to train NeMo models from scratch or finetune existing NeMo models
 we have a full suite of `example scripts `_ that support multi-GPU/multi-node training.
From af9d2c0e81c518bb2eda78145c9151156ee636cc Mon Sep 17 00:00:00 2001
From: eharper
Date: Fri, 16 Feb 2024 18:03:21 -0700
Subject: [PATCH 14/28] update

Signed-off-by: eharper
---
 README.rst | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/README.rst b/README.rst
index 511667869da1..f698ab944d52 100644
--- a/README.rst
+++ b/README.rst
@@ -113,9 +113,10 @@ Key Features
 ------------

 * `Large Language Models `_
-* `Multimodal `_
+* `Multimodal `_
 * `Automatic Speech Recognition `_
 * `Text to Speech `_
+* `Computer Vision `_

 Requirements
 ------------
From e87cf41d529065098d229292d98a8879124c5785 Mon Sep 17 00:00:00 2001
From: eharper
Date: Fri, 16 Feb 2024 18:32:12 -0700
Subject: [PATCH 15/28] update

Signed-off-by: eharper
---
 docs/source/nlp/information_retrieval.rst | 2 +-
 docs/source/starthere/intro.rst | 15 +++++++++------
 2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/docs/source/nlp/information_retrieval.rst b/docs/source/nlp/information_retrieval.rst
index 5cf87143848c..b40caeee8a3b 100644
--- a/docs/source/nlp/information_retrieval.rst
+++ b/docs/source/nlp/information_retrieval.rst
@@ -8,7 +8,7 @@ The model architecture and pre-training process are detailed in the `Sentence-BE
 Sentence-BERT utilizes a BERT-based architecture, but it is trained using a siamese and triplet network structure to derive fixed-sized sentence embeddings that capture semantic information.
Sentence-BERT is commonly used to generate high-quality sentence embeddings for various downstream natural language processing tasks, such as semantic textual similarity, clustering, and information retrieval -Data Input for the Senntence-BERT model +Data Input for the Sentence-BERT model --------------------------------------- The fine-tuning data for the Sentence-BERT (SBERT) model should consist of data instances, diff --git a/docs/source/starthere/intro.rst b/docs/source/starthere/intro.rst index e6a59b0832ab..699fbd44faf1 100644 --- a/docs/source/starthere/intro.rst +++ b/docs/source/starthere/intro.rst @@ -8,14 +8,17 @@ Introduction .. _dummy_header: -`NVIDIA NeMo `_, part of the NVIDIA AI platform, is a toolkit for building new state-of-the-art -conversational AI models. NeMo has separate collections for Automatic Speech Recognition (ASR), -Natural Language Processing (NLP), and Text-to-Speech (TTS) models. Each collection consists of +NVIDIA NeMo Framework is an end-to-end, cloud-native framework to build, customize, and deploy generative AI models anywhere. +To learn more about using NeMo in generative AI workflows, please refer to the `NeMo Framework User Guide! `_." + +`NVIDIA NeMo Framework `_ has separate collections for Large Language Models (LLMs), +Multimodal, Computer Vision, Automatic Speech Recognition (ASR), +and Text-to-Speech (TTS) models. Each collection consists of prebuilt modules that include everything needed to train on your data. -Every module can easily be customized, extended, and composed to create new conversational AI +Every module can easily be customized, extended, and composed to create new generative AI model architectures. -Conversational AI architectures are typically large and require a lot of data and compute +Generative AI architectures are typically large and require a lot of data and compute for training. NeMo uses `PyTorch Lightning `_ for easy and performant multi-GPU/multi-node mixed-precision training. 
@@ -38,7 +41,7 @@ Before you begin using NeMo, it's assumed you meet the following prerequisites. Quick Start Guide ----------------- -You can try out NeMo's ASR, NLP and TTS functionality with the example below, which is based on the `Audio Translation `_ tutorial. +You can try out NeMo's ASR, LLM and TTS functionality with the example below, which is based on the `Audio Translation `_ tutorial. Once you have :ref:`installed NeMo `, then you can run the code below: From 2f0b59e2bc6b10a77c706d4421bef95dbd22c36b Mon Sep 17 00:00:00 2001 From: eharper Date: Fri, 16 Feb 2024 18:36:26 -0700 Subject: [PATCH 16/28] update Signed-off-by: eharper --- docs/source/nlp/nemo_megatron/intro.rst | 12 +++--------- 1 file changed, 3 insertions(+), 9 deletions(-) diff --git a/docs/source/nlp/nemo_megatron/intro.rst b/docs/source/nlp/nemo_megatron/intro.rst index 80b30a267b18..768e1d544558 100644 --- a/docs/source/nlp/nemo_megatron/intro.rst +++ b/docs/source/nlp/nemo_megatron/intro.rst @@ -1,8 +1,7 @@ -NeMo Megatron -============= +Large Language Models +===================== -Megatron :cite:`nlp-megatron-shoeybi2019megatron` is a large, powerful transformer developed by the Applied Deep Learning Research -team at NVIDIA. NeMo Megatron supports several types of models: +To learn more about using NeMo to train Large Language Models at scale, please refer to the `NeMo Framework User Guide! `_." * GPT-style models (decoder only) * T5/BART/UL2-style models (encoder-decoder) @@ -10,11 +9,6 @@ team at NVIDIA. NeMo Megatron supports several types of models: * RETRO model (decoder only) - -.. note:: - NeMo Megatron has an Enterprise edition which contains tools for data preprocessing, hyperparameter tuning, container, scripts for various clouds and more. With Enterprise edition you also get deployment tools. Apply for `early access here `_ . - - .. 
toctree:: :maxdepth: 1 From b8f6de8a1653bfdf87764cda9d621da62fae01de Mon Sep 17 00:00:00 2001 From: eharper Date: Fri, 16 Feb 2024 18:40:51 -0700 Subject: [PATCH 17/28] update Signed-off-by: eharper --- docs/source/index.rst | 87 ++++++++++++++++++++++--------------------- 1 file changed, 44 insertions(+), 43 deletions(-) diff --git a/docs/source/index.rst b/docs/source/index.rst index 7407886eefc8..a9f6f2c8f40d 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -1,5 +1,5 @@ -NVIDIA NeMo User Guide -====================== +NVIDIA NeMo Framework Developer Docs +==================================== .. toctree:: :maxdepth: 2 @@ -12,17 +12,28 @@ NVIDIA NeMo User Guide starthere/migration-guide .. toctree:: - :maxdepth: 2 - :caption: NeMo Core - :name: core + :maxdepth: 3 + :caption: Multimodal (MM) + :name: Multimodal - core/core - core/exp_manager - core/neural_types - core/export - core/adapters/intro - core/api + multimodal/mllm/intro + multimodal/vlm/intro + multimodal/text2img/intro + multimodal/nerf/intro + multimodal/api + + +.. toctree:: + :maxdepth: 3 + :caption: Large Language Models + :name: Large Language Models + nlp/nemo_megatron/intro + nlp/machine_translation/machine_translation + nlp/text_normalization/intro + nlp/api + nlp/megatron_onnx_export + nlp/models .. toctree:: :maxdepth: 2 @@ -36,19 +47,6 @@ NVIDIA NeMo User Guide asr/ssl/intro asr/speech_intent_slot/intro -.. toctree:: - :maxdepth: 3 - :caption: Natural Language Processing - :name: Natural Language Processing - - nlp/nemo_megatron/intro - nlp/machine_translation/machine_translation - nlp/text_normalization/intro - nlp/api - nlp/megatron_onnx_export - nlp/models - - .. toctree:: :maxdepth: 1 :caption: Text To Speech (TTS) @@ -56,6 +54,26 @@ NVIDIA NeMo User Guide tts/intro +.. toctree:: + :maxdepth: 2 + :caption: Vision + :name: vision + + vision/intro + + +.. 
toctree:: + :maxdepth: 2 + :caption: NeMo Core + :name: core + + core/core + core/exp_manager + core/neural_types + core/export + core/adapters/intro + core/api + .. toctree:: :maxdepth: 2 :caption: Common @@ -71,27 +89,10 @@ NVIDIA NeMo User Guide text_processing/g2p/g2p common/intro -.. toctree:: - :maxdepth: 3 - :caption: Multimodal (MM) - :name: Multimodal - - multimodal/mllm/intro - multimodal/vlm/intro - multimodal/text2img/intro - multimodal/nerf/intro - multimodal/api - -.. toctree:: - :maxdepth: 2 - :caption: Vision - :name: vision - - vision/intro .. toctree:: :maxdepth: 3 - :caption: Tools - :name: Tools + :caption: Speech Tools + :name: Speech Tools tools/intro From f53167998e1467c2bc3b769819905c191fc2086d Mon Sep 17 00:00:00 2001 From: eharper Date: Fri, 16 Feb 2024 18:46:20 -0700 Subject: [PATCH 18/28] update Signed-off-by: eharper --- docs/source/nlp/nemo_megatron/intro.rst | 2 +- docs/source/starthere/intro.rst | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/source/nlp/nemo_megatron/intro.rst b/docs/source/nlp/nemo_megatron/intro.rst index 768e1d544558..faf315a40c04 100644 --- a/docs/source/nlp/nemo_megatron/intro.rst +++ b/docs/source/nlp/nemo_megatron/intro.rst @@ -1,7 +1,7 @@ Large Language Models ===================== -To learn more about using NeMo to train Large Language Models at scale, please refer to the `NeMo Framework User Guide! `_." +To learn more about using NeMo to train Large Language Models at scale, please refer to the `NeMo Framework User Guide! `_. * GPT-style models (decoder only) * T5/BART/UL2-style models (encoder-decoder) diff --git a/docs/source/starthere/intro.rst b/docs/source/starthere/intro.rst index 699fbd44faf1..f26adf86df6b 100644 --- a/docs/source/starthere/intro.rst +++ b/docs/source/starthere/intro.rst @@ -9,7 +9,7 @@ Introduction .. _dummy_header: NVIDIA NeMo Framework is an end-to-end, cloud-native framework to build, customize, and deploy generative AI models anywhere. 
-To learn more about using NeMo in generative AI workflows, please refer to the `NeMo Framework User Guide! `_." +To learn more about using NeMo in generative AI workflows, please refer to the `NeMo Framework User Guide! `_. `NVIDIA NeMo Framework `_ has separate collections for Large Language Models (LLMs), Multimodal, Computer Vision, Automatic Speech Recognition (ASR), From b96ba9fb799a903867f4a2f0639b292147e78641 Mon Sep 17 00:00:00 2001 From: eharper Date: Fri, 16 Feb 2024 18:51:10 -0700 Subject: [PATCH 19/28] update Signed-off-by: eharper --- docs/source/starthere/intro.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/starthere/intro.rst b/docs/source/starthere/intro.rst index f26adf86df6b..88d7b765b233 100644 --- a/docs/source/starthere/intro.rst +++ b/docs/source/starthere/intro.rst @@ -9,7 +9,7 @@ Introduction .. _dummy_header: NVIDIA NeMo Framework is an end-to-end, cloud-native framework to build, customize, and deploy generative AI models anywhere. -To learn more about using NeMo in generative AI workflows, please refer to the `NeMo Framework User Guide! `_. +To learn more about using NeMo in generative AI workflows, please refer to the `NeMo Framework User Guide! 
`_ `NVIDIA NeMo Framework `_ has separate collections for Large Language Models (LLMs), Multimodal, Computer Vision, Automatic Speech Recognition (ASR), From 385b2729c1642e191c988d9d33538de220b60dbd Mon Sep 17 00:00:00 2001 From: eharper Date: Fri, 16 Feb 2024 18:52:18 -0700 Subject: [PATCH 20/28] update Signed-off-by: eharper --- docs/source/nlp/api.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/source/nlp/api.rst b/docs/source/nlp/api.rst index 33709bd05a19..b9b4d529ba46 100755 --- a/docs/source/nlp/api.rst +++ b/docs/source/nlp/api.rst @@ -1,5 +1,5 @@ -NeMo Megatron API -======================= +Large Language Model API +======================== Pretraining Model Classes ------------------------- From 5d881f8755eb4fdaac1a753bbc653e8ca9740838 Mon Sep 17 00:00:00 2001 From: eharper Date: Fri, 16 Feb 2024 18:53:04 -0700 Subject: [PATCH 21/28] update Signed-off-by: eharper --- README.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.rst b/README.rst index f698ab944d52..02b16d3b13a9 100644 --- a/README.rst +++ b/README.rst @@ -58,7 +58,7 @@ Introduction ------------ NVIDIA NeMo Framework is a generative AI framework built for researchers and pytorch developers -working on large language models (LLMs), multimodal models, automatic speech recognition (ASR), +working on large language models (LLMs), multimodal models (MM), automatic speech recognition (ASR), and text-to-speech synthesis (TTS). The primary objective of NeMo is to provide a scalable framework for researchers and developers from industry and academia to more easily implement and design new generative AI models by being able to leverage existing code and pretrained models.
From 630c53155dd5929e06528ed54178e346fce3a785 Mon Sep 17 00:00:00 2001 From: eharper Date: Fri, 16 Feb 2024 18:57:17 -0700 Subject: [PATCH 22/28] update Signed-off-by: eharper --- docs/source/starthere/intro.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/starthere/intro.rst b/docs/source/starthere/intro.rst index 88d7b765b233..185350bad3ab 100644 --- a/docs/source/starthere/intro.rst +++ b/docs/source/starthere/intro.rst @@ -12,7 +12,7 @@ NVIDIA NeMo Framework is an end-to-end, cloud-native framework to build, customi To learn more about using NeMo in generative AI workflows, please refer to the `NeMo Framework User Guide! `_ `NVIDIA NeMo Framework `_ has separate collections for Large Language Models (LLMs), -Multimodal, Computer Vision, Automatic Speech Recognition (ASR), +Multimodal (MM), Computer Vision (CV), Automatic Speech Recognition (ASR), and Text-to-Speech (TTS) models. Each collection consists of prebuilt modules that include everything needed to train on your data. 
Every module can easily be customized, extended, and composed to create new generative AI From 20ea43fbff13e69335336f671ed67c7575b95141 Mon Sep 17 00:00:00 2001 From: eharper Date: Fri, 16 Feb 2024 18:59:20 -0700 Subject: [PATCH 23/28] update Signed-off-by: eharper --- docs/source/multimodal/api.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/multimodal/api.rst b/docs/source/multimodal/api.rst index 63ce477273b3..d6f96e6c6ea4 100644 --- a/docs/source/multimodal/api.rst +++ b/docs/source/multimodal/api.rst @@ -1,4 +1,4 @@ -NeMo Megatron API +Multimodal API ======================= Model Classes From 528111dafc94c7a50110e3374e0d617be6e1fb77 Mon Sep 17 00:00:00 2001 From: eharper Date: Fri, 16 Feb 2024 18:59:52 -0700 Subject: [PATCH 24/28] update Signed-off-by: eharper --- docs/source/index.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/index.rst b/docs/source/index.rst index a9f6f2c8f40d..7cde15534f1a 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -25,7 +25,7 @@ NVIDIA NeMo Framework Developer Docs .. 
toctree:: :maxdepth: 3 - :caption: Large Language Models + :caption: Large Language Models (LLMs) :name: Large Language Models nlp/nemo_megatron/intro From 7762752667e5158e6c1c5a805946c60e6df82e3a Mon Sep 17 00:00:00 2001 From: eharper Date: Fri, 16 Feb 2024 19:01:31 -0700 Subject: [PATCH 25/28] update Signed-off-by: eharper --- docs/source/index.rst | 1 - 1 file changed, 1 deletion(-) diff --git a/docs/source/index.rst b/docs/source/index.rst index 7cde15534f1a..0b62b78db814 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -30,7 +30,6 @@ NVIDIA NeMo Framework Developer Docs nlp/nemo_megatron/intro nlp/machine_translation/machine_translation - nlp/text_normalization/intro nlp/api nlp/megatron_onnx_export nlp/models From 247251e9cd9b303181e1017a8cda77c6dc51ad63 Mon Sep 17 00:00:00 2001 From: eharper Date: Fri, 16 Feb 2024 19:02:06 -0700 Subject: [PATCH 26/28] update Signed-off-by: eharper --- docs/source/index.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/source/index.rst b/docs/source/index.rst index 0b62b78db814..9d66d693000e 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -29,10 +29,10 @@ NVIDIA NeMo Framework Developer Docs :name: Large Language Models nlp/nemo_megatron/intro + nlp/models nlp/machine_translation/machine_translation - nlp/api nlp/megatron_onnx_export - nlp/models + nlp/api .. 
toctree:: :maxdepth: 2 From 5139ad2e1fa0a9538f3694e3108bca88693ab74c Mon Sep 17 00:00:00 2001 From: ntajbakhsh Date: Fri, 16 Feb 2024 18:22:26 -0800 Subject: [PATCH 27/28] typo fixed --- README.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.rst b/README.rst index 02b16d3b13a9..3135bdbfabdd 100644 --- a/README.rst +++ b/README.rst @@ -72,7 +72,7 @@ When applicable, NeMo models take advantage of the latest possible distributed t including parallelism strategies such as * data parallelism -* tensor paralellsim +* tensor parallelism * pipeline model parallelism * fully sharded data parallelism (FSDP) * sequence parallelism From efab0ebb4aff50dfb8a51c202d574ae70089dbd9 Mon Sep 17 00:00:00 2001 From: eharper Date: Fri, 16 Feb 2024 19:22:43 -0700 Subject: [PATCH 28/28] update Signed-off-by: eharper --- nemo/collections/asr/README.md | 30 +++++++++++++-------------- nemo/collections/multimodal/README.md | 2 +- nemo/collections/nlp/README.md | 2 +- nemo/collections/tts/README.md | 4 ++-- nemo/collections/vision/README.md | 4 +++- 5 files changed, 22 insertions(+), 20 deletions(-) diff --git a/nemo/collections/asr/README.md b/nemo/collections/asr/README.md index 691c9df2bb35..9a1b947f2d18 100644 --- a/nemo/collections/asr/README.md +++ b/nemo/collections/asr/README.md @@ -2,10 +2,10 @@ ## Key Features -* `HuggingFace Space for Audio Transcription (File, Microphone and YouTube) `_ -* `Pretrained models `_ available in 14+ languages -* `Automatic Speech Recognition (ASR) `_ - * Supported ASR `models `_: +* [HuggingFace Space for Audio Transcription (File, Microphone and YouTube)](https://huggingface.co/spaces/smajumdar/nemo_multilingual_language_id) +* [Pretrained models](https://ngc.nvidia.com/catalog/collections/nvidia:nemo_asr) available in 14+ languages +* [Automatic Speech Recognition (ASR)](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/intro.html) + * Supported ASR 
[models](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/models.html): * Jasper, QuartzNet, CitriNet, ContextNet * Conformer-CTC, Conformer-Transducer, FastConformer-CTC, FastConformer-Transducer * Squeezeformer-CTC and Squeezeformer-Transducer @@ -14,20 +14,20 @@ * CTC * Transducer/RNNT * Hybrid Transducer/CTC - * NeMo Original `Multi-blank Transducers `_ and `Token-and-Duration Transducers (TDT) `_ - * Streaming/Buffered ASR (CTC/Transducer) - `Chunked Inference Examples `_ - * `Cache-aware Streaming Conformer `_ with multiple lookaheads (including microphone streaming `tutorial `_). + * NeMo Original [Multi-blank Transducers](https://arxiv.org/abs/2211.03541) and [Token-and-Duration Transducers (TDT)](https://arxiv.org/abs/2304.06795) + * Streaming/Buffered ASR (CTC/Transducer) - [Chunked Inference Examples](https://github.com/NVIDIA/NeMo/tree/stable/examples/asr/asr_chunked_inference) + * [Cache-aware Streaming Conformer](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/models.html#cache-aware-streaming-conformer) with multiple lookaheads (including microphone streaming [tutorial](https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/Online_ASR_Microphone_Demo_Cache_Aware_Streaming.ipynb). 
* Beam Search decoding - * `Language Modelling for ASR (CTC and RNNT) `_: N-gram LM in fusion with Beam Search decoding, Neural Rescoring with Transformer - * `Support of long audios for Conformer with memory efficient local attention `_ -* `Speech Classification, Speech Command Recognition and Language Identification `_: MatchboxNet (Command Recognition), AmberNet (LangID) -* `Voice activity Detection (VAD) `_: MarbleNet - * ASR with VAD Inference - `Example `_ -* `Speaker Recognition `_: TitaNet, ECAPA_TDNN, SpeakerNet -* `Speaker Diarization `_ + * [Language Modelling for ASR (CTC and RNNT)](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/asr_language_modeling.html): N-gram LM in fusion with Beam Search decoding, Neural Rescoring with Transformer + * [Support of long audios for Conformer with memory efficient local attention](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/results.html#inference-on-long-audio) +* [Speech Classification, Speech Command Recognition and Language Identification](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/speech_classification/intro.html): MatchboxNet (Command Recognition), AmberNet (LangID) +* [Voice activity Detection (VAD)](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/speech_classification/models.html#marblenet-vad): MarbleNet + * ASR with VAD Inference - [Example](https://github.com/NVIDIA/NeMo/tree/stable/examples/asr/asr_vad) +* [Speaker Recognition](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/speaker_recognition/intro.html): TitaNet, ECAPA_TDNN, SpeakerNet +* [Speaker Diarization](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/speaker_diarization/intro.html) * Clustering Diarizer: TitaNet, ECAPA_TDNN, SpeakerNet * Neural Diarizer: MSDD (Multi-scale Diarization Decoder) -* `Speech Intent Detection and Slot Filling `_: Conformer-Transformer +* [Speech Intent Detection and Slot 
Filling](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/speech_intent_slot/intro.html): Conformer-Transformer You can also get a high-level overview of NeMo ASR by watching the talk *NVIDIA NeMo: Toolkit for Conversational AI*, presented at PyData Yerevan 2022: diff --git a/nemo/collections/multimodal/README.md b/nemo/collections/multimodal/README.md index e05fa3d88e11..c160ac89569d 100644 --- a/nemo/collections/multimodal/README.md +++ b/nemo/collections/multimodal/README.md @@ -24,4 +24,4 @@ The NeMo Multimodal Collection supports a diverse range of multimodal models tai - **NSFW Detection Support** -Our documentation provides detailed information on each supported model, facilitating seamless integration and utilization within your projects. +Our [documentation](https://docs.nvidia.com/nemo-framework/user-guide/latest/index.html) offers comprehensive insights into each supported model, facilitating seamless integration and utilization within your projects. diff --git a/nemo/collections/nlp/README.md b/nemo/collections/nlp/README.md index faef438ddb58..fc6644d28293 100644 --- a/nemo/collections/nlp/README.md +++ b/nemo/collections/nlp/README.md @@ -10,4 +10,4 @@ The NeMo NLP/LLM Collection is designed to provide comprehensive support for on- - **Mistral** - **Mixtral** -Our `documentation `_ offers comprehensive insights into each supported model, facilitating seamless integration and utilization within your projects. +Our [documentation](https://docs.nvidia.com/nemo-framework/user-guide/latest/index.html) offers comprehensive insights into each supported model, facilitating seamless integration and utilization within your projects. 
diff --git a/nemo/collections/tts/README.md b/nemo/collections/tts/README.md index 54af33ab31cb..44b2b1b7a25c 100644 --- a/nemo/collections/tts/README.md +++ b/nemo/collections/tts/README.md @@ -1,7 +1,7 @@ # Text-to-Speech Synthesis (TTS): -* `Documentation `_ +* [Documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/tts/intro.html#) * Mel-Spectrogram generators: FastPitch, SSL FastPitch, Mixer-TTS/Mixer-TTS-X, RAD-TTS, Tacotron2 * Vocoders: HiFiGAN, UnivNet, WaveGlow * End-to-End Models: VITS -* `Pre-trained Model Checkpoints in NVIDIA GPU Cloud (NGC) `_ \ No newline at end of file +* [Pre-trained Model Checkpoints in NVIDIA GPU Cloud (NGC)](https://ngc.nvidia.com/catalog/collections/nvidia:nemo_tts) \ No newline at end of file diff --git a/nemo/collections/vision/README.md b/nemo/collections/vision/README.md index 1cfe05a97ab9..057f5b3a4719 100644 --- a/nemo/collections/vision/README.md +++ b/nemo/collections/vision/README.md @@ -1,4 +1,6 @@ NeMo Vision Collection ======================== -The NeMo Vision Collection is designed to support the multimodal collection, particularly for models like LLAVA that necessitate a vision encoder implementation. At present, the vision collection features support for ViT, a customized version of the transformer model from Megatron core. \ No newline at end of file +The NeMo Vision Collection is designed to support the multimodal collection, particularly for models like LLAVA that necessitate a vision encoder implementation. At present, the vision collection features support for ViT, a customized version of the transformer model from Megatron core. + +Our [documentation](https://docs.nvidia.com/nemo-framework/user-guide/latest/index.html) offers comprehensive insights into each supported model, facilitating seamless integration and utilization within your projects. \ No newline at end of file