huggingface · regisss · Nov 26, 2024 · Aug 22, 2024 · Nov 26, 2024 · Nov 26, 2024
@@ -1,6 +1,6 @@
 - sections:
   - local: index
-    title: 🤗 Optimum Habana
+    title: 🤗 Optimum for Intel Gaudi
   - local: installation
     title: Installation
   - local: quickstart
@@ -16,12 +16,14 @@
       title: Run Inference
     - local: tutorials/stable_diffusion
       title: Stable Diffusion
-    - local: tutorials/stable_diffusion_ldm3d
-      title: LDM3D
+    - local: tutorials/tgi
+      title: TGI on Gaudi
     title: Tutorials
   - sections:
     - local: usage_guides/overview
       title: Overview
+    - local: usage_guides/script_adaptation
+      title: Script Adaptation
     - local: usage_guides/pretraining
       title: Pretraining Transformers
     - local: usage_guides/accelerate_training
@@ -32,20 +34,16 @@
       title: How to use DeepSpeed
     - local: usage_guides/multi_node_training
       title: Multi-node Training
+    - local: usage_guides/quantization
+      title: Quantization
     title: How-To Guides
-  - sections:
-    - local: concept_guides/hpu
-      title: What are Habana's Gaudi and HPUs?
-    title: Conceptual Guides
   - sections:
     - local: package_reference/trainer
       title: Gaudi Trainer
     - local: package_reference/gaudi_config
       title: Gaudi Configuration
-    - local: package_reference/stable_diffusion_pipeline
-      title: Gaudi Stable Diffusion Pipeline
     - local: package_reference/distributed_runner
       title: Distributed Runner
     title: Reference
-  title: Optimum Habana
-  isExpanded: false
+  title: Optimum for Intel Gaudi
+  isExpanded: false
@@ -15,15 +15,36 @@ limitations under the License.
 -->
 
 
-# Optimum for Intel Gaudi
+# Optimum for Intel® Gaudi® AI Accelerator
 
-Optimum for Intel Gaudi is the interface between the Transformers and Diffusers libraries and [Intel® Gaudi® AI Accelerators (HPUs)](https://docs.habana.ai/en/latest/index.html).
+Optimum for Intel Gaudi AI Accelerator is the interface between Hugging Face libraries (Transformers, Diffusers, Accelerate, etc.) and [Intel Gaudi AI Accelerators (HPUs)](https://docs.habana.ai/en/latest/index.html).
 It provides a set of tools that enable easy model loading, training and inference on single- and multi-HPU settings for various downstream tasks as shown in the table below.
 
-HPUs offer fast model training and inference as well as a great price-performance ratio.
-Check out [this blog post about BERT pre-training](https://huggingface.co/blog/pretraining-bert) and [this post benchmarking Intel Gaudi 2 with NVIDIA A100 GPUs](https://huggingface.co/blog/habana-gaudi-2-benchmark) for concrete examples.
-If you are not familiar with HPUs, we recommend you take a look at [our conceptual guide](./concept_guides/hpu).
+<div class="mt-10">
+  <div class="w-full flex flex-col space-y-4 md:space-y-0 md:grid md:grid-cols-2 md:gap-y-4 md:gap-x-5">
+    <a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./tutorials/overview"
+      ><div class="w-full text-center bg-gradient-to-br from-blue-400 to-blue-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">Tutorials</div>
+      <p class="text-gray-700">Learn the basics and become familiar with training transformers on HPUs with 🤗 Optimum. Start here if you are using 🤗 Optimum for Intel Gaudi for the first time!</p>
+    </a>
+    <a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./usage_guides/overview"
+      ><div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">How-to guides</div>
+      <p class="text-gray-700">Practical guides to help you achieve a specific goal. Take a look at these guides to learn how to use 🤗 Optimum for Intel Gaudi to solve real-world problems.</p>
+    </a>
+  </div>
+</div>
+
+The Intel Gaudi AI accelerator family currently includes three product generations:
+[Intel Gaudi 1](https://habana.ai/products/gaudi/),
+[Intel Gaudi 2](https://habana.ai/products/gaudi2/), and
+[Intel Gaudi 3](https://habana.ai/products/gaudi3/).
+Each server is equipped with 8 devices, known as Habana Processing Units (HPUs), each providing 128GB of memory on Gaudi 3,
+96GB on Gaudi 2, and 32GB on the first-gen Gaudi. For more details on the underlying hardware architecture, check out the
+[Gaudi Architecture Overview](https://docs.habana.ai/en/latest/Gaudi_Overview/Gaudi_Architecture.html).
+The Optimum for Intel Gaudi library is fully compatible with all three generations of Gaudi accelerators.
 
+For in-depth examples of running workloads on Gaudi, explore the following blog posts:
+- [Benchmarking Intel Gaudi 2 with NVIDIA A100 GPUs](https://huggingface.co/blog/habana-gaudi-2-benchmark)
+- [Accelerating Vision-Language Models: BridgeTower on Habana Gaudi2](https://huggingface.co/blog/bridgetower)
 
 The following model architectures, tasks and device distributions have been validated for Optimum for Intel Gaudi:
 
@@ -91,7 +112,7 @@ In the tables below, ✅ means single-card, multi-card and DeepSpeed have all be
 | Stable Diffusion XL | <li>[fine-tuning](https://github.com/huggingface/optimum-habana/tree/main/examples/stable-diffusion/training#fine-tuning-for-stable-diffusion-xl)</li> | <div style="text-align:left"><li>Single card</li></div> | <li>[text-to-image generation](https://github.com/huggingface/optimum-habana/tree/main/examples/stable-diffusion)</li> |
 | Stable Diffusion Depth2img | | <li>Single card</li> | <li>[depth-to-image generation](https://github.com/huggingface/optimum-habana/tree/main/examples/stable-diffusion)</li> |
 | LDM3D               |          | <div style="text-align:left"><li>Single card</li></div> | <li>[text-to-image generation](https://github.com/huggingface/optimum-habana/tree/main/examples/stable-diffusion)</li> |
-| Text to Video    |          | <li>Single card</li> | <li>[text-to-video generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-to-video)</li> |
+| Text to Video       |          | <li>Single card</li> | <li>[text-to-video generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-to-video)</li> |
 
 - PyTorch Image Models/TIMM:
 
@@ -109,27 +130,5 @@ In the tables below, ✅ means single-card, multi-card and DeepSpeed have all be
 
 
 Other models and tasks supported by the 🤗 Transformers and 🤗 Diffusers library may also work.
-You can refer to this [section](https://github.com/huggingface/optimum-habana#how-to-use-it) for using them with 🤗 Optimum Habana.
-Besides, [this page](https://github.com/huggingface/optimum-habana/tree/main/examples) explains how to modify any [example](https://github.com/huggingface/transformers/tree/main/examples/pytorch) from the 🤗 Transformers library to make it work with 🤗 Optimum Habana.
-
-
-<div class="mt-10">
-  <div class="w-full flex flex-col space-y-4 md:space-y-0 md:grid md:grid-cols-2 md:gap-y-4 md:gap-x-5">
-    <a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./tutorials/overview"
-      ><div class="w-full text-center bg-gradient-to-br from-blue-400 to-blue-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">Tutorials</div>
-      <p class="text-gray-700">Learn the basics and become familiar with training transformers on HPUs with 🤗 Optimum. Start here if you are using 🤗 Optimum Habana for the first time!</p>
-    </a>
-    <a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./usage_guides/overview"
-      ><div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">How-to guides</div>
-      <p class="text-gray-700">Practical guides to help you achieve a specific goal. Take a look at these guides to learn how to use 🤗 Optimum Habana to solve real-world problems.</p>
-    </a>
-    <a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./concept_guides/hpu"
-      ><div class="w-full text-center bg-gradient-to-br from-pink-400 to-pink-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">Conceptual guides</div>
-      <p class="text-gray-700">High-level explanations for building a better understanding of important topics such as HPUs.</p>
-   </a>
-    <a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./package_reference/trainer"
-      ><div class="w-full text-center bg-gradient-to-br from-purple-400 to-purple-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">Reference</div>
-      <p class="text-gray-700">Technical descriptions of how the Habana classes and methods of 🤗 Optimum Habana work.</p>
-    </a>
-  </div>
-</div>
+You can refer to this [section](https://github.com/huggingface/optimum-habana#how-to-use-it) for using them with 🤗 Optimum for Intel Gaudi.
+In addition, [this page](https://github.com/huggingface/optimum-habana/tree/main/examples) explains how to modify any [example](https://github.com/huggingface/transformers/tree/main/examples/pytorch) from the 🤗 Transformers library to make it work with 🤗 Optimum for Intel Gaudi.
@@ -12,17 +12,22 @@ specific language governing permissions and limitations under the License.
 
 # Installation
 
-To install Optimum for Intel Gaudi, you first need to install SynapseAI and the Intel® Gaudi® drivers by following the official [installation guide](https://docs.habana.ai/en/latest/Installation_Guide/index.html).
+To install Optimum for Intel® Gaudi® AI accelerator, you first need to install Intel Gaudi Software and the Intel Gaudi
+AI accelerator drivers by following the official [installation guide](https://docs.habana.ai/en/latest/Installation_Guide/index.html).
 Then, Optimum for Intel Gaudi can be installed using `pip` as follows:
 
 ```bash
 python -m pip install --upgrade-strategy eager optimum[habana]
 ```
 
 
-To use DeepSpeed on HPUs, you also need to run the following command:
+To use Microsoft® DeepSpeed with Intel Gaudi devices, you also need to run the following command:
 
 ```bash
 python -m pip install git+https://github.com/HabanaAI/DeepSpeed.git@1.18.0
 ```
 
+To make sure the DeepSpeed version matches your Intel Gaudi Software, run the `hl-smi` command to check the software version
+installed on the system, and use that same version in the DeepSpeed installation command above. Also review the Intel Gaudi
+[Support Matrix](https://docs.habana.ai/en/latest/Support_Matrix/Support_Matrix.html) to confirm that you are using a supported
+version of DeepSpeed.
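
To illustrate the version-matching step added in this hunk, here is a minimal Python sketch that builds the DeepSpeed requirement string from a software version. `deepspeed_requirement` is a hypothetical helper, not part of Optimum for Intel Gaudi. It assumes the fork is tagged with plain `MAJOR.MINOR.PATCH` versions, as in the `@1.18.0` example, and it does not parse `hl-smi` output: the version is read manually and passed in.

```python
import re

# Repository URL taken from the installation command in this section.
DEEPSPEED_REPO = "git+https://github.com/HabanaAI/DeepSpeed.git"


def deepspeed_requirement(gaudi_software_version: str) -> str:
    """Build the pip requirement for the DeepSpeed fork matching a software version.

    Hypothetical helper: it assumes the fork uses plain MAJOR.MINOR.PATCH
    tags, as in the `@1.18.0` example.
    """
    version = gaudi_software_version.strip()
    if not re.fullmatch(r"\d+\.\d+\.\d+", version):
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    return f"{DEEPSPEED_REPO}@{version}"


if __name__ == "__main__":
    # Prints the argument to pass to `python -m pip install`.
    print(deepspeed_requirement("1.18.0"))
```

Rejecting anything that is not a plain three-part version keeps a mistyped or partial version from silently producing an install command for a tag that may not exist.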
@@ -16,20 +16,109 @@ limitations under the License.
 
 # GaudiConfig
 
+To define a configuration for a specific workload, you can use the `GaudiConfig` class.
+
 Here is a description of each configuration parameter:
-- `use_fused_adam` enables to decide whether to use the [custom fused implementation of the ADAM optimizer provided by Intel® Gaudi® AI Accelerator](https://docs.habana.ai/en/latest/PyTorch/Model_Optimization_PyTorch/Custom_Ops_PyTorch.html#custom-optimizers).
-- `use_fused_clip_norm` enables to decide whether to use the [custom fused implementation of gradient norm clipping provided by Intel® Gaudi® AI Accelerator](https://docs.habana.ai/en/latest/PyTorch/Model_Optimization_PyTorch/Custom_Ops_PyTorch.html#other-custom-ops).
-- `use_torch_autocast` enables PyTorch autocast; used to define good pre-defined config; users should favor `--bf16` training argument
-- `autocast_bf16_ops` list of operations that should be run with bf16 precision under autocast context; using environment flag PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST is a preffered way for operator autocast list override
-- `autocast_fp32_ops` list of operations that should be run with fp32 precision under autocast context; using environment flag PT_HPU_AUTOCAST_FP32_OPS_LIST is a preffered way for operator autocast list override
+- `use_fused_adam` controls whether to use the [custom fused implementation of the ADAM optimizer provided by Intel® Gaudi® AI Accelerator](https://docs.habana.ai/en/latest/PyTorch/Model_Optimization_PyTorch/Custom_Ops_PyTorch.html#custom-optimizers).
+- `use_fused_clip_norm` controls whether to use the [custom fused implementation of gradient norm clipping provided by Intel® Gaudi® AI Accelerator](https://docs.habana.ai/en/latest/PyTorch/Model_Optimization_PyTorch/Custom_Ops_PyTorch.html#other-custom-ops).
+- `use_torch_autocast` controls whether to enable PyTorch autocast; it is meant for defining good pre-defined configurations, and users should favor the `--bf16` training argument.
+- `use_dynamic_shapes` controls whether to enable dynamic shapes support when processing the input dataset.
+- `autocast_bf16_ops` is the list of operations that should run with bf16 precision under the autocast context; setting the environment flag LOWER_LIST is the preferred way to override this operator list.
+- `autocast_fp32_ops` is the list of operations that should run with fp32 precision under the autocast context; setting the environment flag FP32_LIST is the preferred way to override this operator list.
 
+Parameter values of this class can be set from an external JSON file.
 
-You can find examples of Gaudi configurations in the [Habana model repository on the Hugging Face Hub](https://huggingface.co/habana). For instance, [for BERT Large we have](https://huggingface.co/Habana/bert-large-uncased-whole-word-masking/blob/main/gaudi_config.json):
+You can find examples of Gaudi configurations in the [Intel Gaudi model repository on the Hugging Face Hub](https://huggingface.co/habana).
+For instance, [for BERT Large we have](https://huggingface.co/Habana/bert-large-uncased-whole-word-masking/blob/main/gaudi_config.json):
+```JSON
+{
+  "use_fused_adam": true,
+  "use_fused_clip_norm": true,
+  "use_torch_autocast": true
+}
+```
 
+A more advanced configuration file is available [for Stable Diffusion 2](https://huggingface.co/Habana/stable-diffusion-2/blob/main/gaudi_config.json):
 ```JSON
 {
+  "use_torch_autocast": true,
   "use_fused_adam": true,
   "use_fused_clip_norm": true,
+  "autocast_bf16_ops": [
+    "_convolution.deprecated",
+    "_convolution",
+    "conv1d",
+    "conv2d",
+    "conv3d",
+    "conv_tbc",
+    "conv_transpose1d",
+    "conv_transpose2d.input",
+    "conv_transpose3d.input",
+    "convolution",
+    "prelu",
+    "addmm",
+    "addmv",
+    "addr",
+    "matmul",
+    "einsum",
+    "mm",
+    "mv",
+    "silu",
+    "linear",
+    "addbmm",
+    "baddbmm",
+    "bmm",
+    "chain_matmul",
+    "linalg_multi_dot",
+    "layer_norm",
+    "group_norm"
+  ],
+  "autocast_fp32_ops": [
+    "acos",
+    "asin",
+    "cosh",
+    "erfinv",
+    "exp",
+    "expm1",
+    "log",
+    "log10",
+    "log2",
+    "log1p",
+    "reciprocal",
+    "rsqrt",
+    "sinh",
+    "tan",
+    "pow.Tensor_Scalar",
+    "pow.Tensor_Tensor",
+    "pow.Scalar",
+    "softplus",
+    "frobenius_norm",
+    "frobenius_norm.dim",
+    "nuclear_norm",
+    "nuclear_norm.dim",
+    "cosine_similarity",
+    "poisson_nll_loss",
+    "cosine_embedding_loss",
+    "nll_loss",
+    "nll_loss2d",
+    "hinge_embedding_loss",
+    "kl_div",
+    "l1_loss",
+    "smooth_l1_loss",
+    "huber_loss",
+    "mse_loss",
+    "margin_ranking_loss",
+    "multilabel_margin_loss",
+    "soft_margin_loss",
+    "triplet_margin_loss",
+    "multi_margin_loss",
+    "binary_cross_entropy_with_logits",
+    "dist",
+    "pdist",
+    "cdist",
+    "renorm",
+    "logsumexp"
+  ]
 }
 ```
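
The JSON files above can be sanity-checked before use. The following is a minimal Python sketch using only the standard library. `check_gaudi_config` is a hypothetical helper, not the library's loader: it knows only the six parameters described in this section, and the real `GaudiConfig` class may accept other keys or apply different rules. The check that an operator does not appear in both autocast lists is an assumption made for illustration.

```python
import json

# The six parameters described in the GaudiConfig section.
BOOLEAN_KEYS = {
    "use_fused_adam",
    "use_fused_clip_norm",
    "use_torch_autocast",
    "use_dynamic_shapes",
}
LIST_KEYS = {"autocast_bf16_ops", "autocast_fp32_ops"}


def check_gaudi_config(text: str) -> dict:
    """Parse a gaudi_config.json document and run basic sanity checks.

    Hypothetical helper for illustration only.
    """
    config = json.loads(text)
    if not isinstance(config, dict):
        raise ValueError("the configuration must be a JSON object")
    unknown = set(config) - BOOLEAN_KEYS - LIST_KEYS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    for key in BOOLEAN_KEYS & set(config):
        if not isinstance(config[key], bool):
            raise ValueError(f"{key} must be true or false")
    for key in LIST_KEYS & set(config):
        ops = config[key]
        if not isinstance(ops, list) or not all(isinstance(op, str) for op in ops):
            raise ValueError(f"{key} must be a list of operator names")
    # Assumption: an operator listed for both precisions is treated as an error.
    bf16_ops = set(config.get("autocast_bf16_ops", []))
    fp32_ops = set(config.get("autocast_fp32_ops", []))
    overlap = bf16_ops & fp32_ops
    if overlap:
        raise ValueError(f"operators in both autocast lists: {sorted(overlap)}")
    return config


if __name__ == "__main__":
    # The BERT Large configuration shown in this section.
    bert_large = (
        '{"use_fused_adam": true, "use_fused_clip_norm": true, '
        '"use_torch_autocast": true}'
    )
    print(check_gaudi_config(bert_large))
```

A check like this catches misspelled parameter names early, which matters because a JSON file with a misspelled key is still valid JSON.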