20 changes: 9 additions & 11 deletions docs/source/_toctree.yml
@@ -1,6 +1,6 @@
- sections:
- local: index
title: 🤗 Optimum Habana
title: 🤗 Optimum for Intel Gaudi
- local: installation
title: Installation
- local: quickstart
@@ -16,12 +16,14 @@
title: Run Inference
- local: tutorials/stable_diffusion
title: Stable Diffusion
- local: tutorials/stable_diffusion_ldm3d
title: LDM3D
- local: tutorials/tgi
title: TGI on Gaudi
title: Tutorials
- sections:
- local: usage_guides/overview
title: Overview
- local: usage_guides/script_adaptation
title: Script Adaptation
- local: usage_guides/pretraining
title: Pretraining Transformers
- local: usage_guides/accelerate_training
@@ -32,20 +34,16 @@
title: How to use DeepSpeed
- local: usage_guides/multi_node_training
title: Multi-node Training
- local: usage_guides/quantization
title: Quantization
title: How-To Guides
- sections:
- local: concept_guides/hpu
title: What are Habana's Gaudi and HPUs?
title: Conceptual Guides
- sections:
- local: package_reference/trainer
title: Gaudi Trainer
- local: package_reference/gaudi_config
title: Gaudi Configuration
- local: package_reference/stable_diffusion_pipeline
title: Gaudi Stable Diffusion Pipeline
- local: package_reference/distributed_runner
title: Distributed Runner
title: Reference
title: Optimum Habana
isExpanded: false
title: Optimum for Intel Gaudi
isExpanded: false
49 changes: 0 additions & 49 deletions docs/source/concept_guides/hpu.mdx

This file was deleted.

59 changes: 29 additions & 30 deletions docs/source/index.mdx
@@ -15,15 +15,36 @@ limitations under the License.
-->


# Optimum for Intel Gaudi
# Optimum for Intel® Gaudi® AI Accelerator

Optimum for Intel Gaudi is the interface between the Transformers and Diffusers libraries and [Intel® Gaudi® AI Accelerators (HPUs)](https://docs.habana.ai/en/latest/index.html).
Optimum for Intel Gaudi is the interface between Hugging Face libraries (Transformers, Diffusers, Accelerate, etc.) and [Intel Gaudi AI Accelerators (HPUs)](https://docs.habana.ai/en/latest/index.html).
It provides a set of tools that enable easy model loading, training and inference in single- and multi-HPU settings for various downstream tasks, as shown in the table below.

HPUs offer fast model training and inference as well as a great price-performance ratio.
Check out [this blog post about BERT pre-training](https://huggingface.co/blog/pretraining-bert) and [this post benchmarking Intel Gaudi 2 with NVIDIA A100 GPUs](https://huggingface.co/blog/habana-gaudi-2-benchmark) for concrete examples.
If you are not familiar with HPUs, we recommend you take a look at [our conceptual guide](./concept_guides/hpu).
<div class="mt-10">
<div class="w-full flex flex-col space-y-4 md:space-y-0 md:grid md:grid-cols-2 md:gap-y-4 md:gap-x-5">
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./tutorials/overview"
><div class="w-full text-center bg-gradient-to-br from-blue-400 to-blue-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">Tutorials</div>
<p class="text-gray-700">Learn the basics and become familiar with training transformers on HPUs with 🤗 Optimum. Start here if you are using 🤗 Optimum for Intel Gaudi for the first time!</p>
</a>
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./usage_guides/overview"
><div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">How-to guides</div>
<p class="text-gray-700">Practical guides to help you achieve a specific goal. Take a look at these guides to learn how to use 🤗 Optimum for Intel Gaudi to solve real-world problems.</p>
</a>
</div>
</div>

The Intel Gaudi AI accelerator family currently includes three product generations:
[Intel Gaudi 1](https://habana.ai/products/gaudi/),
[Intel Gaudi 2](https://habana.ai/products/gaudi2/), and
[Intel Gaudi 3](https://habana.ai/products/gaudi3/).
Each server is equipped with 8 devices, known as Habana Processing Units (HPUs). Each device provides 128GB of memory on Gaudi 3,
96GB on Gaudi 2, and 32GB on the first-gen Gaudi. For more details on the underlying hardware architecture, check out the
[Gaudi Architecture Overview](https://docs.habana.ai/en/latest/Gaudi_Overview/Gaudi_Architecture.html).
The Optimum for Intel Gaudi library is fully compatible with all three generations of Gaudi accelerators.

For in-depth examples of running workloads on Gaudi, explore the following blog posts:
- [Benchmarking Intel Gaudi 2 with NVIDIA A100 GPUs](https://huggingface.co/blog/habana-gaudi-2-benchmark)
- [Accelerating Vision-Language Models: BridgeTower on Habana Gaudi2](https://huggingface.co/blog/bridgetower)

The following model architectures, tasks and device distributions have been validated for Optimum for Intel Gaudi:

@@ -91,7 +112,7 @@ In the tables below, ✅ means single-card, multi-card and DeepSpeed have all be
| Stable Diffusion XL | <li>[fine-tuning](https://github.com/huggingface/optimum-habana/tree/main/examples/stable-diffusion/training#fine-tuning-for-stable-diffusion-xl)</li> | <div style="text-align:left"><li>Single card</li></div> | <li>[text-to-image generation](https://github.com/huggingface/optimum-habana/tree/main/examples/stable-diffusion)</li> |
| Stable Diffusion Depth2img | | <li>Single card</li> | <li>[depth-to-image generation](https://github.com/huggingface/optimum-habana/tree/main/examples/stable-diffusion)</li> |
| LDM3D | | <div style="text-align:left"><li>Single card</li></div> | <li>[text-to-image generation](https://github.com/huggingface/optimum-habana/tree/main/examples/stable-diffusion)</li> |
| Text to Video | | <li>Single card</li> | <li>[text-to-video generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-to-video)</li> |
| Text to Video | | <li>Single card</li> | <li>[text-to-video generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-to-video)</li> |

- PyTorch Image Models/TIMM:

@@ -109,27 +130,5 @@ In the tables below, ✅ means single-card, multi-card and DeepSpeed have all be


Other models and tasks supported by the 🤗 Transformers and 🤗 Diffusers libraries may also work.
You can refer to this [section](https://github.com/huggingface/optimum-habana#how-to-use-it) for using them with 🤗 Optimum Habana.
Besides, [this page](https://github.com/huggingface/optimum-habana/tree/main/examples) explains how to modify any [example](https://github.com/huggingface/transformers/tree/main/examples/pytorch) from the 🤗 Transformers library to make it work with 🤗 Optimum Habana.


<div class="mt-10">
<div class="w-full flex flex-col space-y-4 md:space-y-0 md:grid md:grid-cols-2 md:gap-y-4 md:gap-x-5">
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./tutorials/overview"
><div class="w-full text-center bg-gradient-to-br from-blue-400 to-blue-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">Tutorials</div>
<p class="text-gray-700">Learn the basics and become familiar with training transformers on HPUs with 🤗 Optimum. Start here if you are using 🤗 Optimum Habana for the first time!</p>
</a>
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./usage_guides/overview"
><div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">How-to guides</div>
<p class="text-gray-700">Practical guides to help you achieve a specific goal. Take a look at these guides to learn how to use 🤗 Optimum Habana to solve real-world problems.</p>
</a>
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./concept_guides/hpu"
><div class="w-full text-center bg-gradient-to-br from-pink-400 to-pink-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">Conceptual guides</div>
<p class="text-gray-700">High-level explanations for building a better understanding of important topics such as HPUs.</p>
</a>
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./package_reference/trainer"
><div class="w-full text-center bg-gradient-to-br from-purple-400 to-purple-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">Reference</div>
<p class="text-gray-700">Technical descriptions of how the Habana classes and methods of 🤗 Optimum Habana work.</p>
</a>
</div>
</div>
You can refer to this [section](https://github.com/huggingface/optimum-habana#how-to-use-it) for using them with 🤗 Optimum for Intel Gaudi.
In addition, [this page](https://github.com/huggingface/optimum-habana/tree/main/examples) explains how to modify any [example](https://github.com/huggingface/transformers/tree/main/examples/pytorch) from the 🤗 Transformers library to make it work with 🤗 Optimum for Intel Gaudi.
9 changes: 7 additions & 2 deletions docs/source/installation.mdx
@@ -12,17 +12,22 @@ specific language governing permissions and limitations under the License.

# Installation

To install Optimum for Intel Gaudi, you first need to install SynapseAI and the Intel® Gaudi® drivers by following the official [installation guide](https://docs.habana.ai/en/latest/Installation_Guide/index.html).
To install Optimum for Intel® Gaudi® AI accelerator, you first need to install Intel Gaudi Software and the Intel Gaudi
AI accelerator drivers by following the official [installation guide](https://docs.habana.ai/en/latest/Installation_Guide/index.html).
Then, Optimum for Intel Gaudi can be installed using `pip` as follows:

```bash
python -m pip install --upgrade-strategy eager optimum[habana]
```


To use DeepSpeed on HPUs, you also need to run the following command:
To use Microsoft® DeepSpeed with Intel Gaudi devices, you also need to run the following command:

```bash
python -m pip install git+https://github.com/HabanaAI/DeepSpeed.git@1.18.0
```

To make sure that the DeepSpeed version matches your Intel Gaudi Software, run the `hl-smi` command to check the software version
installed on the system and use the same version in the DeepSpeed installation command. Also review the Intel Gaudi
[Support Matrix](https://docs.habana.ai/en/latest/Support_Matrix/Support_Matrix.html) to confirm that you are using an appropriate
version of DeepSpeed.
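
As a minimal sketch of the version-matching step described above, the helper below builds the DeepSpeed installation command from a software version string. The function name `deepspeed_install_command` is hypothetical, and the assumption that the fork's tags follow the `MAJOR.MINOR.PATCH` pattern is inferred only from the `@1.18.0` example on this page. The sketch does not call `hl-smi`; the version is expected to be read from its output by hand.

```python
import re

# Repository URL taken from the installation command shown on this page.
DEEPSPEED_REPO = "https://github.com/HabanaAI/DeepSpeed.git"


def deepspeed_install_command(software_version):
    """Return the pip command that installs the DeepSpeed fork at the given tag.

    Assumption: tags look like '1.18.0', as in the example on this page.
    """
    if not re.fullmatch(r"\d+\.\d+\.\d+", software_version):
        raise ValueError(
            f"expected a version like '1.18.0', got {software_version!r}"
        )
    return f"python -m pip install git+{DEEPSPEED_REPO}@{software_version}"


# Version as reported by `hl-smi` on the target system (example value).
print(deepspeed_install_command("1.18.0"))
```

Rejecting strings that do not look like a release version avoids accidentally installing from a branch name such as `main`, which may not match the installed Intel Gaudi Software.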
101 changes: 95 additions & 6 deletions docs/source/package_reference/gaudi_config.mdx
@@ -16,20 +16,109 @@ limitations under the License.

# GaudiConfig

To define a configuration for a specific workload, you can use the `GaudiConfig` class.

Here is a description of each configuration parameter:
- `use_fused_adam` enables to decide whether to use the [custom fused implementation of the ADAM optimizer provided by Intel® Gaudi® AI Accelerator](https://docs.habana.ai/en/latest/PyTorch/Model_Optimization_PyTorch/Custom_Ops_PyTorch.html#custom-optimizers).
- `use_fused_clip_norm` enables to decide whether to use the [custom fused implementation of gradient norm clipping provided by Intel® Gaudi® AI Accelerator](https://docs.habana.ai/en/latest/PyTorch/Model_Optimization_PyTorch/Custom_Ops_PyTorch.html#other-custom-ops).
- `use_torch_autocast` enables PyTorch autocast; used to define good pre-defined config; users should favor `--bf16` training argument
- `autocast_bf16_ops` list of operations that should be run with bf16 precision under autocast context; using environment flag PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST is a preferred way for operator autocast list override
- `autocast_fp32_ops` list of operations that should be run with fp32 precision under autocast context; using environment flag PT_HPU_AUTOCAST_FP32_OPS_LIST is a preferred way for operator autocast list override
- `use_fused_adam` controls whether to use the [custom fused implementation of the ADAM optimizer provided by Intel® Gaudi® AI Accelerator](https://docs.habana.ai/en/latest/PyTorch/Model_Optimization_PyTorch/Custom_Ops_PyTorch.html#custom-optimizers).
- `use_fused_clip_norm` controls whether to use the [custom fused implementation of gradient norm clipping provided by Intel® Gaudi® AI Accelerator](https://docs.habana.ai/en/latest/PyTorch/Model_Optimization_PyTorch/Custom_Ops_PyTorch.html#other-custom-ops).
- `use_torch_autocast` controls whether to enable PyTorch autocast. It is meant for defining good predefined configurations; users should favor the `--bf16` training argument.
- `use_dynamic_shapes` controls whether to enable dynamic shapes support when processing the input dataset.
- `autocast_bf16_ops` is the list of operations that should run with bf16 precision under the autocast context. Setting the environment flag LOWER_LIST is the preferred way to override the operator autocast list.
- `autocast_fp32_ops` is the list of operations that should run with fp32 precision under the autocast context. Setting the environment flag FP32_LIST is the preferred way to override the operator autocast list.

Parameter values of this class can be set from an external JSON file.

You can find examples of Gaudi configurations in the [Habana model repository on the Hugging Face Hub](https://huggingface.co/habana). For instance, [for BERT Large we have](https://huggingface.co/Habana/bert-large-uncased-whole-word-masking/blob/main/gaudi_config.json):
You can find examples of Gaudi configurations in the [Intel Gaudi model repository on the Hugging Face Hub](https://huggingface.co/habana).
For instance, [for BERT Large we have](https://huggingface.co/Habana/bert-large-uncased-whole-word-masking/blob/main/gaudi_config.json):
```JSON
{
"use_fused_adam": true,
"use_fused_clip_norm": true,
"use_torch_autocast": true
}
```
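
To illustrate how parameter values can come from an external JSON file, here is a minimal sketch that writes the BERT Large configuration above to a temporary `gaudi_config.json` and reads it back. It uses only the Python standard library and does not import `optimum.habana`. The helper `load_gaudi_config_dict` and the `DOCUMENTED_KEYS` set are hypothetical names introduced for this example, and the set covers only the parameters described on this page, so the real class may accept others.

```python
import json
import tempfile
from pathlib import Path

# Parameter names taken from the list on this page; the real class may accept more.
DOCUMENTED_KEYS = {
    "use_fused_adam",
    "use_fused_clip_norm",
    "use_torch_autocast",
    "use_dynamic_shapes",
    "autocast_bf16_ops",
    "autocast_fp32_ops",
}


def load_gaudi_config_dict(path):
    """Parse a gaudi_config.json file and reject keys not documented on this page."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    unknown = sorted(set(config) - DOCUMENTED_KEYS)
    if unknown:
        raise KeyError(f"undocumented parameters: {unknown}")
    return config


# Same values as the BERT Large example above.
bert_large_config = {
    "use_fused_adam": True,
    "use_fused_clip_norm": True,
    "use_torch_autocast": True,
}

with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = Path(tmp_dir) / "gaudi_config.json"
    config_path.write_text(json.dumps(bert_large_config, indent=2), encoding="utf-8")
    loaded_config = load_gaudi_config_dict(config_path)

print(loaded_config)
```

Checking for unknown keys before handing the values to the library is a design choice of this sketch: it catches misspelled parameter names such as `use_fused_adm` early instead of leaving them silently unused.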

A more advanced configuration file is available [for Stable Diffusion 2](https://huggingface.co/Habana/stable-diffusion-2/blob/main/gaudi_config.json):
```JSON
{
"use_torch_autocast": true,
"use_fused_adam": true,
"use_fused_clip_norm": true,
"autocast_bf16_ops": [
"_convolution.deprecated",
"_convolution",
"conv1d",
"conv2d",
"conv3d",
"conv_tbc",
"conv_transpose1d",
"conv_transpose2d.input",
"conv_transpose3d.input",
"convolution",
"prelu",
"addmm",
"addmv",
"addr",
"matmul",
"einsum",
"mm",
"mv",
"silu",
"linear",
"addbmm",
"baddbmm",
"bmm",
"chain_matmul",
"linalg_multi_dot",
"layer_norm",
"group_norm"
],
"autocast_fp32_ops": [
"acos",
"asin",
"cosh",
"erfinv",
"exp",
"expm1",
"log",
"log10",
"log2",
"log1p",
"reciprocal",
"rsqrt",
"sinh",
"tan",
"pow.Tensor_Scalar",
"pow.Tensor_Tensor",
"pow.Scalar",
"softplus",
"frobenius_norm",
"frobenius_norm.dim",
"nuclear_norm",
"nuclear_norm.dim",
"cosine_similarity",
"poisson_nll_loss",
"cosine_embedding_loss",
"nll_loss",
"nll_loss2d",
"hinge_embedding_loss",
"kl_div",
"l1_loss",
"smooth_l1_loss",
"huber_loss",
"mse_loss",
"margin_ranking_loss",
"multilabel_margin_loss",
"soft_margin_loss",
"triplet_margin_loss",
"multi_margin_loss",
"binary_cross_entropy_with_logits",
"dist",
"pdist",
"cdist",
"renorm",
"logsumexp"
]
}
```
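
With two long operator lists like the ones above, it is easy to place the same operation in both by mistake. The sketch below is a hypothetical pre-check, not part of the library: `conflicting_autocast_ops` is a name introduced here, and whether the library itself rejects overlapping lists is not stated on this page. The example configurations are trimmed to a few operators from the Stable Diffusion 2 file.

```python
def conflicting_autocast_ops(config):
    """Return the sorted operator names that appear in both autocast lists."""
    bf16_ops = set(config.get("autocast_bf16_ops", []))
    fp32_ops = set(config.get("autocast_fp32_ops", []))
    return sorted(bf16_ops & fp32_ops)


# Trimmed version of the configuration above: the two lists do not overlap.
consistent_config = {
    "use_torch_autocast": True,
    "autocast_bf16_ops": ["conv2d", "matmul", "linear"],
    "autocast_fp32_ops": ["exp", "log", "mse_loss"],
}

# Deliberately broken variant: "exp" is requested in both precisions.
ambiguous_config = {
    "use_torch_autocast": True,
    "autocast_bf16_ops": ["matmul", "exp"],
    "autocast_fp32_ops": ["exp", "log"],
}

print(conflicting_autocast_ops(consistent_config))
print(conflicting_autocast_ops(ambiguous_config))
```

An empty result means every listed operation is assigned to exactly one precision; any name returned should be removed from one of the two lists before the file is used.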
