diff --git a/README.md b/README.md
index 8910d2bb46..429caebffd 100644
--- a/README.md
+++ b/README.md
@@ -239,7 +239,7 @@ The following model architectures, tasks and device distributions have been vali
| DETR | | Single card | [object detection](https://github.com/huggingface/optimum-habana/tree/main/examples/object-detection) |
| Mllama | LoRA | :heavy_check_mark: | [image to text](https://github.com/huggingface/optimum-habana/tree/main/examples/image-to-text) |
| MiniCPM3 | | Single card | [text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation) |
-| Baichuan2 | | Single card | [text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation) |
+| Baichuan2 | DeepSpeed | Single card | [language modeling](https://github.com/huggingface/optimum-habana/tree/main/examples/language-modeling)[text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation) |
| DeepSeek-V2 | | :heavy_check_mark: | [text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation) |
| ChatGLM | DeepSpeed | Single card | [language modeling](https://github.com/huggingface/optimum-habana/tree/main/examples/language-modeling)[text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation) |
diff --git a/docs/source/index.mdx b/docs/source/index.mdx
index e61a8ad5a2..51d6dadf0f 100644
--- a/docs/source/index.mdx
+++ b/docs/source/index.mdx
@@ -106,7 +106,7 @@ In the tables below, ✅ means single-card, multi-card and DeepSpeed have all be
| DETR | | Single card | [object detection](https://github.com/huggingface/optimum-habana/tree/main/examples/object-detection) |
| Mllama | LoRA |✅ | [image to text](https://github.com/huggingface/optimum-habana/tree/main/examples/image-to-text) |
| MiniCPM3 | | Single card | [text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation) |
-| Baichuan2 | | Single card | [text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation) |
+| Baichuan2 | DeepSpeed | Single card | [language modeling](https://github.com/huggingface/optimum-habana/tree/main/examples/language-modeling)[text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation) |
| DeepSeek-V2 | | ✅ | [text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation) |
| ChatGLM | DeepSpeed | Single card | [language modeling](https://github.com/huggingface/optimum-habana/tree/main/examples/language-modeling)[text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation) |
diff --git a/examples/language-modeling/README.md b/examples/language-modeling/README.md
index 23cbe5aacf..94ac08bec8 100644
--- a/examples/language-modeling/README.md
+++ b/examples/language-modeling/README.md
@@ -157,6 +157,33 @@ python ../gaudi_spawn.py \
--logging_steps 20
```
+### Multi-card Training with Deepspeed (Baichuan2-13B-Chat)
+```bash
+python ../gaudi_spawn.py \
+ --world_size 8 --use_deepspeed run_clm.py \
+ --config_name baichuan-inc/Baichuan2-13B-Chat \
+ --tokenizer_name baichuan-inc/Baichuan2-13B-Chat \
+ --dataset_name wikitext \
+ --num_train_epochs 30 \
+ --dataset_config_name wikitext-2-raw-v1 \
+ --per_device_train_batch_size 2 \
+ --per_device_eval_batch_size 2 \
+ --do_train \
+ --do_eval \
+ --deepspeed llama2_ds_zero3_config.json \
+ --output_dir /tmp/test-clm \
+ --gaudi_config_name Habana/gpt2 \
+ --use_habana \
+ --use_lazy_mode \
+ --throughput_warmup_steps 3 \
+ --bf16 \
+ --block_size 1024 \
+ --use_cache False \
+ --overwrite_output_dir \
+ --logging_first_step True \
+ --logging_steps 20
+```
+
## Multi-Node Training with Deepspeed (GPT-NeoX)