diff --git a/README.md b/README.md
index 8910d2bb46..429caebffd 100644
--- a/README.md
+++ b/README.md
@@ -239,7 +239,7 @@ The following model architectures, tasks and device distributions have been vali
 | DETR | | <li>Single card</li> | <li>[object detection](https://github.com/huggingface/optimum-habana/tree/main/examples/object-detection)</li> |
 | Mllama | <li>LoRA</li> | :heavy_check_mark: | <li>[image to text](https://github.com/huggingface/optimum-habana/tree/main/examples/image-to-text)</li> |
 | MiniCPM3 | | <li>Single card</li> | <li>[text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation)</li> |
-| Baichuan2 | | <li>Single card</li> | <li>[text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation)</li> |
+| Baichuan2 | <li>DeepSpeed</li> | <li>Single card</li> | <li>[language modeling](https://github.com/huggingface/optimum-habana/tree/main/examples/language-modeling)</li><li>[text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation)</li> |
 | DeepSeek-V2 | | :heavy_check_mark: | <li>[text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation)</li> |
 | ChatGLM | <li>DeepSpeed</li> | <li>Single card</li> | <li>[language modeling](https://github.com/huggingface/optimum-habana/tree/main/examples/language-modeling)</li><li>[text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation)</li> |

diff --git a/docs/source/index.mdx b/docs/source/index.mdx
index e61a8ad5a2..51d6dadf0f 100644
--- a/docs/source/index.mdx
+++ b/docs/source/index.mdx
@@ -106,7 +106,7 @@ In the tables below, ✅ means single-card, multi-card and DeepSpeed have all be
 | DETR | | <li>Single card</li> | <li>[object detection](https://github.com/huggingface/optimum-habana/tree/main/examples/object-detection)</li> |
 | Mllama | <li>LoRA</li> |✅ | <li>[image to text](https://github.com/huggingface/optimum-habana/tree/main/examples/image-to-text)</li> |
 | MiniCPM3 | | <li>Single card</li> | <li>[text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation)</li> |
-| Baichuan2 | | <li>Single card</li> | <li>[text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation)</li> |
+| Baichuan2 | <li>DeepSpeed</li> | <li>Single card</li> | <li>[language modeling](https://github.com/huggingface/optimum-habana/tree/main/examples/language-modeling)</li><li>[text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation)</li> |
 | DeepSeek-V2 | | ✅ | <li>[text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation)</li> |
 | ChatGLM | <li>DeepSpeed</li> | <li>Single card</li> | <li>[language modeling](https://github.com/huggingface/optimum-habana/tree/main/examples/language-modeling)</li><li>[text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation)</li> |

diff --git a/examples/language-modeling/README.md b/examples/language-modeling/README.md
index 23cbe5aacf..94ac08bec8 100644
--- a/examples/language-modeling/README.md
+++ b/examples/language-modeling/README.md
@@ -157,6 +157,33 @@ python ../gaudi_spawn.py \
     --logging_steps 20
 ```

+### Multi-card Training with Deepspeed (Baichuan2-13B-Chat)
+```bash
+python ../gaudi_spawn.py \
+    --world_size 8 --use_deepspeed run_clm.py \
+    --config_name baichuan-inc/Baichuan2-13B-Chat \
+    --tokenizer_name baichuan-inc/Baichuan2-13B-Chat \
+    --dataset_name wikitext \
+    --num_train_epochs 30 \
+    --dataset_config_name wikitext-2-raw-v1 \
+    --per_device_train_batch_size 2 \
+    --per_device_eval_batch_size 2 \
+    --do_train \
+    --do_eval \
+    --deepspeed llama2_ds_zero3_config.json \
+    --output_dir /tmp/test-clm \
+    --gaudi_config_name Habana/gpt2 \
+    --use_habana \
+    --use_lazy_mode \
+    --throughput_warmup_steps 3 \
+    --bf16 \
+    --block_size 1024 \
+    --use_cache False \
+    --overwrite_output_dir \
+    --logging_first_step True \
+    --logging_steps 20
+```
+
 ## Multi-Node Training with Deepspeed (GPT-NeoX)