diff --git a/torchtitan/models/deepseek_v3/README.md b/torchtitan/models/deepseek_v3/README.md
index 54aa8f8d28..6698852b47 100644
--- a/torchtitan/models/deepseek_v3/README.md
+++ b/torchtitan/models/deepseek_v3/README.md
@@ -16,6 +16,9 @@ python scripts/download_tokenizer.py --repo_id deepseek-ai/DeepSeek-V3
 python scripts/download_tokenizer.py --repo_id deepseek-ai/deepseek-moe-16b-base
 ```
+> **Note:** We reuse the tokenizer from deepseek-ai/deepseek-moe-16b-base so that users can test and run the 16B model; it is not the official tokenizer for DeepSeek-V3-16B. The DeepSeek-V3 architecture differs from the deepseek-moe models (different attention and MoE router implementations, etc.), so loading deepseek-moe-16b model weights into DeepSeek-V3-16B is not feasible.
+
+
 ## Training

 ### Debug Training