diff --git a/torchtitan/models/deepseek_v3/README.md b/torchtitan/models/deepseek_v3/README.md
index 54aa8f8d28..6698852b47 100644
--- a/torchtitan/models/deepseek_v3/README.md
+++ b/torchtitan/models/deepseek_v3/README.md
@@ -16,6 +16,9 @@ python scripts/download_tokenizer.py --repo_id deepseek-ai/DeepSeek-V3
 python scripts/download_tokenizer.py --repo_id deepseek-ai/deepseek-moe-16b-base
 ```
+> **Note:** We reuse the tokenizer from deepseek-ai/deepseek-moe-16b-base so that users can test and run the 16B model; it is not the official tokenizer for DeepSeek-V3-16B. The DeepSeek-V3 architecture differs from the deepseek-moe models (different attention and MoE router implementations, etc.), so loading deepseek-moe-16b model weights into DeepSeek-V3-16B is not feasible.
+
+
 ## Training

 ### Debug Training