FSDP2 with the Hugging Face tensor parallel plan should be able to handle models with tie_word_embeddings when tp_size > 1.
Currently, training a tie_word_embeddings model with tp_size > 1 requires setting NRL_SKIP_TIED_WEIGHT_CHECK=1 to bypass the check; otherwise an assertion fails.
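As a minimal sketch of the current workaround (assuming NRL_SKIP_TIED_WEIGHT_CHECK is read before the tied-weight check runs), the variable can be exported in the launch environment or set early in Python:

```python
import os

# Assumed workaround: must be set before the training code performs its
# tied-weight check, e.g. before importing the trainer modules.
os.environ["NRL_SKIP_TIED_WEIGHT_CHECK"] = "1"
```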
Need to validate that the following models train correctly (see the sketch after the list), then remove the tied-weights check and the env var.
- Qwen/Qwen2.5-1.5B-Instruct
- meta-llama/Llama-3.2-1B
- google/gemma-2-2b-it
- google/gemma-3-1b-it
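As a starting point for that validation, here is a single-process sketch (it does not exercise tp_size > 1; a full check would run under torchrun with the tensor parallel plan applied) that confirms each model actually ties its input and output embeddings:

```python
import torch
from transformers import AutoModelForCausalLM

MODELS = [
    "Qwen/Qwen2.5-1.5B-Instruct",
    "meta-llama/Llama-3.2-1B",
    "google/gemma-2-2b-it",
    "google/gemma-3-1b-it",
]

for name in MODELS:
    model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16)
    embed = model.get_input_embeddings().weight
    head = model.get_output_embeddings().weight
    # With tie_word_embeddings=True both modules should alias the same storage.
    tied = embed.data_ptr() == head.data_ptr()
    print(
        f"{name}: tie_word_embeddings={model.config.tie_word_embeddings}, "
        f"shared_storage={tied}"
    )
```

A passing validation would then repeat the same storage check after FSDP2 sharding with the TP plan applied, before deleting the assert and the env var.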