
Remove tie weights check in DTensor worker #684

@yuki-97

Description

FSDP2 with the Hugging Face tensor parallel plan should be able to handle tie_word_embeddings models with tp_size > 1.

Currently, to train a tie_word_embeddings model with tp_size > 1, we need to set NRL_SKIP_TIED_WEIGHT_CHECK=1 to skip the check; otherwise an assertion is raised.
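The guard behaves roughly like the sketch below. The function name and message are hypothetical; only the env var name NRL_SKIP_TIED_WEIGHT_CHECK comes from this issue.

```python
import os

def check_tied_weights(tie_word_embeddings: bool, tp_size: int) -> None:
    """Hypothetical sketch of the current tie-weights guard in the
    DTensor worker: assert unless the skip env var is set to "1"."""
    if tie_word_embeddings and tp_size > 1:
        if os.environ.get("NRL_SKIP_TIED_WEIGHT_CHECK") != "1":
            raise AssertionError(
                "tie_word_embeddings models are not supported with "
                f"tp_size={tp_size}; set NRL_SKIP_TIED_WEIGHT_CHECK=1 to skip"
            )
```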

We need to validate that the following models work well, then remove the tie weights check and the env var:

  1. Qwen/Qwen2.5-1.5B-Instruct
  2. meta-llama/Llama-3.2-1B
  3. google/gemma-2-2b-it
  4. google/gemma-3-1b-it
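A minimal sketch of what validation could assert for each model above (not the project's actual test): with tied word embeddings, the input embedding and the LM head reference the same underlying weight object, and that identity should survive sharding.

```python
# Stand-in objects illustrate the identity check; in a real test these
# would be the model's embedding weight and lm_head weight after sharding.
def weights_are_tied(embed_weight, lm_head_weight) -> bool:
    # Tied word embeddings mean both modules share one weight object.
    return embed_weight is lm_head_weight

shared = [[0.0] * 4]                       # stand-in for a shared weight tensor
assert weights_are_tied(shared, shared)    # tied model: same object
assert not weights_are_tied(shared, [[0.0] * 4])  # untied: separate copies
```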
