Update nemotron_340b.yaml #450

Merged 1 commit on Dec 19, 2024
18 changes: 13 additions & 5 deletions launcher_scripts/conf/training/nemotron/nemotron_340b.yaml
@@ -58,8 +58,8 @@ model:
   rampup_batch_size: null
   context_parallel_size: 1
   tensor_model_parallel_size: 8
-  pipeline_model_parallel_size: 12
-  virtual_pipeline_model_parallel_size: 8
+  pipeline_model_parallel_size: 8
+  virtual_pipeline_model_parallel_size: 12
   encoder_seq_length: 4096
   max_position_embeddings: 4096
   num_layers: 96
@@ -131,9 +131,17 @@ model:
   fsdp_sharding_strategy: 'full' # Method to shard model states. Available options are 'full', 'hybrid', and 'grad'.
   fsdp_grad_reduce_dtype: 32 # Gradient reduction data type.
   fsdp_sharded_checkpoint: False # Store and load FSDP shared checkpoint.
+
+  defer_embedding_wgrad_compute: True
+  wgrad_deferral_limit: 22
+  cross_entropy_loss_fusion: True
+  enable_vboost: True
+  ub_tp_comm_overlap: True
+  apply_rope_fusion: True
+  deterministic_mode: False
+  overlap_p2p_comm: True # Overlap p2p communication with computes. This argument is valid only when `virtual_pipeline_model_parallel_size` is larger than 1
+  batch_p2p_comm: False # Batch consecutive inter-peer send/recv operations. This argument is valid only when `virtual_pipeline_model_parallel_size` is larger than 1
 
-  overlap_p2p_comm: False # Overlap p2p communication with computes. This argument is valid only when `virtual_pipeline_model_parallel_size` is larger than 1
-  batch_p2p_comm: True # Batch consecutive inter-peer send/recv operations. This argument is valid only when `virtual_pipeline_model_parallel_size` is larger than 1
   num_query_groups: 8 # Number of query groups for group query attention. If None, normal attention is used.
 
   ## Network
@@ -188,4 +196,4 @@ model:
     - .0333
     - ${data_dir}/my-nemotron_00_text_document
     - .0333
-    - ${data_dir}/my-nemotron_00_text_document
+    - ${data_dir}/my-nemotron_00_text_document
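The parallelism swap in this PR (pipeline 12 → 8, virtual pipeline 8 → 12) keeps the product at 96, matching `num_layers: 96`. The sketch below is illustrative only, not NeMo launcher code: it checks the Megatron-style interleaved-scheduling rule that `num_layers` must divide evenly across `pipeline_model_parallel_size * virtual_pipeline_model_parallel_size` model chunks, and the constraint stated in the YAML comments that `overlap_p2p_comm` only takes effect when the virtual pipeline size is larger than 1. The function names are hypothetical.

```python
# Hypothetical sanity checks for the values in this diff; not part of NeMo.

def layers_per_virtual_stage(num_layers: int, pp: int, vpp: int) -> int:
    """Layers assigned to each virtual pipeline stage.

    Interleaved pipeline scheduling splits the model into pp * vpp chunks,
    so num_layers must be divisible by that product.
    """
    chunks = pp * vpp
    if num_layers % chunks != 0:
        raise ValueError(f"{num_layers} layers not divisible into {chunks} chunks")
    return num_layers // chunks

def check_p2p_overlap(vpp: int, overlap_p2p_comm: bool) -> None:
    """Per the YAML comments, overlap_p2p_comm is only valid when vpp > 1."""
    if overlap_p2p_comm and vpp <= 1:
        raise ValueError("overlap_p2p_comm requires virtual_pipeline_model_parallel_size > 1")

# Both the old (PP=12, VPP=8) and new (PP=8, VPP=12) layouts place exactly
# one of the 96 transformer layers in each virtual stage.
assert layers_per_virtual_stage(96, 12, 8) == 1
assert layers_per_virtual_stage(96, 8, 12) == 1
check_p2p_overlap(vpp=12, overlap_p2p_comm=True)  # new config: valid
```

Because both layouts already divide evenly, the PR's swap trades fewer physical pipeline stages (less pipeline bubble per stage) for more virtual stages per GPU, without changing the per-chunk layer count.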