Support Falcon Variants (7B/40B/180B) in Mcore NeMo #7666
Merged
Changes from all commits (86 commits)
fe90ac2
support falcon
xuanzic ffaf228
support falcon bug fix layernorm naming
xuanzic 562e6f0
fix todo
xuanzic 8297b5c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 2fc07a4
fix for new architecture
xuanzic 9bafd73
new transformerlayer for falcon
xuanzic 3bf4b54
Merge branch 'vchen/falcon' of https://github.com/xuanzic/NeMo into v…
xuanzic 36fe312
fix for new decoder architecture
xuanzic 044026d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 908004e
add DDP
xuanzic c69d577
fix state dict based on spec system
xuanzic 0610e19
fix state dict based on change in layers, fix amp O2
xuanzic a8684d0
add falcon spec system support
xuanzic 0995272
remove old falcon mcore support
xuanzic f2ad089
refactor conversion script to align with others
xuanzic 47d2f23
add support for falcon-rw model (normal gpt architecture)
xuanzic ed8869a
modify falcon 7b config and remove trust remote code due to HF code c…
xuanzic 59e0f2e
rename falcon implementation dir
xuanzic 03d06bc
change dir name
xuanzic 71b25b8
modify block name
xuanzic 9bb2e32
rename decoder layer
xuanzic d105603
clean up
xuanzic 65fb726
remove debug
xuanzic eaa42ff
Merge remote-tracking branch 'upstream/main' into vchen/falcon
xuanzic c4ad769
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] b9264d4
add proper header
xuanzic 1c1c7dd
Merge branch 'main' into vchen/falcon
xuanzic 3dcbd38
falcon lora mixin to support when non-fused LN linear
HuiyingLi 11fd6cf
Merge remote-tracking branch 'upstream/main' into vchen/falcon
xuanzic 508ba85
Merge pull request #2 from HuiyingLi/vchen/falcon
xuanzic a9df5a4
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] d712925
revise jenkinsfile, tokenizer update in convertion script, add two fa…
xuanzic 7a46e86
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] d4a5fec
refactor falcon to use MCoreGPT+spec+baselayer initial commit
HuiyingLi f186033
modification to get nemo run with mcore in this version
HuiyingLi fbc7c19
Merge pull request #3 from xuanzic/huiyingl/refactor_falcon
xuanzic 0170f31
revise jenkins
xuanzic 1c846b8
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 42ff405
small fix on the output file path
xuanzic 0fea447
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 39b78a9
add nemo to hf conversion script
xuanzic c85f3ac
fix on base layer config and missing state dict due to dist ckpt
xuanzic b0c1bb7
Revert "fix on base layer config and missing state dict due to dist c…
xuanzic ce1bf4a
fix on base layer config and missing state dict due to dist ckpt
xuanzic 8db8992
Merge pull request #4 from xuanzic/vchen/nemo_falcon_convert
xuanzic 2bc8286
fix conflict
xuanzic 6351ae9
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] a6c1fae
fix megatron_gpt_model
xuanzic 1156b6a
modify model config
xuanzic 5034dfc
Apply suggestions from code review
xuanzic c97d38c
fix based on review
xuanzic 4383bd1
multiple revise based on review and latest mcore changes
xuanzic b499c88
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 94f4ba2
fix
xuanzic 8928050
subclass from TransformerLayer
HuiyingLi b523440
fixes according to comments
HuiyingLi 54ae837
Merge branch 'main' into vchen/falcon
ericharper 8ebf142
add falcon ci test
xuanzic 1c1bc51
add post_self_attn_layernorm
HuiyingLi f3931f7
Merge branch 'main' into vchen/falcon
ericharper f84fee6
add explicit explanation/refs for handling lora logic
HuiyingLi b545a8b
Merge pull request #5 from xuanzic/huiyingl/subclass_transformerlayer
xuanzic ea39e68
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] aea1e81
fixes for code scanning
HuiyingLi 812ce43
Merge branch 'main' into vchen/falcon
ericharper b0966c1
remove unused imports
HuiyingLi 0c9a2e3
unit test for falcon model
HuiyingLi 8e8ba66
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] b1e6398
add falcon transformer layer unit test
HuiyingLi 2759755
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] f737a74
Merge branch 'main' into vchen/falcon
ericharper 5ad525c
fixes for code scan
HuiyingLi 9c4960f
remove mcore dependent tests
HuiyingLi fb04806
Revert "remove mcore dependent tests"
HuiyingLi e54fdad
add import guards
HuiyingLi 7cd8cfb
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] e51cfa1
Merge branch 'main' into vchen/falcon
ericharper beada8c
add import guards cont
HuiyingLi 5d76cf3
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] fafa39f
Merge branch 'main' into vchen/falcon
ericharper 27b7694
fixes for ci import tests and unit tests
HuiyingLi e7476e8
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 9028555
fixes for codeql
HuiyingLi 0531cff
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] eb5bf94
Merge branch 'main' into vchen/falcon
ericharper 5f866da
Revert "fixes for codeql"
HuiyingLi
examples/nlp/language_modeling/conf/megatron_falcon_config.yaml (219 additions, 0 deletions)
@@ -0,0 +1,219 @@
name: megatron_falcon_gpt
restore_from_path: null # used when starting from a .nemo file

trainer:
  devices: 1
  num_nodes: 1
  accelerator: gpu
  precision: bf16
  logger: False # logger provided by exp_manager
  enable_checkpointing: False
  use_distributed_sampler: False
  max_epochs: -1 # PTL default. In practice, max_steps will be reached first.
  max_steps: 100000 # consumed_samples = global_step * micro_batch_size * data_parallel_size * accumulate_grad_batches
  log_every_n_steps: 10
  val_check_interval: 100
  limit_val_batches: 50
  limit_test_batches: 500
  accumulate_grad_batches: 1 # do not modify, grad acc is automatic for training megatron models
  gradient_clip_val: 1.0
  benchmark: False
  enable_model_summary: False # default PTL callback for this does not support model parallelism, instead we log manually

exp_manager:
  explicit_log_dir: null
  exp_dir: null
  name: megatron_falcon_gpt
  create_wandb_logger: False
  wandb_logger_kwargs:
    project: null
    name: null
  resume_if_exists: True
  resume_ignore_no_checkpoint: True
  create_checkpoint_callback: True
  checkpoint_callback_params:
    monitor: val_loss
    save_top_k: 10
    mode: min
    always_save_nemo: False # saves nemo file during validation, not implemented for model parallel
    save_nemo_on_train_end: False # not recommended when training large models on clusters with short time limits
    filename: 'megatron_falcon--{val_loss:.2f}-{step}-{consumed_samples}'
    model_parallel_size: ${multiply:${model.tensor_model_parallel_size}, ${model.pipeline_model_parallel_size}}

model:
  mcore_gpt: True
  # specify micro_batch_size, global_batch_size, and model parallelism
  # gradient accumulation will be done automatically based on data_parallel_size
  micro_batch_size: 1 # limited by GPU memory
  global_batch_size: 1 # will use more micro batches to reach global batch size
  tensor_model_parallel_size: 1 # intra-layer model parallelism
  pipeline_model_parallel_size: 1 # inter-layer model parallelism
  virtual_pipeline_model_parallel_size: null # interleaved pipeline

  # model architecture
  encoder_seq_length: 2048
  max_position_embeddings: ${.encoder_seq_length}
  num_layers: 32 # 7b: 32 | 40b: 60 | 180b: 80
  hidden_size: 4544 # 7b: 4544 | 40b: 8192 | 180b: 14848
  ffn_hidden_size: 18176 # Transformer FFN hidden size. Usually 4 * hidden_size. | 7b: 18176 | 40b: 32768 | 180b: 59392
  num_attention_heads: 71 # 7b: 71 | 40b: 128 | 180b: 232
  init_method_std: 0.02 # Standard deviation of the zero mean normal distribution used for weight initialization.
  use_scaled_init_method: True # use scaled residuals initialization
  hidden_dropout: 0.0 # Dropout probability for hidden state transformer.
  attention_dropout: 0.0 # Dropout probability for attention
  ffn_dropout: 0.0 # Dropout probability in the feed-forward layer.
  kv_channels: null # Projection weights dimension in multi-head attention. Set to hidden_size // num_attention_heads if null
  apply_query_key_layer_scaling: True # scale Q * K^T by 1 / layer-number.
  normalization: 'layernorm' # Normalization layer to use. Options are 'layernorm', 'rmsnorm'
  layernorm_epsilon: 1e-5
  do_layer_norm_weight_decay: False # True means weight decay on all params
  make_vocab_size_divisible_by: 128 # Pad the vocab size to be divisible by this value for computation efficiency.
  pre_process: True # add embedding
  post_process: True # add pooler
  persist_layer_norm: True # Use of persistent fused layer norm kernel.
  bias: False # Whether to use bias terms in all weight matrices.
  activation: 'gelu' # Options ['gelu', 'geglu', 'swiglu', 'reglu', 'squared-relu', 'fast-geglu', 'fast-swiglu', 'fast-reglu']
  headscale: False # Whether to learn extra parameters that scale the output of each self-attention head.
  transformer_block_type: 'pre_ln' # Options ['pre_ln', 'post_ln', 'normformer']
  openai_gelu: False # Use OpenAI's GELU instead of the default GeLU
  normalize_attention_scores: True # Whether to scale the output Q * K^T by 1 / sqrt(hidden_size_per_head). This arg is provided as a configuration option mostly for compatibility with models that have been weight-converted from HF. You almost always want to set this to True.
  position_embedding_type: 'rope' # Position embedding type. Options ['learned_absolute', 'rope']
  rotary_percentage: 1.0 # If using position_embedding_type=rope, then the per head dim is multiplied by this.
  attention_type: 'multihead' # Attention type. Options ['multihead']
  share_embeddings_and_output_weights: False # Share embedding and output layer weights.
  overlap_p2p_comm: False # Overlap p2p communication with computes. This argument is valid only when `virtual_pipeline_model_parallel_size` is larger than 1
  batch_p2p_comm: True # Batch consecutive inter-peer send/recv operations. This argument is valid only when `virtual_pipeline_model_parallel_size` is larger than 1
  num_query_groups: 1 # Number of query groups for group query attention. If None, normal attention is used. | 7b: 1 | 40b: 8 | 180b: 8
  gc_interval: 0
  precision: bf16
  mcore_customization_config:
    new_decoder_architecture: false
    parallel_attention: true

  tokenizer:
    library: 'huggingface'
    type: 'tiiuae/falcon-7b'
    use_fast: True

  # Mixed precision
  native_amp_init_scale: 4294967296 # 2 ** 32
  native_amp_growth_interval: 1000
  hysteresis: 2 # Gradient scale hysteresis
  fp32_residual_connection: False # Move residual connections to fp32
  fp16_lm_cross_entropy: False # Move the cross entropy unreduced loss calculation for lm head to fp16

  # Megatron O2-style half-precision
  megatron_amp_O2: False # Enable O2-level automatic mixed precision using main parameters
  grad_allreduce_chunk_size_mb: 125

  # Fusion
  grad_div_ar_fusion: True # Fuse grad division into torch.distributed.all_reduce. Only used with O2 and no pipeline parallelism.
  gradient_accumulation_fusion: False # Fuse weight gradient accumulation to GEMMs. Only used with pipeline parallelism and O2.
  bias_activation_fusion: False # Use a kernel that fuses the bias addition from weight matrices with the subsequent activation function.
  bias_dropout_add_fusion: False # Use a kernel that fuses the bias addition, dropout and residual connection addition.
  masked_softmax_fusion: True # Use a kernel that fuses the attention softmax with its mask.
  get_attention_mask_from_fusion: True # When using fused softmax it will create the attention mask so we won't copy it to the pipeline stages.

  # Miscellaneous
  seed: 1234
  resume_from_checkpoint: null # manually set the checkpoint file to load from
  use_cpu_initialization: False # Init weights on the CPU (slow for large models)
  onnx_safe: False # Use work-arounds for known problems with Torch ONNX exporter.
  apex_transformer_log_level: 30 # Python logging level displays logs with severity greater than or equal to this
  gradient_as_bucket_view: True # PyTorch DDP argument. Allocate gradients in a contiguous bucket to save memory (less fragmentation and buffer memory)
  sync_batch_comm: False # Enable stream synchronization after each p2p communication between pipeline stages

  ## Activation Checkpointing
  # NeMo Megatron supports 'selective' activation checkpointing where only the memory intensive part of attention is checkpointed.
  # These memory intensive activations are also less compute intensive which makes activation checkpointing more efficient for LLMs (20B+).
  # See Reducing Activation Recomputation in Large Transformer Models: https://arxiv.org/abs/2205.05198 for more details.
  # 'full' will checkpoint the entire transformer layer.
  activations_checkpoint_granularity: null # 'selective' or 'full'
  activations_checkpoint_method: null # 'uniform', 'block'
  # 'uniform' divides the total number of transformer layers and checkpoints the input activation
  # of each chunk at the specified granularity. When used with 'selective', 'uniform' checkpoints all attention blocks in the model.
  # 'block' checkpoints the specified number of layers per pipeline stage at the specified granularity
  activations_checkpoint_num_layers: null
  # when using 'uniform' this creates groups of transformer layers to checkpoint. Usually set to 1. Increase to save more memory.
  # when using 'block' this will checkpoint the first activations_checkpoint_num_layers per pipeline stage.
  num_micro_batches_with_partial_activation_checkpoints: null
  # This feature is valid only when used with pipeline-model-parallelism.
  # When an integer value is provided, it sets the number of micro-batches where only a partial number of Transformer layers get checkpointed
  # and recomputed within a window of micro-batches. The rest of the micro-batches in the window checkpoint all Transformer layers. The size of the window is
  # set by the maximum outstanding micro-batch backpropagations, which varies at different pipeline stages. The number of partial layers to checkpoint
  # per micro-batch is set by 'activations_checkpoint_num_layers' with 'activations_checkpoint_method' of 'block'.
  # This feature enables using activation checkpointing at a fraction of micro-batches up to the point of full GPU memory usage.
  activations_checkpoint_layers_per_pipeline: null
  # This feature is valid only when used with pipeline-model-parallelism.
  # When an integer value (rounded down when a float is given) is provided, it sets the number of Transformer layers to skip checkpointing at later
  # pipeline stages. For example, 'activations_checkpoint_layers_per_pipeline' of 3 makes pipeline stage 1 checkpoint 3 fewer layers than
  # stage 0, stage 2 checkpoint 6 fewer layers than stage 0, and so on. This is possible because later pipeline stages
  # use less GPU memory with fewer outstanding micro-batch backpropagations. Used with 'num_micro_batches_with_partial_activation_checkpoints',
  # this feature removes most of the activation checkpoints at the last pipeline stage, which is the critical execution path.

  ## Sequence Parallelism
  # Makes tensor parallelism more memory efficient for LLMs (20B+) by parallelizing layer norms and dropout sequentially
  # See Reducing Activation Recomputation in Large Transformer Models: https://arxiv.org/abs/2205.05198 for more details.
  sequence_parallel: False

  ## Transformer Engine
  fp8: False # enables fp8 in TransformerLayer forward
  fp8_e4m3: False # sets fp8_format = recipe.Format.E4M3
  fp8_hybrid: False # sets fp8_format = recipe.Format.HYBRID
  fp8_margin: 0 # scaling margin
  fp8_interval: 1 # scaling update interval
  fp8_amax_history_len: 1 # Number of steps for which amax history is recorded per tensor
  fp8_amax_compute_algo: most_recent # 'most_recent' or 'max'. Algorithm for computing amax from history
  reduce_amax: True # Perform reduction to sync amax tensors across GPUs after every iteration
  use_emha: False # Use fused multi-head attention for large sequence-length. Note this is not yet supported. Please set to False.

  data:
    # Path to data must be specified by the user.
    # Supports List, String and Dictionary
    # List: can override from the CLI: "model.data.data_prefix=[.5,/raid/data/pile/my-gpt3_00_text_document,.5,/raid/data/pile/my-gpt3_01_text_document]",
    # Or see example below:
    # data_prefix:
    #   - .5
    #   - /raid/data/pile/my-gpt3_00_text_document
    #   - .5
    #   - /raid/data/pile/my-gpt3_01_text_document
    # Dictionary: can override from CLI "model.data.data_prefix"={"train":[1.0, /path/to/data], "validation":/path/to/data, "test":/path/to/test}
    # Or see example below:
    # "model.data.data_prefix: {train:[1.0,/path/to/data], validation:[/path/to/data], test:[/path/to/test]}"
    # data_prefix: ???
    index_mapping_dir: null # path to save index mapping .npy files, by default will save in the same location as data_prefix
    data_impl: mmap
    splits_string: 900,50,50
    seq_length: ${model.encoder_seq_length}
    skip_warmup: True
    num_workers: 2
    dataloader_type: single # cyclic
    reset_position_ids: False # Reset position ids after end-of-document token
    reset_attention_mask: False # Reset attention mask after end-of-document token
    eod_mask_loss: False # Mask loss for the end of document tokens
    validation_drop_last: True # Set to False if the last partial validation samples are to be consumed
    no_seqlen_plus_one_input_tokens: False # Set to True to disable fetching (sequence length + 1) input tokens, instead get (sequence length) input tokens and mask the last token
    pad_samples_to_global_batch_size: False # Set to True if you want to pad the last partial batch with -1's to equal global batch size
    shuffle_documents: True # Set to False to disable document shuffling. Sample index will still be shuffled

  # Nsys profiling options
  nsys_profile:
    enabled: False
    start_step: 10 # Global batch to start profiling
    end_step: 10 # Global batch to end profiling
    ranks: [0] # Global rank IDs to profile
    gen_shape: False # Generate model and kernel details including input shapes

  optim:
    name: distributed_fused_adam
    lr: 2e-4
    weight_decay: 0.01
    betas:
      - 0.9
      - 0.98
    sched:
      name: CosineAnnealing
      warmup_steps: 500
      constant_steps: 50000
      min_lr: 2e-5
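The inline "7b | 40b | 180b" comments above double as a sizing table for the other Falcon variants. As a rough sketch (not part of this PR's diff), the Falcon-40B overrides would look roughly like the following; the tokenizer name and the mcore_customization_config flags are assumptions based on the corresponding Hugging Face Falcon-40B checkpoint rather than values taken from this file:

# Hypothetical Falcon-40B overrides, derived from the "40b:" comments in the config above
model:
  num_layers: 60
  hidden_size: 8192
  ffn_hidden_size: 32768
  num_attention_heads: 128
  num_query_groups: 8               # 40B uses grouped-query attention per the comment above
  tokenizer:
    type: 'tiiuae/falcon-40b'       # assumed Hugging Face tokenizer name
  mcore_customization_config:
    new_decoder_architecture: true  # assumption: 40B/180B use Falcon's "new decoder architecture"
    parallel_attention: true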
examples/nlp/language_modeling/conf/megatron_falcon_inference.yaml (38 additions, 0 deletions)
@@ -0,0 +1,38 @@
inference:
  greedy: False # Whether to use greedy decoding; if False, sampling is used
  top_k: 0 # The number of highest probability vocabulary tokens to keep for top-k filtering.
  top_p: 0.9 # If set to float < 1, only the most probable tokens with probabilities that add up to top_p or higher are kept for generation.
  temperature: 1.0 # sampling temperature
  add_BOS: False # add the bos token at the beginning of the prompt
  tokens_to_generate: 30 # The number of tokens to generate.
  all_probs: False # whether to return the log prob for all the tokens in the vocab
  repetition_penalty: 1.2 # The parameter for repetition penalty. 1.0 means no penalty.
  min_tokens_to_generate: 0 # The minimum length of the sequence to be generated.
  compute_logprob: False # a flag used to compute the logprob of all the input text, a very special case of running inference, default False
  end_strings: ["<|endoftext|>"] # generation will stop when one of these tokens is generated

trainer:
  devices: 1
  num_nodes: 1
  accelerator: gpu
  logger: False # logger provided by exp_manager
  precision: bf16 # 16, 32, or bf16
  use_distributed_sampler: False

tensor_model_parallel_size: 1
pipeline_model_parallel_size: 1
megatron_amp_O2: True # Enable O2-level automatic mixed precision to save memory
gpt_model_file: null # GPT nemo file path
checkpoint_dir: null # checkpoint file dir. This is used to load the PTL checkpoint generated during the GPT training
checkpoint_name: null # PTL checkpoint file name, only used for PTL checkpoint loading
hparams_file: null # model configuration file, only used for PTL checkpoint loading
prompts: # prompts for GPT inference
  - "Q: How are you?"
  - "Q: How big is the universe?"
server: False # whether to launch the API server
port: 5555 # the port number for the inference server
web_server: False # whether to launch the web inference server
share: False # whether to create a public URL
username: test # user name for web client
password: test2 # password for web client
web_port: 9889 # the port number of the web server
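Most of these keys keep their defaults from run to run. As a minimal sketch (not part of this PR's diff), the per-run overrides might look like the following; the .nemo path is hypothetical and would come from converting a Hugging Face Falcon checkpoint to NeMo format:

# Hypothetical per-run overrides for megatron_falcon_inference.yaml
gpt_model_file: /checkpoints/falcon-7b.nemo  # hypothetical path to a converted checkpoint
tensor_model_parallel_size: 1                # raise for checkpoints sharded across multiple GPUs
inference:
  greedy: True             # deterministic decoding for a quick sanity check
  tokens_to_generate: 64
prompts:
  - "Q: What is the Falcon architecture?"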
nemo/collections/nlp/models/language_modeling/megatron/falcon/__init__.py (13 additions, 0 deletions)
@@ -0,0 +1,13 @@
# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
We have many megatron_*_inference.yaml configs with minor differences. We could set up some common denominator for them so that they become more transparent and manageable, especially since more and more configs are expected to be added in the future. That's a general remark. cc @ericharper
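One possible shape for that common denominator, assuming these configs continue to be composed through Hydra (the base file name below is hypothetical): a shared base config holds the keys that never change, and each model-specific file only overrides what differs.

# Hypothetical megatron_falcon_inference.yaml if the shared keys moved to a base config
defaults:
  - megatron_base_inference   # hypothetical file holding the common trainer/inference/server keys
  - _self_                    # keys below override the base
inference:
  end_strings: ["<|endoftext|>"]  # Falcon-specific stop string
trainer:
  precision: bf16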