Fix Energon Support in Qwen3-VL #2440
Merged
Commits (29)
- df7e912 (kamran-nvidia): init commit
- fd806d2 (kamran-nvidia): fix: Add missing import of io in task_encoder and test_task_encoder
- 1cf57ee (kamran-nvidia): Merge branch 'main' into kamran/qwen3_vl_energon
- 6fcf48f (kamran-nvidia): fix: Update file permissions for peft_seq_unpacked.sh
- 374252a (kamran-nvidia): fix: Improve logging and comments in EnergonMultiModalDataModule for …
- d2ec77a (kamran-nvidia): fix: Remove unnecessary blank lines in task_encoder and test_task_enc…
- 0aace0e (kamran-nvidia): feat: Add comprehensive tests for Context Parallelism handling in Ene…
- 21d2628 (kamran-nvidia): fix: Remove unused import of 'io' in task_encoder.py
- 6b6e273 (kamran-nvidia): Update src/megatron/bridge/recipes/qwen_vl/data/energon/task_encoder.py
- 1477d91 (kamran-nvidia): Merge branch 'main' into kamran/qwen3_vl_energon
- 2fb1b96 (kamran-nvidia): fix: Remove unused '__subflavor__' attribute from QwenVLTaskEncoder t…
- 258dc4b (kamran-nvidia): fix: Add missing '__subflavors__' attribute to ChatMLSample in QwenVL…
- f5187af (kamran-nvidia): Merge branch 'main' into kamran/qwen3_vl_energon
- 706fe9b (kamran-nvidia): fix: Add pack_sequences_in_batch attribute to EnergonProvider
- 10feab5 (kamran-nvidia): feat: Add dataset_type argument for VLM recipes in run_recipe.py
- 70df6af (kamran-nvidia): fix: Update videos attribute type in ChatMLSample to support nested l…
- 5c4b45a (kamran-nvidia): feat: Add energon_test.sh for LoRA finetuning with sequence packing c…
- 612ae77 (kamran-nvidia): Merge branch 'main' into kamran/qwen3_vl_energon
- 5a0685f (kamran-nvidia): fix: Update copyright year and modify command for running LoRA finetu…
- 3445703 (kamran-nvidia): docs: Add finetuning instructions for Energon dataset in README.md
- 19ca029 (kamran-nvidia): feat: Extend video handler to support additional video extensions and…
- d42610f (kamran-nvidia): Merge branch 'main' into kamran/qwen3_vl_energon
- 4f35c68 (kamran-nvidia): fix: correct typos in README.md and task_encoder.py
- fee00b2 (kamran-nvidia): Merge branch 'main' into kamran/qwen3_vl_energon
- 550cba7 (kamran-nvidia): Merge branch 'kamran/qwen3_vl_energon' of github.com:NVIDIA-NeMo/Mega…
- 4926295 (kamran-nvidia): feat: integrate ProcessGroupCollection for distributed training in En…
- 853dc57 (kamran-nvidia): feat: update image and video processing to use PIL format in QwenVLTa…
- 5047bd6 (kamran-nvidia): feat: update image and video processing to use PIL format in QwenVLTa…
- 9390e19 (kamran-nvidia): fix: update input_ids extraction to handle BatchEncoding type in Qwen…
New test script added by this PR (the sequence-packing test script described in commit 5c4b45a):

```bash
#!/usr/bin/env bash
# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Workspace directory for checkpoints and results
WORKSPACE=${WORKSPACE:-/workspace}

# Before training, make sure to set WANDB_API_KEY or disable wandb logging
# export WANDB_API_KEY=<your_wandb_api_key>
# export WANDB_MODE=disabled

# Test sequence-packing configurations for LoRA finetuning on the dense model
PRETRAINED_CHECKPOINT=${WORKSPACE}/models/Qwen3-VL-8B-Instruct
MODEL_NAME=qwen3_vl_8b
DATASET_NAME=energon
SEQ_LENGTH=4096
TRAIN_ITERS=50
GLOBAL_BATCH_SIZE=32
MICRO_BATCH_SIZE=2
EVAL_ITERS=10
LR=0.00005
MIN_LR=0.000005
LR_WARMUP_ITERS=10
LOG_INTERVAL=1
WANDB_PROJECT=megatron-bridge-${DATASET_NAME}

SEQ_PACKING_CONFIGS=(False True)

# EP/TP/PP/CP/N_PROC combinations: "EP,TP,PP,CP,N_PROC" configurations
# N_PROC is the total number of processes (GPUs) used for training
# N_PROC is used to control DP size, to make the loss curves comparable
PARALLELISM_CONFIGS=("1,1,1,4,8" "1,1,1,2,4" "1,1,1,1,2")

for pack_config in "${SEQ_PACKING_CONFIGS[@]}"; do
  for par_config in "${PARALLELISM_CONFIGS[@]}"; do
    IFS=',' read -r EP TP PP CP N_PROC <<< "$par_config"
    echo "Running LoRA finetuning pack_sequences_in_batch=$pack_config with EP=$EP TP=$TP PP=$PP CP=$CP N_PROC=$N_PROC"
    uv run python -m torch.distributed.run --nproc_per_node=$N_PROC scripts/training/run_recipe.py \
      --recipe ${MODEL_NAME}_finetune_config \
      --step_func qwen3_vl_step \
      --peft_scheme lora \
      --dataset_type energon \
      checkpoint.pretrained_checkpoint=$PRETRAINED_CHECKPOINT \
      model.seq_length=$SEQ_LENGTH \
      train.train_iters=$TRAIN_ITERS \
      train.global_batch_size=$GLOBAL_BATCH_SIZE \
      train.micro_batch_size=$MICRO_BATCH_SIZE \
      train.eval_iters=$EVAL_ITERS \
      optimizer.lr=$LR \
      optimizer.min_lr=$MIN_LR \
      scheduler.lr_warmup_iters=$LR_WARMUP_ITERS \
      checkpoint.save=${WORKSPACE}/results/${MODEL_NAME}_lora_seq_pack_${pack_config}_cp${CP} \
      logger.log_interval=$LOG_INTERVAL \
      logger.wandb_project=$WANDB_PROJECT \
      logger.wandb_exp_name=${MODEL_NAME}_${DATASET_NAME}_lora_seq_pack_${pack_config}_cp${CP} \
      dataset.seq_length=$SEQ_LENGTH \
      dataset.path=/path/to/energon/dataset \
      dataset.pack_sequences_in_batch=$pack_config \
      model.expert_model_parallel_size=$EP \
      model.tensor_model_parallel_size=$TP \
      model.pipeline_model_parallel_size=$PP \
      model.context_parallel_size=$CP \
      model.calculate_per_token_loss=True \
      ddp.average_in_collective=False \
      ddp.grad_reduce_in_fp32=True
  done
done
```
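The script's comment notes that N_PROC is chosen to keep the data-parallel (DP) size constant so the loss curves stay comparable. A minimal sketch of why that holds, assuming the standard Megatron-style decomposition DP = N_PROC / (TP * PP * CP) (this formula is an assumption here, not stated in the PR itself):

```shell
#!/usr/bin/env bash
# Sketch: parse the "EP,TP,PP,CP,N_PROC" tuples used by the test script and
# derive the implied data-parallel size under the assumed relation
# DP = N_PROC / (TP * PP * CP).
PARALLELISM_CONFIGS=("1,1,1,4,8" "1,1,1,2,4" "1,1,1,1,2")

for par_config in "${PARALLELISM_CONFIGS[@]}"; do
  # Split the comma-separated tuple into its five fields
  IFS=',' read -r EP TP PP CP N_PROC <<< "$par_config"
  # Integer arithmetic: every non-DP dimension divides the process count
  DP=$(( N_PROC / (TP * PP * CP) ))
  echo "config=$par_config -> DP=$DP"
done
# All three configurations yield DP=2, which is what makes the runs comparable.
```

Under this assumption, 8/(1*1*4), 4/(1*1*2), and 2/(1*1*1) all equal 2, so only the context-parallel size varies between runs.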