# feat: Support qwen3-next, mcore path #1530
Base: main

Changes from all commits: 2944cd7, d458dd7, dfe58b8, 6257143, 87a1d8c, 1ed6b5d, c7a042d, f90fe85, e24ff52, f9df826, 72915eb, e0f3661, 4bb8b2c, 8a16bb5, 2bf6313
Files changed:

| Changes | File |
|---|---|
| +1 −0 | megatron/core/model_parallel_config.py |
| +1 −1 | megatron/core/pipeline_parallel/schedules.py |
| +16 −1 | tools/checkpoint/checkpoint_inspector.py |
New file (GRPO config, +47 lines); indentation reconstructed from the diff:

```yaml
defaults: ../../grpo_math_1B.yaml
grpo:
  num_prompts_per_step: 64
  num_generations_per_prompt: 32
checkpointing:
  checkpoint_dir: results/grpo-qwen3-next-80ba3b-8n8g-megatron
policy:
  model_name: Qwen/Qwen3-Next-80B-A3B-Instruct
  train_micro_batch_size: 1
  max_total_sequence_length: 4096
  dtensor_cfg:
    enabled: false
  optimizer: null
  scheduler: null
  sequence_packing:
    enabled: false
    algorithm: modified_ffd
  make_sequence_length_divisible_by: ${policy.megatron_cfg.tensor_model_parallel_size}
  megatron_cfg:
    enabled: true
    converter_type: Qwen3NextForCausalLM
    tensor_model_parallel_size: 2
    pipeline_model_parallel_size: 4
    expert_model_parallel_size: 4
    sequence_parallel: true
    optimizer:
      lr: 3.0e-07
      min_lr: 3.0e-08
    scheduler:
      lr_warmup_iters: 50
      lr_warmup_init: 3.0e-08
    env_vars:
      PYTORCH_CUDA_ALLOC_CONF: expandable_segments:False
  generation:
    vllm_cfg:
      tensor_parallel_size: 4
      gpu_memory_utilization: 0.7
logger:
  log_dir: logs/grpo-qwen3-next-80ba3b-8n8g-megatron
  wandb_enabled: true
  tensorboard_enabled: true
  wandb:
    project: nemo-rl
    name: grpo-qwen3-next-80ba3b-8n8g-megatron
cluster:
  gpus_per_node: 8
  num_nodes: 8
```
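The GRPO config above spreads the model over 64 GPUs with TP=2, PP=4, and EP=4. A quick sketch of the usual Megatron-style arithmetic behind those numbers (the divisibility rules and function names here are assumptions, not part of this PR; consult the Megatron-Core docs for the authoritative constraints):

```python
# Sanity-check a parallelism layout: TP * PP must divide the world size,
# and the expert-parallel size must divide the remaining data-parallel size.

def data_parallel_size(num_nodes: int, gpus_per_node: int,
                       tp: int, pp: int) -> int:
    """Derive the data-parallel size left after tensor and pipeline splits."""
    world_size = num_nodes * gpus_per_node
    model_parallel = tp * pp
    if world_size % model_parallel != 0:
        raise ValueError("tp * pp must divide the world size")
    return world_size // model_parallel

# Values from the GRPO config: 8 nodes x 8 GPUs, TP=2, PP=4, EP=4.
dp = data_parallel_size(num_nodes=8, gpus_per_node=8, tp=2, pp=4)
print(dp)            # 64 / (2 * 4) = 8
assert dp % 4 == 0   # EP=4 divides the data-parallel size, so the layout fits
```

With these numbers each pipeline-parallel replica spans 8 GPUs, leaving 8 data-parallel replicas across which the 4-way expert parallelism is folded.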
New file (SFT config, +47 lines); indentation reconstructed from the diff:

```yaml
defaults: ../../sft.yaml
sft:
  max_num_steps: 1000000
  val_period: 50
checkpointing:
  checkpoint_dir: results/sft-qwen3-next-80ba3b-instruct-8n8g-megatron
  save_period: 50
policy:
  model_name: Qwen/Qwen3-Next-80B-A3B-Instruct
  tokenizer:
    name: Qwen/Qwen3-Next-80B-A3B-Instruct
    chat_template: default
  train_global_batch_size: 512
  max_total_sequence_length: 4096
  dtensor_cfg:
    enabled: false
  make_sequence_length_divisible_by: ${policy.megatron_cfg.tensor_model_parallel_size}
  optimizer: null
  megatron_cfg:
    enabled: true
    converter_type: Qwen3NextForCausalLM
    pipeline_model_parallel_size: 2
    expert_model_parallel_size: 8
    optimizer:
      lr: 2.0e-05
      min_lr: 1.99999e-05
      weight_decay: 0.01
      bf16: true
    scheduler:
      lr_warmup_init: 1.9999e-65
data:
  dataset_name: openmathinstruct2
  prompt_file: examples/prompts/math.txt
  split: train_1M
  add_generation_prompt: true
  output_key: generated_solution
  seed: 42
logger:
  log_dir: logs/sft-qwen3-next-80ba3b-instruct-8n8g-megatron
  wandb:
    project: nemo-rl
    name: sft-qwen3-next-80ba3b-instruct-8n8g-megatron
  tensorboard:
    log_dir: tb_logs-sft-dev-openmathinstruct2
cluster:
  num_nodes: 8
  gpus_per_node: 8
```

> **Contributor comment on lines +19 to +30** (the `megatron_cfg` block): Double-check these values. If this wasn't intentional (e.g., you meant …), please fix it.
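The reviewer's flag on the `megatron_cfg` block is easy to see numerically: `lr_warmup_init: 1.9999e-65` sits sixty orders of magnitude below `min_lr: 1.99999e-05`, which is the signature of a mistyped exponent. A hypothetical sanity check (not part of the PR; the function and its threshold are illustrative assumptions):

```python
# Flag scheduler values that look like typos: a warmup starting point many
# orders of magnitude below min_lr is effectively zero and usually means
# the exponent was mistyped (e.g., e-65 instead of e-05).

def warmup_init_looks_sane(lr_warmup_init: float, min_lr: float) -> bool:
    # Allow up to ~6 orders of magnitude of headroom below min_lr;
    # anything smaller is treated as suspicious.
    return lr_warmup_init >= min_lr * 1e-6

print(warmup_init_looks_sane(1.9999e-65, 1.99999e-05))  # False (suspicious)
print(warmup_init_looks_sane(1.9999e-05, 1.99999e-05))  # True
```

A check like this could run at config-load time to catch exponent typos before an expensive multi-node job is launched.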
**Review:** Verify `transformers` 4.57.1 availability and qwen3-next requirements. A dependency on `transformers>=4.57.1` has been added; a web check confirms 4.57.1 is the latest release on PyPI (uploaded Oct 14, 2025).

**Follow-up:** Remove `test_transformer_memory_regression.py` and reinstate the disabled nightly test per the documented cleanup instructions. The `transformers>=4.57.1` constraint in `setup.py` (added for qwen3-next support) will immediately fail `tests/unit/test_transformer_memory_regression.py`, which enforces version `[4.54, 4.56)`. This is actually good news, since the memory regression is fixed in 4.57.1, but the cleanup steps documented in the test must be completed:

- `tests/unit/test_transformer_memory_regression.py` (remove)
- `tests/test_suites/llm/dpo-mistral-nemo-instruct-2407-1n8g-fsdp2tp8-actckpt-long.sh` (reinstate)
- the `transformers>=4.54,<4.56` pin (lift; see #1343)