[megatron] support qwen3.5 models for megatron, bump mbridge + megatron-core to latest#1425

Open
erictang000 wants to merge 2 commits into main from
qwen3_5_megatron
Conversation


@erictang000 erictang000 commented Apr 1, 2026


@gemini-code-assist bot left a comment


Code Review

This pull request adds a Megatron training script for Qwen 3.5, updates dependencies, and introduces a monkey-patch for transformers v5 compatibility. Review feedback identifies a likely version typo in pyproject.toml, an undefined variable and inconsistent model naming in the shell script, and suggests more specific exception handling for the vLLM engine workaround.

trainer.policy.megatron_config.expert_model_parallel_size=$MEGATRON_EP \
trainer.policy.megatron_config.expert_tensor_parallel_size=$MEGATRON_ETP \
trainer.use_sample_packing=false \
trainer.flash_attn=$FLASH_ATTN \


high

The variable $FLASH_ATTN is used here, but its definition on line 30 is commented out. As a result, an empty value is passed to the trainer, which may cause a parsing error in the entrypoint.

Suggested change
trainer.flash_attn=$FLASH_ATTN \
trainer.flash_attn=false \

@@ -0,0 +1,77 @@
set -x

# Colocated GRPO training+generation for Moonlight-16B-A3B-Instruct on GSM8K with Megatron.


medium

The comment mentions Moonlight-16B-A3B-Instruct, but the script is configured for Qwen/Qwen3.5-0.8B (line 12). This should be updated to reflect the correct model.

generator.inference_engine.gpu_memory_utilization=0.6 \
trainer.logger="$LOGGER" \
trainer.project_name="gsm8k_megatron" \
trainer.run_name="gsm8k_megatron_tp${MEGATRON_TP}_pp${MEGATRON_PP}_cp${MEGATRON_CP}_ep${MEGATRON_EP}_etp${MEGATRON_ETP}_moonlight16b-a3b" \


medium

The run_name suffix refers to moonlight16b-a3b. It should be updated to match the Qwen 3.5 model being used.

Suggested change
trainer.run_name="gsm8k_megatron_tp${MEGATRON_TP}_pp${MEGATRON_PP}_cp${MEGATRON_CP}_ep${MEGATRON_EP}_etp${MEGATRON_ETP}_moonlight16b-a3b" \
trainer.run_name="gsm8k_megatron_tp${MEGATRON_TP}_pp${MEGATRON_PP}_cp${MEGATRON_CP}_ep${MEGATRON_EP}_etp${MEGATRON_ETP}_qwen3.5-0.8b" \

Comment on lines +57 to +58
except Exception:
pass


medium

Catching a broad Exception and passing silently is generally discouraged. While this is a monkey-patch workaround, it would be safer to catch specific errors (like ImportError or AttributeError) or at least log a warning if the patch fails, to aid in debugging if the library structure changes unexpectedly.


@devin-ai-integration bot left a comment


Devin Review found 1 potential issue.

View 4 additional findings in Devin Review.


trainer.policy.megatron_config.expert_model_parallel_size=$MEGATRON_EP \
trainer.policy.megatron_config.expert_tensor_parallel_size=$MEGATRON_ETP \
trainer.use_sample_packing=false \
trainer.flash_attn=$FLASH_ATTN \

🔴 Undefined $FLASH_ATTN variable used in script, will pass empty value to config

The script references $FLASH_ATTN on line 49 (trainer.flash_attn=$FLASH_ATTN), but the variable definition on line 30 is commented out (# FLASH_ATTN=false). Since the script does not use set -u, the undefined variable silently expands to an empty string, resulting in trainer.flash_attn= being passed to the training entrypoint. This will either cause a configuration parsing error or set an unexpected value. Every other megatron script in the same directory (e.g., run_megatron_moonlight.sh:33, run_megatron_qwen3-30b-a3b.sh:27) properly defines FLASH_ATTN before using it.

Prompt for agents
In examples/train/megatron/run_megatron_qwen3.5.sh, the FLASH_ATTN variable on line 30 is commented out (# FLASH_ATTN=false) but is referenced on line 49 as trainer.flash_attn=$FLASH_ATTN. Either:
1. Uncomment line 30 to define FLASH_ATTN (e.g., change line 30 from '# FLASH_ATTN=false' to 'FLASH_ATTN=true' or 'FLASH_ATTN=false' depending on whether flash attention is supported for Qwen3.5), or
2. Remove line 49 entirely if the flash_attn config should not be set for this model (similar to how run_megatron_nemotron_mini_4b.sh omits it).