[ci, megatron] test: Add Qwen3 Megatron+Mindspeed Ascend NPU CI #3465

Merged
FightingZhen merged 3 commits into verl-project:main from wlf-darkmatter:ci-megatron
Nov 19, 2025

@wlf-darkmatter (Contributor) commented Sep 13, 2025

What does this PR do?

Add Qwen3 Megatron+Mindspeed Ascend NPU CI

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: ...
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching
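
The title rules above can be sketched as a small validator. This is only an illustration of the stated convention, not the check the CI actually runs: the regular expression and the function name `check_pr_title` are assumptions made for this example, and only the module and type names come from the checklist.

```python
import re

# Module and type names copied from the checklist above.
ALLOWED_MODULES = {
    "fsdp", "megatron", "sglang", "vllm", "rollout", "trainer", "ci",
    "training_utils", "recipe", "hardware", "deployment", "ray", "worker",
    "single_controller", "misc", "perf", "model", "algo", "env", "tool",
    "ckpt", "doc", "data",
}
ALLOWED_TYPES = {"feat", "fix", "refactor", "chore", "test"}

# Assumed shape: optional [BREAKING], then [modules] type: description.
TITLE_RE = re.compile(
    r"^(?P<breaking>\[BREAKING\])?"
    r"\[(?P<modules>[^\]]+)\]\s+"
    r"(?P<type>\w+):\s+"
    r"(?P<description>.+)$"
)


def check_pr_title(title: str) -> list[str]:
    """Return a list of problems; an empty list means the title passes."""
    match = TITLE_RE.match(title.strip())
    if match is None:
        return ["title does not match [{modules}] {type}: {description}"]

    problems = []
    # Multiple modules are separated with commas, e.g. [megatron, fsdp, doc].
    for module in (m.strip() for m in match.group("modules").split(",")):
        if module not in ALLOWED_MODULES:
            problems.append(f"unknown module: {module!r}")
    if match.group("type") not in ALLOWED_TYPES:
        problems.append(f"unknown type: {match.group('type')!r}")
    return problems
```

Under these assumptions, the final title of this PR and the `[BREAKING]` example both pass, while the original title `Ci megatron` is reported as not matching the format.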

Test

For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

@gemini-code-assist (Contributor)

Note

Gemini is unable to generate a review for this pull request due to the file types involved not being currently supported.

wlf-darkmatter marked this pull request as draft September 13, 2025 08:25
wlf-darkmatter changed the title from "Ci megatron" to "[ci, megatron] test: Add Qwen3 Megatron+Mindspeed Ascend NPU CI" Sep 13, 2025
wlf-darkmatter force-pushed the ci-megatron branch 2 times, most recently from 083471f to b629bed September 18, 2025 06:45
@wlf-darkmatter (Contributor, Author)

The existing test scripts all use small models such as 0.5B or 0.6B, but the smallest Qwen3-MoE model is 30B, so pulling it would significantly increase the CI runtime. Network issues could also make the CI less stable. May I use a fully dummy model with weights trimmed to approximately 1B instead? @tardis-key

@tardis-key (Collaborator) commented Sep 22, 2025

Using a trimmed model is a good idea.
However, the current config system requires a path, and I'm not sure a dummy model will work with it. If it doesn't, we can upload the trimmed model to Hugging Face instead. @wlf-darkmatter
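
To make the size argument concrete, here is a rough sketch of how trimming layers and experts shrinks a Qwen3-MoE-style model. The thread does not say how the model was actually trimmed, so everything here is an assumption: the helper names `estimate_params` and `trim_config` are hypothetical, the config values are illustrative figures for a 30B-class MoE model (check the real `config.json` before relying on them), and the estimate ignores norm weights and biases.

```python
# Illustrative config values (assumed, not taken from this PR).
FULL_CONFIG = {
    "hidden_size": 2048,
    "num_hidden_layers": 48,
    "num_attention_heads": 32,
    "num_key_value_heads": 4,
    "head_dim": 128,
    "num_experts": 128,
    "num_experts_per_tok": 8,
    "moe_intermediate_size": 768,
    "vocab_size": 151936,
}


def estimate_params(cfg: dict) -> int:
    """Rough parameter count: attention + experts + router + embeddings."""
    hidden = cfg["hidden_size"]
    q_out = cfg["num_attention_heads"] * cfg["head_dim"]
    kv_out = cfg["num_key_value_heads"] * cfg["head_dim"]
    # q and o projections use q_out; k and v projections use kv_out.
    attention = 2 * hidden * q_out + 2 * hidden * kv_out
    # Each expert has gate, up, and down projections.
    experts = cfg["num_experts"] * 3 * hidden * cfg["moe_intermediate_size"]
    router = hidden * cfg["num_experts"]
    per_layer = attention + experts + router
    # Input embedding plus an untied output head.
    embeddings = 2 * cfg["vocab_size"] * hidden
    return cfg["num_hidden_layers"] * per_layer + embeddings


def trim_config(cfg: dict, num_layers: int, num_experts: int) -> dict:
    """Return a copy of cfg with fewer layers and experts."""
    trimmed = dict(cfg)
    trimmed["num_hidden_layers"] = num_layers
    trimmed["num_experts"] = num_experts
    # Top-k routing cannot select more experts than exist.
    trimmed["num_experts_per_tok"] = min(cfg["num_experts_per_tok"], num_experts)
    return trimmed


if __name__ == "__main__":
    small = trim_config(FULL_CONFIG, num_layers=2, num_experts=32)
    print(f"full:    {estimate_params(FULL_CONFIG) / 1e9:.2f}B")
    print(f"trimmed: {estimate_params(small) / 1e9:.2f}B")
```

With these assumed values, keeping 2 layers and 32 experts lands close to the 1B target mentioned above, and most of what remains is the embedding table. A trimmed config like this still has to be saved somewhere the config system can reach by path, which is the concern raised in the reply.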

@CLAassistant commented Nov 6, 2025

CLA assistant check
All committers have signed the CLA.

@wlf-darkmatter wlf-darkmatter marked this pull request as ready for review November 6, 2025 14:03
@wlf-darkmatter (Contributor, Author)

Self-test OK.
[image]

wlf-darkmatter force-pushed the ci-megatron branch 2 times, most recently from b087786 to b7f261d November 8, 2025 09:24
wlf-darkmatter changed the title from "[ci, megatron] test: Add Qwen3 Megatron+Mindspeed Ascend NPU CI" to "WIP [ci, megatron] test: Add Qwen3 Megatron+Mindspeed Ascend NPU CI" Nov 12, 2025
wlf-darkmatter changed the title from "WIP [ci, megatron] test: Add Qwen3 Megatron+Mindspeed Ascend NPU CI" to "[WIP, ci, megatron] test: Add Qwen3 Megatron+Mindspeed Ascend NPU CI" Nov 12, 2025
FightingZhen changed the title from "[WIP, ci, megatron] test: Add Qwen3 Megatron+Mindspeed Ascend NPU CI" to "[WIP][ci, megatron] test: Add Qwen3 Megatron+Mindspeed Ascend NPU CI" Nov 13, 2025
FightingZhen marked this pull request as draft November 13, 2025 01:42
Update run_dapo_qwen3_moe_30b_megatron_npu.sh

Update e2e_ascend.yml
wlf-darkmatter changed the title from "[WIP][ci, megatron] test: Add Qwen3 Megatron+Mindspeed Ascend NPU CI" to "[ci, megatron] test: Add Qwen3 Megatron+Mindspeed Ascend NPU CI" Nov 18, 2025
@wlf-darkmatter wlf-darkmatter marked this pull request as ready for review November 18, 2025 08:34
@FightingZhen FightingZhen merged commit c12b0cf into verl-project:main Nov 19, 2025
8 checks passed
wuwendyy pushed a commit to wuwendyy/verl that referenced this pull request Nov 19, 2025
…-project#3465)

### What does this PR do?

Add Qwen3 Megatron+Mindspeed Ascend NPU CI

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
@wlf-darkmatter wlf-darkmatter deleted the ci-megatron branch November 25, 2025 07:20
TimurTaepov pushed a commit to giorgossideris/verl that referenced this pull request Dec 20, 2025
…-project#3465)

vyomakesh0728 added a commit to vyomakesh0728/verl that referenced this pull request Jan 22, 2026
…-project#3465)

4 participants