[ci] fix: Add mbridge dependency into e2e_ascend#4560
[ci] fix: Add mbridge dependency into e2e_ascend#4560wuxibin89 merged 2 commits intoverl-project:mainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request adds the mbridge dependency to the Ascend Dockerfiles and updates the corresponding documentation. My review focuses on ensuring the reproducibility and stability of the Docker images and development environments. The main feedback is to pin the version of the newly added mbridge dependency across all files to avoid potential issues with future releases.
| # Remove existing triton or triton-ascend installed by some third-party packages | ||
| pip uninstall -y triton triton-ascend && \ | ||
| # Install mbridge | ||
| pip install mbridge && \ |
There was a problem hiding this comment.
To ensure reproducible Docker builds, it is a best practice to pin the versions of all installed packages. Using pip install mbridge will install the latest version, which could introduce breaking changes in the future and make builds non-deterministic. Please pin this dependency to a specific, known-working version.
pip install mbridge==<version> && \
| # Remove existing triton or triton-ascend installed by some third-party packages | ||
| pip uninstall -y triton triton-ascend && \ | ||
| # Install mbridge | ||
| pip install mbridge && \ |
There was a problem hiding this comment.
To ensure reproducible Docker builds, it is a best practice to pin the versions of all installed packages. Using pip install mbridge will install the latest version, which could introduce breaking changes in the future and make builds non-deterministic. Please pin this dependency to a specific, known-working version.
pip install mbridge==<version> && \
| echo "export PYTHONPATH=$PYTHONPATH:\"$(pwd)/Megatron-LM\"" >> ~/.bashrc | ||
|
|
||
| # 安装 mbridge | ||
| pip install mbridge |
There was a problem hiding this comment.
The installation instructions should specify a fixed version for mbridge to ensure users set up a consistent and working environment. Installing the latest version can lead to unexpected issues if breaking changes are introduced in the dependency. Please add a specific version to the pip install command.
| pip install mbridge | |
| pip install mbridge==<version> |
| Megatron-LM v0.12.1 | ||
| MindSpeed (f2b0977e) | ||
| triton-ascend 3.2.0rc4 | ||
| mbridge latest version |
There was a problem hiding this comment.
Using 'latest version' for a dependency is discouraged as it leads to non-reproducible environments and can introduce breaking changes unexpectedly. Please specify a fixed, known-working version for mbridge in this component list to ensure consistency and stability.
| mbridge latest version | |
| mbridge <version> |
### What does this PR do?
> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.
As title, add third-party download cache into e2e_ascend additionally.
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
Not related.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
Not related.
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
Not related.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
### What does this PR do?
> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.
As title, add third-party download cache into e2e_ascend additionally.
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
Not related.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
Not related.
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
Not related.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
What does this PR do?
As title, add third-party download cache into e2e_ascend additionally.
Checklist Before Starting
[{modules}] {type}: {description}(This will be checked by the CI){modules}includefsdp,megatron,sglang,vllm,rollout,trainer,ci,training_utils,recipe,hardware,deployment,ray,worker,single_controller,misc,perf,model,algo,env,tool,ckpt,doc,data,like[megatron, fsdp, doc]{type}is infeat,fix,refactor,chore,test[BREAKING]to the beginning of the title.[BREAKING][fsdp, megatron] feat: dynamic batchingTest
Not related.
API and Usage Example
Not related.
Design & Code Changes
Not related.
Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=alwaysci-requestchannel in theverlSlack workspace. (If not accessible, please try the Feishu group (飞书群).)