@dependabot dependabot bot commented on behalf of github Apr 2, 2025

Bumps accelerate from 1.4.0 to 1.6.0.

Release notes

Sourced from accelerate's releases.

v1.6.0: FSDPv2, DeepSpeed TP and XCCL backend support

FSDPv2 support

This release introduces support for FSDPv2, thanks to @S1ro1.

If you are configuring FSDP from Python code, set fsdp_version=2 in FullyShardedDataParallelPlugin:

from accelerate import FullyShardedDataParallelPlugin, Accelerator

fsdp_plugin = FullyShardedDataParallelPlugin(
    fsdp_version=2,
    # other options...
)
accelerator = Accelerator(fsdp_plugin=fsdp_plugin)

If you want to convert a YAML config that contains an FSDPv1 configuration to an FSDPv2 one, use our conversion tool:

accelerate to-fsdp2 --config_file config.yaml --output_file new_config.yaml

To learn more about the difference between FSDPv1 and FSDPv2, read the following documentation.

DeepSpeed TP support

We have added initial support for DeepSpeed + TP. Few changes were required because the DeepSpeed APIs were already compatible: we only needed to make sure that the dataloader was compatible with TP and that we could save the TP weights. Thanks to @inkcherry for the work! huggingface/accelerate#3390.

To use TP with DeepSpeed, update the DeepSpeed config file to include the tensor_parallel key:

    ...
    "tensor_parallel": {
      "autotp_size": ${autotp_size}
    },
    ...

More details in this deepspeed PR.
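
For configs that are built or patched from a script rather than edited by hand, the same key can be added programmatically. The following is a minimal sketch, not part of the release notes: the helper name `with_tensor_parallel` and the placeholder base config are assumptions made for illustration, and only the `tensor_parallel` / `autotp_size` keys come from the text above.

```python
import json


def with_tensor_parallel(ds_config: dict, autotp_size: int) -> dict:
    """Return a copy of a DeepSpeed config dict with the tensor_parallel key set.

    Hypothetical helper for illustration; it only mirrors the JSON fragment above.
    """
    if autotp_size < 1:
        raise ValueError("autotp_size must be a positive integer")
    updated = dict(ds_config)  # shallow copy so the caller's dict is left untouched
    updated["tensor_parallel"] = {"autotp_size": autotp_size}
    return updated


# Placeholder base config (an assumption); a real config would carry more settings.
base_config = {"train_micro_batch_size_per_gpu": 1}
tp_config = with_tensor_parallel(base_config, autotp_size=2)

# Serialize the result in the form a DeepSpeed config file would hold.
print(json.dumps(tp_config, indent=2))
```

The resulting dict can be written to the JSON file that the DeepSpeed configuration points at; which value of `autotp_size` is appropriate depends on the hardware and model and is not specified here.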

Support for XCCL distributed backend

We've added support for XCCL, an Intel distributed backend that can be used with XPU devices. More details in this torch PR. Thanks to @dvrogozh for the integration!

What's Changed

... (truncated)

Commits

You can trigger a rebase of this PR by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Note
Automatic rebases have been disabled on this pull request as it has been open for over 30 days.

@dependabot dependabot bot added the dependencies (Pull requests that update a dependency file) and python (Pull requests that update python code) labels Apr 2, 2025
@dependabot dependabot bot force-pushed the dependabot/pip/accelerate-1.6.0 branch from 66a0d60 to 46f03f5 Compare April 23, 2025 13:49
Kunlun-Zhu and others added 5 commits April 23, 2025 21:56
* openmanus-0.01

* agentgym input process

* add gitignore

* adding pipeline description docs

* Fix multi-server parallel support

* adding initial testing
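* fix(train_ppo): improve background server cleanup using setsid and kill PGID

- Use `setsid` to start the AgentGym servers in a separate process group. This ensures that all child processes launched by `conda run` belong to the same group.
- Change the `trap EXIT` cleanup logic to kill the whole process group with `kill -- -PGID` instead of only the initial `conda run` PID. This terminates the server and all of its potential child processes more reliably.
- Store the PGID instead of the PID.
- Remove the earlier check for whether the server process exists; because `setsid` exits quickly, that check is no longer reliable.

This change fixes an issue where the AgentGym servers might not terminate correctly when the main script exits, because signals were not propagated properly through `conda run`.

TODO: check train_grpo.sh and apply similar server cleanup logic.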
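
The process-group cleanup described in this commit message lives in a shell script; the sketch below is only a Python analogue of the same idea, not the project's code. It assumes a POSIX system, and the helper names `start_in_new_group` and `stop_group`, as well as the sleeping stand-in for the server, are hypothetical.

```python
import os
import signal
import subprocess
import sys


def start_in_new_group(cmd: list[str]) -> subprocess.Popen:
    """Start cmd as the leader of a new session and process group (POSIX only)."""
    # start_new_session=True makes the child call setsid() before exec,
    # which is the Python counterpart of launching a command through `setsid`.
    return subprocess.Popen(cmd, start_new_session=True)


def stop_group(proc: subprocess.Popen, pgid: int, timeout: float = 5.0) -> None:
    """Signal the whole process group, escalating to SIGKILL if it lingers."""
    try:
        os.killpg(pgid, signal.SIGTERM)  # like `kill -- -PGID` in a shell
    except ProcessLookupError:
        pass  # the group is already gone
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        try:
            os.killpg(pgid, signal.SIGKILL)
        except ProcessLookupError:
            pass
        proc.wait()


# Stand-in for a long-running server (an assumption for this sketch).
server = start_in_new_group([sys.executable, "-c", "import time; time.sleep(60)"])
server_pgid = os.getpgid(server.pid)  # store the PGID rather than the PID
stop_group(server, server_pgid)
```

In a long-running launcher, `stop_group` could be registered with `atexit.register` to play the role of the shell `trap EXIT` handler; whether that fits depends on how the launcher itself is terminated.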

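* fix(train_ppo): improve AgentGym server management and startup checks
- **Add a network-port-based (`nc`) server startup check to replace the previous, unreliable PID check.**
- Implement a retry mechanism for the port check; if the check fails, trigger cleanup and exit.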
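
The commit performs the port-based startup check with `nc` in a shell script. As an illustration only, the same retry pattern can be sketched in Python with the standard `socket` module; the function name `wait_for_port` and its default retry values are assumptions, not taken from the project.

```python
import socket
import time


def wait_for_port(host: str, port: int, retries: int = 10,
                  delay: float = 1.0, timeout: float = 1.0) -> bool:
    """Return True once a TCP connection to (host, port) succeeds, else False."""
    for attempt in range(retries):
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            if attempt < retries - 1:
                time.sleep(delay)  # wait before the next attempt
    return False


# Demonstration against a throwaway local listener on an OS-assigned port.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen()
port = listener.getsockname()[1]

up = wait_for_port("127.0.0.1", port, retries=3, delay=0.1)
listener.close()
down = wait_for_port("127.0.0.1", port, retries=2, delay=0.1)
```

A caller would typically run its cleanup routine and exit with a non-zero status when `wait_for_port` returns False, mirroring the behaviour the commit message describes.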

---------

Co-authored-by: chenzp15 <[email protected]>
* adding new reward & delete unused modules

* Fix/agentgym server cleanup (#53)

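* fix(train_ppo): improve background server cleanup using setsid and kill PGID

- Use `setsid` to start the AgentGym servers in a separate process group. This ensures that all child processes launched by `conda run` belong to the same group.
- Change the `trap EXIT` cleanup logic to kill the whole process group with `kill -- -PGID` instead of only the initial `conda run` PID. This terminates the server and all of its potential child processes more reliably.
- Store the PGID instead of the PID.
- Remove the earlier check for whether the server process exists; because `setsid` exits quickly, that check is no longer reliable.

This change fixes an issue where the AgentGym servers might not terminate correctly when the main script exits, because signals were not propagated properly through `conda run`.

TODO: check train_grpo.sh and apply similar server cleanup logic.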

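* fix(train_ppo): improve AgentGym server management and startup checks
- **Add a network-port-based (`nc`) server startup check to replace the previous, unreliable PID check.**
- Implement a retry mechanism for the port check; if the check fails, trigger cleanup and exit.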

---------

Co-authored-by: chenzp15 <[email protected]>

* fix config bugs

* .

* .

---------

Co-authored-by: rxdaozhang <[email protected]>
Co-authored-by: chenzp15 <[email protected]>
Bumps [accelerate](https://github.com/huggingface/accelerate) from 1.4.0 to 1.6.0.
- [Release notes](https://github.com/huggingface/accelerate/releases)
- [Commits](huggingface/accelerate@v1.4.0...v1.6.0)

---
updated-dependencies:
- dependency-name: accelerate
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
@dependabot dependabot bot force-pushed the dependabot/pip/accelerate-1.6.0 branch from 46f03f5 to ec8ca6b Compare April 27, 2025 04:00
@realtmxi realtmxi force-pushed the dependabot/pip/accelerate-1.6.0 branch from ec8ca6b to 8aa799c Compare May 23, 2025 10:51
@realtmxi realtmxi force-pushed the main branch 2 times, most recently from 6c72038 to 4d642b4 Compare May 23, 2025 11:04


7 participants