Add delta weight sync blogpost #3386
Sparse safetensors over HF Buckets for async RL weight sync in TRL. ~99% bf16 sparsity at RL learning rates means per-step payload drops from 1.2 GB to 20-35 MB on Qwen3-0.6B. Includes four interactive animations and a disaggregated demo running on HF Spaces. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
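As a rough sanity check on the payload figures above, here is a minimal back-of-the-envelope sketch. It assumes a flat "index + value" encoding with 4-byte indices and 2-byte bf16 values, and a round 600M parameter count; these are illustrative assumptions, not the format or exact count used in this PR, and the function names are hypothetical.

```python
# Back-of-the-envelope payload estimate for a sparse bf16 weight delta.
# Assumptions (not taken from the PR): a flat index+value layout with
# 4-byte indices, and a round 600M parameter count for Qwen3-0.6B.

def dense_bytes(num_params: int, bytes_per_value: int = 2) -> int:
    """Size of a full bf16 checkpoint: one 2-byte value per parameter."""
    return num_params * bytes_per_value


def sparse_delta_bytes(
    num_params: int,
    sparsity: float,
    bytes_per_value: int = 2,
    bytes_per_index: int = 4,
) -> int:
    """Size of a delta that stores only the changed elements.

    `sparsity` is the fraction of parameters whose bf16 value did not change.
    """
    unchanged = round(num_params * sparsity)
    changed = num_params - unchanged
    return changed * (bytes_per_value + bytes_per_index)


NUM_PARAMS = 600_000_000  # hypothetical round figure
dense = dense_bytes(NUM_PARAMS)
sparse = sparse_delta_bytes(NUM_PARAMS, sparsity=0.99)

print(f"dense:  {dense / 1e9:.2f} GB")
print(f"sparse: {sparse / 1e6:.1f} MB")
```

Under these assumptions the dense checkpoint comes to 1.2 GB and the delta to a few tens of MB, the same order of magnitude as the 20-35 MB quoted above. The real payload depends on the actual index encoding and the measured sparsity at each step.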
@lewtun @kashif @qgallouedec kindly review when you get time :)
qgallouedec
left a comment
Very easy to read from top to bottom! Nice work! I couldn't access the figures though.
I only have some minor recommendations.
If you read our previous post on [the landscape of async RL training](https://huggingface.co/blog/async-rl-landscape), you already know the punchline. Every async RL library, regardless of how it spells "actor model" or which color its NCCL backend is painted, eventually trips over the same root: **weight synchronization**.
The inference engine speaks the policy of step N. The trainer just finished step N+1. The fresh weights have to get from one side to the other before the inference engine starts drifting hopelessly off-policy. In a synchronous setup you pay for this once per step and it is no big deal. In an async setup it happens constantly, in the background, while generation is also trying to happen, and it had better be fast.
> In a synchronous setup you pay for this once per step and it is no big deal

I would disagree with this. The sync setup has the same problem: you interrupt training for 1 min 30 s when you could interrupt it for only 1 s.
Agreed, I need to fix it. The better point to make is that you don't need to stop the trainer for the upload part of the sync. You can just inform the inference engine that the weights are ready and let it go fetch them from the rollout buffer.
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Thanks @kashif 🙏🏼 🙏🏼
Removed user 'nouamanetazi' from the list of users.
## 1. The One Terabyte Problem
If you read our previous post on [the landscape of async RL training](https://huggingface.co/blog/async-rl-landscape), you already know the punchline. Every async RL library, regardless of how it spells "actor model" or which color its NCCL backend is painted, eventually trips over the same root: **weight synchronization**.
If you read our previous post on [the landscape of async RL training](https://huggingface.co/blog/async-rl-training-landscape), you already know the punchline. Every async RL library, regardless of how it spells "actor model" or which color its NCCL backend is painted, eventually trips over the same root: **weight synchronization**.