SP GRPO support + batch SP fixes by djsaunde · Pull Request #2643 · axolotl-ai-cloud/axolotl

djsaunde · 2025-05-07T03:05:37Z

Description

This PR implements support for sequence parallelism (SP) for the GRPO trainer. This includes a custom sampler, similar to the one implemented in TRL, but with logic for replicating samples across processes in the same SP group.
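The replication idea can be illustrated with a minimal sketch. This is not the sampler added in this PR or the one in TRL. The function names (`sp_group_id`, `grouped_indices`), the assumption that contiguous ranks form an SP group, and the strided split across groups are all hypothetical choices made for illustration. It only shows the property described above: ranks in the same SP group see identical sample indices, while different groups see different ones.

```python
import random
from typing import List


def sp_group_id(rank: int, sp_degree: int) -> int:
    """Map a global rank to an SP group (assumes contiguous ranks share a group)."""
    return rank // sp_degree


def grouped_indices(
    dataset_size: int,
    rank: int,
    world_size: int,
    sp_degree: int,
    num_generations: int,
    seed: int = 0,
) -> List[int]:
    """Return the sample indices a given rank would iterate over.

    Hypothetical helper: every rank in one SP group gets the same indices,
    because each of those ranks holds a different slice of the same sequences.
    """
    if world_size % sp_degree != 0:
        raise ValueError("world_size must be divisible by sp_degree")
    num_groups = world_size // sp_degree
    group = sp_group_id(rank, sp_degree)

    # Same seed on every rank, so every rank builds the same permutation.
    order = list(range(dataset_size))
    random.Random(seed).shuffle(order)

    # Each SP group (not each rank) takes its own strided share of the data.
    group_order = order[group::num_groups]

    # Repeat each prompt once per generation, as GRPO samples several
    # completions per prompt.
    return [idx for idx in group_order for _ in range(num_generations)]


if __name__ == "__main__":
    # 4 GPUs, SP degree 2: ranks {0, 1} and {2, 3} form two SP groups.
    for r in range(4):
        print(r, grouped_indices(8, r, world_size=4, sp_degree=2, num_generations=3))
```

With a plain per-rank distributed sampler, ranks 0 and 1 would receive different samples, which would be incorrect under SP since they cooperate on the same sequences.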

To make this happen, we need to override a lot more code from the GRPO trainer (e.g., the __init__ function). This could be greatly improved by refactoring the trainer upstream to be a lot more modular and therefore more easily extensible; most of the code we're taking from the superclass is unchanged.

We were also able to fix an issue in the pad_to_sequence_len: false case: for specific input lengths, the ring_flash_attn batch ring attention function produced extremely large or inf gradient norms. Calling torch.compile on the function resolved this 🤔

This PR also includes some small changes / refactors.

Motivation and Context

GRPO for sufficiently advanced tasks will likely require longer sequence lengths than we can fit on a single GPU. Hence, let's add SP support.

Follow-ups:

Add zigzag, stripe batch ring attn adapters + data splitting, gathering logic
Upstream base GRPO trainer refactor (?)

How has this been tested?

Pytests (need additional coverage)
Manual testing of:
- SP + SFT (pad_to_sequence_len: false vs. pad_to_sequence_len: true, sample_packing: false vs. sample_packing: true, etc.)
- SP + GRPO

Screenshots (if appropriate)

Example GRPO + SP training run:

Note that the config differed a fair bit from the one in our blog post, so they're not directly comparable.

codecov · 2025-05-07T03:19:45Z

Codecov Report

Attention: Patch coverage is 40.76305% with 295 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/axolotl/core/trainers/grpo/trainer.py	13.96%	191 Missing ⚠️
src/axolotl/core/trainers/grpo/sampler.py	14.28%	48 Missing ⚠️
...rc/axolotl/utils/ctx_managers/sequence_parallel.py	70.90%	32 Missing ⚠️
src/axolotl/core/trainer_builder.py	70.27%	11 Missing ⚠️
src/axolotl/core/trainers/grpo/__init__.py	68.75%	5 Missing ⚠️
src/axolotl/utils/data/rl.py	70.00%	3 Missing ⚠️
src/axolotl/train.py	84.61%	2 Missing ⚠️
src/axolotl/utils/schemas/config.py	80.00%	2 Missing ⚠️
src/axolotl/core/trainers/dpo/__init__.py	50.00%	1 Missing ⚠️

salmanmohammadi

Really really nice. A few nits, and we need to sync the GRPOTrainer changes with upstream, but this looks good to me.

djsaunde · 2025-05-09T16:37:42Z

Note: GRPO + SP + Liger results in exploding losses. I suggest we merge this now (once tests pass) and follow up with a fix for this case.

… instability; lint

djsaunde requested review from salmanmohammadi and winglian May 7, 2025 03:05

djsaunde self-assigned this May 7, 2025

winglian reviewed May 7, 2025

Comment thread src/axolotl/core/trainers/grpo/trainer.py

winglian reviewed May 7, 2025

Comment thread src/axolotl/train.py Outdated

winglian reviewed May 7, 2025

Comment thread src/axolotl/utils/ctx_managers/sequence_parallel.py

winglian added this to the Axolotl v0.10.0 milestone May 7, 2025

salmanmohammadi reviewed May 8, 2025

Comment thread src/axolotl/common/datasets.py

salmanmohammadi reviewed May 8, 2025

Comment thread src/axolotl/core/trainer_builder.py

salmanmohammadi reviewed May 8, 2025

Comment thread src/axolotl/core/trainers/grpo/sampler.py

salmanmohammadi reviewed May 8, 2025

Comment thread src/axolotl/core/trainers/grpo/trainer.py
