Closed
111 commits
b23765c
initial proposal
garrett4wade Jun 11, 2025
d389cd8
add arealite
garrett4wade Jun 13, 2025
577f72c
.
garrett4wade Jun 13, 2025
989bc8a
change api
garrett4wade Jun 13, 2025
fd7103f
.
garrett4wade Jun 13, 2025
ad76c83
remove LOG_ROOT
garrett4wade Jun 14, 2025
7a51668
remove MODEL_SAVE_PATH
garrett4wade Jun 14, 2025
7adfedd
remove PARAM_REALLOC_PATH, DATASET_CACHE
garrett4wade Jun 14, 2025
170ec3a
prepare for testing
garrett4wade Jun 14, 2025
a433a29
prepare for testing
garrett4wade Jun 14, 2025
d15b091
ready for run
garrett4wade Jun 14, 2025
0b0aa9c
local run
garrett4wade Jun 14, 2025
cafc602
tests mainly pass
garrett4wade Jun 15, 2025
2f17696
format
garrett4wade Jun 15, 2025
b591015
.
garrett4wade Jun 16, 2025
7a81f55
Merge branch 'fw/gh/fix-init-constants' of https://github.com/inclusi…
garrett4wade Jun 16, 2025
0d07566
amend cluster.py
garrett4wade Jun 16, 2025
cd771ad
.
garrett4wade Jun 16, 2025
8f4370d
.
garrett4wade Jun 16, 2025
29b3d50
client test pass
garrett4wade Jun 16, 2025
dbb2703
pass rollout test
garrett4wade Jun 16, 2025
b0d0026
remove unused imports
garrett4wade Jun 16, 2025
da41bf1
add arealite readme
garrett4wade Jun 16, 2025
90cc896
change api
garrett4wade Jun 18, 2025
7de1863
.
garrett4wade Jun 18, 2025
5441e87
.
garrett4wade Jun 18, 2025
2a01a5a
.
garrett4wade Jun 18, 2025
1183bc4
.
garrett4wade Jun 19, 2025
1a63361
.
nuzant Jun 19, 2025
f9390da
.
garrett4wade Jun 19, 2025
a5e82f2
.
nuzant Jun 19, 2025
20c7cd8
.
nuzant Jun 19, 2025
1424e7a
.
garrett4wade Jun 19, 2025
8bf6dd1
Merge branch 'mzy/gh/fsdp2-engine' of https://github.com/inclusionAI/…
garrett4wade Jun 19, 2025
e4921d9
format
garrett4wade Jun 19, 2025
7fbe7d9
.
garrett4wade Jun 19, 2025
a218692
implement iteraptable generation (#112)
zhaochenyang20 Jun 19, 2025
4fc6e2c
update code
garrett4wade Jun 19, 2025
d5484f9
.
garrett4wade Jun 19, 2025
88bae72
fix
garrett4wade Jun 20, 2025
eeda029
Merge branch 'fw/gh/fix-init-constants' of code.alipay.com:inclusionA…
garrett4wade Jun 20, 2025
06829c6
Merge branch 'fw/refactor' of code.alipay.com:inclusionAI/AReaL into …
garrett4wade Jun 20, 2025
6ce5ec3
.
garrett4wade Jun 20, 2025
92d6364
.
garrett4wade Jun 20, 2025
866ceac
.
garrett4wade Jun 20, 2025
91c7de2
pass controller generate batch test
garrett4wade Jun 20, 2025
38e3cac
.
garrett4wade Jun 21, 2025
816e115
refactor rollout controller into worker and controller
garrett4wade Jun 21, 2025
211d461
.
garrett4wade Jun 21, 2025
8b87b23
.
garrett4wade Jun 21, 2025
6f1370c
.
garrett4wade Jun 22, 2025
331ca7c
change to async rollout
garrett4wade Jun 22, 2025
9bf9b18
pass rollout controller test
garrett4wade Jun 22, 2025
18aa285
pass test
garrett4wade Jun 22, 2025
e9d97f3
.
garrett4wade Jun 22, 2025
394a0ff
update readme
garrett4wade Jun 23, 2025
5ffce54
.
garrett4wade Jun 23, 2025
d3c2f15
sft debug
nuzant Jun 23, 2025
9129dd7
merge
nuzant Jun 23, 2025
6ebcbc5
.
garrett4wade Jun 23, 2025
7f1397e
Merge branch 'main' of https://github.com/inclusionAI/AReaL into fw/r…
garrett4wade Jun 23, 2025
302e876
add lisence
garrett4wade Jun 23, 2025
6ed10c9
remove unused files
garrett4wade Jun 23, 2025
7695179
remove unsed args in ppo
garrett4wade Jun 23, 2025
b4766bd
add hf engine wrapper (#116)
Jayon02 Jun 24, 2025
8d2bd4e
format
garrett4wade Jun 24, 2025
06060cf
format
nuzant Jun 24, 2025
b112d83
.
nuzant Jun 24, 2025
49a31c4
refine hf engine
garrett4wade Jun 24, 2025
5d5ac78
.
nuzant Jun 24, 2025
e7163ea
merge fw/refactor
nuzant Jun 24, 2025
ccdf037
fix
nuzant Jun 24, 2025
92e3a3d
add fsdp engine and sft tests
nuzant Jun 25, 2025
9b8306c
.
garrett4wade Jun 25, 2025
8019335
merge
garrett4wade Jun 25, 2025
f9643b7
.
garrett4wade Jun 25, 2025
aa8f4ef
.
garrett4wade Jun 25, 2025
d07f595
pass ppo unittest
garrett4wade Jun 25, 2025
242243b
pass ppo and rollout controller tests
garrett4wade Jun 26, 2025
3059640
clear unused imports
garrett4wade Jun 26, 2025
59d288b
rename ppo to grpo
garrett4wade Jun 26, 2025
1be260e
change reward function organization
garrett4wade Jun 26, 2025
eb431c1
reorganize code
garrett4wade Jun 26, 2025
63cd942
add dataset api
garrett4wade Jun 26, 2025
7e7240d
.
garrett4wade Jun 26, 2025
05a2df0
.
garrett4wade Jun 26, 2025
6ec4493
.
Jun 26, 2025
84ff759
format
Jun 26, 2025
f099bbd
chmod fix
nuzant Jun 26, 2025
15537cb
.
garrett4wade Jun 26, 2025
8c338e9
Merge branch 'fw/refactor' of https://code.alipay.com/inclusionAI/ARe…
garrett4wade Jun 26, 2025
9724c8a
rename workflow to collector
garrett4wade Jun 27, 2025
77a557c
refactor llm_client location
garrett4wade Jun 27, 2025
73b5b3e
.
garrett4wade Jun 27, 2025
4320da8
.
garrett4wade Jun 27, 2025
b424176
fix llm server api
garrett4wade Jun 27, 2025
d2a317d
refactor config structure
garrett4wade Jun 28, 2025
a2ade35
.
garrett4wade Jun 30, 2025
8612932
fix tests
garrett4wade Jun 30, 2025
91d6399
.
garrett4wade Jun 30, 2025
c66ed17
.
garrett4wade Jun 30, 2025
2ce1ece
.
garrett4wade Jul 1, 2025
09f339f
Fix unresolved issue in SFTTrainer PR (#139)
nuzant Jul 1, 2025
df5ee49
Merge branch 'fw/refactor' of https://github.com/inclusionAI/AReaL in…
garrett4wade Jul 1, 2025
d1f863c
Merge branch 'fw/refactor2' of https://code.alipay.com/inclusionAI/AR…
garrett4wade Jul 1, 2025
ab7503a
.
garrett4wade Jul 2, 2025
a5299b1
.
garrett4wade Jul 2, 2025
3a8796b
.
garrett4wade Jul 2, 2025
89a8d8c
.
garrett4wade Jul 4, 2025
078d3e1
Add CI for testing AReaLite (#150)
futrime Jul 7, 2025
9a06675
.
garrett4wade Jul 7, 2025
50 changes: 50 additions & 0 deletions .github/workflows/test-arealite.yml
@@ -0,0 +1,50 @@
name: Test AReaLite

on:
  push:
    paths:
      - .github/workflows/test-arealite.yml
      - arealite/**
      - ci/**
  workflow_dispatch:

jobs:
  test-arealite:
    runs-on: ubuntu-latest
    concurrency:
      group: test-arealite
    steps:
      - uses: actions/checkout@v4

      - uses: appleboy/ssh-action@v1
        env:
          GIT_REPO_URL: https://github.bibk.top/${{ github.repository }}
          GIT_COMMIT_SHA: ${{ github.sha }}
        with:
          host: ${{ secrets.CI_NODE_ADDR }}
          username: ${{ secrets.CI_NODE_USER }}
          key: ${{ secrets.REMOTE_SSH_KEY }}
          envs: GIT_REPO_URL,GIT_COMMIT_SHA
          script_path: ci/clone_repo.sh

      - uses: appleboy/ssh-action@v1
        env:
          GIT_COMMIT_SHA: ${{ github.sha }}
        with:
          host: ${{ secrets.CI_NODE_ADDR }}
          username: ${{ secrets.CI_NODE_USER }}
          key: ${{ secrets.REMOTE_SSH_KEY }}
          command_timeout: 2h
          envs: GIT_COMMIT_SHA
          script_path: ci/build_env_image.sh

      - uses: appleboy/ssh-action@v1
        env:
          GIT_COMMIT_SHA: ${{ github.sha }}
        with:
          host: ${{ secrets.CI_NODE_ADDR }}
          username: ${{ secrets.CI_NODE_USER }}
          key: ${{ secrets.REMOTE_SSH_KEY }}
          command_timeout: 1h
          envs: GIT_COMMIT_SHA
          script_path: ci/test_arealite.sh
213 changes: 213 additions & 0 deletions arealite/README.md
@@ -0,0 +1,213 @@
# AReaLite Design Doc

## Motivation

AReaL is too heavy for AI researchers to use, understand, and develop with, for several reasons. The most important is that its code architecture is *system-centric* rather than *AI-centric*: the RL algorithm workflow consists of multiple *workers* that run consecutive *model function calls*, and neither concept is familiar to most AI researchers. As a result, users must learn these concepts before they can develop workflows and algorithms for their own use cases.

Additionally, for historical reasons, AReaL's code is not clean. Large pieces of code inherited from previous projects are no longer useful, yet they significantly increase the burden on users and developers. Debugging is sometimes difficult even for core developers like myself.

Because the tools for building RL workflows are maturing, a framework with comparable efficiency can now be implemented in far fewer lines of code. Now is the right time to revisit the API design and distill the large codebase into a clean one. The distilled codebase does not need to be ultra-efficient. Instead, we want to deliver 90% of the original AReaL's functionality while minimizing the lines of code and the burden on potential users. Our aim is an RL training framework that is fast to use, fast to read, and fast to execute. The result is the lite version of AReaL: AReaLite.

AReaLite is the first step in AReaL's refactoring process. It is not only a standalone training library with shallow interfaces, but will also provide the core API definitions to be used by AReaL in the future. AReaL will essentially transform its current worker-based architecture into an AI-centric architecture like AReaLite. AReaL will **extend** AReaLite's APIs and implementations to support more backends for efficient large-scale training.

## Expectations of AReaLite

### Highlights

+ Fully asynchronous training with decoupled inference and training.
+ Elastic inference device scaling — users can shut down or launch more inference processes independently during training.
+ Full SFT/RL algorithmic functionality matching AReaL.
+ Arbitrary agentic rollout collector customization in a single file.
+ Easy navigation to implementation references via Ctrl+click in VSCode.
+ Support for distributed launching with Ray/SLURM/torchrun.

### AReaLite's Scope

+ Not bound to Ray.
+ Only supports SGLang and PyTorch FSDP2 with SPMD launching.
+ No customized data structures like `SequenceSample`. All data are PyTorch tensors.
+ Uses HuggingFace (models, datasets) and PyTorch (FSDP, data structures) as much as possible.

## Architecture

### Core Components

```
arealite/
├── api/ # Abstract interfaces and data structures
├── impl/ # Concrete implementations
├── cli/ # Command-line interfaces
├── config/ # Configuration templates
└── tests/ # Standalone test scripts
```

#### 1. API Layer (`api/`)

The API layer defines abstract interfaces and data structures that provide a clean contract between different components:

- **`engine_api.py`**: Defines `SPMDWrapper` for SPMD-based training backends (FSDP) and `EngineFactory`
- **`trainer_api.py`**: Defines `Trainer` base class for different training algorithms and `TrainerFactory`
- **`rollout_api.py`**: Defines `RolloutCollector`, `Agent`, `Environment` for RL data collection and `RolloutCollectorFactory`
- **`cli_args.py`**: Defines configuration dataclasses for all components

#### 2. Implementation Layer (`impl/`)

The implementation layer contains concrete implementations of the API interfaces:

- **`fsdp_wrapper.py`**: FSDP-based training engine using PyTorch FSDP2
- **`trainer/grpo.py`**: GRPO trainer implementation for reinforcement learning
- **`rollout_controller.py`**: Coordinates rollout data collection across workers
- **`rlvr/`**: RLVR collector implementations
- **`agentic/`**: Agentic collector implementations (math, code tasks)

#### 3. CLI Layer (`cli/`)

The CLI layer provides user-facing commands:

- **`main.py`**: Main entry point for launching complete training pipelines
- **`launch_server.py`**: Utility for launching standalone LLM servers

### Data Flow Architecture

AReaLite uses an **async producer-consumer pattern**:

```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   LLM Servers   │◄──►│ Rollout Workers  │───►│   Data Buffer   │
│    (SGLang)     │    │  (Async Batch)   │    │                 │
└─────────────────┘    └──────────────────┘    └─────────────────┘
         ▲                                              │
         │                                              ▼
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Checkpoints   │◄───│   FSDP Trainer   │◄───│  Training Loop  │
│                 │    │   (Sync Batch)   │    │                 │
└─────────────────┘    └──────────────────┘    └─────────────────┘
```

Collaborator: Add a main entry point annotation somewhere?

Collaborator Author: Added a data flow arch graph in the code walk-through example: https://github.com/inclusionAI/AReaL/blob/lite/docs/arealite/gsm8k_grpo.md
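The async producer-consumer pattern in the diagram above can be illustrated with a minimal `asyncio` sketch. It is not AReaLite's implementation: `rollout_worker`, `training_loop`, and `run_pipeline` are hypothetical names, the LLM call is replaced by a short sleep, and the "gradient step" only records the batch.

```python
import asyncio
import random


async def rollout_worker(worker_id: int, buffer: asyncio.Queue, n_episodes: int) -> None:
    """Producer: stands in for a rollout worker that queries an LLM server."""
    for episode in range(n_episodes):
        await asyncio.sleep(random.uniform(0.0, 0.01))  # simulated generation latency
        await buffer.put({"worker": worker_id, "episode": episode})


async def training_loop(buffer: asyncio.Queue, batch_size: int, n_steps: int) -> list:
    """Consumer: waits until a full batch is available, then runs one update."""
    batches = []
    for _ in range(n_steps):
        batch = [await buffer.get() for _ in range(batch_size)]
        batches.append(batch)  # a real trainer would take a gradient step here
    return batches


async def run_pipeline(n_workers: int = 4, n_episodes: int = 4, batch_size: int = 8) -> list:
    buffer = asyncio.Queue(maxsize=16)  # bounded buffer applies back-pressure
    n_steps = n_workers * n_episodes // batch_size
    producers = [
        asyncio.create_task(rollout_worker(i, buffer, n_episodes))
        for i in range(n_workers)
    ]
    batches = await training_loop(buffer, batch_size, n_steps)
    await asyncio.gather(*producers)
    return batches


batches = asyncio.run(run_pipeline())
```

Workers finish episodes at different times, so the trainer consumes whatever is buffered first; the bounded queue keeps producers from running arbitrarily far ahead of training.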

### Key Design Principles

#### 1. **AI-Centric API Design**
Unlike the original AReaL's system-centric approach with workers and model functions, AReaLite uses familiar ML concepts:
- `Agent` and `Environment` (from RL literature)
- `RolloutCollector` (combines multiple agents and the environment to generate rollout data)
- `Trainer` (from HuggingFace/PyTorch, fetches rollout data and updates model parameters)

#### 2. **Factory Pattern for Extensibility**
Each major component uses a factory pattern for easy customization:
- `EngineFactory` creates training backends
- `TrainerFactory` creates training algorithms
- `RolloutCollectorFactory` creates rollout collectors
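
As a rough illustration of the factory idea, the sketch below dispatches on `config.type`. It uses a registry dictionary rather than the `if` chain shown in the customization guide, and the `register` method, the `GRPOTrainer` class name, and the single-field `TrainerConfig` are assumptions made for this example, not AReaLite's actual definitions.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class TrainerConfig:
    type: str  # selects which trainer class to build, e.g. "grpo"


class Trainer:
    def __init__(self, config: TrainerConfig) -> None:
        self.config = config


class GRPOTrainer(Trainer):
    """Placeholder subclass; a real trainer would implement train()."""


class TrainerFactory:
    """Maps config.type to a constructor (hypothetical registry-based variant)."""

    def __init__(self) -> None:
        self._registry: Dict[str, Callable[[TrainerConfig], Trainer]] = {}

    def register(self, name: str, ctor: Callable[[TrainerConfig], Trainer]) -> None:
        self._registry[name] = ctor

    def make_trainer(self, config: TrainerConfig) -> Trainer:
        try:
            ctor = self._registry[config.type]
        except KeyError:
            raise ValueError(f"Unknown trainer type: {config.type!r}") from None
        return ctor(config)


factory = TrainerFactory()
factory.register("grpo", GRPOTrainer)
trainer = factory.make_trainer(TrainerConfig(type="grpo"))
```

With a registry, adding a trainer means one `register` call instead of editing the factory body; the trade-off is that registration has to happen before the factory is used.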

#### 3. **Configuration-Driven Architecture**
All components are configured through dataclasses defined in `cli_args.py`, enabling:
- Type-safe configuration
- Easy CLI argument generation
- Clear documentation of available options
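
One way nested dataclasses can back `key=value` command-line overrides is sketched below. The field names (`learning_rate`, `group_size`) and the `apply_override` helper are hypothetical and only show the mechanism; the real dataclasses live in `cli_args.py` and may be parsed differently.

```python
from dataclasses import dataclass, field, fields, is_dataclass


@dataclass
class GRPOConfig:
    learning_rate: float = 1e-5  # hypothetical field
    group_size: int = 8          # hypothetical field


@dataclass
class TrainerConfig:
    type: str = "grpo"
    grpo: GRPOConfig = field(default_factory=GRPOConfig)


@dataclass
class ExperimentConfig:
    experiment_name: str = "default"
    trainer: TrainerConfig = field(default_factory=TrainerConfig)


def apply_override(config, override: str) -> None:
    """Apply one `dotted.key=value` override, casting to the type of the current value."""
    key, raw = override.split("=", 1)
    *parents, leaf = key.split(".")
    node = config
    for name in parents:
        node = getattr(node, name)
    if not is_dataclass(node) or leaf not in {f.name for f in fields(node)}:
        raise KeyError(f"Unknown option: {key}")
    current = getattr(node, leaf)
    setattr(node, leaf, type(current)(raw))


cfg = ExperimentConfig()
for arg in ["experiment_name=my-exp",
            "trainer.grpo.learning_rate=3e-4",
            "trainer.grpo.group_size=16"]:
    apply_override(cfg, arg)
```

Because every option is a typed dataclass field, a misspelled key fails immediately instead of being silently ignored.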


## Implementation Details

### Training Pipeline

1. **Initialization**: Factory classes create configured instances of engines, trainers, and rollout collectors
2. **Rollout Phase**: `RolloutController` coordinates async data collection across multiple `RolloutWorker` instances
3. **Training Phase**: `Trainer` performs synchronous gradient updates using collected data
4. **Weight Updates**: Updated model weights are pushed to LLM servers via `update_weights_to()`
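
The four steps above can be sketched with stub classes. Everything here is a stand-in: the `Stub*` classes, the `collect` and `train_step` methods, and the argument passed to `update_weights_to()` are assumptions, and the loop is written synchronously for readability even though AReaLite collects rollouts asynchronously.

```python
from typing import List


class StubLLMServer:
    """Stand-in for an LLM server; only tracks the weight version it holds."""

    def __init__(self) -> None:
        self.weight_version = 0


class StubRolloutController:
    """Stand-in for RolloutController; returns fake trajectories."""

    def __init__(self, server: StubLLMServer) -> None:
        self.server = server

    def collect(self, batch_size: int) -> List[dict]:
        return [{"reward": 1.0, "weight_version": self.server.weight_version}
                for _ in range(batch_size)]


class StubTrainer:
    def __init__(self) -> None:
        self.version = 0

    def train_step(self, batch: List[dict]) -> None:
        self.version += 1  # a real trainer would run a gradient update here

    def update_weights_to(self, server: StubLLMServer) -> None:
        server.weight_version = self.version


def run(n_steps: int, batch_size: int) -> List[int]:
    server = StubLLMServer()                     # 1. initialization
    controller = StubRolloutController(server)
    trainer = StubTrainer()
    versions_seen = []
    for _ in range(n_steps):
        batch = controller.collect(batch_size)   # 2. rollout phase
        versions_seen.append(batch[0]["weight_version"])
        trainer.train_step(batch)                # 3. training phase
        trainer.update_weights_to(server)        # 4. weight update
    return versions_seen


versions = run(n_steps=3, batch_size=4)
```

Recording the weight version with each trajectory shows which policy generated the data, which matters once rollout and training overlap.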

### Rollout System

The rollout system supports arbitrary agentic rollout paradigms, each implemented as a `RolloutCollector`. `RolloutCollector` exposes a `run_episode` method in which users implement the logic for collecting one complete agentic trajectory. There are two ways to write a collector. Users can first implement gymnasium-compatible `Agent` and `Environment` interfaces and then combine them in a collector, as is standard in the RL literature (see `arealite/impl/agentic/`). Alternatively, when the agent-environment interfaces do not fit the use case, users can implement the collector directly (see `arealite/impl/rlvr/`).
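
A minimal sketch of the first approach, combining an agent and a gymnasium-style environment in a collector, might look like the following. The class bodies are illustrative only: the guessing-game environment, the binary-search agent, and the argument-free `arun_episode` (named after the async signature in the customization guide) are assumptions, and a real agent would query an LLM instead of computing a midpoint.

```python
import asyncio
from typing import Any, Dict, List, Optional, Tuple


class Environment:
    """Gymnasium-style environment: the agent must guess a hidden integer."""

    def __init__(self, target: int) -> None:
        self.target = target

    def reset(self) -> Tuple[str, Dict[str, Any]]:
        return "start", {}

    def step(self, action: int) -> Tuple[str, float, bool, bool, Dict[str, Any]]:
        done = action == self.target
        obs = "correct" if done else ("higher" if action < self.target else "lower")
        # (observation, reward, terminated, truncated, info)
        return obs, float(done), done, False, {}


class Agent:
    """Binary-search policy; a real agent would call an LLM here."""

    def __init__(self, low: int, high: int) -> None:
        self.low, self.high = low, high
        self.last: Optional[int] = None

    async def act(self, obs: str) -> int:
        if obs == "higher":
            self.low = self.last + 1
        elif obs == "lower":
            self.high = self.last - 1
        self.last = (self.low + self.high) // 2
        return self.last


class Collector:
    """Runs one agent-environment episode and returns the trajectory."""

    def __init__(self, agent: Agent, env: Environment, max_steps: int = 16) -> None:
        self.agent, self.env, self.max_steps = agent, env, max_steps

    async def arun_episode(self) -> List[dict]:
        obs, _ = self.env.reset()
        trajectory = []
        for _ in range(self.max_steps):
            action = await self.agent.act(obs)
            obs, reward, terminated, truncated, _ = self.env.step(action)
            trajectory.append({"action": action, "reward": reward})
            if terminated or truncated:
                break
        return trajectory


trajectory = asyncio.run(
    Collector(Agent(0, 100), Environment(target=37)).arun_episode()
)
```

Keeping the episode loop in the collector means the agent and environment stay independent and can be swapped without touching the trajectory logic.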

## Expected Usage

### Basic RL Training
```bash
python3 -m arealite.cli.main \
    experiment_name=my-exp trial_name=my-trial \
    trainer.type=grpo \
    trainer.grpo.actor.path=Qwen/Qwen2-0.5B
```

### Rollout-Only Evaluation
```bash
python3 -m arealite.cli.main \
    trainer.type=null \
    valid_dataset.path=huggingface/dataset
```

### Distributed Training
```bash
python3 -m arealite.cli.main \
    mode=ray \
    allocation_mode=sglang.d16p1m1+d32p2m1 \
    trainer.type=grpo
```
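
The `allocation_mode` string is not explained in this document. The sketch below shows one plausible reading, stated as an assumption: the part before `+` describes the SGLang inference allocation, the part after it describes the training allocation, and `d`, `p`, `m` are data-, pipeline-, and model-parallel degrees whose product is the number of devices. The helper names are hypothetical and the parser only accepts the `d..p..m..` order used in the example.

```python
import re
from typing import Dict

# Matches specs written in the order used above, e.g. "d16p1m1".
_DEGREES = re.compile(r"^d(\d+)p(\d+)m(\d+)$")


def parse_degrees(spec: str) -> Dict[str, int]:
    match = _DEGREES.match(spec)
    if match is None:
        raise ValueError(f"Unrecognized parallelism spec: {spec!r}")
    d, p, m = (int(group) for group in match.groups())
    # Assumption: the device count is the product of the three degrees.
    return {"d": d, "p": p, "m": m, "world_size": d * p * m}


def parse_allocation_mode(mode: str) -> Dict[str, Dict[str, int]]:
    first, second = mode.split("+")
    backend, _, first_spec = first.partition(".")
    if backend != "sglang":
        raise ValueError(f"Unsupported inference backend: {backend!r}")
    return {
        "inference": parse_degrees(first_spec),  # assumed: SGLang servers
        "training": parse_degrees(second),       # assumed: trainer processes
    }


alloc = parse_allocation_mode("sglang.d16p1m1+d32p2m1")
```

Under these assumptions the example command would request 16 inference devices and 64 training devices.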

## Customization Guide

### Adding New Trainers

1. **Implement trainer class** in `impl/trainer/`:
```python
from arealite.api.trainer_api import Trainer

class MyTrainer(Trainer):
    def train(self, resume_from_checkpoint=None):
        # Implementation here
        pass
```

2. **Add configuration** in `cli_args.py`:
```python
from dataclasses import dataclass

@dataclass
class MyTrainerConfig:
    learning_rate: float = 1e-4
```

3. **Register in factory** in `trainer_api.py`:
```python
def make_trainer(self, config: TrainerConfig) -> Trainer:
    if config.type == "my_trainer":
        return MyTrainer(...)
```

### Adding New Rollout Collectors

1. **Implement collector** in `impl/`:
```python
from arealite.api.rollout_api import RolloutCollector

class MyCollector(RolloutCollector):
    async def arun_episode(self, gconfig, env_option=None, seed=None):
        # Implementation here
        pass
```

2. **Register in factory** in `rollout_api.py`:
```python
def make_collector(self, config: RolloutCollectorConfig):
    if config.type == "my_collector":
        return MyCollector(...)
```

Collaborator: One thing production systems often lack is systematic support for reward models, verifiers, and generative reward models. From the RL point of view these can be encapsulated in the concept of an environment, but integrating them is cumbersome. Mentioning this a bit here might be good for those who use RL in production.

Collaborator Author: Reward models can be implemented as additional TrainEngines, just like the reference model. Generative reward models are encapsulated in the RolloutWorkflow object (formerly the RolloutCollector); for example, the agent can call the same LLM with a different prompt to judge whether the previously generated answer is correct.

## Roadmap

- [ ] Finalize API design. (In-progress)
- [x] Implement standalone SGLang server (`impl/sglang_server.py`).
- [x] Implement SGLang client generation (`impl/sglang_client.py`).
- [x] Rollout pipeline (`tests/test_rollout.py`).
- [x] SGLang rollout interruption.
- [x] Asynchronous RL system-wide utilities (e.g., `RolloutController`).
- [ ] Various launching scripts: ray, torchrun, slurm.
- [ ] FSDP2 engine with transformers models. (In-progress)
- [ ] SFT trainer. (In-progress)
- [ ] SGLang update weights. (In-progress)
- [x] GRPO trainer.
- [ ] Add benchmarking against original AReaL
- [ ] CI and unittests.
- [ ] Other RL algorithms (DPO, REINFORCE, etc.)
- [ ] Support for multi-modal models
- [ ] User guide for transitioning from v0.3.0.
- [ ] Advanced agentic collectors (tool use, planning)
- [ ] Examples of training GSM8K, TLDR, and a search agent.
- [ ] Allow external persistent SGLang servers for debugging purposes.