[WIP] [Refactor] Add AReaLite API and examples. #125
Conversation
Commits:
- efficient loading
- format
(plus several commits with message ".")
    ┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
    │   Checkpoints   │◄───│   FSDP Trainer   │◄───│  Training Loop  │
    │                 │    │   (Sync Batch)   │    │                 │
    └─────────────────┘    └──────────────────┘    └─────────────────┘
Add a main entry point annotation somewhere?
Added a data flow arch graph in the code walk-through example: https://github.com/inclusionAI/AReaL/blob/lite/docs/arealite/gsm8k_grpo.md
    if config.type == "my_collector":
        return MyCollector(...)
One thing production systems often lack is systematic support for reward models, verifiers, and generative reward models. From the RL point of view these can be encapsulated in the concept of an environment, but it is cumbersome to integrate them. Addressing this a bit here might be good for those who are using RL in production.
Reward models can be implemented as additional TrainEngines, just like the reference model. Generative reward models are encapsulated in the RolloutWorkflow object (aka the previous RolloutCollector), e.g., the agent can call the same LLM with different prompts to judge whether the previously generated answer is correct.
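As a rough illustration of that design (the class and method names `JudgeRewardWorkflow`, `client.agenerate`, and `arun_episode` are hypothetical, not the actual AReaLite API), a generative reward model can live inside the rollout workflow and reuse the same LLM with a judging prompt:

```python
class JudgeRewardWorkflow:
    """Hypothetical sketch: a generative reward model inside a rollout workflow."""

    def __init__(self, client, judge_prompt_template: str):
        self.client = client  # the same LLM client used for rollout generation
        self.judge_prompt_template = judge_prompt_template

    async def arun_episode(self, prompt: str, reference_answer: str) -> dict:
        # Generate the answer with the policy prompt.
        answer = await self.client.agenerate(prompt)
        # Reuse the same LLM with a judge prompt to score the answer.
        judge_prompt = self.judge_prompt_template.format(
            question=prompt, answer=answer, reference=reference_answer
        )
        verdict = await self.client.agenerate(judge_prompt)
        reward = 1.0 if "correct" in verdict.lower() else 0.0
        return {"prompt": prompt, "answer": answer, "reward": reward}
```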
    def __str__(self):
        """Returns compact string representation: 'Parallel(mp=X,pp=Y,dp=Z)'."""
        return (
            f"Parallel(mp={self.tensor_parallel_size},"
mp -> tp ?
No "mp" any more.
    c_clip: Optional[float] = field(
        default=None,
        metadata={
            "help": "Dual clipping factor for policy ratio, must > 1.0. None disables dual clipping."
Is there any data validation, similar to Pydantic, to check the range of parameters here?
Good idea. I'll mark that.
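A minimal sketch of what such a check could look like with a plain dataclass `__post_init__` (the `PPOConfig` class name is hypothetical; only the `c_clip` field from the quoted snippet is shown):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PPOConfig:
    c_clip: Optional[float] = field(
        default=None,
        metadata={
            "help": "Dual clipping factor for policy ratio, must be > 1.0. "
                    "None disables dual clipping."
        },
    )

    def __post_init__(self):
        # Range validation without Pydantic: reject invalid values at construction time.
        if self.c_clip is not None and self.c_clip <= 1.0:
            raise ValueError(f"c_clip must be > 1.0 or None, got {self.c_clip}")
```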
    # Licensed under the Apache License, Version 2.0

    import abc
    import asyncio
I saw there is uvloop as the async event loop manager. Is it actually used in the project?
Changed to uvloop.run wherever asyncio.run was used.
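For reference, a small sketch of that pattern with a placeholder coroutine: `uvloop.run()` is a drop-in for `asyncio.run()` on recent uvloop versions, with a fallback to installing the uvloop event-loop policy on older ones.

```python
import asyncio
import uvloop

async def main():
    await asyncio.sleep(0)  # placeholder coroutine

if hasattr(uvloop, "run"):
    # Recent uvloop versions provide uvloop.run() as a drop-in for asyncio.run().
    uvloop.run(main())
else:
    # Older versions: install uvloop's event loop policy, then use asyncio.run().
    uvloop.install()
    asyncio.run(main())
```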
    # Cleanup registry
    try:
        self.registry.unregister_server(self.server_id)
    except Exception as e:
The catch seems a bit too wide
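One possible narrowing, sketched as a method body; the concrete exception types are an assumption, since they depend on what `unregister_server` can actually raise:

```python
import logging

logger = logging.getLogger(__name__)

def cleanup(self):
    # Illustrative only: catch the failures we expect during unregistration
    # (e.g. a missing entry or a lost connection) instead of every Exception.
    try:
        self.registry.unregister_server(self.server_id)
    except (KeyError, ConnectionError) as e:
        logger.warning("Failed to unregister server %s: %s", self.server_id, e)
```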
    return Trajectory(
        prompt=env_option,
        data=dict(rewards=torch.tensor(rewards), **pad_sequences_to_tensors(data)),
Why unpack **pad_sequences_to_tensors(data) inline here instead of returning data=data_dict? TensorDict is not that bad, though.
It now uses TensorDict as the basic data structure.
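A small sketch of what TensorDict-based rollout data could look like (the field names and shapes are illustrative, not the exact AReaLite schema):

```python
import torch
from tensordict import TensorDict

# Build a batch of rollout data where every field shares the same batch dimension.
batch = TensorDict(
    {
        "input_ids": torch.zeros(4, 128, dtype=torch.long),
        "attention_mask": torch.ones(4, 128, dtype=torch.bool),
        "rewards": torch.zeros(4),
    },
    batch_size=[4],
)

# TensorDict applies batched ops (indexing, device moves, stacking) to all fields at once.
first_two = batch[:2]
moved = batch.to("cpu")  # or "cuda" when available
```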
    # Run batched rollout by submitting requests to LLM servers
    trajs = self.rollout_controller.generate_batch(
        batch_size=len(data),
        env_options=data,
This is good; the split between prepare_batch and generate_batch is very clear.
    base_model_path=self.config.actor.path,
    )

    assert len(mb_stats) == self.config.ppo_n_minibatches
Throw some hints here such as f"Micro batch mismatch, current {mb_stats} and config {self.config.ppo_n_minibatches}. Check your configuration. "
haha, why is this empty?
It's not empty in the new lite branch. :)
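For reference, the assertion message suggested above could look roughly like this (attribute names follow the quoted snippet; the surrounding trainer code is omitted):

```python
        assert len(mb_stats) == self.config.ppo_n_minibatches, (
            f"Micro-batch count mismatch: got {len(mb_stats)} stats entries but "
            f"ppo_n_minibatches={self.config.ppo_n_minibatches}. Check your configuration."
        )
```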
Commits:
- ci: add test-arealite
- ci: add checkout before running test-arealite
- ci: add USERNAME
- ci: add test script
- ci: add GitHub mirror
- ci: fix typo
- ci: clone one commit
- ci: fix condition
- ci: set command timeout to 60m
- ci: enable pip cache
- ci: optimize container lifecycle
- ci: split into many stages
- ci(test-arealite): fix typo
- ci: fix wrong env
- ci: fix pytest
- ci: uninstall transformer-engine
- ci: uninstall transformer-engine
- ci: fix model paths
- ci: show stdout/stderr
- ci: fix not clean up
- ci: backup sglang
- ci: remove tmp repo dir when run
- ci: fix docker run exit 1 condition
- ci(test-arealite): limit the concurrency and extend command timeout
@tsaoyu Hi Yu, thank you for the thorough review! We're currently finalizing some internal discussions about the API design, and the final implementation will differ slightly from what's currently proposed. We'd like to defer addressing these specific issues for a few days while we complete those discussions. Your feedback is valuable and we'll incorporate it into our revised implementation. We'd appreciate having you review again once the final version is ready. Thanks for your patience!
Closed since #154 has been merged.
Motivation
AReaL is too heavy for AI researchers to use, understand, and develop with, for several reasons. The most important issue is that its code architecture is system-centric rather than AI-centric — the RL algorithm workflow consists of multiple workers that run consecutive model function calls, neither of which is a well-known concept for AI researchers. As a result, users must first understand these concepts before they can develop workflows and algorithms for their own use cases.
Additionally, due to historical reasons, AReaL's code is not clean. There are large pieces of code inherited from previous projects that are not useful but significantly increase the burden on users and developers. Sometimes debugging is difficult even for core developers like myself.
Since the tools for building RL workflows are becoming increasingly mature, implementing a framework with comparable efficiency requires far fewer lines of code. Now is the proper time to revisit the API design and distill the giant codebase into a neat and clean one. The distilled codebase does not need to be ultra-efficient; instead, we want to deliver 90% of the original AReaL's functionality while minimizing the lines of code and the burden on potential users. Our aim is to build an RL training framework that is fast to use, fast to read, and fast to execute. Here comes the lite version of AReaL — AReaLite.
AReaLite is the first step in AReaL's refactoring process. It is not only a standalone training library with shallow interfaces, but will also provide the core API definitions to be used by AReaL in the future. AReaL will essentially transform its current worker-based architecture into an AI-centric architecture like AReaLite. AReaL will extend AReaLite's APIs and implementations to support more backends for efficient large-scale training.
Expectations of AReaLite
Highlights
AReaLite's Scope
- No SequenceSample. All data are PyTorch tensors.

Architecture
Core Components
Data Flow Architecture
AReaLite uses an async producer-consumer pattern: rollout workflows asynchronously produce trajectories, which are buffered and consumed in batches by the trainer.
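A minimal, illustrative sketch of that pattern with `asyncio` queues (the trajectory contents and batch handling are placeholders, not the AReaLite implementation):

```python
import asyncio

async def producer(queue: asyncio.Queue):
    # Stand-in for rollout workflows pushing finished trajectories.
    for i in range(8):
        trajectory = {"prompt_id": i, "reward": 1.0}
        await queue.put(trajectory)
    await queue.put(None)  # sentinel: no more data

async def consumer(queue: asyncio.Queue, batch_size: int = 4):
    # Stand-in for the trainer consuming buffered trajectories in batches.
    batch = []
    while True:
        item = await queue.get()
        if item is None:
            break
        batch.append(item)
        if len(batch) == batch_size:
            print(f"train step on {len(batch)} trajectories")
            batch.clear()

async def main():
    queue: asyncio.Queue = asyncio.Queue(maxsize=16)
    await asyncio.gather(producer(queue), consumer(queue))

asyncio.run(main())
```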
Key Design Principles
1. AI-Centric API Design
Unlike the original AReaL's system-centric approach with workers and model functions, AReaLite uses familiar ML concepts:
- `Agent` and `Environment` (from RL literature)
- `RolloutWorkflow` (combines multiple agents and the environment to generate rollout data)
- `Trainer` (from HuggingFace/PyTorch, fetches rollout data and updates model parameters)
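A rough sketch of how these concepts can map onto interfaces (the method signatures are assumptions for illustration, not the actual AReaLite definitions):

```python
import abc

class RolloutWorkflow(abc.ABC):
    @abc.abstractmethod
    async def arun_episode(self, prompt) -> dict:
        """Run the agent(s) against the environment and return one trajectory."""

class Trainer(abc.ABC):
    @abc.abstractmethod
    def train_batch(self, batch) -> dict:
        """Consume a batch of rollout data and update model parameters."""
```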
2. Factory Pattern for Extensibility
Each major component uses a factory pattern for easy customization (a minimal sketch follows the list below):
- `EngineFactory` creates training backends
- `TrainerFactory` creates training algorithms
- `RolloutWorkflowFactory` creates rollout workflows
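A self-contained sketch of this factory-style dispatch, echoing the `config.type == "my_collector"` snippet quoted earlier in the review (the config layout and workflow class are placeholders):

```python
from dataclasses import dataclass

@dataclass
class WorkflowConfig:
    type: str = "my_collector"  # placeholder type tag

class MyCollector:
    """Placeholder for a user-defined rollout workflow."""
    def __init__(self, config: WorkflowConfig):
        self.config = config

def make_rollout_workflow(config: WorkflowConfig):
    # Dispatch on the config's type tag so users can register custom workflows.
    if config.type == "my_collector":
        return MyCollector(config)
    raise ValueError(f"Unknown rollout workflow type: {config.type}")

workflow = make_rollout_workflow(WorkflowConfig())
```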
3. Configuration-Driven Architecture
All components are configured through dataclasses defined in cli_args.py.
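A sketch of this dataclass-driven configuration style. The field names `actor.path`, `ppo_n_minibatches`, and `c_clip` appear elsewhere in this PR, but the class names, defaults, and exact nesting here are assumptions; the real definitions live in cli_args.py.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ActorConfig:
    path: str = "/path/to/base-model"  # placeholder model path

@dataclass
class TrainerConfig:
    actor: ActorConfig = field(default_factory=ActorConfig)
    ppo_n_minibatches: int = 4
    c_clip: Optional[float] = None  # dual clipping disabled by default
```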
Roadmap
- SGLang server (impl/sglang_server.py)
- SGLang client (impl/sglang_client.py)
- Rollout tests (tests/test_rollout.py)
- Rollout controller (RolloutController)