
🚀 Scaling VLA Training via Reinforcement Learning

Paper · GitHub · Hugging Face Collection · Twitter · WeChat

We demonstrate that even simple 0/1 rewards can enable effective, scalable, generalizable online RL for VLA models.


Overview of SimpleVLA-RL. SimpleVLA-RL is an efficient RL framework for VLA models that improves long-horizon planning under data scarcity, outperforms SFT in both simulation and real-world tasks, reveals a “pushcut” phenomenon in which new action patterns emerge during RL, and strengthens spatial, object, and goal generalization.

🎉News

  • [2025-10-01] SimpleVLA-RL now supports the RoboTwin2.0 benchmark. Feel free to experiment with it!
  • [2025-09-12] Excited to release the SimpleVLA-RL paper! Check it out: Paper.
  • [2025-05-27] We release the code of SimpleVLA-RL.

📖Overview

We introduce SimpleVLA-RL, a simple yet effective approach to online Reinforcement Learning (RL) for Vision-Language-Action (VLA) models that uses only outcome-level, rule-based 0/1 reward signals obtained directly from simulation environments.

Overview of SimpleVLA-RL.

📃Main Results

We evaluate SimpleVLA-RL on the LIBERO benchmark using OpenVLA-OFT. SimpleVLA-RL improves OpenVLA-OFT to 97.6 points on LIBERO-Long, setting a new state of the art. Remarkably, using only one trajectory per task for cold-start SFT, SimpleVLA-RL raises OpenVLA-OFT from 17.3 to 91.7 points, an improvement of 74.4 points (430.1%).

Main Results of SimpleVLA-RL.

✨Getting Started

1. Set Up the Environment

See SETUP.md for detailed instructions on setting up the conda environment.

2. Prepare the SFT Model

An SFT (Supervised Fine-Tuning) VLA model is required for RL training. Below are the available options:

  • OpenVLA-OFT SFT Models
    Download from the SimpleVLA-RL Collection; a download sketch follows this list. Available models include:

    • libero-10 traj1/trajall SFT
    • libero-goal/object/spatial traj1 SFT
    • RoboTwin2.0 tasks traj1000 SFT
  • OpenVLA SFT Models
    Download from here.

  • Other Models
    For other models, you may need to fine-tune them yourself.
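
For reference, here is a minimal download sketch using the Hugging Face CLI. The repository ID and local directory are illustrative placeholders, not the actual names in the SimpleVLA-RL Collection; substitute the model you picked above.

    # Sketch: download an SFT checkpoint from the Hugging Face Hub.
    # <org>/<sft-model-repo> and the local directory are placeholders.
    pip install -U "huggingface_hub[cli]"
    huggingface-cli download <org>/<sft-model-repo> --local-dir ./checkpoints/sft_model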

3. Train with SimpleVLA-RL

Before running the training script, ensure the following configurations are properly set:

  • Set Your Weights and Biases (WandB) API Key
    Replace the WANDB_API_KEY field in SimpleVLA-RL/align.json with your own WandB API key.

  • Modify Key Variables
    Update the following variables in examples/run_openvla_oft_rl_libero.sh or examples/run_openvla_oft_rl_twin2.sh as needed (a sketch of example settings follows this list):

    • WANDB_API_KEY: Your WandB API key.
    • EXPERIMENT_NAME: The name of your experiment. You can choose any name.
    • SFT_MODEL_PATH: Path to your SFT model.
    • CKPT_PATH: Path where your checkpoints will be saved.
    • DATASET_NAME: The dataset to train on; for the available options, refer to examples/run_openvla_oft_rl_libero.sh or examples/run_openvla_oft_rl_twin2.sh.
    • ALIGN_PATH: Path to the SimpleVLA-RL/align.json file.
    • NUM_GPUS: Number of GPUs available per node (e.g., 8).
    • NUM_NODES: Number of nodes used for RL training (e.g., 1).
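
As a reference, here is a minimal sketch of what these settings might look like in the training script. All values are illustrative placeholders, not recommended settings; use the paths and names from your own setup.

    # Illustrative values only -- replace with your own settings.
    WANDB_API_KEY="your-wandb-api-key"          # also set in SimpleVLA-RL/align.json
    EXPERIMENT_NAME="openvla_oft_libero_rl"     # any name you like
    SFT_MODEL_PATH="/path/to/sft_model"         # SFT checkpoint from step 2
    CKPT_PATH="/path/to/save/checkpoints"
    DATASET_NAME="libero_10"                    # placeholder; see the script for valid options
    ALIGN_PATH="/path/to/SimpleVLA-RL/align.json"
    NUM_GPUS=8                                  # GPUs per node
    NUM_NODES=1                                 # nodes used for RL training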

Note

  • The script has been tested on the following configurations:
    • Single-node setup: NUM_NODES=1, NUM_GPUS=8 (1 node with 8 NVIDIA A800 GPUs, each having 80GB memory).
    • Multi-node setup: NUM_NODES=2, NUM_GPUS=8 (2 nodes with 16 NVIDIA A800 GPUs, each having 80GB memory).
  • The tested driver version is 470.161.03 and the CUDA version is 12.4; matching these exact versions is not required.

  • Run RL Training
    Use the following command to start RL training for OpenVLA-OFT on the LIBERO or RoboTwin2.0 benchmark:

    bash examples/run_openvla_oft_rl_libero.sh
    or
    bash examples/run_openvla_oft_rl_twin2.sh

4. Run Evaluation

To evaluate your model, enable evaluation mode by setting trainer.val_only=True in examples/run_openvla_oft_rl_libero.sh or examples/run_openvla_oft_rl_twin2.sh. Then execute the same script:

bash examples/run_openvla_oft_rl_libero.sh
or
bash examples/run_openvla_oft_rl_twin2.sh
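
As a sketch, assuming the training scripts pass hydra-style key=value overrides to the veRL trainer (the usual pattern in veRL-based scripts), the evaluation-only switch would sit inside the script as shown below; the exact line may differ in your copy.

    # Inside examples/run_openvla_oft_rl_libero.sh (or ..._twin2.sh), add or edit
    # the override that puts the trainer in evaluation-only mode:
    #   trainer.val_only=True
    # Then re-run the same entry script:
    bash examples/run_openvla_oft_rl_libero.sh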

🌻Acknowledgement

We developed this preview version of the code based on veRL, OpenVLA-OFT, RoboTwin2.0, and PRIME, and we gratefully acknowledge their contributions. For further details and updates, please refer to the official documentation and repositories of these projects.

📨Contact

📝TODO

  • Models:
    • ✅ Support OpenVLA and OpenVLA-OFT
    • ⏳ Support Pi0 fast tokenizer
  • Benchmarks:
    • ✅ Support LIBERO benchmark
    • ✅ Support RoboTwin benchmark

🎈Citation

If you find SimpleVLA-RL helpful, please cite us.

@article{li2025simplevla,
  title={SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning},
  author={Li, Haozhan and Zuo, Yuxin and Yu, Jiale and Zhang, Yuhao and Yang, Zhaohui and Zhang, Kaiyan and Zhu, Xuekai and Zhang, Yuchen and Chen, Tianxing and Cui, Ganqu and others},
  journal={arXiv preprint arXiv:2509.09674},
  year={2025}
}