
🚀 Scaling VLA Training via Reinforcement Learning

Paper · GitHub · Hugging Face Collection · Twitter · WeChat

We demonstrate that even simple 0/1 rewards can enable effective, scalable, generalizable online RL for VLA models.


Overview of SimpleVLA-RL. SimpleVLA-RL is an efficient RL framework for VLA models that improves long-horizon planning under data scarcity, outperforms SFT in both simulation and real-world tasks, reveals a “pushcut” phenomenon in which new action patterns emerge during RL, and strengthens spatial, object, and goal generalization.

🎉News

  • [2025-10-01] SimpleVLA-RL now supports the RoboTwin2.0 benchmark. Feel free to experiment with it!
  • [2025-09-12] Excited to release the SimpleVLA-RL paper! Check it out: Paper.
  • [2025-05-27] We release the code of SimpleVLA-RL.

📖Overview

We introduce SimpleVLA-RL, a simple yet effective approach to online Reinforcement Learning (RL) for Vision-Language-Action (VLA) models that uses only outcome-level, rule-based 0/1 reward signals obtained directly from simulation environments.

Overview of SimpleVLA-RL.

📃Main Results

We evaluate SimpleVLA-RL on the LIBERO benchmark using OpenVLA-OFT. SimpleVLA-RL improves OpenVLA-OFT to 97.6 points on LIBERO-Long, setting a new state of the art. Remarkably, using only one trajectory per task for cold-start SFT, SimpleVLA-RL raises OpenVLA-OFT from 17.3 to 91.7 points, an improvement of 74.4 points (430.1%).

Main Results of SimpleVLA-RL.

✨Getting Started

1. Set Up the Environment

See SETUP.md for detailed instructions on setting up the conda environment.

2. Prepare the SFT Model

An SFT (Supervised Fine-Tuning) VLA model is required for RL training. Below are the available options:

  • OpenVLA-OFT SFT Models
    Download from the SimpleVLA-RL Collection; a download sketch follows this list. Available models include:

    • libero-10 traj1/trajall SFT
    • libero-goal/object/spatial traj1 SFT
    • RoboTwin2.0 tasks traj1000 SFT
  • OpenVLA SFT Models
    Download from here.

  • Other Models
    For other models, you may need to fine-tune them yourself.
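
For reference, here is a minimal download sketch using the Hugging Face CLI. The repository ID and local directory are illustrative placeholders, not the actual names in the SimpleVLA-RL Collection; substitute the model you picked above.

    # Sketch: download an SFT checkpoint from the Hugging Face Hub.
    # <org>/<sft-model-repo> and the local directory are placeholders.
    pip install -U "huggingface_hub[cli]"
    huggingface-cli download <org>/<sft-model-repo> --local-dir ./checkpoints/sft_model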

3. Train with SimpleVLA-RL

Before running the training script, ensure the following configurations are properly set:

  • Set Your Weights and Biases (WandB) API Key
    Replace the WANDB_API_KEY field in SimpleVLA-RL/align.json with your own WandB API key.

  • Modify Key Variables
    Update the following variables in examples/run_openvla_oft_rl_libero.sh or examples/run_openvla_oft_rl_twin2.sh as needed (a sketch of example settings follows this list):

    • WANDB_API_KEY: Your WandB API key.
    • EXPERIMENT_NAME: The name of your experiment. You can choose any name.
    • SFT_MODEL_PATH: Path to your SFT model.
    • CKPT_PATH: Path where your checkpoints will be saved.
    • DATASET_NAME: The dataset to train on; for the available options, refer to examples/run_openvla_oft_rl_libero.sh or examples/run_openvla_oft_rl_twin2.sh.
    • ALIGN_PATH: Path to the SimpleVLA-RL/align.json file.
    • NUM_GPUS: Number of GPUs available per node (e.g., 8).
    • NUM_NODES: Number of nodes used for RL training (e.g., 1).
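
As a reference, here is a minimal sketch of what these settings might look like in the training script. All values are illustrative placeholders, not recommended settings; use the paths and names from your own setup.

    # Illustrative values only -- replace with your own settings.
    WANDB_API_KEY="your-wandb-api-key"          # also set in SimpleVLA-RL/align.json
    EXPERIMENT_NAME="openvla_oft_libero_rl"     # any name you like
    SFT_MODEL_PATH="/path/to/sft_model"         # SFT checkpoint from step 2
    CKPT_PATH="/path/to/save/checkpoints"
    DATASET_NAME="libero_10"                    # placeholder; see the script for valid options
    ALIGN_PATH="/path/to/SimpleVLA-RL/align.json"
    NUM_GPUS=8                                  # GPUs per node
    NUM_NODES=1                                 # nodes used for RL training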

Note

  • The script has been tested on the following configurations:
    • Single-node setup: NUM_NODES=1, NUM_GPUS=8 (1 node with 8 NVIDIA A800 GPUs, each having 80GB memory).
    • Multi-node setup: NUM_NODES=2, NUM_GPUS=8 (2 nodes with 16 NVIDIA A800 GPUs, each having 80GB memory).
  • The tested driver version is 470.161.03 and the CUDA version is 12.4; matching these exact versions is not required.

  • Run RL Training
    Use the following command to start RL training for OpenVLA-OFT on the LIBERO or RoboTwin2.0 benchmark:

    bash examples/run_openvla_oft_rl_libero.sh
    or
    bash examples/run_openvla_oft_rl_twin2.sh

4. Run Evaluation

To evaluate your model, enable evaluation mode by setting trainer.val_only=True in examples/run_openvla_oft_rl_libero.sh or examples/run_openvla_oft_rl_twin2.sh. Then execute the same script:

bash examples/run_openvla_oft_rl_libero.sh
or
bash examples/run_openvla_oft_rl_twin2.sh
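
As a sketch, assuming the training scripts pass hydra-style key=value overrides to the veRL trainer (the usual pattern in veRL-based scripts), the evaluation-only switch would sit inside the script as shown below; the exact line may differ in your copy.

    # Inside examples/run_openvla_oft_rl_libero.sh (or ..._twin2.sh), add or edit
    # the override that puts the trainer in evaluation-only mode:
    #   trainer.val_only=True
    # Then re-run the same entry script:
    bash examples/run_openvla_oft_rl_libero.sh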

🌻Acknowledgement

We developed this preview version of the code based on veRL, OpenVLA-OFT, RoboTwin2.0, and PRIME, and we gratefully acknowledge their contributions. For further details and updates, please refer to the official documentation and repositories of these projects.

📨Contact

📝TODO

  • Models:
    • ✅ Support OpenVLA and OpenVLA-OFT
    • ⏳ Support Pi0 fast tokenizer
  • Benchmarks:
    • ✅ Support LIBERO benchmark
    • ✅ Support RoboTwin benchmark

🎈Citation

If you find SimpleVLA-RL helpful, please cite us.

@article{li2025simplevla,
  title={SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning},
  author={Li, Haozhan and Zuo, Yuxin and Yu, Jiale and Zhang, Yuhao and Yang, Zhaohui and Zhang, Kaiyan and Zhu, Xuekai and Zhang, Yuchen and Chen, Tianxing and Cui, Ganqu and others},
  journal={arXiv preprint arXiv:2509.09674},
  year={2025}
}