🔗 Codebase | 🤗 Hugging Face | 📑 Paper | 📖 WeChat Chinese Version
🏆 This work has been accepted to NeurIPS 2025 (Poster).
AutoThink is a reinforcement learning framework designed to equip R1-style language models with adaptive reasoning capabilities. Instead of always thinking or never thinking, the model learns when to engage in explicit reasoning, balancing performance and efficiency.
This repository implements AutoThink, as described in our paper:
Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL
- [2025/05/28] Our work was featured on the QbitAI WeChat public account: 📖 Chinese Version
- [2025/05/27] We apply AutoThink to the SOTA 7B model Skywork-OR1-Math-7B. AutoThink reduces reasoning token usage by 56% with less than 2% accuracy degradation. We also updated the paper to fix minor issues and released the corresponding trained model.
- [2025/05/16] We release the Code, Models, and Paper for AutoThink.
- 🧩 Minimal Prompting with an ellipsis (`<think>\n...\n`) to activate stochastic thinking.
- 🎯 Multi-stage RL to stabilize, reinforce, and prune reasoning behavior.
- ⚙️ Integrated with the `verl` framework.
- 📊 Benchmarked on five mathematical reasoning datasets: MATH, Minerva, Olympiad, AIME24, AMC23.
We recommend using Python 3.10 and PyTorch ≥ 2.1.
Our experimental setup follows the configuration of the DeepScaleR environment.
Install the environment:
```bash
# Recommend Python 3.10.
git clone https://github.com/ScienceOne-AI/AutoThink.git
cd deepscaler
pip install -e ./verl
pip install -e .
```

The raw training data is located in `deepscaler/data/[train|test]`, along with preprocessing scripts. To convert the raw data into Parquet files for training, run:
```bash
# Output parquet files in data/*.parquet.
python scripts/data/deepscaler_dataset.py
```
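As a quick sanity check, you can inspect the generated Parquet files before training. The snippet below is a minimal sketch and not part of the repo; it assumes pandas and pyarrow are installed, and the exact file names and column names depend on the preprocessing script.

```python
# Sketch: peek at the Parquet files produced under data/.
import glob
import pandas as pd

for path in glob.glob("data/*.parquet"):
    df = pd.read_parquet(path)
    # Column names are defined by the preprocessing script.
    print(path, df.shape, list(df.columns))
```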
You can control the model's reasoning behavior by modifying the `chat_template` field in `tokenizer_config.json`. Update the value with one of the following:

- Standard Prompt (default for Distill-R1, no changes needed): `"<|Assistant|><think>\n"`
- No-Thinking Prompt (forces minimal reasoning): `"<|Assistant|><think>\nOkay, I think I have finished thinking.\n</think>\n\n"`
- Ellipsis Prompt (adaptive reasoning mode): `"<|Assistant|><think>\n...\n"`

These prompts enable different reasoning behaviors.
Before AutoThink training, please replace the default `chat_template` with the Ellipsis Prompt and keep the inference prompt consistent with it.
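To confirm the template change took effect and to keep the inference prompt consistent, you can render the chat template once before launching training or evaluation. This is a minimal sketch rather than code from the repo; the model path is a placeholder, and the expected suffix simply mirrors the Ellipsis Prompt above.

```python
# Sketch: verify that the generation prompt ends with the Ellipsis Prompt.
from transformers import AutoTokenizer

MODEL_DIR = "path/to/your/model"  # placeholder: directory containing the edited tokenizer_config.json
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)

rendered = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What is 2 + 2?"}],
    tokenize=False,
    add_generation_prompt=True,
)
print(rendered)
assert rendered.endswith("<think>\n...\n"), "chat_template is not using the Ellipsis Prompt"
```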
AutoThink training proceeds in three stages with different reward designs:
```bash
# Stage 1: Stabilize dual-mode reasoning
bash scripts/train_stage1.sh

# Stage 2: Reinforce accurate behavior
bash scripts/train_stage2.sh

# Stage 3: Prune redundant reasoning
bash scripts/train_stage3.sh
```

Make sure to configure your model paths and data in `scripts/train_*.sh`.
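For intuition only, the sketch below shows one way to tell whether a rollout actually engaged in explicit thinking under the Ellipsis Prompt, which is the kind of signal the stage-wise rewards act on. It is a hypothetical illustration, not the reward implementation used in the scripts; the actual reward designs for the three stages are specified in the paper and in `scripts/train_stage*.sh`.

```python
# Hypothetical helper (not the repo's reward function): decide whether a rollout
# contains substantive thinking between <think> and </think>.
import re

def thinking_chars(text: str) -> int:
    """Number of non-placeholder characters inside <think>...</think> (0 if absent)."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return 0
    # Drop the "..." placeholder injected by the Ellipsis Prompt.
    body = match.group(1).strip().lstrip(".").strip()
    return len(body)

def used_thinking(text: str) -> bool:
    return thinking_chars(text) > 0

# No-thinking rollout: the model closes the think block around the placeholder.
print(used_thinking("<think>\n...\n</think>\n\nThe answer is 4."))    # False
# Thinking rollout: the model fills the think block with actual reasoning.
print(used_thinking("<think>\nCompute 2 + 2 = 4.\n</think>\n\n4"))    # True
```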
After training, evaluate the model using:
```bash
bash scripts/eval/eval_model_1.5b.sh
```

If you find this work useful, please cite:

```bibtex
@article{tu2025learning,
title={Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL},
author={Tu, Songjun and Lin, Jiahao and Zhang, Qichao and Tian, Xiangyu and Li, Linjing and Lan, Xiangyuan and Zhao, Dongbin},
journal={arXiv preprint arXiv:2505.10832},
year={2025}
}
```

We build on and reference the following open-source projects, and thank them for their contributions to the LLM reasoning open-source community:

