44 changes: 43 additions & 1 deletion Readme.md
@@ -1,5 +1,6 @@
# OpenManus-RL

🤗 <a href="https://huggingface.co/datasets/CharlieDreemur/OpenManus-RL" target="_blank">Dataset (OpenManus-RL)</a>
</p>
OpenManus-RL is an open-source initiative collaboratively led by **Ulab-UIUC** and **MetaGPT**.

This project is an extended version of the original [@OpenManus](https://github.com/mannaandpoem/OpenManus) initiative. Inspired by successful RL tuning of reasoning LLMs such as DeepSeek-R1 and QwQ-32B, we will explore new paradigms for RL-based LLM agent tuning, building upon these foundations.
@@ -32,6 +33,9 @@ Code and dataset coming soon! Stay tuned!
- [Test-time Scaling of Trajectories](#test-time-scaling-of-trajectories)
- [Action Space Awareness and Strategic Exploration](#action-space-awareness-and-strategic-exploration)
- [Integration with RL Tuning Frameworks](#integration-with-rl-tuning-frameworks)
- [Dataset](#dataset)
- [Dataset Overview](#dataset-overview)
- [Data Instances](#data-instances)
- [Running](#running)
- [Related Work](#related-work)
- [Agent tuning](#agent-tuning)
@@ -156,6 +160,44 @@ In summary, our method systematically integrates advanced reasoning paradigms, d
</div>
</div>

# Dataset
[**OpenManusRL-Dataset**](https://huggingface.co/datasets/CharlieDreemur/OpenManus-RL) combines agent trajectories from [AgentInstruct](https://huggingface.co/datasets/THUDM/AgentInstruct) and [Agent-FLAN](https://huggingface.co/datasets/internlm/Agent-FLAN), with the following features:

- 🔍 **ReAct Framework** - <a href="https://react-lm.github.io/" target="_blank">Reasoning-Acting integration</a>
- 🧠 **Structured Training** - Separates format learning from reasoning learning
- 🚫 **Anti-Hallucination** - Negative samples + environment grounding
- 🌐 **6 Domains** - OS, DB, Web, KG, Household, E-commerce

## Dataset Overview

| Source | Trajectories | Turns (avg or range) | Key Features |
|--------|--------------|-----------|--------------|
| [AgentInstruct](https://huggingface.co/datasets/THUDM/AgentInstruct) | 1,866 | 5.24 | Multi-task QA |
| [Agent-FLAN](https://huggingface.co/datasets/internlm/Agent-FLAN) | 3,000+ | 3-35 | Error recovery patterns |
| **Combined** | 4,866+ | 4-20 | Enhanced generalization |
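The table does not say how a turn is counted. As a rough sketch of how per-source statistics like these could be computed, the snippet below assumes one turn equals one assistant message in a record's `conversations` list; that definition is our assumption, `count_turns` and `turn_stats` are hypothetical helpers, and the toy records are illustrative rather than dataset entries.

```python
from statistics import mean


def count_turns(record: dict) -> int:
    """Count assistant messages in one trajectory.

    Assumption: one "turn" is one assistant reply. The dataset card does
    not state how turns are counted, so this is illustrative only.
    """
    return sum(1 for msg in record["conversations"] if msg["role"] == "assistant")


def turn_stats(records: list) -> dict:
    """Average, minimum, and maximum turn count over a list of trajectories."""
    turns = [count_turns(r) for r in records]
    return {"avg": mean(turns), "min": min(turns), "max": max(turns)}


# Toy records in the `conversations` format (hypothetical, not real data).
toy = [
    {"id": "toy_0", "conversations": [
        {"role": "user", "content": "task"},
        {"role": "assistant", "content": "Think: ...\nAct: answer(1)"},
    ]},
    {"id": "toy_1", "conversations": [
        {"role": "user", "content": "task"},
        {"role": "assistant", "content": "Think: ...\nAct: bash"},
        {"role": "user", "content": "OS Output: ..."},
        {"role": "assistant", "content": "Think: ...\nAct: bash"},
        {"role": "user", "content": "OS Output: ..."},
        {"role": "assistant", "content": "Think: ...\nAct: answer(2)"},
    ]},
]

print(turn_stats(toy))
```

Reporting both an average and a min/max range would let the AgentInstruct and Agent-FLAN rows be compared on the same footing.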

### Supported Tasks
- **text-generation**: ReAct-style instruction following
- **conversational-ai**: Tool-augmented dialogues
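For text generation, a multi-turn trajectory is commonly split into one training example per assistant reply, with every earlier message as context. The sketch below shows that generic split; `to_sft_pairs` is a hypothetical helper, not part of this repository, and it assumes the `conversations` / `role` / `content` layout shown under Data Instances (the record is abridged from that example).

```python
def to_sft_pairs(record: dict) -> list:
    """Split one trajectory into (context, target) pairs.

    Hypothetical helper: every assistant message becomes a target, and the
    messages before it (user prompts, environment outputs, earlier
    assistant replies) become its context.
    """
    pairs = []
    history = []
    for msg in record["conversations"]:
        if msg["role"] == "assistant":
            # Copy the history so later appends do not mutate earlier contexts.
            pairs.append((list(history), msg["content"]))
        history.append(msg)
    return pairs


# Abridged from the Data Instances example (the fenced bash command is omitted).
record = {
    "id": "os_0",
    "conversations": [
        {"role": "user", "content": "Count files in /etc"},
        {"role": "assistant", "content": "Think: Need reliable counting method\nAct: bash"},
        {"role": "user", "content": "OS Output: 220"},
        {"role": "assistant", "content": "Think: Verified through execution\nAct: answer(220)"},
    ],
}

for context, target in to_sft_pairs(record):
    print(len(context), "context messages ->", target.splitlines()[-1])
```

Keeping environment outputs such as `OS Output: 220` in the context, rather than as targets, means the model is trained only on its own reasoning and actions.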

### Languages
English

## Data Instances

**ReAct Pattern Example**:
```json
{
  "id": "os_0",
  "conversations": [
    {"role": "user", "content": "Count files in /etc"},
    {"role": "assistant", "content": "Think: Need reliable counting method\nAct: bash\n```bash\nls -1 /etc | wc -l\n```"},
    {"role": "user", "content": "OS Output: 220"},
    {"role": "assistant", "content": "Think: Verified through execution\nAct: answer(220)"}
  ]
}
```
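In the example above, assistant turns follow a `Think:` / `Act:` layout, where the action is either a bare tool name followed by a fenced command (`bash`) or a call such as `answer(220)`. A minimal parser for that layout might look like the following. The regular expressions and the `parse_react` name are our own assumptions inferred from this single example, not a published specification of the dataset format.

```python
import re

# A literal triple backtick, built this way so it cannot close a Markdown fence.
FENCE = "`" * 3

THINK_RE = re.compile(r"^Think:\s*(?P<thought>.*)$", re.MULTILINE)
# "Act: bash" (bare tool name) or "Act: answer(220)" (call with an argument).
ACT_RE = re.compile(r"^Act:\s*(?P<name>\w+)(?:\((?P<arg>.*)\))?", re.MULTILINE)
# Optional fenced command that follows a bare tool name, e.g. a bash snippet.
CODE_RE = re.compile(FENCE + r"\w*\n(?P<code>.*?)" + FENCE, re.DOTALL)


def parse_react(content: str) -> dict:
    """Split one assistant message into thought, action name, and argument."""
    think = THINK_RE.search(content)
    act = ACT_RE.search(content)
    code = CODE_RE.search(content)

    argument = None
    if act and act.group("arg") is not None:
        argument = act.group("arg")  # e.g. answer(220) -> "220"
    elif code:
        argument = code.group("code").strip()  # e.g. the bash command

    return {
        "thought": think.group("thought").strip() if think else None,
        "action": act.group("name") if act else None,
        "argument": argument,
    }


step1 = "Think: Need reliable counting method\nAct: bash\n" + FENCE + "bash\nls -1 /etc | wc -l\n" + FENCE
step2 = "Think: Verified through execution\nAct: answer(220)"
print(parse_react(step1))
print(parse_react(step2))
```

Falling back to the fenced block when the action has no parentheses matches the `bash` turn above; tools in the other domains may use layouts this sketch does not cover.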

# Running

## OpenManus-RL