From fdd957311626b5156ab409d00be1cc1bb86c6395 Mon Sep 17 00:00:00 2001
From: CharlieDreemur
Date: Mon, 10 Mar 2025 00:27:41 -0500
Subject: [PATCH 1/4] Update Readme.md
add huggingface dataset
---
Readme.md | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/Readme.md b/Readme.md
index 3dec7d5e..c83dfd74 100644
--- a/Readme.md
+++ b/Readme.md
@@ -1,5 +1,6 @@
# OpenManus-RL
-
+🤗 [Dataset (OpenManus-RL)](https://huggingface.co/datasets/CharlieDreemur/OpenManus-RL)
+
OpenManus-RL is an open-source initiative collaboratively led by **Ulab-UIUC** and **MetaGPT**.
This project is an extended version of the original [@OpenManus](https://github.com/mannaandpoem/OpenManus) initiative. Inspired by successful RL tuning of reasoning LLMs such as DeepSeek-R1 and QwQ-32B, we will explore new paradigms for RL-based LLM agent tuning, particularly building upon foundations.
From 1a008e267f0509b98188611967ff665dd15d994d Mon Sep 17 00:00:00 2001
From: CharlieDreemur
Date: Mon, 10 Mar 2025 02:09:49 -0500
Subject: [PATCH 2/4] Update Readme.md
---
Readme.md | 39 +++++++++++++++++++++++++++++++++++++++
1 file changed, 39 insertions(+)
diff --git a/Readme.md b/Readme.md
index c83dfd74..180743b7 100644
--- a/Readme.md
+++ b/Readme.md
@@ -33,6 +33,7 @@ Code and dataset coming soon! Stay tuned!
- [Test-time Scaling of Trajectories](#test-time-scaling-of-trajectories)
- [Action Space Awareness and Strategic Exploration](#action-space-awareness-and-strategic-exploration)
- [Integration with RL Tuning Frameworks](#integration-with-rl-tuning-frameworks)
+ - [Dataset](#dataset)
- [Running](#Running)
- [Related Work](#related-work)
- [Agent tuning](#agent-tuning)
@@ -157,6 +158,44 @@ In summary, our method systematically integrates advanced reasoning paradigms, d
+# Dataset
+[**OpenManusRL-Dataset**](https://huggingface.co/datasets/CharlieDreemur/OpenManus-RL) combines agent trajectories from [AgentInstruct](https://huggingface.co/datasets/THUDM/AgentInstruct) and [Agent-FLAN](https://huggingface.co/datasets/internlm/Agent-FLAN) and offers the following features:
+
+- 🔍 **ReAct Framework** - Reasoning-Acting integration
+- 🧠 **Structured Training** - Separate format/reasoning learning
+- 🚫 **Anti-Hallucination** - Negative samples + environment grounding
+- 🌐 **6 Domains** - OS, DB, Web, KG, Household, E-commerce
+
+## Dataset Composition
+
+| Source | Trajectories | Avg Turns | Key Features |
+|--------|--------------|-----------|--------------|
+| [AgentInstruct](https://huggingface.co/datasets/THUDM/AgentInstruct) | 1,866 | 5.24 | Multi-task QA |
+| [Agent-FLAN](https://huggingface.co/datasets/internlm/Agent-FLAN) | 3,000+ | 3-35 | Error recovery patterns |
+| **Combined** | 4,866+ | 4-20 | Enhanced generalization |
+
+### Supported Tasks
+- **text-generation**: ReAct-style instruction following
+- **conversational-ai**: Tool-augmented dialogues
+
+### Languages
+English
+
+## Data Instances
+
+**ReAct Pattern Example**:
+```json
+{
+ "id": "os_0",
+ "conversations": [
+ {"role": "user", "content": "Count files in /etc"},
+ {"role": "assistant", "content": "Think: Need reliable counting method\nAct: bash\n```bash\nls -1 /etc | wc -l\n```"},
+ {"role": "user", "content": "OS Output: 220"},
+ {"role": "assistant", "content": "Think: Verified through execution\nAct: answer(220)"}
+ ]
+}
+```
+
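As a rough illustration of how a record like the one above could be consumed, the sketch below splits each assistant turn into its `Think` and `Act` parts. The helper name `parse_react_turn` is hypothetical, and the code assumes every assistant turn follows the `Think: ...\nAct: ...` layout shown in the example; real records in the dataset may deviate from it.

```python
import re

FENCE = "`" * 3  # built programmatically so this block contains no literal code fence

# Record mirroring the "ReAct Pattern Example" above.
record = {
    "id": "os_0",
    "conversations": [
        {"role": "user", "content": "Count files in /etc"},
        {
            "role": "assistant",
            "content": "Think: Need reliable counting method\nAct: bash\n"
            + FENCE + "bash\nls -1 /etc | wc -l\n" + FENCE,
        },
        {"role": "user", "content": "OS Output: 220"},
        {
            "role": "assistant",
            "content": "Think: Verified through execution\nAct: answer(220)",
        },
    ],
}

# Assumed layout: "Think: <reasoning>" followed by a line starting with "Act:".
REACT_PATTERN = re.compile(r"Think:\s*(?P<think>.*?)\nAct:\s*(?P<act>.*)", re.DOTALL)


def parse_react_turn(content):
    """Return {'think': ..., 'act': ...} for one assistant turn, or None if it does not match."""
    match = REACT_PATTERN.match(content)
    if match is None:
        return None
    return {
        "think": match.group("think").strip(),
        "act": match.group("act").strip(),
    }


# Keep only assistant turns; user turns carry the task and the environment output.
steps = [
    parse_react_turn(turn["content"])
    for turn in record["conversations"]
    if turn["role"] == "assistant"
]

for step in steps:
    print(step["think"], "->", step["act"].splitlines()[0])
```

Returning `None` for turns that do not match, rather than raising, makes it easy to count or filter malformed samples before training.
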
# Running
## OpenManus-RL
From e2a01f4bce9b07d6489aa87d81f6c7dc706a35f4 Mon Sep 17 00:00:00 2001
From: CharlieDreemur
Date: Mon, 10 Mar 2025 02:12:19 -0500
Subject: [PATCH 3/4] Update Readme.md
---
Readme.md | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/Readme.md b/Readme.md
index 180743b7..4b2a9a70 100644
--- a/Readme.md
+++ b/Readme.md
@@ -34,6 +34,8 @@ Code and dataset coming soon! Stay tuned!
- [Action Space Awareness and Strategic Exploration](#action-space-awareness-and-strategic-exploration)
- [Integration with RL Tuning Frameworks](#integration-with-rl-tuning-frameworks)
- [Dataset](#dataset)
+ - [Dataset Overview](#dataset-overview)
+ - [Data Instance](#data-instance)
- [Running](#Running)
- [Related Work](#related-work)
- [Agent tuning](#agent-tuning)
@@ -166,7 +168,7 @@ In summary, our method systematically integrates advanced reasoning paradigms, d
- 🚫 **Anti-Hallucination** - Negative samples + environment grounding
- 🌐 **6 Domains** - OS, DB, Web, KG, Household, E-commerce
-## Dataset Composition
+## Dataset Overview
| Source | Trajectories | Avg Turns | Key Features |
|--------|--------------|-----------|--------------|
From 8fd3c353411516c88410e5f45c089b5179155dce Mon Sep 17 00:00:00 2001
From: CharlieDreemur
Date: Mon, 10 Mar 2025 02:12:38 -0500
Subject: [PATCH 4/4] Update Readme.md
---
Readme.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Readme.md b/Readme.md
index 4b2a9a70..c6e0dce2 100644
--- a/Readme.md
+++ b/Readme.md
@@ -35,7 +35,7 @@ Code and dataset coming soon! Stay tuned!
- [Integration with RL Tuning Frameworks](#integration-with-rl-tuning-frameworks)
- [Dataset](#dataset)
- [Dataset Overview](#dataset-overview)
- - [Data Instance](#data-instance)
+ - [Data Instances](#data-instances)
- [Running](#Running)
- [Related Work](#related-work)
- [Agent tuning](#agent-tuning)