From fdd957311626b5156ab409d00be1cc1bb86c6395 Mon Sep 17 00:00:00 2001
From: CharlieDreemur
Date: Mon, 10 Mar 2025 00:27:41 -0500
Subject: [PATCH 1/4] Update Readme.md

add huggingface dataset
---
 Readme.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Readme.md b/Readme.md
index 3dec7d5e..c83dfd74 100644
--- a/Readme.md
+++ b/Readme.md
@@ -1,5 +1,6 @@
 # OpenManus-RL
-
+🤗 Dataset (OpenManus-RL)
+

 OpenManus-RL is an open-source initiative collaboratively led by **Ulab-UIUC** and **MetaGPT**. This project is an extended version of the original [@OpenManus](https://github.com/mannaandpoem/OpenManus) initiative. Inspired by successful RL tunning for reasoning LLM such as Deepseek-R1, QwQ-32B, we will explore new paradigms for RL-based LLM agent tuning, particularly building upon foundations.
 

From 1a008e267f0509b98188611967ff665dd15d994d Mon Sep 17 00:00:00 2001
From: CharlieDreemur
Date: Mon, 10 Mar 2025 02:09:49 -0500
Subject: [PATCH 2/4] Update Readme.md

---
 Readme.md | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/Readme.md b/Readme.md
index c83dfd74..180743b7 100644
--- a/Readme.md
+++ b/Readme.md
@@ -33,6 +33,7 @@ Code and dataset coming soon! Stay tuned!
   - [Test-time Scaling of Trajectories](#test-time-scaling-of-trajectories)
   - [Action Space Awareness and Strategic Exploration](#action-space-awareness-and-strategic-exploration)
   - [Integration with RL Tuning Frameworks](#integration-with-rl-tuning-frameworks)
+- [Dataset](#dataset)
 - [Running](#Running)
 - [Related Work](#related-work)
   - [Agent tuning](#agent-tuning)
@@ -157,6 +158,44 @@ In summary, our method systematically integrates advanced reasoning paradigms, d
 
 
 
+# Dataset
+[**OpenManusRL-Dataset**](https://huggingface.co/datasets/CharlieDreemur/OpenManus-RL) combines agent trajectories from [AgentInstruct](https://huggingface.co/datasets/THUDM/AgentInstruct) and [Agent-FLAN](https://huggingface.co/datasets/internlm/Agent-FLAN) with features:
+
+- 🔍 **ReAct Framework** - Reasoning-Acting integration
+- 🧠 **Structured Training** - Separate format/reasoning learning
+- 🚫 **Anti-Hallucination** - Negative samples + environment grounding
+- 🌐 **6 Domains** - OS, DB, Web, KG, Household, E-commerce
+
+## Dataset Composition
+
+| Source | Trajectories | Avg Turns | Key Features |
+|--------|--------------|-----------|--------------|
+| [AgentInstruct](https://huggingface.co/datasets/THUDM/AgentInstruct) | 1,866 | 5.24 | Multi-task QA |
+| [Agent-FLAN](https://huggingface.co/datasets/internlm/Agent-FLAN) | 3,000+ | 3-35 | Error recovery patterns |
+| **Combined** | 4,866+ | 4-20 | Enhanced generalization |
+
+### Supported Tasks
+- **text-generation**: ReAct-style instruction following
+- **conversational-ai**: Tool-augmented dialogues
+
+### Languages
+English
+
+## Data Instances
+
+**ReAct Pattern Example**:
+```json
+{
+  "id": "os_0",
+  "conversations": [
+    {"role": "user", "content": "Count files in /etc"},
+    {"role": "assistant", "content": "Think: Need reliable counting method\nAct: bash\n```bash\nls -1 /etc | wc -l\n```"},
+    {"role": "user", "content": "OS Output: 220"},
+    {"role": "assistant", "content": "Think: Verified through execution\nAct: answer(220)"}
+  ]
+}
+```
+
 # Running
 ## OpenManus-RL
 

From e2a01f4bce9b07d6489aa87d81f6c7dc706a35f4 Mon Sep 17 00:00:00 2001
From: CharlieDreemur
Date: Mon, 10 Mar 2025 02:12:19 -0500
Subject: [PATCH 3/4] Update Readme.md

---
 Readme.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/Readme.md b/Readme.md
index 180743b7..4b2a9a70 100644
--- a/Readme.md
+++ b/Readme.md
@@ -34,6 +34,8 @@ Code and dataset coming soon! Stay tuned!
   - [Action Space Awareness and Strategic Exploration](#action-space-awareness-and-strategic-exploration)
   - [Integration with RL Tuning Frameworks](#integration-with-rl-tuning-frameworks)
 - [Dataset](#dataset)
+  - [Dataset Overview](#dataset-overview)
+  - [Data Instance](#data-instance)
 - [Running](#Running)
 - [Related Work](#related-work)
   - [Agent tuning](#agent-tuning)
@@ -166,7 +168,7 @@ In summary, our method systematically integrates advanced reasoning paradigms, d
 - 🚫 **Anti-Hallucination** - Negative samples + environment grounding
 - 🌐 **6 Domains** - OS, DB, Web, KG, Household, E-commerce
 
-## Dataset Composition
+## Dataset Overview
 
 | Source | Trajectories | Avg Turns | Key Features |
 |--------|--------------|-----------|--------------|

From 8fd3c353411516c88410e5f45c089b5179155dce Mon Sep 17 00:00:00 2001
From: CharlieDreemur
Date: Mon, 10 Mar 2025 02:12:38 -0500
Subject: [PATCH 4/4] Update Readme.md

---
 Readme.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Readme.md b/Readme.md
index 4b2a9a70..c6e0dce2 100644
--- a/Readme.md
+++ b/Readme.md
@@ -35,7 +35,7 @@ Code and dataset coming soon! Stay tuned!
   - [Integration with RL Tuning Frameworks](#integration-with-rl-tuning-frameworks)
 - [Dataset](#dataset)
   - [Dataset Overview](#dataset-overview)
-  - [Data Instance](#data-instance)
+  - [Data Instances](#data-instances)
 - [Running](#Running)
 - [Related Work](#related-work)
   - [Agent tuning](#agent-tuning)
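
Note on the `os_0` record added in patch 2/4: the following is a minimal Python sketch of how a consumer might split each assistant turn into its thought and action parts. The `Think:`/`Act:` layout is taken from that single example only; `parse_react_turns`, `STEP_RE`, and `FENCE` are hypothetical names introduced for illustration, and records in the published dataset may not all follow this layout.

```python
import re

# Three backticks, built at runtime so this snippet can itself sit inside a fenced block.
FENCE = "`" * 3

# Record mirroring the "ReAct Pattern Example" from the README diff.
RECORD = {
    "id": "os_0",
    "conversations": [
        {"role": "user", "content": "Count files in /etc"},
        {
            "role": "assistant",
            "content": "Think: Need reliable counting method\nAct: bash\n"
            + FENCE + "bash\nls -1 /etc | wc -l\n" + FENCE,
        },
        {"role": "user", "content": "OS Output: 220"},
        {
            "role": "assistant",
            "content": "Think: Verified through execution\nAct: answer(220)",
        },
    ],
}

# Assumed layout: "Think: <text>" followed by a newline and "Act: <rest of turn>".
STEP_RE = re.compile(r"Think:\s*(?P<thought>.*?)\nAct:\s*(?P<action>.*)", re.DOTALL)


def parse_react_turns(record):
    """Return (thought, action) pairs for every assistant turn in a record."""
    steps = []
    for turn in record["conversations"]:
        if turn["role"] != "assistant":
            continue  # user turns carry the task or the environment output
        match = STEP_RE.match(turn["content"])
        if match is None:
            continue  # turn does not follow the assumed Think/Act layout
        steps.append((match["thought"].strip(), match["action"].strip()))
    return steps


steps = parse_react_turns(RECORD)
for thought, action in steps:
    print(f"thought={thought!r} action={action!r}")
```

Turns that do not match the assumed layout are skipped rather than raising, which seems the safer default when the two source datasets may format turns differently.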