diff --git a/.gitignore b/.gitignore
index ac0720b7..a61c97ad 100644
--- a/.gitignore
+++ b/.gitignore
@@ -26,4 +26,4 @@ __pycache__
 data/output_env_groups/
 data/output_txt_files/
-
+outputs/
diff --git a/README.md b/README.md
index 7d0a4b78..e8147381 100644
--- a/README.md
+++ b/README.md
@@ -47,6 +47,7 @@ Code and dataset coming soon! Stay tuned!
 - [Acknowledgement](#acknowledgement)
 - [Community Group](#community-group)
 - [Citation](#citation)
+- [Documentation](#documentation)
 
 ---
 
@@ -368,3 +369,9 @@ Please cite the following paper if you find OpenManus helpful!

+
+## Documentation
+- [Development Guide (English)](docs/DEVELOPMENT_GUIDE_EN.md)
+- [Development Guide (Chinese)](docs/DEVELOPMENT_GUIDE_ZH.md)
+- [Training Process Overview (English)](docs/README.md)
+- [Training Process Overview (Chinese)](docs/README_ZH.md)
diff --git a/docs/README.md b/docs/README.md
new file mode 100644
index 00000000..6c052b00
--- /dev/null
+++ b/docs/README.md
@@ -0,0 +1,189 @@
+# OpenManus Model Training Overview
+
+This document provides a detailed explanation of the training logic and core functions in OpenManus, focusing on how agent trajectories are used for loss calculation and training.
+
+## Overall Training Architecture
+
+OpenManus uses the Proximal Policy Optimization (PPO) algorithm with the following core components:
+
+```
+RayPPOTrainer (Main Trainer)
+├── OpenManusAgent (Environment Interaction)
+├── ActorRolloutRefWorker (Policy Network)
+└── CriticWorker (Value Network)
+```
+
+## Main Training Loop
+
+The training core is implemented in the `RayPPOTrainer.fit()` method:
+
+```python
+# Simplified training loop
+for epoch in range(epochs):
+    for batch in train_dataloader:
+        # 1. Collect trajectories
+        trajectories = generation_manager.run_llm_loop(batch)
+
+        # 2. Calculate composite rewards
+        compute_total_reward(trajectories)
+
+        # 3. Compute advantage function
+        compute_advantage(batch)
+
+        # 4. Update critic network
+        critic_wg.update_critic(batch)
+
+        # 5. Update actor network
+        actor_rollout_wg.update_actor(batch)
+```
+
+## Key Processes
+
+### 1. Trajectory Collection
+
+Implemented in `OpenManusAgent.run_llm_loop` and `_run_single_rollout`:
+
+```python
+# Key logic in _run_single_rollout
+while not done:
+    # Get current observation
+    observation = client.observe()
+
+    # Generate LLM response
+    response = actor_model.generate(observation)
+
+    # Parse action
+    action = parse_action(response)
+
+    # Execute environment step
+    next_obs, reward, done, info = client.step(action)
+
+    # Record trajectory
+    trajectory.append({"from": "human", "value": next_obs, "reward": reward, "info": info})
+```
+
+### 2. Reward Composition
+
+Multiple reward signals are combined through the `RewardComposer`:
+
+```python
+# Called in _convert_rollout_results_to_dataproto
+total_score, breakdown = reward_composer.compute_total_reward(
+    trajectory=trajectory,
+    reward_model_info=reward_model_info,
+    env_name=env_name
+)
+```
+
+Main reward components include:
+- `GoalReward`: Primary task success reward
+- `LengthPenalty`: Penalty for excessive length
+- `FormatReward`: Reward for correct output format
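+
+To make the composition concrete, here is a minimal sketch of how weighted components of this kind might be combined. The class and field names (`RewardComponent`, `SimpleRewardComposer`, `weight`, `fn`) are illustrative assumptions for this document, not the actual OpenManus API:
+
+```python
+# Illustrative sketch of a weighted reward composer (not the real implementation).
+from dataclasses import dataclass
+from typing import Callable, Dict, List, Tuple
+
+@dataclass
+class RewardComponent:
+    name: str
+    weight: float
+    fn: Callable[[List[dict]], float]  # maps a trajectory to a scalar score
+
+class SimpleRewardComposer:
+    def __init__(self, components: List[RewardComponent]):
+        self.components = components
+
+    def compute_total_reward(self, trajectory: List[dict]) -> Tuple[float, Dict[str, float]]:
+        # Score each component, apply its weight, and keep the breakdown for logging.
+        breakdown = {c.name: c.weight * c.fn(trajectory) for c in self.components}
+        return sum(breakdown.values()), breakdown
+
+# Example: goal reward dominates; a small length penalty discourages long rollouts.
+composer = SimpleRewardComposer([
+    RewardComponent("goal", 1.0, lambda t: 1.0 if t and t[-1].get("info", {}).get("success") else 0.0),
+    RewardComponent("length_penalty", -0.01, lambda t: float(len(t))),
+])
+total, parts = composer.compute_total_reward([{"info": {"success": True}}])
+```
+
+As in the real composer call shown above, returning the per-component `breakdown` alongside the total makes reward debugging and logging straightforward.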
+
+### 3. Reward Allocation
+
+In the `_convert_rollout_results_to_dataproto` method, rewards are allocated to individual tokens:
+
+```python
+# Different reward allocation strategies:
+if reward_allocation == "last_token":
+    # Assign the reward only to the last token
+    token_level_rewards[0, last_segment_end] = reward_to_distribute
+
+elif reward_allocation == "uniform_positive":
+    # Distribute positive rewards evenly; negative rewards go only to the last token
+    if reward_to_distribute > 0:
+        reward_per_token = reward_to_distribute / total_agent_tokens
+        for start, end in agent_indices_in_padded:
+            token_level_rewards[0, start:end+1] = reward_per_token
+
+elif reward_allocation == "discounted":
+    # Discounted rewards, allocated backward from the last segment
+    gamma = config.algorithm_config.get('gamma', 1.0)
+    current_reward = reward_to_distribute
+    for start, end in reversed(agent_indices_in_padded):
+        # Spread the current reward within each segment (start/end are inclusive indices)
+        segment_len = end - start + 1
+        token_level_rewards[0, start:end+1] = current_reward / segment_len
+        current_reward *= (gamma ** segment_len)
+```
+
+### 4. Advantage Computation
+
+In the `compute_advantage` function, Generalized Advantage Estimation (GAE) is used:
+
+```python
+if adv_estimator == 'gae':
+    advantages, returns = core_algos.compute_gae_advantage_return(
+        token_level_rewards=token_level_rewards,
+        values=values,
+        eos_mask=response_mask,
+        gamma=gamma,
+        lam=lam
+    )
+```
+
+### 5. Policy Update
+
+The policy is updated in `update_actor` using the PPO objective function:
+
+```python
+def update_policy(self, data):
+    old_log_probs = data.batch['old_log_probs']
+    advantages = data.batch['advantages']
+
+    # Log probabilities under the current policy
+    current_log_probs = self.compute_log_prob(data)
+
+    # Policy ratio
+    ratio = torch.exp(current_log_probs - old_log_probs)
+
+    # Clipped ratio (clip_eps is the PPO clipping coefficient, e.g. 0.2)
+    ratio_clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
+
+    # PPO objective
+    policy_loss = -torch.min(
+        advantages * ratio,
+        advantages * ratio_clipped
+    ).mean()
+
+    self.optimizer.zero_grad()
+    policy_loss.backward()
+    self.optimizer.step()
+```
+
+### 6. Value Network Update
+
+The critic network is updated in `update_critic` by minimizing the value loss:
+
+```python
+def update_critic(self, data):
+    values = self.compute_values(data)
+    returns = data.batch['returns']
+
+    # Value loss
+    value_loss = F.mse_loss(values, returns)
+
+    self.optimizer.zero_grad()
+    value_loss.backward()
+    self.optimizer.step()
+```
+
+## Distributed Training Architecture
+
+OpenManus uses Ray and FSDP (Fully Sharded Data Parallel) for distributed training:
+
+- `ActorRolloutRefWorker`: Responsible for policy network inference and training
+- `CriticWorker`: Responsible for value network training
+- `RayPPOTrainer`: Coordinates communication and synchronization between different workers
+
+FSDP shards model parameters across nodes using `ShardingStrategy.FULL_SHARD`, allowing larger models to be trained.
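+
+As a rough illustration of what `FULL_SHARD` wrapping looks like in plain PyTorch (a sketch assuming a standard `torchrun` setup, not the actual worker implementation):
+
+```python
+# Minimal PyTorch FSDP sketch (illustrative only).
+import torch
+import torch.distributed as dist
+from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy
+
+def wrap_model(model: torch.nn.Module) -> FSDP:
+    # The process group must be initialized first (e.g., torchrun + init_process_group).
+    if not dist.is_initialized():
+        raise RuntimeError("Call dist.init_process_group() before wrapping the model.")
+    return FSDP(
+        model,
+        sharding_strategy=ShardingStrategy.FULL_SHARD,  # shard params, grads, and optimizer state
+        device_id=torch.cuda.current_device(),
+    )
+```
+
+In OpenManus the corresponding wrapping happens inside the Ray workers; the sketch only illustrates the sharding call itself.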
+
+## Summary
+
+The OpenManus training process integrates several key techniques:
+1. A PPO-based reinforcement learning framework
+2. Trajectory-based environment interaction and reward collection
+3. Composite reward calculation and flexible reward allocation strategies
+4. A distributed training architecture supporting large-scale models
+
+The core of the entire process lies in collecting meaningful trajectories from environment interactions and optimizing the LLM's decision-making capabilities through appropriate reward functions and advantage estimation.
\ No newline at end of file
diff --git a/docs/README_ZH.md b/docs/README_ZH.md
new file mode 100644
index 00000000..db00cbc7
--- /dev/null
+++ b/docs/README_ZH.md
@@ -0,0 +1,189 @@
+# OpenManus模型训练概述
+
+本文档提供了OpenManus训练逻辑和核心函数的详细解释,重点关注智能体轨迹(trajectories)如何用于损失计算和训练。
+
+## 整体训练架构
+
+OpenManus采用近端策略优化(PPO)算法,主要由以下核心组件组成:
+
+```
+RayPPOTrainer (主训练器)
+├── OpenManusAgent (环境交互)
+├── ActorRolloutRefWorker (策略网络)
+└── CriticWorker (价值网络)
+```
+
+## 主要训练循环
+
+训练核心在`RayPPOTrainer.fit()`方法中实现:
+
+```python
+# 简化的训练循环
+for epoch in range(epochs):
+    for batch in train_dataloader:
+        # 1. 收集轨迹
+        trajectories = generation_manager.run_llm_loop(batch)
+
+        # 2. 计算复合奖励
+        compute_total_reward(trajectories)
+
+        # 3. 计算优势函数
+        compute_advantage(batch)
+
+        # 4. 更新critic网络
+        critic_wg.update_critic(batch)
+
+        # 5. 更新actor网络
+        actor_rollout_wg.update_actor(batch)
+```
+
+## 关键流程
+
+### 1. 轨迹收集 (Trajectory Collection)
+
+在`OpenManusAgent.run_llm_loop`和`_run_single_rollout`中实现:
+
+```python
+# _run_single_rollout中的关键逻辑
+while not done:
+    # 获取当前观察
+    observation = client.observe()
+
+    # 生成LLM响应
+    response = actor_model.generate(observation)
+
+    # 解析动作
+    action = parse_action(response)
+
+    # 执行环境步骤
+    next_obs, reward, done, info = client.step(action)
+
+    # 记录轨迹
+    trajectory.append({"from": "human", "value": next_obs, "reward": reward, "info": info})
+```
+
+### 2. 奖励组合 (Reward Composition)
+
+通过`RewardComposer`组合多种奖励信号:
+
+```python
+# 在_convert_rollout_results_to_dataproto中调用
+total_score, breakdown = reward_composer.compute_total_reward(
+    trajectory=trajectory,
+    reward_model_info=reward_model_info,
+    env_name=env_name
+)
+```
+
+主要奖励组件包括:
+- `GoalReward`: 主要任务成功奖励
+- `LengthPenalty`: 长度惩罚
+- `FormatReward`: 输出格式正确性奖励
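+
+为了更直观地说明这种组合方式,下面给出一个按权重组合奖励组件的最小示意。其中的类名和字段名(`RewardComponent`、`SimpleRewardComposer`、`weight`、`fn`)仅为本文档的示例假设,并非OpenManus的实际API:
+
+```python
+# 示意代码:按权重组合多个奖励组件(非实际实现)
+from dataclasses import dataclass
+from typing import Callable, Dict, List, Tuple
+
+@dataclass
+class RewardComponent:
+    name: str
+    weight: float
+    fn: Callable[[List[dict]], float]  # 将轨迹映射为标量得分
+
+class SimpleRewardComposer:
+    def __init__(self, components: List[RewardComponent]):
+        self.components = components
+
+    def compute_total_reward(self, trajectory: List[dict]) -> Tuple[float, Dict[str, float]]:
+        # 逐组件计算加权得分,同时返回明细(breakdown)便于调试和日志记录
+        breakdown = {c.name: c.weight * c.fn(trajectory) for c in self.components}
+        return sum(breakdown.values()), breakdown
+```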
+
+### 3. 奖励分配 (Reward Allocation)
+
+在`_convert_rollout_results_to_dataproto`方法中,奖励被分配到各个token上:
+
+```python
+# 几种奖励分配策略:
+if reward_allocation == "last_token":
+    # 只给最后一个token分配奖励
+    token_level_rewards[0, last_segment_end] = reward_to_distribute
+
+elif reward_allocation == "uniform_positive":
+    # 均匀分配正奖励,负奖励仅给最后token
+    if reward_to_distribute > 0:
+        reward_per_token = reward_to_distribute / total_agent_tokens
+        for start, end in agent_indices_in_padded:
+            token_level_rewards[0, start:end+1] = reward_per_token
+
+elif reward_allocation == "discounted":
+    # 折扣奖励,从最后一个segment反向分配
+    gamma = config.algorithm_config.get('gamma', 1.0)
+    current_reward = reward_to_distribute
+    for start, end in reversed(agent_indices_in_padded):
+        # 计算每个segment内的奖励(start/end为闭区间索引)
+        segment_len = end - start + 1
+        token_level_rewards[0, start:end+1] = current_reward / segment_len
+        current_reward *= (gamma ** segment_len)
+```
+
+### 4. 优势函数计算 (Advantage Computation)
+
+在`compute_advantage`函数中,使用广义优势估计(GAE)计算优势函数:
+
+```python
+if adv_estimator == 'gae':
+    advantages, returns = core_algos.compute_gae_advantage_return(
+        token_level_rewards=token_level_rewards,
+        values=values,
+        eos_mask=response_mask,
+        gamma=gamma,
+        lam=lam
+    )
+```
+
+### 5. 策略更新 (Policy Update)
+
+在`update_actor`中使用PPO目标函数更新策略:
+
+```python
+def update_policy(self, data):
+    old_log_probs = data.batch['old_log_probs']
+    advantages = data.batch['advantages']
+
+    # 计算当前策略的log概率
+    current_log_probs = self.compute_log_prob(data)
+
+    # 计算策略比率
+    ratio = torch.exp(current_log_probs - old_log_probs)
+
+    # 截断比率(clip_eps为PPO截断系数,例如0.2)
+    ratio_clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
+
+    # PPO目标
+    policy_loss = -torch.min(
+        advantages * ratio,
+        advantages * ratio_clipped
+    ).mean()
+
+    self.optimizer.zero_grad()
+    policy_loss.backward()
+    self.optimizer.step()
+```
+
+### 6. 价值网络更新 (Critic Update)
+
+在`update_critic`中通过最小化价值损失更新critic网络:
+
+```python
+def update_critic(self, data):
+    values = self.compute_values(data)
+    returns = data.batch['returns']
+
+    # 价值损失
+    value_loss = F.mse_loss(values, returns)
+
+    self.optimizer.zero_grad()
+    value_loss.backward()
+    self.optimizer.step()
+```
+
+## 分布式训练架构
+
+OpenManus使用Ray和FSDP(完全分片数据并行)进行分布式训练:
+
+- `ActorRolloutRefWorker`: 负责策略网络的前向推理和训练
+- `CriticWorker`: 负责价值网络的训练
+- `RayPPOTrainer`: 协调不同worker之间的通信和同步
+
+FSDP通过`ShardingStrategy.FULL_SHARD`跨节点分片模型参数,允许训练更大的模型。
+
+## 总结
+
+OpenManus的训练流程整合了几个关键技术:
+1. 基于PPO的强化学习框架
+2. 基于轨迹的环境交互和奖励收集
+3. 组合式奖励计算和灵活的奖励分配策略
+4. 分布式训练架构支持大规模模型
+
+整个流程的核心在于如何从环境交互中收集有意义的轨迹,并通过适当的奖励函数和优势估计来优化LLM的决策能力。
\ No newline at end of file
diff --git a/openmanus_rl/agentgym/OpenManus/.gitattributes b/openmanus_rl/agentgym/OpenManus/.gitattributes
deleted file mode 100644
index 462e4734..00000000
--- a/openmanus_rl/agentgym/OpenManus/.gitattributes
+++ /dev/null
@@ -1,30 +0,0 @@
-# HTML code is incorrectly calculated into statistics, so ignore them
-*.html linguist-detectable=false
-# Auto detect text files and perform LF normalization
-* text=auto eol=lf
-# Ensure shell scripts use LF (Linux style) line endings on Windows
-*.sh text eol=lf
-# Treat specific binary files as binary and prevent line ending conversion
-*.png binary
-*.jpg binary
-*.gif binary
-*.ico binary
-*.jpeg binary
-*.mp3 binary
-*.zip binary
-*.bin binary
-# Preserve original line endings for specific document files
-*.doc text eol=crlf
-*.docx text eol=crlf
-*.pdf binary
-# Ensure source code and script files use LF line endings
-*.py text eol=lf
-*.js text eol=lf
-*.html text eol=lf
-*.css text eol=lf
-# Specify custom diff driver for specific file types
-*.md diff=markdown
-*.json diff=json
-*.mp4 filter=lfs diff=lfs merge=lfs -text
-*.mov filter=lfs diff=lfs merge=lfs -text
-*.webm filter=lfs diff=lfs merge=lfs -text
diff --git a/openmanus_rl/agentgym/OpenManus/.github/ISSUE_TEMPLATE/config.yaml b/openmanus_rl/agentgym/OpenManus/.github/ISSUE_TEMPLATE/config.yaml
deleted file mode 100644
index 892abfb4..00000000
--- a/openmanus_rl/agentgym/OpenManus/.github/ISSUE_TEMPLATE/config.yaml
+++ /dev/null
@@ -1,4 +0,0 @@
-blank_issues_enabled: false
-contact_links:
-  - name: "📑 Read online docs"
-    about: Find tutorials, use cases, and guides in the OpenManus documentation.
diff --git a/openmanus_rl/agentgym/OpenManus/.github/ISSUE_TEMPLATE/request_new_features.yaml b/openmanus_rl/agentgym/OpenManus/.github/ISSUE_TEMPLATE/request_new_features.yaml
deleted file mode 100644
index 749ab7fa..00000000
--- a/openmanus_rl/agentgym/OpenManus/.github/ISSUE_TEMPLATE/request_new_features.yaml
+++ /dev/null
@@ -1,21 +0,0 @@
-name: "🤔 Request new features"
-description: Suggest ideas or features you’d like to see implemented in OpenManus.
-labels: enhancement
-body:
-  - type: textarea
-    id: feature-description
-    attributes:
-      label: Feature description
-      description: |
-        Provide a clear and concise description of the proposed feature
-    validations:
-      required: true
-  - type: textarea
-    id: your-feature
-    attributes:
-      label: Your Feature
-      description: |
-        Explain your idea or implementation process, if any. Optionally, include a Pull Request URL.
-        Ensure accompanying docs/tests/examples are provided for review.
- validations: - required: false diff --git a/openmanus_rl/agentgym/OpenManus/.github/ISSUE_TEMPLATE/show_me_the_bug.yaml b/openmanus_rl/agentgym/OpenManus/.github/ISSUE_TEMPLATE/show_me_the_bug.yaml deleted file mode 100644 index de9298e8..00000000 --- a/openmanus_rl/agentgym/OpenManus/.github/ISSUE_TEMPLATE/show_me_the_bug.yaml +++ /dev/null @@ -1,44 +0,0 @@ -name: "🪲 Show me the Bug" -description: Report a bug encountered while using OpenManus and seek assistance. -labels: bug -body: - - type: textarea - id: bug-description - attributes: - label: Bug Description - description: | - Clearly describe the bug you encountered - validations: - required: true - - type: textarea - id: solve-method - attributes: - label: Bug solved method - description: | - If resolved, explain the solution. Optionally, include a Pull Request URL. - If unresolved, provide additional details to aid investigation - validations: - required: true - - type: textarea - id: environment-information - attributes: - label: Environment information - description: | - System: e.g., Ubuntu 22.04 - Python: e.g., 3.12 - OpenManus version: e.g., 0.1.0 - value: | - - System version: - - Python version: - - OpenManus version or branch: - - Installation method (e.g., `pip install -r requirements.txt` or `pip install -e .`): - validations: - required: true - - type: textarea - id: extra-information - attributes: - label: Extra information - description: | - For example, attach screenshots or logs to help diagnose the issue - validations: - required: false diff --git a/openmanus_rl/agentgym/OpenManus/.github/PULL_REQUEST_TEMPLATE.md b/openmanus_rl/agentgym/OpenManus/.github/PULL_REQUEST_TEMPLATE.md deleted file mode 100644 index 1859f27d..00000000 --- a/openmanus_rl/agentgym/OpenManus/.github/PULL_REQUEST_TEMPLATE.md +++ /dev/null @@ -1,17 +0,0 @@ -**Features** - - -- Feature 1 -- Feature 2 - -**Feature Docs** - - -**Influence** - - -**Result** - - -**Other** - diff --git a/openmanus_rl/agentgym/OpenManus/.github/dependabot.yml b/openmanus_rl/agentgym/OpenManus/.github/dependabot.yml deleted file mode 100644 index 1ef0e949..00000000 --- a/openmanus_rl/agentgym/OpenManus/.github/dependabot.yml +++ /dev/null @@ -1,58 +0,0 @@ -version: 2 -updates: - - package-ecosystem: "pip" - directory: "/" - schedule: - interval: "weekly" - open-pull-requests-limit: 4 - groups: - # Group critical packages that might need careful review - core-dependencies: - patterns: - - "pydantic*" - - "openai" - - "fastapi" - - "tiktoken" - browsergym-related: - patterns: - - "browsergym*" - - "browser-use" - - "playwright" - search-tools: - patterns: - - "googlesearch-python" - - "baidusearch" - - "duckduckgo_search" - pre-commit: - patterns: - - "pre-commit" - security-all: - applies-to: "security-updates" - patterns: - - "*" - version-all: - applies-to: "version-updates" - patterns: - - "*" - exclude-patterns: - - "pydantic*" - - "openai" - - "fastapi" - - "tiktoken" - - "browsergym*" - - "browser-use" - - "playwright" - - "googlesearch-python" - - "baidusearch" - - "duckduckgo_search" - - "pre-commit" - - - package-ecosystem: "github-actions" - directory: "/" - schedule: - interval: "weekly" - open-pull-requests-limit: 4 - groups: - actions: - patterns: - - "*" diff --git a/openmanus_rl/agentgym/OpenManus/.github/workflows/build-package.yaml b/openmanus_rl/agentgym/OpenManus/.github/workflows/build-package.yaml deleted file mode 100644 index 754c3007..00000000 --- a/openmanus_rl/agentgym/OpenManus/.github/workflows/build-package.yaml +++ /dev/null @@ -1,33 
+0,0 @@ -name: Build and upload Python package - -on: - workflow_dispatch: - release: - types: [created, published] - -jobs: - deploy: - runs-on: ubuntu-latest - steps: - - uses: actions/checkout@v4 - - name: Set up Python - uses: actions/setup-python@v5 - with: - python-version: '3.12' - cache: 'pip' - - name: Install dependencies - run: | - python -m pip install --upgrade pip - pip install -r requirements.txt - pip install setuptools wheel twine - - name: Set package version - run: | - export VERSION="${GITHUB_REF#refs/tags/v}" - sed -i "s/version=.*/version=\"${VERSION}\",/" setup.py - - name: Build and publish - env: - TWINE_USERNAME: __token__ - TWINE_PASSWORD: ${{ secrets.PYPI_API_TOKEN }} - run: | - python setup.py bdist_wheel sdist - twine upload dist/* diff --git a/openmanus_rl/agentgym/OpenManus/.github/workflows/environment-corrupt-check.yaml b/openmanus_rl/agentgym/OpenManus/.github/workflows/environment-corrupt-check.yaml deleted file mode 100644 index dc66fe04..00000000 --- a/openmanus_rl/agentgym/OpenManus/.github/workflows/environment-corrupt-check.yaml +++ /dev/null @@ -1,33 +0,0 @@ -name: Environment Corruption Check -on: - push: - branches: ["main"] - paths: - - requirements.txt - pull_request: - branches: ["main"] - paths: - - requirements.txt -concurrency: - group: ${{ github.workflow }}-${{ github.event_name }}-${{ github.ref }} - cancel-in-progress: true -jobs: - test-python-versions: - runs-on: ubuntu-latest - strategy: - matrix: - python-version: ["3.11.11", "3.12.8", "3.13.2"] - fail-fast: false - steps: - - name: Checkout repository - uses: actions/checkout@v4 - - name: Set up Python ${{ matrix.python-version }} - uses: actions/setup-python@v5 - with: - python-version: ${{ matrix.python-version }} - - name: Upgrade pip - run: | - python -m pip install --upgrade pip - - name: Install dependencies - run: | - pip install -r requirements.txt diff --git a/openmanus_rl/agentgym/OpenManus/.github/workflows/pr-autodiff.yaml b/openmanus_rl/agentgym/OpenManus/.github/workflows/pr-autodiff.yaml deleted file mode 100644 index 46c95c65..00000000 --- a/openmanus_rl/agentgym/OpenManus/.github/workflows/pr-autodiff.yaml +++ /dev/null @@ -1,138 +0,0 @@ -name: PR Diff Summarization -on: - # pull_request: - # branches: [main] - # types: [opened, ready_for_review, reopened] - issue_comment: - types: [created] -permissions: - contents: read - pull-requests: write -jobs: - pr-diff-summarization: - runs-on: ubuntu-latest - if: | - (github.event_name == 'pull_request') || - (github.event_name == 'issue_comment' && - contains(github.event.comment.body, '!pr-diff') && - (github.event.comment.author_association == 'CONTRIBUTOR' || github.event.comment.author_association == 'COLLABORATOR' || github.event.comment.author_association == 'MEMBER' || github.event.comment.author_association == 'OWNER') && - github.event.issue.pull_request) - steps: - - name: Get PR head SHA - id: get-pr-sha - run: | - PR_URL="${{ github.event.issue.pull_request.url || github.event.pull_request.url }}" - # https://api.github.com/repos/OpenManus/pulls/1 - RESPONSE=$(curl -s -H "Authorization: Bearer ${{ secrets.GITHUB_TOKEN }}" $PR_URL) - SHA=$(echo $RESPONSE | jq -r '.head.sha') - TARGET_BRANCH=$(echo $RESPONSE | jq -r '.base.ref') - echo "pr_sha=$SHA" >> $GITHUB_OUTPUT - echo "target_branch=$TARGET_BRANCH" >> $GITHUB_OUTPUT - echo "Retrieved PR head SHA from API: $SHA, target branch: $TARGET_BRANCH" - - name: Check out code - uses: actions/checkout@v4 - with: - ref: ${{ steps.get-pr-sha.outputs.pr_sha }} - 
fetch-depth: 0 - - name: Set up Python - uses: actions/setup-python@v5 - with: - python-version: '3.11' - - name: Install dependencies - run: | - python -m pip install --upgrade pip - pip install openai requests - - name: Create and run Python script - env: - OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} - OPENAI_BASE_URL: ${{ secrets.OPENAI_BASE_URL }} - GH_TOKEN: ${{ github.token }} - PR_NUMBER: ${{ github.event.pull_request.number || github.event.issue.number }} - TARGET_BRANCH: ${{ steps.get-pr-sha.outputs.target_branch }} - run: |- - cat << 'EOF' > /tmp/_workflow_core.py - import os - import subprocess - import json - import requests - from openai import OpenAI - - def get_diff(): - result = subprocess.run( - ['git', 'diff', 'origin/' + os.getenv('TARGET_BRANCH') + '...HEAD'], - capture_output=True, text=True, check=True) - return '\n'.join( - line for line in result.stdout.split('\n') - if any(line.startswith(c) for c in ('+', '-')) - and not line.startswith(('---', '+++')) - )[:round(200000 * 0.4)] # Truncate to prevent overflow - - def generate_comment(diff_content): - client = OpenAI( - base_url=os.getenv("OPENAI_BASE_URL"), - api_key=os.getenv("OPENAI_API_KEY") - ) - - guidelines = ''' - 1. English version first, Chinese Simplified version after - 2. Example format: - # Diff Report - ## English - - Added `ABC` class - - Fixed `f()` behavior in `foo` module - - ### Comments Highlight - - `config.toml` needs to be configured properly to make sure new features work as expected. - - ### Spelling/Offensive Content Check - - No spelling mistakes or offensive content found in the code or comments. - - ## 中文(简体) - - 新增了 `ABC` 类 - - `foo` 模块中的 `f()` 行为已修复 - - ### 评论高亮 - - `config.toml` 需要正确配置才能确保新功能正常运行。 - - ### 内容检查 - - 没有发现代码或注释中的拼写错误或不当措辞。 - - 3. Highlight non-English comments - 4. Check for spelling/offensive content''' - - response = client.chat.completions.create( - model="o3-mini", - messages=[{ - "role": "system", - "content": "Generate bilingual code review feedback." 
- }, { - "role": "user", - "content": f"Review these changes per guidelines:\n{guidelines}\n\nDIFF:\n{diff_content}" - }] - ) - return response.choices[0].message.content - - def post_comment(comment): - repo = os.getenv("GITHUB_REPOSITORY") - pr_number = os.getenv("PR_NUMBER") - - headers = { - "Authorization": f"Bearer {os.getenv('GH_TOKEN')}", - "Accept": "application/vnd.github.v3+json" - } - url = f"https://api.github.com/repos/{repo}/issues/{pr_number}/comments" - - requests.post(url, json={"body": comment}, headers=headers) - - if __name__ == "__main__": - diff_content = get_diff() - if not diff_content.strip(): - print("No meaningful diff detected.") - exit(0) - - comment = generate_comment(diff_content) - post_comment(comment) - print("Comment posted successfully.") - EOF - - python /tmp/_workflow_core.py diff --git a/openmanus_rl/agentgym/OpenManus/.github/workflows/pre-commit.yaml b/openmanus_rl/agentgym/OpenManus/.github/workflows/pre-commit.yaml deleted file mode 100644 index c9b3b22e..00000000 --- a/openmanus_rl/agentgym/OpenManus/.github/workflows/pre-commit.yaml +++ /dev/null @@ -1,26 +0,0 @@ -name: Pre-commit checks - -on: - pull_request: - branches: - - '**' - push: - branches: - - '**' - -jobs: - pre-commit-check: - runs-on: ubuntu-latest - steps: - - name: Checkout Source Code - uses: actions/checkout@v4 - - name: Set up Python 3.12 - uses: actions/setup-python@v5 - with: - python-version: '3.12' - - name: Install pre-commit and tools - run: | - python -m pip install --upgrade pip - pip install pre-commit black==23.1.0 isort==5.12.0 autoflake==2.0.1 - - name: Run pre-commit hooks - run: pre-commit run --all-files diff --git a/openmanus_rl/agentgym/OpenManus/.github/workflows/stale.yaml b/openmanus_rl/agentgym/OpenManus/.github/workflows/stale.yaml deleted file mode 100644 index ea52562d..00000000 --- a/openmanus_rl/agentgym/OpenManus/.github/workflows/stale.yaml +++ /dev/null @@ -1,23 +0,0 @@ -name: Close inactive issues - -on: - schedule: - - cron: "5 0 * * *" - -jobs: - close-issues: - runs-on: ubuntu-latest - permissions: - issues: write - pull-requests: write - steps: - - uses: actions/stale@v9 - with: - days-before-issue-stale: 30 - days-before-issue-close: 14 - stale-issue-label: "inactive" - stale-issue-message: "This issue has been inactive for 30 days. Please comment if you have updates." - close-issue-message: "This issue was closed due to 45 days of inactivity. Reopen if still relevant." 
- days-before-pr-stale: -1 - days-before-pr-close: -1 - repo-token: ${{ secrets.GITHUB_TOKEN }} diff --git a/openmanus_rl/agentgym/OpenManus/.github/workflows/top-issues.yaml b/openmanus_rl/agentgym/OpenManus/.github/workflows/top-issues.yaml deleted file mode 100644 index 9ad9f590..00000000 --- a/openmanus_rl/agentgym/OpenManus/.github/workflows/top-issues.yaml +++ /dev/null @@ -1,29 +0,0 @@ -name: Top issues -on: - schedule: - - cron: '0 0/2 * * *' - workflow_dispatch: -jobs: - ShowAndLabelTopIssues: - permissions: - issues: write - pull-requests: write - actions: read - contents: read - name: Display and label top issues - runs-on: ubuntu-latest - if: github.repository == 'mannaandpoem/OpenManus' - steps: - - name: Run top issues action - uses: rickstaa/top-issues-action@7e8dda5d5ae3087670f9094b9724a9a091fc3ba1 # v1.3.101 - env: - github_token: ${{ secrets.GITHUB_TOKEN }} - with: - label: true - dashboard: true - dashboard_show_total_reactions: true - top_issues: true - top_features: true - top_bugs: true - top_pull_requests: true - top_list_size: 14 diff --git a/openmanus_rl/agentgym/OpenManus/.gitignore b/openmanus_rl/agentgym/OpenManus/.gitignore deleted file mode 100644 index 857ec7e7..00000000 --- a/openmanus_rl/agentgym/OpenManus/.gitignore +++ /dev/null @@ -1,199 +0,0 @@ -### Project-specific ### -# Logs -logs/ - -# Data -data/ - -# Workspace -workspace/ - -### Python ### -# Byte-compiled / optimized / DLL files -__pycache__/ -*.py[cod] -*$py.class - -# C extensions -*.so - -# Distribution / packaging -.Python -build/ -develop-eggs/ -dist/ -downloads/ -eggs/ -.eggs/ -lib/ -lib64/ -parts/ -sdist/ -var/ -wheels/ -share/python-wheels/ -*.egg-info/ -.installed.cfg -*.egg -MANIFEST - -# PyInstaller -# Usually these files are written by a python script from a template -# before PyInstaller builds the exe, so as to inject date/other infos into it. -*.manifest -*.spec - -# Installer logs -pip-log.txt -pip-delete-this-directory.txt - -# Unit test / coverage reports -htmlcov/ -.tox/ -.nox/ -.coverage -.coverage.* -.cache -nosetests.xml -coverage.xml -*.cover -*.py,cover -.hypothesis/ -.pytest_cache/ -cover/ - -# Translations -*.mo -*.pot - -# Django stuff: -*.log -local_settings.py -db.sqlite3 -db.sqlite3-journal - -# Flask stuff: -instance/ -.webassets-cache - -# Scrapy stuff: -.scrapy - -# Sphinx documentation -docs/_build/ - -# PyBuilder -.pybuilder/ -target/ - -# Jupyter Notebook -.ipynb_checkpoints - -# IPython -profile_default/ -ipython_config.py - -# pyenv -# For a library or package, you might want to ignore these files since the code is -# intended to run in multiple environments; otherwise, check them in: -# .python-version - -# pipenv -# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. -# However, in case of collaboration, if having platform-specific dependencies or dependencies -# having no cross-platform support, pipenv may install dependencies that don't work, or not -# install all needed dependencies. -#Pipfile.lock - -# UV -# Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control. -# This is especially recommended for binary packages to ensure reproducibility, and is more -# commonly ignored for libraries. -#uv.lock - -# poetry -# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control. -# This is especially recommended for binary packages to ensure reproducibility, and is more -# commonly ignored for libraries. 
-# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control -#poetry.lock - -# pdm -# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control. -#pdm.lock -# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it -# in version control. -# https://pdm.fming.dev/latest/usage/project/#working-with-version-control -.pdm.toml -.pdm-python -.pdm-build/ - -# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm -__pypackages__/ - -# Celery stuff -celerybeat-schedule -celerybeat.pid - -# SageMath parsed files -*.sage.py - -# Environments -.env -.venv -env/ -venv/ -ENV/ -env.bak/ -venv.bak/ - -# Spyder project settings -.spyderproject -.spyproject - -# Rope project settings -.ropeproject - -# mkdocs documentation -/site - -# mypy -.mypy_cache/ -.dmypy.json -dmypy.json - -# Pyre type checker -.pyre/ - -# pytype static type analyzer -.pytype/ - -# Cython debug symbols -cython_debug/ - -# PyCharm -# JetBrains specific template is maintained in a separate JetBrains.gitignore that can -# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore -# and can be added to the global gitignore or merged into this file. For a more nuclear -# option (not recommended) you can uncomment the following to ignore the entire idea folder. -.idea/ - -# PyPI configuration file -.pypirc - -### Visual Studio Code ### -.vscode/* -!.vscode/settings.json -!.vscode/tasks.json -!.vscode/launch.json -!.vscode/extensions.json -!.vscode/*.code-snippets - -# Local History for Visual Studio Code -.history/ - -# Built Visual Studio Code Extensions -*.vsix - -# OSX -.DS_Store diff --git a/openmanus_rl/agentgym/OpenManus/.pre-commit-config.yaml b/openmanus_rl/agentgym/OpenManus/.pre-commit-config.yaml deleted file mode 100644 index 3b6c1ba4..00000000 --- a/openmanus_rl/agentgym/OpenManus/.pre-commit-config.yaml +++ /dev/null @@ -1,39 +0,0 @@ -repos: - - repo: https://github.com/psf/black - rev: 23.1.0 - hooks: - - id: black - - - repo: https://github.com/pre-commit/pre-commit-hooks - rev: v4.4.0 - hooks: - - id: trailing-whitespace - - id: end-of-file-fixer - - id: check-yaml - - id: check-added-large-files - - - repo: https://github.com/PyCQA/autoflake - rev: v2.0.1 - hooks: - - id: autoflake - args: [ - --remove-all-unused-imports, - --ignore-init-module-imports, - --expand-star-imports, - --remove-duplicate-keys, - --remove-unused-variables, - --recursive, - --in-place, - --exclude=__init__.py, - ] - files: \.py$ - - - repo: https://github.com/pycqa/isort - rev: 5.12.0 - hooks: - - id: isort - args: [ - "--profile", "black", - "--filter-files", - "--lines-after-imports=2", - ] diff --git a/openmanus_rl/agentgym/OpenManus/CODE_OF_CONDUCT.md b/openmanus_rl/agentgym/OpenManus/CODE_OF_CONDUCT.md deleted file mode 100644 index 42eb10cc..00000000 --- a/openmanus_rl/agentgym/OpenManus/CODE_OF_CONDUCT.md +++ /dev/null @@ -1,162 +0,0 @@ -# Contributor Covenant Code of Conduct - -## Our Pledge - -We as members, contributors, and leaders pledge to make participation in our -community a harassment-free experience for everyone, regardless of age, body -size, visible or invisible disability, ethnicity, sex characteristics, gender -identity and expression, level of experience, education, socio-economic status, -nationality, personal appearance, race, caste, color, religion, or sexual -identity and orientation. 
- -We pledge to act and interact in ways that contribute to an open, welcoming, -diverse, inclusive, and healthy community. - -## Our Standards - -Examples of behavior that contributes to a positive environment for our -community include: - -* Demonstrating empathy and kindness toward other people. -* Being respectful of differing opinions, viewpoints, and experiences. -* Giving and gracefully accepting constructive feedback. -* Accepting responsibility and apologizing to those affected by our mistakes, - and learning from the experience. -* Focusing on what is best not just for us as individuals, but for the overall - community. - -Examples of unacceptable behavior include: - -* The use of sexualized language or imagery, and sexual attention or advances of - any kind. -* Trolling, insulting or derogatory comments, and personal or political attacks. -* Public or private harassment. -* Publishing others' private information, such as a physical or email address, - without their explicit permission. -* Other conduct which could reasonably be considered inappropriate in a - professional setting. - -## Enforcement Responsibilities - -Community leaders are responsible for clarifying and enforcing our standards of -acceptable behavior and will take appropriate and fair corrective action in -response to any behavior that they deem inappropriate, threatening, offensive, -or harmful. - -Community leaders have the right and responsibility to remove, edit, or reject -comments, commits, code, wiki edits, issues, and other contributions that are -not aligned to this Code of Conduct, and will communicate reasons for moderation -decisions when appropriate. - -## Scope - -This Code of Conduct applies within all community spaces, and also applies when -an individual is officially representing the community in public spaces. -Examples of representing our community include using an official email address, -posting via an official social media account, or acting as an appointed -representative at an online or offline event. - -## Enforcement - -Instances of abusive, harassing, or otherwise unacceptable behavior may be -reported to the community leaders responsible for enforcement at -mannaandpoem@gmail.com -All complaints will be reviewed and investigated promptly and fairly. - -All community leaders are obligated to respect the privacy and security of the -reporter of any incident. - -## Enforcement Guidelines - -Community leaders will follow these Community Impact Guidelines in determining -the consequences for any action they deem in violation of this Code of Conduct: - -### 1. Correction - -**Community Impact**: Use of inappropriate language or other behavior deemed -unprofessional or unwelcome in the community. - -**Consequence**: A private, written warning from community leaders, providing -clarity around the nature of the violation and an explanation of why the -behavior was inappropriate. A public apology may be requested. - -### 2. Warning - -**Community Impact**: A violation through a single incident or series of -actions. - -**Consequence**: A warning with consequences for continued behavior. No -interaction with the people involved, including unsolicited interaction with -those enforcing the Code of Conduct, for a specified period of time. This -includes avoiding interactions in community spaces as well as external channels -like social media. Violating these terms may lead to a temporary or permanent -ban. - -### 3. 
Temporary Ban - -**Community Impact**: A serious violation of community standards, including -sustained inappropriate behavior. - -**Consequence**: A temporary ban from any sort of interaction or public -communication with the community for a specified period of time. No public or -private interaction with the people involved, including unsolicited interaction -with those enforcing the Code of Conduct, is allowed during this period. -Violating these terms may lead to a permanent ban. - -### 4. Permanent Ban - -**Community Impact**: Demonstrating a pattern of violation of community -standards, including sustained inappropriate behavior, harassment of an -individual, or aggression toward or disparagement of classes of individuals. - -**Consequence**: A permanent ban from any sort of public interaction within the -community. - -### Slack and Discord Etiquettes - -These Slack and Discord etiquette guidelines are designed to foster an inclusive, respectful, and productive environment -for all community members. By following these best practices, we ensure effective communication and collaboration while -minimizing disruptions. Let’s work together to build a supportive and welcoming community! - -- Communicate respectfully and professionally, avoiding sarcasm or harsh language, and remember that tone can be - difficult to interpret in text. -- Use threads for specific discussions to keep channels organized and easier to follow. -- Tag others only when their input is critical or urgent, and use @here, @channel or @everyone sparingly to minimize - disruptions. -- Be patient, as open-source contributors and maintainers often have other commitments and may need time to respond. -- Post questions or discussions in the most relevant - channel ([discord - #general](https://discord.com/channels/1125308739348594758/1138430348557025341)). -- When asking for help or raising issues, include necessary details like links, screenshots, or clear explanations to - provide context. -- Keep discussions in public channels whenever possible to allow others to benefit from the conversation, unless the - matter is sensitive or private. -- Always adhere to [our standards](https://github.com/mannaandpoem/OpenManus/blob/main/CODE_OF_CONDUCT.md#our-standards) - to ensure a welcoming and collaborative environment. -- If you choose to mute a channel, consider setting up alerts for topics that still interest you to stay engaged. For - Slack, Go to Settings → Notifications → My Keywords to add specific keywords that will notify you when mentioned. For - example, if you're here for discussions about LLMs, mute the channel if it’s too busy, but set notifications to alert - you only when “LLMs” appears in messages. Also for Discord, go to the channel notifications and choose the option that - best describes your need. - -## Attribution - -This Code of Conduct is adapted from the [Contributor Covenant][homepage], -version 2.1, available at -[https://www.contributor-covenant.org/version/2/1/code_of_conduct.html][v2.1]. - -Community Impact Guidelines were inspired by -[Mozilla's code of conduct enforcement ladder][Mozilla CoC]. - -For answers to common questions about this code of conduct, see the FAQ at -[https://www.contributor-covenant.org/faq][FAQ]. Translations are available at -[https://www.contributor-covenant.org/translations][translations]. 
- -[homepage]: https://www.contributor-covenant.org - -[v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html - -[Mozilla CoC]: https://github.com/mozilla/diversity - -[FAQ]: https://www.contributor-covenant.org/faq - -[translations]: https://www.contributor-covenant.org/translations diff --git a/openmanus_rl/agentgym/OpenManus/Dockerfile b/openmanus_rl/agentgym/OpenManus/Dockerfile deleted file mode 100644 index 9f7a1908..00000000 --- a/openmanus_rl/agentgym/OpenManus/Dockerfile +++ /dev/null @@ -1,13 +0,0 @@ -FROM python:3.12-slim - -WORKDIR /app/OpenManus - -RUN apt-get update && apt-get install -y --no-install-recommends git curl \ - && rm -rf /var/lib/apt/lists/* \ - && (command -v uv >/dev/null 2>&1 || pip install --no-cache-dir uv) - -COPY . . - -RUN uv pip install --system -r requirements.txt - -CMD ["bash"] diff --git a/openmanus_rl/agentgym/OpenManus/LICENSE b/openmanus_rl/agentgym/OpenManus/LICENSE deleted file mode 100644 index db2216e1..00000000 --- a/openmanus_rl/agentgym/OpenManus/LICENSE +++ /dev/null @@ -1,21 +0,0 @@ -MIT License - -Copyright (c) 2025 manna_and_poem - -Permission is hereby granted, free of charge, to any person obtaining a copy -of this software and associated documentation files (the "Software"), to deal -in the Software without restriction, including without limitation the rights -to use, copy, modify, merge, publish, distribute, sublicense, and/or sell -copies of the Software, and to permit persons to whom the Software is -furnished to do so, subject to the following conditions: - -The above copyright notice and this permission notice shall be included in all -copies or substantial portions of the Software. - -THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR -IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, -FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE -AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER -LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE -SOFTWARE. diff --git a/openmanus_rl/agentgym/OpenManus/README.md b/openmanus_rl/agentgym/OpenManus/README.md deleted file mode 100644 index d92c4e64..00000000 --- a/openmanus_rl/agentgym/OpenManus/README.md +++ /dev/null @@ -1,176 +0,0 @@ -

- -

- -English | [中文](README_zh.md) | [한국어](README_ko.md) | [日本語](README_ja.md) - -[![GitHub stars](https://img.shields.io/github/stars/mannaandpoem/OpenManus?style=social)](https://github.com/mannaandpoem/OpenManus/stargazers) -  -[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)   -[![Discord Follow](https://dcbadge.vercel.app/api/server/DYn29wFk9z?style=flat)](https://discord.gg/DYn29wFk9z) - -# 👋 OpenManus - -Manus is incredible, but OpenManus can achieve any idea without an *Invite Code* 🛫! - -Our team members [@Xinbin Liang](https://github.com/mannaandpoem) and [@Jinyu Xiang](https://github.com/XiangJinyu) (core authors), along with [@Zhaoyang Yu](https://github.com/MoshiQAQ), [@Jiayi Zhang](https://github.com/didiforgithub), and [@Sirui Hong](https://github.com/stellaHSR), we are from [@MetaGPT](https://github.com/geekan/MetaGPT). The prototype is launched within 3 hours and we are keeping building! - -It's a simple implementation, so we welcome any suggestions, contributions, and feedback! - -Enjoy your own agent with OpenManus! - -We're also excited to introduce [OpenManus-RL](https://github.com/OpenManus/OpenManus-RL), an open-source project dedicated to reinforcement learning (RL)- based (such as GRPO) tuning methods for LLM agents, developed collaboratively by researchers from UIUC and OpenManus. - -## Project Demo - - - -## Installation - -We provide two installation methods. Method 2 (using uv) is recommended for faster installation and better dependency management. - -### Method 1: Using conda - -1. Create a new conda environment: - -```bash -conda create -n open_manus python=3.12 -conda activate open_manus -``` - -2. Clone the repository: - -```bash -git clone https://github.com/mannaandpoem/OpenManus.git -cd OpenManus -``` - -3. Install dependencies: - -```bash -pip install -r requirements.txt -``` - -### Method 2: Using uv (Recommended) - -1. Install uv (A fast Python package installer and resolver): - -```bash -curl -LsSf https://astral.sh/uv/install.sh | sh -``` - -2. Clone the repository: - -```bash -git clone https://github.com/mannaandpoem/OpenManus.git -cd OpenManus -``` - -3. Create a new virtual environment and activate it: - -```bash -uv venv --python 3.12 -source .venv/bin/activate # On Unix/macOS -# Or on Windows: -# .venv\Scripts\activate -``` - -4. Install dependencies: - -```bash -uv pip install -r requirements.txt -``` - -### Browser Automation Tool (Optional) -```bash -playwright install -``` - -## Configuration - -OpenManus requires configuration for the LLM APIs it uses. Follow these steps to set up your configuration: - -1. Create a `config.toml` file in the `config` directory (you can copy from the example): - -```bash -cp config/config.example.toml config/config.toml -``` - -2. Edit `config/config.toml` to add your API keys and customize settings: - -```toml -# Global LLM configuration -[llm] -model = "gpt-4o" -base_url = "https://api.openai.com/v1" -api_key = "sk-..." # Replace with your actual API key -max_tokens = 4096 -temperature = 0.0 - -# Optional configuration for specific LLM models -[llm.vision] -model = "gpt-4o" -base_url = "https://api.openai.com/v1" -api_key = "sk-..." # Replace with your actual API key -``` - -## Quick Start - -One line for run OpenManus: - -```bash -python main.py -``` - -Then input your idea via terminal! 
- -For MCP tool version, you can run: -```bash -python run_mcp.py -``` - -For unstable multi-agent version, you also can run: - -```bash -python run_flow.py -``` - -## How to contribute - -We welcome any friendly suggestions and helpful contributions! Just create issues or submit pull requests. - -Or contact @mannaandpoem via 📧email: mannaandpoem@gmail.com - -**Note**: Before submitting a pull request, please use the pre-commit tool to check your changes. Run `pre-commit run --all-files` to execute the checks. - -## Community Group -Join our networking group on Feishu and share your experience with other developers! - -
- OpenManus 交流群 -
- -## Star History - -[![Star History Chart](https://api.star-history.com/svg?repos=mannaandpoem/OpenManus&type=Date)](https://star-history.com/#mannaandpoem/OpenManus&Date) - -## Acknowledgement - -Thanks to [anthropic-computer-use](https://github.com/anthropics/anthropic-quickstarts/tree/main/computer-use-demo) -and [browser-use](https://github.com/browser-use/browser-use) for providing basic support for this project! - -Additionally, we are grateful to [AAAJ](https://github.com/metauto-ai/agent-as-a-judge), [MetaGPT](https://github.com/geekan/MetaGPT), [OpenHands](https://github.com/All-Hands-AI/OpenHands) and [SWE-agent](https://github.com/SWE-agent/SWE-agent). - -OpenManus is built by contributors from MetaGPT. Huge thanks to this agent community! - -## Cite -```bibtex -@misc{openmanus2025, - author = {Xinbin Liang and Jinyu Xiang and Zhaoyang Yu and Jiayi Zhang and Sirui Hong}, - title = {OpenManus: An open-source framework for building general AI agents}, - year = {2025}, - publisher = {GitHub}, - journal = {GitHub repository}, - howpublished = {\url{https://github.com/mannaandpoem/OpenManus}}, -} -``` diff --git a/openmanus_rl/agentgym/OpenManus/README_ja.md b/openmanus_rl/agentgym/OpenManus/README_ja.md deleted file mode 100644 index be163fbf..00000000 --- a/openmanus_rl/agentgym/OpenManus/README_ja.md +++ /dev/null @@ -1,175 +0,0 @@ -

- -

- -[English](README.md) | [中文](README_zh.md) | [한국어](README_ko.md) | 日本語 - -[![GitHub stars](https://img.shields.io/github/stars/mannaandpoem/OpenManus?style=social)](https://github.com/mannaandpoem/OpenManus/stargazers) -  -[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)   -[![Discord Follow](https://dcbadge.vercel.app/api/server/DYn29wFk9z?style=flat)](https://discord.gg/DYn29wFk9z) - -# 👋 OpenManus - -Manusは素晴らしいですが、OpenManusは*招待コード*なしでどんなアイデアも実現できます!🛫 - -私たちのチームメンバー [@Xinbin Liang](https://github.com/mannaandpoem) と [@Jinyu Xiang](https://github.com/XiangJinyu)(主要開発者)、そして [@Zhaoyang Yu](https://github.com/MoshiQAQ)、[@Jiayi Zhang](https://github.com/didiforgithub)、[@Sirui Hong](https://github.com/stellaHSR) は [@MetaGPT](https://github.com/geekan/MetaGPT) から来ました。プロトタイプは3時間以内に立ち上げられ、継続的に開発を進めています! - -これはシンプルな実装ですので、どんな提案、貢献、フィードバックも歓迎します! - -OpenManusで自分だけのエージェントを楽しみましょう! - -また、UIUCとOpenManusの研究者が共同開発した[OpenManus-RL](https://github.com/OpenManus/OpenManus-RL)をご紹介できることを嬉しく思います。これは強化学習(RL)ベース(GRPOなど)のLLMエージェントチューニング手法に特化したオープンソースプロジェクトです。 - -## プロジェクトデモ - - - -## インストール方法 - -インストール方法は2つ提供しています。方法2(uvを使用)は、より高速なインストールと優れた依存関係管理のため推奨されています。 - -### 方法1:condaを使用 - -1. 新しいconda環境を作成します: - -```bash -conda create -n open_manus python=3.12 -conda activate open_manus -``` - -2. リポジトリをクローンします: - -```bash -git clone https://github.com/mannaandpoem/OpenManus.git -cd OpenManus -``` - -3. 依存関係をインストールします: - -```bash -pip install -r requirements.txt -``` - -### 方法2:uvを使用(推奨) - -1. uv(高速なPythonパッケージインストーラーと管理機能)をインストールします: - -```bash -curl -LsSf https://astral.sh/uv/install.sh | sh -``` - -2. リポジトリをクローンします: - -```bash -git clone https://github.com/mannaandpoem/OpenManus.git -cd OpenManus -``` - -3. 新しい仮想環境を作成してアクティベートします: - -```bash -uv venv --python 3.12 -source .venv/bin/activate # Unix/macOSの場合 -# Windowsの場合: -# .venv\Scripts\activate -``` - -4. 依存関係をインストールします: - -```bash -uv pip install -r requirements.txt -``` - -### ブラウザ自動化ツール(オプション) -```bash -playwright install -``` - -## 設定 - -OpenManusを使用するには、LLM APIの設定が必要です。以下の手順に従って設定してください: - -1. `config`ディレクトリに`config.toml`ファイルを作成します(サンプルからコピーできます): - -```bash -cp config/config.example.toml config/config.toml -``` - -2. `config/config.toml`を編集してAPIキーを追加し、設定をカスタマイズします: - -```toml -# グローバルLLM設定 -[llm] -model = "gpt-4o" -base_url = "https://api.openai.com/v1" -api_key = "sk-..." # 実際のAPIキーに置き換えてください -max_tokens = 4096 -temperature = 0.0 - -# 特定のLLMモデル用のオプション設定 -[llm.vision] -model = "gpt-4o" -base_url = "https://api.openai.com/v1" -api_key = "sk-..." # 実際のAPIキーに置き換えてください -``` - -## クイックスタート - -OpenManusを実行する一行コマンド: - -```bash -python main.py -``` - -その後、ターミナルからプロンプトを入力してください! - -MCP ツールバージョンを使用する場合は、以下を実行します: -```bash -python run_mcp.py -``` - -開発中のマルチエージェントバージョンを試すには、以下を実行します: - -```bash -python run_flow.py -``` - -## 貢献方法 - -我々は建設的な意見や有益な貢献を歓迎します!issueを作成するか、プルリクエストを提出してください。 - -または @mannaandpoem に📧メールでご連絡ください:mannaandpoem@gmail.com - -**注意**: プルリクエストを送信する前に、pre-commitツールを使用して変更を確認してください。`pre-commit run --all-files`を実行してチェックを実行します。 - -## コミュニティグループ -Feishuのネットワーキンググループに参加して、他の開発者と経験を共有しましょう! - -
- OpenManus 交流群 -
- -## スター履歴 - -[![Star History Chart](https://api.star-history.com/svg?repos=mannaandpoem/OpenManus&type=Date)](https://star-history.com/#mannaandpoem/OpenManus&Date) - -## 謝辞 - -このプロジェクトの基本的なサポートを提供してくれた[anthropic-computer-use](https://github.com/anthropics/anthropic-quickstarts/tree/main/computer-use-demo) -と[browser-use](https://github.com/browser-use/browser-use)に感謝します! - -さらに、[AAAJ](https://github.com/metauto-ai/agent-as-a-judge)、[MetaGPT](https://github.com/geekan/MetaGPT)、[OpenHands](https://github.com/All-Hands-AI/OpenHands)、[SWE-agent](https://github.com/SWE-agent/SWE-agent)にも感謝します。 - -OpenManusはMetaGPTのコントリビューターによって構築されました。このエージェントコミュニティに大きな感謝を! - -## 引用 -```bibtex -@misc{openmanus2025, - author = {Xinbin Liang and Jinyu Xiang and Zhaoyang Yu and Jiayi Zhang and Sirui Hong}, - title = {OpenManus: An open-source framework for building general AI agents}, - year = {2025}, - publisher = {GitHub}, - journal = {GitHub repository}, - howpublished = {\url{https://github.com/mannaandpoem/OpenManus}}, -} diff --git a/openmanus_rl/agentgym/OpenManus/README_ko.md b/openmanus_rl/agentgym/OpenManus/README_ko.md deleted file mode 100644 index 9a17c4b0..00000000 --- a/openmanus_rl/agentgym/OpenManus/README_ko.md +++ /dev/null @@ -1,176 +0,0 @@ -

- -

- -[English](README.md) | [中文](README_zh.md) | 한국어 | [日本語](README_ja.md) - -[![GitHub stars](https://img.shields.io/github/stars/mannaandpoem/OpenManus?style=social)](https://github.com/mannaandpoem/OpenManus/stargazers) -  -[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)   -[![Discord Follow](https://dcbadge.vercel.app/api/server/DYn29wFk9z?style=flat)](https://discord.gg/DYn29wFk9z) - -# 👋 OpenManus - -Manus는 놀라운 도구지만, OpenManus는 *초대 코드* 없이도 모든 아이디어를 실현할 수 있습니다! 🛫 - -우리 팀의 멤버인 [@Xinbin Liang](https://github.com/mannaandpoem)와 [@Jinyu Xiang](https://github.com/XiangJinyu) (핵심 작성자), 그리고 [@Zhaoyang Yu](https://github.com/MoshiQAQ), [@Jiayi Zhang](https://github.com/didiforgithub), [@Sirui Hong](https://github.com/stellaHSR)이 함께 했습니다. 우리는 [@MetaGPT](https://github.com/geekan/MetaGPT)로부터 왔습니다. 프로토타입은 단 3시간 만에 출시되었으며, 계속해서 발전하고 있습니다! - -이 프로젝트는 간단한 구현에서 시작되었으며, 여러분의 제안, 기여 및 피드백을 환영합니다! - -OpenManus를 통해 여러분만의 에이전트를 즐겨보세요! - -또한 [OpenManus-RL](https://github.com/OpenManus/OpenManus-RL)을 소개하게 되어 기쁩니다. OpenManus와 UIUC 연구자들이 공동 개발한 이 오픈소스 프로젝트는 LLM 에이전트에 대해 강화 학습(RL) 기반 (예: GRPO) 튜닝 방법을 제공합니다. - -## 프로젝트 데모 - - - -## 설치 방법 - -두 가지 설치 방법을 제공합니다. **방법 2 (uv 사용)** 이 더 빠른 설치와 효율적인 종속성 관리를 위해 권장됩니다. - -### 방법 1: conda 사용 - -1. 새로운 conda 환경을 생성합니다: - -```bash -conda create -n open_manus python=3.12 -conda activate open_manus -``` - -2. 저장소를 클론합니다: - -```bash -git clone https://github.com/mannaandpoem/OpenManus.git -cd OpenManus -``` - -3. 종속성을 설치합니다: - -```bash -pip install -r requirements.txt -``` - -### 방법 2: uv 사용 (권장) - -1. uv를 설치합니다. (빠른 Python 패키지 설치 및 종속성 관리 도구): - -```bash -curl -LsSf https://astral.sh/uv/install.sh | sh -``` - -2. 저장소를 클론합니다: - -```bash -git clone https://github.com/mannaandpoem/OpenManus.git -cd OpenManus -``` - -3. 새로운 가상 환경을 생성하고 활성화합니다: - -```bash -uv venv --python 3.12 -source .venv/bin/activate # Unix/macOS의 경우 -# Windows의 경우: -# .venv\Scripts\activate -``` - -4. 종속성을 설치합니다: - -```bash -uv pip install -r requirements.txt -``` - -### 브라우저 자동화 도구 (선택사항) -```bash -playwright install -``` - -## 설정 방법 - -OpenManus를 사용하려면 사용하는 LLM API에 대한 설정이 필요합니다. 아래 단계를 따라 설정을 완료하세요: - -1. `config` 디렉토리에 `config.toml` 파일을 생성하세요 (예제 파일을 복사하여 사용할 수 있습니다): - -```bash -cp config/config.example.toml config/config.toml -``` - -2. `config/config.toml` 파일을 편집하여 API 키를 추가하고 설정을 커스터마이징하세요: - -```toml -# 전역 LLM 설정 -[llm] -model = "gpt-4o" -base_url = "https://api.openai.com/v1" -api_key = "sk-..." # 실제 API 키로 변경하세요 -max_tokens = 4096 -temperature = 0.0 - -# 특정 LLM 모델에 대한 선택적 설정 -[llm.vision] -model = "gpt-4o" -base_url = "https://api.openai.com/v1" -api_key = "sk-..." # 실제 API 키로 변경하세요 -``` - -## 빠른 시작 - -OpenManus를 실행하는 한 줄 명령어: - -```bash -python main.py -``` - -이후 터미널에서 아이디어를 작성하세요! - -MCP 도구 버전을 사용하려면 다음을 실행하세요: -```bash -python run_mcp.py -``` - -불안정한 멀티 에이전트 버전을 실행하려면 다음을 실행할 수 있습니다: - -```bash -python run_flow.py -``` - -## 기여 방법 - -모든 친절한 제안과 유용한 기여를 환영합니다! 이슈를 생성하거나 풀 리퀘스트를 제출해 주세요. - -또는 📧 메일로 연락주세요. @mannaandpoem : mannaandpoem@gmail.com - -**참고**: pull request를 제출하기 전에 pre-commit 도구를 사용하여 변경 사항을 확인하십시오. `pre-commit run --all-files`를 실행하여 검사를 실행합니다. - -## 커뮤니티 그룹 -Feishu 네트워킹 그룹에 참여하여 다른 개발자들과 경험을 공유하세요! - -
- OpenManus 交流群 -
- -## Star History - -[![Star History Chart](https://api.star-history.com/svg?repos=mannaandpoem/OpenManus&type=Date)](https://star-history.com/#mannaandpoem/OpenManus&Date) - -## 감사의 글 - -이 프로젝트에 기본적인 지원을 제공해 주신 [anthropic-computer-use](https://github.com/anthropics/anthropic-quickstarts/tree/main/computer-use-demo)와 -[browser-use](https://github.com/browser-use/browser-use)에게 감사드립니다! - -또한, [AAAJ](https://github.com/metauto-ai/agent-as-a-judge), [MetaGPT](https://github.com/geekan/MetaGPT), [OpenHands](https://github.com/All-Hands-AI/OpenHands), [SWE-agent](https://github.com/SWE-agent/SWE-agent)에 깊은 감사를 드립니다. - -OpenManus는 MetaGPT 기여자들에 의해 개발되었습니다. 이 에이전트 커뮤니티에 깊은 감사를 전합니다! - -## 인용 -```bibtex -@misc{openmanus2025, - author = {Xinbin Liang and Jinyu Xiang and Zhaoyang Yu and Jiayi Zhang and Sirui Hong}, - title = {OpenManus: An open-source framework for building general AI agents}, - year = {2025}, - publisher = {GitHub}, - journal = {GitHub repository}, - howpublished = {\url{https://github.com/mannaandpoem/OpenManus}}, -} -``` diff --git a/openmanus_rl/agentgym/OpenManus/README_zh.md b/openmanus_rl/agentgym/OpenManus/README_zh.md deleted file mode 100644 index bbd75505..00000000 --- a/openmanus_rl/agentgym/OpenManus/README_zh.md +++ /dev/null @@ -1,179 +0,0 @@ -

- -

- -[English](README.md) | 中文 | [한국어](README_ko.md) | [日本語](README_ja.md) - -[![GitHub stars](https://img.shields.io/github/stars/mannaandpoem/OpenManus?style=social)](https://github.com/mannaandpoem/OpenManus/stargazers) -  -[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)   -[![Discord Follow](https://dcbadge.vercel.app/api/server/DYn29wFk9z?style=flat)](https://discord.gg/DYn29wFk9z) - -# 👋 OpenManus - -Manus 非常棒,但 OpenManus 无需邀请码即可实现任何创意 🛫! - -我们的团队成员 [@Xinbin Liang](https://github.com/mannaandpoem) 和 [@Jinyu Xiang](https://github.com/XiangJinyu)(核心作者),以及 [@Zhaoyang Yu](https://github.com/MoshiQAQ)、[@Jiayi Zhang](https://github.com/didiforgithub) 和 [@Sirui Hong](https://github.com/stellaHSR),来自 [@MetaGPT](https://github.com/geekan/MetaGPT)团队。我们在 3 -小时内完成了开发并持续迭代中! - -这是一个简洁的实现方案,欢迎任何建议、贡献和反馈! - -用 OpenManus 开启你的智能体之旅吧! - -我们也非常高兴地向大家介绍 [OpenManus-RL](https://github.com/OpenManus/OpenManus-RL),这是一个专注于基于强化学习(RL,例如 GRPO)的方法来优化大语言模型(LLM)智能体的开源项目,由来自UIUC 和 OpenManus 的研究人员合作开发。 - -## 项目演示 - - - -## 安装指南 - -我们提供两种安装方式。推荐使用方式二(uv),因为它能提供更快的安装速度和更好的依赖管理。 - -### 方式一:使用 conda - -1. 创建新的 conda 环境: - -```bash -conda create -n open_manus python=3.12 -conda activate open_manus -``` - -2. 克隆仓库: - -```bash -git clone https://github.com/mannaandpoem/OpenManus.git -cd OpenManus -``` - -3. 安装依赖: - -```bash -pip install -r requirements.txt -``` - -### 方式二:使用 uv(推荐) - -1. 安装 uv(一个快速的 Python 包管理器): - -```bash -curl -LsSf https://astral.sh/uv/install.sh | sh -``` - -2. 克隆仓库: - -```bash -git clone https://github.com/mannaandpoem/OpenManus.git -cd OpenManus -``` - -3. 创建并激活虚拟环境: - -```bash -uv venv --python 3.12 -source .venv/bin/activate # Unix/macOS 系统 -# Windows 系统使用: -# .venv\Scripts\activate -``` - -4. 安装依赖: - -```bash -uv pip install -r requirements.txt -``` - -### 浏览器自动化工具(可选) -```bash -playwright install -``` - -## 配置说明 - -OpenManus 需要配置使用的 LLM API,请按以下步骤设置: - -1. 在 `config` 目录创建 `config.toml` 文件(可从示例复制): - -```bash -cp config/config.example.toml config/config.toml -``` - -2. 编辑 `config/config.toml` 添加 API 密钥和自定义设置: - -```toml -# 全局 LLM 配置 -[llm] -model = "gpt-4o" -base_url = "https://api.openai.com/v1" -api_key = "sk-..." # 替换为真实 API 密钥 -max_tokens = 4096 -temperature = 0.0 - -# 可选特定 LLM 模型配置 -[llm.vision] -model = "gpt-4o" -base_url = "https://api.openai.com/v1" -api_key = "sk-..." # 替换为真实 API 密钥 -``` - -## 快速启动 - -一行命令运行 OpenManus: - -```bash -python main.py -``` - -然后通过终端输入你的创意! - -如需使用 MCP 工具版本,可运行: -```bash -python run_mcp.py -``` - -如需体验不稳定的多智能体版本,可运行: - -```bash -python run_flow.py -``` - -## 贡献指南 - -我们欢迎任何友好的建议和有价值的贡献!可以直接创建 issue 或提交 pull request。 - -或通过 📧 邮件联系 @mannaandpoem:mannaandpoem@gmail.com - -**注意**: 在提交 pull request 之前,请使用 pre-commit 工具检查您的更改。运行 `pre-commit run --all-files` 来执行检查。 - -## 交流群 - -加入我们的飞书交流群,与其他开发者分享经验! - -
- OpenManus 交流群 -
- -## Star 数量 - -[![Star History Chart](https://api.star-history.com/svg?repos=mannaandpoem/OpenManus&type=Date)](https://star-history.com/#mannaandpoem/OpenManus&Date) - -## 致谢 - -特别感谢 [anthropic-computer-use](https://github.com/anthropics/anthropic-quickstarts/tree/main/computer-use-demo) -和 [browser-use](https://github.com/browser-use/browser-use) 为本项目提供的基础支持! - -此外,我们感谢 [AAAJ](https://github.com/metauto-ai/agent-as-a-judge),[MetaGPT](https://github.com/geekan/MetaGPT),[OpenHands](https://github.com/All-Hands-AI/OpenHands) 和 [SWE-agent](https://github.com/SWE-agent/SWE-agent). - -OpenManus 由 MetaGPT 社区的贡献者共同构建,感谢这个充满活力的智能体开发者社区! - -## 引用我们 - -```bibtex -@misc{openmanus2025, - author = {Xinbin Liang and Jinyu Xiang and Zhaoyang Yu and Jiayi Zhang and Sirui Hong}, - title = {OpenManus: An open-source framework for building general AI agents}, - year = {2025}, - publisher = {GitHub}, - journal = {GitHub repository}, - howpublished = {\url{https://github.com/mannaandpoem/OpenManus}}, -} -``` diff --git a/openmanus_rl/agentgym/OpenManus/app/__init__.py b/openmanus_rl/agentgym/OpenManus/app/__init__.py deleted file mode 100644 index 0749c6de..00000000 --- a/openmanus_rl/agentgym/OpenManus/app/__init__.py +++ /dev/null @@ -1,10 +0,0 @@ -# Python version check: 3.11-3.13 -import sys - - -if sys.version_info < (3, 11) or sys.version_info > (3, 13): - print( - "Warning: Unsupported Python version {ver}, please use 3.11-3.13".format( - ver=".".join(map(str, sys.version_info)) - ) - ) diff --git a/openmanus_rl/agentgym/OpenManus/app/agent/__init__.py b/openmanus_rl/agentgym/OpenManus/app/agent/__init__.py deleted file mode 100644 index 082d91d2..00000000 --- a/openmanus_rl/agentgym/OpenManus/app/agent/__init__.py +++ /dev/null @@ -1,18 +0,0 @@ -from app.agent.base import BaseAgent -from app.agent.browser import BrowserAgent -from app.agent.mcp import MCPAgent -from app.agent.planning import PlanningAgent -from app.agent.react import ReActAgent -from app.agent.swe import SWEAgent -from app.agent.toolcall import ToolCallAgent - - -__all__ = [ - "BaseAgent", - "BrowserAgent", - "PlanningAgent", - "ReActAgent", - "SWEAgent", - "ToolCallAgent", - "MCPAgent", -] diff --git a/openmanus_rl/agentgym/OpenManus/app/agent/base.py b/openmanus_rl/agentgym/OpenManus/app/agent/base.py deleted file mode 100644 index 65f66007..00000000 --- a/openmanus_rl/agentgym/OpenManus/app/agent/base.py +++ /dev/null @@ -1,196 +0,0 @@ -from abc import ABC, abstractmethod -from contextlib import asynccontextmanager -from typing import List, Optional - -from pydantic import BaseModel, Field, model_validator - -from app.llm import LLM -from app.logger import logger -from app.sandbox.client import SANDBOX_CLIENT -from app.schema import ROLE_TYPE, AgentState, Memory, Message - - -class BaseAgent(BaseModel, ABC): - """Abstract base class for managing agent state and execution. - - Provides foundational functionality for state transitions, memory management, - and a step-based execution loop. Subclasses must implement the `step` method. 
- """ - - # Core attributes - name: str = Field(..., description="Unique name of the agent") - description: Optional[str] = Field(None, description="Optional agent description") - - # Prompts - system_prompt: Optional[str] = Field( - None, description="System-level instruction prompt" - ) - next_step_prompt: Optional[str] = Field( - None, description="Prompt for determining next action" - ) - - # Dependencies - llm: LLM = Field(default_factory=LLM, description="Language model instance") - memory: Memory = Field(default_factory=Memory, description="Agent's memory store") - state: AgentState = Field( - default=AgentState.IDLE, description="Current agent state" - ) - - # Execution control - max_steps: int = Field(default=10, description="Maximum steps before termination") - current_step: int = Field(default=0, description="Current step in execution") - - duplicate_threshold: int = 2 - - class Config: - arbitrary_types_allowed = True - extra = "allow" # Allow extra fields for flexibility in subclasses - - @model_validator(mode="after") - def initialize_agent(self) -> "BaseAgent": - """Initialize agent with default settings if not provided.""" - if self.llm is None or not isinstance(self.llm, LLM): - self.llm = LLM(config_name=self.name.lower()) - if not isinstance(self.memory, Memory): - self.memory = Memory() - return self - - @asynccontextmanager - async def state_context(self, new_state: AgentState): - """Context manager for safe agent state transitions. - - Args: - new_state: The state to transition to during the context. - - Yields: - None: Allows execution within the new state. - - Raises: - ValueError: If the new_state is invalid. - """ - if not isinstance(new_state, AgentState): - raise ValueError(f"Invalid state: {new_state}") - - previous_state = self.state - self.state = new_state - try: - yield - except Exception as e: - self.state = AgentState.ERROR # Transition to ERROR on failure - raise e - finally: - self.state = previous_state # Revert to previous state - - def update_memory( - self, - role: ROLE_TYPE, # type: ignore - content: str, - base64_image: Optional[str] = None, - **kwargs, - ) -> None: - """Add a message to the agent's memory. - - Args: - role: The role of the message sender (user, system, assistant, tool). - content: The message content. - base64_image: Optional base64 encoded image. - **kwargs: Additional arguments (e.g., tool_call_id for tool messages). - - Raises: - ValueError: If the role is unsupported. - """ - message_map = { - "user": Message.user_message, - "system": Message.system_message, - "assistant": Message.assistant_message, - "tool": lambda content, **kw: Message.tool_message(content, **kw), - } - - if role not in message_map: - raise ValueError(f"Unsupported message role: {role}") - - # Create message with appropriate parameters based on role - kwargs = {"base64_image": base64_image, **(kwargs if role == "tool" else {})} - self.memory.add_message(message_map[role](content, **kwargs)) - - async def run(self, request: Optional[str] = None) -> str: - """Execute the agent's main loop asynchronously. - - Args: - request: Optional initial user request to process. - - Returns: - A string summarizing the execution results. - - Raises: - RuntimeError: If the agent is not in IDLE state at start. 
- """ - if self.state != AgentState.IDLE: - raise RuntimeError(f"Cannot run agent from state: {self.state}") - - if request: - self.update_memory("user", request) - - results: List[str] = [] - async with self.state_context(AgentState.RUNNING): - while ( - self.current_step < self.max_steps and self.state != AgentState.FINISHED - ): - self.current_step += 1 - logger.info(f"Executing step {self.current_step}/{self.max_steps}") - step_result = await self.step() - - # Check for stuck state - if self.is_stuck(): - self.handle_stuck_state() - - results.append(f"Step {self.current_step}: {step_result}") - - if self.current_step >= self.max_steps: - self.current_step = 0 - self.state = AgentState.IDLE - results.append(f"Terminated: Reached max steps ({self.max_steps})") - await SANDBOX_CLIENT.cleanup() - return "\n".join(results) if results else "No steps executed" - - @abstractmethod - async def step(self) -> str: - """Execute a single step in the agent's workflow. - - Must be implemented by subclasses to define specific behavior. - """ - - def handle_stuck_state(self): - """Handle stuck state by adding a prompt to change strategy""" - stuck_prompt = "\ - Observed duplicate responses. Consider new strategies and avoid repeating ineffective paths already attempted." - self.next_step_prompt = f"{stuck_prompt}\n{self.next_step_prompt}" - logger.warning(f"Agent detected stuck state. Added prompt: {stuck_prompt}") - - def is_stuck(self) -> bool: - """Check if the agent is stuck in a loop by detecting duplicate content""" - if len(self.memory.messages) < 2: - return False - - last_message = self.memory.messages[-1] - if not last_message.content: - return False - - # Count identical content occurrences - duplicate_count = sum( - 1 - for msg in reversed(self.memory.messages[:-1]) - if msg.role == "assistant" and msg.content == last_message.content - ) - - return duplicate_count >= self.duplicate_threshold - - @property - def messages(self) -> List[Message]: - """Retrieve a list of messages from the agent's memory.""" - return self.memory.messages - - @messages.setter - def messages(self, value: List[Message]): - """Set the list of messages in the agent's memory.""" - self.memory.messages = value diff --git a/openmanus_rl/agentgym/OpenManus/app/agent/browser.py b/openmanus_rl/agentgym/OpenManus/app/agent/browser.py deleted file mode 100644 index ae0ce2fa..00000000 --- a/openmanus_rl/agentgym/OpenManus/app/agent/browser.py +++ /dev/null @@ -1,129 +0,0 @@ -import json -from typing import Any, Optional - -from pydantic import Field - -from app.agent.toolcall import ToolCallAgent -from app.logger import logger -from app.prompt.browser import NEXT_STEP_PROMPT, SYSTEM_PROMPT -from app.schema import Message, ToolChoice -from app.tool import BrowserUseTool, Terminate, ToolCollection - - -class BrowserAgent(ToolCallAgent): - """ - A browser agent that uses the browser_use library to control a browser. - - This agent can navigate web pages, interact with elements, fill forms, - extract content, and perform other browser-based actions to accomplish tasks. 
- """ - - name: str = "browser" - description: str = "A browser agent that can control a browser to accomplish tasks" - - system_prompt: str = SYSTEM_PROMPT - next_step_prompt: str = NEXT_STEP_PROMPT - - max_observe: int = 10000 - max_steps: int = 20 - - # Configure the available tools - available_tools: ToolCollection = Field( - default_factory=lambda: ToolCollection(BrowserUseTool(), Terminate()) - ) - - # Use Auto for tool choice to allow both tool usage and free-form responses - tool_choices: ToolChoice = ToolChoice.AUTO - special_tool_names: list[str] = Field(default_factory=lambda: [Terminate().name]) - - _current_base64_image: Optional[str] = None - - async def _handle_special_tool(self, name: str, result: Any, **kwargs): - if not self._is_special_tool(name): - return - else: - await self.available_tools.get_tool(BrowserUseTool().name).cleanup() - await super()._handle_special_tool(name, result, **kwargs) - - async def get_browser_state(self) -> Optional[dict]: - """Get the current browser state for context in next steps.""" - browser_tool = self.available_tools.get_tool(BrowserUseTool().name) - if not browser_tool: - return None - - try: - # Get browser state directly from the tool - result = await browser_tool.get_current_state() - - if result.error: - logger.debug(f"Browser state error: {result.error}") - return None - - # Store screenshot if available - if hasattr(result, "base64_image") and result.base64_image: - self._current_base64_image = result.base64_image - - # Parse the state info - return json.loads(result.output) - - except Exception as e: - logger.debug(f"Failed to get browser state: {str(e)}") - return None - - async def think(self) -> bool: - """Process current state and decide next actions using tools, with browser state info added""" - # Add browser state to the context - browser_state = await self.get_browser_state() - - # Initialize placeholder values - url_info = "" - tabs_info = "" - content_above_info = "" - content_below_info = "" - results_info = "" - - if browser_state and not browser_state.get("error"): - # URL and title info - url_info = f"\n URL: {browser_state.get('url', 'N/A')}\n Title: {browser_state.get('title', 'N/A')}" - - # Tab information - if "tabs" in browser_state: - tabs = browser_state.get("tabs", []) - if tabs: - tabs_info = f"\n {len(tabs)} tab(s) available" - - # Content above/below viewport - pixels_above = browser_state.get("pixels_above", 0) - pixels_below = browser_state.get("pixels_below", 0) - - if pixels_above > 0: - content_above_info = f" ({pixels_above} pixels)" - - if pixels_below > 0: - content_below_info = f" ({pixels_below} pixels)" - - # Add screenshot as base64 if available - if self._current_base64_image: - # Create a message with image attachment - image_message = Message.user_message( - content="Current browser screenshot:", - base64_image=self._current_base64_image, - ) - self.memory.add_message(image_message) - - # Replace placeholders with actual browser state info - self.next_step_prompt = NEXT_STEP_PROMPT.format( - url_placeholder=url_info, - tabs_placeholder=tabs_info, - content_above_placeholder=content_above_info, - content_below_placeholder=content_below_info, - results_placeholder=results_info, - ) - - # Call parent implementation - result = await super().think() - - # Reset the next_step_prompt to its original state - self.next_step_prompt = NEXT_STEP_PROMPT - - return result diff --git a/openmanus_rl/agentgym/OpenManus/app/agent/manus.py b/openmanus_rl/agentgym/OpenManus/app/agent/manus.py deleted file mode 
100644 index d7ec2f9a..00000000 --- a/openmanus_rl/agentgym/OpenManus/app/agent/manus.py +++ /dev/null @@ -1,63 +0,0 @@ -from pydantic import Field - -from app.agent.browser import BrowserAgent -from app.config import config -from app.prompt.browser import NEXT_STEP_PROMPT as BROWSER_NEXT_STEP_PROMPT -from app.prompt.manus import NEXT_STEP_PROMPT, SYSTEM_PROMPT -from app.tool import Terminate, ToolCollection -from app.tool.browser_use_tool import BrowserUseTool -from app.tool.python_execute import PythonExecute -from app.tool.str_replace_editor import StrReplaceEditor - - -class Manus(BrowserAgent): - """ - A versatile general-purpose agent that uses planning to solve various tasks. - - This agent extends BrowserAgent with a comprehensive set of tools and capabilities, - including Python execution, web browsing, file operations, and information retrieval - to handle a wide range of user requests. - """ - - name: str = "Manus" - description: str = ( - "A versatile agent that can solve various tasks using multiple tools" - ) - - system_prompt: str = SYSTEM_PROMPT.format(directory=config.workspace_root) - next_step_prompt: str = NEXT_STEP_PROMPT - - max_observe: int = 10000 - max_steps: int = 20 - - # Add general-purpose tools to the tool collection - available_tools: ToolCollection = Field( - default_factory=lambda: ToolCollection( - PythonExecute(), BrowserUseTool(), StrReplaceEditor(), Terminate() - ) - ) - - async def think(self) -> bool: - """Process current state and decide next actions with appropriate context.""" - # Store original prompt - original_prompt = self.next_step_prompt - - # Only check recent messages (last 3) for browser activity - recent_messages = self.memory.messages[-3:] if self.memory.messages else [] - browser_in_use = any( - "browser_use" in msg.content.lower() - for msg in recent_messages - if hasattr(msg, "content") and isinstance(msg.content, str) - ) - - if browser_in_use: - # Override with browser-specific prompt temporarily to get browser context - self.next_step_prompt = BROWSER_NEXT_STEP_PROMPT - - # Call parent's think method - result = await super().think() - - # Restore original prompt - self.next_step_prompt = original_prompt - - return result diff --git a/openmanus_rl/agentgym/OpenManus/app/agent/mcp.py b/openmanus_rl/agentgym/OpenManus/app/agent/mcp.py deleted file mode 100644 index 01a48b09..00000000 --- a/openmanus_rl/agentgym/OpenManus/app/agent/mcp.py +++ /dev/null @@ -1,185 +0,0 @@ -from typing import Any, Dict, List, Optional, Tuple - -from pydantic import Field - -from app.agent.toolcall import ToolCallAgent -from app.logger import logger -from app.prompt.mcp import MULTIMEDIA_RESPONSE_PROMPT, NEXT_STEP_PROMPT, SYSTEM_PROMPT -from app.schema import AgentState, Message -from app.tool.base import ToolResult -from app.tool.mcp import MCPClients - - -class MCPAgent(ToolCallAgent): - """Agent for interacting with MCP (Model Context Protocol) servers. - - This agent connects to an MCP server using either SSE or stdio transport - and makes the server's tools available through the agent's tool interface. - """ - - name: str = "mcp_agent" - description: str = "An agent that connects to an MCP server and uses its tools." 
- - system_prompt: str = SYSTEM_PROMPT - next_step_prompt: str = NEXT_STEP_PROMPT - - # Initialize MCP tool collection - mcp_clients: MCPClients = Field(default_factory=MCPClients) - available_tools: MCPClients = None # Will be set in initialize() - - max_steps: int = 20 - connection_type: str = "stdio" # "stdio" or "sse" - - # Track tool schemas to detect changes - tool_schemas: Dict[str, Dict[str, Any]] = Field(default_factory=dict) - _refresh_tools_interval: int = 5 # Refresh tools every N steps - - # Special tool names that should trigger termination - special_tool_names: List[str] = Field(default_factory=lambda: ["terminate"]) - - async def initialize( - self, - connection_type: Optional[str] = None, - server_url: Optional[str] = None, - command: Optional[str] = None, - args: Optional[List[str]] = None, - ) -> None: - """Initialize the MCP connection. - - Args: - connection_type: Type of connection to use ("stdio" or "sse") - server_url: URL of the MCP server (for SSE connection) - command: Command to run (for stdio connection) - args: Arguments for the command (for stdio connection) - """ - if connection_type: - self.connection_type = connection_type - - # Connect to the MCP server based on connection type - if self.connection_type == "sse": - if not server_url: - raise ValueError("Server URL is required for SSE connection") - await self.mcp_clients.connect_sse(server_url=server_url) - elif self.connection_type == "stdio": - if not command: - raise ValueError("Command is required for stdio connection") - await self.mcp_clients.connect_stdio(command=command, args=args or []) - else: - raise ValueError(f"Unsupported connection type: {self.connection_type}") - - # Set available_tools to our MCP instance - self.available_tools = self.mcp_clients - - # Store initial tool schemas - await self._refresh_tools() - - # Add system message about available tools - tool_names = list(self.mcp_clients.tool_map.keys()) - tools_info = ", ".join(tool_names) - - # Add system prompt and available tools information - self.memory.add_message( - Message.system_message( - f"{self.system_prompt}\n\nAvailable MCP tools: {tools_info}" - ) - ) - - async def _refresh_tools(self) -> Tuple[List[str], List[str]]: - """Refresh the list of available tools from the MCP server. 
- - Returns: - A tuple of (added_tools, removed_tools) - """ - if not self.mcp_clients.session: - return [], [] - - # Get current tool schemas directly from the server - response = await self.mcp_clients.session.list_tools() - current_tools = {tool.name: tool.inputSchema for tool in response.tools} - - # Determine added, removed, and changed tools - current_names = set(current_tools.keys()) - previous_names = set(self.tool_schemas.keys()) - - added_tools = list(current_names - previous_names) - removed_tools = list(previous_names - current_names) - - # Check for schema changes in existing tools - changed_tools = [] - for name in current_names.intersection(previous_names): - if current_tools[name] != self.tool_schemas.get(name): - changed_tools.append(name) - - # Update stored schemas - self.tool_schemas = current_tools - - # Log and notify about changes - if added_tools: - logger.info(f"Added MCP tools: {added_tools}") - self.memory.add_message( - Message.system_message(f"New tools available: {', '.join(added_tools)}") - ) - if removed_tools: - logger.info(f"Removed MCP tools: {removed_tools}") - self.memory.add_message( - Message.system_message( - f"Tools no longer available: {', '.join(removed_tools)}" - ) - ) - if changed_tools: - logger.info(f"Changed MCP tools: {changed_tools}") - - return added_tools, removed_tools - - async def think(self) -> bool: - """Process current state and decide next action.""" - # Check MCP session and tools availability - if not self.mcp_clients.session or not self.mcp_clients.tool_map: - logger.info("MCP service is no longer available, ending interaction") - self.state = AgentState.FINISHED - return False - - # Refresh tools periodically - if self.current_step % self._refresh_tools_interval == 0: - await self._refresh_tools() - # All tools removed indicates shutdown - if not self.mcp_clients.tool_map: - logger.info("MCP service has shut down, ending interaction") - self.state = AgentState.FINISHED - return False - - # Use the parent class's think method - return await super().think() - - async def _handle_special_tool(self, name: str, result: Any, **kwargs) -> None: - """Handle special tool execution and state changes""" - # First process with parent handler - await super()._handle_special_tool(name, result, **kwargs) - - # Handle multimedia responses - if isinstance(result, ToolResult) and result.base64_image: - self.memory.add_message( - Message.system_message( - MULTIMEDIA_RESPONSE_PROMPT.format(tool_name=name) - ) - ) - - def _should_finish_execution(self, name: str, **kwargs) -> bool: - """Determine if tool execution should finish the agent""" - # Terminate if the tool name is 'terminate' - return name.lower() == "terminate" - - async def cleanup(self) -> None: - """Clean up MCP connection when done.""" - if self.mcp_clients.session: - await self.mcp_clients.disconnect() - logger.info("MCP connection closed") - - async def run(self, request: Optional[str] = None) -> str: - """Run the agent with cleanup when done.""" - try: - result = await super().run(request) - return result - finally: - # Ensure cleanup happens even if there's an error - await self.cleanup() diff --git a/openmanus_rl/agentgym/OpenManus/app/agent/planning.py b/openmanus_rl/agentgym/OpenManus/app/agent/planning.py deleted file mode 100644 index 7e98912b..00000000 --- a/openmanus_rl/agentgym/OpenManus/app/agent/planning.py +++ /dev/null @@ -1,259 +0,0 @@ -import time -from typing import Dict, List, Optional - -from pydantic import Field, model_validator - -from app.agent.toolcall 
import ToolCallAgent -from app.logger import logger -from app.prompt.planning import NEXT_STEP_PROMPT, PLANNING_SYSTEM_PROMPT -from app.schema import TOOL_CHOICE_TYPE, Message, ToolCall, ToolChoice -from app.tool import PlanningTool, Terminate, ToolCollection - - -class PlanningAgent(ToolCallAgent): - """ - An agent that creates and manages plans to solve tasks. - - This agent uses a planning tool to create and manage structured plans, - and tracks progress through individual steps until task completion. - """ - - name: str = "planning" - description: str = "An agent that creates and manages plans to solve tasks" - - system_prompt: str = PLANNING_SYSTEM_PROMPT - next_step_prompt: str = NEXT_STEP_PROMPT - - available_tools: ToolCollection = Field( - default_factory=lambda: ToolCollection(PlanningTool(), Terminate()) - ) - tool_choices: TOOL_CHOICE_TYPE = ToolChoice.AUTO # type: ignore - special_tool_names: List[str] = Field(default_factory=lambda: [Terminate().name]) - - tool_calls: List[ToolCall] = Field(default_factory=list) - active_plan_id: Optional[str] = Field(default=None) - - # Add a dictionary to track the step status for each tool call - step_execution_tracker: Dict[str, Dict] = Field(default_factory=dict) - current_step_index: Optional[int] = None - - max_steps: int = 20 - - @model_validator(mode="after") - def initialize_plan_and_verify_tools(self) -> "PlanningAgent": - """Initialize the agent with a default plan ID and validate required tools.""" - self.active_plan_id = f"plan_{int(time.time())}" - - if "planning" not in self.available_tools.tool_map: - self.available_tools.add_tool(PlanningTool()) - - return self - - async def think(self) -> bool: - """Decide the next action based on plan status.""" - prompt = ( - f"CURRENT PLAN STATUS:\n{await self.get_plan()}\n\n{self.next_step_prompt}" - if self.active_plan_id - else self.next_step_prompt - ) - self.messages.append(Message.user_message(prompt)) - - # Get the current step index before thinking - self.current_step_index = await self._get_current_step_index() - - result = await super().think() - - # After thinking, if we decided to execute a tool and it's not a planning tool or special tool, - # associate it with the current step for tracking - if result and self.tool_calls: - latest_tool_call = self.tool_calls[0] # Get the most recent tool call - if ( - latest_tool_call.function.name != "planning" - and latest_tool_call.function.name not in self.special_tool_names - and self.current_step_index is not None - ): - self.step_execution_tracker[latest_tool_call.id] = { - "step_index": self.current_step_index, - "tool_name": latest_tool_call.function.name, - "status": "pending", # Will be updated after execution - } - - return result - - async def act(self) -> str: - """Execute a step and track its completion status.""" - result = await super().act() - - # After executing the tool, update the plan status - if self.tool_calls: - latest_tool_call = self.tool_calls[0] - - # Update the execution status to completed - if latest_tool_call.id in self.step_execution_tracker: - self.step_execution_tracker[latest_tool_call.id]["status"] = "completed" - self.step_execution_tracker[latest_tool_call.id]["result"] = result - - # Update the plan status if this was a non-planning, non-special tool - if ( - latest_tool_call.function.name != "planning" - and latest_tool_call.function.name not in self.special_tool_names - ): - await self.update_plan_status(latest_tool_call.id) - - return result - - async def get_plan(self) -> str: - """Retrieve the 
current plan status.""" - if not self.active_plan_id: - return "No active plan. Please create a plan first." - - result = await self.available_tools.execute( - name="planning", - tool_input={"command": "get", "plan_id": self.active_plan_id}, - ) - return result.output if hasattr(result, "output") else str(result) - - async def run(self, request: Optional[str] = None) -> str: - """Run the agent with an optional initial request.""" - if request: - await self.create_initial_plan(request) - return await super().run() - - async def update_plan_status(self, tool_call_id: str) -> None: - """ - Update the current plan progress based on completed tool execution. - Only marks a step as completed if the associated tool has been successfully executed. - """ - if not self.active_plan_id: - return - - if tool_call_id not in self.step_execution_tracker: - logger.warning(f"No step tracking found for tool call {tool_call_id}") - return - - tracker = self.step_execution_tracker[tool_call_id] - if tracker["status"] != "completed": - logger.warning(f"Tool call {tool_call_id} has not completed successfully") - return - - step_index = tracker["step_index"] - - try: - # Mark the step as completed - await self.available_tools.execute( - name="planning", - tool_input={ - "command": "mark_step", - "plan_id": self.active_plan_id, - "step_index": step_index, - "step_status": "completed", - }, - ) - logger.info( - f"Marked step {step_index} as completed in plan {self.active_plan_id}" - ) - except Exception as e: - logger.warning(f"Failed to update plan status: {e}") - - async def _get_current_step_index(self) -> Optional[int]: - """ - Parse the current plan to identify the first non-completed step's index. - Returns None if no active step is found. - """ - if not self.active_plan_id: - return None - - plan = await self.get_plan() - - try: - plan_lines = plan.splitlines() - steps_index = -1 - - # Find the index of the "Steps:" line - for i, line in enumerate(plan_lines): - if line.strip() == "Steps:": - steps_index = i - break - - if steps_index == -1: - return None - - # Find the first non-completed step - for i, line in enumerate(plan_lines[steps_index + 1 :], start=0): - if "[ ]" in line or "[→]" in line: # not_started or in_progress - # Mark current step as in_progress - await self.available_tools.execute( - name="planning", - tool_input={ - "command": "mark_step", - "plan_id": self.active_plan_id, - "step_index": i, - "step_status": "in_progress", - }, - ) - return i - - return None # No active step found - except Exception as e: - logger.warning(f"Error finding current step index: {e}") - return None - - async def create_initial_plan(self, request: str) -> None: - """Create an initial plan based on the request.""" - logger.info(f"Creating initial plan with ID: {self.active_plan_id}") - - messages = [ - Message.user_message( - f"Analyze the request and create a plan with ID {self.active_plan_id}: {request}" - ) - ] - self.memory.add_messages(messages) - response = await self.llm.ask_tool( - messages=messages, - system_msgs=[Message.system_message(self.system_prompt)], - tools=self.available_tools.to_params(), - tool_choice=ToolChoice.AUTO, - ) - assistant_msg = Message.from_tool_calls( - content=response.content, tool_calls=response.tool_calls - ) - - self.memory.add_message(assistant_msg) - - plan_created = False - for tool_call in response.tool_calls: - if tool_call.function.name == "planning": - result = await self.execute_tool(tool_call) - logger.info( - f"Executed tool {tool_call.function.name} with result: 
{result}" - ) - - # Add tool response to memory - tool_msg = Message.tool_message( - content=result, - tool_call_id=tool_call.id, - name=tool_call.function.name, - ) - self.memory.add_message(tool_msg) - plan_created = True - break - - if not plan_created: - logger.warning("No plan created from initial request") - tool_msg = Message.assistant_message( - "Error: Parameter `plan_id` is required for command: create" - ) - self.memory.add_message(tool_msg) - - -async def main(): - # Configure and run the agent - agent = PlanningAgent(available_tools=ToolCollection(PlanningTool(), Terminate())) - result = await agent.run("Help me plan a trip to the moon") - print(result) - - -if __name__ == "__main__": - import asyncio - - asyncio.run(main()) diff --git a/openmanus_rl/agentgym/OpenManus/app/agent/react.py b/openmanus_rl/agentgym/OpenManus/app/agent/react.py deleted file mode 100644 index 7f948208..00000000 --- a/openmanus_rl/agentgym/OpenManus/app/agent/react.py +++ /dev/null @@ -1,38 +0,0 @@ -from abc import ABC, abstractmethod -from typing import Optional - -from pydantic import Field - -from app.agent.base import BaseAgent -from app.llm import LLM -from app.schema import AgentState, Memory - - -class ReActAgent(BaseAgent, ABC): - name: str - description: Optional[str] = None - - system_prompt: Optional[str] = None - next_step_prompt: Optional[str] = None - - llm: Optional[LLM] = Field(default_factory=LLM) - memory: Memory = Field(default_factory=Memory) - state: AgentState = AgentState.IDLE - - max_steps: int = 10 - current_step: int = 0 - - @abstractmethod - async def think(self) -> bool: - """Process current state and decide next action""" - - @abstractmethod - async def act(self) -> str: - """Execute decided actions""" - - async def step(self) -> str: - """Execute a single step: think and act.""" - should_act = await self.think() - if not should_act: - return "Thinking complete - no action needed" - return await self.act() diff --git a/openmanus_rl/agentgym/OpenManus/app/agent/swe.py b/openmanus_rl/agentgym/OpenManus/app/agent/swe.py deleted file mode 100644 index 0ac1b36f..00000000 --- a/openmanus_rl/agentgym/OpenManus/app/agent/swe.py +++ /dev/null @@ -1,38 +0,0 @@ -from typing import List - -from pydantic import Field - -from app.agent.toolcall import ToolCallAgent -from app.prompt.swe import NEXT_STEP_TEMPLATE, SYSTEM_PROMPT -from app.tool import Bash, StrReplaceEditor, Terminate, ToolCollection - - -class SWEAgent(ToolCallAgent): - """An agent that implements the SWEAgent paradigm for executing code and natural conversations.""" - - name: str = "swe" - description: str = "an autonomous AI programmer that interacts directly with the computer to solve tasks." - - system_prompt: str = SYSTEM_PROMPT - next_step_prompt: str = NEXT_STEP_TEMPLATE - - available_tools: ToolCollection = ToolCollection( - Bash(), StrReplaceEditor(), Terminate() - ) - special_tool_names: List[str] = Field(default_factory=lambda: [Terminate().name]) - - max_steps: int = 30 - - bash: Bash = Field(default_factory=Bash) - working_dir: str = "." 
- - async def think(self) -> bool: - """Process current state and decide next action""" - # Update working directory - result = await self.bash.execute("pwd") - self.working_dir = result.output - self.next_step_prompt = self.next_step_prompt.format( - current_dir=self.working_dir - ) - - return await super().think() diff --git a/openmanus_rl/agentgym/OpenManus/app/agent/toolcall.py b/openmanus_rl/agentgym/OpenManus/app/agent/toolcall.py deleted file mode 100644 index 76f6f019..00000000 --- a/openmanus_rl/agentgym/OpenManus/app/agent/toolcall.py +++ /dev/null @@ -1,234 +0,0 @@ -import json -from typing import Any, List, Optional, Union - -from pydantic import Field - -from app.agent.react import ReActAgent -from app.exceptions import TokenLimitExceeded -from app.logger import logger -from app.prompt.toolcall import NEXT_STEP_PROMPT, SYSTEM_PROMPT -from app.schema import TOOL_CHOICE_TYPE, AgentState, Message, ToolCall, ToolChoice -from app.tool import CreateChatCompletion, Terminate, ToolCollection - - -TOOL_CALL_REQUIRED = "Tool calls required but none provided" - - -class ToolCallAgent(ReActAgent): - """Base agent class for handling tool/function calls with enhanced abstraction""" - - name: str = "toolcall" - description: str = "an agent that can execute tool calls." - - system_prompt: str = SYSTEM_PROMPT - next_step_prompt: str = NEXT_STEP_PROMPT - - available_tools: ToolCollection = ToolCollection( - CreateChatCompletion(), Terminate() - ) - tool_choices: TOOL_CHOICE_TYPE = ToolChoice.AUTO # type: ignore - special_tool_names: List[str] = Field(default_factory=lambda: [Terminate().name]) - - tool_calls: List[ToolCall] = Field(default_factory=list) - _current_base64_image: Optional[str] = None - - max_steps: int = 30 - max_observe: Optional[Union[int, bool]] = None - - async def think(self) -> bool: - """Process current state and decide next actions using tools""" - if self.next_step_prompt: - user_msg = Message.user_message(self.next_step_prompt) - self.messages += [user_msg] - - try: - # Get response with tool options - response = await self.llm.ask_tool( - messages=self.messages, - system_msgs=( - [Message.system_message(self.system_prompt)] - if self.system_prompt - else None - ), - tools=self.available_tools.to_params(), - tool_choice=self.tool_choices, - ) - except ValueError: - raise - except Exception as e: - # Check if this is a RetryError containing TokenLimitExceeded - if hasattr(e, "__cause__") and isinstance(e.__cause__, TokenLimitExceeded): - token_limit_error = e.__cause__ - logger.error( - f"🚨 Token limit error (from RetryError): {token_limit_error}" - ) - self.memory.add_message( - Message.assistant_message( - f"Maximum token limit reached, cannot continue execution: {str(token_limit_error)}" - ) - ) - self.state = AgentState.FINISHED - return False - raise - - self.tool_calls = tool_calls = ( - response.tool_calls if response and response.tool_calls else [] - ) - content = response.content if response and response.content else "" - - # Log response info - logger.info(f"✨ {self.name}'s thoughts: {content}") - logger.info( - f"🛠️ {self.name} selected {len(tool_calls) if tool_calls else 0} tools to use" - ) - if tool_calls: - logger.info( - f"🧰 Tools being prepared: {[call.function.name for call in tool_calls]}" - ) - logger.info(f"🔧 Tool arguments: {tool_calls[0].function.arguments}") - - try: - if response is None: - raise RuntimeError("No response received from the LLM") - - # Handle different tool_choices modes - if self.tool_choices == ToolChoice.NONE: - if tool_calls: 
- logger.warning( - f"🤔 Hmm, {self.name} tried to use tools when they weren't available!" - ) - if content: - self.memory.add_message(Message.assistant_message(content)) - return True - return False - - # Create and add assistant message - assistant_msg = ( - Message.from_tool_calls(content=content, tool_calls=self.tool_calls) - if self.tool_calls - else Message.assistant_message(content) - ) - self.memory.add_message(assistant_msg) - - if self.tool_choices == ToolChoice.REQUIRED and not self.tool_calls: - return True # Will be handled in act() - - # For 'auto' mode, continue with content if no commands but content exists - if self.tool_choices == ToolChoice.AUTO and not self.tool_calls: - return bool(content) - - return bool(self.tool_calls) - except Exception as e: - logger.error(f"🚨 Oops! The {self.name}'s thinking process hit a snag: {e}") - self.memory.add_message( - Message.assistant_message( - f"Error encountered while processing: {str(e)}" - ) - ) - return False - - async def act(self) -> str: - """Execute tool calls and handle their results""" - if not self.tool_calls: - if self.tool_choices == ToolChoice.REQUIRED: - raise ValueError(TOOL_CALL_REQUIRED) - - # Return last message content if no tool calls - return self.messages[-1].content or "No content or commands to execute" - - results = [] - for command in self.tool_calls: - # Reset base64_image for each tool call - self._current_base64_image = None - - result = await self.execute_tool(command) - - if self.max_observe: - result = result[: self.max_observe] - - logger.info( - f"🎯 Tool '{command.function.name}' completed its mission! Result: {result}" - ) - - # Add tool response to memory - tool_msg = Message.tool_message( - content=result, - tool_call_id=command.id, - name=command.function.name, - base64_image=self._current_base64_image, - ) - self.memory.add_message(tool_msg) - results.append(result) - - return "\n\n".join(results) - - async def execute_tool(self, command: ToolCall) -> str: - """Execute a single tool call with robust error handling""" - if not command or not command.function or not command.function.name: - return "Error: Invalid command format" - - name = command.function.name - if name not in self.available_tools.tool_map: - return f"Error: Unknown tool '{name}'" - - try: - # Parse arguments - args = json.loads(command.function.arguments or "{}") - - # Execute the tool - logger.info(f"🔧 Activating tool: '{name}'...") - result = await self.available_tools.execute(name=name, tool_input=args) - - # Handle special tools - await self._handle_special_tool(name=name, result=result) - - # Check if result is a ToolResult with base64_image - if hasattr(result, "base64_image") and result.base64_image: - # Store the base64_image for later use in tool_message - self._current_base64_image = result.base64_image - - # Format result for display - observation = ( - f"Observed output of cmd `{name}` executed:\n{str(result)}" - if result - else f"Cmd `{name}` completed with no output" - ) - return observation - - # Format result for display (standard case) - observation = ( - f"Observed output of cmd `{name}` executed:\n{str(result)}" - if result - else f"Cmd `{name}` completed with no output" - ) - - return observation - except json.JSONDecodeError: - error_msg = f"Error parsing arguments for {name}: Invalid JSON format" - logger.error( - f"📝 Oops! 
The arguments for '{name}' don't make sense - invalid JSON, arguments:{command.function.arguments}" - ) - return f"Error: {error_msg}" - except Exception as e: - error_msg = f"⚠️ Tool '{name}' encountered a problem: {str(e)}" - logger.exception(error_msg) - return f"Error: {error_msg}" - - async def _handle_special_tool(self, name: str, result: Any, **kwargs): - """Handle special tool execution and state changes""" - if not self._is_special_tool(name): - return - - if self._should_finish_execution(name=name, result=result, **kwargs): - # Set agent state to finished - logger.info(f"🏁 Special tool '{name}' has completed the task!") - self.state = AgentState.FINISHED - - @staticmethod - def _should_finish_execution(**kwargs) -> bool: - """Determine if tool execution should finish the agent""" - return True - - def _is_special_tool(self, name: str) -> bool: - """Check if tool name is in special tools list""" - return name.lower() in [n.lower() for n in self.special_tool_names] diff --git a/openmanus_rl/agentgym/OpenManus/app/config.py b/openmanus_rl/agentgym/OpenManus/app/config.py deleted file mode 100644 index 94597074..00000000 --- a/openmanus_rl/agentgym/OpenManus/app/config.py +++ /dev/null @@ -1,236 +0,0 @@ -import threading -import tomllib -from pathlib import Path -from typing import Dict, List, Optional - -from pydantic import BaseModel, Field - - -def get_project_root() -> Path: - """Get the project root directory""" - return Path(__file__).resolve().parent.parent - - -PROJECT_ROOT = get_project_root() -WORKSPACE_ROOT = PROJECT_ROOT / "workspace" - - -class LLMSettings(BaseModel): - model: str = Field(..., description="Model name") - base_url: str = Field(..., description="API base URL") - api_key: str = Field(..., description="API key") - max_tokens: int = Field(4096, description="Maximum number of tokens per request") - max_input_tokens: Optional[int] = Field( - None, - description="Maximum input tokens to use across all requests (None for unlimited)", - ) - temperature: float = Field(1.0, description="Sampling temperature") - api_type: str = Field(..., description="Azure, Openai, or Ollama") - api_version: str = Field(..., description="Azure Openai version if AzureOpenai") - - -class ProxySettings(BaseModel): - server: str = Field(None, description="Proxy server address") - username: Optional[str] = Field(None, description="Proxy username") - password: Optional[str] = Field(None, description="Proxy password") - - -class SearchSettings(BaseModel): - engine: str = Field(default="Google", description="Search engine the llm to use") - - -class BrowserSettings(BaseModel): - headless: bool = Field(False, description="Whether to run browser in headless mode") - disable_security: bool = Field( - True, description="Disable browser security features" - ) - extra_chromium_args: List[str] = Field( - default_factory=list, description="Extra arguments to pass to the browser" - ) - chrome_instance_path: Optional[str] = Field( - None, description="Path to a Chrome instance to use" - ) - wss_url: Optional[str] = Field( - None, description="Connect to a browser instance via WebSocket" - ) - cdp_url: Optional[str] = Field( - None, description="Connect to a browser instance via CDP" - ) - proxy: Optional[ProxySettings] = Field( - None, description="Proxy settings for the browser" - ) - max_content_length: int = Field( - 2000, description="Maximum length for content retrieval operations" - ) - - -class SandboxSettings(BaseModel): - """Configuration for the execution sandbox""" - - use_sandbox: bool = 
Field(False, description="Whether to use the sandbox") - image: str = Field("python:3.12-slim", description="Base image") - work_dir: str = Field("/workspace", description="Container working directory") - memory_limit: str = Field("512m", description="Memory limit") - cpu_limit: float = Field(1.0, description="CPU limit") - timeout: int = Field(300, description="Default command timeout (seconds)") - network_enabled: bool = Field( - False, description="Whether network access is allowed" - ) - - -class AppConfig(BaseModel): - llm: Dict[str, LLMSettings] - sandbox: Optional[SandboxSettings] = Field( - None, description="Sandbox configuration" - ) - browser_config: Optional[BrowserSettings] = Field( - None, description="Browser configuration" - ) - search_config: Optional[SearchSettings] = Field( - None, description="Search configuration" - ) - - class Config: - arbitrary_types_allowed = True - - -class Config: - _instance = None - _lock = threading.Lock() - _initialized = False - - def __new__(cls): - if cls._instance is None: - with cls._lock: - if cls._instance is None: - cls._instance = super().__new__(cls) - return cls._instance - - def __init__(self): - if not self._initialized: - with self._lock: - if not self._initialized: - self._config = None - self._load_initial_config() - self._initialized = True - - @staticmethod - def _get_config_path() -> Path: - root = PROJECT_ROOT - config_path = root / "config" / "config.toml" - if config_path.exists(): - return config_path - example_path = root / "config" / "config.example.toml" - if example_path.exists(): - return example_path - raise FileNotFoundError("No configuration file found in config directory") - - def _load_config(self) -> dict: - config_path = self._get_config_path() - with config_path.open("rb") as f: - return tomllib.load(f) - - def _load_initial_config(self): - raw_config = self._load_config() - base_llm = raw_config.get("llm", {}) - llm_overrides = { - k: v for k, v in raw_config.get("llm", {}).items() if isinstance(v, dict) - } - - default_settings = { - "model": base_llm.get("model"), - "base_url": base_llm.get("base_url"), - "api_key": base_llm.get("api_key"), - "max_tokens": base_llm.get("max_tokens", 4096), - "max_input_tokens": base_llm.get("max_input_tokens"), - "temperature": base_llm.get("temperature", 1.0), - "api_type": base_llm.get("api_type", ""), - "api_version": base_llm.get("api_version", ""), - } - - # handle browser config. - browser_config = raw_config.get("browser", {}) - browser_settings = None - - if browser_config: - # handle proxy settings. - proxy_config = browser_config.get("proxy", {}) - proxy_settings = None - - if proxy_config and proxy_config.get("server"): - proxy_settings = ProxySettings( - **{ - k: v - for k, v in proxy_config.items() - if k in ["server", "username", "password"] and v - } - ) - - # filter valid browser config parameters. - valid_browser_params = { - k: v - for k, v in browser_config.items() - if k in BrowserSettings.__annotations__ and v is not None - } - - # if there is proxy settings, add it to the parameters. - if proxy_settings: - valid_browser_params["proxy"] = proxy_settings - - # only create BrowserSettings when there are valid parameters. 
- if valid_browser_params: - browser_settings = BrowserSettings(**valid_browser_params) - - search_config = raw_config.get("search", {}) - search_settings = None - if search_config: - search_settings = SearchSettings(**search_config) - sandbox_config = raw_config.get("sandbox", {}) - if sandbox_config: - sandbox_settings = SandboxSettings(**sandbox_config) - else: - sandbox_settings = SandboxSettings() - - config_dict = { - "llm": { - "default": default_settings, - **{ - name: {**default_settings, **override_config} - for name, override_config in llm_overrides.items() - }, - }, - "sandbox": sandbox_settings, - "browser_config": browser_settings, - "search_config": search_settings, - } - - self._config = AppConfig(**config_dict) - - @property - def llm(self) -> Dict[str, LLMSettings]: - return self._config.llm - - @property - def sandbox(self) -> SandboxSettings: - return self._config.sandbox - - @property - def browser_config(self) -> Optional[BrowserSettings]: - return self._config.browser_config - - @property - def search_config(self) -> Optional[SearchSettings]: - return self._config.search_config - - @property - def workspace_root(self) -> Path: - """Get the workspace root directory""" - return WORKSPACE_ROOT - - @property - def root_path(self) -> Path: - """Get the root path of the application""" - return PROJECT_ROOT - - -config = Config() diff --git a/openmanus_rl/agentgym/OpenManus/app/exceptions.py b/openmanus_rl/agentgym/OpenManus/app/exceptions.py deleted file mode 100644 index fc900874..00000000 --- a/openmanus_rl/agentgym/OpenManus/app/exceptions.py +++ /dev/null @@ -1,13 +0,0 @@ -class ToolError(Exception): - """Raised when a tool encounters an error.""" - - def __init__(self, message): - self.message = message - - -class OpenManusError(Exception): - """Base exception for all OpenManus errors""" - - -class TokenLimitExceeded(OpenManusError): - """Exception raised when the token limit is exceeded""" diff --git a/openmanus_rl/agentgym/OpenManus/app/flow/base.py b/openmanus_rl/agentgym/OpenManus/app/flow/base.py deleted file mode 100644 index 13066cec..00000000 --- a/openmanus_rl/agentgym/OpenManus/app/flow/base.py +++ /dev/null @@ -1,91 +0,0 @@ -from abc import ABC, abstractmethod -from enum import Enum -from typing import Dict, List, Optional, Union - -from pydantic import BaseModel - -from app.agent.base import BaseAgent - - -class FlowType(str, Enum): - PLANNING = "planning" - - -class BaseFlow(BaseModel, ABC): - """Base class for execution flows supporting multiple agents""" - - agents: Dict[str, BaseAgent] - tools: Optional[List] = None - primary_agent_key: Optional[str] = None - - class Config: - arbitrary_types_allowed = True - - def __init__( - self, agents: Union[BaseAgent, List[BaseAgent], Dict[str, BaseAgent]], **data - ): - # Handle different ways of providing agents - if isinstance(agents, BaseAgent): - agents_dict = {"default": agents} - elif isinstance(agents, list): - agents_dict = {f"agent_{i}": agent for i, agent in enumerate(agents)} - else: - agents_dict = agents - - # If primary agent not specified, use first agent - primary_key = data.get("primary_agent_key") - if not primary_key and agents_dict: - primary_key = next(iter(agents_dict)) - data["primary_agent_key"] = primary_key - - # Set the agents dictionary - data["agents"] = agents_dict - - # Initialize using BaseModel's init - super().__init__(**data) - - @property - def primary_agent(self) -> Optional[BaseAgent]: - """Get the primary agent for the flow""" - return 
self.agents.get(self.primary_agent_key) - - def get_agent(self, key: str) -> Optional[BaseAgent]: - """Get a specific agent by key""" - return self.agents.get(key) - - def add_agent(self, key: str, agent: BaseAgent) -> None: - """Add a new agent to the flow""" - self.agents[key] = agent - - @abstractmethod - async def execute(self, input_text: str) -> str: - """Execute the flow with given input""" - - -class PlanStepStatus(str, Enum): - """Enum class defining possible statuses of a plan step""" - - NOT_STARTED = "not_started" - IN_PROGRESS = "in_progress" - COMPLETED = "completed" - BLOCKED = "blocked" - - @classmethod - def get_all_statuses(cls) -> list[str]: - """Return a list of all possible step status values""" - return [status.value for status in cls] - - @classmethod - def get_active_statuses(cls) -> list[str]: - """Return a list of values representing active statuses (not started or in progress)""" - return [cls.NOT_STARTED.value, cls.IN_PROGRESS.value] - - @classmethod - def get_status_marks(cls) -> Dict[str, str]: - """Return a mapping of statuses to their marker symbols""" - return { - cls.COMPLETED.value: "[✓]", - cls.IN_PROGRESS.value: "[→]", - cls.BLOCKED.value: "[!]", - cls.NOT_STARTED.value: "[ ]", - } diff --git a/openmanus_rl/agentgym/OpenManus/app/flow/flow_factory.py b/openmanus_rl/agentgym/OpenManus/app/flow/flow_factory.py deleted file mode 100644 index 72722829..00000000 --- a/openmanus_rl/agentgym/OpenManus/app/flow/flow_factory.py +++ /dev/null @@ -1,25 +0,0 @@ -from typing import Dict, List, Union - -from app.agent.base import BaseAgent -from app.flow.base import BaseFlow, FlowType -from app.flow.planning import PlanningFlow - - -class FlowFactory: - """Factory for creating different types of flows with support for multiple agents""" - - @staticmethod - def create_flow( - flow_type: FlowType, - agents: Union[BaseAgent, List[BaseAgent], Dict[str, BaseAgent]], - **kwargs, - ) -> BaseFlow: - flows = { - FlowType.PLANNING: PlanningFlow, - } - - flow_class = flows.get(flow_type) - if not flow_class: - raise ValueError(f"Unknown flow type: {flow_type}") - - return flow_class(agents, **kwargs) diff --git a/openmanus_rl/agentgym/OpenManus/app/flow/planning.py b/openmanus_rl/agentgym/OpenManus/app/flow/planning.py deleted file mode 100644 index 55ec5c9c..00000000 --- a/openmanus_rl/agentgym/OpenManus/app/flow/planning.py +++ /dev/null @@ -1,394 +0,0 @@ -import json -import time -from typing import Dict, List, Optional, Union - -from pydantic import Field - -from app.agent.base import BaseAgent -from app.flow.base import BaseFlow, PlanStepStatus -from app.llm import LLM -from app.logger import logger -from app.schema import AgentState, Message, ToolChoice -from app.tool import PlanningTool - - -class PlanningFlow(BaseFlow): - """A flow that manages planning and execution of tasks using agents.""" - - llm: LLM = Field(default_factory=lambda: LLM()) - planning_tool: PlanningTool = Field(default_factory=PlanningTool) - executor_keys: List[str] = Field(default_factory=list) - active_plan_id: str = Field(default_factory=lambda: f"plan_{int(time.time())}") - current_step_index: Optional[int] = None - - def __init__( - self, agents: Union[BaseAgent, List[BaseAgent], Dict[str, BaseAgent]], **data - ): - # Set executor keys before super().__init__ - if "executors" in data: - data["executor_keys"] = data.pop("executors") - - # Set plan ID if provided - if "plan_id" in data: - data["active_plan_id"] = data.pop("plan_id") - - # Initialize the planning tool if not provided - if 
"planning_tool" not in data: - planning_tool = PlanningTool() - data["planning_tool"] = planning_tool - - # Call parent's init with the processed data - super().__init__(agents, **data) - - # Set executor_keys to all agent keys if not specified - if not self.executor_keys: - self.executor_keys = list(self.agents.keys()) - - def get_executor(self, step_type: Optional[str] = None) -> BaseAgent: - """ - Get an appropriate executor agent for the current step. - Can be extended to select agents based on step type/requirements. - """ - # If step type is provided and matches an agent key, use that agent - if step_type and step_type in self.agents: - return self.agents[step_type] - - # Otherwise use the first available executor or fall back to primary agent - for key in self.executor_keys: - if key in self.agents: - return self.agents[key] - - # Fallback to primary agent - return self.primary_agent - - async def execute(self, input_text: str) -> str: - """Execute the planning flow with agents.""" - try: - if not self.primary_agent: - raise ValueError("No primary agent available") - - # Create initial plan if input provided - if input_text: - await self._create_initial_plan(input_text) - - # Verify plan was created successfully - if self.active_plan_id not in self.planning_tool.plans: - logger.error( - f"Plan creation failed. Plan ID {self.active_plan_id} not found in planning tool." - ) - return f"Failed to create plan for: {input_text}" - - result = "" - while True: - # Get current step to execute - self.current_step_index, step_info = await self._get_current_step_info() - - # Exit if no more steps or plan completed - if self.current_step_index is None: - result += await self._finalize_plan() - break - - # Execute current step with appropriate agent - step_type = step_info.get("type") if step_info else None - executor = self.get_executor(step_type) - step_result = await self._execute_step(executor, step_info) - result += step_result + "\n" - - # Check if agent wants to terminate - if hasattr(executor, "state") and executor.state == AgentState.FINISHED: - break - - return result - except Exception as e: - logger.error(f"Error in PlanningFlow: {str(e)}") - return f"Execution failed: {str(e)}" - - async def _create_initial_plan(self, request: str) -> None: - """Create an initial plan based on the request using the flow's LLM and PlanningTool.""" - logger.info(f"Creating initial plan with ID: {self.active_plan_id}") - - # Create a system message for plan creation - system_message = Message.system_message( - "You are a planning assistant. Create a concise, actionable plan with clear steps. " - "Focus on key milestones rather than detailed sub-steps. " - "Optimize for clarity and efficiency." 
- ) - - # Create a user message with the request - user_message = Message.user_message( - f"Create a reasonable plan with clear steps to accomplish the task: {request}" - ) - - # Call LLM with PlanningTool - response = await self.llm.ask_tool( - messages=[user_message], - system_msgs=[system_message], - tools=[self.planning_tool.to_param()], - tool_choice=ToolChoice.AUTO, - ) - - # Process tool calls if present - if response.tool_calls: - for tool_call in response.tool_calls: - if tool_call.function.name == "planning": - # Parse the arguments - args = tool_call.function.arguments - if isinstance(args, str): - try: - args = json.loads(args) - except json.JSONDecodeError: - logger.error(f"Failed to parse tool arguments: {args}") - continue - - # Ensure plan_id is set correctly and execute the tool - args["plan_id"] = self.active_plan_id - - # Execute the tool via ToolCollection instead of directly - result = await self.planning_tool.execute(**args) - - logger.info(f"Plan creation result: {str(result)}") - return - - # If execution reached here, create a default plan - logger.warning("Creating default plan") - - # Create default plan using the ToolCollection - await self.planning_tool.execute( - **{ - "command": "create", - "plan_id": self.active_plan_id, - "title": f"Plan for: {request[:50]}{'...' if len(request) > 50 else ''}", - "steps": ["Analyze request", "Execute task", "Verify results"], - } - ) - - async def _get_current_step_info(self) -> tuple[Optional[int], Optional[dict]]: - """ - Parse the current plan to identify the first non-completed step's index and info. - Returns (None, None) if no active step is found. - """ - if ( - not self.active_plan_id - or self.active_plan_id not in self.planning_tool.plans - ): - logger.error(f"Plan with ID {self.active_plan_id} not found") - return None, None - - try: - # Direct access to plan data from planning tool storage - plan_data = self.planning_tool.plans[self.active_plan_id] - steps = plan_data.get("steps", []) - step_statuses = plan_data.get("step_statuses", []) - - # Find first non-completed step - for i, step in enumerate(steps): - if i >= len(step_statuses): - status = PlanStepStatus.NOT_STARTED.value - else: - status = step_statuses[i] - - if status in PlanStepStatus.get_active_statuses(): - # Extract step type/category if available - step_info = {"text": step} - - # Try to extract step type from the text (e.g., [SEARCH] or [CODE]) - import re - - type_match = re.search(r"\[([A-Z_]+)\]", step) - if type_match: - step_info["type"] = type_match.group(1).lower() - - # Mark current step as in_progress - try: - await self.planning_tool.execute( - command="mark_step", - plan_id=self.active_plan_id, - step_index=i, - step_status=PlanStepStatus.IN_PROGRESS.value, - ) - except Exception as e: - logger.warning(f"Error marking step as in_progress: {e}") - # Update step status directly if needed - if i < len(step_statuses): - step_statuses[i] = PlanStepStatus.IN_PROGRESS.value - else: - while len(step_statuses) < i: - step_statuses.append(PlanStepStatus.NOT_STARTED.value) - step_statuses.append(PlanStepStatus.IN_PROGRESS.value) - - plan_data["step_statuses"] = step_statuses - - return i, step_info - - return None, None # No active step found - - except Exception as e: - logger.warning(f"Error finding current step index: {e}") - return None, None - - async def _execute_step(self, executor: BaseAgent, step_info: dict) -> str: - """Execute the current step with the specified agent using agent.run().""" - # Prepare context for the agent with current 
plan status - plan_status = await self._get_plan_text() - step_text = step_info.get("text", f"Step {self.current_step_index}") - - # Create a prompt for the agent to execute the current step - step_prompt = f""" - CURRENT PLAN STATUS: - {plan_status} - - YOUR CURRENT TASK: - You are now working on step {self.current_step_index}: "{step_text}" - - Please execute this step using the appropriate tools. When you're done, provide a summary of what you accomplished. - """ - - # Use agent.run() to execute the step - try: - step_result = await executor.run(step_prompt) - - # Mark the step as completed after successful execution - await self._mark_step_completed() - - return step_result - except Exception as e: - logger.error(f"Error executing step {self.current_step_index}: {e}") - return f"Error executing step {self.current_step_index}: {str(e)}" - - async def _mark_step_completed(self) -> None: - """Mark the current step as completed.""" - if self.current_step_index is None: - return - - try: - # Mark the step as completed - await self.planning_tool.execute( - command="mark_step", - plan_id=self.active_plan_id, - step_index=self.current_step_index, - step_status=PlanStepStatus.COMPLETED.value, - ) - logger.info( - f"Marked step {self.current_step_index} as completed in plan {self.active_plan_id}" - ) - except Exception as e: - logger.warning(f"Failed to update plan status: {e}") - # Update step status directly in planning tool storage - if self.active_plan_id in self.planning_tool.plans: - plan_data = self.planning_tool.plans[self.active_plan_id] - step_statuses = plan_data.get("step_statuses", []) - - # Ensure the step_statuses list is long enough - while len(step_statuses) <= self.current_step_index: - step_statuses.append(PlanStepStatus.NOT_STARTED.value) - - # Update the status - step_statuses[self.current_step_index] = PlanStepStatus.COMPLETED.value - plan_data["step_statuses"] = step_statuses - - async def _get_plan_text(self) -> str: - """Get the current plan as formatted text.""" - try: - result = await self.planning_tool.execute( - command="get", plan_id=self.active_plan_id - ) - return result.output if hasattr(result, "output") else str(result) - except Exception as e: - logger.error(f"Error getting plan: {e}") - return self._generate_plan_text_from_storage() - - def _generate_plan_text_from_storage(self) -> str: - """Generate plan text directly from storage if the planning tool fails.""" - try: - if self.active_plan_id not in self.planning_tool.plans: - return f"Error: Plan with ID {self.active_plan_id} not found" - - plan_data = self.planning_tool.plans[self.active_plan_id] - title = plan_data.get("title", "Untitled Plan") - steps = plan_data.get("steps", []) - step_statuses = plan_data.get("step_statuses", []) - step_notes = plan_data.get("step_notes", []) - - # Ensure step_statuses and step_notes match the number of steps - while len(step_statuses) < len(steps): - step_statuses.append(PlanStepStatus.NOT_STARTED.value) - while len(step_notes) < len(steps): - step_notes.append("") - - # Count steps by status - status_counts = {status: 0 for status in PlanStepStatus.get_all_statuses()} - - for status in step_statuses: - if status in status_counts: - status_counts[status] += 1 - - completed = status_counts[PlanStepStatus.COMPLETED.value] - total = len(steps) - progress = (completed / total) * 100 if total > 0 else 0 - - plan_text = f"Plan: {title} (ID: {self.active_plan_id})\n" - plan_text += "=" * len(plan_text) + "\n\n" - - plan_text += ( - f"Progress: {completed}/{total} steps 
completed ({progress:.1f}%)\n" - ) - plan_text += f"Status: {status_counts[PlanStepStatus.COMPLETED.value]} completed, {status_counts[PlanStepStatus.IN_PROGRESS.value]} in progress, " - plan_text += f"{status_counts[PlanStepStatus.BLOCKED.value]} blocked, {status_counts[PlanStepStatus.NOT_STARTED.value]} not started\n\n" - plan_text += "Steps:\n" - - status_marks = PlanStepStatus.get_status_marks() - - for i, (step, status, notes) in enumerate( - zip(steps, step_statuses, step_notes) - ): - # Use status marks to indicate step status - status_mark = status_marks.get( - status, status_marks[PlanStepStatus.NOT_STARTED.value] - ) - - plan_text += f"{i}. {status_mark} {step}\n" - if notes: - plan_text += f" Notes: {notes}\n" - - return plan_text - except Exception as e: - logger.error(f"Error generating plan text from storage: {e}") - return f"Error: Unable to retrieve plan with ID {self.active_plan_id}" - - async def _finalize_plan(self) -> str: - """Finalize the plan and provide a summary using the flow's LLM directly.""" - plan_text = await self._get_plan_text() - - # Create a summary using the flow's LLM directly - try: - system_message = Message.system_message( - "You are a planning assistant. Your task is to summarize the completed plan." - ) - - user_message = Message.user_message( - f"The plan has been completed. Here is the final plan status:\n\n{plan_text}\n\nPlease provide a summary of what was accomplished and any final thoughts." - ) - - response = await self.llm.ask( - messages=[user_message], system_msgs=[system_message] - ) - - return f"Plan completed:\n\n{response}" - except Exception as e: - logger.error(f"Error finalizing plan with LLM: {e}") - - # Fallback to using an agent for the summary - try: - agent = self.primary_agent - summary_prompt = f""" - The plan has been completed. Here is the final plan status: - - {plan_text} - - Please provide a summary of what was accomplished and any final thoughts. - """ - summary = await agent.run(summary_prompt) - return f"Plan completed:\n\n{summary}" - except Exception as e2: - logger.error(f"Error finalizing plan with agent: {e2}") - return "Plan completed. Error generating summary." 
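The storage fallback above is plain dict bookkeeping, which makes it easy to check in isolation. A self-contained sketch of the same progress computation, using a hypothetical plan dict; the status strings and status marks below are assumptions standing in for PlanStepStatus's real values:

```python
# Standalone sketch of the progress rendering in _generate_plan_text_from_storage.
# The status strings and marks are assumptions; the real ones come from
# PlanStepStatus.get_all_statuses() / get_status_marks().
plan_data = {
    "title": "Demo plan",
    "steps": ["Analyze request", "Execute task", "Verify results"],
    "step_statuses": ["completed", "in_progress"],  # may be shorter than steps
}

steps = plan_data["steps"]
statuses = plan_data.get("step_statuses", [])
statuses += ["not_started"] * (len(steps) - len(statuses))  # pad, as above

completed = statuses.count("completed")
progress = (completed / len(steps)) * 100 if steps else 0
marks = {"completed": "[x]", "in_progress": "[>]", "blocked": "[!]", "not_started": "[ ]"}

print(f"Plan: {plan_data['title']}")
print(f"Progress: {completed}/{len(steps)} steps completed ({progress:.1f}%)")
for i, (step, status) in enumerate(zip(steps, statuses)):
    print(f"{i}. {marks[status]} {step}")
```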
diff --git a/openmanus_rl/agentgym/OpenManus/app/llm.py b/openmanus_rl/agentgym/OpenManus/app/llm.py deleted file mode 100644 index 1a4e05b6..00000000 --- a/openmanus_rl/agentgym/OpenManus/app/llm.py +++ /dev/null @@ -1,769 +0,0 @@ -import math -from typing import Dict, List, Optional, Union - -import tiktoken -from openai import ( - APIError, - AsyncAzureOpenAI, - AsyncOpenAI, - AuthenticationError, - OpenAIError, - RateLimitError, -) -from openai.types.chat.chat_completion_message import ChatCompletionMessage -from tenacity import ( - retry, - retry_if_exception_type, - stop_after_attempt, - wait_random_exponential, -) - -from app.config import LLMSettings, config -from app.exceptions import TokenLimitExceeded -from app.logger import logger # Assuming a logger is set up in your app -from app.schema import ( - ROLE_VALUES, - TOOL_CHOICE_TYPE, - TOOL_CHOICE_VALUES, - Message, - ToolChoice, -) - - -REASONING_MODELS = ["o1", "o3-mini"] -MULTIMODAL_MODELS = [ - "gpt-4-vision-preview", - "gpt-4o", - "gpt-4o-mini", - "claude-3-opus-20240229", - "claude-3-sonnet-20240229", - "claude-3-haiku-20240307", -] - - -class TokenCounter: - # Token constants - BASE_MESSAGE_TOKENS = 4 - FORMAT_TOKENS = 2 - LOW_DETAIL_IMAGE_TOKENS = 85 - HIGH_DETAIL_TILE_TOKENS = 170 - - # Image processing constants - MAX_SIZE = 2048 - HIGH_DETAIL_TARGET_SHORT_SIDE = 768 - TILE_SIZE = 512 - - def __init__(self, tokenizer): - self.tokenizer = tokenizer - - def count_text(self, text: str) -> int: - """Calculate tokens for a text string""" - return 0 if not text else len(self.tokenizer.encode(text)) - - def count_image(self, image_item: dict) -> int: - """ - Calculate tokens for an image based on detail level and dimensions - - For "low" detail: fixed 85 tokens - For "high" detail: - 1. Scale to fit in 2048x2048 square - 2. Scale shortest side to 768px - 3. Count 512px tiles (170 tokens each) - 4. 
Add 85 tokens - """ - detail = image_item.get("detail", "medium") - - # For low detail, always return fixed token count - if detail == "low": - return self.LOW_DETAIL_IMAGE_TOKENS - - # For medium detail (default in OpenAI), use high detail calculation - # OpenAI doesn't specify a separate calculation for medium - - # For high detail, calculate based on dimensions if available - if detail == "high" or detail == "medium": - # If dimensions are provided in the image_item - if "dimensions" in image_item: - width, height = image_item["dimensions"] - return self._calculate_high_detail_tokens(width, height) - - # Default values when dimensions aren't available or detail level is unknown - if detail == "high": - # Default to a 1024x1024 image calculation for high detail - return self._calculate_high_detail_tokens(1024, 1024) # 765 tokens - elif detail == "medium": - # Default to a medium-sized image for medium detail - return 1024 # This matches the original default - else: - # For unknown detail levels, use medium as default - return 1024 - - def _calculate_high_detail_tokens(self, width: int, height: int) -> int: - """Calculate tokens for high detail images based on dimensions""" - # Step 1: Scale to fit in MAX_SIZE x MAX_SIZE square - if width > self.MAX_SIZE or height > self.MAX_SIZE: - scale = self.MAX_SIZE / max(width, height) - width = int(width * scale) - height = int(height * scale) - - # Step 2: Scale so shortest side is HIGH_DETAIL_TARGET_SHORT_SIDE - scale = self.HIGH_DETAIL_TARGET_SHORT_SIDE / min(width, height) - scaled_width = int(width * scale) - scaled_height = int(height * scale) - - # Step 3: Count number of 512px tiles - tiles_x = math.ceil(scaled_width / self.TILE_SIZE) - tiles_y = math.ceil(scaled_height / self.TILE_SIZE) - total_tiles = tiles_x * tiles_y - - # Step 4: Calculate final token count - return ( - total_tiles * self.HIGH_DETAIL_TILE_TOKENS - ) + self.LOW_DETAIL_IMAGE_TOKENS - - def count_content(self, content: Union[str, List[Union[str, dict]]]) -> int: - """Calculate tokens for message content""" - if not content: - return 0 - - if isinstance(content, str): - return self.count_text(content) - - token_count = 0 - for item in content: - if isinstance(item, str): - token_count += self.count_text(item) - elif isinstance(item, dict): - if "text" in item: - token_count += self.count_text(item["text"]) - elif "image_url" in item: - token_count += self.count_image(item) - return token_count - - def count_tool_calls(self, tool_calls: List[dict]) -> int: - """Calculate tokens for tool calls""" - token_count = 0 - for tool_call in tool_calls: - if "function" in tool_call: - function = tool_call["function"] - token_count += self.count_text(function.get("name", "")) - token_count += self.count_text(function.get("arguments", "")) - return token_count - - def count_message_tokens(self, messages: List[dict]) -> int: - """Calculate the total number of tokens in a message list""" - total_tokens = self.FORMAT_TOKENS # Base format tokens - - for message in messages: - tokens = self.BASE_MESSAGE_TOKENS # Base tokens per message - - # Add role tokens - tokens += self.count_text(message.get("role", "")) - - # Add content tokens - if "content" in message: - tokens += self.count_content(message["content"]) - - # Add tool calls tokens - if "tool_calls" in message: - tokens += self.count_tool_calls(message["tool_calls"]) - - # Add name and tool_call_id tokens - tokens += self.count_text(message.get("name", "")) - tokens += self.count_text(message.get("tool_call_id", "")) - - total_tokens 
+= tokens - - return total_tokens - - -class LLM: - _instances: Dict[str, "LLM"] = {} - - def __new__( - cls, config_name: str = "default", llm_config: Optional[LLMSettings] = None - ): - if config_name not in cls._instances: - instance = super().__new__(cls) - instance.__init__(config_name, llm_config) - cls._instances[config_name] = instance - return cls._instances[config_name] - - def __init__( - self, config_name: str = "default", llm_config: Optional[LLMSettings] = None - ): - if not hasattr(self, "client"): # Only initialize if not already initialized - llm_config = llm_config or config.llm - llm_config = llm_config.get(config_name, llm_config["default"]) - self.model = llm_config.model - self.max_tokens = llm_config.max_tokens - self.temperature = llm_config.temperature - self.api_type = llm_config.api_type - self.api_key = llm_config.api_key - self.api_version = llm_config.api_version - self.base_url = llm_config.base_url - - # Add token counting related attributes - self.total_input_tokens = 0 - self.total_completion_tokens = 0 - self.max_input_tokens = ( - llm_config.max_input_tokens - if hasattr(llm_config, "max_input_tokens") - else None - ) - - # Initialize tokenizer - try: - self.tokenizer = tiktoken.encoding_for_model(self.model) - except KeyError: - # If the model is not in tiktoken's presets, use cl100k_base as default - self.tokenizer = tiktoken.get_encoding("cl100k_base") - - if self.api_type == "azure": - self.client = AsyncAzureOpenAI( - base_url=self.base_url, - api_key=self.api_key, - api_version=self.api_version, - ) - else: - self.client = AsyncOpenAI(api_key=self.api_key, base_url=self.base_url) - - self.token_counter = TokenCounter(self.tokenizer) - - def count_tokens(self, text: str) -> int: - """Calculate the number of tokens in a text""" - if not text: - return 0 - return len(self.tokenizer.encode(text)) - - def count_message_tokens(self, messages: List[dict]) -> int: - return self.token_counter.count_message_tokens(messages) - - def update_token_count(self, input_tokens: int, completion_tokens: int = 0) -> None: - """Update token counts""" - # Only track tokens if max_input_tokens is set - self.total_input_tokens += input_tokens - self.total_completion_tokens += completion_tokens - logger.info( - f"Token usage: Input={input_tokens}, Completion={completion_tokens}, " - f"Cumulative Input={self.total_input_tokens}, Cumulative Completion={self.total_completion_tokens}, " - f"Total={input_tokens + completion_tokens}, Cumulative Total={self.total_input_tokens + self.total_completion_tokens}" - ) - - def check_token_limit(self, input_tokens: int) -> bool: - """Check if token limits are exceeded""" - if self.max_input_tokens is not None: - return (self.total_input_tokens + input_tokens) <= self.max_input_tokens - # If max_input_tokens is not set, always return True - return True - - def get_limit_error_message(self, input_tokens: int) -> str: - """Generate error message for token limit exceeded""" - if ( - self.max_input_tokens is not None - and (self.total_input_tokens + input_tokens) > self.max_input_tokens - ): - return f"Request may exceed input token limit (Current: {self.total_input_tokens}, Needed: {input_tokens}, Max: {self.max_input_tokens})" - - return "Token limit exceeded" - - @staticmethod - def format_messages( - messages: List[Union[dict, Message]], supports_images: bool = False - ) -> List[dict]: - """ - Format messages for LLM by converting them to OpenAI message format. 
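`LLM.__new__` above keeps one instance per `config_name`, and `__init__` opens with a `hasattr(self, "client")` guard because Python re-runs `__init__` on every constructor call, including calls that return a cached instance. A minimal sketch of that pattern in isolation:

```python
# Sketch of the per-config singleton pattern used by LLM above: one cached
# instance per config_name, with an __init__ guard against re-initialization.
from typing import Dict


class PerConfigSingleton:
    _instances: Dict[str, "PerConfigSingleton"] = {}

    def __new__(cls, config_name: str = "default"):
        if config_name not in cls._instances:
            cls._instances[config_name] = super().__new__(cls)
        return cls._instances[config_name]

    def __init__(self, config_name: str = "default"):
        if hasattr(self, "initialized"):  # skip re-init on cache hits
            return
        self.initialized = True
        self.config_name = config_name


assert PerConfigSingleton("default") is PerConfigSingleton("default")
assert PerConfigSingleton("vision") is not PerConfigSingleton("default")
```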
- - Args: - messages: List of messages that can be either dict or Message objects - supports_images: Flag indicating if the target model supports image inputs - - Returns: - List[dict]: List of formatted messages in OpenAI format - - Raises: - ValueError: If messages are invalid or missing required fields - TypeError: If unsupported message types are provided - - Examples: - >>> msgs = [ - ... Message.system_message("You are a helpful assistant"), - ... {"role": "user", "content": "Hello"}, - ... Message.user_message("How are you?") - ... ] - >>> formatted = LLM.format_messages(msgs) - """ - formatted_messages = [] - - for message in messages: - # Convert Message objects to dictionaries - if isinstance(message, Message): - message = message.to_dict() - - if isinstance(message, dict): - # If message is a dict, ensure it has required fields - if "role" not in message: - raise ValueError("Message dict must contain 'role' field") - - # Process base64 images if present and model supports images - if supports_images and message.get("base64_image"): - # Initialize or convert content to appropriate format - if not message.get("content"): - message["content"] = [] - elif isinstance(message["content"], str): - message["content"] = [ - {"type": "text", "text": message["content"]} - ] - elif isinstance(message["content"], list): - # Convert string items to proper text objects - message["content"] = [ - ( - {"type": "text", "text": item} - if isinstance(item, str) - else item - ) - for item in message["content"] - ] - - # Add the image to content - message["content"].append( - { - "type": "image_url", - "image_url": { - "url": f"data:image/jpeg;base64,{message['base64_image']}" - }, - } - ) - - # Remove the base64_image field - del message["base64_image"] - # If model doesn't support images but message has base64_image, handle gracefully - elif not supports_images and message.get("base64_image"): - # Just remove the base64_image field and keep the text content - del message["base64_image"] - - if "content" in message or "tool_calls" in message: - formatted_messages.append(message) - # else: do not include the message - else: - raise TypeError(f"Unsupported message type: {type(message)}") - - # Validate all messages have required fields - for msg in formatted_messages: - if msg["role"] not in ROLE_VALUES: - raise ValueError(f"Invalid role: {msg['role']}") - - return formatted_messages - - @retry( - wait=wait_random_exponential(min=1, max=60), - stop=stop_after_attempt(6), - retry=retry_if_exception_type( - (OpenAIError, Exception, ValueError) - ), # Don't retry TokenLimitExceeded - ) - async def ask( - self, - messages: List[Union[dict, Message]], - system_msgs: Optional[List[Union[dict, Message]]] = None, - stream: bool = True, - temperature: Optional[float] = None, - ) -> str: - """ - Send a prompt to the LLM and get the response. 
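The `base64_image` branch above normalizes `content` into OpenAI's multimodal list form before appending the image part. The same transformation applied to a bare dict (the payload string is invented for illustration):

```python
import json

# Sketch of the base64_image normalization in format_messages, applied to a
# plain dict message; Message objects are converted to dicts before this step.
message = {
    "role": "user",
    "content": "What is in this image?",
    "base64_image": "iVBORw0KGgo...",
}

if not message.get("content"):
    message["content"] = []
elif isinstance(message["content"], str):
    message["content"] = [{"type": "text", "text": message["content"]}]

message["content"].append(
    {
        "type": "image_url",
        "image_url": {"url": f"data:image/jpeg;base64,{message['base64_image']}"},
    }
)
del message["base64_image"]

print(json.dumps(message, indent=2))
```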
- - Args: - messages: List of conversation messages - system_msgs: Optional system messages to prepend - stream (bool): Whether to stream the response - temperature (float): Sampling temperature for the response - - Returns: - str: The generated response - - Raises: - TokenLimitExceeded: If token limits are exceeded - ValueError: If messages are invalid or response is empty - OpenAIError: If API call fails after retries - Exception: For unexpected errors - """ - try: - # Check if the model supports images - supports_images = self.model in MULTIMODAL_MODELS - - # Format system and user messages with image support check - if system_msgs: - system_msgs = self.format_messages(system_msgs, supports_images) - messages = system_msgs + self.format_messages(messages, supports_images) - else: - messages = self.format_messages(messages, supports_images) - - # Calculate input token count - input_tokens = self.count_message_tokens(messages) - - # Check if token limits are exceeded - if not self.check_token_limit(input_tokens): - error_message = self.get_limit_error_message(input_tokens) - # Raise a special exception that won't be retried - raise TokenLimitExceeded(error_message) - - params = { - "model": self.model, - "messages": messages, - } - - if self.model in REASONING_MODELS: - params["max_completion_tokens"] = self.max_tokens - else: - params["max_tokens"] = self.max_tokens - params["temperature"] = ( - temperature if temperature is not None else self.temperature - ) - - if not stream: - # Non-streaming request - response = await self.client.chat.completions.create( - **params, stream=False - ) - - if not response.choices or not response.choices[0].message.content: - raise ValueError("Empty or invalid response from LLM") - - # Update token counts - self.update_token_count( - response.usage.prompt_tokens, response.usage.completion_tokens - ) - - return response.choices[0].message.content - - # Streaming request, For streaming, update estimated token count before making the request - self.update_token_count(input_tokens) - - response = await self.client.chat.completions.create(**params, stream=True) - - collected_messages = [] - completion_text = "" - async for chunk in response: - chunk_message = chunk.choices[0].delta.content or "" - collected_messages.append(chunk_message) - completion_text += chunk_message - print(chunk_message, end="", flush=True) - - print() # Newline after streaming - full_response = "".join(collected_messages).strip() - if not full_response: - raise ValueError("Empty response from streaming LLM") - - # estimate completion tokens for streaming response - completion_tokens = self.count_tokens(completion_text) - logger.info( - f"Estimated completion tokens for streaming response: {completion_tokens}" - ) - self.total_completion_tokens += completion_tokens - - return full_response - - except TokenLimitExceeded: - # Re-raise token limit errors without logging - raise - except ValueError: - logger.exception(f"Validation error") - raise - except OpenAIError as oe: - logger.exception(f"OpenAI API error") - if isinstance(oe, AuthenticationError): - logger.error("Authentication failed. Check API key.") - elif isinstance(oe, RateLimitError): - logger.error("Rate limit exceeded. 
Consider increasing retry attempts.") - elif isinstance(oe, APIError): - logger.error(f"API error: {oe}") - raise - except Exception: - logger.exception(f"Unexpected error in ask") - raise - - @retry( - wait=wait_random_exponential(min=1, max=60), - stop=stop_after_attempt(6), - retry=retry_if_exception_type( - (OpenAIError, Exception, ValueError) - ), # Don't retry TokenLimitExceeded - ) - async def ask_with_images( - self, - messages: List[Union[dict, Message]], - images: List[Union[str, dict]], - system_msgs: Optional[List[Union[dict, Message]]] = None, - stream: bool = False, - temperature: Optional[float] = None, - ) -> str: - """ - Send a prompt with images to the LLM and get the response. - - Args: - messages: List of conversation messages - images: List of image URLs or image data dictionaries - system_msgs: Optional system messages to prepend - stream (bool): Whether to stream the response - temperature (float): Sampling temperature for the response - - Returns: - str: The generated response - - Raises: - TokenLimitExceeded: If token limits are exceeded - ValueError: If messages are invalid or response is empty - OpenAIError: If API call fails after retries - Exception: For unexpected errors - """ - try: - # For ask_with_images, we always set supports_images to True because - # this method should only be called with models that support images - if self.model not in MULTIMODAL_MODELS: - raise ValueError( - f"Model {self.model} does not support images. Use a model from {MULTIMODAL_MODELS}" - ) - - # Format messages with image support - formatted_messages = self.format_messages(messages, supports_images=True) - - # Ensure the last message is from the user to attach images - if not formatted_messages or formatted_messages[-1]["role"] != "user": - raise ValueError( - "The last message must be from the user to attach images" - ) - - # Process the last user message to include images - last_message = formatted_messages[-1] - - # Convert content to multimodal format if needed - content = last_message["content"] - multimodal_content = ( - [{"type": "text", "text": content}] - if isinstance(content, str) - else content - if isinstance(content, list) - else [] - ) - - # Add images to content - for image in images: - if isinstance(image, str): - multimodal_content.append( - {"type": "image_url", "image_url": {"url": image}} - ) - elif isinstance(image, dict) and "url" in image: - multimodal_content.append({"type": "image_url", "image_url": image}) - elif isinstance(image, dict) and "image_url" in image: - multimodal_content.append(image) - else: - raise ValueError(f"Unsupported image format: {image}") - - # Update the message with multimodal content - last_message["content"] = multimodal_content - - # Add system messages if provided - if system_msgs: - all_messages = ( - self.format_messages(system_msgs, supports_images=True) - + formatted_messages - ) - else: - all_messages = formatted_messages - - # Calculate tokens and check limits - input_tokens = self.count_message_tokens(all_messages) - if not self.check_token_limit(input_tokens): - raise TokenLimitExceeded(self.get_limit_error_message(input_tokens)) - - # Set up API parameters - params = { - "model": self.model, - "messages": all_messages, - "stream": stream, - } - - # Add model-specific parameters - if self.model in REASONING_MODELS: - params["max_completion_tokens"] = self.max_tokens - else: - params["max_tokens"] = self.max_tokens - params["temperature"] = ( - temperature if temperature is not None else self.temperature - ) - - # 
Handle non-streaming request - if not stream: - response = await self.client.chat.completions.create(**params) - - if not response.choices or not response.choices[0].message.content: - raise ValueError("Empty or invalid response from LLM") - - self.update_token_count(response.usage.prompt_tokens) - return response.choices[0].message.content - - # Handle streaming request - self.update_token_count(input_tokens) - response = await self.client.chat.completions.create(**params) - - collected_messages = [] - async for chunk in response: - chunk_message = chunk.choices[0].delta.content or "" - collected_messages.append(chunk_message) - print(chunk_message, end="", flush=True) - - print() # Newline after streaming - full_response = "".join(collected_messages).strip() - - if not full_response: - raise ValueError("Empty response from streaming LLM") - - return full_response - - except TokenLimitExceeded: - raise - except ValueError as ve: - logger.error(f"Validation error in ask_with_images: {ve}") - raise - except OpenAIError as oe: - logger.error(f"OpenAI API error: {oe}") - if isinstance(oe, AuthenticationError): - logger.error("Authentication failed. Check API key.") - elif isinstance(oe, RateLimitError): - logger.error("Rate limit exceeded. Consider increasing retry attempts.") - elif isinstance(oe, APIError): - logger.error(f"API error: {oe}") - raise - except Exception as e: - logger.error(f"Unexpected error in ask_with_images: {e}") - raise - - @retry( - wait=wait_random_exponential(min=1, max=60), - stop=stop_after_attempt(6), - retry=retry_if_exception_type( - (OpenAIError, Exception, ValueError) - ), # Don't retry TokenLimitExceeded - ) - async def ask_tool( - self, - messages: List[Union[dict, Message]], - system_msgs: Optional[List[Union[dict, Message]]] = None, - timeout: int = 300, - tools: Optional[List[dict]] = None, - tool_choice: TOOL_CHOICE_TYPE = ToolChoice.AUTO, # type: ignore - temperature: Optional[float] = None, - **kwargs, - ) -> ChatCompletionMessage | None: - """ - Ask LLM using functions/tools and return the response. 
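One subtlety in the `@retry` decorators used throughout this file: the tuple passed to `retry_if_exception_type` includes `Exception`, which matches every exception, so the trailing comment's promise not to retry `TokenLimitExceeded` does not actually hold (assuming `TokenLimitExceeded` subclasses `Exception`). tenacity predicates compose with `&`/`|`, so the stated intent could be written directly; a sketch, not the repository's code:

```python
from tenacity import (
    retry,
    retry_if_exception_type,
    retry_if_not_exception_type,
    stop_after_attempt,
    wait_random_exponential,
)


class TokenLimitExceeded(Exception):
    """Stand-in for app.exceptions.TokenLimitExceeded."""


@retry(
    wait=wait_random_exponential(min=1, max=60),
    stop=stop_after_attempt(6),
    # Retry on anything *except* TokenLimitExceeded, matching the comment's intent.
    retry=retry_if_exception_type(Exception)
    & retry_if_not_exception_type(TokenLimitExceeded),
)
async def ask_sketch() -> str:
    raise TokenLimitExceeded("raised once, never retried")
```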
- - Args: - messages: List of conversation messages - system_msgs: Optional system messages to prepend - timeout: Request timeout in seconds - tools: List of tools to use - tool_choice: Tool choice strategy - temperature: Sampling temperature for the response - **kwargs: Additional completion arguments - - Returns: - ChatCompletionMessage: The model's response - - Raises: - TokenLimitExceeded: If token limits are exceeded - ValueError: If tools, tool_choice, or messages are invalid - OpenAIError: If API call fails after retries - Exception: For unexpected errors - """ - try: - # Validate tool_choice - if tool_choice not in TOOL_CHOICE_VALUES: - raise ValueError(f"Invalid tool_choice: {tool_choice}") - - # Check if the model supports images - supports_images = self.model in MULTIMODAL_MODELS - - # Format messages - if system_msgs: - system_msgs = self.format_messages(system_msgs, supports_images) - messages = system_msgs + self.format_messages(messages, supports_images) - else: - messages = self.format_messages(messages, supports_images) - - # Calculate input token count - input_tokens = self.count_message_tokens(messages) - - # If there are tools, calculate token count for tool descriptions - tools_tokens = 0 - if tools: - for tool in tools: - tools_tokens += self.count_tokens(str(tool)) - - input_tokens += tools_tokens - - # Check if token limits are exceeded - if not self.check_token_limit(input_tokens): - error_message = self.get_limit_error_message(input_tokens) - # Raise a special exception that won't be retried - raise TokenLimitExceeded(error_message) - - # Validate tools if provided - if tools: - for tool in tools: - if not isinstance(tool, dict) or "type" not in tool: - raise ValueError("Each tool must be a dict with 'type' field") - - # Set up the completion request - params = { - "model": self.model, - "messages": messages, - "tools": tools, - "tool_choice": tool_choice, - "timeout": timeout, - **kwargs, - } - - if self.model in REASONING_MODELS: - params["max_completion_tokens"] = self.max_tokens - else: - params["max_tokens"] = self.max_tokens - params["temperature"] = ( - temperature if temperature is not None else self.temperature - ) - - response: ChatCompletion = await self.client.chat.completions.create( - **params, stream=False - ) - - # Check if response is valid - if not response.choices or not response.choices[0].message: - print(response) - # raise ValueError("Invalid or empty response from LLM") - return None - - # Update token counts - self.update_token_count( - response.usage.prompt_tokens, response.usage.completion_tokens - ) - - return response.choices[0].message - - except TokenLimitExceeded: - # Re-raise token limit errors without logging - raise - except ValueError as ve: - logger.error(f"Validation error in ask_tool: {ve}") - raise - except OpenAIError as oe: - logger.error(f"OpenAI API error: {oe}") - if isinstance(oe, AuthenticationError): - logger.error("Authentication failed. Check API key.") - elif isinstance(oe, RateLimitError): - logger.error("Rate limit exceeded. 
Consider increasing retry attempts.") - elif isinstance(oe, APIError): - logger.error(f"API error: {oe}") - raise - except Exception as e: - logger.error(f"Unexpected error in ask_tool: {e}") - raise diff --git a/openmanus_rl/agentgym/OpenManus/app/logger.py b/openmanus_rl/agentgym/OpenManus/app/logger.py deleted file mode 100644 index c5d9ce18..00000000 --- a/openmanus_rl/agentgym/OpenManus/app/logger.py +++ /dev/null @@ -1,42 +0,0 @@ -import sys -from datetime import datetime - -from loguru import logger as _logger - -from app.config import PROJECT_ROOT - - -_print_level = "INFO" - - -def define_log_level(print_level="INFO", logfile_level="DEBUG", name: str = None): - """Adjust the log level to above level""" - global _print_level - _print_level = print_level - - current_date = datetime.now() - formatted_date = current_date.strftime("%Y%m%d%H%M%S") - log_name = ( - f"{name}_{formatted_date}" if name else formatted_date - ) # name a log with prefix name - - _logger.remove() - _logger.add(sys.stderr, level=print_level) - _logger.add(PROJECT_ROOT / f"logs/{log_name}.log", level=logfile_level) - return _logger - - -logger = define_log_level() - - -if __name__ == "__main__": - logger.info("Starting application") - logger.debug("Debug message") - logger.warning("Warning message") - logger.error("Error message") - logger.critical("Critical message") - - try: - raise ValueError("Test error") - except Exception as e: - logger.exception(f"An error occurred: {e}") diff --git a/openmanus_rl/agentgym/OpenManus/app/mcp/__init__.py b/openmanus_rl/agentgym/OpenManus/app/mcp/__init__.py deleted file mode 100644 index e69de29b..00000000 diff --git a/openmanus_rl/agentgym/OpenManus/app/mcp/server.py b/openmanus_rl/agentgym/OpenManus/app/mcp/server.py deleted file mode 100644 index 028ffae8..00000000 --- a/openmanus_rl/agentgym/OpenManus/app/mcp/server.py +++ /dev/null @@ -1,196 +0,0 @@ -import argparse -import asyncio -import atexit -import json -import logging -import os -import sys -from inspect import Parameter, Signature -from typing import Any, Dict, Optional - -from mcp.server.fastmcp import FastMCP - - -# Add directories to Python path (needed for proper importing) -current_dir = os.path.dirname(os.path.abspath(__file__)) -parent_dir = os.path.dirname(current_dir) -root_dir = os.path.dirname(parent_dir) -sys.path.insert(0, parent_dir) -sys.path.insert(0, current_dir) -sys.path.insert(0, root_dir) - -# Configure logging (using the same format as original) -logging.basicConfig( - level=logging.INFO, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s" -) -logger = logging.getLogger("mcp-server") - -from app.tool.base import BaseTool -from app.tool.bash import Bash -from app.tool.browser_use_tool import BrowserUseTool -from app.tool.str_replace_editor import StrReplaceEditor -from app.tool.terminate import Terminate - - -class MCPServer: - """MCP Server implementation with tool registration and management.""" - - def __init__(self, name: str = "openmanus"): - self.server = FastMCP(name) - self.tools: Dict[str, BaseTool] = {} - - # Initialize standard tools - self.tools["bash"] = Bash() - self.tools["browser"] = BrowserUseTool() - self.tools["editor"] = StrReplaceEditor() - self.tools["terminate"] = Terminate() - - from app.logger import logger as app_logger - - global logger - logger = app_logger - - def register_tool(self, tool: BaseTool, method_name: Optional[str] = None) -> None: - """Register a tool with parameter validation and documentation.""" - tool_name = method_name or tool.name - 
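Just above, the deleted `logger.py` wires loguru with two sinks: stderr at a configurable level plus a timestamped file under `logs/`. A self-contained sketch of that setup (a local `logs/` directory stands in for `PROJECT_ROOT`):

```python
import sys
from datetime import datetime
from pathlib import Path

from loguru import logger as _logger


def define_log_level(print_level="INFO", logfile_level="DEBUG", name: str = None):
    """Sketch of the dual-sink setup above: stderr plus a timestamped logfile."""
    stamp = datetime.now().strftime("%Y%m%d%H%M%S")
    log_name = f"{name}_{stamp}" if name else stamp
    _logger.remove()  # drop the default stderr sink before re-adding
    _logger.add(sys.stderr, level=print_level)
    _logger.add(Path("logs") / f"{log_name}.log", level=logfile_level)
    return _logger


logger = define_log_level()
logger.info("logger configured")
```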
tool_param = tool.to_param() - tool_function = tool_param["function"] - - # Define the async function to be registered - async def tool_method(**kwargs): - logger.info(f"Executing {tool_name}: {kwargs}") - result = await tool.execute(**kwargs) - - logger.info(f"Result of {tool_name}: {result}") - - # Handle different types of results (match original logic) - if hasattr(result, "model_dump"): - return json.dumps(result.model_dump()) - elif isinstance(result, dict): - return json.dumps(result) - return result - - # Set method metadata - tool_method.__name__ = tool_name - tool_method.__doc__ = self._build_docstring(tool_function) - tool_method.__signature__ = self._build_signature(tool_function) - - # Store parameter schema (important for tools that access it programmatically) - param_props = tool_function.get("parameters", {}).get("properties", {}) - required_params = tool_function.get("parameters", {}).get("required", []) - tool_method._parameter_schema = { - param_name: { - "description": param_details.get("description", ""), - "type": param_details.get("type", "any"), - "required": param_name in required_params, - } - for param_name, param_details in param_props.items() - } - - # Register with server - self.server.tool()(tool_method) - logger.info(f"Registered tool: {tool_name}") - - def _build_docstring(self, tool_function: dict) -> str: - """Build a formatted docstring from tool function metadata.""" - description = tool_function.get("description", "") - param_props = tool_function.get("parameters", {}).get("properties", {}) - required_params = tool_function.get("parameters", {}).get("required", []) - - # Build docstring (match original format) - docstring = description - if param_props: - docstring += "\n\nParameters:\n" - for param_name, param_details in param_props.items(): - required_str = ( - "(required)" if param_name in required_params else "(optional)" - ) - param_type = param_details.get("type", "any") - param_desc = param_details.get("description", "") - docstring += ( - f" {param_name} ({param_type}) {required_str}: {param_desc}\n" - ) - - return docstring - - def _build_signature(self, tool_function: dict) -> Signature: - """Build a function signature from tool function metadata.""" - param_props = tool_function.get("parameters", {}).get("properties", {}) - required_params = tool_function.get("parameters", {}).get("required", []) - - parameters = [] - - # Follow original type mapping - for param_name, param_details in param_props.items(): - param_type = param_details.get("type", "") - default = Parameter.empty if param_name in required_params else None - - # Map JSON Schema types to Python types (same as original) - annotation = Any - if param_type == "string": - annotation = str - elif param_type == "integer": - annotation = int - elif param_type == "number": - annotation = float - elif param_type == "boolean": - annotation = bool - elif param_type == "object": - annotation = dict - elif param_type == "array": - annotation = list - - # Create parameter with same structure as original - param = Parameter( - name=param_name, - kind=Parameter.KEYWORD_ONLY, - default=default, - annotation=annotation, - ) - parameters.append(param) - - return Signature(parameters=parameters) - - async def cleanup(self) -> None: - """Clean up server resources.""" - logger.info("Cleaning up resources") - # Follow original cleanup logic - only clean browser tool - if "browser" in self.tools and hasattr(self.tools["browser"], "cleanup"): - await self.tools["browser"].cleanup() - - def 
register_all_tools(self) -> None: - """Register all tools with the server.""" - for tool in self.tools.values(): - self.register_tool(tool) - - def run(self, transport: str = "stdio") -> None: - """Run the MCP server.""" - # Register all tools - self.register_all_tools() - - # Register cleanup function (match original behavior) - atexit.register(lambda: asyncio.run(self.cleanup())) - - # Start server (with same logging as original) - logger.info(f"Starting OpenManus server ({transport} mode)") - self.server.run(transport=transport) - - -def parse_args() -> argparse.Namespace: - """Parse command line arguments.""" - parser = argparse.ArgumentParser(description="OpenManus MCP Server") - parser.add_argument( - "--transport", - choices=["stdio"], - default="stdio", - help="Communication method: stdio or http (default: stdio)", - ) - return parser.parse_args() - - -if __name__ == "__main__": - args = parse_args() - - # Create and run server (maintaining original flow) - server = MCPServer() - server.run(transport=args.transport) diff --git a/openmanus_rl/agentgym/OpenManus/app/prompt/__init__.py b/openmanus_rl/agentgym/OpenManus/app/prompt/__init__.py deleted file mode 100644 index e69de29b..00000000 diff --git a/openmanus_rl/agentgym/OpenManus/app/prompt/browser.py b/openmanus_rl/agentgym/OpenManus/app/prompt/browser.py deleted file mode 100644 index 70fed300..00000000 --- a/openmanus_rl/agentgym/OpenManus/app/prompt/browser.py +++ /dev/null @@ -1,92 +0,0 @@ -SYSTEM_PROMPT = """\ -You are an AI agent designed to automate browser tasks. Your goal is to accomplish the ultimate task following the rules. - -# Input Format -Task -Previous steps -Current URL -Open Tabs -Interactive Elements -[index]text -- index: Numeric identifier for interaction -- type: HTML element type (button, input, etc.) -- text: Element description -Example: -[33] - -- Only elements with numeric indexes in [] are interactive -- elements without [] provide only context - -# Response Rules -1. RESPONSE FORMAT: You must ALWAYS respond with valid JSON in this exact format: -{{"current_state": {{"evaluation_previous_goal": "Success|Failed|Unknown - Analyze the current elements and the image to check if the previous goals/actions are successful like intended by the task. Mention if something unexpected happened. Shortly state why/why not", -"memory": "Description of what has been done and what you need to remember. Be very specific. Count here ALWAYS how many times you have done something and how many remain. E.g. 0 out of 10 websites analyzed. Continue with abc and xyz", -"next_goal": "What needs to be done with the next immediate action"}}, -"action":[{{"one_action_name": {{// action-specific parameter}}}}, // ... more actions in sequence]}} - -2. ACTIONS: You can specify multiple actions in the list to be executed in sequence. But always specify only one action name per item. Use maximum {{max_actions}} actions per sequence. -Common action sequences: -- Form filling: [{{"input_text": {{"index": 1, "text": "username"}}}}, {{"input_text": {{"index": 2, "text": "password"}}}}, {{"click_element": {{"index": 3}}}}] -- Navigation and extraction: [{{"go_to_url": {{"url": "https://example.com"}}}}, {{"extract_content": {{"goal": "extract the names"}}}}] -- Actions are executed in the given order -- If the page changes after an action, the sequence is interrupted and you get the new state. -- Only provide the action sequence until an action which changes the page state significantly. -- Try to be efficient, e.g. 
fill forms at once, or chain actions where nothing changes on the page -- only use multiple actions if it makes sense. - -3. ELEMENT INTERACTION: -- Only use indexes of the interactive elements -- Elements marked with "[]Non-interactive text" are non-interactive - -4. NAVIGATION & ERROR HANDLING: -- If no suitable elements exist, use other functions to complete the task -- If stuck, try alternative approaches - like going back to a previous page, new search, new tab etc. -- Handle popups/cookies by accepting or closing them -- Use scroll to find elements you are looking for -- If you want to research something, open a new tab instead of using the current tab -- If captcha pops up, try to solve it - else try a different approach -- If the page is not fully loaded, use wait action - -5. TASK COMPLETION: -- Use the done action as the last action as soon as the ultimate task is complete -- Dont use "done" before you are done with everything the user asked you, except you reach the last step of max_steps. -- If you reach your last step, use the done action even if the task is not fully finished. Provide all the information you have gathered so far. If the ultimate task is completly finished set success to true. If not everything the user asked for is completed set success in done to false! -- If you have to do something repeatedly for example the task says for "each", or "for all", or "x times", count always inside "memory" how many times you have done it and how many remain. Don't stop until you have completed like the task asked you. Only call done after the last step. -- Don't hallucinate actions -- Make sure you include everything you found out for the ultimate task in the done text parameter. Do not just say you are done, but include the requested information of the task. - -6. VISUAL CONTEXT: -- When an image is provided, use it to understand the page layout -- Bounding boxes with labels on their top right corner correspond to element indexes - -7. Form filling: -- If you fill an input field and your action sequence is interrupted, most often something changed e.g. suggestions popped up under the field. - -8. Long tasks: -- Keep track of the status and subresults in the memory. - -9. Extraction: -- If your task is to find information - call extract_content on the specific pages to get and store the information. -Your responses must be always JSON with the specified format. -""" - -NEXT_STEP_PROMPT = """ -What should I do next to achieve my goal? - -When you see [Current state starts here], focus on the following: -- Current URL and page title{url_placeholder} -- Available tabs{tabs_placeholder} -- Interactive elements and their indices -- Content above{content_above_placeholder} or below{content_below_placeholder} the viewport (if indicated) -- Any action results or errors{results_placeholder} - -For browser interactions: -- To navigate: browser_use with action="go_to_url", url="..." -- To click: browser_use with action="click_element", index=N -- To type: browser_use with action="input_text", index=N, text="..." -- To extract: browser_use with action="extract_content", goal="..." -- To scroll: browser_use with action="scroll_down" or "scroll_up" - -Consider both what's visible and what might be beyond the current viewport. -Be methodical - remember your progress and what you've learned so far. 
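The RESPONSE FORMAT rule above demands strict JSON with a fixed shape. For reference, a response that satisfies it (the keys mirror the schema spelled out in SYSTEM_PROMPT; the values are invented):

```python
import json

# Hypothetical, schema-conformant browser-agent response: current_state with
# evaluation/memory/next_goal, plus an ordered action list, as required above.
response = {
    "current_state": {
        "evaluation_previous_goal": "Success - the search page loaded as intended",
        "memory": "1 out of 3 websites analyzed. Continue with site 2 and site 3",
        "next_goal": "Open the first result",
    },
    "action": [
        {"go_to_url": {"url": "https://example.com"}},
        {"extract_content": {"goal": "extract the names"}},
    ],
}
print(json.dumps(response, indent=2))
```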
-""" diff --git a/openmanus_rl/agentgym/OpenManus/app/prompt/manus.py b/openmanus_rl/agentgym/OpenManus/app/prompt/manus.py deleted file mode 100644 index f080ba45..00000000 --- a/openmanus_rl/agentgym/OpenManus/app/prompt/manus.py +++ /dev/null @@ -1,8 +0,0 @@ -SYSTEM_PROMPT = ( - "You are OpenManus, an all-capable AI assistant, aimed at solving any task presented by the user. You have various tools at your disposal that you can call upon to efficiently complete complex requests. Whether it's programming, information retrieval, file processing, or web browsing, you can handle it all." - "The initial directory is: {directory}" -) - -NEXT_STEP_PROMPT = """ -Based on user needs, proactively select the most appropriate tool or combination of tools. For complex tasks, you can break down the problem and use different tools step by step to solve it. After using each tool, clearly explain the execution results and suggest the next steps. -""" diff --git a/openmanus_rl/agentgym/OpenManus/app/prompt/mcp.py b/openmanus_rl/agentgym/OpenManus/app/prompt/mcp.py deleted file mode 100644 index acf15b2a..00000000 --- a/openmanus_rl/agentgym/OpenManus/app/prompt/mcp.py +++ /dev/null @@ -1,43 +0,0 @@ -"""Prompts for the MCP Agent.""" - -SYSTEM_PROMPT = """You are an AI assistant with access to a Model Context Protocol (MCP) server. -You can use the tools provided by the MCP server to complete tasks. -The MCP server will dynamically expose tools that you can use - always check the available tools first. - -When using an MCP tool: -1. Choose the appropriate tool based on your task requirements -2. Provide properly formatted arguments as required by the tool -3. Observe the results and use them to determine next steps -4. Tools may change during operation - new tools might appear or existing ones might disappear - -Follow these guidelines: -- Call tools with valid parameters as documented in their schemas -- Handle errors gracefully by understanding what went wrong and trying again with corrected parameters -- For multimedia responses (like images), you'll receive a description of the content -- Complete user requests step by step, using the most appropriate tools -- If multiple tools need to be called in sequence, make one call at a time and wait for results - -Remember to clearly explain your reasoning and actions to the user. -""" - -NEXT_STEP_PROMPT = """Based on the current state and available tools, what should be done next? -Think step by step about the problem and identify which MCP tool would be most helpful for the current stage. -If you've already made progress, consider what additional information you need or what actions would move you closer to completing the task. -""" - -# Additional specialized prompts -TOOL_ERROR_PROMPT = """You encountered an error with the tool '{tool_name}'. -Try to understand what went wrong and correct your approach. -Common issues include: -- Missing or incorrect parameters -- Invalid parameter formats -- Using a tool that's no longer available -- Attempting an operation that's not supported - -Please check the tool specifications and try again with corrected parameters. -""" - -MULTIMEDIA_RESPONSE_PROMPT = """You've received a multimedia response (image, audio, etc.) from the tool '{tool_name}'. -This content has been processed and described for you. -Use this information to continue the task or provide insights to the user. 
-""" diff --git a/openmanus_rl/agentgym/OpenManus/app/prompt/planning.py b/openmanus_rl/agentgym/OpenManus/app/prompt/planning.py deleted file mode 100644 index bd5f4ce7..00000000 --- a/openmanus_rl/agentgym/OpenManus/app/prompt/planning.py +++ /dev/null @@ -1,27 +0,0 @@ -PLANNING_SYSTEM_PROMPT = """ -You are an expert Planning Agent tasked with solving problems efficiently through structured plans. -Your job is: -1. Analyze requests to understand the task scope -2. Create a clear, actionable plan that makes meaningful progress with the `planning` tool -3. Execute steps using available tools as needed -4. Track progress and adapt plans when necessary -5. Use `finish` to conclude immediately when the task is complete - - -Available tools will vary by task but may include: -- `planning`: Create, update, and track plans (commands: create, update, mark_step, etc.) -- `finish`: End the task when complete -Break tasks into logical steps with clear outcomes. Avoid excessive detail or sub-steps. -Think about dependencies and verification methods. -Know when to conclude - don't continue thinking once objectives are met. -""" - -NEXT_STEP_PROMPT = """ -Based on the current state, what's your next action? -Choose the most efficient path forward: -1. Is the plan sufficient, or does it need refinement? -2. Can you execute the next step immediately? -3. Is the task complete? If so, use `finish` right away. - -Be concise in your reasoning, then select the appropriate tool or action. -""" diff --git a/openmanus_rl/agentgym/OpenManus/app/prompt/swe.py b/openmanus_rl/agentgym/OpenManus/app/prompt/swe.py deleted file mode 100644 index a496988e..00000000 --- a/openmanus_rl/agentgym/OpenManus/app/prompt/swe.py +++ /dev/null @@ -1,28 +0,0 @@ -SYSTEM_PROMPT = """SETTING: You are an autonomous programmer, and you're working directly in the command line with a special interface. - -The special interface consists of a file editor that shows you {{WINDOW}} lines of a file at a time. -In addition to typical bash commands, you can also use specific commands to help you navigate and edit files. -To call a command, you need to invoke it with a function call/tool call. - -Please note that THE EDIT COMMAND REQUIRES PROPER INDENTATION. -If you'd like to add the line ' print(x)' you must fully write that out, with all those spaces before the code! Indentation is important and code that is not indented correctly will fail and require fixing before it can be run. - -RESPONSE FORMAT: -Your shell prompt is formatted as follows: -(Open file: ) -(Current directory: ) -bash-$ - -First, you should _always_ include a general thought about what you're going to do next. -Then, for every response, you must include exactly _ONE_ tool call/function call. - -Remember, you should always include a _SINGLE_ tool call/function call and then wait for a response from the shell before continuing with more discussion and commands. Everything you include in the DISCUSSION section will be saved for future reference. -If you'd like to issue two commands at once, PLEASE DO NOT DO THAT! Please instead first submit just the first tool call, and then after receiving a response you'll be able to issue the second tool call. -Note that the environment does NOT support interactive session commands (e.g. python, vim), so please do not invoke them. 
-""" - -NEXT_STEP_TEMPLATE = """{{observation}} -(Open file: {{open_file}}) -(Current directory: {{working_dir}}) -bash-$ -""" diff --git a/openmanus_rl/agentgym/OpenManus/app/prompt/toolcall.py b/openmanus_rl/agentgym/OpenManus/app/prompt/toolcall.py deleted file mode 100644 index e1a3be93..00000000 --- a/openmanus_rl/agentgym/OpenManus/app/prompt/toolcall.py +++ /dev/null @@ -1,5 +0,0 @@ -SYSTEM_PROMPT = "You are an agent that can execute tool calls" - -NEXT_STEP_PROMPT = ( - "If you want to stop interaction, use `terminate` tool/function call." -) diff --git a/openmanus_rl/agentgym/OpenManus/app/sandbox/__init__.py b/openmanus_rl/agentgym/OpenManus/app/sandbox/__init__.py deleted file mode 100644 index ccf0df6d..00000000 --- a/openmanus_rl/agentgym/OpenManus/app/sandbox/__init__.py +++ /dev/null @@ -1,30 +0,0 @@ -""" -Docker Sandbox Module - -Provides secure containerized execution environment with resource limits -and isolation for running untrusted code. -""" -from app.sandbox.client import ( - BaseSandboxClient, - LocalSandboxClient, - create_sandbox_client, -) -from app.sandbox.core.exceptions import ( - SandboxError, - SandboxResourceError, - SandboxTimeoutError, -) -from app.sandbox.core.manager import SandboxManager -from app.sandbox.core.sandbox import DockerSandbox - - -__all__ = [ - "DockerSandbox", - "SandboxManager", - "BaseSandboxClient", - "LocalSandboxClient", - "create_sandbox_client", - "SandboxError", - "SandboxTimeoutError", - "SandboxResourceError", -] diff --git a/openmanus_rl/agentgym/OpenManus/app/sandbox/client.py b/openmanus_rl/agentgym/OpenManus/app/sandbox/client.py deleted file mode 100644 index 09a8f2e8..00000000 --- a/openmanus_rl/agentgym/OpenManus/app/sandbox/client.py +++ /dev/null @@ -1,201 +0,0 @@ -from abc import ABC, abstractmethod -from typing import Dict, Optional, Protocol - -from app.config import SandboxSettings -from app.sandbox.core.sandbox import DockerSandbox - - -class SandboxFileOperations(Protocol): - """Protocol for sandbox file operations.""" - - async def copy_from(self, container_path: str, local_path: str) -> None: - """Copies file from container to local. - - Args: - container_path: File path in container. - local_path: Local destination path. - """ - ... - - async def copy_to(self, local_path: str, container_path: str) -> None: - """Copies file from local to container. - - Args: - local_path: Local source file path. - container_path: Destination path in container. - """ - ... - - async def read_file(self, path: str) -> str: - """Reads file content from container. - - Args: - path: File path in container. - - Returns: - str: File content. - """ - ... - - async def write_file(self, path: str, content: str) -> None: - """Writes content to file in container. - - Args: - path: File path in container. - content: Content to write. - """ - ... 
- - -class BaseSandboxClient(ABC): - """Base sandbox client interface.""" - - @abstractmethod - async def create( - self, - config: Optional[SandboxSettings] = None, - volume_bindings: Optional[Dict[str, str]] = None, - ) -> None: - """Creates sandbox.""" - - @abstractmethod - async def run_command(self, command: str, timeout: Optional[int] = None) -> str: - """Executes command.""" - - @abstractmethod - async def copy_from(self, container_path: str, local_path: str) -> None: - """Copies file from container.""" - - @abstractmethod - async def copy_to(self, local_path: str, container_path: str) -> None: - """Copies file to container.""" - - @abstractmethod - async def read_file(self, path: str) -> str: - """Reads file.""" - - @abstractmethod - async def write_file(self, path: str, content: str) -> None: - """Writes file.""" - - @abstractmethod - async def cleanup(self) -> None: - """Cleans up resources.""" - - -class LocalSandboxClient(BaseSandboxClient): - """Local sandbox client implementation.""" - - def __init__(self): - """Initializes local sandbox client.""" - self.sandbox: Optional[DockerSandbox] = None - - async def create( - self, - config: Optional[SandboxSettings] = None, - volume_bindings: Optional[Dict[str, str]] = None, - ) -> None: - """Creates a sandbox. - - Args: - config: Sandbox configuration. - volume_bindings: Volume mappings. - - Raises: - RuntimeError: If sandbox creation fails. - """ - self.sandbox = DockerSandbox(config, volume_bindings) - await self.sandbox.create() - - async def run_command(self, command: str, timeout: Optional[int] = None) -> str: - """Runs command in sandbox. - - Args: - command: Command to execute. - timeout: Execution timeout in seconds. - - Returns: - Command output. - - Raises: - RuntimeError: If sandbox not initialized. - """ - if not self.sandbox: - raise RuntimeError("Sandbox not initialized") - return await self.sandbox.run_command(command, timeout) - - async def copy_from(self, container_path: str, local_path: str) -> None: - """Copies file from container to local. - - Args: - container_path: File path in container. - local_path: Local destination path. - - Raises: - RuntimeError: If sandbox not initialized. - """ - if not self.sandbox: - raise RuntimeError("Sandbox not initialized") - await self.sandbox.copy_from(container_path, local_path) - - async def copy_to(self, local_path: str, container_path: str) -> None: - """Copies file from local to container. - - Args: - local_path: Local source file path. - container_path: Destination path in container. - - Raises: - RuntimeError: If sandbox not initialized. - """ - if not self.sandbox: - raise RuntimeError("Sandbox not initialized") - await self.sandbox.copy_to(local_path, container_path) - - async def read_file(self, path: str) -> str: - """Reads file from container. - - Args: - path: File path in container. - - Returns: - File content. - - Raises: - RuntimeError: If sandbox not initialized. - """ - if not self.sandbox: - raise RuntimeError("Sandbox not initialized") - return await self.sandbox.read_file(path) - - async def write_file(self, path: str, content: str) -> None: - """Writes file to container. - - Args: - path: File path in container. - content: File content. - - Raises: - RuntimeError: If sandbox not initialized. 
- """ - if not self.sandbox: - raise RuntimeError("Sandbox not initialized") - await self.sandbox.write_file(path, content) - - async def cleanup(self) -> None: - """Cleans up resources.""" - if self.sandbox: - await self.sandbox.cleanup() - self.sandbox = None - - -def create_sandbox_client() -> LocalSandboxClient: - """Creates a sandbox client. - - Returns: - LocalSandboxClient: Sandbox client instance. - """ - return LocalSandboxClient() - - -SANDBOX_CLIENT = create_sandbox_client() diff --git a/openmanus_rl/agentgym/OpenManus/app/sandbox/core/exceptions.py b/openmanus_rl/agentgym/OpenManus/app/sandbox/core/exceptions.py deleted file mode 100644 index 5c1f0e8a..00000000 --- a/openmanus_rl/agentgym/OpenManus/app/sandbox/core/exceptions.py +++ /dev/null @@ -1,17 +0,0 @@ -"""Exception classes for the sandbox system. - -This module defines custom exceptions used throughout the sandbox system to -handle various error conditions in a structured way. -""" - - -class SandboxError(Exception): - """Base exception for sandbox-related errors.""" - - -class SandboxTimeoutError(SandboxError): - """Exception raised when a sandbox operation times out.""" - - -class SandboxResourceError(SandboxError): - """Exception raised for resource-related errors.""" diff --git a/openmanus_rl/agentgym/OpenManus/app/sandbox/core/manager.py b/openmanus_rl/agentgym/OpenManus/app/sandbox/core/manager.py deleted file mode 100644 index 5814f120..00000000 --- a/openmanus_rl/agentgym/OpenManus/app/sandbox/core/manager.py +++ /dev/null @@ -1,313 +0,0 @@ -import asyncio -import uuid -from contextlib import asynccontextmanager -from typing import Dict, Optional, Set - -import docker -from docker.errors import APIError, ImageNotFound - -from app.config import SandboxSettings -from app.logger import logger -from app.sandbox.core.sandbox import DockerSandbox - - -class SandboxManager: - """Docker sandbox manager. - - Manages multiple DockerSandbox instances lifecycle including creation, - monitoring, and cleanup. Provides concurrent access control and automatic - cleanup mechanisms for sandbox resources. - - Attributes: - max_sandboxes: Maximum allowed number of sandboxes. - idle_timeout: Sandbox idle timeout in seconds. - cleanup_interval: Cleanup check interval in seconds. - _sandboxes: Active sandbox instance mapping. - _last_used: Last used time record for sandboxes. - """ - - def __init__( - self, - max_sandboxes: int = 100, - idle_timeout: int = 3600, - cleanup_interval: int = 300, - ): - """Initializes sandbox manager. - - Args: - max_sandboxes: Maximum sandbox count limit. - idle_timeout: Idle timeout in seconds. - cleanup_interval: Cleanup check interval in seconds. - """ - self.max_sandboxes = max_sandboxes - self.idle_timeout = idle_timeout - self.cleanup_interval = cleanup_interval - - # Docker client - self._client = docker.from_env() - - # Resource mappings - self._sandboxes: Dict[str, DockerSandbox] = {} - self._last_used: Dict[str, float] = {} - - # Concurrency control - self._locks: Dict[str, asyncio.Lock] = {} - self._global_lock = asyncio.Lock() - self._active_operations: Set[str] = set() - - # Cleanup task - self._cleanup_task: Optional[asyncio.Task] = None - self._is_shutting_down = False - - # Start automatic cleanup - self.start_cleanup_task() - - async def ensure_image(self, image: str) -> bool: - """Ensures Docker image is available. - - Args: - image: Image name. - - Returns: - bool: Whether image is available. 
- """ - try: - self._client.images.get(image) - return True - except ImageNotFound: - try: - logger.info(f"Pulling image {image}...") - await asyncio.get_event_loop().run_in_executor( - None, self._client.images.pull, image - ) - return True - except (APIError, Exception) as e: - logger.error(f"Failed to pull image {image}: {e}") - return False - - @asynccontextmanager - async def sandbox_operation(self, sandbox_id: str): - """Context manager for sandbox operations. - - Provides concurrency control and usage time updates. - - Args: - sandbox_id: Sandbox ID. - - Raises: - KeyError: If sandbox not found. - """ - if sandbox_id not in self._locks: - self._locks[sandbox_id] = asyncio.Lock() - - async with self._locks[sandbox_id]: - if sandbox_id not in self._sandboxes: - raise KeyError(f"Sandbox {sandbox_id} not found") - - self._active_operations.add(sandbox_id) - try: - self._last_used[sandbox_id] = asyncio.get_event_loop().time() - yield self._sandboxes[sandbox_id] - finally: - self._active_operations.remove(sandbox_id) - - async def create_sandbox( - self, - config: Optional[SandboxSettings] = None, - volume_bindings: Optional[Dict[str, str]] = None, - ) -> str: - """Creates a new sandbox instance. - - Args: - config: Sandbox configuration. - volume_bindings: Volume mapping configuration. - - Returns: - str: Sandbox ID. - - Raises: - RuntimeError: If max sandbox count reached or creation fails. - """ - async with self._global_lock: - if len(self._sandboxes) >= self.max_sandboxes: - raise RuntimeError( - f"Maximum number of sandboxes ({self.max_sandboxes}) reached" - ) - - config = config or SandboxSettings() - if not await self.ensure_image(config.image): - raise RuntimeError(f"Failed to ensure Docker image: {config.image}") - - sandbox_id = str(uuid.uuid4()) - try: - sandbox = DockerSandbox(config, volume_bindings) - await sandbox.create() - - self._sandboxes[sandbox_id] = sandbox - self._last_used[sandbox_id] = asyncio.get_event_loop().time() - self._locks[sandbox_id] = asyncio.Lock() - - logger.info(f"Created sandbox {sandbox_id}") - return sandbox_id - - except Exception as e: - logger.error(f"Failed to create sandbox: {e}") - if sandbox_id in self._sandboxes: - await self.delete_sandbox(sandbox_id) - raise RuntimeError(f"Failed to create sandbox: {e}") - - async def get_sandbox(self, sandbox_id: str) -> DockerSandbox: - """Gets a sandbox instance. - - Args: - sandbox_id: Sandbox ID. - - Returns: - DockerSandbox: Sandbox instance. - - Raises: - KeyError: If sandbox does not exist. 
- """ - async with self.sandbox_operation(sandbox_id) as sandbox: - return sandbox - - def start_cleanup_task(self) -> None: - """Starts automatic cleanup task.""" - - async def cleanup_loop(): - while not self._is_shutting_down: - try: - await self._cleanup_idle_sandboxes() - except Exception as e: - logger.error(f"Error in cleanup loop: {e}") - await asyncio.sleep(self.cleanup_interval) - - self._cleanup_task = asyncio.create_task(cleanup_loop()) - - async def _cleanup_idle_sandboxes(self) -> None: - """Cleans up idle sandboxes.""" - current_time = asyncio.get_event_loop().time() - to_cleanup = [] - - async with self._global_lock: - for sandbox_id, last_used in self._last_used.items(): - if ( - sandbox_id not in self._active_operations - and current_time - last_used > self.idle_timeout - ): - to_cleanup.append(sandbox_id) - - for sandbox_id in to_cleanup: - try: - await self.delete_sandbox(sandbox_id) - except Exception as e: - logger.error(f"Error cleaning up sandbox {sandbox_id}: {e}") - - async def cleanup(self) -> None: - """Cleans up all resources.""" - logger.info("Starting manager cleanup...") - self._is_shutting_down = True - - # Cancel cleanup task - if self._cleanup_task: - self._cleanup_task.cancel() - try: - await asyncio.wait_for(self._cleanup_task, timeout=1.0) - except (asyncio.CancelledError, asyncio.TimeoutError): - pass - - # Get all sandbox IDs to clean up - async with self._global_lock: - sandbox_ids = list(self._sandboxes.keys()) - - # Concurrently clean up all sandboxes - cleanup_tasks = [] - for sandbox_id in sandbox_ids: - task = asyncio.create_task(self._safe_delete_sandbox(sandbox_id)) - cleanup_tasks.append(task) - - if cleanup_tasks: - # Wait for all cleanup tasks to complete, with timeout to avoid infinite waiting - try: - await asyncio.wait(cleanup_tasks, timeout=30.0) - except asyncio.TimeoutError: - logger.error("Sandbox cleanup timed out") - - # Clean up remaining references - self._sandboxes.clear() - self._last_used.clear() - self._locks.clear() - self._active_operations.clear() - - logger.info("Manager cleanup completed") - - async def _safe_delete_sandbox(self, sandbox_id: str) -> None: - """Safely deletes a single sandbox. - - Args: - sandbox_id: Sandbox ID to delete. - """ - try: - if sandbox_id in self._active_operations: - logger.warning( - f"Sandbox {sandbox_id} has active operations, waiting for completion" - ) - for _ in range(10): # Wait at most 10 times - await asyncio.sleep(0.5) - if sandbox_id not in self._active_operations: - break - else: - logger.warning( - f"Timeout waiting for sandbox {sandbox_id} operations to complete" - ) - - # Get reference to sandbox object - sandbox = self._sandboxes.get(sandbox_id) - if sandbox: - await sandbox.cleanup() - - # Remove sandbox record from manager - async with self._global_lock: - self._sandboxes.pop(sandbox_id, None) - self._last_used.pop(sandbox_id, None) - self._locks.pop(sandbox_id, None) - logger.info(f"Deleted sandbox {sandbox_id}") - except Exception as e: - logger.error(f"Error during cleanup of sandbox {sandbox_id}: {e}") - - async def delete_sandbox(self, sandbox_id: str) -> None: - """Deletes specified sandbox. - - Args: - sandbox_id: Sandbox ID. 
- """ - if sandbox_id not in self._sandboxes: - return - - try: - await self._safe_delete_sandbox(sandbox_id) - except Exception as e: - logger.error(f"Failed to delete sandbox {sandbox_id}: {e}") - - async def __aenter__(self) -> "SandboxManager": - """Async context manager entry.""" - return self - - async def __aexit__(self, exc_type, exc_val, exc_tb) -> None: - """Async context manager exit.""" - await self.cleanup() - - def get_stats(self) -> Dict: - """Gets manager statistics. - - Returns: - Dict: Statistics information. - """ - return { - "total_sandboxes": len(self._sandboxes), - "active_operations": len(self._active_operations), - "max_sandboxes": self.max_sandboxes, - "idle_timeout": self.idle_timeout, - "cleanup_interval": self.cleanup_interval, - "is_shutting_down": self._is_shutting_down, - } diff --git a/openmanus_rl/agentgym/OpenManus/app/sandbox/core/sandbox.py b/openmanus_rl/agentgym/OpenManus/app/sandbox/core/sandbox.py deleted file mode 100644 index c57b3f23..00000000 --- a/openmanus_rl/agentgym/OpenManus/app/sandbox/core/sandbox.py +++ /dev/null @@ -1,462 +0,0 @@ -import asyncio -import io -import os -import tarfile -import tempfile -import uuid -from typing import Dict, Optional - -import docker -from docker.errors import NotFound -from docker.models.containers import Container - -from app.config import SandboxSettings -from app.sandbox.core.exceptions import SandboxTimeoutError -from app.sandbox.core.terminal import AsyncDockerizedTerminal - - -class DockerSandbox: - """Docker sandbox environment. - - Provides a containerized execution environment with resource limits, - file operations, and command execution capabilities. - - Attributes: - config: Sandbox configuration. - volume_bindings: Volume mapping configuration. - client: Docker client. - container: Docker container instance. - terminal: Container terminal interface. - """ - - def __init__( - self, - config: Optional[SandboxSettings] = None, - volume_bindings: Optional[Dict[str, str]] = None, - ): - """Initializes a sandbox instance. - - Args: - config: Sandbox configuration. Default configuration used if None. - volume_bindings: Volume mappings in {host_path: container_path} format. - """ - self.config = config or SandboxSettings() - self.volume_bindings = volume_bindings or {} - self.client = docker.from_env() - self.container: Optional[Container] = None - self.terminal: Optional[AsyncDockerizedTerminal] = None - - async def create(self) -> "DockerSandbox": - """Creates and starts the sandbox container. - - Returns: - Current sandbox instance. - - Raises: - docker.errors.APIError: If Docker API call fails. - RuntimeError: If container creation or startup fails. 
- """ - try: - # Prepare container config - host_config = self.client.api.create_host_config( - mem_limit=self.config.memory_limit, - cpu_period=100000, - cpu_quota=int(100000 * self.config.cpu_limit), - network_mode="none" if not self.config.network_enabled else "bridge", - binds=self._prepare_volume_bindings(), - ) - - # Generate unique container name with sandbox_ prefix - container_name = f"sandbox_{uuid.uuid4().hex[:8]}" - - # Create container - container = await asyncio.to_thread( - self.client.api.create_container, - image=self.config.image, - command="tail -f /dev/null", - hostname="sandbox", - working_dir=self.config.work_dir, - host_config=host_config, - name=container_name, - tty=True, - detach=True, - ) - - self.container = self.client.containers.get(container["Id"]) - - # Start container - await asyncio.to_thread(self.container.start) - - # Initialize terminal - self.terminal = AsyncDockerizedTerminal( - container["Id"], - self.config.work_dir, - env_vars={"PYTHONUNBUFFERED": "1"} - # Ensure Python output is not buffered - ) - await self.terminal.init() - - return self - - except Exception as e: - await self.cleanup() # Ensure resources are cleaned up - raise RuntimeError(f"Failed to create sandbox: {e}") from e - - def _prepare_volume_bindings(self) -> Dict[str, Dict[str, str]]: - """Prepares volume binding configuration. - - Returns: - Volume binding configuration dictionary. - """ - bindings = {} - - # Create and add working directory mapping - work_dir = self._ensure_host_dir(self.config.work_dir) - bindings[work_dir] = {"bind": self.config.work_dir, "mode": "rw"} - - # Add custom volume bindings - for host_path, container_path in self.volume_bindings.items(): - bindings[host_path] = {"bind": container_path, "mode": "rw"} - - return bindings - - @staticmethod - def _ensure_host_dir(path: str) -> str: - """Ensures directory exists on the host. - - Args: - path: Directory path. - - Returns: - Actual path on the host. - """ - host_path = os.path.join( - tempfile.gettempdir(), - f"sandbox_{os.path.basename(path)}_{os.urandom(4).hex()}", - ) - os.makedirs(host_path, exist_ok=True) - return host_path - - async def run_command(self, cmd: str, timeout: Optional[int] = None) -> str: - """Runs a command in the sandbox. - - Args: - cmd: Command to execute. - timeout: Timeout in seconds. - - Returns: - Command output as string. - - Raises: - RuntimeError: If sandbox not initialized or command execution fails. - TimeoutError: If command execution times out. - """ - if not self.terminal: - raise RuntimeError("Sandbox not initialized") - - try: - return await self.terminal.run_command( - cmd, timeout=timeout or self.config.timeout - ) - except TimeoutError: - raise SandboxTimeoutError( - f"Command execution timed out after {timeout or self.config.timeout} seconds" - ) - - async def read_file(self, path: str) -> str: - """Reads a file from the container. - - Args: - path: File path. - - Returns: - File contents as string. - - Raises: - FileNotFoundError: If file does not exist. - RuntimeError: If read operation fails. 
- """ - if not self.container: - raise RuntimeError("Sandbox not initialized") - - try: - # Get file archive - resolved_path = self._safe_resolve_path(path) - tar_stream, _ = await asyncio.to_thread( - self.container.get_archive, resolved_path - ) - - # Read file content from tar stream - content = await self._read_from_tar(tar_stream) - return content.decode("utf-8") - - except NotFound: - raise FileNotFoundError(f"File not found: {path}") - except Exception as e: - raise RuntimeError(f"Failed to read file: {e}") - - async def write_file(self, path: str, content: str) -> None: - """Writes content to a file in the container. - - Args: - path: Target path. - content: File content. - - Raises: - RuntimeError: If write operation fails. - """ - if not self.container: - raise RuntimeError("Sandbox not initialized") - - try: - resolved_path = self._safe_resolve_path(path) - parent_dir = os.path.dirname(resolved_path) - - # Create parent directory - if parent_dir: - await self.run_command(f"mkdir -p {parent_dir}") - - # Prepare file data - tar_stream = await self._create_tar_stream( - os.path.basename(path), content.encode("utf-8") - ) - - # Write file - await asyncio.to_thread( - self.container.put_archive, parent_dir or "/", tar_stream - ) - - except Exception as e: - raise RuntimeError(f"Failed to write file: {e}") - - def _safe_resolve_path(self, path: str) -> str: - """Safely resolves container path, preventing path traversal. - - Args: - path: Original path. - - Returns: - Resolved absolute path. - - Raises: - ValueError: If path contains potentially unsafe patterns. - """ - # Check for path traversal attempts - if ".." in path.split("/"): - raise ValueError("Path contains potentially unsafe patterns") - - resolved = ( - os.path.join(self.config.work_dir, path) - if not os.path.isabs(path) - else path - ) - return resolved - - async def copy_from(self, src_path: str, dst_path: str) -> None: - """Copies a file from the container. - - Args: - src_path: Source file path (container). - dst_path: Destination path (host). - - Raises: - FileNotFoundError: If source file does not exist. - RuntimeError: If copy operation fails. 
- """ - try: - # Ensure destination file's parent directory exists - parent_dir = os.path.dirname(dst_path) - if parent_dir: - os.makedirs(parent_dir, exist_ok=True) - - # Get file stream - resolved_src = self._safe_resolve_path(src_path) - stream, stat = await asyncio.to_thread( - self.container.get_archive, resolved_src - ) - - # Create temporary directory to extract file - with tempfile.TemporaryDirectory() as tmp_dir: - # Write stream to temporary file - tar_path = os.path.join(tmp_dir, "temp.tar") - with open(tar_path, "wb") as f: - for chunk in stream: - f.write(chunk) - - # Extract file - with tarfile.open(tar_path) as tar: - members = tar.getmembers() - if not members: - raise FileNotFoundError(f"Source file is empty: {src_path}") - - # If destination is a directory, we should preserve relative path structure - if os.path.isdir(dst_path): - tar.extractall(dst_path) - else: - # If destination is a file, we only extract the source file's content - if len(members) > 1: - raise RuntimeError( - f"Source path is a directory but destination is a file: {src_path}" - ) - - with open(dst_path, "wb") as dst: - src_file = tar.extractfile(members[0]) - if src_file is None: - raise RuntimeError( - f"Failed to extract file: {src_path}" - ) - dst.write(src_file.read()) - - except docker.errors.NotFound: - raise FileNotFoundError(f"Source file not found: {src_path}") - except Exception as e: - raise RuntimeError(f"Failed to copy file: {e}") - - async def copy_to(self, src_path: str, dst_path: str) -> None: - """Copies a file to the container. - - Args: - src_path: Source file path (host). - dst_path: Destination path (container). - - Raises: - FileNotFoundError: If source file does not exist. - RuntimeError: If copy operation fails. - """ - try: - if not os.path.exists(src_path): - raise FileNotFoundError(f"Source file not found: {src_path}") - - # Create destination directory in container - resolved_dst = self._safe_resolve_path(dst_path) - container_dir = os.path.dirname(resolved_dst) - if container_dir: - await self.run_command(f"mkdir -p {container_dir}") - - # Create tar file to upload - with tempfile.TemporaryDirectory() as tmp_dir: - tar_path = os.path.join(tmp_dir, "temp.tar") - with tarfile.open(tar_path, "w") as tar: - # Handle directory source path - if os.path.isdir(src_path): - os.path.basename(src_path.rstrip("/")) - for root, _, files in os.walk(src_path): - for file in files: - file_path = os.path.join(root, file) - arcname = os.path.join( - os.path.basename(dst_path), - os.path.relpath(file_path, src_path), - ) - tar.add(file_path, arcname=arcname) - else: - # Add single file to tar - tar.add(src_path, arcname=os.path.basename(dst_path)) - - # Read tar file content - with open(tar_path, "rb") as f: - data = f.read() - - # Upload to container - await asyncio.to_thread( - self.container.put_archive, - os.path.dirname(resolved_dst) or "/", - data, - ) - - # Verify file was created successfully - try: - await self.run_command(f"test -e {resolved_dst}") - except Exception: - raise RuntimeError(f"Failed to verify file creation: {dst_path}") - - except FileNotFoundError: - raise - except Exception as e: - raise RuntimeError(f"Failed to copy file: {e}") - - @staticmethod - async def _create_tar_stream(name: str, content: bytes) -> io.BytesIO: - """Creates a tar file stream. - - Args: - name: Filename. - content: File content. - - Returns: - Tar file stream. 
- """ - tar_stream = io.BytesIO() - with tarfile.open(fileobj=tar_stream, mode="w") as tar: - tarinfo = tarfile.TarInfo(name=name) - tarinfo.size = len(content) - tar.addfile(tarinfo, io.BytesIO(content)) - tar_stream.seek(0) - return tar_stream - - @staticmethod - async def _read_from_tar(tar_stream) -> bytes: - """Reads file content from a tar stream. - - Args: - tar_stream: Tar file stream. - - Returns: - File content. - - Raises: - RuntimeError: If read operation fails. - """ - with tempfile.NamedTemporaryFile() as tmp: - for chunk in tar_stream: - tmp.write(chunk) - tmp.seek(0) - - with tarfile.open(fileobj=tmp) as tar: - member = tar.next() - if not member: - raise RuntimeError("Empty tar archive") - - file_content = tar.extractfile(member) - if not file_content: - raise RuntimeError("Failed to extract file content") - - return file_content.read() - - async def cleanup(self) -> None: - """Cleans up sandbox resources.""" - errors = [] - try: - if self.terminal: - try: - await self.terminal.close() - except Exception as e: - errors.append(f"Terminal cleanup error: {e}") - finally: - self.terminal = None - - if self.container: - try: - await asyncio.to_thread(self.container.stop, timeout=5) - except Exception as e: - errors.append(f"Container stop error: {e}") - - try: - await asyncio.to_thread(self.container.remove, force=True) - except Exception as e: - errors.append(f"Container remove error: {e}") - finally: - self.container = None - - except Exception as e: - errors.append(f"General cleanup error: {e}") - - if errors: - print(f"Warning: Errors during cleanup: {', '.join(errors)}") - - async def __aenter__(self) -> "DockerSandbox": - """Async context manager entry.""" - return await self.create() - - async def __aexit__(self, exc_type, exc_val, exc_tb) -> None: - """Async context manager exit.""" - await self.cleanup() diff --git a/openmanus_rl/agentgym/OpenManus/app/sandbox/core/terminal.py b/openmanus_rl/agentgym/OpenManus/app/sandbox/core/terminal.py deleted file mode 100644 index aee51844..00000000 --- a/openmanus_rl/agentgym/OpenManus/app/sandbox/core/terminal.py +++ /dev/null @@ -1,346 +0,0 @@ -""" -Asynchronous Docker Terminal - -This module provides asynchronous terminal functionality for Docker containers, -allowing interactive command execution with timeout control. -""" - -import asyncio -import re -import socket -from typing import Dict, Optional, Tuple, Union - -import docker -from docker import APIClient -from docker.errors import APIError -from docker.models.containers import Container - - -class DockerSession: - def __init__(self, container_id: str) -> None: - """Initializes a Docker session. - - Args: - container_id: ID of the Docker container. - """ - self.api = APIClient() - self.container_id = container_id - self.exec_id = None - self.socket = None - - async def create(self, working_dir: str, env_vars: Dict[str, str]) -> None: - """Creates an interactive session with the container. - - Args: - working_dir: Working directory inside the container. - env_vars: Environment variables to set. - - Raises: - RuntimeError: If socket connection fails. 
- """ - startup_command = [ - "bash", - "-c", - f"cd {working_dir} && " - "PROMPT_COMMAND='' " - "PS1='$ ' " - "exec bash --norc --noprofile", - ] - - exec_data = self.api.exec_create( - self.container_id, - startup_command, - stdin=True, - tty=True, - stdout=True, - stderr=True, - privileged=True, - user="root", - environment={**env_vars, "TERM": "dumb", "PS1": "$ ", "PROMPT_COMMAND": ""}, - ) - self.exec_id = exec_data["Id"] - - socket_data = self.api.exec_start( - self.exec_id, socket=True, tty=True, stream=True, demux=True - ) - - if hasattr(socket_data, "_sock"): - self.socket = socket_data._sock - self.socket.setblocking(False) - else: - raise RuntimeError("Failed to get socket connection") - - await self._read_until_prompt() - - async def close(self) -> None: - """Cleans up session resources. - - 1. Sends exit command - 2. Closes socket connection - 3. Checks and cleans up exec instance - """ - try: - if self.socket: - # Send exit command to close bash session - try: - self.socket.sendall(b"exit\n") - # Allow time for command execution - await asyncio.sleep(0.1) - except: - pass # Ignore sending errors, continue cleanup - - # Close socket connection - try: - self.socket.shutdown(socket.SHUT_RDWR) - except: - pass # Some platforms may not support shutdown - - self.socket.close() - self.socket = None - - if self.exec_id: - try: - # Check exec instance status - exec_inspect = self.api.exec_inspect(self.exec_id) - if exec_inspect.get("Running", False): - # If still running, wait for it to complete - await asyncio.sleep(0.5) - except: - pass # Ignore inspection errors, continue cleanup - - self.exec_id = None - - except Exception as e: - # Log error but don't raise, ensure cleanup continues - print(f"Warning: Error during session cleanup: {e}") - - async def _read_until_prompt(self) -> str: - """Reads output until prompt is found. - - Returns: - String containing output up to the prompt. - - Raises: - socket.error: If socket communication fails. - """ - buffer = b"" - while b"$ " not in buffer: - try: - chunk = self.socket.recv(4096) - if chunk: - buffer += chunk - except socket.error as e: - if e.errno == socket.EWOULDBLOCK: - await asyncio.sleep(0.1) - continue - raise - return buffer.decode("utf-8") - - async def execute(self, command: str, timeout: Optional[int] = None) -> str: - """Executes a command and returns cleaned output. - - Args: - command: Shell command to execute. - timeout: Maximum execution time in seconds. - - Returns: - Command output as string with prompt markers removed. - - Raises: - RuntimeError: If session not initialized or execution fails. - TimeoutError: If command execution exceeds timeout. - """ - if not self.socket: - raise RuntimeError("Session not initialized") - - try: - # Sanitize command to prevent shell injection - sanitized_command = self._sanitize_command(command) - full_command = f"{sanitized_command}\necho $?\n" - self.socket.sendall(full_command.encode()) - - async def read_output() -> str: - buffer = b"" - result_lines = [] - command_sent = False - - while True: - try: - chunk = self.socket.recv(4096) - if not chunk: - break - - buffer += chunk - lines = buffer.split(b"\n") - - buffer = lines[-1] - lines = lines[:-1] - - for line in lines: - line = line.rstrip(b"\r") - - if not command_sent: - command_sent = True - continue - - if line.strip() == b"echo $?" 
or line.strip().isdigit(): - continue - - if line.strip(): - result_lines.append(line) - - if buffer.endswith(b"$ "): - break - - except socket.error as e: - if e.errno == socket.EWOULDBLOCK: - await asyncio.sleep(0.1) - continue - raise - - output = b"\n".join(result_lines).decode("utf-8") - output = re.sub(r"\n\$ echo \$\$?.*$", "", output) - - return output - - if timeout: - result = await asyncio.wait_for(read_output(), timeout) - else: - result = await read_output() - - return result.strip() - - except asyncio.TimeoutError: - raise TimeoutError(f"Command execution timed out after {timeout} seconds") - except Exception as e: - raise RuntimeError(f"Failed to execute command: {e}") - - def _sanitize_command(self, command: str) -> str: - """Sanitizes the command string to prevent shell injection. - - Args: - command: Raw command string. - - Returns: - Sanitized command string. - - Raises: - ValueError: If command contains potentially dangerous patterns. - """ - - # Additional checks for specific risky commands - risky_commands = [ - "rm -rf /", - "rm -rf /*", - "mkfs", - "dd if=/dev/zero", - ":(){:|:&};:", - "chmod -R 777 /", - "chown -R", - ] - - for risky in risky_commands: - if risky in command.lower(): - raise ValueError( - f"Command contains potentially dangerous operation: {risky}" - ) - - return command - - -class AsyncDockerizedTerminal: - def __init__( - self, - container: Union[str, Container], - working_dir: str = "/workspace", - env_vars: Optional[Dict[str, str]] = None, - default_timeout: int = 60, - ) -> None: - """Initializes an asynchronous terminal for Docker containers. - - Args: - container: Docker container ID or Container object. - working_dir: Working directory inside the container. - env_vars: Environment variables to set. - default_timeout: Default command execution timeout in seconds. - """ - self.client = docker.from_env() - self.container = ( - container - if isinstance(container, Container) - else self.client.containers.get(container) - ) - self.working_dir = working_dir - self.env_vars = env_vars or {} - self.default_timeout = default_timeout - self.session = None - - async def init(self) -> None: - """Initializes the terminal environment. - - Ensures working directory exists and creates an interactive session. - - Raises: - RuntimeError: If initialization fails. - """ - await self._ensure_workdir() - - self.session = DockerSession(self.container.id) - await self.session.create(self.working_dir, self.env_vars) - - async def _ensure_workdir(self) -> None: - """Ensures working directory exists in container. - - Raises: - RuntimeError: If directory creation fails. - """ - try: - await self._exec_simple(f"mkdir -p {self.working_dir}") - except APIError as e: - raise RuntimeError(f"Failed to create working directory: {e}") - - async def _exec_simple(self, cmd: str) -> Tuple[int, str]: - """Executes a simple command using Docker's exec_run. - - Args: - cmd: Command to execute. - - Returns: - Tuple of (exit_code, output). - """ - result = await asyncio.to_thread( - self.container.exec_run, cmd, environment=self.env_vars - ) - return result.exit_code, result.output.decode("utf-8") - - async def run_command(self, cmd: str, timeout: Optional[int] = None) -> str: - """Runs a command in the container with timeout. - - Args: - cmd: Shell command to execute. - timeout: Maximum execution time in seconds. - - Returns: - Command output as string. - - Raises: - RuntimeError: If terminal not initialized. 
- """ - if not self.session: - raise RuntimeError("Terminal not initialized") - - return await self.session.execute(cmd, timeout=timeout or self.default_timeout) - - async def close(self) -> None: - """Closes the terminal session.""" - if self.session: - await self.session.close() - - async def __aenter__(self) -> "AsyncDockerizedTerminal": - """Async context manager entry.""" - await self.init() - return self - - async def __aexit__(self, exc_type, exc_val, exc_tb) -> None: - """Async context manager exit.""" - await self.close() diff --git a/openmanus_rl/agentgym/OpenManus/app/schema.py b/openmanus_rl/agentgym/OpenManus/app/schema.py deleted file mode 100644 index de18c4fd..00000000 --- a/openmanus_rl/agentgym/OpenManus/app/schema.py +++ /dev/null @@ -1,184 +0,0 @@ -from enum import Enum -from typing import Any, List, Literal, Optional, Union - -from pydantic import BaseModel, Field - - -class Role(str, Enum): - """Message role options""" - - SYSTEM = "system" - USER = "user" - ASSISTANT = "assistant" - TOOL = "tool" - - -ROLE_VALUES = tuple(role.value for role in Role) -ROLE_TYPE = Literal[ROLE_VALUES] # type: ignore - - -class ToolChoice(str, Enum): - """Tool choice options""" - - NONE = "none" - AUTO = "auto" - REQUIRED = "required" - - -TOOL_CHOICE_VALUES = tuple(choice.value for choice in ToolChoice) -TOOL_CHOICE_TYPE = Literal[TOOL_CHOICE_VALUES] # type: ignore - - -class AgentState(str, Enum): - """Agent execution states""" - - IDLE = "IDLE" - RUNNING = "RUNNING" - FINISHED = "FINISHED" - ERROR = "ERROR" - - -class Function(BaseModel): - name: str - arguments: str - - -class ToolCall(BaseModel): - """Represents a tool/function call in a message""" - - id: str - type: str = "function" - function: Function - - -class Message(BaseModel): - """Represents a chat message in the conversation""" - - role: ROLE_TYPE = Field(...) 
# type: ignore
- content: Optional[str] = Field(default=None)
- tool_calls: Optional[List[ToolCall]] = Field(default=None)
- name: Optional[str] = Field(default=None)
- tool_call_id: Optional[str] = Field(default=None)
- base64_image: Optional[str] = Field(default=None)
-
- def __add__(self, other) -> List["Message"]:
- """Support Message + list and Message + Message operations"""
- if isinstance(other, list):
- return [self] + other
- elif isinstance(other, Message):
- return [self, other]
- else:
- raise TypeError(
- f"unsupported operand type(s) for +: '{type(self).__name__}' and '{type(other).__name__}'"
- )
-
- def __radd__(self, other) -> List["Message"]:
- """Support list + Message operations"""
- if isinstance(other, list):
- return other + [self]
- else:
- raise TypeError(
- f"unsupported operand type(s) for +: '{type(other).__name__}' and '{type(self).__name__}'"
- )
-
- def to_dict(self) -> dict:
- """Convert message to dictionary format"""
- message = {"role": self.role}
- if self.content is not None:
- message["content"] = self.content
- if self.tool_calls is not None:
- message["tool_calls"] = [tool_call.dict() for tool_call in self.tool_calls]
- if self.name is not None:
- message["name"] = self.name
- if self.tool_call_id is not None:
- message["tool_call_id"] = self.tool_call_id
- if self.base64_image is not None:
- message["base64_image"] = self.base64_image
- return message
-
- @classmethod
- def user_message(
- cls, content: str, base64_image: Optional[str] = None
- ) -> "Message":
- """Create a user message"""
- return cls(role=Role.USER, content=content, base64_image=base64_image)
-
- @classmethod
- def system_message(cls, content: str) -> "Message":
- """Create a system message"""
- return cls(role=Role.SYSTEM, content=content)
-
- @classmethod
- def assistant_message(
- cls, content: Optional[str] = None, base64_image: Optional[str] = None
- ) -> "Message":
- """Create an assistant message"""
- return cls(role=Role.ASSISTANT, content=content, base64_image=base64_image)
-
- @classmethod
- def tool_message(
- cls, content: str, name, tool_call_id: str, base64_image: Optional[str] = None
- ) -> "Message":
- """Create a tool message"""
- return cls(
- role=Role.TOOL,
- content=content,
- name=name,
- tool_call_id=tool_call_id,
- base64_image=base64_image,
- )
-
- @classmethod
- def from_tool_calls(
- cls,
- tool_calls: List[Any],
- content: Union[str, List[str]] = "",
- base64_image: Optional[str] = None,
- **kwargs,
- ):
- """Create an assistant Message from raw tool calls.
- - Args: - tool_calls: Raw tool calls from LLM - content: Optional message content - base64_image: Optional base64 encoded image - """ - formatted_calls = [ - {"id": call.id, "function": call.function.model_dump(), "type": "function"} - for call in tool_calls - ] - return cls( - role=Role.ASSISTANT, - content=content, - tool_calls=formatted_calls, - base64_image=base64_image, - **kwargs, - ) - - -class Memory(BaseModel): - messages: List[Message] = Field(default_factory=list) - max_messages: int = Field(default=100) - - def add_message(self, message: Message) -> None: - """Add a message to memory""" - self.messages.append(message) - # Optional: Implement message limit - if len(self.messages) > self.max_messages: - self.messages = self.messages[-self.max_messages :] - - def add_messages(self, messages: List[Message]) -> None: - """Add multiple messages to memory""" - self.messages.extend(messages) - - def clear(self) -> None: - """Clear all messages""" - self.messages.clear() - - def get_recent_messages(self, n: int) -> List[Message]: - """Get n most recent messages""" - return self.messages[-n:] - - def to_dict_list(self) -> List[dict]: - """Convert messages to list of dicts""" - return [msg.to_dict() for msg in self.messages] diff --git a/openmanus_rl/agentgym/OpenManus/app/tool/__init__.py b/openmanus_rl/agentgym/OpenManus/app/tool/__init__.py deleted file mode 100644 index 6fbd1bc7..00000000 --- a/openmanus_rl/agentgym/OpenManus/app/tool/__init__.py +++ /dev/null @@ -1,20 +0,0 @@ -from app.tool.base import BaseTool -from app.tool.bash import Bash -from app.tool.browser_use_tool import BrowserUseTool -from app.tool.create_chat_completion import CreateChatCompletion -from app.tool.planning import PlanningTool -from app.tool.str_replace_editor import StrReplaceEditor -from app.tool.terminate import Terminate -from app.tool.tool_collection import ToolCollection - - -__all__ = [ - "BaseTool", - "Bash", - "BrowserUseTool", - "Terminate", - "StrReplaceEditor", - "ToolCollection", - "CreateChatCompletion", - "PlanningTool", -] diff --git a/openmanus_rl/agentgym/OpenManus/app/tool/base.py b/openmanus_rl/agentgym/OpenManus/app/tool/base.py deleted file mode 100644 index ba4084db..00000000 --- a/openmanus_rl/agentgym/OpenManus/app/tool/base.py +++ /dev/null @@ -1,80 +0,0 @@ -from abc import ABC, abstractmethod -from typing import Any, Dict, Optional - -from pydantic import BaseModel, Field - - -class BaseTool(ABC, BaseModel): - name: str - description: str - parameters: Optional[dict] = None - - class Config: - arbitrary_types_allowed = True - - async def __call__(self, **kwargs) -> Any: - """Execute the tool with given parameters.""" - return await self.execute(**kwargs) - - @abstractmethod - async def execute(self, **kwargs) -> Any: - """Execute the tool with given parameters.""" - - def to_param(self) -> Dict: - """Convert tool to function call format.""" - return { - "type": "function", - "function": { - "name": self.name, - "description": self.description, - "parameters": self.parameters, - }, - } - - -class ToolResult(BaseModel): - """Represents the result of a tool execution.""" - - output: Any = Field(default=None) - error: Optional[str] = Field(default=None) - base64_image: Optional[str] = Field(default=None) - system: Optional[str] = Field(default=None) - - class Config: - arbitrary_types_allowed = True - - def __bool__(self): - return any(getattr(self, field) for field in self.__fields__) - - def __add__(self, other: "ToolResult"): - def combine_fields( - field: Optional[str], 
other_field: Optional[str], concatenate: bool = True
- ):
- if field and other_field:
- if concatenate:
- return field + other_field
- raise ValueError("Cannot combine tool results")
- return field or other_field
-
- return ToolResult(
- output=combine_fields(self.output, other.output),
- error=combine_fields(self.error, other.error),
- base64_image=combine_fields(self.base64_image, other.base64_image, False),
- system=combine_fields(self.system, other.system),
- )
-
- def __str__(self):
- return f"Error: {self.error}" if self.error else self.output
-
- def replace(self, **kwargs):
- """Returns a new ToolResult with the given fields replaced."""
- # return self.copy(update=kwargs)
- return type(self)(**{**self.dict(), **kwargs})
-
-
-class CLIResult(ToolResult):
- """A ToolResult that can be rendered as a CLI output."""
-
-
-class ToolFailure(ToolResult):
- """A ToolResult that represents a failure."""
diff --git a/openmanus_rl/agentgym/OpenManus/app/tool/bash.py b/openmanus_rl/agentgym/OpenManus/app/tool/bash.py
deleted file mode 100644
index c6b9072f..00000000
--- a/openmanus_rl/agentgym/OpenManus/app/tool/bash.py
+++ /dev/null
@@ -1,158 +0,0 @@
-import asyncio
-import os
-from typing import Optional
-
-from app.exceptions import ToolError
-from app.tool.base import BaseTool, CLIResult
-
-
-_BASH_DESCRIPTION = """Execute a bash command in the terminal.
-* Long-running commands: Commands that may run indefinitely should be run in the background with their output redirected to a file, e.g. command = `python3 app.py > server.log 2>&1 &`.
-* Interactive: If a bash command returns exit code `-1`, this means the process is not yet finished. The assistant must then send a second call to terminal with an empty `command` (which will retrieve any additional logs), or it can send additional text (set `command` to the text) to STDIN of the running process, or it can send command=`ctrl+c` to interrupt the process.
-* Timeout: If a command execution result says "Command timed out. Sending SIGINT to the process", the assistant should retry running the command in the background.
-""" - - -class _BashSession: - """A session of a bash shell.""" - - _started: bool - _process: asyncio.subprocess.Process - - command: str = "/bin/bash" - _output_delay: float = 0.2 # seconds - _timeout: float = 120.0 # seconds - _sentinel: str = "<>" - - def __init__(self): - self._started = False - self._timed_out = False - - async def start(self): - if self._started: - return - - self._process = await asyncio.create_subprocess_shell( - self.command, - preexec_fn=os.setsid, - shell=True, - bufsize=0, - stdin=asyncio.subprocess.PIPE, - stdout=asyncio.subprocess.PIPE, - stderr=asyncio.subprocess.PIPE, - ) - - self._started = True - - def stop(self): - """Terminate the bash shell.""" - if not self._started: - raise ToolError("Session has not started.") - if self._process.returncode is not None: - return - self._process.terminate() - - async def run(self, command: str): - """Execute a command in the bash shell.""" - if not self._started: - raise ToolError("Session has not started.") - if self._process.returncode is not None: - return CLIResult( - system="tool must be restarted", - error=f"bash has exited with returncode {self._process.returncode}", - ) - if self._timed_out: - raise ToolError( - f"timed out: bash has not returned in {self._timeout} seconds and must be restarted", - ) - - # we know these are not None because we created the process with PIPEs - assert self._process.stdin - assert self._process.stdout - assert self._process.stderr - - # send command to the process - self._process.stdin.write( - command.encode() + f"; echo '{self._sentinel}'\n".encode() - ) - await self._process.stdin.drain() - - # read output from the process, until the sentinel is found - try: - async with asyncio.timeout(self._timeout): - while True: - await asyncio.sleep(self._output_delay) - # if we read directly from stdout/stderr, it will wait forever for - # EOF. use the StreamReader buffer directly instead. - output = ( - self._process.stdout._buffer.decode() - ) # pyright: ignore[reportAttributeAccessIssue] - if self._sentinel in output: - # strip the sentinel and break - output = output[: output.index(self._sentinel)] - break - except asyncio.TimeoutError: - self._timed_out = True - raise ToolError( - f"timed out: bash has not returned in {self._timeout} seconds and must be restarted", - ) from None - - if output.endswith("\n"): - output = output[:-1] - - error = ( - self._process.stderr._buffer.decode() - ) # pyright: ignore[reportAttributeAccessIssue] - if error.endswith("\n"): - error = error[:-1] - - # clear the buffers so that the next output can be read correctly - self._process.stdout._buffer.clear() # pyright: ignore[reportAttributeAccessIssue] - self._process.stderr._buffer.clear() # pyright: ignore[reportAttributeAccessIssue] - - return CLIResult(output=output, error=error) - - -class Bash(BaseTool): - """A tool for executing bash commands""" - - name: str = "bash" - description: str = _BASH_DESCRIPTION - parameters: dict = { - "type": "object", - "properties": { - "command": { - "type": "string", - "description": "The bash command to execute. Can be empty to view additional logs when previous exit code is `-1`. 
Can be `ctrl+c` to interrupt the currently running process.",
- },
- },
- "required": ["command"],
- }
-
- _session: Optional[_BashSession] = None
-
- async def execute(
- self, command: str | None = None, restart: bool = False, **kwargs
- ) -> CLIResult:
- if restart:
- if self._session:
- self._session.stop()
- self._session = _BashSession()
- await self._session.start()
-
- return CLIResult(system="tool has been restarted.")
-
- if self._session is None:
- self._session = _BashSession()
- await self._session.start()
-
- if command is not None:
- return await self._session.run(command)
-
- raise ToolError("no command provided.")
-
-
-if __name__ == "__main__":
- bash = Bash()
- rst = asyncio.run(bash.execute("ls -l"))
- print(rst)
diff --git a/openmanus_rl/agentgym/OpenManus/app/tool/browser_use_tool.py b/openmanus_rl/agentgym/OpenManus/app/tool/browser_use_tool.py
deleted file mode 100644
index 0158e075..00000000
--- a/openmanus_rl/agentgym/OpenManus/app/tool/browser_use_tool.py
+++ /dev/null
@@ -1,621 +0,0 @@
-import asyncio
-import base64
-import json
-from typing import Generic, Optional, TypeVar
-
-from browser_use import Browser as BrowserUseBrowser
-from browser_use import BrowserConfig
-from browser_use.browser.context import BrowserContext, BrowserContextConfig
-from browser_use.dom.service import DomService
-from pydantic import Field, field_validator
-from pydantic_core.core_schema import ValidationInfo
-
-from app.config import config
-from app.llm import LLM
-from app.tool.base import BaseTool, ToolResult
-from app.tool.web_search import WebSearch
-
-
-_BROWSER_DESCRIPTION = """
-Interact with a web browser to perform various actions such as navigation, element interaction, content extraction, and tab management. This tool provides a comprehensive set of browser automation capabilities:
-
-Navigation:
-- 'go_to_url': Go to a specific URL in the current tab
-- 'go_back': Go back
-- 'refresh': Refresh the current page
-- 'web_search': Search the query in the current tab. The query should be concrete and concise, like a human web search, not vague or overly long; prefer the single most important item.
-
-Element Interaction:
-- 'click_element': Click an element by index
-- 'input_text': Input text into a form element
-- 'scroll_down'/'scroll_up': Scroll the page (with optional pixel amount)
-- 'scroll_to_text': If you don't find an element you want to interact with, scroll to it
-- 'send_keys': Send special keys such as Escape, Backspace, Insert, PageDown, Delete, or Enter; shortcuts such as `Control+o` and `Control+Shift+T` are supported as well. This is passed to keyboard.press.
-- 'get_dropdown_options': Get all options from a dropdown
-- 'select_dropdown_option': Select a dropdown option for an interactive element index by the text of the option you want to select
-
-Content Extraction:
-- 'extract_content': Extract page content to retrieve specific information from the page, e.g. all company names, a specific description, all information about a topic, links with companies in structured format, or simply links
-
-Tab Management:
-- 'switch_tab': Switch to a specific tab
-- 'open_tab': Open a new tab with a URL
-- 'close_tab': Close the current tab
-
-Utility:
-- 'wait': Wait for a specified number of seconds
-"""
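-# Minimal usage sketch (illustrative only; values are hypothetical):
-#
-#     tool = BrowserUseTool()
-#     nav = await tool.execute(action="go_to_url", url="https://example.com")
-#     click = await tool.execute(action="click_element", index=0)
-#     await tool.cleanup()
-#
-# Each call returns a ToolResult whose `output` or `error` field records the
-# outcome; the "dependencies" mapping declared below lists the extra arguments
-# each action requires.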
-
-Context = TypeVar("Context")
-
-
-class BrowserUseTool(BaseTool, Generic[Context]):
- name: str = "browser_use"
- description: str = _BROWSER_DESCRIPTION
- parameters: dict = {
- "type": "object",
- "properties": {
- "action": {
- "type": "string",
- "enum": [
- "go_to_url",
- "click_element",
- "input_text",
- "scroll_down",
- "scroll_up",
- "scroll_to_text",
- "send_keys",
- "get_dropdown_options",
- "select_dropdown_option",
- "go_back",
- "web_search",
- "wait",
- "extract_content",
- "switch_tab",
- "open_tab",
- "close_tab",
- ],
- "description": "The browser action to perform",
- },
- "url": {
- "type": "string",
- "description": "URL for 'go_to_url' or 'open_tab' actions",
- },
- "index": {
- "type": "integer",
- "description": "Element index for 'click_element', 'input_text', 'get_dropdown_options', or 'select_dropdown_option' actions",
- },
- "text": {
- "type": "string",
- "description": "Text for 'input_text', 'scroll_to_text', or 'select_dropdown_option' actions",
- },
- "scroll_amount": {
- "type": "integer",
- "description": "Pixels to scroll (positive for down, negative for up) for 'scroll_down' or 'scroll_up' actions",
- },
- "tab_id": {
- "type": "integer",
- "description": "Tab ID for 'switch_tab' action",
- },
- "query": {
- "type": "string",
- "description": "Search query for 'web_search' action",
- },
- "goal": {
- "type": "string",
- "description": "Extraction goal for 'extract_content' action",
- },
- "keys": {
- "type": "string",
- "description": "Keys to send for 'send_keys' action",
- },
- "seconds": {
- "type": "integer",
- "description": "Seconds to wait for 'wait' action",
- },
- },
- "required": ["action"],
- "dependencies": {
- "go_to_url": ["url"],
- "click_element": ["index"],
- "input_text": ["index", "text"],
- "switch_tab": ["tab_id"],
- "open_tab": ["url"],
- "scroll_down": ["scroll_amount"],
- "scroll_up": ["scroll_amount"],
- "scroll_to_text": ["text"],
- "send_keys": ["keys"],
- "get_dropdown_options": ["index"],
- "select_dropdown_option": ["index", "text"],
- "go_back": [],
- "web_search": ["query"],
- "wait": ["seconds"],
- "extract_content": ["goal"],
- },
- }
-
- lock: asyncio.Lock = Field(default_factory=asyncio.Lock)
- browser: Optional[BrowserUseBrowser] = Field(default=None, exclude=True)
- context: Optional[BrowserContext] = Field(default=None, exclude=True)
- dom_service: Optional[DomService] = Field(default=None, exclude=True)
- web_search_tool: WebSearch = Field(default_factory=WebSearch, exclude=True)
-
- # Context for generic functionality
- tool_context: Optional[Context] = Field(default=None, exclude=True)
-
- llm: Optional[LLM] = Field(default_factory=LLM)
-
- @field_validator("parameters", mode="before")
- def validate_parameters(cls, v: dict, info: ValidationInfo) -> dict:
- if not v:
- raise ValueError("Parameters cannot be empty")
- return v
-
- async def _ensure_browser_initialized(self) -> BrowserContext:
- """Ensure browser and context are initialized."""
- if self.browser is None:
- browser_config_kwargs = {"headless": False, "disable_security": True}
-
- if config.browser_config:
- from browser_use.browser.browser import ProxySettings
-
- # handle proxy settings.
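- # For illustration, a hypothetical config entry such as
- #   [browser.proxy]
- #   server = "http://127.0.0.1:7890"
- # (username/password optional) would be forwarded to browser-use as a
- # ProxySettings object here; only values actually set on the config
- # object are copied into the browser kwargs.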
- if config.browser_config.proxy and config.browser_config.proxy.server: - browser_config_kwargs["proxy"] = ProxySettings( - server=config.browser_config.proxy.server, - username=config.browser_config.proxy.username, - password=config.browser_config.proxy.password, - ) - - browser_attrs = [ - "headless", - "disable_security", - "extra_chromium_args", - "chrome_instance_path", - "wss_url", - "cdp_url", - ] - - for attr in browser_attrs: - value = getattr(config.browser_config, attr, None) - if value is not None: - if not isinstance(value, list) or value: - browser_config_kwargs[attr] = value - - self.browser = BrowserUseBrowser(BrowserConfig(**browser_config_kwargs)) - - if self.context is None: - context_config = BrowserContextConfig() - - # if there is context config in the config, use it. - if ( - config.browser_config - and hasattr(config.browser_config, "new_context_config") - and config.browser_config.new_context_config - ): - context_config = config.browser_config.new_context_config - - self.context = await self.browser.new_context(context_config) - self.dom_service = DomService(await self.context.get_current_page()) - - return self.context - - async def execute( - self, - action: str, - url: Optional[str] = None, - index: Optional[int] = None, - text: Optional[str] = None, - scroll_amount: Optional[int] = None, - tab_id: Optional[int] = None, - query: Optional[str] = None, - goal: Optional[str] = None, - keys: Optional[str] = None, - seconds: Optional[int] = None, - **kwargs, - ) -> ToolResult: - """ - Execute a specified browser action. - - Args: - action: The browser action to perform - url: URL for navigation or new tab - index: Element index for click or input actions - text: Text for input action or search query - scroll_amount: Pixels to scroll for scroll action - tab_id: Tab ID for switch_tab action - query: Search query for Google search - goal: Extraction goal for content extraction - keys: Keys to send for keyboard actions - seconds: Seconds to wait - **kwargs: Additional arguments - - Returns: - ToolResult with the action's output or error - """ - async with self.lock: - try: - context = await self._ensure_browser_initialized() - - # Get max content length from config - max_content_length = getattr( - config.browser_config, "max_content_length", 2000 - ) - - # Navigation actions - if action == "go_to_url": - if not url: - return ToolResult( - error="URL is required for 'go_to_url' action" - ) - page = await context.get_current_page() - await page.goto(url) - await page.wait_for_load_state() - return ToolResult(output=f"Navigated to {url}") - - elif action == "go_back": - await context.go_back() - return ToolResult(output="Navigated back") - - elif action == "refresh": - await context.refresh_page() - return ToolResult(output="Refreshed current page") - - elif action == "web_search": - if not query: - return ToolResult( - error="Query is required for 'web_search' action" - ) - search_results = await self.web_search_tool.execute(query) - - if search_results: - # Navigate to the first search result - first_result = search_results[0] - if isinstance(first_result, dict) and "url" in first_result: - url_to_navigate = first_result["url"] - elif isinstance(first_result, str): - url_to_navigate = first_result - else: - return ToolResult( - error=f"Invalid search result format: {first_result}" - ) - - page = await context.get_current_page() - await page.goto(url_to_navigate) - await page.wait_for_load_state() - - return ToolResult( - output=f"Searched for '{query}' and navigated to 
first result: {url_to_navigate}\nAll results:" - + "\n".join([str(r) for r in search_results]) - ) - else: - return ToolResult( - error=f"No search results found for '{query}'" - ) - - # Element interaction actions - elif action == "click_element": - if index is None: - return ToolResult( - error="Index is required for 'click_element' action" - ) - element = await context.get_dom_element_by_index(index) - if not element: - return ToolResult(error=f"Element with index {index} not found") - download_path = await context._click_element_node(element) - output = f"Clicked element at index {index}" - if download_path: - output += f" - Downloaded file to {download_path}" - return ToolResult(output=output) - - elif action == "input_text": - if index is None or not text: - return ToolResult( - error="Index and text are required for 'input_text' action" - ) - element = await context.get_dom_element_by_index(index) - if not element: - return ToolResult(error=f"Element with index {index} not found") - await context._input_text_element_node(element, text) - return ToolResult( - output=f"Input '{text}' into element at index {index}" - ) - - elif action == "scroll_down" or action == "scroll_up": - direction = 1 if action == "scroll_down" else -1 - amount = ( - scroll_amount - if scroll_amount is not None - else context.config.browser_window_size["height"] - ) - await context.execute_javascript( - f"window.scrollBy(0, {direction * amount});" - ) - return ToolResult( - output=f"Scrolled {'down' if direction > 0 else 'up'} by {amount} pixels" - ) - - elif action == "scroll_to_text": - if not text: - return ToolResult( - error="Text is required for 'scroll_to_text' action" - ) - page = await context.get_current_page() - try: - locator = page.get_by_text(text, exact=False) - await locator.scroll_into_view_if_needed() - return ToolResult(output=f"Scrolled to text: '{text}'") - except Exception as e: - return ToolResult(error=f"Failed to scroll to text: {str(e)}") - - elif action == "send_keys": - if not keys: - return ToolResult( - error="Keys are required for 'send_keys' action" - ) - page = await context.get_current_page() - await page.keyboard.press(keys) - return ToolResult(output=f"Sent keys: {keys}") - - elif action == "get_dropdown_options": - if index is None: - return ToolResult( - error="Index is required for 'get_dropdown_options' action" - ) - element = await context.get_dom_element_by_index(index) - if not element: - return ToolResult(error=f"Element with index {index} not found") - page = await context.get_current_page() - options = await page.evaluate( - """ - (xpath) => { - const select = document.evaluate(xpath, document, null, - XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue; - if (!select) return null; - return Array.from(select.options).map(opt => ({ - text: opt.text, - value: opt.value, - index: opt.index - })); - } - """, - element.xpath, - ) - return ToolResult(output=f"Dropdown options: {options}") - - elif action == "select_dropdown_option": - if index is None or not text: - return ToolResult( - error="Index and text are required for 'select_dropdown_option' action" - ) - element = await context.get_dom_element_by_index(index) - if not element: - return ToolResult(error=f"Element with index {index} not found") - page = await context.get_current_page() - await page.select_option(element.xpath, label=text) - return ToolResult( - output=f"Selected option '{text}' from dropdown at index {index}" - ) - - # Content extraction actions - elif action == "extract_content": - if not 
goal:
-                    return ToolResult(
-                        error="Goal is required for 'extract_content' action"
-                    )
-                page = await context.get_current_page()
-                try:
-                    # Get page content and convert to markdown for better processing
-                    html_content = await page.content()
-
-                    # Import markdownify here to avoid global import
-                    try:
-                        import markdownify
-
-                        content = markdownify.markdownify(html_content)
-                    except ImportError:
-                        # Fallback if markdownify is not available
-                        content = html_content
-
-                    # Create prompt for LLM
-                    prompt_text = """
-Your task is to extract the content of the page. You will be given a page and a goal, and you should extract all relevant information around this goal from the page. If the goal is vague, summarize the page. Respond in json format.
-Extraction goal: {goal}
-
-Page content:
-{page}
-"""
-                    # Format the prompt with the goal and content
-                    max_content_length = min(50000, len(content))
-                    formatted_prompt = prompt_text.format(
-                        goal=goal, page=content[:max_content_length]
-                    )
-
-                    # Create a proper message list for the LLM
-                    from app.schema import Message
-
-                    messages = [Message.user_message(formatted_prompt)]
-
-                    # Define extraction function for the tool
-                    extraction_function = {
-                        "type": "function",
-                        "function": {
-                            "name": "extract_content",
-                            "description": "Extract specific information from a webpage based on a goal",
-                            "parameters": {
-                                "type": "object",
-                                "properties": {
-                                    "extracted_content": {
-                                        "type": "object",
-                                        "description": "The content extracted from the page according to the goal",
-                                    }
-                                },
-                                "required": ["extracted_content"],
-                            },
-                        },
-                    }
-
-                    # Use LLM to extract content with required function calling
-                    response = await self.llm.ask_tool(
-                        messages,
-                        tools=[extraction_function],
-                        tool_choice="required",
-                    )
-
-                    # Extract content from function call response
-                    if (
-                        response
-                        and response.tool_calls
-                        and len(response.tool_calls) > 0
-                    ):
-                        # Get the first tool call arguments
-                        tool_call = response.tool_calls[0]
-                        # Parse the JSON arguments
-                        try:
-                            args = json.loads(tool_call.function.arguments)
-                            extracted_content = args.get("extracted_content", {})
-                            # Format extracted content as JSON string
-                            content_json = json.dumps(
-                                extracted_content, indent=2, ensure_ascii=False
-                            )
-                            msg = f"Extracted from page:\n{content_json}\n"
-                        except Exception as e:
-                            msg = f"Error parsing extraction result: {str(e)}\nRaw response: {tool_call.function.arguments}"
-                    else:
-                        msg = "No content was extracted from the page."
-
-                    return ToolResult(output=msg)
-                except Exception as e:
-                    # Provide a more helpful error message
-                    error_msg = f"Failed to extract content: {str(e)}"
-                    try:
-                        # Try to return a portion of the page content as fallback
-                        return ToolResult(
-                            output=f"{error_msg}\nHere's a portion of the page content:\n{content[:2000]}..."
-                        )
-                    except:
-                        # If all else fails, just return the error
-                        return ToolResult(error=error_msg)
-
-            # Tab management actions
-            elif action == "switch_tab":
-                if tab_id is None:
-                    return ToolResult(
-                        error="Tab ID is required for 'switch_tab' action"
-                    )
-                await context.switch_to_tab(tab_id)
-                page = await context.get_current_page()
-                await page.wait_for_load_state()
-                return ToolResult(output=f"Switched to tab {tab_id}")
-
-            elif action == "open_tab":
-                if not url:
-                    return ToolResult(error="URL is required for 'open_tab' action")
-                await context.create_new_tab(url)
-                return ToolResult(output=f"Opened new tab with {url}")
-
-            elif action == "close_tab":
-                await context.close_current_tab()
-                return ToolResult(output="Closed current tab")
-
-            # Utility actions
-            elif action == "wait":
-                seconds_to_wait = seconds if seconds is not None else 3
-                await asyncio.sleep(seconds_to_wait)
-                return ToolResult(output=f"Waited for {seconds_to_wait} seconds")
-
-            else:
-                return ToolResult(error=f"Unknown action: {action}")
-
-        except Exception as e:
-            return ToolResult(error=f"Browser action '{action}' failed: {str(e)}")
-
-    async def get_current_state(
-        self, context: Optional[BrowserContext] = None
-    ) -> ToolResult:
-        """
-        Get the current browser state as a ToolResult.
-        If context is not provided, uses self.context.
-        """
-        try:
-            # Use provided context or fall back to self.context
-            ctx = context or self.context
-            if not ctx:
-                return ToolResult(error="Browser context not initialized")
-
-            state = await ctx.get_state()
-
-            # Create a viewport_info dictionary if it doesn't exist
-            viewport_height = 0
-            if hasattr(state, "viewport_info") and state.viewport_info:
-                viewport_height = state.viewport_info.height
-            elif hasattr(ctx, "config") and hasattr(ctx.config, "browser_window_size"):
-                viewport_height = ctx.config.browser_window_size.get("height", 0)
-
-            # Take a screenshot for the state
-            page = await ctx.get_current_page()
-
-            await page.bring_to_front()
-            await page.wait_for_load_state()
-
-            screenshot = await page.screenshot(
-                full_page=True, animations="disabled", type="jpeg", quality=100
-            )
-
-            screenshot = base64.b64encode(screenshot).decode("utf-8")
-
-            # Build the state info with all required fields
-            state_info = {
-                "url": state.url,
-                "title": state.title,
-                "tabs": [tab.model_dump() for tab in state.tabs],
-                "help": "[0], [1], [2], etc., represent clickable indices corresponding to the elements listed. Clicking on these indices will navigate to or interact with the respective content behind them.",
-                "interactive_elements": (
-                    state.element_tree.clickable_elements_to_string()
-                    if state.element_tree
-                    else ""
-                ),
-                "scroll_info": {
-                    "pixels_above": getattr(state, "pixels_above", 0),
-                    "pixels_below": getattr(state, "pixels_below", 0),
-                    "total_height": getattr(state, "pixels_above", 0)
-                    + getattr(state, "pixels_below", 0)
-                    + viewport_height,
-                },
-                "viewport_height": viewport_height,
-            }
-
-            return ToolResult(
-                output=json.dumps(state_info, indent=4, ensure_ascii=False),
-                base64_image=screenshot,
-            )
-        except Exception as e:
-            return ToolResult(error=f"Failed to get browser state: {str(e)}")
-
-    async def cleanup(self):
-        """Clean up browser resources."""
-        async with self.lock:
-            if self.context is not None:
-                await self.context.close()
-                self.context = None
-                self.dom_service = None
-            if self.browser is not None:
-                await self.browser.close()
-                self.browser = None
-
-    def __del__(self):
-        """Ensure cleanup when object is destroyed."""
-        if self.browser is not None or self.context is not None:
-            try:
-                asyncio.run(self.cleanup())
-            except RuntimeError:
-                loop = asyncio.new_event_loop()
-                loop.run_until_complete(self.cleanup())
-                loop.close()
-
-    @classmethod
-    def create_with_context(cls, context: Context) -> "BrowserUseTool[Context]":
-        """Factory method to create a BrowserUseTool with a specific context."""
-        tool = cls()
-        tool.tool_context = context
-        return tool
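For reference, the deleted browser tool was driven entirely through its `execute` entry point. A minimal sketch of how an agent might have called it, assuming the `go_to_url` action defined earlier in this file and the async `BaseTool` call convention; treat the URL and goal as placeholders:

```python
import asyncio

from app.tool.browser_use_tool import BrowserUseTool  # module removed by this PR


async def demo():
    tool = BrowserUseTool()
    # Navigate first, then pull goal-directed content from the loaded page
    await tool.execute(action="go_to_url", url="https://example.com")
    result = await tool.execute(
        action="extract_content", goal="page title and main heading"
    )
    print(result.output or result.error)
    await tool.cleanup()


asyncio.run(demo())
```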
diff --git a/openmanus_rl/agentgym/OpenManus/app/tool/create_chat_completion.py b/openmanus_rl/agentgym/OpenManus/app/tool/create_chat_completion.py
deleted file mode 100644
index 882a5beb..00000000
--- a/openmanus_rl/agentgym/OpenManus/app/tool/create_chat_completion.py
+++ /dev/null
@@ -1,169 +0,0 @@
-from typing import Any, List, Optional, Type, Union, get_args, get_origin
-
-from pydantic import BaseModel, Field
-
-from app.tool import BaseTool
-
-
-class CreateChatCompletion(BaseTool):
-    name: str = "create_chat_completion"
-    description: str = (
-        "Creates a structured completion with specified output formatting."
-    )
-
-    # Type mapping for JSON schema
-    type_mapping: dict = {
-        str: "string",
-        int: "integer",
-        float: "number",
-        bool: "boolean",
-        dict: "object",
-        list: "array",
-    }
-    response_type: Optional[Type] = None
-    required: List[str] = Field(default_factory=lambda: ["response"])
-
-    def __init__(self, response_type: Optional[Type] = str):
-        """Initialize with a specific response type."""
-        super().__init__()
-        self.response_type = response_type
-        self.parameters = self._build_parameters()
-
-    def _build_parameters(self) -> dict:
-        """Build parameters schema based on response type."""
-        if self.response_type == str:
-            return {
-                "type": "object",
-                "properties": {
-                    "response": {
-                        "type": "string",
-                        "description": "The response text that should be delivered to the user.",
-                    },
-                },
-                "required": self.required,
-            }
-
-        if isinstance(self.response_type, type) and issubclass(
-            self.response_type, BaseModel
-        ):
-            schema = self.response_type.model_json_schema()
-            return {
-                "type": "object",
-                "properties": schema["properties"],
-                "required": schema.get("required", self.required),
-            }
-
-        return self._create_type_schema(self.response_type)
-
-    def _create_type_schema(self, type_hint: Type) -> dict:
-        """Create a JSON schema for the given type."""
-        origin = get_origin(type_hint)
-        args = get_args(type_hint)
-
-        # Handle primitive types
-        if origin is None:
-            return {
-                "type": "object",
-                "properties": {
-                    "response": {
-                        "type": self.type_mapping.get(type_hint, "string"),
-                        "description": f"Response of type {type_hint.__name__}",
-                    }
-                },
-                "required": self.required,
-            }
-
-        # Handle List type
-        if origin is list:
-            item_type = args[0] if args else Any
-            return {
-                "type": "object",
-                "properties": {
-                    "response": {
-                        "type": "array",
-                        "items": self._get_type_info(item_type),
-                    }
-                },
-                "required": self.required,
-            }
-
-        # Handle Dict type
-        if origin is dict:
-            value_type = args[1] if len(args) > 1 else Any
-            return {
-                "type": "object",
-                "properties": {
-                    "response": {
-                        "type": "object",
-                        "additionalProperties": self._get_type_info(value_type),
-                    }
-                },
-                "required": self.required,
-            }
-
-        # Handle Union type
-        if origin is Union:
-            return self._create_union_schema(args)
-
-        return self._build_parameters()
-
-    def _get_type_info(self, type_hint: Type) -> dict:
-        """Get type information for a single type."""
-        if isinstance(type_hint, type) and issubclass(type_hint, BaseModel):
-            return type_hint.model_json_schema()
-
-        return {
-            "type": self.type_mapping.get(type_hint, "string"),
-            "description": f"Value of type {getattr(type_hint, '__name__', 'any')}",
-        }
-
-    def _create_union_schema(self, types: tuple) -> dict:
-        """Create schema for Union types."""
-        return {
-            "type": "object",
-            "properties": {
-                "response": {"anyOf": [self._get_type_info(t) for t in types]}
-            },
-            "required": self.required,
-        }
-
-    async def execute(self, required: list | None = None, **kwargs) -> Any:
-        """Execute the chat completion with type conversion.
-
-        Args:
-            required: List of required field names or None
-            **kwargs: Response data
-
-        Returns:
-            Converted response based on response_type
-        """
-        required = required or self.required
-
-        # Handle case when required is a list
-        if isinstance(required, list) and len(required) > 0:
-            if len(required) == 1:
-                required_field = required[0]
-                result = kwargs.get(required_field, "")
-            else:
-                # Return multiple fields as a dictionary
-                return {field: kwargs.get(field, "") for field in required}
-        else:
-            required_field = "response"
-            result = kwargs.get(required_field, "")
-
-        # Type conversion logic
-        if self.response_type == str:
-            return result
-
-        if isinstance(self.response_type, type) and issubclass(
-            self.response_type, BaseModel
-        ):
-            return self.response_type(**kwargs)
-
-        if get_origin(self.response_type) in (list, dict):
-            return result  # Assuming result is already in correct format
-
-        try:
-            return self.response_type(result)
-        except (ValueError, TypeError):
-            return result
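The deleted `CreateChatCompletion` built its JSON schema from whatever `response_type` it was constructed with, then rehydrated the LLM's arguments back into that type. A quick sketch of the intended usage; the `Answer` model is hypothetical, for illustration only:

```python
import asyncio

from pydantic import BaseModel

from app.tool.create_chat_completion import CreateChatCompletion  # module removed by this PR


class Answer(BaseModel):  # hypothetical response model
    text: str
    confidence: float


async def demo():
    tool = CreateChatCompletion(response_type=Answer)
    # tool.parameters now carries Answer's JSON schema, so the LLM is
    # constrained to emit matching fields; execute() rebuilds the model.
    result = await tool.execute(text="42", confidence=0.9)
    assert isinstance(result, Answer)


asyncio.run(demo())
```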
diff --git a/openmanus_rl/agentgym/OpenManus/app/tool/file_operators.py b/openmanus_rl/agentgym/OpenManus/app/tool/file_operators.py
deleted file mode 100644
index dd64c838..00000000
--- a/openmanus_rl/agentgym/OpenManus/app/tool/file_operators.py
+++ /dev/null
@@ -1,158 +0,0 @@
-"""File operation interfaces and implementations for local and sandbox environments."""
-
-import asyncio
-from pathlib import Path
-from typing import Optional, Protocol, Tuple, Union, runtime_checkable
-
-from app.config import SandboxSettings
-from app.exceptions import ToolError
-from app.sandbox.client import SANDBOX_CLIENT
-
-
-PathLike = Union[str, Path]
-
-
-@runtime_checkable
-class FileOperator(Protocol):
-    """Interface for file operations in different environments."""
-
-    async def read_file(self, path: PathLike) -> str:
-        """Read content from a file."""
-        ...
-
-    async def write_file(self, path: PathLike, content: str) -> None:
-        """Write content to a file."""
-        ...
-
-    async def is_directory(self, path: PathLike) -> bool:
-        """Check if path points to a directory."""
-        ...
-
-    async def exists(self, path: PathLike) -> bool:
-        """Check if path exists."""
-        ...
-
-    async def run_command(
-        self, cmd: str, timeout: Optional[float] = 120.0
-    ) -> Tuple[int, str, str]:
-        """Run a shell command and return (return_code, stdout, stderr)."""
-        ...
-
-
-class LocalFileOperator(FileOperator):
-    """File operations implementation for local filesystem."""
-
-    encoding: str = "utf-8"
-
-    async def read_file(self, path: PathLike) -> str:
-        """Read content from a local file."""
-        try:
-            return Path(path).read_text(encoding=self.encoding)
-        except Exception as e:
-            raise ToolError(f"Failed to read {path}: {str(e)}") from None
-
-    async def write_file(self, path: PathLike, content: str) -> None:
-        """Write content to a local file."""
-        try:
-            Path(path).write_text(content, encoding=self.encoding)
-        except Exception as e:
-            raise ToolError(f"Failed to write to {path}: {str(e)}") from None
-
-    async def is_directory(self, path: PathLike) -> bool:
-        """Check if path points to a directory."""
-        return Path(path).is_dir()
-
-    async def exists(self, path: PathLike) -> bool:
-        """Check if path exists."""
-        return Path(path).exists()
-
-    async def run_command(
-        self, cmd: str, timeout: Optional[float] = 120.0
-    ) -> Tuple[int, str, str]:
-        """Run a shell command locally."""
-        process = await asyncio.create_subprocess_shell(
-            cmd, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE
-        )
-
-        try:
-            stdout, stderr = await asyncio.wait_for(
-                process.communicate(), timeout=timeout
-            )
-            return (
-                process.returncode or 0,
-                stdout.decode(),
-                stderr.decode(),
-            )
-        except asyncio.TimeoutError as exc:
-            try:
-                process.kill()
-            except ProcessLookupError:
-                pass
-            raise TimeoutError(
-                f"Command '{cmd}' timed out after {timeout} seconds"
-            ) from exc
-
-
-class SandboxFileOperator(FileOperator):
-    """File operations implementation for sandbox environment."""
-
-    def __init__(self):
-        self.sandbox_client = SANDBOX_CLIENT
-
-    async def _ensure_sandbox_initialized(self):
-        """Ensure sandbox is initialized."""
-        if not self.sandbox_client.sandbox:
-            await self.sandbox_client.create(config=SandboxSettings())
-
-    async def read_file(self, path: PathLike) -> str:
-        """Read content from a file in sandbox."""
-        await self._ensure_sandbox_initialized()
-        try:
-            return await self.sandbox_client.read_file(str(path))
-        except Exception as e:
-            raise ToolError(f"Failed to read {path} in sandbox: {str(e)}") from None
-
-    async def write_file(self, path: PathLike, content: str) -> None:
-        """Write content to a file in sandbox."""
-        await self._ensure_sandbox_initialized()
-        try:
-            await self.sandbox_client.write_file(str(path), content)
-        except Exception as e:
-            raise ToolError(f"Failed to write to {path} in sandbox: {str(e)}") from None
-
-    async def is_directory(self, path: PathLike) -> bool:
-        """Check if path points to a directory in sandbox."""
-        await self._ensure_sandbox_initialized()
-        result = await self.sandbox_client.run_command(
-            f"test -d {path} && echo 'true' || echo 'false'"
-        )
-        return result.strip() == "true"
-
-    async def exists(self, path: PathLike) -> bool:
-        """Check if path exists in sandbox."""
-        await self._ensure_sandbox_initialized()
-        result = await self.sandbox_client.run_command(
-            f"test -e {path} && echo 'true' || echo 'false'"
-        )
-        return result.strip() == "true"
-
-    async def run_command(
-        self, cmd: str, timeout: Optional[float] = 120.0
-    ) -> Tuple[int, str, str]:
-        """Run a command in sandbox environment."""
-        await self._ensure_sandbox_initialized()
-        try:
-            stdout = await self.sandbox_client.run_command(
-                cmd, timeout=int(timeout) if timeout else None
-            )
-            return (
-                0,  # Always return 0 since we don't have explicit return code from sandbox
-                stdout,
-                "",  # No stderr capture in the current sandbox implementation
-            )
-        except TimeoutError as exc:
-            raise TimeoutError(
-                f"Command '{cmd}' timed out after {timeout} seconds in sandbox"
-            ) from exc
-        except Exception as exc:
-            return 1, "", f"Error executing command in sandbox: {str(exc)}"
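Because `FileOperator` is a `runtime_checkable` `Protocol`, the local and sandbox implementations were interchangeable behind the same five methods. A sketch of the dispatch pattern the deleted code relied on; the paths and command are illustrative:

```python
import asyncio

from app.tool.file_operators import (  # module removed by this PR
    LocalFileOperator,
    SandboxFileOperator,
)


async def demo(use_sandbox: bool = False):
    # Same call sites work against either environment
    operator = SandboxFileOperator() if use_sandbox else LocalFileOperator()
    await operator.write_file("/tmp/demo.txt", "hello")
    print(await operator.read_file("/tmp/demo.txt"))
    code, out, err = await operator.run_command("echo ok", timeout=10.0)
    print(code, out.strip())


asyncio.run(demo())
```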
diff --git a/openmanus_rl/agentgym/OpenManus/app/tool/file_saver.py b/openmanus_rl/agentgym/OpenManus/app/tool/file_saver.py
deleted file mode 100644
index 7d92a021..00000000
--- a/openmanus_rl/agentgym/OpenManus/app/tool/file_saver.py
+++ /dev/null
@@ -1,67 +0,0 @@
-import os
-
-import aiofiles
-
-from app.config import WORKSPACE_ROOT
-from app.tool.base import BaseTool
-
-
-class FileSaver(BaseTool):
-    name: str = "file_saver"
-    description: str = """Save content to a local file at a specified path.
-Use this tool when you need to save text, code, or generated content to a file on the local filesystem.
-The tool accepts content and a file path, and saves the content to that location.
-"""
-    parameters: dict = {
-        "type": "object",
-        "properties": {
-            "content": {
-                "type": "string",
-                "description": "(required) The content to save to the file.",
-            },
-            "file_path": {
-                "type": "string",
-                "description": "(required) The path where the file should be saved, including filename and extension.",
-            },
-            "mode": {
-                "type": "string",
-                "description": "(optional) The file opening mode. Default is 'w' for write. Use 'a' for append.",
-                "enum": ["w", "a"],
-                "default": "w",
-            },
-        },
-        "required": ["content", "file_path"],
-    }
-
-    async def execute(self, content: str, file_path: str, mode: str = "w") -> str:
-        """
-        Save content to a file at the specified path.
-
-        Args:
-            content (str): The content to save to the file.
-            file_path (str): The path where the file should be saved.
-            mode (str, optional): The file opening mode. Default is 'w' for write. Use 'a' for append.
-
-        Returns:
-            str: A message indicating the result of the operation.
-        """
-        try:
-            # Place the generated file in the workspace directory
-            if os.path.isabs(file_path):
-                file_name = os.path.basename(file_path)
-                full_path = os.path.join(WORKSPACE_ROOT, file_name)
-            else:
-                full_path = os.path.join(WORKSPACE_ROOT, file_path)
-
-            # Ensure the directory exists
-            directory = os.path.dirname(full_path)
-            if directory and not os.path.exists(directory):
-                os.makedirs(directory)
-
-            # Write directly to the file
-            async with aiofiles.open(full_path, mode, encoding="utf-8") as file:
-                await file.write(content)
-
-            return f"Content successfully saved to {full_path}"
-        except Exception as e:
-            return f"Error saving file: {str(e)}"
diff --git a/openmanus_rl/agentgym/OpenManus/app/tool/mcp.py b/openmanus_rl/agentgym/OpenManus/app/tool/mcp.py
deleted file mode 100644
index 3115286e..00000000
--- a/openmanus_rl/agentgym/OpenManus/app/tool/mcp.py
+++ /dev/null
@@ -1,115 +0,0 @@
-from contextlib import AsyncExitStack
-from typing import List, Optional
-
-from mcp import ClientSession, StdioServerParameters
-from mcp.client.sse import sse_client
-from mcp.client.stdio import stdio_client
-from mcp.types import TextContent
-
-from app.logger import logger
-from app.tool.base import BaseTool, ToolResult
-from app.tool.tool_collection import ToolCollection
-
-
-class MCPClientTool(BaseTool):
-    """Represents a tool proxy that can be called on the MCP server from the client side."""
-
-    session: Optional[ClientSession] = None
-
-    async def execute(self, **kwargs) -> ToolResult:
-        """Execute the tool by making a remote call to the MCP server."""
-        if not self.session:
-            return ToolResult(error="Not connected to MCP server")
-
-        try:
-            result = await self.session.call_tool(self.name, kwargs)
-            content_str = ", ".join(
-                item.text for item in result.content if isinstance(item, TextContent)
-            )
-            return ToolResult(output=content_str or "No output returned.")
-        except Exception as e:
-            return ToolResult(error=f"Error executing tool: {str(e)}")
-
-
-class MCPClients(ToolCollection):
-    """
-    A collection of tools that connects to an MCP server and manages available tools through the Model Context Protocol.
-    """
-
-    session: Optional[ClientSession] = None
-    exit_stack: AsyncExitStack = None
-    description: str = "MCP client tools for server interaction"
-
-    def __init__(self):
-        super().__init__()  # Initialize with empty tools list
-        self.name = "mcp"  # Keep name for backward compatibility
-        self.exit_stack = AsyncExitStack()
-
-    async def connect_sse(self, server_url: str) -> None:
-        """Connect to an MCP server using SSE transport."""
-        if not server_url:
-            raise ValueError("Server URL is required.")
-        if self.session:
-            await self.disconnect()
-
-        streams_context = sse_client(url=server_url)
-        streams = await self.exit_stack.enter_async_context(streams_context)
-        self.session = await self.exit_stack.enter_async_context(
-            ClientSession(*streams)
-        )
-
-        await self._initialize_and_list_tools()
-
-    async def connect_stdio(self, command: str, args: List[str]) -> None:
-        """Connect to an MCP server using stdio transport."""
-        if not command:
-            raise ValueError("Server command is required.")
-        if self.session:
-            await self.disconnect()
-
-        server_params = StdioServerParameters(command=command, args=args)
-        stdio_transport = await self.exit_stack.enter_async_context(
-            stdio_client(server_params)
-        )
-        read, write = stdio_transport
-        self.session = await self.exit_stack.enter_async_context(
-            ClientSession(read, write)
-        )
-
-        await self._initialize_and_list_tools()
-
-    async def _initialize_and_list_tools(self) -> None:
-        """Initialize session and populate tool map."""
-        if not self.session:
-            raise RuntimeError("Session not initialized.")
-
-        await self.session.initialize()
-        response = await self.session.list_tools()
-
-        # Clear existing tools
-        self.tools = tuple()
-        self.tool_map = {}
-
-        # Create proper tool objects for each server tool
-        for tool in response.tools:
-            server_tool = MCPClientTool(
-                name=tool.name,
-                description=tool.description,
-                parameters=tool.inputSchema,
-                session=self.session,
-            )
-            self.tool_map[tool.name] = server_tool
-
-        self.tools = tuple(self.tool_map.values())
-        logger.info(
-            f"Connected to server with tools: {[tool.name for tool in response.tools]}"
-        )
-
-    async def disconnect(self) -> None:
-        """Disconnect from the MCP server and clean up resources."""
-        if self.session and self.exit_stack:
-            await self.exit_stack.aclose()
-            self.session = None
-            self.tools = tuple()
-            self.tool_map = {}
-            logger.info("Disconnected from MCP server")
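`MCPClients` turned every tool advertised by an MCP server into a local `MCPClientTool` proxy, so server tools could be called through the ordinary `ToolCollection.execute` interface. A sketch of the connect-call-disconnect lifecycle; the server command and tool name are placeholders:

```python
import asyncio

from app.tool.mcp import MCPClients  # module removed by this PR


async def demo():
    clients = MCPClients()
    # Spawn a (hypothetical) MCP server over stdio and enumerate its tools
    await clients.connect_stdio(command="python", args=["-m", "my_mcp_server"])
    result = await clients.execute(name="some_server_tool", tool_input={"arg": "value"})
    print(result.output or result.error)
    await clients.disconnect()


asyncio.run(demo())
```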
diff --git a/openmanus_rl/agentgym/OpenManus/app/tool/planning.py b/openmanus_rl/agentgym/OpenManus/app/tool/planning.py
deleted file mode 100644
index 47e334d6..00000000
--- a/openmanus_rl/agentgym/OpenManus/app/tool/planning.py
+++ /dev/null
@@ -1,363 +0,0 @@
-# tool/planning.py
-from typing import Dict, List, Literal, Optional
-
-from app.exceptions import ToolError
-from app.tool.base import BaseTool, ToolResult
-
-
-_PLANNING_TOOL_DESCRIPTION = """
-A planning tool that allows the agent to create and manage plans for solving complex tasks.
-The tool provides functionality for creating plans, updating plan steps, and tracking progress.
-"""
-
-
-class PlanningTool(BaseTool):
-    """
-    A planning tool that allows the agent to create and manage plans for solving complex tasks.
-    The tool provides functionality for creating plans, updating plan steps, and tracking progress.
-    """
-
-    name: str = "planning"
-    description: str = _PLANNING_TOOL_DESCRIPTION
-    parameters: dict = {
-        "type": "object",
-        "properties": {
-            "command": {
-                "description": "The command to execute. Available commands: create, update, list, get, set_active, mark_step, delete.",
-                "enum": [
-                    "create",
-                    "update",
-                    "list",
-                    "get",
-                    "set_active",
-                    "mark_step",
-                    "delete",
-                ],
-                "type": "string",
-            },
-            "plan_id": {
-                "description": "Unique identifier for the plan. Required for create, update, set_active, and delete commands. Optional for get and mark_step (uses active plan if not specified).",
-                "type": "string",
-            },
-            "title": {
-                "description": "Title for the plan. Required for create command, optional for update command.",
-                "type": "string",
-            },
-            "steps": {
-                "description": "List of plan steps. Required for create command, optional for update command.",
-                "type": "array",
-                "items": {"type": "string"},
-            },
-            "step_index": {
-                "description": "Index of the step to update (0-based). Required for mark_step command.",
-                "type": "integer",
-            },
-            "step_status": {
-                "description": "Status to set for a step. Used with mark_step command.",
-                "enum": ["not_started", "in_progress", "completed", "blocked"],
-                "type": "string",
-            },
-            "step_notes": {
-                "description": "Additional notes for a step. Optional for mark_step command.",
-                "type": "string",
-            },
-        },
-        "required": ["command"],
-        "additionalProperties": False,
-    }
-
-    plans: dict = {}  # Dictionary to store plans by plan_id
-    _current_plan_id: Optional[str] = None  # Track the current active plan
-
-    async def execute(
-        self,
-        *,
-        command: Literal[
-            "create", "update", "list", "get", "set_active", "mark_step", "delete"
-        ],
-        plan_id: Optional[str] = None,
-        title: Optional[str] = None,
-        steps: Optional[List[str]] = None,
-        step_index: Optional[int] = None,
-        step_status: Optional[
-            Literal["not_started", "in_progress", "completed", "blocked"]
-        ] = None,
-        step_notes: Optional[str] = None,
-        **kwargs,
-    ):
-        """
-        Execute the planning tool with the given command and parameters.
-
-        Parameters:
-        - command: The operation to perform
-        - plan_id: Unique identifier for the plan
-        - title: Title for the plan (used with create command)
-        - steps: List of steps for the plan (used with create command)
-        - step_index: Index of the step to update (used with mark_step command)
-        - step_status: Status to set for a step (used with mark_step command)
-        - step_notes: Additional notes for a step (used with mark_step command)
-        """
-
-        if command == "create":
-            return self._create_plan(plan_id, title, steps)
-        elif command == "update":
-            return self._update_plan(plan_id, title, steps)
-        elif command == "list":
-            return self._list_plans()
-        elif command == "get":
-            return self._get_plan(plan_id)
-        elif command == "set_active":
-            return self._set_active_plan(plan_id)
-        elif command == "mark_step":
-            return self._mark_step(plan_id, step_index, step_status, step_notes)
-        elif command == "delete":
-            return self._delete_plan(plan_id)
-        else:
-            raise ToolError(
-                f"Unrecognized command: {command}. Allowed commands are: create, update, list, get, set_active, mark_step, delete"
-            )
-
-    def _create_plan(
-        self, plan_id: Optional[str], title: Optional[str], steps: Optional[List[str]]
-    ) -> ToolResult:
-        """Create a new plan with the given ID, title, and steps."""
-        if not plan_id:
-            raise ToolError("Parameter `plan_id` is required for command: create")
-
-        if plan_id in self.plans:
-            raise ToolError(
-                f"A plan with ID '{plan_id}' already exists. Use 'update' to modify existing plans."
-            )
-
-        if not title:
-            raise ToolError("Parameter `title` is required for command: create")
-
-        if (
-            not steps
-            or not isinstance(steps, list)
-            or not all(isinstance(step, str) for step in steps)
-        ):
-            raise ToolError(
-                "Parameter `steps` must be a non-empty list of strings for command: create"
-            )
-
-        # Create a new plan with initialized step statuses
-        plan = {
-            "plan_id": plan_id,
-            "title": title,
-            "steps": steps,
-            "step_statuses": ["not_started"] * len(steps),
-            "step_notes": [""] * len(steps),
-        }
-
-        self.plans[plan_id] = plan
-        self._current_plan_id = plan_id  # Set as active plan
-
-        return ToolResult(
-            output=f"Plan created successfully with ID: {plan_id}\n\n{self._format_plan(plan)}"
-        )
-
-    def _update_plan(
-        self, plan_id: Optional[str], title: Optional[str], steps: Optional[List[str]]
-    ) -> ToolResult:
-        """Update an existing plan with new title or steps."""
-        if not plan_id:
-            raise ToolError("Parameter `plan_id` is required for command: update")
-
-        if plan_id not in self.plans:
-            raise ToolError(f"No plan found with ID: {plan_id}")
-
-        plan = self.plans[plan_id]
-
-        if title:
-            plan["title"] = title
-
-        if steps:
-            if not isinstance(steps, list) or not all(
-                isinstance(step, str) for step in steps
-            ):
-                raise ToolError(
-                    "Parameter `steps` must be a list of strings for command: update"
-                )
-
-            # Preserve existing step statuses for unchanged steps
-            old_steps = plan["steps"]
-            old_statuses = plan["step_statuses"]
-            old_notes = plan["step_notes"]
-
-            # Create new step statuses and notes
-            new_statuses = []
-            new_notes = []
-
-            for i, step in enumerate(steps):
-                # If the step exists at the same position in old steps, preserve status and notes
-                if i < len(old_steps) and step == old_steps[i]:
-                    new_statuses.append(old_statuses[i])
-                    new_notes.append(old_notes[i])
-                else:
-                    new_statuses.append("not_started")
-                    new_notes.append("")
-
-            plan["steps"] = steps
-            plan["step_statuses"] = new_statuses
-            plan["step_notes"] = new_notes
-
-        return ToolResult(
-            output=f"Plan updated successfully: {plan_id}\n\n{self._format_plan(plan)}"
-        )
-
-    def _list_plans(self) -> ToolResult:
-        """List all available plans."""
-        if not self.plans:
-            return ToolResult(
-                output="No plans available. Create a plan with the 'create' command."
-            )
-
-        output = "Available plans:\n"
-        for plan_id, plan in self.plans.items():
-            current_marker = " (active)" if plan_id == self._current_plan_id else ""
-            completed = sum(
-                1 for status in plan["step_statuses"] if status == "completed"
-            )
-            total = len(plan["steps"])
-            progress = f"{completed}/{total} steps completed"
-            output += f"• {plan_id}{current_marker}: {plan['title']} - {progress}\n"
-
-        return ToolResult(output=output)
-
-    def _get_plan(self, plan_id: Optional[str]) -> ToolResult:
-        """Get details of a specific plan."""
-        if not plan_id:
-            # If no plan_id is provided, use the current active plan
-            if not self._current_plan_id:
-                raise ToolError(
-                    "No active plan. Please specify a plan_id or set an active plan."
-                )
-            plan_id = self._current_plan_id
-
-        if plan_id not in self.plans:
-            raise ToolError(f"No plan found with ID: {plan_id}")
-
-        plan = self.plans[plan_id]
-        return ToolResult(output=self._format_plan(plan))
-
-    def _set_active_plan(self, plan_id: Optional[str]) -> ToolResult:
-        """Set a plan as the active plan."""
-        if not plan_id:
-            raise ToolError("Parameter `plan_id` is required for command: set_active")
-
-        if plan_id not in self.plans:
-            raise ToolError(f"No plan found with ID: {plan_id}")
-
-        self._current_plan_id = plan_id
-        return ToolResult(
-            output=f"Plan '{plan_id}' is now the active plan.\n\n{self._format_plan(self.plans[plan_id])}"
-        )
-
-    def _mark_step(
-        self,
-        plan_id: Optional[str],
-        step_index: Optional[int],
-        step_status: Optional[str],
-        step_notes: Optional[str],
-    ) -> ToolResult:
-        """Mark a step with a specific status and optional notes."""
-        if not plan_id:
-            # If no plan_id is provided, use the current active plan
-            if not self._current_plan_id:
-                raise ToolError(
-                    "No active plan. Please specify a plan_id or set an active plan."
-                )
-            plan_id = self._current_plan_id
-
-        if plan_id not in self.plans:
-            raise ToolError(f"No plan found with ID: {plan_id}")
-
-        if step_index is None:
-            raise ToolError("Parameter `step_index` is required for command: mark_step")
-
-        plan = self.plans[plan_id]
-
-        if step_index < 0 or step_index >= len(plan["steps"]):
-            raise ToolError(
-                f"Invalid step_index: {step_index}. Valid indices range from 0 to {len(plan['steps'])-1}."
-            )
-
-        if step_status and step_status not in [
-            "not_started",
-            "in_progress",
-            "completed",
-            "blocked",
-        ]:
-            raise ToolError(
-                f"Invalid step_status: {step_status}. Valid statuses are: not_started, in_progress, completed, blocked"
-            )
-
-        if step_status:
-            plan["step_statuses"][step_index] = step_status
-
-        if step_notes:
-            plan["step_notes"][step_index] = step_notes
-
-        return ToolResult(
-            output=f"Step {step_index} updated in plan '{plan_id}'.\n\n{self._format_plan(plan)}"
-        )
-
-    def _delete_plan(self, plan_id: Optional[str]) -> ToolResult:
-        """Delete a plan."""
-        if not plan_id:
-            raise ToolError("Parameter `plan_id` is required for command: delete")
-
-        if plan_id not in self.plans:
-            raise ToolError(f"No plan found with ID: {plan_id}")
-
-        del self.plans[plan_id]
-
-        # If the deleted plan was the active plan, clear the active plan
-        if self._current_plan_id == plan_id:
-            self._current_plan_id = None
-
-        return ToolResult(output=f"Plan '{plan_id}' has been deleted.")
-
-    def _format_plan(self, plan: Dict) -> str:
-        """Format a plan for display."""
-        output = f"Plan: {plan['title']} (ID: {plan['plan_id']})\n"
-        output += "=" * len(output) + "\n\n"
-
-        # Calculate progress statistics
-        total_steps = len(plan["steps"])
-        completed = sum(1 for status in plan["step_statuses"] if status == "completed")
-        in_progress = sum(
-            1 for status in plan["step_statuses"] if status == "in_progress"
-        )
-        blocked = sum(1 for status in plan["step_statuses"] if status == "blocked")
-        not_started = sum(
-            1 for status in plan["step_statuses"] if status == "not_started"
-        )
-
-        output += f"Progress: {completed}/{total_steps} steps completed "
-        if total_steps > 0:
-            percentage = (completed / total_steps) * 100
-            output += f"({percentage:.1f}%)\n"
-        else:
-            output += "(0%)\n"
-
-        output += f"Status: {completed} completed, {in_progress} in progress, {blocked} blocked, {not_started} not started\n\n"
-        output += "Steps:\n"
-
-        # Add each step with its status and notes
-        for i, (step, status, notes) in enumerate(
-            zip(plan["steps"], plan["step_statuses"], plan["step_notes"])
-        ):
-            status_symbol = {
-                "not_started": "[ ]",
-                "in_progress": "[→]",
-                "completed": "[✓]",
-                "blocked": "[!]",
-            }.get(status, "[ ]")
-
-            output += f"{i}. {status_symbol} {step}\n"
-            if notes:
-                output += f"   Notes: {notes}\n"
-
-        return output
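The deleted planning tool kept all state in an in-memory `plans` dict and tracked one active plan, so a typical agent flow was create, then mark steps as work progressed. A minimal sketch under those assumptions:

```python
import asyncio

from app.tool.planning import PlanningTool  # module removed by this PR


async def demo():
    planner = PlanningTool()
    await planner.execute(
        command="create",
        plan_id="demo",
        title="Ship feature X",
        steps=["write code", "add tests", "open PR"],
    )
    # plan_id can be omitted here: "demo" became the active plan on create
    result = await planner.execute(
        command="mark_step", step_index=0, step_status="completed"
    )
    print(result.output)


asyncio.run(demo())
```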
diff --git a/openmanus_rl/agentgym/OpenManus/app/tool/python_execute.py b/openmanus_rl/agentgym/OpenManus/app/tool/python_execute.py
deleted file mode 100644
index 08ceffa8..00000000
--- a/openmanus_rl/agentgym/OpenManus/app/tool/python_execute.py
+++ /dev/null
@@ -1,75 +0,0 @@
-import multiprocessing
-import sys
-from io import StringIO
-from typing import Dict
-
-from app.tool.base import BaseTool
-
-
-class PythonExecute(BaseTool):
-    """A tool for executing Python code with timeout and safety restrictions."""
-
-    name: str = "python_execute"
-    description: str = "Executes Python code string. Note: Only print outputs are visible, function return values are not captured. Use print statements to see results."
-    parameters: dict = {
-        "type": "object",
-        "properties": {
-            "code": {
-                "type": "string",
-                "description": "The Python code to execute.",
-            },
-        },
-        "required": ["code"],
-    }
-
-    def _run_code(self, code: str, result_dict: dict, safe_globals: dict) -> None:
-        original_stdout = sys.stdout
-        try:
-            output_buffer = StringIO()
-            sys.stdout = output_buffer
-            exec(code, safe_globals, safe_globals)
-            result_dict["observation"] = output_buffer.getvalue()
-            result_dict["success"] = True
-        except Exception as e:
-            result_dict["observation"] = str(e)
-            result_dict["success"] = False
-        finally:
-            sys.stdout = original_stdout
-
-    async def execute(
-        self,
-        code: str,
-        timeout: int = 5,
-    ) -> Dict:
-        """
-        Executes the provided Python code with a timeout.
-
-        Args:
-            code (str): The Python code to execute.
-            timeout (int): Execution timeout in seconds.
-
-        Returns:
-            Dict: Contains 'output' with execution output or error message and 'success' status.
-        """
-
-        with multiprocessing.Manager() as manager:
-            result = manager.dict({"observation": "", "success": False})
-            if isinstance(__builtins__, dict):
-                safe_globals = {"__builtins__": __builtins__}
-            else:
-                safe_globals = {"__builtins__": __builtins__.__dict__.copy()}
-            proc = multiprocessing.Process(
-                target=self._run_code, args=(code, result, safe_globals)
-            )
-            proc.start()
-            proc.join(timeout)
-
-            # timeout process
-            if proc.is_alive():
-                proc.terminate()
-                proc.join(1)
-                return {
-                    "observation": f"Execution timeout after {timeout} seconds",
-                    "success": False,
-                }
-            return dict(result)
diff --git a/openmanus_rl/agentgym/OpenManus/app/tool/search/__init__.py b/openmanus_rl/agentgym/OpenManus/app/tool/search/__init__.py
deleted file mode 100644
index fe127ae3..00000000
--- a/openmanus_rl/agentgym/OpenManus/app/tool/search/__init__.py
+++ /dev/null
@@ -1,14 +0,0 @@
-from app.tool.search.baidu_search import BaiduSearchEngine
-from app.tool.search.base import WebSearchEngine
-from app.tool.search.bing_search import BingSearchEngine
-from app.tool.search.duckduckgo_search import DuckDuckGoSearchEngine
-from app.tool.search.google_search import GoogleSearchEngine
-
-
-__all__ = [
-    "WebSearchEngine",
-    "BaiduSearchEngine",
-    "DuckDuckGoSearchEngine",
-    "GoogleSearchEngine",
-    "BingSearchEngine",
-]
diff --git a/openmanus_rl/agentgym/OpenManus/app/tool/search/baidu_search.py b/openmanus_rl/agentgym/OpenManus/app/tool/search/baidu_search.py
deleted file mode 100644
index d415ce8c..00000000
--- a/openmanus_rl/agentgym/OpenManus/app/tool/search/baidu_search.py
+++ /dev/null
@@ -1,9 +0,0 @@
-from baidusearch.baidusearch import search
-
-from app.tool.search.base import WebSearchEngine
-
-
-class BaiduSearchEngine(WebSearchEngine):
-    def perform_search(self, query, num_results=10, *args, **kwargs):
-        """Baidu search engine."""
-        return search(query, num_results=num_results)
diff --git a/openmanus_rl/agentgym/OpenManus/app/tool/search/base.py b/openmanus_rl/agentgym/OpenManus/app/tool/search/base.py
deleted file mode 100644
index 31323812..00000000
--- a/openmanus_rl/agentgym/OpenManus/app/tool/search/base.py
+++ /dev/null
@@ -1,17 +0,0 @@
-class WebSearchEngine(object):
-    def perform_search(
-        self, query: str, num_results: int = 10, *args, **kwargs
-    ) -> list[dict]:
-        """
-        Perform a web search and return a list of URLs.
-
-        Args:
-            query (str): The search query to submit to the search engine.
-            num_results (int, optional): The number of search results to return. Default is 10.
-            args: Additional arguments.
-            kwargs: Additional keyword arguments.
-
-        Returns:
-            List: A list of dict matching the search query.
-        """
-        raise NotImplementedError
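New engines only had to subclass `WebSearchEngine` and override `perform_search`. A sketch of a custom engine under that contract; the fabricated result stands in for a real API call:

```python
from app.tool.search.base import WebSearchEngine  # module removed by this PR


class EchoSearchEngine(WebSearchEngine):
    """Toy engine that fabricates one result; a real engine would query an API here."""

    def perform_search(
        self, query: str, num_results: int = 10, *args, **kwargs
    ) -> list[dict]:
        results = [
            {"title": query, "url": f"https://example.com/?q={query}", "rank": 1}
        ]
        return results[:num_results]


print(EchoSearchEngine().perform_search("openmanus"))
```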
diff --git a/openmanus_rl/agentgym/OpenManus/app/tool/search/bing_search.py b/openmanus_rl/agentgym/OpenManus/app/tool/search/bing_search.py
deleted file mode 100644
index 46955b50..00000000
--- a/openmanus_rl/agentgym/OpenManus/app/tool/search/bing_search.py
+++ /dev/null
@@ -1,146 +0,0 @@
-from typing import List
-
-import requests
-from bs4 import BeautifulSoup
-
-from app.logger import logger
-from app.tool.search.base import WebSearchEngine
-
-
-ABSTRACT_MAX_LENGTH = 300
-
-USER_AGENTS = [
-    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36",
-    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
-    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/49.0.2623.108 Chrome/49.0.2623.108 Safari/537.36",
-    "Mozilla/5.0 (Windows; U; Windows NT 5.1; pt-BR) AppleWebKit/533.3 (KHTML, like Gecko) QtWeb Internet Browser/3.7 http://www.QtWeb.net",
-    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36",
-    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/532.2 (KHTML, like Gecko) ChromePlus/4.0.222.3 Chrome/4.0.222.3 Safari/532.2",
-    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.4pre) Gecko/20070404 K-Ninja/2.1.3",
-    "Mozilla/5.0 (Future Star Technologies Corp.; Star-Blade OS; x86_64; U; en-US) iNet Browser 4.7",
-    "Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:2.2) Gecko/20110201",
-    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080414 Firefox/2.0.0.13 Pogo/2.0.0.13.6866",
-]
-
-HEADERS = {
-    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
-    "Content-Type": "application/x-www-form-urlencoded",
-    "User-Agent": USER_AGENTS[0],
-    "Referer": "https://www.bing.com/",
-    "Accept-Encoding": "gzip, deflate",
-    "Accept-Language": "zh-CN,zh;q=0.9",
-}
-
-BING_HOST_URL = "https://www.bing.com"
-BING_SEARCH_URL = "https://www.bing.com/search?q="
-
-
-class BingSearchEngine(WebSearchEngine):
-    session: requests.Session = None
-
-    def __init__(self, **data):
-        """Initialize the BingSearch tool with a requests session."""
-        super().__init__(**data)
-        self.session = requests.Session()
-        self.session.headers.update(HEADERS)
-
-    def _search_sync(self, query: str, num_results: int = 10) -> List[str]:
-        """
-        Synchronous Bing search implementation to retrieve a list of URLs matching a query.
-
-        Args:
-            query (str): The search query to submit to Bing. Must not be empty.
-            num_results (int, optional): The maximum number of URLs to return. Defaults to 10.
-
-        Returns:
-            List[str]: A list of URLs from the search results, capped at `num_results`.
-                Returns an empty list if the query is empty or no results are found.
-
-        Notes:
-            - Pagination is handled by incrementing the `first` parameter and following `next_url` links.
-            - If fewer results than `num_results` are available, all found URLs are returned.
-        """
-        if not query:
-            return []
-
-        list_result = []
-        first = 1
-        next_url = BING_SEARCH_URL + query
-
-        while len(list_result) < num_results:
-            data, next_url = self._parse_html(
-                next_url, rank_start=len(list_result), first=first
-            )
-            if data:
-                list_result.extend([item["url"] for item in data])
-            if not next_url:
-                break
-            first += 10
-
-        return list_result[:num_results]
-
-    def _parse_html(self, url: str, rank_start: int = 0, first: int = 1) -> tuple:
-        """
-        Parse Bing search result HTML synchronously to extract search results and the next page URL.
-
-        Args:
-            url (str): The URL of the Bing search results page to parse.
-            rank_start (int, optional): The starting rank for numbering the search results. Defaults to 0.
-            first (int, optional): Unused parameter (possibly legacy). Defaults to 1.
-        Returns:
-            tuple: A tuple containing:
-                - list: A list of dictionaries with keys 'title', 'abstract', 'url', and 'rank' for each result.
-                - str or None: The URL of the next results page, or None if there is no next page.
-        """
-        try:
-            res = self.session.get(url=url)
-            res.encoding = "utf-8"
-            root = BeautifulSoup(res.text, "lxml")
-
-            list_data = []
-            ol_results = root.find("ol", id="b_results")
-            if not ol_results:
-                return [], None
-
-            for li in ol_results.find_all("li", class_="b_algo"):
-                title = ""
-                url = ""
-                abstract = ""
-                try:
-                    h2 = li.find("h2")
-                    if h2:
-                        title = h2.text.strip()
-                        url = h2.a["href"].strip()
-
-                    p = li.find("p")
-                    if p:
-                        abstract = p.text.strip()
-
-                    if ABSTRACT_MAX_LENGTH and len(abstract) > ABSTRACT_MAX_LENGTH:
-                        abstract = abstract[:ABSTRACT_MAX_LENGTH]
-
-                    rank_start += 1
-                    list_data.append(
-                        {
-                            "title": title,
-                            "abstract": abstract,
-                            "url": url,
-                            "rank": rank_start,
-                        }
-                    )
-                except Exception:
-                    continue
-
-            next_btn = root.find("a", title="Next page")
-            if not next_btn:
-                return list_data, None
-
-            next_url = BING_HOST_URL + next_btn["href"]
-            return list_data, next_url
-        except Exception as e:
-            logger.warning(f"Error parsing HTML: {e}")
-            return [], None
-
-    def perform_search(self, query, num_results=10, *args, **kwargs):
-        """Bing search engine."""
-        return self._search_sync(query, num_results=num_results)
diff --git a/openmanus_rl/agentgym/OpenManus/app/tool/search/duckduckgo_search.py b/openmanus_rl/agentgym/OpenManus/app/tool/search/duckduckgo_search.py
deleted file mode 100644
index 3dd5c52c..00000000
--- a/openmanus_rl/agentgym/OpenManus/app/tool/search/duckduckgo_search.py
+++ /dev/null
@@ -1,9 +0,0 @@
-from duckduckgo_search import DDGS
-
-from app.tool.search.base import WebSearchEngine
-
-
-class DuckDuckGoSearchEngine(WebSearchEngine):
-    async def perform_search(self, query, num_results=10, *args, **kwargs):
-        """DuckDuckGo search engine."""
-        return DDGS.text(query, num_results=num_results)
diff --git a/openmanus_rl/agentgym/OpenManus/app/tool/search/google_search.py b/openmanus_rl/agentgym/OpenManus/app/tool/search/google_search.py
deleted file mode 100644
index 425106d7..00000000
--- a/openmanus_rl/agentgym/OpenManus/app/tool/search/google_search.py
+++ /dev/null
@@ -1,9 +0,0 @@
-from googlesearch import search
-
-from app.tool.search.base import WebSearchEngine
-
-
-class GoogleSearchEngine(WebSearchEngine):
-    def perform_search(self, query, num_results=10, *args, **kwargs):
-        """Google search engine."""
-        return search(query, num_results=num_results)
diff --git a/openmanus_rl/agentgym/OpenManus/app/tool/str_replace_editor.py b/openmanus_rl/agentgym/OpenManus/app/tool/str_replace_editor.py
deleted file mode 100644
index a907f41e..00000000
--- a/openmanus_rl/agentgym/OpenManus/app/tool/str_replace_editor.py
+++ /dev/null
@@ -1,432 +0,0 @@
-"""File and directory manipulation tool with sandbox support."""
-
-from collections import defaultdict
-from pathlib import Path
-from typing import Any, DefaultDict, List, Literal, Optional, get_args
-
-from app.config import config
-from app.exceptions import ToolError
-from app.tool import BaseTool
-from app.tool.base import CLIResult, ToolResult
-from app.tool.file_operators import (
-    FileOperator,
-    LocalFileOperator,
-    PathLike,
-    SandboxFileOperator,
-)
-
-
-Command = Literal[
-    "view",
-    "create",
-    "str_replace",
-    "insert",
-    "undo_edit",
-]
-
-# Constants
-SNIPPET_LINES: int = 4
-MAX_RESPONSE_LEN: int = 16000
-TRUNCATED_MESSAGE: str = (
-    "<response clipped><NOTE>To save on context only part of this file has been shown to you. "
-    "You should retry this tool after you have searched inside the file with `grep -n` "
-    "in order to find the line numbers of what you are looking for.</NOTE>"
-)
-
-# Tool description
-_STR_REPLACE_EDITOR_DESCRIPTION = """Custom editing tool for viewing, creating and editing files
-* State is persistent across command calls and discussions with the user
-* If `path` is a file, `view` displays the result of applying `cat -n`. If `path` is a directory, `view` lists non-hidden files and directories up to 2 levels deep
-* The `create` command cannot be used if the specified `path` already exists as a file
-* If a `command` generates a long output, it will be truncated and marked with `<response clipped>`
-* The `undo_edit` command will revert the last edit made to the file at `path`
-
-Notes for using the `str_replace` command:
-* The `old_str` parameter should match EXACTLY one or more consecutive lines from the original file. Be mindful of whitespaces!
-* If the `old_str` parameter is not unique in the file, the replacement will not be performed. Make sure to include enough context in `old_str` to make it unique
-* The `new_str` parameter should contain the edited lines that should replace the `old_str`
-"""
-
-
-def maybe_truncate(
-    content: str, truncate_after: Optional[int] = MAX_RESPONSE_LEN
-) -> str:
-    """Truncate content and append a notice if content exceeds the specified length."""
-    if not truncate_after or len(content) <= truncate_after:
-        return content
-    return content[:truncate_after] + TRUNCATED_MESSAGE
-
-
-class StrReplaceEditor(BaseTool):
-    """A tool for viewing, creating, and editing files with sandbox support."""
-
-    name: str = "str_replace_editor"
-    description: str = _STR_REPLACE_EDITOR_DESCRIPTION
-    parameters: dict = {
-        "type": "object",
-        "properties": {
-            "command": {
-                "description": "The commands to run. Allowed options are: `view`, `create`, `str_replace`, `insert`, `undo_edit`.",
-                "enum": ["view", "create", "str_replace", "insert", "undo_edit"],
-                "type": "string",
-            },
-            "path": {
-                "description": "Absolute path to file or directory.",
-                "type": "string",
-            },
-            "file_text": {
-                "description": "Required parameter of `create` command, with the content of the file to be created.",
-                "type": "string",
-            },
-            "old_str": {
-                "description": "Required parameter of `str_replace` command containing the string in `path` to replace.",
-                "type": "string",
-            },
-            "new_str": {
-                "description": "Optional parameter of `str_replace` command containing the new string (if not given, no string will be added). Required parameter of `insert` command containing the string to insert.",
-                "type": "string",
-            },
-            "insert_line": {
-                "description": "Required parameter of `insert` command. The `new_str` will be inserted AFTER the line `insert_line` of `path`.",
-                "type": "integer",
-            },
-            "view_range": {
-                "description": "Optional parameter of `view` command when `path` points to a file. If none is given, the full file is shown. If provided, the file will be shown in the indicated line number range, e.g. [11, 12] will show lines 11 and 12. Indexing at 1 to start. Setting `[start_line, -1]` shows all lines from `start_line` to the end of the file.",
-                "items": {"type": "integer"},
-                "type": "array",
-            },
-        },
-        "required": ["command", "path"],
-    }
-    _file_history: DefaultDict[PathLike, List[str]] = defaultdict(list)
-    _local_operator: LocalFileOperator = LocalFileOperator()
-    _sandbox_operator: SandboxFileOperator = SandboxFileOperator()
-
-    # def _get_operator(self, use_sandbox: bool) -> FileOperator:
-    def _get_operator(self) -> FileOperator:
-        """Get the appropriate file operator based on execution mode."""
-        return (
-            self._sandbox_operator
-            if config.sandbox.use_sandbox
-            else self._local_operator
-        )
-
-    async def execute(
-        self,
-        *,
-        command: Command,
-        path: str,
-        file_text: str | None = None,
-        view_range: list[int] | None = None,
-        old_str: str | None = None,
-        new_str: str | None = None,
-        insert_line: int | None = None,
-        **kwargs: Any,
-    ) -> str:
-        """Execute a file operation command."""
-        # Get the appropriate file operator
-        operator = self._get_operator()
-
-        # Validate path and command combination
-        await self.validate_path(command, Path(path), operator)
-
-        # Execute the appropriate command
-        if command == "view":
-            result = await self.view(path, view_range, operator)
-        elif command == "create":
-            if file_text is None:
-                raise ToolError("Parameter `file_text` is required for command: create")
-            await operator.write_file(path, file_text)
-            self._file_history[path].append(file_text)
-            result = ToolResult(output=f"File created successfully at: {path}")
-        elif command == "str_replace":
-            if old_str is None:
-                raise ToolError(
-                    "Parameter `old_str` is required for command: str_replace"
-                )
-            result = await self.str_replace(path, old_str, new_str, operator)
-        elif command == "insert":
-            if insert_line is None:
-                raise ToolError(
-                    "Parameter `insert_line` is required for command: insert"
-                )
-            if new_str is None:
-                raise ToolError("Parameter `new_str` is required for command: insert")
-            result = await self.insert(path, insert_line, new_str, operator)
-        elif command == "undo_edit":
-            result = await self.undo_edit(path, operator)
-        else:
-            # This should be caught by type checking, but we include it for safety
-            raise ToolError(
-                f'Unrecognized command {command}. The allowed commands for the {self.name} tool are: {", ".join(get_args(Command))}'
-            )
-
-        return str(result)
-
-    async def validate_path(
-        self, command: str, path: Path, operator: FileOperator
-    ) -> None:
-        """Validate path and command combination based on execution environment."""
-        # Check if path is absolute
-        if not path.is_absolute():
-            raise ToolError(f"The path {path} is not an absolute path")
-
-        # Only check if path exists for non-create commands
-        if command != "create":
-            if not await operator.exists(path):
-                raise ToolError(
-                    f"The path {path} does not exist. Please provide a valid path."
-                )
-
-            # Check if path is a directory
-            is_dir = await operator.is_directory(path)
-            if is_dir and command != "view":
-                raise ToolError(
-                    f"The path {path} is a directory and only the `view` command can be used on directories"
-                )
-
-        # Check if file exists for create command
-        elif command == "create":
-            exists = await operator.exists(path)
-            if exists:
-                raise ToolError(
-                    f"File already exists at: {path}. Cannot overwrite files using command `create`."
-                )
-
-    async def view(
-        self,
-        path: PathLike,
-        view_range: Optional[List[int]] = None,
-        operator: FileOperator = None,
-    ) -> CLIResult:
-        """Display file or directory content."""
-        # Determine if path is a directory
-        is_dir = await operator.is_directory(path)
-
-        if is_dir:
-            # Directory handling
-            if view_range:
-                raise ToolError(
-                    "The `view_range` parameter is not allowed when `path` points to a directory."
-                )
-
-            return await self._view_directory(path, operator)
-        else:
-            # File handling
-            return await self._view_file(path, operator, view_range)
-
-    @staticmethod
-    async def _view_directory(path: PathLike, operator: FileOperator) -> CLIResult:
-        """Display directory contents."""
-        find_cmd = f"find {path} -maxdepth 2 -not -path '*/\\.*'"
-
-        # Execute command using the operator
-        returncode, stdout, stderr = await operator.run_command(find_cmd)
-
-        if not stderr:
-            stdout = (
-                f"Here's the files and directories up to 2 levels deep in {path}, "
-                f"excluding hidden items:\n{stdout}\n"
-            )
-
-        return CLIResult(output=stdout, error=stderr)
-
-    async def _view_file(
-        self,
-        path: PathLike,
-        operator: FileOperator,
-        view_range: Optional[List[int]] = None,
-    ) -> CLIResult:
-        """Display file content, optionally within a specified line range."""
-        # Read file content
-        file_content = await operator.read_file(path)
-        init_line = 1
-
-        # Apply view range if specified
-        if view_range:
-            if len(view_range) != 2 or not all(isinstance(i, int) for i in view_range):
-                raise ToolError(
-                    "Invalid `view_range`. It should be a list of two integers."
-                )
-
-            file_lines = file_content.split("\n")
-            n_lines_file = len(file_lines)
-            init_line, final_line = view_range
-
-            # Validate view range
-            if init_line < 1 or init_line > n_lines_file:
-                raise ToolError(
-                    f"Invalid `view_range`: {view_range}. Its first element `{init_line}` should be "
-                    f"within the range of lines of the file: {[1, n_lines_file]}"
-                )
-            if final_line > n_lines_file:
-                raise ToolError(
-                    f"Invalid `view_range`: {view_range}. Its second element `{final_line}` should be "
-                    f"smaller than the number of lines in the file: `{n_lines_file}`"
-                )
-            if final_line != -1 and final_line < init_line:
-                raise ToolError(
-                    f"Invalid `view_range`: {view_range}. Its second element `{final_line}` should be "
-                    f"larger or equal than its first `{init_line}`"
-                )
-
-            # Apply range
-            if final_line == -1:
-                file_content = "\n".join(file_lines[init_line - 1 :])
-            else:
-                file_content = "\n".join(file_lines[init_line - 1 : final_line])
-
-        # Format and return result
-        return CLIResult(
-            output=self._make_output(file_content, str(path), init_line=init_line)
-        )
-
-    async def str_replace(
-        self,
-        path: PathLike,
-        old_str: str,
-        new_str: Optional[str] = None,
-        operator: FileOperator = None,
-    ) -> CLIResult:
-        """Replace a unique string in a file with a new string."""
-        # Read file content and expand tabs
-        file_content = (await operator.read_file(path)).expandtabs()
-        old_str = old_str.expandtabs()
-        new_str = new_str.expandtabs() if new_str is not None else ""
-
-        # Check if old_str is unique in the file
-        occurrences = file_content.count(old_str)
-        if occurrences == 0:
-            raise ToolError(
-                f"No replacement was performed, old_str `{old_str}` did not appear verbatim in {path}."
-            )
-        elif occurrences > 1:
-            # Find line numbers of occurrences
-            file_content_lines = file_content.split("\n")
-            lines = [
-                idx + 1
-                for idx, line in enumerate(file_content_lines)
-                if old_str in line
-            ]
-            raise ToolError(
-                f"No replacement was performed. Multiple occurrences of old_str `{old_str}` "
-                f"in lines {lines}. Please ensure it is unique"
-            )
-
-        # Replace old_str with new_str
-        new_file_content = file_content.replace(old_str, new_str)
-
-        # Write the new content to the file
-        await operator.write_file(path, new_file_content)
-
-        # Save the original content to history
-        self._file_history[path].append(file_content)
-
-        # Create a snippet of the edited section
-        replacement_line = file_content.split(old_str)[0].count("\n")
-        start_line = max(0, replacement_line - SNIPPET_LINES)
-        end_line = replacement_line + SNIPPET_LINES + new_str.count("\n")
-        snippet = "\n".join(new_file_content.split("\n")[start_line : end_line + 1])
-
-        # Prepare the success message
-        success_msg = f"The file {path} has been edited. "
-        success_msg += self._make_output(
-            snippet, f"a snippet of {path}", start_line + 1
-        )
-        success_msg += "Review the changes and make sure they are as expected. Edit the file again if necessary."
-
-        return CLIResult(output=success_msg)
-
-    async def insert(
-        self,
-        path: PathLike,
-        insert_line: int,
-        new_str: str,
-        operator: FileOperator = None,
-    ) -> CLIResult:
-        """Insert text at a specific line in a file."""
-        # Read and prepare content
-        file_text = (await operator.read_file(path)).expandtabs()
-        new_str = new_str.expandtabs()
-        file_text_lines = file_text.split("\n")
-        n_lines_file = len(file_text_lines)
-
-        # Validate insert_line
-        if insert_line < 0 or insert_line > n_lines_file:
-            raise ToolError(
-                f"Invalid `insert_line` parameter: {insert_line}. It should be within "
-                f"the range of lines of the file: {[0, n_lines_file]}"
-            )
-
-        # Perform insertion
-        new_str_lines = new_str.split("\n")
-        new_file_text_lines = (
-            file_text_lines[:insert_line]
-            + new_str_lines
-            + file_text_lines[insert_line:]
-        )
-
-        # Create a snippet for preview
-        snippet_lines = (
-            file_text_lines[max(0, insert_line - SNIPPET_LINES) : insert_line]
-            + new_str_lines
-            + file_text_lines[insert_line : insert_line + SNIPPET_LINES]
-        )
-
-        # Join lines and write to file
-        new_file_text = "\n".join(new_file_text_lines)
-        snippet = "\n".join(snippet_lines)
-
-        await operator.write_file(path, new_file_text)
-        self._file_history[path].append(file_text)
-
-        # Prepare success message
-        success_msg = f"The file {path} has been edited. "
-        success_msg += self._make_output(
-            snippet,
-            "a snippet of the edited file",
-            max(1, insert_line - SNIPPET_LINES + 1),
-        )
-        success_msg += "Review the changes and make sure they are as expected (correct indentation, no duplicate lines, etc). Edit the file again if necessary."
-
-        return CLIResult(output=success_msg)
-
-    async def undo_edit(
-        self, path: PathLike, operator: FileOperator = None
-    ) -> CLIResult:
-        """Revert the last edit made to a file."""
-        if not self._file_history[path]:
-            raise ToolError(f"No edit history found for {path}.")
-
-        old_text = self._file_history[path].pop()
-        await operator.write_file(path, old_text)
-
-        return CLIResult(
-            output=f"Last edit to {path} undone successfully. {self._make_output(old_text, str(path))}"
-        )
-
-    def _make_output(
-        self,
-        file_content: str,
-        file_descriptor: str,
-        init_line: int = 1,
-        expand_tabs: bool = True,
-    ) -> str:
-        """Format file content for display with line numbers."""
-        file_content = maybe_truncate(file_content)
-        if expand_tabs:
-            file_content = file_content.expandtabs()
-
-        # Add line numbers to each line
-        file_content = "\n".join(
-            [
-                f"{i + init_line:6}\t{line}"
-                for i, line in enumerate(file_content.split("\n"))
-            ]
-        )
-
-        return (
-            f"Here's the result of running `cat -n` on {file_descriptor}:\n"
-            + file_content
-            + "\n"
-        )
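The editor required absolute paths, refused `create` on existing files, and demanded that `old_str` match exactly one location. A minimal sketch of the create-then-replace flow, assuming the local (non-sandbox) operator and an illustrative path:

```python
import asyncio

from app.tool.str_replace_editor import StrReplaceEditor  # module removed by this PR


async def demo():
    editor = StrReplaceEditor()
    # create fails if /tmp/demo.py already exists
    await editor.execute(command="create", path="/tmp/demo.py", file_text="x = 1\n")
    # old_str must appear verbatim exactly once in the file
    out = await editor.execute(
        command="str_replace", path="/tmp/demo.py", old_str="x = 1", new_str="x = 2"
    )
    print(out)  # includes a cat -n style snippet of the edited region


asyncio.run(demo())
```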
Ensure the command is properly formatted and does not contain any harmful instructions.", - } - }, - "required": ["command"], - } - process: Optional[asyncio.subprocess.Process] = None - current_path: str = os.getcwd() - lock: asyncio.Lock = asyncio.Lock() - - async def execute(self, command: str) -> CLIResult: - """ - Execute a terminal command asynchronously with persistent context. - - Args: - command (str): The terminal command to execute. - - Returns: - str: The output, and error of the command execution. - """ - # Split the command by & to handle multiple commands - commands = [cmd.strip() for cmd in command.split("&") if cmd.strip()] - final_output = CLIResult(output="", error="") - - for cmd in commands: - sanitized_command = self._sanitize_command(cmd) - - # Handle 'cd' command internally - if sanitized_command.lstrip().startswith("cd "): - result = await self._handle_cd_command(sanitized_command) - else: - async with self.lock: - try: - self.process = await asyncio.create_subprocess_shell( - sanitized_command, - stdout=asyncio.subprocess.PIPE, - stderr=asyncio.subprocess.PIPE, - cwd=self.current_path, - ) - stdout, stderr = await self.process.communicate() - result = CLIResult( - output=stdout.decode().strip(), - error=stderr.decode().strip(), - ) - except Exception as e: - result = CLIResult(output="", error=str(e)) - finally: - self.process = None - - # Combine outputs - if result.output: - final_output.output += ( - (result.output + "\n") if final_output.output else result.output - ) - if result.error: - final_output.error += ( - (result.error + "\n") if final_output.error else result.error - ) - - # Remove trailing newlines - final_output.output = final_output.output.rstrip() - final_output.error = final_output.error.rstrip() - return final_output - - async def execute_in_env(self, env_name: str, command: str) -> CLIResult: - """ - Execute a terminal command asynchronously within a specified Conda environment. - - Args: - env_name (str): The name of the Conda environment. - command (str): The terminal command to execute within the environment. - - Returns: - str: The output, and error of the command execution. - """ - sanitized_command = self._sanitize_command(command) - - # Construct the command to run within the Conda environment - # Using 'conda run -n env_name command' to execute without activating - conda_command = f"conda run -n {shlex.quote(env_name)} {sanitized_command}" - - return await self.execute(conda_command) - - async def _handle_cd_command(self, command: str) -> CLIResult: - """ - Handle 'cd' commands to change the current path. - - Args: - command (str): The 'cd' command to process. - - Returns: - TerminalOutput: The result of the 'cd' command. - """ - try: - parts = shlex.split(command) - if len(parts) < 2: - new_path = os.path.expanduser("~") - else: - new_path = os.path.expanduser(parts[1]) - - # Handle relative paths - if not os.path.isabs(new_path): - new_path = os.path.join(self.current_path, new_path) - - new_path = os.path.abspath(new_path) - - if os.path.isdir(new_path): - self.current_path = new_path - return CLIResult( - output=f"Changed directory to {self.current_path}", error="" - ) - else: - return CLIResult(output="", error=f"No such directory: {new_path}") - except Exception as e: - return CLIResult(output="", error=str(e)) - - @staticmethod - def _sanitize_command(command: str) -> str: - """ - Sanitize the command for safe execution. - - Args: - command (str): The command to sanitize. - - Returns: - str: The sanitized command. 
- """ - # Example sanitization: restrict certain dangerous commands - dangerous_commands = ["rm", "sudo", "shutdown", "reboot"] - try: - parts = shlex.split(command) - if any(cmd in dangerous_commands for cmd in parts): - raise ValueError("Use of dangerous commands is restricted.") - except Exception: - # If shlex.split fails, try basic string comparison - if any(cmd in command for cmd in dangerous_commands): - raise ValueError("Use of dangerous commands is restricted.") - - # Additional sanitization logic can be added here - return command - - async def close(self): - """Close the persistent shell process if it exists.""" - async with self.lock: - if self.process: - self.process.terminate() - try: - await asyncio.wait_for(self.process.wait(), timeout=5) - except asyncio.TimeoutError: - self.process.kill() - await self.process.wait() - finally: - self.process = None - - async def __aenter__(self): - """Enter the asynchronous context manager.""" - return self - - async def __aexit__(self, exc_type, exc_val, exc_tb): - """Exit the asynchronous context manager and close the process.""" - await self.close() diff --git a/openmanus_rl/agentgym/OpenManus/app/tool/terminate.py b/openmanus_rl/agentgym/OpenManus/app/tool/terminate.py deleted file mode 100644 index 8c2d82ca..00000000 --- a/openmanus_rl/agentgym/OpenManus/app/tool/terminate.py +++ /dev/null @@ -1,25 +0,0 @@ -from app.tool.base import BaseTool - - -_TERMINATE_DESCRIPTION = """Terminate the interaction when the request is met OR if the assistant cannot proceed further with the task. -When you have finished all the tasks, call this tool to end the work.""" - - -class Terminate(BaseTool): - name: str = "terminate" - description: str = _TERMINATE_DESCRIPTION - parameters: dict = { - "type": "object", - "properties": { - "status": { - "type": "string", - "description": "The finish status of the interaction.", - "enum": ["success", "failure"], - } - }, - "required": ["status"], - } - - async def execute(self, status: str) -> str: - """Finish the current execution""" - return f"The interaction has been completed with status: {status}" diff --git a/openmanus_rl/agentgym/OpenManus/app/tool/tool_collection.py b/openmanus_rl/agentgym/OpenManus/app/tool/tool_collection.py deleted file mode 100644 index 41c4d845..00000000 --- a/openmanus_rl/agentgym/OpenManus/app/tool/tool_collection.py +++ /dev/null @@ -1,58 +0,0 @@ -"""Collection classes for managing multiple tools.""" -from typing import Any, Dict, List - -from app.exceptions import ToolError -from app.tool.base import BaseTool, ToolFailure, ToolResult - - -class ToolCollection: - """A collection of defined tools.""" - - class Config: - arbitrary_types_allowed = True - - def __init__(self, *tools: BaseTool): - self.tools = tools - self.tool_map = {tool.name: tool for tool in tools} - - def __iter__(self): - return iter(self.tools) - - def to_params(self) -> List[Dict[str, Any]]: - return [tool.to_param() for tool in self.tools] - - async def execute( - self, *, name: str, tool_input: Dict[str, Any] = None - ) -> ToolResult: - tool = self.tool_map.get(name) - if not tool: - return ToolFailure(error=f"Tool {name} is invalid") - try: - result = await tool(**tool_input) - return result - except ToolError as e: - return ToolFailure(error=e.message) - - async def execute_all(self) -> List[ToolResult]: - """Execute all tools in the collection sequentially.""" - results = [] - for tool in self.tools: - try: - result = await tool() - results.append(result) - except ToolError as e: - 
results.append(ToolFailure(error=e.message)) - return results - - def get_tool(self, name: str) -> BaseTool: - return self.tool_map.get(name) - - def add_tool(self, tool: BaseTool): - self.tools += (tool,) - self.tool_map[tool.name] = tool - return self - - def add_tools(self, *tools: BaseTool): - for tool in tools: - self.add_tool(tool) - return self diff --git a/openmanus_rl/agentgym/OpenManus/app/tool/web_search.py b/openmanus_rl/agentgym/OpenManus/app/tool/web_search.py deleted file mode 100644 index cb139342..00000000 --- a/openmanus_rl/agentgym/OpenManus/app/tool/web_search.py +++ /dev/null @@ -1,101 +0,0 @@ -import asyncio -from typing import List - -from tenacity import retry, stop_after_attempt, wait_exponential - -from app.config import config -from app.tool.base import BaseTool -from app.tool.search import ( - BaiduSearchEngine, - BingSearchEngine, - DuckDuckGoSearchEngine, - GoogleSearchEngine, - WebSearchEngine, -) - - -class WebSearch(BaseTool): - name: str = "web_search" - description: str = """Perform a web search and return a list of relevant links. - This function attempts to use the primary search engine API to get up-to-date results. - If an error occurs, it falls back to an alternative search engine.""" - parameters: dict = { - "type": "object", - "properties": { - "query": { - "type": "string", - "description": "(required) The search query to submit to the search engine.", - }, - "num_results": { - "type": "integer", - "description": "(optional) The number of search results to return. Default is 10.", - "default": 10, - }, - }, - "required": ["query"], - } - _search_engine: dict[str, WebSearchEngine] = { - "google": GoogleSearchEngine(), - "baidu": BaiduSearchEngine(), - "duckduckgo": DuckDuckGoSearchEngine(), - "bing": BingSearchEngine(), - } - - async def execute(self, query: str, num_results: int = 10) -> List[str]: - """ - Execute a Web search and return a list of URLs. - - Args: - query (str): The search query to submit to the search engine. - num_results (int, optional): The number of search results to return. Default is 10. - - Returns: - List[str]: A list of URLs matching the search query. - """ - engine_order = self._get_engine_order() - for engine_name in engine_order: - engine = self._search_engine[engine_name] - try: - links = await self._perform_search_with_engine( - engine, query, num_results - ) - if links: - return links - except Exception as e: - print(f"Search engine '{engine_name}' failed with error: {e}") - return [] - - def _get_engine_order(self) -> List[str]: - """ - Determines the order in which to try search engines. - Preferred engine is first (based on configuration), followed by the remaining engines. - - Returns: - List[str]: Ordered list of search engine names. 
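-            Example: with engine = "baidu" in the config, the order is ["baidu", "google", "duckduckgo", "bing"].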
- """ - preferred = "google" - if config.search_config and config.search_config.engine: - preferred = config.search_config.engine.lower() - - engine_order = [] - if preferred in self._search_engine: - engine_order.append(preferred) - for key in self._search_engine: - if key not in engine_order: - engine_order.append(key) - return engine_order - - @retry( - stop=stop_after_attempt(3), - wait=wait_exponential(multiplier=1, min=1, max=10), - ) - async def _perform_search_with_engine( - self, - engine: WebSearchEngine, - query: str, - num_results: int, - ) -> List[str]: - loop = asyncio.get_event_loop() - return await loop.run_in_executor( - None, lambda: list(engine.perform_search(query, num_results=num_results)) - ) diff --git a/openmanus_rl/agentgym/OpenManus/assets/community_group.jpg b/openmanus_rl/agentgym/OpenManus/assets/community_group.jpg deleted file mode 100644 index 3998f0d4..00000000 Binary files a/openmanus_rl/agentgym/OpenManus/assets/community_group.jpg and /dev/null differ diff --git a/openmanus_rl/agentgym/OpenManus/assets/logo.jpg b/openmanus_rl/agentgym/OpenManus/assets/logo.jpg deleted file mode 100644 index 634b8f68..00000000 Binary files a/openmanus_rl/agentgym/OpenManus/assets/logo.jpg and /dev/null differ diff --git a/openmanus_rl/agentgym/OpenManus/config/.gitignore b/openmanus_rl/agentgym/OpenManus/config/.gitignore deleted file mode 100644 index eaff1825..00000000 --- a/openmanus_rl/agentgym/OpenManus/config/.gitignore +++ /dev/null @@ -1,2 +0,0 @@ -# prevent the local config file from being uploaded to the remote repository -config.toml diff --git a/openmanus_rl/agentgym/OpenManus/config/config.example.toml b/openmanus_rl/agentgym/OpenManus/config/config.example.toml deleted file mode 100644 index d5750a2e..00000000 --- a/openmanus_rl/agentgym/OpenManus/config/config.example.toml +++ /dev/null @@ -1,77 +0,0 @@ -# Global LLM configuration -[llm] -model = "claude-3-7-sonnet-20250219" # The LLM model to use -base_url = "https://api.anthropic.com/v1/" # API endpoint URL -api_key = "YOUR_API_KEY" # Your API key -max_tokens = 8192 # Maximum number of tokens in the response -temperature = 0.0 # Controls randomness - -# [llm] #AZURE OPENAI: -# api_type= 'azure' -# model = "YOUR_MODEL_NAME" #"gpt-4o-mini" -# base_url = "{YOUR_AZURE_ENDPOINT.rstrip('/')}/openai/deployments/{AZURE_DEPOLYMENT_ID}" -# api_key = "AZURE API KEY" -# max_tokens = 8096 -# temperature = 0.0 -# api_version="AZURE API VERSION" #"2024-08-01-preview" - -# [llm] #OLLAMA: -# api_type = 'ollama' -# model = "llama3.2" -# base_url = "http://localhost:11434/v1" -# api_key = "ollama" -# max_tokens = 4096 -# temperature = 0.0 - -# Optional configuration for specific LLM models -[llm.vision] -model = "claude-3-7-sonnet-20250219" # The vision model to use -base_url = "https://api.anthropic.com/v1/" # API endpoint URL for vision model -api_key = "YOUR_API_KEY" # Your API key for vision model -max_tokens = 8192 # Maximum number of tokens in the response -temperature = 0.0 # Controls randomness for vision model - -# [llm.vision] #OLLAMA VISION: -# api_type = 'ollama' -# model = "llama3.2-vision" -# base_url = "http://localhost:11434/v1" -# api_key = "ollama" -# max_tokens = 4096 -# temperature = 0.0 - -# Optional configuration for specific browser configuration -# [browser] -# Whether to run browser in headless mode (default: false) -#headless = false -# Disable browser security features (default: true) -#disable_security = true -# Extra arguments to pass to the browser -#extra_chromium_args = [] -# Path to a Chrome 
instance to use to connect to your normal browser -# e.g. '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome' -#chrome_instance_path = "" -# Connect to a browser instance via WebSocket -#wss_url = "" -# Connect to a browser instance via CDP -#cdp_url = "" - -# Optional configuration, Proxy settings for the browser -# [browser.proxy] -# server = "http://proxy-server:port" -# username = "proxy-username" -# password = "proxy-password" - -# Optional configuration, Search settings. -# [search] -# Search engine for agent to use. Default is "Google", can be set to "Baidu" or "DuckDuckGo". -#engine = "Google" - -## Sandbox configuration -#[sandbox] -#use_sandbox = false -#image = "python:3.12-slim" -#work_dir = "/workspace" -#memory_limit = "1g" # 512m -#cpu_limit = 2.0 -#timeout = 300 -#network_enabled = true diff --git a/openmanus_rl/agentgym/OpenManus/examples/japan-travel-plan/japan_travel_guide_instructions.txt b/openmanus_rl/agentgym/OpenManus/examples/japan-travel-plan/japan_travel_guide_instructions.txt deleted file mode 100644 index a45128fb..00000000 --- a/openmanus_rl/agentgym/OpenManus/examples/japan-travel-plan/japan_travel_guide_instructions.txt +++ /dev/null @@ -1,62 +0,0 @@ -JAPAN TRAVEL HANDBOOK - GUIDE TO VERSIONS - -Location: D:/OpenManus/ - -1. DETAILED DIGITAL VERSION -File: japan_travel_handbook.html -Best for: Desktop/laptop viewing -Features: -- Complete comprehensive guide -- Detailed itinerary -- Full proposal planning section -- All hotel recommendations -- Comprehensive budget breakdown -Usage: Open in web browser for trip planning and detailed reference - -2. PRINT-FRIENDLY VERSION -File: japan_travel_handbook_print.html -Best for: Physical reference during travel -Features: -- Condensed essential information -- Optimized for paper printing -- Clear, printer-friendly formatting -- Quick reference tables -Usage: Print and keep in travel documents folder - -3. MOBILE-OPTIMIZED VERSION -File: japan_travel_handbook_mobile.html -Best for: On-the-go reference during trip -Features: -- Touch-friendly interface -- Collapsible sections -- Quick access emergency buttons -- Dark mode support -- Responsive design -Usage: Save to phone's browser bookmarks for quick access - -RECOMMENDED SETUP: -1. Before Trip: - - Use detailed version for planning - - Print the print-friendly version - - Save mobile version to phone - -2. During Trip: - - Keep printed version with travel documents - - Use mobile version for daily reference - - Access detailed version when needed for specific information - -3. Emergency Access: - - Mobile version has quick-access emergency information - - Keep printed version as backup - - All emergency numbers and contacts in both versions - -Note: All versions contain the same core information but are formatted differently for optimal use in different situations. - -IMPORTANT DATES: -- Trip Duration: April 15-23, 2024 -- Proposal Day: April 19, 2024 -- Key Reservation Deadlines: - * Flights: Book by January 2024 - * Hotels: Book by February 2024 - * Restaurant Reservations: Book by January 2024 - * JR Pass: Purchase by March 2024 diff --git a/openmanus_rl/agentgym/OpenManus/examples/japan-travel-plan/japan_travel_handbook.html b/openmanus_rl/agentgym/OpenManus/examples/japan-travel-plan/japan_travel_handbook.html deleted file mode 100644 index 5b5965ec..00000000 --- a/openmanus_rl/agentgym/OpenManus/examples/japan-travel-plan/japan_travel_handbook.html +++ /dev/null @@ -1,124 +0,0 @@ - - - - - - Japan Travel Handbook - April 15-23, 2024 - - - -
-[Previous content remains the same...]
-
-🌸 Proposal Planning Guide 🌸
-
-Ring Security & Transport
-  • Carrying the Ring:
-    • Always keep the ring in your carry-on luggage, never in checked bags
-    • Use a discrete, non-branded box or case
-    • Consider travel insurance that covers jewelry
-    • Keep receipt/appraisal documentation separate from the ring
-  • Airport Security Tips:
-    • No need to declare the ring unless value exceeds ¥1,000,000 (~$6,700)
-    • If asked, simply state it's "personal jewelry"
-    • Consider requesting private screening to maintain surprise
-    • Keep ring in original box until through security, then transfer to more discrete case
-
-Proposal Location Details - Maruyama Park
-  • Best Timing:
-    • Date: April 19 (Day 5)
-    • Time: 5:30 PM (30 minutes before sunset)
-    • Park closes at 8:00 PM in April
-  • Specific Spot Recommendations:
-    • Primary Location: Near the famous weeping cherry tree
-      - Less crowded in early evening
-      - Beautiful illumination starts at dusk
-      - Iconic Kyoto backdrop
-    • Backup Location: Gion Shirakawa area
-      - Atmospheric stone-paved street
-      - Traditional buildings and cherry trees
-      - Beautiful in light rain
-
-Proposal Day Planning
-  • Morning Preparation:
-    • Confirm weather forecast
-    • Transfer ring to secure pocket/bag
-    • Have backup indoor location details ready
-  • Suggested Timeline:
-    • 4:00 PM: Start heading to Maruyama Park area
-    • 4:30 PM: Light refreshments at nearby tea house
-    • 5:15 PM: Begin walk through park
-    • 5:30 PM: Arrive at proposal spot
-    • 6:00 PM: Sunset and illumination begins
-    • 7:00 PM: Celebratory dinner reservation
-
-Celebration Dinner Options
-  • Traditional Japanese: Kikunoi Roan
-    - Intimate 2-star Michelin restaurant
-    - Advance reservation required (3 months)
-    - Price: ¥15,000-20,000 per person
-  • Modern Fusion: The Sodoh
-    - Beautiful garden views
-    - Western-style seating available
-    - Price: ¥12,000-15,000 per person
-
-Important Notes:
-  • Keep proposal plans in separate notes from shared itinerary
-  • Have a backup plan in case of rain (indoor locations listed above)
-  • Consider hiring a local photographer to capture the moment
-  • Save restaurant staff contact info in case of timing changes
diff --git a/openmanus_rl/agentgym/OpenManus/examples/japan-travel-plan/japan_travel_handbook_mobile.html b/openmanus_rl/agentgym/OpenManus/examples/japan-travel-plan/japan_travel_handbook_mobile.html
deleted file mode 100644
index 00e1a92c..00000000
--- a/openmanus_rl/agentgym/OpenManus/examples/japan-travel-plan/japan_travel_handbook_mobile.html
+++ /dev/null
@@ -1,255 +0,0 @@
-Japan Travel Guide (Mobile)
-Japan Travel Guide
-April 15-23, 2024
-
-Emergency Contacts
-  • 🚑 Emergency: 119
-  • 👮 Police: 110
-  • 🏢 US Embassy: +81-3-3224-5000
-  • ℹ️ Tourist Info: 03-3201-3331
-
-Itinerary
-  | Date   | Location | Activities                |
-  | Apr 15 | Tokyo    | Arrival, Shinjuku         |
-  | Apr 16 | Tokyo    | Meiji, Harajuku, Senso-ji |
-  | Apr 17 | Tokyo    | Tea Ceremony, Budokan     |
-  | Apr 18 | Kyoto    | Travel, Kinkaku-ji        |
-  | Apr 19 | Kyoto    | Fushimi Inari, Proposal   |
-  | Apr 20 | Nara     | Deer Park, Temples        |
-  | Apr 21 | Tokyo    | Return, Bay Cruise        |
-
-Phrases
-  | English     | Japanese         |
-  | Thank you   | ありがとう       |
-  | Excuse me   | すみません       |
-  | Please      | お願いします     |
-  | Where is... | ...はどこですか  |
-  | Help!       | 助けて!         |
-
-Key Routes
-  • Tokyo-Kyoto: 2h15m
-  • Kyoto-Nara: 45m
-  • Last trains: ~midnight
-  JR Pass: Activate April 15
-
-April 19 Timeline
-  • 4:00 PM: Head to Maruyama Park
-  • 5:30 PM: Arrive at spot
-  • 7:00 PM: Dinner at Kikunoi Roan
-  Backup: Gion Shirakawa area
-
-Budget
-  | Item       | Budget     |
-  | Hotels     | $1500-2000 |
-  | Transport  | $600-800   |
-  | Food       | $800-1000  |
-  | Activities | $600-800   |
-  | Shopping   | $400-500   |
diff --git a/openmanus_rl/agentgym/OpenManus/examples/japan-travel-plan/japan_travel_handbook_print.html b/openmanus_rl/agentgym/OpenManus/examples/japan-travel-plan/japan_travel_handbook_print.html
deleted file mode 100644
index b924628c..00000000
--- a/openmanus_rl/agentgym/OpenManus/examples/japan-travel-plan/japan_travel_handbook_print.html
+++ /dev/null
@@ -1,162 +0,0 @@
-Japan Travel Handbook (Print Version) - April 15-23, 2024

-Japan Travel Handbook (Print Version)
-Trip Dates: April 15-23, 2024
-
-Emergency Contacts & Important Information
-  • Emergency in Japan: 119 (Ambulance/Fire) / 110 (Police)
-  • US Embassy Tokyo: +81-3-3224-5000
-  • Tourist Information Hotline: 03-3201-3331
-  • Your Travel Insurance: [Write number here]
-
-Daily Itinerary Summary
-  | Date   | Location   | Key Activities                              |
-  | Apr 15 | Tokyo      | Arrival, Shinjuku area exploration          |
-  | Apr 16 | Tokyo      | Meiji Shrine, Harajuku, Senso-ji, Skytree   |
-  | Apr 17 | Tokyo      | Tea Ceremony, Budokan, Yanaka Ginza         |
-  | Apr 18 | Kyoto      | Travel to Kyoto, Kinkaku-ji, Gion           |
-  | Apr 19 | Kyoto      | Fushimi Inari, Arashiyama, Evening Proposal |
-  | Apr 20 | Nara/Kyoto | Nara Park day trip, deer feeding            |
-  | Apr 21 | Tokyo      | Return to Tokyo, bay cruise                 |
-
-Essential Japanese Phrases
-  | English                | Japanese             | When to Use            |
-  | Arigatou gozaimasu     | ありがとうございます | Thank you (formal)     |
-  | Sumimasen              | すみません           | Excuse me/Sorry        |
-  | Onegaishimasu          | お願いします         | Please                 |
-  | Toire wa doko desu ka? | トイレはどこですか?  | Where is the bathroom? |
-  | Eigo ga hanasemasu ka? | 英語が話せますか?    | Do you speak English?  |
-
-Transportation Notes
-  • JR Pass: Activate on April 15
-  • Tokyo-Kyoto Shinkansen: ~2h15m
-  • Kyoto-Nara Local Train: ~45m
-  • Last trains: Usually around midnight
-  • Keep ¥3000 for unexpected taxi rides
-
-Proposal Day Timeline (April 19)
-  | Time    | Activity              | Notes                       |
-  | 4:00 PM | Head to Maruyama Park | Check weather first         |
-  | 4:30 PM | Tea house visit       | Light refreshments          |
-  | 5:15 PM | Park walk begins      | Head to weeping cherry tree |
-  | 5:30 PM | Arrive at spot        | Find quiet area             |
-  | 7:00 PM | Dinner reservation    | Kikunoi Roan                |
-  Backup Location: Gion Shirakawa area (in case of rain)
-
-Quick Reference Budget
-  | Item       | Budget (USD) | Notes                  |
-  | Hotels     | 1500-2000    | Pre-booked             |
-  | Transport  | 600-800      | Including JR Pass      |
-  | Food       | 800-1000     | ~$60/person/day        |
-  | Activities | 600-800      | Including tea ceremony |
-  | Shopping   | 400-500      | Souvenirs/gifts        |
- - diff --git a/openmanus_rl/agentgym/OpenManus/examples/pictures/japan-travel-plan-1.png b/openmanus_rl/agentgym/OpenManus/examples/pictures/japan-travel-plan-1.png deleted file mode 100644 index e9e344e8..00000000 Binary files a/openmanus_rl/agentgym/OpenManus/examples/pictures/japan-travel-plan-1.png and /dev/null differ diff --git a/openmanus_rl/agentgym/OpenManus/examples/pictures/japan-travel-plan-2.png b/openmanus_rl/agentgym/OpenManus/examples/pictures/japan-travel-plan-2.png deleted file mode 100644 index 88958ae3..00000000 Binary files a/openmanus_rl/agentgym/OpenManus/examples/pictures/japan-travel-plan-2.png and /dev/null differ diff --git a/openmanus_rl/agentgym/OpenManus/examples/readme.md b/openmanus_rl/agentgym/OpenManus/examples/readme.md deleted file mode 100644 index e18592ee..00000000 --- a/openmanus_rl/agentgym/OpenManus/examples/readme.md +++ /dev/null @@ -1,16 +0,0 @@ -# Examples - -We put some examples in the `examples` directory. All the examples use the same prompt -as [Manus](https://manus.im/?utm_source=ai-bot.cn). - -The Model we use is `claude3.5`. - -## Japan Travel Plan -**Prompt**: -``` -I need a 7-day Japan itinerary for April 15-23 from Seattle, with a $2500-5000 budget for my fiancée and me. We love historical sites, hidden gems, and Japanese culture (kendo, tea ceremonies, Zen meditation). We want to see Nara's deer and explore cities on foot. I plan to propose during this trip and need a special location recommendation. Please provide a detailed itinerary and a simple HTML travel handbook with maps, attraction descriptions, essential Japanese phrases, and travel tips we can reference throughout our journey. -``` -**preview**: -![alt text](./pictures/japan-travel-plan-1.png) - -![alt text](./pictures/japan-travel-plan-2.png) diff --git a/openmanus_rl/agentgym/OpenManus/main.py b/openmanus_rl/agentgym/OpenManus/main.py deleted file mode 100644 index 60a0032b..00000000 --- a/openmanus_rl/agentgym/OpenManus/main.py +++ /dev/null @@ -1,23 +0,0 @@ -import asyncio - -from app.agent.manus import Manus -from app.logger import logger - - -async def main(): - agent = Manus() - try: - prompt = input("Enter your prompt: ") - if not prompt.strip(): - logger.warning("Empty prompt provided.") - return - - logger.warning("Processing your request...") - await agent.run(prompt) - logger.info("Request processing completed.") - except KeyboardInterrupt: - logger.warning("Operation interrupted.") - - -if __name__ == "__main__": - asyncio.run(main()) diff --git a/openmanus_rl/agentgym/OpenManus/requirements.txt b/openmanus_rl/agentgym/OpenManus/requirements.txt deleted file mode 100644 index 7e7b82fa..00000000 --- a/openmanus_rl/agentgym/OpenManus/requirements.txt +++ /dev/null @@ -1,33 +0,0 @@ -pydantic~=2.10.6 -openai~=1.66.3 -tenacity~=9.0.0 -pyyaml~=6.0.2 -loguru~=0.7.3 -numpy -datasets~=3.2.0 -fastapi~=0.115.11 -tiktoken~=0.9.0 - -html2text~=2024.2.26 -gymnasium~=1.0.0 -pillow~=10.4.0 -browsergym~=0.13.3 -uvicorn~=0.34.0 -unidiff~=0.7.5 -browser-use~=0.1.40 -googlesearch-python~=1.3.0 -baidusearch~=1.0.3 -duckduckgo_search~=7.5.1 - -aiofiles~=24.1.0 -pydantic_core~=2.27.2 -colorama~=0.4.6 -playwright~=1.50.0 - -docker~=7.1.0 -pytest~=8.3.5 -pytest-asyncio~=0.25.3 - -mcp~=1.4.1 -httpx>=0.27.0 -tomli>=2.0.0 diff --git a/openmanus_rl/agentgym/OpenManus/run_flow.py b/openmanus_rl/agentgym/OpenManus/run_flow.py deleted file mode 100644 index 66872a7a..00000000 --- a/openmanus_rl/agentgym/OpenManus/run_flow.py +++ /dev/null @@ -1,50 +0,0 @@ -import asyncio -import time - 
-from app.agent.manus import Manus
-from app.flow.base import FlowType
-from app.flow.flow_factory import FlowFactory
-from app.logger import logger
-
-
-async def run_flow():
-    agents = {
-        "manus": Manus(),
-    }
-
-    try:
-        prompt = input("Enter your prompt: ")
-
-        if not prompt.strip():
-            logger.warning("Empty prompt provided.")
-            return
-
-        flow = FlowFactory.create_flow(
-            flow_type=FlowType.PLANNING,
-            agents=agents,
-        )
-        logger.warning("Processing your request...")
-
-        try:
-            start_time = time.time()
-            result = await asyncio.wait_for(
-                flow.execute(prompt),
-                timeout=3600,  # 60 minute timeout for the entire execution
-            )
-            elapsed_time = time.time() - start_time
-            logger.info(f"Request processed in {elapsed_time:.2f} seconds")
-            logger.info(result)
-        except asyncio.TimeoutError:
-            logger.error("Request processing timed out after 1 hour")
-            logger.info(
-                "Operation terminated due to timeout. Please try a simpler request."
-            )
-
-    except KeyboardInterrupt:
-        logger.info("Operation cancelled by user.")
-    except Exception as e:
-        logger.error(f"Error: {str(e)}")
-
-
-if __name__ == "__main__":
-    asyncio.run(run_flow())
diff --git a/openmanus_rl/agentgym/OpenManus/run_mcp.py b/openmanus_rl/agentgym/OpenManus/run_mcp.py
deleted file mode 100644
index 9cb36715..00000000
--- a/openmanus_rl/agentgym/OpenManus/run_mcp.py
+++ /dev/null
@@ -1,107 +0,0 @@
-#!/usr/bin/env python
-import argparse
-import asyncio
-import sys
-
-from app.agent.mcp import MCPAgent
-from app.config import config
-from app.logger import logger
-
-
-class MCPRunner:
-    """Runner class for MCP Agent with proper path handling and configuration."""
-
-    def __init__(self):
-        self.root_path = config.root_path
-        self.server_script = self.root_path / "app" / "mcp" / "server.py"
-        self.agent = MCPAgent()
-
-    async def initialize(self, connection_type: str, server_url: str = None) -> None:
-        """Initialize the MCP agent with the appropriate connection."""
-        logger.info(f"Initializing MCPAgent with {connection_type} connection...")
-
-        if connection_type == "stdio":
-            await self.agent.initialize(
-                connection_type="stdio",
-                command=sys.executable,
-                args=[str(self.server_script)],
-            )
-        else:  # sse
-            await self.agent.initialize(connection_type="sse", server_url=server_url)
-
-        logger.info(f"Connected to MCP server via {connection_type}")
-
-    async def run_interactive(self) -> None:
-        """Run the agent in interactive mode."""
-        print("\nMCP Agent Interactive Mode (type 'exit' to quit)\n")
-        while True:
-            user_input = input("\nEnter your request: ")
-            if user_input.lower() in ["exit", "quit", "q"]:
-                break
-            response = await self.agent.run(user_input)
-            print(f"\nAgent: {response}")
-
-    async def run_single_prompt(self, prompt: str) -> None:
-        """Run the agent with a single prompt."""
-        await self.agent.run(prompt)
-
-    async def run_default(self) -> None:
-        """Run the agent in default mode."""
-        await self.agent.run(
-            "Hello, what tools are available to me? Terminate after you have listed the tools."
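-            # A fixed prompt that makes the agent enumerate its tools and then terminate.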
- ) - - async def cleanup(self) -> None: - """Clean up agent resources.""" - await self.agent.cleanup() - logger.info("Session ended") - - -def parse_args() -> argparse.Namespace: - """Parse command line arguments.""" - parser = argparse.ArgumentParser(description="Run the MCP Agent") - parser.add_argument( - "--connection", - "-c", - choices=["stdio", "sse"], - default="stdio", - help="Connection type: stdio or sse", - ) - parser.add_argument( - "--server-url", - default="http://127.0.0.1:8000/sse", - help="URL for SSE connection", - ) - parser.add_argument( - "--interactive", "-i", action="store_true", help="Run in interactive mode" - ) - parser.add_argument("--prompt", "-p", help="Single prompt to execute and exit") - return parser.parse_args() - - -async def run_mcp() -> None: - """Main entry point for the MCP runner.""" - args = parse_args() - runner = MCPRunner() - - try: - await runner.initialize(args.connection, args.server_url) - - if args.prompt: - await runner.run_single_prompt(args.prompt) - elif args.interactive: - await runner.run_interactive() - else: - await runner.run_default() - - except KeyboardInterrupt: - logger.info("Program interrupted by user") - except Exception as e: - logger.error(f"Error running MCPAgent: {str(e)}", exc_info=True) - sys.exit(1) - finally: - await runner.cleanup() - - -if __name__ == "__main__": - asyncio.run(run_mcp()) diff --git a/openmanus_rl/agentgym/OpenManus/setup.py b/openmanus_rl/agentgym/OpenManus/setup.py deleted file mode 100644 index eb36dac1..00000000 --- a/openmanus_rl/agentgym/OpenManus/setup.py +++ /dev/null @@ -1,49 +0,0 @@ -from setuptools import find_packages, setup - - -with open("README.md", "r", encoding="utf-8") as fh: - long_description = fh.read() - -setup( - name="openmanus", - version="0.1.0", - author="mannaandpoem and OpenManus Team", - author_email="mannaandpoem@gmail.com", - description="A versatile agent that can solve various tasks using multiple tools", - long_description=long_description, - long_description_content_type="text/markdown", - url="https://github.com/mannaandpoem/OpenManus", - packages=find_packages(), - install_requires=[ - "pydantic~=2.10.4", - "openai>=1.58.1,<1.67.0", - "tenacity~=9.0.0", - "pyyaml~=6.0.2", - "loguru~=0.7.3", - "numpy", - "datasets~=3.2.0", - "html2text~=2024.2.26", - "gymnasium~=1.0.0", - "pillow~=10.4.0", - "browsergym~=0.13.3", - "uvicorn~=0.34.0", - "unidiff~=0.7.5", - "browser-use~=0.1.40", - "googlesearch-python~=1.3.0", - "aiofiles~=24.1.0", - "pydantic_core>=2.27.2,<2.28.0", - "colorama~=0.4.6", - ], - classifiers=[ - "Programming Language :: Python :: 3", - "Programming Language :: Python :: 3.12", - "License :: OSI Approved :: MIT License", - "Operating System :: OS Independent", - ], - python_requires=">=3.12", - entry_points={ - "console_scripts": [ - "openmanus=main:main", - ], - }, -) diff --git a/openmanus_rl/agentgym/OpenManus/tests/sandbox/test_client.py b/openmanus_rl/agentgym/OpenManus/tests/sandbox/test_client.py deleted file mode 100644 index 6b2c61f2..00000000 --- a/openmanus_rl/agentgym/OpenManus/tests/sandbox/test_client.py +++ /dev/null @@ -1,110 +0,0 @@ -import tempfile -from pathlib import Path -from typing import AsyncGenerator - -import pytest -import pytest_asyncio - -from app.config import SandboxSettings -from app.sandbox.client import LocalSandboxClient, create_sandbox_client - - -@pytest_asyncio.fixture(scope="function") -async def local_client() -> AsyncGenerator[LocalSandboxClient, None]: - """Creates a local sandbox client for testing.""" - 
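-    # The fixture yields the client to the test body; cleanup always runs via the finally block.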
    client = create_sandbox_client()
-    try:
-        yield client
-    finally:
-        await client.cleanup()
-
-
-@pytest.fixture(scope="function")
-def temp_dir() -> Path:
-    """Creates a temporary directory for testing."""
-    with tempfile.TemporaryDirectory() as tmp_dir:
-        yield Path(tmp_dir)
-
-
-@pytest.mark.asyncio
-async def test_sandbox_creation(local_client: LocalSandboxClient):
-    """Tests sandbox creation with specific configuration."""
-    config = SandboxSettings(
-        image="python:3.12-slim",
-        work_dir="/workspace",
-        memory_limit="512m",
-        cpu_limit=0.5,
-    )
-
-    await local_client.create(config)
-    result = await local_client.run_command("python3 --version")
-    assert "Python 3.12" in result
-
-
-@pytest.mark.asyncio
-async def test_local_command_execution(local_client: LocalSandboxClient):
-    """Tests command execution in local sandbox."""
-    await local_client.create()
-
-    result = await local_client.run_command("echo 'test'")
-    assert result.strip() == "test"
-
-    with pytest.raises(Exception):
-        await local_client.run_command("sleep 10", timeout=1)
-
-
-@pytest.mark.asyncio
-async def test_local_file_operations(local_client: LocalSandboxClient, temp_dir: Path):
-    """Tests file operations in local sandbox."""
-    await local_client.create()
-
-    # Test write and read operations
-    test_content = "Hello, World!"
-    await local_client.write_file("/workspace/test.txt", test_content)
-    content = await local_client.read_file("/workspace/test.txt")
-    assert content.strip() == test_content
-
-    # Test copying file to container
-    src_file = temp_dir / "src.txt"
-    src_file.write_text("Copy to container")
-    await local_client.copy_to(str(src_file), "/workspace/copied.txt")
-    content = await local_client.read_file("/workspace/copied.txt")
-    assert content.strip() == "Copy to container"
-
-    # Test copying file from container
-    dst_file = temp_dir / "dst.txt"
-    await local_client.copy_from("/workspace/test.txt", str(dst_file))
-    assert dst_file.read_text().strip() == test_content
-
-
-@pytest.mark.asyncio
-async def test_local_volume_binding(local_client: LocalSandboxClient, temp_dir: Path):
-    """Tests volume binding in local sandbox."""
-    bind_path = str(temp_dir)
-    volume_bindings = {bind_path: "/data"}
-
-    await local_client.create(volume_bindings=volume_bindings)
-
-    test_file = temp_dir / "test.txt"
-    test_file.write_text("Volume test")
-
-    content = await local_client.read_file("/data/test.txt")
-    assert "Volume test" in content
-
-
-@pytest.mark.asyncio
-async def test_local_error_handling(local_client: LocalSandboxClient):
-    """Tests error handling in local sandbox."""
-    await local_client.create()
-
-    with pytest.raises(Exception) as exc:
-        await local_client.read_file("/nonexistent.txt")
-    assert "not found" in str(exc.value).lower()
-
-    with pytest.raises(Exception) as exc:
-        await local_client.copy_from("/nonexistent.txt", "local.txt")
-    assert "not found" in str(exc.value).lower()
-
-
-if __name__ == "__main__":
-    pytest.main(["-v", __file__])
diff --git a/openmanus_rl/agentgym/OpenManus/tests/sandbox/test_docker_terminal.py b/openmanus_rl/agentgym/OpenManus/tests/sandbox/test_docker_terminal.py
deleted file mode 100644
index bf0821a1..00000000
--- a/openmanus_rl/agentgym/OpenManus/tests/sandbox/test_docker_terminal.py
+++ /dev/null
@@ -1,104 +0,0 @@
-"""Tests for the AsyncDockerizedTerminal implementation."""
-
-import docker
-import pytest
-import pytest_asyncio
-
-from app.sandbox.core.terminal import AsyncDockerizedTerminal
-
-
-@pytest.fixture(scope="module")
-def docker_client():
-    """Fixture
providing a Docker client.""" - return docker.from_env() - - -@pytest_asyncio.fixture(scope="module") -async def docker_container(docker_client): - """Fixture providing a test Docker container.""" - container = docker_client.containers.run( - "python:3.12-slim", - "tail -f /dev/null", - name="test_container", - detach=True, - remove=True, - ) - yield container - container.stop() - - -@pytest_asyncio.fixture -async def terminal(docker_container): - """Fixture providing an initialized AsyncDockerizedTerminal instance.""" - terminal = AsyncDockerizedTerminal( - docker_container, - working_dir="/workspace", - env_vars={"TEST_VAR": "test_value"}, - default_timeout=30, - ) - await terminal.init() - yield terminal - await terminal.close() - - -class TestAsyncDockerizedTerminal: - """Test cases for AsyncDockerizedTerminal.""" - - @pytest.mark.asyncio - async def test_basic_command_execution(self, terminal): - """Test basic command execution functionality.""" - result = await terminal.run_command("echo 'Hello World'") - assert "Hello World" in result - - @pytest.mark.asyncio - async def test_environment_variables(self, terminal): - """Test environment variable setting and access.""" - result = await terminal.run_command("echo $TEST_VAR") - assert "test_value" in result - - @pytest.mark.asyncio - async def test_working_directory(self, terminal): - """Test working directory setup.""" - result = await terminal.run_command("pwd") - assert "/workspace" == result - - @pytest.mark.asyncio - async def test_command_timeout(self, docker_container): - """Test command timeout functionality.""" - terminal = AsyncDockerizedTerminal(docker_container, default_timeout=1) - await terminal.init() - try: - with pytest.raises(TimeoutError): - await terminal.run_command("sleep 5") - finally: - await terminal.close() - - @pytest.mark.asyncio - async def test_multiple_commands(self, terminal): - """Test execution of multiple commands in sequence.""" - cmd1 = await terminal.run_command("echo 'First'") - cmd2 = await terminal.run_command("echo 'Second'") - assert "First" in cmd1 - assert "Second" in cmd2 - - @pytest.mark.asyncio - async def test_session_cleanup(self, docker_container): - """Test proper cleanup of resources.""" - terminal = AsyncDockerizedTerminal(docker_container) - await terminal.init() - assert terminal.session is not None - await terminal.close() - # Verify session is properly cleaned up - # Note: session object still exists, but internal connection is closed - assert terminal.session is not None - - -# Configure pytest-asyncio -def pytest_configure(config): - """Configure pytest-asyncio.""" - config.addinivalue_line("asyncio_mode", "strict") - config.addinivalue_line("asyncio_default_fixture_loop_scope", "function") - - -if __name__ == "__main__": - pytest.main(["-v", __file__]) diff --git a/openmanus_rl/agentgym/OpenManus/tests/sandbox/test_sandbox.py b/openmanus_rl/agentgym/OpenManus/tests/sandbox/test_sandbox.py deleted file mode 100644 index b21dd6f3..00000000 --- a/openmanus_rl/agentgym/OpenManus/tests/sandbox/test_sandbox.py +++ /dev/null @@ -1,152 +0,0 @@ -import pytest -import pytest_asyncio - -from app.sandbox.core.sandbox import DockerSandbox, SandboxSettings - - -@pytest.fixture(scope="module") -def sandbox_config(): - """Creates sandbox configuration for testing.""" - return SandboxSettings( - image="python:3.12-slim", - work_dir="/workspace", - memory_limit="1g", - cpu_limit=0.5, - network_enabled=True, - ) - - -@pytest_asyncio.fixture(scope="module") -async def sandbox(sandbox_config): - 
"""Creates and manages a test sandbox instance.""" - sandbox = DockerSandbox(sandbox_config) - await sandbox.create() - try: - yield sandbox - finally: - await sandbox.cleanup() - - -@pytest.mark.asyncio -async def test_sandbox_working_directory(sandbox): - """Tests sandbox working directory configuration.""" - result = await sandbox.terminal.run_command("pwd") - assert result.strip() == "/workspace" - - -@pytest.mark.asyncio -async def test_sandbox_file_operations(sandbox): - """Tests sandbox file read/write operations.""" - # Test file writing - test_content = "Hello from sandbox!" - await sandbox.write_file("/workspace/test.txt", test_content) - - # Test file reading - content = await sandbox.read_file("/workspace/test.txt") - assert content.strip() == test_content - - -@pytest.mark.asyncio -async def test_sandbox_python_execution(sandbox): - """Tests Python code execution in sandbox.""" - # Write test file - await sandbox.write_file("/workspace/test.txt", "Hello from file!") - - # Write Python script - python_code = """ -print("Hello from Python!") -with open('/workspace/test.txt') as f: - print(f.read()) -""" - await sandbox.write_file("/workspace/test.py", python_code) - - # Execute script and verify output - result = await sandbox.terminal.run_command("python3 /workspace/test.py") - assert "Hello from Python!" in result - assert "Hello from file!" in result - - -@pytest.mark.asyncio -async def test_sandbox_file_persistence(sandbox): - """Tests file persistence in sandbox.""" - # Create multiple files - files = { - "file1.txt": "Content 1", - "file2.txt": "Content 2", - "nested/file3.txt": "Content 3", - } - - # Write files - for path, content in files.items(): - await sandbox.write_file(f"/workspace/{path}", content) - - # Verify file contents - for path, expected_content in files.items(): - content = await sandbox.read_file(f"/workspace/{path}") - assert content.strip() == expected_content - - -@pytest.mark.asyncio -async def test_sandbox_python_environment(sandbox): - """Tests Python environment configuration.""" - # Test Python version - result = await sandbox.terminal.run_command("python3 --version") - assert "Python 3.10" in result - - # Test basic module imports - python_code = """ -import sys -import os -import json -print("Python is working!") -""" - await sandbox.write_file("/workspace/env_test.py", python_code) - result = await sandbox.terminal.run_command("python3 /workspace/env_test.py") - assert "Python is working!" 
in result - - -@pytest.mark.asyncio -async def test_sandbox_network_access(sandbox): - """Tests sandbox network access.""" - if not sandbox.config.network_enabled: - pytest.skip("Network access is disabled") - - # Test network connectivity - await sandbox.terminal.run_command("apt update && apt install curl -y") - result = await sandbox.terminal.run_command("curl -I https://www.example.com") - assert "HTTP/2 200" in result - - -@pytest.mark.asyncio -async def test_sandbox_cleanup(sandbox_config): - """Tests sandbox cleanup process.""" - sandbox = DockerSandbox(sandbox_config) - await sandbox.create() - - # Create test files - await sandbox.write_file("/workspace/test.txt", "test") - container_id = sandbox.terminal.container.id - # Perform cleanup - await sandbox.cleanup() - - # Verify container has been removed - import docker - - client = docker.from_env() - containers = client.containers.list(all=True) - assert not any(c.id == container_id for c in containers) - - -@pytest.mark.asyncio -async def test_sandbox_error_handling(): - """Tests error handling with invalid configuration.""" - # Test invalid configuration - invalid_config = SandboxSettings(image="nonexistent:latest", work_dir="/invalid") - - sandbox = DockerSandbox(invalid_config) - with pytest.raises(Exception): - await sandbox.create() - - -if __name__ == "__main__": - pytest.main(["-v", __file__]) diff --git a/openmanus_rl/agentgym/OpenManus/tests/sandbox/test_sandbox_manager.py b/openmanus_rl/agentgym/OpenManus/tests/sandbox/test_sandbox_manager.py deleted file mode 100644 index 09f498d2..00000000 --- a/openmanus_rl/agentgym/OpenManus/tests/sandbox/test_sandbox_manager.py +++ /dev/null @@ -1,138 +0,0 @@ -import asyncio -import os -import tempfile -from typing import AsyncGenerator - -import pytest -import pytest_asyncio - -from app.sandbox.core.manager import SandboxManager - - -@pytest_asyncio.fixture(scope="function") -async def manager() -> AsyncGenerator[SandboxManager, None]: - """Creates a sandbox manager instance. - - Uses function scope to ensure each test case has its own manager instance. 
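-    Each manager in these tests is capped at two sandboxes with a 60s idle timeout (see the constructor arguments below).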
- """ - manager = SandboxManager(max_sandboxes=2, idle_timeout=60, cleanup_interval=30) - try: - yield manager - finally: - # Ensure all resources are cleaned up - await manager.cleanup() - - -@pytest.fixture -def temp_file(): - """Creates a temporary test file.""" - with tempfile.NamedTemporaryFile(mode="w+", delete=False) as f: - f.write("test content") - path = f.name - try: - yield path - finally: - if os.path.exists(path): - os.unlink(path) - - -@pytest.mark.asyncio -async def test_create_sandbox(manager): - """Tests sandbox creation.""" - # Create default sandbox - sandbox_id = await manager.create_sandbox() - assert sandbox_id in manager._sandboxes - assert sandbox_id in manager._last_used - - # Verify sandbox functionality - sandbox = await manager.get_sandbox(sandbox_id) - result = await sandbox.run_command("echo 'test'") - assert result.strip() == "test" - - -@pytest.mark.asyncio -async def test_max_sandboxes_limit(manager): - """Tests maximum sandbox limit enforcement.""" - created_sandboxes = [] - try: - # Create maximum number of sandboxes - for _ in range(manager.max_sandboxes): - sandbox_id = await manager.create_sandbox() - created_sandboxes.append(sandbox_id) - - # Verify created sandbox count - assert len(manager._sandboxes) == manager.max_sandboxes - - # Attempting to create additional sandbox should fail - with pytest.raises(RuntimeError) as exc_info: - await manager.create_sandbox() - - # Verify error message - expected_message = ( - f"Maximum number of sandboxes ({manager.max_sandboxes}) reached" - ) - assert str(exc_info.value) == expected_message - - finally: - # Clean up all created sandboxes - for sandbox_id in created_sandboxes: - try: - await manager.delete_sandbox(sandbox_id) - except Exception as e: - print(f"Failed to cleanup sandbox {sandbox_id}: {e}") - - -@pytest.mark.asyncio -async def test_get_nonexistent_sandbox(manager): - """Tests retrieving a non-existent sandbox.""" - with pytest.raises(KeyError, match="Sandbox .* not found"): - await manager.get_sandbox("nonexistent-id") - - -@pytest.mark.asyncio -async def test_sandbox_cleanup(manager): - """Tests sandbox cleanup functionality.""" - sandbox_id = await manager.create_sandbox() - assert sandbox_id in manager._sandboxes - - await manager.delete_sandbox(sandbox_id) - assert sandbox_id not in manager._sandboxes - assert sandbox_id not in manager._last_used - - -@pytest.mark.asyncio -async def test_idle_sandbox_cleanup(manager): - """Tests automatic cleanup of idle sandboxes.""" - # Set short idle timeout - manager.idle_timeout = 0.1 - - sandbox_id = await manager.create_sandbox() - assert sandbox_id in manager._sandboxes - - # Wait longer than idle timeout - await asyncio.sleep(0.2) - - # Trigger cleanup - await manager._cleanup_idle_sandboxes() - assert sandbox_id not in manager._sandboxes - - -@pytest.mark.asyncio -async def test_manager_cleanup(manager): - """Tests manager cleanup functionality.""" - # Create multiple sandboxes - sandbox_ids = [] - for _ in range(2): - sandbox_id = await manager.create_sandbox() - sandbox_ids.append(sandbox_id) - - # Clean up all resources - await manager.cleanup() - - # Verify all sandboxes have been cleaned up - assert not manager._sandboxes - assert not manager._last_used - - -if __name__ == "__main__": - pytest.main(["-v", __file__]) diff --git a/openmanus_rl/llm_agent/openmanus.py b/openmanus_rl/llm_agent/openmanus.py index 7f181852..5c1b3c5b 100644 --- a/openmanus_rl/llm_agent/openmanus.py +++ b/openmanus_rl/llm_agent/openmanus.py @@ -10,11 +10,12 @@ import 
importlib # Added import import traceback # For error logging from concurrent.futures import ThreadPoolExecutor, as_completed # For parallel rollout -from ragen.utils.plot import ( +from openmanus_rl.utils.visualization import ( save_trajectory_to_output, parse_llm_output ) from verl.utils.tracking import Tracking +from omegaconf import DictConfig # Import DictConfig for type hint @dataclass class AgentConfig: @@ -28,28 +29,29 @@ class AgentConfig: max_response_length: Maximum length of response max_obs_length: Maximum length of observation num_gpus: Number of GPUs to use - react_format: Whether to use ReAct format env_name: Name of the environment (e.g., "webshop") env_ports: List of ports for parallel servers env_server_base: Base URL for environment server + react_format: Whether to use ReAct format env_data_len: Number of data samples in the environment (used for client init) rollout_strategy: Strategy to use for rollout (StandardReAct/ToT/MCTS) - storage_backend: Backend for storing trajectories (mongodb/file) max_workers: Maximum number of worker threads logging: dict = None # Contains log_images, log_n_image_per_batch, log_image_step_size, etc. + algorithm_config: DictConfig = None # Pass relevant part of algorithm config """ + # All required fields without default values max_turns: int max_start_length: int max_prompt_length: int max_response_length: int max_obs_length: int num_gpus: int - react_format: bool = True - - # Environment configuration (Now passed from trainer) env_name: str env_ports: List[int] # List of ports for parallel servers env_server_base: str + + # All optional fields with default values + react_format: bool = True env_data_len: int = 200 # Default, might need adjustment rollout_strategy: str = "StandardReAct" # Strategy is now internal logic # storage_backend: str = "mongodb" # Storage handled elsewhere or not needed here @@ -58,29 +60,8 @@ class AgentConfig: # Add visualization-related configuration logging: dict = None # Contains log_images, log_n_image_per_batch, log_image_step_size, etc. -def create_react_prompt(task_description, tool_manager): - """ - Create a prompt for the agent using ReAct format. - - Args: - task_description: Description of the specific task - tool_manager: ToolManager instance with registered tools - - Returns: - Formatted prompt string - """ - tools_instructions = tool_manager.get_prompt_instructions() - - prompt = f"""# Task -{task_description} - -# Instructions -{tools_instructions} - -Let's solve this step by step. 
- -""" - return prompt + # Add algorithm config relevant to reward allocation + algorithm_config: DictConfig = None # Pass relevant part of algorithm config class OpenManusAgent: def __init__( @@ -88,7 +69,6 @@ def __init__( tokenizer, actor_rollout_wg, # This is the Verl component for generation config: AgentConfig, - tool_manager, # Keep for potential parsing, but execution is via env is_validation: bool = False, logger: Tracking = None, # Add logger parameter for trajectory saving ): @@ -98,17 +78,15 @@ def __init__( Args: tokenizer: Tokenizer for text processing actor_rollout_wg: Actor rollout wrapper for generation - config: Agent configuration including env details - tool_manager: Manager for tool operations (potentially unused) + config: Agent configuration including env details and algorithm config is_validation: Whether in validation mode logger: Logger for tracking and visualization """ self.tokenizer = tokenizer self.actor_rollout_wg = actor_rollout_wg - self.config = config - self.tool_manager = tool_manager + self.config = config # AgentConfig now holds algorithm_config self.is_validation = is_validation - self.logger = logger # Add logger attribute + self.logger = logger self.tensor_fn = TensorHelper(TensorConfig( pad_token_id=tokenizer.pad_token_id, @@ -412,48 +390,55 @@ def _save_trajectory(self, trajectory: List[Dict], def _run_single_rollout(self, initial_prompt_ids: torch.Tensor, task_idx: int, client: Any) -> Dict[str, Any]: """ Runs the interaction loop for a single environment instance using the provided client. - + Now includes the final computed reward from the environment step in the result. + Args: initial_prompt_ids: Token IDs for the initial prompt/observation. task_idx: The index for resetting the environment. client: The specific environment client instance to use for this rollout. - + Returns: - A dictionary containing the trajectory, step rewards, final reward, turns, - final env score, and original task index. + A dictionary containing the trajectory, step rewards, final reward, + final env score, turns, and original task index. """ trajectory = [] step_rewards = [] # Store rewards per step - final_reward = 0.0 - final_env_score = 0.0 + final_reward = 0.0 # Reward from the *last step* + final_env_score = 0.0 # Final score from env info done = False turns = 0 - current_input_ids = None + current_input_ids = None try: # Reset environment using the provided client - client.reset(task_idx) + # Some envs might need a specific seed or config reset + # print(f"[Agent._run_single_rollout][{task_idx}] Resetting env...") + reset_info = client.reset(task_idx) # Capture potential info from reset initial_obs_text = client.observe() - + # print(f"[Agent._run_single_rollout][{task_idx}] Initial Obs: {initial_obs_text[:100]}...") + # Handle initial observation if not initial_obs_text: - print(f"[Agent._run_single_rollout][{task_idx} @ {client.env_server_base}] Warning: Received empty initial observation. Using initial prompt from batch.") + # print(f"[Agent._run_single_rollout][{task_idx} @ {client.env_server_base}] Warning: Received empty initial observation. 
Using initial prompt from batch.") + # Use the initial prompt text passed in initial_prompt_text = self.tokenizer.decode(initial_prompt_ids[0], skip_special_tokens=True) trajectory.append({"from": "human", "value": initial_prompt_text}) current_input_ids = initial_prompt_ids else: trajectory.append({"from": "human", "value": initial_obs_text}) current_input_ids = self.tokenizer(initial_obs_text, return_tensors='pt', add_special_tokens=False)['input_ids'] - + # --- Interaction Loop --- for t in range(self.config.max_turns): turns = t + 1 - if current_input_ids is None: break - + if current_input_ids is None: + # print(f"[Agent._run_single_rollout][{task_idx}] Breaking loop: current_input_ids is None") + break + # Handle input that exceeds max length if current_input_ids.shape[1] > self.config.max_prompt_length: + # print(f"[Agent._run_single_rollout][{task_idx} @ {client.env_server_base}] Warning: Truncating input {current_input_ids.shape} > {self.config.max_prompt_length}.") current_input_ids = current_input_ids[:, -self.config.max_prompt_length:] - print(f"[Agent._run_single_rollout][{task_idx} @ {client.env_server_base}] Warning: Truncating input {current_input_ids.shape} > {self.config.max_prompt_length}.") # Prepare input current_attention_mask = self.tensor_fn.create_attention_mask(current_input_ids) @@ -465,62 +450,76 @@ def _run_single_rollout(self, initial_prompt_ids: torch.Tensor, task_idx: int, c 'attention_mask': current_attention_mask.to(device), 'position_ids': current_position_ids.to(device) }) - + # Generate response generation_config = GenerationConfig( max_new_tokens=self.config.max_response_length, eos_token_id=self.tokenizer.eos_token_id, pad_token_id=self.tokenizer.pad_token_id, - temperature=1.0, - do_sample=True + temperature=1.0, # Consider adjusting temperature/sampling based on validation vs training + do_sample=True ) # Generation happens on the actor worker group's device gen_output_proto = self.actor_rollout_wg.generate_sequences(gen_input_proto, generation_config=generation_config) - response_ids = gen_output_proto.batch['response_ids'] + response_ids = gen_output_proto.batch['response_ids'] response_text = self.tokenizer.decode(response_ids[0], skip_special_tokens=True) + # print(f"[Agent._run_single_rollout][{task_idx}][Turn {t+1}] Response: {response_text[:100]}...") trajectory.append({"from": "gpt", "value": response_text}) # Post-process response to get action action_types, action_contents = self.postprocess_predictions([response_text]) - action_text = action_contents[0] - + action_text = action_contents[0] + # Execute environment step using the provided client - if action_text is None: action_text = "" + if action_text is None: action_text = "" + # print(f"[Agent._run_single_rollout][{task_idx}][Turn {t+1}] Action: {action_text}") next_obs_text, reward, done, info = client.step(action_text) - - # Record rewards + # print(f"[Agent._run_single_rollout][{task_idx}][Turn {t+1}] Env Step Result: Reward={reward}, Done={done}, Info={info}") + + # Store the reward from this specific step step_rewards.append(reward) - final_reward = reward + final_reward = reward # Keep track of the reward from the last executed step final_env_score = info.get('score', 0.0) # Use .get for safety + # Add reward and info to the trajectory for this agent step + # This helps the RewardComposer access step-specific info if needed + trajectory[-1]['reward'] = reward + trajectory[-1]['info'] = info + # Process next observation if not done: + # 
print(f"[Agent._run_single_rollout][{task_idx}][Turn {t+1}] Next Obs: {next_obs_text[:100]}...") trajectory.append({"from": "human", "value": next_obs_text}) next_obs_ids = self.tokenizer(next_obs_text, return_tensors='pt', add_special_tokens=False)['input_ids'] # Ensure tensors are concatenated on the same device (e.g., CPU or model's device if needed later) current_input_ids = torch.cat([ current_input_ids.to(response_ids.device), # Move to same device as response_ids - response_ids, + response_ids, next_obs_ids.to(response_ids.device) # Move to same device ], dim=1) else: - break + # print(f"[Agent._run_single_rollout][{task_idx}][Turn {t+1}] Done received.") + break except Exception as e: print(f"[Agent._run_single_rollout][{task_idx} @ {getattr(client, 'env_server_base', 'unknown_client')}] Error during rollout: {e}") print(traceback.format_exc()) + # Reset results on error + trajectory = trajectory # Keep partial trajectory for debugging? step_rewards = [] - final_reward = 0.0 + final_reward = 0.0 final_env_score = 0.0 - done = True + done = True # Mark as done on error + # Return the collected information return { - 'trajectory': trajectory, - 'step_rewards': step_rewards, - 'reward': final_reward, - 'turns': turns, - 'env_score': final_env_score, - 'task_idx': task_idx + 'trajectory': trajectory, # Full interaction history + 'step_rewards': step_rewards, # List of rewards from each env.step call + 'reward': final_reward, # Reward from the *last* env.step call + 'env_score': final_env_score, # Final score reported by env info + 'turns': turns, + 'task_idx': task_idx, + 'done': done # Whether the episode finished naturally or via error } def run_llm_loop(self, gen_batch: DataProto, output_dir: str = None, global_steps: int = 0) -> DataProto: @@ -635,23 +634,28 @@ def _convert_rollout_results_to_dataproto(self, results: List[Dict], original_ba """ Convert the list of dictionaries (each containing trajectory, step_rewards, env_score) from the internal rollout loop into a DataProto suitable for PPO training. - Creates 'token_level_rewards' based on step_rewards. - + Creates 'token_level_rewards' based on the chosen reward allocation strategy. + Args: - results: List of result dictionaries from rollout - original_batch: Original batch DataProto with metadata - + results: List of result dictionaries from _run_single_rollout. + original_batch: Original batch DataProto with metadata. + Returns: - DataProto: Processed data with rewards and metadata + DataProto: Processed data with token-level rewards and metadata. 
""" batch_input_ids = [] batch_attention_mask = [] batch_position_ids = [] - batch_info_mask = [] - batch_rewards = [] # Store final rewards - batch_token_level_rewards = [] # Store step rewards aligned with tokens + batch_info_mask = [] + batch_token_level_rewards = [] # Store final token-level rewards for PPO batch_meta_info = defaultdict(list) + # Get reward allocation strategy from config + reward_allocation = "last_token" # Default + if self.config.algorithm_config: + reward_allocation = self.config.algorithm_config.get('reward_allocation', 'last_token') + print(f"[Agent._convert_rollout] Using reward allocation strategy: {reward_allocation}") + # Get the index mapping from the original batch original_indices = original_batch.meta_info.get('idx', list(range(original_batch.batch['input_ids'].shape[0]))) if isinstance(original_indices, torch.Tensor): @@ -660,68 +664,72 @@ def _convert_rollout_results_to_dataproto(self, results: List[Dict], original_ba print(f"[Agent._convert_rollout] Formatting {len(results)} trajectories.") for result_dict in results: - # Extract trajectory and reward information + # Extract trajectory and other info trajectory = result_dict.get('trajectory', []) + # IMPORTANT: Decide which reward signal to use for allocation. + # Option 1: Use the final env score (often 0/1 for success) + # reward_to_distribute = result_dict.get('env_score', 0.0) + # Option 2: Use the reward from the last step + # reward_to_distribute = result_dict.get('reward', 0.0) + # Option 3: Use the sum/average of step_rewards (less common for final goal tasks) step_rewards_list = result_dict.get('step_rewards', []) - final_reward = result_dict.get('reward', 0.0) + # Let's use final_env_score as the primary signal for allocation, as it usually reflects task completion. + reward_to_distribute = result_dict.get('env_score', 0.0) + turns = result_dict.get('turns', 0) - env_score = result_dict.get('env_score', 0.0) task_idx = result_dict.get('task_idx', -1) - + # Get the original batch index original_batch_idx = original_indices_map.get(task_idx, -1) - if original_batch_idx == -1: + if original_batch_idx == -1: print(f"[Agent._convert_rollout] Warning: Task idx {task_idx} not found in original batch. Skipping.") continue - - # --- Concatenate conversation and align rewards --- + + # --- Concatenate conversation and identify agent segments --- conversation_ids_list = [] info_mask_parts = [] segment_lengths = [] # Store length of each segment (human/gpt) - agent_response_indices = [] # Store indices of agent responses - valid_actions = 0 + agent_response_indices = [] # Store indices of agent responses (in the segment list) + valid_actions = 0 # Count of agent turns if not trajectory: - print(f"[Agent._convert_rollout] Warning: Empty trajectory for task_idx {task_idx}. Using initial prompt only.") - # If trajectory is empty, use original prompt + # ... (handle empty trajectory) ... 
initial_prompt_ids = original_batch.batch['input_ids'][original_batch_idx:original_batch_idx+1] conversation_ids_list.append(initial_prompt_ids) info_mask_parts.append(torch.ones_like(initial_prompt_ids)) segment_lengths.append(initial_prompt_ids.shape[1]) else: - # Process each turn in the trajectory for turn_idx, msg in enumerate(trajectory): msg_text = msg.get("value", "") msg_from = msg.get("from", "") if not msg_text: continue - - # Convert text to token ids + msg_ids = self.tokenizer(msg_text, add_special_tokens=False, return_tensors='pt')['input_ids'] conversation_ids_list.append(msg_ids) segment_lengths.append(msg_ids.shape[1]) - - # Distinguish between agent responses and environment observations + if msg_from == "gpt": - info_mask_parts.append(torch.ones_like(msg_ids)) - valid_actions += 1 - agent_response_indices.append(len(conversation_ids_list) - 1) # Store index of this segment - else: - info_mask_parts.append(torch.ones_like(msg_ids)) - - # Concatenate, Pad, Truncate (Input IDs, Info Mask) + info_mask_parts.append(torch.ones_like(msg_ids)) + valid_actions += 1 + agent_response_indices.append(len(conversation_ids_list) - 1) + else: # human or other + info_mask_parts.append(torch.ones_like(msg_ids)) + if not conversation_ids_list: print(f"[Agent._convert_rollout] Warning: No valid conversation segments for task_idx {task_idx}. Skipping.") continue - - # Concatenate all conversation segments + + # --- Pad and Truncate --- + # ... (Padding and truncation logic remains the same) ... full_input_ids = torch.cat(conversation_ids_list, dim=1) full_info_mask = torch.cat(info_mask_parts, dim=1) seq_len = full_input_ids.shape[1] - target_len = self.config.max_prompt_length + target_len = self.config.max_prompt_length # Or another max len? Check this padding_len = max(0, target_len - seq_len) + agent_indices_in_padded = [] # List of (start, end) indices for agent tokens in the final padded tensor if seq_len > target_len: - # Truncate from left - need to adjust segment_lengths and indices + # Truncate left removed_len = seq_len - target_len current_removed = 0 first_segment_idx = 0 @@ -732,61 +740,72 @@ def _convert_rollout_results_to_dataproto(self, results: List[Dict], original_ba if segment_lengths[first_segment_idx] == 0: first_segment_idx += 1 - # Adjust agent response indices if segments were removed - agent_response_indices = [idx for idx in agent_response_indices if idx >= first_segment_idx] - # Recalculate indices relative to the truncated start - agent_response_indices = [idx - first_segment_idx for idx in agent_response_indices] - # Update segment_lengths list + # Adjust agent response indices + adjusted_agent_response_indices = [idx - first_segment_idx for idx in agent_response_indices if idx >= first_segment_idx] segment_lengths = segment_lengths[first_segment_idx:] - # Truncate input_ids and info_mask full_input_ids = full_input_ids[:, -target_len:] full_info_mask = full_info_mask[:, -target_len:] - seq_len = target_len # Update sequence length - + seq_len = target_len + padding_len = 0 # No padding needed after truncation elif seq_len < target_len: - # Pad left (Input IDs) + # Pad left pad_tensor = torch.full((1, padding_len), self.tokenizer.pad_token_id, dtype=torch.long, device=full_input_ids.device) - full_input_ids = torch.cat([pad_tensor, full_input_ids], dim=1) - # Pad left (Info Mask) - info_pad = torch.zeros_like(pad_tensor) # Padding is masked + full_input_ids = torch.cat([pad_tensor, full_input_ids], dim=1) + info_pad = torch.zeros_like(pad_tensor) # Padding is 
masked in info full_info_mask = torch.cat([info_pad, full_info_mask], dim=1) - - # --- Create Token Level Rewards Tensor --- + adjusted_agent_response_indices = agent_response_indices # Indices remain the same relative to segments + + # Calculate agent token indices in the *final* padded/truncated tensor + current_token_idx_in_padded = padding_len + for segment_idx, length in enumerate(segment_lengths): + is_agent_response = segment_idx in adjusted_agent_response_indices + start_idx = current_token_idx_in_padded + end_idx = current_token_idx_in_padded + length - 1 + if is_agent_response and length > 0: + agent_indices_in_padded.append((start_idx, end_idx)) + current_token_idx_in_padded += length + + # --- Create Token Level Rewards Tensor based on Allocation Strategy --- token_level_rewards = torch.zeros_like(full_input_ids, dtype=torch.float32) - - # If there are step rewards, assign them to appropriate tokens - if step_rewards_list: - current_token_idx_in_unpadded = 0 - agent_turn_reward_idx = 0 - for segment_idx, length in enumerate(segment_lengths): - if length == 0: continue # Skip segments that were fully truncated - - # Check if this segment corresponds to an agent response - is_agent_response = segment_idx in agent_response_indices - - if is_agent_response and agent_turn_reward_idx < len(step_rewards_list): - # Assign reward for this step - reward_for_this_step = step_rewards_list[agent_turn_reward_idx] - # Assign reward to the last token of this agent segment - end_idx_in_unpadded = current_token_idx_in_unpadded + length - 1 - actual_end_idx_in_padded = padding_len + end_idx_in_unpadded # Adjust for padding - if actual_end_idx_in_padded < target_len: - token_level_rewards[0, actual_end_idx_in_padded] = reward_for_this_step - agent_turn_reward_idx += 1 - - current_token_idx_in_unpadded += length - - # --- Add reward shaping variations, supporting multiple reward distribution methods --- - # 1. 
If there's only one reward, distribute it across all agent response tokens - if len(step_rewards_list) == 1 and valid_actions > 0: - # Distribute reward across all agent response tokens - reward_value = step_rewards_list[0] / max(1, valid_actions) - # Identify agent response tokens where info_mask is 1 - agent_token_mask = (full_info_mask == 1) - token_level_rewards = torch.where(agent_token_mask, - torch.full_like(token_level_rewards, reward_value), - token_level_rewards) + + if agent_indices_in_padded: # Only allocate if there are agent responses + if reward_allocation == "last_token": + # Assign reward only to the last token of the last agent segment + last_segment_start, last_segment_end = agent_indices_in_padded[-1] + if last_segment_end < target_len: # Ensure index is within bounds + token_level_rewards[0, last_segment_end] = reward_to_distribute + + elif reward_allocation == "uniform_positive": + # Distribute positive rewards evenly across all agent tokens + if reward_to_distribute > 0: + total_agent_tokens = sum(end - start + 1 for start, end in agent_indices_in_padded) + reward_per_token = reward_to_distribute / max(1, total_agent_tokens) + for start, end in agent_indices_in_padded: + token_level_rewards[0, start : end + 1] = reward_per_token + # Negative rewards are assigned to the last token (or ignored) + elif reward_to_distribute < 0: + last_segment_start, last_segment_end = agent_indices_in_padded[-1] + if last_segment_end < target_len: + token_level_rewards[0, last_segment_end] = reward_to_distribute + + elif reward_allocation == "discounted": + # Distribute reward starting from the last agent segment, discounted backward + gamma = self.config.algorithm_config.get('gamma', 1.0) if self.config.algorithm_config else 1.0 + current_reward = reward_to_distribute + # Iterate segments backward + for start, end in reversed(agent_indices_in_padded): + segment_len = end - start + 1 + # Simple example: distribute reward uniformly within the segment + reward_for_segment = current_reward / segment_len + token_level_rewards[0, start : end + 1] = reward_for_segment + # Apply discount for the *next* (earlier) segment + current_reward *= (gamma ** segment_len) + else: + print(f"[Agent._convert_rollout] Warning: Unknown reward_allocation strategy '{reward_allocation}'. 
Defaulting to last_token.") + last_segment_start, last_segment_end = agent_indices_in_padded[-1] + if last_segment_end < target_len: + token_level_rewards[0, last_segment_end] = reward_to_distribute # --- Create Attention Mask and Position IDs --- full_attention_mask = self.tensor_fn.create_attention_mask(full_input_ids) @@ -796,142 +815,102 @@ def _convert_rollout_results_to_dataproto(self, results: List[Dict], original_ba batch_input_ids.append(full_input_ids) batch_attention_mask.append(full_attention_mask) batch_position_ids.append(full_position_ids) - batch_info_mask.append(full_info_mask) - batch_token_level_rewards.append(token_level_rewards) # Store rewards tensor - batch_rewards.append(final_reward) # Store final reward - - # Add metadata + batch_info_mask.append(full_info_mask) # Store the info mask + batch_token_level_rewards.append(token_level_rewards) # Store calculated rewards + + # Add metadata (ensure reward/env_score reflect the values used for distribution if needed) batch_meta_info["task_idx"].append(task_idx) batch_meta_info["turns_stats"].append(turns) batch_meta_info["valid_action_stats"].append(valid_actions) - batch_meta_info["reward"].append(final_reward) # Last step reward - batch_meta_info["env_score"].append(env_score) - batch_meta_info["rollout_trajectory"].append(trajectory) # Add trajectory list - - # --- Add reward_model information --- - if 'reward_model' in original_batch.meta_info: - if isinstance(original_batch.meta_info['reward_model'], list) and len(original_batch.meta_info['reward_model']) > original_batch_idx: - # Assume reward_model is a list in the batch metadata - batch_meta_info["reward_model"].append(original_batch.meta_info['reward_model'][original_batch_idx]) - elif isinstance(original_batch.meta_info['reward_model'], dict): - # If reward_model is a single dict passed for the whole batch (less likely) - if original_batch_idx == 0: # Add only once - batch_meta_info["reward_model"] = original_batch.meta_info['reward_model'] - - # Copy other relevant metadata from the original batch + batch_meta_info["reward"].append(result_dict.get('reward', 0.0)) # Last step reward + batch_meta_info["env_score"].append(result_dict.get('env_score', 0.0)) # Final env score + batch_meta_info["rollout_trajectory"].append(trajectory) + # Add other relevant metadata from original_batch + # ... (metadata copying logic as before) ... for key, value in original_batch.meta_info.items(): - # Avoid duplicating keys already handled (idx, reward, reward_model) - if key not in ['idx', 'reward', 'reward_model']: - if isinstance(value, list) and len(value) > original_batch_idx: - batch_meta_info[key].append(value[original_batch_idx]) - elif not isinstance(value, list): # Keep non-list metadata (add only once) - if original_batch_idx == 0: # Add only once - batch_meta_info[key] = value + if key not in ['idx', 'reward', 'env_score']: # Avoid duplication + if isinstance(value, list) and len(value) > original_batch_idx: + batch_meta_info[key].append(value[original_batch_idx]) + elif not isinstance(value, list): # Keep non-list metadata + if task_idx == original_indices[0]: # Add only once per batch + batch_meta_info[key] = value # --- Stack Tensors --- - if not batch_input_ids: + if not batch_input_ids: print("[Agent._convert_rollout] No valid trajectories formatted. 
Returning empty DataProto.") - return DataProto.from_dict({}) - + # Return structure matching trainer expectations, even if empty + return DataProto.from_dict({ + "input_ids": torch.empty((0,0), dtype=torch.long), + "attention_mask": torch.empty((0,0), dtype=torch.long), + "position_ids": torch.empty((0,0), dtype=torch.long), + "info_mask": torch.empty((0,0), dtype=torch.long), + "token_level_rewards": torch.empty((0,0), dtype=torch.float) + }) + # Create final batch data final_batch = { "input_ids": torch.cat(batch_input_ids, dim=0), "attention_mask": torch.cat(batch_attention_mask, dim=0), "position_ids": torch.cat(batch_position_ids, dim=0), - "info_mask": torch.cat(batch_info_mask, dim=0), - "token_level_rewards": torch.cat(batch_token_level_rewards, dim=0) # Add stacked rewards tensor + "info_mask": torch.cat(batch_info_mask, dim=0), + "token_level_rewards": torch.cat(batch_token_level_rewards, dim=0) # Crucial output for PPO } - + # Create DataProto and add metadata data_proto = DataProto.from_dict(final_batch) + # ... (metadata handling as before, converting lists to tensors where appropriate) ... for key, value in batch_meta_info.items(): try: - # Try to convert values to tensors if isinstance(value, list) and all(isinstance(item, (int, float)) for item in value): data_proto.meta_info[key] = torch.tensor(value) + # Handle numpy arrays if they appear + elif isinstance(value, np.ndarray): + data_proto.meta_info[key] = torch.from_numpy(value) else: + # Keep as list for non-numeric types (like trajectories) data_proto.meta_info[key] = value - except (ValueError, TypeError): - data_proto.meta_info[key] = value - - # Add rewards tensor - data_proto.meta_info["rewards"] = torch.tensor(batch_rewards, dtype=torch.float32) - - # Explicitly add environment scores + except (ValueError, TypeError, RuntimeError) as e: + # Fallback: keep as list if tensor conversion fails + print(f"[Agent._convert_rollout] Warning: Could not convert metadata '{key}' to tensor: {e}. Keeping as list.") + data_proto.meta_info[key] = value + + # Explicitly add final env scores as a tensor if possible if "env_score" in batch_meta_info: try: data_proto.meta_info["env_scores"] = torch.tensor(batch_meta_info["env_score"], dtype=torch.float32) except (ValueError, TypeError): - print("[Agent._convert_rollout] Could not convert env_scores to tensor, keeping as list.") + # This case should be less likely now, but keep fallback + print("[Agent._convert_rollout] Could not convert env_scores to tensor, keeping original list.") data_proto.meta_info["env_scores"] = batch_meta_info["env_score"] - + print(f"[Agent._convert_rollout] Final batch shapes: input_ids={final_batch['input_ids'].shape}, token_level_rewards={final_batch['token_level_rewards'].shape}") return data_proto - def execute_predictions(self, predictions: List[str], pad_token: str, active_mask=None, execute_tools=True) -> Tuple[List[str], List[bool], List[bool], List[bool]]: - """ - Execute predictions (Placeholder - Actual execution handled by AgentGym Task via controller). - - This method is likely called by the RolloutController's strategy. - In the AgentGym setup, the controller gets the prediction string from the agent - (via model generation) and passes it to the task's step function. - This method here might only be needed for parsing or returning flags, not execution. 
- - Args: - predictions: List of action predictions (strings from model) - pad_token: Padding token - active_mask: Mask for active sequences - execute_tools: Whether to execute tools (Likely ignored) - - Returns: - Placeholder tuple. The actual next_obs, dones, etc., come from the Task's step method. - """ - # The original implementation had a recursive call to rollout_controller._rollout_one, which is incorrect. - # The RolloutController orchestrates the flow: agent predicts -> controller passes prediction to task.step -> task executes -> task returns obs/done. - # Therefore, this method in the agent should likely *not* execute tools or interact with the environment directly. - - # For now, return placeholder values. The actual values are determined by the Task environment. - # We need to understand how the chosen Strategy uses this method, if at all. - num_preds = len(predictions) - dummy_obs = ["" for _ in range(num_preds)] - dummy_dones = [False for _ in range(num_preds)] # Assume not done unless Task says otherwise - dummy_valid = [True for _ in range(num_preds)] # Assume valid unless parsing fails - dummy_tool_use = [False for _ in range(num_preds)] # Determine based on prediction parsing - - # Basic check if prediction looks like a tool call based on common patterns - actions, _ = self.postprocess_predictions(predictions) - for i, action_type in enumerate(actions): - if action_type == 'action': - dummy_tool_use[i] = True - - # If not using tools (e.g., final response), these flags might be different. - # This part might need refinement based on how the strategy/trainer uses these return values. - - print(f"[Agent.execute_predictions] Received {num_preds} predictions. Returning placeholder env state.") - return dummy_obs, dummy_dones, dummy_valid, dummy_tool_use def postprocess_predictions(self, predictions: List[Any]) -> Tuple[List[str], List[str]]: """ - Process predictions into actions and content. - + Process predictions into actions and content based on XML-like tags. + Does not require tool_manager. + Args: - predictions: List of raw predictions - + predictions: List of raw predictions (strings from LLM) + Returns: - Tuple of (action types list, action contents list) + Tuple of (action types list ['action' or 'response' or None], + action contents list [text inside tags or empty string]) """ actions = [] contents = [] - + for prediction in predictions: if isinstance(prediction, str): # Extract action or response tags action_pattern = r'<action>(.*?)</action>' response_pattern = r'<response>(.*?)</response>' - + action_match = re.search(action_pattern, prediction, re.DOTALL) response_match = re.search(response_pattern, prediction, re.DOTALL) - + if action_match: actions.append('action') contents.append(action_match.group(1).strip()) @@ -939,9 +918,14 @@ def postprocess_predictions(self, predictions: List[Any]) -> Tuple[List[str], Li actions.append('response') contents.append(response_match.group(1).strip()) else: + # If no recognized tag, assume it's neither a specific action nor response actions.append(None) - contents.append('') + contents.append('') # Return empty content if no tag found else: - raise ValueError(f"Invalid prediction type: {type(prediction)}") - + # Handle non-string predictions if necessary, e.g., raise error or log warning + print(f"[Warning] Received non-string prediction: {type(prediction)}.
Cannot process.") + actions.append(None) + contents.append('') + # Or raise ValueError(f"Invalid prediction type: {type(prediction)}") + return actions, contents \ No newline at end of file diff --git a/openmanus_rl/utils/__init__.py b/openmanus_rl/utils/__init__.py new file mode 100644 index 00000000..5355cc93 --- /dev/null +++ b/openmanus_rl/utils/__init__.py @@ -0,0 +1,3 @@ +""" +Utility modules for OpenManus-RL. +""" \ No newline at end of file diff --git a/openmanus_rl/utils/visualization.py b/openmanus_rl/utils/visualization.py new file mode 100644 index 00000000..e7c593d3 --- /dev/null +++ b/openmanus_rl/utils/visualization.py @@ -0,0 +1,206 @@ +""" +Visualization utilities for OpenManus agent trajectories. +Replaces dependency on ragen.utils.plot with custom implementation. +""" + +import os +import re +import json +import datetime +import numpy as np +import matplotlib.pyplot as plt +from typing import List, Dict, Any, Optional, Union, Tuple +from PIL import Image + + +def parse_llm_output(text: str, strategy: str = "raw") -> Dict[str, Any]: + """ + Parse LLM output text according to different strategies. + + Args: + text: Raw text output from LLM + strategy: Parsing strategy, options: + - "raw": Return raw text + - "action_response": Extract and tags + - "react": Parse ReAct format (Thought, Action, Observation) + + Returns: + Dictionary with parsed components + """ + result = {"raw": text} + + if strategy == "raw": + return result + + if strategy == "action_response": + # Extract action and response tags + action_match = re.search(r'(.*?)', text, re.DOTALL) + response_match = re.search(r'(.*?)', text, re.DOTALL) + + result["action"] = action_match.group(1).strip() if action_match else None + result["response"] = response_match.group(1).strip() if response_match else None + return result + + if strategy == "react": + # Parse ReAct format (Thought, Action, Observation) + thought_match = re.search(r'(?:Think:|Thought:)\s*(.*?)(?:Act:|Action:|$)', text, re.DOTALL) + action_match = re.search(r'(?:Act:|Action:)\s*(.*?)(?:Obs:|Observation:|$)', text, re.DOTALL) + obs_match = re.search(r'(?:Obs:|Observation:)\s*(.*?)$', text, re.DOTALL) + + result["thought"] = thought_match.group(1).strip() if thought_match else None + result["action"] = action_match.group(1).strip() if action_match else None + result["observation"] = obs_match.group(1).strip() if obs_match else None + return result + + # Default case + return result + + +def save_trajectory_to_output( + trajectories: List[Dict[str, Any]], + save_dir: str, + format: str = "html", + prefix: str = "trajectory" +) -> List[str]: + """ + Save agent trajectories to output files. 
+
+    Args:
+        trajectories: List of trajectory dictionaries
+        save_dir: Directory to save output files
+        format: Output format (html, json, txt)
+        prefix: Filename prefix
+
+    Returns:
+        List of saved file paths
+    """
+    os.makedirs(save_dir, exist_ok=True)
+    timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
+    saved_files = []
+
+    for i, traj in enumerate(trajectories):
+        filename = f"{prefix}_{timestamp}_{i}"
+
+        if format == "json":
+            # Save as JSON
+            filepath = os.path.join(save_dir, f"{filename}.json")
+            with open(filepath, 'w') as f:
+                json.dump(traj, f, indent=2)
+            saved_files.append(filepath)
+
+        elif format == "html":
+            # Save as HTML with visualization
+            filepath = os.path.join(save_dir, f"{filename}.html")
+            _create_html_visualization(traj, filepath)
+            saved_files.append(filepath)
+
+        elif format == "txt":
+            # Save as plain text
+            filepath = os.path.join(save_dir, f"{filename}.txt")
+            with open(filepath, 'w') as f:
+                f.write(_trajectory_to_text(traj))
+            saved_files.append(filepath)
+
+        # If trajectory contains state images, save them too
+        if "state" in traj and isinstance(traj["state"], list):
+            state_dir = os.path.join(save_dir, f"{filename}_states")
+            os.makedirs(state_dir, exist_ok=True)
+
+            for j, state_img in enumerate(traj["state"]):
+                if isinstance(state_img, np.ndarray):
+                    img_path = os.path.join(state_dir, f"state_{j:03d}.png")
+                    Image.fromarray(state_img).save(img_path)
+                    saved_files.append(img_path)
+
+    return saved_files
+
+
+def _trajectory_to_text(trajectory: Dict[str, Any]) -> str:
+    """Convert a trajectory to formatted text."""
+    text = "AGENT TRAJECTORY\n" + "="*50 + "\n\n"
+
+    # Add answers/responses
+    if "answer" in trajectory and isinstance(trajectory["answer"], list):
+        for i, answer in enumerate(trajectory["answer"]):
+            text += f"STEP {i+1}:\n"
+            text += f"Agent Response:\n{answer}\n\n"
+
+    # Add parsed responses if available
+    if "parsed_response" in trajectory and isinstance(trajectory["parsed_response"], list):
+        text += "\nPARSED RESPONSES\n" + "-"*50 + "\n\n"
+        for i, parsed in enumerate(trajectory["parsed_response"]):
+            text += f"Step {i+1} Parsed:\n"
+            for key, value in parsed.items():
+                if value:
+                    text += f"  {key}: {value}\n"
+            text += "\n"
+
+    return text
+
+
+def _create_html_visualization(trajectory: Dict[str, Any], filepath: str) -> None:
+    """Create HTML visualization of a trajectory."""
+    html = """
+    <html>
+    <head>
+        <title>Agent Trajectory Visualization</title>
+    </head>
+    <body>
+    <h1>Agent Trajectory Visualization</h1>
+    """
+
+    # Process each step
+    steps = len(trajectory.get("answer", []))
+    for i in range(steps):
+        html += f'<div class="step"><h2>Step {i+1}</h2>'
+
+        # Add agent response
+        if "answer" in trajectory and i < len(trajectory["answer"]):
+            html += f'<div class="response"><pre>{trajectory["answer"][i]}</pre></div>'
+
+        # Add parsed components
+        if "parsed_response" in trajectory and i < len(trajectory["parsed_response"]):
+            parsed = trajectory["parsed_response"][i]
+            html += '<div class="parsed">'
+
+            if "thought" in parsed and parsed["thought"]:
+                html += f'<p><b>Thought:</b> {parsed["thought"]}</p>'
+
+            if "action" in parsed and parsed["action"]:
+                html += f'<p><b>Action:</b> {parsed["action"]}</p>'
+
+            if "observation" in parsed and parsed["observation"]:
+                html += f'<p><b>Observation:</b> {parsed["observation"]}</p>'
+
+            html += '</div>'
+
+        # Add state image if available
+        if "state" in trajectory and isinstance(trajectory["state"], list) and i < len(trajectory["state"]):
+            html += f'<div class="state">State {i}</div>'
+
+        html += '</div>'
+
+    html += """
+    </body>
+    </html>
+    """
+
+    with open(filepath, 'w') as f:
+        f.write(html)
+
+
+# Compatibility alias for easier migration
+def plot_trajectory(*args, **kwargs):
+    """Compatibility function for legacy code."""
+    return save_trajectory_to_output(*args, **kwargs)
\ No newline at end of file
diff --git a/requirements.txt b/requirements.txt
index 57c62a34..68175ad0 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -29,3 +29,6 @@ vllm==0.7.2
 wandb>=0.19.1
 codetiming==1.4.0
 omegaconf==2.3.0
+matplotlib==3.10.1
+hydra-core==1.3.2
+flash-attn==2.7.4
\ No newline at end of file
diff --git a/train_grpo.sh b/train_grpo.sh
index f3eb91ff..820c79d3 100644
--- a/train_grpo.sh
+++ b/train_grpo.sh
@@ -4,18 +4,22 @@
 export CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-0,1,2,3,4,5,6,7}
 WAND_PROJECT=${WAND_PROJECT:-'OpenManus-rl'}
 export BASE_MODEL=${BASE_MODEL:-'Qwen/Qwen2.5-3B'}
-AGENTGYM_HOST=${AGENTGYM_HOST:-'127.0.0.1'}
+AGENTGYM_HOST=${AGENTGYM_HOST:-'0.0.0.0'} # Default to 0.0.0.0 for external access
 AGENTGYM_SQL_BIRD_PATH=${AGENTGYM_SQL_BIRD_PATH:-} # Used only for sqlgym

 # --- Argument Parsing ---
 usage() {
-    echo "Usage: $0 --env_name <env_name> [--port <port>] [--data_dir <data_dir>] [--exp_name_suffix <suffix>]"
+    echo "Usage: $0 --env_name <env_name> [--num_servers <n>] [--base_port <port>] [--data_dir <data_dir>] [--exp_name_suffix <suffix>]"
     echo "Supported env_names: webshop, webarena, maze, wordle, alfworld, sciworld, babyai, textcraft, weather, movie, academia, todo, sheet, sqlgym"
+    echo "  --num_servers: Number of parallel AgentGym servers to launch (default: 1)."
+    echo "  --base_port: Starting port number for servers (default varies by env)."
+    echo "Assumes dedicated conda environments like 'agentenv-webshop' are already created and set up."
     exit 1
 }

-AGENTGYM_ENV_NAME="webshop"
-AGENTGYM_PORT_OVERRIDE=""
+AGENTGYM_ENV_NAME="webshop" # Default environment
+NUM_SERVERS=1 # Default number of servers
+BASE_PORT_OVERRIDE=""
 DATA_DIR_OVERRIDE=""
 EXP_NAME_SUFFIX=""

@@ -23,73 +27,73 @@
 while [[ $# -gt 0 ]]; do
     key="$1"
     case $key in
         --env_name)
-            AGENTGYM_ENV_NAME="$2"
-            shift; shift;;
-        --port)
-            AGENTGYM_PORT_OVERRIDE="$2"
-            shift; shift;;
+            AGENTGYM_ENV_NAME="$2"; shift; shift;;
+        --num_servers)
+            NUM_SERVERS="$2"; shift; shift;;
+        --base_port) # Changed from --port to --base_port
+            BASE_PORT_OVERRIDE="$2"; shift; shift;;
         --data_dir)
-            DATA_DIR_OVERRIDE="$2"
-            shift; shift;;
+            DATA_DIR_OVERRIDE="$2"; shift; shift;;
         --exp_name_suffix)
-            EXP_NAME_SUFFIX="_$2"
-            shift; shift;;
+            EXP_NAME_SUFFIX="_$2"; shift; shift;;
         *)
-            echo "Unknown option: $1"
-            usage;;
+            echo "Unknown option: $1"; usage;;
     esac
 done

-if [ -z "$AGENTGYM_ENV_NAME" ]; then
-    echo "Error: --env_name is required."
+if ! [[ "$NUM_SERVERS" =~ ^[1-9][0-9]*$ ]]; then
+    echo "Error: --num_servers must be a positive integer."
     usage
 fi
+
+if [ -z "$AGENTGYM_ENV_NAME" ]; then
+    echo "Error: --env_name is required."; usage
+fi
+
+# --- Determine Base Environment (where verl runs) ---
+BASE_CONDA_ENV=${CONDA_DEFAULT_ENV:-openmanus-rl}
+echo "[Info] Detected base conda environment: $BASE_CONDA_ENV"
+echo "[Info] Verl trainer will run in this environment."
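+# Design note: each AgentGym server runs inside its dedicated per-environment conda
+# env (e.g. 'agentenv-webshop') launched via `conda run`, while the verl trainer stays
+# in the base env detected above; the two sides communicate only over HTTP.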
+ # --- Environment Specific Setup --- LAUNCH_CMD="" -DEFAULT_PORT="" +DEFAULT_BASE_PORT="" # Renamed from DEFAULT_PORT URL_PATH="" case $AGENTGYM_ENV_NAME in webshop) LAUNCH_CMD="webshop --host $AGENTGYM_HOST --port \$AGENTGYM_PORT" - DEFAULT_PORT=36001;; + DEFAULT_BASE_PORT=36001;; webarena) LAUNCH_CMD="webarena --host $AGENTGYM_HOST --port \$AGENTGYM_PORT" - DEFAULT_PORT=8000;; + DEFAULT_BASE_PORT=8000;; maze) LAUNCH_CMD="lmrlgym --host $AGENTGYM_HOST --port \$AGENTGYM_PORT" - DEFAULT_PORT=36001 - URL_PATH="/maze/";; + DEFAULT_BASE_PORT=36001; URL_PATH="/maze/";; wordle) LAUNCH_CMD="lmrlgym --host $AGENTGYM_HOST --port \$AGENTGYM_PORT" - DEFAULT_PORT=36001 - URL_PATH="/wordle/";; + DEFAULT_BASE_PORT=36001; URL_PATH="/wordle/";; alfworld) LAUNCH_CMD="alfworld --host $AGENTGYM_HOST --port \$AGENTGYM_PORT" - DEFAULT_PORT=36001;; + DEFAULT_BASE_PORT=36001;; sciworld) LAUNCH_CMD="sciworld --host $AGENTGYM_HOST --port \$AGENTGYM_PORT" - DEFAULT_PORT=36001;; + DEFAULT_BASE_PORT=36001;; babyai) LAUNCH_CMD="babyai --host $AGENTGYM_HOST --port \$AGENTGYM_PORT" - DEFAULT_PORT=36001;; + DEFAULT_BASE_PORT=36001;; textcraft) LAUNCH_CMD="textcraft --host $AGENTGYM_HOST --port \$AGENTGYM_PORT" - DEFAULT_PORT=36001;; + DEFAULT_BASE_PORT=36001;; weather|movie|academia|todo|sheet) - LAUNCH_CMD="\$AGENTGYM_ENV_NAME --host $AGENTGYM_HOST --port \$AGENTGYM_PORT" - DEFAULT_PORT=8000;; + LAUNCH_CMD="\\\$AGENTGYM_ENV_NAME --host $AGENTGYM_HOST --port \\\$AGENTGYM_PORT" # Escaped env name var + DEFAULT_BASE_PORT=8000;; sqlgym) - if [ -z "$AGENTGYM_SQL_BIRD_PATH" ]; then - echo "Error: AGENTGYM_SQL_BIRD_PATH environment variable must be set for sqlgym." - exit 1 - fi - LAUNCH_CMD="AGENTENV_SQLGYM_BIRD_PATH=$AGENTGYM_SQL_BIRD_PATH sqlgym --host $AGENTGYM_HOST --port \$AGENTGYM_PORT" - DEFAULT_PORT=36002;; + if [ -z "$AGENTGYM_SQL_BIRD_PATH" ]; then echo "Error: AGENTGYM_SQL_BIRD_PATH must be set for sqlgym."; exit 1; fi + LAUNCH_CMD="AGENTENV_SQLGYM_BIRD_PATH=$AGENTGYM_SQL_BIRD_PATH sqlgym --host $AGENTGYM_HOST --port \\\$AGENTGYM_PORT" + DEFAULT_BASE_PORT=36002;; *) - echo "Error: Unsupported environment name '$AGENTGYM_ENV_NAME'" - usage;; + echo "Error: Unsupported environment name '$AGENTGYM_ENV_NAME'"; usage;; esac # --- Environment Dependency Installation (in‑place) --- @@ -119,57 +123,129 @@ else echo "[Setup] WARNING: $ENV_SETUP_DIR not found; skipping env‑specific installation." fi -export AGENTGYM_PORT=${AGENTGYM_PORT_OVERRIDE:-$DEFAULT_PORT} -FINAL_LAUNCH_CMD=$(eval echo $LAUNCH_CMD) +# --- Start AgentGym Servers in Dedicated Environment --- +TARGET_ENV_NAME="agentenv-${AGENTGYM_ENV_NAME}" +AGENTGYM_PIDS=() # Array to store PIDs +AGENTGYM_PORTS=() # Array to store ports -# --- Data & Experiment Naming --- -export DATA_DIR=${DATA_DIR_OVERRIDE:-"data/$AGENTGYM_ENV_NAME"} -export EXPERIMENT_NAME="OpenManus-rl-grpo-${BASE_MODEL##*/}-${AGENTGYM_ENV_NAME}${EXP_NAME_SUFFIX}" +# Check if target env exists +if ! conda env list | grep -Eq "^${TARGET_ENV_NAME}\\s"; then + echo "[Error] Dedicated environment '$TARGET_ENV_NAME' not found. Please create it first." + exit 1 +fi + +# Determine base port +AGENTGYM_BASE_PORT=${BASE_PORT_OVERRIDE:-$DEFAULT_BASE_PORT} + +echo -e "\\n[Server] Starting $NUM_SERVERS AgentGym server(s) for ${AGENTGYM_ENV_NAME} in env '$TARGET_ENV_NAME'..." 
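+# Port layout: server instance i listens on AGENTGYM_BASE_PORT + i, so NUM_SERVERS
+# servers occupy the consecutive ports [BASE_PORT, BASE_PORT + NUM_SERVERS - 1].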
+echo "[Server] Base Port: ${AGENTGYM_BASE_PORT}" + +# Create logs directory +mkdir -p logs + +for (( i=0; i<$NUM_SERVERS; i++ )); do + # Calculate port for this server instance + export AGENTGYM_PORT=$((AGENTGYM_BASE_PORT + i)) + AGENTGYM_PORTS+=($AGENTGYM_PORT) # Store port + + # Prepare the specific launch command for this instance + CURRENT_LAUNCH_CMD=$(eval echo $LAUNCH_CMD) # Substitute $AGENTGYM_PORT + + echo "[Server $(($i+1))/$NUM_SERVERS] Launching on ${AGENTGYM_HOST}:${AGENTGYM_PORT}..." + echo "[Server $(($i+1))/$NUM_SERVERS] Command: $CURRENT_LAUNCH_CMD" + + # Run server in background using conda run + LOG_FILE="logs/${TARGET_ENV_NAME}_server_${AGENTGYM_PORT}.log" + echo "[Server $(($i+1))/$NUM_SERVERS] Logging to $LOG_FILE" -# --- Start AgentGym Server --- -echo "Starting AgentGym server for ${AGENTGYM_ENV_NAME} on ${AGENTGYM_HOST}:${AGENTGYM_PORT} ..." -echo "Launch command: $FINAL_LAUNCH_CMD" -$FINAL_LAUNCH_CMD & -AGENTGYM_PID=$! -echo "AgentGym server started with PID: $AGENTGYM_PID" + # Use bash -c to handle potential env vars in launch cmd + conda run --no-capture-output -n "$TARGET_ENV_NAME" bash -c "$CURRENT_LAUNCH_CMD" > "$LOG_FILE" 2>&1 & + PID=$! -sleep 10 -if ! kill -0 $AGENTGYM_PID > /dev/null 2>&1; then - echo "AgentGym server failed to start." + # Check if PID was obtained + if [ -z "$PID" ]; then + echo "[Error] Failed to get PID for AgentGym server instance $i on port $AGENTGYM_PORT." + # Attempt to kill already launched servers before exiting + for p in "${AGENTGYM_PIDS[@]}"; do kill $p 2>/dev/null; done + exit 1 + fi + AGENTGYM_PIDS+=($PID) # Store PID + echo "[Server $(($i+1))/$NUM_SERVERS] Launched (PID: $PID)." + sleep 2 # Small delay between starting servers +done + +# --- Wait and Check Servers --- +echo "[Server] Waiting for AgentGym servers (${AGENTGYM_PIDS[*]}) to initialize..." +sleep 15 # Adjust sleep time if needed + +# Check if all server processes are still running +ALL_SERVERS_RUNNING=true +for PID in "${AGENTGYM_PIDS[@]}"; do + if ! kill -0 $PID > /dev/null 2>&1; then + echo "[Error] AgentGym server (PID: $PID) failed to start or exited prematurely." + # Attempt to find the corresponding log file (this is a bit heuristic) + PORT=$(grep -oP -- "--port\\s+\\K\\d+" "logs/"*"${PID}"* 2>/dev/null || echo "unknown") + echo "[Error] Check server log potentially named logs/${TARGET_ENV_NAME}_server_${PORT}.log or similar." + ALL_SERVERS_RUNNING=false + fi +done + +if [ "$ALL_SERVERS_RUNNING" = false ]; then + echo "[Error] Not all servers started successfully. Exiting." + # Kill remaining servers + for p in "${AGENTGYM_PIDS[@]}"; do kill $p 2>/dev/null; done exit 1 fi -trap "echo 'Stopping AgentGym server (PID: $AGENTGYM_PID)...'; kill $AGENTGYM_PID" EXIT +echo "[Server] All AgentGym servers appear to be running." + +# Setup trap to kill all server processes on script exit/interrupt +trap "echo '[Cleanup] Stopping AgentGym servers (PIDs: ${AGENTGYM_PIDS[*]})...'; kill ${AGENTGYM_PIDS[*]} 2>/dev/null || echo '[Cleanup] Servers already stopped.'; wait ${AGENTGYM_PIDS[*]} 2>/dev/null" EXIT -# --- Run GRPO Training --- +# --- Data and Experiment Naming --- +export DATA_DIR=${DATA_DIR_OVERRIDE:-"data/$AGENTGYM_ENV_NAME"} # Default data dir based on env name +export EXPERIMENT_NAME="OpenManus-rl-grpo-${BASE_MODEL##*/}-${AGENTGYM_ENV_NAME}${EXP_NAME_SUFFIX}" + +# --- Run GRPO Training in Base Environment --- +echo -e "\\n[Trainer] Running GRPO training in base environment '$BASE_CONDA_ENV'..." 
export VLLM_ATTENTION_BACKEND=${VLLM_ATTENTION_BACKEND:-XFORMERS} -AGENTGYM_SERVER_BASE="http://$AGENTGYM_HOST" -if [ -n "$URL_PATH" ]; then - AGENTGYM_SERVER_BASE="$AGENTGYM_SERVER_BASE$URL_PATH" -fi +# Construct server base URL, adding path if needed +AGENTGYM_SERVER_BASE="http://$AGENTGYM_HOST" # Base URL without port +# Construct the list of ports as a comma-separated string for OmegaConf +AGENTGYM_PORTS_STR=$(IFS=,; echo "${AGENTGYM_PORTS[*]}") -echo "Using Data Directory: $DATA_DIR" -echo "Experiment Name: $EXPERIMENT_NAME" -echo "AgentGym Base URL: $AGENTGYM_SERVER_BASE:$AGENTGYM_PORT" +echo "[Trainer] Using Data Directory: $DATA_DIR" +echo "[Trainer] Experiment Name: $EXPERIMENT_NAME" +echo "[Trainer] AgentGym Base URL: $AGENTGYM_SERVER_BASE" +echo "[Trainer] AgentGym Ports: $AGENTGYM_PORTS_STR" # Pass list of ports +# Check if train/test files exist TRAIN_FILE="$DATA_DIR/train.parquet" TEST_FILE="$DATA_DIR/test.parquet" if [ ! -f "$TRAIN_FILE" ]; then - echo "Warning: Train file not found at $TRAIN_FILE" + echo "[Warning] Train file not found at $TRAIN_FILE. Ensure data generation script was run for $AGENTGYM_ENV_NAME." fi if [ ! -f "$TEST_FILE" ]; then - echo "Warning: Test file not found at $TEST_FILE" + echo "[Warning] Test file not found at $TEST_FILE. Ensure data generation script was run for $AGENTGYM_ENV_NAME." fi -mkdir -p logs +# Ensure base environment is activated correctly for trainer +echo "[Trainer] Ensuring base environment '$BASE_CONDA_ENV' is active..." +CONDA_BASE=$(conda info --base) +source "${CONDA_BASE}/etc/profile.d/conda.sh" +conda activate "$BASE_CONDA_ENV" || { echo "Error: Failed to activate base env '$BASE_CONDA_ENV'"; exit 1; } + +TRAINER_LOG_FILE="logs/${EXPERIMENT_NAME}.log" +echo "[Trainer] Logging trainer output to $TRAINER_LOG_FILE" +echo "[Trainer] Starting GRPO training..." PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \ data.train_files=$TRAIN_FILE \ data.val_files=$TEST_FILE \ data.env_name=$AGENTGYM_ENV_NAME \ data.env_server_base=$AGENTGYM_SERVER_BASE \ - data.env_port=$AGENTGYM_PORT \ + data.env_ports=[${AGENTGYM_PORTS_STR}] \ data.train_data_num=null \ data.val_data_num=null \ data.train_batch_size=512 \ @@ -218,10 +294,12 @@ PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \ trainer.default_hdfs_dir=null \ trainer.default_local_dir=verl_checkpoints/$EXPERIMENT_NAME \ max_turns=2 \ - 2>&1 | tee logs/${EXPERIMENT_NAME}.log + 2>&1 | tee "$TRAINER_LOG_FILE" # Log trainer output + +TRAINER_EXIT_CODE=$? + +echo "GRPO training finished with exit code $TRAINER_EXIT_CODE." -# Clean up server -kill $AGENTGYM_PID -wait $AGENTGYM_PID 2>/dev/null +# Cleanup is handled by the trap -echo "AgentGym server stopped." 
\ No newline at end of file +exit $TRAINER_EXIT_CODE \ No newline at end of file diff --git a/train_ppo.sh b/train_ppo.sh index c574e580..1cd5f11d 100644 --- a/train_ppo.sh +++ b/train_ppo.sh @@ -3,7 +3,7 @@ # --- Configuration (defaults, can be overridden via env vars) --- export CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-0,1,2,3,4,5,6,7} WAND_PROJECT=${WAND_PROJECT:-'OpenManus-rl'} -export BASE_MODEL=${BASE_MODEL:-'meta-llama/Llama-3.2-3B'} +export BASE_MODEL=${BASE_MODEL:-'Qwen/Qwen2.5-3B'} AGENTGYM_HOST=${AGENTGYM_HOST:-'0.0.0.0'} # Default to 0.0.0.0 for external access AGENTGYM_SQL_BIRD_PATH=${AGENTGYM_SQL_BIRD_PATH:-} # Used only for sqlgym @@ -103,7 +103,7 @@ esac # --- Start AgentGym Servers in Dedicated Environment --- TARGET_ENV_NAME="agentenv-${AGENTGYM_ENV_NAME}" -AGENTGYM_PIDS=() # Array to store PIDs +AGENTGYM_PGIDS=() # Array to store PGIDs (changed from PIDS) AGENTGYM_PORTS=() # Array to store ports # Check if target env exists @@ -132,55 +132,94 @@ for (( i=0; i<$NUM_SERVERS; i++ )); do echo "[Server $(($i+1))/$NUM_SERVERS] Launching on ${AGENTGYM_HOST}:${AGENTGYM_PORT}..." echo "[Server $(($i+1))/$NUM_SERVERS] Command: $CURRENT_LAUNCH_CMD" - # Run server in background using conda run + # Run server in background using conda run within a new process group (setsid) LOG_FILE="logs/${TARGET_ENV_NAME}_server_${AGENTGYM_PORT}.log" echo "[Server $(($i+1))/$NUM_SERVERS] Logging to $LOG_FILE" - # Use bash -c to handle potential env vars in launch cmd (like for sqlgym) - conda run --no-capture-output -n "$TARGET_ENV_NAME" bash -c "$CURRENT_LAUNCH_CMD" > "$LOG_FILE" 2>&1 & - PID=$! + # Use setsid to ensure the server runs in its own process group + setsid conda run --no-capture-output -n "$TARGET_ENV_NAME" bash -c "$CURRENT_LAUNCH_CMD" > "$LOG_FILE" 2>&1 & + PGID=$! # PID of setsid becomes the Process Group ID - # Check if PID was obtained - if [ -z "$PID" ]; then - echo "[Error] Failed to get PID for AgentGym server instance $i on port $AGENTGYM_PORT." + # Check if PGID was obtained + if [ -z "$PGID" ]; then + echo "[Error] Failed to get PGID for AgentGym server instance $i on port $AGENTGYM_PORT." # Attempt to kill already launched servers before exiting - for p in "${AGENTGYM_PIDS[@]}"; do kill $p 2>/dev/null; done + for pgid in "${AGENTGYM_PGIDS[@]}"; do kill -- -$pgid 2>/dev/null; done # Kill process group exit 1 fi - AGENTGYM_PIDS+=($PID) # Store PID - echo "[Server $(($i+1))/$NUM_SERVERS] Launched (PID: $PID)." + AGENTGYM_PGIDS+=($PGID) # Store PGID + echo "[Server $(($i+1))/$NUM_SERVERS] Launched (PGID: $PGID)." sleep 2 # Small delay between starting servers done # --- Wait and Check Servers --- -echo "[Server] Waiting for AgentGym servers (${AGENTGYM_PIDS[*]}) to initialize..." -sleep 15 # Adjust sleep time if needed - -# Check if all server processes are still running +echo "[Server] Checking if AgentGym servers (${AGENTGYM_PORTS[*]}) are responsive..." ALL_SERVERS_RUNNING=true -for PID in "${AGENTGYM_PIDS[@]}"; do - if ! kill -0 $PID > /dev/null 2>&1; then - echo "[Error] AgentGym server (PID: $PID) failed to start or exited prematurely." - # Attempt to find the corresponding log file (this is a bit heuristic) - PORT=$(grep -oP -- "--port\\s+\\K\\d+" "logs/"*"${PID}"* 2>/dev/null || echo "unknown") - echo "[Error] Check server log potentially named logs/${TARGET_ENV_NAME}_server_${PORT}.log or similar." 
+MAX_RETRIES=5 # Number of times to check each server +RETRY_DELAY=3 # Seconds to wait between retries +CONNECT_TIMEOUT=1 # Seconds for nc connection timeout + +for (( i=0; i<${#AGENTGYM_PORTS[@]}; i++ )); do + PORT=${AGENTGYM_PORTS[i]} + PGID=${AGENTGYM_PGIDS[i]} # Corresponding PGID for logging/debug + LOG_FILE="logs/${TARGET_ENV_NAME}_server_${PORT}.log" + SERVER_UP=false + + # Determine host to check (use localhost if host is 0.0.0.0) + CHECK_HOST=$AGENTGYM_HOST + if [ "$CHECK_HOST" == "0.0.0.0" ]; then + CHECK_HOST="127.0.0.1" + fi + + echo "[Server Check] Checking server on ${CHECK_HOST}:${PORT} (PGID: $PGID)..." + for (( attempt=1; attempt<=$MAX_RETRIES; attempt++ )); do + # Use netcat (nc) to check if port is open. -z: zero-I/O mode, -w: timeout + # Redirect errors to /dev/null to avoid clutter + if nc -z -w $CONNECT_TIMEOUT "$CHECK_HOST" "$PORT" > /dev/null 2>&1; then + echo "[Server Check] Server on port $PORT is responsive." + SERVER_UP=true + break # Exit retry loop for this server + else + if [ $attempt -lt $MAX_RETRIES ]; then + echo "[Server Check] Server on port $PORT not responsive (Attempt $attempt/$MAX_RETRIES). Retrying in $RETRY_DELAY seconds..." + sleep $RETRY_DELAY + else + echo "[Error] Server on port $PORT (PGID: $PGID) failed to respond after $MAX_RETRIES attempts." + echo "[Error] Check server log for details: $LOG_FILE" + fi + fi + done + + if [ "$SERVER_UP" = false ]; then ALL_SERVERS_RUNNING=false + # No need to check remaining servers if one failed + break fi done if [ "$ALL_SERVERS_RUNNING" = false ]; then - echo "[Error] Not all servers started successfully. Exiting." - # Kill remaining servers - for p in "${AGENTGYM_PIDS[@]}"; do kill $p 2>/dev/null; done - exit 1 + echo "[Error] Not all AgentGym servers started successfully or became responsive. Initiating cleanup..." + # Manually trigger cleanup for potentially started PGIDs before exiting + # We duplicate part of the trap logic here for immediate cleanup on check failure + CLEANUP_PGIDS_ON_FAIL=(${AGENTGYM_PGIDS[*]}); + for pgid_fail in "${CLEANUP_PGIDS_ON_FAIL[@]}"; do + echo "[Cleanup] Killing process group -$pgid_fail due to failed startup check." + kill -- -$pgid_fail 2>/dev/null; + done + wait 2>/dev/null # Give kill commands a moment + echo "[Error] Exiting script due to server startup failure." + exit 1 # Exit with error code fi -echo "[Server] All AgentGym servers appear to be running." -# Setup trap to kill all server processes on script exit/interrupt -trap "echo '[Cleanup] Stopping AgentGym servers (PIDs: ${AGENTGYM_PIDS[*]})...'; kill ${AGENTGYM_PIDS[*]} 2>/dev/null || echo '[Cleanup] Servers already stopped.'; wait ${AGENTGYM_PIDS[*]} 2>/dev/null" EXIT +echo "[Server] All AgentGym servers appear to be responsive and running." 
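+# Why process groups: `conda run` spawns the actual server as a child process, so
+# killing only the launcher PID can orphan the server. `setsid` above makes each
+# launcher a process-group leader, and the trap below uses `kill -- -PGID` to tear
+# down the entire group.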
+ + +# Setup trap to kill all server process groups on script exit/interrupt +# Note the use of kill -- -$pgid to target the entire process group +trap "echo '[Cleanup] Stopping AgentGym server process groups (PGIDs: ${AGENTGYM_PGIDS[*]})...'; CLEANUP_PGIDS=(${AGENTGYM_PGIDS[*]}); for pgid in \${CLEANUP_PGIDS[@]}; do echo '[Cleanup] Killing process group -\$pgid'; kill -- -\$pgid 2>/dev/null; done; wait 2>/dev/null; echo '[Cleanup] Done.'" EXIT # --- Data and Experiment Naming --- -export DATA_DIR=${DATA_DIR_OVERRIDE:-"data/$AGENTGYM_ENV_NAME"} # Default data dir based on env name +export DATA_DIR=${DATA_DIR_OVERRIDE:-"./data/$AGENTGYM_ENV_NAME"} # Default data dir based on env name export EXPERIMENT_NAME="OpenManus-rl-ppo-${BASE_MODEL##*/}-${AGENTGYM_ENV_NAME}${EXP_NAME_SUFFIX}" @@ -203,6 +242,9 @@ echo "[Trainer] AgentGym Ports: $AGENTGYM_PORTS_STR" # Pass list of ports TRAIN_FILE="$DATA_DIR/train.parquet" TEST_FILE="$DATA_DIR/test.parquet" +echo "[Trainer] Train file: $TRAIN_FILE" +echo "[Trainer] Test file: $TEST_FILE" + if [ ! -f "$TRAIN_FILE" ]; then echo "[Warning] Train file not found at $TRAIN_FILE. Ensure data generation script was run for $AGENTGYM_ENV_NAME." fi @@ -229,69 +271,76 @@ TRAINER_LOG_FILE="logs/${EXPERIMENT_NAME}.log" echo "[Trainer] Logging trainer output to $TRAINER_LOG_FILE" echo "[Trainer] Starting PPO training..." -PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \\ - data.train_files=$TRAIN_FILE \\ - data.val_files=$TEST_FILE \\ - data.env_name=$AGENTGYM_ENV_NAME \\ - data.env_server_base=$AGENTGYM_SERVER_BASE \\ - data.env_ports=[${AGENTGYM_PORTS_STR}] \\ // Pass ports as a list - data.train_data_num=null \\ - data.val_data_num=null \\ - data.train_batch_size=512 \\ - data.val_batch_size=256 \\ - data.max_prompt_length=4096 \\ - data.max_response_length=500 \\ - data.max_start_length=2048 \\ - data.max_obs_length=500 \\ - data.shuffle_train_dataloader=True \\ - algorithm.adv_estimator=gae \\ - actor_rollout_ref.model.path=$BASE_MODEL \\ - actor_rollout_ref.actor.optim.lr=1e-6 \\ - actor_rollout_ref.model.enable_gradient_checkpointing=true \\ - actor_rollout_ref.model.use_remove_padding=True \\ - actor_rollout_ref.actor.optim.lr_warmup_steps_ratio=0.95 \\ - actor_rollout_ref.actor.ppo_mini_batch_size=256 \\ - actor_rollout_ref.actor.ppo_micro_batch_size=64 \\ - actor_rollout_ref.actor.fsdp_config.param_offload=true \\ - actor_rollout_ref.actor.fsdp_config.grad_offload=true \\ - actor_rollout_ref.actor.fsdp_config.optimizer_offload=true \\ - actor_rollout_ref.rollout.log_prob_micro_batch_size=128 \\ - actor_rollout_ref.rollout.tensor_model_parallel_size=1 \\ - actor_rollout_ref.rollout.name=vllm \\ - actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \\ - actor_rollout_ref.ref.log_prob_micro_batch_size=128 \\ - actor_rollout_ref.ref.fsdp_config.param_offload=True \\ - actor_rollout_ref.rollout.n_agent=1 \\ - actor_rollout_ref.rollout.temperature=1 \\ - actor_rollout_ref.actor.state_masking=true \\ - critic.optim.lr=1e-5 \\ - critic.model.use_remove_padding=True \\ - critic.optim.lr_warmup_steps_ratio=0.05 \\ - critic.model.path=$BASE_MODEL \\ - critic.model.enable_gradient_checkpointing=true \\ - critic.ppo_micro_batch_size=8 \\ - critic.model.fsdp_config.param_offload=true \\ - critic.model.fsdp_config.grad_offload=true \\ - critic.model.fsdp_config.optimizer_offload=true \\ - algorithm.kl_ctrl.kl_coef=0.001 \\ - algorithm.no_think_rl=false \\ - algorithm.reward_score_fn=agentgym \\ - trainer.critic_warmup=0 \\ - trainer.logger=['wandb'] \\ - 
+trainer.val_only=false \\ - +trainer.val_before_train=true \\ - trainer.default_hdfs_dir=null \\ - trainer.n_gpus_per_node=8 \\ - trainer.nnodes=1 \\ - trainer.save_freq=100 \\ - trainer.test_freq=50 \\ - trainer.project_name=$WAND_PROJECT \\ - trainer.experiment_name=$EXPERIMENT_NAME \\ - trainer.total_epochs=15 \\ - trainer.total_training_steps=305 \\ - trainer.default_hdfs_dir=null \\ - trainer.default_local_dir=verl_checkpoints/$EXPERIMENT_NAME \\ - max_turns=2 \\ +# --- Construct Hydra Overrides Array --- +hydra_overrides=( + "data.train_files=$TRAIN_FILE" + "data.val_files=$TEST_FILE" + "data.env_name=$AGENTGYM_ENV_NAME" + "data.env_server_base=$AGENTGYM_SERVER_BASE" + "data.env_ports=[${AGENTGYM_PORTS_STR}]" + "data.train_data_num=null" + "data.val_data_num=null" + "data.train_batch_size=512" + "data.val_batch_size=2" + "data.max_prompt_length=4096" + "data.max_response_length=500" + "data.max_start_length=2048" + "data.max_obs_length=500" + "data.shuffle_train_dataloader=True" + "algorithm.adv_estimator=gae" + "actor_rollout_ref.model.path=$BASE_MODEL" + "actor_rollout_ref.actor.optim.lr=1e-6" + "actor_rollout_ref.model.enable_gradient_checkpointing=true" + "actor_rollout_ref.model.use_remove_padding=True" + "actor_rollout_ref.actor.optim.lr_warmup_steps_ratio=0.95" + "actor_rollout_ref.actor.ppo_mini_batch_size=256" + "actor_rollout_ref.actor.ppo_micro_batch_size=64" + "actor_rollout_ref.actor.fsdp_config.param_offload=true" + "actor_rollout_ref.actor.fsdp_config.grad_offload=true" + "actor_rollout_ref.actor.fsdp_config.optimizer_offload=true" + "actor_rollout_ref.rollout.log_prob_micro_batch_size=128" + "actor_rollout_ref.rollout.tensor_model_parallel_size=1" + "actor_rollout_ref.rollout.name=vllm" + "actor_rollout_ref.rollout.gpu_memory_utilization=0.6" + "actor_rollout_ref.ref.log_prob_micro_batch_size=128" + "actor_rollout_ref.ref.fsdp_config.param_offload=True" + "actor_rollout_ref.rollout.n_agent=1" + "actor_rollout_ref.rollout.temperature=1" + "actor_rollout_ref.actor.state_masking=true" + "critic.optim.lr=1e-5" + "critic.model.use_remove_padding=True" + "critic.optim.lr_warmup_steps_ratio=0.05" + "critic.model.path=$BASE_MODEL" + "critic.model.enable_gradient_checkpointing=true" + "critic.ppo_micro_batch_size=8" + "critic.model.fsdp_config.param_offload=true" + "critic.model.fsdp_config.grad_offload=true" + "critic.model.fsdp_config.optimizer_offload=true" + "algorithm.kl_ctrl.kl_coef=0.001" + "algorithm.no_think_rl=false" + "algorithm.reward_score_fn=agentgym" + "trainer.critic_warmup=0" + "trainer.logger=['wandb']" + "+trainer.val_only=false" + "+trainer.val_before_train=true" + "trainer.default_hdfs_dir=null" + "trainer.n_gpus_per_node=8" + "trainer.nnodes=1" + "trainer.save_freq=100" + "trainer.test_freq=50" + "trainer.project_name=$WAND_PROJECT" + "trainer.experiment_name=$EXPERIMENT_NAME" + "trainer.total_epochs=15" + "trainer.total_training_steps=305" + "trainer.default_hdfs_dir=null" + "trainer.default_local_dir=verl_checkpoints/$EXPERIMENT_NAME" + "max_turns=2" +) + +# --- Execute Python Training Script --- +PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \ + --config-name ppo_trainer --config-path config \ + "${hydra_overrides[@]}" \ 2>&1 | tee "$TRAINER_LOG_FILE" # Log trainer output TRAINER_EXIT_CODE=$? 
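Both training scripts now hand the trainer a list of server ports (`data.env_ports=[...]`) instead of a single `data.env_port`. The sketch below shows one way the agent side could fan rollouts out across those servers; `EnvClient` and the round-robin policy are illustrative assumptions, not the actual `agentenv` client API.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Dict, List

@dataclass
class EnvClient:
    """Stand-in for the real agentenv HTTP client (illustrative only)."""
    env_server_base: str
    env_name: str

def build_clients(server_base: str, ports: List[int], env_name: str) -> List[EnvClient]:
    # One client per launched server, e.g. http://127.0.0.1:36001, :36002, ...
    return [EnvClient(env_server_base=f"{server_base}:{p}", env_name=env_name) for p in ports]

def assign_rollouts(task_indices: List[int], clients: List[EnvClient]) -> Dict[int, EnvClient]:
    # Round-robin: task i is served by clients[i % len(clients)], keeping the
    # per-server load roughly even when max_workers threads run rollouts in parallel.
    pool = cycle(clients)
    return {idx: next(pool) for idx in task_indices}

# Example with the defaults above: two webshop servers on ports 36001 and 36002.
clients = build_clients("http://127.0.0.1", [36001, 36002], "webshop")
assignment = assign_rollouts(list(range(8)), clients)
```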
diff --git a/verl/trainer/config/ppo_trainer.yaml b/verl/trainer/config/ppo_trainer.yaml index d389a3cb..5a09d893 100644 --- a/verl/trainer/config/ppo_trainer.yaml +++ b/verl/trainer/config/ppo_trainer.yaml @@ -14,6 +14,10 @@ data: return_raw_input_ids: False # This should be set to true when the tokenizer between policy and rm differs return_raw_chat: False shuffle_train_dataloader: True + env_name: webshop # Default environment name + env_server_base: "http://0.0.0.0" # Default server base URL + env_ports: [36001] # Default port list + env_data_len: 200 # Default data length actor_rollout_ref: hybrid_engine: True @@ -150,11 +154,33 @@ algorithm: gamma: 1.0 lam: 1.0 adv_estimator: gae + reward_score_fn: agentgym # Base scoring function (or other default) no_think_rl: False kl_penalty: kl # how to estimate kl divergence kl_ctrl: type: fixed kl_coef: 0.001 + # --- New Reward Components Configuration --- + reward_components: + goal_reward: # Base reward from env success/score + enabled: true + weight: 1.0 + length_penalty: + enabled: false # Default to disabled + weight: -0.01 # Penalty weight (negative) + max_length: 450 # Max length before penalty + min_length: 20 # Min length before penalty (optional) + penalty_type: "linear" # Options: "linear", "quadratic", "log" + format_reward: + enabled: false # Default to disabled + weight: 0.2 # Reward weight (positive) + # Define patterns per environment or use a default + patterns_by_env: + webshop: ['<action>.*</action>', '<response>.*</response>'] + # alfworld: ['go to .*', 'take .* from .*', 'put .* in/on .*'] + default: [] # Default empty list if no env-specific patterns + # --- New Reward Allocation Strategy --- + reward_allocation: "last_token" # Options: "last_token", "uniform_positive", "discounted" state_masking: start_state_marker: "" end_state_marker: "" diff --git a/verl/trainer/main_ppo.py b/verl/trainer/main_ppo.py index a3400b65..2fb3610a 100644 --- a/verl/trainer/main_ppo.py +++ b/verl/trainer/main_ppo.py @@ -22,6 +22,7 @@ from verl.trainer.ppo.ray_trainer import RayPPOTrainer import re import numpy as np +from omegaconf import OmegaConf def _select_rm_score_fn(data_source): # Define the list of known AgentGym environments @@ -129,7 +130,6 @@ def main_task(config): # print initial config from pprint import pprint - from omegaconf import OmegaConf pprint(OmegaConf.to_container(config, resolve=True)) # resolve=True will eval symbol values OmegaConf.resolve(config) @@ -176,45 +176,52 @@ def main_task(config): Role.RefPolicy: global_pool_id, } - # --- Conditionally Define Reward Functions --- + # --- Conditionally Define Reward Functions --- reward_fn = None val_reward_fn = None # Define known AgentGym environments (mirroring agentgym.py or train_ppo.sh) KNOWN_AGENTGYM_ENVS = [ - "webshop", "webarena", "maze", "wordle", "alfworld", - "sciworld", "babyai", "textcraft", "weather", "movie", + "webshop", "webarena", "maze", "wordle", "alfworld", + "sciworld", "babyai", "textcraft", "weather", "movie", "academia", "todo", "sheet", "sqlgym" ] is_agentgym_run = config.data.env_name in KNOWN_AGENTGYM_ENVS - + + # --- Get Reward Component Configuration --- + # Safely get the reward components config, default to empty dict if not present + reward_component_config = OmegaConf.to_container( + config.algorithm.get('reward_components', {}), # Use .get for safety + resolve=True + ) + print(f"[main_task] Reward component configuration: {reward_component_config}") + + # --- Initialize RewardManager (if needed, e.g., for non-AgentGym) --- + # Decide if RewardManager is still needed.
With RewardComposer, its role might change + # or become obsolete if all scoring is handled by components. + # For now, let's assume it might still be used for specific datasets. if not is_agentgym_run: print("[main_task] Initializing RewardManager for non-AgentGym run.") - # Initialize RewardManager only for non-AgentGym runs - # Make sure RewardManager class definition exists above try: + # Pass reward_component_config to RewardManager if it needs it reward_fn = RewardManager(tokenizer=tokenizer, num_examine=0, format_score=config.get('format_score', 0.)) val_reward_fn = RewardManager(tokenizer=tokenizer, num_examine=1, format_score=config.get('format_score', 0.)) except NameError: - print("[main_task] Error: RewardManager class not defined. Cannot initialize reward functions.") - # Decide how to proceed - exit or continue without reward_fn? - # For now, let reward_fn remain None - pass - - # Setup RewardModel worker only if reward_model is enabled AND it's NOT AgentGym - if config.reward_model.enable: - print("[main_task] Setting up RewardModel worker.") - if config.reward_model.strategy == 'fsdp': - from verl.workers.fsdp_workers import RewardModelWorker - elif config.reward_model.strategy == 'megatron': - from verl.workers.megatron_workers import RewardModelWorker - else: - raise NotImplementedError - role_worker_mapping[Role.RewardModel] = ray.remote(RewardModelWorker) - mapping[Role.RewardModel] = global_pool_id + print("[main_task] Error: RewardManager class not defined. Skipping.") + pass # reward_fn and val_reward_fn remain None + + # --- Setup RewardModel worker (if needed) --- + # This logic remains largely the same, depends on reward_model.enable config + if config.reward_model.enable: + print("[main_task] Setting up RewardModel worker.") + # ... (rest of the RewardModel setup logic) ... + # if config.reward_model.strategy == 'fsdp': + # from verl.workers.fsdp_workers import RewardModelWorker + # ... etc ... + # role_worker_mapping[Role.RewardModel] = ray.remote(RewardModelWorker) + # mapping[Role.RewardModel] = global_pool_id else: - print(f"[main_task] AgentGym run ({config.data.env_name}) detected. Skipping RewardManager initialization.") - pass + print(f"[main_task] AgentGym run ({config.data.env_name}) or RewardModel not enabled. 
Skipping RewardManager/RewardModel worker setup.") # --- Initialize Trainer --- resource_pool_manager = ResourcePoolManager(resource_pool_spec=resource_pool_spec, mapping=mapping) @@ -225,6 +232,7 @@ def main_task(config): ray_worker_group_cls=ray_worker_group_cls, reward_fn=reward_fn, # Pass potentially None val_reward_fn=val_reward_fn, # Pass potentially None + reward_component_config=reward_component_config, # Pass the parsed config ) trainer.init_workers() trainer.fit() diff --git a/verl/trainer/ppo/ray_trainer.py b/verl/trainer/ppo/ray_trainer.py index 7c66c25f..8c0b7841 100644 --- a/verl/trainer/ppo/ray_trainer.py +++ b/verl/trainer/ppo/ray_trainer.py @@ -42,6 +42,9 @@ import re from openmanus_rl.llm_agent.openmanus import OpenManusAgent, AgentConfig from verl.utils.reward_score import SUPPORTED_REWARD_SCORE_FNS +from verl.utils.reward_score.agentgym import compute_score as agentgym_compute_score +from verl.utils.reward_score.reward_components import RewardComposer, GoalReward, LengthPenalty, FormatReward +from verl.utils.tracking import Tracking WorkerType = Type[Worker] @@ -333,7 +336,8 @@ def __init__(self, resource_pool_manager: ResourcePoolManager, ray_worker_group_cls: RayWorkerGroup = RayWorkerGroup, reward_fn=None, - val_reward_fn=None): + val_reward_fn=None, + reward_component_config: dict = None): # assert torch.cuda.is_available(), 'cuda must be available on driver' @@ -341,6 +345,7 @@ def __init__(self, self.config = config self.reward_fn = reward_fn self.val_reward_fn = val_reward_fn + self.reward_component_config = reward_component_config or {} self.hybrid_engine = config.actor_rollout_ref.hybrid_engine assert self.hybrid_engine, 'Currently, only support hybrid engine' @@ -370,14 +375,52 @@ def __init__(self, self._create_dataloader() self._init_logger() + self._init_reward_composer() def _init_logger(self): - from verl.utils.tracking import Tracking self.logger = Tracking(project_name=self.config.trainer.project_name, experiment_name=self.config.trainer.experiment_name, default_backend=self.config.trainer.logger, config=OmegaConf.to_container(self.config, resolve=True)) + def _init_reward_composer(self): + """Initializes the RewardComposer based on the configuration.""" + components = [] + cfg = self.reward_component_config + print(f"[Trainer._init_reward_composer] Initializing with config: {cfg}") + + # --- Build Reward Components List --- + # Example: Dynamically add components based on config + if cfg.get('goal_reward', {}).get('enabled', True): + components.append(GoalReward(weight=cfg['goal_reward'].get('weight', 1.0))) + print(" - Added GoalReward") + + if cfg.get('length_penalty', {}).get('enabled', False): + lp_cfg = cfg['length_penalty'] + components.append(LengthPenalty( + weight=lp_cfg.get('weight', -0.01), + max_length=lp_cfg.get('max_length', 500), + min_length=lp_cfg.get('min_length', 10), + penalty_type=lp_cfg.get('penalty_type', "linear") + )) + print(" - Added LengthPenalty") + + if cfg.get('format_reward', {}).get('enabled', False): + fmt_cfg = cfg['format_reward'] + # Get patterns specific to the current env or use default + patterns = fmt_cfg.get('patterns_by_env', {}).get( + self.config.data.env_name, # Assumes env_name is available in self.config.data + fmt_cfg.get('patterns_by_env', {}).get('default', []) + ) + components.append(FormatReward( + weight=fmt_cfg.get('weight', 0.2), + required_patterns=patterns + )) + print(f" - Added FormatReward with patterns: {patterns}") + + self.reward_composer = RewardComposer(components=components) + 
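For reference, a minimal standalone sketch of the wiring above: `main_ppo.py` flattens the `algorithm.reward_components` node into a plain dict, and `_init_reward_composer` turns that dict into component instances. The inline YAML below is a toy stand-in for `ppo_trainer.yaml` (values mirror the defaults added there); the import path is the one this diff introduces.

```python
from omegaconf import OmegaConf
from verl.utils.reward_score.reward_components import (
    GoalReward, LengthPenalty, RewardComposer,
)

# Toy stand-in for the `algorithm` section of ppo_trainer.yaml
yaml_cfg = OmegaConf.create("""
reward_components:
  goal_reward:
    enabled: true
    weight: 1.0
  length_penalty:
    enabled: true
    weight: -0.01
    max_length: 450
    min_length: 20
    penalty_type: linear
""")
cfg = OmegaConf.to_container(yaml_cfg, resolve=True)['reward_components']

# Same pattern as _init_reward_composer: only enabled components are built
components = []
if cfg.get('goal_reward', {}).get('enabled', True):
    components.append(GoalReward(weight=cfg['goal_reward'].get('weight', 1.0)))
if cfg.get('length_penalty', {}).get('enabled', False):
    lp = cfg['length_penalty']
    components.append(LengthPenalty(
        weight=lp.get('weight', -0.01),
        max_length=lp.get('max_length', 500),
        min_length=lp.get('min_length', 10),
        penalty_type=lp.get('penalty_type', 'linear'),
    ))
composer = RewardComposer(components=components)
```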
print(f"[Trainer._init_reward_composer] Composer initialized with {len(components)} components.") + def _create_dataloader(self): from torch.utils.data import DataLoader # TODO: we have to make sure the batch size is divisible by the dp size @@ -444,30 +487,21 @@ def _create_dataloader(self): def _validate(self): """ - Validation loop. + Validation loop using the RewardComposer. """ import torch - all_metrics = defaultdict(list) - all_calculated_scores = [] # Store scores calculated by score_fn - - # --- Determine Score Function --- - score_fn = None - score_fn_name = self.config.algorithm.get('reward_score_fn') - if score_fn_name and score_fn_name in SUPPORTED_REWARD_SCORE_FNS: - score_fn = SUPPORTED_REWARD_SCORE_FNS[score_fn_name] - print(f"[Trainer._validate] Using reward score function: {score_fn_name}") - else: - print(f"[Trainer._validate] No valid reward_score_fn configured ('{score_fn_name}'). Using val_reward_fn if available.") - score_fn = self.val_reward_fn # Fallback to RewardManager if passed - if score_fn: - print(f"[Trainer._validate] Using val_reward_fn (likely RewardManager).") - else: - print(f"[Trainer._validate] No score_fn or val_reward_fn available.") - - # Determine if this is an AgentGym run - is_agentgym_run = self.config.data.env_name in KNOWN_AGENTGYM_ENVS - - # Agent config preparation (remains the same) + all_metrics = defaultdict(list) + all_calculated_scores = [] # Store final composite scores + all_reward_breakdowns = defaultdict(list) # Store breakdown from composer + + # --- Determine if Validation is Possible (e.g., if composer has components) --- + can_validate = bool(self.reward_composer and self.reward_composer.components) + if not can_validate: + print("[Trainer._validate] No reward components configured in composer. Skipping validation scoring.") + # Still might run generation for qualitative checks if needed + # return {} # Or continue if other validation steps exist + + # Agent config preparation gen_config = AgentConfig( max_turns=self.config.max_turns, max_start_length=self.config.data.max_start_length, @@ -476,122 +510,114 @@ def _validate(self): max_obs_length=self.config.data.max_obs_length, num_gpus=self.config.trainer.n_gpus_per_node, env_name=self.config.data.env_name, - env_port=self.config.data.env_port, + env_ports=self.config.data.env_ports, # Use env_ports list env_server_base=self.config.data.env_server_base, env_data_len=self.config.data.get('env_data_len', 200), max_workers=self.config.actor_rollout_ref.rollout.get('max_workers', 10), + logging=self.config.get('logging') # Pass logging config ) + agent_logger = self.logger if hasattr(self, 'logger') else None generation_manager = OpenManusAgent( tokenizer=self.tokenizer, actor_rollout_wg=self.actor_rollout_wg, config=gen_config, - tool_manager=None, is_validation = True, + logger=agent_logger # Pass logger ) # --- Run Validation Loop --- for batch_dict in self.val_dataloader: timing_raw = {} test_batch: DataProto = DataProto.from_single_dict(batch_dict) - + final_batch_output = None # To store results from rollout/generation - + is_agentgym_run = self.config.data.env_name in KNOWN_AGENTGYM_ENVS # Moved check inside loop if needed + # --- Rollout/Generation --- if is_agentgym_run: - # print("[Trainer._validate] Running AgentGym/do_search path.") # Debug + # Run AgentGym loop test_gen_batch = test_batch.pop(batch_keys=['input_ids', 'attention_mask', 'position_ids']) + # Ensure idx and reward_model are present (as before) + # ... (add idx and reward_model if missing) ... 
if 'idx' not in test_gen_batch.meta_info: batch_size = test_gen_batch.batch['input_ids'].shape[0] test_gen_batch.meta_info['idx'] = torch.arange(batch_size) if 'reward_model' not in test_gen_batch.meta_info: batch_size = test_gen_batch.batch['input_ids'].shape[0] - test_gen_batch.meta_info['reward_model'] = [{} for _ in range(batch_size)] + test_gen_batch.meta_info['reward_model'] = [{} for _ in range(batch_size)] with _timer('step', timing_raw): - final_batch_output = generation_manager.run_llm_loop(gen_batch=test_gen_batch) - - else: # Original Path (Not AgentGym) - # print("[Trainer._validate] Running original/non-AgentGym path.") # Debug - # Check reward model style if needed (original logic) - # if self.config.reward_model.enable and test_batch[0].non_tensor_batch['reward_model']['style'] == 'model': - # continue - test_gen_batch = test_batch.pop(['input_ids', 'attention_mask', 'position_ids']) - test_gen_batch.meta_info = { - 'eos_token_id': self.tokenizer.eos_token_id, - 'pad_token_id': self.tokenizer.pad_token_id, - 'recompute_log_prob': False, - 'do_sample': False, - 'validate': True, - } - test_gen_batch_padded, pad_size = pad_dataproto_to_divisor(test_gen_batch, self.actor_rollout_wg.world_size) - test_output_gen_batch_padded = self.actor_rollout_wg.generate_sequences(test_gen_batch_padded) - final_batch_output = unpad_dataproto(test_output_gen_batch_padded, pad_size=pad_size) - - # --- Score Calculation (using results in final_batch_output) --- - if final_batch_output and score_fn: + # Prepare output directory if logging images + output_dir = None + if self.config.logging and self.config.logging.get('log_images'): + output_dir = os.path.join( + self.config.trainer.default_local_dir, + f"val_step_{self.global_steps}" + ) + final_batch_output = generation_manager.run_llm_loop( + gen_batch=test_gen_batch, + output_dir=output_dir, + global_steps=self.global_steps + ) + else: + # Run original generation path + # ... (original generation logic) ... 
+ print("[Trainer._validate] Non-AgentGym validation path not fully updated for RewardComposer.") + continue # Skip scoring for now + + # --- Score Calculation (using RewardComposer) --- + if can_validate and final_batch_output and final_batch_output.batch: current_batch_size = final_batch_output.batch['input_ids'].shape[0] env_name = self.config.data.env_name - + # Prepare data needed by the score function trajectories = final_batch_output.meta_info.get('rollout_trajectory', [[]] * current_batch_size) - reward_models = final_batch_output.meta_info.get('reward_model', [{}] * current_batch_size) - env_scores_from_rollout = final_batch_output.meta_info.get('env_scores', None) # Direct scores from env - + reward_models_info = final_batch_output.meta_info.get('reward_model', [{}] * current_batch_size) + batch_scores = [] for i in range(current_batch_size): - # --- Call the selected score function --- - # Check if score_fn is the agentgym one by name (or reference) - if score_fn_name == 'agentgym': - # Pass trajectory and reward_model info - score_kwargs = { - 'trajectory': trajectories[i] if i < len(trajectories) else [], - 'reward_model_info': reward_models[i] if i < len(reward_models) else {} - } - try: - score = score_fn(env_name=env_name, **score_kwargs) - batch_scores.append(score) - except Exception as e: - print(f"[Trainer._validate] Error calling score function {score_fn_name} for sample {i}: {e}") - batch_scores.append(0.0) - elif score_fn == self.val_reward_fn: # Check if it's the RewardManager - # RewardManager expects the full batch DataProto - # Reconstruct a single item DataProto for RewardManager - single_item_batch = test_batch[i].union(final_batch_output[i]) - try: - # RewardManager.__call__ returns a tensor, get the score - reward_tensor = score_fn(single_item_batch) - # Assume score is sum or last non-zero value - score = reward_tensor.sum().item() # Or other logic based on RewardManager output - batch_scores.append(score) - except Exception as e: - print(f"[Trainer._validate] Error calling val_reward_fn (RewardManager) for sample {i}: {e}") - batch_scores.append(0.0) - else: - # Handle other potential score functions if needed - print(f"[Trainer._validate] Warning: Handling for score function {score_fn_name} not implemented. Skipping.") + try: + # Use RewardComposer to get the total score and breakdown + total_score, breakdown = self.reward_composer.compute_total_reward( + trajectory=trajectories[i] if i < len(trajectories) else [], + reward_model_info=reward_models_info[i] if i < len(reward_models_info) else {}, + env_name=env_name, + # Pass any other context needed by components + ) + batch_scores.append(total_score) + # Store breakdown for aggregation + for comp_name, comp_score in breakdown.items(): + all_reward_breakdowns[f'val/{comp_name}/mean'].append(comp_score) + + except Exception as e: + print(f"[Trainer._validate] Error calling RewardComposer for sample {i}: {e}") batch_scores.append(0.0) - + # Record 0 for all components in breakdown on error? + for comp in self.reward_composer.components: + all_reward_breakdowns[f'val/{comp.name}/mean'].append(0.0) + all_calculated_scores.extend(batch_scores) - # print(f"[Trainer._validate] Calculated Batch Scores: {batch_scores}") # Debug - elif not score_fn: - print("[Trainer._validate] No score function available to calculate scores.") - - # Collect timing or other common metrics if needed - # ... 
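The breakdown bookkeeping above reduces to a simple pool-then-average pattern. A hedged sketch with toy numbers (component names and scores are illustrative):

```python
from collections import defaultdict

import numpy as np

# Two toy per-sample breakdowns as returned by compute_total_reward
breakdowns = [
    {'GoalReward': 1.0, 'LengthPenalty': -0.3},
    {'GoalReward': 0.0, 'LengthPenalty': -0.1},
]

all_reward_breakdowns = defaultdict(list)
for breakdown in breakdowns:
    for comp_name, comp_score in breakdown.items():
        all_reward_breakdowns[f'val/{comp_name}/mean'].append(comp_score)

final_metrics = {name: float(np.mean(scores))
                 for name, scores in all_reward_breakdowns.items() if scores}
print(final_metrics)
# {'val/GoalReward/mean': 0.5, 'val/LengthPenalty/mean': -0.2}
```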
+ elif not can_validate: + print("[Trainer._validate] Skipping scoring as no components are configured.") + elif not final_batch_output or not final_batch_output.batch: + print("[Trainer._validate] Skipping scoring due to empty generation output.") # --- Aggregate and Log Metrics --- final_metrics = {} if all_calculated_scores: mean_score = np.mean(all_calculated_scores) - log_key = f'val/{score_fn_name}/mean' if score_fn_name else 'val/calculated_score/mean' - final_metrics[log_key] = mean_score - print(f"[Trainer._validate] Final Mean Score ({log_key}): {mean_score}") + final_metrics['val/total_reward/mean'] = mean_score # Log the composite score + print(f"[Trainer._validate] Final Mean Composite Score: {mean_score}") + + # Aggregate and log breakdown + for name, scores in all_reward_breakdowns.items(): + if scores: + final_metrics[name] = np.mean(scores) else: print("[Trainer._validate] No validation scores collected to report.") - # ... (Fallback logging if needed) ... - # Aggregate other metrics if collected - # ... + # Add other potential validation metrics (timing, etc.) + # ... return final_metrics @@ -701,17 +727,18 @@ def _balance_batch(self, batch: DataProto, metrics, logging_prefix='global_seqle def fit(self): """ - The training loop of PPO. + The training loop of PPO, modified to use RewardComposer. """ logger = self.logger self.global_steps = 0 - + # Determine if this is an AgentGym run upfront self.is_agentgym_run = self.config.data.env_name in KNOWN_AGENTGYM_ENVS print(f"[Trainer.fit] Is AgentGym run: {self.is_agentgym_run}") # perform validation before training - if self.val_reward_fn is not None or self.config.algorithm.get('reward_score_fn') == 'agentgym': # Check if validation is possible + can_validate = bool(self.reward_composer and self.reward_composer.components) + if can_validate: if self.config.trainer.get('val_before_train', True): val_metrics = self._validate() pprint(f'Initial validation metrics: {val_metrics}') @@ -719,7 +746,7 @@ def fit(self): if self.config.trainer.get('val_only', False): return else: - print("[Trainer.fit] Skipping initial validation as no val_reward_fn or agentgym score fn is configured.") + print("[Trainer.fit] Skipping initial validation as no reward components are configured.") # we start from step 1 self.global_steps += 1 @@ -728,25 +755,26 @@ def fit(self): generation_manager = None if self.is_agentgym_run: gen_config = AgentConfig( - # ... 
(ensure all necessary AgentGym params are passed from self.config.data)
                max_turns=self.config.max_turns,
                max_start_length=self.config.data.max_start_length,
                max_prompt_length=self.config.data.max_prompt_length,
                max_response_length=self.config.data.max_response_length,
                max_obs_length=self.config.data.max_obs_length,
-                num_gpus=self.config.trainer.n_gpus_per_node,
-                env_name=self.config.data.env_name,
-                env_port=self.config.data.env_port,
+                num_gpus=self.config.trainer.n_gpus_per_node,
+                env_name=self.config.data.env_name,
+                env_ports=self.config.data.env_ports, # Use the list of ports
                env_server_base=self.config.data.env_server_base,
                env_data_len=self.config.data.get('env_data_len', 200),
                max_workers=self.config.actor_rollout_ref.rollout.get('max_workers', 10),
+                logging=self.config.get('logging') # Pass logging config
            )
+            agent_logger = self.logger if hasattr(self, 'logger') else None
            generation_manager = OpenManusAgent(
                tokenizer=self.tokenizer,
                actor_rollout_wg=self.actor_rollout_wg,
                config=gen_config,
-                tool_manager=None, # Tool manager likely not needed
-                # is_validation = False # Default
+                is_validation = False, # Training rollout; _validate constructs its own agent with True
+                logger=agent_logger # Pass logger
            )

        # start training loop
@@ -757,73 +785,79 @@
                timing_raw = {}
                batch: DataProto = DataProto.from_single_dict(batch_dict)

-                # Do NOT repeat batch here initially, repeat happens after rollout/generation if needed
-                # batch = batch.repeat(repeat_times=self.config.actor_rollout_ref.rollout.n_agent, interleave=True)
-
-                # pop those keys for generation / initial prompt
+                original_batch_size = batch.batch['input_ids'].shape[0]
                gen_batch = batch.pop(batch_keys=['input_ids', 'attention_mask', 'position_ids'])
-                if 'idx' not in gen_batch.meta_info: # Add index if missing
-                    gen_batch.meta_info['idx'] = torch.arange(gen_batch.batch['input_ids'].shape[0])
-                if 'reward_model' not in gen_batch.meta_info: # Add placeholder
-                    gen_batch.meta_info['reward_model'] = [{} for _ in range(gen_batch.batch['input_ids'].shape[0])]
+                # ... (add idx and reward_model if missing) ...
+                if 'idx' not in gen_batch.meta_info: gen_batch.meta_info['idx'] = torch.arange(original_batch_size)
+                if 'reward_model' not in gen_batch.meta_info: gen_batch.meta_info['reward_model'] = [{} for _ in range(original_batch_size)]

                ####################
                # Rollout / Generation Step
                ####################
+                final_gen_batch_output = None
                with _timer('step', timing_raw):
                    if self.is_agentgym_run:
                        # --- AgentGym Path ---
                        with _timer('gen', timing_raw):
-                            final_gen_batch_output = generation_manager.run_llm_loop(gen_batch=gen_batch)
-
-                        # Check if final_gen_batch_output is empty (e.g., error during rollout)
-                        if not final_gen_batch_output.batch:
-                            print("[Trainer.fit] Warning: AgentGym rollout returned empty batch.
Skipping step.") - continue # Skip to next training batch - - # Add log probs (needed for PPO loss) - with torch.no_grad(): - output_logp = self.actor_rollout_wg.compute_log_prob(final_gen_batch_output) - final_gen_batch_output = final_gen_batch_output.union(output_logp) - - # Merge rollout results back with original batch info (like index) - batch = batch.union(final_gen_batch_output) + # Prepare output directory if logging images + output_dir = None + if self.config.logging and self.config.logging.get('log_images'): + output_dir = os.path.join( + self.config.trainer.default_local_dir, + f"train_step_{self.global_steps}" + ) + final_gen_batch_output = generation_manager.run_llm_loop( + gen_batch=gen_batch, + output_dir=output_dir, + global_steps=self.global_steps + ) + + if not final_gen_batch_output or not final_gen_batch_output.batch: + print("[Trainer.fit] Warning: AgentGym rollout returned empty batch. Skipping step.") + continue # Skip to next training batch + + # Add log probs (needed for PPO loss calculation later) + with torch.no_grad(), _timer('logp', timing_raw): + # Need to ensure the batch passed here has the correct format + # It should contain the full sequence (prompt+response) + # The run_llm_loop might need adjustment to return this structure + # OR we compute log probs based on what run_llm_loop returns + # Assuming run_llm_loop returns a batch with keys like 'input_ids', 'attention_mask' + # representing the full trajectory for each item + if 'input_ids' in final_gen_batch_output.batch: + output_logp = self.actor_rollout_wg.compute_log_prob(final_gen_batch_output) + final_gen_batch_output = final_gen_batch_output.union(output_logp) + else: + print("[Trainer.fit] Warning: Cannot compute log probabilities, expected keys not found in rollout output.") + + # Merge rollout results back with original batch info + # Be careful about overwriting vs. union + batch = gen_batch.union(final_gen_batch_output) # Start with original gen_batch meta, add rollout results # Assign UID (can use index) - if 'index' in batch.non_tensor_batch: - batch.non_tensor_batch['uid'] = batch.non_tensor_batch['index'].copy() + if 'idx' in batch.meta_info: + batch.non_tensor_batch['uid'] = batch.meta_info['idx'].tolist() # Use list of indices as UID else: # Fallback UID - batch.non_tensor_batch['uid'] = np.array([str(uuid.uuid4()) for _ in range(len(batch.batch))], dtype=object) + batch.non_tensor_batch['uid'] = [str(uuid.uuid4()) for _ in range(batch.batch['input_ids'].shape[0])] # Generate unique IDs else: - # --- Original Path --- - # Generate sequences - with _timer('gen', timing_raw): - gen_batch_output = self.actor_rollout_wg.generate_sequences(gen_batch) - # Add log probs - with torch.no_grad(): - output_logp = self.actor_rollout_wg.compute_log_prob(gen_batch_output) - gen_batch_output = gen_batch_output.union(output_logp) - - # Assign UID - batch.non_tensor_batch['uid'] = np.array([str(uuid.uuid4()) for _ in range(len(batch.batch))], dtype=object) - # Merge generated results - batch = batch.union(gen_batch_output) + # --- Original Path (Non-AgentGym) --- + # ... (original generation logic) ... 
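The training path below computes composite scores, maps them onto `token_level_scores`, and then optionally applies the KL penalty before advantage estimation. A conceptual sketch of that penalty step with toy tensors follows; this is not verl's actual `apply_kl_penalty`, which also handles masking and KL-controller bookkeeping.

```python
import torch

kl_coef = 0.001  # matches algorithm.kl_ctrl.kl_coef in ppo_trainer.yaml
token_level_scores = torch.tensor([[0.0, 0.0, 1.2]])  # composite score on the last token
old_log_probs = torch.tensor([[-1.2, -0.7, -0.9]])    # actor log-probs at rollout time
ref_log_probs = torch.tensor([[-1.0, -0.8, -1.1]])    # frozen reference log-probs

# 'kl' estimator from the config: per-token log-prob difference
kl = old_log_probs - ref_log_probs
token_level_rewards = token_level_scores - kl_coef * kl
```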
+ print("[Trainer.fit] Non-AgentGym training path not fully updated for RewardComposer.") + continue # Skip processing for now # Apply batch repetition if configured (AFTER generation/rollout) if self.config.actor_rollout_ref.rollout.n > 1: + # Need to ensure UID handling is correct with repetition if needed by GRPO batch = batch.repeat(repeat_times=self.config.actor_rollout_ref.rollout.n, interleave=True) - + #################### - # Post-Rollout Processing + # Post-Rollout Processing (Common for both paths after merging) #################### self._balance_batch(batch, metrics=metrics) - batch.meta_info['global_token_num'] = torch.sum(batch.batch['attention_mask'], dim=-1).tolist() + # batch.meta_info['global_token_num'] = torch.sum(batch.batch['attention_mask'], dim=-1).tolist() # Recalculate if needed - # Ensure correct dtypes (mostly long, except log_probs) - for key in batch.batch.keys(): - if key != 'old_log_probs' and 'log_prob' not in key and 'rewards' not in key and 'scores' not in key: # Keep floats for rewards/scores/logprobs - if torch.is_tensor(batch.batch[key]): - batch.batch[key] = batch.batch[key].long() + # Ensure correct dtypes + # ... (dtype correction logic) ... # --- Compute Ref Log Probs --- if self.use_reference_policy: @@ -836,86 +870,101 @@ def fit(self): with _timer('values', timing_raw): values = self.critic_wg.compute_values(batch) batch = batch.union(values) - - # --- Compute Rewards & Advantages --- + + # --- Compute Composite Rewards & Advantages --- with _timer('adv', timing_raw): - # Use RM model if configured (and not AgentGym? Check logic) - if self.use_rm and not self.is_agentgym_run: # Only use RM model if NOT agentgym? - reward_tensor_rm = self.rm_wg.compute_rm_score(batch) - batch = batch.union(reward_tensor_rm) - if 'token_level_scores' not in batch.batch: - batch.batch['token_level_scores'] = reward_tensor_rm.get('rm_scores', torch.zeros_like(batch.batch['input_ids'], dtype=torch.float32)) - - # --- Get Token Level Scores/Rewards --- - if self.is_agentgym_run and 'token_level_rewards' in batch.batch: - # Trust rewards from agentgym rollout - print("[Trainer.fit] Using token_level_rewards directly from AgentGym rollout.") - if 'token_level_scores' not in batch.batch: # Need scores for KL penalty - batch.batch['token_level_scores'] = batch.batch['token_level_rewards'].clone() - # token_level_rewards is already set - - elif not self.is_agentgym_run and self.reward_fn: - # Use RewardManager for non-agentgym runs - print("[Trainer.fit] Using self.reward_fn (RewardManager) to compute scores.") - reward_tensor = self.reward_fn(batch) - batch.batch['token_level_scores'] = reward_tensor - batch.batch['token_level_rewards'] = batch.batch['token_level_scores'].clone() - else: - # Fallback: No rewards available - print(f"[Trainer.fit] Warning: No reward source found (AgentGym: {self.is_agentgym_run}, reward_fn: {self.reward_fn is not None}). Using zeros.") - if 'token_level_scores' not in batch.batch: - batch.batch['token_level_scores'] = torch.zeros_like(batch.batch['input_ids'], dtype=torch.float32) - if 'token_level_rewards' not in batch.batch: - batch.batch['token_level_rewards'] = torch.zeros_like(batch.batch['input_ids'], dtype=torch.float32) - - # Apply KL penalty (modifies token_level_rewards) + # 1. 
Calculate base scores using RewardComposer + batch_size = batch.batch['input_ids'].shape[0] + composite_scores = torch.zeros(batch_size, dtype=torch.float32) # Store per-item scores + reward_breakdowns = [] # Store breakdown dicts + + trajectories = batch.meta_info.get('rollout_trajectory', [[]] * batch_size) + reward_models_info = batch.meta_info.get('reward_model', [{}] * batch_size) + + for i in range(batch_size): + total_score, breakdown = self.reward_composer.compute_total_reward( + trajectory=trajectories[i] if i < len(trajectories) else [], + reward_model_info=reward_models_info[i] if i < len(reward_models_info) else {}, + env_name=self.config.data.env_name + # Add other kwargs if needed + ) + composite_scores[i] = total_score + reward_breakdowns.append(breakdown) + + # 2. Decide how composite_scores map to token_level_scores + # Simplest approach: Assign the total score to the last token (or distribute later) + # We need a tensor representing scores assigned to *tokens* + # Let's assume the reward allocation happens in OpenManusAgent first, + # returning token_level_rewards based on the total score. + # So, `batch` should already contain `token_level_rewards` from the agent. + # We need to create `token_level_scores` which is pre-KL penalty. + if 'token_level_rewards' not in batch.batch: + print("[Trainer.fit] Error: 'token_level_rewards' not found in batch after agent processing. Cannot proceed.") + continue + + # Assign scores before KL penalty. Often same as rewards if no base RM score. + batch.batch['token_level_scores'] = batch.batch['token_level_rewards'].clone() + metrics['reward/composer_total_mean'] = composite_scores.mean().item() + # Log average breakdown + avg_breakdown = {f'reward/{k}/mean': np.mean([d.get(k, 0.0) for d in reward_breakdowns]) + for k in reward_breakdowns[0].keys()} if reward_breakdowns else {} + metrics.update(avg_breakdown) + + # 3. Apply KL penalty (modifies token_level_rewards based on token_level_scores) if not self.config.actor_rollout_ref.actor.use_kl_loss and self.use_reference_policy: - batch, kl_metrics = apply_kl_penalty(batch, kl_ctrl=self.kl_ctrl, kl_penalty=self.config.algorithm.kl_penalty) - metrics.update(kl_metrics) - - # Compute advantages using the final token_level_rewards + # apply_kl_penalty needs 'responses' and 'info_mask' or 'attention_mask' + # Need to ensure these are correctly present in `batch` + if 'responses' not in batch.batch or 'info_mask' not in batch.batch: + print("[Trainer.fit] Warning: Cannot apply KL penalty. Missing 'responses' or 'info_mask'.") + else: + batch, kl_metrics = apply_kl_penalty(batch, kl_ctrl=self.kl_ctrl, kl_penalty=self.config.algorithm.kl_penalty) + metrics.update(kl_metrics) + + # 4. 
Compute advantages using the potentially KL-penalized token_level_rewards batch = compute_advantage(batch, adv_estimator=self.config.algorithm.adv_estimator, gamma=self.config.algorithm.gamma, lam=self.config.algorithm.lam, num_repeat=self.config.actor_rollout_ref.rollout.n) - # update critic + # --- Update Critic --- if self.use_critic: with _timer('update_critic', timing_raw): critic_output = self.critic_wg.update_critic(batch) critic_output_metrics = reduce_metrics(critic_output.meta_info['metrics']) metrics.update(critic_output_metrics) - # implement critic warmup + # --- Update Actor --- if self.config.trainer.critic_warmup <= self.global_steps: - # update actor with _timer('update_actor', timing_raw): - # Apply state masking only for agentgym runs if configured - if self.is_agentgym_run and self.config.actor_rollout_ref.actor.state_masking: - batch, metrics = self._create_loss_mask(batch, metrics) + # Apply state masking if configured + # if self.is_agentgym_run and self.config.actor_rollout_ref.actor.state_masking: + # batch, metrics = self._create_loss_mask(batch, metrics) actor_output = self.actor_rollout_wg.update_actor(batch) actor_output_metrics = reduce_metrics(actor_output.meta_info['metrics']) metrics.update(actor_output_metrics) - # validate - # Check if validation is possible - can_validate = self.config.algorithm.get('reward_score_fn') or self.val_reward_fn is not None + # --- Validation --- if can_validate and self.config.trainer.test_freq > 0 and \ self.global_steps % self.config.trainer.test_freq == 0: with _timer('testing', timing_raw): val_metrics: dict = self._validate() metrics.update(val_metrics) - # ... (save checkpoint) ... + # --- Save Checkpoint --- if self.config.trainer.save_freq > 0 and \ self.global_steps % self.config.trainer.save_freq == 0: with _timer('save_checkpoint', timing_raw): self._save_checkpoint() - # collect metrics - metrics.update(compute_data_metrics(batch=batch, use_critic=self.use_critic)) - metrics.update(compute_timing_metrics(batch=batch, timing_raw=timing_raw)) + # --- Collect and Log Metrics --- + # compute_data_metrics needs 'advantages', 'returns', 'token_level_scores', 'token_level_rewards' etc. + # Ensure they are all present in the batch + try: + metrics.update(compute_data_metrics(batch=batch, use_critic=self.use_critic)) + metrics.update(compute_timing_metrics(batch=batch, timing_raw=timing_raw)) + except KeyError as e: + print(f"[Trainer.fit] Warning: Skipping some metrics calculation due to missing key: {e}") # Log metrics logger.log(data=metrics, step=self.global_steps) @@ -927,8 +976,9 @@ def fit(self): val_metrics = self._validate() pprint(f'Final validation metrics: {val_metrics}') logger.log(data=val_metrics, step=self.global_steps) + print("[Trainer.fit] Reached total training steps. Exiting.") return - + def _create_loss_mask(self, batch, metrics): """Create loss mask for state tokens.""" response_length = batch.batch['responses'].shape[-1] diff --git a/verl/utils/reward_score/reward_components.py b/verl/utils/reward_score/reward_components.py new file mode 100644 index 00000000..c175cfd0 --- /dev/null +++ b/verl/utils/reward_score/reward_components.py @@ -0,0 +1,241 @@ +# verl/utils/reward_score/reward_components.py +import re +import numpy as np +from abc import ABC, abstractmethod +from typing import List, Dict, Any, Tuple + +class RewardComponent(ABC): + """Abstract base class for all reward components.""" + def __init__(self, weight: float = 1.0, name: str = ""): + """ + Initializes the reward component. 
+ + Args: + weight (float): The weight to apply to the computed reward score. + name (str): An optional name for the component. Defaults to the class name. + """ + self.weight = weight + self.name = name or self.__class__.__name__ + + @abstractmethod + def compute(self, trajectory: List[Dict[str, Any]], **kwargs) -> float: + """ + Computes the reward score based on the trajectory and other context. + Must be implemented by subclasses. + + Args: + trajectory (List[Dict[str, Any]]): A list of dictionaries representing the + conversation or rollout steps. Each dict + typically contains 'from' (e.g., 'human', 'gpt') + and 'value' (the text content). Agent steps + might also include 'reward', 'info', etc. + **kwargs: Additional context that might be needed, such as original prompt, + environment configuration, reward model info, etc. + + Returns: + float: The computed reward score (before applying the weight). + """ + pass + + def __call__(self, trajectory: List[Dict[str, Any]], **kwargs) -> float: + """ + Computes and returns the weighted reward score. + + Args: + trajectory (List[Dict[str, Any]]): The rollout trajectory. + **kwargs: Additional context passed to the compute method. + + Returns: + float: The final weighted reward score for this component. + """ + # Apply the component's weight to the computed score + return self.weight * self.compute(trajectory, **kwargs) + +# --- Example Concrete Reward Components --- + +class GoalReward(RewardComponent): + """Provides reward based on task success/failure indicated in the final step info.""" + def compute(self, trajectory: List[Dict[str, Any]], **kwargs) -> float: + """ + Checks the 'info' dictionary of the last step in the trajectory for success indicators. + Assumes AgentGym-like 'info' structure or relevant keys in 'reward_model_info'. + + Args: + trajectory: The rollout trajectory. + **kwargs: May contain 'reward_model_info' for non-AgentGym scenarios. + + Returns: + 1.0 for success, -1.0 for failure (optional), or score value, otherwise 0.0. + """ + if not trajectory: + return 0.0 + + # Check the last step first, assuming it contains final env info + last_step = trajectory[-1] + if isinstance(last_step.get('info'), dict): + info = last_step['info'] + if info.get('success') is True: + return 1.0 + if info.get('fail') is True: + return -1.0 # Optional penalty for failure + # Use 'score' if success/fail flags are not present + return float(info.get('score', 0.0)) + + # Fallback: Check reward_model_info if provided (for non-AgentGym?) + reward_model_info = kwargs.get('reward_model_info', {}) + if reward_model_info.get('success') is True: # Example key + return 1.0 + + # Default to 0 if no clear success/failure/score indicator is found + return 0.0 + +class LengthPenalty(RewardComponent): + """Applies a penalty based on the length of the last agent response.""" + def __init__(self, weight: float = -0.01, max_length: int = 500, min_length: int = 10, penalty_type: str = "linear"): + """ + Initializes the length penalty component. + + Args: + weight (float): The weight for the penalty (typically negative). + max_length (int): Responses longer than this will be penalized. + min_length (int): Responses shorter than this can optionally be penalized. + penalty_type (str): The type of penalty calculation ('linear', 'quadratic', 'log'). 
+ """ + super().__init__(weight=weight) + assert max_length > min_length, "max_length must be greater than min_length" + self.max_length = max_length + self.min_length = min_length + self.penalty_type = penalty_type + + def _get_last_response_length(self, trajectory: List[Dict[str, Any]]) -> int: + """Helper to find the length of the last 'gpt' response.""" + for msg in reversed(trajectory): + if msg.get('from') == 'gpt': + # Consider using token length for better consistency + # For now, using character length + return len(msg.get('value', "")) + return 0 + + def compute(self, trajectory: List[Dict[str, Any]], **kwargs) -> float: + """ + Calculates the penalty based on the length of the last agent response. + + Args: + trajectory: The rollout trajectory. + **kwargs: Not used by this component. + + Returns: + A non-positive value representing the penalty (0 if within bounds). + """ + length = self._get_last_response_length(trajectory) + penalty = 0.0 + + if length > self.max_length: + diff = length - self.max_length + if self.penalty_type == "linear": + penalty = diff + elif self.penalty_type == "quadratic": + penalty = diff ** 2 + elif self.penalty_type == "log": + # Use log1p to handle diff=0 gracefully (log(1)=0) + penalty = np.log1p(diff) + else: + raise ValueError(f"Unknown penalty_type: {self.penalty_type}") + elif length < self.min_length: + diff = self.min_length - length + # Apply similar penalty logic for being too short (optional) + if self.penalty_type == "linear": + penalty = diff # Example: Linear penalty for being too short + # Add quadratic/log if needed for short responses + + # The penalty value itself is positive or zero, weight makes it negative + return penalty + + +class FormatReward(RewardComponent): + """Rewards responses that match specific regular expression patterns.""" + def __init__(self, weight: float = 0.5, required_patterns: List[str] = None): + """ + Initializes the format reward component. + + Args: + weight (float): The reward weight (typically positive). + required_patterns (List[str]): A list of regex patterns. A reward is given + if the last response matches *any* of these patterns. + """ + super().__init__(weight=weight) + # Compile regex patterns for efficiency + self.required_patterns = [re.compile(p, re.DOTALL) for p in required_patterns] if required_patterns else [] + + def _get_last_response(self, trajectory: List[Dict[str, Any]]) -> str: + """Helper to find the text of the last 'gpt' response.""" + for msg in reversed(trajectory): + if msg.get('from') == 'gpt': + return msg.get('value', "") + return "" + + def compute(self, trajectory: List[Dict[str, Any]], **kwargs) -> float: + """ + Checks if the last agent response matches any of the required patterns. + + Args: + trajectory: The rollout trajectory. + **kwargs: Not used by this component. + + Returns: + 1.0 if a match is found, 0.0 otherwise (before applying weight). + """ + if not self.required_patterns: + return 0.0 + + last_response = self._get_last_response(trajectory) + + # Check if any pattern matches + for pattern in self.required_patterns: + if pattern.search(last_response): + # Reward is 1.0 if format matches, weight scales it + return 1.0 + + # No pattern matched + return 0.0 + +# --- Reward Composer --- +class RewardComposer: + """Combines multiple reward components to compute a total reward.""" + def __init__(self, components: List[RewardComponent]): + """ + Initializes the composer with a list of reward components. 
+ + Args: + components (List[RewardComponent]): The reward components to combine. + """ + self.components = components + + def compute_total_reward(self, trajectory: List[Dict[str, Any]], **kwargs) -> Tuple[float, Dict[str, float]]: + """ + Computes the total weighted reward by summing the outputs of all components. + + Args: + trajectory (List[Dict[str, Any]]): The rollout trajectory. + **kwargs: Additional context passed to each component's compute method. + + Returns: + Tuple[float, Dict[str, float]]: A tuple containing: + - The total weighted reward. + - A dictionary breaking down the reward contributed by each component (by name). + """ + total_reward = 0.0 + reward_breakdown = {} + for component in self.components: + try: + # Call the component's __call__ method to get the weighted reward + component_reward = component(trajectory, **kwargs) + total_reward += component_reward + # Store the individual weighted reward for analysis + reward_breakdown[component.name] = component_reward + except Exception as e: + # Log error but continue, potentially assigning 0 reward for the failed component + print(f"Error computing reward for component {component.name}: {e}") + reward_breakdown[component.name] = 0.0 # Or handle as needed + + return total_reward, reward_breakdown \ No newline at end of file diff --git a/wandb/latest-run b/wandb/latest-run new file mode 120000 index 00000000..ed2932dd --- /dev/null +++ b/wandb/latest-run @@ -0,0 +1 @@ +run-20250427_034358-dtcb0ywm \ No newline at end of file diff --git a/wandb/run-20250427_034048-wm59xvbf/files/requirements.txt b/wandb/run-20250427_034048-wm59xvbf/files/requirements.txt new file mode 100644 index 00000000..e401f074 --- /dev/null +++ b/wandb/run-20250427_034048-wm59xvbf/files/requirements.txt @@ -0,0 +1,265 @@ +colorama==0.4.6 +setproctitle==1.2.2 +psutil==7.0.0 +safetensors==0.4.5 +xxhash==3.5.0 +multiprocess==0.70.16 +tqdm==4.67.0 +nvidia-nvjitlink-cu12==12.4.127 +pyarrow==18.0.0 +requests==2.32.3 +huggingface-hub==0.29.1 +dill==0.3.8 +sympy==1.13.1 +nvidia-cudnn-cu12==9.1.0.70 +gym==0.23.1 +rsa==4.9.1 +bitsandbytes==0.45.5 +MarkupSafe==3.0.2 +matplotlib==3.10.1 +Jinja2==3.1.6 +pydantic==2.11.3 +grpcio==1.71.0 +watchfiles==1.0.5 +anyio==3.7.1 +cloudpathlib==0.16.0 +chardet==5.2.0 +pyasn1==0.6.1 +marisa-trie==1.2.1 +aiohttp-cors==0.8.1 +aiosignal==1.3.2 +httpcore==1.0.8 +nvidia-cusolver-cu12==11.4.5.107 +jsonschema==4.23.0 +pydantic_core==2.33.1 +nvidia-nvtx-cu12==12.1.105 +intel-openmp==2021.4.0 +pip==25.0 +google-api-core==2.24.2 +kiwisolver==1.4.8 +spacy-legacy==3.0.12 +prompt_toolkit==3.0.51 +mdurl==0.1.2 +outlines==0.0.46 +uvicorn==0.34.2 +hydra-core==1.3.2 +prometheus_client==0.21.1 +markdown-it-py==3.0.0 +hjson==3.1.0 +distilabel==1.5.3 +fonttools==4.57.0 +typepy==1.3.4 +lxml==5.3.2 +Pygments==2.19.1 +python-dateutil==2.9.0.post0 +rpds-py==0.24.0 +mbstrdecoder==1.1.4 +python-Levenshtein==0.27.1 +mkl==2021.4.0 +idna==3.10 +language_data==1.3.0 +six==1.17.0 +sentencepiece==0.2.0 +weasel==0.3.4 +typer==0.9.4 +gym-notices==0.0.8 +einops==0.8.1 +jsonschema-specifications==2024.10.1 +wcwidth==0.2.13 +llvmlite==0.44.0 +filelock==3.18.0 +fsspec==2024.12.0 +httptools==0.6.4 +tcolorpy==0.1.7 +httpx==0.28.1 +latex2sympy2_extended==1.0.6 +smart-open==6.4.0 +setuptools==75.8.0 +colorama==0.4.6 +ray==2.44.1 +blake3==1.0.4 +platformdirs==4.3.7 +antlr4-python3-runtime==4.9.3 +urllib3==2.4.0 +torchaudio==2.5.1 +catalogue==2.0.10 +certifi==2025.1.31 +rich==14.0.0 +typing-inspection==0.4.0 +nvidia-cusparse-cu12==12.1.0.106 +peft==0.15.2 
+en-core-web-lg==3.7.1 +fastapi==0.115.12 +aiohappyeyeballs==2.6.1 +cachetools==5.5.2 +orjson==3.10.16 +gitdb==4.0.12 +wheel==0.45.1 +pytz==2025.2 +torch==2.4.0 +protobuf==3.20.3 +tensordict==0.7.2 +preshed==3.0.9 +msgpack==1.1.0 +compressed-tensors==0.9.1 +googleapis-common-protos==1.70.0 +tabulate==0.9.0 +aenum==3.1.15 +annotated-types==0.7.0 +zipp==3.21.0 +pyasn1_modules==0.4.2 +nvidia-cuda-runtime-cu12==12.1.105 +setproctitle==1.3.5 +torchvision==0.19.0 +propcache==0.3.1 +packaging==25.0 +packaging==24.2 +diskcache==5.6.3 +inquirerpy==0.3.4 +proto-plus==1.26.1 +interegular==0.3.3 +pyzmq==26.4.0 +rouge_score==0.1.2 +GitPython==3.1.44 +partial-json-parser==0.2.1.1.post5 +langdetect==1.0.9 +nltk==3.9.1 +regex==2024.11.6 +virtualenv==20.30.0 +Werkzeug==2.3.8 +murmurhash==1.0.12 +tabledata==1.3.4 +smmap==5.0.2 +frozenlist==1.6.0 +nest-asyncio==1.6.0 +mkl_random==1.2.8 +depyf==0.18.0 +nvidia-cuda-cupti-cu12==12.1.105 +wasabi==1.1.3 +starlette==0.46.2 +parameterized==0.9.0 +cloudpickle==3.1.1 +numba==0.61.2 +e2b-code-interpreter==1.2.0 +airportsdata==20250224 +prometheus-fastapi-instrumentator==7.1.0 +pillow==11.2.1 +pandas==2.2.3 +deepspeed==0.15.4 +nvidia-nccl-cu12==2.20.5 +joblib==1.4.2 +sniffio==1.3.1 +psutil==7.0.0 +pfzy==0.3.4 +DataProperty==1.1.0 +e2b==1.3.3 +xgrammar==0.1.18 +uvloop==0.21.0 +pyparsing==3.2.3 +click==8.1.8 +nvidia-curand-cu12==10.3.2.106 +yarl==1.20.0 +importlib_metadata==8.6.1 +pathvalidate==3.2.3 +math-verify==0.5.2 +colorful==0.5.6 +nvidia-cufft-cu12==11.0.2.54 +mkl-service==2.4.0 +faiss==1.9.0 +lm-format-enforcer==0.10.6 +nvidia-cusparselt-cu12==0.6.2 +mkl_fft==1.3.11 +numpy==1.26.4 +scikit-learn==1.6.1 +threadpoolctl==3.6.0 +vllm==0.6.3 +opencensus==0.11.4 +srsly==2.5.1 +tbb==2021.13.1 +referencing==0.36.2 +omegaconf==2.3.0 +tokenizers==0.21.1 +scipy==1.15.2 +google-auth==2.39.0 +h11==0.14.0 +langcodes==3.5.0 +universal_pathlib==0.2.6 +opencv-python-headless==4.11.0.86 +nvidia-cublas-cu12==12.1.3.1 +mistral_common==1.5.4 +contourpy==1.3.2 +docker-pycreds==0.4.0 +xformers==0.0.27.post2 +hf_transfer==0.1.9 +RapidFuzz==3.13.0 +openmanus-rl==0.0.1 +tblib==3.1.0 +sentry-sdk==2.26.1 +astor==0.8.1 +mpmath==1.3.0 +nvidia-ml-py==12.570.86 +pyairports==2.1.1 +msgspec==0.19.0 +codetiming==1.4.0 +lighteval==0.6.0.dev0 +opencensus-context==0.1.3 +termcolor==2.3.0 +cycler==0.12.1 +openai==1.75.0 +distro==1.9.0 +absl-py==2.2.2 +transformers==4.49.0 +spacy==3.7.2 +thinc==8.2.5 +aiohttp==3.11.18 +outlines_core==0.1.26 +sacrebleu==2.5.1 +colorlog==6.9.0 +wandb==0.19.9 +accelerate==1.4.0 +datasets==3.5.0 +pycountry==24.6.1 +triton==3.0.0 +python-dotenv==1.1.0 +ninja==1.11.1.4 +tzdata==2025.2 +py-spy==0.4.0 +spacy-loggers==1.0.5 +Levenshtein==0.27.1 +websockets==15.0.1 +distlib==0.3.9 +typing_extensions==4.13.2 +blis==0.7.11 +lark==1.2.2 +gguf==0.10.0 +networkx==3.4.2 +nvidia-cuda-nvrtc-cu12==12.1.105 +pytablewriter==1.2.1 +PyYAML==6.0.2 +tiktoken==0.9.0 +multidict==6.4.3 +cymem==2.0.11 +confection==0.1.5 +attrs==25.3.0 +py-cpuinfo==9.0.0 +liger_kernel==0.5.3 +jiter==0.9.0 +charset-normalizer==3.4.1 +portalocker==3.1.1 +trl==0.16.0.dev0 +wheel==0.43.0 +typing_extensions==4.12.2 +jaraco.functools==4.0.1 +importlib_metadata==8.0.0 +typeguard==4.3.0 +autocommand==2.2.2 +more-itertools==10.3.0 +tomli==2.0.1 +zipp==3.19.2 +packaging==24.2 +jaraco.text==3.12.1 +backports.tarfile==1.2.0 +jaraco.collections==5.1.0 +jaraco.context==5.3.0 +platformdirs==4.2.2 +inflect==7.3.1 diff --git a/wandb/run-20250427_034048-wm59xvbf/files/wandb-metadata.json 
b/wandb/run-20250427_034048-wm59xvbf/files/wandb-metadata.json new file mode 100644 index 00000000..7c79f120 --- /dev/null +++ b/wandb/run-20250427_034048-wm59xvbf/files/wandb-metadata.json @@ -0,0 +1,114 @@ +{ + "os": "Linux-5.15.0-122-generic-x86_64-with-glibc2.35", + "python": "CPython 3.11.11", + "startedAt": "2025-04-27T03:40:48.246634Z", + "args": [ + "--node-ip-address=172.22.224.42", + "--node-manager-port=41307", + "--object-store-name=/tmp/ray/session_2025-04-27_03-40-34_059482_3447017/sockets/plasma_store", + "--raylet-name=/tmp/ray/session_2025-04-27_03-40-34_059482_3447017/sockets/raylet", + "--redis-address=None", + "--metrics-agent-port=63833", + "--logging-rotate-bytes=536870912", + "--logging-rotate-backup-count=5", + "--runtime-env-agent-port=64912", + "--gcs-address=172.22.224.42:63199", + "--session-name=session_2025-04-27_03-40-34_059482_3447017", + "--temp-dir=/tmp/ray", + "--webui=127.0.0.1:8265", + "--cluster-id=a18fdfb88869696431e182aec4c3949e557cf38c9cde24d511499901", + "--startup-token=96", + "--worker-launch-time-ms=1745725238583", + "--node-id=05605af7f9e59171525db68c3817251417d89e459b3c11f3cfed5bbc", + "--runtime-env-hash=1830736042" + ], + "program": "/home/kunlunz2/.conda/envs/openmanus-rl/lib/python3.11/site-packages/ray/_private/workers/default_worker.py", + "git": { + "remote": "git@github.com:OpenManus/OpenManus-RL.git", + "commit": "cd671d8182a4ef322e9d44431b6d43a5709402f9" + }, + "email": "zhuklun@mail2.sysu.edu.cn", + "root": "/home/kunlunz2/github_repos/OpenManus-RL", + "host": "sn4622122392", + "executable": "/home/kunlunz2/.conda/envs/openmanus-rl/bin/python3", + "cpu_count": 48, + "cpu_count_logical": 96, + "gpu": "NVIDIA RTX A6000", + "gpu_count": 10, + "disk": { + "/": { + "total": "2002365816832", + "used": "1296476807168" + } + }, + "memory": { + "total": "1081814921216" + }, + "cpu": { + "count": 48, + "countLogical": 96 + }, + "gpu_nvidia": [ + { + "name": "NVIDIA RTX A6000", + "memoryTotal": "51527024640", + "cudaCores": 10752, + "architecture": "Ampere" + }, + { + "name": "NVIDIA RTX A6000", + "memoryTotal": "51527024640", + "cudaCores": 10752, + "architecture": "Ampere" + }, + { + "name": "NVIDIA RTX A6000", + "memoryTotal": "51527024640", + "cudaCores": 10752, + "architecture": "Ampere" + }, + { + "name": "NVIDIA RTX A6000", + "memoryTotal": "51527024640", + "cudaCores": 10752, + "architecture": "Ampere" + }, + { + "name": "NVIDIA RTX A6000", + "memoryTotal": "51527024640", + "cudaCores": 10752, + "architecture": "Ampere" + }, + { + "name": "NVIDIA RTX A6000", + "memoryTotal": "51527024640", + "cudaCores": 10752, + "architecture": "Ampere" + }, + { + "name": "NVIDIA RTX A6000", + "memoryTotal": "51527024640", + "cudaCores": 10752, + "architecture": "Ampere" + }, + { + "name": "NVIDIA RTX A6000", + "memoryTotal": "51527024640", + "cudaCores": 10752, + "architecture": "Ampere" + }, + { + "name": "NVIDIA RTX A6000", + "memoryTotal": "51527024640", + "cudaCores": 10752, + "architecture": "Ampere" + }, + { + "name": "NVIDIA RTX A6000", + "memoryTotal": "51527024640", + "cudaCores": 10752, + "architecture": "Ampere" + } + ], + "cudaVersion": "12.4" +} \ No newline at end of file diff --git a/openmanus_rl/agentgym/OpenManus/app/flow/__init__.py b/wandb/run-20250427_034048-wm59xvbf/run-wm59xvbf.wandb similarity index 100% rename from openmanus_rl/agentgym/OpenManus/app/flow/__init__.py rename to wandb/run-20250427_034048-wm59xvbf/run-wm59xvbf.wandb diff --git a/wandb/run-20250427_034358-dtcb0ywm/files/requirements.txt 
b/wandb/run-20250427_034358-dtcb0ywm/files/requirements.txt new file mode 100644 index 00000000..d932415b --- /dev/null +++ b/wandb/run-20250427_034358-dtcb0ywm/files/requirements.txt @@ -0,0 +1,266 @@ +colorama==0.4.6 +setproctitle==1.2.2 +psutil==7.0.0 +safetensors==0.4.5 +xxhash==3.5.0 +multiprocess==0.70.16 +tqdm==4.67.0 +nvidia-nvjitlink-cu12==12.4.127 +pyarrow==18.0.0 +requests==2.32.3 +huggingface-hub==0.29.1 +dill==0.3.8 +sympy==1.13.1 +nvidia-cudnn-cu12==9.1.0.70 +gym==0.23.1 +rsa==4.9.1 +bitsandbytes==0.45.5 +MarkupSafe==3.0.2 +matplotlib==3.10.1 +Jinja2==3.1.6 +pydantic==2.11.3 +grpcio==1.71.0 +watchfiles==1.0.5 +anyio==3.7.1 +cloudpathlib==0.16.0 +chardet==5.2.0 +pyasn1==0.6.1 +marisa-trie==1.2.1 +aiohttp-cors==0.8.1 +aiosignal==1.3.2 +httpcore==1.0.8 +nvidia-cusolver-cu12==11.4.5.107 +jsonschema==4.23.0 +pydantic_core==2.33.1 +nvidia-nvtx-cu12==12.1.105 +intel-openmp==2021.4.0 +pip==25.0 +google-api-core==2.24.2 +kiwisolver==1.4.8 +spacy-legacy==3.0.12 +prompt_toolkit==3.0.51 +mdurl==0.1.2 +outlines==0.0.46 +uvicorn==0.34.2 +hydra-core==1.3.2 +prometheus_client==0.21.1 +markdown-it-py==3.0.0 +hjson==3.1.0 +distilabel==1.5.3 +fonttools==4.57.0 +typepy==1.3.4 +lxml==5.3.2 +Pygments==2.19.1 +python-dateutil==2.9.0.post0 +rpds-py==0.24.0 +mbstrdecoder==1.1.4 +python-Levenshtein==0.27.1 +mkl==2021.4.0 +idna==3.10 +language_data==1.3.0 +six==1.17.0 +sentencepiece==0.2.0 +weasel==0.3.4 +typer==0.9.4 +gym-notices==0.0.8 +einops==0.8.1 +jsonschema-specifications==2024.10.1 +wcwidth==0.2.13 +llvmlite==0.44.0 +filelock==3.18.0 +fsspec==2024.12.0 +httptools==0.6.4 +tcolorpy==0.1.7 +httpx==0.28.1 +latex2sympy2_extended==1.0.6 +smart-open==6.4.0 +setuptools==75.8.0 +colorama==0.4.6 +ray==2.44.1 +blake3==1.0.4 +platformdirs==4.3.7 +antlr4-python3-runtime==4.9.3 +urllib3==2.4.0 +torchaudio==2.5.1 +catalogue==2.0.10 +certifi==2025.1.31 +rich==14.0.0 +typing-inspection==0.4.0 +nvidia-cusparse-cu12==12.1.0.106 +peft==0.15.2 +en-core-web-lg==3.7.1 +fastapi==0.115.12 +aiohappyeyeballs==2.6.1 +cachetools==5.5.2 +orjson==3.10.16 +gitdb==4.0.12 +wheel==0.45.1 +pytz==2025.2 +torch==2.4.0 +protobuf==3.20.3 +tensordict==0.7.2 +preshed==3.0.9 +msgpack==1.1.0 +compressed-tensors==0.9.1 +googleapis-common-protos==1.70.0 +tabulate==0.9.0 +aenum==3.1.15 +annotated-types==0.7.0 +zipp==3.21.0 +pyasn1_modules==0.4.2 +nvidia-cuda-runtime-cu12==12.1.105 +setproctitle==1.3.5 +torchvision==0.19.0 +propcache==0.3.1 +packaging==25.0 +packaging==24.2 +diskcache==5.6.3 +inquirerpy==0.3.4 +proto-plus==1.26.1 +interegular==0.3.3 +pyzmq==26.4.0 +rouge_score==0.1.2 +GitPython==3.1.44 +partial-json-parser==0.2.1.1.post5 +langdetect==1.0.9 +nltk==3.9.1 +regex==2024.11.6 +virtualenv==20.30.0 +Werkzeug==2.3.8 +murmurhash==1.0.12 +tabledata==1.3.4 +smmap==5.0.2 +frozenlist==1.6.0 +nest-asyncio==1.6.0 +mkl_random==1.2.8 +depyf==0.18.0 +nvidia-cuda-cupti-cu12==12.1.105 +wasabi==1.1.3 +starlette==0.46.2 +parameterized==0.9.0 +cloudpickle==3.1.1 +numba==0.61.2 +e2b-code-interpreter==1.2.0 +airportsdata==20250224 +prometheus-fastapi-instrumentator==7.1.0 +pillow==11.2.1 +pandas==2.2.3 +deepspeed==0.15.4 +nvidia-nccl-cu12==2.20.5 +joblib==1.4.2 +sniffio==1.3.1 +psutil==7.0.0 +pfzy==0.3.4 +DataProperty==1.1.0 +e2b==1.3.3 +xgrammar==0.1.18 +uvloop==0.21.0 +pyparsing==3.2.3 +click==8.1.8 +nvidia-curand-cu12==10.3.2.106 +yarl==1.20.0 +importlib_metadata==8.6.1 +pathvalidate==3.2.3 +math-verify==0.5.2 +colorful==0.5.6 +nvidia-cufft-cu12==11.0.2.54 +mkl-service==2.4.0 +faiss==1.9.0 +lm-format-enforcer==0.10.6 +nvidia-cusparselt-cu12==0.6.2 
+mkl_fft==1.3.11 +numpy==1.26.4 +scikit-learn==1.6.1 +threadpoolctl==3.6.0 +vllm==0.6.3 +opencensus==0.11.4 +srsly==2.5.1 +tbb==2021.13.1 +referencing==0.36.2 +omegaconf==2.3.0 +tokenizers==0.21.1 +scipy==1.15.2 +google-auth==2.39.0 +h11==0.14.0 +langcodes==3.5.0 +universal_pathlib==0.2.6 +opencv-python-headless==4.11.0.86 +nvidia-cublas-cu12==12.1.3.1 +mistral_common==1.5.4 +contourpy==1.3.2 +docker-pycreds==0.4.0 +xformers==0.0.27.post2 +hf_transfer==0.1.9 +RapidFuzz==3.13.0 +openmanus-rl==0.0.1 +tblib==3.1.0 +sentry-sdk==2.26.1 +astor==0.8.1 +mpmath==1.3.0 +nvidia-ml-py==12.570.86 +pyairports==2.1.1 +msgspec==0.19.0 +codetiming==1.4.0 +lighteval==0.6.0.dev0 +opencensus-context==0.1.3 +termcolor==2.3.0 +cycler==0.12.1 +openai==1.75.0 +distro==1.9.0 +absl-py==2.2.2 +transformers==4.49.0 +spacy==3.7.2 +thinc==8.2.5 +aiohttp==3.11.18 +outlines_core==0.1.26 +sacrebleu==2.5.1 +colorlog==6.9.0 +wandb==0.19.9 +accelerate==1.4.0 +datasets==3.5.0 +pycountry==24.6.1 +triton==3.0.0 +python-dotenv==1.1.0 +ninja==1.11.1.4 +tzdata==2025.2 +py-spy==0.4.0 +spacy-loggers==1.0.5 +Levenshtein==0.27.1 +websockets==15.0.1 +distlib==0.3.9 +typing_extensions==4.13.2 +blis==0.7.11 +lark==1.2.2 +gguf==0.10.0 +networkx==3.4.2 +nvidia-cuda-nvrtc-cu12==12.1.105 +pytablewriter==1.2.1 +PyYAML==6.0.2 +tiktoken==0.9.0 +multidict==6.4.3 +cymem==2.0.11 +confection==0.1.5 +attrs==25.3.0 +py-cpuinfo==9.0.0 +liger_kernel==0.5.3 +jiter==0.9.0 +charset-normalizer==3.4.1 +portalocker==3.1.1 +trl==0.16.0.dev0 +flash_attn==2.7.4.post1 +wheel==0.43.0 +typing_extensions==4.12.2 +jaraco.functools==4.0.1 +importlib_metadata==8.0.0 +typeguard==4.3.0 +autocommand==2.2.2 +more-itertools==10.3.0 +tomli==2.0.1 +zipp==3.19.2 +packaging==24.2 +jaraco.text==3.12.1 +backports.tarfile==1.2.0 +jaraco.collections==5.1.0 +jaraco.context==5.3.0 +platformdirs==4.2.2 +inflect==7.3.1 diff --git a/wandb/run-20250427_034358-dtcb0ywm/files/wandb-metadata.json b/wandb/run-20250427_034358-dtcb0ywm/files/wandb-metadata.json new file mode 100644 index 00000000..a602031a --- /dev/null +++ b/wandb/run-20250427_034358-dtcb0ywm/files/wandb-metadata.json @@ -0,0 +1,114 @@ +{ + "os": "Linux-5.15.0-122-generic-x86_64-with-glibc2.35", + "python": "CPython 3.11.11", + "startedAt": "2025-04-27T03:43:58.536736Z", + "args": [ + "--node-ip-address=172.22.224.42", + "--node-manager-port=42617", + "--object-store-name=/tmp/ray/session_2025-04-27_03-43-44_914355_3460697/sockets/plasma_store", + "--raylet-name=/tmp/ray/session_2025-04-27_03-43-44_914355_3460697/sockets/raylet", + "--redis-address=None", + "--metrics-agent-port=42919", + "--logging-rotate-bytes=536870912", + "--logging-rotate-backup-count=5", + "--runtime-env-agent-port=59692", + "--gcs-address=172.22.224.42:62654", + "--session-name=session_2025-04-27_03-43-44_914355_3460697", + "--temp-dir=/tmp/ray", + "--webui=127.0.0.1:8265", + "--cluster-id=1f6a9f7213643f12f808e20060afd6d45cb9f78b6cd32249ebc8888e", + "--startup-token=96", + "--worker-launch-time-ms=1745725428581", + "--node-id=26fecf4bec15bf02994ab12f87650823927266969b36e5de4d29a793", + "--runtime-env-hash=1830736042" + ], + "program": "/home/kunlunz2/.conda/envs/openmanus-rl/lib/python3.11/site-packages/ray/_private/workers/default_worker.py", + "git": { + "remote": "git@github.com:OpenManus/OpenManus-RL.git", + "commit": "cd671d8182a4ef322e9d44431b6d43a5709402f9" + }, + "email": "zhuklun@mail2.sysu.edu.cn", + "root": "/home/kunlunz2/github_repos/OpenManus-RL", + "host": "sn4622122392", + "executable": 
"/home/kunlunz2/.conda/envs/openmanus-rl/bin/python3", + "cpu_count": 48, + "cpu_count_logical": 96, + "gpu": "NVIDIA RTX A6000", + "gpu_count": 10, + "disk": { + "/": { + "total": "2002365816832", + "used": "1297094238208" + } + }, + "memory": { + "total": "1081814921216" + }, + "cpu": { + "count": 48, + "countLogical": 96 + }, + "gpu_nvidia": [ + { + "name": "NVIDIA RTX A6000", + "memoryTotal": "51527024640", + "cudaCores": 10752, + "architecture": "Ampere" + }, + { + "name": "NVIDIA RTX A6000", + "memoryTotal": "51527024640", + "cudaCores": 10752, + "architecture": "Ampere" + }, + { + "name": "NVIDIA RTX A6000", + "memoryTotal": "51527024640", + "cudaCores": 10752, + "architecture": "Ampere" + }, + { + "name": "NVIDIA RTX A6000", + "memoryTotal": "51527024640", + "cudaCores": 10752, + "architecture": "Ampere" + }, + { + "name": "NVIDIA RTX A6000", + "memoryTotal": "51527024640", + "cudaCores": 10752, + "architecture": "Ampere" + }, + { + "name": "NVIDIA RTX A6000", + "memoryTotal": "51527024640", + "cudaCores": 10752, + "architecture": "Ampere" + }, + { + "name": "NVIDIA RTX A6000", + "memoryTotal": "51527024640", + "cudaCores": 10752, + "architecture": "Ampere" + }, + { + "name": "NVIDIA RTX A6000", + "memoryTotal": "51527024640", + "cudaCores": 10752, + "architecture": "Ampere" + }, + { + "name": "NVIDIA RTX A6000", + "memoryTotal": "51527024640", + "cudaCores": 10752, + "architecture": "Ampere" + }, + { + "name": "NVIDIA RTX A6000", + "memoryTotal": "51527024640", + "cudaCores": 10752, + "architecture": "Ampere" + } + ], + "cudaVersion": "12.4" +} \ No newline at end of file diff --git a/wandb/run-20250427_034358-dtcb0ywm/run-dtcb0ywm.wandb b/wandb/run-20250427_034358-dtcb0ywm/run-dtcb0ywm.wandb new file mode 100644 index 00000000..44a8b1e6 Binary files /dev/null and b/wandb/run-20250427_034358-dtcb0ywm/run-dtcb0ywm.wandb differ