radixark · zhaochenyang20 · Jan 5, 2026 · Nov 24, 2025 · Nov 24, 2025 · Nov 24, 2025
diff --git a/.gitmodules b/.gitmodules
@@ -0,0 +1,8 @@
+[submodule "examples/swe-agent/nemo-gym"]
+	path = examples/swe-agent/nemo-gym
+	url = https://github.com/yueming-yuan/Gym
+	branch = miles-swe-agent
+[submodule "examples/swe-agent/mini-swe-agent"]
+	path = examples/swe-agent/mini-swe-agent
+	url = https://github.com/yueming-yuan/nv-mini-swe-agent
+	branch = miles-swe-agent
diff --git a/examples/swe-agent/README.md b/examples/swe-agent/README.md
@@ -0,0 +1,130 @@
+### Introduction
+
+This is an example for SWE-agent training. This example uses NVIDIA's Nemo-Gym as the Gym environment implement, SWE-Gym as the training data, and SWE-bench as the evaluation.
+
+This implementation of this example is partially in submodules below:
+- Nemo-Gym: https://github.com/yueming-yuan/Gym/tree/miles-swe-agent
+- mini-swe-agent: https://github.com/yueming-yuan/nv-mini-swe-agent/tree/miles-swe-agent
+
+
+### Prepare environment
+#### Update submodules
+```bash
+git submodule update --init --recursive .
+```
+#### Docker settings
+```bash
+# 1. create a docker network
+docker network create swe-net
+
+# 2. create environment docker
+docker run -itd \
+  --name swe_env \
+  --shm-size 16g \
+  -v /var/run/docker.sock:/var/run/docker.sock \
+  -v /mnt/data:/data \
+  -v /home/sglang-rl/<your_name>:/workspace \
+  --ipc=host \
+  --ulimit nofile=65536:65536 \
+  --ulimit memlock=-1 \
+  --ulimit stack=67108864 \
+  --network swe-net \
+  ubuntu:latest \
+  /bin/bash
+
+# 3. create miles docker
+docker run -itd \
+  --shm-size 32g \
+  --gpus all \
+  -v /mnt/data/cache/huggingface:/root/.cache/huggingface \
+  -v /mnt/data:/data \
+  -v /home/sglang-rl/<your_name>:/workspace \
+  --ipc=host \
+  --ulimit nofile=65536:65536 \
+  --ulimit memlock=-1 \
+  --ulimit stack=67108864 \
+  --privileged \
+  --network swe-net \
+  --name miles_<your_name> \
+  slimerl/slime:latest \
+  /bin/zsh
+
+# 4. install utils in environment docker
+docker exec -it swe_env /bin/bash
+apt update && apt install -y zsh curl git python3 python3-pip docker.io
+```
+note: `-v /var/run/docker.sock:/var/run/docker.sock` is required for Docker-in-Docker SWE environment execution; use `--network swe-net` to enable communication between training & environment.
+
+#### Installation
+
+In **environment docker**, install Gym
+```bash
+git clone https://github.com/yueming-yuan/Gym
+cd Gym
+
+curl -LsSf https://astral.sh/uv/install.sh | sh
+source $HOME/.local/bin/env
+uv venv --python 3.12 && source .venv/bin/activate
+uv sync --extra dev --group docs
+
+# configure env.yaml
+echo "policy_base_url: https://api.openai.com/v1
+policy_api_key: your-openai-api-key
+policy_model_name: gpt-4.1-2025-04-14
+default_host: 0.0.0.0" > env.yaml
+```
+note: set host IP to `0.0.0.0` to enable communications between dockers.
+
+then set up for SWE-agent server:
+```bash
+cd responses_api_agents/mini_swe_agent
+uv pip install -r requirements.txt
+```
+Now you should be able to run the SWE-agent server.
+
+For **miles docker** setup, please follow the standard setup process.
+
+### Preparing data
+In **miles docker**, download **SWE-Gym** data from huggingface and convert it to Miles' prompt data format with this script.
+```
+cd miles/examples/swe-agent
+python download_and_process_data.py --input SWE-Gym/SWE-Gym --output /root/swe_train.jsonl
+```
+
+### Running train
+1. In environment docker, launch the agent server
+```bash
+cd Gym
+source .venv/bin/activate
+cd responses_api_agents/mini_swe_agent
+./start_server.sh
+```
+
+
+2. In miles docker,
+(1) export `SWE_AGENT_GYM_URL` to be the port of the second server you started in Gym in environment docker, whose `server_type` is `responses_api_agents`. `swe_env` is the environment docker's name; replace it if you changed the name.
+(minor TODO: modify the port selections to avoid setting this every time.) (2) launch the training.
+```bash
+export SWE_AGENT_GYM_URL="http://swe_env:<port_of_responses_api_agents>"
+bash examples/swe-agent/run-qwen3-4b-instruct.sh
+```
+
+
+### Troubleshooting
+1. The first time of every SWE environment can be slow, and may need to wait before generation, because each SWE-Gym task has a specific docker, and `docker pull` takes time.
+2. Sometimes the environment may also be slow at evaluation. The timeout of evaluation is 10 minutes by default. If the server is stuck at `[EVAL]<instance> Running eval`, you may need to wait for it.
+
+## Metrics
+```
+agent/turns_mean, agent/turns_sum - Turn counts
+agent/tool_calls_mean, agent/tool_calls_sum - Tool call counts
+agent/total_time_mean/max/min - Total time statistics
+agent/model_query_time_sum_mean - Avg total model time per rollout
+agent/env_execution_time_sum_mean - Avg total env time per rollout
+agent/eval_time_mean - Avg evaluation time
+agent/overhead_time_mean - Avg overhead time
+agent/time_per_turn - Avg time per turn
+agent/model_query_time_avg - Avg model query time per turn
+agent/env_execution_time_avg - Avg env execution time per turn
+agent/model_time_ratio, agent/env_time_ratio - Time ratios
+```
diff --git a/examples/swe-agent/download_and_process_data.py b/examples/swe-agent/download_and_process_data.py
@@ -0,0 +1,85 @@
+#!/usr/bin/env python3
+"""Download and process data to Miles format."""
+
+import argparse
+import json
+import tempfile
+from pathlib import Path
+from datasets import load_dataset
+
+
+def convert_to_miles_format(input_path: str, output_path: str, limit: int = None, split: str = "train"):
+    """Convert JSONL to Miles format.
+
+    Args:
+        input_path: Path to input JSONL file
+        output_path: Path to output JSONL file in Miles format
+        limit: Optional limit on number of samples
+        split: Dataset split name (used in metadata)
+    """
+    count = 0
+    with open(input_path) as fin, open(output_path, "w") as fout:
+        for line in fin:
+            if limit and count >= limit:
+                break
+
+            instance = json.loads(line)
+
+            # Add subset and split to metadata for Gym API
+            metadata = dict(instance)
+            metadata["subset"] = "gym"
+            metadata["split"] = split
+
+            miles_sample = {
+                "prompt": instance.get("problem_statement", ""),
+                "metadata": metadata,
+            }
+
+            fout.write(json.dumps(miles_sample) + "\n")
+            count += 1
+
+    print(f"Converted {count} samples: {input_path} -> {output_path}")
+
+
+def main():
+    parser = argparse.ArgumentParser(description="Download HuggingFace dataset and convert to Miles format")
+    parser.add_argument("--input", type=str, required=True, help="HuggingFace dataset path or local JSONL file")
+    parser.add_argument("--output", type=str, required=True, help="Output JSONL file path")
+    parser.add_argument(
+        "--split", type=str, default="train", help="Dataset split (default: train, only for HF datasets)"
+    )
+    parser.add_argument("--limit", type=int, help="Limit number of samples")
+
+    args = parser.parse_args()
+
+    input_path = Path(args.input)
+
+    if input_path.exists() and input_path.suffix == ".jsonl":
+        print(f"Processing local file: {args.input}")
+        convert_to_miles_format(args.input, args.output, args.limit, args.split)
+    else:
+        print(f"Loading HuggingFace dataset: {args.input} (split={args.split})")
+        ds = load_dataset(args.input, split=args.split)
+
+        if args.limit:
+            ds = ds.select(range(min(args.limit, len(ds))))
+
+        tmp_path = None
+        try:
+            with tempfile.NamedTemporaryFile(mode="w", suffix=".jsonl", delete=False) as tmp:
+                tmp_path = tmp.name
+
+            print(f"Downloading to temporary file: {tmp_path}")
+            ds.to_json(tmp_path)
+
+            print(f"Converting to Miles format: {args.output}")
+            convert_to_miles_format(tmp_path, args.output, split=args.split)
+        finally:
+            if tmp_path and Path(tmp_path).exists():
+                Path(tmp_path).unlink()
+
+    print("Done.")
+
+
+if __name__ == "__main__":
+    main()