[example] Add SWE-agent example by yueming-yuan · Pull Request #367 · radixark/miles

yueming-yuan · 2025-12-29T00:41:39Z

Add SWE-Agent Training Example with Nemo-Gym Integration

Overview

This PR adds an example for training SWE-Agent using Miles with NVIDIA's Nemo-Gym environment, SWE-Gym dataset, and SWE-bench evaluation.

Changes Summary

New directory: examples/swe-agent/
Submodules added: 2 (Nemo-Gym and mini-swe-agent)

*Note: this example is implemented partly here and partly in the two submodules above. See the code there for more details.

Key Components

1. Core Integration Module (`generate_with_swe_agent.py`)

- Custom generation function: Integrates with Nemo-Gym's /run endpoint to execute agent trajectories
- Token and mask builder: Converts multi-turn agent conversations into training-ready tokens with proper loss masking
- Reward function: Extracts rewards from Gym environment responses
- Dynamic filtering: Filters out aborted samples to maintain training data quality
- Agent metrics aggregation: Collects and logs comprehensive agent performance metrics including:
  - Turn counts and tool call statistics
  - Time breakdowns (model query time, environment execution time, evaluation time)
  - Time ratios and overhead analysis
- Custom rollout function: Wraps base sglang rollout with agent-specific metric collection
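
To make the metrics aggregation more concrete, here is a minimal sketch of how per-sample agent statistics might be reduced to flat log metrics. The helper name `aggregate_agent_metrics` and the per-sample keys (`turns`, `model_time`, `env_time`) are hypothetical and not taken from the PR; only the `agent/..._max` / `agent/..._min` naming pattern mirrors the diff shown later in this thread.

```python
from statistics import mean


def aggregate_agent_metrics(per_sample: list[dict]) -> dict:
    """Reduce hypothetical per-sample agent stats to flat metrics for logging."""
    metrics: dict = {}
    if not per_sample:
        return metrics

    # Mean/max/min for each statistic that at least one sample reports.
    for key in ("turns", "model_time", "env_time"):
        values = [s[key] for s in per_sample if key in s]
        if not values:
            continue
        metrics[f"agent/{key}_mean"] = mean(values)
        metrics[f"agent/{key}_max"] = max(values)
        metrics[f"agent/{key}_min"] = min(values)

    # Time ratio: share of measured time spent querying the model.
    total_model = sum(s.get("model_time", 0.0) for s in per_sample)
    total_env = sum(s.get("env_time", 0.0) for s in per_sample)
    total = total_model + total_env
    if total > 0:
        metrics["agent/model_time_ratio"] = total_model / total
    return metrics
```

Guarding against empty inputs and a zero total keeps the function safe to call on batches where every sample was aborted.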

2. Data Processing Utility (`download_and_process_data.py`)

Downloads SWE-Gym dataset from HuggingFace
Converts to Miles format with proper metadata structure
Adds required subset and split fields for Gym API compatibility
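
The conversion step could look roughly like the sketch below. It is an assumption-laden illustration: the input field names (`instance_id`, `problem_statement`), the output layout (`prompt` plus `metadata`), and the default `subset`/`split` values are placeholders, not the actual schema used by `download_and_process_data.py`.

```python
def to_miles_record(row: dict, subset: str = "gym", split: str = "train") -> dict:
    """Convert one dataset row into a training record (hypothetical schema).

    `subset` and `split` are attached to the metadata because the Gym API
    is described as requiring them; the default values here are placeholders.
    """
    return {
        "prompt": row["problem_statement"],
        "metadata": {
            "instance_id": row["instance_id"],
            "subset": subset,
            "split": split,
        },
    }


def convert_rows(rows: list[dict]) -> list[dict]:
    """Convert rows, skipping any that lack the fields this sketch assumes."""
    required = ("instance_id", "problem_statement")
    return [to_miles_record(r) for r in rows if all(k in r for k in required)]
```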

3. Training Scripts

run-qwen3-4b-instruct.sh: Production training script for Qwen3-4B-Instruct model

4. Documentation (`README.md`)

5. Submodules (`.gitmodules`)

nemo-gym: The Gym environment provider, launching the environment/agent server. Fork with Miles-specific adaptations, added metrics computation, rollout status handling, sampling params handling, etc.
- Branch: miles-swe-agent
- URL: https://github.com/yueming-yuan/Gym
mini-swe-agent: Lightweight SWE-agent implementation. Fork with Miles-specific adaptations, added metrics computation, rollout status handling, sampling params handling, etc.
- Branch: miles-swe-agent
- URL: https://github.com/yueming-yuan/nv-mini-swe-agent

Technical Details

Integration Architecture

Miles (training framework) runs in one Docker container with GPU access
Nemo-Gym (environment server) runs in a separate Docker container
Communication via HTTP over a shared Docker network (swe-net)
Each SWE-Gym task spawns its own isolated Docker container for safe code execution

Sample Status Handling

COMPLETED: Agent successfully submitted a solution (exit_status == "Submitted")
TRUNCATED: Rollout exceeded limits (RolloutTruncated, LimitsExceeded, CollapseContinued)
ABORTED: Other failures (filtered out from training via dynamic_filter)
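
The mapping above can be sketched as follows. The `Status` enum is a local stand-in for Miles' `Sample.Status`, and `keep_for_training` is a hypothetical simplification of the dynamic filter; the real filter in the PR returns a `DynamicFilterOutput`, whose fields are not shown in this thread.

```python
from enum import Enum


class Status(Enum):
    """Local stand-in for Sample.Status (assumption, not the Miles class)."""
    COMPLETED = "completed"
    TRUNCATED = "truncated"
    ABORTED = "aborted"


# Exit statuses that indicate the rollout hit a limit rather than failing.
TRUNCATED_EXITS = frozenset({"RolloutTruncated", "LimitsExceeded", "CollapseContinued"})


def status_from_exit(exit_status: str) -> Status:
    """Map the agent's exit_status string to a sample status."""
    if exit_status == "Submitted":
        return Status.COMPLETED
    if exit_status in TRUNCATED_EXITS:
        return Status.TRUNCATED
    return Status.ABORTED


def keep_for_training(statuses: list[Status]) -> bool:
    """Keep a group of samples only if none of them was aborted."""
    return all(s is not Status.ABORTED for s in statuses)
```

Treating every unrecognized exit status as ABORTED means an unexpected failure mode is filtered out rather than silently trained on.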

Loss Masking Strategy

First 2 messages (system + user prompt): masked out (no loss)
Assistant responses: loss applied (mask = 1)
User/environment responses: masked out (mask = 0)
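
As a small worked example of this strategy, the sketch below builds the mask from per-message roles and token counts. It assumes the mask spans only the response tokens, so the two prompt messages are left out of the mask rather than given zeros; the helper name `response_loss_mask` is hypothetical.

```python
def response_loss_mask(turns: list[tuple[str, int]]) -> list[int]:
    """Build a loss mask from (role, token_count) pairs in conversation order."""
    mask: list[int] = []
    for role, n_tokens in turns[2:]:  # skip system + user prompt
        mask.extend([1 if role == "assistant" else 0] * n_tokens)
    return mask


turns = [("system", 5), ("user", 7), ("assistant", 3), ("user", 2), ("assistant", 1)]
print(response_loss_mask(turns))  # → [1, 1, 1, 0, 0, 1]
```

The mask length equals the number of response tokens (3 + 2 + 1 = 6 here), which is the invariant a trainer would typically rely on.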

gemini-code-assist · 2025-12-29T00:41:42Z

Important

Installation incomplete: to start using Gemini Code Assist, please ask the organization owner(s) to visit the Gemini Code Assist Admin Console and sign the Terms of Services.

zhaochenyang20 · 2025-12-31T02:15:00Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces an example for training SWE-Agent using Miles and Nemo-Gym. The changes are comprehensive, including new documentation, data processing scripts, core integration logic, and run scripts. My review focuses on improving the clarity and correctness of the documentation, enhancing the robustness of the data processing script, and refining the code quality of the integration logic and execution scripts. I've identified a critical issue in the README that would prevent users from running the example, and provided suggestions for better logging, code style, and removing redundancies.

examples/swe-agent/README.md

examples/swe-agent/download_and_process_data.py

gemini-code-assist · 2025-12-31T02:16:51Z

examples/swe-agent/generate_with_swe_agent.py

+from miles.utils.types import Sample
+from miles.rollout.sglang_rollout import GenerateState, eval_rollout
+from miles.rollout.filter_hub.base_types import DynamicFilterOutput
+from miles.utils.misc import load_function


The load_function is imported but not used directly within this file. It should be removed to avoid clutter.

gemini-code-assist · 2025-12-31T02:16:51Z

examples/swe-agent/generate_with_swe_agent.py

+    response = await post(f"{gym_url}/run", request)
+
+    exit_status = response.get("info", {}).get("exit_status", "")
+    print(f"exit_status: {exit_status}, reward: {response.get('reward', 0.0)}")


It's better to use the configured logger (logger.info or logger.debug) instead of print() for outputting status information. This provides more control over log levels and formatting, which is especially useful in complex systems.

Suggested change

print(f"exit_status: {exit_status}, reward: {response.get('reward', 0.0)}")

logger.info(f"exit_status: {exit_status}, reward: {response.get('reward', 0.0)}")

gemini-code-assist · 2025-12-31T02:16:51Z

examples/swe-agent/generate_with_swe_agent.py

+
+    if exit_status == "Submitted":
+        sample.status = Sample.Status.COMPLETED
+    elif exit_status == "RolloutTruncated" or exit_status == "LimitsExceeded" or exit_status == "CollapseContinued":


For better readability, you can use the in operator to check if exit_status is one of several values, instead of using a chain of or conditions.

Suggested change

elif exit_status == "RolloutTruncated" or exit_status == "LimitsExceeded" or exit_status == "CollapseContinued":

elif exit_status in ("RolloutTruncated", "LimitsExceeded", "CollapseContinued"):

gemini-code-assist · 2025-12-31T02:16:51Z

examples/swe-agent/generate_with_swe_agent.py

+        metrics["agent/total_time_max"] = max(values)
+        metrics["agent/total_time_min"] = min(values)
+
+    print(f"agent metrics: {metrics}")


This print() statement should be replaced with a call to the logger (e.g., logger.info()) for consistent logging throughout the application.

Suggested change

print(f"agent metrics: {metrics}")

logger.info(f"agent metrics: {metrics}")

gemini-code-assist · 2025-12-31T02:16:51Z

examples/swe-agent/run-qwen3-4b-instruct.sh

+sleep 3
+pkill -9 ray
+pkill -9 python


This block of commands (sleep 3 and the subsequent pkill commands) is a repetition of the cleanup logic. The initial pkill commands after ray stop --force should be sufficient. Removing this redundancy will make the script cleaner.

zhaochenyang20 · 2026-01-01T04:20:41Z

Thanks. I will review this. @yueming-yuan

zijiexia · 2026-01-02T06:11:57Z

I was able to reproduce this example; you can find the training log here

zhaochenyang20 · 2026-01-03T00:51:06Z

examples/swe-agent/generate_with_swe_agent.py

+def build_tokens_and_mask_from_messages(
+    messages: list[dict],
+    tokenizer,
+) -> tuple[list[int], list[int], str, int]:
+
+    if not messages or len(messages) < 2:
+        return [], [], "", 0
+
+    all_tokens = []
+    loss_mask = []
+    response_text = ""
+    prompt_length = 0
+
+    for i, msg in enumerate(messages):
+        content = msg.get("content", "")
+        if not content:
+            continue
+
+        msg_tokens = tokenizer(content, add_special_tokens=False)["input_ids"]
+        all_tokens.extend(msg_tokens)
+
+        if i < 2:
+            prompt_length += len(msg_tokens)
+        else:
+            response_text += content
+            if msg["role"] == "assistant":
+                loss_mask.extend([1] * len(msg_tokens))
+            else:
+                loss_mask.extend([0] * len(msg_tokens))
+
+    response_length = len(all_tokens) - prompt_length
+
+    return all_tokens, loss_mask, response_text, response_length


The code performs frequent list extensions and manual mask handling inside a Python loop. Although this currently runs on CPU, it can become a GIL contention point.

zhaochenyang20 · 2026-01-03T01:05:35Z

examples/swe-agent/generate_with_swe_agent.py

+        if i < 2:
+            prompt_length += len(msg_tokens)
+        else:
+            response_text += content


I replaced the string concatenation (+=) inside the loop with list.append() and "".join().

Strings are immutable, so repeated concatenation can incur O(N²) memory-copy overhead.

zhaochenyang20 · 2026-01-03T01:06:04Z

examples/swe-agent/generate_with_swe_agent.py

+        if i < 2:
+            prompt_length += len(msg_tokens)
+        else:


I decoupled the loop into prompt_msgs and response_msgs using slicing.

This removes the repeated if i < 2 branch check from the iteration loop, which is better for CPU branch prediction and makes the logic for "Prompt vs. Response" explicitly clear.
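
The refactored function itself is not shown in this thread. One possible shape, combining the slicing and the `"".join()` change, is sketched below. `ToyTokenizer` is a hypothetical stand-in that only mimics the call shape used in the original hunk (`tokenizer(content, add_special_tokens=False)["input_ids"]`); the merged code may differ.

```python
class ToyTokenizer:
    """Hypothetical tokenizer: one token id per character."""

    def __call__(self, text: str, add_special_tokens: bool = False) -> dict:
        return {"input_ids": [ord(c) for c in text]}


def build_tokens_and_mask_from_messages(messages, tokenizer):
    if not messages or len(messages) < 2:
        return [], [], "", 0

    def encode(content: str) -> list[int]:
        return tokenizer(content, add_special_tokens=False)["input_ids"]

    # Split once up front instead of testing `i < 2` on every iteration.
    prompt_msgs, response_msgs = messages[:2], messages[2:]

    all_tokens: list[int] = []
    for msg in prompt_msgs:
        content = msg.get("content", "")
        if content:
            all_tokens.extend(encode(content))
    prompt_length = len(all_tokens)

    loss_mask: list[int] = []
    response_parts: list[str] = []  # joined once at the end
    for msg in response_msgs:
        content = msg.get("content", "")
        if not content:
            continue
        msg_tokens = encode(content)
        all_tokens.extend(msg_tokens)
        response_parts.append(content)
        flag = 1 if msg["role"] == "assistant" else 0
        loss_mask.extend([flag] * len(msg_tokens))

    response_length = len(all_tokens) - prompt_length
    return all_tokens, loss_mask, "".join(response_parts), response_length
```

The return values match the original function for the same inputs; only the loop structure and the string accumulation change.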

zhaochenyang20

I have fixed my comments myself.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: zijiexia <zijie_xia@icloud.com> Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>

yueming-yuan added 13 commits November 24, 2025 14:03

added swe-gym example

0acbd06

clean up

31f88cb

clean up

4aee9ce

fix bug & add docs

f83e118

format

2c4fe8c

add docs, update script

a387853

fix docs

7108b80

update

b2b2767

update

f2b09dd

update

f9b4852

update

23fcbc2

update

5295a8c

update

c277db7

yueming-yuan added 2 commits December 28, 2025 16:42

update

df4c810

update

8e9d594

gemini-code-assist bot reviewed Dec 31, 2025

yueming-yuan and others added 8 commits December 31, 2025 15:40

Update examples/swe-agent/README.md

ac6c8c1

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update examples/swe-agent/README.md

670afee

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update examples/swe-agent/README.md

c5f7b67

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update examples/swe-agent/README.md

c3889cd

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update examples/swe-agent/download_and_process_data.py

acaeeaf

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

lints

092ce25

Merge branch 'swe-gym-yueming' of https://github.com/radixark/miles i…

d07f76c

…nto swe-gym-yueming

lints

81653f2

Tiny fix to swe-bench instructions (#379)

0d87341

hot path fix for build_tokens_and_mask_from_messages

2a469fd

zhaochenyang20 reviewed Jan 3, 2026

add in place change of sample comments

6dfd1a9

zhaochenyang20 approved these changes Jan 3, 2026

zhaochen20 added 2 commits January 2, 2026 17:23

Merge remote-tracking branch 'origin/main' into swe-gym-yueming

a71ed70

sync with main and fix lint issues

e9c35a7

zhaochenyang20 merged commit 47a5bdf into main Jan 5, 2026
6 checks passed

zhaochenyang20 deleted the swe-gym-yueming branch January 5, 2026 05:55

miles-code-angel mentioned this pull request Jan 5, 2026

code sync THUDM/slime#1329

Merged

fzyzcjy pushed a commit that referenced this pull request Mar 19, 2026

[router] extract middleware folder (#367)

8ca972f
