Skip to content

[example] Add SWE-agent example#367

Merged
zhaochenyang20 merged 28 commits intomainfrom
swe-gym-yueming
Jan 5, 2026
Merged

[example] Add SWE-agent example#367
zhaochenyang20 merged 28 commits intomainfrom
swe-gym-yueming

Conversation

@yueming-yuan
Copy link
Collaborator

@yueming-yuan yueming-yuan commented Dec 29, 2025

Add SWE-Agent Training Example with Nemo-Gym Integration

Overview

This PR adds an example for training SWE-Agent using Miles with NVIDIA's Nemo-Gym environment, SWE-Gym dataset, and SWE-bench evaluation.

Changes Summary

  • New directory: examples/swe-agent/
  • Submodules added: 2 (Nemo-Gym and mini-swe-agent)

*Note: the implementation of this example is partially here, and partially in the above two submodules. Check the codes there for more details.

Key Components

1. Core Integration Module (generate_with_swe_agent.py)

  • Custom generation function: Integrates with Nemo-Gym's /run endpoint to execute agent trajectories
  • Token and mask builder: Converts multi-turn agent conversations into training-ready tokens with proper loss masking
  • Reward function: Extracts rewards from Gym environment responses
  • Dynamic filtering: Filters out aborted samples to maintain training data quality
  • Agent metrics aggregation: Collects and logs comprehensive agent performance metrics including:
    • Turn counts and tool call statistics
    • Time breakdowns (model query time, environment execution time, evaluation time)
    • Time ratios and overhead analysis
  • Custom rollout function: Wraps base sglang rollout with agent-specific metric collection

2. Data Processing Utility (download_and_process_data.py)

  • Downloads SWE-Gym dataset from HuggingFace
  • Converts to Miles format with proper metadata structure
  • Adds required subset and split fields for Gym API compatibility

3. Training Scripts

  • run-qwen3-4b-instruct.sh: Production training script for Qwen3-4B-Instruct model

4. Documentation (README.md)

5. Submodules (.gitmodules)

Technical Details

Integration Architecture

  1. Miles (training framework) runs in one Docker container with GPU access
  2. Nemo-Gym (environment server) runs in a separate Docker container
  3. Communication via HTTP over a shared Docker network (swe-net)
  4. Each SWE-Gym task spawns its own isolated Docker container for safe code execution

Sample Status Handling

  • COMPLETED: Agent successfully submitted a solution (exit_status == "Submitted")
  • TRUNCATED: Rollout exceeded limits (RolloutTruncated, LimitsExceeded, CollapseContinued)
  • ABORTED: Other failures (filtered out from training via dynamic_filter)

Loss Masking Strategy

  • First 2 messages (system + user prompt): masked out (no loss)
  • Assistant responses: loss applied (mask = 1)
  • User/environment responses: masked out (mask = 0)

@gemini-code-assist
Copy link
Contributor

Important

Installation incomplete: to start using Gemini Code Assist, please ask the organization owner(s) to visit the Gemini Code Assist Admin Console and sign the Terms of Services.

@zhaochenyang20
Copy link
Collaborator

/gemini review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces an example for training SWE-Agent using Miles and Nemo-Gym. The changes are comprehensive, including new documentation, data processing scripts, core integration logic, and run scripts. My review focuses on improving the clarity and correctness of the documentation, enhancing the robustness of the data processing script, and refining the code quality of the integration logic and execution scripts. I've identified a critical issue in the README that would prevent users from running the example, and provided suggestions for better logging, code style, and removing redundancies.

from miles.utils.types import Sample
from miles.rollout.sglang_rollout import GenerateState, eval_rollout
from miles.rollout.filter_hub.base_types import DynamicFilterOutput
from miles.utils.misc import load_function
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The load_function is imported but not used directly within this file. It should be removed to avoid clutter.

response = await post(f"{gym_url}/run", request)

exit_status = response.get("info", {}).get("exit_status", "")
print(f"exit_status: {exit_status}, reward: {response.get('reward', 0.0)}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

It's better to use the configured logger (logger.info or logger.debug) instead of print() for outputting status information. This provides more control over log levels and formatting, which is especially useful in complex systems.

Suggested change
print(f"exit_status: {exit_status}, reward: {response.get('reward', 0.0)}")
logger.info(f"exit_status: {exit_status}, reward: {response.get('reward', 0.0)}")


if exit_status == "Submitted":
sample.status = Sample.Status.COMPLETED
elif exit_status == "RolloutTruncated" or exit_status == "LimitsExceeded" or exit_status == "CollapseContinued":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For better readability, you can use the in operator to check if exit_status is one of several values, instead of using a chain of or conditions.

Suggested change
elif exit_status == "RolloutTruncated" or exit_status == "LimitsExceeded" or exit_status == "CollapseContinued":
elif exit_status in ("RolloutTruncated", "LimitsExceeded", "CollapseContinued"):

metrics["agent/total_time_max"] = max(values)
metrics["agent/total_time_min"] = min(values)

print(f"agent metrics: {metrics}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This print() statement should be replaced with a call to the logger (e.g., logger.info()) for consistent logging throughout the application.

Suggested change
print(f"agent metrics: {metrics}")
logger.info(f"agent metrics: {metrics}")

Comment on lines +9 to +11
sleep 3
pkill -9 ray
pkill -9 python
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This block of commands (sleep 3 and the subsequent pkill commands) is a repetition of the cleanup logic. The initial pkill commands after ray stop --force should be sufficient. Removing this redundancy will make the script cleaner.

yueming-yuan and others added 8 commits December 31, 2025 15:40
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@zhaochenyang20
Copy link
Collaborator

Thanks. I will take a review on this. @yueming-yuan

@zijiexia
Copy link
Contributor

zijiexia commented Jan 2, 2026

I'm able to reproduce this example and you may find the training log here

Comment on lines +16 to +48
def build_tokens_and_mask_from_messages(
messages: list[dict],
tokenizer,
) -> tuple[list[int], list[int], str, int]:

if not messages or len(messages) < 2:
return [], [], "", 0

all_tokens = []
loss_mask = []
response_text = ""
prompt_length = 0

for i, msg in enumerate(messages):
content = msg.get("content", "")
if not content:
continue

msg_tokens = tokenizer(content, add_special_tokens=False)["input_ids"]
all_tokens.extend(msg_tokens)

if i < 2:
prompt_length += len(msg_tokens)
else:
response_text += content
if msg["role"] == "assistant":
loss_mask.extend([1] * len(msg_tokens))
else:
loss_mask.extend([0] * len(msg_tokens))

response_length = len(all_tokens) - prompt_length

return all_tokens, loss_mask, response_text, response_length
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code performs frequent list extensions and manual mask handling within a Python loop. While currently on CPU, this becomes a GIL contention point.

if i < 2:
prompt_length += len(msg_tokens)
else:
response_text += content
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replaced the string concatenation (+=) inside the loop with list.append() and "".join().

strings are immutable, so repeated concatenation creates an O(N²) memory copy overhead.

Comment on lines +37 to +39
if i < 2:
prompt_length += len(msg_tokens)
else:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decoupled the loop into prompt_msgs and response_msgs using slicing.

This removes the repeated if i < 2 branch check from the iteration loop, which is better for CPU branch prediction and makes the logic for "Prompt vs. Response" explicitly clear.

Copy link
Collaborator

@zhaochenyang20 zhaochenyang20 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my comments is fixed by myseldf

@zhaochenyang20 zhaochenyang20 merged commit 47a5bdf into main Jan 5, 2026
6 checks passed
@zhaochenyang20 zhaochenyang20 deleted the swe-gym-yueming branch January 5, 2026 05:55
fzyzcjy pushed a commit that referenced this pull request Mar 19, 2026
fzyzcjy pushed a commit that referenced this pull request Mar 19, 2026
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: zijiexia <zijie_xia@icloud.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants