Support mini-swe-agent as agent harness #1212
Merged
104 commits
537e8b3 (wasiahmad) some support
febd824 (wasiahmad) updating to support mini-swe-agent
783354b (wasiahmad) updating to support mini-swe-agent
71167b0 (wasiahmad) updating to support mini-swe-agent
6422992 (wasiahmad) fixing a minor bug
583b9ab (wasiahmad) fixing a minor bug
3d99c60 (wasiahmad) fixing a minor bug
3846964 (wasiahmad) fixing a minor bug
dac4bca (wasiahmad) fixing a minor bug
9f3e234 (wasiahmad) fixing a minor bug
fa2e5b5 (i-vainn) Add HMMT Nov 2025 dataset (#1061)
4bd5f7d (gwarmstrong) Use docker build cache (#1056)
aff6e4a (chtruong814) ci: Add CodeRabbit configuration file (#1063)
7c44a0d (gwarmstrong) FIX integration tests by escaping aalcr and adding judge args (#1062)
e5bcd68 (gwarmstrong) ENH add tool calling args (#1067)
c74cd99 (gwarmstrong) Fix sglang tool calling (#1070)
e03f563 (gwarmstrong) Network Blocking for Sandbox Code Execution (#1071)
c376270 (ludwig-n) Fixes to support SWE-bench Multilingual (#1064)
1b1f66e (gwarmstrong) fix: IFBench error handling and build improvements (#1073)
782b083 (gwarmstrong) FIX math verify handle leading zeros and int literals cases (#1074)
1545f73 (gwarmstrong) build: move data preparation to beginning of gpu tests build (#1077)
6594d4c (gwarmstrong) MAINT update langugage-data dependency (#1076)
53f1056 (gwarmstrong) MAINT: Add audio requirements to vllm image (#1081)
7e35ddd (i-vainn) Add apex-shortlist dataset (#1080)
0316807 (wprazuch) Introduce regex for small differences of formatting from judge (#1082)
0807259 (gnalbandyan) Add LCB Prompts, fix regex bug in robust_eval, remove CR, make summar…
b74c543 (gwarmstrong) MAINT pin nemo-evaluator (#1095)
5c15cf7 (gwarmstrong) Update issue templates
c4eb65f (gwarmstrong) Delete .github/ISSUE_TEMPLATE directory
2d93252 (gwarmstrong) enable blank issues (#1096)
b40fff1 (bzantium) Fix input_file path handling when executor is "none" (#1089)
da79a43 (gwarmstrong) TST for #1089 (#1097)
e4aa660 (stephencge) Stepheng/prover cleanup (#1078)
7934476 (jiacheng-xu) add stem dependencies in main python sandbox (#1099)
eb5fe5a (Jorjeous) Audiometrics unification (#1093)
56a3fa9 (gwarmstrong) FEAT Add Tavily Search (#1085)
f7e5479 (wasiahmad) updating code extraction logic (#1086)
56662d3 (jiacheng-xu) Sandbox add stem (#1101)
2007af2 (Froxyy-dev) Handle none output in wmtp24++ (#1091)
180f114 (gwarmstrong) ENH enable sandbox env overrides in generate (#1107)
637ce1f (gwarmstrong) Search Tool Parameter updates (#1112)
3fb4e65 (stephencge) autoformalize cleanup (#1098)
c98b587 (melllinia) HF ASR Leaderboard Evaluation (#1104)
3ea7a17 (stephencge) Stepheng/nemotron math proofs docs (#1111)
e9ad754 (stephencge) Stepheng/prover gpt oss fix (#1114)
552af8c (wedu-nvidia) add Nemotron-Math-V2.pdf (#1113)
dfc8e9a (ludwig-n) SWE-bench: don't pass external environment variables into Apptainer c…
88ad93b (Jorjeous) Adding clan PR with AudioBench and Librispeech PC. (#1103)
9b3c571 (gwarmstrong) Schema overrides for tool-calling (#1118)
cec7759 (gwarmstrong) FIX tool call error handling and search tool errors (#1120)
e7582a3 (gwarmstrong) Use run.Script for generate pipeline (#1052)
8e02df1 (SeanNaren) Port ICPC changes to IOI (#1046)
92a1bc9 (anowaczynski-nvidia) replace raise error with LOG.warning in AA LCR dataset prepare (#1119)
8e0c152 (gwarmstrong) FIX tavily search results return type (#1123)
9a042c1 (gwarmstrong) Revert "Use run.Script for generate pipeline (#1052)" (#1125)
667d56b (gwarmstrong) Fix: add serialized_output on bad request (#1127)
21a4be4 (wedu-nvidia) update paper link (#1128)
25aae9e (stephencge) update paper link, references to dataset, self-correction differences…
3754a9e (gwarmstrong) FIX ioi ignore (#1131)
fb866ea (anowaczynski-nvidia) download AA-LCR_extracted-text.zip via hf_hub_download (#1126)
c52c04e (wasiahmad) Evaluation on Livecodebench-pro (#1115)
67d3493 (wasiahmad) Evaluation support for SWE-rebench (#1102)
46ecb38 (Kipok) Trust remote code in tokenizer (#1146)
b5fe5e0 (wasiahmad) Merge branch 'main' into mini_swe_agent
0a49ab3 (wasiahmad) adding mini-swe-agent in generation-task
d22eb1d (wasiahmad) updating mini-swe-agent cmd
0a56152 (wasiahmad) updating mini-swe-agent cmd
1a815de (wasiahmad) updating mini-swe-agent cmd
fbdaf2f (wasiahmad) updating mini-swe-agent cmd
6ed5573 (wasiahmad) updating mini-swe-agent cmd
9338bde (wasiahmad) updating mini-swe-agent cmd
961262e (wasiahmad) updating mini-swe-agent cmd
4790ea9 (wasiahmad) updating mini-swe-agent cmd
08db911 (wasiahmad) updating mini-swe-agent cmd
a85041c (wasiahmad) updating mini-swe-agent cmd
0288d9a (wasiahmad) updating mini-swe-agent cmd
7bae25b (wasiahmad) updating mini-swe-agent cmd
26d0e54 (wasiahmad) updating mini-swe-agent cmd
9bff948 (wasiahmad) updating mini-swe-agent cmd
241393d (wasiahmad) updating mini-swe-agent cmd
c1fb80c (wasiahmad) updating mini-swe-agent cmd
61c666d (wasiahmad) updating mini-swe-agent cmd
aa08f7d (wasiahmad) updating mini-swe-agent cmd
f3dc6df (wasiahmad) updating mini-swe-agent cmd
9cde4e6 (wasiahmad) updating mini-swe-agent cmd
5a64c00 (wasiahmad) updating mini-swe-agent cmd
d911f9f (wasiahmad) updating mini-swe-agent cmd
6dcba7b (wasiahmad) updating mini-swe-agent cmd
894ca31 (wasiahmad) updating mini-swe-agent cmd
0ce1349 (wasiahmad) updating mini-swe-agent cmd
b685c0b (wasiahmad) updating mini-swe-agent cmd
f27c50b (ludwig-n) Fix getting patch
d567697 (ludwig-n) Save configs in separate folder
4c3e0db (ludwig-n) Update docs
20f8580 (ludwig-n) Remove drop_params from configs
74922d7 (wasiahmad) supporting agent_max_turns
6615af3 (wasiahmad) downgrading rich to avoid issues with some instances
72f0d46 (wasiahmad) missing && added
8e0848a (wasiahmad) Merge branch 'main' into mini_swe_agent
6251767 (wasiahmad) Merge branch 'main' into mini_swe_agent
8a095f7 (wasiahmad) adding reference
0acf4a4 (ludwig-n) Remove step_limit and set cost_limit=0 in all configs
4b05c9d (wasiahmad) Merge branch 'main' into mini_swe_agent
276556c (wasiahmad) Merge branch 'main' into mini_swe_agent