Evaluation on Livecodebench-pro #1115
Merged
54 commits
2f01be8 wasiahmad: data downloading modified
f23866d wasiahmad: making test_25q2 as default split
0b7bd0d wasiahmad: updating data prep logic
359e757 wasiahmad: updating data prep logic
1560196 wasiahmad: updating data prep logic
dd376d6 gwarmstrong: MAINT update langugage-data dependency (#1076)
01384aa gwarmstrong: MAINT: Add audio requirements to vllm image (#1081)
c91e459 i-vainn: Add apex-shortlist dataset (#1080)
eb99be2 wprazuch: Introduce regex for small differences of formatting from judge (#1082)
a6f475c gnalbandyan: Add LCB Prompts, fix regex bug in robust_eval, remove CR, make summar…
6cb9b79 gwarmstrong: MAINT pin nemo-evaluator (#1095)
f96d242 gwarmstrong: Update issue templates
087d762 gwarmstrong: Delete .github/ISSUE_TEMPLATE directory
fdbefe9 gwarmstrong: enable blank issues (#1096)
c2c38cd bzantium: Fix input_file path handling when executor is "none" (#1089)
a915e8d gwarmstrong: TST for #1089 (#1097)
a5b3bd7 stephencge: Stepheng/prover cleanup (#1078)
62a8a06 jiacheng-xu: add stem dependencies in main python sandbox (#1099)
0af2629 Jorjeous: Audiometrics unification (#1093)
f796b77 gwarmstrong: FEAT Add Tavily Search (#1085)
f40f3a1 wasiahmad: updating code extraction logic (#1086)
0727665 jiacheng-xu: Sandbox add stem (#1101)
321edab Froxyy-dev: Handle none output in wmtp24++ (#1091)
d9e6d23 gwarmstrong: ENH enable sandbox env overrides in generate (#1107)
f56614b gwarmstrong: Search Tool Parameter updates (#1112)
2af0b63 stephencge: autoformalize cleanup (#1098)
4603c77 melllinia: HF ASR Leaderboard Evaluation (#1104)
67fbc84 stephencge: Stepheng/nemotron math proofs docs (#1111)
579e765 stephencge: Stepheng/prover gpt oss fix (#1114)
079da02 wasiahmad: changing code extraction logic
e33bf59 wasiahmad: minor fixes
a379a6a wasiahmad: minor fixes
8c54820 wasiahmad: minor fixes
97d5693 wedu-nvidia: add Nemotron-Math-V2.pdf (#1113)
d031317 wasiahmad: fixing metric issue and missing problem-id issue
b2b06ac wasiahmad: adding metric type for lcb-pro
8982271 wasiahmad: debugging
32ad110 wasiahmad: fixed a minor issue
9bb52d2 ludwig-n: SWE-bench: don't pass external environment variables into Apptainer c…
f99e5cb Jorjeous: Adding clan PR with AudioBench and Librispeech PC. (#1103)
ad51e99 gwarmstrong: Schema overrides for tool-calling (#1118)
28e7567 gwarmstrong: FIX tool call error handling and search tool errors (#1120)
464561d gwarmstrong: Use run.Script for generate pipeline (#1052)
e3aad78 SeanNaren: Port ICPC changes to IOI (#1046)
58eb7d9 anowaczynski-nvidia: replace raise error with LOG.warning in AA LCR dataset prepare (#1119)
5575646 gwarmstrong: FIX tavily search results return type (#1123)
0e64314 gwarmstrong: Revert "Use run.Script for generate pipeline (#1052)" (#1125)
f3f9c90 gwarmstrong: Fix: add serialized_output on bad request (#1127)
da917f6 wedu-nvidia: update paper link (#1128)
dca79a6 stephencge: update paper link, references to dataset, self-correction differences…
d94e953 wasiahmad: updating documentation
35dc934 gwarmstrong: FIX ioi ignore (#1131)
b149479 anowaczynski-nvidia: download AA-LCR_extracted-text.zip via hf_hub_download (#1126)
18e7043 wasiahmad: fixing conflicts