Merged
Changes from 174 commits
232 commits
cd9cb64
initial poc
cquil11 Nov 12, 2025
00ac64a
remove -d flag when launching docker container
cquil11 Nov 12, 2025
e38b38a
syntax error
cquil11 Nov 12, 2025
66eae81
compatibility fixes
cquil11 Nov 12, 2025
fdec241
add correct endpoint prefix
cquil11 Nov 12, 2025
08de857
remove reference env var
cquil11 Nov 12, 2025
06231ee
run vllm serve in background
cquil11 Nov 12, 2025
21ed067
unescape sequences
cquil11 Nov 12, 2025
65ef1f0
stop vllm to stdout after it stops
cquil11 Nov 12, 2025
cb55721
stop vllm to stdout after it stops pt 2
cquil11 Nov 12, 2025
788b7f1
get rid of docker stop as no longer in detatched
cquil11 Nov 12, 2025
a87e174
clone bench serving to tmp dir
cquil11 Nov 12, 2025
c1d0a79
clone bench serving to tmp dir pt 2
cquil11 Nov 12, 2025
4823afa
add explanatory comment
cquil11 Nov 12, 2025
d52299f
cleaning up
cquil11 Nov 12, 2025
85de6e7
cleaning up
cquil11 Nov 13, 2025
48f7588
adding mi355x refactor
cquil11 Nov 13, 2025
faec31e
adding h200 initial refactor
cquil11 Nov 13, 2025
1ef1b23
different way to see server logs
cquil11 Nov 13, 2025
75523ee
cleanup
cquil11 Nov 13, 2025
2536652
now fail if server fails
cquil11 Nov 13, 2025
2d58f0d
starting on b200
cquil11 Nov 13, 2025
f5cf4a7
doign b200
cquil11 Nov 13, 2025
92af70b
reverting erroneous change
cquil11 Nov 13, 2025
f330d67
fixing b200
cquil11 Nov 14, 2025
c5fcf81
fixing b200 pt 2
cquil11 Nov 14, 2025
3ededf0
updating mi300
cquil11 Nov 14, 2025
813381b
updating mi300 pt 2
cquil11 Nov 14, 2025
e1b387c
updating mi300 pt 3 -- remove detached mode
cquil11 Nov 14, 2025
c0a5c62
cleaning up mi355x
cquil11 Nov 14, 2025
634768c
fixing mi300x and updating 325x
cquil11 Nov 14, 2025
61a5c8f
reverting max conc to 512 on gptoss fp4 b200 docker
cquil11 Nov 14, 2025
74363e4
mi325x debug
cquil11 Nov 14, 2025
220e026
add back correct launch script for new mi325x slurm cluster (#231)
cquil11 Nov 14, 2025
5db1af8
fixing mi300x and updating 325x
cquil11 Nov 14, 2025
9806d30
Merge branch 'main' into refactor-docker-runner-launch
cquil11 Nov 14, 2025
b4eb57e
cleanng up
cquil11 Nov 14, 2025
04e30f3
add wait for h200 slurm dsr1
cquil11 Nov 14, 2025
d36965a
max num seqs back to 512 for gptoss fpr b200 docker
cquil11 Nov 14, 2025
fa7cbca
fix port issue for dsr1 mi300x docker
cquil11 Nov 14, 2025
1031ac9
fix mi355x docker NUM_PROMPTS
cquil11 Nov 14, 2025
8b847f1
adding prop of failure for server logs
cquil11 Nov 14, 2025
832bafc
add utils function for benchmark
cquil11 Nov 14, 2025
ebe3b62
add utils function for benchmark
cquil11 Nov 14, 2025
aa9070f
function-ize the waiting for server to start
cquil11 Nov 14, 2025
0d2c112
dont show arg parsing set -x
cquil11 Nov 14, 2025
271091d
dont show arg parsing set +x oops
cquil11 Nov 14, 2025
898b132
dont show arg parsing set +x oops
cquil11 Nov 14, 2025
fd2e33e
capture server pid
cquil11 Nov 14, 2025
2a4faf5
Squash-merge bryan/eval into refactor-docker-runner-launch
Oseltamivir Nov 14, 2025
173d7bf
evals h100-cr
Oseltamivir Nov 15, 2025
4ff8a9b
evals h100-cw
Oseltamivir Nov 15, 2025
83901e7
evals h200-nb
Oseltamivir Nov 15, 2025
6c65a24
move eval script here
Oseltamivir Nov 15, 2025
343d24e
evals mi300x-amd
Oseltamivir Nov 15, 2025
2de4a18
evals mi325x-amd
Oseltamivir Nov 15, 2025
21825ce
evals mi300x-tw
Oseltamivir Nov 15, 2025
00bfa34
evals mi300x-oci
Oseltamivir Nov 15, 2025
e8aa07e
evals mi325x-tw
Oseltamivir Nov 15, 2025
bf4eff2
evals mi325x-tw summary
Oseltamivir Nov 15, 2025
71008bb
evals mi325x-tw summary
Oseltamivir Nov 15, 2025
7f3cd09
evals mi355x-amd
Oseltamivir Nov 15, 2025
dfff2f4
evals mi325x-tw summary
Oseltamivir Nov 15, 2025
9a11152
evals mi325x-tw summary
Oseltamivir Nov 15, 2025
1ead695
evals mi325x-tw summary
Oseltamivir Nov 15, 2025
348d5d9
all summary
Oseltamivir Nov 15, 2025
679caa6
evals b200-nvd
Oseltamivir Nov 16, 2025
eda5e2f
evals b200-nvd 2
Oseltamivir Nov 16, 2025
42151cc
evals b200-nvd 3
Oseltamivir Nov 16, 2025
512dfc0
evals h100-cr
Oseltamivir Nov 16, 2025
4de631d
evals b200-nvd 1
Oseltamivir Nov 16, 2025
b33cb80
evals h200-trt-cw
Oseltamivir Nov 16, 2025
5babdb0
evals h200-trt-cw 2
Oseltamivir Nov 16, 2025
12a85b8
evals h200-trt-cw 3
Oseltamivir Nov 16, 2025
eb2846f
evals h100-cr 2
Oseltamivir Nov 16, 2025
4166070
evals h200-trt-cw 4
Oseltamivir Nov 16, 2025
5f6b772
evals h200-trt-cw 5 (EP/TP HARD)
Oseltamivir Nov 16, 2025
30baa1f
evals h200-trt-cw 6 (EP/TP HARD)
Oseltamivir Nov 16, 2025
5a209fd
evals h200-trt-cw 6 (EP/TP HARD)
Oseltamivir Nov 16, 2025
89a9cbd
evals h200-cw dsr1
Oseltamivir Nov 16, 2025
9254ef1
evals mi300x-cr dsr1
Oseltamivir Nov 16, 2025
6705ea3
evals mi300x-cr dsr1 2
Oseltamivir Nov 16, 2025
c1fc6db
evals mi325x-cr dsr1
Oseltamivir Nov 16, 2025
090630a
evals mi325x-cr dsr1 2
Oseltamivir Nov 16, 2025
d984d7a
evals mi355x-amd dsr1
Oseltamivir Nov 16, 2025
fb66e33
evals mi355x-amd dsr1 2
Oseltamivir Nov 16, 2025
d0eb0c4
evals mi355x-amd dsr1 3
Oseltamivir Nov 16, 2025
c1dc1a6
evals mi355x-amd dsr1 4
Oseltamivir Nov 16, 2025
88d3bf5
evals b200-nvd dsr1
Oseltamivir Nov 16, 2025
8a0677d
evals b200-nvd fp8 dsr1
Oseltamivir Nov 16, 2025
dab1a2c
Merge remote-tracking branch 'origin/main' into evals-on-refactor
Oseltamivir Nov 20, 2025
f862af7
Lighteval 1
Oseltamivir Nov 21, 2025
5ef76ef
Lighteval 1.75
Oseltamivir Nov 21, 2025
3081241
Lighteval Mi325x
Oseltamivir Nov 21, 2025
f182319
Lighteval Mi300x CR
Oseltamivir Nov 21, 2025
5ba2cf2
Lighteval Mi355x amd
Oseltamivir Nov 21, 2025
5bf69ab
Lighteval b200_nvd
Oseltamivir Nov 21, 2025
f862689
Lighteval h200_cr0
Oseltamivir Nov 21, 2025
c3df519
Lighteval h200-nb_1
Oseltamivir Nov 21, 2025
c1edb9a
Lighteval h100-cw_1
Oseltamivir Nov 21, 2025
d21826b
Error reproduction
Oseltamivir Nov 22, 2025
abdad78
Error file removal
Oseltamivir Nov 22, 2025
bd36530
error reproducibility
Oseltamivir Nov 22, 2025
a0434b1
should NOT error reproduce
Oseltamivir Nov 22, 2025
f56a311
should NOT error reproduce
Oseltamivir Nov 22, 2025
27bd2de
should NOT error reproduce
Oseltamivir Nov 22, 2025
c058b16
should NOT error reproduce
Oseltamivir Nov 22, 2025
2e36914
Double check other runner
Oseltamivir Nov 23, 2025
d2cf0fb
Cleanup MI300x_AMD
Oseltamivir Nov 23, 2025
0a8901a
Cleanup MI300x_AMD
Oseltamivir Nov 23, 2025
afd304f
Cleanup MI300x_AMD
Oseltamivir Nov 23, 2025
ef2ee40
Cleanup MI300x_AMD MUST WORK
Oseltamivir Nov 23, 2025
3790696
works
Oseltamivir Nov 23, 2025
92f244c
Working lighteval
Oseltamivir Nov 25, 2025
3e30425
lightevel fix
Oseltamivir Nov 25, 2025
0d87ea5
lighteval test h100-cw_1
Oseltamivir Nov 25, 2025
00b1623
lighteval test h100-cr_1 + parsing
Oseltamivir Nov 25, 2025
83a71d2
lighteval test b200_nvd
Oseltamivir Nov 25, 2025
df71abe
lighteval test b200_nvd
Oseltamivir Nov 25, 2025
4aa8d34
lighteval test mi300x-amd_0
Oseltamivir Nov 25, 2025
fe2ecd5
lighteval test h100-cw_1
Oseltamivir Nov 25, 2025
fef016a
lighteval test mi300x-cr_0
Oseltamivir Nov 25, 2025
124eb70
lighteval test mi325x-tw_1
Oseltamivir Nov 25, 2025
2b0b986
lighteval test mi355x-amd_4
Oseltamivir Nov 25, 2025
dae7345
lighteval test b200-nvd_3
Oseltamivir Nov 25, 2025
993b19f
lighteval test h100-cw_1 sudo test
Oseltamivir Nov 25, 2025
f5b3a7a
b200 fix check
Oseltamivir Nov 25, 2025
ff1eba6
b200 fix check
Oseltamivir Nov 25, 2025
d6a52ec
b200 fix check
Oseltamivir Nov 25, 2025
4dd7e21
b200 fix check
Oseltamivir Nov 25, 2025
37bd3df
b200 fix check
Oseltamivir Nov 25, 2025
43c7c59
b200 fix check
Oseltamivir Nov 25, 2025
e5a8e3a
b200 fix check
Oseltamivir Nov 25, 2025
8fb95f4
b200 fix check
Oseltamivir Nov 25, 2025
237b4e8
b200 fix check
Oseltamivir Nov 25, 2025
79eadc5
Prelimary lighteval for all
Oseltamivir Nov 26, 2025
a2d77ff
Prelimary lighteval for all 2 - fixed TP
Oseltamivir Nov 26, 2025
4e139a0
Prelimary lighteval for all 3
Oseltamivir Nov 26, 2025
76b8c2c
Fix lighteval 1
Oseltamivir Nov 27, 2025
fda8e2c
Check both
Oseltamivir Nov 27, 2025
2e7c127
lm-eval check
Oseltamivir Nov 27, 2025
867bfc3
lm-eval check
Oseltamivir Nov 27, 2025
8cbe81f
lm-eval check
Oseltamivir Nov 27, 2025
1b3b79f
lm-eva
Oseltamivir Nov 27, 2025
65f0303
mi325x test
Oseltamivir Nov 27, 2025
ddd3862
mi325x test
Oseltamivir Nov 27, 2025
30ad3ba
all change, test deepseek
Oseltamivir Nov 28, 2025
688e2c5
all change, test deepseek
Oseltamivir Nov 28, 2025
6b320ce
retest mi325x
Oseltamivir Nov 28, 2025
9768dea
test b200
Oseltamivir Nov 28, 2025
4c339b4
clean b200
Oseltamivir Nov 28, 2025
efe94aa
test h200
Oseltamivir Nov 28, 2025
705fc10
H200 test
Oseltamivir Nov 28, 2025
f79f243
B200-nvd2 sleep
Oseltamivir Nov 28, 2025
d9a4fed
B200-nvd2 sleep
Oseltamivir Nov 28, 2025
8c6b944
B200-nvd2 sleep
Oseltamivir Nov 28, 2025
28a026f
mi325x test
Oseltamivir Nov 28, 2025
c4bd3d2
mi325x test, no text, no empty fix
Oseltamivir Nov 28, 2025
14068bc
h100, tmp eval_out
Oseltamivir Nov 29, 2025
af2c385
h100, tmp eval_out, sweep integration
Oseltamivir Nov 29, 2025
5e1d68d
touch up sweep naming, remove funny triton error
Oseltamivir Nov 29, 2025
1a3262f
touch up sweep summary
Oseltamivir Nov 29, 2025
733d7ca
touch up run name
Oseltamivir Nov 29, 2025
68c1a2d
Missing eval env var docker
Oseltamivir Nov 30, 2025
6cb94a7
Typo
Oseltamivir Nov 30, 2025
bc472c3
Add proper coverage
Oseltamivir Nov 30, 2025
837622f
Merge remote-tracking branch 'origin/main' into evals-on-refactor
Oseltamivir Dec 1, 2025
2461447
Add evals
Oseltamivir Dec 2, 2025
848e834
Merge branch 'main' into evals-on-refactor
cquil11 Dec 2, 2025
710d428
Cam's solution
Oseltamivir Dec 2, 2025
3c8b9bc
b200 scancel fix
Oseltamivir Dec 2, 2025
1390c52
Change to 2 fewshot, forgot eval env var in b200
Oseltamivir Dec 2, 2025
544e698
Resolve issues
Oseltamivir Dec 3, 2025
dd96fcf
Merge branch 'main' into evals-on-refactor
cquil11 Dec 3, 2025
5ec3378
Resolve issues/nits
Oseltamivir Dec 4, 2025
ae4e481
fix summary table hardware
Oseltamivir Dec 4, 2025
48a220d
fix summary table hardware
Oseltamivir Dec 4, 2025
61327ca
fix summary table hardware 2
Oseltamivir Dec 4, 2025
1cf2967
final touches
Oseltamivir Dec 5, 2025
34e3b2a
Merge branch 'main' into evals-on-refactor
cquil11 Dec 5, 2025
1d889b8
Cleanup comments, ammend lighteval
Oseltamivir Dec 6, 2025
779a257
pt 1 manual merge conflict fixes
cquil11 Dec 15, 2025
00e77d0
Merge branch 'main' into evals-on-refactor
cquil11 Dec 15, 2025
9d4b217
pt 2 manual merge conflict fixes
cquil11 Dec 15, 2025
a9fad5b
use double quotes for gha parsing
cquil11 Dec 15, 2025
e07eb69
getting rid of full sweep sched changes
cquil11 Dec 15, 2025
9275f0d
add back spec decoding and disagg env vars
cquil11 Dec 15, 2025
dba25aa
add an option to ONLY run evals
cquil11 Dec 16, 2025
5de917b
remove full-sweep-test workflow and add collect-evals job to run swee…
cquil11 Dec 16, 2025
37d05d3
add run-eval to e2e tests
cquil11 Dec 16, 2025
6a546e5
math500 prompt and h200 trt evals
Oseltamivir Dec 16, 2025
d299d41
remove run prefix
cquil11 Dec 16, 2025
569d0c3
add result-prefix to benchmark tmpl uploaded artifacts
cquil11 Dec 16, 2025
30a3431
Evals summary refactor
Oseltamivir Dec 17, 2025
22c8a2b
Evals summary refactor 2
Oseltamivir Dec 17, 2025
8d12b35
Evals summary aesthetics
Oseltamivir Dec 17, 2025
d7a515a
TRT package fix, trt testing
Oseltamivir Dec 18, 2025
25f71bd
trt testing 2
Oseltamivir Dec 18, 2025
ab6bf8f
max_num_tokens
Oseltamivir Dec 19, 2025
0472555
Merge branch 'main' into evals-on-refactor
cquil11 Jan 5, 2026
0d8d7d1
Merge branch 'main' into evals-on-refactor
cquil11 Jan 7, 2026
9a873c4
unbounded gen len
Oseltamivir Jan 8, 2026
999b9f6
Fix tmpl args, add isl/osl to table
Oseltamivir Jan 8, 2026
9a13250
add isl/osl
Oseltamivir Jan 8, 2026
4b0f8de
set max tokens
Oseltamivir Jan 12, 2026
a52f4c6
remove nvd
Oseltamivir Jan 12, 2026
568e1d3
In case of multiple evals
Oseltamivir Jan 13, 2026
d55c796
diagnostic
Oseltamivir Jan 13, 2026
cdd2332
Merge remote-tracking branch 'origin/main' into evals-on-refactor
Oseltamivir Jan 13, 2026
0699df8
Merge remote-tracking branch 'origin/main' into evals-on-refactor
Oseltamivir Jan 13, 2026
fcd14e2
test dp_attn
Oseltamivir Jan 13, 2026
c902545
DP_ATTENTION back
Oseltamivir Jan 14, 2026
715269c
REMOVE LIGHTEVAL
Oseltamivir Jan 15, 2026
c19bb21
Merge branch 'main' into evals-on-refactor, address claude
Oseltamivir Jan 15, 2026
be431c8
Merge branch 'main' into evals-on-refactor
Oseltamivir Jan 15, 2026
50f09cc
Merge remote-tracking branch 'origin/main' into evals-on-refactor
Oseltamivir Jan 16, 2026
500029b
Merge branch 'main' into evals-on-refactor
Oseltamivir Jan 20, 2026
a353ea4
Add evals for atom, trt_mtp
Oseltamivir Jan 20, 2026
d6d4055
remove tokenizer from benchmarkserving
Oseltamivir Jan 20, 2026
338d80c
remove model_name
Oseltamivir Jan 20, 2026
e28631c
More evals for spec decode
Oseltamivir Jan 20, 2026
fa49cdc
claude pr comments
Oseltamivir Jan 19, 2026
7e628ff
chore(deps): bump the github-actions group with 2 updates (#488)
dependabot[bot] Jan 19, 2026
518d004
fix: update ep metadata in gb200 dynamo sglang configs to match comme…
functionstackx Jan 19, 2026
388020f
Experimental folder (increasing researcher/developer velocity) (#489)
functionstackx Jan 19, 2026
ef15b99
summary table
Oseltamivir Jan 21, 2026
b5b9ec0
Merge branch 'main' into evals-on-refactor
Oseltamivir Jan 21, 2026
a1f9b89
Merge branch 'main' into evals-on-refactor
Oseltamivir Jan 21, 2026
62079d6
Remove git installation and repository cloning
Oseltamivir Jan 21, 2026
5409158
evals final
Oseltamivir Jan 21, 2026
9ae0f90
more retries, lower conc, for stability
Oseltamivir Jan 21, 2026
43fd4e8
Merge branch 'main' into evals-on-refactor
Oseltamivir Jan 21, 2026
30 changes: 27 additions & 3 deletions .github/workflows/benchmark-tmpl.yml
@@ -45,7 +45,10 @@ on:
required: false
type: string
default: '0.8'

run-eval:
type: boolean
required: false
default: false
env:
HF_TOKEN: ${{ secrets.HF_TOKEN }}
HF_HUB_CACHE: '/mnt/hf_hub_cache/'
@@ -62,6 +65,7 @@ env:
EP_SIZE: ${{ inputs.ep }}
DP_ATTENTION: ${{ inputs.dp-attn }}
CONC: ${{ inputs.conc }}
RUN_EVAL: ${{ inputs.run-eval }}

permissions:
contents: read
@@ -70,7 +74,7 @@ jobs:
benchmark:
runs-on: ${{ inputs.runner }}
timeout-minutes: 180
name: '${{ inputs.exp-name }} ${{ inputs.runner }} ${{ inputs.framework }} ${{ inputs.precision }} tp=${{ inputs.tp }} ep=${{ inputs.ep }} dpa=${{ inputs.dp-attn }} conc=${{ inputs.conc }}'
name: '${{ inputs.exp-name }} ${{ inputs.runner }} ${{ inputs.framework }} ${{ inputs.precision }} ${{ inputs.run-eval && ''eval '' || '''' }}tp=${{ inputs.tp }} ep=${{ inputs.ep }} dpa=${{ inputs.dp-attn }} conc=${{ inputs.conc }}'
steps:
- name: Resource cleanup
run: |
@@ -131,6 +135,9 @@ jobs:
env:
RUNNER_NAME: ${{ runner.name }}
RESULT_FILENAME: ${{ env.EXP_NAME }}_${{ env.PRECISION }}_${{ env.FRAMEWORK }}_tp${{ env.TP }}_ep${{ env.EP_SIZE }}_dpa_${{ env.DP_ATTENTION }}_conc${{ env.CONC }}_${{ runner.name }}
# Suppress per-job eval markdown from being appended to the step summary.
# We'll publish a single combined eval table in the collection job instead.
GITHUB_STEP_SUMMARY: ''
run: |
bash ./runners/launch_${RUNNER_NAME%%_*}.sh
FOUND_RESULT_FILE=
@@ -158,4 +165,21 @@ jobs:
uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
with:
name: ${{ env.RESULT_FILENAME }}
path: agg_${{ env.RESULT_FILENAME }}.json
path: agg_${{ env.RESULT_FILENAME }}.json

- name: Upload eval results (if any)
if: ${{ env.RUN_EVAL == 'true' }}
uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
with:
name: eval_${{ env.EXP_NAME }}_${{ env.RESULT_FILENAME }}
path: |
meta_env.json
results*.json
if-no-files-found: ignore

- name: Cleanup eval outputs (post-upload)
if: ${{ env.RUN_EVAL == 'true' }}
run: |
rm -f meta_env.json || true
# Remove any eval results JSONs that were moved into workspace
rm -f results*.json || true
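The launch step dispatches on the runner label with `${RUNNER_NAME%%_*}`, which removes everything from the first underscore onward, so an instance such as `h100-cw_0` selects `./runners/launch_h100-cw.sh`. The same step sets `GITHUB_STEP_SUMMARY` to an empty string; if that empty value reaches a script, an unguarded `>> $GITHUB_STEP_SUMMARY` fails in bash with an "ambiguous redirect" error. The sketch below shows both behaviors. `launcher_for` and `append_summary` are hypothetical helper names, not functions from the repository.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Map a runner name to its launch script: ${name%%_*} drops the longest
# suffix matching "_*", i.e. everything from the first underscore on.
launcher_for() {
  local runner_name="$1"
  printf './runners/launch_%s.sh\n' "${runner_name%%_*}"
}

# Append a line to the step summary only when GITHUB_STEP_SUMMARY names a file.
# An empty value turns this into a no-op instead of a redirect error.
append_summary() {
  if [[ -n "${GITHUB_STEP_SUMMARY:-}" ]]; then
    printf '%s\n' "$*" >> "$GITHUB_STEP_SUMMARY"
  fi
}

launcher_for h100-cw_0      # ./runners/launch_h100-cw.sh
launcher_for mi355x-amd_4   # ./runners/launch_mi355x-amd.sh

# Mirrors the benchmark step, which sets GITHUB_STEP_SUMMARY to ''.
GITHUB_STEP_SUMMARY='' append_summary "| task | score |"
```

Guarding at the write site keeps per-job scripts usable both inside the benchmark step (summary suppressed) and in a standalone eval run where the summary file exists.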
45 changes: 45 additions & 0 deletions .github/workflows/collect-evals.yml
@@ -0,0 +1,45 @@
name: Template - Collect Evals

on:
workflow_call:
inputs:
exp-name:
required: false
type: string
default: ''

permissions:
contents: read

jobs:
collect-evals:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
with:
token: ${{ secrets.REPO_PAT }}
fetch-depth: 0

- name: Download eval artifacts
uses: actions/download-artifact@018cc2cf5baa6db3ef3c5f8a56943fffe632ef53 # v6.0.0
with:
path: eval_results/
pattern: ${{ inputs.exp-name && format('eval_{0}_*', inputs.exp-name) || 'eval_*' }}

- name: Summarize evals
run: |
echo "## Eval Summary - ${{ inputs.exp-name || 'all' }}" >> $GITHUB_STEP_SUMMARY
echo "" >> $GITHUB_STEP_SUMMARY
python3 utils/collect_eval_results.py eval_results/ ${{ inputs.exp-name || 'all' }} >> $GITHUB_STEP_SUMMARY

- name: Upload aggregated evals
uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
with:
name: eval_results_${{ inputs.exp-name || 'all' }}
path: agg_eval_${{ inputs.exp-name || 'all' }}.json

- name: Cleanup downloaded eval artifacts
if: ${{ always() }}
run: |
rm -rf eval_results/ || true
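The download step selects artifacts with `eval_<exp-name>_*`, or `eval_*` when no experiment name is given. The bash sketch below approximates that selection with `[[ name == pattern ]]`; the action uses its own glob matcher, so treat this as an approximation. The artifact names are made up, following the `eval_${EXP_NAME}_${RESULT_FILENAME}` scheme from benchmark-tmpl.yml, and `eval_pattern`/`select_artifacts` are hypothetical helpers. Because `*` also matches underscores, an exp-name that is a prefix of another (say a hypothetical `dsr1` next to `dsr1_1k1k`) would select both sets.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Same fallback as the workflow expression:
#   inputs.exp-name && format('eval_{0}_*', inputs.exp-name) || 'eval_*'
eval_pattern() {
  local exp_name="${1:-}"
  if [[ -n "$exp_name" ]]; then
    printf 'eval_%s_*\n' "$exp_name"
  else
    printf 'eval_*\n'
  fi
}

# Print each artifact name that the glob pattern selects.
select_artifacts() {
  local pattern="$1"; shift
  local name
  for name in "$@"; do
    # An unquoted right-hand side of == inside [[ ]] is matched as a glob.
    if [[ "$name" == $pattern ]]; then
      printf '%s\n' "$name"
    fi
  done
}

# Hypothetical artifact names (eval_${EXP_NAME}_${RESULT_FILENAME}).
names=(
  eval_dsr1_1k1k_dsr1_1k1k_fp8_sglang_tp8_ep1_dpa_false_conc4_h200-cw_0
  eval_gptoss_1k1k_gptoss_1k1k_fp4_vllm_tp2_ep1_dpa_false_conc8_b200-nvd_3
  dsr1_1k1k_fp8_sglang_tp8_ep1_dpa_false_conc4_h200-cw_0
)

select_artifacts "$(eval_pattern dsr1_1k1k)" "${names[@]}"  # first name only
select_artifacts "$(eval_pattern "")" "${names[@]}"         # both eval_* names
```

The third name is a benchmark artifact without the `eval_` prefix, so neither pattern picks it up.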
64 changes: 64 additions & 0 deletions .github/workflows/eval-gms8k.yml
@@ -0,0 +1,64 @@
name: Eval - GSM8K (PoC)

on:
workflow_dispatch:
inputs:
exp-name:
description: "Experiment name (prefix selects docker script)"
required: false
type: string
default: "gptoss_gsm8k_poc"
image:
description: "Serving image"
required: false
type: string
default: "vllm/vllm-openai:v0.11.0"
model:
description: "Model"
required: false
type: string
default: "openai/gpt-oss-120b"
tp:
description: "Tensor Parallel Size"
required: false
type: string
default: "2"
port:
description: "Server port"
required: false
type: string
default: "8888"
num_fewshot:
description: "Fewshot k for GSM8K"
required: false
type: string
default: "5"
limit:
description: "Sample limit for GSM8K"
required: false
type: string
default: "1300"
push:
paths:
- '.github/workflows/eval-gms8k.yml'
- '.github/workflows/eval-tmpl.yml'
- 'benchmarks/benchmark_lib.sh'

jobs:
eval:
uses: ./.github/workflows/eval-tmpl.yml
secrets: inherit
with:
runner: h100-cw_0
image: ${{ inputs.image || 'vllm/vllm-openai:v0.11.0' }}
model: ${{ inputs.model || 'openai/gpt-oss-120b' }}
framework: vllm
precision: fp4
exp-name: ${{ inputs.exp-name || 'gptoss_gsm8k_poc' }}
tp: '4'
ep: '1'
dp-attn: false
port: ${{ inputs.port || '8888' }}
eval-task: gsm8k
num-fewshot: ${{ inputs.num_fewshot || '5' }}
limit: ${{ inputs.limit || '200' }}
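Each value in the job above is read as `${{ inputs.x || 'default' }}` because the `push` trigger supplies no inputs, so the literal fallback applies. Two consequences follow from the values in the diff: a dispatch run with untouched defaults evaluates `limit=1300`, while a push-triggered run falls back to `200`; and `tp` is hard-coded to `'4'` in the job, so the `tp` input (default `"2"`) is never forwarded. The sketch below mirrors the fallback with bash's `${VAR:-default}`, which, like `||`, treats an empty string as missing. The `INPUT_*` variable names and `resolve_eval_args` are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Resolve eval settings the way eval-gms8k.yml does: use the input when it
# is non-empty, otherwise the literal fallback from the job's `with:` block.
resolve_eval_args() {
  local image="${INPUT_IMAGE:-vllm/vllm-openai:v0.11.0}"
  local model="${INPUT_MODEL:-openai/gpt-oss-120b}"
  local num_fewshot="${INPUT_NUM_FEWSHOT:-5}"
  local limit="${INPUT_LIMIT:-200}"
  printf 'image=%s model=%s num_fewshot=%s limit=%s\n' \
    "$image" "$model" "$num_fewshot" "$limit"
}

# push trigger: no inputs, so every fallback applies (limit=200).
resolve_eval_args
# workflow_dispatch with untouched defaults: the input default wins (limit=1300).
INPUT_LIMIT=1300 resolve_eval_args
# An explicitly empty input behaves like a missing one, as with `||`.
INPUT_LIMIT='' resolve_eval_args
```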
152 changes: 152 additions & 0 deletions .github/workflows/eval-tmpl.yml
@@ -0,0 +1,152 @@
name: Template - Eval

on:
workflow_call:
inputs:
runner:
required: true
type: string
image:
required: true
type: string
model:
required: true
type: string
framework:
required: true
type: string
precision:
required: true
type: string
exp-name:
required: true
type: string
tp:
required: true
type: string
ep:
required: false
type: string
default: '1'
dp-attn:
required: false
type: boolean
default: false
port:
required: false
type: string
default: '8888'
eval-task:
required: true
type: string
num-fewshot:
required: false
type: string
default: '5'
limit:
required: false
type: string
default: '200'

env:
HF_TOKEN: ${{ secrets.HF_TOKEN }}
HF_HUB_CACHE: '/mnt/hf_hub_cache/'
EXP_NAME: ${{ inputs.exp-name }}
MODEL: ${{ inputs.model }}
IMAGE: ${{ inputs.image }}
FRAMEWORK: ${{ inputs.framework }}
PRECISION: ${{ inputs.precision }}
TP: ${{ inputs.tp }}
EP_SIZE: ${{ inputs.ep }}
DP_ATTENTION: ${{ inputs.dp-attn }}
PORT: ${{ inputs.port }}
EVAL_TASK: ${{ inputs['eval-task'] }}
NUM_FEWSHOT: ${{ inputs['num-fewshot'] }}
LIMIT: ${{ inputs.limit }}
# Keep eval outputs only under /tmp
EVAL_RESULT_DIR: /tmp/eval_out
CONC: '32'
MAX_MODEL_LEN: '4096'
ISL: 1024
OSL: 1024
RANDOM_RANGE_RATIO: '0.8'
RESULT_FILENAME: results
RUN_EVAL: true

jobs:
eval:
runs-on: ${{ inputs.runner }}
timeout-minutes: 180
name: "Eval ${{ inputs.exp-name }} ${{ inputs.runner }} ${{ inputs.precision }} tp=${{ inputs.tp }} task=${{ inputs['eval-task'] }} limit=${{ inputs.limit }}"
steps:
- name: Resource cleanup
run: |
sudo rm -rf /home/nvadmin/actions-runner/_work/InferenceMAX/InferenceMAX/eval_out/
# Helper to avoid indefinite hangs on flaky tools (Docker/Slurm)
safe_timeout() {
if command -v timeout >/dev/null 2>&1; then
timeout -k 5 30s "$@"
else
"$@"
fi
}
host=$(hostname)
if [[ "$host" == "b200-81" || "$host" == "b200-80" || "$host" == "b200-79" ]]; then
if command -v docker >/dev/null 2>&1; then
echo "[INFO] Running container-by-container cleanup on $host"
cids=$(safe_timeout docker ps -aq || true)
for cid in $cids; do
echo "[INFO] Cleaning container $cid"
safe_timeout docker stop -t 90 "$cid" || true
safe_timeout docker wait "$cid" >/dev/null 2>&1 || true
safe_timeout docker rm -f "$cid" >/dev/null 2>&1 || true
done
sleep 2
if nvidia-smi --query-compute-apps=pid --format=csv,noheader | grep -q '[0-9]'; then
echo "[WARN] After stop, GPU still busy:"
nvidia-smi || true
fi
else
echo "[Docker] docker client not found; skipping cleanup"
fi
else
echo "[Docker] skipping docker cleanup on host $host"
fi
# Best-effort cleanup of prior eval outputs; do not block

if command -v squeue >/dev/null 2>&1; then
echo "[Slurm] Cleaning up resources ..."
safe_timeout scancel -u "$USER" || true
# Wait up to 5 minutes for jobs to clear to avoid indefinite hang
end=$((SECONDS + 300))
while [ $SECONDS -lt $end ]; do
queued=$(safe_timeout squeue -u "$USER" --noheader --format='%i' 2>/dev/null || true)
if [ -z "$queued" ]; then
break
fi
echo "$queued" | sed 's/^/[Slurm] pending job: /' || true
sleep 5
done
# Final status; do not block
safe_timeout squeue -u "$USER" || true
if [ -n "$(safe_timeout squeue -u "$USER" --noheader --format='%i' 2>/dev/null || true)" ]; then
echo "[Slurm] Jobs still present after timeout; proceeding"
fi
fi

- uses: actions/checkout@v5
with:
fetch-depth: 0
# Avoid aggressive workspace deletion if stale, rely on git reset/clean later
clean: true

- name: Launch eval via runner script
env:
RUNNER_NAME: ${{ runner.name }}
RUN_MODE: eval
# Optional: structured filename if runner chooses to use it later
EVAL_RESULT_BASENAME: ${{ env.EXP_NAME }}_${{ env.PRECISION }}_${{ env.FRAMEWORK }}_${{ runner.name }}
run: |
bash ./runners/launch_${RUNNER_NAME%%_*}.sh

# Intentionally no eval artifact uploads: eval outputs remain in /tmp only.
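The cleanup step in eval-tmpl.yml relies on two guards against hangs: `safe_timeout`, which wraps a command in `timeout -k 5 30s` when `timeout` is installed, and a `squeue` polling loop bounded by a `SECONDS`-based deadline. The sketch below restates both as reusable functions. `safe_timeout_for` (which takes the duration as a parameter) and `wait_until_empty` are hypothetical names, and the demo uses a temporary file as a stand-in for the Slurm queue.

```shell
#!/usr/bin/env bash
set -uo pipefail

# Like safe_timeout in eval-tmpl.yml, with the duration as a parameter
# instead of the fixed 30s. GNU timeout sends TERM after the duration,
# KILL 5s later if the command is still alive, and exits 124 on timeout.
safe_timeout_for() {
  local duration="$1"; shift
  if command -v timeout >/dev/null 2>&1; then
    timeout -k 5 "$duration" "$@"
  else
    "$@"
  fi
}

# Poll a command until it prints nothing or the budget runs out, as the
# squeue loop does. Returns 0 once the output is empty, 1 on timeout.
wait_until_empty() {
  local budget_s="$1" interval_s="$2"; shift 2
  local end=$((SECONDS + budget_s))
  local out
  while (( SECONDS < end )); do
    out=$("$@" 2>/dev/null || true)
    if [[ -z "$out" ]]; then
      return 0
    fi
    sleep "$interval_s"
  done
  return 1
}

# A 1s budget cuts `sleep 5` short (status 124 when timeout(1) exists).
safe_timeout_for 1s sleep 5
echo "safe_timeout_for status: $?"

# A temp file stands in for the Slurm queue; a background job drains it.
queue=$(mktemp)
echo "12345" > "$queue"
( sleep 1; : > "$queue" ) &
if wait_until_empty 10 1 cat "$queue"; then
  echo "queue drained"
else
  echo "still busy after timeout; proceeding"
fi
wait
rm -f "$queue"
```

As in the workflow, a timeout is reported and then ignored, so a stuck tool delays the job by a bounded amount instead of blocking it.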
4 changes: 2 additions & 2 deletions .github/workflows/full-sweep-1k1k-scheduler.yml
@@ -17,7 +17,7 @@ jobs:
- id: get-dsr1-configs
run: |
pip install pydantic
CONFIG_JSON=$(python3 ${GITHUB_WORKSPACE}/utils/matrix-logic/generate_sweep_configs.py full-sweep --config-files ${GITHUB_WORKSPACE}/.github/configs/nvidia-master.yaml ${GITHUB_WORKSPACE}/.github/configs/amd-master.yaml --seq-lens 1k1k --model-prefix dsr1)
CONFIG_JSON=$(python3 ${GITHUB_WORKSPACE}/utils/matrix-logic/generate_sweep_configs.py full-sweep --config-files ${GITHUB_WORKSPACE}/.github/configs/nvidia-master.yaml ${GITHUB_WORKSPACE}/.github/configs/amd-master.yaml --seq-lens 1k1k --model-prefix dsr1 --run-evals)
echo "search-space-config=$CONFIG_JSON" >> $GITHUB_OUTPUT

get-gptoss-configs:
@@ -31,7 +31,7 @@ jobs:
- id: get-gptoss-configs
run: |
pip install pydantic
CONFIG_JSON=$(python3 ${GITHUB_WORKSPACE}/utils/matrix-logic/generate_sweep_configs.py full-sweep --config-files ${GITHUB_WORKSPACE}/.github/configs/nvidia-master.yaml ${GITHUB_WORKSPACE}/.github/configs/amd-master.yaml --seq-lens 1k1k --model-prefix gptoss)
CONFIG_JSON=$(python3 ${GITHUB_WORKSPACE}/utils/matrix-logic/generate_sweep_configs.py full-sweep --config-files ${GITHUB_WORKSPACE}/.github/configs/nvidia-master.yaml ${GITHUB_WORKSPACE}/.github/configs/amd-master.yaml --seq-lens 1k1k --model-prefix gptoss --run-evals)
echo "search-space-config=$CONFIG_JSON" >> $GITHUB_OUTPUT

benchmark-dsr1:
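Each scheduler job publishes the sweep matrix by appending `search-space-config=$CONFIG_JSON` to the file named by `$GITHUB_OUTPUT`. That `name=value` form carries only a single-line value, so it works as long as generate_sweep_configs.py prints its JSON on one line; GitHub also accepts a `name<<DELIMITER` block for multi-line values. The sketch below picks the right form automatically. `set_output` is a hypothetical helper, the JSON is a placeholder rather than the real sweep-config schema, and `$GITHUB_OUTPUT` is pointed at a temporary file so the script runs outside Actions.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Append a step output to $GITHUB_OUTPUT, choosing the single-line
# `name=value` form or the multi-line `name<<DELIMITER` block.
set_output() {
  local name="$1" value="$2"
  if [[ "$value" == *$'\n'* ]]; then
    # Random delimiter so it is unlikely to collide with a line of the value.
    local delim="EOF_${RANDOM}${RANDOM}"
    {
      printf '%s<<%s\n' "$name" "$delim"
      printf '%s\n' "$value"
      printf '%s\n' "$delim"
    } >> "$GITHUB_OUTPUT"
  else
    printf '%s=%s\n' "$name" "$value" >> "$GITHUB_OUTPUT"
  fi
}

# Local stand-in for the file the runner provides.
GITHUB_OUTPUT=$(mktemp)

# Placeholder JSON, not the real sweep-config schema.
CONFIG_JSON='[{"example": "single-line"}]'
set_output search-space-config "$CONFIG_JSON"

# Pretty-printed (multi-line) JSON takes the delimiter form instead.
set_output pretty-config $'[\n  {"example": "multi-line"}\n]'

cat "$GITHUB_OUTPUT"
rm -f "$GITHUB_OUTPUT"
```

The same change applies identically to the 1k8k and 8k1k schedulers, which differ only in the `--seq-lens` value.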
4 changes: 2 additions & 2 deletions .github/workflows/full-sweep-1k8k-scheduler.yml
@@ -17,7 +17,7 @@ jobs:
- id: get-dsr1-configs
run: |
pip install pydantic
CONFIG_JSON=$(python3 ${GITHUB_WORKSPACE}/utils/matrix-logic/generate_sweep_configs.py full-sweep --config-files ${GITHUB_WORKSPACE}/.github/configs/nvidia-master.yaml ${GITHUB_WORKSPACE}/.github/configs/amd-master.yaml --seq-lens 1k8k --model-prefix dsr1)
CONFIG_JSON=$(python3 ${GITHUB_WORKSPACE}/utils/matrix-logic/generate_sweep_configs.py full-sweep --config-files ${GITHUB_WORKSPACE}/.github/configs/nvidia-master.yaml ${GITHUB_WORKSPACE}/.github/configs/amd-master.yaml --seq-lens 1k8k --model-prefix dsr1 --run-evals)
echo "search-space-config=$CONFIG_JSON" >> $GITHUB_OUTPUT

get-gptoss-configs:
@@ -31,7 +31,7 @@ jobs:
- id: get-gptoss-configs
run: |
pip install pydantic
CONFIG_JSON=$(python3 ${GITHUB_WORKSPACE}/utils/matrix-logic/generate_sweep_configs.py full-sweep --config-files ${GITHUB_WORKSPACE}/.github/configs/nvidia-master.yaml ${GITHUB_WORKSPACE}/.github/configs/amd-master.yaml --seq-lens 1k8k --model-prefix gptoss)
CONFIG_JSON=$(python3 ${GITHUB_WORKSPACE}/utils/matrix-logic/generate_sweep_configs.py full-sweep --config-files ${GITHUB_WORKSPACE}/.github/configs/nvidia-master.yaml ${GITHUB_WORKSPACE}/.github/configs/amd-master.yaml --seq-lens 1k8k --model-prefix gptoss --run-evals)
echo "search-space-config=$CONFIG_JSON" >> $GITHUB_OUTPUT

benchmark-dsr1:
4 changes: 2 additions & 2 deletions .github/workflows/full-sweep-8k1k-scheduler.yml
@@ -17,7 +17,7 @@ jobs:
- id: get-dsr1-configs
run: |
pip install pydantic
CONFIG_JSON=$(python3 ${GITHUB_WORKSPACE}/utils/matrix-logic/generate_sweep_configs.py full-sweep --config-files ${GITHUB_WORKSPACE}/.github/configs/nvidia-master.yaml ${GITHUB_WORKSPACE}/.github/configs/amd-master.yaml --seq-lens 8k1k --model-prefix dsr1)
CONFIG_JSON=$(python3 ${GITHUB_WORKSPACE}/utils/matrix-logic/generate_sweep_configs.py full-sweep --config-files ${GITHUB_WORKSPACE}/.github/configs/nvidia-master.yaml ${GITHUB_WORKSPACE}/.github/configs/amd-master.yaml --seq-lens 8k1k --model-prefix dsr1 --run-evals)
echo "search-space-config=$CONFIG_JSON" >> $GITHUB_OUTPUT

get-gptoss-configs:
@@ -31,7 +31,7 @@ jobs:
- id: get-gptoss-configs
run: |
pip install pydantic
CONFIG_JSON=$(python3 ${GITHUB_WORKSPACE}/utils/matrix-logic/generate_sweep_configs.py full-sweep --config-files ${GITHUB_WORKSPACE}/.github/configs/nvidia-master.yaml ${GITHUB_WORKSPACE}/.github/configs/amd-master.yaml --seq-lens 8k1k --model-prefix gptoss)
CONFIG_JSON=$(python3 ${GITHUB_WORKSPACE}/utils/matrix-logic/generate_sweep_configs.py full-sweep --config-files ${GITHUB_WORKSPACE}/.github/configs/nvidia-master.yaml ${GITHUB_WORKSPACE}/.github/configs/amd-master.yaml --seq-lens 8k1k --model-prefix gptoss --run-evals)
echo "search-space-config=$CONFIG_JSON" >> $GITHUB_OUTPUT

benchmark-dsr1: