This repository was archived by the owner on Apr 20, 2026. It is now read-only.

Multinode Evals #245

Closed
Oseltamivir wants to merge 6 commits into ishandhanani:sa-submission-q1-2026 from Oseltamivir:sa-submission-q1-2026

Conversation


@Oseltamivir Oseltamivir commented Apr 7, 2026

Summary

Add InferenceX multi-node eval support through an lm-eval benchmark runner and an eval-only orchestration path. This lets InferenceX run accuracy-only jobs against existing srt-slurm multi-node disaggregated recipes without running the throughput benchmark stage.

How

  • Add an lm-eval benchmark runner that sources InferenceX's benchmarks/benchmark_lib.sh from a mounted /infmax-workspace.
  • Mount INFMAX_WORKSPACE into the container as /infmax-workspace when provided.
  • Add EVAL_ONLY=true handling in do_sweep.py so eval-only jobs start infra/workers/frontend, run the full model health check, skip throughput, and launch lm-eval directly.
  • Keep RUN_EVAL=true behavior as a post-benchmark eval path for normal throughput jobs.
  • Pass model/framework/topology metadata into the eval container, including served MODEL_NAME, prefill/decode TP/EP/DPA/worker counts, sequence length, precision, runner type, and eval concurrency.
  • Map srt-slurm PREFILL_DP_ATTN / DECODE_DP_ATTN env vars to the InferenceX PREFILL_DP_ATTENTION / DECODE_DP_ATTENTION names expected by append_lm_eval_summary.
  • Copy eval outputs (meta_env.json, results*.json, sample*.jsonl) into /logs/eval_results/ for launcher-side artifact pickup.
  • Preserve partial eval artifacts on lm-eval failure while still returning the original eval failure code.
  • Document the InferenceX lm-eval integration in docs/accuracy.md.
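The env-var mapping step above can be sketched as a small translation pass over the container environment. This is an illustrative sketch, not the PR's actual code: the `build_eval_env` helper and `ENV_NAME_MAP` names are assumptions; only the variable names themselves come from the description.

```python
# Hypothetical sketch: srt-slurm exports PREFILL_DP_ATTN / DECODE_DP_ATTN,
# while InferenceX's append_lm_eval_summary expects the *_DP_ATTENTION names.
ENV_NAME_MAP = {
    "PREFILL_DP_ATTN": "PREFILL_DP_ATTENTION",
    "DECODE_DP_ATTN": "DECODE_DP_ATTENTION",
}

def build_eval_env(base_env: dict) -> dict:
    """Return a copy of base_env with srt-slurm names aliased to the
    InferenceX names, leaving the originals in place."""
    env = dict(base_env)
    for src, dst in ENV_NAME_MAP.items():
        if src in env and dst not in env:
            env[dst] = env[src]
    return env
```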

What

For EVAL_ONLY=true:

  • srt-slurm still starts the normal deployment topology.
  • The throughput benchmark runner is skipped.
  • wait_for_model() verifies the configured prefill/decode or aggregated worker counts.
  • lm-eval runs against the OpenAI-compatible endpoint.
  • Eval failure is fatal.
  • A low score leads to failure.
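The eval-only control flow above can be sketched as follows. The stage functions are passed in as callables; their names are illustrative stand-ins, not the actual do_sweep.py API.

```python
def run_eval_only(start_infra, wait_for_model, run_lm_eval, check_score) -> int:
    """EVAL_ONLY=true path: deploy, health-check, then run lm-eval.
    Any eval failure is fatal in this mode."""
    start_infra()                       # infra/workers/frontend still start
    wait_for_model()                    # full health check, not a port probe
    rc = run_lm_eval()                  # accuracy run against the OAI endpoint
    if rc != 0:
        return rc                       # eval failure is fatal
    return 0 if check_score() else 1    # a low score also fails the job
```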

For RUN_EVAL=true without EVAL_ONLY=true:

  • The normal benchmark runs first.
  • lm-eval runs as a post-step if throughput succeeds.
  • Eval failure is non-fatal to the benchmark result.
  • A low score leads to failure.
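By contrast, the RUN_EVAL=true path treats an eval infrastructure failure as non-fatal while still failing on a low score. Again a hedged sketch with illustrative callables, not the PR's code:

```python
def run_benchmark_then_eval(run_benchmark, run_lm_eval, check_score) -> int:
    """RUN_EVAL=true path: throughput first, lm-eval as a post-step."""
    rc = run_benchmark()
    if rc != 0:
        return rc                       # throughput failure ends the job
    if run_lm_eval() != 0:
        return 0                        # eval failure is non-fatal here
    return 0 if check_score() else 1    # a low score still fails the job
```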

Validation run

https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24059388771

InferenceX PR

SemiAnalysisAI/InferenceX#1000

Oseltamivir and others added 6 commits April 4, 2026 11:38
Adds support for running lm-eval accuracy evaluations as a post-benchmark
step, leveraging the InferenceX benchmark_lib.sh harness.

- New LMEvalRunner registered as "lm-eval" benchmark type
- bench.sh script sources benchmark_lib.sh and calls run_eval/append_lm_eval_summary
- Post-benchmark eval hook in SweepOrchestrator.run() triggered by RUN_EVAL=true
- Auto-mount INFMAX_WORKSPACE into container when env var is set

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
In eval-only mode the benchmark stage is skipped, which also skips
its model health check. The 30s port check in _run_post_eval is
insufficient — workers are still loading. Use wait_for_model() with
the full health check config (same as benchmark stage) when
EVAL_ONLY=true.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
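The full health check this commit substitutes for the 30s port probe might look like the polling loop below. Function name, parameters, and defaults are assumptions for illustration; the actual wait_for_model shares its config with the benchmark stage.

```python
import time

def wait_for_model(count_ready_workers, expected: int,
                   timeout_s: float = 1800.0, poll_s: float = 1.0) -> bool:
    """Poll until the expected number of workers report ready, or time out.
    count_ready_workers is a callable returning the current ready count."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if count_ready_workers() >= expected:
            return True
        time.sleep(poll_s)
    return False
```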
Instead of capping eval examples with --limit to avoid timeouts,
use the highest benchmark concurrency for eval requests. This runs
the full eval set faster by matching the throughput the server was
already benchmarked at.

do_sweep.py computes max(config.benchmark.concurrencies) and passes
it as EVAL_CONC to the lm-eval bench script.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
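The concurrency selection described in this commit reduces to picking the largest configured benchmark concurrency and exporting it as EVAL_CONC. A minimal sketch; the helper name and env-dict shape mirror the description, not the code:

```python
def eval_concurrency(concurrencies: list) -> str:
    """Pick the highest benchmark concurrency for the lm-eval run,
    returned as a string suitable for an environment variable."""
    return str(max(concurrencies))

# e.g. a sweep configured with concurrencies [8, 64, 256]:
eval_env = {"EVAL_CONC": eval_concurrency([8, 64, 256])}
```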
@Oseltamivir (Author)

Continued in NVIDIA/srt-slurm#12

