Fix benchmark bugs: GPQA model detection, UTF-8 decode, random_range_ratio#194
Conversation
📝 WalkthroughWalkthroughThis PR extends benchmark configuration and model detection capabilities. It adds an optional Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/srtctl/cli/mixins/postprocess_stage.py (2)
274-274:⚠️ Potential issue | 🟡 MinorType mismatch:
container_mountsexpectsdict[Path, Path], notdict[str, str].The pipeline failure indicates
start_srun_processrequiresdict[Path, Path] | Noneforcontainer_mounts, but this passes string keys/values.Proposed fix
proc = start_srun_process( command=["bash", "-c", script], nodelist=[self.runtime.nodes.head], output=str(self.runtime.log_dir / "postprocess.log"), container_image="python:3.11", - container_mounts={str(self.runtime.log_dir): "/logs"}, + container_mounts={self.runtime.log_dir: Path("/logs")}, env_to_set=env, )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/srtctl/cli/mixins/postprocess_stage.py` at line 274, The call is passing strings for container_mounts but start_srun_process expects dict[Path, Path] | None; change the mapping to use pathlib.Path objects (e.g. Path(self.runtime.log_dir) as the key and Path("/logs") as the value) before passing it to start_srun_process, and add an import for pathlib.Path if missing; ensure the resulting container_mounts variable is typed/constructed as dict[Path, Path] (or None when appropriate) so the signature matches.
416-416:⚠️ Potential issue | 🟡 MinorType mismatch: Same issue as line 274.
Proposed fix
proc = start_srun_process( command=["bash", "-c", script], nodelist=[self.runtime.nodes.head], output=str(analysis_log), container_image="python:3.11", - container_mounts={str(self.runtime.log_dir): "/logs"}, + container_mounts={self.runtime.log_dir: Path("/logs")}, env_to_set=env_to_set, )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/srtctl/cli/mixins/postprocess_stage.py` at line 416, The container_mounts entry currently uses str(self.runtime.log_dir) which causes a typing mismatch; update the mapping to use the expected path type by passing self.runtime.log_dir (not str()) so the container_mounts parameter and the runtime.log_dir attribute types align (same fix as applied at the earlier occurrence around line 274); adjust any related annotations if needed to ensure container_mounts accepts the runtime path type.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@src/srtctl/cli/mixins/postprocess_stage.py`:
- Line 274: The call is passing strings for container_mounts but
start_srun_process expects dict[Path, Path] | None; change the mapping to use
pathlib.Path objects (e.g. Path(self.runtime.log_dir) as the key and
Path("/logs") as the value) before passing it to start_srun_process, and add an
import for pathlib.Path if missing; ensure the resulting container_mounts
variable is typed/constructed as dict[Path, Path] (or None when appropriate) so
the signature matches.
- Line 416: The container_mounts entry currently uses str(self.runtime.log_dir)
which causes a typing mismatch; update the mapping to use the expected path type
by passing self.runtime.log_dir (not str()) so the container_mounts parameter
and the runtime.log_dir attribute types align (same fix as applied at the
earlier occurrence around line 274); adjust any related annotations if needed to
ensure container_mounts accepts the runtime path type.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
src/srtctl/benchmarks/sa_bench.pysrc/srtctl/benchmarks/scripts/gpqa/bench.shsrc/srtctl/cli/mixins/postprocess_stage.pysrc/srtctl/core/schema.py
Summary
/v1/modelsAPI endpoint instead of hardcodingdeepseek-ai/DeepSeek-R1. Fixes 404 errors when serving models with different names.read_text(errors="replace")to handle non-UTF-8 bytes in benchmark output (ANSI escape codes, etc.) that causedUnicodeDecodeErrorcrashes.random_range_ratioparameter to bench script for configurable input/output length variance.random_range_ratiofield toBenchmarkConfig.Test plan
random_range_ratioset in configSummary by CodeRabbit
Release Notes
New Features
random_range_ratioparameter to customize SA-Bench behavior with sensible defaultsImprovements