
Add benchmark queue improvements: cancel, clear, timeout, status #12

Open
adriangb wants to merge 2 commits into alamb:main from adriangb:benchmark-queue-improvements

Conversation

@adriangb
Collaborator

@adriangb adriangb commented Feb 15, 2026

Summary

  • poll.sh: Run jobs with setsid for killable process groups; write .started marker files containing ISO timestamp + PID; handle cancelled jobs (file removed mid-run) gracefully
  • scrape_comments.py: Add cancel benchmark <id_or_url>, cancel benchmarks, and clear benchmark queue commands (allowed users only); enhance show benchmark queue table with Started and Status columns showing elapsed time for running jobs; add stale .started file cleanup when PID is dead
  • gh_compare_branch.sh: Add a 25-minute per-benchmark timeout (BENCHMARK_TIMEOUT); on timeout, skip to the next benchmark
  • gh_compare_branch_bench.sh: Add timeout + --no-run pre-compilation step to separate compile from execution
  • gh_compare_arrow.sh: Add timeout + --no-run pre-compilation step
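The `.started` marker format described above ("ISO timestamp + PID") can be read back for the queue display roughly as follows. This is an illustrative sketch, not code from the PR; `read_started_marker` is a hypothetical helper name:

```python
import os
from datetime import datetime, timezone

def read_started_marker(marker_path: str):
    """Parse '<ISO timestamp> <pid>' from a .started marker file, compute
    elapsed time, and report whether the recorded PID is still alive.
    os.kill with signal 0 probes a process without signalling it; a dead
    PID here is the cue for the stale-marker cleanup the PR describes."""
    stamp, pid = open(marker_path).read().split()
    started = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
    elapsed = datetime.now(timezone.utc) - started
    try:
        os.kill(int(pid), 0)
        alive = True
    except ProcessLookupError:
        alive = False  # PID is dead: the marker is stale
    except PermissionError:
        alive = True  # process exists but belongs to another user
    return started, elapsed, alive
```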

New Commands

  • cancel benchmark <comment_id> — Cancel a queued or running benchmark by comment ID (numeric, #issuecomment-NNN URL, or API URL). Kills running process group via SIGTERM.
  • cancel benchmarks — Cancel all queued and running benchmark jobs for the PR where the comment is posted. Reports counts of running and queued jobs cancelled.
  • clear benchmark queue — Remove all queued (non-running) jobs. Skips currently running jobs.

Test plan

  • Verify poll.sh creates .started files on job start and removes them on completion
  • Test show benchmark queue displays Started time and elapsed for running jobs, "queued" for waiting jobs
  • Test cancel benchmark <id> removes queued job files
  • Test cancel benchmark <id> kills running benchmarks via process group and removes files
  • Test cancel benchmarks on a PR with no jobs → "No benchmark jobs found"
  • Test cancel benchmarks on a PR with queued jobs → removes them, reports count
  • Test cancel benchmarks on a PR with running jobs → kills process group, removes files, reports count
  • Test that cancel benchmark <id> still works (singular form unchanged)
  • Test clear benchmark queue removes all non-running jobs and reports counts
  • Verify timeout command works on the benchmark machine (Linux, GNU coreutils)
  • Verify cargo bench --no-run compiles without running for criterion benchmarks

🤖 Generated with Claude Code

…display

- poll.sh: Run jobs with setsid for killable process groups, write .started
  marker files with timestamp and PID, handle cancelled jobs gracefully
- scrape_comments.py: Add cancel benchmark, clear benchmark queue commands,
  enhance queue display with Started/Status columns showing elapsed time
- gh_compare_branch.sh: Add 25-minute timeout per benchmark run
- gh_compare_branch_bench.sh: Add timeout and pre-compilation step
- gh_compare_arrow.sh: Add timeout and pre-compilation step

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@adriangb
Collaborator Author

cc @Dandandan

Adds a new "cancel benchmarks" (plural, no argument) command that
cancels all queued and running benchmark jobs for the PR where the
comment is posted. Restricted to ALLOWED_USERS.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@alamb
Owner

alamb commented Feb 24, 2026

Thank you for this PR

Posting some comments from Discord:

this is probably fine but I don't think I will have time to review it anytime soon. The issue is that I have no good way to test changes to the benchmark runner, and without that I don't want to break the runner / actively running scripts.

Maybe someone can help write some tests so that we would have a better ability to evaluate changes

@Dandandan
Collaborator

PR #12 Review: Benchmark queue improvements (cancel, clear, timeout, status)

The PR is well-structured and the overall approach is sound. Here are my observations:


poll.sh

Race condition in .started file creation (poll.sh)

```bash
setsid bash "$job" &
JOB_PID=$!
echo "$(date -u ...) $JOB_PID" > "${job}.started"   # <-- window here
wait $JOB_PID 2>/dev/null || true
```

There's a brief window between backgrounding the job and writing .started. If a cancel arrives in this window, get_job_started_info returns (None, None) and the job is treated as queued—the .sh file gets removed but the already-running process is never killed. In practice this window is tiny, but worth noting.
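One way to close this window is a re-check after the marker write: if a cancel removed the job file in the interim, kill the group that was just started. Sketched below in Python for clarity (poll.sh itself is bash, and `launch_job_racefree` is a hypothetical name, not the PR's actual code):

```python
import os
import signal
import subprocess
from datetime import datetime, timezone

def launch_job_racefree(job_path):
    """Launch a job in its own session, write the .started marker, then
    re-check for a cancel that raced with the launch. Illustrative
    sketch only."""
    proc = subprocess.Popen(["bash", job_path], start_new_session=True)
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    with open(f"{job_path}.started", "w") as f:
        f.write(f"{stamp} {proc.pid}\n")
    if not os.path.exists(job_path):
        # A cancel removed the job file between launch and marker write:
        # kill the group we just created and clean up the marker.
        os.killpg(proc.pid, signal.SIGTERM)
        os.remove(f"{job_path}.started")
        return None
    return proc
```

With `start_new_session=True`, the child's PID doubles as its process-group ID, so the same value recorded in the marker can later be passed to `os.killpg`.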

No SIGKILL escalation
cancel_benchmark and cancel_all_benchmarks only send SIGTERM. If a benchmark hangs and ignores SIGTERM (e.g., blocking I/O), there's no follow-up SIGKILL. Consider a short wait + SIGKILL, or at least document the limitation.
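The escalation suggested here could look roughly like the following. This is a hypothetical helper, not code from the PR:

```python
import os
import signal
import time

def terminate_group(pgid, grace=5.0):
    """Send SIGTERM to a process group; if anything in the group is
    still alive after `grace` seconds, follow up with SIGKILL."""
    try:
        os.killpg(pgid, signal.SIGTERM)
    except ProcessLookupError:
        return  # group already gone
    deadline = time.monotonic() + grace
    while time.monotonic() < deadline:
        try:
            os.killpg(pgid, 0)  # signal 0 probes without signalling
        except ProcessLookupError:
            return  # group exited within the grace period
        time.sleep(0.2)
    try:
        os.killpg(pgid, signal.SIGKILL)
    except ProcessLookupError:
        pass
```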


scrape_comments.py

clear benchmark queue is global, not PR-scoped (scrape_comments.py:clear_benchmark_queue)
cancel benchmarks is PR-scoped (find_jobs_by_pr_number), but clear benchmark queue calls list_job_files() which returns all .sh files in jobs/ regardless of PR. A user posting this from PR #5 would silently clear jobs enqueued for PR #7. This inconsistency is surprising — either scope both to the PR, or document that "clear queue" is a global admin action.

find_job_by_comment_id splits on both _ and - (scrape_comments.py)

```python
parts = re.split(r"[_\-]", name_no_ext)
if comment_id in parts:
```

Arrow job names are arrow-<pr>-<comment>.sh, DataFusion names are <pr>_<comment>.sh. If a PR number happens to equal a comment ID from another PR (unlikely but possible), this could return the wrong job. Might be worth anchoring the match to the last numeric segment.
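Anchoring to the last numeric segment, as suggested, might look like this (hypothetical sketch):

```python
import re

def matches_comment_id(job_filename, comment_id):
    """Compare the comment ID against only the LAST numeric segment of
    the job file name. Under both naming schemes the comment ID is the
    final number, so a PR number that happens to equal another PR's
    comment ID can no longer produce a false positive."""
    stem = job_filename.rsplit(".", 1)[0]
    numeric = [s for s in re.split(r"[_\-]", stem) if s.isdigit()]
    return bool(numeric) and numeric[-1] == str(comment_id)
```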

fetch_issue_comment_bodies(...).append(body) mutation — this is an existing pattern used to prevent double-posting on subsequent script runs, so it's fine here, but it's still worth being aware of.


Timeout handling in shell scripts

The timeout+continue logic in gh_compare_branch.sh is correct: if the baseline run times out, the loop skips the branch run and comparison for that benchmark. Good.

The --no-run pre-compilation step is a genuine improvement — it separates compilation time from benchmark time, reducing the risk of a timeout eating into actual measurement.

One nit on gh_compare_arrow.sh and gh_compare_branch_bench.sh: on a timeout during the main-branch run, the script exits with status 124 rather than continuing to the next benchmark. That may be intentional (the arrow/branch_bench scripts only have one benchmark each), but it differs from the behavior of gh_compare_branch.sh.


Minor nits

  • date -u +"%Y-%m-%dT%H:%M:%SZ" + str.replace("Z", "+00:00") works, though Python 3.11+ accepts Z natively in fromisoformat.
  • parse_comment_id_from_input handles all three ID formats (plain, HTML anchor, API URL) cleanly — good defensive parsing.
  • Stale .started cleanup in get_job_started_info is good defensive programming.

Summary

| Category | Verdict |
| --- | --- |
| Correctness | Mostly good; race condition is minor |
| Security | cancel/clear are gated on ALLOWED_USERS |
| clear queue scope | Inconsistent with cancel benchmarks; worth fixing |
| Timeout logic | Correct for the multi-bench loop; single-bench scripts exit on timeout |
| Overall | Ready to merge with minor concerns addressed |

The main actionable items are: (1) clarify/fix the global scope of clear benchmark queue, and (2) optionally add SIGKILL escalation or documentation for non-cooperative processes.

🤖 Generated with Claude Code

@adriangb
Collaborator Author

adriangb commented Mar 1, 2026

I think I’ll abandon this in favor of a more holistic solution: #13
