
Add benchmark queue improvements: cancel, clear, timeout, status #12

Open
adriangb wants to merge 2 commits into alamb:main from adriangb:benchmark-queue-improvements

Conversation

@adriangb
Collaborator

@adriangb adriangb commented Feb 15, 2026

Summary

  • poll.sh: Run jobs with setsid for killable process groups; write .started marker files containing ISO timestamp + PID; handle cancelled jobs (file removed mid-run) gracefully
  • scrape_comments.py: Add cancel benchmark <id_or_url>, cancel benchmarks, and clear benchmark queue commands (allowed users only); enhance show benchmark queue table with Started and Status columns showing elapsed time for running jobs; add stale .started file cleanup when PID is dead
  • gh_compare_branch.sh: Add a 25-minute per-benchmark timeout (BENCHMARK_TIMEOUT); on timeout, skip to the next benchmark
  • gh_compare_branch_bench.sh: Add timeout + --no-run pre-compilation step to separate compile from execution
  • gh_compare_arrow.sh: Add timeout + --no-run pre-compilation step
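The `.started` marker format described above ("ISO timestamp + PID") can be read back for the queue display roughly as follows. This is an illustrative sketch, not code from the PR; `read_started_marker` is a hypothetical helper name:

```python
import os
from datetime import datetime, timezone

def read_started_marker(marker_path: str):
    """Parse '<ISO timestamp> <pid>' from a .started marker file, compute
    elapsed time, and report whether the recorded PID is still alive.
    os.kill with signal 0 probes a process without signalling it; a dead
    PID here is the cue for the stale-marker cleanup the PR describes."""
    stamp, pid = open(marker_path).read().split()
    started = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
    elapsed = datetime.now(timezone.utc) - started
    try:
        os.kill(int(pid), 0)
        alive = True
    except ProcessLookupError:
        alive = False  # PID is dead: the marker is stale
    except PermissionError:
        alive = True  # process exists but belongs to another user
    return started, elapsed, alive
```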

New Commands

  • cancel benchmark <comment_id> — Cancel a queued or running benchmark by comment ID (numeric, #issuecomment-NNN URL, or API URL). Kills running process group via SIGTERM.
  • cancel benchmarks — Cancel all queued and running benchmark jobs for the PR where the comment is posted. Reports counts of running and queued jobs cancelled.
  • clear benchmark queue — Remove all queued (non-running) jobs. Skips currently running jobs.

Test plan

  • Verify poll.sh creates .started files on job start and removes them on completion
  • Test show benchmark queue displays Started time and elapsed for running jobs, "queued" for waiting jobs
  • Test cancel benchmark <id> removes queued job files
  • Test cancel benchmark <id> kills running benchmarks via process group and removes files
  • Test cancel benchmarks on a PR with no jobs → "No benchmark jobs found"
  • Test cancel benchmarks on a PR with queued jobs → removes them, reports count
  • Test cancel benchmarks on a PR with running jobs → kills process group, removes files, reports count
  • Test that cancel benchmark <id> still works (singular form unchanged)
  • Test clear benchmark queue removes all non-running jobs and reports counts
  • Verify timeout command works on the benchmark machine (Linux, GNU coreutils)
  • Verify cargo bench --no-run compiles without running for criterion benchmarks

🤖 Generated with Claude Code

…display

- poll.sh: Run jobs with setsid for killable process groups, write .started
  marker files with timestamp and PID, handle cancelled jobs gracefully
- scrape_comments.py: Add cancel benchmark, clear benchmark queue commands,
  enhance queue display with Started/Status columns showing elapsed time
- gh_compare_branch.sh: Add 25-minute timeout per benchmark run
- gh_compare_branch_bench.sh: Add timeout and pre-compilation step
- gh_compare_arrow.sh: Add timeout and pre-compilation step

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@adriangb
Collaborator Author

cc @Dandandan

Adds a new "cancel benchmarks" (plural, no argument) command that
cancels all queued and running benchmark jobs for the PR where the
comment is posted. Restricted to ALLOWED_USERS.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@alamb
Owner

alamb commented Feb 24, 2026

Thank you for this PR

Posting some comments from Discord:

this is probably fine but I don't think I will have time to review it anytime soon. The issue is that I have no good way to test changes to the benchmark runner, and without that I don't want to break the runner / actively running scripts.

Maybe someone can help write some tests so that we would have a better ability to evaluate changes

@Dandandan
Collaborator

PR #12 Review: Benchmark queue improvements (cancel, clear, timeout, status)

The PR is well-structured and the overall approach is sound. Here are my observations:


poll.sh

Race condition in .started file creation (poll.sh)

```bash
setsid bash "$job" &
JOB_PID=$!
echo "$(date -u ...) $JOB_PID" > "${job}.started"   # <-- window here
wait $JOB_PID 2>/dev/null || true
```

There's a brief window between backgrounding the job and writing .started. If a cancel arrives in this window, get_job_started_info returns (None, None) and the job is treated as queued—the .sh file gets removed but the already-running process is never killed. In practice this window is tiny, but worth noting.
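One way to close this window is a re-check after the marker write: if a cancel removed the job file in the interim, kill the group that was just started. Sketched below in Python for clarity (poll.sh itself is bash, and `launch_job_racefree` is a hypothetical name, not the PR's actual code):

```python
import os
import signal
import subprocess
from datetime import datetime, timezone

def launch_job_racefree(job_path):
    """Launch a job in its own session, write the .started marker, then
    re-check for a cancel that raced with the launch. Illustrative
    sketch only."""
    proc = subprocess.Popen(["bash", job_path], start_new_session=True)
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    with open(f"{job_path}.started", "w") as f:
        f.write(f"{stamp} {proc.pid}\n")
    if not os.path.exists(job_path):
        # A cancel removed the job file between launch and marker write:
        # kill the group we just created and clean up the marker.
        os.killpg(proc.pid, signal.SIGTERM)
        os.remove(f"{job_path}.started")
        return None
    return proc
```

With `start_new_session=True`, the child's PID doubles as its process-group ID, so the same value recorded in the marker can later be passed to `os.killpg`.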

No SIGKILL escalation
cancel_benchmark and cancel_all_benchmarks only send SIGTERM. If a benchmark hangs and ignores SIGTERM (e.g., blocking I/O), there's no follow-up SIGKILL. Consider a short wait + SIGKILL, or at least document the limitation.
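The escalation suggested here could look roughly like the following. This is a hypothetical helper, not code from the PR:

```python
import os
import signal
import time

def terminate_group(pgid, grace=5.0):
    """Send SIGTERM to a process group; if anything in the group is
    still alive after `grace` seconds, follow up with SIGKILL."""
    try:
        os.killpg(pgid, signal.SIGTERM)
    except ProcessLookupError:
        return  # group already gone
    deadline = time.monotonic() + grace
    while time.monotonic() < deadline:
        try:
            os.killpg(pgid, 0)  # signal 0 probes without signalling
        except ProcessLookupError:
            return  # group exited within the grace period
        time.sleep(0.2)
    try:
        os.killpg(pgid, signal.SIGKILL)
    except ProcessLookupError:
        pass
```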


scrape_comments.py

clear benchmark queue is global, not PR-scoped (scrape_comments.py:clear_benchmark_queue)
cancel benchmarks is PR-scoped (find_jobs_by_pr_number), but clear benchmark queue calls list_job_files() which returns all .sh files in jobs/ regardless of PR. A user posting this from PR #5 would silently clear jobs enqueued for PR #7. This inconsistency is surprising — either scope both to the PR, or document that "clear queue" is a global admin action.

find_job_by_comment_id splits on both _ and - (scrape_comments.py)

```python
parts = re.split(r"[_\-]", name_no_ext)
if comment_id in parts:
```

Arrow job names are arrow-<pr>-<comment>.sh, DataFusion names are <pr>_<comment>.sh. If a PR number happens to equal a comment ID from another PR (unlikely but possible), this could return the wrong job. Might be worth anchoring the match to the last numeric segment.
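Anchoring to the last numeric segment, as suggested, might look like this (hypothetical sketch):

```python
import re

def matches_comment_id(job_filename, comment_id):
    """Compare the comment ID against only the LAST numeric segment of
    the job file name. Under both naming schemes the comment ID is the
    final number, so a PR number that happens to equal another PR's
    comment ID can no longer produce a false positive."""
    stem = job_filename.rsplit(".", 1)[0]
    numeric = [s for s in re.split(r"[_\-]", stem) if s.isdigit()]
    return bool(numeric) and numeric[-1] == str(comment_id)
```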

fetch_issue_comment_bodies(...).append(body) mutation — this is an existing pattern used to prevent double-posting on subsequent script runs, so it's fine here, but it's still worth being aware of.


Timeout handling in shell scripts

The timeout+continue logic in gh_compare_branch.sh is correct: if the baseline run times out, the loop skips the branch run and comparison for that benchmark. Good.

The --no-run pre-compilation step is a genuine improvement — it separates compilation time from benchmark time, reducing the risk of a timeout eating into actual measurement.

One nit on gh_compare_arrow.sh and gh_compare_branch_bench.sh: on a timeout during the main-branch run, the script exits with status 124 rather than continuing to the next benchmark. That may be intentional (the arrow/branch_bench scripts only have one benchmark each), but it differs from the behavior of gh_compare_branch.sh.


Minor nits

  • date -u +"%Y-%m-%dT%H:%M:%SZ" + str.replace("Z", "+00:00") works, though Python 3.11+ accepts Z natively in fromisoformat.
  • parse_comment_id_from_input handles all three ID formats (plain, HTML anchor, API URL) cleanly — good defensive parsing.
  • Stale .started cleanup in get_job_started_info is good defensive programming.

Summary

| Category | Verdict |
| --- | --- |
| Correctness | Mostly good; race condition is minor |
| Security | cancel/clear are gated on ALLOWED_USERS |
| clear queue scope | Inconsistent with cancel benchmarks; worth fixing |
| Timeout logic | Correct for the multi-bench loop; single-bench scripts exit on timeout |
| Overall | Ready to merge with minor concerns addressed |

The main actionable items are: (1) clarify/fix the global scope of clear benchmark queue, and (2) optionally add SIGKILL escalation or documentation for non-cooperative processes.

🤖 Generated with Claude Code

@adriangb
Collaborator Author

adriangb commented Mar 1, 2026

I think I’ll abandon this in favor of a more holistic solution: #13
