fix: security hardening — path traversal, injection, jq filter bug (batch 3) by johnwaldo · Pull Request #3877 · marcusquinn/aidevops

johnwaldo · 2026-03-08T12:13:50Z

Summary

Security hardening fixes for 3 scripts addressing path traversal, command injection, and data corruption vulnerabilities.

Issues Fixed

Issue	File	Fix
#3152	`worker-token-helper.sh`	Validate `--token-file` path is within `TOKEN_DIR` (prevents path traversal + arbitrary file deletion); validate `--ttl` is numeric (prevents bash arithmetic injection); remove `2>/dev/null` from strategy fallback calls
#3179	`ai-judgment-helper.sh`	Use `awk -v` for safe variable passing (prevents command injection via `threshold`); use `jq` for JSON construction (prevents JSON injection via LLM output)
#3240	`stuck-detection-helper.sh`	Fix jq filter to match on both issue number AND repo slug (prevents cross-repo data corruption); use `awk -v` for safe variable passing; remove `2>/dev/null` from `gh issue edit` calls

Vulnerability Details

Path traversal (quality-debt: .agents/scripts/worker-token-helper.sh — PR #3094 review feedback (critical) #3152): --token-file accepted any path — attacker could overwrite/delete arbitrary files via dd if=/dev/urandom + rm -f
Arithmetic injection (quality-debt: .agents/scripts/worker-token-helper.sh — PR #3094 review feedback (critical) #3152): --ttl value used in (( )) without validation — bash evaluates expressions like a[$(id>&2)0]
Awk command injection (quality-debt: .agents/scripts/ai-judgment-helper.sh — PR #2914 review feedback (critical) #3179): threshold CLI arg interpolated into awk string — 0.7; system("id") executes arbitrary commands
JSON injection (quality-debt: .agents/scripts/ai-judgment-helper.sh — PR #2914 review feedback (critical) #3179): LLM output interpolated into JSON string — attacker-controlled details field could manipulate JSON structure
Cross-repo data corruption (quality-debt: .agents/scripts/stuck-detection-helper.sh — PR #2393 review feedback (critical) #3240): jq filter matched on issue number only — issues with same number in different repos would be incorrectly removed

Closes #3152, closes #3240

Partially addresses #3179 (critical and high findings fixed; medium findings remain)

Summary by CodeRabbit

Bug Fixes
- Surface errors instead of suppressing them so failures propagate to callers.
- Produce consistent, structured JSON for evaluator results using a robust builder.
- Enforce path confinement and add validation for token operations, including numeric TTL and filename checks.
- Improve state and label management in detection workflows: deduplicate and match by repository+issue, add numeric input validation, and ensure state progression when confidence is low.

…njection, JSON injection, jq filter bug Security fixes: - marcusquinn#3152: worker-token-helper.sh — validate --token-file path within TOKEN_DIR to prevent path traversal; validate --ttl is numeric to prevent bash arithmetic injection; remove 2>/dev/null from strategy fallback calls - marcusquinn#3179: ai-judgment-helper.sh — use awk -v for safe variable passing (prevents command injection via threshold); use jq for JSON construction (prevents JSON injection via LLM output) - marcusquinn#3240: stuck-detection-helper.sh — fix jq filter to match on both issue number AND repo slug (prevents cross-repo data corruption); use awk -v for safe variable passing; remove 2>/dev/null from gh issue edit calls

coderabbitai · 2026-03-08T12:14:05Z

Warning

Rate limit exceeded

@marcusquinn has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 18 minutes and 27 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4d5d4b8e-322d-493d-b7de-c98100187b31

📥 Commits

Reviewing files that changed from the base of the PR and between 0a39ee6 and 9a61e30.

📒 Files selected for processing (2)

.agents/scripts/stuck-detection-helper.sh
.agents/scripts/worker-token-helper.sh

Walkthrough

Refactors in three shell helper scripts: JSON construction moved to jq with typed fields, silent error redirections removed to allow error propagation, and worker-token inputs and file paths validated to prevent injection and path traversal.

Changes

Cohort / File(s)	Summary
AI evaluation / JSON output `.agents/scripts/ai-judgment-helper.sh`	Replace inline awk float compare with parameterized awk; build evaluator result and fallback JSON via `jq -cn` to preserve numeric/boolean typing.
Stuck-detection / state & GH ops `.agents/scripts/stuck-detection-helper.sh`	Remove `2>/dev/null` suppressions so write and `gh` errors surface; validate numeric inputs and parameterize awk threshold checks; update jq state filters to deduplicate/match by both `issue` and `repo` (use two-field unique keys).
Worker token / validation & safety `.agents/scripts/worker-token-helper.sh`	Add TTL numeric validation; remove error-suppressing redirections so token creation errors propagate; enforce `realpath` confinement to `TOKEN_DIR` and require `.token` extension for revocation; perform validations earlier in control flow.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

needs-review

Poem

🔧 Three scripts tightened, errors left to speak,
jq sculpts the output, types no longer leak,
validation gates the tokens' hidden door,
state now checks repos — duplicates to explore,
a safer shell dance, tidy and sleek.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 57.14% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title precisely describes the main changes: security hardening across three helper scripts addressing path traversal, injection vulnerabilities, and jq filter bugs.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist · 2026-03-08T12:14:14Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the security posture of several critical agent scripts by addressing multiple vulnerabilities. It introduces robust input validation and safer data handling practices to prevent common attack vectors such as path traversal, command injection, and JSON injection, while also rectifying a logic error that could lead to cross-repository data corruption. The changes aim to make these scripts more resilient and secure against malicious inputs and incorrect state management.

Highlights

Path Traversal & Arithmetic Injection Prevention: Implemented robust validation for --token-file paths and --ttl values in worker-token-helper.sh to prevent path traversal and arithmetic injection vulnerabilities, ensuring file operations are confined and numerical inputs are safe.
Command & JSON Injection Mitigation: Refactored ai-judgment-helper.sh to use awk -v for safe variable passing and jq for secure JSON construction, effectively eliminating command and JSON injection risks from external inputs.
Cross-Repo Data Corruption Fix: Updated stuck-detection-helper.sh to correctly identify issues by both issue number and repository slug when managing labeled issues, preventing unintended data corruption across different repositories.
Improved Error Visibility: Removed 2>/dev/null from several command calls across worker-token-helper.sh and stuck-detection-helper.sh to ensure that potential error messages are no longer suppressed and are visible for debugging and monitoring.

Changelog

.agents/scripts/ai-judgment-helper.sh
- Updated awk command to use -v for safe variable passing of score and threshold, preventing command injection.
- Refactored JSON construction for result_json and fallback_json to use jq -n --arg, preventing JSON injection.
.agents/scripts/stuck-detection-helper.sh
- Removed 2>/dev/null from printf command when writing temporary state files.
- Updated awk command to use -v for safe variable passing of confidence and STUCK_CONFIDENCE_THRESHOLD.
- Removed 2>/dev/null from gh issue edit calls for adding and removing labels.
- Removed 2>/dev/null from gh issue comment calls.
- Added a redundant --arg repo to a jq command when updating labeled_issues state.
- Modified jq filter to include repo in unique_by for labeled_issues to prevent cross-repo data corruption.
- Modified jq filter to include repo in select for labeled_issues removal to prevent cross-repo data corruption.
.agents/scripts/worker-token-helper.sh
- Added duplicate validation for the --ttl argument to ensure it is a positive integer, preventing arithmetic injection.
- Removed 2>/dev/null from create_app_token and create_delegated_token function calls.
- Implemented path validation using realpath to ensure --token-file is within TOKEN_DIR in cmd_validate and cmd_revoke functions, preventing path traversal.

Activity

No specific activity (comments, reviews, progress) was provided in the context for this pull request.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces several important security hardening fixes across three scripts, addressing vulnerabilities like path traversal, command injection, and data corruption. The changes are well-implemented, using safer alternatives like awk -v and jq --arg for passing variables and jq for robust JSON construction. I've identified a few minor issues related to duplicated code and incorrect indentation, which are detailed in the comments. Overall, this is a great set of security improvements.

.agents/scripts/worker-token-helper.sh

.agents/scripts/ai-judgment-helper.sh

.agents/scripts/stuck-detection-helper.sh

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

.agents/scripts/worker-token-helper.sh (1)

370-379: Drop the duplicated TTL guard.

Line 370 through Line 379 repeat the same regex check and error path. Keeping a single validation block avoids drift the next time this rule changes.

Proposed cleanup

 		--ttl | -t)
 			ttl="$2"
 			# Validate TTL is numeric to prevent arithmetic injection
 			if ! [[ "$ttl" =~ ^[0-9]+$ ]]; then
 				log_token "ERROR" "TTL must be a positive integer: ${ttl}"
 				return 1
 			fi
-			# Validate TTL is numeric to prevent arithmetic injection
-			if ! [[ "$ttl" =~ ^[0-9]+$ ]]; then
-				log_token "ERROR" "TTL must be a positive integer: ${ttl}"
-				return 1
-			fi
 			if ((ttl > MAX_TTL)); then

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In @.agents/scripts/worker-token-helper.sh around lines 370 - 379, The script
contains a duplicated TTL numeric-guard block using the variable ttl and calling
log_token on failure; remove the redundant second validation so only one check
remains: keep the single if [[ "$ttl" =~ ^[0-9]+$ ]] { ... } block that logs via
log_token "ERROR" "TTL must be a positive integer: ${ttl}" and returns 1, and
delete the repeated identical lines to avoid duplication and future drift.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.agents/scripts/ai-judgment-helper.sh:
- Around line 908-911: The JSON output is being pretty-printed by jq -n which
breaks the JSONL one-record-per-line contract; update the jq invocations that
build result JSON (e.g., where result_json is assigned and the other jq -n usage
that emits evaluator records used by eval_dataset) to use compact mode by
changing jq -n to jq -cn so each record is emitted as a single-line JSON object,
preserving the existing use of --arg/--argjson for fields like eval_type, score,
passed and details.

In @.agents/scripts/stuck-detection-helper.sh:
- Around line 344-349: The jq deduplication is wrong and a duplicate --arg is
present: in the pipeline that builds new_state from state (using variables
issue_number, repo_slug, now) change unique_by(.issue + .repo) to
unique_by([.issue, .repo]) so uniqueness is based on the tuple rather than
string concatenation, and remove the duplicated --arg repo "$repo_slug" argument
so --arg repo is only passed once.

In @.agents/scripts/worker-token-helper.sh:
- Around line 529-538: The current realpath call on the token file can fail if
the file is already deleted; change the check to canonicalize the parent
directory instead: compute parent_dir=$(dirname "$token_file") and realpath that
(realpath "$parent_dir") and verify it starts with ${TOKEN_DIR}, and separately
validate the filename via basename to ensure it contains no directory
components; keep using log_token for errors and preserve the existing -f guards
in the cleanup logic so deletion remains safe and idempotent.

---

Nitpick comments:
In @.agents/scripts/worker-token-helper.sh:
- Around line 370-379: The script contains a duplicated TTL numeric-guard block
using the variable ttl and calling log_token on failure; remove the redundant
second validation so only one check remains: keep the single if [[ "$ttl" =~
^[0-9]+$ ]] { ... } block that logs via log_token "ERROR" "TTL must be a
positive integer: ${ttl}" and returns 1, and delete the repeated identical lines
to avoid duplication and future drift.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2311bd43-8d71-4b3b-b53e-d499c026a7a3

📥 Commits

Reviewing files that changed from the base of the PR and between 2b20a17 and 5e4113c.

📒 Files selected for processing (3)

.agents/scripts/ai-judgment-helper.sh
.agents/scripts/stuck-detection-helper.sh
.agents/scripts/worker-token-helper.sh

.agents/scripts/ai-judgment-helper.sh

.agents/scripts/stuck-detection-helper.sh

.agents/scripts/worker-token-helper.sh

marcusquinn · 2026-03-08T12:50:48Z

This PR is from an external contributor (@johnwaldo). Auto-merge is disabled for external PRs — a maintainer must review and merge manually.

…idempotent revoke - Remove duplicated TTL validation block in worker-token-helper.sh - Use jq -cn (compact) instead of jq -n to maintain JSONL contract - Fix indentation on fallback_json block in ai-judgment-helper.sh - Remove duplicate --arg repo in stuck-detection-helper.sh - Fix unique_by(.issue + .repo) → unique_by([.issue, .repo]) to prevent string concatenation collisions (e.g. issue 12 + repo 3/x vs issue 1 + repo 23/x) - Fix cmd_revoke realpath to resolve parent dir instead of token file, so cleanup works when only .meta remains (token already deleted)

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

.agents/scripts/stuck-detection-helper.sh (1)
269-277: ⚠️ Potential issue | 🟠 Major

Validate confidence as numeric before threshold comparison to prevent string comparison semantics.

Line 271 uses awk to compare confidence against threshold, but awk treats non-numeric strings as strings. When confidence is a non-numeric value (e.g., "high", "foo"), the comparison becomes a lexicographic comparison, causing strings that appear "greater than" the threshold to spuriously return 1 and trigger label/comment operations.

The repro confirms the issue:
0.8   -> 1  ✓ correct
0.6   -> 0  ✓ correct
high  -> 1  ✗ WRONG (string "high" >= "0.7")
foo   -> 1  ✗ WRONG (string "foo" >= "0.7")
Add numeric validation before the awk comparison:
Suggested fix
 	# Check confidence threshold
 	local above_threshold
+	if ! [[ "$confidence" =~ ^([0-9]+([.][0-9]+)?|[.][0-9]+)$ ]] || \
+		! [[ "$STUCK_CONFIDENCE_THRESHOLD" =~ ^([0-9]+([.][0-9]+)?|[.][0-9]+)$ ]]; then
+		_sd_log_error "confidence values must be numeric (got confidence=${confidence}, threshold=${STUCK_CONFIDENCE_THRESHOLD})"
+		return 1
+	fi
 	above_threshold=$(awk -v c="$confidence" -v t="$STUCK_CONFIDENCE_THRESHOLD" 'BEGIN { print (c >= t) ? 1 : 0 }') || above_threshold="0"
This aligns with the coding guideline requirement for automation scripts: Reliability and robustness.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.agents/scripts/stuck-detection-helper.sh around lines 269 - 277, The
confidence variable must be validated as a numeric value before running the awk
threshold comparison to avoid lexicographic comparisons; modify the block around
above_threshold (and the STUCK_CONFIDENCE_THRESHOLD check) to first test
confidence with a numeric regex (e.g., allow integers or decimals) and if it
fails, set a safe numeric default (0) or treat it as below threshold and log a
warning via _sd_log_info; then run the existing awk comparison to compute
above_threshold and proceed to call _sd_record_milestone/return as before.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.agents/scripts/worker-token-helper.sh:
- Around line 445-454: The token validation in cmd_validate resolves token_file
with realpath but compares it to the raw TOKEN_DIR string, which can fail when
TOKEN_DIR contains symlinks; update cmd_validate to canonicalize TOKEN_DIR
(e.g., compute a TOKEN_DIR_REAL via realpath or similar) and compare real_path
of token_file against that canonical TOKEN_DIR path (same approach used in
cmd_revoke), ensuring the check uses the canonicalized directory before
returning error.

---

Outside diff comments:
In @.agents/scripts/stuck-detection-helper.sh:
- Around line 269-277: The confidence variable must be validated as a numeric
value before running the awk threshold comparison to avoid lexicographic
comparisons; modify the block around above_threshold (and the
STUCK_CONFIDENCE_THRESHOLD check) to first test confidence with a numeric regex
(e.g., allow integers or decimals) and if it fails, set a safe numeric default
(0) or treat it as below threshold and log a warning via _sd_log_info; then run
the existing awk comparison to compute above_threshold and proceed to call
_sd_record_milestone/return as before.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: edb72120-8811-4e49-8b86-c8002f879460

📥 Commits

Reviewing files that changed from the base of the PR and between 5e4113c and 2cd10c7.

📒 Files selected for processing (3)

.agents/scripts/ai-judgment-helper.sh
.agents/scripts/stuck-detection-helper.sh
.agents/scripts/worker-token-helper.sh

.agents/scripts/worker-token-helper.sh

…ze TOKEN_DIR in cmd_validate - stuck-detection-helper.sh: Add numeric regex validation for confidence and threshold before awk comparison to prevent string comparison semantics (e.g., "high" >= "0.7" is true lexicographically in awk) - worker-token-helper.sh: Canonicalize TOKEN_DIR via realpath in cmd_validate to match the pattern already used in cmd_revoke, preventing false rejections when TOKEN_DIR contains symlinks Addresses remaining CodeRabbit review feedback on PR marcusquinn#3877.

marcusquinn · 2026-03-08T15:39:51Z

@coderabbitai review

coderabbitai · 2026-03-08T15:39:58Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

.agents/scripts/stuck-detection-helper.sh (1)
264-283: ⚠️ Potential issue | 🟠 Major

Validate all parameters before GitHub operations and state mutations.

The guard at line 264 checks issue_number, milestone_min, and confidence, but omits elapsed_min—yet elapsed_min is interpolated into the comment body at line 313. More critically, milestone_min is passed to _sd_record_milestone (line 276) which pipes it to jq --argjson ms "$milestone" at line 242; non-numeric values like "30m" cause jq to exit with code 2, leaving GitHub labels applied but state unrecorded. Similarly, confidence enters an awk comparison at line 270 without validation, risking lexicographic semantics if the variable contains non-numeric text.

Add numeric validation before any side effects:
	if [[ -z "$issue_number" || -z "$milestone_min" || -z "$elapsed_min" || -z "$confidence" ]]; then
		_sd_log_error "usage: label-stuck <issue_number> <milestone_min> <elapsed_min> <confidence> <reasoning> <suggested_actions> [--repo <slug>]"
		return 1
	fi

+	if ! [[ "$milestone_min" =~ ^[0-9]+$ ]] || ! [[ "$elapsed_min" =~ ^[0-9]+$ ]]; then
+		_sd_log_error "milestone_min and elapsed_min must be positive integers (got milestone=${milestone_min}, elapsed=${elapsed_min})"
+		return 1
+	fi
+
+	if ! [[ "$confidence" =~ ^([0-9]+([.][0-9]+)?|[.][0-9]+)$ ]]; then
+		_sd_log_error "confidence must be numeric (got confidence=${confidence})"
+		return 1
+	fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.agents/scripts/stuck-detection-helper.sh around lines 264 - 283, The guard
currently omits validating elapsed_min and does not ensure milestone_min and
confidence are numeric before performing GitHub ops or calling
_sd_record_milestone; update the parameter validation at the start of
label-stuck to: check that elapsed_min is provided, validate milestone_min and
confidence (and STUCK_CONFIDENCE_THRESHOLD) match a numeric regex before any
side effects, and return non-zero with an _sd_log_error if validation fails;
ensure the check happens before any calls that mutate state (e.g., before
invoking _sd_record_milestone and before any labeling/GitHub API calls) so
functions like _sd_record_milestone and the awk comparison receive only numeric
values.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.agents/scripts/stuck-detection-helper.sh:
- Around line 423-428: The jq expression that assigns new_state can fail when
.labeled_issues is missing; update the filter used in the new_state assignment
(the jq call using --arg key "$issue_key" --arg issue "$issue_number" --arg repo
"$repo_slug") to coalesce .labeled_issues to an empty array before iterating it
(for example using (.labeled_issues // []) or (.labeled_issues // [] ) ), e.g.
keep del(.milestones_checked[$key]) and set .labeled_issues = [(.labeled_issues
// [])[] | select(not (.issue == $issue and .repo == $repo))] so the jq call no
longer errors and the || true masking is unnecessary for this case.

---

Outside diff comments:
In @.agents/scripts/stuck-detection-helper.sh:
- Around line 264-283: The guard currently omits validating elapsed_min and does
not ensure milestone_min and confidence are numeric before performing GitHub ops
or calling _sd_record_milestone; update the parameter validation at the start of
label-stuck to: check that elapsed_min is provided, validate milestone_min and
confidence (and STUCK_CONFIDENCE_THRESHOLD) match a numeric regex before any
side effects, and return non-zero with an _sd_log_error if validation fails;
ensure the check happens before any calls that mutate state (e.g., before
invoking _sd_record_milestone and before any labeling/GitHub API calls) so
functions like _sd_record_milestone and the awk comparison receive only numeric
values.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b59465a8-23b7-4464-bc8b-0fc5a609c26f

📥 Commits

Reviewing files that changed from the base of the PR and between 2cd10c7 and 0a39ee6.

📒 Files selected for processing (2)

.agents/scripts/stuck-detection-helper.sh
.agents/scripts/worker-token-helper.sh

🚧 Files skipped from review as they are similar to previous changes (1)

.agents/scripts/worker-token-helper.sh

.agents/scripts/stuck-detection-helper.sh

marcusquinn · 2026-03-08T15:51:49Z

Dispatching fix worker to address CodeRabbit CHANGES_REQUESTED review and failing CI (Monitor & Auto-Fix Code Quality, SonarCloud — systemic issues #3848/#3837 cover those; Label PR from Conventional Commit is per-PR).

…gainst missing state Address remaining CodeRabbit CHANGES_REQUESTED review feedback on PR marcusquinn#3877: - Add elapsed_min to required parameter check in cmd_label_stuck - Validate milestone_min and elapsed_min as positive integers before any GitHub operations or state mutations (prevents jq --argjson failures and comment interpolation of non-numeric values) - Guard cmd_label_clear jq filter against missing .labeled_issues key using null coalescing (// []) to prevent 'Cannot iterate over null' errors that were silently masked by || true

- ai-judgment-helper: use jq -cn for compact one-record-per-line output - stuck-detection-helper: validate elapsed_min/milestone_min/confidence as numeric - stuck-detection-helper: fix unique_by string concat collision, use tuple key - stuck-detection-helper: handle missing .labeled_issues key in clear filter - worker-token-helper: resolve TOKEN_DIR with realpath for symlink consistency - worker-token-helper: handle missing token file gracefully in cmd_revoke

johnwaldo · 2026-03-08T17:52:55Z

@coderabbitai review

All review feedback addressed:

ai-judgment-helper: added -cn flag for compact JSON output
stuck-detection-helper: added numeric validation for milestone_min, elapsed_min, confidence
stuck-detection-helper: fixed unique_by to use tuple key instead of string concatenation
stuck-detection-helper: fixed labeled_issues filter to handle missing key with // []
worker-token-helper: resolved TOKEN_DIR with realpath for symlink consistency in both validate and revoke

Please re-review.

coderabbitai · 2026-03-08T17:53:12Z

@johnwaldo Thanks for the thorough follow-up and for addressing the review feedback! I'll kick off a fresh incremental review now to validate all the changes across the three scripts:

ai-judgment-helper.sh — -cn flag for compact JSON output
stuck-detection-helper.sh — numeric validation, tuple unique_by key, // [] fallback
worker-token-helper.sh — realpath canonicalization for both validate and revoke

🧠 Learnings used

Learnt from: marcusquinn
Repo: marcusquinn/aidevops PR: 0
File: :0-0
Timestamp: 2026-03-03T16:33:18.313Z
Learning: For the marcusquinn/aidevops repository: The daily quality sweep system (pulse-wrapper.sh) should only mention coderabbitai for full codebase review when there are significant changes (Quality Gate failure, +10 issues in one day, etc.), not on every automated sweep with stable metrics.

Learnt from: CR
Repo: marcusquinn/aidevops PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-08T12:35:20.071Z
Learning: Consult tools/build-agent/agent-review.md for guidance on reviewing and improving agents

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

.agents/scripts/stuck-detection-helper.sh (1)

149-158: ⚠️ Potential issue | 🟠 Major

Use a symlink-safe temp file for state writes.

"${state_file}.tmp.$$" is predictable, so a pre-created symlink in SUPERVISOR_DIR can redirect the write into another file. Also, mv ... 2>/dev/null still hides the real failure reason in the atomic-write path.

Suggested fix

-	local tmp_file="${state_file}.tmp.$$"
-	if ! printf '%s\n' "$state_json" >"$tmp_file"; then
+	local tmp_file
+	tmp_file=$(mktemp "$(dirname "$state_file")/$(basename "$state_file").tmp.XXXXXX") || {
+		_sd_log_warn "failed to allocate temp state file"
+		return 1
+	}
+	if ! printf '%s\n' "$state_json" >"$tmp_file"; then
 		_sd_log_warn "failed to write temp state file: $tmp_file"
 		rm -f "$tmp_file" 2>/dev/null || true
 		return 1
 	fi
-	if ! mv -f "$tmp_file" "$state_file" 2>/dev/null; then
+	if ! mv -f "$tmp_file" "$state_file"; then
 		_sd_log_warn "failed to move temp state to: $state_file"
 		rm -f "$tmp_file" 2>/dev/null || true
 		return 1
 	fi

As per coding guidelines, ".agents/scripts/*.sh: Automation scripts - focus on: Reliability and robustness" and "Clear logging and feedback".

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In @.agents/scripts/stuck-detection-helper.sh around lines 149 - 158, Replace
the predictable temp filename logic that sets tmp_file="${state_file}.tmp.$$"
with a symlink-safe mktemp call that creates the temp file in the same directory
as state_file (e.g., use mktemp --tmpdir="$(dirname "$state_file")" or mktemp
"$(dirname "$state_file")/tmp.XXXXXX"), and ensure the created tmp_file is
cleaned up on failures; also stop silencing mv errors—capture mv's stderr when
moving tmp_file to state_file and pass that error text to _sd_log_warn (include
the state_file and captured error), then remove tmp_file and return 1.
Reference: tmp_file, state_file, and _sd_log_warn.

🧹 Nitpick comments (2)

.agents/scripts/worker-token-helper.sh (1)
370-378: Solid arithmetic injection guard.

The regex validation ensures TTL contains only digits before any arithmetic evaluation, effectively blocking injection payloads like 1; rm -rf /.

One edge-case nitpick: bash arithmetic interprets leading zeros as octal (e.g., 010 → 8 decimal). If you want explicit decimal interpretation regardless of input format, you could use 10#$ttl in the comparison. This is quite minor since users rarely pass octal-formatted TTLs.

,
Optional: force decimal interpretation
-		if ((ttl > MAX_TTL)); then
+		if ((10#$ttl > MAX_TTL)); then
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.agents/scripts/worker-token-helper.sh around lines 370 - 378, The
comparison against MAX_TTL can misinterpret inputs with leading zeros as octal;
update the arithmetic check that uses ttl (and subsequent assignment) to force
decimal interpretation by applying the bash base prefix (e.g., use 10#$ttl) when
evaluating ttl in the (( ttl > MAX_TTL )) test and any arithmetic context
involving ttl (refer to the ttl variable and the MAX_TTL comparison in the
worker-token-helper.sh snippet and keep the existing regex validation and
log_token calls intact).
.agents/scripts/stuck-detection-helper.sh (1)
291-295: Clarify exit code semantics: "skipped" paths should use distinct status.

The label-stuck command returns 0 in three distinct scenarios: confidence below threshold (line 295), GitHub skipped via SD_SKIP_GITHUB=true (line 317), and successful labeling (line 372). This ambiguity means future callers cannot distinguish between "checked and skipped" versus "checked and labeled"—currently no active callers exist, but integrating this helper without a separate exit code for skipped scenarios will create the misreporting risk described. Consider using exit code 2 (or similar) for the "checked but skipped" paths to keep the contract unambiguous.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.agents/scripts/stuck-detection-helper.sh around lines 291 - 295, The
"label-stuck" script currently returns 0 for both "checked but skipped" and
"checked and labeled" outcomes; change the skipped-return semantics so callers
can distinguish them by returning a non-zero but distinct code (e.g., 2) for
skipped paths. Specifically, update the branch that logs "confidence ... below
threshold" (the block using _sd_log_info, _sd_record_milestone and currently
`return 0`) to `return 2`, and likewise update the SD_SKIP_GITHUB path (the
branch checking SD_SKIP_GITHUB=true) to return the same distinct code; keep the
successful labeling/normal completion return as 0 so success remains unchanged.
Ensure the referenced helpers (_sd_log_info, _sd_record_milestone) remain called
as before and only the return values change.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.agents/scripts/stuck-detection-helper.sh:
- Around line 351-354: The gh issue comment failure handler currently calls
_sd_log_warn which exits 0, so failures are swallowed; update the handler after
the gh issue comment invocation to propagate a non-zero status (e.g., change the
failure branch to call _sd_log_warn and then exit 1 or return 1 depending on
whether this code runs in the main script or inside a function) so that the gh
issue comment failure is visible to the caller; target the gh issue comment
invocation and its failure block and ensure it returns/ exits non-zero instead
of relying on _sd_log_warn's current behavior.

---

Outside diff comments:
In @.agents/scripts/stuck-detection-helper.sh:
- Around line 149-158: Replace the predictable temp filename logic that sets
tmp_file="${state_file}.tmp.$$" with a symlink-safe mktemp call that creates the
temp file in the same directory as state_file (e.g., use mktemp
--tmpdir="$(dirname "$state_file")" or mktemp "$(dirname
"$state_file")/tmp.XXXXXX"), and ensure the created tmp_file is cleaned up on
failures; also stop silencing mv errors—capture mv's stderr when moving tmp_file
to state_file and pass that error text to _sd_log_warn (include the state_file
and captured error), then remove tmp_file and return 1. Reference: tmp_file,
state_file, and _sd_log_warn.

---

Nitpick comments:
In @.agents/scripts/stuck-detection-helper.sh:
- Around line 291-295: The "label-stuck" script currently returns 0 for both
"checked but skipped" and "checked and labeled" outcomes; change the
skipped-return semantics so callers can distinguish them by returning a non-zero
but distinct code (e.g., 2) for skipped paths. Specifically, update the branch
that logs "confidence ... below threshold" (the block using _sd_log_info,
_sd_record_milestone and currently `return 0`) to `return 2`, and likewise
update the SD_SKIP_GITHUB path (the branch checking SD_SKIP_GITHUB=true) to
return the same distinct code; keep the successful labeling/normal completion
return as 0 so success remains unchanged. Ensure the referenced helpers
(_sd_log_info, _sd_record_milestone) remain called as before and only the return
values change.

In @.agents/scripts/worker-token-helper.sh:
- Around line 370-378: The comparison against MAX_TTL can misinterpret inputs
with leading zeros as octal; update the arithmetic check that uses ttl (and
subsequent assignment) to force decimal interpretation by applying the bash base
prefix (e.g., use 10#$ttl) when evaluating ttl in the (( ttl > MAX_TTL )) test
and any arithmetic context involving ttl (refer to the ttl variable and the
MAX_TTL comparison in the worker-token-helper.sh snippet and keep the existing
regex validation and log_token calls intact).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a36c1c74-c0c7-44a6-997a-b1be80ed69b8

📥 Commits

Reviewing files that changed from the base of the PR and between 0a39ee6 and e03d196.

📒 Files selected for processing (2)

.agents/scripts/stuck-detection-helper.sh
.agents/scripts/worker-token-helper.sh

.agents/scripts/stuck-detection-helper.sh

Capture gh issue comment failure status and return it after state recording completes. State is always recorded regardless of comment success (since the label was already applied), preventing re-labeling on the next check cycle.

johnwaldo · 2026-03-08T18:37:47Z

@coderabbitai review

Addressed the remaining review comment:

stuck-detection-helper.sh: Advisory-comment failure now propagated via comment_failed flag. State recording still happens regardless (label was already applied), preventing re-labeling on next check. Returns non-zero only if the comment failed.

All other review comments (duplicate TTL validation, indentation, duplicate --arg, jq -cn, unique_by tuple, realpath canonicalization, labeled_issues null guard) were already addressed in prior commits.

coderabbitai · 2026-03-08T18:38:00Z

@johnwaldo Thanks for the update! The comment_failed flag approach for stuck-detection-helper.sh is a clean solution — propagating the advisory-comment failure while preserving state ensures the label-was-applied record is maintained and avoids re-labeling on the next cycle. Kicking off a fresh incremental review now to validate the final fix across all three scripts.

🧠 Learnings used

Learnt from: marcusquinn
Repo: marcusquinn/aidevops PR: 0
File: :0-0
Timestamp: 2026-03-03T16:33:18.313Z
Learning: For the marcusquinn/aidevops repository: The daily quality sweep system (pulse-wrapper.sh) should only mention coderabbitai for full codebase review when there are significant changes (Quality Gate failure, +10 issues in one day, etc.), not on every automated sweep with stable metrics.

Learnt from: marcusquinn
Repo: marcusquinn/aidevops PR: 3893
File: .agents/scripts/pulse-wrapper.sh:452-455
Timestamp: 2026-03-08T16:59:32.817Z
Learning: In marcusquinn/aidevops, the `supervisor` label has been in production on all managed repos for months as a role label for health dashboard issues. The `contributor` label was added in PR `#3893` following the same convention for non-maintainer health dashboards. Renaming these labels would break existing health issues and require migration logic — label namespacing is a known deferred follow-up item, not an immediate fix requirement. Both labels are only created on repos with `pulse: true` in repos.json, which are fully framework-managed repos.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

marcusquinn · 2026-03-08T20:44:13Z

Dispatching worker to address review feedback.

Model: sonnet (anthropic/claude-sonnet-4-6)
Branch: bugfix/critical-batch3
Scope: Address CHANGES_REQUESTED review feedback on PR fix: security hardening — path traversal, injection, jq filter bug (batch 3) #3877
Attempt: 1 of 1
Direction: Push fixes to existing branch bugfix/critical-batch3 — no new PR needed

johnwaldo · 2026-03-08T21:03:44Z

@coderabbitai review

coderabbitai · 2026-03-08T21:03:50Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

marcusquinn · 2026-03-08T21:17:33Z

@coderabbitai review

coderabbitai · 2026-03-08T21:17:40Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

marcusquinn · 2026-03-08T21:19:16Z

Worker killed after 3h34m with 2 commits (struggle_ratio: 19996).

Model: sonnet (anthropic/claude-sonnet-4-6)
Branch: bugfix/critical-batch3
Reason: Extreme thrashing — 39992 messages, only 2 commits. Context exhaustion loop.
Diagnosis: 7 unresolved bot suggestions detected. Worker likely looped on review feedback without resolving suggestions.
Next action: Re-dispatch to address 7 unresolved bot suggestions, then merge (PR is APPROVED + CI PASS).

marcusquinn · 2026-03-08T21:38:29Z

Dispatching fix worker to address 7 unresolved bot review suggestions.

Model: sonnet (anthropic/claude-sonnet-4-6)
Branch: bugfix/critical-batch3
Scope: Address unresolved inline suggestions from coderabbit/gemini on this PR
Attempt: 1 of 3

marcusquinn · 2026-03-09T04:09:12Z

Review Bot Suggestions — Resolution Summary

All CodeRabbit and Gemini Code Assist findings have been addressed. Here's the status of each:

Already Fixed (prior commits)

Finding	File	Status
Duplicate TTL validation block	`worker-token-helper.sh`	Fixed — single block remains
Indentation inconsistency	`ai-judgment-helper.sh`	Fixed — consistent single-tab
Duplicate `--arg repo`	`stuck-detection-helper.sh`	Fixed — single arg
`unique_by(.issue + .repo)` string concat	`stuck-detection-helper.sh`	Fixed — uses tuple `[.issue, .repo]`
`jq -n` → `jq -cn` for JSONL	`ai-judgment-helper.sh`	Fixed — compact output on both calls
`realpath` on deleted token file	`worker-token-helper.sh:cmd_revoke`	Fixed — resolves parent dir + validates basename
`TOKEN_DIR` symlink canonicalization	`worker-token-helper.sh:cmd_validate`	Fixed — `realpath "$TOKEN_DIR"` before comparison
Numeric validation for confidence/milestone/elapsed	`stuck-detection-helper.sh`	Fixed — regex guards before any side effects
Predictable temp filename (symlink attack)	`stuck-detection-helper.sh`	Fixed — uses `mktemp` with `XXXXXX` suffix
`.labeled_issues` null coalesce in `label-clear`	`stuck-detection-helper.sh`	Fixed — `(.labeled_issues // [])`
Octal interpretation of leading-zero TTL	`worker-token-helper.sh`	Fixed — `10#$ttl` forces decimal
Exit code 2 for "checked but skipped" paths	`stuck-detection-helper.sh`	Fixed — `return 2` for below-threshold and skip-github

Intentionally Not Applied

Finding	Reason
`return 1` in `gh issue comment` failure handler	Current design defers return via `comment_failed` flag (line 375) to ensure state recording completes. Immediate `return 1` would skip `_sd_record_milestone` and state write, causing re-labeling on next check.

All inline comments have been replied to individually.

github-actions bot added the bug Auto-created from TODO.md tag label Mar 8, 2026

gemini-code-assist bot reviewed Mar 8, 2026

View reviewed changes

.agents/scripts/worker-token-helper.sh Outdated Show resolved Hide resolved

.agents/scripts/ai-judgment-helper.sh Outdated Show resolved Hide resolved

.agents/scripts/stuck-detection-helper.sh Outdated Show resolved Hide resolved

marcusquinn mentioned this pull request Mar 8, 2026

[Supervisor:marcusquinn] 0 PRs, 0 assigned, 0 workers at 10:11 UTC #2645

Open

coderabbitai bot requested changes Mar 8, 2026

View reviewed changes

.agents/scripts/ai-judgment-helper.sh Outdated Show resolved Hide resolved

.agents/scripts/stuck-detection-helper.sh Outdated Show resolved Hide resolved

.agents/scripts/worker-token-helper.sh Show resolved Hide resolved

marcusquinn added the external-contributor PR from external contributor, requires maintainer review label Mar 8, 2026

coderabbitai bot requested changes Mar 8, 2026

View reviewed changes

.agents/scripts/worker-token-helper.sh Show resolved Hide resolved

coderabbitai bot requested changes Mar 8, 2026

View reviewed changes

.agents/scripts/stuck-detection-helper.sh Show resolved Hide resolved

marcusquinn and others added 2 commits March 8, 2026 15:54

coderabbitai bot requested changes Mar 8, 2026

View reviewed changes

.agents/scripts/stuck-detection-helper.sh Show resolved Hide resolved

fix: propagate advisory-comment failure from label-stuck

9a61e30

Capture gh issue comment failure status and return it after state recording completes. State is always recorded regardless of comment success (since the label was already applied), preventing re-labeling on the next check cycle.

coderabbitai bot approved these changes Mar 8, 2026

View reviewed changes

marcusquinn added the needs-review-fixes label Mar 8, 2026

marcusquinn merged commit 47bb6bf into marcusquinn:main Mar 9, 2026
54 checks passed

coderabbitai bot mentioned this pull request Mar 12, 2026

GH#3179: fix critical review feedback on ai-judgment-helper.sh (PR #2914) #4236

Merged

marcusquinn added the already-fixed Already fixed by another change label Mar 16, 2026

Conversation

johnwaldo commented Mar 8, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Issues Fixed

Vulnerability Details

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Mar 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Estimated code review effort

Suggested labels

Poem

❌ Failed checks (1 warning)

Uh oh!

gemini-code-assist bot commented Mar 8, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

marcusquinn commented Mar 8, 2026

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

marcusquinn commented Mar 8, 2026

Uh oh!

coderabbitai bot commented Mar 8, 2026

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

marcusquinn commented Mar 8, 2026

Uh oh!

johnwaldo commented Mar 8, 2026

Uh oh!

coderabbitai bot commented Mar 8, 2026

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

johnwaldo commented Mar 8, 2026

Uh oh!

coderabbitai bot commented Mar 8, 2026

Uh oh!

marcusquinn commented Mar 8, 2026

Uh oh!

johnwaldo commented Mar 8, 2026

Uh oh!

coderabbitai bot commented Mar 8, 2026

Uh oh!

marcusquinn commented Mar 8, 2026

Uh oh!

coderabbitai bot commented Mar 8, 2026

Uh oh!

marcusquinn commented Mar 8, 2026

Uh oh!

marcusquinn commented Mar 8, 2026

Uh oh!

marcusquinn commented Mar 9, 2026

Review Bot Suggestions — Resolution Summary

johnwaldo commented Mar 8, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Mar 8, 2026 •

edited

Loading