t1113: Add worker_never_started diagnostic and auto-retry with environment check by marcusquinn · Pull Request #1980 · marcusquinn/aidevops

marcusquinn · 2026-02-19T22:47:40Z

Summary

Prevents wasting retries on environment issues where the AI CLI is broken. On Feb 13, 5 tasks failed with worker_never_started:no_sentinel — the CLI was invoked but never produced output. Each failure burned a retry, exhausting max_retries on an infrastructure problem that no task retry would fix.

Changes

1. Pre-dispatch CLI health check (`dispatch.sh`)

New check_cli_health() function verifies the AI CLI binary exists and can execute before spawning workers
Checks: command -v for PATH presence, then lightweight version/help command with 10s timeout
Caches result per-pulse and for 5 minutes (file-based) to avoid repeated probes
On failure: defers dispatch to next pulse (return 3) instead of proceeding — no worktree created, no retry burned
Added cli_health=ok to dispatch metadata log for auditability

2. ENVIRONMENT failure mode (`evaluate.sh`)

New failure category ENVIRONMENT for dispatch infrastructure failures
Reclassified worker_never_started*, log_file_missing*, log_file_empty, no_log_path_in_db*, dispatch_script_not_executable from LOGIC to ENVIRONMENT
These failures indicate the CLI/environment is broken, not the task itself

3. Re-queue without burning retries (`pulse.sh`)

When pulse detects an ENVIRONMENT failure, re-queues the task (evaluating -> queued) without incrementing retry count
Invalidates CLI health cache so next dispatch re-verifies the environment
Stores failure pattern for diagnostics without marking as permanent task failure
Proof-log entry records failure_mode=ENVIRONMENT,retry_preserved=true

Verification

All 3 modified .sh files pass ShellCheck with zero violations
evaluating:queued is an existing valid state transition
failed:queued is also valid (used by existing recovery paths)

Ref #1664

Summary by CodeRabbit

Release Notes

New Features
- Added CLI health verification before task dispatch to ensure required tools are available before processing begins.
Improvements
- Introduced environment failure classification to prevent wasted retries on infrastructure issues.
- Enhanced failure handling to intelligently defer environment-related failures without consuming retry attempts.

…ed failures (t1113) Before dispatch, verify the AI CLI binary exists and can execute. If the check fails, defer dispatch to next pulse instead of spawning a worker that will never produce output. This prevents burning retries on environment issues (missing binary, broken installation, PATH problems). The check_cli_health() function: - Verifies binary exists in PATH (command -v) - Runs lightweight version check with 10s timeout - Caches result per-pulse and for 5 minutes (file-based) - Logs diagnostic detail on failure for troubleshooting Also adds cli_health=ok to dispatch metadata log for auditability.

…without burning retries (t1113) Reclassify worker_never_started, log_file_missing, log_file_empty, and related dispatch infrastructure failures from LOGIC to new ENVIRONMENT category. These failures indicate the CLI/environment is broken, not the task itself. When the pulse detects an ENVIRONMENT failure: - Re-queues the task (evaluating -> queued) without incrementing retry count - Invalidates CLI health cache so next dispatch re-verifies the environment - Stores failure pattern for diagnostics without marking as task failure - Logs proof-log entry with failure_mode=ENVIRONMENT for auditability This prevents the Feb 13 scenario where 5 tasks exhausted their retries on worker_never_started:no_sentinel when the underlying issue was a broken CLI environment that no amount of task retries would fix.

gemini-code-assist · 2026-02-19T22:47:58Z

Summary of Changes

Hello @marcusquinn, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the robustness of the task dispatch system by intelligently handling environment-related failures. It introduces proactive checks for AI CLI health and a new failure classification, ensuring that tasks encountering infrastructure problems are re-queued without consuming valuable retries, thereby improving system efficiency and preventing tasks from being prematurely marked as failed due to external factors.

Highlights

Pre-dispatch CLI Health Check: A new check_cli_health() function was added to dispatch.sh to verify the AI CLI binary's existence and executability before spawning workers. This check includes PATH verification and a lightweight version/help command with a timeout, caching results to prevent redundant checks. If the CLI is unhealthy, dispatch is deferred, preventing worktree creation and retry burning.
New ENVIRONMENT Failure Mode: A new ENVIRONMENT failure category was introduced in evaluate.sh for dispatch infrastructure failures. Several existing failure patterns like worker_never_started and log_file_missing were reclassified from LOGIC to ENVIRONMENT, indicating issues with the environment rather than the task logic itself.
Smart Re-queue for Environment Failures: The pulse.sh script was updated to detect ENVIRONMENT failures. When such a failure occurs, the task is re-queued (evaluating -> queued) without incrementing its retry count, preserving retries for actual task-related issues. The CLI health cache is also invalidated to force a re-verification of the environment on the next dispatch attempt.

Changelog

.agents/scripts/supervisor/dispatch.sh
- Added a new check_cli_health function to verify the AI CLI binary's presence and executability, including caching mechanisms.
- Integrated the check_cli_health function into the cmd_dispatch logic, deferring dispatch if the CLI is unhealthy.
- Included cli_health=ok in the dispatch metadata log for successful CLI health checks.
.agents/scripts/supervisor/evaluate.sh
- Introduced a new ENVIRONMENT failure category to classify infrastructure-related dispatch issues.
- Reclassified specific failure patterns (e.g., worker_never_started, log_file_missing) from the LOGIC category to the new ENVIRONMENT category.
.agents/scripts/supervisor/pulse.sh
- Modified the failed task handling to classify failures using the new classify_failure_mode function.
- Implemented conditional logic to re-queue tasks with ENVIRONMENT failures back to queued status without incrementing their retry count.
- Added logic to invalidate the CLI health cache when an ENVIRONMENT failure occurs, ensuring a fresh check on the next dispatch.

Activity

The pull request was opened by marcusquinn.
No human activity (comments, reviews) has occurred yet.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

coderabbitai · 2026-02-19T22:48:01Z

Walkthrough

Added CLI health verification function with caching to validate AI CLI binary availability before dispatch. Introduced ENVIRONMENT failure category to classify infrastructure/dispatch issues separately from logic errors. Integrated environment-aware failure handling in pulse Phase 1 that requeues without consuming retries, invalidates health cache, and performs pre-dispatch verification in dispatch flow.

Changes

Cohort / File(s)	Summary
CLI Health Check Infrastructure `.agents/scripts/supervisor/dispatch.sh`	Introduced `check_cli_health()` function verifying CLI binary existence, executability, and version output usability. Includes pulse-level fast path and 5-minute file-based cache. Integrated into cmd_dispatch before worktree creation with deferral (return code 3) on failure. Added cli_health=ok metadata to dispatch logs.
Environment Failure Classification `.agents/scripts/supervisor/evaluate.sh`	Added new ENVIRONMENT category for infrastructure/dispatch issues alongside existing TRANSIENT, RESOURCE, LOGIC, BLOCKED, AMBIGUOUS categories. Reclassified patterns (worker_never_started, log_file_missing, log_file_empty, no_log_path_in_db*, dispatch_script_not_executable) from LOGIC/TRANSIENT to ENVIRONMENT with deferred retry behavior.
Environment-Aware Failure Handling `.agents/scripts/supervisor/pulse.sh`	Modified Phase 1 failure handling to classify failures via `classify_failure_mode`. ENVIRONMENT failures requeue without incrementing retry counter, write environment_failure proof-log, invalidate CLI health cache, and reset health-verify flag. Non-environment failures preserve original logic.

Sequence Diagram

sequenceDiagram
    participant Pulse as Pulse Supervisor
    participant Dispatch as Dispatch Flow
    participant Evaluate as Evaluate (Classify)
    participant Health as CLI Health Check
    participant Worker as Worker Process
    participant Cache as Health Cache

    Pulse->>Dispatch: Phase 1 - Check for work
    Dispatch->>Health: check_cli_health()
    Health->>Cache: Check 5-min cache
    alt Cache Valid
        Health->>Health: Return cached result
    else Cache Expired
        Health->>Health: Verify CLI binary exists
        Health->>Health: Verify executable & version
        Health->>Cache: Store result
    end
    Health-->>Dispatch: Health status (0=ok, 1=fail)
    
    alt CLI Healthy
        Dispatch->>Worker: Spawn worker process
        Dispatch-->>Pulse: Return 0
    else CLI Unhealthy
        Dispatch-->>Pulse: Return 3 (defer)
    end

    alt Evaluation Phase
        Pulse->>Evaluate: classify_failure_mode()
        Evaluate-->>Pulse: ENVIRONMENT or other
        alt Is ENVIRONMENT
            Pulse->>Pulse: Requeue (no retry++)
            Pulse->>Cache: Invalidate health cache
            Pulse->>Worker: Cleanup process tree
        else Non-ENVIRONMENT
            Pulse->>Pulse: Original failure logic
        end
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

t1113: Add worker_never_started diagnostic and auto-retry with environment check #1664 — Directly addresses worker_never_started diagnostic with auto-retry and environment checks via new ENVIRONMENT category and requeue logic.

Possibly related PRs

fix: supervisor self-healing -- macOS timeout, PR detection, model names, stale PID cleanup #429 — Both modify supervisor health-check infrastructure (this adds check_cli_health; #429 updates check_model_health and health-probing).
t1031: Modularize supervisor-helper.sh into domain modules #1359 — Both enhance the same supervisor modules (dispatch.sh, evaluate.sh, pulse.sh) with complementary health-check and environment-failure handling logic.
feat: add worker dispatch with worktree isolation to supervisor (t128.2) #377 — This PR extends the dispatch/workflow foundation introduced in #377 (cmd_dispatch, resolve_ai_cli, dispatch flow) with pre-dispatch health verification.

Poem

🏥 A CLI breathes before it runs,
Environment gremlins meet their match—
No wasteful retry, cached and spun,
Just defer with grace and a gentle patch.
Infrastructure speaks, we now listen true. ✨

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main changes: adding a worker_never_started diagnostic mechanism and implementing environment-aware retry logic with CLI health checks across dispatch, evaluate, and pulse scripts.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch feature/t1113

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-02-19T22:48:11Z

🔍 Code Quality Report

�[0;35m[MONITOR]�[0m Code Review Monitoring Report

�[0;34m[INFO]�[0m Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 30 code smells

�[0;34m[INFO]�[0m Recent monitoring activity:
Thu Feb 19 22:48:06 UTC 2026: Code review monitoring started
Thu Feb 19 22:48:07 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 30

📈 Current Quality Metrics

BUGS: 0
CODE SMELLS: 30
VULNERABILITIES: 0

Generated on: Thu Feb 19 22:48:09 UTC 2026

Generated by AI DevOps Framework Code Review Monitoring

sonarqubecloud · 2026-02-19T22:49:00Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

.agents/scripts/supervisor/dispatch.sh (2)
1530-1534: Document the external unset contract for _PULSE_CLI_VERIFIED.

_PULSE_CLI_VERIFIED is never cleared within this function or file — its lifecycle depends on pulse.sh unsetting it on ENVIRONMENT failures. The function comment only describes the fast-path behaviour, not the obligation on the caller. A future refactor or new entrypoint that bypasses pulse.sh would silently skip all CLI health checks for the entire process lifetime.

Consider adding a brief note (matching the existing _PULSE_HEALTH_VERIFIED pattern) so the coupling is explicit:
✏️ Suggested doc addition
 	# Pulse-level fast path: if CLI was already verified in this pulse, skip
+	# NOTE: _PULSE_CLI_VERIFIED must be unset by pulse.sh after ENVIRONMENT failures
+	# so the next pulse re-probes the CLI. See pulse.sh ENVIRONMENT failure handler.
 	if [[ -n "${_PULSE_CLI_VERIFIED:-}" ]]; then
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.agents/scripts/supervisor/dispatch.sh around lines 1530 - 1534, The
fast-path uses the external flag _PULSE_CLI_VERIFIED but its lifecycle is
controlled elsewhere (pulse.sh) and is never unset in this file; add a short
comment above the check for _PULSE_CLI_VERIFIED in dispatch.sh (matching the
existing _PULSE_HEALTH_VERIFIED pattern) documenting that callers (pulse.sh)
must unset this variable on ENVIRONMENT failures and that other entrypoints must
honor/replicate that contract to avoid silently skipping CLI health checks for
the process lifetime; reference the variable name _PULSE_CLI_VERIFIED and
pulse.sh in the comment so future maintainers know where the unset happens and
the obligation to clear it.
978-984: check_cli_health is not called in do_prompt_repeat or cmd_reprompt, leaving those dispatch paths unguarded.

The PR's stated goal is to avoid wasting retries when the CLI environment is broken. Both do_prompt_repeat (line 980) and cmd_reprompt (line 3001) call check_model_health only, without verifying CLI binary health. Since both can spawn real workers and consume retry counts, an environment where the opencode binary is missing or broken would be caught on the first dispatch via cmd_dispatch, but subsequent prompt-repeat and reprompt cycles would still proceed and fail.

Consider adding the same guard to both functions:
♻️ Proposed addition to do_prompt_repeat (~Line 978)
 	# Pre-dispatch health check
 	local health_model health_exit=0
+	local cli_health_rc=0
+	check_cli_health "$ai_cli" >/dev/null || cli_health_rc=$?
+	if [[ "$cli_health_rc" -ne 0 ]]; then
+		log_error "do_prompt_repeat: CLI health check failed ($ai_cli) — deferring"
+		return 1
+	fi
 	health_model=$(resolve_model "health" "$ai_cli")
Apply the same pattern to cmd_reprompt near line 3001. The redirect to /dev/null discards diagnostic stdout while preserving stderr emission.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.agents/scripts/supervisor/dispatch.sh around lines 978 - 984,
do_prompt_repeat and cmd_reprompt currently only call check_model_health,
leaving CLI binary failures unguarded; add the same check_cli_health guard used
in cmd_dispatch: after resolving the ai_cli and health_model (using
resolve_model "$ai_cli"), call check_cli_health "$ai_cli" "$health_model" and
capture its exit code (e.g., cli_exit=0; check_cli_health ... || cli_exit=$?);
if cli_exit is non‑zero log a warning like the existing log_warn message and
return 1 to abort the prompt-repeat/reprompt path so retries aren't wasted.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.agents/scripts/supervisor/dispatch.sh:
- Around line 1562-1595: The health-check incorrectly initializes version_exit=1
and uses the "cmd || version_exit=$?" idiom, causing successful commands to
leave version_exit at 1 and producing false positives; change each invocation
that currently does version_output=$(... ) || version_exit=$? to first capture
output then immediately set version_exit=$? (e.g., version_output=$(... 2>&1);
version_exit=$?) for all branches that run the CLI (the blocks referencing
timeout_cmd, ai_cli, and the "opencode" vs "claude CLI" branches), leaving the
rest of the logic (the check using version_exit, version_output, cli_cache_file,
_PULSE_CLI_VERIFIED, and log_info) unchanged.

---

Nitpick comments:
In @.agents/scripts/supervisor/dispatch.sh:
- Around line 1530-1534: The fast-path uses the external flag
_PULSE_CLI_VERIFIED but its lifecycle is controlled elsewhere (pulse.sh) and is
never unset in this file; add a short comment above the check for
_PULSE_CLI_VERIFIED in dispatch.sh (matching the existing _PULSE_HEALTH_VERIFIED
pattern) documenting that callers (pulse.sh) must unset this variable on
ENVIRONMENT failures and that other entrypoints must honor/replicate that
contract to avoid silently skipping CLI health checks for the process lifetime;
reference the variable name _PULSE_CLI_VERIFIED and pulse.sh in the comment so
future maintainers know where the unset happens and the obligation to clear it.
- Around line 978-984: do_prompt_repeat and cmd_reprompt currently only call
check_model_health, leaving CLI binary failures unguarded; add the same
check_cli_health guard used in cmd_dispatch: after resolving the ai_cli and
health_model (using resolve_model "$ai_cli"), call check_cli_health "$ai_cli"
"$health_model" and capture its exit code (e.g., cli_exit=0; check_cli_health
... || cli_exit=$?); if cli_exit is non‑zero log a warning like the existing
log_warn message and return 1 to abort the prompt-repeat/reprompt path so
retries aren't wasted.

coderabbitai · 2026-02-19T22:54:07Z

.agents/scripts/supervisor/dispatch.sh

+	local version_output=""
+	local version_exit=1
+
+	# Use timeout to prevent hanging on broken installations
+	local timeout_cmd=""
+	if command -v gtimeout &>/dev/null; then
+		timeout_cmd="gtimeout"
+	elif command -v timeout &>/dev/null; then
+		timeout_cmd="timeout"
+	fi
+
+	if [[ "$ai_cli" == "opencode" ]]; then
+		if [[ -n "$timeout_cmd" ]]; then
+			version_output=$("$timeout_cmd" 10 "$ai_cli" version 2>&1) || version_exit=$?
+		else
+			version_output=$("$ai_cli" version 2>&1) || version_exit=$?
+		fi
+	else
+		# claude CLI
+		if [[ -n "$timeout_cmd" ]]; then
+			version_output=$("$timeout_cmd" 10 "$ai_cli" --version 2>&1) || version_exit=$?
+		else
+			version_output=$("$ai_cli" --version 2>&1) || version_exit=$?
+		fi
+	fi
+
+	# If version command succeeded (exit 0) or produced output, CLI is working
+	if [[ "$version_exit" -eq 0 ]] || [[ -n "$version_output" && "$version_exit" -ne 124 && "$version_exit" -ne 137 ]]; then
+		# Cache the healthy result
+		date +%s >"$cli_cache_file" 2>/dev/null || true
+		_PULSE_CLI_VERIFIED="true"
+		log_info "CLI health: OK ($ai_cli: ${version_output:0:80})"
+		return 0
+	fi


⚠️ Potential issue | 🟡 Minor

version_exit=1 initialization makes the exit_code==0 branch unreachable and enables a false-positive health result.

The || version_exit=$? idiom only assigns when the command exits non-zero (short-circuit evaluation). Because version_exit is initialized to 1, if opencode version / claude --version exits successfully (code 0), version_exit stays at 1 — the "$version_exit" -eq 0 guard on Line 1589 is permanently dead code.

More importantly this creates a false positive: a binary that exits non-zero (broken installation, startup crash) but still emits any output to stderr/stdout (which 2>&1 captures) will satisfy the second condition [[ -n "$version_output" && "$version_exit" -ne 124 && "$version_exit" -ne 137 ]] and be reported as healthy — exactly the class of environment failure this check is designed to catch.

🐛 Proposed fix — separate assignment from capture

- local version_output="" - local version_exit=1 + local version_output="" + local version_exit=0 if [[ "$ai_cli" == "opencode" ]]; then if [[ -n "$timeout_cmd" ]]; then - version_output=$("$timeout_cmd" 10 "$ai_cli" version 2>&1) || version_exit=$? + version_output=$("$timeout_cmd" 10 "$ai_cli" version 2>&1) + version_exit=$? else - version_output=$("$ai_cli" version 2>&1) || version_exit=$? + version_output=$("$ai_cli" version 2>&1) + version_exit=$? fi else if [[ -n "$timeout_cmd" ]]; then - version_output=$("$timeout_cmd" 10 "$ai_cli" --version 2>&1) || version_exit=$? + version_output=$("$timeout_cmd" 10 "$ai_cli" --version 2>&1) + version_exit=$? else - version_output=$("$ai_cli" --version 2>&1) || version_exit=$? + version_output=$("$ai_cli" --version 2>&1) + version_exit=$? fi fi

With this change, a clean exit 0 (even with empty output) is treated as healthy via the first condition, and a non-zero exit is correctly treated as a failure unless there is output (which preserves the "output-even-on-non-zero" lenient path intentionally).

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.agents/scripts/supervisor/dispatch.sh around lines 1562 - 1595, The health-check incorrectly initializes version_exit=1 and uses the "cmd || version_exit=$?" idiom, causing successful commands to leave version_exit at 1 and producing false positives; change each invocation that currently does version_output=$(... ) || version_exit=$? to first capture output then immediately set version_exit=$? (e.g., version_output=$(... 2>&1); version_exit=$?) for all branches that run the CLI (the blocks referencing timeout_cmd, ai_cli, and the "opencode" vs "claude CLI" branches), leaving the rest of the logic (the check using version_exit, version_output, cli_cache_file, _PULSE_CLI_VERIFIED, and log_info) unchanged.

gemini-code-assist

Code Review

The pull request introduces a valuable pre-dispatch health check for the AI CLI, improving the overall reliability of the supervisor by preventing wasted retries on environment-related issues. The classification of failures into a new ENVIRONMENT category is a sound architectural decision. However, there are some technical issues in the health check implementation, particularly regarding arithmetic safety with potentially empty variables and incorrect exit code handling due to initialization values.

gemini-code-assist · 2026-02-19T22:55:24Z

.agents/scripts/supervisor/dispatch.sh

+		local cached_at
+		cached_at=$(cat "$cli_cache_file" 2>/dev/null || echo "0")
+		local now
+		now=$(date +%s)
+		local age=$((now - cached_at))


The arithmetic expansion at line 1545 is unsafe because cached_at can be empty if the cache file exists but is empty (in which case cat succeeds and the || echo "0" fallback is not triggered). This would cause a syntax error in the arithmetic expansion. Additionally, following the repository style guide (line 11), declarations and assignments for expansions that can fail should be separate to ensure exit code safety.

Suggested change

local cached_at

cached_at=$(cat "$cli_cache_file" 2>/dev/null || echo "0")

local now

now=$(date +%s)

local age=$((now - cached_at))

local cached_at

cached_at=$(cat "$cli_cache_file" 2>/dev/null || echo "0")

local now

now=$(date +%s)

local age

age=$((now - ${cached_at:-0}))

References

Use local var="$1" pattern in functions (declare and assign separately for exit code safety) ^(link)

In shell scripts with set -e, var=$(cat file || echo 0) is not safe for arithmetic if the file can be empty. cat on an empty file succeeds, making var an empty string and causing arithmetic expansion to fail. Use parameter expansion with a default value (e.g., $((... - ${var:-0}))) to handle empty strings robustly.

gemini-code-assist · 2026-02-19T22:55:24Z

.agents/scripts/supervisor/dispatch.sh

+
+	# Check 2: binary is executable and can produce version output
+	local version_output=""
+	local version_exit=1


Initializing version_exit to 1 causes a logic error in the health check. When the version command succeeds (exit 0), the || version_exit=$? part is skipped, leaving version_exit at its initial value of 1. This causes the first condition of the health check at line 1589 ([[ "$version_exit" -eq 0 ]]) to fail even on success, forcing the check to rely on the more lenient output-based condition. It should be initialized to 0.

Suggested change

local version_exit=1

local version_exit=0

gemini-code-assist · 2026-02-19T22:55:24Z

.agents/scripts/supervisor/dispatch.sh

+	fi
+
+	# If version command succeeded (exit 0) or produced output, CLI is working
+	if [[ "$version_exit" -eq 0 ]] || [[ -n "$version_output" && "$version_exit" -ne 124 && "$version_exit" -ne 137 ]]; then


The health check logic is currently too lenient. If the version command fails (e.g., exit 1) but still produces some output (like an error message), it will be considered 'healthy' because of the [[ -n "$version_output" ... ]] condition. Given that the goal is to detect broken installations, a non-zero exit code should generally be treated as a failure unless there's a specific reason to trust the output of a failed command.

Suggested change

if [[ "$version_exit" -eq 0 ]] || [[ -n "$version_output" && "$version_exit" -ne 124 && "$version_exit" -ne 137 ]]; then

if [[ "$version_exit" -eq 0 ]]; then

Auto-dismissed: bot review does not block autonomous pipeline

marcusquinn added 2 commits February 19, 2026 22:45

coderabbitai bot previously requested changes Feb 19, 2026

View reviewed changes

gemini-code-assist bot reviewed Feb 19, 2026

View reviewed changes

marcusquinn mentioned this pull request Feb 19, 2026

[Supervisor:marcusquinn] 0 queued, 0 working, 1 in review at 14:46 UTC #1502

Closed

marcusquinn merged commit d2dae8d into main Feb 19, 2026
23 checks passed

marcusquinn deleted the feature/t1113 branch February 19, 2026 23:32

marcusquinn added a commit that referenced this pull request Feb 19, 2026

chore: mark t1113 complete in TODO.md (#1980)

8eb928d

marcusquinn mentioned this pull request Feb 19, 2026

t1113: Add worker_never_started diagnostic and auto-retry with environment check #1664

Closed

github-actions bot mentioned this pull request Feb 19, 2026

t1113: Add worker_never_started diagnostic and auto-retry with environment check #1661

Closed

coderabbitai bot mentioned this pull request Feb 19, 2026

t1190: Fix worker_never_started:no_sentinel dispatch failures #1981

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

t1113: Add worker_never_started diagnostic and auto-retry with environment check#1980

t1113: Add worker_never_started diagnostic and auto-retry with environment check#1980
marcusquinn merged 2 commits intomainfrom
feature/t1113

marcusquinn commented Feb 19, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

gemini-code-assist bot commented Feb 19, 2026

Uh oh!

coderabbitai bot commented Feb 19, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Feb 19, 2026

Uh oh!

sonarqubecloud bot commented Feb 19, 2026

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Feb 19, 2026

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Feb 19, 2026

Uh oh!

gemini-code-assist bot Feb 19, 2026

Uh oh!

gemini-code-assist bot Feb 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	if [[ "$version_exit" -eq 0 ]] \|\| [[ -n "$version_output" && "$version_exit" -ne 124 && "$version_exit" -ne 137 ]]; then
	if [[ "$version_exit" -eq 0 ]]; then

Conversation

marcusquinn commented Feb 19, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

1. Pre-dispatch CLI health check (dispatch.sh)

2. ENVIRONMENT failure mode (evaluate.sh)

3. Re-queue without burning retries (pulse.sh)

Verification

Summary by CodeRabbit

Release Notes

Uh oh!

gemini-code-assist bot commented Feb 19, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

coderabbitai bot commented Feb 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related issues

Possibly related PRs

Poem

Uh oh!

github-actions bot commented Feb 19, 2026

🔍 Code Quality Report

📈 Current Quality Metrics

Uh oh!

sonarqubecloud bot commented Feb 19, 2026

Quality Gate passed

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

marcusquinn commented Feb 19, 2026 •

edited by coderabbitai bot

Loading

1. Pre-dispatch CLI health check (`dispatch.sh`)

2. ENVIRONMENT failure mode (`evaluate.sh`)

3. Re-queue without burning retries (`pulse.sh`)

coderabbitai bot commented Feb 19, 2026 •

edited

Loading