t1190: Fix worker_never_started:no_sentinel dispatch failures #1981

marcusquinn merged 2 commits into main from
Conversation
… and wrapper sentinel (t1190)

Root cause: dispatch/wrapper scripts used fixed filenames (e.g., t001-dispatch.sh), causing a race condition where a second dispatch overwrites the scripts before the first wrapper process reads them. The first wrapper then executes the new dispatch script, which writes WORKER_STARTED to a different log file, leaving the original log with only the metadata header (no WORKER_STARTED → no_sentinel failure).

Fixes:
1. Timestamped dispatch/wrapper filenames (t001-dispatch-20260213142302.sh) prevent the overwrite race — each dispatch gets unique scripts that cannot be clobbered.
2. WRAPPER_STARTED sentinel written to the log before running the dispatch script, making it possible to distinguish 'wrapper never ran' from 'dispatch exec failed'.
3. Wrapper stderr redirected to the log file (was /dev/null) so wrapper startup errors (bash not found, permission denied) are captured for diagnosis.
4. evaluate.sh: improved no_sentinel sub-classification using the WRAPPER_STARTED signal.
5. cleanup.sh: remove timestamped scripts when a task is cleaned up.

Chose timestamped filenames over temp files (mktemp) to preserve debuggability — the timestamp matches the log file timestamp, making correlation easy.
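The timestamp scheme in fix 1 can be sketched as follows. This is a hedged illustration, not the actual dispatch code: `task_id`, `SUPERVISOR_DIR`, and the `pids/` layout are assumptions made for the example.

```shell
# Sketch of timestamped script naming (hypothetical names throughout).
task_id="t001"
SUPERVISOR_DIR="${SUPERVISOR_DIR:-$(mktemp -d)}"
mkdir -p "${SUPERVISOR_DIR}/pids"

# Second-resolution timestamp, e.g. 20260213142302, matching the log naming.
ts="$(date +%Y%m%d%H%M%S)"
dispatch_script="${SUPERVISOR_DIR}/pids/${task_id}-dispatch-${ts}.sh"

# A concurrent dispatch computes its own ${ts}, so it writes a different
# file instead of clobbering this one before the wrapper reads it.
printf '#!/usr/bin/env bash\necho "worker would start here"\n' > "${dispatch_script}"
chmod +x "${dispatch_script}"
echo "${dispatch_script}"
```

One trade-off of second-resolution timestamps is that two dispatches of the same task within the same second would still collide; per the commit message, the PR accepts this in exchange for easy log correlation, which mktemp-style random names would lose.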
Walkthrough

The supervisor scripts receive enhancements for reliability and observability: timestamped dispatch and wrapper scripts prevent race conditions, startup sentinels (WRAPPER_STARTED) enable fine-grained diagnostics, improved logging routes stderr to task log files, and a new helper function standardizes wrapper script launches with consistent process isolation.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant Dispatcher
    participant WrapperScript as Wrapper Script<br/>(timestamped)
    participant DispatchScript as Dispatch Script<br/>(timestamped)
    participant WorkerProcess as Worker Process
    participant TaskLog as Task Log
    participant Evaluator
    Dispatcher->>Dispatcher: Generate timestamp
    Dispatcher->>WrapperScript: Create with timestamp suffix
    Dispatcher->>DispatchScript: Create with timestamp suffix
    Dispatcher->>WrapperScript: Launch via _launch_wrapper_script
    WrapperScript->>TaskLog: Write WRAPPER_STARTED sentinel
    WrapperScript->>DispatchScript: Execute with setsid/nohup
    DispatchScript->>TaskLog: Write START and dispatch logs
    DispatchScript->>WorkerProcess: Launch worker
    WorkerProcess->>TaskLog: Write WORKER_STARTED sentinel
    WorkerProcess->>TaskLog: Execute and log results
    Evaluator->>TaskLog: Extract log metadata
    Evaluator->>Evaluator: Check WRAPPER_STARTED presence
    Evaluator->>Evaluator: Check WORKER_STARTED presence
    Evaluator->>Evaluator: Distinguish failure modes
    Evaluator->>Evaluator: Report diagnostics
```
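The wrapper steps in the diagram above — sentinel first, then the dispatch script, with stderr kept in the log rather than /dev/null — can be sketched minimally. The sentinel names match the PR; the file handling and field layout here are illustrative assumptions.

```shell
# Hypothetical stand-ins for the generated wrapper/dispatch pair.
log_file="$(mktemp)"
dispatch_script="$(mktemp)"
printf 'echo "WORKER_STARTED pid=$$"\n' > "${dispatch_script}"

{
  # Sentinel first: if WRAPPER_STARTED is present but WORKER_STARTED is not,
  # the wrapper ran and the dispatch script itself failed to start.
  echo "WRAPPER_STARTED wrapper_pid=$$ ts=$(date -u +%FT%TZ)"
  bash "${dispatch_script}"
} >> "${log_file}" 2>&1   # stderr goes to the log, not /dev/null

grep -c 'STARTED' "${log_file}"   # prints 2 (both sentinels landed in the log)
```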
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ✅ 3 passed
🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report
[INFO] Latest Quality Status:
[INFO] Recent monitoring activity:

📈 Current Quality Metrics
Generated on: Thu Feb 19 22:52:50 UTC 2026
Generated by AI DevOps Framework Code Review Monitoring
Cancelled tasks are administrative cleanup actions (orphaned tasks, superseded work, cross-repo misregistration cleanup) — not worker failures. Including them in the failure count inflated the 7-day failure rate from <1% to 11%, causing false alarms and masking the true worker reliability signal.

Root cause analysis (t1248):
- Feb 12: 5 hung workers (t302, t303, t311.2x2, t311.3) — all hit the 1800s default timeout because their ~1h estimates weren't parsed. Fixed by t1199 (estimate-based timeout) and t1222 (graceful SIGTERM at 50% timeout).
- Feb 13: 12 no_sentinel failures (t1010, t1030, t1032.1x4, t1032.2x2) at 14:40-14:56 UTC — caused by a race condition where concurrent dispatches overwrote fixed-filename wrapper scripts. Fixed by t1190 (timestamped filenames, WRAPPER_STARTED sentinel) — PR #1981 pending merge.
- Feb 18-19: 67 cancellations (orphaned DB tasks, superseded tasks, cross-repo misregistration cleanup via t1237) — these are intentional, not failures.

Fix: split 'failed' and 'cancelled' into separate metrics. The success rate now only counts status='failed' in the denominator, giving an accurate signal. Cancelled tasks get their own row for visibility without distorting the rate.

True 7-day worker failure rate: 2/475 = 0.4% (not 11%)
Pattern tracker overall rate: 94% (977/1037) — unchanged, counts retries
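The failed/cancelled split can be illustrated with a toy status log. The real metrics come from the task database, so the file format, field layout, and variable names here are all invented for the sketch.

```shell
# Toy status log: one "task_id status" line per task (invented format).
status_log="$(mktemp)"
cat > "${status_log}" <<'EOF'
t001 done
t002 failed
t003 cancelled
t004 done
EOF

done_n=$(grep -c ' done$' "${status_log}")
failed_n=$(grep -c ' failed$' "${status_log}")
cancelled_n=$(grep -c ' cancelled$' "${status_log}")

# Only status='failed' enters the rate; cancelled gets its own row
# so administrative cleanup no longer distorts the reliability signal.
echo "failed_rate=$(( failed_n * 100 / (done_n + failed_n) ))%"
echo "cancelled=${cancelled_n}"
```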
Summary of Changes

Hello @marcusquinn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a critical race condition in the supervisor's worker dispatch mechanism, which previously caused 'worker_never_started:no_sentinel' failures. The changes enhance the robustness of worker startup by ensuring unique script execution and provide significantly improved diagnostic information when workers fail to initialize. This leads to more reliable task execution and easier debugging of startup issues.
Code Review
This pull request effectively addresses a race condition in script dispatching by using timestamped filenames, and significantly improves error diagnostics by adding a WRAPPER_STARTED sentinel and redirecting wrapper script stderr to log files. The changes are logical and well-commented. My review includes a few suggestions to improve code consistency, adhere to the repository's shell scripting style guide, and reduce code duplication for better long-term maintainability.
…1983) Cancelled tasks are administrative cleanup actions (orphaned tasks, superseded work, cross-repo misregistration cleanup) — not worker failures.
- cleanup.sh: normalize glob patterns ([0-9]* -> *) for reprompt/prompt-repeat scripts; remove redundant 2>/dev/null from rm -f (already guarded by [[ -f ]])
- dispatch.sh: extract duplicated setsid/nohup wrapper launch into _launch_wrapper_script() helper to eliminate code duplication
- evaluate.sh: remove blanket 2>/dev/null suppression from grep calls per style guide (log file existence already checked before these calls)
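The no_sentinel sub-classification that the WRAPPER_STARTED sentinel enables in evaluate.sh might look like the sketch below. Only the sentinel strings come from the PR; the function name and surrounding logic are invented for illustration.

```shell
# Hypothetical classifier: which startup stage did the task log reach?
classify_startup_failure() {
  local tlog="$1"
  if grep -q 'WORKER_STARTED' "${tlog}"; then
    echo "worker_started"          # worker launched; not a startup failure
  elif grep -q 'WRAPPER_STARTED' "${tlog}"; then
    echo "dispatch_exec_failed"    # wrapper ran, dispatch script did not
  else
    echo "no_sentinel"             # wrapper itself never ran
  fi
}

log="$(mktemp)"
echo "WRAPPER_STARTED wrapper_pid=123" > "${log}"
classify_startup_failure "${log}"   # prints dispatch_exec_failed
```

Before this PR, only the WORKER_STARTED branch existed, so both remaining cases collapsed into the ambiguous no_sentinel outcome.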
🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report
[INFO] Latest Quality Status:
[INFO] Recent monitoring activity:

📈 Current Quality Metrics
Generated on: Thu Feb 19 23:36:22 UTC 2026
Generated by AI DevOps Framework Code Review Monitoring
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.agents/scripts/supervisor/evaluate.sh (1)
1028-1043: ⚠️ Potential issue | 🟡 Minor

The startup_error grep may false-match previous-run errors from the REPROMPT METADATA header.

The startup_error extraction at Line 1031 scans the entire log. On reprompt retries the log file is pre-seeded with a === REPROMPT METADATA === header containing previous_error=${terror:-none}. If terror from a prior run contained WORKER_FAILED: or command not found, the grep will match that stale header entry — producing a misleading failed:worker_never_started:previous_error=WORKER_FAILED_... detail instead of no_sentinel or dispatch_exec_failed.

This doesn't break the routing pipeline (outcome remains in the worker_never_started family), but it degrades diagnostic accuracy and could misattribute the cause of the new failure to the old one.

🛡️ Proposed fix — exclude header lines from startup_error search
```diff
-    startup_error=$(grep -i 'WORKER_FAILED\|WORKER_DISPATCH_ERROR\|command not found\|No such file\|Permission denied' "$tlog" | head -1 | head -c 200 || echo "")
+    # Skip REPROMPT METADATA header lines (previous_error= could match stale errors)
+    startup_error=$(grep -v '^=== \(REPROMPT METADATA\|END REPROMPT METADATA\)\|^task_id=\|^timestamp=\|^retry=\|^work_dir=\|^previous_error=\|^fresh_worktree=\|^strategy=' "$tlog" \
+        | grep -i 'WORKER_FAILED\|WORKER_DISPATCH_ERROR\|command not found\|No such file\|Permission denied' \
+        | head -1 | head -c 200 || echo "")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.agents/scripts/supervisor/evaluate.sh around lines 1028 - 1043, The grep used to set startup_error scans the whole log and can match stale entries in the REPROMPT METADATA header; change the extraction to exclude header lines by piping the log through a filter before matching (e.g., cat "$tlog" | grep -v '^=== REPROMPT METADATA ===' -A0 -B9999 or simply grep -v 'previous_error=' | grep -i 'WORKER_FAILED\|WORKER_DISPATCH_ERROR\|command not found\|No such file\|Permission denied' | head -1 | head -c 200) so the startup_error assignment (variable startup_error, using "$tlog") ignores metadata header lines and only considers actual runtime log lines. Ensure the rest of the logic (the WRAPPER_STARTED check and the echo branches) remains unchanged.
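As a rough, self-contained illustration of the filtering idea suggested above — drop metadata lines before grepping for startup errors — the following sketch uses an invented log layout; only the sentinel-header strings echo the PR.

```shell
# Toy task log pre-seeded with a stale previous_error in its header.
tlog="$(mktemp)"
cat > "${tlog}" <<'EOF'
=== REPROMPT METADATA ===
previous_error=WORKER_FAILED: old crash
=== END REPROMPT METADATA ===
bash: /usr/local/bin/worker: Permission denied
EOF

# Filter header lines first, then look for genuine startup errors.
startup_error=$(grep -v '^previous_error=\|^=== ' "${tlog}" \
  | grep -i 'WORKER_FAILED\|command not found\|Permission denied' \
  | head -1 | head -c 200 || echo "")
echo "${startup_error}"   # prints the runtime error, not the stale header entry
```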
🧹 Nitpick comments (2)
.agents/scripts/supervisor/dispatch.sh (1)
2725-2741: _launch_wrapper_script relies on implicit $! propagation — fragile contract worth hardening.

Bash functions run inline (not as subprocesses), but running a function in the background or as part of a pipeline forces it to run as a subprocess. When called normally, $! in the caller correctly reflects the last background process started inside the function. return 0 only sets $?, not $!, so the current local worker_pid=$! at Line 2708 captures the correct PID.

The fragility: any future addition of another background command (&) inside _launch_wrapper_script — even a log write or heartbeat — would silently break worker_pid. Making the PID return explicit costs two lines and removes the implicit contract entirely.

♻️ Explicit PID return via stdout
```diff
 _launch_wrapper_script() {
     local wrapper_script="$1"
     local log_file="$2"
     if command -v setsid &>/dev/null; then
         nohup setsid bash "${wrapper_script}" >>"${log_file}" 2>&1 &
     else
         nohup bash "${wrapper_script}" >>"${log_file}" 2>&1 &
     fi
-    return 0
+    echo "$!"
 }
```

Caller update (Line 2708):

```diff
-_launch_wrapper_script "${wrapper_script}" "${log_file}"
-local worker_pid=$!
+local worker_pid
+worker_pid=$(_launch_wrapper_script "${wrapper_script}" "${log_file}")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.agents/scripts/supervisor/dispatch.sh around lines 2725 - 2741, The helper _launch_wrapper_script currently relies on the caller reading $! externally which is fragile; change it to explicitly emit the launched wrapper PID on stdout so callers can capture it deterministically (e.g., have _launch_wrapper_script print the PID of the backgrounded nohup/setsid process before returning), then update callers that set local worker_pid=$! to instead capture that stdout (e.g., worker_pid="$( _launch_wrapper_script ... )"); ensure the function still redirects logs and returns non-zero on failure while printing only the PID on success so downstream parsing remains stable.

.agents/scripts/supervisor/cleanup.sh (1)
287-298: Redundant existence guard — rm -f already handles missing files gracefully.

[[ -f "$script" ]] && rm -f "$script" || true is more complex than it needs to be. The -f flag to rm suppresses "no such file or directory" errors and exits 0 on missing operands, so the guard is redundant. When no glob matches, bash passes the literal pattern as $script, and rm -f on that non-existent path exits 0 without error.

♻️ Simplified cleanup loop
```diff
-    local script
-    for script in "$SUPERVISOR_DIR/pids/${task_id}"-dispatch-*.sh \
-        "$SUPERVISOR_DIR/pids/${task_id}"-wrapper-*.sh \
-        "$SUPERVISOR_DIR/pids/${task_id}"-reprompt-*.sh \
-        "$SUPERVISOR_DIR/pids/${task_id}"-reprompt-wrapper-*.sh \
-        "$SUPERVISOR_DIR/pids/${task_id}"-prompt-repeat-*.sh \
-        "$SUPERVISOR_DIR/pids/${task_id}"-prompt-repeat-wrapper-*.sh; do
-        [[ -f "$script" ]] && rm -f "$script" || true
-    done
+    rm -f \
+        "$SUPERVISOR_DIR/pids/${task_id}"-dispatch-*.sh \
+        "$SUPERVISOR_DIR/pids/${task_id}"-wrapper-*.sh \
+        "$SUPERVISOR_DIR/pids/${task_id}"-reprompt-*.sh \
+        "$SUPERVISOR_DIR/pids/${task_id}"-reprompt-wrapper-*.sh \
+        "$SUPERVISOR_DIR/pids/${task_id}"-prompt-repeat-*.sh \
+        "$SUPERVISOR_DIR/pids/${task_id}"-prompt-repeat-wrapper-*.sh \
+        2>/dev/null || true
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.agents/scripts/supervisor/cleanup.sh around lines 287 - 298, The loop over files using the variable script (for script in "$SUPERVISOR_DIR/pids/${task_id}"-dispatch-*.sh ... ) has a redundant existence guard ([[ -f "$script" ]] && rm -f "$script" || true); simplify it by removing the conditional and calling rm -f "$script" directly inside the loop so that the cleanup uses rm -f "$script" for each glob-expanded entry (referencing SUPERVISOR_DIR, task_id and the script variable) and thereby reduce complexity.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.agents/scripts/supervisor/dispatch.sh:
- Around line 1128-1132: The inline daemon-launch block duplicated in
do_prompt_repeat and cmd_reprompt should be replaced with a call to the existing
helper _launch_wrapper_script to complete the DRY refactor: locate the inline if
command -v setsid ... nohup ... fi blocks inside the functions do_prompt_repeat
and cmd_reprompt and remove them, then invoke _launch_wrapper_script passing the
same wrapper_script and new_log_file variables (or their local equivalents) so
both functions use the centralized launch logic implemented for cmd_dispatch.
- Line 1076: The generated wrapper sentinel currently writes an inner echo that
uses single quotes so wrapper_pid=$$ is literal; update the three sentinel
emission sites that produce the line like echo "echo 'WRAPPER_STARTED
task_id=${task_id} strategy=prompt_repeat wrapper_pid=\$\$ ...' >>
'${new_log_file}'" to instead produce a fully double-quoted inner echo so $$
expands at runtime (i.e., make the inner echo use "WRAPPER_STARTED ...
wrapper_pid=$$ ..." rather than single quotes); locate the three occurrences in
.agents/scripts/supervisor/dispatch.sh that emit the WRAPPER_STARTED sentinel
(the same pattern at the lines noted in the review) and change the generated
echo quoting accordingly so the wrapper PID is recorded.
---
Outside diff comments:
In @.agents/scripts/supervisor/evaluate.sh:
- Around line 1028-1043: The grep used to set startup_error scans the whole log
and can match stale entries in the REPROMPT METADATA header; change the
extraction to exclude header lines by piping the log through a filter before
matching (e.g., cat "$tlog" | grep -v '^=== REPROMPT METADATA ===' -A0 -B9999 or
simply grep -v 'previous_error=' | grep -i
'WORKER_FAILED\|WORKER_DISPATCH_ERROR\|command not found\|No such
file\|Permission denied' | head -1 | head -c 200) so the startup_error
assignment (variable startup_error, using "$tlog") ignores metadata header lines
and only considers actual runtime log lines. Ensure the rest of the logic (the
WRAPPER_STARTED check and the echo branches) remains unchanged.
---
Nitpick comments:
In @.agents/scripts/supervisor/cleanup.sh:
- Around line 287-298: The loop over files using the variable script (for script
in "$SUPERVISOR_DIR/pids/${task_id}"-dispatch-*.sh ... ) has a redundant
existence guard ([[ -f "$script" ]] && rm -f "$script" || true); simplify it by
removing the conditional and calling rm -f "$script" directly inside the loop so
that the cleanup uses rm -f "$script" for each glob-expanded entry (referencing
SUPERVISOR_DIR, task_id and the script variable) and thereby reduce complexity.
In @.agents/scripts/supervisor/dispatch.sh:
- Around line 2725-2741: The helper _launch_wrapper_script currently relies on
the caller reading $! externally which is fragile; change it to explicitly emit
the launched wrapper PID on stdout so callers can capture it deterministically
(e.g., have _launch_wrapper_script print the PID of the backgrounded
nohup/setsid process before returning), then update callers that set local
worker_pid=$! to instead capture that stdout (e.g., worker_pid="$(
_launch_wrapper_script ... )"); ensure the function still redirects logs and
returns non-zero on failure while printing only the PID on success so downstream
parsing remains stable.
Auto-dismissed: bot review does not block autonomous pipeline



WIP - incremental commits
Root cause: dispatch/wrapper scripts used fixed filenames, causing a race condition where a second dispatch overwrites the scripts before the first wrapper process reads them.
Fixes:
Ref #1799