t1165.4: Integration test — multi-container batch dispatch #2111

marcusquinn merged 1 commit into main
Conversation
Walkthrough

A comprehensive Bash integration test script exercising end-to-end multi-container batch dispatch workflows, including task registration, parallel worker dispatch across containers, OAuth routing logic, lifecycle state transitions, logging isolation, failure recovery with backfill, and extensive edge case validation.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: 2 passed, 1 failed

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Summary of Changes

Hello @marcusquinn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a critical integration test designed to thoroughly validate the supervisor's behavior in a multi-container batch dispatch environment. It ensures that the system correctly handles parallel task execution, routes OAuth credentials appropriately, manages container lifecycles, aggregates logs effectively, isolates failures, and scales concurrency adaptively. This new test significantly enhances the reliability and stability of the batch processing pipeline.
🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report
[INFO] Latest Quality Status:
[INFO] Recent monitoring activity:

📈 Current Quality Metrics

Generated on: Sat Feb 21 16:23:12 UTC 2026
Generated by AI DevOps Framework Code Review Monitoring
Code Review
This pull request introduces a comprehensive integration test for multi-container batch dispatch. The test script is well-structured and covers a wide range of scenarios, including parallel execution, OAuth routing, log aggregation, and failure isolation. My review focuses on adherence to the repository's shell scripting style guide and best practices. I've identified several areas for improvement, primarily related to explicit return statements, error handling, and output redirection, to enhance the script's robustness and maintainability. Overall, this is a valuable addition that significantly improves test coverage for the supervisor's batch processing capabilities.
```sh
export MOCK_CLAUDE_LOG="$TEST_DIR/mock-claude-invocations.log"
export MOCK_OPENCODE_LOG="$TEST_DIR/mock-opencode-invocations.log"

# shellcheck disable=SC2317,SC2329
```
The repository style guide (line 14) requires a reason for disabling ShellCheck warnings. Please add a comment explaining why SC2317 and SC2329 are disabled for the cleanup function.
Suggested change:

```diff
-# shellcheck disable=SC2317,SC2329
+# shellcheck disable=SC2317,SC2329 # SC2317: The trap is for EXIT, so commands after it are unreachable. SC2329: The trap is not meant to be inherited by subshells.
```
References
- ShellCheck violations require targeted inline disables with reason comments. (link)
```sh
test_db() {
    sqlite3 -cmd ".timeout 5000" "$TEST_DIR/supervisor/supervisor.db" "$@"
}

# Helper: get task status
get_status() {
    test_db "SELECT status FROM tasks WHERE id = '$1';"
}

# Helper: get task field
get_field() {
    test_db "SELECT $2 FROM tasks WHERE id = '$1';"
}

# Helper: create a mock worker log file
create_log() {
    local task_id="$1"
    local content="$2"
    local log_file="$TEST_DIR/supervisor/logs/${task_id}.log"
    mkdir -p "$TEST_DIR/supervisor/logs"
    echo "$content" >"$log_file"
    test_db "UPDATE tasks SET log_file = '$log_file' WHERE id = '$task_id';"
    echo "$log_file"
}

# Helper: create a mock PID file for a running worker
create_pid_file() {
    local task_id="$1"
    local pid="$2"
    mkdir -p "$TEST_DIR/supervisor/pids"
    echo "$pid" >"$TEST_DIR/supervisor/pids/${task_id}.pid"
}

# Helper: count tasks in a given status
count_status() {
    local status="$1"
    test_db "SELECT count(*) FROM tasks WHERE status = '$status';"
}
```
Several helper functions are missing explicit return statements, which is required by the repository's style guide (line 12). While the script may function correctly with set -e, adding explicit returns improves clarity and adherence to the coding standards.
For functions that wrap another command and should propagate its exit code, use return $?. For functions that are primarily used for their standard output and are expected to succeed, use return 0.
```sh
test_db() {
    sqlite3 -cmd ".timeout 5000" "$TEST_DIR/supervisor/supervisor.db" "$@"
    return $?
}

# Helper: get task status
get_status() {
    test_db "SELECT status FROM tasks WHERE id = '$1';"
    return $?
}

# Helper: get task field
get_field() {
    test_db "SELECT $2 FROM tasks WHERE id = '$1';"
    return $?
}

# Helper: create a mock worker log file
create_log() {
    local task_id="$1"
    local content="$2"
    local log_file="$TEST_DIR/supervisor/logs/${task_id}.log"
    mkdir -p "$TEST_DIR/supervisor/logs"
    echo "$content" >"$log_file"
    test_db "UPDATE tasks SET log_file = '$log_file' WHERE id = '$task_id';"
    echo "$log_file"
    return 0
}

# Helper: create a mock PID file for a running worker
create_pid_file() {
    local task_id="$1"
    local pid="$2"
    mkdir -p "$TEST_DIR/supervisor/pids"
    echo "$pid" >"$TEST_DIR/supervisor/pids/${task_id}.pid"
    return $?
}

# Helper: count tasks in a given status
count_status() {
    local status="$1"
    test_db "SELECT count(*) FROM tasks WHERE status = '$status';"
    return $?
}
```

References
- All functions must have explicit `return` statements. (link)
- All functions in shell scripts must have explicit `return` statements to ensure predictable behavior and exit code propagation.
```sh
export HOME=\$mock_home
mkdir -p "\$mock_home/.claude"
echo '{"hasCompletedOnboarding":true}' > "\$mock_home/.claude/settings.json"
rm -f '$TEST_DIR/supervisor/health/claude-oauth' 2>/dev/null
```
The -f flag for rm already suppresses errors for non-existent files, so 2>/dev/null is redundant here. It can also hide other errors like permission issues, which goes against the repository's general rules. Please remove 2>/dev/null from this line and similar occurrences on lines 357, 388, 419, 453, and 484.
Suggested change:

```diff
-rm -f '$TEST_DIR/supervisor/health/claude-oauth' 2>/dev/null
+rm -f '$TEST_DIR/supervisor/health/claude-oauth'
```
References
- Avoid using `2>/dev/null` to suppress errors on file operations if the file's existence has already been verified by a preceding check (e.g., `[[ -f "$file" ]]` or an early return). This practice is redundant for 'file not found' errors and can mask other important issues like permissions problems.
- Avoid using `2>/dev/null` for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.
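The `rm -f` point above can be checked standalone. The sketch below (illustrative, not part of the PR under review) shows that `rm -f` on a missing path already exits 0 and writes nothing to stderr, so the extra redirect can only hide other failure modes such as permission errors:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Remove a file that does not exist, capturing stderr to prove it is empty.
tmpdir=$(mktemp -d)

err=$(rm -f "$tmpdir/does-not-exist" 2>&1)
status=$?
echo "status=$status stderr=[$err]"
# → status=0 stderr=[]

rm -rf "$tmpdir"
```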
```sh
# Verify WRAPPER_STARTED sentinel in all logs
for tid in mc-t1 mc-t2 mc-t3; do
    log_file=$(get_field "$tid" "log_file")
    if grep -q "WRAPPER_STARTED task_id=$tid" "$log_file" 2>/dev/null; then
```
Suppressing stderr with 2>/dev/null can hide important errors, such as file permission issues, making debugging harder. It's better to let potential errors be visible. Please remove 2>/dev/null from this line and similar grep calls on lines 610, 621, 880, and 894.
Suggested change:

```diff
-if grep -q "WRAPPER_STARTED task_id=$tid" "$log_file" 2>/dev/null; then
+if grep -q "WRAPPER_STARTED task_id=$tid" "$log_file"; then
```
References
- Avoid using `2>/dev/null` for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.
```sh
section "Cross-Cutting Quality"

# Test: supervisor-helper.sh passes bash syntax check
if bash -n "$SUPERVISOR_SCRIPT" 2>/dev/null; then
```
Suppressing stderr for syntax checks with 2>/dev/null hides the actual error messages, making it difficult to debug test failures. Please remove 2>/dev/null to allow error messages to be printed. This also applies to the checks on lines 1122 and 1135.
Suggested change:

```diff
-if bash -n "$SUPERVISOR_SCRIPT" 2>/dev/null; then
+if bash -n "$SUPERVISOR_SCRIPT"; then
```
References
- Avoid using `2>/dev/null` for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.
Comprehensive integration test verifying:

- Parallel workers across containers (batch concurrency enforcement)
- Correct OAuth routing (claude for Anthropic, opencode for non-Anthropic)
- Container lifecycle (dispatch -> running -> evaluate -> cleanup)
- Log aggregation across parallel workers (sentinels, per-task logs)
- Batch completion detection and post-completion hooks
- Adaptive concurrency scaling
- Cross-container failure isolation (one worker fails, others continue)
- Second dispatch wave backfill after completions
- Edge cases (empty batch, single-task, duplicates, release-on-complete)

70 test assertions covering the full supervisor batch dispatch pipeline.
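The `pass`/`fail` helpers that such assertions rely on appear in the review hunks quoted elsewhere in this thread; their bodies are not shown in this PR, so the sketch below is an assumption about their shape, with counters and a summary line added for illustration:

```shell
#!/usr/bin/env bash
# Illustrative pass/fail assertion helpers for a Bash test harness.
# Only the names pass/fail come from the quoted snippets; the counter
# variables and summary format are assumptions.
TESTS_PASSED=0
TESTS_FAILED=0

pass() {
    TESTS_PASSED=$((TESTS_PASSED + 1))
    echo "PASS: $1"
    return 0
}

fail() {
    TESTS_FAILED=$((TESTS_FAILED + 1))
    echo "FAIL: $1" >&2
    return 1
}

summary() {
    echo "Assertions: $((TESTS_PASSED + TESTS_FAILED)) total, $TESTS_PASSED passed, $TESTS_FAILED failed"
    [[ "$TESTS_FAILED" -eq 0 ]]
    return $?
}

pass "example assertion"
summary
```

The `summary` exit status lets a CI job fail the step when any assertion failed, without aborting mid-run the way `set -e` would.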
Force-pushed from 3ea6ad2 to cf0d05d
🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report
[INFO] Latest Quality Status:
[INFO] Recent monitoring activity:

📈 Current Quality Metrics

Generated on: Sat Feb 21 17:34:05 UTC 2026
Generated by AI DevOps Framework Code Review Monitoring
Actionable comments posted: 1
🧹 Nitpick comments (5)
tests/test-multi-container-batch-dispatch.sh (5)
186-194: `_run_isolated_test` swallows all stderr, making failures hard to diagnose.

Line 190 redirects all stderr to `/dev/null`. If a sourced module or the test body produces an unexpected error, it will be silently eaten. Consider routing stderr to a temp file and dumping it only on failure, or honoring the `--verbose` flag.

Sketch:

```diff
-    bash "$test_script" 2>/dev/null
+    local stderr_file="$TEST_DIR/isolated-stderr-$$.log"
+    bash "$test_script" 2>"$stderr_file"
     local rc=$?
+    if [[ $rc -ne 0 && "$VERBOSE" == "--verbose" ]]; then
+        cat "$stderr_file" >&2
+    fi
+    rm -f "$stderr_file"
     rm -f "$test_script"
     return $rc
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test-multi-container-batch-dispatch.sh` around lines 186 - 194, The helper _run_isolated_test currently discards all stderr by running bash "$test_script" 2>/dev/null; change this to capture stderr to a temp file (e.g., "${test_script}.err") and only suppress output on success, printing the stderr file on non-zero exit (or when a global verbose flag is set) so test failures and sourced-module errors are visible; update the function to write stderr to that temp file, check the return code from bash, and cat the stderr file on failure or when --verbose is enabled before cleaning up.
308-494: Heavy duplication across six OAuth routing test blocks.

Each of the six `_run_isolated_test` heredocs (lines 309–329, 340–360, 371–391, 402–422, 436–456, 467–487) repeats ~18 identical setup lines, varying only `SUPERVISOR_PREFER_OAUTH` and the model argument to `resolve_ai_cli`. Extracting a helper would cut ~170 lines of near-identical code and make adding new routing test cases trivial.

Sketch: extract a reusable routing test helper

```diff
+# Helper: resolve CLI for a given model under specified OAuth preference
+resolve_cli_for_model() {
+    local model="$1"
+    local prefer_oauth="${2:-true}"
+    _run_isolated_test <<EOF
+#!/usr/bin/env bash
+set -euo pipefail
+export AIDEVOPS_SUPERVISOR_DIR='$TEST_DIR/supervisor'
+export PATH='$MOCK_BIN':/usr/bin:/bin
+export SUPERVISOR_PREFER_OAUTH=$prefer_oauth
+unset SUPERVISOR_CLI 2>/dev/null || true
+BLUE='' GREEN='' YELLOW='' RED='' NC=''
+SUPERVISOR_LOG='/dev/null'
+SUPERVISOR_DIR='$TEST_DIR/supervisor'
+source '$SHARED_CONSTANTS'
+source '$SUPERVISOR_DIR_MODULE/_common.sh'
+source '$SUPERVISOR_DIR_MODULE/dispatch.sh'
+mock_home=\$(mktemp -d)
+export HOME=\$mock_home
+mkdir -p "\$mock_home/.claude"
+echo '{"hasCompletedOnboarding":true}' > "\$mock_home/.claude/settings.json"
+rm -f '$TEST_DIR/supervisor/health/claude-oauth' 2>/dev/null
+resolve_ai_cli '$model'
+rm -rf "\$mock_home"
+EOF
+}
```

Then each test collapses to:

```sh
oauth_opus_cli=$(resolve_cli_for_model 'anthropic/claude-opus-4-6' true)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test-multi-container-batch-dispatch.sh` around lines 308 - 494, The tests duplicate the same ~18-line setup across multiple _run_isolated_test heredocs around resolve_ai_cli; extract a helper function (e.g., resolve_cli_for_model) that accepts the model string and a SUPERVISOR_PREFER_OAUTH flag, builds and runs the single heredoc calling _run_isolated_test and resolve_ai_cli (reusing the same HOME/mock setup, sourcing _common.sh and dispatch.sh, and cleanup), then replace the six repeated blocks with calls like oauth_opus_cli=$(resolve_cli_for_model 'anthropic/claude-opus-4-6' true) (ensure the helper toggles SUPERVISOR_PREFER_OAUTH and unsets SUPERVISOR_CLI as before and returns the resolve_ai_cli output).
140-143: Redundant `return $?`.

In Bash, a function implicitly returns the exit code of its last command. The explicit `return $?` on line 142 is a no-op. Not harmful, but removing it keeps the helper minimal.

Proposed fix:

```diff
 sup() {
-    bash "$SUPERVISOR_SCRIPT" "$@" 2>&1
-    return $?
+    bash "$SUPERVISOR_SCRIPT" "$@" 2>&1
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test-multi-container-batch-dispatch.sh` around lines 140 - 143, The helper function sup currently ends with an unnecessary explicit "return $?" which is redundant because Bash functions return the exit status of the last command; remove the "return $?" line from the sup() function (which runs bash "$SUPERVISOR_SCRIPT" "$@" 2>&1) so the function simply invokes that command and lets its exit status propagate implicitly.
1018-1022: Adaptive concurrency assertion only checks for "is a number", not the expected value.

The test calls `calculate_adaptive_concurrency 3 2 6` but only verifies the result is a non-empty integer. It doesn't assert the expected output (e.g., `6` if the formula is `min(base * factor, max)`). This makes it a smoke test rather than a correctness test. Consider asserting the expected value.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test-multi-container-batch-dispatch.sh` around lines 1018 - 1022, The test only verifies that adaptive_result is a non-empty integer; change it to compute the expected value (e.g., expected = min(base * factor, max) for the call calculate_adaptive_concurrency 3 2 6 → expected 6) and assert adaptive_result equals that expected value instead of only matching a number; update the conditional that currently checks "$adaptive_result" =~ ^[0-9]+$ to compare numeric equality between adaptive_result and the calculated expected, and produce the pass/skip messages referencing calculate_adaptive_concurrency and adaptive_result accordingly.
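A runnable sketch of the tightened assertion. Note the assumption, not confirmed by this PR, that `calculate_adaptive_concurrency base factor max` computes `min(base * factor, max)`; the function body below is a stand-in stub so the assertion shape can run on its own:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stub standing in for the real function; assumes min(base * factor, max).
calculate_adaptive_concurrency() {
    local base="$1" factor="$2" max="$3"
    local scaled=$((base * factor))
    if ((scaled > max)); then
        scaled=$max
    fi
    echo "$scaled"
    return 0
}

# Compute the expected value from the same formula, then assert equality
# instead of merely matching ^[0-9]+$.
base=3 factor=2 max=6
expected=$((base * factor))
if ((expected > max)); then
    expected=$max
fi

adaptive_result=$(calculate_adaptive_concurrency "$base" "$factor" "$max")
if [[ "$adaptive_result" -eq "$expected" ]]; then
    echo "PASS: calculate_adaptive_concurrency $base $factor $max -> $adaptive_result"
else
    echo "FAIL: expected $expected, got $adaptive_result"
fi
```

If the real formula differs, only the `expected` computation needs to change; the equality assertion stays the same.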
88-102: Mock default log paths fall back to shared `/tmp` locations.

Lines 90 and 108 define fallback paths (`/tmp/mock-claude-invocations.log`, `/tmp/mock-opencode-invocations.log`) inside the heredoc. While `MOCK_CLAUDE_LOG`/`MOCK_OPENCODE_LOG` are exported on lines 120–121 to override these defaults, the fallback creates a small risk: if the env var isn't inherited (e.g., a subshell `env -i` scenario), logs land in a shared `/tmp` file, potentially colliding across parallel CI runs or leaking test artifact details.

Consider pointing the fallback at a path that's guaranteed unique (e.g., derived from `$$` or `mktemp`), or removing the default entirely and failing fast if the env var is unset.

Proposed fix:

```diff
-MOCK_LOG="${MOCK_CLAUDE_LOG:-/tmp/mock-claude-invocations.log}"
+MOCK_LOG="${MOCK_CLAUDE_LOG:?MOCK_CLAUDE_LOG must be set}"
```

```diff
-MOCK_LOG="${MOCK_OPENCODE_LOG:-/tmp/mock-opencode-invocations.log}"
+MOCK_LOG="${MOCK_OPENCODE_LOG:?MOCK_OPENCODE_LOG must be set}"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test-multi-container-batch-dispatch.sh` around lines 88 - 102, The mock scripts embed unsafe shared /tmp fallbacks for MOCK_CLAUDE_LOG and MOCK_OPENCODE_LOG inside the heredocs which can collide in CI; update the heredoc defaults to use unique per-process names (e.g., derive from $$ or use mktemp) or remove the inline fallback entirely and make the script fail fast if MOCK_CLAUDE_LOG / MOCK_OPENCODE_LOG are not set (these env vars are exported later in the test harness), so that logs never silently land in a global /tmp file.
```sh
pid_count=0
for tid in mc-t1 mc-t2 mc-t3 mc-t4 mc-t5 mc-t6; do
    if [[ -f "$TEST_DIR/supervisor/pids/${tid}.pid" ]]; then
        pid_count=$((pid_count + 1))
    fi
done
if [[ "$pid_count" -ge 3 ]]; then
    pass "PID files exist for dispatched workers ($pid_count found)"
else
    fail "Expected >= 3 PID files, got $pid_count"
fi
```
Loose assertion: PID file count check uses >= 3 when all 6 should exist.
PID files were created for mc-t1..mc-t3 (line 528) and mc-t4..mc-t6 (line 767), so all 6 should be present before any cleanup runs. The >= 3 threshold could silently pass even if half the PID files were never written.
Proposed fix
```diff
-if [[ "$pid_count" -ge 3 ]]; then
-    pass "PID files exist for dispatched workers ($pid_count found)"
+if [[ "$pid_count" -eq 6 ]]; then
+    pass "PID files exist for all 6 dispatched workers"
 else
-    fail "Expected >= 3 PID files, got $pid_count"
+    fail "Expected 6 PID files, got $pid_count"
 fi
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test-multi-container-batch-dispatch.sh` around lines 910 - 920, The PID
file existence assertion is too loose—update the check that iterates over tid in
mc-t1..mc-t6 (the pid_count logic that tests
"$TEST_DIR/supervisor/pids/${tid}.pid") to require all 6 PID files instead of
">= 3"; change the conditional to compare against 6 and adjust the pass/fail
messages ("PID files exist for dispatched workers ($pid_count found)" and
"Expected >= 3 PID files, got $pid_count") to reflect that 6 are expected so
failures are clear.
Auto-dismissed: bot review does not block autonomous pipeline



Ref #1766