t1165.4: Integration test — multi-container batch dispatch #2111

marcusquinn merged 1 commit into main
Conversation
Walkthrough

A comprehensive Bash integration test script exercising end-to-end multi-container batch dispatch workflows, including task registration, parallel worker dispatch across containers, OAuth routing logic, lifecycle state transitions, logging isolation, failure recovery with backfill, and extensive edge case validation.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: 2 passed, 1 failed

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Summary of Changes

Hello @marcusquinn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a critical integration test designed to thoroughly validate the supervisor's behavior in a multi-container batch dispatch environment. It ensures that the system correctly handles parallel task execution, routes OAuth credentials appropriately, manages container lifecycles, aggregates logs effectively, isolates failures, and scales concurrency adaptively. This new test significantly enhances the reliability and stability of the batch processing pipeline.
🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report
[INFO] Latest Quality Status:
[INFO] Recent monitoring activity:

📈 Current Quality Metrics

Generated on: Sat Feb 21 16:23:12 UTC 2026
Generated by AI DevOps Framework Code Review Monitoring
Code Review
This pull request introduces a comprehensive integration test for multi-container batch dispatch. The test script is well-structured and covers a wide range of scenarios, including parallel execution, OAuth routing, log aggregation, and failure isolation. My review focuses on adherence to the repository's shell scripting style guide and best practices. I've identified several areas for improvement, primarily related to explicit return statements, error handling, and output redirection, to enhance the script's robustness and maintainability. Overall, this is a valuable addition that significantly improves test coverage for the supervisor's batch processing capabilities.
```sh
export MOCK_CLAUDE_LOG="$TEST_DIR/mock-claude-invocations.log"
export MOCK_OPENCODE_LOG="$TEST_DIR/mock-opencode-invocations.log"

# shellcheck disable=SC2317,SC2329
```
The repository style guide (line 14) requires a reason for disabling ShellCheck warnings. Please add a comment explaining why SC2317 and SC2329 are disabled for the cleanup function.
Suggested change:

```diff
-# shellcheck disable=SC2317,SC2329
+# shellcheck disable=SC2317,SC2329 # SC2317: The trap is for EXIT, so commands after it are unreachable. SC2329: The trap is not meant to be inherited by subshells.
```
References
- ShellCheck violations require targeted inline disables with reason comments. (link)
```sh
test_db() {
    sqlite3 -cmd ".timeout 5000" "$TEST_DIR/supervisor/supervisor.db" "$@"
}

# Helper: get task status
get_status() {
    test_db "SELECT status FROM tasks WHERE id = '$1';"
}

# Helper: get task field
get_field() {
    test_db "SELECT $2 FROM tasks WHERE id = '$1';"
}

# Helper: create a mock worker log file
create_log() {
    local task_id="$1"
    local content="$2"
    local log_file="$TEST_DIR/supervisor/logs/${task_id}.log"
    mkdir -p "$TEST_DIR/supervisor/logs"
    echo "$content" >"$log_file"
    test_db "UPDATE tasks SET log_file = '$log_file' WHERE id = '$task_id';"
    echo "$log_file"
}

# Helper: create a mock PID file for a running worker
create_pid_file() {
    local task_id="$1"
    local pid="$2"
    mkdir -p "$TEST_DIR/supervisor/pids"
    echo "$pid" >"$TEST_DIR/supervisor/pids/${task_id}.pid"
}

# Helper: count tasks in a given status
count_status() {
    local status="$1"
    test_db "SELECT count(*) FROM tasks WHERE status = '$status';"
}
```
Several helper functions are missing explicit return statements, which is required by the repository's style guide (line 12). While the script may function correctly with set -e, adding explicit returns improves clarity and adherence to the coding standards.
For functions that wrap another command and should propagate its exit code, use return $?. For functions that are primarily used for their standard output and are expected to succeed, use return 0.
```sh
test_db() {
    sqlite3 -cmd ".timeout 5000" "$TEST_DIR/supervisor/supervisor.db" "$@"
    return $?
}

# Helper: get task status
get_status() {
    test_db "SELECT status FROM tasks WHERE id = '$1';"
    return $?
}

# Helper: get task field
get_field() {
    test_db "SELECT $2 FROM tasks WHERE id = '$1';"
    return $?
}

# Helper: create a mock worker log file
create_log() {
    local task_id="$1"
    local content="$2"
    local log_file="$TEST_DIR/supervisor/logs/${task_id}.log"
    mkdir -p "$TEST_DIR/supervisor/logs"
    echo "$content" >"$log_file"
    test_db "UPDATE tasks SET log_file = '$log_file' WHERE id = '$task_id';"
    echo "$log_file"
    return 0
}

# Helper: create a mock PID file for a running worker
create_pid_file() {
    local task_id="$1"
    local pid="$2"
    mkdir -p "$TEST_DIR/supervisor/pids"
    echo "$pid" >"$TEST_DIR/supervisor/pids/${task_id}.pid"
    return $?
}

# Helper: count tasks in a given status
count_status() {
    local status="$1"
    test_db "SELECT count(*) FROM tasks WHERE status = '$status';"
    return $?
}
```

References
- All functions must have explicit `return` statements. (link)
- All functions in shell scripts must have explicit `return` statements to ensure predictable behavior and exit code propagation.
```sh
export HOME=\$mock_home
mkdir -p "\$mock_home/.claude"
echo '{"hasCompletedOnboarding":true}' > "\$mock_home/.claude/settings.json"
rm -f '$TEST_DIR/supervisor/health/claude-oauth' 2>/dev/null
```
The -f flag for rm already suppresses errors for non-existent files, so 2>/dev/null is redundant here. It can also hide other errors like permission issues, which goes against the repository's general rules. Please remove 2>/dev/null from this line and similar occurrences on lines 357, 388, 419, 453, and 484.
Suggested change:

```diff
-rm -f '$TEST_DIR/supervisor/health/claude-oauth' 2>/dev/null
+rm -f '$TEST_DIR/supervisor/health/claude-oauth'
```
References
- Avoid using `2>/dev/null` to suppress errors on file operations if the file's existence has already been verified by a preceding check (e.g., `[[ -f "$file" ]]` or an early return). This practice is redundant for 'file not found' errors and can mask other important issues like permissions problems.
- Avoid using `2>/dev/null` for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.
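The `rm -f` point above can be checked standalone. The sketch below (illustrative, not part of the PR under review) shows that `rm -f` on a missing path already exits 0 and writes nothing to stderr, so the extra redirect can only hide other failure modes such as permission errors:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Remove a file that does not exist, capturing stderr to prove it is empty.
tmpdir=$(mktemp -d)

err=$(rm -f "$tmpdir/does-not-exist" 2>&1)
status=$?
echo "status=$status stderr=[$err]"
# → status=0 stderr=[]

rm -rf "$tmpdir"
```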
```sh
# Verify WRAPPER_STARTED sentinel in all logs
for tid in mc-t1 mc-t2 mc-t3; do
    log_file=$(get_field "$tid" "log_file")
    if grep -q "WRAPPER_STARTED task_id=$tid" "$log_file" 2>/dev/null; then
```
Suppressing stderr with 2>/dev/null can hide important errors, such as file permission issues, making debugging harder. It's better to let potential errors be visible. Please remove 2>/dev/null from this line and similar grep calls on lines 610, 621, 880, and 894.
Suggested change:

```diff
-if grep -q "WRAPPER_STARTED task_id=$tid" "$log_file" 2>/dev/null; then
+if grep -q "WRAPPER_STARTED task_id=$tid" "$log_file"; then
```
References
- Avoid using `2>/dev/null` for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.
```sh
section "Cross-Cutting Quality"

# Test: supervisor-helper.sh passes bash syntax check
if bash -n "$SUPERVISOR_SCRIPT" 2>/dev/null; then
```
Suppressing stderr for syntax checks with 2>/dev/null hides the actual error messages, making it difficult to debug test failures. Please remove 2>/dev/null to allow error messages to be printed. This also applies to the checks on lines 1122 and 1135.
Suggested change:

```diff
-if bash -n "$SUPERVISOR_SCRIPT" 2>/dev/null; then
+if bash -n "$SUPERVISOR_SCRIPT"; then
```
References
- Avoid using `2>/dev/null` for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.
Comprehensive integration test verifying:

- Parallel workers across containers (batch concurrency enforcement)
- Correct OAuth routing (claude for Anthropic, opencode for non-Anthropic)
- Container lifecycle (dispatch -> running -> evaluate -> cleanup)
- Log aggregation across parallel workers (sentinels, per-task logs)
- Batch completion detection and post-completion hooks
- Adaptive concurrency scaling
- Cross-container failure isolation (one worker fails, others continue)
- Second dispatch wave backfill after completions
- Edge cases (empty batch, single-task, duplicates, release-on-complete)

70 test assertions covering the full supervisor batch dispatch pipeline.
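The `pass`/`fail` helpers that such assertions rely on appear in the review hunks quoted elsewhere in this thread; their bodies are not shown in this PR, so the sketch below is an assumption about their shape, with counters and a summary line added for illustration:

```shell
#!/usr/bin/env bash
# Illustrative pass/fail assertion helpers for a Bash test harness.
# Only the names pass/fail come from the quoted snippets; the counter
# variables and summary format are assumptions.
TESTS_PASSED=0
TESTS_FAILED=0

pass() {
    TESTS_PASSED=$((TESTS_PASSED + 1))
    echo "PASS: $1"
    return 0
}

fail() {
    TESTS_FAILED=$((TESTS_FAILED + 1))
    echo "FAIL: $1" >&2
    return 1
}

summary() {
    echo "Assertions: $((TESTS_PASSED + TESTS_FAILED)) total, $TESTS_PASSED passed, $TESTS_FAILED failed"
    [[ "$TESTS_FAILED" -eq 0 ]]
    return $?
}

pass "example assertion"
summary
```

The `summary` exit status lets a CI job fail the step when any assertion failed, without aborting mid-run the way `set -e` would.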
Force-pushed from 3ea6ad2 to cf0d05d
🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report
[INFO] Latest Quality Status:
[INFO] Recent monitoring activity:

📈 Current Quality Metrics

Generated on: Sat Feb 21 17:34:05 UTC 2026
Generated by AI DevOps Framework Code Review Monitoring
Actionable comments posted: 1
🧹 Nitpick comments (5)
tests/test-multi-container-batch-dispatch.sh (5)
186-194: `_run_isolated_test` swallows all stderr, making failures hard to diagnose.

Line 190 redirects all stderr to `/dev/null`. If a sourced module or the test body produces an unexpected error, it will be silently eaten. Consider routing stderr to a temp file and dumping it only on failure, or honoring the `--verbose` flag.

Sketch:

```diff
-    bash "$test_script" 2>/dev/null
+    local stderr_file="$TEST_DIR/isolated-stderr-$$.log"
+    bash "$test_script" 2>"$stderr_file"
     local rc=$?
+    if [[ $rc -ne 0 && "$VERBOSE" == "--verbose" ]]; then
+        cat "$stderr_file" >&2
+    fi
+    rm -f "$stderr_file"
     rm -f "$test_script"
     return $rc
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test-multi-container-batch-dispatch.sh` around lines 186 - 194, The helper _run_isolated_test currently discards all stderr by running bash "$test_script" 2>/dev/null; change this to capture stderr to a temp file (e.g., "${test_script}.err") and only suppress output on success, printing the stderr file on non-zero exit (or when a global verbose flag is set) so test failures and sourced-module errors are visible; update the function to write stderr to that temp file, check the return code from bash, and cat the stderr file on failure or when --verbose is enabled before cleaning up.
308-494: Heavy duplication across six OAuth routing test blocks.

Each of the six `_run_isolated_test` heredocs (lines 309–329, 340–360, 371–391, 402–422, 436–456, 467–487) repeats ~18 identical setup lines, varying only `SUPERVISOR_PREFER_OAUTH` and the model argument to `resolve_ai_cli`. Extracting a helper would cut ~170 lines of near-identical code and make adding new routing test cases trivial.

Sketch: extract a reusable routing test helper

```diff
+# Helper: resolve CLI for a given model under specified OAuth preference
+resolve_cli_for_model() {
+    local model="$1"
+    local prefer_oauth="${2:-true}"
+    _run_isolated_test <<EOF
+#!/usr/bin/env bash
+set -euo pipefail
+export AIDEVOPS_SUPERVISOR_DIR='$TEST_DIR/supervisor'
+export PATH='$MOCK_BIN':/usr/bin:/bin
+export SUPERVISOR_PREFER_OAUTH=$prefer_oauth
+unset SUPERVISOR_CLI 2>/dev/null || true
+BLUE='' GREEN='' YELLOW='' RED='' NC=''
+SUPERVISOR_LOG='/dev/null'
+SUPERVISOR_DIR='$TEST_DIR/supervisor'
+source '$SHARED_CONSTANTS'
+source '$SUPERVISOR_DIR_MODULE/_common.sh'
+source '$SUPERVISOR_DIR_MODULE/dispatch.sh'
+mock_home=\$(mktemp -d)
+export HOME=\$mock_home
+mkdir -p "\$mock_home/.claude"
+echo '{"hasCompletedOnboarding":true}' > "\$mock_home/.claude/settings.json"
+rm -f '$TEST_DIR/supervisor/health/claude-oauth' 2>/dev/null
+resolve_ai_cli '$model'
+rm -rf "\$mock_home"
+EOF
+}
```

Then each test collapses to:

```sh
oauth_opus_cli=$(resolve_cli_for_model 'anthropic/claude-opus-4-6' true)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test-multi-container-batch-dispatch.sh` around lines 308 - 494, The tests duplicate the same ~18-line setup across multiple _run_isolated_test heredocs around resolve_ai_cli; extract a helper function (e.g., resolve_cli_for_model) that accepts the model string and a SUPERVISOR_PREFER_OAUTH flag, builds and runs the single heredoc calling _run_isolated_test and resolve_ai_cli (reusing the same HOME/mock setup, sourcing _common.sh and dispatch.sh, and cleanup), then replace the six repeated blocks with calls like oauth_opus_cli=$(resolve_cli_for_model 'anthropic/claude-opus-4-6' true) (ensure the helper toggles SUPERVISOR_PREFER_OAUTH and unsets SUPERVISOR_CLI as before and returns the resolve_ai_cli output).
140-143: Redundant `return $?`.

In Bash, a function implicitly returns the exit code of its last command. The explicit `return $?` on line 142 is a no-op. Not harmful, but removing it keeps the helper minimal.

Proposed fix:

```diff
 sup() {
-    bash "$SUPERVISOR_SCRIPT" "$@" 2>&1
-    return $?
+    bash "$SUPERVISOR_SCRIPT" "$@" 2>&1
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test-multi-container-batch-dispatch.sh` around lines 140 - 143, The helper function sup currently ends with an unnecessary explicit "return $?" which is redundant because Bash functions return the exit status of the last command; remove the "return $?" line from the sup() function (which runs bash "$SUPERVISOR_SCRIPT" "$@" 2>&1) so the function simply invokes that command and lets its exit status propagate implicitly.
1018-1022: Adaptive concurrency assertion only checks for "is a number", not the expected value.

The test calls `calculate_adaptive_concurrency 3 2 6` but only verifies the result is a non-empty integer. It doesn't assert the expected output (e.g., `6` if the formula is `min(base * factor, max)`). This makes it a smoke test rather than a correctness test. Consider asserting the expected value.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test-multi-container-batch-dispatch.sh` around lines 1018 - 1022, The test only verifies that adaptive_result is a non-empty integer; change it to compute the expected value (e.g., expected = min(base * factor, max) for the call calculate_adaptive_concurrency 3 2 6 → expected 6) and assert adaptive_result equals that expected value instead of only matching a number; update the conditional that currently checks "$adaptive_result" =~ ^[0-9]+$ to compare numeric equality between adaptive_result and the calculated expected, and produce the pass/skip messages referencing calculate_adaptive_concurrency and adaptive_result accordingly.
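A runnable sketch of the tightened assertion. Note the assumption, not confirmed by this PR, that `calculate_adaptive_concurrency base factor max` computes `min(base * factor, max)`; the function body below is a stand-in stub so the assertion shape can run on its own:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stub standing in for the real function; assumes min(base * factor, max).
calculate_adaptive_concurrency() {
    local base="$1" factor="$2" max="$3"
    local scaled=$((base * factor))
    if ((scaled > max)); then
        scaled=$max
    fi
    echo "$scaled"
    return 0
}

# Compute the expected value from the same formula, then assert equality
# instead of merely matching ^[0-9]+$.
base=3 factor=2 max=6
expected=$((base * factor))
if ((expected > max)); then
    expected=$max
fi

adaptive_result=$(calculate_adaptive_concurrency "$base" "$factor" "$max")
if [[ "$adaptive_result" -eq "$expected" ]]; then
    echo "PASS: calculate_adaptive_concurrency $base $factor $max -> $adaptive_result"
else
    echo "FAIL: expected $expected, got $adaptive_result"
fi
```

If the real formula differs, only the `expected` computation needs to change; the equality assertion stays the same.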
88-102: Mock default log paths fall back to shared `/tmp` locations.

Lines 90 and 108 define fallback paths (`/tmp/mock-claude-invocations.log`, `/tmp/mock-opencode-invocations.log`) inside the heredoc. While `MOCK_CLAUDE_LOG`/`MOCK_OPENCODE_LOG` are exported on lines 120–121 to override these defaults, the fallback creates a small risk: if the env var isn't inherited (e.g., a subshell `env -i` scenario), logs land in a shared `/tmp` file, potentially colliding across parallel CI runs or leaking test artifact details.

Consider pointing the fallback at a path that's guaranteed unique (e.g., derived from `$$` or `mktemp`), or removing the default entirely and failing fast if the env var is unset.

Proposed fix:

```diff
-MOCK_LOG="${MOCK_CLAUDE_LOG:-/tmp/mock-claude-invocations.log}"
+MOCK_LOG="${MOCK_CLAUDE_LOG:?MOCK_CLAUDE_LOG must be set}"
```

```diff
-MOCK_LOG="${MOCK_OPENCODE_LOG:-/tmp/mock-opencode-invocations.log}"
+MOCK_LOG="${MOCK_OPENCODE_LOG:?MOCK_OPENCODE_LOG must be set}"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test-multi-container-batch-dispatch.sh` around lines 88 - 102, The mock scripts embed unsafe shared /tmp fallbacks for MOCK_CLAUDE_LOG and MOCK_OPENCODE_LOG inside the heredocs which can collide in CI; update the heredoc defaults to use unique per-process names (e.g., derive from $$ or use mktemp) or remove the inline fallback entirely and make the script fail fast if MOCK_CLAUDE_LOG / MOCK_OPENCODE_LOG are not set (these env vars are exported later in the test harness), so that logs never silently land in a global /tmp file.
```sh
pid_count=0
for tid in mc-t1 mc-t2 mc-t3 mc-t4 mc-t5 mc-t6; do
    if [[ -f "$TEST_DIR/supervisor/pids/${tid}.pid" ]]; then
        pid_count=$((pid_count + 1))
    fi
done
if [[ "$pid_count" -ge 3 ]]; then
    pass "PID files exist for dispatched workers ($pid_count found)"
else
    fail "Expected >= 3 PID files, got $pid_count"
fi
```
Loose assertion: PID file count check uses >= 3 when all 6 should exist.
PID files were created for mc-t1..mc-t3 (line 528) and mc-t4..mc-t6 (line 767), so all 6 should be present before any cleanup runs. The >= 3 threshold could silently pass even if half the PID files were never written.
Proposed fix
```diff
-if [[ "$pid_count" -ge 3 ]]; then
-    pass "PID files exist for dispatched workers ($pid_count found)"
+if [[ "$pid_count" -eq 6 ]]; then
+    pass "PID files exist for all 6 dispatched workers"
 else
-    fail "Expected >= 3 PID files, got $pid_count"
+    fail "Expected 6 PID files, got $pid_count"
 fi
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test-multi-container-batch-dispatch.sh` around lines 910 - 920, The PID
file existence assertion is too loose—update the check that iterates over tid in
mc-t1..mc-t6 (the pid_count logic that tests
"$TEST_DIR/supervisor/pids/${tid}.pid") to require all 6 PID files instead of
">= 3"; change the conditional to compare against 6 and adjust the pass/fail
messages ("PID files exist for dispatched workers ($pid_count found)" and
"Expected >= 3 PID files, got $pid_count") to reflect that 6 are expected so
failures are clear.
Auto-dismissed: bot review does not block autonomous pipeline



Ref #1766