t1165.2: Container pool manager for supervisor#2184
Conversation
Implements container pool management in the supervisor with:
- Spawn/destroy Docker containers with OAuth token injection
- Health checks (Docker state + rate limit cooldown expiry)
- Round-robin dispatch across healthy containers (oldest-first)
- Per-container rate limit tracking with configurable cooldown
- Idle container cleanup respecting pool minimum size
- CLI: supervisor-helper.sh pool {spawn|destroy|list|status|health|select|rate-limit|cleanup}
- DB schema: container_pool + container_dispatch_log tables
- Integration hook in dispatch.sh (opt-in via SUPERVISOR_USE_CONTAINER_POOL=true)
- 45/45 unit tests passing, ShellCheck clean
Summary of ChangesHello @marcusquinn, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the supervisor's capabilities by introducing a sophisticated container pooling system. This system automates the management of worker containers, ensuring efficient resource utilization, load balancing through round-robin dispatch, and resilience against rate limits or unhealthy containers. It provides a scalable foundation for executing tasks in parallel, improving overall system performance and reliability. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
WalkthroughThis PR introduces a comprehensive container pool management system to the supervisor framework. It adds a new pool subcommand with SQLite-backed lifecycle management, health monitoring, round-robin container selection with rate limiting, dispatch logging, and idle cleanup. Changes
Sequence DiagramsequenceDiagram
participant CLI as supervisor-helper.sh pool
participant PM as pool_*() functions
participant DB as SQLite Database
participant Docker as Docker CLI
CLI->>PM: cmd_pool spawn [args]
PM->>DB: Insert container record (status=starting)
PM->>Docker: docker run (local or remote)
Docker-->>PM: Container ID
PM->>DB: Update with container ID & image
rect rgba(100, 150, 200, 0.5)
Note over PM,DB: Health Check Loop
PM->>Docker: docker inspect
PM->>DB: Update status & health_check_ts
end
rect rgba(150, 200, 100, 0.5)
Note over PM,DB: Dispatch Flow (when enabled)
CLI->>PM: pool_select_container()
PM->>DB: Query healthy containers (skip rate-limited)
PM-->>CLI: Container (round-robin)
CLI->>PM: pool_record_dispatch(container_id)
PM->>DB: Insert dispatch_log entry
PM->>DB: Increment dispatch_count
end
rect rgba(200, 100, 100, 0.5)
Note over PM,DB: Completion & Rate Limiting
CLI->>PM: pool_record_completion(container_id, outcome)
PM->>DB: Update completion_ts
alt Rate Limited Outcome
PM->>PM: pool_mark_rate_limited()
PM->>DB: Set rate_limit_until cooldown
end
end
CLI->>PM: pool_destroy_idle()
PM->>DB: Query idle containers
PM->>Docker: docker stop/rm
PM->>DB: Update status=stopped
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
🔍 Code Quality Report�[0;35m[MONITOR]�[0m Code Review Monitoring Report �[0;34m[INFO]�[0m Latest Quality Status: �[0;34m[INFO]�[0m Recent monitoring activity: 📈 Current Quality Metrics
Generated on: Mon Feb 23 01:18:32 UTC 2026 Generated by AI DevOps Framework Code Review Monitoring |
|
There was a problem hiding this comment.
Code Review
This is an excellent pull request that adds a significant and well-tested container pool manager feature. The new module is well-structured, and the inclusion of a comprehensive test suite is commendable. My review includes a critical fix to prevent error suppression and a suggestion to optimize database queries for better performance, both aligned with established repository rules.
| pool_container_id=$(pool_select_container 2>/dev/null) || pool_container_id="" | ||
| if [[ -n "$pool_container_id" ]]; then | ||
| log_info "Container pool: selected $pool_container_id for $task_id (t1165.2)" | ||
| pool_record_dispatch "$pool_container_id" "$task_id" 2>/dev/null || true |
There was a problem hiding this comment.
The use of 2>/dev/null here suppresses all error output from pool_record_dispatch. This is risky because the function performs database operations that could fail for various reasons (e.g., database locked, disk full, schema mismatch). Suppressing these errors can hide critical problems and make debugging difficult. The project's general rules advise against suppressing stderr in such cases, and specifically recommend against blanket suppression of command errors to ensure issues remain visible for debugging. While || true is used to prevent script exit, 2>/dev/null should be removed to allow potential errors to be logged.
| pool_record_dispatch "$pool_container_id" "$task_id" 2>/dev/null || true | |
| pool_record_dispatch "$pool_container_id" "$task_id" || true |
References
- Avoid using '2>/dev/null' for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.
- In shell scripts with 'set -e' enabled, use '|| true' to prevent the script from exiting when a command like 'jq' fails on an optional lookup. Do not suppress stderr with '2>/dev/null' so that actual syntax or system errors remain visible for debugging.
| local total healthy unhealthy rate_limited stopped failed | ||
| total=$(db "$SUPERVISOR_DB" "SELECT COUNT(*) FROM container_pool;" 2>/dev/null || echo "0") | ||
| healthy=$(db "$SUPERVISOR_DB" "SELECT COUNT(*) FROM container_pool WHERE status='healthy';" 2>/dev/null || echo "0") | ||
| unhealthy=$(db "$SUPERVISOR_DB" "SELECT COUNT(*) FROM container_pool WHERE status='unhealthy';" 2>/dev/null || echo "0") | ||
| rate_limited=$(db "$SUPERVISOR_DB" "SELECT COUNT(*) FROM container_pool WHERE status='rate_limited';" 2>/dev/null || echo "0") | ||
| stopped=$(db "$SUPERVISOR_DB" "SELECT COUNT(*) FROM container_pool WHERE status='stopped';" 2>/dev/null || echo "0") | ||
| failed=$(db "$SUPERVISOR_DB" "SELECT COUNT(*) FROM container_pool WHERE status='failed';" 2>/dev/null || echo "0") |
There was a problem hiding this comment.
These six separate database queries to get status counts can be consolidated into a single, more efficient query using conditional aggregation. This reduces the overhead of multiple sqlite3 process invocations and database accesses, making the function faster and more atomic. This aligns with the rule to efficiently fetch multiple aggregate statistics from SQLite using a single query that returns a delimited string. Additionally, the 2>/dev/null in the suggested code should be removed to avoid blanket suppression of errors, ensuring that potential issues like database locks or syntax errors remain visible for debugging, as per repository guidelines. The || echo fallback is sufficient to provide default values without hiding errors.
| local total healthy unhealthy rate_limited stopped failed | |
| total=$(db "$SUPERVISOR_DB" "SELECT COUNT(*) FROM container_pool;" 2>/dev/null || echo "0") | |
| healthy=$(db "$SUPERVISOR_DB" "SELECT COUNT(*) FROM container_pool WHERE status='healthy';" 2>/dev/null || echo "0") | |
| unhealthy=$(db "$SUPERVISOR_DB" "SELECT COUNT(*) FROM container_pool WHERE status='unhealthy';" 2>/dev/null || echo "0") | |
| rate_limited=$(db "$SUPERVISOR_DB" "SELECT COUNT(*) FROM container_pool WHERE status='rate_limited';" 2>/dev/null || echo "0") | |
| stopped=$(db "$SUPERVISOR_DB" "SELECT COUNT(*) FROM container_pool WHERE status='stopped';" 2>/dev/null || echo "0") | |
| failed=$(db "$SUPERVISOR_DB" "SELECT COUNT(*) FROM container_pool WHERE status='failed';" 2>/dev/null || echo "0") | |
| local total healthy unhealthy rate_limited stopped failed | |
| local stats_line | |
| stats_line=$(db -separator ' ' "$SUPERVISOR_DB" " | |
| SELECT | |
| COALESCE(COUNT(*), 0), | |
| COALESCE(SUM(CASE WHEN status = 'healthy' THEN 1 ELSE 0 END), 0), | |
| COALESCE(SUM(CASE WHEN status = 'unhealthy' THEN 1 ELSE 0 END), 0), | |
| COALESCE(SUM(CASE WHEN status = 'rate_limited' THEN 1 ELSE 0 END), 0), | |
| COALESCE(SUM(CASE WHEN status = 'stopped' THEN 1 ELSE 0 END), 0), | |
| COALESCE(SUM(CASE WHEN status = 'failed' THEN 1 ELSE 0 END), 0) | |
| FROM container_pool; | |
| " || echo "0 0 0 0 0 0") | |
| read -r total healthy unhealthy rate_limited stopped failed <<<"$stats_line" |
References
- To efficiently fetch multiple aggregate statistics from SQLite in a shell script, use a single query with subselects that returns a delimited string, and parse it using
IFS='|' read. - Avoid using '2>/dev/null' for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.agents/scripts/supervisor/container-pool.sh:
- Around line 182-197: The indirect expansion ${!token_ref} can fail under set
-u when token_ref contains characters invalid for an env var name (e.g. paths
like "foo/bar"); before falling back to the env-var lookup in the token
resolution block, validate token_ref is a legal shell identifier (e.g. matches
/^[a-zA-Z_][a-zA-Z0-9_]*$/) and only then perform the indirect expansion to set
token_value, otherwise skip that branch (and keep the existing warning path).
Update the code around token_ref/token_value/docker_args in the token injection
logic (the block that attempts gopass then env var) to perform this identifier
check and avoid indirect expansion errors.
- Around line 483-490: The SELECT that assigns selected in the container-pool
script must exclude containers with active/incomplete dispatches: update the SQL
inside the db "$SUPERVISOR_DB" call (the query that selects from container_pool)
to add an exclusion such as AND NOT EXISTS (SELECT 1 FROM dispatch WHERE
dispatch.container_id = container_pool.id AND dispatch.completed_at IS NULL)
(or, if your schema uses a status column, AND NOT EXISTS (SELECT 1 FROM dispatch
WHERE dispatch.container_id = container_pool.id AND dispatch.status NOT IN
('succeeded','failed','cancelled'))). This change should be made where selected
is set so the round-robin never picks a container that currently has an
unfinished dispatch.
- Around line 567-577: pool_mark_rate_limited currently interpolates the
cooldown directly into the SQL timestamp expression; validate and sanitize the
cooldown before interpolation by ensuring it's a non-negative integer (e.g.
check with a digit-only regex or coerce with printf '%d' and fallback to the
default CONTAINER_POOL_RATE_LIMIT_COOLDOWN if invalid), optionally clamp to a
reasonable max, then use that sanitized numeric variable in the SQL (leave
sql_escape for container_id unchanged). Update the pool_mark_rate_limited
function to perform this validation/fallback and reference the sanitized
variable (e.g. cooldown_seconds) in the strftime('+${cooldown_seconds} seconds')
fragment.
In @.agents/scripts/supervisor/database.sh:
- Around line 767-776: ensure_db and init_db call the helper
_create_container_pool_schema which is defined in container-pool.sh but
database.sh doesn’t source that file; update database.sh to ensure the function
is available before calling it by adding a defensive check in ensure_db/init_db:
test whether _create_container_pool_schema is defined (e.g., via declare -f or
type) and if not, source container-pool.sh (or abort with a clear error), or
alternatively source container-pool.sh near the top of database.sh to guarantee
availability; reference ensure_db, init_db, and _create_container_pool_schema
when making the change.
In @.agents/scripts/supervisor/dispatch.sh:
- Around line 3222-3234: The pool selection is never wired into routing—after
getting pool_container_id you must set and persist the task's dispatch target
and short-circuit to a pool-specific dispatch branch so the selected container
is actually used. Concretely: when pool_select_container returns a non-empty
pool_container_id, set dispatch_target="pool:$pool_container_id", persist that
value to whatever task metadata/store the dispatcher later reads (so downstream
code sees it), and route the task through the pool-specific dispatch path that
sends the job to that container (and keep calling pool_record_dispatch to update
metrics). Alternatively, only invoke pool_select_container when the subsequent
routing supports a "pool:" dispatch_target; ensure any routing logic (the parts
that read dispatch_target) recognizes the "pool:<id>" prefix and sends the task
to that container.
| # Inject OAuth token if provided | ||
| if [[ -n "$token_ref" ]]; then | ||
| local token_value="" | ||
| # Try gopass first, then env var | ||
| if command -v gopass &>/dev/null; then | ||
| token_value=$(gopass show "$token_ref" 2>/dev/null || echo "") | ||
| fi | ||
| if [[ -z "$token_value" ]]; then | ||
| # Try as env var name | ||
| token_value="${!token_ref:-}" | ||
| fi | ||
| if [[ -n "$token_value" ]]; then | ||
| docker_args+=("-e" "CLAUDE_CODE_OAUTH_TOKEN=$token_value") | ||
| else | ||
| log_warn "OAuth token ref '$token_ref' could not be resolved — container may lack auth" | ||
| fi |
There was a problem hiding this comment.
Guard env-var fallback for non‑identifier token refs.
If gopass isn’t available and --token-ref is a path (e.g., foo/bar), ${!token_ref} triggers an “invalid indirect expansion” under set -u and aborts pool_spawn. Only attempt indirect expansion for valid env var names.
Suggested fix
- if [[ -z "$token_value" ]]; then
- # Try as env var name
- token_value="${!token_ref:-}"
- fi
+ if [[ -z "$token_value" ]]; then
+ # Try as env var name only when token_ref is a valid identifier
+ if [[ "$token_ref" =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]]; then
+ token_value="${!token_ref:-}"
+ fi
+ fi📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Inject OAuth token if provided | |
| if [[ -n "$token_ref" ]]; then | |
| local token_value="" | |
| # Try gopass first, then env var | |
| if command -v gopass &>/dev/null; then | |
| token_value=$(gopass show "$token_ref" 2>/dev/null || echo "") | |
| fi | |
| if [[ -z "$token_value" ]]; then | |
| # Try as env var name | |
| token_value="${!token_ref:-}" | |
| fi | |
| if [[ -n "$token_value" ]]; then | |
| docker_args+=("-e" "CLAUDE_CODE_OAUTH_TOKEN=$token_value") | |
| else | |
| log_warn "OAuth token ref '$token_ref' could not be resolved — container may lack auth" | |
| fi | |
| # Inject OAuth token if provided | |
| if [[ -n "$token_ref" ]]; then | |
| local token_value="" | |
| # Try gopass first, then env var | |
| if command -v gopass &>/dev/null; then | |
| token_value=$(gopass show "$token_ref" 2>/dev/null || echo "") | |
| fi | |
| if [[ -z "$token_value" ]]; then | |
| # Try as env var name only when token_ref is a valid identifier | |
| if [[ "$token_ref" =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]]; then | |
| token_value="${!token_ref:-}" | |
| fi | |
| fi | |
| if [[ -n "$token_value" ]]; then | |
| docker_args+=("-e" "CLAUDE_CODE_OAUTH_TOKEN=$token_value") | |
| else | |
| log_warn "OAuth token ref '$token_ref' could not be resolved — container may lack auth" | |
| fi |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.agents/scripts/supervisor/container-pool.sh around lines 182 - 197, The
indirect expansion ${!token_ref} can fail under set -u when token_ref contains
characters invalid for an env var name (e.g. paths like "foo/bar"); before
falling back to the env-var lookup in the token resolution block, validate
token_ref is a legal shell identifier (e.g. matches /^[a-zA-Z_][a-zA-Z0-9_]*$/)
and only then perform the indirect expansion to set token_value, otherwise skip
that branch (and keep the existing warning path). Update the code around
token_ref/token_value/docker_args in the token injection logic (the block that
attempts gopass then env var) to perform this identifier check and avoid
indirect expansion errors.
| local selected | ||
| selected=$(db "$SUPERVISOR_DB" " | ||
| SELECT id FROM container_pool | ||
| WHERE status = 'healthy' | ||
| AND (rate_limit_until IS NULL OR rate_limit_until <= strftime('%Y-%m-%dT%H:%M:%SZ','now')) | ||
| $host_clause | ||
| ORDER BY COALESCE(last_dispatch_at, '1970-01-01T00:00:00Z') ASC | ||
| LIMIT 1; |
There was a problem hiding this comment.
Exclude containers with active dispatches from selection.
Round‑robin should not pick a container that already has an incomplete dispatch. Otherwise a busy container can receive multiple tasks.
Suggested fix
selected=$(db "$SUPERVISOR_DB" "
SELECT id FROM container_pool
WHERE status = 'healthy'
AND (rate_limit_until IS NULL OR rate_limit_until <= strftime('%Y-%m-%dT%H:%M:%SZ','now'))
$host_clause
+ AND NOT EXISTS (
+ SELECT 1 FROM container_dispatch_log cdl
+ WHERE cdl.container_id = container_pool.id AND cdl.completed_at IS NULL
+ )
ORDER BY COALESCE(last_dispatch_at, '1970-01-01T00:00:00Z') ASC
LIMIT 1;
" 2>/dev/null || echo "")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| local selected | |
| selected=$(db "$SUPERVISOR_DB" " | |
| SELECT id FROM container_pool | |
| WHERE status = 'healthy' | |
| AND (rate_limit_until IS NULL OR rate_limit_until <= strftime('%Y-%m-%dT%H:%M:%SZ','now')) | |
| $host_clause | |
| ORDER BY COALESCE(last_dispatch_at, '1970-01-01T00:00:00Z') ASC | |
| LIMIT 1; | |
| local selected | |
| selected=$(db "$SUPERVISOR_DB" " | |
| SELECT id FROM container_pool | |
| WHERE status = 'healthy' | |
| AND (rate_limit_until IS NULL OR rate_limit_until <= strftime('%Y-%m-%dT%H:%M:%SZ','now')) | |
| $host_clause | |
| AND NOT EXISTS ( | |
| SELECT 1 FROM container_dispatch_log cdl | |
| WHERE cdl.container_id = container_pool.id AND cdl.completed_at IS NULL | |
| ) | |
| ORDER BY COALESCE(last_dispatch_at, '1970-01-01T00:00:00Z') ASC | |
| LIMIT 1; |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.agents/scripts/supervisor/container-pool.sh around lines 483 - 490, The
SELECT that assigns selected in the container-pool script must exclude
containers with active/incomplete dispatches: update the SQL inside the db
"$SUPERVISOR_DB" call (the query that selects from container_pool) to add an
exclusion such as AND NOT EXISTS (SELECT 1 FROM dispatch WHERE
dispatch.container_id = container_pool.id AND dispatch.completed_at IS NULL)
(or, if your schema uses a status column, AND NOT EXISTS (SELECT 1 FROM dispatch
WHERE dispatch.container_id = container_pool.id AND dispatch.status NOT IN
('succeeded','failed','cancelled'))). This change should be made where selected
is set so the round-robin never picks a container that currently has an
unfinished dispatch.
| pool_mark_rate_limited() { | ||
| local container_id="$1" | ||
| local cooldown="${2:-$CONTAINER_POOL_RATE_LIMIT_COOLDOWN}" | ||
|
|
||
| db "$SUPERVISOR_DB" " | ||
| UPDATE container_pool | ||
| SET status = 'rate_limited', | ||
| rate_limit_until = strftime('%Y-%m-%dT%H:%M:%SZ','now','+${cooldown} seconds'), | ||
| rate_limit_count = rate_limit_count + 1, | ||
| updated_at = strftime('%Y-%m-%dT%H:%M:%SZ','now') | ||
| WHERE id = '$(sql_escape "$container_id")'; |
There was a problem hiding this comment.
Validate cooldown to prevent SQL injection / malformed timestamps.
cooldown is interpolated directly into SQL; a non‑numeric value can break the query or inject SQL. Validate as an integer and fall back to the default.
Suggested fix
pool_mark_rate_limited() {
local container_id="$1"
local cooldown="${2:-$CONTAINER_POOL_RATE_LIMIT_COOLDOWN}"
+
+ if ! [[ "$cooldown" =~ ^[0-9]+$ ]]; then
+ log_warn "Invalid cooldown '$cooldown' — falling back to default"
+ cooldown="$CONTAINER_POOL_RATE_LIMIT_COOLDOWN"
+ fi
db "$SUPERVISOR_DB" "
UPDATE container_pool📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| pool_mark_rate_limited() { | |
| local container_id="$1" | |
| local cooldown="${2:-$CONTAINER_POOL_RATE_LIMIT_COOLDOWN}" | |
| db "$SUPERVISOR_DB" " | |
| UPDATE container_pool | |
| SET status = 'rate_limited', | |
| rate_limit_until = strftime('%Y-%m-%dT%H:%M:%SZ','now','+${cooldown} seconds'), | |
| rate_limit_count = rate_limit_count + 1, | |
| updated_at = strftime('%Y-%m-%dT%H:%M:%SZ','now') | |
| WHERE id = '$(sql_escape "$container_id")'; | |
| pool_mark_rate_limited() { | |
| local container_id="$1" | |
| local cooldown="${2:-$CONTAINER_POOL_RATE_LIMIT_COOLDOWN}" | |
| if ! [[ "$cooldown" =~ ^[0-9]+$ ]]; then | |
| log_warn "Invalid cooldown '$cooldown' — falling back to default" | |
| cooldown="$CONTAINER_POOL_RATE_LIMIT_COOLDOWN" | |
| fi | |
| db "$SUPERVISOR_DB" " | |
| UPDATE container_pool | |
| SET status = 'rate_limited', | |
| rate_limit_until = strftime('%Y-%m-%dT%H:%M:%SZ','now','+${cooldown} seconds'), | |
| rate_limit_count = rate_limit_count + 1, | |
| updated_at = strftime('%Y-%m-%dT%H:%M:%SZ','now') | |
| WHERE id = '$(sql_escape "$container_id")'; |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.agents/scripts/supervisor/container-pool.sh around lines 567 - 577,
pool_mark_rate_limited currently interpolates the cooldown directly into the SQL
timestamp expression; validate and sanitize the cooldown before interpolation by
ensuring it's a non-negative integer (e.g. check with a digit-only regex or
coerce with printf '%d' and fallback to the default
CONTAINER_POOL_RATE_LIMIT_COOLDOWN if invalid), optionally clamp to a reasonable
max, then use that sanitized numeric variable in the SQL (leave sql_escape for
container_id unchanged). Update the pool_mark_rate_limited function to perform
this validation/fallback and reference the sanitized variable (e.g.
cooldown_seconds) in the strftime('+${cooldown_seconds} seconds') fragment.
| # Migrate: create container_pool tables if missing (t1165.2) | ||
| # Container pool manager — spawn/destroy containers, health checks, | ||
| # round-robin dispatch, per-container rate limit tracking. | ||
| local has_container_pool | ||
| has_container_pool=$(db "$SUPERVISOR_DB" "SELECT count(*) FROM sqlite_master WHERE type='table' AND name='container_pool';" 2>/dev/null || echo "0") | ||
| if [[ "$has_container_pool" -eq 0 ]]; then | ||
| log_info "Creating container_pool tables (t1165.2)..." | ||
| _create_container_pool_schema | ||
| log_success "Created container_pool and container_dispatch_log tables (t1165.2)" | ||
| fi |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find all scripts that source database.sh
echo "=== Scripts sourcing database.sh ==="
rg -n "source .*database\.sh|^\s*\. .*database\.sh" -g '*.sh' --no-heading
echo -e "\n=== Scripts sourcing container-pool.sh ==="
rg -n "source .*container-pool\.sh|^\s*\. .*container-pool\.sh" -g '*.sh' --no-heading
echo -e "\n=== Entry points (main scripts in supervisor/) ==="
fd -t f -e sh "^[^.]" .agents/scripts/supervisor/ | head -20
echo -e "\n=== Check supervisor/ directory structure ==="
ls -la .agents/scripts/supervisor/ | grep -E "\.sh$"Repository: marcusquinn/aidevops
Length of output: 3856
🏁 Script executed:
#!/bin/bash
echo "=== Check backfill-blocked-comments.sh calls to ensure_db/init_db ==="
rg -n "ensure_db|init_db" .agents/scripts/supervisor/backfill-blocked-comments.sh
echo -e "\n=== Check test-dual-cli-e2e.sh calls to ensure_db/init_db (sample) ==="
rg -n "ensure_db|init_db" tests/test-dual-cli-e2e.sh | head -10
echo -e "\n=== Check test-dispatch-claude-cli.sh calls to ensure_db/init_db (sample) ==="
rg -n "ensure_db|init_db" tests/test-dispatch-claude-cli.sh | head -10
echo -e "\n=== Check if _create_container_pool_schema is conditionally defined ==="
rg -n "_create_container_pool_schema" .agents/scripts/supervisor/database.sh
echo -e "\n=== Check sourcing order in supervisor-helper.sh ==="
sed -n '210,245p' .agents/scripts/supervisor-helper.shRepository: marcusquinn/aidevops
Length of output: 2290
🏁 Script executed:
#!/bin/bash
echo "=== Check for module-level calls to ensure_db/init_db in database.sh ==="
# Look for calls outside of function definitions
rg -n "ensure_db|init_db" .agents/scripts/supervisor/database.sh | head -20
echo -e "\n=== Look at the actual function definitions and calls in database.sh ==="
rg -B2 -A5 "^ensure_db\(\)|^init_db\(\)" .agents/scripts/supervisor/database.sh | head -40
echo -e "\n=== Check if backfill-blocked-comments.sh actually needs container-pool ==="
cat .agents/scripts/supervisor/backfill-blocked-comments.shRepository: marcusquinn/aidevops
Length of output: 4504
🏁 Script executed:
#!/bin/bash
echo "=== Context around lines 941 and 973 (ensure_db calls) ==="
sed -n '935,945p' .agents/scripts/supervisor/database.sh
echo "---"
sed -n '968,978p' .agents/scripts/supervisor/database.sh
echo -e "\n=== Check what function/block contains these calls ==="
sed -n '900,950p' .agents/scripts/supervisor/database.sh | cat -nRepository: marcusquinn/aidevops
Length of output: 2950
🏁 Script executed:
#!/bin/bash
echo "=== Check end of database.sh for module-level invocations ==="
tail -50 .agents/scripts/supervisor/database.sh
echo -e "\n=== Check if database.sh sources container-pool.sh internally ==="
head -50 .agents/scripts/supervisor/database.sh | grep -i "source.*container-pool"Repository: marcusquinn/aidevops
Length of output: 1368
Potential scoping issue: _create_container_pool_schema is called by ensure_db/init_db but may not be defined at call time.
ensure_db (lines 767-776) and init_db (line 930) invoke _create_container_pool_schema, which is defined in container-pool.sh. However, database.sh does not source container-pool.sh internally. While supervisor-helper.sh correctly sources container-pool.sh after database.sh, direct consumers of database.sh (such as backfill-blocked-comments.sh or test utilities) may source only database.sh without container-pool.sh. This creates a fragile dependency: if any code path manually sources database.sh and later calls ensure_db or init_db, it will fail with an undefined function error.
Recommendation: Either (1) add source container-pool.sh at the start of database.sh to guarantee the function is available, (2) move _create_container_pool_schema into database.sh, or (3) add a defensive check in ensure_db/init_db to validate the function exists before calling it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.agents/scripts/supervisor/database.sh around lines 767 - 776, ensure_db and
init_db call the helper _create_container_pool_schema which is defined in
container-pool.sh but database.sh doesn’t source that file; update database.sh
to ensure the function is available before calling it by adding a defensive
check in ensure_db/init_db: test whether _create_container_pool_schema is
defined (e.g., via declare -f or type) and if not, source container-pool.sh (or
abort with a clear error), or alternatively source container-pool.sh near the
top of database.sh to guarantee availability; reference ensure_db, init_db, and
_create_container_pool_schema when making the change.
| # t1165.2: Container pool dispatch — if pool has healthy containers, | ||
| # select one via round-robin and record the dispatch. The container ID | ||
| # is stored in dispatch_target as "pool:<container_id>" for tracking. | ||
| # Pool dispatch is opt-in: only used when SUPERVISOR_USE_CONTAINER_POOL=true | ||
| # or when the task has dispatch_target=pool. | ||
| local pool_container_id="" | ||
| if [[ "${SUPERVISOR_USE_CONTAINER_POOL:-false}" == "true" ]]; then | ||
| pool_container_id=$(pool_select_container 2>/dev/null) || pool_container_id="" | ||
| if [[ -n "$pool_container_id" ]]; then | ||
| log_info "Container pool: selected $pool_container_id for $task_id (t1165.2)" | ||
| pool_record_dispatch "$pool_container_id" "$task_id" 2>/dev/null || true | ||
| fi | ||
| fi |
There was a problem hiding this comment.
Pool selection is recorded but never used for routing.
pool_select_container sets pool_container_id and updates pool stats, yet the dispatch still follows the normal local/remote path and never uses the selected container. This makes pool metrics incorrect and the comment about dispatch_target=pool isn’t honored. Either wire the selected container into a pool-specific dispatch branch (and persist it on the task), or gate selection so it only runs when an actual pool dispatch path exists.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.agents/scripts/supervisor/dispatch.sh around lines 3222 - 3234, The pool
selection is never wired into routing—after getting pool_container_id you must
set and persist the task's dispatch target and short-circuit to a pool-specific
dispatch branch so the selected container is actually used. Concretely: when
pool_select_container returns a non-empty pool_container_id, set
dispatch_target="pool:$pool_container_id", persist that value to whatever task
metadata/store the dispatcher later reads (so downstream code sees it), and
route the task through the pool-specific dispatch path that sends the job to
that container (and keep calling pool_record_dispatch to update metrics).
Alternatively, only invoke pool_select_container when the subsequent routing
supports a "pool:" dispatch_target; ensure any routing logic (the parts that
read dispatch_target) recognizes the "pool:<id>" prefix and sends the task to
that container.



Summary
container-pool.sh) to the supervisor for managing Docker/OrbStack container poolsdispatch.shvia opt-inSUPERVISOR_USE_CONTAINER_POOL=trueenvironment variableWhat's included
New files
.agents/scripts/supervisor/container-pool.sh— Pool manager module (849 lines)tests/test-container-pool.sh— 45 unit tests covering all pool operationsModified files
.agents/scripts/supervisor/database.sh— Schema migration forcontainer_poolandcontainer_dispatch_logtables.agents/scripts/supervisor/dispatch.sh— Pool-aware dispatch hook (selects container via round-robin before dispatch).agents/scripts/supervisor-helper.sh— Module source +poolcommand routingFeatures
last_dispatch_attimestamprate_limitedoutcomeCONTAINER_POOL_MINsupervisor-helper.sh pool {spawn|destroy|list|status|health|select|rate-limit|cleanup|help}Quality
local var="$1"pattern throughoutRef #1764
Summary by CodeRabbit
Release Notes
New Features
poolcommandTests