t1085.7: AI supervisor testing + validation — dry-run, mock context, token budget, cost reporting (#1635)
Tests cover: dry-run mode with mock context, token budget tracking (context size measurement + 50K token budget enforcement), cost reporting (log audit trail + DB state_log events), JSON action plan parser (4 extraction strategies), concurrency safety (lock files), mailbox/memory/pattern integration verification, CLI interface tests, error handling (missing repo slug, missing gh CLI), and integration tests against live repo (--live flag). 35 tests total: 32 pass, 3 skip (live mode). Uses mock gh CLI and mock helper scripts to avoid slow API calls in unit tests.
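As a rough illustration of the 50K token budget enforcement described above, the sketch below uses a common chars/4 estimation heuristic; the function name, variable names, and failure message are hypothetical, not taken from the actual suite:

```bash
# Hypothetical sketch of a token-budget gate: estimate tokens from the
# context byte count (~4 chars per token) and refuse oversized contexts.
TOKEN_BUDGET=50000

check_token_budget() {
    local context_file="$1"
    local chars tokens
    chars=$(wc -c < "$context_file")
    tokens=$((chars / 4))
    if ((tokens > TOKEN_BUDGET)); then
        echo "FAIL: context ~${tokens} tokens exceeds ${TOKEN_BUDGET} budget" >&2
        return 1
    fi
    return 0
}
```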
Code Review
The pull request introduces a comprehensive end-to-end test suite for the AI Supervisor pipeline, covering dry-run modes, token budget tracking, cost reporting, and integration with various modules. The tests use mock environments and a mock `gh` CLI to ensure fast execution. While the test coverage is excellent, the script violates several rules in the Repository Style Guide, particularly regarding explicit return statements, the `local` variable pattern for function arguments, and SQLite initialization pragmas. Additionally, some functions wrapping commands should propagate their exit codes using `return $?` instead of a hardcoded `return 0`, to avoid masking potential errors. Adhering to these standards will ensure consistency with the rest of the framework's shell scripts.
```bash
pass() {
    PASS=$((PASS + 1))
    TOTAL=$((TOTAL + 1))
    echo " PASS: $1"
}
```

This function violates the Repository Style Guide in two ways: it does not use the `local var="$1"` pattern for its argument, and it lacks an explicit return statement.
Suggested change:

```bash
pass() {
    local msg="$1"
    PASS=$((PASS + 1))
    TOTAL=$((TOTAL + 1))
    echo " PASS: $msg"
    return 0
}
```
```bash
fail() {
    FAIL=$((FAIL + 1))
    TOTAL=$((TOTAL + 1))
    echo " FAIL: $1"
}
```

This function is missing an explicit return statement and does not follow the `local var="$1"` pattern for arguments.
Suggested change:

```bash
fail() {
    local msg="$1"
    FAIL=$((FAIL + 1))
    TOTAL=$((TOTAL + 1))
    echo " FAIL: $msg"
    return 0
}
```
```bash
skip() {
    SKIP=$((SKIP + 1))
    TOTAL=$((TOTAL + 1))
    echo " SKIP: $1"
}
```

This function is missing an explicit return statement and does not follow the `local var="$1"` pattern for arguments.
Suggested change:

```bash
skip() {
    local msg="$1"
    SKIP=$((SKIP + 1))
    TOTAL=$((TOTAL + 1))
    echo " SKIP: $msg"
    return 0
}
```
```bash
setup_test_env() {
    TEST_TMP=$(mktemp -d)
    mkdir -p "$TEST_TMP/logs" "$TEST_TMP/db"
}
```

This function is missing an explicit return statement. Additionally, functions that wrap commands like `mktemp` and `mkdir` should propagate their exit codes to the caller instead of masking potential errors with a hardcoded `return 0`.
Suggested change:

```bash
setup_test_env() {
    TEST_TMP=$(mktemp -d)
    mkdir -p "$TEST_TMP/logs" "$TEST_TMP/db"
    return $?
}
```
References

- All functions must have explicit return statements. (link)
- In shell scripts, functions that wrap a command should propagate its exit code to the caller. Avoid masking potential errors with a hardcoded `return 0`. Instead, use `return $?` or capture the exit code and return it.
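Where several commands run in sequence and `return $?` would only reflect the last one, the capture form the reference mentions looks roughly like this (a sketch, not the reviewed code):

```bash
setup_test_env() {
    TEST_TMP=$(mktemp -d) || return $?    # fail fast if mktemp fails
    mkdir -p "$TEST_TMP/logs" "$TEST_TMP/db"
    local rc=$?                           # capture mkdir's exit code
    return "$rc"                          # propagate it to the caller
}
```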
```bash
cleanup_test_env() {
    [[ -n "$TEST_TMP" && -d "$TEST_TMP" ]] && rm -rf "$TEST_TMP"
}
```

This function is missing an explicit return statement. Functions that wrap commands like `rm -rf` should propagate their exit codes to the caller instead of masking potential errors with a hardcoded `return 0`.
Suggested change:

```bash
cleanup_test_env() {
    [[ -n "$TEST_TMP" && -d "$TEST_TMP" ]] && rm -rf "$TEST_TMP"
    return $?
}
```
References

- All functions must have explicit return statements. (link)
- In shell scripts, functions that wrap a command should propagate its exit code to the caller. Avoid masking potential errors with a hardcoded `return 0`. Instead, use `return $?` or capture the exit code and return it.
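One caveat with the suggested `return $?` here, noted editorially rather than taken from the review: when `TEST_TMP` is unset, the failed `[[ ]]` guard itself becomes the propagated status, so the function reports failure although nothing went wrong. A guarded variant (a sketch, not part of the suggestion) avoids that:

```bash
cleanup_test_env() {
    # No temp dir to remove: succeed explicitly instead of propagating
    # the failed [[ ]] guard as a spurious error.
    [[ -n "$TEST_TMP" && -d "$TEST_TMP" ]] || return 0
    rm -rf "$TEST_TMP"
    return $?
}
```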
```bash
AI_ACTIONS_LOG_DIR="$TEST_TMP/logs"

# Create a minimal DB with test data
sqlite3 "$SUPERVISOR_DB" "
```

The Repository Style Guide requires all SQLite databases to use WAL mode and a busy timeout. While this is a test setup, it should still follow the established pattern for consistency and to avoid potential locking issues during concurrent test execution.
Suggested change:

```bash
sqlite3 "$SUPERVISOR_DB" "
    PRAGMA journal_mode=WAL;
    PRAGMA busy_timeout=5000;
    CREATE TABLE IF NOT EXISTS tasks (
```
References

- All SQLite databases use WAL mode + `busy_timeout=5000`. (link)
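To keep the pragmas consistent across every test database, one option is a small wrapper that prepends them to each batch; the helper name `sq3` is a hypothetical illustration, not a framework function:

```bash
# Hypothetical wrapper: every sqlite3 call gets WAL mode and a 5s busy
# timeout, so individual tests cannot forget the pragmas.
sq3() {
    local db="$1"
    shift
    # busy_timeout is per-connection, so it is set on every call;
    # journal_mode=WAL persists in the database file itself.
    # (PRAGMA journal_mode echoes "wal" once; harmless in test setup.)
    sqlite3 "$db" "PRAGMA journal_mode=WAL; PRAGMA busy_timeout=5000; $*"
    return $?
}

# Usage: sq3 "$SUPERVISOR_DB" "CREATE TABLE IF NOT EXISTS tasks (id TEXT);"
```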
```bash
        exit 0
    )
}
```

The function `_test_mock_context` is missing an explicit return statement. Although it ends with a subshell that exits, the style guide requires the function itself to have a return. Functions that wrap commands should propagate their exit code to the caller, which is correctly handled by `return $?` in the suggestion.
Suggested change:

```bash
        exit 0
    )
    return $?
}
```
References

- All functions must have explicit return statements. (link)
- In shell scripts, functions that wrap a command should propagate its exit code to the caller. Avoid masking potential errors with a hardcoded `return 0`. Instead, use `return $?` or capture the exit code and return it.
```bash
    local no_gh_dir="$TEST_TMP/no-gh-bin"
    mkdir -p "$no_gh_dir"
    # Only provide essential commands (sqlite3, jq, etc.)
    for cmd in sqlite3 jq date wc tr sed grep head cut mktemp mv tail printf bash; do
```

The loop variable `cmd` should be declared as `local` to avoid leaking into the enclosing scope, following the naming convention for locals.
Suggested change:

```bash
    local cmd
    for cmd in sqlite3 jq date wc tr sed grep head cut mktemp mv tail printf bash; do
```
References

- Variables: lower_snake for locals. (link)
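For context, the missing-`gh` test this loop belongs to typically symlinks the whitelisted commands into a scratch bin dir and runs the script with PATH restricted to it, roughly as below; `$SUPERVISOR_SCRIPT`, the `--dry-run` flag spelling, and the grep pattern are assumptions for illustration:

```bash
_test_missing_gh() {
    local cmd src no_gh_dir="$TEST_TMP/no-gh-bin"
    mkdir -p "$no_gh_dir"
    # Symlink only the whitelisted commands so `gh` is genuinely absent.
    for cmd in sqlite3 jq date wc tr sed grep head cut mktemp mv tail printf bash; do
        src=$(command -v "$cmd") && ln -sf "$src" "$no_gh_dir/$cmd"
    done
    # The supervisor should fail cleanly when gh cannot be found.
    PATH="$no_gh_dir" bash "$SUPERVISOR_SCRIPT" --dry-run 2>&1 | grep -qi "gh"
    return $?
}
```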
Summary

Comprehensive end-to-end test suite for the AI Supervisor pipeline (t1085.7), covering all testing and validation requirements:

- Integration tests (`--live` flag): real GitHub data, `has_actionable_work()`, full dry-run pipeline against the live repo
- 35 tests total: 32 pass, 3 skip (live mode requires the `--live` flag)
- Uses mock `gh` CLI and mock helper scripts to avoid slow API calls — the test suite runs in ~10 seconds

Ref #1606
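For readers new to the pattern, the mock `gh` technique mentioned above usually amounts to a stub script placed first on PATH that returns canned JSON; the subcommands, responses, and paths below are hypothetical, not the suite's actual fixtures:

```bash
# Sketch of a mock `gh` stub: intercept the subcommands under test and
# return canned JSON instead of calling the GitHub API.
mkdir -p "$TEST_TMP/mock-bin"
cat > "$TEST_TMP/mock-bin/gh" <<'EOF'
#!/usr/bin/env bash
case "$1 $2" in
    "issue list") echo '[{"number": 42, "title": "mock issue"}]' ;;
    "pr list")    echo '[]' ;;
    *)            echo '{}' ;;
esac
exit 0
EOF
chmod +x "$TEST_TMP/mock-bin/gh"
export PATH="$TEST_TMP/mock-bin:$PATH"   # shadow the real gh CLI
```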