-
Notifications
You must be signed in to change notification settings - Fork 5
test: add comprehensive test suite for critical scripts (t135.13) #466
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Fix run-tests.sh to use BASH_SOURCE-relative path instead of hardcoded ~/git/ - Add test-smoke-help.sh: bash -n syntax check (160 scripts) + help command smoke tests (98 scripts) + ShellCheck on critical scripts - Add test-supervisor-state-machine.sh: 46 tests covering state transitions, retry logic, batch completion, metadata, error handling, cancellation - Add test-memory-mail.sh: 37 tests covering memory store/recall/FTS5/ namespaces/versioning + mail send/check/read/archive/register/deregister
WalkthroughThis pull request introduces comprehensive test coverage for core helper scripts and enhances repository path resolution. Changes include a path resolution fix in the Docker test runner and three new test suites validating memory-mail, smoke-test, and supervisor state-machine functionality. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello @marcusquinn, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the project's testing infrastructure by introducing a comprehensive suite of tests for critical shell scripts. It addresses a pathing issue in the test runner and adds new test files to validate script syntax, help command functionality, the supervisor state machine's logic, and the core memory and mail helper scripts. These additions aim to improve the reliability, maintainability, and robustness of the codebase by ensuring that key components behave as expected across various scenarios. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
🔍 Code Quality Report�[0;35m[MONITOR]�[0m Code Review Monitoring Report �[0;34m[INFO]�[0m Latest Quality Status: �[0;34m[INFO]�[0m Recent monitoring activity: 📈 Current Quality Metrics
Generated on: Sat Feb 7 19:31:35 UTC 2026 Generated by AI DevOps Framework Code Review Monitoring |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces an impressive and comprehensive test suite for several critical scripts, significantly improving the project's test coverage and robustness. The new test scripts for the supervisor state machine, memory/mail helpers, and the smoke/help checks are well-structured, thorough, and use best practices like isolated test environments.
I've identified a couple of areas for improvement in tests/test-smoke-help.sh to enhance the reliability of the ShellCheck validation and to streamline the script processing logic. My detailed comments are provided on the relevant lines.
Overall, this is an excellent contribution that greatly enhances the quality assurance of the agent scripts.
| # Skip archived scripts | ||
| [[ "$script" == *"_archive/"* ]] && continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| fail "help: $name" "No output produced (exit=$help_exit)" | ||
| help_fail=$((help_fail + 1)) | ||
| fi | ||
| done < <(git -C "$REPO_DIR" ls-files '.agents/scripts/*.sh') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency with the syntax-check loop and to make the check inside the loop redundant, you can filter out archived scripts at this stage. This improves efficiency by not iterating over scripts that will be skipped anyway.
| done < <(git -C "$REPO_DIR" ls-files '.agents/scripts/*.sh') | |
| done < <(git -C "$REPO_DIR" ls-files '.agents/scripts/*.sh' | grep -v '_archive/') |
| sc_output=$(shellcheck -S error "$script_path" 2>&1 || true) | ||
| sc_errors=$(echo "$sc_output" | grep -c "error" || true) | ||
| if [[ "$sc_errors" -eq 0 ]]; then | ||
| pass "shellcheck: $name (0 errors)" | ||
| else | ||
| fail "shellcheck: $name ($sc_errors errors)" \ | ||
| "$(echo "$sc_output" | head -5)" | ||
| fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current method of counting ShellCheck issues using grep -c "error" is not fully reliable. It can produce false positives if the word "error" appears in the script's source code (e.g., in comments shown as context) and may not accurately count the number of distinct errors. A more robust approach is to rely on shellcheck's exit code, which is non-zero when issues are found. This aligns with the principle of capturing a command's exit code for reliable conditional logic.
| sc_output=$(shellcheck -S error "$script_path" 2>&1 || true) | |
| sc_errors=$(echo "$sc_output" | grep -c "error" || true) | |
| if [[ "$sc_errors" -eq 0 ]]; then | |
| pass "shellcheck: $name (0 errors)" | |
| else | |
| fail "shellcheck: $name ($sc_errors errors)" \ | |
| "$(echo "$sc_output" | head -5)" | |
| fi | |
| if sc_output=$(shellcheck -S error "$script_path" 2>&1); then | |
| pass "shellcheck: $name (0 errors)" | |
| else | |
| sc_errors=$(echo "$sc_output" | grep -cE '^(In|For) ' || true) | |
| fail "shellcheck: $name ($sc_errors errors)" \ | |
| "$(echo "$sc_output" | head -5)" | |
| fi |
References
- In shell scripts, capture a command's exit code in a variable instead of using $? directly in conditionals. This aligns with ShellCheck SC2181 and improves clarity for multi-way branches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@tests/test-memory-mail.sh`:
- Around line 42-47: The skip() helper currently drops its optional detail
argument so calls like skip "Namespace store failed" "$ns_output" lose the
message; update skip() to accept and print an optional second argument (similar
to how fail() prints details) by appending the detail output to the printed SKIP
line or on a new indented line when $2 is non-empty, ensuring all existing
skip("...","...") call sites preserve their second-argument details.
In `@tests/test-smoke-help.sh`:
- Around line 180-181: The current use of "|| true" after capturing help_output
masks the real exit status so help_exit is always 0; remove the "|| true" and
instead temporarily disable errexit around the timeout invocation (e.g. set +e
before running help_output=$(timeout 5 bash "$abs_path" help 2>&1) and then
capture help_exit=$? immediately afterwards, then restore set -e) so the actual
timeout/exit code from timeout and the tested variables help_output and
help_exit reflect the real result.
In `@tests/test-supervisor-state-machine.sh`:
- Around line 95-100: The pass message mentions batch_tasks but the condition
that builds the tables check (variable tables populated by test_db and compared
with patterns "tasks", "batches", "state_log") doesn't verify "batch_tasks";
update the condition to also check for "batch_tasks" (e.g., include && "$tables"
== *"batch_tasks"*) so the if matches the pass message, using the same variables
and functions (tables, test_db, pass, fail), or alternatively remove
"batch_tasks" from the pass string to keep message and condition consistent.
🧹 Nitpick comments (6)
tests/docker/run-tests.sh (1)
3-3: Consider aligningsetflags with the newer test scripts.The new test files (
test-smoke-help.sh,test-memory-mail.sh,test-supervisor-state-machine.sh) all useset -euo pipefail, while this script usesset -uo pipefail(no-e). The omission of-eis defensible here since individual test commands are allowed to fail, but worth noting as a minor inconsistency across the test suite.tests/test-smoke-help.sh (3)
227-228: ShellCheck error counting is fragile —grep -c "error"matches the word anywhere in the output.The string
"error"can appear in ShellCheck's diagnostic messages even for non-error severity levels, inflating the count. A more reliable approach would be to use ShellCheck's exit code directly (non-zero means findings at the requested severity) or use a structured output format.♻️ Proposed alternative using exit code
- sc_output=$(shellcheck -S error "$script_path" 2>&1 || true) - sc_errors=$(echo "$sc_output" | grep -c "error" || true) - if [[ "$sc_errors" -eq 0 ]]; then + sc_exit=0 + sc_output=$(shellcheck -S error "$script_path" 2>&1) || sc_exit=$? + if [[ "$sc_exit" -eq 0 ]]; then pass "shellcheck: $name (0 errors)" else - fail "shellcheck: $name ($sc_errors errors)" \ + fail "shellcheck: $name (exit code $sc_exit)" \ "$(echo "$sc_output" | head -5)" fi
23-54: Consider extracting the shared test framework into a common sourced file.The
pass(),fail(),skip(), andsection()functions are duplicated acrosstest-smoke-help.sh,test-memory-mail.sh,test-supervisor-state-machine.sh, and (in a slightly different form)tests/docker/run-tests.sh. Extracting these into a sharedtests/test-framework.shthat each test script sources would reduce duplication and ensure consistent behavior (e.g., the--verbosegating in this file vs. always-print in the others).
64-75: Syntax check suppresses error details.
bash -n "$abs_path" 2>/dev/nullsilences the parse error output. When a script fails syntax validation, the developer gets no clue about what broke. Consider capturing stderr and including it in thefailmessage, at least in verbose mode.♻️ Proposed improvement
- if bash -n "$abs_path" 2>/dev/null; then + syntax_errors=$(bash -n "$abs_path" 2>&1) || true + if [[ -z "$syntax_errors" ]]; then pass "syntax: $name" syntax_pass=$((syntax_pass + 1)) else - fail "syntax: $name" "bash -n failed" + fail "syntax: $name" "$syntax_errors" syntax_fail=$((syntax_fail + 1)) fitests/test-memory-mail.sh (1)
20-20:VERBOSEis captured but never used in this file.Unlike
test-smoke-help.shwhere--verbosegates pass/skip output, hereVERBOSEis assigned on line 20 but never referenced. Either wire it into the output functions or remove it for clarity.tests/test-supervisor-state-machine.sh (1)
72-79: Per coding guidelines, preferlocalfor parameter assignments in shell functions.
get_status()andget_field()use$1/$2directly in string interpolation. Usinglocalmakes intent clearer and avoids accidental reference to the wrong positional parameter in more complex functions. As per coding guidelines, use thelocal var="$1"pattern.♻️ Proposed refactor
get_status() { - test_db "SELECT status FROM tasks WHERE id = '$1';" + local task_id="$1" + test_db "SELECT status FROM tasks WHERE id = '$task_id';" } get_field() { - test_db "SELECT $2 FROM tasks WHERE id = '$1';" + local task_id="$1" + local field="$2" + test_db "SELECT $field FROM tasks WHERE id = '$task_id';" }
|
|
||
| skip() { | ||
| SKIP_COUNT=$((SKIP_COUNT + 1)) | ||
| TOTAL_COUNT=$((TOTAL_COUNT + 1)) | ||
| printf " \033[0;33mSKIP\033[0m %s\n" "$1" | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
skip() silently ignores its second argument — detail messages are lost.
skip() (lines 42–47) accepts only $1, but it's called with two arguments at lines 209 and 395 (e.g., skip "Namespace store failed" "$ns_output"). The detail message is silently discarded. Either extend skip() to print an optional second argument (like fail() does), or remove the extra args from the call sites.
🐛 Proposed fix — extend skip() to match fail()
skip() {
SKIP_COUNT=$((SKIP_COUNT + 1))
TOTAL_COUNT=$((TOTAL_COUNT + 1))
printf " \033[0;33mSKIP\033[0m %s\n" "$1"
+ if [[ -n "${2:-}" ]]; then
+ printf " %s\n" "$2"
+ fi
}Also applies to: 209-209
🤖 Prompt for AI Agents
In `@tests/test-memory-mail.sh` around lines 42 - 47, The skip() helper currently
drops its optional detail argument so calls like skip "Namespace store failed"
"$ns_output" lose the message; update skip() to accept and print an optional
second argument (similar to how fail() prints details) by appending the detail
output to the printed SKIP line or on a new indented line when $2 is non-empty,
ensuring all existing skip("...","...") call sites preserve their
second-argument details.
| help_output=$(timeout 5 bash "$abs_path" help 2>&1) || true | ||
| help_exit=$? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: || true clobbers the exit code, making timeout detection dead code.
Line 180's || true causes $? on line 181 to always be 0, so the timeout check at line 188 ($help_exit -eq 124) can never trigger. Under set -e, you need a different pattern to capture the real exit code.
🐛 Proposed fix
- help_output=$(timeout 5 bash "$abs_path" help 2>&1) || true
- help_exit=$?
+ help_exit=0
+ help_output=$(timeout 5 bash "$abs_path" help 2>&1) || help_exit=$?📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| help_output=$(timeout 5 bash "$abs_path" help 2>&1) || true | |
| help_exit=$? | |
| help_exit=0 | |
| help_output=$(timeout 5 bash "$abs_path" help 2>&1) || help_exit=$? |
🤖 Prompt for AI Agents
In `@tests/test-smoke-help.sh` around lines 180 - 181, The current use of "||
true" after capturing help_output masks the real exit status so help_exit is
always 0; remove the "|| true" and instead temporarily disable errexit around
the timeout invocation (e.g. set +e before running help_output=$(timeout 5 bash
"$abs_path" help 2>&1) and then capture help_exit=$? immediately afterwards,
then restore set -e) so the actual timeout/exit code from timeout and the tested
variables help_output and help_exit reflect the real result.
| tables=$(test_db "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name;" | tr '\n' ',') | ||
| if [[ "$tables" == *"tasks"* && "$tables" == *"batches"* && "$tables" == *"state_log"* ]]; then | ||
| pass "Required tables exist (tasks, batches, state_log, batch_tasks)" | ||
| else | ||
| fail "Missing required tables" "Found: $tables" | ||
| fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pass message claims batch_tasks is verified, but the condition doesn't check for it.
Line 96 checks for "tasks", "batches", and "state_log" in the table list, but the pass message on line 97 also mentions "batch_tasks". Either add batch_tasks to the condition or remove it from the message to avoid a misleading test result.
🐛 Proposed fix — align condition with message
-if [[ "$tables" == *"tasks"* && "$tables" == *"batches"* && "$tables" == *"state_log"* ]]; then
- pass "Required tables exist (tasks, batches, state_log, batch_tasks)"
+if [[ "$tables" == *"tasks"* && "$tables" == *"batches"* && "$tables" == *"state_log"* && "$tables" == *"batch_tasks"* ]]; then
+ pass "Required tables exist (tasks, batches, state_log, batch_tasks)"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| tables=$(test_db "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name;" | tr '\n' ',') | |
| if [[ "$tables" == *"tasks"* && "$tables" == *"batches"* && "$tables" == *"state_log"* ]]; then | |
| pass "Required tables exist (tasks, batches, state_log, batch_tasks)" | |
| else | |
| fail "Missing required tables" "Found: $tables" | |
| fi | |
| tables=$(test_db "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name;" | tr '\n' ',') | |
| if [[ "$tables" == *"tasks"* && "$tables" == *"batches"* && "$tables" == *"state_log"* && "$tables" == *"batch_tasks"* ]]; then | |
| pass "Required tables exist (tasks, batches, state_log, batch_tasks)" | |
| else | |
| fail "Missing required tables" "Found: $tables" | |
| fi |
🤖 Prompt for AI Agents
In `@tests/test-supervisor-state-machine.sh` around lines 95 - 100, The pass
message mentions batch_tasks but the condition that builds the tables check
(variable tables populated by test_db and compared with patterns "tasks",
"batches", "state_log") doesn't verify "batch_tasks"; update the condition to
also check for "batch_tasks" (e.g., include && "$tables" == *"batch_tasks"*) so
the if matches the pass message, using the same variables and functions (tables,
test_db, pass, fail), or alternatively remove "batch_tasks" from the pass string
to keep message and condition consistent.



Summary
tests/docker/run-tests.shto useBASH_SOURCE-relative path instead of hardcoded~/git/(t135.13.1)tests/test-smoke-help.sh: bash -n syntax check for all 160 non-archived scripts + help command smoke tests for 98 scripts with help functions + ShellCheck on 11 critical scripts (t135.13.2)tests/test-supervisor-state-machine.sh: 46 unit tests covering DB init, task addition, valid/invalid state transitions, post-PR lifecycle, retry logic, error handling, cancellation, blocked state, state log audit trail, metadata fields, batch completion detection, and edge cases (t135.13.3)tests/test-memory-mail.sh: 37 unit tests covering memory store/recall/FTS5 hyphen handling/stats/relational versioning/namespace isolation/prune/help + mail DB init/agent registration/send/check/read/archive/message types/priority/status/deregister/prune/help (t135.13.4)Test Results
All tests use isolated temp directories and do not touch production databases.
Resolves t135.13 subtasks: t135.13.1, t135.13.2, t135.13.3, t135.13.4
Summary by CodeRabbit
Release Notes